4.7 Specifying Performance Objectives


Users of IT services are generally not concerned with metrics such as CPU utilization, memory contention, network bandwidth, failure rates, router uptime, and other indicators of system performance. Instead, users are more interested in metrics related to the quality of service (QoS) provided by a system, which is expressed through specific objectives. End users want to know how well the system is performing and whether they can get their work done on time. Users perceive system services through performance metrics such as response time, availability, reliability, security, and cost. Expected service levels govern the relationship between users and systems.

Performance objectives should be stated precisely. In the process of analyzing system requirements, one should avoid vague statements such as the following:

  • The system response time should be satisfactory to end users.

  • The system availability should be adequate.

  • The system should be as reliable as possible.

In contrast, performance goals should be stated in a simple and precise manner, such as:

  • The system throughput should be greater than 1,000 query transactions per second, with at least 90% of the transactions responding in less than 2 sec.

  • The application server should have an availability of 99.9% during weekday business hours.

  • The response time of the patient information system should not exceed one second for local users.

  • The mail server must be able to process at least 70,000 messages per day and 90% of the outgoing messages should not wait more than 2 minutes to be delivered.
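Because goals like these are stated precisely, they can be checked mechanically against measurements. The following is a minimal Python sketch, assuming a measurement window in which the number of completed transactions and the individual response times were recorded; the names `PercentileGoal` and `meets_goal` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PercentileGoal:
    """A throughput floor plus a response-time bound for a given fraction of transactions."""
    min_throughput: float     # transactions per second (must be exceeded)
    fraction: float           # e.g. 0.90 for "at least 90% of the transactions"
    max_response_time: float  # seconds


def fraction_below(response_times, limit):
    """Fraction of observed response times strictly below `limit`."""
    if not response_times:
        raise ValueError("no response times were measured")
    return sum(1 for r in response_times if r < limit) / len(response_times)


def meets_goal(goal, completed, window_seconds, response_times):
    """True only if both the throughput part and the response-time part of the goal hold."""
    throughput = completed / window_seconds
    fast_enough = fraction_below(response_times, goal.max_response_time) >= goal.fraction
    return throughput > goal.min_throughput and fast_enough


# First goal in the list: > 1,000 tps, with at least 90% of transactions under 2 sec.
query_goal = PercentileGoal(min_throughput=1_000, fraction=0.90, max_response_time=2.0)
```

For example, 65,000 transactions completed in a 60-second window (about 1,083 tps) with 9 of 10 sampled response times under 2 sec satisfies the goal; the same sample with only 55,000 completions (about 917 tps) does not.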

4.7.1 Specifying a Service Level Agreement

Service Level Agreements (SLAs) require IT to work with its end users to define a list of services and their quality attributes, such as response time, availability, reliability, time-to-repair, and cost. SLA values are specific to each organization and are determined jointly by management and users. As shown in Fig. 4.6, an SLA helps manage IT services in two ways: 1) planning, by determining what the service levels need to be, and 2) assurance, by monitoring the service levels to ensure that they meet the specified requirements and by identifying problems when a service level is not met. Performance engineering is a methodology that supports the management of service level agreements.

Figure 4.6. Performance Engineering process and SLAs.


SLAs take different forms, depending on the nature of the IT service. For instance, SLAs can be divided into performance levels such as basic, enhanced, and premium, each of which typically incurs a different cost [10]. Users can then clearly understand the resource and cost trade-offs involved in designing, implementing, and operating an IT system [10].
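One way to make tiered SLAs concrete is to record each level's targets and cost side by side, so users can see what a stricter level buys. The sketch below is a hypothetical Python illustration: the tier targets and monthly costs are invented for the example and are not taken from the text or from [10].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlaTier:
    name: str
    availability: float       # fraction of scheduled time the service must be up
    p90_response_time: float  # seconds; 90% of transactions must finish within this
    monthly_cost: float       # dollars (hypothetical)


# Hypothetical catalog, for illustration only.
TIERS = [
    SlaTier("basic", 0.99, 4.0, 2_000),
    SlaTier("enhanced", 0.995, 2.0, 5_000),
    SlaTier("premium", 0.999, 1.0, 12_000),
]


def cheapest_tier(required_availability: float, required_p90: float) -> Optional[SlaTier]:
    """Return the least expensive tier whose targets meet both requirements, or None."""
    candidates = [
        t for t in TIERS
        if t.availability >= required_availability and t.p90_response_time <= required_p90
    ]
    return min(candidates, key=lambda t: t.monthly_cost, default=None)
```

Under these assumed numbers, a user who needs 99.5% availability and a 90th-percentile response time of 3 sec would be offered the enhanced tier.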

Adequate capacity and infrastructure must be provided so that acceptable or desirable values for performance metrics can be achieved. Before deciding on a particular service level for a system, one should assess the current level of service provided by the existing systems. After assessing the current level of service, management faces the following question: Do the service levels need to be changed? If they are unsatisfactory or too expensive for the users, or if they must change as a function of business goals, new service levels should be established. The following criteria are helpful in establishing new service levels.

  • Cost × benefit. Service levels depend on both the workload and the system configuration. Improving service levels usually implies expanding system capacity, which translates into cost, so a trade-off analysis between benefits and cost should be carried out. For instance, response time is often reported to be inversely proportional to user productivity: the shorter the response time, the higher the productivity. However, since better service levels mean higher system cost, it is reasonable to ask whether productivity gains compensate for the additional cost incurred to reduce response time. Managers and users should discuss performance goals in light of the cost of providing IT services.

  • Nature of the application. To be acceptable to customers, service applications such as point-of-sale and airline reservation systems must provide fast response times. An airline reservation system may guarantee that 90% of the transactions respond in 3 sec or less, with an average of 1.5 sec. An airline cannot afford to keep a reservation agent or an airport agent who is dealing with passengers waiting much longer than 3 sec [9]. Real-time applications that deal with customers, control manufacturing, or support other critical systems must respond to requests within specified time limits.

  • Past performance goals. Performance levels attained in the past can help IT management and users reach an agreement for future service levels. Users establish expectations based on their past experience about the time required to complete a given task.

4.7.2 Specifying Response Time

Response time is a critical factor for users of interactive systems. Because people interact so often with computer-based systems, it is widely agreed that response time is an important determinant of personal productivity, error rates, and satisfaction [21]. User satisfaction clearly increases as response time shortens. Modest variations around the average response time are acceptable, but large variations may affect user behavior. A frequently asked question is: How should one set appropriate response time limits for a given system? The answer depends on how system response time affects users' performance. Adequate response time limits for a specific system and user community can be determined by measuring the impact on users' productivity and by estimating the cost of providing improved response times. Guidelines on computer response time and user behavior exist, and human-computer interaction has been studied extensively [17, 18]. Regarding the response time of a system:

  • 0.1 second is about the limit when a user perceives that the system is reacting instantaneously.

  • 1.0 second is about the limit when the flow of thought of a user is not interrupted, although the user may notice the delay.

  • 10 seconds is about the limit when a user loses attention and the interaction with the system is disrupted.
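These limits can be applied to measured response times to estimate how users are likely to perceive them. This minimal Python sketch treats the approximate limits above as hard thresholds, which is a simplification; the category labels are our own shorthand, and the "flow interrupted" band between 1 and 10 seconds is our reading of the two outer limits.

```python
from collections import Counter


def perception(response_time_s: float) -> str:
    """Classify one response time (seconds) against the 0.1 s / 1 s / 10 s limits."""
    if response_time_s <= 0.1:
        return "instantaneous"      # user perceives an immediate reaction
    if response_time_s <= 1.0:
        return "flow kept"          # delay may be noticed, flow of thought not interrupted
    if response_time_s <= 10.0:
        return "flow interrupted"   # attention kept, but flow of thought is broken
    return "attention lost"         # interaction with the system is disrupted


def perception_profile(response_times):
    """Count how many measured response times fall into each category."""
    return Counter(perception(r) for r in response_times)
```

A profile such as `perception_profile([0.05, 0.5, 0.7, 12.0])` shows at a glance how many interactions cross each limit, which complements an average that can hide large variations.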

4.7.3 Specifying Cost

The estimation of the cost of developing and delivering IT systems is an essential component of a performance engineering methodology. The system life cycle involves a number of design trade-offs with significant cost and performance implications. For example, the choice of thin clients versus fat clients to execute networked applications, the choice of the number of architectural layers of a system, and the choice of the number and capacity of servers of a datacenter affect both the performance and the cost of a system [3].

Without cost estimates, it is meaningless to discuss service level and performance objectives for a specific business. Managers need to calculate cost and estimate benefits to make good IT decisions. For example, to specify the availability goals for a credit card authorization system or a catalog sales center, one has to know the cost of a minute of downtime. Depending on the nature of the application, the average cost of one minute of downtime can be as high as $100,000 in the case of systems that support brokerage operations [19].
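An availability goal translates directly into a downtime budget, and a per-minute downtime cost turns that budget into dollars. The Python sketch below assumes that weekday business hours means 8 hours a day, 5 days a week, 52 weeks a year (2,080 hours), and uses the $100,000-per-minute figure above only as an illustrative upper bound; both are assumptions, not values from a specific system.

```python
def allowed_downtime_minutes(availability: float, scheduled_hours: float) -> float:
    """Minutes of downtime an availability goal permits over the scheduled hours."""
    return (1.0 - availability) * scheduled_hours * 60


def downtime_cost(minutes: float, cost_per_minute: float) -> float:
    """Cost incurred if the whole downtime budget is actually used."""
    return minutes * cost_per_minute


# Assumption: weekday business hours = 8 h/day x 5 days/week x 52 weeks.
BUSINESS_HOURS_PER_YEAR = 8 * 5 * 52  # 2,080 hours

budget = allowed_downtime_minutes(0.999, BUSINESS_HOURS_PER_YEAR)  # about 124.8 minutes
exposure = downtime_cost(budget, 100_000)                           # about $12.5 million
```

Comparing such an exposure with the cost of extra redundancy is one way to decide whether a stricter availability goal is worth paying for.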

The Total Cost of Ownership (TCO) model [20] has been applied to evaluate the cost of a system. TCO is the total cost of owning a given system over some time horizon (e.g., a five-year period). TCO aims to identify and measure elements of IT expense beyond the initial cost of implementing a system. The most significant items contributing to TCO are:

  • Hardware costs, including acquisition expenses or leasing expenses of equipment, such as servers, storage boxes, and connectivity components.

  • Software costs, including personal productivity software, applications, database management, transaction monitors, intrusion detection, performance monitoring, and operating system software.

  • Communication costs, including leased lines and access to communication services.

  • Management costs, including network management, systems storage, maintenance, and outsourced services.

  • Support costs, including support services, support personnel training, end-user training, and help desk services.

  • Facilities costs, including leasing of physical space, air conditioning, power, and physical security.

  • Downtime costs, including both the cost of lost productivity of employees and the cost of lost income from missed business [19].
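Adding up the categories above over the chosen horizon gives a simple TCO estimate. The sketch below is a minimal Python illustration that assumes each category has already been expressed as an average annual amount (for example, hardware acquisition amortized over the horizon) and ignores discounting and price changes; the figures in the example are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class AnnualCosts:
    """Average yearly cost, in dollars, for each TCO category listed above."""
    hardware: float
    software: float
    communication: float
    management: float
    support: float
    facilities: float
    downtime: float


def total_cost_of_ownership(annual: AnnualCosts, years: int = 5) -> float:
    """Undiscounted TCO: sum of all categories, multiplied by the time horizon."""
    per_year = sum(getattr(annual, f.name) for f in fields(annual))
    return per_year * years


# Hypothetical figures for a single installation.
example = AnnualCosts(hardware=40_000, software=25_000, communication=6_000,
                      management=15_000, support=12_000, facilities=8_000,
                      downtime=10_000)
five_year_tco = total_cost_of_ownership(example)  # 116,000 per year x 5 = 580,000
```

Keeping downtime as its own line item makes it easy to see how much of the TCO a more dependable system could save.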

Measuring the Return on Investment (ROI) is also critical to evaluating the costs and benefits of a system. Two factors are considered when measuring the success of an investment in information technology: the improvement in the quality of service (e.g., user satisfaction and system use) and the return on investment (e.g., increased productivity and organizational impact). Thus, ROI methodologies are useful for analyzing the costs and benefits of IT projects.
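The text does not fix a formula for ROI; a common simple definition is net benefit divided by cost. The Python sketch below uses that definition as an assumption, with hypothetical benefit and cost figures.

```python
def return_on_investment(total_benefit: float, total_cost: float) -> float:
    """Simple ROI: (benefit - cost) / cost, expressed as a fraction."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical: $725,000 of estimated benefits against a $580,000 TCO.
roi = return_on_investment(725_000, 580_000)  # 0.25, i.e. a 25% return
```

A negative result means the estimated benefits do not cover the cost over the chosen horizon.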

Based on the requirements analysis of a system, one should develop a cost model to understand the costs and benefits of the system. When different system choices are evaluated, one needs to predict how much additional computing and communication resources each choice will require and how much these elements cost now and in the future.

Example 4.6.

Consider the call center system described in the motivating example. Management is planning to replace the database server with a powerful cluster of servers. Two brands of clusters are being compared, system Y and system Z. What factors should be considered in the process of choosing between the two systems?

First, IT management is considering the TCO model, instead of the pure purchase cost of the system. Thus, the basic cost of both systems includes hardware and software costs, hardware and software maintenance, operational personnel, and facilities costs for a three-year period. System Y costs $300,000 and system Z costs $350,000.

Second, IT management is looking at the performance attributes of both systems. Standard benchmarks indicate that the throughputs of systems Y and Z are 220 tps and 230 tps, respectively, with 90% of the transactions responding in less than 0.5 sec.

Third, IT management is evaluating other aspects of computing in its analysis. In addition to performance, management also considers the dependability of the systems [19]. Information obtained on the dependability of the two systems suggests an expected 38 hours of downtime for system Y and 21 hours for system Z over a period of three years.

The call center charges $5 per call from customers. The company estimates that in the next three years, the average rate of telephone calls into the call center will be 1,000/hour. Therefore, the estimated average cost per hour of downtime due to revenue loss is $5,000. The total cost of a system is calculated as:

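$$\text{Total cost} = \text{TCO} + \text{downtime (hours)} \times \text{call rate (calls/hour)} \times \text{revenue per call (\$/call)}$$
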
Using the above expression, the total cost for a three-year period for the two systems is:

  • Cost of system Y = $300,000 + 38 × 1,000 × $5 = $490,000

  • Cost of system Z = $350,000 + 21 × 1,000 × $5 = $455,000

The above methodology helps management justify selecting system Z. Although Z costs $50,000 more initially, it is more dependable: its 17 fewer hours of expected downtime avoid $85,000 in lost revenue, which makes its total cost $35,000 lower.
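The comparison in Example 4.6 can be reproduced in a few lines. This Python sketch encodes the total-cost expression with the example's figures ($5 per call, 1,000 calls per hour) and also converts each system's downtime into an availability figure, which the example itself does not do; that conversion assumes round-the-clock operation over 3 × 365 days.

```python
HOURS_IN_THREE_YEARS = 3 * 365 * 24  # 26,280 hours; leap days ignored


def total_cost(tco, downtime_hours, calls_per_hour=1_000, revenue_per_call=5):
    """TCO plus revenue lost while the call center cannot take calls."""
    return tco + downtime_hours * calls_per_hour * revenue_per_call


def availability(downtime_hours, period_hours=HOURS_IN_THREE_YEARS):
    """Fraction of the period during which the system is up."""
    return 1 - downtime_hours / period_hours


# (three-year TCO in dollars, expected downtime hours over three years)
systems = {"Y": (300_000, 38), "Z": (350_000, 21)}

for name, (tco, down) in systems.items():
    print(f"System {name}: total cost ${total_cost(tco, down):,}, "
          f"availability {availability(down):.3%}")

cheapest = min(systems, key=lambda name: total_cost(*systems[name]))
```

Running it reports total costs of $490,000 for Y and $455,000 for Z (availabilities of roughly 99.86% and 99.92%) and selects Z, matching the conclusion above.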



Performance by Design. Computer Capacity Planning by Example
ISBN: 0130906735
Year: 2003
Pages: 166
