Users of IT services are generally not concerned with metrics such as CPU utilization, memory contention, network bandwidth, failure rates, router uptime, and other indicators of system performance. Instead, users tend to be more interested in metrics related to the quality of service (QoS) provided by a system. QoS is expressed through specific objectives. End users want to know how well the system is performing and whether they can get their work done on time. Users perceive system services through performance metrics such as response time, availability, reliability, security, and cost. Expected service levels govern the relationship between users and systems.
Performance objectives should be stated precisely. In the process of analyzing system requirements, one should avoid vague statements such as the following:
In contrast, performance goals should be stated in a simple and precise manner, such as:
4.7.1 Specifying a Service Level Agreement
Service Level Agreements (SLAs) require IT to work with its end users to define a list of services and their quality attributes, such as response time, availability, reliability, time-to-repair, and cost. The values in an SLA are specific to each organization and are determined by both management and users. As shown in Fig. 4.6, an SLA is useful for managing IT services in two ways: 1) planning, by determining what the service levels need to be, and 2) assurance, by monitoring the service levels to ensure that they meet the specified requirements and by identifying problems when a service level is not met. Performance engineering is a methodology to support and manage service level agreements.
Figure 4.6. Performance Engineering process and SLAs.
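The assurance side of an SLA, monitoring measured service levels against the agreed values, can be illustrated with a minimal Python sketch. The function names, the objective (90% of requests completing in under 0.5 sec), and the sample measurements are all illustrative assumptions rather than values from any particular agreement.

```python
def fraction_within(response_times, limit_sec):
    """Return the fraction of response times strictly below limit_sec."""
    if not response_times:
        raise ValueError("no measurements supplied")
    within = sum(1 for t in response_times if t < limit_sec)
    return within / len(response_times)


def meets_objective(response_times, limit_sec=0.5, required_fraction=0.90):
    """True if at least required_fraction of requests finished under limit_sec."""
    return fraction_within(response_times, limit_sec) >= required_fraction


# Hypothetical response times (in seconds) from one monitoring interval.
samples = [0.21, 0.35, 0.48, 0.30, 0.12, 0.44, 0.95, 0.27, 0.39, 0.41]

print("fraction under 0.5 sec:", fraction_within(samples, 0.5))
print("objective met:", meets_objective(samples))
```

Stating the objective as a fraction of requests under a limit, rather than as an average, keeps a few very slow requests from being hidden by many fast ones.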
SLAs take different forms, depending on the nature of the IT service. For instance, SLAs can be divided into different performance levels, such as basic, enhanced, and premium. Each category typically incurs a different cost. Users can then have a clear understanding of the resource and cost tradeoffs involved in designing, implementing, and operating an IT system.
Adequate capacity and infrastructure are provided so that acceptable or desirable values for performance metrics can be achieved. Before deciding on a particular service level for a system, one should assess the current level of service provided by the existing systems. After assessing the current level of service, management faces the following question: Do the service levels need to be changed? If they are unsatisfactory or expensive to the users or if they have to be changed as a function of business goals, new service levels should be established. The following are some helpful criteria to establish new service levels.
4.7.2 Specifying Response Time
Response time is a critical factor to users of interactive systems. Because people interact so often with computer-based systems, response time is widely recognized as an important determinant of personal productivity, error rates, and satisfaction. It is evident that user satisfaction increases as response time shortens. Modest variations around the average response time are acceptable, but large variations may affect user behavior. A frequently asked question is: How should one set appropriate response time limits for a given system? The answer depends on how system response time affects users' performance. Adequate response time limits for a specific system and user community can be determined by measuring the impact on users' productivity and by estimating the cost of providing improved response times. Guidelines about computer response time and user behavior exist, and the behavior of human-computer interactions has been studied extensively [17, 18]. Regarding the response time of a system:
4.7.3 Specifying Cost
Estimating the cost of developing and delivering IT systems is an essential component of a performance engineering methodology. The system life cycle involves a number of design trade-offs with significant cost and performance implications. For example, the choice of thin clients versus fat clients to execute networked applications, the choice of the number of architectural layers of a system, and the choice of the number and capacity of servers in a datacenter affect both the performance and the cost of a system.
Without cost estimates, it is meaningless to discuss service level and performance objectives for a specific business. Managers need to calculate cost and estimate benefits to make good IT decisions. For example, to specify the availability goals for a credit card authorization system or a catalog sales center, one has to know the cost of a minute of downtime. Depending on the nature of the application, the average cost of one minute of downtime can be as high as $100,000 in the case of systems that support brokerage operations.
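The link between an availability goal and the cost of downtime can be made concrete with a little arithmetic. The following Python sketch is only an illustration: it assumes a 365-day year of continuous operation, the availability targets are hypothetical, and the $100,000 per minute figure is the upper-end brokerage example quoted above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a non-leap year


def expected_downtime_minutes(availability, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime implied by an availability fraction (0.0 to 1.0)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_minutes


def downtime_cost(availability, cost_per_minute, period_minutes=MINUTES_PER_YEAR):
    """Expected cost of downtime over the period at a flat cost per minute."""
    return expected_downtime_minutes(availability, period_minutes) * cost_per_minute


# Hypothetical availability goals, priced at $100,000 per minute of downtime.
for availability in (0.99, 0.999, 0.9999):
    minutes = expected_downtime_minutes(availability)
    cost = downtime_cost(availability, cost_per_minute=100_000)
    print(f"{availability:.2%} available: {minutes:,.1f} min/year down, ${cost:,.0f}/year")
```

Comparing the cost difference between two availability goals with the price of the infrastructure needed to reach the higher one gives management a basis for choosing a goal.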
The Total Cost of Ownership (TCO) model has been applied to evaluate the cost of a system. TCO is the total cost of owning a given system over some time horizon (e.g., a five-year period). TCO is intended to identify and measure elements of IT expenses beyond the initial cost of implementing a system. The most significant contributing items to TCO are:
Measuring the Return on Investment (ROI) is also critical to evaluating the costs and benefits of a system. To measure the success of an investment in information technology, two factors are considered: the improvement in the quality of service (e.g., user satisfaction and system use) and the return on investment (e.g., increased productivity and organizational impact). Thus, ROI methodologies are useful for analyzing the costs and benefits of IT projects.
Based on the requirement analysis of a system, one should develop a cost model to understand the cost/benefits of the system. When different system choices are evaluated, one needs to predict how much additional computing and communication resources will be needed for each choice and how much these elements cost now and in the future.
Consider the call center system described in the motivating example. Management is planning to replace the database server with a powerful cluster of servers. Two brands of clusters are being compared, system Y and system Z. What factors should be considered in the process of choosing between the two systems?
First, IT management is considering the TCO model, instead of the pure purchase cost of the system. Thus, the basic cost of both systems includes hardware and software costs, hardware and software maintenance, operational personnel, and facilities costs for a three-year period. System Y costs $300,000 and system Z costs $350,000.
Second, IT management is looking at the performance attributes of both systems. Standard benchmarks indicate that the throughputs of systems Y and Z are 220 tps and 230 tps, respectively, with 90% of the transactions responding in less than 0.5 sec.
Third, IT management is evaluating other aspects of computing in its analysis. In addition to performance, management also considers the dependability of the systems. Information obtained on the dependability of the two systems suggests an expected 38 hours of unavailability for system Y and 21 hours of downtime for system Z, over a period of three years.
The call center charges $5 per call from customers. The company estimates that in the next three years, the average rate of telephone calls into the call center will be 1,000/hour. Therefore, the estimated average cost per hour of downtime due to revenue loss is $5,000. The total cost of a system is calculated as:
Total Cost = Basic Cost + (Hours of Downtime × Cost per Hour of Downtime)
Using the above expression, the total cost for a three-year period for the two systems is:
Total Cost of System Y = $300,000 + (38 × $5,000) = $490,000
Total Cost of System Z = $350,000 + (21 × $5,000) = $455,000
The above methodology helps management justify selecting system Z. Although Z is more expensive initially, it is more dependable, which makes its total cost more attractive.
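The comparison can be reproduced with a short Python sketch. It assumes that the total cost of a system is its three-year basic cost plus the revenue lost during the expected hours of downtime; the function and variable names are illustrative, and the figures are those given in the example.

```python
def total_cost(basic_cost, downtime_hours, cost_per_downtime_hour):
    """Basic (TCO) cost plus revenue lost during the expected downtime."""
    return basic_cost + downtime_hours * cost_per_downtime_hour


# Revenue lost per hour of downtime: 1,000 calls/hour at $5 per call.
CALLS_PER_HOUR = 1_000
CHARGE_PER_CALL = 5
cost_per_downtime_hour = CALLS_PER_HOUR * CHARGE_PER_CALL

# (three-year basic cost in dollars, expected downtime in hours over three years)
systems = {
    "Y": (300_000, 38),
    "Z": (350_000, 21),
}

totals = {
    name: total_cost(basic, hours, cost_per_downtime_hour)
    for name, (basic, hours) in systems.items()
}
cheapest = min(totals, key=totals.get)

for name, cost in totals.items():
    print(f"System {name}: ${cost:,}")
print("Lowest total cost: system", cheapest)
```

Keeping the downtime cost as a separate parameter makes it easy to see how sensitive the decision is to the call rate: with these figures, the two systems have equal total cost when an hour of downtime costs about $2,941, and system Y is cheaper below that value.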