IT systems touch people everywhere, and every effort must be made to ensure that they operate reliably and dependably so that they meet the needs of society and complement the capabilities of users.
This section discusses the following QoS attributes of an IT system: response time, throughput, availability, reliability, security, scalability, and extensibility.
1.2.1 Response Time
The time it takes a system to react to a human request is called the response time. An example is the time it takes for a page to appear in your browser with the results of a search of the catalog of your preferred online bookstore. The response time, usually measured in seconds, may be broken down into several components.
Figure 1.1 shows the three major components of the response time of a search request to an e-commerce site: browser time, network time, and server time. The browser time includes the processing and I/O time required to send the search request and display the result page. The network time includes the time spent in transmission from the browser to the user's Internet Service Provider (ISP), the time spent in the Internet, and the time spent in communication between the ISP at the e-commerce site and its server. The server time includes all the time involved in processing the request at the e-commerce site: processing time, I/O time, and networking time internal to the site. Each of the three components includes the time spent waiting to use various resources (processors, disks, and networks). This waiting time is called congestion time. It depends on the number of requests being processed by the system: the more requests in the system, the higher the congestion time. In this book we will learn how to compute the congestion time through the use of performance models.
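The breakdown of Figure 1.1 can be illustrated with a minimal sketch. The class name `ResponseTime` and the measurements are hypothetical, chosen only to show how the three components add up and what share of the total each one represents.

```python
from dataclasses import dataclass


@dataclass
class ResponseTime:
    """Response time of one request, in seconds, split into three components."""
    browser: float
    network: float
    server: float

    def total(self) -> float:
        # The response time seen by the user is the sum of the components.
        return self.browser + self.network + self.server

    def shares(self) -> dict:
        # Fraction of the total response time spent in each component.
        total = self.total()
        return {
            "browser": self.browser / total,
            "network": self.network / total,
            "server": self.server / total,
        }


# Hypothetical measurements for a single search request (seconds).
rt = ResponseTime(browser=0.25, network=0.5, server=1.25)
print(rt.total())
print(rt.shares())
```

A breakdown of this kind shows which component dominates and is therefore the first candidate for tuning.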
Figure 1.1. Breakdown of response time.
1.2.2 Throughput
The rate at which requests are completed by a computer system is called throughput and is measured in operations per unit time. The nature of the operation depends on the computer system in question. Examples of systems and corresponding typical throughput metrics are given in Table 1.1. When considering a throughput metric, one has to make sure that the operation in question is well defined. For example, in an Online Transaction Processing (OLTP) system, throughput is generally measured in transactions per second (tps). However, transactions may vary significantly in nature and in the amount of resources they require from the OLTP system. So, for the throughput value to be meaningful, one has to characterize the type of transaction considered when reporting the throughput. In some cases, this characterization is done by referring to a well-established industry benchmark. For example, the Transaction Processing Performance Council (TPC) defines a benchmark for OLTP systems, called TPC-C, that specifies a mix of transactions typical of an order-entry system. The throughput metric defined by the benchmark measures the number of orders that can be fully processed per minute and is expressed in tpm-C.
The throughput is a function of the load offered to a system and of the maximum capacity of a system to process work as illustrated in Example 1.1.
Example 1.1
Assume that an I/O operation at a disk in an OLTP system takes 10 msec on average. If the disk is constantly busy (i.e., its utilization is 100%), then it will be executing I/O operations continuously at a rate of one I/O operation every 10 msec, or 0.01 sec. So, the maximum throughput of the disk is 100 (= 1/0.01) I/Os per second. But if the rate at which I/O requests are submitted to the disk is less than 100 requests/sec, then its throughput will be equal to the rate at which requests are submitted. This leads to the expression

$$\text{throughput} = \min(\text{offered load},\ \text{maximum capacity})$$
This expression has to be qualified by the assumption that arriving requests do not "change their mind" if the system is busy, as happens routinely in Web sites.
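The relation in Example 1.1, in which throughput is the smaller of the offered load and the maximum capacity, can be sketched as follows. The function names are illustrative, and the sketch assumes, as the text does, that no request abandons the system when it is busy.

```python
def max_throughput(service_time_sec: float) -> float:
    """Completions per second of a resource that is busy 100% of the time."""
    return 1.0 / service_time_sec


def throughput(offered_load: float, capacity: float) -> float:
    """Throughput is bounded by both the offered load and the capacity."""
    return min(offered_load, capacity)


# Disk from Example 1.1: 10 msec (0.01 sec) per I/O operation.
disk_capacity = max_throughput(0.01)

# Offered loads below, at, and above the disk capacity (requests/sec).
for load in (40.0, 100.0, 250.0):
    print(load, throughput(load, disk_capacity))
```

Below capacity the disk completes requests as fast as they arrive; above it, throughput stays pinned at the maximum of 100 I/Os per second.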
As seen in the top curve of Fig. 1.2, throughput shows an almost linear increase at light loads and then saturates at its maximum value when one of the system resources reaches 100% utilization. However, in some cases, at high overall loads, throughput can actually decrease as the load increases further. This phenomenon is called thrashing, and its impact on throughput is depicted in the bottom curve of Fig. 1.2. An example of thrashing occurs when a computer system with insufficient main memory spends a significant share of its CPU cycles and I/O bandwidth handling page faults instead of processing the workload. This may occur because at high loads there are too many processes competing for a fixed amount of main memory. As each process gets less memory for its working set, the page fault rate increases significantly and the throughput decreases. The operating system spends more and more of its time on overhead operations caused by the increased load, which reduces the time the CPU can be allocated to processes. This increases the backlog even further, leading to a downward performance spiral that can cripple the system, in a way similar to a traffic jam.
Figure 1.2. Throughput vs. load.
An important consideration when evaluating a computer system is to determine its maximum effective throughput and how to achieve it. More on this will be discussed in Chapter 3.
1.2.3 Availability
Imagine that you access an online bookstore and get as a result the page shown in Fig. 1.3. You are likely to become frustrated and may turn to another online bookstore to buy the book you are looking for. The consequences of system unavailability can reach far beyond a loss of customers. The credibility and reputation of a company are vital. As mentioned by Schneider, service interruptions can even threaten lives and property.
Figure 1.3. Availability problems.
Availability is defined as the fraction of time that a system is up and available to its customers. For example, a system with 99.99% availability over a period of thirty days would be unavailable for

$$(1 - 0.9999) \times 30 \times 24 \times 60 = 4.32 \text{ minutes}.$$
For many systems (e.g., an online bookstore), this level of unavailability would be considered excellent. However, for other systems (e.g., defense systems, 911 services), even 99.99% would be unacceptable.
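The downtime implied by an availability figure follows directly from the definition of availability as a fraction of time. A minimal sketch, with an illustrative function name and the thirty-day period used in the example:

```python
def downtime_minutes(availability: float, period_days: float) -> float:
    """Minutes of unavailability over a period, given availability as a fraction."""
    period_minutes = period_days * 24 * 60
    return (1.0 - availability) * period_minutes


# 99.99% availability over thirty days.
print(round(downtime_minutes(0.9999, 30), 2))

# Each additional "nine" cuts the allowed downtime by a factor of ten.
for availability in (0.99, 0.999, 0.9999, 0.99999):
    print(availability, round(downtime_minutes(availability, 30), 3))
```

The same function can be used to translate a downtime budget in a service-level agreement into the availability target it implies.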
The two main reasons for systems to be unavailable are failures and overloads. Failures may prevent users from accessing a computer system. For example, the network connection of a Web site may be down and no users may be able to send their requests for information. Alternatively, overloads occur when all components are operational but the system does not have enough resources to handle the magnitude of new incoming requests. This situation usually causes requests to be rejected. For instance, a Web server may refuse to open a new TCP connection if the maximum number of connections is reached.
Failures must be handled rapidly to avoid extended down times. The first step for failure handling is failure detection. Then, the causes of the failures must be found so that the proper resources (e.g., people and materiel) may be put in place to bring the system back to its normal operational state. Thus, failure handling comprises failure detection, failure diagnosis, and failure recovery.
One of the reasons for controlling and limiting the number of requests that are handled concurrently by an IT system is to guarantee good quality of service for the requests that are admitted. This is called admission control and is illustrated in Fig. 1.4, which shows two response time curves versus system load. If no admission control is used, response time tends to grow exponentially with the load. With admission control, the number of requests within the system is limited so that response time does not exceed a certain threshold. This is accomplished at the expense of rejecting requests. Thus, while accepted requests experience an acceptable level of service, the rejected ones may suffer very large delays before being admitted.
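The idea of admission control can be sketched with a counter that caps the number of requests inside the system. The class `AdmissionController` and its method names are hypothetical, and the sketch is single-threaded; a real server would need locking and a policy for what rejected requests do next.

```python
class AdmissionController:
    """Caps the number of requests inside the system at any one time."""

    def __init__(self, max_in_system: int) -> None:
        self.max_in_system = max_in_system
        self.in_system = 0   # requests currently admitted and not yet completed
        self.rejected = 0    # requests turned away so far

    def try_admit(self) -> bool:
        """Admit the request if there is room; otherwise reject it."""
        if self.in_system >= self.max_in_system:
            self.rejected += 1
            return False
        self.in_system += 1
        return True

    def complete(self) -> None:
        """Signal that an admitted request has finished."""
        if self.in_system > 0:
            self.in_system -= 1


controller = AdmissionController(max_in_system=2)
results = [controller.try_admit() for _ in range(3)]  # third request finds the system full
controller.complete()                                 # one request finishes
results.append(controller.try_admit())                # so a new one can be admitted
print(results, controller.rejected)
```

The limit `max_in_system` would be chosen so that the response time of admitted requests stays below the desired threshold.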
Figure 1.4. Impact of admission control on response time.
1.2.4 Reliability
The reliability of a system is the probability that it functions properly and continuously over a fixed period of time. Reliability and availability are closely related but distinct concepts. When the time period over which the reliability is computed becomes very large, the reliability tends to the availability.
1.2.5 Security
Security is a combination of three basic attributes:
To enforce these properties, systems need to implement authentication mechanisms that assure each side in a message exchange that the other side is indeed who it claims to be. Most authentication mechanisms used to provide system security are based on one or more forms of encryption. Some encryption operations may be computationally very expensive. The tradeoffs between security and performance have been studied in [6, 7, 9, 14].
1.2.6 Scalability
A system is said to be scalable if its performance does not degrade significantly as the number of users or, equivalently, the load on the system increases. For example, the response time of system A in Fig. 1.5 increases in a non-linear fashion with the load, while that of system B exhibits a much more controlled growth. System A is not scalable, while system B is.
Figure 1.5. Scalability.
1.2.7 Extensibility
Extensibility is the ability of a system to evolve easily to cope with new functional and performance requirements. It is not uncommon for new functionality to be required once a new system goes into production. Even a careful requirements analysis cannot necessarily uncover or anticipate all the needs of system users.
Changes in the environment in which the system has to operate (e.g., new laws and regulations, different business models) may require that the system evolve to adapt to new circumstances.