Quality of service (QoS) is a broad term for the means of predicting and managing the system-wide resources that are important to the runtime performance of an application. Since computing resources, such as network bandwidth, are limited, they must be managed so that they are useful and predictable to those who use them.
Consider two perpendicular streets that meet at a four-way intersection. If the amount of traffic on each street is light, the flow of traffic on both streets is fast. As traffic increases on the two streets, the flow of traffic comes to a standstill as a result of uncontrolled gridlock at the intersection. As this happens, the amount of time needed to cross the intersection is unpredictable, and the two streets become unreliable and much less useful.
We can manage these resources (in this case, the two streets) by simply installing traffic lights at the intersection. Traffic lights control the flow of traffic and eliminate uncontrolled gridlock at the intersection. Based on the durations set for the traffic lights, we can reliably predict that at least n automobiles will flow through the northbound segment of the street every m seconds. The traffic lights thus provide a quality of service guarantee for the intersection (barring any accidents and other delays after the traffic signal).
We can also implement different levels of quality of service for different travelers traversing the same intersection. For instance, carpools (automobiles with two or more occupants) can be provided a different level of quality of service. Since the timing of the traffic lights cannot be made different for different automobiles, we can assign a separate lane to carpool traffic. Through this scheme we can guarantee that at least c carpool automobiles can cross the intersection every m seconds. The development of two levels of quality of service for the northbound-to-southbound lanes is illustrated in Figure 9-1.
Figure 9-1. Architecting a predictable flow of traffic through an intersection using quality of service to determine when automobiles are allowed to go through the intersection and when they must stop.
In this example, we attained a predictable and reliable flow of traffic simply by installing a traffic light at the intersection to manage the two streets. The traffic light makes the intersection a more useful resource by giving it predictable and reliable properties.
QoS is most often associated with network resources and, in particular, network bandwidth. Mission-critical applications as well as multimedia systems vie for limited network resources to transmit packets. As more and more of these time-sensitive applications are deployed alongside standard applications, their growing bandwidth needs are outpacing available network bandwidth. The resulting congestion delays the delivery of packets traversing the network. The delay affects all applications, but has a more profound effect on mission-critical and time-sensitive applications, which must operate within time constraints and deadlines.
For many applications, the delays would be acceptable if they were predictable. For example, a movie application that requires the delivery of two packets per second works well if the network can provide such a guarantee. Suppose the network cannot accommodate two packets per second, but instead can only guarantee one packet per second. With this QoS guarantee, the movie application can still work quite well by simply buffering half of its required packets beforehand. That is, a 60-second movie needs 120 packets in total; if the application stores (buffers) 60 packets before it starts playing the movie, the other 60 arrive during playback, and the movie can still show two packets every second. The only time the movie application cannot work well is when there are no guarantees on packet delivery whatsoever. With information about the worst-case available bandwidth or packet delay, the application can configure itself to reduce its picture quality, lessen its frame rate, or increase its buffer size so that the requirements of the data consumer match what the data producer can deliver. Such a match makes efficient use of limited network bandwidth.
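The buffering arithmetic in the movie example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: packets are consumed at a constant rate, the network delivers at exactly its guaranteed rate, and the function names (prebuffer_packets, startup_delay_s) are hypothetical, not part of any real API.

```python
import math


def prebuffer_packets(duration_s, playback_rate, guaranteed_rate):
    """Packets to buffer before playback starts so that it never stalls.

    Assumes constant consumption at playback_rate (packets/second) and a
    worst-case delivery of guaranteed_rate (packets/second).
    """
    if guaranteed_rate <= 0:
        # No delivery guarantee at all: no finite buffer is known to suffice.
        raise ValueError("a positive guaranteed rate is required")
    if guaranteed_rate >= playback_rate:
        return 0  # the network keeps up on its own
    total_needed = playback_rate * duration_s
    arriving_during_playback = guaranteed_rate * duration_s
    return math.ceil(total_needed - arriving_during_playback)


def startup_delay_s(prebuffer, guaranteed_rate):
    """Worst-case seconds spent filling the buffer before playback begins."""
    return prebuffer / guaranteed_rate


# The example from the text: a 60-second movie at 2 packets/second,
# delivered over a network that guarantees 1 packet/second.
needed = prebuffer_packets(60, 2, 1)
delay = startup_delay_s(needed, 1)
```

The price of the weaker guarantee is start-up delay rather than degraded playback: under these assumptions, the viewer waits while the buffer fills, and the movie then plays at its full rate.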
Although typically associated with network resources, QoS is equally applicable to other limited computing resources. Consider a Web server that locates and returns Web pages based on requests from client devices. The Web server can only service a finite number of requests every minute. Even if the network connecting the client device to the Web server is a private network with little traffic (and large bandwidth), the Web server's time to service the request will determine the packet delay (or response time) seen by the client application.
Increasing the number of Web servers available to service requests and balancing the request load across them can improve performance. If the Web server cluster provides a predictable time in which a request is serviced, the client application can be optimized based on that information. For example, time-out variables can be set accordingly to minimize retries.
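As a rough illustration of setting a time-out from a predictable service time, the sketch below derives a client time-out from a guaranteed upper bound on service time plus an estimate of network round-trip time. The function name, the parameters, and the 20 percent safety margin are all assumptions made for this example; they do not come from any particular server or client library.

```python
def request_timeout_s(service_time_bound_s, network_rtt_s, safety_margin=0.2):
    """Client time-out derived from a guaranteed service-time bound.

    service_time_bound_s: worst-case time the server cluster promises
        to take to service a request (the QoS guarantee).
    network_rtt_s: estimated network round-trip time to the server.
    safety_margin: fractional headroom (hypothetical default of 20%).
    """
    if service_time_bound_s < 0 or network_rtt_s < 0 or safety_margin < 0:
        raise ValueError("all inputs must be non-negative")
    return (service_time_bound_s + network_rtt_s) * (1 + safety_margin)


# A cluster that guarantees service within 0.5 s, reached over a
# network with a 0.1 s round trip.
timeout = request_timeout_s(0.5, 0.1)
```

A time-out set below the guaranteed bound would trigger retries for requests that were about to succeed, while one set far above it would delay the detection of real failures; knowing the bound lets the client avoid both.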
As the over-subscription of limited computing resources continues (and in fact increases), mechanisms are needed to manage these resources so that they are available in a predictable manner to their potential users. Users can select or reject a resource based on whether its predictable properties meet the users' needs. Once a resource is selected, users can optimize or configure themselves to best match its properties, and thus use it efficiently.