9.2 Queueing

Computer applications are all about requesters that demand things and providers that supply them. Oracle performance analysis is all about the relationships between the suppliers and demanders, especially when competition for shared resources gets intense.

Queueing is what happens when a requester demands service from a resource that happens to be busy serving another request. Most of us queue every day for busy resources. It's simple: you wait until it's finally your turn. Different cultures engage different queue disciplines. Many cultures engage the egalitarian discipline of granting service in the same order as the arrivals occurred, which is called a first-come, first-served discipline. Other cultures engage a priority discipline in which, for example, the status of the request affects the order of service. Examples of other queue disciplines include: insiders-first, royalty-first, sharpest-elbows-first, and meekest-last.
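The two disciplines described above can be sketched with standard Python containers (the request names and priority values here are invented for illustration): a deque models first-come, first-served, while a heap models a priority discipline.

```python
from collections import deque
import heapq

# First-come, first-served: requests leave in the order they arrived.
fifo = deque()
for request in ["A", "B", "C"]:
    fifo.append(request)
served_fifo = [fifo.popleft() for _ in range(len(fifo))]

# Priority discipline: lower number = higher status; served first
# regardless of arrival order.
prio = []
for priority, request in [(3, "A"), (1, "B"), (2, "C")]:
    heapq.heappush(prio, (priority, request))
served_prio = [heapq.heappop(prio)[1] for _ in range(len(prio))]

print(served_fifo)  # ['A', 'B', 'C']
print(served_prio)  # ['B', 'C', 'A']
```

Note that both disciplines serve every request eventually; they differ only in who waits longest.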

"Queueing" Versus "Queuing"

All queueing theorists who write about queueing must decide about whether to spell the word with two occurrences of the letter "e" or just one. My word processor's spell-check tool originally informed me (as did many dictionaries) that "queueing" is supposed to be spelled without the extra "e", as queuing. However, the accepted standard in the field of queueing theory (see http://www2.uwindsor.ca/~hlynka/qfaq.html for details) is to spell the word as "queueing." Happily, this spelling is a prescribed alternate in both the Oxford English Dictionary and the Oxford American Dictionary.

After you've "waited your turn," then you receive the service you asked for, which of course takes a bit more time. Then you get out of the way, and the person that was behind you in the queue gets service for his request. People queue for things like dinner tables, tellers, ticket agents, elevators, freeways, and software support hotlines. Computer programs queue for resources like CPU, memory, disk I/O, network I/O, locks, and latches.

9.2.1 Queueing Economics

Queueing of course gives you the distinct feeling that you're wasting your time. One way to reduce your queueing time is for your service provider to upgrade the quality or number of the resources that you're using. With faster resources or more resources, or both, your time in the queue would decrease. But, of course, the people providing your service would typically pass the cost of those improved resources on to you through higher prices for their service. In the end, it's your decision where to set the economic tradeoff between faster service and cheaper service.

We optimize to economic and response time constraints every day of our lives. For example, many of us pay thousands of dollars to own an automobile. Although bicycles are much cheaper to own and operate than automobiles, we buy cars in part because they are so much faster, thus providing significantly better response times for our travels. (We Americans are of course famously prone to using automobiles even in circumstances in which using a bicycle would be not only cheaper, but actually faster.)

Once we own a car, we find that further optimizations are necessary. For routine errands, a car that goes 200 mph (about 325 km/h) is no more time-efficient than a car with a 60-mph top speed (about 100 km/h), because traffic laws and safety concerns constrain your velocity more than your car's performance limitations do. Consequently, even people with fast cars plan their errands so they won't have to compete with rush-hour traffic. Some of our optimization tactics reduce service time. Some of our optimization tactics reduce queueing delay. The best win-win for you and your users occurs when you can convert a minimal investment into a reduction in service time, queueing delay, or both.

9.2.2 Queueing Visualized

In Chapter 1 I explained that a sequence diagram is a convenient way to denote how a user action consumes time as it visits different layers in a technology stack. Figure 9-1 shows a sequence diagram for a system with one CPU and one disk. A user making a request of the system motivates the consumption of CPU and disk capacity, arranged in time as shown in the drawing. In a sequence diagram, each line represents the capacity of a resource through time. Each shaded block on a resource's timeline represents a request's use of that resource. The length of each shaded block is proportional to the amount of time consumed at the resource. Portions of a timeline that do not contain a shaded block represent idle time. Read a sequence diagram from top to bottom. A left-to-right arrow represents demand for service, and a right-to-left arrow represents supply. Response time is the duration that elapses from the initiation of a request until fulfillment of the request.

Figure 9-1. A sequence diagram is a convenient way to denote how an operation consumes time
figs/oop_0901.gif

In Figure 9-1, an application user makes a request for service of a CPU. In this drawing, the CPU is unoccupied at the time it receives the request, so CPU service begins immediately upon receipt. As the request consumes CPU time, the system computes that a disk request must be fulfilled in order to continue. The CPU issues a service demand of the disk. The disk is unoccupied at the time it receives the request, and so the disk service begins immediately upon receipt. Upon completion of the disk service request, the disk returns the desired result back to the CPU, which continues as before. After the third CPU request, the original user demand has been satisfied, and the CPU supplies the result back to the user.

The response time from the user's perspective is the time that elapses from the user's request until fulfillment of that request. Note that Figure 9-1 depicts several other response times as well. For example, the length of the first (i.e., topmost) shaded bar on the Disk timeline is the response time, from the CPU's perspective, of the first disk I/O call.
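On an unloaded system like the one in Figure 9-1, every response time in the diagram is simply the sum of the durations of the shaded blocks it spans, because no request ever waits. A minimal sketch of that accounting (the segment durations below are hypothetical, not taken from the figure):

```python
# Each (resource, duration) pair is one shaded block on a sequence
# diagram timeline, in top-to-bottom order. Durations are hypothetical
# seconds for a CPU-disk-CPU-disk-CPU visit pattern.
segments = [
    ("CPU", 0.010),
    ("Disk", 0.025),
    ("CPU", 0.005),
    ("Disk", 0.020),
    ("CPU", 0.008),
]

# With no queueing, the user's response time is just the sum of the
# individual service times.
response_time = sum(duration for _, duration in segments)
print(f"{response_time:.3f} s")  # 0.068 s

# From the CPU's perspective, the response time of the first disk call
# is the duration of the first Disk block alone.
first_disk_response = next(d for r, d in segments if r == "Disk")
print(f"{first_disk_response:.3f} s")  # 0.025 s
```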

The sequence diagram is an especially useful tool for understanding the impact of competition for shared resources on a multitasking system. For example, Figure 9-2 shows why response time can degrade if we add workload onto the system shown in Figure 9-1. Requests for CPU service are fulfilled without delay on the system in its unloaded state (case a).

Figure 9-2. Executing only one application function on an unloaded system leaves idle CPU capacity (case a). The presence of other workload (case b) results in fewer wasted CPU cycles, but at the expense of degraded response time for our original application function
figs/oop_0902.gif

When we add load to the system (case b), some of our requests for CPU service must wait because the CPU is busy servicing other workload at the time of the request. Figure 9-2 shows two such queueing delays. The second request for CPU service after control returns from the disk must wait because the CPU is already occupied with the lighter-shaded workload element. And the third request for CPU service waits again for the same reason. The amount of total response time degradation from the system in its unloaded state (case a) to the system in its loaded state (case b) is precisely the total duration that our service requests have spent queued for a busy resource.
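The bookkeeping behind that last sentence can be reproduced with a tiny single-server simulation: each request begins service at the later of its arrival time and the moment the resource frees up, so total response time degradation equals total time spent queued. All arrival and service times below are made up for illustration.

```python
def simulate_fifo(arrivals, service_times):
    """One resource, first-come first-served.
    Returns per-request (queueing delay, response time) pairs."""
    free_at = 0.0
    results = []
    for arrive, service in zip(arrivals, service_times):
        start = max(arrive, free_at)   # queue if the resource is busy
        wait = start - arrive          # queueing delay
        free_at = start + service
        results.append((wait, wait + service))
    return results

arrivals = [0.0, 1.0, 2.0]   # hypothetical arrival times (seconds)
services = [2.0, 2.0, 2.0]   # hypothetical service times (seconds)

results = simulate_fifo(arrivals, services)
total_wait = sum(w for w, _ in results)
total_response = sum(r for _, r in results)
unloaded_response = sum(services)  # response time with no competition

# Degradation relative to the unloaded case is exactly the total
# queueing delay.
print(total_response - unloaded_response == total_wait)  # True
```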

How much response time degradation can we expect to incur as we add load to a system? The tool that is designed to answer this important question and many others is called queueing theory.
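As a small preview of where that theory leads (the formula below is the standard single-server M/M/1 result, not something derived in this section): for a resource with mean service time S and utilization ρ, the expected response time is R = S / (1 − ρ), which shows response time exploding as a resource nears saturation.

```python
def mm1_response_time(service_time, utilization):
    """Expected response time for an M/M/1 queue: R = S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)

S = 0.010  # hypothetical 10 ms of service time per request
for rho in (0.5, 0.8, 0.9, 0.95):
    R = mm1_response_time(S, rho)
    print(f"rho={rho:.2f}  R={R * 1000:.1f} ms")
```

Doubling utilization from 0.5 to near 1.0 does far more than double response time: at ρ = 0.95, a 10-ms service takes an expected 200 ms to complete.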

Optimizing Oracle Performance
ISBN: 059600527X
Year: 2002
Pages: 102