
9.3 Queueing Theory

Queueing theory is a branch of mathematics dedicated to explaining the behavior of queueing systems. The sequence diagram demonstrates a fundamental relationship of queueing theory:

R = S + W

Response time equals service time plus queueing delay. Service time is the amount of time spent actually consuming a requested resource, and queueing delay is the duration that a request spends waiting in a queue.
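To make the relationship concrete, here is a tiny numeric sketch (the numbers are invented purely for illustration): if a request's service time is 10 ms and it spends 40 ms waiting in a queue, its response time is 50 ms.

```python
# Invented numbers, purely for illustration of R = S + W.
S = 0.010  # service time: 10 ms actually consuming the resource
W = 0.040  # queueing delay: 40 ms waiting in line for the resource
R = S + W  # response time: the duration the user experiences
print(R)   # approximately 0.05 seconds
```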

Figure 9-3 shows the R = S + W relationship in a graph. Response time on the vertical axis responds to changes in system utilization on the horizontal axis. As you have already seen in the sequence diagram example, service time remains constant for all system load levels. However, queueing delay degrades (that is, increases) exponentially in response to increases in workload. Adding the varying queueing delay to the constant service time for each possible system utilization value produces the famous response time curve that is shaped like an ice hockey stick (Figure 9-4).

Figure 9-3. The fundamental relationship of queueing theory: R = S + W. Service time (S) remains constant at all load levels; however, response time (R) degrades under high loads because queueing delay (the distance from S to R) degrades exponentially as utilization increases
figs/oop_0903.gif
Figure 9-4. A hockey stick
figs/oop_0904.gif

9.3.1 Model Input and Output Values

Queueing theory provides enormous value to the performance analyst by allowing us to predict system response times in hypothetical situations. Sensibly applied queueing models can reveal future performance problems without incurring the cost of actually trying different system configurations. For example, if a CPU upgrade is destined not to improve your performance, it's a lot cheaper to figure that out in Excel than to learn it by actually implementing and testing a hardware upgrade.

But perhaps even more important is how queueing theory structures our thinking about response time. It highlights the very important distinction between time spent working and time spent waiting. The competent use of queueing theory forces us to understand the interrelationships and sensitivities of various performance optimization parameters. It forces us to see more clearly what is relevant and what is not.

You have already seen the fundamental result of queueing theory: response time equals service time plus queueing delay, or R = S + W. You have seen that when response time degrades as loads increase, the degradation is due to changes in W, not changes in S. The following sections explain the meanings of the formulas that will enable you to predict the performance characteristics of a specified system configuration, whether that configuration exists yet or not. (The entire list of queueing theory formulas used in this book is printed in Appendix C.) I'll begin by explaining the input parameters that drive these formulas.

9.3.1.1 Arrivals and completions

Most of the things you need to know about queueing theory are very simple to understand. You can think of a queueing system as a black box that takes input, processes it, and produces output that presumably represents some improvement upon the input. The number of arrivals that come into the system is denoted as A. The number of completed requests that exit the system is denoted as C. For any stable queueing system, A = C. That is, in a stable system, everything that goes into the box comes out, and nothing comes out of the box that didn't go in. If we consider a specific time period called T, then the ratio λ = A/T (λ is the Greek letter lambda) yields the system's arrival rate, and X = C/T yields the system's completion rate (or throughput). The mean interarrival time τ = T/A = 1/λ (τ is the Greek letter tau) is the average duration between adjacent arrivals into a system. Figure 9-5 shows the relationships among A, T, λ, and τ.
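These definitions are simple ratios, so they translate directly into code. A minimal Python sketch (the observation counts are invented for illustration):

```python
# Invented observations over a T-second measurement interval.
A = 1000        # arrivals into the system
T = 2.0         # length of the observation interval, in seconds
C = A           # stable system: completions equal arrivals
lam = A / T     # arrival rate, lambda: 500 requests per second
X = C / T       # completion rate (throughput): 500 requests per second
tau = T / A     # mean interarrival time, tau: 0.002 seconds = 1/lambda
print(lam, X, tau)
```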

Figure 9-5. This two-second interval illustrates the relationships among the fundamental parameters A, T, λ, and τ, which describe a queueing system's arrivals
figs/oop_0905.gif
9.3.1.2 Service channels, utilization, and stability

Inside a queueing system (the "black box" that I mentioned before) there can be one or more service channels. Each service channel operates independently of other service channels to provide service in response to requests arriving at a given queue. For example, a symmetric multiprocessing (SMP) computer system with 8 CPUs servicing a single CPU run queue can be modeled as a single system with eight parallel service channels. The number of parallel service channels inside the system is denoted as m, c, or s in various texts. In this text, I have chosen to use m as in [Kleinrock (1975)].

The Motive for the Greek Letters

One aspect of queueing theory that makes the field inaccessible to a lot of Oracle performance analysts is that the formulas look really difficult. It doesn't help that a lot of the basic concepts are expressed in Greek letters. Many colleagues asked me during the preparation of this text to consider converting each Greek letter used in this chapter to something chosen from the Latin alphabet. (If I'm not mistaken, the event at which they pleaded with me to do this is called an "intervention.")

I've chosen to remain faithful to the notation used in the well-established literature of queueing theory. I believe that creating a "new" notation for this text would be pretentious of me, and in the long term it wouldn't help you either. Assuming that I could have converted all the Greek symbols to something "more comfortable" without introducing new errors in the notation, I would have left you with a notation that wouldn't match anything else you'd ever study. The formulas would still be ugly, even "in English"; trust me. If I had converted Greek letters to Latin letters, I would have only penalized readers who actually wanted to study queueing theory by making them learn the technology in two different languages.

The Greek letter problem really isn't that bad anyway. In this chapter, you'll encounter only the Greek letters λ (lambda), μ (mu), ρ (rho), τ (tau), θ (theta), and Σ (which is the letter sigma, but which should be read as "sum of"). Other chapters of course include the letter Δ (delta) and maybe others. A complete list of Greek symbols and their English names is included in Appendix A.

The total amount of time that service channels inside the system spend actually fulfilling service requests is denoted as B, for busy time. Total system utilization for a given time interval is the proportion of the interval that the system was busy, U = B/T. If you have more than one service channel inside your system (if m > 1), then U can be greater than 1.0. Most people find this statistic disturbing until they normalize it to produce a per-channel average utilization. The mean utilization per channel is then ρ = U/m (ρ is the Greek letter rho). The quantity ρ is also called a system's traffic intensity. Notice that having a mean utilization per channel of ρ does not mean that every channel has a mean utilization of ρ. For example, an eight-channel system can achieve ρ = 0.5 anywhere between the extremes of four channels running at 100% while the other four run at 0%, and all eight channels running at exactly 50%.
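A sketch of the utilization arithmetic, with invented measurements:

```python
# Invented measurements for an m-channel system.
m = 8        # parallel service channels
T = 60.0     # observation interval, in seconds
B = 240.0    # busy time summed across all channels, in seconds
U = B / T    # total utilization: 4.0, which legitimately exceeds 1.0
rho = U / m  # mean utilization per channel (traffic intensity): 0.5
print(U, rho)
```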

A queueing system is said to be stable if and only if its per-server utilization is in the range 0 ≤ ρ < 1. As you drive arrival rates higher and higher, you can reach a point at which the system cannot keep up unless it operates at a utilization of ρ ≥ 1. However, it is impossible to drive a system at ρ ≥ 1 in reality.

9.3.1.3 Service time and service rate

We have already discussed the service time for a system. More formally, a system's expected service time is the average amount of time that a service channel spends busy per completion, or S = B/C. Computing service time is usually easy, because on most systems it is easy to measure busy time and completion counts. Sometimes it is more convenient to discuss the service rate instead of the service time. The service rate is the number of requests that a single service channel can complete per time unit, μ = C/B (μ is the Greek letter mu), or equivalently, μ = 1/S.
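The same style of sketch for service time and service rate (the measurements are invented):

```python
# Invented measurements for a single service channel.
B = 240.0   # busy time, in seconds
C = 480     # completions during the interval
S = B / C   # expected service time: 0.5 seconds per completion
mu = C / B  # service rate: 2.0 completions per second
assert abs(mu - 1 / S) < 1e-12  # mu = 1/S by definition
print(S, mu)
```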

9.3.1.4 Queueing delay and response time

As you have seen, a system's expected queueing delay ( W ) is simply the amount of time that you should expect to elapse between a request's arrival into the system and the time at which service begins. For your request, queueing delay is thus the sum of the expected service times of the requests that arrived at the queue ahead of you. Predicting queueing delay is one of the wondrous rewards of queueing theory. Queueing delay depends not only upon the average service time at the device in which you're interested, it also depends upon the number of people that are expected to be waiting when you get there.

The formula for predicting queueing delay is difficult to understand without a good bit of study, but fortunately the hard work of constructing the formula has been done for us. For an M/M/m queueing system (the exact definition of which appears later in this chapter), the answer is:

figs/eq_0901.gif

where:

figs/eq_0902.gif

The hardest part was the formula for C(m, ρ), which was completed in 1917 by a Danish mathematician named Agner Erlang [Erlang (1917)]. The Erlang C formula produces the probability that an arriving request will queue for service.

Don't confuse the C used to denote the Erlang C formula with the C that denotes the number of completions in a system. It is usually easy to tell the two C's apart by context, because the Erlang C formula is always depicted as a function with two arguments.

Programming Erlang C requires a bit more mathematical sophistication than most of us bring to the game. However, in 1974, a research scientist named David Jagerman developed a fast algorithm for computing Erlang C [Jagerman (1974)] that makes it easy to calculate a system's expected queueing delay. I have used Jagerman's algorithm in Example 9-1.

Example 9-1. This Visual Basic code uses Jagerman's algorithm to compute Erlang C
Function ErlangC(m, Rho) As Double
' Erlang's C formula, adapted from [Gunther (1998), 65]
    Dim i As Integer
    Dim traffic, ErlangB, eb As Double
    ' Jagerman's algorithm
    traffic = Rho * m
    ErlangB = traffic / (1 + traffic)
    For i = 2 To m
        eb = ErlangB
        ErlangB = eb * traffic / (i + eb * traffic)
    Next i
    ErlangC = ErlangB / (1 - Rho + Rho * ErlangB)
End Function

Once you know a system's expected service time and its expected queueing delay, computing the expected response time is trivial, as we have already explored. A system's expected response time is simply its expected service time plus its expected queueing delay, R = S + W.
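If you would rather experiment outside Excel, the following Python sketch (my own port; the function names are not from this book) combines the Jagerman recurrence of Example 9-1 with the standard M/M/m queueing delay formula W = C(m, ρ)/(mμ − λ) to produce R = S + W:

```python
def erlang_c(m, rho):
    """Probability that an arriving request must queue (Erlang C),
    computed with Jagerman's fast algorithm via Erlang B."""
    traffic = rho * m
    erlang_b = traffic / (1 + traffic)
    for i in range(2, m + 1):
        erlang_b = erlang_b * traffic / (i + erlang_b * traffic)
    return erlang_b / (1 - rho + rho * erlang_b)

def response_time(m, mu, lam):
    """Expected response time R = S + W for an M/M/m system."""
    rho = lam / (m * mu)
    if rho >= 1:
        raise ValueError("unstable system: rho >= 1")
    S = 1 / mu                             # expected service time
    W = erlang_c(m, rho) / (m * mu - lam)  # expected queueing delay
    return S + W

# For M/M/1, Erlang C reduces to rho, and R = 1/(mu - lambda).
print(erlang_c(1, 0.5))         # 0.5
print(response_time(1, 10, 5))  # 0.2
```

For a quick sanity check, C(2, 0.5) = 1/3, the well-known probability of queueing for a two-server system at 50% per-channel utilization.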

Figure 9-6 depicts an m-channel queueing system. Requests arrive at an average rate of λ requests per time unit. The duration between successive request arrivals is the interarrival time τ. Each of m parallel channels completes service requests at an average service rate of μ requests per time unit, consuming an average service time of S time units per request.

Figure 9-6. This drawing of a multi-channel queueing system illustrates the fundamental relationships among several queueing theory parameters (adapted from [Jain (1991) 511])
figs/oop_0906.gif
9.3.1.5 Maximum effective throughput

The maximum effective throughput of a system is the largest arrival rate that we can ask that system to process without exceeding a user's response time tolerance rmax. The maximum effective throughput for the system shown in Figure 9-7 is the quantity λmax. As the rate of arrivals into the system increases, the average queueing delay increases, and system response time degrades. The throughput value at which the system's response time degrades beyond the users' threshold for response time degradation is the system's maximum effective throughput, λmax. It's the most work you can ask of a system without driving average response times too high.

Figure 9-7. Response time is a function of arrival rate. Specifying an average response time tolerance rmax determines the location of λmax, which is the largest completion rate value that this system can sustain without driving the average response time above the users' tolerance
figs/oop_0907.gif

To maintain satisfactory response times, you have to keep your system's arrival rate less than λmax, so of course it is important that you know how to compute its value. There is no closed-form solution for computing the value of λmax, but estimating the value using interval bisection is fast and straightforward. Example 9-2 provides Visual Basic code to perform the computation.

Example 9-2 refers to objects such as the variable q and the function ResponseTime that are defined in the Microsoft Excel workbook provided on the catalog page for this book: http://www.oreilly.com/catalog/optoraclep.

Example 9-2. This Visual Basic code uses interval bisection to compute maximum effective throughput
Function LambdaMax(Rmax, q, m, mu) As Double
' Maximum effective throughput of queueing system
' ASSUMPTION: ResponseTime( ) is a monotonically increasing continuous
' function
    Const error = 0.005     ' interval bisection halts when
                            ' abs(lambda1-lambda0) <= error*lambda0
    Dim lambda0 As Double   ' lambda value for which R < Rmax
    Dim lambda1 As Double   ' lambda value for which R >= Rmax
    Dim lambdaM As Double   ' arithmetic mean of {lambda0, lambda1}
    ' Seek an interval [lambda0, lambda1] for which R(lambda0)<Rmax and
    ' R(lambda1)>=Rmax
    lambda0 = 0
    lambda1 = 1
    While ResponseTime(m, mu, Rho(lambda1 / q, m, mu)) < Rmax
        lambda0 = lambda1
        lambda1 = 2 * lambda1
    Wend
    ' Narrow the interval by iterative bisection
    While Abs(lambda1 - lambda0) > error * lambda0
        lambdaM = (lambda0 + lambda1) / 2
        If ResponseTime(m, mu, Rho(lambdaM / q, m, mu)) < Rmax Then
            lambda0 = lambdaM
        Else
            lambda1 = lambdaM
        End If
    Wend
    LambdaMax = (lambda0 + lambda1) / 2
End Function
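A simplified Python port of the same bracketing-and-bisection idea (assumptions: I omit the workbook's q parameter and Rho() helper, compute ρ = λ/(mμ) directly, and reuse the Erlang C recurrence so the sketch is self-contained):

```python
def erlang_c(m, rho):
    """Jagerman's fast algorithm for the Erlang C probability."""
    traffic = rho * m
    b = traffic / (1 + traffic)
    for i in range(2, m + 1):
        b = b * traffic / (i + b * traffic)
    return b / (1 - rho + rho * b)

def response_time(m, mu, lam):
    """Expected M/M/m response time R = S + W."""
    rho = lam / (m * mu)
    if rho >= 1:
        return float("inf")  # unstable: R grows without bound
    return 1 / mu + erlang_c(m, rho) / (m * mu - lam)

def lambda_max(r_max, m, mu, tol=1e-6):
    """Largest arrival rate whose average response time stays <= r_max,
    found by doubling to bracket the answer, then interval bisection."""
    lo, hi = 0.0, 1.0
    while response_time(m, mu, hi) < r_max:  # find an upper bracket
        lo, hi = hi, 2 * hi
    while hi - lo > tol:                     # bisect the bracket
        mid = (lo + hi) / 2
        if response_time(m, mu, mid) < r_max:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# M/M/1 with mu = 10: R = 1/(10 - lambda), so R = 0.2 exactly at lambda = 5.
print(round(lambda_max(0.2, 1, 10), 3))  # 5.0
```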
9.3.1.6 Cumulative distribution function (CDF) of response time

There is a problem with using maximum effective throughput as a performance measure, however. The problem is that users don't usually have a tolerance in mind for average response time; what they really have in mind is a tolerance for worst-case response time. The word "tolerance" connotes attention to some maximum value more often than it connotes attention to an average. For example, imagine the following dialog:

User : My order entry form is too slow. We agreed that my response time tolerance for the commit at the end of the form would be 1.5 seconds, but my response time is less than 1.5 seconds in only about 63.2% of cases. [1]

System manager : Actually, that's exactly what we agreed. We agreed only that your average response time for your order entry form would not exceed 1.5 seconds, and you have just admitted that we are meeting that goal.

[1] I selected the number 63.2% carefully for this example. It happens to be the CDF of the exponential distribution at its mean. By constructing my example dialog this way, the user is observing the behavior of a system whose mean response time is roughly 1.5 seconds.

The information provider has abided by the letter of the service level agreement, but the system is not meeting the spirit of the user's wishes. The response time goal that the users really wish they had specified is probably something like this:

The commit at the end of the order entry form must complete in 1.5 seconds or less, for at least 95 out of every 100 user executions.

Or, to generalize:

Function f must complete in r seconds or less in at least p percent of executions of the function.

Functionally, a statement of this form is a very useful basis for a contract between an information consumer and the information provider. The statement reminds the provider that users are intolerant of response times for the function that are in excess of a specific value. The statement reminds the consumer that it is impossible to guarantee that no user will ever be dissatisfied with the performance of a given function, but that the provider has committed to limit the disappointment to some negotiated percentage of executions.

As you might expect, queueing theory gives us the mathematical means to compute the things we need in order to make service level agreements in this useful format. One final queueing theory formula gives the means to compute the minimum level of investment in system resources that is required to satisfy the system's performance requirements within its owner's economic constraints. The cumulative distribution function (CDF) of response time allows us to compute the probability P(R ≤ r), which is the likelihood that a given request will be fulfilled with total response time less than or equal to some response time tolerance r. This quantity is perhaps the most useful statistic emitted from a queueing model, because it is a direct measure of user satisfaction with response time.

The formula for the CDF of response time is complicated. For the particular queueing model (called M/M/m) that I shall describe later in this chapter, the formula is the following [Gross and Harris (1998) 72-73]:

figs/eq_0904.gif

where W q (0) is:

figs/eq_0905.gif

and p is:

figs/eq_0906.gif

Example 9-3 shows how you can accomplish all this in plain old Visual Basic.

Example 9-3. This Visual Basic code computes the cumulative distribution (CDF) of response time of an M/M/m queueing system
Function p0(m, Rho) As Double
' Compute P(zero jobs in system) [Jain (1991), 528]
    Dim i, n As Integer
    Dim t, term2, term3 As Double
    term2 = 1 / (1 - Rho)
    For i = 1 To m
        term2 = term2 * (m * Rho) / i
    Next i
    term3 = 0
    For n = 1 To m - 1
        t = 1
        For i = 1 To n
            t = t * (m * Rho) / i
        Next i
        term3 = term3 + t
    Next n
    p0 = (1 + term2 + term3) ^ (-1)
End Function

Function Wq0(m, Rho) As Double
' Compute Wq(0) [Gross & Harris (1998) 72]
' Note that r = m*rho, c = m in G&H's notation
    Dim i As Integer
    Dim f As Double
    ' (r^c)/(c!) factor
    f = 1
    For i = 1 To m
        f = f * (m * Rho) / i
    Next i
    Wq0 = 1 - f * p0(m, Rho) / (1 - Rho)
End Function

Function CDFr(r, m, mu, Rho) As Double
' CDF of the response time. This wrapper function is necessary because
' the formula in [Gross & Harris (1998), 73] contains a singularity.
    Const epsilon As Double = 0.000000001
    If (Abs(m * (1 - Rho) - 1) < epsilon) Then
        CDFr = (CDFr2(r, m, mu, Rho - epsilon) + CDFr2(r, m, mu, Rho + epsilon)) / 2
    Else
        CDFr = CDFr2(r, m, mu, Rho)
    End If
End Function

Function CDFr2(r, m, mu, Rho) As Double
' CDF of the response time, adapted from [Gross & Harris (1998), 73]
' Note that r = m*rho, c=m, lambda=rho*m*mu, t=r in G&H's notation
    Dim w As Double             ' Wq(0) value
    Dim cdf1, cdf2 As Double    ' complicated terms of CDF formula
    If (Rho >= 1 Or r <= 0) Then
        CDFr2 = 0
        Exit Function
    End If
    w = Wq0(m, Rho)
    cdf1 = (m * (1 - Rho) - w) / (m * (1 - Rho) - 1) * (1 - Exp(-mu * r))
    cdf2 = (1 - w) / (m * (1 - Rho) - 1) * (1 - Exp(-(m * mu - Rho * m * mu) * r))
    CDFr2 = cdf1 - cdf2
End Function
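A Python translation of Example 9-3 may be easier to experiment with interactively; this is my own port of the same formulas, not code from the book:

```python
import math

def p0(m, rho):
    """P(zero jobs in an M/M/m system) [Jain (1991), 528]."""
    term2 = 1 / (1 - rho)
    for i in range(1, m + 1):
        term2 *= (m * rho) / i
    term3 = 0.0
    for n in range(1, m):
        t = 1.0
        for i in range(1, n + 1):
            t *= (m * rho) / i
        term3 += t
    return 1 / (1 + term2 + term3)

def wq0(m, rho):
    """Wq(0): probability an arrival need not queue [Gross & Harris (1998) 72]."""
    f = 1.0
    for i in range(1, m + 1):
        f *= (m * rho) / i
    return 1 - f * p0(m, rho) / (1 - rho)

def cdf_r(r, m, mu, rho, eps=1e-9):
    """P(R <= r) for an M/M/m system, averaging around the singularity
    at m*(1 - rho) = 1 [Gross & Harris (1998), 73]."""
    if abs(m * (1 - rho) - 1) < eps:
        return (cdf_r2(r, m, mu, rho - eps) + cdf_r2(r, m, mu, rho + eps)) / 2
    return cdf_r2(r, m, mu, rho)

def cdf_r2(r, m, mu, rho):
    if rho >= 1 or r <= 0:
        return 0.0
    w = wq0(m, rho)
    d = m * (1 - rho) - 1
    cdf1 = (m * (1 - rho) - w) / d * (1 - math.exp(-mu * r))
    cdf2 = (1 - w) / d * (1 - math.exp(-(m * mu - rho * m * mu) * r))
    return cdf1 - cdf2

# For M/M/1 the CDF reduces to 1 - exp(-mu*(1 - rho)*r).
print(cdf_r(0.2, 1, 10, 0.5))  # ~0.632, which is 1 - e**-1
```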

How I Figured Out That Jain's CDF Formula Is Incorrect

The Jain formula for the CDF of response time [Jain (1991), 528, 531] has bothered me for a long time. The formula contains the expression:

figs/eq_0903.gif

Without even understanding what this formula means, the presence of the two identical terms −e^(−μr) is the thing that has worried me. Why would the author or his copyeditor not catch the two identical terms and combine them into the single term −2e^(−μr)? The most likely explanation, I figured, was that the formula suffered from a typographical error of some sort.

So I created a test. I began by choosing an arbitrary integer value for m. Using Mathematica, I generated a random service time s1 from an exponential distribution with some arbitrarily chosen mean 1/μ. Next, I generated a random interarrival time t1 from an exponential distribution with an arbitrarily chosen mean 1/λ. Using the R = S + W formula described earlier, I computed the expected response time of the system whose service time and interarrival time were the two random numbers I had generated. Using the input values m, μ = 1/s1, and λ = 1/t1, the calculation of R was easy.

I repeated this test several million times (for several million random service time si values chosen from the exponential distribution with mean 1/μ, and the same number of random interarrival time ti values chosen from the exponential distribution with mean 1/λ). I stored the results. Then, for some arbitrarily chosen response time value r, I simply counted the number of response times generated by my test that were less than r. The proportion of response time values that were less than r should have approximately matched the value of Jain's CDF. (This is the definition of how the CDF should behave.)
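The spirit of this test is easy to reproduce without Mathematica. The sketch below is my own construction, not the author's original script: it simulates an M/M/1 queue with the Lindley recursion and compares the empirical fraction of response times at or below r against the analytical CDF, which for M/M/1 reduces to 1 − e^(−(μ−λ)r):

```python
import math
import random

def empirical_cdf(lam, mu, r, n=100_000, seed=1):
    """Fraction of simulated M/M/1 response times <= r, using the
    Lindley recursion: W(k+1) = max(0, W(k) + S(k) - T(k+1))."""
    rng = random.Random(seed)
    w = 0.0   # queueing delay of the current customer
    hits = 0
    for _ in range(n):
        s = rng.expovariate(mu)          # random service time
        if w + s <= r:                   # response time R = W + S
            hits += 1
        # queueing delay of the next arriving customer
        w = max(0.0, w + s - rng.expovariate(lam))
    return hits / n

lam, mu, r = 1.0, 2.0, 1.0
theory = 1 - math.exp(-(mu - lam) * r)    # M/M/1 response-time CDF
print(empirical_cdf(lam, mu, r), theory)  # the two should agree closely
```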

But Jain's formula consistently failed to match the results of my test. By contrast, the Gross and Harris CDF formula [Gross and Harris (1998) 72-73] consistently succeeded in matching the result of my test. I have attempted to contact Dr. Jain with the results of my testing, but as of the time of this writing, I have not yet received a response.

9.3.2 Random Variables

One tricky thing about real queueing systems is that arrivals don't enter the system at completely predictable times. (In fact, if requests arrive into a system at uniform intervals and if service times are constant, there will be no queueing unless the system is unstable.) For example, we may have evidence that the requests of a telephone system arrive at an average rate of 2.0 requests per second, but in real-life systems, it is almost never reasonable to expect that requests will arrive at a rate of exactly one every 0.5 seconds. Indeed, in most real systems we expect arrivals to be scattered randomly through time. We expect behaviors averaged across large numbers of requests to be predictable, but we expect individual arrival times to be unpredictable.

9.3.2.1 Expected value

Mathematicians use the term "random variable" to describe the behavior of an unpredictable process. A random variable is simply a function whose value is a random number. The expected value E [ X ] of a random variable X is the mean (or average) of the values that X takes on. In many probability and statistics texts, a bare uppercase letter like X refers to a random variable, which has several properties, among which are its expected value and its distribution (defined in a moment). Queueing theory texts often use an uppercase letter to denote both a random variable and its expected value. The readers of those books are expected to understand by context which concept is being referenced. This book follows the same convention. For example, I use R = S + W instead of the more technically precise but cumbersome E [ R ] = E [ S ] + E [ W ].

9.3.2.2 Probability density function (pdf)

Although a random variable has, by definition, a random number value, the process through which values appear in nature is usually endowed with some sort of order. For example, on the telephone system whose average rate of arrivals is 2.0 requests per second, it may be possible for 200 requests to arrive in a given second, but it may be very unlikely. The mathematical function that models a random variable's likelihood of taking on a given value is called that random variable's distribution. Specifically, the probability that a discrete random variable X will take on a specified value x is called that variable's probability density function (or pdf), denoted f(x) = P(X = x) [Hogg and Tanis (1977) 51-58].

9.3.2.3 Using the pdf

Because the exact arrival time for the next incoming request cannot be predicted exactly, a system's interarrival time is a random variable. Thus, the arrival rate (the reciprocal of interarrival time) is a random variable as well. Agner Erlang showed in 1909 that the arrival rate of phone calls in a telephone system often has a Poisson distribution [Erlang (1909)]. Specifically, if telephone calls arrive randomly at an average rate of λ > 0, then the pdf of the arrival rate has the form:

figs/eq_0907.gif

Thus, if telephone calls arrive at an average rate of λ = 2 calls per second, then the probability that there will be 200 calls in a given second is only f(200) = 2.7575 × 10^−316; in other words, if the telephone call arrival process is truly Poisson distributed with λ = 2, then it is more likely that you could deal fifty-four straight royal flush poker hands than ever observe a one-second interval in which 200 calls occur. On the other hand, there is a much greater probability that a one-second interval will contain exactly zero, one, two, three, or four arrivals. The pdf of the Poisson distribution with mean λ = 2 is shown in Figure 9-8. Conveniently, arrival rates in many computer applications, including many aspects of Oracle systems, also have a Poisson distribution.
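You can verify these probabilities yourself. A short sketch of the Poisson pdf, computed in log space so the factorial of a large x does not overflow:

```python
import math

def poisson_pdf(x, lam):
    """P(X = x) for a Poisson-distributed arrival count with mean lam,
    computed in log space: exp(-lam + x*ln(lam) - ln(x!))."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

# With lam = 2 arrivals per second:
print(poisson_pdf(0, 2))    # ~0.135: a one-second interval with no arrivals
print(poisson_pdf(2, 2))    # ~0.271: exactly two arrivals
print(poisson_pdf(200, 2))  # vanishingly small, on the order of 1e-316
```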

Figure 9-8. The probability density function (pdf) for the Poisson distribution with λ = 2 shows the probability P(A = x) that there will be exactly x arrivals in a one-second observation interval
figs/oop_0908.gif

It is, of course, no coincidence that the symbol λ (lambda) chosen to denote the average arrival rate of a queueing system is the same symbol used to denote the mean of a Poisson distribution. It's actually the other way around. As I shall divulge shortly, the specific M/M/m queueing theory model that is covered later in this chapter works only if a system's arrival process is Poisson distributed with mean λ. The arrival rate in queueing theory is called λ because it is the mean of a Poisson distribution.

A system's service time is also a random variable. For example, the time it takes a bank teller to count your money is predictable in the aggregate, but unpredictable in a specific case. Even the CPU time required to process an Oracle logical I/O (LIO), the operation that the Oracle kernel uses to fetch a single block from the Oracle database buffer cache, is unpredictable. For example, a CPU might service an average of 40,000 LIO requests per second (that is, μ = 40,000), but from one second to the next, a CPU's service rate might vary significantly. Randomizing factors include the type and complexity of an Oracle block (e.g., whether the block is an index block or a table block), the varying number of rows in each Oracle block, and the varying column widths of data within those blocks.

9.3.2.4 Why understanding distribution is important

It is crucial to know a random variable's distribution before you can use that random variable's mean (expected value) in any predictive formulas. For example, you might say that customers arrive at a restaurant at an average rate of two customers per minute during lunchtime; thus, the expected interarrival time is 30 seconds. However, the average doesn't tell the whole story. If you know only the expected interarrival time, then you can't tell, for example, whether individual customers are really arriving exactly 30 seconds apart, or whether they're arriving in groups. If you know only that requests have an expected interarrival time of 30 seconds, then, for example, you cannot know which, if either, of the cases in Table 9-1 has occurred.

Table 9-1. Two very different scenarios both lead to an expected interarrival time of τ = 30 seconds

                              Number of arrivals
Time interval                 Case I     Case II
11:30 a.m. to 11:45 a.m.          0         34
11:45 a.m. to 12:00 noon          0         28
12:00 noon to 12:15 p.m.        240         31
12:15 p.m. to 12:30 p.m.          0         37
12:30 p.m. to 12:45 p.m.          0         24
12:45 p.m. to 1:00 p.m.           0         30
1:00 p.m. to 1:15 p.m.            0         32
1:15 p.m. to 1:30 p.m.            0         24
Average per 0:15 bucket          30         30

If reality consistently resembles Case I, then you shouldn't expect a mathematical formula to produce a reliable prediction of what happens between 1:00 p.m. and 1:15 p.m. if you tell the formula only that your "average arrival rate is 120 arrivals per hour." For a queueing model to produce reliable results, you need to tell it something more about the properties of its random variable input parameters than just an average. You must also tell the model about each random variable's distribution.

9.3.3 Queueing Theory Versus the "Wait Interface"

Now that you've seen the definition for each of the formulas of queueing theory, how do Oracle operational data fit in? Specifically, how does information obtained from the so-called wait interface fit into queueing theory? Unfortunately, Oracle Corporation teaches inaccurate information on this topic. Oracle MetaLink article 223117.1 is an example:

Performance tuning is based on the following fundamental equation:

Response Time = Service Time + Wait Time

In the context of an Oracle database, "Service Time" is measured using the statistic "CPU used by this session" and "Wait Time" is measured using Wait Events [my emphasis].

The emphasized portion of this statement is false. A so-called Oracle wait event is not what this statement says it is.

9.3.3.1 Oracle wait times

The confusion begins with the name "wait event." It's an unfortunate choice of terminology, because the mere name encourages people to believe that the duration of an Oracle kernel event is a queueing delay. However, it is not. As you learned in Chapter 7, the elapsed time of a wait event actually includes lots of individual components. The response time components for a single OS read call are depicted in Figure 9-9.

Figure 9-9. An Oracle wait time for a system call (like the disk I/O call shown here) is really a response time that is measured from the Oracle kernel's perspective. The duration ela=t1-t0 is not a queueing delay; it consists of service times and queueing delays
figs/oop_0909.gif

The Oracle wait time recorded for a system call execution is the total wall clock time that elapses from the final instruction before the OS call execution until the first instruction after the return of the OS call. Everything that happens in the interval between times t0 and t1 is an Oracle wait time. In Figure 9-9, a single Oracle wait time includes all of the following components:

sCPU

CPU service time consumed to set up the system call. For a disk read call, most of this time is consumed in kernel running state. However, some system calls may consume CPU in user running state as well.

wdisk

Queueing delay for the disk device, which in this picture includes the transmission latency required for the request to reach the disk device from the CPU device.

sdisk

Service time for the disk device, including the seek latency, rotational latency, and data transfer latency from the I/O device back to the memory that is addressable by the CPU.

wCPU

Queueing delay for the CPU, consumed with the process in the ready to run state.

sCPU

Another segment of CPU service time required to complete the system call. Again, for different system calls, some of this CPU capacity may be consumed in kernel running state, and some may be consumed in user running state.

I hope you can see clearly now that an Oracle wait time for an OS read call is definitely not a measure of wdisk. Other system calls behave similarly.

9.3.3.2 Differences in queueing theory notation

Studying different queueing theory books can confuse the issue of what kind of quantity an Oracle wait time really is even further. While most of the Greek letters retain consistent meanings across different texts, different authors use different notation for the fundamental queueing theory formula, as shown in Table 9-2.

Table 9-2. A sample of queueing theory notations

Notation          Source
R = S + W         Oracle MetaLink, [Gunther (1998) 84]
T = S + Tq        [Gross and Harris (1998) 11]
W = 1/μ + Wq      [Gross and Harris (1998) 71]
sn = xn + wn      [Kleinrock (1975) 198]

I have rearranged the terms on the right-hand side of each of these equations so that all of the equations represent exactly the same concept. In other words, Gunther's and my W is exactly the same thing as Gross and Harris's Tq and Wq, which are exactly the same thing as Kleinrock's wn. It can be especially confusing when different books use the same words to mean completely different things. For example:

Expected steady-state system waiting time W equals service time 1/μ plus line delay Wq [Gross and Harris (1998) 64].

Response time R equals service time S plus time spent waiting in the queue W [Gunther (1998) 52].

Gross and Harris use the term "waiting time" to mean what Gunther calls "response time." Furthermore, notice that the two sets of authors use the term "wait" to mean two completely different things. Considering the R = S + W notation for a moment, Gross and Harris call R a wait, while Gunther calls W a wait. An Oracle wait time is closer in spirit to the Gross and Harris definition than to the Gunther definition.

Who's right? The choice of words doesn't matter, as long as the concepts represented by those words aren't intermingled. I've chosen notation that resembles what Jain and Gunther use, principally because those were the first two books on queueing theory that I studied. However, it's fine to call R, S, and W by any name that you like. It is not correct to regard an Oracle wait time as any single component on the right-hand side of an equation like R = S + W. An Oracle wait time is really the response time for an operating system call as viewed from the Oracle kernel's perspective. The Oracle kernel publishes wait times in several places, including the ela statistic in the WAIT lines of extended trace data, and the following fixed views:

V$SESSION_WAIT.WAIT_TIME
V$SESSION_EVENT.TIME_WAITED
V$SESSION_EVENT.AVERAGE_WAIT
V$SESSION_EVENT.MAX_WAIT
V$SESSION_EVENT.TIME_WAITED_MICRO
V$SYSTEM_EVENT.TIME_WAITED
V$SYSTEM_EVENT.AVERAGE_WAIT
V$SYSTEM_EVENT.TIME_WAITED_MICRO

Each of these statistics refers to a quantity of time that does include a queueing delay for the device being requested, but an Oracle wait time includes many other response time components as well. Specifically, an Oracle wait time is not the W of an R = S + W equation from queueing theory.


   


Optimizing Oracle Performance
ISBN: 059600527X
Year: 2002
Pages: 102
