Chapter 7: Queuing Theory




In this chapter, we will build upon the basic probability theory covered in Chapter 5 and stochastic processes covered in Chapter 6. The discussions will lead to the definition and analysis of several useful queuing models for the behavior of many types of service systems. The methods discussed herein complement those provided by simulation analysis. Frequently, the development of a general queuing model for a particular system will aid in the development of a refined Petri net model or a detailed simulation of specific parts of the system. Also, the results and behavior observed from simulation help to tune the analytical models.

This chapter is organized into three general topics: queuing models, estimation, and computational methods for theoretical systems analysis. Stochastic processes form the basis for many of the analytical techniques that apply to the queuing systems that we will discuss. The section on estimation provides some methods for defining the values that parameterize the queuing models with real-world data.

7.1 Queuing systems

In this section, we will cover the basic analysis techniques associated with queuing systems. The prime motivation for performing queuing analysis is to assess local system behavior under a variety of assumptions, initial conditions, and operational scenarios. The modeling aspect seeks to represent the behavior of system components as processes that have calculable statistics and that adequately reflect reality. Thus, the use of queuing analysis provides us with a set of techniques for calculating quantities, such as wait time for service, throughput of a server, the effect of different servers or queuing strategies, and the effects of coupled and closed networks of queues.

The assumption that we must make in order to take advantage of these techniques is that the system under observation can be adequately represented by a queuing system. In the remainder of this section, we will first look at analytical modeling in general, at the characteristics of the systems that we are interested in modeling, and then at the suitability of queuing models in general and their use in particular.

What are we seeking to quantify when we set out to model a system? The answer can be summed up in just one word: performance. This one word, however, may have very different meanings for different people. Take automobile performance, for instance. For the speed enthusiast, performance is how fast the car can go and how quickly it can get to that speed. For the back-road driver, it is the ability to corner without difficulty under severe conditions. For the economist, high performance means fuel efficiency and low maintenance costs. The list goes on. So it is for the performance of a computer system as well. At issue here are performance measures, such as the utilization of the system components, effective throughput, average waiting time for a potential user, average number of users in the system at any given time, and the availability of service resources.

In addition, trade-off analyses and "what if" studies can be performed to establish performance measures such as speedup and improved availability. In general, such studies provide the ability to analyze the sensitivity of the previously mentioned measures to changes in the system under study.

The general process of analytical modeling involves mapping the behavior of a complex system onto a relatively simpler system, solving the simpler system for the measures of interest, and then extrapolating the results back to the complex system. Sometimes this process has several levels, where models are broken into submodels. Here, the lowest-level models are solved (or partially solved) first, their results propagated up to the next higher layer for inclusion in that layer's solution, and so on to the top level.

In some cases, portions of a model can be replaced by a technique called decomposition, or isolation. Here, a queuing subsystem is replaced with a flow-equivalent server, where the server output is precalculated for each number of units (or customers) in the system. Thus, the job flow through the flow-equivalent server can be implemented using a simple lookup table indexed by the number of customers currently in the system. This technique is appropriate if the impact of the removed subsystem is minimal when compared with the effect of other model subsystems.
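As a sketch of the lookup-table idea (the subsystem model, its rates, and the function names below are hypothetical, not from the text):

```python
# Sketch of a flow-equivalent server: the replaced subsystem's output
# rate is precomputed for each possible customer population n, so the
# higher-level model does an O(1) table lookup instead of re-solving
# the subsystem. The subsystem load curve below is a made-up stand-in.

def build_flow_equivalent_table(max_customers, subsystem_rate):
    """Precompute subsystem throughput for n = 0 .. max_customers."""
    return {n: subsystem_rate(n) for n in range(max_customers + 1)}

def subsystem_rate(n):
    """Hypothetical load curve: throughput saturates toward 5 jobs/sec."""
    return 5.0 * n / (n + 2)

table = build_flow_equivalent_table(10, subsystem_rate)
rate_with_4_jobs = table[4]  # lookup indexed by customers in the system
```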

The basic premise behind the use of queuing models for computer systems analysis is that the components of a computer system can be represented by a network of servers (or resources) and waiting lines (queues). A server is defined as an entity that can affect, or even stop, the flow of jobs through the system. In a computer system, a server may be the CPU, I/O channel, memory, or a communication port. A waiting line is just that: a place where jobs queue for service. To make a queuing model work, jobs (or customers or message packets or anything else that requires the sort of processing provided by the server) are inserted into the network. A simple example, the single server model, is shown in Figure 7.1. In that system, jobs arrive at some rate, queue for service on a first-come first-served basis, receive service, and exit the system. This kind of model, with jobs entering and leaving the system, is called an open queuing system model.


Figure 7.1: Single server model.
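Since the analytical results in this chapter are often cross-checked against simulation, a minimal simulation sketch of this single-server FIFO model can be written with Lindley's recurrence (the arrival and service rates below are assumed for illustration):

```python
# Minimal single-server FIFO simulation using Lindley's recurrence:
#   W(k+1) = max(0, W(k) + S(k) - A(k+1)),
# where W is queue wait, S is service time, and A is the interarrival
# gap. With exponential gaps and service times this simulates an
# M/M/1-style system.
import random

random.seed(7)                      # reproducible illustration
lam, mu, n_jobs = 3.0, 5.0, 50_000  # assumed arrival/service rates

wait, total_wait, prev_service = 0.0, 0.0, 0.0
for _ in range(n_jobs):
    gap = random.expovariate(lam)             # next interarrival time
    wait = max(0.0, wait + prev_service - gap)
    total_wait += wait
    prev_service = random.expovariate(mu)     # this job's service time

avg_wait = total_wait / n_jobs  # sample mean queue wait
```

The sample mean queue wait produced this way can be compared against the analytical M/M/1 results derived later in the chapter.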

By cascading simple queuing models and allowing the existence of parallel servers, networks of queues and servers may be formed. These combinations are formally called queuing networks, although we will also call them network models and queuing systems. Figure 7.2 shows one such model of a computer system with a fixed number of jobs competing for a CPU and two I/O processors.


Figure 7.2: Queuing network model.

In Figure 7.2, jobs that have finished I/O service loop back into the CPU queue for another cycle of computation and I/O. A system like this, where the number of customers remains constant, is called a closed queuing network system model.

A combination of open and closed concepts is certainly possible if one considers each job to have an associated class. For example, a computer system may contain two job classes, interactive and system, where interactive jobs come and go as users log on and off and where system jobs execute continually. A system that contains both open and closed class customers is called mixed.

The concept of customer classes also allows different classes to receive different treatment at the same server, as well as the definition of a group of customers as open or closed. A system with more than one customer class is called multiclass, and it may be either open, closed, or mixed.

Once we have a network model established, the collection of n1 customers at server 1, n2 at server 2, and so on for the entire collection of queues in the network system defines the state of the network model. An analytical model for a queuing network would provide a method for calculating the probability that the network is in a particular state (i.e., that the number of customers is at certain levels for each queue and service center). In addition, network throughput, mean queue length for any server, and mean response time (wait time and service time) for any server can be found by a variety of methods.

In a network model, a server typically has associated with it a service time distribution, from which customer service times are drawn. Upon arrival at a server, a customer receives service, the duration of which is determined by the service time distribution.
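For example, drawing customer service times from an exponential service time distribution (the rate below is an assumed value) might look like:

```python
# Drawing customer service times from a server's service time
# distribution; here exponential with an assumed rate mu = 5 per sec.
import random

random.seed(1)   # reproducible illustration
mu = 5.0         # assumed service rate
service_times = [random.expovariate(mu) for _ in range(1000)]
mean_s = sum(service_times) / len(service_times)  # approximates E[S] = 1/mu
```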

We will now turn our attention to some of the more well-known queuing systems, the notation used to represent them, the performance quantities of interest, and the methods for calculating them. We have already introduced many notations for the quantities of interest for random variables and stochastic processes. Figure 7.3 reviews these and adds a host of others that will be useful for the analysis of queuing systems. The following text briefly discusses the more important parameters.

start figure

λ: Arrival rate at the entrance to a queue

μ: Service rate (average) of a server

Pn: Probability that there are n customers in the system at steady state

C: Number of identical servers in the queuing system

N: Random variable for the number of customers in the system at steady state

L: E[N], expected number of customers in the system at steady state

Wq: Random variable for customer waiting time in a queue

S: Random variable for customer service time

Nq: Random variable for the number of customers in a queue at steady state

Lq: E[Nq], expected number of customers in a queue at steady state

Ns: Random variable for the number of customers at a server at steady state

W: Wq + S, random variable for the total time in a system

end figure

Figure 7.3: Stochastic processes and random variable notation.

The arrival rate for a queuing system defines the stream of arrivals into a queue from some outside source. This rate is defined as an average rate, which is derived from an arrival process. The average interarrival time for a given arrival process is denoted as:

(7.1) E[interarrival time] = 1/λ

The service rate parameter is defined in a way that is similar to the arrival rate. This rate is also an average rate, which defines how many customers are processed per unit time when the server is busy. The service rate can be cast in terms of the service time random variable as:

(7.2) μ = 1/E[S]

Often, we wish to know the probability that the system will contain exactly n customers at steady state. Accounting for all of the probabilities for n ranging from zero to infinity defines the probability distribution for the number of customers in the system.

The number of identical servers in a system indicates that a customer leaving a queue may proceed to one of C servers as soon as one becomes nonbusy (free).

Of interest for any queuing system is the average number of customers (N) in the system at steady state. This value can be thought of as the sum of all customers in queues (Nq) and at servers (Ns):

(7.3) N = Nq + Ns

The total time a customer spends in the system can also be thought of as the sum of the wait time in the queue (Wq) and the time at the server (S). The total time, and expected total time at steady state, therefore, are given as:

(7.4) W = Wq + S, E[W] = E[Wq] + E[S]

In addition to the notation described previously for the quantities associated with queuing systems, it is also useful to introduce a notation for the parameters of a queuing system. The notation we will use here is known as the Kendall notation, illustrated in Figure 7.4.

start figure

A/B/c/K/m/Z

where

A: arrival process definition

B: service time distribution

c: number of identical servers

K: maximum number of customers allowed in the system (default = ∞)

m: number of customers allowed to arrive before the arrival process stops (default = ∞)

Z: discipline used to order customers in the queue (default = FIFO)

end figure

Figure 7.4: Kendall notation.

The symbols used in a Kendall notation description also have some standard definitions. Figure 7.5 shows the more common designators for the A and B fields of the notation.

start figure

D: deterministic interarrival or service time distribution

G: general interarrival or service time distribution

M: Markovian (exponential) interarrival or service time distribution

end figure

Figure 7.5: Kendall notation symbol definitions.

The service discipline used to order customers in the queue can be any of a variety of types, such as first-in first-out (FIFO), last-in first-out (LIFO), priority ordered, random ordered, and others. Next, we will examine several queuing systems and give expressions for the more important performance quantities.
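As an illustration of how the defaults work, a small helper (not from the text; the function name is ours) can expand a Kendall notation string:

```python
# Expand a Kendall notation string A/B/c/K/m/Z, filling in the
# defaults described above: K and m default to infinity and the
# queue discipline Z defaults to FIFO.
import math

FIELDS = ["A", "B", "c", "K", "m", "Z"]
DEFAULTS = {"K": math.inf, "m": math.inf, "Z": "FIFO"}

def parse_kendall(spec):
    desc = dict(zip(FIELDS, spec.split("/")))
    for field, default in DEFAULTS.items():
        desc.setdefault(field, default)
    return desc

parse_kendall("M/M/1")  # Poisson arrivals, exponential service, one
                        # server, infinite capacity, FIFO ordering
```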

7.1.1 The M/M/1 queuing system

The M/M/1 queuing system is characterized by a Poisson arrival process and exponential service time distributions, with one server, and a FIFO queue ordering discipline. The system, shown in Figure 7.6, may represent an input buffer holding incoming data bytes, with an I/O processor as the server. A few of the quantities that we will be interested in for this type of queuing system are the average queue length, the wait time for a customer in the queue, the total time a customer spends in the system, and the server utilization.

Figure 7.6: M/M/1 queuing system model.

Let's look at the exponential service distribution first. It is given as:

(7.5) F(t) = P[S ≤ t] = 1 − e^(−μt), t ≥ 0

and is shown in Figure 7.7. In the figure, E[S] is the average service time of a customer at the server. Next, let's derive the steady-state equations for the M/M/1 system.

Figure 7.7: Exponential service distribution.

The M/M/1 system is a birth-death process, as discussed in Chapter 6. Let us assume that:

(7.6) λn = λ and μn = μ, for all n

From earlier discussions about birth-death processes, we know that:

(7.7) Pn = P0 (λ0 λ1 … λ(n−1))/(μ1 μ2 … μn)

and

(7.8) Σ (n=0..∞) Pn = 1

Following the same reasoning for deriving the steady-state probabilities as we did for the general birth-death process, we obtain the steady-state equations for the M/M/1 system:

(7.9) λP0 = μP1

(7.10) (λ + μ)Pn = λP(n−1) + μP(n+1), n ≥ 1

Now, let u denote the average server utilization. We define this quantity as the mean service time of a single customer divided by the mean interarrival time (see equations [7.1] and [7.2]); then:

(7.11) u = E[S]/(1/λ) = λ/μ

Solving the steady-state equations (7.9) and (7.10), we obtain:

(7.12) P1 = (λ/μ)P0 = uP0

Similarly, for n = 1:

(7.13) P2 = (λ/μ)^2 P0 = u^2 P0

Similarly:

(7.14) Pn = u^n P0, n ≥ 0

We assume here that u is less than 1 so that the queue length remains finite. Now, we know that:

Σ (n=0..∞) Pn = 1

and, substituting equation (7.14) for each Pn:

(7.15) 1 = P0 Σ (n=0..∞) u^n

The right-hand side of equation (7.15) is recognized as a geometric progression, which converges for u < 1 and yields the following solution:

(7.16) P0 = 1 − u

Combining equations (7.14) and (7.16), we arrive at the steady-state probability that there are n customers in an M/M/1 system:

(7.17) Pn = (1 − u)u^n, n ≥ 0

Figure 7.8 shows the state transition diagram for the M/M/1 queuing system.

Figure 7.8: M/M/1 system state transition diagram.

Now let's look at the average number of customers in the system at steady state. This is given as the expected value of N, which can be found by:

(7.18) L = E[N] = Σ (n=0..∞) n Pn = (1 − u) Σ (n=0..∞) n u^n = u/(1 − u)

The average amount of time that a customer must wait in the queue, given that i customers are already in the system, is the number of customers ahead multiplied by the average service time of the server:

(7.19) Wi = i E[S] = i/μ

The expected wait time in the queue, then, is a function of the average wait time and the steady-state probability of having i customers in the system:

(7.20) E[Wq] = Σ (i=0..∞) (i/μ)Pi = u/(μ(1 − u))

Combining the queue waiting time (equation [7.20]) and the expected service time E[S] (equation [7.2]) yields the total customer time in the system, called the expected wait time:

(7.21) E[W] = E[Wq] + E[S] = u/(μ(1 − u)) + 1/μ = 1/(μ(1 − u)) = 1/(μ − λ)

If we rewrite equation (7.18) as:

(7.22) L = u/(1 − u) = λ/(μ − λ)

using equation (7.21), we obtain Little's result:

(7.23) L = λE[W]

Little's result holds in general for any queuing system in steady state that conforms to the flow balance assumption discussed earlier. As such, it gives us an important relationship for the effect of arrival rate and queue length on total customer wait time. A related result, also attributed to Little, states the equivalent for queue length and queue waiting time and also holds for queuing systems in steady state:

(7.24) Lq = λE[Wq]

This second version of Little's result says that the expected queue length can be found directly from the arrival rate times the expected queue wait time.

The total waiting time in the system, then, can be found by using Little's result or by summing the queue wait time and the expected service time.
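The M/M/1 relationships above can be checked numerically; the rates below (λ = 3 and μ = 5 customers per second) are assumed for illustration:

```python
# Numerical check of the M/M/1 steady-state results, using assumed
# rates lam = 3 and mu = 5 customers/sec (so u = 0.6).

lam, mu = 3.0, 5.0
u = lam / mu                     # utilization, equation (7.11)

L = u / (1 - u)                  # E[N], equation (7.18)
Wq = u / (mu * (1 - u))          # expected queue wait, equation (7.20)
W = Wq + 1 / mu                  # E[W] = E[Wq] + E[S], equation (7.21)
Lq = lam * Wq                    # Little's result, equation (7.24)

# Little's result, equation (7.23): L = lam * E[W]
assert abs(L - lam * W) < 1e-12
# Here u = 0.6, L = 1.5 customers, Wq = 0.3 sec, W = 0.5 sec, Lq = 0.9
```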

Server utilization is a useful quantity for determining how many equivalent servers must be provided to service a given arrival process. The method is straightforward and involves solving an M/M/1 queuing system using the methods indicated previously. Suppose, for instance, that we have an M/M/1 system with an arrival rate of six customers per minute and a service time of ten seconds. Then the server utilization, as given by equation (7.11), is 1. This means that the server can be expected to be busy 100 percent of the time, but that it can, in fact, process enough customers so that infinite queue buildup is prevented. Suppose now that the arrival rate increases to 60 customers per minute so that the server utilization becomes 10, an overload situation. If we speed up the server by a factor of 10, however, or provide ten servers of the original speed, the utilization would again be 1. In general, then, if the utilization is less than 1, the server can keep up with the flow of customers and an infinite queue will not result. If, however, the utilization is greater than 1, the utilization, rounded up to the next largest integer, gives an indication of the number of identical servers necessary to keep up with the customer flow.
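The sizing argument above can be sketched as follows (the helper name is ours):

```python
# Estimate how many identical servers are needed: compute the
# utilization u = lambda * E[S] and round up when it exceeds 1.
import math

def servers_needed(arrivals_per_min, service_time_sec):
    u = arrivals_per_min * (service_time_sec / 60.0)
    return max(1, math.ceil(u))

servers_needed(6, 10)   # u = 1.0  -> 1 server just keeps up
servers_needed(60, 10)  # u = 10.0 -> 10 servers required
```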

A final interesting property of the M/M/1 queuing system is the fact that the queue waiting time and the total waiting time both have exponential distributions. For instance, the queue wait time can be found as follows. An arriving customer waits only if the system is nonempty, so we condition on the number of customers found on arrival:

(7.25) P[Wq > t] = Σ (n=1..∞) Pn P[n service completions take longer than t]

From equation (7.14) and the Poisson distribution of service completions in an interval of length t, we get:

(7.26) P[Wq > t] = u e^(−μ(1−u)t), t ≥ 0

From equation (7.21), we substitute E[W] = 1/(μ(1 − u)) to get:

(7.27) P[Wq > t] = u e^(−t/E[W])

Including P[Wq = 0] = P0 = 1 − u, the complete queue wait distribution is:

(7.28) P[Wq ≤ t] = 1 − u e^(−μ(1−u)t), t ≥ 0

By a similar argument, the total time in the system is likewise exponentially distributed:

(7.29) P[W ≤ t] = 1 − e^(−μ(1−u)t), t ≥ 0

From these distributions, we can find the percentiles for the expected wait time for r percent of the total number of customers. The percentile of any random variable is defined as:

(7.30) P[X ≤ π(r)] = r/100

In the case of queue wait time, for example, if we wish to find the wait time that 90 percent of the customers in the system will not exceed, we have:

(7.31) 0.90 = 1 − u e^(−μ(1−u)t), so that π_Wq(90) = ln(10u)/(μ(1 − u))
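The percentile computation can be carried out by inverting the standard M/M/1 queue-wait distribution P[Wq ≤ t] = 1 − u e^(−μ(1−u)t); the rates below are assumed:

```python
# r-th percentile of M/M/1 queue wait: solve
#   1 - u * exp(-mu * (1 - u) * t) = r/100  for t.
import math

def wq_percentile(lam, mu, r):
    """Queue wait not exceeded by r percent of customers (M/M/1, u < 1)."""
    u = lam / mu
    target = 100.0 * u / (100.0 - r)
    if target <= 1.0:
        return 0.0   # more than r percent of customers never queue at all
    return math.log(target) / (mu * (1 - u))

t90 = wq_percentile(3.0, 5.0, 90)  # u = 0.6: 90% of customers queue
                                   # less than ln(6)/2 seconds
```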

7.1.2 The M/M/1/K system

An interesting and realistic variation on the basic M/M/1 system is a system with a finite queue size. In this system, once the queue is full, new arrivals are lost and are never provided service. This is quite realistic, for example, in an input system with finite input buffer space and no flow-control protocol. The birth-death state transition diagram for the M/M/1/K system is shown in Figure 7.9.

Figure 7.9: State diagram for the M/M/1/K system.

As for the M/M/1 system, we have:

(7.32) Pn = u^n P0, 0 ≤ n ≤ K

Using the law of total probability, we also have:

(7.33) Σ (n=0..K) Pn = P0 Σ (n=0..K) u^n = 1

The summation is a geometric series, which yields:

Σ (n=0..K) u^n = (1 − u^(K+1))/(1 − u), u ≠ 1

so:

(7.34) P0 = (1 − u)/(1 − u^(K+1))

Substituting into equation (7.32) yields:

(7.35) Pn = ((1 − u)u^n)/(1 − u^(K+1)), 0 ≤ n ≤ K

If the arrival rate is equal to the service rate, we have:

(7.36) P0 = 1/(K + 1)

and

(7.37) Pn = 1/(K + 1), 0 ≤ n ≤ K

The expected number of customers in the system, for a system with nonequal arrival and service rates, is found as:

(7.38) L = Σ (n=0..K) n Pn

After some algebra and simplification of the summation, we get:

(7.39) L = u/(1 − u) − ((K + 1)u^(K+1))/(1 − u^(K+1))

We can see that, for very large values of K with u < 1, the second term approaches zero and we get the same expression as for the M/M/1 system.

For the case where the arrival and service rates are equal:

(7.40) L = K/2

To compute the wait time distribution for the M/M/1/K system, we must compute the probability for the number of customers in the queue when a customer arrives, given that the customer is admitted to the system. This is given as:

(7.41) P′n = Pn/(1 − PK), 0 ≤ n ≤ K − 1

From this, we can arrive at the wait time distribution:

(7.42) P[Wq ≤ t] = Σ (n=0..K−1) P′n P[n service completions occur by time t]

This quantity can be found in the same way as the statistic for the M/M/1 system.
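A numerical sketch of these finite-buffer results (the parameter values are assumed for illustration):

```python
# M/M/1/K steady-state probabilities, expected customers, and the
# probability that an arrival is lost (finds the queue full).

def mm1k(lam, mu, K):
    u = lam / mu
    if u == 1.0:
        probs = [1.0 / (K + 1)] * (K + 1)            # equations (7.36)-(7.37)
    else:
        p0 = (1 - u) / (1 - u ** (K + 1))            # equation (7.34)
        probs = [p0 * u ** n for n in range(K + 1)]  # equation (7.35)
    L = sum(n * p for n, p in enumerate(probs))      # equation (7.38)
    return probs, L, probs[K]                        # probs[K] = loss prob.

probs, L, p_loss = mm1k(4.0, 5.0, 3)  # assumed lam=4, mu=5, K=3 (u = 0.8)
```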

7.1.3 The M/M/C system

The M/M/C system, shown in Figure 7.10, consists of a single waiting line that feeds C identical servers. The arrival process is considered to be Poisson, and the servers have exponential service times. The state transition diagram for an M/M/C system is shown in Figure 7.11. Now let's write some of the flow balance equations for the state transition diagram.


Figure 7.10: An M/M/C system, for C = 3.

Figure 7.11: State transition diagram for an M/M/C system, for C = 3.

(7.43) λP(n−1) = nμPn, for 1 ≤ n ≤ C; λP(n−1) = CμPn, for n > C

so that:

Pn = ((λ/μ)^n / n!) P0, for n ≤ C, and Pn = ((λ/μ)^n / (C! C^(n−C))) P0, for n > C

To find the probability of no customers in the system, we sum all probabilities:

(7.44) P0 = [Σ (n=0..C−1) (λ/μ)^n/n! + ((λ/μ)^C/C!)(1/(1 − λ/(Cμ)))]^(−1)

The expected queue length can be found by subtracting the number of customers in service from the expected number of customers in the system:

(7.45) Lq = L − λ/μ = Σ (n=C..∞) (n − C)Pn = (((λ/μ)^C u)/(C!(1 − u)^2)) P0, where u = λ/(Cμ)

Using Little's result, we can compute the queue wait time:

(7.46) E[Wq] = Lq/λ

The total wait time is:

(7.47) E[W] = E[Wq] + 1/μ

The total number of customers in the system is:

(7.48) L = λE[W]
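These M/M/C quantities can be computed numerically via the standard formulas; the parameter values below are assumed:

```python
# M/M/C steady-state metrics: P0, expected queue length, queue wait,
# total wait, and expected customers in the system.
import math

def mmc(lam, mu, C):
    a = lam / mu                 # offered load
    u = a / C                    # per-server utilization (must be < 1)
    p0 = 1.0 / (sum(a ** n / math.factorial(n) for n in range(C))
                + a ** C / (math.factorial(C) * (1 - u)))
    Lq = p0 * a ** C * u / (math.factorial(C) * (1 - u) ** 2)
    Wq = Lq / lam                # Little's result
    W = Wq + 1 / mu              # add the expected service time
    L = lam * W                  # Little's result again
    return p0, Lq, Wq, W, L

p0, Lq, Wq, W, L = mmc(4.0, 2.0, 3)  # assumed lam=4, mu=2, C=3 (u = 2/3)
```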

For some multiple server systems, no queue is provided for customers to wait for service. In this case, a customer who arrives when all servers are busy is turned away, perhaps to try again later. The state transition diagram is shown in Figure 7.12. This system is often referred to as the M/M/C loss system, because customers who arrive when all servers are busy are lost. Writing the flow balance equations, we obtain the steady-state probabilities as we did for the M/M/C system:

Figure 7.12: M/M/C loss system.

(7.49) Pn = ((λ/μ)^n/n!) / (Σ (k=0..C) (λ/μ)^k/k!), 0 ≤ n ≤ C

The probability that a customer will be turned away, then, can be found from the previous expression with N = C. Since there is no queue, the queue length and queue waiting time are zero, and the total wait time is the expected service time.
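The blocking probability just described is equation (7.49) evaluated at n = C, commonly known as the Erlang B formula; a short sketch (load values assumed):

```python
# Probability that an arrival to the M/M/C loss system finds all C
# servers busy and is turned away (Erlang B).
import math

def erlang_b(lam, mu, C):
    a = lam / mu                                   # offered load
    terms = [a ** k / math.factorial(k) for k in range(C + 1)]
    return terms[C] / sum(terms)                   # P[N = C]

p_block = erlang_b(4.0, 2.0, 3)  # 2 Erlangs offered to 3 servers
```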

7.1.4 The M/G/1 system

The queuing systems that we have discussed so far have all had the Markov property for their arrival and service processes, making it possible to model the system as a birth-death process and to write the flow balance equations by inspection. Next, we will look at a system in which the service time does not have the Markov property. In the M/G/1 system, customer service times are independent and identically distributed, drawn from a general distribution that need not be exponential. Because service times are not Markovian, the system does not form a Markov chain, and we must resort to other methods to derive meaningful statistics. One approach commonly taken is to look at the process that describes jobs leaving the system, a stochastic process that happens to be a Markov chain. It has been shown [3] that, in the limit, the distribution for the number of jobs in the system at any point in time and the distribution for the number of jobs observed when a customer departs from the system are identical.

Summarizing the procedure, then, we can analyze certain aspects of the system that are described by a non-Markovian process by observing a Markovian subportion of the system (in this case the departure process) and extrapolating the results back to the original system. This type of analysis relies on what is known as an embedded Markov chain. The derivation of the statistics for the M/G/1 system is beyond the scope of this book.

The general M/G/1 system is useful in many situations, because we can characterize a known service process in terms of its moments and then evaluate its performance in the presence of a random arrival process.

7.1.5 The G/M/1 system

In the previous section, we discussed the situation in which a system had a non-Markovian service process. Next, we will consider the complementary case, in which the service times are exponentially distributed but the arrival process is non-Markovian. We will assume that the interarrival times are independent and identically distributed. Again, we can find an embedded Markov chain in this system whose behavior is essentially equivalent to the system's behavior at steady state. In this case, the random variable defining the number of customers in the system, observed precisely when an arrival occurs, forms a process that is a Markov chain. As with the other systems that we have discussed, the statistics of interest use the probability of having an empty system in calculating their values.






Computer Systems Performance Evaluation and Prediction
ISBN: 1555582605
Year: 2002
Pages: 136