The load intensity of a multiclass model with R classes and K devices is represented by the vector = (N_{1}, N_{2}, ..., N_{R}), where N_{r} indicates the number of class r customers in the system. The goal of multiclass algorithms is to calculate the performance measures of the network as a function of . In the transaction system example (i.e., Figure 13.1), the performance measures are calculated for each class (update and query) for the population = (1, 3) (i.e., for one update transaction and three query transactions). The two types of processing that are usually modeled as closed classes are background batch jobs and interactive jobs. The key feature is that the total load placed by these classes on a system is constant as depicted in Figure 13.2. In the case of background processing, it is assumed that the system always has a backlog of jobs waiting to enter the system. Thus whenever a process leaves the system it is replaced by a new, statistically identical, process. Therefore the number of background processes in the system remains constant. For example, in a database system, typical background processes include a log writer, a checkpointing process, and a session monitor. In an interactive system, each customer (e.g., transaction, process, user, job) alternates between thinking and waiting. The thinking state corresponds to the period of time that elapses since a customer receives a reply from the system due to the previous request until a new request is issued. After submitting a new request, the customer enters the waiting state while the system executes the request. Each of the M_{r} terminals is associated with one class r customer. Thus, there exists M_{r} requests in the system, where each request cycles between spending R_{r} units of time executing and Z_{r} time units thinking. A closed model is a combination of interactive and batch classes. The closed system analysis begins with the load intensity vector and the class descriptor parameters (D_{i,r}, M_{r}, Z_{r}). The analysis computes the throughputs, response times, and queue lengths of each class. Figure 13.2. Multiple class closed model.
As with the singleclass model, the MVAbased solution of a multiclass system relies on three basic equations applied to each class. Equation 13.5.1
Equation (13.5.1) is obtained by applying Little's Law separately to each class of customers. If r is a batch class, then Z_{r} is zero. The residence time corresponds to the total time a class r customer spends at server i during its execution. It includes the service demand (D_{i,r}) plus the total waiting time at the device. The average response time of class r customers can then be written as . Applying Little's Law and the Forced Flow Law to each service center yields Eq. (13.5.2). Equation 13.5.2
Summing up customers of all classes at device i gives the total number of customers at that device, . Equation 13.5.3
The key equation of the MVA technique can be derived from the observation that the mean response time of a class r customer at service center i equals its own mean service time at that device plus the time to complete the mean backlog seen upon its arrival (i.e., the average number of customers seen upon arrival multiplied by each customer's mean service time). Therefore, Equation 13.5.4
where is the average queue length at device i seen by an arriving class r customer. As seen in Eq. (13.5.4), the average residence time is a function of the service demand and not of the individual values of the service times and visit ratios. In the case of a delay server, it follows directly by definition that the backlog is zero, , which makes = D_{i,r}. When the scheduling discipline of center i is PS or LCFSPR, the expression can be viewed as an inflation factor of the service demand due to the congestion by other customers. For FCFS service centers, Eq. (13.5.4) represents the customer's own service demand plus the time to complete the service of all customers in front of it. For practical purposes, scheduling disciplines can be grouped into two categories: delay and queuing. The latter encompasses loadindependent servers with the following disciplines: PS, LCFSPR, and FCFS. Having as a starting point the fact that the queue length is zero when there are no customers in the network , Eqs. (13.5.1), (13.5.2), and (13.5.4) can be used iteratively to calculate the performance measures of the model. However, there is a minor problem; no expression for has yet been given. Given that expression it would be easy to solve a multiclass model. Multiclass model solution techniques are grouped into either exact or approximate solutions, depending on the way the backlog seen upon arrival [i.e., ] is calculated. 13.5.1 Exact Solution AlgorithmAs pointed out in Chapter 10, an exact solution technique means the exact solution of analytic formulas of approximate models of actual systems. The key to the exact solution of multiclass closed queuing networks is the arrival theorem [16, 17], which states that a class r customer arriving at service center i in a system with population sees the distribution of the number of customers in that center as being equal to the steadystate distribution for a network with population . The vector consists of a 1 in the rth position and zeros in the rest of the vector [i.e., (0, 0, ..., 1, ..., 0)]. In other words, it states simply that the arriving customer sees the system in equilibrium with itself removed. In the transaction system example with population = (1, 3), a query transaction that arrives at the CPU sees a queuing distribution equal to that of the system with population = (1, 2). From the arrival theorem, it follows that Equation 13.5.5
Combining Eqs. (13.5.1), (13.5.2), (13.5.3), (13.5.4), and (13.5.5), the algorithm for the exact solution of closed multiclass models is described in Figure 13.3. Figure 13.3. Exact MVA algorithm for multiple classes.
13.5.2 Closed Models: Case StudyRecall the motivating problem of Section 13.3. The algorithm of Figure 13.3 can be used to obtain the performance measures for each class. The first step is to apply the exact algorithm to the baseline model described in Table 13.1 [i.e., to calculate the results for the population of one update and three query transactions, = (1, 3)]. From the arrival theorem, to calculate the residence time at the devices (processor and two disks) for population (1, 3), the device queue lengths are required for populations (0, 3) and (1, 2). These correspond to one less update transaction and one less query transaction, respectively. By continually removing one customer from each class, eventually the performance measures for population (0, 0) are calculated, which is the starting point of the algorithm. Figure 13.4 shows the precedence relationships required to calculate the results of a system with population (1, 3) using exact MVA. Table 13.3 shows for each population of the sequence of Figure 13.4 the results calculated by the exact MVA algorithm for the baseline model of the transaction system example. It is worth noting that each class can be calculated separately, as indicated in Table 13.3. However, the interaction among the multiple classes is explicitly represented by the term in the equation . The average number of customers at device i reflects the contention for common shared resources among the distinct classes of the workload.
From Table 13.3 it is noted that the calculated throughputs for the query and update classes are 4.093 tps and 0.409 tps, respectively, which match the measurement results of Table 13.1. Either using Little's Law, or summing the average device residence times, the average response times are obtained for the two classes: 0.733 sec for queries and 2.444 sec for updates. Once the baseline model has been solved and validated, attention is directed to the prediction phase. To construct a predictive model, the parameters of the predicted system need to be specified. Consider the two questions posed in Section 13.3. Figure 13.4. Sequence of calculations of MVA.
13.5.3 Approximate Solution AlgorithmsTable 13.3 provides a good idea regarding the number of operations required to compute the results of the simple transaction system example that consists of only 2 classes, 4 customers, and 3 devices. Not surprising, the computational effort required to compute performance measures of models of modern systems with several classes, many processes in execution, and hundreds of I/O devices is "significant." Looking at Figure 13.4, note that to compute the results of the model with population (1, 2), the results of populations (1, 1) and (0, 2) are needed. Each of these populations requires, as input, queue lengths from two related populations. In the general case, when a system has R classes, each calculation of metrics for a given population demands inputs from R other populations. Due to the precedence relationships in the calculation of performance measures of a multiclass model, the computational complexity of MVA algorithms grows exponentially with the number of classes. The number of multiplications and the number of additions required to solve a multiclass model is proportional to Equation 13.5.6
Modeling distributed systems often involve large queuing networks. For example, consider a multitier distributed server environment, modeled as multiclass models [7, 10]. The workload may be viewed as processes running at the front servers, which during their execution require access to a number of files that can be either local or in the backend servers. Assume that each process visits three backend servers for file services. Suppose that we have a high speed LANbased system with 20 frontend servers (processor and disk), 5 backend servers and 1 process per server. To represent the different routings and service requests, a separate class is used for each process at the frontend servers. Thus, a closed model is constructed with 20 classes, 5 service centers (i.e., processor, disk, and 3 backend servers), and 1 customer in each class. Substituting these numbers into Eq. (13.5.6), the solution of this hypothetical model would require approximately 104 million operations. So how can one avoid the high computational cost of exact MVA algorithms? The source of the large number of operations required to compute the exact MVA algorithm is the recursion expressed in the equation Equation 13.5.7
With the goal of reducing the time and space complexity of the MVA algorithm, several approximations have been proposed to break up the recursion [18]. The approximate algorithms reduce the complexity by substituting approximations for that are not recursive cite13Sevcik00. A very simple approximation, due to Bard [4], is as follows: Equation 13.5.8
This approximation is attractive when the number of customers becomes large. The most commonly used approximation is one proposed by Schweitzer [19] for BCMP models. It is based on the assumption that the number of class r customers at each service center increases proportionally to the number of class r customers in the network. From this observation, it follows that Equation 13.5.9
Equation 13.5.10
Equation (13.5.10) is the basis of an iterative method for calculating performance measures of a closed model, described by the algorithm of Figure 13.5. The basic idea is to substitute estimates for queue lengths [] into Eq. (13.5.10) and use the MVA equations to compute the next estimates for . The iterative process starts with an estimate which assumes that class r customers are equally distributed among the K devices of the network. Iteration stops when successive values of are sufficiently close [i.e., where is a tolerance bound]. Note that in the algorithm, the notation D_{i,r} > 0. The computational cost of solving the iterative method is proportional to the product (K x R) per iteration and typical errors are quite small. In several experiments reported in [21], the average errors in throughput and response time for various approximate MVA algorithms are around 2% for relatively small populations and less than 1% for relatively large populations. For example, the iterative solution of the model of the hypothetical distributed system discussed in Section 13.5.1 would require a number of operations proportional to (5 x 20 = 100), as opposed to 104 million operations for the exact solution. Although approximate methods provide a cheaper solution for productform multiclass models, they have a serious drawback. The commonly used approximate methods for solving multiclass models do not provide bounds on the errors introduced by the approximations. Therefore, to assess the approximation's reliability, one has to validate the results against the exact MVA results, which may be impossible in the case of large systems. To exemplify the accuracy of Schweitzer's approximation, compare the exact results of the transaction system example with those calculated by the iterative process, given that the maximum tolerance for the absolute difference between successive values of the queue lengths is 0.01. Table 13.6 shows the throughput and response time computed under both methods. As seen, the maximum observed relative error is 2.25%.
Example 13.2.Consider a closed model that represents a packet switching store and forward communication network with flow control mechanism [15]. Consider that each virtual channel is represented by one class. Thus, the number of customers in each class represents the window size (i.e., the maximum number of messages allowed on the virtual channel). To keep the example simple, assume that the devices of the model represent the sink node, the source node, and the virtual channel. All devices are assumed to meet the BCMP requirements. The model to be solved has the following parameters: 2 classes (i.e., 2 virtual channels), N_{1} = 20, N_{2} = 25 (window sizes), = (0.0030, 0.0009,0.0123), and = (0.0035, 0.0215,0.0011). Table 13.7 shows the iterations required by the Schweitzer's approximation to solve this model. The termination criterion is to have 0.01 as the maximum difference between successive values of N_{i,r}. Within 9 iterations, the approximate method obtains results whose maximum error compared to the exact results is 1.59%. Notice that the exact solution requires the computation of queue lengths for 6 (26 x 21) different populations.
