13.5 Closed Models


The load intensity of a multiclass model with R classes and K devices is represented by the vector graphics/349fig02.gif = (N1, N2, ..., NR), where Nr indicates the number of class r customers in the system. The goal of multiclass algorithms is to calculate the performance measures of the network as a function of graphics/349fig02.gif. In the transaction system example (i.e., Figure 13.1), the performance measures are calculated for each class (update and query) for the population graphics/349fig02.gif = (1, 3) (i.e., for one update transaction and three query transactions).

The two types of processing that are usually modeled as closed classes are background batch jobs and interactive jobs. The key feature is that the total load placed by these classes on a system is constant as depicted in Figure 13.2. In the case of background processing, it is assumed that the system always has a backlog of jobs waiting to enter the system. Thus whenever a process leaves the system it is replaced by a new, statistically identical, process. Therefore the number of background processes in the system remains constant. For example, in a database system, typical background processes include a log writer, a checkpointing process, and a session monitor. In an interactive system, each customer (e.g., transaction, process, user, job) alternates between thinking and waiting. The thinking state corresponds to the period of time that elapses since a customer receives a reply from the system due to the previous request until a new request is issued. After submitting a new request, the customer enters the waiting state while the system executes the request. Each of the Mr terminals is associated with one class r customer. Thus, there exists Mr requests in the system, where each request cycles between spending Rr units of time executing and Zr time units thinking. A closed model is a combination of interactive and batch classes. The closed system analysis begins with the load intensity vector graphics/350fig02.gif and the class descriptor parameters (Di,r, Mr, Zr). The analysis computes the throughputs, response times, and queue lengths of each class.

Figure 13.2. Multiple class closed model.

graphics/13fig02.gif

As with the single-class model, the MVA-based solution of a multiclass system relies on three basic equations applied to each class.

Equation 13.5.1

graphics/13equ51.gif


Equation (13.5.1) is obtained by applying Little's Law separately to each class of customers. If r is a batch class, then Zr is zero. The residence time graphics/350fig02.gif corresponds to the total time a class r customer spends at server i during its execution. It includes the service demand (Di,r) plus the total waiting time at the device. The average response time of class r customers can then be written as graphics/350fig01.gif.

Applying Little's Law and the Forced Flow Law to each service center yields Eq. (13.5.2).

Equation 13.5.2

graphics/13equ52.gif


Summing up customers of all classes at device i gives the total number of customers at that device, graphics/352fig03.gif.

Equation 13.5.3

graphics/13equ53.gif


The key equation of the MVA technique can be derived from the observation that the mean response time of a class r customer at service center i equals its own mean service time at that device plus the time to complete the mean backlog seen upon its arrival (i.e., the average number of customers seen upon arrival multiplied by each customer's mean service time). Therefore,

Equation 13.5.4

graphics/13equ54.gif


where graphics/351fig01.gif is the average queue length at device i seen by an arriving class r customer. As seen in Eq. (13.5.4), the average residence time is a function of the service demand and not of the individual values of the service times and visit ratios.

In the case of a delay server, it follows directly by definition that the backlog is zero, graphics/351fig05.gif, which makes graphics/351fig02.gif = Di,r. When the scheduling discipline of center i is PS or LCFS-PR, the expression graphics/351fig03.gif can be viewed as an inflation factor of the service demand due to the congestion by other customers. For FCFS service centers, Eq. (13.5.4) represents the customer's own service demand plus the time to complete the service of all customers in front of it. For practical purposes, scheduling disciplines can be grouped into two categories: delay and queuing. The latter encompasses load-independent servers with the following disciplines: PS, LCFS-PR, and FCFS.

Having as a starting point the fact that the queue length is zero when there are no customers in the network graphics/352fig01.gif, Eqs. (13.5.1), (13.5.2), and (13.5.4) can be used iteratively to calculate the performance measures of the model. However, there is a minor problem; no expression for graphics/352fig02.gif has yet been given. Given that expression it would be easy to solve a multiclass model. Multi-class model solution techniques are grouped into either exact or approximate solutions, depending on the way the backlog seen upon arrival [i.e., graphics/352fig02.gif] is calculated.

13.5.1 Exact Solution Algorithm

As pointed out in Chapter 10, an exact solution technique means the exact solution of analytic formulas of approximate models of actual systems. The key to the exact solution of multiclass closed queuing networks is the arrival theorem [16, 17], which states that a class r customer arriving at service center i in a system with population graphics/349fig02.gif sees the distribution of the number of customers in that center as being equal to the steady-state distribution for a network with population graphics/352fig07.gif. The vector graphics/352fig08.gif consists of a 1 in the rth position and zeros in the rest of the vector [i.e., (0, 0, ..., 1, ..., 0)]. In other words, it states simply that the arriving customer sees the system in equilibrium with itself removed. In the transaction system example with population graphics/349fig02.gif = (1, 3), a query transaction that arrives at the CPU sees a queuing distribution equal to that of the system with population graphics/349fig02.gif = (1, 2). From the arrival theorem, it follows that

Equation 13.5.5

graphics/13equ55.gif


Combining Eqs. (13.5.1), (13.5.2), (13.5.3), (13.5.4), and (13.5.5), the algorithm for the exact solution of closed multiclass models is described in Figure 13.3.

Figure 13.3. Exact MVA algorithm for multiple classes.

graphics/13fig03.gif

13.5.2 Closed Models: Case Study

Recall the motivating problem of Section 13.3. The algorithm of Figure 13.3 can be used to obtain the performance measures for each class. The first step is to apply the exact algorithm to the baseline model described in Table 13.1 [i.e., to calculate the results for the population of one update and three query transactions, graphics/349fig02.gif = (1, 3)]. From the arrival theorem, to calculate the residence time at the devices (processor and two disks) for population (1, 3), the device queue lengths are required for populations (0, 3) and (1, 2). These correspond to one less update transaction and one less query transaction, respectively. By continually removing one customer from each class, eventually the performance measures for population (0, 0) are calculated, which is the starting point of the algorithm. Figure 13.4 shows the precedence relationships required to calculate the results of a system with population (1, 3) using exact MVA.

Table 13.3 shows for each population of the sequence of Figure 13.4 the results calculated by the exact MVA algorithm for the baseline model of the transaction system example. It is worth noting that each class can be calculated separately, as indicated in Table 13.3. However, the interaction among the multiple classes is explicitly represented by the term graphics/354fig02.gif in the equation graphics/354fig03.gif. The average number of customers at device i reflects the contention for common shared resources among the distinct classes of the workload.

Table 13.3. Step-by-Step Results of the Two-Class Model Solution

Class

Variable

Population (Update, Query)

(0,0)

(0,1)

(0,2)

(0,3)

(1,0)

(1,1)

(1,2)

(1,3)

Query

R'1,q

R'2,q

R'3,q

-

-

-

0.105

0.180

0.000

0.144

0.294

0.000

0.174

0.422

0.000

-

-

-

0.141

0.259

0.000

0.177

0.388

0.000

0.204

0.529

0.000

X0,q

-

3.509

4.566

5.034

-

2.500

3.540

4.093

n1,q

n2,q

n3,q

0

0

0

0.368

0.632

0.000

0.658

1.342

0.000

0.876

2.124

0.000

-

-

-

0.352

0.648

0.000

0.627

1.373

0.000

0.835

2.165

0.000

Update

R'1,u

R'2,u

R'3,u

-

-

-

-

-

-

-

-

-

-

-

-

0.375

0.480

0.240

0.513

0.783

0.240

0.622

1.124

0.240

0.704

1.500

0.240

X0,u

-

-

-

-

0.913

0.651

0.504

0.409

n1,u

n1,u

n1,u

0

0

0

-

-

-

-

-

-

-

-

-

0.343

0.438

0.219

0.334

0.510

0.156

0.313

0.566

0.121

0.288

0.614

0.098

From Table 13.3 it is noted that the calculated throughputs for the query and update classes are 4.093 tps and 0.409 tps, respectively, which match the measurement results of Table 13.1. Either using Little's Law, or summing the average device residence times, the average response times are obtained for the two classes: 0.733 sec for queries and 2.444 sec for updates. Once the baseline model has been solved and validated, attention is directed to the prediction phase. To construct a predictive model, the parameters of the predicted system need to be specified. Consider the two questions posed in Section 13.3.

Figure 13.4. Sequence of calculations of MVA.

graphics/13fig04.gif

  1. What is the predicted increase in the throughput of query transactions if the load of the update class is moved to off-peak hours?

    Answer: Since the update class will be removed, it is reasonable to assume that the number of transaction in the system will remain the same and the system will have a population of 4 query transactions. Solving a single-class model with 4 queries, a throughput of 5.275 tps is obtained. This new value indicates that the removal of the update class increases throughput by 28.87%.

  2. Realizing that disk 1 is the bottleneck (i.e., the device with the highest utilization) and disk 2 is lightly loaded, what is the predicted response time if the total I/O load of query transactions is moved to disk 2?

    Answer: To construct the predictive model, it is only necessary to shift the value of D2,q to D3,q, to indicate that the I/O load of query transactions will be moved from disk 1 to disk 2. With the new parameters, the model is resolved to give the following results: X0,q = 4.335 tps, X0,u = 0.517 tps, Rq = 0.692 sec, and Ru = 1.934 sec. These results indicate a reduction of 5.6% in the average response time of queries and 20.9% in the mean response time of update transactions. Why does the proposed modification favor the update class? First, consider the system performance measures obtained by the baseline and predictive models, respectively.

    From Table 13.4, note that the proposed modification changes the bottleneck from disk 1 to disk 2 and, at the same time, provides a better balance of disk utilization. Now, consider where the two types of transactions spend their time. Let the residence time percentage be the time a transaction spends at device i, expressed as a percentage of the average response time for the transaction [i.e., graphics/356fig01.gif].

    Table 13.4. Device Utilization

    Model

    Class

    Utilization (%)

    Processor

    Disk 1

    Disk 2

    Baseline

    Query

    Update

    Total

    42.98

    15.33

    58.31

    73.67

    19.63

    93.3

    0

    9.81

    9.81

    Modified

    Query

    Update

    Total

    45.51

    19.38

    64.89

    0

    24.81

    24.81

    78.03

    12.40

    90.43

    From Table 13.5 note that in the baseline model, query transactions spend 72.2% of their average response time at the bottleneck device, whereas update transactions spend 61.4%. When the I/O load of query transactions is moved to disk 2, it becomes the bottleneck. Update transactions benefit more from the modification because they get better disk utilization balance. To confirm this, the results in Table 13.5 show that update transactions spend 24.8% and 38.8% of their time at disk 1 and disk 2, respectively. Moreover, disk 1 has no contention, since it is dedicated to the update transaction class. In contrast, query transactions concentrate their I/O on Disk 2, which is also used by updates.

    Table 13.5. Residence Time Percentage

    Model

    Class

    Residence Time (%)

    Processor

    Disk 1

    Disk 2

    Baseline

    Query

    Update

    27.8

    28.8

    72.2

    61.4

    0

    9.8

    Modified

    Query

    Update

    31.5

    36.4

    0

    24.8

    68.5

    38.8

13.5.3 Approximate Solution Algorithms

Table 13.3 provides a good idea regarding the number of operations required to compute the results of the simple transaction system example that consists of only 2 classes, 4 customers, and 3 devices. Not surprising, the computational effort required to compute performance measures of models of modern systems with several classes, many processes in execution, and hundreds of I/O devices is "significant."

Looking at Figure 13.4, note that to compute the results of the model with population (1, 2), the results of populations (1, 1) and (0, 2) are needed. Each of these populations requires, as input, queue lengths from two related populations. In the general case, when a system has R classes, each calculation of metrics for a given population demands inputs from R other populations. Due to the precedence relationships in the calculation of performance measures of a multiclass model, the computational complexity of MVA algorithms grows exponentially with the number of classes. The number of multiplications and the number of additions required to solve a multiclass model is proportional to

Equation 13.5.6

graphics/13equ56.gif


Modeling distributed systems often involve large queuing networks. For example, consider a multi-tier distributed server environment, modeled as multiclass models [7, 10]. The workload may be viewed as processes running at the front servers, which during their execution require access to a number of files that can be either local or in the back-end servers. Assume that each process visits three back-end servers for file services. Suppose that we have a high speed LAN-based system with 20 front-end servers (processor and disk), 5 back-end servers and 1 process per server. To represent the different routings and service requests, a separate class is used for each process at the front-end servers. Thus, a closed model is constructed with 20 classes, 5 service centers (i.e., processor, disk, and 3 back-end servers), and 1 customer in each class. Substituting these numbers into Eq. (13.5.6), the solution of this hypothetical model would require approximately 104 million operations. So how can one avoid the high computational cost of exact MVA algorithms?

The source of the large number of operations required to compute the exact MVA algorithm is the recursion expressed in the equation

Equation 13.5.7

graphics/13equ57.gif


With the goal of reducing the time and space complexity of the MVA algorithm, several approximations have been proposed to break up the recursion [18]. The approximate algorithms reduce the complexity by substituting approximations for graphics/358fig01.gif that are not recursive cite13-Sevcik00. A very simple approximation, due to Bard [4], is as follows:

Equation 13.5.8

graphics/13equ58.gif


This approximation is attractive when the number of customers becomes large. The most commonly used approximation is one proposed by Schweitzer [19] for BCMP models. It is based on the assumption that the number of class r customers at each service center increases proportionally to the number of class r customers in the network. From this observation, it follows that

Equation 13.5.9

graphics/13equ59.gif


Equation 13.5.10

graphics/13equ510.gif


Equation (13.5.10) is the basis of an iterative method for calculating performance measures of a closed model, described by the algorithm of Figure 13.5. The basic idea is to substitute estimates for queue lengths [graphics/359fig01.gif] into Eq. (13.5.10) and use the MVA equations to compute the next estimates for graphics/359fig02.gif. The iterative process starts with an estimate graphics/359fig03.gif which assumes that class r customers are equally distributed among the K devices of the network. Iteration stops when successive values of graphics/359fig02.gif are sufficiently close [i.e.,graphics/359fig05.gif where is a tolerance bound]. Note that in the algorithm, the notation Di,r > 0.

The computational cost of solving the iterative method is proportional to the product (K x R) per iteration and typical errors are quite small. In several experiments reported in [21], the average errors in throughput and response time for various approximate MVA algorithms are around 2% for relatively small populations and less than 1% for relatively large populations.

For example, the iterative solution of the model of the hypothetical distributed system discussed in Section 13.5.1 would require a number of operations proportional to (5 x 20 = 100), as opposed to 104 million operations for the exact solution. Although approximate methods provide a cheaper solution for product-form multiclass models, they have a serious drawback. The commonly used approximate methods for solving multiclass models do not provide bounds on the errors introduced by the approximations. Therefore, to assess the approximation's reliability, one has to validate the results against the exact MVA results, which may be impossible in the case of large systems.

To exemplify the accuracy of Schweitzer's approximation, compare the exact results of the transaction system example with those calculated by the iterative process, given that the maximum tolerance for the absolute difference between successive values of the queue lengths is 0.01. Table 13.6 shows the throughput and response time computed under both methods. As seen, the maximum observed relative error is 2.25%.

Table 13.6. Exact and Approximate Results

Method

Throughput

Response Time

Query

Update

Query

Update

Exact MVA

4.093

0.409

0.733

2.445

Approximate

4.001

0.407

0.749

2.456

Relative error (|%|)

2.25

0.49

0.22

0.45

Example 13.2.

Consider a closed model that represents a packet switching store and forward communication network with flow control mechanism [15]. Consider that each virtual channel is represented by one class. Thus, the number of customers in each class represents the window size (i.e., the maximum number of messages allowed on the virtual channel). To keep the example simple, assume that the devices of the model represent the sink node, the source node, and the virtual channel. All devices are assumed to meet the BCMP requirements. The model to be solved has the following parameters: 2 classes (i.e., 2 virtual channels), N1 = 20, N2 = 25 (window sizes), graphics/360fig01.gif = (0.0030, 0.0009,0.0123), and graphics/360fig02.gif = (0.0035, 0.0215,0.0011). Table 13.7 shows the iterations required by the Schweitzer's approximation to solve this model. The termination criterion is to have 0.01 as the maximum difference between successive values of Ni,r. Within 9 iterations, the approximate method obtains results whose maximum error compared to the exact results is 1.59%. Notice that the exact solution requires the computation of queue lengths for 6 (26 x 21) different populations.

Table 13.7. Approximate MVA Computation

Iteration

Class 1

Class 2

Queue Length

TPUT

Queue Length

TPUT

Channel

Source

Sink

Channel

Source

Sink

1

8.333

8.333

8.333

78.80

6.667

6.667

6.667

61.13

2

3.704

1.111

15.185

81.07

3.352

20.593

1.054

48.34

3

1.914

1.652

16.433

79.73

1.341

22.745

0.915

44.53

4

0.995

1.817

17.189

78.26

0.655

23.448

0.897

43.50

5

0.610

1.843

17.545

77.43

0.399

23.688

0.911

43.22

6

0.459

1.842

17.697

77.05

0.301

23.775

0.923

43.14

7

0.401

1.839

17.758

76.90

0.264

23.806

0.929

43.11

8

0.379

1.837

17.782

76.83

0.249

23.818

0.932

43.10

9

0.371

1.837

17.791

76.83

0.244

23.822

0.933

43.10

Exact

0.377

1.863

17.760

77.43

0.246

23.813

0.942

43.27

Error (|%|)

1.59

1.39

0.17

0.77

0.81

0.03

0.98

0.39



Performance by Design. Computer Capacity Planning by Example
Performance by Design: Computer Capacity Planning By Example
ISBN: 0130906735
EAN: 2147483647
Year: 2003
Pages: 166

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net