15.3 Modeling Blocking Effects

Blocking is a general phenomenon in computer systems, which occurs when the operation of a system component is suspended because of the unavailability of resources elsewhere in the system [3]. The literature [11] reports three different types of blocking: 1) blocking after service, 2) blocking before service, and 3) repetitive blocking. The latter occurs when a customer at service center j wants to go to center k, whose queue is full. In this case, the customer is rejected at center k and goes to the rear of the queue of center j. This section shows how to model blocking before service, i.e., blocking that occurs when the customer to be served next at center i determines that its next service center is j before entering service at i. If j is already full, the customer cannot enter the service center i and this center is therefore blocked. For example, when the load of a mail server becomes higher than a given threshold, the server queues messages rather than attempting to deliver them. In other words, the send message service is blocked until the load decreases and the system can resume an adequate throughput.

Another example is that of a Web server [5], which typically handles multiple client requests simultaneously. These requests share the the server components (e.g., the processor, disk, memory, and network interface). A common architecture for a Web server consists of a number of processes, where each process handles one request at a time. When the TCP connection is established, the server removes it from the SYN-RCVD queue and places it in the listen sockets queue (i.e., acceptance queue) of connections awaiting acceptance from the Web server [6]. Usually, a Web server imposes a limit on the number of simultaneous processes [5], also called maximum number of clients. Thus, a request may get blocked at the acceptance queue until the server load falls below the maximum number of clients threshold.

The maximum number of processes that may be active simultaneously depends on factors such as available memory, disk subsystem capacity, and the expected load on the server. When the number of requests being served reaches the maximum number of processes, an arriving request must wait in the blocking queue, as illustrated in Fig. 15.4. When a request is completed, another request is selected from the blocking queue to be executed by the system. Figure 15.4 depicts a representation of a system with a blocking queue. The number of available processes is represented by a pool of tokens. If all tokens are occupied, an arriving request must wait. Once a request acquires a process (represented by a token), it becomes ready to use processor and I/O components. After completing execution, a process releases the token, which is immediately assigned to a process in the blocking queue. If the queue is empty, the token returns to the pool of available resources. Given that this model cannot be solved directly by product-form (e.g., MVA) algorithms, an approximate solution is required. Techniques for solving non product-form models rely heavily on the use of the flow-equivalent concept, which was introduced in Chapter 14, as the decomposition/aggregation technique.

Figure 15.4. A system with a blocking queue.

graphics/15fig04.gif

15.3.1 Single Class

Let J denote the the maximum number of transactions that may be executed simultaneously in a transaction system. When a transaction arrives and finds J transactions in execution, it has to wait outside the system, in a blocking queue. When a transaction completes execution, another one, from the blocking queue, is admitted into the system. Consider for now that a single-class queuing model is used to represent the computer system with blocking. The workload consists of transactions processed by the system and can be modeled either by an open class (transaction type) or by a closed class (interactive type). In both cases, the blocking effects appear when the number of transactions to be executed exceeds J. This situation generates a blocking queue, whose length varies with time. The solution of single-class models with blocking queuing is typical of many other non product-form models. It proceeds in five steps.

Consider the system in isolation ignoring the external arrivals of transactions.
For each feasible customer population (i.e., from 0 to J transactions), solve the model in isolation and obtain the throughput as a function of the population, X(n), n = 0, 1, 2, ..., J. Use either the exact or the approximate MVA algorithm to solve the model.
Define a flow-equivalent (FE) server to replace the system. The service time of the flow-equivalent server depends on the load of the system and is given by:
Create a reduced model with the flow-equivalent server and the external workload. Fig. 15.5 shows the reduced models for the two representations of the workload. For an interactive workload, the reduced model is closed with population equal to M, the number of clients associated with the class. In addition to the flow-equivalent server, the model has a delay server that represents the average think time (Z). The interactive model can be viewed as having M transactions, which alternate between execution in the flow-equivalent server and thinking in the delay server. The reduced model of the transaction class is an open model with a flow-equivalent server and arrival rate l.

Figure 15.5. Reduced models of a system with blocking queuing.
Solve the reduced model. The closed model can be solved using the single-class MVA algorithm with load-dependent devices, described in Chapter 14. The solution of the open model follows that of the birth-death model provided in Chapter 10.

Example 15.2.

A database server has a processor and two disks (D1 and D2) and is responsible for processing transactions that access the database. For workload characterization purposes, the transactions are grouped into a single class, which means that they are somehow similar. A typical transaction is characterized by the following service demands: D_cpu = 0.1 sec and D_D1 = D_D2 = 0.45 sec. The arrival rate is l = 1 tps. With the purpose of keeping the example simple, the maximum number of transactions was set to three (i.e., J = 3). The performance analyst wants to evaluate the effect of the blocking queue (bq) on the transaction response time. In other words, the model should be able to calculate the percentage of the response time that a transaction spends waiting in the blocking queue, before being admitted for execution (%bq).

To answer the question, the analyst needs to calculate the average response time (R_s) and the average waiting time (R_bq) in the blocking queue. From Little's Law, it follows that

Equation 15.3.11

graphics/15equ311.gif

where N_s is the average number of transactions in the system and N_bq is the average number of transactions waiting for execution.

Following the steps to solve a non product-form model, one first calculates the server throughput in isolation for all possible populations. In this case, because of the memory constraint, the server can have at most three transactions in execution (n = 1, 2, 3). The single-class MVA algorithm of Chapter 12 yields X(1) = 1 tps, X(2) = 1.413 tps, and X(3) = 1.623 tps. With these results one can define a flow-equivalent server for the database system. The reduced model is an open system with a load-dependent server and arrival rate l. Using the results introduced in Chapter 10 for birth-death models one is able to calculate P_k, the probability that there are k transactions in the system. Thus, letting l_i = l and m_i = X(i), the probabilities P₀ = 0.260, P₁ = 0.260, P₂ = 0.184, and P₃ = 0.113 are obtained. The value of N_bq can be obtained from the probabilities P_k as

Equation 15.3.12

graphics/15equ312.gif

since when there are k (k > 1) transactions in the blocking queue there are k + 3 in execution. Using the probability values in Eq. (15.3.12) yields N_bq = 0.474. The average number of transactions in the system is given by

Equation 15.3.13

where N_e is the average number of transactions in execution given by

Equation 15.3.14

graphics/15equ314.gif

Substituting the values of P_k and J into Eq. (15.3.14), yields N_e = 1.516. Thus, the average number of transactions in the system is 1.99 and the percentage of time that a typical transaction spends in the blocking queue is (0.474 x 100)/1.99 = 23.8%.

15.3.2 Multiple Classes

Chapter 13 showed the importance of multiple-class models for performance engineering. Therefore, one needs to generalize the solution of blocking models to multiple classes. The queuing network models considered here contain K service centers and a number R of different classes of customers. Among the customer classes, C classes (C J_c, c = 1, 2, ..., C. For example, consider a simple two-class model with a processor and two disks. The workload consists of two classes of transactions: queries (q) and updates (u). The execution mix can have at most four queries and one update simultaneously (i.e., J_q = 4 and J_u = 1). The two classes are characterized by their arrival rates, l_q = 4.20 tps and l_u = 0.20 tps. Typical capacity planning questions are: What is the effect on the system performance of the doubling the maximum number of transactions in execution at the same time? To answer these questions, one needs to introduce first approximate solution techniques for the blocking queuing problem.

Brandwajn [10] and Menascé and Almeida [24] independently developed a noniterative solution technique for multiple-class models with memory constraints, i.e., blocking characteristics. Basically, the technique requires the calculation of throughputs for all possible combinations of customer populations. These throughputs feed a series of single-class models, whose results are then combined into the solution of the original problem. Although accurate, the technique has a high computational cost. An iterative approximation, developed by Brandwajn [10] and Lazowska and Zahorjan [23], circumvents the problem of computing throughputs for all populations. However, as pointed out in Ref. [37], the iterative algorithm tends to be less accurate than the first technique. The basic approach of the iterative technique developed in Ref. [23] is to reduce a multiclass model to a set of single-class models. To achieve this goal, two assumptions are introduced.

Independence. It is assumed that the average class i population in the blocking model is independent of class j (i j) population.
Average Population. It is assumed that the throughput of class i given n_i customers in the model depends only on the average population of the other classes.

The first assumption allows the creation of flow-equivalent servers for each blocking class c (c = 1, 2, ..., C). The second assumption avoids the calculation of throughputs for all possible population configurations. The throughput of a given class i with population n_i, X_i(n_i), is obtained from the evaluation of the multiclass model with n_i class i customers and with other class populations fixed at their average values, n_j = , j i. The average customer population can be obtained by an iterative process that evaluates the single-class model of each blocking class. The full description of the algorithm proposed by Lazowska and Zahorjan [23] blocking models with multiple classes is as follows.

Initialize the average population of each blocking class. To obtain an initial estimate for the average population of each blocking class, solve the original multiclass models without blocking. If class c is of type transaction it should be modeled as an open class. If class c is of type interactive it should be modeled as a closed class. The techniques for solving the unconstrained model are given in Chapter 13 for closed, open, and mixed multiclass queuing networks. For each blocking class, set the estimate to the minimum between J_c and the average class c population obtained by solving the unconstrained model. Thus, the outcome of this step is a set of estimates for .
Create a transformed multiclass model. The type of a blocking class is either transaction or interactive. Change the original type to a closed, with population equal to . The unconstrained classes remain unchanged.
For each blocking class c = 1, ..., C do:
1. Calculate the throughputs X_c(n_c) for n_c = 1, ..., J_c. These throughputs are obtained through the solution of the transformed multiclass model when is substituted for 1, 2, ..., J_c. The population for class s (s c) remains fixed at . The model has to be solved J_c times.
2. Define a single-class model with memory queuing and throughputs X_c(n_c).
3. Using the FESC-based algorithm of Section 15.3.1, solve the singleclass model.
4. Using Eq. (15.3.14), calculate N_m,c, the average number of class c customers in execution. Let = N_m,c.
Repeat step 3 until successive values of are sufficiently close (i.e., the percentage difference between successive iterations is smaller than a set tolerance).

Obtain performance results for the C constrained classes from the solution of the C single-class models of step 3.
Obtain performance results for the (R C) unconstrained classes from the solution of the transformed multiclass model (step 2), which take as input parameter for the blocking classes the final values for obtained in step 3.

Example 15.3.

Consider again the simple two-class model described in the beginning of this section. Table 15.1 displays the parameters that characterize the model.

Step 1 of the algorithm requires that one solves an open multiclass unconstrained model as given in Chapter 13. The resulting values of the average population are 6.0192 and 0.8540 transactions for classes query and update, respectively. So and are initialized as = min{4,6.0192} = 4 and = min{1,0.8540} = 0.8540. Table 15.2 shows the first four iterations of step 3 of the algorithm. The columns labeled X_Q(1) through X_Q(4) display the throughputs obtained in step 3a for class query, and the column labeled X_U(1) shows the throughput obtained in the same step for class update. The columns labeled and display the values obtained in step 3 (d) derived from the solution of a FESC model for classes query and update, respectively. For instance, if one wanted to stop at iteration 4, the response time for update transactions would be obtained by dividing the average number of transactions in the system (N_s) obtained by solving the FESC model (1.077 in this case) by the average arrival rate (0.2 tps). The resulting response time would then be 5.83 sec. Table 15.3 shows the solution of this example for a tolerance of 10². This table shows that an update transaction spends, on average, 51.74% of the response time waiting for execution. It is clear that the system needs more processes to execute the transactions. The same model can be used to investigate the effect on the performance of update transactions of an increase in the maximum number of processes. Assume that the maximum number of update transactions in execution is doubled. Thus, letting J_u = 2 and solving the model again, one obtains 3.09 sec and 2.33 sec for the average response times of updates and queries, respectively. In this case, an update transaction would spend only 0.29 sec waiting for execution.

Table 15.1. Input Parameters for the Two-Class Model
Class	Service Demand (sec)			Arrival Rate, l	J
Class	Processor	Disk 1	Disk 2	Arrival Rate, l	J
Query Update	0.105 0.375	0.180 0.480	0 0.240	4.20 0.20	4 1

Table 15.2. First Four Iterations for Multiclass Example
Iteration	Query					Update
Iteration	X_Q(1)	X_Q(2)	X_Q(3)	X_Q(4)		X_U(1)
1	2.542	3.577	4.115	4.434	3.648	0.342	0.585
2	2.789	3.835	4.350	4.645	3.355	0.381	0.525
3	2.850	3.897	4.407	4.695	3.289	0.386	0.518
4	2.857	3.904	4.412	4.700	3.282	0.386	0.518

Figure 15.4. A system with a blocking queue.

15.3.1 Single Class

Figure 15.5. Reduced models of a system with blocking queuing.

Example 15.2.

15.3.2 Multiple Classes

Example 15.3.

Table 15.1. Input Parameters for the Two-Class Model

Table 15.2. First Four Iterations for Multiclass Example