Blocking is a general phenomenon in computer systems, which occurs when the operation of a system component is suspended because of the unavailability of resources elsewhere in the system [3]. The literature [11] reports three different types of blocking: 1) blocking after service, 2) blocking before service, and 3) repetitive blocking. The latter occurs when a customer at service center j wants to go to center k, whose queue is full. In this case, the customer is rejected at center k and goes to the rear of the queue of center j. This section shows how to model blocking before service, i.e., blocking that occurs when the customer to be served next at center i determines that its next service center is j before entering service at i. If j is already full, the customer cannot enter the service center i and this center is therefore blocked. For example, when the load of a mail server becomes higher than a given threshold, the server queues messages rather than attempting to deliver them. In other words, the send message service is blocked until the load decreases and the system can resume an adequate throughput. Another example is that of a Web server [5], which typically handles multiple client requests simultaneously. These requests share the the server components (e.g., the processor, disk, memory, and network interface). A common architecture for a Web server consists of a number of processes, where each process handles one request at a time. When the TCP connection is established, the server removes it from the SYNRCVD queue and places it in the listen sockets queue (i.e., acceptance queue) of connections awaiting acceptance from the Web server [6]. Usually, a Web server imposes a limit on the number of simultaneous processes [5], also called maximum number of clients. Thus, a request may get blocked at the acceptance queue until the server load falls below the maximum number of clients threshold. The maximum number of processes that may be active simultaneously depends on factors such as available memory, disk subsystem capacity, and the expected load on the server. When the number of requests being served reaches the maximum number of processes, an arriving request must wait in the blocking queue, as illustrated in Fig. 15.4. When a request is completed, another request is selected from the blocking queue to be executed by the system. Figure 15.4 depicts a representation of a system with a blocking queue. The number of available processes is represented by a pool of tokens. If all tokens are occupied, an arriving request must wait. Once a request acquires a process (represented by a token), it becomes ready to use processor and I/O components. After completing execution, a process releases the token, which is immediately assigned to a process in the blocking queue. If the queue is empty, the token returns to the pool of available resources. Given that this model cannot be solved directly by productform (e.g., MVA) algorithms, an approximate solution is required. Techniques for solving non productform models rely heavily on the use of the flowequivalent concept, which was introduced in Chapter 14, as the decomposition/aggregation technique. Figure 15.4. A system with a blocking queue.
15.3.1 Single ClassLet J denote the the maximum number of transactions that may be executed simultaneously in a transaction system. When a transaction arrives and finds J transactions in execution, it has to wait outside the system, in a blocking queue. When a transaction completes execution, another one, from the blocking queue, is admitted into the system. Consider for now that a singleclass queuing model is used to represent the computer system with blocking. The workload consists of transactions processed by the system and can be modeled either by an open class (transaction type) or by a closed class (interactive type). In both cases, the blocking effects appear when the number of transactions to be executed exceeds J. This situation generates a blocking queue, whose length varies with time. The solution of singleclass models with blocking queuing is typical of many other non productform models. It proceeds in five steps.
Example 15.2.A database server has a processor and two disks (D1 and D2) and is responsible for processing transactions that access the database. For workload characterization purposes, the transactions are grouped into a single class, which means that they are somehow similar. A typical transaction is characterized by the following service demands: D_{cpu} = 0.1 sec and D_{D1} = D_{D2} = 0.45 sec. The arrival rate is l = 1 tps. With the purpose of keeping the example simple, the maximum number of transactions was set to three (i.e., J = 3). The performance analyst wants to evaluate the effect of the blocking queue (bq) on the transaction response time. In other words, the model should be able to calculate the percentage of the response time that a transaction spends waiting in the blocking queue, before being admitted for execution (%bq). To answer the question, the analyst needs to calculate the average response time (R_{s}) and the average waiting time (R_{bq}) in the blocking queue. From Little's Law, it follows that
Equation 15.3.11
where N_{s} is the average number of transactions in the system and N_{bq} is the average number of transactions waiting for execution. Following the steps to solve a non productform model, one first calculates the server throughput in isolation for all possible populations. In this case, because of the memory constraint, the server can have at most three transactions in execution (n = 1, 2, 3). The singleclass MVA algorithm of Chapter 12 yields X(1) = 1 tps, X(2) = 1.413 tps, and X(3) = 1.623 tps. With these results one can define a flowequivalent server for the database system. The reduced model is an open system with a loaddependent server and arrival rate l. Using the results introduced in Chapter 10 for birthdeath models one is able to calculate P_{k}, the probability that there are k transactions in the system. Thus, letting l_{i} = l and m_{i} = X(i), the probabilities P_{0} = 0.260, P_{1} = 0.260, P_{2} = 0.184, and P_{3} = 0.113 are obtained. The value of N_{bq} can be obtained from the probabilities P_{k} as
Equation 15.3.12
since when there are k (k > 1) transactions in the blocking queue there are k + 3 in execution. Using the probability values in Eq. (15.3.12) yields N_{bq} = 0.474. The average number of transactions in the system is given by
Equation 15.3.13
where N_{e} is the average number of transactions in execution given by
Equation 15.3.14
Substituting the values of P_{k} and J into Eq. (15.3.14), yields N_{e} = 1.516. Thus, the average number of transactions in the system is 1.99 and the percentage of time that a typical transaction spends in the blocking queue is (0.474 x 100)/1.99 = 23.8%. 15.3.2 Multiple ClassesChapter 13 showed the importance of multipleclass models for performance engineering. Therefore, one needs to generalize the solution of blocking models to multiple classes. The queuing network models considered here contain K service centers and a number R of different classes of customers. Among the customer classes, C classes (C J_{c}, c = 1, 2, ..., C. For example, consider a simple twoclass model with a processor and two disks. The workload consists of two classes of transactions: queries (q) and updates (u). The execution mix can have at most four queries and one update simultaneously (i.e., J_{q} = 4 and J_{u} = 1). The two classes are characterized by their arrival rates, l_{q} = 4.20 tps and l_{u} = 0.20 tps. Typical capacity planning questions are: What is the effect on the system performance of the doubling the maximum number of transactions in execution at the same time? To answer these questions, one needs to introduce first approximate solution techniques for the blocking queuing problem. Brandwajn [10] and Menascé and Almeida [24] independently developed a noniterative solution technique for multipleclass models with memory constraints, i.e., blocking characteristics. Basically, the technique requires the calculation of throughputs for all possible combinations of customer populations. These throughputs feed a series of singleclass models, whose results are then combined into the solution of the original problem. Although accurate, the technique has a high computational cost. An iterative approximation, developed by Brandwajn [10] and Lazowska and Zahorjan [23], circumvents the problem of computing throughputs for all populations. However, as pointed out in Ref. [37], the iterative algorithm tends to be less accurate than the first technique. The basic approach of the iterative technique developed in Ref. [23] is to reduce a multiclass model to a set of singleclass models. To achieve this goal, two assumptions are introduced.
The first assumption allows the creation of flowequivalent servers for each blocking class c (c = 1, 2, ..., C). The second assumption avoids the calculation of throughputs for all possible population configurations. The throughput of a given class i with population n_{i}, X_{i}(n_{i}), is obtained from the evaluation of the multiclass model with n_{i} class i customers and with other class populations fixed at their average values, n_{j} = , j i. The average customer population can be obtained by an iterative process that evaluates the singleclass model of each blocking class. The full description of the algorithm proposed by Lazowska and Zahorjan [23] blocking models with multiple classes is as follows.
Example 15.3.Consider again the simple twoclass model described in the beginning of this section. Table 15.1 displays the parameters that characterize the model. Step 1 of the algorithm requires that one solves an open multiclass unconstrained model as given in Chapter 13. The resulting values of the average population are 6.0192 and 0.8540 transactions for classes query and update, respectively. So and are initialized as = min{4,6.0192} = 4 and = min{1,0.8540} = 0.8540. Table 15.2 shows the first four iterations of step 3 of the algorithm. The columns labeled X_{Q}(1) through X_{Q}(4) display the throughputs obtained in step 3a for class query, and the column labeled X_{U}(1) shows the throughput obtained in the same step for class update. The columns labeled and display the values obtained in step 3 (d) derived from the solution of a FESC model for classes query and update, respectively. For instance, if one wanted to stop at iteration 4, the response time for update transactions would be obtained by dividing the average number of transactions in the system (N_{s}) obtained by solving the FESC model (1.077 in this case) by the average arrival rate (0.2 tps). The resulting response time would then be 5.83 sec. Table 15.3 shows the solution of this example for a tolerance of 10^{ 2}. This table shows that an update transaction spends, on average, 51.74% of the response time waiting for execution. It is clear that the system needs more processes to execute the transactions. The same model can be used to investigate the effect on the performance of update transactions of an increase in the maximum number of processes. Assume that the maximum number of update transactions in execution is doubled. Thus, letting J_{u} = 2 and solving the model again, one obtains 3.09 sec and 2.33 sec for the average response times of updates and queries, respectively. In this case, an update transaction would spend only 0.29 sec waiting for execution.
