2.2 Modeling | Performance by Design: Computer Capacity Planning By Example

A model is an abstraction or generalized overview of a real system. The level of detail of a model and the specific aspects of the real system that are considered in the model depend on the purpose of the model. A model should not be made more complex than is necessary to achieve its goals. For instance, if the purpose is to predict what would happen if more memory were added to the system, it may not be necessary to model (or even completely understand) the specific disk scheduling strategy. On the other hand, knowing the average number of jobs that can fit within an extra megabyte of memory would be necessary in the model.

The only completely reliable model of a system is the system itself (or a duplicate copy). However, it is often infeasible, too costly, or impossible to construct such a prototype model: systems still being designed do not yet exist, and physically altering existing systems is nontrivial. At the other extreme, intuitive models (i.e., those relying on the "experience" or "gut instinct" of one's local "computer guru"), although quick and inexpensive, suffer from inaccuracy and bias. More scientific methods of model building are therefore required.

There are two major types of more scientific models: simulation models and analytic models. Simulation models are based on computer programs that emulate the different dynamic aspects of a system as well as its static structure. The workload is typically a set of customers (i.e., transactions, jobs, commands) that comes from a specified, often observed, trace script or benchmark. Alternatively, the customer workload is generated through a probabilistic process, using random number generators. The flow of customers through the system generates events such as customer arrivals at the waiting line of a server, beginning of service at any given server, end of service, and the selection of which device to visit next. The events are processed in their order of occurrence in time. Counters accumulate statistics that are used at the end of a simulation to estimate the values of several important performance measures. For instance, the average response time, T, at a device (i.e., server) can be estimated as

$$T = \frac{\sum_{i=1}^{n_t} T_i}{n_t}$$

where $T_i$ is the response time experienced by the $i$th transaction and $n_t$ is the total number of transactions that visited the server during the simulation. The value of $T$ obtained in a single simulation run must be viewed as a single point in a sample space. Thus, several simulation runs are required to generate a sample large enough for a statistical analysis to be carried out.
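The estimator above and the need for several runs can be illustrated with a minimal sketch. It assumes, purely for illustration, a first-come-first-served single-server queue with Poisson arrivals and exponentially distributed service times; the function names `simulate_single_server` and `replicate` and all parameter values are hypothetical, not taken from the text. Events are kept in a time-ordered event list (`heapq`), two counters accumulate the sum of the $T_i$ and $n_t$, and independent runs with different random seeds provide a sample of estimates of $T$ from which a mean and an approximate 95% confidence interval are computed.

```python
import heapq
import itertools
import random
import statistics
from collections import deque


def simulate_single_server(arrival_rate, mean_service, n_customers, seed):
    """One simulation run of a FCFS single-server queue.

    Illustrative assumptions: Poisson arrivals (rate arrival_rate) and
    exponential service times (mean mean_service). Returns this run's
    estimate T = (sum of T_i) / n_t.
    """
    rng = random.Random(seed)
    events = []                # event list ordered by time
    order = itertools.count()  # tie-breaker so equal times never compare kinds

    def schedule(time, kind, cust):
        heapq.heappush(events, (time, next(order), kind, cust))

    schedule(rng.expovariate(arrival_rate), "arrival", 0)
    next_id = 1
    arrival_time = {}   # customer id -> arrival time
    waiting = deque()   # waiting line in front of the server
    busy = False
    sum_response = 0.0  # counter: sum of T_i
    n_t = 0             # counter: completed transactions

    while n_t < n_customers:
        now, _, kind, cust = heapq.heappop(events)  # next event in time order
        if kind == "arrival":
            arrival_time[cust] = now
            schedule(now + rng.expovariate(arrival_rate), "arrival", next_id)
            next_id += 1
            if busy:
                waiting.append(cust)
            else:
                busy = True  # beginning of service
                schedule(now + rng.expovariate(1.0 / mean_service), "departure", cust)
        else:  # end of service
            sum_response += now - arrival_time.pop(cust)
            n_t += 1
            if waiting:
                nxt = waiting.popleft()  # FCFS: serve the oldest waiting customer
                schedule(now + rng.expovariate(1.0 / mean_service), "departure", nxt)
            else:
                busy = False
    return sum_response / n_t


def replicate(arrival_rate, mean_service, n_customers, n_runs):
    """Independent runs (one seed each) -> sample mean and ~95% half-width."""
    estimates = [simulate_single_server(arrival_rate, mean_service, n_customers, seed)
                 for seed in range(n_runs)]
    mean = statistics.mean(estimates)
    # Normal approximation; with few runs a Student t quantile would be wider.
    half_width = 1.96 * statistics.stdev(estimates) / len(estimates) ** 0.5
    return mean, half_width


if __name__ == "__main__":
    mean, hw = replicate(arrival_rate=0.5, mean_service=1.0, n_customers=10_000, n_runs=10)
    print(f"T = {mean:.3f} +/- {hw:.3f}")
```

Each run starts with an empty system and no warm-up period is discarded, so individual estimates carry some initial-transient bias. The spread of the ten per-run estimates, rather than any single one, is what supports a statement about the accuracy of $T$.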

Because simulation models generally require a high level of detail, they are often too expensive to develop, validate, and run. On the other hand, once constructed, simulation models allow phenomena to be investigated in great detail. There are good references on simulation techniques [3]-[5].

Analytic models are composed of a set of formulas and/or computational algorithms that provide the values of desired performance measures as a function of the set of input workload parameters. To be mathematically tractable, analytic models are generally less detailed than simulation models. Therefore, they tend to be less accurate but more efficient to run. For example, under certain assumptions to be discussed in later chapters, the average response time, T, of a single-server queue is

$$T = \frac{S}{1 - \lambda S}$$

where $S$ is the average time spent by a typical request at the server (service time) and $\lambda$ is the average arrival rate of customer requests to the server.
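The formula can be evaluated directly, which is what makes analytic models cheap to run. The sketch below is a minimal illustration; the function name and the workload values are hypothetical, and the formula is only meaningful while the utilization $\lambda S$ stays below 1. For a server with $S = 0.1$ s receiving $\lambda = 5$ requests/s, $T = 0.1/(1 - 0.5) = 0.2$ s, twice the service time; at $\lambda = 9$ requests/s, $T = 0.1/(1 - 0.9) = 1$ s, ten times the service time.

```python
def single_server_response_time(arrival_rate, service_time):
    """Average response time T = S / (1 - lambda * S) of a single-server queue.

    Valid only under the assumptions the text defers to later chapters,
    and only while the utilization lambda * S is below 1.
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("arrival_rate * service_time must be below 1")
    return service_time / (1.0 - utilization)


# Hypothetical server with S = 0.1 s under increasing load (requests/s).
for rate in (1, 5, 8, 9):
    t = single_server_response_time(rate, 0.1)
    print(f"lambda = {rate} req/s -> T = {t:.3f} s")
```

Each evaluation is a single arithmetic expression, in contrast to the many thousands of events a simulation run must process to estimate the same quantity.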

The primary advantages of analytic models are:

Analytic models are less expensive to construct and tend to be computationally more efficient to run than simulation models.
Because of their higher level of abstraction, obtaining the values of the input parameters in analytic models is simpler than in simulation models.

The primary advantages of simulation models are:

Simulation models can be made as detailed as needed and can be more accurate than analytic models.
Some system behaviors cannot be captured by analytic models (or can be captured only poorly), which makes simulation necessary.

The reader should be cautioned that simulation models that are not properly validated can produce useless and misleading results. As noted, in some situations exact analytic models are not available or are computationally inefficient. In these cases, one can resort to approximations that make the model easier to solve or solvable more efficiently. The price is paid in fidelity and accuracy, and the accuracy of an approximation is difficult to gauge. Simulation models are quite useful in this regard, since one can always compare the results obtained from a detailed simulation model with those obtained from approximate analytic models. Once the analyst is convinced that the approximation is reasonably accurate, the simulation model can be abandoned in favor of a simpler and more efficient analytic model.
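Checking an approximation against a detailed simulation can be sketched as follows. In this hypothetical scenario, the formula $T = S/(1 - \lambda S)$ is used as an approximate model of a queue whose service times are constant, although the formula is exact only when service times are exponentially distributed (as it is for a single-server FCFS queue with Poisson arrivals). Its prediction is then compared with a simulation of both cases. To keep the block short, the simulation uses Lindley's recursion for the waiting times of a FCFS single-server queue instead of an explicit event list. The names `simulate_fcfs` and `formula_T` and all parameter values are illustrative.

```python
import random


def simulate_fcfs(arrival_rate, sample_service, n_customers, seed):
    """Mean response time of a FCFS single server with Poisson arrivals.

    Waiting times follow Lindley's recursion W_next = max(0, W + S - A);
    sample_service(rng) draws one service time.
    """
    rng = random.Random(seed)
    wait = 0.0   # queueing delay of the current customer (the first finds the server idle)
    total = 0.0  # sum of response times T_i
    for _ in range(n_customers):
        service = sample_service(rng)
        total += wait + service                # T_i = waiting time + service time
        gap = rng.expovariate(arrival_rate)    # time until the next arrival
        wait = max(0.0, wait + service - gap)  # Lindley's recursion
    return total / n_customers


def formula_T(arrival_rate, service_time):
    """Analytic estimate T = S / (1 - lambda * S)."""
    return service_time / (1.0 - arrival_rate * service_time)


if __name__ == "__main__":
    lam, S = 0.5, 1.0  # hypothetical workload: 50% utilization
    predicted = formula_T(lam, S)
    samplers = {
        "exponential": lambda rng: rng.expovariate(1.0 / S),
        "constant": lambda rng: S,
    }
    for name, sampler in samplers.items():
        simulated = simulate_fcfs(lam, sampler, 200_000, seed=1)
        error = abs(predicted - simulated) / simulated
        print(f"{name:11s} simulated T = {simulated:.3f}, "
              f"formula T = {predicted:.3f}, relative error = {error:.1%}")
```

With exponential service, the relative error should reflect only sampling noise. With constant service, the simulated response time is close to 1.5 while the formula predicts 2.0, so the approximation overestimates by roughly a third at this load. That gap could not have been quantified without the detailed model.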

In capacity planning, the analyst is generally interested in being able to quickly compare and evaluate different scenarios. Errors in the 10% to 30% range are often acceptable for this purpose. Because of their efficiency and flexibility, analytic models (exact or approximate) are generally preferable for capacity planning purposes. This is the approach taken in this book.