4.5 Workload Model | Performance by Design: Computer Capacity Planning By Example

A workload model is a representation that mimics the real workload under study. It can be a set of programs written and implemented with the goal of artificially testing a system in a controlled environment. A workload model may also serve as input data for an analytical model of a system. It is not practical to have a model composed of thousands of basic components to mimic the real workload. Workload models should be compact and representative of the actual workload [14].

4.5.1 Types of Workload Models

Workload models can be classified into two main categories:

Natural models are constructed either using basic components of the real workload as building blocks or using execution traces of the real workload. A natural benchmark consists of programs extracted from the real workload of a system. These programs are selected so that the benchmark represents the overall system load in given periods of time. Another natural model often used in performance studies is a workload trace. It consists of a chronological sequence of data representing specific events that occurred in the system during a measurement session. For example, in the case of a Web server, the log access contains one entry per HTTP request processed by the server. Among other information, each log entry specifies the name of the host making the request, the timestamp, and the name of the file requested. This type of log characterizes the real workload during a given period of time. Although traces exhibit reproducibility and representativeness features, they do have drawbacks. Usually, traces consist of huge amounts of data, which increases the complexity of using them in modeling exercises. It is usually difficult to modify a trace to represent different workload scenarios. Moreover, traces are suitable only for simulation or prototype models.
Artificial models do not make use of any basic component of the real workload. Instead, these models are constructed out of special-purpose programs and descriptive parameters. Artificial models are partitioned into two classes: executable and non-executable models. Executable artificial models consist of a suite of programs especially written to experiment with particular aspects of a computer system. The class of executable models include workloads such as instruction mixes, kernels, synthetic programs, artificial benchmarks, and drivers. Instruction mixes are hardware demonstration programs intended to test the speed of a computer on simple computational and I/O operations. Program kernels are pieces of code selected from computationally intense parts of a real program. In general, kernels concentrate on measuring the performance of processors without considering the I/O system. Synthetic programs are specially devised codes that place demands on different resources of a computing system. Unlike benchmarks, synthetic programs do not resemble the real workload. Benchmarks, synthetic programs, and other forms of executable models are not adequate inputs for performance models. When the performance of a system is analyzed through the use of analytic or simulation models, non executable representations for the workload are required. Because the approach to performance engineering relies on the use of analytic models for performance prediction, this book focuses on workload representations suitable for these kinds of models.

Non-executable workload models are described by a set of average parameter values that reproduce the same resource usage as the real workload. Each parameter denotes an aspect of the execution behavior of the basic component on the system under study. The basic inputs to analytical models are parameters that describe the service centers (i.e., hardware and software resources) and the customers (e.g., transactions and requests). Typical parameters are:

Component (e.g., transaction and request) interarrival times,
Service demands,
Component sizes, and
Execution mixes (classes of components and their corresponding levels of multiprogramming).

Each type of system may be characterized by a different set of parameters. As an example, consider a parametric characterization of a workload of a distributed system file server [5]. Several factors have a direct influence on the performance of a file server: the system load, the device capability, and the locality of file references. From these factors, the following parameters are defined:

Frequency distribution of the requests: describes the participation of each request (e.g., read, write, create, rename) on the total workload.
Request interarrival time distribution: indicates the time between successive requests. It also indicates the intensity of the system load.
File referencing behavior: describes the percentage of accesses made to each file in the disk subsystem.
Size of reads and writes: indicates the I/O load. This parameter has a strong influence on the time needed to service a request.

The above parameters specify the workload model and are capable of driving synthetic programs that accurately represent real workloads.

Another study [22] looks at an I/O workload from a different perspective. The workload model for a storage device (i.e., queues, caches, controllers, disks) is specified by three classes of parameters: access time attributes, access type (i.e., fraction of reads and writes), and spatial locality attributes [1]. Access time attributes capture the time access pattern, which includes the arrival process (i.e., deterministic, Poisson, bursty), arrival rate, burst rate, burst count, and burst fraction. Spatial locality attributes specify the relationship between consecutive requests, such as sequential and random accesses. Due to "seek" delays, workloads with a larger fraction of sequential accesses typically exhibit better performance than those with random accesses. Thus, for a detailed model of an I/O device, it is important to include the spatial locality attributes of the workload.

4.5.2 Clustering Analysis

Consider a workload that consists of transactions that exhibit a large variability in terms of their CPU and I/O demands. Averaging the demands of all transactions would likely produce a workload model that is not representative of most of the transactions. Therefore, the transactions in the workload have to be grouped, or clustered, so that the variability within each group is relatively small when compared to the variability in the entire data set. These clusters correspond to different classes of a multiclass performance model.

The process of generating these rather homogeneous groups is called clustering analysis. While clustering analysis can be performed automatically by specific functions of software packages (e.g., SPSS, SAS, WEKA), a performance analyst should know the fundamentals of this technique.

A well-known clustering technique, called k-means clustering, is presented here. Assume that a workload is described by a set of p points w_i = (D_i₁, D_i₂, ..., D_{i_M}) in an M-dimensional space, where each point w_i represents a unit of work (e.g., transaction, request) executed by a computer system. The components of a point w_i describe specific properties of a transaction, such as the service demand at a particular device or any other value (e.g., I/O count, bytes transmitted) from which service demands can be obtained.

Many clustering algorithms, including k-means, require that a distance metric between points be defined. A common distance metric is the Euclidean distance, which defines the distance, d_i,j, between two points w_i = (D_i₁, D_i₂, ···, D_{i_M}) and w_j = (D_j₁, D_j₂, ···, D_{j_M}) as

Equation 4.5.2

graphics/04equ52.gif

The use of raw data to compute Euclidean distances may lead to distortions if the components of a point have very different relative values and ranges. For instance, suppose that the workload of a Web server is described by points, each representing an HTTP request, of the type (f, c) where f is the size of the file retrieved by an HTTP request and c is the CPU time required to process the request. Consider the point (25, 30) where f is measured in Kbytes and c in milliseconds. If f is now measured in Mbytes, the point (25, 30) becomes (0.025,30). Such changes of units may result in the modification of the relative distances among groups of points [16]. Scaling techniques should be used to minimize problems that arise from the choice of units and from the different ranges of values of the parameters [16].

The k-means algorithm produces k clusters. Each cluster is represented by its centroid, which is a point whose coordinates are obtained by computing the average of all the coordinates of the points in the cluster. The algorithms starts by determining k points in the workload to act as initial estimates of the centroids of the k clusters. The remaining points are then allocated to the cluster with the nearest centroid. The allocation procedure iterates several times over the input points until no point switches cluster assignment or a maximum number of iterations is performed. Having as input data the p points w_i = (D_i₁, D_i₂, ···, D_{i_M}), the steps required by the k-means algorithm are shown in Fig. 4.5.

Figure 4.5. The k-means clustering algorithm.

graphics/04fig05.gif

Another distance-based clustering algorithm is the minimum spaning tree algorithm described in [16]. Fractal-based clustering of E-commerce workloads is discussed in [15].