Performance engineering involves experimentation in addition to modeling. Different alternatives can be compared by designing experiments, conducting experiments, analyzing the results, and drawing conclusions.
Many factors may affect the performance of the computer systems being compared, and each factor may have several levels. Consider the purchase of a new Web server. Several factors directly influence the performance of the Web server, including processor speed, number of processors, and amount of main memory. Each of these factors may have more than one level, as indicated in Table 6.6.
An exhaustive evaluation of all options, considering all possible combinations of factors and levels, would require 48 (= 4 × 4 × 3) different experiments. This is called a full factorial design evaluation. The number of experiments in a full factorial design evaluation may be too large, making the experimental process time consuming and expensive. A significant reduction in the number of experiments is achieved by reducing the number of levels of each factor and/or eliminating factors that do not make a significant contribution to overall performance.
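The enumeration behind a full factorial design can be sketched in a few lines of Python. Table 6.6 is not reproduced here, so the level values below are hypothetical placeholders chosen only to give 4, 4, and 3 levels; the assignment of level counts to factors is likewise an assumption.

```python
from itertools import product

# Hypothetical levels (NOT the values of Table 6.6); only the
# counts 4, 4, and 3 are taken from the text.
levels = {
    "processor_speed_ghz": [2.0, 2.4, 2.8, 3.1],
    "num_processors": [1, 2, 4, 8],
    "main_memory_gb": [1, 2, 4],
}

# A full factorial design evaluates every combination of levels.
experiments = [
    dict(zip(levels, combo)) for combo in product(*levels.values())
]

print(len(experiments))  # 4 * 4 * 3 = 48 experiments
print(experiments[0])
```

Removing one level from a factor, or dropping a factor altogether, shrinks the product and therefore the number of experiments to run.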
A method for eliminating factors that are less relevant is a 2^k factorial design. The basic idea is to consider only two levels for each of the k factors. When factors affect performance monotonically (e.g., performance improves monotonically as the processor speed increases), the minimum and maximum levels of each factor are evaluated to determine whether or not the factor has a significant performance impact. For example, increasing the processor speed from 2.0 GHz to 3.1 GHz improves performance. By conducting experiments for two levels only, 2.0 GHz and 3.1 GHz, the effect of this factor can be determined.
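As a minimal sketch of how a two-level design can expose the less relevant factors, the Python fragment below estimates the main effect of each factor as the difference between the mean response at its maximum level (+1) and at its minimum level (-1). The factor names and the throughput figures are hypothetical and serve only as an illustration; this is one common way to summarize such a design, not necessarily the procedure intended by the text.

```python
from itertools import product

def main_effects(factors, response):
    """Estimate each factor's main effect in a two-level design.

    factors:  list of factor names (k of them)
    response: dict mapping a tuple of -1/+1 levels, one per factor,
              to the measured response of that run
    """
    runs = list(product((-1, +1), repeat=len(factors)))  # 2^k runs
    effects = {}
    for i, name in enumerate(factors):
        high = [response[r] for r in runs if r[i] == +1]
        low = [response[r] for r in runs if r[i] == -1]
        # Mean response at the high level minus mean at the low level.
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects

# Hypothetical throughput measurements (requests/s) for k = 3 factors.
factors = ["processor_speed", "num_processors", "main_memory"]
response = {
    (-1, -1, -1): 100.0, (-1, -1, +1): 104.0,
    (-1, +1, -1): 150.0, (-1, +1, +1): 156.0,
    (+1, -1, -1): 140.0, (+1, -1, +1): 144.0,
    (+1, +1, -1): 210.0, (+1, +1, +1): 216.0,
}

effects = main_effects(factors, response)
for name, effect in effects.items():
    print(f"{name}: {effect:+.1f}")
```

With these made-up numbers, the effect of main memory is small relative to the other two factors, so it would be a candidate for elimination before running a more detailed design.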
An important aspect of an experiment is the workload and initial conditions used in the experiment. For example, selecting a representative workload and replaying it on the system under different configurations is effective. Care must be taken to ensure that different initial conditions, such as the contents of various caches and buffers, do not distort the results.
The results of experiments always contain some degree of experimental error. This error contributes to variation within the measured results. Experimental error may come from non-controllable factors that affect the results. Examples include extraneous load on the network, caching activity by file systems, garbage collection activities, paging activities, and other operating system background management activities. Thus, the variation in the results is due to: 1) different levels of the design factors involved, 2) interaction between factors, and 3) experimental error.
A technique known as ANOVA (Analysis of Variance) can be used to separate the observed variation into two main components: variation that can be attributed to assignable causes (e.g., amount of main memory or number of processors) and uncontrollable variation (e.g., network load, operating system background activities). A detailed description of ANOVA is outside the scope of this book. The interested reader may refer to . However, it is useful to mention that single-factor and two-factor ANOVA can be easily performed in MS Excel using the Tools > Data Analysis facility.
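To make the idea of separating variation concrete, the following Python sketch performs the sum-of-squares decomposition of a single-factor ANOVA. It is only an illustration under stated assumptions: the response times are hypothetical, each inner list holds repeated measurements at one level of a single factor, and the function name is made up for this example.

```python
from statistics import mean

def one_way_anova(groups):
    """Split total variation into between-level and within-level parts.

    groups: list of lists; each inner list holds the repeated
            measurements taken at one level of the factor.
    Returns (ss_between, ss_within, f_statistic).
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation attributable to the factor (assignable cause).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left within each level (experimental error).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_statistic

# Hypothetical response times (ms) at three levels of main memory.
measurements = [[10, 12, 11], [14, 15, 16], [20, 19, 21]]
ss_between, ss_within, f_statistic = one_way_anova(measurements)
print(ss_between, ss_within, f_statistic)
```

A large F statistic suggests that the factor explains much more variation than experimental error does. Judging significance requires comparing it against the F distribution with the corresponding degrees of freedom, which this sketch does not do.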
Confidence intervals can be used as a simple method for comparing two alternatives, as explained via the following simple example. Suppose that management is interested in comparing the performance of their Web server with that of a new Web server. The performance analyst conducts an experiment to determine if the performance obtained from the two Web servers is different at a 95% confidence level.
The analyst carries out the following steps:
The above four steps are performed to determine if the two Web servers give significantly different performance. A negative value of D_new-orig indicates the new server downloads files faster than the original server. The results of running the experiments are shown in Table 6.7. As indicated by the table, the 95% confidence interval for the mean of the difference in PDF file download times is [-0.0380, -0.0334], which does not include zero. Similarly, for ZIP files, the 95% confidence interval for the mean of the difference in download times is [-0.1160, -0.1058], which also does not include zero.
Thus, at the 95% confidence level, the new server outperforms the original server.
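A comparison of this kind can be sketched in Python as follows. The sketch assumes that the quantity of interest is the per-file difference between the download time on the new server and on the original server, and it uses the normal approximation for the interval, which is reasonable only for large samples; with few observations, the corresponding Student's t quantile should replace z. The download times below are hypothetical and are not the data behind Table 6.7.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def paired_difference_ci(new_times, orig_times, confidence=0.95):
    """Confidence interval for the mean of (new - orig) differences.

    Uses the normal approximation (an assumption of this sketch).
    """
    diffs = [n - o for n, o in zip(new_times, orig_times)]
    d_bar = mean(diffs)
    std_err = stdev(diffs) / sqrt(len(diffs))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return d_bar - z * std_err, d_bar + z * std_err

# Hypothetical download times (seconds) for the same ten files.
orig = [1.20, 1.25, 1.18, 1.22, 1.30, 1.21, 1.19, 1.27, 1.24, 1.23]
new = [1.10, 1.16, 1.09, 1.12, 1.19, 1.12, 1.10, 1.16, 1.15, 1.13]

low, high = paired_difference_ci(new, orig)
print(f"95% CI for mean difference: [{low:.4f}, {high:.4f}]")

# If the interval excludes zero, the two servers differ significantly;
# an entirely negative interval means the new server is faster.
significantly_faster = high < 0
print(significantly_faster)
```

If the interval contained zero, the experiment would not support the claim that the two servers perform differently at the chosen confidence level.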