A strategy for improving Linux performance and scalability includes running several industry-accepted and component-level benchmarks, selecting the appropriate hardware and software, developing benchmark run rules, setting performance and scalability targets, and measuring, analyzing, and improving performance and scalability. Performance is defined as raw throughput on a uniprocessor (UP) or SMP. A distinction is made between SMP scalability (CPUs) and resource scalability (for example, the number of network connections).

Hardware and Software

The architecture assumed for the majority of this discussion is IA-32 (for example, x86), from one to eight processors. Also examined are the issues associated with nonuniform memory access (NUMA) IA-32 and NUMA IA-64 architectures. The selection of hardware typically aligns with the selection of the benchmark and the associated workload. The selection of software aligns with the evaluator's middleware strategy and/or open-source middleware. The following lists several workloads that are typically targeted for Linux server performance evaluation. Each workload includes a description of sample hardware discussed in this chapter.
Run Rules

During benchmark setup, run rules are developed that detail how the benchmark is installed, configured, and run, and how results are to be interpreted. The run rules serve several purposes:
These run rules are the foundation of benchmark execution. Setting benchmark targets, which typically occurs after the run rules have been defined, is the next step in the evaluation process.

Setting Targets

Performance and scalability targets for a benchmark are associated with a specific SUT (hardware and software configuration). Setting performance and scalability targets requires the following:
Measurement, Analysis, and Tuning

The benchmark executions are initiated according to the run rules in order to measure both performance and scalability in terms of the defined performance metric. When calculating SMP scalability for a given machine, you can baseline either against the performance of a UP kernel or against the performance of an SMP kernel with the number of processors set to one (1P). The important factor is consistency: either baseline is acceptable, as long as the same one is used when comparing results.

Before any measurements are made, both the hardware and software configurations are tuned. Tuning is an iterative cycle of measuring and adjusting: it involves measuring components of the system, such as CPU utilization and memory usage, and possibly adjusting system hardware parameters, system resource parameters, and middleware parameters. Tuning is one of the first steps of performance analysis; without it, scaling results may be misleading, reflecting some other issue rather than a kernel limitation.

The first step in analyzing the SUT's performance and scalability is to understand the benchmark and the workload tested. Initial performance analysis is made against a tuned system, and analysis sometimes uncovers additional modifications to tuning parameters. Analyzing the SUT's performance and scalability requires a set of performance tools. The use of open-source community (OSC) tools is desirable whenever possible because it facilitates posting analysis data to the OSC to illustrate performance and scalability bottlenecks. It also allows others in the OSC to replicate the results with the tool, or to understand the results after using the tool on another application with which they can experiment.
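The scalability calculation described above can be sketched as follows. The function names and throughput figures are illustrative placeholders, not measurements from any particular benchmark; the baseline value may come from either a UP kernel or a 1P SMP kernel, as long as the choice is applied consistently.

```python
def smp_scalability(throughput_np, throughput_baseline):
    """Scaling factor of an n-CPU run relative to a single-CPU baseline.

    The baseline may be either a UP-kernel run or an SMP kernel booted
    with one processor (1P); whichever is chosen must be used
    consistently when comparing results.
    """
    return throughput_np / throughput_baseline

def scaling_efficiency(throughput_np, throughput_baseline, n_cpus):
    """Fraction of ideal (linear) scaling achieved on n_cpus processors."""
    return smp_scalability(throughput_np, throughput_baseline) / n_cpus

# Hypothetical throughput figures (requests/sec), for illustration only.
baseline_1p = 1000.0   # single-processor baseline
measured_4p = 3200.0   # the same benchmark on a 4-way SMP kernel

print(smp_scalability(measured_4p, baseline_1p))        # -> 3.2
print(scaling_efficiency(measured_4p, baseline_1p, 4))  # -> 0.8 (80% of linear)
```

A 4P result of 3.2x over the 1P baseline corresponds to 80 percent of ideal linear scaling; comparing such efficiencies across processor counts is what exposes SMP scaling limitations.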
In many instances, ad hoc performance tools are developed to gain a better understanding of a specific performance bottleneck. Ad hoc performance tools are usually simple tools that instrument a specific component of the Linux kernel. It is advantageous to share such tools with the OSC. A sample listing of performance tools available includes the following:
Ad hoc performance tools help you further understand a specific aspect of the system. Examples are as follows:
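As a minimal sketch of the kind of ad hoc tool described above, the following script samples the aggregate CPU counters in /proc/stat twice and reports overall CPU utilization for the interval. It assumes a Linux /proc filesystem; the function names are illustrative, and a real instrumentation tool would typically target a more specific kernel component.

```python
import time

def read_cpu_times(path="/proc/stat"):
    """Return the aggregate CPU time counters (user, nice, system, idle, ...)
    from the first line of /proc/stat, in units of clock ticks."""
    with open(path) as f:
        fields = f.readline().split()   # first line is the "cpu" summary row
    return [int(v) for v in fields[1:]]

def cpu_utilization(interval=1.0):
    """Percentage of non-idle CPU time over the sampling interval."""
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    idle = deltas[3]                    # fourth counter in /proc/stat is idle
    return 100.0 * (total - idle) / total if total else 0.0

# Example usage (Linux only):
#   print("CPU utilization: %.1f%%" % cpu_utilization())
```

Tools of this sort are deliberately simple; their value lies in being easy to share with the OSC so that others can reproduce the measurement alongside a reported bottleneck.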
Performance analysis data is then used to identify performance and scalability bottlenecks. Understanding where the bottlenecks exist requires a broad understanding of the SUT and a more specific understanding of the Linux kernel components being stressed by the benchmark, as well as of the kernel source code that causes each bottleneck. In addition, the Linux OSC can be leveraged to help isolate performance-related issues and develop patches for their resolution.

Exit Strategy

An evaluation of Linux kernel performance may require several cycles of running the benchmarks, analyzing the results to identify performance and scalability bottlenecks, addressing any bottlenecks by integrating patches into the Linux kernel, and running the benchmarks again. The patches may be existing ones found in the OSC or newly developed ones. A set of criteria determines when Linux is "good enough," terminating the process. First, if the targets have been met and no outstanding Linux kernel issues remain that would significantly improve the benchmark's performance, Linux is "good enough," and it is better to move on to other issues. Second, if several cycles of performance analysis have occurred and bottlenecks still remain, consider the trade-off between the development cost of continuing the process and the benefit of any additional performance gains. If the development costs are too high relative to the potential improvements, discontinue the analysis and articulate the rationale appropriately.
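The two exit criteria above can be summarized as a simple decision rule. The function and parameter names below are hypothetical placeholders for judgments the evaluator makes; in practice, cost and gain are rough but comparable estimates rather than precise numbers.

```python
def continue_analysis(targets_met, outstanding_kernel_issues,
                      development_cost, expected_gain):
    """Decide whether another measure-analyze-patch cycle is worthwhile.

    targets_met: performance/scalability targets reached for this benchmark
    outstanding_kernel_issues: kernel bottlenecks remain whose resolution
        would significantly improve this benchmark
    development_cost / expected_gain: rough, comparable estimates for the
        next cycle of analysis and patch development
    """
    if targets_met and not outstanding_kernel_issues:
        return False  # Linux is "good enough" for this benchmark; move on
    if development_cost > expected_gain:
        return False  # remaining gains do not justify the development cost
    return True       # keep iterating: measure, analyze, patch, re-run

print(continue_analysis(True, False, 5, 10))   # -> False ("good enough")
print(continue_analysis(False, True, 5, 10))   # -> True  (worth another cycle)
```

Either False branch corresponds to an exit from the evaluation loop: in the first case because the targets are satisfied, in the second because the cost/benefit trade-off no longer favors continuing.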
In both cases, when reviewing the remaining outstanding Linux kernel issues, assess which benchmarks can be used to exercise the affected kernel components, examine any existing data on the issues, and decide which kernel component (or collection of components) to analyze based on this collective information.