"Why is the system so slow?" is probably second on any system administrator's things-I-least-want-to-hear list (right after "Why did the system crash again?!"). Like system reliability, system performance is a topic that comes up only when there is a problem. Unfortunately, no one is likely to compliment or thank you for getting the most out of the system's resources. System performance-related complaints can take on a variety of forms, ranging from sluggish interactive response time, to a job that takes too long to complete or is unable to run at all because of insufficient resources. In general, system performance depends on how efficiently a system's resources are applied to the current demand for them by various jobs in the system. The most important system resources from a performance perspective are CPU, memory, and disk and network I/O, although sometimes other device I/O can also be relevant. How well a system performs at any given moment is the result of both the total demand for the various system resources and how well the competition among processes[1] for them is being managed. Accordingly, performance problems can arise from a number of causes, including both a lack of needed resources and ineffective control over them. Addressing a performance problem involves identifying what these resources are and figuring out how to manage them more effectively.
NOTE
As with most of life, performance tuning is much harder when you have to guess what normal is. If you don't know what the various system performance metrics usually show when performance is acceptable, it will be very hard to figure out what is wrong when performance degrades. Accordingly, it is essential to do routine system monitoring and to maintain records of performance-related statistics over time.
When the lack of a critical resource is the source of a performance problem, there are a limited number of approaches to improving the situation. Put simply, when you don't have enough of something, there are only a few options: get more, use less, eliminate inefficiency and waste to make the most of what you have, or ration what you have. In the case of a system resource, this can mean obtaining more of it (if that is possible), reducing job or system requirements so that they need less of it, having its various consumers share the amount that is available by dividing it between them, having them take turns using it, or otherwise changing the way it is allocated or controlled. For example, if your system is short of CPU resources, your options for improving things may include some or all of the following:
Naturally, not all potential solutions will necessarily be possible on any given computer system or within any given operating system. It is often necessary to distinguish between raw system resources like CPU and memory and the control mechanisms by which they are accessed and allocated. For example, in the case of the system's CPU, you don't have the ability to allocate or control this resource as such (unless you count taking the system down). Rather, you must use features like nice numbers and scheduler parameters to control usage. Table 15-1 lists the most important control mechanisms associated with CPU, memory, and disk and network I/O performance.
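As a small illustration of controlling CPU usage through a control mechanism rather than the resource itself, the following Python sketch starts a child process at a higher nice number (lower priority) than its parent. It is a minimal example under stated assumptions, not a recommended tool: it assumes a Unix system with Python 3.7 or later, the helper names `current_nice` and `run_niced` are hypothetical, and the child command is only a stand-in for a real CPU-bound job.

```python
import os
import subprocess
import sys


def current_nice():
    """Return the nice number of the calling process."""
    return os.getpriority(os.PRIO_PROCESS, 0)


def run_niced(cmd, increment=10):
    """Run cmd as a child process whose nice number is raised by `increment`.

    preexec_fn runs in the child between fork and exec, so the parent's
    own priority is left unchanged.
    """
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical stand-in for a CPU-bound batch job: a child Python
    # process that simply reports its own nice number.
    report = [sys.executable, "-c",
              "import os; print(os.getpriority(os.PRIO_PROCESS, 0))"]
    result = run_niced(report, increment=10)
    print("parent nice:", current_nice())
    print("child nice: ", result.stdout.strip())
```

Raising a nice number is open to any user, but lowering one (asking for more favorable scheduling) generally requires superuser privileges, which is why the sketch only ever increases it.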
15.1.1 The Tuning Process
The following process offers the most effective approach to addressing system performance issues.
15.1.1.1 Define the problem in as much detail as you can.
The more specific you can be about what is wrong (or less than optimal) with the way things are currently, the more likely it is you can find ways to improve them. Ideally, you'd like to move from an initial problem description like this one:
to one like this:
A good description of the current performance issues will also implicitly state your performance goals. For example, in this case, the performance goal is clearly to improve interactive response time for users running under X. It is important to understand such goals clearly, even if it is not always possible to reach them (in which case, they are really wishes more than goals).
15.1.1.2 Determine what's causing the problem.
To do so, you'll need to answer questions like these:
For example, if we examined the system with the X windows performance problems, we might find that the response-time problems occurred only when more than one simulation job and/or large compilation job was running. By watching what happens when a user tries to switch windows under those conditions, we could also figure out that the critical resource is system memory and that the system is paging (we'll have more to say about this later in this chapter).
15.1.1.3 Formulate explicit performance improvement goals.
This step involves transforming the implicit goals (wishes) that were part of the problem description into concrete, measurable goals. Again, being as precise and detailed as possible will make your job easier. In many cases, tuning goals will need to be developed in conjunction with the users affected by the performance problems, and possibly with other users and management personnel as well. System performance is almost always a matter of compromises and tradeoffs, because it inevitably involves deciding how to apply and apportion the finite available resources. Tuning is easiest and most successful where there is a clear agreement about the relative priority and importance of the various competing activities on the system. To continue with our example, setting achievable tuning goals will be difficult unless it is decided whose performance is more important. In other words, it is probably necessary to choose between snappy interactive response time for X users and fast completion times for simulation and compilation jobs (remember that the status quo has already been demonstrated not to work). Decided one way, the tuning goal could become something like this:
15.1.1.4 Design and implement modifications to the system and applications to achieve those goals.
Figuring out what to do is, of course, the trickiest part of tuning a system. We'll look at what the options are for various types of problems in the upcoming sections of this chapter. It is important to tune the system as a whole. Focusing only on part of the system workload will give you a distorted picture of the problem, because system performance is ultimately the result of the interactions among everything on the system.
15.1.1.5 Monitor the system to determine how well the changes worked.
The purpose here is to evaluate the system status after the change is made and determine whether or not the change has improved things as expected or desired. The most successful tuning method introduces small changes to the system, one at a time, allowing you to thoroughly test each one and judge its effectiveness and to back it out again if it makes things worse instead of better.
15.1.1.6 Return to the first step and begin again.
System performance tuning is inevitably an iterative process, because even a successful change will often reveal new interactions to understand and new problems to address. Similarly, once the bottleneck caused by one system resource is relieved, a new one centered around a different resource may very well arise. In fact, the initial performance problem can often be just a secondary symptom of the real, more serious underlying problem (e.g., a CPU shortage can be a symptom of serious memory shortfalls).
NOTE
Not all problems in life can be solved with money, but many performance issues can. If you have definitively identified the resource that is in short supply and you can afford to buy more of it (or upgrade it), do so. This approach is often the best and fastest way to address a performance problem. On the other hand, buying hardware in the hope that it will alleviate a performance problem is likely to be both wasteful and frustrating.
Most operating systems provide specialized tools for performance tuning. These are the primary tuning tools and procedures for each of the various operating systems we are considering:
We'll discuss using these tools at the appropriate points within this chapter. Some systems also provide additional performance monitoring and tuning tools as add-on packages.
15.1.2 Some Tuning Caveats
I'll close this section with two important notes about system performance tuning. First, be aware of the experimenter effect. The term refers to the realization that merely watching something happen can change the thing that is happening in significant ways. In anthropology, this means that a researcher observing the customs and behaviors of another culture inevitably has an effect on what is observed; people behave differently when they know they are being watched, especially by outsiders. For performance monitoring, running the monitoring tools can also have an effect on the system, and this fact needs to be taken into account when interpreting the data they collect. Ideally, performance data collection should be decoupled from data analysis (and the latter can take place on a different system). Second, consider this advice from IBM's AIX Versions 3.2 and 4.1 Performance Tuning Guide:
Its overly formal language aside, this maxim reminds us that the tools Unix provides for observing system behavior offer one way of looking at the system, but not the only way. What is actually important to watch and tune on your system may or may not be trivially accessible to either monitoring or modification. At the same time, it is also necessary to keep this important corollary in mind:
This is, of course, really just another way of saying: