Processors and Multiprocessing | Performance Tuning for Linux Servers

Most server-class systems are designed to support more than one processor. The most common multiprocessor systems supported by Linux are the tightly coupled Symmetric Multiprocessing (SMP), or shared-memory multiprocessing, architectures. These architectures are called symmetric because every processor shares the same system bus and is therefore equidistant from system memory and I/O resources. In other words, memory and I/O access times are uniform from any processor in the system.

The advantage of SMP systems is that they provide more computing power, because there are more processors on which to schedule work. In a perfect world, an SMP system provides linear scalability as processors are added: a workload on an n-processor system could run n times faster than the same workload on a one-processor system. Realistically, because processors in an SMP server share system resources (the memory bus, the I/O bus, and so on), linear scalability is difficult to achieve.

Achieving acceptable scalability in an SMP environment requires both optimized hardware and optimized software. The hardware must be designed to exploit the system's parallel characteristics, and system software must be written to take full advantage of the parallelism built into the hardware. At the same time, the fact that processors share certain system resources limits the amount of parallelism that can be achieved. Both hardware and software must implement complicated locking logic and algorithms to provide mutually exclusive access to shared system resources. The system must prevent unsynchronized concurrent access to any shared resource to preserve data consistency and correct program operation. Mutual exclusion is one of the primary factors that limit the scalability of any SMP operating system.

SMP support in the Linux kernel has evolved from a model that completely serialized access to the entire kernel to a design that supports multiple layers and types of locks within kernel components at every level. The 2.6 Linux kernel continues to improve SMP scalability over previous kernel versions by implementing features that further exploit parallelism in SMP environments.
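The effect of serialization on scalability can be approximated with Amdahl's law. The text above does not name this law, but it is a standard model for the situation. The Python sketch below is illustrative only: the function name `amdahl_speedup` and the 5% serial fraction are assumptions chosen for the example, not measurements of any Linux kernel.

```python
def amdahl_speedup(n_processors: int, serial_fraction: float) -> float:
    """Ideal speedup on n processors when a fixed fraction of the work is serialized.

    serial_fraction is the share of run time that cannot proceed in parallel,
    for example time spent holding a lock that every processor needs.
    """
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Compare the perfect-world case (nothing serialized) with a
    # hypothetical workload that is serialized 5% of the time.
    for n in (1, 2, 4, 8, 16, 32):
        ideal = amdahl_speedup(n, 0.0)
        limited = amdahl_speedup(n, 0.05)
        print(f"{n:2d} processors: ideal {ideal:5.2f}x, 5% serial {limited:5.2f}x")
```

With a serial fraction of 0, the speedup equals n, which is the linear case. With only 5% of the work serialized, 16 processors yield a speedup of roughly 9.1 under this model, which suggests why reducing time spent under mutual exclusion matters so much.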

Server Topologies

A computer of any size can be configured to run as a server. Some services can be provided by a single-processor computer, whereas other services, such as large databases, require more substantial hardware. Because the typical single-processor system with memory and a few disk drives should be familiar to anyone attempting to tune Linux on a server, this chapter focuses on larger server configurations with multiple processors and potentially large amounts of disk storage.

Linux can support servers effectively with up to 16 processors on 2.4-based kernels and 32 processors on 2.6-based kernels (and up to 512 processors on some architectures). As the processor count scales up, memory and I/O capacity must scale up with it. A 16-processor server with only 1GB of memory would most likely suffer performance problems because the processors would not have enough memory to work with. Similarly, a server with a large memory would be hampered if only one disk drive were attached, or if there were only one path to disk storage. An important consideration is the balance of a server's elements, so that adequate resources exist for the work being performed.
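One rough way to look at this balance is to compute how much memory is available per processor. The sketch below is only an illustration: the function names are hypothetical, it reports GiB (2^30 bytes), and it deliberately applies no threshold, because how much memory per processor is adequate depends on the workload. The `os.sysconf` names used here are available on Linux but may be missing on other platforms, in which case the helper returns `None`.

```python
import os
from typing import Optional


def memory_per_processor_gib(total_memory_bytes: int, n_processors: int) -> float:
    """Return the amount of memory, in GiB, available per processor."""
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    return total_memory_bytes / n_processors / 2**30


def local_memory_per_processor_gib() -> Optional[float]:
    """Compute memory per online processor for this machine, if it can be read."""
    try:
        n_processors = os.sysconf("SC_NPROCESSORS_ONLN")
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or one of these names is not supported on this platform.
        return None
    if n_processors < 1 or pages < 1 or page_size < 1:
        return None
    return memory_per_processor_gib(pages * page_size, n_processors)


if __name__ == "__main__":
    # The 16-processor, 1GB example from the text (treated here as 1 GiB).
    print("example server:", memory_per_processor_gib(2**30, 16), "GiB per processor")
    print("this machine:  ", local_memory_per_processor_gib(), "GiB per processor")
```

For the 16-processor server with 1GB of memory, this works out to 0.0625 GiB (64 MiB) per processor.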

An important characteristic of multiprocessor configurations is the manner in which the processors are connected: the server's topology. The basic multiprocessor system employs a large system bus that all processors connect to and that also connects the processors to memory and I/O buses, as depicted in Figure 3-1. Multiprocessor systems like these are referred to as Symmetric Multiprocessors (SMPs) because all processors are equal and have similar access to system resources.

Figure 3-1. The basic multiprocessor system.

How many processors a server needs is determined by the workload. More processors provide more processing power and can provide additional throughput on CPU-bound jobs. If a workload is CPU-bound (processes are waiting excessively for a turn on a processor), additional processor capacity might be warranted.
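A quick, approximate check for processes waiting on a processor is to compare the system load average with the number of processors. The sketch below is a heuristic under stated assumptions, not a definitive test. The function names and the threshold of 1.0 runnable task per processor are hypothetical. On Linux the load average also counts tasks in uninterruptible sleep (typically waiting on I/O), so a high value does not by itself prove that a workload is CPU-bound.

```python
import os
from typing import Optional, Tuple


def load_per_processor(load_average: float, n_processors: int) -> float:
    """Return the load average divided by the number of processors."""
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    return load_average / n_processors


def looks_cpu_constrained(load_average: float, n_processors: int,
                          threshold: float = 1.0) -> bool:
    """Heuristic: True when load per processor exceeds the assumed threshold."""
    return load_per_processor(load_average, n_processors) > threshold


def local_snapshot() -> Optional[Tuple[float, int, bool]]:
    """Return (1-minute load, processor count, heuristic result) for this machine."""
    try:
        one_minute, _, _ = os.getloadavg()
    except (AttributeError, OSError):
        # Load average is not available on this platform.
        return None
    n_processors = os.cpu_count() or 1
    return one_minute, n_processors, looks_cpu_constrained(one_minute, n_processors)


if __name__ == "__main__":
    print(local_snapshot())
```

A result that stays above the threshold over time is a cue to look more closely, for example at per-processor utilization and run-queue length, before deciding that more processors are warranted.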

SMP systems are fairly common and can be found with two to four processors as a commodity product. Larger configurations are possible, but as the processor count goes up, more memory is attached, and more I/O devices are used, the common system bus becomes a bottleneck. There is not enough capacity on the shared system bus to accommodate all the data movement associated with the quantity of processors, memory, and I/O devices. Scaling to larger systems then requires approaches other than SMP.

Various approaches to larger-scale systems have been employed. The two most common are clusters and Non-Uniform Memory Access (NUMA) architectures. Both approaches have a common basis in that they eliminate the shared system bus. A cluster is constructed from a collection of self-contained systems that are interconnected and have a central control point that manages the work that each system (node) within the cluster is performing. Each node within a cluster runs its own operating system (here, Linux) and has direct access only to its own memory. NUMA systems, on the other hand, are constructed from nodes connected through a high-speed interconnect, but a common address space is shared across all nodes. Only one operating system image is present, and it controls all operations across the nodes. The memory, although local to each node, is accessible to all nodes in one large, cache-coherent physical address space. Clusters and NUMA systems are discussed in more detail in later sections.
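On a running Linux system with NUMA support, the kernel exposes each node as a directory named `node0`, `node1`, and so on under `/sys/devices/system/node`. The sketch below lists those node numbers. It is a minimal example with a hypothetical function name (`numa_node_ids`); it assumes sysfs is mounted at `/sys` and returns an empty list when the directory is absent, as it would be on a kernel without NUMA support or on a non-Linux system.

```python
import os
import re
from typing import List

NODE_DIR = "/sys/devices/system/node"
_NODE_NAME = re.compile(r"^node(\d+)$")


def numa_node_ids(node_dir: str = NODE_DIR) -> List[int]:
    """Return the sorted NUMA node numbers found under node_dir.

    Only entries named exactly 'node<N>' are counted; other files in the
    directory are ignored.
    """
    try:
        entries = os.listdir(node_dir)
    except OSError:
        # Directory missing or unreadable: report no NUMA information.
        return []
    ids = []
    for entry in entries:
        match = _NODE_NAME.match(entry)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    nodes = numa_node_ids()
    if len(nodes) > 1:
        print(f"NUMA system with {len(nodes)} nodes: {nodes}")
    elif nodes:
        print("Single memory node reported (uniform memory access)")
    else:
        print("No NUMA node information available")
```

A machine that reports a single node behaves like the SMP systems described earlier, with uniform memory access from every processor.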

Mixing Processors

Many modern server platforms support the mixing of processor speeds and steppings (revisions) within the same system. Special consideration must be taken to ensure optimal operation in such an environment. Usually, the processor vendor publishes specific guidelines that must be met when mixing processors with different speeds and different features or stepping levels. Some of the most common guidelines are as follows:

The boot processor is selected from the set of processors having the lowest stepping and lowest feature set of all processors in the system.
The system software uses a common speed for all processors in the system, determined by the slowest speed of all processors configured in the system.
All processors use the same cache size, determined by the smallest cache size of all processors configured in the system.

System software must implement and follow similar restrictions or guidelines to ensure correct program operation.
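The three guidelines above amount to taking a minimum over the installed processors. The sketch below models them in Python purely as an illustration. The `Processor` and `CommonConfig` types, their field names, and the sample values are hypothetical. The feature set is approximated by the stepping number alone, and ties for the boot processor are broken by the lowest processor ID, which is an assumption and not part of any vendor guideline.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Processor:
    cpu_id: int
    stepping: int      # lower value = earlier revision (assumed to imply fewer features)
    speed_mhz: int
    cache_kib: int


@dataclass(frozen=True)
class CommonConfig:
    boot_cpu_id: int
    speed_mhz: int
    cache_kib: int


def common_config(processors: Sequence[Processor]) -> CommonConfig:
    """Derive the lowest-common-denominator settings for a mixed set of processors."""
    if not processors:
        raise ValueError("at least one processor is required")
    # Guideline 1: boot from a processor with the lowest stepping
    # (ties broken by lowest cpu_id, an assumption made for this sketch).
    boot = min(processors, key=lambda p: (p.stepping, p.cpu_id))
    # Guideline 2: run every processor at the slowest configured speed.
    speed = min(p.speed_mhz for p in processors)
    # Guideline 3: use the smallest configured cache size everywhere.
    cache = min(p.cache_kib for p in processors)
    return CommonConfig(boot_cpu_id=boot.cpu_id, speed_mhz=speed, cache_kib=cache)


if __name__ == "__main__":
    mixed = [
        Processor(cpu_id=0, stepping=3, speed_mhz=3000, cache_kib=2048),
        Processor(cpu_id=1, stepping=2, speed_mhz=2800, cache_kib=1024),
        Processor(cpu_id=2, stepping=2, speed_mhz=3200, cache_kib=512),
    ]
    print(common_config(mixed))
```

For the sample processors shown, the result is boot processor 1, a common speed of 2800 MHz, and a common cache size of 512 KiB. In other words, a mixed system performs at the level of its least capable processor in each respect.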

As this section has shown, Linux can support multiprocessor servers. However, as the processor count grows, memory must grow with it to avoid performance problems. The next section addresses memory.