Memory

Servers tend to have large quantities of memory; configurations of 1GB to 4GB of memory per processor are common. The amount of memory a server needs varies with the type of work it is doing. If a server is swapping excessively, additional memory should be considered. Some workloads perform substantially better when there is enough memory on the server to keep common, heavily used data locked in memory. Other workloads use small amounts of memory with transient data, so additional memory would provide little benefit.

The maximum amount of memory a process on a server can address is limited by the processor's word size. Server processors have either 32-bit or 64-bit words. Registers on a processor are the size of a word and are used to hold memory addresses, so the maximum amount of memory a processor can address is a function of the word size. 32-bit processors have a 4GB limit on memory addressability (2^32 bytes). On 32-bit Linux, with the default address-space split, a user-space process is provided only 3GB of address space; the remaining gigabyte is reserved for use by the kernel.

On 64-bit processors, the 4GB limit goes away, but most 64-bit implementations support fewer address bits than the theoretical maximum of 2^64 bytes.

Some 32-bit processors (for example, Pentium-class x86 processors with Physical Address Extension, or PAE) implement additional address bits for accessing physical addresses beyond 32 bits, but this larger physical memory is reachable only through virtual addressing, by way of additional bits in the page table entries. x86-based processors currently support up to 64GB of physical memory through this mechanism, but virtual addressability is still restricted to 4GB per process.

64-bit processors are appropriate for workloads that have processes that need to address large quantities of data. Large databases, for example, benefit from the additional memory addressability provided by 64-bit processors. 32-bit processors, on the other hand, are better for workloads that do not have large addressability requirements, because code compiled for 32-bit processors is more compact (because addresses used in the code are half the size of 64-bit addresses). The more compact code reduces cache usage.

Processor speeds and memory speeds continue to increase. However, memory speed technology usually lags processor technology. Therefore, most server systems implement smaller high-speed memory subsystems called caches. Cache memory subsystems are implemented between the processors and memory subsystems to help bridge the gap between faster processor speeds and the slower memory access times. The advantage of implementing caches is that they can substantially improve system performance by exploiting a property called locality of reference. Most programs, at some point, continuously execute the same subset of instructions for extended periods of time. If the subset of instructions and the associated data can fit in the cache memory, expensive memory accesses can generally be eliminated, and overall workload performance can be substantially increased.

Most processors today implement multiple levels of caches, and some servers also implement multiple cache hierarchies. The processor caches are typically much smaller and faster than caches implemented in the platform. Caches range in size from a few kilobytes to a few megabytes for on-chip caches, and up to several megabytes for system caches.

Caches are divided into same-sized entries called cache lines, each of which represents a number of contiguous words of main memory. Cache line sizes range from a few bytes (in processor caches) to hundreds of bytes (in system caches). Data is inserted into or evicted from caches on cache line boundaries, and the Linux kernel exploits this fact by ensuring that frequently accessed data structures, or portions of them, are aligned on cache line boundaries.

Cache lines are further organized into sets. The number of lines in a set is the number of lines a hash routine must search to determine whether an address is present in the cache. Caches implement different replacement policies to determine when data is evicted, and different consistency algorithms to determine when data is written back to main memory; they also provide the capability to flush the entire contents of a cache. Proper operating system management of system caches can have a measurable impact on system performance.

Memory is important to keeping a server running smoothly, but I/O capacity matters as well; together with memory, it is a component that keeps the processors working effectively.

    Performance Tuning for Linux Servers
    ISBN: 0137136285
    Year: 2006
    Pages: 254