Paging and Swapping | Performance Tuning for Linux Servers

Each user process in Linux operates on a single, contiguous virtual address space. This virtual address space consists of several different types of memory objects, such as the program text (read-only) and the program data (copy-on-write). When a program is loaded into a process's virtual address space, the text area is initially memory mapped and backed by the program binary. Because these text pages are never modified and can always be reread from the binary, the virtual memory system can free and reuse them rather easily. Data pages are different: when a data page is modified, the virtual memory system has to create a private copy of the page and assign it to the process that initiated the update. Such private data pages start out as copy-on-write or zero-fill-on-demand pages, and they have to be treated differently when a page-out situation arises, because their contents can no longer simply be reread from the binary.

Most application programs allocate more virtual memory than they ever use at any given time. For example, the text segment of a program often includes large amounts of error-handling code that is seldom or never executed. To avoid wasting memory on virtual pages that are never accessed, Linux (like most other UNIX operating systems) uses a method called demand paging. With this method, the virtual address space starts out empty; in other words, all virtual pages are marked in the page table as not present. When a process accesses a virtual page that is not present, the CPU generates a page fault. The Linux kernel intercepts this fault and invokes the page fault handler, which allocates a new page frame, determines the content of the accessed page, loads the page, and finally updates the page table to mark the page as present. Execution then returns to the process that caused the page fault. Because the required page is now present, the faulting instruction can be reexecuted without causing another page fault.
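The demand-paging sequence described above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: the 4096-byte page size, the `DemandPager` and `PageTableEntry` classes, and the dictionary standing in for the program binary are all hypothetical, and an ordinary method call stands in for the hardware trap.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PAGE_SIZE = 4096  # assumed page size in bytes


@dataclass
class PageTableEntry:
    present: bool = False        # every entry starts out "not present"
    frame: Optional[int] = None  # index of the backing frame once present


class DemandPager:
    """Hypothetical model of demand paging; not kernel code."""

    def __init__(self, backing_store: Dict[int, bytes], num_pages: int) -> None:
        # backing_store stands in for the program binary: page number -> contents.
        self.backing_store = backing_store
        # The virtual address space starts out empty: no page is present.
        self.page_table: List[PageTableEntry] = [
            PageTableEntry() for _ in range(num_pages)
        ]
        self.frames: List[bytes] = []  # simulated physical page frames
        self.faults = 0

    def _handle_page_fault(self, vpn: int) -> None:
        self.faults += 1
        # Determine the page contents: pages backed by the binary are loaded
        # from it; anything else is zero-filled on demand.
        contents = self.backing_store.get(vpn, bytes(PAGE_SIZE))
        self.frames.append(contents)  # allocate a new frame and load the page
        entry = self.page_table[vpn]
        entry.frame = len(self.frames) - 1
        entry.present = True  # update the page table: the page is now present

    def read(self, address: int) -> int:
        vpn, offset = divmod(address, PAGE_SIZE)
        if not self.page_table[vpn].present:
            self._handle_page_fault(vpn)  # the "CPU" traps into the handler
        # Execution resumes; the retried access no longer faults.
        return self.frames[self.page_table[vpn].frame][offset]


pager = DemandPager({0: bytes([0x90]) * PAGE_SIZE}, num_pages=16)
pager.read(10)             # first touch of page 0: page fault
pager.read(20)             # page 0 is present now: no fault
pager.read(5 * PAGE_SIZE)  # untouched page 5: zero-filled on demand
print(pager.faults)        # 2
```

Only the first access to each page pays for a fault; every later access to the same page goes straight to its frame.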

A physical resource such as memory can run short as multiple threads from different applications compete for it. In that case, Linux has to choose a page frame that backs a virtual page that has not been accessed recently, write that page to a special area on disk called the swap space, and then reuse the page frame to back the newly requested virtual page. Where exactly the old page is written depends on the kind of swap space in use: Linux supports multiple swap areas, each of which can be either an entire disk partition or a specially formatted file in an existing file system. In either case, the page-table entry for the old page has to be updated accordingly: Linux marks the entry as not present and records the page's disk location in it so that the page can be found again. To summarize, a page-table entry marked as present contains the number of the physical page frame that backs the virtual page, whereas an entry marked as not present contains the page's disk location.

The technique of borrowing a page from a process and writing it to the disk subsystem is referred to as paging. A related technique is swapping, a much more aggressive form of paging that steals not just an individual page but a process's entire page set. Linux, like most other UNIX operating systems, uses paging but not swapping.
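The dual meaning of a page-table entry (frame number when present, disk location when not present) can be sketched as follows. The `Pte`, `SwapArea`, `page_out`, and `page_in` names are hypothetical, and the sketch assumes a single swap area, so a slot number alone identifies the disk location; with several swap areas, the recorded location would also have to identify the area.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pte:
    present: bool
    value: int  # frame number if present, otherwise swap slot (disk location)


class SwapArea:
    """Hypothetical stand-in for one swap partition or swap file."""

    def __init__(self, slots: int) -> None:
        self.slots: List[Optional[bytes]] = [None] * slots

    def write(self, data: bytes) -> int:
        slot = self.slots.index(None)  # first free slot (ValueError if full)
        self.slots[slot] = data
        return slot

    def read(self, slot: int) -> bytes:
        data = self.slots[slot]
        self.slots[slot] = None  # free the slot once the page is back in RAM
        return data


def page_out(pte: Pte, memory: Dict[int, bytes], swap: SwapArea) -> int:
    """Write the page to swap, mark the entry not present, record the slot."""
    frame = pte.value
    slot = swap.write(memory.pop(frame))
    pte.present, pte.value = False, slot
    return frame  # the freed frame can now back another virtual page


def page_in(pte: Pte, frame: int, memory: Dict[int, bytes], swap: SwapArea) -> None:
    """Load the page from its recorded swap slot into a free frame."""
    memory[frame] = swap.read(pte.value)
    pte.present, pte.value = True, frame


memory = {7: b"old page"}
swap = SwapArea(slots=4)
pte = Pte(present=True, value=7)
freed = page_out(pte, memory, swap)
print(pte, freed)   # Pte(present=False, value=0) 7
page_in(pte, 3, memory, swap)
print(pte, memory)  # Pte(present=True, value=3) {3: b'old page'}
```

Reusing one integer field for both meanings mirrors the idea in the text: the present flag alone tells the fault handler how to interpret the rest of the entry.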

Replacement Policy

From an execution and stability standpoint, it does not matter which page is borrowed when memory becomes scarce. From a performance standpoint, however, which page to borrow and when to borrow it are paramount. The procedure that determines which page to evict from main memory is referred to as the replacement policy. For example, the least recently used (LRU) policy analyzes past behavior and selects the page that has not been accessed for the longest period of time. Although exact LRU could be implemented as a page replacement algorithm, it is not practical: it would require updating a data structure on every access to main memory, which generates significant overhead. In practice, most UNIX operating systems use lower-overhead variations such as not recently used (NRU); Linux relies on an approximation of LRU.

In a Linux environment, the page replacement mechanism is complicated by the fact that the kernel may use a variable amount of nonpageable memory. For example, file data is stored in the buffer cache, which grows and shrinks dynamically. When the kernel has to allocate a new page frame, it has two options: borrow a page from the kernel or steal a page from a process. In other words, the kernel has to implement not just a replacement policy but also a memory balancing policy, which determines how much memory is used for kernel buffers and how much is used to back virtual pages.
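The overhead argument against exact LRU can be illustrated by contrasting it with the clock (second-chance) algorithm, a well-known low-overhead approximation driven by a per-page reference bit. Neither class below is the Linux implementation; both are illustrative sketches with hypothetical names.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class ExactLRU:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages: "OrderedDict[int, None]" = OrderedDict()

    def access(self, page: int) -> Optional[int]:
        """Record an access; return the evicted page, if any."""
        if page in self.pages:
            self.pages.move_to_end(page)  # bookkeeping on *every* access
            return None
        victim = None
        if len(self.pages) >= self.capacity:
            victim, _ = self.pages.popitem(last=False)  # least recently used
        self.pages[page] = None
        return victim


class Clock:
    def __init__(self, capacity: int) -> None:
        self.frames: List[Optional[int]] = [None] * capacity
        self.referenced: List[bool] = [False] * capacity
        self.where: Dict[int, int] = {}  # page -> frame index
        self.hand = 0

    def access(self, page: int) -> Optional[int]:
        """Record an access; return the evicted page, if any."""
        if page in self.where:
            # Hardware sets this bit on access; no list reordering needed.
            self.referenced[self.where[page]] = True
            return None
        # Advance the hand, clearing reference bits (a "second chance"),
        # until it reaches a frame whose bit is already clear.
        while self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)
        victim = self.frames[self.hand]
        if victim is not None:
            del self.where[victim]
        self.frames[self.hand] = page
        self.where[page] = self.hand
        self.referenced[self.hand] = True  # the faulting access references it
        self.hand = (self.hand + 1) % len(self.frames)
        return victim


refs = [1, 2, 3, 1, 4]
lru, clock = ExactLRU(3), Clock(3)
print([lru.access(p) for p in refs])    # [None, None, None, None, 2]
print([clock.access(p) for p in refs])  # [None, None, None, None, 1]
```

On this reference string, exact LRU evicts page 2, while the clock evicts page 1: because every reference bit was set, the hand swept the full circle and the clock degenerated to FIFO. That loss of precision is the price of avoiding bookkeeping on every memory access.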

Page Replacement and Memory Balancing

The combination of page replacement and memory balancing is a daunting problem with no clear, perfect solution. Consequently, the Linux kernel uses a variety of heuristics that tend to work well in practice. From an implementation perspective, the kernel requires the platform-specific portion of the system to provide two extra bits in each page-table entry: the access bit and the dirty bit. The access bit indicates whether the page has been accessed since the access bit was last cleared; the dirty bit indicates whether the page has been modified since it was last paged in. Linux uses the kswapd kernel thread to inspect these two bits periodically; after each inspection, kswapd clears the access bit. If kswapd detects that the system is running low on memory, it starts to proactively page out memory that has not been used recently. If a page's dirty bit is set, the page must be written to disk before its page frame can be freed. Because this write is relatively costly, kswapd prefers to free pages whose access and dirty bits are both cleared (set to 0). By definition, such pages have not been accessed recently and do not need to be written back to disk before their page frames are freed; therefore, they can be reclaimed at low performance cost.
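The kswapd preference described above can be sketched as a toy reclaim scan. The `Page` class and `scan_and_reclaim` function are hypothetical, and ranking accessed-but-clean pages after unaccessed-but-dirty ones is an assumption borrowed from the classic NRU page classes, not a description of kswapd's actual logic.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    name: str
    accessed: bool  # access bit, set by hardware on any reference
    dirty: bool     # dirty bit, set by hardware on any write


def reclaim_cost(page: Page) -> int:
    # NRU-style class: 0 = not accessed and clean (cheapest),
    # 1 = not accessed but dirty, 2 = accessed and clean, 3 = accessed and dirty.
    return 2 * page.accessed + page.dirty


def scan_and_reclaim(pages: List[Page], wanted: int) -> Tuple[List[str], List[str]]:
    """Pick `wanted` victims, cheapest first, then clear every access bit."""
    victims, written_back = [], []
    for page in sorted(pages, key=reclaim_cost)[:wanted]:
        if page.dirty:
            # A dirty page must be written to disk before its frame is freed.
            written_back.append(page.name)
            page.dirty = False
        victims.append(page.name)  # its frame would now be freed
    for page in pages:
        page.accessed = False  # the scan clears access bits after inspection
    return victims, written_back


pages = [
    Page("a", accessed=True, dirty=True),
    Page("b", accessed=False, dirty=True),
    Page("c", accessed=True, dirty=False),
    Page("d", accessed=False, dirty=False),
]
print(scan_and_reclaim(pages, 2))  # (['d', 'b'], ['b'])
```

Page `d` is reclaimed for free, while page `b` costs a disk write; because the access bits are cleared after each scan, a page that stays unused until the next scan drops into the cheaper classes.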

Although systems vary in how they manage physical pages, Linux uses an architecture-independent three-level page table (later kernels extend this to four and, on some platforms, five levels). The page tables are used to translate virtual addresses into physical addresses.
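A three-level page-table walk can be sketched as follows. The 2/9/9/12 split of a 32-bit virtual address (page global directory, page middle directory, page table, offset) is an assumption that corresponds to 32-bit x86 with PAE; other architectures use different splits, and the nested dictionaries are a hypothetical stand-in for real page-table pages.

```python
from typing import Dict, Tuple

# Assumed bit widths for each level of the walk.
PGD_BITS, PMD_BITS, PTE_BITS, OFFSET_BITS = 2, 9, 9, 12

PageTable = Dict[int, Dict[int, Dict[int, int]]]  # pgd -> pmd -> pte -> frame


def split(vaddr: int) -> Tuple[int, int, int, int]:
    """Split a virtual address into its three table indices and the offset."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    pte = (vaddr >> OFFSET_BITS) & ((1 << PTE_BITS) - 1)
    pmd = (vaddr >> (OFFSET_BITS + PTE_BITS)) & ((1 << PMD_BITS) - 1)
    pgd = (vaddr >> (OFFSET_BITS + PTE_BITS + PMD_BITS)) & ((1 << PGD_BITS) - 1)
    return pgd, pmd, pte, offset


def translate(table: PageTable, vaddr: int) -> int:
    """Walk PGD -> PMD -> PTE and combine the frame number with the offset."""
    pgd, pmd, pte, offset = split(vaddr)
    try:
        frame = table[pgd][pmd][pte]
    except KeyError:
        # A missing entry at any level means the page is not present.
        raise LookupError(f"page fault at {vaddr:#x}") from None
    return (frame << OFFSET_BITS) | offset


table: PageTable = {1: {2: {3: 0x1A}}}
vaddr = (1 << 30) | (2 << 21) | (3 << 12) | 0x456
print(split(vaddr))                # (1, 2, 3, 1110)
print(hex(translate(table, vaddr)))  # 0x1a456
```

The translation runs only in one direction: the tables map a virtual page to a frame, and finding which virtual pages map a given frame would require a separate structure.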