Memory and Address Space



Because physical memory has a lower latency than the disk subsystem, one of the challenges faced by the virtual memory subsystem is to keep the most frequently referenced portions of memory in the faster primary storage. When physical memory runs short, the virtual memory subsystem must surrender some of it, which it does by transferring infrequently used memory pages out to the backing store. The virtual memory subsystem thereby provides resource virtualization: a process does not have to manage the details of physical memory allocation. Nor does a process have to manage information and fault isolation itself, because each process executes in its own address space. In most circumstances, hardware facilities in the memory management unit (MMU) enforce memory protection by preventing a process from accessing memory outside its legal address space. The exception is memory regions that are explicitly shared among processes.
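
The idea of moving infrequently used pages to the backing store can be illustrated with a minimal sketch that picks the least recently referenced page as the eviction candidate. The structure and function names (resident_page, pick_victim) are hypothetical, and exact per-page timestamps are a simplification: the Linux kernel approximates this policy with active and inactive page lists rather than tracking precise access times.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified resident page: its virtual page number and
 * the logical time of its most recent reference. */
struct resident_page {
    unsigned long vpn;
    unsigned long last_ref;
};

/* Return the index of the least recently referenced page, i.e. the
 * candidate to move to the backing store under memory pressure.
 * On a tie, the earliest entry in the array wins. */
size_t pick_victim(const struct resident_page *pages, size_t n)
{
    assert(n > 0);              /* caller must pass at least one page */
    size_t victim = 0;
    for (size_t i = 1; i < n; i++)
        if (pages[i].last_ref < pages[victim].last_ref)
            victim = i;
    return victim;
}
```

A real reclaim path must also decide whether the chosen page is dirty and needs to be written out first; this sketch only shows the selection step.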

Address Space

A process's virtual address space is defined as the range of memory addresses that are presented to the process as its environment. At any point in a process's life cycle, some addresses are mapped to physical addresses and some are not. The kernel creates the basic skeleton of a process's virtual address space when the process is created through the fork() system call. The virtual address layout within a process is established by the dynamic linker and may vary from one hardware platform to another. In general, the virtual address space is composed of equal-sized units referred to as virtual pages. In an IA-32 environment, the page size is 4KB; on IA-64, the page size can be configured as 4, 8, 16, or 64KB.

The virtual address space of any Linux process is further divided into two main regions: user space and kernel space. User space resides in the lower portion of the address space, starting at address zero and extending to a platform-specific TASK_SIZE limit specified in processor.h (see Figure 9-1). The remainder of the address space is reserved for the kernel. The user portion of the address space is marked private, meaning that it is mapped by the process's own page table. The kernel space, on the other hand, is shared among all processes. Depending on the hardware, the kernel address space is either mapped into the upper portion of each process's address space or occupies the top portion of the CPU's virtual address space. While executing at user level, a process can access only the user address space, because an attempt to access a kernel virtual address results in a protection violation fault. While executing in kernel mode, both the user and the kernel address spaces are accessible.
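
As a rough illustration of how a virtual address decomposes, the following sketch splits an address into a virtual page number and an in-page offset and classifies it as user or kernel. It assumes IA-32 with 4KB pages and the common 3GB/1GB split, so TASK_SIZE is taken to be 0xC0000000; the SKETCH_* macros and the helper names are hypothetical stand-ins for the platform headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed IA-32 values: 4KB pages and a 3GB/1GB user/kernel split.
 * Real values come from the platform headers. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_TASK_SIZE  0xC0000000UL

static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/* Virtual page number: which page the address falls in. */
static inline uint32_t vpage_number(uint32_t addr)
{
    return addr >> SKETCH_PAGE_SHIFT;
}

/* Byte offset of the address within its page. */
static inline uint32_t page_offset_of(uint32_t addr)
{
    return (uint32_t)(addr & (SKETCH_PAGE_SIZE - 1));
}

/* User space is [0, TASK_SIZE); the remainder belongs to the kernel. */
static inline bool is_user_address(uint32_t addr)
{
    return addr < SKETCH_TASK_SIZE;
}
```

Because the page size is a power of two, the split needs only a shift and a mask rather than a division.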

User Address Space

Each address space is represented in the Linux kernel by an object known as the mm structure (struct mm_struct). Because multiple tasks may share the same address space, the mm structure is a reference-counted object that exists as long as its reference count is greater than zero. Each task structure contains an mm pointer to the mm structure that defines the address space of the task (process).
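
The reference-counting rule for the mm structure can be sketched as follows. The types and functions here (mm_sketch, task_sketch, mm_create, mm_get, mm_put) are hypothetical; the real kernel structure carries far more state and uses atomic counters, whereas this single-threaded sketch uses a plain int.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the mm structure; only the reference
 * count is modeled. */
struct mm_sketch {
    int users;                  /* tasks currently sharing this address space */
};

struct task_sketch {
    struct mm_sketch *mm;       /* address space used by this task */
};

/* A new address space starts with one reference (its creator). */
struct mm_sketch *mm_create(void)
{
    struct mm_sketch *mm = malloc(sizeof *mm);
    if (mm != NULL)
        mm->users = 1;
    return mm;
}

/* A task that shares an existing address space takes a reference. */
void mm_get(struct mm_sketch *mm)
{
    assert(mm != NULL && mm->users > 0);
    mm->users++;
}

/* Drop a reference; the structure lives only while the count is > 0.
 * Returns 1 if this call freed it, 0 otherwise. */
int mm_put(struct mm_sketch *mm)
{
    assert(mm != NULL && mm->users > 0);
    if (--mm->users == 0) {
        free(mm);
        return 1;
    }
    return 0;
}
```

Two tasks sharing one address space each hold a reference, so the structure survives until the last of them calls mm_put().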

Figure 9-2 depicts the scenario in which a process attempts to read a word at address z. The read operation itself is labeled 1. Because the page table is assumed to be empty, the read causes a page fault. In response, the Linux kernel searches the VM area list of the process to locate the VM area that contains the faulting address. After determining which page must be accessed, Linux initiates a disk file read, labeled 2. When the I/O subsystem delivers the file data, the operating system copies it into an available page frame, labeled 3. The final step in resolving the read page fault is to update the page table with the virtual-to-physical mapping for the page frame that now holds the data. At this point, the system can restart the read request, which now completes successfully because the required data is available.

Figure 9-2. VM area structures.
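
The page fault scenario from Figure 9-2 can be simulated in user space. This is a minimal sketch under deliberately unrealistic assumptions: a toy address space of 8 pages, a 16-byte page size, 4 physical frames, and an in-memory string standing in for the file on disk. All names (vm_area_sk, lookup_vma, handle_fault, read_byte, disk_reads) are hypothetical and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NPAGES  8    /* assumed: 8 virtual pages in the toy address space */
#define PGSZ    16   /* assumed: 16-byte pages to keep the demo small */
#define NFRAMES 4    /* assumed: 4 physical page frames */

/* Hypothetical VM area: npages virtual pages starting at first_vpn,
 * backed by the bytes of `file` (a stand-in for a file on disk). */
struct vm_area_sk {
    size_t first_vpn, npages;
    const char *file;
};

static int page_table[NPAGES];        /* vpn -> frame, -1 = no translation */
static char frames[NFRAMES][PGSZ];    /* simulated physical memory */
static int next_free_frame;
static int disk_reads;                /* counts simulated file reads */

void pt_init(void)
{
    for (size_t i = 0; i < NPAGES; i++)
        page_table[i] = -1;           /* start with an empty page table */
    next_free_frame = 0;
    disk_reads = 0;
}

/* Search the VM area list for the area that holds the faulting page. */
static const struct vm_area_sk *lookup_vma(const struct vm_area_sk *v,
                                           size_t n, size_t vpn)
{
    for (size_t i = 0; i < n; i++)
        if (vpn >= v[i].first_vpn && vpn < v[i].first_vpn + v[i].npages)
            return &v[i];
    return NULL;
}

/* Steps 2 and 3 plus the final page table update: read the page from
 * its file into a free frame, then install the translation. */
static bool handle_fault(const struct vm_area_sk *v, size_t n, size_t vpn)
{
    assert(page_table[vpn] < 0);      /* only called on a missing translation */
    const struct vm_area_sk *area = lookup_vma(v, n, vpn);
    if (area == NULL || next_free_frame == NFRAMES)
        return false;                 /* unmapped address or no free frame */
    int frame = next_free_frame++;
    size_t off = (vpn - area->first_vpn) * PGSZ;
    memcpy(frames[frame], area->file + off, PGSZ);   /* simulated disk read */
    disk_reads++;
    page_table[vpn] = frame;          /* record the virtual-to-physical mapping */
    return true;
}

/* Step 1: read one byte; on a missing translation, fault, then retry. */
int read_byte(const struct vm_area_sk *v, size_t n, size_t addr)
{
    size_t vpn = addr / PGSZ;
    if (vpn >= NPAGES)
        return -1;
    if (page_table[vpn] < 0 && !handle_fault(v, n, vpn))
        return -1;
    return (unsigned char)frames[page_table[vpn]][addr % PGSZ];
}
```

The first access to a page triggers a simulated disk read; later accesses to the same page hit the installed translation and do not read the file again.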


Tasks that access only the kernel address space, such as the kswapd or pdflush kernel threads, use an anonymous address space; in these cases the mm pointer is NULL. The mm structure is considered the entry point into the core of the virtual memory subsystem, because it contains pointers to the two main data structures that establish the virtual memory environment: the page table and the list of virtual memory (VM) areas. In principle, a page table alone is sufficient to implement virtual memory. However, traditional page table organizations, including the clustered page table approach, are not efficient at representing large address spaces.

The VM Area Structures

To circumvent the issue of large page tables, Linux does not represent address spaces with page tables per se, but uses a set of VM area structure lists instead. The idea behind this approach is to partition an address space into contiguous page ranges that can be handled the same way, so that each range can be represented by a single VM area structure. When a process accesses a page for which no translation exists in the page table, the VM area responsible for that page holds all the information necessary to establish and install the page. As depicted in Figure 9-2, the VM area list lets the Linux kernel create the actual page table entry for any address that is mapped in a process's address space. Consequently, the per-process page table can be regarded as a cache: if a translation is available, the kernel simply uses it; if the translation is missing, the kernel can create it from the corresponding VM area. Treating the page table as a cache provides significant flexibility, because translations for clean pages can be removed at will. Translations for dirty pages can be removed only if the pages are backed by a file, and before removal these pages must be cleaned by writing their content back to the file. This cache-like use of page tables in Linux also provides the foundation for an efficient copy-on-write implementation.
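
The cache-like treatment of page table entries can be sketched as an eviction routine that follows the rules just stated: clean translations are dropped at will, dirty file-backed pages are written back before their translation is dropped, and other dirty pages are left in place. The pte_sketch type and both functions are hypothetical; a real page table entry has a hardware-defined format, and the kernel's reclaim paths are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical translation entry. Because any missing translation can be
 * rebuilt from the owning VM area, the table behaves like a cache. */
struct pte_sketch {
    bool present;       /* translation currently cached */
    bool dirty;         /* page modified since it was last written out */
    bool file_backed;   /* owning VM area maps a file */
    int  frame;         /* physical frame number when present */
};

static int writebacks;  /* counts simulated writes back to the file */

/* Try to evict one translation. Clean entries are dropped at will;
 * dirty file-backed entries are cleaned (written back) first; other
 * dirty entries are kept, following the rule stated in the text. */
bool drop_translation(struct pte_sketch *pte)
{
    if (!pte->present)
        return true;                  /* nothing cached */
    if (pte->dirty) {
        if (!pte->file_backed)
            return false;             /* cannot be dropped by this sketch */
        writebacks++;                 /* stand-in for writing the page to its file */
        pte->dirty = false;
    }
    pte->present = false;
    return true;
}

/* On the next access, the missing translation is recreated from the VM area. */
void refill_translation(struct pte_sketch *pte, int frame)
{
    assert(!pte->present);
    pte->present = true;
    pte->frame = frame;
}
```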

The VM area-based approach has a cost when a process maps a large number of different files into its address space: the Linux kernel may then have to maintain a VM area list that consists of hundreds of entries. Because this list must be traversed on every page fault, the system slows down as the list grows. To avoid the cost of traversing a long list, Linux tracks the number of VM areas on it. When the size of the list reaches a certain threshold (normally 32 entries), the kernel creates a secondary data structure that organizes the VM areas in a self-balancing binary search tree. With this tree, the VM area structure matching a given virtual address can be located in a number of steps that grows logarithmically with the number of VM areas in the address space. Because some operations must visit every VM area structure, the Linux kernel maintains both the linear list and the tree once the threshold has been reached.
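
To show the difference between a linear list walk and a logarithmic lookup, the sketch below finds the VM area that contains an address. It substitutes binary search over an array of areas sorted by start address for the self-balancing tree the kernel builds; this gives the same logarithmic lookup behavior with less code. The 32-entry threshold comes from the text; the types, function names, and step counter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VMA_TREE_THRESHOLD 32   /* threshold taken from the text */

/* Hypothetical VM area covering [start, end). Areas are sorted by
 * start address and do not overlap. */
struct vma_range {
    unsigned long start, end;
};

static size_t steps;            /* comparisons made by the last lookup */

static const struct vma_range *linear_lookup(const struct vma_range *v,
                                             size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++) {
        steps++;
        if (addr >= v[i].start && addr < v[i].end)
            return &v[i];
    }
    return NULL;
}

/* Binary search over the sorted areas: O(log n), like a balanced tree. */
static const struct vma_range *binary_lookup(const struct vma_range *v,
                                             size_t n, unsigned long addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        steps++;
        if (addr < v[mid].start)
            hi = mid;
        else if (addr >= v[mid].end)
            lo = mid + 1;
        else
            return &v[mid];
    }
    return NULL;                /* address falls in a gap between areas */
}

/* Walk the list for short lists; switch to the logarithmic search once
 * the number of areas reaches the threshold. */
const struct vma_range *lookup_area(const struct vma_range *v, size_t n,
                                    unsigned long addr)
{
    steps = 0;
    return n < VMA_TREE_THRESHOLD ? linear_lookup(v, n, addr)
                                  : binary_lookup(v, n, addr);
}
```

With 1,000 areas, the binary search needs at most 10 comparisons, whereas a linear walk to the last area needs 1,000.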

Kernel Address Space

As illustrated in Figure 9-1, the overall kernel address space can be decomposed into a kernel image section and a kernel module section. (The kernel image section is also known as the identity-mapped segment, and the kernel module section is also referred to as the page table-mapped segment.)

Kernel Module Section

The kernel module section is mapped by the kernel's private page table and is primarily used to implement the kernel's vmalloc() area. This allows the system to allocate large regions that are contiguous in virtual memory, even when the underlying physical pages are not contiguous. For example, the memory needed to load a kernel module is allocated in this section of the address space. The address range associated with vmalloc() is governed by two platform-specific parameters, VMALLOC_START and VMALLOC_END. The vmalloc() area does not necessarily occupy the entire page table-mapped segment, which leaves portions of the segment available for platform-specific purposes and functions.
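
The key property of the vmalloc() area, a virtually contiguous range backed by frames that may be scattered in physical memory, can be sketched with a small private page table. The window base (0xF8000000), its 16-page size, and all names are assumptions chosen for the demonstration rather than real VMALLOC_START or VMALLOC_END values, and the simple bump allocator never frees.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; real VMALLOC_START and
 * VMALLOC_END are platform-specific. */
#define SK_PAGE_SHIFT    12
#define SK_PAGE_SIZE     (1UL << SK_PAGE_SHIFT)
#define SK_VMALLOC_START 0xF8000000UL
#define SK_VMALLOC_PAGES 16     /* tiny window for the demo */

/* Private page table for the window: slot i maps the virtual page at
 * SK_VMALLOC_START + i * SK_PAGE_SIZE to a physical frame number. */
static unsigned long vmalloc_pt[SK_VMALLOC_PAGES];
static size_t vmalloc_next;     /* next free slot (bump allocator) */

/* Map npages possibly scattered physical frames to one contiguous
 * virtual range. Returns the start address, or 0 if the window is full. */
unsigned long map_contiguous(const unsigned long *pfns, size_t npages)
{
    if (npages == 0 || npages > SK_VMALLOC_PAGES - vmalloc_next)
        return 0;
    unsigned long va = SK_VMALLOC_START + vmalloc_next * SK_PAGE_SIZE;
    for (size_t i = 0; i < npages; i++)
        vmalloc_pt[vmalloc_next + i] = pfns[i];
    vmalloc_next += npages;
    return va;
}

/* Translate an address in the window by looking up its table slot. */
unsigned long translate(unsigned long va)
{
    assert(va >= SK_VMALLOC_START);
    size_t slot = (va - SK_VMALLOC_START) >> SK_PAGE_SHIFT;
    assert(slot < vmalloc_next);      /* must lie inside a mapped range */
    return (vmalloc_pt[slot] << SK_PAGE_SHIFT) | (va & (SK_PAGE_SIZE - 1));
}
```

Consecutive virtual pages in the returned range translate to unrelated physical frames, which is why this section needs page tables rather than the simple arithmetic used for the kernel image section.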

Kernel Image Section

The kernel image section is unique in that there is a direct correspondence between a virtual address in this segment and the physical address it translates into. The mapping is platform-specific, but this one-to-one identity relationship gives the segment its name. The segment could be implemented with page tables, but more efficient platform-dependent techniques can be used instead. In other words, the system can rely on a simple mapping formula such as pfn = (addr - PAGE_OFFSET) / PAGE_SIZE, which avoids the overhead of a full page table-based implementation. In addition to this simple mapping, some Linux systems use a table called the page frame map to keep track of the status of the physical page frames in the system. For each page frame, this table contains one page frame descriptor (pfd) holding various resource-related system maintenance data. This information includes a count of the address spaces that are using the page frame and flags that indicate whether the frame can be paged out to disk and whether the page is dirty.
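
The identity-mapping formula and the page frame map can be sketched together: converting a kernel image address to a page frame number is plain arithmetic, and that number then indexes the table of per-frame descriptors. PAGE_OFFSET is assumed to be 0xC0000000 with 4KB pages (the usual IA-32 layout); the pfd_sketch fields mirror the information listed above, but the type, the table size, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions: IA-32 layout with PAGE_OFFSET at 0xC0000000 and 4KB pages. */
#define SK_PAGE_OFFSET 0xC0000000UL
#define SK_PAGE_SHIFT  12
#define SK_NFRAMES     1024     /* tiny physical memory for the demo */

/* Hypothetical page frame descriptor, one per physical frame. */
struct pfd_sketch {
    int  map_count;   /* address spaces currently using the frame */
    bool pageable;    /* frame may be paged out to disk */
    bool dirty;       /* page contents have been modified */
};

static struct pfd_sketch page_frame_map[SK_NFRAMES];

/* Identity-mapped segment: no page table walk, just arithmetic.
 * Equivalent to (addr - PAGE_OFFSET) / PAGE_SIZE. */
unsigned long kaddr_to_pfn(unsigned long addr)
{
    assert(addr >= SK_PAGE_OFFSET);
    return (addr - SK_PAGE_OFFSET) >> SK_PAGE_SHIFT;
}

/* Inverse mapping: the kernel virtual address of a frame's first byte. */
unsigned long pfn_to_kaddr(unsigned long pfn)
{
    return SK_PAGE_OFFSET + (pfn << SK_PAGE_SHIFT);
}

/* Look up the descriptor for the frame behind a kernel image address. */
struct pfd_sketch *kaddr_to_pfd(unsigned long addr)
{
    unsigned long pfn = kaddr_to_pfn(addr);
    assert(pfn < SK_NFRAMES);
    return &page_frame_map[pfn];
}
```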

In Linux, there is no direct correlation between the size of the physical address space and the size of the virtual address space; however, both are limited in size. To handle physical memory that cannot be permanently mapped into the kernel's virtual address space, Linux provides high-memory support.




    Performance Tuning for Linux Servers
    ISBN: 0137136285
    Year: 2006
    Pages: 254
