Section 5.4. Per-Process Resources | The Design and Implementation of the FreeBSD Operating System

5.4. Per-Process Resources

As we have already seen, a process requires a process entry and a kernel stack. The next major resource that must be allocated is its virtual memory. The initial virtual-memory requirements are defined by the header in the process's executable. These requirements include the space needed for the program text, the initialized data, the uninitialized data, and the run-time stack. During the initial startup of the program, the kernel will build the data structures necessary to describe these four areas. Most programs need to allocate additional memory. The kernel typically provides this additional memory by expanding the uninitialized data area.

Most FreeBSD programs use shared libraries. The header for the executable will describe the libraries that it needs (usually the C library, and possibly others). The kernel is not responsible for locating and mapping these libraries during the initial execution of the program. Finding, mapping, and creating the dynamic linkages to these libraries is handled by the user-level startup code prepended to the file being executed. This startup code usually runs before control is passed to the main entry point of the program [Gingell et al., 1987].

FreeBSD Process Virtual-Address Space

The initial layout of the address space for a process is shown in Figure 5.6 (on page 152). As discussed in Section 5.2, the address space for a process is described by that process's vmspace structure. The contents of the address space are defined by a list of vm_map_entry structures, each structure describing a region of virtual address space that resides between a start and an end address. A region describes a range of memory that is being treated in the same way. For example, the text of a program is a region that is read-only and is demand paged from the file on disk that contains it. Thus, the vm_map_entry also contains the protection mode to be applied to the region that it describes. Each vm_map_entry structure also has a pointer to the object that provides the initial data for the region. The object also stores the modified contents either transiently, when memory is being reclaimed, or more permanently, when the region is no longer needed. Finally, each vm_map_entry structure has an offset that describes where within the object the mapping begins.

Figure 5.6. Layout of an address space.

The example shown in Figure 5.6 represents a process just after it has started execution. The first two map entries both point to the same object; here, that object is the executable. The executable consists of two parts: the text of the program that resides at the beginning of the file and the initialized data area that follows at the end of the text. Thus, the first vm_map_entry describes a read-only region that maps the text of the program. The second vm_map_entry describes the copy-on-write region that maps the initialized data of the program that follows the program text in the file (copy-on-write is described in Section 5.6). The offset field in the entry reflects this different starting location. The third and fourth vm_map_entry structures describe the uninitialized data and stack areas, respectively. Both of these areas are represented by anonymous objects. An anonymous object provides a zero-filled page on first use and arranges to store modified pages in the swap area if memory becomes tight. Anonymous objects are described in more detail later in this section.

Page-Fault Dispatch

When a process attempts to access a piece of its address space that is not currently resident, a page fault occurs. The page-fault handler in the kernel is presented with the virtual address that caused the fault and the type of access that was attempted (read or write). The fault is handled with the following four steps:

1. Find the vmspace structure for the faulting process; from that structure, find the head of the vm_map_entry list.

2. Traverse the vm_map_entry list, starting at the entry indicated by the map hint; for each entry, check whether the faulting address falls within its start and end address range. If the kernel reaches the end of the list without finding any valid region, the faulting address is not within any valid part of the address space for the process, so send the process a segment fault signal.

3. Having found a vm_map_entry that contains the faulting address, convert that address to an offset within the underlying object. Calculate the offset within the object as

 object_offset = fault_address     - vm_map_entry4start_address     + vm_map_entryobject_offset

Subtract off the start address to give the offset into the region mapped by the vm_map_entry. Add in the object offset to give the absolute offset of the page within the object.

4. Present the absolute object offset to the underlying object, which allocates a vm_page structure and uses its pager to fill the page. The object then returns a pointer to the vm_page structure, which is mapped into the faulting location in the process address space.

Once the appropriate page has been mapped into the faulting location, the page-fault handler returns and reexecutes the faulting instruction.

Mapping to Objects

Objects are used to hold information about either a file or about an area of anonymous memory. Whether a file is mapped by a single process in the system or by many processes in the system, it will always be represented by a single object. Thus, the object is responsible for maintaining all the state about those pages of a file that are resident. All references to that file will be described by vm_map_entry structures that reference the same object. An object never stores the same page of a file in more than one physical-memory page, so all mappings will get a consistent view of the file.

An object stores the following information:

A list of the pages for that object that are currently resident in main memory; a page may be mapped into multiple address spaces, but it is always claimed by exactly one object
A count of the number of vm_map_entry structures or other objects that reference the object
The size of the file or anonymous area described by the object
The number of memory-resident pages held by the object
Pointer to shadow objects (described in Section 5.5)
The type of pager for the object; the pager is responsible for providing the data to fill a page and for providing a place to store the page when it has been modified (pagers are covered in Section 5.10)

There are three types of objects in the system:

Named objects represent files; they may also represent hardware devices that are able to provide mapped memory such as frame buffers.
Anonymous objects represent areas of memory that are zero filled on first use; they are abandoned when they are no longer needed.
Shadow objects hold private copies of pages that have been modified; they are abandoned when they are no longer referenced.

These objects are often referred to as "internal" objects in the source code. The type of an object is defined by the type of pager that that object uses to fulfill page-fault requests.

A named object uses either the device pager, if it maps a hardware device, or the vnode pager, if it is backed by a file in the filesystem. The device pager services a page fault by returning the appropriate address for the device being mapped. Since the device memory is separate from the main memory on the machine, it will never be selected by the pageout daemon. Thus, the device pager never has to handle a pageout request.

The vnode pager provides an interface to objects that represent files in the filesystem. The vnode pager keeps a reference to a vnode that represents the file being mapped in the object. The vnode pager services a pagein request by doing a read on the vnode; it services a pageout request by doing a write to the vnode. Thus, the file itself stores the modified pages. In cases where it is not appropriate to modify the file directly, such as an executable that does not want to modify its initialized data pages, the kernel must interpose an anonymous shadow object between the vm_map_entry and the object representing the file; see Section 5.5.

Anonymous objects use the swap pager. An anonymous object services pagein requests by getting a page of memory from the free list and zeroing that page. When a pageout request is made for a page for the first time, the swap pager is responsible for finding an unused page in the swap area, writing the contents of the page to that space, and recording where that page is stored. If a pagein request comes for a page that had been previously paged out, the swap pager is responsible for finding where it stored that page and reading back the contents into a free page in memory. A later pageout request for that page will cause the page to be written out to the previously allocated location.

Shadow objects also use the swap pager. They work just like anonymous objects, except that the swap pager provides their initial pages by copying existing pages in response to copy-on-write faults instead of by zero-filling pages.

Further details on the pagers are given in Section 5.10.

Objects

Each virtual-memory object has a pager type, pager handle, and pager private data associated with it. Objects that map files have a vnode-pager type associated with them. The handle for the vnode-pager type is a pointer to the vnode on which to do the I/O, and the private data is the size of the vnode at the time that the mapping was done. Every vnode that maps a file has an object associated with it. When a fault occurs for a file that is mapped into memory, the object associated with the file can be checked to see whether the faulted page is resident. If the page is resident, it can be used. If the page is not resident, a new page is allocated, and the vnode pager is requested to fill the new page.

Caching in the virtual-memory system is identified by an object that is associated with a file or region that it represents. Each object contains pages that are the cached contents of its associated file or region. Objects are reclaimed as soon as their reference count drops to zero. Pages associated with reclaimed objects are moved to the free list. Objects that represent anonymous memory are reclaimed as part of cleaning up a process as it exits. However, objects that refer to files are persistent. When the reference count on a vnode drops to zero, it is stored on a least-recently used (LRU) list known as the vnode cache; see Section 6.6. The vnode does not release its object until the vnode is reclaimed and reused for another file. Unless there is pressure on the memory, the object associated with the vnode will retain its pages. If the vnode is reactivated and a page fault occurs before the associated page is freed, that page can be reattached rather than being reread from disk.

This cache is similar to the text cache found in earlier versions of BSD in that it provides performance improvements for short-running but frequently executed programs. Frequently executed programs include those used to list the contents of directories, show system status, or perform the intermediate steps involved in compiling a program. For example, consider a typical application that is made up of multiple source files. Each of several compiler steps must be run on each file in turn. The first time that the compiler is run, the executable files associated with its various components are read in from the disk. For each file compiled thereafter, the previously created executable files are found, as well as any previously read header files, alleviating the need to reload them from disk each time.

Objects to Pages

When the system is first booted, the kernel looks through the physical memory on the machine to find out how many pages are available. After the physical memory that will be dedicated to the kernel itself has been deducted, all the remaining pages of physical memory are described by vm_page structures. These vm_page structures are all initially placed on the memory free list. As the system starts running and processes begin to execute, they generate page faults. Each page fault is matched to the object that covers the faulting piece of address space. The first time that a piece of an object is faulted, it must allocate a page from the free list and must initialize that page either by zero-filling it or by reading its contents from the filesystem. That page then becomes associated with the object. Thus, each object has its current set of vm_page structures linked to it. A page can be associated with at most one object at a time. Although a file may be mapped into several processes at once, all those mappings reference the same object. Having a single object for each file ensures that all processes will reference the same physical pages.

If memory becomes scarce, the paging daemon will search for pages that have not been used actively. Before these pages can be used by a new object, they must be removed from all the processes that currently have them mapped, and any modified contents must be saved by the object that owns them. Once cleaned, the pages can be removed from the object that owns them and can be placed on the free list for reuse. The details of the paging system are described in Section 5.12.