Section 3.5. Memory-Management Services | The Design and Implementation of the FreeBSD Operating System

3.5. Memory-Management Services

The memory organization and layout associated with a FreeBSD process is shown in Figure 3.3 (on page 62). Each process begins execution with three memory segments: text, data, and stack. The data segment is divided into initialized data and uninitialized data (also known as bss). The text is read-only and is normally shared by all processes executing the file, whereas the data and stack areas can be written by, and are private to, each process. The text and initialized data for the process are read from the executable file.

Figure 3.3. Layout of a FreeBSD process in memory and on disk.

An executable file is distinguished by being a plain file (rather than a directory, special file, or symbolic link) and by having one or more of its execute bits set. Each executable file has an exec header containing a magic number that specifies the type of the executable file. FreeBSD supports multiple executable formats including the following:

Files that must be read by an interpreter
Files that are directly executable including AOUT, ELF, and gzipped ELF

An executable is initially parsed by the image activation (imgact) framework. The header of a file to be executed is passed through a list of registered image activators to find a matching format. When a matching format is found, the corresponding image activator prepares the file for execution.
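The search through the registered image activators can be pictured as a walk over a table of match functions. The following is a minimal sketch rather than the kernel's code: the struct activator type, the match functions, and find_activator() are hypothetical names introduced for illustration, and only three well-known magic numbers are checked (the #! sequence, the ELF signature, and the gzip signature).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical activator entry: a label plus a function that inspects the header. */
struct activator {
    const char *name;
    int (*match)(const unsigned char *hdr, size_t len);  /* nonzero = recognized */
};

/* Interpreter files start with the two characters "#!". */
static int match_interp(const unsigned char *hdr, size_t len)
{
    return len >= 2 && hdr[0] == '#' && hdr[1] == '!';
}

/* ELF files start with the byte 0x7f followed by "ELF". */
static int match_elf(const unsigned char *hdr, size_t len)
{
    return len >= 4 && memcmp(hdr, "\177ELF", 4) == 0;
}

/* gzip streams start with the bytes 0x1f 0x8b. */
static int match_gzip(const unsigned char *hdr, size_t len)
{
    return len >= 2 && hdr[0] == 0x1f && hdr[1] == 0x8b;
}

/* The "registered" activators, tried in order. */
static const struct activator activators[] = {
    { "interpreter", match_interp },
    { "elf",         match_elf },
    { "gzip",        match_gzip },
};

/* Return the name of the first activator that accepts the header, or NULL. */
const char *find_activator(const unsigned char *hdr, size_t len)
{
    for (size_t i = 0; i < sizeof(activators) / sizeof(activators[0]); i++)
        if (activators[i].match(hdr, len))
            return activators[i].name;
    return NULL;
}
```

A caller would read the first bytes of the file into a buffer and pass them to find_activator(); a NULL return corresponds to a file that no activator recognizes and that therefore cannot be executed.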

Files falling into the first classification have as their magic number (located in the first 2 bytes of the file) the two-character sequence #! followed by the pathname of the interpreter to be used. This pathname is currently limited by a compile-time constant to 128 characters. For example, #!/bin/sh refers to the Bourne shell. The image activator that will be selected is the one that handles the invocation of interpreters. It will load and run the named interpreter, passing the name of the file that is to be interpreted as an argument. To prevent loops, FreeBSD allows only one level of interpretation, and a file's interpreter may not itself be interpreted.
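As an illustration of how the interpreter pathname might be pulled out of such a header, consider the sketch below. It is not the kernel's parser: parse_interpreter() and INTERP_MAX are hypothetical names, the 128-character limit is taken from the text above, and the handling of blanks after #! is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

#define INTERP_MAX 128  /* hypothetical name; mirrors the 128-character limit */

/*
 * Copy the interpreter pathname from a "#!" header into out.
 * Returns 0 on success, or -1 if the header does not start with "#!",
 * the pathname is empty, or it does not fit the limit or the buffer.
 */
int parse_interpreter(const char *hdr, size_t len, char *out, size_t outsz)
{
    size_t i = 2, n = 0;

    if (len < 2 || hdr[0] != '#' || hdr[1] != '!')
        return -1;

    /* Skip any blanks between "#!" and the pathname (assumed allowed). */
    while (i < len && (hdr[i] == ' ' || hdr[i] == '\t'))
        i++;

    /* The pathname runs up to the next blank, newline, or NUL. */
    while (i < len && hdr[i] != ' ' && hdr[i] != '\t' &&
           hdr[i] != '\n' && hdr[i] != '\0') {
        if (n >= INTERP_MAX || n + 1 >= outsz)
            return -1;          /* too long for the limit or the buffer */
        out[n++] = hdr[i++];
    }

    if (n == 0)
        return -1;              /* "#!" with no pathname */
    out[n] = '\0';
    return 0;
}
```

Given the header "#!/bin/sh" the function stores "/bin/sh" in out. Anything after the first blank following the pathname is ignored here, and rejecting a header whose interpreter is itself a script (the one-level rule described above) would be a separate check on the interpreter file.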

For performance reasons, most files fall into the second classification and are directly executable. Information in the header of a directly executable file includes the architecture and operating system for which the executable was built and whether it is statically linked or uses shared libraries. The selected image activator can use information such as the operating system for which the executable was compiled to configure the kernel to use the proper system-call interpretation when running the program. For example, an executable built to run on Linux can be seamlessly run on FreeBSD by using the system-call dispatch vector that provides emulation of the Linux system calls.
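For ELF files, one place where the target operating system can be recorded is the OS/ABI byte of the identification array at the start of the header. The sketch below shows the idea of choosing a system-call vector from that byte. It is deliberately simplified: sysvec_for_header() and the returned labels are hypothetical, and many real binaries leave this byte as 0 and are identified by other means, so a real image activator must examine more than this one field.

```c
#include <stddef.h>
#include <string.h>

/* Offsets and values from the ELF identification array (e_ident). */
enum {
    EI_OSABI_OFF  = 7,  /* index of the OS/ABI byte */
    OSABI_LINUX   = 3,
    OSABI_FREEBSD = 9
};

/*
 * Return a label for the system-call vector suggested by the header,
 * or NULL if the buffer is not an ELF header at all.
 * The labels are illustrative, not kernel identifiers.
 */
const char *sysvec_for_header(const unsigned char *hdr, size_t len)
{
    if (len < 8 || memcmp(hdr, "\177ELF", 4) != 0)
        return NULL;

    switch (hdr[EI_OSABI_OFF]) {
    case OSABI_FREEBSD:
        return "freebsd";   /* native system calls */
    case OSABI_LINUX:
        return "linux";     /* Linux emulation vector */
    default:
        return "unknown";   /* would need further inspection */
    }
}
```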

The header also specifies the sizes of text, initialized data, uninitialized data, and additional information for debugging. The debugging information is not used by the kernel or by the executing program. Following the header is an image of the text, followed by an image of the initialized data. Uninitialized data are not contained in the executable file because they can be created on demand using zero-filled memory.

To begin execution, the kernel arranges to have the text portion of the file mapped into the low part of the process address space starting at the beginning of the second page of the virtual address space. The first page of the virtual address space is marked as invalid so that attempts to read or write through a null pointer will fault. The initialized data portion of the file is mapped into the address space following the text. An area equal to the uninitialized data region is created with zero-filled memory after the initialized data region. The stack is also created from zero-filled memory. Although the stack should not need to be zero filled, early UNIX systems made it so. In an attempt to save some startup time in 4.2BSD, the developers modified the kernel to not zero-fill the stack, leaving the random previous contents of the page instead. But concerns about surreptitious misuse of data from previously running programs and unrepeatable errors in previously working programs led to restoration of the zero filling of the stack by the time that 4.3BSD was released.
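The placement of the low segments reduces to a small amount of address arithmetic. The sketch below works it through under two stated assumptions that go beyond the text: the initialized data is taken to start on the first page boundary after the text (so that the read-only text and writable data can have different page protections), and the uninitialized data is taken to follow the initialized data directly. The struct layout type and compute_layout() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical record of where each low segment begins. */
struct layout {
    uintptr_t text_start;   /* first byte of text */
    uintptr_t data_start;   /* first byte of initialized data */
    uintptr_t bss_start;    /* first byte of uninitialized data */
    uintptr_t bss_end;      /* one past the last byte of uninitialized data */
};

/* Round x up to the next multiple of pagesz. */
static uintptr_t round_up(uintptr_t x, uintptr_t pagesz)
{
    return (x + pagesz - 1) / pagesz * pagesz;
}

/* Lay out text, data, and bss of the given sizes (in bytes). */
struct layout compute_layout(uintptr_t pagesz, uintptr_t text,
                             uintptr_t data, uintptr_t bss)
{
    struct layout l;

    /* Page 0 stays invalid so null-pointer accesses fault. */
    l.text_start = pagesz;
    /* Assumption: data begins on the page boundary after the text. */
    l.data_start = round_up(l.text_start + text, pagesz);
    /* Assumption: bss follows the initialized data directly. */
    l.bss_start = l.data_start + data;
    l.bss_end = l.bss_start + bss;
    return l;
}
```

With 4096-byte pages, 10000 bytes of text, 3000 bytes of initialized data, and 5000 bytes of uninitialized data, the text starts at 4096, the data at 16384, and the uninitialized data occupies addresses 19384 up to, but not including, 24384.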

Copying into memory the entire text and initialized data portion of a large program causes a long startup latency. FreeBSD avoids this startup time by demand paging the program into memory rather than preloading the program. In demand paging, the program is loaded in small pieces (pages) as they are needed rather than all at once before it begins execution. The system does demand paging by dividing up the address space into equal-sized areas called pages. For each page, the kernel records the offset into the executable file of the corresponding data. The first access to an address on each page causes a page-fault trap in the kernel. The page-fault handler reads the correct page of the executable file into the process memory. Thus, the kernel loads only those parts of the executable file that are needed. Chapter 5 explains paging details.
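The bookkeeping just described can be mimicked in user space. The following toy model is a sketch only: it keeps a per-page record of the file offset and a resident flag, and it fills a page from an in-memory stand-in for the executable file on first access. All names (struct image, image_init(), image_read(), SIM_PAGE_SIZE, SIM_NPAGES) are hypothetical, and a real kernel relies on hardware page faults rather than an explicit check on every read.

```c
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096   /* hypothetical page size for the model */
#define SIM_NPAGES    4      /* size of the toy address space in pages */

struct page_entry {
    size_t file_off;   /* offset of this page's contents in the file */
    int    resident;   /* nonzero once the page has been loaded */
};

struct image {
    const unsigned char *file;       /* stand-in for the executable on disk */
    size_t               file_len;
    unsigned char        mem[SIM_NPAGES * SIM_PAGE_SIZE];
    struct page_entry    pages[SIM_NPAGES];
    int                  faults;     /* number of simulated page faults */
};

/* Record the file offset for each page; load nothing yet. */
void image_init(struct image *im, const unsigned char *file, size_t len)
{
    memset(im, 0, sizeof(*im));
    im->file = file;
    im->file_len = len;
    for (size_t i = 0; i < SIM_NPAGES; i++)
        im->pages[i].file_off = i * SIM_PAGE_SIZE;
}

/* Read one byte, loading its page from the file on first touch. */
unsigned char image_read(struct image *im, size_t addr)
{
    if (addr >= sizeof(im->mem))
        return 0;                    /* outside the toy address space */

    size_t pg = addr / SIM_PAGE_SIZE;
    struct page_entry *pe = &im->pages[pg];

    if (!pe->resident) {             /* simulated page fault */
        if (pe->file_off < im->file_len) {
            size_t n = im->file_len - pe->file_off;
            if (n > SIM_PAGE_SIZE)
                n = SIM_PAGE_SIZE;
            memcpy(&im->mem[pg * SIM_PAGE_SIZE], im->file + pe->file_off, n);
        }                            /* pages past the file stay zero filled */
        pe->resident = 1;
        im->faults++;
    }
    return im->mem[addr];
}
```

Reading two addresses on the same page costs one simulated fault, and pages that are never touched are never copied, which is the saving that demand paging is after.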

It might seem more efficient to load the whole process at once rather than in many little pieces. However, most processes use less than half of their address space during their entire execution lifetime. The reason for the low utilization is that typical user commands have many options, only a few of which are used on any invocation. The code and data structures that support the unused options are not needed. Thus, the cost of loading the subset of pages that are used is lower than the cost of initially loading the whole process. In addition to the time saved by avoiding the loading of the entire process, demand paging also reduces the amount of physical memory that is needed to run the process.

The uninitialized data area can be extended with zero-filled pages using the system call sbrk, although most user processes use the library routine malloc(), a more programmer-friendly interface to sbrk. This allocated memory, which grows from the top of the original data segment, is called the heap. On the PC, the stack grows down from the top of memory, whereas the heap grows up from the bottom of memory.
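A direct use of sbrk might look like the sketch below. The wrapper grow_heap() is a hypothetical name; sbrk itself is a legacy interface that is not available on every platform, and the feature-test macro at the top is an assumption about building with glibc in a strict standards mode. Most programs should simply call malloc().

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* assumption: lets glibc declare sbrk() in strict modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Extend the data segment by nbytes.
 * Returns the start of the newly added region, or NULL on failure.
 */
void *grow_heap(size_t nbytes)
{
    void *old = sbrk((intptr_t)nbytes);   /* returns the previous break */

    return old == (void *)-1 ? NULL : old;
}
```

After a successful call the region returned is usable by the process, and sbrk(0) reports the new end of the data segment.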

Above the user stack are areas of memory that are created by the system when the process is started. Directly above the user stack are the number of arguments (argc), the argument vector (argv), and the process environment vector (envp) set up when the program was executed. Following them are the argument and environment strings themselves. Next is the signal code, used when the system delivers signals to the process. At the top is the ps_strings structure, used by ps to locate the argv of the process.

Historically, most executables were statically linked. In a statically linked binary, all the library routines and system call entry stubs are loaded into the binary image at the time that it is compiled. Today, most binaries are dynamically linked. A dynamically linked binary contains only the compiled application code and a list of the routines (library and system call entry stubs) that it needs. When the executable is run, a set of shared libraries containing the routines that it needs to use is mapped into its address space as part of its startup. The first time that the program calls a routine, that routine is located in the shared library and a dynamic linkage is created to it.
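The idea of creating the linkage on first call can be modeled with a table of slots that start out unresolved. The sketch below is a simulation, not the dynamic loader's mechanism: the "library" is an ordinary array in the same program, and struct slot, call_lazy(), lib_square(), and the resolutions counter are hypothetical names used only to make the one-time lookup visible.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef int (*fn_t)(int);

/* Stands in for a routine that lives in a shared library. */
static int lib_square(int x)
{
    return x * x;
}

/* Stands in for the library's table of exported routines. */
static const struct {
    const char *name;
    fn_t        fn;
} library[] = {
    { "square", lib_square },
};

int resolutions;   /* how many times a lookup by name was performed */

/* One entry in the program's list of needed routines. */
struct slot {
    const char *name;     /* routine wanted */
    fn_t        target;   /* NULL until the first call resolves it */
};

/* Call through a slot, resolving it by name on first use only. */
int call_lazy(struct slot *s, int arg)
{
    if (s->target == NULL) {
        for (size_t i = 0; i < sizeof(library) / sizeof(library[0]); i++) {
            if (strcmp(library[i].name, s->name) == 0) {
                s->target = library[i].fn;
                break;
            }
        }
        resolutions++;
        if (s->target == NULL)
            abort();      /* undefined symbol: nothing sensible to call */
    }
    return s->target(arg);
}
```

Calling through the same slot twice performs the lookup by name once; the second call goes straight to the saved address.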

When the dynamic loader uses the mmap system call to allocate space for the shared libraries, the kernel must find a place within the process address space to place them. The convention in FreeBSD is to place them just below the administrative lower limit for the stack. Since the stack will not be permitted to grow below the administrative stack size limit, there is no danger of the shared libraries being overwritten. A side effect of this implementation is that the stack limit cannot be safely changed after a binary begins running. Ideally, a bigger stack limit can be set by the parent process (such as the shell) before it starts the application. However, applications that know at startup that they will need a bigger stack can increase their stack limit and then call the exec system call on themselves to restart themselves with their shared libraries relocated at the bottom of their new stack limit.
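The raise-the-limit-then-exec approach might be sketched as follows, using the standard getrlimit, setrlimit, and execvp calls. The helper names ensure_stack_limit() and reexec_self() are hypothetical, and using argv[0] to find the program again is an assumption that holds only when argv[0] names the executable.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Make sure the soft stack limit is at least want bytes.
 * Returns 1 if the limit was raised (the caller should re-exec),
 * 0 if it was already large enough, and -1 on failure.
 */
int ensure_stack_limit(rlim_t want)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= want)
        return 0;                       /* nothing to do */
    if (rl.rlim_max != RLIM_INFINITY && want > rl.rlim_max)
        return -1;                      /* hard limit forbids it */

    rl.rlim_cur = want;
    return setrlimit(RLIMIT_STACK, &rl) == 0 ? 1 : -1;
}

/*
 * Restart the program so that it is laid out under the new limit.
 * Assumes argv[0] names the executable; returns only if exec fails.
 */
int reexec_self(char *const argv[])
{
    execvp(argv[0], argv);
    return -1;
}
```

A program would call ensure_stack_limit() at the top of main() and call reexec_self(argv) only when the result is 1. Resource limits are preserved across exec, so the restarted program finds the limit already large enough, gets 0, and does not loop.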

An alternative would be to place the shared libraries just above the heap limit. However, this would mean that the heap limit could not be increased once the binary began running. As applications much more frequently want to increase their heap size than their stack size, the stack limit was selected as the appropriate location to place the shared libraries.

A process requires the use of some global system resources. The kernel maintains a linked list of processes that has an entry for each process in the system. Among other data, the process entries record information on scheduling and on virtual-memory allocation. Because the entire process address space, including the kernel stack for the process, may be swapped out of main memory, the process entry must record enough information to be able to locate the process and to bring that process back into memory. In addition, information needed while the process is swapped out (e.g., scheduling information) must be maintained in the process entry rather than in the user structure to avoid the kernel swapping in the process only to decide that it is not at a high enough priority to be run.

Other global resources associated with a process include space to record information about descriptors, and page tables that record information about physical-memory utilization.