8.3. Mach VM

In this section, we will discuss the Mach VM architecture as it is implemented in the Mac OS X kernel. Mach's VM design has the following noteworthy aspects:

  • A clean separation between machine-dependent and machine-independent parts. Only the latter part has complete VM-related information.

  • Large, sparse virtual address spaces: one for each task, and fully shared by all threads within that task.

  • Integration of memory management and interprocess communication. Mach provides IPC-based interfaces for working with task address spaces. These interfaces are especially flexible in allowing one task to manipulate the address space of another.

  • Optimized virtual copy operations through symmetric or asymmetric copy-on-write (COW) algorithms.

  • Flexible memory sharing between related or unrelated tasks, with support for copy-on-write, which is useful during fork() and during large IPC transfers. In particular, tasks can send parts of their address spaces to one another in IPC messages.

  • Memory-mapped files.

  • A variety of backing store types usable through multiple pagers. Although not supported in Mac OS X, Mach provides support for user-space pagers, wherein user programs can implement facilities such as encrypted virtual memory and distributed shared memory.

Figure 8-5 shows an overview of the relationships between the key components of Mach's VM architecture.

Figure 8-5. The Mac OS X implementation of the Mach VM architecture


8.3.1. Overview

Each task's address space is represented in the kernel by an address map (a VM map), which contains a doubly linked list of memory regions and a machine-dependent physical map (pmap) structure. The pmap handles virtual-to-physical address translations. Each memory region (a VM map entry) represents a contiguous range of virtual addresses, all of which are currently mapped (valid) in the task. However, each range has its own protection and inheritance attributes, so even if an address is valid, the task may not be able to access it for one or more types of operations. Moreover, the VM map entries are ordered by address in the list. Each VM map entry has an associated VM object, which contains information about accessing the memory from its source. A VM object contains a list of resident pages, or VM pages. Each VM page is identified within the VM object by its offset from the start of the object. Now, some or all of the VM object's memory may not be resident in physical memory; it may be in a backing store, for example, a regular file, a swap file, or a hardware device. The VM object is backed[3] by a memory object, which, in the simplest sense, is a Mach port to which messages can be sent by the kernel to retrieve the missing data. The owner of a memory object is a memory manager (often called a pager). A pager is a specialized task (an in-kernel piece of code in Mac OS X) that supplies data to the kernel and receives modified data upon eviction.

[3] A portion of a VM object can also be backed by another VM object, as we will see when we discuss Mach's copy-on-write mechanism.

Figure 8-6 is a more detailed version of Figure 8-5, showing a finer-grained view of the relationships between the VM subsystem data structures.

Figure 8-6. Details of the Mac OS X Mach VM architecture


Let us now look at the important constituents of Mach's VM subsystem in detail.

8.3.2. Task Address Spaces

Each task has a virtual address space defining the set of valid virtual addresses that any thread within the task is allowed to reference. A 32-bit task has a 4GB virtual address space, whereas a 64-bit task's virtual address space is much larger: Mac OS X 10.4 provides a 64-bit user task with a 51-bit virtual address space, which amounts to over 2 petabytes[4] of virtual memory. Although a typical task's virtual address space is large, the task uses only a subset of the available virtual memory; at any given time, several subranges of its address space may be unused, leaving the address space sparsely populated. It is, however, possible for special-purpose programs to have virtual memory requirements that exceed what a 32-bit address space can provide.

[4] A petabyte is approximately 10^15 bytes.
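
The arithmetic behind the petabyte figure is easy to verify. The following small program, included purely for illustration, computes the span of a 51-bit virtual address space.

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    /* A 51-bit virtual address space spans 2^51 bytes. */
    uint64_t bytes = 1ULL << 51;

    /* Prints 2251799813685248, i.e., a little over 2.25 * 10^15 bytes. */
    printf("2^51 bytes = %llu (~%.2f petabytes)\n",
           (unsigned long long)bytes, (double)bytes / 1e15);
    return 0;
}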

8.3.3. VM Maps

Each task's virtual address space is described by a VM map data structure (struct vm_map [osfmk/vm/vm_map.h]). The task structure's map field points to a vm_map structure.

The task structure also contains information used by the task working set detection subsystem and the global shared memory subsystem. We will look at these subsystems in Sections 8.14 and 8.13, respectively.


A VM map is a collection of memory regions, or VM map entries, with each region being a virtually contiguous set of pages (a virtual range) with the same properties. Examples of these properties include the memory's source and attributes such as protection and inheritance. Each entry has a start address and an end address. The VM map points to an ordered doubly linked list of VM map entries.
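
To make these relationships concrete, here is a simplified, illustrative sketch of the two structures. The names and types are stand-ins chosen for readability, not the actual definitions in osfmk/vm/vm_map.h; the hint fields are discussed in the next subsection.

/* Simplified stand-ins for the kernel's types; see osfmk/vm/vm_map.h and
   osfmk/vm/vm_object.h for the real structures. */
typedef unsigned long long my_vm_offset_t;
typedef unsigned int       my_vm_prot_t;
typedef unsigned int       my_vm_inherit_t;

struct my_vm_object;                      /* source of the memory (Section 8.3.5) */
struct my_pmap;                           /* machine-dependent physical map (Section 8.3.8) */

struct my_vm_map_entry {
    struct my_vm_map_entry *prev, *next;  /* address-ordered doubly linked list */
    my_vm_offset_t          start, end;   /* virtually contiguous range [start, end) */
    struct my_vm_object    *object;       /* where this range's pages come from */
    my_vm_offset_t          offset;       /* offset of this range within the object */
    my_vm_prot_t            protection;   /* current protection */
    my_vm_inherit_t         inheritance;  /* inheritance attribute (used across fork()) */
    unsigned int            needs_copy:1; /* set while the range is copy-on-write shared */
};

struct my_vm_map {
    struct my_vm_map_entry *first, *last; /* the ordered list of map entries */
    struct my_vm_map_entry *hint;         /* last entry found, to speed up lookups */
    struct my_vm_map_entry *free_hint;    /* "free space" hint for quick allocation */
    struct my_pmap         *pmap;         /* hardware address translation state */
    my_vm_offset_t          size;         /* total virtual size currently mapped */
};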

8.3.4. VM Map Entries

A VM map entry is represented by a vm_map_entry structure (struct vm_map_entry [osfmk/vm/vm_map.h]). Since each entry represents a virtual memory range that is currently mapped in the task, the kernel searches the entry list at various times, in particular while allocating memory. vm_map_lookup_entry() [osfmk/vm/vm_map.c] is used to find a VM map entry, if any, containing the specified address in the given VM map. The search algorithm is simple: The kernel searches the list linearly, either from the head of the list or from a hint that it previously saved after a successful lookup. The hint is maintained in the VM map, which also maintains a "free space" hint used to determine a free address quickly. If the given address cannot be found, vm_map_lookup_entry() returns the immediately preceding entry.
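
Written against the simplified structures sketched in Section 8.3.3, the hinted linear search might look roughly like this. This is a sketch only, not the kernel's actual vm_map_lookup_entry().

/* Returns 1 and sets *entry_out if addr falls inside some entry's
   [start, end) range; otherwise returns 0 and sets *entry_out to the
   immediately preceding entry (or NULL if there is none). */
static int
my_map_lookup_entry(struct my_vm_map *map, my_vm_offset_t addr,
                    struct my_vm_map_entry **entry_out)
{
    struct my_vm_map_entry *e = map->hint;
    struct my_vm_map_entry *prev = NULL;

    /* Start from the saved hint unless it lies beyond addr, in which case
       fall back to the head of the list. */
    if (e == NULL || e->start > addr)
        e = map->first;

    while (e != NULL && e->start <= addr) {
        if (addr < e->end) {
            map->hint  = e;           /* remember the hit for the next lookup */
            *entry_out = e;
            return 1;
        }
        prev = e;
        e = e->next;
    }
    *entry_out = prev;
    return 0;
}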

The kernel can split or merge VM map entries as necessary. For example, changing one or more attributes of a subset of a VM entry's pages will result in the entry being split into either two or three entries, depending on the offset of the modified page or pages. Other operations can lead to the merging of entries describing adjacent regions.

8.3.5. VM Objects

A task's memory can have several sources. For example, a shared library mapped into the task's address space represents memory whose source is the shared library file. We noted earlier that all pages in a single VM map entry have the same source. A VM object (struct vm_object [osfmk/vm/vm_object.h]) represents that source, with a VM map entry being the bridge between a VM object and a VM map. A VM object is conceptually a contiguous repository of data, some of which may be cached in resident memory, and the rest can be retrieved from the corresponding backing store. The entity in charge of transferring pages between physical memory and a backing store is called a pager, or more appropriately, a memory manager. In other words, a VM object is backed by a memory manager. As we will shortly see, when Mach uses copy-on-write optimizations, a VM object can be partially backed by another VM object.

Although we will use the terms pager and memory manager synonymously, it must be noted that besides paging, a memory manager also plays an important role in maintaining consistency between the contents of the backing store and the contents of resident pages corresponding to a VM object. Sometimes a memory manager is also called a data manager.


8.3.5.1. Contents of a VM Object

A VM object contains a list of its resident pages, along with information about how to retrieve the pages that are not resident. Note that resident pages are not shared between VM objects: a given page exists within exactly one VM object. The list of resident page structures attached to a VM object is especially useful in releasing all pages associated with an object when it is destroyed.

A VM object data structure also contains properties such as the following:

  • Object's size

  • Number of references to the object

  • Associated memory object (pager) and the offset into the pager

  • Memory object control port

  • Pointers to shadow and copy objects (see Section 8.3.7), if any

  • "Copy strategy" that the kernel should use while copying the VM object's data

  • Flag indicating whether the object is internal (and thus is created by the kernel and managed by the default pager)

  • Flag indicating whether the object is temporary (and thus cannot be changed externally by a memory manager; in-memory changes to such an object are not reflected back to the memory manager)

  • Flag indicating whether the object can persist (i.e., whether the kernel can keep the object's data cached, along with rights to the associated memory object) after all address map references to the object are deallocated
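
A simplified sketch of these properties, reusing the stand-in types from the sketch in Section 8.3.3 (again, the names are illustrative rather than the actual struct vm_object definition), follows.

struct my_vm_page;                        /* a resident page, identified by its offset */

struct my_vm_object {
    struct my_vm_page   *resident_pages;  /* list of pages currently in physical memory */
    my_vm_offset_t       size;            /* size of the object */
    int                  ref_count;       /* number of references to the object */

    void                *pager;           /* associated memory object (pager) port */
    my_vm_offset_t       paging_offset;   /* offset into the pager */
    void                *pager_control;   /* memory object control port */

    struct my_vm_object *shadow;          /* shadow object, if any (Section 8.3.7) */
    my_vm_offset_t       shadow_offset;   /* offset of this object within the shadowed object */
    struct my_vm_object *copy;            /* copy object, if any (Section 8.3.7) */
    int                  copy_strategy;   /* how the kernel should copy the object's data */

    unsigned int         internal:1,      /* created by the kernel, default-pager managed */
                         temporary:1,     /* not changed externally by a memory manager */
                         can_persist:1;   /* may be cached after all references are gone */
};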

As shown in Figure 8-6, a memory object is implemented as a Mach port to which a pager owns receive rights.[5] When the kernel needs a VM object's pages to be brought into physical memory from the backing store, it communicates with the associated pager through the memory object port. The memory object control port, to which the kernel owns receive rights, is used to receive data from the pager.

[5] We will discuss Mach port rights in Chapter 9.

With this knowledge, we can redescribe the bigger picture as follows: A VM map maps each valid region of a task's virtual address space to an offset within some memory object. For each memory object used in a VM map, the VM subsystem maintains a VM object.

8.3.5.2. Backing Stores

A backing store is a place for data to live when it is not resident. It can also be the source of the data, but not necessarily. In the case of a memory-mapped file, the backing store is the file itself. When the kernel needs to evict from physical memory a page that is backed by a file, it can simply discard the page unless the page has been modified while it was resident, in which case the change can be committed to the backing store.

Dynamically allocated memory, such as that obtained by calling malloc(3), is anonymous in that it has no named source to begin with. When an anonymous memory page is used for the first time, Mach simply provides a physical page filled with zeros (hence, anonymous memory is also called zero-filled memory). In particular, there is no backing store initially associated with anonymous memory. When the kernel must evict such a page, it uses swap space as the backing store. Anonymous memory does not persist across system reboots. The corresponding VM objects, which are created by the kernel, are also called internal objects.

When allocating anonymous memory, the kernel checks whether an existing VM map entry can be extended so that the kernel can avoid creating a new entry and a new VM object.
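
From user space, anonymous memory can be requested directly from Mach with vm_allocate(), or with its 64-bit-safe variant mach_vm_allocate() on Mac OS X 10.4. The following example allocates one zero-filled page (error handling abbreviated).

#include <stdio.h>
#include <stdint.h>
#include <mach/mach.h>
#include <mach/mach_vm.h>

int
main(void)
{
    mach_vm_address_t address = 0;
    mach_vm_size_t    size    = 4096;     /* one page */

    /* The kernel creates (or extends) a VM map entry backed by an internal
       VM object; no physical page is assigned until the memory is touched. */
    kern_return_t kr = mach_vm_allocate(mach_task_self(), &address, size,
                                        VM_FLAGS_ANYWHERE);
    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "mach_vm_allocate failed: %d\n", kr);
        return 1;
    }

    unsigned char *p = (unsigned char *)(uintptr_t)address;
    printf("allocated at 0x%llx, first byte = %u (zero-filled)\n",
           (unsigned long long)address, p[0]);   /* first touch: zero-fill fault */

    (void)mach_vm_deallocate(mach_task_self(), address, size);
    return 0;
}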


8.3.6. Pagers

A pager manipulates memory objects and pages. It owns the memory object port, which is used by the pager's clients (such as the kernel) as an interface to the memory object's pages, with operations for reading and writing those pages being part of the interface. The memory object is essentially a Mach port representation of the underlying backing storage[6]: it represents the nonresident state of the memory ranges backed by the memory object abstraction. The nonresident state (e.g., on-disk objects such as regular files and swap space) is essentially secondary memory that the kernel caches in primary (physical) memory.

[6] Here's another way to look at this: A memory object is an object-oriented encapsulation of memory, implementing methods such as read and write.

As shown in Figure 8-1, Mac OS X provides three in-kernel pagers:

  • The default pager, which transfers data between physical memory and swap space

  • The vnode pager, which transfers data between physical memory and files

  • The device pager, which is used for mapping special-purpose memory (such as framebuffer memory, PCI memory, or other physical addresses mapped to special hardware), with the necessary WIMG characteristics

The letters in WIMG each specify a caching aspect, namely: write-through, caching-inhibited, memory coherency required, and guarded storage.


A pager may provide any number of memory objects, each of which represents one range of pages that the pager manages. Conversely, a task's address space may have any number of pagers managing separate pieces of it. Note that a pager is not directly involved in paging policies: it cannot alter the kernel's page replacement algorithm beyond setting memory object attributes.

8.3.6.1. External Pagers

The term external memory manager (or external pager) can be used to mean two things. In the first case, it refers to any pager other than the default pager; specifically, one that manages memory whose source is external to the kernel. Anonymous memory corresponds to internal objects, whereas memory-mapped files correspond to external objects. Therefore, the vnode pager would be termed an external pager in this sense. This is the meaning we use in this chapter.

The other meaning refers to where the pager is implemented. If we designate an in-kernel pager as an internal pager, an external pager would be implemented as a specialized user task.

User-Space Pagers

User-space pagers allow flexibility in the types of backing stores that can be introduced without changing the kernel. For example, a pager can be written whose backing store is encrypted or compressed on disk. Similarly, distributed shared memory can be easily implemented via a user-space pager. Mac OS X does not support user-space pagers.


8.3.6.2. A Pager's Port

Whereas a memory object represents a source of data, the memory object's pager is the provider and manager of that data. When a portion of memory represented by a memory object is used by a client task, there are three parties primarily involved: the pager, the kernel, and the client task. As we will see in Section 8.6.1, a task directly or indirectly uses vm_map() (or its 64-bit variant) to map some or all of the memory object's memory into its address space. To do this, the caller of vm_map() must have send rights to the Mach port that represents the memory object. The pager owns this port and can therefore provide these rights to others.

A pager could advertise a service port to which clients could send messages to obtain memory objects. For example, a user-space pager could register its service port with the Bootstrap Server.[7] However, Mac OS X currently does not provide support for adding your own pagers. The three in-kernel pagers in Mac OS X have hardcoded ports. When pager-independent VM code needs to communicate with a pager, it determines the pager to call based on the value of the memory object passed, since the value must correspond to one of the known pagers.

[7] We will discuss details of the Bootstrap Server in Section 9.4.

kern_return_t
memory_object_init(memory_object_t              memory_object,
                   memory_object_control_t      memory_control,
                   memory_object_cluster_size_t memory_object_page_size)
{
    if (memory_object->pager == &vnode_pager_workaround)
        return vnode_pager_init(memory_object, memory_control,
                                memory_object_page_size);
    else if (memory_object->pager == &device_pager_workaround)
        return device_pager_init(memory_object, memory_control,
                                 memory_object_page_size);
    else // default pager
        return dp_memory_object_init(memory_object, memory_control,
                                     memory_object_page_size);
}


The operation of a pager in the Mac OS X kernel uses a combination of the following: a subset of the original Mach pager interface, universal page lists (UPLs), and the unified buffer cache (UBC).

Note that the kernel implicitly provides the memory object for an internal pager; the calling task does not have to acquire send rights to one directly. For example, when a regular file is opened, the vnode pager's port is stashed into the UBC structure referenced from the vnode.

8.3.6.3. The Mach Pager Interface

Mach paging can be summarily described as follows. A client task obtains a memory object port directly or indirectly from a memory manager. It asks the kernel, by calling vm_map(), to map the memory object into its virtual address space. Thereafter, when the task attempts to access (read or write) a page from the newly mapped memory for the first time, a page-not-resident fault occurs. In handling the page fault, the kernel communicates with the memory manager by sending it a message requesting the missing data. The memory manager fetches the data from the backing store it is managing. Other types of page faults are handled as appropriate, with the kernel calling the memory manager and the latter responding asynchronously.

This is how the kernel uses physical memory as a cache for the contents of various memory objects. When the kernel needs to evict resident pages, it may, depending on the nature of the mapping, send "dirty" (modified while resident) pages to the memory manager.

When a client task is done using a mapped memory range, it can call vm_deallocate() to unmap that range. When all mappings of a memory object are gone, the object is terminated.
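
The same sequence can be observed from user space with an ordinary memory-mapped file: mmap() leads to the file's memory object being mapped into the caller's address space, the first access to a page triggers a page-not-resident fault that is satisfied through the vnode pager, and munmap() removes the mapping. A minimal illustration (error handling abbreviated) follows.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    if (argc != 2)
        return 1;

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0)
        return 1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0)
        return 1;

    /* Establish the mapping; no file data is read at this point. */
    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    /* First touch of the page: a page-not-resident fault is handled by
       asking the file's pager for the data. */
    printf("first byte of %s: 0x%02x\n", argv[1], (unsigned char)p[0]);

    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}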

Figure 8-7 shows several messages (routines) that are part of the dialog between a memory manager and a kernel.[8] Let us look at some of these.

[8] We say "a kernel" because, pedantically speaking, a pager could be serving multiple kernels.

Figure 8-7. The Mach pager interface in Mac OS X


When a memory object is mapped for the first time, the kernel needs to notify the pager that it is using the object. It does so by sending a memory_object_init() message[9] to the pager. Regardless of where it is implemented, if you consider the pager as being logically external to the kernel, this is an upcall from the kernel to the pager. An external pager demultiplexes all messages it receives using the memory_object_server() routine.

[9] In Mac OS X, the "message" is simply a function call, not an IPC message.

kern_return_t
memory_object_init(memory_object_t              memory_object,
                   memory_object_control_t      memory_control,
                   memory_object_cluster_size_t memory_object_page_size);


The memory_object argument to memory_object_init() is the port representing the memory object in question. Since the pager can give different clients different memory objects, the client tells the pager which memory object it is dealing with. memory_control, which the kernel provides to the pager, is a port to which the kernel holds receive rights. The pager uses this port to send messages to the kernel. Hence, it is also called the pager reply port.

In Mach, a pager could be serving multiple kernels. In that case, there would be a separate control port for each kernel.


Consider the specific example of the vnode pager in Mac OS X. When memory_object_init() determines (using the hardcoded vnode pager port) that the memory object passed to it corresponds to the vnode pager, it calls vnode_pager_init() [osfmk/vm/bsd_vm.c]. The latter does not really set up the vnode pager, which was already set up when the vnode was created. However, vnode_pager_init() does call memory_object_change_attributes() to set the kernel's attributes for the memory object.

kern_return_t
memory_object_change_attributes(memory_object_control_t control,
                                memory_object_flavor_t  flavor,
                                memory_object_info_t    attributes,
                                mach_msg_type_number_t  count);


The kernel maintains per-object attributes for mapped objects. Cacheability and copy strategy are examples of such attributes. Cacheability specifies whether the kernel should cache the object (provided there is enough memory) even after all users of the object are gone. If an object is marked as not cacheable, it will not be retained when it is not in use: The kernel will return the dirty pages to the pager, reclaim the clean pages, and inform the pager that the object is no longer in use. Copy strategy specifies how the memory object's pages are copied. The following are examples of valid copy strategies.

  • MEMORY_OBJECT_COPY_NONE: The pager's pages should be copied immediately, with no copy-on-write optimization by the kernel.

  • MEMORY_OBJECT_COPY_CALL: If the kernel needs to copy any of the pager's pages, it should call the pager.

  • MEMORY_OBJECT_COPY_DELAY: The pager promises not to change externally any of the data cached by the kernel, so the kernel is free to use an optimized copy-on-write strategy (see asymmetric copy-on-write in Section 8.3.7).

  • MEMORY_OBJECT_COPY_TEMPORARY: This strategy acts like MEMORY_OBJECT_COPY_DELAY; additionally, the pager is not interested in seeing any changes from the kernel.

  • MEMORY_OBJECT_COPY_SYMMETRIC: This strategy acts like MEMORY_OBJECT_COPY_TEMPORARY; additionally, the memory object will not be multiply mapped (see symmetric copy-on-write in Section 8.3.7).

The attributes can be retrieved through memory_object_get_attributes().

kern_return_t
memory_object_get_attributes(memory_object_control_t  control,
                             memory_object_flavor_t   flavor,
                             memory_object_info_t     attributes,
                             mach_msg_type_number_t  *count);


When a client task accesses a memory object page that is not resident, a page fault occurs. The kernel locates the appropriate VM object, which refers to the memory object. The kernel sends the pager a memory_object_data_request() message. The pager will typically provide the data, fetching it from the backing store.

kern_return_t
memory_object_data_request(memory_object_t              memory_object,
                           memory_object_offset_t       offset,
                           memory_object_cluster_size_t length,
                           vm_prot_t                    desired_access);


In Mach, the pager would respond to memory_object_data_request() by sending an asynchronous reply to the kernel: it would send a memory_object_data_supply() or memory_object_data_provided() message (depending on the Mach version) to the memory object control port. In Mac OS X, memory_object_data_request() explicitly calls one of the three pagers. In the case of the vnode pager, the kernel calls vnode_pager_data_request() [osfmk/vm/bsd_vm.c], which in turn calls vnode_pager_cluster_read() [osfmk/vm/bsd_vm.c]. The latter causes data to be paged in by calling vnode_pagein() [bsd/vm/vnode_pager.c], which eventually calls the file-system-specific page-in operation.

Paging Problems

In Mach, the pager can also reply with a memory_object_data_unavailable() or memory_object_data_error() message. memory_object_data_unavailable() means that although the range within the memory object is valid, there is no data for it yet. This message notifies the kernel to return zero-filled pages for the range. Although the pager itself could create zero-filled pages and supply them through memory_object_data_supply(), the kernel's zero-fill code is likely to be more optimized. If a paging error (say, a bad disk sector) causes the pager to fail to retrieve data, the pager can respond with a memory_object_data_error() message.


When the kernel needs to reclaim memory and there are dirty pages for a memory object, the kernel can send those pages to the pager through memory_object_data_return(). In Mac OS X, the in-kernel page-out daemon does this.

kern_return_t
memory_object_data_return(memory_object_t         memory_object,
                          memory_object_offset_t  offset,
                          vm_size_t               size,
                          memory_object_offset_t *resid_offset,
                          int                    *io_error,
                          boolean_t               dirty,
                          boolean_t               kernel_copy,
                          int                     upl_flags);


There is no explicit response to this message; the pager simply deallocates the pages from its address space so that the kernel can use the physical memory for other purposes. In Mac OS X, for the vnode pager, memory_object_data_return() calls vnode_pager_data_return() [osfmk/vm/bsd_vm.c], which in turn calls vnode_pager_cluster_write() [osfmk/vm/bsd_vm.c]. The latter causes data to be paged out by calling vnode_pageout() [bsd/vm/vnode_pager.c], which eventually calls the file-system-specific page-out operation.

A pager uses memory_object_lock_request() to control use of the (resident) data associated with the given memory object. The data is specified as the number of bytes (the size argument) starting at a given byte offset (the offset argument) within the memory object. memory_object_lock_request() sanity-checks its arguments and calls vm_object_update() [osfmk/vm/memory_object.c] on the associated VM object.

kern_return_t
memory_object_lock_request(memory_object_control_t control,
                           memory_object_offset_t  offset,
                           memory_object_size_t    size,
                           memory_object_offset_t *resid_offset,
                           int                    *io_errno,
                           memory_object_return_t  should_return,
                           int                     flags,
                           vm_prot_t               prot);


The should_return argument to memory_object_lock_request() is used to specify the data to be returned, if at all, to the memory manager. It can take the following values:

  • MEMORY_OBJECT_RETURN_NONE: do not return any pages

  • MEMORY_OBJECT_RETURN_DIRTY: return only dirty pages

  • MEMORY_OBJECT_RETURN_ALL: return both dirty and precious pages

  • MEMORY_OBJECT_RETURN_ANYTHING: return all resident pages

The flags argument specifies the operation to perform, if any, on the data. Valid operations are MEMORY_OBJECT_DATA_FLUSH, MEMORY_OBJECT_DATA_NO_CHANGE, MEMORY_OBJECT_DATA_PURGE, MEMORY_OBJECT_COPY_SYNC, MEMORY_OBJECT_DATA_SYNC, and MEMORY_OBJECT_IO_SYNC. Note that the combination of should_return and flags determines the fate of the data. For example, if should_return is MEMORY_OBJECT_RETURN_NONE and flags is MEMORY_OBJECT_DATA_FLUSH, the resident pages will be discarded.

The prot argument is used to restrict access to the given memory. Its value specifies the access that should be disallowed. The special value VM_PROT_NO_CHANGE is used when no change in protection is desired.

The kernel uses memory_object_terminate() to notify the pager that the object is no longer in use. The pager uses memory_object_destroy() to notify the kernel to shut down a memory object even if there are references to the associated VM object. This results in a call to vm_object_destroy() [osfmk/vm/vm_object.c]. In Mac OS X, memory_object_destroy() is called as a result of vclean() [bsd/vfs/vfs_subr.c], which cleans a vnode when it is being reclaimed.

kern_return_t
memory_object_terminate(memory_object_t memory_object);

kern_return_t
memory_object_destroy(memory_object_control_t control,
                      kern_return_t           reason);


8.3.7. Copy-on-Write

Copy-on-write (COW) is an optimization technique wherein a memory copy operation defers the copying of physical pages until one of the parties involved in the copy writes to that memory; until then, the physical pages are shared between the parties. As long as copied data is only read and not written to, copy-on-write saves both time and physical memory. Even when the data is written to, copy-on-write copies only the modified pages.
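
A familiar user-level way to see copy-on-write at work is fork(), which, as noted earlier, copies the parent's address space lazily. In the sketch below, only the page the child writes to is physically copied; the parent's view of the data is unaffected.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
    size_t size = 16UL * 1024 * 1024;        /* 16 MB of anonymous memory */
    char  *buf  = malloc(size);
    if (buf == NULL)
        return 1;
    memset(buf, 'A', size);                  /* fault in (and dirty) all the pages */

    pid_t pid = fork();                      /* the writable pages become COW-shared */
    if (pid == 0) {
        buf[0] = 'B';                        /* only this page is physically copied */
        printf("child : buf[0]=%c buf[1]=%c\n", buf[0], buf[1]);
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent: buf[0]=%c (unchanged)\n", buf[0]);
    free(buf);
    return 0;
}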

Note in Figure 8-6 that two of the VM entries are shown as pointing to the same VM object. This is how Mach implements symmetric copy-on-write sharing. Figure 8-8 shows the scheme. In a symmetric copy-on-write operation, the needs_copy bit is set in both the source and destination VM map entries. Both entries point to the same VM object, whose reference count is incremented. Moreover, all pages in the VM object are write-protected. At this point, both tasks access the same physical pages while reading from the shared memory. When such a page is written to by one of the tasks, a page protection fault occurs. The kernel does not modify the original VM object but creates a new VM object (a shadow object containing a copy of the faulting page) and gives it to the task that modified the page. The other pages, including the unmodified version of the page in question, remain in the original VM object, whose needs_copy bit remains set.

Figure 8-8. Symmetric copy-on-write using shadow objects


In Figure 8-8, when the destination task accesses a previously copy-on-write-shared page that it has already modified, the kernel will find that page in the shadow object. The remaining pages will not be found in the shadow object; the kernel will follow the pointer to the original object and find them there. Multiple copy-on-write operations can result in a shadow object being shadowed by another, leading to a shadow chain. The kernel attempts to collapse such chains when possible. In particular, if all pages in some VM object are shadowed by the parent object, the latter does not need to shadow the former any more; it can shadow the next VM object, if any, in the chain.
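
In terms of the simplified vm_object sketch from Section 8.3.5.1, the lookup performed at fault time can be pictured as a walk down the shadow chain. This is illustrative only; my_resident_page_lookup() is a hypothetical helper that searches an object's resident-page list.

/* Hypothetical helper: find a resident page at the given offset, if any. */
struct my_vm_page *my_resident_page_lookup(struct my_vm_object *object,
                                           my_vm_offset_t offset);

/* Walk the shadow chain: return the first object's resident page covering
   the offset, translating the offset at each hop. Returns NULL if the page
   must instead be fetched through the object's pager. */
static struct my_vm_page *
my_shadow_chain_lookup(struct my_vm_object *object, my_vm_offset_t offset)
{
    while (object != NULL) {
        struct my_vm_page *page = my_resident_page_lookup(object, offset);
        if (page != NULL)
            return page;                  /* found (e.g., in the shadow object) */
        offset += object->shadow_offset;  /* translate into the shadowed object */
        object  = object->shadow;         /* fall through to the object being shadowed */
    }
    return NULL;
}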

The scheme is symmetric because its operation does not depend on which task (the source or the destination in the copy-on-write operation) modifies a shared page.


It is important to note that when a shadow object is created during a symmetric copy-on-write, no memory manager is recorded for it. The kernel will use swap space as the backing store, and the default pager as the memory manager, when it needs to page out anonymous memory. There is a problem, however, if an external memory manager (say, the vnode pager in the case of a memory-mapped file) backs the original VM object. The kernel cannot change the VM object because doing so would disconnect the file mapping. Since page modifications in a symmetric copy-on-write are seen only by shadow objects, the original VM object, which is connected to the memory manager, will never see those modifications. Mach solves this problem by using an asymmetric copy-on-write algorithm, in which the source party retains the original VM object and the kernel creates a new object for the destination. The asymmetric algorithm works as follows (see Figure 8-9).

  • When a copy operation is performed, create a new object (a copy object) for use by the destination.

  • Point the shadow field of the copy object to the original object.

  • Point the copy field of the original object to the copy object.

  • Mark the copy object as copy-on-write. Note that the original object is not marked copy-on-write in this case.

  • Whenever a page is about to be modified in the source mapping, copy it to a new page first and push that page to the copy object.

Figure 8-9. Asymmetric copy-on-write using copy objects
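
Again in terms of the simplified structures sketched earlier, the setup step of the asymmetric algorithm might look like the following. This is illustrative only; my_object_allocate() is a hypothetical helper that creates an empty object of the given size.

/* Hypothetical helper: create an empty object of the given size. */
struct my_vm_object *my_object_allocate(my_vm_offset_t size);

static struct my_vm_object *
my_object_copy_asymmetric(struct my_vm_object *source)
{
    /* Create a new object (the copy object) for use by the destination. */
    struct my_vm_object *copy = my_object_allocate(source->size);

    /* The copy object shadows the original: a page that has not yet been
       pushed into the copy is found by following the shadow pointer. */
    copy->shadow        = source;
    copy->shadow_offset = 0;

    /* The original remembers its copy object; before a page in the source
       mapping is modified, its old contents are pushed into the copy.
       The copy object, not the original, is treated as copy-on-write, so
       the original keeps its connection to the external memory manager. */
    source->copy = copy;

    return copy;
}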


8.3.8. The Physical Map (Pmap)

A VM map also points to a physical map (pmap) data structure (struct pmap [osfmk/ppc/pmap.h]), which describes hardware-defined virtual-to-physical address translation mappings. Mach's pmap layer encapsulates the machine-dependent VM code (in particular, that for managing the MMU and the caches) and exports generic functions for use by the machine-independent layer. To understand the pmap layer's role in the system, let us look at examples of functions in the pmap interface.

The Mac OS X kernel contains additional code outside of the pmap module (in osfmk/ppc/mappings.c) to maintain virtual-to-physical mappings on the PowerPC. This code acts as a bridge between the pmap layer and the underlying hardware, which is contrary to Mach's traditional encapsulation of all hardware-dependent code within the pmap layer.


8.3.8.1. The Pmap Interface

pmap_map() maps the virtual address range starting at va to the physical address range spa through epa, with the machine-independent protection value prot. This function is called during bootstrapping to map various ranges, such as those corresponding to the exception vectors, the kernel's text segment, and the kernel's data segment.

vm_offset_t pmap_map(vm_offset_t va, vm_offset_t spa, vm_offset_t epa, vm_prot_t prot);
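
For illustration, a boot-time call of the kind just described might look as follows. The addresses and the V=R (virtual equals real) choice are assumptions made up for the example, not the kernel's actual values; the VM_PROT_* constants and the pmap_map() prototype are as given above.

/* Illustrative sketch only, not actual kernel code. */
static void
sketch_map_kernel_text(void)
{
    vm_offset_t text_start = 0x00010000;   /* hypothetical physical start */
    vm_offset_t text_end   = 0x00400000;   /* hypothetical physical end */

    /* Map the range V=R (virtual address equals physical address) with
       read/execute protection. */
    (void)pmap_map(text_start,   /* va  */
                   text_start,   /* spa */
                   text_end,     /* epa */
                   VM_PROT_READ | VM_PROT_EXECUTE);
}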


pmap_map_physical() and pmap_map_iohole() are special versions of pmap_map(). The former maps physical memory into the kernel's address map. The virtual address used for this mapping is lgPMWvaddr, the so-called physical memory window. pmap_map_iohole() takes a physical address and size and then maps an "I/O hole" in the physical memory window.

pmap_create() creates and returns a physical map, either by recovering one from the list of free pmaps or by allocating one from scratch.

pmap_t pmap_create(vm_map_size_t size);


Besides the list of free pmaps (free_pmap_list), the kernel also maintains the following relevant data structures:

  • A list of in-use pmaps (anchored by kernel_pmap, the kernel pmap).

  • A list of physical addresses of in-use pmaps (anchored by kernel_pmap_phys).

  • A pointer to a cursor pmap (cursor_pmap), which the kernel uses as the starting point while searching for free pmaps. cursor_pmap points to either the last pmap allocated or to the previous-to-last if it was removed from the in-use list of pmaps.

The kernel pmap is located in a 512-byte block in the V=R (virtual=real) area. Therefore, kernel_pmap_phys and kernel_pmap both point to the same location. Each address space is assigned an identifier that is unique within the system. The identifier is used to construct the 24-bit PowerPC virtual segment identifier (VSID). The number of active address spaces is limited by maxAdrSp (defined to be 16384 in osfmk/ppc/pmap.h).

pmap_create() is called during task creation, regardless of whether the child is inheriting the parent's memory or not. If no memory is being inherited, a "clean slate" address space is created for the child task; otherwise, each VM entry in the parent is examined to see if it needs to be shared, copied, or not inherited at all.

pmap_destroy() removes a reference to the given pmap. When the reference count reaches zero, the pmap is added to the list of free pmaps, which caches the first free_pmap_max (32) pmaps that are freed up. pmap_destroy() is called when a VM map is destroyed after the last reference to it goes away.

void pmap_destroy(pmap_t pmap);


pmap_reference() increments the reference count of the given pmap by one.

void pmap_reference(pmap_t pmap);


pmap_enter() creates a translation for the virtual address va to the physical page number pa in the given pmap with the protection prot.

void
pmap_enter(pmap_t             pmap,
           vm_map_offset_t    va,
           ppnum_t            pa,
           vm_prot_t          prot,
           unsigned int       flags,
           __unused boolean_t wired);


The flags argument can be used to specify particular attributes for the mapping, for example, to specify cache modes:

  • VM_MEM_NOT_CACHEABLE (cache inhibited)

  • VM_WIMG_WTHRU (write-through cache)

  • VM_WIMG_WCOMB (write-combine cache)

  • VM_WIMG_COPYBACK (copyback cache)

pmap_remove() unmaps all virtual addresses in the virtual address range determined by the given pmap and [sva, eva), that is, inclusive of sva but exclusive of eva. If the pmap in question is a nested pmap, then pmap_remove() will not remove any mappings. A nested pmap is one that has been inserted into another pmap. The kernel uses nested pmaps to implement shared segments, which in turn are used by shared libraries and the commpage mechanism.

void pmap_remove(pmap_t pmap, addr64_t sva, addr64_t eva);


pmap_page_protect() lowers the permissions for all mappings to a given page. In particular, if prot is VM_PROT_NONE, this function removes all mappings to the page.

void pmap_page_protect(ppnum_t pa, vm_prot_t prot);


pmap_protect() changes the protection on all virtual addresses in the virtual address range determined by the given pmap and [sva, eva). If prot is VM_PROT_NONE, pmap_remove() is called on the virtual address range.

void
pmap_protect(pmap_t          pmap,
             vm_map_offset_t sva,
             vm_map_offset_t eva,
             vm_prot_t       prot);


pmap_clear_modify() clears the dirty bit for a machine-independent page starting at the given physical address. pmap_is_modified() checks whether the given physical page has been modified since the last call to pmap_clear_modify(). Similarly, pmap_clear_reference() and pmap_is_referenced() operate on the referenced bit of the given physical page.

void      pmap_clear_modify(ppnum_t pa);
boolean_t pmap_is_modified(register ppnum_t pa);
void      pmap_clear_reference(ppnum_t pa);
boolean_t pmap_is_referenced(ppnum_t pa);


pmap_switch() switches to a new pmap, that is, it changes to a new address space. It is called during a thread context switch (unless the two threads belong to the same task and therefore share the same address space).

void pmap_switch(pmap_t pmap);


PMAP_ACTIVATE(pmap, thread, cpu) and PMAP_DEACTIVATE(pmap, thread, cpu) activate and deactivate, respectively, pmap for use by thread on cpu. Both these routines are defined to be null macros on the PowerPC.



