9.5. Segment Drivers

Another example of the object-oriented approach to memory management is the memory "segment" object. Memory segments manage the mapping of a linear range of virtual memory into an address space. The mapping is between the address space and some type of device. The objective of the memory segment is to allow both memory and devices to be mapped into an address space. Traditionally, this required hard-coding memory and device information into the address space handlers for each device. The object architecture allows different behaviors for different segments.

For example, one segment might be a mapping of a file into an address space (with mmap()), and another segment might be the mapping of a hardware device into the process's address space (a graphics framebuffer). In this case, the segment driver provides a similar view of linear address space, even though the file mapping operation with mmap() uses pages of memory to cache the file data, whereas the framebuffer device maps the hardware device into the address space.

The flexibility of the segment object allows us to use virtually any abstraction to represent a linear address space that is visible to a process, regardless of the real facilities behind the scenes.

struct seg {
        caddr_t s_base;                 /* base virtual address */
        size_t  s_size;                 /* size in bytes */
        uint_t  s_szc;                  /* max page size code */
        uint_t  s_flags;                /* flags for segment, see below */
        struct  as *s_as;               /* containing address space */
        avl_node_t s_tree;              /* AVL tree links to segs in this as */
        struct  seg_ops *s_ops;         /* ops vector: see below */
        void *s_data;                   /* private data for instance */
};
                                                        See vm/seg.h

To implement an address space, a segment driver implementation is required to provide at least the following: functions to create a mapping for a linear address range, page fault handling routines to deal with machine exceptions within that linear address range, and a function to destroy the mapping. These functions are packaged together into a segment driver, which is an instantiation of the segment object interface. Figure 9.9 illustrates the relationship between an address space and a segment and shows a segment mapping the heap space of a process.

Figure 9.9. Segment Interface

A segment driver implements a subset of the methods described in Table 9.5, as well as a constructor function to create the first instance of the object. The function pointers in the segment operations structure, s_ops, point to the driver's implementations of those methods; in the vnode segment driver, for example, these functions are prefixed with segvn. A segment object is created when another subsystem calls as_map() to create a mapping at a specific address. The segment's create routine is passed as an argument to as_map(), a segment object is created, and a segment object pointer is returned. Once the segment is created, other parts of the virtual memory system can use the segment method operations to call into the segment for different address space operations without knowing what the underlying segment driver is.

For example, when a file is mapped into an address space with mmap(), the address space map routine as_map() is called with segvn_create() (the vnode segment driver constructor) as an argument, which in turn calls into the seg_vn segment driver to create the mapping. The segment object is created and inserted into the segment list for the address space (struct as), and from that point on, the address space can perform operations on the mapping without knowing what the underlying segment is.
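The constructor-as-argument pattern described above can be sketched in user-space C. This is a minimal illustration, not the kernel code: demo_as_map(), demo_vn_create(), struct demo_as, struct demo_seg, and the s_next list link are all hypothetical stand-ins (the real struct seg links segments through an AVL tree, and the real routines take different arguments).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct demo_seg;

struct demo_as {
    struct demo_seg *a_segs;      /* segments, kept sorted by base address */
};

struct demo_seg {
    uintptr_t        s_base;      /* base virtual address */
    size_t           s_size;      /* size in bytes */
    struct demo_as  *s_as;        /* containing address space */
    void            *s_data;      /* private data for the driver */
    struct demo_seg *s_next;      /* hypothetical list link */
};

/* Constructor signature: fills in driver-private state, returns 0 on success. */
typedef int (*demo_create_fn)(struct demo_seg *seg, void *args);

/* Hypothetical constructor standing in for a driver's create routine. */
static int demo_vn_create(struct demo_seg *seg, void *args)
{
    seg->s_data = args;           /* e.g. a description of the file being mapped */
    return 0;
}

/*
 * Sketch of the map step: allocate a generic segment, let the driver's
 * constructor initialize it, then link it into the address space.
 */
static struct demo_seg *demo_as_map(struct demo_as *as, uintptr_t base,
                                    size_t size, demo_create_fn create,
                                    void *args)
{
    struct demo_seg *seg = calloc(1, sizeof (*seg));
    if (seg == NULL)
        return NULL;
    seg->s_base = base;
    seg->s_size = size;
    seg->s_as = as;
    if (create(seg, args) != 0) {
        free(seg);
        return NULL;
    }
    /* Insert in ascending address order. */
    struct demo_seg **pp = &as->a_segs;
    while (*pp != NULL && (*pp)->s_base < base)
        pp = &(*pp)->s_next;
    seg->s_next = *pp;
    *pp = seg;
    return seg;
}
```

The caller never touches driver-private state; it only chooses which constructor to pass, which is what lets the address space layer stay independent of the segment type.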

The address space routines can operate on the segment without knowing what type of segment is underlying by calling the segment operation macros. For example, if the address space layer wants to call the fault handler for a segment, it calls SEGOP_FAULT(), which invokes the segment-specific page fault method, as shown below.

#define SEGOP_FAULT(h, s, a, l, t, rw) \
        (*(s)->s_ops->fault)((h), (s), (a), (l), (t), (rw))
                                                        See vm/seg.h
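To make the dispatch concrete, here is a small user-space sketch of an ops vector with a single method and two drivers behind it. Everything prefixed with demo_ or DEMO_ is hypothetical: the real fault method takes the six arguments shown in the macro above, and the return codes here are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct demo_seg;

/* One-method subset of an ops vector; the real seg_ops has many more entries. */
struct demo_seg_ops {
    int (*fault)(struct demo_seg *seg, size_t offset);
};

struct demo_seg {
    size_t s_size;                      /* size in bytes */
    const struct demo_seg_ops *s_ops;   /* ops vector for this driver */
    void *s_data;                       /* private data for the instance */
};

/* Dispatch macro written in the same style as SEGOP_FAULT(). */
#define DEMO_SEGOP_FAULT(s, off) (*(s)->s_ops->fault)((s), (off))

/* Invented result codes, only so the two drivers are distinguishable. */
enum demo_result {
    DEMO_BAD_ADDR = -1,
    DEMO_PAGE_FROM_FILE = 1,
    DEMO_DEVICE_MAPPED = 2
};

/* File-like driver: counts how many pages it was asked to bring in. */
static int demo_file_fault(struct demo_seg *seg, size_t offset)
{
    if (offset >= seg->s_size)
        return DEMO_BAD_ADDR;
    ++*(unsigned *)seg->s_data;
    return DEMO_PAGE_FROM_FILE;
}

/* Device-like driver: no page cache involved, just a range check. */
static int demo_dev_fault(struct demo_seg *seg, size_t offset)
{
    return (offset < seg->s_size) ? DEMO_DEVICE_MAPPED : DEMO_BAD_ADDR;
}

static const struct demo_seg_ops demo_file_ops = { demo_file_fault };
static const struct demo_seg_ops demo_dev_ops  = { demo_dev_fault };

/* Generic layer: handles a fault without knowing which driver is behind seg. */
static int demo_handle_fault(struct demo_seg *seg, size_t offset)
{
    assert(seg != NULL && seg->s_ops != NULL);
    return DEMO_SEGOP_FAULT(seg, offset);
}
```

demo_handle_fault() contains no driver-specific branches; adding a third driver means adding another ops structure, not changing the caller.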

The Solaris kernel is implemented with a range of segment drivers for various functions. The different types of drivers are shown in Table 9.4. Most of the process address space mapping (including executable text, data, heap, stack, and memory-mapped files) is performed with the vnode segment driver, seg_vn. Other types of mappings that don't have vnodes associated with them require different segment drivers. The other segment drivers are typically associated with kernel memory mappings or hardware devices, such as graphics adapters.

Table 9.4. Solaris 10 Segment Drivers
Segment	Function
`seg_vn`	The `vnode` mappings into process address spaces are managed with the `seg_vn` segment driver. Executable text and data, shared libraries, mapped files, heap and stack (heap and stack are anonymous memory) are all mapped with `seg_vn`.
`seg_kmem`	The segment from which the bulk of nonpageable kernel memory is allocated. (See Chapter 11.)
`seg_kp`	The segment from which pageable kernel memory is allocated. Only a very small amount of the kernel is pageable; kernel thread stacks and TNF buffers are the main consumers of pageable kernel memory.
`seg_kpm`	A mapping of all physical memory into the kernel's address space on 64-bit systems, to facilitate fast mapping of pages. The file systems use this facility to read and write to pages to avoid excessive map/unmap operations.
`seg_spt`	Shared page table segment driver. Fast System V shared memory is mapped into process address space from this segment driver. Memory allocated from this driver is also known as Intimate Shared Memory (ISM).
`seg_map`	The kernel uses the `seg_map` driver to map files (`vnodes`) into the kernel's address space, to implement file system caching.
`seg_dev`	Mapped hardware devices.
`seg_mapdev`	Mapping support for mapped hardware devices, through the `ddi_mapdev(9F)` interface.
`seg_lock`	Mapping support for hardware graphics devices that are mapped between user and kernel address space.
`seg_drv`	Mapping support for mapped hardware graphics devices.
`seg_nf`	Nonfaulting kernel memory driver.

Table 9.5 describes segment driver methods implemented in Solaris 10.

Table 9.5. Solaris Segment Driver Methods
Method	Description
`advise()`	Provides a hint to optimize memory accesses to this segment. For example, sequential advice given to mapped files causes read-ahead to occur.
`checkprot()`	Checks that the requested access type (read, write, exec) is allowed within the protection level of the pages within the segment.
`dump()`	Dumps the segment to the dump device; used for crash dumps.
`dup()`	Duplicates the current memory segment, including all of the page mapping entries to the new segment pointer provided.
`fault()`	Handles a page fault for a segment. The arguments describe the segment, the virtual address of the page fault, and the type of fault.
`faulta()`	Starts a page fault on a segment and address asynchronously. Used for read-ahead or prefaulting of data as a performance optimization for I/O.
`free()`	Destroys a segment.
`getmemid()`	Gets a unique identifier for the memory segment.
`getoffset()`	Queries the segment driver for the offset into the underlying device for the mapping. (Not meaningful on all segment drivers.)
`getpolicy()`	Gets the MPO lgroup policy for the supplied address.
`getprot()`	Asks the segment driver for the protection levels for the memory range.
`gettype()`	Queries the driver for the sharing modes of the mapping.
`getvp()`	Gets the `vnode` pointer for the `vnode`, if there is one, behind this mapping.
`incore()`	Queries to find out how many pages are in physical memory for a segment.
`kluster()`	Asks the segment driver if it is OK to cluster I/O operations for pages within this segment.
`lockop()`	Locks or unlocks the pages for a range of memory mapped by a segment.
`pagelock()`	Locks a single page within the segment.
`setpagesize()`	Advises the page size for the address range.
`setprot()`	Sets the protection level of the pages within the address range supplied.
`swapout()`	Attempts to swap out as many pages to secondary storage as possible.
`sync()`	Syncs up any dirty pages within the segment to the backing store.
`unmap()`	Unmaps the address space range within a segment.

9.5.1. The vnode Segment: `seg_vn`

The most widely used segment driver is the vnode segment driver, seg_vn. The seg_vn driver maps files (or vnodes) into a process address space, using physical memory as a cache. The seg_vn segment driver also creates anonymous memory within the process address space for the heap and stack and provides support for System V (non-ISM) shared memory. (See Section 4.4.)

The seg_vn segment driver manages the following mappings into process address space:

Executable text
Executable data
Heap and stack (anonymous memory)
Shared libraries
Mapped files

9.5.1.1. Memory Mapped Files

We can map a file into a process's address space with the mmap system call. (See mmap(2).) When we map a file into our address space, we call into the address space routines to create a new segment, a vnode segment. A vnode segment handles memory address translation and page faults for the memory range requested in the mmap system call, and the new segment is added to the list of segments in the process's address space. When the segment is created, the seg_vn driver initializes the segment structure with the address and length of the mapping, then creates a seg_vn-specific data structure within the segment structure's s_data field. The seg_vn-specific data structure holds all of the information the seg_vn driver needs to handle the address mappings for the segment.
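From the application's side, creating such a mapping looks like the following sketch. It uses only standard POSIX calls (mkstemp, write, mmap, munmap); the function name map_and_read() and the temporary file path are made up for the example, and which kernel segment driver ends up backing the mapping is not visible at this level.

```c
#define _XOPEN_SOURCE 700   /* expose mkstemp() under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a temporary file holding `text`, map it read-only, and copy the
 * mapped bytes into `out`. Returns the number of bytes copied, or -1.
 */
static long map_and_read(const char *text, char *out, size_t outlen)
{
    char path[] = "/tmp/segvn_demo_XXXXXX";
    size_t len = strlen(text);
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* file disappears once it is closed */

    if (len == 0 || len >= outlen || write(fd, text, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* Ask the kernel to map the file; no file data is copied yet. */
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    memcpy(out, p, len);                /* first touch brings the page in */
    out[len] = '\0';
    munmap(p, len);
    return (long)len;
}
```

Note that the file is read with an ordinary memcpy() from the mapped address; there is no read() call on the data path.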

The seg_vn-specific data structure (struct segvn_data) contains pointers to the vnode that is mapped and to any anonymous memory that has been allocated for this segment. The file system does most of the work of mapped files once the mapping is created. As a result, the seg_vn driver is fairly simple; most of the seg_vn work is done during creation and deletion of the mapping.

The more complex part of the seg_vn driver implementation is its handling of anonymous memory pages within the segment, which we discuss in the sections that follow. When we create a file mapping, we put the vnode and offset of the file being mapped into the segvn_data structure members, vp and offset. The seg_vn data structure is shown below; Figure 9.10 illustrates the seg_vn segment driver vnode relationship.

Figure 9.10. The `seg_vn` Segment Driver Vnode Relationship

typedef struct  segvn_data {
        krwlock_t lock;         /* protect segvn_data and vpage array */
        kmutex_t segp_slock;    /* serialize insertions into seg_pcache */
        uchar_t pageprot;       /* true if per page protections present */
        uchar_t prot;           /* current segment prot if pageprot == 0 */
        uchar_t maxprot;        /* maximum segment protections */
        uchar_t type;           /* type of sharing done */
        u_offset_t offset;      /* starting offset of vnode for mapping */
        struct  vnode *vp;      /* vnode that segment mapping is to */
        ulong_t anon_index;     /* starting index into anon_map anon array */
        struct  anon_map *amp;  /* pointer to anon share structure, if needed */
        struct  vpage *vpage;   /* per-page information, if needed */
        struct  cred *cred;     /* mapping credentials */
        size_t  swresv;         /* swap space reserved for this segment */
        uchar_t advice;         /* madvise flags for segment */
        uchar_t pageadvice;     /* true if per page advice set */
        ushort_t flags;         /* flags - from sys/mman.h */
        ssize_t softlockcnt;    /* # of pages SOFTLOCKED in seg */
        lgrp_mem_policy_info_t policy_info; /* memory allocation policy */
} segvn_data_t;
                                                        See vm/seg_vn.h

Creating a mapping for a file is done with the mmap() system call, which calls the map method for the file system that contains the file. For example, calling mmap() for a file on a UFS file system will call ufs_map(), which in turn calls into the seg_vn driver to create a mapped file segment in the address space with the segvn_create() function.

At this point we create an actual virtual memory mapping by talking to the hardware through the hardware address translation functions by using the hat_map() function. The hat_map() function is the central function for creating address space mappings. It calls into the hardware-specific memory implementation for the platform to program the hardware MMU, so that memory address references within the supplied address range will trigger the page fault handler in the segment driver until a valid physical memory page has been placed at the accessed location. Once the hardware MMU mapping is established, the seg_vn driver can begin handling page faults within that segment.

Having established a valid hardware mapping for our file, we can look at how our mapped file is effectively read into the address space. The hardware MMU can generate traps for memory accesses to the memory within that segment. These traps will be routed to our seg_vn driver through the as_fault() routine. (See Section 9.4.4.) The first time we access a memory location within our segment, the segvn_fault() page fault handling routine is called. This fault handler recognizes our segment as a mapped file (by looking in the segvn_data structure) and simply calls into the vnode's file system (in this case, with ufs_getpage()) to read in a page-sized chunk from the file system. The subsequent access to memory that is now backed by physical memory simply results in a normal memory access. It's not until a page is stolen from behind the segment (the page scanner can do this) that a page fault will occur again.
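The fault-on-first-touch behavior can be modeled with a toy simulation. This is only a sketch of the control flow, under loud assumptions: the 8-byte "page", the demo_ names, and the use of malloc() in place of physical pages are all invented, and demo_fault() merely stands in for the step where the real handler asks the file system for a page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGESIZE 8u      /* tiny "page" so the example stays readable */
#define DEMO_NPAGES   4u

struct demo_seg {
    const char *file;                 /* backing store standing in for the file */
    char *resident[DEMO_NPAGES];      /* NULL means no page is present */
    unsigned faults;                  /* number of simulated page faults */
};

/* Stand-in for the getpage step: bring one page in from the backing store. */
static int demo_fault(struct demo_seg *seg, unsigned pg)
{
    char *page = malloc(DEMO_PAGESIZE);
    if (page == NULL)
        return -1;
    memcpy(page, seg->file + pg * DEMO_PAGESIZE, DEMO_PAGESIZE);
    seg->resident[pg] = page;
    seg->faults++;
    return 0;
}

/* Read one byte; only a non-resident page takes the fault path. */
static int demo_read(struct demo_seg *seg, unsigned off)
{
    assert(off < DEMO_PAGESIZE * DEMO_NPAGES);
    unsigned pg = off / DEMO_PAGESIZE;
    if (seg->resident[pg] == NULL && demo_fault(seg, pg) != 0)
        return -1;
    return (unsigned char)seg->resident[pg][off % DEMO_PAGESIZE];
}

/* Stand-in for the page scanner stealing a page from behind the segment. */
static void demo_steal(struct demo_seg *seg, unsigned pg)
{
    free(seg->resident[pg]);
    seg->resident[pg] = NULL;
}
```

Two reads of the same page cost one fault; after demo_steal() the next read faults again, which mirrors the sequence described above.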

Writing to a mapped file is done by updating the contents of memory within the mapped segment. The file is not updated instantly, since there is no software- or hardware-initiated event to trigger any such write. Updates occur when the file system flush daemon finds that the page of memory has been modified and then pushes the page to the file system with the file system's putpage routine, in this case, ufs_putpage().
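An application that cannot wait for the flush daemon can request the write-back itself with the POSIX msync() call, which is not discussed in the text above and is brought in here only as an illustration. The sketch below writes through a MAP_SHARED mapping, calls msync(), and reads the file back with pread(); the function name and temporary path are hypothetical. On systems with a unified page cache the pread() would likely see the new data even without msync(), so the example shows the API shape rather than proving when the data reaches disk.

```c
#define _XOPEN_SOURCE 700   /* expose mkstemp() and pread() under strict modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Fill a new `len`-byte file with 'Z' through a shared mapping, flush it,
 * and read it back through the file descriptor. Returns 0 on success.
 */
static int write_via_mapping(char *readback, size_t len)
{
    char path[] = "/tmp/segvn_sync_XXXXXX";
    int fd = mkstemp(path);
    int rc = -1;
    char *p = MAP_FAILED;

    if (fd < 0)
        return -1;
    unlink(path);

    if (len == 0 || ftruncate(fd, (off_t)len) != 0)
        goto out;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    memset(p, 'Z', len);                 /* dirty the mapped page(s) */
    if (msync(p, len, MS_SYNC) != 0)     /* ask for the dirty pages to be written */
        goto out;
    if (pread(fd, readback, len, 0) != (ssize_t)len)
        goto out;
    rc = 0;
out:
    if (p != MAP_FAILED)
        munmap(p, len);
    close(fd);
    return rc;
}
```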

9.5.2. Copy-on-Write

The copy-on-write process occurs when a process writes to a page that is mapped with MAP_PRIVATE. This process prevents other mappings to the page from seeing changes that are made. seg_vn implements a copy-on-write by setting the hardware MMU permissions of a segment to read-only and setting the segment permissions to read-write. When a process attempts to write to a mapping that is configured this way, the MMU generates an exception and causes a page fault on the page in question. The page fault handler in seg_vn looks at the protection mode for the segment; if it is mapped private and read-write, then the handler initiates a copy-on-write.

The copy-on-write unmaps the shared vnode page where the fault occurred, creates a page of anonymous memory at that address, and then copies the contents of the old page to the new anonymous page. All of this happens in the context of the page fault, so the process never knows what's happening underneath it.
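The visible effect of copy-on-write can be checked from user space. In this sketch (the function name cow_demo() and the temporary path are invented), a one-byte file is mapped MAP_PRIVATE and writable, the mapping is modified, and the file is then read back through its descriptor. The private write should be visible through the mapping but not in the file.

```c
#define _XOPEN_SOURCE 700   /* expose mkstemp() and pread() under strict modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write to a MAP_PRIVATE mapping of a file containing "A". On success,
 * returns 0 and reports the byte seen through the mapping and in the file.
 */
static int cow_demo(char *mapped_byte, char *file_byte)
{
    char path[] = "/tmp/segvn_cow_XXXXXX";
    int fd = mkstemp(path);
    int rc = -1;
    char *p = MAP_FAILED;

    if (fd < 0)
        return -1;
    unlink(path);

    if (write(fd, "A", 1) != 1)
        goto out;
    p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    p[0] = 'B';                 /* the write lands on a private copy of the page */
    *mapped_byte = p[0];
    if (pread(fd, file_byte, 1, 0) != 1)   /* the file itself is unchanged */
        goto out;
    rc = 0;
out:
    if (p != MAP_FAILED)
        munmap(p, 1);
    close(fd);
    return rc;
}
```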

The copy-on-write operation behaves slightly differently under different memory conditions. When memory is low, rather than creating a new physical memory page, the copy-on-write steals the page from the offset of the file underneath and renames it to be the new anonymous page. This only occurs when free memory is lower than the system parameter minfree.

9.5.3. Page Protection and Advice

The seg_vn segment supports memory protection modes on either the whole segment or individual pages within a segment. Whole segment protection is implemented by the segvn_data structure member, prot; its enablement depends on the boolean switch, pageprot, in the segvn_data structure. If pageprot is equal to zero, then the entire segment's protection mode is set by prot; otherwise, page-level protection is enabled.

Page-level protection is implemented by an array of per-page descriptors, the vpage structures shown below. If page-level protection is enabled, then the vpage member of segvn_data points to an array of vpage structures. Every possible page in the segment has one array entry, which means that the number of vpage entries is the segment's virtual address space size divided by the fundamental page size for the segment (8 Kbytes on UltraSPARC).

struct vpage {
        uchar_t nvp_prot;       /* see <sys/mman.h> prot flags */
        uchar_t nvp_advice;     /* pplock & <sys/mman.h> madvise flags */
};
                                                        See vm/page.h

The vpage entry for each page uses the standard memory protection bits (see mmap(2)). The per-page vpage structures are also used to implement memory advice for memory-mapped files in the seg_vn segment.
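The sizing rule and the pageprot switch can be put together in a short sketch. The structures below are cut-down, hypothetical copies that keep only the fields discussed here, and the helper names are invented; the 8-Kbyte page size is the UltraSPARC value quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>         /* PROT_READ, PROT_WRITE */

#define DEMO_PAGESIZE 8192u   /* 8-Kbyte base page, as on UltraSPARC */

/* Cut-down stand-ins for struct vpage and struct segvn_data. */
struct demo_vpage {
    unsigned char nvp_prot;
    unsigned char nvp_advice;
};

struct demo_segvn {
    unsigned char pageprot;       /* nonzero: per-page protections present */
    unsigned char prot;           /* segment protection if pageprot == 0 */
    struct demo_vpage *vpage;     /* one entry per page, if needed */
};

/* Number of vpage entries needed for a segment of seg_size bytes. */
static size_t demo_vpage_count(size_t seg_size)
{
    assert(seg_size % DEMO_PAGESIZE == 0);   /* segments are whole pages */
    return seg_size / DEMO_PAGESIZE;
}

/* Protection in force at a byte offset into the segment. */
static unsigned char demo_effective_prot(const struct demo_segvn *svd,
                                         size_t offset)
{
    if (svd->pageprot == 0)
        return svd->prot;                    /* one setting for the whole segment */
    return svd->vpage[offset / DEMO_PAGESIZE].nvp_prot;
}
```

For a 1-Mbyte segment, demo_vpage_count() gives 128 entries. At two bytes per entry, the array is small next to the memory it describes.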
