9.8. The `swapfs` Layer

Each physical page of memory is identified by its vnode and offset. The vnode and offset identify a backing store that tells where to find the page when it's not in physical memory. For a regular file, the physical page caching the file has a vnode and offset that are simply the file's vnode and offset. Swap space is used as a backing store for anonymous pages of memory, so that when we are short of memory, we can copy a page out to disk and free up a page of memory.

Because swap space is used as the backing store for anonymous memory, we need to ensure we have enough swap space for the pages we may need to swap out. We do that by reserving space upfront when we create writable mappings backed by anonymous memory for heap space, stack, and writable mapped files with MAP_PRIVATE set.

The Solaris kernel allows us to allocate anonymous memory without reserving physical swap space when sufficient memory is available to hold the virtual contents of a process. This means that under some circumstances a system can run with little or no swap.

Traditional UNIX implementations need a page-sized unit of swap space for every page-sized unit of writable virtual memory. For example, a malloc request of 8 Mbytes on a traditional UNIX system would require us to reserve 8 Mbytes of swap disk space, even if that space was never used. This requirement led to the old rule of swap space = 2 x memory size; the rough assumption was that processes would, on average, have a virtual size about twice that of the physical pages they consumed. The swapfs layer allows Solaris to be much more conservative; you only need swap space for the amount of virtual memory that is larger than the pageable physical memory available in the machine.
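The difference between the two sizing rules can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical and the units are pages, so an 8-Mbyte request is 1024 pages if we assume an 8-Kbyte page size.

```c
#include <stdint.h>

/*
 * Traditional rule: every page of writable virtual memory needs a
 * matching page of physical swap, whether or not it is ever used.
 */
uint64_t
traditional_swap_needed(uint64_t virt_pages)
{
	return (virt_pages);
}

/*
 * swapfs-style rule as described in the text: physical swap is needed
 * only for the virtual memory that exceeds the pageable physical memory.
 * (Hypothetical helper, not a kernel function.)
 */
uint64_t
swapfs_swap_needed(uint64_t virt_pages, uint64_t pageable_mem_pages)
{
	if (virt_pages <= pageable_mem_pages)
		return (0);
	return (virt_pages - pageable_mem_pages);
}
```

With 4096 pageable pages of memory, the 1024-page request needs 1024 pages of disk under the traditional rule and none under the swapfs rule.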

The Solaris swap implementation uses swapfs to implement space-efficient swap allocation. The swapfs file system is a pseudo file system between the anon layer and the physical swap devices. The swapfs file system acts as if there is real swap space behind the page, even if no physical swap space was allocated.

9.8.1. `swapfs` Implementation

The swapfs file system uses a system global variable, availrmem, to keep track of the available pageable physical memory in the system and adds it to the total amount of swap space available. When we reserve virtual swap, we simply decrement the amount of virtual memory available from the pool. As long as enough memory and physical swap space are available, then the swap allocations succeed. It's not until later that physical swap space is assigned.
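The reservation accounting described above can be sketched as a simple pool. This is a simplified model, not the kernel's code: the `vswap_pool_t` type and both functions are hypothetical, and the `avail_mem_pages` field merely stands in for availrmem.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the virtual swap pool; all names are illustrative. */
typedef struct vswap_pool {
	size_t avail_mem_pages;	/* stands in for availrmem */
	size_t phys_swap_pages;	/* physical swap configured on disk */
	size_t reserved_pages;	/* virtual swap reserved so far */
} vswap_pool_t;

/* Virtual swap left: pageable memory plus physical swap, minus reservations. */
size_t
vswap_available(const vswap_pool_t *p)
{
	size_t total = p->avail_mem_pages + p->phys_swap_pages;

	return (total > p->reserved_pages ? total - p->reserved_pages : 0);
}

/*
 * Reserve npages of virtual swap. Only the counter changes; no physical
 * swap block is assigned at reservation time.
 */
bool
vswap_reserve(vswap_pool_t *p, size_t npages)
{
	if (npages > vswap_available(p))
		return (false);
	p->reserved_pages += npages;
	return (true);
}
```

In this model a reservation fails only when memory and physical swap together cannot cover it, which matches the behavior the text describes.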

When we create a private segment, we reserve swap and allocate anon structures. At this stage, that's all that happens until a real memory page is created as a result of a ZFOD or copy-on-write (COW). When a physical page is faulted in, it is identified by vnode/offset, which for anonymous memory is the virtual swap device for the page.

Anonymous pages in Solaris are assigned a swapfs vnode and offsets when the segment driver calls anon_alloc() to get a new anonymous page. The anon_alloc() function calls into swapfs through swapfs_getvp() and then calls swapfs_getpage() to create a new page with swapfs vnode/offset. The anon structure members, an_vp and an_off, which identify the backing store for this page, are initialized to reference the vnode and offset within the swapfs virtual swap device.

Figure 9.12 shows how the anon slot points into swapfs. At this stage, we still don't need any physical swap space. The amount of virtual swap space available was decremented when the segment reserved virtual swap space, but because we haven't had to swap the pages out to physical swap, no physical swap space has been allocated.

Figure 9.12. Anon Slot Initialized to Virtual Swap before Page-Out

It's not until the first page-out request occurs (because the page scanner wants to push a page to swap) that real swap is assigned. At this time, the page scanner looks up the vnode for the page and then calls its putpage() method. The page's vnode is a swapfs vnode, and hence swapfs_putpage() is called to swap this page out to the swap device. The swapfs_putpage() routine allocates a page-sized block of physical swap and then sets the physical vnode an_pvp and an_poff fields in the anon slot to point to the physical swap device; only at this point is physical swap space allocated. The page is then pushed to the swap device. Figure 9.13 shows the anon slot after the page has been swapped out.

Figure 9.13. Physical Swap after a Page-Out Occurs

When we exhaust physical swap space, we simply ignore the putpage() request for a page, resulting in memory performance problems that are very hard to analyze. A failure does not occur when physical swap space fills; during reservation, we ensured that we had sufficient available virtual swap space, comprising both physical memory and physical swap space. In this case, swapfs_putpage() simply leaves the page in memory and does not push it to physical swap. This means that once physical swap is 100 percent allocated, we begin effectively locking down the remaining pages in physical memory. For this reason, it's often a bad idea to run with 100 percent physical swap allocation (swap -l shows 0 blocks free) because we might start locking down the wrong pages in memory and our working set might not correctly match the pages we really want in memory.
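The life cycle of an anon slot, including the case where physical swap is exhausted, can be sketched with a small user-level model. The field names an_vp, an_off, an_pvp, and an_poff come from the text; everything else is an assumption made for illustration: the integer vnode handles, the `phys_swap_t` type, and the `model_` functions are hypothetical and are not the kernel's data types or interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

#define	NO_VNODE	(-1)	/* hypothetical: no physical backing yet */
#define	SWAPFS_VNODE	1	/* hypothetical handle for the swapfs vnode */
#define	PHYS_SWAP_VNODE	2	/* hypothetical handle for the swap device */

/* Simplified anon slot; the kernel uses vnode pointers, not integers. */
typedef struct anon_slot {
	int	an_vp;		/* virtual (swapfs) vnode */
	size_t	an_off;		/* offset within swapfs */
	int	an_pvp;		/* physical swap vnode, set at first page-out */
	size_t	an_poff;	/* offset within physical swap */
	bool	in_memory;	/* page currently resident */
} anon_slot_t;

/* Hypothetical physical swap device handing out page-sized blocks. */
typedef struct phys_swap {
	size_t	next_free;
	size_t	nblocks;
} phys_swap_t;

/* Models anon_alloc(): the slot gets a swapfs identity only. */
void
model_anon_alloc(anon_slot_t *ap, size_t swapfs_off)
{
	ap->an_vp = SWAPFS_VNODE;
	ap->an_off = swapfs_off;
	ap->an_pvp = NO_VNODE;
	ap->an_poff = 0;
	ap->in_memory = true;
}

/*
 * Models swapfs_putpage(): physical swap is assigned on the first
 * page-out. If none is left, the request is ignored and the page stays
 * in memory. Returns true if the page was pushed.
 */
bool
model_swapfs_putpage(anon_slot_t *ap, phys_swap_t *sw)
{
	if (ap->an_pvp == NO_VNODE) {
		if (sw->next_free == sw->nblocks)
			return (false);	/* exhausted: page is left resident */
		ap->an_pvp = PHYS_SWAP_VNODE;
		ap->an_poff = sw->next_free++;
	}
	ap->in_memory = false;
	return (true);
}
```

In this model, a slot whose page-out is ignored keeps only its swapfs identity and remains resident, which is the "locking down" effect described above.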
