4.3 Paging and Swapping | System Performance Tuning2002

Paging and swapping are terms that are often used interchangeably, but they are quite distinct. A system that is paging is writing selected, infrequently used pages of memory to disk, while a system that is swapping is writing entire processes from memory to disk. Let's say that you are working on your automobile, and you only have a small amount of space available for tools. Paging is equivalent to putting the 8 mm socket back in the toolchest so you have enough room for a pair of pliers; swapping is like putting your entire socket set away.

Many people feel that their systems should never nontrivially page (that is, perform paging on a pre-Solaris 8 system that is not simply filesystem activity). It's important to realize that paging and swapping allow the system to continue getting work done despite adverse memory conditions. Paging is not necessarily indicative of a problem; it is the page scanner's attempt to increase the size of the free list by moving inactive pages to disk. A process, as a general rule, spends about 80% of its time running about 20% of its code; since the entire process doesn't need to be in memory at once, writing some pages out to disk won't affect performance substantially. Performance only begins to suffer when a memory shortage continues or worsens.

4.3.1 The Decline and Fall of Interactive Performance

Historically, Unix systems implemented a time-based swapping mechanism, whereby a process that was idle for more than 20 seconds would be swapped out; this isn't done anymore. Swapping is now used only to address the most severe memory shortages. If you come across a system where vmstat reports a nonzero swap queue, the only conclusion you can draw is that, at some indeterminate time in the past, the system was short enough on memory to swap out a process.

Memory shortages also tend to appear worse than they are because of the nature of the memory reclamation mechanism. When a system is desperately swapping jobs out, it tries to avoid a major performance decrease by keeping active jobs in memory for as long as possible. Unfortunately, programs that interact directly with users (such as shells, editors, or anything else that depends on user-supplied input) are inactive while waiting for the user to type something. As a result, these interactive processes are likely to be seen as good targets for memory reclamation. For example, when you pause momentarily in typing a command (such as when you're looking through the process table trying to find the ID of the process that is hogging all your memory so you can kill it), your shell will have to make the long trip from disk back into memory before your characters can be echoed back. Even worse, the disk subsystem is probably under heavy load from all the paging and swapping activity! The upshot is that in memory shortages, interactive performance falls through the floor.

This situation degrades even further if memory consumption continues. The system starts to slow down as disks become overloaded with paging and swapping, and the load average spirals higher; memory-based constraints quickly turn into I/O-based constraints.

4.3.2 Swap Space

Swap space (or, more accurately, paging space, since almost all of this sort of activity is paging rather than swapping) is a topic that tends to confuse people. Swap space serves three functions:

As a place to write pages of private memory (a paging store)
As a place for anonymous memory storage
As a place to store crash dumps ^[6]

^[6] By default, the very end of the first swap partition is used to store kernel crash dumps under Solaris 7.

This space is allocated from both spare physical memory and from a swap space on disk, be it a dedicated partition or a swapfile.

4.3.2.1 Anonymous memory

Swap space for anonymous memory is used in two stages. Anonymous memory reservations are taken out of disk-based swap, but allocations are taken out of physical memory-based swap. When anonymous memory is requested (via the malloc(3C) library call, for example), a reservation is made against swap, and a mapping is made against /dev/zero. Disk-based swap is used until none remains, at which point physical memory is used instead. Memory space that is mapped but never used stays reserved. This is common behavior in large database systems, and explains why applications such as Oracle require large amounts of disk-based swap, even though they are unlikely to allocate the entire reserved space.

When reserved pages are accessed for the first time, physical pages are taken from the free list, zeroed, and allocated rather than reserved. Should a page of anonymous memory be stolen by the page scanner, the data is written to disk-based swap space (that is, the allocation is moved from memory to disk), and the memory is freed for reuse.
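The reserve-then-allocate sequence described above can be sketched as a toy accounting model. This is an illustration only, not kernel code: the class name, method names, and kilobyte figures are all hypothetical, and the model ignores page stealing entirely.

```python
class SwapAccounting:
    """Toy model of two-stage anonymous memory accounting (hypothetical names)."""

    def __init__(self, disk_swap_kb, memory_swap_kb):
        self.disk_free = disk_swap_kb      # unreserved disk-based swap
        self.memory_free = memory_swap_kb  # unreserved memory-based swap
        self.reserved = 0                  # reserved but never touched
        self.allocated = 0                 # touched, so backed by real pages

    def reserve(self, kb):
        """Reserve swap for a new mapping: disk-based swap first, then memory."""
        if kb > self.disk_free + self.memory_free:
            raise MemoryError("not enough swap to reserve %dk" % kb)
        from_disk = min(kb, self.disk_free)
        self.disk_free -= from_disk
        self.memory_free -= kb - from_disk
        self.reserved += kb

    def touch(self, kb):
        """First access turns part of a reservation into an allocation."""
        if kb > self.reserved:
            raise ValueError("cannot touch more than is reserved")
        self.reserved -= kb
        self.allocated += kb

    def available(self):
        """Swap that can still be reserved by new requests."""
        return self.disk_free + self.memory_free


acct = SwapAccounting(disk_swap_kb=1000, memory_swap_kb=200)
acct.reserve(1100)  # consumes all 1000k of disk swap, then 100k of memory
acct.touch(300)     # the application only ever uses 300k of its mapping
print(acct.reserved, acct.allocated, acct.available())
```

In this model the untouched 800k stays reserved and unavailable to other requests, which is the effect that makes large database systems need so much disk-based swap.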

4.3.2.2 Sizing swap space

There are many rules of thumb for how much swap space should be configured on a system, ranging from four times the amount of physical memory to half the amount of physical memory, and none of them are particularly good.

Under Solaris, one tool to get a long-term picture of swap usage is /usr/sbin/swap -s. Here's an example from a workstation running Solaris 7 with 128 MB of physical memory and a 384 MB swap partition:

 % swap -s
 total: 12000k bytes allocated + 3512k reserved = 15512k used, 468904k available

Unfortunately, the names of these fields are misleading, if not downright incorrect. A more accurately labeled version would read:

 total: 12000k bytes allocated + 3512k unallocated = 15512k reserved, 468904k available

When the available swap space reaches zero, the system will no longer be able to use more memory (until some is freed). From this report, it is obvious that the system has plenty of swap space configured.
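As a rough illustration of reading this report, the following sketch parses the sample line and stores the values under the more accurate labels suggested above. The function name and dictionary keys are made up for this example, and the pattern assumes the exact output format shown here.

```python
import re


def parse_swap_s(line):
    """Parse one line of `swap -s` output into kilobyte counts with clearer names."""
    m = re.search(
        r"(\d+)k bytes allocated \+ (\d+)k reserved = (\d+)k used, (\d+)k available",
        line,
    )
    if m is None:
        raise ValueError("unrecognized swap -s output: %r" % line)
    allocated, unallocated, reserved, available = map(int, m.groups())
    return {
        "allocated_kb": allocated,      # reserved and actually touched
        "unallocated_kb": unallocated,  # reserved but not yet touched
        "reserved_kb": reserved,        # sum of the two fields above
        "available_kb": available,      # still free to be reserved
    }


sample = "total: 12000k bytes allocated + 3512k reserved = 15512k used, 468904k available"
fields = parse_swap_s(sample)
total_kb = fields["reserved_kb"] + fields["available_kb"]
percent_reserved = 100.0 * fields["reserved_kb"] / total_kb
print("%.1f%% of %dk reserved" % (percent_reserved, total_kb))
```

For the sample line, only a few percent of the total is reserved, which matches the conclusion that this system has plenty of swap space.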

You can also use /usr/sbin/swap -l to find out how much of each swap area is in use. Here's an example from the same system:

 % swap -l
 swapfile             dev  swaplo blocks   free
 /dev/dsk/c1t1d0s1   32,129     16 788384 775136

In this case, there is a single swap device (a dedicated partition) with a total area of 788,384 512-byte blocks (about 384 MB). At the time this measurement was taken, virtually all of the swap space was free.
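The block arithmetic is easy to get wrong by hand, so here is a small sketch that converts the blocks and free columns to megabytes. The helper names are hypothetical, and the parser assumes the five-column layout shown in the sample.

```python
BLOCK_BYTES = 512  # swap -l reports sizes in 512-byte blocks


def blocks_to_mb(blocks):
    """Convert a count of 512-byte blocks to megabytes."""
    return blocks * BLOCK_BYTES / (1024.0 * 1024.0)


def parse_swap_l(text):
    """Return (swapfile, total_blocks, free_blocks) for each row of `swap -l` output."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, dev, swaplo, blocks, free = line.split()
        rows.append((name, int(blocks), int(free)))
    return rows


sample = """\
swapfile             dev  swaplo blocks   free
/dev/dsk/c1t1d0s1   32,129     16 788384 775136
"""

for name, blocks, free in parse_swap_l(sample):
    print("%s: %.1f MB total, %.1f MB in use"
          % (name, blocks_to_mb(blocks), blocks_to_mb(blocks - free)))
```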

One tool you can use to monitor swap space usage over time is /usr/bin/sar -r interval, with interval specified in seconds:

 % sar -r 3600

 SunOS fermat 5.7 Generic sun4u    06/01/99

 00:00:00 freemem freeswap
 00:00:00     679   935313
 01:00:00     680   937184
 02:00:00     680   937184

sar reports the free memory in pages, and the free swap in disk blocks (really the amount of available swap space).
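Because the two columns use different units, a quick conversion helps when comparing them. The sketch below assumes an 8 KB page size, which is typical of sun4u hardware but should be confirmed with pagesize(1) on the system in question; the function name is invented for this example.

```python
PAGE_BYTES = 8192   # assumption: 8 KB pages (sun4u); verify with pagesize(1)
BLOCK_BYTES = 512   # freeswap is reported in 512-byte disk blocks
MB = 1024.0 * 1024.0


def sar_r_to_mb(freemem_pages, freeswap_blocks):
    """Convert one `sar -r` sample to (free memory MB, available swap MB)."""
    return (freemem_pages * PAGE_BYTES / MB, freeswap_blocks * BLOCK_BYTES / MB)


# First sample from the report above
free_mem_mb, free_swap_mb = sar_r_to_mb(679, 935313)
print("free memory: %.1f MB, available swap: %.1f MB" % (free_mem_mb, free_swap_mb))
```

Under that page-size assumption, the first sample works out to roughly 5 MB of free memory and about 457 MB of available swap, in line with the figure reported by swap -s.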

On Linux systems, you can get similar information from the file /proc/meminfo:

 % cat /proc/meminfo | head -3 | grep -vi mem
         total:    used:    free:  shared: buffers:  cached:
 Swap: 74657792 18825216 55832576

In general, if more than about half of your swap area is in use, you should consider increasing it. Disk is very cheap; skimping on swap space is only going to hurt you.
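The "about half" guideline can be checked mechanically. This sketch reads the Swap: line in the older /proc/meminfo layout shown above (newer kernels report SwapTotal and SwapFree lines in kB instead, which this code does not handle). The function names and the 0.5 default threshold are illustrative choices, not fixed limits.

```python
def parse_swap_line(meminfo_text):
    """Return (total, used, free) in bytes from an old-style 'Swap:' line."""
    for line in meminfo_text.splitlines():
        if line.startswith("Swap:"):
            total, used, free = (int(x) for x in line.split()[1:4])
            return total, used, free
    raise ValueError("no Swap: line found")


def should_grow_swap(total_bytes, used_bytes, threshold=0.5):
    """True if the fraction of swap in use exceeds the threshold."""
    if total_bytes <= 0:
        raise ValueError("no swap configured")
    return used_bytes / float(total_bytes) > threshold


sample = """\
        total:    used:    free:  shared: buffers:  cached:
Swap: 74657792 18825216 55832576
"""

total, used, free = parse_swap_line(sample)
print("%.0f%% of swap in use, grow swap: %s"
      % (100.0 * used / total, should_grow_swap(total, used)))
```

For the sample system, about a quarter of the swap area is in use, so the guideline does not call for more.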

4.3.2.3 Organizing swap space

To minimize the impact of paging, configure swap areas on the fastest possible disks. The swap partition should be on a low-numbered slice (traditionally slice one, with the root filesystem on slice zero); we'll explain why in Chapter 5. From a performance point of view, there is no reason to place swap areas on slow disks, on fast disks accessed through slow controllers, or on fast disks that are already heavily loaded with I/O activity, nor is there any reason to have more than one swap area on a disk. The best strategy is a dedicated swapping area with multiple fast disks and controllers. If that is not possible, try to put the swap area on a fast, lightly used disk. If that is not possible either, try to put the swap file on the most heavily used partition. This minimizes the seek time to the disk; for a detailed discussion, see Section 5.4.10.1.

Solaris allows you to use remote files as swapping areas via NFS. This isn't a good way to achieve reasonable performance on modern systems. If you're configuring a diskless workstation, you have no choice unless you can survive without disk-based swap space. Measuring the level of I/O activity on a per-partition and per-disk basis is a topic that we'll cover in more detail in Section 5.5. The single most important issue in swap area placement is to put swap areas on low numbered partitions on fast, lightly used disks.

4.3.2.4 Swapfiles

Sometimes you need to create a swap area in an "emergency" setting. The most effective way of doing this on a Solaris system is to use a swapfile, which is a file on disk that the system treats as swap space. To create one, first create an empty file with /usr/sbin/mkfile, then run /usr/sbin/swap -a on the swapfile. Here's an example:

 # mkfile 64m /swapfile
 # /usr/sbin/swap -a /swapfile
 # /usr/sbin/swap -l
 swapfile             dev  swaplo blocks   free
 /dev/dsk/c0t0d0s0   32,0      16 262944 232816
 /swapfile              -      16  65520  65520

You may alternatively specify the size of the swapfile to mkfile in blocks or kilobytes by giving the units as b or k, respectively. You can remove the file from the swap space via /usr/sbin/swap -d swapfile. Note that the swapfile will not be automatically activated at system boot.
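To see how the mkfile size units relate to one another, here is a small conversion sketch. The function name is hypothetical; the only facts it relies on are that b means 512-byte blocks, k means kilobytes, and m means megabytes, and it treats a bare number as bytes.

```python
UNIT_BYTES = {"b": 512, "k": 1024, "m": 1024 * 1024}


def mkfile_size_bytes(spec):
    """Convert a mkfile-style size such as '64m' to a byte count."""
    unit = spec[-1].lower()
    if unit not in UNIT_BYTES:
        return int(spec)  # no suffix: treat the whole string as bytes
    return int(spec[:-1]) * UNIT_BYTES[unit]


# Three equivalent ways of asking for a 64 MB swapfile
for spec in ("64m", "65536k", "131072b"):
    print(spec, mkfile_size_bytes(spec))
```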