6.18. Swap Space

In this section we look at how swap is allocated and then discuss the statistics used for monitoring swap. We refer to swap space as seen by the processes as virtual swap space and to real (disk or file) swap space as physical swap space.

6.18.1. Swap Allocation

Swap space allocation goes through distinct stages: reserve, allocate, and swap-out. When you first create a segment, you reserve virtual swap space; when you first touch and allocate a page, you "allocate" virtual swap space for that page; then, if the system encounters a memory shortage, it can "swap out" the page to physical swap space. Table 6.6 summarizes the swap states.
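The three stages can be read as a small state machine for each page. The sketch below is a hypothetical model in C, not Solaris kernel code: the names swap_state_t, swap_event_t, and swap_next_state are invented for illustration, and only the forward path described above (reserve, allocate, swap-out) is modeled. Page-in and freeing are deliberately left out.

```c
#include <assert.h>

/* Hypothetical per-page swap states, following the three stages above. */
typedef enum {
    SWAP_NONE,        /* no swap accounting yet */
    SWAP_RESERVED,    /* virtual swap reserved when the segment was created */
    SWAP_ALLOCATED,   /* page first touched: virtual swap allocated for it */
    SWAP_SWAPPED_OUT  /* page written to physical (disk) swap */
} swap_state_t;

/* Hypothetical events that move a page between states. */
typedef enum {
    EV_CREATE_SEGMENT,
    EV_TOUCH_PAGE,
    EV_MEMORY_SHORTAGE
} swap_event_t;

/* Return the next state; an event that does not apply leaves the state as is. */
swap_state_t swap_next_state(swap_state_t s, swap_event_t ev)
{
    switch (s) {
    case SWAP_NONE:
        return ev == EV_CREATE_SEGMENT ? SWAP_RESERVED : s;
    case SWAP_RESERVED:
        return ev == EV_TOUCH_PAGE ? SWAP_ALLOCATED : s;
    case SWAP_ALLOCATED:
        return ev == EV_MEMORY_SHORTAGE ? SWAP_SWAPPED_OUT : s;
    case SWAP_SWAPPED_OUT:
        return s;             /* page-in is not modeled in this sketch */
    }
    assert(!"unknown swap state");
    return s;
}
```

A page must pass through the reserved state before it can be allocated, which mirrors the rule that reservation happens when the segment is created, not when the page is first used.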
Swap space is reserved each time a heap segment is created. The amount of swap space reserved is the entire size of the segment being created. Swap space is also reserved if there is a possibility of anonymous memory being created. For example, mapped file segments that are mapped MAP_PRIVATE (like the executable data segment) reserve swap space because at any time they could create anonymous memory during a copy-on-write operation.

Virtual swap space is reserved up-front so that swap space is committed at the time of the request rather than at the time of need. That way, an out-of-swap-space error can be reported synchronously during a system call. If swap space were allocated on demand during program execution rather than when you called malloc(), the program could run out of swap space during execution and have no simple way to detect the out-of-swap-space condition. For example, in the Solaris kernel, we fail a malloc() request for memory as it is requested rather than when it is needed later, to prevent processes from failing during seemingly normal execution. (This strategy differs from that of operating systems such as IBM's AIX, where lazy allocation is done. If the resource is exhausted during program execution, then the process is sent a SIGDANGER signal.)

The swapfs file system includes all available pageable memory as virtual swap space in addition to the physical swap space. That way, you can "reserve" virtual swap space and "allocate" swap space when you first touch a page. When you reserve swap, you reserve virtual swap space from swapfs rather than disk space; disk swap pages are allocated only once a page is paged out. With swapfs, the amount of virtual swap space available is the amount of available unlocked, pageable physical memory plus the amount of physical (disk) swap space available.
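To make the up-front reservation concrete, here is a minimal sketch, assuming a single global pool whose capacity is available pageable memory plus available disk swap, as described above. The vswap_pool_t type and the vswap_reserve() and vswap_unreserve() functions are hypothetical and not part of any Solaris interface; the point is only that a request larger than the remaining pool fails at the time of the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global virtual swap pool, counted in pages. Its capacity is
 * available unlocked, pageable memory plus available disk swap. */
typedef struct {
    size_t mem_pages;   /* unlocked, pageable physical memory available */
    size_t disk_pages;  /* physical (disk) swap available */
    size_t reserved;    /* pages reserved so far; never exceeds capacity */
} vswap_pool_t;

static size_t vswap_capacity(const vswap_pool_t *p)
{
    return p->mem_pages + p->disk_pages;
}

/* Reserve a whole segment up-front. If the pool cannot cover the request,
 * fail now, at the time of the request, so the caller can report the
 * error synchronously instead of failing later when a page is touched. */
bool vswap_reserve(vswap_pool_t *p, size_t pages)
{
    assert(p->reserved <= vswap_capacity(p));
    if (pages > vswap_capacity(p) - p->reserved)
        return false;
    p->reserved += pages;
    return true;
}

/* Return a reservation to the pool when the segment goes away. */
void vswap_unreserve(vswap_pool_t *p, size_t pages)
{
    assert(pages <= p->reserved);
    p->reserved -= pages;
}
```

For example, with 100 pages of memory and 50 pages of disk swap, a 120-page reservation succeeds, a following 31-page request fails immediately, and a 30-page request still fits.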
If you were to run without swap space, then you could reserve only as much virtual memory as there is unlocked pageable physical memory available on the system. This would be fine, except that virtual memory requirements are often greater than physical memory requirements, so you would be unable to use all the available physical memory on the system. For example, a process may reserve 100 Mbytes of memory and then allocate only 10 Mbytes of physical memory. The process's physical memory requirement would be 10 Mbytes, but it had to reserve 100 Mbytes of virtual swap, thus using 100 Mbytes of virtual swap allocated from available real memory. If we ran such a process on a 128-Mbyte system, we would likely start only one of these processes before we exhausted our swap space. If we added more virtual swap space by adding a disk swap device, then we could reserve against the additional space, and we would likely get 10 or so of the equivalent processes in the same physical memory.

The process data segment is another good example of a requirement for larger virtual memory than physical memory. The process data segment is mapped MAP_PRIVATE, which means that we need to reserve virtual swap for the whole segment, but we allocate physical memory only for the few pages that we write to within the segment. The amount of virtual swap required is far greater than the physical memory allocated to it, so if we needed to swap pages out to the swap device, we would need only a small amount of physical swap space.

If we had the ideal process that had all of its virtual memory backed by physical memory, then we could run with no physical swap space. Usually, we need something like 0.5 to 1.5 times memory size for physical swap space. It varies, of course, depending on the virtual-to-physical memory ratio of the application. Another consideration is system size. A large multiprocessor Sun server with 512 Gbytes of physical memory is unlikely to require 1 Tbyte of swap space.
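The arithmetic in the 128-Mbyte example can be written out as a small helper. This is a simplified sketch under stated assumptions: all 128 Mbytes are treated as unlocked, pageable memory (a real system keeps some for the kernel), each process reserves 100 Mbytes and touches 10 Mbytes, and no paging occurs. The function name max_processes and the 1-Gbyte disk swap figure in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizing helper: how many identical processes can start when
 * each one reserves reserve_mb of virtual swap and touches resident_mb of
 * physical memory. Assumes no paging and nonzero per-process sizes. */
size_t max_processes(size_t pageable_ram_mb, size_t disk_swap_mb,
                     size_t reserve_mb, size_t resident_mb)
{
    assert(reserve_mb > 0 && resident_mb > 0);

    /* Limit 1: every process reserves its full size from the virtual
     * swap pool (pageable RAM plus disk swap). */
    size_t by_reservation = (pageable_ram_mb + disk_swap_mb) / reserve_mb;

    /* Limit 2: the touched pages of every process must fit in RAM. */
    size_t by_ram = pageable_ram_mb / resident_mb;

    return by_reservation < by_ram ? by_reservation : by_ram;
}
```

max_processes(128, 0, 100, 10) returns 1, the single process that fits without disk swap. max_processes(128, 1024, 100, 10) returns 11, in line with the "10 or so" processes in the text; at that point the reservation limit (11) and the physical memory limit (12) are close together, which is the sense in which disk swap lets you use the physical memory you have.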
For very large systems with a large amount of physical memory, configured swap can potentially be less than total physical memory. Again, the actual amount of virtual memory required to meet performance goals will be workload dependent.

6.18.2. Swap Statistics

The amount of anonymous memory in the system is recorded by the anon accounting structures. The anon layer keeps track of how anonymous pages are allocated in the k_anoninfo structure, shown below, which is defined in the include file vm/anon.h.

    struct k_anoninfo {
            pgcnt_t ani_max;         /* total reservable slots on phys disk swap */
            pgcnt_t ani_free;        /* # of unallocated phys and mem slots */
            pgcnt_t ani_phys_resv;   /* # of reserved phys (disk) slots */
            pgcnt_t ani_mem_resv;    /* # of reserved mem slots */
            pgcnt_t ani_locked_swap; /* # of swap slots locked in reserved */
                                     /* mem swap */
    };
                                                          See vm/anon.h

The k_anoninfo structure keeps count of the number of slots reserved on physical swap space and against memory. This information populates the data used for the swapctl() system call. The swapctl() system call provides the data for the swap command and uses a slightly different data structure, the anoninfo structure, shown below.

    struct anoninfo {
            pgcnt_t ani_max;
            pgcnt_t ani_free;
            pgcnt_t ani_resv;
    };
                                                          See sys/anon.h

The anoninfo structure exports the swap allocation information in a platform-independent manner.

6.18.3. Swap Summary: swap -s

The swap -s command output, shown below, summarizes information from the anoninfo structure.

    $ swap -s
    total: 108504k bytes allocated + 13688k reserved = 122192k used, 114880k available

The output of swap -s can be somewhat misleading because it confuses the terms used for swap definition. The output is really telling us that 122,192 Kbytes of virtual swap space have been reserved in total, of which 108,504 Kbytes are allocated to pages that have been touched (leaving 13,688 Kbytes reserved but not yet allocated), and that 114,880 Kbytes are free. This information reflects the stages of swap allocation, shown in Figure 6.5.
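The relationship between the anoninfo fields and the four swap -s figures can be sketched as follows. The formulas are an assumption about how the figures are derived, chosen so that "used" equals "allocated" plus "reserved" as in the output above. The kernel counts pages, whereas this sketch works in Kbytes, and the anoninfo_kb type and summarize_swap() function are hypothetical. The input values in the usage note were back-calculated from the sample output, not read from a real system.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mirror of the anoninfo fields, in Kbytes instead of pages. */
struct anoninfo_kb {
    unsigned long ani_max;   /* total virtual swap */
    unsigned long ani_free;  /* not yet allocated to touched pages */
    unsigned long ani_resv;  /* reserved, including what is allocated */
};

struct swap_summary {
    unsigned long allocated;
    unsigned long reserved;
    unsigned long used;
    unsigned long available;
};

/* Assumed derivation of the four figures printed by swap -s. */
struct swap_summary summarize_swap(const struct anoninfo_kb *ai)
{
    struct swap_summary s;

    assert(ai->ani_free <= ai->ani_max && ai->ani_resv <= ai->ani_max);
    s.allocated = ai->ani_max - ai->ani_free;   /* touched pages */
    assert(ai->ani_resv >= s.allocated);        /* allocation implies reservation */
    s.reserved  = ai->ani_resv - s.allocated;   /* reserved, not yet touched */
    s.used      = s.allocated + s.reserved;     /* equals ani_resv */
    s.available = ai->ani_max - ai->ani_resv;   /* still reservable */
    return s;
}

void print_swap_summary(const struct swap_summary *s)
{
    printf("total: %luk bytes allocated + %luk reserved = %luk used, %luk available\n",
           s->allocated, s->reserved, s->used, s->available);
}
```

With ani_max = 237072, ani_free = 128568, and ani_resv = 122192 (all Kbytes), summarize_swap() yields 108504k allocated, 13688k reserved, 122192k used, and 114880k available, reproducing the sample line. Note that "used" is simply the total reservation, which is why the output reads as more alarming than it is.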
Remember, we reserve swap as we create virtual memory, and then part of that swap is allocated when real pages are assigned to the address space. The balance of swap space remains unused.

Figure 6.5. Swap Allocation States

6.18.4. Listing Physical Swap Devices: swap -l

The swap -l command lists the physical swap devices and their levels of physical allocation.

    $ swap -l
    swapfile             dev  swaplo   blocks     free
    /dev/dsk/c0t0d0s0  136,0      16  1049312   782752

The blocks and free columns are in units of disk blocks, or sectors (512 bytes). This example shows that some of our physical swap slice has been used.

6.18.5. Determining Swapped-Out Threads

The pageout scanner will send clusters of pages to the swap device. However, if it can't keep up with demand, the swapper swaps out entire threads. The number of threads swapped out is reported either in the kthr:w column of vmstat or in the swpq-sz column of sar -q. The following example is the same system from the previous swap -l example, but it has experienced a dire memory shortage in the past and has swapped out entire threads.

    $ vmstat 1 2
     kthr      memory            page            disk          faults      cpu
     r b w   swap  free  re  mf pi po fr de sr dd dd f0 s3   in   sy   cs us sy id
     0 0 13 423816 68144  3  16  5  0  0  0  1  0  0  0  0   67   36  136  1  0 98
     0 0 67 375320 43040  0   6  0  0  0  0  0  0  0  0  0  406  354  137  1  0 99

    $ sar -q 1

    SunOS mars 5.9 Generic_118558-05 sun4u    03/12/2006

    05:05:36  runq-sz %runocc swpq-sz %swpocc
    05:05:37      0.0       0    67.0      99

Our system currently has 67 threads swapped out to the physical swap device. The sar command has also provided a %swpocc column, which reports the percent swap occupancy. This is the percentage of time that threads existed on the swap device (99% is a rounding error) and is more useful for much longer sar intervals.

6.18.6. Monitoring Physical Swap Activity

To determine if the physical swap devices are currently busy with I/O transactions, we can use the iostat command in the regular manner. We just need to remember that we are looking at the swap slice, not a file system slice.
    $ iostat -xnPz 1
    ...
                        extended device statistics
        r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
        0.0   27.0    0.0  3452.3  2.1  0.7   78.0   24.9  32  34 c0t0d0s1
                        extended device statistics
        r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
        1.0    0.0    8.0     0.0  0.0  0.0   39.6   36.3   4   4 c0t0d0s0
        0.0   75.1    0.0  9609.3  8.0  1.9  107.1   24.7  88  95 c0t0d0s1
                        extended device statistics
        r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
        0.0   61.0    0.0  7686.7  5.4  1.4   88.3   23.6  65  73 c0t0d0s1
    ...

Physical memory was quickly exhausted on this system, causing a large number of pages to be written to the physical swap device, c0t0d0s1.

Swap activity due to the swapping out of entire threads can be viewed with sar -w. The vmstat -S command prints similar swapping statistics.

6.18.7. MemTool prtswap

In the following example, we use the prtswap script in MemTool to list the states of swap to find out where the swap is allocated from. We then use the prtswap command without the -l option for just a summary of the swap allocations.

    # prtswap -l
    Swap Reservations:
    --------------------------------------------------------------------------
    Total Virtual Swap Configured:                              767MB =
    RAM Swap Configured:                                        255MB
    Physical Swap Configured:                                 + 512MB

    Total Virtual Swap Reserved Against:                        513MB =
    RAM Swap Reserved Against:                                    1MB
    Physical Swap Reserved Against:                           + 512MB

    Total Virtual Swap Unresv. & Avail. for Reservation:        253MB =
    Physical Swap Unresv. & Avail. for Reservations:              0MB
    RAM Swap Unresv. & Avail. for Reservations:               + 253MB

    Swap Allocations: (Reserved and Phys pages allocated)
    --------------------------------------------------------------------------
    Total Virtual Swap Configured:                              767MB
    Total Virtual Swap Allocated Against:                       467MB

    Physical Swap Utilization: (pages swapped out)
    --------------------------------------------------------------------------
    Physical Swap Free (should not be zero!):                   232MB =
    Physical Swap Configured:                                   512MB
    Physical Swap Used (pages swapped out):                   - 279MB
                                                          See MemTool

    # prtswap
    Virtual Swap:
    ---------------------------------------------------------------
    Total Virtual Swap Configured:                            767MB
    Total Virtual Swap Reserved:                              513MB
    Total Virtual Swap Free: (programs will fail if 0)        253MB

    Physical Swap Utilization: (pages swapped out)
    ---------------------------------------------------------------
    Physical Swap Configured:                                 512MB
    Physical Swap Free (programs will be locked in if 0):     232MB
                                                          See MemTool

The prtswap script uses the anonymous accounting structure members to establish how swap space is allocated and uses the availrmem counter, the swapfsminfree reserve, and the swap -l command to find out how much swap is used. Table 6.7 shows the anonymous accounting variables stored in the kernel.
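The three equalities in the prtswap -l output (configured, reserved against, and physical free) are plain sums and differences. The sketch below restates them in C with a hypothetical struct vswap_mb and helper functions; it is not MemTool's code, it works in whole Mbytes, and it does not model availrmem or swapfsminfree. With the sample figures (255 and 512 Mbytes configured, 1 and 512 Mbytes reserved against, 279 Mbytes swapped out) it gives 767, 513, 254, and 233 Mbytes; the tool printed 253MB and 232MB for the last two, a 1-Mbyte difference this sketch does not try to reproduce.

```c
#include <assert.h>

/* Hypothetical prtswap-style inputs, all in whole Mbytes. */
struct vswap_mb {
    long ram_configured;   /* RAM Swap Configured */
    long phys_configured;  /* Physical Swap Configured */
    long ram_reserved;     /* RAM Swap Reserved Against */
    long phys_reserved;    /* Physical Swap Reserved Against */
    long phys_used;        /* Physical Swap Used (pages swapped out) */
};

/* Total Virtual Swap Configured = RAM swap + physical swap. */
long vswap_total_configured(const struct vswap_mb *v)
{
    return v->ram_configured + v->phys_configured;
}

/* Total Virtual Swap Reserved Against = RAM reserved + physical reserved. */
long vswap_total_reserved(const struct vswap_mb *v)
{
    return v->ram_reserved + v->phys_reserved;
}

/* Unreserved and available for reservation; new reservations fail at 0. */
long vswap_unreserved(const struct vswap_mb *v)
{
    return vswap_total_configured(v) - vswap_total_reserved(v);
}

/* Physical Swap Free = configured - used (pages swapped out). */
long phys_swap_free(const struct vswap_mb *v)
{
    assert(v->phys_used <= v->phys_configured);
    return v->phys_configured - v->phys_used;
}
```

Keeping virtual and physical figures in separate functions mirrors the split in the prtswap summary: virtual swap free governs whether programs can reserve memory, while physical swap free governs whether pages can still be swapped out.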
6.18.8. Display of Swap Reservations with pmap

The -S option of pmap describes the swap reservations for a process. The amount of swap space reserved is displayed for each mapping within the process. Swap reservations are reported as zero for shared mappings since they are accounted for only once systemwide.

    sol9$ pmap -S 15492
    15492:  ./maps
     Address  Kbytes     Swap Mode   Mapped File
    00010000       8        - r-x--  maps
    00020000       8        8 rwx--  maps
    00022000   20344    20344 rwx--    [ heap ]
    03000000    1024        - rw-s-  dev:0,2 ino:4628487
    04000000    1024     1024 rw---  dev:0,2 ino:4628487
    05000000    1024      512 rw--R  dev:0,2 ino:4628487
    06000000    1024     1024 rw---    [ anon ]
    07000000     512      512 rw--R    [ anon ]
    08000000    8192        - rwxs-    [ dism shmid=0x5]
    09000000    8192        - rwxs-    [ dism shmid=0x4]
    0A000000    8192        - rwxs-    [ dism shmid=0x2]
    0B000000    8192        - rwxsR    [ ism shmid=0x3]
    FF280000     680        - r-x--  libc.so.1
    FF33A000      32       32 rwx--  libc.so.1
    FF390000       8        - r-x--  libc_psr.so.1
    FF3A0000       8        - r-x--  libdl.so.1
    FF3B0000       8        8 rwx--    [ anon ]
    FF3C0000     152        - r-x--  ld.so.1
    FF3F6000       8        8 rwx--  ld.so.1
    FFBFA000      24       24 rwx--    [ stack ]
    -------- ------- -------
    total Kb   50464    23496

You can use the swap reservation information to estimate the amount of virtual swap used by each additional process. Each process consumes virtual swap from a global virtual swap pool. Global swap reservations are reported by the swap(1M) command; swap -s shows the total reserved (used) and the amount still available. It is important to stress that while you should consider virtual reservations, you must not confuse them with physical allocations (which is easy to do since many commands just describe them as "swap"). For example:

    # pmap -S 236
    236:    /usr/lib/nfs/nfsmapid
     Address  Kbytes     Swap Mode   Mapped File
    00010000      24        - r-x--  nfsmapid
    00026000       8        8 rwx--  nfsmapid
    00028000    7768     7768 rwx--    [ heap ]
    ...
    FF3EE000       8        8 rwx--  ld.so.1
    FFBFE000       8        8 rw---    [ stack ]
    -------- ------- -------
    total Kb   10344     8272

Process ID 236 (nfsmapid) has a total swap reservation of 8 Mbytes.
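As a cross-check on per-process totals like these, you can sum the Swap column yourself. The following is a minimal sketch, assuming the column layout shown above (Address, Kbytes, Swap, Mode, Mapped File) and treating "-" as zero; sum_pmap_swap() is a hypothetical helper, not part of any Solaris tool. Lines that do not begin with a hexadecimal address followed by a decimal Kbytes field, such as the header and the total line, are skipped because sscanf() does not get as far as the Swap field.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: total the Swap column (third field) of pmap -S
 * mapping lines, in Kbytes. A "-" in that column counts as zero. */
unsigned long sum_pmap_swap(const char *lines[], size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++) {
        char swap[32];

        assert(lines[i] != NULL);

        /* Skip the address (hex) and Kbytes (decimal) fields, then capture
         * the Swap field. Header, separator, and total lines fail before
         * the Swap field is read, so nothing is assigned and they are skipped. */
        if (sscanf(lines[i], "%*lx %*lu %31s", swap) != 1)
            continue;

        if (strcmp(swap, "-") != 0)
            total += strtoul(swap, NULL, 10);
    }
    return total;
}
```

Applied to the mapping lines of process 15492, it returns 23496, matching the total Kb line. Dividing the available figure from swap -s by such a per-process total gives only a rough upper bound on how many more copies of the process could reserve swap, since other processes draw on the same global pool.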
Now we list the state of our physical swap devices on this system:

    $ swap -l
    swapfile             dev  swaplo   blocks     free
    /dev/dsk/c0t0d0s1  136,9      16  2097632  2097632

No physical swap has been used.
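Because the blocks and free columns of swap -l are in 512-byte sectors, they are easy to misread as Kbytes. The sketch below, using the hypothetical helpers swap_sectors_to_kb() and swap_used_percent(), converts them: the 2,097,632 sectors here are 1,048,816 Kbytes (about 1 Gbyte), all free, while the device in the earlier swap -l example had 266,560 sectors (133,280 Kbytes, roughly 25 percent of the slice) in use.

```c
#include <assert.h>

/* swap -l reports the blocks and free columns in 512-byte sectors. */
#define SWAP_SECTOR_BYTES 512ULL

/* Hypothetical helper: convert a sector count from swap -l to Kbytes. */
unsigned long long swap_sectors_to_kb(unsigned long long sectors)
{
    return sectors * SWAP_SECTOR_BYTES / 1024ULL;
}

/* Hypothetical helper: percentage of a swap device in use,
 * computed from the blocks and free columns. */
double swap_used_percent(unsigned long long blocks, unsigned long long free_blocks)
{
    assert(free_blocks <= blocks);
    if (blocks == 0)
        return 0.0;
    return 100.0 * (double)(blocks - free_blocks) / (double)blocks;
}
```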