9.2 How Much Swap Space Do I Need?

     

Swap space is disk space that we hope we will never need. If we actually use swap space, it means that at some point our system ran out of memory to such an extent that the virtual memory system decided to get rid of some data from memory and store it out on disk, on a swap device. This basic idea of how the virtual memory system works helps us when we are trying to decide how much swap space to configure.

An old rule-of-thumb went something like this: "configure twice as much swap space as main memory." I can't categorically say that this is either a good or a bad rule-of-thumb. Remember the four questions we asked at the beginning? We need to answer these questions before we can calculate how much swap space to configure:

  1. How much main memory do you have?

  2. How much data will your applications want in memory at any one time?

  3. How much application data will be locked in memory?

  4. How does the virtual memory system decide when it's time to start paging-out?

Some of these questions are harder to answer than others. Let's start at the beginning:

  1. How much main memory do you have?

    This one isn't too difficult because we can gather this information either from the dmesg command or from syslog.log.

     

     root@hpeos003[] dmesg | grep Physical
         Physical: 1048576 Kbytes, lockable: 742136 Kbytes, available: 861432 Kbytes
     root@hpeos003[]

    Physical memory is how much is installed and configured in the system.

    Available memory is the amount of memory left after the operating system has taken what it initially needs.

    Lockable memory is the amount of memory that can be locked by processes using the plock() and shmctl() system calls. By default, only processes running as root can lock memory (see the discussion on Privilege Groups in Chapter 29, Dealing with Immediate Security Threats). Lockable memory should always be less than available memory. The amount of lockable memory is controlled by the kernel parameter unlockable_mem: Available memory = Lockable memory + unlockable_mem. If unlockable_mem is less than or equal to zero, the kernel will calculate a suitable value.

  2. How much data will your applications want in memory at any one time?

    This is possibly the hardest question to answer. The only way to attempt to calculate it is to analyze a running system at its busiest. Using tools like glance, top, ipcs, and ps, we can monitor how much memory the processes are using (a short command sketch follows this list). We can only assume that, at their busiest, processes will be as big as they will ever be. Some applications will define a large working area in which they store application data. The application will manage how big this area is and make a call to the operating system to populate this area with data. If this global area is part of an application, it can be an excellent measure of a large proportion of an application's needs. Just looking at the size of the application database is not sufficient, because it is unlikely that we will ever suck the entire database into memory.

  3. How much application data will be locked in memory?

    We mentioned lockable memory in question 1. Some applications are coded with system calls to lock parts of the application into memory, usually only the most recently used parts of the application. A user process needs special privileges in order to do this (see Chapter 29, Dealing with Immediate Security Threats, for a discussion on setprivgrp). If a process is allowed to lock parts of the application in memory, the virtual memory system will not be able to page those parts of the application to a swap device. It should be noted that application developers need to be aware of the existence of process locking in order to use it effectively in their applications. The application installation instructions should detail the need for the application to have access to this feature.

  4. How does the virtual memory system decide when it's time to start paging-out?

This last question leads us to a discussion on virtual memory, reserving swap space, the page daemon, and a two-handed clock.
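Before we get to that, here is a short sketch of how the raw figures behind questions 1 and 2 can be gathered from the command line. The commands are standard HP-UX tools, but the exact options (and whether kmtune or kctune applies) vary by release, so treat this as illustrative rather than definitive:

     root@hpeos003[] dmesg | grep Physical       # physical, lockable, and available memory
     root@hpeos003[] kmtune -q unlockable_mem    # kernel parameter controlling lockable memory
     root@hpeos003[] swapinfo -tam               # configured device/filesystem/pseudo-swap, in MB
     root@hpeos003[] ipcs -ma                    # shared memory segments and their sizes
     root@hpeos003[] ps -efl                     # rough per-process sizes; glance gives a better VSS figure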

9.2.1 Reserving swap space

While a process is running, it may ask the operating system for access to more memory, i.e., the process is growing in size. The virtual memory system would love to keep giving a process more memory, but it's a bit worried because it knows that we have a finite amount of memory. If we keep giving memory to processes, there will come a point when we need to page-out processes, and we will want to know we can fit them all out on the swap device. This is where reserving swap comes in. Every time a process requests more memory, the virtual memory system will want to reserve some swap space in case it needs to page-out that process in the future. Reserving swap is one of the safety nets used to ensure that the memory system doesn't get overloaded.

Let's get one thing clear here: Reserving swap space does not mean going to disk and setting aside some disk space for a process. That is called allocating swap space, and it only happens when we start running out of memory. Reserving swap space simply puts on hold enough space for every process in the system. This gives us an idea of the absolute minimum amount of swap space. If we need a pool of swap space from which to reserve, surely that pool needs to be at least as big as main memory. In the old days, that was the case. Now systems can have anything up to 512GB of RAM, and we hope that we will never need to swap! Following the old analogy regarding reserving swap, we would need to configure at least 512GB of swap space in order to have a pool of swap space from which we can start reserving space for processes. That would be an awful lot of swap space doing nothing.

Consequently, HP-UX utilizes something called pseudo-swap. Pseudo-swap allows us to continue allocating memory once our pool of device and filesystem swap is exhausted. It's almost like having an additional but fictitious (pseudo) swap device that makes the total swap space on the system seem like more than we have actually configured. The size of pseudo-swap is set at 7/8 of the Available memory at system startup time. The use of pseudo-swap is controlled by the kernel parameter swapmem_on, which defaults to 1 (on). This means that our total swap space on the system can be calculated as follows:

Total swap space = (device swap + filesystem swap) + pseudo-swap

If there is no device or filesystem swap, the system uses pseudo-swap as a last resort. Pseudo-swap is never reserved; it is either free or pseudo-allocated. In reality, we never swap to a pseudo-swap device, because it is in memory. If we utilize pseudo-swap, we need to appreciate that it is simply a mechanism for fooling the virtual memory system into thinking we have more swap space than we actually have. It was designed with large memory systems in mind, where it is unlikely we will ever need to actually use swap space, but we do need to configure some swap space.
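To make the 7/8 figure concrete, here is a small sketch of the arithmetic using the available-memory figure from the earlier dmesg output (861432 KB); the device swap figure is hypothetical and chosen purely for illustration:

     # Illustrative arithmetic only; the available figure comes from the dmesg example above.
     available_kb=861432                             # "available" memory reported by dmesg
     pseudo_kb=$(( available_kb * 7 / 8 ))           # pseudo-swap = 7/8 of available memory
     device_kb=4194304                               # hypothetical 4GB of device swap
     fs_kb=0                                         # no filesystem swap in this example
     total_kb=$(( device_kb + fs_kb + pseudo_kb ))   # total swap space seen by the system
     echo "pseudo-swap: ${pseudo_kb} KB, total swap: ${total_kb} KB"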

If we think about it, the use of pseudo-swap means that the absolute minimum amount of device/filesystem swap we could configure is 1/8 of all memory. Every time we add more memory to our system, we would have to review the amount of total configured swap space. We could get to a situation where the virtual memory system could not reserve any swap space for a process. This would mean that a process would not be able to grow even though there was lots of (our recently installed) memory available.

IMPORTANT

Simply installing more memory in a system does not mean the operating system can use it immediately. Be sure you understand how reserving swap space works.


I would never suggest using the "1/8 of real memory" minimum in real life, but it is the theoretical minimum. It's not a good idea because, if we get to the point where processes are actually being paged-out to disk, we could find that we don't have enough swap space to accommodate the processes that are in memory. At that time, the system would start thrashing, i.e., spending more time paging than doing useful work. This is a bad situation because processes get little opportunity to execute and the system effectively hangs.

9.2.2 When to throw pages out

The virtual memory system has a number of kernel variables that it monitors on a constant basis. If these variables reach certain thresholds, it's time to start freeing up memory because we are just about to run out. The main process that takes care of freeing pages is known as vhand, the page daemon. The main kernel parameters that control when vhand wakes up are lotsfree, desfree, and minfree. Each of these parameters is associated with a trigger value that activates vhand. These trigger values are based on the amount of non-kernel (Available) memory in your system (see Table 9-1):

Table 9-1. Virtual Memory Triggers

Parameter   Amount of non-kernel memory (NkM)   Description and default value
lotsfree    NkM <= 32MB                          1/8 of NkM, not to exceed 256 pages (1MB)
            32MB < NkM <= 2GB                    1/16 of NkM, not to exceed 8192 pages (32MB)
            2GB < NkM                            16384 pages (64MB)
desfree     NkM <= 32MB                          1/16 of NkM, not to exceed 60 pages (240KB)
            32MB < NkM <= 2GB                    1/64 of NkM, not to exceed 1024 pages (4MB)
            2GB < NkM                            3072 pages (12MB)
minfree     NkM <= 32MB                          1/2 of desfree, not to exceed 25 pages (100KB)
            32MB < NkM <= 2GB                    1/4 of desfree, not to exceed 256 pages (1MB)
            2GB < NkM                            1280 pages (5MB)


We can view the relationship between these parameters as follows in Figure 9-3:

Figure 9-3. Virtual Memory triggers.



When the number of free pages falls below lotsfree, vhand wakes up and tunes its internal parameters in preparation for paging processes out to disk. The variable we haven't mentioned here is gpgslim. This variable is initially set to 1/4 of the distance between lotsfree and desfree, and vhand starts stealing pages if freemem falls below gpgslim. gpgslim will float between lotsfree and desfree depending on current memory pressure.

vhand decides when to steal pages by performing a simple test: has this page been referenced recently? When awakened, vhand runs eight times per second by default and will only consume 10 percent of the CPU cycles for that interval. This is an attempt to be aggressive, but not too aggressive, at stealing pages. Initially, vhand will scan 1/16 of a pregion of an active process before moving to the next process. In doing so, it sets the reference bit for each page to 0 in the Page Directory (it also purges the TLB entry to ensure that the next reference must go through the Page Directory). This is known as aging a page. If the owning process comes along and uses the page, the reference bit is reset and vhand now knows that the page has been referenced recently. If the process doesn't access the page, vhand has to decide when it's time to steal the page.

The time between aging a page and stealing a page is the result of continual assessment by the virtual memory system as to how aggressively it needs to steal pages. The whole process can be visualized by imagining a two-handed clock, with the age-hand clearing the reference bit and the steal-hand running some time behind the age-hand. When the steal-hand comes to recheck the reference bit, if it hasn't been reset, vhand knows this page hasn't been referenced recently and can schedule that page to be written out to disk; vhand can steal that page. If the process suddenly wakes back up, it can reclaim the page before it is actually written out to disk. If not, the page is paged-out and memory is freed up for other processes/threads to use.

If vhand manages to free up enough pages to get freemem above gpgslim, it can relax a little, stop stealing pages, and just age them. If, however, it can't free up enough pages, gpgslim will rise and vhand will become more aggressive in trying to steal pages (the two hands move round the clock quicker and scan more pages faster). Eventually, if vhand is unsuccessful at keeping freemem above gpgslim, we could get to a point where freemem is less than desfree. At this time, the kernel will start to deactivate processes that haven't run in the last 20 seconds. This is the job of the swapper process. Process deactivation is where the process is taken off the run queue and all its pages are put in front of the steal-hand (the age-hand is on its way, so the pages will soon be destined for the swap device). If this fails and freemem falls below minfree, swapper realizes that things are getting desperate and chooses any active process to be deactivated in an attempt to steal enough pages as quickly as possible. At this stage, the system is thrashing, i.e., paging-in/out more than it is doing useful work.
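If you want to see this mechanism at work, paging activity can be watched from the command line. A minimal sketch, assuming standard HP-UX tools (exact column names vary slightly between releases):

     root@hpeos003[] vmstat 5          # watch the free, po (page-outs), and sr (scan rate) columns
     root@hpeos003[] swapinfo -tam     # a rising USED figure against device swap means pages really are going to disk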

The parameters lotsfree , desfree , and minfree are kernel tunable parameters, but unless you really know what you're doing, it's suggested that you leave them at the values listed above.
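The current values are easy to check without changing anything. A minimal sketch; kmtune is the interface on older 11.x releases, while newer releases use kctune, so use whichever your system provides:

     root@hpeos003[] kmtune -q lotsfree -q desfree -q minfree    # older releases
     root@hpeos003[] kctune lotsfree desfree minfree             # newer releases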

9.2.3 So how much swap space should I configure?

There still isn't an easy answer to this. We now know about pseudo-swap and how it allows us to continue to use main memory even when we have run out of device/filesystem swap. We also know when the virtual memory system will start to steal pages in order to avoid a shortfall in available memory. Has this brought us any nearer to an answer to our question? We need to return to those four questions we asked initially and remind ourselves that we need to establish how much data our application will require at peak time. This is not easy. We could come up with a formula of the form (a worked sketch of the arithmetic follows the list):

  1. Start with the absolute minimum of 12.5 percent (1/8) of main memory. Realistically, we need to be starting at about 25 percent of main memory.

  2. Add up all the shared memory segments used by all your applications. This might be available as a global area figure used by the application. If the global area is going to be locked in memory, we can't count it because it will never be paged-out.

  3. Add up all private VM requirements of every process on the system at their busiest period. This is known as the virtual set size (VSS) and includes the local/private data structures of each process. The only tool readily available that shows the VSS for a process is HP's glance utility.

  4. Add an additional 10-15 percent for process structures that could be paged-out (this also factors in a what-if amount, just in case).
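Putting illustrative numbers to this, the calculation might look something like the following; every figure is hypothetical and chosen purely to show the arithmetic:

     # Hypothetical figures, purely to illustrate the arithmetic.
     mem_mb=1024                                  # main memory (1GB)
     base_mb=$(( mem_mb / 4 ))                    # step 1: start at ~25 percent of main memory
     shm_mb=300                                   # step 2: unlocked shared memory/global areas
     vss_mb=400                                   # step 3: sum of private VSS at the busiest period
     swap_mb=$(( (base_mb + shm_mb + vss_mb) * 115 / 100 ))   # step 4: add ~15 percent headroom
     echo "suggested device/filesystem swap: ${swap_mb} MB"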

We have added up the shared and private areas to give us a figure that is then added to the minimum 12.5 percent of main memory. This gives us the amount of device/filesystem swap to configure, and it constitutes the pool of swap space from which we reserve space for running processes. The addition of pseudo-swap should accommodate any unforeseen spikes in demand. This formula makes some kind of sense because we have tried to establish every data element that could possibly be in memory at any one time. We haven't factored in text pages, as they are not paged to a swap area; if needed, text pages are simply discarded, because we can always page-in the text from the program in the filesystem.

This figure is not easy to arrive at, but if you can get some idea of what it is likely to be, you are well on your way to working out how much memory your system needs. With a few minor modifications, we could re-jig the swap space formula into a memory sizing formula that looks something like this:

  1. 20-30MB for the kernel.

  2. Five percent of memory for the initial allocation of the dynamic buffer cache (kernel parameter dbc_min_pct). The dbc_max_pct parameter (50 percent by default) needs to be monitored carefully; it is the maximum amount of memory that the dynamic buffer cache can consume. As with other memory users, buffer pages are only stolen when we reach the lotsfree threshold.

  3. Ten percent of memory for network packet reassembly (kernel parameter netmemmax ).

  4. Additional memory for any additional networking products such as NFS, IPv6, IPSec, SSH, HIDS, CDE, and so on. Reaching a precise number for this is difficult without careful analysis of all related processes.

  5. Add up all resident (in memory) shared memory requirements. This includes shared memory segments, shared libraries (difficult to calculate unless you know all the shared libraries loaded by an application), text, and all applications using locked memory. The only sensible way to calculate this is to talk to your application supplier and ask about the size of programs, the size of the global area, and how much of that global area is normally required in memory at any one time.

  6. Add up all the resident private data elements for every program running on the system (the resident set size, or RSS). This is similar to the VSS, except that the RSS is the data that has actually made it into memory, whereas the VSS is the local data that might make it into memory. Again, work with your application supplier to work this out (or use glance).
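The same kind of rough arithmetic can be applied here. Again, every figure below is hypothetical; only the structure of the calculation matters:

     # Hypothetical figures; only the structure of the calculation matters.
     mem_mb=1024
     kernel_mb=30                                 # step 1: the kernel
     dbc_mb=$(( mem_mb * 5 / 100 ))               # step 2: dbc_min_pct (5 percent) for the buffer cache
     net_mb=$(( mem_mb * 10 / 100 ))              # step 3: netmemmax (~10 percent) for packet reassembly
     extra_net_mb=50                              # step 4: NFS, SSH, CDE, and friends
     shared_mb=300                                # step 5: resident shared memory, libraries, and text
     rss_mb=350                                   # step 6: sum of resident private data (RSS)
     echo "estimated memory requirement: $(( kernel_mb + dbc_mb + net_mb + extra_net_mb + shared_mb + rss_mb )) MB"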

In brief, there is no easy answer. More often than not, we have to work with our application suppliers to establish a reasonable working set of text/data that is likely to be required in memory at any one time. These figures can help us work out how much physical memory and swap space we need to configure.

Being cheaper than memory, swap space will probably always be bigger than main memory, although we now know that this is not a necessity; it's just that most administrators feel safer that way. For most people, a system thrashing because the configured swap space cannot accommodate the amount of data in memory is not a pleasant prospect. That is where a sizing exercise is crucial.


