In the last several sections, we've concentrated on the virtual view of a Windows 2000 process—page tables, PTEs, and VADs. In the remainder of this chapter, we'll explain how Windows 2000 keeps a subset of virtual addresses in physical memory.
As you'll recall, a subset of virtual pages resident in physical memory is called a working set. There are two kinds of working sets: process working sets and the system working set.
NOTE
The kernel extensions to support Terminal Services for Windows 2000 (which supports multiple independent interactive user sessions on a single Windows 2000 server system) add a third type of working set: the session working set. A session consists of a set of processes as well as a session working set for kernel-mode session-specific data structures allocated by the kernel-mode part of the Win32 subsystem (Win32k.sys), the session working set's code and data, session paged pool, session mapped views, and other session-space device drivers.
Before examining the details of each type of working set, let's look at the overall policy for deciding which pages are brought into physical memory and how long they remain. After that, we'll explore the two types of working sets.
Virtual memory systems generally define three policies that dictate how (or when) paging is performed: a fetch policy, a placement policy, and a replacement policy.
A fetch policy determines when the pager brings a page from disk into memory. One type of fetch policy attempts to load the pages a process will need before it asks for them. Other fetch policies, called demand-paging policies, load a page into physical memory only when a page fault occurs. In a demand-paging system, a process incurs many page faults when its threads first begin executing because the threads reference the initial set of pages they need to get going. Once this set of pages is loaded into memory, the paging activity of the process decreases.
NOTE
To optimize the startup time of an image, a tool named the Working Set Tuner has been provided in the Platform SDK. This utility reorders the pages in an executable image, placing them in the order in which they are referenced during image startup and thus decreasing load time.
The Windows 2000 memory manager uses a demand-paging algorithm with clustering to load pages into memory. When a thread receives a page fault, the memory manager loads into memory the faulted page plus a small number of pages following it. This strategy attempts to minimize the number of paging I/Os a thread will incur. Because programs, especially large ones, tend to execute in small regions of their address space at any given time, loading clusters of virtual pages reduces the number of disk reads. The values that determine the default page read cluster sizes depend on physical memory size and are listed in Table 7-16. Notice that the values differ for pages in executable images versus other pages.
Table 7-16 Page Fault Read Clustering Values
Memory Size* | Cluster Size for Code Pages in Images | Cluster Size for Data Pages in Images | Cluster Size for All Other Pages |
---|---|---|---|
< 12 MB | 3 | 2 | 5 |
12-19 MB | 3 | 2 | 5 |
> 19 MB | 8 | 4 | 8 |
* Note that the minimum memory size supported by Windows 2000 is 32 MB. However, future embedded versions might support systems with less memory.
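The lookup implied by Table 7-16 can be sketched as follows. This is only an illustration: the function names are hypothetical, the row boundaries assume whole megabytes, and the assumption that the cluster size counts the faulted page itself is not stated in the text.

```python
# Cluster sizes transcribed from Table 7-16. The function names and the
# handling of the row boundaries are assumptions made for this sketch.
CLUSTER_SIZES = [
    # (exclusive upper bound in MB, or None for "no upper bound", sizes)
    (12, {"code": 3, "data": 2, "other": 5}),    # < 12 MB
    (20, {"code": 3, "data": 2, "other": 5}),    # 12-19 MB
    (None, {"code": 8, "data": 4, "other": 8}),  # > 19 MB
]


def read_cluster_size(memory_mb, page_kind):
    """Return the default read cluster size, in pages, for a page fault."""
    for upper_bound, sizes in CLUSTER_SIZES:
        if upper_bound is None or memory_mb < upper_bound:
            return sizes[page_kind]


def pages_to_read(faulting_page, memory_mb, page_kind):
    """List the virtual page numbers read for one fault.

    Assumption: the cluster size counts the faulted page itself, and the
    remaining pages are the ones immediately following it.
    """
    size = read_cluster_size(memory_mb, page_kind)
    return list(range(faulting_page, faulting_page + size))
```

Under these assumptions, a fault on a data page of an image on a 64-MB system would bring in four consecutive pages starting at the faulted one.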
When a thread receives a page fault, the memory manager must also determine where in physical memory to put the virtual page. The set of rules it uses to determine the best position is called a placement policy. Windows 2000 considers the size of CPU memory caches when choosing page frames to minimize unnecessary thrashing of the cache.
If physical memory is full when a page fault occurs, a replacement policy is used to determine which virtual page must be removed from memory to make room for the new page. Common replacement policies include least recently used (LRU) and first in, first out (FIFO). The LRU algorithm requires the virtual memory system to track when a page in memory is used. When a new page frame is required, the page that hasn't been used for the greatest amount of time is paged to disk and its frame is freed to satisfy the page fault. The FIFO algorithm is somewhat simpler; it removes the page that has been in physical memory for the greatest amount of time, regardless of how often it's been used.
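To make the difference between the two textbook policies concrete, here is a small simulation that counts page faults for a reference string under FIFO and under LRU. It illustrates the generic algorithms only, not the Windows 2000 implementation; the function names and the sample reference string are made up for the example.

```python
from collections import OrderedDict, deque


def count_faults_fifo(references, frames):
    """Count faults when the page resident longest is always evicted."""
    resident = set()
    arrival_order = deque()
    faults = 0
    for page in references:
        if page in resident:
            continue  # a hit does not change FIFO order
        faults += 1
        if len(resident) == frames:
            victim = arrival_order.popleft()
            resident.remove(victim)
        resident.add(page)
        arrival_order.append(page)
    return faults


def count_faults_lru(references, frames):
    """Count faults when the least recently used page is evicted."""
    resident = OrderedDict()  # oldest use first, most recent use last
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)  # a hit refreshes the page
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)
        resident[page] = True
    return faults


references = [1, 2, 3, 1, 4, 1, 5]
print(count_faults_fifo(references, 3))  # 6
print(count_faults_lru(references, 3))   # 5
```

With three frames, FIFO evicts page 1 even though it is referenced repeatedly, which costs one extra fault compared with LRU on this reference string.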
Replacement policies can be further characterized as either global or local. A global replacement policy allows a page fault to be satisfied by any page frame, whether or not that frame is owned by another process. For example, a global replacement policy using the FIFO algorithm would locate the page that has been in memory the longest and would free it to satisfy a page fault; a local replacement policy would limit its search for the oldest page to the set of pages already owned by the process that incurred the page fault. Global replacement policies make processes vulnerable to the behavior of other processes—an ill-behaved application can undermine the entire operating system by inducing excessive paging activity in all processes.
On multiprocessor systems, Windows 2000 implements a variation of a local FIFO replacement policy. On uniprocessor systems, it implements something closer to a least recently used (LRU) policy (known as the clock algorithm, as implemented in most versions of UNIX). It allocates a number of page frames (dynamically adjusted) to each process, called the process working set (or, in the case of pageable system code and data, to the system working set). When a process working set reaches its limit, or a working set needs to be trimmed because of demands for physical memory from other processes, the memory manager removes pages from the working set until it has determined there are enough free pages. How working sets are managed is described in the next section.
Every process starts with the same default working set minimum and maximum. These values, which are listed in Table 7-17, are calculated at system initialization time and are based strictly on the size of physical memory.
Table 7-17 Default Minimum and Maximum Working Set Sizes
Memory Size | Default Minimum Working Set Size (in Pages) | Default Maximum Working Set Size (in Pages) |
---|---|---|
Small | 20 | 45 |
Medium | 30 | 145 |
Large | 50 | 345 |
You can change these default values on a per-process basis with the Win32 SetProcessWorkingSetSize function, though you must have the "increase scheduling priority" user right to do this. The maximum working set size can't exceed the systemwide maximum calculated at system initialization time and stored in the kernel variable MmMaximumWorkingSetSize. This value is set to be the number of available pages (the size of the zero, free, and standby list) at the time the computation is made minus 512 pages. However, this computed value has a fixed limit of 1984 MB or 3008 MB on a system running with a 3-GB user space.
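The computation of the systemwide maximum described above can be sketched as simple arithmetic. The function name is hypothetical and the 4-KB page size is assumed (x86); the constants come from the paragraph above.

```python
PAGE_SIZE = 4096                           # x86 page size in bytes (assumed)
PAGES_PER_MB = (1024 * 1024) // PAGE_SIZE  # 256 pages per megabyte


def max_working_set_pages(available_pages, three_gb_user_space=False):
    """Sketch of how MmMaximumWorkingSetSize is derived, in pages.

    available_pages is the size of the zero, free, and standby lists
    at the time the computation is made.
    """
    computed = available_pages - 512
    limit_mb = 3008 if three_gb_user_space else 1984
    return min(computed, limit_mb * PAGES_PER_MB)


print(max_working_set_pages(30000))    # 29488
print(max_working_set_pages(1000000))  # 507904 (the 1984-MB limit)
```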
When a page fault occurs, the process's working set limits and the amount of free memory on the system are examined. If conditions permit, the memory manager allows a process to grow to its working set maximum (or beyond—the maximum can be exceeded if enough free pages are available). However, if memory is tight, Windows 2000 replaces rather than adds pages in a working set when a fault occurs.
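The decision just described might be sketched as follows. This is a simplification under stated assumptions: `fault_action` is a hypothetical name, "memory is tight" is passed in as a flag because the exact test isn't given here, and the threshold for growing past the maximum is taken from MmWsExpandThreshold in Table 7-18.

```python
MM_WS_EXPAND_THRESHOLD = 90  # MmWsExpandThreshold from Table 7-18, in pages


def fault_action(ws_size, ws_maximum, available_pages, memory_is_tight):
    """Decide whether a page fault adds a page or replaces one (sketch)."""
    if memory_is_tight:
        return "replace"             # swap a page out of this working set
    if ws_size < ws_maximum:
        return "grow"                # still below the working set maximum
    if available_pages >= MM_WS_EXPAND_THRESHOLD:
        return "grow past maximum"   # enough free pages to exceed the limit
    return "replace"
```

Whether the real comparison against the threshold is inclusive is an assumption of this sketch.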
Although Windows 2000 attempts to keep memory available by writing modified pages to disk, when modified pages are being generated at a very high rate, more memory is required in order to meet memory demands. Therefore, when physical memory runs low (MmAvailablePages is less than MmMinimumFreePages), the working set manager, a routine that runs in the context of the balance set manager system thread (described in the next section), is called to initiate automatic working set trimming to increase the amount of free memory available in the system. (With the Win32 SetProcessWorkingSetSize function mentioned earlier, you can also initiate working set trimming of your own process, for example, after your application is initialized.)
The working set manager examines available memory and decides which, if any, working sets need to be trimmed. If there is ample memory, the working set manager calculates how many pages could be removed from working sets if needed. If trimming is needed, it looks at working sets that are above their minimum setting. It also dynamically adjusts the rate at which it examines working sets and arranges the list of processes that are candidates for trimming into an optimal order. For example, larger processes that have been idle longer are considered before smaller processes that are running more often; the process running the foreground application is considered last; and so on.
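One way to picture this ordering is as a sort over the candidate processes. The sketch below is illustrative only: the `Candidate` fields, the sort key, and the sample processes are all hypothetical, and the real working set manager's ordering rules are more involved than a single key.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    ws_pages: int        # current working set size, in pages
    ws_minimum: int      # working set minimum, in pages
    idle_seconds: int    # time since the process last ran
    foreground: bool = False


def trim_order(processes):
    """Order trim candidates: idle and large first, foreground last."""
    eligible = [p for p in processes if p.ws_pages > p.ws_minimum]
    return sorted(
        eligible,
        key=lambda p: (p.foreground, -p.idle_seconds, -p.ws_pages),
    )


processes = [
    Candidate("editor", 900, 50, 0, foreground=True),
    Candidate("batch", 4000, 50, 120),
    Candidate("tray", 300, 50, 5),
    Candidate("tiny", 40, 50, 600),  # at or below its minimum: not a candidate
]
print([p.name for p in trim_order(processes)])  # ['batch', 'tray', 'editor']
```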
Some of the kernel variables that affect working set expansion and trimming are listed in Table 7-18. The values of these variables are fixed or system set and can't be adjusted by registry values.
Table 7-18 Working Set-Related System Control Variables
Variable | Value | Description |
---|---|---|
MmWorkingSetSizeIncrement | 6 | The number of pages to add to a working set if there are sufficient available pages and the working set is below its maximum. |
MmWorkingSetSizeExpansion | 20 | The number of pages by which to expand the maximum working set if it is at its maximum and there are sufficient available pages. |
MmWsExpandThreshold | 90 | The number of pages that must be available to expand the working set above its maximum. |
MmPagesAboveWsMinimum | Dynamic | The number of pages that would be removed from working sets if every working set was at its minimum. |
MmPagesAboveWsThreshold | 37 | If memory is getting short and MmPagesAboveWsMinimum is above this value, trim working sets. |
MmWsAdjustThreshold | 45 | The number of pages required to be freed by working set reduction before working set reduction is attempted. |
MmWsTrimReductionGoal | 29 | The total number of pages to reduce by working set trimming. |
When it finds processes using more than their minimums, the working set manager looks for pages to remove from their working sets, making the pages available for other uses. If the amount of free memory is still too low, the working set manager continues removing pages from processes' working sets until it achieves a minimum number of free pages on the system.
If a process has incurred more than a few page faults since the last time it was trimmed, it becomes exempt from trimming. The theory is that if the working set manager made a mistake and trimmed pages that were in use, it shouldn't trim any more pages from that process until the next periodic trim cycle (6 seconds later).
The algorithm to determine which pages to remove from a working set is different on a single-processor system than on a multiprocessor system. On a single-processor system, the working set manager tries to remove pages that haven't been accessed recently. It does this by checking the accessed bit in the hardware PTE to see whether the page has been accessed. If the bit is clear, the page is aged, that is, a count is incremented indicating that the page hasn't been referenced since the last working set trim scan. Later, the age of pages is used to locate candidate pages to remove from the working set.
If the hardware PTE accessed bit is set, the working set manager clears it and goes on to examine the next page in the working set. In this way, if the accessed bit is clear the next time the working set manager examines the page, it knows that the page hasn't been accessed since the last time it was examined. This scan for pages to remove continues through the working set list until either the number of desired pages has been removed or the scan has returned to the starting point. (The next time the working set is trimmed, the scan picks up where it left off last.)
On a multiprocessor system, the working set manager doesn't check the accessed bit; clearing it would require invalidating TLB entries on other processors, which would result in unnecessary TLB cache misses by threads in the same process that might be running on other processors. Thus, on a multiprocessor system, pages are removed from the working set without regard to the state of the accessed bit.
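The two scanning behaviors can be sketched in a few lines. This is a simplified model, not the actual memory manager code: `Entry`, `trim_pass`, and the `age_threshold` parameter are hypothetical, and the sketch assumes a page becomes a removal candidate once its age reaches the threshold and that a referenced page has its age reset.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    vpn: int                # virtual page number
    accessed: bool = False  # stands in for the hardware PTE accessed bit
    age: int = 0            # scans since the page was last seen referenced


def trim_pass(entries, start, pages_wanted, age_threshold=2,
              multiprocessor=False):
    """Scan the working set once; return (removed_vpns, next_start).

    Removed slots are set to None. The scan stops when enough pages have
    been removed or when it has wrapped around to its starting point.
    """
    removed = []
    index = start
    for _ in range(len(entries)):
        if len(removed) == pages_wanted:
            break
        entry = entries[index]
        if entry is not None:
            if multiprocessor:
                # Accessed bit is ignored: take pages in list order.
                removed.append(entry.vpn)
                entries[index] = None
            elif entry.accessed:
                entry.accessed = False  # give the page another chance
                entry.age = 0           # assumption: a reference resets age
            else:
                entry.age += 1          # not referenced since the last scan
                if entry.age >= age_threshold:
                    removed.append(entry.vpn)
                    entries[index] = None
        index = (index + 1) % len(entries)
    return removed, index
```

Passing the returned index as `start` on the next call models the scan picking up where it left off.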
EXPERIMENT
Viewing Process Working Set Sizes
You can use the Performance tool to examine process working set sizes by looking at the following performance counters:
Counter | Description |
---|---|
Process: Working Set | Current size of the selected process's working set in bytes |
Process: Working Set Peak | Peak size of the selected process's working set in bytes |
Process: Page Faults/Sec | Number of page faults for the process that occur each second |
Several other process viewer utilities (such as Task Manager, Pview, and Pviewer) also display the process working set size.
You can also get the total of all the process working sets by selecting the _Total process in the instance box in the Performance tool. This process isn't real—it's simply a total of the process-specific counters for all processes currently running on the system. The total you see is misleading, however, because the size of each individual process working set includes pages being shared by other processes. Thus, if two or more processes share a page, the page is counted in each process's working set.
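The double counting is easy to see with a toy example. Here each working set is modeled as a set of physical page frame numbers; the process names and frame numbers are invented for illustration.

```python
def total_of_working_sets(working_sets):
    """Sum of per-process working set sizes, as the _Total instance does."""
    return sum(len(frames) for frames in working_sets.values())


def distinct_resident_pages(working_sets):
    """Number of physical pages actually in use across all working sets."""
    return len(set().union(*working_sets.values()))


# Hypothetical processes; frame 3 is shared (for example, a DLL page).
working_sets = {
    "a.exe": {1, 2, 3},
    "b.exe": {3, 4},
}
print(total_of_working_sets(working_sets))    # 5
print(distinct_resident_pages(working_sets))  # 4
```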
EXPERIMENT
Viewing the Working Set List
You can view the individual entries in the working set by using the kernel debugger !wsle command. The following example shows a partial output of the working set list of LiveKd. (This command was run on the LiveKd process.)

    kd> !wsle 7
    Working Set @ c0502000
        Quota:       9f  FirstFree:   40  FirstDynamic:     3
        LastEntry   1fe  NextSlot:     3  LastInitialized 257
        NonDirect    5c  HashTable:    0  HashTableSize:    0

    Virtual Address  Age  Locked  ReferenceCount
    c0300203           0       1               1
    c0301203           0       1               1
    c0502203           0       1               1
    c01df201           0       0               1
    c01ff201           0       0               1
    c0005201           0       0               1
    c0001201           0       0               1
    c0002201           0       0               1
    c0000201           0       0               1
    c0006201           0       0               1
    77e87119           0       0               1
    00402319           0       0               1
    77e01201           0       0               1
    7ffdf201           0       0               1
    00130201           0       0               1
    77e9e119           0       0               1
    78033201           0       0               1
    00230221           0       0               1
    00131201           0       0               1
    77d50119           0       0               1
    00132201           0       0               1
    c01e0201           0       0               1
    00411309           0       0               1
    0040d201           0       0               1
    77edf201           0       0               1
    77ee0201           0       0               1
    77fcd201           0       0               1
    0040e201           0       0               1
    7ffc1009           0       0               1
    00401319           0       0               1

Notice that some entries in the working set list are page table pages (the ones with addresses greater than 0xC0000000), some are from system DLLs (the ones in the 0x7nnnnnnn range), and some are from the code of LiveKd.exe itself (those in the 0x004nnnnn range).
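The rough classification described above can be expressed as a short helper. This sketch assumes 4-KB pages, so the low 12 bits of each displayed value are treated as per-entry bits rather than part of the page address (an assumption; their meaning isn't covered here). The function name and the category labels are made up, and the ranges are exactly the coarse ones used in the text.

```python
def classify_wsle(entry):
    """Classify a !wsle virtual address value by its address range."""
    page = entry & ~0xFFF  # drop the low 12 bits (assumed non-address bits)
    if page >= 0xC0000000:
        return "page table page"
    if 0x70000000 <= page < 0x80000000:
        return "system DLL range"
    if 0x00400000 <= page < 0x00500000:
        return "LiveKd.exe image"
    return "other"


for value in (0xC0300203, 0x77E87119, 0x00402319, 0x00130201):
    print(f"{value:08x} -> {classify_wsle(value)}")
```

Entries that fall outside the three ranges, such as 00130201, are simply reported as "other" by this sketch.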
Working set expansion and trimming take place in the context of a system thread called the balance set manager (routine KeBalanceSetManager). The balance set manager is created during system initialization. Although the balance set manager is technically part of the kernel, it calls the memory manager's working set manager to perform working set analysis and adjustment.
The balance set manager waits on two different event objects: an event that is signaled when a periodic timer set to fire once per second expires and an internal working set manager event that the memory manager signals at various points when it determines that working sets need to be adjusted. For example, if the system is experiencing a high page fault rate or the free list is too small, the memory manager wakes up the balance set manager so that it will call the working set manager to begin trimming working sets. When memory is more plentiful, the working set manager will permit faulting processes to gradually increase the size of their working sets by faulting pages back into memory, but the working sets will grow only as needed.
When the balance set manager wakes up as the result of its 1-second timer expiring, it takes the following four steps:
The swapper is also awakened by the scheduling code in the kernel if a thread that needs to run has its kernel stack swapped out or if the process has been swapped out. The swapper looks for threads that have been in a wait state for a specified amount of time (3 seconds on small memory systems, 7 seconds on medium or large memory systems). If it finds one, it puts the thread's kernel stack in transition (moving the pages to the modified or standby lists) so as to reclaim its physical memory, operating on the principle that if a thread has been waiting that long, it's going to be waiting even longer. When the last thread in a process has its kernel stack removed from memory, the process is marked to be entirely outswapped. That's why, for example, processes that have been idle for a long time (such as Winlogon after you log on) can have a zero working set size.
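Just as processes have working sets, the pageable code and data in the operating system are managed by a single system working set. Five different kinds of pages can reside in the system working set: system cache pages, paged pool, pageable code in Ntoskrnl.exe, pageable device driver code, and system mapped views.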
You can examine the size of the system working set or the size of the five components that contribute to it with the performance counters or system variables shown in Table 7-19. Keep in mind that the performance counter values are in bytes whereas the system variables are measured in terms of pages.
You can also examine the paging activity in the system working set by examining the Memory: Cache Faults/Sec performance counter, which describes page faults that occur in the system working set (both hard and soft).
Table 7-19 System Working Set Performance Counters
Performance Counter (in Bytes) | System Variable (in Pages) | Description |
---|---|---|
Memory: Cache Bytes* | MmSystemCacheWs.WorkingSetSize | Total size of system working set (including the cache, paged pool, pageable Ntoskrnl and driver code, and system mapped views); this is not the size of the system cache alone, even though the name implies that it is. |
Memory: Cache Bytes Peak | MmSystemCacheWs.Peak | Peak system working set size. |
Memory: System Cache Resident Bytes | MmSystemCachePage | Physical memory consumed by the system cache. |
Memory: System Code Resident Bytes | MmSystemCodePage | Physical memory consumed by pageable code in Ntoskrnl.exe. |
Memory: System Driver Resident Bytes | MmSystemDriverPage | Physical memory consumed by pageable device driver code. |
Memory: Pool Paged Resident Bytes | MmPagedPoolPage | Physical memory consumed by paged pool. |
* Internally, this working set is called the system cache working set, even though the system cache is just one of five different components in it. Thus, several utilities think they are displaying the size of the file cache when they are displaying the total size of the system working set.
The system variable that contains the value for the Memory: Cache Faults/Sec counter is MmSystemCacheWs.PageFaultCount.
The minimum and maximum system working set sizes are computed at system initialization time based on the amount of physical memory on the machine and whether the system is running Windows 2000 Professional or Windows 2000 Server. The initial values, which are listed in Table 7-20, are chosen based on system memory size.
Table 7-20 Minimum and Maximum Size of System Working Set
Memory Size | System Working Set Minimum (in Pages) | System Working Set Maximum (in Pages) |
---|---|---|
Small | 388 | 500 |
Medium | 688 | 1150 |
Large | 1188 | 2050 |
These numbers are further altered if the registry value HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache is set to 1 (the default on Windows 2000 Server systems) and the number of available pages (MmAvailablePages, as described in Table 7-25) is greater than 350 plus 6 MB (a total of 1886 pages on x86 systems). In this case, the system working set maximum is set to available pages minus 4 MB. If this value is greater than the maximum working set size supported by Windows 2000 (1984 MB for normal x86 systems or 3008 MB on a system running with a 3-GB user space), the system working set maximum is reduced to that maximum value minus 5 pages.
Windows 2000 then checks to see whether the new system working set maximum is greater than the virtual size of the system cache—if it is, the working set maximum is reduced to the virtual size of the system cache. In other words, the system working set could potentially expand to use all the virtual memory reserved for the system cache. (See Chapter 11 for more information about the virtual size of the system cache.)
Finally, a check is made to determine whether the difference between the system working set minimum and maximum is less than 500 pages. If it is, the working set minimum is reduced to the working set maximum minus 500 pages.
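The sequence of adjustments described above can be followed step by step in a short sketch. It applies the steps literally as stated and assumes 4-KB pages (x86). The function and parameter names are hypothetical, and the virtual size of the system cache is passed in as a parameter because its value isn't given here.

```python
PAGE_SIZE = 4096                           # x86 page size in bytes (assumed)
PAGES_PER_MB = (1024 * 1024) // PAGE_SIZE  # 256 pages per megabyte

# (minimum, maximum) in pages, from Table 7-20
BASE_LIMITS = {
    "small": (388, 500),
    "medium": (688, 1150),
    "large": (1188, 2050),
}


def system_ws_limits(memory_class, available_pages, large_system_cache,
                     cache_virtual_pages, three_gb_user_space=False):
    """Return (minimum, maximum) system working set size in pages (sketch)."""
    ws_min, ws_max = BASE_LIMITS[memory_class]

    # LargeSystemCache = 1 and more than 350 pages plus 6 MB available
    if large_system_cache and available_pages > 350 + 6 * PAGES_PER_MB:
        ws_max = available_pages - 4 * PAGES_PER_MB
        limit = (3008 if three_gb_user_space else 1984) * PAGES_PER_MB
        if ws_max > limit:
            ws_max = limit - 5

    # Never larger than the virtual size of the system cache
    if ws_max > cache_virtual_pages:
        ws_max = cache_virtual_pages

    # Keep at least 500 pages between minimum and maximum
    if ws_max - ws_min < 500:
        ws_min = ws_max - 500

    return ws_min, ws_max


# 131072 pages (512 MB) is an arbitrary stand-in for the cache's virtual size.
print(system_ws_limits("medium", 1000, False, 131072))   # (650, 1150)
print(system_ws_limits("large", 100000, True, 131072))   # (1188, 98976)
```

Note that 350 + 6 * 256 evaluates to the 1886 pages quoted in the text for x86 systems.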
The final calculated working set minimum and maximum are then stored in the system variables shown in Table 7-21. (These variables aren't available through any performance counter.)
Table 7-21 System Variables That Store Working Set Minimums or Maximums
Variable | Type | Description |
---|---|---|
MmSystemCacheWsMinimum or MmSystemCacheWs.MinimumWorkingSetSize | ULONG | Minimum working set size |
MmSystemCacheWsMaximum or MmSystemCacheWs.MaximumWorkingSetSize | ULONG | Maximum working set size |