Services the Memory Manager Provides


The memory manager provides a set of system services to allocate and free virtual memory, share memory between processes, map files into memory, flush virtual pages to disk, retrieve information about a range of virtual pages, change the protection of virtual pages, and lock the virtual pages into memory.

Like other Windows executive services, the memory management services allow their caller to supply a process handle, indicating the particular process whose virtual memory is to be manipulated. The caller can thus manipulate either its own memory or (with the proper permissions) the memory of another process. For example, if a process creates a child process, by default it has the right to manipulate the child process's virtual memory. Thereafter, the parent process can allocate, deallocate, read, and write memory on behalf of the child process by calling virtual memory services and passing a handle to the child process as an argument. This feature is used by subsystems to manage the memory of their client processes, and it is also key for implementing debuggers because debuggers must be able to read and write to the memory of the process being debugged.
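
A minimal sketch (not from the book) of this pattern, assuming hProcess is a child-process handle with PROCESS_VM_OPERATION and PROCESS_VM_WRITE access, such as the handle returned by CreateProcess:

    #include <windows.h>

    BOOL WriteToChild(HANDLE hProcess, const void *data, SIZE_T size)
    {
        /* Allocate committed, read/write pages in the target process. */
        void *remote = VirtualAllocEx(hProcess, NULL, size,
                                      MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        if (remote == NULL)
            return FALSE;

        /* Copy the data into the child's new region. */
        SIZE_T written;
        return WriteProcessMemory(hProcess, remote, data, size, &written);
    }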

Most of these services are exposed through the Windows API. The Windows API has three groups of functions for managing memory in applications: page granularity virtual memory functions (Virtualxxx), memory-mapped file functions (CreateFileMapping, MapViewOfFile), and heap functions (Heapxxx and the older interfaces Localxxx and Globalxxx). (We'll describe the heap manager later in this section.)

The memory manager also provides a number of services, such as allocating and deallocating physical memory and locking pages in physical memory for direct memory access (DMA) transfers, to other kernel-mode components inside the executive as well as to device drivers. These functions begin with the prefix Mm. In addition, though not strictly part of the memory manager, executive support routines that begin with Ex are used to allocate and deallocate from the system heaps (paged and nonpaged pool) as well as to manipulate look-aside lists. We'll touch on these topics later in this chapter, in the section "System Memory Pools."

Although we'll be referring to Windows functions and kernel-mode memory management and memory allocation routines provided for device drivers, we won't cover the interface and programming details but rather the internal operations of these functions. Refer to the Platform Software Development Kit (SDK) and Device Driver Kit (DDK) documentation on MSDN for a complete description of the available functions and their interfaces.

Large and Small Pages

The virtual address space is divided into units called pages. That is because the hardware memory management unit translates virtual to physical addresses at the granularity of a page. Hence, a page is the smallest unit of protection at the hardware level. (The various page protection options are described in the section "Protecting Memory.") There are two page sizes: small and large. The actual sizes vary based on hardware architecture, and they are listed in Table 7-1.

Table 7-1. Page Sizes

Architecture    Small Page Size    Large Page Size
x86             4 KB               4 MB (2 MB on PAE systems)
x64             4 KB               2 MB
IA64            8 KB               16 MB


The advantage of large pages is speed of address translation for references to other data within the large page. This advantage exists because the first reference to any byte within a large page will cause the hardware's translation look-aside buffer (or TLB, which is described in the section "Translation Look-Aside Buffer") to have in its cache the information necessary to translate references to any other byte within the large page. If small pages are used, more TLB entries are needed for the same range of virtual addresses, thus increasing recycling of entries as new virtual addresses require translation. This, in turn, means having to go back to the page table structures when references are made to virtual addresses outside the scope of a small page whose translation has been cached. The TLB is a very small cache, and thus large pages make better use of this limited resource.

To take advantage of large pages, on systems considered to have enough memory (see Table 7-2 for the minimums), Windows maps with large pages the core operating system images (Ntoskrnl.exe and Hal.dll) as well as core operating system data (such as the initial part of nonpaged pool and the data structures that describe the state of each physical memory page). Windows also automatically maps I/O space requests (calls by device drivers to MmMapIoSpace) with large pages if the request is of suitable large-page length and alignment. Lastly, Windows allows applications to map their images, private memory, and pagefile-backed sections with large pages. (See the MEM_LARGE_PAGES flag on the VirtualAlloc function.) You can also specify that other device drivers be mapped with large pages by adding the multistring registry value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargePageDrivers and specifying the names of the drivers as separate null-terminated strings.
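
As a rough sketch of the application-side use of large pages (an illustration under stated assumptions, not the book's code): the caller must hold the Lock Pages In Memory right, and GetLargePageMinimum is documented only for Windows Server 2003 SP1 and later:

    #include <windows.h>

    void *AllocLargePages(SIZE_T bytes)
    {
        SIZE_T large = GetLargePageMinimum();   /* e.g., 4 MB on non-PAE x86 */
        if (large == 0)
            return NULL;                        /* large pages not supported */

        /* Size and address must be large-page aligned; round the size up. */
        SIZE_T size = (bytes + large - 1) & ~(large - 1);
        return VirtualAlloc(NULL, size,
                            MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                            PAGE_READWRITE);
    }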

Table 7-2. Large Page Minimums

Operating System                    Minimum Memory to Use Large Pages
Windows 2000                        >127 MB
Windows XP, Windows Server 2003     >255 MB


One side effect of large pages is that each large page must be mapped with a single protection (because hardware memory protection is applied on a per-page basis). If a large page contains both read-only code and read/write data, the page must be marked read/write, which means the code will be writable. Device drivers or other kernel-mode code could then, as a result of a bug, modify what is supposed to be read-only operating system or driver code without causing a memory access violation. If small pages are used to map the kernel, however, the read-only portions of Ntoskrnl.exe and Hal.dll are mapped as read-only pages. Although this reduces the efficiency of address translation, if a device driver (or other kernel-mode code) attempts to modify a read-only part of the operating system, the system crashes immediately with the finger pointing at the offending instruction, as opposed to allowing the corruption to occur and the system crashing later (in a harder-to-diagnose way) when some other component trips over the corrupted data. If you suspect you are experiencing kernel code corruption, enable Driver Verifier (described later in this chapter); doing so disables the use of large pages.

Reserving and Committing Pages

Pages in a process address space are free, reserved, or committed. Applications can first reserve address space and then commit pages in that address space. Or they can reserve and commit in the same function call. These services are exposed through the Windows VirtualAlloc and VirtualAllocEx functions.

Reserved address space is simply a way for a thread to reserve a range of virtual addresses for future use. Attempting to access reserved memory results in an access violation because the page isn't mapped to any storage that can resolve the reference.

Committed pages are pages that, when accessed, ultimately translate to valid pages in physical memory. Committed pages are either private and not shareable or mapped to a view of a section (which might or might not be mapped by other processes). Sections are described in two upcoming sections: "Shared Memory and Mapped Files" and "Section Objects."

If the pages are private to the process and have never been accessed before, they are created at the time of first access as zero-initialized pages (or demand zero). Private committed pages can later be automatically written to the paging file by the operating system if memory demands dictate. Committed pages that are private are inaccessible to any other process unless they're accessed using cross-process memory functions, such as ReadProcessMemory or WriteProcessMemory. If committed pages are mapped to a portion of a mapped file, they might need to be brought in from disk when accessed unless they've already been read earlier, either by the process accessing the page or by another process that had the same file mapped and had previously accessed the page. (See the section "Shared Memory and Mapped Files" later in this chapter.)

Pages are written to disk through normal modified page writing as pages are moved from the process working set to the modified list and ultimately to disk. (Working sets and the modified list are explained later in this chapter.) Mapped file pages can also be written back to disk as a result of an explicit call to FlushViewOfFile.

You can decommit pages and/or release address space with the VirtualFree or VirtualFreeEx function. The difference between decommittal and release is similar to the difference between reservation and committal: decommitted memory is still reserved, but released memory is neither committed nor reserved. (It's freed.)

Using the two-step process of reserving and committing memory can reduce memory usage by deferring committing pages until needed but keeping the convenience of virtual contiguity.
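
A minimal sketch of the two-step pattern (sizes are illustrative):

    #include <windows.h>

    int main(void)
    {
        /* Step 1: reserve 1 MB of contiguous addresses; this consumes no
           committed pages, only a VAD describing the range. */
        char *base = (char *)VirtualAlloc(NULL, 1024 * 1024,
                                          MEM_RESERVE, PAGE_NOACCESS);

        /* Step 2: commit only the first 64 KB once it is actually needed. */
        VirtualAlloc(base, 64 * 1024, MEM_COMMIT, PAGE_READWRITE);
        base[0] = 1;    /* now backed by committed pages; no access violation */

        /* Decommit the pages but keep the reservation... */
        VirtualFree(base, 64 * 1024, MEM_DECOMMIT);
        /* ...then release the whole region (size must be 0 with MEM_RELEASE). */
        VirtualFree(base, 0, MEM_RELEASE);
        return 0;
    }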

Reserving memory is a relatively fast and inexpensive operation under Windows because it doesn't consume any committed pages (a precious system resource) or process page file quota (a limit on the number of committed pages a process can consume, not necessarily page file space). All that needs to be updated or constructed are the relatively small internal data structures that represent the state of the process address space. (We'll explain these data structures, called virtual address descriptors, or VADs, later in the chapter.)

Reserving and then committing memory is useful for applications that need a potentially large virtually contiguous memory buffer; rather than committing pages for the entire region, the address space can be reserved and then committed later when needed. One use of this technique in the operating system is the user-mode stack for each thread. When a thread is created, a stack is reserved. (1 MB is the default; you can override this size with the CreateThread function call or on an imagewide basis by using the /STACK linker flag.) By default, the initial page in the stack is committed and the next page is marked as a guard page (which isn't committed) that traps references beyond the end of the committed portion of the stack and expands it.

Locking Memory

In general, it's better to let the memory manager decide which pages remain in physical memory. However, there might be special circumstances where locking pages in physical memory is necessary. Pages can be locked in memory in two ways:

  • Windows applications can call the VirtualLock function to lock pages in their process working set. The number of pages a process can lock can't exceed its minimum working set size minus eight pages. Therefore, if a process needs to lock more pages, it can increase its working set minimum with the SetProcessWorkingSetSize function (referred to in the section "Working Set Management"), as shown in the sketch after this list.

  • Device drivers can call the kernel-mode functions MmProbeAndLockPages, MmLockPagableCodeSection, MmLockPagableDataSection, or MmLockPagableSectionByHandle. Pages locked using this mechanism remain in memory until explicitly unlocked. Although no quota is imposed on the number of pages a driver can lock in memory, a driver can't lock more pages than the resident available page count will allow.
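
The user-mode path might look like the following sketch; the working set sizes are illustrative values, not recommendations:

    #include <windows.h>

    /* Lock a buffer into the process working set, growing the working set
       minimum first so the lock can succeed (a process can't lock more than
       its minimum minus eight pages). */
    BOOL LockBuffer(void *buf, SIZE_T size)
    {
        if (!SetProcessWorkingSetSize(GetCurrentProcess(),
                                      4 * 1024 * 1024,      /* 4-MB minimum */
                                      16 * 1024 * 1024))    /* 16-MB maximum */
            return FALSE;
        return VirtualLock(buf, size);
    }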

Allocation Granularity

Windows aligns each region of reserved process address space to begin on an integral boundary defined by the value of the system allocation granularity, which can be retrieved from the Windows GetSystemInfo function. Currently, this value is 64 KB. This size was chosen so that if support were added for future processors with large page sizes (for example, up to 64 KB) or virtually indexed caches that require systemwide physical-to-virtual page alignment, the risk of requiring changes to applications that made assumptions about allocation alignment would be reduced. (Windows kernel-mode code isn't subject to the same restrictions; it can reserve memory on a single-page granularity.)

Finally, when a region of address space is reserved, Windows ensures that the size and base of the region are multiples of the system page size, whatever that might be. For example, because x86 systems use 4-KB pages, if you tried to reserve a region of memory 18 KB in size, the actual amount reserved on an x86 system would be 20 KB. If you specified a base address of 3 KB for an 18-KB region, the actual amount reserved would be 24 KB.
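
Both values can be retrieved as shown in this sketch; on current x86 systems it prints 65536 and 4096:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        printf("allocation granularity: %lu\n", si.dwAllocationGranularity);
        printf("page size:              %lu\n", si.dwPageSize);
        return 0;
    }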

Shared Memory and Mapped Files

As is true with most modern operating systems, Windows provides a mechanism to share memory among processes and the operating system. Shared memory can be defined as memory that is visible to more than one process or that is present in more than one process's virtual address space. For example, if two processes use the same DLL, it would make sense to load the referenced code pages for that DLL into physical memory only once and share those pages between all processes that map the DLL, as illustrated in Figure 7-1.

Figure 7-1. Sharing memory between processes


Each process would still maintain its private memory areas in which to store private data, but the program instructions and unmodified data pages could be shared without harm. As we'll explain later, this kind of sharing happens automatically because the code pages in executable images are mapped as execute-only and writable pages are mapped copy-on-write. (See the section "Copy-on-Write" for more information.)

The underlying primitives in the memory manager used to implement shared memory are called section objects, which are called file mapping objects in the Windows API. The internal structure and implementation of section objects are described in the section "Section Objects" later in this chapter.

This fundamental primitive in the memory manager is used to map virtual addresses, whether in main memory, in the page file, or in some other file that an application wants to access as if it were in memory. A section can be opened by one process or by many; in other words, section objects don't necessarily equate to shared memory.

A section object can be connected to an open file on disk (called a mapped file) or to committed memory (to provide shared memory). Sections mapped to committed memory are called page file backed sections because the pages are written to the paging file if memory demands dictate. (Because Windows can run with no paging file, page file backed sections might in fact be "backed" only by physical memory.) As with any other page that is made visible to user mode (such as private committed pages), shared committed pages are always zero-filled when they are first accessed.

To create a section object, call the Windows CreateFileMapping function, specifying the file handle to map it to (or INVALID_HANDLE_VALUE for a page file backed section), and optionally a name and security descriptor. If the section has a name, other processes can open it with OpenFileMapping. Or you can grant access to section objects through handle inheritance (by specifying that the handle be inheritable when opening or creating the handle) or handle duplication (by using DuplicateHandle). Device drivers can also manipulate section objects with the ZwOpenSection, ZwMapViewOfSection, and ZwUnmapViewOfSection functions.

A section object can refer to files that are much larger than can fit in the address space of a process. (If the paging file backs a section object, sufficient space must exist in the paging file to contain it.) To access a very large section object, a process can map only the portion of the section object that it requires (called a view of the section) by calling the MapViewOfFile function and then specifying the range to map. Mapping views permits processes to conserve address space because only the views of the section object needed at the time must be mapped into memory.
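
A minimal sketch of creating and mapping a pagefile-backed section (the section name is illustrative):

    #include <windows.h>

    int main(void)
    {
        /* A 64-KB section backed by the paging file rather than a data file. */
        HANDLE hSection = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                                            PAGE_READWRITE, 0, 64 * 1024,
                                            TEXT("MySharedSection"));
        if (hSection == NULL)
            return 1;

        /* Map a view of the whole section; another process can open the same
           section by name with OpenFileMapping and map its own view. */
        char *view = (char *)MapViewOfFile(hSection, FILE_MAP_WRITE, 0, 0, 0);
        lstrcpyA(view, "hello from process A");

        UnmapViewOfFile(view);
        CloseHandle(hSection);
        return 0;
    }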

Windows applications can use mapped files to conveniently perform I/O to files by simply making them appear in their address space. User applications aren't the only consumers of section objects: the image loader uses section objects to map executable images, DLLs, and device drivers into memory, and the cache manager uses them to access data in cached files. (For information on how the cache manager integrates with the memory manager, see Chapter 11.) How shared memory sections are implemented, both in terms of address translation and the internal data structures, is explained later in this chapter.

EXPERIMENT: Viewing Memory Mapped Files

You can list the memory mapped files in a process by using Process Explorer from Sysinternals. To view them, configure the lower pane to show the DLL view. (Click on View, Lower Pane View, DLLs.) Note that this is more than just a list of DLLs; it represents all memory mapped files in the process address space. Some of these are DLLs, one is the image file (EXE) being run, and additional entries might represent memory mapped data files. For example, the following display from Process Explorer shows a Microsoft PowerPoint process that has memory mapped the PowerPoint file being edited into its address space:



You can also search for memory mapped files by clicking on Find, DLL. This can be useful when trying to determine which process(es) are using a DLL that you are trying to replace.

Finally, comparing the list of DLLs loaded in a process with another instance of the same program running on another system might help point to DLL configuration issues, such as the wrong version of a DLL getting loaded in a process. This problem is known affectionately as "DLL hell."


Protecting Memory

As explained in Chapter 1, Windows provides memory protection so that no user process can inadvertently or deliberately corrupt the address space of another process or the operating system itself. Windows provides this protection in four primary ways.

First, all systemwide data structures and memory pools used by kernel-mode system components can be accessed only while in kernel mode; user-mode threads can't access these pages. If they attempt to do so, the hardware generates a fault, which the memory manager reports to the thread as an access violation.

Note

In contrast, Microsoft Windows 95, Microsoft Windows 98, and Microsoft Windows Millennium Edition have some pages in system address space that are writable from user mode, thus allowing an errant application to corrupt key system data structures and crash the system.


Second, each process has a separate, private address space, protected from being accessed by any thread belonging to another process. The only exceptions are if the process decides to share pages with other processes or if another process has virtual memory read or write access to the process object and thus can use the ReadProcessMemory or WriteProcessMemory functions. Each time a thread references an address, the virtual memory hardware, in concert with the memory manager, intervenes and translates the virtual address into a physical one. By controlling how virtual addresses are translated, Windows can ensure that threads running in one process don't inappropriately access a page belonging to another process.

Third, in addition to the implicit protection virtual-to-physical address translation offers, all processors supported by Windows provide some form of hardware-controlled memory protection (such as read/write, read-only, and so on); the exact details of such protection vary according to the processor. For example, code pages in the address space of a process are marked read-only and are thus protected from modification by user threads.

Table 7-3 lists the memory protection options defined in the Windows API. (See the VirtualProtect, VirtualProtectEx, VirtualQuery, and VirtualQueryEx functions.)

Table 7-3. Memory Protection Options Defined in the Windows API

PAGE_NOACCESS
  Any attempt to read from, write to, or execute code in this region causes an access violation.

PAGE_READONLY
  Any attempt to write to (and on processors with no execute support, execute code in) memory causes an access violation, but reads are permitted.

PAGE_READWRITE
  The page is readable and writable, but not executable.

PAGE_EXECUTE[*]
  Any attempt to write to memory in this region causes an access violation, but execution (and reads on all existing processors) is permitted.

PAGE_EXECUTE_READ[*]
  Any attempt to write to memory in this region causes an access violation, but executes and reads are permitted.

PAGE_EXECUTE_READWRITE[*]
  The page is readable, writable, and executable; no action will cause an access violation.

PAGE_WRITECOPY
  Any attempt to write to memory in this region causes the system to give the process a private copy of the page. On processors with no execute support, attempts to execute code in memory in this region cause an access violation.

PAGE_EXECUTE_WRITECOPY
  Any attempt to write to memory in this region causes the system to give the process a private copy of the page. Reading and executing code in this region is permitted. (No copy is made in this case.)

PAGE_GUARD
  Any attempt to read from or write to a guard page raises an EXCEPTION_GUARD_PAGE exception and turns off the guard page status. Guard pages thus act as a one-shot alarm. Note that this flag can be specified with any of the page protections listed in this table except PAGE_NOACCESS.

PAGE_NOCACHE
  Uses physical memory that is not cached. This is not recommended for general usage; it is useful for device drivers, for example, mapping a video frame buffer with no caching.

PAGE_WRITECOMBINE
  Enables write-combined memory accesses. When enabled, the processor might cache memory write requests to optimize performance. For example, if multiple writes are made to the same address, only the most recent write might occur.


[*] No execute protection is supported by Windows XP Service Pack 2 and Windows Server 2003 Service Pack 1 and later on processors that have the necessary hardware support (for example, the x64, IA-64, and future x86 processors). On earlier versions of Windows and on processors that do not support no execute protection, all page permissions allow execution. The next section contains a more complete description of no execute protection.
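
As a sketch of querying and changing these protections from user mode (the values noted in comments are what one would expect for this allocation):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        char *p = (char *)VirtualAlloc(NULL, 4096, MEM_RESERVE | MEM_COMMIT,
                                       PAGE_READONLY);

        MEMORY_BASIC_INFORMATION mbi;
        VirtualQuery(p, &mbi, sizeof(mbi));
        /* Expect MEM_COMMIT for the state and PAGE_READONLY for the protection. */
        printf("state=%#lx protect=%#lx size=%lu\n",
               mbi.State, mbi.Protect, (unsigned long)mbi.RegionSize);

        /* Make the page writable; the old protection is returned in oldProt. */
        DWORD oldProt;
        VirtualProtect(p, 4096, PAGE_READWRITE, &oldProt);
        p[0] = 1;    /* legal now; would have been an access violation before */

        VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }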

And finally, shared memory section objects have standard Windows access-control lists (ACLs) that are checked when processes attempt to open them, thus limiting access of shared memory to those processes with the proper rights. Security also comes into play when a thread creates a section to contain a mapped file. To create the section, the thread must have at least read access to the underlying file object or the operation will fail.

Once a thread has successfully opened a handle to a section, its actions are still subject to the memory manager and the hardware-based page protections described earlier. A thread can change the page-level protection on virtual pages in a section if the change doesn't violate the permissions in the ACL for that section object. For example, the memory manager allows a thread to change the pages of a read-only section to have copy-on-write access but not to have read/write access. The copy-on-write access is permitted because it has no effect on other processes sharing the data.

These four primary memory protection mechanisms are part of the reason that Windows is a robust, reliable operating system that is resilient in the face of application errors.

No Execute Page Protection

Although the Windows memory management API has always had page protection bits defined in the programming interface that allow the specification of whether or not pages can contain executable code, it is only as of Windows XP Service Pack 2 and Windows Server 2003 Service Pack 1 that this capability is supported on processors that have hardware "no execute" protection. These include all AMD64 processors (including AMD Athlon64 and AMD Opteron), certain exclusively 32-bit AMD processors (selected AMD Sempron processors; details are in AMD's product literature), Intel IA-64, and Intel Pentium 4 and Xeon processors with Intel Extended Memory 64 Technology (EM64T).

No execute page protection (also referred to as data execution prevention, or DEP) means an attempt to transfer control to an instruction in a page marked as "no execute" will generate an access fault. This can prevent certain types of viruses from exploiting bugs in the system that permit the execution of code placed in a data page. If an attempt is made in kernel mode to execute code in a page marked as no execute, the system will crash with the ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY bugcheck code. If this occurs in user mode, a STATUS_ACCESS_VIOLATION (0xc0000005) exception is delivered to the thread attempting the illegal reference. If a process allocates memory that needs to be executable, it must explicitly mark such pages by specifying the PAGE_EXECUTE, PAGE_EXECUTE_READ, PAGE_EXECUTE_READWRITE, or PAGE_EXECUTE_WRITECOPY flags on the page granularity memory allocation functions.
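
A sketch of the DEP-friendly pattern for run-time-generated code follows; codeBytes and codeSize stand in for whatever a code generator produced:

    #include <windows.h>
    #include <string.h>

    void *MakeExecutable(const void *codeBytes, SIZE_T codeSize)
    {
        /* Commit writable (not yet executable) pages and copy the code in. */
        void *mem = VirtualAlloc(NULL, codeSize, MEM_RESERVE | MEM_COMMIT,
                                 PAGE_READWRITE);
        if (mem == NULL)
            return NULL;
        memcpy(mem, codeBytes, codeSize);

        /* Flip the region to executable-and-readable before running it;
           leaving it PAGE_READWRITE and jumping there would fault under DEP. */
        DWORD oldProt;
        VirtualProtect(mem, codeSize, PAGE_EXECUTE_READ, &oldProt);
        FlushInstructionCache(GetCurrentProcess(), mem, codeSize);
        return mem;
    }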

On 64-bit versions of Windows, execution protection is always applied to all 64-bit programs and device drivers and cannot be disabled. Execution protection for 32-bit programs depends on system configuration settings, described shortly. On 64-bit Windows, execution protection is applied to thread stacks (both user and kernel mode), user mode pages not specifically marked as executable, kernel paged pool, and kernel session pool (for a description of kernel memory pools, see the section "System Memory Pools"). However, on 32-bit Windows, execution protection is only applied to thread stacks and user mode pages, not to paged pool and session pool. Also, when execution protection is enabled on 32-bit Windows, the system automatically boots in PAE mode (automatically selecting the PAE kernel, \Windows\System32\Ntkrnlpa.exe). For a description of PAE, see the section "Physical Address Extension (PAE)."

The application of execution protection for 32-bit programs depends on the Boot.ini /NOEXECUTE= switch. The settings can be changed by going to the Data Execution Prevention tab under My Computer, Properties, Advanced, Performance Settings. (See Figure 7-2.) When you configure no execute protection with the DEP settings dialog box, Boot.ini is modified to add the appropriate /NOEXECUTE Boot.ini switch. For a list of the variations of the switch and how they correspond to the DEP settings tab, see Table 7-4. 32-bit applications that are excluded from execution protection are listed as registry values under the key HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers with the value name being the full path of the executable and the data set to "DisableNXShowUI".

Figure 7-2. Data Execution Prevention settings


Table 7-4. Boot.ini /NOEXECUTE Switch Settings

/NOEXECUTE=OPTIN
  DEP settings option: Turn on DEP for necessary Windows programs and services only.
  Meaning: Enables DEP for core Windows system images.

/NOEXECUTE=OPTOUT
  DEP settings option: Turn on DEP for all programs and services except those that I select.
  Meaning: Enables DEP for all executables except those specified.

/NOEXECUTE=ALWAYSON
  DEP settings option: (No GUI interface to select this option)
  Meaning: Enables DEP for all components with no ability to exclude certain applications.

/NOEXECUTE=ALWAYSOFF
  DEP settings option: (No GUI interface to select this option)
  Meaning: Disables DEP (not recommended).


On Windows XP (both 64-bit and 32-bit versions), execution protection for 32-bit applications is configured by default to apply only to core Windows operating system executables (/NOEXECUTE=OPTIN) so as not to break 32-bit applications that might rely on being able to execute code in pages not specifically marked as executable. On Windows Server 2003 systems, execution protection for 32-bit applications is configured by default to apply to all 32-bit programs (/NOEXECUTE=OPTOUT).

Note

To obtain a complete list of which programs are protected, install the Windows Application Compatibility Toolkit (downloadable from http://www.microsoft.com) and run the Compatibility Administrator Tool. Click on System Database, Applications, and Windows Components; the list of protected executables is shown in the right-hand pane.


Software Data Execution Prevention

Because most processors running Windows these days do not support hardware "no execute" protection, Windows XP Service Pack 2 and Windows Server 2003 Service Pack 1 and later support limited software data execution prevention (DEP). One aspect of software DEP reduces exploits of the exception handling mechanism in Windows. (See Chapter 3 for a description of structured exception handling.) If the program's image files are built with safe structured exception handling (a new feature in the Microsoft Visual C++ 2003 compiler), before an exception is dispatched, the system verifies that the exception handler is registered in the function table located within the image file. If the program's image files are not built with safe structured exception handling, software DEP ensures that before an exception is dispatched, the exception handler is located within a memory region marked as executable.


Copy-on-Write

Copy-on-write page protection is an optimization the memory manager uses to conserve physical memory. When a process maps a copy-on-write view of a section object that contains read/write pages, instead of making a process-private copy at the time the view is mapped (as the Hewlett-Packard OpenVMS operating system does), the memory manager defers making a copy of the pages until the page is written to. All modern UNIX systems use this technique as well. For example, as shown in Figure 7-3, two processes are sharing three pages, each marked copy-on-write, but neither of the two processes has attempted to modify any data on the pages.

Figure 7-3. The "before" of copy-on-write


If a thread in either process writes to a page, a memory management fault is generated. The memory manager sees that the write is to a copy-on-write page, so instead of reporting the fault as an access violation, it allocates a new read/write page in physical memory, copies the contents of the original page to the new page, updates the corresponding page-mapping information (explained later in this chapter) in this process to point to the new location, and dismisses the exception, thus causing the instruction that generated the fault to be reexecuted. This time, the write operation succeeds, but as shown in Figure 7-4, the newly copied page is now private to the process that did the writing and isn't visible to the other processes still sharing the copy-on-write page. Each new process that writes to that same shared page will also get its own private copy.

Figure 7-4. The "after" of copy-on-write


One application of copy-on-write is to implement breakpoint support in debuggers. For example, by default, code pages start out as execute-only. If a programmer sets a breakpoint while debugging a program, however, the debugger must add a breakpoint instruction to the code. It does this by first changing the protection on the page to PAGE_EXECUTE_READWRITE and then changing the instruction stream. Because the code page is part of a mapped section, the memory manager creates a private copy for the process with the breakpoint set, while other processes continue using the unmodified code page.

Copy-on-write is one example of an evaluation technique known as lazy evaluation that the memory manager uses as often as possible. Lazy-evaluation algorithms avoid performing an expensive operation until it is absolutely required; if the operation is never required, no time is wasted on it.

The POSIX subsystem takes advantage of copy-on-write to implement the fork function. Typically, when a UNIX application calls the fork function to create another process, the first thing that the new process does is call the exec function to reinitialize the address space with an executable program. Instead of copying the entire address space on fork, the new process shares the pages in the parent process by marking them copy-on-write. If the child writes to the data, a process private copy is made. If not, the two processes continue sharing and no copying takes place. One way or the other, the memory manager copies only the pages the process tries to write to rather than the entire address space.
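
The mechanism is also visible directly from user mode when a file view is mapped with FILE_MAP_COPY, as in this sketch (the file name is illustrative, and error handling is omitted):

    #include <windows.h>

    int main(void)
    {
        HANDLE hFile = CreateFile(TEXT("somefile.dat"), GENERIC_READ,
                                  FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
        HANDLE hMap  = CreateFileMapping(hFile, NULL, PAGE_WRITECOPY, 0, 0, NULL);

        /* FILE_MAP_COPY yields a copy-on-write view: the first write to a page
           triggers the fault handling described above. */
        char *view = (char *)MapViewOfFile(hMap, FILE_MAP_COPY, 0, 0, 0);
        view[0] = 'X';    /* a private copy is made here; the file is unchanged */

        UnmapViewOfFile(view);
        CloseHandle(hMap);
        CloseHandle(hFile);
        return 0;
    }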

To examine the rate of copy-on-write faults, see the performance counter Memory: Write Copies/sec.

Heap Manager

Many applications allocate smaller blocks than the 64-KB minimum allocation granularity possible using page granularity functions such as VirtualAlloc. Allocating such a large area for relatively small allocations is not optimal from the memory usage and performance standpoint. To address this need, Windows provides a component called the heap manager, which manages allocations inside larger memory areas reserved using the page granularity memory allocation functions. The allocation granularity in the heap manager is relatively small: 8 bytes on 32-bit systems and 16 bytes on 64-bit systems. The heap manager has been designed to optimize memory usage and performance in the case of these smaller allocations.

The heap manager exists in two places: Ntdll.dll and Ntoskrnl.exe. The subsystem APIs (such as the Windows heap APIs) call the functions in Ntdll, and various executive components and device drivers call the functions in Ntoskrnl. Its native interfaces (prefixed with Rtl) are available only for use in internal Windows components or kernel-mode device drivers. The documented Windows API interfaces to the heap (prefixed with Heap) are thin functions that call the native functions in Ntdll.dll. In addition, legacy APIs (prefixed with either Local or Global) are provided to support older Windows applications. The most common Windows heap functions are listed here (a short usage sketch follows the list):

  • HeapCreate or HeapDestroy: Creates or deletes, respectively, a heap. The initial reserved and committed size can be specified at creation.

  • HeapAlloc: Allocates a heap block.

  • HeapFree: Frees a block previously allocated with HeapAlloc.

  • HeapReAlloc: Changes the size of an existing allocation (grows or shrinks an existing block).

  • HeapLock or HeapUnlock: Controls mutual exclusion to the heap operations.

  • HeapWalk: Enumerates the entries and regions in a heap.
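
Here is that usage sketch (sizes illustrative, error handling omitted):

    #include <windows.h>

    int main(void)
    {
        /* A growable private heap with 64 KB committed initially; a nonzero
           maximum size would instead create a fixed-size heap. */
        HANDLE heap = HeapCreate(0, 64 * 1024, 0);

        char *block = (char *)HeapAlloc(heap, HEAP_ZERO_MEMORY, 256);
        block = (char *)HeapReAlloc(heap, 0, block, 512);   /* grow the block */
        HeapFree(heap, 0, block);

        HeapDestroy(heap);   /* releases the heap's entire virtual region */
        return 0;
    }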

Types of Heaps

Each process has at least one heap: the default process heap. The default heap is created at process startup and is never deleted during the process's lifetime. It defaults to 1 MB in size, but it can be made bigger by specifying a starting size in the image file with the /HEAP linker flag. This size is just the initial reservation, however; it will expand automatically as needed. (You can also specify the initial committed size in the image file.)

The default heap can be explicitly used by a program or implicitly used by some Windows internal functions. An application can query the default process heap by making a call to the Windows function GetProcessHeap. Processes can also create additional private heaps with the HeapCreate function. When a process no longer needs a private heap, it can recover the virtual address space by calling HeapDestroy. An array with all heaps is maintained in each process, and a thread can query them with the Windows function GetProcessHeaps.
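
A short sketch of querying the heaps in a process (the array bound is an arbitrary choice for illustration):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE heaps[64];
        /* Returns the total number of heaps; if it exceeds the bound passed
           in, the array was too small and should be enlarged. */
        DWORD count = GetProcessHeaps(64, heaps);

        printf("%lu heaps in this process\n", count);
        printf("default heap: %p\n", GetProcessHeap());   /* is in the array */
        return 0;
    }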

A heap can manage allocations either in large memory regions reserved from the memory manager via VirtualAlloc or from memory mapped file objects mapped in the process address space. The latter approach is rarely used in practice, but it's suitable for scenarios where the content of the blocks needs to be shared between two processes or between a kernel-mode and a user-mode component. If a heap is built on top of a memory mapped file region, certain constraints apply to the components that can call heap functions. First, the internal heap structures use pointers and therefore cannot be relocated to different addresses. Second, synchronization across multiple processes or between a kernel component and a user process is not supported by the heap functions. Also, in the case of a heap shared between user and kernel mode, the user-mode mapping should be read-only to prevent user-mode code from corrupting the heap's internal structures, which would result in a system crash.

Heap Manager Structure

As shown in Figure 7-5, the heap manager is structured in two layers: an optional front-end layer and the core heap. The core heap handles the basic functionality and is mostly common across the user and kernel mode heap implementations. The core functionality includes the management of blocks inside segments, the management of the segments, policies for extending the heap, committing and decommitting memory, and management of the large blocks.

Figure 7-5. Heap manager layers


For user mode heaps only, an optional front-end heap layer can exist on top of the existing core functionality. There are two types of front-end layers: look-aside lists and the Low Fragmentation Heap (or LFH, which is available in Windows XP and later), both of which are described later in this section. Only one front-end layer can be used for one heap at one time.

Heap Synchronization

The heap manager supports concurrent access from multiple threads by default. However, if a process is single threaded or uses an external mechanism for synchronization, it can tell the heap manager to avoid the overhead of synchronization by specifying HEAP_NO_SERIALIZE either at heap creation or on a per-allocation basis.

For operations that require a consistent state across multiple heap calls, a process can also lock the entire heap, preventing other threads from performing heap operations. For instance, enumerating the heap blocks in a heap with the Windows function HeapWalk requires locking the heap if multiple threads can perform heap operations simultaneously.
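
A sketch of such a consistent walk:

    #include <windows.h>
    #include <stdio.h>

    void DumpHeap(HANDLE heap)
    {
        /* Hold the heap lock so the walk sees a consistent state even if
           other threads are allocating and freeing concurrently. */
        HeapLock(heap);

        PROCESS_HEAP_ENTRY entry;
        entry.lpData = NULL;    /* NULL starts the enumeration */
        while (HeapWalk(heap, &entry)) {
            if (entry.wFlags & PROCESS_HEAP_ENTRY_BUSY)
                printf("allocated block %p, %lu bytes\n",
                       entry.lpData, (unsigned long)entry.cbData);
        }

        HeapUnlock(heap);
    }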

If heap synchronization is enabled, there is one lock per heap that protects all internal heap structures. In heavily multithreaded applications (especially when running on multiprocessor systems), the heap lock might become a significant contention point. In that case, performance might be improved by enabling the front-end heap, described in an upcoming section.

Look-Aside Lists

Look-aside lists are singly linked lists that allow elementary operations such as "push to the list" or "pop from the list" in a last in, first out (LIFO) order with nonblocking algorithms. A simplified version of these data structures is also available to Windows applications through the functions InterlockedPushEntrySList and InterlockedPopEntrySList. There are 128 look-aside lists per heap, which handle allocations up to 1 KB on 32-bit platforms and up to 2 KB on 64-bit platforms.
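
A sketch of these application-visible primitives (the ITEM structure is illustrative):

    #include <windows.h>
    #include <malloc.h>

    typedef struct _ITEM {
        SLIST_ENTRY link;   /* first member, so a popped entry pointer can be
                               cast straight back to an ITEM pointer */
        int         value;
    } ITEM;

    int main(void)
    {
        SLIST_HEADER head;
        InitializeSListHead(&head);

        /* Entries must be aligned to MEMORY_ALLOCATION_ALIGNMENT. */
        ITEM *item = (ITEM *)_aligned_malloc(sizeof(ITEM),
                                             MEMORY_ALLOCATION_ALIGNMENT);
        item->value = 42;
        InterlockedPushEntrySList(&head, &item->link);           /* lock-free push */

        PSLIST_ENTRY popped = InterlockedPopEntrySList(&head);   /* LIFO pop */
        _aligned_free((ITEM *)popped);
        return 0;
    }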

Look-aside lists provide a significant performance improvement over normal heap allocations because multiple threads can concurrently perform allocation and deallocation operations without acquiring the heap global lock. Also, cache locality is optimized by using a LIFO ordering model and by accessing fewer internal data structures in each heap operation.

The heap manager maintains the number of blocks in each look-aside list and some counters that help tune the usage of each list independently. If a thread allocates a block of a size that does not exist in the corresponding look-aside list, the heap manager forwards the call to the core heap manager to complete the operation. In this case, the heap manager also updates an internal counter of allocation misses, which is later used in tuning decisions.

The heap manager creates look-aside lists automatically when a heap is created, as long as no debugging options are enabled and the heap is expandable. Some applications might have compatibility issues as a result of the heap manager's use of look-aside lists. In this case, these legacy applications can be made to run properly by specifying the DisableHeapLookaside flag in the image file execution options for that application. (Image file execution options can be specified using the Imagecfg.exe tool in the Windows 2000 Server Resource Kit, supplement 1.)

The Low Fragmentation Heap

Many applications running in Windows have relatively small heap memory usage (usually less than one megabyte). For this class of applications, the heap manager's best-fit policy helps keep a low memory footprint for each process. However, this strategy does not scale for large processes and multiprocessor machines. In these cases, memory available for heap usage might be reduced as a result of heap fragmentation. Performance can suffer in scenarios where only certain sizes are often used concurrently from different threads scheduled to run on different processors. This is because several processors need to modify the same memory location (for example, the head of the look-aside list for that particular size) at the same time, thus invalidating the corresponding cache line for the other processors.

The Low Fragmentation Heap (LFH) addresses these issues using the core heap manager and look-aside lists. Unlike look-aside lists, which are used as the front-end heap by default when other heap settings allow it, the LFH is turned on only if an application calls the HeapSetInformation function. For large heaps, a significant percentage of allocations is generally grouped in a relatively small number of buckets of certain sizes. The allocation strategy used by the LFH is to optimize the usage for these patterns by efficiently handling same-size blocks.
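
Turning on the LFH is a single call, sketched here with the documented HeapCompatibilityInformation class (the value 2 selects the LFH front end):

    #include <windows.h>

    BOOL EnableLfh(HANDLE heap)
    {
        ULONG lfh = 2;    /* 2 = Low Fragmentation Heap */
        return HeapSetInformation(heap, HeapCompatibilityInformation,
                                  &lfh, sizeof(lfh));
    }

    /* Typical usage: EnableLfh(GetProcessHeap()); */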

To address scalability, the LFH expands the frequently accessed internal structures to a number of slots that is two times larger than the current number of processors on the machine. The assignment of threads to these slots is done by an LFH component called the affinity manager. Initially, the LFH uses the first slot for heap allocations; however, if contention is detected when accessing some internal data, the LFH switches the current thread to a different slot. Further contention spreads threads across more slots. These slots are controlled for each size bucket to improve locality and minimize the overall memory consumption.

Heap Debugging Features

The heap manager includes several debugging features that help detect bugs:

  • Enable tail checking: The end of each block carries a signature that is checked when the block is released. If a buffer overrun destroyed the signature entirely or partially, the heap will report this error.

  • Enable free checking: A free block is filled with a pattern that is checked at various points when the heap manager needs to access the block (such as at removal from the free list to allocate the block). If the process continued to write to the block after freeing it, the heap manager will detect changes in the pattern and report the error.

  • Parameter checking: Extensive checking of the parameters passed to the heap functions.

  • Heap validation: The entire heap is validated at each heap call.

  • Heap tagging and stack traces support: Supports specifying tags for allocations and/or capturing user-mode stack traces for heap calls to help narrow down the possible causes of a heap error.

The first three options are enabled by default if the loader detects that a process is started under the control of a debugger. (A debugger can override this behavior and turn off these features.) The heap debugging features can be specified for an executable image by setting various debugging flags in the image header using the gflags tool. (See the section "Windows Global Flags" in Chapter 3.) Or, heap debugging options can be enabled using the !heap command in the standard Windows debuggers. (See the debugger help for more information.)

Enabling heap debugger options affects all heaps in the process. Also, if any of the heap debug options are enabled, the front-end heap will be disabled automatically and the core heap will be used (with the required debugging options enabled). The front-end heaps are also not used for heaps that are not expandable (because of the extra overhead added to the existing heap structures) or for heaps that do not allow serialization.

Pageheap

Because the tail and free checking options described in the preceding section might detect corruption only well after the problem occurred, an additional heap debugging tool, called pageheap, is provided; it directs all or part of the heap calls to a different heap manager. Pageheap is part of the Windows Application Compatibility Toolkit, which can be downloaded from http://www.microsoft.com. Pageheap places allocations at the end of pages so that a buffer overrun causes an immediate access violation, making it easier to detect the offending code. Optionally, pageheap allows placing the blocks at the beginning of the pages to detect buffer underrun problems. (This is a rare occurrence.) Pageheap also can protect freed pages against any access to detect references to heap blocks after they have been freed.

Note that using pageheap can result in running out of address space because of the significant overhead added for small allocations. Also, performance can suffer as a result of the increase in references to demand-zero pages, loss of locality, and additional overhead caused by frequent calls to validate heap structures. A process can reduce the impact by specifying that pageheap be used only for blocks of certain sizes, address ranges, and/or originating DLLs.

Note

For more information on pageheap, see article 286470 in the Microsoft Knowledge Base (http://support.microsoft.com).


Address Windowing Extensions

Although the 32-bit version of Windows can support up to 128 GB of physical memory (as shown in Table 2-4), each 32-bit user process has by default only a 2-GB virtual address space. (This can be configured up to 3 GB when using the /3GB and /USERVA Boot.ini switches, described in the upcoming section "x86 User Address Space Layouts.") To allow a 32-bit process to allocate and access more physical memory than can be represented in its limited address space, Windows provides a set of functions called Address Windowing Extensions (AWE). For example, on a Windows 2000 Advanced Server system with 8 GB of physical memory, a database server application could use AWE to allocate and use perhaps 6 GB of memory as a database cache.

Allocating and using memory via the AWE functions is done in three steps:

  1. Allocating the physical memory to be used

  2. Creating a region of virtual address space to act as a window to map views of the physical memory

  3. Mapping views of the physical memory into the window

To allocate physical memory, an application calls the Windows function AllocateUserPhysicalPages. (This function requires the Lock Pages In Memory user right.) The application then uses the Windows VirtualAlloc function with the MEM_PHYSICAL flag to create a window in the private portion of the process's address space that is mapped to some or all of the physical memory previously allocated. The AWE-allocated memory can then be used with nearly all the Windows APIs. (For example, the Microsoft DirectX functions can't use AWE memory.)

If an application creates a 256-MB window in its address space and allocates 4 GB of physical memory (on a system with more than 4 GB of physical memory), the application can use the MapUserPhysicalPages or MapUserPhysicalPagesScatter Windows functions to access any portion of the physical memory by mapping the memory into the 256-MB window. The size of the application's virtual address space window determines the amount of physical memory that the application can access with a given mapping. Figure 7-6 shows an AWE window in a server application address space mapped to a portion of physical memory previously allocated by AllocateUserPhysicalPages.

Figure 7-6. Using AWE to map physical memory


The AWE functions exist on all editions of Windows and are usable regardless of how much physical memory a system has. However, AWE is most useful on systems with more than 2 GB of physical memory because it's the only way for a 32-bit process to directly use more than 2 GB of memory. Another use is for security purposes: because AWE memory is never paged out, the data in AWE memory could never have a copy in the paging file that someone could examine by rebooting into an alternate operating system.
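
A sketch of the three steps (error handling elided; the page count is illustrative, and the calling account needs the Lock Pages In Memory right):

    #include <windows.h>

    int main(void)
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);

        /* Step 1: allocate 1024 physical pages. */
        ULONG_PTR pageCount = 1024;
        ULONG_PTR *pfns = (ULONG_PTR *)HeapAlloc(GetProcessHeap(), 0,
                                                 pageCount * sizeof(ULONG_PTR));
        AllocateUserPhysicalPages(GetCurrentProcess(), &pageCount, pfns);

        /* Step 2: reserve a window in the address space for the views. */
        SIZE_T windowSize = pageCount * si.dwPageSize;
        void *window = VirtualAlloc(NULL, windowSize,
                                    MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);

        /* Step 3: map the physical pages into the window; calling
           MapUserPhysicalPages again remaps the window to other pages. */
        MapUserPhysicalPages(window, pageCount, pfns);

        /* Unmap (NULL page array) and free the physical pages. */
        MapUserPhysicalPages(window, pageCount, NULL);
        FreeUserPhysicalPages(GetCurrentProcess(), &pageCount, pfns);
        return 0;
    }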

Finally, there are some restrictions on memory allocated and mapped by the AWE functions:

  • Pages can't be shared between processes.

  • The same physical page can't be mapped to more than one virtual address in the same process.

  • On older versions of Windows, page protection is limited to read/write. In Windows Server 2003 Service Pack 1 and later, no access and read-only are supported.

For a description of the page table data structures used to map memory on systems with more than 4 GB of physical memory, see the section "Physical Address Extension (PAE)."
