Memory Management
In this section, I'll discuss the topic of memory management. Windows XP divides the available virtual address space in several ways. One division, a very firm one based on security and integrity concerns, is between user-mode addresses and kernel-mode addresses. Another division, which is almost but not quite coextensive with the first, is between paged and nonpaged memory. All user-mode addresses and some kernel-mode addresses reference page frames that the Memory Manager swaps to and from the disk over time, while some kernel-mode addresses always reference page frames in physical memory. Since Windows XP allows portions of drivers to be paged, I'll explain how you control the pageability of your driver at the time you build your driver and at run time.
Windows XP provides several methods for managing memory. I'll describe two basic service functions, ExAllocatePoolWithTag and ExFreePool, that you use for allocating and releasing randomly sized blocks from a heap. I'll also describe the primitives that you use for organizing memory blocks into linked lists of structures. Finally, I'll describe the concept of a lookaside list, which allows you to efficiently allocate and release blocks that are all the same size.
User-Mode and Kernel-Mode Address Spaces
Windows XP and Microsoft Windows 98/Me run on computers that support a virtual address space, wherein virtual addresses are mapped either to physical memory or (conceptually, anyway) to page frames within a swap file on disk. To grossly simplify matters, you can think of the virtual address space as being divided into two parts: a kernel-mode part and a user-mode part. See Figure 3-6.
Figure 3-6. User-mode and kernel-mode portions of the address space.
Each user-mode process has its own address context, which maps the user-mode virtual addresses to a unique collection of physical page frames. In other words, the meaning of any particular virtual address changes from one moment to the next as the Windows XP scheduler switches from a thread in one process to a thread in another process. Part of the work in switching threads is to change the page tables used by a processor so that they refer to the incoming thread's process context.
It's generally unlikely that a WDM driver will execute in the same thread context as the initiator of the I/O requests it handles. We say that we're running in arbitrary thread context if we don't know for sure to which process the current user-mode address context belongs. In arbitrary thread context, we simply can't use a virtual address that belongs to user mode because we can't have any idea to what physical memory it might point. In view of this uncertainty, we generally obey the following rule inside a driver program:
Never (well, hardly ever) directly reference user-mode memory.
In other words, don't take an address that a user-mode application provides and treat that address as a pointer that we can directly dereference. I'll discuss in later chapters a few techniques for accessing data buffers that originate in user mode. All we need to know right now, though, is that we're (nearly) always going to be using kernel-mode virtual addresses whenever we want to access the computer's memory.
How Big Is a Page?
In a virtual memory system, the operating system organizes physical memory and the swap file into like-size page frames. In a WDM driver, you can use the manifest constant PAGE_SIZE to tell you how big a page is. In some Windows XP computers, a page is 4096 bytes long; in others, it's 8192 bytes long. A related constant named PAGE_SHIFT equals the page size as a power of 2. That is:
PAGE_SIZE == 1 << PAGE_SHIFT
For your convenience, you can use a few preprocessor macros in your code when you're working with the size of a page:
ROUND_TO_PAGES rounds a size in bytes to the next-higher page boundary. For example, ROUND_TO_PAGES(1) is 4096 on a 4-KB-page computer.
BYTES_TO_PAGES determines how many pages are required to hold a given number of bytes beginning at the start of a page. For example, BYTES_TO_PAGES(42) would be 1 on all platforms, and BYTES_TO_PAGES(5000) would be 2 on some platforms and 1 on others.
BYTE_OFFSET returns the byte offset portion of a virtual address. That is, it calculates the starting offset within some page frame of a given address. On a 4-KB-page computer, BYTE_OFFSET(0x12345678) would be 0x678.
PAGE_ALIGN rounds a virtual address down to a page boundary. On a 4-KB-page computer, PAGE_ALIGN(0x12345678) would be 0x12345000.
ADDRESS_AND_SIZE_TO_SPAN_PAGES returns the number of page frames occupied by a specified number of bytes beginning at a specified virtual address. For example, the statement ADDRESS_AND_SIZE_TO_SPAN_PAGES(0x12345FFF, 2) is 2 on a 4-KB-page machine because the 2 bytes span a page boundary.
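To make the arithmetic concrete, here is a user-mode sketch of these macros, assuming a 4-KB page. The real definitions come from the DDK headers; these reimplementations exist only so you can check the examples above for yourself:

```c
#include <stdint.h>

/* Illustrative reimplementation of the DDK page macros for a 4-KB page.
   Real drivers get these from wdm.h/ntddk.h - do not redefine them there. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Round a byte count up to the next page boundary. */
#define ROUND_TO_PAGES(n) \
    (((uintptr_t)(n) + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE - 1))

/* Pages needed to hold n bytes starting at a page boundary. */
#define BYTES_TO_PAGES(n) \
    (((n) >> PAGE_SHIFT) + (((n) & (PAGE_SIZE - 1)) ? 1u : 0u))

/* Offset of an address within its page. */
#define BYTE_OFFSET(va)   ((uintptr_t)(va) & (PAGE_SIZE - 1))

/* Round an address down to its page boundary. */
#define PAGE_ALIGN(va)    ((uintptr_t)(va) & ~(uintptr_t)(PAGE_SIZE - 1))

/* Pages spanned by n bytes beginning at address va. */
#define ADDRESS_AND_SIZE_TO_SPAN_PAGES(va, n) \
    ((BYTE_OFFSET(va) + (uintptr_t)(n) + PAGE_SIZE - 1) >> PAGE_SHIFT)
```

With these definitions, the examples in the list work out exactly as stated: ROUND_TO_PAGES(1) is 4096, BYTE_OFFSET(0x12345678) is 0x678, and the 2 bytes at 0x12345FFF span 2 pages.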
Paged and Nonpaged Memory
The whole point of a virtual memory system is that you can have a virtual address space that's much bigger than the amount of physical memory on the computer. To accomplish this feat, the Memory Manager needs to swap page frames in and out of physical memory. Certain parts of the operating system can't be paged, though, because they're needed to support the Memory Manager itself. The most obvious example of something that must always be resident in memory is the code that handles page faults (the exceptions that occur when a page frame isn't physically present when needed) and the data structures used by the page fault handler.
The category of "must be resident" stuff is much broader than just the page fault handlers. Windows XP allows hardware interrupts to occur at nearly any time, including while a page fault is being serviced. If this weren't so, the page fault handler wouldn't be able to read or write pages from a device that uses an interrupt. Thus, every hardware interrupt service routine must be in nonpaged memory. The designers of Windows NT decided to include even more routines in the nonpaged category by using a simple rule:
Code executing at or above interrupt request level (IRQL) DISPATCH_LEVEL cannot cause page faults.
I ll elaborate on this rule in the next chapter.
You can use the PAGED_CODE preprocessor macro (declared in wdm.h) to help you discover violations of this rule in the checked build of your driver. For example:
NTSTATUS DispatchPower(PDEVICE_OBJECT fdo, PIRP Irp)
{
    PAGED_CODE();
}
PAGED_CODE contains conditional compilation. In the checked-build environment, it prints a message and generates an assertion failure if the current IRQL is too high. In the free-build environment, it doesn't do anything. To understand why PAGED_CODE is useful, imagine that DispatchPower needs for some reason to be in nonpaged memory but that you have misplaced it in paged memory. If the system happens to call DispatchPower at a time when the page containing it isn't present, a page fault will occur, followed by a bug check. The bug check code will be pretty uninformative (IRQL_NOT_LESS_OR_EQUAL or DRIVER_IRQL_NOT_LESS_OR_EQUAL), but at least you'll find out that you have a problem. If you test your driver in a situation in which the page containing DispatchPower happens fortuitously to be in memory, though, there won't be a page fault. PAGED_CODE will detect the problem even so.
Setting the Driver Verifier Force IRQL Checking option will greatly increase the chances of discovering that you've broken the rule about paging and IRQL. The option forces pageable pages out of memory whenever verified drivers raise the IRQL to DISPATCH_LEVEL or beyond.
Compile-Time Control of Pageability
Given that some parts of your driver must always be resident and some parts can be paged, you need a way to control the assignment of your code and data to the paged and nonpaged pools. You accomplish part of this job by instructing the compiler how to apportion your code and data among various sections. The run-time loader uses the names of the sections to put parts of your driver in the places you intend. You can also accomplish parts of this job at run time by calling various Memory Manager routines that I'll discuss in the next section.
NOTE
Win32 executable files, including kernel-mode drivers, are internally composed of one or more sections. A section can contain code or data and, generally speaking, has additional attributes such as being readable, writable, shareable, executable, and so on. A section is also the smallest unit that you can designate when you're specifying pageability. When loading a driver image, the system puts sections whose literal names begin with PAGE or .EDA (the start of .EDATA) into the paged pool unless the DisablePagingExecutive value in the HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management key happens to be set (in which case no driver paging occurs). Note that these names are case sensitive! In one of the little twists of fate that affect us all from time to time, running Soft-Ice/W on Windows XP requires you to disable kernel paging in this way. This certainly makes it harder to find bugs caused by misplacement of driver code or data into the paged pool! If you use this debugger, I recommend that you religiously use the PAGED_CODE macro and the Driver Verifier.
The traditional way of telling the compiler to put code into a particular section is to use the alloc_text pragma. Since not every compiler will necessarily support the pragma, the DDK headers either define or don't define the constant ALLOC_PRAGMA to tell you whether to use the pragma. You can then invoke the pragma to specify the section placement of individual subroutines in your driver, as follows:
#ifdef ALLOC_PRAGMA
#pragma alloc_text(PAGE, AddDevice)
#pragma alloc_text(PAGE, DispatchPnp)
#endif
These statements serve to place the AddDevice and DispatchPnp functions into the paged pool.
The Microsoft C/C++ compiler places two annoying restrictions on using alloc_text:
The pragma must follow the declaration of a function but precede the definition. One way to obey this rule is to declare all the functions in your driver in a standard header file and invoke alloc_text at the start of the source file that contains a given function but after you include that header.
The pragma can be used only with functions that have C-linkage. In other words, it won't work for class member functions or for functions in a C++ source file that you didn't declare using extern "C".
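The ordering restriction can be sketched in a few lines. The dispatch routines here are hypothetical stand-ins, and plain int replaces NTSTATUS so the sketch stays self-contained; on compilers that don't define ALLOC_PRAGMA, the pragma block simply drops out:

```c
/* Normally these declarations would live in your driver's header file. */
int DispatchRead(void);
int DispatchWrite(void);

/* The pragma comes AFTER the declarations but BEFORE the definitions. */
#ifdef ALLOC_PRAGMA
#pragma alloc_text(PAGE, DispatchRead)
#pragma alloc_text(PAGE, DispatchWrite)
#endif

int DispatchRead(void)  { return 0; }   /* definitions follow the pragma */
int DispatchWrite(void) { return 0; }
```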
To control the placement of data variables, you use a different pragma under the control of a different preprocessor macro symbol:
#ifdef ALLOC_DATA_PRAGMA
#pragma data_seg("PAGEDATA")
#endif
The data_seg pragma causes all static data variables declared in a source module after the appearance of the pragma to go into the paged pool. You'll notice that this pragma differs in a fundamental way from alloc_text. A pageable section starts where #pragma data_seg("PAGEDATA") appears and ends where a countervailing #pragma data_seg() appears; alloc_text, on the other hand, applies to a specific function.
More About Section Placement
In general, I find it more convenient to specify the section placement of whole blocks of code by using the Microsoft code_seg pragma, which works the same way as data_seg, only for code. That is, you can tell the Microsoft compiler to start putting functions into the paged pool like this:

#pragma code_seg("PAGE")

NTSTATUS AddDevice(...)
{
...
}

NTSTATUS DispatchPnp(...)
{
...
}
The AddDevice and DispatchPnp functions would both end up in the paged pool. You can check to see whether you're compiling with the Microsoft compiler by testing the existence of the predefined preprocessor macro _MSC_VER.
To revert to the default code section, just code #pragma code_seg with no argument:
#pragma code_seg()
Similarly, to revert to the regular nonpaged data section, code #pragma data_seg with no argument:
#pragma data_seg()
This sidebar is also the logical place to mention that you can also direct code into the INIT section if it's not needed once your driver finishes initializing. For example:
#pragma alloc_text(INIT, DriverEntry)
This statement forces the DriverEntry function into the INIT section. The system will release the memory the section occupies when DriverEntry returns. This small savings isn't very important in the grand scheme of things because a WDM driver's DriverEntry function isn't very big. Earlier Windows NT drivers had large DriverEntry functions that had to create device objects, locate resources, configure devices, and so on. For them, using this feature offered significant memory savings.
Notwithstanding the low utility of putting DriverEntry in the INIT section in a WDM driver, I was in the habit of doing so until quite recently. Because of a bug in Windows 98/Me, I had a situation in which a WDM driver wasn't being completely removed from memory after I unplugged my hardware. One part of the system didn't understand this and tried to call DriverEntry when I replugged the hardware. The memory that had originally contained DriverEntry had long since been overwritten by INIT code belonging to other drivers, and a crash resulted. This was very difficult to debug! I now prefer to place DriverEntry in a paged section.
You can use the DUMPBIN utility that comes with Microsoft Visual C++ .NET to easily see how much of your driver is initially pageable. Your marketing department might even want to crow about how much less nonpaged memory you use than your competitors.
Run-Time Control of Pageability
Table 3-3 lists the service functions you can use at run time to fine-tune the pageability of your driver in various situations. The purpose of these routines is to let you release the physical memory that would otherwise be tied up by your code and data during periods when it won't be needed. In Chapter 8, for example, I'll discuss how you can put your device into a low power state during periods of inactivity. Powering down might be a good time to release your locked pages.
Service Function | Description |
MmLockPagableCodeSection | Locks a code section given an address inside it |
MmLockPagableDataSection | Locks a data section given an address inside it |
MmLockPagableSectionByHandle | Locks a code section by using a handle from a previous MmLockPagableCodeSection call (Windows 2000 and Windows XP only) |
MmPageEntireDriver | Unlocks all pages belonging to a driver |
MmResetDriverPaging | Restores compile-time pageability attributes for entire driver |
MmUnlockPagableImageSection | Unlocks a locked code or data section |
I'm going to describe one way to use these functions to control the pageability of code in your driver. You might want to read the DDK descriptions to learn about other ways to use them. First, distribute subroutines in your driver into separately named code sections, like this:
#pragma alloc_text(PAGEIDLE, DispatchRead)
#pragma alloc_text(PAGEIDLE, DispatchWrite)
That is, define a section name beginning with PAGE and ending in any four-character suffix you please. Then use the alloc_text pragma to place some group of your own routines in that special section. You can have as many special pageable sections as you want, but your logistical problems will grow as you subdivide your driver in this way.
During initialization (say, in DriverEntry), lock your pageable sections like this:
PVOID hPageIdleSection;

NTSTATUS DriverEntry(...)
{
    hPageIdleSection = MmLockPagableCodeSection((PVOID) DispatchRead);
}
When you call MmLockPagableCodeSection, you specify any address at all within the section you're trying to lock. The real purpose of making this call during DriverEntry is to obtain the handle value it returns, which I've shown you saving in a global variable named hPageIdleSection. You'll use that handle much later on, when you decide you don't need a particular section in memory for a while:
MmUnlockPagableImageSection(hPageIdleSection);
This call will unlock the pages containing the PAGEIDLE section and allow them to move in and out of memory on demand. If you later discover that you need those pages back again, you make this call:
MmLockPagableSectionByHandle(hPageIdleSection);
Following this call, the PAGEIDLE section will once again be in nonpaged memory (but not necessarily the same physical memory as previously). Note that this function call is available to you only in Windows 2000 and Windows XP, and then only if you've included ntddk.h instead of wdm.h. In other situations, you will have to call MmLockPagableCodeSection again.
You can do something similar to place data objects into pageable sections:
PVOID hPageDataSection;

#pragma data_seg("PAGE")
ULONG ulSomething;
#pragma data_seg()

hPageDataSection = MmLockPagableDataSection((PVOID) &ulSomething);
MmUnlockPagableImageSection(hPageDataSection);
MmLockPagableSectionByHandle(hPageDataSection);
I've played fast and loose with my syntax here; these statements would appear in widely separated parts of your driver.
The key idea behind the Memory Manager service functions I just described is that you initially lock a section containing one or more pages and obtain a handle for use in subsequent calls. You can then unlock the pages in a particular section by calling MmUnlockPagableImageSection and passing the corresponding handle. Relocking the section later on requires a call to MmLockPagableSectionByHandle.
A quick shortcut is available if you're sure that no part of your driver will need to be resident for a while. MmPageEntireDriver will mark all the sections in a driver's image as being pageable. Conversely, MmResetDriverPaging will restore the compile-time pageability attributes for the entire driver. To call these routines, you just need the address of some piece of code or data in the driver. For example:
MmPageEntireDriver((PVOID) DriverEntry);
MmResetDriverPaging((PVOID) DriverEntry);
You need to exercise care when using any of the Memory Manager routines I ve just described if your device uses an interrupt. If you page your entire driver, the system will also page your interrupt service routine (ISR). If your device or any device with which you share an interrupt vector should interrupt, the system will try to call your ISR. Even if you think your interrupt isn t shared and you ve inhibited your device from generating an interrupt, bear in mind that spurious interrupts have been known to occur. If the ISR isn t present, the system will crash. You avoid this problem by disconnecting your interrupt before allowing the ISR to be paged.
Heap Allocator
The basic heap allocation service function in kernel mode is ExAllocatePoolWithTag. You call it like this:
PVOID p = ExAllocatePoolWithTag(type, nbytes, tag);
The type argument is one of the POOL_TYPE enumeration constants described in Table 3-4, and nbytes is the number of bytes you want to allocate. The tag argument is an arbitrary 32-bit value. The return value is a kernel-mode virtual address pointer to the allocated memory block.
In most drivers, including the samples in this book and in the DDK, you'll see calls to an older function named ExAllocatePool:
PVOID p = ExAllocatePool(type, nbytes);
ExAllocatePool was the heap allocation function in the earliest versions of Windows NT. In the Windows XP DDK, ExAllocatePool is actually a macro that invokes ExAllocatePoolWithTag using the tag value ' mdW' ("Wdm" plus a trailing space, after byte reversal).
Pool Type | Description |
NonPagedPool | Allocates from the nonpaged pool of memory |
PagedPool | Allocates from the paged pool of memory |
NonPagedPoolCacheAligned | Allocates from the nonpaged pool and ensures that memory is aligned with the CPU cache |
PagedPoolCacheAligned | Allocates from the paged pool of memory and ensures that memory is aligned with the CPU cache |
The most basic decision you must make when you call ExAllocatePoolWithTag is whether the allocated memory block can be swapped out of memory. That choice depends simply on which parts of your driver will need to access the memory block. If you'll be using a memory block at or above DISPATCH_LEVEL, you must allocate it from the nonpaged pool. If you'll always use the memory block below DISPATCH_LEVEL, you can allocate from the paged or nonpaged pool as you choose.
Allocations from the PagedPool must occur at an IRQL less than DISPATCH_LEVEL. Allocations from the NonPagedPool must occur at an IRQL less than or equal to DISPATCH_LEVEL. The Driver Verifier forces a bug check if you violate either of these rules.
Limits on Pool Allocations
A frequently asked question is, "How much memory can I allocate with one call to ExAllocatePoolWithTag?" Unfortunately, there's no simple answer to the question. A starting point is to determine the maximum sizes of the paged and nonpaged pools. You can consult Knowledge Base article Q126402 and Chapter 7 of Inside Windows 2000 (Microsoft Press, 2000) for (probably) more information than you'll ever want to know about this topic. By way of example, on a 512-MB machine, I ended up with a maximum nonpaged pool size of 128 MB and an actual paged pool size of 168 MB. Knowing the pool sizes is not the end of the story, though. I wouldn't expect to be able to allocate anywhere close to 128 MB of nonpaged memory on this 512-MB computer in one call to ExAllocatePoolWithTag. For one thing, other parts of the system will have used up significant amounts of nonpaged memory by the time my driver gets a chance to try, and the system would probably run very poorly if I took all that was left over. For another thing, the virtual address space is likely to be fragmented once the system has been running for a while, so the heap manager wouldn't be able to find an extremely large contiguous range of unused virtual addresses.
In actual, not-very-scientific tests, using the MEMTEST sample from the companion content, I was able to allocate about 129 MB of paged memory and 100 MB of nonpaged memory in a single call.
Sample Code
The MEMTEST sample uses ExAllocatePoolWithTagPriority to determine the largest contiguous allocations possible from the paged and nonpaged pools.
When you use ExAllocatePoolWithTag, the system allocates 4 more bytes of memory than you asked for and returns you a pointer that's 4 bytes into that block. The tag occupies the initial 4 bytes and therefore precedes the pointer you receive. The tag will be visible to you when you examine memory blocks while debugging or while poring over a crash dump, and it can help you identify the source of a memory block that's involved in some problem or another. For example:
#define DRIVERTAG 'KNUJ'

PVOID p = ExAllocatePoolWithTag(PagedPool, 42, DRIVERTAG);
Here I used a 32-bit integer constant as the tag value. On a little-endian computer such as an x86, the bytes that compose this value will be reversed in memory to spell out a common word in the English language. Several features of the Driver Verifier relate to specific memory tags, by the way, so you can do yourself a favor in the debugging department by using one or more unique tags in your allocation calls.
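You can see the byte reversal for yourself with a few lines of user-mode C. The helpers below are purely illustrative and assume a little-endian machine such as an x86; the value MakeTag builds is what a multi-character constant such as 'KNUJ' conventionally evaluates to:

```c
#include <stdint.h>
#include <string.h>

/* Build the 32-bit value of a multi-character constant such as 'KNUJ':
   the first character lands in the high-order byte. */
static uint32_t MakeTag(char a, char b, char c, char d)
{
    return ((uint32_t)(unsigned char)a << 24) |
           ((uint32_t)(unsigned char)b << 16) |
           ((uint32_t)(unsigned char)c << 8)  |
            (uint32_t)(unsigned char)d;
}

/* Copy the tag into memory the way it will appear in a memory dump.
   On a little-endian machine, the low-order byte is stored first. */
static void TagToBytes(uint32_t tag, char out[5])
{
    memcpy(out, &tag, 4);
    out[4] = '\0';
}
```

On an x86, TagToBytes(MakeTag('K', 'N', 'U', 'J'), buf) leaves "JUNK" in buf, which is exactly what you'll see in the debugger.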
Do not specify zero or ' GIB' ("BIG" with a space at the end, after byte reversal) as a tag value. Zero-tagged blocks can't be tracked, and the system internally uses the ' GIB' tag for its own purposes. Do not request zero bytes. This restriction could be a special concern if you're writing your own C or C++ runtime support, since malloc and operator new allow requests for zero bytes.
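If you do supply your own runtime support, guarding against zero-byte requests is a one-line fix. This sketch uses malloc as a stand-in for the pool allocator, and SafeAlloc is a hypothetical wrapper name:

```c
#include <stdlib.h>

/* Hypothetical allocation wrapper: refuse zero-byte requests up front
   rather than forwarding them to the pool allocator. */
void *SafeAlloc(size_t nbytes)
{
    if (nbytes == 0)
        return NULL;        /* caller treats this as an ordinary failure */
    return malloc(nbytes);  /* stand-in for ExAllocatePoolWithTag */
}
```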
More About Pool Tags
Several diagnostic mechanisms inside the kernel depend on pool tagging, and you can help yourself analyze your driver's performance by picking a unique set of tags. You must also explicitly enable kernel pool tagging in the retail release of the system (it's enabled by default in the checked build) by using the GFLAGS.EXE utility. GFLAGS is part of the platform SDK and other components. Having done both of these things (using unique tags in your driver and enabling pool tagging in the kernel), you can profitably use a few tools. POOLMON and POOLTAG in the DDK tools directory report on memory usage by tag value. You can also ask GFLAGS to make one of your pools special in order to check for overwrites.
The pointer you receive will be aligned on at least an 8-byte boundary. If you place an instance of some structure in the allocated memory, members to which the compiler assigns an offset divisible by 4 or 8 will therefore occupy an address divisible by 4 or 8 too. On some RISC platforms, of course, you must have doubleword and quadword values aligned in this way. For performance reasons, you might want to be sure that the memory block will fit in the fewest possible processor cache lines. You can specify one of the XxxCacheAligned type codes to achieve that result. If you ask for less than a page's worth of memory, the block will be contained in a single page. If you ask for at least a page's worth of memory, the block will start on a page boundary.
NOTE
Asking for PAGE_SIZE + 1 bytes of memory is about the worst thing you can do to the heap allocator: the system reserves two pages, of which nearly half will ultimately be wasted.
It should go without saying that you need to be extra careful when accessing memory you've allocated from the free storage pools in kernel mode. Since driver code executes in the most privileged mode possible for the processor, there's almost no protection from wild stores.
Using GFLAGS or the Driver Verifier's Special Pool option allows you to find memory overwrite errors more easily. With this option, allocations from the special pool lie at the end of a page that's followed in virtual memory by a not-present page. Trying to touch memory past the end of your allocated block will earn you an immediate page fault. In addition, the allocator fills the rest of the page with a known pattern. When you eventually release the memory, the system checks to see whether you overwrote the pattern. In combination, these checks make it much easier to detect bugs that result from coloring outside the lines of your allocated memory. You can also ask to have allocations at the start of a page preceded by a not-present page, by the way. Refer to Knowledge Base article Q192486 for more information about the special pool.
Handling Low-Memory Situations
If there isn't enough memory to satisfy your request, the pool allocator returns a NULL pointer. You should always test the return value and do something reasonable. For example:
PMYSTUFF p = (PMYSTUFF) ExAllocatePool(PagedPool, sizeof(MYSTUFF));
if (!p)
    return STATUS_INSUFFICIENT_RESOURCES;
Additional pool types include the concept of "must succeed." If there isn't enough heap memory to satisfy a request from the must-succeed pool, the system bug checks. Drivers should not allocate memory using one of the must-succeed specifiers, because a driver can nearly always fail whatever operation is under way, and causing a system crash in a low-memory situation is not something a driver should do. Furthermore, only a limited pool of must-succeed memory exists in the entire system, and the operating system might not be able to allocate memory needed to keep the computer running if drivers tie some of it up. In fact, Microsoft wishes it had never documented the must-succeed options in the DDK to begin with.
The Driver Verifier will bug check if a driver specifies one of the must-succeed pool types in an allocation request. In addition, if you turn on the low-resource-simulation option in the Driver Verifier, your allocations will begin randomly failing after the system has been up seven or eight minutes. Every five minutes or so, the system will fail all your allocations for a burst of 10 seconds.
In some situations, you might want to use a technique that's commonly used in file system drivers. If you OR the value POOL_RAISE_IF_ALLOCATION_FAILURE (0x00000010) into the pool type code, the heap allocator will raise a STATUS_INSUFFICIENT_RESOURCES exception instead of returning NULL if there isn't enough memory. You should use a structured exception frame to catch such an exception. For example:
#ifndef POOL_RAISE_IF_ALLOCATION_FAILURE
  #define POOL_RAISE_IF_ALLOCATION_FAILURE 16
#endif

#define PagedPoolRaiseException \
    ((POOL_TYPE) (PagedPool | POOL_RAISE_IF_ALLOCATION_FAILURE))
#define NonPagedPoolRaiseException \
    ((POOL_TYPE) (NonPagedPool | POOL_RAISE_IF_ALLOCATION_FAILURE))

NTSTATUS SomeFunction()
{
    NTSTATUS status;
    __try
    {
        PMYSTUFF p = (PMYSTUFF) ExAllocatePoolWithTag(PagedPoolRaiseException,
            sizeof(MYSTUFF), DRIVERTAG);
        <code that uses "p" without checking it for NULL>
        status = STATUS_SUCCESS;
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        status = GetExceptionCode();
    }
    return status;
}
NOTE
POOL_RAISE_IF_ALLOCATION_FAILURE is defined in NTIFS.H, a header file that's available only as part of the extra-cost Installable File System kit. Doing memory allocations with this flag set is so common in file system drivers, though, that I thought you should know about it.
Incidentally, I suggest that you not go crazy trying to diagnose or recover from failures to allocate small blocks of memory. As a practical matter, an allocation request for, say, 32 bytes is never going to fail. If memory were that tight, the system would be running so sluggishly that someone would surely reboot the machine. You must not be the cause of a system crash in this situation, though, because you would thereby be the potential source of a denial-of-service exploit. But since the error won't arise in real life, there's no point in putting elaborate code into your driver to log errors, signal WMI events, print debugging messages, execute alternative algorithms, and so on. Indeed, the extra code needed to do all that might be the reason the system didn't have the extra 32 bytes to give you in the first place! Thus, I recommend testing the return value from every call to ExAllocatePoolWithTag. If it discloses an error, do any required cleanup and return a status code. Period.
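The whole discipline fits in a few lines. Here is a user-mode sketch of the recommended pattern, with malloc standing in for ExAllocatePoolWithTag and the status values defined locally for illustration (CreateStuff and MYSTUFF are hypothetical names):

```c
#include <stdlib.h>
#include <string.h>

#define STATUS_SUCCESS                0L
#define STATUS_INSUFFICIENT_RESOURCES 0xC000009AL

typedef struct { char name[32]; int id; } MYSTUFF;

/* Allocate and initialize a record.  On failure: no logging heroics,
   no fallback algorithms - just clean up and return a status code. */
long CreateStuff(MYSTUFF **out)
{
    MYSTUFF *p = (MYSTUFF *)malloc(sizeof(MYSTUFF));
    if (!p)
        return STATUS_INSUFFICIENT_RESOURCES;
    memset(p, 0, sizeof(*p));
    p->id = 42;
    *out = p;
    return STATUS_SUCCESS;
}
```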
Releasing a Memory Block
To release a memory block you previously allocated with ExAllocatePoolWithTag, you call ExFreePool:
ExFreePool((PVOID) p);
You do need to keep track somehow of the memory you've allocated from the pool in order to release it when it's no longer needed. No one else will do that for you. You must sometimes closely read the DDK documentation of the functions you call with an eye toward memory ownership. For example, in the AddDevice function I showed you in the previous chapter, there's a call to IoRegisterDeviceInterface. That function has a side effect: it allocates a memory block to hold the string that names the interface. You are responsible for releasing that memory later on.
The Driver Verifier checks at DriverUnload time to ensure a verified driver has released all the memory it allocated. In addition, the verifier sanity-checks all calls to ExFreePool to make sure they refer to a complete block of memory allocated from a pool consistent with the current IRQL.
The DDK headers declare an undocumented function named ExFreePoolWithTag. This function was intended for internal use only in order to make sure that system components didn't inadvertently release memory belonging to other components. The function was politely called "a waste of time" by one of the Microsoft developers, which pretty much tells us that we needn't worry about what it does or how to use it. (Hint: you need to do some other undocumented things in order to use it successfully.)
Two More Functions
Although ExAllocatePoolWithTag is the function you should use for heap allocation, you can use two other pool allocation functions in special circumstances: ExAllocatePoolWithQuotaTag (and a macro named ExAllocatePoolWithQuota that supplies a default tag) and ExAllocatePoolWithTagPriority. ExAllocatePoolWithQuotaTag allocates a memory block and charges the current thread's scheduling quota. This function is for use by file system drivers and other drivers running in a nonarbitrary thread context for allocating memory that belongs to the current thread. A driver wouldn't ordinarily use this function because a quota violation causes the system to raise an exception.
ExAllocatePoolWithTagPriority, which is new with Windows XP, allows you to specify how important you consider it to be that a memory allocation request succeed:
PVOID p = ExAllocatePoolWithTagPriority(type, nbytes, tag, priority);
The arguments are the same as the ones we've been studying except that you also supply an additional priority indicator. See Table 3-5.
| Priority Argument | Description |
| --- | --- |
| LowPoolPriority | System may fail request if low on resources. Driver can easily cope with failure. |
| NormalPoolPriority | System may fail request if low on resources. |
| HighPoolPriority | System should not fail request unless completely out of resources. |
The DDK indicates that most drivers should specify NormalPoolPriority when calling this function. HighPoolPriority should be reserved for situations in which success is critically important to the continued working of the system.
You can append the suffixes SpecialPoolOverrun and SpecialPoolUnderrun to the names given in Table 3-5 (for example, LowPoolPrioritySpecialPoolOverrun, and so on). If an allocation would use the special pool, the overrun and underrun flags override the default placement of blocks.
At the time I'm writing this, ExAllocatePoolWithTagPriority turns into a simple call to ExAllocatePoolWithTag if you are asking for paged memory at high priority or nonpaged memory at any priority. The extra resource checking happens only with requests for paged memory at low or normal priority. This behavior could change in service packs or later versions of the operating system.
Linked Lists
Windows XP makes extensive use of linked lists as a way of organizing collections of similar data structures. In this chapter, I'll discuss the basic service functions you use to manage doubly-linked and singly-linked lists. Separate service functions allow you to share linked lists between threads and across multiple processors; I'll describe those functions in the next chapter after I've explained the synchronization primitives on which they depend.
Whether you organize data structures into a doubly-linked or a singly-linked list, you normally embed a linking substructure, either a LIST_ENTRY or a SINGLE_LIST_ENTRY, into your own data structure. You also reserve a list head element somewhere that uses the same structure as the linking element. For example:
typedef struct _TWOWAY {
  LIST_ENTRY linkfield;
} TWOWAY, *PTWOWAY;

LIST_ENTRY DoubleHead;

typedef struct _ONEWAY {
  SINGLE_LIST_ENTRY linkfield;
} ONEWAY, *PONEWAY;

SINGLE_LIST_ENTRY SingleHead;
When you call one of the list-management service functions, you always work with the linking field or the list head, never directly with the containing structures themselves. So suppose you have a pointer (pdElement) to one of your TWOWAY structures. To put that structure on a list, you'd reference the embedded linking field like this:
InsertTailList(&DoubleHead, &pdElement->linkfield);
Similarly, when you retrieve an element from a list, you're really getting the address of the embedded linking field. To recover the address of the containing structure, you can use the CONTAINING_RECORD macro. (See Figure 3-7.)
Figure 3-7. The CONTAINING_RECORD macro.
So if you wanted to process and discard all the elements in a singly-linked list, your code would look something like this:
PSINGLE_LIST_ENTRY psLink = PopEntryList(&SingleHead);
while (psLink) {
  PONEWAY psElement = CONTAINING_RECORD(psLink, ONEWAY, linkfield);
  ExFreePool(psElement);
  psLink = PopEntryList(&SingleHead);
}
Just before the start of this loop, and again after every iteration, you retrieve the current first element of the list by calling PopEntryList. PopEntryList returns the address of the linking field within a ONEWAY structure, or else it returns NULL to signify that the list is empty. Don't just indiscriminately use CONTAINING_RECORD to develop an element address that you then test for NULL; you need to test the link field address that PopEntryList returns!
Doubly-Linked Lists
A doubly-linked list links its elements both backward and forward in a circular fashion. See Figure 3-8. That is, starting with any element, you can proceed forward or backward in a circle and get back to the same element. The key feature of a doubly-linked list is that you can add or remove elements anywhere in the list.
Figure 3-8. Topology of a doubly-linked list.
Table 3-6 lists the service functions you use to manage a doubly-linked list.
| Service Function or Macro | Description |
| --- | --- |
| InitializeListHead | Initializes the LIST_ENTRY at the head of the list |
| InsertHeadList | Inserts element at the beginning |
| InsertTailList | Inserts element at the end |
| IsListEmpty | Determines whether list is empty |
| RemoveEntryList | Removes element |
| RemoveHeadList | Removes first element |
| RemoveTailList | Removes last element |
Here is a fragment of a fictitious program to illustrate how to use some of these functions:
typedef struct _TWOWAY {
  LIST_ENTRY linkfield;
} TWOWAY, *PTWOWAY;

LIST_ENTRY DoubleHead;

InitializeListHead(&DoubleHead);
ASSERT(IsListEmpty(&DoubleHead));

PTWOWAY pdElement = (PTWOWAY) ExAllocatePool(PagedPool, sizeof(TWOWAY));
InsertTailList(&DoubleHead, &pdElement->linkfield);

if (!IsListEmpty(&DoubleHead)) {
  PLIST_ENTRY pdLink = RemoveHeadList(&DoubleHead);
  pdElement = CONTAINING_RECORD(pdLink, TWOWAY, linkfield);
  ExFreePool(pdElement);
}
InitializeListHead initializes a LIST_ENTRY to point (both backward and forward) to itself. That configuration indicates that the list is empty.
InsertTailList puts an element at the end of the list. Notice that you specify the address of the embedded linking field instead of your own TWOWAY structure. You could call InsertHeadList to put the element at the beginning of the list instead of the end. By supplying the address of the link field in some existing TWOWAY structure, you could put the new element either just after or just before the existing one.
PTWOWAY prev;
InsertHeadList(&prev->linkfield, &pdElement->linkfield);

PTWOWAY next;
InsertTailList(&next->linkfield, &pdElement->linkfield);
Recall that an empty doubly-linked list has the list head pointing to itself, both backward and forward. Use IsListEmpty to simplify making this check. The return value from RemoveXxxList will never be NULL!
RemoveHeadList removes the element at the head of the list and gives you back the address of the linking field inside it. RemoveTailList does the same thing, just with the element at the end of the list instead.
It's important to know the exact way RemoveHeadList and RemoveTailList are implemented if you want to avoid errors. For example, consider the following innocent-looking statement:
if (<some-expr>) pdLink = RemoveHeadList(&DoubleHead);
What I obviously intended with this construction was to conditionally extract the first element from a list. Perfectly reasonable, isn't it? But no, when you debug this later on, you find that elements keep mysteriously disappearing from the list. You discover that pdLink gets updated only when the if expression is TRUE but that RemoveHeadList seems to get called even when the expression is FALSE.
Good grief! What's going on here? Well, RemoveHeadList is really a macro that expands into multiple statements. Here's what the compiler really sees in the above statement:
if (<some-expr>)
  pdLink = (&DoubleHead)->Flink;
{{
  PLIST_ENTRY _EX_Blink;
  PLIST_ENTRY _EX_Flink;
  _EX_Flink = ((&DoubleHead)->Flink)->Flink;
  _EX_Blink = ((&DoubleHead)->Flink)->Blink;
  _EX_Blink->Flink = _EX_Flink;
  _EX_Flink->Blink = _EX_Blink;
}}
Aha! Now the reason for the mysterious disappearance of list elements becomes clear. The TRUE branch of the if statement consists of just the single statement pdLink = (&DoubleHead)->Flink, which stores a pointer to the first element. The logic that removes a list element stands alone outside the scope of the if statement and is therefore always executed. Both RemoveHeadList and RemoveTailList amount to an expression plus a compound statement, and you dare not use either of them in a spot where the syntax requires an expression or a statement alone. Drat!
The other list-manipulation macros don't have this problem, by the way. The difficulty with RemoveHeadList and RemoveTailList arises because they have to return a value and do some list manipulation. The other macros do only one or the other, and they're syntactically safe when used as intended.
Singly-Linked Lists
A singly-linked list links its elements in only one direction, as illustrated in Figure 3-9. Windows XP uses singly-linked lists to implement pushdown stacks, as suggested by the names of the service routines in Table 3-7. Just as was true for doubly-linked lists, these functions are actually implemented as macros in wdm.h, and similar cautions apply. PushEntryList and PopEntryList generate multiple statements, so you can use them only where the syntax allows multiple statements (a PopEntryList assignment, for example, must stand alone as a statement).
| Service Function or Macro | Description |
| --- | --- |
| PushEntryList | Adds element to top of list |
| PopEntryList | Removes topmost element |
Figure 3-9. Topology of a singly-linked list.
The following pseudofunction illustrates how to manipulate a singly-linked list:
typedef struct _ONEWAY {
  SINGLE_LIST_ENTRY linkfield;
} ONEWAY, *PONEWAY;

SINGLE_LIST_ENTRY SingleHead;

SingleHead.Next = NULL;
PONEWAY psElement = (PONEWAY) ExAllocatePool(PagedPool, sizeof(ONEWAY));
PushEntryList(&SingleHead, &psElement->linkfield);
PSINGLE_LIST_ENTRY psLink = PopEntryList(&SingleHead);
if (psLink) {
  psElement = CONTAINING_RECORD(psLink, ONEWAY, linkfield);
  ExFreePool(psElement);
}
Instead of invoking a service function to initialize the head of a singly-linked list, just set the Next field to NULL. Note also the absence of a service function for testing whether this list is empty; just test Next yourself.
PushEntryList puts an element at the head of the list, which is the only part of the list that s directly accessible. Notice that you specify the address of the embedded linking field instead of your own ONEWAY structure.
PopEntryList removes the first entry from the list and gives you back a pointer to the link field inside it. In contrast with doubly-linked lists, a NULL value indicates that the list is empty. In fact, there's no counterpart to IsListEmpty for use with a singly-linked list.
Lookaside Lists
Even employing the best possible algorithms, a heap manager that deals with randomly sized blocks of memory will require some scarce processor time to coalesce adjacent free blocks from time to time. Figure 3-10 illustrates how, when something returns block B to the heap at a time when blocks A and C are already free, the heap manager can combine blocks A, B, and C to form a single large block. The large block is then available to satisfy some later request for a block bigger than any of the original three components.
Figure 3-10. Coalescing adjacent free blocks in a heap.
If you know you're always going to be working with fixed-size blocks of memory, you can craft a much more efficient scheme for managing a heap. You can, for example, preallocate a large block of memory that you subdivide into pieces of the given fixed size. Then you can devise some scheme for knowing which blocks are free and which are in use, as suggested by Figure 3-11. Returning a block to such a heap merely involves marking it as free; you don't need to coalesce it with adjacent blocks because you never need to satisfy randomly sized requests.
Merely allocating a large block that you subdivide might not be the best way to implement a fixed-size heap, though. In general, it's hard to guess how much memory to preallocate. If you guess too high, you'll be wasting memory. If you guess too low, your algorithm will either fail when it runs out (bad!) or make too-frequent trips to a surrounding random heap manager to get space for more blocks (better). Microsoft has created the lookaside list object and a set of adaptive algorithms to deal with these shortcomings.
Figure 3-11. A heap containing fixed-size blocks.
Figure 3-12 illustrates the concept of a lookaside list. Imagine that you had a glass that you could (somehow; the laws of physics don't exactly make this easy!) balance upright in a swimming pool. The glass represents the lookaside list object. When you initialize the object, you tell the system how big the memory blocks (water drops, in this analogy) are that you'll be working with. In earlier versions of Windows NT, you could also specify the capacity of the glass, but the operating system now determines that adaptively. To allocate a memory block, the system first tries to remove one from the list (remove a water drop from the glass). If there are no more, the system dips into the surrounding memory pool. Conversely, to return a memory block, the system first tries to put it back on the list (add a water drop to the glass). But if the list is full, the block goes back into the pool using the regular heap manager routine (the drop slops over into the swimming pool).
Figure 3-12. Lookaside lists.
The system periodically adjusts the depths of all lookaside lists based on actual usage. The details of the algorithm aren't really important, and they're subject to change in any case. Basically (in the current release, anyway), the system will reduce the depth of lookaside lists that haven't been accessed recently or that aren't forcing pool access at least 5 percent of the time. The depth never goes below 4, however, which is also the initial depth of a new list.
When the Driver Verifier is running, all lookaside lists are set to a depth of zero, which forces all allocation and free calls to go directly to the pool. This action makes it more likely that driver problems involving memory corruption can be caught. Just bear this fact in mind when you're debugging your driver with the Driver Verifier engaged.
Table 3-8 lists the eight service functions that you use when you work with a lookaside list. There are really two sets of four functions, one set for a lookaside list that manages paged memory (the ExXxxPagedLookasideList set) and another for a lookaside list that manages nonpaged memory (the ExXxxNPagedLookasideList set). The first thing you must do is reserve nonpaged memory for a PAGED_LOOKASIDE_LIST or an NPAGED_LOOKASIDE_LIST object. Even the paged variety of object needs to be in nonpaged memory because the system will access the list object itself at an elevated IRQL.
| Service Function | Description |
| --- | --- |
| ExInitializeNPagedLookasideList, ExInitializePagedLookasideList | Initialize a lookaside list |
| ExAllocateFromNPagedLookasideList, ExAllocateFromPagedLookasideList | Allocate a fixed-size block |
| ExFreeToNPagedLookasideList, ExFreeToPagedLookasideList | Release a block back to a lookaside list |
| ExDeleteNPagedLookasideList, ExDeletePagedLookasideList | Destroy a lookaside list |
After reserving storage for the lookaside list object somewhere, you call the appropriate initialization routine:
PPAGED_LOOKASIDE_LIST pagedlist;
PNPAGED_LOOKASIDE_LIST nonpagedlist;

ExInitializePagedLookasideList(pagedlist, Allocate, Free, 0, blocksize, tag, 0);
ExInitializeNPagedLookasideList(nonpagedlist, Allocate, Free, 0, blocksize, tag, 0);
(The only difference between the two examples is the spelling of the function name and the first argument.)
The first argument to either of these functions points to the [N]PAGED_LOOKASIDE_LIST object for which you've already reserved space. Allocate and Free are pointers to routines you can write to allocate or release memory from a random heap. You can use NULL for either or both of these parameters, in which case ExAllocatePoolWithTag and ExFreePool will be used, respectively. The blocksize parameter is the size of the memory blocks you will be allocating from the list, and tag is the 32-bit tag value you want placed in front of each such block. The two zero arguments are placeholders for values that you supplied in previous versions of Windows NT but that the system now determines on its own; these values are flags to control the type of allocation and the depth of the lookaside list.
To allocate a memory block from the list, call the appropriate AllocateFrom function:
PVOID p = ExAllocateFromPagedLookasideList(pagedlist); PVOID q = ExAllocateFromNPagedLookasideList(nonpagedlist);
To put a block back onto the list, call the appropriate FreeTo function:
ExFreeToPagedLookasideList(pagedlist, p); ExFreeToNPagedLookasideList(nonpagedlist, q);
Finally, to destroy a list, call the appropriate Delete function:
ExDeletePagedLookasideList(pagedlist);
ExDeleteNPagedLookasideList(nonpagedlist);
It is very important for you to explicitly delete a lookaside list before allowing the list object to pass out of scope. I'm told that a common programming mistake is to place a lookaside list object in a device extension and then forget to delete the object before calling IoDeleteDevice. If you make this mistake, the next time the system runs through its list of lookaside lists to tune their depths, it will put its foot down on the spot where your list object used to be, probably with bad results.