Driver Memory Allocation


An important aspect of programming concerns the need to allocate storage. Unfortunately, drivers don't have the luxury of making simple calls to malloc and free, or new and delete. Instead, care must be taken to ensure that memory of the right type is allocated. Drivers must also be sure to release any memory they allocate, since there is no automatic cleanup mechanism for kernel-mode code. This section describes techniques a driver can use to work with temporary storage.

Memory Available to Drivers

There are three options for allocating temporary storage in a driver. The criteria for selecting the right storage type include duration, size, and the IRQL at which the code that accesses the storage will run. The available options are:

  • Kernel stack.

    The kernel stack provides limited amounts of nonpaged storage for local variables during the execution of driver routines.

  • Paged pool.

    Driver routines running below DISPATCH_LEVEL IRQL can use a heap area called paged pool. As the name implies, memory in this area is pageable, and a page fault can occur when it is accessed.

  • Nonpaged pool.

    Driver routines running at elevated IRQLs need to allocate temporary storage from another heap area called nonpaged pool. The system guarantees that the virtual memory in nonpaged pool is always physically resident. The device and controller extensions created by the I/O Manager come from this pool area.

Because a driver must be reentrant, global variables are almost never declared. The only exception occurs for read-only data. Otherwise, the attempt by one thread of execution to store into the global variable interferes with another thread's attempt to read or write the same data.

Of course, local static variables in a driver function are just as bad. State for a driver must be kept elsewhere, such as in a device extension as just described.

Working with the Kernel Stack

On the x86 platform, the kernel stack is only 12 KB in size. On other platforms, the stack size is 16 KB. Therefore, the kernel stack must be considered a precious resource. Overflowing the kernel stack causes an exception, something to be avoided at all costs in kernel mode. To avoid overflowing the kernel stack, follow these guidelines:

  • Don't design a driver in such a way that internal routines are deeply nested. Keep the call tree as flat as possible.

  • Avoid recursion, but where required, limit the depth of recursion. Drivers are not the place to be calculating Fibonacci series using a recursive algorithm.

  • Do not use the kernel stack to build large data structures. Use one of the pool areas instead.

Another characteristic of the kernel stack is that it lives in cached memory. Therefore, it should not be used for DMA operations. DMA buffers should be allocated from nonpaged pool. Chapter 12 describes DMA caching issues in more detail.

Working with the Pool Areas

To allocate memory in the pool area, drivers use the kernel routines ExAllocatePool and ExFreePool.

These functions allow the following kinds of memory to be allocated:

  • NonPagedPool

    is memory available to driver routines running at all IRQL levels, including DISPATCH_LEVEL IRQL.

  • NonPagedPoolMustSucceed

    is temporary memory that is crucial to the driver's continuing operation. Use this memory for emergencies only and release it as quickly as possible. In fact, since an exception is generated if the requested memory is unavailable, consider never using this option.

  • NonPagedPoolCacheAligned

    is memory that is guaranteed to be aligned on the natural boundary of the CPU data-cache line. A driver might use this kind of memory for a permanent I/O buffer.

  • NonPagedPoolCacheAlignedMustS

    is storage for a temporary I/O buffer that is crucial to the operation of the driver. The S at the end of the request name stands for succeed. As with the previous MustSucceed option, this request should probably never be used.

  • PagedPool

    is memory available only to driver routines running below DISPATCH_LEVEL IRQL. Normally, this includes the driver's initialization, cleanup, and Dispatch routines and any kernel-mode threads the driver is using.

  • PagedPoolCacheAligned

    is I/O buffer memory used by file system drivers.

There are several things to keep in mind when working with the system memory areas. First and foremost, the pools are a precious system resource, and their use should not be extravagant. This is especially true of the NonPaged area.

Second, a driver must be executing at or below DISPATCH_LEVEL IRQL when allocating or freeing nonpaged memory. A driver must be executing at or below APC_LEVEL IRQL to allocate or free from the paged pool.

Finally, release any memory as soon as it is no longer needed. Otherwise, overall system performance is impacted because of low memory conditions. In particular, be sure to give back pool memory when a driver is unloaded.
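For example, a Dispatch routine (which runs below DISPATCH_LEVEL IRQL and may therefore use paged pool) might allocate and release a working buffer like this. This is only a sketch; the buffer size is arbitrary, and code shared with a DPC or ISR would have to request NonPagedPool instead:

```c
/* Sketch: allocating and freeing pool memory from a Dispatch routine. */
PVOID buffer = ExAllocatePool(PagedPool, 1024);
if (buffer == NULL) {
    /* Pool allocations can fail; never assume success. */
    return STATUS_INSUFFICIENT_RESOURCES;
}

/* ... use the buffer ... */

ExFreePool(buffer);     /* always give the memory back promptly */
```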

System Support for Memory Suballocation

Generally, a driver should avoid constantly allocating and releasing small blocks of pool memory, where small means a request of less than PAGE_SIZE bytes. Such requests fragment the pool areas and can make it impossible for other kernel-mode code to allocate memory. If such requests are unavoidable in a driver's design, consider allocating a single, large chunk of pool and providing private suballocation routines for the driver to use.

In fact, a clever C programmer could write private versions of malloc and free that operate against a large pool area. A C++ programmer could override the new and delete operators for this purpose.

Some drivers need to manage a collection of small, fixed-size memory blocks. A SCSI driver, for example, must maintain a supply of SCSI request blocks (SRBs), which are used to send commands to a SCSI device. The kernel provides two different mechanisms that can be used to handle the details of suballocation.

ZONE BUFFERS

A zone buffer is just a chunk of driver-allocated pool. Executive routines provide management services for collections of fixed-size blocks in paged or nonpaged memory.

The use of zone buffers requires careful synchronization planning. In particular, if an Interrupt Service, DPC, and/or Dispatch routine all need access to the same zone buffer, an Executive spin lock must be used to guarantee noninterference. If the accessing routines all operate at the same IRQL level, a fast mutex can be used instead. Spin locks are described later in this chapter. Fast mutexes are described in Chapter 14.

To set up a zone buffer, a structure of type ZONE_HEADER must be declared. The spin lock or fast mutex object may also need to be declared and initialized. The following steps describe the entire process of managing a zone buffer.

  1. Call ExAllocatePool to claim space for the zone buffer itself. Then initialize the zone buffer with ExInitializeZone. Typically, these steps are performed in the DriverEntry routine.

  2. To allocate a block from a zone, call either ExAllocateFromZone or ExInterlockedAllocateFromZone. The interlocked version of the function uses a spin lock to synchronize access to the zone buffer. The noninterlocked function leaves synchronization entirely up to the driver code.

  3. To release a block back to the zone, use either ExFreeToZone or ExInterlockedFreeToZone. Again, the interlocked version of the function synchronizes access to the zone.

  4. In the driver's Unload routine, use ExFreePool to release the memory used for the entire zone buffer. A driver must ensure that no blocks from the zone are in use when the deallocation occurs.
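The sequence above might look like this in code. This is only a sketch: ZONE_SIZE, BLOCK_SIZE, and the driver's global names are invented, and error paths are abbreviated:

```c
/* In DriverEntry: carve a zone out of nonpaged pool. */
KSPIN_LOCK  zoneLock;
ZONE_HEADER zone;
PVOID       zoneMemory;

KeInitializeSpinLock(&zoneLock);
zoneMemory = ExAllocatePool(NonPagedPool, ZONE_SIZE);
if (zoneMemory != NULL)
    ExInitializeZone(&zone, BLOCK_SIZE, zoneMemory, ZONE_SIZE);

/* Throughout the driver's life: grab and release fixed-size blocks.
 * The interlocked variants use the spin lock for synchronization. */
PVOID block = ExInterlockedAllocateFromZone(&zone, &zoneLock);
/* ... use the block ... */
ExInterlockedFreeToZone(&zone, block, &zoneLock);

/* In the Unload routine: release the zone's backing memory. */
ExFreePool(zoneMemory);
```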

A zone buffer should be no larger than necessary to keep memory usage to a minimum. A dynamic approach to sizing the zone buffer would be to use the function MmQuerySystemSize to discover the total amount of system memory available. Another Executive function, MmIsThisAnNtAsSystem, checks whether the current platform is running a server version of Windows 2000 (Server or Advanced Server). Drivers running in server environments could allocate a larger zone buffer to meet the expected higher I/O demands of the server.

If the request to allocate a block from a zone buffer fails, the driver could use the standard pools to grant the requested block instead. This strategy requires a clever structure to indicate whether an allocation came from the zone or the pool. The appropriate deallocation routine must be called to release the block.

An existing zone buffer can be enlarged by calling ExExtendZone or ExInterlockedExtendZone, but these routines should be used infrequently. The system does not appear to deallocate memory from extended zones correctly. In fact, the entire zone buffer abstraction is considered obsolete. Windows 2000 provides a more efficient mechanism, which is described in the next section.

LOOKASIDE LISTS

A lookaside list is a linked list of fixed-size memory blocks. Unlike zone buffers, lookaside lists can grow and shrink dynamically in response to changing system conditions. Therefore, properly sized lookaside lists are less likely to waste memory than zone buffers.

Compared to zone buffers, the synchronization mechanism used with lookaside lists is also more efficient. If the CPU architecture has an 8-byte compare-exchange instruction, the Executive uses it to serialize access to the list. On platforms without such an instruction, it reverts to using a spin lock for lists in nonpaged pool and a fast mutex for lists in paged pool.

To use a lookaside list, a structure of type NPAGED_LOOKASIDE_LIST or PAGED_LOOKASIDE_LIST (depending on whether the list is nonpaged or paged) must be allocated. The following steps describe the process of lookaside list management:

  1. Use either the ExInitializeNPagedLookasideList or ExInitializePagedLookasideList function to initialize the list header structure. Normally, the DriverEntry or AddDevice routine performs this task.

  2. Call either ExAllocateFromNPagedLookasideList or ExAllocateFromPagedLookasideList to allocate a block from a lookaside list. These calls are invoked throughout the life of the driver.

  3. Call ExFreeToNPagedLookasideList or ExFreeToPagedLookasideList to release a block.

  4. Use ExDeleteNPagedLookasideList or ExDeletePagedLookasideList to release any resources associated with the lookaside list. Usually this is a function invoked from the driver's Unload or RemoveDevice routine.
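The four steps above might be sketched as follows for a nonpaged list. SRB_SIZE, the srbList variable, and the pool tag are invented for illustration, and error handling is abbreviated:

```c
NPAGED_LOOKASIDE_LIST srbList;

/* 1. In DriverEntry or AddDevice: set up the list header. */
ExInitializeNPagedLookasideList(&srbList,
                                NULL, NULL,   /* default allocate/free */
                                0,            /* flags */
                                SRB_SIZE,     /* size of each block */
                                'SRB ',       /* pool tag */
                                16);          /* depth of the list */

/* 2. and 3. Throughout the driver's life: */
PVOID srb = ExAllocateFromNPagedLookasideList(&srbList);
if (srb != NULL) {
    /* ... build and send the request ... */
    ExFreeToNPagedLookasideList(&srbList, srb);
}

/* 4. In Unload or RemoveDevice: tear the list down. */
ExDeleteNPagedLookasideList(&srbList);
```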

The lookaside list initialization functions simply set up the list headers; they do not actually allocate memory for the list. The initialization functions require that the maximum number of blocks the list can hold be specified. This is referred to as the depth of the list.

When the allocation functions are called, the system allocates memory as needed. As blocks are freed, they are chained to the lookaside list up to the maximum allowable depth. Beyond that point, each additional freed block results in memory being released back to the system. Thus, after a while, the number of available blocks in the lookaside list tends to remain near the depth of the list.

The depth of the lookaside list should be chosen carefully. If it is too shallow, the system performs expensive allocation and deallocation operations too often; if it is too deep, memory is wasted. Statistics maintained in the list header structure can help determine a proper value for the depth of the list.



The Windows 2000 Device Driver Book: A Guide for Programmers (2nd Edition)
ISBN: 0130204315
Year: 2000