Kernel-mode programming is something every driver developer should be familiar with. However, some aspects of kernel-mode programming are quite different from user-mode programming. In addition, kernel-mode programming often requires you to do familiar things in a different and frequently much more careful way.
For example, when total memory usage exceeds physical memory, Windows writes excess memory pages to the hard drive, so that a virtual address no longer corresponds to a physical memory address. If a routine attempts to access one of these pages, it causes a page fault, which notifies Windows to reacquire the physical memory. User-mode applications and services are typically not affected by page faults, apart from a slight delay while the pages are read back into memory. In kernel mode, however, page faults that occur under certain circumstances can crash the system.
This section describes the basics of kernel-mode programming and points out some common pitfalls.
Operating systems must have a mechanism to respond efficiently to hardware events. This mechanism is called an interrupt. Each hardware event is associated with an interrupt. One interrupt prompts the clock and scheduler to run, another prompts the keyboard driver to run, and so on. When a hardware event occurs, Windows delivers the associated interrupt to a processor.
When an interrupt is delivered to a processor, the system does not create a new thread to service the interrupt. Instead, Windows interrupts an existing thread for the short time that is necessary for the service routine to handle the interrupt. When the interrupt-related processing is complete, the system returns the thread to its original owner.
Interrupts are not necessarily triggered directly by hardware events. Windows also supports software interrupts in the form of deferred procedure calls (DPCs). Windows schedules a DPC by delivering an interrupt to the appropriate processor. Kernel-mode drivers use DPCs for purposes such as handling the time-consuming aspects of processing a hardware interrupt.
Each interrupt has an associated level, called an interrupt request level (IRQL), that governs much of kernel-mode programming. The system uses the different IRQL values to ensure that the most important and time-sensitive interrupts are handled first. A service routine runs at the IRQL that is assigned to the associated interrupt. When an interrupt occurs, the system locates the correct service routine and assigns it to a processor.
The IRQL at which the processor is currently running determines when the service routine runs and whether it can interrupt the execution of the thread that is currently running on the processor. The basic principle is that the highest IRQL has priority. When a processor receives an interrupt, the following occurs:
If an interrupt's IRQL is greater than that of the processor, the system raises the processor's IRQL to the interrupt's level. The code that was executing on that processor is paused and does not resume until the service routine is finished and the processor's IRQL drops back to its original value. The service routine can, in turn, be interrupted by another service routine with an even higher IRQL.
If an interrupt's IRQL is equal to the processor's IRQL, the service routine must wait until any preceding routines with the same IRQL have finished. The routine then runs to completion, unless an interrupt arrives with an even higher IRQL.
If an interrupt's IRQL is less than the processor's current IRQL, the service routine must wait until all interrupts with a higher IRQL have been serviced.
The preceding list describes how driver routines with different IRQLs run on a particular processor. However, modern systems typically have two or more processors. It's quite possible for a driver to have routines with different IRQLs running at the same time on different processors. This situation can lead to deadlocks if the routines are not properly synchronized.
Note: IRQLs are completely different from thread priorities. The system uses thread priorities to manage thread dispatching during normal processing. An interrupt, by definition, is something that falls outside the realm of normal processing and must be serviced as quickly as possible. A processor resumes normal thread processing only after all outstanding interrupts have been serviced.
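The per-processor rules listed above can be sketched as a small user-mode model. This is only an illustration, not Windows code: the `cpu_model` structure, the function names, and the numeric levels are all hypothetical, and the model ignores multiprocessor effects entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LEVEL 32   /* hypothetical number of levels in this model */

/* Toy model of one processor; not a Windows data structure. */
typedef struct {
    int current;              /* IRQL the processor is running at        */
    int saved[MAX_LEVEL];     /* IRQLs of the routines that were paused  */
    int depth;                /* number of paused routines               */
    int pending[MAX_LEVEL];   /* interrupts waiting at each level        */
} cpu_model;

void cpu_init(cpu_model *cpu) {
    cpu->current = 0;
    cpu->depth = 0;
    for (int i = 0; i < MAX_LEVEL; i++) cpu->pending[i] = 0;
}

/* Start a service routine at `irql`, pausing whatever was running. */
static void enter(cpu_model *cpu, int irql) {
    cpu->saved[cpu->depth++] = cpu->current;
    cpu->current = irql;
}

/* Deliver an interrupt. Returns true if its routine starts immediately. */
bool cpu_deliver(cpu_model *cpu, int irql) {
    assert(irql > 0 && irql < MAX_LEVEL);
    if (irql > cpu->current) {
        enter(cpu, irql);     /* higher level: preempt the current code */
        return true;
    }
    cpu->pending[irql]++;     /* equal or lower level: must wait */
    return false;
}

/* The running service routine returns. The IRQL drops back to the paused
 * routine's level, then the highest pending interrupt that outranks that
 * level (if any) starts. Returns the resulting current IRQL. */
int cpu_finish(cpu_model *cpu) {
    assert(cpu->depth > 0);
    cpu->current = cpu->saved[--cpu->depth];
    for (int irql = MAX_LEVEL - 1; irql > cpu->current; irql--) {
        if (cpu->pending[irql] > 0) {
            cpu->pending[irql]--;
            enter(cpu, irql);
            break;
        }
    }
    return cpu->current;
}
```

In this model, delivering levels 2, 5, 5, and 3 in that order starts the first two routines immediately and leaves the other two pending; each call to `cpu_finish` then runs the waiting level-5 routine before the level-3 routine.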
Each IRQL has an associated numerical value. However, the values vary for different CPU architectures, so IRQLs are usually referred to by name. Only some IRQLs are used by drivers; the remaining IRQLs are reserved for system use. The following IRQLs are the ones most commonly used by drivers. They are listed in order, starting with the lowest value:
PASSIVE_LEVEL: This is the lowest IRQL and is the default value that is assigned to normal thread processing. It is the only IRQL that is not associated with an interrupt. All user-mode applications run at PASSIVE_LEVEL, as do low-priority driver routines. Routines running at PASSIVE_LEVEL can access all the core Windows services.
DISPATCH_LEVEL: This is the highest IRQL associated with a software interrupt. DPCs and higher-priority driver routines run at DISPATCH_LEVEL. Routines running at DISPATCH_LEVEL have access to only a limited subset of the core Windows services.
DIRQL: This IRQL is greater than DISPATCH_LEVEL and is the highest IRQL that drivers typically deal with. It is actually a collective term for the set of IRQLs that are associated with hardware interrupts, also called "device IRQLs." The PnP manager assigns a DIRQL to each device and passes it to the associated driver during startup. The exact level is not usually that important. What you need to know for most purposes is that you are running at DIRQL and your service routine is blocking almost anything else from running on that processor. Routines running at DIRQL have access to only a small number of core Windows services.
Keeping track of what IRQL your routines can or must run at is one of the keys to good driver programming. If you are careless, the system is likely to crash. The WDK includes tools such as Driver Verifier and Static Driver Verifier (SDV) to help you catch such errors.
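The bookkeeping described above amounts to comparing the caller's current IRQL against the highest IRQL at which a service may be called. The following is a minimal user-mode sketch of that comparison. The enum values show ordering only (real numeric values vary by architecture), and the `model_service` type and the service names used with it are hypothetical, not WDK routines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative ordering only; real numeric values vary by architecture. */
typedef enum {
    MODEL_PASSIVE_LEVEL  = 0,
    MODEL_DISPATCH_LEVEL = 1,
    MODEL_DIRQL          = 2
} model_irql;

/* A hypothetical service and the highest IRQL at which it may be called. */
typedef struct {
    const char *name;
    model_irql  max_irql;
} model_service;

/* True if a routine running at `current` may call `svc`. */
bool call_allowed(const model_service *svc, model_irql current) {
    assert(svc != NULL);
    return current <= svc->max_irql;
}
```

For example, a hypothetical blocking wait limited to `MODEL_PASSIVE_LEVEL` is allowed from a passive-level routine but rejected from a routine at `MODEL_DISPATCH_LEVEL`.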
Tip: The WDK describes the IRQLs at which a driver routine can run and the IRQL at which DDI routines can be called. See "Managing Hardware Priorities" in the WDK, online at http://go.microsoft.com/fwlink/?LinkId=79339.
Chapter 15, "Scheduling, Thread Context, and IRQL," has more information about IRQLs.
Drivers are normally multithreaded in the sense that they are reentrant; different routines can be called on different threads. However, threads and synchronization work quite differently with drivers than with applications.
Chapter 3, "WDF Fundamentals," introduces WDF concepts for threads and synchronization.
Applications normally create and control their own threads. In contrast, kernel-mode driver routines usually do not create threads, nor do they typically run on threads that are created specifically for their use. Drivers often operate much like DLLs. For example, if an application initiates an I/O request, the dispatch routine that receives the request is likely to be running on the thread that initiated the request, in a known process context. However, most driver routines do not know their process context and run on an arbitrary thread.
When a routine runs on an arbitrary thread, the system essentially borrows the thread that is running on the assigned processor and uses it to run the driver routine. A driver running on an arbitrary thread cannot depend on any association between the thread on which the driver is running and the thread that issued the I/O request. For example, if a driver that received a read request must pass the request to a lower driver or handle an interrupt, those routines run on an arbitrary thread.
The following example shows how threads are assigned during a typical I/O request:
An application calls DeviceIoControl on thread A to send a device control request to the driver.
The I/O manager calls the appropriate dispatch routine, also on thread A.
The driver programs the device to satisfy the request and returns control to the application.
Later, when the device is ready, it signals an interrupt, which is handled by an interrupt service routine (ISR). There's no way for the driver to control which thread is running on the processor at the time the interrupt occurs, so the ISR runs in an arbitrary thread context.
The ISR queues a DPC to complete the request at a lower IRQL. The DPC also runs in an arbitrary thread context.
By chance, the DPC might run in the thread context of the application that issued the request, but the DPC could just as easily run in a thread from another process entirely. Generally, it's safest to assume that a driver routine runs in an arbitrary thread context.
Some implications of arbitrary thread context include the following:
Different routines typically run on different threads. On a multiprocessor system, two such routines could very well run at the same time on different processors.
Routines should take no more time than is appropriate for their IRQL. The routine also blocks whatever was running on the thread before the system borrowed it. In general, the higher the IRQL, the greater the impact and the quicker a routine should finish. Driver routines that run at IRQL>=DISPATCH_LEVEL, especially DIRQL, should avoid actions such as long stalls or endless loops that could block the borrowed thread indefinitely.
Driver routines can run concurrently on different threads, so synchronization techniques are essential for avoiding race conditions. For example, if two routines try to modify a linked list at the same time, they could produce a corrupted list. Because Windows uses a preemptive scheduler, race conditions can occur on both single-processor and multiprocessor systems. However, the potential for race conditions is higher on a multiprocessor system because more driver routines can be running at the same time than on a single-processor system.
The principles of synchronization are essentially the same in both kernel mode and user mode. However, kernel mode introduces an additional complication: the available synchronization techniques depend on IRQL.
A number of synchronization options are available to routines running at PASSIVE_LEVEL, including the following:
Kernel dispatcher objects: These objects (event, semaphore, and mutex) are a collection of synchronization objects that are commonly used by routines running on nonarbitrary threads at PASSIVE_LEVEL.
Note: Events are used for purposes other than synchronization. They are also used as a signaling method for such tasks as I/O completion handling.
Fast mutex and resource objects: These objects are built on top of kernel dispatcher objects.
Driver routines often run at DISPATCH_LEVEL and sometimes at DIRQL. The primary synchronization tool for DISPATCH_LEVEL routines is an object called a spin lock. The term derives from the fact that, while one thread owns a spin lock, any other threads that are waiting to acquire the lock "spin" until the lock is available.
Spin locks can be used in an arbitrary thread context at or below DISPATCH_LEVEL. To protect a resource, you acquire the spin lock, use the resource as needed, and then release the spin lock. You typically create a spin lock object when you create the object that the lock will protect and store the spin lock object for later use.
When a routine acquires a spin lock, its IRQL is raised to DISPATCH_LEVEL if it is not already running at that level. The IRQL returns to its previous level when the routine releases the lock.
ISRs must often be synchronized with an associated DPC and sometimes with other ISRs. However, ISRs run at DIRQL, so they cannot use regular spin locks. Instead, ISRs must use an interrupt spin lock. This object is used in exactly the same way as a spin lock, but it raises the processor's IRQL to match the IRQL of the ISR instead of to DISPATCH_LEVEL.
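The acquire, use, release pattern and the accompanying IRQL change can be sketched in user-mode C11. This is a model under stated assumptions, not the Windows implementation: `model_cpu`, `model_spin_lock`, and the `lock_*` functions are hypothetical names, and the IRQL is just an integer field. A regular spin lock is modeled with `level` set to `MODEL_DISPATCH_LEVEL`, and an interrupt spin lock with `level` set to the ISR's level.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative ordering only; real numeric values vary by architecture. */
enum { MODEL_PASSIVE_LEVEL = 0, MODEL_DISPATCH_LEVEL = 1, MODEL_DIRQL = 2 };

/* Toy processor state: just the current IRQL. */
typedef struct { int irql; } model_cpu;

/* A lock plus the IRQL to which it raises the acquiring processor. */
typedef struct {
    atomic_bool held;
    int         level;
} model_spin_lock;

void lock_init(model_spin_lock *lock, int level) {
    atomic_init(&lock->held, false);
    lock->level = level;
}

/* Raise the IRQL, then spin until the lock is free.
 * Returns the previous IRQL so that the caller can restore it. */
int lock_acquire(model_spin_lock *lock, model_cpu *cpu) {
    assert(cpu->irql <= lock->level);   /* never acquire from a higher IRQL */
    int old_irql = cpu->irql;
    cpu->irql = lock->level;
    while (atomic_exchange_explicit(&lock->held, true, memory_order_acquire)) {
        /* another owner holds the lock: keep spinning */
    }
    return old_irql;
}

/* Release the lock and return to the IRQL saved by lock_acquire. */
void lock_release(model_spin_lock *lock, model_cpu *cpu, int old_irql) {
    atomic_store_explicit(&lock->held, false, memory_order_release);
    cpu->irql = old_irql;
}

bool lock_is_held(model_spin_lock *lock) {
    return atomic_load_explicit(&lock->held, memory_order_relaxed);
}
```

A caller keeps the value returned by `lock_acquire` and passes it back to `lock_release`, which mirrors the way the IRQL returns to its previous level when the lock is released.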
Chapter 15, "Scheduling, Thread Context, and IRQL," explores related techniques in detail.
Race conditions occur when two or more routines attempt to manipulate the same data at the same time. Undiagnosed race conditions are a common cause of driver instability. However, potential race conditions can be managed to prevent such problems. The following are the two basic approaches:
Synchronization: Uses synchronization objects to protect access to shared data.
Serialization: Prevents race conditions from occurring by queuing the requests that must access the shared data to guarantee that they are always handled in sequence, never at the same time.
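Serialization can be pictured as a first-in, first-out queue that hands out one request at a time. The sketch below is a single-threaded user-mode model with hypothetical names (`serial_queue`, `queue_submit`, `queue_start_next`, `queue_complete`) and an arbitrary fixed capacity. In a real driver, the queue itself would also need protection from concurrent access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8   /* hypothetical fixed size for this sketch */

typedef struct {
    int    requests[QUEUE_CAPACITY];  /* request IDs waiting their turn   */
    size_t head;                      /* index of the oldest request      */
    size_t count;                     /* number of queued requests        */
    bool   busy;                      /* true while a request is handled  */
} serial_queue;

void queue_init(serial_queue *q) {
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    q->busy = false;
}

/* Add a request to the tail. Returns false if the queue is full. */
bool queue_submit(serial_queue *q, int request_id) {
    if (q->count == QUEUE_CAPACITY) return false;
    q->requests[(q->head + q->count) % QUEUE_CAPACITY] = request_id;
    q->count++;
    return true;
}

/* Hand out the oldest request, but only if no request is in progress. */
bool queue_start_next(serial_queue *q, int *request_id) {
    if (q->busy || q->count == 0) return false;
    *request_id = q->requests[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    q->busy = true;
    return true;
}

/* Mark the in-progress request as finished so the next one can start. */
void queue_complete(serial_queue *q) {
    q->busy = false;
}
```

Because `queue_start_next` refuses to hand out a second request until `queue_complete` is called, the code that touches the shared data never runs for two requests at once.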
Timing issues can be much more critical with drivers than with applications, especially on multiprocessor systems. For example, during the early stages of driver development, developers often use checked builds of the driver and Windows, which are relatively slow. If there's a potential race condition, the slow execution speed of a checked build may prevent it from actually occurring. When you move to the free build, the race condition can suddenly appear. Chapter 10, "Synchronization," describes how to manage race conditions.
Kernel-mode drivers use memory in a way that is distinctly different from the way in which user-mode processes use memory. Both modes have a virtual address space that the system maps to physical addresses, and both use similar techniques to allocate and manage memory. However:
Kernel-mode components share a virtual address space.
This is similar to the way in which all DLLs loaded by a process share the process's virtual address space. Unlike user mode, in which each process has its own virtual address space, the shared address space in kernel mode means that kernel-mode drivers can corrupt each other's memory as well as system memory.
User-mode processes cannot access kernel-mode addresses.
Kernel-mode processes can access user-mode addresses, but this must be done carefully in the correct application context.
A pointer to a user-mode address does not have a straightforward meaning in kernel mode, and a mishandled user-mode pointer can create a security hole or even cause a system crash. Safely handling user-mode addresses requires more work than simply dereferencing a pointer.
Chapter 8, "I/O Flow and Dispatching," describes how WDF drivers can probe and lock user-mode buffers so that they can be handled safely.
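One piece of the extra work mentioned above is checking that a caller-supplied buffer lies entirely within the user-mode part of the address space before touching it. The following user-mode sketch shows an overflow-safe version of that range check. The function name and the `user_limit` parameter are hypothetical, and a real driver must do considerably more than this (for example, locking the buffer), as Chapter 8 describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [addr, addr + len) lies entirely below user_limit.
 * user_limit is the first address that is NOT a user-mode address. */
bool user_range_ok(uintptr_t addr, size_t len, uintptr_t user_limit) {
    assert(user_limit > 0);
    if (len == 0) return true;            /* empty buffer: nothing to touch */
    if (addr >= user_limit) return false; /* starts outside the user range  */
    /* Compare against the remaining room instead of computing addr + len,
     * which could wrap around for a hostile length. */
    return len <= user_limit - addr;
}
```

Comparing `len` with `user_limit - addr` avoids the wraparound that a naive `addr + len <= user_limit` test would allow.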
Drivers must efficiently manage their use of memory. In general, the task of managing memory is much the same as for any program: a driver must allocate and free memory, use stack memory appropriately, handle faults, and so on. However, the kernel-mode environment imposes some additional constraints.
Faults pose a particular problem for kernel-mode drivers. An invalid access fault can cause a bug check and crash the system. Page faults are a particular problem because the impact of a page fault depends on IRQL. In particular:
For routines running at IRQL>=DISPATCH_LEVEL, a page fault causes a bug check. Page faults at IRQL=DISPATCH_LEVEL are a common cause of driver failure.
For routines running at IRQL<DISPATCH_LEVEL, a page fault is usually not a problem. At worst, there's a slight delay while the page is read back into memory. However, if a driver encounters another page fault while servicing a page fault, the resources required to service the second page fault might already be dedicated to the first page fault. This creates a deadlock and a double-fault crash.
When a kernel-mode component encounters a page fault at DISPATCH_LEVEL or higher, the bug check code is IRQL_NOT_LESS_OR_EQUAL. PREfast and SDV can find many of these mistakes in your driver source code.
Applications typically use a heap for purposes such as allocating large blocks of memory or for memory allocations that must persist for an extended period of time. Kernel-mode drivers allocate memory for those purposes in much the same way. However, because of the problems that page faults can cause in kernel mode, drivers must use two different heaps, called memory pools, for this type of memory allocation:
Paged pool: These memory pages can be paged out to the hard drive as appropriate.
Nonpaged pool: These pages are always resident in memory and can be used to store data that must be accessed at DISPATCH_LEVEL or higher.
Memory in the paged or nonpaged pools is often referred to as pageable or nonpageable memory, respectively. Make sure that everything is allocated from the correct pool. Use the paged pool for data or code that will be accessed only at IRQL<DISPATCH_LEVEL, and the nonpaged pool for anything that must be accessed at DISPATCH_LEVEL or higher. However, the nonpaged pool is a limited resource that's shared with all other kernel-mode processes, so you should use it sparingly.
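The pool rule and the page-fault rule can be combined into a small decision model. The code below is a user-mode sketch with hypothetical names and illustrative IRQL values. It does not allocate kernel memory; it only encodes which pool to choose and what happens when memory is touched at a given IRQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative ordering only; real numeric values vary by architecture. */
enum { MODEL_PASSIVE_LEVEL = 0, MODEL_DISPATCH_LEVEL = 1, MODEL_DIRQL = 2 };

typedef enum { POOL_PAGED, POOL_NONPAGED } model_pool;
typedef enum { ACCESS_OK, ACCESS_PAGED_IN, ACCESS_BUG_CHECK } access_result;

/* Choose a pool from the highest IRQL at which the data is ever accessed. */
model_pool pool_for(int highest_access_irql) {
    assert(highest_access_irql >= MODEL_PASSIVE_LEVEL);
    return highest_access_irql >= MODEL_DISPATCH_LEVEL ? POOL_NONPAGED
                                                       : POOL_PAGED;
}

/* Outcome of touching memory from `pool` at `irql`. `paged_out` says whether
 * the page is currently on disk; nonpaged memory is always resident, so the
 * flag is ignored for POOL_NONPAGED. */
access_result touch(model_pool pool, bool paged_out, int irql) {
    if (pool == POOL_NONPAGED || !paged_out) return ACCESS_OK;
    if (irql >= MODEL_DISPATCH_LEVEL) return ACCESS_BUG_CHECK;
    return ACCESS_PAGED_IN;   /* page fault serviced after a short delay */
}
```

The model makes the trade-off visible: paged memory is safe only while every access stays below DISPATCH_LEVEL, and nonpaged memory is always safe to touch but draws on a limited shared resource.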
Chapter 12, "WDF Support Objects," discusses memory allocation.
Kernel-mode routines have a stack, which works in the same way as a user-mode stack. However, the maximum kernel stack size is quite small (much smaller than the maximum user-mode stack size) and does not grow, so use it sparingly. Here are some tips for using the kernel stack:
Recursive routines must be especially careful about stack usage. In general, drivers should avoid using recursive routines.
The disk I/O path is very sensitive to stack usage, because it's a deeply layered path and often requires the handling of one or two page faults.
Drivers that pass around large data objects typically do not put them on the stack.
Instead, the driver allocates nonpaged memory for the objects and passes a pointer to that memory. This is possible because all kernel-mode processes share a common address space.
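The pass-a-pointer approach looks much the same in any C program. In the user-mode sketch below, `calloc` stands in for a nonpaged pool allocation, and the `big_record` type, its size, and the function names are hypothetical. The point is that no function ever holds the large object in its own stack frame.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RECORD_BYTES (64u * 1024u)  /* hypothetical size, too big for a small stack */

typedef struct {
    size_t        used;                 /* bytes of data currently stored */
    unsigned char data[RECORD_BYTES];
} big_record;

/* Allocate the record off the stack. Returns NULL if memory is unavailable. */
big_record *record_create(void) {
    return calloc(1, sizeof(big_record));
}

/* The callee receives only a pointer, so its stack frame stays small.
 * Copies as much of src as fits and returns the number of bytes copied. */
size_t record_append(big_record *rec, const void *src, size_t len) {
    assert(rec != NULL && src != NULL);
    size_t room = RECORD_BYTES - rec->used;
    if (len > room) len = room;
    memcpy(rec->data + rec->used, src, len);
    rec->used += len;
    return len;
}

void record_destroy(big_record *rec) {
    free(rec);
}
```

A caller creates the record once, passes the pointer through as many layers as needed, and destroys it when the work is complete.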
The kernel stack is your best friend and your worst enemy when it comes to memory management. There are many situations where placing a data structure on the stack rather than in a pool results in a dramatic simplification of your driver. However, the kernel stack is very small, so you also need to be careful about placing too much data on the stack. Drivers at the bottom of very deep call stacks (like those that service storage devices) need to be particularly careful, because the memory manager and file system drivers above them can hog a significant portion of the stack space.
-Peter Wieland, Windows Driver Foundation Team, Microsoft
Info: See the driver tip "How do I keep my driver from running out of kernel-mode stack?" on the WHDC Web site, online at http://go.microsoft.com/fwlink/?LinkId=79604.
Drivers must often transfer data in buffers to or from clients or devices. Sometimes a data buffer is simply a pointer to a region of memory. However, kernel-mode data buffers are often in the form of a memory descriptor list (MDL), which has no counterpart in user mode. An MDL is a structure that describes the buffer and contains a list of the locked pages in kernel memory that constitute the buffer.
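To make the idea of describing a buffer by its pages concrete, the following user-mode sketch computes the page-level description of a buffer. It is a simplified model, not the real MDL layout: `model_mdl`, `describe_buffer`, and the 4096-byte page size are assumptions for illustration, and the list of physical pages is omitted because user-mode code cannot see it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE ((uintptr_t)4096)   /* assumed page size */

/* Simplified, hypothetical stand-in for an MDL header. */
typedef struct {
    uintptr_t start_va;     /* buffer address rounded down to a page boundary */
    size_t    byte_offset;  /* offset of the buffer within its first page     */
    size_t    byte_count;   /* length of the buffer in bytes                  */
    size_t    page_count;   /* number of pages that the buffer touches        */
} model_mdl;

model_mdl describe_buffer(uintptr_t va, size_t length) {
    assert(length <= UINTPTR_MAX - va);   /* the buffer must not wrap around */
    model_mdl mdl;
    mdl.byte_offset = (size_t)(va % MODEL_PAGE_SIZE);
    mdl.start_va    = va - mdl.byte_offset;
    mdl.byte_count  = length;
    if (length == 0) {
        mdl.page_count = 0;
    } else {
        uintptr_t first_page = va / MODEL_PAGE_SIZE;
        uintptr_t last_page  = (va + length - 1) / MODEL_PAGE_SIZE;
        mdl.page_count = (size_t)(last_page - first_page + 1);
    }
    return mdl;
}
```

Note that a buffer of only 2 bytes can still span two pages if it straddles a page boundary, which is why the page count depends on both the starting offset and the length.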
Chapter 8, "I/O Flow and Dispatching," explains how to handle MDLs.