Section 1.5. Anatomy of a Windows Device Driver | Inside Windows Storage: Server Storage Technologies for Windows 2000, Windows Server 2003 and Beyond

1.5 Anatomy of a Windows Device Driver

All Windows device drivers have a similar structure. Each driver has a driver object that is created by the I/O Manager when the driver is loaded. Section 1.4 discussed structures related to the device driver, including driver objects. This section discusses the routines that a driver implements, as well as some other characteristic behavior of a storage device driver.

A Windows device driver implements a variety of standard routines, some of which are mandatory and some of which are optional. The definitions of mandatory and optional depend on the nature of the driver. These standard routines include the following:

A mandatory initialization routine, by which a driver performs its housekeeping and initializes and configures device objects (including attaching them to the appropriate driver stack chains) as required. This routine is called by the I/O Manager when a driver is loaded.
A mandatory set of dispatch routines to accomplish specific functionality, such as read, write, create, and close. These routines are called by the I/O Manager and are passed an IRP as a parameter.
An optional startup routine (StartIo) that initiates the I/O to a physical device. Only drivers that deal with a physical device (and not necessarily all such drivers) need this.
An optional interrupt service routine (ISR). Drivers that control a physical device may have this. ISRs are described in Section 1.5.1.
An optional deferred procedure call (DPC), which drivers may use to handle postprocessing of an ISR. DPCs are described in Section 1.5.2.
An optional completion routine that is called by the I/O Manager (as a notification mechanism) when a lower-level driver completes an IRP. Because all I/O is handled as asynchronous I/O, the completion routine is required quite often, especially for higher-level drivers that always depend on a lower-level driver to complete an IRP.
A mandatory unload routine that is called by the I/O Manager to unload a driver.
An optional cancellation routine (CancelIO) that is called by the I/O Manager to cancel an outstanding operation.
A mandatory system shutdown notification routine that is called by the I/O Manager to notify a driver that it must quickly complete any essential housekeeping when the user requests the system to be powered down.
An optional error-logging routine.
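
The relationship between the initialization routine, the dispatch routines, and the unload routine can be pictured as a table of function pointers that the driver fills in and the I/O Manager calls through. The following is a minimal user-mode sketch of that idea, not real driver code: every type and function name in it (`struct irp`, `struct driver_object`, `sample_driver_entry`, `io_call_driver`, the `MJ_*` and `STATUS_*` constants) is a hypothetical stand-in invented for illustration and is far simpler than the actual kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum major_function { MJ_CREATE, MJ_CLOSE, MJ_READ, MJ_WRITE, MJ_COUNT };
enum { STATUS_OK = 0, STATUS_NOT_SUPPORTED = -1 };

struct irp {
    enum major_function major;  /* which operation is requested */
    int status;                 /* filled in by whoever completes the IRP */
};

struct driver_object;
typedef int (*dispatch_fn)(struct driver_object *drv, struct irp *irp);

struct driver_object {
    dispatch_fn dispatch[MJ_COUNT];          /* one dispatch routine per operation */
    void (*unload)(struct driver_object *);  /* unload routine */
    int reads_handled;                       /* illustrative driver state */
};

/* Dispatch routine: performs the operation and completes the IRP. */
static int sample_read(struct driver_object *drv, struct irp *irp) {
    drv->reads_handled++;
    irp->status = STATUS_OK;
    return irp->status;
}

/* Unload routine: releases whatever the initialization routine set up. */
static void sample_unload(struct driver_object *drv) {
    for (size_t i = 0; i < MJ_COUNT; i++)
        drv->dispatch[i] = NULL;
    drv->unload = NULL;
}

/* Initialization routine: called once when the driver is loaded. */
int sample_driver_entry(struct driver_object *drv) {
    for (size_t i = 0; i < MJ_COUNT; i++)
        drv->dispatch[i] = NULL;
    drv->dispatch[MJ_READ] = sample_read;
    drv->unload = sample_unload;
    drv->reads_handled = 0;
    return STATUS_OK;
}

/* Plays the role of the I/O Manager: routes an IRP to the matching routine. */
int io_call_driver(struct driver_object *drv, struct irp *irp) {
    assert(irp->major < MJ_COUNT);
    dispatch_fn fn = drv->dispatch[irp->major];
    if (fn == NULL) {
        irp->status = STATUS_NOT_SUPPORTED;
        return irp->status;
    }
    return fn(drv, irp);
}
```

In this sketch a read request reaches `sample_read`, while a write request is rejected because the driver registered no routine for it.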

When processing an IRP, a driver may behave in one of many different ways, depending on the nature of the driver and the nature of the I/O request in the IRP. Here are some examples of driver behavior:

Performing the requested operation and completing the IRP.
Performing part of the operation and passing the IRP to a lower-level driver.
Simply passing the IRP to a lower-level driver.
Generating multiple IRPs to a lower-level driver in response to a single IRP. For example, in response to a file open request received by the NTFS file system driver, the driver may need to read some file system metadata to locate the directory and succeeding subdirectories under which the file is located.

Drivers typically access their own IRP stack location, as well as the IRP stack location for the next driver. The lowest driver in a stack chain accesses only its own IRP stack location. A driver is responsible for manipulating a pointer in the IRP that points to the stack location that the next driver should be looking at.
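
To make the stack location mechanics concrete, here is a small user-mode model in C. It is only a sketch under stated assumptions: the structure layout, the field names, and the helpers `current_location`, `next_location`, and `pass_down` are all hypothetical, and the model numbers locations from the top of the chain downward purely for readability. The offset adjustment stands in for any per-driver change to the request, such as a driver translating a partition-relative offset.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STACK 4

/* Hypothetical per-driver parameters for one request. */
struct stack_location {
    int operation;
    long offset;
};

/* Hypothetical IRP: one stack location per driver in the chain. */
struct stacked_irp {
    struct stack_location stack[MAX_STACK];
    int stack_count;  /* number of drivers in the chain */
    int current;      /* index of the location owned by the running driver */
};

/* The location that belongs to the driver currently handling the IRP. */
struct stack_location *current_location(struct stacked_irp *irp) {
    assert(irp->current >= 0 && irp->current < irp->stack_count);
    return &irp->stack[irp->current];
}

/* The location for the next-lower driver, or NULL for the lowest driver. */
struct stack_location *next_location(struct stacked_irp *irp) {
    if (irp->current + 1 >= irp->stack_count)
        return NULL;
    return &irp->stack[irp->current + 1];
}

/* Fill in the next driver's location, then advance the pointer so the
 * lower driver sees its own location as the current one. */
int pass_down(struct stacked_irp *irp, long offset_adjust) {
    struct stack_location *mine = current_location(irp);
    struct stack_location *next = next_location(irp);
    if (next == NULL)
        return -1;  /* lowest driver: nothing below to pass to */
    *next = *mine;
    next->offset += offset_adjust;
    irp->current++;
    return 0;
}
```

The upper driver's own location is left untouched by `pass_down`, so it can still consult its original parameters when the request completes.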

Note that the same driver code may be running simultaneously on different CPUs within the same Windows NT system. The driver code must therefore synchronize access to shared data. For example, if a driver has a linked list of work items, the code that adds an item to the queue or removes one must work correctly even when it runs simultaneously on different CPUs. Executing the same request twice can be disastrous at times; for example, writing the same record twice to tape.
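
The work item queue described above can be sketched in portable C11. This is an illustrative user-mode model, not kernel code: it uses a simple spin lock built from `atomic_flag`, and the names `struct work_item`, `struct work_queue`, `queue_push`, and `queue_pop` are hypothetical. A real driver would rely on the synchronization primitives the kernel provides (for example, kernel spin locks or interlocked list routines) rather than rolling its own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct work_item {
    struct work_item *next;
    int id;
};

struct work_queue {
    atomic_flag lock;        /* spin lock guarding head and tail */
    struct work_item *head;
    struct work_item *tail;
};

void queue_init(struct work_queue *q) {
    atomic_flag_clear(&q->lock);
    q->head = NULL;
    q->tail = NULL;
}

static void queue_lock(struct work_queue *q) {
    /* Spin until this CPU is the one that changed the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire)) {
    }
}

static void queue_unlock(struct work_queue *q) {
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Add an item at the tail; safe to call concurrently from several threads. */
void queue_push(struct work_queue *q, struct work_item *item) {
    assert(item != NULL);
    item->next = NULL;
    queue_lock(q);
    if (q->tail != NULL)
        q->tail->next = item;
    else
        q->head = item;
    q->tail = item;
    queue_unlock(q);
}

/* Remove the head item, or return NULL if the queue is empty. Because the
 * unlink happens under the lock, no item is ever handed to two callers. */
struct work_item *queue_pop(struct work_queue *q) {
    queue_lock(q);
    struct work_item *item = q->head;
    if (item != NULL) {
        q->head = item->next;
        if (q->head == NULL)
            q->tail = NULL;
        item->next = NULL;
    }
    queue_unlock(q);
    return item;
}
```

The key property is that each item is unlinked exactly once, which is what prevents the same request from being executed twice.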

1.5.1 The Interrupt Service Routine

An interrupt service routine (ISR) is normally executed in response to an interrupt from a hardware device and can preempt any code executing with a lesser priority. An ISR must do the bare minimum to service the interrupt so that the CPU can be available for servicing other interrupts. The ISR collects the bare minimum of information that it needs and queues a deferred procedure call (DPC) to finish servicing the interrupt. The DPC is scheduled to run at an unspecified time, which may be immediately or a little later, depending on what other processing is required.

To ensure that ISRs are always available to service the interrupt, they are never paged out to disk. An ISR can be interrupted by a higher-priority ISR, but it can never be preempted by anything else, such as a DPC.

ISRs are typically required for drivers that own a piece of hardware, such as a tape or disk device. A driver that implements only software functionality, such as a file system driver or a filter driver, typically will not have an ISR.

1.5.2 The Deferred Procedure Call

When an ISR is executing, it needs to accomplish its task quickly and efficiently. Thus an ISR does the bare minimum and then queues a deferred procedure call (DPC) to accomplish the remaining work at a lower priority level (these levels are referred to as interrupt request levels, or IRQLs). DPCs may also be queued from code other than ISRs. The queue request creates a new DPC object (via the services of the Object Manager). After the queuing, a request for a DPC software interrupt (IRQL 2) is generated.

Here are some important points about DPCs:

A DPC can be interrupted by another ISR, but it can never be preempted by user mode code.
A DPC cannot cause page faults, so all memory that a DPC accesses must be locked down in physical memory.
A DPC may not take any action that causes it to block; for example, causing I/O.
A DPC is similar to an ISR in that it needs to execute quickly and it needs to be passed control quickly and efficiently. To minimize the overhead of scheduling a DPC, Windows NT saves the bare minimum state before passing control to a DPC. After the DPC has finished executing, the overhead of restoring state is also minimized because very little state was saved in the first place. As a result, a DPC will execute in the context of an arbitrary process. For example, if Excel is running as a process and Excel initiates an I/O, the resulting DPC (if any) may be called in the context of a Word or PowerPoint process (rather than the Excel process).
Each processor has its own queue of DPCs. Thus a four-CPU Windows NT server will have four separate DPC queues. DPCs can have high, medium, or low priority; the default is medium. A driver can change the priority setting. High-priority DPCs are inserted into the beginning of the queue. Low- and medium-priority DPCs are inserted at the end of the queue.
DPCs typically run on the same CPU as the ISR, but a driver can change this behavior.
If a driver already has a DPC queued, the next request to queue the same DPC object is simply ignored. When a DPC runs, it therefore needs to figure out whether it has multiple work items to process; for example, the interrupt may have happened multiple times, with each interrupt queuing up a work item.
A DPC may be queued on another CPU if the DPC queue on a particular CPU exceeds a certain maximum value. The Windows NT kernel periodically attempts to run DPCs by generating software interrupts.
DPC code is never paged out to disk.
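
Several of the points above (the ISR doing the minimum, a repeated queue request being ignored, high-priority DPCs going to the front of the queue, and the DPC draining every pending work item) can be illustrated with a small user-mode model. This is a sketch under stated assumptions, not the kernel's implementation: a single `struct dpc_queue` stands in for one CPU's queue, and all names (`dpc_insert`, `dpc_drain`, `struct device`, `device_isr`, `device_dpc_routine`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dpc_priority { DPC_LOW, DPC_MEDIUM, DPC_HIGH };

struct dpc {
    struct dpc *next;
    enum dpc_priority priority;
    bool queued;                    /* true while the object sits in a queue */
    void (*routine)(void *context);
    void *context;
};

struct dpc_queue {                  /* models the queue of a single CPU */
    struct dpc *head;
    struct dpc *tail;
};

/* Queue a DPC. Returns false (and does nothing) if it is already queued. */
bool dpc_insert(struct dpc_queue *q, struct dpc *d) {
    assert(d->routine != NULL);
    if (d->queued)
        return false;
    d->queued = true;
    if (q->head == NULL) {
        d->next = NULL;
        q->head = q->tail = d;
    } else if (d->priority == DPC_HIGH) {
        d->next = q->head;          /* high priority: front of the queue */
        q->head = d;
    } else {
        d->next = NULL;             /* low and medium priority: end of the queue */
        q->tail->next = d;
        q->tail = d;
    }
    return true;
}

/* Run every queued DPC in order; returns how many routines ran. */
int dpc_drain(struct dpc_queue *q) {
    int ran = 0;
    while (q->head != NULL) {
        struct dpc *d = q->head;
        q->head = d->next;
        if (q->head == NULL)
            q->tail = NULL;
        d->next = NULL;
        d->queued = false;          /* cleared first so the DPC can be requeued */
        d->routine(d->context);
        ran++;
    }
    return ran;
}

/* Hypothetical device: the ISR records work, the DPC finishes it. */
struct device {
    int pending_interrupts;
    int completed;
    struct dpc dpc;
    struct dpc_queue *cpu_queue;
};

void device_isr(struct device *dev) {
    dev->pending_interrupts++;                  /* bare minimum of bookkeeping */
    dpc_insert(dev->cpu_queue, &dev->dpc);      /* ignored if already queued */
}

void device_dpc_routine(void *context) {
    struct device *dev = context;
    while (dev->pending_interrupts > 0) {       /* handle every pending item */
        dev->pending_interrupts--;
        dev->completed++;
    }
}
```

If the modeled ISR fires three times before the queue is drained, only one DPC entry exists, and its routine processes all three pending items in a single run.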

1.5.3 The Asynchronous Procedure Call

Asynchronous procedure calls (APCs) share similarities with DPCs, but they also have some significant differences. Like DPCs, APCs are executed at a privilege level that is higher than that of the regular code. Unlike DPCs, which are executed in the context of an arbitrary process, APCs are always executed in the context of a particular process. Thus, APCs are not as lightweight as DPCs, because a lot of context may need to be saved and restored. If you are familiar with UNIX, think of APCs as being somewhat similar to UNIX signal handler routines.

There are two types of APCs: kernel mode and user mode. Kernel mode APCs are associated with drivers or other kernel mode code. Kernel mode APCs are typically used for data transfer; for example, for copying data from a kernel buffer to the user buffer. Recall that the user buffer needs to be accessed in the context of the process that owns the buffer.

User mode code can also have an APC, which can be queued via the QueueUserAPC API (documented in the Platform SDK). The user mode APC is delivered only when the thread is in an alertable state; for example, blocked in a WaitForSingleObjectEx or WaitForMultipleObjectsEx call made with the alertable flag set. Details of these APIs can be found in the Platform SDK. Suffice it to say that these APIs allow a thread to achieve synchronization.

APCs may block; for example, for specific I/O. They are queued per thread, implying that there are multiple APC queues.
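
The per-thread queuing and the alertable-state rule for user mode APCs can be modeled in a few lines of portable C. This is a conceptual sketch only and does not call the Win32 API: `struct sim_thread`, `struct sim_apc`, `sim_queue_apc`, `sim_wait`, and `add_one` are hypothetical names, and `sim_wait` merely imitates the moment at which a waiting thread would have its pending APCs delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_apc {
    struct sim_apc *next;
    void (*routine)(void *arg);
    void *arg;
};

/* Each simulated thread owns its own APC queue. */
struct sim_thread {
    struct sim_apc *apc_head;
    struct sim_apc *apc_tail;
};

/* Queue an APC to a specific thread; it does not run yet. */
void sim_queue_apc(struct sim_thread *t, struct sim_apc *apc) {
    assert(apc->routine != NULL);
    apc->next = NULL;
    if (t->apc_tail != NULL)
        t->apc_tail->next = apc;
    else
        t->apc_head = apc;
    t->apc_tail = apc;
}

/* Models a wait call. Pending APCs are delivered only when the wait is
 * alertable; a non-alertable wait leaves them queued. Returns the number
 * of APC routines that ran. */
int sim_wait(struct sim_thread *t, bool alertable) {
    int delivered = 0;
    if (!alertable)
        return 0;
    while (t->apc_head != NULL) {
        struct sim_apc *apc = t->apc_head;
        t->apc_head = apc->next;
        if (t->apc_head == NULL)
            t->apc_tail = NULL;
        apc->routine(apc->arg);
        delivered++;
    }
    return delivered;
}

/* Sample APC routine used for illustration: increments an int counter. */
void add_one(void *arg) {
    ++*(int *)arg;
}
```

In this model, two APCs queued to a thread stay pending through a non-alertable wait and both run, in order, during the next alertable wait.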
