Interrupt Request Level | Programming the Microsoft Windows Driver Model

Interrupt Request Level

Windows XP assigns an interrupt request level (IRQL) to each hardware interrupt and to a select few software events. Each CPU has its own IRQL. We label the different IRQL levels with names such as PASSIVE_LEVEL, APC_LEVEL, and so on. Figure 4-1 illustrates the range of IRQL values for the x86 platform. (In general, the numeric values of IRQL depend on which platform you're talking about.) Most of the time, the computer executes in user mode at PASSIVE_LEVEL. All of your knowledge about how multitasking operating systems work applies at PASSIVE_LEVEL. That is, the scheduler may preempt a thread at the end of a time slice or because a higher-priority thread has become eligible to run. Threads can also voluntarily block while they wait for events to occur.

Figure 4-1. Interrupt request levels.

When an interrupt occurs, the kernel raises the IRQL on the interrupting CPU to the level associated with that interrupt. The activity of processing an interrupt can be, uh, interrupted to process an interrupt at a higher IRQL, but never to process an interrupt at the same or a lower IRQL. I'm sorry to use the word interrupt in two slightly different ways here. I struggled to find a word to describe the temporary suspension of an activity that wouldn't cause confusion with thread preemption, and that was the best choice.

What I just said is sufficiently important to be enshrined as a rule:

An activity on a given CPU can be interrupted only by an activity that executes at a higher IRQL.

You have to read this rule the way the computer does. Expiration of a time slice eventually invokes the thread scheduler at DISPATCH_LEVEL. The scheduler can then make a different thread current. When the IRQL returns to PASSIVE_LEVEL, a different thread is running. But it's still true that the first PASSIVE_LEVEL activity wasn't interrupted by the second PASSIVE_LEVEL activity. I thought this interpretation was incredible hair-splitting until it was pointed out to me that this arrangement allows a thread running at APC_LEVEL to be preempted by a different thread running at PASSIVE_LEVEL. Perhaps a more useful statement of the rule is this one:

An activity on a given CPU can be interrupted only by an activity that executes at a higher IRQL. An activity at or above DISPATCH_LEVEL cannot be suspended to perform another activity at or below the then-current IRQL.
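Both statements of the rule reduce to simple comparisons. The following is a minimal user-mode sketch, not kernel code: the SIM_ names and both functions are hypothetical, and the numeric levels are illustrative only, since real IRQL values depend on the platform.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative IRQL numbers for a user-mode model only. Real values are
   platform-specific and come from the DDK headers. */
#define SIM_PASSIVE_LEVEL  0u
#define SIM_APC_LEVEL      1u
#define SIM_DISPATCH_LEVEL 2u
#define SIM_HIGH_LEVEL     31u

/* First statement of the rule: new work may interrupt the current activity
   on this CPU only if it runs at a strictly higher IRQL. */
bool sim_can_interrupt(unsigned current_irql, unsigned new_irql)
{
    assert(current_irql <= SIM_HIGH_LEVEL && new_irql <= SIM_HIGH_LEVEL);
    return new_irql > current_irql;
}

/* Refinement: below DISPATCH_LEVEL the scheduler can still make a different
   thread current, so the running thread may lose the CPU to another thread.
   At or above DISPATCH_LEVEL that cannot happen. */
bool sim_thread_can_be_preempted(unsigned current_irql)
{
    assert(current_irql <= SIM_HIGH_LEVEL);
    return current_irql < SIM_DISPATCH_LEVEL;
}
```

In this model, sim_can_interrupt(SIM_DISPATCH_LEVEL, SIM_DISPATCH_LEVEL) is false, while sim_thread_can_be_preempted(SIM_APC_LEVEL) is true. That second result is the hair-splitting case: a thread at APC_LEVEL can still lose the CPU to a thread that will run at PASSIVE_LEVEL.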

Since each CPU has its own IRQL, it's possible for any CPU in a multiprocessor computer to run at an IRQL that's less than or equal to the IRQL of any other CPU. In the next major section, I'll tell you about spin locks, which combine the within-a-CPU synchronizing behavior of an IRQL with a multiprocessor lockout mechanism. For the time being, though, I'm talking just about what happens on a single CPU.

To repeat something I just said, user-mode programs execute at PASSIVE_LEVEL. When a user-mode program calls a function in the native API, the CPU switches to kernel mode but continues to run at PASSIVE_LEVEL in the same thread context. Many times, the native API function calls an entry point in a driver without raising the IRQL. Driver dispatch routines for most types of I/O request packet (IRP) execute at PASSIVE_LEVEL. In addition, certain driver subroutines, such as DriverEntry and AddDevice, execute at PASSIVE_LEVEL in the context of a system thread. In all of these cases, the driver code can be preempted just as a user-mode application can be.

Certain common driver routines execute at DISPATCH_LEVEL, which is higher than PASSIVE_LEVEL. These include the StartIo routine, deferred procedure call (DPC) routines, and many others. What they have in common is a need to access fields in the device object and the device extension without interference from driver dispatch routines and one another. When one of these routines is running, the rule stated earlier guarantees that no thread can preempt it on the same CPU to execute a driver dispatch routine because the dispatch routine runs at a lower IRQL. Furthermore, no thread can preempt it to run another of these special routines because that other routine will run at the same IRQL.

NOTE
Dispatch routine and DISPATCH_LEVEL unfortunately have similar names. Dispatch routines are so called because the I/O Manager dispatches I/O requests to them. DISPATCH_LEVEL is so called because it's the IRQL at which the kernel's thread dispatcher originally ran when deciding which thread to run next. (The thread dispatcher runs at SYNCH_LEVEL, if you care. This is the same as DISPATCH_LEVEL on a uniprocessor machine, if you really care.)

Between DISPATCH_LEVEL and PROFILE_LEVEL is room for various hardware interrupt levels. In general, each device that generates interrupts has an IRQL that defines its interrupt priority vis-à-vis other devices. A WDM driver discovers the IRQL for its interrupt when it receives an IRP_MJ_PNP request with the minor function code IRP_MN_START_DEVICE. The device's interrupt level is one of the many items of configuration information passed as a parameter to this request. We often refer to this level as the device IRQL, or DIRQL for short. DIRQL isn't a single request level. Rather, it's the IRQL for the interrupt associated with whichever device is under discussion at the time.

The other IRQL levels have meanings that sometimes depend on the particular CPU architecture. Since those levels are used internally by the kernel, their meanings aren't especially germane to the job of writing a device driver. The purpose of APC_LEVEL, for example, is to allow the system to schedule an asynchronous procedure call (APC), which I'll describe in detail later in this chapter. Operations that occur at HIGH_LEVEL include taking a memory snapshot just prior to hibernating the computer, processing a bug check, handling a totally spurious interrupt, and others. I'm not going to attempt to provide an exhaustive list here because, as I said, you and I don't really need to know all the details.

To summarize, drivers are normally concerned with three interrupt request levels:

PASSIVE_LEVEL, at which many dispatch routines and a few special routines execute
DISPATCH_LEVEL, at which StartIo and DPC routines execute
DIRQL, at which an interrupt service routine executes

IRQL in Operation

To see why IRQL matters, refer to Figure 4-2, which shows a possible time sequence of events on a single CPU. At the beginning of the sequence, the CPU is executing at PASSIVE_LEVEL. At time t1, an interrupt arrives whose service routine executes at IRQL-1, one of the levels between DISPATCH_LEVEL and PROFILE_LEVEL. Then, at time t2, another interrupt arrives whose service routine executes at IRQL-2, which is less than IRQL-1. Because of the rule already discussed, the CPU continues servicing the first interrupt. Before the first interrupt service routine completes at time t3, it might request a DPC. DPC routines execute at DISPATCH_LEVEL, which is lower than IRQL-2. Consequently, the highest priority pending activity at t3 is the service routine for the second interrupt, which therefore executes next. When it finishes at t4, assuming nothing else has occurred in the meantime, the DPC will run at DISPATCH_LEVEL. When the DPC routine finishes at t5, IRQL can drop back to PASSIVE_LEVEL.
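The ordering in this sequence can be sketched as a "highest pending level wins" selection. This is a simplified user-mode model under stated assumptions, not how the kernel or the interrupt controller is implemented: the pending bit mask, the function names, and the numeric levels are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative IRQL numbers for a user-mode model only. */
#define SIM_PASSIVE_LEVEL  0u
#define SIM_DISPATCH_LEVEL 2u
#define SIM_HIGH_LEVEL     31u

/* Bit n of a pending mask means "an activity at IRQL n is waiting to run". */
uint32_t sim_pending_bit(unsigned level)
{
    assert(level <= SIM_HIGH_LEVEL);
    return UINT32_C(1) << level;
}

/* When the current activity finishes and would return to return_level,
   pick what runs next: the highest pending level above return_level,
   or return_level itself if nothing higher is waiting. */
unsigned sim_next_level(uint32_t pending, unsigned return_level)
{
    assert(return_level <= SIM_HIGH_LEVEL);
    for (unsigned level = SIM_HIGH_LEVEL; level > return_level; --level) {
        if (pending & sim_pending_bit(level))
            return level;
    }
    return return_level;
}
```

Suppose IRQL-2 is modeled as level 4. At t3 the pending mask holds bit 4 (the second interrupt) and bit 2 (the DPC), so sim_next_level(pending, SIM_PASSIVE_LEVEL) selects 4. With bit 4 cleared at t4, the same call selects SIM_DISPATCH_LEVEL, and with nothing pending at t5 it returns SIM_PASSIVE_LEVEL.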

Figure 4-2. Interrupt priority in action.

IRQL Compared with Thread Priorities

Thread priority is a very different concept from IRQL. Thread priority controls the actions of the scheduler in deciding when to preempt running threads and what thread to start running next. The only priority that means anything at IRQLs above APC_LEVEL is IRQL itself, and it controls which programs can execute rather than the thread context within which they execute.

IRQL and Paging

One consequence of running at elevated IRQL is that the system becomes incapable of servicing page faults. The rule this fact implies is simply stated:

Code executing at or above DISPATCH_LEVEL must not cause page faults.

One implication of this rule is that any of the subroutines in your driver that execute at or above DISPATCH_LEVEL must be in nonpaged memory. Furthermore, all the data you access in such a subroutine must also be in nonpaged memory. Finally, as IRQL rises, fewer and fewer kernel-mode support routines are available for your use.

The DDK documentation explicitly states the IRQL restrictions on support routines. For example, the entry for KeWaitForSingleObject indicates two restrictions:

The caller must be running at or below DISPATCH_LEVEL.
If a nonzero timeout period is specified in the call, the caller must be running strictly below DISPATCH_LEVEL.

Reading between the lines, what is being said here is this: if the call to KeWaitForSingleObject might conceivably block for any period of time (that is, you've specified a nonzero timeout), you must be below DISPATCH_LEVEL, where thread blocking is permitted. If all you want to do is check to see whether an event has been signaled, however, you can be at DISPATCH_LEVEL. You can't call this routine at all from an interrupt service routine or other routine running above DISPATCH_LEVEL.
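The two documented restrictions can be written down as a single predicate. The following is a hedged user-mode sketch of the rule as described here, not a call into the real API: sim_wait_allowed and the may_block parameter are hypothetical names, and the numeric levels are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative IRQL numbers for a user-mode model only. */
#define SIM_DISPATCH_LEVEL 2u
#define SIM_HIGH_LEVEL     31u

/* may_block is true when the wait specifies a nonzero timeout, so the
   calling thread might actually block. It is false for a zero timeout,
   which only checks whether the object is already signaled. */
bool sim_wait_allowed(unsigned current_irql, bool may_block)
{
    assert(current_irql <= SIM_HIGH_LEVEL);
    if (current_irql > SIM_DISPATCH_LEVEL)
        return false;                            /* e.g. an ISR at DIRQL */
    if (may_block)
        return current_irql < SIM_DISPATCH_LEVEL; /* blocking needs a lower level */
    return true;                                 /* polling is fine at DISPATCH_LEVEL */
}
```

A driver could use the same reasoning in a debug-build assertion ahead of a wait, which documents the IRQL assumption at the call site.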

For the sake of completeness, it's well to point out that the rule against page faults is really a rule prohibiting any sort of hardware exception, including page faults, divide checks, bounds exceptions, and so on. Software exceptions, like quota violations and probe failures on nonpaged memory, are permissible. Thus, it's acceptable to call ExAllocatePoolWithQuota to allocate nonpaged memory at DISPATCH_LEVEL.

Implicitly Controlling IRQL

Most of the time, the system calls the routines in your driver at the correct IRQL for the activities you're supposed to carry out. Although I haven't discussed many of these routines in detail, I want to give you an example of what I mean. Your first encounter with a new I/O request occurs when the I/O Manager calls one of your dispatch routines to process an IRP. The call usually occurs at PASSIVE_LEVEL because you might need to block the calling thread and you might need to call any support routine at all. You can't block a thread at a higher IRQL, of course, and PASSIVE_LEVEL is the level at which there are the fewest restrictions on the support routines you can call.

NOTE
Driver dispatch routines usually execute at PASSIVE_LEVEL but not always. You can designate that you want to receive IRP_MJ_POWER requests at DISPATCH_LEVEL by setting the DO_POWER_INRUSH flag, or by clearing the DO_POWER_PAGABLE flag, in a device object. Sometimes a driver architecture requires that other drivers be able to send certain IRPs at DISPATCH_LEVEL. The USB bus driver, for example, accepts data transfer requests at DISPATCH_LEVEL or below. A standard serial-port driver accepts any read, write, or control operation at or below DISPATCH_LEVEL.

If your dispatch routine queues the IRP by calling IoStartPacket, your next encounter with the request will be when the I/O Manager calls your StartIo routine. This call occurs at DISPATCH_LEVEL because the system needs to access the queue of I/O requests without interference from the other routines that are inserting and removing IRPs from the queue. As I'll discuss later in this chapter, queue access occurs under protection of a spin lock, and that carries with it execution at DISPATCH_LEVEL.

Later on, your device might generate an interrupt, whereupon your interrupt service routine (ISR) will be called at DIRQL. It's likely that some registers in your device can't safely be shared. If you access those registers only at DIRQL, you can be sure that no one can interfere with your ISR on a single-CPU computer. If other parts of your driver need to access these crucial hardware registers, you must guarantee that those other parts execute only at DIRQL. The KeSynchronizeExecution service function helps you enforce that rule, and I'll discuss it in Chapter 7 in connection with interrupt handling.

Still later, you might arrange to have a DPC routine called. DPC routines execute at DISPATCH_LEVEL because, among other things, they need to access your IRP queue to remove the next request from a queue and pass it to your StartIo routine. You call the IoStartNextPacket service routine to extract the next request from the queue, and it must be called at DISPATCH_LEVEL. It might call your StartIo routine before returning. Notice how neatly the IRQL requirements dovetail here: queue access, the call to IoStartNextPacket, and the possible call to StartIo are all required to occur at DISPATCH_LEVEL, and that's the level at which the system calls the DPC routine.

Although it's possible for you to explicitly control IRQL (and I'll explain how in the next section), there's seldom any reason to do so because of the correspondence between your needs and the level at which the system calls you. Consequently, you don't need to get hung up on which IRQL you're executing at from moment to moment: it's almost surely the correct level for the work you're supposed to do right then.

Explicitly Controlling IRQL

When necessary, you can raise and subsequently lower the IRQL on the current processor by calling KeRaiseIrql and KeLowerIrql. For example, from within a routine running at PASSIVE_LEVEL:

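    KIRQL oldirql;                                  // holds the IRQL to restore later
    ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);   // a raise must never lower the level
    KeRaiseIrql(DISPATCH_LEVEL, &oldirql);          // raise, saving the previous level
    // ... code that must execute at elevated IRQL ...
    KeLowerIrql(oldirql);                           // return to the previous level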

KIRQL is the typedef name for an integer that holds an IRQL value. We'll need a variable to hold the current IRQL, so we declare it this way.
This ASSERT expresses a necessary condition for calling KeRaiseIrql: the new IRQL must be greater than or equal to the current level. If this relation isn't true, KeRaiseIrql will bugcheck (that is, report a fatal error via a blue screen of death).
KeRaiseIrql raises the current IRQL to the level specified by the first argument. It also saves the current IRQL at the location pointed to by the second argument. In this example, we're raising IRQL to DISPATCH_LEVEL and saving the current level in oldirql.
After executing whatever code we desired to execute at elevated IRQL, we lower the request level back to its previous value by calling KeLowerIrql and specifying the oldirql value previously returned by KeRaiseIrql.

After raising the IRQL, you should eventually restore it to the original value. Otherwise, various assumptions made by code you call later or by the code that called you can later turn out to be incorrect. The DDK documentation says that you must always call KeLowerIrql with the same value as that returned by the immediately preceding call to KeRaiseIrql, but this information isn't exactly right. The only rule that KeLowerIrql actually applies is that the new IRQL must be less than or equal to the current one. You can lower the IRQL in steps if you want to.
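The raise and lower rules just described can be modeled in a few lines. This is a user-mode sketch under stated assumptions, not the kernel's implementation: sim_processor, sim_raise_irql, and sim_lower_irql are hypothetical names, the numeric levels are illustrative, and the model returns false where the text says the real routine would bugcheck or reject the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative IRQL numbers for a user-mode model only. */
#define SIM_PASSIVE_LEVEL  0u
#define SIM_APC_LEVEL      1u
#define SIM_DISPATCH_LEVEL 2u
#define SIM_HIGH_LEVEL     31u

/* One simulated processor. Each CPU has its own IRQL. */
typedef struct {
    unsigned irql;
} sim_processor;

/* Model of raising IRQL: the new level must be greater than or equal to
   the current one. On success the previous level is saved in *old_irql. */
bool sim_raise_irql(sim_processor *cpu, unsigned new_irql, unsigned *old_irql)
{
    assert(cpu != NULL && old_irql != NULL && new_irql <= SIM_HIGH_LEVEL);
    if (new_irql < cpu->irql)
        return false;           /* the real routine would bugcheck here */
    *old_irql = cpu->irql;
    cpu->irql = new_irql;
    return true;
}

/* Model of lowering IRQL: the only rule applied is that the new level
   must be less than or equal to the current one. */
bool sim_lower_irql(sim_processor *cpu, unsigned new_irql)
{
    assert(cpu != NULL);
    if (new_irql > cpu->irql)
        return false;
    cpu->irql = new_irql;
    return true;
}
```

Starting at SIM_PASSIVE_LEVEL, a raise to SIM_DISPATCH_LEVEL followed by a lower to SIM_APC_LEVEL and then to the saved level succeeds at every step in this model, which is the stepwise lowering the text allows. The model does not capture the separate obligation never to drop below the level at which the system called you.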

It's a mistake (and a big one!) to lower IRQL below whatever it was when a system routine called your driver, even if you raise it back before returning. Such a break in synchronization might allow some activity to preempt you and interfere with a data object that your caller assumed would remain inviolate.