Interrupt Request Levels

An IRQL defines the hardware priority at which a processor operates at any given time. When a processor is running at a given IRQL, interrupts at that IRQL and lower are masked off on the processor. A thread running at a low IRQL can be interrupted to run code at a higher IRQL, but a thread running at a higher IRQL cannot be interrupted to run code at an equal or lower IRQL. For example, a processor that is running at IRQL DISPATCH_LEVEL can be interrupted only by a request at an IRQL greater than DISPATCH_LEVEL.
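The masking rule described above reduces to a single comparison. The following is a toy user-mode C model, not kernel code: the type irql_t, the function irql_can_interrupt, and the MODEL_ constants are hypothetical names chosen for illustration, and the numeric values are assumptions that follow the x86 and x64 levels for PASSIVE_LEVEL, APC_LEVEL, and DISPATCH_LEVEL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model type; real drivers use KIRQL from the WDK headers. */
typedef unsigned char irql_t;

enum {
    MODEL_PASSIVE_LEVEL  = 0,
    MODEL_APC_LEVEL      = 1,
    MODEL_DISPATCH_LEVEL = 2,
    MODEL_MAX_IRQL       = 31   /* assumed upper bound for this model */
};

/* Returns true when a request at `request` would be delivered on a
   processor currently running at `current`. Only strictly higher
   levels get through; equal and lower levels are masked. */
static bool irql_can_interrupt(irql_t current, irql_t request)
{
    assert(current <= MODEL_MAX_IRQL && request <= MODEL_MAX_IRQL);
    return request > current;
}
```

In this model, a processor at DISPATCH_LEVEL rejects a request at DISPATCH_LEVEL or APC_LEVEL but accepts one at level 3 or above, which matches the example in the preceding paragraph.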

The number of IRQLs and their specific values are processor dependent. The x64 and Intel Itanium architectures have 16 IRQLs, and x86-based architectures have 32 IRQLs. The difference is due primarily to the types of interrupt controllers that are used with each architecture. Table 15-1 provides a list of the IRQLs for x86, x64, and Intel Itanium processors.

Table 15-1: Interrupt Request Levels for Processor Types

IRQL | x86 value | x64 value | Itanium value | Description
PASSIVE_LEVEL | 0 | 0 | 0 | User threads and most kernel-mode operations
APC_LEVEL | 1 | 1 | 1 | Asynchronous procedure calls and page faults
DISPATCH_LEVEL | 2 | 2 | 2 | Thread scheduler and DPCs
CMC_LEVEL | N/A | N/A | 3 | Correctable machine-check level (Itanium platforms only)
Device interrupt levels (DIRQL) | 3–26 | 3–11 | 4–11 | Device interrupts
PC_LEVEL | N/A | N/A | 12 | Performance counter (Itanium platforms only)
PROFILE_LEVEL | 27 | 15 | 15 | Profiling timer for releases earlier than Windows 2000
SYNCH_LEVEL | 27 | 13 | 13 | Synchronization of code and instruction streams across processors
CLOCK_LEVEL | N/A | 13 | 13 | Clock timer
CLOCK2_LEVEL | 28 | N/A | N/A | Clock timer for x86 hardware
IPI_LEVEL | 29 | 14 | 14 | Interprocessor interrupt for enforcing cache consistency
POWER_LEVEL | 30 | 15 | 14 | Power failure
HIGH_LEVEL | 31 | 15 | 15 | Machine checks and catastrophic errors; profiling timer for Windows XP and later releases

The system schedules all threads to run at IRQLs below DISPATCH_LEVEL, and the system's thread scheduler itself (also called "the dispatcher") runs at DISPATCH_LEVEL. Consequently, a thread that is running at or above DISPATCH_LEVEL has, in effect, exclusive use of the current processor. Because DISPATCH_LEVEL interrupts are masked off on the processor, the thread scheduler cannot run on that processor and thus cannot schedule any other thread.

On a multiprocessor system, each processor can be running at a different IRQL. Therefore, one processor could run a driver's EvtInterruptIsr function at DIRQL while a second processor runs driver code in a worker thread at PASSIVE_LEVEL. More than one thread could thus attempt to access shared data simultaneously. If both threads only read the data, no locks are required. However, if either thread writes the data, the driver must serialize access by using a lock that raises the IRQL to the highest level at which any code that accesses the data can run. In this example, the code that runs at PASSIVE_LEVEL in the worker thread acquires the interrupt spin lock, which raises the IRQL to DIRQL, before it accesses the shared data.
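The rule for choosing a lock level can be sketched as taking the maximum over every code path that touches the shared data. This is a toy user-mode C model under stated assumptions: required_lock_irql and irql_t are hypothetical names, and the DIRQL value 5 used in the usage note is an arbitrary example, not a value assigned to any real device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model type; real drivers use KIRQL. */
typedef unsigned char irql_t;

/* Returns the IRQL that a lock must raise to: the highest level at
   which any code that accesses the shared data can run. */
static irql_t required_lock_irql(const irql_t *accessor_levels, size_t count)
{
    assert(accessor_levels != NULL && count > 0);

    irql_t highest = accessor_levels[0];
    for (size_t i = 1; i < count; i++) {
        if (accessor_levels[i] > highest) {
            highest = accessor_levels[i];
        }
    }
    return highest;
}
```

For a worker thread at level 0 and an interrupt service routine at an assumed DIRQL of 5, the function returns 5, so the worker thread must take the lock that raises to DIRQL rather than one that stays at a lower level.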

Processor-specific and Thread-specific IRQLs

IRQLs can be considered processor specific or thread specific. IRQLs at or above DISPATCH_LEVEL are processor specific. Hardware and software interrupts at these levels are targeted at individual processors. Drivers commonly use the following processor-specific IRQLs:

  • DISPATCH_LEVEL

  • DIRQL

  • HIGH_LEVEL

IRQLs below DISPATCH_LEVEL are thread specific. Software interrupts at these levels are targeted at individual threads. Drivers use the following thread-specific IRQLs:

  • PASSIVE_LEVEL

  • APC_LEVEL

The system thread scheduler considers only thread priority, and not IRQL, when preempting a thread. If a thread that is running at IRQL APC_LEVEL blocks, the scheduler might select for that processor a new thread that was previously running at PASSIVE_LEVEL.

Although only two thread-specific IRQL values are defined, the system actually implements three levels. The system implements an intermediate level between PASSIVE_LEVEL and APC_LEVEL. Code running at this level is said to be in a critical region. Code that is running at PASSIVE_LEVEL calls KeEnterCriticalRegion to raise the IRQL to this level and calls KeLeaveCriticalRegion to return the IRQL to PASSIVE_LEVEL.

The following sections provide more information about the operating environment for driver code at each of these levels.

IRQL PASSIVE_LEVEL

When the processor is operating at PASSIVE_LEVEL, Windows uses the scheduling priorities of the current threads to determine which thread to run. PASSIVE_LEVEL is the processor's normal operating state. Any thread that is running at PASSIVE_LEVEL is considered preemptible, because it can be replaced by a thread that has a higher scheduling priority. A thread that is running at PASSIVE_LEVEL is also considered interruptible, because it can be interrupted by a request at a higher IRQL.

Occasionally, driver code that is running at IRQL PASSIVE_LEVEL must call a system service function or perform some other action that requires running at a higher IRQL, usually DISPATCH_LEVEL. Before making the call or performing the action, the driver must raise its IRQL to the required level, and immediately after completing the action, the driver must lower the IRQL. To raise and lower IRQL, a driver calls KeRaiseIrql and KeLowerIrql, respectively.
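The raise-then-restore pattern can be sketched as follows. This is a toy user-mode C model rather than kernel code: model_cpu, model_raise_irql, model_lower_irql, and model_call_at_dispatch are hypothetical names. The two helpers only mirror the shape of KeRaiseIrql, which returns the previous level through a pointer, and KeLowerIrql, which takes the level to restore.

```c
#include <assert.h>

typedef unsigned char irql_t;   /* hypothetical stand-in for KIRQL */

enum { MODEL_PASSIVE_LEVEL = 0, MODEL_DISPATCH_LEVEL = 2 };

/* One simulated processor. */
typedef struct {
    irql_t irql;
} model_cpu;

/* Raises the simulated IRQL and hands back the previous level. */
static void model_raise_irql(model_cpu *cpu, irql_t new_irql, irql_t *old_irql)
{
    assert(new_irql >= cpu->irql);      /* a raise must never lower */
    *old_irql = cpu->irql;
    cpu->irql = new_irql;
}

/* Restores a previously saved level. */
static void model_lower_irql(model_cpu *cpu, irql_t saved_irql)
{
    assert(saved_irql <= cpu->irql);    /* a lower must never raise */
    cpu->irql = saved_irql;
}

/* Performs an action at DISPATCH_LEVEL and restores the caller's level
   immediately afterward. Returns the level observed during the action. */
static irql_t model_call_at_dispatch(model_cpu *cpu)
{
    irql_t old_irql;

    model_raise_irql(cpu, MODEL_DISPATCH_LEVEL, &old_irql);
    irql_t during = cpu->irql;          /* the restricted work goes here */
    model_lower_irql(cpu, old_irql);

    return during;
}
```

Saving the old level and passing it back, instead of hard-coding PASSIVE_LEVEL on the way down, keeps the helper correct even when the caller was already above PASSIVE_LEVEL.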

Code that is running at PASSIVE_LEVEL is considered to be working on behalf of the current thread. An application that creates a thread can suspend that thread while the thread is running kernel-mode code at PASSIVE_LEVEL. Therefore, driver code that acquires a lock at PASSIVE_LEVEL must ensure that the thread in which it is running cannot be suspended while it holds the lock; thread suspension would disable access to the driver's device. A driver can resolve this problem by using a lock that raises IRQL or by entering a critical region whenever it tries to acquire a PASSIVE_LEVEL lock.

Chapter 10, "Synchronization," describes how a KMDF driver can constrain certain queue and file object callback functions to run at IRQL PASSIVE_LEVEL.

IRQL PASSIVE_LEVEL in a Critical Region

Code that is running at PASSIVE_LEVEL in a critical region is effectively running at an intermediate level between PASSIVE_LEVEL and APC_LEVEL. Calls to KeGetCurrentIrql return PASSIVE_LEVEL. Driver code can determine whether it is operating in a critical region by calling the KeAreApcsDisabled function, which is available in Windows XP and later releases.

Asynchronous procedure calls (APCs) are software interrupts that are targeted at a specific thread. The system uses APCs to perform work in the context of a particular thread, such as writing back the status of an I/O operation to the requesting application. How a target thread responds to APCs depends on the thread's state and the type of APC.

Driver code that is running above PASSIVE_LEVEL (either at PASSIVE_LEVEL in a critical region or at APC_LEVEL or higher) cannot be suspended. If your driver sets the PASSIVE_LEVEL execution constraint for device or file object callbacks, the framework synchronizes execution of those callbacks by entering a critical region. The critical region prevents a potential denial-of-service attack that could result from thread suspension.

Almost every operation that a driver can perform at PASSIVE_LEVEL can also be performed in a critical region. Two notable exceptions are raising hard errors and opening a file on storage media.

IRQL APC_LEVEL

APC_LEVEL is a thread-specific IRQL that is most commonly associated with paging I/O. Applications cannot suspend code that is running at APC_LEVEL. The system implements fast mutexes, a type of synchronization mechanism, at APC_LEVEL. The ExAcquireFastMutex function raises the IRQL to APC_LEVEL, and ExReleaseFastMutex returns the IRQL to its original value.

The only difference between a thread that is running at PASSIVE_LEVEL with APCs disabled and a thread that is running at APC_LEVEL is that, while running at APC_LEVEL, the thread cannot be interrupted by a special kernel-mode APC, which the system delivers when an I/O request is complete.
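The three thread-specific levels can be compared in a small model. This is a toy user-mode C sketch, not kernel code, and every name in it (model_thread, model_enter_critical_region, and so on) is hypothetical. It assumes that a critical region blocks normal kernel-mode APCs without changing the reported IRQL, and it deliberately ignores user-mode APCs and guarded regions.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char irql_t;   /* hypothetical stand-in for KIRQL */

enum { MODEL_PASSIVE_LEVEL = 0, MODEL_APC_LEVEL = 1 };

typedef struct {
    irql_t irql;                /* what a current-IRQL query would report */
    int critical_region_depth;  /* greater than 0 inside a critical region */
} model_thread;

static void model_enter_critical_region(model_thread *t)
{
    t->critical_region_depth++;
}

static void model_leave_critical_region(model_thread *t)
{
    assert(t->critical_region_depth > 0);   /* must pair with an enter */
    t->critical_region_depth--;
}

/* Normal kernel-mode APCs need PASSIVE_LEVEL outside any critical region. */
static bool model_normal_kernel_apc_deliverable(const model_thread *t)
{
    return t->irql < MODEL_APC_LEVEL && t->critical_region_depth == 0;
}

/* In this model, special kernel-mode APCs are blocked only by IRQL. */
static bool model_special_kernel_apc_deliverable(const model_thread *t)
{
    return t->irql < MODEL_APC_LEVEL;
}
```

In the model, entering a critical region leaves the reported IRQL at PASSIVE_LEVEL and stops normal kernel-mode APCs, while special kernel-mode APCs stop only when the thread reaches APC_LEVEL.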

IRQL DISPATCH_LEVEL

DISPATCH_LEVEL is the highest software interrupt level and the first processor-specific level. The Windows dispatcher runs at IRQL DISPATCH_LEVEL. Some other kernel-mode support functions, some driver functions, and all DPCs also run at IRQL DISPATCH_LEVEL. While the processor operates at this level, one thread cannot preempt another; only a hardware interrupt can interrupt the running thread. To maximize overall system throughput, driver code that runs at DISPATCH_LEVEL should perform only the minimum required processing.

Because code that is running at DISPATCH_LEVEL cannot be preempted, the operations that a driver can perform at DISPATCH_LEVEL are restricted. Any code that must wait for an object that another thread sets or signals asynchronously (such as an event, semaphore, mutex, or timer) cannot run at DISPATCH_LEVEL because the waiting thread cannot block while waiting for the other thread to perform the action. Waiting for a nonzero period on such an object while at DISPATCH_LEVEL causes the system to deadlock and eventually to crash.

DPCs are, in effect, software interrupts targeted at processors. DPCs, including EvtInterruptDpc and EvtDpcFunc functions, are always called at DISPATCH_LEVEL in an arbitrary thread context. Drivers typically use DPCs for the following purposes:

  • To perform additional processing after a device interrupts.

    Such DPCs are either EvtInterruptDpc or EvtDpcFunc callbacks that are queued by the driver's EvtInterruptIsr callback.

  • To handle device time-outs.

    Such a DPC is an EvtTimerFunc callback that the framework queues upon expiration of a timer that the driver started with the WdfTimerStart method.

The kernel maintains a queue of DPCs for each processor and runs DPCs from this queue just before the processor's IRQL drops below DISPATCH_LEVEL. A DPC is assigned to the queue for the same processor on which the code that queues it is running.
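The per-processor queue behavior can be sketched with a small simulation. This is a toy user-mode C model under stated assumptions: model_cpu, model_queue_dpc, and model_lower_irql are hypothetical names, DPCs are represented by integer ids, and the model covers only DPCs that are queued while the processor is at or above DISPATCH_LEVEL.

```c
#include <assert.h>
#include <stddef.h>

enum {
    MODEL_PASSIVE_LEVEL  = 0,
    MODEL_DISPATCH_LEVEL = 2,
    MODEL_DPC_CAPACITY   = 8    /* arbitrary limit for the model */
};

/* One simulated processor with its own DPC queue. */
typedef struct {
    unsigned char irql;
    int pending[MODEL_DPC_CAPACITY];    /* FIFO of queued DPC ids */
    size_t pending_count;
    int ran[MODEL_DPC_CAPACITY];        /* ids in the order they ran */
    size_t ran_count;
} model_cpu;

/* Queues a DPC on the processor that is executing the caller. */
static void model_queue_dpc(model_cpu *cpu, int dpc_id)
{
    assert(cpu->pending_count < MODEL_DPC_CAPACITY);
    cpu->pending[cpu->pending_count++] = dpc_id;
}

/* Lowers the IRQL. When the target is below DISPATCH_LEVEL, the queue
   is drained at DISPATCH_LEVEL first, in FIFO order. */
static void model_lower_irql(model_cpu *cpu, unsigned char new_irql)
{
    assert(new_irql <= cpu->irql);

    if (new_irql < MODEL_DISPATCH_LEVEL && cpu->irql >= MODEL_DISPATCH_LEVEL) {
        cpu->irql = MODEL_DISPATCH_LEVEL;
        for (size_t i = 0; i < cpu->pending_count; i++) {
            assert(cpu->ran_count < MODEL_DPC_CAPACITY);
            cpu->ran[cpu->ran_count++] = cpu->pending[i];
        }
        cpu->pending_count = 0;
    }
    cpu->irql = new_irql;
}
```

A DPC queued on one simulated processor never appears in another processor's queue, and it stays pending while the processor drops only as far as DISPATCH_LEVEL.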

If a device interrupts while either its EvtInterruptDpc or EvtDpcFunc callback is running, its EvtInterruptIsr callback interrupts the DPC and queues a DPC object as it normally would. In a single-processor system, the DPC object is placed at the end of the single DPC queue, where it runs in sequence with any other DPCs in the queue after the EvtInterruptIsr callback and the current DPC complete.

In a multiprocessor system, however, the second interrupt could occur on a different processor, and the two DPCs (or the EvtInterruptIsr and a DPC) could run simultaneously. For example, assume a device interrupts on processor 1 while its EvtInterruptDpc function is running on processor 0. The system runs the EvtInterruptIsr function on processor 1 to handle the interrupt. When the EvtInterruptIsr function queues its EvtInterruptDpc function, the system places the DPC object into the DPC queue of processor 1. Thus, a driver's EvtInterruptIsr function can run at the same time as its DPC function, and the same DPC function can simultaneously run on two or more processors. If both functions attempt to access the same data simultaneously, serious errors can occur. Drivers must use interrupt spin locks to protect shared data in these scenarios by calling WdfInterruptAcquireLock or WdfInterruptSynchronize.

IRQL DIRQL

DIRQL describes the range of IRQLs that physical devices can generate. Each processor architecture has a range of DIRQLs, as shown earlier in Table 15-1. Multiple devices can interrupt at the same DIRQL. The DIRQL for each device is available to its driver in the translated resource list that the framework passes to the driver's EvtDevicePrepareHardware callback.

The following KMDF driver callback functions run at DIRQL:

  • EvtInterruptIsr

  • EvtInterruptSynchronize

  • EvtInterruptEnable and EvtInterruptDisable

Function drivers for physical devices that generate interrupts include these callbacks; filter drivers rarely do.

While running at DIRQL, driver code must conform to the guidelines described in "Guidelines for Running at IRQL DISPATCH_LEVEL or Higher" later in this chapter.

Tip  Microsoft made several enhancements to the interrupt architecture in Windows Vista, as described in "Interrupt Architecture Enhancements in Windows," online at http://go.microsoft.com/fwlink/?LinkId=81584.

IRQL HIGH_LEVEL

Certain bug-check and nonmaskable interrupt (NMI) callback functions run at IRQL HIGH_LEVEL. Because no interrupts can occur at IRQL HIGH_LEVEL, these functions are guaranteed to run without interruption.

The lack of interrupts, however, means that actions of the callback functions are severely restricted. Code that runs at HIGH_LEVEL must not allocate memory, use any synchronization mechanisms, or call any functions that run at IRQL<=DISPATCH_LEVEL. Additional restrictions are described in the following section.

Tip  See "Writing a Bug Check Callback Routine" in the WDK for information about writing bug check and NMI callback functions, online at http://go.microsoft.com/fwlink/?LinkId=81587. See also "KeRegisterNmiCallback" in the WDK, online at http://go.microsoft.com/fwlink/?LinkId=81588.

Guidelines for Running at IRQL DISPATCH_LEVEL or Higher

The following guidelines apply to driver code that runs at IRQL DISPATCH_LEVEL or above:

  • Use only nonpageable data and code; do not perform any actions that require paging.

    Windows must wait for paging I/O operations to complete, and such waits cannot be performed at DISPATCH_LEVEL or higher. For the same reason, any driver function that obtains a spin lock must not be pageable.

    A driver can store data that it will access at IRQL>=DISPATCH_LEVEL in the following locations:

    • The context area of the device object, the DPC object, or another object that is passed to the callback function.

    • The kernel stack, for small amounts of data that do not need to persist beyond the lifetime of the function.

    • Nonpaged memory that the driver allocates. For large amounts of data, such as the space required for I/O buffers, drivers should create WDF memory objects by calling WdfMemoryCreate or should call the ExAllocateXxx or MmAllocateXxx functions, as appropriate.

  • Never wait for a nonzero period on a WDF wait lock or a kernel dispatcher object such as an event, semaphore, timer, kernel mutex, thread, process, or file object.

  • Do not call functions that convert strings from ANSI to UNICODE, or vice versa. These functions are in pageable code. The WdfStringXxx methods and the kernel-mode safe string functions can be called only at PASSIVE_LEVEL.

  • Never call WdfSpinLockRelease unless you have previously called WdfSpinLockAcquire.
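Several of the guidelines above are simple predicates on the current IRQL. The following toy user-mode C model expresses three of them. The function names and the irql_t type are hypothetical, the timeout unit is an assumption of the model, and a real driver would rely on the documented IRQL requirement of each function it calls rather than on checks like these.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char irql_t;   /* hypothetical stand-in for KIRQL */

enum {
    MODEL_PASSIVE_LEVEL  = 0,
    MODEL_DISPATCH_LEVEL = 2,
    MODEL_MAX_IRQL       = 31   /* assumed upper bound for this model */
};

/* Pageable code and data may be touched only below DISPATCH_LEVEL. */
static bool model_pageable_access_allowed(irql_t irql)
{
    assert(irql <= MODEL_MAX_IRQL);
    return irql < MODEL_DISPATCH_LEVEL;
}

/* At DISPATCH_LEVEL or higher, only a zero-length wait is permitted. */
static bool model_wait_allowed(irql_t irql, unsigned long timeout_ms)
{
    assert(irql <= MODEL_MAX_IRQL);
    return irql < MODEL_DISPATCH_LEVEL || timeout_ms == 0;
}

/* String conversion and safe string functions need PASSIVE_LEVEL. */
static bool model_string_conversion_allowed(irql_t irql)
{
    assert(irql <= MODEL_MAX_IRQL);
    return irql == MODEL_PASSIVE_LEVEL;
}
```

The model treats APC_LEVEL differently in the last check: paging and waiting are still allowed there, but the string functions named in the guideline are not.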

Calls to Functions that Run at a Lower IRQL

If a high-IRQL function must initiate some time-consuming processing, it arranges to complete the processing at a lower IRQL. For example, because EvtInterruptIsr callbacks run at DIRQL, these callbacks must do as little processing as possible, so they queue a DPC to complete the processing at DISPATCH_LEVEL.

Sometimes, driver code that runs at IRQL>=DISPATCH_LEVEL must communicate with code at a lower IRQL. For example, a USB driver might need to reset its device if errors occur during completion of an I/O operation. CompletionRoutine callback functions can be called at DISPATCH_LEVEL, but the synchronous USB pipe and device reset methods must be called at PASSIVE_LEVEL. In this situation, the driver can use a work item. An EvtWorkItem callback function contains the code that calls the reset method. The driver creates a WDF work item object that is associated with the function and then queues the work item. The framework adds the work item function to the system's work item queue, and the system later runs the function in the context of a system thread at IRQL PASSIVE_LEVEL. In addition, if a driver sets the execution level for I/O event and file callbacks to WdfExecutionLevelPassive, the framework invokes those callbacks from a work item.
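The hand-off from a completion routine to a work item can be sketched as a two-step simulation. This is a toy user-mode C model, not framework code: model_device, model_completion_routine, model_run_work_items, and model_reset_device are hypothetical stand-ins for the driver's completion routine, the system worker thread, and a synchronous reset method, and a single flag replaces the real work item queue.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char irql_t;   /* hypothetical stand-in for KIRQL */

enum { MODEL_PASSIVE_LEVEL = 0, MODEL_DISPATCH_LEVEL = 2 };

typedef struct {
    bool work_item_queued;  /* set at high IRQL, consumed at PASSIVE_LEVEL */
    int  reset_count;       /* number of simulated device resets */
} model_device;

/* Stand-in for a synchronous reset method that requires PASSIVE_LEVEL. */
static void model_reset_device(model_device *dev, irql_t current_irql)
{
    assert(current_irql == MODEL_PASSIVE_LEVEL);
    dev->reset_count++;
}

/* Stand-in for a completion routine, which can run at DISPATCH_LEVEL.
   It must not reset the device itself, so it only queues the work item. */
static void model_completion_routine(model_device *dev, bool io_failed)
{
    if (io_failed) {
        dev->work_item_queued = true;
    }
}

/* Stand-in for the system worker thread, which runs the work item
   callback later at PASSIVE_LEVEL. */
static void model_run_work_items(model_device *dev)
{
    if (dev->work_item_queued) {
        dev->work_item_queued = false;
        model_reset_device(dev, MODEL_PASSIVE_LEVEL);
    }
}
```

The reset happens only when the simulated worker runs, never inside the completion routine, which is the property that the work item mechanism provides.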

See "Work Items and Driver Threads" later in this chapter for more information.