An Archetypal Synchronization Problem

A hackneyed example will motivate this discussion. Suppose your driver has a static integer variable that you use for some purpose, say, to count the number of I/O requests that are currently outstanding:

static LONG lActiveRequests;

Suppose further that you increment this variable when you receive a request and decrement it when you later complete the request:

NTSTATUS DispatchPnp(PDEVICE_OBJECT fdo, PIRP Irp)
{
  ++lActiveRequests;
  ... // process PNP request
  --lActiveRequests;
}

I'm sure you recognize already that a counter such as this one ought not to be a static variable: it should be a member of your device extension so that each device object has its own unique counter. Bear with me, and pretend that your driver always manages only a single device. To make the example more meaningful, suppose finally that a function in your driver will be called when it's time to delete your device object. You might want to defer the operation until no more requests are outstanding, so you might insert a test of the counter:

NTSTATUS HandleRemoveDevice(PDEVICE_OBJECT fdo, PIRP Irp)
{
  if (lActiveRequests)
    <wait for all requests to complete>
  IoDeleteDevice(fdo);
}

This example describes a real problem, by the way, which we'll tackle in Chapter 6 in our discussion of Plug and Play (PnP) requests. The I/O Manager can try to remove one of our devices at a time when requests are active, and we need to guard against that by keeping some sort of counter. I'll show you in Chapter 6 how to use IoAcquireRemoveLock and some related functions to solve the problem.

A horrible synchronization problem lurks in the code fragments I just showed you, but it becomes apparent only if you look behind the increment and decrement operations inside DispatchPnp. On an x86 processor, the compiler might implement them using these instructions:

; ++lActiveRequests;
mov eax, lActiveRequests
add eax, 1
mov lActiveRequests, eax

; --lActiveRequests;
mov eax, lActiveRequests
sub eax, 1
mov lActiveRequests, eax

To expose the synchronization problem, let's consider first what might go wrong on a single CPU. Imagine two threads that are both trying to advance through DispatchPnp at roughly the same time. We know they're not both executing truly simultaneously because we have only a single CPU for them to share. But imagine that one of the threads is executing near the end of the function and manages to load the current contents of lActiveRequests into the EAX register just before the other thread preempts it. Suppose lActiveRequests equals 2 at that instant. As part of the thread switch, the operating system saves the EAX register (containing the value 2) as part of the outgoing thread's context image somewhere in main memory.

NOTE
The point being made in the text isn't limited to thread preemption that occurs as a result of a time slice expiring. Threads can also involuntarily lose control because of page faults, changes in CPU affinity, or priority changes instigated by outside agents. Think, therefore, of preemption as being an all-encompassing term that includes all means of giving control of a CPU to another thread without explicit permission from the currently running thread.

Now imagine that the other thread manages to get past the incrementing code at the beginning of DispatchPnp. It will increment lActiveRequests from 2 to 3 (because the first thread never got to update the variable). If the first thread preempts this other thread, the operating system will restore the first thread's context, which includes the value 2 in the EAX register. The first thread now proceeds to subtract 1 from EAX and store the result back in lActiveRequests. At this point, lActiveRequests contains the value 1, which is incorrect: one request has completed and another has arrived, so the counter should still be 2. Somewhere down the road, we might prematurely delete our device object because we've effectively lost track of one I/O request.
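The interleaving just described can be replayed deterministically in ordinary user-mode C. The following is only a sketch: the function name simulate_lost_update and the variables eax_a and eax_b (standing in for each thread's saved EAX register) are hypothetical, and no real threads or driver calls are involved.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's counter; this is not driver code. */
static long lActiveRequests;

/* Replays the single-CPU interleaving step by step and returns the
   final (incorrect) value of the counter. */
long simulate_lost_update(void)
{
    lActiveRequests = 2;           /* two requests are outstanding */

    /* Thread A, near the end of DispatchPnp, starts its decrement. */
    long eax_a = lActiveRequests;  /* mov eax, lActiveRequests (EAX = 2) */
    /* Thread A is preempted here; EAX is saved in its context image. */

    /* Thread B, at the start of DispatchPnp, runs its whole increment. */
    long eax_b = lActiveRequests;  /* mov eax, lActiveRequests */
    eax_b += 1;                    /* add eax, 1 */
    lActiveRequests = eax_b;       /* mov lActiveRequests, eax */
    assert(lActiveRequests == 3);

    /* Thread A resumes with EAX restored to the stale value 2. */
    eax_a -= 1;                    /* sub eax, 1 */
    lActiveRequests = eax_a;       /* mov lActiveRequests, eax */

    return lActiveRequests;        /* 1, although the right answer is 2 */
}
```

Thread B's increment is simply overwritten when thread A stores its stale result, which is exactly the lost request described above.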

Solving this particular problem is easy on an x86 computer: we just replace the load/add/store and load/subtract/store instruction sequences with atomic instructions:

; ++lActiveRequests;
inc lActiveRequests

; --lActiveRequests;
dec lActiveRequests

On an Intel x86, the INC and DEC instructions cannot be interrupted, so there will never be a case in which a thread can be preempted in the middle of updating the counter. As it stands, though, this code still isn't safe in a multiprocessor environment because INC and DEC are implemented in several microcode steps. It's possible for two different CPUs to be executing their microcode just slightly out of step such that one of them ends up updating a stale value. The multi-CPU problem can also be avoided in the x86 architecture by using a LOCK prefix:

; ++lActiveRequests;
lock inc lActiveRequests

; --lActiveRequests;
lock dec lActiveRequests

The LOCK instruction prefix locks out all other CPUs while the microcode for the current instruction executes, thereby guaranteeing data integrity.
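You rarely write the LOCK prefix by hand in C. As a portable sketch of the same idea, the C11 atomic operations below perform each update as one indivisible read-modify-write; on x86, compilers typically implement them with a locked instruction. The names active_requests, request_begin, and request_end are hypothetical, and this is user-mode code, not a driver. In a WDM driver, the corresponding tools would be InterlockedIncrement and InterlockedDecrement.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter mirroring lActiveRequests in the text.
   Static atomic objects start out at zero. */
static atomic_long active_requests;

/* Atomically counts a newly received request; returns the new count. */
long request_begin(void)
{
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(&active_requests, 1) + 1;
}

/* Atomically counts a completed request; returns the new count. */
long request_end(void)
{
    long previous = atomic_fetch_sub(&active_requests, 1);
    assert(previous > 0);  /* a completion without a matching receipt */
    return previous - 1;
}
```

Because each function returns the value it produced, a caller can learn that the count has just reached zero without reading the shared variable a second time.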

Not all synchronization problems have such an easy solution, unfortunately. The point of this example isn't to demonstrate how to solve one simple problem on one of the platforms where Windows XP runs but rather to illustrate the two sources of difficulty: preemption of one thread by another in the middle of a state change and simultaneous execution of conflicting state-change operations. We can avoid difficulty by judiciously using synchronization primitives, such as mutual exclusion objects, to block other threads while our thread accesses shared data. At times when thread blocking is impermissible, we can avoid preemption by using the IRQL priority scheme, and we can prevent simultaneous execution by judiciously using spin locks.
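To make the idea of a spin lock concrete, here is a minimal user-mode sketch built on the C11 atomic_flag type. Everything named demo_ or guarded_ is hypothetical and exists only for illustration. A real driver would use a KSPIN_LOCK with KeAcquireSpinLock and KeReleaseSpinLock, which also raise and restore IRQL, something this sketch makes no attempt to model.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-mode spin lock; not the kernel's KSPIN_LOCK. */
typedef struct {
    atomic_flag held;
} demo_spin_lock;

static demo_spin_lock counter_lock = { ATOMIC_FLAG_INIT };
static long guarded_count;  /* shared data protected by counter_lock */

/* Tries once to take the lock; returns true if it succeeded. */
bool demo_try_acquire(demo_spin_lock *lock)
{
    /* test_and_set returns the previous state: false means the lock
       was free and now belongs to the caller. */
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Busy-waits until the lock becomes available. */
void demo_acquire(demo_spin_lock *lock)
{
    while (!demo_try_acquire(lock)) {
        /* spin: another CPU owns the lock right now */
    }
}

void demo_release(demo_spin_lock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/* Changes the shared count while holding the lock; returns the new value. */
long guarded_adjust(long delta)
{
    demo_acquire(&counter_lock);
    guarded_count += delta;
    long result = guarded_count;
    demo_release(&counter_lock);
    return result;
}
```

The waiting loop never gives up the CPU, which is why a lock of this kind suits only very short critical sections and why it helps only when the owner is running on a different processor.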