The concept of mutual exclusion is a crucial one in operating systems development. It refers to the guarantee that one, and only one, thread can access a particular resource at a time. Mutual exclusion is necessary when a resource doesn't lend itself to shared access or when sharing would result in an unpredictable outcome. For example, if two threads copy a file to a printer port at the same time, their output could be interspersed. Similarly, if one thread reads a memory location while another one writes to it, the first thread will receive unpredictable data. In general, writable resources can't be shared without restrictions, whereas resources that aren't subject to modification can be shared. Figure 3-23 illustrates what happens when two threads running on different processors both write data to a circular queue.

Figure 3-23. Incorrect sharing of memory

Because the second thread got the value of the queue tail pointer before the first thread had finished updating it, the second thread inserted its data into the same location that the first thread had used, overwriting data and leaving one queue location empty. Even though this figure illustrates what could happen on a multiprocessor system, the same error could occur on a single-processor system if the operating system were to perform a context switch to the second thread before the first thread updated the queue tail pointer.

Sections of code that access a nonshareable resource are called critical sections. To ensure correct code, only one thread at a time can execute in a critical section. While one thread is writing to a file, updating a database, or modifying a shared variable, no other thread can be allowed to access the same resource. The pseudocode shown in Figure 3-23 is a critical section that incorrectly accesses a shared data structure without mutual exclusion.
The issue of mutual exclusion, although important for all operating systems, is especially important (and intricate) for a tightly coupled, symmetric multiprocessing (SMP) operating system such as Windows, in which the same system code runs simultaneously on more than one processor, sharing certain data structures stored in global memory. In Windows, it is the kernel's job to provide mechanisms that system code can use to prevent two threads from modifying the same structure at the same time. The kernel provides mutual-exclusion primitives that it and the rest of the executive use to synchronize their access to global data structures. Because the scheduler synchronizes access to its data structures at DPC/dispatch level IRQL, the kernel and executive cannot rely on synchronization mechanisms that would result in a page fault or reschedule operation to synchronize access to data structures when the IRQL is DPC/dispatch level or higher (levels known as an elevated or high IRQL). In the following sections, you'll find out how the kernel and executive use mutual exclusion to protect their global data structures when the IRQL is high and what mutual-exclusion and synchronization mechanisms the kernel and executive use when the IRQL is low (below DPC/dispatch level).

High-IRQL Synchronization

At various stages during its execution, the kernel must guarantee that one, and only one, processor at a time is executing within a critical section. Kernel critical sections are the code segments that modify a global data structure such as the kernel's dispatcher database or its DPC queue. The operating system can't function correctly unless the kernel can guarantee that threads access these data structures in a mutually exclusive manner. The biggest area of concern is interrupts. For example, the kernel might be updating a global data structure when an interrupt occurs whose interrupt-handling routine also modifies the structure.
Simple single-processor operating systems sometimes prevent such a scenario by disabling all interrupts each time they access global data, but the Windows kernel has a more sophisticated solution. Before using a global resource, the kernel temporarily masks those interrupts whose interrupt handlers also use the resource. It does so by raising the processor's IRQL to the highest level used by any potential interrupt source that accesses the global data. For example, an interrupt at DPC/dispatch level causes the dispatcher, which uses the dispatcher database, to run. Therefore, any other part of the kernel that uses the dispatcher database raises the IRQL to DPC/dispatch level, masking DPC/dispatch-level interrupts before using the dispatcher database.

This strategy is fine for a single-processor system, but it's inadequate for a multiprocessor configuration. Raising the IRQL on one processor doesn't prevent an interrupt from occurring on another processor. The kernel also needs to guarantee mutually exclusive access across several processors.

Interlocked Operations

The simplest form of synchronization mechanism relies on hardware support for multiprocessor-safe manipulation of integer values and for performing comparisons. These mechanisms include functions such as InterlockedIncrement, InterlockedDecrement, InterlockedExchange, and InterlockedCompareExchange. The InterlockedDecrement function, for example, uses the x86 lock instruction prefix (for example, lock xadd) to lock the multiprocessor bus during the subtraction operation so that another processor that's also modifying the memory location being decremented won't be able to modify it between the decrement's read of the original value and its write of the decremented value. This form of basic synchronization is used by the kernel and drivers.

Spinlocks

The mechanism the kernel uses to achieve multiprocessor mutual exclusion is called a spinlock.
A spinlock is a locking primitive associated with a global data structure, such as the DPC queue shown in Figure 3-24.

Figure 3-24. Using a spinlock

Before entering either critical section shown in the figure, the kernel must acquire the spinlock associated with the protected DPC queue. If the spinlock isn't free, the kernel keeps trying to acquire the lock until it succeeds. The spinlock gets its name from the fact that the kernel (and thus, the processor) is held in limbo, "spinning," until it gets the lock.

Spinlocks, like the data structures they protect, reside in global memory. The code to acquire and release a spinlock is written in assembly language for speed and to exploit whatever locking mechanism the underlying processor architecture provides. On many architectures, spinlocks are implemented with a hardware-supported test-and-set operation, which tests the value of a lock variable and acquires the lock in one atomic instruction. Testing and acquiring the lock in one instruction prevents a second thread from grabbing the lock between the time when the first thread tests the variable and the time when it acquires the lock.

All kernel-mode spinlocks in Windows have an associated IRQL that is always at DPC/dispatch level or higher. Thus, when a thread is trying to acquire a spinlock, all other activity at the spinlock's IRQL or lower ceases on that processor. Because thread dispatching happens at DPC/dispatch level, a thread that holds a spinlock is never preempted because the IRQL masks the dispatching mechanisms. This masking allows code executing in a critical section protected by a spinlock to continue executing so that it will release the lock quickly. The kernel uses spinlocks with great care, minimizing the number of instructions it executes while it holds a spinlock.

Note
The kernel makes spinlocks available to other parts of the executive through a set of kernel functions, including KeAcquireSpinlock and KeReleaseSpinlock. Device drivers, for example, require spinlocks in order to guarantee that device registers and other global data structures are accessed by only one part of a device driver (and from only one processor) at a time. Spinlocks are not for use by user programs; user programs should use the objects described in the next section.

Kernel spinlocks carry with them restrictions for code that uses them. Because spinlocks always have an IRQL of DPC/dispatch level or higher, as explained earlier, code holding a spinlock will crash the system if it attempts to make the scheduler perform a dispatch operation or if it causes a page fault.

Queued Spinlocks

A special type of spinlock called a queued spinlock is used in some circumstances instead of a standard spinlock. A queued spinlock is a form of spinlock that scales better on multiprocessors than a standard spinlock. In general, Windows will use standard spinlocks only when it expects there to be low contention for the lock.

A queued spinlock works like this: When a processor wants to acquire a queued spinlock that is currently held, it places its identifier in a queue associated with the spinlock. When the processor that's holding the spinlock releases it, it hands the lock over to the first processor identified in the queue. In the meantime, a processor waiting for a busy spinlock checks the status not of the spinlock itself but of a per-processor flag that the processor ahead of it in the queue sets to indicate that the waiting processor's turn has arrived.

The fact that queued spinlocks result in spinning on per-processor flags rather than global spinlocks has two effects. The first is that the multiprocessor's bus isn't as heavily trafficked by interprocessor synchronization.
The second is that instead of a random processor in a waiting group acquiring the spinlock, the queued spinlock enforces first-in, first-out (FIFO) ordering to the lock. FIFO ordering means more consistent performance across processors accessing the same locks. Windows defines a number of global queued spinlocks by storing pointers to them in an array contained in each processor's processor control region (PCR). A global spinlock can be acquired by calling KeAcquireQueuedSpinlock with the index into the PCR array at which the pointer to the spinlock is stored. The number of global spinlocks has grown in each release of the operating system, and the table of index definitions for them is published in the DDK header file Ntddk.h.
Instack Queued Spinlocks

In addition to using static queued spinlocks that are globally defined, the Windows XP and Windows Server 2003 kernels support dynamically allocated queued spinlocks with the KeAcquireInStackQueuedSpinlock and KeReleaseInStackQueuedSpinlock functions. Several components, including the cache manager, executive pool manager, and NTFS, take advantage of these types of locks, and the functions are documented in the DDK for use by third-party driver writers. KeAcquireInStackQueuedSpinlock takes a pointer to a spinlock data structure and a spinlock queue handle. The spinlock queue handle is actually a data structure in which the kernel stores information about the lock's status, including the lock's ownership and the queue of processors that might be waiting for the lock to become available.

Executive Interlocked Operations

The kernel supplies a number of simple synchronization functions constructed on spinlocks for more advanced operations, such as adding and removing entries from singly and doubly linked lists. Examples include ExInterlockedPopEntryList and ExInterlockedPushEntryList for singly linked lists, and ExInterlockedInsertHeadList and ExInterlockedRemoveHeadList for doubly linked lists. All these functions require a standard spinlock as a parameter and are used throughout the kernel and device drivers.

Low-IRQL Synchronization

Executive software outside the kernel also needs to synchronize access to global data structures in a multiprocessor environment. For example, the memory manager has only one page frame database, which it accesses as a global data structure, and device drivers need to ensure that they can gain exclusive access to their devices. By calling kernel functions, the executive can create a spinlock, acquire it, and release it. Spinlocks only partially fill the executive's needs for synchronization mechanisms, however.
Because waiting for a spinlock literally stalls a processor, spinlocks can be used only under the following strictly limited circumstances:

- The protected resource must be accessed quickly and without complicated interactions with other code.
- The critical section code can't be paged out of memory, make references to pageable data, call external procedures (including system services), or generate interrupts or exceptions.
These restrictions are confining and can't be met under all circumstances. Furthermore, the executive needs to perform other types of synchronization in addition to mutual exclusion, and it must also provide synchronization mechanisms to user mode. There are several additional synchronization mechanisms for use when spinlocks are not suitable:

- Kernel dispatcher objects
- Fast mutexes and guarded mutexes
- Executive resources
- Push locks
Table 3-9 serves as a reference that compares and contrasts the capabilities of these mechanisms and their interaction with kernel-mode APC delivery.
Kernel Dispatcher Objects

The kernel furnishes additional synchronization mechanisms to the executive in the form of kernel objects, known collectively as dispatcher objects. The user-visible synchronization objects acquire their synchronization capabilities from these kernel dispatcher objects. Each user-visible object that supports synchronization encapsulates at least one kernel dispatcher object. The executive's synchronization semantics are visible to Windows programmers through the WaitForSingleObject and WaitForMultipleObjects functions, which the Windows subsystem implements by calling analogous system services that the object manager supplies. A thread in a Windows application can synchronize with a Windows process, thread, event, semaphore, mutex, waitable timer, I/O completion port, or file object.

One other type of executive synchronization object worth noting is the executive resource. Executive resources provide both exclusive access (like a mutex) as well as shared read access (multiple readers sharing read-only access to a structure). However, they're available only to kernel-mode code and thus aren't accessible from the Windows API. Executive resources are not dispatcher objects but rather data structures allocated directly from nonpaged pool that have their own specialized services to initialize, lock, release, query, and wait for them. The executive resource structure is defined in Ntddk.h, and the executive support routines are documented in the DDK reference documentation.

The remaining subsections describe the implementation details of waiting for dispatcher objects.

Waiting for Dispatcher Objects

A thread can synchronize with a dispatcher object by waiting for the object's handle. Doing so causes the kernel to suspend the thread and change its dispatcher state from running to waiting, as shown in Figure 3-25. The kernel removes the thread from the dispatcher ready queue and no longer considers it for execution.

Figure 3-25. Waiting for a dispatcher object

Note
At any given moment, a synchronization object is in one of two states: either the signaled state or the nonsignaled state. A thread can't resume its execution until the kernel changes its dispatcher state from waiting to ready. This change occurs when the dispatcher object whose handle the thread is waiting for also undergoes a state change, from the nonsignaled state to the signaled state (when a thread sets an event object, for example). To synchronize with an object, a thread calls one of the wait system services the object manager supplies, passing a handle to the object it wants to synchronize with. The thread can wait for one or several objects and can also specify that its wait should be canceled if it hasn't ended within a certain amount of time. Whenever the kernel sets an object to the signaled state, the kernel's KiWaitTest function checks to see whether any threads are waiting for the object and not also waiting for other objects to become signaled. If there are, the kernel releases one or more of the threads from their waiting state so that they can continue executing. The following example of setting an event illustrates how synchronization interacts with thread dispatching:
What Signals an Object

The signaled state is defined differently for different objects. A thread object is in the nonsignaled state during its lifetime and is set to the signaled state by the kernel when the thread terminates. Similarly, the kernel sets a process object to the signaled state when the process's last thread terminates. In contrast, the timer object, like an alarm, is set to "go off" at a certain time. When its time expires, the kernel sets the timer object to the signaled state. When choosing a synchronization mechanism, a program must take into account the rules governing the behavior of different synchronization objects. Whether a thread's wait ends when an object is set to the signaled state varies with the type of object the thread is waiting for, as Table 3-10 illustrates.
When an object is set to the signaled state, waiting threads are generally released from their wait states immediately. Some of the kernel dispatcher objects and the system events that induce their state changes are shown in Figure 3-26.

Figure 3-26. Selected kernel dispatcher objects

For example, a notification event object (called a manual reset event in the Windows API) is used to announce the occurrence of some event. When the event object is set to the signaled state, all threads waiting for the event are released. The exception is any thread that is waiting for more than one object at a time; such a thread might be required to continue waiting until additional objects reach the signaled state. In contrast to an event object, a mutex object has ownership associated with it. It is used to gain mutually exclusive access to a resource, and only one thread at a time can hold the mutex. When the mutex object becomes free, the kernel sets it to the signaled state and then selects one waiting thread to execute. The thread selected by the kernel acquires the mutex object, and all other threads continue waiting.
This brief discussion wasn't meant to enumerate all the reasons and applications for using the various executive objects but rather to list their basic functionality and synchronization behavior. For information on how to put these objects to use in Windows programs, see the Windows reference documentation on synchronization objects or Jeffrey Richter's Programming Applications for Microsoft Windows.

Data Structures

Two data structures are key to tracking who is waiting for what: dispatcher headers and wait blocks. Both of these structures are publicly defined in the DDK include file Ntddk.h. The definitions are reproduced here for convenience:

typedef struct _DISPATCHER_HEADER {
    UCHAR Type;
    UCHAR Absolute;
    UCHAR Size;
    UCHAR Inserted;
    LONG SignalState;
    LIST_ENTRY WaitListHead;
} DISPATCHER_HEADER;

typedef struct _KWAIT_BLOCK {
    LIST_ENTRY WaitListEntry;
    struct _KTHREAD *RESTRICTED_POINTER Thread;
    PVOID Object;
    struct _KWAIT_BLOCK *RESTRICTED_POINTER NextWaitBlock;
    USHORT WaitKey;
    USHORT WaitType;
} KWAIT_BLOCK, *PKWAIT_BLOCK, *RESTRICTED_POINTER PRKWAIT_BLOCK;

The dispatcher header contains the object type, signaled state, and a list of the threads waiting for that object. The wait block represents a thread waiting for an object. Each thread that is in a wait state has a list of the wait blocks that represent the objects the thread is waiting for. Each dispatcher object has a list of the wait blocks that represent which threads are waiting for the object. This list is kept so that when a dispatcher object is signaled, the kernel can quickly determine who is waiting for that object. The wait block has a pointer to the object being waited for, a pointer to the thread waiting for the object, and a pointer to the next wait block (if the thread is waiting for more than one object).
It also records the type of wait (any or all) as well as the position of that entry in the array of handles passed by the thread on the WaitForMultipleObjects call (position zero if the thread was waiting for only one object). Figure 3-27 shows the relationship of dispatcher objects to wait blocks to threads. In this example, thread 1 is waiting for object B, and thread 2 is waiting for objects A and B. If object A is signaled, the kernel will see that because thread 2 is also waiting for another object, thread 2 can't be readied for execution. On the other hand, if object B is signaled, the kernel can ready thread 1 for execution right away because it isn't waiting for any other objects. Figure 3-27. Wait data structures
Fast Mutexes and Guarded Mutexes

Fast mutexes, which are also known as executive mutexes, usually offer better performance than mutex objects because, although they are built on dispatcher event objects, they avoid waiting for the event object (and therefore the spinlocks on which an event object is based) if there's no contention for the fast mutex. This gives the fast mutex especially good performance in a multiprocessor environment. Fast mutexes are used widely throughout the kernel and device drivers. However, fast mutexes are suitable only when normal kernel-mode APC delivery (described earlier in this chapter) can be disabled. The executive defines two functions for acquiring them: ExAcquireFastMutex and ExAcquireFastMutexUnsafe. The former function blocks all APC delivery by raising the IRQL of the processor to APC_LEVEL; the latter expects to be called with normal kernel-mode APC delivery already disabled, which can be done by raising the IRQL to APC level or by calling KeEnterCriticalRegion. Another limitation of fast mutexes is that they can't be acquired recursively, as mutex objects can.

Guarded mutexes are new to Windows Server 2003 and are essentially the same as fast mutexes (although they use a different synchronization object, the KGATE, internally). They are acquired with the KeAcquireGuardedMutex function, which, instead of disabling only normal kernel-mode APCs as KeEnterCriticalRegion does, disables all kernel-mode APC delivery by calling KeEnterGuardedRegion. Guarded mutexes are not exposed for use outside of the kernel and are used primarily by the memory manager, which uses them to protect global operations such as creating paging files, deleting certain types of shared memory sections, and performing paged pool expansion. (See Chapter 7 for more information on the memory manager.)
Executive Resources

Executive resources are a synchronization mechanism that supports shared and exclusive access, and, like fast mutexes, they require that normal kernel-mode APC delivery be disabled before they are acquired. They are also built on dispatcher objects that are used only when there is contention. Executive resources are used throughout the system, especially in file-system drivers.

Threads waiting to acquire a resource for shared access wait for a semaphore associated with the resource, and threads waiting to acquire a resource for exclusive access wait for an event. A semaphore with unlimited count is used for shared waiters because they can all be woken and granted access to the resource when an exclusive holder releases the resource simply by signaling the semaphore. When a thread waits for exclusive access to a resource that is currently owned, it waits on a synchronization event object because only one of the waiters will wake when the event is signaled. Because of the flexibility that shared and exclusive access offer, there are a number of functions for acquiring resources: ExAcquireResourceSharedLite, ExAcquireResourceExclusiveLite, ExAcquireSharedStarveExclusive, ExAcquireWaitForExclusive, and ExTryToAcquireResourceExclusiveLite. These functions are documented in the DDK.
Push Locks

Push locks, which were introduced in Windows XP, are another optimized synchronization mechanism built on the event object (in Windows Server 2003 they are built on the internal KGATE synchronization object), and, like fast mutexes, they wait for an event object only when there's contention on the lock. They offer advantages over the fast mutex in that they can be acquired in shared or exclusive mode. They are not documented or exported by the kernel and are therefore reserved for use by the operating system.

There are two types of push locks: normal and cache aware. Normal push locks require only the size of a pointer in storage (4 bytes on 32-bit systems and 8 bytes on 64-bit systems). When a thread acquires a normal push lock, the push lock code marks the push lock as owned if it is not currently owned. If the push lock is owned exclusively, or the thread wants to acquire the push lock exclusively and it is owned on a shared basis, the thread allocates a wait block on the thread's stack, initializes an event object in the wait block, and adds the wait block to the wait list associated with the push lock. When a thread releases a push lock, the thread wakes a waiter, if any are present, by signaling the event in the waiter's wait block.

A cache-aware push lock layers on the basic push lock by allocating a push lock for each processor in the system and associating it with the cache-aware push lock. When a thread wants to acquire a cache-aware push lock for shared access, it simply acquires the push lock allocated for its current processor in shared mode; to acquire a cache-aware push lock exclusively, it acquires the push lock for each processor in exclusive mode.

Areas where push locks are used include the object manager, where they protect global object manager data structures and object security descriptors, and the memory manager, where they protect AWE data structures.