Threads


Depending on the type of managed program you are running, the number of threads executing by default will vary. All managed applications need at least one additional thread to run GC finalization. When the debugger is attached to a managed process, a special debugger helper thread is generated. AppDomain unloads are serviced by a special thread. For the concurrent and server GCs, a number of threads will be created to perform asynchronous collection. And of course, depending on the extent to which an application uses the ThreadPool, I/O Completion Port threads and worker threads will be allocated. The overhead that a single thread introduces is not massive: usually 1MB of stack space, some heap overhead for storing TLS data, and the cost of an additional runnable thread being scheduled for execution.

Through the use of the System.Threading APIs you can manage threads of execution in a number of interesting ways. Each individual thread is represented by an instance of the Thread class. The simplest way to obtain a reference to an active thread is to invoke the static Thread.CurrentThread property, which returns the currently executing thread. You can also construct your own threads explicitly or schedule work on the ThreadPool, a CLR-managed pool of threads. We'll explore each of these topics below, starting with the thread pool and then turning to the lifecycle of explicitly managed CLR threads.

Queuing Work on the Thread Pool

In most situations, you needn't actually create your own Threads by hand. The system manages a pool of worker threads which can accommodate arbitrary incoming work. There is only one pool per process, and therefore the ThreadPool class offers only static methods. The CLR uses intelligent heuristics to grow and shrink the number of physical threads to ensure good scalability based on the physical machine's architecture. It also manages the set of I/O Completion threads used by the I/O infrastructure (described in Chapter 7).

To schedule a work item for execution, you simply make a call to the QueueUserWorkItem method passing in a WaitCallback delegate:

 ThreadPool.QueueUserWorkItem(new WaitCallback(MyCallback));

 // ...

 void MyCallback(object state)
 {
     // Do some work; this executes on a thread from the ThreadPool.
 }

This example queues a delegate to the MyCallback function to be run at some point in the future. An overload for QueueUserWorkItem exists that permits you to pass in some state; this gets passed along to the callback function. Note that the Thread class also has a property IsThreadPoolThread, which evaluates to true if code is running from within a thread pool context.
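For example, a minimal sketch using the stateful overload with the MyCallback method above (the state string is purely illustrative):

 // The second argument is handed to MyCallback as its 'state' parameter.
 ThreadPool.QueueUserWorkItem(new WaitCallback(MyCallback), "some state");

 // Inside MyCallback, we could confirm we are on a pool thread:
 // Console.WriteLine(Thread.CurrentThread.IsThreadPoolThread); // prints True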

The thread pool makes no guarantees about quality of service, nor does it even ensure that everything in the queue is executed prior to shutdown. If all nonbackground threads in a process exit, any work items remaining in the thread pool's queue will not get a chance to run.

Minimum and Maximum Threads

The ThreadPool uses a lower and upper bound on both its worker and I/O Completion threads. By default, the minimum is 1 thread per logical hardware thread (processor, core, HT) and the maximum is 25 per logical hardware thread. If you queue more work than the pool can service at once, the excess items simply wait in the queue until a thread becomes free. If all 25 threads are blocked waiting for some event that can only be signaled by an item still sitting in the queue, this situation leads to a deadlock. It's generally not recommended to change the default number of threads (the optimal situation is when only 1 thread is runnable per logical hardware thread, so even 25 is quite high), but doing so can be a way to work around such deadlocks (e.g., if you don't own the code generating work on the ThreadPool).

The GetMinThreads and GetMaxThreads APIs return the current counts through output parameters. Similarly, GetAvailableThreads can be used to determine how many unused threads are currently available in the pool:

 int minWorkerThreads, maxWorkerThreads, availableWorkerThreads;
 int minIoThreads, maxIoThreads, availableIoThreads;
 ThreadPool.GetMinThreads(out minWorkerThreads, out minIoThreads);
 ThreadPool.GetMaxThreads(out maxWorkerThreads, out maxIoThreads);
 ThreadPool.GetAvailableThreads(
     out availableWorkerThreads, out availableIoThreads);

You can adjust the min and max values using the SetMinThreads and SetMaxThreads methods. SetMinThreads should almost never be used because it limits the pool's ability to shrink under low workloads; it was added to address a problem where the pool's heuristics prevent rapid growth over short periods of time (see MSDN KB article 810259 for details). Similarly, you can use configuration to set these values for ASP.NET applications:

 <configuration>
     <system.web>
         <processModel
             minWorkerThreads="..." maxWorkerThreads="..."
             minIoThreads="..." maxIoThreads="..."
             ... />
     </system.web>
 </configuration>

Most applications should not need to bother with these settings at all. Many people attempt to tweak these settings and/or write their own ThreadPool logic, usually with little success.
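If you do land in one of the rare cases mentioned above (such as the deadlock scenario), the calls themselves are straightforward; the numbers below are purely illustrative:

 int workerMax, ioMax;
 ThreadPool.GetMaxThreads(out workerMax, out ioMax);

 // Raise the worker maximum (illustrative values); SetMaxThreads returns false
 // if the new values are rejected.
 if (!ThreadPool.SetMaxThreads(workerMax * 2, ioMax))
     Console.WriteLine("The new maximums were not accepted.");

 // Raising the minimum can help absorb sudden bursts of work (see KB 810259).
 ThreadPool.SetMinThreads(4, 4);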

Wait Registrations

Another ThreadPool API, RegisterWaitForSingleObject, allows you to schedule a callback function for execution once a specific WaitHandle is signaled. We discuss WaitHandles later in this chapter in the context of auto- and manual-reset events. But at a high level, Windows kernel and executive objects can be signaled to indicate that an event has occurred. This applies to synchronization primitives like Mutex, Semaphore, and EventWaitHandle, but also to other objects like Process and Thread. Clients can wait for such events using WaitHandle.WaitOne, WaitAny, and WaitAll (i.e., Win32's WaitForSingleObjectEx and WaitForMultipleObjectsEx).

But WaitOne is synchronous and blocks the current thread. You can instead schedule a task to be executed on a ThreadPool thread once this signal occurs:

 EventWaitHandle ewh = /*...*/;
 RegisteredWaitHandle rwh = ThreadPool.RegisterWaitForSingleObject(
     ewh, MyCallback, null, Timeout.Infinite, true);

In this case, we register execution of MyCallback when ewh is signaled. We specify null for state (which, just like QueueUserWorkItem, can be passed to the callback function). The timeout parameter enables us to tell the ThreadPool that it should wait only for the specified time before giving up; Timeout.Infinite is just a constant -1, which means that the request never times out. The last parameter is a bool to indicate whether this should only wait for one signal. If false, the registration will keep firing each time the handle is signaled. The Unregister function on the RegisteredWaitHandle object returned can be used to cancel the registration.
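A sketch of the repeating form of that pattern follows (the event and callback names are illustrative); note that the callback must match the WaitOrTimerCallback signature, which also receives a flag indicating whether the wait timed out:

 EventWaitHandle ewh = new AutoResetEvent(false);
 RegisteredWaitHandle rwh = ThreadPool.RegisterWaitForSingleObject(
     ewh, OnSignal, null, Timeout.Infinite, false); // fires on every signal

 // ... later, when notifications are no longer needed:
 rwh.Unregister(ewh);

 void OnSignal(object state, bool timedOut)
 {
     // timedOut is always false here because we passed Timeout.Infinite.
 }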

Explicit Thread Management

As noted above, threads can be inspected and managed using the Thread type. While these APIs are not limited to explicitly created threads — they can be used to inspect the state of any thread at any point of execution — we'll discuss them all together.

Scheduling State

Each thread is always in a single well-defined scheduling state: Unstarted, Running, Aborted, Stopped, Suspended, or WaitSleepJoin. This state is managed by the CLR and can be obtained through the ThreadState property on the Thread class, which returns an instance of the flags-style enumeration type ThreadState. A thread's state can also contain informative values: Background, AbortRequested, StopRequested, and/or SuspendRequested, which don't necessarily pertain to scheduling at all. Background is the odd value of the bunch; it simply indicates that the target thread is a background thread (that is, it won't keep the process alive should all other threads exit). The other values provide useful information about what state transitions have been requested but not yet made. Here is a description of each (a short example of testing these flags follows the list):

  • Unstarted: A Thread object has been allocated, but it has not yet begun executing. In most cases, this means that no native OS thread has even been allocated yet and thus it cannot actually execute any code yet.

  • Running: The ThreadStart for the target thread is either runnable or actively executing. For reasons discussed below, the CLR does not differentiate between these two.

  • Aborted: The thread has been aborted, either individually or as part of an AppDomain unload process.

  • AbortRequested: A request to abort execution of the thread has been made, but the thread has not yet responded. This is usually because the thread is executing some unabortable code, the meaning of which we discuss below.

  • Stopped: The native thread has exited and is no longer executing.

  • StopRequested: A request to stop the thread has been made by the CLR's threading subsystem. This cannot be initiated by a user request.

  • Suspended: The thread's execution has been suspended. It is waiting to be resumed.

  • SuspendRequested: A request to suspend execution has been made, but the threading subsystem has not been able to respond yet.

  • WaitSleepJoin: The thread's execution is currently blocked in managed code waiting for some condition to occur. This is automatically initiated if the thread uses a contentious Monitor.Enter or any of the Monitor.Wait, Thread.Sleep, or WaitHandle.WaitOne APIs, among others.
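Because ThreadState is a flags enumeration, individual values should be tested with bitwise operations rather than simple equality. A brief illustrative check:

 Thread t = Thread.CurrentThread;
 ThreadState state = t.ThreadState;

 // Test individual flags with a bitwise AND; an equality test would miss combined
 // values such as Running | Background.
 bool isBackground = (state & ThreadState.Background) == ThreadState.Background;
 bool isBlocked = (state & ThreadState.WaitSleepJoin) == ThreadState.WaitSleepJoin;
 Console.WriteLine("Background: {0}, blocked: {1}", isBackground, isBlocked);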

Many transitions between these states can be triggered manually by actions that you can perform using the Thread class itself, while others occur implicitly as a result of CLR execution or using certain non-threading APIs. The possible state transitions and general lifecycle of a thread are illustrated in Figure 10-2; transitions initiated by the threading subsystem are indicated with white instead of black arrows.

image from book
Figure 10-2: Thread state transitions.

The precise means by which state transitions occur and exactly what the states mean are the primary topics of this section. Note that the CLR doesn't differentiate between running and ready as the OS scheduler does. This is because the managed ThreadState is maintained by the CLR; the OS knows nothing about it and can't modify it during a context switch, for example, nor would it be a good idea for it to do so. For the same reason, if you invoke some unmanaged code that causes the thread to block, the ThreadState will likely still report back Running rather than WaitSleepJoin.

Creating New Threads

When you create a new Thread object, you must pass it a delegate that refers to the method which it should execute once started. This is called the thread start function. A thread doesn't begin execution until you explicitly tell it to using the Start instance method, described further below. The constructor merely hands you a new Thread instance in the Unstarted state which you can work with and eventually schedule for execution. If you never get around to calling Start, an underlying native OS thread will not be created (including any memory allocations such as stack space and TEB).

Two styles of thread starts are available: a simple version (ThreadStart) and a parameterized version (ParameterizedThreadStart), the latter of which takes a single object argument. These two delegate type signatures are void ThreadStart() and void ParameterizedThreadStart(object), respectively. The parameterized version is useful for cases in which you need to pass information from the starter to the new thread. If you construct your thread using the parameterized version, you should also use the corresponding Start overload, which accepts an object parameter, otherwise the target of the delegate will see null for its value.

Given two methods WorkerOperation and ParameterizedWorkerOperation:

 void WorkerOperation()
 {
     // Do some work...only relies on shared state, e.g. statics.
     Console.WriteLine("Simple worker");
 }

 void ParameterizedWorkerOperation(object obj)
 {
     // Do some work...relies on the state passed in, i.e. 'obj'.
     Console.WriteLine("Parameterized worker: {0}", obj);
 }

We can schedule both for execution using the two constructors just discussed:

 Thread thread = new Thread(new ThreadStart(WorkerOperation));
 Thread paramThread = new Thread(
     new ParameterizedThreadStart(ParameterizedWorkerOperation));

Notice that C#'s delegate inference syntax offers a shorthand means to accomplish the same thing; here we also start both threads:

 Thread thread1 = new Thread(WorkerOperation);
 Thread thread2 = new Thread(ParameterizedWorkerOperation);
 thread1.Start();
 thread2.Start("Some state...");

The result of running this code is that "Simple worker" and "Parameterized worker: Some state..." are both written to the console. The order in which these appear is nondeterministic; it depends entirely on the way in which threads are scheduled. More details on the Start method can be found in the following section.

Controlling a Thread's Stack Size

Both styles of constructors also offer an overload that takes an integer parameter named maxStackSize. As noted previously, threads by default reserve 1MB of stack space on Windows. Only a few pages are committed initially. Page protection is used to trap when additional pages are needed, which results in committing pages in the stack. But you can specify an alternative size using this parameter. This can be useful if you know your threads will never need 1MB of space, especially if you're creating a large number of threads. SQL Server, for example, allocates only 0.5MB of stack space for managed threads.

The minimum value for maxStackSize is 128K. Using too small a value can obviously result in a stack overflow if execution attempts to commit pages beyond the end of the stack. We discuss stack overflow and the CLR's treatment of it in Chapter 3. In summary: it causes a fast-fail.

This code starts a new thread using a specific stack size and uses unbounded recursion to trigger an overflow:

 public void Test(int howManyK)
 {
     Thread t = new Thread(TestOverflow, 1024 * howManyK);
     t.Start(1);
 }

 void TestOverflow(object o)
 {
     int i = (int)o;
     Console.WriteLine("TestOverflow(" + i + ")");
     TestOverflow(i + 1); // Call this recursively to find the overflow point.
 }

TestOverflow uses unbounded recursion to overflow the stack and prints out its level for each call. For example, in some quick-and-dirty testing, passing 128 for howManyK will cause an overflow at the 6,249th call; 256 overflows on the 14,441st.

Starting Execution

A thread is not run until you explicitly Start it. Starting a thread physically allocates the OS thread object (via Win32's CreateThread), sets the managed thread's state to Running, and begins execution with the thread start delegate provided during construction time. The Start overload, which accepts an object parameter, passes it transparently to the thread start method.

Execution continues until one of several conditions arises:

  • The specified thread start method terminates. This can happen either due to a normal completion of the thread start routine, because of a call to Win32's ExitThread, or because of an unhandled exception. In any of these cases, the thread's final state will be Stopped.

  • A request is made to abort the thread, either explicitly or as part of the AppDomain unload process. This will cause the Thread to transition to the final state Aborted once the threading subsystem is able to process the abort. This interrupts the execution of the thread start code by raising a ThreadAbortException. See the section below for more details on thread aborts.

  • A blocking wait (e.g., Monitor.Wait, contentious Monitor.Enter, WaitHandle.WaitOne), Thread.Sleep, or Thread.Join operation is invoked. The thread's state is set to WaitSleepJoin, execution blocks, and it resumes again in the Running state once the requested condition becomes true.

  • The thread's execution is suspended, either as part of a system service such as the GC or explicitly through the use of the Suspend API. This changes its state to Suspended after the threading subsystem is able to respond. When the thread's execution is resumed, a transition back to the Running state occurs.

The IsAlive property of the Thread class will return true if a thread's execution has not ended — that is, if its state is anything other than Stopped or Aborted, the two possible final thread states.

Sleeping

The static Sleep method on the Thread class puts the currently executing thread to sleep for the time interval supplied (specified in either milliseconds or a TimeSpan). Putting a thread to sleep causes it to enter the WaitSleepJoin state. Calling Thread.Sleep(0) causes the current thread to yield to other runnable threads, enabling the OS to perform a context switch. This is useful if the current thread has no useful work to perform. Note that context switches will happen automatically once a thread's time-slice expires.
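For example:

 // Sleep for 100 milliseconds; the thread enters the WaitSleepJoin state.
 Thread.Sleep(100);

 // The TimeSpan overload is equivalent.
 Thread.Sleep(TimeSpan.FromMilliseconds(100));

 // Yield the remainder of the current time-slice to other runnable threads.
 Thread.Sleep(0);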

Interruption

The Interrupt method enables you to asynchronously interrupt a thread that is currently blocked in the WaitSleepJoin state. This interruption causes a ThreadInterruptedException originating from the statement that caused it to block. If Interrupt is called while a thread is not blocked, an interruption will occur immediately the next time the thread tries to enter this state. It is possible that the thread will terminate normally before such an interruption has a chance to occur.

Using Interrupt by itself can be insufficient. If a thread never blocks, it will never observe the interruption. A certain level of cooperation can be used to politely ask the thread to stop; if it does try to block, we can still interrupt it. For example:

 class Worker
 {
     private volatile bool interruptRequested; // volatile so the worker sees the write
     private Thread myThread;

     public void Interrupt()
     {
         if (myThread == null)
             throw new InvalidOperationException();
         interruptRequested = true;
         myThread.Interrupt();
         myThread.Join();
     }

     private void CheckInterrupt()
     {
         if (interruptRequested)
             throw new ThreadInterruptedException();
     }

     public void DoWork(object obj)
     {
         myThread = Thread.CurrentThread;
         try
         {
             while (true)
             {
                 // Do some work... (including some blocking operations)
                 CheckInterrupt();
                 // Do some more work...
                 CheckInterrupt();
                 // And so forth...
             }
         }
         catch (ThreadInterruptedException)
         {
             // Thread was interrupted; perform any cleanup.
             return;
         }
     }
 }

Another piece of code might use a worker like this:

 Worker w = new Worker();
 Thread t = new Thread(w.DoWork);
 t.Start();
 // Do some work...
 // Uh-oh, we need to interrupt the worker.
 w.Interrupt();

This pattern requires that the worker code participate in the interruption scheme. Sometimes this is not feasible, but in the worst case an interruption will still occur at blocking points.

Suspending and Resuming

Thread suspension is a feature that was made obsolete in version 2.0 of the Framework. It is mentioned here simply so that you are able to deal with existing code that might make use of it, and so you can understand why it is inherently dangerous. The Suspend method tells the threading subsystem to suspend execution of the target thread. Invoking Resume on it will transition the thread back into the Running state, picking up execution precisely where it left off prior to suspension. If a thread is never resumed, it continues to use up system resources and hold on to objects, resources, and locks that it accumulated prior to the suspension; these are only freed once the runtime shuts down.

Unfortunately, this suspension can happen while the target thread holds critical sections and resources. For example, if you happen to suspend a target thread while it's invoking a class constructor, you could deadlock your application if another thread needs to call the same constructor. Alas, suspension is required in order to do things like capture a stack trace from a target thread using the StackTrace class. If you are forced to use it for legacy reasons, attempt to minimize the amount of time a thread is held suspended.
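Legacy code that uses suspension typically looks something like the following sketch (the work method name is a placeholder; on 2.0, the compiler warns that both calls are obsolete):

 Thread t = new Thread(SomeLongRunningWork);
 t.Start();

 // ... later, from another thread (obsolete and dangerous; shown only so that
 // existing code is recognizable):
 t.Suspend();   // target transitions to Suspended once the runtime can respond
 // Inspect the thread, e.g. capture a stack trace...
 t.Resume();    // target transitions back to Running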

Joining

Sometimes you need to block the current thread's execution until another target thread has completed running. This is common in fork/join patterns, where you have a master thread that forks a set of smaller work items and must ensure that they have each completed (sometimes called a barrier). The Join method allows you to do this. Joining on another thread will place the executing thread (that is, the thread which executes the Join statement) into the WaitSleepJoin state, transitioning back to Running once the target thread has completed execution.

For example, consider this code:

 Thread[] workers = new Thread[10];

 // Create worker threads:
 for (int i = 0; i < workers.Length; i++)
 {
     workers[i] = new Thread(DoWork);
     workers[i].Start();
 }

 // Do some work while the workers execute...

 // Now join on the workers (and perhaps process their results):
 foreach (Thread t in workers)
 {
     t.Join();
 }

In this snippet, we first create an array, construct and Start each Thread element, do some work while they execute in the background, and finally Join on each of them in order. Join also offers overloads that take a timeout (in milliseconds or as a TimeSpan), returning a bool to indicate whether the call unblocked as a result of successfully joining (true) or due to a timeout (false).
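For example, a join that is willing to wait only five seconds per thread might look like this:

 foreach (Thread t in workers)
 {
     // Join(TimeSpan) returns false if the timeout elapsed before the thread stopped.
     if (!t.Join(TimeSpan.FromSeconds(5)))
         Console.WriteLine("Thread {0} did not finish in time", t.ManagedThreadId);
 }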

Thread Aborts

Threads can be aborted, which terminates them in a careful manner. User code can initiate one by calling Abort on the target Thread object. Thread aborts are also used in the AppDomain unload process to carefully shut down all code actively running inside an AppDomain at the time an unload occurs. A rude unload is also possible — based on host configuration; for example, SQL Server will escalate to a rude abort if your code takes too long to finish unwinding — which does not abort threads in a careful fashion. It simply terminates them. Initiating thread aborts on arbitrary threads is not advisable; they should only be used by sophisticated hosts that are able to mitigate the risk of corrupt state that can ensue.

The CLR uses delay abort regions to ensure that normal thread aborts cannot interrupt the execution of certain regions of code. In such cases, the CLR queues up the abort request and processes it as soon as the target thread exits this region (assuming that it's not nested inside another). It does so to prevent corruption. This is different than a critical region (discussed later), which is used to suggest that a host doesn't attempt to abort individual threads but rather escalate to an AppDomain unload. The following sections of code are marked as delay abort automatically by the CLR:

  • Any code currently in a managed catch or finally block.

  • Code executing inside of a Constrained Execution Region (CER). CERs are discussed at length in Chapter 11.

  • When invoking unmanaged code. In general, unmanaged code is not prepared to deal with the thread abort process.

A thread abort injects a ThreadAbortException at the current point of execution on the target Thread. It's called an asynchronous exception because of the way it originates. Thread abort exceptions are also undeniable in the sense that a catch block is not permitted to suppress one:

 try
 {
     // Imagine a thread abort is issued at this line of code...
 }
 catch (ThreadAbortException)
 {
     // Do nothing (i.e. try to suppress the exception).
 }
 // The CLR re-raises the ThreadAbortException right here.

The CLR re-raises a ThreadAbortException at the end of any catch block that catches it (unless Thread.ResetAbort is called, as shown below). If a ThreadAbortException crosses the boundary of the AppDomain that is being unloaded, it gets turned into an AppDomainUnloadedException.
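The one way a target thread can deny an abort is by calling Thread.ResetAbort, which cancels the pending abort request (and requires ControlThread security permission). A minimal sketch:

 try
 {
     // Work that might be aborted...
 }
 catch (ThreadAbortException)
 {
     // Cancel the pending abort; the exception will not be re-raised at the end
     // of this catch block, and the thread continues running normally.
     Thread.ResetAbort();
 }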

Note that thread aborts are apt to happen just about anywhere. For example, can you spot the (possible) memory leak in the following code?

 IntPtr handle = CreateEvent(...);
 try
 {
     // Use the handle...
 }
 finally
 {
     CloseHandle(handle);
 }

An abort can occur between the assignment of CreateEvent's return value to handle and entering the try block. Thus, a ThreadAbortException wouldn't cause CloseHandle to fire. Similarly, an abort can even occur between the call to CreateEvent and the assignment of its return value to the handle variable! An intelligent wrapper must be used that allocates inside of a delay abort region and that uses a Finalize method to guarantee cleanup. The SafeHandle class — discussed in the next chapter — serves precisely that purpose.

Thread Properties

In addition to using the Thread class to control the lifecycle of a managed thread, there are a number of properties that may be of interest. Some of them change the way in which the underlying OS thread is scheduled and/or managed by the CLR.

Thread Identity

Threads have both a unique system-generated identifier and a name that can be used for informational and debugging purposes. The ManagedThreadId property obtains a thread's auto-generated integer sequence number (created by the CLR), guaranteed to be unique among all currently active threads. The Name property enables you to set and access a more meaningful string name for each thread. Name is settable for threads that have not yet been named and becomes read-only once a name has been assigned.
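For example:

 Thread current = Thread.CurrentThread;
 Console.WriteLine("Id: {0}", current.ManagedThreadId);

 // Name may be assigned only once; a second assignment throws an exception.
 if (current.Name == null)
     current.Name = "Main worker";
 Console.WriteLine("Name: {0}", current.Name);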

Background Threads

As briefly noted previously, a thread can be marked as being a background thread. A managed application will stay alive only until all of its nonbackground threads have exited. Thus, if you wish to run some sort of daemon or bookkeeping thread that only stays alive while your application is executing other work, you can set your thread to run in the background using the IsBackground property. For example, this sample code creates a thread and runs it in the background:

 Thread t = new Thread(...);
 t.IsBackground = true;
 t.Start();

A thread that is to execute in the background — that is, whose IsBackground property evaluates to true — will also have the informative ThreadState value of Background. The Boolean expression (thread.ThreadState & ThreadState.Background) == ThreadState.Background is always the same value as the IsBackground property.

Thread Priority

All threads have a relative priority that determines the way that preemptive schedulers allocate time to code that is ready to run. The process in which threads live also has a relative priority class, which acts as a multiplier for a thread's priority. Although the scheduler uses sophisticated algorithms — I recommend reading the Windows Internals book referenced in the "Further Reading" section for details — roughly speaking, a runnable higher-priority thread is always given preference over a lower-priority one. If a lower-priority thread is executing and no other physical hardware thread is available, the lower-priority thread will be preempted so that the higher-priority thread can run. The OS also employs anti-starvation policies. We discuss a situation below called priority inversion that could cause major problems if the scheduling algorithm were implemented exactly as described above.

Each Thread object has a read/write Priority property that can be set at any point before or after a thread has been started. This property is of an enumeration type ThreadPriority. This enumeration contains five distinct values, each increasing in relative priority as perceived by the scheduler: Lowest, BelowNormal, Normal, AboveNormal, and Highest. As you might guess, Normal is the default value for threads created with the managed threading APIs. Please refer to the section on processes later in this chapter for details about process priority classes.
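Setting a priority is a one-line affair; for example (DoBackgroundWork is a placeholder for your own thread start method):

 Thread t = new Thread(DoBackgroundWork);
 t.Priority = ThreadPriority.BelowNormal; // a hint to the scheduler, not a guarantee
 t.Start();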

Thread-Isolated Data

By default, static variables are AppDomain-wide and shared among all threads in the process. In some scenarios, you might wish to isolate and store global data specific to a given thread. This allows you to avoid having to worry about many of the synchronization and locking problems outlined below.

Thread Local Storage (TLS)

TLS enables you to allocate and store data into slots managed by the CLR. These are stored in the TEB, and are slightly different than the Win32 notion of TLS (i.e., using TlsAlloc, TlsSetValue, TlsGetValue, TlsFree, and so on). Data stored in TLS is entirely isolated from other threads. One thread cannot access data stored in another's.

To begin using TLS, you must first allocate a new slot for each unique piece of data you intend to store. This allocates a structure that is then used by all managed threads. This is done with either of the static methods Thread.AllocateDataSlot or Thread.AllocateNamedDataSlot. Each returns a LocalDataStoreSlot object that acts as a key you will use to retrieve or store data inside that slot. Using named slots enables you to retrieve the slot key later by its name using the GetNamedDataSlot method, while using unnamed slots requires that you keep hold of the returned LocalDataStoreSlot. Only one slot can exist with a given name; any attempt to allocate a duplicate name will result in an exception.

Reading and writing data in a slot are done with Thread's GetData and SetData static methods:

 object GetData(LocalDataStoreSlot slot);
 void SetData(LocalDataStoreSlot slot, object data);

If your application doesn't need to use a slot any longer, calling FreeNamedDataSlot will free the named TLS slot and any resources associated with it.

For example, this code uses an unnamed slot:

 LocalDataStoreSlot slot = Thread.AllocateDataSlot();
 // ...
 Thread.SetData(slot, 63);
 // ...
 int slotValue = (int)Thread.GetData(slot);

This is convenient when storing thread-wide context and can eliminate the need to pass a large number of arguments around to each method that must access the data. Furthermore, some libraries can use TLS to persist data across disjoint method calls, rather than forcing the user code to maintain and pass around a special context object.
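A named slot works the same way, except that any code in the AppDomain can look the slot up again by name (the slot name here is purely illustrative):

 // Allocate a slot by name (throws if a slot with this name already exists).
 LocalDataStoreSlot slot = Thread.AllocateNamedDataSlot("RequestContext");
 Thread.SetData(slot, "some per-thread context");

 // Elsewhere, without holding on to the LocalDataStoreSlot reference:
 object value = Thread.GetData(Thread.GetNamedDataSlot("RequestContext"));

 // When the slot is no longer needed:
 Thread.FreeNamedDataSlot("RequestContext");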

Thread Statics

A simpler alternative to using TLS is to use so-called thread static fields, which cause specific static fields to be thread-wide instead of AppDomain-wide. A thread static is just an ordinary static field that has been annotated with the System.ThreadStaticAttribute attribute:

 class ThreadStaticTest
 {
     [ThreadStatic]
     static string data = "<unset>";

     static void Test()
     {
         Console.WriteLine("[Master] before = {0}", data);
         data = "Master thread";
         Console.WriteLine("[Master] before loop = {0}", data);

         Thread[] threads = new Thread[3];
         for (int i = 0; i < 3; i++)
         {
             threads[i] = new Thread(delegate(object j) {
                 Console.WriteLine("[Thread{0}] before = {1}", j, data);
                 data = "Subordinate " + j;
                 Console.WriteLine("[Thread{0}] after = {1}", j, data);
             });
             threads[i].Start(i);
         }

         Array.ForEach<Thread>(threads, delegate(Thread t) { t.Join(); });
         Console.WriteLine("[Master] after loop = {0}", data);
     }
 }

Calling the Test method prints out something along these lines:

 [Master] before = <unset>
 [Master] before loop = Master thread
 [Thread0] before =
 [Thread0] after = Subordinate 0
 [Thread1] before =
 [Thread1] after = Subordinate 1
 [Thread2] before =
 [Thread2] after = Subordinate 2
 [Master] after loop = Master thread

Notice that the values set by the master and subordinate threads are completely isolated within each thread. Although the master thread sets data to "Master thread" before running the subordinate threads, none of them observes this value. Similarly, when the subordinate threads exit, the master thread still sees the value "Master thread", although each of the subordinates changed its own version of data to something else during execution.

You might have also noticed that each subordinate thread sees a null value for data instead of "<unset>" before setting data. This is because thread static fields are initialized only once, by the class constructor, and only on the thread that first references the type. Other threads will always see the default value of null (or default(T) for value types). Using class constructors to initialize thread static fields can lead to surprising behavior that depends on an inherent race condition; thus, it is strongly advised against.

Sharing State among Threads

Multiple threads can share access to the same data. Any thread in the process has access to the entire address space; the CLR's type safety merely restricts that access to objects to which code running on the thread holds a reference. Once a thread obtains a reference to a heap-allocated object that might be used concurrently by other managed threads — either through a static variable or an object passed to the thread start function — you run the risk of encountering some tricky situations. Other types of state can be shared too, for example: process-wide Win32 HANDLEs, system-wide named kernel objects, memory-mapped I/O, and so on.

Sharing state among threads is admittedly sometimes convenient. But once you begin sharing state, even ubiquitous idioms like load-modify-store (e.g., i += 5) can intersect with conflicting updates in surprising ways. An operation is atomic only if it is guaranteed to execute as a single, indivisible unit. A sequence of instructions — such as the load-modify-store on a field — can be interrupted at any point during execution, whether due to a context switch or true parallel execution. After one thread loads a value, another thread could sneak in and change it, at which point the value the first thread obtained (sitting on its stack) is out of date. This is a classic race condition. Once multiple threads share access to state, any correct program must account for all of the possible interleavings of reads and writes to that location.

A general strategy for preventing such concurrency errors is the use of critical sections. A critical section protects a block of code such that only one thread may be inside the section at any given time. A common way to implement critical sections is with locks: if every piece of code that wishes to modify a shared location first acquires a common lock, and only one thread can hold that lock at a time, then the situation described above cannot arise. There are numerous locking mechanisms in the Framework. After a brief example of a race condition, we'll look at each of them. Note that locks decrease the amount of concurrency an application exhibits; introducing a lock intentionally prohibits multiple threads from executing inside the section at once. We'll see some additional problems later that can arise as a result.

A Classic Race Condition

As a simple example of a situation in which the lack of critical sections could cause program correctness problems, consider the following:

 static int nextId = 0;

 static int NextId()
 {
     return nextId++;
 }

The NextId method actually consists of three IL instructions: a load, an increment (add), and a store. If two threads can access this method in parallel, multiple callers could obtain duplicate identifiers. To see why this race condition exists, consider the timeline shown in Figure 10-3.

image from book
Figure 10-3: Race condition arising from unsynchronized access to shared data.

Because each executing thread gets its own stack, each loads a copy of nextId onto its stack and manipulates that. In this example, Thread 1 loads the variable as 0 and then increments its local copy to 1. At this point, it is preempted, because either its time-slice expired or a higher-priority thread became runnable (or, alternatively, another piece of code is running on another physical hardware thread). A context switch occurs, and the thread remembers the value 1 that it had on its stack. Then Thread 2 runs: it too loads nextId as 0, increments its local copy to 1, and then stores it back into the shared memory location nextId. Thread 1 is then resumed; it proceeds to store 1 into the nextId slot. Each thread sees a return value of 0 from NextId, and after both have finished executing, nextId still contains only 1. That's not quite so unique! Note that if a thread is able to execute all three instructions without being interrupted, this code will work correctly. This is one of the reasons that race conditions are so nasty to deal with — they are hard to reproduce and can show up one out of a million times.

Monitors

Each object has a monitor that can be used to synchronize access to it. Like a critical section, only one thread can be inside an object's monitor at any point in time. (In fact, a monitor is the CLR's equivalent to the Win32 CRITICAL_SECTION data structure and related functions.) The CLR manages monitor entry and exit using the object's sync-block, described in Chapter 3. Monitors also support recursion and reentrancy, meaning that once a thread holds an object's monitor, any attempts to reacquire it will succeed. If a thread tries to enter an object's monitor that is held by another thread, the attempt will block until the other thread exits the monitor. Timeouts are optional and can be specified using the Monitor.TryEnter APIs described below.

Monitor.Enter(object obj) will attempt to enter the specified object's monitor, blocking until it succeeds. Monitor.Exit(object obj) will exit the specified object's monitor, throwing a SynchronizationLockException if the calling thread is not currently in obj's monitor. Every call to Enter(o) must be matched with a call to Exit(o); otherwise, a thread will continue to hold a lock on o's monitor indefinitely. This can lead to deadlocks.

The following pattern helps to ensure a proper exit:

 object obj = /*...*/;
 Monitor.Enter(obj);
 try
 {
     // Protected code...
 }
 finally
 {
     Monitor.Exit(obj);
 }

Using monitors, we can easily fix the NextId problem described above:

 static int nextId;
 static object mutex = new object();

 static int NextId()
 {
     Monitor.Enter(mutex);
     try
     {
         return nextId++;
     }
     finally
     {
         Monitor.Exit(mutex);
     }
 }

We make sure to lock on the mutex object each time we access or modify the static nextId field. In general, determining precisely what to lock on and ensuring that you always do so consistently is the trick, especially when synchronizing access to data from multiple access paths in your code. But assuming that this is the only code that touches nextId directly, our code is now thread-safe — it will no longer return duplicate identifiers.

There is also a timeout-based TryEnter method that will attempt to enter a target object's monitor, backing out if it takes longer than a specified interval of time. There is an overload that takes an int-based milliseconds value and another that takes a TimeSpan. This method returns false if the timeout is exceeded before the monitor is entered. For example, the above code might look like this with timeouts:

 static int nextId;
 static object mutex = new object();

 static int NextId()
 {
     bool taken = Monitor.TryEnter(mutex, 250);
     if (!taken)
         throw new Exception("Possible deadlock...monitor acquisition time-out");
     try
     {
         return nextId++;
     }
     finally
     {
         Monitor.Exit(mutex);
     }
 }

The Monitor class's Wait, Pulse, and PulseAll methods can also be used for event-based synchronization among concurrent tasks. We discuss this below.

C# Lock Blocks (Language Feature)

The C# language offers syntax that can be used as shorthand for the above pattern. The lock keyword enters and exits an object's monitor before and after the following block; exiting the monitor is implemented as a finally block to ensure execution. VB also offers a similar feature through the use of the SyncLock keyword:

 static int nextId;
 static object mutex = new object();

 static int NextId()
 {
     lock (mutex)
     {
         return nextId++;
     }
 }

The behavior of this method is the same, but it is admittedly more readable and writable, and therefore more maintainable. A timeout variant is not offered. But using timeouts as a deadlock prevention mechanism is not a robust solution anyway; your goal should be to ferret out and fix any possible deadlock situations before your code ever makes it into production. TryEnter can be used to identify such problems, but a program hang is usually sufficient notification of such a problem.

Synchronized Methods

The MethodImplAttribute attribute from System.Runtime.CompilerServices can be used to synchronize access to entire methods. Annotating a method with this attribute, passing MethodImplOptions.Synchronized as its argument, causes the CLR to synchronize all access to the method using monitors. This is how J# implements methods decorated with the synchronized keyword. The difference between this construct and those reviewed above is that a lock is acquired for an entire method execution:

 static int nextId;

 [MethodImpl(MethodImplOptions.Synchronized)]
 static int NextId()
 {
     return nextId++;
 }

Using this mechanism unfortunately gives you no control over which monitor is acquired: the CLR locks on the object instance for instance methods and on the Type object for static methods. In almost all cases, your logic should protect its own locks so that outside code cannot affect your ability to acquire them. In some extreme cases, this technique can cause code in separate AppDomains to conflict with one another and even cause deadlocks. In general, you should stick to the lock keyword described above.

Windows Lock Objects

There are several Windows primitive executive objects that enable some form of synchronization among threads. These classes, Mutex and Semaphore, are simple wrappers on top of the related Win32 functions. They build on top of the WaitHandle primitive to enable waiting and signaling using standard interfaces. While they provide similar functionality and can be used in many of the same scenarios as monitors, there are some important differences. First, the locks themselves are controlled by Windows (through Win32 function calls). Thus, using them allocates unmanaged resources (HANDLEs) that must be disposed of. Second, and more importantly, they can be named and referred to across multiple processes to synchronize access to machine-wide resources.

Mutexes

A mutex (a.k.a. mutant, in Windows kernel terminology) is short for "mutual exclusion" and enables you to synchronize access to shared state, much as with the Monitor class. Its functionality is surfaced to managed code through the Mutex class. Because a mutex can be system-wide, multiple processes can use the same mutex to protect access to some shared system resource. Also note that mutexes are, like monitors, recursive and reentrant; any attempt to acquire a mutex already owned by the current thread will succeed and increase a recursion count. The mutex must be released the same number of times it has been acquired.

Mutexes can be given names, in which case they are system-wide mutexes. Those without names are local mutexes and cannot be accessed outside of the process in which they are allocated; they must be shared by passing the Mutex object itself around. There are two primary means of getting an instance of a mutex: either ask the OS to allocate a new one using one of the available constructors, or use the static OpenExisting method to open a previously allocated system-wide mutex by name.

The Mutex() and Mutex(bool initiallyOwned) constructors enable you to construct new local mutexes, while Mutex(bool initiallyOwned, string name) and Mutex(bool initiallyOwned, string name, out bool createdNew) enable you to construct new system-wide, named ones. If you don't need to interoperate between processes, the local overloads are preferable; they avoid conflicts with other named mutexes and carry less overhead because they are process-local. The initiallyOwned parameter indicates whether the calling thread should own (i.e., have already acquired) the mutex as soon as it is created. The createdNew output parameter is set to true if a new OS mutex was created, and false if an existing mutex with the same name was simply opened.

A mutex is either signaled or unsignaled (remember, it builds on top of WaitHandles). A signaled mutex is available for acquisition, while an unsignaled mutex is already owned. To acquire a mutex, use the WaitOne instance method. This method has a simple no-argument overload, WaitOne(), which will block indefinitely until the target mutex can be acquired. The WaitOne(int millisecondsTimeout, bool exitContext) and WaitOne(TimeSpan timeout, bool exitContext) overloads enable you to pass in a timeout after which the wait gives up, returning true or false to indicate success or failure. Each takes an exitContext argument, which is used to exit the current synchronization domain when a ContextBoundObject is on the stack. Refer to the SDK for more information on synchronization domains if necessary. You may also use the WaitAny or WaitAll static methods on WaitHandle to wait for a number of WaitHandles (including Mutexes) simultaneously. ReleaseMutex is used to release an acquired mutex.

For example, this code snippet shows a local mutex protecting access to a shared variable:

 static int nextId;
 static Mutex mutex = new Mutex();

 static int NextId()
 {
     mutex.WaitOne();
     try
     {
         return nextId++;
     }
     finally
     {
         mutex.ReleaseMutex();
     }
 }

Issuing a WaitOne on an abandoned mutex will throw an AbandonedMutexException. A mutex is abandoned when its owning thread terminates without releasing it, for instance because the thread (or its whole process) crashed midway through updating a shared data structure. The wait still succeeds and the current thread is granted ownership, so you can catch this exception, validate the integrity of shared state, and, if everything looks OK, proceed as normal (with caution).
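Putting the pieces together, a sketch of a named, system-wide mutex might look like this (the name is illustrative; any process that creates or opens the same name shares the same underlying kernel object):

 bool createdNew;
 using (Mutex m = new Mutex(false, "MyAppSharedResourceMutex", out createdNew))
 {
     try
     {
         m.WaitOne();
     }
     catch (AbandonedMutexException)
     {
         // A previous owner terminated without releasing the mutex. Ownership has
         // still been granted to this thread, so validate shared state before use.
     }
     try
     {
         // Touch the machine-wide shared resource...
     }
     finally
     {
         m.ReleaseMutex();
     }
 }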

Semaphores

A semaphore can be used to protect access to critical resources, either locally or system-wide (via named kernel objects). An instance is represented by the managed Semaphore class, which wraps the underlying Win32 functions. Unlike a mutex, however, a semaphore is not just for mutually exclusive access to resources. Each semaphore has a count of available resources that is decremented by one each time somebody acquires the semaphore. When the count reaches zero, any further attempts to acquire it will block until the count has been incremented above zero again. Increasing the count is done simply by releasing an acquired semaphore. A semaphore also has a maximum count; if a thread attempts to increment the count above that maximum, a SemaphoreFullException is thrown. This effectively limits the number of threads that can access a protected resource to a finite number, which can be helpful when throttling access to scarce shared resources.

Unlike mutexes, any thread can increment and decrement a semaphore. In fact, in producer-consumer scenarios this is a common pattern: one thread increments the count to indicate the arrival of a new resource, while another decrements it, for instance as it consumes the items being produced. For system-wide semaphores, this can lead to difficult-to-debug errors if a program crashes. The abandoned-mutex detection described earlier can be extremely useful for finding such problems; with semaphores, however, this coding pattern is perfectly legal and does not look like an error to the OS.

The functionality offered by the Semaphore class is almost identical to Mutex. You can create a new semaphore using one of the available constructor overloads; they take two interesting arguments, initialCount and maximumCount, both integers. The initialCount parameter specifies the semaphore's initial count, while maximumCount sets the maximum the count may reach. The OpenExisting static method will open an existing named OS semaphore.

As with Mutex, you can use the WaitOne, WaitAny, or WaitAll methods to acquire a semaphore (i.e., decrement its count). To release a semaphore (i.e., increment its count) you can use the Release method. The default overload increments the count by 1, but you can use the Release(int count) overload to increment it by the value of count.
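For example, a semaphore can throttle access to a pool of, say, three scarce resources (the count and names are illustrative):

 // At most 3 threads may be inside at once; all 3 slots are initially available.
 static Semaphore throttle = new Semaphore(3, 3);

 static void UseScarceResource()
 {
     throttle.WaitOne();        // decrement the count (blocks while it is 0)
     try
     {
         // Use one of the three scarce resources...
     }
     finally
     {
         throttle.Release();    // increment the count again
     }
 }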

Access Control Lists (ACLs)

All Windows executive objects may be secured using ACLs. This is especially useful for system-wide objects, for example to ensure that arbitrary processes cannot access a Mutex or Semaphore by name and cause strange program bugs. The functionality offered is very similar to that which is used to secure access to files. MutexSecurity, SemaphoreSecurity, and EventWaitHandleSecurity are the relevant types, and they are located in the System.Security.AccessControl namespace. In-depth coverage of the .NET Framework's ACL functionality is provided in Chapter 9.
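As a brief, hedged sketch of what this looks like (the mutex name is illustrative, and the rule grants synchronize/modify rights only to the current user; it requires the System.Security.AccessControl and System.Security.Principal namespaces):

 SecurityIdentifier user = WindowsIdentity.GetCurrent().User;
 MutexSecurity security = new MutexSecurity();
 security.AddAccessRule(new MutexAccessRule(
     user, MutexRights.Synchronize | MutexRights.Modify, AccessControlType.Allow));

 bool createdNew;
 Mutex m = new Mutex(false, "SecuredMutexName", out createdNew, security);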

Reader-Writer Locks

With the aforementioned mutual exclusion techniques, it is quite simple to protect access to shared resources by ensuring that only one thread can read or write the shared data at a time. However, treating reads and writes identically can result in an unnecessary reduction in parallelism. In most cases, all you really need to enforce is that writes cannot interfere with each other (i.e., updates are atomic) and that readers only see consistent data. A policy that achieves this is: when a writer is writing, only that one writer may be active and nobody may be reading concurrently; but if nobody is writing, multiple tasks may read in parallel. To implement such a policy, we certainly don't need to throttle all access to the data to one thread at a time. Doing so can, in fact, degrade performance significantly, for example in scenarios where the volume of reads dwarfs the number of writes.

The ReaderWriterLock class enables you to do exactly that. It allows multiple readers to hold the lock simultaneously, but while a writer holds the lock nobody else can acquire a read or write lock. A writer will only obtain the lock once all readers and writers have released their locks. Acquisition is done using the AcquireReaderLock and AcquireWriterLock methods. Each takes either an integer or TimeSpan-based timeout. A thread will block until the type of lock requested is taken, or until the timeout is exceeded, in which case the method returns without taking a lock. You can pass Timeout.Infinite (i.e., the integer -1) to indicate that the acquisition should not time out. The properties IsReaderLockHeld and IsWriterLockHeld can be used to determine whether an acquisition succeeded; they return true if the respective type of lock is held by the current thread.

The corresponding ReleaseReaderLock and ReleaseWriterLock methods release a lock of the respective type. Because the ReaderWriterLock class is reentrant and recursive, a single thread can acquire the same lock multiple times. The ReleaseLock method is a shortcut that releases the current thread's lock entirely, reader or writer, regardless of how many nested acquisitions are outstanding:

 ReaderWriterLock rwLock = new ReaderWriterLock();
 rwLock.AcquireReaderLock(250);
 if (rwLock.IsReaderLockHeld)
 {
     try
     {
         // Synchronized code...
     }
     finally
     {
         rwLock.ReleaseReaderLock();
     }
 }

Although the lock is reentrant, a writer is only admitted when there are no current readers. This means that a thread that already owns a reader lock and then tries to acquire a writer lock will block until its timeout expires (or indefinitely if an infinite timeout was supplied). As a convenience, ReaderWriterLock also has an UpgradeToWriterLock method, which converts an already held reader lock into a writer lock. This is not an atomic operation as you might expect, so a waiter already in the queue might be given a chance to run before your lock is upgraded. The method returns a LockCookie that is passed to the DowngradeFromWriterLock method should you want to downgrade back to a reader lock.
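For example, a sketch of the upgrade/downgrade pattern:

 ReaderWriterLock rwLock = new ReaderWriterLock();
 rwLock.AcquireReaderLock(Timeout.Infinite);
 try
 {
     // Read some state... then discover a write is needed:
     LockCookie cookie = rwLock.UpgradeToWriterLock(Timeout.Infinite);
     try
     {
         // Write the state. Because the upgrade is not atomic, another writer may
         // have run in between, so re-validate anything read before the upgrade.
     }
     finally
     {
         rwLock.DowngradeFromWriterLock(ref cookie);
     }
 }
 finally
 {
     rwLock.ReleaseReaderLock();
 }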

Request Queues and Starvation

The ReaderWriterLock maintains two queues of lock requests: one for writers, the other for readers. In very read-intensive applications, a writer starvation situation can occur. This might happen theoretically if a writer had to wait until all reader locks were released before acquiring its lock. If readers continue to acquire locks even while a writer waits, it is possible that there would never be a point in time when there were zero active readers. If this were the case, no writers would ever be granted locks, obviously wreaking havoc on the program.

To solve this problem, the ReaderWriterLock class was implemented to stop handing out reader locks as soon as a writer enters the queue. Only when the writer queue is empty will read locks once again be allowed. Unfortunately, this can mean the reverse of the above-mentioned situation can happen. That is, if a constant stream of writers continues to enter the queue, readers will never be given a chance to run. Therefore, this type should be used only in situations where the ratio of read to write operations is at least 2:1 (if not greater).

An Example: Readers and Writers

Let's explore a situation that demonstrates some potential problems with mutual exclusive access, and in particular the benefits of using a ReaderWriterLock. Say that we have an Account class with a number of properties. Each instance of this class is shared by many threads in the program and is read from quite frequently for various purposes. Updates to these instances occur on a relatively infrequent basis. A clear requirement of our application is that an Account can never be viewed in an inconsistent state (that is, mid-update):

 class Account
 {
     private string company;
     private decimal balance;
     private DateTime lastUpdate;

     public Account(string company)
     {
         this.company = company;
         this.balance = 0.0m;
         this.lastUpdate = DateTime.Now;
     }

     public string Company { get { return company; } }
     public decimal Balance { get { return balance; } }
     public DateTime LastUpdate { get { return lastUpdate; } }

     public decimal UpdateBalance(decimal delta)
     {
         balance += delta;
         lastUpdate = DateTime.Now;
         return balance;
     }
 }

Unless the code that uses Account performs some sort of mutual exclusion, a number of problems could occur. Concurrent updates might clash with each other (e.g., resulting in lost balance updates), or a reader could observe an instance that has been only partially updated (e.g., balance was updated but the corresponding lastUpdate modification hasn't happened yet). A naïve synchronization scheme might take a lock around any use of an Account instance.

For example, a reader might just lock on the instance itself:

 Account account = /*...*/;
 lock (account)
 {
     Console.WriteLine("{0}, balance: {1}, last updated: {2}",
         account.Company, account.Balance, account.LastUpdate);
 }

And likewise a writer might lock to ensure that it doesn't conflict with concurrent updates:

 Account account = /*...*/;
 lock (account)
 {
     account.UpdateBalance(-125.75m); // debit operation
 }

But if we're mostly just reading the data, two concurrent reads will needlessly contend with each other. This reduces concurrency for no good reason. To solve this problem, we can use the ReaderWriterLock class:

 class Account
 {
     public ReaderWriterLock SyncLock = new ReaderWriterLock();

     // Class definition otherwise unchanged...
 }

Our consumers of this class can then use a simple policy to choreograph concurrent access to a shared instance of Account. Whenever a client is going to read data from an Account, it must first call AcquireReaderLock and then call ReleaseReaderLock when it's done; similarly, writers must call the AcquireWriterLock and ReleaseWriterLock methods:

 Account account = /*...*/;
 account.SyncLock.AcquireReaderLock(-1);
 try
 {
     Console.WriteLine("{0}, balance: {1}, last updated: {2}",
         account.Company, account.Balance, account.LastUpdate);
 }
 finally
 {
     account.SyncLock.ReleaseReaderLock();
 }

And similarly, the writer would change to:

 Account account = /*...*/;
 account.SyncLock.AcquireWriterLock(-1);
 try
 {
     account.UpdateBalance(-125.75m);
 }
 finally
 {
     account.SyncLock.ReleaseWriterLock();
 }

This eliminates unnecessary contention for mutual exclusion locks. You might also consider synchronizing the implementations of the methods themselves rather than forcing consumers to understand the locking mechanisms. For example, the UpdateBalance method could automatically obtain and release a writer lock at the beginning and end of its body, respectively. It's not so simple for the reader locks, though. Usually, you want to use one lock spanning multiple property accesses to simulate atomicity; thus, obtaining and releasing a new lock before and after each property access wouldn't work. If you had an atomic getter method, for example, you could synchronize access properly. These two instance methods demonstrate this technique:

 public class Account
 {
     private ReaderWriterLock syncLock = new ReaderWriterLock();

     public decimal UpdateBalance(decimal delta)
     {
         syncLock.AcquireWriterLock(-1);
         try
         {
             balance += delta;
             lastUpdate = DateTime.Now;
             return balance;
         }
         finally
         {
             syncLock.ReleaseWriterLock();
         }
     }

     public void GetState(out string companyOut, out decimal balanceOut,
         out DateTime lastUpdateOut)
     {
         syncLock.AcquireReaderLock(-1);
         try
         {
             companyOut = company;
             balanceOut = balance;
             lastUpdateOut = lastUpdate;
         }
         finally
         {
             syncLock.ReleaseReaderLock();
         }
     }

     // Class definition otherwise unchanged...
 }

Admittedly having to access an object's state through the use of such an atomic GetState(...) method is a little awkward. You might consider returning a simple struct containing each of the fields instead.
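One possible shape for such a snapshot is sketched below; AccountSnapshot and GetSnapshot are illustrative names, not part of the Account class shown earlier. The reader lock is acquired once, all three fields are copied, and the caller receives a consistent view it can examine without holding any lock.

 // Hypothetical immutable snapshot of an Account's state.
 public struct AccountSnapshot
 {
     public readonly string Company;
     public readonly decimal Balance;
     public readonly DateTime LastUpdate;

     public AccountSnapshot(string company, decimal balance, DateTime lastUpdate)
     {
         Company = company;
         Balance = balance;
         LastUpdate = lastUpdate;
     }
 }

 // Inside Account: copy all fields under a single reader lock.
 public AccountSnapshot GetSnapshot()
 {
     syncLock.AcquireReaderLock(-1);
     try
     {
         return new AccountSnapshot(company, balance, lastUpdate);
     }
     finally
     {
         syncLock.ReleaseReaderLock();
     }
 }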

Interlocked Operations

The Interlocked class contains a number of static helper methods that perform atomic updates to shared memory locations. Interlocked uses hardware primitives to implement fast compare-and-swap and related operations, avoiding any form of blocking whatsoever. These primitives map to specific instructions available on all modern hardware (e.g., the lock prefix on x86, memory fences on IA-64) and work in unison with the cache coherency protocol and memory controllers to guarantee a certain degree of thread safety.

Adding a number to a shared integer requires three IL instructions: load, modify, and store. As we saw above, this can lead to race conditions. Instead of using a heavyweight lock, which can cause contending threads to enter a blocking wait-state, you can use the atomic Add method. There are int-based (32-bit) and long-based (64-bit) overloads. Both take a reference to a location as their first argument and the value to add as their second. The method atomically adds the value to the contents of the location and returns the new value stored there. This is perfect for our scenario above:

 static int nextId = 0;

 static int NextId()
 {
     return Interlocked.Add(ref nextId, 1);
 }

This code is now entirely thread-safe, and much better performing too.

Similarly, the Increment and Decrement methods are shortcuts for Add(1) and Add(-1), respectively, and likewise return the new value stored in the memory location after the operation. In other words, we could use Increment for our example above as follows:

 static int nextId = 0;

 static int NextId()
 {
     return Interlocked.Increment(ref nextId);
 }

The Exchange method enables you to replace the contents of a location and retrieve the original value as a single atomic operation. There are several overloads: for integers, floating-point values, IntPtrs, and objects. The reference to the location is passed as the first argument, the value to place in it as the second, and the function returns the old contents. CompareExchange is similar, but it conditionally checks that the target location holds a specific value before changing its contents. The first two arguments are the same as with Exchange, and the third is used for the comparison. If the value stored in location1 is equal to comparand, it is set to value; otherwise, location1's contents are left unchanged. In either case, CompareExchange returns whatever value it saw in location1.
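Before looking at a full lock built on CompareExchange, here is a sketch of the typical retry-loop pattern: read the current value, compute a replacement, and attempt the exchange, retrying if another thread got there first. The Max helper is hypothetical, not a Framework method.

 using System.Threading;

 static class InterlockedEx
 {
     // Atomically raises 'target' to at least 'candidate' without taking a lock.
     public static void Max(ref int target, int candidate)
     {
         while (true)
         {
             int observed = target;            // snapshot the current value
             if (candidate <= observed)
                 return;                       // already large enough; nothing to do
             // Install 'candidate' only if 'target' still holds 'observed';
             // otherwise another thread changed it, so loop and retry.
             if (Interlocked.CompareExchange(ref target, candidate, observed) == observed)
                 return;
         }
     }
 }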

For example, CompareExchange can be used to build a spin-lock. This is shown in Listing 10-1. This code uses a number of advanced techniques such as critical regions and SpinWaits, which we don't discuss until the "Advanced Threading Topics" section below.

Listing 10-1: A spin-lock

 using System;
 using System.Threading;

 class SpinLock
 {
     private int state;
     private EventWaitHandle available = new AutoResetEvent(false);

     // This looks at the total number of hardware threads available; if it's
     // only 1, we will use an optimized code path.
     private static bool isSingleProc = (Environment.ProcessorCount == 1);

     private const int outerTryCount = 5;
     private const int cexTryCount = 100;

     public void Enter(out bool taken)
     {
         // Taken is an out parameter so that we set it *inside* the critical
         // region, rather than returning it and permitting aborts to creep in.
         // Without this, the caller could take the lock, but not release it
         // because it didn't know it had to.
         taken = false;

         while (!taken)
         {
             if (isSingleProc)
             {
                 // Don't busy wait on 1-logical processor machines; try
                 // a single swap, and if it fails, drop back to the EventWaitHandle.
                 Thread.BeginCriticalRegion();
                 taken = Interlocked.CompareExchange(ref state, 1, 0) == 0;
                 if (!taken)
                     Thread.EndCriticalRegion();
             }
             else
             {
                 for (int i = 0; !taken && i < outerTryCount; i++)
                 {
                     // Tell the CLR we're in a critical region;
                     // interrupting could lead to deadlocks.
                     Thread.BeginCriticalRegion();

                     // Try 'cexTryCount' times to CEX the state variable:
                     int tries = 0;
                     while (!(taken =
                         Interlocked.CompareExchange(ref state, 1, 0) == 0) &&
                         tries++ < cexTryCount)
                     {
                         Thread.SpinWait(1);
                     }

                     if (!taken)
                     {
                         // We failed to acquire in the busy spin, mark the end
                         // of our critical region and yield to let another
                         // thread make forward progress.
                         Thread.EndCriticalRegion();
                         Thread.Sleep(0);
                     }
                 }
             }

             // If we didn't acquire the lock, block.
             if (!taken) available.WaitOne();
         }
     }

     public void Enter()
     {
         // Convenience method. Using this could be prone to deadlocks.
         bool taken;
         Enter(out taken);
     }

     public void Exit()
     {
         if (Interlocked.CompareExchange(ref state, 0, 1) == 1)
         {
             // We notify the waking threads inside our critical region so
             // that an abort doesn't cause us to lose a pulse (which could
             // lead to deadlocks).
             available.Set();
             Thread.EndCriticalRegion();
         }
     }
 }

Lastly, reads and writes of 64-bit values on 32-bit machines are not atomic. This means that, in the presence of a race, one thread could read the first 32 bits of a 64-bit location, another thread could update that location, and then the first thread could read the remaining 32 bits. Presumably the value will not be a pointer (because we're on a 32-bit machine), but this can lead to subtle data corruption. To guarantee atomicity, use the Interlocked.Read and Interlocked.Exchange methods.
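For example, a shared 64-bit counter that must remain correct on 32-bit machines might be written as follows (a minimal sketch):

 using System.Threading;

 class HitCounter
 {
     private long total;    // 64-bit; plain reads/writes are not atomic on 32-bit CPUs

     public void Record(long amount)
     {
         // Interlocked.Add updates all 64 bits as one atomic operation.
         Interlocked.Add(ref total, amount);
     }

     public long Current
     {
         // Interlocked.Read returns a consistent 64-bit value even if another
         // thread is mid-update on a 32-bit machine.
         get { return Interlocked.Read(ref total); }
     }
 }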

Common Concurrency Problems

A set of common problems arises with concurrent code. We saw one of the biggies above: race conditions. Races are among the most difficult bugs to test for and to fix, due to the complex interaction of seemingly unrelated code coupled with strange, hard-to-reproduce timings. There are some others that you must be cognizant of.

Deadlocks

Perhaps the most well-known problem in concurrent applications is deadlock. This is a situation in which a chain of threads ends up waiting in such a way that none will ever wake up. This can happen, for example, if your threads synchronize on shared data but acquire and release locks in different orders. Unless mitigation is in place for such a situation, your program (or parts of it) could end up hanging indefinitely.

As a simple example, consider a situation where two threads lock on shared data. They do so, however, in different orders. Thread A acquires the lock for a and then b:

 lock (a)
 {
     lock (b)
     {
         // Synchronized code...
     }
 }

Thread B acquires the same locks in reverse order, b and then a:

 lock (b)
 {
     lock (a)
     {
         // Synchronized code...
     }
 }

Consider what occurs, however, in this sequence: A acquires a and is then preempted; B acquires b. B attempts to acquire a but blocks because A has it. A then runs and attempts to acquire b but blocks because B has it. We are now in a deadlock, a so-called deadly embrace. Thread A won't release the lock on a until it can acquire the lock for b, but Thread B won't release the lock for b until it can acquire one for a. Unfortunately, deadlocks aren't always so straightforward to detect. A wait-graph is a data structure that tracks who is waiting for whom. Any cycle in this graph represents a deadlock. For example, imagine that A is waiting for B, B is waiting for C, C is waiting for D, and D is waiting for A; this is a deadlock but much more difficult to detect and manage.

Once we are in a deadlock situation, something must give. But the CLR does not attempt to automatically resolve the conflict. SQL Server as a host, however, does perform deadlock detection. Most RDBMSs deal with this situation by killing whichever task has done the least amount of work. But for complex wait-graphs, the algorithm used can get quite complex. Since there isn't built-in support for dealing with deadlocks, you'll need to mitigate the risks yourself. The best way to do this, of course, is to avoid the situation entirely.

The simplest way to avoid a deadlock is to use lock leveling. This is a technique that ensures lock acquisitions occur in a consistent order throughout your entire application. The situation demonstrated above could never occur if both blocks of code always acquired a and then b. Typically, this is done by factoring software components in a layered fashion; components at a given layer may then only acquire locks belonging to lower layers. This is difficult to enforce across an entire application, but rigorous code reviews and some level of library support can help.
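A minimal sketch of what such library support might look like follows. LeveledLock is a hypothetical type, not part of the Framework; it throws whenever a thread tries to acquire a lock whose level is not strictly lower than the ones it already holds, turning an ordering violation into an exception at development time rather than a hang in production. It assumes non-recursive acquires released in LIFO order.

 using System;
 using System.Collections.Generic;
 using System.Threading;

 // Hypothetical leveled lock: each thread must acquire locks in strictly
 // decreasing level order, which rules out cyclic waits.
 class LeveledLock
 {
     [ThreadStatic]
     private static Stack<int> heldLevels;   // levels held by the current thread

     private readonly object sync = new object();
     private readonly int level;

     public LeveledLock(int level) { this.level = level; }

     public void Enter()
     {
         if (heldLevels == null)
             heldLevels = new Stack<int>();
         if (heldLevels.Count > 0 && level >= heldLevels.Peek())
             throw new InvalidOperationException(
                 "Lock leveling violation: acquiring level " + level +
                 " while holding level " + heldLevels.Peek());
         Monitor.Enter(sync);
         heldLevels.Push(level);
     }

     public void Exit()
     {
         // Assumes locks are released in LIFO order relative to Enter.
         heldLevels.Pop();
         Monitor.Exit(sync);
     }
 }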

Starvation

The way processing time is allocated to threads is left entirely up to the OS scheduler, which relies on both process priority class and thread priority when choosing runnable tasks. The result is that higher-priority items are typically given a chance to run before lower-priority items. In most situations, this is exactly what you want. However, it can also cause problems, resulting in a situation generally referred to as starvation.

As an illustration of starvation, consider a program that generates a large volume of high-priority threads. If these threads use up all of the available processor time, no lower-priority tasks will get any time to execute. The scheduler could get into a situation where it is enabling the higher-priority threads to hog all of the available CPU time, never preempting any to allow a lower-priority task to run. Admittedly, this is a rare situation, since most tasks are I/O-bound rather than CPU-bound (even a higher-priority thread that blocks will permit a lower-priority thread to run), but it can indeed occur.

Priority inversion is a classic manifestation of starvation. Imagine this scenario: Three threads are running, A at high priority, B at medium priority, and C at low priority. C gets a chance to run and acquires lock a. A then becomes runnable, tries to acquire lock a, and subsequently blocks (because C has it already). Before C has a chance to release a, B becomes runnable; the OS lets B run flat-out because it is higher priority than C. At this point, we've hit a priority inversion: A, nominally the highest-priority thread, is effectively held to C's low priority, because it cannot proceed until C releases a, and C cannot run while B monopolizes the processor. B has to give up the processor and allow C to release a before A can make forward progress again.

The general problem of starvation is handled by Windows. The thread scheduler employs an anti-starvation policy to avoid the bleak scenarios painted above. A detailed discussion of how this works is outside of the scope of this book. Please refer to the "Further Reading" section for some resources that drill deeper into the OS's scheduling algorithms.

Events

Concurrent tasks must often coordinate work with each other. Imagine a situation where one thread is producing items of interest while another is consuming them. One way to implement such a scheme would be to write the consumer code in a loop that checks for items, processes any that are available, and sleeps for a fixed amount of time before checking again when there aren't any. This is a very poor approach. Waiting an arbitrary amount of time at the end of each iteration is wasteful: the consumer will inevitably end up either sleeping longer than necessary or not long enough. Ideally, the producer would "announce" to all interested parties that a new item is available as soon as one becomes available.

Events permit you to do exactly that. An event is just a notification that some condition has arisen that other objects might be interested in. When a data structure is modified or when some object causes a condition to become true, it is responsible for broadcasting relevant events. Those listening for an event can then wake up and perform whatever processing is needed. In the context of the above-mentioned scenario, the producer can generate an event each time an item has been produced, and consumers can use these events to trigger consumption.

Monitor-Based Events

Monitors can be used for event-based programming. In the same way that you can call Enter to enter an object's monitor, you can likewise wait on an object. An object can then be signaled to wake up anybody who is currently waiting. Pulse is used to wake up just a single (randomly chosen) waiter, while PulseAll is used to wake up all waiters.

To Wait for an object, you must first enter the target's monitor. Once you invoke the Wait(...) method, you implicitly exit the object's monitor temporarily and your thread is moved into the WaitSleepJoin thread-state. When the target object generates an event (i.e., a "pulse"), the thread will wake up, attempt to reacquire the object's monitor, and proceed right after the call to Wait. There is no way to specify the condition for which you are waiting, so you must always check that the condition holds immediately after being woken up.

For example, this code requires that the queue have at least one item to process. We must verify that the condition holds when we wake up:

 static Queue<object> queue = new Queue<object>();

 void Consume()
 {
     object item = null;
     lock (queue)
     {
         // Standard test-condition/wait loop:
         while (queue.Count == 0)
             Monitor.Wait(queue);
         item = queue.Dequeue();
     }
     // OK to process the item...
 }

The Wait method has a number of overloads, all of which take an object parameter obj indicating the target object on which to wait. Like many of the Wait-like constructs in the System.Threading namespace, you can pass a timeout in the form of an int-based millisecond count or a TimeSpan instance. If the timeout is exceeded while waiting, the method will return false. In this case, the event did not occur.

Events are generated with Pulse and PulseAll, both of which must be called while the target object's monitor is held. Pulses are not processed until the monitor is released; once processed, the pulse wakes up threads that are currently waiting on the object:

 void Produce()
 {
     object item = /*...*/;
     lock (queue)
     {
         queue.Enqueue(item);
         Monitor.Pulse(queue);
     }
 }

Which style of pulse you use is entirely dependent on your algorithm. For example, in the producer-consumer example from above, assuming that there can be multiple consumers per producer, Pulse is likely the best choice. Only one thread can consume a given item at a time, so pulsing all of the available consumers would just cause them to race to enter the monitor simultaneously, increasing contention and wasting cycles. Pulsing one at a time enables a single available consumer to respond to each produced item.

Missed Pulses

Regardless of whether you use Pulse or PulseAll, there is a problematic situation called a "missed pulse" that you must be cautious of. The problem arises when an event misses its intended recipient, sometimes causing the recipient to subsequently wait for the event (which has already occurred); this can lead to deadlocks if the consumption code was written incorrectly.

To illustrate a scenario where this might occur, imagine we use Pulse to notify objects that an interesting condition has arisen. There are multiple threads, each waiting for a different condition to arise, but sharing the same object for event communication. Unless you use PulseAll, you can't be certain which thread will be woken up when you generate the event. The event could be delivered to a thread waiting for an entirely different condition to arise; it will promptly notice this and ignore the event. In this case, the thread actually waiting for the condition will continue to wait. Depending on the interaction between threads, this thread could be left waiting forever. Clearly, using PulseAll in this situation will solve the problem.

Even when using PulseAll, improperly designed consumers can still miss an event. If they aren't in a wait state when the event occurs, they must take care to validate that the event hasn't already occurred before waiting. If a consumer neglects to check for the condition before calling Wait, it could end up waiting indefinitely. This is obviously due to incorrect code but is an easy situation to get into, especially for System.Threading beginners.

Win32 Events

Both the AutoResetEvent and ManualResetEvent classes (deriving from the EventWaitHandle type) provide event functionality based on WaitHandle. The functionality is similar to that of monitors described above but includes a few additional capabilities. As indicated by their names, the primary difference between the two is how events are reset.

As with the other Win32 synchronization primitives, you can open an existing system-wide named event shared among processes with the OpenExisting method; alternatively, you can construct either a local or system-wide event using the constructors. To wait for an event, you simply call the WaitOne method, optionally passing in a timeout for the wait. Similar to Mutex and Semaphore, you can wait for multiple handles using the WaitHandle.WaitAny or WaitAll methods. These block until an event is signaled.

To set an event, you call the Set method. This sets the handle to a signaled state. If you're using an auto-reset event, this will wake up at most one thread currently waiting on the handle. The first thread to enter the event will reset the signal back to an unsignaled state. With a manual-reset event, all threads waiting for the handle are woken up, and the event remains signaled until explicitly reset with the Reset method. You can construct very rich thread interactions using the auto- and manual-reset variants. Because of this, EventWaitHandles are often preferable to using the Monitor.Wait, Pulse, and PulseAll counterparts.
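As a sketch (not from the text), the producer/consumer hand-off from earlier could be signaled with an AutoResetEvent instead of monitor pulses; the queue itself still needs a lock of its own, and the consumer re-checks the queue before waiting so that a signal latched while it was busy is never lost:

 using System.Collections.Generic;
 using System.Threading;

 class EventBasedQueue
 {
     private readonly Queue<object> queue = new Queue<object>();
     private readonly AutoResetEvent itemAvailable = new AutoResetEvent(false);

     public void Produce(object item)
     {
         lock (queue)
         {
             queue.Enqueue(item);
         }
         itemAvailable.Set();            // wake at most one waiting consumer
     }

     public object Consume()
     {
         while (true)
         {
             lock (queue)
             {
                 if (queue.Count > 0)
                     return queue.Dequeue();
             }
             itemAvailable.WaitOne();    // block until a producer signals
         }
     }
 }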

Timers

The System.Threading.Timer class enables you to execute code asynchronously on a timed interval. This functionality implicitly uses the thread pool to execute the raised events. To create a new timer, simply call one of its constructors, passing in the callback delegate that is to be invoked on a recurring interval and the object state to be passed to the delegate. This delegate is similar to WaitCallback above and has a signature of void TimerCallback(object state).

There are several overloads of the constructor, enabling you to specify the start time and periodic interval on which to fire the timer event. Regardless of which you choose (there are signed and unsigned 32-bit integer, 64-bit integer, and TimeSpan-based versions available), the dueTime parameter specifies the amount of time to delay before starting the timer (0 indicates immediate execution), and period specifies the amount of time to delay in between recurring events. Timeout.Infinite can be passed for either of these parameters. If used for dueTime, the Timer will not begin execution until a new dueTime is specified with Change; if used for period, it indicates that the Timer should continue executing forever.

To stop a timer from executing and release any native resources it is holding on to, call the Dispose method on the Timer instance. You must keep a reference around to the Timer you created; otherwise, you'll create a temporary memory leak and won't be able to stop its execution!
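A minimal sketch of creating, pausing, and disposing a Timer (the interval values and state object are arbitrary):

 using System;
 using System.Threading;

 class TimerExample
 {
     static void Main()
     {
         // Fire the callback after 1 second, then every 5 seconds thereafter.
         Timer timer = new Timer(Tick, "some state", 1000, 5000);

         Console.ReadLine();                             // let it run for a while

         // Pause the timer (no further callbacks until Change is called again)...
         timer.Change(Timeout.Infinite, Timeout.Infinite);
         // ...and finally release its native resources.
         timer.Dispose();
     }

     static void Tick(object state)
     {
         // Runs on a ThreadPool thread, not the thread that created the timer.
         Console.WriteLine("Tick at {0}, state = {1}", DateTime.Now, state);
     }
 }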

Also note that this particular Timer doesn't work well with WinForms. We discuss GUI threading models at a high level below. But if you want to modify WinForms UI components inside the callback — which timers are often used to do — you will need to transition onto the UI thread. (You'll see how to do that manually just shortly.) The System.Timers.Timer class offers a SynchronizingObject property. If you set that to your target UI widget, the class will automatically transition onto the UI thread when invoking the callback. The System.Windows.Forms.Timer class offers similar functionality. Both classes are components and thus integrate with the WinForms designer quite nicely.

Asynchronous Programming Model (APM)

Several areas of the Framework offer asynchronous versions of expensive synchronous operations. The System.IO.Stream APIs are a great example of this; we saw some examples of how to use them in Chapter 7. Most such asynchronous operations in the Framework follow a standardized design pattern, called the asynchronous programming model (APM). Operations following this pattern offer a pair of methods, BeginXxx and EndXxx, which correspond to the asynchronous begin- and end-methods for a synchronous method Xxx. For example, Stream offers BeginRead and EndRead to take the place of its Read method.

An APM pair of methods follows a standard design convention:

  • BeginXxx accepts the same parameters as Xxx plus two additional ones at the end: an AsyncCallback and a state object. BeginXxx returns an IAsyncResult object, which can be used for completing the asynchronous activity (you'll see this interface shortly). Executing this function initiates the asynchronous activity;

  • EndXxx accepts an IAsyncResult as input and returns the same return type as the Xxx operation. This function blocks execution until the asynchronous activity has completed and returns the value returned by the underlying Xxx operation; if the underlying operation threw an exception, EndXxx will relay (i.e., rethrow) this exception.

For example, here are the Stream class's Read, BeginRead, and EndRead operations:

 public abstract int Read([In, Out] byte[] buffer, int offset, int count);
 public virtual IAsyncResult BeginRead(byte[] buffer, int offset, int count,
     AsyncCallback callback, object state);
 public virtual int EndRead(IAsyncResult asyncResult);

Notice that they follow the rules described above exactly. For reference, here is the definition of the System.IAsyncResult interface:

 public interface IAsyncResult
 {
     object AsyncState { get; }
     WaitHandle AsyncWaitHandle { get; }
     bool CompletedSynchronously { get; }
     bool IsCompleted { get; }
 }

Executing an asynchronous activity does not guarantee precisely where the operation will be carried out. It might run on a thread from the ThreadPool, on no thread at all (as is the case with I/O Completion Ports — see Chapter 7 for details), or even synchronously on the thread that called BeginXxx itself.

Rendezvousing is the act of completing an asynchronous activity. You have three options:

  • Supply an AsyncCallback delegate to the BeginXxx activity and optionally some state to be transparently passed to the delegate in the IAsyncResult's AsyncState property upon invocation. The asynchronous activity will call your delegate once it is complete, at which point your method can finish the activity. This requires calling EndXxx on the object on which BeginXxx was called to obtain the return value and to marshal exceptions (if any). Even if the return type for Xxx is void, calling EndXxx is necessary to relinquish resources held by the IAsyncResult.

  • Poll the IAsyncResult's IsCompleted property for completion. If it returns true, the activity has completed and you can call EndXxx without blocking. This can be used to make an application progress forward until the task is finished. Note that you should not use IsCompleted as a spin-loop predicate, as this can lead to very poor performance. Consider a blocking wait if you'd like to pause execution of the thread until the activity has completed.

  • Block until the asynchronous task has finished. You can do this either by calling WaitOne on the IAsyncResult's AsyncWaitHandle or just calling EndXxx directly. Remember: EndXxx will block until the asynchronous operation is done.

You should never attempt to mix the callback approach with the blocking wait approach. The callback and the caller share the same IAsyncResult; calling EndXxx on it can lead to race-condition-induced failures. Furthermore, there is no ordering guarantee regarding setting IsCompleted, signaling the WaitHandle, and invoking the callback. All of these factors can lead you into trouble if you try to mix techniques.
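To make the callback-based rendezvous concrete, here is a sketch that reads from a FileStream asynchronously; the file name, buffer size, and handler name are illustrative:

 using System;
 using System.IO;

 class ApmExample
 {
     static void Main()
     {
         FileStream fs = new FileStream("data.bin", FileMode.Open, FileAccess.Read,
             FileShare.Read, 4096, FileOptions.Asynchronous);
         byte[] buffer = new byte[4096];

         // Kick off the read; pass the stream as the state object so the
         // callback can retrieve it from IAsyncResult.AsyncState.
         fs.BeginRead(buffer, 0, buffer.Length, ReadCompleted, fs);

         // The calling thread is free to do other work here...
         Console.ReadLine();
     }

     static void ReadCompleted(IAsyncResult ar)
     {
         FileStream fs = (FileStream)ar.AsyncState;
         try
         {
             // EndRead returns the byte count (or rethrows any exception).
             int bytesRead = fs.EndRead(ar);
             Console.WriteLine("Read {0} bytes asynchronously", bytesRead);
         }
         finally
         {
             fs.Close();
         }
     }
 }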

Note that delegates follow the APM. Any custom delegate type you define will have a synchronous Invoke method and asynchronous BeginInvoke and EndInvoke methods. You should generally prefer APM methods offered by a type over ThreadPool.QueueUserWorkItem, and you should prefer using the ThreadPool over asynchronous delegates; using the ThreadPool directly is more efficient than using asynchronous delegate invocations.

UI Threads and Background Workers

It is illegal to manipulate a User Interface (UI) control on any thread other than the dedicated UI thread. Every Windows GUI application has at least one such thread (which you can retrieve via the Win32 GetWindowThreadProcessId function). This means that if you are doing any asynchronous work, for example on the ThreadPool, an explicit Thread, or from within an asynchronous callback, you need to worry about transitioning back onto the UI thread to update interface components. The System.Windows.Forms.Control type (in the System.Windows.Forms.dll assembly) implements the System.ComponentModel.ISynchronizeInvoke interface, which can be used to make such transitions back onto the thread to which the target control belongs.

If you have a Label whose Text property you'd like to update, for example, you would need to worry about this. Control offers an InvokeRequired property, which returns true to indicate that you're not executing on the UI thread and thus need to transition. A transition is performed by calling the Invoke method, passing a delegate, which will be queued for execution on the UI thread and invoked:

 System.Windows.Forms.Label lb = /*...*/;

 void SomeAsyncWorkerCallback(IAsyncResult ar)
 {
     // Call some EndXxx function, get results...
     lb.Invoke((MethodInvoker)delegate { lb.Text = "..."; });
 }

In addition to Invoke, there are BeginInvoke and EndInvoke methods, which are simply the APM versions for cases where you do not want to block while the message is sent to and processed by the GUI thread.

To aid in this type of logic, 2.0 contains a new type in System.ComponentModel: BackgroundWorker. This type can be used to launch asynchronous tasks from the GUI thread and to service various callbacks on the UI thread; BackgroundWorker handles transitioning back to the UI thread when appropriate. Just instantiate a new instance and add a handler to the DoWork event. This is where you should place your intensive operations. You must not touch UI components inside this function, as it executes on a ThreadPool thread. You can also add handlers to the ProgressChanged and RunWorkerCompleted events; these are fired on the UI thread, meaning that you can update visual elements to communicate progress or the final results of the operation. Calling RunWorkerAsync invokes the DoWork handler, passing the optional object argument as DoWorkEventArgs.Argument.
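Here is a minimal sketch of that wiring; the form, label, and handler names are illustrative, and the DoWork body simply simulates an expensive computation:

 using System;
 using System.ComponentModel;
 using System.Windows.Forms;

 public class WorkerForm : Form
 {
     private Label statusLabel = new Label();
     private BackgroundWorker worker = new BackgroundWorker();

     public WorkerForm()
     {
         Controls.Add(statusLabel);
         worker.DoWork += new DoWorkEventHandler(worker_DoWork);
         worker.RunWorkerCompleted +=
             new RunWorkerCompletedEventHandler(worker_RunWorkerCompleted);
     }

     protected override void OnLoad(EventArgs e)
     {
         base.OnLoad(e);
         worker.RunWorkerAsync(42);   // argument surfaces as e.Argument in DoWork
     }

     void worker_DoWork(object sender, DoWorkEventArgs e)
     {
         // Runs on a ThreadPool thread; do NOT touch UI controls here.
         int input = (int)e.Argument;
         System.Threading.Thread.Sleep(2000);   // simulate expensive work
         e.Result = input * 2;
     }

     void worker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
     {
         // Raised on the UI thread, so touching controls is safe.
         statusLabel.Text = "Result: " + e.Result;
     }
 }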

Advanced Threading Topics

This section discusses a variety of miscellaneous advanced threading topics.

Critical and Thread-Affinity Sections

Because the hosting APIs permit sophisticated hosts to map logical threads to whichever physical OS representation they choose, it is important to communicate how the logical workload is being used at any given time. In addition, a host is apt to swap out work, attempt an AppDomain shutdown (via thread aborts), or any number of things. The Thread APIs Begin- and EndCriticalRegion and Begin- and EndThreadAffinity communicate this information.

A critical region is a section of code that, if interrupted by a thread abort, would put the entire AppDomain at risk. Holding a lock is a great example: if a thread holding a lock is aborted without being permitted to release the lock and fix up any partial updates, the AppDomain's shared state could be corrupted. Thus, a thread that is inside a critical region will normally not be interrupted; if a host must escalate, it will initiate an entire AppDomain unload.

Thread affinity is a general term for code that relies on state in the TEB to function correctly. If you've stashed away some important state in TLS, for example, then migrating your work to another physical thread could cause unpredictable behavior. Using the thread affinity APIs notifies the host of this fact. You should avoid dependencies on thread affinity as much as possible. In some cases — for example Win32's GetLastError — it's unavoidable.

Memory Models

Because the CLR is a virtual execution machine, it takes a certain level of responsibility for abstracting the underlying hardware platform. Computer architectures often execute read and write instructions out of order. This can occur as a result of optimizations throughout the software stack (e.g., compiler optimizations, JIT optimizations), but it is equally likely to be due to the hardware itself. Modern processors use very deep pipelines to execute instructions ahead of time, in a predictive fashion, and sometimes even out of order. Modern memory hierarchies use several levels of cache, some of which are shared among CPUs (e.g., in Hyper-Threaded and multi-core processors), some of which are not. Cache coherency is the mechanism used to ensure that processor caches stay in sync.

All such architectures make guarantees about their memory consistency model, so that software can be written to function predictably. All models guarantee that single-threaded execution cannot observe the effects of reordering; concurrent software, however, can. All memory models likewise guarantee that data- and control-dependency ordering is not violated. Sequential consistency (a.k.a. program-order consistency) is a model that guarantees all reads and writes occur in the order in which you've written them. This is seldom practical: software and hardware execute instructions out of order, as described above, for performance reasons, and limiting their ability to do so can hurt performance. Moreover, most reads and writes are not to shared state, in which case such optimizations can never be observed.

Memory models are characterized by their strength. Sequential consistency is the strongest, while a processor that allows reads and writes to be rearranged without restriction is the weakest; most models are somewhere in between. IA-32 (a.k.a. x86) and AMD-64 are essentially sequentially consistent. Some weak total-ordering guarantees with respect to cache coherency are not made, but these seldom surface in practice (although on machines with larger numbers of processors, this might become more noticeable over time). IA-64 is the weakest model implemented in an Intel or AMD architecture to date; it uses special instructions to control ordering.

I choose to use the Intel terminology for controlling instruction reordering. The terminology is as follows and is depicted graphically in Figure 10-4:

  • Load acquire: Prevents reorderable instructions (reads and writes) coming after the load from being moved before it. Instructions can still move from before the load-acquire to after it. The IA-64 opcode for this is ld.acq.

  • Store release: Prevents reorderable instructions coming before the store from being moved after it. Instructions from after the store release can still be moved before it. IA-64's opcode for this is st.rel.

  • Memory fence (a.k.a. barrier): Prevents all reorderable instructions from moving with respect to the fence instruction. A barrier can be inserted manually using the Thread.MemoryBarrier method.

Figure 10-4: Instruction reordering (with IA-64 terminology).

Now that we've seen all of that, let's understand what the CLR's memory model guarantees. Its JIT compiler emits the proper instructions to make this work across all architectures.

The ECMA specification only prevents volatile reads and writes from being reordered. Marking a variable as volatile causes all reads and writes to the location to become load acquires and store releases, respectively. Alternatively, you can use the JIT intrinsics Thread.VolatileRead and Thread.VolatileWrite to mark specific load and store operations in this manner. Volatility makes this guarantee regardless of which memory model your implementation chooses.
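For example, a cross-thread stop flag is a typical use of volatile (a minimal sketch):

 using System;

 class Worker
 {
     // volatile: reads are load acquires, writes are store releases, and the
     // JIT will not hoist the read out of the loop below.
     private volatile bool stopRequested;

     public void Run()
     {
         while (!stopRequested)
         {
             // ... do a unit of work ...
         }
         Console.WriteLine("Worker observed the stop request.");
     }

     public void Stop()
     {
         stopRequested = true;   // becomes visible to the looping thread
     }
 }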

The CLR 2.0 memory model was strengthened considerably to make executing on IA-64 possible. It guarantees that all stores are store release. Loads are still ordinary loads, however. This memory model permits deleting redundant writes to a location but not redundant reads. These rules are simple enough to make lock-free algorithms such as the double-checked lock work correctly:

 class Singleton
 {
     private static object syncRoot = new object();
     private static Singleton instance;
     private string state;
     private bool initialized;

     private Singleton()
     {
         state = "...";
         initialized = true;
     }

     public static Singleton Instance
     {
         get
         {
             if (instance == null)
             {
                 lock (syncRoot)
                 {
                     if (instance == null)
                         instance = new Singleton();
                 }
             }
             return instance;
         }
     }
 }

This pattern is more efficient than locking unconditionally on every call to get_Instance: the lock is only taken the first time instance is constructed, plus during any contention that occurs at that moment. But this pattern would be broken on IA-64 without the store release guarantee. That's because there are three writes involved in constructing the singleton: the constructor's writes to state and initialized, and the get_Instance method's write to instance. Without the guarantee, these could be freely reordered with respect to each other, which could result in instance being set before its state was initialized!

COM Apartments

Apartments are used in COM code to make writing multi-threaded code easier. COM interoperability is discussed in Chapter 11. Several types of apartments are available, also shown in Figure 10-5:

  • Single Threaded Apartment (STA): Each STA can only contain a single thread. Processes may contain any number of STAs. COM components instantiated inside of an STA can only be accessed from the sole STA thread. This means code trying to access the component from any other thread must first transition onto the STA thread, eliminating the possibility of simultaneous access.

  • Multi-Threaded Apartment (MTA): There can only be a single MTA per process, but an MTA can contain any number of threads. COM components created inside of an MTA can only be accessed from within the MTA, meaning that STA threads must transition to the MTA before accessing such components. Components running inside of an MTA are effectively free-threaded, meaning components must manually account for concurrent access via locks.

  • Neutral Apartment (NA): There is always one NA per process, and no threads live inside of it. Thus any objects created in the NA may be accessed from either STA threads or MTA threads without transitioning.

Figure 10-5: COM apartments (STA, MTA, NA).

The CLR automatically initializes managed threads into a COM apartment using the CoInitializeEx API. There are a few ways to indicate which type of apartment a thread should join. You can annotate your application's entrypoint with the STAThreadAttribute or the MTAThreadAttribute; many project types annotate the entrypoint automatically. For example, if you have a GUI application, the main worker thread must be placed in an STA. You can also use the SetApartmentState method on the Thread class prior to starting an explicitly created thread. Which type of apartment to choose depends entirely on your scenario and what type of COM interoperability you are performing; when COM servers are registered, they indicate whether they support free-threading, single-threading, or both.
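For example, an explicitly created thread might be placed into the MTA before it starts (a minimal sketch):

 using System;
 using System.Threading;

 class ApartmentExample
 {
     [STAThread]   // the main thread joins a single-threaded apartment
     static void Main()
     {
         Thread t = new Thread(new ThreadStart(Work));
         // Must be called before Start; afterward the apartment cannot change.
         t.SetApartmentState(ApartmentState.MTA);
         t.Start();
         t.Join();
     }

     static void Work()
     {
         Console.WriteLine("Running in the {0}.",
             Thread.CurrentThread.GetApartmentState());
     }
 }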

When a call to a COM component proxy would violate the apartment access rules, the CLR handles the transition for you. If you attempt to call a COM component created in an STA from an MTA thread, the CLR responds by performing a PostMessage to the STA's message queue. The STA must then pump its queue by running its message loop, which will dispatch the method call on the target COM object. Meanwhile, the MTA pumps its message loop, waiting for a PostMessage back from the STA containing the function return information. Note that there is a special type, System.EnterpriseServices.ServicedComponent, which abides by the COM apartment rules. If one is created in an STA, for example, any calls from an MTA will do exactly as specified above.

This behavior introduces two complexities. First, if an STA doesn't pump for messages, any MTA thread trying to marshal a call into it will be delayed. In fact, if the STA gets into a situation where it never pumps for messages, the MTA code might become deadlocked. Since the GC's finalizer thread runs in an MTA, any attempt to finalize a COM proxy or ServicedComponent requires a transition; a deadlock here could lead to unbounded resource leaks. Thankfully, the CLR will pump for your STA code whenever you block in managed code (e.g., a contentious Monitor.Enter, WaitHandle.WaitOne, Thread.Join, etc.). Second, when an STA pumps, any messages it dispatches are run right on the current thread's physical stack. Thus, any thread-wide state can be observed by the code now running on your thread: if you've left security-sensitive state around, acquired a Monitor, or stashed away data in TLS, it will be visible to that code.

Spin-Loops and Hyper-Threaded Processors

Intel's Hyper-Threading (HT) technology provides multiple logical hardware threads per physical processor. Much like a multi-core CPU, this enables the processor to take advantage of OS thread-level parallelism. Unlike multiple cores, however, the hardware threads share the same execution units and cache. Each hardware thread does get its own set of registers, however, reducing the need for context switches.

Spin-loops must take care not to negatively interfere with the execution resources of HT CPUs. A naive spin could prevent the thread that holds the lock being acquired from making progress, which leads to wasteful spinning. This is avoided with the x86 PAUSE instruction, exposed through the Thread.SpinWait method. SpinWait loops the number of times specified by its argument, executing a PAUSE on each iteration; you should pass at least 1 for the value. (On non-HT CPUs, the instruction is a no-op.) This corresponds to the Win32 YieldProcessor macro. Refer to Listing 10-1 above for an example of using SpinWait to implement a reusable spin-lock.



