Threads | Unix Application Migration Guide (Patterns & Practices)

A thread is an independent path of execution in a process that shares the address space, code, and global data of the process. Each thread is allocated time slices based on its priority and has its own set of registers, stack, I/O handles, and message queue. Threads can usually run on separate processors on a multiprocessor computer. Win32 enables you to assign threads to a specific processor on a multiprocessor hardware platform.

An application using multiple processes usually has to implement some form of interprocess communication (IPC). This can result in significant overhead, and possibly a communication bottleneck. In contrast, threads share the process data between them, and interthread communication can be much faster. The problem with threads sharing data is that this can lead to data access conflicts between multiple threads. You can address these conflicts by using synchronization techniques, such as semaphores and mutexes.

In UNIX, developers implement threads by using the POSIX pthread functions. In Win32, developers can implement UNIX threading by using the Win32 API thread management functions. The functionality and operation of threads in UNIX and Win32 are very similar; however, the function calls and syntax are very different.

The following are some similarities between UNIX and Windows:

Every thread must have an entry point. The name of the entry point is entirely up to you as long as the signature is unique and the linker can adequately resolve any ambiguity.
Each thread is passed a single parameter when it is created. The contents of this parameter are entirely up to the developer and have no meaning to the operating system.
A thread function must return a value.
A thread function needs to use local parameters and variables as much as possible. When you use global variables or shared resources, threads must use some form of synchronization to avoid potentially clobbering and corrupting data.
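The points above can be sketched on the UNIX side as follows. This is a minimal illustration, not code from the guide: the struct sum_job type and the sum_thread and sum_in_thread names are hypothetical. The single parameter carries a pointer to a caller-owned structure, the entry point works on a local variable, and the value it returns is collected with pthread_join.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work item: the OS never inspects it, only the thread does. */
struct sum_job {
    const int *values;  /* input owned by the caller */
    size_t count;       /* number of elements in values */
    long total;         /* output written by the thread */
};

/* Entry point: one void* parameter in, one void* value out. */
static void *sum_thread(void *arg)
{
    struct sum_job *job = arg;
    long total = 0;             /* local variable: private to this thread */
    for (size_t i = 0; i < job->count; i++)
        total += job->values[i];
    job->total = total;         /* single write to caller-owned storage */
    return job;                 /* picked up by pthread_join */
}

/* Runs sum_thread on its own thread and waits for it.
   Returns 0 on success, or the pthread error number. */
int sum_in_thread(const int *values, size_t count, long *out)
{
    struct sum_job job = { values, count, 0 };
    pthread_t tid;
    void *result = NULL;
    int rc = pthread_create(&tid, NULL, sum_thread, &job);
    if (rc != 0)
        return rc;
    rc = pthread_join(tid, &result);
    if (rc != 0)
        return rc;
    *out = ((struct sum_job *)result)->total;
    return 0;
}
```

Because the caller joins the thread before job goes out of scope, the structure can safely live on the caller's stack. A Win32 entry point would have the same shape, except that it returns a DWORD rather than a pointer.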

This section looks at how you should go about converting UNIX threaded applications into Win32 threaded applications. As you know from the preceding section about processes, you may also have decided to convert some of your application's use of UNIX processes into threads.

Note	More information about programming with threads in Win32 can be found on the MSDN Web site at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/HTML/_core_multithreading.3a_.programming_tips.asp

Note	For details on thread management functions in the Win32 API, see the Win32 API reference in Visual Studio or MSDN.

Creating a Thread

When creating a thread in UNIX, use the pthread_create function. This function has four arguments: a pointer to a pthread_t variable that receives the thread identifier, an argument specifying the thread's attributes (usually set to NULL, indicating default settings), the function that the thread will run, and a single argument that is passed to that function. The thread finishes execution by calling pthread_exit, which in this case returns a string. The process can wait for the thread to complete by using the pthread_join function.

The simple UNIX example that follows creates a thread and waits for it to finish.

Creating a Single Thread in UNIX

    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>   /* strcpy */
    #include <pthread.h>

    char message[] = "Hello World";

    void *thread_function(void *arg) {
        printf("thread_function started. Arg was %s\n", (char *)arg);
        sleep(3);
        strcpy(message, "Bye!");
        pthread_exit("See Ya");
    }

    int main() {
        int res;
        pthread_t a_thread;
        void *thread_result;

        res = pthread_create(&a_thread, NULL, thread_function, (void *)message);
        if (res != 0) {
            perror("Thread creation failed");
            exit(EXIT_FAILURE);
        }
        printf("Waiting for thread to finish...\n");
        res = pthread_join(a_thread, &thread_result);
        if (res != 0) {
            perror("Thread join failed");
            exit(EXIT_FAILURE);
        }
        printf("Thread joined, it returned %s\n", (char *)thread_result);
        printf("Message is now %s\n", message);
        exit(EXIT_SUCCESS);
    }

In Win32, threads are created using the CreateThread function. CreateThread requires:

The security attributes of the thread.
The size of the thread's stack.
The address at which to begin execution of a procedure.
An optional pointer-sized value (32 bits on 32-bit Windows) that is passed to the thread's procedure.
Creation flags that control how the thread starts (for example, CREATE_SUSPENDED creates the thread in a suspended state).
An address to store the system-wide unique thread identifier.

Once a thread is created, the handle returned by CreateThread (or the thread identifier) can be used to manage the thread until it has terminated. The next example demonstrates how you should use CreateThread to create a single thread.

Creating a Single Thread in Windows

    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>   /* strcpy */

    char message[] = "Hello World";

    DWORD WINAPI thread_function(PVOID arg) {
        printf("thread_function started. Arg was %s\n", (char *)arg);
        Sleep(3000);
        strcpy(message, "Bye!");
        return 100;
    }

    int main(void) {
        HANDLE a_thread;
        DWORD a_threadId;
        DWORD thread_result;

        // Create a new thread.
        a_thread = CreateThread(NULL, 0, thread_function, (PVOID)message, 0, &a_threadId);
        if (a_thread == NULL) {
            perror("Thread creation failed");
            exit(EXIT_FAILURE);
        }
        printf("Waiting for thread to finish...\n");
        if (WaitForSingleObject(a_thread, INFINITE) != WAIT_OBJECT_0) {
            perror("Thread join failed");
            exit(EXIT_FAILURE);
        }
        // Retrieve the code returned by the thread.
        GetExitCodeThread(a_thread, &thread_result);
        printf("Thread joined, it returned %lu\n", thread_result);
        printf("Message is now %s\n", message);
        // Release the thread handle now that it is no longer needed.
        CloseHandle(a_thread);
        exit(EXIT_SUCCESS);
    }

The UNIX and Win32 examples have roughly equivalent semantics. There are only two notable differences.

The thread function in the Win32 code cannot return a string value. Developers must use some other means to convey the string message back to the parent (for example, returning an index into a string array).
The Win32 version of the thread function simply returns a DWORD value rather than calling a function to terminate the thread. ExitThread could have been called, but this is not necessary because ExitThread is called automatically upon the return from the thread procedure. TerminateThread could also be called, but this isn't necessary, nor is it recommended. This is because TerminateThread causes the thread to exit unexpectedly. The thread then has no chance to execute any user-mode code, and its initial stack is not deallocated. Furthermore, any DLLs attached to the thread are not notified that the thread is terminating. For more information, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/prothred_3mgj.asp.
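The "index into a string array" suggestion from the first difference could look like the following sketch. The thread_messages table and the thread_message_from_code helper are hypothetical names, and the table contents are only illustrative. The idea is that the thread function returns an index (for example, return 1;) and the parent passes the DWORD filled in by GetExitCodeThread to the helper.

```c
#include <stddef.h>

/* Hypothetical message table shared by the thread and its creator.
   The thread returns an index into this table as its exit code. */
static const char *const thread_messages[] = {
    "OK",
    "See Ya",
    "Out of memory",
};

/* Maps an exit code to its text. Codes outside the table yield a
   fallback string instead of reading past the end of the array. */
const char *thread_message_from_code(unsigned long code)
{
    size_t n = sizeof thread_messages / sizeof thread_messages[0];
    return code < n ? thread_messages[code] : "Unknown exit code";
}
```

If you adopt this approach, keep the indexes well away from 259, because GetExitCodeThread reports that value (STILL_ACTIVE) for a thread that has not yet terminated.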

The two solutions have vastly different syntaxes. Win32 uses a different set of API calls to manage threads. As a result, the relevant data elements and arguments are considerably different.

Canceling a Thread

The details of terminating threads differ significantly between UNIX and Win32. While both environments allow threads to block termination entirely, UNIX offers additional facilities that allow a thread to specify whether it is to be terminated immediately or whether termination is deferred until the thread reaches a safe recovery point. Moreover, UNIX provides a facility known as cancellation cleanup handlers. A thread pushes these handlers onto a stack and pops them from it, and the handlers are invoked in last-in-first-out order when the thread is terminated. These cleanup handlers are coded to clean up and restore any resources before the thread is actually terminated.
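As a rough illustration of cancellation cleanup handlers, the following sketch pushes two handlers, cancels the thread while it sleeps, and records the order in which the handlers run. All names (record_cleanup, cancellable_worker, first_cleanup_after_cancel) are hypothetical, and the semaphore is used only so that the cancel request is not sent before the handlers are in place.

```c
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t worker_ready;      /* posted once the handlers are pushed */
static int cleanup_order[2];    /* ids of the handlers, in run order */
static int cleanup_count;

/* Cleanup handler: records its id. A real handler would free memory,
   unlock mutexes, close files, and so on. */
static void record_cleanup(void *arg)
{
    cleanup_order[cleanup_count++] = *(int *)arg;
}

static void *cancellable_worker(void *unused)
{
    static int first = 1, second = 2;
    (void)unused;
    /* Deferred is the default type; set here to mirror the text. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
    pthread_cleanup_push(record_cleanup, &first);
    pthread_cleanup_push(record_cleanup, &second);
    sem_post(&worker_ready);
    for (;;)
        sleep(1);               /* sleep() is a cancellation point */
    pthread_cleanup_pop(0);     /* never reached; pairs the macros */
    pthread_cleanup_pop(0);
    return NULL;
}

/* Cancels the worker and returns the id of the handler that ran
   first, or -1 on any failure. */
int first_cleanup_after_cancel(void)
{
    pthread_t tid;
    void *result;
    cleanup_count = 0;
    if (sem_init(&worker_ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, cancellable_worker, NULL) != 0)
        return -1;
    sem_wait(&worker_ready);    /* handlers are now in place */
    if (pthread_cancel(tid) != 0 || pthread_join(tid, &result) != 0)
        return -1;
    sem_destroy(&worker_ready);
    if (result != PTHREAD_CANCELED || cleanup_count != 2)
        return -1;
    return cleanup_order[0];    /* last pushed runs first */
}
```

Because the handlers run in last-in-first-out order, the handler pushed second is the first one recorded.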

The Win32 API allows you to terminate a thread asynchronously. Unlike UNIX, in Win32 code you cannot create cleanup handlers, and it is not possible for a thread to defer its own termination. Therefore, it is recommended that you design your code so that threads terminate by returning an exit code and so that threads cannot be terminated forcibly. To do this, you should design your thread code to accept some form of message or event that signals that the thread should terminate. Based on this notification, the thread logic can elect to execute cleanup handling code and return normally.

To prevent a thread from being terminated, you should remove the security attributes for THREAD_TERMINATE from the thread object.

Although forcing a thread to end by using TerminateThread is not recommended, for completeness, the following example shows how you could convert UNIX code that cancels a thread into Win32 code that cancels a thread using this method.

Canceling a Thread in UNIX

    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    void *thread_function(void *arg) {
        int i, res;

        res = pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
        if (res != 0) {
            perror("Thread pthread_setcancelstate failed");
            exit(EXIT_FAILURE);
        }
        res = pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
        if (res != 0) {
            perror("Thread pthread_setcanceltype failed");
            exit(EXIT_FAILURE);
        }
        printf("thread_function is running\n");
        for(i = 0; i < 10; i++) {
            printf("Thread is running (%d)...\n", i);
            sleep(1);
        }
        pthread_exit(0);
    }

    int main() {
        int res;
        pthread_t a_thread;
        void *thread_result;

        res = pthread_create(&a_thread, NULL, thread_function, NULL);
        if (res != 0) {
            perror("Thread creation failed");
            exit(EXIT_FAILURE);
        }
        sleep(3);
        printf("Cancelling thread...\n");
        res = pthread_cancel(a_thread);
        if (res != 0) {
            perror("Thread cancellation failed");
            exit(EXIT_FAILURE);
        }
        printf("Waiting for thread to finish...\n");
        res = pthread_join(a_thread, &thread_result);
        if (res != 0) {
            perror("Thread join failed");
            exit(EXIT_FAILURE);
        }
        exit(EXIT_SUCCESS);
    }

Canceling a Thread in Windows

    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    DWORD WINAPI thread_function(PVOID arg) {
        int i;

        // The thread is created with a NULL argument, so arg is not printed.
        printf("thread_function is running\n");
        for (i = 0; i < 10; i++) {
            printf("Thread is running (%d)...\n", i);
            Sleep(1000);
        }
        return 100;
    }

    int main(void) {
        HANDLE a_thread;
        DWORD thread_result;

        // Create a new thread.
        a_thread = CreateThread(NULL, 0, thread_function, (PVOID)NULL, 0, NULL);
        if (a_thread == NULL) {
            perror("Thread creation failed");
            exit(EXIT_FAILURE);
        }
        Sleep(3000);
        printf("Cancelling thread...\n");
        if (!TerminateThread(a_thread, 0)) {
            perror("Thread cancellation failed");
            exit(EXIT_FAILURE);
        }
        printf("Waiting for thread to finish...\n");
        WaitForSingleObject(a_thread, INFINITE);
        GetExitCodeThread(a_thread, &thread_result);
        exit(EXIT_SUCCESS);
    }

When you compare the UNIX and Win32 examples, you can see that in the Win32 implementation the setting for the deferred termination is absent. This is because deferring termination is not supported in Win32. TerminateThread is not immediate, and it is not predictable. The termination resulting from a TerminateThread call can occur at any point during the thread execution. In contrast, UNIX threads tagged as deferred can terminate when a safe cancellation point is reached.

If you need to match the UNIX behavior in your Win32 application exactly, you must create your own cancellation code and thereby prevent the thread from being forcibly terminated.

Thread Synchronization

When you have more than one thread executing simultaneously, you have to take the initiative to protect shared resources. For example, if your thread increments a variable, you cannot predict the result because the variable may have been modified by another thread before or after the increment. The reason that you cannot predict the result is that the order in which threads have access to a shared resource is indeterminate.

The following example illustrates code that is, in principle, indeterminate.

Note	This is a very simple example and on most computers the result would always be the same, but the important point to note is that this is not guaranteed.

The main thread in the following example is represented by the parent. It generates a P, and the child or secondary thread outputs a T. A UNIX example and a Windows example are shown.

Multiple Non-Synchronized Threads in UNIX

    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    void *thread_function(void *arg) {
        int count2;

        printf("thread_function is running. Argument was: %s\n", (char *)arg);
        for (count2 = 0; count2 < 10; count2++) {
            sleep(1);
            printf("T");
        }
        sleep(3);
        return NULL;   /* a thread function must return a value */
    }

    char message[] = "Hello Im a Thread";

    int main() {
        int count1, res;
        pthread_t a_thread;
        void *thread_result;

        res = pthread_create(&a_thread, NULL, thread_function, (void *)message);
        if (res != 0) {
            perror("Thread creation failed");
            exit(EXIT_FAILURE);
        }
        printf("entering loop\n");
        for (count1 = 0; count1 < 10; count1++) {
            sleep(1);
            printf("P");
        }
        printf("\nWaiting for thread to finish...\n");
        res = pthread_join(a_thread, &thread_result);
        if (res != 0) {
            perror("Thread join failed");
            exit(EXIT_FAILURE);
        }
        printf("\nThread joined\n");
        exit(EXIT_SUCCESS);
    }

Multiple Non-Synchronized Threads in Windows

    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    DWORD WINAPI thread_function(PVOID arg) {
        int count2;

        printf("thread_function is running. Argument was: %s\n", (char *)arg);
        for (count2 = 0; count2 < 10; count2++) {
            Sleep(1000);
            printf("T");
        }
        Sleep(3000);
        return 0;
    }

    char message[] = "Hello Im a Thread";

    int main(void) {
        HANDLE a_thread;
        DWORD a_threadId;
        DWORD thread_result;
        int count1;

        // Create a new thread.
        a_thread = CreateThread(NULL, 0, thread_function, (PVOID)message, 0, &a_threadId);
        if (a_thread == NULL) {
            perror("Thread creation failed");
            exit(EXIT_FAILURE);
        }
        printf("entering loop\n");
        for (count1 = 0; count1 < 10; count1++) {
            Sleep(1000);
            printf("P");
        }
        printf("\nWaiting for thread to finish...\n");
        if (WaitForSingleObject(a_thread, INFINITE) != WAIT_OBJECT_0) {
            perror("Thread join failed");
            exit(EXIT_FAILURE);
        }
        // Retrieve the code returned by the thread.
        GetExitCodeThread(a_thread, &thread_result);
        printf("\nThread joined\n");
        exit(EXIT_SUCCESS);
    }

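No actual synchronization between these two threads is being performed, even though both threads use the same shared resource. To see why this matters, consider two threads that each store a value in a shared variable named run_now. If the threads were running serially, the instructions would execute in an order like the following: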

    MOV EAX, 2           ; Thread 1: Move 2 into a register.
    MOV [run_now], EAX   ; Thread 1: Store 2 in run_now.
    MOV EAX, 1           ; Thread 2: Move 1 into a register.
    MOV [run_now], EAX   ; Thread 2: Store 1 in run_now.

However, because there is no guarantee of the order in which the threads will be executed, the instructions could instead be interleaved as follows:

    MOV EAX, 2           ; Thread 1: Move 2 into a register.
    MOV EAX, 1           ; Thread 2: Move 1 into a register.
    MOV [run_now], EAX   ; Thread 1: Store 2 in run_now.
    MOV [run_now], EAX   ; Thread 2: Store 1 in run_now.

It is not possible to predict the output that you will see from these examples. In most applications, unpredictable results are an undesirable feature. Consequently, it is important that you take great care in controlling access to shared resources in threaded code. UNIX and Windows provide mechanisms for controlling resource access. These mechanisms are referred to as synchronization techniques, which are discussed in the next few sections.
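As a small preview of what controlled access looks like on the UNIX side, the sketch below returns to the earlier scenario of threads incrementing a shared variable. It uses a POSIX mutex, one of the synchronization techniques named in the introduction. The names shared_counter, increment_many, and count_with_two_threads are hypothetical, as is the iteration count.

```c
#include <pthread.h>

#define INCREMENTS 100000

static long shared_counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread adds INCREMENTS to the shared counter, one step at a time. */
static void *increment_many(void *unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;       /* read-modify-write, now serialized */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two incrementing threads and returns the final counter value,
   or -1 if a thread could not be created or joined. */
long count_with_two_threads(void)
{
    pthread_t a, b;
    shared_counter = 0;
    if (pthread_create(&a, NULL, increment_many, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, increment_many, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    if (pthread_join(a, NULL) != 0 || pthread_join(b, NULL) != 0)
        return -1;
    return shared_counter;
}
```

With the lock in place, the final value is always the sum of both threads' increments. Without the lock and unlock calls, increments from the two threads can overwrite each other and the total can come up short.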

Interlocked Exchange

A simple form of synchronization is to use what is known as an interlocked exchange. An interlocked exchange performs a single operation that cannot be preempted. The threads of different processes can use this mechanism only if the variable is in shared memory. The variable pointed to by the target parameter must be aligned on a 32-bit boundary; otherwise, this function can behave unpredictably on multiprocessor x86 systems. Because this is not the case in the example, the example has limited value; but it does illustrate the use of the InterlockedExchange functions.

Rewriting the previous Win32 example by using InterlockedExchange results in the following code.

Thread Synchronization Using Interlocked Exchange in Windows

    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    LONG new_value = 1;
    char message[] = "Hello Im a Thread";

    DWORD WINAPI thread_function(PVOID arg) {
        int count2;

        printf("thread_function is running. Argument was: %s\n", (char *)arg);
        for (count2 = 0; count2 < 10; count2++) {
            Sleep(1000);
            printf("(T-%ld)", new_value);
            // Atomically store 1 in new_value.
            InterlockedExchange(&new_value, 1);
        }
        Sleep(3000);
        return 0;
    }

    int main(void) {
        HANDLE a_thread;
        DWORD a_threadId;
        DWORD thread_result;
        int count1;

        // Create a new thread.
        a_thread = CreateThread(NULL, 0, thread_function, (PVOID)message, 0, &a_threadId);
        if (a_thread == NULL) {
            perror("Thread creation failed");
            exit(EXIT_FAILURE);
        }
        printf("entering loop\n");
        for (count1 = 0; count1 < 10; count1++) {
            Sleep(1000);
            printf("(P-%ld)", new_value);
            // Atomically store 2 in new_value.
            InterlockedExchange(&new_value, 2);
        }
        printf("\nWaiting for thread to finish...\n");
        if (WaitForSingleObject(a_thread, INFINITE) != WAIT_OBJECT_0) {
            perror("Thread join failed");
            exit(EXIT_FAILURE);
        }
        // Retrieve the code returned by the thread.
        GetExitCodeThread(a_thread, &thread_result);
        printf("\nThread joined\n");
        exit(EXIT_SUCCESS);
    }

Synchronization with SpinLocks

In the previous example, as noted, you still have no synchronization between the two threads. The output may still be out of order. One simple mechanism that offers synchronization is to implement a spin lock. To accomplish this, a variant of the interlocked functions called InterlockedCompareExchange is used, as follows:

    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    LONG run_now = 1;
    char message[] = "Hello Im a Thread";

    DWORD WINAPI thread_function(PVOID arg) {
        int count2;

        printf("thread_function is running. Argument was: %s\n", (char *)arg);
        for (count2 = 0; count2 < 10; count2++) {
            // If run_now is 2, atomically set it to 1 and take this turn.
            if (InterlockedCompareExchange(&run_now, 1, 2) == 2)
                printf("T-2");
            else
                Sleep(1000);
        }
        Sleep(3000);
        return 0;
    }

    int main(void) {
        HANDLE a_thread;
        DWORD a_threadId;
        DWORD thread_result;
        int count1;

        // Create a new thread.
        a_thread = CreateThread(NULL, 0, thread_function, (PVOID)message, 0, &a_threadId);
        if (a_thread == NULL) {
            perror("Thread creation failed");
            exit(EXIT_FAILURE);
        }
        printf("entering loop\n");
        for (count1 = 0; count1 < 10; count1++) {
            // If run_now is 1, atomically set it to 2 and take this turn.
            if (InterlockedCompareExchange(&run_now, 2, 1) == 1)
                printf("P-1");
            else
                Sleep(1000);
        }
        printf("\nWaiting for thread to finish...\n");
        if (WaitForSingleObject(a_thread, INFINITE) != WAIT_OBJECT_0) {
            perror("Thread join failed");
            exit(EXIT_FAILURE);
        }
        // Retrieve the code returned by the thread.
        GetExitCodeThread(a_thread, &thread_result);
        printf("\nThread joined\n");
        exit(EXIT_SUCCESS);
    }

Spin locks work well for synchronizing access to a single object, but most applications are not this simple. Moreover, using spin locks is not the most efficient means for controlling access to a shared resource. Running a loop in user mode while waiting for a global value to change wastes CPU cycles unnecessarily. A mechanism is needed that allows the thread to not waste CPU time while waiting to access a shared resource.

When a thread requires access to a shared resource, for example a shared memory object, it must either be notified or scheduled to resume execution. To accomplish this, a thread must call an operating system function, passing it parameters that indicate what the thread is waiting for. If the operating system detects that the resource is available, the function returns and the thread resumes.

If the resource is unavailable, the system places the thread in a wait state, making the thread unschedulable. This prevents the thread from wasting any CPU time. When a thread is waiting, the system permits the exchange of information between the thread and the resource. The operating system tracks the resources that a thread needs and automatically resumes the thread when the resource becomes available. The thread's execution is synchronized with the availability of the resource.

Mechanisms that prevent the thread from wasting CPU time include critical sections (for example, the EnterCriticalSection function waits for ownership of the specified critical section object, and returns when the calling thread has been granted ownership), semaphores, and mutexes. Windows includes all three of these mechanisms, and UNIX provides both semaphores and mutexes. These three mechanisms are described in the following sections.

Synchronization with Critical Sections

Another mechanism for solving this simple scenario is to use a critical section. A critical section is similar to InterlockedExchange except that you have the ability to define the logic that takes place as an atomic operation.

What follows is the simple example from the previous section with the InterlockedExchange calls replaced by critical sections. On multiprocessor systems, it's best to use InitializeCriticalSectionAndSpinCount instead of InitializeCriticalSection, because it provides an optimized version of critical sections that employs spin counting. A critical section with spin locking allows EnterCriticalSection to retry up to spin-count times before transitioning into kernel mode to wait for the resource. The advantage is that a successful retry avoids the transition into kernel mode, which requires approximately 1,000 CPU cycles.

Moreover, there is a slight chance that setting up or entering a critical section may fail because of memory limitations (STATUS_NO_MEMORY). InitializeCriticalSectionAndSpinCount returns a value that the caller can check. This is an improvement over the InitializeCriticalSection function, which cannot return any status, as its void return type shows.

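In the following code, the critical section calls are InitializeCriticalSection, EnterCriticalSection, LeaveCriticalSection, and DeleteCriticalSection.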

    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    CRITICAL_SECTION g_cs;
    char message[] = "Hello Im a Thread";

    DWORD WINAPI thread_function(PVOID arg) {
        int count2;

        printf("\nthread_function is running. Argument was: %s\n", (char *)arg);
        for (count2 = 0; count2 < 10; count2++) {
            EnterCriticalSection(&g_cs);
            printf("T");
            LeaveCriticalSection(&g_cs);
        }
        Sleep(3000);
        return 0;
    }

    int main(void) {
        HANDLE a_thread;
        DWORD a_threadId;
        DWORD thread_result;
        int count1;

        InitializeCriticalSection(&g_cs);

        // Create a new thread.
        a_thread = CreateThread(NULL, 0, thread_function, (PVOID)message, 0, &a_threadId);
        if (a_thread == NULL) {
            perror("Thread creation failed");
            exit(EXIT_FAILURE);
        }
        printf("entering loop\n");
        for (count1 = 0; count1 < 10; count1++) {
            EnterCriticalSection(&g_cs);
            printf("P");
            LeaveCriticalSection(&g_cs);
        }
        printf("\nWaiting for thread to finish...\n");
        if (WaitForSingleObject(a_thread, INFINITE) != WAIT_OBJECT_0) {
            perror("Thread join failed");
            exit(EXIT_FAILURE);
        }
        // Retrieve the code returned by the thread.
        GetExitCodeThread(a_thread, &thread_result);
        printf("\nThread joined\n");
        DeleteCriticalSection(&g_cs);
        exit(EXIT_SUCCESS);
    }

Synchronization Using Semaphores

In the following examples, two threads are created that use a shared memory buffer. Access to the shared memory is synchronized using a semaphore. The primary thread ( main ) creates a semaphore object and uses this object to handshake with the secondary thread ( thread_function ). The primary thread instantiates the semaphore in a state that prevents the secondary thread from acquiring the semaphore while it is initiated.

After the user types in some text at the console and presses ENTER, the primary thread relinquishes the semaphore. The secondary thread then acquires the semaphore and processes the shared memory area. At this point, the main thread is blocked waiting for the semaphore, and will not resume until the secondary thread has relinquished control by calling ReleaseSemaphore.

In UNIX, the semaphore object functions sem_post and sem_wait are all that are required to perform handshaking. With Win32, you must use a combination of WaitForSingleObject and ReleaseSemaphore in both the primary and the secondary threads in order to facilitate handshaking. The two solutions are also very different from a syntactic standpoint. The primary difference between their implementations lies in the API calls that are used to manage the semaphore objects.

One aspect of CreateSemaphore that you need to be aware of is the last argument in its parameter list. This is a string parameter specifying the name of the semaphore. You should not pass a NULL for this parameter. Most (but not all) of the kernel objects, including semaphores, are named. All kernel object names are stored in a common namespace, except on a server running Microsoft Terminal Server, in which case there is also a namespace for each session. If the namespace is global, one or more unassociated applications could attempt to use the same name for a semaphore. To avoid namespace contention, applications should use some unique naming convention. One solution would be to base your semaphore names on globally unique identifiers (GUIDs).

Terminal Server and Naming Semaphore Objects

As mentioned earlier, Terminal Servers have multiple namespaces for kernel objects. There is one global namespace, which is used by kernel objects that are accessible by any and all client sessions, and is usually populated by services. Additionally, each client session has its own namespace to prevent namespace collisions between multiple instances of the same application running in different sessions.

In addition to the session and global namespaces, Terminal Servers also have a local namespace. By default, an application's named kernel objects reside in the session namespace. It is possible, however, to override which namespace will be used. This is accomplished by prefixing the name with Global\ or Local\. These prefix names are reserved by Microsoft, are case-sensitive, and are ignored if the computer is not operating as a Terminal Server.
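The naming advice above could be applied with a small helper such as the following sketch. The build_semaphore_name function, its parameters, and the name layout (prefix, application name, GUID text) are assumptions made for illustration. The GUID is assumed to be generated once at development time and supplied as text.

```c
#include <stdio.h>
#include <stddef.h>

/* Builds "<prefix>\<app>-<guid>" into buf, where prefix is "Global"
   or "Local". Returns 0 on success, or -1 if buf is too small or
   formatting fails. app and guid_text are supplied by the caller. */
int build_semaphore_name(char *buf, size_t size, int global,
                         const char *app, const char *guid_text)
{
    int n = snprintf(buf, size, "%s\\%s-%s",
                     global ? "Global" : "Local", app, guid_text);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string would then be passed as the last argument to CreateSemaphore. Both threads or processes that share the semaphore must build the name from the same application name and GUID.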

UNIX Example: Synchronization Using Semaphores

 #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <pthread.h> #include <semaphore.h> #define SHARED_SIZE 1024 char shared_area[SHARED_SIZE]; sem_t bin_sem; void *thread_function(void *arg) {     sem_wait(&bin_sem);     while(strncmp("done", shared_area, 4) != 0) {         printf("You input %d characters\n", strlen(shared_area) -1);         sem_wait(&bin_sem);     }     pthread_exit(NULL); } int main() {     int res;     pthread_t a_thread;     void *thread_result;     res = sem_init(&bin_sem, 0, 0);     if (res != 0) {         perror("Semaphore initialization failed");         exit(EXIT_FAILURE);     }     res = pthread_create(&a_thread, NULL, thread_function, NULL);     if (res != 0) {         perror("Thread creation failed");         exit(EXIT_FAILURE);     }     printf("Input some text. Enter