5.4 User, Kernel, and Hybrid Threading Models | C++ Network Programming, Volume I: Mastering Complexity with ACE and Patterns


5.4 User, Kernel, and Hybrid Threading Models

Scheduling is the primary mechanism an OS provides to ensure that applications use host CPU resources appropriately. Threads are the units of scheduling and execution in multithreaded processes. Modern OS platforms provide various models for scheduling threads created by applications. A key difference between the models is the contention scope in which threads compete for system resources, particularly CPU time. There are two different contention scopes:

Process contention scope, where threads in the same process compete with each other (but not directly with threads in other processes) for scheduled CPU time.
System contention scope, where threads compete directly with other system-scope threads, regardless of what process they are associated with.
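On platforms with a Pthreads API, contention scope is requested through a thread attribute. The following is a minimal sketch, not taken from the book's own examples, assuming a POSIX system; the helper names scope_name(), spawn_with_scope(), and noop() are hypothetical. It asks for a preferred scope and reports the scope the attribute object actually ended up with, since a platform may reject a scope it doesn't support.

```cpp
#include <pthread.h>
#include <cassert>
#include <string>

// Hypothetical helper: map a POSIX contention-scope constant to a name.
std::string scope_name(int scope) {
  switch (scope) {
    case PTHREAD_SCOPE_SYSTEM:  return "system";
    case PTHREAD_SCOPE_PROCESS: return "process";
    default:                    return "unknown";
  }
}

// Trivial thread entry point used only for this illustration.
extern "C" void* noop(void*) { return nullptr; }

// Try to spawn a thread in the preferred contention scope. Returns the
// scope recorded in the attribute object, or -1 if the thread could not
// be created.
int spawn_with_scope(int preferred) {
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) return -1;

  // If the platform doesn't support the requested scope, this call
  // fails (typically with ENOTSUP) and the attribute keeps its default.
  pthread_attr_setscope(&attr, preferred);

  int actual = -1;
  int rc = pthread_attr_getscope(&attr, &actual);
  assert(rc == 0);
  (void)rc;

  // Note: POSIX allows an implementation to ignore the scope attribute
  // unless the inheritsched attribute is PTHREAD_EXPLICIT_SCHED. This
  // sketch leaves inheritsched at its platform default.
  pthread_t tid;
  rc = pthread_create(&tid, &attr, noop, nullptr);
  pthread_attr_destroy(&attr);
  if (rc != 0) return -1;

  pthread_join(tid, nullptr);
  return actual;
}
```

Calling scope_name(spawn_with_scope(PTHREAD_SCOPE_PROCESS)) on a 1:1 platform would usually report "system", because the request for process scope is refused and the default remains in effect.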

Three thread scheduling models are implemented in commonly available operating systems today:

N:1 user-threading model
1:1 kernel-threading model
N:M hybrid-threading model

We describe these models below, discuss their trade-offs, and show how they support various contention scopes.

The N:1 user-threading model. Early threading implementations were layered atop the native OS process control mechanisms and handled by libraries in user space. The OS kernel therefore had no knowledge of threads at all. The kernel scheduled the processes and the libraries managed N threads within one process, as shown in Figure 5.5 (1). Hence, this model is referred to as the "N:1" user-threading model, and the threads are called "user-space threads" or simply "user threads." All threads operate in process contention scope in the N:1 thread model. HP-UX 10.20 and SunOS 4.x are examples of platforms that provide an N:1 user-threading model.

Figure 5.5. The N:1 and 1:1 Threading Models

In the N:1 threading model, the kernel isn't involved in any thread life-cycle events or context switches within the same process. Thread creation, deletion, and context switches can therefore be highly efficient. The two main problems with the N:1 model, ironically, also stem from the kernel's ignorance of threads:

Regardless of the number of host CPUs, each process is scheduled onto only one. All threads in a process contend for that CPU, sharing any time-slice allocation the kernel may use in its process scheduling.
If one thread issues a blocking operation, for example, to read() from or write() to a file, all threads in that process block until the operation completes. Many N:1 implementations, most notably DCE Threads, provide wrappers around OS system functions to alleviate this restriction. These wrappers aren't entirely transparent, however, and they impose restrictions on your program's behavior. You must therefore be aware of them to avoid adversely affecting your application.

The 1:1 kernel-threading model. Most modern OS kernels provide direct support for threads. In the "1:1" kernel-threading model, each thread created by an application is handled directly by a kernel thread. The OS kernel schedules each kernel thread onto the system's CPU(s), as shown in Figure 5.5 (2). In the "1:1" model, therefore, all threads operate in system contention scope. HP-UX 11, Linux, and Windows NT/2000 are examples of platforms that provide a 1:1 kernel-threading model.

The 1:1 model fixes both problems with the N:1 model outlined above:

Multithreaded applications can take advantage of multiple CPUs if they are available.
If the kernel blocks one thread in a system function, other threads can continue to make progress.

Since the OS kernel is involved in thread creation and scheduling, however, thread life-cycle operations can be more costly than with the N:1 model, though generally still cheaper than process life-cycle operations.
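The second benefit of the 1:1 model can be observed directly. The sketch below is an illustration under stated assumptions rather than code from the book: it assumes a POSIX platform whose std::thread is backed by kernel threads, and the function name progress_while_peer_blocks() is hypothetical. One thread blocks in read() on an empty pipe while a second thread runs to completion. Under a pure N:1 model without system function wrappers, the second thread would not get to run until the read() returned.

```cpp
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <thread>

// Returns the number of steps the worker thread completed while a peer
// thread was blocked in the kernel inside read().
int progress_while_peer_blocks(int steps) {
  int fds[2];
  int rc = pipe(fds);
  assert(rc == 0);
  (void)rc;

  std::atomic<bool> reader_done{false};
  std::thread reader([&] {
    char byte;
    // Blocks until the main thread writes a byte to the pipe.
    ssize_t n = read(fds[0], &byte, 1);
    (void)n;
    reader_done = true;
  });

  int completed = 0;
  std::thread worker([&] {
    for (int i = 0; i < steps; ++i) ++completed;
  });
  worker.join();  // The worker finishes although the reader is blocked.

  // The pipe is still empty, so the reader cannot have finished yet.
  assert(!reader_done.load());

  // Unblock the reader and clean up.
  char byte = 'x';
  ssize_t written = write(fds[1], &byte, 1);
  (void)written;
  reader.join();
  close(fds[0]);
  close(fds[1]);
  return completed;
}
```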

The N:M hybrid-threading model. Some operating systems, such as Solaris [EKB+92], offer a combination of the N:1 and 1:1 models, referred to as the "N:M" hybrid-threading model. This model supports a mix of user threads and kernel threads. The hybrid model is shown in Figure 5.6. When an application spawns a thread, it can indicate in which contention scope the thread should operate (the default on Solaris is process contention scope). The OS threading library creates a user-space thread, but only creates a kernel thread if needed or if the application explicitly requests the system contention scope. As in the 1:1 model, the OS kernel schedules kernel threads onto CPUs. As in the N:1 model, however, the OS threading library schedules user-space threads onto so-called "lightweight processes" (LWPs), which themselves map 1-to-1 onto kernel threads.

Figure 5.6. The N:M Hybrid Threading Model

The astute reader will note that a problem resurfaces in the N:M model, where multiple user-space threads can block when one of them issues a blocking system function. When the OS kernel blocks an LWP, all user threads scheduled onto it by the threads library also block, though threads scheduled onto other LWPs in the process can continue to make progress. The Solaris kernel addresses this problem via the following two-pronged approach based on the concept of scheduler activations [ABLL92]:

The OS threading library maintains a pool of LWPs that it uses to run all the process-scoped user threads. It can reschedule these user threads onto LWPs in the pool as the need arises. The size of this pool can be adjusted via the Pthreads pthread_setconcurrency() function.
When the OS kernel notices that all kernel threads in a process are blocked, it sends the SIGWAITING signal to the affected process. The threading library catches the signal and can start a new LWP. It can then reschedule a process-scope thread onto the new LWP, allowing the application to continue making progress.
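The first prong, adjusting the LWP pool, can be sketched with the two Pthreads concurrency functions. This is a minimal sketch assuming a platform that provides the X/Open pthread_setconcurrency() and pthread_getconcurrency() functions; request_concurrency() is a hypothetical wrapper name. The level is only a hint: on a 1:1 platform the call is accepted but has no effect on scheduling.

```cpp
#include <pthread.h>
#include <cassert>

// Hint the desired concurrency level (on an N:M platform, roughly the
// number of LWPs available to process-scoped threads). Returns the
// level reported afterward, or -1 if the request was rejected.
int request_concurrency(int level) {
  assert(level >= 0);  // Negative levels are invalid; 0 means "let the
                       // implementation decide".
  if (pthread_setconcurrency(level) != 0) return -1;
  return pthread_getconcurrency();
}
```

An application with, say, four process-scoped threads that all perform blocking I/O might call request_concurrency(4) at startup so that one blocked LWP doesn't stall the others.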

Not all OS platforms allow you to influence how threads are mapped to, and how they allocate, system resources. You should know what your platform(s) do allow and how they behave to make the most of what you have to work with. Detailed discussions of OS concurrency mechanisms appear in [Lew95, But97, Ric97, Sol98, Sch94]. As with any powerful, full-featured tool, it's possible to hurt yourself when misusing threads. So, when given a choice between contention scopes, which should you choose? The answer lies in which of the following reasons corresponds most closely to why you're spawning a thread, as well as how independent it must be of other threads in your program:

Spawn threads to avoid interference from other tasks. Some tasks must execute with minimal interference from other threads in your process, or even other threads in the system. Some examples are

- A thread that must react quickly to some stimulus, such as tracking mouse movement or closing a valve in a power plant

- A CPU-intensive task that should be isolated from other tasks

- An I/O-intensive task on a multiprocessor system.

In these cases, each thread should be scheduled on its own and have minimal contention from the other threads in your application. To achieve this aim, use a system-scoped thread to avoid scheduling the new thread against other threads on the same kernel thread and enable the OS to utilize multiple CPUs. If your system supports the N:M model, request system contention scope explicitly. If your system offers the 1:1 model, you're in luck because you get a system-scoped thread anyway. On N:1 systems, however, you may be out of luck.
Spawn threads to simplify application design. It's often wise to conserve your available OS resources for demanding situations. Your primary motivation for creating threads may be to simplify your application design by decomposing it into separate logical tasks, such as processing stages that operate on data and pass it to the next stage. In this case, you needn't incur the cost of a kernel thread if you can employ process-scoped threads, which are always used in N:1 systems or by request in the N:M model. Process-scoped threads have the following consequences:

- They avoid extra kernel involvement in thread creation, scheduling, and synchronization, while still separating concerns in your application.

- As long as the wait states of process-scoped threads are induced by synchronization, such as waiting on a mutex, rather than blocking system functions, your entire process or kernel thread won't block inadvertently.
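The decomposition into processing stages described above can be sketched as two threads connected by a queue. This is an illustration only, with hypothetical names (Stage_Queue, run_pipeline()); it uses standard C++ threads, which offer no way to choose a contention scope and are typically kernel threads, so it shows the design and the synchronization-induced waits rather than process scope itself. Every wait occurs on a condition variable, not in a blocking system function.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical hand-off queue between two processing stages.
class Stage_Queue {
public:
  void put(const std::string& item) {
    {
      std::lock_guard<std::mutex> guard(lock_);
      items_.push(item);
    }
    ready_.notify_one();
  }

  // Signal that no more items will arrive.
  void close() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      closed_ = true;
    }
    ready_.notify_all();
  }

  // Waits for an item. Returns false once the queue is closed and empty.
  bool get(std::string& item) {
    std::unique_lock<std::mutex> guard(lock_);
    ready_.wait(guard, [this] { return !items_.empty() || closed_; });
    if (items_.empty()) return false;
    item = items_.front();
    items_.pop();
    return true;
  }

private:
  std::mutex lock_;
  std::condition_variable ready_;
  std::queue<std::string> items_;
  bool closed_ = false;
};

// Stage 1 feeds records into the queue; stage 2 formats each record.
std::vector<std::string> run_pipeline(const std::vector<std::string>& input) {
  Stage_Queue queue;
  std::vector<std::string> output;

  std::thread formatter([&] {
    std::string record;
    while (queue.get(record)) output.push_back("[" + record + "]");
  });
  std::thread producer([&] {
    for (const std::string& record : input) queue.put(record);
    queue.close();
  });

  producer.join();
  formatter.join();
  assert(output.size() == input.size());
  return output;
}
```

Each stage reads as a simple sequential loop, which is the design simplification the text refers to.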

Although multithreading may seem intimidating at first, threads can help to simplify your application designs once you've mastered synchronization patterns [SSRB00] and OS concurrency mechanisms. For example, you can perform synchronous I/O from one or more threads, which can yield more straightforward designs compared with synchronous or asynchronous event handling patterns, such as Reactor or Proactor, respectively. We discuss OS concurrency mechanisms in Chapter 6 and the ACE threading and synchronization wrapper facades that encapsulate these mechanisms in Chapters 9 and 10.

Logging service: Our logging server implementations in the rest of this book illustrate various ACE concurrency wrapper facades. These examples use threads with system contention scope, that is, 1:1 kernel threads, when the thread's purpose is to perform I/O, such as receiving log records from clients. This design ensures that a blocking call to receive data from a socket doesn't inadvertently block any other thread or the whole process!
