1.4. Processes, Threads, and Scheduling

The Solaris kernel is multithreaded; that is, kernel services and tasks are executed as kernel threads. The kernel thread is the core unit of execution managed by the Solaris kernel. Kernel threads have an execution state and context that include a global priority and scheduling class; kernel threads are the fundamental units that get scheduled, executed, and context-switched on and off processors. The same model applies to user-level processes. The user process is a container that defines much of the execution context for its threads. Threads allow multiple streams of execution within a single virtual memory environment; consequently, switching execution between threads within the same process is inexpensive, since a virtual memory context switch is not required. The following objects form the nucleus of the Solaris kernel threads model and implementation.

  • Kernel threads. The object that gets scheduled and executed on a processor.

  • User threads. The user-level (non-kernel) thread state maintained within a user process.

  • Process. The executable form of a program; the execution environment for a user program.

  • Lightweight process (LWP). The kernel-visible execution context for a user thread.

Solaris executes kernel threads for kernel-related tasks, such as interrupt handling, memory page management, and device drivers. For user-process execution, each kernel thread has a corresponding LWP; these kernel threads are scheduled for execution by the kernel on behalf of the user processes. Within the kernel, multiple threads of execution share the kernel's environment, primarily the kernel's address space. Processes likewise contain one or more threads, which share the virtual memory environment of the process as well as other components of the process context.

A process is an abstraction that contains the execution environment for a user program. It consists of a virtual memory environment (an address space), program resources such as an open file list, and at least one thread of execution. The virtual memory environment, open file list, and other components of the process environment are shared by all the threads within each process.
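
To make the sharing concrete, here is a minimal sketch (not from the book) in C using the POSIX threads interfaces: both threads see the same global counter because they share the process's address space, and both write through file descriptor 1 because they share the process's open file list.

    /*
     * Sketch: threads share the process's address space and open files.
     */
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    static int shared_counter = 0;   /* lives in the shared address space */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        char buf[64];
        int n;

        pthread_mutex_lock(&lock);
        shared_counter++;            /* same variable in every thread */
        n = snprintf(buf, sizeof (buf), "thread %ld sees counter = %d\n",
            (long)(uintptr_t)arg, shared_counter);
        pthread_mutex_unlock(&lock);

        write(STDOUT_FILENO, buf, n);  /* same open file list: fd 1 */
        return (NULL);
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, worker, (void *)1);
        pthread_create(&t2, NULL, worker, (void *)2);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return (0);
    }

On Solaris, a sketch like this would typically be compiled with cc -mt (or linked with -lpthread); the exact flags depend on the compiler.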

The LWP and its corresponding kernel thread define the virtual execution environment for a thread within a user process. Beginning in Solaris 9, there is a one-to-one relationship between user threads, LWPs, and kernel threads. That is, every thread in a user process is bound to an LWP, and each LWP has a kernel thread. The LWP allows each thread within a process to make system calls independently of other threads within the same process. Without an LWP, only one thread could enter the kernel at a time; that is, only one thread at a time could make a system call. Each time a system call is made by a thread, its registers are placed on a stack within the LWP. Upon return from a system call, the system call return codes are made available to the LWP. Figure 1.2 shows the relationship among user threads, LWPs, kernel threads, and processes.
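
The value of the per-thread LWP can be seen in a small sketch (again, not from the book): because each thread has its own LWP, one thread can block in the kernel while another continues to run and make its own system calls.

    /*
     * Sketch: one thread blocks in a system call; the other keeps running.
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static void *blocker(void *arg)
    {
        (void) arg;
        printf("blocker: entering the kernel via sleep()\n");
        sleep(2);                 /* this thread's LWP blocks in the kernel */
        printf("blocker: back from the kernel\n");
        return (NULL);
    }

    static void *runner(void *arg)
    {
        int i;

        (void) arg;
        for (i = 0; i < 4; i++) {
            printf("runner: still making progress (%d)\n", i);
            usleep(300000);       /* independent system calls on its own LWP */
        }
        return (NULL);
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, blocker, NULL);
        pthread_create(&t2, NULL, runner, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return (0);
    }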

Figure 1.2. Kernel Threads, Processes, and Lightweight Processes


1.4.1. A New Threads Model

Solaris releases 2.2 through Solaris 8 implemented a two-level threads model, whereby user threads were multiplexed onto a potentially smaller pool of LWPs. The original design was intended to support hundreds or thousands of threads in a process, without the need to enter the kernel for many thread management tasks, such as creating and destroying threads. This model served us well for many years, but was not without its challenges. The multiplexing of user threads onto available LWPs required maintaining a runnable thread queue and user thread scheduler at the threads library level, separate and distinct from the kernel scheduler. A user thread needed to be bound to an LWP before the kernel could schedule it to run on a processor. Maintaining a library-level threads scheduler was enormously complex. Additionally, maintaining correct asynchronous signal behavior in the two-level model was quite challenging, since a user thread that is not masking a posted signal may not be on an LWP when the system attempts to deliver the signal. Finally, issues with concurrency management and scheduling latency could result in suboptimal performance for threaded applications. The scheduling latency was the effect of waiting for the threads library scheduler to link a user thread to an available LWP. The concurrency issue has to do with maintaining a sufficient number of LWPs such that the process does not have runnable user threads waiting for an execution resource (an LWP).

Beginning with Solaris 8, a new threads model was introduced: a single-level model. That is, when a user thread is created, an LWP and kernel thread are also created and linked to the user thread; the user thread is never without an LWP/kthread. This corresponds to what were referred to as bound threads in the two-level model. The threads programming interfaces provide a flag for the creation of bound threads; this flag has been available since the introduction of thread programming interfaces in Solaris. The new single-level model can be thought of as all bound threads, all the time. The new threads model was introduced in Solaris 8 through the distribution of an alternate threads library. By default, threaded applications link to /usr/lib/libthread.so, which in Solaris 8 delivers the original two-level model. An alternate libthread.so shared object library was placed in the /usr/lib/lwp directory. The new library is binary compatible with all existing threaded applications; you need not recompile to use it. Simply set the runtime linker's path environment variable (LD_LIBRARY_PATH) to point to /usr/lib/lwp. The single-level threads library is the default in Solaris 9 and Solaris 10, so setting the runtime linker path variable is not required to get the single-level model behavior.

The new threads model offers several benefits over the original model:

  • Improved performance, scalability, and reliability. The library source code was substantially reduced in size and complexity with the development of the single-level model. The internal library locks required by a library-level scheduler were eliminated.

  • Reliable signal behavior. Issues of synchronizing signal masks between the user thread and LWP no longer exist; asynchronous signal delivery is reliable and consistent.

  • Improved adaptive mutex lock implementation. Mutual exclusion (mutex) locks are synchronization primitives used by threaded programs to protect data from concurrent access by multiple threads. Adaptive mutexes provide an optimization whereby a thread that wishes to acquire a held lock dynamically decides either to spin, waiting for the lock, or to sleep and rely on the wakeup mechanism for another chance at the lock when it is released. With the new model, the adaptive mutex implementation has been optimized.

  • User-level sleep queues for synchronization objects. Synchronization objects, such as mutex locks, can be defined by the programmer to be intraprocess locks. This means that a lock will be shared only among threads within the process, not by threads in other processes. For these intraprocess locks, the code path for managing lock acquisition and release has been optimized to maintain threads waiting for a lock in a user-level sleep queue. There are fewer calls into the kernel for threads acquiring and releasing intraprocess locks.
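
A minimal sketch (not from the book) of an intraprocess lock follows. PTHREAD_PROCESS_PRIVATE is in fact the default scope for a POSIX mutex; it is set explicitly here only to show where the intraprocess scope is declared.

    /*
     * Sketch: a process-private (intraprocess) mutex. Contending threads
     * within the process can be managed with user-level sleep queues.
     */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock;
    static long balance = 0;

    static void *deposit(void *arg)
    {
        int i;

        (void) arg;
        for (i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);   /* adaptive: may spin or sleep */
            balance++;
            pthread_mutex_unlock(&lock);
        }
        return (NULL);
    }

    int main(void)
    {
        pthread_mutexattr_t attr;
        pthread_t t1, t2;

        pthread_mutexattr_init(&attr);
        /* Share the lock only among threads of this process. */
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_PRIVATE);
        pthread_mutex_init(&lock, &attr);

        pthread_create(&t1, NULL, deposit, NULL);
        pthread_create(&t2, NULL, deposit, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        printf("final balance = %ld\n", balance);
        return (0);
    }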

These features, as well as other benefits derived from the new threads library, are discussed in more detail in Part Two.

1.4.2. Global Process Priorities and Scheduling

The Solaris kernel implements a global thread priority model for kernel threads. The kernel scheduler, or dispatcher, uses the model to select which kernel thread of potentially many runnable kernel threads executes next. The kernel supports the notion of preemption, allowing a higher-priority thread to preempt a running thread so that the higher-priority thread can execute. The kernel itself is preemptable, an innovation providing for time-critical scheduling of high-priority threads.

There are 170 global priorities (0 to 169); numerically larger values correspond to better thread priorities. The priority name space is partitioned among the scheduling classes (see Figure 1.3). The Solaris dispatcher implements multiple scheduling classes, allowing different scheduling policies to be applied to threads. The three primary scheduling classes are TS (IA is an enhanced TS), SYS, and RT; these and the other classes are shown in Figure 1.3 and described below the figure.

Figure 1.3. Global Thread Priorities


  • TS. The timeshare scheduling class is the default class for processes and all the kernel threads within the process. It changes thread priorities dynamically, according to recent processor usage, in an attempt to allocate processor resources evenly among the kernel threads in the system. Priorities and time quanta are calculated from a timeshare scheduling table at each clock tick, or during wakeup after sleeping for an I/O. The TS class uses the priority range 0 to 59.

  • IA. The interactive class is an enhanced TS class used by the desktop windowing system to boost the priority of threads within the window under focus. The global priority range of IA class threads is also 0 to 59.

  • FSS. The fair-share scheduling class is share-based, not priority-based; available CPU resources are allocated in units called shares, and threads are scheduled based on share allocation and processor utilization. The FSS class was introduced in Solaris 9 and is managed through the Solaris projects database.

  • FX. The fixed-priority scheduling class. Threads in the FX class do not have their priorities adjusted by the kernel; a thread's priority remains fixed throughout its lifetime. The FX class was introduced in Solaris 9.

  • SYS. The system class is used by the kernel for kernel threads. Threads in the system class are bound threads; that is, there is no time quantum, and they run until they block or complete. The system class uses priorities 60 to 99.

  • RT. The real-time class implements fixed-priority, fixed-time-quantum scheduling. The real-time class uses priorities 100 to 159. Note that the priority of threads in the RT class is higher than that of kernel threads in the SYS class. RT class threads will preempt operating system kernel threads.

The interrupt priority levels shown in Figure 1.3 are not available for use by anything other than interrupt threads. Their positioning in the priority scheme is intended to guarantee that interrupt threads have priority over all other threads in the system.

The available scheduling classes, along with the user and administrator command set to observe and manage thread priorities and classes, furnish a rich environment in which any production workload, or combination of workloads, running within a single Solaris kernel instance can meet performance requirements and service levels.
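
As a sketch of the observability mentioned above (not from the book; Solaris-specific, with error handling abbreviated), a process can query its own scheduling class through priocntl(2): PC_GETPARMS with pc_cid set to PC_CLNULL returns the class ID of the calling process, and PC_GETCLINFO maps that ID back to a class name such as TS, FSS, or RT.

    /*
     * Sketch: ask the kernel which scheduling class we are in.
     */
    #include <sys/types.h>
    #include <sys/priocntl.h>
    #include <sys/procset.h>
    #include <stdio.h>

    int main(void)
    {
        pcparms_t parms;
        pcinfo_t info;

        parms.pc_cid = PC_CLNULL;   /* "whatever class I am currently in" */
        if (priocntl(P_PID, P_MYID, PC_GETPARMS, (caddr_t)&parms) == -1) {
            perror("PC_GETPARMS");
            return (1);
        }

        info.pc_cid = parms.pc_cid; /* idtype and id are ignored here */
        if (priocntl(P_ALL, 0, PC_GETCLINFO, (caddr_t)&info) == -1) {
            perror("PC_GETCLINFO");
            return (1);
        }

        printf("scheduling class: %s (cid %ld)\n",
            info.pc_clname, (long)info.pc_cid);
        return (0);
    }

At the command level, the same class and priority information can be observed with ps -c and the priocntl(1) utility.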
