3.4. Dispatcher Locks

The kernel implements several types of synchronization primitives to facilitate support for hardware platforms with more than one processor. The most common is the mutual exclusion lock, or mutex lock. Other locking mechanisms used by the kernel include reader/writer locks and, in some cases, semaphores. These are discussed in Chapter 17.

These locking mechanisms provide fast and scalable methods of synchronizing activity among many kernel threads and maintain coherency for the various bits of kernel data and state they protect. However, mutex locks, by design, can require calling threads to enter the dispatcher for sleep, wakeup, and associated context switch operations. Also, interrupt activity requires dispatcher functions for managing the pinning of a running thread and putting an interrupt thread on a processor for execution. In some cases, interrupt threads may block, requiring the dispatcher code to manage changing the state of the interrupt thread from ONPROC to SLEEP, placing it on a sleep queue, and setting up the interrupted thread to resume execution.

Specific areas of the dispatcher code must be allowed to execute safely, without risk of an event or branch in the code that would reenter the dispatcher from another source. It is in these areas of the kernel that dispatcher locks are used. Simply put, a dispatcher lock is an implementation of a spin lock that runs at a high-priority level, blocking all but the highest-priority interrupts. A spin lock, as the name implies, causes the calling thread to enter a spin loop if the lock the thread is attempting to acquire is not free. If the target lock is free, the processor executing the thread that takes ownership of the dispatcher lock has its priority interrupt level (PIL) elevated to block low-level interrupts.

The exact priority level is shown in the header file below.

/*
 * The definitions of the symbolic interrupt levels:
 *
 *   CLOCK_LEVEL =>  The level at which one must be to block the clock.
 *
 *   LOCK_LEVEL  =>  The highest level at which one may block (and thus the
 *                   highest level at which one may acquire adaptive locks)
 *                   Also the highest level at which one may be preempted.
 *
 *   DISP_LEVEL  =>  The level at which one must be to perform dispatcher
 *                   operations.
 *
 * The constraints on the platform:
 *
 *  - CLOCK_LEVEL must be less than or equal to LOCK_LEVEL
 *  - LOCK_LEVEL must be less than DISP_LEVEL
 *  - DISP_LEVEL should be as close to LOCK_LEVEL as possible
 *
 * Note that LOCK_LEVEL and CLOCK_LEVEL have historically always been equal;
 * changing this relationship is probably possible but not advised.
 */
#define CLOCK_LEVEL     10
#define LOCK_LEVEL      10
#define DISP_LEVEL      (LOCK_LEVEL + 1)
#define HIGH_LEVELS     (PIL_MAX - LOCK_LEVEL)
#define PIL_MAX         15

                                        See usr/src/uts/sparc/sys/machlock.h


Several symbolic constants represent key interrupt levels. On both SPARC and Intel architectures, there are 15 interrupt priority levels, with levels 11 through 15 defined as high-priority interrupts. Interrupts are discussed in Section 3.11, but for this discussion, there is one key point to be aware of regarding high-priority interrupts: the interrupt handler for a high-priority interrupt cannot block; doing so would violate a critical constraint that kernel programmers must comply with when writing high-PIL interrupt handlers.

The constraint exists because dispatcher locks are held at interrupt level 11 (DISP_LEVEL); thus, a processor executing a thread that acquires a dispatcher lock blocks interrupts at level 11 and below, and only interrupts at level 12 and higher cause the processor to stop what it is doing and allow the interrupt to be handled. This means that it is possible to interrupt a thread holding a dispatcher lock. Entering the dispatcher while executing in high-level interrupt context on a processor that was already in the dispatcher and holding a dispatcher lock would be disastrous and would almost certainly either hang or panic the kernel. This is also why mutex locks are not used for most dispatcher functions: the adaptive behavior of kernel mutex locks can require entering the dispatcher to put the calling thread to sleep.

To elaborate a bit on this complex topic: when a dispatcher lock is held, the CPU is at DISP_LEVEL (PIL 11), so all interrupts at DISP_LEVEL and below are blocked. This raises the question of why DISP_LEVEL and LOCK_LEVEL are not the same. They used to be; prior to Solaris 7, DISP_LEVEL did not exist, and the dispatcher operated at PIL 10, the same as CLOCK_LEVEL. The problem with this arrangement was that, on one hand, we cannot preempt a thread holding a dispatcher lock (remember, at PIL 10); but on the other hand, if the clock interrupt thread, which operates at PIL 10, were to block, leaving the CPU at PIL 10, then later, when the clock thread becomes runnable, we must preempt the non-interrupt thread running on the CPU, which is still at PIL 10. With the dispatcher lock and the clock thread running at the same PIL, we could not tell, given a CPU at PIL 10, whether a dispatcher lock was held (in which case we cannot preempt) or whether the clock thread had blocked and we needed to preempt the thread now running on the CPU. To address this issue, DISP_LEVEL was introduced, and it was mandated that the dispatcher run at PIL 11. This way, we know we can preempt anything at PIL 10, and anything else found at PIL 11 is illegal.

Thus, we have well-defined constraints for coding high-level interrupt handlers: don't block, and make it fast. High-priority interrupts are reserved for critical system events, such as hardware faults, which is why DISP_LEVEL is not 15; even in a critical section, we do not want to mask notification of important system events.

The actual dispatcher locks are embedded in the disp_t structure (disp_lock), one of which exists for each per-processor dispatch queue. There is also a disp_t (and associated disp_lock) for the kernel preempt (kp) queues (see Figure 3.4). Last, several locks defined in the dispatcher code are not directly associated with a dispatch queue but are part of the dispatcher subsystem: the swapped_lock, which manages the thread swap queue, and the shuttle_lock, which protects shuttle objects, are examples of dispatcher locks not directly bound to a dispatch queue. A dispatcher lock is simply an unsigned char data type (1 byte in size) that is set to zero when the lock is initialized.

3.4.1. Dispatcher Lock Functions

The kernel implements functions for initializing, acquiring, releasing, and destroying dispatcher locks.

/*
 * Dispatcher lock type, macros and routines.
 *
 * disp_lock_t is defined in machlock.h
 */
extern  void    disp_lock_enter(disp_lock_t *);
extern  void    disp_lock_exit(disp_lock_t *);
extern  void    disp_lock_exit_nopreempt(disp_lock_t *);
extern  void    disp_lock_enter_high(disp_lock_t *);
extern  void    disp_lock_exit_high(disp_lock_t *);
extern  void    disp_lock_init(disp_lock_t *lp, char *name);
extern  void    disp_lock_destroy(disp_lock_t *lp);

                                        See usr/src/uts/common/sys/t_lock.h


Dispatcher locks are acquired with disp_lock_enter() or disp_lock_enter_high() and released by calls to disp_lock_exit() or disp_lock_exit_high(). The disp_lock_enter_high() code acquires the specified dispatcher lock (passed as an argument) without explicitly elevating the processor's PIL; it is called when the processor's PIL is already at DISP_LEVEL. disp_lock_enter() elevates the processor's PIL to DISP_LEVEL and then attempts to acquire the lock. The required PIL manipulation aside, the general flow for both lock enter functions is similar:

  1. Enter assembly code and test if lock is free.

  2. If the lock is free, take ownership and return.

  3. If the lock is not free (owned), enter spin loop.

  4. In each pass through the loop, test whether the lock is still held. If it is not held, return to step 1 and again attempt to take ownership of the lock.
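The flow above can be sketched in C. This is a hypothetical, user-space illustration using C11 atomics and a simulated PIL variable; the real kernel routines are hand-tuned assembly that manipulates the hardware interrupt mask, and all names ending in _sketch, plus set_pil() and cur_pil, are illustrative stand-ins.

```c
#include <stdatomic.h>
#include <assert.h>

#define DISP_LEVEL 11

typedef atomic_uchar disp_lock_t;   /* one byte; 0 means the lock is free */

int cur_pil = 0;                    /* stand-in for the processor's PIL */

static int set_pil(int new_pil)     /* simulated PIL raise/restore */
{
    int old = cur_pil;
    cur_pil = new_pil;
    return old;
}

/* Raise to DISP_LEVEL, then spin until we take ownership of the lock. */
int disp_lock_enter_sketch(disp_lock_t *lp)
{
    int old_pil = set_pil(DISP_LEVEL);      /* block low-level interrupts */

    while (atomic_exchange(lp, 1) != 0) {   /* steps 1-2: test and take */
        set_pil(old_pil);                   /* spin at the caller's old PIL... */
        while (atomic_load(lp) != 0)
            ;                               /* steps 3-4: wait until it looks free */
        set_pil(DISP_LEVEL);                /* ...re-raise before retrying */
    }
    return old_pil;                         /* caller restores the PIL on exit */
}
```

Note that the inner wait loop runs at the caller's original PIL, mirroring the behavior described below for the kernel's real spin path: the spinning thread holds no lock, so there is no reason to keep low-level interrupts blocked while it waits.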

The mechanism is fast and simple by design, allowing a lock to be acquired in just a couple of assembly language instructions if the lock is free. One added point on the spin loop: the lock_set_spl_spin() code does not execute the spin loop at an elevated PIL (DISP_LEVEL). Inside the spin loop, the processor's PIL is lowered to the value the processor was operating at when the disp_lock_enter() function was called. Since the thread is not holding a dispatcher lock inside the spin loop, we need not block low-level interrupts within the loop.

When it's time to free the lock, disp_lock_exit_high() simply clears the lock and returns. disp_lock_exit() is used when it's safe to test for a kernel preemption on lock release. Recall that with disp_lock_enter_high(), the processor is already at an elevated PIL (DISP_LEVEL), and as such, disp_lock_exit_high() does not allow for a kernel preemption on freeing the lock; it is not safe to allow a kernel preemption with the processor at a high PIL. disp_lock_exit() tests whether a kernel preemption is pending and, if that condition is true, clears the lock and enters the preempt code. Otherwise, it just clears the lock.
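As a hedged sketch of the two release paths (the _sketch names, the simulated cur_pil, and the kprunrun flag are illustrative stand-ins, not the kernel's actual code; the real disp_lock_exit() checks the per-CPU preemption-pending flag and calls kpreempt()):

```c
#include <stdatomic.h>
#include <assert.h>

#define DISP_LEVEL 11

typedef atomic_uchar disp_lock_t;

int cur_pil = DISP_LEVEL;       /* a dispatcher lock is held: PIL is raised */
int kprunrun = 0;               /* stand-in for the "preemption pending" flag */
int kpreempt_calls = 0;         /* counts simulated preemption requests */

static void kpreempt_sim(void)  /* simulated kernel preemption entry */
{
    kpreempt_calls++;
}

/* Clear the lock and drop the PIL, without checking for preemption. */
void disp_lock_exit_nopreempt_sketch(disp_lock_t *lp, int old_pil)
{
    atomic_store(lp, 0);
    cur_pil = old_pil;
}

/* Clear the lock, drop the PIL, then honor a pending kernel preemption. */
void disp_lock_exit_sketch(disp_lock_t *lp, int old_pil)
{
    int pending = kprunrun;
    disp_lock_exit_nopreempt_sketch(lp, old_pil);
    if (pending)                /* preemption requested while we held the lock? */
        kpreempt_sim();         /* safe now: lock is free and PIL is lowered */
}
```

The design point the sketch illustrates: the preemption check must happen only after the lock is dropped and the PIL restored, which is exactly why a separate high-PIL release path with no preemption check exists.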

3.4.2. Thread Locks

Thread locks are per-thread dispatcher locks that protect a thread's dispatch queue and critical thread state information. Where a dispatcher lock protects a dispatch queue to maintain consistency for various dispatcher functions, a thread lock provides a mechanism for protecting the dispatch queue specific to a kernel thread, along with the thread's state. Thread locks are implemented specifically to provide a fast synchronization mechanism. Rather than require kernel code to make two lock calls (one to get a dispatcher lock and one to get a lock to protect thread state), the kernel can quickly protect both the target thread and the dispatch queue it is linked to with a single lock call to acquire the thread lock. Put another way, acquiring the thread lock locks the thread and its dispatch queue.

The lock itself is a member of the kernel thread structure and is defined as a pointer to a dispatcher lock data type.

/*
 * Pointer to the dispatcher lock protecting t_state and state-related
 * flags.  This pointer can change during waits on the lock, so
 * it should be grabbed only by thread_lock().
 */
disp_lock_t     *t_lockp;       /* pointer to the dispatcher lock */

                                        See usr/src/uts/common/sys/thread.h


A kernel thread's t_lockp is set, by the queue insertion functions, to point to the dispatcher lock of the dispatch queue onto which the thread is inserted, using the THREAD_SET_STATE macro.

#define THREAD_SET_STATE(tp, state, lp) \
                ((tp)->t_state = state, (tp)->t_lockp = lp)

                                        See usr/src/uts/common/sys/thread.h


The macro is passed the thread pointer, the state to set the thread to (for example, TS_RUN), and a pointer to the dispatcher lock.
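A minimal user-space illustration shows that the macro updates both fields in a single comma expression; the struct here and the TS_RUN value are simplified stand-ins for the kernel's kthread_t and its state constants.

```c
#include <assert.h>

typedef unsigned char disp_lock_t;

#define TS_RUN  0x04            /* illustrative value for the "runnable" state */

typedef struct kthread {        /* simplified stand-in for kthread_t */
    int          t_state;       /* thread state */
    disp_lock_t *t_lockp;       /* pointer to the protecting dispatcher lock */
} kthread_t;

/* As in the kernel header: one expression sets state and retargets the lock. */
#define THREAD_SET_STATE(tp, state, lp) \
        ((tp)->t_state = state, (tp)->t_lockp = lp)
```

For example, inserting a thread on a dispatch queue would set the state and point t_lockp at that queue's lock in one step: THREAD_SET_STATE(tp, TS_RUN, &dq_lock).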

The actual lock backing the thread lock depends on the thread's state. A thread in the TS_ONPROC state has its lock in the cpu structure of the CPU on which it is running. A TS_RUN thread's lock is in the dispatch queue the thread is on, and a TS_SLEEP thread's lock resides in the corresponding sleep queue. Setting the thread state with THREAD_SET_STATE points the thread's t_lockp at the appropriate lock based on the new state.

A kernel thread's t_lockp may also reference the transition_lock, the stop_lock, or a sleep queue lock. The lock names give us a good indication of their use; a thread's t_lockp is set to the transition lock when the thread's state is changing. The transition lock is necessary because thread state changes often result in changes to the thread's t_lockp. For example, when a thread transitions from running (TS_ONPROC) to sleep (TS_SLEEP), the t_lockp is set to the lock associated with the sleep queue on which the thread is placed. If a thread is migrated to another processor, the address of the dispatcher lock changes (since dispatch queues are per-processor), resulting in a change to the thread's t_lockp. The transition lock provides a simple and safe mechanism for protecting thread state during such transitions.
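Because t_lockp can be retargeted while a caller waits for the lock it points to, thread_lock() must re-check the pointer after the acquire succeeds. The retry pattern might be sketched as follows; this is a hypothetical simplification (no PIL handling, user-space atomics, _sketch names invented here), not the kernel's actual implementation.

```c
#include <stdatomic.h>
#include <assert.h>

typedef atomic_uchar disp_lock_t;

typedef struct kthread {                 /* simplified stand-in for kthread_t */
    _Atomic(disp_lock_t *) t_lockp;      /* may change while we wait */
} kthread_t;

static void lock_enter(disp_lock_t *lp) /* simplified spin acquire */
{
    while (atomic_exchange(lp, 1) != 0)
        ;
}

static void lock_exit(disp_lock_t *lp)
{
    atomic_store(lp, 0);
}

/* Acquire whatever lock t_lockp currently points to; retry if it moved. */
void thread_lock_sketch(kthread_t *tp)
{
    for (;;) {
        disp_lock_t *lp = atomic_load(&tp->t_lockp); /* snapshot the pointer */
        lock_enter(lp);
        if (lp == atomic_load(&tp->t_lockp))         /* still the thread's lock? */
            return;                                  /* yes: thread is locked */
        lock_exit(lp);                               /* no: it moved; retry */
    }
}
```

Once the acquired lock matches t_lockp, the pointer can no longer change (only the lock holder may retarget it), so the thread and its queue are both safely locked.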

The stop lock is used when a thread is stopped, which is the initial state of a thread at creation. Threads can also be stopped when executing under the control of a debugger.

3.4.3. Thread Lock Functions

The functions called to acquire and release thread locks are similar to the dispatcher lock code. thread_lock() and thread_lock_high() both attempt to acquire the thread lock and, if it is not free, enter a spin loop, checking for lock availability in each pass through the loop. Like dispatcher locks, thread locks are held with the processor at an elevated interrupt level. If the spin loop is entered (the lock is not free), the processor's interrupt priority level is lowered to the level it was running at when thread_lock() was entered and raised back to DISP_LEVEL when the lock is acquired. thread_lock_high() is called when the processor is already running at DISP_LEVEL.

void    thread_transition(kthread_t *); /* move to transition lock */
void    thread_stop(kthread_t *);       /* move to stop lock */
void    thread_lock(kthread_t *);       /* lock thread and its queue */
void    thread_lock_high(kthread_t *);  /* lock thread and its queue */
void    thread_onproc(kthread_t *, struct cpu *); /* set onproc state lock */

#define thread_unlock(t)                disp_lock_exit((t)->t_lockp)
#define thread_unlock_high(t)           disp_lock_exit_high((t)->t_lockp)
#define thread_unlock_nopreempt(t)      disp_lock_exit_nopreempt((t)->t_lockp)

                                        See usr/src/uts/common/sys/thread.h


The lock release (unlock) functions are mapped onto the dispatcher lock release functions by the C language #define directives shown above: disp_lock_exit() and its variants are actually called to release thread locks. When a lock is freed, a test determines whether a kernel preemption is pending. If it is, the lock is freed, the processor's interrupt priority level is restored to its previous value, and the kernel preemption function is called (see Section 3.9). The no-preempt release function is used when the dispatcher is in the process of selecting the best-priority thread to run (the kernel disp_getbest() function) and preparing to context-switch the selected thread onto a processor for execution. Since this specific code segment is doing priority-based thread selection, a real-time thread would be selected for execution if one were runnable; and recall that it is real-time threads that generate kernel preemptions.

3.4.4. Lock Statistics

Statistics on dispatcher locks and thread locks are available through the lockstat(1) command (a DTrace consumer), or the lock functions can be instrumented with the DTrace fbt provider. lockstat(1) can be invoked with an event list that restricts reporting to spin locks and thread locks (events 2 and 3), as in the following.

# lockstat -e2,3 sleep 10

Spin lock spin: 129 events in 10.119 seconds (13 events/sec)

Count indv cuml rcnt     spin Lock                   Caller
-------------------------------------------------------------------------------
   11   9%   9% 0.00        5 0x30001bacd08          setfrontdq+0x158
    9   7%  16% 0.00        4 0x30001bacc18          setfrontdq+0x158
    8   6%  22% 0.00        5 0x30001bacc78          disp+0x84
    8   6%  28% 0.00        4 0x30001baccd8          setfrontdq+0x158
    7   5%  33% 0.00       43 0x30001bacd08          disp+0x84
    7   5%  39% 0.00        5 0x30001bacc78          setfrontdq+0x158
    6   5%  43% 0.00        5 0x30001baccd8          setbackdq+0x2d0
    5   4%  47% 0.00        2 0x30001bacd08          setbackdq+0x2d0
. . .
    1   1% 100% 0.00        4 0x30001bacbe8          setbackdq+0x2d0
-------------------------------------------------------------------------------

Thread lock spin: 40 events in 10.119 seconds (4 events/sec)

Count indv cuml rcnt     spin Lock                   Caller
-------------------------------------------------------------------------------
    7  18%  18% 0.00       87 cpu[12]+0xf8           ts_tick+0x8
    6  15%  32% 0.00       75 cpu[5]+0xf8            ts_tick+0x8
    4  10%  42% 0.00       29 cpu[5]+0xf8            cv_wait_sig_swap_core+0x54
    4  10%  52% 0.00       87 cpu[4]+0xf8            ts_tick+0x8
. . .
    1   2%  90% 0.00       22 cpu[1]+0xf8            preempt+0x1c
    1   2%  92% 0.00       29 cpu[4]+0xf8            cv_wait_sig_swap_core+0x54
    1   2%  95% 0.00      471 transition_lock        ts_update_list+0x68
    1   2%  98% 0.00       67 sleepq_head+0x4b8      ts_tick+0x8
    1   2% 100% 0.00   135475 0x30001bacbe8          ts_update_list+0x68
-------------------------------------------------------------------------------


The example above reports 129 spin lock events on dispatcher locks and 40 occurrences of a thread lock spin. The values in this example are pretty tame; the values reported in the Count and spin columns are relatively small, suggesting that this system is not burning significant time in lock spin loops, nor is there any indication of a hot lock (a lock that is highly contended).




Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture (2nd Edition)
ISBN: 0131482092
Year: 2004
Pages: 244
