17.6. Reader/Writer Locks

Reader/writer (RW) locks provide mutual exclusion semantics on write locks. Only one thread at a time is allowed to own the write lock, but there is concurrent access for readers. These locks are designed for scenarios in which it is acceptable to have multiple threads reading the data at the same time, but only one writer. While a writer is holding the lock, no readers are allowed. Also, because of the wakeup mechanism, an RW lock is a better solution for kernel code segments that require relatively long hold times, as we will see shortly.

The basic mechanics of RW locks are similar to those of mutexes, in that RW locks have an initialization function (rw_init()), an entry function to acquire the lock (rw_enter()), and an exit function to release the lock (rw_exit()). The entry and exit points are optimized in assembly code to deal with the simple cases, and they call into C language functions if anything beyond the simplest case must be dealt with. As with mutex locks, the simple case is that the requested lock is available on an entry (acquire) call and no threads are waiting for the lock on the exit (release) call.

17.6.1. Solaris Reader/Writer Locks

Reader/writer locks are implemented as a single-word data structure in the kernel, either 32 bits or 64 bits wide, depending on the data model of the running kernel, as depicted in Figure 17.6.

Figure 17.6.
Reader/Writer Lock

typedef struct rwlock_impl {
        uintptr_t       rw_wwwh;        /* waiters, write wanted, hold count */
} rwlock_impl_t;

#endif  /* _ASM */

#define RW_HAS_WAITERS          1
#define RW_WRITE_WANTED         2
#define RW_WRITE_LOCKED         4
#define RW_READ_LOCK            8
#define RW_WRITE_LOCK(thread)   ((uintptr_t)(thread) | RW_WRITE_LOCKED)
#define RW_HOLD_COUNT           (-RW_READ_LOCK)
#define RW_HOLD_COUNT_SHIFT     3       /* log2(RW_READ_LOCK) */
#define RW_READ_COUNT           RW_HOLD_COUNT
#define RW_OWNER                RW_HOLD_COUNT
#define RW_LOCKED               RW_HOLD_COUNT
#define RW_WRITE_CLAIMED        (RW_WRITE_LOCKED | RW_WRITE_WANTED)
#define RW_DOUBLE_LOCK          (RW_WRITE_LOCK(0) | RW_READ_LOCK)

                                                See sys/rwlock.h

The reader/writer lock has two states, depending on whether the lock is held by a writer. Bit 2, wrlock, is the actual write lock, and it determines the meaning of the high-order bits. If the write lock is held (bit 2 set), then the upper bits contain a pointer to the kernel thread holding the write lock. If bit 2 is clear, then the upper bits contain a count of the number of threads holding the lock as a read lock. The Solaris 10 RW lock defines bit 0 as the wait bit, which is set to signify that threads are waiting for the lock. The wrwant bit (write wanted, bit 1) indicates that at least one thread is waiting for a write lock. The simple cases for lock acquisition through rw_enter() are those in which the lock is immediately available: for a write lock, no bits in the lock word are set; for a read lock, the wrlock and wrwant bits are clear.
The acquisition of the write lock results in bit 2 getting set and the kernel thread pointer getting loaded into the upper bits. For a reader, the hold count (upper bits) is incremented. Conditions where the write lock is being held, causing a lock request to fail, or where a thread is waiting for a write lock, causing a read lock request to fail, result in a call to the rw_enter_sleep() function. It is important to note that the rw_enter() code sets a flag in the kernel thread that is used by the dispatcher code when establishing a kernel thread's priority before preemption or before changing its state to sleep. We cover this in more detail in the paragraph beginning "It is in the dispatcher queue insertion code" on page 262. Briefly, the kernel thread structure contains a t_kpri_req (kernel priority request) field that is checked in the dispatcher code when a thread is about to be preempted (forced off the processor on which it is executing because a higher-priority thread has become runnable) or when the thread is about to have its state changed to sleep. If the t_kpri_req flag is set, the dispatcher assigns a kernel priority to the thread, such that when the thread resumes execution, it runs before threads in lower-priority scheduling classes (timeshare and interactive class threads). More succinctly, the priority of a thread holding a write lock is boosted to minimize the hold time of the lock. Getting back to the rw_enter() flow: if the code falls through the simple case, we need to set up the kernel thread requesting the RW lock to block.
The acquisition of an RW lock and the subsequent behavior if the lock is held are straightforward and similar in many ways to what happens in the mutex case. Things get interesting when a thread calls rw_exit() to release a lock it is holding; there are several potential solutions to the problem of determining which thread gets the lock next. With mutexes, a wakeup is issued to all threads sleeping on the lock, and we know from empirical data that this solution works well for the reasons previously discussed. With RW locks, we're dealing with potentially longer hold times, which could result in more sleepers; a desire to give writers priority over readers (it's typically best not to have a reader read data that's about to be changed by a pending writer); and the potential for the priority inversion problem described in Section 17.7. For rw_exit(), which is called by the lock holder when it is ready to release the lock, the simple case is that there are no waiters. In this case, the wrlock bit is cleared if the holder was a writer, or the hold count field is decremented to reflect one less reader. The more complex case, in which the lock still has waiters when it is released, requires choosing which waiting thread (or threads) to grant the lock to next.
The remainder of the algorithm arbitrates between waiting readers and waiting writers, giving writers preference while preventing reader starvation.
The "let's favor writers but be fair to readers" policy described above was first implemented in Solaris 2.6. |