9.3. Analyzing Kernel Locks

Locks are used in the kernel to serialize access to critical regions and data structures. If contention occurs around a lock, a performance problem or scalability limitation can result. Two main tools analyze lock contention in the kernel: lockstat(1M) and the DTrace lockstat provider.

9.3.1. Adaptive Locks

Adaptive locks enforce mutual exclusion to a critical section and can be acquired in most contexts in the kernel. Because adaptive locks have few context restrictions, they constitute the vast majority of synchronization primitives in the Solaris kernel. These locks are adaptive in their behavior with respect to contention: when a thread attempts to acquire a held adaptive lock, it determines whether the owning thread is currently running on a CPU. If the owner is running on another CPU, the acquiring thread spins. If the owner is not running, the acquiring thread blocks.

To observe adaptive locks, first consider the spin behavior. Locks that spin excessively burn CPU cycles, behavior that manifests as high system time. If you notice high system time with mpstat(1M), spinning locks might be a contributor. You can confirm the amount of system time that results from spinning lock contention by looking at the kernel function profile; spinning locks show up as mutex_* functions high in the profile. To identify which lock is spinning and which functions are causing the lock contention, use lockstat(1M) and the DTrace lockstat provider.

Adaptive locks that block yield the CPU, and excessive blocking results in idle time and nonscalability. To identify which lock is blocking and which functions are causing the lock contention, again use lockstat(1M) and the DTrace lockstat provider.

9.3.2. Spin Locks

Threads cannot block in some kernel contexts, such as high-level interrupt context and any context that manipulates dispatcher state. This restriction prevents the use of adaptive locks in these contexts, so spin locks are used instead to effect mutual exclusion to critical sections. As the name implies, these locks spin in the presence of contention until the lock is released by the owning thread. As with adaptive locks, excessive spinning burns CPU cycles and manifests as high system time in mpstat(1M); you can confirm it in the kernel function profile, where spinning locks show up as mutex_* functions high in the profile, and identify the lock and the contending functions with lockstat(1M) and the DTrace lockstat provider.

9.3.3. Reader/Writer Locks

Readers/writer locks enforce a policy of allowing multiple readers or a single writer, but not both, to be in a critical section. These locks are typically used for structures that are searched more frequently than they are modified and for which there is substantial time in the critical section. If critical section times are short, readers/writer locks implicitly serialize over the shared memory used to implement the lock, giving them no advantage over adaptive locks. See rwlock(9F) for more details on readers/writer locks.

Reader/writer locks that block yield the CPU, and excessive blocking results in idle time and nonscalability. To identify which lock is blocking and which functions are causing the lock contention, use lockstat(1M) and the DTrace lockstat provider.
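Both blocking cases can be explored directly with the DTrace lockstat provider. The following one-liners are a minimal sketch: they assume the provider's adaptive-block and rw-block probes, whose arg1 is the time spent blocked in nanoseconds, and they total that time by five-frame kernel stack.

# dtrace -n 'lockstat:::adaptive-block { @[stack(5)] = sum(arg1); }'
# dtrace -n 'lockstat:::rw-block { @[stack(5)] = sum(arg1); }'

When interrupted with Ctrl-C, each one-liner prints the code paths that accumulated the most blocked time, complementing the single Caller frame that lockstat(1M) reports.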
9.3.4. Thread Locks

A thread lock is a special kind of spin lock that locks a thread in order to change thread state.

9.3.5. Analyzing Locks with lockstat

The lockstat command provides summary or detail information about lock events in the kernel. By default (without the -I option previously demonstrated), it provides a systemwide summary of lock contention events for the duration of a command supplied as an argument. For example, to make lockstat sample for 30 seconds, we often use sleep 30 as the command. Note that lockstat doesn't actually introspect the sleep command; it is only there to control the sample window. We recommend starting with the -P option, which sorts by the product of the number of contention events and the cost of each event, putting the most resource-expensive events at the top of the list.

# lockstat -P sleep 30

Adaptive mutex spin: 3486197 events in 30.031 seconds (116088 events/sec)

  Count indv cuml rcnt     spin Lock                   Caller
-------------------------------------------------------------------------------
1499963  43%  43% 0.00       84 pr_pidlock             pr_p_lock+0x29
1101187  32%  75% 0.00       24 0xffffffff810cdec0     pr_p_lock+0x50
 285012   8%  83% 0.00       27 0xffffffff827a9858     rdip+0x506
...

For each type of lock, the total number of events during the sample and the length of the sample period are displayed. For each record within the lock type, the following information is provided:

Count: the number of times this contention event occurred during the sample.
indv: the percentage of all events represented by this individual record.
cuml: the cumulative percentage of events, through this record.
rcnt: the average reference count.
spin or nsec: the average cost of each event, as a spin count for spin events or a duration in nanoseconds for block events.
Lock: the address of the lock, shown symbolically where possible.
Caller: the calling function and instruction offset.
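The Caller column shows only the innermost calling function. When that frame is ambiguous, one option, sketched here with an assumed stack depth and record count, is to rerun lockstat with -s to record stack traces and -D to limit the output to the busiest records of each type:

# lockstat -P -s 10 -D 20 sleep 30

Each record is then followed by the kernel stack at the time of the contention event, at some cost in output volume and probe effect.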
To estimate the impact of a lock, multiply Count by the cost. For example, if a blocking event costs 48,944,759 ns on average and the event occurs 1,929 times in a 30-second window, then the lock blocked threads for a total of roughly 94 seconds during that 30-second period. How can this be greater than 30 seconds? Multiple threads were blocking at once, and because blocking events overlap, the total blocked time can exceed the elapsed time of the sample. The full output from this example with the -P option follows.

# lockstat -P sleep 30

Adaptive mutex spin: 3486197 events in 30.031 seconds (116088 events/sec)

  Count indv cuml rcnt     spin Lock                   Caller
-------------------------------------------------------------------------------
1499963  43%  43% 0.00       84 pr_pidlock             pr_p_lock+0x29
1101187  32%  75% 0.00       24 0xffffffff810cdec0     pr_p_lock+0x50
 285012   8%  83% 0.00       27 0xffffffff827a9858     rdip+0x506
 212621   6%  89% 0.00       29 0xffffffff827a9858     rdip+0x134
  98531   3%  92% 0.00      103 0xffffffff9321d480     releasef+0x55
  92486   3%  94% 0.00       19 0xffffffff8d5c4990     ufs_lockfs_end+0x81
  89404   3%  97% 0.00       27 0xffffffff8d5c4990     ufs_lockfs_begin+0x9f
  83186   2%  99% 0.00       96 0xffffffff9321d480     getf+0x5d
   6356   0%  99% 0.00      186 0xffffffff810cdec0     clock+0x4e9
   1164   0% 100% 0.00      141 0xffffffff810cdec0     post_syscall+0x352
    294   0% 100% 0.00       11 0xffffffff801a4008     segmap_smapadd+0x77
    279   0% 100% 0.00       11 0xffffffff801a41d0     segmap_getmapflt+0x275
    278   0% 100% 0.00       11 0xffffffff801a48f0     segmap_smapadd+0x77
    276   0% 100% 0.00       11 0xffffffff801a5010     segmap_getmapflt+0x275
    276   0% 100% 0.00       11 0xffffffff801a4008     segmap_getmapflt+0x275
...

Adaptive mutex block: 3328 events in 30.031 seconds (111 events/sec)

  Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
   1929  58%  58% 0.00 48944759 pr_pidlock             pr_p_lock+0x29
    263   8%  66% 0.00    47017 0xffffffff810cdec0     pr_p_lock+0x50
    255   8%  74% 0.00 53392369 0xffffffff9321d480     getf+0x5d
    217   7%  80% 0.00    26133 0xffffffff810cdec0     clock+0x4e9
    207   6%  86% 0.00   227146 0xffffffff827a9858     rdip+0x134
    197   6%  92% 0.00    64467 0xffffffff8d5c4990     ufs_lockfs_begin+0x9f
    122   4%  96% 0.00    64664 0xffffffff8d5c4990     ufs_lockfs_end+0x81
    112   3%  99% 0.00   164559 0xffffffff827a9858     rdip+0x506

Spin lock spin: 3491 events in 30.031 seconds (116 events/sec)

  Count indv cuml rcnt     spin Lock                   Caller
-------------------------------------------------------------------------------
   2197  63%  63% 0.00     2151 turnstile_table+0xbd8  disp_lock_enter+0x35
    314   9%  72% 0.00     3129 turnstile_table+0xe28  disp_lock_enter+0x35
    296   8%  80% 0.00     3162 turnstile_table+0x888  disp_lock_enter+0x35
    211   6%  86% 0.00     2032 turnstile_table+0x8a8  disp_lock_enter+0x35
    127   4%  90% 0.00      856 turnstile_table+0x9f8  turnstile_interlock+0x171
    114   3%  93% 0.00      269 turnstile_table+0x9f8  disp_lock_enter+0x35
     44   1%  95% 0.00       90 0xffffffff827f4de0     disp_lock_enter_high+0x13
     37   1%  96% 0.00      581 0xffffffff827f4de0     disp_lock_enter+0x35
...
Thread lock spin: 1104 events in 30.031 seconds (37 events/sec)

  Count indv cuml rcnt     spin Lock                   Caller
-------------------------------------------------------------------------------
    487  44%  44% 0.00     1671 turnstile_table+0xbd8  ts_tick+0x26
    219  20%  64% 0.00     1510 turnstile_table+0xbd8  turnstile_block+0x387
     92   8%  72% 0.00     1941 turnstile_table+0x8a8  ts_tick+0x26
     77   7%  79% 0.00     2037 turnstile_table+0xe28  ts_tick+0x26
     74   7%  86% 0.00     2296 turnstile_table+0x888  ts_tick+0x26
     36   3%  89% 0.00      292 cpu[0]+0xf8            ts_tick+0x26
     27   2%  92% 0.00       55 cpu[1]+0xf8            ts_tick+0x26
     11   1%  93% 0.00       26 cpu[3]+0xf8            ts_tick+0x26
     10   1%  94% 0.00       11 cpu[2]+0xf8            post_syscall+0x556
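If the thread lock spins above warrant further investigation, the DTrace lockstat provider can capture full kernel stacks for them. This is a minimal sketch assuming the provider's thread-spin probe; it aggregates by count rather than by the probe's arguments, since the meaning of arg1 for spin probes has varied across releases.

# dtrace -n 'lockstat:::thread-spin { @[stack()] = count(); }'

On Ctrl-C, the one-liner prints each distinct kernel stack that spun on a thread lock, ordered by how often it did so, extending the single-frame ts_tick and turnstile_block callers shown above.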