2.11. CPU Run Queue LatencyThe microstate accounting system maintains accurate time counters for threads as well as CPUs. Thread-based microstate accounting tracks several meaningful states per thread in addition to user and system time, which include trap time, lock time, sleep time, and latency time. The process statistics tool, prstat, reports the per-thread microstates for user processes. $ prstat -mL PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 25644 rmc 98 1.5 0.0 0.0 0.0 0.0 0.0 0.1 0 36 693 0 filebench/2 25660 rmc 98 1.7 0.1 0.0 0.0 0.0 0.0 0.1 2 44 693 0 filebench/2 25650 rmc 98 1.4 0.1 0.0 0.0 0.0 0.0 0.1 0 45 699 0 filebench/2 25655 rmc 98 1.4 0.1 0.0 0.0 0.0 0.0 0.2 0 46 693 0 filebench/2 25636 rmc 98 1.6 0.1 0.0 0.0 0.0 0.0 0.2 1 50 693 0 filebench/2 25651 rmc 98 1.6 0.1 0.0 0.0 0.0 0.0 0.2 0 54 693 0 filebench/2 25656 rmc 98 1.5 0.1 0.0 0.0 0.0 0.0 0.2 0 60 693 0 filebench/2 25639 rmc 98 1.5 0.1 0.0 0.0 0.0 0.0 0.2 1 61 693 0 filebench/2 25634 rmc 98 1.3 0.1 0.0 0.0 0.0 0.0 0.4 0 63 693 0 filebench/2 25654 rmc 98 1.3 0.1 0.0 0.0 0.0 0.0 0.4 0 67 693 0 filebench/2 25659 rmc 98 1.7 0.1 0.0 0.0 0.0 0.0 0.4 1 68 693 0 filebench/2 25647 rmc 98 1.5 0.1 0.0 0.0 0.0 0.0 0.4 0 73 693 0 filebench/2 25648 rmc 98 1.6 0.1 0.0 0.0 0.0 0.3 0.2 2 48 693 0 filebench/2 25643 rmc 98 1.6 0.1 0.0 0.0 0.0 0.0 0.5 0 75 693 0 filebench/2 25642 rmc 98 1.4 0.1 0.0 0.0 0.0 0.0 0.5 0 80 693 0 filebench/2 25638 rmc 98 1.4 0.1 0.0 0.0 0.0 0.0 0.6 0 76 693 0 filebench/2 25657 rmc 97 1.8 0.1 0.0 0.0 0.0 0.4 0.3 6 64 693 0 filebench/2 25646 rmc 97 1.7 0.1 0.0 0.0 0.0 0.0 0.6 6 83 660 0 filebench/2 25645 rmc 97 1.6 0.1 0.0 0.0 0.0 0.0 0.9 0 55 693 0 filebench/2 25652 rmc 97 1.7 0.2 0.0 0.0 0.0 0.0 0.9 2 106 693 0 filebench/2 25658 rmc 97 1.5 0.1 0.0 0.0 0.0 0.0 1.0 0 72 693 0 filebench/2 25637 rmc 97 1.7 0.1 0.0 0.0 0.0 0.3 0.6 4 95 693 0 filebench/2 Total: 91 processes, 510 lwps, load averages: 28.94, 28.66, 24.39 By specifying the -m (show microstates) and -L (show per-thread) options, we can observe the per-thread microstates. These microstates represent a time-based summary broken into percentages for each thread. The columns USR tHRough LAT sum to 100% of the time spent for each thread during the prstat sample. The important microstates for CPU utilization are USR, SYS, and LAT. The USR and SYS columns are the user and system time that this thread spent running on the CPU. The LAT (latency) column is the amount of time spent waiting for CPU. A non-zero number means there was some queuing for CPU resources. This is an extremely useful metricwe can use it to estimate the potential speedup for a thread if more CPU resources are added, assuming no other bottlenecks obstruct the way. In our example, we can see that on average the filebench tHReads are waiting for CPU about 0.2% of the time, so we can conclude that CPU resources for this system are not constrained. Another example shows what we would observe when the system is CPU-resource constrained. In this example, we can see that on average each thread is waiting for CPU resource about 80% of the time. $ prstat -mL PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 25765 rmc 22 0.3 0.1 0.0 0.0 0.0 0.0 77 0 42 165 0 filebench/2 25833 rmc 22 0.3 0.3 0.0 0.0 0.0 0.0 77 0 208 165 0 filebench/2 25712 rmc 20 0.2 0.1 0.0 0.0 0.0 0.0 80 0 53 132 0 filebench/2 25758 rmc 20 0.3 0.1 0.0 0.0 0.0 0.0 80 0 84 148 0 filebench/2 25715 rmc 20 0.3 0.1 0.0 0.0 0.0 0.0 80 0 56 132 0 filebench/2 25812 rmc 19 0.2 0.1 0.0 0.0 0.0 0.0 81 0 50 132 0 filebench/2 25874 rmc 19 0.2 0.0 0.0 0.0 0.0 0.0 81 0 22 132 0 filebench/2 25842 rmc 19 0.2 0.2 0.0 0.0 0.0 0.0 81 1 92 132 0 filebench/2 25732 rmc 19 0.2 0.1 0.0 0.0 0.0 0.0 81 0 54 99 0 filebench/2 25714 rmc 18 0.3 0.1 0.0 0.0 0.0 0.0 81 0 84 165 0 filebench/2 25793 rmc 18 0.3 0.1 0.0 0.0 0.0 0.0 81 0 30 132 0 filebench/2 25739 rmc 18 0.3 0.3 0.0 0.0 0.0 0.0 81 0 150 115 0 filebench/2 25849 rmc 18 0.3 0.0 0.0 0.0 0.0 0.0 81 1 19 132 0 filebench/2 25788 rmc 18 0.2 0.1 0.0 0.0 0.0 0.0 81 0 77 99 0 filebench/2 25760 rmc 18 0.2 0.0 0.0 0.0 0.0 0.0 82 0 26 132 0 filebench/2 25748 rmc 18 0.3 0.1 0.0 0.0 0.0 0.0 82 0 58 132 0 filebench/2 25835 rmc 18 0.3 0.1 0.0 0.0 0.0 0.0 82 0 65 132 0 filebench/2 25851 rmc 18 0.2 0.1 0.0 0.0 0.0 0.0 82 0 87 99 0 filebench/2 25811 rmc 18 0.3 0.2 0.0 0.0 0.0 0.0 82 0 129 132 0 filebench/2 25767 rmc 18 0.2 0.1 0.0 0.0 0.0 0.0 82 1 25 132 0 filebench/2 25740 rmc 18 0.3 0.2 0.0 0.0 0.0 0.0 82 0 118 132 0 filebench/2 25770 rmc 18 0.2 0.1 0.0 0.0 0.0 0.0 82 0 68 132 0 filebench/2 Total: 263 processes, 842 lwps, load averages: 201.45, 192.26, 136.16 We can further investigate which threads are consuming CPU within each process by directing prstat to examine a specific process. $ prstat -Lm -p 25691 PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 25691 rmc 17 0.2 0.2 0.0 0.0 0.0 0.0 83 0 74 99 0 filebench/2 25691 rmc 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 filebench/1 The example shows us that thread number two in the target process is using the most CPU, and spending 83% of its time waiting for CPU. We can further look at information about thread number two with the pstack command. $ pstack 25691/2 25691: shadow -a shadow -i 2 -s ffffffff10000000 -m /var/tmp/fbenchJDailY ----------------- lwp# 2 / thread# 2 -------------------- 000000010001ae90 flowoplib_hog (30d40, ffffffff6518dc60, 30d40, ffffffff37352c88, 1, 2570d) + 68 00000001000194a4 flowop_start (ffffffff37352c88, 0, 1, 0, 1, 1000888b0) + 408 ffffffff7e7ccea0 _lwp_start (0, 0, 0, 0, 0, 0) In this example, we've taken a snapshot of the stack of thread number two of our target process. At the time the snapshot was taken, we can see that the function flowop_start was calling flowoplib_hog. It's sometimes worth taking several snapshots to see if a pattern is exhibited. DTrace can analyze this further. |