9.2. Profiling the Kernel and DriversThe lockstat command and DTrace can profile the kernel and so identify hot functions. We begin by discussing lockstat's kernel profile function (the profile capability is buried inside the lock statistics tool). We then briefly mention how we would use DTrace. For a full description of how to use DTrace, refer to Chapter 10. 9.2.1. Profiling the Kernel with lockstat -IThe lockstat utility contains a kernel profiling capability. By specifying the -I option, you instruct the lockstat utility to collect kernel function samples from a time-based profile interrupt, rather than from lock contention events. The following profile summarizes sampled instruction addresses and can optionally be reduced to function names or other specific criteria. # lockstat -kIi997 sleep 10 Profiling interrupt: 10596 events in 5.314 seconds (1994 events/sec) Count indv cuml rcnt nsec CPU+PIL Caller ------------------------------------------------------------------------------- 5122 48% 48% 1.00 1419 cpu[0] default_copyout 1292 12% 61% 1.00 1177 cpu[1] splx 1288 12% 73% 1.00 1118 cpu[1] idle 911 9% 81% 1.00 1169 cpu[1] disp_getwork 695 7% 88% 1.00 1170 cpu[1] i_ddi_splhigh 440 4% 92% 1.00 1163 cpu[1]+11 splx 414 4% 96% 1.00 1163 cpu[1]+11 i_ddi_splhigh 254 2% 98% 1.00 1176 cpu[1]+11 disp_getwork 27 0% 99% 1.00 1349 cpu[0] uiomove 27 0% 99% 1.00 1624 cpu[0] bzero 24 0% 99% 1.00 1205 cpu[0] mmrw 21 0% 99% 1.00 1870 cpu[0] (usermode) 9 0% 99% 1.00 1174 cpu[0] xcopyout 8 0% 99% 1.00 650 cpu[0] ktl0 6 0% 99% 1.00 1220 cpu[0] mutex_enter 5 0% 99% 1.00 1236 cpu[0] default_xcopyout 3 0% 100% 1.00 1383 cpu[0] write 3 0% 100% 1.00 1330 cpu[0] getminor 3 0% 100% 1.00 333 cpu[0] utl0 2 0% 100% 1.00 961 cpu[0] mmread 2 0% 100% 1.00 2000 cpu[0]+10 read_rtc In the example, we use -I to request a kernel profile at 997 hertz (-i997) and to coalesce instruction addresses into function names (-k). If we didn't specify -k, then we would see samples with instruction level resolution, as function+offset. In the next example, we request that stack backtraces be collected for each sample, to a depth of 10 (-s 10). With this option, lockstat prints a summary of each unique stack as sampled. # lockstat -i997 -Iks 10 sleep 30 Profiling interrupt: 119800 events in 30.038 seconds (3988 events/sec) ------------------------------------------------------------------------------- Count indv cuml rcnt nsec CPU+PIL Hottest Caller 29919 25% 25% 0.00 5403 cpu[2] kcopy nsec ------ Time Distribution ------ count Stack 1024 | 2 uiomove 2048 | 18 rdip 4096 | 25 ufs_read 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29853 fop_read 16384 | 21 pread64 sys_syscall32 -------------------- ----------------------------------------------------------- Count indv cuml rcnt nsec CPU+PIL Hottest Caller 29918 25% 50% 0.00 5386 cpu[1] kcopy nsec ------ Time Distribution ------ count Stack 4096 | 38 uiomove 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29870 rdip 16384 | 10 ufs_read fop_read pread64 sys_syscall32 ------------------------------------------------------------------------------- Count indv cuml rcnt nsec CPU+PIL Hottest Caller 29893 25% 75% 0.00 5283 cpu[3] kcopy nsec ------ Time Distribution ------ count Stack 1024 | 140 uiomove 2048 | 761 rdip 4096 |@ 1443 ufs_read 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 27532 fop_read 16384 | 17 pread64 sys_syscall32 ------------------------------------------------------------------------------- |