4.17. DTraceToolkit CommandsThe DTraceToolkit is a free collection of DTrace-based tools, some of which analyze disk behavior. We previously demonstrated cut-down versions of two of its scripts, bitesize.d and seeksize.d. Two of the most popular are iotop and iosnoop. 4.17.1. iotop Scriptiotop uses DTrace to print disk I/O summaries by process, for details such as size (bytes) and disk I/O times. The following demonstrates the default output of iotop, which prints size summaries and refreshes the screen every five seconds. # ./iotop 2006 Feb 13 13:38:21, load: 0.35, disk_r: 56615 Kb, disk_w: 637 Kb UID PID PPID CMD DEVICE MAJ MIN D BYTES 0 27732 27703 find cmdk0 102 0 R 38912 0 0 0 sched cmdk5 102 320 W 150016 0 0 0 sched cmdk2 102 128 W 167424 0 0 0 sched cmdk3 102 192 W 167424 0 0 0 sched cmdk4 102 256 W 167424 0 27733 27703 bart cmdk0 102 0 R 57897984 ... In the above output, the bart process read approximately 57 Mbytes from disk. Disk I/O time summaries can also be printed with -o, which uses the adaptive disk-response-time algorithm previously discussed. Here we demonstrate this with an interval of ten seconds. # ./iotop -o 10 2006 Feb 13 13:39:19, load: 0.38, disk_r: 74885 Kb, disk_w: 1345 Kb UID PID PPID CMD DEVICE MAJ MIN D DISKTIME 1 418 1 nfsd cmdk3 102 192 W 362 1 418 1 nfsd cmdk4 102 256 W 382 1 418 1 nfsd cmdk5 102 320 W 460 1 418 1 nfsd cmdk2 102 128 W 534 0 0 0 sched cmdk5 102 320 W 20643 0 0 0 sched cmdk3 102 192 W 25500 0 0 0 sched cmdk4 102 256 W 31024 0 0 0 sched cmdk2 102 128 W 35166 0 27732 27703 find cmdk0 102 0 R 722951 0 27733 27703 bart cmdk0 102 0 R 8858818 Note that iotop prints totals, not per second values. In this example, we read 74, 885 Mbytes from disk during those ten seconds (disk_r), with the top process bart (PID 27733) consuming 8.8 seconds of disk time. For this ten-second interval, 8.8 seconds equates to a utilization value of 88%. iotop can print %I/O utilization with the -P option; this percentage is based on 1000 ms of disk response time per second. The -C option can also be used to prevent the screen from being cleared and to instead provide a rolling output. # ./iotop -CP 1 ... 2006 Feb 13 13:40:34, load: 0.36, disk_r: 2350 Kb, disk_w: 1026 Kb UID PID PPID CMD DEVICE MAJ MIN D %I/O 0 0 0 sched cmdk0 102 0 R 0 0 3 0 fsflush cmdk0 102 0 W 1 0 27743 27742 dtrace cmdk0 102 0 R 2 0 3 0 fsflush cmdk0 102 0 R 8 0 0 0 sched cmdk0 102 0 W 14 0 27732 27703 find cmdk0 102 0 R 19 0 27733 27703 bart cmdk0 102 0 R 42 ... Figure 4.5 plots %I/O as find and bart read through /usr. This time bart causes heavier %I/O because there are bigger files to read and fewer directories for find to traverse. Figure 4.5. find and bart Read through /usrOther options for iotop can be listed with -h (this is version 0.75): # ./iotop -h USAGE: iotop [-C] [-D|-o|-P] [-j|-Z] [-d device] [-f filename] [-m mount_point] [-t top] [interval [count]] -C # don't clear the screen -D # print delta times, elapsed, us -j # print project ID -o # print disk delta times, us -P # print %I/O (disk delta times) -Z # print zone ID -d device # instance name to snoop -f filename # snoop this file only -m mount_point # this FS only -t top # print top number only eg, iotop # default output, 5 second samples iotop 1 # 1 second samples iotop -P # print %I/O (time based) iotop -m / # snoop events on filesystem / only iotop -t 20 # print top 20 lines only iotop -C 5 12 # print 12 x 5 second samples These options including printing Zone and Project details. 4.17.2. iosnoop Scriptiosnoop uses DTrace to monitor disk events in real time. The default output prints details such as PID, block address, and size. In the following example, a grep process reads several files from the /etc/default directory. # ./iosnoop UID PID D BLOCK SIZE COMM PATHNAME 0 1570 R 172636 2048 grep /etc/default/autofs 0 1570 R 102578 1024 grep /etc/default/cron 0 1570 R 102580 1024 grep /etc/default/devfsadm 0 1570 R 108310 4096 grep /etc/default/dhcpagent 0 1570 R 102582 1024 grep /etc/default/fs 0 1570 R 169070 1024 grep /etc/default/ftp 0 1570 R 108322 2048 grep /etc/default/inetinit 0 1570 R 108318 1024 grep /etc/default/ipsec 0 1570 R 102584 2048 grep /etc/default/kbd 0 1570 R 102588 1024 grep /etc/default/keyserv 0 1570 R 973440 8192 grep /etc/default/lu ... The output is printed as the disk events complete. To see a list of available options for iosnoop, use the -h option. The options include -o to print disk I/O time, using the adaptive disk-response-time algorithm previously discussed. The following is from iosnoop version 1.55. # ./iosnoop -h USAGE: iosnoop [-a|-A|-DeghiNostv] [-d device] [-f filename] [-m mount_point] [-n name] [-p PID] iosnoop # default output -a # print all data (mostly) -A # dump all data, space delimited -D # print time delta, us (elapsed) -e # print device name -g # print command arguments -i # print device instance -N # print major and minor numbers -o # print disk delta time, us -s # print start time, us -t # print completion time, us -v # print completion time, string -d device # instance name to snoop -f filename # snoop this file only -m mount_point # this FS only -n name # this process name only -p PID # this PID only eg, iosnoop -v # human readable timestamps iosnoop -N # print major and minor numbers iosnoop -m / # snoop events on filesystem / only The block addresses printed are relative to the disk slice, so what may appear to be similar block addresses may in fact be on different slices or disks. The -N option can help ensure that we are examining the same slice since it prints major and minor numbers on which we can be match. 4.17.3. Plotting Disk ActivityUsing the -t option for iosnoop prints the disk completion time in microseconds. In combination with -N, we can use this data to plot disk events for a process on one slice. Here we fetch the data for the find command, which contains the time (microseconds since boot) and block address. These are our X and Y coordinates. We check that we remain on the same slice (major and minor numbers) and then generate an X/Y plot. # ./iosnoop -tN TIME MAJ MIN UID PID D BLOCK SIZE COMM PATHNAME 1175384556358 102 0 0 27703 W 3932432 4096 ksh /root/.sh_history 1175384556572 102 0 0 27703 W 3826 512 ksh <none> 1175384565841 102 0 0 27849 R 198700 1024 find /usr/dt 1175384578103 102 0 0 27849 R 770288 3072 find /usr/dt/bin 1175384582354 102 0 0 27849 R 690320 8192 find <none> 1175384582817 102 0 0 27849 R 690336 8192 find <none> 1175384586787 102 0 0 27849 R 777984 2048 find /usr/dt/lib 1175384594313 102 0 0 27849 R 733880 1024 find /usr/dt/lib/amd64 ... We ran a find / command to generate random disk activity; the results are shown in Figure 4.6. As the disk heads seek to different block addresses, the position of the heads is plotted in red. Figure 4.6. Plotting Disk Activity, a Random I/O ExampleAre we really looking at disk head seek patterns? Not exactly. What we are looking at are block addresses for biodone functions from the block I/O driver. We aren't using some X-ray vision to look at the heads themselves. Now, if this is a simple disk device, then the block address probably relates to the disk head location.[12] But if this is a virtual device, say, a storage array, then block addresses could map to anything, depending on the storage layout. However, we could at least say that a large jump in block address probably means a seek at some point (although storage arrays will cache).
The block addresses do help us visualize the pattern of completed disk activity. But remember that "completed" means the block I/O driver thinks that the I/O event completed. 4.17.4. Plotting Concurrent ActivityPreviously, we discussed concurrent disk activity and included a plot (Figure 4.2) to help us understand how these events may occur. Since DTrace can easily trace concurrent disk activity, we can include a plot of actual activity. The following DTrace script provides input for a spreadsheet. We match on a device by its major and minor numbers, then print timestamps as the first column and block addresses for strategy and biodone events in the remaining columns. #!/usr/sbin/dtrace -s #pragma D option quiet io:genunix::start /args[1]->dev_major == 102 && args[1]->dev_minor == 0/ { printf("%d,%d,\n", timestamp/1000, args[0]->b_blkno); } io:genunix::done /args[1]->dev_major == 102 && args[1]->dev_minor == 0/ { printf("%d,,%d\n", timestamp/1000, args[0]->b_blkno); } The output of the DTrace script was plotted as Figure 4.7, with timestamps as X-coordinates. Figure 4.7. Plotting Raw Driver Events: Strategy and BiodoneInitially, we see many quick strategies between 0 and 200 µs, ending in almost a vertical line. This is then followed by slower biodones as the disk catches up at mechanical speeds. |