Section 4.17. DTraceToolkit Commands

4.17. DTraceToolkit Commands

The DTraceToolkit is a free collection of DTrace-based tools, some of which analyze disk behavior. We previously demonstrated cut-down versions of two of its scripts, bitesize.d and seeksize.d. Two of the most popular are iotop and iosnoop.

4.17.1. `iotop` Script

iotop uses DTrace to print disk I/O summaries by process, for details such as size (bytes) and disk I/O times. The following demonstrates the default output of iotop, which prints size summaries and refreshes the screen every five seconds.

# ./iotop 2006 Feb 13 13:38:21,  load: 0.35,  disk_r:  56615 Kb,  disk_w:    637 Kb   UID    PID   PPID CMD              DEVICE  MAJ MIN D            BYTES     0  27732  27703 find             cmdk0   102   0 R            38912     0      0      0 sched            cmdk5   102 320 W           150016     0      0      0 sched            cmdk2   102 128 W           167424     0      0      0 sched            cmdk3   102 192 W           167424     0      0      0 sched            cmdk4   102 256 W           167424     0  27733  27703 bart             cmdk0   102   0 R         57897984 ...

In the above output, the bart process read approximately 57 Mbytes from disk. Disk I/O time summaries can also be printed with -o, which uses the adaptive disk-response-time algorithm previously discussed. Here we demonstrate this with an interval of ten seconds.

# ./iotop -o 10 2006 Feb 13 13:39:19,  load: 0.38,  disk_r:  74885 Kb,  disk_w:   1345 Kb   UID    PID   PPID CMD              DEVICE  MAJ MIN D         DISKTIME     1    418      1 nfsd             cmdk3   102 192 W              362     1    418      1 nfsd             cmdk4   102 256 W              382     1    418      1 nfsd             cmdk5   102 320 W              460     1    418      1 nfsd             cmdk2   102 128 W              534     0      0      0 sched            cmdk5   102 320 W            20643     0      0      0 sched            cmdk3   102 192 W            25500     0      0      0 sched            cmdk4   102 256 W            31024     0      0      0 sched            cmdk2   102 128 W            35166     0  27732  27703 find             cmdk0   102   0 R           722951     0  27733  27703 bart             cmdk0   102   0 R          8858818

Note that iotop prints totals, not per second values. In this example, we read 74, 885 Mbytes from disk during those ten seconds (disk_r), with the top process bart (PID 27733) consuming 8.8 seconds of disk time. For this ten-second interval, 8.8 seconds equates to a utilization value of 88%.

iotop can print %I/O utilization with the -P option; this percentage is based on 1000 ms of disk response time per second. The -C option can also be used to prevent the screen from being cleared and to instead provide a rolling output.

# ./iotop -CP 1 ... 2006 Feb 13 13:40:34,  load: 0.36,  disk_r:   2350 Kb,  disk_w:   1026 Kb   UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O     0      0      0 sched            cmdk0   102   0 R      0     0      3      0 fsflush          cmdk0   102   0 W      1     0  27743  27742 dtrace           cmdk0   102   0 R      2     0      3      0 fsflush          cmdk0   102   0 R      8     0      0      0 sched            cmdk0   102   0 W     14     0  27732  27703 find             cmdk0   102   0 R     19     0  27733  27703 bart             cmdk0   102   0 R     42 ...

Figure 4.5 plots %I/O as find and bart read through /usr. This time bart causes heavier %I/O because there are bigger files to read and fewer directories for find to traverse.

Figure 4.5. `find` and `bart` Read through `/usr`

Other options for iotop can be listed with -h (this is version 0.75):

# ./iotop -h USAGE: iotop [-C] [-D|-o|-P] [-j|-Z] [-d device] [-f filename]              [-m mount_point] [-t top] [interval [count]]                 -C      # don't clear the screen                 -D      # print delta times, elapsed, us                 -j      # print project ID                 -o      # print disk delta times, us                 -P      # print %I/O (disk delta times)                 -Z      # print zone ID                 -d device       # instance name to snoop                 -f filename     # snoop this file only                 -m mount_point  # this FS only                 -t top          # print top number only    eg,         iotop         # default output, 5 second samples         iotop 1       # 1 second samples         iotop -P      # print %I/O (time based)         iotop -m /    # snoop events on filesystem / only         iotop -t 20   # print top 20 lines only         iotop -C 5 12 # print 12 x 5 second samples

These options including printing Zone and Project details.

4.17.2. `iosnoop` Script

iosnoop uses DTrace to monitor disk events in real time. The default output prints details such as PID, block address, and size. In the following example, a grep process reads several files from the /etc/default directory.

# ./iosnoop   UID   PID D    BLOCK   SIZE       COMM PATHNAME     0  1570 R   172636   2048       grep /etc/default/autofs     0  1570 R   102578   1024       grep /etc/default/cron     0  1570 R   102580   1024       grep /etc/default/devfsadm     0  1570 R   108310   4096       grep /etc/default/dhcpagent     0  1570 R   102582   1024       grep /etc/default/fs     0  1570 R   169070   1024       grep /etc/default/ftp     0  1570 R   108322   2048       grep /etc/default/inetinit     0  1570 R   108318   1024       grep /etc/default/ipsec     0  1570 R   102584   2048       grep /etc/default/kbd     0  1570 R   102588   1024       grep /etc/default/keyserv     0  1570 R   973440   8192       grep /etc/default/lu ...

The output is printed as the disk events complete.

To see a list of available options for iosnoop, use the -h option. The options include -o to print disk I/O time, using the adaptive disk-response-time algorithm previously discussed. The following is from iosnoop version 1.55.

# ./iosnoop -h USAGE: iosnoop [-a|-A|-DeghiNostv] [-d device] [-f filename]                [-m mount_point] [-n name] [-p PID]        iosnoop          # default output                 -a      # print all data (mostly)                 -A      # dump all data, space delimited                 -D      # print time delta, us (elapsed)                 -e      # print device name                 -g      # print command arguments                 -i      # print device instance                 -N      # print major and minor numbers                 -o      # print disk delta time, us                 -s      # print start time, us                 -t      # print completion time, us                 -v      # print completion time, string                 -d device       # instance name to snoop                 -f filename     # snoop this file only                 -m mount_point  # this FS only                 -n name         # this process name only                 -p PID          # this PID only    eg,         iosnoop -v      # human readable timestamps         iosnoop -N      # print major and minor numbers         iosnoop -m /    # snoop events on filesystem / only

The block addresses printed are relative to the disk slice, so what may appear to be similar block addresses may in fact be on different slices or disks. The -N option can help ensure that we are examining the same slice since it prints major and minor numbers on which we can be match.

4.17.3. Plotting Disk Activity

Using the -t option for iosnoop prints the disk completion time in microseconds. In combination with -N, we can use this data to plot disk events for a process on one slice. Here we fetch the data for the find command, which contains the time (microseconds since boot) and block address. These are our X and Y coordinates. We check that we remain on the same slice (major and minor numbers) and then generate an X/Y plot.

# ./iosnoop -tN TIME           MAJ MIN   UID   PID D    BLOCK   SIZE       COMM PATHNAME 1175384556358  102   0     0 27703 W  3932432   4096        ksh /root/.sh_history 1175384556572  102   0     0 27703 W     3826    512        ksh <none> 1175384565841  102   0     0 27849 R   198700   1024       find /usr/dt 1175384578103  102   0     0 27849 R   770288   3072       find /usr/dt/bin 1175384582354  102   0     0 27849 R   690320   8192       find <none> 1175384582817  102   0     0 27849 R   690336   8192       find <none> 1175384586787  102   0     0 27849 R   777984   2048       find /usr/dt/lib 1175384594313  102   0     0 27849 R   733880   1024       find /usr/dt/lib/amd64 ...

We ran a find / command to generate random disk activity; the results are shown in Figure 4.6. As the disk heads seek to different block addresses, the position of the heads is plotted in red.

Figure 4.6. Plotting Disk Activity, a Random I/O Example

Are we really looking at disk head seek patterns? Not exactly. What we are looking at are block addresses for biodone functions from the block I/O driver. We aren't using some X-ray vision to look at the heads themselves.

Now, if this is a simple disk device, then the block address probably relates to the disk head location.^[12] But if this is a virtual device, say, a storage array, then block addresses could map to anything, depending on the storage layout. However, we could at least say that a large jump in block address probably means a seek at some point (although storage arrays will cache).

^[12] Even "simple" disks these days map addresses in firmware to an internal optimized layout; all we know is the image of the disk that its firmware presents. The classic example here is sector zoning, as discussed in Section 4.4.

The block addresses do help us visualize the pattern of completed disk activity. But remember that "completed" means the block I/O driver thinks that the I/O event completed.

4.17.4. Plotting Concurrent Activity

Previously, we discussed concurrent disk activity and included a plot (Figure 4.2) to help us understand how these events may occur. Since DTrace can easily trace concurrent disk activity, we can include a plot of actual activity. The following DTrace script provides input for a spreadsheet. We match on a device by its major and minor numbers, then print timestamps as the first column and block addresses for strategy and biodone events in the remaining columns.

#!/usr/sbin/dtrace -s #pragma D option quiet io:genunix::start /args[1]->dev_major == 102 && args[1]->dev_minor == 0/ {         printf("%d,%d,\n", timestamp/1000, args[0]->b_blkno); } io:genunix::done /args[1]->dev_major == 102 && args[1]->dev_minor == 0/ {         printf("%d,,%d\n", timestamp/1000, args[0]->b_blkno); }

The output of the DTrace script was plotted as Figure 4.7, with timestamps as X-coordinates.

Figure 4.7. Plotting Raw Driver Events: Strategy and Biodone

Initially, we see many quick strategies between 0 and 200 µs, ending in almost a vertical line. This is then followed by slower biodones as the disk catches up at mechanical speeds.