Section 4.12. iostat Internals | Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris

4.12. `iostat` Internals

iostat is a consumer of kstat (the kernel statistics facility, Chapter 11) and prints statistics for KSTAT_TYPE_IO devices. We can use the kstat(1M) command to see the data that iostat is using.

    $ kstat -n dad1
    module: dad                             instance: 1
    name:   dad1                            class:    disk
            crtime                          1.718803613
            nread                           5172183552
            nwritten                        1427398144
            rcnt                            0
            reads                           509751
            rlastupdate                     1006817.75420951
            rlentime                        4727.596773858
            rtime                           3551.281310393
            snaptime                        1006817.75420951
            wcnt                            0
            wlastupdate                     1006817.75420951
            wlentime                        3681.523121192
            writes                          207061
            wtime                           492.453167341

    $ kstat -n dad1,error
    module: daderror                        instance: 1
    name:   dad1,error                      class:    device_error
            No Device                       0
            Device Not Ready                0
            Hard Errors                     0
            Illegal Request                 0
            Media Error                     0
            Model                           ST38420A        Revision
            Recoverable                     0
            Revision                        3.05
            Serial No                       7AZ04J9S        Size
            Size                            8622415872
            Soft Errors                     0
            Transport Errors                0
            crtime                          1.718974829
            snaptime                        1006852.93847071

This shows a kstat object named dad1, which is of type kstat_io_t and is well documented in sys/kstat.h. The dad1,error object is a regular kstat object.

A sample of the kstat_io structure is below.

    typedef struct kstat_io {
    ...
            hrtime_t wtime;         /* cumulative wait (pre-service) time */
            hrtime_t wlentime;      /* cumulative wait length*time product */
            hrtime_t wlastupdate;   /* last time wait queue changed */
            hrtime_t rtime;         /* cumulative run (service) time */
            hrtime_t rlentime;      /* cumulative run length*time product */
            hrtime_t rlastupdate;   /* last time run queue changed */

See sys/kstat.h

Since kstat has already provided meaningful data, it is fairly easy for iostat to sample it, run some interval calculations, and then print it. As a demonstration of what iostat really does, the following is the code for calculating %b.

    /* % of time there is a transaction running */
    t_delta = hrtime_delta(old ? old->is_stats.rtime : 0,
        new->is_stats.rtime);
    if (t_delta) {
            r_pct = (double)t_delta;
            r_pct /= hr_etime;
            r_pct *= 100.0;

See ...cmd/stat/iostat.c

The key statistic, is_stats.rtime, is from the kstat_io struct and is described as "cumulative run (service) time." Since this is a cumulative counter, the old value of is_stats.rtime is subtracted from the new to calculate the run time accumulated since the last sample (t_delta). This is then divided by hr_etime (the total elapsed time since the last sample) and multiplied by 100 to form a percentage.

This approach could be described as saying that 1000 ms of service time is available every second. This provides a convenient, known upper limit that can be used for percentage calculations. If 200 ms of service time was consumed in one second, then the disk is 20% busy. Consider basing the busy calculation on Kbytes/sec instead: the upper limit would vary according to whether the activity is random or sequential, and determining it would be quite challenging.
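The interval calculation described above can be sketched as a small standalone function. This is only an illustration, not the iostat source: the type hr_ns_t and the function percent_busy() are hypothetical names, with hr_ns_t standing in for hrtime_t as a nanosecond counter, and the guard against non-positive values is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for hrtime_t: a time value in nanoseconds. */
typedef int64_t hr_ns_t;

/*
 * Percent busy over one sampling interval (illustrative sketch).
 *
 * old_rtime, new_rtime: cumulative run (service) time at the previous
 *                       and current samples.
 * elapsed:              wall-clock time between the two samples, the
 *                       role played by hr_etime in the text above.
 */
double
percent_busy(hr_ns_t old_rtime, hr_ns_t new_rtime, hr_ns_t elapsed)
{
    /* Run time accumulated during this interval only. */
    hr_ns_t t_delta = new_rtime - old_rtime;

    /* Assumption of this sketch: treat bad input as "not busy". */
    if (t_delta <= 0 || elapsed <= 0)
        return (0.0);

    return ((double)t_delta / (double)elapsed * 100.0);
}
```

With 200 ms of accumulated run time in a one-second interval, percent_busy(0, 200000000, 1000000000) yields approximately 20, matching the example above. Because only the ratio matters, the same arithmetic applied to the dad1 sample shown earlier (rtime of about 3551 seconds against a snaptime of about 1006818 seconds) gives roughly 0.35% busy since boot.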

The calculation of wait in the iostat.c source looks identical, this time using is_stats.wlentime. kstat.h describes this as "cumulative wait length*time product" and discusses when it is updated.

     * At each change of state (entry or exit from the queue),
     * we add the elapsed time (since the previous state change)
     * to the active time if the queue length was non-zero during
     * that interval; and we add the product of the elapsed time
     * times the queue length to the running length*time sum.
    ...

See kstat.h

This method, known as a "Riemann sum," allows us to calculate a proportionally accurate average wait queue length, based on the length of time at each queue length.
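The accounting described in the kstat.h comment can be sketched as follows. This is a simplified illustration rather than the kernel code: struct waitq_stats, waitq_change(), and avg_queue_len() are hypothetical names, the fields merely mirror the wait-queue members of kstat_io, and times are plain integers in arbitrary units.

```c
#include <stdint.h>

/* Hypothetical stand-in for hrtime_t. */
typedef int64_t hr_ns_t;

/* Simplified, hypothetical mirror of the wait-queue fields of kstat_io. */
struct waitq_stats {
    hr_ns_t wtime;        /* cumulative time the queue was non-empty */
    hr_ns_t wlentime;     /* cumulative (queue length * time) product */
    hr_ns_t wlastupdate;  /* time of the last queue state change */
    int     wcnt;         /* current queue length */
};

/*
 * Record a state change at time 'now': delta is +1 for an entry to the
 * queue and -1 for an exit. The interval since the previous change is
 * charged at the queue length that was in effect during that interval.
 */
void
waitq_change(struct waitq_stats *s, hr_ns_t now, int delta)
{
    hr_ns_t elapsed = now - s->wlastupdate;

    if (s->wcnt != 0) {
        s->wtime += elapsed;               /* active time */
        s->wlentime += elapsed * s->wcnt;  /* length*time sum */
    }
    s->wlastupdate = now;
    s->wcnt += delta;
}

/* Average queue length between two samples of wlentime. */
double
avg_queue_len(hr_ns_t old_wlentime, hr_ns_t new_wlentime, hr_ns_t elapsed)
{
    if (elapsed <= 0)
        return (0.0);
    return ((double)(new_wlentime - old_wlentime) / (double)elapsed);
}
```

As an example, suppose requests enter the queue at times 0 and 400 and leave at times 600 and 800. The length*time sum is 400*1 + 200*2 + 200*1 = 1000, so over an interval of 1000 time units the average queue length is 1.0, while the active time is 800 units (80% of the interval).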

The comment from kstat.h also sheds light on how percent busy is calculated: At each change of disk state the elapsed time is added to the active time if there was activity. This sum of active time is the rtime used earlier.

For more information on these statistics and kstat, see Section 11.5.2.