11.5. Adding Statistics to the Solaris Kernel

The kstat mechanism provides lightweight statistics that are a stable part of kernel code. The kstat interface can provide standard information that would be reported from a user-level tool. For example, if you wanted to add your own device driver I/O statistics into the statistics pool reported by the iostat command, you would add a kstat provider.

The statistics reported by vmstat, iostat, and most of the other Solaris tools are gathered by a central kernel statistics subsystem, known as "kstat." The kstat facility is an all-purpose interface for collecting and reporting named and typed data.

A typical scenario will have a kstat producer and a kstat reader. The kstat reader is a utility in user mode that reads, potentially aggregates, and then reports the results. For example, the vmstat utility is a kstat reader that aggregates statistics provided by the vm system in the kernel.

Statistics are named and accessed by a four-tuple: class, module, name, instance. Solaris 8 introduced a new method to access kstat information from the command line or in custom-written scripts. You can use the command-line tool /usr/bin/kstat interactively to print all or selected kstat information from a system. This program is written in the Perl language, and you can use the Perl XS extension module to write your own custom Perl programs. Both facilities are documented in the pages of the Perl online manual.

11.5.1. A kstat Provider Walkthrough

To add your own statistics to the Solaris kernel, you need to create a kstat provider, which consists of an initialization function that creates the statistics group and a callback function that updates the statistics before they are read. The callback function is often used to aggregate or summarize information before it is reported to the reader. The kstat provider interface is defined in kstat(3KSTAT) and kstat(9S). More verbose information can be found in usr/src/uts/common/sys/kstat.h.

The first step is to decide on the type of information you want to export. The primary types are RAW, NAMED, and IO. The RAW interface exports raw C data structures to userland; its use is strongly discouraged, since a change in the C structure will cause incompatibilities in the reader. The NAMED and IO mechanisms are preferred since their data is typed and extensible.

The NAMED type provides single or multiple records of data and is the most common choice. The IO record provides I/O statistics only. It is collected and reported by the iostat command and therefore should be used only for items that can be viewed and reported as I/O devices (we do this currently for I/O devices and NFS file systems).
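To make the difference between RAW and NAMED concrete, here is a minimal userland sketch. It does not use the real kstat headers: demo_named_t, the DEMO_DATA_* tags, and demo_lookup() are hypothetical stand-ins, loosely modeled on the name/type/value shape of a named record. The point it tries to show is that a reader locates a value by name, so adding or reordering records in the provider does not break the reader.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags; the real interface has its own KSTAT_DATA_* set. */
enum demo_type { DEMO_DATA_UINT32, DEMO_DATA_UINT64 };

/* Hypothetical named record: a name, a type tag, and a typed value. */
typedef struct demo_named {
	char		name[31];
	enum demo_type	type;
	union {
		uint32_t	ui32;
		uint64_t	ui64;
	} value;
} demo_named_t;

/* Find a record by name; returns NULL if the provider does not export it. */
const demo_named_t *
demo_lookup(const demo_named_t *recs, size_t nrecs, const char *name)
{
	size_t i;

	for (i = 0; i < nrecs; i++) {
		if (strcmp(recs[i].name, name) == 0)
			return (&recs[i]);
	}
	return (NULL);
}
```

A reader written this way checks the returned pointer and the type tag before touching the value, which is the property a RAW structure cannot offer.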

A simple example of NAMED statistics is the virtual memory summaries provided by system_pages.

$ kstat -n system_pages
module: unix                            instance: 0
name:   system_pages                    class:    pages
        availrmem                       343567
        crtime                          0
        desfree                         4001
        desscan                         25
        econtig                         4278190080
        fastscan                        256068
        freemem                         248309
        kernelbase                      3556769792
        lotsfree                        8002
        minfree                         2000
        nalloc                          11957763
        nalloc_calls                    9981
        nfree                           11856636
        nfree_calls                     6689
        nscan                           0
        pagesfree                       248309
        pageslocked                     168569
        pagestotal                      512136
        physmem                         522272
        pp_kernel                       64102
        slowscan                        100
        snaptime                        6573953.83957897


These are first declared and initialized by the following C structs in usr/src/uts/common/os/kstat_fr.c.

struct {
        kstat_named_t physmem;
        kstat_named_t nalloc;
        kstat_named_t nfree;
        kstat_named_t nalloc_calls;
        kstat_named_t nfree_calls;
        kstat_named_t kernelbase;
        kstat_named_t econtig;
        kstat_named_t freemem;
        kstat_named_t availrmem;
        kstat_named_t lotsfree;
        kstat_named_t desfree;
        kstat_named_t minfree;
        kstat_named_t fastscan;
        kstat_named_t slowscan;
        kstat_named_t nscan;
        kstat_named_t desscan;
        kstat_named_t pp_kernel;
        kstat_named_t pagesfree;
        kstat_named_t pageslocked;
        kstat_named_t pagestotal;
} system_pages_kstat = {
        { "physmem",             KSTAT_DATA_ULONG },
        { "nalloc",              KSTAT_DATA_ULONG },
        { "nfree",               KSTAT_DATA_ULONG },
        { "nalloc_calls",        KSTAT_DATA_ULONG },
        { "nfree_calls",         KSTAT_DATA_ULONG },
        { "kernelbase",          KSTAT_DATA_ULONG },
        { "econtig",             KSTAT_DATA_ULONG },
        { "freemem",             KSTAT_DATA_ULONG },
        { "availrmem",           KSTAT_DATA_ULONG },
        { "lotsfree",            KSTAT_DATA_ULONG },
        { "desfree",             KSTAT_DATA_ULONG },
        { "minfree",             KSTAT_DATA_ULONG },
        { "fastscan",            KSTAT_DATA_ULONG },
        { "slowscan",            KSTAT_DATA_ULONG },
        { "nscan",               KSTAT_DATA_ULONG },
        { "desscan",             KSTAT_DATA_ULONG },
        { "pp_kernel",           KSTAT_DATA_ULONG },
        { "pagesfree",           KSTAT_DATA_ULONG },
        { "pageslocked",         KSTAT_DATA_ULONG },
        { "pagestotal",          KSTAT_DATA_ULONG },
};


These statistics are the simplest type, merely a basic list of unsigned long variables (64 bits wide on a 64-bit kernel). Once declared, the kstats are registered with the subsystem.

static int system_pages_kstat_update(kstat_t *, int);
...
        kstat_t *ksp;

        ksp = kstat_create("unix", 0, "system_pages", "pages", KSTAT_TYPE_NAMED,
                sizeof (system_pages_kstat) / sizeof (kstat_named_t),
                KSTAT_FLAG_VIRTUAL);
        if (ksp) {
                ksp->ks_data = (void *) &system_pages_kstat;
                ksp->ks_update = system_pages_kstat_update;
                kstat_install(ksp);
        }
...


The kstat_create() function takes the four-tuple description (passed in the order module, instance, name, class), the kstat type, the number of records, and a set of flags, and returns a handle to the created kstat. The handle is then updated to include a pointer to the data and a callback function, which is invoked when the user reads the statistics.
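The record count passed in the registration call above is derived from the layout of the statistics struct rather than counted by hand. The following userland sketch illustrates that idiom only; demo_named_t, demo_pages_stats, DEMO_NRECORDS, and demo_record() are hypothetical names, and the sketch assumes the group struct contains nothing but records of one type, so that it can be viewed as an array.

```c
#include <stddef.h>

/* Hypothetical stand-in for kstat_named_t: just a name and a value. */
typedef struct demo_named {
	const char	*name;
	unsigned long	value;
} demo_named_t;

/* A statistics group laid out as a struct of records, as in the text. */
struct demo_pages {
	demo_named_t	physmem;
	demo_named_t	freemem;
	demo_named_t	availrmem;
} demo_pages_stats = {
	{ "physmem",	0 },
	{ "freemem",	0 },
	{ "availrmem",	0 },
};

/* Record count derived from the layout, so new members are picked up. */
#define	DEMO_NRECORDS	(sizeof (demo_pages_stats) / sizeof (demo_named_t))

/*
 * View the group as an array of records. This relies on the assumption
 * that struct demo_pages holds only demo_named_t members, back to back.
 */
const demo_named_t *
demo_record(size_t i)
{
	if (i >= DEMO_NRECORDS)
		return (NULL);
	return (&((const demo_named_t *)&demo_pages_stats)[i]);
}
```

Adding a fourth member and initializer to the struct changes DEMO_NRECORDS automatically, which is the reason for writing the count as a sizeof ratio.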

When invoked, the callback function has the task of updating the data structure pointed to by ks_data. If you choose not to update, simply set the callback function to default_kstat_update(). The system pages kstat preamble looks like this:

static int
system_pages_kstat_update(kstat_t *ksp, int rw)
{
        if (rw == KSTAT_WRITE) {
                return (EACCES);
        }


This basic preamble checks to see if the user code is trying to read or write the structure. (Yes, it's possible to write to some statistics if the provider allows it.) Once basic checks are done, the update callback simply stores the statistics into the predefined data structure, and then returns.

...
        system_pages_kstat.freemem.value.ul     = (ulong_t)freemem;
        system_pages_kstat.availrmem.value.ul   = (ulong_t)availrmem;
        system_pages_kstat.lotsfree.value.ul    = (ulong_t)lotsfree;
        system_pages_kstat.desfree.value.ul     = (ulong_t)desfree;
        system_pages_kstat.minfree.value.ul     = (ulong_t)minfree;
        system_pages_kstat.fastscan.value.ul    = (ulong_t)fastscan;
        system_pages_kstat.slowscan.value.ul    = (ulong_t)slowscan;
        system_pages_kstat.nscan.value.ul       = (ulong_t)nscan;
        system_pages_kstat.desscan.value.ul     = (ulong_t)desscan;
        system_pages_kstat.pagesfree.value.ul   = (ulong_t)freemem;
...
        return (0);
}
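The read-time refresh pattern can be tried outside the kernel with a small mock. Everything here is a hypothetical stand-in: demo_kstat_t imitates only the ks_data and ks_update members of a kstat handle, DEMO_READ and DEMO_WRITE play the role of the read/write flag, and demo_read() is a guess at what a framework does on a read (call the update routine, then copy out the snapshot). It is a sketch of the idea, not the kernel's implementation.

```c
#include <errno.h>
#include <stddef.h>

#define	DEMO_READ	0	/* hypothetical stand-ins for the rw flag */
#define	DEMO_WRITE	1

/* Hypothetical handle: a data pointer and an update callback. */
typedef struct demo_kstat {
	void	*ks_data;
	int	(*ks_update)(struct demo_kstat *, int);
} demo_kstat_t;

/* Live counter maintained elsewhere by the "subsystem". */
unsigned long demo_freemem = 0;

/* Snapshot structure that readers are given. */
struct demo_pages {
	unsigned long	freemem;
} demo_pages_stats;

/* Update callback: refuse writes, otherwise refresh the snapshot. */
int
demo_pages_update(demo_kstat_t *ksp, int rw)
{
	struct demo_pages *sp = ksp->ks_data;

	if (rw == DEMO_WRITE)
		return (EACCES);
	sp->freemem = demo_freemem;
	return (0);
}

/* Mock read path: run the callback first, then copy the snapshot out. */
int
demo_read(demo_kstat_t *ksp, struct demo_pages *out)
{
	int err = ksp->ks_update(ksp, DEMO_READ);

	if (err != 0)
		return (err);
	*out = *(struct demo_pages *)ksp->ks_data;
	return (0);
}
```

Because the snapshot is refreshed only when a reader asks, the subsystem pays nothing for the statistic between reads.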


That's it for a basic named kstat.

11.5.2. I/O Statistics

In this section, we can see an example of how I/O stats are measured and recorded. As discussed in Section 11.1.3.5, there is a special type of kstat for I/O statistics.

I/O devices are measured as a queue, using a Riemann sum, which is a count of the visits to the queue and a sum of the "active" time. These two metrics can be used to determine the average service time and I/O counts for the device. There are typically two queues for each device, the wait queue and the active queue. These represent the time spent after the request has been accepted and enqueued, and then the time spent active on the device.
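As a worked illustration of the Riemann sum idea, the sketch below integrates queue length over time from a list of arrival and departure events. The types demo_event_t and demo_sums_t and the function demo_riemann() are hypothetical, the timestamps are in arbitrary units, and events are assumed to be sorted by time.

```c
#include <stddef.h>
#include <stdint.h>

/* One change in queue occupancy: +1 on arrival, -1 on departure. */
typedef struct demo_event {
	uint64_t	time;	/* timestamp, nondecreasing */
	int		delta;
} demo_event_t;

typedef struct demo_sums {
	uint64_t	lentime;	/* sum of (queue length x interval) */
	uint64_t	busytime;	/* total time with a nonempty queue */
	uint64_t	visits;		/* number of arrivals */
} demo_sums_t;

/* Accumulate the sums over a sorted event list. */
demo_sums_t
demo_riemann(const demo_event_t *ev, size_t n)
{
	demo_sums_t s = { 0, 0, 0 };
	uint64_t last = (n > 0) ? ev[0].time : 0;
	uint64_t len = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t dt = ev[i].time - last;

		/* Charge the elapsed interval to the length that held. */
		if (len != 0) {
			s.lentime += dt * len;
			s.busytime += dt;
		}
		last = ev[i].time;
		if (ev[i].delta > 0) {
			len++;
			s.visits++;
		} else if (len > 0) {
			len--;
		}
	}
	return (s);
}
```

With arrivals at times 0 and 10 and departures at 30 and 40, the length-time sum is 10 x 1 + 20 x 2 + 10 x 1 = 60 over 2 visits, so the average time per visit is 30 units and the queue was nonempty for 40 units.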

An I/O device driver has a similar declare and create section, as we saw with the NAMED statistics. For instance, the floppy disk device driver (usr/src/uts/sun/io/fd.c) shows kstat_create() in the device driver attach function.

static int
fd_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
...
        fdc->c_un->un_iostat = kstat_create("fd", 0, "fd0", "disk",
            KSTAT_TYPE_IO, 1, KSTAT_FLAG_PERSISTENT);
        if (fdc->c_un->un_iostat) {
                fdc->c_un->un_iostat->ks_lock = &fdc->c_lolock;
                kstat_install(fdc->c_un->un_iostat);
        }
...
}


The per-I/O statistics are first updated in the device driver strategy function, the location where the I/O is first received and queued. At this point, the I/O is marked as waiting on the wait queue.

#define KIOSP    KSTAT_IO_PTR(un->un_iostat)

static int
fd_strategy(register struct buf *bp)
{
        struct fdctlr *fdc;
        struct fdunit *un;

        fdc = fd_getctlr(bp->b_edev);
        un = fdc->c_un;
...
        /* Mark I/O as waiting on wait q */
        if (un->un_iostat) {
                kstat_waitq_enter(KIOSP);
        }
...
}


The I/O spends some time on the wait queue until the device is able to process the request. For each I/O the fdstart() routine moves the I/O from the wait queue to the run queue with the kstat_waitq_to_runq() function.

static void
fdstart(struct fdctlr *fdc)
{
...
                /* Mark I/O as active, move from wait to active q */
                if (un->un_iostat) {
                        kstat_waitq_to_runq(KIOSP);
                }
...
                /* Do I/O... */
...


When the I/O is complete (still in the fdstart() function), it is marked with kstat_runq_exit() as leaving the active queue. This updates the last part of the statistic, leaving us with the number of I/Os and the total time spent on each queue.

                /* Mark I/O as complete */
                if (un->un_iostat) {
                        if (bp->b_flags & B_READ) {
                                KIOSP->reads++;
                                KIOSP->nread +=
                                        (bp->b_bcount - bp->b_resid);
                        } else {
                                KIOSP->writes++;
                                KIOSP->nwritten += (bp->b_bcount - bp->b_resid);
                        }
                        kstat_runq_exit(KIOSP);
                }
                biodone(bp);
...
}
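The life cycle of one request through the two queues can be mimicked in userland. In this sketch the demo_* functions are hypothetical mocks that take the current time as an explicit argument so the arithmetic is easy to follow; the field names follow the wait/run naming used above, but the structure is an assumption and not the kernel's definition.

```c
#include <stdint.h>

/* Hypothetical per-device accounting for a wait queue and a run queue. */
typedef struct demo_io {
	uint64_t	wlentime, wtime, wlastupdate;
	uint64_t	rlentime, rtime, rlastupdate;
	uint32_t	wcnt, rcnt;
} demo_io_t;

/* Charge the interval since the last change, then adjust the count. */
static void
demo_q_update(uint64_t now, uint64_t *lentime, uint64_t *qtime,
    uint64_t *last, uint32_t *cnt, int entering)
{
	uint64_t delta = now - *last;

	*last = now;
	if (*cnt != 0) {
		*lentime += delta * *cnt;
		*qtime += delta;
	}
	if (entering)
		(*cnt)++;
	else if (*cnt > 0)
		(*cnt)--;
}

/* Request accepted: it starts waiting. */
void
demo_waitq_enter(demo_io_t *io, uint64_t now)
{
	demo_q_update(now, &io->wlentime, &io->wtime, &io->wlastupdate,
	    &io->wcnt, 1);
}

/* Device picks the request up: leave wait queue, enter run queue. */
void
demo_waitq_to_runq(demo_io_t *io, uint64_t now)
{
	demo_q_update(now, &io->wlentime, &io->wtime, &io->wlastupdate,
	    &io->wcnt, 0);
	demo_q_update(now, &io->rlentime, &io->rtime, &io->rlastupdate,
	    &io->rcnt, 1);
}

/* Request completes: leave run queue. */
void
demo_runq_exit(demo_io_t *io, uint64_t now)
{
	demo_q_update(now, &io->rlentime, &io->rtime, &io->rlastupdate,
	    &io->rcnt, 0);
}
```

For a single request that arrives at time 0, starts at time 5, and completes at time 15, the mock accumulates 5 units on the wait queue and 10 units on the run queue, and both counts return to zero.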


These statistics provide us with our familiar metrics, where actv is the average length of the queue of active I/Os and asvc_t is the average service time in the device. The wait queue is represented accordingly with wait and wsvc_t.

$ iostat -xn 10
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2    0.1    9.2    1.1  0.1  0.5    0.1   10.4   1   1 fd0
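To show how columns like actv, asvc_t, and %b could be derived from the accumulated sums, here is a sketch that differences two snapshots of the run-queue counters. The demo_snap_t and demo_metrics_t types and demo_metrics() are hypothetical, the time fields are assumed to be in nanoseconds, and the formulas are the usual queueing derivation rather than a copy of what iostat actually does.

```c
#include <stdint.h>

/* Hypothetical snapshot of run-queue counters at one instant. */
typedef struct demo_snap {
	uint64_t	snaptime;	/* when the snapshot was taken, ns */
	uint64_t	rlentime;	/* sum of (active length x time), ns */
	uint64_t	rtime;		/* time with active I/O, ns */
	uint64_t	ios;		/* completed reads + writes */
} demo_snap_t;

typedef struct demo_metrics {
	double	actv;		/* average number of active I/Os */
	double	asvc_ms;	/* average service time, milliseconds */
	double	pct_busy;	/* percent of interval with active I/O */
} demo_metrics_t;

/* Derive interval metrics from an older (a) and newer (b) snapshot. */
demo_metrics_t
demo_metrics(const demo_snap_t *a, const demo_snap_t *b)
{
	demo_metrics_t m = { 0.0, 0.0, 0.0 };
	double elapsed = (double)(b->snaptime - a->snaptime);
	double lentime = (double)(b->rlentime - a->rlentime);
	double busy = (double)(b->rtime - a->rtime);
	double ios = (double)(b->ios - a->ios);

	if (elapsed > 0.0) {
		m.actv = lentime / elapsed;
		m.pct_busy = 100.0 * busy / elapsed;
	}
	if (ios > 0.0)
		m.asvc_ms = lentime / ios / 1.0e6;
	return (m);
}
```

The same arithmetic applied to the wait-queue sums would give the wait, wsvc_t, and %w columns.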





Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris
ISBN: 0131568191
Year: 2007
Pages: 180