11.1. C-Level Kstat Interface

The Solaris libkstat library contains the C-language functions for accessing kstats from an application. These functions utilize the pseudo-device /dev/kstat to provide a secure interface to kernel data, obviating the need for programs that are setuid to root.

Since many developers are interested in accessing kernel statistics through C programs, this chapter focuses on libkstat. The chapter explains the data structures and functions, and provides example code to get you started using the library.

11.1.1. Data Structure Overview

Solaris kernel statistics are maintained in a linked list of structures referred to as the kstat chain. Each kstat has a common header section and a type-specific data section, as shown in Figure 11.1.

Figure 11.1. Kstat Chain


The chain is initialized at system boot time, but since Solaris is a dynamic operating system, this chain may change over time. Kstat entries can be added and removed from the system as needed by the kernel. For example, when you add an I/O board and all of its attached components to a running system by using Dynamic Reconfiguration, the device drivers and other kernel modules that interact with the new hardware will insert kstat entries into the chain.

The structure member ks_data is a pointer to the kstat's data section. Multiple data types are supported: raw, named, timer, interrupt, and I/O. These are explained in Section 11.1.3.

The following header contains the full kstat header structure.

    typedef struct kstat {
            /*
             * Fields relevant to both kernel and user
             */
            hrtime_t       ks_crtime;               /* creation time */
            struct kstat  *ks_next;                 /* kstat chain linkage */
            kid_t          ks_kid;                  /* unique kstat ID */
            char           ks_module[KSTAT_STRLEN]; /* module name */
            uchar_t        ks_resv;                 /* reserved */
            int            ks_instance;             /* module's instance */
            char           ks_name[KSTAT_STRLEN];   /* kstat name */
            uchar_t        ks_type;                 /* kstat data type */
            char           ks_class[KSTAT_STRLEN];  /* kstat class */
            uchar_t        ks_flags;                /* kstat flags */
            void          *ks_data;                 /* kstat type-specific data */
            uint_t         ks_ndata;                /* # of data records */
            size_t         ks_data_size;            /* size of kstat data section */
            hrtime_t       ks_snaptime;             /* time of last data snapshot */
            /*
             * Fields relevant to kernel only
             */
            int (*ks_update)(struct kstat *, int);
            void           *ks_private;
            int (*ks_snapshot)(struct kstat *, void *, int);
            void           *ks_lock;
    } kstat_t;


The significant members are described below.

  • ks_crtime. This member reflects the time the kstat was created. Using the value, you can compute the rates of various counters since the kstat was created ("rate since boot" is replaced by the more general concept of "rate since kstat creation").

    All times associated with kstats, such as creation time, last snapshot time, kstat_timer_t, kstat_io_t timestamps, and the like, are 64-bit nanosecond values.

    The accuracy of kstat timestamps is machine-dependent, but the precision (units) is the same across all platforms. Refer to the gethrtime(3C) man page for general information about high-resolution timestamps.

  • ks_next. kstats are stored as a NULL-terminated linked list, or chain. ks_next points to the next kstat in the chain.

  • ks_kid. This member is a unique identifier for the kstat.

  • ks_module and ks_instance. These members contain the name and instance of the module that created the kstat. In cases where there can only be one instance, ks_instance is 0. Refer to Section 11.1.4 for more information.

  • ks_name. This member gives a meaningful name to a kstat. For additional kstat namespace information, see Section 11.1.4.

  • ks_type. This member identifies the type of data in this kstat. Kstat data types are covered in Section 11.1.3.

  • ks_class. Each kstat can be characterized as belonging to some broad class of statistics, such as bus, disk, net, vm, or misc. This field can be used as a filter to extract related kstats.

    The following values are currently in use by Solaris:

    bus

    hat

    net

    rpc

    controller

    kmem_cache

    nfs

    ufs

    device_error

    kstat

    pages

    vm

    taskq

    mib2

    crypto

    errorq

    disk

    misc

    partition

    vmem


  • ks_data, ks_ndata, and ks_data_size. ks_data is a pointer to the kstat's data section. The type of data stored there depends on ks_type. ks_ndata indicates the number of data records. Only some kstat types support multiple data records. The following kstat types support multiple data records:

    - KSTAT_TYPE_RAW

    - KSTAT_TYPE_NAMED

    - KSTAT_TYPE_TIMER

    The following kstats support only one data record:

    - KSTAT_TYPE_INTR

    - KSTAT_TYPE_IO

    ks_data_size is the total size of the data section, in bytes.

  • ks_snaptime. Timestamp for the last data snapshot. With it, you can compute activity rates based on the following computational method:

    rate = (new_count - old_count) / (new_snaptime - old_snaptime)
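The rate formula above can be sketched in portable C. This is a minimal illustration rather than libkstat code: counter_rate() and NANOSEC_PER_SEC are hypothetical names, and int64_t stands in for hrtime_t. Because kstat timestamps are in nanoseconds, the elapsed time is converted to seconds so that the result is a per-second rate.

```c
#include <stdint.h>

/* kstat timestamps are 64-bit nanosecond values */
#define NANOSEC_PER_SEC 1000000000.0

/*
 * Hypothetical helper (not part of libkstat): events per second between
 * two snapshots of the same counter. The timestamps are the ks_snaptime
 * values recorded with each snapshot. Returns 0.0 when no time has
 * elapsed, to avoid dividing by zero.
 */
double
counter_rate(uint64_t old_count, uint64_t new_count,
    int64_t old_snaptime, int64_t new_snaptime)
{
        int64_t elapsed_ns = new_snaptime - old_snaptime;

        if (elapsed_ns <= 0)
                return (0.0);
        return ((double)(new_count - old_count) /
            ((double)elapsed_ns / NANOSEC_PER_SEC));
}
```

For example, a counter that moves from 100 to 600 between snapshots taken two seconds apart yields 250 events per second.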

11.1.2. Getting Started

To use kstats, a program must first call kstat_open(), which returns a pointer to a kstat control structure. The following header shows the structure members.

    typedef struct kstat_ctl {
            kid_t     kc_chain_id;    /* current kstat chain ID */
            kstat_t   *kc_chain;      /* pointer to kstat chain */
            int       kc_kd;          /* /dev/kstat descriptor */
    } kstat_ctl_t;


kc_chain points to the head of your copy of the kstat chain. You typically walk the chain or use kstat_lookup() to find and process a particular kind of kstat. kc_chain_id is the kstat chain identifier, or KCID, of your copy of the kstat chain. Its use is explained in Section 11.1.6.

To avoid unnecessary overhead in accessing kstat data, a program first searches the kstat chain for the type of information of interest, then uses the kstat_read() and kstat_data_lookup() functions to get the statistics data from the kernel.

The following code fragment shows how you might print out all kstat entries with information about disk I/O. It traverses the entire chain looking for kstats of ks_type KSTAT_TYPE_IO, calls kstat_read() to retrieve the data, and then processes the data with my_io_display(). How to implement this sample function is shown in <ref>.

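    kstat_ctl_t     *kc;
    kstat_t         *ksp;
    kstat_io_t       kio;

    /* open /dev/kstat and take a private copy of the kstat chain */
    if ((kc = kstat_open()) == NULL) {
            perror("kstat_open");
            exit(1);
    }
    for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
            if (ksp->ks_type == KSTAT_TYPE_IO) {
                    /* copy the kernel's current data into kio */
                    if (kstat_read(kc, ksp, &kio) == -1)
                            continue;
                    my_io_display(kio);
            }
    }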


11.1.3. Data Types

The data section of a kstat can hold one of five types, identified in the ks_type field. The following kstat types can hold multiple records. The number of records is held in ks_ndata.

  • KSTAT_TYPE_RAW

  • KSTAT_TYPE_NAMED

  • KSTAT_TYPE_TIMER

The other two types are KSTAT_TYPE_INTR and KSTAT_TYPE_IO. The field ks_data_size holds the size, in bytes, of the entire data section.

11.1.3.1. KSTAT_TYPE_RAW

The "raw" kstat type is treated as an array of bytes and is generally used to export well-known structures, such as vminfo (defined in /usr/include/sys/sysinfo.h). The following example shows one method of printing this information.

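    static void
    print_vminfo(kstat_t *kp)
    {
            vminfo_t *vminfop;

            /* interpret the raw data section as a vminfo structure */
            vminfop = (vminfo_t *)(kp->ks_data);

            /* cast each member so the format matches whatever its width */
            printf("Free memory: %llu\n",
                (unsigned long long)vminfop->freemem);
            printf("Swap reserved: %llu\n",
                (unsigned long long)vminfop->swap_resv);
            printf("Swap allocated: %llu\n",
                (unsigned long long)vminfop->swap_alloc);
            printf("Swap available: %llu\n",
                (unsigned long long)vminfop->swap_avail);
            printf("Swap free: %llu\n",
                (unsigned long long)vminfop->swap_free);
    }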


11.1.3.2. KSTAT_TYPE_NAMED

This type of kstat contains a list of arbitrary name=value statistics. The following example shows the data structure used to hold named kstats.

    typedef struct kstat_named {
            char    name[KSTAT_STRLEN];     /* name of counter */
            uchar_t data_type;              /* data type */
            union {
                    char            c[16];  /* enough for 128-bit ints */
                    int32_t         i32;
                    uint32_t        ui32;
                    struct {
                            union {
                                    char            *ptr;    /* NULL-term string */
    #if defined(_KERNEL) && defined(_MULTI_DATAMODEL)
                                    caddr32_t       ptr32;
    #endif
                                    char            __pad[8]; /* 64-bit padding */
                            } addr;
                            uint32_t        len;    /* # bytes for strlen + '\0' */
                    } str;
    #if defined(_INT64_TYPE)
                    int64_t         i64;
                    uint64_t        ui64;
    #endif
                    long            l;
                    ulong_t         ul;

                    /* These structure members are obsolete */

                    longlong_t      ll;
                    u_longlong_t    ull;
                    float           f;
                    double          d;
            } value;                        /* value of counter */
    } kstat_named_t;

    #define KSTAT_DATA_CHAR         0
    #define KSTAT_DATA_INT32        1
    #define KSTAT_DATA_UINT32       2
    #define KSTAT_DATA_INT64        3
    #define KSTAT_DATA_UINT64       4

    #if !defined(_LP64)
    #define KSTAT_DATA_LONG         KSTAT_DATA_INT32
    #define KSTAT_DATA_ULONG        KSTAT_DATA_UINT32
    #else
    #if !defined(_KERNEL)
    #define KSTAT_DATA_LONG         KSTAT_DATA_INT64
    #define KSTAT_DATA_ULONG        KSTAT_DATA_UINT64
    #else
    #define KSTAT_DATA_LONG         7       /* only visible to the kernel */
    #define KSTAT_DATA_ULONG        8       /* only visible to the kernel */
    #endif  /* !_KERNEL */
    #endif  /* !_LP64 */

    See sys/kstat.h


The example program in Section 11.1.7 uses a function, my_named_display(), to show how one might display named kstats.

Note that if the type is KSTAT_DATA_CHAR, the 16-byte value field is not guaranteed to be null-terminated. This is important to remember when you are printing the value with functions like printf().
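One way to handle a value that may lack a terminating null is to copy it into a buffer one byte larger and terminate the copy yourself. The following is a minimal sketch; copy_char_value() and CHAR_VALUE_LEN are hypothetical names, not part of libkstat, and the input array stands in for the value.c member of kstat_named_t.

```c
#include <string.h>

#define CHAR_VALUE_LEN 16       /* size of the value.c field */

/*
 * Hypothetical helper: copy a 16-byte character value into a caller
 * buffer of at least CHAR_VALUE_LEN + 1 bytes and null-terminate it.
 * If the value already contains a null, the copy simply ends there
 * when it is used as a string.
 */
void
copy_char_value(const char value[CHAR_VALUE_LEN], char *out)
{
        memcpy(out, value, CHAR_VALUE_LEN);
        out[CHAR_VALUE_LEN] = '\0';
}
```

When you only need to print the value, a precision in the format string, as in printf("%.16s", knp->value.c), limits the output to 16 bytes without a copy.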

11.1.3.3. KSTAT_TYPE_TIMER

This kstat holds event timer statistics. These provide basic counting and timing information for any type of event.

    typedef struct kstat_timer {
            char            name[KSTAT_STRLEN];     /* event name */
            uchar_t         resv;                   /* reserved */
            u_longlong_t    num_events;             /* number of events */
            hrtime_t        elapsed_time;           /* cumulative elapsed time */
            hrtime_t        min_time;               /* shortest event duration */
            hrtime_t        max_time;               /* longest event duration */
            hrtime_t        start_time;             /* previous event start time */
            hrtime_t        stop_time;              /* previous event stop time */
    } kstat_timer_t;

    See sys/kstat.h
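As an illustration of how the timer fields combine, the mean duration of an event is the cumulative elapsed time divided by the number of events. The sketch below uses a stand-in structure, timer_sample_t, holding only the two fields it needs so that it compiles without sys/kstat.h; the type and the function timer_mean_ns() are assumptions made for this example.

```c
#include <stdint.h>

/* Stand-in for the two kstat_timer_t fields used here; illustrative only. */
typedef struct timer_sample {
        uint64_t num_events;    /* number of events */
        int64_t  elapsed_time;  /* cumulative elapsed time, nanoseconds */
} timer_sample_t;

/* Mean duration of one event in nanoseconds, or 0.0 if there were none. */
double
timer_mean_ns(const timer_sample_t *t)
{
        if (t->num_events == 0)
                return (0.0);
        return ((double)t->elapsed_time / (double)t->num_events);
}
```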


11.1.3.4. KSTAT_TYPE_INTR

This type of kstat holds interrupt statistics. Interrupts are categorized as listed in Table 11.1 and as shown below the table.

Table 11.1. Types of Interrupt Kstats

Interrupt Type      Definition
----------------    ----------------------------------------------------------
Hard                Sourced from the hardware device itself
Soft                Induced by the system by means of some system interrupt
                    source
Watchdog            Induced by a periodic timer call
Spurious            An interrupt entry point was entered but there was no
                    interrupt to service
Multiple Service    An interrupt was detected and serviced just before
                    returning from any of the other types


    #define KSTAT_INTR_HARD      0
    #define KSTAT_INTR_SOFT      1
    #define KSTAT_INTR_WATCHDOG  2
    #define KSTAT_INTR_SPURIOUS  3
    #define KSTAT_INTR_MULTSVC   4
    #define KSTAT_NUM_INTRS      5

    typedef struct kstat_intr {
            uint_t intrs[KSTAT_NUM_INTRS]; /* interrupt counters */
    } kstat_intr_t;

    See sys/kstat.h
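The counters are indexed by interrupt type, so a program can label or total them with a simple loop. The following sketch mirrors the index values from the header above in a local enum and uses a stand-in structure, intr_counts_t, so that it compiles on its own; those names and the two helper functions are hypothetical.

```c
#include <stddef.h>

/* Local copies of the interrupt-type indexes from the header above. */
enum {
        INTR_HARD, INTR_SOFT, INTR_WATCHDOG, INTR_SPURIOUS, INTR_MULTSVC,
        NUM_INTRS
};

/* Stand-in for kstat_intr_t; illustrative only. */
typedef struct intr_counts {
        unsigned int intrs[NUM_INTRS];  /* interrupt counters */
} intr_counts_t;

/* Map an interrupt-type index to a printable label. */
const char *
intr_type_name(int type)
{
        static const char *const names[NUM_INTRS] = {
                "hard", "soft", "watchdog", "spurious", "multiple service"
        };

        if (type < 0 || type >= NUM_INTRS)
                return ("unknown");
        return (names[type]);
}

/* Sum the counters across all interrupt types. */
unsigned long long
intr_total(const intr_counts_t *ic)
{
        unsigned long long total = 0;
        size_t i;

        for (i = 0; i < NUM_INTRS; i++)
                total += ic->intrs[i];
        return (total);
}
```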


11.1.3.5. KSTAT_TYPE_IO

This kstat counts I/O operations for statistical analysis.

    typedef struct kstat_io {
            /*
             * Basic counters.
             */
            u_longlong_t     nread;      /* number of bytes read */
            u_longlong_t     nwritten;   /* number of bytes written */
            uint_t           reads;      /* number of read operations */
            uint_t           writes;     /* number of write operations */
            /*
             * Accumulated time and queue length statistics.
             */
            hrtime_t   wtime;            /* cumulative wait (pre-service) time */
            hrtime_t   wlentime;         /* cumulative wait length*time product*/
            hrtime_t   wlastupdate;      /* last time wait queue changed */
            hrtime_t   rtime;            /* cumulative run (service) time */
            hrtime_t   rlentime;         /* cumulative run length*time product */
            hrtime_t   rlastupdate;      /* last time run queue changed */
            uint_t     wcnt;             /* count of elements in wait state */
            uint_t     rcnt;             /* count of elements in run state */
    } kstat_io_t;

    See sys/kstat.h


Accumulated Time and Queue Length Statistics

Time statistics are kept as a running sum of "active" time. Queue length statistics are kept as a running sum of the product of queue length and elapsed time at that length. That is, they form a Riemann sum of queue length integrated over time. Figure 11.2 shows a sample graph of queue length versus time.

Figure 11.2. Queue Length Sampling


At each change of state (either an entry or exit from the queue), the elapsed time since the previous state change is added to the active time (wtime or rtime fields) if the queue length was non-zero during that interval.

The product of the elapsed time and the queue length is added to the running sum of length multiplied by time (wlentime or rlentime fields).

Stated programmatically:

    if (queue length != 0) {
        time += elapsed time since last state change;
        lentime += (elapsed time since last state change * queue length);
    }
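The same rule can be written as compilable C. This is a sketch of the accounting idea, not the kernel's implementation: queue_stats_t is a stand-in for one queue's fields in kstat_io_t (for example wtime, wlentime, wlastupdate, and wcnt), and queue_enter() and queue_exit() are hypothetical names.

```c
#include <stdint.h>

/* Stand-in for one queue's fields in kstat_io_t; illustrative only. */
typedef struct queue_stats {
        int64_t  time;          /* cumulative time with a non-empty queue */
        int64_t  lentime;       /* cumulative length*time product */
        int64_t  lastupdate;    /* time of the last state change */
        uint32_t cnt;           /* current queue length */
} queue_stats_t;

/* Charge the interval since the last state change to the running sums. */
static void
queue_account(queue_stats_t *q, int64_t now)
{
        int64_t delta = now - q->lastupdate;

        if (q->cnt != 0) {
                q->time += delta;
                q->lentime += delta * q->cnt;
        }
        q->lastupdate = now;
}

/* A transaction joins the queue at time now. */
void
queue_enter(queue_stats_t *q, int64_t now)
{
        queue_account(q, now);
        q->cnt++;
}

/* A transaction leaves the queue at time now. */
void
queue_exit(queue_stats_t *q, int64_t now)
{
        queue_account(q, now);
        q->cnt--;
}
```

With entries at times 10 and 20 and exits at times 50 and 60, the queue is non-empty for 50 time units, and the length-time product is 10*1 + 30*2 + 10*1 = 80.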


You can generalize this method to measure residency in any defined system. Instead of queue lengths, think of "outstanding RPC calls to server X."

A large number of I/O subsystems have at least two basic lists of transactions they manage:

  • A list for transactions that have been accepted for processing but for which processing has yet to begin

  • A list for transactions that are actively being processed but that are not complete

For these reasons, two cumulative time statistics are defined:

  • Pre-service (wait) time

  • Service (run) time

The units of cumulative busy time are accumulated nanoseconds.
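Because these statistics are running sums, useful figures come from the difference between two snapshots. Dividing the change in the length-time product by the elapsed time gives the average queue length over the interval, and dividing the change in cumulative time by the elapsed time gives the fraction of the interval during which the queue was non-empty. The sketch below assumes a stand-in type, io_snapshot_t, that carries only the run-queue fields together with the snapshot time; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in holding a snapshot time plus two kstat_io_t run-queue fields. */
typedef struct io_snapshot {
        int64_t snaptime;       /* time of this snapshot, nanoseconds */
        int64_t rtime;          /* cumulative run (service) time */
        int64_t rlentime;       /* cumulative run length*time product */
} io_snapshot_t;

/* Average run-queue length between two snapshots. */
double
avg_run_queue_length(const io_snapshot_t *old, const io_snapshot_t *new)
{
        int64_t elapsed = new->snaptime - old->snaptime;

        if (elapsed <= 0)
                return (0.0);
        return ((double)(new->rlentime - old->rlentime) / (double)elapsed);
}

/* Fraction of the interval during which the run queue was non-empty. */
double
run_busy_fraction(const io_snapshot_t *old, const io_snapshot_t *new)
{
        int64_t elapsed = new->snaptime - old->snaptime;

        if (elapsed <= 0)
                return (0.0);
        return ((double)(new->rtime - old->rtime) / (double)elapsed);
}
```

The same arithmetic applies to the wait queue by substituting wtime and wlentime.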

11.1.4. Kstat Names

The kstat namespace is defined by three fields from the kstat structure:

  • ks_module

  • ks_instance

  • ks_name

The combination of these three fields is guaranteed to be unique.

For example, imagine a system with four FastEthernet interfaces. The device driver module for Sun's FastEthernet controller is called "hme". The first Ethernet interface would be instance 0, the second instance 1, and so on. The "hme" driver provides two types of kstat for each interface. The first contains named kstats with performance statistics. The second contains interrupt statistics.

The kstat data for the first interface's network statistics is found under ks_module == "hme", ks_instance == 0, and ks_name == "hme0". The interrupt statistics are contained in a kstat identified by ks_module == "hme", ks_instance == 0, and ks_name == "hmec0".

In that example, the combination of module name and instance number to make the ks_name field ("hme0" and "hmec0") is simply a convention for this driver. Other drivers may use similar naming conventions to publish multiple kstat data types but are not required to do so; the module is required to make sure that the combination is unique.

How do you determine what kstats the kernel provides? One of the easiest ways with Solaris 8 is to run /usr/bin/kstat with no arguments. This command prints nearly all the current kstat data. The Solaris kstat command can dump most of the known kstats of type KSTAT_TYPE_RAW.

11.1.5. Functions

The following functions are available to user-level C programs for accessing kstat data:

kstat_ctl_t *kstat_open(void);

    Initializes a kstat control structure to provide access to the kernel statistics library. It returns a pointer to this structure, which must be supplied as the kc argument in subsequent libkstat function calls.

kstat_t *kstat_lookup(kstat_ctl_t *kc, char *ks_module, int ks_instance,
                      char *ks_name);

    Traverses the kstat chain searching for a kstat with the given ks_module, ks_instance, and ks_name fields. If ks_module is NULL, ks_instance is -1, or ks_name is NULL, then those fields are ignored in the search. For example, kstat_lookup(kc, NULL, -1, "foo") simply finds the first kstat with the name "foo".

void *kstat_data_lookup(kstat_t *ksp, char *name);

    Searches the kstat's data section for the record with the specified name. This operation is valid only for kstat types that have named data records. Currently, only the KSTAT_TYPE_NAMED and KSTAT_TYPE_TIMER kstats have named data records. You must first call kstat_read() to get the data from the kernel. This routine then finds a particular record in the data section.

kid_t kstat_read(kstat_ctl_t *kc, kstat_t *ksp, void *buf);

    Gets data from the kernel for a particular kstat.

kid_t kstat_write(kstat_ctl_t *kc, kstat_t *ksp, void *buf);

    Writes data to a particular kstat in the kernel. Only the superuser can use kstat_write().

kid_t kstat_chain_update(kstat_ctl_t *kc);

    Synchronizes the user's kstat header chain with that of the kernel.

int kstat_close(kstat_ctl_t *kc);

    Frees all resources that were associated with the kstat control structure. This is done automatically on exit(2) and execve(). (For more information on exit(2) and execve(), see the exec(2) man page.)
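The wildcard rules described for kstat_lookup() can be pictured with a small stand-alone model. The code below is not the library's implementation; fake_kstat_t is a stand-in that carries only the three identifying fields, and fake_lookup() is a hypothetical function that applies the same matching rules to such a list.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the identifying fields of kstat_t; illustrative only. */
typedef struct fake_kstat {
        struct fake_kstat *next;        /* chain linkage */
        const char        *module;      /* module name */
        int                instance;    /* module's instance */
        const char        *name;        /* kstat name */
} fake_kstat_t;

/*
 * Return the first entry that matches every field that was specified.
 * A NULL module, an instance of -1, or a NULL name matches anything.
 */
fake_kstat_t *
fake_lookup(fake_kstat_t *head, const char *module, int instance,
    const char *name)
{
        fake_kstat_t *k;

        for (k = head; k != NULL; k = k->next) {
                if (module != NULL && strcmp(k->module, module) != 0)
                        continue;
                if (instance != -1 && k->instance != instance)
                        continue;
                if (name != NULL && strcmp(k->name, name) != 0)
                        continue;
                return (k);
        }
        return (NULL);
}
```

Given a chain that holds the "hme0", "hmec0", and "hme1" entries from Section 11.1.4, fake_lookup(head, NULL, -1, "hmec0") finds the interrupt kstat by name alone, while fake_lookup(head, "hme", 1, NULL) returns the first kstat that belongs to instance 1.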


11.1.6. Management of Chain Updates

Recall that the kstat chain is dynamic in nature. The libkstat library function kstat_open() returns a copy of the kernel's kstat chain. Since the content of the kernel's chain may change, your program should call the kstat_chain_update() function at the appropriate times to see if its private copy of the chain is the same as the kernel's. This is the purpose of the KCID (stored in kc_chain_id in the kstat control structure).

Each time a kernel module adds or removes a kstat from the system's chain, the KCID is incremented. When your program calls kstat_chain_update(), the function checks to see if the kc_chain_id in your program's control structure matches the kernel's. If not, kstat_chain_update() rebuilds your program's local kstat chain. The function returns one of the following:

  • The new KCID if the chain has been updated

  • 0 if no change has been made

  • -1 if some error was detected

If your program has cached some local data from previous calls to the kstat library, then a new KCID acts as a flag to indicate that the chain has changed and that the cached data may be stale. You can search the chain again to see if data that your program is interested in has been added or removed.

A practical example is the system command iostat. It caches some internal data about the disks in the system and needs to recognize that a disk has been brought on-line or off-line. If iostat is called with an interval argument, it prints I/O statistics every interval seconds. Each time through the loop, it calls kstat_chain_update() to see if something has changed. If a change took place, it figures out if a device of interest has been added or removed.
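Because kstat_chain_update() has three distinct outcomes, a simple truth test on its return value would treat an error (-1) the same as a changed chain. The sketch below shows one way a program might separate the cases and mark its cached data for a rescan. The stat_cache_t type and cache_note_update() function are hypothetical; the rc argument stands for the value that kstat_chain_update() returned.

```c
/* The three outcomes described for kstat_chain_update(). */
typedef enum chain_status {
        CHAIN_ERROR,            /* returned -1 */
        CHAIN_UNCHANGED,        /* returned 0 */
        CHAIN_CHANGED           /* returned the new KCID */
} chain_status_t;

/* Hypothetical record of what the program has cached. */
typedef struct stat_cache {
        long kcid;              /* KCID the cached data was built from */
        int  needs_rescan;      /* nonzero if the chain must be searched again */
} stat_cache_t;

/*
 * Interpret a kstat_chain_update() return value. On a change, remember
 * the new KCID and flag the cache so the caller searches the chain again.
 */
chain_status_t
cache_note_update(stat_cache_t *cache, long rc)
{
        if (rc == -1)
                return (CHAIN_ERROR);
        if (rc == 0)
                return (CHAIN_UNCHANGED);
        cache->kcid = rc;
        cache->needs_rescan = 1;
        return (CHAIN_CHANGED);
}
```

A polling loop would call cache_note_update(&cache, kstat_chain_update(kc)) at the top of each pass and repeat its lookups whenever needs_rescan is set.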

11.1.7. Putting It All Together

Your C source file must contain:

#include <kstat.h> 


When your program is linked, the compiler command line must include the argument -lkstat.

$ cc -o print_some_kstats -lkstat print_some_kstats.c 


The following is a short example program. First, it uses kstat_lookup() and kstat_read() to find the system's CPU speed. Then it goes into an infinite loop to print a small amount of information about all kstats of type KSTAT_TYPE_IO. Note that at the top of the loop, it calls kstat_chain_update() to check that you have current data. If the kstat chain has changed, the program sends a short message on stderr.

    /*
     * print_some_kstats.c:
     * print out a couple of interesting things
     */
    #include <kstat.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <inttypes.h>

    #define SLEEPTIME 10

    void my_named_display(char *, char *, kstat_named_t *);
    void my_io_display(char *, char *, kstat_io_t);

    int
    main(int argc, char **argv)
    {
            kstat_ctl_t    *kc;
            kstat_t        *ksp;
            kstat_io_t      kio;
            kstat_named_t  *knp;

            if ((kc = kstat_open()) == NULL) {
                    perror("kstat_open");
                    exit(1);
            }

            /*
             * Print out the CPU speed. We make two assumptions here:
             * 1) All CPUs are the same speed, so we'll just search for the
             *    first one;
             * 2) At least one CPU is online, so our search will always
             *    find something. :)
             */
            ksp = kstat_lookup(kc, "cpu_info", -1, NULL);
            if (ksp == NULL || kstat_read(kc, ksp, NULL) == -1) {
                    perror("cpu_info");
                    exit(1);
            }

            /* lookup the CPU speed data record */
            knp = kstat_data_lookup(ksp, "clock_MHz");
            if (knp == NULL) {
                    perror("clock_MHz");
                    exit(1);
            }
            printf("CPU speed of system is ");
            my_named_display(ksp->ks_name, ksp->ks_class, knp);
            printf("\n");

            /* dump some info about all I/O kstats every SLEEPTIME seconds */
            while (1) {
                    /* make sure we have current data */
                    if (kstat_chain_update(kc) > 0)
                            fprintf(stderr, "<<State Changed>>\n");
                    for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
                            if (ksp->ks_type == KSTAT_TYPE_IO) {
                                    if (kstat_read(kc, ksp, &kio) == -1)
                                            continue;
                                    my_io_display(ksp->ks_name, ksp->ks_class, kio);
                            }
                    }
                    sleep(SLEEPTIME);
            } /* while(1) */
    }

    void
    my_io_display(char *devname, char *class, kstat_io_t k)
    {
            printf("Name: %s Class: %s\n", devname, class);
            printf("\tnumber of bytes read %llu\n", k.nread);
            printf("\tnumber of bytes written %llu\n", k.nwritten);
            printf("\tnumber of read operations %u\n", k.reads);
            printf("\tnumber of write operations %u\n\n", k.writes);
    }

    void
    my_named_display(char *devname, char *class, kstat_named_t *knp)
    {
            switch (knp->data_type) {
            case KSTAT_DATA_CHAR:
                    /* the value may not be null-terminated; limit to 16 bytes */
                    printf("%.16s", knp->value.c);
                    break;
            case KSTAT_DATA_INT32:
                    printf("%" PRId32, knp->value.i32);
                    break;
            case KSTAT_DATA_UINT32:
                    printf("%" PRIu32, knp->value.ui32);
                    break;
            case KSTAT_DATA_INT64:
                    printf("%" PRId64, knp->value.i64);
                    break;
            case KSTAT_DATA_UINT64:
                    printf("%" PRIu64, knp->value.ui64);
                    break;
            }
    }





Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris
ISBN: 0131568191
Year: 2007
Pages: 180
