7.3 Tools

In this section, we examine just a few of the tools that are likely to be useful to the email administrator. Many possibilities are available, far more than are listed here. Some are very specific, whereas others have broad applications. The tools discussed here are both generally useful and widely available. Email administrators interested in tuning, or just understanding, an email system should not restrict their studies to the utilities mentioned here. Magazine articles, books, Web sites, and other system administrators can all provide insight into very helpful tools.

Each tool discussed here has different options and displays slightly different information on each operating system version. While this inconsistency is annoying, some of the differences are tied to the internal workings of the operating system and are unavoidable. Also, some of the less common options are the most useful, so it's just not practical to limit the use of these utilities to their common flags and output, and that won't be done here. Instead, this section generally provides examples taken from the Solaris (version 2.6) utilities on the test servers described earlier in this book, throwing in some examples specific to other operating systems.

A final note: as we know from science, it is impossible to measure a system without affecting it. Just by running a tool, we change the behavior of the very computer we're monitoring. These utilities consume memory and CPU time, they open sockets and files, and they read data off disks. Therefore, we can never be entirely sure that a problem observed on a system isn't at least partially influenced by the fact that we're monitoring it. Although the effect is rarely significant, it's a good idea not to go overboard by continually running top or by having scripts run ps every five seconds to capture the state of the machine. A more modest approach to capturing data (running ps every five minutes, for example) provides equally useful information without adding substantially to the server's load.

7.3.1 ps

The venerable ps utility comes in two flavors: the Berkeley flavor (found on BSD-based systems and Linux) and the System V flavor (found on AIX, HP-UX, and other systems). Solaris provides the System V flavor in /usr/bin, and the Berkeley flavor appears in /usr/ucb. My preference is for the Berkeley-style output of ps; I like the information it provides and the way that the Berkeley ps -u sorts the data. Essentially the same information is available from either version, however, so other than remembering which option does what, one shouldn't be handicapped by any particular flavor.

A great deal of information is available from ps, and it's especially useful for tasks such as tracking the number of certain types of processes running on a machine or seeing which processes are the largest resource consumers. Exactly what the program reports varies depending on the option flags selected. Everyone performing system troubleshooting would be well advised to become very familiar with the ps man page for the operating system that runs on their email server.

For both varieties of ps, some command-line flags require more processing to resolve than others. On Berkeley-type systems, it is more computationally intensive to resolve commands with the -u flag than without it. For System V versions, adding the -l flag requires more computational resources than if the command is run without it. Therefore, these flags, which produce extra output, should be used only when they relate important information. One thing that ps provides is rough process counts, for example:

 ps -acx | grep -c "sendmail" 

These sorts of data are useful, and periodic counts are often scripted. Especially in automated systems, it's worthwhile to make sure that they produce minimal strain on the server. Determining which options are more resource intensive than others isn't always straightforward, but the time command or shell built-in can aid in this calculation. On quiet servers, the response time for this command might be too fast to measure, so the aggregation of several commands may provide a more precise measurement. For example:

 /usr/bin/time sh -c 'for i in 1 2 3 4 5 6 7 8 9 10; \
     do ps -aux > /dev/null; done'
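
As a concrete illustration of scripting such periodic counts, the following minimal sketch appends a timestamped count of sendmail processes to a log file. The script name, log location, and fifteen-minute schedule are illustrative assumptions, not recommendations from this book:

 #!/bin/sh
 # count_sendmail.sh (hypothetical name): append a timestamped count of
 # sendmail processes to a log file.  Intended to be run from cron at a
 # modest interval rather than continuously.
 LOG=/var/log/sendmail-count.log
 COUNT=`ps -acx | grep -c "sendmail"`
 echo "`date '+%y%m%d %H:%M:%S'` $COUNT" >> $LOG

A crontab entry along these lines would run it every fifteen minutes:

 0,15,30,45 * * * * /usr/local/etc/count_sendmail.sh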

Some ps output from the CPU-bound test server during one of the tests cited earlier in this book appears in Table 7.1. At the moment this snapshot was taken, syslogd was the most active process. While syslogd is a busy process on an email server, it rarely does the most work at any given instant. Unlike the MTA and LDA processes that move the email itself, however, this single persistent process reads data from the IP stack and writes it to disk on every delivery attempt.

The numbers in the RSS column add up to roughly the system's total main memory (only 32MB), and that total doesn't even count RAM consumed by the kernel or the buffer cache. Because much of the memory consumed by these processes is shared, however, there is enough space to keep the actively running parts of the programs resident in memory and still leave room for the kernel and the buffer cache.
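
As a quick check of that arithmetic, an awk one-liner can total the column. This is just a sketch; it assumes the /usr/ucb/ps -uaxc layout shown in Table 7.1, where RSS is the sixth field and is reported in kilobytes:

 /usr/ucb/ps -uaxc | awk 'NR > 1 { total += $6 } END { print total, "KB resident" }'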

On this machine, the script command is used to capture output from the iostat and vmstat commands, which will be discussed shortly. The stat entry is a home-built script that adds date and time information to the output of these two utilities. As we'd expect, most of the CPU time is consumed by sendmail and mail.local processes. Also as we'd expect, concurrent MTA processes outnumber LDA processes, even though the email is sent to this server over a low-latency local area network.

Most of the rest of the processes running on this server are either standard parts of the operating system or processes related to remote connections to the server.

7.3.2 top

Many UNIX operating systems include the venerable top utility, which is also one of the first Open Source programs installed on many other operating systems. The top utility lists the largest CPU resource consumers on a system and updates this list periodically, typically every few seconds. For understanding the general state of the system, some of the most valuable information appears in the first few lines of the program's display. A system consistently showing a CPU idle state at or near 0% is almost certainly CPU bound.

Table 7.1. Sample /usr/ucb/ps -uaxc Output from the CPU-Bound Test Server

 % /usr/ucb/ps -uaxc
 USER        PID %CPU %MEM   SZ  RSS TT      S    START  TIME COMMAND
 root      11302  1.3  3.3 3480 1004 ?       S 15:03:18  0:23 syslogd
 root      23420  1.1  4.3 2296 1304 ?       R 16:05:56  0:02 sendmail
 root      24881  0.9  5.6 2392 1700 ?       S 16:13:11  0:00 sendmail
 root      24884  0.8  5.3 2352 1592 ?       R 16:13:11  0:00 sendmail
 root      24861  0.7  5.6 2392 1700 ?       S 16:13:07  0:00 sendmail
 root      11009  0.6  3.9 2012 1172 ?       S 14:47:30  0:08 nscd
 root      24871  0.6  3.4 1552 1016 ?       S 16:13:08  0:00 mail.local
 root      24886  0.5  5.0 2312 1516 ?       R 16:13:12  0:00 sendmail
 root      24892  0.5  2.9 1140  860 pts/4   O 16:13:13  0:00 ps
 test1     24890  0.5  3.4 1552 1012 ?       R 16:13:12  0:00 mail.local
 root      24889  0.4  5.0 2312 1516 ?       R 16:13:12  0:00 sendmail
 root      18650  0.3  3.8 1712 1160 ?       S 15:45:14  0:01 sshd
 npc       18716  0.2  2.5 1016  756 pts/4   S 15:45:24  0:00 csh
 root      24891  0.2  1.7 2296  492 ?       S 16:13:12  0:00 sendmail
 npc       23454  0.2  1.9  856  580 pts/3   S 16:08:02  0:00 stat
 root      23449  0.1  1.9  856  580 pts/2   S 16:08:00  0:00 stat
 npc       23434  0.1  1.9  788  560 pts/0   S 16:06:48  0:00 script
 root          3  0.0  0.0    0    0 ?       S   Feb 04 19:27 fsflush
 root          0  0.0  0.0    0    0 ?       T   Feb 04  0:00 sched
 root          1  0.0  0.5  652  132 ?       S   Feb 04  0:26 init
 root          2  0.0  0.0    0    0 ?       S   Feb 04  0:02 pageout
 root        156  0.0  1.8 1464  548 ?       S   Feb 04  0:01 cron
 root        159  0.0  2.4 1644  724 ?       S   Feb 04  1:46 sshd
 root        174  0.0  1.6  852  480 ?       S   Feb 04  0:00 utmpd
 root        203  0.0  2.1 1404  632 ?       S   Feb 04  0:00 sac
 root        204  0.0  2.1 1496  624 console S   Feb 04  0:00 ttymon
 root        206  0.0  2.3 1496  688 ?       S   Feb 04  0:00 ttymon
 root      10540  0.0  3.1 1800  936 ?       S 14:19:23  0:10 sshd
 npc       10543  0.0  1.5 1012  448 pts/1   S 14:19:33  0:00 csh
 root      10554  0.0  0.0  276    4 pts/1   S 14:20:03  0:00 sh
 root      11262  0.0  3.0 1712  904 ?       S 15:01:22  0:02 sshd
 npc       11265  0.0  2.5 1028  756 pts/0   S 15:01:25  0:00 csh
 root      11456  0.0  2.7 1052  800 pts/1   S 15:05:33  0:00 csh
 root      23429  0.0  1.8  764  536 pts/1   S 16:06:46  0:00 script
 root      23430  0.0  1.9  788  560 pts/1   S 16:06:46  0:00 script
 root      23431  0.0  2.4  996  732 pts/2   S 16:06:46  0:00 csh
 npc       23433  0.0  1.8  764  536 pts/0   S 16:06:48  0:00 script
 npc       23435  0.0  2.7 1024  804 pts/3   S 16:06:48  0:00 csh
 root      23448  0.0  2.2  840  660 pts/2   S 16:08:00  0:00 vmstat
 npc       23453  0.0  2.3  848  684 pts/3   S 16:08:02  0:00 iostat

The caveat is that some systems list an iowait state indicating what percentage of CPU time is spent waiting for I/O. This number doesn't represent CPU time being consumed; rather, it is the system's best guess as to the amount of CPU time that would be consumed if no processes were blocked waiting for I/O. If a significant amount of time is attributed to the iowait state, the system may show 0% idle while the CPU is barely being used.

In the upper-left corner is the last process identifier (PID) used by the system. From its rate of change, one can deduce how many new processes are spawned per second, giving some idea of how fast sessions are coming and going on the server. This method isn't useful on those few operating systems, such as OpenBSD, that assign new PIDs randomly rather than sequentially.
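
Where PIDs are assigned sequentially, this rate can be estimated with a trivial script. The sketch below is illustrative only: it assumes sequential PID assignment, assumes no PID wraparound during the interval, and counts every new process on the machine, not just email sessions.

 #!/bin/sh
 # Rough estimate of the process creation rate: spawn a throwaway shell,
 # note its PID, wait, spawn another, and divide the PID difference by
 # the elapsed time.
 INTERVAL=60
 P1=`sh -c 'echo $$'`
 sleep $INTERVAL
 P2=`sh -c 'echo $$'`
 echo "approximately `expr \( $P2 - $P1 \) / $INTERVAL` new processes per second"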

The memory information displayed isn't as useful as one would first expect. On nearly any system that has been running for a few minutes, or even a few seconds if it's busy, we should expect the amount of free memory listed to stay very near zero. On contemporary operating systems, any RAM that goes unused by processes will be allocated to caching some data. Thus, just because there is very little memory free, it doesn't mean that the system is memory starved. On some operating systems, top will show more memory information, such as how much RAM is allocated to filesystem caches; if this number drops near zero, it would likely indicate that the server would use additional RAM effectively.

Even more so than with ps, the information displayed via top varies from operating system to operating system. A thorough reading of the utility's man page should be performed before its results are interpreted.

7.3.3 vmstat

The vmstat utility explores the activity of the virtual memory system, which includes real memory used by processes, memory used for caching, and swap space. The first line of data produced summarizes the activity since the system was booted. Generally, this information should be ignored.

While it's much less impressive than the output that one will find on a true high-performance email server, some example output from the CPU-bound server during one of the test cases discussed in this book can be instructive. This output appears in Table 7.2.

Excessive memory activity will cause heavy paging, which translates into relatively large numbers in the pi and po columns. Of course, what constitutes a large number depends heavily on the particular system. Interpreting these numbers without a baseline will be next to impossible. In the example case, these numbers are so small that we can safely conclude that the system is not memory bound.

Table 7.2. Sample vmstat 15 Output from the CPU-Bound Test Server

 % vmstat 15
  r b w  swap  free  re   mf pi po  fr de sr  in   sy  cs us sy  id
  0 0 0  4692  1804   0    0  0  0   0  0  0   3   39  22  0  0 100
  7 1 0 64440  1956   5  925  6 38  38  0  0 256 2150 328 31 68   2
  8 0 0 64008  1800   8  923  0 40  44  0  1 249 2030 285 24 67   9
  8 0 0 64612  2276  10  950  1 46  48  0  0 249 2079 283 27 66   7
  7 1 0 64712  2956   4  954  6 23  94  0 22 262 2101 294 28 67   5
  7 1 0 64320  2852   0 1024  2  0   0  0  0 271 2260 317 27 73   0
  9 0 0 61684  1960   8  995  6 62 170  0 37 281 2186 329 29 71   0
  7 0 0 62968  3836   0 1061  4  0   0  0  0 255 2209 315 29 71   0
 15 1 0 58508  1800  12  956  7 71 138  0 27 288 2302 342 30 70   0
  6 0 0 62936  4860   2 1035  1 10  10  0  0 252 2072 299 26 71   2

On those systems whose vmstat provides this information, another column worth tracking is de. It gives a system's expected short-term memory deficiency, for which memory space will have to be actively reclaimed. A nonzero entry will show up occasionally in this column on a healthy but busy system. The more often this result appears, though, the more likely the system could use more memory. Our sample data show no deficiencies, another indication that this system is not memory bound.

The first column, labeled r, indicates the number of runnable processes, which provides a snapshot of the system load average. In this example, a number of processes want to run but can't because they have no CPU time slice available to them. The second column, labeled b, gives the number of processes that are blocked from proceeding because they are waiting for I/O. If a significant number of processes are listed in this column, the system is likely I/O bound. In our example, we occasionally see a blocked process, but this event is rare, giving us an indication that this system isn't I/O bound. Yet one more variable worth tracking is the third column, labeled w. It represents the number of processes that are either runnable or have been idle for a short period of time and have now been swapped out. Frequent nonzero numbers in this column also indicate that the server may be desperately short of RAM. The example looks like it's in good shape on that point.
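
To keep an eye on these columns over time without staring at the output, a simple awk filter can flag worrisome samples. This is only a sketch: the field positions (b second, w third) match the Solaris output in Table 7.2, the thresholds are arbitrary illustrations, and the first few lines (the headers and the since-boot summary) are skipped.

 vmstat 15 | awk '$1 ~ /^[0-9]+$/ && NR > 3 && ($2 > 2 || $3 > 0) { print "possible I/O or memory pressure:", $0 }'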

In the past, one could tell whether a system was memory starved just by looking for swapping activity, as opposed to the more healthy activity of paging. Paging is the process of writing parts of process data to swap space to make room for pages of other data in active memory. An operating system may "page out" part of a process if that page hasn't been accessed in a while, even if the process is running. This efficient behavior allows new processes to start up more quickly because memory reclamations don't need to occur first, and it leaves more room for caching data, leading to better performance. Some amount of paging will occur on all operating systems and is considered normal and healthy.

Swapping usually refers to taking a process and moving its entire memory image to disk. It might happen if the process has remained idle for a very long time (tens of seconds, which is a very long time in computer terms) or if the system desperately needs to make room for new processes. "Desperation swapping" and "thrashing" are terms used to describe a system that is so memory starved that nearly every time a process receives a CPU slice, it must be read in from swap to active memory before it can proceed. This horrible circumstance effectively slows memory access (typically measured in tens of nanoseconds) to disk speeds (measured in milliseconds), a difference of roughly five orders of magnitude. Once a system starts thrashing, it will not operate efficiently. One should aggressively avoid this situation.

Somewhat unfortunately, as virtual memory algorithms have become more complex and sophisticated over the years, it's become more difficult to tell in a vacuum whether a system is thrashing. In fact, many operating systems don't distinguish between paging and swapping, eliminating the latter behavior altogether. Here is where a baseline becomes crucial. One must understand what sort of paging statistics occur on a heavily loaded but properly operating server before one can determine whether a system is beginning to thrash. However, once the disks with swap on them begin to get loaded, it will be painfully obvious that the system has simply run out of memory. Of course, this behavior will occur beyond the point where a server starts to slow down noticeably.

Solaris 8 introduced a new system for managing the buffer cache. Now the page daemon is no longer needed to free up memory used to cache filesystem information. Consequently, the page daemon does not have to do any work to reclaim memory space for new processes. The upshot is that on Solaris 8, if the sr field of vmstat output is nonzero, running processes are being paged to disk to make room for new processes. On this operating system, it has now become more straightforward to identify significant memory deficiencies. Significant activity in the sr field on other operating systems can indicate that the machine is memory starved, but the demarcation point is not as obvious as it is on Solaris 8.

7.3.4 iostat

The iostat tool is similar to vmstat, except that it measures system I/O rather than virtual memory statistics. On many systems, it can measure not only disk-by-disk data transfers, but also I/O information to and from a wide variety of sources, including tape drives, printers, scanners, ttys, and so on.

Table 7.3. Sample iostat -cx 15 Output from the CPU-Bound Test Server

 % iostat -cx 15
                   extended device statistics                       cpu
 device  r/s   w/s  kr/s   kw/s  wait  actv  svc_t  %w  %b   us  sy  wt  id
 sd0     0.0   0.6   0.1    3.0   0.0   0.0    9.2   0   1    0   0   0 100
 sd3     0.0   0.0   0.0    0.2   0.0   0.0   22.8   0   0
                   extended device statistics                       cpu
 device  r/s   w/s  kr/s   kw/s  wait  actv  svc_t  %w  %b   us  sy  wt  id
 sd0     0.5  12.0   1.7   62.0   0.0   0.1   10.9   0  12   27  68   1   4
 sd3     0.0  47.9   0.0  225.8   0.0   0.4    7.4   0  30
                   extended device statistics                       cpu
 device  r/s   w/s  kr/s   kw/s  wait  actv  svc_t  %w  %b   us  sy  wt  id
 sd0     0.7  10.7   1.9   54.3   0.0   0.1   11.0   0  11   29  71   0   0
 sd3     0.2  51.2   0.4  251.9   0.0   0.5   10.0   0  34
                   extended device statistics                       cpu
 device  r/s   w/s  kr/s   kw/s  wait  actv  svc_t  %w  %b   us  sy  wt  id
 sd0     0.7  13.9   2.4   71.1   0.0   0.2   11.1   0  14   29  71   0   0
 sd3     0.9  47.6   1.3  235.7   0.0   0.4    9.2   0  33
                   extended device statistics                       cpu
 device  r/s   w/s  kr/s   kw/s  wait  actv  svc_t  %w  %b   us  sy  wt  id
 sd0     0.7   9.8   2.3   51.3   0.0   0.1   10.2   0  10   30  70   0   0
 sd3     1.0  54.0   1.8  268.9   0.0   0.5    8.3   0  36

Like vmstat, this command displays CPU information in the last set of columns. On many systems, if one specifies no I/O devices, iostat is a good mechanism for tracking CPU usage in scripts; for example, running iostat -c 60 gives basic CPU information every minute on a Linux or Solaris system. As with vmstat, the first line output by the iostat program is a summary since boot time and is effectively useless. Table 7.3 gives some data gathered with iostat while testing earlier examples in this book.

Typically, iostat reports its data as kilobytes per second or transfers per second. In this example, reads and writes per second for each device are listed in the second and third columns, while the amount of data being moved appears in the fourth and fifth columns. Some versions also show how long the average transfer takes, svc_t in this example, which can be a very useful metric for determining loading. If this number starts climbing, the device is becoming heavily loaded.

On Solaris and some recent versions of Linux, the -x flag gives even more valuable information, as in this example, including the average amount of time each request spends in the wait queue and the percentage of time I/O requests are waiting to be serviced by the disk device. These numbers represent some of the best indicators of disk contention in the absence of a baseline, but they're no substitute for one. A disk can be 100% busy and yet the system can still provide adequate service. In our example, we can clearly see that the two disk devices (sd0 contains the message store and sd3 contains the logs and the email queue) are not saturated and, therefore, this system is not I/O bound.

Knowing that a disk always has requests sitting in the wait queue doesn't explain why a change in server behavior has occurred. If kilobytes per second increases while tps remains constant, it would indicate that we're dealing with larger requests, which may alert us to a temporary or permanent change in the type of email flowing through the system.
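
To make that ratio concrete, dividing kilobytes written per second by writes per second yields the average write size. The sketch below assumes the Solaris -x column layout shown in Table 7.3 (w/s in the third field, kw/s in the fifth) and the sd device name prefix from this example; both vary on other systems.

 iostat -x 15 | awk '$1 ~ /^sd/ && $3 > 0 { printf "%s average write: %.1f KB\n", $1, $5 / $3 }'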

On some operating systems, iostat has problems reporting useful information about disks managed by software RAID or from a hardware RAID system. This is especially true for those numbers indicated on a percentage basis. Absolute throughput numbers such as numbers of reads and writes per second or bytes per second compared against a baseline are likely to be more reliable. Because email servers so often become I/O bound, iostat may be the single most important utility in the email administrator's toolkit. Anyone who expects to maintain such a system would be well advised to become very familiar with it.
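
One low-overhead way to build such a baseline is simply to log timestamped iostat output around the clock and archive it. Here is a minimal sketch, reusing the stat timestamping script described earlier; the log path and the 60-second interval are illustrative assumptions.

 # Record one timestamped set of per-device statistics per minute.
 iostat -x 60 | /usr/local/etc/stat >> /var/log/iostat-baseline.log &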

In the operating system used in the examples here (Solaris 2.6), note that the CPU loading information given by the iostat command lists an I/O wait state (the wt column), whereas the vmstat command lumps it in with the idle CPU state (the id column). Someone who looked at just the vmstat output might conclude that the system is not quite CPU bound, whereas this result would become more obvious if the CPU loading information were examined via top or iostat.

7.3.5 netstat

The third tool in the "*stat" trio is netstat. As one would expect, netstat provides information about system networking. It can display either a snapshot of very detailed information about nearly every conceivable network parameter (netstat -s) or periodic data like that found with vmstat or iostat (e.g., netstat -w 5 on BSD systems, netstat -i 5 on Solaris, or netstat -c on Linux).

In its periodic mode, the netstat parameters we want to observe most carefully include the number of packets per second and the number of bytes per second. Both statistics, and especially trends in them, provide the most direct information on the objective external load on a system, so they should be tracked. Changes in the ratio of input to output statistics can also be highly informative.

On some types of shared networks, such as Ethernet, when computers are connected to the network via a hub rather than a switch, two machines could potentially try to send a network packet at the same time. This attempt can result in a collision. Both senders will then wait for a small, random amount of time and try to send their packets again. On a shared network, the number of collisions is a good indicator of general network load. Again, hard and fast numbers are difficult to identify, as they depend on the speed of the network, packet sizes, and the number of other machines on the network, but as a rule of thumb a busy email server should not reside on a network that consistently shows hundreds of collisions per second. On a switched network, no collisions should occur. If they do arise, it might mean that the switch, or the connection between the server and the switch, dropped into a nonswitched mode for some period of time. To avoid this possibility, one can lock network interfaces on switched networks into full-duplex, rather than letting them autonegotiate speed and mode.

The other piece of data of special value from netstat in periodic mode involves the error rates. An error usually indicates that a packet has failed its checksum; that is, its contents don't match what the packet header indicates. An output error indicates that this problem occurred somewhere between the formation of the packet by the operating system and its transmission over the wire. This result is never good. Even a handful of entries in this field can indicate a serious problem with the server's NIC and should be investigated. Input errors are less severe, as a packet might legitimately have become corrupted traveling over a network to the server, but input error rates of even 0.1% may indicate a network problem, such as bad cabling, electrical interference, or a bad NIC. An error rate of 1% means something is seriously wrong with the network somewhere, and this problem should be tracked down and eliminated before it worsens and interferes with operations.
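
To put these counters on a percentage basis quickly, the error counts can be divided by the packet counts from the one-shot netstat -i display. In this sketch the field positions ($5 for input packets, $6 for input errors) match Solaris output and will differ on other systems; note also that the counters are cumulative since boot, so differencing two snapshots taken some time apart gives a better picture of the current rate.

 netstat -i | awk 'NR > 1 && $5 > 0 { printf "%-8s input errors: %.3f%%\n", $1, 100 * $6 / $5 }'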

7.3.6 sar

On System V-derived UNIX versions, you can run the System Activity Reporter (sar) program in the background to gather statistics and accounting information, including much of the data reported by the tools that have already been mentioned in this section. It is an excellent baselining tool, and collecting data every 1 to 15 minutes on a system via sar and archiving those data is something that every server administrator should seriously consider. This effort will be worthwhile on any system where performance monitoring is important.
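
On Solaris, for example, collection is typically enabled by uncommenting the sa1/sa2 entries in the sys user's crontab. The entries below follow that stock layout, but the paths and schedule should be treated as assumptions to verify locally, and the sampling interval tightened to match the 1- to 15-minute suggestion above.

 # Hourly samples around the clock, extra samples during business hours,
 # and a daily summary report (stock Solaris sys crontab layout).
 0 * * * 0-6 /usr/lib/sa/sa1
 20,40 8-17 * * 1-5 /usr/lib/sa/sa1
 5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A

The accumulated binary files (typically under /var/adm/sa) can then be read back later with commands such as sar -u -f to review, say, CPU utilization for a given day.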

Just about every piece of data one could want to examine is available via sar. In fact, it's more likely that one will miss key information due to the presence of too much data than that information on the nature of a given problem isn't available. This tool provides a superset of the information available from vmstat, iostat, netstat, and other utilities. Any performance-critical server administrator should become very familiar with sar and its affiliated utilities.

7.3.7 Other Utilities

Many other utilities could have been mentioned here, such as pstat, lsof, ifconfig, systat, pstack, ad nauseam. They have been omitted not because they lack value, but because a line must be drawn somewhere. Playing around with these other possibilities is worthwhile, with the proviso that before one makes a new utility part of the "canon," it should be demonstrated that more familiar programs already on the system cannot easily generate the same information.

Finally, if for no other reason than to satisfy the reader's curiosity, I'll explain the stat shell script that appeared on the example ps output. This trivial script receives output from commands such as vmstat and iostat that do not indicate the date and time the data were gathered, and adds this information. Thus, instead of

 % vmstat 15
  procs         memory ...
  r b w  swap  free  re ...
  0 0 0  4692  1804   0 ...
  7 1 0 64440  1956   5 ...
  8 0 0 64008  1800   8 ...
  8 0 0 64612  2276  10 ...

we could run

 % vmstat 15 | /usr/local/etc/stat
 020315 15:14:03  procs        memory ...
 020315 15:14:03  r b w  swap  free  re ...
 020315 15:14:03  0 0 0  4692  1804   0 ...
 020315 15:14:18  7 1 0 64440  1956   5 ...
 020315 15:14:33  8 0 0 64008  1800   8 ...
 020315 15:14:48  8 0 0 64612  2276  10 ...

Now data from one source can be matched up in time against data from another source.

The stat script is trivial:

 #!/bin/sh
 # Save the current field separator, then set IFS to null so that read
 # does not collapse the whitespace within each incoming line.
 OLDIFS=$IFS
 IFS=
 while read LINE
 do
         # Print a timestamp (no trailing newline), then the original line.
         echo -n `date "+%y%m%d %H:%M:%S"`
         echo " " $LINE
 done
 IFS=$OLDIFS

IFS is redefined to be null so that the whitespace isn't adjusted when each line of input is collected by the read command.


