Performance Monitoring with `top`

The easiest to use of the process-monitoring utilities is top, so named because it was originally designed to list the top 10 processes currently running on the system, in descending order of CPU usage. By default, the current version of FreeBSD's top shows you every process currently running, in any state (somewhere around 40 processes on a freshly installed FreeBSD system).

The benefit that top provides is that it's interactive and works in real time. When you run it, it takes over your terminal and updates itself every second, giving you instantaneous information about the state of the system at that moment. You can also pass commands to top, such as the kill and renice commands (covered later in this chapter), or give it different options for filtering and sorting the processes it shows you. This makes top an immensely useful tool for reining in an out-of-control server, fine-tuning the performance of certain tasks, or simply keeping an eye on things as you work in another window.

`top` Output Explained

When you run the top program, you get output similar to what's shown in Listing 15.1.

Listing 15.1. Sample Output of `top`

    last pid: 30283;  load averages:  0.51,  0.89,  0.87   up 52+15:48:43  11:19:03
    126 processes: 1 running, 124 sleeping, 1 zombie
    CPU states:  0.7% user,  0.0% nice,  2.8% system,  0.7% interrupt, 95.8% idle
    Mem: 142M Active, 35M Inact, 59M Wired, 7496K Cache, 35M Buf, 4256K Free
    Swap: 500M Total, 48M Used, 452M Free

      PID USERNAME      THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU  COMMAND
    30283 bob             1   2    0  1360K   976K sbwait 1   0:00  6.02%  qpopper
    30282 root            1  29    0  2076K  1236K CPU1   0   0:00  3.14%  top
    19460 mysql          11   2    0 25908K  2692K poll   0 112:20  2.20%  mysqld
      245 root            1   2    0   868K   232K select 1 177:39  0.00%  healthd
    18427 root            1   2    0  7592K  5092K select 0  80:24  0.00%  named
    86694 frank           1  10    0  1700K    56K nanslp 0  76:46  0.00%  elm
       86 root            1   2  -12  1296K   412K select 0   6:28  0.00%  ntpd
    80717 root            1  10    0  2132K   472K nanslp 0   5:53  0.00%  telnetd
    61945 root            1   2    0  1868K   360K select 0   3:29  0.00%  inetd
       80 root            1   2    0   916K   320K select 0   3:26  0.00%  syslogd
    56054 root            1   2    0  2168K   584K select 0   3:12  0.00%  sshd
    73772 root            1   2    0  8956K  2540K select 0   3:12  0.00%  httpd
    40567 www             1   2    0  9880K  2872K sbwait 1   0:57  0.00%  httpd
    40581 www             1   2    0 10008K  3796K sbwait 0   0:55  0.00%  httpd

By default, top shows all the system's processes (no matter who owns them), whether they're active, idle, or in "zombie" mode, and how much CPU time they're taking up. The first useful bit of information is in the second line: the number of processes. This number varies from system to system, but chances are that many more processes are currently running than can fit on your screen. You can press the I ("eye") key to switch top into displaying only the processes that are active. Because the top program is interactive, there are a number of other commands you can issue while it is running, such as the K key followed by a process ID to kill a process. We will cover these commands a little later.

The next items to notice in the top output are the "load averages," which are fairly opaque metrics you can use to tell at a glance how busy the system is. Strictly speaking, each value is the average number of jobs in the run queue (running or waiting for the CPU) over the last 1, 5, and 15 minutes, respectively; however, it's difficult to relate this to a profile of a system running real-world applications that vary greatly in resource consumption from job to job. A load of 1 generally means that the system is processing each job as it comes in, as though each one were a single person in line at the post office; if the load is higher than 1, it means processes, like post office customers, are stacking up in line and the system is becoming more congested.
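The same three figures can be read from a script. The following is a minimal sketch, not part of top itself: it uses Python's standard `os.getloadavg()` call, and the helper name `describe_load` and the idea of dividing the load by the number of CPUs are assumptions added for illustration.

```python
import os

def describe_load(load_avg, ncpu=1):
    """Classify a load average relative to the number of CPUs (an assumption)."""
    # At or below 1 per CPU, jobs are handled as they arrive;
    # above 1, runnable processes are waiting in line.
    return "keeping up" if load_avg / ncpu <= 1.0 else "backing up"

if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()  # the 1-, 5-, and 15-minute averages
    ncpu = os.cpu_count() or 1
    for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
        print(f"{label}: {value:.2f} ({describe_load(value, ncpu)})")
```

With the values from Listing 15.1 (0.51, 0.89, 0.87), all three averages would fall in the "keeping up" category even on a single-CPU machine.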

Load Averages

Eventually, you will get a feel for what constitutes a high load average on your system; typically, a load should not go above 1 on a continuous basis for a server, although a desktop system with lots of graphical tools will bump it up into the 2 to 3 range (5 is considered an unsustainably high load). Certain daemons stop accepting new requests at a certain load level (for example, 12 for Sendmail). If the load reaches 20 or 30, chances are the system is caught in a feedback loop, in which new processes are being created faster than the system can complete them. This only serves to slow it down more and drive the load higher, in what's not-so-affectionately termed a death spiral.

This is one of those rare times when you might have to reboot a UNIX system, because a server under this kind of load can become so tightly wedged that it will never recover, or it will take so long to complete all its processes and return to normal duty that it's faster to just reboot. Either way, your remote Telnet or SSH session might be unresponsive in this condition, or you might not be running one at the time the death spiral occurs (in which case, the system probably won't be able to open a new connection for you to come to the rescue). This is when logging in to the physical console (or even power-cycling the machine, or shutting it down physically and rebooting, as an absolute last resort) might be the only recourse.

The header block contains more information about the RAM in the system than you'll probably ever find useful. You won't find a simple "used/free" graph of all available RAM here; instead, you see the states of all chunks of memory in the fourth and fifth lines in Listing 15.1.

Note

Here is where FreeBSD's robust memory management system shows its ugly underbelly. In UNIX, there is no such thing as a simple, clear division between memory that is used and memory that isn't used. The amount of RAM you have installed in your system is a nice figure to know, but it will never have any bearing on your day-to-day usage: you can't just add up the memory requirements of every application and calculate how many such programs you can fit into the RAM you've got installed. Because UNIX's memory model is heavily dependent on virtual memory (free space on the hard disk used for caching inactive data from RAM, also known as swap), you can actually run far more applications than you'd think would normally fit into RAM. The only drawback is a decrease in speed as more data (that needs to be accessed more often) gets paged into swap. See Chapter 2, "Installing FreeBSD," for a discussion of ways to optimize your swap partition for maximum efficiency, such as putting it near the edge of the disk.

Don't look at the Free block and assume that it represents all the memory available in the system. That block is only the memory that hasn't yet been used at all since the system was last brought online. What you should be looking at is the Active block because that describes memory in use by active processes, that is, programs that are currently running and not idle. The rest of the fields describe other states of use that may or may not be mutually exclusive, so adding up all the fields won't necessarily give you the amount of RAM you have. It will, in fact, probably add up to more.
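To see what the individual fields come to, the Mem line can be split apart and converted to a common unit. This is a small sketch that assumes the line format shown in Listing 15.1 (a number, a K/M/G suffix, and a field name); the function name `parse_mem_line` is made up for this example.

```python
import re

# Multipliers to express every size in kilobytes.
UNITS = {"K": 1, "M": 1024, "G": 1024 * 1024}

def parse_mem_line(line):
    """Return {field: kilobytes} for a top 'Mem:' header line."""
    fields = {}
    for size, unit, name in re.findall(r"(\d+)([KMG])\s+(\w+)", line):
        fields[name] = int(size) * UNITS[unit]
    return fields

mem = parse_mem_line(
    "Mem: 142M Active, 35M Inact, 59M Wired, 7496K Cache, 35M Buf, 4256K Free")
for name, kb in mem.items():
    print(f"{name:<7}{kb:>8}K")
print(f"{'Sum':<7}{sum(mem.values()):>8}K")
```

Comparing the printed sum against the RAM actually installed in the machine shows how far the fields are from a simple used/free split.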

The Swap fields are more straightforward. Here, data is paged in and out of the virtual memory space as needed (copied to the disk and out of RAM), and usually the only fields that top shows are Total, Used, and Free. The numbers here add up predictably. It's probably more useful to look at the Swap fields than at the actual RAM fields to see how well your system is doing; if there's a lot of data in Swap (50 percent or more used), it means that data has been paged out fairly recently as a result of your physical RAM being full, and you may want to consider adding more memory. A FreeBSD system rarely runs out of swap space. If it does, as with most UNIX implementations, the results will usually be benign (you'll see error messages, but the system won't destabilize). The occasional unpredictable behavior or instability will surface, however. You'll want to keep your swap as little used as possible, both for this reason and because everything naturally runs faster in RAM than in swap.
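The 50 percent rule of thumb is easy to check mechanically. The sketch below assumes a Swap line in the format shown in Listing 15.1; `swap_used_percent` is a hypothetical helper, not something top provides.

```python
import re

# Multipliers to express every size in kilobytes.
UNITS = {"K": 1, "M": 1024, "G": 1024 * 1024}

def swap_used_percent(line):
    """Percentage of swap in use, from a top 'Swap:' header line."""
    sizes = {name: int(num) * UNITS[unit]
             for num, unit, name in re.findall(r"(\d+)([KMG])\s+(\w+)", line)}
    return 100.0 * sizes["Used"] / sizes["Total"]

pct = swap_used_percent("Swap: 500M Total, 48M Used, 452M Free")
print(f"swap used: {pct:.1f}%")
if pct >= 50:
    print("half or more of swap is in use; consider adding RAM")
```

For the sample system, 48M used out of 500M is a little under 10 percent, comfortably below the threshold.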

Next, notice that the processes are listed in descending order in the WCPU column. This column lists how much of the CPU's cycles are being used currently by each process (using a "weighted" scale, taking into account CPU cycles in which the process was in a "resident" state). Don't expect the column to add up to 100%; your CPU will only be lightly used most of the time, and most of the CPU's cycles will be unused. Take a look at the headers again; the CPU states line tells you how much of the processor is being used in each of the five possible states (user, nice, system, interrupt, and idle), and you can relate these values fairly closely to the percentages in the WCPU column.
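As a rough illustration of reading the CPU states line, the sketch below splits it into its named percentages and derives a single "busy" figure as everything that is not idle. The line format is taken from Listing 15.1; the function name `parse_cpu_states` is an assumption for this example.

```python
import re

def parse_cpu_states(line):
    """Return {state: percent} from a top 'CPU states:' header line."""
    return {name: float(pct)
            for pct, name in re.findall(r"([\d.]+)%\s+(\w+)", line)}

states = parse_cpu_states(
    "CPU states:  0.7% user,  0.0% nice,  2.8% system,  0.7% interrupt, 95.8% idle")
busy = 100.0 - states["idle"]  # everything the CPU spent not idling
print(f"busy: {busy:.1f}%  idle: {states['idle']:.1f}%")
```

On the sample system the busy share is only a few percent, which is why the WCPU column comes nowhere near 100%.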

Note

Some programs are designed to use 100 percent of the CPU, unless actively throttled by the configuration. For example, Qmail (an SMTP daemon that we will cover in Chapter 25, "Configuring Email Services") or a database back-end such as MySQL might run to 100 percent of the system's capacity during heavy load. This is normal behavior and should not cause concern if the system's primary role is in running those programs.

The CPU operates in discrete cycles, many millions per second (depending on its speed). Each of these cycles is dedicated to some part of some process, and over time a process will have used enough of these cycles to add up to a number measurable in seconds. This is what the TIME column tells you. Don't let the colon separator fool you into thinking that it's a wall-clock hours:minutes reading; the values in the TIME column represent the CPU time that the process has used in system and user states combined, displayed as minutes:seconds. It may take minutes or hours of real time for a process to use enough cycles to accumulate a measurable number. If a process has a large value (such as mysqld in the sample output in Listing 15.1), it's usually because the process has been running for weeks or it has become a runaway and has been taking up some huge percentage of the CPU during its runtime. In the latter case, you can easily check by looking at the WCPU column.

The next parts of top's output that you should understand are the SIZE and RES columns. SIZE is the entirety of a process's allocated size, including the text, data, and stack components. Because parts of these components are shared systemwide, this column is not accurate for seeing how much memory a process is using. Instead, RES shows the resident memory value (this column should add up to the current amount of in-use memory). Both size values are "correct" in their own way, but you should use RES for determining the "traditional" amount of memory a process uses, the equivalent to what it would be reported as using in Windows or classic Mac OS.
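To compare the two columns side by side, the process rows can be split on whitespace. This sketch assumes the exact column layout of Listing 15.1 (SIZE in the sixth column, RES in the seventh); other versions of top arrange columns differently, and the helper names `to_kb` and `parse_row` are invented here.

```python
def to_kb(text):
    """Convert a top size such as '2692K' or '35M' to kilobytes."""
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    return int(text[:-1]) * units[text[-1]]

def parse_row(row):
    """Pick out PID, SIZE, RES, and COMMAND from one process row."""
    cols = row.split()
    return {"pid": int(cols[0]), "size_kb": to_kb(cols[5]),
            "res_kb": to_kb(cols[6]), "command": cols[-1]}

# Three rows copied from Listing 15.1.
rows = [
    "19460 mysql          11   2    0 25908K  2692K poll   0 112:20  2.20%  mysqld",
    "18427 root            1   2    0  7592K  5092K select 0  80:24  0.00%  named",
    "40581 www             1   2    0 10008K  3796K sbwait 0   0:55  0.00%  httpd",
]
procs = [parse_row(r) for r in rows]
for p in procs:
    print(f"{p['command']:<8} SIZE {p['size_kb']:>6}K  RES {p['res_kb']:>5}K")
total_res = sum(p["res_kb"] for p in procs)
print(f"total resident: {total_res}K")
```

For mysqld, the gap between SIZE and RES is nearly tenfold, which shows how misleading SIZE alone can be.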

The rest of the fields in top are less important or are self-explanatory. The C column tells you which CPU a process is using if your system has more than one. PID is the process ID, a number that is assigned to each process upon execution, and USERNAME is the user who executed the process. STATE tells you which of the possible states a process is in, which isn't very informative unless it's zomb or zombie (which refers to a child process that has terminated but has not yet fully given up its process table space).

Using Interactive `top` Commands

You also can give top commands interactively to help sort through the information it gives you. Earlier in this section, you learned that you can press the I key to show only active processes. You also can press the U key to be prompted for a username; top then displays only processes owned by that username (use + as the username to show them all again). You can issue a kill command with the K key, which then prompts you for a PID to kill. The T key toggles whether the top process itself is displayed. These and other options are listed in the top man page (man top).

With this feature set, top serves as a very good all-around summary of what's going on in the system, and it allows you to handle the majority of the process-management tasks you'll have to perform. But top isn't a total solution; it doesn't give you detailed information about the processes themselves, and its interactive nature keeps top from being a scriptable tool or something that can be used in conjunction with pipes and other programs. For these functions, you use ps.