15.2 Monitoring and Controlling Processes

Unix provides the ability to monitor process execution and, to a limited extent, specify execution priorities. By doing so, you can control how CPU time is allocated and (indirectly) how memory is used. For example, you can expedite certain jobs at the expense of all others, or you can maintain interactive response times by forcing large jobs to run at lowered priority. This section discusses Unix processes and the tools available for monitoring and controlling process execution.

The uptime command gives you a rough estimate of the system load:

% uptime
 3:24pm up 2 days, 2:41, 16 users, load average: 1.90, 1.43, 1.33

uptime reports the current time, how long the system has been up, and three load average figures. The load average is a rough measure of CPU use. These three figures report the average number of processes active during the last minute, the last five minutes, and the last 15 minutes. High load averages usually mean that the system is being used heavily and the response time is correspondingly slow. Note that the system's load average does not take into account the priorities of the processes that are running.

What's high? As usual, that depends on your system. Ideally, you'd like a load average under about 3-5 (per CPU), but that's not always possible given the workload that some systems are required to handle. Ultimately, "high" means high enough that you don't need uptime to tell you that the system is overloaded; you can tell from its response time.

Furthermore, different systems behave differently under the same load average. For example, on some workstations, running a single CPU-bound background job at the same time as X Windows will bring interactive response to a crawl even though the load average remains quite low. A low load average is no guarantee of a fast response time, because CPU availability is just one factor affecting overall system performance. You can generally expect to see higher typical load averages on server systems than on single-user workstations.
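As a rough illustration of the per-CPU rule of thumb, the following sketch pulls the three load averages out of a line of uptime output and divides the one-minute figure by a CPU count. The function names are hypothetical, and the CPU count is simply passed in by hand; the sample line is the one shown earlier.

```shell
#!/bin/sh
# Sketch: extract load averages from an uptime-style line (hypothetical helpers).

# Print the three load averages (1, 5, and 15 minutes) separated by spaces.
load_averages() {
    # Keep only what follows "load average:" (or "load averages:"), drop commas.
    sed -e 's/.*load averages\{0,1\}: *//' -e 's/,/ /g' | awk '{ print $1, $2, $3 }'
}

# Divide a load average by the number of CPUs.
# $1 = load average, $2 = CPU count (supplied by the caller)
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

sample=' 3:24pm up 2 days, 2:41, 16 users, load average: 1.90, 1.43, 1.33'
set -- $(printf '%s\n' "$sample" | load_averages)
echo "1-minute load: $1; per CPU on a 2-CPU system: $(load_per_cpu "$1" 2)"
```

On a live system you would feed the helper real output, as in `uptime | load_averages`. Keep in mind that the result says nothing about process priorities or about bottlenecks other than the CPU.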

15.2.1 The ps Command

The ps command gives a more complete picture of system activity. This utility produces a report summarizing execution statistics for current processes. The command's options control which processes are listed and what information is displayed about each one. The format of the command differs considerably between the BSD and System V forms.

To obtain an overall view of current system activity, the most useful form of the BSD-style command is ps aux, which produces a table of all processes, arranged in order of decreasing CPU usage at the moment when the ps command was executed.^[2] It is often useful to pipe this output to head, which displays the most active processes:

^[2] Linux, FreeBSD, AIX, and Tru64 provide the BSD form of ps. Under AIX and Tru64, the ps command supports both BSD and System V options. The BSD options are not preceded by a hyphen (which is a legal syntax variation under BSD), and the System V options do include a hyphen. Thus, for these Unix versions, ps -au does not equal ps au.

Even in this mode, however, the AIX command is the System V version, even if its output is displayed with BSD column headings. Thus, ps aux output is displayed in PID rather than %CPU order. Solaris also provides a somewhat BSD-like ps command in /usr/ucb (which uses System V column headings).

% ps aux | head -5
USER      PID  %CPU  %MEM   SZ   RSS  TTY STAT  TIME COMMAND
harvey  12923  74.2  22.5  223   376  p5  R     2:12 f77 -o test test.F
chavez  16725  10.9  50.8 1146  1826  p6  R N  56:04 g04 HgO.dat
wang    17026   3.5  1.2   354   240  co  I     0:19 vi benzene.txt
marj     7997   0.2  0.3   142    46  p3  S     0:04 csh

The meanings of the fields in this output (as well as others displayed by the -l option to ps) are given in Table 15-2.

The first line in the previous example shows that user harvey is running a Fortran compilation. This process has PID 12923 and is currently running or runnable. User chavez's process (PID 16725), executing the program g04, is also running or runnable, though at a lowered priority. From this display, it's obvious who is using the most system resources at this instant: harvey and chavez have about 85% of the CPU and 73% of the memory between them. However, although it does display total CPU time, ps does not average the %CPU or %MEM values over time in any way.^[3]

^[3] This describes the true BSD definition for these fields. However, many System V-based operating systems fudge them even when they provide a BSD-compatible ps command. Under Linux, AIX, and Solaris, the %CPU column has a different meaning: it indicates the ratio of CPU time to elapsed time for the entire lifetime of each process, a very different statistic than current CPU usage.
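The per-user totals quoted above can be checked mechanically. The following is a minimal sketch, assuming BSD-style ps aux output in which columns 1, 3, and 4 are USER, %CPU, and %MEM; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: add up the %CPU and %MEM columns of "ps aux"-style output per user.
# Assumes column 1 = USER, column 3 = %CPU, column 4 = %MEM, plus a header line.
usage_by_user() {
    awk 'NR > 1 { cpu[$1] += $3; mem[$1] += $4 }
         END { for (u in cpu) printf "%s %.1f %.1f\n", u, cpu[u], mem[u] }' | sort
}

# Sample data copied from the listing in the text.
usage_by_user <<'EOF'
USER      PID  %CPU  %MEM   SZ   RSS  TTY STAT  TIME COMMAND
harvey  12923  74.2  22.5  223   376  p5  R     2:12 f77 -o test test.F
chavez  16725  10.9  50.8 1146  1826  p6  R N  56:04 g04 HgO.dat
wang    17026   3.5  1.2   354   240  co  I     0:19 vi benzene.txt
marj     7997   0.2  0.3   142    46  p3  S     0:04 csh
EOF
```

Each output line gives a username followed by that user's summed %CPU and %MEM. Adding the lines for harvey and chavez yields the 85% and 73% figures. On a live system the input would come from `ps aux | usage_by_user`, subject to the caveats about the meaning of %CPU on your particular system.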

Table 15-2. ps command output
Column	Contents
USER (BSD), UID (System V)	Username of process owner.
PID	Process ID.
%CPU	Estimated fraction of CPU consumed (FreeBSD and Tru64); CPU time/elapsed time (AIX, Solaris, and Linux).
%MEM	Estimated fraction of system memory consumed (BSD-style); the estimates are sometimes quite poor
SZ	Virtual memory used in KB (BSD) or pages (System V)
RSS	Physical memory used (in same units as SZ)
TT, TTY	TTY associated with process.
STAT (BSD), S (System V)	Current process state; one (or more, under BSD) of the following: R = Running or runnable; S = Sleeping; I = Idle (BSD), Intermediate state (System V); T = Stopped; Z = Zombie process; D (BSD) = Disk wait; X (System V) = Growing: waiting for memory; K (AIX) = Available kernel process; W (BSD) = Swapped out; N (BSD) = Niced: execution priority lowered; < (BSD) = Niced: execution priority artificially raised.
TIME	Total CPU time used.
COMMAND	Command line being executed (truncated).
STIME (System V), STARTED (BSD)	Time or date process started.
F	Flags associated with process (see the `ps` manual page).
PPID	Parent's PID.
NI	Process nice number.
C (System V), CP (BSD)	Short term CPU-use factor; used by scheduler for computing the execution priority (PRI).
PRI	Actual execution priority (recomputed dynamically).
WCHAN	Specifies the event the process is waiting for.

A vaguely similar listing is produced by the System V ps -ef command:

$ ps -ef
UID        PID  PPID   C STIME    TTY      TIME CMD
root         0     0   0 09:36:35 ?        0:00 sched
root         1     0   0 09:36:35 ?        0:02 /etc/init
...
marj      7997     1  10 09:49:32 ttyp3    0:04 csh
harvey   12923 11324   9 10:19:49 ttyp5   56:12 f77 -o test test.F
chavez   16725 16652  15 17:02:43 ttyp6   10:04 g04 HgO.dat
wang     17026 17012  14 17:23:12 console  0:19 vi benzene.txt

The columns hold the username, process ID, parent's PID (the PID of the process that created it), the current scheduler value, the time the process started, its associated terminal, its accumulated CPU time, and the command it is running. Note that the ordering is by PID, not resource usage. This form of ps is supported under Solaris, HP-UX, AIX, and Tru64. ps is also useful in pipes; a common use is:

% ps aux | grep chavez

This command lists the processes user chavez currently has running.
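A plain grep matches the string anywhere on the line, so it can also pick up other users' processes whose command lines happen to mention chavez (and sometimes the grep process itself). A stricter filter compares only the USER column. This is a minimal sketch with a hypothetical function name; the second sample line is invented to show the difference.

```shell
#!/bin/sh
# Sketch: keep the header plus the lines whose first column is exactly the
# requested username.
# $1 = username
procs_for_user() {
    awk -v user="$1" 'NR == 1 || $1 == user'
}

# The wang line below is made up for illustration: grep chavez would match it,
# but the column comparison does not.
procs_for_user chavez <<'EOF'
USER      PID  %CPU  %MEM   SZ   RSS  TTY STAT  TIME COMMAND
chavez  16725  10.9  50.8 1146  1826  p6  R N  56:04 g04 HgO.dat
wang    17030   0.0   0.1   98    40  co  S     0:00 mail chavez
EOF
```

Typical use would be `ps aux | procs_for_user chavez`.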

You can use the sort command in conjunction with the System V version of ps to extract performance-related data from its process listings. For example, the following command finds processes using large amounts of memory (shown in the SZ field):

$ ps -el | head -1 ; ps -el | sort -nr -k10 | head -5
       F S UID    PID   PPID   C PRI NI SZ      ...     TIME  CMD
  240001 A 603 630828 483460 240 120 20 9711568     29530:42  l703.exe
  240001 A 603 573616 540786 240 120 20 9710404     29516:30  l802.exe
  240001 A   0 221240 139322   0  60 20 6140           25:50  X
  240001 A   0 303204 270428   0  60 20 2004            0:32  sendmail
  240001 A   0 458898 270428   0  60 20 1996            0:07  IBM.Errmd

Some columns have been removed from this output for space reasons.

15.2.2 Other Process Listing Utilities

There are several useful, free system monitoring tools. In this section, we'll look at pstree and top.

pstree displays system processes in a tree-like structure, and it is accordingly useful for illuminating the relationships between processes and for a quick, pictorial snapshot of what is running on the system. pstree was written by Werner Almesberger. It can be found by itself on many network sites and as part of the psmisc package (ftp://sunsite.unc.edu/pub/Linux/system/status/ps). It is included by default on Linux, and FreeBSD includes it among the additional packages on the installation CDs.^[4]

^[4] Solaris has a vaguely similar utility named ptree.

Here is an example of its output:

$ pstree
init-+-alarmd
     |-anacron
     |-apmd
     |-atd
     |-crond
     |-gpm
     |-inetd-+-in.rlogind---bash---vi                    Two remote users.
     |       `-in.rlogind---bash---mkps---gbmat-+-grops
     |                                          |-gtbl
     |                                          `-gtroff
     |-kapm-idled
     |-7*[kdeinit]
     |-kdeinit-+-kdeinit                                 KDE clients.
     |         `-kdeinit---bash-+-pstree
     |                          |-xclock
     |                          |-xterm---tcsh---ls
     |                          `-2*[xterm---rlogin]
     |-kdeinit---cat
     |-keventd
     |-khubd
     |-kjournald
     |-klogd
     |-login---bash---startx---xinit-+-X                 X windows main processes.
     |                               `-startkde---ksmserver
     |-mdrecoveryd
     |-5*[mingetty]
     |-portmap
     |-rpc.statd
     |-sendmail
     |-sshd
     |-syslogd
     |-vmware-guestd
     |-xfs
     `-xinetd---fam

In general, all processes are listed by command name, and child processes appear to the right of their parent process. Thus, init appears at the extreme left of the display, appropriately, because it is the ultimate parent of every other process. The notation:

n*[command]

indicates that there are n processes running command. The sample output shows five mingetty processes.
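If pstree is not installed, the same kind of count can be approximated from a plain list of command names. The following is only a sketch: the function name is hypothetical, it assumes one command name per input line, and it assumes names contain no spaces.

```shell
#!/bin/sh
# Sketch: collapse a list of command names (one per line) into pstree-like
# "n*[command]" entries. Names that occur once are printed unchanged.
count_commands() {
    sort | uniq -c |
        awk '{ if ($1 > 1) printf "%d*[%s]\n", $1, $2; else print $2 }'
}

# Sample input standing in for real process names.
printf '%s\n' mingetty sshd mingetty mingetty syslogd mingetty mingetty |
    count_commands
```

On systems whose ps accepts the POSIX -e and -o options, the input could come from `ps -e -o comm=`. Unlike pstree, this shows counts only, not parent-child relationships.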

On this system, there are three groups of user processes:

A local user running X and several clients: the KDE window manager, xclock, two xterm windows onto remote systems, and a local xterm window running the tcsh shell. These processes are displayed on the second and third annotated groups of lines in the output.
A remote user running the bash shell and the vi editor (the annotated line headed by "inetd").
Another remote user running three GNU text processing utilities (the three lines making up the second branch "in.rlogind" under "inetd").

The remainder of the lines in the display are the usual system processes.

The top utility provides a continuous display of the system status and most active processes, which it automatically updates every few seconds. Versions of top are included with FreeBSD, HP-UX, Linux, and Tru64. The utility was written by William LeFebvre and is available from http://www.groupsys.com/top/.

Here is a snapshot of the display from a Linux system:

 6:19pm  up 13 days, 23:42,  1 user,  load average: 0.03, 0.03, 0.00
28 processes: 27 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 7.7% user, 14.7% system,  0.0% nice, 77.6% idle
Mem: 6952K av, 6480K used,   472K free,  3996K shrd,  2368K buff
Swap: 16468K av,  2064K used, 14404K free

  PID USER   PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM  TIME COMMAND
 1215 chavez  14   0  8908 8908  7940 S     1.1  9.4  0:03 kdeinit
 1106 chavez  14  -1 12748 9420  1692 S <   0.9  9.9  0:14 X
 1262 chavez  16   0  1040 1040   836 R     0.9  1.1  0:00 top
 1201 chavez   9   0 10096 9.9M  9024 S     0.1 10.6  0:02 kdeinit
    1 root     8   0   520  520   452 S     0.0  0.5  0:04 init
    2 root     9   0     0    0     0 SW    0.0  0.0  0:00 keventd
  ...

The first five lines give general system information: uptime statistics, process counts by state, and current CPU, memory, and swap space usage. The rest of the display consists of output similar to that provided by various options to ps (with similar column headings), arranged in order of decreasing current CPU usage. In top displays, the %CPU column indicates very recent CPU consumption for each process (over the last minute or less of elapsed time).

The HP-UX version of top is display-only. By default, the top display is updated every five seconds. You can change that interval using these command forms:

FreeBSD	`top -s 8`
Linux	`top d8`
HP-UX	`top -s 8`
Tru64	`top -s 8`

All of these examples set the update interval to eight seconds. top runs continuously until you press the q key.

Most versions of top also allow you to interact with the processes that are being displayed. Pressing the k or r key allows you to kill or renice a process, respectively (these actions are discussed in detail later in this chapter). In both cases, top will prompt you for the PID of the process that you want to affect.

15.2.3 The /proc Filesystem

All of the Unix versions we are considering except HP-UX support the /proc filesystem. This is a pseudo filesystem whose files are actually views into parts of kernel memory and its data structures.

On most systems, the /proc filesystem consists entirely of numbered files or subdirectories under /proc, each named for the corresponding process's PID. When these items are subdirectories, the available information about each process is divided among several files located within it. Here is an example from a Linux system:

$ ls /proc/1234
cmdline  cwd  environ  exe  fd  maps  mem  root  stat  statm  status

The per-process information contained in the /proc filesystem is generally available in other ways (e.g., via the ps command).
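For scripts, reading these files directly can still be convenient. The following sketch extracts a single field from a Linux-style status file, which consists of "Name: value" lines. The function name is hypothetical, and the field name is assumed to contain no regular expression metacharacters.

```shell
#!/bin/sh
# Sketch: print one field from a Linux-style /proc/PID/status file.
# $1 = path to a status file, $2 = field name (for example, State or Pid)
proc_status_field() {
    sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1
}

# Report the state of the current shell, if this system has such a file.
if [ -r "/proc/$$/status" ]; then
    echo "this shell: $(proc_status_field "/proc/$$/status" State)"
fi
```

The same information is available from ps; the point is only that the per-process files are ordinary text that standard tools can read.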

Linux systems extend the /proc filesystem to include many other files and subdirectories that hold a great many system settings and current system data. For example, the cpuinfo file contains information about the processor on the computer:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 497.847
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep
                  mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 992.87

These are some of the most useful files under /proc:

devices: Major and minor device number.
filesystems: Filesystems supported by the current kernel.
meminfo: Memory usage and configuration statistics.
modules: Loaded kernel modules.
pci: List of detected PCI devices and their configurations.
scsi/scsi: List of detected SCSI devices and their configurations.
version: Linux version of the currently running kernel (long version). The file /proc/sys/kernel/osrelease lists only the numeric Linux kernel version string.

There are many, many more files in the /proc tree. However, I consider many of them to be of marginal use to those who are not programmers or script writers, because their information is available in a more convenient, prettier form via standard Unix commands.

In addition, the sys subdirectory tree provides access to kernel variables. Some of these files can be modified to change the corresponding system value. For example, the file kernel/panic holds the number of seconds to wait before rebooting after a kernel panic. These commands change the default value of 0 (immediately) to 60 seconds:

# cd /proc/sys/kernel
# cat panic
0
# echo "60" > panic

Changing kernel variables always carries associated risk. Experiment on nonproduction systems.

Such changes do not persist across boots, so you'll need to place such commands into a boot script to make them permanent.
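A boot-script fragment for this purpose might look like the following sketch. The function name and its optional third argument (an alternate root directory, useful for trying the script out without touching the real /proc/sys) are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: write a value into a file under /proc/sys and report the change.
# $1 = path relative to the root (for example, kernel/panic)
# $2 = new value
# $3 = root directory (optional, defaults to /proc/sys)
set_kernel_var() {
    file="${3:-/proc/sys}/$1"
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi
    old=$(cat "$file")
    echo "$2" > "$file"
    echo "$1: $old -> $(cat "$file")"
}

# In a boot script, run as root:
#   set_kernel_var kernel/panic 60
```

Reading the value back after writing it is a cheap check that the kernel actually accepted the new setting.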

15.2.4 Kernel Idle Processes

Occasionally, you may see processes that seem to have accumulated a staggering amount of both CPU time and short-term CPU usage, as in these examples:

AIX
USER PID  %CPU %MEM   SZ RSS  TTY STAT STIME      TIME COMMAND
root 516  99.2  0.0   20 20   -   A    Mar 18  6028:47 kproc

Tru64
USER PID  %CPU %MEM   SZ RSS  TTY STAT STIME      TIME COMMAND
root   0   0.0  7.7 396M 17M  ??  R    Jan 23 49:46.53 [kernel idle]

Both listed processes are kernel idle processes, which indicate how much idle time (available CPU cycles that went unused) has accumulated since the last system reboot. On AIX systems, there are usually multiple kproc processes (and not all of them are necessarily idle). In any case, such processes are no cause for concern.

15.2.5 Process Resource Limits

Unix provides very simple process resource limits. These are the limits that may be defined:

Total accumulated CPU time
Largest file that may be created (whether created from scratch or by extending an existing file)
Maximum size of the data segment of the process
Maximum size of the stack segment of the process
Maximum size of a core file (created when a program bombs)
Maximum amount of memory that may be used by the process

Resource limits are divided into two types: soft and hard. Soft limits are resource use limits currently applied by default when a new process is created. A user may increase these values up to the systemwide hard limits, beyond which only the superuser may extend them. Hard limits are thus defined as absolute ceilings on resource use.

The C shell and tcsh have two built-in commands for displaying and setting resource limits. The limit command displays current resource limits. The hard limits may be displayed by including the -h option on the limit command:

% limit                                 % limit -h
cputime          1:00:00                cputime          unlimited
filesize         1048575 kbytes         filesize         unlimited
datasize         65536 kbytes           datasize         3686336 kbytes
stacksize        4096 kbytes            stacksize        262144 kbytes
coredumpsize     1024 kbytes            coredumpsize     unlimited
memoryuse        32768 kbytes           memoryuse        54528 kbytes

The bash and ksh equivalent command is ulimit (also supported in some Bourne shells). The -a and -Ha options will display the current soft and hard limits respectively; for example:

$ ulimit -a                             $ ulimit -Ha
time(seconds)        3600               time(seconds)        unlimited
file(blocks)         2097151            file(blocks)         2097151
data(kbytes)         65536              data(kbytes)         257532
stack(kbytes)        4096               stack(kbytes)        196092
memory(kbytes)       32768              memory(kbytes)       unlimited
coredump(blocks)     1024               coredump(blocks)     unlimited

Table 15-3 lists the commands that set the values of resource limits. They would usually be placed in users' login initialization files.^[5]

^[5] There is also a PAM module for setting limits.

Table 15-3. Setting per-process resource limits
Resource	csh and tcsh	bash and ksh
CPU time	`limit cputime` `secs`	`ulimit -t` `secs`
Maximum file size	`limit filesize` `KB`	`ulimit -f` `KB`
Maximum process data segment	`limit datasize` `KB`	`ulimit -d` `KB`
Maximum process stack size	`limit stacksize` `KB`	`ulimit -s` `KB`
Maximum amount of physical memory	`limit memory` `KB`	`ulimit -m` `KB`
Maximum core file size	`limit coredumpsize` `KB`	`ulimit -c` `KB`
Maximum number of processes^[6]		`ulimit -u` `n`
Maximum amount of virtual memory^[6]		`ulimit -v` `KB`

^[6] bash only.

For example, the following commands increase the current CPU time limit to its maximum value and increase the memory use limit to 64 MB:

bash and ksh	C shell and tcsh
$ ulimit -t unlimited	% limit cputime unlimited
$ ulimit -m 65536	% limit memory 65536
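The relationship between the two kinds of limits can be seen by raising a soft limit to exactly the value of the corresponding hard limit, which any user is allowed to do. This is a minimal sketch for bash or ksh; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: raise the soft limit for one resource to its hard limit.
# $1 = the ulimit option letter for the resource (t = CPU time, c = core size, ...)
raise_to_hard() {
    ulimit -S "-$1" "$(ulimit -H "-$1")"
}

# Run in a subshell (the parentheses) so the calling shell keeps its own limits.
(
    raise_to_hard t
    echo "CPU time: soft $(ulimit -S -t), hard $(ulimit -H -t)"
)
```

The -S and -H options select the soft and hard values, respectively. An ordinary user who asks for more than the hard limit will simply get an error.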

Now for the bad news. On most Unix systems, resource limits are poorly implemented from an administrative standpoint, for several reasons. First, the hard limits are often hard-wired into the kernel and cannot be changed by the system administrator. Second, users can always change their own soft limits. All an administrator can do is place the desired commands into users' .profile or .cshrc files and hope. Third, the limits are on a per-process basis. Unfortunately, many real jobs consist of many processes, not just one. There is currently no way to impose limits on a parent process and all its children. Finally, in many cases, limits are not even enforced; this is most often true of the ones you probably care about the most: CPU time and memory use. You'll need to experiment to find out which ones are enforced on your system.

FreeBSD is an exception, and limits can be effectively set via login classes (/etc/login.conf). See Section 6.2 for details.

However, one limit that is often worth setting in user login initialization files is the core file size limit. If the users on your system will have little use for core files, set the limit to 0, preventing their creation.
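For bash and ksh users, the entry in a login initialization file can be a single line. The sketch below wraps it in a hypothetical function only so that it can be tried in a subshell. It sets just the soft limit, on the assumption that you want users who do need a core file to be able to raise the limit again; in bash, omitting both -S and -H sets the soft and hard limits together.

```shell
#!/bin/sh
# Sketch: disable core files for this shell and the processes it starts.
# In a .profile you would use just the ulimit line itself; the csh and tcsh
# equivalent, per Table 15-3, is: limit coredumpsize 0
disable_core_files() {
    ulimit -S -c 0
}

# Try it out in a subshell so the current shell is left unchanged.
(
    disable_core_files
    echo "core file size limit is now $(ulimit -S -c)"
)
```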

15.2.6 Process Resource Limits Under AIX

AIX includes the structure for a more elaborate version of these limits, via the file /etc/security/limits (which may be modified directly or by the chuser command). It has stanzas of the form:

chavez:
    fsize = 2097151          Maximum file size.
    core = 0                 Maximum core file size.
    cpu = 3600               Maximum CPU seconds.
    data = 131072            Maximum process data segment.
    rss = 65536              Maximum amount of physical memory.
    stack = 8192             Maximum process stack size.

Each stanza specifies the resource usage limits for the username that labels the stanza. These settings specify absolute limits on resource usage, and they cannot be overridden by the user.

To change chavez's memory use limit, use a command like this one:

# chuser rss=102400 chavez

This command sets chavez's default memory use limit to 100 MB by modifying or adding the rss line for chavez in /etc/security/limits. As usual, the limits set in the default stanza are applied for any user without specific settings of her own. Setting a limit to a value of -1 will allow unlimited use of that system resource.
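The fallback to the default stanza can be mimicked with a short script, which is occasionally handy for auditing a copy of the file. This is only a sketch: the function name is hypothetical, it assumes "attribute = value" lines with spaces around the equals sign, and it treats lines beginning with an asterisk as comments.

```shell
#!/bin/sh
# Sketch: look up one limit for one user in a stanza file laid out like
# /etc/security/limits, falling back to the default stanza.
# $1 = stanza file, $2 = username, $3 = attribute (for example, rss)
stanza_limit() {
    awk -v user="$2" -v attr="$3" '
        /^\*/ { next }                              # skip comment lines
        /^[^ \t].*:[ \t]*$/ {                       # stanza header such as "chavez:"
            name = $1; sub(/:$/, "", name); next
        }
        $1 == attr && $2 == "=" { value[name] = $3 }
        END {
            if (user in value) print value[user]
            else if ("default" in value) print value["default"]
        }' "$1"
}

# Example (on a copy of the file):
#   stanza_limit ./limits.copy chavez rss
```

The function prints nothing when neither the user's stanza nor the default stanza defines the attribute.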

You can also use SMIT to specify user per-process resource limits. The dialog is illustrated in Figure 15-1, and it displays the appropriate fields from the user account addition/modification screen.

Figure 15-1. Setting per-process Resource Limits with SMIT

15.2.7 Signaling and Killing Processes

Sometimes it's necessary to eliminate a process entirely; this is the purpose of the kill command. The syntax of the kill command, which is actually a general purpose process signaling utility, is as follows:

# kill [-signal] pids

pids is the process's identification number (or a space-separated list of process numbers), and signal is the (optional) signal to send to the process. The default signal is number 15, the TERM signal, which asks the process to terminate.^[7] In general, either the signal number or its symbolic name may be used (although on a few older System V systems, the signal must be specified numerically). You must be the superuser in order to kill someone else's process.

^[7] This signal number happens to be the same in System V and BSD. Be aware that this is not always the case. Signals are defined in the /usr/include/signal.h file (or /usr/include/sys/signal.h), and the command kill -l may be used to generate a quick list of their symbolic names.

Sometimes, a process may still exist after a kill command. If this happens, execute the kill command with the -9 option, which sends the process signal number 9, appropriately named KILL. This almost always guarantees that the process will be destroyed. However, it does not allow the dying process to clean up before terminating and therefore may leave the process's files in an inconsistent state.
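The usual sequence (ask politely, wait, then force) is easy to script. The following is a minimal sketch; the function name and the five-second default grace period are assumptions, and the test with signal 0 only checks that the PID can still be signaled, so it cannot distinguish a live process from one that has exited but has not yet been reaped by its parent.

```shell
#!/bin/sh
# Sketch: send TERM, give the process time to clean up, then send KILL if
# the PID is still present.
# $1 = PID, $2 = seconds to wait before sending KILL (default 5)
terminate() {
    kill -TERM "$1" 2>/dev/null || return 1   # no such process, or not permitted
    sleep "${2:-5}"
    # Signal 0 delivers nothing; it only tests whether the PID can be signaled.
    if kill -0 "$1" 2>/dev/null; then
        kill -KILL "$1" 2>/dev/null
    fi
    return 0
}

# Example:
#   terminate 12923 10
```

Giving the process a grace period preserves its chance to clean up; KILL is sent only as a last resort.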

Suspended processes must be resumed before they can be killed.

15.2.7.1 Killing multiple processes with killall

Although you can use the kill command to kill more than one process at the same time, many systems provide a killall command to make this process slightly easier. This command began life as part of the System V system shutdown procedures. In its simplest form, it kills all processes in the same process group as the process that invoked it (but not the calling process itself); thus, when invoked by init as part of a system shutdown, it will kill all processes running on the system. Like kill, killall optionally takes a signal name or number as its argument. This form of killall may also be useful in administrative scripts, and it is provided by Tru64, AIX, HP-UX, and Solaris.^[8]

^[8] Some older Unix operating systems also have a killall command, but it has a completely different function. Check the manual page to be safe before using it under an unfamiliar operating system.

Linux and FreeBSD offer an enhanced form of killall, which accepts a second argument: the name of a command. In this form, killall kills all processes running the specified command. For example, the following command sends a KILL signal to all processes running the find command:

# killall -KILL find

15.2.7.2 Processes that won't die

Occasionally, processes will not die even after being sent the KILL signal. The vast majority of such processes fall into one of three categories:

A process in the zombie state (displayed as Z status in BSD ps displays and as <defunct> under System V). When a process is exiting, the kernel informs its parent, and the latter must respond to that message. A zombie process results when the parent process does not respond. Usually, init handles terminating such processes when the parent is gone, but on occasion this fails to happen. Zombies are always cleared the next time the system is booted and rarely affect system performance adversely.
Processes waiting for unavailable NFS resources (for example, trying to write to a remote file on a system that has crashed) will not die if sent a KILL signal. Use the QUIT signal (3) or the INT (interrupt) signal (2) to kill such processes. See Section 10.4 for full details.
Processes waiting for a device to complete an I/O operation before exiting may not die even when sent a KILL signal. For example, a process might be waiting for a tape to finish rewinding.
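Zombies are the easiest of these categories to spot mechanically. The sketch below picks them out of lines containing PID, parent PID, state, and command name; the parent PID is printed because it is the parent that has failed to respond. The function name is hypothetical, and the sample lines are invented for illustration.

```shell
#!/bin/sh
# Sketch: list zombie processes from "PID PPID STAT COMMAND" lines.
# Prints the PID, the parent's PID, and the command name of each zombie.
zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Made-up sample input in the assumed four-column layout.
zombies <<'EOF'
  PID  PPID STAT COMMAND
    1     0 Ss   init
 7997     1 S    csh
 8012  7997 Z    mkps
EOF
```

On Linux and FreeBSD, input in this layout can be produced with something like `ps -ax -o pid= -o ppid= -o stat= -o comm=`; the stat keyword is an assumption that may not hold for every ps implementation, so check the manual page on other systems.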

15.2.7.3 Pausing and restarting processes

The signals STOP and CONT may be used to suspend and then resume a running process. They use the same mechanism as the Ctrl-Z facility within user shells, but these signals may be sent by the superuser to any running process.
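Both signals are sent with the ordinary kill command. The following sketch wraps them in two hypothetical helper functions.

```shell
#!/bin/sh
# Sketch: suspend and resume a process by PID using the STOP and CONT signals.
# $1 = PID in both functions
pause_proc() {
    kill -STOP "$1"
}

resume_proc() {
    kill -CONT "$1"
}

# Example: suspend a large job during working hours and resume it later.
#   pause_proc 16725
#   resume_proc 16725
```

Unlike Ctrl-Z, this works on processes that are not attached to your terminal, provided you own them or are the superuser.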