System-Monitoring Tools

To keep your system in optimum shape, you need to be able to monitor it closely. Such monitoring is imperative in a corporate environment where uptime is vital and any system failure can cost real money. Whether it is checking processes for errant daemons or keeping a close eye on CPU and memory usage, Fedora provides a wealth of utilities designed to give you as little or as much feedback as you want. In this chapter we look at some of the basic monitoring tools, along with some tactics designed to keep your system up longer. The monitoring tools cover areas such as network connectivity, memory, and hard drive usage, and all of them should find a place in your sysadmin toolkit. Finally, you will learn how to manipulate active system processes using a mixture of graphical and command-line tools.

Console-Based Monitoring

Those familiar with UNIX system administration already know about the ps or process display command commonly found on most flavors of UNIX. Because Linux is closely related to UNIX, it also benefits from this command and allows you to quickly see the current running processes on the system as well as who owns them and how resource-hungry they are.

Although the Linux kernel has its own distinct architecture and memory management, it also benefits from enhanced use of the /proc file system, the virtual file system found on many UNIX flavors. Through the /proc file system you can communicate directly with the kernel to get a deep view of what is currently happening. Developers tend to use the /proc file system as a way of getting information out of the kernel so that their programs can present it in more human-readable formats. The /proc file system is beyond the scope of this book, but if you want to get a better idea of what it contains, head over to http://en.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html for an excellent and very in-depth guide.
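As a small illustration of reading kernel data straight from /proc, the following sketch inspects the entry for the current shell. It assumes a Linux system with /proc mounted and awk installed; the variable names (status_file, proc_name, proc_state, proc_count) are made up for this example.

```shell
# $$ expands to the PID of the current shell; the kernel exposes that
# process as the directory /proc/$$.
status_file="/proc/$$/status"

# Each line of the status file has the form "Key:<tab>value".
proc_name=$(awk '/^Name:/  {print $2}' "$status_file")
proc_state=$(awk '/^State:/ {print $2}' "$status_file")
echo "PID $$ is '$proc_name' in state $proc_state"

# Every running process has a numerically named directory under /proc,
# so counting those directories counts the visible processes.
proc_count=$(ls /proc | grep -c '^[0-9][0-9]*$')
echo "processes visible in /proc: $proc_count"
```

The exact name, state, and count vary from system to system.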

Processes can also be controlled at the command line, which is important because you might sometimes have only a command-line interface. Whenever an application or command is launched, either from the command line or by clicking on an icon, the process that comes from the kernel is assigned an identification number called a process ID or PID for short. This number is shown in the shell if the program is launched via the command line.

$ xosview &
[1] 11670

In this example, the xosview client has been launched in the background, and the (bash) shell reported a shell job number ([1] in this case). A job number or job control is a shell-specific feature that allows a different form of process control, such as sending or suspending programs to the background and retrieving background jobs to the foreground (see your shell's man pages for more information if you are not using bash).

The second number displayed (11670 in this example) represents the process ID. You can get a quick list of your processes by using the ps command like this:

$ ps
  PID TTY          TIME CMD
  736 tty1     00:00:00 bash
  743 tty1     00:00:00 startx
  744 tty1     00:00:00 tee
  752 tty1     00:00:00 xinit
  756 tty1     00:00:09 kwm
  ...
11670 pts/4    00:00:00 xosview
11671 pts/4    00:00:00 ps

Note that not all output from the display is shown here. But as you can see, the output includes the process ID, abbreviated as PID, along with other information, such as the name of the running program. As with any UNIX command, many options are available; the ps man page has a full list. One of the most useful options is aux, which provides a friendly list of all the processes. You should also know that ps works not by polling memory, but through the interrogation of the Linux /proc or process filesystem. (ps is one of the interfaces mentioned at the beginning of this section.)

The /proc directory contains quite a few files, some of which include constantly updated hardware information (such as battery power levels, and so on). Linux administrators often pipe the output of ps through a member of the grep family of commands to display information about a specific program, perhaps like this:

$ ps aux | grep xosview
USER     PID  %CPU %MEM   VSZ   RSS TTY    STAT START  TIME COMMAND
andrew 11670   0.3  1.1  2940  1412 pts/4  S    14:04  0:00 xosview

This example returns the owner (the user who launched the program) and the PID, along with other information, such as the percentage of CPU and memory usage, size of the command (code, data, and stack), time (or date) the command was launched, and name of the command. Processes can also be queried by PID like this:

$ ps 11670
  PID TTY     STAT   TIME COMMAND
11670 pts/4   S      0:00 xosview
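When only a few columns are needed, ps can be told which ones to print. The sketch below assumes the procps version of ps that ships with most Linux distributions. It queries the current shell ($$) rather than xosview so that it matches on any system, and ps_line is a name invented for the example.

```shell
# Select only the columns of interest; a trailing "=" suppresses the header.
# $$ is the PID of the current shell, so the query always finds a process.
ps_line=$(ps -o pid= -o user= -o stat= -o comm= -p "$$")
echo "$ps_line"

# "ps aux | grep name" can also match the grep command itself.
# Bracketing one letter of the pattern keeps grep's own command line
# from matching, so only a real xosview process would be listed.
ps aux | grep '[x]osview' || echo "xosview is not running"
```

The bracket trick works because the pattern [x]osview matches the text xosview but not the literal string [x]osview that appears in the grep command line.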

You can use the PID to stop a running process by using the shell's built-in kill command. This command asks the kernel to stop a running process and reclaim system memory. For example, to stop the xosview client in the example, use the kill command like this:

$ kill 11670

After you press Enter (or perhaps press Enter again), the shell might report

[1]+  Terminated           xosview

Note that users can kill only their own processes, but root can kill them all. Controlling any other running process requires root permission, which should be used judiciously (especially when forcing a kill by using the -9 option); by inadvertently killing the wrong process through a typo in the command, you could bring down an active system.
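The launch, check, and kill cycle described above can be sketched in a script. Here sleep stands in for xosview so that the example runs without an X session, and bg_pid and term_status are names chosen for illustration.

```shell
# sleep stands in for xosview so the example can run on any system.
sleep 30 &
bg_pid=$!                      # $! is the PID of the last background command

# Signal 0 delivers nothing; kill only reports whether the PID can be signalled.
kill -0 "$bg_pid" && echo "process $bg_pid is running"

kill "$bg_pid"                 # the default signal is SIGTERM (15)

# wait returns 128 plus the signal number for a process ended by a signal.
term_status=0
wait "$bg_pid" || term_status=$?
echo "exit status: $term_status"
```

An exit status of 143 (128 + 15) confirms that the process was ended by SIGTERM rather than exiting on its own.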

Using the `kill` Command to Control Processes

The kill command is a basic UNIX system command. We can communicate with a running process by entering a command into its interface, such as when we type into a text editor. But some processes (usually system processes rather than application processes) run without such an interface, and we need a way to communicate with them as well, so we use a system of signals. The kill command accomplishes that by sending a signal to a process, and we can use it to communicate with any process. The general format of the kill command is

# kill option PID

A number of signal options can be sent as words or numbers, but most are of interest only to programmers. One of the most common is

# kill PID

This tells the process with PID to stop; you supply the actual PID.

# kill -9 PID

is the signal for kill (9 is the number of the SIGKILL signal); use this combination when the plain kill shown previously does not work.

# kill -SIGHUP PID

is the signal to "hang up" (stop) and then clean up all associated processes as well. (Its number is 1, so kill -1 PID is equivalent.)

As you become proficient at process control and job control, you will learn the utility of a number of kill options. A full list of signal options can be found in the signal man page.
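The difference between a plain kill and kill -9 shows up when a process handles signals itself. The following sketch starts two throwaway child shells: one catches SIGTERM and exits cleanly, the other ignores SIGTERM and can be stopped only with SIGKILL. The child scripts and the variable names (polite, stubborn, and their status variables) are invented for the demonstration.

```shell
# kill -l lists the signal names this shell knows about.
kill -l

# Child 1 catches SIGTERM, so a plain kill lets it clean up and exit normally.
sh -c 'trap "echo cleaning up; exit 0" TERM; while :; do sleep 1; done' &
polite=$!

# Child 2 ignores SIGTERM entirely, so only SIGKILL (-9) can stop it.
sh -c 'trap "" TERM; while :; do sleep 1; done' &
stubborn=$!

sleep 1                       # give both children time to install their traps
kill "$polite"                # same as kill -15 or kill -TERM
kill -9 "$stubborn"           # same as kill -KILL; cannot be caught or ignored

polite_status=0
wait "$polite" || polite_status=$?
stubborn_status=0
wait "$stubborn" || stubborn_status=$?
echo "polite exited with $polite_status, stubborn with $stubborn_status"
```

The first child should report status 0 because it handled the signal and exited on its own terms, and the second should report 137 (128 + 9) because the kernel ended it without giving it a chance to clean up. That loss of cleanup is the reason to try a plain kill before resorting to -9.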

Using Priority Scheduling and Control

Not every process can use the system's resources (CPU, memory, disk access, and so on) as it pleases. After all, the kernel's primary function is to manage the system resources equitably. It does this by assigning a priority to each process so that some processes get better access to system resources and some processes might have to wait longer until their turn arrives. Priority scheduling can be an important tool in managing a system supporting critical applications or in a situation in which CPU and RAM usage must be reserved or allocated for a specific task. Fedora includes two long-standing commands for this purpose: nice and renice. (nice is part of the GNU sh-utils package, whereas renice is inherited from BSD UNIX.)

The nice command is used with its -n option, along with an argument in the range of -20 to 19, in order from highest to lowest priority (the lower the number, the higher the priority). For example, to run the xosview client with a low priority, use the nice command like this:

$ nice -n 12 xosview &

The nice command is typically used for disk- or CPU-intensive tasks that might be obtrusive or cause system slowdown. The renice command can be used to reset the priority of running processes or control the priority and scheduling of all processes owned by a user. Regular users can only numerically increase process priorities (that is, make tasks less important) using this command, but the root operator can use the full nice range of scheduling (-20 to 19).
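A minimal sketch of both commands, assuming a Linux system with /proc mounted and the renice utility from util-linux: sleep stands in for a heavy job, and task, nice_before, and nice_after are illustrative names. The example assumes the invoking shell runs at the default nice value of 0, because nice -n adds its argument to the current value.

```shell
# sleep stands in for a CPU- or disk-heavy job.
nice -n 12 sleep 30 &
task=$!
sleep 1                                   # let nice finish launching sleep

# Field 19 of /proc/PID/stat is the nice value (see the proc man page).
nice_before=$(awk '{print $19}' "/proc/$task/stat")

# Lower the priority further; raising the number needs no special privilege.
renice 15 -p "$task" >/dev/null
nice_after=$(awk '{print $19}' "/proc/$task/stat")
echo "nice value went from $nice_before to $nice_after"

# Clean up the stand-in job.
kill "$task"
wait "$task" 2>/dev/null || true
```

Running renice with a lower number than the current value (for example, renice 5 -p PID) would fail for a regular user, as described above.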

System administrators can also use the time command to get an idea of how long and how much of a system's resources are required for a task, such as a shell script. (Here, time is used to measure the duration of elapsed time; the command that deals with civil and sidereal time is the date command.) This command is used with the name of another command (or script) as an argument like this:

# time -p find / -name core -print
/dev/core
/proc/sys/net/core
real 1.20
user 0.14
sys 0.71

Output of the command displays the time from start to finish, along with the user and system time required. Other factors you can query include memory, CPU usage, and filesystem input/output (I/O) statistics. See the time command's man page for more details.

Nearly all graphical process-monitoring tools include some form of process control or management. Many of the early tools ported to Linux were clones of legacy UNIX utilities. One familiar monitoring (and control) program is top. Based on the ps command, the top command provides a constantly updated, text-based console display showing the most CPU-intensive processes currently running. It can be started like this:

# top

After you press Enter, you see a display as shown in Figure 16.1. The top command has a few interactive commands: Pressing h displays the help screen; pressing k prompts you to enter the PID of a process to kill; pressing r prompts you to enter the PID of a process to change its nice value. The top man page describes other commands and includes a detailed description of what all the columns of information top can display actually represent; have a look at top's well-written man page.

Figure 16.1. The `top` command can be used to monitor and control processes. Here, we are prompted to `renice` a process.

The top command displays quite a bit of information about your system. Processes can be sorted by PID, age, CPU or memory usage, time, or user. This command also provides process management, and system administrators can use its k and r keypress commands to kill and reschedule running tasks, respectively.

The top command uses a fair amount of memory, so you might want to be judicious in its use and not leave it running all the time. When you've finished with it, simply press q to quit top.

Displaying Free and Used Memory with `free`

Although top includes some memory information, the free utility displays the amount of free and used memory in the system in kilobytes (the -m switch displays in megabytes). On one system, the output looks like this:

# free
                     total     used     free   shared  buffers   cached
Mem:                516372   484972    31400        0    19816   317420
-/+ buffers/cache:           147736   368636
Swap:               433712        0   433712

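This output describes a machine with 512MB of RAM and a swap partition of 444MB. Note that no swap is being used, yet nearly all the RAM is reported as used even though the machine is not heavily loaded. Linux is very good at memory management and grabs all the memory it can, mostly for buffers and cache, in anticipation of future work; the -/+ buffers/cache line shows how much memory applications are actually using.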
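The -/+ buffers/cache line is simple arithmetic on the Mem: line, which the following sketch reproduces with the figures from the sample output. The variable names are chosen for illustration only.

```shell
# Figures copied from the sample free output above, in kilobytes.
total=516372 used=484972 free_kb=31400 buffers=19816 cached=317420

# Memory really held by applications: used minus reclaimable buffers and cache.
app_used=$((used - buffers - cached))

# Memory applications could still obtain: free plus buffers and cache.
app_free=$((free_kb + buffers + cached))

echo "used by applications:      ${app_used} KB"
echo "available to applications: ${app_free} KB"
```

The two results, 147736 and 368636, match the second line of the sample output and add up to the 516372KB total, which is why a nearly full Mem: line is not by itself a sign of memory pressure.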

Tip

A useful trick is to employ the watch command; it repeatedly reruns a command every two seconds by default. If you use

# watch free

you will see the output of the free command updated every two seconds.

Another useful system-monitoring tool is vmstat (virtual memory statistics). This command reports on processes, memory, I/O, and CPU, typically providing an average since the last reboot; or you can make it report usage for a current period of time by telling it the time interval in seconds and the number of iterations you desire, like

# vmstat 5 10

which runs vmstat every five seconds for 10 iterations.

Use the uptime command to see how long it has been since the last reboot and to get an idea of what the load average has been; higher numbers mean higher loads.
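As a rough rule of thumb, load averages are easier to judge against the number of CPUs in the machine. The sketch below reads the same three figures that uptime prints from /proc/loadavg; it assumes a Linux system with getconf and awk available, and the threshold test is a heuristic rather than a hard limit.

```shell
# /proc/loadavg starts with the 1-, 5-, and 15-minute load averages,
# the same three figures that uptime prints.
read -r load1 load5 load15 rest < /proc/loadavg
cpus=$(getconf _NPROCESSORS_ONLN)

echo "load averages: $load1 $load5 $load15 on $cpus CPU(s)"

# Shell arithmetic is integer-only, so awk does the floating-point comparison.
if awk -v load="$load1" -v cpus="$cpus" 'BEGIN { exit !(load > cpus) }'; then
    echo "1-minute load exceeds the CPU count; processes are probably queuing"
else
    echo "1-minute load is within the CPU count"
fi
```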

Disk Quotas

Disk quotas are a way to restrict the usage of disk space either by user or by groups. Although rarely, if ever, used on a local or standalone workstation, quotas are definitely a way of life at the enterprise level of computing. Usage limits on disk space not only conserve resources, but also provide a measure of operational safety by limiting the amount of disk space any user can consume.

Disk quotas are more fully covered in Chapter 14, "Managing Users."

Graphical Process and System Management Tools

The GNOME and KDE desktop environments offer a rich set of network and system-monitoring tools. Graphical interface elements, such as menus and buttons, and graphical output, including metering and real-time load charts, make these tools easy to use. These clients, which require an active X session and (in some cases) root permission, are included with Fedora.

If you view the graphical tools locally while they are being run on a server, you must have X properly installed and configured on your local machine. Although some tools can be used to remotely monitor systems or locally mounted remote filesystems, you have to properly configure pertinent X11 environment variables, such as $DISPLAY, to use the software or use the ssh client's -X option when connecting to the remote host.

Fedora no longer includes the xosview client, which provides load, CPU, memory and swap usage, disk I/O usage and activity, page swapping information, network activity, I/O activity, I/O rates, serial port status, and if APM is enabled, the battery level (such as for a laptop). You have to obtain xosview from sourceforge.net and compile it from source.

For example, to see most of these options, start the client like this:

# xosview -geometry 406x488 -font 8x16 +load +cpu +mem +swap \
+page +disk +int +net &

After you press Enter, you see a display as shown in Figure 16.2.

Figure 16.2. The `xosview` client displays basic system stats in a small window. You can choose from several options to determine what it monitors for you.

The display can be customized for a variety of hardware and information, and the xosview client (like most well-behaved X clients) obeys geometry settings such as size, placement, or font. If you have similar monitoring requirements and want to try an alternative to xosview, try xcpustate, which has features that enable it to monitor network CPU statistics foreign to Linux. Neither of these applications is installed with the base set of packages; you have to install them manually if you want to use them.

Some of the graphical system- and process-monitoring tools that come with Fedora include the following:

vncviewer: AT&T's open source remote session manager (part of the Xvnc package), which can be used to view and run a remote desktop session locally. This software (discussed in more detail in Chapter 19, "Remote Access with SSH and Telnet") requires an active, but background, X session on the remote computer.
gnome-nettool: A GNOME-developed tool that enables system administrators to carry out a wide range of diagnostics on network interfaces, including port scanning and route tracing.
ethereal: This graphical network protocol analyzer can be used to save or display packet data in real time and has intelligent filtering to recognize data signatures or patterns from a variety of hardware and data captures from third-party data capture programs, including compressed files. Supported protocols include AppleTalk, Andrew File System (AFS), AOL's Instant Messenger, various Cisco protocols, and many more.
gnome-system-monitor: Replacing gtop, this tool is a simple process monitor offering two views: a list view and a moving graph. It is accessed via the System Tools menu selection as the System Monitor item (see Figure 16.3).

Figure 16.3. The Process Listing view of the System Monitor.

The System Monitor menu item (shown in Figure 16.3) is found in the System Tools menu. It can be launched from the command line with

# gnome-system-monitor

From the Process Listing view (chosen via the tab in the upper-left portion of the window), select a process and click on More Info at the bottom left of the screen to display details on that process at the bottom of the display. You can select from three views to filter the display, available in the drop-down View list: All Processes, My Processes (those you alone own), or Active Processes (all processes that are active).

Choose Hidden Processes under the Edit command accessible from the top of the display to show any hidden processes (those that the kernel does not enable the normal monitoring tools to see). Select any process and kill it with End Process.

The processes can be reniced by selecting Edit, Change Priority. The View selection from the menu bar also provides a memory map. In the Resource Monitor tab, you can view a moving graph representing CPU and memory usage (see Figure 16.4).

Figure 16.4. The Graph view of the System Monitor. It shows CPU usage, memory/swap usage, and disk usage. To get this view, select the Resource Monitor tab.

KDE Process- and System-Monitoring Tools

KDE provides several process- and system-monitoring clients. The KDE graphical clients can be added to the desktop taskbar by right-clicking on the taskbar and following the menus.

These KDE monitoring clients include the following:

kdf: A graphical interface to your system's filesystem table that displays free disk space and enables you to mount and unmount filesystems using a pointing device.
ksysguard: Another panel applet that provides CPU load and memory use information in animated graphs.