System Monitoring Tools


Monitoring your server or workstation is an important task, especially in a commercial or corporate environment. Whether you are working on critical application programming or conducting e-commerce on the Internet, you will want to track your system's health signs while it's running. Good Fedora Core Linux system administrators are also quite vigilant about watching running processes on their systems, including resources such as CPU and disk, memory, network, and printer usage. Even though the task is not strictly part of standard security operations, such as examining system logs and network traffic, monitoring resource usage can help you spot misuse and avoid developing problems, such as unwanted intruder connections to your network.

This chapter introduces just a few of the basic tools and approaches used to monitor a running Linux system. Some of the tools focus on in-memory processes, whereas others, such as filesystem reporting and network monitoring, have more comprehensive uses. You will also see how to control some system processes using various command-line and graphical tools included with Fedora Core.

Console-based Monitoring

Traditional Unix systems have always included the ps or process display command. This command lists the running processes on the system and identifies who owns them and how much of the system resources are being used.

Because of the architecture of the Linux kernel and its memory management, Linux also provides much process reporting and control via the command line. This feature can be accessed manually through the /proc filesystem, a pseudo-filesystem used as a direct interface to the kernel. (You see how it's used in the upcoming discussion of the ps command.) The /proc filesystem is frequently used by application programmers who construct an interface for the raw information it provides. This filesystem is too complex to adequately deal with in the context of this chapter, but you can benefit from reading the proc man page to examine the list and description of the scores of kernel values available. You then can write shell scripts (refer to Chapter 14, "Automating Tasks") to use those values as needed.
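As a minimal sketch of such a script, the following fragment pulls two values out of /proc/meminfo. The variable names are illustrative only, and the fragment assumes the MemTotal and MemFree fields are present, as they normally are on Linux kernels.

```shell
# Read selected kernel values from /proc/meminfo (reported in kB).
# mem_total_kb and mem_free_kb are illustrative names, not part of any tool.
mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)

echo "Total memory: ${mem_total_kb} kB"
echo "Free memory:  ${mem_free_kb} kB"
```

The same pattern (match a field name, print the value beside it) applies to many other files under /proc, although each file has its own layout.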

Processes can be controlled at the command line as well. Whenever a program or command is launched on your Fedora Core Linux system, the process started by the kernel is assigned an identification number, called a PID or Process ID. This number is (generally) displayed by the shell if the program is launched in the background, like this:

 $ xosview &
 [1] 11670

In this example, the xosview client has been launched in the background, and the (bash) shell reported a shell job number ([1] in this case). A job number or job control is a shell-specific feature that allows a different form of process control (such as sending or suspending programs to the background and retrieving background jobs to the foreground; see your shell's manual pages for more information if you are not using bash).

The second number displayed (11670, in this example) represents the Process ID. You can get a quick list of your processes by using the ps command like this:

 $ ps
   PID TTY          TIME CMD
   736 tty1     00:00:00 bash
   743 tty1     00:00:00 startx
   744 tty1     00:00:00 tee
   752 tty1     00:00:00 xinit
   756 tty1     00:00:09 kwm
   ...
 11670 pts/4    00:00:00 xosview
 11671 pts/4    00:00:00 ps

Note that not all output from the display is shown here. But as you can see, the output includes the process ID, abbreviated as PID, along with other information, such as the name of the running program. As with any Unix command, many options are available; the ps man page has a full list. One of the most useful options is aux, which provides a friendly list of all the processes. You should also know that ps works not by polling memory, but through the interrogation of the Linux /proc or process filesystem. (ps is one of the interfaces mentioned at the beginning of this section.)
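To make the relationship between ps and /proc concrete, here is a rough sketch that builds a process list by walking the numbered directories under /proc. It is not how ps is actually implemented; list_procs is a hypothetical helper, and the sketch assumes a kernel that provides the /proc/PID/comm file.

```shell
# Print "PID COMMAND" for every process directory found under /proc.
# list_procs is a hypothetical helper; the real ps reads far more than comm.
list_procs() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # A process can exit between the glob and the read, so hide errors.
        if { read -r name < "$dir/comm"; } 2>/dev/null; then
            printf '%s %s\n' "$pid" "$name"
        fi
    done
}

listing=$(list_procs)
printf '%s\n' "$listing"
```

Each numbered directory corresponds to one running process, which is why a listing like this can be produced without special privileges for the fields that /proc exposes to every user.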

The /proc directory contains quite a few files, some of which include constantly updated hardware information (such as battery power levels). Linux administrators often pipe the output of ps through a member of the grep family of commands to display information about a specific program, perhaps like this:

 $ ps aux | grep xosview
 USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
 andrew   11670  0.3  1.1  2940 1412 pts/4    S    14:04   0:00 xosview

This example returns the owner (the user who launched the program) and the PID, along with other information, such as the percentage of CPU and memory usage, size of the command (code, data, and stack), time (or date) the command was launched, and name of the command. Processes can also be queried by PID like this:

 $ ps 11670
   PID TTY      STAT   TIME COMMAND
 11670 pts/4    S      0:00 xosview

You can use the PID to stop a running process by using the shell's built-in kill command. This command will ask the kernel to stop a running process and reclaim system memory. For example, to stop the xosview client in the example, use the kill command like this:

 $ kill 11670

After you press Enter (or perhaps press Enter again), the shell might report

 [1]+  Terminated              xosview

Note that users can only kill their own processes, but root can kill them all. Controlling any other running process requires root permission, which should be used judiciously (especially when forcing a kill by using the -9 option); by inadvertently killing the wrong process through a typo in the command, you could bring down an active system.
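The sequence described above (launch in the background, note the PID, kill it) can be sketched in a script as follows. Here sleep stands in for a long-running program such as xosview, and the state_before and state_after variables are illustrative names only.

```shell
# sleep stands in for a long-running program such as xosview.
sleep 300 &
pid=$!                      # $! expands to the PID of the last background command

# Signal 0 delivers nothing; kill only reports whether the PID can be signaled.
if kill -0 "$pid" 2>/dev/null; then state_before=running; else state_before=gone; fi

kill "$pid"                       # plain kill sends SIGTERM
wait "$pid" 2>/dev/null || true   # collect the exit status so no zombie remains

if kill -0 "$pid" 2>/dev/null; then state_after=running; else state_after=gone; fi
echo "before: $state_before, after: $state_after"
```

Capturing $! right after the launch avoids having to search the ps output for the PID, which reduces the chance of killing the wrong process through a typo.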

Using the `kill` Command to Control Processes

The kill command is a basic Unix system command. We can communicate with a running process by entering a command into its interface, such as when we type into a text editor. But some processes (usually system processes rather than application processes) run without such an interface, and we need a way to communicate with them as well, so we use a system of signals. The kill command accomplishes just that by sending a signal to a process, and we can use it to communicate with any process. The general format of the kill command is

 # kill option PID

A number of signal options can be sent as words or numbers, but most are of interest only to programmers. One of the most common ones you will use is

 # kill PID

This tells the process with PID to stop; you supply the actual PID.

 # kill -9 PID

This is the signal for kill (9 is the number of the SIGKILL signal); use this combination when the plain kill shown previously doesn't work.

 # kill -SIGHUP PID

This is the signal to "hang up": stop, and then clean up all associated processes as well. (Its number is 1, so the equivalent numeric option is -1.)

As you become proficient at process control and job control, you will learn the utility of a number of kill options. A full list of signals can be found in the signal man page (man 7 signal).
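One way to see the three signals discussed above in action is to send each to a throwaway process and look at the exit status the shell records. The sketch below assumes a Bourne-style shell such as bash on Linux, where death by signal N is reported as status 128 + N; exit_status_after is a hypothetical helper written for this illustration.

```shell
# exit_status_after is a hypothetical helper: it starts a throwaway sleep,
# sends it the given signal option, and prints the status the shell records.
exit_status_after() {
    sleep 300 &
    target=$!
    kill "$1" "$target"
    status=0
    wait "$target" 2>/dev/null || status=$?
    echo "$status"
}

term_status=$(exit_status_after -TERM)   # same signal as plain kill
hup_status=$(exit_status_after -HUP)
kill_status=$(exit_status_after -9)

# Subtracting 128 from each status gives the signal number.
echo "TERM: $((term_status - 128))  HUP: $((hup_status - 128))  KILL: $((kill_status - 128))"

# kill -l translates a signal number back into its name.
hup_name=$(kill -l 1)
echo "signal 1 is $hup_name"
```

On Linux this reports 15 for TERM, 1 for HUP, and 9 for KILL, matching the numeric options you can pass to kill.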

Using Priority Scheduling and Control

Not every process can make use of the system's resources (CPU, memory, disk access, and so on) as it pleases. After all, the kernel's primary function is to manage the system resources equitably. It does this by assigning a priority to each process so that some processes get better access to system resources and some processes might have to wait longer until their turn arrives. Priority scheduling can be an important tool in managing a system supporting critical applications or in a situation in which CPU and RAM usage must be reserved or allocated for a specific task. Two legacy applications included with Fedora Core are the nice and renice commands. (nice is part of the GNU sh-utils package, whereas renice is inherited from BSD Unix.)

The nice command is used with its -n option, along with an argument in the range of -20 to 19, in order from highest to lowest priority (the lower the number, the higher the priority). For example, to run the xosview client with a low priority, use the nice command like this:

 $ nice -n 12 xosview &

The nice command is typically used for disk or CPU-intensive tasks that might be obtrusive or cause system slowdown. The renice command can be used to reset the priority of running processes or control the priority and scheduling of all processes owned by a user. Regular users can only numerically increase process priorities (that is, make tasks less important) using this command, but the root operator can use the full nice range of scheduling (-20 to 19).
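The following sketch starts a stand-in task with nice and then lowers its priority further with renice. It assumes the calling shell itself runs at the default nice value of 0 and that the renice utility is installed; nice_of is a hypothetical helper that reads the nice value from field 19 of /proc/PID/stat, which works only when the command name contains no spaces.

```shell
# Field 19 of /proc/PID/stat is the nice value (see the proc man page).
# nice_of is a hypothetical helper; it assumes the command name has no spaces.
nice_of() {
    awk '{print $19}' "/proc/$1/stat"
}

nice -n 12 sleep 300 &      # start a stand-in task at lower priority
pid=$!
sleep 1                     # give the child time to apply its nice value
before=$(nice_of "$pid")

renice 15 -p "$pid" >/dev/null   # a regular user may only raise the number
after=$(nice_of "$pid")

echo "nice value before: $before, after: $after"

kill "$pid"                      # clean up the stand-in task
wait "$pid" 2>/dev/null || true
```

In everyday use, ps -o ni= -p PID reports the same value without parsing /proc by hand. Trying to renice the task back to a lower number as a regular user should fail with a permission error.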

System administrators can also use the time command (here, time is used to measure the duration of elapsed time; the command that deals with civil and sidereal time is the date command) to get an idea about how long and how much of a system's resources will be required for a task (such as a shell script). This command is used with the name of another command (or script) as an argument like this:

 # time -p find / -name core -print
 /dev/core
 /proc/sys/net/core
 real 1.20
 user 0.14
 sys 0.71

Output of the command displays the time from start to finish, along with user and system time required. Other factors you can query include memory, CPU usage, and filesystem Input/Output (I/O) statistics. See the time command's manual page for more details.

Nearly all graphical process monitoring tools include some form of process control or management. Many of the early tools ported to Linux were clones of legacy Unix utilities. One familiar monitoring (and control) program is top. Based on the ps command, top provides a constantly updated, text-based console display showing the most CPU-intensive processes currently running. It can be started like this:

 # top

After you press Enter, you will see a display as shown in Figure 15.1. The top command has a few interactive commands: pressing h displays the help screen; pressing k prompts you to enter the PID of a process to kill; pressing r prompts you to enter the PID of a process to change its nice value. The top man page describes other commands and includes a detailed description of what all the columns of information top can display actually represent; have a look at top's well-written man page.

Figure 15.1. The `top` command can be used to monitor and control processes. Here, we are being prompted to re-`nice` a process.

The top command displays quite a bit of information about your system. Processes can be sorted by PID, age, CPU or memory usage, time, or user. This command also provides process management, and system administrators can use its k or r keypress commands to kill or reschedule running tasks.

The top command uses a fair amount of memory, so you might want to be judicious in its use and not leave it running all the time.

After you have finished, simply press q to quit top.

Displaying Free and Used Memory with `free`

Although top includes some memory information, the free utility will display the amount of free and used memory in the system in kilobytes (the -m switch displays in megabytes). On one system, the output looks like this:

 # free
              total       used       free     shared    buffers     cached
 Mem:        255452     251132       4320          0      19688      77548
 -/+ buffers/cache:     153896     101556
 Swap:       136512      31528     104984

This output describes a machine with 256MB of RAM and a swap partition of 137MB. Note that some swap is being used although the machine is not heavily loaded. Linux is very good at memory management and "grabs" all the memory it can in anticipation of future work.
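The -/+ buffers/cache row in the free output is derived from the Mem row: buffers and cache are memory the kernel can reclaim, so they are subtracted from the used figure and added to the free figure. The arithmetic can be checked with the sample values shown above; the variable names here are illustrative only.

```shell
# Values in kilobytes, copied from the Mem row of the sample free output.
used=251132
free_kb=4320
buffers=19688
cached=77548

# Memory held by applications: used minus reclaimable buffers and cache.
app_used=$((used - buffers - cached))
# Memory available to applications: free plus reclaimable buffers and cache.
app_free=$((free_kb + buffers + cached))

echo "-/+ buffers/cache: $app_used $app_free"
```

The result, 153896 and 101556, matches the second row of the sample output, which is why a small free figure in the Mem row is not by itself a sign of memory pressure.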

TIP

A useful trick is to employ the watch command; it will repeatedly rerun a command every 2 seconds by default. If you use

 #  watch free

you'll see the output of the free command updated every 2 seconds.

Another useful system monitoring tool is vmstat (virtual memory statistics). This command reports on processes, memory, I/O, and CPU, typically providing an average since the last reboot. You can also make it report usage for a current period of time by telling it the time interval in seconds and the number of iterations you desire, like

 # vmstat 5 10

which will run vmstat every 5 seconds for 10 iterations.

Use the uptime command to see how long it has been since the last reboot and to get an idea of what the load average has been; higher numbers mean higher loads.
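The figures that uptime reports are also available to scripts through /proc. The sketch below reads them directly; it assumes the usual Linux layout of /proc/loadavg and /proc/uptime, and the variable names are illustrative only.

```shell
# /proc/loadavg begins with the 1-, 5-, and 15-minute load averages.
read -r load1 load5 load15 rest < /proc/loadavg

# /proc/uptime begins with the seconds elapsed since boot, as a decimal number.
read -r up_seconds idle_seconds < /proc/uptime
up_whole=${up_seconds%.*}           # drop the fraction for shell arithmetic

echo "up $((up_whole / 3600)) hours, load averages: $load1 $load5 $load15"
```

A script could compare load1 against a threshold of its own choosing and send an alert when the system stays busy.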

Disk Quotas

Disk quotas are a way to restrict the usage of disk space either by user or by groups. Although rarely if ever used on a local or standalone workstation, quotas are definitely a way of life at the enterprise level of computing. Usage limits on disk space not only conserve resources, but also provide a measure of operational safety by limiting the amount of disk space any user can consume.

Disk quotas are more fully covered in Chapter 13, "Managing Users."

Graphical Process and System Management Tools

The GNOME and KDE desktop environments offer a rich set of network and system monitoring tools. Graphical interface elements, such as menus and buttons, and graphical output, including metering and real-time load charts, make these tools easy to use. These clients, which require an active X session and in some cases (but not all) root permission, are included with Fedora Core.

If you view the graphical tools locally while they are being run on a server, you must have X properly installed and configured on your local machine. Although some tools can be used to remotely monitor systems or locally mounted remote filesystems, you'll need to properly configure pertinent X11 environment variables, such as $DISPLAY, to use the software or use the ssh client's -X option when connecting to the remote host.

Fedora Core includes the xosview client, which provides load, CPU, memory and swap usage, disk I/O usage and activity, page swapping information, network activity, I/O activity, I/O rates, serial port status, and if APM is enabled, the battery level (such as for a laptop).

For example, to see most of these options, start the client like this:

 # xosview -geometry 406x488 -font 8x16 +load +cpu +mem +swap \
   +page +disk +int +net &

After you press Enter, you will see a display as shown in Figure 15.2.

Figure 15.2. The `xosview` client displays basic system stats in a small window. You can choose from several options to determine what it will monitor for you.

The display can be customized for a variety of hardware and information, and the xosview client (like most well-behaved X clients) obeys geometry settings such as size, placement, or font. If you have similar monitoring requirements but want to try a different client from xosview, try xcpustate, which has features that enable it to monitor network CPU statistics foreign to Linux. Neither of these applications is installed with the base set of packages; you need to install them manually if you want to use them.

The graphical system and process monitoring tools included with Fedora Core Linux include the following:

vncviewer: AT&T's open-source remote session manager (part of the Xvnc package), which can be used to view and run a remote desktop session locally. This software (discussed in more detail in Chapter 25, "Remote System Access with SSH and Telnet") requires an active, but background X session on the remote computer.
nmapfe: A GTK+ graphical front end to the nmap command. This client provides system administrators with the ability to scan networks to monitor the availability of hosts and services.
ethereal: This graphical network protocol analyzer can be used to save or display packet data in real time and has intelligent filtering to recognize data signatures or patterns from a variety of hardware and data captures from third-party, data-capture programs, including compressed files. Some protocols include AppleTalk, Andrew File System (AFS), AOL's Instant Messenger, various Cisco protocols, and many more.
gnome-system-monitor: Replacing gtop, this tool is a simple process monitor offering two views: a list view and a moving graph. It is accessed via the System Tools menu selection as the System Monitor item (see Figure 15.3).

Figure 15.3. The Process Listing view of the System Monitor.

The System Monitor menu item (shown in Figure 15.3) is found in the System Tools menu. It can be launched from the command line with

 # gnome-system-monitor

From the Process Listing view (chosen via the tab in the upper left portion of the window), select a process and click on More Info at the bottom left of the screen to display details on that process at the bottom of the display. You can select from three views to filter the display, available in the drop-down View list: All Processes, My Processes (those you alone own), or Active Processes (All Processes that are active).

Choose Hidden Processes under the Edit command accessible from the top of the display to show any hidden processes (those that the kernel does not enable the normal monitoring tools to see). Select any process and kill it with End Process.

The processes can be re-niced by selecting Edit, Change Priority. The View selection from the menu bar also provides a memory map. In the Resource Monitor tab, you can view a moving graph representing CPU and memory usage (see Figure 15.4).

Figure 15.4. The Graph view of the System Monitor. It shows CPU usage, Memory/Swap usage, and disk usage. To get this view, select the Resource Monitor tab.

KDE Process and System Monitoring Tools

KDE provides several process and system monitoring clients. You can integrate the KDE graphical clients into the desktop taskbar by right-clicking the taskbar and following the menus.

These KDE monitoring clients include the following:

kdf: A graphical interface to your system's filesystem table that displays free disk space and enables you to mount and unmount filesystems using a pointing device.
ksysguard: Another panel applet that provides CPU load and memory use information in animated graphs.
