Your system has a lot of different resources that can be used by processes. These resources include CPU processing time, disk space, disk I/O, RAM, graphic memory, and network traffic. Fortunately, there are ways to measure each of these resources.
Linux provides a virtual file system that is mounted in the /proc directory. This directory lists system resources and running processes. For example:
$ ls -F /proc
1/     3910/  4133/  4351/      bus/          iomem      partitions
1642/  3930/  4135/  4352/      cmdline       ioports    pmu/
1645/  3945/  4137/  4363/      cpuinfo       irq/       scsi/
1650/  3951/  4167/  4364/      crypto        kallsyms   self@
1736/  3993/  4220/  4382/      devices       kcore      slabinfo
1946/  4/     4224/  5/         device-tree/  key-users  stat
2/     4009/  4237/  54/        diskstats     kmsg       swaps
20/    4027/  4250/  55/        dma           loadavg    sys/
3/     4057/  4270/  56/        driver/       locks      sysrq-trigger
3310/  4072/  4286/  57/        execdomains   mdstat     sysvipc/
3333/  4073/  4299/  6/         fb            meminfo    tty/
3335/  4081/  4347/  651/       filesystems   misc       uptime
3356/  4091/  4348/  apm        fs/           modules    version
3402/  4092/  4349/  asound/    ide/          mounts@    vmstat
3904/  4127/  4350/  buddyinfo  interrupts    net/       zoneinfo
Each numbered directory corresponds to a running process; the number is the process ID. In each directory, you will find the process's command line (cmdline) and environment (environ), among other details. Device drivers and the kernel use the non-numeric entries. These show system resources. For example, /proc/iomem shows the hardware I/O map and /proc/cpuinfo provides information about the system CPUs.
Although /proc is useful for debugging, applications should be careful when depending on it. In particular, everything is dynamic: process directories may appear and vanish quickly and some resources constantly change.
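Because a process can exit between the moment you list /proc and the moment you read its files, a script should treat a missing entry as normal rather than as an error. The following is a minimal sketch of that defensive pattern; the function name proc_cmdline is hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# proc_cmdline PID: print the command line of a process, or nothing if the
# process has already exited. (Hypothetical helper name, for illustration.)
proc_cmdline() {
    # /proc/PID/cmdline separates arguments with NUL bytes; tr converts them
    # to spaces. The braces let 2>/dev/null also hide the shell's own error
    # message if the file vanishes before it can be opened.
    { tr '\0' ' ' < "/proc/$1/cmdline"; } 2>/dev/null || true
}

# Example: show the command line of the current shell.
proc_cmdline "$$"
echo
```

A PID that no longer exists simply produces empty output, so a loop over /proc/[0-9]* can keep going when processes disappear mid-scan.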
The CPU load can be measured in a couple of ways. The uptime command provides a simple summary. It lists three values: the load averages over the last 1 minute, 5 minutes, and 15 minutes. The load is a measurement of queue length: roughly, the average number of processes that are running or waiting to run. If you have one CPU and the load is less than 1.0, then you are not consuming all of the CPU resources. A load of 2.0 on one CPU means all resources are being consumed and you would need twice as many CPUs to eliminate the wait time. If you have two CPUs, then a load of 2.0 indicates that both processors are operating at maximum capacity. Although a load of 1.0 won't seem sluggish, a load of 5.0 can be noticeable because commands may need to wait a few moments before being processed.
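The values that uptime prints are system-wide totals; they are not divided by the number of CPUs. As a rough sketch, you can normalize the 1-minute figure yourself by reading /proc/loadavg and dividing by the CPU count. The helper name load_per_cpu is an assumption made for this example, and the script assumes getconf reports the number of online CPUs.

```shell
#!/bin/sh
# load_per_cpu LOAD CPUS: divide a load average by the CPU count.
# (Hypothetical helper name, for illustration.)
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

if [ -r /proc/loadavg ]; then
    # The first field of /proc/loadavg is the 1-minute load average.
    read -r load1 rest < /proc/loadavg
    cpus=$(getconf _NPROCESSORS_ONLN)   # number of CPUs currently online
    echo "1-minute load per CPU: $(load_per_cpu "$load1" "$cpus")"
fi
```

A result near or above 1.00 suggests that the CPUs are fully occupied; a result well below 1.00 suggests spare capacity.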
While uptime provides a basic metric, top gives finer details. While running top, you can press 1 to see the load per CPU at the top of the screen and you can see which processes are consuming the most CPU resources. The command ps aux also shows CPU resources per process.
The commands df and du are used to identify disk space. The disk-free command (df, also sometimes called disk-full or disk-file system) lists every mounted partition and the amount of disk usage. The default output shows the information in blocks. You can also see the output in a human-readable form (-h) and see the sizes in kilobytes, megabytes, or gigabytes: df -h. The df command also allows you to specify a file or directory name. In this case, it will show the disk usage for the partition containing the file (or directory). For example, to see how much space is available on the partition containing the current directory, use:
$ df .       # default output
Filesystem  1K-blocks      Used  Available  Use%  Mounted on
/dev/hda1   154585604  72737288   73995748   50%  /
$ df -h .    # human readable form
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/hda1   148G   70G    71G   50%  /
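If you want to act on the Use% figure from a script, the output of df can be parsed. The sketch below assumes the POSIX output format (df -P) and a filesystem name without spaces; the helper name usage_percent and the 90 percent threshold are illustrative assumptions, not recommendations.

```shell
#!/bin/sh
# usage_percent PATH: print the Use% figure (without the % sign) for the
# partition containing PATH. (Hypothetical helper name.)
usage_percent() {
    # -P selects the POSIX output format, which keeps each filesystem on a
    # single line; the fifth column is the percentage in use.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

limit=90   # assumed warning threshold, in percent
used=$(usage_percent .)
case "$used" in
    ''|*[!0-9]*) echo "Could not determine disk usage" ;;
    *)
        if [ "$used" -ge "$limit" ]; then
            echo "Warning: the partition holding $(pwd) is ${used}% full"
        else
            echo "The partition holding $(pwd) is ${used}% full"
        fi
        ;;
esac
```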
You can also use the System Monitor (System → Administration → System Monitor) to graphically show the df results (see Figure 7-2).
Figure 7-2: System Monitor showing available disk space
The disk-usage (du) command shows disk usage by directory. When used by itself, it will display the disk space in your current directory and every subdirectory. If you specify a directory, then it starts there instead. To see the biggest directories, you can use a command like du | sort -rn | head. This will sort all directories by size and display the top 10 biggest directories. Finally, you can use the -s parameter to stop du from listing the sizes from every subdirectory. When I am looking for disk hogs in my directory, I usually use du -s * | sort -rn | head. This lists the directories in size order. I can then enter the biggest directory and repeat the command until I find the largest files.
Tip | The du command looks at every file in every subdirectory. If you have thousands of files, then this could take a while. When looking for large directories, consider the ones that take the longest to process. If every directory takes a second to display and one directory takes a minute, then you can press Ctrl+C because you probably found the biggest directory. |
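The du pipeline can be wrapped in a small function so that it is easy to repeat while descending into directories. This is one possible sketch, assuming GNU find and du; the name biggest is hypothetical. Unlike du -s *, it also counts hidden entries, and the -x option keeps du from crossing into other mounted file systems.

```shell
#!/bin/sh
# biggest DIR [COUNT]: list the COUNT largest entries directly inside DIR,
# largest first, with sizes in kilobytes. (Hypothetical helper name.)
biggest() {
    dir="${1:-.}"
    count="${2:-10}"
    # -s: one total per entry; -k: kilobytes; -x: stay on one file system.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -skx {} + |
        sort -rn | head -n "$count"
}

# Run as: sh biggest.sh /some/directory 5
if [ "$#" -gt 0 ]; then
    biggest "$@"
fi
```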
All processes that access a disk do so over the same I/O channel. If the channel becomes clogged with traffic, then the entire system may slow down. It is very easy for a low-CPU application to consume most of the disk I/O. While the system load will remain low, the computer will appear sluggish.
If the system seems to be running slowly, you can use iostat (sudo apt-get install sysstat) to check the performance (see Listing 7-2). Besides showing the system load, the I/O metrics from each device are displayed. I usually use iostat with the watch command in order to identify devices that seem overly active.
watch --interval 0.5 iostat
Listing 7-2: Installing and Using iostat
$ sudo apt-get install sysstat   # install iostat
$ iostat
Linux 2.6.15-26-686 (chutney)    09/30/2006

avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
           0.22    0.00     0.13     0.10    0.00   99.55

Device:    tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
hda       1.98       19.46       23.12   2805161   3332264
hdb       0.18        1.39        0.46    200903     66000
sda       0.01        0.06        0.01      8612      1536
sdb       0.01        0.07        0.01      9518      1536
md0       0.01        0.11        0.01     15450      1392
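If iostat is not installed, the kernel's cumulative per-device counters can be read directly from /proc/diskstats, which appears in the /proc listing earlier. The sketch below prints the device name with the total sectors read and written since boot; the function name disk_sectors is an assumption for this example, and the field positions follow the kernel's documented diskstats layout.

```shell
#!/bin/sh
# disk_sectors [FILE]: print "device sectors_read sectors_written" for each
# line of a diskstats file (default: /proc/diskstats).
# (Hypothetical helper name, for illustration.)
disk_sectors() {
    # Field 3 is the device name, field 6 the sectors read, and field 10
    # the sectors written.
    awk '{ printf "%s %s %s\n", $3, $6, $10 }' "${1:-/proc/diskstats}"
}

if [ -r /proc/diskstats ]; then
    disk_sectors
fi
```

Because the counters are cumulative, you need two samples taken some seconds apart to see which device is busy right now.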
After finding which device is active, you can identify where the device is mounted using the mount command:
$ mount
/dev/hda1 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
/sys on /sys type sysfs (rw)
varrun on /var/run type tmpfs (rw)
varlock on /var/lock type tmpfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
devshm on /dev/shm type tmpfs (rw)
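The same lookup can be scripted by reading /proc/mounts, which lists one mounted file system per line with the device in the first column and the mount point in the second. This is a small sketch; the function name mount_points_of is hypothetical.

```shell
#!/bin/sh
# mount_points_of DEVICE [FILE]: print every mount point of DEVICE as listed
# in FILE (default: /proc/mounts). (Hypothetical helper name.)
mount_points_of() {
    awk -v dev="$1" '$1 == dev { print $2 }' "${2:-/proc/mounts}"
}

# Run as: sh mounts.sh /dev/hda1
if [ "$#" -gt 0 ]; then
    mount_points_of "$1"
fi
```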
Now that you know which device is active and where it is used, you can use lsof to identify which processes are using the device. For example, if device hda is the most active and it is mounted on /, then you can use lsof / to list every process accessing the directory. If a raw device is being used, then you can specify all devices with lsof /dev or a single device (for example, hda) using lsof /dev/hda.
Note | Unfortunately, there is no top-like command for disk I/O. You can narrow down the list of suspected applications using lsof, but you cannot identify which application is consuming most of the disk resources. |
RAM is a limited resource on the system. If your applications allocate all available RAM, then the kernel will begin swapping memory to disk. Although swap space can allow you to run massively large applications, swap is also very slow compared to just using RAM. There are a couple of ways to view swap usage. The command swapon -s will list the available swap space and show the usage. There is usually a little swap space used, but if it is nearly full then you need to allocate more swap space, install more RAM, or find out what is consuming the available RAM. The System Monitor (System → Administration → System Monitor) enables you to graphically view memory and swap usage and see whether swap is actively being used (see Figure 7-3).
Figure 7-3: The System Monitor displaying CPU, memory, swap, and network usage
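For a scriptable view of swap usage, the SwapTotal and SwapFree lines in /proc/meminfo give the same totals in kilobytes. The following sketch computes the percentage of swap in use; the function name swap_used_percent is an assumption for this example.

```shell
#!/bin/sh
# swap_used_percent [FILE]: print the percentage of swap space in use, based
# on the SwapTotal and SwapFree lines of FILE (default: /proc/meminfo).
# (Hypothetical helper name, for illustration.)
swap_used_percent() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END {
             if (total > 0) printf "%d\n", (total - free) * 100 / total
             else print 0     # no swap configured
         }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "Swap in use: $(swap_used_percent)%"
fi
```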
To identify which applications are consuming memory, use the top or ps aux commands. Both of these commands show memory allocation per process. In addition, the pmap command can show you memory allocations for specific process IDs.
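The per-process figures can also be read straight from /proc. As a rough sketch, the VmRSS line in /proc/PID/status reports the resident memory of a process in kilobytes; kernel threads have no such line, and a process may exit while the scan is running, so both cases are treated as empty results. The function name rss_kb is hypothetical.

```shell
#!/bin/sh
# rss_kb PID: print the resident memory of a process in kilobytes, or nothing
# if the process has no VmRSS line or has exited. (Hypothetical helper name.)
rss_kb() {
    { awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"; } 2>/dev/null || true
}

# List the five processes with the most resident memory as "kilobytes PID".
for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    kb=$(rss_kb "$pid")
    if [ -n "$kb" ]; then
        echo "$kb $pid"
    fi
done | sort -rn | head -n 5
```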
The amount of memory on your video card will directly impact your display. If you have an old video card with 256 KB of RAM, then the best you can hope for is 800x600 with 16 colors. Most high-end video cards today have upwards of 128 MB of RAM, allowing monster resolutions like 1280x1024 with 32-bit color (over 16 million colors). More memory also eases animation for games and desktops. While one set of video memory holds the main picture, other memory sections can act as layers for animated elements.
There is no simple way to determine video memory. If you have a PCI or AGP video card, then the command lspci -v will show you all PCI devices (including your video card) and all memory associated with each one. For example:
$ lspci -v | more
0000:01:00.0 VGA compatible controller: nVidia Corporation NV18 [GeForce4 MX 400 0 AGP 8x] (rev c1) (prog-if 00 [VGA])
        Subsystem: Jaton Corp: Unknown device 0000
        Flags: bus master, 66MHz, medium devsel, latency 248, IRQ 177
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at f0000000 (32-bit, prefetchable) [size=128M]
        Expansion ROM at fbee0000 [disabled] [size=128K]
        Capabilities: <available only to root>
This listing shows an NVIDIA NV18 video card with 128 MB of video RAM.
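To pull that figure out of a long lspci listing, you can filter for the prefetchable memory regions of VGA controllers. The sketch below is a heuristic only: it assumes the lspci -v layout shown above, and the size of a PCI memory region is an address window that may not equal the amount of RAM actually installed on the card. The function name vga_prefetchable is hypothetical.

```shell
#!/bin/sh
# vga_prefetchable: read "lspci -v" output on standard input and print the
# size of each prefetchable memory region that belongs to a VGA controller.
# (Hypothetical helper name; a heuristic, not an exact measure of video RAM.)
vga_prefetchable() {
    awk '
        # A line that starts in column one begins a new device entry.
        /^[^ \t]/ { in_vga = /VGA compatible controller/ }
        # ", prefetchable)" skips regions marked non-prefetchable.
        in_vga && /, prefetchable\)/ {
            if (match($0, /size=[0-9]+[KMG]/))
                print substr($0, RSTART + 5, RLENGTH - 5)
        }'
}

if command -v lspci >/dev/null 2>&1; then
    lspci -v 2>/dev/null | vga_prefetchable
fi
```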
Tip | On large supercomputers, lspci not only shows what is attached but also where. For example, if you have eight network cards then it can identify which slot each card is in. This is extremely useful for diagnostics in a mission-critical environment with fail-over hardware support. One example is to use (lspci -t ; lspci -v) | less to show the bus tree and each item's details. |
Just as disk I/O can create a performance bottleneck, so can network I/O. While some applications poll the network for data and increase CPU load when the network is slow, most applications just wait until the network is available and do not impact the CPU's load.
If the computer seems sluggish when accessing the network, then you can check the network performance using netstat -i inet:
$ netstat -i inet
Kernel Interface table
Iface    MTU  Met   RX-OK  RX-ERR  RX-DRP  RX-OVR   TX-OK  TX-ERR  TX-DRP  TX-OVR  Flg
eth0    1500    0  338386       0       0       0  737350       0       0       0  BMRU
lo     16436    0     786       0       0       0     786       0       0       0  LRU
vmnet   1500    0       0       0       0       0     465       0       0       0  BMRU
vmnet   1500    0       0       0       0       0     465       0       0       0  BMRU
This shows the amount of traffic on each network interface as well as any network errors, dropped packets, and overruns. This also shows the name of the network interface (for example, eth0). When checking network usage, I usually use netstat with the watch command so I can see network usage over time:
watch --interval 0.5 netstat -i inet
Tip | The netstat -i inet command shows the number of packets from every interface. You can also use ifconfig (for example, ifconfig eth0) to see more detail; ifconfig shows the number of packets and number of bytes from a particular network interface. |
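The interface counters are also available in /proc/net/dev, which is convenient when you want to log them from a script. This sketch prints the interface name followed by received packets, receive errors, transmitted packets, and transmit errors. The function name net_counters is an assumption for this example, and the column positions assume the standard /proc/net/dev layout with two header lines.

```shell
#!/bin/sh
# net_counters [FILE]: print "iface rx_packets rx_errors tx_packets tx_errors"
# for each interface in FILE (default: /proc/net/dev).
# (Hypothetical helper name, for illustration.)
net_counters() {
    awk -F'[: ]+' 'NR > 2 {
        sub(/^ +/, "")               # drop padding so $1 is the interface
        print $1, $3, $4, $11, $12   # name, RX packets, RX errs, TX packets, TX errs
    }' "${1:-/proc/net/dev}"
}

if [ -r /proc/net/dev ]; then
    net_counters
fi
```

As with the disk counters, these values are cumulative, so compare two samples to see current activity.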
The netstat -t and netstat -u commands allow you to see which network connections are active. The -t option shows TCP traffic, and -u shows UDP traffic. There are many other options, including --protocol=ip to show all IP (IPv4) connections; IPv6 connections are listed with --protocol=ip6.
To identify which processes are using the network, you can use lsof. The -i4 parameter shows which processes have IPv4 connections, -i6 displays IPv6, -i tcp lists TCP, and -i udp displays applications with open UDP sockets:
$ lsof -i4 -n   # show network processes and give IP addresses as numbers
COMMAND  PID   USER  FD  TYPE  DEVICE  SIZE  NODE  NAME
ssh      8699  mark  3u  IPv4  120398        TCP   10.3.1.5:41525->10.3.1.3:ssh (ESTABLISHED)
ssh      8706  mark  3u  IPv4  120576        TCP   10.3.1.5:41526->10.3.7.245:ssh (ESTABLISHED)