To begin, let's look at some commands you can issue from the UNIX prompt to give you some information about your system. The commands I'll cover are:

    iostat
    vmstat
    netstat
    ps
    kill
    showmount
    swapinfo and swap
    sar

We'll first look at each of these commands so that you get an understanding of the output produced by them and how this output may be used. There are online manual pages for many of the commands covered on your HP-UX system.

Please keep in mind that, like all topics we have covered, the output of these commands may differ somewhat among UNIX variants. The basic information produced on most UNIX variants is the same; however, the format of the outputs may differ somewhat. This usually is not significant if you're viewing the outputs; however, if you're writing programs that accept these outputs and manipulate them in some way, then the format of the outputs is important.

I/O and CPU Statistics with iostat

The iostat command gives you an indication of the level of effort the CPU is putting into I/O and the amount of I/O taking place among your disks and terminals. iostat provides a lot of useful information; however, it acts somewhat differently among UNIX variants. The following examples show issuing iostat on a Solaris system, an HP-UX system, and an AIX system. iostat was not supported on the Linux system I was using for this chapter. Note that on some systems, using the -t option for terminal information produces just terminal information, and on some systems it produces a full output. You will, of course, have to determine the best options for your needs on your UNIX variant.
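To illustrate why the output format matters to a program, here is a minimal Python sketch that reads fields by header name rather than by column position. The function name parse_table is hypothetical, and the two samples are shortened copies of the HP-UX and AIX disk reports from this chapter's iostat output; real output on your system may need different handling.

```python
def parse_table(text):
    """Parse whitespace-separated output whose first line is a header.

    Returns one dict per data row, keyed by the header names, so the
    caller looks fields up by name instead of by column position.
    """
    rows = [line.split() for line in text.strip().splitlines()]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, fields)) for fields in data]


# Device section of one HP-UX "iostat -t" sample (copied from this chapter).
HPUX_DEVICES = """\
device bps sps msps
c1t2d0 484 249.6 1.0
"""

# Disk section of one AIX "iostat" sample (copied from this chapter).
AIX_DISKS = """\
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk0 0.4 2.7 0.4 2366635 959304
hdisk1 0.0 0.0 0.0 18843 37928
"""

hpux = parse_table(HPUX_DEVICES)

# The AIX header prints "% tm_act" as two words; join them so the header
# has the same number of fields as each data row.
aix = parse_table(AIX_DISKS.replace("% tm_act", "%tm_act"))

print(hpux[0]["device"], float(hpux[0]["bps"]))
for disk in aix:
    print(disk["Disks:"], float(disk["Kbps"]))
```

The same throughput figure sits under "bps" on HP-UX and "Kbps" on AIX, and the AIX header needs a small adjustment before it lines up with its data, which is exactly the kind of variant-specific detail a script has to account for.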
The following examples show the iostat command:

Here is a Solaris example executed ten times at five-second intervals:

    # iostat 5 10
       tty         fd0           sd1           sd3           sd6           cpu
     tin tout  kps tps serv  kps tps serv  kps tps serv  kps tps serv  us sy wt  id
       0    0    0   0    0    0   0    0    3   0   57    0  79    0   0  7 49  43
       0   47    0   0    0    0   0    0   14   2   75    0   0    0   0  2  0  98
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  1  0  98
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  2  0  98
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  0  0 100
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  0  0 100
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  1  0  99
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  0  0 100
       0   16    0   0    0    0   0    0    6   1   35    0   0    0   0  4  0  96
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  0  0 100

An HP-UX example includes the -t option executed five times at five-second intervals:

    # iostat -t 5 5
          tty           cpu
       tin tout   us  ni  sy  id
         1   58    5   1  10  84

      device    bps     sps  msps
      c1t2d0      0     0.0   1.0

          tty           cpu
       tin tout   us  ni  sy  id
         0   30    0   2  26  72

      device    bps     sps  msps
      c1t2d0    484   249.6   1.0

          tty           cpu
       tin tout   us  ni  sy  id
         0   31    1   3  23  73

      device    bps     sps  msps
      c1t2d0    517   256.1   1.0

          tty           cpu
       tin tout   us  ni  sy  id
         0   35    0   2  23  75

      device    bps     sps  msps
      c1t2d0    456   254.4   1.0

          tty           cpu
       tin tout   us  ni  sy  id
         0  744    1   6  38  55

      device    bps     sps  msps
      c1t2d0    155    83.1   1.0
    #

Here is an AIX example executed ten times at five-second intervals:

    # iostat 5 10

    tty:   tin   tout   avg-cpu:  % user  % sys  % idle  % iowait
           0.0    0.0                0.3    1.0    98.4       0.3

    Disks:   % tm_act    Kbps     tps   Kb_read  Kb_wrtn
    hdisk0        0.4     2.7     0.4   2366635   959304
    hdisk1        0.0     0.0     0.0     18843    37928
    hdisk2        0.1     0.6     0.1    269803   423284
    hdisk3        0.0     0.0     0.0     20875      172
    cd0           0.0     0.0     0.0        14        0

    tty:   tin   tout   avg-cpu:  % user  % sys  % idle  % iowait
           0.0  108.2                0.0    0.2    99.8       0.0

    Disks:   % tm_act    Kbps     tps   Kb_read  Kb_wrtn
    hdisk0        0.0     0.0     0.0         0        0
    hdisk1        0.0     0.0     0.0         0        0
    hdisk2        0.0     0.0     0.0         0        0
    hdisk3        0.0     0.0     0.0         0        0
    cd0           0.0     0.0     0.0         0        0

    tty:   tin   tout   avg-cpu:  % user  % sys  % idle  % iowait
           0.0  108.4                0.2    0.8    99.0       0.0

    Disks:   % tm_act    Kbps     tps   Kb_read  Kb_wrtn
    hdisk0        0.0     0.0     0.0         0        0
    hdisk1        0.0     0.0     0.0         0        0
    hdisk2        0.0     0.0     0.0         0        0
    hdisk3        0.0     0.0     0.0         0        0
    cd0           0.0     0.0     0.0         0        0

    tty:   tin   tout   avg-cpu:  % user  % sys  % idle  % iowait
           0.0  108.4                0.4    0.2    99.4       0.0

    Disks:   % tm_act    Kbps     tps   Kb_read  Kb_wrtn
    hdisk0        0.0     0.0     0.0         0        0
    hdisk1        0.0     0.0     0.0         0        0
    hdisk2        0.0     0.0     0.0         0        0
    hdisk3        0.0     0.0     0.0         0        0
    cd0           0.0     0.0     0.0         0        0

    tty:   tin   tout   avg-cpu:  % user  % sys  % idle  % iowait
           0.0  108.2                0.4    0.6    99.0       0.0

    Disks:   % tm_act    Kbps     tps   Kb_read  Kb_wrtn
    hdisk0        0.0     0.0     0.0         0        0
    hdisk1        0.0     0.0     0.0         0        0
    hdisk2        0.0     0.0     0.0         0        0
    hdisk3        0.0     0.0     0.0         0        0
    cd0           0.0     0.0     0.0         0        0

    tty:   tin   tout   avg-cpu:  % user  % sys  % idle  % iowait
           0.0  108.4                0.0    0.4    99.6       0.0

    Disks:   % tm_act    Kbps     tps   Kb_read  Kb_wrtn
    hdisk0        0.0     0.0     0.0         0        0
    hdisk1        0.0     0.0     0.0         0        0
    hdisk2        0.0     0.0     0.0         0        0
    hdisk3        0.0     0.0     0.0         0        0
    cd0           0.0     0.0     0.0         0        0

    tty:   tin   tout   avg-cpu:  % user  % sys  % idle  % iowait
           0.0  108.4                0.6    0.0    99.4       0.0

    Disks:   % tm_act    Kbps     tps   Kb_read  Kb_wrtn
    hdisk0        0.0     0.0     0.0         0        0
    hdisk1        0.0     0.0     0.0         0        0
    hdisk2        0.0     0.0     0.0         0        0
    hdisk3        0.0     0.0     0.0         0        0
    cd0           0.0     0.0     0.0         0        0

    tty:   tin   tout   avg-cpu:  % user  % sys  % idle  % iowait
           0.0  108.2                0.2    0.8    99.0       0.0

    Disks:   % tm_act    Kbps     tps   Kb_read  Kb_wrtn
    hdisk0        0.0     0.0     0.0         0        0
    hdisk1        0.0     0.0     0.0         0        0
    hdisk2        0.0     0.0     0.0         0        0
    hdisk3        0.0     0.0     0.0         0        0
    cd0           0.0     0.0     0.0         0        0

    tty:   tin   tout   avg-cpu:  % user  % sys  % idle  % iowait
           0.0  108.4                0.4    0.0    99.6       0.0

    Disks:   % tm_act    Kbps     tps   Kb_read  Kb_wrtn
    hdisk0        0.0     0.0     0.0         0        0
    hdisk1        0.0     0.0     0.0         0        0
    hdisk2        0.0     0.0     0.0         0        0
    hdisk3        0.0     0.0     0.0         0        0
    cd0           0.0     0.0     0.0         0        0

    tty:   tin   tout   avg-cpu:  % user  % sys  % idle  % iowait
           0.0  108.4                0.4    0.4    99.2       0.0

    Disks:   % tm_act    Kbps     tps   Kb_read  Kb_wrtn
    hdisk0        0.0     0.0     0.0         0        0
    hdisk1        0.0     0.0     0.0         0        0
    hdisk2        0.0     0.0     0.0         0        0
    hdisk3        0.0     0.0     0.0         0        0
    cd0           0.0     0.0     0.0         0        0

Here are descriptions of the reports you receive with iostat for terminals, the CPU, and mounted file systems. Because the reports are somewhat different, I have included detailed information from the HP-UX output. A more detailed description of these fields is included in the iostat online manual page available on your HP-UX system.
Most of the fields appear in all of the outputs; however, the outputs of the commands differ somewhat among UNIX variants.

For every terminal you have connected (tty), you see a "tin" and a "tout," which represent the number of characters read from your terminal and the number of characters written to your terminal, respectively.

For your CPU, you see the percentage of time spent in user mode ("us"), the percentage of time spent running user processes at a low priority called nice ("ni"), the percentage of time spent in system mode ("sy"), and the percentage of time the CPU is idle ("id").

For every locally mounted file system, you receive information on the kilobytes transferred per second ("bps"), number of seeks per second ("sps"), and number of milliseconds per average seek ("msps"). For disks that are NFS-mounted or disks on client nodes of your server, you will not receive a report; iostat reports only on locally mounted file systems.

When viewing the output of iostat, there are some parameters to take note of.

First, note the time that your CPU is spending in the four categories shown. On HP-UX, the CPU report is produced with the -t option. I have worked on systems with poor performance that the administrator assumed to be the result of a slow CPU, even though the "id" number was very high, indicating that the CPU was actually idle most of the time. If the CPU is mostly idle, the chances are that the bottleneck is not the CPU, but may be I/O, memory, or networking. If the CPU is indeed busy most of the time ("id" is very low), see whether any processes are running "nice" (check the "ni" number). It may be that there are some background processes consuming a lot of CPU time that can be changed to run "nice."

Second, compare the number of transfers taking place. These are usually indicated by something like kilobytes per second (bps), transfers per second (tps), or seeks per second (sps). These numbers give an indication of the amount of activity taking place on a disk.
If one volume is consistently much higher than other volumes, then it may be performing an inordinate amount of the workload. Notice on HP-UX that the milliseconds per average seek (msps) for all disks is always equal to one.

Virtual Memory Statistics with vmstat

vmstat provides virtual memory statistics. It provides information on the status of processes, virtual memory, paging activity, faults, and a breakdown of the percentage of CPU time. vmstat acts somewhat differently among UNIX variants. The following examples show issuing vmstat on a Solaris system, an HP-UX system, an AIX system, and a Linux system. You will, of course, have to determine the best options for your needs on your UNIX variant.

In the following examples, the output was produced nine times at five-second intervals (five times in the Linux example). The first argument to the vmstat command is the interval; the second is the number of times you would like the output produced.

Solaris example:

    # vmstat 5 9
     procs     memory            page             disk         faults       cpu
     r b w   swap  free re mf pi po fr de sr f0 s1 s3 s6   in   sy   cs us sy  id
     0 0 0   4480  4696  0  0  1  0  0  0  0  0  0  0 79  864  130  297  0  7  92
     0 0 0 133020  5916  0  3  0  0  0  0  0  0  0  3  0  102   42   24  0  2  98
     0 0 0 133020  5916  0  0  0  0  0  0  0  0  0  0  0   70   48   24  0  0 100
     0 0 0 133020  5916  0  0  0  0  0  0  0  0  0  0  0   74   42   24  0  0 100
     0 0 0 133020  5916  0  0  0  0  0  0  0  0  0  0  0   35   45   23  0  0  99
     0 0 0 133020  5916  0  0  0  0  0  0  0  0  0  0  0   65   66   26  0  0 100
     0 0 0 133020  5916  0  0  0  0  0  0  0  0  0  0  0   52   44   23  0  1  99
     0 0 0 133020  5916  0  0  0  0  0  0  0  0  0  0  0   53   54   24  0  1  99
     0 0 0 133020  5916  0  0  0  0  0  0  0  0  0  1  0   60   53   25  0  2  98

HP-UX example:

    # vmstat 5 9
      procs       memory           page                faults        cpu
     r   b w    avm  free re at pi po fr de sr    in     sy   cs us sy id
     5 240 0  17646  3979  2  0  0  0  0  0  0     0    778  193 17  3 80
     4 242 0  16722  4106  0  0  0  0  0  0  0   814  20649  258 89 10  2
     4 240 0  16649  4106  0  0  0  0  0  0  0    83  18384  218 91  9  0
     4 240 0  16468  4106  0  0  0  0  0  0  0   792  19552  273 89 11  1
     5 239 0  15630  4012  9  0  0  0  0  0  0   804  18295  270 93  8 -1
     5 241 0  16087  3934  6  0  0  0  0  0  0   920  21044  392 89 10  0
     5 241 0  15313  3952 11  0  0  0  0  0  0   968  20239  431 90 10  0
     4 242 0  16577  4043  3  0  0  0  0  0  0   926  19230  409 89 10  0
     6 238 0  17453  4122  0  0  0  0  0  0  0   837  19269  299 89  9  2

AIX example:

    martyp $ vmstat 5 9
    kthr     memory             page              faults        cpu
    ----- ----------- ------------------------ ------------ -----------
     r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
     0  0 16604   246   0   0   0   0    2   0 149   79  36  0  1 98  0
     0  0 16604   246   0   0   0   0    0   0 153  125  41  0  0 99  0
     0  0 16604   246   0   0   0   0    0   0 143   83  33  0  0 99  0
     0  0 16604   246   0   0   0   0    0   0 140   94  35  0  1 99  0
     0  0 16604   246   0   0   0   0    0   0 166   62  32  0  0 99  0
     0  0 16604   246   0   0   0   0    0   0 150  102  38  1  0 99  0
     0  0 16604   246   0   0   0   0    0   0 183   78  34  0  0 99  0
     0  0 16604   246   0   0   0   0    0   0 132   87  33  0  1 99  0
     0  0 16604   246   0   0   0   0    0   0 147   84  38  0  0 99  0

Linux example:

    # vmstat 5 5
      procs               memory       swap        io      system       cpu
     r  b  w  swpd  free  buff  cache  si  so   bi   bo   in    cs  us  sy  id
     1  0  0  9432  1160   656  12024   1   2   14    1  138   274   3   1  96
     1  0  0  9684   828   652  12148   0  50    0   14  205  8499  82  18   0
     1  0  0  9684   784   652  11508   0   0    0    1  103  8682  81  19   0
     1  0  0  9684   800   652  10996   0   0    0    0  101  8683  80  20   0
     0  0  0  9772   796   652   9824  12  18    3    4  160  6577  66  17  18

You certainly get a lot for your money out of the vmstat command. Here is a brief description of the categories of information produced by vmstat. I have included a description of the fields in the HP-UX example because of the manual page that appears at the end of this chapter for HP-UX. You can see, however, that the outputs are very similar.

Processes are classified into one of three categories: runnable ("r"), blocked on I/O or short-term resources ("b"), or swapped ("w").

Next you will see information about memory. "avm" is the number of virtual memory pages owned by processes that have run within the last 20 seconds. If this number is roughly the size of physical memory minus your kernel, then you are near forced paging. The "free" column indicates the number of pages on the system's free list.
It doesn't mean that the process is finished running and these pages won't be accessed again; it just means that they have not been accessed recently. I suggest that you ignore this column.

Next is paging activity. The first field ("re") shows the pages that were reclaimed. These pages made it to the free list but were later referenced and had to be salvaged.

Next you see the number of faults in three categories: interrupts per second, which usually come from hardware ("in"), system calls per second ("sy"), and context switches per second ("cs").

The final output is CPU usage percentage for user ("us"), system ("sy"), and idle ("id"). This is not as complete as the iostat output, which also shows nice entries.

If you are running an I/O-intensive workload, you may indeed see a lot of activity in runnable processes ("r"), blocked processes ("b"), and the runnable but swapped ("w") processes. If you have many runnable but swapped processes, then you probably have an I/O bottleneck.

Network Statistics with netstat

netstat provides information related to network statistics. Because network bandwidth has as much to do with performance as the CPU and memory in some networks, you want to get an idea of the level of network traffic you have.

I use two forms of netstat to obtain network statistics. The first is netstat -i, which shows the state of interfaces that are autoconfigured. Although netstat -i gives a good rundown of the primary LAN interface, such as the network it is on, its name, and so on, it does not show useful statistical information.

The following shows the output of netstat -i:

    # netstat -i
    Name  Mtu   Network  Address      Ipkts   Ierrs  Opkts   Oerrs  Col
    lan0  1497  151.150  a4410.e.h.c  242194  120    107665  23     19884

netstat provides a concise output. Put another way, most of what you get from netstat is useful. Here is a description of the nine fields in the netstat example:

Name | The name of your network interface (Name), in this case, "lan0." |
Mtu | The "maximum transmission unit," which is the maximum packet size sent by the interface card. |
Network | The network address of the LAN to which the interface card is connected (151.150). |
Address | The host name of your system. This is the symbolic name of your system as it appears in the /etc/hosts file if your networking is configured to use /etc/hosts. |

Below is the statistical information. Depending on the system you are using, or revision of OS, you may not see some of these fields:

Ipkts | The number of packets received by the interface card, in this case, "lan0." |
Ierrs | The number of errors detected on incoming packets by the interface card. |
Opkts | The number of packets transmitted by the interface card. |
Oerrs | The number of errors detected during the transmission of packets by the interface card. |
Col | The number of collisions that resulted from packet traffic. |

netstat provides cumulative data since the node was last powered up; therefore, you might have a long elapsed time over which data was accumulated. If you are interested in seeing useful statistical information, you can use netstat with different options. You can also specify an interval to report statistics. I usually ignore the first entry, because it shows all data since the system was last powered up. This means that the data includes non-prime hours when the system was idle. I prefer to view data at the time the system is working its hardest.

The following examples show running netstat -I and specifying the lan interface for Solaris, HP-UX, and AIX. These outputs are nearly identical, although the name of the network interface does vary among UNIX variants. The netstat command is run at an interval of five seconds. The Linux version of this command, which is not shown, does not allow me to specify an interval.
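The practice of ignoring the first, cumulative entry can also be applied in a script. The following Python sketch is only an illustration: the function names interval_rows and collision_percent are hypothetical, the sample text is a shortened copy of the HP-UX netstat -I lan0 5 output from this section, and the parser assumes ten numeric columns per data row, which may not hold on every UNIX variant.

```python
# Shortened copy of the HP-UX "netstat -I lan0 5" output in this section.
SAMPLE = """\
(lan0)-> input output (Total)-> input output
packets errs packets errs colls packets errs packets errs colls
269841735 27 256627585 1 5092223 281472199 27 268258048 1 5092223
1602 0 1238 0 49 1673 0 1309 0 49
1223 0 1048 0 25 1235 0 1060 0 25
1516 0 1151 0 42 1560 0 1195 0 42
"""


def interval_rows(text):
    """Return the per-interval rows, skipping headers and the first data row.

    The first data row holds totals accumulated since the system was last
    powered up, so it is dropped before any further processing.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows are assumed to be exactly ten unsigned integers.
        if len(fields) == 10 and all(f.isdigit() for f in fields):
            rows.append([int(f) for f in fields])
    return rows[1:]


def collision_percent(out_packets, collisions):
    """Collisions expressed as a percentage of packets transmitted."""
    if out_packets == 0:
        return 0.0
    return 100.0 * collisions / out_packets


samples = interval_rows(SAMPLE)
for in_pkts, in_errs, out_pkts, out_errs, colls, *totals in samples:
    pct = collision_percent(out_pkts, colls)
    print(f"in={in_pkts} out={out_pkts} colls={colls} ({pct:.1f}% of output)")
```

Only the first five columns (the named interface) are used here; the remaining five are the totals across all interfaces.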
Solaris example:

    # netstat -I le0 5
         input    le0       output              input   (Total)     output
       packets  errs    packets  errs    colls     packets  errs    packets  errs    colls
     116817990     0    3299582 11899  1653100   116993185     0    3474777 11899  1653100
           185     0          3     0        0         185     0          3     0        0
           273     0          8     0        0         273     0          8     0        0
           153     0          3     0        0         153     0          3     0        0
           154     0          3     0        0         154     0          3     0        0
           126     0          3     0        0         126     0          3     0        0
           378     0          2     0        0         378     0          2     0        0
           399     0          4     0        0         399     0          4     0        0
           286     0          2     0        0         286     0          2     0        0

HP-UX example (10.x):

    # netstat -I lan0 5
      (lan0)-> input        output            (Total)-> input        output
       packets  errs    packets  errs    colls     packets  errs    packets  errs    colls
     269841735    27  256627585     1  5092223   281472199    27  268258048     1  5092223
          1602     0       1238     0       49        1673     0       1309     0       49
          1223     0       1048     0       25        1235     0       1060     0       25
          1516     0       1151     0       42        1560     0       1195     0       42
          1553     0       1188     0       17        1565     0       1200     0       17
          2539     0       2180     0       44        2628     0       2269     0       44
          3000     0       2193     0      228        3000     0       2193     0      228
          2959     0       2213     0      118        3003     0       2257     0      118
          2423     0       1981     0       75        2435     0       1993     0       75

AIX example:

    # netstat -I en0 5
         input   (en0)      output              input   (Total)     output
       packets  errs    packets  errs    colls     packets  errs    packets  errs    colls
      46333531     0    1785025     0        0    47426087     0    2913405     0        0
           203     0          1     0        0         204     0          2     0        0
           298     0          1     0        0         298     0          1     0        0
           293     0          1     0        0         304     0         12     0        0
           191     0          1     0        0         191     0          1     0        0
           150     0          2     0        0         151     0          3     0        0
           207     0          3     0        0         218     0         15     0        0
           162     0          3     0        0         162     0          4     0        0
           120     0          2     0        0         120     0          2     0        0

With this example, you get multiple outputs of what is taking place on the LAN interface, including the totals on the right side of the output. As I mentioned earlier, you may want to ignore the first output, because it includes information over a long time period. This may include a time when your network was idle, and therefore the data may not be important to you.

You can specify the network interface on which you want statistics reported by using -I interface; in the case of the example, it was -I and either le0, lan0, or en0. An interval of five seconds was also used in this example.

Analyzing netstat statistical information is intuitive.
You want to verify that the collisions (Colls) are much lower than the packets transmitted (Opkts). Collisions occur on output from your LAN interface. Every collision your LAN interface encounters slows down the network. You will get varying opinions about what is too many collisions. If your collisions are less than 5 percent of "Opkts," you're probably in good shape and better off spending your time analyzing some other system resource. If this number is high, you may want to consider segmenting your network in some way, such as installing networking equipment between portions of the network that don't share a lot of data.

As a rule of thumb, if you reduce the number of packets you are receiving and transmitting ("Ipkts" and "Opkts"), then you will have less overall network traffic and fewer collisions. Keep this in mind as you plan your network or upgrades to your systems. You may want to have two LAN cards in systems that are in constant communication. That way, these systems have a "private" LAN over which to communicate and do not adversely affect the performance of other systems on the network. One LAN interface on each system is devoted to communication among these systems. This provides a "tight" communication path among systems that usually act as servers. The second LAN interface is used to communicate with any systems that are usually clients on a larger network.

You can also obtain information related to routing with netstat (see Chapter 13). The -r option to netstat shows the routing tables, which you usually want to know about, and the -n option can be used to print network addresses as numbers rather than as names. In the following examples, netstat is issued with the -r option (this will be used when describing the netstat output) and the -rn options, so that you can compare the two outputs:

$ netstat -r
Routing tables

Destination | Gateway | Flags | Refs | Use | Interface | Pmtu |
---|---|---|---|---|---|---|
hp700 | localhost | UH | 0 | 28 | lo0 | 4608 |
default | router1 | UG | 0 | 0 | lan0 | 4608 |
128.185.61 | system1 | U | 347 | 28668 | lan0 | 1500 |

$ netstat -rn
Routing tables

Destination | Gateway | Flags | Refs | Use | Interface | Pmtu |
---|---|---|---|---|---|---|
127.0.0.1 | 127.0.0.1 | UH | 0 | 28 | lo0 | 4608 |
default | 128.185.61.1 | UG | 0 | 0 | lan0 | 4608 |
128.185.61 | 128.185.61.2 | U | 347 | 28668 | lan0 | 1500 |

With netstat, some information is provided about the router, which is the middle entry. The -r option shows information about routing, but many other useful options to this command are available. Of particular interest in this output is "Flags," which defines the type of routing that takes place. Here are descriptions of the most common flags, which may be different among UNIX variants, from the online manual page on my HP-UX system.

1=U | Route to a network via a gateway that is the local host itself. |
3=UG | Route to a network via a gateway that is the remote host. |
5=UH | Route to a host via a gateway that is the local host itself. |
7=UGH | Route to a host via a remote gateway that is a host. |

The first line is for the local host, or loopback interface, called lo0, at address 127.0.0.1 (you can see this address in the netstat -rn example). The UH flags indicate that the destination address is the local host itself. This Class A address allows a client and server on the same host to communicate with one another via TCP/IP. A datagram sent to the loopback interface won't go out onto the network; it will simply go through the loopback.

The second line is for the default route. This entry says to send packets to router1 if a more specific route can't be found. In this case, the router has a UG under Flags. Some routers are configured with a U; others, such as the one in this example, with a UG. I've found that I usually end up determining through trial and error whether a U or UG is required. If there is a U in Flags and I am unable to ping a system on the other side of a router, a UG entry usually fixes the problem.

The third line is for the system's network interface, lan0. This means to use this network interface for packets to be sent to 128.185.61.
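The flag descriptions and the three routing entries above can be tied together in a short Python sketch. It is an illustration only: the names FLAG_MEANINGS, parse_routes, and default_gateway are hypothetical, the sample text is the netstat -rn output shown above with its columns separated by whitespace, and the flag meanings are the HP-UX descriptions quoted in this section, which may differ on other UNIX variants.

```python
# Flag meanings as quoted from the HP-UX manual page in this section.
FLAG_MEANINGS = {
    "U": "route to a network via a gateway that is the local host itself",
    "UG": "route to a network via a gateway that is the remote host",
    "UH": "route to a host via a gateway that is the local host itself",
    "UGH": "route to a host via a remote gateway that is a host",
}

# The "netstat -rn" routing table from this section, whitespace-separated.
ROUTES = """\
Destination Gateway Flags Refs Use Interface Pmtu
127.0.0.1 127.0.0.1 UH 0 28 lo0 4608
default 128.185.61.1 UG 0 0 lan0 4608
128.185.61 128.185.61.2 U 347 28668 lan0 1500
"""


def parse_routes(text):
    """Return one dict per routing entry, keyed by the column headings."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def default_gateway(routes):
    """Return the gateway of the default route, or None if there is none."""
    for route in routes:
        if route["Destination"] == "default":
            return route["Gateway"]
    return None


routes = parse_routes(ROUTES)
for route in routes:
    meaning = FLAG_MEANINGS.get(route["Flags"], "flag combination not listed")
    print(f'{route["Destination"]:<12} {route["Flags"]:<4} {meaning}')

print("default gateway:", default_gateway(routes))
```

Using the -rn form as input keeps the sketch independent of name resolution, since every destination and gateway is already a number.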