The system manager is responsible not only for the provision of an IT service to customers, but also for the maintenance of an acceptable level of performance. To this end, the system manager must ensure that the systems under his control are not suffering from poor performance or substandard response times. Performance monitoring is normally carried out by the system administrator, but the system manager needs a good appreciation of what is involved and of the tools that are available to obtain the required information. This is especially important because the system manager will be compiling the periodic reports that will undoubtedly form part of an ongoing SLA.

Regular monitoring of the systems builds a crucial library of information, highlighting trends in both usage and performance. For example, the data gathered could provide an early warning of increased usage or load on the systems, perhaps necessitating a hardware upgrade or even a replacement system. The important point is that, armed with this information, the system manager should not face any surprises; any additional processing power or other resource that might be needed can be planned for and implemented before it becomes a performance issue.

The other side of performance monitoring is to investigate problems as they occur, to try to identify bottlenecks that could be affecting the overall performance of the system, or to gather evidence that the loading is not evenly balanced; this is especially visible with disks. Frequently, a problem may manifest itself in one way but actually be caused by something else. This is one reason why the activity of monitoring is so useful: the process of elimination helps to narrow down the true cause of the problem. The next subsections describe the kinds of problems likely to be encountered, as well as the tools provided with the standard Solaris release.
These sections also discuss some utilities and tools that are available both commercially and in the public domain.

What Are You Looking For?

Monitoring the performance of the system generally involves looking for areas of weakness or contention. A bottleneck can slow down the rest of the system; for example, a badly balanced disk setup that is overloaded with requests can cause the processor to wait regularly for data. The monitoring process can also provide evidence that the current resources are starting to struggle with the load being placed upon them. The system manager can use this information as evidence that an upgrade is required (possibly due to increased usage by the customer) and also to justify why the consumer department should pay for it in order to maintain the required level of service.

Solaris Utilities

Solaris provides a comprehensive set of tools to allow monitoring of the system's performance. These are discussed in the following paragraphs to show the type of information that can be obtained and what to look for to ascertain whether there is a performance problem. The final utility in this section, top, originated in the public domain and is now included with the standard release of the Solaris operating environment (as of Solaris 8). It is extremely useful for seeing the processes that are utilizing most of the system resources.

vmstat

The vmstat utility is listed as a tool for reporting virtual memory statistics, but it does much more than that: it also gives a good overall picture of how the system is performing. Listing 9.3 shows the output from the vmstat command.
Listing 9.3 Output from the vmstat Command Using an Interval Period of 3 Seconds

# vmstat 3
 procs     memory            page             disk          faults       cpu
 r b w   swap    free   re mf  pi  po  fr de sr s0 s6 s7 s8   in   sy   cs us sy id
 0 0 0    1184  96992    0  6  11   9  12  0  0  2  0  0  2  260 1253  429 64  8 29
 6 0 0 2873928  33128    0  0   5  18  18  0  0 23  0  0 66 1754 4880 1806 81 19  0
 3 0 0 2873928  33136    0  0   0   2   2  0  0 26  0  0 33 1281 5937 1523 83 17  0
 3 0 0 2873928  33128    0  0   2   2   2  0  0 15  0  0 15  988 6416 1523 83 16  0
 1 0 0 2873928  33128    1  0   8  18  18  0  0 16  0  0 23 1229 5428 1755 77 22  0
 2 0 0 2873928  33128    0  0   2   2   2  0  0 28  0  0  6  966 6366 1342 84 16  0
 2 1 0 2873928  33136    0  0   5   8   8  0  0 29  0  0 30 1405 6175 1925 81 18  1
 2 0 0 2873928  33128   15  0 525 312 312  0  0 31  0  0 35 1452 3331 2005 76 20  4
 1 2 0 2873928  32744   15  1   8 594 594  0  0  9  0  0 77 1838 3356 2047 74 23  3
 2 0 0 2873928  33128    4  0   8  45  45  0  0 24  0  0 19 1178 3077 1616 80 20  1
 1 0 0 2873928  33120   14  1   8 128 128  0  0 28  0  0 14  975 3099 1439 70 23  6
 1 1 0 2873928  33120    5  0  16  66  66  0  0 27  0  0 32 1386 3764 1953 74 22  4
 2 1 0 2873928  33128    3  0   2  90  90  0  0 34  0  0 81 1984 2961 1820 78 18  3
 2 0 0 2873928  33128    2  0   2  24  24  0  0  2  0  0 26 1255 3518 1944 78 21  0
 2 1 0 2873928  33128    3  0   5  58  58  0  0 30  0  0 74 1804 2810 1656 75 23  2
 2 0 0 2873928  33184   10  0 592  77  77  0  0  8  0  0 32 1129 2972 1490 80 20  1
 3 0 0 2873928  33176    1  0 202  10  10  0  0 31  0  0 26 1305 3315 1825 71 26  3
 1 0 0 2873928  33128    1  0   2  13  13  0  0 27  0  0 14  952 3063 1366 68 24  8
 2 0 0 2873928  33128    0  0   2   8   8  0  0 12  0  0 40 1529 3514 2060 73 24  4
 2 2 0 2873928  33120    5  0  13 261 261  0  0 30  0  0 83 2260 3466 2180 71 18 11

Ignore the first entry here because it is a summary of all activity since the system was last rebooted. The information provided in this entry is meaningless.
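Figures like these can also be checked automatically rather than read by eye. The following sketch pulls the run queue, scan rate, and idle columns out of a single vmstat data line with awk; the sample line is transcribed from the listing above, and the thresholds are purely illustrative, not Sun-recommended limits.

```shell
# One data line captured from "vmstat 3" (field positions as in Listing 9.3):
# r is field 1, sr is field 12, and id is the last field.
line="6 0 0 2873928 33128 0 0 5 18 18 0 0 23 0 0 66 1754 4880 1806 81 19 0"

runq=$(echo "$line" | awk '{print $1}')       # runnable processes
scanrate=$(echo "$line" | awk '{print $12}')  # page scan rate
idle=$(echo "$line" | awk '{print $NF}')      # idle CPU percentage

# Illustrative thresholds only; real limits depend on the system and workload.
if [ "$scanrate" -gt 200 ]; then
    echo "possible memory shortage (sr=$scanrate)"
fi
if [ "$runq" -gt 4 ] && [ "$idle" -lt 10 ]; then
    echo "CPU saturated (r=$runq, id=$idle%)"   # prints for this sample line
fi
```

A script along these lines, run from cron against saved vmstat output, is one simple way of building the historical library of performance data discussed earlier.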
This system is heavily loaded and could benefit from additional processor power, as shown by the number of runnable processes in the "r" column, coupled with the fact that the CPU is running almost constantly at near 100% busy, with very little idle time. Additionally, the disk balancing is not ideal, with s0 and s8 bearing most of the work while s7 has no activity at all. There are also some blocked processes, identified by the "b" column; this indicates that a process had to wait for I/O, probably from disk. The output in this listing also shows that there is no shortage of physical memory; otherwise, the scan rate (sr) column would be showing high numbers (more than 200-300). The page-in and page-out columns (pi and po) would be high (maybe several thousand) if the system were being used as an NFS server; these columns show both file system I/O activity and virtual memory activity.

The vmstat command shows at a glance the overall status of the various components. For fuller details of processor usage on systems containing more than one processor, use mpstat. Detailed disk information can be found using the iostat command; further statistics about the swap space can be displayed with the swap command. These commands are discussed next.

mpstat

The mpstat command produces a tabular report for each processor in a multiprocessor environment, providing useful information on how the CPUs are performing, how busy they are, whether the load is evenly balanced among them, and how much time is spent waiting for other resources. Listing 9.4 displays a sample output from running the mpstat command.
Listing 9.4 Output from the mpstat Command Using an Interval Period of 5 Seconds

# mpstat 5
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    2   0    8   39    3  159   55    6    2    0   450  67   7   1  26
  1    2   0   14  275   52  170   56    6    2    0   522  58   8   2  32
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0   23   0    3  306   49 1264  553   29    7    0  2204  86  13   0   1
  1   51   0   16  543   26 1208  581   29    7    0  2036  82  18   0   1
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    0   0    2   88   32  637  274   13    4    0  1791  83  17   0   0
  1    0   0    6  279   17  193   91   13    5    0   388  92   8   0   0
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    0   0    1   77    1  269  105   32    5    0   506  89   4   0   7
  1    0   0    5  297   58  573  226   33    4    0  1123  67   5   0  28
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    0   0    1   85    2  213   95   24    2    0   778  73  19   0   8
  1    0   0    7  267   36  907  404   23    3    0  2003  89   9   1   2
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    0   0    4   74    3  622  258   26    3    0  1253  94   4   0   2
  1    0   0    6  291   27  314  112   27    2    0   611  53  15   0  32
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    0   0    1   16    2   23   14    3    0    0    27  99   1   0   0
  1    0   0    7  217   16   28    7    2    0    0   611  34  15   0  52
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    0   0    5   16    6   68    9    7    0    0    81  60   2   1  37
  1    0   0    7  233   30   53    8    5    0    0   385  53   8   1  39
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    0   0    1   53    2  158   51    5    0    0   161  79   2   0  19
  1    0   0    8  262   33  170   37    6    0    0   517  41  12   0  47
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    0   0    2   16    1   29   15    4    0    0    25  91   2   0   8
  1    0   0    6  223   22   25    8    3    0    0   680  43  17   0  40
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    2   0    6   38    3  213   71    7    0    0   307  61   4   0  35
  1    0   0    7  253   18  237   84    5    0    0   366  49   6   0  46
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    0   0    1   18    1   27    6    2    0    0    23  99   1   0   0
  1    0   0    5  212   11   21    8    4    0    0   979  52  27   0  21
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    0   0    6   16    6   48   10    4    0    0    59  83   0   1  15
  1    0   0    7  238   36   76    2    5    0    0    96  11   3   1  85
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl
  0    0   0    1   75    1  142   73    6    1    0   177  97   3   0   0
  1    0   0    7  271   18  138   64    5    1    0   875  52  17   0  31

As with vmstat, the first entry is a summary of all activity since the system was last rebooted. The information in this entry is meaningless and should be ignored. The columns of real interest are described here:
iostat

The iostat command reports statistics on terminal, disk, and tape I/O activity (as well as CPU utilization), although its main use is to monitor disk performance. It can be used to identify a badly balanced disk configuration, and it provides sufficient options to examine the performance of a single disk partition. Listing 9.5 displays the iostat command with extended statistics for each disk volume.

Listing 9.5 Sample Output from the iostat Command Using an Interval Period of 5 Seconds and the Flags -xn

# iostat -xn 5
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    4.0    0.0   30.7  0.0  0.0    0.0    9.7   0   4 c0t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t6d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t8d0
    1.8   49.7   16.0  397.6  0.0  0.4    0.0    8.5   0  34 c0t9d0
    0.0   74.5    0.0  595.7  0.0  0.5    0.0    6.3   0  44 c0t10d0
    2.0   66.3   22.4  530.2  0.0  0.7    0.0   11.0   0  56 c0t11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 rmt/0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   14.2    0.0   67.4  0.0  0.1    0.0    6.8   0  10 c0t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t6d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t8d0
    0.6   32.0    4.8  256.0  0.0  0.2    0.0    6.7   0  20 c0t9d0
    0.0   73.8    0.0  590.4  0.0  0.4    0.0    5.9   0  42 c0t10d0
    0.6   68.2    4.8  545.6  0.0  0.9    0.0   13.2   0  64 c0t11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 rmt/0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   23.6    0.0  111.8  0.0  0.2    0.0    7.0   0  16 c0t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t6d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t8d0
    1.8   52.2   14.4  417.5  0.0  0.4    0.0    6.9   0  34 c0t9d0
    0.0   34.6    0.0  276.7  0.0  0.2    0.0    5.9   0  20 c0t10d0
    1.6   89.2   22.4  713.4  0.0  1.5    0.0   16.1   0  85 c0t11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 rmt/0

As with vmstat and mpstat, the first entry is a summary of all activity since the system was last rebooted. The information in this entry is meaningless and should be ignored.
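Imbalance of this kind can also be spotted with a short script rather than by inspection. The sketch below uses device names and %b (percent busy) figures transcribed from Listing 9.5; the 50% threshold is an arbitrary illustration, not a recommended figure.

```shell
# Device name and percent-busy (%b) pairs, as reported by "iostat -xn"
# (values transcribed from one interval of Listing 9.5).
stats="c0t0d0 10
c0t9d0 20
c0t10d0 42
c0t11d0 64"

# Flag any volume busier than an illustrative 50% threshold -- a quick
# way to spot a badly balanced disk layout.
hot=$(echo "$stats" | awk '$2 > 50 {print $1}')
echo "overloaded volumes: $hot"   # prints: overloaded volumes: c0t11d0
```

In practice, the awk filter would be fed directly from iostat output, with the header lines stripped first; the principle of comparing %b figures across volumes is the same.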
The information provided by iostat clearly shows disk volumes that are being used excessively and those that are not being used at all. The columns of interest are described here:
Listing 9.6 shows a more detailed picture for disk volumes, displaying the usage information for each partition of each disk.

Listing 9.6 Sample Output from the iostat Command Using an Interval Period of 5 Seconds and the Flags -xnp to Show Greater Detail for Each Physical Disk Volume

# iostat -xnp 5
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    3.5   14.4   29.1   92.1  0.0  0.2    0.0   12.9   0  10 c0t0d0
    0.0    0.1    0.2    0.3  0.0  0.0    0.0   42.2   0   0 c0t0d0s0
    0.1    1.9    0.4   10.3  0.0  0.0    0.1   17.8   0   2 c0t0d0s1
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0s2
    0.0    0.0    0.0    0.1  0.0  0.0    0.0    9.0   0   0 c0t0d0s3
    0.3    8.9    2.5   46.1  0.0  0.1    0.0   14.3   0   6 c0t0d0s5
    0.4    0.0    2.5    0.1  0.0  0.0    0.0    5.4   0   0 c0t0d0s6
    2.6    3.5   23.5   35.1  0.0  0.1    0.0    9.3   0   4 c0t0d0s7
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t6d0
    1.6    2.2   17.0   18.4  0.0  0.0    0.0    5.8   0   2 c0t8d0
    0.5    0.3    6.9    2.3  0.0  0.0    0.0    5.2   0   0 c0t8d0s0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t8d0s2
    0.6    1.9    6.4   15.8  0.0  0.0    0.0    6.1   0   1 c0t8d0s5
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    5.2   0   0 c0t8d0s6
    0.5    0.0    3.7    0.3  0.0  0.0    0.0    5.2   0   0 c0t8d0s7
    0.0    0.1    0.9    0.9  0.0  0.0    0.0    5.5   0   0 c0t9d0
    0.0    0.1    0.5    0.5  0.0  0.0    0.0    5.5   0   0 c0t9d0s0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t9d0s1
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t9d0s2
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t9d0s6
    0.3    1.9   14.7   14.8  0.0  0.0    0.0    5.5   0   1 c0t9d0s7
    0.0    0.1    0.9    0.9  0.0  0.0    0.0    5.5   0   0 c0t10d0
    0.0    0.1    0.5    0.5  0.0  0.0    0.0    5.5   0   0 c0t10d0s0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t10d0s1
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t10d0s2
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t10d0s6
    0.3    2.0   15.7   15.9  0.0  0.0    0.0    5.5   0   1 c0t10d0s7
    0.0    0.1    0.9    0.9  0.0  0.0    0.0    5.5   0   0 c0t11d0
    0.0    0.1    0.5    0.5  0.0  0.0    0.0    5.5   0   0 c0t11d0s0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t11d0s1
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t11d0s2
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t11d0s6
    0.3    2.0   16.1   16.2  0.0  0.0    0.0    5.5   0   1 c0t11d0s7
    0.0    0.1    0.9    0.9  0.0  0.0    0.0    5.5   0   0 c0t12d0
    0.0    0.1    0.5    0.5  0.0  0.0    0.0    5.5   0   0 c0t12d0s0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t12d0s1
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t12d0s2
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t12d0s6
    0.3    2.1   16.4   16.6  0.0  0.0    0.0    5.5   0   1 c0t12d0s7
    0.0    0.6    0.1   37.4  0.0  0.0    0.0   35.1   0   2 rmt/0

The information displayed in Listing 9.5 could identify a disk volume that might be the cause of a performance problem, but the information in Listing 9.6 can isolate the problem to a particular disk partition or file system that is being used to excess. The usual conclusion is either that the disk is too slow and needs to be replaced with a faster one, or that the file systems resident on the disk need to be split across multiple disks to achieve a performance gain. Such a problem would not normally point to a disk controller being at fault unless all disk volumes and partitions serviced by a particular controller displayed the same symptoms.

sar

sar, which stands for "System Activity Report," is used to collect cumulative activity information and, optionally, to gather that information into files for subsequent analysis. By default, automatic collection of this information is disabled. To enable it, carry out the following steps:
The data is collected and stored in daily files in the directory /var/adm/sa. The files are named saxx, where xx is a number representing the day of the month. An example of sar output monitoring CPU activity is shown in Listing 9.7.

Listing 9.7 Sample Output from the sar Command Using the Flag -u to Show CPU Activity with a Time Interval of 5 Seconds

SunOS systemA 5.7 Generic_106541-12 sun4u    11/20/00

15:29:02    %usr    %sys    %wio   %idle
15:29:07      80      20       0       0
15:29:12      78      22       0       0
15:29:17      79      21       0       0
15:29:22      80      20       0       0
15:29:27      83      17       0       0

Average       80      20       0       0

The sar command provides information on many system resources and, because the data is stored in daily files, it can be used to good effect for historical analysis. Consult the sar manual page for a full description of the facilities available with this command.

perfmeter

The performance meter, or perfmeter, is a graphical display of system performance. It allows monitoring of performance on remote hosts as well as the local system, but it provides less detail than the commands already discussed in this section. The data can be displayed either as a strip chart or as multiple dials. Figures 9.1 and 9.2 show an example of each type of display, with all the available options selected.

Figure 9.1. The strip chart displays cumulative results and is extremely useful in identifying peaks of activity over a given period of time.
Figure 9.2. The hour hand of the dials represents the average figure over a 20-second period, while the minute hand shows the average over a 2-second period.
top

This utility is bundled with Solaris from version 8 onward and is also freely available in the public domain. It displays information about the processes currently using the most system resources, and the display is updated regularly at an interval that is configurable by the user. The top command can be downloaded from a number of sites on the World Wide Web, including http://www.sunfreeware.com. It is easily installed and comes either precompiled or as source code, along with supporting documentation. It is most useful for analyzing performance problems as they happen, because a real-time display shows exactly which processes are executing and how much of the CPU is currently allocated to each. The top command also shows summary information, such as the current system load, the amount of swap space and physical memory in use (and how much is free), how many processes are on the system and what their respective states are, and how CPU utilization is divided.
Listing 9.8 A Snapshot from the top Command Showing the Processes Using the Most Resources

load averages:  0.63,  0.61,  0.64                               18:58:46
103 processes: 95 sleeping, 1 running, 4 zombie, 3 on cpu
CPU states: 43.4% idle, 25.0% user, 27.2% kernel, 4.4% iowait, 0.0% swap
Memory: 2048M real, 317M free, 517M swap in use, 3198M swap free

  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
15132 nobody     1  58    0   49M   40M sleep   0:09  7.34% oracle
 6811 oracle     1   1    0   48M   37M sleep   0:21  1.88% oracle
29672 john       1  11    0   19M   12M cpu1    0:03  1.56% sqlplus
29673 john       1  11    0   48M   37M cpu1    0:02  1.47% oracle
15128 oracle     1  58    0   20M   12M sleep   0:01  0.84% svrmgrl
 6658 nobody     7  12    0   24M   21M sleep   0:15  0.80% jre
15879 john       1   0    0 2208K 1592K cpu0    0:00  0.72% top
    1 root       1  38    0  752K  304K sleep   5:36  0.42% init
15970 root       1   0    0 1080K  840K sleep   0:00  0.13% mkbb.sh
 2949 oracle     1  58    0   60M   52M sleep   3:57  0.11% oracle
16104 root       1   0    0 1080K  664K sleep   0:00  0.11% mkbb.sh
 1395 oracle    60  58    0   50M   36M sleep   0:48  0.09% oracle
28718 nobody     1  58    0   49M   40M sleep   0:10  0.08% oracle
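Output like that in Listing 9.8 can also be aggregated per user, which is handy when one account (an Oracle instance, say) owns many small processes. The sketch below totals the CPU percentage per username using sample figures transcribed from the listing, simplified to two columns.

```shell
# Username and %CPU pairs taken from a few rows of Listing 9.8
# (the "%" sign dropped for arithmetic).
procs="nobody 7.34
oracle 1.88
john 1.56
john 1.47
oracle 0.84"

# Sum the CPU percentage for each user and print the totals sorted by name.
summary=$(echo "$procs" |
    awk '{cpu[$1] += $2} END {for (u in cpu) printf "%s=%.2f\n", u, cpu[u]}' |
    sort)
echo "$summary"
# prints:
# john=3.03
# nobody=7.34
# oracle=2.72
```

Fed from a batch-mode snapshot of top (or from ps output, which is scriptable on any Solaris release), the same awk one-liner shows at a glance which account is consuming the machine.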