You can gather the appropriate performance information for router processors through MIBs or through the command-line interface (CLI) via Telnet or rsh. We will look at both methods, explain what the highlighted variables indicate and why they are useful, and show how to act on the results. We will also identify some starting-point thresholds to set and watch for when monitoring performance. Note, however, that the threshold settings defined in this section are only a starting point and nothing more. You must first understand your network traffic flows and network characteristics before setting appropriate thresholds, and thresholds will constantly need tweaking and re-evaluation to meet the needs of your environment.
MIB Variables for Router CPU Utilization
Process-intensive tasks, such as process switching data packets, processing routing updates, interface flapping, or handling broadcasts/multicasts, can cause CPU utilization on the router to increase.
From OLD-CISCO-CPU-MIB or OLD-CISCO-SYS-MIB, the avgBusy5 value reports the percentage of processor time in use, averaged over a running five-minute window.
Starting in IOS version 12.0(3)T, the CISCO-PROCESS-MIB object cpmCPUTotal5min replaces the avgBusy5 object from OLD-CISCO-CPU-MIB.
The avgBusy5 object provides a more accurate view of your router's performance over time than the objects avgBusy1 and busyPer, which report CPU utilization over one-minute and five-second intervals, respectively. It is valuable for trend monitoring and capacity planning of the network. Watching the CPU closely when it exceeds the rising threshold value is more important than watching the falling threshold value, especially when you are troubleshooting performance degradation. In other words, high CPU utilization might indicate a problem, whereas low CPU utilization usually does not.
The recommended baseline rising threshold for avgBusy5 is 90 percent.
Depending on the platform, a low-end router such as a 2500 running at 90 percent may exhibit performance degradation, whereas a high-end router such as a 7500 series may operate fine at the same utilization. Trending CPU over time is important for developing your baseline.
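The rising/falling threshold behavior described above can be implemented as a simple hysteresis check in a polling script. The following is a minimal sketch, not a prescribed implementation: the 90 percent rising threshold matches the suggested avgBusy5 baseline, but the 60 percent falling threshold and the function name are illustrative assumptions.

```python
# Hysteresis check for a CPU utilization threshold (RMON-style rising/falling
# alarm). 90% rising matches the suggested avgBusy5 baseline; the 60% falling
# threshold is an assumed example value, not a Cisco recommendation.
RISING_THRESHOLD = 90
FALLING_THRESHOLD = 60

def evaluate(samples, rising=RISING_THRESHOLD, falling=FALLING_THRESHOLD):
    """Return the alarm events generated by a series of avgBusy5 samples.

    An alarm is raised the first time a sample crosses the rising threshold
    and is not raised again until a sample drops to the falling threshold.
    The hysteresis prevents alarm flapping around a single threshold value.
    """
    armed = True          # ready to raise a rising alarm
    events = []
    for value in samples:
        if armed and value >= rising:
            events.append(("rising", value))
            armed = False
        elif not armed and value <= falling:
            events.append(("falling", value))
            armed = True
    return events

if __name__ == "__main__":
    # One sustained CPU spike should produce exactly one rising and one
    # falling event, not an alarm for every sample above the threshold.
    print(evaluate([40, 95, 97, 92, 55, 45]))
```

With a single threshold, the samples 95, 97, and 92 would each generate an alarm; the two-threshold approach reports the spike once.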
Related MIB objects from OLD-CISCO-CPU-MIB or OLD-CISCO-SYS-MIB are avgBusy1, avgBusy5, and busyPer.
CLI Commands Relating to CPU on the Router
The show proc cpu command displays the five-second, one-minute, and five-minute CPU utilization for the router. It also provides a breakdown of the individual processes running on the router and what percentage of the CPU each process takes. This command is very useful when you are troubleshooting, especially when avgBusy5 exceeds the threshold you define. Many valuable data points in the display output can help you identify possible problem areas with the router or in your network. Looking at the individual processes running with high CPU percentages associated with them can help narrow your scope of where a particular problem may lie.
Example 11-3 shows sample output from show proc cpu.
Example 11-3 Using show proc cpu to get CPU information.
Router>sh proc cpu
CPU utilization for five seconds: 12%(A)/3%(B); one minute: 7%(C); five minutes: 10%(D)
 PID  Runtime(ms)  Invoked  uSecs    5Sec   1Min      5Min  TTY Process
   1            0        1      0   0.00%  0.00%     0.00%    0 SCOP Input
   2       179548    30793   5830   0.00%  0.00%     0.00%    0 Check heaps
   3            4        9    444   0.00%  0.00%     0.00%    0 Pool Manager
   4            0        2      0   0.00%  0.00%     0.00%    0 Timers
   5         8272   711817     14   0.00%  0.00%     0.00%    0 OIR Handler
   6            0        1      0   0.00%  0.00%     0.00%    0 IPC Zone Manager
   7            0        1      0   0.00%  0.00%  10.00%(E)   0 IP Input(F)
   8            0    74520      0   0.00%  0.00%     0.00%    0 IPC Seat Manager
   9          116    26351      4   0.00%  0.00%     0.00%    0 ARP Input
 ...
Important information from Example 11-3 is annotated as follows:

A: Total CPU utilization over the last five seconds.
B: The portion of the five-second utilization spent at interrupt level (that is, handling fast-switched packets).
C: Average CPU utilization over the last minute.
D: Average CPU utilization over the last five minutes; this corresponds to the avgBusy5 MIB object.
E: The five-minute CPU percentage consumed by an individual process.
F: The process name; a high value for IP Input, for example, indicates a large volume of process-switched IP traffic.
MIB Variables for Router Device Uptime
From MIB RFC 1213, the sysUpTime variable indicates how long the router has been up since the last reboot, which is commonly caused by a power-on, a reload, or a software exception error. The value is reported in hundredths of a second.
sysUpTime is not that valuable by itself, but it is useful in comparison or correlation with other variables. Comparing sysUpTime across the routers in your network can help you correlate downtimes and overall network availability, and you should be able to correlate sysUpTime on routers to your change management "windows" easily. For example, if you have a scheduled power outage on a weekend in one portion of the network, the routers in that area should all show approximately the same sysUpTime afterward. If any router's sysUpTime differs significantly from the others, you know something unexpected happened.
The recommended baseline threshold for sysUpTime is that all the routers in a common region of the network reflect approximately the same sysUpTime value. Change management practices and accurate documentation will provide the evidence on why and when the router reloaded.
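The comparison described above is easy to automate: poll sysUpTime from each router in a region and flag any router that deviates from the norm. The following sketch makes some assumptions not in the text: the router names, the use of the median as the regional reference point, and the one-hour tolerance are all illustrative choices.

```python
def uptime_outliers(uptimes, tolerance_ticks=360_000):
    """Flag routers whose sysUpTime differs noticeably from the region's norm.

    uptimes: mapping of router name -> sysUpTime in hundredths of a second
    (the units RFC 1213 defines for sysUpTime).
    tolerance_ticks: allowed deviation from the regional median; the default
    of 360,000 ticks (1 hour) is an assumed starting point, not a Cisco value.
    """
    values = sorted(uptimes.values())
    median = values[len(values) // 2]
    return {name: ticks for name, ticks in uptimes.items()
            if abs(ticks - median) > tolerance_ticks}

if __name__ == "__main__":
    # Two routers rebooted in the same maintenance window three days ago;
    # the third reloaded unexpectedly a day later (8,640,000 ticks = 1 day).
    region = {"rtr-a": 25_920_000, "rtr-b": 25_923_000, "rtr-c": 17_280_000}
    print(uptime_outliers(region))
```

Any router the check flags is a candidate for a look at whyReload and the change management records.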
A related MIB object from OLD-CISCO-SYS-MIB is whyReload, which returns a printable octet string containing the reason why the system was last restarted.
CLI Commands Relating to sysUpTime
The show version command displays the IOS software release running and the hardware inventory of what is installed on the router, as well as the configuration register setting, which can directly affect how the router boots up when it reloads. This output also displays the reason for the last restart.
Example 11-4 shows sample output from show version.
Example 11-4 Using show version to obtain uptime information.
Router>sh ver
Cisco Internetwork Operating System Software
IOS (tm) GS Software (RSP-J-M), Version 11.1(18)CA1, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1)
Synced to mainline version: 11.1(18)
Copyright (c) 1986-1998 by cisco Systems, Inc.
Compiled Tue 21-Apr-98 19:41 by richardd
Image text-base: 0x60010900, data-base: 0x607AE000

ROM: System Bootstrap, Version 11.1(2) [nitin 2], RELEASE SOFTWARE (fc1)
ROM: GS Software (RSP-BOOT-M), Version 11.1(6), RELEASE SOFTWARE (fc1)

GOAL uptime is 2 days, 10 hours, 57 minutes
System restarted by reload at 10:44:07 UTC Thu Jan 28 1999 (A)
System image file is "slot0:rsp-j-mz.111-18.CA1", booted via slot0

cisco RSP2 (R4700) processor with 65536K/2072K bytes of memory.
R4700 processor, Implementation 33, Revision 1.0
Last reset from power-on
G.703/E1 software, Version 1.0.
G.703/JT2 software, Version 1.0.
SuperLAT software (copyright 1990 by Meridian Technology Corp).
Bridging software.
X.25 software, Version 2.0, NET2, BFE and GOSIP compliant.
TN3270 Emulation software (copyright 1994 by TGV Inc).
Chassis Interface.
1 HIP controller (1 HSSI).
1 FIP controller (1 FDDI).
1 HSSI network interface.
1 FDDI network interface.
123K bytes of non-volatile configuration memory.

16384K bytes of Flash PCMCIA card at slot 0 (Sector size 128K).
4096K bytes of Flash internal SIMM (Sector size 256K).
No slave installed in slot 3.
Configuration register is 0x2102
The "System restarted by…" line (A) tells you why the router was reloaded. Other values you may see here are "power-on" or "software forced crash."
If the router was restarted by a software error, the show stack command output provides details, mostly in hex code, on why the router crashed. This information is valuable when you open a case with the Cisco Technical Assistance Center (TAC) because the engineers there have internal tools that can decode the stack trace. You can also decode it yourself on the Cisco Connection Online (CCO) Stack Decoder Web page: http://www.cisco.com/stack/stackdecoder.shtml.
MIB Variables for Memory Utilization on Routers
An unstable network, such as one with flapping routes, can cause fragmented memory or constant system buffer creates and trims. Actively monitoring the largest contiguous free memory block in a router helps you trend and gauge the available contiguous memory. You can have all the free memory you want, but if the largest contiguous free block is too small, the router will not function properly: processes cannot run if they cannot get enough contiguous free memory to execute. You'll know you are hitting a critical point when you start seeing %SYS-2-MALLOCFAIL or %SYS-2-NOMEMORY messages in syslog, or when you are unable to Telnet into the router due to low memory. Typically, the syslog messages appear before you are kicked out of a Telnet session. Memory leaks or other defects may cause this fragmentation.
The following MIB objects are relevant for monitoring memory utilization on routers: ciscoMemoryPoolLargestFree, ciscoMemoryPoolFree, and freeMem.
The ciscoMemoryPoolLargestFree variable indicates the largest number of contiguous bytes currently unused in the memory pool on the managed device. This object was first introduced in IOS release 11.1; on earlier IOS versions, you must run the CLI command show mem to get the same data. The gap between total free memory and ciscoMemoryPoolLargestFree is an indicator of how fragmented memory is.
The recommended baseline threshold for ciscoMemoryPoolLargestFree is 500 KB for the low watermark.
The ciscoMemoryPoolFree and freeMem variables indicate the number of bytes from the memory pool that are currently unused on the managed device. It is still important to trend the amount of memory free on the router for capacity planning, even when monitoring the largest memory block free.
Monitoring total free memory helps mainly in the capacity planning of your network. Changing traffic patterns or the addition of more networks to the infrastructure can cause the free memory in the routers to change, and total memory changes are even more evident when you upgrade your router to another IOS release. Recall from the previous introduction to system memory that IOS takes up a portion of the total DRAM installed, leaving you with the actual amount of free memory available for other processes. It is therefore wise to re-baseline your free memory after an IOS upgrade to accurately represent the amount available for processes in the router. The memory pool is accessed when buffers are needed to process incoming packets or when a router process needs more memory, such as when routing updates are sent out. When buffers release memory back to the pool, you'll see the trim counters increment.
The recommended baseline threshold for ciscoMemoryPoolFree and freeMem is 1 MB for the low watermark.
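Both low watermarks above can be checked from a single poll of the memory pool objects. The sketch below uses the 500 KB and 1 MB starting points suggested in the text; the function name and alarm strings are illustrative assumptions.

```python
# Low watermarks from the text: 500 KB for the largest contiguous free block
# (ciscoMemoryPoolLargestFree) and 1 MB for total free memory
# (ciscoMemoryPoolFree / freeMem). Both are starting points to be tuned.
LARGEST_FREE_LOW = 500 * 1024
TOTAL_FREE_LOW = 1024 * 1024

def memory_alarms(pool_free, largest_free):
    """Return low-watermark alarms for one memory pool.

    pool_free and largest_free are byte counts, as the MIB objects report.
    A pool can have plenty of total free memory yet still be unusable if
    the largest contiguous block is too small (fragmentation).
    """
    alarms = []
    if largest_free < LARGEST_FREE_LOW:
        alarms.append("fragmentation: largest contiguous block below 500 KB")
    if pool_free < TOTAL_FREE_LOW:
        alarms.append("capacity: total free memory below 1 MB")
    return alarms

if __name__ == "__main__":
    # Plenty of total free memory but a badly fragmented pool: only the
    # fragmentation alarm should fire.
    print(memory_alarms(pool_free=16_262_336, largest_free=300_000))
```

Treat the fragmentation alarm as the more urgent of the two; as the text notes, %SYS-2-MALLOCFAIL messages typically follow soon after.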
CLI Commands for Memory Usage
There are several show commands relating to memory utilization on routers. The following two are discussed here: show memory and show proc memory.
Using the show memory Command
The show mem output displays data for the processor memory, fast cache, and I/O memory. When gathering this data, it is important to leave your terminal length set to 24 because the output of this command can be rather large; the most pertinent data appears within the first four lines displayed. The output fields are shown in bytes, as indicated by the (b) suffix (see Example 11-5).
The primary use of show mem is on routers not running IOS version 11.1 or later, which added support for the CISCO-MEMORY-POOL-MIB. The "Largest(b)" field is the value to trend because it is equivalent to the MIB object ciscoMemoryPoolLargestFree. You can also use this command to see how low free memory has dropped since the last restart, based on the "Lowest(b)" field; that value is not available via a MIB.
Example 11-5 shows sample output from show mem for both high-end and low-end routers.
Example 11-5 Using show mem to obtain information on memory usage.
High-end routers (7xxx series):

Router>show mem
           Head      Total(b)     Used(b)      Free(b)      Lowest(b)    Largest(b)
Processor  60E6F330  18418896(A)  2156560(B)   16262336(C)  16117456(D)  16157784(E)
     Fast  60E4F330  131072(A)    80144(B)     50928(C)     50928(D)     50892(E)
 --More--

Low-end routers (4xxx, 2500, 3600, etc. series):

Router>sh mem
           Head      Total(b)     Used(b)      Free(b)      Lowest(b)    Largest(b)
Processor  60947BF0  23823376(A)  1890660(B)   21932716(C)  21783296(D)  21801520(E)
      I/O  40000000  16777216(A)  1390348(B)   15386868(C)  15384516(D)  15386356(E)
 --More--
The following information is highlighted in Example 11-5:

A: Total(b), the total amount of memory in the pool, in bytes.
B: Used(b), the amount of memory currently in use.
C: Free(b), the total amount of memory currently free.
D: Lowest(b), the lowest amount of free memory recorded since the last restart.
E: Largest(b), the largest contiguous block of free memory, equivalent to ciscoMemoryPoolLargestFree.
Using the show proc memory Command
The command show proc mem displays the amount of memory taken by each process running on the router, including the memory each process has allocated and freed and the amount it is currently holding.
This output, which is available only via the CLI, provides an understanding of which processes use the most and least memory in the router. Typically, each process running on the router takes a fixed amount of memory, defined by the process when it starts up, and the amount stays fairly constant over time. But if you see a steady increase in the memory held by a process (the Holding column), it could indicate a memory leak. A memory leak occurs when memory is allocated for a process but is never released back to main memory when it is no longer in use.
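A crude way to spot such a leak is to sample the Holding value for each process over several polls and flag any process whose held memory grows at every interval. The sketch below is an illustrative assumption, not an IOS facility: the sample data, the function name, and the "grows every interval" rule are all choices made for the example.

```python
def leaking_processes(samples):
    """Return processes whose 'Holding' bytes increased at every sample.

    samples: list of {process_name: holding_bytes} dicts from successive
    show proc mem polls, oldest first. Normal processes hold a fairly
    constant amount of memory; a process that grows across every interval
    is a leak suspect worth investigating.
    """
    suspects = []
    for name in samples[0]:
        series = [snap.get(name) for snap in samples]
        if None not in series and all(a < b for a, b in zip(series, series[1:])):
            suspects.append(name)
    return suspects

if __name__ == "__main__":
    # Hypothetical polls: one process grows steadily, the other is constant.
    polls = [
        {"IP Input": 7000, "Exec": 5032},
        {"IP Input": 9500, "Exec": 5032},
        {"IP Input": 13200, "Exec": 5032},
    ]
    print(leaking_processes(polls))
```

Three samples is the minimum for a trend; in practice you would want more polls, spaced widely, before opening a case.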
Example 11-6 shows sample output from show proc mem.
Example 11-6 Using show proc mem to obtain memory usage information.
Router# show proc mem
Total: 5611448, Used: 2307548, Free: 3303900
 PID TTY   Allocated      Freed     Holding     Getbufs    Retbufs Process
   0   0      199592       1236  1907220(A)          0          0 *Init*
   0   0         400      76928         400          0          0 *Sched*
   0   0     5431176    3340052      140760     349780          0 *Dead*
   1   0         256        256        1724          0          0 Load Meter
   2   0         264          0        5032          0          0 Exec
   3   0           0          0        2724          0          0 Check heaps
   4   0       97932          0        2852      32760          0 Pool Manager
   5   0         256        256        2724          0          0 Timers
   6   0          92          0        2816          0          0 CXBus hot stall
   7   0           0          0        2724          0          0 IPC Zone Manager
   8   0           0          0        2724          0          0 IPC Realm Manager
 ...
  77   0         116          0        2844          0          0 IPX-EIGRP Hello
                          2307224(B) Total
Following is the highlighted information in Example 11-6:

A: The memory held by the *Init* process; memory allocated during system initialization is charged to this process.
B: The total memory held by all processes, which should be close to the Used value reported on the first line of the output.
MIB Variables for Buffer Utilization on Routers
From OLD-CISCO-MEMORY-MIB, the bufferMdHit and bufferMdMiss variables, the cumulative hit and miss counts for the middle buffer pool, are relevant to buffer utilization on routers. Corresponding hit and miss objects exist for the other buffer pools.
Either one of these objects alone doesn't mean much. You need to correlate the two together to develop a percentage of misses to hits, using Equation 11-1:

Percentage of misses = (bufferMdMiss / bufferMdHit) * 100
Using the percentage approach puts the number of misses in perspective; thus, routers with the most misses do not necessarily require tuning first. We are focusing on the middle buffer pool because these buffers fall in what we feel is the most common data packet range: 105 to 600 bytes. Remember that packets that hit these system buffers are process-switched packets, such as IPX RIP/SAP packets. The same principle and correlation can be drawn between the other buffer sizes as well. The amount of traffic hitting each buffer pool determines which pools need more attention than others. If we were to rank the buffers in order of most to least important, the relevant order for most routers would be: Middle, Big, Small, VeryBig, Large, and then Huge.
Tuning buffers is not an easy task. Misunderstanding what actually utilizes buffers is common and results in incorrectly configured system buffers. In our experience, tuning should be considered when the total number of misses divided by the total number of hits for a given buffer pool exceeds 0.5 of 1 percent. Small buffers are used mainly for broadcast traffic such as explorers, ARPs, and GNS requests, and should not necessarily be tuned to perfection. Allowing more broadcast traffic in, only to process and discard it, is not good practice. In fact, process-oriented packet drops often can be beneficial to the health of the overall network, as in the case of SNA and NetBIOS broadcast storms.
Our approach to buffer analysis begins with the middle buffers. Because most packets hitting this buffer pool carry routing stability and session data for process-switched protocols, RIP and SAP updates among them, it is essential to tune this pool first. Looking at the buffer miss-to-hit analysis gives you a good understanding of buffer stability. In other words, you will see some buffer misses, and this is normal, because broadcast and bursty traffic exists from time to time in any network.
This methodology relating to buffers represents the authors' views based on experience, and does not reflect the views of Cisco in general.
The recommended baseline threshold is a miss-to-hit percentage of 0.5 of 1 percent. This initial baseline value holds true for all buffer sizes.
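The calculation in Equation 11-1 and the 0.5 percent threshold test can be sketched as follows. The bufferMdHit and bufferMdMiss names come from the MIB; the function names and the zero-hits guard are illustrative assumptions.

```python
# Starting-point threshold from the text: consider tuning a buffer pool when
# misses exceed 0.5 of 1 percent of hits. Applies to all buffer sizes.
THRESHOLD_PERCENT = 0.5

def miss_to_hit_percent(misses, hits):
    """Percentage of misses to hits for one buffer pool (Equation 11-1).

    Returns 0.0 when there are no hits yet, to avoid dividing by zero on a
    freshly restarted router (the counters are cumulative since restart).
    """
    if hits == 0:
        return 0.0
    return misses / hits * 100

def needs_tuning(misses, hits, threshold=THRESHOLD_PERCENT):
    """True when the pool's miss percentage exceeds the baseline threshold."""
    return miss_to_hit_percent(misses, hits) > threshold

if __name__ == "__main__":
    # 40 misses against 5000 hits is 0.8 percent, above the 0.5% baseline.
    print(miss_to_hit_percent(40, 5000), needs_tuning(40, 5000))
```

Because the counters are cumulative, poll them periodically and compute the percentage on the deltas between polls if you want to catch recent changes rather than lifetime averages.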
CLI Commands for Buffer Usage
The show buffers command displays the statistics for all the different buffer pools on the router. The configured values for permanent, max-free, and min-free buffers for each size are displayed in this output, as well as the hits, misses, and "no memory" values.
The show buffers command actually provides the same data as the buffer MIBs, except for one value: the "fallbacks" counter under the interface pools. Buffer counters, unlike some other counters on the router, are cumulative since the last restart and cannot be cleared. Because the SNMP counters are cumulative as well, the buffer values you see in this output should match the corresponding MIB values closely.
Example 11-7 shows sample output from show buffers.
Example 11-7 Using the show buffers command to get router buffer information.
Router# show buffers
Buffer elements:
     398 in free list (500 max allowed)
     1266 hits, 0 misses, 0 created

Public buffer pools:
Small buffers, 104 bytes (total 50(A), permanent 50(B)):
     50 in free list (20 min, 150 max allowed)
     551(C) hits, 0(D) misses, 0(E) trims, 0(F) created
Middle buffers, 600 bytes (total 25, permanent 25):
     25 in free list (10 min, 150 max allowed)
     39 hits, 0 misses, 0 trims, 0 created
Big buffers, 1524 bytes (total 50, permanent 50):
     49 in free list (5 min, 150 max allowed)
     27 hits, 0 misses, 0 trims, 0 created
VeryBig buffers, 4520 bytes (total 10, permanent 10):
     10 in free list (0 min, 100 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
Large buffers, 5024 bytes (total 0, permanent 0):
     0 in free list (0 min, 10 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
Huge buffers, 18024 bytes (total 0, permanent 0):
     0 in free list (0 min, 4 max allowed)
     0 hits, 0 misses, 0 trims, 0 created

Interface buffer pools:
Ethernet0 buffers, 1524 bytes (total 64, permanent 64):
     16 in free list (0 min, 64 max allowed)
     48 hits, 0(G) fallbacks
     16 max cache size, 16 in cache
Ethernet1 buffers, 1524 bytes (total 64, permanent 64):
     16 in free list (0 min, 64 max allowed)
     48 hits, 0 fallbacks
     16 max cache size, 16 in cache
Serial0 buffers, 1524 bytes (total 64, permanent 64):
     16 in free list (0 min, 64 max allowed)
     48 hits, 0 fallbacks
     16 max cache size, 16 in cache
Serial1 buffers, 1524 bytes (total 64, permanent 64):
     16 in free list (0 min, 64 max allowed)
     48 hits, 0 fallbacks
     16 max cache size, 16 in cache
TokenRing0 buffers, 4516 bytes (total 48, permanent 48):
     0 in free list (0 min, 48 max allowed)
     48 hits, 0 fallbacks
     16 max cache size, 16 in cache
TokenRing1 buffers, 4516 bytes (total 32, permanent 32):
     32 in free list (0 min, 48 max allowed)
     16 hits, 0 fallbacks
     0(H) failures (0(I) no memory)
Following are the highlighted values from Example 11-7:

A: total, the current number of buffers in the pool (permanent plus dynamically created).
B: permanent, the number of permanent buffers, which are never trimmed.
C: hits, the number of times a buffer was requested and one was available.
D: misses, the number of times a buffer was requested but none was available.
E: trims, the number of dynamically created buffers returned to free memory.
F: created, the number of buffers dynamically created to satisfy demand.
G: fallbacks, the number of times an interface pool had to fall back to the public buffer pools.
H: failures, the number of buffer requests that could not be satisfied, resulting in a dropped packet.
I: no memory, the number of failures caused by insufficient free memory.