You can gather the appropriate performance information for router processors through MIBs or through the command-line interface (CLI) via Telnet or rsh. We will look at both methods, explain what the highlighted variables indicate and why they are useful, and show how to act on the results. We will also identify some starting-point thresholds to set and watch for when monitoring performance. Note, however, that the threshold settings defined in this section are only a starting point and nothing more. You must first understand your network traffic flows and network characteristics before setting appropriate thresholds, and thresholds will constantly need tweaking and re-evaluation to meet the needs of your environment.
MIB Variables for Router CPU Utilization
Process-intensive tasks, such as process switching data packets, processing routing updates, interface flapping, or handling broadcasts/multicasts, can cause CPU utilization on the router to increase.
From OLD-CISCO-CPU-MIB or OLD-CISCO-SYS-MIB, the avgBusy5 value reports the percentage of processor time in use, averaged over a running five-minute window.
Starting in IOS version 12.0(3)T, the CISCO-PROCESS-MIB object cpmCPUTotal5min replaces the avgBusy5 object from OLD-CISCO-CPU-MIB.
The avgBusy5 object provides a more accurate view of your router's performance over time than the objects avgBusy1 and busyPer, which report CPU utilization over one-minute and five-second intervals, respectively. It is valuable for trend monitoring and capacity planning of the network. Watching the CPU closely when it exceeds the rising threshold value is more important than watching the falling threshold value, especially when you are troubleshooting performance degradation. In other words, high CPU utilization might indicate a problem, whereas low CPU utilization usually does not.
The recommended baseline rising threshold for avgBusy5 is 90 percent.
Depending on the platform, a low-end router such as a 2500 running at 90 percent may exhibit performance degradation, whereas a high-end router such as a 7500 series may operate fine at the same utilization. Trending CPU over time is important for developing your baseline.
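The rising/falling threshold behavior described above can be implemented as a simple hysteresis check in a polling script. The following is a minimal sketch, not a prescribed implementation: the 90 percent rising threshold matches the suggested avgBusy5 baseline, but the 60 percent falling threshold and the function name are illustrative assumptions.

```python
# Hysteresis check for a CPU utilization threshold (RMON-style rising/falling
# alarm). 90% rising matches the suggested avgBusy5 baseline; the 60% falling
# threshold is an assumed example value, not a Cisco recommendation.
RISING_THRESHOLD = 90
FALLING_THRESHOLD = 60

def evaluate(samples, rising=RISING_THRESHOLD, falling=FALLING_THRESHOLD):
    """Return the alarm events generated by a series of avgBusy5 samples.

    An alarm is raised the first time a sample crosses the rising threshold
    and is not raised again until a sample drops to the falling threshold.
    The hysteresis prevents alarm flapping around a single threshold value.
    """
    armed = True          # ready to raise a rising alarm
    events = []
    for value in samples:
        if armed and value >= rising:
            events.append(("rising", value))
            armed = False
        elif not armed and value <= falling:
            events.append(("falling", value))
            armed = True
    return events

if __name__ == "__main__":
    # One sustained CPU spike should produce exactly one rising and one
    # falling event, not an alarm for every sample above the threshold.
    print(evaluate([40, 95, 97, 92, 55, 45]))
```

With a single threshold, the samples 95, 97, and 92 would each generate an alarm; the two-threshold approach reports the spike once.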
Related MIB objects from OLD-CISCO-CPU-MIB or OLD-CISCO-SYS-MIB are avgBusy1, avgBusy5, and busyPer.
CLI Commands Relating to CPU on the Router
The show proc cpu command displays the five-second, one-minute, and five-minute CPU utilization for the router. It also provides a breakdown of the individual processes running on the router and what percentage of the CPU each process takes. This command is very useful when you are troubleshooting, especially when avgBusy5 exceeds the threshold you define. Many valuable data points in the display output can help you identify possible problem areas with the router or in your network. Looking at the individual processes running with high CPU percentages associated with them can help narrow your scope of where a particular problem may lie.
Example 11-3 shows sample output from show proc cpu.
Example 11-3 Using show proc cpu to get CPU information.
Router>sh proc cpu
CPU utilization for five seconds: 12%(A)/3%(B); one minute: 7%(C); five minutes: 10%(D)
 PID  Runtime(ms)  Invoked  uSecs    5Sec   1Min      5Min  TTY Process
   1            0        1      0   0.00%  0.00%     0.00%    0 SCOP Input
   2       179548    30793   5830   0.00%  0.00%     0.00%    0 Check heaps
   3            4        9    444   0.00%  0.00%     0.00%    0 Pool Manager
   4            0        2      0   0.00%  0.00%     0.00%    0 Timers
   5         8272   711817     14   0.00%  0.00%     0.00%    0 OIR Handler
   6            0        1      0   0.00%  0.00%     0.00%    0 IPC Zone Manager
   7            0        1      0   0.00%  0.00%  10.00%(E)   0 IP Input(F)
   8            0    74520      0   0.00%  0.00%     0.00%    0 IPC Seat Manager
   9          116    26351      4   0.00%  0.00%     0.00%    0 ARP Input
 ...
Important information from Example 11-3 is annotated as follows:

A: Total CPU utilization over the last five seconds.
B: The portion of the five-second utilization spent at interrupt level (that is, handling fast-switched packets).
C: Average CPU utilization over the last minute.
D: Average CPU utilization over the last five minutes; this corresponds to the avgBusy5 MIB object.
E: The five-minute CPU percentage consumed by an individual process.
F: The process name; a high value for IP Input, for example, indicates a large volume of process-switched IP traffic.
MIB Variables for Router Device Uptime
From MIB RFC 1213, the sysUpTime variable indicates how long the router has been up since the last reboot, which is commonly caused by a power-on, a reload, or a software exception error. The value is reported in hundredths of a second.
sysUpTime is not that valuable by itself, but it is useful in comparison or correlation with other variables. Comparing sysUpTime across the routers in your network can help you correlate downtimes and overall network availability, and you should be able to correlate sysUpTime on routers to your change management "windows" easily. For example, if you have a scheduled power outage on a weekend in one portion of the network, the routers in that area should all show approximately the same sysUpTime afterward. If any router's sysUpTime differs significantly from the others, you know something unexpected happened.
The recommended baseline threshold for sysUpTime is that all the routers in a common region of the network reflect approximately the same sysUpTime value. Change management practices and accurate documentation will provide the evidence on why and when the router reloaded.
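The comparison described above is easy to automate: poll sysUpTime from each router in a region and flag any router that deviates from the norm. The following sketch makes some assumptions not in the text: the router names, the use of the median as the regional reference point, and the one-hour tolerance are all illustrative choices.

```python
def uptime_outliers(uptimes, tolerance_ticks=360_000):
    """Flag routers whose sysUpTime differs noticeably from the region's norm.

    uptimes: mapping of router name -> sysUpTime in hundredths of a second
    (the units RFC 1213 defines for sysUpTime).
    tolerance_ticks: allowed deviation from the regional median; the default
    of 360,000 ticks (1 hour) is an assumed starting point, not a Cisco value.
    """
    values = sorted(uptimes.values())
    median = values[len(values) // 2]
    return {name: ticks for name, ticks in uptimes.items()
            if abs(ticks - median) > tolerance_ticks}

if __name__ == "__main__":
    # Two routers rebooted in the same maintenance window three days ago;
    # the third reloaded unexpectedly a day later (8,640,000 ticks = 1 day).
    region = {"rtr-a": 25_920_000, "rtr-b": 25_923_000, "rtr-c": 17_280_000}
    print(uptime_outliers(region))
```

Any router the check flags is a candidate for a look at whyReload and the change management records.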
A related MIB object from OLD-CISCO-SYS-MIB is whyReload, which returns a printable octet string containing the reason why the system was last restarted.
CLI Commands Relating to sysUpTime
The show version command displays the IOS software release running and the hardware inventory of what is installed on the router, as well as the configuration register setting, which can directly affect how the router boots up when it reloads. This output also displays the reason for the last restart.
Example 11-4 shows sample output from show version.
Example 11-4 Using show version to obtain uptime information.
Router>sh ver
Cisco Internetwork Operating System Software
IOS (tm) GS Software (RSP-J-M), Version 11.1(18)CA1, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1)
Synced to mainline version: 11.1(18)
Copyright (c) 1986-1998 by cisco Systems, Inc.
Compiled Tue 21-Apr-98 19:41 by richardd
Image text-base: 0x60010900, data-base: 0x607AE000

ROM: System Bootstrap, Version 11.1(2) [nitin 2], RELEASE SOFTWARE (fc1)
ROM: GS Software (RSP-BOOT-M), Version 11.1(6), RELEASE SOFTWARE (fc1)

GOAL uptime is 2 days, 10 hours, 57 minutes
System restarted by reload at 10:44:07 UTC Thu Jan 28 1999 (A)
System image file is "slot0:rsp-j-mz.111-18.CA1", booted via slot0

cisco RSP2 (R4700) processor with 65536K/2072K bytes of memory.
R4700 processor, Implementation 33, Revision 1.0
Last reset from power-on
G.703/E1 software, Version 1.0.
G.703/JT2 software, Version 1.0.
SuperLAT software (copyright 1990 by Meridian Technology Corp).
Bridging software.
X.25 software, Version 2.0, NET2, BFE and GOSIP compliant.
TN3270 Emulation software (copyright 1994 by TGV Inc).
Chassis Interface.
1 HIP controller (1 HSSI).
1 FIP controller (1 FDDI).
1 HSSI network interface.
1 FDDI network interface.
123K bytes of non-volatile configuration memory.

16384K bytes of Flash PCMCIA card at slot 0 (Sector size 128K).
4096K bytes of Flash internal SIMM (Sector size 256K).
No slave installed in slot 3.
Configuration register is 0x2102
The "System restarted by…" line (A) tells you why the router was reloaded. Other values you may see here are "power-on" or "software forced crash."
If the router was restarted by a software error, the show stack command output provides details, mostly in hex code, on why the router crashed. This information is valuable when you open a case with the Cisco Technical Assistance Center (TAC) because the engineers there have internal tools that can decode the stack trace. You can also decode it yourself on the Cisco Connection Online (CCO) Stack Decoder Web page: http://www.cisco.com/stack/stackdecoder.shtml.
MIB Variables for Memory Utilization on Routers
An unstable network, such as one with flapping routes, can cause fragmented memory or constant system buffer creates and trims. Actively monitoring the largest contiguous free memory block in a router helps you trend and gauge the available contiguous memory. You can have all the free memory you want, but if the largest contiguous free block is too small, the router will not function properly: processes cannot run if they cannot get enough contiguous free memory to execute. You'll know you are hitting a critical point when you start seeing %SYS-2-MALLOCFAIL or %SYS-2-NOMEMORY messages in syslog, or when you are unable to Telnet into the router due to low memory. Typically, the syslog messages appear before you are kicked out of a Telnet session. Memory leaks or other defects may cause this fragmentation.
The following MIB objects are relevant for monitoring memory utilization on routers: ciscoMemoryPoolLargestFree, ciscoMemoryPoolFree, and freeMem.
The ciscoMemoryPoolLargestFree variable indicates the largest number of contiguous bytes currently unused in the memory pool on the managed device. This object was first introduced in IOS release 11.1; on earlier IOS versions, you must run the CLI command show mem to get the same data. The gap between total free memory and ciscoMemoryPoolLargestFree is an indicator of how fragmented memory is.
The recommended baseline threshold for ciscoMemoryPoolLargestFree is 500 KB for the low watermark.
The ciscoMemoryPoolFree and freeMem variables indicate the number of bytes from the memory pool that are currently unused on the managed device. It is still important to trend the amount of memory free on the router for capacity planning, even when monitoring the largest memory block free.
Monitoring total free memory helps mainly in the capacity planning of your network. Changing traffic patterns or the addition of more networks to the infrastructure can cause the free memory in the routers to change, and total memory changes are even more evident when you upgrade your router to another IOS release. Recall from the previous introduction to system memory that IOS takes up a portion of the total DRAM installed, leaving you with the actual amount of free memory available for other processes. It is therefore wise to re-baseline your free memory after an IOS upgrade to accurately represent the amount available for processes in the router. The memory pool is accessed when buffers are needed to process incoming packets or when a router process needs more memory, such as when routing updates are sent out. When buffers release memory back to the pool, you'll see the trim counters increment.
The recommended baseline threshold for ciscoMemoryPoolFree and freeMem is 1 MB for the low watermark.
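Both low watermarks above can be checked from a single poll of the memory pool objects. The sketch below uses the 500 KB and 1 MB starting points suggested in the text; the function name and alarm strings are illustrative assumptions.

```python
# Low watermarks from the text: 500 KB for the largest contiguous free block
# (ciscoMemoryPoolLargestFree) and 1 MB for total free memory
# (ciscoMemoryPoolFree / freeMem). Both are starting points to be tuned.
LARGEST_FREE_LOW = 500 * 1024
TOTAL_FREE_LOW = 1024 * 1024

def memory_alarms(pool_free, largest_free):
    """Return low-watermark alarms for one memory pool.

    pool_free and largest_free are byte counts, as the MIB objects report.
    A pool can have plenty of total free memory yet still be unusable if
    the largest contiguous block is too small (fragmentation).
    """
    alarms = []
    if largest_free < LARGEST_FREE_LOW:
        alarms.append("fragmentation: largest contiguous block below 500 KB")
    if pool_free < TOTAL_FREE_LOW:
        alarms.append("capacity: total free memory below 1 MB")
    return alarms

if __name__ == "__main__":
    # Plenty of total free memory but a badly fragmented pool: only the
    # fragmentation alarm should fire.
    print(memory_alarms(pool_free=16_262_336, largest_free=300_000))
```

Treat the fragmentation alarm as the more urgent of the two; as the text notes, %SYS-2-MALLOCFAIL messages typically follow soon after.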
CLI Commands for Memory Usage
There are several show commands relating to memory utilization on routers. The following two are discussed here: show memory and show proc memory.
Using the show memory Command
The show mem output displays data for the processor memory, fast cache, and I/O memory. When gathering this data, it is important to leave your terminal length set to 24 because the output of this command can be rather large; the most pertinent data appears within the first four lines displayed. The output fields are shown in bytes, as indicated by the (b) suffix (see Example 11-5).
The primary use of show mem is on routers not running IOS version 11.1 or later, which added support for the CISCO-MEMORY-POOL-MIB. The "Largest(b)" field is the value to trend because it is equivalent to the MIB object ciscoMemoryPoolLargestFree. You can also use this command to see how low free memory has dropped since the last restart, based on the "Lowest(b)" field; that value is not available via a MIB.
Example 11-5 shows sample output from show mem for both high-end and low-end routers.
Example 11-5 Using show mem to obtain information on memory usage.
High-end routers (7xxx series):

Router>show mem
           Head      Total(b)     Used(b)      Free(b)      Lowest(b)    Largest(b)
Processor  60E6F330  18418896(A)  2156560(B)   16262336(C)  16117456(D)  16157784(E)
     Fast  60E4F330  131072(A)    80144(B)     50928(C)     50928(D)     50892(E)
 --More--

Low-end routers (4xxx, 2500, 3600, etc. series):

Router>sh mem
           Head      Total(b)     Used(b)      Free(b)      Lowest(b)    Largest(b)
Processor  60947BF0  23823376(A)  1890660(B)   21932716(C)  21783296(D)  21801520(E)
      I/O  40000000  16777216(A)  1390348(B)   15386868(C)  15384516(D)  15386356(E)
 --More--
The following information is highlighted in Example 11-5:

A: Total(b), the total amount of memory in the pool, in bytes.
B: Used(b), the amount of memory currently in use.
C: Free(b), the total amount of memory currently free.
D: Lowest(b), the lowest amount of free memory recorded since the last restart.
E: Largest(b), the largest contiguous block of free memory, equivalent to ciscoMemoryPoolLargestFree.
Using the show proc memory Command
The command show proc mem displays the amount of memory taken by each process running on the router, including the memory each process has allocated and freed and the amount it is currently holding.
This output, which is available only via the CLI, provides an understanding of which processes use the most and least memory in the router. Typically, each process running on the router takes a fixed amount of memory, defined by the process when it starts up, and the amount stays fairly constant over time. But if you see a steady increase in the memory held by a process (the Holding column), it could indicate a memory leak. A memory leak occurs when memory is allocated for a process but is never released back to main memory when it is no longer in use.
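A crude way to spot such a leak is to sample the Holding value for each process over several polls and flag any process whose held memory grows at every interval. The sketch below is an illustrative assumption, not an IOS facility: the sample data, the function name, and the "grows every interval" rule are all choices made for the example.

```python
def leaking_processes(samples):
    """Return processes whose 'Holding' bytes increased at every sample.

    samples: list of {process_name: holding_bytes} dicts from successive
    show proc mem polls, oldest first. Normal processes hold a fairly
    constant amount of memory; a process that grows across every interval
    is a leak suspect worth investigating.
    """
    suspects = []
    for name in samples[0]:
        series = [snap.get(name) for snap in samples]
        if None not in series and all(a < b for a, b in zip(series, series[1:])):
            suspects.append(name)
    return suspects

if __name__ == "__main__":
    # Hypothetical polls: one process grows steadily, the other is constant.
    polls = [
        {"IP Input": 7000, "Exec": 5032},
        {"IP Input": 9500, "Exec": 5032},
        {"IP Input": 13200, "Exec": 5032},
    ]
    print(leaking_processes(polls))
```

Three samples is the minimum for a trend; in practice you would want more polls, spaced widely, before opening a case.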
Example 11-6 shows sample output from show proc mem.
Example 11-6 Using show proc mem to obtain memory usage information.
Router# show proc mem
Total: 5611448, Used: 2307548, Free: 3303900
 PID TTY   Allocated      Freed     Holding     Getbufs    Retbufs Process
   0   0      199592       1236  1907220(A)          0          0 *Init*
   0   0         400      76928         400          0          0 *Sched*
   0   0     5431176    3340052      140760     349780          0 *Dead*
   1   0         256        256        1724          0          0 Load Meter
   2   0         264          0        5032          0          0 Exec
   3   0           0          0        2724          0          0 Check heaps
   4   0       97932          0        2852      32760          0 Pool Manager
   5   0         256        256        2724          0          0 Timers
   6   0          92          0        2816          0          0 CXBus hot stall
   7   0           0          0        2724          0          0 IPC Zone Manager
   8   0           0          0        2724          0          0 IPC Realm Manager
 ...
  77   0         116          0        2844          0          0 IPX-EIGRP Hello
                          2307224(B) Total
Following is the highlighted information in Example 11-6:

A: The memory held by the *Init* process; memory allocated during system initialization is charged to this process.
B: The total memory held by all processes, which should be close to the Used value reported on the first line of the output.
MIB Variables for Buffer Utilization on Routers
From OLD-CISCO-MEMORY-MIB, the bufferMdHit and bufferMdMiss variables, the cumulative hit and miss counts for the middle buffer pool, are relevant to buffer utilization on routers. Corresponding hit and miss objects exist for the other buffer pools.
Either one of these objects alone doesn't mean much. You need to correlate the two together to develop a percentage of misses to hits, using Equation 11-1:

Percentage of misses = (bufferMdMiss / bufferMdHit) * 100
Using the percentage approach puts the number of misses in perspective; thus, routers with the most misses do not necessarily require tuning first. We are focusing on the middle buffer pool because these buffers fall in what we feel is the most common data packet range: 105 to 600 bytes. Remember that packets that hit these system buffers are process-switched packets, such as IPX RIP/SAP packets. The same principle and correlation can be drawn between the other buffer sizes as well. The amount of traffic hitting each buffer pool determines which pools need more attention than others. If we were to rank the buffers in order of most to least important, the relevant order for most routers would be: Middle, Big, Small, VeryBig, Large, and then Huge.
Tuning buffers is not an easy task. Misunderstanding what actually utilizes buffers is common and results in incorrectly configured system buffers. In our experience, tuning should be considered when the total number of misses divided by the total number of hits for a given buffer pool exceeds 0.5 of 1 percent. Small buffers are used mainly for broadcast traffic such as explorers, ARPs, and GNS requests, and should not necessarily be tuned to perfection. Allowing more broadcast traffic in, only to process and discard it, is not good practice. In fact, process-oriented packet drops often can be beneficial to the health of the overall network, as in the case of SNA and NetBIOS broadcast storms.
Our approach to buffer analysis begins with the middle buffers. Because most packets hitting this buffer pool carry routing stability and session data for process-switched protocols, RIP and SAP updates among them, it is essential to tune this pool first. Looking at the buffer miss-to-hit analysis gives you a good understanding of buffer stability. In other words, you will see some buffer misses, and this is normal, because broadcast and bursty traffic exists from time to time in any network.
This methodology relating to buffers represents the authors' views based on experience, and does not reflect the views of Cisco in general.
The recommended baseline threshold is a miss-to-hit percentage of 0.5 of 1 percent. This initial baseline value holds true for all buffer sizes.
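The calculation in Equation 11-1 and the 0.5 percent threshold test can be sketched as follows. The bufferMdHit and bufferMdMiss names come from the MIB; the function names and the zero-hits guard are illustrative assumptions.

```python
# Starting-point threshold from the text: consider tuning a buffer pool when
# misses exceed 0.5 of 1 percent of hits. Applies to all buffer sizes.
THRESHOLD_PERCENT = 0.5

def miss_to_hit_percent(misses, hits):
    """Percentage of misses to hits for one buffer pool (Equation 11-1).

    Returns 0.0 when there are no hits yet, to avoid dividing by zero on a
    freshly restarted router (the counters are cumulative since restart).
    """
    if hits == 0:
        return 0.0
    return misses / hits * 100

def needs_tuning(misses, hits, threshold=THRESHOLD_PERCENT):
    """True when the pool's miss percentage exceeds the baseline threshold."""
    return miss_to_hit_percent(misses, hits) > threshold

if __name__ == "__main__":
    # 40 misses against 5000 hits is 0.8 percent, above the 0.5% baseline.
    print(miss_to_hit_percent(40, 5000), needs_tuning(40, 5000))
```

Because the counters are cumulative, poll them periodically and compute the percentage on the deltas between polls if you want to catch recent changes rather than lifetime averages.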
CLI Commands for Buffer Usage
The show buffers command displays the statistics for all the different buffer pools on the router. The configured values for permanent, max-free, and min-free buffers for each size are displayed in this output, as well as the hits, misses, and "no memory" values.
The show buffers command actually provides the same data as the buffer MIBs, except for one value: the "fallbacks" counter under the interface pools. Buffer counters, unlike some other counters on the router, are cumulative since the last restart and cannot be cleared. Because the SNMP counters are cumulative as well, the buffer values you see in this output should match the corresponding MIB values closely.
Example 11-7 shows sample output from show buffers.
Example 11-7 Using the show buffers command to get router buffer information.
Router# show buffers
Buffer elements:
     398 in free list (500 max allowed)
     1266 hits, 0 misses, 0 created

Public buffer pools:
Small buffers, 104 bytes (total 50(A), permanent 50(B)):
     50 in free list (20 min, 150 max allowed)
     551(C) hits, 0(D) misses, 0(E) trims, 0(F) created
Middle buffers, 600 bytes (total 25, permanent 25):
     25 in free list (10 min, 150 max allowed)
     39 hits, 0 misses, 0 trims, 0 created
Big buffers, 1524 bytes (total 50, permanent 50):
     49 in free list (5 min, 150 max allowed)
     27 hits, 0 misses, 0 trims, 0 created
VeryBig buffers, 4520 bytes (total 10, permanent 10):
     10 in free list (0 min, 100 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
Large buffers, 5024 bytes (total 0, permanent 0):
     0 in free list (0 min, 10 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
Huge buffers, 18024 bytes (total 0, permanent 0):
     0 in free list (0 min, 4 max allowed)
     0 hits, 0 misses, 0 trims, 0 created

Interface buffer pools:
Ethernet0 buffers, 1524 bytes (total 64, permanent 64):
     16 in free list (0 min, 64 max allowed)
     48 hits, 0(G) fallbacks
     16 max cache size, 16 in cache
Ethernet1 buffers, 1524 bytes (total 64, permanent 64):
     16 in free list (0 min, 64 max allowed)
     48 hits, 0 fallbacks
     16 max cache size, 16 in cache
Serial0 buffers, 1524 bytes (total 64, permanent 64):
     16 in free list (0 min, 64 max allowed)
     48 hits, 0 fallbacks
     16 max cache size, 16 in cache
Serial1 buffers, 1524 bytes (total 64, permanent 64):
     16 in free list (0 min, 64 max allowed)
     48 hits, 0 fallbacks
     16 max cache size, 16 in cache
TokenRing0 buffers, 4516 bytes (total 48, permanent 48):
     0 in free list (0 min, 48 max allowed)
     48 hits, 0 fallbacks
     16 max cache size, 16 in cache
TokenRing1 buffers, 4516 bytes (total 32, permanent 32):
     32 in free list (0 min, 48 max allowed)
     16 hits, 0 fallbacks
     0(H) failures (0(I) no memory)
Following are the highlighted values from Example 11-7:

A: total, the current number of buffers in the pool (permanent plus dynamically created).
B: permanent, the number of permanent buffers, which are never trimmed.
C: hits, the number of times a buffer was requested and one was available.
D: misses, the number of times a buffer was requested but none was available.
E: trims, the number of dynamically created buffers returned to free memory.
F: created, the number of buffers dynamically created to satisfy demand.
G: fallbacks, the number of times an interface pool had to fall back to the public buffer pools.
H: failures, the number of buffer requests that could not be satisfied, resulting in a dropped packet.
I: no memory, the number of failures caused by insufficient free memory.