Error/Fault Data for Router Processors

The format of this section is identical to that of the performance sections, except that in addition to MIBs and CLI commands, we also present relevant SNMP traps and syslog messages.

SNMP traps require a trap daemon (trapd) running on an SNMP server, and the router must be configured to send traps to that server with the snmp-server host configuration command. Syslog messages can be sent to a syslog server or kept on the router itself, either displayed on the console or stored in a buffer. If they are stored in the buffer, the show logging command displays the same messages that a syslog server would receive. Buffering on the router is especially important as a complement to a syslog server, because it preserves messages when the network is down or when the route to the syslog server is unavailable. See "Setting Up SNMP" in Chapter 18 for more information on best practices for SNMP configuration.

MIB Variables for Memory Leaking or Depletion

The bufferNoMem variable, from OLD-CISCO-MEMORY-MIB, counts the number of buffer create failures due to no free memory. If there is not enough system memory to allocate a buffer of the appropriate size, the bufferNoMem counter is incremented. Increments in bufferNoMem can also help you locate some of your network issues. If this counter increases at all, a packet is probably being dropped.

In "Performance Data for Router Processors," we focused on looking at the CPU processes and the memory usage. Here, we look at the way system buffers affect memory. System buffers are dynamic in nature and constantly changing, either by creating or trimming, thus affecting the buffers' interaction with system memory. Ideally, you want to have the trims and creates stay fairly constant. We'll look at the trims and creates in the "show" command output a little later in this section.

The bufferNoMem MIB variable can be correlated with the memory MIB variables ciscoMemoryPoolFree, freeMem, or ciscoMemoryPoolLargestFree.

The recommended baseline threshold for bufferNoMem should be relative to sysUpTime. Even a value of 1, however, can be a reason to start looking at where the misses are occurring, because misses cause creates, and create failures cause "no memory" conditions.
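One way to make the count relative to uptime is to poll bufferNoMem together with sysUpTime and work out a failure rate between polls. The following Python sketch assumes each sample is a pair of values you have already collected with your own SNMP poller; the function names and the 32-bit wrap handling are illustrative assumptions, not taken from the MIB definition.

```python
# Minimal sketch: turn two (sysUpTime, bufferNoMem) samples into a rate.
# Assumption: the counter wraps at 2**32; check the MIB syntax for your release.
COUNTER_MODULUS = 2 ** 32
TICKS_PER_SECOND = 100  # sysUpTime is reported in hundredths of a second


def counter_delta(previous, current, modulus=COUNTER_MODULUS):
    """Return the increase between two counter samples, allowing one wrap."""
    return (current - previous) % modulus


def failures_per_hour(prev_sample, curr_sample):
    """Each sample is a (sysUpTime_ticks, bufferNoMem) tuple.

    Returns None when uptime did not advance (for example, the router
    reloaded between polls), because the two samples are not comparable.
    """
    prev_ticks, prev_count = prev_sample
    curr_ticks, curr_count = curr_sample
    if curr_ticks <= prev_ticks:
        return None
    elapsed_hours = (curr_ticks - prev_ticks) / TICKS_PER_SECOND / 3600
    return counter_delta(prev_count, curr_count) / elapsed_hours
```

A result of None signals that the samples straddle a reload, in which case the next pair of polls starts a new baseline.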

A related MIB object from OLD-CISCO-MEMORY-MIB is bufferFail, which counts the number of buffer allocation failures. This variable is really a superset of bufferNoMem, and it typically shows the same value as bufferNoMem.

CLI Commands for Analyzing Memory Usage

See "CLI Commands for Buffer Usage" for details on the show buffers command and output (see Example 11-7). For details on the show memory command and output (see Example 11-5), see "Using the show memory Command."

Syslog Messages Relating to Memory Issues

A number of syslog messages are useful for memory fault management, and apply directly to the MIB objects and CLI commands previously discussed. They are collected in Table 11-3.

Table 11-3. Syslog Messages for Memory Information
Message	Explanation
`%SYS-2-MALLOCFAIL: Memory allocation of [dec] bytes failed from [hex], pool [chars], alignment [dec]`	The requested memory allocation is not available from the specified memory pool or system buffer. The current system configuration, network environment, or possibly a software error might have exhausted or fragmented the router's memory. If this message appears in the syslog, the router is more than likely experiencing a memory leak of some kind. To help isolate the problem, note the largest free block reported by show memory (an indicator of fragmentation) and the amount of memory allocated (holding) by each process as reported by show processes memory. Trending the holding memory for each process alongside the largest free block helps you pinpoint where the memory leak is occurring. Record your findings and report them to the Cisco TAC (Technical Assistance Center), or search the Bug Navigator tool on CCO for possible known defects: http://www.cisco.com/support/bugtools/bugtool.shtml.
`%SYS-2-NOMEMORY: No memory available for [chars] [dec]`	This syslog message indicates that an operation could not be accomplished because of a low-memory condition. The current system configuration, network environment, or possibly a software error might have exhausted or fragmented the router's memory. This message typically is attributed to fragmented memory. The show memory CLI command or ciscoMemoryPoolLargestFree MIB variable values can be directly correlated to these kinds of system messages.
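The trending idea described for MALLOCFAIL can be sketched in a few lines of Python. This assumes you have already collected the holding value for each process at several polls; the function name, the data layout, and the process names in the usage note are hypothetical. Net growth between the first and last poll is only a rough indicator, so treat the top entries as candidates to investigate rather than as proof of a leak.

```python
# Minimal sketch: rank processes by how much their holding memory grew.
# Assumption: samples maps a process name to a list of holding values
# (bytes), oldest poll first, gathered by your own collection tooling.
def rank_by_growth(samples):
    """Return (process_name, net_growth_bytes) pairs, largest growth first."""
    growth = {
        name: values[-1] - values[0]
        for name, values in samples.items()
        if len(values) >= 2  # need at least two polls to see a trend
    }
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)
```

For example, given polls for two hypothetical processes, `rank_by_growth({"proc_a": [1000, 1000, 1010], "proc_b": [2000, 4000, 9000]})` lists proc_b first, because its holding memory grew the most over the polling window.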

MIB Variables for Identifying Router Reloads

The whyReload variable, from OLD-CISCO-SYS-MIB, contains a printable octet string giving the reason why the system was last restarted. Reasons include "power on," user-initiated reload, exception, or some other error.

The whyReload variable can help you track change management windows, such as scheduled power-downs, and spot possible IOS defects when reasons such as exceptions appear. Used in conjunction with the SNMP reload trap, whyReload can provide further insight. For example, when the SNMP server configured with the snmp-server host command receives a reload trap, you can trigger an SNMP poll of the whyReload variable to find the reason why the router reloaded.

The recommended baseline threshold is that any value other than "power-on" or "reload" should be flagged because it can identify possible software or hardware errors.
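As a rough illustration of this threshold, the following Python sketch flags any whyReload string that is not one of the expected reasons. The set of expected strings and the function name are assumptions for illustration; compare them against the exact strings your routers return before relying on the check.

```python
# Minimal sketch: flag unexpected whyReload values.
# Assumption: expected reasons are matched case-insensitively after
# trimming whitespace; adjust the set to what your routers report.
EXPECTED_REASONS = {"power-on", "power on", "reload"}


def needs_attention(why_reload):
    """Return True when the reload reason is not an expected one."""
    return why_reload.strip().lower() not in EXPECTED_REASONS
```

A poller could call `needs_attention()` on the polled value after each reload trap and raise an alert only when it returns True.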

CLI Commands for Analyzing Reload Crash Conditions

The show stacks command is useful for troubleshooting software-forced crashes that caused the router to reload and send the reload SNMP trap. The value of the whyReload variable can lead you to look at this command output. See Chapter 10 for details on the output from the show stacks command in the section "Router Health from show stack."

SNMP Traps Relating to Reload Conditions

The reload trap, from CISCO-GENERAL-TRAPS, indicates that your router reloaded for some reason. The router sends it when it detects that it is booting, because a trap is unlikely to be sent successfully while the router is in the act of rebooting itself. The following section lists the syslog messages relating to reload conditions. When a reload syslog message is reported, the reload trap follows and corresponds to that message. Refer to your network management vendor for details on the format of the reload trap.

Syslog Messages Relating to Reload Conditions

A number of syslog messages are useful for analyzing why routers reload, and apply directly to the MIB objects and CLI commands previously discussed. They are collected in Table 11-4.

Table 11-4. Syslog Messages for Router Reload Information
Message	Explanation
`%SYS-5-RELOAD: Reload requested`	This message indicates that someone or something requested a reload of the router. This can happen if the actual reload command is typed from the command line or if the router reloaded due to a software error.
`%SYS-5-RESTART: System restarted --[chars]`	This syslog message indicates the router has restarted and is up and operational, or is at least done booting up the IOS.

The two syslog messages in Table 11-4 are good baseline or "threshold" points to monitor: one for when the reload was requested and one for when the router is back online. You can also determine how long the router takes to boot by measuring the time from the reload request to the "restarted" message.
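The boot-time calculation can be sketched as follows. The log line layout here is an assumption: each line is taken to begin with an ISO 8601 timestamp added by the syslog server, followed by a space and the rest of the message. Real syslog servers format timestamps differently, so the parsing step would need to be adapted, and the function name is hypothetical.

```python
from datetime import datetime


# Minimal sketch: time from "Reload requested" to "System restarted".
# Assumption: every line starts with an ISO 8601 timestamp, then a space.
def boot_duration(lines):
    """Return a timedelta between the RELOAD and the next RESTART message,
    or None if the pair is not found."""
    reload_time = None
    for line in lines:
        stamp, _, rest = line.partition(" ")
        if "%SYS-5-RELOAD" in rest:
            reload_time = datetime.fromisoformat(stamp)
        elif "%SYS-5-RESTART" in rest and reload_time is not None:
            return datetime.fromisoformat(stamp) - reload_time
    return None
```

Because the timestamps come from the syslog server rather than the router, the result also includes any delay before the router can reach the server after booting.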