Error/Fault Monitoring | Performance and Fault Management: A Practical Guide to Effectively Managing Cisco Network Devices (Cisco Press Core Series)

Error/Fault Monitoring

Error and fault monitoring on any interface consists of monitoring the operational status and the framing, signaling, or other errors that result in the loss of a frame. Each type of interface has its own unique errors. For example, Ethernet links have collisions, and T1 links have bipolar violations. But all interfaces share some basic fault data. The rest of this chapter is devoted to generic fault management of interfaces.

Link Status

The most basic fault management element of an interface is monitoring the link status. ifAdminStatus and ifOperStatus show whether the interface is operational as far as the device is concerned. If ifAdminStatus is down, it is because the device administrator configured it that way. A device interface generally can be disabled either through the command line or via SNMP set commands. If ifAdminStatus is down, ifOperStatus is also down.

ifOperStatus indicates the operational state of the interface. The original mib-2 specification in RFC 1213 defined ifOperStatus with three states: up, down, or testing. An operational interface is up. An interface with a serious fault such as a cable break is down. An interface in some testing mode such as a WAN interface in Loopback mode has ifOperStatus set to "testing."

RFC 1573 defined a new state for ifOperStatus: dormant. A dormant interface is not operating but rather is in a pending state, waiting for some external event. For example, an idle dial-on-demand interface is dormant, as are dialer interfaces in spoofing mode. Cisco router IOS started support for this new state in IOS 11.1.
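The way ifAdminStatus and ifOperStatus combine can be sketched in a few lines of Python. This is only an illustration: the function name describe_link and the wording of the returned strings are assumptions made for this example, while the numeric codes are the ones the interfaces MIB assigns to up, down, testing, and dormant.

```python
# Numeric codes from the interfaces MIB (dormant was added by RFC 1573).
UP, DOWN, TESTING, DORMANT = 1, 2, 3, 5


def describe_link(admin_status: int, oper_status: int) -> str:
    """Turn a polled (ifAdminStatus, ifOperStatus) pair into a short description.

    Hypothetical helper for illustration; not part of any SNMP library.
    """
    if admin_status == DOWN:
        # Disabled on purpose by the administrator; ifOperStatus is down too.
        return "administratively down"
    if oper_status == UP:
        return "up"
    if oper_status == DORMANT:
        return "dormant (pending, for example an idle dial-on-demand link)"
    if oper_status == TESTING:
        return "testing (for example a WAN interface in loopback)"
    if oper_status == DOWN:
        return "down (fault)"
    return "unknown"


if __name__ == "__main__":
    print(describe_link(UP, DOWN))       # a fault worth investigating
    print(describe_link(DOWN, DOWN))     # expected, configured that way
```

Checking ifAdminStatus first keeps deliberately disabled interfaces from being reported as faults.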

You can monitor the operational status of an interface either by actively polling ifOperStatus or by receiving a link trap. RFC 1157 defined the linkUp and linkDown traps. A device generates a linkUp trap when ifOperStatus transitions from the down state to either the up or dormant state. Similarly, a device issues a linkDown trap when ifOperStatus enters the down state.
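A poller that compares successive ifOperStatus readings can derive the same events that the traps report. The following is a minimal sketch of the transition rules just described; the function name link_event is hypothetical, and the sketch assumes the poller keeps the previous reading for each interface.

```python
from typing import Optional

# ifOperStatus codes used below (dormant was added by RFC 1573).
UP, DOWN, TESTING, DORMANT = 1, 2, 3, 5


def link_event(previous: int, current: int) -> Optional[str]:
    """Return the trap name that matches an ifOperStatus transition, if any."""
    if current == DOWN and previous != DOWN:
        # Entering the down state corresponds to a linkDown trap.
        return "linkDown"
    if previous == DOWN and current in (UP, DORMANT):
        # Leaving down for up or dormant corresponds to a linkUp trap.
        return "linkUp"
    return None


if __name__ == "__main__":
    readings = [UP, UP, DOWN, DOWN, DORMANT, UP]
    for before, after in zip(readings, readings[1:]):
        event = link_event(before, after)
        if event:
            print(f"{before} -> {after}: {event}")
```

Deriving events from polls is a useful cross-check, because a trap can be lost in transit while the next poll still shows the changed state.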

An additional object in the portTable of the CISCO-STACK-MIB for Catalyst workgroup switches is portOperStatus.

portOperStatus can take four values:

other(1) any state other than ok, minorFault, or majorFault
ok(2) the same as ifOperStatus in up state
minorFault(3) a port administratively disabled
majorFault(4) the same definition as ifOperStatus in down state
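When polling portOperStatus, a small lookup table keeps the numeric values readable in logs. The sketch below simply encodes the four values listed above; the dictionary and function names are made up for this example.

```python
# portOperStatus values as listed above: (name, meaning).
PORT_OPER_STATUS = {
    1: ("other", "any state other than ok, minorFault, or majorFault"),
    2: ("ok", "same as ifOperStatus up"),
    3: ("minorFault", "port administratively disabled"),
    4: ("majorFault", "same as ifOperStatus down"),
}


def describe_port_oper_status(value: int) -> str:
    """Format a polled portOperStatus integer for a log line (hypothetical helper)."""
    name, meaning = PORT_OPER_STATUS.get(value, ("unrecognized", "value not listed"))
    return f"{name}({value}): {meaning}"


if __name__ == "__main__":
    for polled in (2, 3, 4):
        print(describe_port_oper_status(polled))
```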

Most Cisco devices also use the linkUp or linkDown trap, as defined by RFC 1157, to proactively notify a management station of an interface state change for a physical interface. However, link traps are not implemented for any sub-interface types. The rationale is that if a physical interface goes down, a router would generate only one trap rather than a trap for the physical interface and each sub-interface.

Further, it may be troublesome to receive link traps for every port status change. Some interfaces change state in the normal course of operation (for example, asynchronous dial-up ports or closet switch ports). Routers offer an interface configuration command to suppress link traps for such interfaces:

no snmp trap link-status

The corresponding command for Catalyst switches:

set port trap mod_num/port_num enable | disable

You can use these commands to keep needless traps from clogging up the event logs on your management station.

Standard linkDown and linkUp traps are sent with ifIndex, ifDescr, and ifType as variable bindings. Cisco routers also add a proprietary object (locIfReason) from the OLD-CISCO-INTERFACES-MIB. locIfReason is just an ASCII string that describes why the interface changed state. Some of the possible values you might see for locIfReason include the following:

Fatal Tx Error
Keepalive failed
Lost Carrier
Late Collision
Excessive collision
Open Failure, Lobe
Ring Beaconing
Duplicate Address
Remove MAC
Keepalive failed
Wire-fault
Auto-removal
Ring Beaconing
LAPB down
EIA signal lost

However, the link traps generated by the Catalyst switches only send out ifIndex, ifDescr, and ifType.
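Because locIfReason is present in router traps but absent from Catalyst traps, code that processes link traps on the management station should treat it as optional. The following sketch assumes the trap receiver has already decoded the variable bindings into a dictionary keyed by object name; that dictionary layout and the function name summarize_link_trap are assumptions for illustration only.

```python
def summarize_link_trap(trap_name: str, varbinds: dict) -> str:
    """Build a one-line log entry from decoded link trap variable bindings."""
    fields = [trap_name]
    # The three standard bindings described above.
    for name in ("ifIndex", "ifDescr", "ifType"):
        fields.append(f"{name}={varbinds.get(name, '?')}")
    # locIfReason is sent by Cisco routers but not by Catalyst switches.
    reason = varbinds.get("locIfReason")
    if reason:
        fields.append(f"reason={reason}")
    return " ".join(fields)


if __name__ == "__main__":
    router_trap = {"ifIndex": 3, "ifDescr": "Serial0", "ifType": 22,
                   "locIfReason": "Lost Carrier"}
    switch_trap = {"ifIndex": 12, "ifDescr": "2/4", "ifType": 6}
    print(summarize_link_trap("linkDown", router_trap))
    print(summarize_link_trap("linkDown", switch_trap))
```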

In addition to link traps, there are also system error messages that may be logged to a buffer on the device or to a syslog server.

Link Errors

All framing schemes on the various types of media contain some type of error check field at the end of the frame. Bad media commonly cause framing errors. Bad media may consist of faulty interfaces, bad cables and connectors, or out-of-spec cables (for example, cables that are too long). Most agents count such errors in the ifInErrors column of the interface table. The ifOutErrors column indicates problems within the device itself, where it could not transmit the frame for some reason.

Each medium has a specified maximum error rate. For example, the IEEE 802.3 specification states that an Ethernet's bit error rate will not exceed 1 in 10⁸ (that is, 10⁻⁸). To get the received error rate, divide the received errors by the number of received packets, not the number of received octets, because a single bit error in a frame causes the whole frame to be discarded. From Chapter 4, the equation for calculating the input error rate is as follows:

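input error rate = ΔifInErrors / (ΔifInUcastPkts + ΔifInNUcastPkts)

where Δ denotes the change in a counter between two successive polls.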
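The calculation described above (received errors divided by received packets) can be sketched from two successive polls of the interface counters. This example assumes that received packets are taken as the sum of ifInUcastPkts and ifInNUcastPkts and that all counters are 32-bit values that may wrap between polls; the function names and the dictionary layout are hypothetical.

```python
from typing import Optional

COUNTER32_MODULUS = 2 ** 32


def counter_delta(old: int, new: int) -> int:
    """Difference between two Counter32 readings, allowing for one wrap."""
    return (new - old) % COUNTER32_MODULUS


def input_error_rate(prev: dict, curr: dict) -> Optional[float]:
    """Errors per received packet between two polls, or None if no packets arrived."""
    errors = counter_delta(prev["ifInErrors"], curr["ifInErrors"])
    # Assumption: received packets = unicast + non-unicast packets.
    packets = (counter_delta(prev["ifInUcastPkts"], curr["ifInUcastPkts"])
               + counter_delta(prev["ifInNUcastPkts"], curr["ifInNUcastPkts"]))
    if packets == 0:
        return None  # avoid dividing by zero on an idle interface
    return errors / packets


if __name__ == "__main__":
    first = {"ifInErrors": 10, "ifInUcastPkts": 1000, "ifInNUcastPkts": 0}
    second = {"ifInErrors": 12, "ifInUcastPkts": 2000, "ifInNUcastPkts": 1000}
    print(input_error_rate(first, second))
```

Working from deltas rather than raw counter values matters because the counters accumulate from the time the agent initialized, so a raw ratio would hide a recent burst of errors.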

It is important to note that output errors generally are related to resource availability (for example, internal buffers) of the router or switch rather than to a faulty interface or media. Therefore, output error rates should not be expected to hold to the media specification.

Acceptable error rates vary between different media types. We will cover specific media types in subsequent chapters.

MIB Variables for Interface Errors

From RFC 2233 (the Interfaces Group MIB), the MIB objects relevant to interface errors are as follows:

ifInErrors, ifOutErrors The number of frames containing some type of error that prevents them from being forwarded
ifInDiscards, ifOutDiscards The number of error-free frames the device had to discard, generally because of some resource problem internal to the device such as no free input buffers. Please refer to Chapter 11 for more details.
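The distinction between errors and discards suggests a simple first-pass triage of counter deltas. The sketch below only restates the rules of thumb from this section; the function name, the hint wording, and the choice to flag any nonzero delta are assumptions, and a real deployment would compare rates against thresholds for the media type.

```python
def triage_interface(deltas: dict) -> list:
    """Map counter deltas between two polls to likely causes (illustrative only)."""
    hints = []
    if deltas.get("ifInErrors", 0) > 0:
        # Input errors usually point at the media.
        hints.append("ifInErrors: check interface, cables, and connectors")
    if deltas.get("ifOutErrors", 0) > 0:
        # Output errors point at the device itself.
        hints.append("ifOutErrors: check the device, frames could not be sent")
    if deltas.get("ifInDiscards", 0) > 0 or deltas.get("ifOutDiscards", 0) > 0:
        # Discards are error-free frames dropped for lack of resources.
        hints.append("discards: check device resources such as buffers")
    return hints


if __name__ == "__main__":
    sample = {"ifInErrors": 4, "ifOutErrors": 0, "ifInDiscards": 7, "ifOutDiscards": 0}
    for hint in triage_interface(sample):
        print(hint)
```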