Fault Management for ATM Interfaces | Performance and Fault Management: A Practical Guide to Effectively Managing Cisco Network Devices (Cisco Press Core Series)

Fault management for ATM interfaces involves monitoring for errors at the various levels of the ATM stack. There may be bit errors at the ATM level or SAR errors at the AAL5 level. The errors may be due to bad media, interface hardware, or internal resource limitations. This section covers the meaning of different error counters from both the MIB and the CLI.

Some standard errors are as follows:

HEC errors: The HEC (Header Error Control) field in the ATM cell header is an 8-bit CRC computed over the first four header octets. HEC errors are similar to FCS errors in Ethernet (see the "Error/Fault Monitoring" section in Chapter 12).
SAR timeouts: At the AAL5 layer, this error indicates CPCS PDUs that were discarded because they were not reassembled within the required time period.
SDUs too long: Also at the AAL5 layer, this error means that CPCS PDUs were discarded because they were too long.
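The HEC check is a CRC-8 over the first four header octets, so a few lines of code make the Ethernet FCS analogy concrete. The sketch below assumes the ITU-T I.432 definition (generator x^8 + x^2 + x + 1, remainder XORed with 0x55). The function names are hypothetical, and real interfaces perform this check in hardware.

```python
# Sketch of the ATM header error control (HEC) calculation, assuming the
# ITU-T I.432 definition: CRC-8 with generator x^8 + x^2 + x + 1 over the
# first four header octets, remainder XORed with the coset 0x55.

HEC_POLY = 0x07    # x^8 + x^2 + x + 1, with the x^8 term implied
HEC_COSET = 0x55   # fixed pattern XORed into the CRC remainder


def atm_hec(header4: bytes) -> int:
    """Return the HEC octet for the first four octets of an ATM cell header."""
    if len(header4) != 4:
        raise ValueError("HEC covers exactly four header octets")
    crc = 0
    for octet in header4:
        crc ^= octet
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) & 0xFF) ^ HEC_POLY
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ HEC_COSET


def hec_ok(header5: bytes) -> bool:
    """True if the fifth header octet matches the HEC of the first four."""
    return len(header5) == 5 and atm_hec(header5[:4]) == header5[4]


print(hex(atm_hec(bytes([0x00, 0x00, 0x00, 0x01]))))   # → 0x52
print(hec_ok(bytes([0x00, 0x00, 0x00, 0x01, 0x53])))    # → False (HEC error)
```

The idle-cell header (00 00 00 01, HEC 0x52) is a convenient known value for checking an implementation like this one.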

This fault section is divided into two parts: one on edge devices (the routers and Catalyst switches) and one on ATM switches (the LS1010 and the 8500MSR).

Router/Catalyst Interfaces

For routers, you need to watch for errors at both the ATM layer and the AAL5 layer, because a router is an ATM host that terminates VCs. However, some Cisco router ATM interfaces have limited support for certain error counters because of chipset limitations, and those counters do not appear in either the MIB or the show interface command. On such interfaces, the ifTable shows no information for interface errors or discards at any layer.

MIB Variables for ATM Interface Errors

Because AAL5 plays a much bigger part on the edge of the ATM cloud, the aal5VccTable from the AToM MIB is particularly useful here.

From RFC 2515, the following MIB variables are useful for detecting ATM interface errors:

aal5VccCrcErrors: Indicates the number of CPCS PDUs with CRC errors.
aal5VccSarTimeOuts: Indicates the number of CPCS PDUs that could not be reassembled within the required time period.
aal5VccOverSizedSDUs: Indicates the number of CPCS PDUs that were discarded because they were too large.

The preceding three objects are all in the aal5VccTable, which is indexed by ifIndex, VPI, and VCI. These counters are supported only on the Enhanced ATM Port Adapter running IOS 11.3 or later. They are not supported for the ATM Interface Processor (AIP) or the original ATM Port Adapter.
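Because each aal5VccTable row is indexed by ifIndex, VPI, and VCI, the instance name for one VC's counter is the object name followed by those three values. The helper below is a hypothetical sketch (the ifIndex 5, VPI 0, VCI 100 values are made up, and the VPI/VCI range check is an assumption); with the AToM MIB loaded, names of this form can be handed to an SNMP tool such as Net-SNMP's snmpget.

```python
# Hypothetical helper: symbolic SNMP instance names for one aal5VccTable row.
# The table is indexed by ifIndex, VPI, and VCI, so each counter's instance
# is "<object>.<ifIndex>.<vpi>.<vci>".

AAL5_ERROR_OBJECTS = ("aal5VccCrcErrors", "aal5VccSarTimeOuts", "aal5VccOverSizedSDUs")


def aal5_instances(if_index: int, vpi: int, vci: int) -> list:
    """Return instance names for the three AAL5 error counters of one VCC."""
    # Assumed ranges: 12-bit VPI (NNI) and 16-bit VCI.
    if not (0 <= vpi <= 4095 and 0 <= vci <= 65535):
        raise ValueError("VPI or VCI out of range")
    suffix = f".{if_index}.{vpi}.{vci}"
    return [name + suffix for name in AAL5_ERROR_OBJECTS]


print(aal5_instances(5, 0, 100))
# → ['aal5VccCrcErrors.5.0.100', 'aal5VccSarTimeOuts.5.0.100', 'aal5VccOverSizedSDUs.5.0.100']
```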

In addition, you can use the generic error counters from the interfaces table. However, the same object may have different meanings, depending on whether you are looking at the ATM cell layer or the AAL5 layer. From RFC 2233, the following MIB variables are useful:

ifInDiscards, ifOutDiscards (AAL5 layer only): The number of AAL5 CPCS PDUs discarded, which can be caused by buffer problems.
ifInErrors: At the AAL5 layer, this object is the sum of the three error counters in the aal5VccTable. At the ATM cell layer, it is the number of cells with HEC errors.
ifInUnknownProtos: The number of received cells with an unrecognized VPI/VCI.
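Error counters are meaningful only as changes between polls. The sketch below, with hypothetical function names and made-up sample values, computes per-VC AAL5 error deltas from two polls of the aal5VccTable, allowing for a single Counter32 wrap; it does not detect agent restarts, which reset counters. If the agent implements ifInErrors as described above, the sum of these per-VC deltas on an interface should roughly track the AAL5-layer ifInErrors delta.

```python
# Hypothetical sketch: per-VC AAL5 error deltas between two SNMP polls.
# Each poll maps (ifIndex, vpi, vci) -> {object name: Counter32 value}.

COUNTER32_MODULUS = 2 ** 32
AAL5_ERROR_OBJECTS = ("aal5VccCrcErrors", "aal5VccSarTimeOuts", "aal5VccOverSizedSDUs")


def counter32_delta(old: int, new: int) -> int:
    """Increase of a Counter32 between two samples, assuming at most one wrap."""
    return (new - old) % COUNTER32_MODULUS


def vcs_with_errors(old_poll: dict, new_poll: dict) -> dict:
    """Return {(ifIndex, vpi, vci): error delta} for VCs whose errors increased."""
    result = {}
    for vc, new_row in new_poll.items():
        old_row = old_poll.get(vc)
        if old_row is None:  # VC appeared since the last poll; no baseline yet
            continue
        delta = sum(counter32_delta(old_row[obj], new_row[obj])
                    for obj in AAL5_ERROR_OBJECTS)
        if delta:
            result[vc] = delta
    return result


old = {(5, 0, 100): {"aal5VccCrcErrors": 10, "aal5VccSarTimeOuts": 0, "aal5VccOverSizedSDUs": 0},
       (5, 0, 101): {"aal5VccCrcErrors": 0, "aal5VccSarTimeOuts": 0, "aal5VccOverSizedSDUs": 0}}
new = {(5, 0, 100): {"aal5VccCrcErrors": 13, "aal5VccSarTimeOuts": 1, "aal5VccOverSizedSDUs": 0},
       (5, 0, 101): {"aal5VccCrcErrors": 0, "aal5VccSarTimeOuts": 0, "aal5VccOverSizedSDUs": 0}}
print(vcs_with_errors(old, new))   # → {(5, 0, 100): 4}
```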

CLI Commands for ATM Interface Errors

Use your NMS to monitor for errors via SNMP collections. When errors are detected, you should turn to the CLI show commands to drill down to find the problem. This section covers several show commands that are useful for tracking down problems on your ATM interfaces.

The show interface command (see Example 14-14) gives you the most information on the current state of an ATM interface.

Example 14-14 Obtaining ATM error information with the show interface command.

 nms-7507a#sh inter atm1/0/0
 ATM1/0/0 is up, line protocol is up
   Hardware is cyBus ENHANCED ATM PA
   MTU 4470 bytes, sub MTU 4470, BW 44209 Kbit, DLY 190 usec, rely 255/255, load 1/255
   Encapsulation ATM, loopback not set, keepalive not set
   Encapsulation(s): AAL5 AAL3/4
   4096 maximum active VCs, 1 current VCCs
   VC idle disconnect time: 300 seconds
   Last input never, output 00:03:14, output hang never
   Last clearing of "show interface" counters never
   Queueing strategy: fifo
   Output queue 0/40, 0 drops [G]; input queue 0/75, 0 drops
   5 minute input rate 0 bits/sec, 0 packets/sec
   5 minute output rate 0 bits/sec, 0 packets/sec
      8 packets input, 743 bytes, 0 no buffer [F]
      Received 0 broadcasts, 0 runts [D], 0 giants [E]
      0 input errors, 0 CRC [C], 0 frame, 0 overrun [A], 0 ignored [B], 0 abort
      5 packets output, 560 bytes, 0 underruns
      0 output errors, 0 collisions, 0 interface resets
      Output buffers copied, 0 interrupts, 0 failures

The items marked A through G in Example 14-14 are as follows:

A overrun: Input stack overflow due to a lack of SAR buffers.

B ignored: A problem processing packets further up the stack, after the SAR layer.

C CRC: Either line noise or cell drops due to resource limitations on the ATM adapter.

D runts: Packets smaller than a single cell due to cell corruption.

E giants: Packets larger than the VC MTU.

F no buffer: Internal SAR resource limitations.

G Output queue drops: Packets dropped because of resource limitations on the Versatile Interface Processor (VIP) or on the port adapter itself.

CRC, runt, and giant errors usually point to faulty media or interface hardware. The remaining errors are load-related: they indicate that more traffic is arriving than the device can process through the ATM stack.
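When the NMS flags a problem, it can help to pull these counters out of captured show interface output and sort them the same way. This is a hypothetical sketch that uses regular expressions against the counter wording shown in Example 14-14; output formats vary between IOS releases, so treat the patterns as assumptions, and the sample values are made up.

```python
import re

# Grouping follows the text: CRC, runts, and giants suggest media or hardware
# faults; the other counters suggest more load than the ATM stack can handle.
MEDIA_OR_HARDWARE = ("CRC", "runts", "giants")
LOAD_RELATED = ("overrun", "ignored", "no buffer", "output queue drops")

# Assumed wording, based on the show interface output in Example 14-14.
COUNTER_RE = re.compile(r"(\d+) (CRC|runts|giants|overrun|ignored|no buffer)\b")
OUTPUT_DROPS_RE = re.compile(r"Output queue \d+/\d+, (\d+) drops")


def classify_counters(show_output: str) -> dict:
    """Return the nonzero counters, split into media/hardware and load-related."""
    counts = {name: int(value) for value, name in COUNTER_RE.findall(show_output)}
    drops = OUTPUT_DROPS_RE.search(show_output)
    if drops:
        counts["output queue drops"] = int(drops.group(1))
    return {
        "media_or_hardware": {n: counts[n] for n in MEDIA_OR_HARDWARE if counts.get(n)},
        "load_related": {n: counts[n] for n in LOAD_RELATED if counts.get(n)},
    }


sample = """  Output queue 0/40, 2 drops; input queue 0/75, 0 drops
     8 packets input, 743 bytes, 3 no buffer
     Received 0 broadcasts, 0 runts, 0 giants
     4 input errors, 4 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort"""
print(classify_counters(sample))
# → {'media_or_hardware': {'CRC': 4}, 'load_related': {'no buffer': 3, 'output queue drops': 2}}
```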

LS1010/8500MSR Interfaces

For ATM switches, we are more concerned with cell-layer statistics than with AAL5-layer statistics. The vast majority of the traffic passing through the switch is transit traffic rather than traffic that terminates on the switch itself, and only terminating traffic is reassembled at the AAL5 layer. Therefore, AAL5 statistics are of little interest here.

The best indicators of errors on both the LS1010 and the Catalyst 8500MSR are in the ifTable. At the ATM cell layer, ifInErrors counts HEC errors. At the AAL5 layer, ifInErrors is the sum of CPCS CRC errors, SAR timeouts, and oversized SDU errors.

The CISCO-ATM-IF-PHYS-MIB counts error statistics at the physical layer. Its ciscoAtmIfPhysTable contains a subset of the objects from the SONET and DS3 MIBs.

MIB Variables for Errors on the LS1010/Catalyst 8500MSR

As with performance management, error management of ATM switches in the cloud is concerned more with the ATM cell layer than with the AAL5 layer. For a big-picture view, objects from RFC 2233 work fine. If you need to drill down to the physical layer to look at low-level errors, use the CISCO-ATM-IF-PHYS-MIB. The following MIB objects are particularly relevant:

ifInErrors: From the ifTable, a count of HEC errors.
ifInUnknownProtos: From the ifTable, a count of cells with invalid VPI/VCI values.
ciscoAtmIfPhysTable: From the CISCO-ATM-IF-PHYS-MIB, the entries that apply to the interface type (SONET or DS3).
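A simple way to use these ifTable objects on a switch port is to turn the counter changes between two polls into per-minute rates and alert above a threshold. The threshold of 1 per minute, the function names, and the sample values are hypothetical. A steadily climbing ifInUnknownProtos often points to a VPI/VCI mismatch with the neighboring device rather than to a media problem.

```python
# Hypothetical sketch: per-minute error rates for a switch port from two polls.
COUNTER32_MODULUS = 2 ** 32
WATCHED = ("ifInErrors", "ifInUnknownProtos")


def per_minute(old: int, new: int, seconds: float) -> float:
    """Counter32 increase scaled to a per-minute rate (assumes at most one wrap)."""
    if seconds <= 0:
        raise ValueError("poll interval must be positive")
    return ((new - old) % COUNTER32_MODULUS) * 60.0 / seconds


def port_alerts(old: dict, new: dict, seconds: float, limit_per_min: float = 1.0) -> list:
    """Return a message for each watched counter whose rate exceeds the limit."""
    alerts = []
    for obj in WATCHED:
        rate = per_minute(old[obj], new[obj], seconds)
        if rate > limit_per_min:
            alerts.append(f"{obj}: {rate:.1f}/min")
    return alerts


# Two polls 300 seconds apart (made-up values).
print(port_alerts({"ifInErrors": 100, "ifInUnknownProtos": 7},
                  {"ifInErrors": 130, "ifInUnknownProtos": 7}, 300))
# → ['ifInErrors: 6.0/min']
```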

CLI Commands for Errors on ATM Interfaces

The show interfaces command displays several error types, including some counters you can ignore. As shown in Example 14-15, the runt, giant, overrun, ignored, and abort counters do not apply to ATM interfaces on the LS1010 or the 8500MSR.

Example 14-15 Obtaining ATM error information for a switch with the show interfaces command.

 nms-5500a-asp>sh interface atm12/0/0
 ATM12/0/0 is up, line protocol is up
   Hardware is oc3suni
   Description: testit
   MTU 4470 bytes, sub MTU 4470, BW 155520 Kbit, DLY 0 usec, rely 255/255, load 1/255
   Encapsulation ATM, loopback not set, keepalive not supported
   Last input 00:00:00, output 00:00:00, output hang never
   Last clearing of "show interface" counters never
   Queueing strategy: fifo
   Output queue 0/40, 0 drops; input queue 0/75, 0 drops
   5 minute input rate 9000 bits/sec, 47 packets/sec
   5 minute output rate 18000 bits/sec, 63 packets/sec
      22143052 packets input, 1173581756 bytes, 0 no buffer
      Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
      0 input errors [A], 0 CRC [B], 0 frame [C], 0 overrun, 0 ignored, 0 abort
      350516665 packets output, 1397514061 bytes, 0 underruns
      0 output errors, 0 collisions, 0 interface resets
      0 output buffer failures, 0 output buffers swapped out

The relevant information from Example 14-15 is as follows:

A The input errors counter counts damaged cells.

B The CRC counter counts cells with HEC errors.

C The frame counter counts those cells with framing/alignment errors.

The show controller atm command (see Example 14-16) gives details on physical and clocking problems. All error counters should be close to zero.

Example 14-16 Using the show controller atm command to obtain information on physical and clocking problems.

 nms-5500a-asp>sh cont atm12/0/0
 IF Name: ATM12/0/0    Chip Base Address: A8E08000
 Port type: OC3    Port rate: 155 Mbps    Port medium: MM Fiber
 Port status:Good Signal    Loopback:None    Flags:8308
 TX Led: Traffic Pattern    RX Led: Steady Green
 TX clock source:  network-derived
 Framing mode:  sts-3c
 Cell payload scrambling on
 Sts-stream scrambling on
 OC3 counters:
   Key: txcell - # cells transmitted
        rxcell - # cells received
        b1     - # section BIP-8 errors
        b2     - # line BIP-8 errors
        b3     - # path BIP-8 errors
        ocd    - # out-of-cell delineation errors - not implemented
        g1     - # path FEBE errors
        z2     - # line FEBE errors
        chcs   - # correctable HEC errors
        uhcs   - # uncorrectable HEC errors
 txcell:350556951, rxcell:22168814
 b1:0, b2:0, b3:0, ocd:0
 g1:0, z2:0, chcs:0, uhcs:0
 OC3 errored secs:
 b1:0, b2:0, b3:0, ocd:0
 g1:0, z2:0, chcs:0, uhcs:0
 OC3 error-free secs:
 b1:446692, b2:446692, b3:446692, ocd:0
 g1:446692, z2:446692, chcs:446692, uhcs:446692
 Clock reg:8F
   mr 0x30, mcfgr 0x70, misr 0x00, mcmr 0x0F,
   mctlr 0x08, cscsr 0x50, crcsr 0x20, rsop_cier 0x00,
   rsop_sisr 0x00, rsop_bip80r 0x00, rsop_bip81r 0x00, tsop_ctlr 0x80,
   tsop_diagr 0x80, rlop_csr 0x00, rlop_ieisr 0x00, rlop_bip8_240r 0x00,
   rlop_bip8_241r 0x00, rlop_bip8_242r 0x00, rlop_febe0r 0x00, rlop_febe1r 0x00,
   rlop_febe2r 0x00, tlop_ctlr 0x80, tlop_diagr 0x80, rpop_scr 0x00,
   rpop_isr 0x00, rpop_ier 0x00, rpop_pslr 0x13, rpop_pbip80r 0x00,
   rpop_pbip81r 0x00, rpop_pfebe0r 0x00, rpop_pfebe1r 0x00, tpop_cdr 0x80,
   tpop_pcr 0x80, tpop_ap0r 0x00, tpop_ap1r 0x90, tpop_pslr 0x13,
   tpop_psr 0x00, racp_csr 0x04, racp_iesr 0x01, racp_mhpr 0x00,
   racp_mhmr 0x00, racp_checr 0x00, racp_uhecr 0x00, racp_rcc0r 0x04,
   racp_rcc1r 0x00, racp_rcc2r 0x00, racp_cfgr 0xFC, tacp_csr 0x04,
   tacp_iuchpr 0x00, tacp_iucpopr 0x6A, tacp_fctlr 0x00, tacp_tcc0r 0x2A,
   tacp_tcc1r 0x00, tacp_tcc2r 0x00, tacp_cfgr 0x08,
   phy_tx_cnt:350556951, phy_rx_cnt:22168808
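Because every error counter in this output should be close to zero, the OC3 counter lines lend themselves to a mechanical check. The parser below is a hypothetical sketch that assumes the section headings and name:value layout of Example 14-16; the sample values are made up, since the real example shows no errors.

```python
import re

# Section headings as they appear in the show controller output (assumed layout).
SECTIONS = {
    "OC3 counters:": "counters",
    "OC3 errored secs:": "errored_secs",
    "OC3 error-free secs:": "error_free_secs",
}
PAIR_RE = re.compile(r"\b(\w+):(\d+)\b")


def parse_oc3_counters(text: str) -> dict:
    """Return {section: {counter: value}} for the three OC3 counter sections."""
    result = {name: {} for name in SECTIONS.values()}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in SECTIONS:
            section = SECTIONS[stripped]
            continue
        if stripped.startswith("Clock reg"):
            section = None  # the register dump that follows is not a counter list
            continue
        if section is not None:
            for name, value in PAIR_RE.findall(stripped):
                result[section][name] = int(value)
    return result


def nonzero_errors(parsed: dict) -> dict:
    """Error counters (cell counts excluded) that are not zero."""
    return {name: value for name, value in parsed["counters"].items()
            if name not in ("txcell", "rxcell") and value}


sample = """OC3 counters:
  Key: txcell - # cells transmitted
txcell:350556951, rxcell:22168814
b1:0, b2:0, b3:2, ocd:0
g1:0, z2:0, chcs:5, uhcs:0
OC3 errored secs:
b1:0, b2:0, b3:1, ocd:0
g1:0, z2:0, chcs:3, uhcs:0
Clock reg:8F"""
print(nonzero_errors(parse_oc3_counters(sample)))   # → {'b3': 2, 'chcs': 5}
```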

Monitoring PVC Status

In addition to looking at interface errors, monitoring the status of PVCs is important. It is quite possible to have an error-free link and perfectly functioning interfaces, but if the PVC is down for some reason, traffic will not pass.

As defined in RFC 2515, the ATM Interface VCL Table (atmVclTable) lists all the VCs on a given interface. To confirm that a PVC is operational, monitor atmVclOperStatus for that VC, indexed by ifIndex, VPI, and VCI.
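Polling atmVclOperStatus returns one value per VC, with the instance suffix carrying ifIndex, VPI, and VCI. The sketch below assumes those values have already been collected into a dictionary keyed by that index, and it uses the up(1), down(2), unknown(3) enumeration from RFC 2515 (verify against your own MIB files); the function name and sample data are hypothetical.

```python
# atmVclOperStatus enumeration, assumed from RFC 2515 (AtmVorXOperStatus).
VCL_OPER_STATUS = {1: "up", 2: "down", 3: "unknown"}


def pvcs_not_up(oper_status: dict) -> dict:
    """Given {(ifIndex, vpi, vci): atmVclOperStatus}, return the VCs that are not up."""
    return {vc: VCL_OPER_STATUS.get(value, f"invalid({value})")
            for vc, value in oper_status.items() if value != 1}


polled = {(5, 0, 100): 1, (5, 0, 101): 2, (5, 1, 32): 3}   # made-up poll results
print(pvcs_not_up(polled))   # → {(5, 0, 101): 'down', (5, 1, 32): 'unknown'}
```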

Or you can go proactive: As of IOS 12.0T, Cisco routers support some extensions to the atmInterfaceConfTable from RFC 2515. These extensions are currently under the ciscoExperiment branch but will be moved to a standard branch at a future date. For now, the objects are defined in the CISCO-IETF-ATM2-PVCTRAP-MIB, which defines an SNMP notification (trap) for PVC failures: atmIntfPvcFailuresTrap. This notification must be enabled either with an SNMP set on the atmIntfPvcFailuresTrapEnable object or from the CLI with the following command:

 snmp-server enable trap atm pvc