Error/Fault Data for Switch Hardware

For fault management in switches, you will depend mainly on SNMP traps and syslog messages, rather than on actively polling MIB objects, to tell you when hardware issues are arising in the network. However, this section does look at some MIB objects that you may want to poll, either routinely or in response to a correlated event such as a syslog message or an SNMP trap sent when a defined RMON threshold is exceeded.

MIB Variables for Switch Failures

From the CISCO-STACK-MIB, the following variables are relevant to switch failures:

  • chassisMinorAlarm: A minor alarm varbind within an SNMP trap message.

  • chassisMajorAlarm: A major alarm varbind within an SNMP trap message.

From the moduleTable within the CISCO-STACK-MIB:

  • moduleStatus: The status of a module within the switch chassis.

  • moduleTestResult: The result of a power-on self test for a module within the switch chassis.

Only two of these MIB objects are worth polling: moduleStatus and moduleTestResult. Even these should be polled only when an SNMP trap or syslog message indicates a problem. The other two objects, chassisMajorAlarm and chassisMinorAlarm, are varbinds within the chassisAlarmOn and chassisAlarmOff SNMP traps.
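The event-driven approach just described can be sketched as a small dispatch table that decides what to poll when a trap or syslog message arrives. This is a minimal sketch, not a Cisco-provided tool. Several parts are illustrative assumptions: the event-to-object mapping, the `poll` callback (a stand-in for whatever SNMP GET function your NMS provides), and the premise that moduleTable instances are indexed by module number.

```python
from typing import Callable, Dict, List, Optional

# Illustrative mapping from a triggering event to the moduleTable objects
# worth polling. Trap names are CISCO-STACK-MIB traps and the SYS-3
# mnemonics are syslog messages covered later in this section; which
# objects to poll for each event is an assumption, not a Cisco rule.
POLL_ON_EVENT: Dict[str, List[str]] = {
    "moduleDown": ["moduleStatus", "moduleTestResult"],
    "moduleUp": ["moduleStatus"],
    "SYS-3-MOD_FAILREASON": ["moduleStatus", "moduleTestResult"],
    "SYS-3-MOD_MINORFAIL": ["moduleStatus", "moduleTestResult"],
    "SYS-3-MOD_FAIL": ["moduleStatus"],
}


def objects_to_poll(event: str, module: int) -> List[str]:
    """Return object instances (name.moduleIndex) to poll for an event.

    Events without an entry trigger no polling at all, so the switch is
    only queried after something has already been reported.
    """
    return [f"{name}.{module}" for name in POLL_ON_EVENT.get(event, [])]


def handle_event(event: str, module: int,
                 poll: Callable[[str], Optional[int]]) -> Dict[str, Optional[int]]:
    """Poll the relevant objects; `poll` is a hypothetical SNMP GET wrapper."""
    return {inst: poll(inst) for inst in objects_to_poll(event, module)}


# Usage with a stand-in poller backed by a dict of hypothetical values.
fake_agent = {"moduleStatus.6": 4, "moduleTestResult.6": 0}
print(handle_event("moduleDown", 6, fake_agent.get))
# {'moduleStatus.6': 4, 'moduleTestResult.6': 0}
```

Keeping the mapping in data rather than in code makes it easy to adjust which events trigger which polls without touching the polling logic.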

Minor and Major Chassis Alarms

When the system LED turns red, a chassisMajorAlarm is generated; when it turns orange, a chassisMinorAlarm is generated. In both cases the trap sent is a chassisAlarmOn trap. The trap includes varbinds that indicate whether it was caused by a chassisTempAlarm, a chassisMinorAlarm, or a chassisMajorAlarm, so decoding the trap tells you which kind of alarm generated it.
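Decoding a chassisAlarmOn trap comes down to checking which alarm varbinds are set. The following is a minimal sketch. It assumes your trap receiver has already turned the varbinds into a name-to-integer dict (that format is an assumption), and it uses the off(1) and on(2) values given for these objects in the trap definitions later in this section. The function names are hypothetical.

```python
from typing import Dict, List

OFF = 1  # off(1); on(2) is the active state named in the trap definitions
ALARM_VARBINDS = ("chassisTempAlarm", "chassisMinorAlarm", "chassisMajorAlarm")


def active_alarms(varbinds: Dict[str, int]) -> List[str]:
    """Return alarm varbinds whose value is anything other than off(1)."""
    return [name for name in ALARM_VARBINDS
            if varbinds.get(name, OFF) != OFF]


def alarm_severity(varbinds: Dict[str, int]) -> str:
    """Report the most severe alarm carried by a chassisAlarmOn trap."""
    active = active_alarms(varbinds)
    if "chassisMajorAlarm" in active:
        return "major"        # system LED red
    if "chassisMinorAlarm" in active:
        return "minor"        # system LED orange
    if "chassisTempAlarm" in active:
        return "temperature"
    return "none"


# Hypothetical decoded varbinds: temperature and minor alarms on, major off.
trap = {"chassisTempAlarm": 2, "chassisMinorAlarm": 2, "chassisMajorAlarm": 1}
print(active_alarms(trap), alarm_severity(trap))
# ['chassisTempAlarm', 'chassisMinorAlarm'] minor
```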

A chassisMajorAlarm indicates one of the following conditions:

  1. Any voltage failure

  2. Simultaneous Temp and Fan failure

  3. 100 percent power supply failure (2 out of 2 or 1 out of 1)

  4. EEPROM failure

  5. NVRAM failure

  6. MCP communication failure

  7. NMP status "unknown"

A chassisMinorAlarm indicates one of the following conditions:

  1. Temp alarm

  2. Fan failure

  3. Partial power supply failure (1 out of 2)

  4. Two power supplies of incompatible types
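The two lists above amount to a small decision procedure. Note in particular that a temperature alarm or a fan failure on its own is minor, but the two together are major. Power supply severity likewise depends on how many of the installed supplies have failed. The sketch below encodes the lists directly. The `ChassisHealth` fields and the `classify` function are hypothetical names chosen for illustration, not MIB objects.

```python
from dataclasses import dataclass


@dataclass
class ChassisHealth:
    """Hypothetical snapshot of the conditions listed in the text."""
    voltage_failure: bool = False
    temp_alarm: bool = False
    fan_failure: bool = False
    supplies_installed: int = 1
    supplies_failed: int = 0
    incompatible_supplies: bool = False
    eeprom_failure: bool = False
    nvram_failure: bool = False
    mcp_comm_failure: bool = False
    nmp_status_unknown: bool = False


def classify(h: ChassisHealth) -> str:
    """Return 'major', 'minor', or 'none' following the two lists above."""
    # 100 percent power supply failure: 2 out of 2 or 1 out of 1.
    all_supplies_failed = (h.supplies_installed > 0
                           and h.supplies_failed >= h.supplies_installed)
    if (h.voltage_failure
            or (h.temp_alarm and h.fan_failure)
            or all_supplies_failed
            or h.eeprom_failure or h.nvram_failure
            or h.mcp_comm_failure or h.nmp_status_unknown):
        return "major"
    # Partial power supply failure: some, but not all, supplies failed.
    partial_supply_failure = 0 < h.supplies_failed < h.supplies_installed
    if (h.temp_alarm or h.fan_failure
            or partial_supply_failure or h.incompatible_supplies):
        return "minor"
    return "none"


print(classify(ChassisHealth(supplies_installed=2, supplies_failed=1)))  # minor
```

Checking the major conditions first matters, because every major power or temperature case would otherwise also satisfy a minor test.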

Based on appropriate syslog messages or SNMP traps received on your Network Management console, you can determine when you need to actively poll these MIB objects.

CLI Commands for Switch Failure

The following are show commands that can be used to get the same type of data points as the MIB objects mentioned previously for switch health.

Switch Health from show system

This section zooms in on the system status (Sys-Status) field of the show system output. The output also shows power supply, fan, and temperature status. The normal value for Sys-Status is "ok"; the only other value is "faulty," which indicates that a Major or Minor alarm has been triggered.

Example 10-13 shows output from show system, with emphasis on information regarding switch health.

Example 10-13 Obtaining switch health information from show system.
Switch>show system
PS1-Status PS2-Status Fan-Status Temp-Alarm Sys-Status Uptime d,h:m:s Logout
---------- ---------- ---------- ---------- ---------- -------------- ---------
ok         none       ok         off        ok (A)     4,23:06:16     20 min

PS1-Type   PS2-Type   Modem   Baud  Traffic Peak Peak-Time
---------- ---------- ------- ----- ------- ---- -------------------------
WS-C5508   none       disable  9600   0%      0% Wed Apr 21 1999, 15:57:24

System Name              System Location          System Contact
------------------------ ------------------------ ------------------------

"Sys-Status" (A) displays the current state of the switch based on the "health" of the processor. Any power-, temperature-, or fan-related alarm affects Sys-Status as well as the corresponding individual status field. Think of Sys-Status as the main reporting mechanism for the switch as a whole.
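If you collect show system output through a script rather than through SNMP, you can extract Sys-Status from the fixed-width table by using the dash divider line to find the column boundaries. This sketch assumes the output is laid out like Example 10-13, with a header line, a divider, and value rows. `parse_table` and `switch_is_healthy` are hypothetical helpers, not part of any Cisco tool.

```python
from typing import Dict, List, Tuple


def column_spans(divider: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of each run of '-' in a divider line."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, ch in enumerate(divider + " "):  # trailing space closes the last run
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            spans.append((start, i))
            start = None
    return spans


def parse_table(text: str) -> List[Dict[str, str]]:
    """Parse one header/divider/rows block of fixed-width CLI output."""
    header, divider, *rows = text.splitlines()
    spans = column_spans(divider)
    # Let the last column run to the end of the longest line.
    width = max(len(line) for line in [header, divider, *rows])
    spans[-1] = (spans[-1][0], width)
    names = [header[s:e].strip() for s, e in spans]
    return [{name: row[s:e].strip() for name, (s, e) in zip(names, spans)}
            for row in rows if row.strip()]


def switch_is_healthy(table: str) -> bool:
    """True when the first data row reports Sys-Status "ok"."""
    return parse_table(table)[0]["Sys-Status"] == "ok"


SAMPLE = (
    "PS1-Status PS2-Status Fan-Status Temp-Alarm Sys-Status Uptime d,h:m:s Logout\n"
    "---------- ---------- ---------- ---------- ---------- -------------- ---------\n"
    "ok         none       ok         off        ok         4,23:06:16     20 min\n"
)

print(switch_is_healthy(SAMPLE))  # True
```

Using the divider line rather than splitting on whitespace keeps multi-word headers such as "Uptime d,h:m:s" and values such as "20 min" intact.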

Switch Health from show module

This command shows what kind of card is installed in the switch chassis and the status of the line cards and supervisor cards. The output also includes the module number, number of ports, card model, serial number, hardware version, firmware version, and software version. It also lists any sub-modules installed on the supervisor card, such as the NetFlow Feature Card (NFFC) or uplink modules. These sub-modules appear mainly on the newer Supervisor cards (Supervisor III, WS-X5530).

The focus for Example 10-14 is on the individual module Status column.

Example 10-14 Obtaining switch health information from show module.
Switch> sh module
Mod Module-Name         Ports Module-Type           Model    Serial-Num Status
--- ------------------- ----- --------------------- --------- --------- -------
1                       2     100BaseFX MMF Supervi WS-X5530  011437543 ok (A)
2                       2     MM MIC FDDI           WS-X5101  003397731 ok (A)
6                       12    10/100BaseTX Ethernet WS-X5213  003974709 ok (A)

Mod MAC-Address(es)                        Hw     Fw         Sw
--- -------------------------------------- ------ ---------- -----------------
1   00-e0-4f-73-8e-00 to 00-e0-4f-73-91-ff 2.0    3.1.2      4.5(1)
2   00-60-3e-cd-55-6c                      1.1    1.1        3.1(1)
6   00-60-83-5d-8a-ec to 00-60-83-5d-8a-f7 1.0    1.4        4.5(1)

Mod Sub-Type Sub-Model Sub-Serial Sub-Hw
--- -------- --------- ---------- ------
1   NFFC     WS-F5521  0011437958 1.1
1   uplink   WS-U5533  0008588482 1.0

Mod SMT User-Data              T-Notify CF-St    ECM-St    Bypass
--- -------------------------- -------- -------- --------- -------
2   WorkGroup Stack            30       c-Wrap-B in        absent

The "Status" column (A) shows you the current state of the module. It can be one of the following values: ok, disable, faulty, other, standby, or error. If there is a "faulty" condition on the module, you can issue the show log or show test [mod_num] command to see why it is faulty.
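You can script the same check against show module output. This sketch assumes, as in Example 10-14, that each row of the first table begins with the module number and ends with the Status value. Rows from the other tables are skipped because their last field is not one of the status values listed above. The function names are hypothetical.

```python
from typing import Dict

# Status values the text lists for the show module Status column.
STATUSES = {"ok", "disable", "faulty", "other", "standby", "error"}


def module_status(output: str) -> Dict[int, str]:
    """Map module number to its Status value from show module output."""
    result: Dict[int, str] = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with the module number and end with a status.
        if fields and fields[0].isdigit() and fields[-1] in STATUSES:
            result[int(fields[0])] = fields[-1]
    return result


def needs_attention(statuses: Dict[int, str]) -> Dict[int, str]:
    """Modules whose status is anything other than ok."""
    return {mod: st for mod, st in statuses.items() if st != "ok"}


SAMPLE = """\
Mod Module-Name         Ports Module-Type           Model    Serial-Num Status
--- ------------------- ----- --------------------- --------- --------- -------
1                       2     100BaseFX MMF Supervi WS-X5530  011437543 ok
2                       2     MM MIC FDDI           WS-X5101  003397731 ok
6                       12    10/100BaseTX Ethernet WS-X5213  003974709 ok
Mod Sub-Type Sub-Model Sub-Serial Sub-Hw
--- -------- --------- ---------- ------
1   NFFC     WS-F5521  0011437958 1.1
"""

print(module_status(SAMPLE))  # {1: 'ok', 2: 'ok', 6: 'ok'}
for mod, st in needs_attention(module_status(SAMPLE)).items():
    if st == "faulty":
        print(f"module {mod} is faulty: try show log or show test {mod}")
```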

SNMP Traps for Switch Failure

From the CISCO-STACK-MIB, the following SNMP traps are relevant to switch failure:

  • chassisAlarmOn

  • chassisAlarmOff

  • moduleDown

  • moduleUp

A chassisAlarmOn trap signifies that the agent entity has detected that the chassisTempAlarm, chassisMinorAlarm, or chassisMajorAlarm object in this MIB has transitioned to the on(2) state. The generation of this trap can be controlled by the sysEnableChassisTraps object in this MIB or by using the CLI command set snmp trap enable chassis.

A chassisAlarmOff trap signifies that the agent entity has detected that the chassisTempAlarm, chassisMinorAlarm, or chassisMajorAlarm object in this MIB has transitioned to the off(1) state. The generation of this trap can be controlled by the sysEnableChassisTraps object in this MIB or by using the CLI command set snmp trap enable chassis.

A moduleDown trap signifies that the agent entity has detected that the moduleStatus object in this MIB has transitioned out of the ok(2) state for one of its modules. The generation of this trap can be controlled by the sysEnableModuleTraps object in this MIB or by using the CLI command set snmp trap enable module.

Refer to the "Minor and Major Chassis Alarms" discussion earlier in this section for an explanation of when a chassisAlarmOn or chassisAlarmOff trap would be seen.

A moduleUp trap signifies that the agent entity has detected that the moduleStatus object in this MIB has transitioned to the ok(2) state for one of its modules. The generation of this trap can be controlled by the sysEnableModuleTraps object in this MIB or by using the CLI command set snmp trap enable module.
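The four trap definitions reduce to edge detection on two kinds of state variable. The sketch below reproduces that logic, which could be used, for example, to work out which traps a series of polled values implies. It relies only on the ok(2), on(2), and off(1) values named above, and the function names are hypothetical. Whether the switch actually sends a given trap still depends on sysEnableChassisTraps, sysEnableModuleTraps, or the equivalent set snmp trap enable commands.

```python
from typing import Iterable, List, Optional

MODULE_OK = 2               # moduleStatus ok(2)
ALARM_OFF, ALARM_ON = 1, 2  # chassis alarm off(1) / on(2)


def module_trap(previous: int, current: int) -> Optional[str]:
    """moduleDown on leaving ok(2), moduleUp on entering it, else None."""
    if previous == MODULE_OK and current != MODULE_OK:
        return "moduleDown"
    if previous != MODULE_OK and current == MODULE_OK:
        return "moduleUp"
    return None


def chassis_trap(previous: int, current: int) -> Optional[str]:
    """chassisAlarmOn on off(1)->on(2), chassisAlarmOff on on(2)->off(1)."""
    if previous == ALARM_OFF and current == ALARM_ON:
        return "chassisAlarmOn"
    if previous == ALARM_ON and current == ALARM_OFF:
        return "chassisAlarmOff"
    return None


def module_traps(samples: Iterable[int]) -> List[str]:
    """Traps implied by consecutive moduleStatus samples for one module."""
    values = list(samples)
    traps = (module_trap(a, b) for a, b in zip(values, values[1:]))
    return [t for t in traps if t is not None]


# Hypothetical moduleStatus samples: ok, ok, two non-ok values, ok again.
print(module_traps([2, 2, 4, 3, 2]))  # ['moduleDown', 'moduleUp']
```

A change between two non-ok values produces no trap, which matches the definitions: moduleDown fires only on the way out of ok(2).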

Syslog Messages for Switch Failure

Syslog functionality was first introduced on the Catalyst series switches in software release 2.4. Table 10-5 summarizes only the messages that relate to hardware and to the variables already discussed in this section.

TIP

Enable timestamps on log messages so that you can correlate events with issues in the network. The command set logging timestamp enable turns on timestamps for log messages.
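Once messages carry timestamps, correlating them can be as simple as grouping events that arrive close together. The sketch below shows one illustrative approach; it is not a feature of the switch or of any particular NMS. The event tuples, the sample times, and the five-second window are all assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]  # (timestamp, source and message)


def correlate(events: List[Event], window: timedelta) -> List[List[Event]]:
    """Group events so that each is within `window` of the previous one."""
    groups: List[List[Event]] = []
    for event in sorted(events):
        if groups and event[0] - groups[-1][-1][0] <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


# Hypothetical events from a syslog server and a trap receiver.
events = [
    (datetime(1999, 4, 21, 15, 57, 25), "trap moduleDown"),
    (datetime(1999, 4, 21, 15, 57, 24), "syslog SYS-3-MOD_FAIL"),
    (datetime(1999, 4, 21, 16, 30, 0), "syslog SYS-5-MOD_OK"),
]
for group in correlate(events, timedelta(seconds=5)):
    print([text for _, text in group])
```

Because each event is compared with the previous one, a long burst of closely spaced events forms a single group even if its first and last events are far apart. Bounding each group by its first event is an alternative design.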


Table 10-5. Syslog Messages for Switch Health Information
Message: SYS-3-MOD_FAILREASON: Module [dec] failed due to [chars][chars][chars] [chars]
Explanation: Module [dec] has failed because of [chars]. [dec] is the module number, and [chars] is one of the following: CPU Initialization Error, Memory Test Failed, Boot Checksum Verification Failed, SPROM Checksum Verification Failed, EOBC Loopback Test Failed, LTL-A Error, Flash Erase/Write Error, Pinnacle CBL Error, Pinnacle Packet Buffer Error, Pinnacle TLB Error, or Unknown or Undocumented Error. The first [chars] field is "Ports disabled" if the module is not an ATM module or Route Switch Module (RSM), that is, a non-IOS module. The second [chars] field is a description of the module type configured in NVRAM. The third [chars] field is a description of the module type inserted in the slot. Execute the CLI command show test [mod_num] to see what specifically failed.

Message: SYS-3-MOD_MINORFAIL: Minor problem in module [dec]
Explanation: Module [dec] failed the self-test; [dec] is the module number. Execute the CLI command show test [mod_num] to see what specifically failed.

Message: SYS-3-MOD_FAIL: Module [dec] failed to come online
Explanation: Module [dec] failed to come online; [dec] is the module number. Execute the CLI command show module to see the status of the module.

Message: SYS-5-MOD_INSERT: Module [dec] has been Inserted
Explanation: Module [dec] was inserted; [dec] is the module number. This message is informational only. If a module is inserted and the message does not appear, this might indicate a problem. Enter the show module or show port [mod_num/port_num] command to verify that the system has acknowledged the module and brought it online.

Message: SYS-5-MOD_REMOVE: Module [dec] has been Removed
Explanation: Module [dec] was removed; [dec] is the module number. This message is informational only. If a module is removed and the message does not appear, this might indicate a problem. Enter the show port [mod_num/port_num] command to query the module. The system should respond with: Module n is not installed.

Message: SYS-5-SYS_RESET: System reset from [chars]
Explanation: The system was reset from [chars]. [chars] is a console number if the request came from a console session, or an IP address if the request came from a Telnet session or SNMP.

Message: SYS-5-MOD_OK: Module [dec] is online
Explanation: Module [dec] passed its diagnostic self-test and is online; [dec] is the module number. If modules are working properly, this message is usually seen after a SYS-5-SYS_RESET message.
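Table 10-5 can also drive simple triage in a script that reads syslog lines. This sketch assumes each message contains the "FACILITY-SEVERITY-MNEMONIC: text" form shown in the table, and it tolerates a timestamp prefix or a leading %. The next-step mapping copies the commands the table recommends for the failure messages. The names `triage` and `NEXT_STEP` are hypothetical.

```python
import re
from typing import Dict, Optional, Union

# FACILITY-SEVERITY-MNEMONIC: text, as in Table 10-5. re.search skips any
# prefix such as a timestamp; a leading '%' is accepted but not required.
MSG_RE = re.compile(r"%?([A-Z0-9]+)-(\d)-([A-Z0-9_]+):\s*(.*)")
MODULE_RE = re.compile(r"module (\d+)", re.IGNORECASE)

# Follow-up commands recommended in Table 10-5 for the failure messages.
NEXT_STEP = {
    "MOD_FAILREASON": "show test {mod}",
    "MOD_MINORFAIL": "show test {mod}",
    "MOD_FAIL": "show module",
}


def triage(line: str) -> Optional[Dict[str, Union[str, int, None]]]:
    """Split a syslog line into its parts and suggest a next command."""
    m = MSG_RE.search(line)
    if m is None:
        return None
    facility, severity, mnemonic, text = m.groups()
    mod_match = MODULE_RE.search(text)
    module = int(mod_match.group(1)) if mod_match else None
    step = NEXT_STEP.get(mnemonic)
    return {
        "facility": facility,
        "severity": int(severity),
        "mnemonic": mnemonic,
        "module": module,
        "next_step": step.format(mod=module) if step and module is not None else None,
    }


print(triage("%SYS-3-MOD_FAILREASON: Module 6 failed due to Memory Test Failed")["next_step"])
# show test 6
```

The digit after the facility is the message severity: 3 for the failure messages in the table and 5 for the informational ones. A script could use it to decide which messages warrant immediate attention.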



Performance and Fault Management
Performance and Fault Management: A Practical Guide to Effectively Managing Cisco Network Devices (Cisco Press Core Series)
ISBN: 1578701805
EAN: 2147483647
Year: 2005
Pages: 200
