Scenario 10-3: Troubleshooting the errDisable Status | CCNP(R) Practical Studies: Switching (CCNP Self-Study)

The errDisable status is a feature of Cisco Catalyst switches designed to protect the network from problems caused by switch misconfiguration and other network errors. A port in the errDisable state has been shut down by the switch operating system because an error was detected on the port. Once a port is placed into the errDisable state, an administrator must manually re-enable the port. This feature is reserved for errors that might seriously jeopardize the stability of the switch or the entire LAN.

NOTE

You can also configure Cisco Catalyst switches to automatically enable errDisabled ports after a configurable timer expires.

The following lists common reasons why a port might be placed into an errDisable state:

Spanning-tree BPDU Guard: This feature is used on ports that have the spanning-tree PortFast feature enabled. Because PortFast places a port into a forwarding state immediately upon activation, it should be used only for ports that are connected to a single-homed device, and not to any bridging device, such as another LAN switch. If another LAN switch is connected to the port, the port receives spanning-tree bridge protocol data units (BPDUs). This indicates to the receiving switch that a multi-homed device is connected to the PortFast-enabled port (in error). If BPDU Guard is enabled, the switch shuts down the port and places the port into an errDisable state.
EtherChannel misconfiguration: A common cause of ports being placed into an errDisable state is EtherChannel misconfiguration. When configuring EtherChannel, you must ensure that various parameters are identical for all ports on both sides of the bundle. For example, all ports must belong to the same VLAN or, if using trunks, all ports must be configured as trunks with the same native VLAN (when using 802.1Q trunking). Common misconfigurations include incompatible PAgP modes on the endpoints of the bundle and mismatched speed and duplex settings.
Unidirectional Link Detection (UDLD): This feature is used to detect unidirectional failures. Unidirectional failures are common with fiber-based connections, where a pair of physically separate fiber strands carries the transmit and receive signals for the connection. If one of these fibers is damaged or broken while the other still operates, traffic can flow in only one direction. This situation was probably not anticipated by the designers of spanning tree: STP assumes that all links can send traffic in both directions, which is a reasonable assumption for any useful network connection. One-way traffic can cause loops to form in a spanning-tree topology when a blocked spanning-tree port fails to receive BPDUs from the designated bridge for the link because the receive fiber on the blocked port has failed. After spanning-tree timers expire, the blocked port is placed into a forwarding state, generating a loop in the network. UDLD is designed to ensure a Layer 2 connection is bidirectional at all times by detecting unidirectional failures. If a unidirectional failure is detected, the switch places the port into an errDisable state to prevent spanning-tree loops from forming.
Port security: Port security enables administrators to define a list of source MAC addresses permitted on a port. By default, the switch shuts down the port if a security violation occurs (that is, a frame is received that contains an unauthorized source MAC address). Ports disabled due to port security violations are often disabled by design; the security policy of an organization might dictate that such an action is appropriate for a port security violation.
Other issues: Other issues can cause a port to be placed into the errDisable state, including invalid GBICs (i.e., inserted GBICs made by a manufacturer not approved by Cisco), excessive late collisions, duplex mismatches, link flapping, PAgP flapping, and Dynamic Host Configuration Protocol (DHCP) snooping rate limiting.
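To put a rough number on the "after spanning-tree timers expire" step in the UDLD item above, the following sketch adds up the classic 802.1D timers. The default values used here (max age of 20 seconds, forward delay of 15 seconds) are the standard 802.1D defaults and are not stated in this scenario; the function and parameter names are hypothetical.

```python
def seconds_until_forwarding(max_age: int = 20, forward_delay: int = 15) -> int:
    """Upper bound on how long a blocked port that stops hearing BPDUs
    takes to begin forwarding under classic 802.1D timers."""
    # The stored BPDU information ages out after max_age seconds,
    # then the port spends one forward_delay in listening and one in learning.
    return max_age + 2 * forward_delay


print(seconds_until_forwarding())
```

With the default timers, a unidirectional failure can therefore turn into a forwarding loop in under a minute, which is why UDLD disables the port rather than waiting for spanning tree to react.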

Troubleshooting Steps

As with any problem that you might try to solve, you should take clear troubleshooting steps, depending on the issue you are trying to tackle. The following describes each of the troubleshooting steps you should take when trying to determine the cause of the errDisable state of a port:

Step 1.	Determine an issue exists
Step 2.	Determine why the port(s) were disabled
Step 3.	Resolve the issue(s)
Step 4.	Re-enable the port(s)

Each of these troubleshooting steps is now examined in detail.

Step 1: Determining an Issue Exists

When a port is placed into an errDisable state, the visibility of such an event to network operations personnel responsible for maintaining the network is important, so that the issue can be resolved and so that the port can be restored to an operational state. An errDisable event can be detected in several ways:

Port LED changes color from green to orange
Loss of functionality in the network
Notification via network management systems (e.g., SYSLOG messages or SNMP traps)

The first indication of a port being placed in the errDisable state is the physical status of the port as displayed on the switch itself. Cisco Catalyst switches include LEDs for each switch port, with a color of green indicating a port is connected and operating normally and a color of orange indicating the port has been placed into an errDisable state.

The next and most apparent indication of a port being in the errDisable state is that traffic ceases to be forwarded in and out of the port. If the disabled port is a workstation port, it might take a while for the problem to be detected by the end user whose workstation is connected to the port. However, if the port is connected to an important server or is an interswitch link, you generally know about the problem fairly quickly, as it has a major impact on the network.

Apart from its visible impact on the network, an errDisable event is also reported by the switch itself: as soon as a port is placed into an errDisable state, a SYSLOG message can be generated that is displayed on the console and forwarded to a SYSLOG server.

NOTE

A Simple Network Management Protocol (SNMP) trap can also be generated and forwarded to an SNMP management server if SNMP is configured appropriately on the switch.

Depending on the error event that causes a port to be placed in the errDisable state, SYSLOG messages are generated that give you clues to the reason for disabling the port. For example, Example 10-33 and Example 10-34 show what happens on a Cisco IOS switch and CatOS switch respectively when a BPDU is detected on a port that has spanning-tree PortFast enabled and BPDU guard is also enabled.

Example 10-33. SYSLOG Messages Indicating BPDU Guard Has Been Invoked on Cisco IOS

    00:54:17: %SPANTREE-2-BLOCK_BPDUGUARD: Received BPDU on port FastEthernet0/1
        with BPDU Guard enabled. Disabling port.
    00:54:17: %PM-4-ERR_DISABLE: bpduguard error detected on Fa0/1, putting
        Fa0/1 in err-disable state
    00:54:18: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1,
        changed state to down

Example 10-34. SYSLOG Message Indicating BPDU Guard Has Been Invoked on CatOS

    21:00:09 %SPANTREE-2-RX_PORTFAST:Received BPDU on PortFast enable port.
        Disabling 2/1
    21:00:09 %PAGP-5-PORTFROMSTP:Port 2/1 left bridge port 2/1

In Example 10-33, the first message indicates a critical spanning-tree event (as indicated by the "%SPANTREE-2" portion of the message, where 2 is the severity level). The message description indicates that a BPDU has been received on a port configured with PortFast (FastEthernet0/1) and that the port is being disabled as a result. The second message ("%PM-4-ERR_DISABLE") indicates that an errDisable event has occurred, with interface FastEthernet0/1 placed into an errDisable state. In Example 10-34, a similar message sequence occurs.
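The message structure just described (a "%FACILITY-SEVERITY-MNEMONIC" header followed by free text) lends itself to simple scripted parsing. The following is a minimal sketch, assuming each message has been joined onto a single line and that the %PM-4-ERR_DISABLE text matches the wording shown in Example 10-33; the function and class names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Header format: %FACILITY-SEVERITY-MNEMONIC: text
MESSAGE_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)
# Wording of the IOS err-disable message as it appears in Example 10-33.
ERR_DISABLE_RE = re.compile(
    r"(?P<cause>\S+) error detected on \S+, putting\s+(?P<port>\S+) in err-disable state"
)
# Standard syslog severity names, indexed by severity number.
SEVERITY_NAMES = ["emergency", "alert", "critical", "error",
                  "warning", "notification", "informational", "debugging"]


class ErrDisableEvent(NamedTuple):
    cause: str
    port: str


def parse_message(line: str) -> Optional[dict]:
    """Split a log line into facility, severity, mnemonic, and text."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["severity"] = int(info["severity"])
    info["severity_name"] = SEVERITY_NAMES[info["severity"]]
    return info


def parse_err_disable(line: str) -> Optional[ErrDisableEvent]:
    """Return the cause and port if the line is an ERR_DISABLE message."""
    info = parse_message(line)
    if info is None or info["mnemonic"] != "ERR_DISABLE":
        return None
    match = ERR_DISABLE_RE.search(info["text"])
    if match is None:
        return None
    return ErrDisableEvent(match.group("cause"), match.group("port"))
```

Feeding the second message of Example 10-33 to parse_err_disable yields the cause bpduguard and the port Fa0/1, which is the same information an operator reads from the console by eye.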

Example 10-35 and Example 10-36 demonstrate the messages generated when EtherChannel misconfiguration causes ports to be disabled.

Example 10-35. SYSLOG Message Indicating EtherChannel Misconfiguration on Cisco IOS

    01:06:56: %PM-4-ERR_DISABLE: channel-misconfig error detected on Po1,
        putting Fa0/1 in err-disable state
    01:06:56: %PM-4-ERR_DISABLE: channel-misconfig error detected on Po1,
        putting Fa0/2 in err-disable state

Example 10-36. SYSLOG Message Indicating EtherChannel Misconfiguration on CatOS

    21:00:09 %PAGP-5-PORTTOSTP:Port 1/1 joined bridge port 1/1-2
    21:00:09 %PAGP-5-PORTTOSTP:Port 1/2 joined bridge port 1/1-2
    21:00:09 %SPANTREE-2-CHNMISCFG: STP loop - channel 1/1-2 is disabled in vlan 1
    21:00:09 %PAGP-5-PORTFROMSTP:Port 1/1 left bridge port 1/1-2
    21:00:09 %PAGP-5-PORTFROMSTP:Port 1/2 left bridge port 1/1-2

In Example 10-35 and Example 10-36, the local ports are configured with a PAgP mode of on, which means they always form a bundle and don't send PAgP negotiation packets. The remote switch to which the ports are connected has PAgP disabled for the connected ports. This means that an EtherChannel bundle is formed on one side, but not formed on the other side. When an EtherChannel bundle is formed, spanning-tree BPDUs are sent down only one physical link. In Example 10-35 and Example 10-36, the remote switch sends BPDUs out both ports and the local switch receives these BPDUs on both ports of the EtherChannel bundle. Receiving BPDUs on both links tells the local switch that the remote side is not treating the links as a single bundle, so it disables the ports.
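The one-sided bundle described above can be modeled in a few lines. This is a simplified sketch, not switch logic: it assumes the four PAgP mode keywords on, off, auto, and desirable (only on is named in the text above) and the usual negotiation rule that auto only responds while desirable initiates. The function names are hypothetical.

```python
def side_bundles(mine: str, theirs: str) -> bool:
    """Would the side configured with `mine` bundle its links, given the peer's mode?"""
    if mine == "on":
        return True   # bundles unconditionally, sends no PAgP packets
    if mine == "off":
        return False  # never bundles
    # auto and desirable rely on PAgP negotiation; a peer in on or off
    # mode sends no PAgP packets, so negotiation cannot complete.
    if theirs in ("on", "off"):
        return False
    # auto only responds, so at least one side must be desirable.
    return "desirable" in (mine, theirs)


def is_misconfigured(local: str, remote: str) -> bool:
    """True when exactly one side forms a bundle (the case in Examples 10-35 and 10-36)."""
    return side_bundles(local, remote) != side_bundles(remote, local)


print(is_misconfigured("on", "off"))
```

Under this model, on paired with off or with a negotiating mode is flagged as a misconfiguration, whereas auto paired with auto simply forms no bundle on either side.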

Once you have detected that something is wrong in your network and suspect that it is related to a port being disabled by the switch, you can use the show interfaces (Cisco IOS) and show port (CatOS) commands to check the status of a port. Example 10-37 and Example 10-38 demonstrate checking the status of an interface on a Cisco IOS-based switch and a CatOS-based switch.

Example 10-37. Checking Interface Status on Cisco IOS

    Switch# show interfaces fastEthernet0/1
    FastEthernet0/1 is down, line protocol is down (err-disabled)
      Hardware is Fast Ethernet, address is 0009.b7aa.9c81 (bia 0009.b7aa.9c81)
    ... (Output truncated)

Example 10-38. Checking Port Status on CatOS

    Console> (enable) show port 2/1
    Port  Name               Status     Vlan       Level  Duplex Speed Type
    ----- ------------------ ---------- ---------- ------ ------ ----- ------------
    2/1                      errDisable 1          normal   auto  auto 10/100BaseTX
    ... (Output truncated)

In Example 10-37 and Example 10-38, the port status (err-disabled on Cisco IOS and errDisable on CatOS) indicates that the port in each example is in an errDisable state.
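If you collect this output from many switches, the same check can be scripted. The sketch below assumes the output formats shown in Example 10-37 and Example 10-38; the function names are hypothetical, and the CatOS check is deliberately naive (a port name containing the word errDisable would confuse it).

```python
import re

# Status line of `show interfaces`, for example:
# "FastEthernet0/1 is down, line protocol is down (err-disabled)"
IOS_STATUS_RE = re.compile(
    r"^(?P<name>\S+) is (?P<link>.+?), line protocol is (?P<proto>\w+)"
    r"(?: \((?P<detail>[^)]+)\))?"
)


def ios_is_err_disabled(output: str) -> bool:
    """Check the first status line found in `show interfaces` output."""
    for line in output.splitlines():
        match = IOS_STATUS_RE.match(line.strip())
        if match:
            return match.group("detail") == "err-disabled"
    return False


def catos_is_err_disabled(output: str, port: str) -> bool:
    """Check the Status column of the row for `port` in `show port` output."""
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == port:
            return "errDisable" in fields[1:]
    return False
```

Both functions return False when the relevant line is missing, so a caller that needs to distinguish "healthy" from "not found" would have to extend them.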

Step 2: Determining Why a Port is Disabled

Once you have confirmed that a port has been disabled, the next step is to determine why the port was disabled. You could re-enable the port at this point; however, it is more than likely that the problem will manifest itself again. Re-enabling the port might give users a few minutes of access and then another outage, but this can actually portray an image that the network is unstable because it is flapping up and down. A much better approach is to determine the cause of the issue and resolve it before re-enabling the port.

As discussed in Step 1, depending on the event that caused a port to be disabled, messages that indicate the cause of the port being disabled can be displayed by the switch operating system. For example, Example 10-33 and Example 10-34 show the messages that are generated when a port is disabled due to BPDU guard being invoked, while Example 10-35 and Example 10-36 show the messages that are generated when an EtherChannel misconfiguration causes a spanning-tree loop.

Cisco Catalyst switches possess an errDisable recovery mechanism, where a timer can be invoked that automatically re-enables a port that has been shut down due to an errDisable condition. Although this feature is primarily used for recovery purposes, it also allows administrators to determine the exact reason why a port has been disabled. When errDisable recovery is enabled, the switch keeps track of the interfaces currently in an errDisable state and the conditions that caused the errDisable state to be invoked. Administrators can then view this information, allowing them to determine what caused the errDisable status.

To enable the errDisable recovery mechanism on Cisco IOS, the errdisable recovery cause global configuration command is used, which can enable the feature for some or all conditions that can cause an errDisable state. Example 10-39 demonstrates enabling the errDisable recovery mechanism on Cisco IOS.

Example 10-39. Enabling errDisable Recovery on Cisco IOS

    Switch# configure terminal
    Switch(config)# errdisable recovery cause ?
      all                 Enable timer to recover from all causes
      bpduguard           Enable timer to recover from BPDU Guard error disable state
      channel-misconfig   Enable timer to recover from channel misconfig disable state
      dtp-flap            Enable timer to recover from dtp-flap error disable state
      gbic-invalid        Enable timer to recover from invalid GBIC error disable state
      l2ptguard           Enable timer to recover from l2protocol-tunnel error disable state
      link-flap           Enable timer to recover from link-flap error disable state
      loopback            Enable timer to recover from loopback detected disable state
      pagp-flap           Enable timer to recover from pagp-flap error disable state
      psecure-violation   Enable timer to recover from psecure violation disable state
      security-violation  Enable timer to recover from 802.1x violation disable state
      udld                Enable timer to recover from udld error disable state
      vmps                Enable timer to recover from vmps shutdown error disable state
    Switch(config)# errdisable recovery cause all

In Example 10-39, the errdisable recovery cause ? command is executed, which displays all of the individual causes of the errDisable status. The errdisable recovery cause all command is then executed to enable errDisable recovery for any event that causes an errDisable state on a port.

To enable the errDisable recovery mechanism on CatOS, the set errdisable-timeout enable command is used, which can enable the feature for some or all conditions that can cause an errDisable state. Example 10-40 demonstrates enabling the errDisable recovery mechanism on CatOS.

Example 10-40. Enabling errDisable Recovery on CatOS

    Console> (enable) set errdisable-timeout enable ?
      bpdu-guard                 BPDU Port-guard
      channel-misconfig          Channel misconfiguration
      udld                       UDLD
      other                      Reasons other than the above
      all                        Apply errDisable timeout to all reasons
    Console> (enable) set errdisable-timeout enable all
    Successfully enabled errdisable-timeout for all.

After enabling errDisable recovery, if an errDisable event occurs, you can use the show errdisable recovery (Cisco IOS) and show errdisable-timeout (CatOS) commands to determine exactly what caused an errDisable event. Example 10-41 and Example 10-42 demonstrate the output of these commands on a Cisco IOS switch and CatOS switch respectively.

Example 10-41. Determining the Reason for errDisable Events on Cisco IOS

    Switch# show errdisable recovery
    ErrDisable Reason    Timer Status
    -----------------    --------------
    udld                 Enabled
    bpduguard            Enabled
    security-violatio    Enabled
    channel-misconfig    Enabled
    vmps                 Enabled
    pagp-flap            Enabled
    dtp-flap             Enabled
    link-flap            Enabled
    gbic-invalid         Enabled
    l2ptguard            Enabled
    psecure-violation    Enabled
    loopback             Enabled
    Timer interval: 300 seconds
    Interfaces that will be enabled at the next timeout:
    Interface    Errdisable reason    Time left(sec)
    ---------    -----------------    --------------
    Fa0/1              bpduguard             281

Example 10-42. Determining the Reason for errDisable Events on CatOS

    Console> (enable) show errdisable-timeout
    ErrDisable Reason      Timeout Status
    ---------------------- --------------
    bpdu-guard             enable
    channel-misconfig      enable
    udld                   enable
    other                  enable
    Interval: 300 seconds
    Port  ErrDisable Reason    Port ErrDisableTimeout  Action on Timeout
    ----  -------------------  ----------------------  -----------------
     2/1  bpdu-guard           Enable                  Enabled

In Example 10-41 and Example 10-42, the bottom line of each output indicates the current ports that are in the errDisable state and the reason why the port is in such a state. As you can see from both examples, a port on each switch has been disabled due to the BPDU guard feature being invoked.
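The bottom section of the Cisco IOS output can also be extracted programmatically. The following is a minimal sketch that assumes the layout shown in Example 10-41 (a "Timer interval:" line followed by a three-column table of pending interfaces); the function name is hypothetical and other software releases may format the output differently.

```python
from typing import List, Optional, Tuple

PendingPort = Tuple[str, str, int]  # (interface, reason, seconds left)


def parse_recovery_output(output: str) -> Tuple[Optional[int], List[PendingPort]]:
    """Return the timer interval and the interfaces awaiting automatic recovery."""
    interval = None
    pending = []
    in_table = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Timer interval:"):
            interval = int(line.split()[2])
        elif line.startswith("Interfaces that will be enabled"):
            in_table = True
        elif in_table:
            fields = line.split()
            # Data rows have exactly three fields ending in a number;
            # this skips the column header and the dashed separator.
            if len(fields) == 3 and fields[2].isdigit():
                pending.append((fields[0], fields[1], int(fields[2])))
    return interval, pending
```

Applied to the output in Example 10-41, this returns an interval of 300 seconds and a single pending entry for Fa0/1 with the reason bpduguard.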

Step 3: Resolving the Issue

Once you know the issues that are responsible for ports being disabled, you should take the necessary steps to resolve the issue(s). Obviously, the course of action taken depends on the issue. The following lists how you should approach common issues that disable ports:

BPDU Guard: Check your spanning-tree PortFast configurations and either connect devices that are generating BPDUs to ports that do not have PortFast enabled or disable PortFast on ports that have PortFast incorrectly enabled.
EtherChannel misconfiguration: Ensure that the various parameters configured for each port of a bundle on both sides of the bundle are identical. This includes speed/duplex settings, port VLAN membership, trunk settings (if the ports are trunks), and compatible PAgP modes.
UDLD: This normally indicates a cable fault or possibly a faulty transceiver on one side of the link. Verify the physical cabling, and if this is okay, try replacing any fiber-based transceivers or active equipment in between the switch ports.
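For the EtherChannel case above, comparing port parameters by hand becomes tedious as bundles grow. The sketch below shows one way to flag parameters that differ between the ports of a bundle. The dictionary keys (speed, duplex, mode, vlan, native_vlan) are hypothetical names chosen for illustration and do not correspond to any switch output format.

```python
from typing import Dict, List

# Parameters the text says must be identical on every port of a bundle.
CHECKED_KEYS = ("speed", "duplex", "mode", "vlan", "native_vlan")


def find_mismatches(ports: Dict[str, Dict[str, object]]) -> List[str]:
    """Return the parameter names (in CHECKED_KEYS order) whose values differ."""
    mismatched = []
    for key in CHECKED_KEYS:
        # A key missing from every port compares equal (all None).
        values = {config.get(key) for config in ports.values()}
        if len(values) > 1:
            mismatched.append(key)
    return mismatched


bundle = {
    "Fa0/1": {"speed": 100, "duplex": "full", "mode": "trunk", "native_vlan": 1},
    "Fa0/2": {"speed": 100, "duplex": "half", "mode": "trunk", "native_vlan": 1},
}
print(find_mismatches(bundle))
```

In this sample only the duplex setting differs, so that is the parameter to correct before re-enabling the ports. The same comparison would need to be run across the ports on the remote switch as well.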

Step 4: Re-enabling Disabled Ports

Once you are confident that you have resolved the issue that caused ports to be disabled by the switch, you can re-enable the disabled port(s) using the shutdown and no shutdown interface configuration commands on Cisco IOS and the set port enable command on CatOS.

Example 10-43 and Example 10-44 demonstrate manually re-enabling a port that has been placed into an errDisable state on Cisco IOS and CatOS respectively.

Example 10-43. Manually Re-enabling errDisabled Ports on Cisco IOS

    Switch# configure terminal
    Switch(config)# interface fastEthernet0/1
    Switch(config-if)# shutdown
    Switch(config-if)# no shutdown
    01:22:06: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up
    01:22:07: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1,
        changed state to up

Example 10-44. Manually Re-enabling errDisabled Ports on CatOS

    Console> (enable) set port enable 2/1
    Port 2/1 enabled.
    21:24:17 %ETHC-5-PORTTOSTP:Port 2/1 joined bridge port 2/1

In Example 10-43, notice that to re-enable an errDisabled port, you must first execute the shutdown command and then execute the no shutdown command to bring the interface up. This is somewhat different to the normal process of enabling an interface on Cisco IOS, where you need to issue only the no shutdown command.
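When several Cisco IOS ports need to be recovered at once, the shutdown and no shutdown sequence can be generated rather than typed. The following sketch only builds the command text from Example 10-43 for a list of interfaces; the function name is hypothetical, the closing end command is an addition not shown in the example, and how the commands are delivered to the switch is left out entirely.

```python
from typing import Iterable, List


def bounce_commands(interfaces: Iterable[str]) -> List[str]:
    """Build the IOS command sequence that re-enables errDisabled interfaces."""
    commands = ["configure terminal"]
    for name in interfaces:
        # An errDisabled port must be shut down before `no shutdown` takes effect.
        commands += [f"interface {name}", "shutdown", "no shutdown"]
    commands.append("end")
    return commands


for line in bounce_commands(["fastEthernet0/1", "fastEthernet0/2"]):
    print(line)
```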

The process demonstrated in both examples above is totally manual and obviously incurs some administrative overhead. As discussed earlier, Cisco IOS and CatOS include an automated errDisable recovery feature, which enables the switch operating system to re-enable errDisabled ports after a configurable timeout value. You learned how to configure errDisable recovery on Cisco IOS in Example 10-39; however, you did not learn how to configure the errDisable timer, which is a global timer that determines how long the switch should wait before re-enabling a port. On Cisco IOS, this timer is configured using the errdisable recovery interval global configuration command and has a value of 300 seconds by default. On Cisco IOS, you can also enable/disable errDisable detection, which means if Cisco IOS is incorrectly detecting a condition that causes a port to be placed in an errDisable status, you can disable errDisable detection for the feature that is at fault. Example 10-45 demonstrates configuring errDisable detection, enabling errDisable recovery, and configuring an interval of 30 seconds on Cisco IOS.

Example 10-45. Configuring errDisable Detection, Recovery, and Timeouts on Cisco IOS

    Switch# configure terminal
    Switch(config)# errdisable detect cause ?
      all           Enable error detection on all cases
      dtp-flap      Enable error detection on dtp-flapping
      gbic-invalid  Enable error detection on gbic-invalid
      l2ptguard     Enable error detection on l2protocol-tunnel
      link-flap     Enable error detection on linkstate-flapping
      loopback      Enable error detection on loopback
      pagp-flap     Enable error detection on pagp-flapping
      vmps          Enable error detection on vmps
    Switch(config)# no errdisable detect cause gbic-invalid
    Switch(config)# errdisable recovery cause all
    Switch(config)# errdisable recovery interval 30
    Switch(config)# exit
    Switch# show errdisable detect
    ErrDisable Reason    Detection status
    -----------------    ----------------
    pagp-flap            Enabled
    dtp-flap             Enabled
    link-flap            Enabled
    l2ptguard            Enabled
    gbic-invalid         Disabled
    loopback             Enabled
    Switch# show errdisable recovery
    ErrDisable Reason    Timer Status
    -----------------    --------------
    udld                 Enabled
    bpduguard            Enabled
    security-violatio    Enabled
    channel-misconfig    Enabled
    vmps                 Enabled
    pagp-flap            Enabled
    dtp-flap             Enabled
    link-flap            Enabled
    gbic-invalid         Enabled
    l2ptguard            Enabled
    psecure-violation    Enabled
    loopback             Enabled
    Timer interval: 30 seconds
    Interfaces that will be enabled at the next timeout:

In Example 10-45, the switch is configured to not place a port into an errDisable state due to an invalid GBIC being inserted. Next, errDisable recovery is enabled for all causes and the errDisable recovery timer is configured as 30 seconds. The show errdisable detect command is used to verify the errDisable detection configuration, which confirms that invalid GBIC detection is disabled. The show errdisable recovery command is then used to verify the new recovery timer, which is 30 seconds as indicated by the Timer interval line of the output.
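The recovery portion of Example 10-45 can be templated so that cause names are checked before anything is sent to a switch. This is a sketch under stated assumptions: the set of valid causes is copied from the help output in Example 10-39 and may differ on other software releases, and the 30 to 86400 second range for the interval is an assumption that does not come from this scenario. The function name is hypothetical.

```python
from typing import List, Sequence

# Cause keywords as listed by `errdisable recovery cause ?` in Example 10-39.
IOS_RECOVERY_CAUSES = {
    "all", "bpduguard", "channel-misconfig", "dtp-flap", "gbic-invalid",
    "l2ptguard", "link-flap", "loopback", "pagp-flap", "psecure-violation",
    "security-violation", "udld", "vmps",
}


def recovery_config(causes: Sequence[str], interval: int = 300) -> List[str]:
    """Build global configuration commands for errDisable recovery."""
    unknown = sorted(set(causes) - IOS_RECOVERY_CAUSES)
    if unknown:
        raise ValueError(f"unknown errDisable cause(s): {', '.join(unknown)}")
    # Assumed valid range; confirm against the command reference for your release.
    if not 30 <= interval <= 86400:
        raise ValueError("interval must be between 30 and 86400 seconds")
    lines = [f"errdisable recovery cause {cause}" for cause in causes]
    lines.append(f"errdisable recovery interval {interval}")
    return lines


for line in recovery_config(["bpduguard", "udld"], interval=30):
    print(line)
```

Enabling recovery per cause, rather than with all, lets you keep manual control over conditions such as port security violations, where an automatic re-enable may conflict with the security policy.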

On CatOS, a similar recovery timer exists, which determines the amount of time the switch should wait before re-enabling an errDisabled port. Unlike Cisco IOS, you cannot enable/disable detection of specific events that can cause an errDisable event; all supported events are always enabled. You learned how to enable errDisable recovery on CatOS in Example 10-40 using the set errdisable-timeout enable command; however, to modify the recovery timer from the default setting of 300 seconds, you use the set errdisable-timeout interval command as demonstrated in Example 10-46.

Example 10-46. Configuring errDisable Recovery and Timeouts on CatOS

    Console> (enable) set errdisable-timeout enable all
    Successfully enabled errdisable-timeout for all.
    Console> (enable) set errdisable-timeout interval 30
    Successfully set errdisable timeout to 30 seconds.
    Console> (enable) show errdisable-timeout
    ErrDisable Reason      Timeout Status
    ---------------------- --------------
    bpdu-guard             enable
    channel-misconfig      enable
    udld                   enable
    other                  enable
    Interval: 30 seconds
    Port  ErrDisable Reason    Port ErrDisableTimeout  Action on Timeout
    ----  -------------------  ----------------------  -----------------

In Example 10-46, the errDisable recovery feature is first enabled for all events, and then the errDisable recovery timer is configured to automatically re-enable ports after 30 seconds. The show errdisable-timeout command is then used to verify that the timer has been configured correctly.