Spanning tree is designed to ensure a loop-free forwarding topology is generated in a multi-switch LAN. Due to the nature of transparent bridging, if any loops are in the active Layer 2 topology, frames such as broadcast frames, multicast frames, and unknown unicast frames will continuously circle the looped network.
Transparent bridging defines no mechanism such as the TTL (time-to-live) field used in IP packets to prevent frames from continuously circling a looped network. This fact causes a snowball effect, with the number of broadcast, multicast, and unknown unicast frames looping the network increasing. Because broadcast frames must also be processed by the CPU of every device receiving the frame, CPU usage on every device increases as more and more frames loop the network. Eventually (normally within a matter of seconds), the entire network goes into a meltdown. CPU time and memory on each switch are consumed just processing each broadcast frame, and the available bandwidth on each link for valid traffic becomes less and less. As you can see, a looped Layer 2 topology is a catastrophe for any network, and you definitely need to prevent loops in the topology, while still providing redundant paths.
In this section, you are introduced to the concepts of spanning tree and how it can generate a loop-free topology that dynamically reconverges to a new loop-free topology in the event of failures.
Spanning Tree Operation
A looped Layer 2 topology causes serious issues. Simply blocking a port from sending and receiving data can prevent a looped topology. Spanning tree is the protocol responsible for determining a loop-free topology and blocking the appropriate ports as required. To create a loop-free topology, spanning tree forms a tree structure that is generated from a root node or root bridge.
On Cisco Catalyst switches, by default, a separate, loop-free topology is created for each VLAN. A root bridge is selected for each VLAN, which means that while a physical port might be blocked for one VLAN, the same physical port might be forwarding for another VLAN. In spanning tree, each physical port contains a logical port per VLAN, which allows a logical port for one VLAN to be blocked while another logical port for another VLAN can be forwarding. Thus, spanning tree actually calculates a loop-free logical topology for each VLAN.
The root bridge is the heart of the spanning-tree topology, and is used as a reference point to generate a loop-free topology. Once the root bridge is selected, each bridge determines the best path to reach the root bridge and blocks any other paths that introduce loops. In a converged spanning-tree topology, a port can be either in a forwarding state or in a blocking state. Only ports that are considered the best path to the root bridge are placed into a forwarding state; all other ports are placed into a blocking state.
When spanning tree first initializes, each switch generates a unique bridge ID per VLAN, which is used by spanning tree to uniquely identify the switch. The bridge ID consists of the bridge MAC address plus a 2-byte field called bridge priority, which can be altered to directly affect whether or not a bridge becomes the root bridge. The bridge priority can be configured as any value between 0 and 65535 and defaults to 32768 on Cisco switches. Figure 4-1 shows the structure of the bridge ID.
Figure 4-1. Bridge ID
All Cisco Catalyst switches are assigned a set of MAC addresses that can be used for spanning tree and other purposes.
The bridge ID is used to select the root bridge; the bridge with the lowest bridge ID always becomes the root bridge. An example of a bridge ID is 32768.000d.7903.0c00. The first portion (32768) is the bridge priority, represented in decimal, while the remaining portion of the bridge ID (000d.7903.0c00) is the hexadecimal representation of the bridge MAC address. Once the bridge ID has been determined, each bridge starts out by assuming that it is the root bridge and begins to generate configuration bridge protocol data units (BPDUs). Configuration BPDUs are the main communication mechanism for spanning tree and are used to determine the root bridge as well as whether or not a port should be forwarding or blocking. A configuration BPDU has various fields that are used to indicate parameters that are important to the generation of the final spanning-tree topology. Table 4-1 describes the important fields that are present in each configuration BPDU.
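The comparison of bridge IDs can be illustrated with a short Python sketch. It is not taken from any Cisco implementation; the BridgeID class and the two extra bridge IDs are hypothetical, and only 32768.000d.7903.0c00 comes from the text. The sketch assumes the bridge priority is compared first and the MAC address second, so the lowest priority wins and the MAC address breaks ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BridgeID:
    """Hypothetical model of a bridge ID: priority first, then MAC address."""
    priority: int  # 2-byte bridge priority (0 to 65535)
    mac: int       # 48-bit bridge MAC address held as an integer

    @classmethod
    def parse(cls, text: str) -> "BridgeID":
        # Accepts the dotted form used in the text: 32768.000d.7903.0c00
        priority, mac_hex = text.split(".", 1)
        return cls(int(priority), int(mac_hex.replace(".", ""), 16))

    def __str__(self) -> str:
        h = f"{self.mac:012x}"
        return f"{self.priority}.{h[0:4]}.{h[4:8]}.{h[8:12]}"


bridges = [
    BridgeID.parse("32768.000d.7903.0c00"),  # example from the text
    BridgeID.parse("32768.0009.b7aa.9c80"),  # hypothetical: same priority, lower MAC
    BridgeID.parse("8192.00d0.0512.3400"),   # hypothetical: priority lowered to 8192
]

# order=True compares (priority, mac) field by field, so min() yields
# the bridge that would become the root bridge.
root = min(bridges)
print(f"Root bridge: {root}")
```

Because the priority is compared before the MAC address, lowering the priority on one switch is enough to make it the root bridge regardless of its MAC address.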
With regard to selecting the root bridge, the important field in Table 4-1 is the root bridge ID. If a bridge receives a configuration BPDU that lists a lower root bridge ID than the one the bridge currently considers to be the root, the bridge immediately accepts the lower root bridge ID as the root bridge and begins propagating configuration BPDUs received from this root bridge. Eventually, in a Layer 2 network with multiple bridges, the bridge with the lowest bridge ID becomes known as the root bridge to all bridges. At this point, the root bridge has been selected, and each non-root bridge now begins the process of generating a loop-free topology. Figure 4-2 demonstrates the selection of a root bridge.
Figure 4-2. Selecting the Root Bridge
In Figure 4-2, Bridge A is selected as the root bridge, because it has the lowest bridge ID. Once the root bridge has been selected, non-root bridges no longer originate configuration BPDUs on their own. Each non-root bridge generates configuration BPDUs only when a configuration BPDU originated by the root bridge is received. The non-root bridge updates certain fields in the configuration BPDU (such as root path cost and sender bridge ID) and then propagates the updated configuration BPDU out all ports, except the port upon which the BPDU was received. This process ensures that configuration BPDUs are propagated throughout the entire network to all switches.
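The election process can be approximated in a few lines of Python. This is a simplified sketch rather than the real protocol. It ignores timers and BPDU formats and simply lets neighboring bridges exchange their current view of the root until nothing changes. The bridge names, bridge IDs, and links are hypothetical and are not the values shown in Figure 4-2.

```python
def elect_root(bridge_ids, links):
    """Return each bridge's final view of the root bridge ID.

    bridge_ids: dict of bridge name -> (priority, mac) tuple
    links:      list of (name, name) pairs that are directly connected
    """
    # Every bridge starts out assuming that it is the root bridge.
    believed_root = dict(bridge_ids)

    changed = True
    while changed:
        changed = False
        for a, b in links:
            # Both ends advertise their current root; the lower ID wins.
            best = min(believed_root[a], believed_root[b])
            for name in (a, b):
                if best < believed_root[name]:
                    believed_root[name] = best
                    changed = True
    return believed_root


# Hypothetical bridge IDs: same default priority, different MAC addresses.
bridge_ids = {
    "A": (32768, 0x000100000001),
    "B": (32768, 0x000100000002),
    "C": (32768, 0x000100000003),
}
links = [("A", "B"), ("B", "C"), ("A", "C")]

for name, root in sorted(elect_root(bridge_ids, links).items()):
    print(f"Bridge {name} sees root {root[0]}.{root[1]:012x}")
```

With these values every bridge ends up agreeing that Bridge A, which has the lowest bridge ID, is the root.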
Once the root bridge has been selected, each non-root bridge attempts to build a topology that forms the lowest-cost path to the root bridge. To accommodate this requirement, spanning tree uses the concept of cost, which is a measure of how preferable a link or logical port is in comparison to other links. The lower the cost, the more preferable the link. For example, a 10-Mbps port is considered less preferable than a 100-Mbps port and, thus, has a higher cost to indicate this. Each logical port has a default cost associated with it, which is defined in the 802.1d standard and depends on the bandwidth of the link. The cost for a logical port can be modified to influence root port selection. Table 4-2 shows the 802.1d default costs for various bandwidths associated with a link.
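To make the idea of root path cost concrete, the following Python sketch adds up port costs along two candidate paths to the root bridge. The cost values are the commonly cited revised 802.1d defaults and are included here as an assumption; treat Table 4-2 as the authoritative list. The path names and link speeds are hypothetical.

```python
# Assumed default port costs (link speed in Mbps -> cost), as commonly
# cited for the revised 802.1d standard. Defer to Table 4-2 if they differ.
DEFAULT_PORT_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}


def root_path_cost(link_speeds_mbps):
    """Sum the default cost of every link on a path toward the root bridge."""
    return sum(DEFAULT_PORT_COST[speed] for speed in link_speeds_mbps)


# Hypothetical candidate paths from a non-root bridge to the root bridge.
paths = {
    "two hops (1 Gbps, then 100 Mbps)": [1000, 100],
    "one hop (10 Mbps)": [10],
}

for name, speeds in paths.items():
    print(f"{name}: cost {root_path_cost(speeds)}")

best = min(paths, key=lambda name: root_path_cost(paths[name]))
print(f"Preferred path: {best}")
```

The two-hop path is preferred even though it crosses more bridges, because spanning tree compares accumulated cost rather than hop count.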
Generating a Loop-Free Topology
It is important to understand that every logical port within a spanning-tree instance transitions through several states upon port initialization. Table 4-3 summarizes each of the STP states a port can be in.
As you can see, user data is only forwarded when a port is in the forwarding state. Spanning tree takes this very cautious approach to prevent any loops from forming even for a short time, because a broadcast storm can bring down a network in seconds. Figure 4-3 illustrates how a port transitions through each of the various states to reach either a forwarding state or a blocking state.
Figure 4-3. STP State Transition
In Figure 4-3, you can see the various events that cause a transition in port state. Notice that a port in the Disabled state only ever transitions to a Blocking state (unless Cisco PortFast is configured), which ensures a loop cannot be created before the network topology is learned. Each of the phases listed in Table 4-3 and Figure 4-3 is now described.
A port is disabled when the Layer 2 protocol is down on the port, whether it be because the port has been administratively shut down, because it is not connected, or because of some issue with processing BPDUs. A port transitions from the Disabled state to the Blocking state and then immediately to a Listening state after it is initialized at the Layer 2 level.
The Listening state is the phase where most of the important legwork of generating a loop-free topology is performed. To generate a loop-free topology, spanning tree goes through the following processes:
When any of the decisions just listed for spanning-tree topology calculation are made, all those decisions are based upon the configuration BPDUs that are received by each bridge. No matter what the decision, whether it is selecting the root bridge or a root port, the same selection process is used for all decisions. This selection process is known as the Spanning-Tree Algorithm (STA) and is described in Table 4-4.
Each of the criteria in Table 4-4 is processed one by one, by comparing the configuration BPDUs received on a port with the configuration BPDUs that are sent out a port, until a decision can be made. If parameters are equal, the next criterion is processed until a decision can be made. Referring back to Table 4-1, you can see that each of the selection criteria is a field in configuration BPDUs.
For example, consider the process of selecting the root bridge. If you take the STA and apply it to this process, you can see that the lowest root bridge ID becomes the root bridge. When it comes to selecting the root port, because a root bridge has been selected, the root bridge ID on all configuration BPDUs is the same, so this criterion cannot be used to make a selection. This fact means that the next criterion is evaluated (select the lowest root path cost). Again, if this value is the same on the configuration BPDUs being compared, the next criterion is evaluated, which is to select the lowest sender bridge ID.
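The step-by-step comparison described above maps naturally onto tuple comparison. The Python sketch below is illustrative only. The Bpdu type and the sample values are hypothetical, and the final tie-breaker on sender port ID is an assumption based on common descriptions of the algorithm, because the text only names the first three criteria. The root path cost is assumed to already include the cost of the receiving port.

```python
from typing import NamedTuple, Tuple


class Bpdu(NamedTuple):
    """Hypothetical subset of configuration BPDU fields, in comparison order."""
    root_bridge_id: Tuple[int, int]    # (priority, mac)
    root_path_cost: int                # assumed to include the receiving port cost
    sender_bridge_id: Tuple[int, int]  # (priority, mac)
    sender_port_id: int                # assumed final tie-breaker


ROOT = (32768, 0x000100000001)  # hypothetical root bridge ID

# BPDUs received on two different ports of the same non-root bridge.
received = {
    "port 1/1": Bpdu(ROOT, 23, (32768, 0x000100000005), 1),
    "port 1/2": Bpdu(ROOT, 23, (32768, 0x000100000003), 2),
}

# Named tuples compare field by field, so a later field is consulted only
# when all earlier fields are equal, just as the criteria are processed
# one by one in the text.
root_port = min(received, key=lambda port: received[port])
print(f"Root port: {root_port}")
```

Here the root bridge ID and the root path cost are identical on both ports, so the decision falls through to the lowest sender bridge ID and port 1/2 is chosen.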
During the Learning phase, the spanning-tree topology has normally been determined, and the switch is accepting user data. However, it is not forwarding it. The purpose of this phase is to populate the local bridging table on each switch, so that once traffic is actually forwarded, the switch does not need to flood a lot of traffic. Because the bridging table has been populated to a certain extent, the number of unknown unicast destination MAC addresses is reduced, reducing the amount of flooding in the network.
After the Learning phase, if a port has been selected as either a root port or designated port, it is placed into the Forwarding state, which means that the port forwards user data. A port remains in the forwarding state until a topology change occurs where the path to the root bridge is affected or the root bridge itself fails. If this change occurs, the port transitions to the Listening phase and performs the appropriate selection processes.
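The progression through the port states can be summarized as a small timeline. The following Python sketch is a simplification of the behavior described above. The function name is hypothetical, the forward delay of 15 seconds is the commonly used default and is stated here as an assumption, and a port that is not selected as a root or designated port is modeled as returning to Blocking at the end of the Listening phase.

```python
from enum import Enum


class PortState(Enum):
    DISABLED = "disabled"
    BLOCKING = "blocking"
    LISTENING = "listening"
    LEARNING = "learning"
    FORWARDING = "forwarding"


def bring_up_port(is_root_or_designated, forward_delay=15):
    """Return (seconds elapsed, state) pairs for a port that is initializing."""
    # Disabled -> Blocking -> Listening happens as soon as Layer 2 comes up.
    timeline = [
        (0, PortState.DISABLED),
        (0, PortState.BLOCKING),
        (0, PortState.LISTENING),
    ]
    if not is_root_or_designated:
        # The port lost the selection process, so it stays out of the
        # active topology.
        timeline.append((forward_delay, PortState.BLOCKING))
        return timeline
    # Listening and Learning each last for one forward delay interval.
    timeline.append((forward_delay, PortState.LEARNING))
    timeline.append((2 * forward_delay, PortState.FORWARDING))
    return timeline


for seconds, state in bring_up_port(is_root_or_designated=True):
    print(f"t={seconds:>2}s  {state.value}")
```

Under these assumptions a root or designated port needs two forward delay intervals before it forwards user data.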
If a topology change occurs downstream from a switch (i.e., in an area of the network that is further away from the root bridge), no changes on the local switch should occur, unless superior configuration BPDUs are received from a downstream switch.
Spanning-tree timers are important because they determine how quickly or slowly a spanning-tree topology can react to a link or bridge failure and converge to a new topology. As indicated in Table 4-1, there are three spanning-tree timers:
A failure in the spanning-tree topology can be detected either directly (if a physical interface goes down, the logical ports on that interface go down as well) or indirectly (a failure occurs that does not bring down a physical link, which can happen if an active device sits between two bridges). Detecting an indirect failure is simply a matter of not receiving configuration BPDUs for the Max Age timer interval.
It is important to ensure that the spanning-tree timers implemented are consistent throughout the spanning-tree topology. To ensure this, the root bridge configures the spanning-tree timers and attaches these to each configuration BPDU generated (see Table 4-1). Each non-root bridge inherits the spanning-tree timers in the configuration BPDUs, overriding any local configuration and ensuring the spanning-tree timers are consistent for the entire topology.
If a failure occurs in the spanning-tree topology, the various STP timers control how quickly the spanning-tree topology can converge. The following describes how to calculate the convergence time for different types of failures:
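As a rough illustration of the arithmetic, the Python sketch below estimates convergence time for a direct and an indirect failure. It assumes the commonly cited defaults of 20 seconds for Max Age and 15 seconds for Forward Delay, and it assumes that a port must spend one Forward Delay interval in each of the Listening and Learning states before forwarding. The function name is hypothetical.

```python
def convergence_seconds(failure, max_age=20, forward_delay=15):
    """Estimate STP convergence time in seconds for a given failure type."""
    # A port moving to Forwarding spends one forward delay in Listening
    # and another in Learning.
    listening_and_learning = 2 * forward_delay
    if failure == "direct":
        # The link going down is noticed immediately.
        return listening_and_learning
    if failure == "indirect":
        # The failure is noticed only after BPDUs have been missing for
        # the whole Max Age interval.
        return max_age + listening_and_learning
    raise ValueError(f"unknown failure type: {failure!r}")


for kind in ("direct", "indirect"):
    print(f"{kind} failure: about {convergence_seconds(kind)} seconds")
```

With the assumed defaults this gives roughly 30 seconds for a direct failure and 50 seconds for an indirect failure.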
You can optimize spanning-tree timers to reduce the default convergence times, depending on your spanning-tree topology. Spanning-tree timers are dependent upon the network diameter of the Layer 2 network, which is defined as the maximum number of bridge hops between any two devices. The timers also depend on the value of the Hello timer, which can be reduced so that topology changes are detected faster than with the standard Hello timer value. Each timer is calculated so as to ensure that configuration BPDUs can be propagated throughout the network fully before decisions are made about forwarding or blocking ports. Clearly, if there are more bridge hops for a configuration BPDU to travel, the time required for propagation of BPDUs throughout the entire network is higher.
The default spanning-tree timers are designed to accommodate a spanning-tree topology that has a network diameter of seven. For some topologies, the network diameter might be lower than this; in these cases, the spanning-tree timers can safely be reduced. The 802.1d specification includes the correct formula for calculating spanning-tree timers based upon the Hello timer used and the network diameter. Cisco Catalyst switches provide tools that calculate the correct spanning-tree timers based upon network diameter and Hello timer interval.
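The relationship between network diameter, the Hello timer, and the other timers can be sketched in Python. The formulas below are the simplified versions commonly published in Cisco's timer tuning documentation and are included as an assumption; consult the 802.1d specification for the authoritative derivation. The function name is hypothetical, and rounding the Forward Delay up to a whole second is a choice made for this sketch.

```python
import math


def stp_timers(diameter, hello=2):
    """Derive Max Age and Forward Delay (seconds) from diameter and Hello."""
    # Assumed simplified formulas:
    #   max_age       = 4 * hello + 2 * diameter - 2
    #   forward_delay = (4 * hello + 3 * diameter) / 2
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)
    return {"hello": hello, "max_age": max_age, "forward_delay": forward_delay}


for diameter in (7, 5):
    print(f"diameter {diameter}: {stp_timers(diameter)}")
```

For a diameter of seven and a Hello timer of 2 seconds, the sketch reproduces the familiar defaults of 20 seconds for Max Age and 15 seconds for Forward Delay, while a smaller diameter yields shorter timers.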
You can also use Cisco proprietary enhancements to STP to reduce convergence for common situations. These enhancements are discussed later in the chapter.
Recent Spanning Tree Developments
The IEEE has been busy at work recently and has released new specifications relating to spanning tree. Two important specifications are now supported by certain Cisco Catalyst switches:
Each of these new protocols is now discussed.
The most significant development for spanning tree in recent times is the 802.1w specification, which is also known as Rapid Spanning Tree Protocol (RSTP). RSTP is intended to replace the 802.1d standard and redefines the states that switch ports can be in, as well as how switches detect failure and the associated convergence time. With the advent of Layer 3 switching and the use of multilayer design to reduce the convergence times for modern switched networks, a primary goal of RSTP is to reduce convergence times to at least similar levels. RSTP achieves this and also includes standards-based implementations of PortFast, UplinkFast, and BackboneFast.
RSTP is supported from CatOS 7.1 and native IOS 12.1(11)EX on the Catalyst 6000/6500 platform. RSTP support is present from CatOS 7.2 on the Catalyst 4000, and at the time of this writing, it is not supported on the Cisco IOS-based Catalyst 4000 with Supervisor 3. It is supported from Cisco IOS 12.1(9)EA on the Cisco 2950 and 3550 platforms.
The other important specification relating to spanning tree is the 802.1s specification, which is also known as Multiple Spanning Tree (MST). MST relates to how spanning tree interoperates with topologies that include multiple VLANs. On Cisco Catalyst switches, you can define the mode of spanning tree operation, which determines how the switch maintains STP for multiple VLANs. The following lists the common STP modes of operation:
Before MST was released, Cisco developed a proprietary protocol called Multi-Instance Spanning Tree Protocol (MISTP), which is based on principles similar to those of MST.
Figure 4-4 demonstrates a simple STP topology that includes 1000 VLANs and shows how load sharing is achieved for each of the technologies just discussed.
Figure 4-4. STP Load Sharing and 802.1Q, PVST+, and MST
In Figure 4-4, the STP instances for each spanning tree mode are shown. In CST (802.1Q) mode, a single STP instance exists for all VLANs, and only one active STP topology exists. This arrangement means that Switch-C can have only one active uplink.
In PVST+ mode, a single STP instance exists for each VLAN, which means that 1000 STP instances exist in total. This arrangement allows for load sharing to be implemented by configuring 500 STP instances to use one uplink on Switch-C as the active forwarding path and the remaining 500 STP instances to use the other uplink on Switch-C. Although STP load sharing is now possible with PVST+, it comes at the expense of significant CPU load on every switch in the network because 1000 STP instances need to be maintained.
Finally, with MST (802.1s) mode, only two STP instances are required, because you need only two separate STP topologies to implement load sharing. The first STP instance is used for VLANs 1-500, and the second STP instance is used for VLANs 501-1000. MST achieves the same load sharing results as PVST+ (traffic for 500 VLANs is forwarded over each uplink on Switch-C), but requires only two STP instances, which significantly reduces CPU load on all switches throughout the network.
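The difference in the number of STP instances can be expressed as a mapping from VLAN to instance. The Python sketch below is purely illustrative. The mode names and instance numbering are hypothetical labels, while the VLAN ranges follow the example in the text.

```python
from collections import Counter

VLANS = range(1, 1001)  # the 1000 VLANs from the example


def instance_map(mode):
    """Map each VLAN to the STP instance that carries it in the given mode."""
    if mode == "cst":
        return {vlan: 0 for vlan in VLANS}      # one instance for all VLANs
    if mode == "pvst+":
        return {vlan: vlan for vlan in VLANS}   # one instance per VLAN
    if mode == "mst":
        # Instance 1 carries VLANs 1-500, instance 2 carries VLANs 501-1000.
        return {vlan: 1 if vlan <= 500 else 2 for vlan in VLANS}
    raise ValueError(f"unknown mode: {mode!r}")


def instance_count(mode):
    return len(set(instance_map(mode).values()))


for mode in ("cst", "pvst+", "mst"):
    print(f"{mode}: {instance_count(mode)} STP instance(s)")

# VLANs per MST instance, which corresponds to VLANs per uplink on Switch-C.
print(Counter(instance_map("mst").values()))
```

The sketch shows the same 500/500 split of VLANs that PVST+ can achieve, but with two instances to maintain instead of 1000.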
MST is supported from CatOS 7.1 and native IOS 12.1(11)EX on the Catalyst 6000/6500 platform. MST support is present from CatOS 7.1 on the Catalyst 4000, and at the time of writing, it is not supported on the Cisco IOS-based Catalyst 4000 with Supervisor 3. MST is supported from Cisco IOS 12.1(9)EA on the Cisco 2950 and 3550 platforms.
At present, configuring RSTP requires you to use the MST mode of operation; however, in future releases you will be able to use RSTP independently of MST (e.g., have a single RSTP instance per VLAN, similar to PVST+).