12.4 Implementation | Linux Network Architecture

The implementation of the bridge functionality discussed here is relatively new. It has been part of the Linux kernel since Versions 2.2.14 and 2.3.x and replaces the former implementation, which was less flexible in many ways. The new version adds several functions (e.g., the capability of managing several bridges in one system, and better options to configure the bridge functionality).

In addition, several details of the implementation have changed to provide more efficient handling. Among other things, the forwarding table is no longer stored in the form of an AVL tree, but in a hash table. Though AVL trees are data structures with a relatively low search cost, O(log n), hash tables are generally faster as long as the number of collisions remains low: a well-distributed hash table has an average lookup cost of O(1). We can assume that a Linux bridge has to store several hundred reachable systems at most, so a hash table is probably the better choice, especially considering that it is much easier to configure.

The following sections describe in more detail how the bridge functionality is implemented in Linux. We will first introduce the most important data structures and how they are linked, then discuss the algorithms and functions.

12.4.1 Architecture of the Bridge Implementation

Figure 12-12 shows the architecture of the bridge implementation in the Linux kernel. The individual components are divided by their tasks and distributed over several files. This makes the program text easier to understand and forces the programmer to define the interfaces between the individual components well.

Figure 12-12. Integrating the bridge implementation into the Linux network architecture.

12.4.2 Building and Linking Important Data Structures

The most important data structures of a Linux bridge include information about the bridges themselves and information about the network adapters (ports) allocated to them. We want to repeat here that you can use the new bridge implementation to construct several logically separated bridges in a Linux system. For example, this allows you to easily configure virtual local area networks (VLANs) that are not mutually accessible. In addition to information about the bridge and its ports, you need to store the forwarding table (filter table) for each bridge.

The forwarding table stores the IDs of each reachable station and the port used to reach that station. In addition, a transparent bridge also manages information for the spanning-tree protocol.

The file net/bridge/br_private.h defines the structures used to manage the information about bridges and their ports. Figure 12-13 shows how they are built and generally interlinked.

Figure 12-13. Structures in the Linux bridge implementation.

All bridges of a system are linked in a linear list, where the entry point is the bridge_list variable. A net_bridge structure with the following parameters is created for each bridge:

next is used to link all net_bridge structures in a linear list. It points to the next element in the list.
lock is a kind of mutex used to provide atomic access to important bridge structures and to prevent problems caused by concurrent access attempts.
port_list is the entry point into a linear list that stores all ports of a bridge instance.
dev is a pointer to the net_device structure of the virtual network adapter of a bridge.
hash is a pointer to the hash table, which stores the stations a bridge can reach (forwarding table).

The remaining parameters relate to the spanning-tree algorithm and are discussed in Section 12.4.5. As mentioned earlier, the ports of a bridge are managed in a linear list, starting from the net_bridge structure. Each port in this list is represented by a net_bridge_port structure:

next serves for linear linking of the ports of a bridge.
net_bridge points to the net_bridge structure of the specified bridge. This pointer allows you to find the appropriate bridge instance quickly.
net_device is a pointer to the net_device structure of the network adapter allocated to that port.
port_no stores the port number.

The remaining parameters of this structure also relate to the spanning-tree algorithm and are discussed in Section 12.2.4.
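The linking described above can be sketched in a few lines of C. This is a minimal illustration, not the kernel's definition: the names sketch_bridge, sketch_port, sketch_bridge_list, and the helper functions are hypothetical, the net_device pointer is replaced by a plain device name, and the lock, the hash table, and all spanning-tree fields are left out.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_bridge;                  /* defined below */

/* Simplified stand-in for net_bridge_port; spanning-tree fields omitted. */
struct sketch_port {
    struct sketch_port   *next;        /* next port of the same bridge */
    struct sketch_bridge *br;          /* back pointer to the owning bridge */
    const char           *dev_name;    /* stands in for the net_device pointer */
    int                   port_no;
};

/* Simplified stand-in for net_bridge; lock, dev, and hash omitted. */
struct sketch_bridge {
    struct sketch_bridge *next;        /* next bridge in the global list */
    struct sketch_port   *port_list;   /* head of this bridge's port list */
};

/* Entry point of the bridge list, playing the role of bridge_list. */
static struct sketch_bridge *sketch_bridge_list;

/* Put a new, empty bridge at the head of the global list. */
static void sketch_add_bridge(struct sketch_bridge *br)
{
    assert(br != NULL);
    br->port_list = NULL;
    br->next = sketch_bridge_list;
    sketch_bridge_list = br;
}

/* Attach a port to a bridge and set its back pointer. */
static void sketch_add_port(struct sketch_bridge *br, struct sketch_port *p,
                            const char *dev_name, int port_no)
{
    assert(br != NULL && p != NULL);
    p->br = br;
    p->dev_name = dev_name;
    p->port_no = port_no;
    p->next = br->port_list;
    br->port_list = p;
}

/* Walk the port list of one bridge and count its ports. */
static int sketch_count_ports(const struct sketch_bridge *br)
{
    int n = 0;
    for (const struct sketch_port *p = br->port_list; p != NULL; p = p->next)
        n++;
    return n;
}
```

The back pointer in each port is what makes it cheap to find the bridge instance for a packet that arrives on a given adapter, without searching the list of bridges.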

Using a Hash Table for Forwarding Information

The Forwarding Data Base (fdb), which is used for forwarding in a bridge, is stored in a hash table. The major benefit of hash tables is that they normally offer direct access to the desired data.

Each net_bridge structure points to the hash vector of its forwarding table through the hash field. The preprocessor constant BR_HASH_SIZE sets the number of entries in this vector. If a large number of stations is connected in a LAN internetwork, then the size of the hash vector should be selected accordingly.

To check whether the MAC address of a destination station is already known, a hash function (br_mac_hash()) is used to select a row of the hash vector. This row links all entries with the same hash value in a linear list (linear collision resolution). Figure 12-13 shows how structures are linked.

An entry in the hash table consists of a net_bridge_fdb_entry structure. The first two parameters of this structure serve to link a hash row. The other parameters store the MAC address of the destination station and a reference to the port used to reach that station. The variable aging_timer is used to delete an entry from the hash table after some time when it is no longer required.
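The following sketch illustrates a hash vector with linearly linked rows of this kind. It is a simplified model under several assumptions: the field names next_hash and pprev_hash, the XOR-based hash function, and all sketch_ identifiers are chosen for illustration and are not taken from the kernel sources; the real br_mac_hash() may compute its value differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_HASH_SIZE 256   /* plays the role of BR_HASH_SIZE */

/* Simplified stand-in for net_bridge_fdb_entry. */
struct sketch_fdb_entry {
    struct sketch_fdb_entry  *next_hash;   /* next entry in the same row */
    struct sketch_fdb_entry **pprev_hash;  /* slot that points to this entry */
    unsigned char             addr[6];     /* MAC address of the station */
    int                       port_no;     /* port used to reach the station */
    unsigned long             aging_timer; /* time of the last update */
};

/* Hypothetical hash: XOR of the six address bytes, folded to the table size. */
static unsigned int sketch_mac_hash(const unsigned char *mac)
{
    unsigned int h = 0;
    for (int i = 0; i < 6; i++)
        h ^= mac[i];
    return h & (SKETCH_HASH_SIZE - 1);
}

/* Insert an entry at the head of its hash row. */
static void sketch_hash_link(struct sketch_fdb_entry **table,
                             struct sketch_fdb_entry *e)
{
    struct sketch_fdb_entry **row = &table[sketch_mac_hash(e->addr)];
    e->next_hash = *row;
    if (e->next_hash != NULL)
        e->next_hash->pprev_hash = &e->next_hash;
    e->pprev_hash = row;
    *row = e;
}

/* Remove an entry from its row without searching the row. */
static void sketch_hash_unlink(struct sketch_fdb_entry *e)
{
    *e->pprev_hash = e->next_hash;
    if (e->next_hash != NULL)
        e->next_hash->pprev_hash = e->pprev_hash;
}

/* Look up a MAC address; returns NULL if the station is unknown. */
static struct sketch_fdb_entry *sketch_fdb_find(struct sketch_fdb_entry **table,
                                                const unsigned char *mac)
{
    struct sketch_fdb_entry *e;
    for (e = table[sketch_mac_hash(mac)]; e != NULL; e = e->next_hash)
        if (memcmp(e->addr, mac, 6) == 0)
            return e;
    return NULL;
}
```

Storing a pointer to the previous slot, rather than to the previous entry, lets an entry be removed in constant time whether it is the first element of its row or not.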

12.4.3 The Path of a Packet Through the Kernel

This section describes the path of packets through the Linux kernel (i.e., through a transparent bridge in Linux). As with packets entering a Linux router (see Chapter 6), the network adapter receives a packet, triggers an interrupt, and stores the packet in an input queue. Subsequently, the NET_RX tasklet runs the function net_rx_action(). This function also includes the entry point for the bridge implementation, the pointer br_handle_frame_hook.

In contrast to a router implementation, the network layer is not accessed here. Instead, once a bridge is activated, the pointer br_handle_frame_hook points to the function br_handle_frame(), which is then invoked for each packet. If no bridge has been activated (for instance, when the bridge functionality was created as a module), then the hook points to null. In this case, an attempt is made to forward the packet to the higher layers. If the kernel was created without bridge support, then these functions are missing, and no valuable computing time is wasted on searching for an activated bridge.

After a few checks (e.g., on whether the port was activated), the function br_handle_frame() decides whether the packet should be forwarded or whether its destination lies in the LAN from which it arrived. It obtains this information from the function br_fdb_get(), which searches the forwarding table for the MAC destination address of the packet. If the packet has to be forwarded, then br_forward() will forward it; otherwise, it will be rejected.

But first, br_fdb_insert() updates or creates the sender's entry in the forwarding table. As described earlier, a bridge can alternatively pass packets to the higher layers (e.g., the IP instance). For this reason, the function first has to check whether the destination station is the bridge itself. If this is the case, then the packet is further handled by br_pass_frame_up(), where a clone rather than the original packet is passed upwards. Other cases in which the packet has to be transported upwards are multicast packets and an adapter in promiscuous mode.

If the packet belongs to the spanning-tree protocol, it is passed to the function br_stp_handle_bpdu(). These bridge PDUs can be identified by a special MAC address.

As was mentioned earlier, a hash table within each bridge instance is used to manage the forwarding table. The spanning-tree algorithm and the relevant protocol messages (Config BPDUs, TCN BPDUs) ensure that no cycles can persist in the topology of a redundant LAN internetwork. (See Section 12.2.4.) ioctl() commands are used to configure the bridge and set the parameters of the spanning-tree algorithm.

After this brief introduction to the architecture of the bridge implementation in Linux, the following sections describe each of the functions in more detail.

Forwarding Functions

`br_handle_frame()`	net/bridge/br_input.c

This function represents the entry point to the bridge implementation. All packets received by a Linux bridge travel through this function. It is invoked by the NET_RX tasklet in the function net_rx_action(), if the bridge functionality in the kernel was activated (CONFIG_BRIDGE).

The first step checks whether the input port or the network adapter is deactivated (BR_STATE_DISABLED or !IFF_UP); if this is the case, then the packet is rejected. Next, the MAC header is removed from the packet, and the packet is subjected to several checks. For example, if it is a multicast packet, or if the network adapter is in promiscuous mode, then a clone of the packet is created and passed to the higher layers (br_pass_frame_up()).

Subsequently, the function checks whether the packet is a bridge PDU. If the MAC destination address begins with the combination 01.80.C2.00.00, then the packet is treated as a special PDU of the spanning-tree protocol and is passed to the function br_stp_handle_bpdu().

If the packet is not a bridge PDU, then the bridge remembers its origin (or more specifically, the port on which it was received and the MAC sender address within the Ethernet frame). If the input port is in either the BR_STATE_LEARNING or BR_STATE_FORWARDING state, the bridge invokes the function br_fdb_insert() for this purpose. This action either adds a new entry to the table or renews the validity of an existing entry. If the input port is in the BR_STATE_BLOCKING state, then the packet is rejected and the bridge does not remember its origin.

Any further handling of packets is done only in the BR_STATE_FORWARDING state. If an input port is in another state, then the packets are rejected. To be able to forward a packet, a decision has to be taken as to whether the packet should be forwarded at all and, if so, over which port. A multicast or broadcast packet is output to all ports of the bridge (br_flood()). In addition, such a packet is also passed to the higher layers.

br_fdb_get() searches the forwarding table for an entry with the specified MAC destination address. If it concerns the MAC address of an adapter of the bridge instance, then the packet is passed to the higher layers and not forwarded. If the destination is not the bridge, then the function br_forward() forwards the packet over the appropriate output port. Notice that the packet is passed to br_forward(), even if the input port was identified as the destination port, where it will eventually be verified and filtered. If no entry is found in the forwarding table, then the packet is flooded to all outputs (br_flood()).
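The decisions described for br_handle_frame() can be condensed into a small decision function. The sketch below is a model of the control flow only, under stated assumptions: all identifiers are hypothetical, the results of the individual checks are passed in as flags instead of being derived from a real packet, and the promiscuous-mode case is left out.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_state {
    SK_DISABLED, SK_BLOCKING, SK_LISTENING, SK_LEARNING, SK_FORWARDING
};

enum sketch_action {
    SK_DROP, SK_TO_STP, SK_PASS_UP, SK_FORWARD, SK_FLOOD, SK_FLOOD_AND_PASS_UP
};

/* Results of the checks made on one received frame. */
struct sketch_frame {
    int is_bpdu;        /* destination begins with 01.80.C2.00.00 */
    int is_multicast;   /* multicast or broadcast destination */
    int dest_known;     /* lookup in the forwarding table succeeded */
    int dest_is_local;  /* destination is an adapter of the bridge itself */
};

/*
 * Decide what happens to a frame received on a port in state in_state.
 * *learn is set to 1 if the sender address should be entered in the
 * forwarding table.
 */
static enum sketch_action sketch_handle_frame(enum sketch_state in_state,
                                              const struct sketch_frame *f,
                                              int *learn)
{
    assert(f != NULL && learn != NULL);
    *learn = 0;

    if (in_state == SK_DISABLED)
        return SK_DROP;                  /* deactivated port */
    if (f->is_bpdu)
        return SK_TO_STP;                /* spanning-tree protocol message */
    if (in_state == SK_LEARNING || in_state == SK_FORWARDING)
        *learn = 1;                      /* remember the origin */
    if (in_state != SK_FORWARDING)
        return SK_DROP;                  /* no further handling */
    if (f->is_multicast)
        return SK_FLOOD_AND_PASS_UP;     /* all ports plus higher layers */
    if (!f->dest_known)
        return SK_FLOOD;                 /* unknown destination */
    if (f->dest_is_local)
        return SK_PASS_UP;               /* addressed to the bridge itself */
    return SK_FORWARD;                   /* known remote destination */
}
```

Note that a port in the learning state still updates the forwarding table even though the frame itself is dropped, which is what allows the table to fill before forwarding begins.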

`br_forward()`	net/bridge/br_forward.c

br_forward() is invoked either once by br_handle_frame() or several times by br_flood(). The purpose of this function is to output a data packet on the specified port. To this end, the function dev_queue_xmit(), described in Section 6.2.2, is used.

Beforehand, however, there are two checks (br_should_forward()). First, the output port has to be in the BR_STATE_FORWARDING state; and, second, it must not be identical with the input port of the packet. Otherwise, the packet would be transferred twice within the local area network.

`br_flood()`	net/bridge/br_forward.c

br_flood() is invoked by br_handle_frame() when a packet should be sent to all ports except the input port (e.g., for multicast or broadcast packets, or when the destination is unknown). br_flood() simply invokes the function br_forward() for each entry in port_list of the net_bridge structure. As mentioned earlier, br_forward() checks the port's forwarding state and verifies that the input and output adapters differ.
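The interplay of the two checks and the flooding loop can be sketched as follows. The identifiers are hypothetical, and a simple counter stands in for the call to dev_queue_xmit(); the sketch shows only the filtering logic, not the packet handling.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_state { SK_BLOCKING, SK_LISTENING, SK_LEARNING, SK_FORWARDING };

struct sketch_port {
    struct sketch_port *next;        /* next port of the same bridge */
    enum sketch_state   state;
    int                 frames_sent; /* stands in for dev_queue_xmit() */
};

/* The two checks: output port is forwarding and differs from the input port. */
static int sketch_should_forward(const struct sketch_port *out,
                                 const struct sketch_port *in)
{
    return out->state == SK_FORWARDING && out != in;
}

/* Output a frame on one port if the checks allow it. */
static void sketch_forward(struct sketch_port *out, const struct sketch_port *in)
{
    if (sketch_should_forward(out, in))
        out->frames_sent++;
}

/* Offer the frame to every port of the bridge; the checks do the filtering. */
static void sketch_flood(struct sketch_port *port_list,
                         const struct sketch_port *in)
{
    for (struct sketch_port *p = port_list; p != NULL; p = p->next)
        sketch_forward(p, in);
}
```

Because the checks live in the forwarding step, the flooding loop does not need to know which port the frame arrived on beyond passing it along.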

`br_pass_frame_up()`	net/bridge/br_input.c

Transparent bridges are normally invisible to the other stations in a LAN internetwork; they forward data packets on the data-link layer or filter packets. But when we use a Linux system as a bridge, we will probably want to use it also for other purposes. Consequently, the computer should be able to receive IP packets. This is possible with the bridge implementation discussed here. When the bridge receives a packet with the MAC destination address of one of its adapters, it is passed to the higher layers by br_pass_frame_up().

The packet type is set to PACKET_HOST (arrived in the destination system), and the Ethernet header is removed. Subsequently, the packet is passed to the function netif_rx(), which invokes the protocol-handling routine of the appropriate layer-3 protocol.

`br_fdb_get()`	net/bridge/br_fdb.c

br_fdb_get() searches the forwarding table in the hash table of the specified bridge instance for a MAC destination address passed as a parameter. It first calculates the hash value and searches the hash row to see whether there is an entry with the desired MAC address. If there is an entry, then the desired information for the MAC address is found, and a pointer to the output port used to reach that station is returned. If no entry is found, then the route to the destination station is unknown and the value null is returned.

12.4.4 Learning New MAC Addresses

The learning of new MAC addresses is a characteristic of a transparent bridge. Learning takes place only while the port is in the learning or forwarding state. (See Section 12.2.4.) As was described earlier, the learning function is invoked for each data packet, and the source address is added to the forwarding table. If the address already exists in the table, then the information of the net_bridge_fdb_entry structure is updated and the pointer to the entry is returned.

Functions

`br_fdb_insert()`	net/bridge/br_fdb.c

br_fdb_insert() includes the entire learning function of a transparent bridge. The MAC sender address is entered in the forwarding table for each incoming packet in the BR_STATE_LEARNING and BR_STATE_FORWARDING states. To this end, the hash value of the MAC address is calculated (br_mac_hash()), and the hash row is searched for the appropriate entry. If this entry is found, then both the entry for the input adapter and the aging_timer are updated. This means that the bridge will also learn when a station has moved.

If no entry can be found in the hash row, then br_fdb_insert() creates a new net_bridge_fdb_entry structure and uses hash_link() to add it to the hash row.
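A minimal sketch of this learning step is shown below. It assumes a singly linked hash row and a hypothetical XOR-based hash function; sketch_fdb_insert() and the other sketch_ names are illustrative and do not reproduce the kernel's br_fdb_insert() or hash_link().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_HASH_SIZE 256

struct sketch_fdb_entry {
    struct sketch_fdb_entry *next;        /* next entry in the same row */
    unsigned char            addr[6];     /* MAC sender address */
    int                      port_no;     /* port on which it was last seen */
    unsigned long            aging_timer; /* time of the last update */
};

/* Hypothetical hash: XOR of the six address bytes. */
static unsigned int sketch_mac_hash(const unsigned char *mac)
{
    unsigned int h = 0;
    for (int i = 0; i < 6; i++)
        h ^= mac[i];
    return h & (SKETCH_HASH_SIZE - 1);
}

/*
 * Learn a sender address: refresh the entry if the address is known,
 * otherwise create a new entry at the head of the hash row.
 * Returns the entry, or NULL if no memory is available.
 */
static struct sketch_fdb_entry *sketch_fdb_insert(struct sketch_fdb_entry **table,
                                                  const unsigned char *mac,
                                                  int port_no,
                                                  unsigned long now)
{
    struct sketch_fdb_entry **row = &table[sketch_mac_hash(mac)];
    struct sketch_fdb_entry *e;

    for (e = *row; e != NULL; e = e->next) {
        if (memcmp(e->addr, mac, 6) == 0) {
            e->port_no = port_no;     /* the station may have moved */
            e->aging_timer = now;     /* renew the validity */
            return e;
        }
    }

    e = malloc(sizeof(*e));
    if (e == NULL)
        return NULL;
    memcpy(e->addr, mac, 6);
    e->port_no = port_no;
    e->aging_timer = now;
    e->next = *row;
    *row = e;
    return e;
}
```

Calling the function twice with the same address but a different port number leaves a single entry that points to the new port, which models a station that has moved.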

`br_fdb_cleanup()`	net/bridge/br_fdb.c

The forwarding table should be updated whenever a station is no longer active or the network has changed. Unfortunately, a bridge cannot observe such an event directly, because it learns about stations only passively, by remembering the origin of each packet in the forwarding table. This means that, when a station has not sent anything for a certain period of time, the bridge assumes that the station was deactivated or moved. For this purpose, the gc_timer is set in a bridge instance. This timer starts the function br_fdb_cleanup() periodically at a specific interval, gc_interval. The function checks all entries in the hash table of a bridge instance and removes all entries with an aging value exceeding the timeout.
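The periodic sweep can be sketched as a loop over all hash rows. The code below is an illustration under assumptions: the aging value of an entry is taken to be the time elapsed since its last update, the entries are heap-allocated, and all sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_fdb_entry {
    struct sketch_fdb_entry *next;        /* next entry in the same row */
    unsigned long            aging_timer; /* time of the last update */
};

/* Helper for the example: add an entry with a given time stamp to a row. */
static struct sketch_fdb_entry *sketch_fdb_add(struct sketch_fdb_entry **row,
                                               unsigned long stamp)
{
    struct sketch_fdb_entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return NULL;
    e->aging_timer = stamp;
    e->next = *row;
    *row = e;
    return e;
}

/*
 * Remove every entry that has not been updated for more than timeout
 * time units. Returns the number of removed entries.
 */
static int sketch_fdb_cleanup(struct sketch_fdb_entry **table, size_t rows,
                              unsigned long now, unsigned long timeout)
{
    int removed = 0;

    for (size_t i = 0; i < rows; i++) {
        struct sketch_fdb_entry **pp = &table[i];
        while (*pp != NULL) {
            struct sketch_fdb_entry *e = *pp;
            if (now - e->aging_timer > timeout) {
                *pp = e->next;        /* unlink the expired entry */
                free(e);
                removed++;
            } else {
                pp = &e->next;        /* keep it and move on */
            }
        }
    }
    return removed;
}
```

The cost of one sweep grows with the number of entries, which is one reason the sweep runs at an interval rather than on every packet.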

12.4.5 Implementing the Spanning-Tree Protocol

This section describes how the spanning-tree protocol according to IEEE 802.1d and the relevant functions are implemented. The spanning-tree protocol is used to prevent cycles in a redundant LAN internetwork. The algorithm operates in a decentralized way: Each bridge has to work out the current state in the LAN internetwork from the information contained in control packets (BPDUs). For example, each bridge assumes initially that it is the root bridge, and it may have to learn from incoming BPDUs that this is not so.

For this reason, the implementation of the spanning tree protocol is based on the fact that the currently "best" configuration is stored in each port. This means that each new incoming message is verified to see whether the information it contains is better than the information currently stored, so that the currently best configuration is accepted. By comparing the configuration message most recently received with the information available on the bridge itself, it is easy to figure out the root bridge, the root port, and the designated ports.

This also means that all steps are executed consecutively for each configuration message with better information. The root bridge is not first determined in all bridges so that the least cost can then be computed for all bridges, and so on. Instead, each bridge decides first on the basis of its own knowledge; subsequently the knowledge of the immediate neighbors is added, and so on, until the configuration messages have visited the entire LAN internetwork and the bridges can make their optimal choice. This shows clearly that a real-world implementation does not necessarily have to follow the theoretical model step by step to be efficient.

One major benefit of this implementation is that relatively few configuration messages have to be exchanged. How fast a tree structure can be built also depends on the bridges that send their configuration messages first. It is normally more beneficial when bridges with smaller identifiers or higher priorities send configuration messages earlier. However, bridges do not immediately change from the blocking into the forwarding state; they take various intermediate states where no data packets may be forwarded, so that the probability of temporary cycles is low.

The following subsections describe the important aspects of how the spanning-tree protocol is implemented.

Initialization

A bridge in the kernel is initialized by the functions br_add_bridge() and new_nb() when a bridge instance is created by the brctl addbr ... command. As the instructions in this command are processed, the bridge is set as the designated root bridge. When brctl addif ... adds ports to the bridge, then these ports are initially put into the BR_STATE_BLOCKING state. All timers are initially set to inactive (br_stp_enable_port()). Subsequently, the information currently available is verified to see the state the new port can now take.

When a bridge instance is initialized, the bridge timer is also initialized. This timer is a timer_list type (see Section 2.7.1); it invokes the br_tick() function each second. This function is used to control all timer functions of the bridge instance and the spanning-tree protocol. This means that each bridge instance uses only one single system timer. All internal time-controlled processes run over this timer. (See more information in the later subsection Timer Handling.)

Processing BPDUs

br_handle_frame() invokes the function br_stp_handle_bpdu() as soon as a BPDU is received. When topology-change packets (TCN) arrive, then br_received_tcn_bpdu() assumes all further handling. When configuration packets (Config BPDU) arrive, then the packet content is copied into a br_config_bpdu structure, and some of the fields are converted into the internal representation format. For example, time values are stored in jiffies rather than in ticks. Subsequently, the BPDU is further handled by the function br_received_config_bpdu().

Steps of the Spanning-Tree Algorithm

The individual steps of the spanning-tree algorithm were described in Section 12.2.4. As mentioned in that section, the spanning-tree mechanism runs for each configuration message received that changes something in the current configuration.

Whether a new configuration message has information that is better than that currently stored is a decision implemented by logic functions, as are the selection of a root port and the naming of a designated port. Notice that these actions normally use few comparisons.

`br_received_config_bpdu()`	net/bridge/br_stp.c

This function initially invokes br_is_root_bridge() to check whether the bridge currently considers itself the root bridge. Notice that the bridge does not have a global view of the LAN internetwork, as mentioned earlier. There could indeed be other bridges that classify themselves as the root bridge. This situation will change gradually as the spanning-tree algorithm runs its steps, and one bridge will eventually become the only root bridge.

When a new configuration message is better than the current information (a result of calling br_supersedes_port_info()), then the following things happen:

First, the invocation of br_record_config_information() causes the data of the configuration BPDU to be written to the net_bridge_port structure.
Next, the br_configuration_update() function is invoked. It selects the root ports and designated ports. This action could cause the information structures of the bridge and its ports to change.
Subsequently, br_port_state_selection() determines the state of each port. The hello timer is stopped if the bridge was the root bridge before the new information was stored but is no longer the root bridge. If, in addition, a change to the topology was detected, then the topology_change_timer is stopped, the tcn_timer is started, and a topology-change message is sent (br_transmit_tcn()).
If the input port was marked as the root port, then the timeout values of the configuration BPDU are added to the net_bridge structure and a configuration BPDU is generated (by br_config_bpdu_generation()). In addition, the function br_topology_change_acknowledged() is invoked if the topology_change_ack flag was set in the configuration BPDU.

In contrast, if nothing changes in response to the configuration BPDU, then br_reply() is invoked, provided that the input port is the designated port. This means that a configuration message with locally stored values is sent.

`br_supersedes_port_info()`	net/bridge/br_stp.c

This function checks whether the stored net_bridge_port structure changes in response to a configuration BPDU received (i.e., whether the new configuration BPDU includes "better" information). This is the case in any of the following situations:

The root bridge in the BPDU has a smaller ID than the root bridge currently stored in the structure.
The two IDs are equal, but the path cost in the BPDU is less.
The path cost is equal, but the ID of the sending bridge is smaller than the ID of the bridge itself.
The IDs of the bridges match, but the port ID of the sending bridge is smaller than the ID of the input port.

The first two points in the above list are normally decisive, but if two local area networks are connected by parallel bridges, then the port ID could also play a role.
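The four rules amount to a lexicographic comparison of four fields. The sketch below shows this comparison in isolation; it is not the kernel function. The structure sketch_config, the use of 64-bit integers for bridge IDs, and the choice to compare the sender's bridge and port IDs against stored reference values are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The fields of a configuration message that take part in the comparison. */
struct sketch_config {
    uint64_t root_id;         /* ID of the assumed root bridge */
    uint32_t root_path_cost;  /* path cost to that root bridge */
    uint64_t bridge_id;       /* ID of the sending (or reference) bridge */
    uint16_t port_id;         /* ID of the sending (or reference) port */
};

/*
 * Return 1 if bpdu carries better information than stored, else 0.
 * The fields are compared in order; a later field only matters if all
 * earlier fields are equal, and smaller values win.
 */
static int sketch_supersedes(const struct sketch_config *stored,
                             const struct sketch_config *bpdu)
{
    if (bpdu->root_id != stored->root_id)
        return bpdu->root_id < stored->root_id;
    if (bpdu->root_path_cost != stored->root_path_cost)
        return bpdu->root_path_cost < stored->root_path_cost;
    if (bpdu->bridge_id != stored->bridge_id)
        return bpdu->bridge_id < stored->bridge_id;
    return bpdu->port_id < stored->port_id;
}
```

A message that is identical to the stored information is not treated as better in this sketch, so it would not trigger a new run of the algorithm.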

`br_record_config_information()`	net/bridge/br_stp.c

This function is invoked if the configuration message is better than the information currently stored. The root bridge ID and the cost over the path to the root bridge (RPC) are stored in the net_bridge_port structure as designated root and cost, respectively. The bridge sending the configuration message and its output port serve as the designated bridge and port.

The message-age timer is started with the value from the configuration message, to be able to detect potential failures of a component.

`br_record_config_timeout_values()`	net/bridge/br_stp.c

An invocation of this function causes the values for expiry of the timers to be copied from the configuration message to the information memory of the bridge. This ensures that critical timers in all bridges of the LAN internetwork have the same timeout values, which are determined by the root bridge.

`br_root_selection()`	net/bridge/br_stp.c

This function selects the root port of a bridge. The function iterates over all ports, starting with the smallest port number, and checks whether the conditions for the root port are met (br_should_become_root_port()). The port must not be a designated port, it must not have the BR_STATE_DISABLED state, and the bridge must not be the root bridge. Subsequently, the path cost to the root bridge is compared. If the costs are equal, then the information from the net_bridge_port structure is considered. Figure 12-14 shows the algorithm used for this procedure.

Figure 12-14. Selecting a root port.

If the loop was fully walked through, but no root port was assigned, then the bridge itself becomes the root bridge. Finally, the selected root bridge and the root path cost (RPC) are entered in the net_bridge structure.

`br_designated_port_selection()`	net/bridge/br_stp.c

This function also checks the ports one after the other. A port becomes the designated port if the configuration message that the bridge itself would send on this port is better than the configuration message most recently received on it (and stored in the net_bridge_port structure). The configuration message consists mainly of the root bridge ID, the path cost to the root bridge, and the bridge and port IDs, so the corresponding fields in the net_bridge structure and in the net_bridge_port structure have to be compared. Figure 12-15 shows the algorithm used to implement this condition.

Figure 12-15. Selecting a designated port.

`br_become_designated_port()`	net/bridge/br_stp.c

This function is invoked in br_designated_port_selection() for each designated port. The port passed to this function becomes the designated port; that is, the corresponding information is stored in the net_bridge_port structure.

`br_port_state_selection()`	net/bridge/br_stp.c

The future state is determined for each port, and appropriate functions are invoked. If the port is in the BR_STATE_DISABLED state, then nothing is done. If a port is the root port or a designated port, then br_make_forwarding() is invoked to put that port in the forwarding state.

Remember that the intermediate states, BR_STATE_LISTENING and BR_STATE_LEARNING, are used first, as described in Section 12.2.4. The forward delay timer controls this procedure.

In all other cases, br_make_blocking() is invoked to put the port in the blocking state. In addition, a topology-change request is caused by br_topology_change_detection(), if the port has been in the forwarding or learning state.

`br_transmit_config()`	net/bridge/br_stp.c

This function initially checks whether the hold timer is active. If so, then config_pending is set to 1, and the function returns immediately.

If the hold timer is not active, then a configuration BPDU is filled in with the corresponding values from the net_bridge structure; then the function br_send_config_bpdu() is invoked, and the hold timer is started. Figure 12-16 shows how a configuration message is built from the net_bridge structure.
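The effect of the hold timer is a simple rate limit with one pending request. The following sketch models it with two flags and a counter; the structure, the function names, and the expiry handler are hypothetical, and the counter stands in for the call to br_send_config_bpdu().

```c
#include <assert.h>
#include <stddef.h>

struct sketch_port {
    int hold_timer_active; /* 1 while no further BPDU may be sent */
    int config_pending;    /* 1 if a transmission was postponed */
    int bpdus_sent;        /* stands in for br_send_config_bpdu() */
};

/* Send a configuration BPDU now, or note that one is pending. */
static void sketch_transmit_config(struct sketch_port *p)
{
    assert(p != NULL);
    if (p->hold_timer_active) {
        p->config_pending = 1;   /* postpone until the hold timer expires */
        return;
    }
    p->bpdus_sent++;             /* build and send the message */
    p->config_pending = 0;
    p->hold_timer_active = 1;    /* start the hold timer */
}

/* Called when the hold timer expires: send a postponed BPDU, if any. */
static void sketch_hold_timer_expired(struct sketch_port *p)
{
    assert(p != NULL);
    p->hold_timer_active = 0;
    if (p->config_pending)
        sketch_transmit_config(p);
}
```

However many requests arrive while the timer is active, only one message is sent when it expires, because the pending state is a flag rather than a queue.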

Figure 12-16. Example of a configuration message.

Example: Running the Spanning-Tree Protocol

The initialization of the bridges results in the configuration shown in Figure 12-17. However, this figure shows only the most important fields in the structures. Each bridge is initialized as a root bridge. Though the interfaces are in blocking state, configuration messages are sent, because the ports are defined as designated ports.

Figure 12-17. Initializing bridges for the spanning-tree protocol.

After the bridge initialization, configuration messages are sent over all network adapters. In this example, bridge 1 sends the first configuration message. This information is better than the information stored in the input ports of bridges 2 and 3 (in this case), so the new information is stored in the net_bridge_port structures of these ports. Figure 12-18 shows this procedure.

Figure 12-18. Storing information from a configuration message.

From this information, the root port is selected in bridges 2 and 3. In the example shown in Figure 12-18, port 1 is selected in both bridges, because it is the only port that stored the root bridge with the smallest ID. This selection causes the net_bridge structure to change, as shown in Figure 12-19.

Figure 12-19. Selecting a root port.

The next step selects designated ports in the bridges. Port 2 is defined as the designated port in each of the two bridges, because the information about the root bridge in the net_bridge_port structure of this port differs from the information in the net_bridge structure. The selecting of designated ports changes the net_bridge_port structure of these ports, so the new root bridge and its path cost are entered. Figure 12-20 shows this procedure.

Figure 12-20. Selecting designated ports.

Two paths to the root bridge exist for LAN 3 in this example, and so there is a cycle; hence, we have to select a bridge as designated bridge for this LAN. We opt for bridge 2 as the designated bridge, because it has the smaller ID. Notice that this selection, too, depends on the order in which configuration messages are exchanged. For example, if bridge 3 sends the first configuration message over its designated port, then bridge 2 will check whether it is better than the stored configuration. Such would not be the case in this example, so bridge 2 would return a configuration message with its own values. The information in this message would be better than the configuration stored on port 2 in bridge 3. Consequently, this bridge would run the spanning-tree algorithm. Port 1 would remain the root port, but port 2 would no longer be a designated port and would be put into the blocking state. (See Figure 12-21.)

Figure 12-21. Configuration for LAN 3.

Timer Handling

Each bridge has a function, br_tick(), to handle timers. This section describes the seven defined timers. To implement these timers, only one system timer of the type timer_list is used. This is the variable tick in each net_bridge structure.

The timer tick is invoked once every second (expires = jiffies + HZ). This causes the function br_tick() to be invoked each time. This function defines the behavior of the timers in the following list. Each timer is checked to see whether it has expired (br_check_timers()). In addition, the timers are incremented. The appropriate behavior function is invoked as soon as a timer expires.

Hold timer: The hold timer starts after the configuration BPDU has been sent. When it is active, no configuration BPDU can be sent over the same port. The hold timer expires when its value reaches or exceeds the stored hold_time. Then the function br_transmit_config() is invoked, if no BPDU has been sent yet. Once it has expired, the hold timer is not restarted. It is stopped explicitly when a port is disabled.
GC timer: The garbage collection timer does cleanup work in the forwarding table. It checks periodically (gc_interval) on whether there are old entries in the forwarding table. If there are, then these entries are deleted, to respond to moving stations. In addition, this cleanup work prevents the forwarding table from filling up with entries for inactive stations. The function br_fdb_cleanup() is responsible for this check.
Hello timer: The hello timer is used to send hello packets (configuration BPDUs) at regular intervals. This timer is started after the call of br_config_bpdu_generation(), while the spanning-tree protocol is running. It is incremented until its value has reached the stored hello time. Subsequently, br_config_bpdu_generation() is invoked again, and the hello timer is restarted.
TCN timer: The TCN timer is used once a TCN BPDU has been sent. This timer causes TCN BPDUs to be sent at regular intervals until the topology change has been acknowledged. The intervals are identical to those for configuration BPDUs.
Topology-change timer: This timer is used exclusively by the root bridge. It specifies the period for which the flags for a topology change request are set (i.e., the period during which configuration messages may be passed). The fields topology_change_detected and topology_change in the net_bridge structure are reset to 0 as soon as this timer expires. This timer is not restarted.
Message-age timer: There is one message-age timer for each network interface in each bridge. This timer is started when the values of a configuration BPDU are written to the net_bridge_port structure. The expiry of the message-age timer means that a component has failed. For this reason, the spanning-tree protocol is restarted, where the port with the expired timer is set to be the designated port. Subsequently, the spanning-tree algorithm runs its normal procedure.
Forward-delay timer: The forward-delay timer is used to move the ports of a bridge from the blocking to the forwarding state, which is why there is one such timer for each port. This timer specifies the time a port spends in each intermediate state. It is started by the function br_make_forwarding(), which also sets the state of the port to BR_STATE_LISTENING.
When the timer expires while the port is in the BR_STATE_LISTENING state, the port is switched to the BR_STATE_LEARNING state, and the forward-delay timer is restarted. When the timer expires again, the state changes from BR_STATE_LEARNING to BR_STATE_FORWARDING. At this point, the br_topology_change_detection() function is invoked if the bridge is the designated bridge for any of its ports. Figure 12-11 shows these transitions.
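
The forward-delay transitions described above can be sketched as a small user-space simulation. This is not the kernel code: the types sketch_timer and sketch_port, the helper functions, and the fields bridge_is_designated and topology_change_signalled are assumptions made for illustration. Only the state names and the order of the transitions are taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Port states, named after the constants used in the text. */
enum port_state {
    BR_STATE_BLOCKING,
    BR_STATE_LISTENING,
    BR_STATE_LEARNING,
    BR_STATE_FORWARDING
};

/* Hypothetical tick-driven timer: counts seconds while it is running. */
struct sketch_timer {
    int running;
    unsigned int value;            /* seconds since the timer was started */
};

/* Hypothetical, heavily reduced stand-in for a bridge port. */
struct sketch_port {
    enum port_state state;
    struct sketch_timer forward_delay_timer;
    unsigned int forward_delay;    /* seconds spent in each intermediate state */
    int bridge_is_designated;      /* bridge is designated for some port */
    int topology_change_signalled; /* set where the kernel would call
                                      br_topology_change_detection() */
};

static void timer_start(struct sketch_timer *t)
{
    t->running = 1;
    t->value = 0;
}

/* Advance a running timer by one second; return 1 if it has just expired. */
static int timer_tick(struct sketch_timer *t, unsigned int timeout)
{
    if (!t->running)
        return 0;
    if (++t->value >= timeout) {
        t->running = 0;            /* an expired timer stops itself */
        return 1;
    }
    return 0;
}

/* Start the walk from blocking towards forwarding. */
void sketch_make_forwarding(struct sketch_port *p)
{
    assert(p != NULL);
    if (p->state == BR_STATE_BLOCKING) {
        p->state = BR_STATE_LISTENING;
        timer_start(&p->forward_delay_timer);
    }
}

/* Handler for an expired forward-delay timer. */
static void forward_delay_expired(struct sketch_port *p)
{
    if (p->state == BR_STATE_LISTENING) {
        p->state = BR_STATE_LEARNING;
        timer_start(&p->forward_delay_timer);   /* second delay period */
    } else if (p->state == BR_STATE_LEARNING) {
        p->state = BR_STATE_FORWARDING;
        if (p->bridge_is_designated)
            p->topology_change_signalled = 1;
    }
}

/* Called once per second for each port, like the timer tick in the text. */
void sketch_port_tick(struct sketch_port *p)
{
    assert(p != NULL);
    if (timer_tick(&p->forward_delay_timer, p->forward_delay))
        forward_delay_expired(p);
}
```

With forward_delay set to 2, a port that starts in BR_STATE_BLOCKING reaches BR_STATE_LEARNING after two calls to sketch_port_tick() and BR_STATE_FORWARDING after four. Further ticks have no effect because the timer is not restarted.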

Topology Changes

When a new bridge is added to the LAN internetwork, then the spanning-tree protocol (STP) runs as described above: The bridge is initialized as root bridge. If it is actually the (new) root bridge, then its configuration messages will win across all bridges in the internetwork. Otherwise, it receives configuration messages from neighboring bridges, which it will then use to configure its interfaces.

As was mentioned previously, if a bridge or an active port fails, then the message-age timer in the neighboring bridge expires. Figure 12-22 shows this procedure in an example. The port owning the expired timer is set to be the designated port. This means that the current configuration of this port is overwritten. Subsequently, the spanning-tree mechanism runs once more in this bridge.

Figure 12-22. Example for a topology change: The message-age timer expires.

Functions Used to Signal a Topology Change

As previously described, the execution of the spanning-tree protocol in a LAN internetwork can cause changes to the topology. TCN BPDUs are sent over the path to the root bridge to ensure that all bridges are informed about such a topology change. In turn, the root bridge then sends configuration BPDUs with the topology_change field set, and these BPDUs are transported across all paths within the tree topology.

It is interesting to note that MAC addresses are not added to the forwarding table when the topology is reconfigured. Instead, this is done exclusively by the learning function. However, the entries in the forwarding table can become invalid after a relatively short time, so packets are sent to the relevant stations over all ports so that they will eventually reach their destination.

`br_received_tcn_bpdu()`	net/bridge/br_stp.c

If the port that received the TCN BPDU is a designated port, then the function br_topology_change_detection() is invoked. In addition, br_topology_change_acknowledge() is used to send a configuration message with the topology_change_ack field set over the input port.

`br_topology_change_detection()`	net/bridge/br_stp.c

If the bridge is the root of the tree topology, then the topology_change field in the net_bridge structure is set to one, and the topology-change timer is started. Any other bridge, unless it has already detected the topology change, uses the br_transmit_tcn() function to send a TCN BPDU over its root port and starts its TCN timer. Finally, the bridge records that the topology change was detected, which limits the number of TCN BPDUs announcing the same topology change.

`br_topology_change_acknowledged()`	net/bridge/br_stp.c

The marking for a topology change is reset, and the TCN timer is stopped. This function is invoked by br_received_config_bpdu(), if the flag topology_change_ack is set in the incoming configuration message.
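
The interplay of the three functions described above can be sketched in user-space C as follows. The structures tcn_bridge and tcn_port and the sketch_ functions are assumptions made for illustration and reduce the kernel data structures to a few flags. Sending a BPDU and starting a timer are modeled simply as counters and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the net_bridge structure. */
struct tcn_bridge {
    int is_root;
    int topology_change;               /* set only on the root bridge */
    int topology_change_detected;
    int topology_change_timer_running;
    int tcn_timer_running;
    int tcn_sent;                      /* TCN BPDUs sent over the root port */
    int acks_sent;                     /* BPDUs with topology_change_ack set */
};

/* Hypothetical, reduced stand-in for the net_bridge_port structure. */
struct tcn_port {
    struct tcn_bridge *br;
    int is_designated;
};

/* Mirrors the description of br_topology_change_detection(). */
void sketch_topology_change_detection(struct tcn_bridge *br)
{
    assert(br != NULL);
    if (br->is_root) {
        br->topology_change = 1;
        br->topology_change_timer_running = 1;
    } else if (!br->topology_change_detected) {
        br->tcn_sent++;                /* stands in for br_transmit_tcn() */
        br->tcn_timer_running = 1;
    }
    /* Remember the change so that it is announced only once. */
    br->topology_change_detected = 1;
}

/* Mirrors the description of br_received_tcn_bpdu(). */
void sketch_received_tcn_bpdu(struct tcn_port *p)
{
    assert(p != NULL && p->br != NULL);
    if (p->is_designated) {
        sketch_topology_change_detection(p->br);
        p->br->acks_sent++;            /* acknowledge over the input port */
    }
}

/* Mirrors the description of br_topology_change_acknowledged(). */
void sketch_topology_change_acknowledged(struct tcn_bridge *br)
{
    assert(br != NULL);
    br->topology_change_detected = 0;
    br->tcn_timer_running = 0;
}
```

In this sketch, a non-root bridge that receives the same TCN BPDU twice on a designated port forwards only one TCN BPDU towards the root but acknowledges both. A root bridge sends no TCN BPDU at all and instead sets topology_change and starts its topology-change timer.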