From the work presented in [2], you may recall that the basic physical topology chosen for the network has a profound effect on cost, performance, and fault tolerance. Figure 6.5 illustrates how network resilience can be improved by increasing the number of edges per node in the graph (i.e., links per node) and eliminating single points of failure. Clearly, if a node terminates more than one link, it has a higher probability of being able to communicate should part of the topology fail. If we take this to the extreme, each node requires a direct connection to every other node in the graph for complete fault tolerance; we call this a full mesh. The average number of edges, E, is calculated by simply doubling the number of links in the graph and then dividing by the number of nodes. For example, in the partial-mesh network, Figure 6.5(c), this is calculated as (2 × 10)/7 ≈ 2.86. As one would expect, for the full-mesh topology, E = 6 (i.e., E = N - 1, where N is the total number of nodes).
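The average-degree calculation is trivial to automate. The following short Python sketch uses the node and link counts of the Figure 6.5(c) and 6.5(d) examples (7 nodes with 10 links, and 7 nodes fully meshed with 21 links); it is illustrative only.

def average_degree(num_nodes, num_links):
    # Each link contributes to the degree of exactly two nodes.
    return (2 * num_links) / num_nodes

# Counts taken from the Figure 6.5 examples:
print(round(average_degree(7, 10), 2))   # partial mesh: 2.86
print(average_degree(7, 21))             # full mesh: 6.0, i.e., N - 1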
Figure 6.5: Topological fault tolerance. (a) Bus or multidrop, (b) Tree, (c) Partial mesh, (d) Full mesh, (e) Star.
If we look at the example networks in Figure 6.5, we can see subtle differences in the way that failed nodes affect overall connectivity. Both the tree and star topologies have the same average number of edges, but there is a fundamental difference in the impact of node failure. In the tree topology, Figure 6.5(b), a failure at a leaf node (i.e., a node at the edge of the tree with no downstream edges) affects no other node and does not disconnect the graph (though clearly it would be a bad idea to centralize services at a leaf node). However, failure of a root or branch node within the tree causes a topological disconnect, and two new discrete trees are formed. Depending upon the placement of services within the network, such a break would still allow internode communication within the two new trees. On the other hand, in the star topology, Figure 6.5(e), all nodes are effectively leaf nodes except for the central hub node. The hub represents a central point of failure; if this node dies, then the whole network is broken.
Remember that we are only considering network resilience here; there are other factors that must be taken into account. Real network designs are always a compromise and may be subject to many constraints imposed by the technologies employed, site locations, cost, network management, and the existing installed base. For example, in Figure 6.5 the star network generally offers better performance than the tree network (since the star topology has only one intermediate hop, the hub, between any two nodes, whereas the example tree topology has a worst case of three intermediate hops). On the other hand, a star topology is likely to be more expensive, since it requires longer direct circuits. The full pros and cons of all of these topologies are discussed in [2]. For the interested reader, [10] includes mathematical techniques used to model reliability in various topologies and methods for predicting connectivity.
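To make the hop comparison concrete, the sketch below computes the worst-case number of intermediate (transit) nodes between any pair of nodes using a breadth-first search. The star and tree adjacency lists are assumptions standing in for Figure 6.5(e) and 6.5(b), not values taken from the figure itself.

from collections import deque

def worst_case_transit_nodes(adj):
    # Worst case, over all node pairs, of intermediate nodes on a shortest path.
    def hops_from(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist
    return max(d - 1 for n in adj for d in hops_from(n).values() if d > 0)

star = {"hub": ["a", "b", "c", "d"],
        "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
tree = {"root": ["b1", "b2"], "b1": ["root", "l1", "l2"], "b2": ["root", "l3", "l4"],
        "l1": ["b1"], "l2": ["b1"], "l3": ["b2"], "l4": ["b2"]}
print(worst_case_transit_nodes(star))   # 1 (the hub)
print(worst_case_transit_nodes(tree))   # 3 (leaf-branch-root-branch-leaf)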
In practice it is not unusual to use a combination of techniques—for example, a national wide area backbone may comprise a tree of star networks, with each star collapsing short local access circuits into a central hub site, and each hub site connected over the backbone forming a tree topology. The tree may be supplemented by backup circuits or additional meshing between key sites (to form a partial mesh) for additional resilience, as discussed in the next section.
Wide area links are used to create internetwork backbones and are often the most expensive and critical component in internetwork design. However, wide area links, especially long-haul links, are among the less reliable components of an internetwork (partly because of the large distances and number of intermediate systems and carriers involved, and also because of problems associated with the local loop). Furthermore, because of the relatively low bandwidths and high tariffs associated with these links, they are prone to congestion, since they are often operated at high utilization to maximize cost efficiencies, and current traffic engineering practices are not especially intelligent. These factors are all compelling reasons for considering fault tolerance and load-splitting techniques to supplement the design.
Note that adding capacity or additional circuits is not normally a problem, but if you intend to change capacities on a working network or modify routing behavior in any way, then it is advisable to perform some basic modeling beforehand, as the consequences of redirecting flows could lead to unexpected results.
Consider the five-node star network illustrated in Figure 6.6(a). This topology is a traditional regional leased-line network with a central hub site. If any of the satellite nodes or links fail, then no other node is affected (assuming that important services are not placed at a satellite site). However, if the central site equipment fails, then the whole network is broken and no communication is possible between any of the four satellite sites. To improve fault tolerance there are several options, including the following:
Add more hub links—Each of the satellite links could be doubled up. This can improve performance through load sharing but will only provide resilience against link failures (if the central site fails, we lose all connectivity) and might be cost prohibitive. This design option clearly does not scale.
Add dial-up links—Dial backup links (such as those provided by ISDN or analog modems) could be utilized as illustrated in Figure 6.6(b). Instead of deploying dial-up in a spoke configuration, dialing between sites is enabled to offer routing around central site failures. For example, on site C this would require the dial-up software to monitor either physical link status on the primary leased line or the reachability to site HQ (this could be achieved through the routing table). If either fails, then the dial-up link would automatically be invoked either to dial site D specifically or to dial a pool of addresses for even greater resilience (a rough sketch of this monitoring logic follows this list).
Add more direct links—Employ partial or full-circuit meshing so that site-to-site communications can continue if the central hub fails.
Add a second hub—A twin-star network requires a second hub and is inherently more reliable (see Figure 6.6[c]). This requires twice as many hub links but also means that the second hub could be placed at a different location, so that complete failure of the primary hub site can be dealt with. To save costs, the secondary hub links could be dial-up links (e.g., ISDN).
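As promised above, the following Python sketch illustrates the kind of monitoring logic behind the dial-up option: poll reachability to the HQ site, invoke a (placeholder) dialer after a run of failures, and clear the call once the primary has been stable for a hold-down period. The hostname, timer values, ping options (Linux-style -c/-W), and dialer calls are all assumptions; in practice this logic lives inside the router's dial-on-demand software.

import subprocess
import time

PRIMARY_TARGET = "hq-router.example.net"   # assumed probe address for site HQ
POLL_INTERVAL = 5                          # seconds between reachability probes
FAIL_THRESHOLD = 3                         # consecutive failures before dialing
HOLD_DOWN = 60                             # seconds of stability before hanging up

def primary_reachable():
    # Single ICMP probe toward HQ over the primary path.
    return subprocess.call(["ping", "-c", "1", "-W", "2", PRIMARY_TARGET],
                           stdout=subprocess.DEVNULL) == 0

def bring_up_backup():
    print("dialing backup link")           # placeholder for the ISDN/modem dialer

def tear_down_backup():
    print("clearing backup link")          # placeholder for call teardown

failures, backup_up, recovered_since = 0, False, None
while True:
    if primary_reachable():
        failures = 0
        if backup_up:
            recovered_since = recovered_since or time.time()
            if time.time() - recovered_since >= HOLD_DOWN:
                tear_down_backup()
                backup_up, recovered_since = False, None
    else:
        failures += 1
        recovered_since = None
        if failures >= FAIL_THRESHOLD and not backup_up:
            bring_up_backup()
            backup_up = True
    time.sleep(POLL_INTERVAL)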
Figure 6.6: (a) Traditional leased-line star topology. (b) Leased-line star topology, supplemented by dial-up links. (c) Twin star topology with leased lines.
Simple star network topologies offer good performance (fewer hops) but poor fault tolerance and a higher overall cost. A partial-mesh topology has shorter direct circuits and more inherent resilience.
An attractive low-cost option for resilience is the use of dial-up links as backups for the primary backbone interfaces, illustrated in Figure 6.6(b). This feature is sometimes referred to as Dial on Demand Routing (DDR). Since dial-up links are charged primarily on a call-use basis, these facilities can be very cost effective, since they are likely to be idle for much of the time and only called into action when a failure occurs. The tariff is, therefore, likely only to comprise a one-off connection charge and a relatively low monthly rental.
A traditional low-bandwidth solution for this would be via features such as DTR dial or V.25bis through external modems. Many routers, for example, include support for at least one of these modem control methods. A more attractive solution is ISDN, since it allows multiple 64-Kbps channels to be bonded dynamically and provides almost instant backup for failed backbone links. Many router vendors offer sophisticated monitoring facilities so that several primary links can be backed up, and different destinations can be dialed depending upon which link failed. When the primary link recovers (possibly after some hold-down timer to avoid link oscillations), the ISDN line is automatically disconnected to avoid further line charges. ISDN vendors may also offer the capability to add bandwidth on demand, a useful way of dynamically topping up the primary link bandwidth if the utilization reaches a configured threshold.
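The bandwidth-on-demand idea amounts to a threshold check with hysteresis, as the sketch below shows. The 80 percent and 50 percent thresholds are assumed values; in a real deployment the decision and the channel bonding are made by the router's ISDN software.

ADD_THRESHOLD = 0.80    # add a 64-Kbps B channel above this utilization (assumed)
DROP_THRESHOLD = 0.50   # drop it again below this level (hysteresis avoids flapping)

def channels_needed(utilization, channels_up):
    # Return the desired number of B channels for the current utilization sample.
    if utilization > ADD_THRESHOLD:
        return channels_up + 1
    if utilization < DROP_THRESHOLD and channels_up > 1:
        return channels_up - 1
    return channels_up

channels = 1
for sample in (0.42, 0.85, 0.91, 0.60, 0.45):
    channels = channels_needed(sample, channels)
    print(f"utilization {sample:.0%} -> {channels} channel(s)")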
Typical ISDN call-setup times are in the order of a few seconds, depending upon the amount of negotiation required at the link layer. It is worth noting that on some international ISDN connections the call setup time may be in excess of ten seconds (due to additional latency introduced by the convoluted way that end-to-end circuit switch paths are routed).
If a point-to-point network is required, then some simple steps could be taken to improve resilience, ranging from a partial-mesh topology, Figure 6.7(a), to a full mesh, Figure 6.7(b) (although the costs of a full mesh could be prohibitive). If the backbone is routed, then all of these additional links would be active, and any site-to-site traffic could be offloaded from the HQ site, which previously acted as the transit point for all traffic. Clearly, the additional links will add cost, so, depending upon traffic loads, there could be some waste, since any additional links need enough capacity to carry rerouted traffic in the event of a failure. If there is any form of meshing in the routed backbone, routers can automatically reconfigure around failed WAN links using routing algorithms to calculate alternate routes. This allows services to continue operating, although there may be user disconnection at the point of router reconvergence, depending upon application timeouts.
Figure 6.7: (a) Partial mesh leased line network. (b) Fully meshed leased line network.
One of the benefits of multiple parallel links would be the option to split the load over the link pairs, hence reducing response times and increasing throughput. Standard data link protocols such as MLPPP are available to provide this functionality for bridges and routers transparently; if parallel connections are made (sometimes called bonding) between the same router pair, then packets will be delivered in sequence at the far end. If MLPPP is not running, then routing protocols such as OSPF and EIGRP can provide load sharing. A redundant star is a reasonable topology to choose if there is relatively little traffic between remote sites or if mission-critical traffic moving between corporate and remote sites is highly sensitive to delay. Figure 6.8 illustrates a topology where both multilink and multipath routing are available. Here we see that between routers R1 and R2 multilink PPP is used over three parallel links to distribute traffic evenly. This is transparent to any routing protocol. Between Site A and Site D equal-cost multipath routing is available via protocols such as OSPF, since the aggregate metrics for the two data paths, A-R1-R2-R3-D and A-R1-R5-R4-R3-D, are equal to 1,960 (note that we assume a directed graph, so the cost from Site A to R1 is zero).
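The equal-cost test is simply a comparison of summed link metrics. In the sketch below the individual link costs are assumptions chosen so that both Figure 6.8 paths total 1,960 (the real values are those marked on the figure); the zero cost from Site A into R1 reflects the directed-graph assumption noted above.

LINK_COST = {
    ("A", "R1"): 0,                     # directed graph: Site A into R1 costs zero
    ("R1", "R2"): 975, ("R2", "R3"): 975,
    ("R1", "R5"): 650, ("R5", "R4"): 650, ("R4", "R3"): 650,
    ("R3", "D"): 10,
}

def path_cost(path):
    # Sum the directed link costs along a path given as a list of nodes.
    return sum(LINK_COST[(a, b)] for a, b in zip(path, path[1:]))

paths = [["A", "R1", "R2", "R3", "D"],
         ["A", "R1", "R5", "R4", "R3", "D"]]
costs = [path_cost(p) for p in paths]
print(costs)                            # [1960, 1960]
best = min(costs)
print(sum(1 for c in costs if c == best), "equal-cost paths available for load sharing")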
Figure 6.8: Logical routing topology where both multilink and multipath load sharing are employed.
Routers typically support load balancing on either a per-packet or a per-destination basis. Per-packet load balancing, sometimes called round robin, is recommended where the WAN links are relatively slow (e.g., less than 64 Kbps). However, with round robin packets can arrive out of sequence at the remote end, so the application must be capable of dealing with this (TCP can handle this but many other applications cannot). For faster link speeds load balancing on a per-destination basis is recommended. If firewalls are used in resilient pairs, you may need to use session-based load balancing to ensure sessions are routed explicitly through a particular firewall, since many firewalls cannot tolerate asymmetrical connections or out-of-sequence packets.
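The distinction between the two modes can be sketched in a few lines of Python. The link names and addresses below are placeholders, and the CRC-32 hash simply stands in for whatever per-destination (or per-session) hashing a particular router implements.

import zlib
from itertools import count

LINKS = ["link-1", "link-2"]            # two parallel WAN circuits (assumed)
_packet_counter = count()

def per_packet(packet):
    # Round robin: successive packets alternate links, so reordering is possible.
    return LINKS[next(_packet_counter) % len(LINKS)]

def per_destination(dst_ip):
    # A hash pins each destination to one link, preserving order within a flow.
    return LINKS[zlib.crc32(dst_ip.encode()) % len(LINKS)]

for i in range(4):
    print("per-packet     :", per_packet(i))
print("per-destination:", per_destination("192.0.2.10"),
      per_destination("192.0.2.10"))    # same destination, same link every time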
Figure 6.9 illustrates a more flexible and more resilient topology, using a meshed-switched WAN network such as ATM, X.25, SMDS, or Frame Relay (this may also be more cost effective for higher-bandwidth applications). The service provider is responsible for providing a highly meshed topology inside the cloud, and this offers a higher degree of resilience overall. The nice thing about this option is that the backbone is effectively outsourced, and it is up to the service provider to manage the internals of the network and to ensure that it offers a high level of availability.
Figure 6.9: Resilient connections into a switched cloud via dual connections to different PoPs.
One thing to watch here, however, is the number and location of the PoPs provided by the service and how many switches are actually available at the PoP. For example, you may choose to build a highly resilient corporate LAN at HQ, with multiple routers providing separate interfaces from the HQ site into the cloud, thinking that this will provide even better resilience for all corporate services (see Figure 6.9). However, if the PoP distribution is quite sparse, you may find in reality that you are hooking these routers back into a single switch inside the cloud, defeating much of what you thought you were achieving. This situation is not unusual; it is common in developing countries and where providers roll out a new service (typically the provider will deploy PoPs in major population centers first, and then spread out and improve density as the subscriber levels increase). Even in large cities it is by no means guaranteed that you will be diversely routed. This basic topology issue has compromised several very high profile organizations. To be absolutely sure, always check with your service provider.
So far we have assumed that we know when a WAN link has failed. In reality this is not quite as straightforward as it might seem. Leased lines, for example, do not always fail cleanly; Bit Error Rates (BERs) may increase over time until the line becomes effectively unusable, or lines may fail cleanly and then bounce back into life several times (sometimes referred to as link flapping). An intermittent problem is much more problematic than a clean failure. Clearly, if this is a routed network, we do not want link-state protocols such as OSPF to be forced to reconverge every time a WAN link oscillates; otherwise, the whole network could be rendered unusable, and for this reason even highly sensitive protocols employ damping features. The mechanisms normally used to detect and handle link failure include the following:
Physical signal monitoring—Low-level software or firmware monitors the state of key signals on the WAN interface. Typically some hysteresis is implemented to avoid excessive and continuous transitions. Recognized state changes are reported up to higher-level protocols.
Link status messages—Status messages are sent across the link to monitor link integrity on a periodic basis. Proprietary serial protocols implement Breath of Life (BOFL) features, while standards-based protocols such as PPP use a feature called Link Quality Monitoring (LQM), where statistical reports on link status are periodically sent over the wire. If the link is unavoidably error prone, then connection-oriented data link protocols such as LAPB may be required to provide buffering and guaranteed delivery.
Signal monitoring capabilities are dependent upon the media type and the hardware components used by the manufacturer in the system design. Link status mechanisms such as PPP's LQM are media independent. Note that some media types provide link monitoring features as standard (e.g., link-pulse monitoring in Ethernet point-to-point copper and fiber).
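The hysteresis mentioned above can be modeled as requiring a new state to persist for several samples before it is reported upward, so that transient flaps are suppressed. The sample count and the flap sequence in this sketch are illustrative assumptions.

DAMPING_SAMPLES = 3   # consecutive identical samples required before reporting

def damped_states(raw_samples):
    # Yield the state reported to higher layers for each raw physical-layer sample.
    reported, candidate, run = "up", "up", 0
    for sample in raw_samples:
        if sample == candidate:
            run += 1
        else:
            candidate, run = sample, 1
        if candidate != reported and run >= DAMPING_SAMPLES:
            reported = candidate
        yield reported

flapping_link = ["up", "down", "up", "down", "down", "down", "up", "up", "up"]
print(list(damped_states(flapping_link)))
# brief flaps are ignored; only the sustained "down" and recovery are propagated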
Local area networks are generally more resilient than wide area networks due to their local scope, bandwidth availability, and overall topological robustness. We can summarize a number of common causes of failure in LAN media as follows:
Cabling failure—Media cable failure (bad termination, cable break, etc.), AUI cable failure.
Connector failure—Patching failure, loose connectors.
Concentrator failure—Hub or switch failures.
Network interface failure—Faulty network interface controller (PC NICs, transceiver failures).
There are also several techniques and tools available to build fault-tolerant LANs—some proprietary, some standards based, including Spanning Tree bridges and switches, source route bridging, and redundant link repeater protocols. We discuss these techniques in the following sections.
Local area backbones are a critical part of many enterprise designs. Regardless of advances in distributed applications, the traffic pattern of many networks today is still highly asymmetrical, with users accessing services concentrated at one or more central sites (often for pragmatic reasons such as cost efficiencies and simplified management). It follows that much of the network traffic may be aggregated at these central locations, and high-speed resilient backbones are required to attach high-speed servers and to carry many concurrent user sessions.
A number of popular LAN technologies incorporate fault-tolerance features as standard. FDDI and Token Ring, for example, incorporate dual self-healing rings. Token Ring Multistation Access Units (MAUs) can detect some media connection failures and bypass the failures internally. FDDI dual rings can wrap around failed sections of the ring by moving traffic automatically onto a backup ring. FDDI also incorporates Spanning Tree protocol for resilient meshed concentrator topologies. From a local router, bridge, or switch perspective, media failures can normally be bypassed automatically as long as there are alternate paths available in the network. All of these features enable the network designer to build fairly robust LAN backbones, and it is often easy and relatively inexpensive to deploy or make use of additional cables for topological resilience (much easier than the WAN environment, where additional links can be very costly to deploy).
Figure 6.10 illustrates the two commonly used LAN backbone configurations. Figure 6.10(a) shows a backbone ring topology. Here, total resilience relies on ring nodes being dual attached; otherwise, they run the risk of being isolated in the event of a single point of failure. The ring is easy to install and manage but may not be flexible enough for some sites and does not scale well (as the ring grows, the distance between nodes increases, and often the medium is a shared-access technology such as FDDI). Figure 6.10(b) shows a backbone mesh topology comprising a fabric of interconnected switches. This topology is highly flexible, and switches may be deployed in a hierarchical or flat configuration to suit the application. The design is also scalable: new switches can easily be added to increase overall capacity, and nodes requiring high reliability can deploy multiple diverse links. Traffic segmentation and the ability to support full-duplex, point-to-point links at the switch level mean that bandwidth is also managed more efficiently. For further information on particular media types refer to [2].
Figure 6.10: Topological resilience in LAN backbones. (a) Self-healing fiber ring backbone (e.g., FDDI, Token Ring), (b) Meshed-switched backbone (e.g., FDDI, Fast Ethernet, Gigabit Ethernet, ATM).
Routers and Layer 3 switches maintain loop-free networks using standard routing protocols (such as RIP, EIGRP, and OSPF) in combination with path metrics. These protocols typically create multiple Spanning Trees (one for each source-destination address pair), so traffic routing is much more efficient than a simple bridged or switched network (which relies on the Spanning Tree protocol). Routing protocols detect topology problems and automatically reroute around failed devices or segments. They may also offer load balancing.
Bridges and switches maintain loop-free networks using the Spanning Tree Protocol (STP). STP detects topology loops and automatically reroutes around failed devices or segments. STP creates a single shortest-path tree for the whole network (i.e., a Minimum Spanning Tree, or MST). Ports are placed in a blocking state if loops are detected; this is undesirable for expensive wide area links, and several vendors have added proprietary extensions to handle WAN link load sharing in tandem with STP (e.g., Xyplex and Vitalink). For further information, refer to [2].
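The contrast can be illustrated with a toy three-switch triangle: a much simplified spanning tree calculation leaves one link blocked, whereas a routed network would keep every link in service. This is not the real STP algorithm (no bridge priorities, path costs, or BPDUs); the topology and the root-election rule are assumptions for illustration only.

from collections import deque

adj = {"S1": ["S2", "S3"], "S2": ["S1", "S3"], "S3": ["S1", "S2"]}

root = min(adj)                       # lowest "bridge ID" wins the election
tree_links, seen, queue = set(), {root}, deque([root])
while queue:
    u = queue.popleft()
    for v in adj[u]:
        if v not in seen:
            seen.add(v)
            tree_links.add(frozenset((u, v)))
            queue.append(v)

all_links = {frozenset((u, v)) for u in adj for v in adj[u]}
blocked = all_links - tree_links
print("forwarding:", [sorted(l) for l in tree_links])
print("blocked   :", [sorted(l) for l in blocked])   # unused by the tree, but a
                                                      # router could still use it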
At the physical level there are various link and interface monitoring techniques, such as carrier sensing and the use of link-pulse signaling in 10BaseT and FOIRL. These techniques are useful for detecting point-to-point problems and are used to alert higher-level bridging and routing protocols of a possible change in topology. Managed repeaters may also implement a form of proprietary loop control, in particular those with Ethernet connectivity. It is important to understand that there is no standard for resilient link management at the repeater level, although several manufacturers (including Cabletron, Case Communications, Xyplex, and Ungermann-Bass) employ similar techniques in their managed hubs. In this section we refer to this feature generically as Redundant Link Management (RLM). RLM provides controlled topology changes in the event of link or repeater failures. During the late 1980s and early 1990s routers were expensive, often single protocol, and many networks were no more than extended bridged LANs. For mission-critical networks, requirements for topological convergence on the order of a few seconds are not uncommon, and the standard Spanning Tree protocol was considered far too slow for such applications. Using RLM and repeaters also removed the requirement for expensive bridges where traffic filtering was not the primary concern.
RLM is typically implemented using a simple master-slave relationship; hubs that operate as a master poll one or more slave hubs. The designer typically configures polling parameters (such as the interfaces used, target addresses to poll for, poll intervals, retry counts, etc.). Some implementations allow the designer to set up a number of redundant link groups, where each group is an abstraction for a particular failure scenario. For example, the configuration defines a primary link and two backups for group 1 (shown in the following code segment). The backup links are disabled in software. The primary interface (e0) issues polls for 193.24.2.1 and 193.24.2.21, and the topology remains stable unless either target fails to respond within 3 × 1 seconds (in practice the target addresses could represent the LAN and WAN interface cards on a remote slave hub). In a failure scenario the first backup port in the list is activated and the primary interface is disabled. Note that the backup link may have a completely different target list depending upon the physical topology of the network.
Group  If  Status   Interval  Retry  Targetlist
1      e0  primary  1         3      193.24.2.1 AND 193.24.2.21
1      e1  backup   1         3      190.4.4.1 AND 190.4.4.6
1      e4  backup   1         3      150.20.6.99
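A rough sketch of how a master hub might act on this group definition is shown below. The dictionary mirrors the configuration segment; the ping options and the enable/disable calls are placeholders rather than any vendor's hub API, and the behavior is only an approximation of the polling described above.

import subprocess
import time

GROUP_1 = {
    "primary": {"if": "e0", "targets": ["193.24.2.1", "193.24.2.21"]},
    "backups": [{"if": "e1", "targets": ["190.4.4.1", "190.4.4.6"]},
                {"if": "e4", "targets": ["150.20.6.99"]}],
    "interval": 1, "retry": 3,
}

def targets_alive(targets):
    # All targets in the list must answer (the AND in the target list).
    return all(subprocess.call(["ping", "-c", "1", "-W", "1", t],
                               stdout=subprocess.DEVNULL) == 0 for t in targets)

def disable(iface):
    print(f"disabling {iface}")       # placeholder for real port control
def enable(iface):
    print(f"enabling {iface}")

def run_group(group):
    active, failures = group["primary"], 0
    while True:
        if targets_alive(active["targets"]):
            failures = 0
        else:
            failures += 1
            if failures >= group["retry"] and group["backups"]:
                disable(active["if"])                 # swap to the first backup port
                active, failures = group["backups"].pop(0), 0
                enable(active["if"])
        time.sleep(group["interval"])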
The protocol used to poll and pass status information is often proprietary (some implementations incorporate standard ping and SNMP operations also). Low-level hardware features (such as link pulse) are often employed to speed up detection of physical link failures. RLM implementations vary significantly; more sophisticated techniques may distribute topology information between hubs, enabling discovery and increased automation. Implementations relying on the simple polling mechanism rely heavily on the designer to ensure that loops are not formed in certain failure scenarios; this is particularly important for designs where multiple hubs are operating as master, and some hubs are both master and slave.
For example, consider the simple designs in Figure 6.11. The first design illustrates a simple topology, with three Ethernet repeaters protected by two pairs of dual fiber links. In this case H2 and H3 will both be polling H1. If either hub fails to receive a response from H1 in the allotted window, it will disable its primary interfaces and swap over to the backup links. Note that this design protects only against cable failure; if H1 dies, then H2 and H3 will most likely oscillate their interface status until H1 comes back online. H1 is a Single Point of Failure (SPOF) for the whole network. In Figure 6.11(b) we see an improved design. No hub is a single point of failure, and we use one less cable. H1 is polling H3 via H2, and in the event of a failure, H1 will automatically open up link 1-3 but is configured not to disable its primary interface. This covers the case where either link 1-2 or link 2-3 dies. If H3 itself dies, then we still have a link between H1 and H2.
Figure 6.11: Basic redundant link configurations. (a) A simple star configuration requiring four physical cables to provide link resilience but having a single point of failure at hub H1. (b) An improved configuration with only three cables and no single point of failure.
RLM is nonstandard, and, therefore, implementations rarely interoperate in multivendor environments (it may be possible to poll devices if a standard polling scheme such as ping is used). RLM traffic is in-band, making it prone to false-positive behavior in very busy networks (in extreme cases links can oscillate during busy periods unless long timeouts are configured and damping mechanisms are introduced). Beyond very simple designs there are some challenging subtleties and inconsistencies to resolve, and I would not recommend attempting them. In general it is best to keep the design simple, reduce the number of masters to a minimum, ensure all timers are consistent with the expected failover behavior, and test all scenarios religiously to ensure that there are no surprises. Then document the design thoroughly. While RLM is a very cost-effective way of improving topological resilience, the potential pitfalls and likely maintenance issues make it less and less attractive as fast switches and routers become available as low-cost commodity products.
Segmentation is a widely used design technique in LANs. It reduces the scope of media failure by dividing the network into smaller segments and spreading users and key servers over multiple devices. For example, if you have 200 workstations attached to a single switch, this represents a single point of failure. Adding another switch immediately reduces the impact of a single hub failure to 50 percent of the user base. Both switches could be connected via two backbone links for resilience. Clearly, you can take this model as far as is practical and cost effective. The design becomes more complex as you split device responsibilities, and this complexity manifests itself in the interconnection topology. As a general rule complexity is the enemy of reliability and must also be kept in check.
As discussed in [2], introducing Layer 2 switches (or bridges) partitions the network to create multiple traffic domains. This further improves availability by reducing overall traffic levels. Switches forward traffic intelligently so that traffic is forwarded only to the appropriate LAN segments (instead of being repeated over the whole LAN). Switches also ensure that frame errors such as CRCs are not forwarded. Introducing Layer 3 routers and switches creates partitioning at the Network Layer, improving availability even further. Routers suppress broadcast storms and perform more optimal packet forwarding to reduce overall traffic. Routers also support more intelligent protocols that can resolve topological loops very rapidly (e.g., OSPF).
For mission- or business-critical servers it is often appropriate to implement high availability and even fault tolerance. These devices represent a very small percentage of the overall end-system population, and it is particularly important that they remain online. Providing fault tolerance at the user desktop level is, however, likely to be prohibitively expensive for all but a very small number of mission-critical applications (e.g., online traders, critical health care, process control, etc.). There are more cost-effective solutions that can be used to improve availability; even here the cost burden rises significantly with the number of users. These techniques also have a layer of complexity that can make the infrastructure more difficult to manage and maintain.
Servers are generally installed in or near equipment rooms, so access to multiple network points is generally not an issue. Deploying multiple network outlets at the user desk position can be a relatively inexpensive exercise, if done at the time the cabling system is installed. This technique provides a backup network point. If the primary network connection fails, then the user can manually connect to the second outlet. Unless the second port is permanently online (this would require double the port density in concentrator equipment, with additional costs), the other end of the cable will need to be manually patched by network support staff. Clearly, if the second outlet is wired back to the same card in the same hub, then only cable resilience is provided. For improved availability the second cable should be diversely routed to alternate concentrator equipment. For example, with a dealer desk scenario we could diversely route alternate desk positions to different equipment rooms, so that any single point of failure in the network would take down only a fixed percentage of dealer terminals.
Dual-port transceivers provide two interfaces from the user floor/desk point, which can be diversely routed (see Figure 6.12). The transceiver has limited built-in intelligence to sense the status of the primary link and automatically switches to the backup if a failure is detected. The advantage of this approach is that it is transparent to the end user's application stack and requires no intervention by the user.
Figure 6.12: User workstation wired back to two communications rooms using a dual port transceiver connected to different floor outlets.
Two network interface cards can be installed in the desktop PC/workstation. This provides resilience right inside the user's node and protects against cable and network attachment failure. This approach can, however, prove to be problematic; you should check that the applications and protocol stack are able to cope with multiple interfaces, network addresses, and MAC addresses. You should also ensure that gateway configuration or gateway discovery is handled correctly under all circumstances. It is common practice to install dual NIC cards in mission-critical fault-tolerant servers. Often the protocol stacks in these systems have been modified to accommodate multiple NICs. NIC interfaces should be diversely wired back to different LAN segments or to discrete concentrator equipment.
A UPS and backup or mirrored hard disk medium can be installed if required. This is rarely applicable for desktop users for cost reasons.
As indicated, it is unlikely that you will be able to cost justify high availability at the desktop level, especially in a large network. Even so, there are steps you can take to improve availability of desktop machines through management processes; you should at least take steps to deploy instrumentation that will monitor performance regularly. Desktop devices frequently have outages, and desktop operating systems have a tendency to degrade over time; as new software is installed, deinstallers fail to clean up properly, and disk drives start to underperform. Often these problems go unnoticed on a large network, since users simply reboot, and problems often go unreported. There is a culture of expectation that desktop machines will frequently fail as part of their normal operation (the machine I am writing this chapter on failed to boot properly when started; I simply rebooted and forgot about it).
To improve overall reliability it is recommended that you implement the following measures:
Instrumentation—Take regular measurements of response times, and maintain statistics on reboots and any protocol errors.
Control—The responsibility for desktops should be centralized, so that OS versions and vendor platforms can be standardized. This will greatly improve the deployment, maintenance, and fault-analysis processes.
Classification—Classify the supported platforms according to capacity, performance, and any fault-tolerance requirements—for example, engineer, power user, executive, administration.
Certification—Initiate a certification process for determining which software will and will not be allowed on the network and which software will be supported on each platform.
Burn-in—Initiate physical testing procedures for each piece of software before it is allowed on the network. This will ensure that you get first-hand experience of any possible problems in a controlled environment without ending up fire fighting the whole network.
Build images—For each platform create a standard build image on CD. This will enable you to rapidly deploy or reconfigure failed machines without having to go through a complete install.
Spares—Maintain a good stock of key spares so that you can quickly replace dead monitors, systems, mice, keyboards, and media.
All of these measures will help to maintain overall reliability in the desktop environment.