Lesson 1: Designing a Highly Available Network Topology | MCSE Training Kit (Exam 70-226): Designing Highly Available Web Solutions with Microsoft Windows 2000 Server Technologies (MCSE Training Kits)

One of the most effective ways to ensure a Web site’s high availability is to use redundant hardware and software. Careful use of redundancy allows a data center or network system to tolerate failures of individual components and computers. A highly available topology eliminates any single point of failure. This type of topology will likely include redundant components, paths, and services. For example, a network design might include planning for redundant network interface cards (NICs), routers, and switches so that an individual component failure can occur without affecting a system’s overall availability. In this lesson you’ll learn how to design a network topology that includes redundant components, paths, and services.

After this lesson, you will be able to

Describe the three types of redundancy—components, paths, and services—that you can incorporate into a network topology
Design a highly available network topology that incorporates redundancy

Estimated lesson time: 30 minutes

Network Topology

In a complex system it’s often possible to include redundant components, paths, and services to ensure a highly available network design.

Redundant Components

Redundant components can refer to hardware within a computer (such as hot swappable drives), completely duplicated computers, or network components outside a computer, such as routers, switches, and hubs. This section focuses on network components outside a computer. Duplicated computers are discussed later in this lesson (in the section entitled "Redundant Services"), and redundant components within a computer are discussed in Chapter 3, "Server Configurations."

Hubs and Switches

A hub is a network-enabled device joining communication lines at a central location, providing a common connection to all devices on the network. Active hubs require electrical power but are able to regenerate and retransmit network data. Passive hubs simply organize the wiring. When a hub receives a transmission, it broadcasts traffic to all ports.

A switch is a computer or other network-enabled device that controls routing and operation of a signal path. In clustering, a switch is used to connect the cluster host’s network interface to a router or other source of incoming network connections. Rather than broadcasting the traffic to all ports, a switch establishes a direct path between two ports on the switch so that multiple pairs of ports can communicate without collisions. As the price of switches has decreased, their use has become more common in network topologies.
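The difference between the two devices can be sketched in a few lines of Python. This is an illustrative model only: the Hub and Switch classes, the port numbers, and the single-letter addresses are hypothetical, and the address-learning step is standard switch behavior that this lesson doesn't describe in detail.

```python
from typing import Dict, List


class Hub:
    """Repeats every frame out of all ports except the one it arrived on."""

    def __init__(self, port_count: int) -> None:
        self.port_count = port_count

    def forward(self, in_port: int, src: str, dst: str) -> List[int]:
        return [p for p in range(self.port_count) if p != in_port]


class Switch:
    """Learns which address sits behind which port and forwards selectively."""

    def __init__(self, port_count: int) -> None:
        self.port_count = port_count
        self.table: Dict[str, int] = {}  # address -> port, filled in as frames arrive

    def forward(self, in_port: int, src: str, dst: str) -> List[int]:
        self.table[src] = in_port  # remember where the sender is attached
        if dst in self.table:
            return [self.table[dst]]  # direct path to a single port
        # Unknown destination: flood like a hub until the address is learned.
        return [p for p in range(self.port_count) if p != in_port]


hub = Hub(4)
switch = Switch(4)

hub_ports = hub.forward(0, "A", "B")    # always every other port
first = switch.forward(0, "A", "B")     # B not learned yet, so the frame is flooded
reply = switch.forward(1, "B", "A")     # A was learned on port 0
second = switch.forward(0, "A", "B")    # B was learned on port 1

print(hub_ports, first, reply, second)
```

Once both addresses have been learned, traffic between A and B touches only ports 0 and 1, which is why other pairs of ports can communicate at the same time without collisions.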

Multilayer switches provide the core network switching of many Web sites, particularly e-commerce sites. The switches can include the connectivity of Web, application, and database servers. As a result, they need to deliver high-performance Layer 2 and Layer 3 switching while supporting services that meet the requirements for availability, scalability, and security in a Web environment.

For example, multilayer switches must support high-speed interfaces, redundant power supplies, quality of service (QoS), virtual local area networks (VLANs), high port density, and rapid fault recovery. In addition, the switches must be able to carry a large number of user connections while providing Layer 3 forwarding at millions of packets per second (pps) in order to ensure that the switch isn’t a performance bottleneck in the network architecture.

Network hubs and switches are very reliable, but they do fail. Consequently, using redundancy is very important. Figure 2.1 shows how redundant switches are used for each network segment.

Notice that the redundant switches provide redundant paths of network connectivity. The switches can aggregate doubled-up connections, which allows two connections to each segment’s hub. Each server and each clustered server has two NICs configured per local area network (LAN) segment, teamed by using the vendor’s teaming solution. Because of this redundancy, the network segment continues to function after the failure of any single item.

Figure 2.1 - A multitiered network with redundant switches on each subnet

Routers

A router is a network device used to connect networks of different types, such as those using different architectures and protocols. Routers work at the network layer and can switch and route packets across multiple networks, which they do by exchanging protocol-specific information between separate networks. Routers determine the best path for sending data and keep broadcast traffic confined to the local segment.

Although routers don’t fail very often, they can still fail. When they do, an entire Web site can become unavailable. Any single router that’s placed between network segments represents a point of failure. Note that in Figure 2.1, the router that connects to the Internet and the router between subnet 2 and subnet 3 each represent a single point of failure. Having redundant routing capability at each single point of router failure is critical to a highly available network configuration.

In Figure 2.2, a second Internet connection has been added to the network topology. As a result, component redundancy is achieved through the second router and path redundancy is achieved through the second Internet connection.

Figure 2.2 - A network topology with a redundant Internet connection
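One way to check a design for single points of failure is to model it as a graph and remove one device at a time. The sketch below does this for two small topologies. The device names and links are hypothetical and don't reproduce Figures 2.1 or 2.2; the function names are also invented for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def build(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph from a list of links."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph: Dict[str, Set[str]], start: str, removed: str) -> Set[str]:
    """Return every node reachable from start while `removed` is offline."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def single_points_of_failure(graph: Dict[str, Set[str]],
                             source: str, target: str) -> List[str]:
    """List devices whose failure alone cuts the source off from the target."""
    return [node for node in sorted(graph)
            if node not in (source, target)
            and target not in reachable(graph, source, removed=node)]


# Hypothetical topology: one router in front of two redundant switches.
one_router = build([
    ("internet", "router1"),
    ("router1", "switchA"), ("router1", "switchB"),
    ("switchA", "web"), ("switchB", "web"),
])

# The same topology with a second router and Internet connection added.
two_routers = build([
    ("internet", "router1"), ("internet", "router2"),
    ("router1", "switchA"), ("router1", "switchB"),
    ("router2", "switchA"), ("router2", "switchB"),
    ("switchA", "web"), ("switchB", "web"),
])

before = single_points_of_failure(one_router, "internet", "web")
after = single_points_of_failure(two_routers, "internet", "web")
print(before, after)
```

In the first topology only the router is reported, because the switches already back each other up. Adding the second router leaves the list empty.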

Redundant Paths

You can implement redundant connectivity at several points in your network topology: within each LAN segment, at your connection to the Internet Service Provider (ISP), or through multiple sites.

LANs

A LAN is a communications network connecting a group of computers, printers, and other devices located within a relatively limited area (for example, a building). A LAN allows any connected device to interact with any other on the network. LANs allow users to share storage devices, printers, applications, data, and other network resources. In a multitiered network environment, LANs are often segmented into smaller units (subnets). You can use redundant switches in each subnet to provide redundant LAN connectivity, as shown in Figure 2.1.

ISPs

An ISP is an organization that provides individuals or companies access to the Internet and the Web. In most cases a network is connected to an ISP through a T1, DS3, OC3, or OC12 connection. To reduce the risk of a single route failure, many Web sites introduce a second connection to the Internet, as shown in Figure 2.2.

Multiple Sites

To ensure high availability, some organizations implement a live alternate site, which means setting up the same service at two locations. Figure 2.3 demonstrates one way to split a Web service over two sites.

You can construct a multisite architecture in several ways. The architecture typically comprises a main Web site and one or more satellite sites that extend a company’s service offerings. The satellite sites can contain a portion of the main site or its entire architecture. The key factors that determine your choice of architecture are the degree of database synchronization desired between the sites and the amount of traffic that must be backhauled to a main site.

You can use a geographic load balancer (such as Cisco DistributedDirector) when a Web site is expanded to include geographically distributed sites. A geographic load balancer directs connection requests from clients to the site with the closest proximity based on information about the network topology. This process helps improve the response times of applications as seen by end users, especially when the geographic sites are widely distributed.

A geographic load balancer provides scalability to multiple sites and also delivers a high degree of availability by monitoring the state of each distributed e-commerce site. If a site is rendered inoperable, the geographic load balancer stops directing new client connections to the failed site.

Figure 2.3 - Multisite network architecture

The geographic load balancer’s primary function is to act as an authoritative DNS server for the site’s domain name (for example, www.microsoft.com). A client that wants to access a Web site issues a DNS query for the host name in the site’s URL. The load balancer receives the DNS query and responds with the unique IP address of the data center that will provide the best service to the end user.
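The selection logic of a geographic load balancer can be sketched as follows. This is a simplified model, not the behavior of any particular product: the site names, the addresses (taken from documentation ranges), the region codes, and the distance table are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sites and the address each data center answers on.
SITES: Dict[str, str] = {
    "seattle": "192.0.2.10",
    "london": "198.51.100.10",
}

# Hypothetical proximity metric per (client region, site); lower is closer.
DISTANCE: Dict[Tuple[str, str], int] = {
    ("us", "seattle"): 1, ("us", "london"): 8,
    ("eu", "seattle"): 8, ("eu", "london"): 1,
}


def resolve(client_region: str, site_up: Dict[str, bool]) -> Optional[str]:
    """Answer a DNS query with the address of the nearest site that is up."""
    live = [name for name in SITES if site_up.get(name, False)]
    if not live:
        return None  # no site can serve the request
    nearest = min(live, key=lambda name: DISTANCE[(client_region, name)])
    return SITES[nearest]


all_up = {"seattle": True, "london": True}
london_down = {"seattle": True, "london": False}

print(resolve("eu", all_up))       # nearest site for a European client
print(resolve("eu", london_down))  # falls back to the remaining site
```

The health table stands in for the monitoring described above: when a site is marked as down, new queries are simply answered with the address of the next-closest site.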

Redundant Services

An organization’s Web applications typically perform such mission-critical tasks as e-commerce, financial transactions, and intranet applications. Businesses can lose millions of dollars when mission-critical applications aren’t available, so it’s important that these applications are available at all times.

An easy and effective way to achieve availability and reliability for your services is to use redundant servers. If you have more than one server implementing your site and one of them fails, the processing requests can be redirected to another server. This provides a highly available Web site.
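A rough calculation shows why redundant servers help. The sketch below assumes that servers fail independently and that requests are redirected instantly and without error, neither of which holds perfectly in practice. The 99 percent availability figure is a hypothetical input, not a measurement.

```python
HOURS_PER_YEAR = 8760


def combined_availability(single: float, count: int) -> float:
    """Availability of `count` redundant servers when any one can carry the site.

    Assumes independent failures: the site is down only when every
    server is down at the same time.
    """
    return 1.0 - (1.0 - single) ** count


for servers in (1, 2, 3):
    availability = combined_availability(0.99, servers)
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    print(f"{servers} server(s): {availability:.6f} available, "
          f"about {downtime_hours:.3f} hours down per year")
```

Under these assumptions, a second server at 99 percent raises the combined figure to about 99.99 percent, which is less than one hour of downtime per year instead of more than 87.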

You can use two methods of implementing redundant services on multiple computers: use backup servers or implement load balancing and clustering.

Backup Servers

One of the most common methods for restoring service is the use of backup systems. You can achieve high availability by using a hot standby with automated failover or by swapping the failed system with spare systems already configured for use.

Hot Standby. In situations where prolonged outages cause severe problems, hot standby systems provide a way to recover quickly. You use the standby system to replace a failed system quickly, or in some cases you use it as a source of spare parts. Should a system have a catastrophic failure, it might be possible to remove the drives from the failed system or use backup tapes to restore operations in a relatively short time. This scenario doesn’t happen very frequently, but it does happen, in particular with CPU or motherboard component failures.

Hot standby systems are very expensive and complicated to manage, but their worth is measured by the reduced loss of service. One advantage to using standby equipment to recover from an outage is that the failed unit is available for a leisurely diagnostic to determine what failed. Getting to the root cause of the failure is extremely important to prevent repeated failures.

Standby equipment should be certified and running on a round-the-clock basis, just like the production equipment. You should monitor the equipment to make sure it’s always operational. Keeping the equipment running is important: if it isn’t running, you have no guarantee that it’ll be available when it’s needed.

Standby equipment is primarily used in data center operations, where it has the highest return on investment. However, where the costs of downtime are very high and clustering isn’t a viable answer, standby systems can provide reasonably fast recovery times. This is particularly true of process control, where loss of a computer can produce very expensive or dangerous results.
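Automated failover to a hot standby is commonly driven by a heartbeat: the standby takes over after the primary has been silent for a set number of checks. The sketch below models only that decision. The FailoverMonitor class, the threshold of three missed heartbeats, and the manual failback policy are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Promotes the standby after several consecutive missed heartbeats."""

    max_missed: int = 3
    missed: int = 0
    active: str = "primary"

    def record(self, heartbeat_received: bool) -> str:
        """Record one heartbeat check and return the system now in service."""
        if self.active == "standby":
            # Already failed over; in this sketch, failback is a manual step.
            return self.active
        if heartbeat_received:
            self.missed = 0  # a single good heartbeat resets the count
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.active = "standby"
        return self.active


monitor = FailoverMonitor(max_missed=3)
checks = (True, False, False, True, False, False, False, True)
history = [monitor.record(beat) for beat in checks]
print(history)
```

Requiring several consecutive misses keeps one lost heartbeat from triggering an unnecessary failover, at the cost of a slightly longer outage when the primary really has failed.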

Spare Systems. Using spare systems to replace failed systems is another technique for rapidly restoring service. Sometimes the replacement system becomes the primary system. In other cases, the failed system is returned to operation as the primary system after it has been repaired. The success of this approach depends on using standard configurations and a cost-effective procedure for keeping an adequate supply of spare systems.

Load Balancing and Clustering

Load balancing and clustering provide access to resources on a group of servers in such a way that the workload can be shared among multiple servers. Numerous vendors supply hardware- and software-based load balancing and clustering solutions for enterprise networking, including round-robin DNS, load-balancing switches, and software-based solutions such as Windows 2000 Cluster service and Network Load Balancing (NLB).

DNS Round Robin. Round robin is a technique used by DNS servers to distribute the load for network resources. This technique rotates the order of the resource record (RR) data returned in a query answer when there are multiple RRs of the same type for a queried DNS domain name. For example, suppose a query is made for the name of a computer that uses three IP addresses (10.0.0.1, 10.0.0.2, and 10.0.0.3), with each address specified in its own A-type RR. Table 2.1 illustrates how successive client requests will be handled.

Table 2.1 Rotation of RRs

Client Request	IP Address Return Sequence
First	10.0.0.1, 10.0.0.2, 10.0.0.3
Second	10.0.0.2, 10.0.0.3, 10.0.0.1
Third	10.0.0.3, 10.0.0.1, 10.0.0.2

The rotation process continues until data from all of the same-type RRs for a name have been rotated to the top of the list returned in client query responses.
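The rotation in Table 2.1 can be reproduced with a short Python sketch. The RoundRobinZone class is a hypothetical stand-in for the DNS server's record set, not part of any DNS implementation.

```python
from collections import deque
from typing import List


class RoundRobinZone:
    """Holds the A records for one name and rotates them after each answer."""

    def __init__(self, addresses: List[str]) -> None:
        self._records = deque(addresses)

    def answer(self) -> List[str]:
        """Return the records in their current order, then rotate by one."""
        response = list(self._records)
        self._records.rotate(-1)  # move the first record to the end of the list
        return response


zone = RoundRobinZone(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
responses = [zone.answer() for _ in range(4)]

for number, response in enumerate(responses, start=1):
    print(number, ", ".join(response))
```

The first three responses match the rows of Table 2.1, and the fourth starts the cycle again with 10.0.0.1 at the top of the list.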

Although DNS round robin provides simple load balancing among Web servers as well as scalability and redundancy, it doesn’t provide an extensive feature set for unified server management, content deployment and management, or health and performance monitoring. In the event of a failure, you have to remove the servers manually from the DNS tables.

The main advantage of round-robin DNS is that it requires no additional hardware—you just set up your DNS server properly and it works. However, several disadvantages prevent many sites from using round-robin DNS for load balancing:

The caching feature of DNS prevents complete load balancing because not every request that comes in will get its address directly from your DNS server.
You can solve the above problem by disabling caching, but doing so means that every lookup will have to be resolved by your servers, which is expensive and potentially slower for your users.
The DNS server has no way of knowing if one or more of the servers in your Web farm is overloaded or out of service. So the round-robin scheme will send traffic to all servers in turn, even if some are overburdened or dead. This can affect a site’s availability, although a browser user can hit the reload button to try again (and have a random chance of succeeding, assuming that caching doesn’t occur).

Because of this last issue, round-robin DNS isn’t used much (at least not by itself) for large or mission-critical Web farms. But you can use round-robin DNS for load balancing in a Web farm with two or three servers—or you can use it to balance load across two or three server clusters, each of which is load-balanced with one of the methods below. (The chances that an entire cluster will fail are quite small.)
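The effect of the last disadvantage is easy to quantify with a small simulation. The sketch assumes no caching and assumes that each client connects to the first address it receives. The addresses and the choice of which server is down are hypothetical.

```python
from itertools import cycle
from typing import List, Set


def failed_requests(servers: List[str], down: Set[str], request_count: int) -> int:
    """Count requests that round robin hands to a server that is out of service."""
    rotation = cycle(servers)  # round robin has no knowledge of server health
    return sum(1 for _ in range(request_count) if next(rotation) in down)


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
down = {"10.0.0.2"}

failures = failed_requests(servers, down, 300)
print(f"{failures} of 300 requests went to a dead server")
```

With one of three servers down, one request in three fails until someone removes the failed server's record by hand.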

Load-Balancing Switches. Load-balancing switches, such as Cisco LocalDirector, are hardware Internet scalability solutions that distribute TCP requests across multiple servers. Server load balancers help increase the scalability of an e-commerce site. Server load balancing works by distributing user requests among a group of servers that appear as a single virtual server to the end user. Its main function is to forward user traffic to the most available or the "best" server that can respond to the user. Server load balancers use sophisticated mechanisms to detect the best server. These mechanisms include finding the server with the fewest connections, the smallest load, or the fastest response times. They can also detect failed servers and automatically redirect users to the active servers. Ultimately, server load balancing helps maximize the use of servers and improves the response times to end users.

Hardware-based solutions use a specialized switch with additional software to manage request routing. For load balancing to take place, the switch first has to discover the IP addresses of all of the servers that it’s connected to. The switch scans all the incoming packets directed to its IP address and rewrites them to contain a chosen server’s IP address. Server selection depends on server availability and the particular load-balancing algorithm in use. The configuration shown in Figure 2.4 uses switches in combination with LocalDirector from Cisco Systems to distribute the load among three servers.
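One of the selection mechanisms mentioned above, fewest connections combined with failure detection, can be sketched as follows. This is a generic illustration and doesn't represent LocalDirector's actual algorithm or configuration; the Server class, the addresses, and the connection counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    address: str
    active_connections: int = 0
    healthy: bool = True


def pick_server(servers: List[Server]) -> Optional[Server]:
    """Choose the healthy server that currently has the fewest connections."""
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.active_connections)


def dispatch(servers: List[Server]) -> Optional[str]:
    """Assign one new connection and return the real address it was sent to."""
    server = pick_server(servers)
    if server is None:
        return None  # every server has failed
    server.active_connections += 1
    return server.address


farm = [
    Server("192.168.0.11", active_connections=4),
    Server("192.168.0.12", active_connections=1),
    Server("192.168.0.13", active_connections=0, healthy=False),
]

chosen = [dispatch(farm) for _ in range(4)]
print(chosen)
```

The idle server at 192.168.0.13 is never chosen because it's marked as failed. New connections go to 192.168.0.12 until its connection count catches up with 192.168.0.11.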

LocalDirector and similar third-party products provide more sophisticated mechanisms for delivering high-performance load-balancing solutions than round-robin DNS. These products are intelligent and feature-rich in the load-balancing arena; for example, they can transparently remove a server if it fails. However, they don’t provide broad and robust Web farm management tools, and you’ll need multiple switches to avoid making the switch a single point of failure for your entire Web application. Clustering is often less expensive than a load-balancing switch and avoids having a single point of failure.

Windows 2000 Clustering. A cluster is a collection of loosely coupled, independent servers that behave as a single system. Cluster members, or nodes, can be symmetric multiprocessing (SMP) systems if that level of computing power is required. The following features characterize clusters:

The ability to treat all the computers in the cluster as a single server. Application clients interact with a cluster as if it were a single server, and system administrators view the cluster in much the same way: as a single system image. The ease of cluster management depends on how a given clustering technology is implemented in addition to the toolset provided by the vendor.
The ability to share the workload. In a cluster, some form of load-balancing mechanism serves to distribute the load among the servers.
The ability to scale the cluster. Whether clustering is implemented by using a group of standard servers or by using high-performance SMP servers, you can increase a cluster’s processing capability in small incremental steps by adding another server.
The ability to provide a high level of availability. Among the techniques used are fault tolerance, failover/failback, and isolation. These terms are frequently used interchangeably, and incorrectly.

Figure 2.4 - LocalDirector used in conjunction with switches

Clustering is a computer architecture that addresses several issues, including performance, availability, and scalability. As is the case with other architectures, the idea of clustering isn’t new. What’s new about it are its implementation and the platforms that can take advantage of this architecture.

Load balancing is one aspect of clustering. Microsoft’s initial software-based load-balancing solution was its Windows NT Load Balancing Service (WLBS), also known as Convoy. The essence of WLBS is a mapping of a shared virtual IP address (VIP) to the real IP addresses of the servers that are part of the load-balancing scheme. Network Load Balancing (NLB), the Windows 2000 successor to WLBS, is a Network Driver Interface Specification (NDIS) packet filter driver that sits above the network adapter’s NDIS driver and below the TCP/IP stack. Each server receives every packet for the VIP, and NLB decides on a packet-by-packet basis which packets should be processed by a given server. If another server should process the packet, the server running NLB discards the packet. If NLB determines that the packet should be processed locally, it passes the packet up to the TCP/IP stack.
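The idea that every host sees every packet and independently decides whether to keep it can be sketched as follows. The hash used here (CRC-32 of the client address, modulo the number of hosts) is an assumption chosen for illustration and is not the algorithm that NLB actually uses. The client addresses are likewise hypothetical.

```python
import zlib
from typing import List


def owner_index(client_ip: str, host_count: int) -> int:
    """Map a client address to exactly one host (illustrative hash only)."""
    return zlib.crc32(client_ip.encode("ascii")) % host_count


def accepts(host_index: int, client_ip: str, host_count: int) -> bool:
    """The test each host runs on every packet addressed to the VIP.

    True means the packet is passed up to the TCP/IP stack;
    False means the host discards it.
    """
    return owner_index(client_ip, host_count) == host_index


HOST_COUNT = 3
clients = ["203.0.113.5", "203.0.113.6", "198.51.100.7", "198.51.100.8"]

takers_per_client: List[List[int]] = [
    [host for host in range(HOST_COUNT) if accepts(host, ip, HOST_COUNT)]
    for ip in clients
]

for ip, takers in zip(clients, takers_per_client):
    print(ip, "is handled by host", takers[0])
```

Because every host applies the same rule to the same packet, exactly one host accepts it and the others discard it, without the hosts having to coordinate on each packet.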

Clustering is discussed in more detail in Chapter 4, "Microsoft Windows 2000 Cluster Service," Chapter 5, "Network Load Balancing (NLB)," and Chapter 6, "Microsoft Application Center 2000."

Making a Decision

When designing a highly available network topology, you can use redundancy to avoid any single point of failure. A highly available topology can include redundant components, redundant network paths, and redundant services. Table 2.2 provides a description of each strategy that you can use when designing redundancy into your network topology.

Table 2.2 Designing a Highly Available Network Topology

Strategy	Examples	Description
Redundant components	hubs, switches, routers	You should use redundant components to avoid any single point of failure. When a component fails, a backup component should be able to operate so that system availability isn’t compromised. All components should be made redundant.
Redundant network paths	LANs, ISPs, multiple sites	Redundant network connections should exist at every level of the network, including connections within the LAN and the connections to the ISP. You can also implement redundant LAN and ISP connectivity through the use of multiple sites. Some organizations might choose to implement redundancy within the main site and then implement satellite sites to ensure further redundancy.
Redundant services	hot standby, spare systems, round-robin DNS, load-balancing switches, Windows Clustering	Of the examples provided here, clustering is considered the most efficient, manageable, and cost-effective in terms of delivering high availability. You should implement clustering at all levels of a multitiered environment. Note that clustering is discussed in detail in later chapters. With the information provided in those chapters, you’ll have a better understanding of how clustering is implemented and how you can use it to ensure high availability in your network topology.

Recommendations

When designing a highly available network infrastructure, you should use redundant components and network paths to avoid any single point of failure. You can achieve redundancy by implementing it within the LAN and to the ISP, by implementing multiple sites, or by combining both solutions. In addition, when designing a highly available network topology, you should implement clustering rather than round-robin DNS or load-balancing switches.

Example: Network Redundancy

The network topology shown in Figure 2.5 illustrates a design that uses redundant components and paths.

Figure 2.5 - A network topology that uses redundant components and network paths

This network uses redundant routers and switches to provide redundant connections to the Internet. Should one connection fail, the redundant components and paths provide availability so that users aren’t affected by any single point of failure. In addition, redundant switches and connections have been included for each subnet so that a path is always available in case of failure. Note that the network design uses clustering to ensure high availability, rather than round-robin DNS or load-balancing switches. Clustering is used at each layer of the network topology.

Lesson Summary

One of the most effective ways to ensure the high availability of a Web site is to use redundant hardware and software. In a complex system it’s often possible to include redundant components, paths, and services to ensure a highly available network design. Redundant components can refer to hardware within a computer (such as hot swappable drives), completely duplicated computers, or network components outside a computer, such as routers, switches, and hubs. Although hubs, switches, and routers don’t fail very often, they can still fail. Consequently, using redundancy is very important. You can also implement redundant connectivity at several points in your network topology: within each LAN segment, at your connection to the ISP, or through multiple sites. You can also achieve high availability for your services by using redundant servers. If you have more than one server implementing your site and one of the servers crashes, the processing requests can be redirected to another server. You can use two methods to implement redundant services on multiple computers: use backup servers and implement load balancing and clustering. When designing a highly available network infrastructure, you should use redundant components and network paths to avoid any single point of failure. In addition, when designing a highly available network topology, you should implement clustering rather than round-robin DNS or load-balancing switches.