NLB enhances the scalability and availability of mission-critical services based on Transmission Control Protocol/Internet Protocol (TCP/IP), such as Web, Microsoft Terminal Services, virtual private networking (VPN), and streaming media servers. This service runs within cluster hosts as part of the Windows 2000 Advanced Server and Datacenter Server operating systems and requires no dedicated hardware support. To scale performance, NLB distributes IP traffic across multiple cluster hosts. It also ensures high availability by detecting host failures and automatically redistributing traffic to the surviving hosts. NLB provides remote controllability and supports rolling upgrades from the Windows NT 4 operating system. NLB’s unique and fully distributed architecture enables it to deliver very high performance and failover protection, especially in comparison with dispatcher-based load balancers. This lesson provides an overview of NLB, describes how the service works, and discusses the NLB architecture.
An organization’s Web applications typically perform such mission-critical tasks as e-commerce and financial transactions. Businesses can lose millions of dollars when mission-critical applications aren’t available, so these applications must be highly available and reliable at all times. An easy and effective way to achieve availability and reliability is to use redundant servers: if your site has more than one server and one of them fails, processing requests can be redirected to another server, keeping the Web site available.
Adding just a few servers can increase your availability tremendously. If you have one server with only 90 percent availability, adding a second server increases the cluster’s availability to 99 percent, or only 14.4 unavailable minutes each day. That’s because the probability of both servers failing at once is 0.1 squared, or 0.01. With seven such servers, you have 99.99999 percent availability, at least for the Web farm. With more reliable hardware, of course, the numbers are better still.
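The arithmetic above generalizes to any number of redundant servers. Here’s a minimal Python sketch of the calculation; the function names are illustrative, not part of any Microsoft API:

```python
# Availability of a farm of identical, independently failing servers:
# the farm is unavailable only when every server is down at once.
def farm_availability(per_server_availability: float, n_servers: int) -> float:
    per_server_downtime = 1.0 - per_server_availability
    return 1.0 - per_server_downtime ** n_servers

def daily_downtime_minutes(availability: float) -> float:
    # Fraction of a 24-hour day the farm is unavailable, in minutes.
    return (1.0 - availability) * 24 * 60

# Two 90%-available servers give 99% farm availability (about 14.4
# minutes of downtime per day); seven give 99.99999%.
```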
Adding Web servers also helps your site handle larger loads in an elegant manner: adding Web servers provides linear scalability (provided there aren’t other bottlenecks in your system). For example, if you need to handle 10 times the number of Hypertext Transfer Protocol (HTTP) client requests, you should use 10 times as many servers, assuming that other factors, such as network bandwidth, don’t create bottlenecks or limit availability.
Figure 5.1 shows the performance of servers in a Windows 2000 Web environment.
Figure 5.1 - Scalability increasing as more Web servers are added to the Web farm
When you implement a site on a Web farm, you must distribute the processing load across the hosts. The processing requests received by the cluster are directed to the different hosts in the cluster so that response time is low and the application throughput is high. Each host in the cluster runs a separate copy of the server programs required by the application. For example, if the purpose of the cluster were to support File Transfer Protocol (FTP) services, each host in the cluster would be configured as an FTP server. Load balancing distributes the incoming client requests across the cluster hosts.
As discussed in Chapter 1, "Introduction to Designing Highly Available Web Solutions," there are a number of ways to implement load balancing:
Round-robin DNS involves setting up your site’s DNS server to return the set of all the IP addresses of the servers in the cluster in a different order on each successive request. The client typically forwards the request to the first IP address in the list of IP addresses returned. Consequently, the request is directed to a different server in the cluster, and the traffic is distributed across the servers.
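As a rough illustration, round-robin DNS can be modeled as a resolver that rotates its address list one position per query. This is a toy sketch under simplifying assumptions; real DNS servers also deal with TTLs and resolver caching:

```python
class RoundRobinDNS:
    """Toy model of a DNS server that rotates its address list per query."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self.offset = 0

    def resolve(self):
        # Return the full list, rotated so that a different server
        # comes first on each successive request.
        order = self.addresses[self.offset:] + self.addresses[:self.offset]
        self.offset = (self.offset + 1) % len(self.addresses)
        return order
```

Because clients typically use the first address returned, successive clients land on different servers; caching resolvers in the real world weaken this guarantee.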
Load-balancing switches, such as Cisco LocalDirector, redirect TCP requests to servers in a server farm. These switches provide a highly scalable, interoperable solution that’s also very reliable. To implement load balancing, the LocalDirector presents a common virtual IP address to the requesting clients and then forwards the requests to an available server.
Windows 2000 Advanced Server and Windows 2000 Datacenter Server NLB distributes the IP traffic across multiple Web servers that provide TCP/IP services. NLB presents a common virtual IP address for the entire cluster and transparently partitions client requests across the multiple servers in the cluster. NLB provides high availability and high scalability to the Internet applications.
Application Center 2000 also supports NLB. Although Application Center doesn’t actually install NLB on Windows 2000 Advanced Server or Windows 2000 Datacenter Server, it provides an enhanced management interface that integrates with the NLB service already included in both operating systems. However, if Application Center 2000 is installed on Windows 2000 Server (which doesn’t provide NLB), Application Center 2000 will install NLB automatically.
NLB provides scalability and high availability to enterprise-wide TCP/IP services, such as Web, Terminal Services, proxy, VPN, and streaming media services. NLB brings special value to enterprises deploying TCP/IP services, such as e-commerce applications, that link clients with transaction applications and back-end databases.
NLB servers (also called hosts) in a cluster communicate among themselves to provide the following benefits:
NLB distributes IP traffic to multiple copies (or instances) of a TCP/IP service, such as a Web server. Each service runs on a host within the cluster. NLB transparently partitions the client requests among the hosts and lets the clients access the cluster using one or more virtual IP addresses. From the client’s point of view, the cluster appears to be a single server that answers these client requests. As enterprise traffic increases, network administrators can simply plug another server into the cluster.
For example, the clustered hosts in Figure 5.2 work together to service network traffic from the Internet. Each server runs a copy of an IP-based service, such as Internet Information Services (IIS) 5.0, and NLB distributes the networking workload among them. This speeds up normal processing so that Internet clients see faster turnaround on their requests. For added system availability, the back-end application (a database, for example) can operate on a two-node cluster running the Cluster service.
Figure 5.2 - A four-host NLB cluster working as a single virtual server to handle network traffic
Each host runs its own copy of the server with NLB distributing the work among the four hosts.
NLB scales the performance of a server-based program, such as a Web server, by distributing its client requests among multiple servers within the cluster. With NLB, each host receives each incoming IP packet, but only the intended recipient accepts it. The cluster hosts concurrently respond to different client requests or to multiple requests from the same client. For example, a Web browser may obtain the various images within a single Web page from different hosts in a load-balanced cluster. This speeds up processing and shortens the response time to clients.
By default, NLB distributes the load equally among all the hosts in the cluster. However, you can specify the load percentage that each host handles. Using these percentages, each NLB server selects and handles a portion of the workload. Clients are statistically distributed among cluster hosts so that each server receives its percentage of incoming requests. This load balance dynamically changes when hosts enter or leave the cluster. However, the load balance doesn’t change in response to varying server loads (such as CPU or memory usage). For applications such as Web servers, which have numerous clients and relatively short-lived client requests, NLB’s ability to distribute workload through statistical mapping efficiently balances loads and provides fast response to cluster changes.
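The idea of statistically distributing clients according to per-host load percentages can be sketched as hashing each client’s IP address into a weighted bucket. This is a simplified illustration using MD5; it isn’t NLB’s actual randomization function, and the names are hypothetical:

```python
import hashlib

def owner_host(client_ip: str, hosts) -> str:
    """hosts: list of (name, load_percentage) pairs summing to 100.
    Hash the client IP to a uniform point in [0, 100) and return the
    host whose cumulative-percentage bucket contains that point."""
    digest = hashlib.md5(client_ip.encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2.0 ** 64 * 100
    cumulative = 0.0
    for name, percentage in hosts:
        cumulative += percentage
        if point < cumulative:
            return name
    return hosts[-1][0]  # guard against floating-point edge cases
```

Over many clients, each host receives roughly its configured percentage of requests, yet any single client always hashes to the same host.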
NLB cluster servers emit a heartbeat message to other hosts in the cluster and listen for the heartbeat of other hosts. If a server in a cluster fails, the remaining hosts adjust and redistribute the workload while maintaining continuous service to their clients. Although existing connections to an offline host are lost, the Internet services still remain continuously available. In most cases (for example, with Web servers), client software automatically retries the failed connections and the clients experience a delay of only a few seconds in receiving a response.
Figure 5.3 provides a high-level picture of how NLB works. In short, traffic is sent to all the hosts, but only one host decides to pick it up.
Figure 5.3 - How NLB processes client requests
The statistical mapping of client IP addresses and IP ports to hosts results in a very even load balance; a good figure is to have all your load-balanced servers carrying the same load, within ±5 percent. In addition to an even load, you need a very fast, very efficient load-balancing algorithm to maintain high throughput. NLB meets these requirements by deterministically mapping each client to a particular host.
NLB employs a fully distributed filtering algorithm to map incoming clients to the cluster hosts. This algorithm was chosen to enable cluster hosts to make a load-balancing decision independently and quickly for each incoming packet. It was optimized to deliver statistically even loads for a large client population making numerous, relatively small requests, such as those typically made to Web servers. When the client population is small or the client connections produce widely varying loads on the server, the load-balancing algorithm is less effective. However, its algorithm’s simplicity and speed allow it to deliver very high performance, including both high throughput and low response time, in a wide range of useful client/server applications.
Network Load Balancing load balances incoming client requests by directing a selected percentage of new requests to each cluster host; you can set the load percentage in the NLB Properties dialog box for each port range to be load-balanced. The algorithm doesn’t respond to changes in the load on each cluster host (such as the CPU load or memory usage). However, the mapping is modified when the cluster membership changes, and load percentages are renormalized accordingly.
When inspecting an arriving packet, all hosts simultaneously perform a statistical mapping to quickly determine which host should handle the packet. The mapping uses a randomization function that calculates a host priority based on the client’s IP address, port, and other state information maintained to optimize load balance. The corresponding host forwards the packet up the network stack to TCP/IP, and the other cluster hosts discard it. The mapping doesn’t vary unless the membership of cluster hosts changes, ensuring that a given client’s IP address and port will always map to the same cluster host. However, the particular cluster host to which a client’s IP address and port will map can’t be predetermined, because the randomization function takes into account the cluster’s current and past membership to minimize remapping.
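The key property is that every host evaluates the same pure function on the same packet header, so exactly one host accepts each packet without any per-packet coordination. Here’s a hedged sketch of that idea; the hash choice and names are illustrative, not NLB’s actual algorithm:

```python
import hashlib

def mapped_host(client_ip: str, client_port: int, n_hosts: int) -> int:
    """Every cluster host runs this same deterministic function on each
    incoming packet; no messages are exchanged per packet."""
    key = f"{client_ip}:{client_port}".encode()
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % n_hosts

def host_accepts(my_index: int, client_ip: str, client_port: int,
                 n_hosts: int) -> bool:
    # A host passes the packet up to TCP/IP only if the mapping names it;
    # every other host silently discards the packet.
    return mapped_host(client_ip, client_port, n_hosts) == my_index
```

Because the function is deterministic, the same client IP address and port always reach the same host until the cluster membership (`n_hosts` here) changes.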
The load-balancing algorithm assumes that client IP addresses and port numbers (when client affinity isn’t enabled) are statistically independent. (Client affinity is discussed later in this lesson.) This assumption can break down if a server-side firewall is used that proxies client addresses with one IP address and, at the same time, client affinity is enabled. In this case, all client requests will be handled by one cluster host and load balancing is defeated. However, if client affinity isn’t enabled, the distribution of client ports within the firewall usually provides good load balance.
NLB hosts periodically exchange multicast or broadcast heartbeat messages within the cluster. This allows them to monitor the cluster’s status. When the cluster’s state changes (such as when hosts fail, leave, or join the cluster), NLB invokes a process known as convergence, in which the hosts exchange heartbeat messages to determine a new, consistent state of the cluster and to elect the host with the highest priority as the new default host. When all cluster hosts have reached consensus on the cluster’s correct new state, they record the change in cluster membership in the Windows 2000 event log upon completing convergence. Figure 5.4 illustrates the NLB convergence process.
Figure 5.4 - NLB failure recovery/convergence
During convergence, the hosts continue to handle incoming network traffic as usual, except that traffic for a failed host doesn’t receive service. Client requests to surviving hosts are unaffected. Convergence terminates when all cluster hosts report a consistent view of the cluster membership for several heartbeat periods. If a host attempts to join the cluster with inconsistent port rules or an overlapping host priority, completion of convergence is inhibited. This prevents an improperly configured host from handling cluster traffic.
At the completion of convergence, client traffic for a failed host is redistributed to the remaining hosts, as shown in Figure 5.5. If a host is added to the cluster, convergence allows this host to receive its share of load-balanced traffic. Expansion of the cluster doesn’t affect ongoing cluster operations and is achieved in a manner transparent to both Internet clients and to server programs. However, it might affect client sessions because clients may be remapped to different cluster hosts between connections.
Figure 5.5 - An NLB cluster with one failed host
Once per second, every host broadcasts a heartbeat message conveying its state, its configuration, and its port rule policy. The other hosts examine these messages to verify a consistent view of the cluster’s membership.
The local area network (LAN) resources that the heartbeats occupy are very small—in fact, measured in the tens of kilobytes—because the broadcast is only one packet per second per host. The overhead therefore grows linearly with the number of hosts (n packets per second in total), not with the square of the number of hosts, as pairwise messaging would.
If, over a 5-second period, any host recognizes that another host has dropped out or a new member has been added, that host will enter the convergence process. In that process, the host will double the rate of heartbeat messages. During the convergence process, the other hosts continue to respond to client requests.
However, the clients that would have been directed to the failed host see no response. Their existing connections die, and any new connections they open that map to the failed host aren’t serviced. Within about 10 seconds, though, these clients are remapped to the surviving hosts and failover is complete; the same process lets the cluster recognize a newly added host.
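The failure-detection timing described above (one heartbeat per second, a roughly 5-second dropout window) can be sketched as a simple timeout check. The 5-second threshold comes from the text; everything else here is illustrative:

```python
def missing_hosts(last_heartbeat, now, timeout=5.0):
    """last_heartbeat: host name -> time (in seconds) its heartbeat was
    last heard. Any host silent for longer than `timeout` is treated as
    dropped, which would trigger convergence on the surviving hosts."""
    return sorted(host for host, seen in last_heartbeat.items()
                  if now - seen > timeout)
```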
It’s important to understand that the cluster doesn’t stop servicing traffic during the convergence process. For example, say you add a new Windows 2000–based host to an existing Windows NT–based cluster. Because you forgot to change the default port rules, the new host’s port rules will be inconsistent with the rest of the cluster’s. In this case, convergence simply continues until you pull the inconsistent host out of the cluster and fix its port rules. And while this may seem odd, it’s the only way to ensure consistent load balancing across the hosts.
NLB session support uses a process called client affinity, which allows a client to be mapped to the same host during a session. After the initial client request, which is distributed like any other request, NLB looks at only the source IP address and not the source port information. Therefore, a client with a given IP address will always map to a particular cluster host, and any session state that’s maintained in that cluster host will persist across those connections. The client won’t suddenly be mapped to another host at connection boundaries.
Figure 5.6 illustrates how client affinity works after a client makes an initial request.
Figure 5.6 - How client affinity works after a client’s initial request
You can establish two types of client affinity: Single or Class C. These two types help maintain client sessions. With Single affinity, NLB pins that client to a particular host without setting a timeout limit; this mapping is in effect until the cluster set changes. The trouble with Single affinity is that in a large site with multiple proxy servers, a client can appear to come from different IP addresses. To address this, NLB also includes Class C affinity, which specifies that all clients within a given Class C address space will map to a given cluster host. However, Class C affinity doesn’t address situations in which proxy servers are placed across Class C address spaces. Currently the only solution is to handle it at the Active Server Pages (ASP) level.
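The difference between the affinity modes comes down to which parts of the source address feed the mapping function. A minimal sketch, with the mode names shortened for illustration:

```python
def affinity_key(client_ip: str, client_port: int, affinity: str) -> str:
    """Return the key the load-balancing hash would see under each mode:
    'none'   -> IP and port: each connection may land on a different host
    'single' -> IP only: one client always maps to the same host
    'classC' -> first three octets: a whole class C maps to one host"""
    if affinity == "none":
        return f"{client_ip}:{client_port}"
    if affinity == "single":
        return client_ip
    if affinity == "classC":
        return ".".join(client_ip.split(".")[:3])
    raise ValueError(f"unknown affinity mode: {affinity}")
```

Two proxies in the same class C network (say, 192.168.1.7 and 192.168.1.200) produce the same 'classC' key and so reach the same host; proxies in different class C networks don’t, which is exactly the limitation the text describes.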
To maximize throughput and availability, NLB uses a fully distributed software architecture. An identical copy of the NLB driver runs in parallel on each cluster host. The drivers arrange for all cluster hosts on a single subnet to concurrently detect incoming network traffic for the cluster’s primary IP address (and for additional IP addresses on multihomed hosts). On each cluster host, the driver acts as a filter between the network adapter’s driver and the TCP/IP stack, allowing the host to receive a portion of the incoming network traffic. In this way, incoming client requests are partitioned and load balanced among the cluster hosts.
NLB runs as a network driver logically situated beneath higher-level application protocols, such as HTTP and FTP. Figure 5.7 shows the implementation of NLB as an intermediate driver in the Windows 2000 network stack.
Figure 5.7 - NLB running as an intermediate driver between the TCP/IP protocol and network adapter drivers within the Windows 2000 protocol stack
This architecture maximizes throughput by using the broadcast subnet to deliver incoming network traffic to all cluster hosts and by eliminating the need to route incoming packets to individual cluster hosts. Since filtering unwanted packets is faster than routing packets (which involves receiving, examining, rewriting, and resending), NLB delivers higher network throughput than dispatcher-based solutions. As network and server speeds grow, NLB’s throughput also grows proportionally, thus eliminating any dependency on a particular hardware routing implementation. For example, NLB has demonstrated 250 megabits per second (Mbps) throughput on gigabit networks.
NLB’s architecture takes advantage of the subnet’s hub architecture, switch architecture, or both, to deliver incoming network traffic simultaneously to all cluster hosts. However, this approach increases the burden on switches by occupying additional port bandwidth. This is usually not a concern in most intended applications, such as Web services and streaming media, since the percentage of incoming traffic is a small fraction of total network traffic. However, if the client-side network connections to the switch are significantly faster than the server-side connections, incoming traffic can occupy a prohibitively large portion of the server-side port bandwidth. The same problem arises if multiple clusters are hosted on the same switch and you haven’t set up virtual LANs for individual clusters.
During packet reception, the fully pipelined implementation of NLB overlaps the delivery of incoming packets to TCP/IP and the reception of other packets by the network adapter driver. This speeds up overall processing and reduces latency because TCP/IP can process a packet while the Network Driver Interface Specification (NDIS) driver receives a subsequent packet. It also reduces the overhead required for TCP/IP and the NDIS driver to coordinate their actions, and in many cases it eliminates an extra copy of packet data in memory. During packet sending, NLB also enhances throughput and reduces latency and overhead by increasing the number of packets that TCP/IP can send with one NDIS call. To achieve these performance enhancements, NLB allocates and manages a pool of packet buffers and descriptors that it uses to overlap the actions of TCP/IP and the NDIS driver.
Application state refers to data maintained by a server application on behalf of its clients. If a server application (such as a Web server) maintains state information about a client session—that is, when it maintains a client’s session state—that spans multiple TCP connections, it’s usually important that all TCP connections for this client be directed to the same cluster host. Shopping cart contents at an e-commerce site and Secure Sockets Layer (SSL) authentication data are examples of a client’s session state.
You can use NLB to scale applications that manage session state spanning multiple connections. When its client affinity parameter setting is enabled, NLB directs all TCP connections from one client IP address to the same cluster host. This allows session state to be maintained in host memory. However, should a server or network failure occur during a client session, a new logon may be required to reauthenticate the client and reestablish session state. Also, adding a new cluster host redirects some client traffic to the new host, which can affect sessions, although ongoing TCP connections aren’t disturbed. Client/server applications that manage client state so that it can be retrieved from any cluster host (for example, by embedding state within cookies or pushing it to a back-end database) don’t need to use client affinity.
To further assist in managing session state, NLB provides an optional client affinity setting that directs all client requests from a TCP/IP class C address range to a single cluster host. With this feature, clients that use multiple proxy servers can have their TCP connections directed to the same cluster host. The use of multiple proxy servers at the client’s site causes requests from a single client to appear to originate from different systems. Assuming that all of the client’s proxy servers are located within the same 254-host class C address range, NLB ensures that the same host handles client sessions with minimum impact on load distribution among the cluster hosts. Some very large client sites, however, use multiple proxy servers that span class C address spaces; Class C affinity can’t keep such clients on a single host.
In addition to session state, server applications often maintain persistent, server-based state information that’s updated by client transactions, such as merchandise inventory at an e-commerce site. You should not use NLB to directly scale applications such as Microsoft SQL Server (other than for read-only database access) that independently update interclient state because updates made on one cluster host won’t be visible to other cluster hosts. To benefit from NLB, applications must be designed to permit multiple instances to simultaneously access a shared database server that synchronizes updates. For example, Web servers with ASP should have their client updates pushed to a shared back-end database server.
When you install Windows 2000 Advanced Server or Windows 2000 Datacenter Server, NLB is automatically installed. It operates as an optional service for LAN connections, and you can enable it for one LAN connection in the system; this LAN connection is known as the cluster adapter. No hardware changes are required to install and run NLB. Since it’s compatible with almost all Ethernet and Fiber Distributed Data Interface (FDDI) network adapters, it has no specific hardware compatibility list.
The cluster is assigned a primary IP address, which represents a virtual IP address to which all cluster hosts respond. The remote control program provided as a part of NLB uses this IP address to identify a target cluster. Each cluster host also can be assigned a dedicated IP address for network traffic unique to that particular host within the cluster. NLB never load balances traffic for the dedicated IP address. Instead, it load balances incoming traffic from all IP addresses other than the dedicated IP address.
When configuring NLB, it’s important to enter the dedicated IP address, primary IP address, and other optional virtual IP addresses into the TCP/IP Properties dialog box in order to enable the host’s TCP/IP stack to respond to these IP addresses. You always enter the dedicated IP address first so that outgoing connections from the cluster host are sourced with this IP address instead of a virtual IP address. Otherwise, replies to the cluster host could be inadvertently load balanced by NLB and delivered to another cluster host. Some services, such as the Point-to-Point Tunneling Protocol (PPTP) server, don’t allow outgoing connections to be sourced from a different IP address, and thus you can’t use a dedicated IP address with them.
Each cluster host is assigned a unique host priority in the range of 1 to 32, with lower numbers denoting higher priorities. The host with the highest host priority (lowest numeric value) is called the default host. It handles all client traffic for the virtual IP addresses that’s not specifically intended to be load-balanced. This ensures that server applications not configured for load balancing receive only client traffic on a single host. If the default host fails, the host with the next highest priority takes over as default host.
NLB uses port rules to customize load balancing for a consecutive numeric range of server ports. Port rules can select either multiple-host or single-host load balancing policies. With multiple-host load balancing, incoming client requests are distributed among all cluster hosts, and you can specify a load percentage for each host. Load percentages allow hosts with higher capacity to receive a larger fraction of the total client load. Single-host load balancing directs all client requests to the host with highest handling priority. The handling priority essentially overrides the host priority for the port range and allows different hosts to individually handle all client traffic for specific server applications. You can also use port rules to block undesired network access to certain IP ports.
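Conceptually, classifying an incoming packet against port rules is a first-match lookup over port ranges. The rule representation and fallback name below are illustrative, not NLB’s internal data structures:

```python
def match_rule(dest_port, rules):
    """rules: list of (start_port, end_port, policy) tuples.
    policy is 'multiple' (load-balance across hosts), 'single' (send to
    the host with the highest handling priority), or 'disabled' (block
    the port). Unmatched traffic falls through to the default host."""
    for start, end, policy in rules:
        if start <= dest_port <= end:
            return policy
    return "default-host"
```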
By default, NLB is configured with a single port rule that covers all ports (0–65,535) with multiple-host load balancing and single-client affinity. You can use this rule for most applications. Don’t modify this rule for VPN applications or whenever IP fragmentation is expected; leaving it in place ensures that fragments are efficiently handled by the cluster hosts.
An easy and effective way to achieve availability and reliability is to use redundant servers. NLB provides scalability and high availability to enterprise-wide TCP/IP services by distributing its client requests among multiple servers within the cluster. NLB employs a fully distributed filtering algorithm to map incoming clients to the cluster hosts. Cluster hosts periodically exchange multicast or broadcast heartbeat messages within the cluster, allowing the hosts to monitor the cluster’s status. When the cluster’s state changes (such as when hosts fail, leave, or join the cluster), NLB invokes a process known as convergence. Client affinity allows a client to always map to a particular cluster host so that the session state data can persist. Application state refers to data maintained by a server application on behalf of its clients. To maximize throughput and availability, NLB uses a fully distributed software architecture. The cluster is assigned a primary IP address, which represents a virtual IP address to which all cluster hosts respond. Each cluster host is assigned a unique host priority in the range of 1 to 32, with lower numbers denoting higher priorities. NLB uses port rules to customize load balancing for a consecutive numeric range of server ports.