DHCP Failover | Microsoft Windows Server 2003 Unleashed (R2 Edition)

The importance of DHCP cannot be overstated. Downtime for DHCP translates into hordes of angry users who can no longer access the network. Consequently, it is extremely important to build redundancy into the DHCP environment and to provide disaster recovery procedures in the event of total DHCP failure.

Unfortunately, the DHCP service has no method of dynamically working in tandem with another DHCP server to synchronize client leases and scope information. However, using a few tricks, you can configure a failover DHCP environment that will provide for redundancy in the case of server failure or outage. Three specific options will provide for redundancy, and the pros and cons of each should be matched to the requirements of your organization.

The 50/50 Failover Approach for DHCP Fault Tolerance

The 50/50 failover approach uses two DHCP servers that each handle a roughly equal share of client traffic on a subnet. Each DHCP server is configured with a similar scope, but each must lease from a different IP range to avoid IP addressing conflicts.

Figure 10.7 illustrates the 50/50 failover approach. As indicated in the diagram, the network has 200 clients on the 192.168.1.0/24 subnet. Each DHCP server contains a scope that covers the entire client subnet. Server1's scope is configured with exclusions for all IPs except the range 192.168.1.1 through 192.168.1.125. Server2's scope is configured with exclusions for the first half and a client lease range of 192.168.1.126 through 192.168.1.254.

Figure 10.7. The 50/50 failover approach.
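The split described for Figure 10.7 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a configuration tool: the helper name `split_scope` is hypothetical, and the boundary of 125 addresses is taken from the ranges given above.

```python
from ipaddress import IPv4Address


def split_scope(first, last, primary_count):
    """Split the inclusive range first..last into two lease ranges.

    The first range holds primary_count addresses; the second holds the rest.
    """
    first, last = IPv4Address(first), IPv4Address(last)
    total = int(last) - int(first) + 1
    if not 0 < primary_count < total:
        raise ValueError("primary_count must leave addresses for both servers")
    boundary = first + primary_count  # first address of the second range
    return (first, boundary - 1), (boundary, last)


# Ranges from the 50/50 example: Server1 leases .1 through .125.
server1, server2 = split_scope("192.168.1.1", "192.168.1.254", 125)

# Each server excludes exactly the range that the other server leases.
for name, lease, exclude in (
    ("Server1", server1, server2),
    ("Server2", server2, server1),
):
    print(f"{name}: leases {lease[0]} through {lease[1]}, "
          f"excludes {exclude[0]} through {exclude[1]}")
```

Deriving each exclusion from the other server's lease range keeps the two scopes from overlapping by accident.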

When a client requests an IP address, it accepts the offer from the first server to respond, which roughly balances the load between the two servers.

The advantage to this approach is that a degree of redundancy is built into the DHCP environment without the need for extra IP address ranges reserved for clients. However, several caveats must be considered before implementing this approach.

First and foremost, if one server is located closer to the majority of the clients, it will tend to respond first, and more clients will be directed to that particular server. That server could then run out of client leases, making it ineffectual for redundancy. For this reason, it is preferable to consider other methods of failover for DHCP if sufficient lease ranges are available.

Another important consideration when configuring DHCP servers in this way is that each server's scope must include the range that exists on the other server, with that range marked as an exclusion, so that a client from the other server is not refused a new lease when it attempts to renew. Without this arrangement, the client and server would have trouble negotiating, because the client would be using an IP address outside the range that exists in the scope. Conversely, if the range exists in the scope but is excluded, the server will simply assign a new address from its own available range.

The 80/20 Failover Approach to DHCP Fault Tolerance

The 80/20 failover approach is similar to the 50/50 approach, except that the effective scope range on the server designated as the backup DHCP server contains only 20% of the available client IP range. In most cases, the server that holds the 20% would be located across the network on a remote subnet, so it would not be primarily responsible for client leases. The server with 80% of the range would be physically located closer to the actual clients, thus accepting the majority of them by responding to their requests faster, as illustrated in Figure 10.8.

Figure 10.8. The 80/20 failover approach.

In the event of Server1's failure, Server2 would respond to client requests until Server1 could be re-established in the network.

The downside to this approach is that if Server1 is down for too long a period of time, Server2 would eventually run out of potential leases for clients, and client renewal would fail. It is therefore important to establish a disaster recovery plan for the server with 80% of the scope so that downtime is minimized.

Just as with the 50/50 approach, it is important to establish exclusion ranges for the other DHCP server's range, as described in the previous section.
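As a rough illustration of the 80/20 arithmetic, the sketch below divides a lease range by share instead of by a fixed boundary. The chapter does not give concrete ranges for Figure 10.8, so the 192.168.1.0/24 addresses and the figure of 200 clients are borrowed from the 50/50 example as assumptions, and `split_by_share` is a hypothetical helper.

```python
from ipaddress import IPv4Address


def split_by_share(first, last, primary_share):
    """Give the primary server primary_share of first..last (inclusive)."""
    first, last = IPv4Address(first), IPv4Address(last)
    total = int(last) - int(first) + 1
    primary_count = int(total * primary_share)  # round down
    if not 0 < primary_count < total:
        raise ValueError("share must leave addresses for both servers")
    boundary = first + primary_count  # first address of the backup range
    return (first, boundary - 1), (boundary, last)


# Assumed addresses: the same /24 used in the 50/50 example.
primary, secondary = split_by_share("192.168.1.1", "192.168.1.254", 0.8)
backup_capacity = int(secondary[1]) - int(secondary[0]) + 1

print(f"Server1 (80%): {primary[0]} through {primary[1]}")
print(f"Server2 (20%): {secondary[0]} through {secondary[1]}")

# Assumed client count, used only to show the exhaustion risk.
clients = 200
print(f"Server2 can lease to {backup_capacity} of {clients} clients "
      "while Server1 is down")
```

The last line makes the caveat concrete: the backup range covers only a fraction of the clients, so it buys time for recovery but does not replace the primary server.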

The 100/100 Failover Approach to DHCP Fault Tolerance

The 100/100 failover approach in Windows Server 2003 DHCP is the most effective means of achieving high availability out of a DHCP environment. However, several big "gotchas" must be worked out before this type of redundancy can be implemented.

The 100/100 failover approach in its simplest form consists of two servers running DHCP, with each servicing the same subnets in an organization. The scopes on each server, however, contain different, equally sized ranges for clients, each large enough to handle all clients in a specific subnet.

In Figure 10.9, the 10.2.0.0/16 subnet has a total of 750 clients. This subnet is serviced by two DHCP servers, each of which has a scope for the subnet. Each server has a scope with addresses from 10.2.1.1 through 10.2.8.254. The scope on Server1 excludes all IP addresses except those in the range of 10.2.1.1 through 10.2.4.254. The scope on Server2 excludes all IP addresses except those in the range from 10.2.5.1 through 10.2.8.254. Each effective range is therefore large enough to handle roughly 1,000 clients, more than enough for every machine on the network.

Figure 10.9. The 100/100 failover approach.
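The capacity claim for Figure 10.9 is easy to check with a short script. The sketch below counts the addresses in each server's effective range and confirms that either server alone could cover all 750 clients and that the two ranges do not overlap; the helper names are hypothetical, and the count treats every address in the contiguous range as usable, which holds inside a /16.

```python
from ipaddress import IPv4Address

CLIENTS = 750  # total clients on 10.2.0.0/16, from the example

# Effective (non-excluded) lease range on each server.
RANGES = {
    "Server1": ("10.2.1.1", "10.2.4.254"),
    "Server2": ("10.2.5.1", "10.2.8.254"),
}


def range_size(first, last):
    """Number of addresses in the inclusive range first..last."""
    return int(IPv4Address(last)) - int(IPv4Address(first)) + 1


def overlaps(a, b):
    """True if the inclusive ranges a and b share any address."""
    return (IPv4Address(a[0]) <= IPv4Address(b[1])
            and IPv4Address(b[0]) <= IPv4Address(a[1]))


sizes = {name: range_size(*bounds) for name, bounds in RANGES.items()}
for name, size in sizes.items():
    verdict = "enough" if size >= CLIENTS else "NOT enough"
    print(f"{name}: {size} addresses, {verdict} for {CLIENTS} clients")

ranges_overlap = overlaps(RANGES["Server1"], RANGES["Server2"])
print("Ranges overlap:", ranges_overlap)
```

Each range works out to 1,022 addresses, which matches the chapter's round figure of 1,000.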

If one of the DHCP servers experiences an interruption in service, and it no longer responds, the second server will take over, responding to clients and allowing them to change their IP addresses to the IPs available in the separate range.

The advantages to this design are obvious. In the event of a single server failure, the second server will immediately issue new IP addresses for clients that previously used the failed server. Because both servers run constantly, the failover is instantaneous. In addition, the failed DHCP server could theoretically remain out of service for the entire lease duration because the second server will be able to pick up all the slack from the failed server.

The main caveat to this approach is that a large number of IP addresses must be available for clients, more than twice the number that would normally be required. This may prove difficult, if not impossible, in many networks that have a limited IP range to work with. However, in organizations with a larger IP range, such as those offered by private network configurations (10.x.x.x and so on), this type of configuration is ideal.

As you can see in Figure 10.9, the scope on each server must include the range from the other server, configured as an exclusion, to prevent the types of problems described in the preceding examples.

Note

If your organization uses a private IP addressing scheme such as 10.x.x.x or 192.168.x.x, it is wise to segment available IP addresses on specific subnets to include several times more potential IP addresses than are currently required in a network. This not only ensures effective DHCP failover strategies, but it also allows for robust network growth without the need for an IP addressing overhaul.

Standby Scopes Approach

A standby DHCP server is simply a server with DHCP installed, configured with scopes, but not turned on. The scopes must be configured in different ranges, as in the previous examples, but they normally lie dormant until they are needed. The advantage to this approach lies in the fact that the DHCP service can be installed on a server that will not normally be using additional resources for DHCP. In the case of a problem, you simply need to activate the dormant scopes. An automated tool or script can be used to perform this function, if desired.
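One possible shape for such a script is sketched below. It only builds and prints the commands and does not run them. It assumes the `netsh dhcp server scope <scope> set state` command (1 to activate, 0 to deactivate), which should be checked against the netsh help on your own server before use; the scope IDs are borrowed from the earlier examples purely for illustration, and `scope_state_commands` is a hypothetical helper.

```python
def scope_state_commands(scope_ids, active=True):
    """Build one netsh command (as an argument list) per scope.

    Assumes the 'set state' syntax of the netsh dhcp context:
    state 1 activates a scope, state 0 deactivates it.
    """
    state = "1" if active else "0"
    return [
        ["netsh", "dhcp", "server", "scope", scope_id, "set", "state", state]
        for scope_id in scope_ids
    ]


# Illustrative scope IDs only; replace with the dormant scopes on the standby server.
STANDBY_SCOPES = ["192.168.1.0", "10.2.0.0"]

# Dry run: print the commands for review instead of executing them.
for command in scope_state_commands(STANDBY_SCOPES):
    print(" ".join(command))

# To execute on the standby server, each list could be passed to
# subprocess.run(command, check=True) once the syntax has been verified.
```

Printing the commands first gives an administrator a chance to review exactly which scopes will be activated before anything changes on the network.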

Clustering DHCP Servers

The final redundancy option with DHCP is to deploy a clustered server set to run DHCP. In this option, if a single server goes down, the second server in a cluster will take over DHCP operations. This option requires a greater investment in hardware and should be considered only in specific cases in which it is necessary. For more information on clustering servers, see Chapter 31, "System-Level Fault Tolerance (Clustering/Network Load Balancing)."