Load Sharing with Heartbeat | Linux Enterprise Cluster: Build a Highly Available Cluster with Commodity Hardware and Free Software

Using Heartbeat and two computers, we can offer one daemon, or service, from the primary server and then offer a different service from the backup server. If either server fails, the other one will start offering both services.

This is a form of load sharing called an active-active server configuration, but to use it you will have to ensure that each system has the processing power and ability to handle the network load that will allow it to offer both services in the event of a failure. This configuration, however, is much more difficult to administer and support in a production environment. Neither server can go down for maintenance or upgrades without causing a failover of at least one service. Figure 8-1 shows a sample diagram of this type of Heartbeat configuration.

image from book
Figure 8-1: Heartbeat active-active configuration

The two-line haresources entry used to create the configuration shown in Figure 8-1 (again, the primary and the backup server should always have identical haresources files) looks like this:

 primary.mydomain.com 209.100.100.3 sendmail backup.mydomain.com 209.100.100.4 httpd

Once Heartbeat is running on both servers, the primary.mydomain.com computer will offer sendmail at IP address 209.100.100.3, and the backup. mydomain.com computer will offer httpd at IP address 209.100.100.4. If the backup server fails, the primary server will perform Gratuitous ARP broadcasts for the IP address 209.100.100.4 and run httpd with the start argument. (The names "primary" and "backup" are really meaningless in this configuration because both computers are acting as both a primary and a backup server.) Client computers would always look for the sendmail resource at 209.100.100.3 and the http resource at 209.100.100.4, and even if one of these servers went down, the resource would still be available once Heartbeat started the resource and moved the IP alias over to the other computer.

Note

This configuration is more difficult to administer than an active-standby configuration, because you will always be making changes on a "live" system unless you failover all resources so they run on a single server before you do your maintenance.

This configuration is also not recommended when using local data (data stored on a locally attached disk drive of either server) because complex data replication and synchronization methods must be used.^[9]

Load Sharing with Heartbeat: Round-Robin DNS

But what if you wanted to offer just one resource from both computers and have them share the work? This is the goal of the cluster described in this book, and it will be possible to attain that goal using load balancing software described in Part III. However, for now we can achieve a simple form of load balancing using round-robin DNS.

One feature of the Domain Name System called round-robin DNS is that it allows you to offer one service on two (or more) IP addresses. For example, the host name (or web URL) is first resolved at IP address 209.100.100.4, then at 209.100.100.3, and then back at 209.100.100.4 again, and so on in roundrobin fashion. The DNS (BIND version 4.9.3 or later) entry for your server might look like this:

 ;Round-robin entry for www.mydomain.com www.mydomain.com    IN  A  209.100.100.3 www.mydomain.com    IN  A  209.100.100.4

with reverse address entries for the 209.100.100.3 and 209.100.100.4 IP addresses that look like this:

 3         IN PTR www.mydomain.com 4         IN PTR www.mydomain.com

The entry in your haresources file for this type of configuration would then look like this:

 primary.mydomain.com 209.100.100.3 httpd backup.mydomain.com 209.100.100.4 httpd

Problems with Round Robin DNS Load Balancing

When using round-robin DNS, the two servers would, in theory, each get half of the client requests, and Heartbeat would ensure that both IP addresses are available even if one of the servers goes down. Most client computers, however, have a name services caching daemon or NSCD that will cause them to remember (at least for a while) an IP address once they learn it. Storing this IP address on the client computer reduces the need for the client computer to repeatedly ask, "What is the IP address for this host name?" and helps improve the chances that client computers will not enter into a dialog (such as an HTTPS secure transaction) with one web server only to end up improperly sending a response (such as a credit card number) to another web server.

Caching IP address-to-host name can cause a cache-only DNS server on the Internet to respond to client requests for an IP address with a nonauthoritative reply using only one of the IP addresses. This intervening, cache-only DNS server effectively blocks the round-robin DNS replies from your authoritative DNS server.

You can try to stop this behavior by setting a very low time to live (TTL) value for your DNS replies. Once the amount of time specified in your time to live entry elapses, the intervening DNS server should drop the IP address-to-host name mapping it has stored in its memory and ask your authoritative DNS server once again for the proper IP address.

But there's a problem: If everyone on the Internet started using a onesecond DNS TTL value, every client computer on the Internet would effectively end up asking only the authoritative DNS server for the correct IP address when doing a host name-to-IP address resolution. This would circumvent the design of the DNS system whereby intervening DNS servers cache this information and thus reduce both the time it takes to resolve a host name and the amount of DNS traffic that has to be passed around on the Internet. (A very low TTL value for DNS entries means a more heavily loaded authoritative DNS server.^[10])

Using round-robin DNS and this type of heartbeat configuration for load sharing, however, lets you locate your two servers at two different physical locations. In the event of a true disaster, one of the servers would be able to take over both (or all) IP addresses, and once the routers on the Internet figured out where the IP address had moved to, eventually allow the client computers to continue to connect to your web server at its new location.

Note

This configuration is very susceptible to a split-brain condition. If exclusive access to resources is required, automatic failover mechanisms that require heartbeats to traverse a WAN or a public network should not be used.

Wide-Area Load Balancing

Now, what if the client computer on the Internet is much closer to one of the two web servers offering the same web page and it accidentally ends up with the IP address that happens to be on a server located on the other side of the world? It would make sense to offer your resources from the servers closest to the Internet client and only force the client to route to another server in the event of a failure or system crash.

This is called wide-area load balancing or globally distributed content and can be accomplished with an open source program for Linux called Super Sparrow. A server running Super Sparrow will examine the client's source address and determine whether the client computer would be better off talking to a different server with synchronized content that is closer to the client computer. If so, the client computer's request is redirected to the closest available server. (For more information, see the Super Sparrow website at http://www.supersparrow.org.)

^[9]See Chapter 7, "Failover Configurations and Issues," in Blueprints for High Availability by Evan Marcus and Hal Stern.

^[10]To prevent this problem from ever happening, many DNS clients and servers ignore small TTL values and use cached information anyway.