Heartbeat Control Messages | Linux Enterprise Cluster: Build a Highly Available Cluster with Commodity Hardware and Free Software

In this chapter we will look at the three most basic heartbeat control messages^[6] (three kinds of packets, if you are using an Ethernet network):

Heartbeats or status messages
Cluster transition messages
Retransmission requests

Heartbeats

Heartbeats (sometimes called status messages) are broadcast, unicast, or multicast packets that are only about 150 bytes long. You control how often each computer sends its heartbeat and how long the heartbeat daemon running on another node should wait, without hearing one, before assuming something has gone wrong.
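The two timing knobs described above can be illustrated with a minimal Python sketch. This is not Heartbeat's own code: the class PeerMonitor, its methods, and the injectable clock are hypothetical names invented for this example, and the deadtime parameter is simply modeled on the idea of "how long to wait before assuming something has gone wrong."

```python
import time


class PeerMonitor:
    """Tracks when a peer's heartbeat was last seen (illustrative only)."""

    def __init__(self, deadtime, now=time.monotonic):
        self.deadtime = deadtime   # seconds of silence tolerated (assumed knob)
        self._now = now            # injectable clock, so the logic can be tested
        self.last_seen = self._now()

    def heartbeat_received(self):
        # Called whenever a heartbeat packet arrives from the peer.
        self.last_seen = self._now()

    def peer_is_dead(self):
        # The peer is presumed dead once the silence exceeds deadtime.
        return self._now() - self.last_seen > self.deadtime
```

A daemon built along these lines would call heartbeat_received() for every status message it reads from the network and poll peer_is_dead() on a timer; the sending interval would normally be several times shorter than deadtime so that one lost packet does not trigger a failover.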

Cluster Transition Messages

The two most prevalent cluster transition messages are ip-request and ip-request-resp. These messages are relatively rare and contain the conversation between heartbeat daemons when they want to move a resource from one computer to another.

When you repair the primary server and it comes back online, it uses ip-request to ask the backup server to release the resource it took ownership of when the primary server failed. The backup server then shuts off the service and replies with an ip-request-resp message to inform the primary server that it no longer owns the resource. When the primary server receives this ip-request-resp, it starts up the service and offers it to the client computers again (it takes back ownership of the resource).
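The exchange described above can be sketched as a small Python simulation. Only the two message names, ip-request and ip-request-resp, come from the text; the Node class, the owns_resource flag, and the fail_back function are hypothetical, and starting or stopping a real service is reduced to flipping a boolean.

```python
class Node:
    """A toy stand-in for one heartbeat daemon (illustrative only)."""

    def __init__(self, name, owns_resource=False):
        self.name = name
        self.owns_resource = owns_resource

    def handle(self, msg_type):
        """Process one control message and return the reply type, or None."""
        if msg_type == "ip-request":
            # The peer wants the resource back: stop the service, then confirm.
            self.owns_resource = False
            return "ip-request-resp"
        if msg_type == "ip-request-resp":
            # The peer has released the resource: safe to start the service here.
            self.owns_resource = True
        return None


def fail_back(primary, backup):
    """Run the two-message exchange and return a log of (sender, message)."""
    log = [(primary.name, "ip-request")]
    reply = backup.handle("ip-request")
    log.append((backup.name, reply))
    primary.handle(reply)
    return log
```

The ordering is the point of the sketch: the primary takes ownership only after it has received the reply, so at no moment do both nodes believe they own the resource.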

Retransmission Requests

The rexmit-request (or ns_rexmit) message, a request for a retransmission of a heartbeat control message, is issued when one of the servers running heartbeat notices that it is receiving heartbeat control messages that are out of sequence. (Heartbeat daemons use sequence numbers to detect packets that were dropped or arrived out of order.) The retransmission request itself carries no sequence number (hence the "ns" in ns_rexmit), and a heartbeat daemon will ask for a retransmission at most once every second, to avoid flooding the network with these retry requests when something goes wrong.
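As a rough illustration of gap detection combined with the once-per-second limit, consider the following Python sketch. It is a simplified model under stated assumptions, not Heartbeat's implementation: the SequenceTracker class, its receive method, and the min_interval parameter are all hypothetical names, and sequence numbers are assumed to be plain increasing integers.

```python
import time


class SequenceTracker:
    """Detects sequence gaps and rate-limits retransmission requests."""

    def __init__(self, min_interval=1.0, now=time.monotonic):
        self.expected = None          # next sequence number we expect to see
        self.missing = set()          # sequence numbers not yet received
        self.min_interval = min_interval
        self._now = now               # injectable clock, so the logic can be tested
        self._last_request = None     # time of the last retransmission request

    def receive(self, seq):
        """Record one packet; return the sequence numbers to request now."""
        if self.expected is None:
            self.expected = seq + 1   # first packet seen: nothing to compare with
            return []
        if seq >= self.expected:
            # Everything between the expected number and this one was skipped.
            self.missing.update(range(self.expected, seq))
            self.expected = seq + 1
        else:
            # A late or retransmitted packet fills a gap.
            self.missing.discard(seq)
        if not self.missing:
            return []
        now = self._now()
        if (self._last_request is not None
                and now - self._last_request < self.min_interval):
            return []                 # too soon since the last request
        self._last_request = now
        return sorted(self.missing)
```

A non-empty return value is the moment at which a daemon would send its retransmission request; packets that arrive while the limit is in force are still recorded, so the next permitted request covers every outstanding gap.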

Ethernet Heartbeat Control Messages

All three of these heartbeat control messages are sent using the UDP protocol to either the port number specified in the /etc/ha.d/ha.cf file, or to the multicast address specified in this same configuration file (when using Ethernet).

Currently Heartbeat does not support more than two nodes.^[7] More than one pair of heartbeat servers can share the same Ethernet network connection and exchange heartbeats and heartbeat control messages, but each of these pairs of heartbeat servers must use a unique UDP port number as specified in the /etc/ha.d/ha.cf file, or a unique unicast or multicast address.

Security and Heartbeat Control Messages

In addition to using sequence numbers to recover from dropped packets, Heartbeat digitally signs each packet using either a 128-bit hashing algorithm called MD5 (see RFC^[8] 1321), or the even more secure 160-bit HMAC-SHA1 (see RFC 2104). These methods authenticate packets; they do not encrypt them. (You enter the same authentication key, or password, for either of these methods on both the primary and the backup heartbeat nodes.)
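To make the idea of signing with a shared key concrete, here is a short Python sketch that uses the standard library hmac and hashlib modules to compute and check an HMAC-SHA1 value. It shows the general technique only; how Heartbeat lays out the signature inside a packet is not covered here, and the function names sign and verify, the sample payload, and the sample key are made up for the example.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> str:
    """Return the HMAC-SHA1 of payload as 40 hexadecimal characters."""
    return hmac.new(key, payload, hashlib.sha1).hexdigest()


def verify(payload: bytes, key: bytes, signature: str) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign(payload, key), signature)


# Both nodes must be configured with the same key (hypothetical value here).
shared_key = b"example-shared-secret"
packet = b"t=status seq=42 src=primary"
signature = sign(packet, shared_key)
```

A receiver that holds the same key recomputes the value and discards any packet for which verify returns False, which is what defeats spoofed packets. The payload still travels in the clear, so this provides authentication, not confidentiality.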

Note

The Heartbeat developers recommend that you use one of these authentication methods even on private networks to protect Heartbeat from an attacker (spoofed packets or a packet replay attack).

^[6]For a complete list of heartbeat control message types see this website: http://www.linux-ha.org/comm/Heartbeat-msgfmt.html.

^[7]To be more accurate, Heartbeat will only allow two nodes to share haresources entries (see Chapter 8 for more information about the proper use of the haresources file). In this book we are focusing on building a pair of high-availability LVS-DR directors to achieve high-availability clustering.

^[8]The Requests for Comments (RFCs) can be found at http://www.rfc-editor.org.