Check Point High Availability (CPHA) | The Best Damn Firewall Book Period

High availability can be your best friend, from both a network performance and a security perspective. Many enterprises are concerned that the firewall is their single point of failure, and we've seen more than one contingency plan for redirecting traffic around a firewall should it fail. With a highly available solution, such a plan won't be needed.

One of the first questions we are often asked when dealing with high availability concerns the definition of available. What makes a system available? Is it that the operating system is…for lack of a better term…operating? Is it defined by a daemon on the system, or, like a server group discussed earlier in the book, does it require some sort of agent installed to monitor "upness"? To answer these questions, we'll delve into the mechanics of Check Point High Availability.

Enabling High Availability

Before you can begin using HA or define and join clusters, you have to do some preparatory work. First, make sure that you have the proper licensing in place to run the High Availability module, and that HA is enabled. Then you must define the configuration and the IP addresses on the future cluster members. The cluster members must have three interfaces; four are preferred if you opt to use synchronization. All of the internally facing IP addresses must be the same, as must all of the externally facing addresses. The Check Point High Availability module makes sure that the MAC addresses are identical, so there's no need to tinker with Address Resolution Protocol (ARP) entries.

Figure 15.1 illustrates what a sample network layout for High Availability might look like. Note that all of the externally facing IP addresses in the diagram are the same (shown as .101, the final octet), as are the internal IP addresses. Also, while we indicate that a hub or switch can be used, at the time of this writing Check Point is reported to be addressing a problem whereby only a hub can be used for an HA cluster. The interfaces on the management segment must each use a unique IP address. Finally, if you opt for state synchronization, you'll probably want to connect the firewall machines on another interface, one used exclusively for synchronization. We'll discuss synchronization later in the chapter.

Figure 15.1: An HA Cluster

The next step toward gaining the benefits of Check Point High Availability is to enable it on the enforcement module. This is a really easy step, and only involves running the cpconfig command. An example of the cpconfig command run on a Solaris machine that is running the enforcement module is shown in the output below and in Figure 15.2.

# cpconfig
This program will let you re-configure your VPN-1 & FireWall-1 configuration.
Configuration Options:
----------------------
(1)  Licenses
(2)  SNMP Extension
(3)  PKCS#11 Token
(4)  Random Pool
(5)  Secure Internal Communication
(6)  Enable Check Point High Availability/State Synchronization
(7)  Automatic start of Check Point modules
(8)  Exit
Enter your choice (1-8) :6
Configuring Enable Check Point High Availability/State Synchronization...
===========================================================
High Availability module is currently disabled.
Would you like to enable the High Availability module (y/n) [y] ? y
-----------------------------------------------------------
You have changed the High Availability configuration.
Would you like to restart High Availability Module now so that your changes will take effect? (y/n) [y] ? y
***********************************************************
The High Availability module is now enabled.
cpconfig will now end. To continue, please run cpconfig again.

Figure 15.2: Enabling High Availability

Enabling Check Point High Availability in Windows is even easier, since the Windows version of cpconfig is GUI-based. Select Start | Programs | Check Point Management Clients | Check Point Configuration NG, then open the High Availability tab. Place a checkmark in the checkbox to enable High Availability.

Note

There are two distinct types of boot for an HA member. At the first boot, no elements of the cluster are yet associated with the machine: the policy has not been installed, no priority is associated with the machine, and no gateway priority has been defined. In this case, the gateway listens on UDP port 8116 for information from an already configured cluster member. If it can't get that information from a configured cluster member, it looks for traffic from other machines using its shared IP address. Once it sees that traffic, it selects the MAC address of the machine with the lowest Random ID and uses it as its own.

After that initial boot, and after the remaining cluster information has been assigned, the CPHA module looks for packets coming from the Primary cluster machine, compares that machine's MAC to its own, and changes its own, if necessary.
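The two boot paths described in this note can be modeled as a small decision function. The following is a minimal Python sketch for illustration only, not Check Point code: the `ObservedPeer` type and both function names are hypothetical, and so are two assumptions the note does not spell out, namely that a configured member's information includes the shared MAC and that a member which hears nothing keeps its current MAC.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ObservedPeer:
    """A machine seen on the segment using this member's shared IP (hypothetical model)."""
    random_id: int
    mac: str


def first_boot_mac(own_mac: str,
                   configured_member_mac: Optional[str],
                   peers: List[ObservedPeer]) -> str:
    """Pick the MAC address an unconfigured member adopts at its first boot."""
    # Preferred source: an already configured member heard on UDP port 8116.
    # (Assumption: that member's information includes the shared MAC.)
    if configured_member_mac is not None:
        return configured_member_mac
    # Fallback: traffic from other machines sharing our IP address;
    # adopt the MAC of the one with the lowest Random ID.
    if peers:
        return min(peers, key=lambda p: p.random_id).mac
    # Assumption: with nothing heard yet, keep the current MAC.
    return own_mac


def later_boot_mac(own_mac: str, primary_mac: str) -> Tuple[str, bool]:
    """After the first boot: compare with the Primary's MAC and change ours if needed."""
    changed = own_mac.lower() != primary_mac.lower()
    return (primary_mac if changed else own_mac), changed
```

The comparison in `later_boot_mac` ignores case because MAC addresses are often printed in either case; the returned flag tells the caller whether the interface address actually has to change.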

There are some restrictions when implementing a High Availability solution. The gateways must be running the same version of VPN-1/FW-1, and they must be on the same platform (e.g., you cannot synchronize a Solaris firewall with a Windows NT firewall). Also, you must have a separate management server; the management module cannot reside on a cluster member.
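If you keep an inventory of your gateways, these restrictions are easy to check before you build the cluster. A minimal Python sketch, assuming a hypothetical `Gateway` record holding each machine's VPN-1/FW-1 version string, operating system, and whether it hosts the management module; the version strings shown are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gateway:
    name: str
    version: str           # VPN-1/FW-1 version string, e.g. "NG FP3" (illustrative)
    platform: str          # operating system, e.g. "Solaris" or "Windows NT"
    hosts_management: bool


def cluster_problems(members: List[Gateway]) -> List[str]:
    """Return the violated restrictions; an empty list means none were found."""
    problems: List[str] = []
    versions = sorted({m.version for m in members})
    if len(versions) > 1:
        problems.append("mixed VPN-1/FW-1 versions: " + ", ".join(versions))
    platforms = sorted({m.platform for m in members})
    if len(platforms) > 1:
        problems.append("mixed platforms: " + ", ".join(platforms))
    for m in members:
        # The management module must live on a separate server.
        if m.hosts_management:
            problems.append(f"{m.name} hosts the management module")
    return problems
```

For example, pairing a Solaris gateway with a Windows NT gateway that also runs the management module reports two problems.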

Another wise bit of advice is to configure each cluster member offline, that is, off the network. Building machines disconnected from the network is good security practice anyway, but there is a different reason here: because the members share IP addresses, configuring them offline avoids the address conflicts that would occur if they were all active on the same network segment. Finally, if you are configuring a Single Entry Point (SEP) Virtual Private Network (VPN) HA solution, the VPN domain for the cluster should be a group object containing the cluster member gateways and their respective VPN domains. We'll discuss SEP later in this chapter.

Failing Over

Now that we've seen how to enable Check Point's High Availability, your next question most likely harkens back to our earlier wonderings about what classifies a system as "up." When dealing with Check Point FW-1, the answer to this question is up to you.

When using the Check Point High Availability module, you gain access to the functionality of the cphaprob command. This command enables you to define services that are considered critical to the operation of the VPN-1/FW-1 system. There are also some default conditions that must be met for the system to be considered available. These are as follows:

The fwd process must be running and must not report any problems (for example, uninstalling the security policy is considered a problem).
The network connection must be active.
The machine must be running.
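Taken together, the three default conditions form a simple conjunction: if any one fails, the member is not available. A minimal Python sketch of that logic; the `MemberHealth` record and its field names are hypothetical and simply stand in for whatever the HA module observes internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemberHealth:
    machine_running: bool
    network_up: bool
    fwd_running: bool
    # Problems reported by fwd, e.g. "security policy uninstalled" (hypothetical labels).
    fwd_problems: List[str] = field(default_factory=list)


def meets_default_conditions(h: MemberHealth) -> bool:
    """True only if all three default availability conditions hold."""
    return (h.machine_running
            and h.network_up
            and h.fwd_running
            and not h.fwd_problems)
```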

These are, of course, the most basic of conditions. As you've come to expect (and, we hope, appreciate), Check Point enables you to make the checking more granular. This is done using the aforementioned cphaprob command, which registers additional devices within the firewall machine as critical, so that the failure of any of them causes the member to give up control of the cluster to another member. The options to this command are displayed in Table 15.1.

Table 15.1: cphaprob Command Options
Command Option	Command Explanation
-d <device name>	Specify a device to be monitored.
-s <status>	Specify the state of the device. Status can be "ok," "init," or "problem." If the value is anything besides "ok," the device is not considered active.
-t <timeout>	Define a timeout value. If the device doesn't report its status before the timeout expires, the device is considered to have failed.
-f <filename> register	Specify a file containing multiple device definitions.
[-i[a]] [-e] list	Display the current state of CPHA devices.
register	Register the device as a critical process.
unregister	Remove the registration of this device as a critical process.
report	Display the status of the HA modules.
if	Display the status of interfaces.
init	Instruct the firewall to reacquire the shared MAC address.
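The -s and -t semantics in Table 15.1 amount to a simple rule: a registered device counts as failed if its last reported status is anything other than "ok", or if it has not reported within its timeout. The following Python sketch models that rule for illustration; the `DeviceRegistry` class, its method names, and the explicit `now` timestamps are hypothetical and are not how cphaprob is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List

VALID_STATUSES = {"ok", "init", "problem"}


@dataclass
class CriticalDevice:
    timeout_s: float
    status: str
    last_report: float  # time of the most recent status report, in seconds


class DeviceRegistry:
    """Hypothetical model of devices registered as critical (not Check Point code)."""

    def __init__(self) -> None:
        self._devices: Dict[str, CriticalDevice] = {}

    def register(self, name: str, timeout_s: float, status: str, now: float) -> None:
        self._check(status)
        self._devices[name] = CriticalDevice(timeout_s, status, now)

    def unregister(self, name: str) -> None:
        self._devices.pop(name, None)

    def report(self, name: str, status: str, now: float) -> None:
        self._check(status)
        dev = self._devices[name]  # KeyError if the device was never registered
        dev.status = status
        dev.last_report = now

    def failed(self, now: float) -> List[str]:
        """Devices whose status is not "ok" or whose last report is older than the timeout."""
        return sorted(name for name, d in self._devices.items()
                      if d.status != "ok" or now - d.last_report > d.timeout_s)

    @staticmethod
    def _check(status: str) -> None:
        if status not in VALID_STATUSES:
            raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
```

Passing `now` explicitly rather than reading the clock keeps the timeout logic easy to test; a device that stays in "init" is treated as failed, matching the table's rule that anything besides "ok" is not active.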

You can also use the cphaprob command with the state argument to see the status of the HA cluster. Example output for a two-member cluster might resemble this:

$ cphaprob state
Number     Unique Address   State
1 (local)  192.168.10.1     active
2          192.168.10.2     standby
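If you want to check cluster state from a script, the sample output can be parsed line by line. A minimal Python sketch, assuming the column layout of the sample above (member number, an optional "(local)" marker, address, state); other versions may format this output differently. The `exactly_one_active` helper encodes an expectation for an active/standby cluster like the sample, not a rule stated by Check Point.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterMember:
    number: int
    address: str
    state: str
    local: bool


def parse_cphaprob_state(output: str) -> List[ClusterMember]:
    members: List[ClusterMember] = []
    for line in output.splitlines():
        parts = line.split()
        # Member rows start with the member number; skip the prompt, header and blanks.
        if not parts or not parts[0].isdigit():
            continue
        local = len(parts) > 1 and parts[1] == "(local)"
        rest = parts[2:] if local else parts[1:]
        if len(rest) < 2:
            continue
        members.append(ClusterMember(int(parts[0]), rest[0], rest[-1].lower(), local))
    return members


def exactly_one_active(members: List[ClusterMember]) -> bool:
    return sum(m.state == "active" for m in members) == 1


SAMPLE = """\
$ cphaprob state
Number     Unique Address   State
1 (local)  192.168.10.1     active
2          192.168.10.2     standby
"""
```

Running `parse_cphaprob_state(SAMPLE)` yields two members, the local one active and the other standby.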

You can also check your log files for information about both synchronization and failover.

Firewall Synchronization

State synchronization allows the firewall or VPN module to be highly available in the truest sense. Without synchronization, the connections that are active when a fail-over occurs will be dropped. This may not matter much for a firewall whose traffic is mostly bound for the Web, but it can be disastrous in a VPN context. You probably never want to be without synchronization when dealing with a VPN.

Synchronization maintains an identical state table on all of the machines involved in the gateway cluster. This, obviously, uses resources. The synchronization process consumes memory, CPU, and network resources, and depending on the size of the state table, this could be significant.

How does it work? The first thing to grasp is that the entire state table is not copied from machine to machine all the time. Obviously, the first synchronization involves the entire state table, but subsequent updates only involve the changes since the last update. The updates occur by default every 100 milliseconds, and while this can be changed, the process isn't easy and you'll probably never want to try. Another thing to consider is that processing the updates takes a minimum of 55 milliseconds. If you are maintaining a particularly busy site, one with a lot of HTTP traffic, for example, your state table may have a larger number of changes, and processing may require more time than the minimum. When we say that synchronization consumes resources, we mean it.
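The full-then-incremental scheme can be illustrated with a dictionary standing in for the state table. This Python sketch shows only the general delta technique the paragraph describes; the tuple connection keys and string values are hypothetical, and the real module's wire format and timing (the 100-millisecond interval) are not modeled.

```python
from typing import Dict, Hashable, Set, Tuple

Table = Dict[Hashable, str]


def compute_delta(previous: Table, current: Table) -> Tuple[Table, Set[Hashable]]:
    """Entries added or changed since `previous`, plus keys that were removed."""
    upserts = {k: v for k, v in current.items()
               if k not in previous or previous[k] != v}
    deletions = set(previous) - set(current)
    return upserts, deletions


def apply_delta(replica: Table, upserts: Table, deletions: Set[Hashable]) -> None:
    """Bring a peer's copy of the table up to date in place."""
    replica.update(upserts)
    for key in deletions:
        replica.pop(key, None)
```

Computing the delta against an empty table yields the whole table, which corresponds to the initial full synchronization; every later round ships only what changed, which is why a busy site with many short connections produces larger updates.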

Also, synchronization is not available when using a Multiple Entry Point (MEP) VPN solution. This is because, as we will discuss later in this chapter, MEP is designed for geographically dispersed VPN gateways. Synchronization is most often used with a SEP VPN solution, and you can see a screenshot of the Synchronization panel in the section on SEP. In a truly user-friendly manner, enabling synchronization is as easy as placing a checkmark in the box labeled Use State Synchronization on the Synchronization tab of the cluster object. Next, define the synchronization network by clicking Add on the Synchronization panel, which displays a panel similar to the one shown in Figure 15.3.

Figure 15.3: Add Synchronization Network

There's a caveat here: make sure that the synchronization network is trusted. One way to do this is to separate synchronization traffic from general-use traffic (for example, by using a crossover cable in a two-member cluster). Next, make sure that FW-1 control connections are allowed to pass between the cluster members: simply create a rule that allows the FW1 service from member to member.

After you have activated synchronization, you'll want to test it to make sure that it is working. There are a couple of different techniques. The quickest way is to check the size of the state tables on each machine. The command to do this is as follows:

fw tab -t connections -s

While this is quick, it is the least accurate. Remember, the state table is updated frequently, so there is a chance that the table on one machine could change before you can type the command.
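Because the two counts are read at slightly different moments, comparing them for exact equality produces false alarms. A Python sketch that instead accepts a small relative difference; the counts are assumed to be the entry counts you read from each member's `fw tab -t connections -s` output, and the 5 percent default tolerance is an arbitrary assumption, not a Check Point figure.

```python
def sizes_roughly_match(count_a: int, count_b: int, tolerance: float = 0.05) -> bool:
    """True if two state-table entry counts differ by at most `tolerance` (a fraction)."""
    if count_a < 0 or count_b < 0:
        raise ValueError("counts must be non-negative")
    larger = max(count_a, count_b)
    if larger == 0:
        return True  # both tables are empty
    return abs(count_a - count_b) / larger <= tolerance
```

For example, counts of 1000 and 980 differ by 2 percent and pass, while 1000 and 800 differ by 20 percent and suggest that synchronization is not keeping up.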

The most accurate method (although we've seen it return false information) is the fw ctl pstat command, whose output includes information about the synchronization process (and other processes as well). A sample of the synchronization portion of the output is shown below.

sync new ver working
sync out: on  sync in: on
sync packets sent:
 total: 2145 retransmitted: 0 retrans reqs:0 acks: 0
sync packets received:
 total 2473 of which 1 queued and 31 dropped by net
 also received 0 retrans reqs and 2 acks to 0 cb requests
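When checking several clusters, it helps to pull the key fields out of this output automatically: both sync out and sync in should read on. A minimal Python sketch, assuming the field wording of the sample shown here; other versions may word the output differently, in which case the parser raises `ValueError` rather than guessing.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncStatus:
    sync_out: str
    sync_in: str
    sent_total: int
    received_total: int
    dropped_by_net: int


def _grab(pattern: str, text: str) -> str:
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return match.group(1)


def parse_sync_status(pstat_output: str) -> SyncStatus:
    # "total" is followed by a colon for sent packets but not for received ones.
    return SyncStatus(
        sync_out=_grab(r"sync out:\s*(\w+)", pstat_output),
        sync_in=_grab(r"sync in:\s*(\w+)", pstat_output),
        sent_total=int(_grab(r"sync packets sent:\s*total:?\s*(\d+)", pstat_output)),
        received_total=int(_grab(r"sync packets received:\s*total:?\s*(\d+)", pstat_output)),
        dropped_by_net=int(_grab(r"(\d+)\s+dropped by net", pstat_output)),
    )


def sync_is_on(status: SyncStatus) -> bool:
    return status.sync_out == "on" and status.sync_in == "on"


SAMPLE_PSTAT = """\
sync new ver working
sync out: on  sync in: on
sync packets sent:
 total: 2145 retransmitted: 0 retrans reqs:0 acks: 0
sync packets received:
 total 2473 of which 1 queued and 31 dropped by net
 also received 0 retrans reqs and 2 acks to 0 cb requests
"""
```

Extracting the dropped-packet count as well lets you watch whether it keeps growing between runs, which a single on/off check would miss.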

In the fw ctl pstat output, the second line is the key to determining whether state synchronization is operating. If synchronization is on, both the sync out and sync in fields should read on.

Another way to check is to confirm, with the netstat -an command, that two or more firewalls are connected to one another (for example, netstat -an | grep 256). On Windows machines you can substitute the findstr command for grep.
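The netstat check can be scripted as well. A minimal Python sketch that returns the local and remote addresses of every ESTABLISHED connection involving port 256, assuming the common `netstat -an` layout in which the local and remote endpoints are the first two address:port fields; Windows and Linux separate address and port with ":", Solaris with ".". The sample lines and addresses are made up to show both separators and are not captured output.

```python
import re
from typing import List, Tuple

# An IPv4 address followed by ':' (Windows, Linux) or '.' (Solaris) and a port number.
_ENDPOINT = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})[.:](\d+)")


def fw1_sessions(netstat_output: str, port: int = 256) -> List[Tuple[str, str]]:
    """Return (local_ip, remote_ip) for each ESTABLISHED connection using `port`."""
    sessions: List[Tuple[str, str]] = []
    for line in netstat_output.splitlines():
        if "ESTABLISHED" not in line.upper():
            continue
        endpoints = _ENDPOINT.findall(line)
        if len(endpoints) < 2:
            continue
        (local_ip, local_port), (remote_ip, remote_port) = endpoints[0], endpoints[1]
        if port in (int(local_port), int(remote_port)):
            sessions.append((local_ip, remote_ip))
    return sessions


SAMPLE_NETSTAT = """\
  TCP    192.168.20.1:1032      192.168.20.2:256       ESTABLISHED
  TCP    0.0.0.0:256            0.0.0.0:0              LISTENING
192.168.20.1.1033    192.168.20.2.256     24820      0 24820      0 ESTABLISHED
"""
```

Listening sockets are skipped on purpose: a member that is merely listening on port 256 does not show that the peers are actually connected.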

What if you are working on a particularly busy boundary firewall cluster, where the vast majority of traffic consists of HTTP and SMTP connections? Each of these connections is relatively short-lived, so they might not be the best candidates for synchronization. HTTP, for example, is stateless by design, so a fail-over probably wouldn't be noticed. Does the burden of synchronization outweigh the benefits? If so, you are in luck: you don't have to synchronize every protocol. You can selectively weed out protocols that consume too many resources relative to how much they need high availability. This is done by editing the $FWDIR/lib/user.def file and inserting an entry like this:

//Don't sync the web!
non_sync_ports {<80, 6>};

The first line is a comment, which is always a wise thing to add. The second line supplies two arguments: a port number (80) and an IP protocol number (6, which is TCP). After applying that change to all cluster members and restarting the firewall service, you'll no longer be syncing HTTP, and you may be saving CPU cycles.
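If you exclude more than one service, such as both HTTP and SMTP, generating the entry avoids typos in the angle-bracket syntax. A small Python sketch, assuming that several exclusions go in one set separated by commas (for example, {<80, 6>, <25, 6>}); check that syntax against the documentation for your version before editing user.def on the cluster members. The function name is hypothetical.

```python
from typing import Iterable, Tuple

TCP, UDP = 6, 17  # IP protocol numbers


def non_sync_ports_line(entries: Iterable[Tuple[int, int]]) -> str:
    """Format a non_sync_ports entry from (port, IP protocol) pairs."""
    pairs = list(entries)
    if not pairs:
        raise ValueError("at least one (port, protocol) pair is required")
    for port, proto in pairs:
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
        if not 0 <= proto <= 255:
            raise ValueError(f"invalid IP protocol number: {proto}")
    body = ", ".join(f"<{port}, {proto}>" for port, proto in pairs)
    # Doubled braces in the f-string produce literal { and }.
    return f"non_sync_ports {{{body}}};"
```

For HTTP alone, `non_sync_ports_line([(80, TCP)])` reproduces the entry shown above; adding `(25, TCP)` would also stop synchronizing SMTP.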