Check Point High Availability and Check Point Load Sharing


High availability can be your best friend, from both a network performance and a security perspective. Many enterprises are concerned about the firewall being their single point of failure, and some organizations even have a contingency plan that redirects traffic around a firewall should it fail. This is a poor solution, because an attacker could deliberately try to cause the failure. With a highly available solution, this won't be necessary.

The first question you have to ask yourself when implementing high availability is: What makes a system available? Is it that the operating system is, for lack of a better term, operating? Is it defined by a daemon on the system, or, like a server group discussed earlier in the book, does it require some sort of agent installed to monitor "upness"? To answer these questions, we'll delve into the mechanics of Check Point High Availability.

Load sharing is simply an extension of high availability that allows all systems in the cluster to process traffic and be active at the same time.

Enabling High Availability (Legacy Mode)

Before you can begin using high availability, or define and join clusters, you have to do some preparatory work. First, make sure that you have the proper licensing in place to run the High Availability module, and that high availability is enabled. Then define the configuration and the Internet Protocol (IP) addresses on the future cluster members. Each cluster member must have at least three interfaces; four are preferred if you opt to run synchronization on a network separate from the management network. All of the internally facing IP addresses must be the same, as must all of the externally facing addresses. The Check Point High Availability module will make sure that the media access control (MAC) addresses are identical, so there's no need to play around with Address Resolution Protocol (ARP) entries. Figure 12.1 illustrates what a sample network layout for high availability might look like. Note that all of the externally facing IP addresses are the same in the diagram (noted as .5 to indicate the final octet), as are the internal IP addresses. The interfaces on the management segment and synchronization network must each use a unique IP address.

Figure 12.1: Highly Available Cluster using Legacy Mode

The next step toward gaining the benefits of Check Point High Availability is to enable it on the enforcement module. This is an easy step that only involves running the cpconfig command. On UNIX installations, run cpconfig, select Enable Check Point High Availability/State Synchronization, and answer y for yes. On Windows, access the High Availability tab by selecting Start > Programs > Check Point Management Clients > Check Point Configuration NG > High Availability. Place a checkmark in the checkbox to enable High Availability.

Because each system maintains the same IP addresses and MAC addresses on shared interfaces, when a failover condition occurs the standby system simply begins responding to ARP requests and starts processing the traffic. Because the same MAC address is used, no information has to be updated on routers or other connected servers.

There are some restrictions when implementing a high availability solution. The gateways must be running the same version of Check Point VPN-1/FireWall-1 (VPN-1/FW-1), and they must be on the same platform (for example, you cannot synchronize a Solaris firewall with a Windows NT firewall). Also, you must have a separate management server; the management module cannot reside on a cluster member.

Another wise bit of advice is to configure each cluster member offline; that is, off of the network. While it is good security practice to build machines while they are disconnected from the network anyway, there is a different reason here. Since the machines will be sharing IP addresses, it's nice to avoid the address conflicts that might occur if the machines were active on the network segment. Finally, if you are configuring a single entry point (SEP) VPN high availability solution, the VPN domain for the cluster should be a group object containing the cluster member gateways and their respective VPN domains. We'll discuss SEP later in this chapter.

start sidebar
Configuring & Implementing
How does the High Availability Module Select the MAC Address?

There are two distinct types of bootups for a High Availability (HA) member. Initially, at the first boot, there are no real elements of the cluster associated with that machine. The policy has not yet been installed, and no gateway priority has been defined for the machine. In this case, the gateway begins to look for information by listening on User Datagram Protocol (UDP) port 8116 for traffic from an already configured cluster member. If it can't determine information from a configured cluster member, then it looks for information from other machines with its shared IP address. Once it sees that traffic, it will select the MAC address from the machine with the lowest Random ID and use it for its own.

After that initial boot, and after the remaining cluster information has been assigned, the CPHA module looks for packets coming from the primary cluster machine, compares that machine's MAC address to its own, and changes its own if necessary.

end sidebar
 

Enabling High Availability (New Mode)

New mode HA is very different from legacy mode. Legacy mode has many limitations, one of which is that all systems use the same address on interfaces that are marked for high availability. New mode overcomes this. It functions in a way that is similar to other HA protocols such as Virtual Router Redundancy Protocol (VRRP) and Hot Standby Router Protocol (HSRP), in that each system has its own IP address and uses a secondary, virtual IP (VIP) address for communicating with other devices on the network. Figure 12.2 shows the differences in IP addressing between legacy mode and new mode. In this configuration, the management station does not have to be on its own network, since it can communicate directly with each of the cluster members.

Figure 12.2: Other HA and Load Sharing Cluster Configurations

Another difference in this mode is how traffic is migrated from one system to another. At any one point in time, the VIP will resolve to the MAC address of the active cluster member. Each standby system will respond to ARP requests for its native IP address, but not for the VIP. If the cluster needs to fail traffic over to the standby member, the standby begins responding to ARP requests for the VIP. To speed up the failover, it also sends gratuitous ARP replies to other systems on the network to notify them of the MAC address change for the VIP address. This shortens the failover time significantly.
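The ARP behavior described above can be modeled in a few lines. The following is a minimal sketch, not Check Point code: the Member and Cluster classes, the addresses, and the dictionary standing in for a neighbor's ARP cache are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    ip: str    # the member's own (native) address
    mac: str


class Cluster:
    def __init__(self, vip, members):
        self.vip = vip
        self.members = members
        self.active = members[0]

    def arp_reply(self, target_ip):
        """Return the MAC that answers an ARP request for target_ip, or None."""
        if target_ip == self.vip:
            return self.active.mac          # only the active member answers for the VIP
        for member in self.members:
            if member.ip == target_ip:
                return member.mac           # each member answers for its own address
        return None

    def fail_over(self, new_active, neighbor_caches):
        """Make new_active the active member and push gratuitous ARP updates."""
        self.active = new_active
        for cache in neighbor_caches:
            cache[self.vip] = new_active.mac  # unsolicited update; no request needed


# Hypothetical two-member cluster using documentation addresses.
fw_a = Member("fw-a", "192.0.2.2", "02:00:00:00:00:0a")
fw_b = Member("fw-b", "192.0.2.3", "02:00:00:00:00:0b")
cluster = Cluster("192.0.2.1", [fw_a, fw_b])

router_cache = {cluster.vip: cluster.arp_reply(cluster.vip)}  # learned before failover
cluster.fail_over(fw_b, [router_cache])
print(router_cache[cluster.vip])  # prints 02:00:00:00:00:0b
```

Without the gratuitous update, the router would keep sending to the failed member's MAC address until its own ARP cache entry expired, which is why the unsolicited replies shorten failover.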

Enabling Load Sharing (Multicast Mode)

Some organizations don't like to hear that they are paying for systems that sit idle. Others may need to spread load across multiple systems because of the load that heavy use of VPNs or security servers can generate. This is where load sharing comes into the picture. Load sharing is an extension of the HA modes discussed previously; it still allows traffic to be dynamically rerouted around a failed gateway to an active one without losing session state, but it also allows all systems in a cluster to be active instead of designating one or more as standby.

Load sharing configuration is a tricky process with a few caveats. You should definitely set it up in a lab before attempting to implement it. One of the biggest caveats is that there are numerous devices out there that it will not interoperate with. The reason is that Check Point's multicast load sharing design requires every system in the cluster to see every packet; otherwise the firewalls, or other network devices that treat the cluster as a single device, would have no way to distribute the traffic among the members. Using what Check Point calls a decision function, the cluster members decide which system will process which connections. This ensures that exactly one system processes each packet, so that the packet is neither inadvertently dropped nor processed twice and duplicated. Typical networks are designed for unicast, and most network administrators expect each packet to be delivered to a single destination, so getting a single packet to multiple devices at the same time is rather difficult.
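To make the idea of a decision function concrete, here is a minimal sketch in which every member independently hashes the connection and only the owner accepts the packet. The text does not describe how Check Point's decision function is actually computed, so the CRC-based hash, the function names, and the packet tuple layout are all assumptions for illustration.

```python
import zlib


def owner(packet, members):
    """Pick the member that owns the connection a packet belongs to.

    packet is (src_ip, src_port, dst_ip, dst_port, protocol); members is the
    list of live member names, in the same order on every member.
    """
    src_ip, src_port, dst_ip, dst_port, protocol = packet
    # Sort the endpoints so both directions of a connection hash identically.
    endpoints = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = repr((endpoints, protocol)).encode()
    return members[zlib.crc32(key) % len(members)]


def should_process(me, packet, members):
    """Each member runs this on every packet it sees; exactly one says yes."""
    return owner(packet, members) == me


members = ["fw-a", "fw-b", "fw-c"]
request = ("10.1.1.20", 40312, "198.51.100.7", 443, "tcp")
reply = ("198.51.100.7", 443, "10.1.1.20", 40312, "tcp")

for name in members:
    print(name, should_process(name, request, members), should_process(name, reply, members))
```

Because every member evaluates the same deterministic function over the same member list, no coordination is needed per packet, and the request and reply of one connection land on the same member.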

To solve this, Check Point operates load sharing multicast mode much like CPHA new mode, with one small change: the MAC address used is a multicast MAC address instead of a unicast MAC address (the type your desktop system uses). A multicast MAC address has the least significant bit of its first octet set, so it typically begins with 01:, whereas a unicast address has that bit clear, as in an address beginning with 00:. Multicasting allows a single MAC address to be associated with multiple physical interfaces. Basically, this tells networking devices to send the packet to multiple network cards at the same time. Unfortunately, the combination of a multicast MAC address and a unicast IP address is not handled properly by some networking equipment. A short list of routers and switches known to handle this correctly can be found in Check Point's ClusterXL User Guide.
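The difference between the two address types comes down to a single bit: in an Ethernet MAC address, the least significant bit of the first octet marks a group (multicast) address. A small helper, named is_multicast_mac here purely for illustration, shows the test:

```python
def is_multicast_mac(mac):
    """Return True if the group bit (lowest bit of the first octet) is set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x01)


print(is_multicast_mac("01:00:5e:00:00:01"))  # prints True
print(is_multicast_mac("00:0c:29:aa:bb:cc"))  # prints False
```

This is a handy check when reading switch MAC tables or packet captures while troubleshooting equipment that mishandles a multicast MAC paired with a unicast IP address.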

Enabling Load Sharing (Unicast/Pivot Mode)

If you do not have the equipment or the inclination to support load sharing in multicast mode, NG with Application Intelligence (AI) adds the option to do load sharing using unicast MAC addresses instead of multicast ones.

Unicast mode, also called pivot mode, provides a solution to the limitations you may run into in your environment with multicast mode. In unicast mode, the handling of MAC addresses is similar to CPHA new mode in that only one device responds to ARP requests for the VIP address and traffic is sent to only that device. This device, referred to as the pivot, receives all the traffic and is the only device that applies the decision function, which determines which cluster member will route and inspect each packet. The pivot can forward the traffic to any of the other cluster members for inspection or inspect the traffic itself, which is why all members are classified as active rather than standby. Because of the additional overhead of applying the decision function, the pivot will typically handle less traffic than the other devices in the cluster.

Other cluster members do not have to apply a decision function, because they only see traffic they have to process and inspect, so each processes every packet it sees.

If the pivot fails, the next highest priority gateway takes over the pivot role and reassigns the share of load each remaining device is responsible for. All traffic, including connections that were being processed by the failed pivot, will continue to function. When the failed device comes back online, it resumes the pivot role by telling the current pivot to fail back to it.
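The takeover and fail-back behavior amounts to "the highest-priority member that is up holds the pivot role." The sketch below illustrates that rule only; the elect_pivot function, the member names, and the convention that a lower number means a higher priority are assumptions, not details taken from the product.

```python
def elect_pivot(members):
    """Return the name of the highest-priority member that is up, or None.

    members maps a name to (priority, is_up). A lower priority number is
    treated as higher priority, which is an assumption of this sketch.
    """
    candidates = [(priority, name) for name, (priority, is_up) in members.items() if is_up]
    if not candidates:
        return None
    return min(candidates)[1]


members = {"fw-a": (1, True), "fw-b": (2, True), "fw-c": (3, True)}
print(elect_pivot(members))        # prints fw-a

members["fw-a"] = (1, False)       # the pivot fails
print(elect_pivot(members))        # prints fw-b

members["fw-a"] = (1, True)        # the original pivot returns and takes the role back
print(elect_pivot(members))        # prints fw-a
```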

Failing Over

Now that we've seen how to enable Check Point's high availability and load sharing functions, your next question most likely harkens back to our earlier wonderings about what classifies a system as "up." When dealing with VPN-1/FW-1, the answer to this question is up to you.

When using the CPHA or CPLS modules, you gain access to the functionality of the cphaprob command. This command allows you to define services that are considered critical to the operation of the VPN-1/FW-1 system. There are also some default conditions that must be met for the system to be considered available:

  • The fwd process (and other critical pieces on the device) must be running, and must not report any problems.

  • The network connection must be active (interface up and link OK).

  • The machine must be running.

  • A security policy must be installed.

These are, of course, the most basic of conditions. As you've come to expect (and, hopefully, appreciate), Check Point allows you to enhance the granularity of the checking. This is done using the aforementioned cphaprob command, which registers additional devices within the firewall machine as critical, so that their failure will cause the preemption of cluster control. The options to this command are displayed in Table 12.1.

Table 12.1: cphaprob Command Options

Command Option            Command Explanation

-d <device name>          Specify a device to be monitored.
-s <status>               The state of the device. Status can be either ok, init, or problem. If the value is anything besides ok, the device is not considered active.
-t <timeout>              Define a timeout value. If the device doesn't report its status before the timeout expires, the device is considered as failed.
-f <filename> register    Allow the specification of a file containing multiple device definitions.
[-i[a]] [-e] list         Display the current state of CPHA devices.
register                  Register the device as a critical process.
unregister                Remove the registration of this device as a critical process.
report                    Display the status of the HA modules.
if                        Display the status of interfaces.
init                      Instruct the firewall to reacquire the shared MAC address.
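The device name, status, and timeout options fit together as a simple rule: a member is available only while every registered critical device has reported ok recently enough. The following sketch models that rule in plain Python. It is not how cphaprob is implemented; the CriticalDevices class and its method names are hypothetical and only mirror the option names in the table.

```python
import time


class CriticalDevices:
    """Toy registry of critical devices, each with a status and a timeout."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._devices = {}  # name -> [timeout_seconds, status, last_report_time]

    def register(self, name, timeout, status="init"):
        self._devices[name] = [timeout, status, self._clock()]

    def unregister(self, name):
        self._devices.pop(name, None)

    def report(self, name, status):
        if status not in ("ok", "init", "problem"):
            raise ValueError("status must be ok, init, or problem")
        entry = self._devices[name]
        entry[1] = status
        entry[2] = self._clock()

    def failed(self):
        """Names of devices that are not ok or have not reported in time."""
        now = self._clock()
        return sorted(
            name
            for name, (timeout, status, last) in self._devices.items()
            if status != "ok" or now - last > timeout
        )

    def member_available(self):
        return not self.failed()


devices = CriticalDevices()
devices.register("webproxy", timeout=30, status="ok")   # "webproxy" is a made-up device name
print(devices.member_available())   # prints True
devices.report("webproxy", "problem")
print(devices.failed())             # prints ['webproxy']
```

Note that a device left in the init state counts as failed in this model, matching the table's statement that any value besides ok means the device is not considered active.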

You can also use the cphaprob command with the state argument to see the status of the HA cluster. Example output for a two-member cluster might resemble this:

 $ cphaprob state
 Number     Unique Address   State
 1 (local)  172.16.1.3       active
 2          172.16.1.4       standby
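If you want to watch cluster state from a script, output in this shape is easy to parse. The sketch below assumes the three-column layout of the sample above; real output may differ between versions, so treat the regular expression and the parse_state name as illustrative only.

```python
import re

SAMPLE = """\
Number     Unique Address   State
1 (local)  172.16.1.3       active
2          172.16.1.4       standby
"""

# member number, optional "(local)" marker, address, state
ROW = re.compile(r"^\s*(\d+)\s*(\(local\))?\s+(\S+)\s+(\S+)\s*$")


def parse_state(text):
    """Turn cphaprob-state-style text into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # the header line does not start with a number, so it is skipped
            rows.append({
                "number": int(match.group(1)),
                "local": match.group(2) is not None,
                "address": match.group(3),
                "state": match.group(4),
            })
    return rows


for row in parse_state(SAMPLE):
    print(row["number"], row["address"], row["state"], "(local)" if row["local"] else "")
```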

You can also check your log files for information about both synchronization and failover.

Firewall Synchronization

State synchronization allows the firewall or VPN module to be really highly available, in the truest sense. Without synchronization, when a failover occurs, the connections that are currently active will be dropped. This may not be that important when dealing with a firewall, for example, when the majority of the traffic through your firewall is destined for the web, but can be disastrous in a VPN context. You probably never want to be without synchronization when dealing with a VPN.

What synchronization does is maintain an identical state table on all of the machines involved in the gateway cluster. This, obviously, uses resources. The synchronization process consumes memory, CPU, and network resources, and depending on the size of the state table, this could be significant.

How does it work? The first thing to understand is that the entire state table is not copied from machine to machine all the time. Obviously, the first synchronization involves the entire state table (called a full sync), but subsequent updates only involve the changes since the last update (referred to as a delta sync). The updates occur by default every 100 milliseconds, and while this can be changed, the process isn't easy and you'll probably never want to try. Another thing to consider is that processing the updates takes a minimum of 55 milliseconds. If you are maintaining a particularly busy site, one with a lot of Hypertext Transfer Protocol (HTTP) traffic, for example, your state table may have a larger number of changes, and processing may require more time than the minimum.
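The difference between a full sync and a delta sync can be shown with two dictionaries standing in for state tables. This is a conceptual sketch only; the function names and the table contents are invented, and the real synchronization protocol is not described here.

```python
def delta(previous, current):
    """Return (changed, removed) entries between two state-table snapshots."""
    changed = {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }
    removed = [key for key in previous if key not in current]
    return changed, removed


def apply_delta(table, changed, removed):
    """Bring a peer's copy of the table up to date from a delta."""
    table.update(changed)
    for key in removed:
        table.pop(key, None)


# Full sync: the peer starts with a complete copy of the table.
active_table = {("10.1.1.20", 40312): "ESTABLISHED", ("10.1.1.21", 51000): "SYN_SENT"}
peer_table = dict(active_table)
last_sent = dict(active_table)

# Traffic changes the active member's table between updates.
active_table[("10.1.1.21", 51000)] = "ESTABLISHED"
del active_table[("10.1.1.20", 40312)]
active_table[("10.1.1.22", 33000)] = "SYN_SENT"

# Delta sync: only the differences since the last update are sent.
changed, removed = delta(last_sent, active_table)
apply_delta(peer_table, changed, removed)
print(peer_table == active_table)  # prints True
```

The busier the site, the larger each delta becomes, which is why update processing can take longer than the minimum on a heavily loaded cluster.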

Also, synchronization is not available when using a multiple entry point (MEP) VPN solution. This is because, as discussed later in this chapter, MEP is designed for use with a physically dispersed VPN solution. Synchronization is most often used with a SEP VPN solution. You can see a screen shot of the Synchronization window in the section on SEP. Enabling synchronization is as easy as placing a checkmark in the box labeled Use State Synchronization on the Synchronization tab of the cluster object. Next, you'll need to define the synchronization network by clicking Add on the Synchronization window, which opens a window such as the one shown in Figure 12.3.


Figure 12.3: Add Synchronization Network

There is a caveat here: Make sure that the synchronization network is trusted. The way to do this is to segment the synchronization traffic from any general-use traffic. In the case of a two-node cluster, you may use a crossover cable, for example. Next, you need to make sure that VPN-1/FW-1 control connections are allowed to pass between the cluster members. Simply make a rule that allows the VPN-1/FW-1 service from member to member.

After you have activated synchronization, you'll want to test it to make sure that it is working. There are a few different techniques. The quickest way is to check the size of the state tables on each machine. The command to do this is as follows:

 fw tab -t connections -s

While this is quick, it is the least accurate. Remember, the state table is updated frequently, so there is a chance that the table on one machine could change before you can type the command.

The most accurate method is the use of the fw ctl command. Using the pstat option will give you the information on the synchronization process (and other processes as well). A sample bit of the output is shown below.

 sync new ver working
 sync out: on  sync in: on
 sync packets sent:
  total: 2145 retransmitted: 0 retrans reqs:0 acks: 0
 sync packets received:
  total 2473 of which 1 queued and 31 dropped by net
  also received 0 retrans reqs and 2 acks
 to 0 cb requests

Another way to check is to confirm that two or more firewalls are connected to one another by using the netstat -an command. We usually run netstat -an | grep 256. On Windows machines you can substitute the findstr command for grep.

In the fw ctl pstat output, the second line is the key to determining the operation of state synchronization. If synchronization is working, both sync out and sync in should be on.
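The check on that second line is simple enough to automate. The sketch below assumes the line looks exactly like the one in the sample output (sync out and sync in on one line); the sync_is_on function is a hypothetical helper, not part of any Check Point tool.

```python
import re

SAMPLE = """\
sync new ver working
sync out: on  sync in: on
"""


def sync_is_on(pstat_output):
    """Return True only if both sync out and sync in are reported as on."""
    match = re.search(r"sync out:\s*(\w+)\s+sync in:\s*(\w+)", pstat_output)
    return bool(match) and match.group(1) == "on" and match.group(2) == "on"


print(sync_is_on(SAMPLE))  # prints True
```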

Yet another method is to use SmartView Status to view the status of the cluster. The ClusterXL section under each cluster member will reveal any problems with state synchronization.

What if you are working on a particularly busy boundary firewall cluster, where the vast majority of traffic consists of HTTP and Simple Mail Transfer Protocol (SMTP) connections? Each of these connections is relatively short-lived, and might not be the best candidate for synchronization. HTTP, for example, is stateless from connection to connection by design, so a failover probably wouldn't be noticed. Does the burden of synchronization outweigh the benefits? If so, you are in luck. You don't have to synchronize every protocol. You can selectively weed out those protocols that consume too many resources relative to their need for high availability. This is done by editing the service object, clicking Advanced, and unchecking Synchronize connections on Cluster. Here you also have the option to synchronize a protocol only after a connection has been open for a certain period of time. This is done using the Start Synchronizing n seconds after connection initiation option.
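The effect of those two service settings can be summarized as a small predicate: a connection is synchronized only if its service allows it and the connection has lived long enough. The ServiceSync class and should_sync function below are hypothetical names that mirror the dialog options; they are a sketch of the logic, not product code.

```python
from dataclasses import dataclass


@dataclass
class ServiceSync:
    synchronize: bool = True        # "Synchronize connections on Cluster"
    start_after_seconds: int = 0    # "Start Synchronizing n seconds after connection initiation"


def should_sync(service, connection_age_seconds):
    """Decide whether a connection of the given age is synchronized."""
    return service.synchronize and connection_age_seconds >= service.start_after_seconds


http = ServiceSync(synchronize=True, start_after_seconds=10)   # example values, not defaults
smtp = ServiceSync(synchronize=False)

print(should_sync(http, 2))     # prints False: short-lived fetches never reach the peers
print(should_sync(http, 45))    # prints True: a long download is worth protecting
print(should_sync(smtp, 300))   # prints False: synchronization is disabled for the service
```

Delaying synchronization this way keeps the many brief connections out of the sync traffic while still protecting the few that last long enough to matter in a failover.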

You can also selectively synchronize certain connections instead of protocols as a whole. Simply create another service object with the same properties as the original, but with the option to synchronize disabled. Wherever this object is used, the connection will not be synchronized. Note: only one service can have Match for 'Any' selected. The service with Match for 'Any' checked is the one whose properties are used when Any is defined for the service in a rule. This can be very useful if you wish to synchronize connections to the e-commerce website, while connections on the same network, running through the same cluster, to a server that only handles advertisement images do not have to be synchronized.

A new feature in Check Point NG with Application Intelligence R55 removes the software version dependence of state synchronization. This allows administrators to remove a system from the cluster, upgrade it, say from NG AI R54 to NG AI R55, bring it back into the cluster, synchronize with the other members, and then take the other systems down, all without anyone losing a connection. This is essential in environments where downtime windows are not available or where downtime costs money at any time. Check Point calls this a Zero-Downtime Upgrade.




Check Point NG[s]AI
ISBN: 735623015
EAN: N/A
Year: 2004
Pages: 149
