LVS Persistence


Regardless of the LVS forwarding method you choose, if you need to make sure all of the connections from a client return to the same real server, you need LVS persistence. For example, you may want to use LVS persistence to avoid wasting licenses if one user (one client computer) needs to run multiple instances of the same application.[14] Persistence is also often desirable with SSL because, when persistence is enabled, the key exchange required to establish an SSL connection only needs to be performed once.

Persistent Connection Template

When using LVS persistence, the Director internally uses a connection tracking record called a persistent connection template to ensure that all connections from the client computer are assigned to the same real server. As the client computer makes connection requests to the cluster, the Director creates a normal connection tracking record for each connection, but it does so only after it looks at the persistent connection template record and decides which real server has already been assigned to this type of connection. What do I mean by type of connection? I'll discuss that in a moment, but first let me explain how the Director removes persistent connection templates based on a timeout value you specify.

Persistence Timeout

Use the ipvsadm utility to specify a timeout value for the persistent connection template on the Director. I'll show you which ipvsadm commands to use to set the persistence timeout for each type of connection in a moment when I get to the discussion of the types of LVS persistence. The timeout value you specify is used to set a timer for each persistent connection template as it is created. The timer will count down to zero whether or not the connection is active, and can be viewed[15] with ipvsadm -L -c.
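
For example, here is a minimal sketch of how you might watch these timers count down on the Director (the -n flag simply prints numeric addresses rather than host names):

 # List connection entries and persistent connection templates; the
 # remaining time for each entry appears in the "expire" column.
 /sbin/ipvsadm -L -c -n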

If the counter reaches zero and the connection is still active (the client computer is still communicating with the real server), the counter resets to a default value[16] of two minutes, regardless of the persistence timeout value you specified, and then begins counting down to zero again. This reset happens each time the counter reaches zero, as long as the connection remains active.

Note 

You can also use the ipvsadm utility to specify a TCP session timeout value that is larger than the persistence timeout you specified. A large TCP session timeout therefore also increases the amount of time a connection template entry remains on the Director, possibly well beyond the persistence timeout.
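
If you do want to adjust the TCP session timeouts, they are set with ipvsadm's --set option, which takes the tcp, tcpfin, and udp timeouts in seconds. The values shown here are only an illustration, not recommended settings:

 # Set the TCP, TCP FIN-WAIT, and UDP session timeouts (in seconds).
 # Example values only; tune them for your own cluster.
 /sbin/ipvsadm --set 900 120 300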

Types of Persistent Connections

Now that I've introduced you to the method LVS uses to expire unused persistent connection template records, let's examine the five types of persistent connections. They are:

  1. Persistent client connections (PCC), which cause all services a client is accessing to persist. (Also called zero port connections.)

  2. Persistent port connections (PPC), which cause a single service to persist.

  3. Netfilter Marked Packet persistence, which causes packets that have been marked with the ipchains/iptables utility to persist.

  4. FTP connections, which require careful handling due to the complex[17] nature of the FTP protocol.

  5. Expired persistence, which is used internally by the Director to expire connection tracking entries when the persistent connection template expires.[18]

Note 

If you are building a web cluster, you may need to set the persistence granularity that LVS should use to group CIPs. Normally, each CIP is treated as a unique address when LVS looks up records in the persistent connection template. You can, however, group CIPs using a network mask (see the -M option on the ipvsadm man page and the LVS HOWTO for details).
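
For example, here is a minimal sketch (the VIP 209.100.100.3 and real servers 10.1.1.2 and 10.1.1.3 are the hypothetical addresses used in the examples later in this chapter) that groups all clients on the same /24 network onto one real server by passing a netmask to the -M option:

 # Persistence granularity: treat every client on the same /24 network
 # as a single client when looking up the persistent connection template.
 /sbin/ipvsadm -A -t 209.100.100.3:443 -s rr -p 3600 -M 255.255.255.0
 /sbin/ipvsadm -a -t 209.100.100.3:443 -r 10.1.1.2 -m
 /sbin/ipvsadm -a -t 209.100.100.3:443 -r 10.1.1.3 -m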

We are most interested in PPC, PCC, and Netfilter Marked Packet persistence.

Persistent Client Connection (PCC)

A persistent client connection (PCC) forces all connections from a client computer to a single real server. A PCC is simply a virtual service created with no port number (or port number 0) and with the -p flag set. Use a persistent client connection when you want all of the connections from a client computer to go to the same real server. If a customer adds items to a shopping cart on your web cluster using the HTTP protocol and then clicks the checkout button to use the encrypted HTTPS protocol, you want a persistent client connection so that both port 80 (HTTP) and port 443 (HTTPS) will go to the same real server inside the cluster. For example:

 /sbin/ipvsadm -A -t 209.100.100.3:0 -s rr -p
 /sbin/ipvsadm -a -t 209.100.100.3:0 -r 10.1.1.2 -m
 /sbin/ipvsadm -a -t 209.100.100.3:0 -r 10.1.1.3 -m

These three lines create a PCC virtual service on VIP address 209.100.100.3 using real servers 10.1.1.2 and 10.1.1.3.

The default timeout value of 360 seconds can be modified by supplying the -p option with the number of seconds that the persistent connection template should remain on the Director. For example, to create a one-hour PCC virtual service, use:

 /sbin/ipvsadm -A -t 209.100.100.3:0 -s rr -p 3600
 /sbin/ipvsadm -a -t 209.100.100.3:0 -r 10.1.1.2 -m
 /sbin/ipvsadm -a -t 209.100.100.3:0 -r 10.1.1.3 -m

Persistent Port Connection (PPC)

A persistent port connection (PPC) forces all connections from a client computer for a particular destination port number to the same real server. For example, let's say you want to allow a user to create multiple telnet sessions,[19] and you would like all of the telnet sessions to go to the same real server; however, when the user calls up a web page, you'd like to assign this request to any node, regardless of which real server they are using for telnet. In this case, you could use persistent port connections for the telnet port (port 23) and the HTTP port (port 80), as follows:

 /sbin/ipvsadm -A -t 209.100.100.3:80 -s rr -p 3600
 /sbin/ipvsadm -a -t 209.100.100.3:80 -r 10.1.1.2 -m
 /sbin/ipvsadm -a -t 209.100.100.3:80 -r 10.1.1.3 -m
 /sbin/ipvsadm -A -t 209.100.100.3:23 -s rr -p 3600
 /sbin/ipvsadm -a -t 209.100.100.3:23 -r 10.1.1.2 -m
 /sbin/ipvsadm -a -t 209.100.100.3:23 -r 10.1.1.3 -m

Port Affinity

The key difference between PCC and PPC persistence is sometimes called port affinity. With PCC persistence, all connections to the cluster from a single client computer end up on the same real server. Thus, a client computer talking on port 80 (using HTTP) will connect to the same real server when it needs to use port 443 (for HTTPS). When you use PCC, ports 80 and 443 are therefore said to have an affinity with each other.

On the other hand, when you create an IPVS table with multiple ports using PPC, the Director creates one connection tracking template record for each port the client uses to talk to the cluster so that one client computer may be assigned to multiple real servers (one for each port used). With PPC persistence, the ports do not have an affinity with each other.

In fact, when using PCC persistence, all ports have an affinity with each other. This may increase the chance of an imbalanced cluster load if client computers need to connect to several port numbers to use the cluster. To create port affinity that only applies to a specific set of ports (port 80 and port 443 for HTTP and HTTPS communication, for example), use Netfilter Marked Packets.

Netfilter Marked Packets

Netfilter Marked Packets (formerly called fwmarked packets) have been marked by either the ipchains or the iptables utility on the Director. The Netfilter mark only affects the packet while it is on the Director; once the packet leaves the Director, it is no longer marked (real servers can't see the marks made on the Director).

When you specify the criteria for marking a packet (using iptables or ipchains), you assign a mark number. This number is then associated with an IP virtual service (using the ipvsadm utility), so that the marked packets will be sent to the proper real server.

The Netfilter mark number is placed in the socket buffer (called sk_buff), not in the packet header, and is only associated with the packet while it is being processed by the kernel on the Director.

Your iptables rules cause the Netfilter mark number to be placed in the packet's sk_buff entry in the PRE_ROUTING hook, before the packet passes through the routing process. Once the packet completes the routing process and the kernel decides it should be delivered locally, it reaches the LOCAL_IN hook. At this point, the LVS code sees the packet and can check the Netfilter mark number to determine which IP virtual server (IPVS) to use, as shown in Figure 14-3.

Figure 14-3: Netfilter Marked Packets and LVS

Notice in Figure 14-3 that the Netfilter mark is attached to the incoming packet (shown as a white square inside the black box representing each packet) in the PRE_ROUTING hook. The Netfilter mark can then be read by the LVS code that searches for IPVS packets in the LOCAL_IN hook. Normally, only packets destined for a VIP address that has been configured as a virtual service by the ipvsadm utility are selected by the LVS code, but if you are using Netfilter to mark packets as shown in this diagram, you can build a virtual service that selects packets based on the mark number.[20] LVS can then send these packets back out a NIC (shown as eth1 in this example) for processing on a real server.

Marking Packets with iptables

For example, say we want to create one-hour persistent port affinity between ports 80 and 443 on VIP 209.100.100.3 for real servers 10.1.1.2 and 10.1.1.3. We'd use these iptables and ipvsadm commands:

 /sbin/iptables -F -t mangle
 /sbin/iptables -A PREROUTING -i eth0 -t mangle -p tcp \
     -d 209.100.100.3/24 --dport 80 -j MARK --set-mark 99
 /sbin/iptables -A PREROUTING -i eth0 -t mangle -p tcp \
     -d 209.100.100.3/24 --dport 443 -j MARK --set-mark 99
 /sbin/ipvsadm -A -f 99 -s rr -p 3600
 /sbin/ipvsadm -a -f 99 -r 10.1.1.2 -m
 /sbin/ipvsadm -a -f 99 -r 10.1.1.3 -m

Note 

The \ character in the above commands means the command continues on the next line.

The command in the first line flushes all of the rules out of the iptables mangle table. This is the table that allows you to insert your own rules into the PRE_ROUTING hook.

The next two iptables commands contain the criteria we want to use to select packets for marking. We're selecting packets that are destined for our VIP address (209.100.100.3) with a netmask of 255.255.255.0; the /24 indicates that the first 24 bits are the netmask. The rest of each rule says to mark packets that are trying to reach port 80 or port 443 with the Netfilter mark number 99.

The ipvsadm commands on the last three lines send these marked packets to real servers 10.1.1.2 and 10.1.1.3 using round-robin scheduling and LVS-NAT forwarding.

To view the iptables rules we have created, you would enter:

 #iptables -L -t mangle -n
 Chain PREROUTING (policy ACCEPT)
 target   prot opt  source      destination
 MARK     tcp  --   0.0.0.0/0   209.100.100.3   tcp dpt:80 MARK set 0x63
 MARK     tcp  --   0.0.0.0/0   209.100.100.3   tcp dpt:443 MARK set 0x63

 Chain OUTPUT (policy ACCEPT)
 target   prot opt  source      destination

Recall from Chapter 2 that iptables uses table names instead of Netfilter hooks to simplify the administration of the kernel Netfilter rules. The chain called PREROUTING in this report refers to the Netfilter PRE_ROUTING hook shown in Figure 14-3. So this report shows two MARK rules that will now be applied to packets as they hit the PRE_ROUTING hook, based on destination address, protocol (tcp), and destination ports (dpt) 80 and 443. The report shows that packets matching these criteria will have their MARK number set to hexadecimal 63 (0x63), which is decimal 99.

Now, let's examine the LVS IP Virtual Server routing rules we've created with the following command:

 #ipvsadm -L -n
 IP Virtual Server version x.x.x (size=4096)
 Prot LocalAddress:Port Scheduler Flags
   -> RemoteAddress:Port         Forward   Weight   ActiveConn   InActConn
 FWM  99 rr persistent 3600
   -> 10.1.1.2:0                 Masq      1        0            0
   -> 10.1.1.3:0                 Masq      1        0            0

The output of this command shows that packets with a Netfilter mark (fwmark, shown in the FWM column) value of decimal 99 will be sent to real servers 10.1.1.2 and 10.1.1.3 for processing using the Masq forwarding method (recall from Chapter 11 that Masq refers to the LVS-NAT forwarding method).

Notice the flexibility and power we have when using iptables to mark packets that we can then forward to real servers with ipvsadm forwarding and scheduling rules. You can use any criteria available to the iptables utility to mark packets when they arrive on the Director, at which point these packets can be sent via LVS scheduling and routing methods to real servers inside the cluster for processing.

The iptables utility can select packets based on the source address, destination address, or port number and a variety of other criteria. See Chapter 2 for more examples of the iptables utility. (The same rules that match packets for acceptance into the system that were provided in Chapter 2 can also match packets for marking when you use the -t mangle option and the PREROUTING chain.) Also see the iptables man page for more information or search the Web for "Rusty's Remarkably Unreliable Guides."
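
As one more sketch (the 10.2.0.0/16 client network and mark number 101 are hypothetical), you could mark only the HTTP traffic arriving from a particular client network and attach that mark to its own persistent virtual service:

 # Mark HTTP traffic from one client network so it can be matched by its
 # own fwmark-based virtual service (network and mark number are examples).
 /sbin/iptables -A PREROUTING -i eth0 -t mangle -p tcp \
     -s 10.2.0.0/16 -d 209.100.100.3 --dport 80 -j MARK --set-mark 101
 /sbin/ipvsadm -A -f 101 -s rr -p 3600
 /sbin/ipvsadm -a -f 101 -r 10.1.1.2 -m
 /sbin/ipvsadm -a -f 101 -r 10.1.1.3 -m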

[14]Again, this depends on your licensing scheme.

[15]The value in the "expire" column.

[16]Controlled by the TIME_WAIT timeout value in the IPVS code. If you are using secure tcp to prevent a DoS attack, as discussed earlier in this chapter, the default TIME_WAIT timeout value is one minute.

[17]In normal FTP connections, the server connects back to the client to send or receive data. FTP can also switch to a passive mode so that the client can connect to the server to send or receive data.

[18]To avoid the time-consuming process of searching through the connection tracking table for all of the connections that are no longer valid and removing them (because a persistent connection template cannot be removed until after the connection tracking records it is associated with expire), LVS simply enters 65535 for the port numbers in the persistent connection template entry and allows the connection tracking records to expire normally (which will ultimately cause the persistent connection template entry to expire).

[19]From one client computer.

[20]Regardless of the destination address in the packet header.


