Server Load Balancing (SLB) involves three elements: real servers, server farms, and virtual servers. The primary element is the real server, which contains the origin server content. Real servers are logical units and need not map one-to-one to physical servers; for example, you may have multiple logical real servers on a single physical server. A server farm is a group or pool of real servers. You can create server farms by replicating a single real server to scale performance and provide redundancy for your application. Each real server in a server farm responds to requests for the same content. Lastly, virtual servers are the client-facing element of an SLB environment. You configure your virtual servers by specifying Layer 3 and 4 policies for your content, such as an IP address or range of addresses, IP protocol, and TCP/User Datagram Protocol (UDP) ports. You can also configure Layer 5-7 policies to further narrow your virtual servers down to specific functionality, based on such criteria as HTTP headers and URLs.
The virtual server provides your clients with access to the server farm, as Figure 10-1 illustrates.
Figure 10-1. A Typical Server Farm
Although Figure 10-1 shows a single server farm environment with standard client-to-server load balancing, Cisco supports hundreds of server farms with server-to-server load balancing.
Virtual servers contain the policies that content switches use to match content requests and make load-balancing decisions. The content switch performs a virtual server lookup by matching the client's request to the virtual server that contains the largest number of matching policies. For example, you may have Secure Sockets Layer (SSL) available on your website. You would then have two virtual servers differentiated by TCP port, but otherwise the configuration would be the same. You would normally configure port 80 for your HTTP virtual server and port 443 for your SSL virtual server. Even your configured real servers would be the same, unless you are offloading your servers' SSL processing to a dedicated device, which you will learn about in Chapter 11, "Switching Secured Content." Example 10-1 shows how you can configure a single server farm and two virtual servers on the Cisco Content Services Switch (CSS).
The terms "real server," "server farm," and "virtual server" are Content Switching Module (CSM)-specific terms. In contrast, "services" and "content rules" are used for CSS-specific explanations; there is no CSS-equivalent terminology for server farms. Because the CSS and CSM share the majority of the same concepts, this chapter uses the CSM-specific terms when referring to concepts from either product, primarily because of the additional term for server farms. When referring to configuration, this chapter uses the product-specific terminology (that is, services and content rules for CSS configuration, and real servers, virtual servers, and server farms for CSM configuration).
Example 10-1. Configuring Two Virtual Servers Using a Single Server Farm on a CSS
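The body of Example 10-1 is not reproduced here. The following is a minimal sketch of such a configuration, with hypothetical service names and addresses; on the CSS, real servers are configured as services, and virtual servers as content rules under an owner:

```
! Define a real server as a CSS service (hypothetical address)
service web-server1
  ip address 10.1.1.10
  active

! Define two content rules (virtual servers) that differ only by TCP port
owner example-owner
  content http-vip
    vip address 192.168.10.100
    protocol tcp
    port 80
    add service web-server1
    active
  content ssl-vip
    vip address 192.168.10.100
    protocol tcp
    port 443
    add service web-server1
    active
```

Because both content rules add the same service, both virtual servers load balance to the same set of real servers, matching the scenario the text describes.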
Clients send requests to the virtual IP address (VIP) of the virtual server, which is the client-facing IP address for the server farm, resident on the content switch. For most organizations, the VIP of a site is normally a registered IP address available in the public Domain Name System (DNS) for client use. However, you can also use a private IP address for internal or public applications. If you use a private IP address for public applications, you require a firewall to translate the private VIP to a publicly routable IP address using Network Address Translation (NAT). When the content switch receives the client request, it performs a virtual server lookup. The content switch load balances the request to a real server within the server farm associated with the resultant virtual server. The content switch forwards the request to the real server in one of two modes:
To send an application request to the content switch, the client sends a TCP SYN segment to the content switch. When the content switch receives the TCP SYN segment, it performs a virtual server lookup by matching the Layer 3 and 4 fields in the segment to the policies that you configure for your virtual servers. If you configure your virtual server with additional Layer 5-7 policies, such as URL file extensions, cookies, and HTTP headers, the content switch proxies the TCP connection with the client in order to inspect the client's application request. Otherwise, the content switch simply selects a real server, translates the destination address to that of the real server, and forwards the TCP SYN to the real server. When the real server receives the request, it sends a TCP SYN-ACK response back through the content switch. The content switch translates the source address back to the virtual server address and forwards the TCP SYN-ACK to the client. The client's TCP ACK completes the TCP three-way handshake with the real server.
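A comparable Layer 4 configuration on the CSM might look like the following sketch (hypothetical slot number, names, and addresses; verify the exact command syntax against your CSM software release):

```
! Enter CSM configuration mode for the module in slot 4 (hypothetical slot)
module csm 4
 serverfarm WEBFARM
  nat server            ! translate the destination VIP to the real server address
  real 10.1.1.10
   inservice
  real 10.1.1.11
   inservice
 vserver WEB-VIP
  virtual 192.168.10.100 tcp www
  serverfarm WEBFARM
  inservice
```

Here nat server corresponds to the destination-address translation described above: the CSM rewrites the VIP to the selected real server's address on the way in, and rewrites the source back to the VIP on the way out.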
Because the content switch rewrites the destination IP address in the IP packet, it must recalculate the IP checksum, encompassing the header of the IP packet.
For Layer 5-7 policies, the content switch must complete the TCP connection with the client on behalf of the real server; the content switch performs delayed binding to inspect the application request. The content switch can then match content in the application request to the Layer 5-7 policies of your virtual servers. For example, you can configure a virtual server policy to match all HTTP GET requests containing the string "*.jpg" and another virtual server policy to match requests containing the string "*.html." Based on the matched policies, the content switch selects a real server from the configured server farm to which to forward the request. Figure 10-2 illustrates how delayed binding works.
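On the CSS, such Layer 5-7 policies are expressed with the url command within a content rule. A minimal sketch, using hypothetical owner, rule, and service names:

```
owner example-owner
  ! Rule matching requests for JPEG images
  content images-rule
    vip address 192.168.10.100
    protocol tcp
    port 80
    url "/*.jpg"
    add service image-server1
    active
  ! Rule matching requests for HTML pages
  content html-rule
    vip address 192.168.10.100
    protocol tcp
    port 80
    url "/*.html"
    add service web-server1
    active
```

Because these rules match on the URL, the CSS must perform delayed binding on connections to this VIP before it can select between them.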
Figure 10-2. Content Switch Delayed Binding
The content switch generates an initial sequence number for the front-end TCP connection. As a result, the content switch is responsible for sequence number remapping. Remapping involves the content switch rewriting the sequence number generated by the real server throughout the TCP flow, as Figure 10-2 illustrates. Because the content switch rewrites the sequence numbers in the TCP segment, it must also recalculate the TCP checksum, to encompass the entire TCP segment.
The benefits of delayed binding are twofold. First, delayed binding ensures that clients open valid TCP connections before the content switch establishes connections to back-end real servers; the content switch can therefore detect and drop denial-of-service (DoS) TCP SYN attack segments before forwarding them to real servers. Second, the content switch can inspect the payload of the client's request in order to make intelligent Layer 5-7 decisions. For example, you can configure your content switch to inspect HTTP headers or URLs and forward requests to real servers based on that information.
Content switches can also process pipelined HTTP requests. Recall from Chapter 8, "Exploring the Application Layer," that, with persistent HTTP connections, clients send multiple requests over the same TCP connection. With HTTP pipelining, the client does not wait for the HTTP responses from the server before sending additional requests. Persistence and pipelining avoid overhead that is associated with establishing multiple TCP connections, and the delay associated with waiting for server responses for each HTTP request. Figure 10-3 shows how a content switch forwards two requests to the same real server, using delayed binding.
Figure 10-3. Forwarding Pipelined Persistent Requests to a Single Real Server
Delayed binding is not required for forwarding pipelined persistent requests, unless you configure Layer 5-7 policies for the virtual server.
Alternatively, content switches can distribute non-pipelined persistent requests across real servers by extracting each HTTP request from within the persistent HTTP session. The content switch can then send each request to an individual back-end real server. Figure 10-4 illustrates rebalancing multiple persistent HTTP/1.1 requests.
Figure 10-4. Rebalancing Multiple Persistent Requests to Different Servers
If a request spans multiple packets, the CSS buffers a configurable number of packets (set with the spanning-packets command) before making a load-balancing decision.
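As a sketch, you might raise the buffer depth in CSS global configuration mode (the default value and permitted range are assumptions here; verify them against your WebNS software documentation):

```
! Buffer up to 10 packets per request before load balancing (hypothetical value)
spanning-packets 10
```

Buffering more packets lets the CSS match longer requests against Layer 5-7 policies, at the cost of additional memory and latency per connection.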
Rebalancing HTTP requests to different servers is of particular interest in caching environments where even distribution of files across a pool of caches is desirable. You will learn about cache load-balancing in Chapter 13.
By default, the CSS distributes all subsequent GET requests within an HTTP persistent TCP connection to the originally selected real server (unless the CSS matches a subsequent request with a different virtual server that does not contain the originally selected real server). However, you can configure your CSS to rebalance back-end real server connections using the content rule configuration command
With this command issued, when the CSS receives a new GET request destined for the current real server, the CSS sends a TCP RST to the current real server and an HTTP 302 Object Moved response to the client to redirect the connection to another real server. The client issues the GET request again, and the CSS uses the new request to select another back-end real server. As Figure 10-4 illustrates, to enable the CSS to instead remap the back-end connection to a new real server automatically, you can use the content rule configuration command
persistence reset remap
With this command, the CSS sends a TCP RST to the current real server, establishes a TCP connection to the new real server, and sends the GET request to the new real server.
To revert to HTTP 302 redirection, you can use the command
persistence reset redirect
The CSM does not support HTTP 302 redirection. However, you can enable your CSM to rebalance back-end real server connections automatically using the virtual server configuration command: