Health Checking | Content Networking Fundamentals

To determine the health of a real, content switches can perform either out-of-band or in-band health checks.

Out-of-Band Health Checking

With out-of-band (OOB) health checking, you configure the content switch to pose as an actual client and send mock requests to the real server. The content switch inspects the response to the request and changes the status of the real server accordingly. If the health check fails, the content switch temporarily removes the real from the pool, and does not issue any more connections until the health check succeeds. The health check parameters, such as ports, execution frequency, and timeouts, are highly tunable; you can adjust them to meet the specific requirements of your application.

You can configure the content switch to send TCP probes to a real server by opening a TCP connection to a configurable port. That is, the content switch sends a TCP SYN to the real, and if it receives a TCP SYN/ACK from the real within a configurable timeout value, the content switch deems the real to be healthy. To configure TCP probes on the CSS, you can use the configuration in Example 10-13.

Example 10-13. Configuring TCP Keep-Alives on the CSS

 keepalive tcp-http
  type tcp
  port 80
  frequency 10
  retryperiod 5
  maxfailure 4
 service web01
  ip address 10.1.10.11
  keepalive type named tcp-http
  active

Example 10-13 defines a TCP keep-alive named "tcp-http" that you can assign to particular real servers by using the keepalive command in service configuration mode. This keep-alive sends a TCP SYN segment to port 80, waits 5 seconds for a response, and then retries three more times before taking the real server out of rotation (maxfailure counts the initial attempt, so a value of 4 yields three retries). The CSS reattempts the keep-alive every 10 seconds.
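
The retry behavior of a TCP keep-alive can be sketched in a few lines of Python. This is only an illustration, not how the CSS implements keep-alives: the function names tcp_probe and is_alive are hypothetical, the defaults simply mirror the values in the example keep-alive, and the sketch completes the full TCP handshake, whereas the content switch needs to see only the SYN/ACK.

```python
import socket
import time


def tcp_probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection performs the complete three-way handshake and
        # raises OSError on refusal, timeout, or an unreachable host.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def is_alive(probe, max_failure=4, retry_period=5.0, sleep=time.sleep):
    """Call probe up to max_failure times; report down only if all attempts fail.

    max_failure includes the initial attempt, so 4 means one attempt plus
    three retries. retry_period is the pause between attempts (an assumption
    about how the retry timer maps onto this sketch).
    """
    for attempt in range(max_failure):
        if probe():
            return True
        if attempt < max_failure - 1:
            sleep(retry_period)  # wait before the next retry
    return False
```

For example, is_alive(lambda: tcp_probe("10.1.10.11", 80)) returns False only after four consecutive failed connection attempts. A scheduler that reruns the check every 10 seconds would play the role of the frequency setting.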

To configure TCP probes on the CSM, you can use the configuration in Example 10-14.

Example 10-14. Configuring TCP Keep-Alives on the CSM

 module csm 5
 probe tcp-http tcp
  port 80
  interval 10
  open 15
  retries 3
 serverfarm web-farm
  real 10.1.10.11
   inservice
  probe tcp-http

You can also configure your content switch to send application-specific commands for various applications. Content switches contain a few built-in application layer checks, such as Simple Mail Transfer Protocol (SMTP), FTP, and HTTP. Content switches can send HTTP GET or HEAD messages to the real servers and then parse the response codes or the HTML page from the real server for errors. If you need to check for only a particular HTTP response code, you should use the HTTP HEAD method to request only the HTTP headers, so that the real does not have to send the HTTP body over the network.

To configure application layer keep-alives on your CSS, you can use the configuration in Example 10-15. By default, the CSS expects the HTTP 200 OK return code in response to HTTP HEAD method keep-alives.

Example 10-15. Configuring HTTP Keep-Alives on the CSS

 keepalive keep-http
  type http
  method head
 service web01
  ip address 10.1.10.11
  keepalive type named keep-http
  active

Example 10-16 illustrates HTTP keep-alives on the CSM. You must explicitly specify the successful HTTP status code in the CSM configuration.

Example 10-16. Configuring HTTP Keep-Alives on the CSM

 module csm 5
 probe keep-http http
  request-method head
  expect status 200
 serverfarm web-farm
  real 10.1.10.11
   inservice
  probe keep-http
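
To make the HTTP HEAD check concrete, the following Python sketch sends a HEAD request and compares the status code with an expected value, much as the probes above expect a 200 status. It is a minimal illustration rather than the content switch's own logic; the function name http_head_probe and its parameters are hypothetical.

```python
import http.client


def http_head_probe(host, port=80, path="/", expected_status=200, timeout=5.0):
    """Return True if a HEAD request for path answers with expected_status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # HEAD asks for the headers only, so the server sends no body.
        conn.request("HEAD", path)
        return conn.getresponse().status == expected_status
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response all count
        # as a failed health check.
        return False
    finally:
        conn.close()
```

For example, http_head_probe("10.1.10.11") treats any status other than 200, and any connection error, as a failed check.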

With scripting languages, you can also write custom application layer probes that you can load and execute on the content switch, to send various application-specific commands to your real server. For example, you can write an application-specific probe to log in to your web application server with a set of known user credentials. Additionally, sample scripts for the CSM are available on Cisco.com, including an SSL-specific probe that checks whether an SSL session can be opened to an SSL server. The script sends an SSL Client Hello message to the server, and the probe succeeds when it receives the Server Hello message in return. To enable a scripted keep-alive for a service on the CSS, use the service configuration command:

 keepalive type script script-name "arguments"

Note

The CSS uses a proprietary scripting language, and the CSM uses the TCL scripting language to execute scripts. For more information on scripted keep-alives, refer to your product documentation on Cisco.com.

In the event of a real server failure, you can control what happens to the existing connections to the real on the CSM. You can either reassign the connections to new real servers or completely remove the connections from the CSM, using the following server farm configuration command:

 failaction {purge | reassign}

The drawback of OOB health checking is that it imposes a slight increase in load on your servers. Additionally, your content switch learns the status of a server only when it issues a keep-alive; it is unaware of a failure if a real fails between probes. Therefore, the content switch may send requests to a failed real for a maximum time equal to the probe interval. To help overcome these drawbacks, you should consider using in-band health checking in conjunction with OOB health checking.

In-Band Health Checking

You can configure the content switch to derive the status of a real server by inspecting live TCP connections and application transactions between the content switch and real servers. You can configure two forms of in-band health checking:

TCP connection monitoring: If real servers are unable to complete the TCP handshake for a live request in a timely fashion, the CSM can retry the request a number of times. If the real does not complete its portion of the TCP handshake after a configurable number of retries, the content switch automatically removes the real from the pool. Once removed, the real remains out-of-service for a configurable amount of time.
HTTP return code monitoring: The content switch can parse real servers' HTTP return codes. If the content switch receives numerous unexpected error codes from the real servers in response to valid requests, the content switch automatically removes the real from the pool. The number of erroneous return codes must first reach configurable thresholds to trigger removal from the pool.

Note

The CSS does not support in-band health checking.

Figure 10-23 illustrates TCP connection monitoring in-band health checking.

Figure 10-23. TCP Connection Monitoring In-band Health Checking

To configure TCP connection monitoring health checking on your CSM, use the following command in server farm configuration mode:

 health retries count failed seconds

The retries value is the number of TCP SYN requests that the CSM attempts to the real server before taking the real out-of-service. Once the CSM takes the real out-of-service with in-band health monitoring, it resumes issuing TCP connections to the real after the number of seconds that you configure with the failed parameter. For example, the command health retries 2 failed 20 reattempts the TCP connection twice before taking the real out-of-service for 20 seconds.
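
The interaction between the retries and failed values can be modeled with a small state tracker. The Python sketch below is an assumption-laden illustration, not the CSM's implementation: the class name InbandTcpMonitor is hypothetical, and it assumes that the real is removed once the count of consecutive failed handshakes reaches retries and that a successful handshake clears the count.

```python
class InbandTcpMonitor:
    """Tracks failed TCP handshakes for one real server (illustrative only)."""

    def __init__(self, retries=2, failed_seconds=20):
        self.retries = retries                # failed attempts tolerated
        self.failed_seconds = failed_seconds  # time spent out-of-service
        self.failures = 0                     # consecutive failures so far
        self.out_until = None                 # when the real is eligible again

    def in_service(self, now):
        """Return True if the real may receive connections at time now."""
        return self.out_until is None or now >= self.out_until

    def record(self, handshake_ok, now):
        """Record the outcome of one live connection attempt at time now."""
        if handshake_ok:
            self.failures = 0  # assumption: a success clears the count
            return
        self.failures += 1
        if self.failures >= self.retries:
            # Take the real out-of-service for the configured period.
            self.out_until = now + self.failed_seconds
            self.failures = 0
```

With retries=2 and failed_seconds=20, two consecutive failed handshakes take the real out-of-service, and in_service returns True again 20 seconds after the second failure.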

To configure HTTP return code checking on the CSM, use the following commands:

 map name retcode
  match protocol http retcode min max action [count | log | remove] threshold [reset seconds]

You can configure the CSM to count the return codes, log them to a syslog server, or remove the real from service when it receives particular HTTP return codes. For example, you can configure a number of match entries with different actions with the configuration in Example 10-17.

Example 10-17. Return Code Checking on the CSM

 module csm 5
 map http-rets retcode
  match protocol http retcode 400 401 action log 5 reset 120
  match protocol http retcode 402 417 action count
  match protocol http retcode 500 500 action remove 3 reset 0
  match protocol http retcode 501 505 action log 3 reset 0
 serverfarm webfarm
  real 10.1.10.11
   inservice
  retcode-map http-rets

Note

Because the error codes within the range 400 to 499 are client errors, you should not configure the CSM to take the real out-of-service when the server sends these return codes. See Chapter 8 for a list of the available return codes in HTTP 1.1.
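
The way a return code map pairs code ranges with actions and thresholds can be sketched in Python. The sketch below is a simplified model rather than the CSM's algorithm: the names RetcodeRule and RetcodeMap are hypothetical, it assumes an action fires when the number of matching responses reaches the threshold and that the counter then starts over, and it does not model the reset timer.

```python
from dataclasses import dataclass


@dataclass
class RetcodeRule:
    low: int            # first status code in the range
    high: int           # last status code in the range
    action: str         # "count", "log", or "remove"
    threshold: int = 1  # matches needed before log or remove fires
    hits: int = 0       # matches seen since the action last fired


class RetcodeMap:
    def __init__(self, rules):
        self.rules = rules

    def observe(self, status):
        """Record one response; return the action it triggers, or None."""
        for rule in self.rules:
            if rule.low <= status <= rule.high:
                rule.hits += 1
                if rule.action != "count" and rule.hits >= rule.threshold:
                    rule.hits = 0  # assumption: counter restarts after firing
                    return rule.action
                return None  # counted, but below the threshold
        return None  # no rule covers this status code


# Rules mirroring the match entries in the return code map example.
http_rets = RetcodeMap([
    RetcodeRule(400, 401, "log", 5),
    RetcodeRule(402, 417, "count"),
    RetcodeRule(500, 500, "remove", 3),
    RetcodeRule(501, 505, "log", 3),
])
```

Under these assumptions, the third consecutive call to http_rets.observe(500) returns "remove", whereas a 404 response only increments the counter of the 402 to 417 rule.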

You should not rely completely on in-band health checking, because it catches only simple failures, such as a missing SYN/ACK reply, RSTs, and unexpected HTTP return codes. Instead, you should use in-band health checking in conjunction with OOB health checking.