Load-Distribution Algorithms

In order to distribute requests among real servers, you can use any of the following load-balancing algorithms:

Round Robin
Least Connections
Server Load
Hash Distribution
Layer 5-7 Load Balancing
Equal-Cost Multi-Path (ECMP) Router Load Balancing

Round Robin

Round robin is the most basic prediction algorithm. As requests arrive, the content switch forwards them blindly down its list of real servers, starting with the real (short for real server) that you configured first and ending with the last real you configured. With round robin, the content switch sends only one connection to each real server during one cycle. If a server farm has N servers, a single cycle represents N connections. Alternatively, with weighted round robin (WRR), each cycle provides multiple connections to each real. You can assign each real a weight; reals with a higher weight receive a larger number of the available connections. The sum of the weights of all the real servers then represents the number of connections that the content switch issues during one round-robin cycle. With WRR, you can assign different weights to subgroups of real servers within a server farm. For example, assigning weights is beneficial if your server farm has numerous real servers running on a few different hardware platforms, with various memory capacities and CPU speeds. Table 10-2 illustrates a server farm of six real servers, divided into four subgroups, that are assigned various weights. The total number of connections per cycle in this example is 27.

Table 10-2. Sample WRR Weights
Sub-Group	Server	Weight / Connections per Cycle
1	S1	8
1	S2	8
2	S3	2
2	S4	2
3	S5	4
4	S6	3
	Total	27
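As a rough illustration of the cycle described above, the following Python sketch builds one WRR cycle from the weights in Table 10-2. The function names and the scheduling order (each real receives all of its connections consecutively) are assumptions made for the example; a content switch may interleave the connections differently.

```python
from collections import Counter
from itertools import cycle, islice

# Weights from Table 10-2 (real -> connections per cycle).
WEIGHTS = {"S1": 8, "S2": 8, "S3": 2, "S4": 2, "S5": 4, "S6": 3}


def wrr_cycle(weights):
    """Return one weighted round-robin cycle as a list of real names.

    Assumption: each real gets its whole share back to back before the
    scheduler moves on to the next real in configuration order.
    """
    order = []
    for real, weight in weights.items():
        order.extend([real] * weight)
    return order


def wrr_stream(weights):
    """Yield reals endlessly by repeating the cycle."""
    return cycle(wrr_cycle(weights))


one_cycle = wrr_cycle(WEIGHTS)
first_two_cycles = list(islice(wrr_stream(WEIGHTS), 2 * len(one_cycle)))

print(len(one_cycle))      # connections issued in one cycle
print(Counter(one_cycle))  # connections per real in that cycle
```

The length of one cycle equals the sum of the weights, which is the 27 connections per cycle given in the table.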

To configure WRR on your CSS, use the balance weightedrr command in your content rule. To specify the weights for the real, use the weight command in service configuration mode on the CSS. To configure WRR on your CSM, use the predictor roundrobin command in your server farm configuration. You can configure the weights using the weight command in real configuration mode on the CSM.

You can manually weight your reals, as Table 10-2 illustrates, or you can automatically weight them using DFP. DFP requires that you install a software agent on your real servers. DFP agents are commercially available from third-party software vendors. If the installed agent recognizes that resources are low on its real server, it can send a DFP message to the content switch to modify its weight within the server farm. Your content switches act as DFP managers. The DFP manager actively probes the agents for status information, such as whether the real server is over- or underused. To configure your CSS as a DFP manager, use the following command in global configuration mode:

 dfp ip_address {port} {key "secret" | [des-encrypted encrypted_key | "encrypt_key"]} {timeout seconds} {retry count} {delay time} {max-agent-wt weight}

The ip_address value indicates the IP address of the DFP agent. To encrypt the information exchanged between the DFP agent and manager, you can enter an MD5 key as a quoted string, using the key parameter. Alternatively, you can perform DES encryption with the des-encrypted parameter by providing either an unquoted key that the CSS has already encrypted or a quoted unencrypted key that the CSS encrypts for you. The CSS encrypts your key, if necessary, before saving it to the running configuration. The timeout value indicates the inactivity period before the manager closes the connection to the agent. The retry value specifies the number of times the manager attempts to connect to the agent. The delay value indicates the length of time that the DFP manager waits between retries. All time units are in seconds. The default TCP port for DFP is 14001. For example, you can use the following command to configure your CSS to connect to the real 10.1.10.11:

 dfp 10.1.10.11 14001 key "cisco" timeout 300 retry 3 delay 15

To configure your CSM as a DFP manager, use the following commands in CSM configuration mode:

 dfp [password password] agent ip-address port [activity-timeout [retry-count [retry-interval]]]

For example, you can use the following commands to configure your CSM to connect to the real 10.1.10.11:

 dfp password cisco agent 10.1.10.11 14001 300 3 15

A drawback to round robin is that it can cause an uneven distribution of concurrent connections in volatile environments where real servers fail frequently. In the event of a real failure for a period of time, the remaining servers absorb the load. Even when the failed real comes back online, its concurrent connections may not catch up to the others, unless they also fail for the same duration and frequency. To provide an even distribution of connections to your reals in volatile environments, use the least-connections distribution algorithm.

Least Connections

With the least-connections algorithm, as the name suggests, the content switch forwards new requests to the real servers with the fewest connections. The content switch maintains a count of the concurrent connections to each real. When a real receives a new connection, the content switch increments the count. When clients or servers tear down connections, the content switch decrements the count. The benefit of the least-connections load distribution mechanism is that it creates an even distribution of connections across your reals.
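The bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative model: the class and method names are hypothetical, and ties are broken by configuration order, which is an assumption rather than documented content switch behavior.

```python
class LeastConnections:
    """Track concurrent connections per real and pick the least loaded."""

    def __init__(self, reals):
        # One counter per real, all starting at zero.
        self.active = {real: 0 for real in reals}

    def open_connection(self):
        # min() returns the first real with the lowest count, so ties
        # fall to the real that was configured first (an assumption).
        real = min(self.active, key=self.active.get)
        self.active[real] += 1
        return real

    def close_connection(self, real):
        # Decrement on teardown, never below zero.
        if self.active[real] > 0:
            self.active[real] -= 1


lb = LeastConnections(["S1", "S2", "S3"])
picked = [lb.open_connection() for _ in range(4)]
lb.close_connection("S2")          # S2 now has no connections
next_real = lb.open_connection()   # so S2 is chosen next
print(picked, next_real)
```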

Real server weighting is also available for the least-connections predictor algorithm; reals with higher relative weights receive a larger proportion of the available connections. The difference between least-connections weighting and the weighting mechanism in WRR is the way in which the content switch uses the weight to determine the distribution of connections. For example, say that you give the same weights to subgroups of reals within your server farm as given previously in Table 10-2. Consider a server farm consisting of N subgroups of reals, with N different weights 1, 2, ... N. During one cycle, the real subgroup with weight 1 would receive 1 / (1 + 2 + ... + N) of the connections, the subgroup with weight 2 would receive 2 / (1 + 2 + ... + N) of the connections, and so forth. Table 10-3 illustrates how the least-connections algorithm distributes the load with the same weights as given previously with WRR in Table 10-2.

Table 10-3. Sample Weighted Least-Connections Proportion Calculations
Server	Weight	Percentage of Connections
S1	8	8/27 = 30%
S2	8	8/27 = 30%
S3	2	2/27 = 7%
S4	2	2/27 = 7%
S5	4	4/27 = 15%
S6	3	3/27 = 11%
Total	27	27/27 = 100%
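The proportions in Table 10-3 follow directly from the weights, as the short Python sketch below shows. It also includes one possible way to apply the weights when choosing a real: pick the real with the lowest ratio of active connections to weight. That selection rule and the function names are assumptions for illustration, not a description of the CSM implementation.

```python
from fractions import Fraction

# Weights from Table 10-2.
WEIGHTS = {"S1": 8, "S2": 8, "S3": 2, "S4": 2, "S5": 4, "S6": 3}


def connection_shares(weights):
    """Return each real's share of connections as an exact fraction."""
    total = sum(weights.values())
    return {real: Fraction(w, total) for real, w in weights.items()}


def pick_weighted_least_conn(active, weights):
    """Pick the real with the fewest connections relative to its weight.

    Assumed rule: compare active/weight; ties go to configuration order.
    """
    return min(weights, key=lambda r: Fraction(active[r], weights[r]))


shares = connection_shares(WEIGHTS)
total_weight = sum(WEIGHTS.values())
for real, weight in WEIGHTS.items():
    print(f"{real}: {weight}/{total_weight} = {float(shares[real]):.1%}")

# With no teardowns, 27 new connections end up split by weight.
active = {real: 0 for real in WEIGHTS}
for _ in range(total_weight):
    active[pick_weighted_least_conn(active, WEIGHTS)] += 1
print(active)
```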

To configure least connections on your CSS, use the balance leastconn command in your content rule. The CSS does not use the service weight with least connections; it assumes that every service has the same weight.

To configure least connections on your CSM, use the predictor leastconns command in your server farm configuration. You configure the weight in real configuration mode on the CSM, using the weight command. You can also use DFP as discussed previously to automatically adjust the weights of your real servers.

A drawback to least connections is that, when real servers fail or when you add a new real server to the farm, the content switch sets the real's connection count to zero. When the real comes online, the content switch may overload it with connections to bring it up to the level of connections on the rest of the available reals in the server farm. To avoid overloading a real when it comes online, the CSM least-connections algorithm provides an inherent slow-start mechanism.

If you find that your servers tend to fail repeatedly when they come online on your CSS, you should consider using weighted round robin; the CSS least-connections algorithm does not provide connection slow start. When you add the real server to the farm, make sure that its weight is very low, and then gradually increase its weight manually to provide a similar function to slow start.

Both the round-robin and least-connections load-balancing methods enable you to configure connection thresholds. Once a real server reaches its connection threshold, the content switch takes the real out of the rotation, leaving existing connections intact. To define the maximum number of connections for a real server on the CSS, use the following command in service configuration mode:

 max connections max-conns

Once the CSS reaches the maximum connections for the real, it will not issue connections until the connections drop below the maximum threshold.

To define the maximum number of connections of a real server on the CSM, use the following commands in real server configuration mode:

 maxconns max-conns minconns min-conns

Once the CSM reaches the maximum connections for the real, it will not issue connections until the connections drop below the minimum threshold.
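The two-threshold behavior on the CSM amounts to hysteresis: the real leaves the rotation at the maximum and returns only after dropping below the minimum. The following Python sketch models that behavior under the description above; the class and attribute names are hypothetical and the model ignores everything except the connection count.

```python
class RealServer:
    """Connection thresholds with hysteresis, modeled on maxconns/minconns."""

    def __init__(self, name, maxconns, minconns):
        self.name = name
        self.maxconns = maxconns
        self.minconns = minconns
        self.active = 0
        self.in_rotation = True

    def open_connection(self):
        """Return True if the real accepted a new connection."""
        if not self.in_rotation:
            return False
        self.active += 1
        if self.active >= self.maxconns:
            # Out of rotation; existing connections stay intact.
            self.in_rotation = False
        return True

    def close_connection(self):
        if self.active > 0:
            self.active -= 1
        # Rejoin only after dropping below the minimum threshold.
        if not self.in_rotation and self.active < self.minconns:
            self.in_rotation = True


real = RealServer("S1", maxconns=3, minconns=2)
accepted = [real.open_connection() for _ in range(4)]
real.close_connection()             # 2 active: not yet below minconns
still_out = not real.in_rotation
real.close_connection()             # 1 active: below minconns
back_in = real.in_rotation
print(accepted, still_out, back_in)
```

Setting the minimum well below the maximum keeps a busy real from flapping in and out of the rotation each time a single connection closes.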

Configuring Server Load

Your CSS can estimate the average load of a real server from the average response times that it measures from live traffic over a period of time. The content switch uses response times from both TCP connection setup and application-layer requests and responses of live flows in its calculations. The content switch forwards requests to the real servers based on their load; reals with a higher load receive fewer requests than those with a lower load.

The CSS implements a proprietary algorithm called Arrowpoint Content Awareness (ACA) to calculate the real-time server load based on the average response time over a period of time. ACA is not useful in small server farms (for example, server farms with only two or three reals). To enable ACA, use the balance aca content rule command. You can then configure either relative or absolute server load calculations.

Note

As an alternative to ACA, you can also use the load value service configuration command to manually specify the load of a server. The CSS considers a real with a value of 254 configured with this command as out of service. This command bypasses the ACA load calculation and can be set using XML, SNMP, or the command line. To use the load command, you must first disable load reporting by using the no load reporting global configuration command.

Calculating Relative Load

With relative load calculations, the CSS computes the load for each real as a value between 2 and 254, where 255 is fully loaded and therefore ineligible for content requests. The CSS computes two different load numbers for each server, one for large files (greater than 15 KB) and one for small files (less than 15 KB), and then averages the two.

The load step is a value that the content switch uses to determine whether a significant difference exists between the average response times of different servers. The greater the step, the less likely it is that the content switch treats the average response times of two real servers as different. You can manually set the load step, using the command

 load step step static

Alternately, you can configure the CSS to automatically calculate the load step for you by using the command

 load step step dynamic

With the load step dynamic command, you must configure an initial load step, but the CSS modifies it after the CSS collects sufficient response times.

The CSS takes a real server out-of-service when its load surpasses a load threshold that you configure with the command

 load threshold load

To calculate the load of the real Real_i, where i is the index of the real within the server farm, the CSS uses the following function to normalize the load against the response time of the fastest responding real, MinimumRespTime:

Load(Real_i) = [(RespTime(Real_i) - MinimumRespTime) / LoadStep] + 2

The content switch assigns a load of 2 to the real server with the lowest response time (MinimumRespTime). The loads of all other real servers are based on the response time of that real. For example, Table 10-4 shows four servers with response times of 100 ms, 200 ms, 600 ms, and 700 ms. The lowest response time is that of server S1 at 100 ms. The load is calculated for the remaining services in Table 10-4 using a MinimumRespTime of 100 ms and a load step of 10 ms.

Table 10-4. Sample Load Calculations with Load Step of 10 ms
Real_i	RTT (ms)	Load(Real_i)
S1	100	Lowest Load = 2
S2	200	2 + (200 - 100)/10 = 12
S3	600	2 + (600 - 100)/10 = 52
S4	700	2 + (700 - 100)/10 = 62
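The calculation in Table 10-4 can be reproduced with a short Python sketch of the relative load function. The function name is hypothetical, and the use of integer division is an assumption; the sketch also leaves out the upper load limit and the averaging of the large-file and small-file load numbers.

```python
def relative_loads(resp_times_ms, load_step_ms):
    """Normalize each real's load against the fastest responding real.

    Implements Load = (RespTime - MinimumRespTime) / LoadStep + 2,
    using integer division (an assumption for this sketch).
    """
    fastest = min(resp_times_ms.values())
    return {
        real: (resp_time - fastest) // load_step_ms + 2
        for real, resp_time in resp_times_ms.items()
    }


# Response times from Table 10-4, with a load step of 10 ms.
response_times = {"S1": 100, "S2": 200, "S3": 600, "S4": 700}
loads = relative_loads(response_times, load_step_ms=10)
print(loads)  # {'S1': 2, 'S2': 12, 'S3': 52, 'S4': 62}
```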

If you use a load step of 100 ms instead, you will get the loads calculated in Table 10-5. Because the difference in response times between S1 and S2 is equal to 100, their load numbers are the same, and therefore they will receive the same number of connections from the CSS. Also, notice that the larger the load step you choose, the closer together the load numbers of the reals are.

Table 10-5. Sample Load Calculations with Load Step of 100 ms
Real_i	RTT (ms)	Load(Real_i)
S1	100	Lowest Load = 2
S2	200	2 + (100 - 100)/100 = 2
S3	600	2 + (600 - 100)/100 = 7
S4	700	2 + (700 - 100)/100 = 8

The CSS recalculates the load numbers for each real server when the load-teardown timer expires. To configure the load-teardown timer, use the following command:

 load teardown-timer interval

The CSS uses an age-out mechanism to retest real servers whose performance dropped substantially for an extended time but that may have returned to normal by the time the age-out timer expires. When the age-out timer expires, the content switch resets to 2 the load of every real whose load has not changed since the timer started. This way, the content switch can issue more connections to those reals, so that the teardown reports accumulate more flow information and the CSS can calculate the load for each server more accurately. To change the age-out timer, use the following command:

 load ageout-timer seconds

The default settings for the teardown and age-out timers are 20 and 60 seconds, respectively.

Calculating Absolute Load

As you learned with relative load calculations, the content switch normalizes the response time values against the fastest responding real while calculating the load of the remaining real servers. However, if you use your CSS to switch content for multiple applications, relative load normalizes the load values of the services of a slower application against the response time of the fastest service of the faster application. For example, suppose that the average response time of application A is 50 ms and that of application B is 700 ms. Furthermore, suppose that the fastest responding service, which belongs to application A, has a response time of 20 ms. In this example, the CSS normalizes all services within applications A and B using the response time of 20 ms. If you want to use ACA for multiple applications on a single CSS, you should use absolute load calculations instead of relative load.

Absolute load calculations take into account only the actual perceived load of the server. With absolute load calculations, the CSS divides the load number scale into 16 ranges of load numbers. The CSS assigns individual load steps to the ranges, depending on the granularity of the response times of your applications. Ranges are groups of consecutive load numbers that share a common load step between numbers. You can adjust the load step and maximum response times for the ranges by using the command

 load absolute-sensitivity number

The CSS uses a sensitivity of 21 by default. Table 10-6 gives the load step and maximum response time values for the 16 ranges, based on a sensitivity of 21. With this sensitivity, the upper boundaries for the load step and maximum response time are 65,536 ms and 33 minutes, 51 seconds, respectively.

Table 10-6. Absolute Load Values for Sensitivity of 21
Range Number	Load Numbers	Step Size (ms)	Maximum Response Time (ms)	Maximum Response Time (h:m:s)
1	2-15	2	32	0:00:00
2	16-31	4	96	0:00:00
3	32-47	8	224	0:00:00
4	48-63	16	480	0:00:00
5	64-79	32	992	0:00:00
6	80-95	64	2016	0:00:02
7	96-111	128	4064	0:00:04
8	112-127	256	8160	0:00:08
9	128-143	512	16,352	0:00:16
10	144-159	1024	32,736	0:00:32
11	160-175	2048	65,504	0:01:05
12	176-191	4096	131,040	0:02:11
13	192-207	8192	262,112	0:04:22
14	208-223	16,384	524,256	0:08:44
15	224-239	32,768	1,048,544	0:17:28
16	240-254	65,536	2,031,584	0:33:51
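The values in Table 10-6 follow a regular pattern that a short Python sketch can regenerate: the step size doubles from one range to the next, and each maximum response time is the running total of step size times the number of load numbers in the range. The pattern is read off the table itself, not taken from a documented CSS formula, and the sketch does not model how other sensitivity values change the table. The function names are hypothetical.

```python
def absolute_ranges(num_ranges=16, first_step_ms=2, loads_per_range=16,
                    max_load=254):
    """Rebuild the rows of Table 10-6 for a sensitivity of 21.

    Returns tuples of (range, first load, last load, step ms, max resp ms).
    """
    rows = []
    max_resp_ms = 0
    for i in range(num_ranges):
        step_ms = first_step_ms * 2 ** i          # step doubles per range
        first_load = max(2, i * loads_per_range)  # load numbers start at 2
        last_load = min((i + 1) * loads_per_range - 1, max_load)
        # Range 1 is counted from load 0 so that it spans 16 steps, which
        # is what the 32 ms boundary in the table implies.
        span = last_load - i * loads_per_range + 1
        max_resp_ms += step_ms * span
        rows.append((i + 1, first_load, last_load, step_ms, max_resp_ms))
    return rows


def hms(ms):
    """Format milliseconds as h:mm:ss, truncating fractions of a second."""
    seconds = ms // 1000
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


rows = absolute_ranges()
for number, first, last, step, max_resp in rows:
    print(number, f"{first}-{last}", step, max_resp, hms(max_resp))
```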

With absolute load calculations, the CSS groups services from your faster applications into the lower ranges. The maximum response time value in Table 10-6 separates your slower applications from faster applications. Additionally, real servers of slower-responding applications tend to have larger differences in response times. Therefore, the CSS increases the load step as the maximum response time increases, as Table 10-6 illustrates.

As an example, Table 10-7 gives sample absolute load calculations for two applications with three reals each. Based on the perceived response times, the CSS assigns application A to range 2 and application B to range 5.

Table 10-7. Sample Absolute Load Calculations for Two Applications with Three Servers Each
Application	Real_i	RTT (ms)	Load(Real_i)	Range
A	S1	45	19	2
A	S2	50	20	2
A	S3	55	21	2
B	S1	670	70	5
B	S2	700	71	5
B	S3	730	72	5

Increasing the sensitivity increases the upper load step size and maximum response time boundaries, and thus decreases the load numbers that the CSS assigns to your reals. For example, if you change the sensitivity to 22 using the command load absolute-sensitivity 22, the CSS would assign lower load numbers to the reals of application B in Table 10-7.

Hash Distribution

In previous chapters, you learned that hashing can be used to check the integrity of SSL segments, authenticate HTTP requests, and compute hash values from keys used in hash table lookups during CEF adjacency determination. You can also use hashing for distributing load across real servers.

With simple hash distribution, the content switch extracts the input to its hash function from fields within the incoming IP packet and TCP/UDP segment headers, including source/destination IP addresses and TCP/UDP ports. The content switch computes the hash as a value between 0 and a number that is usually quite a bit larger than the number of reals. The content switch then divides the hash value by the number of reals N, with the remainder being between 0 and N - 1. The remainder of the division identifies the real server to forward the request to.

Note

To determine the remainder of a division between two numbers, the content switch uses the modulus or "mod" operator.
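The hash-then-modulus selection can be sketched in Python as follows. SHA-256 is used here only as a convenient, well-distributed stand-in; the hash function that a content switch actually uses is not specified in this chapter, and the function and variable names are hypothetical.

```python
import hashlib

REALS = ["S1", "S2", "S3", "S4"]


def pick_real(src_ip, dst_ip, src_port, dst_port, reals):
    """Select a real by hashing header fields and taking the remainder."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes as a hash value between 0 and 2**32 - 1,
    # far larger than the number of reals.
    hash_value = int.from_bytes(digest[:4], "big")
    # The remainder is between 0 and N - 1 and indexes the list of reals.
    return reals[hash_value % len(reals)]


first = pick_real("192.0.2.10", "10.1.10.100", 40000, 80, REALS)
again = pick_real("192.0.2.10", "10.1.10.100", 40000, 80, REALS)
print(first, again)  # the same flow always maps to the same real
```

Because the result depends only on the packet fields and the number of reals, no per-connection table is needed to send later packets of the flow to the same real.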

A benefit of hashing in general is that, because the content switch hashes every packet of a flow, it does not need to store the associations of the client's connection to the selected real server in RAM. As a result, hashing is a stateless distribution method, whereas the methods described previously are all stateful because the content switch must store the mappings in RAM in order to know where to forward subsequent packets of the client's flow.

Hashing the packet's source IP address is useful only in environments where there are vast numbers of clients, because the same keys provide the same hash value. A small number of clients may cause an over-selection of particular real servers. Additionally, some ISPs use mega-proxy servers to NAT hundreds or even thousands of clients to a single IP address. As a result, your content switch will forward all requests from clients behind the proxy to the same real server, potentially causing that real server to overload.

You can configure your content switch to hash the following information for selecting real servers:

Source IP address hashing: A benefit of source address hashing in some environments is that the client connects to the same server for every request, causing the client to stick to the same real server for all requests to the virtual server. This benefit proves useful for e-commerce applications that store client-specific information across content requests. You will learn more about session stickiness later in this chapter.
URL, domain name, and destination IP address hashing: URL hashing is useful in caching environments in which the content is not duplicated across the real servers. Cache load balancing requires the content switch to distribute files evenly across available CEs. Delayed binding enables the content switch to extract the URL in the request before choosing a real server. You will learn more about cache load balancing in Chapter 13.
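The stickiness of source IP address hashing, and its weakness when many clients share one address, can be seen in a small Python sketch. CRC-32 stands in for the unspecified hash function of the content switch, and the addresses and names are made up for the example.

```python
import zlib
from collections import Counter

REALS = ["S1", "S2", "S3", "S4"]


def pick_by_source_ip(src_ip, reals):
    """Hash only the source address, so a client always gets one real."""
    return reals[zlib.crc32(src_ip.encode()) % len(reals)]


# 1000 requests that a mega-proxy NATs to a single source address
# all land on one real.
proxy_hits = Counter(
    pick_by_source_ip("203.0.113.7", REALS) for _ in range(1000)
)

# Requests from 254 distinct client addresses can spread across the farm.
spread = Counter(
    pick_by_source_ip(f"198.51.100.{host}", REALS) for host in range(1, 255)
)

print(proxy_hits)
print(spread)
```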

Layer 5-7 Load Balancing

In order to match a client request to a virtual server, you can configure your CSS to inspect application layer headers as matching criteria. By enabling application-layer load balancing, you can assign client requests to virtual servers given the following criteria:

URLs and file types: URLs contain information that you can use to divide your content into various groups. For example, you can distinguish between static and dynamic content using file extensions.
Languages: Content for various demographics can reside on dedicated servers or server farms.
Browser types: Palm, PDA, or cell phones may have different screen resolutions than regular PCs or interpret markup languages other than HTML, such as Wireless Markup Language (WML) or extensible HTML (XHTML). You can house a transformed version of content for these clients on a dedicated server. See Chapter 7, "Presenting and Transforming Content," for more information on content transformation.
Search engine services: Search engine services and wireless application protocol (WAP) gateways sometimes send requests that contain distinguishable characteristics from regular client requests. For example, a request for uniform resource identifier (URI) /robots.txt is probably from a search engine, in which case you can handle the requests differently than regular content requests. Search engines may also use specific user-agent header values to distinguish themselves, such as "Googlebot" and "YahooSeeker."
Cookies: Cookies can be used as criteria for inspecting customized content for clients. You will learn about connection stickiness later in this chapter.
Bypass noncacheable traffic: When selecting whether a request should be forwarded to a cache engine, the HTTP "Cache-Control:" header can be inspected by the content switch for a value of "no-cache" or "no-store." Recall from Chapter 8 that these values indicate that content must be validated for every request. Therefore, there is no reason to send the request to a cache engine; rather, simply send the request to the virtual server of the origin server where the content resides.

Figure 10-21 illustrates how you can configure a CSS with two virtual servers for the same content. The content switch forwards the requests to different server farms by inspecting the application request.

Figure 10-21. HTTP Header Load Balancing

To configure your CSS with these two different virtual servers, use the configuration in Example 10-9. Browsers use character codes for specifying the language within the "Accept-Language:" HTTP header. Additionally, different dialects of the same language use a unique code. For example, the English language has dialect codes for Australian (en-au), Belize (en-bz), and many others. The example in Figure 10-21 uses United States English (en-us) and Canadian French (fr-ca), but you can catch all the available dialects by using the contain keyword in Example 10-9.

Example 10-9. HTTP Header Load Balancing on the CSS

 header-field-group french-lang
  header-field french language contain "fr-" 20
 header-field-group english-lang
  header-field english language contain "en-" 20
 service fr-web01
  ip address 10.1.10.10
  active
 service en-web02
  ip address 10.1.10.11
  active
 owner cisco
  content french-vip
   vip address 10.1.10.100
   header-field-rule french-lang
   protocol tcp
   port 80
   add service fr-web01
   active
  content english-vip
   vip address 10.1.10.100
   header-field-rule english-lang
   protocol tcp
   port 80
   add service en-web02
   active

You can also create Extension Qualifier Lists (EQL) to enable the CSS to inspect the URI for file extensions. For example, to configure a virtual server to service your static files, and another to serve your dynamic files, you can use the configuration in Example 10-10.

Example 10-10. Using EQLs on the CSS

 eql static-files
  extension gif
  extension jpg
  extension jpeg
  extension asf
  extension rm
  extension qt
  extension mp4
  extension html
  extension htm
 eql dynamic-files
  extension perl
  extension asp
  extension cgi
 owner cisco
  content static-vip
   vip address 10.1.10.100
   url "/*" eql static-files
   protocol tcp
   port 80
   add service web01
   add service web02
   add service web03
   active
  content dynamic-vip
   vip address 10.1.10.100
   url "/*" eql dynamic-files
   protocol tcp
   port 80
   add service web01
   add service web02
   add service web03
   active

To configure your CSM with these two virtual servers, use the configuration in Example 10-11. To configure the two language policies, you must associate the HTTP header maps ("french-lang" and "english-lang") to their respective server farms by using the policy command. Then, instead of applying the server farms to the virtual server, you apply the policies.

Example 10-11. HTTP Header Load Balancing on the CSM

 mod csm 5
  map french-lang header
   match protocol http header language header-value "fr-"
  map english-lang header
   match protocol http header language header-value "en-"
 serverfarm en-webfarm
  nat server
  no nat client
  real 10.1.10.10
   inservice
 serverfarm fr-webfarm
  nat server
  no nat client
  real 10.1.10.11
   inservice
 policy french-policy
  serverfarm fr-webfarm
  header-map french-lang
 policy english-policy
  serverfarm en-webfarm
  header-map english-lang
 vserver french-vip
  virtual 10.1.10.100 tcp www
  policy french-policy
  persistent rebalance
  inservice
 vserver english-vip
  virtual 10.1.10.100 tcp www
  policy english-policy
  persistent rebalance
  inservice

To switch content based on file extensions on your CSM, you must create a URL map with the url-map command in policy configuration mode, as Example 10-12 illustrates. The first matching policy that is configured within the virtual server content-vip wins. You should also consider configuring a default policy that catches requests that do not match any of your other policies.

Example 10-12. Extension Matching on the CSM

 mod csm 5
  map static-files url
   match protocol http url *.gif
   match protocol http url *.jpg
   match protocol http url *.asf
   match protocol http url *.rm
   match protocol http url *.qt
   match protocol http url *.mp4
   match protocol http url *.html
   match protocol http url *.htm
  map dynamic-files url
   match protocol http url *.cgi
   match protocol http url *.perl
   match protocol http url *.asp
  map default-files url
   match protocol http url *.*
 serverfarm stat-webfarm
  nat server
  no nat client
  real 10.1.10.10
   inservice
 serverfarm dyn-webfarm
  nat server
  no nat client
  real 10.1.10.11
   inservice
 serverfarm default-webfarm
  nat server
  no nat client
  real 10.1.10.12
   inservice
 policy static-policy
  serverfarm stat-webfarm
  url-map static-files
 policy dynamic-policy
  serverfarm dyn-webfarm
  url-map dynamic-files
 policy default-policy
  serverfarm default-webfarm
  url-map default-files
 vserver content-vip
  virtual 10.1.10.100 tcp www
  policy static-policy
  policy dynamic-policy
  policy default-policy
  persistent rebalance
  inservice
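The first-match behavior of the policies in Example 10-12 can be modeled with a short Python sketch. The policy, map, and server farm names come from the example; the matching itself uses Python's shell-style wildcards as an approximation of the CSM URL match syntax, which is an assumption.

```python
from fnmatch import fnmatchcase

# Policies in the order in which the virtual server lists them.
POLICIES = [
    ("static-policy",
     ["*.gif", "*.jpg", "*.asf", "*.rm", "*.qt", "*.mp4", "*.html", "*.htm"],
     "stat-webfarm"),
    ("dynamic-policy", ["*.cgi", "*.perl", "*.asp"], "dyn-webfarm"),
    ("default-policy", ["*.*"], "default-webfarm"),
]


def select_serverfarm(url_path, policies=POLICIES):
    """Return the server farm of the first policy whose URL map matches."""
    for _name, patterns, serverfarm in policies:
        if any(fnmatchcase(url_path, pattern) for pattern in patterns):
            return serverfarm
    return None  # no policy matched


for path in ["/images/logo.gif", "/cgi-bin/search.cgi", "/docs/readme.txt"]:
    print(path, "->", select_serverfarm(path))
```

Because the default policy pattern would also match the static and dynamic requests, it has to come last in the list for the more specific policies to take effect.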

Note

Bear in mind that HTTP header load balancing is different from HTTP URL hashing. HTTP header load balancing is a method of selecting virtual servers. URL hashing is a method of deciding which real server to send the request to, after the content switch selects a virtual server.

Equal-Cost Multi-Path Router Load Balancing

Equal-Cost Multi-Path (ECMP) is similar to CEF load balancing. To configure per-flow load balancing, use the command

 ip ecmp address

This command enables the CSS to hash the source and destination addresses of incoming packets together to ensure that your flows will traverse the same outgoing router. To enable per-packet load balancing, use the command

 ip ecmp round-robin
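The difference between the two ECMP modes can be sketched in Python. The router addresses and function names are hypothetical, and CRC-32 is only a stand-in for whatever hash the CSS applies to the address pair.

```python
import zlib
from itertools import cycle, islice

ROUTERS = ["10.1.20.1", "10.1.20.2"]  # hypothetical equal-cost next hops


def per_flow_next_hop(src_ip, dst_ip, routers):
    """Hash source and destination addresses so a flow keeps one router."""
    key = f"{src_ip}|{dst_ip}".encode()
    return routers[zlib.crc32(key) % len(routers)]


def per_packet_next_hops(routers):
    """Alternate routers packet by packet, regardless of flow."""
    return cycle(routers)


# Every packet of the same flow leaves through the same router.
flow_hops = {
    per_flow_next_hop("192.0.2.10", "10.1.10.100", ROUTERS) for _ in range(5)
}

# Round robin spreads consecutive packets across the routers.
packet_hops = list(islice(per_packet_next_hops(ROUTERS), 4))

print(flow_hops)
print(packet_hops)
```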

By default, the CSS selects the same interface for return traffic of a flow as the incoming traffic came from. To disable this behavior for ECMP, you can use the command

 ip ecmp no-prefer-ingress

Figure 10-22 illustrates ECMP.

Figure 10-22. Load Balancing Packets Across Multiple Routers

So far in this chapter, you've learned a lot about how content switches receive requests from clients and forward them to real servers. As a result, you should now be able to configure and deploy a basic server farm in your production environment. Once you enable your configuration in production, you may want to tweak the settings and make the server farm perform more efficiently by using enhanced application health checking. You may also have application requirements to ensure that users remain connected to the same real server for the duration of their sessions. Additionally, when the capacity of your server farms increases, you may find that your content switches start to fail; this is not an uncommon event in large server farm environments. To prevent your application from failing when your content switch fails, you should consider configuring content switch fault tolerance. The remainder of the chapter will provide you with the information you need to configure a robust and highly available server farm.
