Internet Service Providers: Proxying and Traffic Volumes


Anybody connecting to the Internet today, whether for individual use or for business, will typically do so through an Internet Service Provider (ISP). ISPs offer a centralized point for users to connect to the Internet. This eliminates the need for each individual user or business to provide their own infrastructure and participate in routing updates, DNS configuration, and so forth. All of this is handled by the ISP, which in addition will run a resilient network and maintain peering agreements with other service providers to connect into their infrastructures. This provides a transparent view to end users: they merely sign up with an ISP and connect to a predetermined end point within the ISP, and all onward communication is carried out seamlessly. This makes connecting to the Internet a relatively painless operation, but it brings with it design issues for network architects. These can vary depending on the ISP used, but mainly revolve around proxy issues and Network Address Translation (NAT).

Proxies

In the days before content switching, the only way to provide access and some form of control was to point all users at a single device. Traditionally, a single proxy handled all access out to the Internet. This device was configured to perform NAT, provide a minimum level of caching, and allow access to external devices. As the Internet has evolved and more users have come online, large ISPs are under pressure to provide a comprehensive and cost-effective service. Because ISPs may have hundreds of thousands of subscribers, managing IP address allocation to all these remote devices is a major concern. Moreover, given the shortage of IP addresses being experienced, it makes no sense to allocate hundreds of thousands of individual addresses when only thousands of people are online simultaneously; ISPs have an obligation to preserve address space as effectively as possible. The way ISPs get around this is to allocate every subscriber a private IP address as described in RFC 1918. Each subscriber connects to the ISP concentration point, or proxy, using his or her allocated private IP address. Then, using NAT, each private IP address is translated to a single public IP address using Port Address Translation. Port Address Translation uses the source TCP port to differentiate users, allowing many thousands of addresses to be translated to a single public address, as illustrated in Figure 7-2. This is often called a mega proxy.
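The translation described above can be sketched in a few lines of Python. This is a simplified, hypothetical model of a Port Address Translation table, not any vendor's implementation: the public address, port range, and helper names are all illustrative. It shows the key idea that many RFC 1918 source addresses share one public IP, with the translated source port distinguishing each subscriber's flow.

```python
# Hypothetical sketch of a Port Address Translation (PAT) table. Many
# private (RFC 1918) addresses share a single public IP; the translated
# source port is what distinguishes each subscriber's traffic.

PUBLIC_IP = "203.0.113.1"   # illustrative public address for this example

class PatTable:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 40000      # next free translated source port
        self.mappings = {}          # (priv_ip, priv_port) -> public port
        self.reverse = {}           # public port -> (priv_ip, priv_port)

    def translate_out(self, priv_ip, priv_port):
        """Rewrite an outbound flow's source to (public_ip, unique port)."""
        key = (priv_ip, priv_port)
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.reverse[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.mappings[key]

    def translate_in(self, public_port):
        """Map a returning packet back to the original private endpoint."""
        return self.reverse[public_port]

pat = PatTable(PUBLIC_IP)
a = pat.translate_out("10.0.0.5", 3333)
b = pat.translate_out("10.0.0.9", 3333)   # same port, different subscriber
assert a != b                              # distinguished by translated port
assert pat.translate_in(a[1]) == ("10.0.0.5", 3333)
```

Note that both subscribers used local source port 3333; from the Internet's point of view they are still unambiguous, because each was assigned a distinct translated port on the shared public address.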

Figure 7-2. Port Address Translation. This maximizes the use of IP addresses and ensures that thousands of devices can access the Internet using a single IP address. This is often referred to as a mega proxy.

[Image: graphics/07fig02.gif]

Because each user now connects through a proxy with access to the Internet, it makes sense for these proxies to perform other functions as well. With the drive to increase revenues, other applications and services are provided for a small increase in subscription cost. These services typically include:

  • Personal firewalls

  • Virtual private networks (VPNs)

  • Personal content portals

  • Antivirus checking

However, one of the biggest advantages of a proxy is its caching ability. Caching allows content requested by one user to be stored locally at each ISP POP, making it available to other users who require the same data at a later date. This increases performance while minimizing bandwidth usage, since subsequent requests do not have to be sent out across the Internet; the data is already local. Caching breaking news stories or major sporting events is a typical example of how we can improve the user experience and reduce bandwidth usage. Web cache redirection is discussed in detail in Chapter 8, Application Redirection.
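The caching behavior described above can be sketched as a simple lookup-before-fetch loop. This is a minimal illustration, assuming a hypothetical `fetch_origin()` helper that stands in for a request sent across the Internet to the origin server:

```python
# Minimal sketch of proxy caching at an ISP POP. The first request for a
# URL is fetched from the origin; repeat requests are served locally.

origin_fetches = 0  # counts trips across the Internet

def fetch_origin(url):
    """Hypothetical stand-in for retrieving content from the origin server."""
    global origin_fetches
    origin_fetches += 1
    return f"<html>content of {url}</html>"

cache = {}

def proxy_get(url):
    if url not in cache:            # cache miss: go out across the Internet
        cache[url] = fetch_origin(url)
    return cache[url]               # cache hit: served locally at the POP

proxy_get("http://example.com/news")
proxy_get("http://example.com/news")   # second user requests the same story
assert origin_fetches == 1             # only one trip to the origin
```

A real cache must of course also honor expiry and validation headers so stale content is not served; the point here is only the bandwidth saving from serving repeat requests locally.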

There is therefore a compelling argument for ISPs to deploy proxies to handle these functions; these are known as mega proxies. However, due to the sheer volume of users and content being served, a single proxy, while it can handle thousands of users simultaneously, is not a viable option when effective throughput is required. To overcome this, multiple proxies are deployed, often load balanced to provide better performance and resilience. This brings with it a whole host of other issues that can play havoc with persistence.

Proxy servers from different vendors also throw their own unique issues and quirks into the mix. Proxy servers can communicate with one another and distribute the load between them for the same user session. The thinking behind this is that the requesting device is oblivious to any optimization of its request: all the requesting device requires is that it gets a response back from the device to which it sent the request. If the acknowledgment and sequence numbers are valid and the cyclic redundancy checksum checks out, the packet is processed. Therefore, proxy servers can manipulate and distribute load without the user knowing. This in itself can be a major pitfall in environments requiring persistence.

Once a user connects to a mega proxy, all onward connections into a site appear to have come from that proxy. As these proxies handle sessions for many thousands of users, all of these connections appear to originate from the same device. While this is great because it conserves the rapidly diminishing IP address space and keeps routing tables and updates small, it causes serious issues when persistence is required. Because these proxies are load balanced, the sessions that fetch the objects of a single HTML page can appear to come from different devices. In an HTTP environment where every HTTP GET or POST is treated as a single session, this does not affect the user. However, in environments that require the same server to handle the entire session, this is obviously not feasible. Based on this we will see that IP address hashing, while an excellent method for persistence, is neither a scalable nor a preferred solution when large mega proxies are involved. Figure 7-3 illustrates the potential hazards of using proxy servers when persistence is required.
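The failure mode above can be made concrete with a small sketch of source-IP hashing. This is an illustrative model, not any vendor's algorithm: the server names and proxy addresses are invented, and CRC32 is used only as an example hash. A content switch hashing the source address will always send a given address to the same server, but if a subscriber's requests exit the ISP through different load-balanced proxies, the source address changes mid-session and the hash can select a different server, breaking persistence.

```python
# Sketch of source-IP hashing on a content switch, and why it breaks
# behind load-balanced mega proxies. Names and addresses are illustrative.
import zlib

SERVERS = ["web1", "web2", "web3"]

def pick_server(src_ip):
    """Hash the source IP address to select a back-end server."""
    return SERVERS[zlib.crc32(src_ip.encode()) % len(SERVERS)]

# One subscriber, whose requests exit through two load-balanced proxies:
proxy_a, proxy_b = "198.51.100.10", "198.51.100.11"

# Each individual address maps consistently to one server...
assert pick_server(proxy_a) == pick_server(proxy_a)

# ...but the user's session now presents two different source addresses,
# so the two requests may hash to two different servers.
server_1 = pick_server(proxy_a)
server_2 = pick_server(proxy_b)
```

When `server_1` and `server_2` differ, any session state held on the first server (a shopping cart, a login token) is invisible to the second, which is exactly the persistence failure Figure 7-3 depicts.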

Figure 7-3. Issues with load-balanced proxies and persistence. By load balancing requests across multiple proxies, persistence is not maintained, as the request appears to have come from multiple sources instead of the single source that is required.

[Image: graphics/07fig03.gif]



Optimizing Network Performance with Content Switching: Server, Firewall and Cache Load Balancing
ISBN: 0131014684
Year: 2003
Pages: 85
