Architecture of the Server Infrastructure


Most systems use a tiered architecture, with multiple servers within each tier, as shown in Figure 9-1. Each tier has a set of servers that carry out the same function. Multiple servers in a tier increase availability because workload distributed across redundant instances can absorb the failure of any one of them. They also increase throughput: spreading flows across the tier increases parallelism and reduces the time application processes take to execute.

Figure 9-1. Web Service Delivery Architecture


Web applications are in the first tier. They are customer facing, providing the first layer of access to services and information. To build the web page, the appropriate application logic is activated in the next tier. That application may in turn use databases or other back-end services as needed.

The challenge has become to optimize each server layer for a range of applications, often in the face of conflicting demands. For example, one application may need intermittent, but quick, access to small objects, while another application needs to move bulk data to and from disk storage for sustained intervals. A dedicated server running a specific application can be tuned to meet those requirements without compromising performance by attempting to accommodate a wide range of conflicting resource demands.

Load distribution, other front-end processing, caching, and content distribution also play key roles in moving content efficiently; they are discussed in the next subsections.

Load Distribution and Front-End Processing

The networking industry regularly reinvents old approaches and applies them to new situations. For example, the front-end processor was classically used to improve mainframe performance after mainframes were networked with remote terminals. The mainframe was a highly optimized computing platform and was not suited to handle high interrupt volumes from communications activity.

A similar approach is emerging to wring more performance from server farms. Servers are designed to be high-performance computing and data-access platforms. They can suffer from dealing with high-speed network communications tasks, such as the following:

  • Processing connection offers

  • Detecting and suppressing attacks

  • Handling high interrupt levels associated with hundreds or thousands of active Transmission Control Protocol (TCP) connections

  • Performing the processing needed for key establishment and for encryption and decryption of Secure Sockets Layer (SSL) connections

  • Handling data compression

  • Handling error recovery, flow control, and timers for each connection

Hypertext Transfer Protocol (HTTP), the protocol used for web page transfers, adds further strain because some versions of HTTP use a separate connection for each object that is accessed. This means that a browser creates a connection, accesses the object, and breaks the connection for each object, even if the same server holds all the objects on a page, and many web sites have 50 or more objects per page. This adds server overhead and slows response.
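The overhead is easy to see in a short sketch. The following is a minimal illustration, assuming a hypothetical host name and object list: with one connection per object, every request pays the TCP setup and teardown cost, while a persistent connection pays it once for the whole page.

```python
import http.client

HOST = "www.example.com"                              # hypothetical origin server
OBJECTS = ["/index.html", "/logo.gif", "/style.css"]  # objects making up one page

# One connection per object (HTTP/1.0-style): a TCP setup and teardown for every request.
for path in OBJECTS:
    conn = http.client.HTTPConnection(HOST, timeout=5)
    conn.request("GET", path)
    conn.getresponse().read()
    conn.close()                                      # connection torn down after a single object

# One persistent connection (HTTP/1.1 keep-alive): the setup cost is paid once per page.
conn = http.client.HTTPConnection(HOST, timeout=5)
for path in OBJECTS:
    conn.request("GET", path)
    conn.getresponse().read()                         # each response must be read before the next request
conn.close()
```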

New products address these limitations with a computer system placed between the server farm and the customers using the site. This new front end is purpose-built for handling communications tasks, in contrast to a general-purpose server where these functions compete with application services for resources. Such front-end devices handle the communications tasks and also perform load-balancing functions.

SSL Accelerators

Businesses and customers are increasingly concerned about the privacy of their transactions. The SSL protocol runs on top of TCP, which provides reliable delivery, and uses special software at the client and server ends of the connection to ensure that communications are private.

After a TCP connection is established, the client and server authenticate each other to establish that they are who they represent themselves to be. Digital certificates are exchanged and validated. Then the parties exchange encrypted messages to create a unique key that they use for only this session. The key enables secure communications and the detection of any alterations to the traffic in transit.

SSL adds some network overhead for authenticating the partners and negotiating the security profile, but the biggest SSL impact is the processing load associated with key creation. Large numbers of secure connections can degrade server performance because servers must dedicate cycles to the processing associated with SSL establishment.

This places administrators in a bind: providing secure communications degrades performance, and buying more general-purpose servers to compensate is costly.

A class of products called SSL accelerators is used to off-load the server. Using special-purpose hardware, they handle all the brute-force computation that encryption needs. When combined with load-distribution devices, so that all end-user traffic flows through the SSL accelerator regardless of the ultimate server destination, they can also reuse a session key across all of the end user's sessions. This greatly decreases the computation load and simplifies digital certificate management.
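The division of labor can be illustrated with a minimal sketch, assuming a hypothetical back-end address and locally available certificate files; it handles a single request-response exchange and is not a production proxy. The front end performs the TLS handshake and decryption, then passes plaintext to the application server, which spends no cycles on cryptography.

```python
import socket
import ssl

LISTEN = ("0.0.0.0", 443)               # port the accelerator presents to end users
BACKEND = ("10.0.0.10", 8080)           # hypothetical application server, plaintext only

# The accelerator owns the certificate; the servers behind it never touch SSL/TLS.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("site.crt", "site.key")          # assumed certificate and key files

with socket.create_server(LISTEN) as listener:
    with ctx.wrap_socket(listener, server_side=True) as tls_listener:
        while True:
            client, _ = tls_listener.accept()        # TLS handshake and key creation happen here
            with client, socket.create_connection(BACKEND) as backend:
                request = client.recv(65536)         # decrypted by the front end
                backend.sendall(request)             # forwarded in the clear to the server
                client.sendall(backend.recv(65536))  # response encrypted on the way back out
```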


There are two related types of load distribution: local load distribution, which shares load across servers in a single server farm, and geographic load distribution, which uses the end user's location to optimize server farm selection. Both types often contain extra functions, such as SSL acceleration, attack handling, and aggregation of many hundreds of incoming connections into far fewer server connections to decrease the server's connection-handling workload.

Local Load Distribution

In a tiered server farm, the first tier has several candidate servers available for the incoming connection. The goal of local load distribution is to balance the load across all members of the tier, using the available capacity efficiently so that bottlenecks are avoided and transaction throughput is maximized.

At each tier there must be a means of making a sound selection of the best server to use at that moment. Load-balancing switches, also called content switches, are designed to forward each transaction to the server best suited to take on the next incremental request for services.

There are a variety of local load balancing techniques, implemented either in dedicated network hardware or in software. The simplest techniques allocate incoming connections to the next available server using a scheduling algorithm, such as round robin.
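A minimal sketch of round-robin allocation, assuming a hypothetical pool of first-tier server addresses:

```python
import itertools

# Hypothetical first-tier pool; round robin simply cycles through it.
SERVERS = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]
pool = itertools.cycle(SERVERS)

def assign_connection() -> str:
    """Return the server that should receive the next incoming connection."""
    return next(pool)

# Five successive connections are spread evenly, regardless of server state.
print([assign_connection() for _ in range(5)])
```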

More sophisticated load balancing techniques can depend on both demand-side (end-user request) criteria and supply-side (server status) criteria.

To handle demand-side criteria, the load distributor inspects the entire HTTP request and uses that information in its selection decision. By extracting the URL and the cookie, the load balancer has information about the content and the user requesting it. There are many business situations where it is advantageous to treat your customers differently. Those who do large amounts of business might get preferential treatment, or content may be customized for each key customer. Some content, such as URLs associated with purchasing products or services, can be treated as higher priority than those for customers browsing a catalog, for example.
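A minimal sketch of demand-side selection, assuming hypothetical URL prefixes, server pools, and a customer-tier cookie: purchase URLs and key customers are directed to a higher-priority pool, while catalog browsing goes to the general pool.

```python
# Hypothetical pools: one reserved for high-value requests, one for everything else.
PRIORITY_POOL = ["10.0.2.1", "10.0.2.2"]
GENERAL_POOL = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]

def choose_pool(url_path: str, cookies: dict) -> list:
    """Pick a server pool from the request URL and cookies (hypothetical rules)."""
    if url_path.startswith("/checkout") or url_path.startswith("/cart"):
        return PRIORITY_POOL            # purchase URLs get preferential treatment
    if cookies.get("customer_tier") == "gold":
        return PRIORITY_POOL            # known high-volume customers
    return GENERAL_POOL                 # catalog browsing and everyone else

# A browsing request from an anonymous user lands in the general pool.
print(choose_pool("/catalog/shoes", {}))
```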

The state of the server infrastructure is an example of a supply-side criterion that can influence the selection process. The load distributor uses information about server loads, access controls, application or content availability, and priority to find the best server at that moment.

Dynamic server supply-side selection strategies are based upon criteria such as the following:

  • Determining the server with the lowest number of active connections

  • Selecting a server with the fastest response at that moment

  • Using an algorithm to predict the best server

  • Using a ratio of incoming requests for each server

Some load distributors periodically execute a set of scripts that check the health of the servers. For example, they can request a web page and test to see if it was correctly presented. This can be done at the same time that they're timing the server's response speed.
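A minimal sketch of such a supply-side check, assuming each server exposes a hypothetical /health.html page containing the text "OK" and that the distributor tracks active connection counts: the script fetches the page, times the response, verifies the content, and the healthy server with the fewest active connections is selected.

```python
import time
import urllib.request

# Hypothetical server pool with the current count of active connections on each.
SERVERS = {"10.0.1.1": 12, "10.0.1.2": 7, "10.0.1.3": 9}

def check_health(server: str) -> float | None:
    """Fetch the health page; return the response time in seconds, or None if unhealthy."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(f"http://{server}/health.html", timeout=2) as resp:
            ok = resp.status == 200 and b"OK" in resp.read()   # page presented correctly?
    except OSError:
        return None                     # unreachable, timed out, or returned an error status
    return time.monotonic() - start if ok else None

def pick_server() -> str:
    """Among the healthy servers, choose the one with the fewest active connections."""
    healthy = [s for s in SERVERS if check_health(s) is not None]
    return min(healthy, key=SERVERS.get)    # assumes at least one server is healthy
```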

Session persistence is important to transactions that require multiple requests. A customer making a purchase needs to enter information, such as an address, a credit card number, and shipping instructions. These types of services are stateful: some context is needed between requests to maintain coherence and associate the requests.

When servers share a common repository for state information, switching a request to any server is acceptable. However, most applications maintain their state independently, each in a specific server. In those cases, sending requests associated with a single transaction to different servers can cause the transaction to fail.

Session persistence is the capacity to associate a set of requests so that they are directed to the same server. A few examples of the persistence options used by load distributors illustrate the capabilities:

  • Source persistence: The load distributor remembers the address of the end user and the identity of the server that was assigned on the first request. Further requests from that user are directed to the original server, as shown in the sketch after this list. However, many end users are on the other side of a firewall or a Network Address Translation (NAT) system, which can reassign addresses frequently, limiting the utility of this technique.

  • Cookie manipulation: The load distributor can create or manipulate a cookie passed between the server and the end user. The cookie is used to store the identity of the server used on the first request. Note this limitation: Many end users set their browsers to refuse cookies.

  • SSL session ID persistence: For secure sessions using SSL, the load distributor can use the SSL protocol's session ID, which is unique to a particular end user, to identify that end user and the assigned server.
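A minimal sketch of source persistence, assuming a hypothetical first-tier pool: the distributor records the server chosen for each client address on the first request and returns the same server for every later request from that address.

```python
import hashlib

SERVERS = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]   # hypothetical first-tier pool
assignments = {}                                  # client address -> server assigned on first request

def server_for(client_ip: str) -> str:
    """Source persistence: keep sending a given client address to the same server."""
    if client_ip not in assignments:
        # First request from this address: hash the address to pick a server,
        # then remember the choice for all later requests.
        index = int(hashlib.sha1(client_ip.encode()).hexdigest(), 16) % len(SERVERS)
        assignments[client_ip] = SERVERS[index]
    return assignments[client_ip]

# Both requests from 203.0.113.7 go to the same back-end server.
print(server_for("203.0.113.7"), server_for("203.0.113.7"))
```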

Geographic Load Distribution

Organizations with distributed centers of operations should consider geographic load distribution, which determines which users, in which locations, are connected to which of the geographically dispersed server farms.

Multiple server farms offer higher availability by removing the threat of a single point of failure. There are other advantages to distribution, such as the following:

  • Performance is improved by getting users close to the desired content on the network.

  • Users can be identified by country and receive content in the specified language.

The distributed sites will show higher performance in the aggregate if traffic is distributed intelligently, so this strategy works best if all sites are (approximately) equally loaded. Having one center under-utilized while another is congested wastes resources at each location. Intelligent geographic distribution decisions must also incorporate persistence: the user must be directed to a single site, at least for the duration of a transaction.

Geographic distribution decisions are made using the same criteria as for local load distribution, with additional input about the location of the end user. In some cases, the Internet address of the end user or of the end user's DNS server can be matched against a table of Internet addresses to determine probable locations. In other cases, all the server farms attempt return contact with the end user, and the server farm with the fastest access is assigned to that end user.
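A minimal sketch of the address-table approach, assuming a hypothetical table of address blocks: the end user's address (or their DNS server's address) is matched against the table, and the most specific matching prefix selects the server farm.

```python
import ipaddress

# Hypothetical mapping of address blocks to the nearest server farm.
SITE_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "farm-eu",
    ipaddress.ip_network("198.51.100.0/24"): "farm-us-east",
    ipaddress.ip_network("192.0.2.0/24"): "farm-apac",
}
DEFAULT_SITE = "farm-us-east"        # fallback when the address matches no known block

def pick_site(client_ip: str) -> str:
    """Choose a server farm by the most specific prefix that matches the client's address."""
    addr = ipaddress.ip_address(client_ip)
    matches = [net for net in SITE_TABLE if addr in net]
    if not matches:
        return DEFAULT_SITE
    return SITE_TABLE[max(matches, key=lambda net: net.prefixlen)]

print(pick_site("203.0.113.42"))     # -> farm-eu
```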

Content distribution network (CDN) switching is an interesting feature that enables a web site to use public content distribution networks for extended geographic reach and to handle traffic surges. As the web site's data centers reach capacity, a public CDN is used to handle the overflow until traffic levels fall. Public CDNs are available with usage-based pricing, enabling the web site owners to control costs as well.

Caching

Caches are special-purpose appliances that hide network and server latency by quickly delivering frequently used content. A cache delivers objects faster than a server can, once the objects are in the cache.

The cache is used in conjunction with an interception switch, which intercepts all traffic designated for particular services, such as the Web service on TCP Port 80, usually without regard for the ultimate destination. The cache looks to see if it already has that object in its storage; if so, it provides that object to the requester much faster than if the object had to be fetched from the server. Objects can be stored in cache explicitly, by being preloaded, or they are stored when the cache sees an object requested by an end user that it hasn't seen before. In that case, the cache performs the fetch from the web server on behalf of the end user, and it then stores the object in the cache memory for the next retrieval (see Figure 9-2).

Figure 9-2. Caching
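A minimal pull-through cache sketch, assuming a hypothetical origin server: a request is served from the store when the object is present; otherwise the cache fetches the object from the origin on the end user's behalf and keeps a copy for the next requester.

```python
import urllib.request

ORIGIN = "http://origin.example.com"     # hypothetical origin web server
store = {}                               # object path -> cached body

def get_object(path: str) -> bytes:
    """Serve from the cache when possible; otherwise fetch from the origin and keep a copy."""
    if path in store:
        return store[path]                           # cache hit: no trip to the server
    with urllib.request.urlopen(ORIGIN + path) as resp:
        body = resp.read()                           # cache miss: fetch on the end user's behalf
    store[path] = body                               # kept for the next requester
    return body
```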


The cache must be sensitive to the aging characteristics of each object. Some objects, such as company logos, may never change. Other content, such as current stock prices, will change constantly. Caching loses its value when it delivers expired (stale) content. Objects often are delivered with caching headers from their origin servers; those headers tell the cache how long it can store the object before it expires. If there isn't a caching header, or if the item is probably unique and will never be requested again (for example, a URL with an embedded query string, or a dynamically generated file with a URL ending in .jsp, or an encrypted object), the cache will simply ignore the object and not cache it.
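A minimal sketch of that expiration logic, assuming the origin supplies a Cache-Control max-age header: query-string URLs and objects without a caching header are declined, everything else is stored with an absolute expiry, and anything past its expiry is treated as stale.

```python
import re
import time

def cache_lifetime(path: str, headers: dict) -> float | None:
    """Return how many seconds an object may be cached, or None for 'do not cache'."""
    if "?" in path:
        return None                      # embedded query string: treat as unique, don't cache
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match is None:
        return None                      # no caching header: ignore the object
    return float(match.group(1))

def is_fresh(expires_at: float) -> bool:
    """Stored entries carry an absolute expiry; anything past it is stale and must be refetched."""
    return time.time() < expires_at

# A logo with a one-day lifetime is cacheable; a search URL is not.
print(cache_lifetime("/logo.gif", {"Cache-Control": "max-age=86400"}))   # -> 86400.0
print(cache_lifetime("/search?q=shoes", {}))                             # -> None
```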

Placing a cache in front of a server hides much of the server delay after objects are in the cache. Preloading those objects in the cache can be easily controlled by the server administration, if it owns both the servers and the server-side cache.

The cache can also be placed close to the client so that network latencies are eliminated as well. In those cases, the client-side caches are probably owned by the end user's ISP and will depend on caching headers for information about expiration.

All browsers also contain caches; this is readily apparent when an end user navigates by using the Back button on the browser. Note that a lot of the end-user activity may be concealed from server management tools if it comes out of cache instead of from the original server.

Content Distribution

On the face of it, there seem to be significant business opportunities for those who can deliver high-quality, content-rich services, including detailed graphics, animation, and sound. Service providers are also attracted to the potential of these high-value and high-margin services, as they represent significant opportunities to attract new customers while growing revenues from the current customer base.

However, the impact of content-rich traffic on the current mix of services generally results in degraded quality and access delays. Despite all the new capacity, the initial model of a centralized server distributing content across the Internet backbone simply does not stand up to the demands for content-rich service quality at the scale needed to support large numbers of customers. Raw network capacity by itself will not solve the problems of time delay and packet loss at internetwork connections. These obstacles must be dealt with through structural changes in the content-delivery system itself.

As was discussed in Chapter 8, "Managing the Application Infrastructure," time delay across a network (propagation delay) is not decreased by increasing bandwidth. It is a result of the laws of physics and the distance traveled. Shortening that distance will therefore improve end-user performance. If that shorter distance results in the data packet's crossing fewer network boundaries, packet losses will also probably decrease.

Getting the content to the network edge is a good way of decreasing time delay and the number of network boundaries that are crossed. Placing multiple copies of the content at the edges, such as cable system head-ends, brings the content closer to its consumers and thereby improves service quality. It also avoids the congestion and variability on the backbone, improving delivery of high-quality services. Investment in multiple servers at the edge is also cheaper than upgrading the entire backbone. By shifting compute cycles to the edge, where network latency is low, content-delivery architectures strive to ensure that end users get the full impact of rich content.

Content servers are caches that replicate the contents of the origin server around the network edges. The content servers can deliver high-quality video and audio streams as well as web page objects with high service quality.

Customers must be connected to the closest content server to take advantage of their relative location. This must be done transparently; customers should not need to know which content server is closest. The content-server assignment is made by the equivalent of a geographic distribution service, as described in a preceding subsection.

The content manager supervises the flow of content from the origin server to the content servers at the edge. The content is distributed over high-speed connections to maintain fresh information at the content servers. Content managers are also usually able to force the early expiration, if necessary, of pieces of content that have been cached in the content servers.

Most content distribution involves only static, unchanging, readily cached content. However, there are techniques that enable dynamically generated web pages to be cached. These generally reduce the transmission volume of dynamic pages by identifying the specific changes and sending only those changes to the requesting end user. There is an industry standard that facilitates this process: Edge-Side Includes (ESI). ESI is a way of marking the contents of a web page to tell the content distribution system which pieces of the page are unchanging, what their cache expiration times are, and how to do some simple processing to select one of several web page fragments for inclusion in the page delivered to an end user.
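ESI itself is a markup format, but a rough Python analogue of the idea, using hypothetical fragments, shows the mechanics: unchanging fragments are cached at the edge with their own lifetimes, the small changing fragment is regenerated on every request, and the page is assembled from the pieces before delivery.

```python
import time

fragment_cache = {}   # fragment name -> (body, absolute expiry time)

def cached_fragment(name: str, ttl: float, generate) -> str:
    """Return an unchanging fragment from the edge cache, regenerating it only when expired."""
    body, expires_at = fragment_cache.get(name, ("", 0.0))
    if time.time() >= expires_at:
        body = generate()                             # e.g., fetch the fragment from the origin
        fragment_cache[name] = (body, time.time() + ttl)
    return body

def assemble_page(user: str) -> str:
    """Assemble the delivered page from cached fragments plus one dynamic fragment."""
    header = cached_fragment("header", 3600, lambda: "<div>Site header</div>")
    footer = cached_fragment("footer", 3600, lambda: "<div>Site footer</div>")
    prices = f"<div>Live prices for {user}</div>"     # changes on every request, never cached
    return header + prices + footer

print(assemble_page("alice"))
```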

Content distribution is available by assembling content servers and content managers from components sold by cache vendors or by subscribing to a content distribution network service, such as those furnished by Akamai, Speedera, and Mirror Image.



