Going from High-Volume Sport Web Sites to Commercial Web Sites | IBM(R) WebSphere(R) and Lotus: Implementing Collaborative Solutions

How has IBM moved from the valuable experience of hosting high-volume sport Web sites to incorporating the best-practice "lessons learned" into the design of large commercial Web sites? Some examples of high-volume commercial Web sites that routinely use the best practices learned from the high-volume sport Web sites are as follows:

New York Stock Exchange Public Site, www.nyse.com
eBay auction site, www.ebay.com
Victoria's Secret women's clothing, www.victoriassecret.com
News/weather sites

First, let's look more in-depth at one of the most important aspects discussed for the Olympic Web sites in the previous sections: the use of Web server accelerators for the Olympics. These Web server accelerators fall into the general description of Web hardware appliance and are based on caching technology.

Web Caching Alternatives

IBM's experience with the Olympic Web sites and other high-volume sporting event Web sites helped show the significant benefits of using caching technology to increase the performance and scalability of Web sites. This section will discuss the hardware appliance, dynamic caching, and content delivery network (CDN) solutions available, explaining their differences, the types of content each serves best, pricing structures, and how they can act as complementary technologies as opposed to competitive ones.

Caching Theory

Caching is a fundamental computer science technique that involves the storing of frequently accessed content. It is used to address the Web site performance and scalability issues brought forth by ever-growing Internet traffic. Businesses tend to seek caching technology for three primary reasons:

Provide quality of service to end users
Protect from Web server stresses that cause site outages
Reduce performance-bogging site traffic

Caching theory suggests that when a particular resource is accessed, it is likely the same resource will be accessed again and again. On the World Wide Web, one page request suggests that an identical one will be made shortly thereafter. Caching allows otherwise resource-intensive pages to be served without any burden to the Web server.
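The idea can be illustrated with a minimal sketch. The class name PageCache and the function render_page are hypothetical stand-ins for a cache and for the Web server's page-building work; a real product would also bound memory use and expire entries.

```python
from typing import Callable, Dict


class PageCache:
    """Tiny in-memory cache placed in front of an expensive page renderer."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render            # hypothetical "Web server" work
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        if url in self._store:
            self.hits += 1               # served without touching the server
            return self._store[url]
        self.misses += 1                 # first request pays the full cost
        page = self._render(url)
        self._store[url] = page
        return page


def render_page(url: str) -> str:
    # Stand-in for a resource-intensive page build.
    return f"<html>content for {url}</html>"


cache = PageCache(render_page)
for _ in range(5):
    cache.get("/results/100m")
```

After the five identical requests above, only the first one reaches render_page; the other four are answered from the cache.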

Hardware appliances, dynamic caching, and CDN solutions all use caching theory as a basis to improve Web site performance, scalability, and uptime at a layer between the user's browser and the Web server. However, they all do it differently.

The Different Approaches

Hardware and CDN solutions increase Web site performance by handling page requests before they reach the server. Dynamic caching solutions improve Web site performance by reducing the overall number of resources necessary to compile and execute database-dependent pages and eliminate the need for page re-executions.

One way to think about it is that hardware and CDN solutions take pages that are easy to serve, like graphics and streaming media, and place them on multiple servers closer to end users. Dynamic caching solutions make hard-to-serve pages such as Active Server Pages, ColdFusion, and JavaServer Pages easier for Web servers to handle.

Because hardware, dynamic caching, and CDN solutions specialize in different file types, they can be used together to produce better performance for the entire array of Internet file types. Contrary to what many believe, these technologies are complementary, not competitive.

Hardware Caching Solutions

Hardware solutions are caching servers deployed locally, either in the rack with the Web servers or at co-location centers with the Web servers (see Figure 15-9). The actual caching devices are basically computers with preinstalled software. The difference between this type of hardware and a reverse proxy server is minimal, although hardware usually offers alternative means of purging the cache.

Figure 15-9. Hardware caching solution.

Caching hardware handles page requests before they get to the Web server, so if a requested page is found in cache, the hardware fulfills the request without communication with the server. This caching technique removes server load, freeing the server to perform other tasks. To implement a hardware solution, each load balancer must point to the IP address of the device as opposed to the actual Web site. More than one caching device is typically needed for redundancy; you wouldn't want your Web site going down should you have to reboot the hardware cache.

Pages stored in a hardware cache are purged when they reach their expiration time, which is based on headers returned from the Web server. Alternatively, pages can be purged based on a maximum duration configured by the administrator of the hardware cache. Administrators also can decide which files should be cached and which should not. Hardware caching solutions are best suited for graphics and streaming media. Dynamic pages are not typically cached on hardware because there is no way for the Web server to update them when content or database changes are made. CacheFlow, Volera, and Application Server are some of the hardware vendors for this type of solution.
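The two purging rules can be sketched as a single lifetime calculation. This is an illustration only, not the logic of any particular appliance: the function name cache_lifetime and the parameter admin_max_seconds are hypothetical, only the Cache-Control max-age directive is read (the Expires header is ignored for brevity), and combining the two rules by taking the smaller value is an assumption.

```python
import re
from typing import Mapping, Optional


def cache_lifetime(headers: Mapping[str, str],
                   admin_max_seconds: Optional[int] = None) -> int:
    """Return how many seconds a response may stay cached (0 = do not cache).

    Assumes the header key is spelled exactly "Cache-Control".
    """
    cache_control = headers.get("Cache-Control", "").lower()
    if "no-store" in cache_control or "private" in cache_control:
        return 0                          # server asked not to share-cache it
    match = re.search(r"max-age=(\d+)", cache_control)
    if match is None:
        # No expiration from the server: fall back to the administrator's limit.
        return admin_max_seconds or 0
    lifetime = int(match.group(1))
    if admin_max_seconds is not None:
        # Assumption: the administrator's maximum caps the server's value.
        lifetime = min(lifetime, admin_max_seconds)
    return lifetime
```

For example, a response carrying "max-age=600" with an administrator limit of 300 seconds would be purged after 300 seconds under these assumptions.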

Dynamic Caching Solutions

Dynamic caching solutions are designed to cache database-dependent pages. Typically these are software products that are installed directly on the Web server. Unlike hardware and CDN solutions, dynamic caching solutions reduce the overall resource load on the Web server.

Two things distinguish dynamic caching solutions from hardware and CDN solutions:

When changes are made to the database, the same changes reflect in the cache.
The ability to perform partial page caching.

Partial page caching (see Figure 15-10, based on use of XCache Technologies as the dynamic caching provider) enables administrators to cache parts of pages while leaving selected elements such as stock quotes, personalized content, and rotating banners dynamic. This greatly reduces the number of server roundtrips and system resources required to fulfill requests. To execute uncached page pieces, most dynamic caching solutions must reside on the Web server, which is the primary reason hardware and CDN solutions do not perform partial page caching.
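A minimal sketch of partial page caching follows. It is not the XCache implementation; the names fragment_cache, cached_fragment, invalidate, build_catalog, and render_page are hypothetical. The expensive, database-dependent fragment is cached, the personalized greeting and the quote are rebuilt on every request, and a database change is assumed to trigger invalidate.

```python
from typing import Callable, Dict

fragment_cache: Dict[str, str] = {}
render_counts: Dict[str, int] = {}       # how often each fragment was rebuilt


def cached_fragment(name: str, build: Callable[[], str]) -> str:
    """Return a cached page fragment, building it only on a cache miss."""
    if name not in fragment_cache:
        render_counts[name] = render_counts.get(name, 0) + 1
        fragment_cache[name] = build()
    return fragment_cache[name]


def invalidate(name: str) -> None:
    """Drop a fragment, for example when the underlying database rows change."""
    fragment_cache.pop(name, None)


def build_catalog() -> str:
    # Stand-in for an expensive database query and HTML build.
    return "<ul><li>Item A</li><li>Item B</li></ul>"


def render_page(user: str, quote: float) -> str:
    catalog = cached_fragment("catalog", build_catalog)   # cached part
    greeting = f"<p>Welcome, {user}</p>"                  # stays dynamic
    ticker = f"<p>Quote: {quote:.2f}</p>"                 # stays dynamic
    return greeting + ticker + catalog
```

Because the uncached pieces are executed in the same process that holds the cache, this kind of assembly is natural on the Web server itself, which matches the point made above about where dynamic caching must reside.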

Figure 15-10. Partial page caching.


Administrators can also decide which files should be cached and which should not. XCache Technologies, Persistence Software, and Chutney Technologies are some of the dynamic caching solution providers.

Content Delivery Networks

Content Delivery Networks (CDNs) are caching servers deployed around the world to form a network (see Figure 15-11). They usually reside in co-location centers and have access to large amounts of bandwidth. CDNs provide a way to offload Web traffic and are primarily used for graphics and streaming media. Unique to CDNs is the ability to deliver content from servers that reside closer to end users. This is to say that when a graphic is requested from the network, the server sitting closest to the particular request serves the graphic.

Figure 15-11. Content Delivery Network.


CDN servers often do not have the ability to "talk" to one another. This means if a user in Singapore requests a graphic, the CDN server sitting closest to the Singapore user will cache and serve the graphic, but a New York server belonging to the same network will not know to behave in the same manner. In other words, performance improvements are not realized until a second user in Singapore requests the same graphic, and a New York user will not realize any performance improvements at all. Implication: The more CDN servers there are, the more traffic a Web site has to have before performance improvements are achieved. This scenario assumes traffic is evenly spread across the CDN network.
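The implication can be checked with a small simulation. The function simulate and the edge names are hypothetical, and the model assumes edge servers that share nothing with one another, as described above.

```python
from typing import Dict, Iterable, Set, Tuple


def simulate(requests: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (edge_hits, origin_fetches) for a stream of (edge, url) requests."""
    caches: Dict[str, Set[str]] = {}     # one independent cache per edge server
    hits = fetches = 0
    for edge, url in requests:
        cache = caches.setdefault(edge, set())
        if url in cache:
            hits += 1
        else:
            fetches += 1                 # this edge must go back to the origin once
            cache.add(url)
    return hits, fetches


# Six requests for the same graphic, all landing on one edge server.
one_edge = simulate([("singapore", "/logo.gif")] * 6)

# The same six requests spread evenly across three edge servers.
three_edges = simulate(
    [(edge, "/logo.gif") for edge in ("singapore", "new-york", "london")] * 2
)
```

With one edge server, five of the six requests are cache hits; with three edge servers, only three are, because each server has to miss once before it can help.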

CDNs know how to retrieve, cache, and serve graphics. The browser requests graphics from the CDN, which then retrieves them from the Web server. Once the graphics are cached on the CDN, they are served directly to end users without any further server communication. An important concept to remember is that the HTML returned from the Web server must contain links to the graphics cached on the CDN for all this to happen, which typically requires code revisions. There are products available that can help change the HTML in dynamic Web pages so that they reference graphics located on CDNs.
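The kind of HTML revision involved can be sketched as follows. The hostname cdn.example.net and the function point_images_at_cdn are hypothetical, and the regular expression only handles double-quoted, root-relative image paths; a production tool would more likely use a real HTML parser.

```python
import re

CDN_BASE = "http://cdn.example.net"      # hypothetical CDN hostname

# Matches <img ... src="/path/file.gif"> style tags (double quotes,
# root-relative paths, common image extensions only).
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc=")(/[^"]+\.(?:gif|jpe?g|png))(")',
    re.IGNORECASE,
)


def point_images_at_cdn(html: str, cdn_base: str = CDN_BASE) -> str:
    """Rewrite image links so browsers fetch graphics from the CDN."""
    return IMG_SRC.sub(
        lambda m: m.group(1) + cdn_base + m.group(2) + m.group(3),
        html,
    )


page = '<p><img src="/images/logo.gif" alt="logo"><a href="/quotes.jsp">Quotes</a></p>'
rewritten = point_images_at_cdn(page)
```

Only the image reference changes; the link to the dynamic page still points at the origin Web server.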

Removing images and streaming media traffic from the origin Web server can greatly reduce load, a CDN specialty. On the other hand, dynamic pages that must be generated on the Web server must be served from the Web server; once they are placed on a CDN, they become static. CDNs do not have the ability to know when Web server data changes and therefore do not have the capability to update page content. CDNs typically charge monthly service fees for their service and look for customers who have sites attracting a million plus page views per month. CDN solution providers include Akamai, Mirror Image, and SolidSpeed Networks.

Choosing the right caching solution depends on a Web site's design and the areas needing improvement. If there is a need to cache secure information, hardware caching solutions can keep all caching in house. If offloading traffic from a facility is the goal, CDN solutions can cache graphics and streaming media on the edge. If database-dependent pages are slow to deliver, dynamic caching will dramatically improve site performance.

Design of a Public Information Site for an Exchange

This section gives information on the design of an Exchange public site, which is hosted by IBM at an IBM server farm. This Web site brings in many of the design concepts for large Web sites discussed in the previous sections. This site receives over a million hits per hour during business hours (as opposed to a million hits per minute for the Olympic Web sites). Figure 15-12 shows the design schematic for this Web site.

Figure 15-12. Design of a public information Web site.

This design for a high-volume Web site uses IP Sprayers for failover and load balancing provided by IBM Network Dispatchers. WebSphere Application Server is used for the three large Web servers. Domino servers with HACMP Active/Passive failover are used to register users and keep track of their portfolios. The Domino servers are accessed by the WebSphere servers, not by end users.
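The combination of load balancing and failover can be illustrated with a generic round-robin dispatcher that skips servers marked as down. This is not the algorithm used by IBM Network Dispatcher; the class Sprayer and the server names are hypothetical, and health checking is reduced to a manual mark call.

```python
from typing import Dict, List


class Sprayer:
    """Round-robin dispatcher that skips servers marked as down."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.healthy: Dict[str, bool] = {name: True for name in servers}
        self._next = 0

    def mark(self, server: str, up: bool) -> None:
        """Record the result of a (hypothetical) health check."""
        self.healthy[server] = up

    def pick(self) -> str:
        """Return the next healthy server, trying each server at most once."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if self.healthy[server]:
                return server
        raise RuntimeError("no healthy servers available")
```

When one of three servers is marked down, requests simply alternate between the remaining two until it is marked up again.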

New Design for the Exchange Public Site

In order to provide even higher availability and disaster recovery, IBM proposed to the customer the use of multiple sites: in case of a disaster at one site, the second site would immediately be available to take all of the load. Instead of multiple sites, the customer chose to use a Content Delivery Network (CDN) as discussed earlier in this chapter. Akamai Content Distribution (www.akamai.com) was chosen. Akamai has hundreds of server sites worldwide, and users are now sent first to the closest Akamai site, which improves end-user response time and helps off-load the large Web servers. For some content the end users must still be sent on to the large Web servers. The Akamai servers constantly receive new information from the large Exchange Web servers, which is cached by the Akamai servers. The Akamai servers have features that can help prevent Denial of Service (DoS) attacks, an added benefit of using a Content Delivery Network.

A Global Web Conferencing Offering on the Internet

This IBM Web conferencing offering gives another good example of leveraging "lessons learned" from the design of IBM's high-volume Web sites for the Olympics to commercial projects. IBM's On Demand Workplace (ODW) Web Conferencing is a hosted offering designed to allow customers to securely share Web conferencing resources. ODW Web Conferencing was also discussed at the end of Chapter 13 since it involves the use of WAS clusters.

This offering also has redundancy for all components. The system is built around the Lotus Web Conferencing 3.1 product (a.k.a. Sametime 3.1) including the Enterprise Meeting Server (EMS) option. Sametime 3.1 runs on a Domino 6 platform, so this is a Web application that is primarily about Lotus collaboration, with WebSphere in a supporting role (as opposed to most Web server sites, where the collaboration role is secondary).

The core software implementing the solution functionality is based on a set of IBM software products. In particular, the solution will leverage the following IBM products:

Tivoli Monitoring
Tivoli Storage Management (TSM)
IBM Directory Server (used as LDAP directory)
WebSphere Application Server (WAS)
IBM/Lotus Web Conferencing 3.1 with the Enterprise Meeting Server (EMS) option
WebSphere Everyplace Subscription Manager (WESM)
DB2
Domino 6

When this book was written, this IBM offering was being upgraded to Lotus Web Conferencing version 6.5.1 and Domino was being upgraded to Domino 6.5.1. However, the overall design of the Web conferencing offering at an IBM server farm as given in Figure 15-13 is the same, although several software levels have been upgraded and will continue to be upgraded as new versions of software appear.

Figure 15-13. Web Conferencing Logical server architecture.

The use of SSL Accelerators for the IBM Web Conferencing offering comes from IBM's experience supporting high-volume Internet sites.

SSL Accelerator/Terminator

SSL accelerators are network appliances that off-load the CPU (Central Processing Unit)-intensive encryption and decryption associated with SSL from the servers. When a server is running SSL by itself, most of the available CPU capacity is consumed by the cryptographic functions alone, leaving few resources for actual server functions. With an SSL accelerator, the encryption functionality is off-loaded, and the server is free to handle its other work.

Although SSL is encrypted, every SSL connection has a session ID that is not encrypted, which allows both sides of the connection to know which encrypted transaction is in session. However, some versions of Internet Explorer (5.0 through 5.5) renegotiate the SSL session ID every two minutes, thus making the SSL session ID ineffective as a way to differentiate between users. SSL accelerators can solve this problem. Because they decrypt traffic before it hits the Web servers, a load balancer between the SSL accelerator and the Web server facilitates cookie-based persistence.
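Cookie-based persistence on the decrypted traffic can be sketched as follows. The backend names, the cookie name lb_server, and the function route are hypothetical; real load balancers typically encode or sign the cookie value rather than exposing a server name.

```python
from http.cookies import SimpleCookie
from itertools import cycle
from typing import Optional, Tuple

BACKENDS = ["web1", "web2", "web3"]      # hypothetical Web server names
COOKIE_NAME = "lb_server"                # hypothetical persistence cookie
_round_robin = cycle(BACKENDS)


def route(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Pick a backend for an already-decrypted request.

    Returns (backend, set_cookie), where set_cookie is a Set-Cookie value
    for new users and None for users who already carry the cookie.
    """
    if cookie_header:
        cookie = SimpleCookie()
        cookie.load(cookie_header)
        morsel = cookie.get(COOKIE_NAME)
        if morsel is not None and morsel.value in BACKENDS:
            return morsel.value, None    # returning user: same server as before
    backend = next(_round_robin)         # new user: next server in rotation
    return backend, f"{COOKIE_NAME}={backend}; Path=/"
```

The load balancer can only read the Cookie header because the SSL accelerator has already decrypted the request; a renegotiated SSL session ID does not affect the choice.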

SSL cards were once fairly popular items, but the advantages of SSL accelerators have started to win out. SSL cards address the need to off-load the work of encryption, but, because the demarcation point for the SSL traffic is the server itself, one cannot use cookie-based persistence from a load balancer. The only option for persistence is the source IP address. For all of these reasons, the use of a hardware SSL accelerator for ODW Web Conferencing is a good idea and something that has recently been accepted as a "best practices" design at the IBM commercial server farms.