History of IBM's Experience with High-Volume Sport and Event Web Sites

Much of the information in this section was provided by Paul Dantzig, manager of the High-Volume Web Serving group at IBM Research in Hawthorne, New York. Professor Dantzig has been involved with high-volume Web serving design for the IBM Sport and Event Web sites since before the 1996 Summer Olympics in Atlanta, Georgia. He has also applied the "lessons learned" from these high-volume sport Web sites to commercial sites such as www.ebay.com, where he has served as a design consultant. Material from his ACM paper is Copyright 2002 ACM, Inc., and is included here by permission (see the Paul Dantzig reference in the Bibliography for details of the paper, which is the basis for much of this section).

Architecting and designing high-volume Web sites has changed immensely since 1996, when IBM hosted the Web site for the Atlanta Olympics. These changes include the availability of inexpensive Pentium-based servers, Linux, Java applications, commodity switches, connection management, caching engines, bandwidth price reductions, content distribution services, and many others. This section describes the evolution of IBM's best practices in architecting sites that handle millions of page views per day, including the transition to multi-tiered architectures, the use of publish/subscribe software to reduce Web site hits, the migration from static to dynamic content, and techniques for caching dynamic and personalized content.

Domino played a significant role in IBM's design of the Olympic Web sites and other sports and events sites, covering the collaboration areas of Guestbooks, Feedback, Kids (mail, stories, pictures), and Digital Postcards. Domino replication was used to distribute the Domino content to each of the Sport and Event server complexes, and then again to distribute the content among the servers within each complex.

Evolution of Design for Very High-Volume Web Sites

The initial design of high-volume sites consisted of clustering a set of Web servers together and either using round-robin Domain Name Servers or routing at the TCP level. All the Web servers were fully replicated mirrors of each other (containing the same database, application software, Web serving software, etc.). Users of such systems often experienced inconsistencies in a site's content due to lags or failures in replication as different servers served each page. The cost of replicating databases on the serving nodes consumed cycles needed for serving, resulting in inconsistent response times.

To reduce these consistency and performance issues, sites began implementing two-tiered architectures. The first tier consisted of a set of systems containing just Web servers, and the second tier consisted of back-end servers such as databases, payment systems, and other legacy systems. This type of architecture provided significant relief from the problems described above. Reduced numbers of database replications improved consistency and consumed significantly fewer resources. Response time for static requests was no longer impacted by dynamic requests, which were now run on the back-end systems. Some high-volume sites grew to contain hundreds of front-end nodes. Scaling the back end usually consisted of installing bigger and bigger machines to avoid the database replication problems. It was quickly realized that the primary difficulty of running hundreds of nodes was not the hardware cost but the personnel cost to manage them.

Designing Web Sites to Support Over One Million "Hits" Per Minute

To help reduce the number of front-end Web servers at high-volume Web sites, many companies (Cisco, CacheFlow, Network Appliance, IBM, and others) entered the Web server accelerator market. These devices typically handle an order of magnitude more traffic than a single Web server running on a general-purpose operating system, and they run as reverse proxy servers with large amounts of memory and disk. In addition, specialized switches and routers came on the market to distribute connections to the Web servers or the Web server accelerators. Newer devices function both as proxies and connection routers (load balancers), using heuristics such as the number of connections, server load, and content to distribute connections across multiple servers.
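As a concrete illustration of the connection-routing heuristics mentioned above, the following sketch shows a simple least-connections selection policy in Java. It is only a sketch of the idea; the class and field names are hypothetical and are not taken from any particular vendor's device.

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    // Illustrative sketch of a least-connections routing heuristic, similar in
    // spirit to the connection routers described above; names are hypothetical.
    class BackendServer {
        final String host;
        final AtomicInteger activeConnections = new AtomicInteger(0);

        BackendServer(String host) { this.host = host; }
    }

    class LeastConnectionsRouter {
        private final List<BackendServer> servers;

        LeastConnectionsRouter(List<BackendServer> servers) { this.servers = servers; }

        // Pick the server currently handling the fewest connections.
        BackendServer route() {
            BackendServer best = servers.get(0);
            for (BackendServer s : servers) {
                if (s.activeConnections.get() < best.activeConnections.get()) {
                    best = s;
                }
            }
            best.activeConnections.incrementAndGet();
            return best;
        }

        // Called when a routed connection closes.
        void release(BackendServer s) {
            s.activeConnections.decrementAndGet();
        }
    }

A real device would combine this with server-load and content-based rules, but the core selection loop is no more complicated than this.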

In addition to specialized hardware, new software techniques were developed to improve the performance of dynamic pages. One technique that was quite radical at the time was to cache dynamic pages the first time they were created. Methods for constructing pages from individual fragments allowed expensive content to be generated once and shared among many pages. However, successful deployment of these techniques requires much stricter cache consistency. In order to maintain the consistency of the cached data, the server needed to explicitly manage the cache contents. Algorithms for keeping cached dynamic data consistent were developed.
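A minimal sketch of the fragment technique follows: an expensive piece of content is generated once, cached, shared by every page that embeds it, and explicitly invalidated by the server when its underlying data changes. This illustrates the idea only, not the code IBM used; the names and markup are invented.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;

    // Sketch of server-managed fragment caching: expensive content is built once,
    // shared by many pages, and explicitly dropped when its data changes.
    class FragmentAssembler {
        private final Map<String, String> fragmentCache = new ConcurrentHashMap<>();

        // Return the cached fragment, generating it only on the first request.
        String fragment(String fragmentId, Supplier<String> expensiveGenerator) {
            return fragmentCache.computeIfAbsent(fragmentId, id -> expensiveGenerator.get());
        }

        // Explicit, server-managed consistency: drop the fragment when its data changes.
        void invalidate(String fragmentId) {
            fragmentCache.remove(fragmentId);
        }

        // Assemble a page from shared fragments plus cheap page-specific markup.
        String buildAthletePage(String athleteName) {
            String medalTable = fragment("medals/overall",
                    () -> "<table><!-- expensive medal standings query --></table>");
            return "<html><body><h1>" + athleteName + "</h1>" + medalTable + "</body></html>";
        }
    }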

Dynamic data was realized to be nothing more than rapidly changing static data. Keeping certain types of content up-to-date required reducing expiration times to tens of seconds. Some Web sites adopted automatic refresh of pages, thus creating an enormous number of hits. The new problem was how to reduce the number of hits without sacrificing the better response time to which users had become accustomed.

New Designs with Sports Consoles, Persistent Sessions, and Java Applets

For some types of content, in particular sports scoring results and financial market data, even better response times were considered necessary. Sport and financial consoles were created as Java applets, which open a long-term connection to a "publish and subscribe" system using either private protocols or published protocols such as Java Message Service. As content changes, the publish and subscribe system pushes the changes to those users interested in them. These systems have become enormously popular, with hundreds of thousands of active consoles being supported.
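The publishing side of such a system might look like the following sketch, which uses the standard JMS API. The JNDI names, topic name, and message text are hypothetical assumptions; the production systems also used private protocols.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;
    import javax.naming.InitialContext;

    // Sketch of publishing a score change to a topic; the broker then pushes the
    // message to every subscribed console. JNDI names below are placeholders.
    public class ScorePublisher {
        public static void main(String[] args) throws Exception {
            InitialContext jndi = new InitialContext();
            ConnectionFactory factory = (ConnectionFactory) jndi.lookup("jms/ConnectionFactory");
            Topic topic = (Topic) jndi.lookup("jms/scores/match42");   // hypothetical topic name

            Connection connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(topic);

            // Publish one score update; in practice this runs whenever a result changes.
            TextMessage update = session.createTextMessage("SET 3: 6-4 3-2, serving: player1");
            producer.send(update);

            connection.close();
        }
    }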

Many high-volume Web sites require 100 percent availability. Geographically redundant sites and content distribution services (such as Akamai) place servers and proxy servers closer to the user's ISP connections. In addition to improving availability, using multiple geographic sites or content distribution services can reduce the latency for delivery of graphic and streaming content.

Role of WebSphere and Domino in IBM's High-Volume Sport Web Sites

Both WebSphere and Domino played significant roles in the design of IBM's high-volume sport Web sites. For the 2000 Olympics in Sydney, Australia, almost half of the Web serving role was handled by Domino. Collaborative, interactive features for sending mail, postcards, guestbook entries, and feedback, all based on Domino, could be used both by the athletes and by the "virtual spectators" accessing the Web site. Domino replication and the use of NotesPump provided the distribution and publishing of Domino content.

The following sections give examples of how the IBM Sport and Event Web sites were impacted by the high-volume Web site technologies discussed previously.

1996 Atlanta Summer Olympic Web Site

Like many first-of-a-kind Web projects of the early years, the 1996 Atlanta Games Web site was architected, designed, and programmed in a hurry. The project started in September of 1995. Figure 15-1 shows the basic architecture for the part of the Atlanta site that handled results on the Web.

Figure 15-1. 1996 Atlanta Summer Olympics Web site architecture.
graphics/15fig01.gif

The basic system setup was a cluster of 20 IBM RS6000s rack-mounted in two SP2 frames, plus two experimental TCP/IP router boxes for routing the incoming HTTP traffic. Each node contained identical software, including the Apache HTTP server, a Fast CGI Apache extension, a connection manager daemon, a cache manager daemon, a database, and 20 prestarted processes, preconnected to the database, that performed SQL queries if the memory cache did not contain the prebuilt page. Each sport and medals information section was allocated a separate memory cache. If a node was restarted, the caches would come up completely empty.

The first request for a page would invoke the following actions:

  1. The TCP/IP router distributed each HTTP request, based on system load, to an Apache process on one of the server nodes.

  2. Apache called the Fast CGI Apache Extension.

  3. The Fast CGI Apache Extension queried the Cache Manager with the unique cache ID (part of the URL).

  4. If the page was cached in the appropriate Sport or Medal Cache, the cache manager returned the data, which was then returned to the user.

  5. If the page was not cached, the connection manager was called to negotiate for a prestarted database process, if one was available.

  6. The prestarted process negotiated for and retrieved information from the database and generated an HTML result page. The Cache Manager was invoked to store information in the appropriate cache. Finally, the page was returned to the user.

In order to maintain cache consistency, whenever an update was made to one of the database tables related to a sport or medal, a database trigger invoked a process to delete all entries in the specific cache for the affected sport.
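The following sketch illustrates this flow in miniature: a per-sport page cache that answers hits from memory, falls back to the database on a miss, and is emptied for a whole sport when the trigger fires. It is written in Java purely for illustration; the real system was a set of cooperating daemons and prestarted database processes, and the names here are invented.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Rough sketch of the per-sport page cache and trigger-driven invalidation
    // described above; illustrative only, not the actual 1996 implementation.
    class SportPageCache {
        // One cache per sport or medal section, keyed by the cache ID from the URL.
        private final Map<String, Map<String, String>> caches = new ConcurrentHashMap<>();

        String getPage(String sport, String cacheId, PageBuilder builder) {
            Map<String, String> sportCache =
                    caches.computeIfAbsent(sport, s -> new ConcurrentHashMap<>());
            String page = sportCache.get(cacheId);
            if (page == null) {
                // Cache miss: run the expensive database queries, build the HTML,
                // and store the result so later requests hit the cache.
                page = builder.buildFromDatabase(sport, cacheId);
                sportCache.put(cacheId, page);
            }
            return page;
        }

        // Invoked when the database trigger fires for this sport: the whole
        // sport cache is emptied.
        void invalidateSport(String sport) {
            caches.remove(sport);
        }

        interface PageBuilder {
            String buildFromDatabase(String sport, String cacheId);
        }
    }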

Since the system depended on keeping 20 nodes in sync (which was difficult to do), node-to-node consistency and performance problems were both quite visible to the user community. The characteristics of sporting events made the performance problem even more obvious. Users would notice when an event ended and inundate the system with requests. This was precisely when the result arrived in the databases, causing the entire sport cache to be invalidated. All the requests then had to be resolved from the database until that data finally arrived in the cache, and there was more than an order of magnitude difference in response time between a cache hit and a database hit.

Over the 17 days of the Games, the 1996 Atlanta site received over 190 million hits, 114 million of which were result requests. Peak result-server hits reached nine million on August 1, 1996. The cache-hit ratio averaged 80 percent, and a peak of 45 cache transactions per second per node was achieved.

1998 Nagano Winter Olympic Web Site

By 1998, IBM had signed up to sponsor several major sporting and other events, including the U.S. Open Tennis Tournament, Wimbledon, the French Open Tennis Tournament, the Grammy Awards, and the Tony Awards. It was decided to invest in building a permanent infrastructure for the Sport and Event Web sites that could last several years. This new infrastructure was to be architected, designed, built, and tested before the Nagano games.

To achieve a high level of redundancy, three permanent geographic locations were chosen close to major Internet backbone sites. One East Coast and two Midwestern United States sites were constructed. In addition, a fourth temporary site for Asian traffic was established in Tokyo. Each site consisted of either three or four racks of 11 RS6000 nodes (one SP2 frame). Figure 15-2 shows the configuration of one rack of nodes.

Figure 15-2. 1998 Nagano Winter Olympics Web site architecture.
graphics/15fig02.gif

One of the criticisms of the Atlanta Web site was that finding information required too many page fetches. The Atlanta site was organized by results, news, photos, and so on, and by sports and events. Although country information and athlete biographies were provided, results corresponding to a particular country or athlete could not be collated. Consequently, the Nagano Web site organized results, news, photos, etc. by countries and athletes as well as by sports and events. The Web results displayed all of the information related to an event (the Atlanta site in many cases had only summaries). Some of the pages were now complex enough to require up to 30 seconds to produce the HTML tables (Ice Hockey results were a notable example). To reduce computational load, country and athlete pages were only generated every 15 minutes.

Shown in Figure 15-2 is the Nagano architecture, which is actually a combination of two architectures: one to pregenerate content and perform content distribution, and the other to serve Web requests. Each time a change was made to news, photos, and related content, the Domino server triggered a change. Each time a schedule, bio, or result changed, the database triggered a change. The pieces of information flowing into the system (each news item, result, etc.), once converted to Web objects (HTML, images, etc.), were now considered fragments of information and were usable anywhere in the site. A new technology, the Object Dependency Graph (ODG), was developed to keep track of the relationships between fragments and pages.
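Conceptually, the ODG records which pages embed which fragments, so that a change to one fragment rebuilds only the pages that depend on it. The sketch below shows that idea in a few lines; the class and identifiers are illustrative, not the actual ODG implementation.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Simplified sketch of the Object Dependency Graph idea: track which pages
    // embed which fragments so that a fragment change rebuilds only those pages.
    class ObjectDependencyGraph {
        // fragment ID -> pages that embed it
        private final Map<String, Set<String>> fragmentToPages = new HashMap<>();

        void addDependency(String pageId, String fragmentId) {
            fragmentToPages.computeIfAbsent(fragmentId, f -> new HashSet<>()).add(pageId);
        }

        // When a fragment changes (new result, new photo, ...), return exactly the
        // set of pages that must be regenerated and pushed to the caches.
        Set<String> pagesToRebuild(String changedFragmentId) {
            return fragmentToPages.getOrDefault(changedFragmentId, Collections.emptySet());
        }
    }

With such a graph, a new Ice Hockey result fragment, for example, maps to just the sport, country, and athlete pages that embed it, so only those pages are regenerated and redistributed.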

The Cache Manager on each node took on the role of a distributed file system with no persistence. In theory, a 100 percent cache-hit ratio could have been achieved with this type of system. Because it was a memory-based cache, infrequently requested pages, such as athlete biographies and country information pages, were frequently invalidated. Except for the country and athlete pages, the new model became proactive page replacement, in which a page is replaced in the cache and never invalidated. Using fragments and the ODG, only the pages affected by a change were regenerated, and page serving on each server node now had consistent response times. Note: When an SMP node goes down, the cache information is lost for all the nodes; to avoid performance problems, key pages in the site were primed before the nodes were put back on-line.

Based on the lessons learned from Atlanta, database replications were minimized. Each site contained no more than four databases (one for each rack). Users were directed to one of the four sites by standard router algorithms (e.g., Japanese traffic to Tokyo, British traffic to the East Coast). If one site failed or needed to be taken down for maintenance, the routers automatically shifted load to the other sites. The only data inconsistencies were between sites, and because each user was served from a single site, users were unaware that, for example, the Asian site and the East Coast site were not consistent with each other.

Nagano was an immense success. All of the goals for the site were achieved: 100 percent availability, good and consistent response times, every country and athlete treated equally, and no single points of failure. At the end of the event the publishing system contained approximately 50,000 pages in English and Japanese. The site had built 10,645 news article fragments, 14,355 photo fragments, and 5,988 result fragments, which were combined into 23,939 Sport, Country, and Athlete pages. The average number of pages rebuilt each day was 20,000, and on the peak day, 58,000 pages were rebuilt.

The Lotus Notes/Domino collaboration statistics were also impressive: 116,540 digital postcards; 12,990 kids' stories, pictures, and mail; 45,914 guestbook entries; 31,265 feedback entries; and 5,511 newsletter subscriptions.

The traffic to the site broke two Guinness records: The Most Popular Internet Event Ever Recorded, based on the officially audited figure of 634.7 million requests over the 16 days of the Games, and The Most Hits On An Internet Site In One Minute, based on the officially audited figure of 110,414 hits received in a single minute around the time of the Women's Figure Skating Free Skating (see Figure 15-3). Hits on the peak day were 55.4 million. Note, however, that Guinness records based on Web site activity are short-lived. Figure 15-8, later in the chapter, shows that this 1998 Guinness record was surpassed the next year (1999) by about a four-fold increase in hits at the Wimbledon Tennis Web site, where 430K hits per minute were recorded. Then at the 2000 Sydney Summer Olympics, an officially audited figure of 1.2 million hits a minute was recorded. This growth in hits at sport and event Web sites is also very evident at commercial Web sites, where companies are very interested in how many people are visiting their Web site and which areas of the site are most popular. How high-volume commercial Web sites have learned from the earlier high-volume Olympic Web sites is discussed in the second half of this chapter.

Figure 15-3. 1998 Nagano Winter Olympics results page.
graphics/15fig03.gif

Figure 15-8. Growth of sport and events peak hits per minute.

graphics/15fig08.gif


1999 Wimbledon Tennis Web Site

Even before Nagano, many of the large Web sites were already looking for cheaper and faster ways of serving static content. By 1997, many of them had switched from using expensive proprietary Unix systems (Sun, IBM, HP) to Intel-based systems running operating systems such as FreeBSD and Linux with Apache to reduce costs and improve performance. Also in 1997, IBM Research prototyped a combination router and proxy server cache that could support an order of magnitude more hits than a single server on the Sport and Event Web sites could handle. The product was transferred to development and made available to the Sport and Event environment after Nagano was complete.

After a number of trial runs during events in 1998, four production Web server accelerators (IBM 2216s) were put into production for the 1999 Wimbledon event. Figure 15-4 shows the Web site architecture for that event.

Figure 15-4. 1999 Wimbledon Tennis Web site.
graphics/15fig04.gif

To get high cache-hit ratios (93 percent), result-scoring content was pushed into the Web server accelerators through a proprietary interface, via the Trigger Monitor daemon, whenever the content changed. The proxy server built into each Web server accelerator requested other site content. Each Web server accelerator contained 500MB of memory, and the working set for Wimbledon was less than 200MB.

The 1999 Wimbledon event ended with the officially audited figure of 942 million requests over the 14-day event. The peak hit rate of 430K hits per minute was received on June 30, 1999. Also received were 71.2 million page views and 8.7 million visits to the site. This event proved that inexpensive Web Server Accelerator hardware could reduce the size and cost of the infrastructure even with the enormous rate of growth of these events.

Many Web Server Accelerators (including the IBM 2216) produce no Web server logs, either because the logging function does not exist, or because logging impacts the performance of the accelerator. However, most do support SNMP interfaces, which can be used to record hits, misses, memory usage, and so on. This was the process IBM used to obtain information on the performance of the accelerators.
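The sketch below shows what polling a device over SNMP can look like, assuming the open-source SNMP4J library. The variable queried here is the standard MIB-II sysUpTime object as a stand-in; the actual hit, miss, and memory counters live in the accelerator vendor's own MIB, which is not documented here, so the OID, address, and community string should all be treated as placeholders.

    import org.snmp4j.CommunityTarget;
    import org.snmp4j.PDU;
    import org.snmp4j.Snmp;
    import org.snmp4j.event.ResponseEvent;
    import org.snmp4j.mp.SnmpConstants;
    import org.snmp4j.smi.GenericAddress;
    import org.snmp4j.smi.OID;
    import org.snmp4j.smi.OctetString;
    import org.snmp4j.smi.VariableBinding;
    import org.snmp4j.transport.DefaultUdpTransportMapping;

    // Hedged sketch of SNMP polling with SNMP4J; the device address, community
    // string, and OID are placeholders, not the IBM 2216's actual MIB variables.
    public class AcceleratorStatsPoller {
        public static void main(String[] args) throws Exception {
            Snmp snmp = new Snmp(new DefaultUdpTransportMapping());
            snmp.listen();

            CommunityTarget target = new CommunityTarget();
            target.setCommunity(new OctetString("public"));                 // assumed community
            target.setAddress(GenericAddress.parse("udp:192.0.2.10/161"));  // example address
            target.setVersion(SnmpConstants.version2c);
            target.setRetries(1);
            target.setTimeout(1500);

            PDU pdu = new PDU();
            pdu.setType(PDU.GET);
            pdu.add(new VariableBinding(new OID("1.3.6.1.2.1.1.3.0")));     // sysUpTime stand-in

            ResponseEvent response = snmp.get(pdu, target);
            if (response.getResponse() != null) {
                System.out.println(response.getResponse().getVariableBindings());
            }
            snmp.close();
        }
    }

A monitoring script run on this pattern every minute or so is enough to record hits, misses, and memory usage without touching the accelerator's serving path.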

2000 Sydney Summer Olympic Web Site

Based on what had been learned from the previous events, the key to the success of the Sydney Olympic Web site would be making Web server accelerator cache misses low-cost. Two critical pieces were already prototyped: from Nagano, a system for prebuilding and distributing pages (the Trigger Monitor); from Wimbledon, a very good edge-serving infrastructure using Web server accelerators. Both of these systems needed critical improvements to handle the bigger loads anticipated in Sydney.

Sixty-five Web server accelerators were installed after the 1999 Wimbledon event. A fourth temporary server site in Sydney was planned to augment the three permanent server sites, which now comprised one East Coast, one Midwest, and one West Coast United States location. An agreement was made with a telecom provider to collocate a large number of the Web server accelerators at five major Internet hubs in the USA. The installed capacity of the accelerators was between three and four million hits per minute. Figure 15-5 shows the server and Web server accelerator locations. The East Coast would handle the European traffic, and the West Coast the Asian traffic. The origin servers were located at nearby IBM Global Services hosting locations. Experiments run during other events had demonstrated that if all of the content (HTML, GIFs, JPEGs, etc.) were given correct expirations, content would not have to be pushed to the caches; the accelerators would simply proxy the content on a miss. The cache-hit ratio typically dropped to about 75 percent, which was within the hit capacity of the server sites.
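The expiration approach amounts to stamping every object with a cache lifetime so that accelerators and ISP proxies can serve it from cache and refetch it only when it goes stale. The servlet below is a minimal sketch of that idea; the 30-second lifetime and the class name are illustrative assumptions, not the actual Sydney policy.

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Illustrative servlet that sets cache lifetimes so downstream caches can
    // answer misses by proxying once and then serving from cache. The 30-second
    // value is an assumed example.
    public class ResultsPageServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            // Shared caches (accelerators, ISP proxies) may cache this for 30 seconds.
            resp.setHeader("Cache-Control", "public, max-age=30");
            resp.setDateHeader("Expires", System.currentTimeMillis() + 30_000L);
            resp.setContentType("text/html");
            resp.getWriter().println("<html><body><!-- result table --></body></html>");
        }
    }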

Figure 15-5. 2000 Sydney Olympics origin server and Web server accelerator locations

graphics/15fig05.gif


Because of the experimental nature of the dynamic page generation system, both the Atlanta and Nagano sites were designed and implemented as two separate sites: one serving relatively static content and the other the highly dynamic results delivery system. Shortly before the games started, an ad-hoc effort was made to merge the sites and present the illusion of a single, unified site. It became quite clear during the merge phase of the Nagano site that the Sydney site would need to be constructed from the beginning as a single, unified site.

The combination of building a complete, unified site based on fragment technology, as well as improvements in the technology itself, produced an unforeseen problem: how to manage a very large number of fragments. Early estimates for the size of the Sydney content were based on simply taking the number of sports, events, countries, athletes, etc. and multiplying Nagano by three (75K pages and 150K rebuilt pages a day). The Nagano publishing system had supported no more than four fragments per page and no embedded fragments.

There was a large shift between Nagano and Sydney from many expensive replicated engines at every site to a single, large, centralized multiprocessor engine for content management and many inexpensive Web server accelerators to serve the site. Also, in Sydney, there was only a single instance of any database, finally eliminating all replications.

Figure 15-6 shows a Sydney 2000 Web site page. It illustrates the depth of the information available to the Web site users. All of this information typically was available in less than one minute after the result system triggered a new result.

Figure 15-6. 2000 Sydney Web page
graphics/15fig06.gif

More than seven terabytes of data was distributed by the content distribution system to the 57 nodes. Scoreboard updates were delivered from Sydney to the East Coast serving nodes in less than six seconds. Given the roughly 30-second delays built into most television broadcasts, users discovered that the site had scores on the Web before the broadcasters communicated them. Even the complex result updates, with hundreds of pages changed, were delivered to the serving nodes in less than 40 seconds on average.

It should be no surprise that every event record was broken again. The site set a new record for the most hits on a sporting event site in a single minute, based on an officially audited figure of 1.2M hits. Total traffic for the event over the 16 days was 11.3 billion hits. The Web server accelerators handled less than half (5.5 billion) of the hits; the ISP proxy caches and the user browsers handled the rest (due to the expiration policy). The Web server accelerators had an 80-90 percent cache-hit ratio during the entire event. Thirty percent of the log entries on the serving nodes were UC.GIF records. Other significant statistics include the peak traffic day of 875M hits, 2.8M visits, and 18M page views, and event totals of 35M official visits and 222M page views. Sydney was completely successful and fulfilled the promise of being bigger than any of the previous events.

2001 Wimbledon Tennis Web Site

Sports fans are a difficult audience to satisfy. For Atlanta, almost all of the scores were delivered after the event was over. For Nagano, many events provided periodic updates. For Sydney, all the major sports had sport consoles with 30-second updates. With Wimbledon 2001, technology finally caught up with the fans' desire for real-time scoring consoles.

With the new content distribution system, Web objects could be reliably distributed to all the geographic server sites in seconds. This enabled very short expirations of 30 seconds to be set on data for the scoreboard applets. With scoring data delivered to the most distant serving node in about six seconds, and given the data's 30-second expiration, the scoring console on the user's workstation was nearly always up-to-date. New technologies based on "publish and subscribe" were added to the event infrastructure to support a real-time scoreboard. Figure 15-7 shows a sample tennis console from Wimbledon 2001. The console is a Java applet and uses Java Message Service (JMS) as the protocol for subscribing to the message broker. Each user acquires a permanent TCP/IP socket to a message broker. The trade-off between the cost of hits to a Web server accelerator and the cost of a permanent connection to a message broker depends on the update rate.
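On the client side, a console of this kind is essentially a JMS subscriber that holds its connection open and reacts to pushed messages. The sketch below shows that pattern; the JNDI names, topic, and message format are hypothetical, and the real applet of course also rendered the scoreboard user interface.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageConsumer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;
    import javax.naming.InitialContext;

    // Sketch of the subscriber side of a real-time scoreboard: one long-lived
    // broker connection, updates delivered by push rather than polling.
    public class ScoreboardSubscriber {
        public static void main(String[] args) throws Exception {
            InitialContext jndi = new InitialContext();
            ConnectionFactory factory = (ConnectionFactory) jndi.lookup("jms/ConnectionFactory");
            Topic topic = (Topic) jndi.lookup("jms/scores/centreCourt");  // hypothetical topic

            Connection connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(topic);

            // Push model: the broker delivers each update as it is published.
            consumer.setMessageListener(message -> {
                try {
                    System.out.println("Score update: " + ((TextMessage) message).getText());
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });

            connection.start();              // begin receiving pushed messages
            Thread.currentThread().join();   // keep the console running
        }
    }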

Figure 15-7. Wimbledon 2001 tennis console "real-time scoreboard."

graphics/15fig07.gif


With current technology, for update rates of less than 30 seconds messaging tends to be cheaper (although research into the exact "break-even" point is needed). For the 2001 Wimbledon final between Rafter and Ivanisevic, over 230K consoles were active. Two other sporting event records were broken during this event, with peak hits per minute at 1.4M and peak page views in a day at 24M.
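As a rough back-of-envelope comparison, and using assumed numbers, the following snippet contrasts the hit load generated by polling at the update interval with the socket count required by publish/subscribe.

    // Back-of-envelope sketch of the trade-off described above, under assumed
    // numbers (230K active consoles, one update every 30 seconds): polling costs
    // hits per minute, while publish/subscribe costs one open socket per console.
    public class ConsoleCostEstimate {
        public static void main(String[] args) {
            int activeConsoles = 230_000;        // roughly the 2001 final's peak
            int updateIntervalSeconds = 30;      // assumed refresh/update rate

            long pollingHitsPerMinute = (long) activeConsoles * (60 / updateIntervalSeconds);
            long openSockets = activeConsoles;   // pub/sub: one persistent connection each

            System.out.println("Polling hits per minute: " + pollingHitsPerMinute);  // 460,000
            System.out.println("Persistent sockets held:  " + openSockets);          // 230,000
        }
    }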

Current Design for IBM Sport and Events Web Sites

Based on "lessons learned," all of the hardware and telecommunication equipment for the IBM Sport and Event system has been replaced in the last few years. The servers are now IBM Pentium-based systems running Linux, and the Web server accelerators have been replaced with commercially available equipment. Multiple vendors provide telecommunication bandwidth directly (OC-12s) into each of the three geographic sites. Figure 15-8 shows the huge growth the sites have experienced. As the Web audience has increased worldwide, the sites have continued to receive more hits, page views, visits, etc., year after year. Notice that the 1998 Nagano Guinness record of 110,414 hits in a single minute was quickly eclipsed: the 2000 Sydney Summer Olympics, at 1.2 million hits per minute, was more than 10 times that Nagano mark.


