7.4 Optimizing applications

7.4.1 Use of multicasts

Many legacy network applications still rely on unicast or broadcast distribution techniques to push data out to users. For example, a market data feed may be distributed to each subscriber via individual TCP sessions. Applications that distribute essentially the same information to many users would be much better served by multicasting. If the application is developed in-house, it may be well worth reexamining the application architecture to see if this is possible. Several leading application providers in the financial arena have already implemented, or are considering the use of, multicasts. Clearly, if the application is multicast based and is to be used over an internetwork, a suitable distribution mechanism needs to be in place. This requires group membership signaling between hosts and routers (IGMP) and a multicast routing infrastructure based on protocols such as MOSPF, PIM, or DVMRP.

7.4.2 Wide area tuning issues

In wide area networks there are a number of points to consider when optimizing applications, including the following:

Sun's Network File System (NFS) runs over UDP and has a default retransmission timeout of 700 ms. A full datagram is 8,192 bytes (six frames). Therefore, if NFS sends a complete datagram over a 64-Kbps leased line, transmission alone takes just over a second (8,192 × 8 bits ÷ 64,000 bps ≈ 1.02 s), so the timer expires before the datagram has even been delivered. The timer must be increased to approximately two seconds, or the datagram size in the NFS end systems must be decreased to 3,000 bytes or less. The latest versions of NFS adjust timeout and message size dynamically to suit the WAN delay characteristics.
Satellite links typically have a Round-Trip Time (RTT) of 500 ms or more. A large message size gives the best efficiency but increases the probability that a message is corrupted by an error. Lowering the message size reduces that probability but loses efficiency due to encapsulation overheads.
Bridges cannot modify the message size, since this would require fragmentation. Routers can, however, so they can adjust the MTU to improve throughput according to the WAN delay characteristics. In a situation where Token Ring is being bridged to Ethernet over the wide area, you can either set the MTU for all Token Ring interfaces to 1,500 bytes or use routers. Source Route (SR) bridges report the MTU size available for the path, and therefore how much of it can actually be used.
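
The NFS timing problem above comes down to simple serialization arithmetic. The following is a minimal sketch of that calculation; it counts payload bits only and ignores UDP/IP and link-layer headers, propagation delay, and server processing time, all of which would make the real figures somewhat worse. The function and constant names are illustrative, not part of NFS.

```python
def serialization_delay(payload_bytes: int, link_bps: int) -> float:
    """Seconds needed to clock payload_bytes onto a link running at link_bps.

    Assumption: header overhead and propagation delay are ignored.
    """
    return payload_bytes * 8 / link_bps


LINK_BPS = 64_000       # 64-Kbps leased line, as in the text
NFS_TIMEOUT_S = 0.7     # default NFS retransmission timeout, as in the text

full_datagram = serialization_delay(8192, LINK_BPS)   # full NFS datagram
reduced = serialization_delay(3000, LINK_BPS)         # reduced datagram size

for label, delay in (("8,192 bytes", full_datagram), ("3,000 bytes", reduced)):
    verdict = "exceeds" if delay > NFS_TIMEOUT_S else "fits within"
    print(f"{label}: {delay:.3f} s, {verdict} the {NFS_TIMEOUT_S} s timeout")
```

The full datagram needs longer than the default timeout just to leave the sending interface, whereas the reduced size leaves headroom for the reply.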

7.4.3 Proxy services

A router or switch can act as a proxy for services that are not available locally. In a sense they are acting as surrogates, pretending to originate a service that is actually hosted elsewhere, and for this reason this functionality is often referred to as spoofing. Proxy services also improve application performance by minimizing latency and preserving valuable wide area bandwidth. Typical examples of proxy services include the following:

Service announcements—A router or switch may have to pretend that it is a server if resource-discovery frames are issued on a remote LAN and there is no server locally. Examples include the following:
- NetWare NCP servers broadcast Service Advertisement Protocol (SAP) messages regularly. In a remote dial-up environment, where there is no local server, a router may spoof SAP broadcast announcements and respond to Get-Nearest-Server requests from clients (i.e., act as a SAP proxy).
- Banyan VINES networks may require a router to respond to local VINES clients requesting network addresses.
Address resolution services—For example, there are circumstances where a router must support Proxy ARP to resolve IP addresses to MAC addresses on behalf of end systems that do not support subnetting.
NetBIOS name caching—A router can maintain a cache of mappings between NetBIOS server and client names and their MAC addresses. This avoids the overhead of transmitting broadcasts between NetBIOS clients and servers in a Source Route Bridging (SRB) environment. As a result, requests sent by clients to find servers (and by servers in reply to their clients) can be sent directly to their destinations, rather than being broadcast across the entire bridged network. The router also notices when any host sends a series of duplicate query frames and limits retransmission to one frame per period (the time period is typically configurable).
Routing announcements—Proxy services are commonly employed where routing protocols are used over dial-up or low-bandwidth WAN links. For example:
- NetWare routers broadcast Routing Information Protocol (RIP) messages regularly.
- IP Routing Information Protocol (RIP) announcements need to be constrained, either by spoofing them locally or by using low-demand services.
- OSPF LSAs need to be constrained by running in low-demand mode.

For remote dial applications where there is a single satellite site (i.e., a stub network), it is often easier to implement static routes than use a low-demand dynamic routing protocol.

Keep-alive spoofing—Spoofing is commonly employed where LAN operating systems are used over dial-up or low-bandwidth WAN links. For example:
- Novell NetWare has a number of features that require spoofing. For example, NetWare Core Protocol (NCP) servers may send keep-alive messages to all connected clients every five minutes to establish their status. NetWare IPX and SPX Watchdog packets must also be spoofed. Finally, NetWare issues regular serialization queries to check that installations are properly licensed; these also must be spoofed.
- Data link acknowledgments (e.g., LLC2)—The majority of enterprises use the connectionless form of link layer (LLC Type 1), which is more an encapsulation format than an actual protocol. Some enterprises, however, (especially those dominated by IBM) use connection-oriented SDLC and LLC Type 2. These protocols create and maintain session states and are not supported natively over the Internet. If SDLC or LLC2 is tunneled end to end over IP, this will most likely result in timeouts, since the full round-trip time could be orders of magnitude higher than normally expected. To avoid this problem some routers enable data link sessions to be spoofed locally; both the local and remote nodes have data link sessions with their local routers and not with each other (no Layer 2 traffic is sent over the Internet). This preserves backbone bandwidth and ensures sessions stay active. This feature is often called local ACK.
- Applications that send regular TCP ACKs as keep-alives must be spoofed at routers or switches over dial-up links. In practice this is hard to achieve, since both ends of the link must keep state and be prepared to resynchronize once the link is active again.

As indicated, proxy services are commonly used on dial-up links, since it would be ill advised to let regular announcements and keep-alives cause the link to stay up permanently (or cause regular dial-ups) and since we are likely to be charged for both call setup and uptime (e.g., ISDN). Proxy services are also a good general technique for conserving backbone bandwidth.
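
The NetBIOS name caching behavior described earlier in this section (answering from a local name table and limiting duplicate queries to one per period) can be sketched as follows. This is an illustrative model only, not any vendor's implementation: the class name, method names, and the one-second default period are assumptions, and real frames are reduced to plain strings.

```python
class NameCache:
    """Toy model of a router's NetBIOS name cache with duplicate suppression."""

    def __init__(self, dedup_period_s: float = 1.0):
        self.mac_by_name = {}        # NetBIOS name -> MAC address
        self.last_forwarded = {}     # (source host, name) -> time last broadcast
        self.dedup_period_s = dedup_period_s

    def learn(self, name: str, mac: str) -> None:
        """Record a name-to-MAC mapping seen in passing traffic."""
        self.mac_by_name[name] = mac

    def handle_query(self, src_host: str, name: str, now: float):
        """Decide what to do with a name query arriving at time `now`.

        Returns ("direct", mac) when the name is cached, ("drop", None) when
        the same host repeats a query within the period, and
        ("broadcast", None) otherwise.
        """
        mac = self.mac_by_name.get(name)
        if mac is not None:
            return ("direct", mac)
        key = (src_host, name)
        last = self.last_forwarded.get(key)
        if last is not None and now - last < self.dedup_period_s:
            return ("drop", None)
        self.last_forwarded[key] = now
        return ("broadcast", None)


cache = NameCache(dedup_period_s=1.0)
first = cache.handle_query("client1", "FILESRV", now=0.0)
repeat = cache.handle_query("client1", "FILESRV", now=0.5)
later = cache.handle_query("client1", "FILESRV", now=1.5)
cache.learn("FILESRV", "00:00:5e:00:53:01")
known = cache.handle_query("client1", "FILESRV", now=2.0)
```

Only the first query in each period crosses the bridged network; once the mapping is learned, no broadcast is needed at all.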

7.4.4 Caching techniques

On any large internetwork where data sources are distributed, one of the prime areas for optimization is concerned with the way in which content is moved around. If we analyze the movement of content on a public backbone such as the Internet, we begin to see consistent patterns emerging. A significant proportion of that content may be moved again and again, often over large distances from the origin server to the recipient. This leads to two problems, as follows:

Latency—There is a noticeable delay between the recipient requesting content and that content being delivered.
Congestion—Moving the same content over the backbone many times wastes valuable bandwidth.

Both these issues are intimately linked. If congestion increases, this contributes to additional latency. If latency is excessive, sessions may time out, leading to retransmissions and further contributing to congestion. In order to counter these delays and preserve valuable backbone capacity, significant research activity has taken place in recent years leading to the implementation of caching systems.

Caching basics

Caching systems monitor content utilization statistically and store copies of frequently used data closer to the users, so that future requests can be serviced locally. Caching is a generic technique for improving performance and has been applied almost universally at the network system and microcomponent level. As with compression techniques, caching relies on repeatability for its efficiency. If on a particular network content is moved only once, then caching adds no value.

In practice, placing caching systems at strategic points on an internetwork brings several major benefits: Average latency is reduced, effective backbone capacity is increased, and resource utilization of origin servers is diminished. Overall, caching systems can significantly improve scalability and prove very cost effective (wide area circuit costs make up a substantial part of the IT budget, so reducing backbone utilization means that either downsizing or increased productivity is possible). Whenever a user requests information, the request is redirected to a nearby cache server. The server checks whether a copy of the content is held in cache. If found, the server returns the content to the requester. If not found, the server retrieves the original data from the origin server, delivers the content to the requester, and possibly caches this new content.
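
The request flow just described (check the cache, return on a hit, otherwise fetch from the origin and possibly store the result) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the class and parameter names are hypothetical, every miss is cached, and least-recently-used eviction is used only because it is simple, not because the text prescribes it.

```python
from collections import OrderedDict


class CacheServer:
    """Toy cache server: serve hits locally, fetch misses from the origin."""

    def __init__(self, fetch_from_origin, capacity: int = 100):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> content
        self.capacity = capacity                    # max number of objects held
        self.store = OrderedDict()                  # url -> content, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, url: str):
        if url in self.store:
            self.hits += 1
            self.store.move_to_end(url)             # mark as recently used
            return self.store[url]
        self.misses += 1
        content = self.fetch_from_origin(url)       # wide area round trip
        self.store[url] = content                   # assumption: cache every miss
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)          # evict least recently used
        return content


origin_requests = []


def origin(url: str) -> str:
    """Stand-in for the origin server; records each request it receives."""
    origin_requests.append(url)
    return f"content of {url}"


server = CacheServer(origin, capacity=2)
server.get("/index.html")   # miss: fetched from the origin
server.get("/index.html")   # hit: served locally
```

The second request never reaches the origin, which is where both the latency and the bandwidth savings come from.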

Caching system design issues

Designing a highly efficient caching system is not easy. Some of the key problems that cache systems have to address are as follows:

Optimizing content—Since caching systems have finite resources, they operate by analyzing user requests statistically. In principle, the more the same content is requested, the higher the probability that the content will be cached. When users request content, if that content resides in cache, we term this a cache hit; otherwise, it is termed a cache miss. The more requests can be serviced from cache, the higher the hit rate. The decision to cache a requested object needs to be made quickly. To optimize resources the server needs to employ sophisticated statistical techniques to analyze the most frequently accessed content, together with actual content size. This must be implemented dynamically, so that changes in use patterns at different times of the day or different days of the week are reflected in the cache population.
Synchronization—Cached content may become out of date and must be periodically refreshed. Most cache managers include synchronization mechanisms to regularly query the origin server to ensure that data held locally are current. Ideally this should be asynchronous (i.e., done in background mode). If a large proportion of the content is changing rapidly, caching systems either present inaccurate data (if synchronization intervals are too long) or become less efficient (since the caching system is spending much of its time pulling down updates). Note also that Web browsers often implement local caching at the desktop device; again, if this content is changing, the user must remember to manually refresh these data.
Capacity and responsiveness—System performance is determined largely by the architecture and the implementation of the cache server, such as the operating system, whether the server uses multiple threads, queuing strategies, load balancing, and so on. If only a modest amount of cache resources (CPU, RAM, disk) are available and content is changing frequently, the hit rate will be relatively low. In busy systems it is important to ensure that cache retrieval latency and cache size restrictions do not become a bottleneck. The ability to respond to user requests in a timely manner is determined by a number of techniques used to maximize cache hit rate, including the structure of cache hierarchies and content optimization.
Reliability and availability—Since caching is essentially a transparent enhancement to a normal network operation, it is usually possible to continue operations if a cache server fails (albeit with degraded performance). Having said that, loss of caching systems may violate user QoS guarantees. It is, therefore, highly beneficial to design in some level of fault tolerance or high availability (perhaps through clustering techniques) to minimize service disruption.
Management and control—As with any sophisticated network-wide resource, it is important to be able to easily install, configure, and maintain cache servers. Even though cache clusters may be distributed over large distances, they need to be centrally managed, with the ability to download consistent policy. Management should include event logging and statistical reporting capabilities, together with tight security mechanisms.
Scalability—Since cache server resources are finite, they must be monitored regularly. If too many users are directed to a particular cache server, performance can degrade, causing more able users to circumvent the cache. If caching systems become overloaded, more cache servers need to be installed to share the load. It is desirable for large caching systems to scale performance close to linearly, O(n). In the general case, where a significant number of the remote objects are being requested by many users, the real problem is how to design a cache server architecture that scales to keep pace with demand. The size of disk arrays and disk seek times may not create a bottleneck, but caching software must manage large amounts of data concurrently (both for writing new content to be cached, and reading cached content). The physical organization and access of data on disk is a potential issue, since general-purpose file layouts and operating system file IO functions may not be sufficiently optimized for specialized applications. Very large caches are effective but also the most difficult to implement, since they typically require some form of clustering or load sharing to scale beyond the limitations of a single system.
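
To make the synchronization issue in the list above concrete, the sketch below separates the request path from a background refresh pass, so that user requests are never delayed by freshness checks. It is a simplified model: the class name, the `max_age_s` parameter, and the idea of calling `refresh_stale` from a periodic timer are assumptions for illustration, and a real cache would normally revalidate with the origin rather than refetch whole objects.

```python
class SyncedCache:
    """Toy cache whose entries are refreshed by a separate background pass."""

    def __init__(self, fetch_from_origin, max_age_s: float):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> content
        self.max_age_s = max_age_s                  # age at which a copy is stale
        self.entries = {}                           # url -> (content, fetched_at)

    def get(self, url: str, now: float):
        """Serve from cache if present; only a miss contacts the origin."""
        entry = self.entries.get(url)
        if entry is None:
            content = self.fetch_from_origin(url)
            self.entries[url] = (content, now)
            return content
        return entry[0]

    def refresh_stale(self, now: float) -> int:
        """Background pass: refetch every entry older than max_age_s.

        Returns the number of entries refreshed.
        """
        refreshed = 0
        for url, (_, fetched_at) in list(self.entries.items()):
            if now - fetched_at >= self.max_age_s:
                self.entries[url] = (self.fetch_from_origin(url), now)
                refreshed += 1
        return refreshed


origin_content = {"/prices": "v1"}
sync_cache = SyncedCache(lambda url: origin_content[url], max_age_s=10)

before = sync_cache.get("/prices", now=0)          # fetched from origin
origin_content["/prices"] = "v2"                   # content changes at origin
stale = sync_cache.get("/prices", now=5)           # still the old copy
refreshed_count = sync_cache.refresh_stale(now=10) # background refresh runs
after = sync_cache.get("/prices", now=11)          # now the updated copy
```

The trade-off named in the text is visible here: a longer `max_age_s` means users may see the old copy for longer, while a shorter one makes the refresh pass fetch from the origin more often.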

To provide scalability, some caching systems are hierarchical. There are two types of hierarchical caches: sibling and parent-child. In the sibling model, a cache server that does not have the requested content sends a request to all other servers in the group for that content. The parent-child model is a vertical hierarchy; the child cache server only asks its parent for a resource. Both designs are supported by the Internet Cache Protocol (ICP) version 2, a lightweight query-response protocol used for communicating between Web caches (specified in [31, 32]). For most organizations using multiple cache servers the sibling model is probably adequate. But if an organization's ISP also uses a cache server, a parent-child design can significantly reduce traffic on ISP backbones.
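
The two hierarchy models can be illustrated with a small in-memory simulation. This sketch does not implement ICP; a direct method call stands in for the query and reply messages, and combining sibling and parent lookups in one class is a simplification chosen for brevity. All names are hypothetical.

```python
class Cache:
    """Toy cache node that can have siblings and, optionally, a parent."""

    def __init__(self, name: str, parent=None):
        self.name = name
        self.parent = parent      # parent-child model: next level up
        self.siblings = []        # sibling model: peers in the same group
        self.store = {}           # url -> content

    def has(self, url: str) -> bool:
        """Stands in for answering a sibling's query message."""
        return url in self.store

    def lookup(self, url: str, fetch_from_origin):
        """Return (content, source), where source names who supplied it."""
        if url in self.store:
            return self.store[url], self.name
        for sibling in self.siblings:            # ask every peer in the group
            if sibling.has(url):
                content = sibling.store[url]
                self.store[url] = content
                return content, sibling.name
        if self.parent is not None:              # a child asks only its parent
            content, source = self.parent.lookup(url, fetch_from_origin)
        else:
            content, source = fetch_from_origin(url), "origin"
        self.store[url] = content
        return content, source


def fetch(url: str) -> str:
    return f"data:{url}"


parent = Cache("parent")
a, b, c = Cache("a", parent), Cache("b", parent), Cache("c", parent)
a.siblings, b.siblings = [b], [a]                # c has no siblings

first_source = a.lookup("/x", fetch)[1]    # nobody has it yet
second_source = b.lookup("/x", fetch)[1]   # sibling a already holds it
third_source = c.lookup("/x", fetch)[1]    # falls through to the parent
```

Only the first request travels all the way to the origin; later requests are satisfied by a sibling or by the parent.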

Applications that benefit from caching

In an internetwork environment network caching benefits a number of content delivery services. In general the most suitable services are those where fairly static content is requested and delivered frequently and repeatedly (e.g., documents, Web pages, video clips, certificates, etc.). Promising applications include the following:

HyperText Transfer Protocol (HTTP)—HTTP content is perhaps the most obvious example of content that is frequently and repeatedly accessed. The increased use of embedded objects in Web content means that file sizes are also increasing.
File Transfer Protocol (FTP)—FTP servers are commonly used to store boot images, documents, configuration data, driver updates, and so on. Since the average file size transferred by FTP tends to be larger than a typical HTML file, FTP users can benefit even more from local caching.
Network News Transport Protocol (NNTP)—Frequent news updates are delivered to large numbers of subscribers [33]. Usenet news is one of the biggest sources of inbound Internet traffic for ISPs. Many organizations and smaller ISPs no longer maintain full Usenet feeds for their networks. Instead, they use a larger ISP's upstream news server but only pull down selected articles of interest to end users.
Real-Time Streaming Protocol (RTSP)—Internet video and audio applications transfer literally hundreds of megabytes to registered users.
Public Key Infrastructure (PKI) services—Certificate Revocation Lists (CRLs) may be cached locally to reduce latency. This is especially beneficial for transaction-oriented applications where latency is critical.

It is possible to use a cache with dynamic Web content, since even these pages tend to contain a large proportion of static content. With Web caching, 40 percent or more of browser requests can typically be offloaded from the network and serviced locally. To illustrate the point let us consider what happens when a user requests Web and FTP content without caching, as illustrated in Figure 7.13. If we assume several thousand LAN users, every new user requiring the same content causes both HTTP requests and the content to repeatedly traverse the network. All of these users are subject to the end-to-end latency of hauling megabytes of data over the network, and the network is effectively wasting bandwidth sending the same content repeatedly (and often concurrently) to the same destination. In Figure 7.13, the content requested could, for example, be a static HTML page (with links to other files and embedded graphics) or a dynamically created page (created by a search engine, database query, or Web application). Dynamically created pages typically have some static components. FTP requests are dealt with in a similar manner.

Figure 7.13: Web browsing and FTP file requests without caching. (1) The Web browser issues an HTTP request for a Uniform Resource Locator (URL), which refers to a specific Web page on a particular server (also known as an HTTP server or origin server) attached to the Internet. (2) The request is forwarded to the server through standard IP routing. (3) The HTTP Content Server returns content to the Web browser one file at a time (which typically comprises a sequence of large packets).

There is clearly room for improvement. We can reduce latency and also preserve valuable backbone bandwidth by distributing frequently used content closer to the users. As illustrated in Figure 7.14, content may be cached inside the perimeter firewall at the user site, at the local ISP, or at some other Network Access Point (NAP) or Point of Presence (PoP) closer to the user site. For an ISP, traffic reduction could provide substantial savings in circuit utilization and represent additional capacity (up to one-third of an ISP's operating costs may be attributed to recurring circuit costs). For a user organization, placing cache inside the perimeter firewall may provide substantial performance benefits if all content must be scrutinized by the firewall. For example, if all incoming content must be checked for viruses (notoriously slow), cached content will already have been decontaminated and is, therefore, instantly available. As can be seen in Figure 7.14, cache servers could be placed behind the user's firewall, on the local LAN, or even at the ISP PoP. In this case only requests for new content (infrequently accessed content or content that has aged out) or content synchronization updates are sent over the wide area. Local cache servers service all requests for frequently accessed content, reducing both WAN bandwidth and transaction latency.

Figure 7.14: Web and FTP transactions with caching.

Deploying cache in an internetwork

How to deploy cache servers

There are several approaches to implementing cache server architectures. The model used depends on several factors, including where the cache is implemented, the primary purpose of the cache, and the nature of the traffic. Caches can be deployed with varying degrees of transparency, as follows:

Nontransparent—In order to make use of cache servers, network administrators must reconfigure Web browsers and FTP client software on all end stations to point at the cache instead of the Internet at large. In this case the cache acts as a proxy agent. Nontransparent caches are often components of larger proxy servers acting as part of an overall gateway or firewall solution.
Semitransparent—A semitransparent cache can be implemented by mass dynamic configuration of browsers. Both Microsoft and Netscape provide tools that enable this to be achieved, and browser plug-ins are available to provide automatic configuration. This approach is better than hard configuration but does require some maintenance effort on an ongoing basis.
Transparent—A transparent cache is invisible to browsers and other systems. It operates by listening to network traffic and intercepting and responding as appropriate. Some router and application-aware switching products enable you to redirect all outbound content queries (e.g., HTTP requests) to one or more local cache servers. This approach is completely transparent and requires no end-system modifications; changes to the cache server configuration need only be reflected at the router or switch, and attempts to circumvent the local cache are blocked. It may also be possible to use intelligent load sharing using algorithms on the router or switch to distribute load between cache clusters and improve availability.

For service providers or large internetworks a transparent or semitransparent configuration is preferable, since it minimizes maintenance and support overheads and avoids service disruption if caching components fail. Users automatically receive the benefits of caching without any knowledge of its activities, and caching server designs can be modified without having to reconfigure client machines manually.

Where to deploy cache servers

The most promising locations for caching systems are points where the network aggregates large volumes of traffic (choke points such as PoPs, edge routers, etc.), points where many sessions need to be examined (e.g., perimeter firewalls), and points where there is a significant financial or performance cost associated with moving content upstream (e.g., the LAN-WAN interface, satellite links, etc.). These locations generally present the cache server with a high visibility of network traffic, enabling the cache to deal with more requests and store more content locally while improving performance and conserving upstream bandwidth.

A number of common cache locations are illustrated in Figure 7.15 and are discussed in the following list:

Deploying cache in the default router—See Figure 7.15(1). The cache server can be configured as a default router for all Internet traffic. This approach is highly transparent and requires no browser configuration but does require the cache server to operate as a full router, making it a mission-critical component of the network. Some high-end routers support policy-based routing, which enables them to get around this issue by operating as a default router and forwarding only HTTP (TCP Port 80) traffic to the cache server. In the event of cache server failure, policy routing can be configured to bypass the server, forwarding all HTTP traffic as normal.
Deploying cache in Layer 4 switches—See Figure 7.15(2). Layer 4 switches are increasingly being used to interface with cache servers. These are typically high-performance switching devices that can differentiate traffic flows in hardware and make rapid forwarding decisions based on protocols above the IP level (such as UDP or TCP port information). Some of these devices parse into the application space to identify individual HTTP sessions or URL requests. Based on these data, these switches can direct HTTP, FTP, and other application traffic to the caches and forward other traffic as standard.
Clustering with Web Cache Control Protocol (WCCP)—See Figure 7.15(3). WCCP is a protocol developed by Cisco Systems [34]. It enables Web caches to be transparently deployed using Cisco IOS-based routers. WCCPv1 runs between a router functioning as a Web redirector and a cluster of out-of-path proxies. The protocol enables one or more proxies to register with a single router to receive redirected Web traffic. It also allows one of the proxies, the designated proxy, to dictate to the router how redirected Web traffic should be distributed across the cluster. WCCP does not require any changes to the network architecture; HTTP traffic is redirected transparently to the Web cache instead of the origin server.
Web Hosting/Reverse Proxy Configuration—See Figure 7.15(4). By using a cache server configured in a reverse proxy configuration, frequently accessed content on the hosted Web sites can be cached, providing improved performance while reducing the ISP's equipment costs. Here the caching server sits in front of one or more Web servers, intercepting traffic to those servers and acting as a proxy for those servers. This also enables traffic spikes to be handled without affecting overall performance. Distributed cache servers in this mode also reduce bandwidth requirements by providing a lower-cost method of content replication compared with replicated Web servers. The proxy servers will request dynamic and other short-lived content from the origin servers as and when required. This enables content from the site to be served from a local cache instead of from the origin server. Benefits include the ability to enable load balancing, provide peak-demand insurance to assure availability, and provide dynamic mirroring of content for high availability.
NAPs, exchanges, and PoPs—See Figure 7.15(5) and (6). The major distribution points on the Internet are NAPs, exchanges, and PoPs (and so-called SuperPoPs). These are ideal places for implementing caching solutions, because they are aggregation points for large volumes of traffic. By caching frequently accessed content, service providers can free up additional bandwidth by eliminating redundant data transfers. Locating a cache at these aggregation points optimizes upstream link utilization and provides improved service to subscribers. As requests for content are made, the PoP-resident cache stores the content locally, making it readily available to subsequent users. The economics of locating caches at NAPs are likely to continue to improve as ISPs employ more sophisticated traffic-engineering measures and optimize their pricing structures.
Satellite interfaces—Satellite links are expensive and notoriously slow (with latencies often exceeding 500 ms). Locating caching systems at the user side of a satellite link makes good sense, since it preserves bandwidth over the link and significantly improves latency.
International gateways—Wide area circuit tariffs vary widely around the world [1]. This makes it important for ISPs to optimize the use of expensive Internet uplink connections at gateway points between the provider's own network infrastructure and access points for international traffic (often high-volume Web content from the United States to other parts of the world). By caching upstream content, providers can significantly reduce costs and bandwidth utilization.
Access network concentrators—High-speed access links to customer premises and businesses are increasingly being provided through broadband cable and xDSL services [1]. Providers offering these services need to compensate for the relatively slow performance of the Internet in order to improve access to content for their customers, and caching locally in the access network is a good solution.
Perimeter gateways—Caching at or behind a perimeter gateway reduces upstream bandwidth requirements and provides improved service to users. If the gateway is a firewall, the performance gain could be significant, since these devices often have to perform extensive content analysis before forwarding to users (e.g., virus checking). Since the content is cached behind the firewall, this process is no longer required whenever the user requests content that is cached.
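
The redirection decision made by a policy-routing router or a Layer 4 switch in the list above reduces to a simple rule: send cacheable traffic to the cache while it is healthy, and forward everything else (or everything, if the cache has failed) as normal. The sketch below models only that rule. The function name, the return labels, and the idea of passing cache health in as a flag are assumptions; real devices express this in their own configuration languages.

```python
HTTP_PORT = 80  # TCP port used for HTTP redirection in the text


def next_hop(dst_port: int, cache_alive: bool,
             redirect_ports=frozenset({HTTP_PORT})) -> str:
    """Return "cache" or "internet" for a flow with the given destination port.

    Traffic is redirected only when its port is in redirect_ports and the
    cache server is reachable; otherwise it is forwarded as normal.
    """
    if dst_port in redirect_ports and cache_alive:
        return "cache"
    return "internet"


web_ok = next_hop(80, cache_alive=True)       # redirected to the cache
web_failed = next_hop(80, cache_alive=False)  # bypasses the failed cache
mail = next_hop(25, cache_alive=True)         # never redirected

# A Layer 4 switch might redirect several protocols (HTTP, FTP control, NNTP).
layer4_ports = frozenset({80, 21, 119})
news = next_hop(119, cache_alive=True, redirect_ports=layer4_ports)
```

Keeping the bypass in the forwarding rule, rather than in the cache itself, is what allows service to continue when the cache server fails.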

Figure 7.15: Cache deployment options. Note that cache locations are highlighted with a C. (1) Cache server acting as a default gateway. (2) Layer 4 switches can route requests for cacheable data (HTTP, NNTP, etc.) to the cache server while forwarding all other requests to the Internet. (3) Web Cache Control Protocol (WCCP) implemented in a Cisco IOS-based router. Port 80 (HTTP) requests are routed to the cache servers, while other requests are routed to the Internet. (4) In front of a Web farm to reduce load on content servers. (5) At an ISP Point of Presence (PoP) to serve requests locally. (6) At an aggregation point at the edge of the Internet to reduce bandwidth requirements.

One of the problems in deploying a large distributed cache system is how to locate caches transparently. Reference [35] describes HyperText Caching Protocol (HTCP), an experimental protocol for discovering HTTP caches and cached data, managing sets of HTTP caches, and monitoring cache activity. The IETF is also working on a protocol called Web Proxy Autodiscovery Protocol (WPAD). WPAD is also designed to automatically locate caches and services on the network without requiring user configuration.

Capacity planning

Determining how large a cache to maintain is a trade-off between the cache hit rate and the cost of configuring disk storage. Cache capacity can range from single cache servers to clusters of servers to distributed cache hierarchies. The more content that is cached, the higher the hit rates will be for retrieving content but the higher the cost of maintaining the cache will be. Although more caches are better, each installation has to be evaluated on its own merits. Undersizing the cache will fail to provide the desired benefits, but oversizing the cache can waste resources. In general, if a scalable caching solution is used, it is best to start with a conservative amount of capacity and have a plan to expand capacity incrementally to meet demand.

Benchmarks have shown that even modest cache hit rates (of approximately 30 to 40 percent in a moderately loaded Network Access Point [NAP]) can reduce the amount of outbound bandwidth that has to be configured by the equivalent of several T3 lines. This means that a provider can deliver more content to customers with less outlay for bandwidth, effectively providing more bandwidth per dollar. This allows service providers to better manage their network build-out, which is critical, since additional bandwidth may not always be available.

To get a sense of the magnitude of the possible economic benefits to network access providers, assume that one T3 line carries 45 Mbps of data and 40 percent of that is HTTP traffic. A conservative cache hit rate of 40 percent would take 40 percent of Web traffic off the backbone. Caching additional protocols, such as NNTP and FTP, would result in even greater savings. Just using HTTP traffic for this example, the access provider would have a 16 percent reduction in backbone traffic (0.40 × 0.40 = 0.16), or 7.2 Mbps on a 45-Mbps T3. If bandwidth costs are estimated at $1,000 per Mbps per month, then a network cache in front of a metered T3 with Internet access can save an ISP $7,200 per month, or $86,400 per year. For a large ISP with multiple T3 lines, these savings can add up quickly.
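
The T3 example can be reproduced, and rerun with other figures, using the short calculation below. The inputs are the ones given in the text; the function name is illustrative, and the model assumes that every cache hit removes its full share of traffic from the metered link.

```python
def monthly_savings(link_mbps: float, http_share: float,
                    hit_rate: float, cost_per_mbps_month: float) -> float:
    """Monthly saving from traffic that the cache keeps off the link."""
    offloaded_mbps = link_mbps * http_share * hit_rate
    return offloaded_mbps * cost_per_mbps_month


T3_MBPS = 45
reduction = 0.40 * 0.40                               # share of all traffic offloaded
per_month = monthly_savings(T3_MBPS, 0.40, 0.40, 1000)
per_year = per_month * 12

print(f"Backbone traffic reduction: {reduction:.0%}")
print(f"Saving: ${per_month:,.0f} per month, ${per_year:,.0f} per year")
```

Raising either the HTTP share or the hit rate scales the saving proportionally, which is why caching additional protocols improves the figures further.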

Cache performance metrics

Definitions of cache performance metrics are as follows:

Response time is the primary metric of cache performance. This is the mean time taken for requested content to be delivered to the user. In terms of Web page content this includes all objects encapsulated by that page. A typical HTML page is made up of many embedded objects, such as GIF and JPEG images and sounds, Java applets, and so on. The browser must individually retrieve each object before it can completely render the page. A user's perception of Web performance is based on the time it takes to retrieve each page completely. Individual page latency varies significantly, depending upon the number of objects encapsulated and the different types of objects (simple pages can have fewer than ten objects and rich pages can contain 50 or more objects, some of them more complex).
Hit ratio is the percentage of requests served by the cache, as a proportion of all requests, and is central to any measurement of cache performance. Response time metrics are heavily influenced by the split between requests that are hits (faster) and those that are misses (slower). A low hit ratio of 20 percent means that 80 percent of requests are required to traverse the entire network in order to communicate with the origin server, while only 20 percent of requests are serviced locally.
Response time under load indicates how network scalability is affected by the ability of caching systems to deliver consistent response times as cache load increases. For example, at any given time, there will be many concurrent HTTP object requests to the cache, and the load on the cache is the rate at which it is delivering requested objects to clients. HTTP object access times will also vary with the load. As the load increases, the time to retrieve a given object from cache is likely to increase, and hence the response time for a complete page is likely to increase. If the load on a cache exceeds a threshold where the cache stops responding consistently to requests, then the caching system effectively acts as a performance bottleneck.
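
The relationship between hit ratio and response time described in these definitions can be sketched as a simple weighted mean. The Python fragment below is a rough model, not a measurement method: the function name mean_response_time and the 50 ms (hit) and 500 ms (miss) latencies are hypothetical values chosen purely for illustration, and the model ignores the effect of load.

```python
def mean_response_time(hit_ratio, hit_time_ms, miss_time_ms):
    """Weighted mean object response time for a given hit ratio.

    Hypothetical model: every hit costs hit_time_ms and every miss
    costs miss_time_ms, independent of the load on the cache.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_time_ms + (1.0 - hit_ratio) * miss_time_ms


# Assumed latencies: 50 ms from the local cache, 500 ms from the origin server.
for ratio in (0.2, 0.4, 0.6):
    print(f"hit ratio {ratio:.0%}: {mean_response_time(ratio, 50, 500):.0f} ms")
```

Under these assumed latencies, the 20 percent hit ratio mentioned above gives a mean of 410 ms, which is dominated by the 80 percent of requests that miss.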

In effect every cache miss degrades response times and places additional, avoidable traffic on the backbone. The challenge in designing efficient caching systems is that achieving a high hit ratio requires more resources to be made available to the system to maintain connection states, and the management of these additional resources may limit scalability. As the system is loaded, we should ideally see a fairly constant average response time until the load threshold of the system is reached; the response time should not oscillate as load is increased. At very high loads internal system limitations come into play (such as disk and memory access times). Some cache systems also store content using off-the-shelf general-purpose file systems; this is likely to be a significant performance constraint. A dedicated high-performance caching system will almost certainly require a highly optimized storage system to minimize the number of disk seek and I/O operations per object. High-performance caches also query origin servers using asynchronous checks for content freshness, since synchronous checks would penalize response times. For further details of performance analysis and methodologies applicable to modeling Web performance, the interested reader is referred to [36].

Measurement techniques

Most network cache performance measurement techniques are derived from benchmarks designed for measuring Web server performance. They use synthetic workloads, assume that all objects are cacheable, and try to simulate, if not eliminate, the behavior of the Internet by using local servers and isolated networks. They focus on throughput only. A more realistic test tool is available from [37]. This test is end-user oriented and focuses on the Web page response time that users of a cache would experience. It measures this under different loads on the cache. It uses real workloads derived from traces collected at network access points and accesses objects live on the Internet (including approximately 2,500 URLs of popular business-oriented Web pages). The workloads are generated using URLs from existing Web proxy access logs. For these tests, we used workloads derived from the sanitized weekly access logs maintained by the National Laboratory for Applied Network Research (NLANR) [38].
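
As a rough illustration of deriving a workload from proxy access logs, the Python sketch below extracts the requested URLs and a request hit ratio from log lines. It assumes the lines follow Squid's native access.log layout (timestamp, elapsed time, client, result code/status, bytes, method, URL, and so on); the sample lines, addresses, and the helper name summarize_access_log are hypothetical, and treating any result code containing "HIT" as a hit is a simplification.

```python
def summarize_access_log(lines):
    """Return (list of GET URLs, request hit ratio) from Squid-style log lines.

    Assumed field order: time elapsed client code/status bytes method URL ...
    """
    urls, hits, total = [], 0, 0
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue                      # skip blank or truncated lines
        result_code = fields[3].split("/")[0]
        total += 1
        if "HIT" in result_code:          # simplification: TCP_HIT, TCP_MEM_HIT, ...
            hits += 1
        if fields[5] == "GET":
            urls.append(fields[6])        # candidate URL for the replay workload
    hit_ratio = hits / total if total else 0.0
    return urls, hit_ratio


# Hypothetical sample lines, invented for this sketch.
sample = [
    "918000000.101 120 10.0.0.1 TCP_MISS/200 4512 GET http://example.com/ - DIRECT/example.com text/html",
    "918000000.340 4 10.0.0.2 TCP_HIT/200 4512 GET http://example.com/ - NONE/- text/html",
    "918000001.002 230 10.0.0.1 TCP_MISS/200 880 GET http://example.org/a.gif - DIRECT/example.org image/gif",
    "918000001.500 310 10.0.0.3 TCP_MISS/200 120 POST http://example.net/form - DIRECT/example.net text/html",
]
urls, ratio = summarize_access_log(sample)
print(len(urls), "URLs, hit ratio", ratio)
```

A list of URLs extracted in this way can then be replayed against the cache under test at different request rates, which is the general approach the end-user-oriented test described above takes.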

Cache products

One of the best-known caches deployed on the Internet is Squid, a freely available open source UNIX tool maintained by the National Laboratory for Applied Network Research (NLANR). Squid caches Web and FTP content, as well as DNS lookups. It also supports hierarchical caching, allowing a group of caches to improve performance through cooperation. Several vendors sell commercial caching software, including CacheFlow [37] and Proxy Server from Netscape Communications Corp. (a Squid derivative). Proxy Server from Microsoft Corp. and BorderManager from Novell, Inc. also support hierarchical caching. NNTP Cache, a freeware UNIX product, caches Usenet news.