12.1 Metrics

There are a number of measurements that relate to proxy cache performance. The most common are throughput, response time, and hit ratio. For web caches, these metrics are related and tend to influence each other. For example, a decrease in hit ratio usually results in an increase in mean response time. To fully understand a benchmark result, you need to consider all the metrics. Don't look at just the throughput. It's quite possible for two products to achieve the same throughput but with very different hit ratios.

Some people find multiple metrics confusing because they complicate product comparisons. Is a product with high throughput and high response time better than one with low throughput and low response time? The answer depends on individual circumstances and the attributes most valued by the user. Other types of benchmarks report a single measurement or combine many measurements into a single value. While this makes it easy to rank products, I feel that doing so for proxy caches does more harm than good. Combining multiple measurements into a single value makes it difficult for people to evaluate products in the areas they feel are most important.

12.1.1 Throughput

Throughput is a metric common to many types of benchmarks. Database benchmarks report throughput as transactions per second. CPU benchmarks measure instructions per second. For web caches, it's HTTP responses per second. Some people say "requests per second" or even "URLs per second" instead. The three phrases are equivalent for most purposes.

Throughput is not always a measured value. For some tests, it is an input parameter. That is, the workload specifies a particular offered request rate. Instead of asking, "What's the maximum throughput this product can support?" the question becomes, "What is this product's response time and hit ratio at X requests per second?"

Peak throughput varies significantly among products. In recent industry Cache-Off tests [Rousskov and Wessels, 2000], the lowest was 115 responses per second, while the maximum was about 3,300. A cache's peak throughput determines how much bandwidth and how many simultaneous users it can support. If you have more traffic than one box can handle, you can scale the peak throughput linearly by adding additional boxes.

Some network administrators are more comfortable thinking in terms of bandwidth (megabits per second) rather than responses per second. To convert from response rate to bandwidth, simply multiply by the mean response size. For example, in the Cache-Off tests mentioned earlier, the mean response size was 10.9KB. Thus, the lowest tested throughput was 9.7 Mbps, while the highest was 281 Mbps. Real traffic reply size distributions vary from location to location. If you're not sure what to use, 10KB is probably a good approximation. Note that responses, flowing from servers to clients, account for the majority of bandwidth. Requests also consume some bandwidth in the other direction, but the amount is almost insignificant since requests are typically 50 times smaller than responses.
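
For example, here's a minimal Python sketch of that conversion. The 10.9KB figure is the Cache-Off mean cited above; the binary definitions of kilobyte and megabit are assumptions chosen to match the quoted numbers:

    KB = 1024               # bytes per kilobyte (assumed binary)
    MBIT = 1024 * 1024      # bits per megabit (assumed binary)

    def bandwidth_mbps(responses_per_sec, mean_response_kb=10.9):
        """Mean bandwidth in Mbit/s for a given HTTP response rate."""
        bytes_per_sec = responses_per_sec * mean_response_kb * KB
        return bytes_per_sec * 8 / MBIT

    print(bandwidth_mbps(115))     # lowest Cache-Off rate, roughly 10 Mbps
    print(bandwidth_mbps(3300))    # highest, roughly 281 Mbps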

12.1.2 Response Time

Response time measures how quickly a cache responds to requests. For a single request, it's the amount of time elapsed between sending the request and receiving the end of the response. Since individual response times vary greatly, it's useful to keep a histogram of them. From the histogram we can report the mean, median, percentiles, and other statistical measures.
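
As an illustration, here is a small Python sketch that computes those summary statistics over a set of measured times; the sample values are hypothetical:

    import math
    import statistics

    # Hypothetical per-request response times, in seconds; real data
    # would come from the benchmark client or an access log.
    times = sorted([0.12, 0.09, 2.4, 0.31, 1.8, 0.05, 3.1, 0.22])

    def percentile(sorted_vals, p):
        """Nearest-rank percentile, for 0 < p <= 100."""
        k = math.ceil(len(sorted_vals) * p / 100.0) - 1
        return sorted_vals[k]

    print(f"mean   {statistics.mean(times):.2f}s")    # 1.01s
    print(f"median {statistics.median(times):.2f}s")  # 0.27s
    print(f"95th   {percentile(times, 95):.2f}s")     # 3.10s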

Response times for cache misses depend largely on network and origin server delays. The mean response time for real web traffic is typically 2-3 seconds. Cache hits, on the other hand, have very small response times, usually a few hundred milliseconds. Thus, the cache hit ratio affects the overall response time. As the hit ratio goes down, more requests result in cache misses, which drives up the mean response time.
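
In other words, the overall mean is a weighted average of the hit and miss times. A quick sketch, using illustrative values in line with the typical figures above:

    def mean_response_time(hit_ratio, t_hit=0.3, t_miss=2.5):
        """Overall mean response time, in seconds, as a weighted
        average. The hit and miss times here are illustrative,
        not measured values."""
        return hit_ratio * t_hit + (1 - hit_ratio) * t_miss

    for hr in (0.5, 0.4, 0.3):
        print(f"hit ratio {hr:.0%}: {mean_response_time(hr):.2f}s")
    # hit ratio 50%: 1.40s
    # hit ratio 40%: 1.62s
    # hit ratio 30%: 1.84s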

If a proxy cache becomes overloaded, both cache hits and misses can experience additional delays. In some cases, a cache becomes so busy that hits take longer than misses. When this happens, you would get better response time with no caching at all. However, even a slow and busy cache can reduce your network bandwidth usage.

12.1.3 Hit Ratio

Hit ratio measures the effectiveness of the cache and the amount of bandwidth saved. Recall that there are two hit ratio measurements: cache hit ratio and byte hit ratio. The former counts requests; the latter counts bytes transferred. Since small objects are more popular than large ones, byte hit ratio is normally smaller than cache hit ratio. When a result says just "hit ratio," it probably refers to cache hit ratio.
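
To make the distinction concrete, here is a small Python sketch that computes both ratios from a hypothetical log of (hit, size) pairs:

    # Each entry is (was_hit, response_size_in_bytes); values invented.
    log = [
        (True, 4_200), (False, 18_500), (True, 1_100),
        (False, 92_000), (True, 6_300), (False, 2_700),
    ]

    hits = sum(1 for hit, _ in log if hit)
    hit_bytes = sum(size for hit, size in log if hit)
    total_bytes = sum(size for _, size in log)

    print(f"cache hit ratio: {hits / len(log):.0%}")          # 50%
    print(f"byte hit ratio:  {hit_bytes / total_bytes:.0%}")  # 9%; large misses dominate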

A particular workload in a test environment has an upper bound on achievable hit ratio, based on an ideal cache. An ideal cache has infinite size, stores all cachable responses, and returns a hit whenever possible. Of course, real caches are not quite ideal and may not achieve the maximum hit ratio for a number of reasons. The disk capacity may not be large enough. The product's rules for caching may differ from what the benchmark software expects. Some caching products are able to bypass their disk storage (for both reads and writes) when faced with high loads. A cachable response that doesn't get stored is a missed opportunity for a cache hit later on.
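
One way to find the workload's upper bound is to replay it through a simulated ideal cache. A minimal sketch, assuming a hypothetical stream of (url, cachable) requests:

    def ideal_hit_ratio(requests):
        """Hit ratio of an infinite cache that stores every cachable
        response: any repeat of a stored URL counts as a hit."""
        stored = set()
        hits = 0
        for url, cachable in requests:
            if url in stored:
                hits += 1
            elif cachable:
                stored.add(url)
        return hits / len(requests)

    requests = [("a", True), ("b", False), ("a", True),
                ("b", False), ("c", True), ("a", True)]
    print(f"{ideal_hit_ratio(requests):.0%}")  # 33%: two repeat hits on "a"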

12.1.4 Connection Capacity

You might be interested to know how many simultaneous connections a product can handle. To get an approximation, simply multiply the throughput by the response time. Connection capacity is usually not reported as a separate metric because it is rarely a limiting factor.
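
For example (this is essentially Little's law; the numbers here are illustrative):

    # Concurrent connections = throughput x mean response time.
    throughput = 1000        # responses per second (illustrative)
    response_time = 2.5      # mean seconds per response (illustrative)

    print(int(throughput * response_time))  # about 2,500 open connections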

12.1.5 Cost

It may seem strange to include cost as a metric. However, the cliché "you get what you pay for" holds for web caches. Certainly, you would expect a $50,000 solution to perform better than one costing $5,000. The cost of a product is a good way to normalize performance metrics such as throughput. By including cost, you can calculate how many cheap boxes you need to get the same performance as one expensive box. Some benchmarks, such as those from the Transaction Processing Performance Council (TPC) (http://www.tpc.org), which tests databases, take this a step further and report long-term cost of ownership information. For example, how much does a support contract cost? What about power and cooling requirements? How much space does the product need?
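
As a rough sketch of that calculation, with entirely hypothetical prices and throughputs:

    import math

    cheap = {"price": 5_000, "throughput": 400}        # responses/sec
    expensive = {"price": 50_000, "throughput": 3_300}

    boxes = math.ceil(expensive["throughput"] / cheap["throughput"])
    print(f"{boxes} cheap boxes match one expensive box")
    print(f"${boxes * cheap['price']:,} vs ${expensive['price']:,}")
    # 9 cheap boxes match one expensive box
    # $45,000 vs $50,000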
