12.5 How to Benchmark a Proxy Cache

Proxy cache benchmarking is tricky because many systems are involved, and many things can go wrong. To ensure we're really measuring the proxy's performance, we need to eliminate other devices and systems as potential bottlenecks and sources of uncertainty. For example, all computer systems used to drive the benchmark should be identically configured. Also, we need to test the networking equipment to guarantee that it's up to the task.

12.5.1 Configure Systems

The first step is to select and configure a number of systems for use as clients and servers. The number of machines you need depends on the total throughput you intend to achieve. For Web Polygraph, you can plan on 400–500 requests per second for each client-server pair; to sustain 2,000 requests per second, for example, plan on four or five pairs. Other benchmarks may have different characteristics. Later, I'll describe how to run a test that proves your systems can adequately generate the load.

Benchmarking systems should be dedicated to their task. You don't want other processes running that can interfere with the measurements. For example, a busy server running on the same machine can use up significant amounts of bandwidth. Even worse, a runaway program can consume significant amounts of memory or CPU time, starving the benchmarking processes and adversely affecting your results.

Continuing along these lines, any unneeded services or processes should be disabled. For example, don't run lpd, cron, sendmail, portmap, or even inetd. The standard FreeBSD installation, for instance, has a nightly cron job that runs find on all mounted filesystems. This I/O-intensive job takes resources away from the benchmark and leads to unreproducible results.
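As a sketch of what this looks like on FreeBSD, you can disable most of these daemons in /etc/rc.conf (the knob names vary between releases, so treat these as illustrative):

    # /etc/rc.conf -- turn off daemons that could disturb the benchmark
    sendmail_enable="NO"
    inetd_enable="NO"
    portmap_enable="NO"
    lpd_enable="NO"

On other systems, comment out the corresponding entries in /etc/inetd.conf and the startup scripts instead.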

Be sure to configure and run xntpd on all systems. Having clocks synchronized avoids certain problems interpreting HTTP headers that have dates. Additionally, it's much easier to correlate logfiles written on different systems. You might feel it is sufficient to simply synchronize clocks once, before a test starts, but some system clocks drift significantly over time. If your test is very long, there is a chance that one system will finish before or after the others.
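A minimal /etc/ntp.conf for xntpd might look like the following; the server name is a placeholder for whatever time source you actually use:

    # /etc/ntp.conf -- keep all benchmark hosts synchronized
    server ntp.example.com          # a nearby, reliable time server
    driftfile /etc/ntp.drift        # remember clock drift across restarts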

Because a benchmark uses a lot of networking resources, you may need to tune your system or build a new kernel. It's likely you'll need to increase or change the following parameters (a FreeBSD-flavored sketch follows the list):

  • Open file descriptor limit

  • Maximum number of threads

  • TCP delayed ACKs

  • Socket listen queue length

  • Number of mbufs or mbuf clusters

  • Ephemeral port range
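The parameter names and safe values differ between operating systems. On FreeBSD, for example, the changes might look something like this; the values shown are illustrative starting points, not recommendations:

    # raise the open file descriptor limits
    sysctl -w kern.maxfiles=16384
    sysctl -w kern.maxfilesperproc=16384
    # lengthen the socket listen queue
    sysctl -w kern.ipc.somaxconn=1024
    # widen the ephemeral port range
    sysctl -w net.inet.ip.portrange.last=65535
    # disable delayed ACKs
    sysctl -w net.inet.tcp.delayed_ack=0

The number of mbuf clusters is a boot-time setting; on FreeBSD you set it with the NMBCLUSTERS kernel option and rebuild the kernel.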

12.5.2 Test the Network

Testing and benchmarking the network is critical. You don't want to spend a lot of time measuring cache performance only to discover later that a NIC was in half-duplex mode or your switch is a bottleneck.

First of all, use ping to make sure all systems can talk to each other. If you're using a routed environment, this also ensures you have the routes configured correctly. It's a good idea to let ping run for a while so it sends 10–20 packets. If you observe any packet loss, investigate the problem. Bad Ethernet cables are a likely culprit.
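For example, the following loop sends 20 echo requests to each host and reports any loss; the hostnames are placeholders for your own machines:

    # 20 ICMP echo requests per host; nonzero loss needs investigating
    for host in client1 client2 server1 server2; do
        ping -c 20 $host
    done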

Next, run some TCP throughput tests. Good tools for this are netperf (http://www.netperf.org) and nttcp (http://users.leo.org/~bartel/nttcp/). Be sure to run throughput tests in both directions on each host at the same time. In other words, make sure each host is both sending and receiving traffic. I recommend running the throughput test for about five minutes. For a full-duplex, switched network, each host should achieve 88–94 Mbps in each direction. If you're using a hub instead of a switch, the peak throughput will be much less, because every host must compete for 100 Mbps total bandwidth, and each NIC runs in half-duplex mode. A test between two hosts should result in about 37 Mbps in each direction. If you get significantly lower throughput than you expect, it probably means one of your Ethernet cables is bad. A measurement slightly lower than expected is more difficult to diagnose. It may be due to an inefficient NIC driver, inadequate CPU power, or a switch or router that cannot support the full network bandwidth.
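With netperf, a five-minute bidirectional test between two hosts might look like the following (the hostnames are again placeholders, and the netserver daemon must already be running on each machine):

    # on hostA: send to hostB for 300 seconds
    netperf -H hostB -l 300 &
    # on hostB, at the same time: send to hostA
    netperf -H hostA -l 300 &

Repeat the test for each pair of hosts that will exchange traffic during the benchmark.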

12.5.3 No-Proxy Test

By this point, you should be confident that your network can support a lot of TCP traffic. The next step is to make sure that the benchmarking systems can generate enough load to drive the proxy cache under test. To do this, I recommend running a no-proxy test, which exercises the system as a whole, minus the proxy cache. If successful, it helps to eliminate the client and server machines and the networking equipment as potential bottlenecks. It also helps to establish a baseline measurement you can use in a before-and-after comparison. If this test fails, then you know any measurements made against the proxy cache would be useless.

The no-proxy test uses the same workload you will use on the proxy cache. However, instead of sending requests to the proxy, they are sent directly from clients to servers. Some benchmarking software may not support this mode of operation. With Polygraph, you just comment out the --proxy command-line option. If you're using a layer four switch to divert traffic, simply disable the redirection temporarily. The no-proxy test should run for at least 30 minutes. Be sure to save the log files and other results from this test for later comparison with the proxy cache results.
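As a sketch, assuming the Polygraph client is started from a shell script with a workload file named workload.pg (the filename and proxy address stand in for your actual setup), the change looks like this:

    # normal run: requests go through the proxy under test
    ./client --config workload.pg --proxy 10.0.1.1:3128

    # no-proxy run: clients fetch directly from the servers
    ./client --config workload.pg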

12.5.4 Fill the Cache

We're almost ready to measure the proxy cache's performance. Before doing that, however, make sure the cache's disks are full. If the cache is already full, you may be able to skip this step. However, if reproducible results are really important, you should flush the cache and fill it again. I have seen considerable evidence that the method used to fill a cache affects subsequent results.

Ideally, you should fill the cache with the same workload characteristics you plan to measure with. A cache filled with very large objects, for example, leads to too few delete operations during the measurement phase. During the first industry Cache-Off, caches were filled with a best-effort workload, and participants were allowed to choose the number of agents to use. One participant discovered that their product achieved higher throughput during the measurement phase when the cache was filled by a single best-effort agent; filling with multiple agents resulted in lower throughput. This is probably because files are written to disk sequentially under the single-agent workload. When these files are later removed, the disk is left with relatively large sections of contiguous free space, which in turn results in faster write times for new objects.

12.5.5 Run the Benchmark

Finally, we're ready to actually benchmark the proxy cache. As I've already mentioned, this test should run for at least six hours. Longer tests are more likely to achieve, and thus measure, steady-state conditions.

The benchmark should record performance data as it runs. You may also want to collect your own statistics from the cache during the benchmark. For example, you can probably use SNMP tools as described in Chapter 11. It's possible, but unlikely in my experience, that such SNMP monitoring can adversely affect the cache's performance; if you're planning to use SNMP on your production caches, you'd probably like to know about this anyway.
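For example, you might poll the cache once a minute with the snmpget utility and append the replies to a logfile. The hostname, port, community string, and OID below are all placeholders for whatever your cache actually exports:

    # sample an SNMP counter once per minute for the whole test
    while true; do
        date
        snmpget -v1 -c public cache.example.com:3401 OID-of-interest
        sleep 60
    done >> snmp.log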

Once the test completes, you'll want to analyze the results. Usually this means plotting graphs of various measurements versus time. By looking at the graphs, you should be able to tell whether the product's performance meets your expectations. If you observe spikes or very sudden changes, you may want to investigate these further. Most likely, you'll see something that's not quite right and will want to run another test after making an adjustment.
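Any plotting tool works for this. For instance, if you extract a timestamp column and a measurement column from the benchmark's logs into a plain-text file, gnuplot can produce a quick time-series plot; the filename and column numbers are assumptions about your log format:

    # quick look at request rate over time
    echo 'plot "results.dat" using 1:2 with lines' | gnuplot -persist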
