10.2 Disk Space

When buying or building a proxy cache, it is important to have enough disk space. If your disk size is too small, then your cache replaces valuable objects that otherwise would result in cache hits. It's okay to have too much disk space, but after some point, adding more space does not significantly increase your hit ratio. Figuring out the right amount of disk space is complicated because you must consider a number of parameters and overheads. The following advice should help you get started.

As a rule of thumb, your cache should take at least three days to fill up. This magic number of three days comes from empirical observations and analysis of real web traffic. It is essentially the average time that particular web objects remain popular and valuable. To find out how much disk space we can fill in three days, we need to perform a number of calculations, beginning with the rate at which HTTP traffic enters your organization's network.

Ideally, you already have a sense of how much HTTP traffic comes into your network. We're looking for a number with units of bytes per second, averaged over a 24-hour period. Any network measurement tool should be able to give you a breakdown by port number. Here, we're mostly interested in port 80 traffic, although there is likely to be a small amount of HTTP traffic on other ports as well. If you have no idea, you can use your network connection speed as an upper limit. For example, let's say that your company has two T1 connections to the Internet. Together, these can carry about 3Mbps, or 375KB per second. If your company is typical, then 60% of your total traffic is HTTP. If you suspect your company is not typical in this regard, then adjust accordingly. Given two (saturated!) T1s, we can assume a 24-hour average HTTP traffic rate of 225KB/second.
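The upper-limit estimate above is simple arithmetic, and a short Python sketch may make it easier to substitute your own numbers. The function name and parameters here are illustrative only; the 60% default is the "typical" share quoted in the text, and the link is assumed to be saturated.

```python
def http_rate_kb_per_sec(link_mbps, http_fraction=0.60):
    """Upper-bound estimate of the 24-hour average HTTP rate, in KB/second.

    link_mbps     -- total link capacity in megabits per second (assumed saturated)
    http_fraction -- share of total traffic that is HTTP (assumption: 60%)
    """
    # 1 Mbps = 1000 kilobits/second = 125 KB/second
    total_kb_per_sec = link_mbps * 1000 / 8
    return total_kb_per_sec * http_fraction


# Two T1s, rounded to 3 Mbps as in the example
print(f"{http_rate_kb_per_sec(3.0):.0f} KB/second")
```

If you have real measurements from a network monitoring tool, use those instead of this upper bound.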

Remember that we're interested in how quickly the cache fills up. Thus, we have to account for cache hits, which do not cause objects to be written to disk. If your organization is small, you can expect a 25% byte hit ratio. Medium-sized organizations can expect 35%, and large ones can expect 45% or higher. To get the HTTP miss traffic rate, we multiply the total rate by the miss ratio, which is 100% minus the hit ratio. Let's say our two T1s service a medium-sized organization. Our HTTP miss rate is 225 x 65%, or about 146KB/second.

Next, we also need to account for the fact that some HTTP requests are not cachable. On an average site, roughly 20% of cache misses are not cachable. Thus, we subtract that 20% from the HTTP miss rate. In our example, we now have 117KB/second.

Now we know the HTTP cachable miss rate, or the cache fill rate, in terms of bytes/sec. To find out how much space we need, we simply multiply by 3 days. Continuing our example, 117KB/second x 3 days x 86,400 seconds/day comes out to about 30GB.
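The chain of calculations so far (total HTTP rate, minus hits, minus uncachable misses, times three days) can be sketched in a few lines of Python. The function names are hypothetical, the default ratios are the rule-of-thumb figures from the text, and a gigabyte is taken as one million KB.

```python
SECONDS_PER_DAY = 86_400


def cache_fill_rate(http_kb_per_sec, byte_hit_ratio, uncachable_fraction=0.20):
    """Rate at which cachable misses are written to disk, in KB/second."""
    # Hits are served from the cache and are not written to disk again.
    miss_rate = http_kb_per_sec * (1.0 - byte_hit_ratio)
    # Some of the misses cannot be cached at all (assumption: 20%).
    return miss_rate * (1.0 - uncachable_fraction)


def cache_storage_gb(fill_kb_per_sec, days=3):
    """Cache storage needed so that filling it takes `days` days."""
    total_kb = fill_kb_per_sec * days * SECONDS_PER_DAY
    return total_kb / 1_000_000  # 1 GB taken as 10**6 KB


# Medium-sized organization (35% byte hit ratio) behind two saturated T1s
fill = cache_fill_rate(225, byte_hit_ratio=0.35)
print(f"fill rate: {fill:.0f} KB/second")
print(f"storage:   {cache_storage_gb(fill):.1f} GB")
```

With the example figures this comes to roughly 117KB/second and a little over 30GB. Swap in your own traffic rate and hit ratio once you have measurements.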

If you are specifying a cache for the first time, you probably want to add some extra capacity for future growth. In most cases, doubling the size estimate based on three days of current traffic should provide you with plenty of capacity for the next year or two.

Keep in mind that the formula calculates cache storage size, not physical disk size. This is particularly important if you are using a software solution. Your software solution may use the operating system's favorite filesystem, and some of the disk space is lost to overheads such as inodes, directories, and file allocation tables. It's a good idea to increase your estimate by 25% for these overheads. Also remember that hard drive vendors cheat when they tell you about disk sizes: they usually count a gigabyte as one billion bytes, while many operating systems count in units of 2^30 bytes. The vendor may say 20GB, but your operating system reports about 18GB, even before formatting.
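To turn a cache storage estimate into a disk purchase, the growth and overhead adjustments can be combined as in the sketch below. Applying the 25% filesystem overhead on top of the doubled estimate is an assumption made here, not something the text prescribes, and the function names are illustrative. The second function assumes the vendor counts a gigabyte as 10^9 bytes while the operating system counts 2^30 bytes.

```python
def disk_to_buy_gb(cache_storage_gb, growth_factor=2.0, fs_overhead=0.25):
    """Disk capacity to shop for, given the cache storage estimate in GB.

    growth_factor -- headroom for future traffic (assumption: double)
    fs_overhead   -- space lost to filesystem structures (assumption: 25%)
    """
    return cache_storage_gb * growth_factor * (1.0 + fs_overhead)


def os_reported_size(vendor_gb):
    """Size shown by an OS that counts in 2**30-byte units,
    for a drive the vendor sizes in 10**9-byte gigabytes."""
    return vendor_gb * 10**9 / 2**30


print(f"buy about {disk_to_buy_gb(30):.0f} GB")
print(f"a '20GB' drive shows as about {os_reported_size(20):.1f}")
```

For the 30GB example this suggests roughly 75GB of disk, and it shows why a drive sold as 20GB appears as about 18.6 to the operating system.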

Finally, keep these points in mind when going through this calculation:

  • Bandwidth estimates are averaged over a 24-hour period, which is probably different from your peak utilization.

  • These are only guidelines, not requirements. If you can't afford a huge amount of disk space, a smaller cache is still useful.

  • Any of the preceding numbers (e.g., hit ratio, cachable ratio) are probably different for your organization. Unfortunately, you won't have exact figures until you install a proxy cache and take your own measurements.

Web Caching
ISBN: 156592536X
EAN: N/A
Year: 2001
Pages: 160