Recipe 10.5. Calculating the Rate of Client Cache Hits on Apache

Credit: Mark Nenadov

Problem

You need to monitor how often your Apache web server answers a client request with "not modified", rather than sending the page again, because the client's cached copy of the page is already up to date.

Solution

When a browser requests a page that it already has in its cache, it tells the server about the cached copy, and if that copy is up to date the server returns a special status code (304, "not modified") rather than serving the page again. Here's how to find the statistics for such occurrences in your server's logs:

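def clientCachePercentage(logfile_pathname):
    totalRequests = 0
    cachedRequests = 0
    # loop over the file so that only one line is in memory at a time
    with open(logfile_pathname, "r") as contents:
        for line in contents:
            totalRequests += 1
            if line.split(" ")[8] == "304":
                # the server returned "not modified"
                cachedRequests += 1
    if totalRequests == 0:
        # an empty log has no requests, so avoid dividing by zero
        return 0
    return int(0.5+float(100*cachedRequests)/totalRequests)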

Discussion

The percentage of requests to your Apache server that are met by the client's own cache is an important factor in the perceived performance of your server. The code in this recipe helps you get this information from the server's log. Typical use would be:

log_path = "/usr/local/nusphere/apache/logs/access_log"
print("Percentage of requests that were client-cached: " + str(
    clientCachePercentage(log_path)) + '%')

The recipe reads the log file one line at a time by looping over the file, which is the normal way to read a file nowadays. Reading the whole log file into memory by calling the readlines method on the file object would be unsuitable for very large files, which server log files can certainly be. That approach might not work at all, or it might work but damage performance considerably by swamping your machine's virtual memory. Even when it works, readlines offers no advantage over the approach used in this recipe.
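As a rough sketch of the same line-at-a-time idea in reusable form, the loop can be wrapped in a generator that yields one status field per log line. The name iter_status_codes, the choice to yield None for lines that are too short, and the sample log lines are all assumptions made for this illustration; they are not part of the recipe.

```python
import os
import tempfile

def iter_status_codes(logfile_pathname):
    """Yield the status field of each log line, reading the file lazily."""
    with open(logfile_pathname, "r") as logfile:
        for line in logfile:  # the file object hands back one line at a time
            fields = line.split(" ")
            # Assumption: a line with fewer than nine fields is malformed;
            # yield None so it still counts as a request but never as a hit.
            yield fields[8] if len(fields) > 8 else None

# Made-up log lines, written to a temporary file for demonstration only.
sample_lines = [
    '10.0.0.1 - - [01/Jan/2004:00:00:01 +0000] "GET /a.html HTTP/1.1" 200 512\n',
    '10.0.0.1 - - [01/Jan/2004:00:00:02 +0000] "GET /a.html HTTP/1.1" 304 -\n',
    '10.0.0.2 - - [01/Jan/2004:00:00:03 +0000] "GET /b.html HTTP/1.1" 304 -\n',
    'garbage line\n',
]
handle, path = tempfile.mkstemp(suffix=".log")
with os.fdopen(handle, "w") as tmp:
    tmp.writelines(sample_lines)

total = 0
cached = 0
for status in iter_status_codes(path):
    total += 1
    if status == "304":
        cached += 1
os.remove(path)

percentage = int(0.5 + 100.0 * cached / total) if total else 0
print(percentage)  # 2 of the 4 sample lines are 304s, so this prints 50
```

Because the generator never builds a list of lines, memory use stays flat no matter how large the log grows.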

The body of the for loop calls the split method on each line string, with a string of a single space as the argument, to split the line into a list of its space-separated fields. Then it uses indexing ([8]) to get the ninth such field. Apache puts the HTTP status code into the ninth field of each line in the log. Code "304" means "not modified" (i.e., the client's cache was already up to date). We count those cases in the cachedRequests variable and all lines in the log in the totalRequests variable, so that, in the end, we can return the percentage of cache hits. The expression we use in the return statement computes the percentage as a float, then adds 0.5 and truncates to round it to the closest int, because an integer result is most useful in practice.
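To make the field counting concrete, here is a small worked example. The log line is hypothetical, written in Apache's Common Log Format, and the helper name rounded_percentage is an assumption introduced only for this sketch. The bracketed timestamp and the quoted request line both contain spaces, so they occupy several fields each, which is why the status code lands at index 8. A server configured with a custom log format may put it elsewhere.

```python
# Hypothetical access-log line in Common Log Format, for illustration only.
line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 304 -')

fields = line.split(" ")
# fields[0:3]  host, identity, user
# fields[3:5]  the timestamp, split in two at its inner space
# fields[5:8]  the request line: method, path, protocol
# fields[8]    the HTTP status code
status = fields[8]
print(status)  # prints 304

def rounded_percentage(cached, total):
    """Same arithmetic as the recipe's return statement."""
    return int(0.5 + float(100 * cached) / total)

print(rounded_percentage(1, 3))  # 33.33... rounds down to 33
print(rounded_percentage(2, 3))  # 66.66... rounds up to 67
```

Adding 0.5 before truncating with int rounds to the nearest integer for the nonnegative values that a percentage can take.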

See Also

The Apache web server is available and documented at http://httpd.apache.org.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420