Recipe 10.5. Calculating the Rate of Client Cache Hits on Apache

Credit: Mark Nenadov

Problem

You need to monitor how often your Apache web server answers client requests without re-sending the page because the client's cached copy is already up to date.

Solution

When a browser requests a page that it already has in its cache, it tells the server about the cached copy, and the server returns a special status code (rather than serving the page again) if the client's cache is up to date. Here's how to find the statistics for such occurrences in your server's logs:

    def clientCachePercentage(logfile_pathname):
        totalRequests = 0
        cachedRequests = 0
        # the with statement closes the log file when the loop is done
        with open(logfile_pathname, "r") as contents:
            for line in contents:
                totalRequests += 1
                if line.split(" ")[8] == "304":  # if server returned "not modified"
                    cachedRequests += 1
        if totalRequests == 0:
            return 0  # empty log: avoid dividing by zero
        return int(0.5 + float(100 * cachedRequests) / totalRequests)

Discussion

The percentage of requests to your Apache server that are met by the client's own cache is an important factor in the perceived performance of your server. The code in this recipe helps you get this information from the server's log. Typical use would be:

    log_path = "/usr/local/nusphere/apache/logs/access_log"
    print("Percentage of requests that were client-cached: "
          + str(clientCachePercentage(log_path)) + "%")

The recipe reads the log file one line at a time by looping over the file, the normal way to read a file nowadays. Reading the whole log file into memory by calling the readlines method on the file object would be unsuitable for very large files, which server log files can certainly be. That approach might not work at all, or might work but damage performance considerably by swamping your machine's virtual memory. Even when it works, readlines offers no advantage over the approach used in this recipe.
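To check why the recipe looks at index 8, it may help to split one log line by hand. The sketch below assumes the server writes Apache's Common Log Format; the sample line, its address, and the names sample_line, fields, and status are made up for illustration and do not come from a real log.

```python
# Hypothetical access-log line in Common Log Format (invented for this example).
sample_line = ('192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
               '"GET /index.html HTTP/1.0" 304 -\n')

# Split on single spaces, exactly as the recipe does.
fields = sample_line.split(" ")

# Show every field next to its index so the position of the status is visible.
for index, field in enumerate(fields):
    print("%d: %r" % (index, field))

# The timestamp and the quoted request line each contain spaces, so they
# occupy several fields; the status code ends up at index 8.
status = fields[8]
print("status field: " + status)
```

If the log uses a custom format, or a request line contains extra spaces, the status code would sit at a different index, so it is worth running a check like this against a few lines of your own log before trusting the count.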
The body of the for loop calls the split method on each line string, with a string of a single space as the argument, to split the line into a list of its space-separated fields. Then it uses indexing ([8]) to get the ninth such field. Apache puts the status code into the ninth field of each line in the log. Code "304" means "not modified" (i.e., the client's cache was already up to date). We count those cases in the cachedRequests variable and all lines in the log in the totalRequests variable, so that, in the end, we can return the percentage of cache hits. The expression we use in the return statement computes the percentage as a float, then adds 0.5 before truncating with int so that the result is rounded to the closest integer, because an integer result is most useful in practice.

See Also

The Apache web server is available and documented at http://httpd.apache.org.