Apache provides several tools for managing your logs. Other Apache-specific third-party tools are available and are mentioned here. Because Apache can log requests in the Common Log Format (CLF), most generic log-processing tools can be used with Apache as well.
Earlier in the chapter, you learned how to use the HostnameLookups directive to enable or disable hostname resolution at the time the request is made. If HostnameLookups is set to off (the default), the log file will contain only IP addresses. Later, you can use the command-line logresolve utility on UNIX or logresolve.exe on Windows to process the log file and convert the IP addresses to hostnames.
The logresolve utility reads log entries from its standard input and writes the result to its standard output. To read from one file and write to another, you can use redirection on both UNIX and Windows:
logresolve < access.log > resolved.log
Resolving hostnames after the fact is efficient: the tool can cache results, so each address needs to be looked up only once, and the lookups cause no delay when serving requests to clients.
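The caching idea can be illustrated with a short Python sketch. This is not how logresolve itself is implemented; the function name resolve_log_lines and its lookup parameter are hypothetical, and the sketch assumes each log line begins with the client IP address followed by a space, as in the CLF.

```python
import socket
from typing import Callable, Dict, Iterable, Iterator


def resolve_log_lines(
    lines: Iterable[str],
    lookup: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
) -> Iterator[str]:
    """Yield each log line with its leading IP address replaced by a hostname."""
    cache: Dict[str, str] = {}  # IP address -> hostname (or the IP itself on failure)
    for line in lines:
        # Assumption: the first space-separated field is the client address.
        ip, sep, rest = line.partition(" ")
        if ip not in cache:
            try:
                cache[ip] = lookup(ip)
            except OSError:
                # Reverse lookup failed; keep the address unchanged.
                cache[ip] = ip
        yield cache[ip] + sep + rest
```

Because results are cached, a busy client that appears thousands of times in the log triggers only one DNS query. Passing a different lookup function makes it possible to try the sketch without network access.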
On high-traffic websites, access log files can grow quickly. You should have a mechanism to rotate logs periodically, archiving and compressing older logs at defined intervals.
Log files should not be removed while Apache is running because the server is writing directly to them. One solution is to have Apache send its log entries to an intermediate program, which in turn takes care of rotating the logs.
Apache provides the rotatelogs program on UNIX and rotatelogs.exe on Windows for this purpose. It accepts three arguments: a filename, a rotation interval in seconds, and an optional offset in minutes from UTC (Coordinated Universal Time). For example, the directive
TransferLog "|bin/rotatelogs /var/logs/apachelog 86400"
starts a new log file in the /var/logs directory once a day. (At the end of the command, 86400 is the number of seconds in one day.)
By the Way
If the path to the program includes spaces, you might need to escape them by prefixing each one with a \ (backslash); for example, My\ Documents. This is especially common on the Windows platform.
If the name of the file includes %-prefixed options, the name is treated as input to the strftime function, which converts the % options to time values. The manual page for the rotatelogs utility contains a complete listing of options, but here's an example:
TransferLog "|bin/rotatelogs /var/logs/apachelog%m_%d_%y 86400"
This directive adds the current month, day, and two-digit year to the log filename.
If the name does not include any %-formatted options, the time in seconds is appended to the filename instead.
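The naming rule just described can be sketched in Python. This is only an illustration, not the rotatelogs source: the function rotated_name is hypothetical, and the sketch assumes that the name is derived from the start of the current rotation interval, expressed in UTC with no offset.

```python
import time


def rotated_name(pattern: str, now: float, interval: int = 86400) -> str:
    """Return the log filename for the rotation interval that contains `now`."""
    # Assumption: names are based on the start of the interval, in UTC.
    start = int(now) - int(now) % interval
    if "%" in pattern:
        # %-prefixed options are expanded by strftime.
        return time.strftime(pattern, time.gmtime(start))
    # No % options: append the time in seconds as a suffix.
    return f"{pattern}.{start}"
```

For a timestamp of 1000000000 (September 9, 2001, UTC), rotated_name("/var/logs/apachelog%m_%d_%y", 1000000000) returns "/var/logs/apachelog09_09_01", whereas rotated_name("/var/logs/apachelog", 1000000000) returns "/var/logs/apachelog.999993600".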
Merging and Splitting Logs
When you have a cluster of web servers serving similar content, perhaps behind a load balancer, you often need to merge the logs from all the servers into a single log stream before passing it to analysis tools.
Similarly, if a single Apache server instance handles several virtual hosts, it is sometimes useful to split a single log file into separate files, one per virtual host.
Logtools is a collection of log-manipulation tools that can be found at http://www.coker.com.au/logtools/. Additionally, Apache includes the split-logfile Perl script for splitting logs. You can find it in the support subdirectory of the Apache distribution.
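To make the two operations concrete, here is a minimal Python sketch of merging and splitting. It does not reproduce the tools mentioned above; the function names are hypothetical. The merge assumes that every input log is already in time order and uses the bracketed CLF timestamp, and the split assumes that the virtual host name has been logged as the first field of each line.

```python
import heapq
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Iterator, List


def clf_time(line: str) -> datetime:
    """Parse the bracketed CLF timestamp, e.g. [10/Oct/2000:13:55:36 -0700]."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*logs: Iterable[str]) -> Iterator[str]:
    """Merge logs that are each already in time order into one ordered stream."""
    return heapq.merge(*logs, key=clf_time)


def split_by_vhost(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group lines by their first field, assumed to be the virtual host name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        vhost, _, rest = line.partition(" ")
        # Host names are case-insensitive, so normalize the key.
        groups[vhost.lower()].append(rest)
    return dict(groups)
```

Because the timestamps carry a time zone offset, entries from servers in different zones are still ordered correctly. In practice, the grouped lines would be written to one file per virtual host rather than kept in memory.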
After you collect the logs, you can analyze them and gain information about traffic and visitor behavior.
Many commercial, shareware, and freeware applications are available for log analysis and reporting. Two popular open source applications are Webalizer (http://www.mrunix.net/webalizer/) and awstats (http://awstats.sourceforge.net/).
Wusage is a nice, inexpensive commercial alternative and can be found at http://www.boutell.com/wusage/.
Monitoring Error Logs
If you run Apache on a UNIX system, you can use the tail command-line utility to monitor, in real time, entries as they are written to your access and error logs. The syntax is
tail -f logname
where logname is the path to the Apache log file. The command prints the last few lines of the log file and then continues to print new entries as they are added to the file.
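On systems without tail, the same follow behavior can be approximated with a short Python sketch. The function names new_lines and follow, and the one-second polling interval, are assumptions made for this illustration. Unlike tail -f, the sketch starts at the end of the file and does not print the last few existing lines first.

```python
import os
import time
from typing import Iterator, TextIO


def new_lines(handle: TextIO) -> Iterator[str]:
    """Yield complete lines appended to the file since the last read."""
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line.endswith("\n"):
            # Nothing new, or only part of a line so far; rewind and retry later.
            handle.seek(pos)
            return
        yield line


def follow(path: str, poll_seconds: float = 1.0) -> Iterator[str]:
    """Yield new log entries as they are written, until interrupted."""
    with open(path, "r", errors="replace") as handle:
        handle.seek(0, os.SEEK_END)  # skip entries that already exist
        while True:
            yield from new_lines(handle)
            time.sleep(poll_seconds)
```

A loop such as for entry in follow(logname): print(entry, end="") then prints entries as they arrive, where logname is again the path to the Apache log file.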
You can find additional programs that enable you to quickly identify problems by scanning your error log files for specific errors, malformed requests, and so on, and reporting on them: