Standard Apache Access Logging


Using Apache's basic logging features, you can keep track of who visits your websites by logging accesses to the servers hosting them. You can log every aspect of the browser requests and server responses, including the IP address of the client, user, and resource accessed. You need to take three steps to create a request log:

1. Define what you want to log: your log format.

2. Define where you want to log it: your log files, a database, an external program.

3. Define whether to log: conditional logging rules.

The next few sections take a closer look at these steps.

Deciding What to Log

As well as logging nearly every aspect associated with the request, you can define how your log entries appear by creating a log format. A log format is a string that contains text mixed with log formatting directives. Log formatting directives start with a % and are followed by a directive name or identifier, usually a letter indicating the piece of information to be logged.

When Apache logs a request, it scans the string and substitutes the value for each directive. For example, if the log format is This is the client address %a, the log entry is something like This is the client address 10.0.0.2. That is, the logging directive %a is replaced by the IP address of the client making the request. Table 26.1 provides a comprehensive list of all formatting directives.

Table 26.1. Log Formatting Directives

Formatting Option    Explanation

Data from the Client

%a    Remote IP address, from the client.
%h    Hostname or IP address of the client making the request. Whether the hostname is logged depends on two factors: The IP address of the client must resolve to a hostname via a reverse DNS lookup, and Apache must be configured to do that lookup using the HostNameLookups directive, explained later in this chapter. If these conditions are not met, the IP address of the client is logged instead of the hostname.
%l    Remote user, obtained via the identd protocol. This option is not very useful because this protocol is not supported on the majority of client machines.
%u    Remote user, from the HTTP basic authentication protocol.

Data from the Server

%A    Local IP address, from the server.
%D    Time it took to serve the request, in microseconds.
%{env_variable}e    Value of the environment variable named env_variable (there are many).
%{time_format}t    Current time, with time_format interpreted as an argument to the UNIX strftime function. See the strftime manual page for details.
%T    Time it took to serve the request, in seconds.
%v    Canonical name of the server that answered the request.
%V    Server name according to the UseCanonicalName directive.
%X    Status of the connection to the server. A value of X means the connection was aborted before the server could send the data. A + means the connection will be kept alive for further requests from the same client. A - means the connection will be closed.

Data from the Request

%{cookie_name}C    Value of the cookie named cookie_name.
%H    Request protocol, such as HTTP/1.0 or HTTP/1.1.
%m    Request method, such as GET, POST, PUT, and so on.
%{header_name}i    Value of the header named header_name in the request from the client. This information can be useful, for example, to log the names and versions of your visitors' browsers.
%r    First line of the original HTTP request.
%q    Query parameters, if any, prefixed by a ?.
%U    Requested URL, without query parameters.
%y    Username for the HTTP authentication (basic or digest).

Data from the Response

%b, %B    Size, in bytes, of the body of the response sent back to the client (excluding headers). The only difference between the options is that if no data was sent, %b logs a - and %B logs 0.
%f    Path of the file served, if any.
%t    Time when the request was served.
%{header_name}o    Value of the header named header_name in the response to the client.
%>s    Final status code. Apache can process the same request several times (internal redirects). This is the status code of the final response.
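As a rough illustration of how several directives from Table 26.1 combine into one format string, the following sketch logs the client, the request line, the final status, the response size, the time taken in microseconds, and the browser. The nickname timing and the file name logs/timing_log are made up for this example, and the LogFormat and CustomLog directives used here are covered later in this chapter.

```apache
# Hypothetical nickname "timing": literal text (spaces, quotes) is copied as-is,
# and each % directive is replaced with the matching value for every request.
LogFormat "%h %t \"%r\" %>s %b %D \"%{User-agent}i\"" timing

# Illustrative destination file; adjust the path to your installation.
CustomLog logs/timing_log timing
```

The escaped quotes around %r and the User-agent header keep values that contain spaces together, which makes the resulting entries easier for log analysis tools to split.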


The Common Log Format (CLF) is a standard log format. Most websites can log requests using this format, and the format is understood by many log processing and reporting tools. Its format is the following:

"%h %l %u %t \"%r\" %>s %b"


That is, it includes the hostname or IP address of the client, remote user via identd, remote user via HTTP authentication, time when the request was served, text of the request, status code, and size in bytes of the content served.

By the Way

You can read the Common Log Format documentation of the original W3C server at http://www.w3.org/Daemon/User/Config/Logging.html.


The following is a sample CLF entry:

10.0.0.1 - - [23/Apr/2006:11:27:56 -0800] "GET / HTTP/1.1" 200 1456


You are now ready to learn how to define log formats using the LogFormat directive. This directive takes two arguments: The first argument is a logging string, and the second is a nickname that will be associated with that logging string.

For example, the following directive from the default Apache configuration file defines the CLF and assigns it the nickname common:

LogFormat "%h %l %u %t \"%r\" %>s %b" common


You can also use the LogFormat directive with only one argument, either a log format string or a nickname. This sets the default logging format used by the TransferLog directive, explained in "Logging Accesses to Files" later in this chapter.
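A short sketch of the single-argument nickname form, assuming the common nickname has already been defined as shown earlier. The log file path is only illustrative.

```apache
# Define the nickname once (two-argument form).
LogFormat "%h %l %u %t \"%r\" %>s %b" common

# Single-argument form: make "common" the default format
# for any TransferLog directives that follow.
LogFormat common

# Uses the default format set by the line above.
TransferLog logs/access_log
```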

The HostNameLookups Directive

When a client makes a request, Apache knows only the IP address of the client. Apache must perform what is called a reverse DNS lookup to find out the hostname associated with the IP address. This operation can be time-consuming and can introduce a noticeable lag in the request processing. The HostNameLookups directive allows you to control whether to perform the reverse DNS lookup.

The HostNameLookups directive can take one of the following arguments: on, off, or double. The default is off. The double lookup argument means that Apache will find out the hostname from the IP and then will try to find the IP from the hostname. This process is necessary if you are really concerned with security, as described in http://httpd.apache.org/docs-2.0/dns-caveats.html. If you are using hostnames as part of your Allow and Deny rules, a double DNS lookup is performed regardless of the HostNameLookups settings.

If HostNameLookups is enabled (on or double), Apache will log the hostname, and the result of the lookup will also be passed to CGI scripts via the environment variable REMOTE_HOST. The lookups put extra load on your server, which you should weigh when deciding whether to turn HostNameLookups on or off. If you keep HostNameLookups off, which is recommended for medium-to-high traffic sites, Apache will log only the IP address. There are plenty of tools to resolve the IP addresses in the logs later. Refer to the "Managing Apache Logs" section later in this chapter.
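A minimal sketch of the low-overhead setup described above, assuming the CLF nickname common and an illustrative log path. With lookups off, %h records the client's IP address rather than a hostname.

```apache
# Skip reverse DNS lookups during request processing;
# %h will then log the client IP address.
HostNameLookups Off

LogFormat "%h %l %u %t \"%r\" %>s %b" common

# Illustrative path; addresses can be resolved from this file later.
CustomLog logs/access_log common
```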

The IdentityCheck Directive

At the beginning of the chapter, we explained how to log the remote username via the identd protocol using the %l log formatting directive. The IdentityCheck directive takes a value of on or off to enable or disable checking for that value and making it available for inclusion in the logs. Because the information is not reliable and takes a long time to check, it is switched off by default and should probably never be enabled. We mentioned %l only because it is part of the CLF. For more information on the identd protocol, see RFC 1413 at http://www.rfc-editor.org/rfc/rfc1413.txt.

Status Code

You can specify whether to log specific elements in a log entry. At the beginning of the chapter, you learned that log directives start with a %, followed by a directive identifier. In between, you can insert a list of status codes, separated by commas. If the request status is one of the listed codes, the parameter will be logged; otherwise, a - will be logged.

For example, the following directive identifier logs the browser name and version for malformed requests (status code 400) and requests with methods not implemented (status code 501). This information can be useful for tracking which clients are causing problems.

%400,501{User-agent}i


You can precede the status code list with an ! to log the parameter only if the request status is not one of the listed codes:

%!400,501{User-agent}i
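To show where such a conditional identifier fits, here is a hedged sketch of a complete format string that records the browser only for 400 and 501 responses. The nickname problemclients and the file logs/problem_log are invented for this example.

```apache
# For requests ending in 400 or 501, the last field holds the User-agent header;
# for every other status it is logged as "-".
LogFormat "%h %t \"%r\" %>s \"%400,501{User-agent}i\"" problemclients

# Illustrative destination file.
CustomLog logs/problem_log problemclients
```

Note that the status list affects only the one field it is attached to. Every request still produces a log entry.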


Logging Accesses to Files

Logging to files is the default way of logging requests in Apache. You can define the name of the file using the TransferLog and CustomLog directives.

The TransferLog directive takes a file argument and uses the latest log format defined by a LogFormat directive with a single argument (the nickname or the format string). If no log format is present, it defaults to the CLF.

The following example shows how to use the LogFormat and TransferLog directives to define a log format that is based on the CLF but that also includes the browser name:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{User-agent}i\""
TransferLog logs/access_log


The CustomLog directive enables you to specify the logging format explicitly. It takes at least two arguments: a destination file and a logging format. The logging format can be specified as a nickname or as a logging string directly.

For example, the directives

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{User-agent}i\"" myformat
CustomLog logs/access_log myformat


and

CustomLog logs/access_log "%h %l %u %t \"%r\" %>s %b \"%{User-agent}i\""


are equivalent.

Logging Environment Variables with CustomLog

The CustomLog directive accepts an environment variable as a third argument. If the environment variable is present, the entry will be logged; otherwise, it will not. If the environment variable is negated by prefixing an ! to it, the entry will be logged if the variable is not present.

The following example shows how to avoid logging images in GIF and JPEG format in your logs:

SetEnvIf Request_URI "(\.gif|\.jpg)$" image
CustomLog logs/access_log common env=!image
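The same environment variable can also be used without negation. The following sketch, which assumes the common nickname is already defined and uses an invented file name logs/image_log, sends image requests to their own file instead of discarding them:

```apache
# Mark requests whose URI ends in .gif or .jpg.
SetEnvIf Request_URI "(\.gif|\.jpg)$" image

# Everything except images goes to the main access log.
CustomLog logs/access_log common env=!image

# Image requests go to a separate, illustrative log file.
CustomLog logs/image_log common env=image
```

Splitting the logs this way keeps the main access log compact while still leaving a record of image traffic if you need it later.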


By the Way

The regular expressions used for pattern matching in this and other areas of the httpd.conf file follow the same format as regular expressions in PHP and other programming languages.


Logging Accesses to a Program

Both the TransferLog and CustomLog directives can accept an executable program, prefixed by a pipe sign |, as an argument. Apache will write the log entries to the standard input of this program. The program will, in turn, process the input by logging the entries to a database, transmitting them to another system, and so on.

If the program dies for some reason, the server makes sure that it is restarted. If the server stops, the program is stopped as well. The rotatelogs utility, bundled with Apache and explained later in this chapter, is an example of a logging program.
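As a rough sketch of the pipe syntax, the following directive hands log entries to rotatelogs. Both paths are assumptions about where Apache is installed, the common nickname is assumed to be defined already, and 86400 is the rotation interval in seconds (24 hours).

```apache
# The leading | tells Apache to start the program and write each
# log entry to its standard input instead of writing to a file.
# Paths are illustrative; adjust them to your installation.
CustomLog "|/usr/local/apache2/bin/rotatelogs /usr/local/apache2/logs/access_log 86400" common
```

The quotes are needed because the program name and its arguments must reach CustomLog as a single argument.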

As a general rule, unless you have a specific requirement for using a particular program, it is easier and more reliable to log to a file on disk and do the processing, merging, analysis of logs, and so on, at a later time, possibly on a different machine.

By the Way

Make sure that the program you use for logging requests is secure because it runs as the user Apache was started with. On UNIX, this usually means root because the external program will be started before the server changes its user ID to the value of the User directive, typically nobody or www.





Sams Teach Yourself PHP, MySQL and Apache All in One (3rd Edition)
ISBN: 0672328739
EAN: 2147483647
Year: 2004
Pages: 327
