21.2 Log Formats

Several log formats have become standard, and we'll discuss some of the most common formats in this section. Most commercial and open source HTTP applications support logging in one or more of these common formats. Many of these applications also let administrators configure log formats and create their own custom formats.

One of the main benefits of supporting these standard formats (for applications) and using them (for administrators) is the ability to leverage the tools that have been built to process these logs and generate basic statistics from them. Many open source and commercial packages exist to crunch logs for reporting purposes, and by using standard formats, applications and their administrators can plug into these resources.

21.2.1 Common Log Format

One of the most common log formats in use today is called, appropriately, the Common Log Format. It was originally defined by NCSA, and many servers use it as their default log format. Most commercial and open source servers can be configured to use this format, and many commercial and freeware tools exist to help parse common log files. Table 21-1 lists, in order, the fields of the Common Log Format.

Table 21-1. Common Log Format fields

| Field | Description |
| --- | --- |
| remotehost | The hostname or IP address of the requestor's machine (IP if the server was not configured to perform reverse DNS or cannot look up the requestor's hostname) |
| username | If an ident lookup was performed, the requestor's authenticated username [1] |
| auth-username | If authentication was performed, the username with which the requestor authenticated |
| timestamp | The date and time of the request |
| request-line | The exact text of the HTTP request line, for example "GET /index.html HTTP/1.1" |
| response-code | The HTTP status code that was returned in the response |
| response-size | The Content-Length of the response entity; if no entity was returned in the response, a zero is logged |

[1] RFC 931 describes the ident lookup used in this authentication. The ident protocol was discussed in Chapter 5.

Example 21-1 lists a few examples of Common Log Format entries.

Example 21-1. Common Log Format
 209.1.32.44 - - [03/Oct/1999:14:16:00 -0400] "GET / HTTP/1.0" 200 1024 
 http-guide.com - dg [03/Oct/1999:14:16:32 -0400] "GET / HTTP/1.0" 200 477 
 http-guide.com - dg [03/Oct/1999:14:16:32 -0400] "GET /foo HTTP/1.0" 404 0 

In these examples, the fields are assigned as follows:

| Field | Entry 1 | Entry 2 | Entry 3 |
| --- | --- | --- | --- |
| remotehost | 209.1.32.44 | http-guide.com | http-guide.com |
| username | <empty> | <empty> | <empty> |
| auth-username | <empty> | dg | dg |
| timestamp | 03/Oct/1999:14:16:00 -0400 | 03/Oct/1999:14:16:32 -0400 | 03/Oct/1999:14:16:32 -0400 |
| request-line | GET / HTTP/1.0 | GET / HTTP/1.0 | GET /foo HTTP/1.0 |
| response-code | 200 | 200 | 404 |
| response-size | 1024 | 477 | 0 |

Note that the remotehost field can be either a hostname, as in http-guide.com, or an IP address, such as 209.1.32.44.

The dashes in the second (username) and third (auth-username) fields indicate that the fields are empty. This indicates that either an ident lookup did not occur (second field empty) or authentication was not performed (third field empty).
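The field layout in Table 21-1 maps naturally onto a regular expression. The following is a minimal Python sketch, not a reference parser: the names `CLF_PATTERN` and `parse_common` are made up for illustration, the pattern assumes the request line contains no embedded double quotes, and it assumes that some servers may log a dash instead of a zero for an empty response size.

```python
import re
from datetime import datetime

# One named group per Common Log Format field, in Table 21-1 order.
CLF_PATTERN = re.compile(
    r'^(?P<remotehost>\S+) (?P<username>\S+) (?P<auth_username>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request_line>[^"]*)" '
    r'(?P<response_code>\d{3}) (?P<response_size>\d+|-)$'
)

def parse_common(line):
    """Return a dict of Common Log Format fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # A lone dash marks an empty username or auth-username field.
    for name in ("username", "auth_username"):
        if entry[name] == "-":
            entry[name] = None
    # Example: 03/Oct/1999:14:16:32 -0400 (month names assume an English locale).
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    entry["response_code"] = int(entry["response_code"])
    # Assumption: treat a dash in the size field the same as zero.
    size = entry["response_size"]
    entry["response_size"] = 0 if size == "-" else int(size)
    return entry

entry = parse_common('http-guide.com - dg [03/Oct/1999:14:16:32 -0400] "GET /foo HTTP/1.0" 404 0')
print(entry["remotehost"], entry["auth_username"], entry["response_code"])
```

Returning `None` for lines that do not match lets a caller count or skip malformed entries instead of stopping at the first one.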

21.2.2 Combined Log Format

Another commonly used log format is the Combined Log Format. This format is supported by servers such as Apache. The Combined Log Format is very similar to the Common Log Format; in fact, it mirrors it exactly, with the addition of two fields (listed in Table 21-2). The User-Agent field is useful in noting which HTTP client applications are making the logged requests, while the Referer field provides more detail about where the requestor found this URL.

Table 21-2. Additional Combined Log Format fields

| Field | Description |
| --- | --- |
| Referer | The contents of the Referer HTTP header |
| User-Agent | The contents of the User-Agent HTTP header |

Example 21-2 gives an example of a Combined Log Format entry.

Example 21-2. Combined Log Format
 209.1.32.44 - - [03/Oct/1999:14:16:00 -0400] "GET / HTTP/1.0" 200 1024 "http://www.joes-hardware.com/" "5.0: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"

In Example 21-2, the Referer and User-Agent fields are assigned as follows:

| Field | Value |
| --- | --- |
| Referer | http://www.joes-hardware.com/ |
| User-Agent | 5.0: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98) |

The first seven fields of the example Combined Log Format entry in Example 21-2 are exactly as they would be in the Common Log Format (see the first entry in Example 21-1). The two new fields, Referer and User-Agent, are tacked onto the end of the log entry.
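Because the two new fields are simply appended as quoted strings, a Common Log Format pattern can be extended with two more groups. The sketch below is one way to do that in Python; `COMBINED_PATTERN` and `parse_combined` are illustrative names, and the pattern assumes that neither header value contains an embedded double quote.

```python
import re

# The seven Common Log Format fields, followed by the quoted
# Referer and User-Agent header values.
COMBINED_PATTERN = re.compile(
    r'^(?P<remotehost>\S+) (?P<username>\S+) (?P<auth_username>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request_line>[^"]*)" '
    r'(?P<response_code>\d{3}) (?P<response_size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)

def parse_combined(line):
    """Return the nine Combined Log Format fields as strings, or None on no match."""
    match = COMBINED_PATTERN.match(line.strip())
    return match.groupdict() if match else None

line = ('209.1.32.44 - - [03/Oct/1999:14:16:00 -0400] "GET / HTTP/1.0" 200 1024 '
        '"http://www.joes-hardware.com/" '
        '"5.0: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"')
entry = parse_combined(line)
print(entry["referer"])
print(entry["user_agent"])
```

A plain Common Log Format line does not match this pattern, so a caller that has to handle both formats could try the combined pattern first and fall back to the shorter one.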

21.2.3 Netscape Extended Log Format

When Netscape entered into the commercial HTTP application space, it defined for its servers many log formats that have been adopted by other HTTP application developers. Netscape's formats derive from the NCSA Common Log Format, but they extend that format to incorporate fields relevant to HTTP applications such as proxies and web caches.

The first seven fields in the Netscape Extended Log Format are identical to those in the Common Log Format (see Table 21-1). Table 21-3 lists, in order, the new fields that the Netscape Extended Log Format introduces.

Table 21-3. Additional Netscape Extended Log Format fields

| Field | Description |
| --- | --- |
| proxy-response-code | If the transaction went through a proxy, the HTTP response code from the server to the proxy |
| proxy-response-size | If the transaction went through a proxy, the Content-Length of the server's response entity sent to the proxy |
| client-request-size | The Content-Length of any body or entity in the client's request to the proxy |
| proxy-request-size | If the transaction went through a proxy, the Content-Length of any body or entity in the proxy's request to the server |
| client-request-hdr-size | The length, in bytes, of the client's request headers |
| proxy-response-hdr-size | If the transaction went through a proxy, the length, in bytes, of the proxy's response headers that were sent to the requestor |
| proxy-request-hdr-size | If the transaction went through a proxy, the length, in bytes, of the proxy's request headers that were sent to the server |
| server-response-hdr-size | The length, in bytes, of the server's response headers |
| proxy-timestamp | If the transaction went through a proxy, the elapsed time for the request and response to travel through the proxy, in seconds |

Example 21-3 gives an example of a Netscape Extended Log Format entry.

Example 21-3. Netscape Extended Log Format
 209.1.32.44 - - [03/Oct/1999:14:16:00-0400] "GET / HTTP/1.0" 200 1024 200 1024 0 0 215 260 279 254 3

In this example, the extended fields are assigned as follows:

| Field | Value |
| --- | --- |
| proxy-response-code | 200 |
| proxy-response-size | 1024 |
| client-request-size | 0 |
| proxy-request-size | 0 |
| client-request-hdr-size | 215 |
| proxy-response-hdr-size | 260 |
| proxy-request-hdr-size | 279 |
| server-response-hdr-size | 254 |
| proxy-timestamp | 3 |

The first seven fields of the example Netscape Extended Log Format entry in Example 21-3 mirror the entries in the Common Log Format example (see the first entry in Example 21-1).
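Since the nine extended fields are whitespace-separated and sit at the end of the entry, they can be picked off without parsing the Common Log Format prefix. The Python sketch below illustrates this under stated assumptions: `EXTENDED_FIELDS` and `parse_extended_tail` are hypothetical names, and any non-numeric value (such as a dash) is mapped to `None`, which is a guess about how a missing value might be logged.

```python
# Field names from Table 21-3, in the order they appear in the log entry.
EXTENDED_FIELDS = [
    "proxy-response-code", "proxy-response-size", "client-request-size",
    "proxy-request-size", "client-request-hdr-size", "proxy-response-hdr-size",
    "proxy-request-hdr-size", "server-response-hdr-size", "proxy-timestamp",
]

def parse_extended_tail(line):
    """Map the last nine whitespace-separated tokens to the Table 21-3 fields."""
    tokens = line.split()
    if len(tokens) < len(EXTENDED_FIELDS):
        raise ValueError("too few fields for a Netscape Extended entry")
    tail = tokens[-len(EXTENDED_FIELDS):]
    # Assumption: a non-numeric token means the value was not recorded.
    return {
        name: int(token) if token.isdigit() else None
        for name, token in zip(EXTENDED_FIELDS, tail)
    }

line = ('209.1.32.44 - - [03/Oct/1999:14:16:00-0400] "GET / HTTP/1.0" '
        '200 1024 200 1024 0 0 215 260 279 254 3')
fields = parse_extended_tail(line)
print(fields["client-request-hdr-size"], fields["proxy-timestamp"])
```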

21.2.4 Netscape Extended 2 Log Format

Another Netscape log format, the Netscape Extended 2 Log Format, takes the Extended Log Format and adds further information relevant to HTTP proxy and web caching applications. These extra fields help paint a better picture of the interactions between an HTTP client and an HTTP proxy application.

The Netscape Extended 2 Log Format derives from the Netscape Extended Log Format, and its initial fields are identical to those listed in Table 21-3 (it also extends the Common Log Format fields listed in Table 21-1).

Table 21-4 lists, in order, the additional fields of the Netscape Extended 2 Log Format.

Table 21-4. Additional Netscape Extended 2 Log Format fields

| Field | Description |
| --- | --- |
| route | The route that the proxy used to make the request for the client (see Table 21-5) |
| client-finish-status-code | The client finish status code; specifies whether the client request to the proxy completed successfully (FIN) or was interrupted (INTR) |
| proxy-finish-status-code | The proxy finish status code; specifies whether the proxy request to the server completed successfully (FIN) or was interrupted (INTR) |
| cache-result-code | The cache result code; tells how the cache responded to the request [2] |

[2] Table 21-7 lists the Netscape cache result codes.

Example 21-4 gives an example of a Netscape Extended 2 Log Format entry.

Example 21-4. Netscape Extended 2 Log Format
 209.1.32.44 - - [03/Oct/1999:14:16:00-0400] "GET / HTTP/1.0" 200 1024 200 1024 0 0 215 260 279 254 3 DIRECT FIN FIN WRITTEN

The extended fields in this example are assigned as follows:

| Field | Value |
| --- | --- |
| route | DIRECT |
| client-finish-status-code | FIN |
| proxy-finish-status-code | FIN |
| cache-result-code | WRITTEN |

The first 16 fields in the Netscape Extended 2 Log Format entry in Example 21-4 mirror the entries in the Netscape Extended Log Format example (see Example 21-3).

Table 21-5 lists the valid Netscape route codes.

Table 21-5. Netscape route codes

| Value | Description |
| --- | --- |
| DIRECT | The resource was fetched directly from the server. |
| PROXY(host:port) | The resource was fetched through the proxy "host." |
| SOCKS(socks:port) | The resource was fetched through the SOCKS server "socks." |

Table 21-6 lists the valid Netscape finish codes.

Table 21-6. Netscape finish status codes

| Value | Description |
| --- | --- |
| - | The request never even started. |
| FIN | The request was completed successfully. |
| INTR | The request was interrupted by the client or ended by a proxy/server. |
| TIMEOUT | The request was timed out by the proxy/server. |

Table 21-7 lists the valid Netscape cache codes. [3]

[3] Chapter 7 discusses HTTP caching in detail.

Table 21-7. Netscape cache codes

| Code | Description |
| --- | --- |
| - | The resource was uncacheable. |
| WRITTEN | The resource was written into the cache. |
| REFRESHED | The resource was cached and it was refreshed. |
| NO-CHECK | The cached resource was returned; no freshness check was done. |
| UP-TO-DATE | The cached resource was returned; a freshness check was done. |
| HOST-NOT-AVAILABLE | The cached resource was returned; no freshness check was done because the remote server was not available. |
| CL-MISMATCH | The resource was not written to the cache; the write was aborted because the Content-Length did not match the resource size. |
| ERROR | The resource was not written to the cache due to some error; for example, a timeout occurred or the client aborted the transaction. |
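The four trailing fields of a Netscape Extended 2 entry can be checked against the route, finish status, and cache codes listed in Tables 21-5 through 21-7. The Python sketch below shows one possible approach; the function and constant names are illustrative, and the route pattern assumes that the host or SOCKS server name contains no colons or parentheses.

```python
import re

# Route values from Table 21-5; host and port appear only for PROXY and SOCKS.
ROUTE_PATTERN = re.compile(
    r'^(?P<kind>DIRECT|PROXY|SOCKS)(?:\((?P<host>[^:()]+):(?P<port>\d+)\))?$'
)
FINISH_CODES = {"-", "FIN", "INTR", "TIMEOUT"}  # Table 21-6
CACHE_CODES = {                                 # Table 21-7
    "-", "WRITTEN", "REFRESHED", "NO-CHECK", "UP-TO-DATE",
    "HOST-NOT-AVAILABLE", "CL-MISMATCH", "ERROR",
}

def parse_extended2_tail(line):
    """Parse and validate the last four tokens of a Netscape Extended 2 entry."""
    tokens = line.split()
    if len(tokens) < 4:
        raise ValueError("too few fields for a Netscape Extended 2 entry")
    route, client_finish, proxy_finish, cache_result = tokens[-4:]
    match = ROUTE_PATTERN.match(route)
    if match is None:
        raise ValueError(f"unknown route: {route!r}")
    if client_finish not in FINISH_CODES or proxy_finish not in FINISH_CODES:
        raise ValueError("unknown finish status code")
    if cache_result not in CACHE_CODES:
        raise ValueError(f"unknown cache result code: {cache_result!r}")
    port = match.group("port")
    return {
        "route": match.group("kind"),
        "route-host": match.group("host"),          # None for DIRECT
        "route-port": int(port) if port else None,  # None for DIRECT
        "client-finish-status-code": client_finish,
        "proxy-finish-status-code": proxy_finish,
        "cache-result-code": cache_result,
    }

line = ('209.1.32.44 - - [03/Oct/1999:14:16:00-0400] "GET / HTTP/1.0" '
        '200 1024 200 1024 0 0 215 260 279 254 3 DIRECT FIN FIN WRITTEN')
print(parse_extended2_tail(line))
```

Rejecting unknown codes is a design choice for this sketch; a log-processing tool might prefer to record them and carry on, since other applications that borrow the format may not use exactly this set.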

Netscape applications, like many other HTTP applications, have other log formats too, including a Flexible Log Format and a means for administrators to output custom log fields. These formats allow administrators greater control and the ability to customize their logs by choosing which parts of the HTTP transaction (headers, status, sizes, etc.) to report in their logs.

The ability for administrators to configure custom formats was added because it is difficult to predict what information administrators will be interested in getting from their logs. Many other proxies and servers also have the ability to emit custom logs.

21.2.5 Squid Proxy Log Format

The Squid proxy cache (http://www.squid-cache.org) is a venerable part of the Web. Its roots trace back to one of the early web proxy cache projects (ftp://ftp.cs.colorado.edu/pub/techreports/schwartz/Harvest.Conf.ps.Z). Squid is an open source project that has been extended and enhanced by the open source community over the years. Many tools have been written to help administer the Squid application, including tools to help process, audit, and mine its logs. Many subsequent proxy caches adopted the Squid format for their own logs so that they could leverage these tools.

The format of a Squid log entry is fairly simple. Its fields are summarized in Table 21-8.

Table 21-8. Squid Log Format fields

| Field | Description |
| --- | --- |
| timestamp | The timestamp when the request arrived, in seconds since January 1, 1970 GMT. |
| time-elapsed | The elapsed time for request and response to travel through the proxy, in milliseconds. |
| host-ip | The IP address of the client's (requestor's) host machine. |
| result-code/status | The result-code part is a Squid-ism that tells what action the proxy took during this request [4]; the status part is the HTTP response code that the proxy sent to the client. |
| size | The length of the proxy's response to the client, including HTTP response headers and body, in bytes. |
| method | The HTTP method of the client's request. |
| url | The URL in the client's request. [5] |
| rfc931-ident [6] | The client's authenticated username. [7] |
| hierarchy/from | Like the route field in Netscape formats, the hierarchy part tells what route the proxy used to make the request for the client. [8] The from part tells the name of the server that the proxy used to make the request. |
| content-type | The Content-Type of the proxy response entity. |

[4] Table 21-9 lists the various result codes and their meanings.

[5] Recall from Chapter 2 that proxies often log the entire requested URL, so if a username and password component are in the URL, a proxy can inadvertently record this information.

[6] The rfc931-ident, hierarchy/from, and content-type fields were added in Squid 1.1. Previous versions did not have these fields.

[7] RFC 931 describes the ident lookup used in this authentication.

[8] http://squid.nlanr.net/Doc/FAQ/FAQ-6.html#ss6.6 lists all of the valid Squid hierarchy codes.

Example 21-5 gives an example of a Squid Log Format entry.

Example 21-5. Squid Log Format
 99823414 3001 209.1.32.44 TCP_MISS/200 4087 GET http://www.joes-hardware.com - DIRECT/proxy.com text/html

The fields are assigned as follows:

| Field | Value |
| --- | --- |
| timestamp | 99823414 |
| time-elapsed | 3001 |
| host-ip | 209.1.32.44 |
| action-code | TCP_MISS |
| status | 200 |
| size | 4087 |
| method | GET |
| URL | http://www.joes-hardware.com |
| RFC 931 ident | - |
| hierarchy | DIRECT [9] |
| from | proxy.com |
| content-type | text/html |

[9] The DIRECT Squid hierarchy value is the same as the DIRECT route value in Netscape log formats.
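A Squid entry is a flat list of whitespace-separated tokens, two of which (result-code/status and hierarchy/from) pack a pair of values around a slash. The Python sketch below splits an entry on that basis. It is only an illustration: `parse_squid` is a made-up name, and the code assumes the ten-field layout of Table 21-8 with no embedded spaces in any field.

```python
from datetime import datetime, timezone

def parse_squid(line):
    """Split a Squid log entry into the fields of Table 21-8."""
    tokens = line.split()
    if len(tokens) != 10:
        raise ValueError(f"expected 10 fields, got {len(tokens)}")
    (timestamp, elapsed, host_ip, result_status, size,
     method, url, ident, hierarchy_from, content_type) = tokens
    # These two fields each carry a pair of values separated by a slash.
    result_code, _, status = result_status.partition("/")
    hierarchy, _, origin = hierarchy_from.partition("/")
    return {
        # Seconds since January 1, 1970 GMT.
        "timestamp": datetime.fromtimestamp(float(timestamp), tz=timezone.utc),
        "time-elapsed-ms": int(elapsed),
        "host-ip": host_ip,
        "result-code": result_code,
        "status": int(status),
        "size": int(size),
        "method": method,
        "url": url,
        "rfc931-ident": None if ident == "-" else ident,
        "hierarchy": hierarchy,
        "from": origin,
        "content-type": content_type,
    }

entry = parse_squid(
    "99823414 3001 209.1.32.44 TCP_MISS/200 4087 GET "
    "http://www.joes-hardware.com - DIRECT/proxy.com text/html"
)
print(entry["result-code"], entry["status"], entry["hierarchy"], entry["from"])
```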

Table 21-9 lists the various Squid result codes. [10]

[10] Several of these action codes deal more with the internals of the Squid proxy cache, so not all of them are used by other proxies that implement the Squid Log Format.

Table 21-9. Squid result codes

| Action | Description |
| --- | --- |
| TCP_HIT | A valid copy of the resource was served out of the cache. |
| TCP_MISS | The resource was not in the cache. |
| TCP_REFRESH_HIT | The resource was in the cache but needed to be checked for freshness. The proxy revalidated the resource with the server and found that the in-cache copy was indeed still fresh. |
| TCP_REF_FAIL_HIT | The resource was in the cache but needed to be checked for freshness. However, the revalidation failed (perhaps the proxy could not connect to the server), so the "stale" resource was returned. |
| TCP_REFRESH_MISS | The resource was in the cache but needed to be checked for freshness. Upon checking with the server, the proxy learned that the resource in the cache was out of date and received a new version. |
| TCP_CLIENT_REFRESH_MISS | The requestor sent a Pragma: no-cache or similar Cache-Control directive, so the proxy was forced to fetch the resource. |
| TCP_IMS_HIT | The requestor issued a conditional request, which was validated against the cached copy of the resource. |
| TCP_SWAPFAIL_MISS | The proxy thought the resource was in the cache but for some reason could not access it. |
| TCP_NEGATIVE_HIT | A cached response was returned, but the response was a negatively cached response. Squid supports the notion of caching errors for resources (for example, caching a 404 Not Found response), so if multiple requests go through the proxy cache for an invalid resource, the error is served from the proxy cache. |
| TCP_MEM_HIT | A valid copy of the resource was served out of the cache, and the resource was in the proxy cache's memory (as opposed to having to access the disk to retrieve the cached resource). |
| TCP_DENIED | The request for this resource was denied, probably because the requestor does not have permission to make requests for this resource. |
| TCP_OFFLINE_HIT | The requested resource was retrieved from the cache during its offline mode. Resources are not validated when Squid (or another proxy using this format) is in offline mode. |
| UDP_* | The UDP_* codes indicate that requests were received through the UDP interface to the proxy. HTTP normally uses the TCP transport protocol, so these requests are not using the HTTP protocol. [11] |
| UDP_HIT | A valid copy of the resource was served out of the cache. |
| UDP_MISS | The resource was not in the cache. |
| UDP_DENIED | The request for this resource was denied, probably because the requestor does not have permission to make requests for this resource. |
| UDP_INVALID | The request that the proxy received was invalid. |
| UDP_MISS_NOFETCH | Used by Squid during specific operation modes or in the case of frequent failures. A cache miss was returned and the resource was not fetched. |
| NONE | Logged sometimes with errors. |
| TCP_CLIENT_REFRESH | See TCP_CLIENT_REFRESH_MISS. |
| TCP_SWAPFAIL | See TCP_SWAPFAIL_MISS. |
| UDP_RELOADING | See UDP_MISS_NOFETCH. |

[11] Squid has its own protocol for making these requests: ICP. This protocol is used for cache-to-cache requests. See http://www.squid-cache.org for more information.
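As an illustration of the kind of basic statistics that log-processing tools produce, the Python sketch below tallies Squid result codes and derives a rough hit ratio. The log lines are invented for the example, the function names are hypothetical, and counting every code that ends in `_HIT` as a cache hit is a simplifying assumption rather than a rule defined by Squid.

```python
from collections import Counter

def result_code_counts(lines):
    """Count how often each Squid result code appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if len(tokens) < 4:
            continue  # skip blank or truncated lines
        # The fourth field is result-code/status, for example TCP_MISS/200.
        counts[tokens[3].partition("/")[0]] += 1
    return counts

def hit_ratio(counts):
    """Fraction of requests whose result code ends in _HIT (a simplifying assumption)."""
    total = sum(counts.values())
    hits = sum(n for code, n in counts.items() if code.endswith("_HIT"))
    return hits / total if total else 0.0

# Invented sample entries in the layout of Table 21-8.
sample = [
    "99823414 3001 209.1.32.44 TCP_MISS/200 4087 GET http://www.joes-hardware.com - DIRECT/proxy.com text/html",
    "99823420 12 209.1.32.44 TCP_HIT/200 4087 GET http://www.joes-hardware.com - NONE/- text/html",
    "99823425 3 209.1.32.45 TCP_MEM_HIT/200 4087 GET http://www.joes-hardware.com - NONE/- text/html",
    "99823430 1 209.1.32.46 TCP_DENIED/403 512 GET http://www.joes-hardware.com/private - NONE/- text/html",
]
counts = result_code_counts(sample)
print(dict(counts))
print(hit_ratio(counts))
```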

 



HTTP: The Definitive Guide
ISBN: 1565925092
Year: 2001
Pages: 294
