Chapter 14: Tracking User Access and Logging


One of your primary responsibilities as a Web administrator might be to log access to your company’s Internet servers. As you’ll see in this chapter, enabling logging on Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and Simple Mail Transfer Protocol (SMTP) servers isn’t very difficult. What is difficult, however, is gathering the correct access information and recording this information in the proper format so that it can be read and analyzed. Software used to analyze IIS access logs is called tracking software. You’ll find many types of tracking software. Most commercial tracking software produces detailed reports that include tables and graphs that summarize activity for specific periods. For example, you could compile tracking reports daily, weekly, or monthly.

You can configure logging for HTTP, FTP, and SMTP servers, and you can record access information in several ways: standard logging, Open Database Connectivity (ODBC) logging, and extended logging. With standard logging, you choose a log file format and rely on the format to record the user access information you need. With ODBC logging, you record user access directly to an ODBC-compliant database, such as Microsoft SQL Server 2000. With extended logging, you can customize the logging process and record exactly the information you need to track user access.

Tracking Statistics: The Big Picture

Access logs are created when you enable logging for an HTTP, FTP, or SMTP server. Every time someone requests a file from your Web site, an entry goes into the access log, making the access log a running history of every successful and unsuccessful attempt to retrieve information from your site. Because each entry has its own line, entries in the access log can be easily extracted and compiled into reports. From these reports, you can learn many things about those who visit your site. You can do the following:

  • Determine the busiest times of the day and week

  • Determine which browsers and platforms are used by people who visit your site

  • Discover popular and unpopular resources

  • Discover sites that refer users to your site

  • Learn more about the effectiveness of your advertising

  • Learn more about the people who visit your site

  • Obtain information about search engine usage and keywords

  • Obtain information about the amount of time users spend at the site

You can configure access logs in several formats. The available formats are:

  • National Center for Supercomputing Applications (NCSA) common log file format (Web and SMTP only): Use the NCSA common log file format when your reporting and tracking needs are basic. With this format, log entries are small, which reduces the amount of storage space required for logging.

  • Microsoft Internet Information Services (IIS) log file format: Use the IIS log file format when you need a bit more information from the logs but don’t need to tailor the entries to get detailed information. With this format, log entries are compact, which reduces the amount of storage space required for logging.

  • World Wide Web Consortium (W3C) extended log file format: Use the W3C extended log file format when you need to customize the information tracked and obtain detailed information. With this format, log entries can become large, which greatly increases the amount of storage space required. Recording lengthy entries can affect the performance of a busy server as well.

  • ODBC logging: Use ODBC logging when you want to write access information directly to an ODBC-compliant database. With this format, you’ll need tracking software capable of reading from a database. Entries are compact, however, and data can be read much more quickly than from a standard log file. Keep in mind that ODBC logging is more processor-intensive when you log directly to a local database instance.

  • Centralized binary logging: Use centralized binary logging when you want all Web sites running on a server to write log data to a single log file. With centralized binary logging, the log files contain fixed-length and index records that are written in a raw binary format called the Internet Binary Log (IBL) format, giving the log file an .ibl extension. Professional software applications or tools in the IIS 6.0 Software Development Kit can read this format.

    Tip

    Microsoft distributes a tool for converting a log file to NCSA common log file format. The tool is called CONVLOG, and it’s located in the \%WinDir%\System32 directory. You can use CONVLOG to convert logs formatted using IIS and W3C extended log file formats to NCSA common log file format. The tool also performs reverse Domain Name System (DNS) lookups during the conversion process. This allows you to resolve some Internet Protocol (IP) addresses to domain names.

Because an understanding of what is written to log files is important to understanding logging itself, the sections that follow examine the main file formats. After this discussion, you’ll be able to determine what each format has to offer and, hopefully, to better determine when to use each format.

Working with the NCSA Common Log File Format

NCSA common log file format is the most basic of the log file formats. The common log file format is a fixed American Standard Code for Information Interchange (ASCII) format in which each log entry represents a unique file request. You’ll use the common log file format when your tracking and reporting needs are basic. More specifically, the common log file format is a good choice when you need to track only certain items, such as:

  • Hits (the number of unique file requests)

  • Page views (the number of unique page requests)

  • Visits (the number of user sessions in a specified period)

  • Other basic access information
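The difference between hits and page views can be sketched in a few lines of Python. This is only an illustration: the rule used to decide what counts as a page (directory requests plus a short list of extensions) and the names PAGE_EXTENSIONS and count_hits_and_page_views are assumptions made for this example, not something IIS or any tracking package defines.

```python
# Hypothetical rule: directory requests and these extensions count as pages.
PAGE_EXTENSIONS = (".htm", ".html", ".asp")


def count_hits_and_page_views(resources):
    """Return (hits, page_views) for a list of requested resource paths."""
    hits = len(resources)  # every file request is a hit
    page_views = sum(
        1
        for resource in resources
        if resource.endswith("/") or resource.lower().endswith(PAGE_EXTENSIONS)
    )
    return hits, page_views


requested = ["/", "/home.gif", "/main.htm", "/details.gif", "/menu.gif"]
print(count_hits_and_page_views(requested))  # (5, 2)
```

Real tracking software typically lets you tune which file types count as pages, so treat the extension list as a starting point.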

With this format, log entries are small, which reduces the amount of storage space required for logging. Each entry in the common log file format has only seven fields:

  • Host

  • Identification

  • User Authentication

  • Time Stamp

  • HTTP Request Type

  • Status Code

  • Transfer Volume

As you’ll see, the common log file format is easy to understand, which makes it a good stepping-stone to more advanced log file formats. The following listing shows entries in a sample access log that are formatted using the NCSA common log file format. As you can see from the sample, log fields are separated by spaces.

192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:44:57 -0800] "GET / HTTP/1.1" 200 1970
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:06 -0800] "GET /home.gif HTTP/1.1" 200 5032
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:28 -0800] "GET /main.htm HTTP/1.1" 200 5432
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:31 -0800] "GET /details.gif HTTP/1.1" 200 1211
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:31 -0800] "GET /menu.gif HTTP/1.1" 200 6075
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:31 -0800] "GET /sidebar.gif HTTP/1.1" 200 9023
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:31 -0800] "GET /sun.gif HTTP/1.1" 200 4706
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:38 -0800] "GET /moon.gif HTTP/1.1" 200 1984
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:41 -0800] "GET /stars.gif HTTP/1.1" 200 2098
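If you want to pull the seven fields out of an entry yourself, a regular expression is one reasonable approach. The following Python sketch assumes entries shaped like the sample above; the pattern, the group names, and the function name parse_common_log_line are illustrative choices, not part of IIS.

```python
import re

# One named group per field of the common log file format, in order.
COMMON_LOG_PATTERN = re.compile(
    r'^(?P<host>\S+)\s+(?P<identification>\S+)\s+(?P<user>\S+)\s+'
    r'\[(?P<time_stamp>[^\]]+)\]\s+"(?P<request>[^"]*)"\s+'
    r'(?P<status>\d{3})\s+(?P<transfer_volume>\d+|-)$'
)


def parse_common_log_line(line):
    """Return a dict of the seven fields, or None if the line doesn't match."""
    match = COMMON_LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


sample = (
    '192.168.11.15 - ENGSVR01\\wrstanek [15/Jan/2003:18:45:06 -0800] '
    '"GET /home.gif HTTP/1.1" 200 5032'
)
entry = parse_common_log_line(sample)
print(entry["host"], entry["status"], entry["transfer_volume"])
```

All values come back as strings, so convert the status code and transfer volume to integers before doing arithmetic with them.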

Since most other log file formats build off the NCSA common log file format, it’s useful to examine how these fields are used.

Host Field

Host is the first field in the common log format. This field identifies the host computer requesting a file from your Web server. The value in this field is either the remote host’s IP address, such as 192.168.11.15, or the remote host’s fully qualified domain name (FQDN), such as net48.microsoft.com. The following example shows an HTTP query initiated by a host that was successfully resolved to a domain name (here, the Host field is net48.microsoft.com):

 net48.microsoft.com - ENGSVR01\wrstanek [15/Jan/2003:18:44:57 -0800]  "GET / HTTP/1.1" 200 1970

IP addresses are the numeric equivalent of FQDNs. You can often use a reverse DNS lookup to determine the actual domain name from the IP address. When you have a domain name or resolve an IP address to an actual name, you can examine the name to learn more about the user accessing your server. Divisions within the domain name are separated by periods. The final division identifies the domain class, which can tell you where the user lives and works.
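A reverse DNS lookup of this kind can be sketched with Python’s standard socket module. The function names resolve_host and domain_class are hypothetical, and the sketch assumes that falling back to the raw IP address is acceptable when no name can be found.

```python
import socket


def resolve_host(ip_address, lookup=socket.gethostbyaddr):
    """Return the domain name for ip_address, or the address if unresolved."""
    try:
        hostname, _aliases, _addresses = lookup(ip_address)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip_address


def domain_class(hostname):
    """Return the final division of a resolved domain name, such as 'com'."""
    return hostname.rsplit(".", 1)[-1].lower()


print(domain_class("net48.microsoft.com"))  # com
```

The lookup parameter exists only so that you can substitute a stub when testing without network access. Pass domain_class a resolved name rather than a raw IP address, because the final division of an IP address is just a number.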

Domain classes are geographically and demographically organized. Geographically organized domain classes end in a two- or three-letter designator for the state or country in which the user lives. For example, the .ca domain class is for companies in Canada. Demographically organized domain classes tell you the type of company providing network access to the user. Table 14-1 summarizes these domain classes.

Table 14-1: Basic Domain Classes

Domain Name   Description
.com          Commercial; users from commercial organizations
.edu          Education; users from colleges and universities
.gov          U.S. government; users from U.S. government agencies, except the military
.mil          U.S. military; users who work at military installations
.net          Network; users who work at network service providers and other network-related organizations
.org          Nonprofit organizations; users who work for nonprofit organizations

Identification Field

The Identification field is the second field in the common log file format. This field is meant to identify users by their user name but in practice is rarely used. Because of this, you’ll generally see a hyphen (-) in this field, as in the following:

net48.microsoft.com - ENGSVR01\wrstanek [15/Jan/2003:18:44:57 -0800]  "GET / HTTP/1.1" 200 1970

If you do see a value in this field, keep in mind that the user name isn’t validated. This means it could be made up and shouldn’t be trusted.

User Authentication Field

The User Authentication field is the third field in the common log format. If you have a password-protected area at your Web site, users must authenticate themselves with a user name and password that’s registered for this area. After users validate themselves with their user name and password, their user name is entered in the User Authentication field. In unprotected areas of a site, you’ll usually see a hyphen (-) in this field. In protected areas of a site, you’ll see the authenticated user’s account name. The account name can be preceded by the name of the domain in which the user is authenticated, as shown in this example (here, the User Authentication field is ENGSVR01\wrstanek):

net48.microsoft.com - ENGSVR01\wrstanek [15/Jan/2003:18:44:57 -0800]  "GET / HTTP/1.1" 200 1970

Time Stamp Field

The Time Stamp field is the fourth field in the common log file format. This field tells you exactly when someone accessed a file on the server. The format for the Time Stamp field is as follows:

DD/MMM/YYYY:HH:MM:SS OFFSET

such as:

15/Jan/2003:18:44:57 -0800

The only designator that probably doesn’t make intuitive sense is the offset, which indicates the difference in the server’s time from Greenwich Mean Time (GMT). In the following example, the offset is -8 hours, meaning that the server time is 8 hours behind GMT:

net48.microsoft.com - ENGSVR01\wrstanek [15/Jan/2003:18:44:57 -0800] "GET /  HTTP/1.1" 200 1970 
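To work with time stamps programmatically, you can parse them into a time zone-aware value and then normalize to GMT. The following Python sketch uses the standard datetime module; it assumes English month abbreviations (the default behavior unless the program changes its locale), and the names TIME_STAMP_FORMAT and parse_time_stamp are illustrative.

```python
from datetime import datetime, timezone

# DD/MMM/YYYY:HH:MM:SS OFFSET, as described above.
TIME_STAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_time_stamp(value):
    """Parse a common log time stamp into a time zone-aware datetime."""
    return datetime.strptime(value, TIME_STAMP_FORMAT)


local = parse_time_stamp("15/Jan/2003:18:44:57 -0800")
print(local.astimezone(timezone.utc).isoformat())  # 2003-01-16T02:44:57+00:00
```

Because the server is 8 hours behind GMT, the request logged at 18:44:57 local time falls on the following day in GMT.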

HTTP Request Field

The HTTP Request field is the fifth field in the common log format. Use this field to determine the method that the remote client used to request the resource, the resource that the remote client requested, and the HTTP version that the client used to retrieve the resource. In the following example, the HTTP Request field is the quoted string "GET /home.gif HTTP/1.1":

192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:06 -0800] "GET /home.gif HTTP/1.1" 200 5032

Here, the transfer method is GET, the resource is /home.gif, and the HTTP version is HTTP/1.1. One thing you should note is that resources are specified using relative Uniform Resource Locators (URLs). The server interprets relative URLs. For example, if you request the file http://www.microsoft.com/home/main.htm, the server will use the relative URL /home/main.htm to log where the file is found. When you see an entry that ends in a slash, keep in mind that this refers to the default document for a directory, which is typically called Index.htm, Default.htm, or Default.asp.
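Breaking the HTTP Request field into its three parts is a simple split on white space. The Python sketch below assumes the resource contains no unescaped spaces; the function names split_request and is_default_document are hypothetical.

```python
def split_request(request):
    """Split an HTTP Request field into method, resource, and version."""
    method, resource, version = request.split()
    return {"method": method, "resource": resource, "version": version}


def is_default_document(resource):
    """A resource ending in a slash refers to a directory's default document."""
    return resource.endswith("/")


parts = split_request("GET /home.gif HTTP/1.1")
print(parts["method"], parts["resource"], parts["version"])
```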

Status Code Field

The Status Code field is the sixth field in the common log file format. Status codes indicate whether files were transferred correctly, were loaded from cache, weren’t found, and so on. Generally, status codes are three-digit numbers. As shown in Table 14-2, the first digit indicates the status code’s class or category.

Table 14-2: Status Code Classes

Code Class   Description
1XX          Continue/protocol change
2XX          Success
3XX          Redirection
4XX          Client error/failure
5XX          Server error

Because you’ll rarely see a status code beginning with 1, you need to remember only the other four categories. A status code that begins with 2 indicates that the associated file transferred successfully. A status code that begins with 3 indicates that the server performed a redirect. A status code that begins with 4 indicates some type of client error or failure. Finally, a status code that begins with 5 tells you that a server error occurred.
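Because the class depends only on the first digit, you can classify any status code with integer division. This short Python sketch mirrors Table 14-2; the names STATUS_CLASSES and status_class are illustrative.

```python
# Class descriptions taken from Table 14-2, keyed by the first digit.
STATUS_CLASSES = {
    1: "Continue/protocol change",
    2: "Success",
    3: "Redirection",
    4: "Client error/failure",
    5: "Server error",
}


def status_class(code):
    """Return the class description for a three-digit status code."""
    return STATUS_CLASSES.get(int(code) // 100, "Unknown")


print(status_class(404))  # Client error/failure
```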

Transfer Volume Field

The last field in the common log file format is the Transfer Volume field. This field indicates the number of bytes transferred to the client because of the request. In the following example, 4096 bytes were transferred to the client (the Transfer Volume field is the final value in the entry):

net48.microsoft.com - ENGSVR01\wrstanek [15/Jan/2003:18:45:06 -0800]  "GET / HTTP/1.1" 200 4096 

You’ll only see a transfer volume when the status code class indicates success. If another status code class is used in field six, the Transfer Volume field will contain a hyphen (-) or a 0 to indicate that no data was transferred.
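When you total transfer volumes, the hyphen needs to be treated as zero. A minimal Python sketch, with hypothetical function names, might look like this:

```python
def transfer_volume(value):
    """Convert a Transfer Volume field to a byte count; a hyphen means none."""
    return 0 if value == "-" else int(value)


def total_bytes(values):
    """Sum the bytes transferred across a series of log entries."""
    return sum(transfer_volume(value) for value in values)


print(total_bytes(["1970", "-", "0", "5032"]))  # 7002
```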

Working with the Microsoft IIS Log File Format

Like the common log file format, the Microsoft IIS log file format is a fixed ASCII format. This means that the fields in the log are of a fixed type and can’t be changed. It also means the log is formatted as standard ASCII text and can be read with any standard text editor or compliant application.

You’ll use the IIS log file format when you need a bit more information than the common log file format provides but don’t need to tailor the entries to get detailed information. Since the log entries are compact, the amount of storage space required for logging is much less than with the extended or ODBC logging formats.

The following listing shows entries from a sample log using the IIS log file format. The IIS log entries include common log fields such as the client IP address, authenticated user name, request date and time, HTTP status code, and number of bytes received. IIS log entries also include detailed items such as the Web service name, the server IP address, and the elapsed time. Note that commas separate log fields and entries are much longer than those in the common log file format.

192.14.16.2, -, 04/15/2003, 15:42:25, W3SVC1, ENGSVR01, 192.15.14.81, 0, 594, 3847, 401, 5, GET, /localstart.asp, -,
192.14.16.2, ENGSVR01\wrstanek, 04/15/2003, 15:42:25, W3SVC1, ENGSVR01, 192.15.14.81, 10, 412, 3406, 404, 0, GET, /localstart.asp, |-|0|404_Object_Not_Found,
192.14.16.2, -, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 0, 622, 3847, 401, 5, GET, /IISHelp/iis/misc/default.asp, -,
192.14.16.2, ENGSVR01\wrstanek, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 10, 426, 0, 200, 0, GET, /IISHelp/iis/misc/default.asp, -,
192.14.16.2, ENGSVR01\wrstanek, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 10, 368, 0, 200, 0, GET, /IISHelp/iis/misc/contents.asp, -,
192.14.16.2, -, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 0, 732, 3847, 401, 5, GET, /IISHelp/iis/misc/navbar.asp, -,
192.14.16.2, -, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 0, 742, 3847, 401, 5, GET, /IISHelp/iis/htm/core/iiwltop.htm, -,
192.14.16.2, ENGSVR01\wrstanek, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 20, 481, 0, 200, 0, GET, /IISHelp/iis/misc/navbar.asp, -,
192.14.16.2, ENGSVR01\wrstanek, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 91, 486, 6520, 200, 0, GET, /IISHelp/iis/htm/core/iiwltop.htm, -,

The fields supported by IIS are summarized in Table 14-3. Note that the listed field order is the general order used by IIS to record fields.

Table 14-3: Fields for the IIS Log File Format

Field Name           Description                                                       Example
Client IP            IP address of the client                                          192.14.16.2
Username             Authenticated name of the user                                    ENGSVR01\wrstanek
Date                 Date at which the transaction was completed                       04/15/2003
Time                 Time at which the transaction was completed                       15:42:29
Service              Name of the Web service logging the transaction                   W3SVC1
Computer Name        Name of the server that logged the request                        ENGSVR01
Server IP            IP address of the Web server                                      192.15.14.81
Elapsed Time         Time taken (in milliseconds) for the transaction to be completed  40
Bytes Received       Number of bytes received by the server in the client request      486
Bytes Sent           Number of bytes sent to the client                                6520
Status Code          HTTP status code                                                  200
Windows Status Code  Error status code from Windows                                    0
Method Used          HTTP request method                                               GET
File URI             The requested file                                                /localstart.asp
Referrer             Location the user came from (the referrer)                        http://www.microsoft.com/
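Since the field order is fixed, you can map each comma-separated value to a field name by position. The Python sketch below uses names adapted from Table 14-3; the list IIS_FIELDS and the function parse_iis_log_line are illustrative, and the sketch assumes that only the final field might itself contain commas.

```python
# Field names adapted from Table 14-3, in the order IIS records them.
IIS_FIELDS = [
    "client_ip", "username", "date", "time", "service", "computer_name",
    "server_ip", "elapsed_time", "bytes_received", "bytes_sent",
    "status_code", "windows_status_code", "method", "file_uri", "referrer",
]


def parse_iis_log_line(line):
    """Return a dict keyed by field name, or None if the entry is malformed."""
    # Split at most 14 times so that the final field is kept in one piece.
    values = [value.strip() for value in line.strip().split(",", len(IIS_FIELDS) - 1)]
    if len(values) != len(IIS_FIELDS):
        return None
    # Each entry ends with a trailing comma, which lands in the final field.
    values[-1] = values[-1].rstrip(",").strip()
    return dict(zip(IIS_FIELDS, values))


sample = (
    "192.14.16.2, -, 04/15/2003, 15:42:25, W3SVC1, ENGSVR01, 192.15.14.81, "
    "0, 594, 3847, 401, 5, GET, /localstart.asp, -,"
)
entry = parse_iis_log_line(sample)
print(entry["client_ip"], entry["status_code"], entry["file_uri"])
```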

Working with the W3C Extended Log File Format

The W3C extended log file format is very different from either of the previously discussed formats. With this format you can customize the information tracked and obtain detailed information. When you customize an extended log file, you select the fields you want the server to log, and the server handles the logging for you. Keep in mind that each additional field you track increases the size of entries recorded in the access logs, which can greatly increase the amount of storage space required.

The following listing shows sample entries from an extended log. Note that, as with the common log file format, extended log fields are separated with spaces.

#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2003-04-05 06:27:58
#Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
2003-04-05 06:27:58 192.14.16.2 ENGSVR01\wrstanek 192.14.15.81 80 GET /iishelp/iis/htm/core/iierrcst.htm - 304 Mozilla/4.0+(compatible;+MSIE+6.01;+Windows+NT+5.2;+.NET+CLR+1.1.4322)
2003-04-05 06:28:00 192.14.16.2 ENGSVR01\wrstanek 192.14.15.81 80 GET /iishelp/iis/htm/core/iierrdtl.htm - 304 Mozilla/4.0+(compatible;+MSIE+6.01;+Windows+NT+5.2;+.NET+CLR+1.1.4322)
2003-04-05 06:28:02 192.14.16.2 ENGSVR01\wrstanek 192.14.15.81 80 GET /iishelp/iis/htm/core/iierrabt.htm - 200 Mozilla/4.0+(compatible;+MSIE+6.01;+Windows+NT+5.2;+.NET+CLR+1.1.4322)
2003-04-05 06:28:02 192.14.16.2 ENGSVR01\wrstanek 192.14.15.81 80 GET /iishelp/iis/htm/core/iierradd.htm - 200 Mozilla/4.0+(compatible;+MSIE+6.01;+Windows+NT+5.2;+.NET+CLR+1.1.4322)
2003-04-05 06:28:05 192.14.16.2 ENGSVR01\wrstanek 192.14.15.81 80 GET /iishelp/iis/htm/core/iiprstop.htm - 200 Mozilla/4.0+(compatible;+MSIE+6.01;+Windows+NT+5.2;+.NET+CLR+1.1.4322)

The first time you look at log entries that use the extended log file format, you might be a bit confused because the extended logs are written with server directives as well as file requests. The good news is that server directives are always preceded by the hash symbol (#), which easily allows you to distinguish them from actual file requests. The key directives you’ll see are those that identify the server software and the fields being recorded. These directives are summarized in Table 14-4.

Table 14-4: Directives Used with the Extended Log File Format

Directive    Description
Date         Identifies the date and time the entries were made in the log
End-Date     Identifies the date and time the log was finished and then archived
Fields       Specifies the fields and the field order used in the log file
Remark       Specifies comments
Software     Identifies the server software that created the log entries
Start-Date   Identifies the date and time the log was started
Version      Identifies the version of the extended log file format used
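Because the Fields directive names the fields and their order, a reader for extended logs can use it to label each value in the entries that follow. The Python sketch below is one way to do that; the function name parse_extended_log is hypothetical, and the sketch assumes values contain no embedded spaces (IIS writes spaces in values such as the user agent as plus signs).

```python
def parse_extended_log(lines):
    """Return a list of dicts, one per entry, keyed by the #Fields names."""
    fields = []
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines look like "#Name: value".
            name, _, value = line[1:].partition(":")
            if name.strip() == "Fields":
                fields = value.split()
            continue
        values = line.split()
        if fields and len(values) == len(fields):
            entries.append(dict(zip(fields, values)))
    return entries


log_lines = [
    "#Software: Microsoft Internet Information Services 6.0",
    "#Version: 1.0",
    "#Date: 2003-04-05 06:27:58",
    "#Fields: date time c-ip cs-uri-stem sc-status",
    "2003-04-05 06:27:58 192.14.16.2 /iishelp/iis/htm/core/iierrcst.htm 304",
]
for entry in parse_extended_log(log_lines):
    print(entry["date"], entry["cs-uri-stem"], entry["sc-status"])
```

If the Fields directive appears again later in a log, for example after logging options change, the sketch picks up the new field list from that point on.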

Most extended log fields have a prefix. The prefix tells you how a particular field is used or how the field was obtained. For example, the cs prefix tells you the field was obtained from a request sent by the client to the server. Field prefixes are summarized in Table 14-5.

Table 14-5: Prefixes Used with the Extended Log Fields

Prefix   Description
c        Identifies a client-related field
s        Identifies a server-related field
r        Identifies a remote server field
cs       Identifies information obtained from a request sent by the client to the server
sc       Identifies information obtained from a response sent by the IIS server to the client
sr       Identifies information obtained from a request sent by the Web server to a remote server (used by proxies)
rs       Identifies information obtained from a response sent by a remote server to the IIS server (used by proxies)
x        Application-specific prefix
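To see how a prefix combines with an identifier, you can split a field name back into its two parts. The Python sketch below checks candidate prefixes against Table 14-5 so that an unprefixed name containing a hyphen, such as time-taken, is left whole; the names KNOWN_PREFIXES and split_field_name are illustrative.

```python
import re

# Prefixes listed in Table 14-5.
KNOWN_PREFIXES = {"c", "s", "r", "cs", "sc", "sr", "rs", "x"}


def split_field_name(name):
    """Return (prefix, identifier); prefix is None for unprefixed fields."""
    # Header-style fields look like cs(User-Agent).
    header = re.fullmatch(r"(\w+)\((.+)\)", name)
    if header and header.group(1) in KNOWN_PREFIXES:
        return header.group(1), header.group(2)
    prefix, separator, identifier = name.partition("-")
    if separator and prefix in KNOWN_PREFIXES:
        return prefix, identifier
    return None, name


for field in ["cs-uri-stem", "cs(User-Agent)", "date", "time-taken"]:
    print(field, split_field_name(field))
```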

All fields recorded in an extended log have a field identifier. This identifier details the type of information a particular field records. To create a named field, the IIS server can combine a field prefix with a field identifier, or it can simply use a field identifier. The most commonly used field names are summarized in Table 14-6. As you examine the table, keep in mind that most of these fields relate directly to the fields we’ve already discussed for the common and IIS log file formats. Again, the key difference is that the extended format can give you information that’s much more detailed.

Table 14-6: Field Identifiers Used with the Extended Log File Format

Field Type                        Actual Field Name   Description
Bytes Received                    cs-bytes            Number of bytes received by the server
Bytes Sent                        sc-bytes            Number of bytes sent by the server
Client IP Address                 c-ip                IP address of the client that accessed the server
Cookie                            cs(Cookie)          Content of the cookie sent or received (if any)
Date                              date                Date on which the activity occurred
Method Used                       cs-method           HTTP request method
Protocol Status                   sc-status           HTTP status code, such as 404
Protocol Substatus                sc-substatus        HTTP substatus code, such as 2
Protocol Version                  cs-version          Protocol version used by the client
Referrer                          cs(Referer)         Previous site visited by the user, which provided a link to the current site
Server IP                         s-ip                IP address of the IIS server
Server Name                       s-computername      Name of the IIS server
Server Port                       s-port              Port number to which client is connected
Service Name and Instance Number  s-sitename          Internet service and instance number that was running on the server
Time                              time                Time the activity occurred
Time Taken                        time-taken          Time taken (in milliseconds) for the transaction to be completed
URI Query                         cs-uri-query        Query parameters passed in request (if any)
URI Stem                          cs-uri-stem         Requested resource
User Agent                        cs(User-Agent)      Browser type and version used on the client
User Name                         cs-username         Name of an authenticated user (if available)
Win32 Status                      sc-win32-status     Error status code from Windows

Real World

In IIS 6, the HTTP Status option is renamed Protocol Status and you have the additional option of being able to log Protocol Substatus. Protocol Status logs the request’s HTTP status code, such as 404. Protocol Substatus logs the request’s HTTP substatus code, such as 2. When used together, the fields provide the request’s complete status, such as 404.2. This is important, because in IIS 6, the server no longer reports complete status and substatus codes to clients. To increase security and reduce the possibility of an attack, clients see only the HTTP status code.
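If your tracking scripts need the complete status, you can rebuild it from the two logged values. This Python sketch assumes each entry has already been parsed into a dictionary keyed by field name and that the substatus is logged under the name sc-substatus; the function name complete_status is hypothetical.

```python
def complete_status(entry):
    """Combine the protocol status and substatus, such as 404 and 2, into 404.2."""
    status = entry["sc-status"]
    substatus = entry.get("sc-substatus")
    if substatus is None:
        # Substatus logging wasn't enabled, so only the status is available.
        return status
    return f"{status}.{substatus}"


print(complete_status({"sc-status": "404", "sc-substatus": "2"}))  # 404.2
```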

Working with ODBC Logging

You can use the ODBC logging format when you want to write access information directly to an ODBC-compliant database, such as Microsoft Access or SQL Server 2000. The key advantage of ODBC logging is that access entries are written directly to a database in a format that ODBC-compliant tracking software can quickly read and interpret. The major disadvantage of ODBC logging is that it requires basic database administration skills to configure and maintain.

With ODBC logging, you must configure a Data Source Name (DSN) that allows IIS to connect to your ODBC database. You must also create a database that can be used for logging. This database must have a table with the appropriate fields for the logging data.

Typically, you’ll use the same database for logging information from multiple sites, with each site writing to a separate table in the database. For example, if you wanted to log HTTP, FTP, and SMTP access information in your database, and these services were running on separate sites, you’d create three tables in your database:

  • HTTPLog

  • FTPLog

  • SMTPLog

These tables would have the columns and data types for field values summarized in Table 14-7. The columns must be configured exactly as shown in the table. Don’t worry; IIS includes a SQL script that you can use to create the necessary table structures. This script, named Logtemp.sql, is located in the \%WinDir%\System32\Inetsrv directory.

Note

If you use the Logtemp.sql script, be sure to edit the table name set in the CREATE TABLE statement. The default table name is Inetlog. For more information about working with SQL scripts, see Microsoft SQL Server 2000 Administrator’s Pocket Consultant (Microsoft Press, 2000).

Table 14-7: Table Fields for ODBC Logging

Field Name      Field Type     Description
ClientHost      varchar(255)   IP address of the client that accessed the server
Username        varchar(255)   Name of an authenticated user (if available)
LogTime         datetime       Date and time on which the activity occurred
Service         varchar(255)   Internet service and instance number that was running on the server
Machine         varchar(255)   Name of the server that logged the request
ServerIP        varchar(50)    IP address of the IIS server
ProcessingTime  int            Time taken (in milliseconds) for the transaction to be completed
BytesRecvd      int            Number of bytes received by the server
BytesSent       int            Number of bytes sent by the server
ServiceStatus   int            HTTP status code
Win32Status     int            Error status code from Windows
Operation       varchar(255)   HTTP request method
Target          varchar(255)   Requested resource
Parameters      varchar(255)   Query parameters passed in request (if any)
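To get a feel for the table layout before you set up a real logging database, you can experiment with the same columns in a throwaway database. The Python sketch below uses the standard sqlite3 module with an in-memory SQLite database purely as a stand-in; it is not the Logtemp.sql script, and IIS itself would log through a DSN to a database such as SQL Server. The function names are hypothetical.

```python
import sqlite3

# Columns and types from Table 14-7; HTTPLog is one of the example table names.
CREATE_HTTPLOG = """
CREATE TABLE HTTPLog (
    ClientHost     varchar(255),
    Username       varchar(255),
    LogTime        datetime,
    Service        varchar(255),
    Machine        varchar(255),
    ServerIP       varchar(50),
    ProcessingTime int,
    BytesRecvd     int,
    BytesSent      int,
    ServiceStatus  int,
    Win32Status    int,
    Operation      varchar(255),
    Target         varchar(255),
    Parameters     varchar(255)
)
"""


def create_log_database():
    """Create an in-memory SQLite database holding an empty HTTPLog table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(CREATE_HTTPLOG)
    return connection


def status_counts(connection):
    """Return (status code, number of entries) pairs, ordered by status code."""
    cursor = connection.execute(
        "SELECT ServiceStatus, COUNT(*) FROM HTTPLog "
        "GROUP BY ServiceStatus ORDER BY ServiceStatus"
    )
    return cursor.fetchall()
```

A query such as the one in status_counts hints at why database logging suits reporting: the summary is a single statement rather than a pass over a text file.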

Working with Centralized Binary Logging

You can use centralized binary logging when you want all Web sites running on a server to write log data to a single log file. With centralized binary logging, the log files are written in IBL format, which can be read by many professional software applications, or you can read it using tools in the IIS 6 Resource Kit.

On a large IIS installation where the server is running hundreds or thousands of sites, centralized binary logging can dramatically reduce the overhead associated with logging activities. Two types of records are written to the binary log files:

  • Index records Act as record headers, similar to the W3C extended log file format where software, version, date, and field information is provided.

  • Fixed-length records Provide the detailed information about requests. Each value in each field in the entry is stored with a fixed length.

For information on configuring centralized binary logging, see the section of this chapter entitled “Configuring Centralized Binary Logging.”




Microsoft IIS 6.0Administrator's Consultant
Year: 2003
Pages: 116
