|
One of your primary responsibilities as a Web administrator might be to log access to your company’s Internet servers. As you’ll see in this chapter, enabling logging on Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and Simple Mail Transfer Protocol (SMTP) servers isn’t very difficult. What is difficult, however, is gathering the correct access information and recording this information in the proper format so that it can be read and analyzed. Software used to analyze IIS access logs is called tracking software. You’ll find many types of tracking software. Most commercial tracking software produces detailed reports that include tables and graphs that summarize activity for specific periods. For example, you could compile tracking reports daily, weekly, or monthly.
You can configure logging for HTTP, FTP, and SMTP servers, and you can record access information in several ways: standard logging, Open Database Connectivity (ODBC) logging, and extended logging. With standard logging, you choose a log file format and rely on the format to record the user access information you need. With ODBC logging, you record user access directly to an ODBC-compliant database, such as Microsoft SQL Server 2000. With extended logging, you can customize the logging process and record exactly the information you need to track user access.
Access logs are created when you enable logging for an HTTP, FTP, or SMTP server. Every time someone requests a file from your Web site, an entry goes into the access log, making the access log a running history of every successful and unsuccessful attempt to retrieve information from your site. Because each entry has its own line, entries in the access log can be easily extracted and compiled into reports. From these reports, you can learn many things about those who visit your site. You can do the following:
Determine the busiest times of the day and week
Determine which browsers and platforms are used by people who visit your site
Discover popular and unpopular resources
Discover sites that refer users to your site
Learn more about the effectiveness of your advertising
Learn more about the people who visit your site
Obtain information about search engine usage and keywords
Obtain information about the amount of time users spend at the site
You can configure access logs in several formats. The available formats are:
National Center for Supercomputing Applications (NCSA) common log file format (Web and SMTP only) Use the NCSA common log file format when your reporting and tracking needs are basic. With this format, log entries are small, which reduces the amount of storage space required for logging.
Microsoft Internet Information Services (IIS) log file format Use the IIS log file format when you need a bit more information from the logs but don’t need to tailor the entries to get detailed information. With this format, log entries are compact, which reduces the amount of storage space required for logging.
World Wide Web Consortium (W3C) extended log file format Use the W3C extended log file format when you need to customize the information tracked and obtain detailed information. With this format, log entries can become large, which greatly increases the amount of storage space required. Recording lengthy entries can affect the performance of a busy server as well.
ODBC logging Use ODBC logging when you want to write access information directly to an ODBC-compliant database. With this format, you’ll need tracking software capable of reading from a database. Entries are compact, however, and data can be read much more quickly than from a standard log file. Keep in mind that ODBC logging is more processor-intensive when you log directly to a local database instance.
Centralized binary logging Use centralized binary logging when you want all Web sites running on a server to write log data to a single log file. With centralized binary logging, the log files contain fixed-length and index records that are written in a raw binary format called the Internet Binary Log (IBL) format, giving the log file an .ibl extension. Professional software applications or tools in the IIS 6.0 Software Development Kit can read this format.
Tip | Microsoft distributes a tool for converting a log file to NCSA common log file format. The tool is called CONVLOG, and it’s located in the %WinDir%\System32 directory. You can use CONVLOG to convert logs formatted using IIS and W3C extended log file formats to NCSA common log file format. The tool also performs reverse Domain Name System (DNS) lookups during the conversion process. This allows you to resolve some Internet Protocol (IP) addresses to domain names. |
Because an understanding of what is written to log files is important to understanding logging itself, the sections that follow examine the main file formats. After this discussion, you’ll be able to determine what each format has to offer and, hopefully, to better determine when to use each format.
NCSA common log file format is the most basic of the log file formats. The common log file format is a fixed American Standard Code for Information Interchange (ASCII) format in which each log entry represents a unique file request. You’ll use the common log file format when your tracking and reporting needs are basic. More specifically, the common log file format is a good choice when you need to track only certain items, such as:
Hits (the number of unique file requests)
Page views (the number of unique page requests)
Visits (the number of user sessions in a specified period)
Other basic access information
With this format, log entries are small, which reduces the amount of storage space required for logging. Each entry in the common log file format has only seven fields:
Host
Identification
User Authentication
Time Stamp
HTTP Request Type
Status Code
Transfer Volume
As you’ll see, the common log file format is easy to understand, which makes it a good stepping-stone to more advanced log file formats. The following listing shows entries in a sample access log that are formatted using the NCSA common log file format. As you can see from the sample, log fields are separated by spaces.
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:44:57 -0800] "GET / HTTP/1.1" 200 1970
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:06 -0800] "GET /home.gif HTTP/1.1" 200 5032
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:28 -0800] "GET /main.htm HTTP/1.1" 200 5432
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:31 -0800] "GET /details.gif HTTP/1.1" 200 1211
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:31 -0800] "GET /menu.gif HTTP/1.1" 200 6075
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:31 -0800] "GET /sidebar.gif HTTP/1.1" 200 9023
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:31 -0800] "GET /sun.gif HTTP/1.1" 200 4706
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:38 -0800] "GET /moon.gif HTTP/1.1" 200 1984
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:41 -0800] "GET /stars.gif HTTP/1.1" 200 2098
Since most other log file formats build off the NCSA common log file format, it’s useful to examine how these fields are used.
Host is the first field in the common log format. This field identifies the host computer requesting a file from your Web server. The value in this field is either the remote host’s IP address, such as 192.168.11.15, or the remote host’s fully qualified domain name (FQDN), such as net48.microsoft.com. The following example shows an HTTP query initiated by a host that was successfully resolved to a domain name (the host field information is bold):
net48.microsoft.com - ENGSVR01\wrstanek [15/Jan/2003:18:44:57 -0800] "GET / HTTP/1.1" 200 1970
IP addresses are the numeric equivalent of FQDNs. You can often use a reverse DNS lookup to determine the actual domain name from the IP address. When you have a domain name or resolve an IP address to an actual name, you can examine the name to learn more about the user accessing your server. Divisions within the domain name are separated by periods. The final division identifies the domain class, which can tell you where the user lives and works.
Domain classes are geographically and demographically organized. Geographically organized domain classes end in a two- or three-letter designator for the state or country in which the user lives. For example, the .ca domain class is for companies in Canada. Demographically organized domain classes tell you the type of company providing network access to the user. Table 14-1 summarizes these domain classes.
Domain Name | Description |
---|---|
.com | Commercial; users from commercial organizations |
.edu | Education; users from colleges and universities |
.gov | U.S. government; users from U.S. government agencies, except the military |
.mil | U.S. military; users who work at military installations |
.net | Network; users who work at network service providers and other network-related organizations |
.org | Nonprofit organizations; users who work for nonprofit organizations |
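If you want to pull the domain class out of the Host field automatically, a minimal sketch might look like the following. The function name `domain_class` is an illustration only and isn't part of any IIS tool. It assumes the host has already been resolved to a name and returns nothing when the field still holds an IP address.

```python
from typing import Optional


def domain_class(host: str) -> Optional[str]:
    """Return the final division of a host name (for example ".com").

    Returns None when the host is a bare name or an IPv4 address,
    because neither carries a domain class.
    """
    labels = host.rstrip(".").split(".")
    last = labels[-1]
    # A numeric final label means the host was logged as an IP address.
    if len(labels) < 2 or last.isdigit():
        return None
    return "." + last.lower()


print(domain_class("net48.microsoft.com"))
print(domain_class("192.168.11.15"))
```

You could then look up the returned value in a table such as Table 14-1 to label each visitor's domain class in a report.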
The Identification field is the second field in the common log file format. This field is meant to identify users by their user name but in practice is rarely used. Because of this, you’ll generally see a hyphen (-) in this field, as in the following:
net48.microsoft.com - ENGSVR01\wrstanek [15/Jan/2003:18:44:57 -0800] "GET / HTTP/1.1" 200 1970
If you do see a value in this field, keep in mind that the user name isn’t validated. This means it could be made up and shouldn’t be trusted.
The User Authentication field is the third field in the common log format. If you have a password-protected area at your Web site, users must authenticate themselves with a user name and password that’s registered for this area. After users validate themselves with their user name and password, their user name is entered in the User Authentication field. In unprotected areas of a site, you’ll usually see a hyphen (-) in this field. In protected areas of a site, you’ll see the authenticated user’s account name. The account name can be preceded by the name of the domain in which the user is authenticated, as shown in this example (the user authentication field information is bold):
net48.microsoft.com - ENGSVR01\wrstanek [15/Jan/2003:18:44:57 -0800] "GET / HTTP/1.1" 200 1970
The Time Stamp field is the fourth field in the common log file format. This field tells you exactly when someone accessed a file on the server. The format for the Time Stamp field is as follows:
DD/MMM/YYYY:HH:MM:SS OFFSET
such as:
15/Jan/2003:18:44:57 -0800
The only designator that probably doesn’t make intuitive sense is the offset, which indicates the difference in the server’s time from Greenwich Mean Time (GMT). In the following example, the offset is -8 hours, meaning that the server time is 8 hours behind GMT:
net48.microsoft.com - ENGSVR01\wrstanek [15/Jan/2003:18:44:57 -0800] "GET / HTTP/1.1" 200 1970
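To work with time stamps in a script, you can convert the field to a date and time value that carries the offset with it. The following sketch uses Python's standard `datetime` module. The function name `parse_timestamp` is hypothetical, and the code assumes you've already removed the surrounding square brackets and that month names are in English.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(field: str) -> datetime:
    """Parse a common log time stamp such as '15/Jan/2003:18:44:57 -0800'."""
    # %b is the abbreviated month name and %z is the signed offset from GMT.
    return datetime.strptime(field, "%d/%b/%Y:%H:%M:%S %z")


stamp = parse_timestamp("15/Jan/2003:18:44:57 -0800")

# True when the server clock is 8 hours behind GMT.
eight_hours_behind = stamp.utcoffset() == timedelta(hours=-8)
print(eight_hours_behind)

# The same instant expressed in GMT/UTC, useful when comparing servers.
print(stamp.astimezone(timezone.utc).isoformat())
```

Converting every entry to a single time zone before you compile reports keeps the busiest-hour figures comparable when logs come from servers in different locations.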
The HTTP Request field is the fifth field in the common log format. Use this field to determine the method that the remote client used to request the resource, the resource that the remote client requested, and the HTTP version that the client used to retrieve the resource. In the following example, the HTTP Request field information is bold:
192.168.11.15 - ENGSVR01\wrstanek [15/Jan/2003:18:45:06 -0800] "GET /home.gif HTTP/1.1" 200 5032
Here, the transfer method is GET, the resource is /home.gif, and the protocol version is HTTP 1.1. One thing you should note is that resources are specified using relative Uniform Resource Locators (URLs). The server interprets relative URLs. For example, if you request the file http://www.microsoft.com/home/main.htm, the server will use the relative URL /home/main.htm to log where the file is found. When you see an entry that ends in a slash, keep in mind that this refers to the default document for a directory, which is typically called Index.htm, Default.htm, or Default.asp.
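The three parts of the HTTP Request field can be separated with a few lines of code. This is a sketch only. The names `Request`, `parse_request`, and `is_default_document` are assumptions made for illustration, and the code expects the field exactly as it appears between (or including) the quotation marks.

```python
from typing import NamedTuple


class Request(NamedTuple):
    method: str    # transfer method, such as GET
    resource: str  # relative URL of the requested resource
    version: str   # protocol version, such as HTTP/1.1


def parse_request(field: str) -> Request:
    """Split a request field such as '"GET /home.gif HTTP/1.1"'."""
    text = field.strip().strip('"')
    # Split from both ends so a resource containing spaces stays intact.
    method, rest = text.split(" ", 1)
    resource, version = rest.rsplit(" ", 1)
    return Request(method, resource, version)


def is_default_document(request: Request) -> bool:
    """A resource ending in a slash refers to a directory's default document."""
    return request.resource.endswith("/")


req = parse_request('"GET /home.gif HTTP/1.1"')
print(req.method, req.resource, req.version)
```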
The Status Code field is the sixth field in the common log file format. Status codes indicate whether files were transferred correctly, were loaded from cache, weren’t found, and so on. Generally, status codes are three-digit numbers. As shown in Table 14-2, the first digit indicates the status code’s class or category.
Code Class | Description |
---|---|
1XX | Continue/protocol change |
2XX | Success |
3XX | Redirection |
4XX | Client error/failure |
5XX | Server error |
Because you’ll rarely see a status code beginning with 1, you need to remember only the other four categories. A status code that begins with 2 indicates that the associated file transferred successfully. A status code that begins with 3 indicates that the server performed a redirect. A status code that begins with 4 indicates some type of client error or failure. Finally, a status code that begins with 5 tells you that a server error occurred.
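Because the class depends only on the first digit, a script can classify any status code with integer division. The sketch below uses the class descriptions from Table 14-2. The function name `status_class` is hypothetical.

```python
# Class descriptions keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Continue/protocol change",
    2: "Success",
    3: "Redirection",
    4: "Client error/failure",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Return the class description for a three-digit status code."""
    # Integer division by 100 keeps only the first digit.
    return STATUS_CLASSES.get(code // 100, "Unknown")


for code in (200, 304, 404, 500):
    print(code, status_class(code))
```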
The last field in the common log file format is the Transfer Volume field. This field indicates the number of bytes transferred to the client because of the request. In the following example, 4096 bytes were transferred to the client (the transfer volume field information is bold):
net48.microsoft.com - ENGSVR01\wrstanek [15/Jan/2003:18:45:06 -0800] "GET / HTTP/1.1" 200 4096
You’ll only see a transfer volume when the status code class indicates success. If another status code class is used in field six, the Transfer Volume field will contain a hyphen (-) or a 0 to indicate that no data was transferred.
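Putting the seven fields together, you could read a complete entry with a single regular expression. The following is a minimal sketch rather than a full tracking tool. The pattern, the dictionary keys, and the function name `parse_common` are assumptions chosen for this example, and a hyphen in the Transfer Volume field is treated as 0 bytes.

```python
import re
from typing import Optional

# One named group per field, in the order the common log file format uses.
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_common(line: str) -> Optional[dict]:
    """Return the seven fields of one entry, or None if the line doesn't match."""
    match = COMMON_LOG.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A hyphen means no data was transferred.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


sample = (r'192.168.11.15 - ENGSVR01\wrstanek '
          r'[15/Jan/2003:18:45:06 -0800] "GET /home.gif HTTP/1.1" 200 5032')
print(parse_common(sample))
```

Because each entry sits on its own line, you can apply the function to a log file line by line and skip any line for which it returns None.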
Like the common log file format, the Microsoft IIS log file format is a fixed ASCII format. This means that the fields in the log are of a fixed type and can’t be changed. It also means the log is formatted as standard ASCII text and can be read with any standard text editor or compliant application.
You’ll use the IIS log file format when you need a bit more information than the common log file format provides but don’t need to tailor the entries to get detailed information. Since the log entries are compact, the amount of storage space required for logging is much less than with the extended or ODBC logging formats.
The following listing shows entries from a sample log using the IIS log file format. The IIS log entries include common log fields such as the client IP address, authenticated user name, request date and time, HTTP status code, and number of bytes received. IIS log entries also include detailed items such as the Web service name, the server IP address, and the elapsed time. Note that commas separate log fields and entries are much longer than those in the common log file format.
192.14.16.2, -, 04/15/2003, 15:42:25, W3SVC1, ENGSVR01, 192.15.14.81, 0, 594, 3847, 401, 5, GET, /localstart.asp, -,
192.14.16.2, ENGSVR01\wrstanek, 04/15/2003, 15:42:25, W3SVC1, ENGSVR01, 192.15.14.81, 10, 412, 3406, 404, 0, GET, /localstart.asp, |-|0|404_Object_Not_Found,
192.14.16.2, -, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 0, 622, 3847, 401, 5, GET, /IISHelp/iis/misc/default.asp, -,
192.14.16.2, ENGSVR01\wrstanek, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 10, 426, 0, 200, 0, GET, /IISHelp/iis/misc/default.asp, -,
192.14.16.2, ENGSVR01\wrstanek, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 10, 368, 0, 200, 0, GET, /IISHelp/iis/misc/contents.asp, -,
192.14.16.2, -, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 0, 732, 3847, 401, 5, GET, /IISHelp/iis/misc/navbar.asp, -,
192.14.16.2, -, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 0, 742, 3847, 401, 5, GET, /IISHelp/iis/htm/core/iiwltop.htm, -,
192.14.16.2, ENGSVR01\wrstanek, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 20, 481, 0, 200, 0, GET, /IISHelp/iis/misc/navbar.asp, -,
192.14.16.2, ENGSVR01\wrstanek, 04/15/2003, 15:42:29, W3SVC1, ENGSVR01, 192.15.14.81, 91, 486, 6520, 200, 0, GET, /IISHelp/iis/htm/core/iiwltop.htm, -,
The fields supported by IIS are summarized in Table 14-3. Note that the listed field order is the general order used by IIS to record fields.
Field Name | Description | Example |
---|---|---|
Client IP | IP address of the client | 192.14.16.2 |
Username | Authenticated name of the user | ENGSVR01\wrstanek |
Date | Date at which the transaction was completed | 04/15/2003 |
Time | Time at which the transaction was completed | 15:42:29 |
Service | Name of the Web service logging the transaction | W3SVC1 |
Computer Name | Name of the server computer that handled the request | ENGSVR01 |
Server IP | IP address of the Web server | 192.15.14.81 |
Elapsed Time | Time taken (in milliseconds) for the transaction to be completed | 40 |
Bytes Received | Number of bytes received by the server in client request | 486 |
Bytes Sent | Number of bytes sent to the client | 6520 |
Status Code | HTTP status code | 200 |
Windows Status Code | Error status code from Windows | 0 |
Method Used | HTTP request method | GET |
File URI | The requested file | /localstart.asp |
Referrer | The referrer (the location the user came from) | http://www.microsoft.com/ |
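Because the IIS log file format has a fixed field order, you can read an entry by splitting it on commas and pairing each value with a field name. The sketch below follows the order in Table 14-3. The key names and the function name `parse_iis` are labels invented for this example, and the code assumes that no field value contains a comma.

```python
# Field labels in the order IIS records them (see Table 14-3).
IIS_FIELDS = [
    "client_ip", "username", "date", "time", "service", "computer_name",
    "server_ip", "elapsed_ms", "bytes_received", "bytes_sent", "status",
    "windows_status", "method", "uri", "referrer",
]
INT_FIELDS = ("elapsed_ms", "bytes_received", "bytes_sent",
              "status", "windows_status")


def parse_iis(line: str) -> dict:
    """Split one IIS-format entry into a dictionary of named fields."""
    # Each entry ends with a trailing comma, so drop it before splitting.
    values = [v.strip() for v in line.strip().rstrip(",").split(",")]
    if len(values) != len(IIS_FIELDS):
        raise ValueError(f"expected {len(IIS_FIELDS)} fields, got {len(values)}")
    entry = dict(zip(IIS_FIELDS, values))
    for name in INT_FIELDS:
        entry[name] = int(entry[name])
    return entry


sample = (r"192.14.16.2, ENGSVR01\wrstanek, 04/15/2003, 15:42:29, W3SVC1, "
          r"ENGSVR01, 192.15.14.81, 91, 486, 6520, 200, 0, GET, "
          r"/IISHelp/iis/htm/core/iiwltop.htm, -,")
print(parse_iis(sample))
```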
The W3C extended log file format is very different from either of the previously discussed formats. With this format you can customize the information tracked and obtain detailed information. When you customize an extended log file, you select the fields you want the server to log, and the server handles the logging for you. Keep in mind that each additional field you track increases the size of entries recorded in the access logs, which can greatly increase the amount of storage space required.
The following listing shows sample entries from an extended log. Note that, as with the common log file format, extended log fields are separated with spaces.
#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2003-04-05 06:27:58
#Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
2003-04-05 06:27:58 192.14.16.2 ENGSVR01\wrstanek 192.14.15.81 80 GET /iishelp/iis/htm/core/iierrcst.htm - 304 Mozilla/4.0+(compatible;+MSIE+6.01;+Windows+NT+5.2;+.NET+CLR+1.1.4322)
2003-04-05 06:28:00 192.14.16.2 ENGSVR01\wrstanek 192.14.15.81 80 GET /iishelp/iis/htm/core/iierrdtl.htm - 304 Mozilla/4.0+(compatible;+MSIE+6.01;+Windows+NT+5.2;+.NET+CLR+1.1.4322)
2003-04-05 06:28:02 192.14.16.2 ENGSVR01\wrstanek 192.14.15.81 80 GET /iishelp/iis/htm/core/iierrabt.htm - 200 Mozilla/4.0+(compatible;+MSIE+6.01;+Windows+NT+5.2;+.NET+CLR+1.1.4322)
2003-04-05 06:28:02 192.14.16.2 ENGSVR01\wrstanek 192.14.15.81 80 GET /iishelp/iis/htm/core/iierradd.htm - 200 Mozilla/4.0+(compatible;+MSIE+6.01;+Windows+NT+5.2;+.NET+CLR+1.1.4322)
2003-04-05 06:28:05 192.14.16.2 ENGSVR01\wrstanek 192.14.15.81 80 GET /iishelp/iis/htm/core/iiprstop.htm - 200 Mozilla/4.0+(compatible;+MSIE+6.01;+Windows+NT+5.2;+.NET+CLR+1.1.4322)
The first time you look at log entries that use the extended log file format, you might be a bit confused because the extended logs are written with server directives as well as file requests. The good news is that server directives are always preceded by the hash symbol (#), which easily allows you to distinguish them from actual file requests. The key directives you’ll see are those that identify the server software and the fields being recorded. These directives are summarized in Table 14-4.
Directive Name | Description |
---|---|
Date | Identifies the date and time the entries were made in the log |
End-Date | Identifies the date and time the log was finished and then archived |
Fields | Specifies the fields and the field order used in the log file |
Remark | Specifies comments |
Software | Identifies the server software that created the log entries |
Start-Date | Identifies the date and time the log was started |
Version | Identifies the version of the extended log file format used |
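Since the Fields directive names the fields and their order, a script can use it to label the values in each entry that follows. The sketch below shows one way to do this. The function name `parse_extended` is hypothetical, and the code assumes that values never contain spaces (IIS writes a plus sign in place of a space, as in the user agent values of a typical extended log).

```python
def parse_extended(lines):
    """Return a list of dictionaries, one per entry in an extended log."""
    fields = []
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives look like "#Name: value"; split at the first colon.
            name, _, value = line[1:].partition(":")
            if name.strip() == "Fields":
                fields = value.split()
            continue
        # Pair each space-separated value with the current field list.
        entries.append(dict(zip(fields, line.split())))
    return entries


sample_log = [
    "#Software: Microsoft Internet Information Services 6.0",
    "#Version: 1.0",
    "#Date: 2003-04-05 06:27:58",
    "#Fields: date time c-ip cs-username s-ip s-port cs-method "
    "cs-uri-stem cs-uri-query sc-status cs(User-Agent)",
    r"2003-04-05 06:27:58 192.14.16.2 ENGSVR01\wrstanek 192.14.15.81 80 GET "
    "/iishelp/iis/htm/core/iierrcst.htm - 304 "
    "Mozilla/4.0+(compatible;+MSIE+6.01;+Windows+NT+5.2;+.NET+CLR+1.1.4322)",
]
for entry in parse_extended(sample_log):
    print(entry["date"], entry["time"], entry["cs-uri-stem"], entry["sc-status"])
```

Reading the field list from the log itself, rather than hard-coding it, means the same script keeps working when you change which fields the server records.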
Most extended log fields have a prefix. The prefix tells you how a particular field is used or how the field was obtained. For example, the cs prefix tells you the field was obtained from a request sent by the client to the server. Field prefixes are summarized in Table 14-5.
Prefix | Description |
---|---|
c | Identifies a client-related field |
s | Identifies a server-related field |
r | Identifies a remote server field |
cs | Identifies information obtained from a request sent by the client to the server |
sc | Identifies information obtained from a response sent by the IIS server to the client |
sr | Identifies information obtained from a request sent by the Web server to a remote server (used by proxies) |
rs | Identifies information obtained from a response sent by a remote server to the IIS server (used by proxies) |
x | Application-specific prefix |
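To see how a prefix combines with the rest of a field name, consider the following sketch, which separates the two. The function name `split_field` is an assumption for illustration. It handles both forms that appear in extended logs: a prefix followed by a hyphen, and a prefix followed by a header name in parentheses.

```python
import re

# Prefixes defined for the extended log file format (see Table 14-5).
PREFIXES = {"c", "s", "r", "cs", "sc", "sr", "rs", "x"}


def split_field(name: str):
    """Return (prefix, identifier); prefix is None when the field has none."""
    # Header form, such as cs(User-Agent).
    match = re.fullmatch(r"([a-z]+)\((.+)\)", name)
    if match and match.group(1) in PREFIXES:
        return match.group(1), match.group(2)
    # Hyphenated form, such as cs-uri-stem.
    head, sep, tail = name.partition("-")
    if sep and head in PREFIXES:
        return head, tail
    # Fields such as date, time, and time-taken have no prefix.
    return None, name


for field in ("cs-uri-stem", "cs(User-Agent)", "s-ip", "date", "time-taken"):
    print(field, split_field(field))
```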
All fields recorded in an extended log have a field identifier. This identifier details the type of information a particular field records. To create a named field, the IIS server can combine a field prefix with a field identifier, or it can simply use a field identifier. The most commonly used field names are summarized in Table 14-6. As you examine the table, keep in mind that most of these fields relate directly to the fields we’ve already discussed for the common and IIS log file formats. Again, the key difference is that the extended format can give you information that’s much more detailed.
Field Type | Actual Field Name | Description |
---|---|---|
Bytes Received | cs-bytes | Number of bytes received by the server |
Bytes Sent | sc-bytes | Number of bytes sent by the server |
Client IP Address | c-ip | IP address of the client that accessed the server |
Cookie | cs(Cookie) | Content of the cookie sent or received (if any) |
Date | date | Date on which the activity occurred |
Method Used | cs-method | HTTP request method |
Protocol Status | sc-status | HTTP status code, such as 404 |
Protocol Substatus | sc-substatus | HTTP substatus code, such as 2 |
Protocol Version | cs-protocol | Protocol version used by the client |
Referrer | cs(Referer) | Previous site visited by the user, which provided a link to the current site |
Server IP | s-ip | IP address of the IIS server |
Server Name | s-computername | Name of the IIS server |
Server Port | s-port | Port number to which client is connected |
Service Name and Instance Number | s-sitename | Internet service and instance number that was running on the server |
Time | time | Time the activity occurred |
Time Taken | time-taken | Time taken (in milliseconds) for the transaction to be completed |
URI Query | cs-uri-query | Query parameters passed in request (if any) |
URI Stem | cs-uri-stem | Requested resource |
User Agent | cs(User-Agent) | Browser type and version used on the client |
User Name | cs-username | Name of an authenticated user (if available) |
Win32 Status | sc-win32-status | Error status code from Windows |
Real World | In IIS 6, the HTTP Status option is renamed Protocol Status and you have the additional option of being able to log Protocol Substatus. Protocol Status logs the request’s HTTP status code, such as 404. Protocol Substatus logs the request’s HTTP substatus code, such as 2. When used together, the fields provide the request’s complete status, such as 404.2. This is important, because in IIS 6, the server no longer reports complete status and substatus codes to clients. To increase security and reduce the possibility of an attack, clients see only the HTTP status code. |
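If you log both fields, a report can rebuild the complete status by joining them. The following is a small sketch. It assumes each entry has already been read into a dictionary keyed by field name, with the substatus stored under sc-substatus. The function name `full_status` is hypothetical.

```python
def full_status(entry: dict) -> str:
    """Combine Protocol Status and Protocol Substatus, such as '404.2'."""
    status = str(entry["sc-status"])
    substatus = entry.get("sc-substatus")
    # Fall back to the plain status when no substatus was logged.
    if substatus is None or str(substatus) == "-":
        return status
    return f"{status}.{substatus}"


print(full_status({"sc-status": "404", "sc-substatus": "2"}))
print(full_status({"sc-status": "200"}))
```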
You can use the ODBC logging format when you want to write access information directly to an ODBC-compliant database, such as Microsoft Access or SQL Server 2000. The key advantage of ODBC logging is that access entries are written directly to a database in a format that ODBC-compliant tracking software can quickly read and interpret. The major disadvantage of ODBC logging is that it requires basic database administration skills to configure and maintain.
With ODBC logging, you must configure a Data Source Name (DSN) that allows IIS to connect to your ODBC database. You must also create a database that can be used for logging. This database must have a table with the appropriate fields for the logging data.
Typically, you’ll use the same database for logging information from multiple sites, with each site writing to a separate table in the database. For example, if you wanted to log HTTP, FTP, and SMTP access information in your database, and these services were running on separate sites, you’d create three tables in your database:
HTTPLog
FTPLog
SMTPLog
These tables would have the columns and data types for field values summarized in Table 14-7. The columns must be configured exactly as shown in the table. Don’t worry; IIS includes a SQL script that you can use to create the necessary table structures. This script, named Logtemp.sql, is located in the %WinDir%\System32\Inetsrv directory.
Note | If you use the Logtemp.sql script, be sure to edit the table name set in the CREATE TABLE statement. The default table name is Inetlog. For more information about working with SQL scripts, see Microsoft SQL Server 2000 Administrator’s Pocket Consultant (Microsoft Press, 2000). |
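To illustrate the idea of one database with a separate table per service, the sketch below creates the three tables and writes a single entry. It is an illustration only: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for an ODBC data source, and the column list is a hypothetical subset invented for this example, not the structure that Logtemp.sql creates. For real ODBC logging, use the columns exactly as the script defines them.

```python
import sqlite3

# Hypothetical columns for illustration; NOT the Logtemp.sql structure.
COLUMNS = ("client_ip TEXT, username TEXT, log_time TEXT, "
           "service TEXT, status INTEGER, target TEXT")


def create_log_table(conn: sqlite3.Connection, table_name: str) -> None:
    """Create one logging table; the name must be a plain identifier."""
    if not table_name.isidentifier():
        raise ValueError(f"invalid table name: {table_name!r}")
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table_name} ({COLUMNS})")


conn = sqlite3.connect(":memory:")  # stand-in for the logging database
for name in ("HTTPLog", "FTPLog", "SMTPLog"):
    create_log_table(conn, name)

# Record one HTTP request in the table reserved for the Web site.
conn.execute(
    "INSERT INTO HTTPLog VALUES (?, ?, ?, ?, ?, ?)",
    ("192.14.16.2", r"ENGSVR01\wrstanek", "2003-04-15 15:42:29",
     "W3SVC1", 200, "/localstart.asp"),
)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM HTTPLog").fetchone()[0])
```

Keeping each service in its own table lets tracking software query HTTP, FTP, and SMTP activity separately while you maintain only one database.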
You can use centralized binary logging when you want all Web sites running on a server to write log data to a single log file. With centralized binary logging, the log files are written in IBL format, which you can read using many professional software applications or the tools in the IIS 6 Resource Kit.
On a large IIS installation where the server is running hundreds or thousands of sites, centralized binary logging can dramatically reduce the overhead associated with logging activities. Two types of records are written to the binary log files:
Index records Act as record headers, similar to the W3C extended log file format where software, version, date, and field information is provided.
Fixed-length records Provide the detailed information about requests. Each value in each field in the entry is stored with a fixed length.
For information on configuring centralized binary logging, see the section of this chapter entitled “Configuring Centralized Binary Logging.”
|