
Interpreting Web Server Log Files

An abundance of log analysis software is available for Unix operating systems (and thus Tiger). Log analysis is more of an art than a science. As you've seen by looking at the log file formats, you can determine the remote host, requested resource, and time of request from the log file. Unfortunately, many analysis packages try to go even further by providing information on how long visitors were at your page, or where (geographically) they are located. Neither of these pieces of information is tracked in the Apache logs.
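As a rough illustration of what a log line really contains, here is a minimal Python sketch that pulls the remote host, request time, and requested resource out of a single entry. It assumes the Common Log Format; the sample line, the `parse_line` helper, and the `CLF_PATTERN` name are made up for this example and are not part of Apache or any analysis package.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return the host, timestamp, and resource from one log line, or None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "host": match.group("host"),
        "time": when,
        "resource": match.group("resource"),
    }

# Hypothetical sample entry, not taken from a real server.
sample = '192.0.2.10 - - [12/Mar/2005:06:15:02 -0500] "GET /index.html HTTP/1.1" 200 5120'
entry = parse_line(sample)
print(entry["host"], entry["resource"], entry["time"].isoformat())
```

Notice that nothing in the entry records how long the visitor stayed or where the visitor is located; anything beyond these fields has to be inferred.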

Understanding Web Statistics

To determine how long someone has been at your site, the analysis software must look at all accesses, determine which are related, and measure the amount of time between them. This is entirely guesswork on the part of the software. Assume that a user opens her browser, views a page, walks away for 15 minutes, and then accidentally clicks on another page before closing her browser. If the software is set with a session timeout period greater than 15 minutes, it treats the two hits as part of a single visit. The software assumes that the user spent at least 15 minutes reading each page and registers that the visitor spent 30 minutes on the site. In reality, only a minute or two was spent looking at the site content.
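The guesswork can be sketched in a few lines of Python. This is one possible heuristic, not the method of any particular package: hits closer together than the timeout are grouped into one visit, and the last page of a visit (which has no following hit) is credited with the average gap seen in that visit. The function name `estimate_visits` and the crediting rule are assumptions chosen to mirror the scenario above; real tools differ.

```python
from datetime import datetime, timedelta

def estimate_visits(hit_times, timeout):
    """Group one visitor's hits into visits and guess the time spent per visit.

    Hits closer together than `timeout` are treated as the same visit.
    The final page of a visit has no following hit, so it is credited with
    the average gap seen in that visit (an assumption, not a measurement).
    """
    visits = []
    current = []
    for t in sorted(hit_times):
        if current and t - current[-1] > timeout:
            visits.append(current)
            current = []
        current.append(t)
    if current:
        visits.append(current)

    durations = []
    for visit in visits:
        elapsed = visit[-1] - visit[0]
        gaps = len(visit) - 1
        last_page = elapsed / gaps if gaps else timedelta(0)
        durations.append(elapsed + last_page)
    return durations

# The scenario from the text: two hits, 15 minutes apart.
first = datetime(2005, 3, 12, 9, 0)
hits = [first, first + timedelta(minutes=15)]
long_timeout = estimate_visits(hits, timeout=timedelta(minutes=30))
short_timeout = estimate_visits(hits, timeout=timedelta(minutes=10))
print(long_timeout)   # one visit credited with 30 minutes
print(short_timeout)  # two separate visits of zero measurable length
```

The same two log entries produce very different numbers depending only on the timeout setting, which is why time-on-site figures should be read with caution.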

When determining geographic information, analysis software performs an even more amazing task: locating what city a user is coming from. To do this, the analysis utility looks up the domain of the client accessing the system and retrieves the city and state in which the domain was registered. Unfortunately, this is almost completely worthless data.

For example, a WebTrends report on a local (Columbus, Ohio) e-commerce site showed that more than 95% of the remote requests were coming from Herndon, VA. In fact, an analysis of other (unrelated) sites shows a similar amount of traffic from Virginia. The reason is simple: the RoadRunner cable modem network. The rr.com domain is registered in Herndon, VA. Thousands of users have RoadRunner-based access, and no matter where they are actually coming from, the report displays Herndon, VA. That isn't very useful, is it?
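A small Python sketch shows why registration-based lookups collapse this way. The hostnames below are hypothetical, the lookup table is hand-written (only the rr.com entry comes from the text), and taking the last two labels of a hostname is a deliberately naive stand-in for a real registration lookup.

```python
from collections import Counter

# Registration city per domain. The rr.com entry comes from the text;
# example.org is a hypothetical placeholder.
REGISTERED_IN = {"rr.com": "Herndon, VA", "example.org": "Columbus, OH"}

def registered_domain(hostname):
    """Naively take the last two labels of a hostname as its domain."""
    return ".".join(hostname.lower().split(".")[-2:])

def report_locations(hostnames):
    """Count visitors by the city in which their domain is registered."""
    return Counter(
        REGISTERED_IN.get(registered_domain(h), "Unknown") for h in hostnames
    )

# Hypothetical clients in three different cities, all on the same ISP.
hosts = [
    "dhcp-1.columbus.rr.com",
    "dhcp-2.austin.rr.com",
    "dhcp-3.tampa.rr.com",
    "www.example.org",
]
locations = report_locations(hosts)
print(locations)
```

Every rr.com client is reported under Herndon, VA, because the lookup only ever sees where the domain was registered, not where the subscriber lives.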

The final web statistic fallacy is the number of hits a page receives. Many people are delighted when they find that they're getting a few thousand hits a day, but they don't realize what constitutes a hit. The Apache web server counts any information requested as a hit. If a web page has 10 tiny icons on it, it takes at least 11 hits to load the page (1 for the page, 10 for the icons). As pages become more graphically rich, it takes even more requests to load them. A 10,000-hit-per-day site might only be serving a few hundred pages per day!
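The difference between hits and pages is easy to demonstrate. The following Python sketch counts every request as a hit but counts only requests that look like pages as page views. The set of page extensions is an assumption for illustration; a real site would need its own list.

```python
import os
from urllib.parse import urlsplit

# Extensions treated as "pages" here; this list is an assumption.
PAGE_EXTENSIONS = {"", ".html", ".htm", ".php"}

def count_hits_and_pages(resources):
    """Return (hits, pages) for a list of requested resource paths."""
    hits = len(resources)
    pages = 0
    for resource in resources:
        path = urlsplit(resource).path
        extension = os.path.splitext(path)[1].lower()
        if extension in PAGE_EXTENSIONS:
            pages += 1
    return hits, pages

# One page with 10 tiny icons, as in the example above.
requests = ["/index.html"] + [f"/icons/icon{n}.gif" for n in range(1, 11)]
print(count_hits_and_pages(requests))  # → (11, 1)
```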

Popular Web Statistics Tools

As long as you realize that log analysis data can be deceiving, it can still provide useful information. Here are a few popular web statistics packages available for Tiger:

  • Analog: The world's most popular statistics software, Analog provides all the basics in a very simple layout. Analog doesn't create DTP-quality graphs or have the snazziest interface you've ever seen, but it's fast, does a good job, and it's free. http://www.analog.cx/.

  • Sawmill: The Sawmill software provides complete statistics including search engine identification and a unique Calendar view for locating information by month and date. Sawmill is a commercial package costing $99 and up. http://www.sawmill.net/.

  • Summary: Summary is a great entry-level piece of software with advanced reporting features. Data can be exported directly to spreadsheet format for external graphing. Single-user licenses for Summary start at $59. http://summary.net/download.html

  • AWStats: Advanced Web Statistics is a relative newcomer to the web stats arena, but brings with it extensive reporting and built-in graphing. A bit flashier than Analog and free for anyone's use, it is a good choice for the budget-conscious web manager who wants to provide statistics that are as complete and easy to read as possible. http://awstats.sourceforge.net.

  • Urchin: Urchin has, without a doubt, the most user-friendly and attractive interface of any of these offerings. Urchin is a great stats solution for websites with the resources to afford it. Urchin starts at $199 for an individual server license, but can operate in Lite mode for free. http://www.urchin.com/download/

Generating Statistics with AWStats

If you run a server, you'll want a stats solution implemented as soon as possible. Understanding what is happening on your server can help locate errors in your website, find potential hackers, and identify where you should focus your development efforts. To get your site up and running with a stats solution that will likely provide most of the features you'll ever need, let's take a look at how to set up AWStats on your system.

Installing AWStats

To begin, download the latest AWStats distribution from http://awstats.sourceforge.net/#DOWNLOAD, and unarchive it. Next, use mv to move the distribution directory to /usr/local/awstats.

Now, enter the directory /usr/local/awstats/tools and issue the command sudo perl awstats_configure.pl. This will walk you through a simple setup script, as follows:

 ----- AWStats awstats_configure 1.0 (build 1.3) (c) Laurent Destailleur -----
 This tool will help you to configure AWStats to analyze statistics for one web server.
 You can try to use it to let it do all that is possible in AWStats setup,
 however following the step by step manual setup documentation (docs/index.html)
 is often a better idea. Above all if:
 - You are not an administrator user,
 - You want to analyze downloaded log files without web server,
 - You want to analyze mail or ftp log files instead of web log files,
 - You need to analyze load balanced servers log files,
 - You want to 'understand' all possible ways to use AWStats...
 Read the AWStats documentation (docs/index.html).

 -----> Running OS detected: Linux, BSD or Unix

 -----> Check for web server install
   Found Web server Apache config file '/etc/httpd/httpd.conf'

If you've made the appropriate changes to use the combined log format, answer N to the next question to prevent AWStats from changing your configuration.

 -----> Check and complete web server config file '/etc/httpd/httpd.conf'
 Warning: You Apache config file contains directives to write 'common' log files
 This means that some features can't work (os, browsers and keywords detection).
 Do you want me to setup Apache to write 'combined' log files [y/N] ? N
   Add 'Alias /awstatsclasses "/usr/local/awstats/wwwroot/classes/"'
   Add 'Alias /awstatscss "/usr/local/awstats/wwwroot/css/"'
   Add 'Alias /awstatsicons "/usr/local/awstats/wwwroot/icon/"'
   Add 'ScriptAlias /awstats/ "/usr/local/awstats/wwwroot/cgi-bin/"'
   Add '<Directory>' directive
   AWStats directives added to Apache config file.

 -----> Update model config file '/usr/local/awstats/wwwroot/cgi-bin/awstats.model.conf'
   File awstats.model.conf updated.

Allow the setup tool to create the initial configuration file for you by answering Y to the following question. Provide a simple name for your site, such as "MySite", used here, and then go with the default /etc/awstats directory to store the config files.

 -----> Need to create a new config file ?
 Do you want me to build a new AWStats config/profile file (required if first install) [y/N] ? y

 -----> Define config file name to create
 What is the name of your web site or profile analysis ?
 Example: www.mysite.com
 Example: demo
 Your web site, virtual server or profile name:
 > MySite

 -----> Define config file path
 In which directory do you plan to store your config file(s) ?
 Default: /etc/awstats
 Directory path to store config file(s) (Enter for default):
 >

 -----> Create config file '/etc/awstats/awstats.MySite.conf'
  Config file /etc/awstats/awstats.MySite.conf created.

 -----> Restart Web server with '/sbin/service httpd restart'
 No such service httpd

 -----> Add update process inside a scheduler
 Sorry, configure.pl does not support automatic add to cron yet.
 You can do it manually by adding the following command to your cron:
 /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -update -config=MySite
 Or if you have several config files and prefer having only one command:
 /usr/local/awstats/tools/awstats_updateall.pl now
 Press ENTER to continue...

 A SIMPLE config file has been created: /etc/awstats/awstats.MySite.conf
 You should have a look inside to check and change manually main parameters.
 You can then manually update your statistics for 'MySite' with command:
 > perl awstats.pl -update -config=MySite
 You can also read your statistics for 'MySite' with URL:
 > http://localhost/awstats/awstats.pl?config=MySite

 Press ENTER to finish...

Your installation is now complete, but before AWStats will actually run, you'll need to make a few changes to the configuration file and the /usr/local/awstats directory.

First, fix the permissions on /usr/local/awstats by typing chmod 755 /usr/local/awstats. This will allow Apache to execute the CGI application that has been installed.

Next, create a default data directory that is writeable by Apache:

 brezup:jray jray $ sudo mkdir -p /var/lib/awstats
 brezup:jray jray $ sudo chown www /var/lib/awstats

Finally, edit the configuration file created by the setup utility. The file should be located at /etc/awstats/awstats.<your site config name>.conf, such as awstats.MySite.conf for the sample we've created here.

Search for the line beginning with LogFile=. It should be set to the path of your Apache log, which by default is /var/log/httpd/access_log. Modify the line to read:

 LogFile="/var/log/httpd/access_log" 

Now, save the configuration file and restart Apache with sudo /usr/sbin/apachectl restart.
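If you want to double-check the edit, a short Python sketch like the following can report which log file a configuration points at. It assumes the simple `LogFile="path"` form shown above; the `find_logfile` helper is hypothetical and not part of AWStats, and the sample text stands in for the contents of awstats.MySite.conf.

```python
import re

def find_logfile(config_text):
    """Return the value of the first uncommented LogFile= line, or None."""
    for line in config_text.splitlines():
        # Lines beginning with '#' are comments and never match.
        match = re.match(r'\s*LogFile\s*=\s*"?([^"#]*)"?', line)
        if match:
            return match.group(1).strip()
    return None

# Stand-in for the contents of /etc/awstats/awstats.MySite.conf.
sample_config = '''
# LogFile="/some/other/path/access.log"
LogFile="/var/log/httpd/access_log"
'''
print(find_logfile(sample_config))  # → /var/log/httpd/access_log
```

To check the real file, you could read its contents with `open(...).read()` and pass the result to the same function.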

Running the Stats Analysis

Before you can view the results of AWStats, you must run the analysis on the current log data. To do this, first enter the AWStats cgi-bin directory: cd /usr/local/awstats/wwwroot/cgi-bin. Then execute AWStats with sudo ./awstats.pl -config=<your site config name> -update. For example, the site defined in /etc/awstats/awstats.MySite.conf is processed with

 brezup:jray jray $ sudo ./awstats.pl -config=MySite -update
 Update for config "/etc/awstats/awstats.MySite.conf"
 With data in log file "/var/log/httpd/access_log"...
 Phase 1 : First bypass old records, searching new record...
 Searching new records from beginning of log file...
 Jumped lines in file: 0
 Parsed lines in file: 40
  Found 0 dropped records,
  Found 0 corrupted records,
  Found 0 old records,
  Found 40 new qualified records.
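The record counts at the end of this output are worth a glance after each run. As a hedged sketch, the following Python snippet extracts them from captured output; the `summarize_update` helper is hypothetical, and the pattern is based only on the wording of the sample output shown above, which may differ in other AWStats versions.

```python
import re

RECORD_LINE = re.compile(
    r"Found (\d+) (dropped|corrupted|old|new qualified) records"
)

def summarize_update(output):
    """Map each record category in the update output to its count."""
    return {label: int(count) for count, label in RECORD_LINE.findall(output)}

# The tail of the sample run shown above.
sample_output = """\
Parsed lines in file: 40
 Found 0 dropped records,
 Found 0 corrupted records,
 Found 0 old records,
 Found 40 new qualified records.
"""
summary = summarize_update(sample_output)
print(summary)
if summary.get("corrupted", 0) > 0:
    # A likely (but not certain) cause is a log format mismatch.
    print("Corrupted records found; check the log format settings.")
```

A large number of corrupted records usually suggests that the log format AWStats expects does not match what Apache is writing.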

You can now view your site statistics by pointing your web browser at http://127.0.0.1/awstats/awstats.pl?config=<yoursiteconfigname>. A well-populated AWStats page is shown in Figure 23.6.

Figure 23.6. AWStats is a free tool for comprehensive web statistics.


TIP

You can automate the process of updating the stats by adding a line to /etc/crontab, such as

0 6 * * * root cd /usr/local/awstats/wwwroot/cgi-bin && ./awstats.pl -config=MySite -update

This will re-run the analysis every morning at 6 a.m.


AWStats offers many tweaks and additional features that you can enable within the configuration file. You might want to browse the official documentation at http://awstats.sourceforge.net/docs/index.html for more information.


    Mac OS X Tiger Unleashed
    ISBN: 0672327465
    Year: 2005
    Pages: 251
