Interpreting Web Server Log Files
An abundance of log analysis software is available for Unix operating systems (and thus Tiger). Log analysis is more of an art than a science. As you've seen by looking at the log file formats, you can determine the remote host, requested resource, and time of request from the log file. Unfortunately, many analysis packages try to go even further by providing information on how long visitors were at your page, or where (geographically) they are located. Neither of these pieces of information is tracked in the Apache logs.
Understanding Web Statistics
To determine how long someone has been at your site, the analysis software must look at all accesses and determine which are related, and the amount of time between them. This is entirely guesswork on the part of the software. Assume that a user opens her browser, views a page, walks away for 15 minutes, and then accidentally clicks on another page before closing her browser. If the software is set with a session timeout period greater than 15 minutes, it treats the two hits as part of a single visit. The software assumes that the user spent at least 15 minutes reading each page and registers that the browser spent 30 minutes on the site. In reality, only a minute or two was spent looking at the site content.
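The session guesswork can be sketched in a few lines of awk. This is only an illustration, not how any particular package works: the function name estimate_sessions is hypothetical, and the input is assumed to be already reduced to lines of "host seconds" (the time as a plain number of seconds), sorted by time, rather than a raw Apache log.

```shell
#!/bin/sh
# Hypothetical helper: group hits into sessions using a timeout.
# $1: file of "<host> <seconds>" lines, sorted by time (assumed input format)
# $2: session timeout in seconds
estimate_sessions() {
    awk -v timeout="$2" '
    {
        host = $1; t = $2
        if ((host in last) && (t - last[host] <= timeout)) {
            total[host] += t - last[host]   # the whole gap is counted as reading time
        } else {
            sessions[host]++                # first hit, or the gap exceeded the timeout
        }
        last[host] = t
    }
    END {
        for (h in sessions)
            printf "%s sessions=%d seconds=%d\n", h, sessions[h], total[h]
    }' "$1"
}

# Example: two hits from one host, 15 minutes (900 seconds) apart.
# printf '10.0.0.1 0\n10.0.0.1 900\n' > hits.txt
# estimate_sessions hits.txt 1800
```

With a 30-minute timeout (1800), the two hits from the scenario above are reported as one session of 900 seconds, even though the user was away for nearly all of it. With a 10-minute timeout (600), the same hits become two sessions of 0 seconds. The sketch counts only the gap between hits; a package that also credits the final page with a similar reading time would arrive at the 30-minute figure described above.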
When determining geographic information, analysis software performs an even more amazing task: locating what city a user is coming from. To do this, the analysis utility looks up the domain of the client accessing the system and retrieves the city and state in which that domain was registered. Unfortunately, this data is almost completely worthless.
For example, a WebTrends report on a local (Columbus, Ohio) e-commerce site showed that more than 95% of the remote requests were coming from Herndon, VA. In fact, an analysis of other (unrelated) sites shows a similar amount of traffic from Virginia. The reason is simple: the RoadRunner cable modem network. The rr.com domain is registered in Herndon, VA. There are thousands of users with RoadRunner-based access, and no matter where they are actually coming from, the report displays Herndon, VA. That isn't very useful, is it?
The final web statistic fallacy is the number of hits a page receives. Many people are delighted when they find that they're getting a few thousand hits a day, but they don't realize what constitutes a hit. The Apache web server counts any information requested as a hit. If a web page has 10 tiny icons on it, it takes at least 11 hits to load the page (1 for the page, 10 for the icons). As pages become more graphically rich, it takes even more requests to load them. A 10,000-hit-per-day site might only be serving a few hundred pages per day!
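The difference between hits and pages is easy to check against a real log. The following is a minimal sketch, assuming an Apache common or combined format log in which the requested path is the seventh whitespace-separated field. The function names are hypothetical, and treating only paths that end in /, .htm, .html, or .php as "pages" is an arbitrary definition chosen for illustration.

```shell
#!/bin/sh
# Every line in the access log is one request, and therefore one "hit".
count_hits() {
    awk 'END { print NR }' "$1"
}

# Count only the requests that look like pages rather than images or other assets.
count_pages() {
    awk '{
        path = $7
        sub(/\?.*$/, "", path)                 # drop any query string
        if (path ~ /(\/|\.html?|\.php)$/) pages++
    } END { print pages + 0 }' "$1"
}

# Example:
# count_hits  /var/log/httpd/access_log
# count_pages /var/log/httpd/access_log
```

Comparing the two numbers for the same log gives a rough idea of how many requests it takes, on average, to deliver one page of your site.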
Popular Web Statistics Tools
As long as you realize that log analysis data can be deceiving, it can still provide useful information. Here are a few popular web statistics packages available for Tiger:
Generating Statistics with AWStats
If you run a server, you'll want a stats solution implemented as soon as possible. Understanding what is happening on your server can help locate errors in your website, find potential hackers, and identify where you should focus your development efforts. To get your site up and running with a stats solution that will likely provide most of the features you'll ever need, let's take a look at how to set up AWStats on your system.
To begin, download the latest AWStats distribution from http://awstats.sourceforge.net/#DOWNLOAD, and unarchive it. Next, use mv to move the distribution directory to /usr/local/awstats.
Now, enter the directory /usr/local/awstats/tools and issue the command sudo perl awstats_configure.pl. This will walk you through a simple setup script, as follows:
----- AWStats awstats_configure 1.0 (build 1.3) (c) Laurent Destailleur -----
This tool will help you to configure AWStats to analyze statistics for
one web server. You can try to use it to let it do all that is possible
in AWStats setup, however following the step by step manual setup
documentation (docs/index.html) is often a better idea. Above all if:
- You are not an administrator user,
- You want to analyze downloaded log files without web server,
- You want to analyze mail or ftp log files instead of web log files,
- You need to analyze load balanced servers log files,
- You want to 'understand' all possible ways to use AWStats...
Read the AWStats documentation (docs/index.html).

-----> Running OS detected: Linux, BSD or Unix

-----> Check for web server install
Found Web server Apache config file '/etc/httpd/httpd.conf'
If you've made the appropriate changes to use the combined log format, answer N to the next question to prevent AWStats from changing your configuration.
-----> Check and complete web server config file '/etc/httpd/httpd.conf'
Warning: You Apache config file contains directives to write 'common' log files
This means that some features can't work (os, browsers and keywords detection).
Do you want me to setup Apache to write 'combined' log files [y/N] ? N
  Add 'Alias /awstatsclasses "/usr/local/awstats/wwwroot/classes/"'
  Add 'Alias /awstatscss "/usr/local/awstats/wwwroot/css/"'
  Add 'Alias /awstatsicons "/usr/local/awstats/wwwroot/icon/"'
  Add 'ScriptAlias /awstats/ "/usr/local/awstats/wwwroot/cgi-bin/"'
  Add '<Directory>' directive
  AWStats directives added to Apache config file.

-----> Update model config file '/usr/local/awstats/wwwroot/cgi-bin/awstats.model.conf'
  File awstats.model.conf updated.
Allow the setup tool to create the initial configuration file for you by answering Y to the following question. Provide a simple name for your site, such as "MySite", used here, and then go with the default /etc/awstats directory to store the config files.
-----> Need to create a new config file ?
Do you want me to build a new AWStats config/profile
file (required if first install) [y/N] ? y

-----> Define config file name to create
What is the name of your web site or profile analysis ?
Example: www.mysite.com
Example: demo
Your web site, virtual server or profile name:
> MySite

-----> Define config file path
In which directory do you plan to store your config file(s) ?
Default: /etc/awstats
Directory path to store config file(s) (Enter for default):
>

-----> Create config file '/etc/awstats/awstats.MySite.conf'
Config file /etc/awstats/awstats.MySite.conf created.

-----> Restart Web server with '/sbin/service httpd restart'
No such service httpd

-----> Add update process inside a scheduler
Sorry, configure.pl does not support automatic add to cron yet.
You can do it manually by adding the following command to your cron:
/usr/local/awstats/wwwroot/cgi-bin/awstats.pl -update -config=MySite
Or if you have several config files and prefer having only one command:
/usr/local/awstats/tools/awstats_updateall.pl now
Press ENTER to continue...

A SIMPLE config file has been created: /etc/awstats/awstats.MySite.conf
You should have a look inside to check and change manually main parameters.
You can then manually update your statistics for 'MySite' with command:
> perl awstats.pl -update -config=MySite
You can also read your statistics for 'MySite' with URL:
> http://localhost/awstats/awstats.pl?config=MySite

Press ENTER to finish...
Your installation is now complete, but before AWStats will actually run, you'll need to make a few changes to the configuration file and the /usr/local/awstats directory.
First, fix the permissions on /usr/local/awstats by typing chmod 755 /usr/local/awstats. This will allow Apache to execute the CGI application that has been installed.
Next, create a default data directory that is writeable by Apache:
brezup:jray jray $ sudo mkdir -p /var/lib/awstats
brezup:jray jray $ sudo chown www /var/lib/awstats
Finally, edit the configuration file created by the setup utility. The file should be located at /etc/awstats/awstats.<your site config name>.conf, such as awstats.MySite.conf for the sample we've created here.
Search for the line beginning with LogFile=. This should be set to the path of your Apache log, which by default is /var/log/httpd/access_log. Modify the line to read:
LogFile="/var/log/httpd/access_log"
Now, save the configuration file and restart Apache with sudo /usr/sbin/apachectl restart.
Running the Stats Analysis
Before you can view the results of AWStats, you must run the analysis on the current log data. To do this, first enter the AWStats cgi-bin directory: cd /usr/local/awstats/wwwroot/cgi-bin. Then execute AWStats with sudo ./awstats.pl -config=<your site config name> -update. For example, the site defined in /etc/awstats/awstats.MySite.conf is processed with
brezup:jray jray $ sudo ./awstats.pl -config=MySite -update
Update for config "/etc/awstats/awstats.MySite.conf"
With data in log file "/var/log/httpd/access_log"...
Phase 1 : First bypass old records, searching new record...
Searching new records from beginning of log file...
Jumped lines in file: 0
Parsed lines in file: 40
 Found 0 dropped records,
 Found 0 corrupted records,
 Found 0 old records,
 Found 40 new qualified records.
You can now view your site statistics by pointing your web browser at http://127.0.0.1/awstats/awstats.pl?config=<yoursiteconfigname>. A well-populated AWStats page is shown in Figure 23.6.
Figure 23.6. AWStats is a free tool for comprehensive web statistics.
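The statistics only reflect the log data as of the last update, and the setup tool noted that it does not add the update command to cron for you. One way to do so is sketched here, using the update command printed by the setup tool. The helper name awstats_cron_line is hypothetical, and the hourly schedule is an arbitrary choice for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab entry that updates one AWStats config
# at minute 0 of every hour (the schedule is an assumption; adjust to taste).
awstats_cron_line() {
    config="$1"
    printf '0 * * * * /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -update -config=%s\n' "$config"
}

# Example: print the entry for the sample site, review it, and then paste it
# into root's crontab using "sudo crontab -e".
# awstats_cron_line MySite
```

Printing the line rather than installing it automatically keeps the existing crontab untouched until you have reviewed the entry.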
There are many tweaks and additional features of the AWStats package that you can make/enable within the configuration file. You might want to browse the official documentation at http://awstats.sourceforge.net/docs/index.html for more information.