You want to see which programs your web server is running and which web pages users are requesting.
Use command-line tools to get a real-time snapshot of web server activity:
Almost any decent web hosting account will record connections to your web site in logfiles that you can view and process. A good hosting provider may even help you automate the task of purging the connection records (known as log rolling) so the files do not consume your account's disk quota, and give you access to web site statistics software, such as Analog or Urchin, that will generate easy-to-read reports about activity on your web site.
If you're serious about your web site, then you should take advantage of the tools available to you and review web site traffic reports often to understand how visitors get to your site, what's popular, and what's working (or not working). How to look at and use web site traffic reports is covered in Recipe 9.9.
The access and error logs that provide the raw material for traffic reports are constantly updated. Traffic reports themselves, on the other hand, are usually generated less frequently: daily, or even weekly, in some cases. A situation may arise when you can't wait for the next traffic report to be created. You need to get an up-to-the-minute picture of the who, what, and how many of your web site's current activity. Here are some command-line tools you can use to take your web site's pulse.
Using tail to track web site requests in real time
First, you'll need to find your Apache access and error logfiles. They are usually saved in a separate logs directory and have names like access_log, access.log, or apache.access_log. The error log should be in the same directory as the access log, so once you've found the logs, Telnet into your web server and switch to the logfiles directory.
Now you can watch connections to your web site as they're handled by Apache with the Unix utility tail. Assuming your access log is named access_log, type this command at your Telnet prompt:
tail -f access_log
Your shell window should be filled with several lines, like this:
126.96.36.199 - - [14/May/2005:12:49:26 -0500] "GET /swgr/index.php HTTP/1.1" 200 29070 "http://daddison.com/index.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
188.8.131.52 - - [14/May/2005:12:49:30 -0500] "GET /case_studies/cs01.html HTTP/1.0" 200 19604 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT; .NET CLR 1.1.4322)"
184.108.40.206 - - [14/May/2005:12:49:33 -0500] "GET /clients/index.html HTTP/1.1" 301 255 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
Each line records the IP address, file requested, and status of a single connection, or hit, to your web site. By default, tail shows the last 10 lines of the access log; the -f flag tells it to keep running and echo new lines to the shell window as they are appended to the file. See for yourself: open a browser window and, with your shell window still visible, hit a page on your web site. Your request should be duly noted by tail.
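When the log is busy, the three fields described above can be hard to pick out of each long line. The following is a minimal sketch of one way to trim the output with awk. It assumes the log uses the combined format shown in the sample lines, where the client address, requested path, and status code are whitespace-separated fields 1, 7, and 9. The function name summarize_hits is made up for illustration.

```shell
# summarize_hits: hypothetical helper, not part of Apache or tail.
# Reads access-log lines on standard input and prints only the
# client address, requested path, and status code of each hit.
# Assumes the combined log format shown in the sample lines.
summarize_hits() {
    awk '{ print $1, $7, $9 }'
}

# Typical use (commented out here because tail -f runs until you
# interrupt it with Ctrl-C):
#   tail -f access_log | summarize_hits
```

Some awk implementations buffer their input when reading from a pipe, so new hits may show up in bursts rather than one at a time.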
Using grep to find specific requests in the web server log
Going back to the problem in Recipe 1.8 about automatically updating pages on your site, let's say that your boss wants to know how many hits to the company's latest news release have been recorded today. And she can't wait until tomorrow, when a nice and neat traffic report will be waiting on the site with the answer. With grep, you can narrow your focus on the access log to just see recent requests for a specific file.
At the Telnet prompt on your web server, you can instruct the grep utility to search the current access log for requests for the news release by typing this command:
grep "GET /news/newsrelease.html" access_log
With the search string GET /news/newsrelease.html you're looking for all the requests for newsrelease.html in the /news directory in the current server log. The results might look like this:
220.127.116.11 - - [14/May/2005:13:55:45 -0500] "GET /news/newsrelease.html HTTP/1.1" 200 18912 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
18.104.22.168 - - [14/May/2005:13:56:36 -0500] "GET /news/newsrelease.html HTTP/1.1" 200 18912 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)"
22.214.171.124 - - [14/May/2005:13:58:09 -0500] "GET /news/newsrelease.html HTTP/1.1" 200 18912 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
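Since the boss asked for a number rather than a listing, it may help to have grep count the matches and to limit them to a single day. The sketch below is one possible approach, not the recipe's own method: it assumes the date appears in the log in the bracketed form shown above (for example, [14/May/2005:...), and the function name count_hits and its arguments are hypothetical.

```shell
# count_hits: hypothetical helper for illustration.
#   $1 = date as it appears in the log, e.g. 14/May/2005
#   $2 = request string to look for
#   $3 = path to the access log
# The first grep keeps only lines stamped with the given date.
# The second grep uses -c to print the number of matching lines
# instead of the lines themselves. -F treats both patterns as
# fixed strings, so the "[" and "." are matched literally.
count_hits() {
    grep -F "[$1:" "$3" | grep -F -c "$2"
}

# Typical use:
#   count_hits "14/May/2005" "GET /news/newsrelease.html" access_log
```

To count today's hits without typing the date, the first argument could be supplied as "$(date +%d/%b/%Y)", assuming the server's locale prints English month abbreviations like those in the log.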
You can also send the results of the search to a file by modifying the command like this:
grep "newsrelease.html" access_log > newsrelease_report.txt
And if you want to get really fancy, you can put that second grep command in your crontab file, have it run every 15 minutes, and let the boss check the hits herself.
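As a rough sketch of what that could look like, the search can be wrapped in a small function and the schedule expressed as a crontab entry. The function name write_report and both file paths are placeholders invented for this example, and the */15 step syntax assumes a cron implementation that supports it (Vixie cron and its descendants do).

```shell
# write_report: hypothetical helper; the name and arguments are
# illustrative and not part of the original recipe.
#   $1 = path to the access log
#   $2 = path of the report file, overwritten on every run
write_report() {
    grep "newsrelease.html" "$1" > "$2"
}

# A crontab entry along these lines would run the same search every
# 15 minutes. Both paths are placeholders for your own locations:
#
#   */15 * * * * grep "newsrelease.html" /path/to/logs/access_log > /path/to/newsrelease_report.txt
```

Because the report is written with >, each run replaces the previous file, so the boss always sees the full list of matches in the current log rather than an ever-growing file of duplicates.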
You also can use grep to sift the access log for errors and unsuccessful requests that visitors to your web site are encountering. Each line in the log includes a status code indicating the result of the request. Some common status codes are shown in Table 1-2. For a complete list, see the World Wide Web Consortium (W3C) list referred to in the "See Also" section of this Recipe.
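A plain grep for a number such as 404 can also match byte counts or parts of a URL, so one hedged alternative is to test the status field directly with awk. The sketch below assumes the combined log format shown earlier, with the status code in whitespace-separated field 9 and the requested path in field 7. It treats any code of 400 or higher as a failure, since HTTP reserves the 4xx range for client errors and the 5xx range for server errors. The function name failed_requests is hypothetical.

```shell
# failed_requests: hypothetical helper for illustration.
#   $1 = path to the access log
# Prints the status code and requested path of every log line whose
# status code is 400 or higher (client and server errors).
# Assumes the combined log format, with the status in field 9.
# Malformed request lines can shift the fields, so treat the output
# as a starting point rather than an exact report.
failed_requests() {
    awk '$9 >= 400 { print $9, $7 }' "$1"
}

# Typical use:
#   failed_requests access_log
```

Piping the output through sort | uniq -c | sort -rn would rank the failing requests by how often they occur.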
Using ps to monitor web server processes
Finally, there may come a time when you want to see what processes are running under your user ID on your web server. Use the Unix process report utility, ps, with this command, replacing userid with your own ID (right after the -U flag):
ps -U userid
The results should look something like this, with httpd indicating Apache processes that are currently running on your web server:
  PID TTY      TIME CMD
11565 ?        0:00 httpd
 1715 pts/5    0:00 tail
11569 pts/6    0:00 tcsh
11560 ?        0:00 httpd
11567 ?        0:00 sshd
11512 ?        0:00 sh
11542 ?        0:01 httpd
29475 ?        0:01 sshd
29477 pts/5    0:00 tcsh
 6373 ?        0:00 sshd
11559 ?        0:00 httpd
11578 pts/6    0:00 ps
11557 ?        0:00 httpd
11553 ?        0:00 httpd
11554 ?        0:00 httpd
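If all you need is the number of Apache processes, the listing can be reduced to a single count. This is a minimal sketch that assumes ps prints the four-column layout shown above, with a header line first and the command name in the last column. The function name count_httpd is made up for illustration.

```shell
# count_httpd: hypothetical helper for illustration.
# Reads ps output on standard input, skips the header line, and
# counts the rows whose last column (CMD) is exactly "httpd".
# Prints 0 when no such rows are found.
count_httpd() {
    awk 'NR > 1 && $NF == "httpd" { n++ } END { print n + 0 }'
}

# Typical use, replacing userid with your own ID:
#   ps -U userid | count_httpd
```

Matching the last column exactly, rather than searching each line for the text httpd, avoids counting unrelated commands that merely contain that string.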
For a complete list of HTTP status code definitions, see the W3C page at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.