6.5. Web Server Statistics

Thus far I have discussed HTTP headers as a means to learn about an individual server. By taking a broader view and looking at them across many sites, it is possible to derive interesting statistics on the Internet at large.

Netcraft, a U.K.-based Internet services company, has built much of its reputation by looking at HTTP headers. Since 1995 they have been archiving the headers and other information from web servers around the world. This has allowed them to track a number of important statistics about the Internet such as the relative popularity of different web servers, operating systems, ISPs, and so forth.

Their survey of the market share of each major type of web server has long been used to highlight the impact that the Apache server and, by association, open source software (OSS), have had on the development of the Internet. The current version of this survey is available at http://news.netcraft.com/archives/web_server_survey.html. Figure 6-2, taken from that survey, shows how Apache dominates the field and continues to grow in importance.
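As a rough illustration of how a market share figure can be derived from Server headers, the following Python sketch tallies a list of header values by product name. It is not Netcraft's method. The function names (fetch_server_header, server_family, market_share) and the rule of treating everything before the first "/" as the product name are assumptions made for this example.

```python
from collections import Counter
from urllib.request import Request, urlopen


def fetch_server_header(url, timeout=10):
    """Return the Server header for url, or None if it is absent.

    Sends a HEAD request. Network errors and HTTP error statuses
    are both reported as None in this sketch.
    """
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.headers.get("Server")
    except OSError:
        return None


def server_family(header):
    """Reduce a header such as 'Apache/2.0.52 (Unix)' to 'Apache'."""
    if not header:
        return "unknown"
    # Keep the text before the first "/" and drop any trailing comment.
    product = header.split("/", 1)[0].split()
    return product[0] if product else "unknown"


def market_share(headers):
    """Return {family: percentage} for an iterable of Server header values."""
    counts = Counter(server_family(h) for h in headers)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: 100.0 * n / total for family, n in counts.items()}
```

Given a list of URLs, market_share(fetch_server_header(u) for u in urls) would produce the tally. Bear in mind that the Server header can be altered or suppressed by the site operator, so a count like this only reflects what servers choose to report.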

Netcraft provides an excellent Frequently Asked Questions (FAQ) page that details how they capture their data (http://uptime.netcraft.com/up/accuracy.html).

In addition to these summary statistics, Netcraft makes information available on individual sites. You can see what they know about any specific site by visiting http://uptime.netcraft.com/up/graph and entering its URL. For most small sites, the information is limited to that contained in the Server header and DNS entries, but querying larger sites may yield a lot more detail. In some cases, Netcraft can provide graphs that track the types of server in use at a site, as well as the estimated time since each server's last reboot. These offer a fascinating glimpse into the evolution of the computing infrastructure within companies or government agencies. This is beautifully illustrated by their graph for www.apple.com, which is shown in Figure 6-3.

The graph shows the switch from Solaris servers to Mac OS X in the middle of 2000. The diagonal tracks indicate separate servers and provide an indication of the number of servers that were being used to support this domain at any one time.
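The tracks are diagonal because a machine's uptime grows by one day for every day that passes until it reboots. Each observation therefore implies a boot day (observation day minus uptime), and observations that lie on the same diagonal share roughly the same boot day. The Python sketch below groups observations this way. The sample format, the boot_days function, and the one-day tolerance are all assumptions made for illustration, not a description of how Netcraft processes its data.

```python
def boot_days(samples, tolerance=1.0):
    """Group (day, uptime_days) samples into tracks.

    Returns the mean estimated boot day of each track, in ascending
    order. Estimates within `tolerance` days of the previous one are
    treated as belonging to the same track.
    """
    estimates = sorted(day - uptime for day, uptime in samples)
    tracks = []
    for estimate in estimates:
        if tracks and estimate - tracks[-1][-1] <= tolerance:
            tracks[-1].append(estimate)
        else:
            tracks.append([estimate])
    return [sum(track) / len(track) for track in tracks]


# Hypothetical observations: (day of measurement, reported uptime in days).
samples = [(100, 10), (110, 20), (120, 30), (105, 40), (115, 50.5)]
print(len(boot_days(samples)))  # number of distinct tracks
```

A single machine that reboots also starts a new track, so it is the number of tracks overlapping at a given time, rather than the total count, that suggests how many servers were in use.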

Figure 6-2. Market share for top servers across all domains, August 1995-April 2005 (Copyright Netcraft, http://www.netcraft.com)


Figure 6-3. Graph of operating system and server uptime for www.apple.com (Copyright Netcraft, www.netcraft.com)


Try entering the URLs of some of your favorite sites and see what you can uncover about their history. The most informative sites tend to be those of medium to large companies that do not use caching or load-balancing hardware, which can disrupt Netcraft's data capture. If inspiration fails you, the Internet Archive (www.archive.org), Harvard University (www.harvard.edu), and the SCO Group (www.sco.com) produce interesting results.



Internet Forensics
ISBN: 059610006X
Year: 2003
Pages: 121
