A.2 Content Types

Table A-1 shows the breakdown of responses by content type. As you can see, images make up about 60% of all web requests by count and 40% by volume. The top three content types (GIFs, JPEGs, and HTML) account for 95% of all requests and 63% of all traffic volume.

Table A-1. The Most Popular Content Types (IRCache Data)
Content Type               Count %   Volume %   Mean Size (KB)
image/gif                     40.8       16.6             3.75
text/html                     35.1       23.1             6.07
image/jpeg                    19.0       22.9            11.12
text/plain                     1.7        2.5            13.45
application/x-javascript       1.5        0.2             1.45
application/octet-stream       0.5       10.2           179.22
application/zip                0.1        8.0           684.14
video/mpeg                     0.0        3.4           761.90
application/pdf                0.0        1.3           336.30
audio/mpeg                     0.0        2.5          1707.70
video/quicktime                0.0        1.2          1205.42
All others                     1.1        8.1            69.60

This data is derived from the fifth and tenth fields of Squid's access.log file. The logs include many responses without a content type, such as 302 (Found) and 304 (Not Modified). All non-200 status responses without a content type have been filtered out. I have not eliminated the effects of popularity. Thus, these numbers represent the percentage of requests made by clients rather than the percentage of content that lives at origin servers.
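
As a rough illustration of the kind of tally described above, here is a minimal Python sketch. It is not the script used to produce Table A-1. It assumes Squid's native access.log layout, in which whitespace-separated field 4 is the result code and HTTP status (for example, TCP_MISS/200), field 5 is the reply size in bytes, and field 10 is the content type, with "-" logged when no type is available. The function name tally_content_types, the sample log lines, and the choice of 1 KB = 1024 bytes are all assumptions made for this example.

```python
from collections import defaultdict


def tally_content_types(lines):
    """Return {content_type: (count_pct, volume_pct, mean_size_kb)}.

    Hypothetical helper: assumes Squid's native access.log format.
    """
    counts = defaultdict(int)
    volumes = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip malformed or truncated entries
        # Field 4 looks like "TCP_MISS/200"; keep the part after the slash.
        status = fields[3].rpartition("/")[2]
        try:
            size = int(fields[4])  # field 5: reply size in bytes
        except ValueError:
            continue
        ctype = fields[9].lower()  # field 10: content type, or "-"
        # Filter out non-200 responses that carry no content type.
        if ctype == "-" and status != "200":
            continue
        counts[ctype] += 1
        volumes[ctype] += size

    total_count = sum(counts.values())
    total_volume = sum(volumes.values())
    result = {}
    for ctype, n in counts.items():
        count_pct = 100.0 * n / total_count
        volume_pct = 100.0 * volumes[ctype] / total_volume if total_volume else 0.0
        mean_kb = volumes[ctype] / n / 1024.0  # assumption: 1 KB = 1024 bytes
        result[ctype] = (count_pct, volume_pct, mean_kb)
    return result


# Made-up sample entries, for illustration only.
sample = [
    "1000000000.001 120 10.0.0.1 TCP_MISS/200 2048 GET http://example.com/a.gif - DIRECT/192.0.2.1 image/gif",
    "1000000001.002 80 10.0.0.2 TCP_HIT/200 6144 GET http://example.com/ - NONE/- text/html",
    "1000000002.003 15 10.0.0.1 TCP_MISS/304 250 GET http://example.com/b.gif - DIRECT/192.0.2.1 -",
    "1000000003.004 95 10.0.0.3 TCP_MISS/200 4096 GET http://example.com/c.gif - DIRECT/192.0.2.1 image/gif",
]

if __name__ == "__main__":
    stats = tally_content_types(sample)
    for ctype, (cnt, vol, mean) in sorted(stats.items()):
        print(f"{ctype:30s} {cnt:6.1f} {vol:6.1f} {mean:8.2f}")
```

Because every request in the log is counted, a popular object contributes once per request. This is why the resulting percentages describe client demand rather than the mix of content stored at origin servers.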

Figure A-3 shows some long-term trends of the three most popular content types and JavaScript. The percentage of JPEG images remains more or less constant at about 20%. GIF requests appear to be decreasing, with a corresponding increase in HTML requests. The GIF and JPEG traces are very periodic. The peaks and valleys correspond to weekends and weekdays. On weekends, JPEG counts are higher and GIF counts are lower. A possible explanation is that pornographic images are usually JPEG files, and people view pornography more on weekends than on weekdays. The JavaScript trace shows a very slight increasing trend, although JavaScript accounts for only a very small fraction of total traffic.

Figure A-3. Content-type trends over time (IRCache data)

This data is also derived from Squid's access.log file. To determine the content type, I look at the Content-Type header and, if that's not present, the filename extension. I ignore responses for which the content type cannot be determined.
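
The header-then-extension fallback described above might look something like the following Python sketch. It is an illustration under stated assumptions, not the code behind Figure A-3: the function name resolve_content_type is hypothetical, "-" is assumed to mark a missing header value in the log, and the extension lookup relies on Python's standard mimetypes table, which may differ from whatever extension mapping was used for the original analysis.

```python
import mimetypes
from urllib.parse import urlsplit


def resolve_content_type(header_value, url):
    """Return a media type string, or None if it cannot be determined.

    Hypothetical helper: prefers the logged Content-Type value and falls
    back to the filename extension in the URL path.
    """
    if header_value and header_value != "-":
        # Drop any parameters such as "; charset=utf-8".
        return header_value.split(";", 1)[0].strip().lower()
    # Use only the path so that query strings do not hide the extension.
    path = urlsplit(url).path
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed  # None when the extension is missing or unknown


if __name__ == "__main__":
    print(resolve_content_type("text/html; charset=UTF-8", "http://example.com/x.jpg"))
    print(resolve_content_type("-", "http://example.com/photos/cat.jpg"))
    print(resolve_content_type(None, "http://example.com/"))
```

A caller would skip any response for which the function returns None, which matches the rule of ignoring responses whose type cannot be determined.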

Web Caching
ISBN: 156592536X
Year: 2001
Pages: 160