only for RuBoard - do not distribute or recompile |
Table A-1 shows the breakdown of responses by content type. As you can see, images make up about 60% of all web requests by count and 40% by volume. The top three content types (GIFs, JPEGs, and HTML) account for 95% of all requests and 63% of all traffic volume.
Content Type | Count % | Volume % | Mean Size (KB) |
---|---|---|---|
image/gif | 40.8 | 16.6 | 3.75 |
text/html | 35.1 | 23.1 | 6.07 |
image/jpeg | 19.0 | 22.9 | 11.12 |
text/plain | 1.7 | 2.5 | 13.45 |
application/x-javascript | 1.5 | 0.2 | 1.45 |
application/octet-stream | 0.5 | 10.2 | 179.22 |
application/zip | 0.1 | 8.0 | 684.14 |
video/mpeg | 0.0 | 3.4 | 761.90 |
application/pdf | 0.0 | 1.3 | 336.30 |
audio/mpeg | 0.0 | 2.5 | 1707.70 |
video/quicktime | 0.0 | 1.2 | 1205.42 |
All others | 1.1 | 8.1 | 69.60 |
This data is derived from the fifth and tenth fields of Squid's access.log file. The logs include many responses without a content type, such as 302 (Found) and 304 (Not Modified). All non-200 status responses without a content type have been filtered out. I have not eliminated the effects of popularity. Thus, these numbers represent the percentage of requests made by clients rather than the percentage of content that lives at origin servers.
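The tabulation described above can be sketched roughly as follows. This is a minimal illustration, not the script used to produce Table A-1. It assumes Squid's native log format, in which the fourth field holds the result code and HTTP status (for example `TCP_MISS/200`), the fifth holds the reply size in bytes, and the tenth holds the content type, with `-` standing in for a missing type. The function name `summarize` and the choice of 1 KB = 1024 bytes are assumptions made for the example.

```python
from collections import defaultdict


def summarize(lines):
    """Return {content_type: (count_pct, volume_pct, mean_size_kb)}.

    Assumes whitespace-separated fields in Squid's native access.log layout.
    """
    counts = defaultdict(int)
    volume = defaultdict(int)

    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip malformed or truncated lines
        status = fields[3].split("/")[-1]  # "TCP_MISS/200" -> "200"
        try:
            size = int(fields[4])  # fifth field: reply size in bytes
        except ValueError:
            continue
        ctype = fields[9].lower()  # tenth field: content type, or "-"

        # Drop non-200 responses that carry no content type (e.g. 302, 304).
        if ctype == "-" and status != "200":
            continue

        counts[ctype] += 1
        volume[ctype] += size

    total_count = sum(counts.values())
    total_bytes = sum(volume.values())
    result = {}
    for ctype, n in counts.items():
        count_pct = 100.0 * n / total_count
        volume_pct = 100.0 * volume[ctype] / total_bytes if total_bytes else 0.0
        mean_kb = volume[ctype] / n / 1024.0  # assumption: 1 KB = 1024 bytes
        result[ctype] = (count_pct, volume_pct, mean_kb)
    return result
```

Because every logged request is counted, a popular object requested many times contributes many times, which is why the result reflects client requests rather than the mix of content at origin servers.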
Figure A-3 shows some long-term trends of the three most popular content types and JavaScript. The percentage of JPEG images remains more or less constant at about 20%. GIF requests seem to have a decreasing trend, and HTML has a corresponding increasing trend. The GIF and JPEG traces are very periodic. The peaks and valleys correspond to weekends and weekdays. On weekends, JPEG counts are higher and GIF counts are lower. A possible explanation is that pornographic images are usually JPEG files, and people view pornography more on weekends than on weekdays. The JavaScript trace shows a very slight increasing trend, although it comprises a very small fraction of total traffic.
This data is also derived from Squid's access.log file. To determine the content type, I look at the Content-type header and, if that's not present, the filename extension. I ignore responses for which the content type cannot be determined.
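The header-then-extension rule described above might look something like the sketch below. The function name `content_type` and the small `EXTENSION_TYPES` table are hypothetical, chosen only for illustration. A real analysis would need a much fuller extension map, and the text does not say which one was used.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny extension map for illustration only.
EXTENSION_TYPES = {
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".html": "text/html",
    ".htm": "text/html",
    ".js": "application/x-javascript",
}


def content_type(header_value, url):
    """Return a content type, or None if it cannot be determined.

    header_value is the logged Content-Type (None or "-" when absent);
    url is the request URL, used only as a fallback.
    """
    if header_value and header_value != "-":
        # Strip parameters such as "; charset=..." and normalize case.
        media_type = header_value.split(";")[0].strip().lower()
        if media_type:
            return media_type

    # Fall back to the filename extension in the URL path
    # (query strings are ignored).
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    return EXTENSION_TYPES.get(extension)
```

Responses for which this returns `None` would simply be left out of the counts.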