A.3 HTTP Headers

It is interesting to examine the HTTP headers of requests and responses flowing through the caches. To get this information, I temporarily modified Squid to write a short binary record that indicates which headers are present. I also tracked the Cache-control directives.
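A presence record of this kind can be as simple as a bit mask with one bit per known header. The sketch below only illustrates the idea; it is not the format Squid actually wrote. The header list, the bit order, the extra "Other" bit, and the 8-byte record size are all assumptions made for the example.

```python
import struct

# Hypothetical header list; each header's bit position is its list index.
KNOWN_HEADERS = ["Host", "User-Agent", "Via", "Accept", "Cache-Control",
                 "X-Forwarded-For", "If-Modified-Since", "If-None-Match"]
BIT = {name.lower(): i for i, name in enumerate(KNOWN_HEADERS)}
OTHER_BIT = len(KNOWN_HEADERS)  # one extra bit for any unlisted header


def encode_presence(header_names):
    """Pack the set of headers present in one message into 8 bytes."""
    mask = 0
    for name in header_names:
        mask |= 1 << BIT.get(name.lower(), OTHER_BIT)
    return struct.pack("<Q", mask)


def decode_presence(record):
    """Return the names whose bits are set in an 8-byte record."""
    (mask,) = struct.unpack("<Q", record)
    names = [h for h in KNOWN_HEADERS if mask & (1 << BIT[h.lower()])]
    if mask & (1 << OTHER_BIT):
        names.append("Other")
    return names
```

A fixed-size record like this keeps the log small and makes counting occurrences a matter of testing bits, at the cost of discarding header values and URLs.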

The headers log file does not include URLs, so I cannot eliminate the popularity effects. There is one entry for each request from and each response to a client, so this data is from the client's point of view.

A.3.1 Client Request Headers

Table A-2 lists the request headers and their frequency of occurrence. It's important to keep in mind that most of these requests come from child caches, not from web browsers. Furthermore, most of the child caches are also running Squid. Evidence of this is seen in the occurrence of Via and X-Forwarded-For headers. Both of these are added by proxies, and the latter is an extension header used by Squid. According to this data, around 99% of all requests come from child caches.

Table A-2. Client Request Headers (IRCache Data)
Header              % Occurrence   Header              % Occurrence
Host                       99.91   Range                       0.46
User-Agent                 99.21   Connection                  0.26
Via                        98.90   From                        0.24
Accept                     98.84   Date                        0.18
Cache-Control              98.34   Proxy-Authorization         0.07
X-Forwarded-For            98.19   Request-Range               0.06
Accept-Language            91.33   If-Range                    0.05
Referer                    85.00   Expires                     0.02
Accept-Encoding            82.60   Mime-Version                0.01
Proxy-Connection           78.46   Content-Encoding            0.00
Cookie                     39.18   Location                    0.00
Accept-Charset             28.77   If-Match                    0.00
If-Modified-Since          24.83   X-Cache                     0.00
Pragma                     13.18   Age                         0.00
Other                       5.82   Last-Modified               0.00
Authorization               1.41   Server                      0.00
Content-Type                1.00   ETag                        0.00
Content-Length              0.84   Accept-Ranges               0.00
If-None-Match               0.61   Set-Cookie                  0.00

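The Referer and From headers are interesting for their privacy implications. Fortunately, very few requests include the From header, which carries the user's email address. Referer is quite common, but it is less of a threat to privacy, since it identifies the referring page rather than the user.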

The data indicates that about 25% of all requests are cache validations. Most of these are If-Modified-Since requests, and a small number are If-None-Match. Note that Squid does not support ETag-based validation at this time.
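To make the counting concrete, the following sketch classifies a request as a validation when it carries either conditional header, and counts it once even if it carries both. The function names and the list-of-header-names input are hypothetical; the real log format is not described here.

```python
# Conditional request headers that mark a cache validation.
VALIDATION_HEADERS = ("if-modified-since", "if-none-match")


def is_validation(header_names):
    """True if the request carries at least one conditional header."""
    names = {n.lower() for n in header_names}
    return any(h in names for h in VALIDATION_HEADERS)


def validation_share(requests):
    """Percentage of requests that are validations.

    `requests` is a list in which each item is the list of header
    names seen in one request.
    """
    if not requests:
        return 0.0
    hits = sum(1 for headers in requests if is_validation(headers))
    return 100.0 * hits / len(requests)
```

Because a single request may include both headers, the two percentages in Table A-2 cannot simply be added; the true share lies somewhere between 24.83% and 25.44%.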

Table A-3 lists the Cache-control directives found in the same set of requests. The max-age directive occurs often because Squid always adds this directive when forwarding a request to a neighbor cache. The only-if-cached directives come from caches configured in a sibling relationship. (The only-if-cached directive instructs the sibling not to forward the request if it is a cache miss.)

Table A-3. Cache-control Request Directives (IRCache Data)
Directive           % Occurrence
max-age                    98.01
only-if-cached              9.63
no-cache                    0.09
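As an illustration of how a sibling might act on these directives, here is a minimal parser for a Cache-control header value. It is a simplified sketch: it does not handle commas inside quoted strings, the function names are hypothetical, and the max-age value in the test is arbitrary.

```python
def parse_cache_control(value):
    """Split a Cache-control header value into a {directive: argument} dict.

    Directives without an argument map to None.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_forward_on_miss(cache_control_value):
    """A cache must not forward a miss when only-if-cached is present."""
    return "only-if-cached" not in parse_cache_control(cache_control_value)
```

A request carrying only-if-cached that misses in the sibling would normally be answered with a 504 (Gateway Timeout) response rather than being forwarded.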

A.3.2 Client Reply Headers

Table A-4 lists the HTTP reply headers. X-Cache is an extension header that Squid uses for debugging. Its value is either HIT or MISS to indicate whether the reply came from a cached response.

Table A-4. Client Reply Headers (IRCache Data)
Header              % Occurrence   Header              % Occurrence
X-Cache                   100.00   Warning                     0.03
Proxy-Connection           99.88   Content-Language            0.02
Date                       95.20   WWW-Authenticate            0.02
Content-Type               84.94   Title                       0.01
Server                     82.49   Content-Base                0.01
Content-Length             65.67   Location                    0.00
Last-Modified              65.61   Referer                     0.00
ETag                       53.07   Content-MD5                 0.00
Accept-Ranges              48.06   From                        0.00
Age                        24.28   Host                        0.00
Cache-Control              10.36   Public                      0.00
Expires                    10.30   Upgrade                     0.00
Pragma                      3.13   X-Request-URI               0.00
Set-Cookie                  3.04   Cookie                      0.00
Other                       2.99   Accept-Charset              0.00
Mime-Version                1.62   User-Agent                  0.00
Via                         0.80   Retry-After                 0.00
Vary                        0.66   Accept-Language             0.00
Link                        0.53   Authorization               0.00
Content-Location            0.28   Range                       0.00
Content-Encoding            0.28   Accept-Encoding             0.00
Allow                       0.19   X-Forwarded-For             0.00
Connection                  0.12   If-Modified-Since           0.00
Accept                      0.04   Content-Range               0.00

The Date header is important for caching. RFC 2616 says that every response must have a Date header, with few exceptions. Here we see it in about 95% of replies, which is pretty good.

Content-length occurs in only about 66% of responses. This is unfortunate, because when a client (including a proxy) doesn't know how long the message should be, it's difficult to detect partial responses due to network problems. The missing Content-length header also prevents a connection from being persistent, unless the agents use chunked encoding.
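The reasoning above can be sketched as a small decision function. This is a simplification under stated assumptions: it ignores responses that never have a body (such as 304 replies), ignores Connection: close, and expects a dict of lowercased header names; the function names are hypothetical.

```python
def framing(headers):
    """Return how the end of a message body is signalled.

    `headers` maps lowercased header names to their values.
    """
    codings = [c.strip().lower()
               for c in headers.get("transfer-encoding", "").split(",")]
    if "chunked" in codings:
        return "chunked"
    if "content-length" in headers:
        return "content-length"
    return "close"  # the body ends only when the connection closes


def can_persist(headers):
    """A connection can stay open only if the body length is self-describing."""
    return framing(headers) != "close"


def is_truncated(headers, bytes_received):
    """True or False when Content-length allows a check; None otherwise.

    This sketch does not inspect chunk boundaries, so it also returns
    None for chunked bodies.
    """
    if framing(headers) != "content-length":
        return None
    return bytes_received < int(headers["content-length"])
```

With neither header present, a connection that closes early looks the same as one that closes after a complete body, which is why partial responses are hard to detect.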

Table A-5 lists the Cache-control reply directives present in the responses sent to cache clients. As you can see, no-cache and private are the most popular directives. The fact that both occur in 4.60% of responses leads me to believe they probably always occur together. max-age is the only other directive that occurs in more than 1% of responses. The "Other" entry refers to unknown or nonstandard directives.

Table A-5. Cache-control Reply Directives (IRCache Data)
Directive           % Occurrence   Directive           % Occurrence
no-cache                    4.60   no-store                    0.06
private                     4.60   no-transform                0.02
max-age                     2.69   s-maxage                    0.00
must-revalidate             0.23   proxy-revalidate            0.00
Other                       0.11   only-if-cached              0.00
public                      0.09
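Equal percentages for no-cache and private are consistent with the two directives always appearing together, but the table alone cannot confirm it. As a rough check, the sketch below computes the range of possible overlap between two items given only their individual percentages; the function name is hypothetical.

```python
def overlap_bounds(pct_a, pct_b):
    """Smallest and largest possible percentage of responses with both items.

    Only the two individual percentages are known, so the true overlap
    can fall anywhere in the returned (low, high) range.
    """
    low = max(0.0, pct_a + pct_b - 100.0)
    high = min(pct_a, pct_b)
    return low, high
```

For no-cache and private at 4.60% each, the overlap could be anywhere from 0% to 4.60%, so settling the question would require counting the two directives together in the per-response records.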

If we want to find the percentage of responses that have an expiration time, we need to know how often the Expires header and max-age directive occur separately and together. Table A-6 shows the percentage of responses that have one, the other, neither, and both of these headers. In these traces, 89.65% of responses have neither header, which means that only 10.35% have an expiration value. You can see that the Expires header is still more popular than the max-age directive, and that max-age almost never appears alone.

Table A-6. Responses with Explicit Expiration Times (IRCache Data)
Header/Directive                % Occurrence
Neither Expires nor max-age            89.65
Expires only                            7.65
Both                                    2.64
max-age only                            0.05
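The arithmetic behind these figures can be checked with a few lines. The values are copied from Table A-6; the variable names are only illustrative.

```python
# Percentages copied from Table A-6.
expires_only, both, max_age_only, neither = 7.65, 2.64, 0.05, 89.65

# Responses with any explicit expiration value, computed two ways.
any_expiration = round(expires_only + both + max_age_only, 2)
complement = round(100.0 - neither, 2)

# Totals for each mechanism, whether or not the other is also present.
expires_total = round(expires_only + both, 2)
max_age_total = round(max_age_only + both, 2)

print(any_expiration, complement, expires_total, max_age_total)
```

Summing the three rows gives 10.34%, while subtracting from 100% gives 10.35%; the 0.01 difference is what one would expect from rounding in the published figures. The per-mechanism totals of 10.29% and 2.69% agree, to within the same rounding, with the Expires entry in Table A-4 and the max-age entry in Table A-5.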

The analysis is similar for cache validators, although the results in Table A-7 are more encouraging. 77.04% of all responses sent to clients have a cache validator. Last-modified is still more popular than the ETag header, although a significant percentage (11.43%) of responses carry only the ETag validator.

Table A-7. Responses with Cache Validators (IRCache Data)
Header/Directive                % Occurrence
Both                                   41.64
Last-modified only                     23.97
Neither Last-modified nor ETag         22.96
ETag only                              11.43

Web Caching
ISBN: 156592536X
Year: 2001
Pages: 160