2.1 HTTP Requests

Clients always use HTTP when talking to a caching proxy. This is true even when the client requests an FTP or Gopher URL, as we'll see shortly. A client issues a slightly different request when it knows it is talking to a proxy server rather than to an origin server. Occasionally, requests to a cache are referred to as proxy HTTP requests.

2.1.1 Origin Server Requests

First, let's examine a request sent to an origin server. In this example, the user is requesting the URL http://www.nlanr.net/index.html. When the client is not configured to use a proxy, it connects directly to the origin server (www.nlanr.net) and writes this request:

    GET /index.html HTTP/1.1
    Host: www.nlanr.net
    Accept: */*
    Connection: Keep-alive

In reality, the request includes many more headers than are shown here. Note how the URL has been split into two parts. The request line (the first line) includes only the pathname component of the URL, while the hostname part appears later in a Host header. The Host header is an HTTP/1.1 feature, primarily intended to support virtual hosting of multiple logical web sites on one physical server (one IP address). If the origin server is not serving virtual domains, the Host header is redundant. Note that we can rebuild a full URL from the request line and the Host header. This is an important feature of HTTP/1.1, especially for interception proxies.
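The rebuilding step described above can be sketched in a few lines of Python. This is only an illustration: the function name rebuild_url and the default scheme argument are assumptions made for the example, not part of HTTP or of any particular proxy.

```python
def rebuild_url(request_line, headers, scheme="http"):
    """Reconstruct a full URL from an origin-server style request.

    `headers` is a dict of header names to values. The function name and
    the `scheme` default are hypothetical choices for this sketch.
    """
    method, target, version = request_line.split()
    # A proxy-style request already carries the full URL in the request line.
    if "://" in target:
        return target
    # Header names are case-insensitive, so normalize them before the lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    host = lowered.get("host")
    if not host:
        raise ValueError("cannot rebuild URL: request has no Host header")
    return f"{scheme}://{host}{target}"


if __name__ == "__main__":
    print(rebuild_url("GET /index.html HTTP/1.1", {"Host": "www.nlanr.net"}))
```

An interception proxy receives requests in this origin-server form, which is why it depends on the Host header being present.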

2.1.2 Proxy Requests

When a client talks to a proxy, the request is slightly different. The request line of a proxy request includes the full URI:

    GET http://www.nlanr.net/index.html HTTP/1.1
    Host: www.nlanr.net
    Accept: */*
    Proxy-connection: Keep-alive
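To make the difference between the two request forms concrete, here is a minimal Python sketch of how a client might compose either one. The function build_request and its via_proxy flag are hypothetical names chosen for the example, and the header set is limited to the headers shown in the samples above.

```python
from urllib.parse import urlsplit


def build_request(url, via_proxy):
    """Compose a GET request for `url` (hypothetical helper for illustration)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Proxy requests carry the full URI; origin requests carry only the path.
    target = url if via_proxy else path
    connection_header = "Proxy-connection" if via_proxy else "Connection"
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Accept: */*",
        f"{connection_header}: Keep-alive",
    ]
    # Header lines end with CRLF, and a blank line ends the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_request("http://www.nlanr.net/index.html", via_proxy=False))
    print(build_request("http://www.nlanr.net/index.html", via_proxy=True))
```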

The origin server name is in two places: the full URI and the Host header. This may seem redundant, but when HTTP/1.0 and proxying techniques were invented, the Host header did not exist. [1]

[1] Since the hostname exists in two places, it's possible they don't match. When this occurs, the URL hostname takes precedence. To avoid incorrect operation, a caching proxy must ignore the original Host value and replace it with the URL hostname.
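The rule in this footnote might be implemented roughly as follows. This is a sketch under stated assumptions: normalize_host is a hypothetical name, and headers are represented as a plain dict, which real proxies generally do not do.

```python
from urllib.parse import urlsplit


def normalize_host(request_line, headers):
    """Return a copy of `headers` whose Host matches the request-line URL.

    Hypothetical helper: the hostname in the URL takes precedence over
    whatever Host value the client sent.
    """
    method, target, version = request_line.split()
    url_host = urlsplit(target).netloc
    if not url_host:
        # Not a full URI, so there is nothing to compare against.
        return dict(headers)
    # Drop any existing Host header regardless of capitalization.
    fixed = {name: value for name, value in headers.items()
             if name.lower() != "host"}
    fixed["Host"] = url_host
    return fixed


if __name__ == "__main__":
    request_line = "GET http://www.nlanr.net/index.html HTTP/1.1"
    print(normalize_host(request_line, {"Host": "other.example", "Accept": "*/*"}))
```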

HTTP provides for the fact that requests and responses can pass through a number of proxies between a client and origin server. Some HTTP headers are defined as end-to-end and some as hop-by-hop. The end-to-end headers convey information for the end systems (client and origin server), and they generally must not be modified by proxies. The Cookie header is end-to-end. Conversely, the hop-by-hop header information is meant for intermediate systems, and it must often be modified or removed before being forwarded. The Proxy-connection and Proxy-authorization headers are hop-by-hop. A client uses the Proxy-connection request header to ask the proxy to make the TCP connection "persistent," so that it can be reused for a future request. The Proxy-authorization header contains credentials for access to the proxy, not the origin server.
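As an illustration of the distinction, the following Python sketch removes hop-by-hop headers before a request is forwarded. The header set combines the hop-by-hop names listed in the HTTP/1.1 specification with the nonstandard Proxy-connection header mentioned above. The function name strip_hop_by_hop and the dict representation of headers are assumptions for the example.

```python
# Hop-by-hop header names (lowercase). Proxy-connection is nonstandard but
# is included because clients send it to proxies.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "proxy-connection", "te", "trailers", "transfer-encoding", "upgrade",
}


def strip_hop_by_hop(headers):
    """Return only the end-to-end headers from `headers` (hypothetical helper)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Any header named in the Connection header is also hop-by-hop.
    listed = {token.strip().lower()
              for token in lowered.get("connection", "").split(",")
              if token.strip()}
    drop = HOP_BY_HOP | listed
    return {name: value for name, value in headers.items()
            if name.lower() not in drop}


if __name__ == "__main__":
    incoming = {
        "Host": "www.nlanr.net",
        "Cookie": "id=1",
        "Proxy-connection": "Keep-alive",
        "Proxy-authorization": "Basic abc",
    }
    print(strip_hop_by_hop(incoming))
```

A real proxy would act on the removed headers (for example, checking the Proxy-authorization credentials) before discarding them. This sketch shows only the filtering.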

2.1.3 Non-HTTP Proxy Requests

Finally, let's look at how proxies handle non-HTTP URIs. Most user agents have built-in support for other transfer protocols, such as FTP and Gopher. In other words, these clients know how to talk directly to FTP and Gopher servers. Most caching proxies, however, do not emulate FTP and Gopher servers. That is, you can't FTP to a caching proxy. This restriction, for better or worse, comes about because it's hard to actually proxy these legacy transfer protocols. You might say it's a limitation of their design.

Since HTTP knows about proxies, non-HTTP URIs are proxied as HTTP requests. For example, the user-agent's request looks something like this:

    GET ftp://ftp.freebsd.org/pub/ HTTP/1.1
    Host: ftp.freebsd.org
    Accept: */*

Because this is an HTTP request, the proxy generates an HTTP response. In other words, it is the role of the proxy to convert from FTP on one side to HTTP on the other. The proxy acts like an FTP client when it talks to an FTP server. The response sent to the user-agent, however, is an HTTP response.

Non-HTTP requests are somewhat difficult for caching proxies in a couple of ways. Both FTP and Gopher services rely heavily on the use of directory listings. Convention dictates that directory listings are displayed with cute little icons and hyperlinks to each of the directory entries. A web client that connects directly to an FTP or Gopher server parses the directory listing and generates a nice HTML page. HTTP servers, however, work differently. For HTTP, the HTML page is generated by the server, not the client. Since FTP and Gopher are proxied via HTTP, it becomes the proxy's responsibility to generate pleasant-looking pages for directory listings. This causes directory pages to look different, depending on whether you proxy FTP and Gopher requests.
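A proxy's page generation for directory listings could look something like the Python sketch below. It assumes the listing has already been parsed into (name, is_directory) pairs, which is the hard part in practice and is not shown. The function name render_listing and the HTML layout are invented for the example, and no icons are produced.

```python
from html import escape
from urllib.parse import quote


def render_listing(base_url, entries):
    """Build an HTML page for a directory listing (hypothetical helper).

    `entries` is a list of (name, is_dir) tuples, assumed to have been
    parsed from the FTP or Gopher server's reply.
    """
    if not base_url.endswith("/"):
        base_url += "/"
    rows = []
    for name, is_dir in entries:
        suffix = "/" if is_dir else ""
        # Percent-encode the name for the link, HTML-escape it for display.
        href = escape(base_url + quote(name) + suffix, quote=True)
        rows.append(f'<li><a href="{href}">{escape(name)}{suffix}</a></li>')
    title = escape(f"Directory listing for {base_url}")
    body = "".join(rows)
    return (f"<html><head><title>{title}</title></head>"
            f"<body><h1>{title}</h1><ul>{body}</ul></body></html>")


if __name__ == "__main__":
    print(render_listing("ftp://ftp.freebsd.org/pub",
                         [("FreeBSD", True), ("README.TXT", False)]))
```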

Another difficult aspect is that HTTP has certain attributes and features missing from these older protocols. In particular, it can be hard to get values for the Last-modified and Content-length headers. Of course, FTP and Gopher servers don't understand anything about expiration and validation. [2] Usually, the caching proxy just selects an appropriate freshness lifetime for non-HTTP objects. Because the proxy has little information, it may return more stale responses for FTP files than for HTTP resources.

[2] Some FTP servers do support nonstandard commands for getting a file's modification time (MDTM) and length (SIZE). These might be used to implement validation. The IETF is currently considering a document that would finally standardize these commands.
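One way a proxy might select a freshness lifetime for a non-HTTP object is sketched below in Python. The heuristic is an assumption for illustration: it uses a fraction of the object's age when a modification time is known (for example, from MDTM) and a fixed default otherwise. The function name, the parameter names, and every numeric default are hypothetical.

```python
def freshness_lifetime(now, last_modified=None, default_ttl=3600.0,
                       lm_factor=0.1, max_ttl=86400.0):
    """Pick a freshness lifetime in seconds (hypothetical heuristic).

    `now` and `last_modified` are timestamps in seconds. All defaults are
    illustrative values, not recommendations.
    """
    # Without a usable modification time, fall back to a fixed lifetime.
    if last_modified is None or last_modified > now:
        return default_ttl
    # Objects that have not changed for a long time are assumed to stay
    # unchanged a while longer, up to a ceiling.
    age = now - last_modified
    return min(age * lm_factor, max_ttl)


if __name__ == "__main__":
    print(freshness_lifetime(now=1_000_000.0))
    print(freshness_lifetime(now=1_000_000.0, last_modified=996_000.0,
                             lm_factor=0.25))
```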

Determining a response's content type is another hard problem. Proxies and servers often use the filename extension to determine the content type. For example, URLs that end with .jpg have the Content-type image/jpeg. For FTP files, the proxy must guess the Content-type, and it's relatively common to find mysterious filename extensions, such as .scr. The proxy's choice for a content-type affects the way a browser handles the response. For text types, the browser displays the data in a window. For others, such as the catchall application/octet-stream, the browser brings up a "Save as" dialog window. As new filename extensions and content types are brought into use, caching proxies must be updated.
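The extension-based guess can be sketched in Python as follows. The mapping table here is a deliberately tiny, hypothetical one; a real proxy would ship a much larger table, and Python's standard mimetypes module offers a ready-made alternative. The name guess_content_type is an assumption for the example.

```python
import posixpath
from urllib.parse import urlsplit

# A small illustrative table. Real tables are far larger and need updating
# as new extensions come into use.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".jpg": "image/jpeg",
    ".gif": "image/gif",
}
DEFAULT_TYPE = "application/octet-stream"


def guess_content_type(url):
    """Guess a Content-type from the URL's filename extension."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    # Unknown extensions fall back to the catchall type.
    return EXTENSION_TYPES.get(extension, DEFAULT_TYPE)


if __name__ == "__main__":
    print(guess_content_type("ftp://ftp.freebsd.org/pub/photo.jpg"))
    print(guess_content_type("ftp://ftp.freebsd.org/pub/mystery.scr"))
```

With this table, an unrecognized extension such as .scr falls through to application/octet-stream, so the browser would offer to save the file rather than display it.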

Because of problems such as these, many organizations don't bother to cache non-HTTP requests. As we'll see in Chapter 4, you can easily configure user agents to send or not send FTP and other types of URLs to a caching proxy.

Web Caching
ISBN: 156592536X
EAN: N/A
Year: 2001
Pages: 160
