6.5 Tricky Things About Proxy Requests

This section explains some of the tricky and much misunderstood aspects of proxy server requests, including:

                How the URIs in proxy requests differ from server requests

                How intercepting and reverse proxies can obscure server host information

                The rules for URI modification

                How proxies impact a browser's clever URI auto-completion or hostname-expansion features

6.5.1 Proxy URIs Differ from Server URIs

Web server and web proxy messages have the same syntax, with one exception. The URI in an HTTP request message differs when a client sends the request to a server instead of a proxy.

When a client sends a request to a web server, the request line contains only a partial URI (without a scheme, host, or port), as shown in the following example:

 GET /index.html HTTP/1.0
 User-Agent: SuperBrowser v1.3

When a client sends a request to a proxy, however, the request line contains the full URI. For example:

 GET http://www.marys-antiques.com/index.html HTTP/1.0 
 User-Agent: SuperBrowser v1.3 

Why have two different request formats, one for proxies and one for servers? In the original HTTP design, clients talked directly to a single server. Virtual hosting did not exist, and no provision was made for proxies. Because a single server knows its own hostname and port, to avoid sending redundant information, clients sent just the partial URI, without the scheme and host (and port).

When proxies emerged, the partial URIs became a problem. Proxies needed to know the name of the destination server, so they could establish their own connections to the server. And proxy-based gateways needed the scheme of the URI to connect to FTP resources and other schemes. HTTP/1.0 solved the problem by requiring the full URI for proxy requests, but it retained partial URIs for server requests (there were too many servers already deployed to change all of them to support full URIs). [8]

[8] HTTP/1.1 now requires servers to handle full URIs for both proxy and server requests, but in practice, many deployed servers still accept only partial URIs.

So we need to send partial URIs to servers, and full URIs to proxies. In the case of explicitly configured client proxy settings, the client knows what type of request to issue:

                When the client is not set to use a proxy, it sends the partial URI (Figure 6-15a).

                When the client is set to use a proxy, it sends the full URI (Figure 6-15b).

Figure 6-15. Intercepting proxies will get server requests

figs/http_0615.gif
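
The two request formats can be sketched in a few lines of Python. This is only an illustration of the rule above, not code from any real browser: the function name build_request_line and the proxy address proxy.example.com:8080 are assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlsplit


def build_request_line(url: str, proxy: Optional[str] = None, method: str = "GET") -> str:
    """Return the HTTP/1.0 request line a client would send for `url`.

    `proxy` is a hypothetical "host:port" setting; only its presence matters here.
    """
    if proxy is not None:
        # Explicitly configured proxy: send the full URI unchanged.
        return f"{method} {url} HTTP/1.0"
    parts = urlsplit(url)
    # Direct to the server: send only the path (and query), defaulting to "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"{method} {target} HTTP/1.0"


url = "http://www.marys-antiques.com/index.html"
print(build_request_line(url))
print(build_request_line(url, proxy="proxy.example.com:8080"))
```

The first call produces the partial-URI form used for servers, and the second produces the full-URI form used for explicit proxies.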

6.5.2 The Same Problem with Virtual Hosting

The proxy "missing scheme/host/port" problem is the same problem faced by virtually hosted web servers. Virtually hosted web servers share the same physical web server among many web sites. When a request comes in for the partial URI /index.html, the virtually hosted web server needs to know the hostname of the intended web site (see Section 5.7.1.1 and Section 18.2 for more information).

In spite of the problems being similar, they were solved in different ways:

                Explicit proxies solve the problem by requiring a full URI in the request message.

                Virtually hosted web servers require a Host header to carry the host and port information.
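
As a rough sketch of the virtual hosting side, the following Python picks a site from the Host header. The site table, the document-root paths, and the second hostname are hypothetical, and the header dictionary is assumed to use the exact key "Host"; IPv6 address literals are not handled.

```python
from typing import Dict, Optional

# Hypothetical mapping of hosted site names to document roots.
SITES: Dict[str, str] = {
    "www.marys-antiques.com": "/sites/marys",
    "www.joes-hardware.com": "/sites/joes",
}


def select_site(headers: Dict[str, str], sites: Dict[str, str]) -> Optional[str]:
    """Return the document root for the site named in the Host header, if any."""
    host = headers.get("Host")
    if host is None:
        # No Host header: the partial URI alone cannot identify the site.
        return None
    # Drop an optional ":port" suffix and compare hostnames case-insensitively.
    hostname = host.partition(":")[0].lower()
    return sites.get(hostname)


print(select_site({"Host": "www.marys-antiques.com"}, SITES))
print(select_site({}, SITES))
```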

6.5.3 Intercepting Proxies Get Partial URIs

As long as the clients properly implement HTTP, they will send full URIs in requests to explicitly configured proxies. That solves part of the problem, but there's a catch: a client will not always know it's talking to a proxy, because some proxies may be invisible to the client. Even if the client is not configured to use a proxy, the client's traffic still may go through a surrogate or intercepting proxy. In both of these cases, the client will think it's talking to a web server and won't send the full URI:

                A surrogate, as described earlier, is a proxy server taking the place of the origin server, usually by assuming its hostname or IP address. It receives the web server request and may serve cached responses or proxy requests to the real server. A client cannot distinguish a surrogate from a web server, so it sends partial URIs (Figure 6-15c).

                An intercepting proxy is a proxy server in the network flow that hijacks traffic from the client to the server and either serves a cached response or proxies it. Because the intercepting proxy hijacks client-to-server traffic, it will receive partial URIs that are sent to web servers (Figure 6-15d). [9]

[9] Intercepting proxies also might intercept client-to-proxy traffic in some circumstances, in which case the intercepting proxy might get full URIs and need to handle them. This doesn't happen often, because explicit proxies normally communicate on a port different from that used by HTTP (usually 8080 instead of 80), and intercepting proxies usually intercept only port 80.

6.5.4 Proxies Can Handle Both Proxy and Server Requests

Because of the different ways that traffic can be redirected into proxy servers, general-purpose proxy servers should support both full URIs and partial URIs in request messages. The proxy should use the full URI if it is an explicit proxy request or use the partial URI and the virtual Host header if it is a web server request.

The rules for using full and partial URIs are:

                If a full URI is provided, the proxy should use it.

                If a partial URI is provided, and a Host header is present, the Host header should be used to determine the origin server name and port number.

                If a partial URI is provided, and there is no Host header, the origin server needs to be determined in some other way:

o               If the proxy is a surrogate, standing in for an origin server, the proxy can be configured with the real server's address and port number.

o               If the traffic was intercepted, and the interceptor makes the original IP address and port available, the proxy can use the IP address and port number from the interception technology (see Chapter 20).

o               If all else fails, the proxy doesn't have enough information to determine the origin server and must return an error message (often suggesting that the user upgrade to a modern browser that supports Host headers). [10]

[10] This shouldn't be done casually. Users will receive cryptic error pages they never got before.
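
The rules above can be sketched as a single lookup function. This is a minimal illustration under stated assumptions, not the logic of any particular proxy: the names find_origin, surrogate_origin, intercepted_dest, and UnknownOriginError are invented for the example, the header dictionary is assumed to use the exact key "Host", and port 80 is assumed whenever no port is given.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

Origin = Tuple[str, int]  # (host or IP address, port)


class UnknownOriginError(Exception):
    """Raised when the proxy cannot tell which origin server was intended."""


def split_host_port(value: str, default_port: int = 80) -> Origin:
    # "host:port" -> (host, port); a bare "host" gets the default port.
    host, sep, port = value.partition(":")
    return host, int(port) if sep and port else default_port


def find_origin(
    request_uri: str,
    headers: Dict[str, str],
    surrogate_origin: Optional[Origin] = None,   # configured real server, if a surrogate
    intercepted_dest: Optional[Origin] = None,   # original destination, if the interceptor exposes it
) -> Origin:
    parts = urlsplit(request_uri)
    if parts.scheme and parts.hostname:
        # Rule 1: a full URI was provided, so use it.
        return parts.hostname, parts.port or 80
    host = headers.get("Host")
    if host:
        # Rule 2: partial URI plus Host header.
        return split_host_port(host)
    # Rule 3: partial URI and no Host header.
    if surrogate_origin is not None:
        return surrogate_origin
    if intercepted_dest is not None:
        return intercepted_dest
    raise UnknownOriginError("cannot determine origin server for " + request_uri)


print(find_origin("http://www.marys-antiques.com/index.html", {}))
print(find_origin("/index.html", {"Host": "www.marys-antiques.com:8080"}))
```

A caller would catch UnknownOriginError and return the error page described in the last rule.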

6.5.5 In-Flight URI Modification

Proxy servers need to be very careful about changing the request URI as they forward messages. Slight changes in the URI, even if they seem benign, may create interoperability problems with downstream servers.

In particular, some proxies have been known to "canonicalize" URIs into a standard form before forwarding them to the next hop. Seemingly benign transformations, such as replacing default HTTP ports with an explicit ":80", or correcting URIs by replacing illegal reserved characters with their properly escaped substitutions, can cause interoperation problems.

In general, proxy servers should strive to be as tolerant as possible. They should not aim to be "protocol policemen" looking to enforce strict protocol compliance, because this could involve significant disruption of previously functional services.

In particular, the HTTP specifications forbid general intercepting proxies from rewriting the absolute path parts of URIs when forwarding them. The only exception is that they can replace an empty path with "/".
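
A conservative forwarding step might look like the following Python sketch, which changes a URI only in the one permitted case (an empty path becomes "/") and otherwise passes the original string through untouched. The function name forward_uri is an assumption for illustration.

```python
from urllib.parse import urlsplit


def forward_uri(uri: str) -> str:
    """Return the URI to forward, rewriting only an empty path to "/"."""
    parts = urlsplit(uri)
    if parts.netloc and parts.path == "":
        # Insert "/" right after the authority, leaving every other character as received.
        end_of_authority = uri.index("//") + 2 + len(parts.netloc)
        return uri[:end_of_authority] + "/" + uri[end_of_authority:]
    # No canonicalization: explicit ports, escapes, and case are left alone.
    return uri


print(forward_uri("http://www.marys-antiques.com"))
print(forward_uri("http://www.marys-antiques.com:80/a%7Eb"))
```

Working on the original string rather than reassembling it from parsed pieces avoids accidental normalization along the way.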

6.5.6 URI Client Auto-Expansion and Hostname Resolution

Browsers resolve request URIs differently, depending on whether or not a proxy is present. Without a proxy, the browser takes the URI you type in and tries to find a corresponding IP address. If the hostname is found, the browser tries the corresponding IP addresses until it gets a successful connection.

But if the host isn't found, many browsers attempt to provide some automatic "expansion" of hostnames, in case you typed in a "shorthand" abbreviation of the host (refer back to Section 2.3.2): [11]

[11] Most browsers let you type in "yahoo" and auto-expand that into "www.yahoo.com." Similarly, browsers let you omit the "http://" prefix and insert it if it's missing.

                Many browsers attempt adding a "www." prefix and a ".com" suffix, in case you just entered the middle piece of a common web site name (e.g., to let people enter "yahoo" instead of "www.yahoo.com").

                Some browsers even pass your unresolvable URI to a third-party site, which attempts to correct spelling mistakes and suggest URIs you may have intended.

                In addition, the DNS configuration on most systems allows you to enter just the prefix of the hostname, and the DNS automatically searches the domain. If you are in the domain "oreilly.com" and type in the hostname "host7," the DNS automatically attempts to match "host7.oreilly.com," because "host7" by itself is not a complete, valid hostname.

6.5.7 URI Resolution Without a Proxy

Figure 6-16 shows an example of browser hostname auto-expansion without a proxy. In steps 2a-3c, the browser looks up variations of the hostname until a valid hostname is found.

Figure 6-16. Browser auto-expands partial hostnames when no explicit proxy is present

figs/http_0616.gif

Here's what's going on in this figure:

                In Step 1, the user types "oreilly" into the browser's URI window. The browser uses "oreilly" as the hostname and assumes a default scheme of "http://", a default port of "80", and a default path of "/".

                In Step 2a, the browser looks up host "oreilly." This fails.

                In Step 3a, the browser auto-expands the hostname and asks the DNS to resolve "www.oreilly.com." This is successful.

                The browser then successfully connects to www.oreilly.com.
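
The steps above can be sketched in Python. The order in which candidates are tried, the function names, and the injected lookup callable (which stands in for a real DNS query so the example runs offline) are all assumptions; real browsers differ in what they try and in what order.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def candidate_hostnames(typed: str, search_domain: Optional[str] = None) -> Iterator[str]:
    """Yield hostnames to try: the name as typed, then expanded forms."""
    yield typed
    if "." not in typed:
        # Looks like shorthand: try the "www." prefix and ".com" suffix.
        yield f"www.{typed}.com"
        if search_domain:
            # Try the local domain suffix, as a DNS search list would.
            yield f"{typed}.{search_domain}"


def resolve_with_expansion(
    typed: str,
    lookup: Callable[[str], List[str]],
    search_domain: Optional[str] = None,
) -> Tuple[str, List[str]]:
    """Return (hostname, addresses) for the first candidate that resolves."""
    for name in candidate_hostnames(typed, search_domain):
        addresses = lookup(name)  # an empty list means "host unknown"
        if addresses:
            return name, addresses
    raise LookupError(f"no candidate for {typed!r} could be resolved")


# A stand-in for DNS; the address is from a documentation-only range.
FAKE_DNS = {"www.oreilly.com": ["192.0.2.10"]}
print(resolve_with_expansion("oreilly", lambda name: FAKE_DNS.get(name, [])))
```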

6.5.8 URI Resolution with an Explicit Proxy

When you use an explicit proxy, the browser no longer performs any of these convenience expansions, because the user's URI is passed directly to the proxy.

As shown in Figure 6-17, the browser does not auto-expand the partial hostname when there is an explicit proxy. As a result, when the user types "oreilly" into the browser's location window, the proxy is sent "http://oreilly/" (the browser adds the default scheme and path but leaves the hostname as entered).

Figure 6-17. Browser does not auto-expand partial hostnames when there is an explicit proxy

figs/http_0617.gif
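
To make the difference concrete, here is a small Python sketch of what the browser hands to an explicit proxy. The function name location_to_proxy_uri is hypothetical, and the sketch handles only a bare hostname or a hostname followed by a path, not inputs such as a query string with no path.

```python
def location_to_proxy_uri(typed: str) -> str:
    """Build the full URI sent to an explicit proxy from what the user typed."""
    # Add the default scheme if the user left it out.
    uri = typed if "://" in typed else "http://" + typed
    rest = uri.partition("://")[2]
    # Add the default path if there is none; the hostname is NOT expanded.
    if "/" not in rest:
        uri += "/"
    return uri


print("GET " + location_to_proxy_uri("oreilly") + " HTTP/1.0")
```

The hostname reaches the proxy exactly as typed, so any expansion has to happen on the proxy side.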

For this reason, some proxies attempt to mimic as many of the browser's convenience services as they can, including "www...com" auto-expansion and addition of local domain suffixes. [12]

[12] But, for widely shared proxies, it may be impossible to know the proper domain suffix for individual users.

6.5.9 URI Resolution with an Intercepting Proxy

Hostname resolution is a little different with an invisible intercepting proxy, because as far as the client is concerned, there is no proxy! The behavior proceeds much like the server case, with the browser auto-expanding hostnames until DNS success. But a significant difference occurs when the connection to the server is made, as Figure 6-18 illustrates.

Figure 6-18. Browser doesn't detect dead server IP addresses when using intercepting proxies

figs/http_0618.gif

Figure 6-18 demonstrates the following transaction:

                In Step 1, the user types "oreilly" into the browser's URI location window.

                In Step 2a, the browser looks up the host "oreilly" via DNS, but the DNS server fails and responds that the host is unknown, as shown in Step 2b.

                In Step 3a, the browser does auto-expansion, converting "oreilly" into "www.oreilly.com." In Step 3b, the browser looks up the host "www.oreilly.com" via DNS. This time, as shown in Step 3c, the DNS server is successful and returns IP addresses back to the browser.

                In Step 4a, the client already has successfully resolved the hostname and has a list of IP addresses. Normally, the client tries to connect to each IP address until it succeeds, because some of the IP addresses may be dead. But with an intercepting proxy, the first connection attempt is terminated by the proxy server, not the origin server. The client believes it is successfully talking to the web server, but the web server might not even be alive.

                When the proxy finally is ready to interact with the real origin server (Step 5b), the proxy may find that the IP address actually points to a server that is down. To provide the same level of fault tolerance provided by the browser, the proxy needs to try other IP addresses, either by re-resolving the hostname in the Host header or by doing a reverse DNS lookup on the IP address. It is important that both intercepting and explicit proxy implementations support fault tolerance on DNS resolution to dead servers, because when browsers are configured to use an explicit proxy, they rely on the proxy for fault tolerance.
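
The fault tolerance described here amounts to trying each resolved address in turn until one accepts a connection. A minimal Python sketch follows; the function name connect_first_alive is an assumption, and the connect parameter exists only so the logic can be exercised without a network (it defaults to the standard library's socket.create_connection).

```python
import socket
from typing import Any, Callable, Sequence, Tuple


def connect_first_alive(
    addresses: Sequence[str],
    port: int = 80,
    connect: Callable[..., Any] = socket.create_connection,
    timeout: float = 5.0,
) -> Tuple[str, Any]:
    """Try each IP address in order; return (address, connection) for the first that works."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:
            # Dead or unreachable server: remember the error and try the next address.
            last_error = exc
    raise ConnectionError(f"no live server among {list(addresses)}") from last_error
```

A proxy would obtain the address list by re-resolving the hostname from the Host header, for example with socket.getaddrinfo, and then pass it to a function like this one.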

 



HTTP: The Definitive Guide
ISBN: 1565925092
Year: 2001
Pages: 294
