5.6 Issues

only for RuBoard - do not distribute or recompile

Interception caching is still somewhat controversial. Even though it sounds like a great idea initially, you should carefully consider the following issues before deploying it on your network.

5.6.1 It's Difficult for Users to Bypass

If, for some reason, one of your users encounters a problem with interception caching, he or she is going to have a difficult time getting around the cache. Possible problems include stale pages, servers that are incompatible with your cache (but work without it), and IP-based access controls. The only way to get around an interception cache is to configure a different proxy cache manually. The TCP packets are then addressed to that proxy, usually on a port other than 80, and thus are not diverted to the cache. Most likely, the user is not savvy enough to configure a proxy manually, let alone realize what the problem is. And even if the user does know what to do, he or she most likely does not have access to another proxy on the Internet.

In my role as a Squid developer, I've received a number of email messages asking for help bypassing ISP settings. The following message is real; only the names have been changed to protect the guilty:

 Duane, I am Zach Ariah, a subscriber of XXX Internet - an ISP who has recently installed... Squid/1.NOVM.21. All of my HTTP requests are now being forced through the proxy(proxy-03-real.xxxxx.net). I really don't like this, and am wondering if there is anyway around this. Can I do some hack on my client machine, or put something special into the browser, which will make me bypass the proxy??? I know the proxy looks at the headers. This is why old browsers don't work. Anyway... Please let me know what's going on with this. Thank you and Best regards, Zach Ariah

This is a more serious issue for ISPs than it is for corporations. Users in a corporate environment are more likely to find someone who can help them. Also, corporate users probably expect their web traffic to be filtered and cached. ISP customers have more to be angry about, since they pay for the service themselves.

All layer four switching and routing products have the ability to bypass the cache for special cases. For example, you can tell the switch to forward packets normally if the origin server is www.hotmail.com or if the request is from the user at 172.16.4.3. However, only the administrator can change the configuration. Users who experience problems need to ask the administrator for assistance. Getting help may take hours, or even days. In some cases, users may not understand the situation well enough to ask for help. It's also likely that such a request will be misinterpreted or perhaps even ignored. ISPs and other organizations that deploy interception caching must be extremely sensitive to problem reports from users trying to surf the Web.

5.6.2 Packet Transport Service

What exactly is the service that one gets from an Internet service provider? Certainly, we can list many services that a typical ISP offers, among them email accounts, domain name service, web hosting, and access to Usenet newsgroups. The primary service, however, is the transportation of TCP/IP packets to and from our systems. This is, after all, the fundamental function that enables all the other services.

As someone who understands a little about the Internet, I have certain expectations about the way in which my ISP handles my packets. When my computer sends a TCP/IP packet to my ISP, I expect my ISP to forward that packet towards its destination address. If my ISP does something different, such as divert my packets to a proxy cache, I might feel as though I'm not getting the service that I pay for.

But what difference does it make? If I still get the information I requested, what's wrong with that? One problem is related to the issues raised in Chapter 3. Users might assume that their web requests cannot be logged because they have not configured a proxy.

Another, more subtle point is that some users of the network expect the network to behave predictably. The standards that define TCP connections and IP routing do not allow for connections to be diverted and accepted under false pretenses. When I send a TCP/IP packet, I expect the Internet infrastructure to handle that packet as described in the standards documents. Predictability also means that a TCP/IP packet destined for port 80 should be treated just like a packet for ports 77, 145, and 8333.

5.6.3 Routing Changes

Recall that most interception caching systems expose the cache's IP address when forwarding requests to origin servers. This might alter the network path (routing) for the HTTP packets coming from the origin server to your client. In some cases, the change can be very minor; in others, it might be significant. It's more likely to affect ISPs than corporations and other organizations.

Some origin servers expect all requests from a client to come from the same IP address. This can really be a problem if the server uses both HTTP/TLS and unencrypted HTTP. The unencrypted (port 80) traffic may be intercepted and sent through a caching proxy; the encrypted traffic is not intercepted. Thus, the two types of requests come from two different IP addresses. Imagine that the server creates some session information and associates the session with the IP address of the unencrypted traffic. If the server then instructs the client to make an HTTP/TLS request using the same session, the server may refuse that request because the IP address doesn't match what it expects. Given how widespread caching proxies are today, it is unrealistic for an origin server to impose this requirement. The session key alone should be sufficient, and the server shouldn't really care about the client's IP address.
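To make the failure concrete, here is a minimal sketch of a hypothetical server-side session store that can either bind a session to the creating IP address or trust the session key alone. The class names and the addresses (taken from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24) are illustrative assumptions, not code from any real server.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Session:
    key: str
    client_ip: str  # source address seen when the session was created


class SessionStore:
    """Hypothetical session store: IP-bound versus key-only validation."""

    def __init__(self, bind_to_ip: bool) -> None:
        self.bind_to_ip = bind_to_ip
        self._sessions: dict[str, Session] = {}

    def create(self, client_ip: str) -> str:
        # An unguessable key is what actually identifies the session.
        key = secrets.token_hex(16)
        self._sessions[key] = Session(key, client_ip)
        return key

    def validate(self, key: str, client_ip: str) -> bool:
        session = self._sessions.get(key)
        if session is None:
            return False
        if self.bind_to_ip:
            # Fragile: fails when port 80 traffic arrives from the cache's
            # address but HTTP/TLS traffic arrives from the client's address.
            return session.client_ip == client_ip
        # Robust: the key alone is sufficient.
        return True


if __name__ == "__main__":
    strict = SessionStore(bind_to_ip=True)
    key = strict.create("192.0.2.10")  # plain HTTP request, relayed by the cache
    print(strict.validate(key, "198.51.100.7"))  # HTTP/TLS request, direct: False
```

The strict store rejects the second request even though the client presents a valid key, which is exactly the situation described above.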

When an interception cache is located on a different subnet from the clients using the cache, a particularly confusing situation may arise. The cache may be unable to reach an origin server for whatever reason, perhaps because of a routing glitch. However, the client is able to ping the server directly, or perhaps even telnet to it, and see that it is alive and well. This can happen because the ping (ICMP) and telnet packets travel directly from the client, while the intercepted HTTP connection originates from the cache and may take a different route. Most likely, the redirection device is unaware that the cache cannot reach the origin server, so it continues to divert packets for that server to the cache.

5.6.4 It Affects More Than Browsers and Users

Web caches are deployed primarily for the benefit of humans sitting at their computers, surfing the Internet. However, a significant amount of HTTP traffic does not originate from browsers. The client might instead be a so-called web robot, or a program that mirrors entire web sites, or any number of other things. Should these clients also use proxy caches? Perhaps, but the important thing is that with interception proxying, they have no choice.

This problem manifested itself in a sudden and very significant way in June of 1998, when Digex decided to deploy interception caching on their backbone network. The story also involves Cybercash, a company that handles credit card payments on the Internet. The Cybercash service is built on top of an HTTP server, so it uses port 80. Furthermore, Cybercash uses IP-based authentication for its services. That is, Cybercash requires transaction requests to come from the known IP addresses of its customers. Perhaps you can see where this is leading.

A number of other companies that sell merchandise on the Internet are connected through Digex's network. When a purchase is made at one of these sites, the merchant's server connects to Cybercash for the credit card transaction. However, with interception caching in place on the Digex network, Cybercash received these transaction connections from a cache IP address instead of the merchant's IP address. As a result, many purchases were denied until people finally realized what was happening.

The incident generated a significant amount of discussion on the North American Network Operators Group (NANOG) mailing list. Not everyone was against interception caching; many applauded Digex for being forward-thinking. However, this message from Jon Lewis (jlewis@fdt.net) illustrates the feelings of people who are negatively impacted by interception caching:

My main gripe with Digex is that they did this (forced our traffic into a transparent proxy) without authorization or notification. I wasted an afternoon, and a customer wasted several days worth of time over a 2-3 week period trying to figure out why their cybercash suddenly stopped working. This customer then had to scan their web server logs, figure out which sales had been "lost" due to proxy breakage, and see to it that products got shipped out. This introduced unusual delays in their distribution, and had their site shut down for several days between their realization of a problem and resolution yesterday when we got Digex to exempt certain IP's from the proxy.

Others took an even stronger stance against interception caching. For example, Karl Denninger (karl@denninger.net) wrote:

Well, I'd love to know where they think they get the authority to do this from in the first place.... that is, absent active consent. I'd be looking over contracts and talking to counsel if someone tried this with transit connections that I was involved in. Hijacking a connection without knowledge and consent might even run afoul of some kind of tampering or wiretapping statute (read: big trouble).....

5.6.5 No-Intercept Lists

Given that interception caching does not work with some servers, how can we fix it? Currently, the only thing we can do is configure the switch or router not to divert certain connections to the cache. This must be a part of the switch/router configuration because, if the packets are diverted to the cache, there is absolutely nothing the cache can do to "undivert" them. Every interception technique allows you to specify special addresses that should not be diverted.

The maintenance of a no-intercept list is a significant administrative headache. Proxy cache operators cannot really be expected to know of every origin server that breaks with interception caching. At the same time, discovering the list of servers the hard way makes the lives of users and technical support staff unnecessarily difficult. A centrally maintained list has a certain appeal, but it would require a standard format to work with products from different vendors.

One downside to a no-intercept list is that it may also prevent useful caching of some objects. Routers and switches check only the destination IP address when deciding whether to divert a connection. Any given server might have a large amount of cachable content but only a small subset of URLs that do not work through caches. Unfortunately, the entire site must then be exempted from diversion.
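The diversion decision described in this section can be sketched as a simple predicate. This is a minimal illustration, assuming hypothetical exemption lists (the client address 172.16.4.3 comes from the earlier example; 203.0.113.0/24 is a documentation range standing in for a problem server). Real switches and routers each use their own configuration syntax. Note that the function never receives a URL: it sees only addresses and a port, so exempting one broken page means exempting the whole server.

```python
from ipaddress import ip_address, ip_network

# Hypothetical no-intercept lists; real products use their own syntax.
NO_INTERCEPT_DESTINATIONS = [ip_network("203.0.113.0/24")]
NO_INTERCEPT_CLIENTS = [ip_network("172.16.4.3/32")]


def should_divert(src_ip: str, dst_ip: str, dst_port: int) -> bool:
    """Return True if a new TCP connection would be sent to the cache."""
    if dst_port != 80:
        return False  # only port 80 traffic is intercepted
    src, dst = ip_address(src_ip), ip_address(dst_ip)
    if any(src in net for net in NO_INTERCEPT_CLIENTS):
        return False  # this user has been exempted by the administrator
    if any(dst in net for net in NO_INTERCEPT_DESTINATIONS):
        return False  # the whole server is exempt, whatever URL is requested
    return True
```

For example, `should_divert("10.0.0.5", "203.0.113.9", 80)` returns `False` for every request to that server, including requests for objects that could have been cached safely.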

5.6.6 Are Port 80 Packets Always HTTP?

I've already made the point that packets destined for port 80 may not necessarily be HTTP. The implied association between protocols and port numbers is very strong for low-numbered ports. Everyone knows that port 23 is telnet, port 21 is FTP, and port 80 is HTTP. However, these associations are merely conventions that have been established to maximize interoperation.

Nothing really stops me from running a telnet server on port 80 on my own system. The telnet program has the option to connect to any port, so I just need to type telnet myhostname 80. However, this won't work if there is an interception proxy between my telnet client and the server. The router or switch assumes the port 80 connection is for an HTTP request and diverts it to the cache.

This issue is likely to be of little concern to most people, especially in corporate networks. Only a very small percentage of port 80 traffic is not really HTTP. In fact, some administrators see it as a positive effect, because it can prevent non-HTTP traffic from entering their network.
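The redirection device decides on the port number alone; only after the connection reaches the cache do the actual bytes become visible. As a rough illustration, here is a hypothetical check a cache could apply to the first bytes of a diverted connection, using a simplified pattern for an HTTP/1.x request line (method, target, version). It is a sketch, not a complete HTTP parser.

```python
import re

# Simplified HTTP/1.x request line: METHOD SP TARGET SP HTTP/x.y CRLF
REQUEST_LINE = re.compile(rb"^[!#$%&'*+.^_`|~0-9A-Za-z-]+ \S+ HTTP/\d\.\d\r?\n")


def looks_like_http(first_bytes: bytes) -> bool:
    """Return True if the data begins with something shaped like a request line."""
    return REQUEST_LINE.match(first_bytes) is not None
```

`looks_like_http(b"GET / HTTP/1.1\r\n")` is `True`, while a telnet-style exchange on port 80 would fail the check; by then, however, the connection has already been diverted, and the cache cannot hand it back.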

5.6.7 HTTP Interoperation Problems

Interception caching is known to impair HTTP interoperability. Perhaps the worst instance is with Microsoft Internet Explorer. When you click on Reload and Explorer thinks it's connecting directly to the origin server, it omits the Cache-control: no-cache directive. The interception cache doesn't know the user clicked on Reload, so it serves a cache hit instead of forwarding the request to the origin server. ^[4]

^[4] See Microsoft Knowledgebase article Q266121, http://support.microsoft.com/support/kb/articles/Q266/1/21.ASP.
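The cache's side of this problem can be sketched as a simplified decision function. It assumes a hypothetical cache that honors only the no-cache directive (in either Cache-Control or the HTTP/1.0 Pragma header) and otherwise serves any fresh copy it holds. Without the directive, a Reload looks exactly like an ordinary request.

```python
def cache_decision(headers: dict[str, str], have_fresh_copy: bool) -> str:
    """Decide how a hypothetical, simplified cache answers a GET request."""
    # Header names and these directive values are case-insensitive.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    directives = [d.strip() for d in normalized.get("cache-control", "").split(",")]
    # Other directives (max-age and so on) are ignored in this sketch.
    if "no-cache" in directives or "no-cache" in normalized.get("pragma", ""):
        return "forward to origin"
    return "serve hit" if have_fresh_copy else "forward to origin"
```

A Reload sent with `{"Cache-control": "no-cache"}` is forwarded to the origin server, but the same Reload sent with no directive, as Explorer does when it believes it is talking to the origin, yields `"serve hit"`.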

Interception proxies also pose problems for maintaining backwards compatibility. HTTP allows clients and servers to utilize new, custom request methods and headers. Ideally, proxy caches should be able to pass unknown methods and headers between the two sides. However, in practice, many caching products cannot process new request methods. A smart client can bypass the proxy cache for the unknown methods, unless interception caching is used.

5.6.8 IP Interoperation Problems

There are a number of ways that interception proxies impact IP interoperability. For example, consider path MTU ^[5] discovery. Internet hosts use the IP Don't Fragment (DF) flag and ICMP feedback messages to discover the smallest MTU of all links between them. This technique is almost worthless when connection hijacking creates two network paths for a single pair of IP addresses.

^[5] The Maximum Transmission Unit is the largest packet size that can be sent in a single datalink-layer frame or cell.

Another problem arises when attempting to measure network proximity. One way to estimate how close you are to another server is to time how long it takes to open a TCP connection. Using this technique with an interception proxy in the way produces misleading results. Connections to port 80 are established quickly and almost uniformly. Connections to other ports, however, take significantly longer and vary greatly. A similar measurement tactic times how long it takes to complete a simple HTTP request. Imagine that you've developed a service that rates content providers based on how quickly their origin servers respond to your requests. Everything is working fine, until one day your ISP installs an interception cache. Now you're measuring the proxy cache rather than the origin servers.
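The connection-timing technique mentioned above can be sketched with the standard `socket` module. The function name is hypothetical. It takes an IP address rather than a hostname so that DNS lookup time is not mixed into the measurement. Behind an interception cache, `connect_time(addr, 80)` measures the handshake with the cache, while the same call for another port measures the handshake with the real server.

```python
import socket
import time


def connect_time(ip: str, port: int, timeout: float = 5.0) -> float:
    """Seconds taken to complete a TCP handshake with ip:port."""
    start = time.perf_counter()
    # create_connection returns once the three-way handshake completes.
    with socket.create_connection((ip, port), timeout=timeout):
        pass
    return time.perf_counter() - start
```

Comparing the results for port 80 and, say, port 22 on the same address is one way such a measurement can reveal that something in the path is answering on the server's behalf.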

I imagine that as IP security (RFC 2401) becomes more widely deployed, many people will discover problems caused by interception proxies. The IP security protocols and architecture are designed to ensure that packets are delivered end-to-end without modification. Indeed, connection hijacking is precisely one of the reasons to use IP security.
