We've discussed different ways that HTTP can be used to enable access to various kinds of resources (through gateways) and to enable application-to-application communication. In this section, we'll take a look at another use of HTTP, web tunnels, which enable access to applications that speak non-HTTP protocols through HTTP applications.
Web tunnels let you send non-HTTP traffic through HTTP connections, allowing other protocols to piggyback on top of HTTP. The most common reason to use web tunnels is to embed non-HTTP traffic inside an HTTP connection, so it can be sent through firewalls that allow only web traffic.
Web tunnels are established using HTTP's CONNECT method. The CONNECT protocol is not part of the core HTTP/1.1 specification,[4] but it is a widely implemented extension. Technical specifications can be found in Ari Luotonen's expired Internet draft specification, "Tunneling TCP based protocols through Web proxy servers," or in his book Web Proxy Servers, both of which are cited at the end of this chapter.
[4] The HTTP/1.1 specification reserves the CONNECT method but does not describe its function.
The CONNECT method asks a tunnel gateway to create a TCP connection to an arbitrary destination server and port and to blindly relay subsequent data between client and server.
Figure 8-10 shows how the CONNECT method works to establish a tunnel to a gateway:
In Figure 8-10a, the client sends a CONNECT request to the tunnel gateway. The client's CONNECT method asks the tunnel gateway to open a TCP connection (here, to the host named orders.joes-hardware.com on port 443, the normal SSL port).
The TCP connection is created in Figure 8-10b and Figure 8-10c.
Once the TCP connection is established, the gateway notifies the client (Figure 8-10d) by sending an HTTP 200 Connection Established response.
At this point, the tunnel is set up. Any data sent by the client over the HTTP tunnel will be relayed directly to the outgoing TCP connection, and any data sent by the server will be relayed to the client over the HTTP tunnel.
The example in Figure 8-10 describes an SSL tunnel, where SSL traffic is sent over an HTTP connection, but the CONNECT method can be used to establish a TCP connection to any server using any protocol.
The CONNECT syntax is identical in form to other HTTP methods, with the exception of the start line. The request URI is replaced by a hostname, followed by a colon, followed by a port number. Both the host and the port must be specified:
CONNECT home.netscape.com:443 HTTP/1.0
User-agent: Mozilla/4.0
After the start line, there are zero or more HTTP request header fields, as in other HTTP messages. As usual, the lines end in CRLFs, and the list of headers ends with a bare CRLF.
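The message layout just described can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `build_connect_request` and its parameters are hypothetical and do not come from any specification or library.

```python
def build_connect_request(host, port, headers=None):
    """Return the bytes of a CONNECT request for host:port.

    Hypothetical helper for illustration only.
    """
    # Both the host and the port must be specified.
    if not host:
        raise ValueError("host is required")
    if not 0 < port < 65536:
        raise ValueError("port must be in the range 1-65535")

    # Start line: the request URI is replaced by "host:port".
    lines = [f"CONNECT {host}:{port} HTTP/1.0"]

    # Zero or more ordinary HTTP request header fields.
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")

    # Every line ends in CRLF, and a bare CRLF ends the header list.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_connect_request(
    "home.netscape.com", 443, {"User-agent": "Mozilla/4.0"}
)
```

Calling the helper with the host, port, and header from the example above reproduces that request byte for byte, including the terminating blank line.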
After the request is sent, the client waits for a response from the gateway. As with normal HTTP messages, a 200 response code indicates success. By convention, the reason phrase in the response is normally set to "Connection Established":
HTTP/1.0 200 Connection Established
Proxy-agent: Netscape-Proxy/1.1
Unlike normal HTTP responses, the response does not need to include a Content-Type header. No content type is required [5] because the connection becomes a raw byte relay, instead of a message carrier.
[5] Future specifications may define a media type for tunnels (e.g., application/tunnel), for uniformity.
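A client might check the gateway's reply along the following lines. This is a sketch under stated assumptions: `parse_connect_response` and `tunnel_established` are hypothetical names, and the code assumes the complete header block has already been read from the connection.

```python
def parse_connect_response(raw):
    """Parse a gateway response header block.

    Returns (status_code, reason_phrase, headers). Hypothetical helper;
    assumes `raw` holds at least the complete header block.
    """
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: HTTP-version SP status-code SP reason-phrase
    parts = lines[0].split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError("malformed status line")
    status = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status, reason, headers


def tunnel_established(raw):
    """True if the gateway answered 200, whatever the reason phrase says."""
    status, _, _ = parse_connect_response(raw)
    return status == 200
```

The check deliberately ignores the reason phrase, which is only a convention, and does not look for a Content-Type header, since a tunnel response need not carry one.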
Because the tunneled data is opaque to the gateway, the gateway cannot make any assumptions about the order and flow of packets. Once the tunnel is established, data is free to flow in any direction at any time. [6]
[6] The two endpoints of the tunnel (the client and the gateway) must be prepared to accept packets from either of the connections at any time and must forward that data immediately. Because the tunneled protocol may include data dependencies, neither end of the tunnel can ignore input data. Lack of data consumption on one end of the tunnel may hang the producer on the other end of the tunnel, leading to deadlock.
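The blind, bidirectional relay can be sketched with Python's standard `select` and `socket` modules. The function name `relay` is hypothetical, and the sketch is deliberately simplified: it uses a blocking `sendall`, so a peer that stops reading can still stall the loop, which is exactly the deadlock risk described above. A production gateway would buffer data and watch for writability as well.

```python
import select
import socket


def relay(a, b, bufsize=4096):
    """Copy bytes in both directions between sockets a and b.

    Hypothetical sketch: returns as soon as either side closes.
    The caller is responsible for closing both sockets afterward.
    """
    peer = {a: b, b: a}
    while True:
        # Wait until either connection has data; order is never assumed.
        readable, _, _ = select.select([a, b], [], [])
        for src in readable:
            data = src.recv(bufsize)
            if not data:
                # One endpoint disconnected; stop relaying.
                return
            # Forward immediately, without interpreting the bytes.
            peer[src].sendall(data)
```

Because the loop waits on both sockets at once, it never assumes which side speaks first or how the tunneled protocol alternates between sender and receiver.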
As a performance optimization, clients are allowed to send tunnel data after sending the CONNECT request but before receiving the response. This gets data to the server faster, but it means that the gateway must be able to handle data following the request properly. In particular, the gateway cannot assume that a network I/O request will return only header data, and the gateway must be sure to forward any data read with the header to the server, when the connection is ready. Clients that pipeline data after the request must be prepared to resend the request data if the response comes back as an authentication challenge or other non-200, nonfatal status. [7]
[7] Try not to pipeline more data than can fit into the remainder of the request's TCP packet. Pipelining more data can cause a client TCP reset if the gateway subsequently closes the connection before all pipelined TCP packets are received. A TCP reset can cause the client to lose the received gateway response, so the client won't be able to tell whether the failure was due to a network error, access control, or authentication challenge.
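On the gateway side, handling pipelined data comes down to splitting whatever a read returned at the end of the header block and holding the remainder until the outgoing connection is ready. A minimal sketch, with the hypothetical name `split_request`:

```python
def split_request(buffer):
    """Split bytes read from the client into (header_block, early_data).

    Hypothetical helper. Returns (None, buffer) when the blank line
    that ends the headers has not arrived yet.
    """
    end = buffer.find(b"\r\n\r\n")
    if end == -1:
        return None, buffer  # headers incomplete; keep reading
    cut = end + 4  # include the terminating CRLF CRLF
    # Anything after the headers is tunnel data the client sent early;
    # it must be forwarded to the server once the connection is open.
    return buffer[:cut], buffer[cut:]


received = (
    b"CONNECT home.netscape.com:443 HTTP/1.0\r\n"
    b"User-agent: Mozilla/4.0\r\n\r\n"
    b"\x16\x03\x01"  # first bytes of tunneled data, sent before the response
)
header_block, early_data = split_request(received)
```

Here `early_data` holds the three bytes that followed the request, which the gateway would forward to the server instead of discarding.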
If at any point either one of the tunnel endpoints gets disconnected, any outstanding data that came from that endpoint will be passed to the other one, and after that the proxy will also terminate the other connection. If there is undelivered data for the closing endpoint, that data will be discarded.
Web tunnels were first developed to carry encrypted SSL traffic through firewalls. Many organizations funnel all traffic through packet-filtering routers and proxy servers to enhance security. But some protocols, such as encrypted SSL, cannot be proxied by traditional proxy servers, because the information is encrypted. Tunnels let the SSL traffic be carried through the port 80 HTTP firewall by transporting it through an HTTP connection (Figure 8-11).
To allow SSL traffic to flow through existing proxy firewalls, a tunneling feature was added to HTTP, in which raw, encrypted data is placed inside HTTP messages and sent through normal HTTP channels (Figure 8-12).
In Figure 8-12a, SSL traffic is sent directly to a secure web server (on SSL port 443). In Figure 8-12b, SSL traffic is encapsulated into HTTP messages and sent over HTTP port 80 connections, until it is decapsulated back into normal SSL connections.
Tunnels often are used to let non-HTTP traffic pass through port-filtering firewalls. This can be put to good use, for example, to allow secure SSL traffic to flow through firewalls. However, this feature can be abused, allowing malicious protocols to flow into an organization through the HTTP tunnel.
The HTTPS protocol (HTTP over SSL) can alternatively be gatewayed in the same way as other protocols: the gateway (instead of the client) initiates the SSL session with the remote HTTPS server and then performs the HTTPS transaction on the client's behalf. The response is received and decrypted by the proxy and sent to the client over (insecure) HTTP. This is the way gateways handle FTP. However, this approach has several disadvantages:
The client-to-gateway connection is normal, insecure HTTP.
The client is not able to perform SSL client authentication (authentication based on X509 certificates) to the remote server, as the proxy is the authenticated party.
The gateway needs to support a full SSL implementation.
Note that the tunneling mechanism, when used to carry SSL, does not require an implementation of SSL in the proxy. The SSL session is established between the client generating the request and the destination (secure) web server; the proxy server in between merely tunnels the encrypted data and does not take any other part in the secure transaction.
Other features of HTTP can be used with tunnels where appropriate. In particular, the proxy authentication support can be used with tunnels to authenticate a client's right to use a tunnel (Figure 8-13).
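As an illustration, a client that receives a 407 Proxy Authentication Required response to its CONNECT request could retry with a Proxy-Authorization header. The sketch below assumes the Basic scheme; the function names and the credentials are hypothetical, and the exchange the figure depicts may use a different scheme.

```python
import base64


def proxy_auth_header(username, password):
    """Build a Basic Proxy-Authorization header line (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Proxy-Authorization: Basic {token}"


def needs_proxy_auth(status_code):
    """True if the gateway is challenging the client for credentials."""
    return status_code == 407


# Made-up credentials, for illustration only.
header_line = proxy_auth_header("joe", "tools")
```

The client would add `header_line` to the header fields of a fresh CONNECT request and send the request again. Basic credentials are only encoded, not encrypted, so they cross the client-to-gateway connection in a trivially reversible form.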
In general, the tunnel gateway cannot verify that the protocol being spoken is really what it is supposed to tunnel. Thus, for example, mischievous users might use tunnels intended for SSL to tunnel Internet gaming traffic through a corporate firewall, or malicious users might use tunnels to open Telnet sessions or to send email that bypasses corporate email scanners.
To minimize abuse of tunnels, the gateway should open tunnels only for particular well-known ports, such as 443 for HTTPS.
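A gateway could enforce such a policy by checking the port in the CONNECT target before opening any outgoing connection. The sketch below is one possible shape for that check; the names `ALLOWED_TUNNEL_PORTS`, `parse_authority`, and `tunnel_allowed` are hypothetical, and a real deployment would choose its own list of permitted ports.

```python
# Hypothetical policy: only the well-known HTTPS port may be tunneled.
ALLOWED_TUNNEL_PORTS = frozenset({443})


def parse_authority(target):
    """Split a CONNECT target of the form 'host:port' into (host, port)."""
    host, sep, port_text = target.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"CONNECT target must be host:port, got {target!r}")
    return host, int(port_text)


def tunnel_allowed(target, allowed_ports=ALLOWED_TUNNEL_PORTS):
    """Return True only if the target is well formed and its port is permitted."""
    try:
        _, port = parse_authority(target)
    except ValueError:
        return False
    return port in allowed_ports
```

With this policy, a request for `orders.joes-hardware.com:443` is accepted, while a target on the Telnet port (23) or the SMTP port (25) is refused before any TCP connection is attempted. A port check limits where tunnels can go, but it still cannot confirm that the traffic on port 443 is really SSL.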