

19.4 Web Communication Patterns

According to HTTP terminology [133], a client is an application that establishes a connection, and a server is an application that accepts connections and responds. A user agent is a client that initiates a request for service. Your browser is both a client and a user agent according to this terminology.

The origin server is the server that has the resource. Figure 19.2 on page 661 shows communication between a client and an origin server. In the current incarnation of the World Wide Web, firewalls, proxy servers and content distribution networks have changed the topology of client-server interaction. Communication between the user agent and the origin server often takes place through one or more intermediaries. This section covers four fundamental building blocks of this more complex topology: tunnels, proxies, caches and gateways.

19.4.1 Tunnels

A tunnel is an intermediary that acts as a blind relay. A tunnel does not parse HTTP; it simply forwards whatever it receives on one connection to the other. Figure 19.4 shows communication between a user agent and an origin server with an intermediate tunnel.

Figure 19.4. Communication between a user agent and an origin server through a tunnel.


The tunnel of Figure 19.4 accepts an HTTP connection from a client and establishes a connection to the server. In this scenario, the tunnel acts both as a client and as a server according to the HTTP definition, although it is neither a user agent nor an origin server. The tunnel forwards the information from the client to the server. When the server responds, the tunnel forwards the response to the client. The tunnel detects closing of connections by either the client or server and closes the other end. After closing both ends, the tunnel ceases to exist. The tunnel of Figure 19.4 always connects to the web server running on the host www.usp.cs.utsa.edu.
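The relay behavior just described can be sketched in a few lines. The following is a minimal illustration in Python, not the implementation developed later in the chapter. The function name relay, its parameters and the buffer size are assumptions made for the example, and both sockets are assumed to be already connected.

```python
import select
import socket


def relay(client, server, bufsize=4096):
    """Blindly copy bytes between two connected sockets.

    Returns the number of bytes forwarded. The data is never parsed,
    so the relay does not care that it happens to be HTTP.
    """
    peers = {client: server, server: client}
    total = 0
    while True:
        # Wait until at least one side has data (or has closed).
        readable, _, _ = select.select(list(peers), [], [])
        for src in readable:
            data = src.recv(bufsize)
            if not data:
                # One end closed its connection: close the other end too,
                # after which the tunnel has nothing left to do.
                client.close()
                server.close()
                return total
            peers[src].sendall(data)
            total += len(data)
```

A caller would accept a connection from the client, connect to the fixed server host, and pass both sockets to relay. This sketch stops as soon as either side closes, so it can drop data still in flight from the other side; a production tunnel would need to handle that case.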

Sometimes a tunnel does not establish its own connections but is created by another entity such as a firewall or gateway after the connections are established. Figure 19.5 illustrates one such situation in which a client connects to www.usp.cs.utsa.edu, a host running outside of a firewall. The firewall software creates a tunnel for the connection to a machine usp.cs.utsa.edu that is behind the firewall. Clients behind the firewall connect directly to usp.cs.utsa.edu, but usp is not visible outside of the firewall. As far as the client is concerned, the content is on the machine www.usp.cs.utsa.edu. The client knows nothing of usp.cs.utsa.edu.

Figure 19.5. Tunnels provide a controlled portal through a firewall.


19.4.2 Proxies

A proxy is an intermediary between clients and servers that makes requests on behalf of its clients. Proxies are addressed by a special form of the GET request and must parse HTTP. Like tunnels, proxies act both as clients and servers. However, a proxy is generally long-lived and often acts as an intermediary for many clients. Figure 19.6 shows an example in which a browser has set its proxy to org.proxy.net. The HTTP client (e.g., a browser) makes a connection to the HTTP proxy (e.g., org.proxy.net) and writes its HTTP request. The HTTP proxy parses the request and makes a separate connection to the HTTP origin server (e.g., www.usp.cs.utsa.edu). When the origin server responds, the HTTP proxy copies the response to the channel connected to the HTTP client.

Figure 19.6. A proxy accesses any server on behalf of a client.


The GET request of Example 19.4 uses an absolute path to specify the resource location. Clients use an alternative form, the absolute URI, when directing requests to a proxy. The absolute URI contains the full HTTP address of the destination server. In Figure 19.6, http://www.usp.cs.utsa.edu/usp/simple.html is an absolute URI; /usp/simple.html is an absolute path.

Example 19.11

This HTTP request contains an absolute URI rather than an absolute path.

 GET <SP> http://www.usp.cs.utsa.edu/usp/simple.html <SP> HTTP/1.0 <CRLF>
 User-Agent:uiciclient <CRLF>
 <CRLF>

The proxy server parses the GET line and initiates an HTTP request to www.usp.cs.utsa.edu for the resource /usp/simple.html.
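As a rough illustration of this parsing step, the sketch below splits the GET line of Example 19.11 into the host to contact and the absolute path to request. The function name split_proxy_target is hypothetical, and the default port of 80 is an assumption for the case in which the URI does not name one.

```python
from urllib.parse import urlsplit


def split_proxy_target(request_line):
    """Return (host, port, path) for a GET line that carries an absolute URI."""
    method, target, version = request_line.split()
    if method != "GET":
        raise ValueError("only GET is handled in this sketch")
    parts = urlsplit(target)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("request target is not an absolute http URI")
    # An absolute URI with no path refers to the root resource.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Assumption: fall back to the standard HTTP port when none is given.
    return parts.hostname, parts.port or 80, path
```

Applied to the request line of Example 19.11, the function yields the host www.usp.cs.utsa.edu, port 80 and the path /usp/simple.html.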

When directing a request through a proxy, user agents use the absolute URI form of the GET request and connect to the proxy rather than directly to the origin server. When a server receives a GET request containing an absolute URI, it knows that it should act as a proxy rather than as the origin server. The proxy reconstructs the GET line so that it contains an absolute path, such as the one shown in Example 19.4, and makes the connection to the origin server. Often, the proxy adds header lines of its own to the request. The proxy itself can use another proxy, in which case it forwards the original GET to its designated proxy. Most browsers let the user designate a proxy instead of connecting directly to the origin server. Once set up, the browser's operation with a proxy is transparent to the user, other than a performance improvement or degradation.
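The forwarding decision described in this paragraph can be summarized as a small function. This is only a sketch under stated assumptions: the name plan_forwarding and the upstream_proxy parameter (an optional host and port pair for the proxy's own designated proxy) are invented for illustration, and port 80 is assumed when the URI omits a port.

```python
from urllib.parse import urlsplit


def plan_forwarding(request_line, upstream_proxy=None):
    """Decide where a proxy connects and which GET line it sends there.

    Returns (host, port, line_to_send).
    """
    method, target, version = request_line.split()
    parts = urlsplit(target)
    if not parts.scheme:
        # An absolute path means the sender is treating this server
        # as the origin server, not as a proxy.
        raise ValueError("absolute path: not a proxy request")
    if upstream_proxy is not None:
        # Chained proxies: pass the original GET line along unchanged.
        host, port = upstream_proxy
        return host, port, request_line
    # Direct connection: rebuild the GET line with an absolute path.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, parts.port or 80, f"{method} {path} {version}"
```

Without an upstream proxy, the GET line of Example 19.11 becomes GET /usp/simple.html HTTP/1.0 and is sent to www.usp.cs.utsa.edu. With one, the line is left in its absolute URI form so that the next proxy can make the same decision.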

19.4.3 Caching and Transparency

A transparent proxy is one that does not modify requests or responses beyond what is needed for proxy identification and authentication. Nontransparent proxies may perform many other types of services on behalf of their clients (e.g., annotation, anonymity filtering, content filtering, censorship, media conversion). Proxies may keep statistics and other information about their clients. Search engines such as Google are proxies of a different sort, caching information about the content of pages along with the URLs. Users access the cached information by keywords or phrases. Clients that use proxies assume that the proxies are correct and trustworthy.

The most important service that proxies perform on behalf of clients is caching. A cache is a local store of response messages. Browsers usually cache recent response messages on disk. When a user opens a URL, the browser first checks whether the resource can be found on disk and initiates a network request only if it does not find the object locally.

Exercise 19.12

Examine the current settings and contents of the cache on your browser. Different browsers allow access to this information in different ways. The local cache and proxies are accessible under the Advanced option of the Preferences submenu on the Edit menu in Netscape 6. In Internet Explorer 6, you can access the information from the Internet Options submenu under the Tools menu. The cache is designated under Temporary Internet Files on the General menu. Proxies are designated under LAN Settings on the Connections submenu of Internet Options. Look at the files in the directory that holds your local browser cache. Your browser should offer an option for clearing the local cache. Use the option to clear your local cache, and examine the directory again. What is the effect? Why does the browser keep a local cache and how does the browser use this cache?

Answer:

Clearing the cache should remove the contents of the local cache directory. When the user opens a page in the browser, the browser first checks the local disk for the requested object. If the requested object is in the local cache, the browser can retrieve it locally and avoid a network transfer. Browsers use local caches to speed access and reduce network traffic.

A proxy cache stores resources that it fetches in order to more effectively service future requests for those resources. When the proxy cache receives a request for an object from a client, it first checks its local store of objects. If the object is found in the proxy's local cache (Figure 19.7), the proxy can retrieve the object locally rather than by transferring it from the origin server.

Figure 19.7. If possible, a proxy cache retrieves requested resources from its local store.


If the proxy cache does not find an object in its local store (Figure 19.8), it retrieves the object from the origin server and decides whether to save it locally. Some objects contain headers indicating they cannot be cached. The proxy may also decide not to cache an object for other reasons, for example, because the object is too large to cache or because the proxy does not want to remove other, frequently accessed, objects from its cache.

Figure 19.8. When a proxy cannot locate a requested resource locally, it requests the object from the origin server and may elect to add the object to its local cache.

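The hit and miss paths of Figures 19.7 and 19.8 can be sketched as follows. This is a simplified illustration, not a complete cache: the class name ProxyCache, the fetch callable (assumed to take a URL and return a header dictionary and a body) and the size limit are all assumptions. The no-store directive of the Cache-Control header stands in for the various ways a response can say that it must not be cached.

```python
class ProxyCache:
    """A toy in-memory proxy cache keyed by URL."""

    def __init__(self, fetch, max_object_bytes=1_000_000):
        # fetch is a hypothetical callable: url -> (headers dict, body bytes).
        self.fetch = fetch
        self.max_object_bytes = max_object_bytes
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.store:
            # Figure 19.7: serve the object from the local store.
            self.hits += 1
            return self.store[url]
        # Figure 19.8: go to the origin server, then decide whether to keep it.
        self.misses += 1
        headers, body = self.fetch(url)
        if self._cacheable(headers, body):
            self.store[url] = (headers, body)
        return headers, body

    def _cacheable(self, headers, body):
        # The response itself may forbid caching.
        if "no-store" in headers.get("Cache-Control", "").lower():
            return False
        # The proxy may also decline for its own reasons, such as size.
        return len(body) <= self.max_object_bytes
```

A second request for the same URL is answered from the store without calling fetch again, unless the first response was marked as uncacheable or was too large. The sketch never evicts or expires entries, which a real proxy cache must do.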

Often, proxy caches are installed at the gateways to local area networks. Clients on the local network direct all their requests through the proxy. The objects in the proxy cache's local store are responses to requests from many different users. If someone else has already requested the object and the proxy has cached the object, the response to the current request will be much faster.

You are probably wondering what happens if the object has changed since the cache stored the object. In this case, the proxy may return an object that is out-of-date, or stale, a situation that can be mitigated by expiration strategies. Origin servers often provide an expiration time as part of the response header. Proxy caches also use expiration policies to keep old objects from being cached indefinitely. Finally, the proxy (or any client) can execute a conditional GET by including an If-Modified-Since field as a header line. The server returns the object only if it has changed since the specified date. Otherwise, the server returns a 304 Not Modified response, and the proxy can use the copy from its cache.
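A conditional GET differs from an ordinary GET only in its extra header line. The sketch below builds such a request and shows how a proxy might choose between its cached copy and a fresh one. The function names and the idea of storing the modification time as seconds since the Epoch are assumptions made for the example.

```python
from email.utils import formatdate


def conditional_get(path, cached_mtime):
    """Build a GET that asks for the object only if it changed.

    cached_mtime is the modification time of the cached copy,
    in seconds since the Epoch.
    """
    # HTTP dates are expressed in GMT, e.g. Sun, 06 Nov 1994 08:49:37 GMT.
    stamp = formatdate(cached_mtime, usegmt=True)
    return f"GET {path} HTTP/1.0\r\nIf-Modified-Since: {stamp}\r\n\r\n"


def select_body(status, response_body, cached_body):
    """Use the cached copy on 304 Not Modified, the new body otherwise."""
    return cached_body if status == 304 else response_body
```

On a 304 response the server sends no body, so the proxy saves the transfer and answers its client from the cache.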

19.4.4 Gateways

While a proxy can be viewed as a client-side intermediary, a gateway is a server-side mechanism. A gateway receives requests as though it is an origin server. A gateway may be located at the boundary router for a local area network or outside a firewall protecting an intranet. Gateways provide a variety of services such as security, translation and load balancing. A gateway might be used as the common interface to a cluster of web servers for an organization or as a front-end portal to a web server that is behind a firewall.

Figure 19.9 shows an example of how a gateway might be configured to provide a common access point to resources inside and outside a firewall. The server www.usp.cs.utsa.edu acts as a gateway for usp.cs.utsa.edu, a server that is behind the firewall. If a GET request accesses a resource in the usp directory, the gateway creates a tunnel to usp.cs.utsa.edu. For other resources, the gateway creates a tunnel to the www.cs.utsa.edu server outside the firewall.

Figure 19.9. The server www.usp.cs.utsa.edu acts as a gateway for servers inside and outside the firewall.

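The routing rule of Figure 19.9 amounts to a test on the path in the GET line. The following sketch shows one way to express it. The function name choose_backend is hypothetical, and the port numbers are assumptions because the text does not give them.

```python
# Assumed ports; the host names are those of Figure 19.9.
INSIDE = ("usp.cs.utsa.edu", 80)    # server behind the firewall
OUTSIDE = ("www.cs.utsa.edu", 80)   # server outside the firewall


def choose_backend(request_line):
    """Pick the server to which the gateway should create a tunnel."""
    method, path, version = request_line.split()
    # Requests for the usp directory go to the server behind the firewall.
    if path == "/usp" or path.startswith("/usp/"):
        return INSIDE
    return OUTSIDE
```

Matching on /usp/ rather than on the bare prefix /usp keeps a path such as /uspx/index.html from being sent behind the firewall by mistake.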

Exercise 19.13

How does a gateway differ from a tunnel?

Answer:

A tunnel is a conduit that passes information from one point to another without change. A gateway acts as a front end for a resource, perhaps a cluster of servers.

This chapter explores various aspects of tunnels, proxies and gateways. Sections 19.5 and 19.6 guide you through the implementation of a tunnel that might be used in a firewall. Section 19.7 describes a driver for testing the programs. Section 19.8 discusses the HTTP parsing needed for the proxy servers. Sections 19.9 and 19.10 describe a proxy server that monitors the traffic generated by the browsers that use it. Sections 19.12 and 19.13 explore the use of gateways for firewalls and load balancing, respectively.



UNIX Systems Programming: Communication, Concurrency and Threads
ISBN: 0130424110
Year: 2003
Pages: 274
