The HyperText Transfer Protocol, or HTTP, is almost certainly the most widely used Application layer protocol in the world today. It forms the basis of what most people understand the Internet to be: the World Wide Web. Its purpose is to provide a lightweight protocol for the retrieval of HyperText Markup Language (HTML) and other documents from Web sites throughout the Internet. Each time you open a Web browser to surf the Internet, you are using HTTP over TCP/IP. HTTP was first ratified in the early 1990s and has been through three main iterations: HTTP/0.9, HTTP/1.0, and HTTP/1.1.
Most current browsers support both the 1.0 and 1.1 implementations; newer browsers use 1.1 by default but can fall back to earlier versions if required. One thing the RFC definitions are careful to point out is that all implementations of the HTTP protocol should be backward compatible. That is to say, a browser implementing the HTTP/1.1 specification should be capable of receiving a 1.0 response from a server. Conversely, a 1.1 implementation on the server side should be capable of responding to requests from a 1.0 browser. It is well outside the bounds of this book to cover the HTTP protocols in great detail, so let's concentrate on those elements most relevant to content switching.
Basic HTTP Page Retrieval
Let's start at the beginning and see how a basic browser retrieves a Web page from a Web server. The first important point to note is that a Web page is typically made up of many objects, often dozens, ranging from the base HTML through to the images that are present on the page. The HTML can be thought of as the template for the page overall, instructing the browser on the layout of the text, font sizes and colors, background color of the page, and which images need to be retrieved to make up the page. Think of the process as taking place in the following order:
Once all elements of the page have been retrieved, the client browser will display the completed Web page. The order and timing of the process described previously depend largely on which implementation of HTTP is used (1.0 or 1.1), although all browsers follow this pattern of request and response.
HTTP Methods
HTTP offers not only a mechanism for the client to receive data from the server, but also other communication types, such as the passing of data from the client to the server. Each such mechanism is known within the HTTP specifications as a method. Table 3-1 shows the supported method types in HTTP/1.0 and 1.1.
Table 3-1. The HTTP Method Headers in HTTP/1.0 and HTTP/1.1
In terms of general Web browsing, the GET and POST methods are by far the most commonly used. For a browser to build a standard Web page, the GET method is used to retrieve each object individually, whereas transactional Web sites implementing shopping cart style applications will also use the POST method.
The HTTP URL
The URL is the most important piece of information that the client browser includes in any GET request. The URL is defined as a combination of the scheme used to retrieve the page, the host where the site is located, and the full path and filename. Optionally, the URL may include information such as the TCP port number to be used or a unique reference point within a larger page. Figure 3-1 shows the breakdown of an example URL.
Figure 3-1. An example URL and its components.
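The breakdown of a URL into its components can be sketched with Python's standard `urllib.parse` module. The example URL below is an assumption for illustration (the port `8080` and the reference point `section2` are invented, and it is not necessarily the URL shown in Figure 3-1); the helper name `describe_url` is also hypothetical.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into the components discussed in the text."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # how the page is retrieved, e.g. "http"
        "host": parts.hostname,      # where the site is located
        "port": parts.port,          # optional TCP port; None when not given
        "path": parts.path,          # full path and filename
        "fragment": parts.fragment,  # optional reference point within the page
    }


# Hypothetical URL with the optional port and reference point included.
example = describe_url(
    "http://www.foocorp.com:8080/directory/somewhere/page.html#section2"
)
for component, value in example.items():
    print(f"{component}: {value}")
```

When the URL omits the port, `port` comes back as `None` and the client would fall back to the default port for the scheme.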
The term URI is also commonly used when referencing the location of documents within HTTP. Formally, URI is the broader term and a URL is one kind of URI; in everyday HTTP usage, however, URI often refers to the URL without the scheme defined.
Persistent Connections in HTTP
One of the other major differences in operation between HTTP/1.0 and HTTP/1.1 is the handling of the TCP connections required to retrieve a full Web page. Given that a client will typically have to retrieve multiple objects to make up a single Web page, it is often inefficient to open and close TCP sessions repeatedly when retrieving objects from the same server. To improve the overall performance of HTTP in this instance, the protocol defines the Connection: header, which communicates to the server whether the TCP session should be closed or remain open once the object has been retrieved. The Connection: header has two options, close and Keep-Alive.
The close state indicates that the server should close the TCP connection once the request has been fulfilled. The Keep-Alive state indicates that the server should keep the TCP connection open after the request has been fulfilled. Along with an obvious performance increase from removing the need to open and close TCP connections, the Keep-Alive state also allows the implementation of pipelining. Pipelining allows a client to send multiple HTTP GET requests over the same TCP connection without needing to wait for the individual response to each. Figure 3-2 shows the difference in these connection types.
Figure 3-2. The difference in TCP handling between HTTP/1.0 and HTTP/1.1.
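As a minimal sketch of how a server or intermediary device might interpret the Connection: header, the function below decides whether the TCP connection stays open after a response. It assumes the usual defaults (HTTP/1.0 closes the connection unless Keep-Alive is requested, while HTTP/1.1 keeps it open unless close is requested); the function name and the dictionary representation of the headers are hypothetical choices made for illustration.

```python
def connection_persists(http_version: str, headers: dict) -> bool:
    """Return True if the TCP connection should stay open after the response."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    tokens = {
        token.strip().lower()
        for token in normalized.get("connection", "").split(",")
        if token.strip()
    }
    if "close" in tokens:
        return False  # an explicit close always wins
    if http_version == "HTTP/1.1":
        return True   # assumed default: 1.1 connections are persistent
    return "keep-alive" in tokens  # assumed default: 1.0 must opt in


print(connection_persists("HTTP/1.0", {}))
print(connection_persists("HTTP/1.0", {"Connection": "Keep-Alive"}))
print(connection_persists("HTTP/1.1", {}))
print(connection_persists("HTTP/1.1", {"Connection": "close"}))
```

A content switch that inspects requests would typically make a decision of this kind before choosing whether to reuse a server-side connection.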
The final piece in the puzzle of interaction between client and server is in opening multiple TCP connections. We've already seen that a client can open a persistent TCP connection to the server and pipeline HTTP requests. To further improve performance of the HTTP operation, many browsers will open several simultaneous connections. Figure 3-3 gives examples of pipelining and multiple connections. Figure 3-3. Implementing pipelining and multiple connections as performance mechanisms.
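To make pipelining concrete, the following sketch builds the bytes a client could write to a single persistent connection in one go: several GET requests back to back, with no waiting for responses in between. It only constructs the request text and does not open a socket. The function name and the object paths are hypothetical, chosen for illustration.

```python
def build_pipelined_requests(host: str, paths: list) -> bytes:
    """Concatenate one GET request per path into a single byte string."""
    requests = []
    for path in paths:
        requests.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: Keep-Alive\r\n"
            "\r\n"  # a blank line terminates each request's headers
        )
    return "".join(requests).encode("ascii")


# Hypothetical page made up of an HTML base and two images.
payload = build_pipelined_requests(
    "www.foocorp.com",
    ["/index.html", "/images/logo.gif", "/images/banner.gif"],
)
print(payload.decode("ascii"))
```

A browser using multiple simultaneous connections would instead split the list of paths across several connections, possibly pipelining on each.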
Other HTTP Headers
The HTTP protocol includes definitions for dozens of headers that can be included in client-to-server requests and server-to-client responses. We will not attempt to list and describe all of them here; for a full description, the RFCs for HTTP/1.0 and HTTP/1.1 are a better source. The RFCs define a series of standard headers, which can be complemented by user-defined headers added on either the client or server side. Because headers are ASCII-readable text in every HTTP request and response pair, they can prove very useful in the implementation of content switching. Let's look at some of the HTTP headers most commonly used in content switching.
The "Accept:" Header
The client browser uses the "Accept:" header to indicate to the server which content and media types can be accepted. Examples of the "Accept:" header include:
The "Accept:" header is useful in the context of content switching for determining the capabilities of a particular client. If the client browser cannot accept images, for example, the request can be directed to a server optimized to deliver text-only versions of the Web pages.
The "Host:" Header
One of the main problems in the original HTTP/1.0 specification was that a user's request as typed into the browser (e.g., http://www.foocorp.com/index.html) would not contain the host element (www.foocorp.com) in the GET request sent to the server. This is a problem if virtual hosting is used within a Web server farm, where a server is potentially hosting multiple Web sites and needs this host information to determine which path and page the user is requesting. Within the HTTP/1.1 specification, and subsequently in many new HTTP/1.0 browsers, support was added for the "Host:" header. This allows the user's requested URL, typed into the browser, to be converted into a GET request containing the full path and filename along with the host from which the content is being fetched. The following is an example of translating a full URL into its component parts.
URL: http://www.foocorp.com/directory/somewhere/page.html
GET /directory/somewhere/page.html HTTP/1.0\r\n
Host: www.foocorp.com\r\n
The "Host:" header has many uses within content switching, examples of which are shown in Chapter 6, Content-Aware Server Load Balancing.
The "User-Agent:" Header
The "User-Agent:" header indicates to the server the type of browser being used by the client. It is useful in the context of content switching because it can be used to direct the request to a resource offering content optimized for that browser. The following is an example of the "User-Agent:" header:
User-Agent: Mozilla/4.0 (Compatible; MSIE 6.0; Windows NT 5.0)
Cookies: The HTTP State Management Mechanism
As we'll see in later chapters, one of the biggest challenges in HTTP environments, whether content switched or not, is maintaining some form of client-side state that enables Web servers and intermediary devices to recognize the client session and understand the current status of the user session. This issue was tackled in RFC 2109, which defined the Set-Cookie and Cookie HTTP headers, used to set and return cookies, respectively. In HTTP, a cookie takes the form of a small piece of text information that is implanted into the user's browser either permanently or temporarily. The term cookie is commonly used in computing to describe an opaque piece of information held during a session and, unfortunately, seems to have no more interesting origin than that. Once the backend server has implanted the cookie into the user's browser, the information can be used for a number of different applications, including content personalization, user session persistence for online shopping, and the collection of demographic and statistical information on Web site usage. By issuing a Set-Cookie header in an HTTP response, the server can send a cookie to the client at any time during an HTTP session. The Set-Cookie header has the following syntax:
Set-Cookie: <name>=<value>; expires=<date>; path=<path>; domain=<domain>; secure
The name and value fields are the only ones that are mandatory when issuing a cookie. As the name suggests, these define the name of the cookie and its value, such as UserID=Phil. The expires field identifies, down to the second, the date and time at which a cookie will expire and be deleted from the client computer. The domain and path fields indicate the domain, such as www.foocorp.com, and the URL path, such as /home/brochures/, for which the cookie should be used.
Both of these options can effectively be wild-carded; specifying foocorp.com, for example, matches both www.foocorp.com and intranet.foocorp.com. Finally, the secure field indicates to the client that the cookie should only be used when a secure connection (SSL-secured HTTP, or HTTPS) is used between the client and server. Figure 3-4 shows the interaction between a client and server as two different cookies are inserted and used.
Figure 3-4. The interaction between a client and a server when two different cookies are implanted and used.
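The way the domain, path, and secure fields restrict where a cookie is used can be sketched as follows. This is a simplified illustration, not a faithful browser implementation: it uses a plain suffix match for the domain and a plain prefix match for the path, and it does not track which host originally set the cookie. The function names and the requested file names are hypothetical.

```python
def parse_set_cookie(header_value: str) -> dict:
    """Split a Set-Cookie value into its name, value, and attributes."""
    first, *rest = [part.strip() for part in header_value.split(";")]
    name, _, value = first.partition("=")
    attributes = {}
    for part in rest:
        if not part:
            continue
        key, _, attr_value = part.partition("=")
        # Flags such as "secure" carry no value; record them as True.
        attributes[key.strip().lower()] = attr_value.strip() if attr_value else True
    return {"name": name.strip(), "value": value.strip(), "attributes": attributes}


def cookie_applies(cookie: dict, host: str, path: str, secure_connection: bool) -> bool:
    """Decide whether the client should send this cookie with a request."""
    attrs = cookie["attributes"]
    domain = attrs.get("domain")
    if isinstance(domain, str):
        domain = domain.lstrip(".").lower()
        host = host.lower()
        # Simplified wildcard: exact match or any host ending in ".<domain>".
        if host != domain and not host.endswith("." + domain):
            return False
    cookie_path = attrs.get("path")
    # Simplified path rule: the requested path must start with the cookie path.
    if isinstance(cookie_path, str) and not path.startswith(cookie_path):
        return False
    if attrs.get("secure") and not secure_connection:
        return False
    return True


cookie = parse_set_cookie(
    "UserID=Phil; path=/home/brochures/; domain=foocorp.com; secure"
)
print(cookie_applies(cookie, "www.foocorp.com", "/home/brochures/list.html", True))
print(cookie_applies(cookie, "intranet.foocorp.com", "/home/brochures/", True))
print(cookie_applies(cookie, "www.foocorp.com", "/index.html", True))
print(cookie_applies(cookie, "www.foocorp.com", "/home/brochures/", False))
```

The first two checks succeed because both hosts fall under foocorp.com and the paths sit beneath /home/brochures/; the last two fail on the path and on the missing secure connection, respectively.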
The following code shows the HTTP responses from the server in more detail. Note that the second cookie includes the Path field, which limits the use of the cookie to requests whose URL path begins with /docs.
Hypertext Transfer Protocol
    HTTP/1.1 200 OK\r\n
    Set-Cookie: UserID=Phil\r\n
    Connection: Keep-Alive\r\n
    Content-Type: text/html\r\n
    \r\n
Hypertext Transfer Protocol
    HTTP/1.1 200 OK\r\n
    Set-Cookie: UserType=Gold; Path=/docs\r\n
    Connection: Keep-Alive\r\n
    Content-Type: text/html\r\n
    \r\n
The mechanism that governs whether a cookie is permanent (i.e., stored on the hard disk of the user's machine) or temporary (i.e., removed once the user closes the browser application) is the Expires field in the Set-Cookie header. If the server does not issue an Expires directive when implanting the cookie, the cookie is considered temporary; if the Expires directive is used, the cookie will be stored on the client machine until the expiry date has passed. Cookies are among the most useful additions made to the HTTP specifications and, as we'll see in later chapters, can be used in conjunction with content switching to enable a whole host of new experience-enhancing services.
HTTP: Further Reading
It is outside the scope of this book to cover the HTTP protocol in its entirety; the RFC for HTTP/1.1 alone is over 160 pages. For more in-depth detail on the protocol, it's worth looking at the following RFCs: