This section provides you with an overview of the HTTP terms, concepts, and features that content networking devices use to enhance content delivery. HTTP is a protocol used to transport web application data across a TCP network. Within the realm of content networking, HTTP clients can be either user agents (web browsers) or content caches. In turn, servers of content can be either origin servers or content caches.
HTTP 1.0 Versus HTTP 1.1
The intent of HTTP 1.1 is to provide a more generic and robust standard than HTTP 1.0, with better content optimization, security, and room for future development. Although HTTP 1.0 reached widespread use through user-agent browsers such as Microsoft Internet Explorer and Netscape Navigator in the mid-1990s, the standard has many well-known shortcomings. HTTP 1.1 is backward compatible with HTTP 1.0 and includes the extensibility needed to avoid creating a new version for every extension, or new header, to the protocol. As long as the general message parsing algorithm remains the same, the HTTP version number does not change.
You should prefix your custom HTTP headers with "X-". For example, proxy caches often use the "X-Forwarded-For:" header to indicate the IP addresses of the clients they proxy.
To allow for future headers, HTTP 1.1 requires transparent proxies, such as caches and security proxies, to ignore unrecognized HTTP headers and forward the packets directly to the origin server. This enables easy deployment of new headers into production without first changing every device in the content network; only the participating devices require the intelligence to process the new headers. As such, the HTTP 1.1 standard allows for significant enhancements to HTTP 1.0, including new caching, security, connection management, and client state recognition functionality.
HTTP 1.0 does not include the server domain name within GET requests. The assumption by its developers was that the IP address would suffice to uniquely identify the site. HTTP 1.1 introduced the "Host:" header to carry the domain name, for use in hosting environments where multiple sites reside on a single physical server.
HTTP is a text-based protocol, with each line terminated by a CRLF (Carriage Return/Line Feed) pair. Text-based protocols make it simple to add headers, thus making the protocol more extensible. For this reason, the HTTP RFC 2616 does not contain binary packet formats as the TCP and IP RFCs do.
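Because HTTP is plain text, a request can be assembled with ordinary string handling. The following is a minimal sketch in Python that builds an HTTP 1.1 GET request with a "Host:" header and CRLF line endings. The function name build_get_request and the host www.example.com are illustrative assumptions, not names from this chapter.

```python
CRLF = "\r\n"


def build_get_request(host, path="/", extra_headers=None):
    """Return the raw bytes of a simple HTTP 1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line (CRLF CRLF) marks the end of the header section.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


# www.example.com is a placeholder host used only for illustration.
request = build_get_request("www.example.com", "/index.html")
print(request.decode("ascii"))
```

Adding a new header is just a matter of appending one more text line, which is the extensibility property described above.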
HTTP clients and servers communicate with methods for either requesting or supplying content. A client request normally contains a method, URL, the HTTP version number, and headers that pertain to the request, such as the type of client web browser or client security credentials. The response from the server contains a result code and may or may not contain the requested data, depending on factors such as the object's freshness, HTTP version support, or the client's privileges. Table 8-1 gives some common HTTP 1.1 methods.
Table 8-2 gives the available HTTP 1.1 return codes.
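To make the shape of a server response concrete, the following sketch splits a raw response into its version, result code, reason phrase, and headers. It is a simplified illustration under stated assumptions: the function name parse_response_head and the sample response are hypothetical, and the parser expects a well-formed status line with a reason phrase.

```python
def parse_response_head(raw):
    """Split a raw response into (version, status_code, reason, headers)."""
    # The header section ends at the first empty line.
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers


# Hypothetical sample response used only for illustration.
sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
print(parse_response_head(sample))
```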
HTTP Connection Persistence and Pipelining
When a client requires content from a web server, it first establishes a TCP connection to the server over which to send a content request. The server then sends its response over the same TCP session. However, websites often provide multiple URLs in their HTML files that point to ads, images, or other embedded content. If the client creates a new TCP connection for each of these embedded objects, the number of TCP connections grows quickly. HTTP 1.0 introduced the concept of persistent connections, where a single TCP connection is used to transport numerous transactions between the client and server. Persistent connections reduce the bandwidth consumed by the overhead of setting up and tearing down multiple TCP connections. Additionally, connection persistence avoids the round-trip delays caused by opening new TCP connections. The result is that you perceive faster download times.
Figure 8-1 illustrates how a client establishes two TCP connections to request two objects from an HTML page. The first request is for the index.html main page of Cisco.com, and the second is for an image within the page. The client's browser obtains the image URL from within the index.html page.
Figure 8-1. HTTP Requests Using Individual TCP Connections
HTTP 1.0 introduced the "Connection:" header for client requests. Using this header in the HTTP GET request with a value of "Keep-alive," clients can request HTTP persistence from the server. If the server supports HTTP connection persistence, it will include the "Connection:" header with the value "Keep-alive" in its HTTP 200 OK response. Persistence was a negotiated feature between the client and server in HTTP 1.0 but is enabled by default in HTTP 1.1.
Figure 8-2 illustrates HTTP persistence where the client requests the image within the same TCP connection.
Figure 8-2. HTTP 1.0 Persistence
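The negotiation rule described above (opt-in for HTTP 1.0, on by default for HTTP 1.1) can be sketched as a small decision function. The function name is_persistent and the dictionary layout for headers are assumptions made for this illustration.

```python
def is_persistent(version, headers):
    """Decide whether the connection stays open after this message.

    headers maps lower-case header names to their values.
    """
    raw = headers.get("connection", "")
    tokens = {t.strip().lower() for t in raw.split(",") if t.strip()}
    if version == "HTTP/1.1":
        # Persistent by default; "close" opts out.
        return "close" not in tokens
    # HTTP 1.0: persistence must be requested explicitly.
    return "keep-alive" in tokens


print(is_persistent("HTTP/1.0", {}))                            # False
print(is_persistent("HTTP/1.0", {"connection": "Keep-Alive"}))  # True
print(is_persistent("HTTP/1.1", {}))                            # True
print(is_persistent("HTTP/1.1", {"connection": "close"}))       # False
```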
Even with persistent connections, the round-trip wait for the response to each URL request adds noticeable delay. Therefore, HTTP 1.1 introduced pipelined requests, where the client does not need to wait for the response to one request before sending the next. Figure 8-3 illustrates HTTP persistent and pipelined requests.
Figure 8-3. HTTP 1.1 Persistence and Pipelining
Although HTTP 1.1 supports pipelined requests, many clients do not yet implement this behavior.
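As a rough illustration of pipelining, the sketch below places two GET requests back-to-back in one buffer, which a client could write to a single persistent connection before reading any response. The function name pipeline, the host www.example.com, and the object paths are hypothetical; no network I/O is performed.

```python
def pipeline(host, paths):
    """Concatenate several GET requests so they can be sent in one write."""
    requests = [
        f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")
        for path in paths
    ]
    return b"".join(requests)


# Placeholder host and paths, used only for illustration.
payload = pipeline("www.example.com", ["/index.html", "/logo.gif"])
# Each request ends with an empty line, so this counts the requests.
print(payload.count(b"\r\n\r\n"))
```

The server is expected to return the responses in the same order as the requests, so the client can match them up by position.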
Maintaining Client-Side State with HTTP Cookies
The HTTP 1.1 standard is a generic stateless protocol. Therefore, to extend the HTTP 1.1 protocol to enable participants to maintain the state of the HTTP session, cookies were added. Cookies come in two flavors, both created by the server using the same "Set-Cookie:" header. The two types of cookies are session cookies and persistent cookies. Your web browser stores session cookies in RAM only for the life of your HTTP session and removes them when your session ends. That is, when you close your browser window, the client deletes the cookie from memory. In contrast, your web browser stores persistent cookies to disk for future visits to the site, for a period specified by the server in the "Set-Cookie:" header. The server sets session cookies by not including an expiration time in the header, which indicates to the client to remove the cookie after the session ends. Servers set cookies using the HTTP header:
Set-Cookie: name=value; expires=DATE;
When a client sends a request for a URI to a server, the server responds with the "Set-Cookie:" header, set to a unique value to identify the session on the server. In all subsequent requests, the client includes the "Cookie:" header with the value set by the server. This way, the server is able to maintain the state of the session. For example, the server may set a session cookie using the header:
As a result, the client will include the following header in all subsequent HTTP transactions with the server.
Session cookies are useful for keeping track of user information as users browse from page to page on a website. Session cookies are also an important aspect of SLB stickiness, which you will learn about in Chapter 10, "Exploring Server Load Balancing."
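The difference between session and persistent cookies comes down to whether the server includes an expiration time. The sketch below builds both kinds of "Set-Cookie:" header and derives the "Cookie:" header a client would send back. The cookie name session_id, the value abc123, and the expiry timestamp are hypothetical values chosen for illustration, and the date format produced here is an assumption about what a given client accepts.

```python
from email.utils import formatdate


def make_set_cookie(name, value, expires_epoch=None):
    """Build a Set-Cookie header line.

    Without an expiry the cookie is a session cookie; with one it is persistent.
    """
    header = f"Set-Cookie: {name}={value}"
    if expires_epoch is not None:
        header += f"; expires={formatdate(expires_epoch, usegmt=True)}"
    return header


def cookie_header_for(set_cookie_line):
    """Return the Cookie header a client would echo for one Set-Cookie line."""
    # Keep only the name=value pair; attributes such as expires are not echoed.
    pair = set_cookie_line.split(":", 1)[1].split(";", 1)[0].strip()
    return f"Cookie: {pair}"


session = make_set_cookie("session_id", "abc123")
persistent = make_set_cookie("session_id", "abc123", expires_epoch=1_700_000_000)
print(session)
print(persistent)
print(cookie_header_for(persistent))
```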
HTTP 1.0 enables you to provide basic authentication to clients using clear text passwords. For HTTP message digest authentication, you need to use the extension to HTTP 1.1 that is defined in RFC 2617.
With basic authentication, your browser sends your credentials over the network unencrypted using Base64 encoding. For example, the username and password string "sdaros:cisco" (without the quotes) is "c2Rhcm9zOmNpc2Nv" in Base64. Base64 encoding may appear encrypted but it is easily reversible. For example, try cutting and pasting the Base64 encoded string "c2Rhcm9zOmNpc2Nv" into any publicly available Base64 tool on the Internet, and you will see how quickly it can be reversed into the original source string "sdaros:cisco."
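The reversibility of Base64 is easy to confirm locally. The following short sketch uses Python's standard base64 module to encode and decode the example credentials from the text; the final line shows how the encoded value would appear in a Basic "Authorization:" header.

```python
import base64

credentials = "sdaros:cisco"

# Encoding is a fixed, keyless transformation...
encoded = base64.b64encode(credentials.encode("ascii")).decode("ascii")
# ...so anyone who sees the encoded value can decode it.
decoded = base64.b64decode(encoded).decode("ascii")

print(encoded)  # c2Rhcm9zOmNpc2Nv
print(decoded)  # sdaros:cisco
print(f"Authorization: Basic {encoded}")
```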
With message digest authentication, your browser protects your credentials by computing a message digest from them before sending anything over the network, so the password itself is never transmitted. First, to authenticate you, the server challenges your browser with a nonce value. Your browser uses this nonce, along with the username and password that you supply, as input to a hash algorithm. A nonce is a one-time random number that HTTP 1.1 uses to prevent replay attacks.
A hash function takes input values and performs a mathematical computation, resulting in a single (usually smaller) value from which it is computationally infeasible to determine the original values. These functions are also called one-way hash functions because they are considered infeasible to reverse. You can use the Message Digest 5 (MD5) and Secure Hash Algorithm 1 (SHA-1) hash algorithms in your web applications to compute digests of credentials and messages. MD5 produces hashes 128 bits in length, and SHA-1 produces 160-bit hashes. Hash functions are also used in many other areas of content networking, ranging from public-key cryptography, X.509 certificate authentication, and message integrity to server/cache selection in server load balancing (SLB) environments.
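The digest lengths of these algorithms can be checked with Python's standard hashlib module. This is a small sketch that hashes the example credential string and reports the output size of each algorithm in bits; SHA-1 digests are 160 bits (20 bytes) long.

```python
import hashlib

message = b"sdaros:cisco"

md5_digest = hashlib.md5(message)
sha1_digest = hashlib.sha1(message)

# digest_size is reported in bytes; multiply by 8 for bits.
print(md5_digest.digest_size * 8)   # 128
print(sha1_digest.digest_size * 8)  # 160
# A 128-bit digest is written as 32 hexadecimal digits.
print(len(md5_digest.hexdigest()))  # 32
```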
A replay attack occurs when a malicious third party intercepts the digest containing your credentials and attempts to reuse it to connect to the same server. Because the digest that the third party intercepts was computed with a nonce value that you have already used to authenticate with that server, the server refuses the malicious authentication attempt. The nonce value forces your credentials to change on the network for every login attempt, even though your username and password stay the same.
A message digest is the output of a one-way hash algorithm, in which the algorithm uses the inputs to compute a single 128-bit value (that is, 16 bytes, visually represented as 32 hexadecimal digits). HTTP message digest authentication uses the MD5 message digest algorithm. For example, the MD5 message digest value for the username and password string "sdaros:cisco" is
Consider the following string containing a username, password, and nonce value of "64b0a095cfd25133b2e4014ea87a0680":
The MD5 digest value for this string is
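Digest values like the ones discussed here can be reproduced with a few lines of Python. The sketch below follows the simplified description in this section rather than the full RFC 2617 computation, and the colon-separated layout of username, password, and nonce is an assumption made for illustration. No expected output is shown; run the script to see the values.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of text as 32 hexadecimal digits."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


username, password = "sdaros", "cisco"
nonce = "64b0a095cfd25133b2e4014ea87a0680"

plain = md5_hex(f"{username}:{password}")
# Assumed layout: the nonce is appended as a third colon-separated field.
with_nonce = md5_hex(f"{username}:{password}:{nonce}")

print(plain)
print(with_nonce)
```

Changing the nonce changes the second digest completely, even though the username and password are unchanged, which is what defeats a replayed digest.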
To generate nonce values, servers normally use the current timestamp in conjunction with their private key (if they have one) and the "ETag:" value of the requested object. You will learn about the "ETag:" header and private keys later in this chapter.
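One way to combine these inputs is sketched below, loosely following the construction suggested in RFC 2617: hash the timestamp, ETag, and a server-side secret, then Base64-encode the timestamp together with the hash. The function name make_nonce, the secret string, and the ETag value are hypothetical, and real servers may build nonces differently.

```python
import base64
import hashlib
import time


def make_nonce(etag, private_key, timestamp=None):
    """Build a nonce from a timestamp, an ETag, and a server-side secret."""
    ts = str(int(time.time() if timestamp is None else timestamp))
    digest = hashlib.md5(f"{ts}:{etag}:{private_key}".encode("ascii")).hexdigest()
    # Including the timestamp in clear lets the server check the nonce's age.
    return base64.b64encode(f"{ts} {digest}".encode("ascii")).decode("ascii")


# Hypothetical ETag and secret, used only for illustration.
nonce = make_nonce('"4cf0a-8f3-345e03ab"', "hypothetical-server-secret")
print(nonce)
```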
When you attempt to access an object on a server that requires authentication, the server sends the "HTTP 401 Unauthorized" response to challenge your browser for your credentials. This challenge from the server includes the "WWW-Authenticate:" header, which specifies the type of authentication that your browser should use and the realm that the server is authenticating you within. Some common values for this header are "Basic," "Digest," "Negotiate" (used for Kerberos), and "NTLM." The server maintains a list of objects that you can access associated with the particular realm.
Your browser then prompts you for your username and password with a popup window and sends your credentials to the server in the HTTP "Authorization:" header. You do not need to enter your password for every object that you request within the realm during the HTTP session because your browser automatically sends your credentials during your session. For example, your server challenges your browser to authenticate you with the header:
WWW-Authenticate: Digest realm="CUST_RELATIONS", nonce="64b0a095cfd25133b2e4014ea87a0680"
Then your browser sends the following header to the server:
Authorization: Digest realm="CUST_RELATIONS", response="cacdf924bcbb809b49c75cf967098fc1"
NTLM and Kerberos authentication are available for intranet or extranet HTTP access. NTLM servers issue the "WWW-Authenticate: NTLM" header and Kerberos servers issue "WWW-Authenticate: Negotiate" to clients. These two protocols are similar to Digest authentication in that they both use a one-time nonce value to authenticate clients. However, they differ in that they both automatically supply your Windows user credentials through HTTP to the site to which you require access. If you attempt to log in to the site from a workstation that is not part of a Windows domain, you will have to enter your Windows credentials in a popup window provided by the browser. You will see in Chapter 13, "Delivering Cached and Streaming Media," how NTLM and Kerberos are also useful with the Cisco Content Engine request authentication and authorization feature.
HTTP Caching Controls
As you will learn in Chapter 13, you can use caches in your network to reduce bandwidth use and response times. You can place caches between clients and servers to inspect requests and responses, cache content intelligently, and respond to client requests on the server's behalf. HTTP provides implicit and explicit cache controls.
Implicit Cache Controls
HTTP provides clients and servers with the ability to supply information to caches to ensure that cached information is as fresh as the content residing on the origin server. When a cache needs to validate a piece of cached content, it sends a GET request to the origin server with conditional directives. The cache converts the client GET request to a conditional GET request by adding the "If-Modified-Since:" or "If-None-Match:" header to the request. The server's response contains the requested object only if the cached copy is no longer fresh, as determined by comparing the validator supplied by the cache with the original content. If the comparison shows that the content has changed, the server sends the object to the cache; otherwise, the server considers the cached content fresh, and the cache avoids having to download the object from the origin server.
Some browsers send a conditional GET request to the server when you click the Reload button on your web browser. If the content has not changed on the origin server, then your browser uses files and images obtained from its local cache located on your hard disk. Try using a network packet sniffer the next time you click Reload on your browser.
In HTTP 1.0, when a cache originally receives a copy of an object, it stamps the object with the absolute expiration time specified in the "Expires:" header of the server response. If the origin server wants the cache to validate the object immediately upon receiving it, the "Expires:" header contains the current time. Otherwise, the cache does not validate the object until the time in the "Expires:" header is reached. To compare the freshness of two pieces of content, the "If-Modified-Since:" header provides a way for the server to determine whether the requested content has been modified since the cache obtained its copy. The cache issues a conditional GET request to the server with the modification time of its stored copy (taken from the "Last-Modified:" header of the original response) in the "If-Modified-Since:" header. The server then compares the time value in the header from the cache with the modified date on the object in the file system. If the object has not changed, the server returns the "HTTP/1.0 304 Not Modified" response. Otherwise, the server sends a fresh copy of the updated object to the cache. For example, a cache may send the following conditional request to an origin server to validate a piece of requested content:
GET /index.html HTTP/1.0
If-Modified-Since: Thu, 23 Mar 2006 19:43:31 GMT
If the server determines that the content has not changed (a cache hit has occurred), it sends the following response to the cache:
HTTP/1.0 304 Not Modified
If the server determines that the content has changed (a cache miss has occurred), it sends the following response to the cache:
HTTP/1.0 200 OK *Data*
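The server-side comparison for time-based validation can be sketched as follows. The function name validate_by_date is hypothetical, and the sketch assumes both values are well-formed HTTP dates in GMT; it returns only the status code the origin server would choose.

```python
from email.utils import parsedate_to_datetime


def validate_by_date(if_modified_since, last_modified):
    """Return the status code for a time-based conditional GET."""
    cached = parsedate_to_datetime(if_modified_since)
    current = parsedate_to_datetime(last_modified)
    # Unchanged since the cached copy was taken: the cache may reuse it.
    return 304 if current <= cached else 200


cached_time = "Thu, 23 Mar 2006 19:43:31 GMT"
print(validate_by_date(cached_time, "Thu, 23 Mar 2006 19:43:31 GMT"))  # 304
print(validate_by_date(cached_time, "Fri, 24 Mar 2006 08:00:00 GMT"))  # 200
```

Note that the comparison depends on the timestamps involved being trustworthy, so it is only as reliable as the clocks that produced them.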
This time-based validation mechanism provided by HTTP 1.0 proves to be problematic: if the clocks on the cache and server are not synchronized, the comparison may give the wrong result. Therefore, HTTP 1.1 enhanced this method with the "ETag:" header. This header avoids dates by instead carrying a version identifier for the object. When a cache originally stores an object, it also stores the ETag value supplied by the server. When the cache validates the content, the GET request contains a conditional "If-None-Match:" header containing the original version identifier. If the origin server's version identifier for the object is different from the one supplied by the cache, the server returns the object to the cache. Otherwise, the server sends the "HTTP/1.1 304 Not Modified" response, indicating that the content is fresh. For example, a cache may send a conditional GET request as follows:
GET /index.html HTTP/1.1
If-None-Match: "4cf0a-8f3-345e03ab"
For a cache hit, the server will respond with:
HTTP/1.1 304 Not Modified
For a cache miss, the server will respond with:
HTTP/1.1 200 OK *Data*
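Version-based validation can be sketched in the same way. In RFC 2616, a cache normally carries its stored entity tag in an "If-None-Match:" header, and a match yields "304 Not Modified." The function name validate_by_etag is hypothetical, and the sketch compares strong entity tags only, ignoring weak tags.

```python
def validate_by_etag(if_none_match, current_etag):
    """Return the status code for an entity-tag conditional GET."""
    # The header may list several comma-separated entity tags.
    candidates = {tag.strip() for tag in if_none_match.split(",")}
    if "*" in candidates or current_etag in candidates:
        return 304  # cached copy is still current
    return 200      # object changed; send a fresh copy


print(validate_by_etag('"4cf0a-8f3-345e03ab"', '"4cf0a-8f3-345e03ab"'))  # 304
print(validate_by_etag('"4cf0a-8f3-345e03ab"', '"4cf0a-8f3-999e03ab"'))  # 200
```

Because no timestamps are compared, clock skew between the cache and the origin server does not affect the result.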
Table 8-3 lists the HTTP implicit cache control headers.
Explicit Cache Controls
You learned that servers can provide implicit controls in response to directives sent by caches through conditional validation requests. In contrast, HTTP 1.1 provides a mechanism for clients and servers to send explicit directives to caches, using the "Cache-Control:" header, regardless of the possible directives in the request that a cache may insert. The "Cache-Control:" header provides the values listed in Table 8-4 that pertain to caching mechanisms in Cisco content networking products.
For example, in response to a client's request, an origin server may indicate that intermediary caches should revalidate the "Set-Cookie:" header with the origin server before returning it with the object to requesting clients:
HTTP/1.1 200 OK
Cache-Control: no-cache="Set-Cookie"
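A cache has to split the "Cache-Control:" value into individual directives before acting on them. The sketch below is a simplified parser under stated assumptions: the function name parse_cache_control is hypothetical, and it does not handle commas inside quoted arguments. RFC 2616 allows the no-cache directive to appear bare or with a quoted field name as its argument.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into a {directive: argument} mapping."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without an argument map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control('no-cache="Set-Cookie", max-age=60'))
print(parse_cache_control("no-cache"))
```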
This completes the overview of the HTTP protocol. So far you have learned the basic features of the HTTP web protocol, including how transactions are executed using persistence and pipelining, how HTTP authentication takes place, and how HTTP uses caching controls for efficient content caching. These features are important to grasp for further study of content networking. However, most of these features exchange information as cleartext, which, if left unprotected on your network, could lead to a compromise of your organization's private information.
Because of the growing importance of information security on the Internet, the rest of the chapter covers how to secure HTTP using the Public Key Infrastructure (PKI).