HTTP's structure is based on years of text protocol design experience, particularly with email protocols. Three-digit status codes, including "200 OK," were used in email protocols in 1973 [RFC524]. Header names and values (with headers separated by a carriage return and a line feed, or CRLF) also appeared in early email protocols [RFC561], probably modeled on interoffice paper messages, where header lines ("To: Alice," "From: Bob," "Re: Staff Meeting") are followed by the message body. Multipurpose Internet Mail Extensions (MIME) first standardized some of the headers reused by HTTP, like Content-Type and Content-Transfer-Encoding ("Transfer-Encoding" in HTTP) [RFC1341].

3.2.1 HTTP Requests

An HTTP request must contain a request line with method, Request-URI, and protocol version. It can contain a number of headers, each new header on a new line. It must contain an empty line signaling the end of the headers. Finally, a request might have a body (see Figure 3-2).

Figure 3-2. HTTP request structure.

The request method, URI, and protocol version appear on the first line of the request, separated from each other by spaces. Headers appear on subsequent lines, and an arbitrary number of headers might appear before the blank line that signals the end of the headers. Finally, if the request has a body, the body follows immediately after the blank line. If a message has no body, it ends with a blank line. A GET request does not have a body, and it's the most common type of request used on the Web (see Listing 3-2).

Listing 3-2 Minimum GET request.

    GET /index.html HTTP/1.1
    Host: example.com:80

The character is used in this book to explicitly show where an empty line must appear. It does not actually appear in protocol messages.

Note that much of this text is case-sensitive. In the first line, the method name, Request-URI,[1] and protocol version are case-sensitive.
Header names are case-insensitive; however, it's good practice to capitalize header names exactly as they are defined in specifications, in case the recipient relies on consistent capitalization. Header values might or might not be case-sensitive, depending on the definition of the header.
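The request layout and case rules described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete HTTP parser: the function name parse_request is hypothetical, repeated headers simply overwrite each other, and no validation of the method or version is attempted.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.x request into its parts (illustrative sketch only)."""
    # The blank line (CRLF CRLF) separates the headers from the optional body.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("missing blank line that ends the headers")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: method, Request-URI and protocol version, separated by spaces.
    # These three fields are case-sensitive, so they are returned unchanged.
    method, request_uri, version = lines[0].split(" ")

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them for lookup.
        # Simplification: a repeated header replaces the earlier value.
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, version, headers, body


raw = b"GET /index.html HTTP/1.1\r\nHost: example.com:80\r\n\r\n"
method, request_uri, version, headers, body = parse_request(raw)
print(method, request_uri, version, headers["host"], body)
```

Lowercasing the header names on receipt is one common way to honor case-insensitivity while still sending headers with the capitalization used in the specifications.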
The Request-URI (/index.html) begins with a slash because it is a relative URI. It must be combined with the host identified in the Host header to fully specify the location of the resource. However, Request-URIs need not be relative. In fact, RFC2616 mandates that Web proxies and servers must accept an absolute URI in the Request-URI, because it's more reliable to put the entire resource URL in the Request-URI than to split it up into two parts. Therefore, servers must also accept the GET request structure shown in Listing 3-3.

Listing 3-3 GET request showing host name in the Request-URI.

    GET http://example.com:80/index.html HTTP/1.1

HTTP/1.1 clients must send absolute URIs in requests to HTTP/1.1 proxies, whereas requests to Web servers can still use the old form of relative URIs.

3.2.2 Header Syntax

HTTP headers follow some basic syntax rules.
Many headers can appear multiple times in a message. If the header appears multiple times, the recipient can combine the multiple values into one header, separating the values with commas. For example, the Accept-Language header contains a number of language codes in which the user finds it acceptable to receive documents. So, the following pair of headers:

    Accept-Language: fr
    Accept-Language: en

is equivalent to:

    Accept-Language: fr, en
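The combining rule can be sketched as follows. The helper name combine_headers and the list-of-pairs input are assumptions made for illustration, and the sketch is only appropriate for headers whose definition allows a comma-separated list of values.

```python
def combine_headers(pairs):
    """Merge repeated headers into one comma-separated value per header name.

    `pairs` is a list of (name, value) tuples in the order received.
    Illustrative only: valid for headers defined as comma-separated lists.
    """
    combined = {}
    for name, value in pairs:
        key = name.lower()  # header names are case-insensitive
        if key in combined:
            # Preserve the order in which the values arrived.
            combined[key] = combined[key] + ", " + value
        else:
            combined[key] = value
    return combined


received = [("Accept-Language", "fr"), ("Accept-Language", "en")]
print(combined := combine_headers(received))
```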
3.2.3 Required Request Headers

The Host header is required on every request, to identify the virtual server the requested resource is on.
The Content-Type header is required in every message that contains a body. The Content-Type header shows the recipient what MIME type and character set were used for the body. The Content-Length header is usually included to indicate the length of the body. When the Content-Length header is present, HTTP parsers stop reading the body when the number of bytes read equals the Content-Length. When the header is not present, the HTTP parser continues trying to read the body until one party times out the TCP connection. Section 3.2.8 describes some less common ways to indicate the length of the body.

3.2.4 HTTP Responses

HTTP responses have a slightly different structure for the first line of the message: first the protocol version, then a three-digit status code, and finally some status text. After that, headers and a body follow exactly as in request messages (see Figure 3-3).

Figure 3-3. HTTP response structure.

The 200 OK response is the most common HTTP response because it's used to send the content of a Web page when the client requests it using GET. Listing 3-4 is a more complete example of a possible response to the GET request in Listing 3-2.

Listing 3-4 Typical GET response.

    HTTP/1.1 200 OK
    Date: Sun, 29 Jul 2001 15:24:17 GMT
    Content-Length: 25
    Content-Type: text/html
    Expires: Sun, 29 Jul 2001 19:24:17 GMT
    Cache-Control: private

    <body>Hello World!</body>

Responses should have the Date header showing the time the reply was generated. No other header is required on all responses. If the response has a body, it should include the Content-Length and Content-Type headers.

Listing 3-4 has two headers (Expires and Cache-Control) to specify how long and for whom the response body might be cached. Web pages are cached by most browsers so that if the user wants to look at the same page later, the page doesn't have to be downloaded again. Many proxy servers or intermediaries also have a common cache used for many different client connections.
When one user requests a cacheable file through the intermediary, it's cached so that the next user to request the same file can get a quick response. With the response headers given in Listing 3-4:
A PUT interaction is quite similar. In a PUT interaction, the client sends content to overwrite a Web page, and the server responds with a status code to show whether the request succeeded. As with a GET interaction, the Host header must be on the request, and the Date header must be on the response. Content-Type and Content-Length now appear on the request because the request message, not the response, has a body (see Listing 3-5).

Listing 3-5 PUT request and response: Only the request message has a body.

Request:

    PUT /index.html HTTP/1.1
    Host: example.com:80
    Content-Type: text/html
    Content-Length: 33

    <body>Hello World, part 2!</body>

Response:

    HTTP/1.1 204 No Content
    Date: Sun, 29 Jul 2001 15:24:07 GMT

A response with the 204 status code never has a response body, which is what "No Content" means. It is still a success response code (almost all 200 series responses are full successes), and the response body is not expected to be there in response to a PUT.

Normally, once a success or error response is sent, the server closes the TCP connection. This allows the server to go on and handle other clients with new TCP connections. The connection is reestablished for the next HTTP request. Section 3.5 covers alternatives to closing the connection.

3.2.5 Error Responses

Error responses are formatted in the same way as success responses; they just use different status codes. Many servers add a body to error responses in order to provide extra information. Typically, the error response body is formatted in HTML and is intended to be displayed. For example, if the client issues a request with an unknown method to a server running IIS 4.0, the server returns what appears in Listing 3-6.

3.2.6 Status Code Categories

So far, we've seen three different status codes.
Servers use 200 OK to return a Web page, 204 No Content when a method is successful and the response has no body, and 501 Not Supported when the server does not support some required functionality requested by the client.

Listing 3-6 Sample error response with body.

    HTTP/1.1 501 Not Supported
    Server: Microsoft-IIS/4.0
    Date: Sun, 29 Jul 2001 15:30:28 GMT
    Connection: close
    Content-Type: text/html
    Content-Length: 121

    <html><head><title>Method Not Supported</title></head>
    <body><h1>The specified method is not supported</h1>
    </body></html>

In addition to these, HTTP defines a few dozen response status codes for different conditions and situations. Although status codes can be any three-digit number, only some ranges are defined. The status codes are grouped into several ranges, some of which have particular meaning:
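As a rough illustration of how the first digit of a status code determines its range, the sketch below maps a code to the class names given in RFC 2616, section 6.1.1. The function name status_class and the "Undefined" label for ranges outside 1xx to 5xx are assumptions made for this example.

```python
# Class names as listed in RFC 2616, section 6.1.1.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Return the class of a three-digit status code based on its first digit."""
    if not 100 <= code <= 999:
        raise ValueError("status codes are three-digit numbers")
    # Ranges such as 6xx are syntactically possible but have no defined meaning.
    return STATUS_CLASSES.get(code // 100, "Undefined")


for code in (200, 204, 501):
    print(code, status_class(code))
```

A client that receives a code it does not recognize can still fall back on the range in this way, treating an unknown 4xx code as a client error, for example.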
3.2.7 Common Status Codes

Here is a quick run-through of the most common status codes and what they mean. Forty status codes are defined in RFC2616; only the most common are covered here.
Clients must be aware that there may be more than one problem with the request; however, the server can only return one status code and may choose any appropriate status code.

3.2.8 Message Length

HTTP/1.1 specifies five different ways the sender might indicate that the message is completed. All techniques must be supported in WebDAV implementations as well.
The message sender should indicate the correct body length through one of the first four mechanisms. Otherwise, HTTP messages are too subject to truncation attacks and transmission errors. It's easy for an attacker to force a TCP connection to be dropped, and without some indication of the expected body length, the recipient can't tell if the connection was finished or prematurely broken. Note that none of the methods described here reliably prevent truncation attacks, because with an unencrypted channel an attacker can modify several parts of the message, including header values. Only security protocols (typically S/MIME or SSL) protect the message from being tampered with.

Content-Length

The content length is measured in bytes (octets). If the sender somehow sends a content length that is too small, the recipient could cut off the message when the content length is reached. That could result in an unusable file, particularly with formats like Microsoft Word documents. Conversely, if the sender has a content length that is too long and the TCP connection isn't closed, the recipient will wait for the rest of the content. When the client sends a content length that is too long, it can seem like the server is hanging. Clearly, it's important to get this value right.

Responses Where Body Must Be Empty

Requests such as GET requests are expected not to have bodies, and the server can confirm that no body is expected by the absence of the Content-* headers. Responses in the 100-199 range, as well as 204 and 304 responses, are required to have empty bodies. No Content-Length header is required for these messages. Alternatively, the Content-Length header could have a value of 0, but this is rare.

Transfer-Encoding

When the Transfer-Encoding header is present, the length of the message is defined according to the transfer encoding mechanism. The Transfer-Encoding header takes precedence over the Content-Length header.
The only transfer encoding mechanism defined in RFC2616 is chunked transfer encoding. When chunked transfer encoding is used, the message body is still delivered as the body of a single message, but within the body the sender includes additional fields that are stripped out by the recipient. The additional fields allow the recipient to ensure that a complete message of the correct size has been received, even if the size isn't known by the sender until after the last byte is sent.
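To make the "additional fields" concrete: in chunked transfer encoding each chunk is preceded by its size in hexadecimal on a line of its own, and a chunk of size zero marks the end of the body. The following is a minimal decoding sketch under those rules. The function name decode_chunked is hypothetical, the whole encoded body is assumed to be in memory already, and chunk extensions and trailer headers are ignored rather than processed.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a chunked message body held entirely in memory (sketch only)."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a hexadecimal size line ending in CRLF.
        # bytes.index raises ValueError if the line is missing (truncation).
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; any trailer headers are ignored here
        chunk = data[pos:pos + size]
        # A short chunk or a missing CRLF means the message was cut off.
        if len(chunk) != size or data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += chunk
        pos += size + 2
    return bytes(body)


encoded = b"5\r\nHello\r\n7\r\n World!\r\n0\r\n\r\n"
print(decode_chunked(encoded))
```

Because the terminating zero-size chunk must be present, a recipient can tell a complete body from one cut short by a dropped connection, which is the property the paragraph above describes.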
Media Type Multipart/Byteranges

If the Content-Type response header shows that the media type is multipart/byteranges, then the length of the response body is included within the message body itself. Support for this mechanism is not required, so the server can only use it if the client advertises support for it by sending a Range header with multiple ranges. This mechanism cannot be used in requests.

Closing Connection

The server has the option of closing the connection when the response is finished. The close of the connection also indicates the end of the response if the body length wasn't specified in another way. This method isn't recommended, because connections can close accidentally. However, the client must deal with a closed connection gracefully when no Content-Length header is present.

More detail on any of these mechanisms can be found in the HTTP/1.1 specification or in a book on HTTP/1.1.
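The way a recipient might choose among the five mechanisms for a response can be sketched as a single decision function. The order of the checks follows the precedence given in RFC 2616, section 4.4; the function name body_framing, the returned labels, and the plain dict of headers are assumptions made for illustration.

```python
def body_framing(status: int, headers: dict) -> str:
    """Pick the mechanism that delimits a response body (illustrative sketch)."""
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}

    # 1. Responses that are required to have an empty body.
    if 100 <= status <= 199 or status in (204, 304):
        return "no-body"
    # 2. Transfer-Encoding takes precedence over Content-Length.
    encoding = h.get("transfer-encoding", "").strip().lower()
    if encoding and encoding != "identity":
        return "transfer-encoding"
    # 3. An explicit length in bytes.
    if "content-length" in h:
        return "content-length"
    # 4. A self-delimiting multipart/byteranges body.
    if h.get("content-type", "").strip().lower().startswith("multipart/byteranges"):
        return "multipart/byteranges"
    # 5. Last resort: the body ends when the server closes the connection.
    return "connection-close"


print(body_framing(200, {"Content-Length": "25", "Content-Type": "text/html"}))
print(body_framing(204, {"Date": "Sun, 29 Jul 2001 15:24:07 GMT"}))
```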