HTTP messages are simple, formatted blocks of data. Take a peek at Figure 3-3 for an example. Each message contains either a request from a client or a response from a server. They consist of three parts: a start line describing the message, a block of headers containing attributes, and an optional body containing data.
The start line and headers are just ASCII text, broken up by lines. Each line ends with a two-character end-of-line sequence, consisting of a carriage return (ASCII 13) and a line-feed character (ASCII 10). This end-of-line sequence is written "CRLF." It is worth pointing out that while the HTTP specification for terminating lines is CRLF, robust applications also should accept just a line-feed character. Some older or broken HTTP applications do not always send both the carriage return and line feed.
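One way to accept both terminators is to split on the line feed and then drop any carriage return that precedes it. The following is a minimal sketch under that assumption; the function name split_http_lines is hypothetical and does not come from any HTTP library.

```python
from typing import List


def split_http_lines(data: bytes) -> List[bytes]:
    """Split start-line/header bytes into lines, accepting CRLF or a bare LF."""
    lines = []
    for line in data.split(b"\n"):
        # Drop the carriage return when the sender used the full CRLF sequence.
        if line.endswith(b"\r"):
            line = line[:-1]
        lines.append(line)
    # A terminator at the very end leaves one empty element after the split.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines
```

Here, a compliant line and a line ending in a bare line feed are both recovered without their terminators.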
The entity body or message body (or just plain "body") is simply an optional chunk of data. Unlike the start line and headers, the body can contain text or binary data or can be empty.
In the example in Figure 3-3, the headers give you a bit of information about the body. The Content-Type line tells you what the body is; in this example, it is a plain-text document. The Content-Length line tells you how big the body is; here it is a meager 19 bytes.
All HTTP messages fall into two types: request messages and response messages. Request messages request an action from a web server. Response messages carry results of a request back to a client. Both request and response messages have the same basic message structure. Figure 3-4 shows request and response messages to get a GIF image.
Here's the format for a request message:
<method> <request-URL> <version>
<headers>
<entity-body>
Here's the format for a response message (note that the syntax differs only in the start line):
<version> <status> <reason-phrase>
<headers>
<entity-body>
Here's a quick description of the various parts:
method
The action that the client wants the server to perform on the resource. It is a single word, like "GET," "HEAD," or "POST." We cover the method in detail later in this chapter.
request-URL
A complete URL naming the requested resource, or the path component of the URL. If you are talking directly to the server, the path component of the URL is usually okay as long as it is the absolute path to the resource; the server can assume itself as the host/port of the URL. Chapter 2 covers URL syntax in detail.
version
The version of HTTP that the message is using. Its format looks like:
HTTP/<major>.<minor>
where major and minor both are integers. We discuss HTTP versioning a bit more later in this chapter.
status-code
A three-digit number describing what happened during the request. The first digit of each code describes the general class of status ("success," "error," etc.). An exhaustive list of status codes defined in the HTTP specification and their meanings is provided later in this chapter.
reason-phrase
A human-readable version of the numeric status code, consisting of all the text until the end-of-line sequence. Example reason phrases for all the status codes defined in the HTTP specification are provided later in this chapter. The reason phrase is meant solely for human consumption, so, for example, response lines containing "HTTP/1.0 200 NOT OK" and "HTTP/1.0 200 OK" should be treated as equivalent success indications, despite the reason phrases suggesting otherwise.
headers
Zero or more headers, each of which is a name, followed by a colon (:), followed by optional whitespace, followed by a value, followed by a CRLF. The headers are terminated by a blank line (CRLF), marking the end of the list of headers and the beginning of the entity body. Some versions of HTTP, such as HTTP/1.1, require certain headers to be present for the request or response message to be valid. The various HTTP headers are covered later in this chapter.
entity-body
The entity body contains a block of arbitrary data. Not all messages contain entity bodies, so sometimes a message terminates with a bare CRLF. We discuss entities in detail in Chapter 15.
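To make the layout concrete, the parts described above can be assembled into a request message as follows. This is only a sketch: build_request and its parameters are hypothetical names chosen for illustration, and headers are assumed to arrive as a list of (name, value) pairs.

```python
from typing import List, Tuple


def build_request(method: str, url: str, headers: List[Tuple[str, str]],
                  body: bytes = b"", version: str = "HTTP/1.1") -> bytes:
    """Serialize a request: start line, headers, blank line, optional body."""
    lines = ["{} {} {}".format(method, url, version)]
    # Each header is a name, a colon, optional whitespace, and a value.
    lines += ["{}: {}".format(name, value) for name, value in headers]
    # Every line ends in CRLF; one extra CRLF (a blank line) ends the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    # The start line and headers are ASCII text; the body is passed through as-is.
    return head.encode("ascii") + body
```

Note that the blank line is written even when there are no headers and no body.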
Figure 3-5 demonstrates hypothetical request and response messages.
Note that a set of HTTP headers should always end in a blank line (bare CRLF), even if there are no headers and even if there is no entity body. Historically, however, many clients and servers (mistakenly) omitted the final CRLF if there was no entity body. To interoperate with these popular but noncompliant implementations, clients and servers should accept messages that end without the final CRLF.
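A receiver that tolerates the missing blank line might separate a message into its three parts roughly like this. The sketch assumes the whole message is already in memory as bytes, and the name split_message is hypothetical.

```python
from typing import List, Tuple


def split_message(raw: bytes) -> Tuple[bytes, List[bytes], bytes]:
    """Return (start_line, header_lines, body) for one raw HTTP message."""
    lines = []
    pos = 0
    while pos < len(raw):
        end = raw.find(b"\n", pos)
        if end == -1:
            # No terminator at all: keep the remainder as a final line.
            lines.append(raw[pos:])
            pos = len(raw)
            break
        line = raw[pos:end]
        if line.endswith(b"\r"):
            line = line[:-1]
        pos = end + 1
        if line == b"":
            # Blank line: the headers are over and the body starts at pos.
            break
        lines.append(line)
    # If the data ran out before a blank line appeared, the body is simply empty.
    start_line = lines[0] if lines else b""
    return start_line, lines[1:], raw[pos:]
```

Only the start line and headers are scanned for line terminators; the body is returned untouched because it may contain binary data.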
All HTTP messages begin with a start line. The start line for a request message says what to do. The start line for a response message says what happened.
Request messages ask servers to do something to a resource. The start line for a request message, or request line, contains a method describing what operation the server should perform and a request URL describing the resource on which to perform the method. The request line also includes an HTTP version that tells the server what dialect of HTTP the client is speaking.
All of these fields are separated by whitespace. In Figure 3-5a, the request method is GET, the request URL is /test/hi-there.txt, and the version is HTTP/1.1. Prior to HTTP/1.0, request lines were not required to contain an HTTP version.
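Splitting a request line on whitespace could look like the sketch below. The RequestLine type and parse_request_line function are hypothetical names; a two-field line is assumed to come from a pre-1.0 client and is reported with no version.

```python
from typing import NamedTuple, Optional


class RequestLine(NamedTuple):
    method: str
    url: str
    version: Optional[str]  # None when the client sent no version


def parse_request_line(line: str) -> RequestLine:
    """Split a request line into method, request URL, and optional version."""
    fields = line.split()
    if len(fields) == 3:
        return RequestLine(fields[0], fields[1], fields[2])
    if len(fields) == 2:
        # Pre-1.0 style request line: method and URL only.
        return RequestLine(fields[0], fields[1], None)
    raise ValueError("malformed request line: {!r}".format(line))
```

For the request line in Figure 3-5a, this yields the method GET, the URL /test/hi-there.txt, and the version HTTP/1.1.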
Response messages carry status information and any resulting data from an operation back to a client. The start line for a response message, or response line, contains the HTTP version that the response message is using, a numeric status code, and a textual reason phrase describing the status of the operation.
All these fields are separated by whitespace. In Figure 3-5b, the HTTP version is HTTP/1.0, the status code is 200 (indicating success), and the reason phrase is OK, meaning the document was returned successfully. Prior to HTTP/1.0, responses were not required to contain a response line.
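A response line can be taken apart the same way, except that the reason phrase is everything after the status code and may itself contain spaces. The following sketch uses the hypothetical names ResponseLine and parse_response_line.

```python
from typing import NamedTuple


class ResponseLine(NamedTuple):
    version: str
    status: int
    reason: str


def parse_response_line(line: str) -> ResponseLine:
    """Split a response line into version, numeric status code, and reason phrase."""
    # Split at most twice so the reason phrase keeps its internal spaces.
    fields = line.strip().split(None, 2)
    if len(fields) < 2:
        raise ValueError("malformed response line: {!r}".format(line))
    version, status = fields[0], fields[1]
    if not (len(status) == 3 and status.isascii() and status.isdigit()):
        raise ValueError("status code must be three digits: {!r}".format(status))
    reason = fields[2] if len(fields) == 3 else ""
    return ResponseLine(version, int(status), reason)
```

A program should branch on the status field only; "HTTP/1.0 200 NOT OK" parses to status 200 just as "HTTP/1.0 200 OK" does.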
The method begins the start line of requests, telling the server what to do. For example, in the line "GET /specials/saw-blade.gif HTTP/1.0," the method is GET.
The HTTP specifications have defined a set of common request methods. For example, the GET method gets a document from a server, the POST method sends data to a server for processing, and the OPTIONS method determines the general capabilities of a web server or the capabilities of a web server for a specific resource.
Table 3-1 describes seven of these methods. Note that some methods have a body in the request message, and other methods have bodyless requests.
Table 3-1. Common HTTP methods
Method | Description | Message body? |
GET | Get a document from the server. | No |
HEAD | Get just the headers for a document from the server. | No |
POST | Send data to the server for processing. | Yes |
PUT | Store the body of the request on the server. | Yes |
TRACE | Trace the message through proxy servers to the server. | No |
OPTIONS | Determine what methods can operate on a server. | No |
DELETE | Remove a document from the server. | No |
Not all servers implement all seven of the methods in Table 3-1. Furthermore, because HTTP was designed to be easily extensible, other servers may implement their own request methods in addition to these. These additional methods are called extension methods, because they extend the HTTP specification.
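In code, Table 3-1 can be kept as a simple lookup, with anything outside the table treated as an extension method. This is an illustrative sketch; the names REQUEST_HAS_BODY and describe_method are hypothetical.

```python
from typing import Optional, Tuple

# The "Message body?" column of Table 3-1, keyed by method name.
REQUEST_HAS_BODY = {
    "GET": False,
    "HEAD": False,
    "POST": True,
    "PUT": True,
    "TRACE": False,
    "OPTIONS": False,
    "DELETE": False,
}


def describe_method(method: str) -> Tuple[str, Optional[bool]]:
    """Return ("common", has_body) for a Table 3-1 method, else ("extension", None)."""
    if method in REQUEST_HAS_BODY:
        return "common", REQUEST_HAS_BODY[method]
    # Not in the table: an extension method whose body rules this sketch cannot know.
    return "extension", None
```

The lookup is deliberately case-sensitive, because method names are case-sensitive tokens.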
As methods tell the server what to do, status codes tell the client what happened. Status codes live in the start lines of responses. For example, in the line "HTTP/1.0 200 OK," the status code is 200.
When clients send request messages to an HTTP server, many things can happen. If you are fortunate, the request will complete successfully. You might not always be so lucky. The server may tell you that the resource you requested could not be found, that you don't have permission to access the resource, or perhaps that the resource has moved someplace else.
Status codes are returned in the start line of each response message. Both a numeric and a human-readable status are returned. The numeric code makes error processing easy for programs, while the reason phrase is easily understood by humans.
The different status codes are grouped into classes by their three-digit numeric codes. Status codes between 200 and 299 represent success. Codes between 300 and 399 indicate that the resource has been moved. Codes between 400 and 499 mean that the client did something wrong in the request. Codes between 500 and 599 mean something went awry on the server.
The status code classes are shown in Table 3-2.
Table 3-2. Status code classes
Overall range | Defined range | Category |
100-199 | 100-101 | Informational |
200-299 | 200-206 | Successful |
300-399 | 300-305 | Redirection |
400-499 | 400-415 | Client error |
500-599 | 500-505 | Server error |
Current versions of HTTP define only a few codes for each status category. As the protocol evolves, more status codes will be defined officially in the HTTP specification. If you receive a status code that you don't recognize, chances are someone has defined it as an extension to the current protocol. You should treat it as a general member of the class whose range it falls into.
For example, if you receive status code 515 (which is outside of the defined range for 5XX codes listed in Table 3-2), you should treat the response as indicating a server error, which is the general class of 5XX messages.
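Because the class depends only on the first digit, an unrecognized code can be classified with integer division. A minimal sketch, using the hypothetical name status_class and the category names from Table 3-2:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Map any status code in 100-599 to its general class by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError("status code out of range: {}".format(code))
    # Integer division by 100 leaves just the first digit of a three-digit code.
    return STATUS_CLASSES[code // 100]
```

With this approach, code 515 falls into the server error class even though no specification defines it.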
Table 3-3 lists some of the most common status codes that you will see. We will explain all the current HTTP status codes in detail later in this chapter.
Table 3-3. Common status codes
Status code | Reason phrase | Meaning |
200 | OK | Success! Any requested data is in the response body. |
401 | Unauthorized | You need to enter a username and password. |
404 | Not Found | The server cannot find a resource for the requested URL. |
The reason phrase is the last component of the start line of the response. It provides a textual explanation of the status code. For example, in the line "HTTP/1.0 200 OK," the reason phrase is OK.
Reason phrases are paired one-to-one with status codes. The reason phrase provides a human-readable version of the status code that application developers can pass along to their users to indicate what happened during the request.
The HTTP specification does not provide any hard and fast rules for what reason phrases should look like. Later in this chapter, we list the status codes and some suggested reason phrases.
Version numbers appear in both request and response message start lines in the format HTTP/x.y. They provide a means for HTTP applications to tell each other what version of the protocol they conform to.
Version numbers are intended to provide applications speaking HTTP with a clue about each other's capabilities and the format of the message. An HTTP Version 1.2 application communicating with an HTTP Version 1.1 application should know that it should not use any new 1.2 features, as they likely are not implemented by the application speaking the older version of the protocol.
The version number indicates the highest version of HTTP that an application supports. In some cases this leads to confusion between applications, [2] because HTTP/1.0 applications interpret a response with HTTP/1.1 in it to indicate that the response is a 1.1 response, when in fact that's just the level of protocol used by the responding application.
[2] See http://httpd.apache.org/docs-2.0/misc/known_client_problems.html for more on cases in which Apache has run into this problem with clients.
Note that version numbers are not treated as fractional numbers. Each number in the version (for example, the "1" and "0" in HTTP/1.0) is treated as a separate number. So, when comparing HTTP versions, each number must be compared separately in order to determine which is the higher version. For example, HTTP/2.22 is a higher version than HTTP/2.3, because 22 is a larger number than 3.
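The comparison rule is easy to get wrong if the version is read as a decimal number. One way to sketch it is to parse the two parts as separate integers and compare them as a pair; parse_version is a hypothetical name.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Parse "HTTP/<major>.<minor>" into a (major, minor) pair of integers."""
    prefix, _, numbers = text.partition("/")
    major, _, minor = numbers.partition(".")
    if prefix != "HTTP" or not (major.isdigit() and minor.isdigit()):
        raise ValueError("malformed HTTP version: {!r}".format(text))
    return int(major), int(minor)
```

Tuples compare element by element, so parse_version("HTTP/2.22") > parse_version("HTTP/2.3") is true, whereas comparing the fractional numbers 2.22 and 2.3 would give the opposite, incorrect answer.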
The previous section focused on the first line of request and response messages (methods, status codes, reason phrases, and version numbers). Following the start line comes a list of zero, one, or many HTTP header fields (see Figure 3-5).
HTTP header fields add additional information to request and response messages. They are basically just lists of name/value pairs. For example, the following header line assigns the value 19 to the Content-Length header field:
Content-length: 19
The HTTP specification defines several header fields. Applications also are free to invent their own home-brewed headers. HTTP headers are classified into:
General headers
Can appear in both request and response messages
Request headers
Provide more information about the request
Response headers
Provide more information about the response
Entity headers
Describe body size and contents, or the resource itself
Extension headers
New headers that are not defined in the specification
Each HTTP header has a simple syntax: a name, followed by a colon (:), followed by optional whitespace, followed by the field value, followed by a CRLF. Table 3-4 lists some common header examples.
Table 3-4. Common header examples
Header example | Description |
Date: Fri, 03 Oct 1997 02:16:03 GMT | The date the server generated the response |
Content-length: 15040 | The entity body contains 15,040 bytes of data |
Content-type: image/gif | The entity body is a GIF image |
Accept: image/gif, image/jpeg, text/html | The client accepts GIF and JPEG images and HTML |
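The header syntax translates into a split at the first colon. In this sketch, parse_header is a hypothetical name, and the line is assumed to have had its CRLF removed already.

```python
from typing import Tuple


def parse_header(line: str) -> Tuple[str, str]:
    """Split one header line into (name, value) at the first colon."""
    # partition stops at the first colon, so colons inside the value survive.
    name, colon, value = line.partition(":")
    if not colon or not name:
        raise ValueError("malformed header line: {!r}".format(line))
    # The whitespace between the colon and the value is optional; strip it.
    return name, value.strip()
```

Splitting only once matters for values such as dates, which contain colons of their own. The name is returned as written, so callers that look up headers should compare names without regard to case.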
Long header lines can be made more readable by breaking them into multiple lines, preceding each extra line with at least one space or tab character.
For example:
HTTP/1.0 200 OK
Content-Type: image/gif
Content-Length: 8572
Server: Test Server
    Version 1.0
In this example, the response message contains a Server header whose value is broken into continuation lines. The complete value of the header is "Test Server Version 1.0".
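A receiver can rejoin continuation lines before parsing individual headers. The sketch below assumes the header lines have already been split and stripped of their CRLFs, and that joining with a single space is acceptable; unfold_headers is a hypothetical name.

```python
from typing import List


def unfold_headers(lines: List[str]) -> List[str]:
    """Join each continuation line onto the header line that precedes it."""
    unfolded = []  # type: List[str]
    for line in lines:
        # A line starting with a space or tab continues the previous header.
        if line[:1] in (" ", "\t") and unfolded:
            unfolded[-1] = unfolded[-1].rstrip() + " " + line.strip()
        else:
            unfolded.append(line)
    return unfolded
```

Applied to the header lines of the example above, the Server header comes back as the single line "Server: Test Server Version 1.0".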
We'll briefly describe all the HTTP headers later in this chapter. We also provide a more detailed reference summary of all the headers in Appendix C.
The third part of an HTTP message is the optional entity body. Entity bodies are the payload of HTTP messages. They are the things that HTTP was designed to transport.
HTTP messages can carry many kinds of digital data: images, video, HTML documents, software applications, credit card transactions, electronic mail, and so on.
HTTP Version 0.9 was an early version of the HTTP protocol. It was the starting point for the request and response messages that HTTP has today, but with a far simpler protocol (see Figure 3-6).
HTTP/0.9 messages also consisted of requests and responses, but the request contained merely the method and the request URL, and the response contained only the entity. No version information (it was the first and only version at the time), no status code or reason phrase, and no headers were included.
However, this simplicity did not allow for much flexibility or the implementation of most of the HTTP features and applications described in this book. We briefly describe it here because there are still clients, servers, and other applications that use it, and application writers should be aware of its limitations.