The web runs on HTTP, the HyperText Transfer Protocol. This protocol governs how web browsers request files from web servers and how the servers send the files back. To understand the various techniques we'll show you in this chapter, you need to have a basic understanding of HTTP. For a more thorough discussion of HTTP, see the HTTP Pocket Reference, by Clinton Wong (O'Reilly).
When a web browser requests a web page, it sends an HTTP request message to a web server. The request message always includes some header information, and it sometimes also includes a body. The web server responds with a reply message, which always includes header information and usually contains a body. The first line of an HTTP request looks like this:
GET /index.html HTTP/1.1
This line specifies an HTTP command, called a method, followed by the address of a document and the version of HTTP being used. In this case, the request is using the GET method to ask for the index.html document using HTTP 1.1. After this initial line, the request can contain optional header information that gives the server additional data about the request. For example:
User-Agent: Mozilla/5.0 (Windows 2000; U) Opera 6.0 [en]
Accept: image/gif, image/jpeg, text/*, */*
The User-Agent header provides information about the web browser, while the Accept header specifies the MIME types that the browser accepts. After any headers, the request contains a blank line, to indicate the end of the header section. The request can also contain additional data, if that is appropriate for the method being used (e.g., with the POST method, as we'll discuss shortly). If the request doesn't contain any data, it ends with a blank line.
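The request layout described above (request line, headers, blank line, optional body) can be sketched in a few lines of Python. This is only an illustration: build_request is a hypothetical helper, www.example.com is a placeholder host, and the Host header is added on the assumption that the server speaks HTTP 1.1, which expects one. Lines are joined with a carriage return and line feed, the line ending HTTP uses on the wire.

```python
def build_request(method, path, headers, body=b""):
    """Assemble a raw HTTP/1.1 request as bytes (hypothetical helper)."""
    # Request line: method, document address, protocol version.
    lines = [f"{method} {path} HTTP/1.1"]
    # One "Name: value" line per header.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line marks the end of the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    # Any request data (e.g., for POST) follows the blank line.
    return head.encode("ascii") + body


request = build_request("GET", "/index.html", {
    "Host": "www.example.com",  # placeholder host, not from the text
    "User-Agent": "Mozilla/5.0 (Windows 2000; U) Opera 6.0 [en]",
    "Accept": "image/gif, image/jpeg, text/*, */*",
})
print(request.decode("ascii"))
```

Because this GET request carries no data, the bytes end with the blank line that closes the header section.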
The web server receives the request, processes it, and sends a response. The first line of an HTTP response looks like this:
HTTP/1.1 200 OK
This line specifies the protocol version, a status code, and a description of that code. In this case, the status code is "200", meaning that the request was successful (hence the description "OK"). After the status line, the response contains headers that give the client additional information about the response. For example:
Date: Sat, 26 Jan 2002 20:25:12 GMT
Server: Apache 1.3.22 (Unix) mod_perl/1.26 PHP/4.1.0
Content-Type: text/html
Content-Length: 141
The Server header provides information about the web server software, while the Content-Type header specifies the MIME type of the data included in the response. After the headers, the response contains a blank line, followed by the requested data, if the request was successful.
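To make the response layout concrete, here is a minimal Python sketch that splits a raw response into its status line, headers, and body. The parse_response function and the sample response are made up for illustration; the sketch assumes a well-formed response with a non-empty description after the status code, and it ignores details a real client must handle, such as repeated headers.

```python
def parse_response(raw):
    """Split a raw HTTP response into its parts (hypothetical helper)."""
    # The first blank line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, status code, description.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


# A made-up response used only to exercise the parser.
payload = b"<html><body>Hello</body></html>"
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: " + str(len(payload)).encode("ascii") + b"\r\n"
       b"\r\n" + payload)

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers["content-type"], len(body))
```

Comparing the Content-Length value with the number of body bytes received is one simple check that the whole body arrived.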
The two most common HTTP methods are GET and POST. The GET method is designed for retrieving information, such as a document, an image, or the results of a database query, from the server. The POST method is meant for posting information, such as a credit-card number or information that is to be stored in a database, to the server. The GET method is what a web browser uses when the user types in a URL or clicks on a link. When the user submits a form, either the GET or POST method can be used, as specified by the method attribute of the form tag. We'll discuss the GET and POST methods in more detail later, in Section 7.4.
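As a rough preview of the difference, the Python sketch below shows where form data travels with each method: a form submitted with GET appends the encoded fields to the document address, while a form submitted with POST carries them in the request body after the blank line. The form_request helper, the /search.php path, the field names, and the www.example.com host are all hypothetical, and the sketch assumes the default URL-encoded form format.

```python
from urllib.parse import urlencode


def form_request(method, path, fields):
    """Build the text of a form submission request (hypothetical helper)."""
    data = urlencode(fields)  # e.g. "q=php&page=2"
    if method.upper() == "GET":
        # GET: the form fields ride along in the address itself.
        return (f"GET {path}?{data} HTTP/1.1\r\n"
                "Host: www.example.com\r\n"
                "\r\n")
    # POST: the form fields are sent as the request body.
    return (f"POST {path} HTTP/1.1\r\n"
            "Host: www.example.com\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(data)}\r\n"
            "\r\n"
            f"{data}")


fields = {"q": "php", "page": "2"}
get_request = form_request("GET", "/search.php", fields)
post_request = form_request("POST", "/search.php", fields)
print(get_request)
print(post_request)
```

One practical consequence is that data sent with GET is visible in the address, whereas data sent with POST is not, which is part of why POST suits information such as a credit-card number.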