19.3 HTTP Primer


Clients and web servers exchange information according to a specific set of rules, or protocol, called the Hypertext Transfer Protocol (HTTP). HTTP is a request-reply protocol that assumes that messages are delivered reliably. For this reason, HTTP communication usually uses TCP, and that is what we assume in this discussion. We also restrict our initial discussion to HTTP 1.0 [53].

Figure 19.2 presents a schematic of a simple HTTP transaction. The client sends a request (e.g., a message that starts with the word GET). The server parses the message and responds with the status and possibly a copy of the requested resource.

Figure 19.2. Schematic of an HTTP 1.0 transaction.


19.3.1 Client requests

HTTP client requests begin with an initial line that specifies the kind of request being made, the location of the resource and the version of HTTP being used. The initial line ends with a carriage return followed by a line feed. In the following, <CRLF> denotes a carriage return followed by a line feed, and <SP> represents a white space character. A white space character is either a blank or tab.

Example 19.4

The following HTTP 1.0 client request asks a server for the resource /usp/simple.html.

 GET <SP> /usp/simple.html <SP> HTTP/1.0 <CRLF>
 User-Agent:uiciclient <CRLF>
 <CRLF>

The first or initial line of HTTP client requests has the following format.

 Method <SP> Request-URI <SP> HTTP-Version <CRLF> 

Method is usually GET, but other client methods include POST and HEAD.

The second line of the request in Example 19.4 is an example of a header line or header field. These lines convey additional information to the server about the request. Header lines are of the following form.

 Field-Name:Field-Value <CRLF> 

The last line of the request is empty. That is, it contains just a carriage return and a line feed, which tells the server that the request is complete.

Notice that the HTTP request of Example 19.4 does not explicitly contain a server host name. The request of Example 19.4 might have been generated by a user opening the URL http://www.usp.cs.utsa.edu/usp/simple.html in a browser. The browser parses the URL into a server location, www.usp.cs.utsa.edu, and a location within that server, /usp/simple.html. The browser then opens a TCP connection to port 80 of the server www.usp.cs.utsa.edu and sends the message of Example 19.4.
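The URL-to-request step just described can be sketched in a few lines of Python. This is only an illustration, not the code a real browser uses: the function name build_get_request is hypothetical, the User-Agent value is copied from Example 19.4, and the sketch ignores any query string in the URL.

```python
from urllib.parse import urlsplit

CRLF = "\r\n"

def build_get_request(url, user_agent="uiciclient"):
    """Return (host, port, request_bytes) for an HTTP 1.0 GET of url."""
    parts = urlsplit(url)
    host = parts.hostname            # server location, e.g. www.usp.cs.utsa.edu
    port = parts.port or 80          # no explicit port: use the well-known port 80
    path = parts.path or "/"         # an empty path means the root resource
    request = (
        f"GET {path} HTTP/1.0{CRLF}"      # initial line
        f"User-Agent:{user_agent}{CRLF}"  # one header line
        f"{CRLF}"                         # empty line marks the end of the request
    )
    return host, port, request.encode("ascii")
```

Calling build_get_request("http://www.usp.cs.utsa.edu/usp/simple.html") yields the host name, the port number 80 and the bytes of the request in Example 19.4, with each <SP> written as a single blank. Note that the host name is used only to open the connection. It does not appear in the request itself.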

19.3.2 Server response

A web server responds to a client HTTP request by sending a status line, followed by any number of optional header lines, followed by an empty line containing just <CRLF>. The server then may send a resource. The status line has the following format.

 HTTP-Version <SP> Status-Code <SP> Reason-Phrase <CRLF> 

Table 19.1 summarizes the status codes, which are organized into groups by the first digit.

Table 19.1. Common status codes returned by HTTP servers.

code   category        description

1xx    informational   reserved for future use
2xx    success         successful request
3xx    redirection     additional action must be taken (e.g., object has moved)
4xx    client error    bad syntax or other request error
5xx    server error    server failed to satisfy apparently valid request

Example 19.5

When the request of Example 19.4 is sent to www.usp.cs.utsa.edu, the web server listening on port 80 might respond with the following status line.

 HTTP/1.0 <SP> 200 <SP> OK <CRLF> 

After sending any additional header lines and an empty line to mark the end of the header, the server sends the contents of the requested file.
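A client usually needs to pull the status code out of the status line and decide which group of Table 19.1 it belongs to. The following Python sketch shows one way to do this. The names parse_status_line and CATEGORIES are hypothetical, and the sketch assumes that the line has already been read from the connection and decoded to a string.

```python
# First digit of the status code mapped to the categories of Table 19.1.
CATEGORIES = {
    "1": "informational",
    "2": "success",
    "3": "redirection",
    "4": "client error",
    "5": "server error",
}

def parse_status_line(line):
    """Split a status line into (version, code, reason, category)."""
    # The Reason-Phrase may itself contain blanks, so split at most twice.
    parts = line.rstrip("\r\n").split(None, 2)
    if len(parts) < 2:
        raise ValueError("malformed status line: %r" % line)
    version, code = parts[0], parts[1]
    reason = parts[2] if len(parts) == 3 else ""
    if not version.startswith("HTTP/"):
        raise ValueError("missing HTTP version: %r" % line)
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("status code is not three digits: %r" % line)
    category = CATEGORIES.get(code[0], "unknown")
    return version, int(code), reason, category
```

Applied to the status line of Example 19.5, the function returns the version HTTP/1.0, the integer 200, the reason phrase OK and the category success.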

19.3.3 HTTP message exchange

HTTP presumes reliable transport of messages (in order, error-free), usually achieved by the use of TCP. Figure 19.3 shows the steps for the exchange between client and server, using a TCP connection. The server listens on a well-known port (e.g., 80) for a connection request. The client establishes a connection and sends a GET request. The server responds and closes the connection. HTTP 1.0 allows only a single request on a connection, so the client can detect the end of the sending of the resource by the remote closing of the connection. HTTP 1.1 allows the client to pipeline multiple requests on a single connection, requiring the server to send resource length information as part of the response.
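The client side of this exchange can be sketched with Python sockets as follows. This is a minimal illustration under the HTTP 1.0 assumption that the server closes the connection after one response. The function names exchange and http10_get, the 4096-byte read size and the 10-second timeout are arbitrary choices made for the sketch.

```python
import socket

def exchange(sock, request):
    """Send one request on a connected socket and return all bytes until EOF."""
    sock.sendall(request)
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:          # zero bytes read: the remote end closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)

def http10_get(host, port, path):
    """Open a TCP connection, send a single GET and return the raw response."""
    request = f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=10) as sock:
        return exchange(sock, request)
```

A call such as http10_get("www.usp.cs.utsa.edu", 80, "/usp/simple.html") returns the status line, the header lines, the blank line and the resource as one sequence of bytes. The loop in exchange relies on the remote close to find the end of the resource, so it is not suitable for a connection on which the server expects further requests.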

Figure 19.3. Sequence of steps in HTTP 1.0 communication.


Exercise 19.6

How could you use Program 18.5 (client2) on page 629 to access the web server that is running on www.usp.cs.utsa.edu?

Answer:

Start client2 with the following command.

 client2 www.usp.cs.utsa.edu 80 

Type the HTTP request of Example 19.4 at the keyboard. The third line of the request is just an empty line. The host www.usp.cs.utsa.edu runs a web server that listens on port 80. The server interprets the message as an HTTP request and responds. The server then closes the connection.

Exercise 19.7

What message does client2 send to the host when you enter an empty line?

Answer:

The client2 program sends a single byte, the line feed character with ASCII code 10 (the newline character).

Exercise 19.8

Why does the web server still respond if you enter only a line feed and not a <CRLF> for the empty line?

Answer:

Although the HTTP specification [53] says that request lines should be terminated by <CRLF>, it also recommends that applications (clients and servers) be tolerant in parsing. Specifically, HTTP parsers should recognize a simple line feed as a line terminator and ignore the carriage return that precedes it. It also recommends that parsers allow any number of space or tab characters between fields. Almost all web servers and browsers follow these guidelines.
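A tolerant parser for the initial line of a request might look like the following Python sketch. The function name parse_request_line is hypothetical, and the sketch accepts only the three-field format given in Section 19.3.1.

```python
import re

def parse_request_line(line):
    """Return (method, uri, version) from a request line, parsed tolerantly."""
    # Treat the line feed as the terminator and drop a preceding carriage return.
    if line.endswith("\n"):
        line = line[:-1]
    if line.endswith("\r"):
        line = line[:-1]
    # Allow any number of blanks or tabs between fields.
    fields = re.split(r"[ \t]+", line.strip(" \t"))
    if len(fields) != 3:
        raise ValueError("expected Method, Request-URI and HTTP-Version: %r" % line)
    method, uri, version = fields
    return method, uri, version
```

With this approach, a line typed at the keyboard and terminated only by a line feed is parsed in the same way as a line that ends with <CRLF>.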

Exercise 19.9

Run Program 18.5 in the same way as in Exercise 19.6, but enter the following.

 GET <SP> /usp/badref.html <SP> HTTP/1.0 <CRLF>
 <CRLF>

What happens?

Answer:

The server responds with the following initial line.

 HTTP/1.1 <SP> 404 <SP> Not <SP> Found <CRLF> 

The server response may contain additional header lines before the blank line marking the end of the header. After sending the header, the server closes the connection. Note that the server is using HTTP version 1.1, but it sends a response that can be understood by the client, which is using HTTP version 1.0.

Exercise 19.10

Run Program 18.5, using the following command to redirect the client's standard output to t.out.

 client2 www.usp.cs.utsa.edu 80 > t.out 

Enter the following at standard input of the client. What will t.out contain?

 GET <SP> /usp/images/title.gif <SP> HTTP/1.0 <CRLF>
 <CRLF>

Answer:

The file t.out contains the server response, which consists of an ASCII header followed by the binary data of the image. You can view the image by first removing the header and then opening the result in your browser. Use the UNIX more command to see how many lines the header occupies. If the header, including the blank line that ends it, occupies 10 lines, use the following command to save the resource.

 tail +11 t.out > t.gif 

You can then use your web browser to display the result.
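Counting header lines by hand is error prone. A short Python sketch can instead locate the blank line that ends the header and keep everything after it. The function name split_response is hypothetical, and the sketch assumes that the whole response is already in memory as bytes, for example after reading t.out in binary mode.

```python
def split_response(raw):
    """Return (header_lines, body) from the raw bytes of a server response."""
    offset = 0
    header_lines = []
    while True:
        end = raw.find(b"\n", offset)        # lines end with a line feed
        if end == -1:
            raise ValueError("no blank line found; the header is incomplete")
        line = raw[offset:end].rstrip(b"\r")  # drop the carriage return, if any
        offset = end + 1
        if line == b"":                       # blank line: end of the header
            break
        header_lines.append(line.decode("latin-1"))
    # Everything after the blank line is the resource and may be binary.
    return header_lines, raw[offset:]
```

Writing the returned body to t.gif, with the file opened in binary mode, gives the same result as the tail command above without requiring you to know the number of header lines in advance.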

To summarize, each message in an HTTP transaction consists of the following components.

  • An initial line (a request line starting with GET, HEAD or POST for clients and a status line for servers).

  • Zero or more header lines (giving additional information).

  • A blank line (contains only <CRLF>).

  • An optional message body. For the server response, the message body is the requested item, which could be binary.

The initial and header lines are tokenized ASCII separated by linear white space (tabs and spaces).



Unix Systems Programming
UNIX Systems Programming: Communication, Concurrency and Threads
ISBN: 0130424110
Year: 2003
Pages: 274
