Understanding HTTP

As you learned earlier, Web servers and browsers communicate using the Hypertext Transfer Protocol (HTTP). The current version of HTTP (1.1) is described in RFC 2616. The purpose of HTTP is to support the transfer of HTML documents. HTTP is an application-level protocol. The HTTP client and server applications use the reliable TCP transport protocol to establish a connection.

HTTP has the following duties:

  • To establish a connection between the browser (the client) and the server

  • To negotiate settings and establish parameters for the session

  • To provide for the orderly transfer of HTML content

  • To close the connection with the server

Although the nature of Web communication has become extremely complex, most of that complexity relates to how the server builds the HTML content and what the browser does with the content it receives. The actual process of transferring the content through HTTP is relatively uncluttered.

When you enter a URL into the browser window, the browser first checks the scheme of the URL to determine the protocol. (As you learned earlier in this hour, Web browsers support other protocols besides HTTP.) If the browser determines that the URL refers to a resource on an HTTP site, it extracts the DNS name from the URL and initiates the name resolution process. The client computer sends the DNS lookup request to a name server and receives the server's IP address. The browser then uses the server's IP address to initiate a TCP connection with the server. (See Hour 6 for more on TCP.)
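The steps just described (check the scheme, extract the DNS name, resolve it, open a TCP connection) can be sketched in Python. This is a minimal illustration rather than a description of how any particular browser works; the function names are invented for this example, and port 80 is assumed whenever the URL does not name a port.

```python
import socket
from urllib.parse import urlsplit


def split_http_url(url):
    """Return (scheme, host, port, path) for an http:// URL."""
    parts = urlsplit(url)
    # Step 1: check the scheme to determine the protocol.
    if parts.scheme != "http":
        raise ValueError("not an HTTP URL: " + url)
    # Step 2: extract the DNS name from the URL.
    if not parts.hostname:
        raise ValueError("URL has no host name: " + url)
    port = parts.port or 80      # assumption: default to port 80
    path = parts.path or "/"     # an empty path means the site root
    return parts.scheme, parts.hostname, port, path


def resolve(host, port):
    """Step 3: ask the resolver (DNS) for the server's IP addresses."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def connect(url):
    """Step 4: open a TCP connection to the server named in the URL."""
    _, host, port, _ = split_http_url(url)
    # create_connection performs the name lookup and the TCP handshake.
    return socket.create_connection((host, port), timeout=10)
```

Only split_http_url runs without a network; resolve and connect depend on a reachable name server and Web server.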

By the Way

In older versions of HTTP (before version 1.1), the client and server opened a new TCP connection for each item transferred. Recent versions of HTTP allow the client and server to maintain a persistent connection.


After the TCP connection is established, the browser uses the HTTP GET command to request the Web page from the server. The GET command contains the URL of the resource the browser is requesting and the version of HTTP the browser wants to use for the transaction. The browser can send the relative URL with the GET request (rather than the full URL) because the connection with the server has already been established:

 GET /watergate/tapes/transcript HTTP/1.1 
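As a rough sketch, a request like the one above could be assembled as follows. The function name and the host www.example.com are placeholders invented for this example. The sketch also adds a Host header field, which HTTP 1.1 requires in every request, and ends the request with a blank line.

```python
def build_get_request(host, path="/"):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET " + path + " HTTP/1.1",   # request line: method, relative URL, version
        "Host: " + host,               # required by HTTP 1.1
        "",                            # blank line marks the end of the request
        "",
    ]
    # HTTP separates lines with a carriage return and line feed.
    return "\r\n".join(lines).encode("ascii")


# Hypothetical host name, used only for illustration.
request = build_get_request("www.example.com", "/watergate/tapes/transcript")
```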

The server receives the request and returns the requested document. Along with the document is a header containing several settings. The parameters specified in the header take the form

 keyword:value 

Table 17.4 lists some of the HTTP header fields. All fields are optional, and any field that is not understood by the browser is ignored.

Table 17.4. Examples of HTTP Header Fields

Field             Value Must Be                        Description
Content-Length    integer                              Size of the content object in octets
Content-Encoding  x-compress or x-gzip                 Value representing the type of encoding associated with the message
Date              Date format per RFC 1123 or RFC 850  Date and time in Greenwich Mean Time when the message was generated
Last-Modified     Date format per RFC 1123 or RFC 850  Date in Greenwich Mean Time when the object was last modified
Content-Language  Language code per ISO 639            The language in which the object was written

As you can see from Table 17.4, some of the header fields are purely informational. Other header fields may contain information necessary to parse and process the incoming HTML document.
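To make the keyword:value format concrete, here is a small sketch of how a client might split a header into its fields. The function name and the sample header are made up for this example. Field names are stored in lowercase because HTTP treats them as case-insensitive.

```python
def parse_header_block(text):
    """Parse keyword:value lines into a dictionary of header fields."""
    headers = {}
    for line in text.splitlines():
        if not line:
            break                          # a blank line ends the header
        name, sep, value = line.partition(":")
        if not sep:
            continue                       # not in keyword:value form; skip it
        # Split only at the first colon, so values such as dates
        # may contain colons of their own.
        headers[name.strip().lower()] = value.strip()
    return headers


# Sample header invented for illustration.
sample = (
    "Content-Length: 1024\r\n"
    "Content-Encoding: x-gzip\r\n"
    "Date: Tue, 15 Nov 1994 08:12:31 GMT\r\n"
    "Content-Language: en\r\n"
    "\r\n"
    "<html>...</html>"
)
fields = parse_header_block(sample)
```

A field the program does not recognize simply sits unused in the dictionary, which matches the rule that fields not understood are ignored.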

By the Way

The header field format used with HTTP is borrowed from the email header format specified in RFC 822.


The Content-Length field is particularly important on today's Internet. In the earlier HTTP version 1.0, each request/response cycle required a new TCP connection. The client opened a connection and initiated a request. The server fulfilled the request and then closed the connection. In that situation, the client knew when the server had stopped sending data because the server closed the TCP connection. Unfortunately, this process incurred the overhead of continually opening and closing connections.

More recent versions of HTTP (HTTP 1.1 and later) allow the client and server to maintain the connection for longer than a single transmission. In that case, the client needs some way of knowing when a single response is finished. The Content-Length field specifies the length of the HTML object associated with the response. If the server doesn't know the length of the object it is sending (a situation increasingly common with the appearance of Dynamic HTML), the server sends the header field Connection: close to notify the browser that the server will specify the end of the data by closing the connection.
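The two ways of finding the end of a response can be sketched as follows. This is a simplified illustration and not a complete client: it assumes the header fields have already been parsed into a dictionary with lowercase names, it handles only the two cases described above, and the function name is invented for this example.

```python
import io


def read_body(stream, headers):
    """Read one response body from a binary file-like stream."""
    length = headers.get("content-length")
    if length is not None:
        # The server announced the size, so read exactly that many
        # octets and leave the connection open for the next response.
        return stream.read(int(length))
    # No length was given: the server signals the end of the data by
    # closing the connection, so read until end of stream.
    return stream.read()


# An in-memory stream stands in for the TCP connection.
connection = io.BytesIO(b"helloworld!")
first = read_body(connection, {"content-length": "5"})
second = read_body(connection, {"connection": "close"})
```

In a real client the stream might come from calling makefile("rb") on a connected socket; the in-memory stream here only keeps the example self-contained.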

HTTP also supports a negotiation phase in which the server and browser agree to common settings for certain format and preference options.



Sams Teach Yourself TCP/IP in 24 Hours
Sams Teach Yourself TCP/IP in 24 Hours (4th Edition)
ISBN: 0672329964
Year: 2003
Pages: 259
Authors: Joe Casad
