HTTP



The Hypertext Transfer Protocol (HTTP) is the basic, underlying, application-level protocol used to facilitate the transmission of data to and from a Web server. HTTP provides a simple, fast way to specify the interaction between client and server. The protocol actually defines how a client must ask for data from the server and how the server returns it. HTTP does not specify how the data actually is transferred; this is up to lower-level network protocols such as TCP.

The first version of HTTP, known as version 0.9, was used as early as 1990. HTTP version 1.0, as defined by RFC 1945, is supported by most servers and clients (Web browsers). However, HTTP 1.0 does not properly handle the effects of hierarchical proxies and caching, or provide features to facilitate virtual hosts. More importantly, HTTP 1.0 has significant performance problems due to the opening and closing of many connections for a single Web page.

The current version, HTTP 1.1, solves many of the past problems of the protocol. It is supported by version 4-generation Web browsers and up. There still are many limitations to HTTP, however. It is used increasingly in applications that need more sophisticated features, including distributed authoring, collaboration, multimedia support, and remote procedure calls. Various ideas to extend HTTP have been discussed and a generic Extension Framework for HTTP has been introduced by the W3C. Already, some facilities such as client capability detection and privacy negotiation between browser and server have been implemented on top of HTTP, but most of these protocols are still being worked out. For now, HTTP continues to be fairly simple, so this discussion will focus on HTTP 1.0 and 1.1.

HTTP Requests

The process of a Web browser or other user agent (such as a Web spider or robot) requesting a document from a Web (or, more correctly, HTTP) server is simple, and has been discussed throughout the book. The overall process was diagrammed in Figure 16-3. In the figure, the user first requests a document from a Web server by specifying the URL of the document desired. During this step, a domain name lookup may occur, which translates a machine name such as www.democompany.com to an underlying IP address such as 206.251.142.3. If the domain name lookup fails, an error message such as "No Such Host" or "The server does not have a DNS entry" will be returned. Certain assumptions, such as the default service port to access for HTTP requests (80), also might be made. This is transparent to the user, who simply uses a URL to access a page. Once the server has been located, the browser forms the proper HTTP request and sends it to the server residing at the address specified by the URL. A typical HTTP request consists of the following:

 HTTP-Method Identifier HTTP-version
 <Optional additional request headers>

In this example, the HTTP-Method would be GET or POST. The identifier might correspond to the file desired (for example, /examples/chapter16/report.htm), and the HTTP-version indicates the dialect of HTTP being used, such as HTTP/1.0.

If a user requests a document with the URL http://www.htmlref.com/examples/chapter16/report.htm, the browser might generate a request such as the one shown here to retrieve the object from the server at www.htmlref.com:

 GET /examples/chapter16/report.htm HTTP/1.0
 If-Modified-Since: Tuesday, 15-Aug-00 01:39:39 GMT;
 Connection: Keep-Alive
 User-Agent: Mozilla/4.02 [en] (X11; I; SunOS 5.4 sun4m)
 Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
 Accept-Language: en
 Accept-Charset: iso-8859-1,*,utf-8
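As a rough illustration of how such a request is put together, the following Python sketch assembles a request line and a few headers into the text that would be sent to the server. The function name build_request is hypothetical and chosen only for this example; the one fixed rule it relies on is that each line ends with a carriage return and line feed, and that an empty line closes the header block.

```python
def build_request(method, target, version="HTTP/1.0", headers=None):
    """Assemble a raw HTTP request: request line, headers, blank line.

    build_request is a hypothetical helper used only for illustration.
    """
    lines = [f"{method} {target} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF; an empty line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "GET",
    "/examples/chapter16/report.htm",
    headers={
        "User-Agent": "Mozilla/4.02 [en] (X11; I; SunOS 5.4 sun4m)",
        "Accept-Language": "en",
    },
)
print(request)
```

A request with no extra headers, such as build_request("HEAD", "/"), is simply the request line followed by the empty line.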

People often ask why the complete URL is not shown in the request. It isn't necessary in most cases, except when using a proxy server. The use of a relative URL in the request line is adequate. The server knows where it is; it just needs to know what document to get from its own file tree. In the case of a proxy server, which requests a document on behalf of a browser, a full URL is passed to the proxy, which later makes it relative. Aside from the simple GET method, there are various other methods specified in HTTP. Not all are commonly used. Table 16-3 provides a summary of the HTTP 1.1 request methods.
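The difference between the two forms can be sketched in a few lines of Python using the standard urllib.parse module. The function name request_target and the via_proxy flag are assumptions made for this example; the sketch only handles plain http:// URLs and falls back to the default port 80 mentioned earlier.

```python
from urllib.parse import urlsplit


def request_target(url, via_proxy=False):
    """Return (host, port, target) for a plain http:// URL.

    request_target and via_proxy are hypothetical names for this sketch.
    """
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80  # default HTTP service port
    if via_proxy:
        # A proxy is given the full URL so it knows which server to contact.
        target = url
    else:
        # The origin server only needs the path within its own file tree.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return host, port, target


url = "http://www.htmlref.com/examples/chapter16/report.htm"
print(request_target(url))
print(request_target(url, via_proxy=True))
```

The first call yields the relative path used in a direct request, while the second keeps the complete URL as a proxy would receive it.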

Table 16-3: Summary of HTTP 1.1 Request Methods

| Method | Description |
| --- | --- |
| GET | Returns the object specified by the identifier. Notice that it also is one of the values of the method attribute for the form element. |
| HEAD | Returns information about the object specified by the identifier, such as last modification date, but does not return the actual object. |
| OPTIONS | Returns information about the capabilities supported by a server if no location is specified, or the possible methods that can be applied to the specified object. |
| POST | Sends information to the address indicated by the identifier; generally used to transmit information from a form using the method="POST" attribute of the form element to a server-based CGI program. |
| PUT | Sends data to the server and, in its basic form, writes it to the address specified by the identifier, overwriting previous content. This method can be used for file upload. |
| DELETE | Removes the file specified by the identifier; generally disallowed for security reasons. |
| TRACE | Provides diagnostic information by allowing the client to see what is being received on the server. |

It is interesting to note that two of the methods supported by HTTP (GET and POST) are the values of the form element's method attribute. Recall that this attribute indicates how data is passed from the form to the server-side program. In the case of GET, the data is passed through the URL because another page is simply being fetched, as a normal GET request would do. In the case of POST, the form data is passed behind the scenes in the body of the request to the server program, which should return a result page to the browser as well. This is discussed in Chapters 12 and 13 and, as the form element shows, HTML and HTTP interact in more than a casual way.
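To make the contrast concrete, here is a small Python sketch that encodes the same form fields once for a GET request and once for a POST request. The field names, their values, and the /cgi-bin/search path are invented for illustration; urlencode is the standard library routine for form encoding.

```python
from urllib.parse import urlencode

# Hypothetical form fields, as a browser might collect them from a form.
fields = {"name": "Joe Blow", "topic": "reports"}
encoded = urlencode(fields)

# GET: the encoded data travels in the URL itself.
get_request = f"GET /cgi-bin/search?{encoded} HTTP/1.0\r\n\r\n"

# POST: the same data travels in the request body, after the headers.
# len() counts characters, which equals bytes here because the data is ASCII.
post_request = (
    "POST /cgi-bin/search HTTP/1.0\r\n"
    "Content-type: application/x-www-form-urlencoded\r\n"
    f"Content-length: {len(encoded)}\r\n"
    "\r\n"
    f"{encoded}"
)

print(get_request)
print(post_request)
```

In both cases the server-side program receives the same name and value pairs; only the place where they appear in the request differs.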

HTTP Request Headers

Within an HTTP request, there are a variety of optional fields for creating a complete request. Each field is specified by a header name, a colon, and then a value. A user agent generally sends extra header information in a request indicating the type of device making the request (User-Agent), the type of data it prefers (Accept), what language is in use (Accept-Language), and so on. The value of this header information should not be underestimated. With it, your server-side programs can detect things such as the browser being used, the particular types of images supported by the browser, the language of the browser such as French, English, or Japanese, and so on. The common request headers and an example for each are shown in Table 16-4.

Table 16-4: HTTP Request Headers

| Request Header | Description | Example |
| --- | --- | --- |
| Accept: MIME-type/MIME-subtype | Indicates the data types accepted by the browser. An entry of */* indicates anything is accepted; however, it is possible to indicate particular content types such as image/jpeg so the server can make a decision on what to return. This facility could be used to introduce a form of content negotiation so that a browser could be served only data it understands or prefers. Apache provides this feature and IIS can be extended to provide it as well, but at the time of this edition's writing this concept is not widely understood or implemented by Web developers. | Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */* |
| Accept-Charset: charset | Indicates the character sets accepted by the browser, such as ASCII or other character encodings. | Accept-Charset: iso-8859-1,*,utf-8 |
| Accept-Encoding: encoding-type | Tells the server what types of encoding the browser understands. Typically, this field is used to indicate to the server that compressed data can be handled. | Accept-Encoding: x-compress |
| Accept-Language: language-code | Lists the languages preferred by the browser and could be used by the server to pass back the appropriate language data. | Accept-Language: en |
| Authorization: authorization-scheme authorization-data | Typically used to carry the userid and password when the user is returning authorization information. Depending on the scheme, the password might be only encoded rather than truly encrypted. | Authorization: user joeblow:testpass |
| Content-length: bytes | Gives the length in bytes of the message being sent to the server, if any. Remember that the browser can upload or pass data using the PUT or POST method. | Content-length: 1805 |
| Content-type: MIME-type/MIME-subtype | Indicates the MIME type of a message being sent to a server, if any. The value of this field would be particularly important in the case of file upload. | Content-type: text/plain |
| Date: date-time | Indicates the date and time that a request was made in Greenwich Mean Time (GMT). GMT is mandatory for time consistency, given the worldwide nature of the Web. | Date: Thursday, 15-Jan-98 01:39:39 GMT |
| Host: hostname | Indicates the host and port of the server to which the request is being made. It is extremely important for a server that is running many domain names at once in the form of virtual servers. | Host: www.democompany.com |
| If-Modified-Since: date-time | Indicates file freshness to improve the efficiency of the GET method. When used in conjunction with a GET request for a particular file, the requested file is checked to see if it has been modified since the time specified in the field. If the file has not been modified, a "not modified" code (304) is sent to the client so a cached version of the document can be used; otherwise, the file is returned normally. | If-Modified-Since: Thursday, 15-Jan-98 01:39:39 GMT |
| If-Match: selector-string | Makes the request conditional: the method is performed only if the requested object matches the selector value passed in. For example, imagine using POST to add data only once the existing data has been moved to a file called olddata. | If-Match: "olddata" |
| If-None-Match: selector-string | Does the opposite of If-Match. The method is performed only if the selector does not match anything. This might be useful for preventing overwrites of existing files. | If-None-Match: "newfile" |
| If-Range: selector | If a client has a partial copy of an object in its cache and wishes to have an up-to-date copy of the entire object there, it could use the Range request header with this conditional If-Range modifier to update the file. Modification selection can take place on time as well. | If-Range: Thursday, 15-Jan-98 01:39:39 GMT; |
| If-Unmodified-Since: date-time | If the requested file has not been modified since the specified time, the server should perform the requested method; otherwise, the method should fail. | If-Unmodified-Since: Thursday, 15-Jan-98 01:39:39 GMT |
| Max-Forwards: integer | Indicates the limit of the number of proxies or gateways that can forward the request. Often ignored in practice. | Max-Forwards: 6 |
| MIME-version: version-number | Indicates the MIME protocol version understood by the browser that the server should use when fulfilling requests. | MIME-Version: 1.0 |
| Proxy-Authorization: authorization information | Allows the client to identify itself or the user to a proxy that requires authentication. | Proxy-Authorization: joeblow: testpass; Realm: All |
| Pragma: server-directive | Passes information to a server; for example, this field can be used to inform a caching proxy server to fetch a fresh copy of a page. | Pragma: no-cache |
| Range: byte-range | Indicates a request for a particular range of a file, such as a certain number of bytes. The example shows a request for the last 500 bytes of a file. | Range: bytes=-500 |
| Referer: URL | Indicates the URL of the document from which the request originates (in other words, the linking document). This value might be empty if the user has entered the URL directly rather than by following a link. | Referer: http://www.democompany.com/reports/index.html |
| User-Agent: Agent-code | Indicates the type of browser making the request. Very useful for browser detection, but may be falsified. | User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98) |
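The If-Modified-Since behavior described in the table can be sketched as a small server-side decision. This is a minimal sketch rather than how any particular server implements it: the function name conditional_get_status is hypothetical, the file's modification time is assumed to be available as a timezone-aware datetime, and date parsing is delegated to the standard email.utils module.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(if_modified_since, last_modified):
    """Return 304 if the file is unchanged since the header's date, else 200.

    conditional_get_status is a hypothetical helper; last_modified is
    assumed to be a timezone-aware datetime for the requested file.
    """
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header, send the file
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat as GMT
    return 304 if last_modified <= since else 200


file_time = datetime(1998, 1, 10, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_get_status("Thu, 15 Jan 1998 01:39:39 GMT", file_time))
print(conditional_get_status("Thu, 01 Jan 1998 00:00:00 GMT", file_time))
```

In the first call the file predates the date sent by the browser, so the cached copy can be reused; in the second the file is newer and would be returned normally.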

Note 

The request headers seem very familiar because they constitute the same environment variables that you can access from within a server-side program. Now it should be clear how this information is obtained.
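Under the CGI convention, a request header name is typically turned into an environment variable by uppercasing it, replacing hyphens with underscores, and adding an HTTP_ prefix, with Content-type and Content-length passed without the prefix. The following Python sketch shows that mapping; the function name header_to_cgi_variable is hypothetical.

```python
def header_to_cgi_variable(header_name):
    """Map a request header name to its CGI environment variable name.

    header_to_cgi_variable is a hypothetical helper for illustration.
    """
    name = header_name.strip().upper().replace("-", "_")
    # Content-type and Content-length are passed without the HTTP_ prefix.
    if name in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return name
    return "HTTP_" + name


for header in ("User-Agent", "Accept-Language", "Referer", "Content-length"):
    print(header, "->", header_to_cgi_variable(header))
```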

HTTP Responses

After receiving a request, the Web server attempts to process the request. The result of the request is indicated by a server status line that contains a response code; for example, the ever popular "404 Not Found." The server response status line takes this form:

 HTTP-version Status-code Reason-String 

For a successful query, a status line might read as follows:

 HTTP/1.0 200 OK

whereas in the case of an error, the status line might read:

 HTTP/1.0 404 Not Found
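A client can take such a status line apart with very little code. The Python sketch below, using the hypothetical function name parse_status_line, splits the line into its three parts; it assumes all three parts are present and would raise an error on a line with no reason string.

```python
def parse_status_line(line):
    """Split a status line into (version, code, reason).

    parse_status_line is a hypothetical helper; it assumes the line
    contains all three parts.
    """
    # Split on whitespace at most twice so the reason string stays whole.
    version, code, reason = line.strip().split(None, 2)
    return version, int(code), reason


print(parse_status_line("HTTP/1.0 200 OK"))
print(parse_status_line("HTTP/1.0 404 Not Found"))
```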

The status codes for the emerging HTTP 1.1 standard are shown in Table 16-5.

Table 16-5: HTTP 1.1 Status Codes

| Status-Code | Reason-String | Description |
| --- | --- | --- |
| Informational Codes (Process Continues After This) | | |
| 100 | Continue | An interim response issued by the server that indicates the request is in progress but has not been rejected or accepted. This status code is in support of the persistent connection idea introduced in HTTP 1.1. |
| 101 | Switching Protocols | Can be returned by the server to indicate that a different protocol should be used to improve communication. This could be used to initiate a real-time protocol. |
| Success Codes (Request Understood and Accepted) | | |
| 200 | OK | Indicates the successful completion of a request. |
| 201 | Created | Indicates the successful completion of a PUT request and the creation of the file specified. |
| 202 | Accepted | Indicates that the request has been accepted for processing, but that the processing has not been completed and the request might or might not actually finish properly. |
| 203 | Non-Authoritative Information | Indicates a successful request, except that returned information, particularly meta-information about a document, comes from a third source and is unverifiable. |
| 204 | No Content | Indicates a successful request, but there is no new data to send to the client. |
| 205 | Reset Content | Indicates that the client should reset the page that sent the request (potentially for more input). This could be used on a form page that needs consistent refreshing, rather than reloading, as might be used in a chat system. |
| 206 | Partial Content | Indicates a successful request for a piece of a larger document or set of documents. This response typically is encountered when media is sent out in a particular order, or byte-served, as with streaming Acrobat files. |
| Redirection Codes (Further Action Necessary to Complete Request) | | |
| 300 | Multiple Choices | Indicates that there are many possible representations for the requested information, so the client should use the preferred representation, which might be in the form of a closer server or different data format. |
| 301 | Moved Permanently | Requested resource has been assigned a new permanent address and any future references to this resource should be done using one of the returned addresses. |
| 302 | Moved Temporarily | Requested resource temporarily resides at a different address. For future requests, the original address should still be used. |
| 303 | See Other | Indicates that the requested object can be found at a different address and should be retrieved using a GET method on that resource. |
| 304 | Not Modified | Issued in response to a conditional GET; tells the agent to use a local copy from cache, or take a similar action, because the requested object has not changed. |
| 305 | Use Proxy | Indicates that the requested resource must be accessed through the proxy given by the URL in the Location field. |
| Client Error Codes (Syntax Error or Other Problem Causing Failure) | | |
| 400 | Bad Request | Indicates that the request could not be understood by the server due to malformed syntax. |
| 401 | Unauthorized | Request requires user authentication. The authorization has failed for some reason, so this code is returned. |
| 402 | Payment Required | Obviously in support of commerce, this code is currently not well-defined. |
| 403 | Forbidden | Request is understood but disallowed and should not be reattempted, compared to the 401 code, which might suggest a reauthentication. A typical response to a query for a directory listing when directory browsing is disallowed. |
| 404 | Not Found | Usually issued in response to a typo by the user or a moved resource, as the server can't find anything that matches the request nor any indication that the requested item has been moved. |
| 405 | Method Not Allowed | Issued in response to a method request such as GET, POST, or PUT on an object where such a method is not supported. Generally an indication of which methods are supported will be returned. |
| 406 | Not Acceptable | Indicates that the response to the request will not be in one of the content types acceptable by the browser, so why bother doing the request? This is an unlikely response given the */* acceptance issued by most, if not all, browsers. |
| 407 | Proxy Authentication Required | Indicates that the proxy server requires some form of authentication to continue. This code is similar to the 401 code. |
| 408 | Request Time-out | Indicates that the client did not produce or finish a request within the time that the server was prepared to wait. |
| 409 | Conflict | The request could not be completed because of a conflict with the requested resource; for example, the file might be locked. |
| 410 | Gone | Indicates that the requested object is no longer available at the server and no forwarding address is known. Search engines might want to remove references to objects that return this value because it is a permanent condition. |
| 411 | Length Required | Indicates that the server refuses to accept the request without a defined Content-Length. This might happen when a file is posted without a length. |
| 412 | Precondition Failed | Indicates that a precondition given in one or more of the request header fields, such as If-Unmodified-Since, evaluated to false. |
| 413 | Request Entity Too Large | Indicates that the server is refusing to process the request because the data sent with it is larger than the server is willing or able to handle. If the condition is temporary, the server might provide information indicating when to try again, but it just as well might terminate any open connections. |
| 414 | Request-URI Too Large | Indicates that the Uniform Resource Identifier (URI), generally a URL, in the request field is too long for the server to handle. This is unlikely to occur as browsers probably will not allow such transmissions. |
| 415 | Unsupported Media Type | Indicates the server will not perform the request because the media type specified in the message is not supported. This code might be returned when a server receives a file that it is not configured to accept using the PUT method. |
| Server Error Codes (Server Can't Fulfill a Potentially Valid Request) | | |
| 500 | Internal Server Error | A serious error message indicating that the server encountered an internal error that keeps it from fulfilling the request. |
| 501 | Not Implemented | Issued in response to a request that the server does not support, or that might be understood but is not implemented. |
| 502 | Bad Gateway | Indicates that the server acting as a proxy encountered an error from some other gateway and is passing the message along. |
| 503 | Service Unavailable | Indicates the server currently is overloaded or is undergoing maintenance. Headers can be sent to indicate when the server will be available. |
| 504 | Gateway Time-out | Indicates that the server, when acting as a gateway or proxy, encountered too long a delay from an upstream proxy and decided to time out. |
| 505 | HTTP Version not supported | Indicates that the server does not support the HTTP version specified in the request. |
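Because the first digit of a status code identifies its group, a client can react sensibly even to a code it does not recognize. A minimal Python sketch of that idea follows; the names STATUS_CLASSES and status_class are hypothetical, and the group labels are shortened from the table's section headings.

```python
# Group labels keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Return the group a status code belongs to, based on its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


for code in (100, 200, 304, 404, 503):
    print(code, status_class(code))
```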

After the status line, the server responds with information about itself and the data being returned. There are various response headers, but the most important indicates the type of data, in the form of a MIME type and subtype, that will be returned. Like request headers, many of these headers are optional and depend on the status of the request.

An example server response for the request shown earlier in the chapter (see "HTTP Requests") follows:

 HTTP/1.1 200 OK
 Date: Wed, 20 Sep 2000 18:59:54 GMT
 Server: Apache/1.3.12 (Unix)
 Last-Modified: Fri, 25 Aug 2000 22:19:12 GMT
 Accept-Ranges: bytes
 Content-Length: 205
 Connection: close
 Content-Type: text/html

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
 <head>
 <title>Report 1</title>
 <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
 </head>
 <body>
 <h1>Report About Important Things</h1>
 <hr />
 <p>Here is some information about important things.</p>
 </body>
 </html>
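The structure of such a response (status line, headers, an empty line, then the data) can be illustrated with a short Python sketch. The function name parse_response is hypothetical, the sample text is abbreviated from the response above, and real clients must also cope with details this sketch ignores, such as repeated headers and binary bodies.

```python
def parse_response(raw):
    """Split a raw response into (status_line, headers, body).

    parse_response is a hypothetical helper for illustration only.
    """
    # The first empty line separates the headers from the data.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        # Split at the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Wed, 20 Sep 2000 18:59:54 GMT\r\n"
    "Server: Apache/1.3.12 (Unix)\r\n"
    "Content-Length: 205\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>...</html>"
)
status_line, headers, body = parse_response(raw)
print(status_line)
print(headers["content-type"])
```

Header names are lowercased here because they are not case-sensitive, which is also why the tables in this chapter can write Content-type and Content-Type interchangeably.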

A list of the common server response headers for HTTP 1.1, as well as examples of each, can be found in Table 16-6.

Table 16-6: Common HTTP 1.1 Server Response Headers

| Response Header | Description | Example |
| --- | --- | --- |
| Age | Shows the sender's estimate of the amount of time since the response was generated at the origin server. Age values are nonnegative decimal integers, representing time in seconds. | Age: 10 |
| Content-encoding | Indicates the encoding the data returned is in. | Content-encoding: x-compress |
| Content-language | Indicates the language used for the data returned by the server. | Content-language: en |
| Content-length | Indicates the number of bytes returned by the server. | Content-length: 205 |
| Content-range | Indicates the range of the data being sent back by the server. | Content-range: -500 |
| Content-type | This probably is the most important field and indicates what type of content is being returned by the server in the form of a MIME type. | Content-type: text/html |
| Expires | Gives the date/time after which the returned data should be considered stale and should not be returned from a cache. | Expires: Thu, 04 Dec 1997 16:00:00 GMT |
| Last-modified | Indicates the date the content returned was last modified. This can be used by caches to decide to keep local copies of objects. | Last-modified: Thursday, 01-Aug-96 10:09:00 GMT |
| Location | Used to redirect the browser to another page. Occasionally scripts will use this method for browser redirection based on capability. | Location: http://www.democompany.com/products/index.htm |
| Proxy-authenticate | Included with a 407 (Proxy Authentication Required) response. The value of the field consists of a challenge that indicates the authentication scheme and parameters applicable to the proxy for the request. | Proxy-authenticate: GreenDecoderRing: 0124. |
| Public | Lists the set of methods supported by the server. The purpose of this field strictly is to inform the browser of the capabilities of the server when new or unusual methods are encountered. | Public: OPTIONS, MGET, MHEAD, GET, HEAD |
| Retry-after | Can be used in conjunction with a 503 (Service Unavailable) response to indicate how long the service is expected to be unavailable to the requesting client. The value of this field can be either an HTTP-date or an integer number of seconds after which to retry. | Retry-after: Fri, 31 Dec 1999 23:59:59 GMT or Retry-after: 60 |
| Server | Contains information about the Web software used. | Server: Apache/1.3.12 (Unix) |
| Warning | Used to carry additional information about the status of a response that might not be found in the status code. | Warning: 10 Response is stale |
| WWW-authenticate | Included with a 401 (Unauthorized) response message. The field consists of at least one challenge that indicates the authentication scheme and parameters applicable to the request made by the client. | WWW-authenticate: Magic-Key-Challenge=555121, DecoderRing=Green |
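Since the Retry-after value can take two forms, a client has to check which one it received. The Python sketch below shows one way to do so under a few assumptions: the function name retry_delay_seconds is hypothetical, the current time is passed in as a timezone-aware datetime so the result is predictable, and date parsing is delegated to the standard email.utils module.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(value, now):
    """Return how many seconds to wait, given a Retry-after value.

    retry_delay_seconds is a hypothetical helper; now is assumed to be
    a timezone-aware datetime.
    """
    value = value.strip()
    if value.isdigit():
        return int(value)  # already a number of seconds
    when = parsedate_to_datetime(value)  # otherwise treat it as a date
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means there is no need to wait.
    return max(0, int((when - now).total_seconds()))


now = datetime(1999, 12, 31, 23, 59, 0, tzinfo=timezone.utc)
print(retry_delay_seconds("60", now))
print(retry_delay_seconds("Fri, 31 Dec 1999 23:59:59 GMT", now))
```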

The most important header response field is the Content-type field. The MIME type indicated by this field is a device by which the browser is able to figure out what to do with the data being returned.

MIME

MIME (Multipurpose Internet Mail Extensions) was originally developed as an extension to the Internet mail protocol that allows for the inclusion of multimedia in mail messages. The basic idea of MIME is transmission of text files with headers that indicate binary data that will follow. Each MIME header is composed of two parts that indicate the data type and subtype in the following format:

 Content-type: type/subtype

where type can be image, audio, text, video, application, multipart, message, or extension-token; and subtype gives the specifics of the content. Some samples are listed here:

 text/html
 application/x-director
 application/x-pdf
 video/quicktime
 video/x-msvideo
 image/gif
 audio/x-wav

In addition to these basic headers, you also can include information such as the character encoding and language. For more information about MIME, refer to RFC 1521, available from many sites including http://www.faqs.org/rfcs/, or the list of registered MIME types at ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/.
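As a brief illustration of the type/subtype format and an optional character-set parameter, the following Python sketch leans on the standard email package, which understands MIME headers. The function name parse_content_type is hypothetical.

```python
from email.message import Message


def parse_content_type(value):
    """Return (type, subtype, charset) for a Content-type value.

    parse_content_type is a hypothetical helper; charset is None when
    the header does not specify one.
    """
    msg = Message()
    msg["Content-Type"] = value
    return (
        msg.get_content_maintype(),
        msg.get_content_subtype(),
        msg.get_content_charset(),
    )


print(parse_content_type("text/html; charset=ISO-8859-1"))
print(parse_content_type("image/gif"))
```

The library returns the type, subtype, and character set in lowercase, since none of them is case-sensitive.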

When a Web server delivers a file, the header information is examined by the browser. The MIME type, as mentioned earlier, is specified by the Content-type HTTP response field. For example, if a browser receives a basic HTML file, the text/html value in the Content-type header indicates what the browser should do. In most cases, this results in the browser rendering the file in the browser window. To determine what to do with a particular MIME type that has been sent, the browser consults a lookup table mapping MIME types to actions, as shown here:

[Figure: browser dialog showing the lookup table that maps MIME types to actions]
Note 

A lookup table that maps file extensions to outgoing MIME type headers exists on the server as well.
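Python's standard mimetypes module offers a comparable extension-to-type table, which makes for a convenient sketch of what a server does when choosing the Content-type header for a file. The function name mime_type_for and the fallback to application/octet-stream for unknown extensions are assumptions made for this example, not the behavior of any particular Web server.

```python
import mimetypes


def mime_type_for(filename, default="application/octet-stream"):
    """Look up a MIME type from a file extension.

    mime_type_for is a hypothetical helper; the default type for
    unknown extensions is an assumption for this sketch.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or default


for name in ("report.htm", "index.html", "logo.gif"):
    print(name, "->", mime_type_for(name))
```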

Notice that in the example, the type is text/html and the actual HTML document is passed back after all the headers are finished. The dialog indicates that the browser itself will handle the file internally. Also notice that the browser indicates that it recognizes the file extensions .html, .htm, .stm, and .shtml as HTML files. However, other file extensions seem to appear as normal HTML when they are viewed online. The MIME type is the key to why a file with an extension such as .cfm, .asp, .jsp, and so on is treated as HTML by a Web browser when delivered over a network, but if opened from a local disk drive is not read properly. These extensions often are associated with dynamically generated pages that are stamped with the HTML MIME type by the server; when reading off the local drive, the browser relies instead on the file extension such as .htm to determine the contents of a file. If a browser attempts to read a file that it is unsure about, either because of file extension or MIME type, it should respond with a dialog box such as the one shown here, as Netscape does.

[Figure: Netscape dialog box shown for a file of unknown type]

What's very interesting is how Internet Explorer prompts the user to immediately save data if the MIME type is not understood, as shown by this dialog box:

[Figure: Internet Explorer dialog box prompting the user to save the file]
Note 

It is very easy to install a relationship between a MIME type and a program to handle the data, yet few people seem to add the relationships to directly deal with any form of data served by a Web server.

Typically, Web pages are delivered properly, so these dialog boxes are not seen. The browser first would read the HTML being delivered and then retrieve any other objects, such as GIF images, sound files, Flash files, Java applets, and so on, that are associated with the page. Each object would result in another request to the server. If the browser encountered something such as

 <img src="images/logo.gif" alt="Demo Company" height="100" width="200" />

it would then form a request like the following:

 GET /images/logo.gif HTTP/1.1
 Connection: Keep-Alive
 User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)
 Accept: application/x-comet, image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
 Accept-Language: en-us

The server then would respond with a similar answer as before, but this time indicating that a MIME type of image/gif is being returned, followed by the appropriate form of binary data to make up an image, as demonstrated here:

 HTTP/1.1 200 OK
 Date: Tue, 18 Jan 2000 04:41:15 GMT
 Server: Apache/1.3.4 (Unix)
 Last-Modified: Wed, 13 Oct 1999 23:37:38 GMT
 Content-Length: 28531
 Connection: close
 Content-Type: image/gif

 GIF87a ... binary file continues
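The way one page fans out into several requests can be sketched with Python's standard HTML parser. This is only an illustration under stated assumptions: the class and function names are hypothetical, the page URL is invented, only img elements are considered, and only the request line is produced (a complete HTTP 1.1 request would also carry headers such as Host).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ImageCollector(HTMLParser):
    """Collect the src value of every img element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def follow_up_requests(page_url, html):
    """Return the request lines a browser would issue for the page's images."""
    collector = ImageCollector()
    collector.feed(html)
    lines = []
    for src in collector.sources:
        # Resolve the relative address against the page's own URL.
        path = urlsplit(urljoin(page_url, src)).path
        lines.append(f"GET {path} HTTP/1.1")
    return lines


page = '<p><img src="images/logo.gif" alt="Demo Company" height="100" width="200" /></p>'
print(follow_up_requests("http://www.democompany.com/index.htm", page))
```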

Consider that with the MIME type set properly, it is literally possible to serve any object. Yet page authors often avoid serving custom forms of data beyond HTML or common media types such as GIF, JPEG, or WAV because of unfamiliarity with the MIME-type configuration possibilities on client and server. This probably is because so little is said about MIME when discussing Web site development, even though it is central to the dialog between the browser and the server and is the key to making server-side programs work properly, as discussed in Chapters 12 and 13. In some sense, one can think of the core Web protocols (HTML, HTTP, and MIME) as being like the world-famous three tenors. People often remember only the first two, but it truly takes three to make it work!





HTML & XHTML
HTML & XHTML: The Complete Reference (Osborne Complete Reference Series)
ISBN: 007222942X
EAN: 2147483647
Year: 2003
Pages: 252
Authors: Thomas Powell
