HTTP Transactions | Professional Java Servlets 2.3


The main characteristic in HTTP communications is that there are two parties to an HTTP transaction:

There is the client (called the user agent in the HTTP specifications), which is responsible for initiating the communication by sending a request.
Then there is the server, which waits for a client request, processes it, and returns a relevant response to the client.

In Internet terms, the client is commonly seen as a web browser, handling mostly HTML and image files, communicating with a web server. In reality, however, the range of possible clients is much wider than the browser. A client is simply the party that initiates the request; the server processes it and returns a response. The client may in fact be another server, acting as a client, requesting a resource from the server. Proxy servers fulfil exactly this role: a client (possibly a browser) makes a request to the proxy server (which in this case acts as the server), and the proxy in turn acts as a client to the server holding the resource identified by the URI.

Proxy servers are often used in companies with firewalls to monitor and direct HTTP traffic through the proxy to the outside Internet. The diagram below outlines this relationship where there are two sets of HTTP transactions, with the proxy server acting as both server on one side and client on the other side:


The diagram shows how, in HTTP Transaction 1, the client makes an HTTP request (1) to the proxy server and waits for the proxy server's response (4). While the client waits on the first transaction, the proxy server starts a separate (related) transaction with the web server: it makes a request (2) and receives a response (3), which the proxy server in turn returns to the client of the first transaction.

In HTTP/1.0, client connections were closed after each request by default. HTTP/1.1 allows client connections to be persistent, so that a client can maintain a connection with the server across several requests. For example, this allows a client to request a web page and subsequently request the images for that page without having to make a separate connection for each image, thus saving the overhead of creating each connection.

Similarly, in a Java application this allows the client to follow up the initial request with subsequent requests over the same connection, possibly based on the initial response.

HTTP is designed as a stateless protocol (that is, the server maintains no state between client requests). However, there are two main methods that servers use to maintain state. The first is URL rewriting (where additional information is included in the URL) and the second is cookies. Cookies allow the server to send small pieces of information to the client, which the client returns on subsequent requests. Cookies are not part of the core HTTP specification but are defined in separate, associated specification documents.
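
The two state mechanisms described above can be sketched in plain Java. This is a minimal illustration rather than container code: the class name SessionStateSketch and the session ID value are hypothetical, while the ;jsessionid= path parameter and the JSESSIONID cookie name follow the convention used by servlet containers.

```java
final class SessionStateSketch {

    // URL rewriting: carry the session ID in the URL itself, as a path
    // parameter placed before any query string.
    static String rewriteUrl(String url, String sessionId) {
        int query = url.indexOf('?');
        if (query < 0) {
            return url + ";jsessionid=" + sessionId;
        }
        return url.substring(0, query) + ";jsessionid=" + sessionId + url.substring(query);
    }

    // Cookies: the server sends this header once, and the client returns
    // the name=value pair in a Cookie header on later requests.
    static String setCookieHeader(String name, String value) {
        return "Set-Cookie: " + name + "=" + value + "; path=/";
    }

    // Extract one cookie value from a Cookie header value such as "a=1; b=2".
    // Returns null when the named cookie is not present.
    static String cookieValue(String cookieHeaderValue, String name) {
        for (String pair : cookieHeaderValue.split(";")) {
            String[] parts = pair.trim().split("=", 2);
            if (parts.length == 2 && parts[0].equals(name)) {
                return parts[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(rewriteUrl("/shop/cart?item=42", "ABC123"));
        System.out.println(setCookieHeader("JSESSIONID", "ABC123"));
        System.out.println(cookieValue("theme=dark; JSESSIONID=ABC123", "JSESSIONID"));
    }
}
```

With URL rewriting every link the server generates must be rewritten, whereas a cookie is set once and returned automatically by the client.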

The HTTP requests and responses contain text-based communication in the headers with possibly text (or binary files) in the message body.

A sample HTTP client request for the default index page of a website may look as follows:

     GET / HTTP/1.1
     Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
     Accept-Language: en-ie
     Accept-Encoding: gzip, deflate
     User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
     Host: localhost
     Connection: Keep-Alive

To briefly explain the client request above, the first line indicates that the request is a GET request for the root directory ("/") using the HTTP/1.1 version of the protocol. The second line indicates the types of data the client will accept from the server (images, or anything: "*/*"). Line three specifies that the preferred language is English (Irish version). The fourth line (Accept-Encoding) indicates the content encodings (compression schemes) the client can accept. The fifth line (User-Agent) gives the details of the client making the request. The penultimate line indicates the specific host requested, while the last line indicates that the connection should be kept open by the server for further requests. These headers will be explained in more detail in the following section.

The server may respond with:

     HTTP/1.1 200 OK
     Date: Tue, 01 Jan 2002 09:50:17 GMT
     Content-Type: text/html
     Server: Tomcat 4
     Content-Location: http://localhost/index.html
     Last-Modified: Fri, 28 Sep 2001 23:18:50 GMT

     <html>
     <head><title>The Page</title></head>
     <body><h1>The page body</h1></body>
     </html>

The server's response will be examined in more detail later on, but we will have a brief look at the example response given above. The first line indicates the version of HTTP being used, with the 200 OK indicating that the request was processed without problems. Line two indicates the date of the response, while the third line indicates the type of the body of the response returned, in this case an HTML text file. The Server header on line four indicates the HTTP server and version, while line five details the location of the resource being returned in the response. Finally in the header fields, the Last-Modified header field indicates the date the document was last changed.

A blank line follows indicating that the header data is completed and the body of the response, if there is one, will begin; in this case a simple HTML page.
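
The structure just described (a first line, header fields, a blank line, then an optional body) can be picked apart with a few lines of Java. The following is a minimal sketch: the class name RawMessageSketch is hypothetical, the message is assumed to use CRLF line endings as on the wire, and details such as repeated or folded headers are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RawMessageSketch {

    final String startLine;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    RawMessageSketch(String raw) {
        // The first blank line (CRLF CRLF) separates the header section from the body.
        int split = raw.indexOf("\r\n\r\n");
        String head = split < 0 ? raw : raw.substring(0, split);
        body = split < 0 ? "" : raw.substring(split + 4);

        String[] lines = head.split("\r\n");
        startLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            // Split on the first colon only; values such as dates contain colons too.
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 200 OK\r\n"
                + "Date: Tue, 01 Jan 2002 09:50:17 GMT\r\n"
                + "Content-Type: text/html\r\n"
                + "\r\n"
                + "<html><body><h1>The page body</h1></body></html>";
        RawMessageSketch message = new RawMessageSketch(raw);
        System.out.println(message.startLine);
        System.out.println(message.headers);
        System.out.println(message.body);
    }
}
```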

In the following subsections we will look at the detail of the HTTP client request and the server's response. We will look at the methods available and the headers that can be used in requests and responses.

The HTTP Client Request

The HTTP client is responsible for initiating the communication with the server. To do this it sends a formatted request and waits for a response from the server. The format of the request is as follows:

At a minimum the request will include the method request information that is in the following format:

     Method URI HTTPVersion

This tells us the method being used to make the request, the URI of the requested resource, and the version of HTTP, such as:

     GET /index.html HTTP/1.1
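
As a minimal sketch, the three parts of this line can be separated in Java as follows. The class name RequestLineSketch is hypothetical, and the check performed is deliberately simple (three space-separated parts, the last beginning with HTTP/).

```java
final class RequestLineSketch {

    final String method;
    final String uri;
    final String version;

    private RequestLineSketch(String method, String uri, String version) {
        this.method = method;
        this.uri = uri;
        this.version = version;
    }

    // Split "Method URI HTTPVersion" into its three space-separated parts.
    static RequestLineSketch parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLineSketch(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLineSketch line = parse("GET /index.html HTTP/1.1");
        System.out.println(line.method + " | " + line.uri + " | " + line.version);
    }
}
```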

Technically, the header information is optional, but many servers, methods, and resources require data in various header fields. For HTTP/1.1, the Host header is required at a minimum, because servers may share one IP address among several web applications with different host names, and the Host header is the only way they can identify the actual resource requested.

General header information is used in both requests and responses and includes information such as the date, connection or caching information. For example:

     Date: Tue, 01 Jan 2002 09:50:17 GMT
     Connection: Keep-Alive

Request header information is obviously used only in the request, and is used to specify relevant information about the client for the server such as the data it prefers to receive, any conditions to the request, or the maximum number of times the request can be forwarded. For example:

     Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
     Accept-Language: en-ie
     Accept-Encoding: gzip, deflate
     User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
     Host: localhost
     If-Modified-Since: Tue, 01 Jan 2002 09:50:17 GMT

The Entity header information is used to specify information about the body (entity) of the request being sent, such as the type of the data, the length etc. In Java applications it could specify the type of the data (such as serialized Java objects) being sent. For example:

     Content-Type: application/x-java-serialized-object
     Content-Length: 158

The blank line always follows the request header to indicate the end of the header information, and possibly the start of the body of the request (if included).

The Body of the request may contain POST method parameters, files PUT on the server, other files, Java data etc.
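
Putting the pieces together, a complete request with a body might be assembled as in the sketch below. It assumes a POST carrying a serialized Java object, matching the entity headers shown earlier; the class name RequestBuilderSketch and the URI /servlet/receive are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class RequestBuilderSketch {

    // Serialize any Serializable object to the bytes that form the request body.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Assemble request line, headers, blank line, and body, in that order.
    static byte[] buildPost(String uri, String host, byte[] body) {
        String head = "POST " + uri + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/x-java-serialized-object\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n"; // the blank line that ends the header section
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream message = new ByteArrayOutputStream();
        message.write(headBytes, 0, headBytes.length);
        message.write(body, 0, body.length);
        return message.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = serialize("hello");
        byte[] request = buildPost("/servlet/receive", "localhost", body);
        System.out.println("Body bytes: " + body.length + ", total bytes: " + request.length);
    }
}
```

The Content-Length value is computed from the body bytes rather than typed by hand, so the entity headers always agree with the body that is actually sent.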

HTTP Request Methods

The initial HTTP/0.9 version only supported the GET method for retrieval of resources. As the "informational" specifications evolved into HTTP/1.0, a number of additional request methods were added with varying support. With the advent of HTTP/1.1, two methods (LINK and UNLINK) were dropped and three methods were added. We will examine each of these methods in the following subsections.

All of the request methods, except where noted in the relevant subsection, are supported methods of javax.servlet.http.HttpServlet, and are available to Java programmers.

HTTP/1.0 Request Methods

HTTP/1.0 added HEAD and POST methods to the GET method already in use. It also added the PUT, DELETE, LINK and UNLINK methods, but the support from servers and clients for these methods was more patchy.

GET Method

The GET method, introduced in HTTP/0.9, was the original request method, designed to retrieve information (described as an entity) referenced by the request URI. By convention, it is a retrieval method only and should not change information or resources on the server.

A GET request becomes a conditional GET if it includes one or more conditional If- headers. If the Range header field is used, it becomes a partial GET, allowing large documents or data to be retrieved more efficiently in one or more pieces.

Both the conditional GETs and the partial GETs are designed to improve efficiency by reducing the unnecessary network traffic to a minimum.
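
The server-side decisions behind these two kinds of GET can be sketched in plain Java. The class name ConditionalGetSketch and its methods are hypothetical; the sketch handles only the If-Modified-Since condition and only the simple "bytes=first-last" form of the Range header.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class ConditionalGetSketch {

    // HTTP dates such as "Tue, 01 Jan 2002 09:50:17 GMT" use the RFC 1123 format.
    static ZonedDateTime parseHttpDate(String value) {
        return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    // Conditional GET: 304 (Not Modified) when the client's copy is still
    // current, otherwise 200 with the full entity.
    static int statusFor(String ifModifiedSince, String lastModified) {
        if (ifModifiedSince == null) {
            return 200; // unconditional request
        }
        boolean changed = parseHttpDate(lastModified).isAfter(parseHttpDate(ifModifiedSince));
        return changed ? 200 : 304;
    }

    // Partial GET: read the first and last byte positions from "bytes=first-last".
    static long[] parseByteRange(String rangeValue) {
        if (!rangeValue.startsWith("bytes=")) {
            throw new IllegalArgumentException("Unsupported range unit: " + rangeValue);
        }
        String[] ends = rangeValue.substring("bytes=".length()).split("-", 2);
        return new long[] { Long.parseLong(ends[0]), Long.parseLong(ends[1]) };
    }

    public static void main(String[] args) {
        System.out.println(statusFor("Tue, 01 Jan 2002 09:50:17 GMT",
                                     "Fri, 28 Sep 2001 23:18:50 GMT"));
        long[] range = parseByteRange("bytes=0-499");
        System.out.println(range[0] + " to " + range[1]);
    }
}
```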

HEAD Method

The HEAD method (since HTTP/1.0) is essentially the GET request without the return of the message body. The point of the HEAD request is to allow the client to access the header information, without receiving the resource information associated with the request. The header information is identical to that of the GET request, just no entity body is attached after the blank line.

POST Method

This method is designed to allow the client to send a block of data to the server in the message body of the request. This method may result in a new resource URI being created, for example in the case of a post to a notice board, or may involve the sending of data from a form to the server and/or database for processing.

Essentially, in Java terms, programmers can treat the GET and POST methods similarly in certain situations, in that similar form parameters may be submitted through each request method, although the POST request is also suited to other data such as files, XML requests, serialized objects, etc.

It is also ideal for HTTP tunneling as we can send Java objects, files and other data in a request to the server.

PUT Method

The PUT request is used to store the body of the request at the requested URI. The body may be a file, or other resource such as HTML or XML data, or even a servlet or JSP, and is in effect similar to the File Transfer Protocol (FTP) in relation to transferring files.

The key difference between the POST method and the PUT method is that the PUT method requests that the body of the request is stored at the specified URI, while the POST method requests that the URI specified handles the request, and often will not create a new resource.

DELETE Method

The DELETE method is the converse method to the PUT request. This method allows the client to request that the given resource at the specified URI is deleted, or at least removed to an inaccessible location.

The client is not guaranteed success, even when the response indicates that the operation was completed. The server should return a status code indicating success only if it intends to delete (or has already deleted) the resource at the time of the response.

LINK and UNLINK Methods

The LINK method establishes one or more Link relationships between the existing resource identified by the URI and other existing resources. The UNLINK method removes one or more LINK relationships from the existing resource identified by the URI.

The LINK and UNLINK methods are mentioned here only because they appear in the HTTP/1.0 document. They are rarely, if ever, used and should be avoided.

They have been dropped completely from the HTTP/1.1 specifications and are not supported request methods in Java servlets (javax.servlet.http.HttpServlet). Of course this class and/or the Servlets API could be extended to support these methods, but this is not advisable.

HTTP/1.1 Additional Request Methods

The HTTP/1.1 specifications define three additional, new methods. One is not implemented yet, while the other two may be more useful to client side developers.

Remember here that when we talk about client-side and client applications this also applies to proxy servers or other servers (in other words Java web applications are included) that need to access external HTTP resources acting as a client in these HTTP conversations.

OPTIONS Method

The OPTIONS method is most useful to the client side developer as it allows the client to determine the options or methods available from a given resource URI on a server. If the request URI is an asterisk, the OPTIONS method applies to the server in general instead of a specific URI resource.

For Java programmers the doOptions() method in HttpServlet need not be overridden, as it will automatically detect the available methods in normal operation. The only reasons for overriding this method are if the servlet has reason to hide a specific method available, or if the servlet extends or implements additional methods beyond those already available.

This HTTP request method could be useful in client applications accessing new or dynamic resources, or if the client is trying to access resources that may not support all of the current HTTP methods.
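
To make the idea concrete, the sketch below builds the Allow header that an OPTIONS response might carry. It is plain Java, not the HttpServlet implementation: the class name AllowHeaderSketch and its boolean parameters are hypothetical, and the rules used (GET support implies HEAD; TRACE and OPTIONS are always listed) are an assumption about typical default behavior.

```java
import java.util.ArrayList;
import java.util.List;

final class AllowHeaderSketch {

    // Build the Allow header from the set of methods a resource handles.
    static String allowHeader(boolean handlesGet, boolean handlesPost,
                              boolean handlesPut, boolean handlesDelete) {
        List<String> methods = new ArrayList<>();
        if (handlesGet) {
            methods.add("GET");
            methods.add("HEAD"); // HEAD is GET without the body
        }
        if (handlesPost) {
            methods.add("POST");
        }
        if (handlesPut) {
            methods.add("PUT");
        }
        if (handlesDelete) {
            methods.add("DELETE");
        }
        methods.add("TRACE");   // assumed always available
        methods.add("OPTIONS"); // assumed always available
        return "Allow: " + String.join(", ", methods);
    }

    public static void main(String[] args) {
        System.out.println(allowHeader(true, true, false, false));
    }
}
```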

TRACE Method

The TRACE method should simply return the request (request line and header information) received by the server back to the client in the body of the response. This allows the client to see exactly what the server received, and its primary use is for debugging. A TRACE request never includes a body or entity header fields, and the response has the Content-Type message/http with the request in the body.

For Java programmers the doTrace() method in HttpServlet implements this HTTP method and should not be overridden on the server side. The Java client side may use this request for debugging purposes.
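
The shape of a TRACE exchange can be illustrated with a short sketch. This is plain Java written for illustration, not the doTrace() implementation; the class name TraceEchoSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class TraceEchoSketch {

    // Echo the received request back as the body of a message/http response.
    static String traceResponse(String rawRequest) {
        int length = rawRequest.getBytes(StandardCharsets.US_ASCII).length;
        return "HTTP/1.1 200 OK\r\n"
                + "Content-Type: message/http\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"
                + rawRequest;
    }

    public static void main(String[] args) {
        String request = "TRACE / HTTP/1.1\r\nHost: localhost\r\n\r\n";
        System.out.print(traceResponse(request));
    }
}
```

A client comparing the echoed body with what it sent can see any headers, such as Via, that intermediaries added along the way.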

CONNECT Method

The CONNECT method is not yet implemented; it is reserved by HTTP/1.1 for use with a proxy server that can dynamically switch to being a tunnel.

This method is not implemented in javax.servlet.http.HttpServlet and is not available to the Java programmer yet.

HTTP Server Response

The structure of the server's response is similar to the client request, with two main differences: the response information line and the response headers.

The Response information includes the HTTP version of the response, the status code indicating the result of the request and the message associated with the status code:

     HTTPVersion StatusCode Message

This line indicates the request was processed successfully:

     HTTP/1.1 200 OK

While the following is an example of where the requested resource was not located:

     HTTP/1.1 404 Not Found
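
A client can split this line into its three parts with very little code. The following is a minimal sketch; the class name StatusLineSketch is hypothetical, and treating the 200 to 299 range as success follows the usual grouping of status codes.

```java
final class StatusLineSketch {

    final String version;
    final int code;
    final String message;

    private StatusLineSketch(String version, int code, String message) {
        this.version = version;
        this.code = code;
        this.message = message;
    }

    // Split "HTTPVersion StatusCode Message"; the message may contain spaces.
    static StatusLineSketch parse(String line) {
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        String message = parts.length == 3 ? parts[2] : "";
        return new StatusLineSketch(parts[0], Integer.parseInt(parts[1]), message);
    }

    // Status codes from 200 to 299 indicate success.
    boolean isSuccess() {
        return code >= 200 && code < 300;
    }

    public static void main(String[] args) {
        StatusLineSketch status = parse("HTTP/1.1 404 Not Found");
        System.out.println(status.code + " " + status.message + " success=" + status.isSuccess());
    }
}
```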

General headers are used by both request and response and the following is an example of them from a response:

     Date: Fri, 28 Sep 2001 09:43:29 GMT
     Cache-Control: private
     Via: 1.1 ni-cache (NetCache NetApp/5.1R2)

Response headers are specific to the server response and are used to include server-specific information such as cookie setting, or authentications:

     Set-Cookie: ASPSESSIONIDQQRTPFWW=MMHTTSHHHYSSWFFYBVWCCRHR; path=/
     Location: http://www.harbourne.com
     Server: Microsoft-IIS/4.0

As with requests, entity headers in responses give information about the body of the response:

     Content-Length: 0
     Content-Type: text/html
     Last-Modified: Wed, 02 Jan 2002 09:43:29 GMT

HTTP Headers

As we outlined before, there are four types of HTTP headers. General headers and Entity headers are used in both the request and response, while Request and Response headers may only be used on the relevant side of the communication.
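
The four categories can be modelled with a small lookup, as in the sketch below. The class name HeaderKindSketch is hypothetical and the map holds only a handful of headers for illustration; a fuller version would list every header from the tables in this section.

```java
import java.util.Map;
import java.util.TreeMap;

final class HeaderKindSketch {

    enum Kind { GENERAL, REQUEST, RESPONSE, ENTITY, UNKNOWN }

    // Header field names are case-insensitive, so the lookup ignores case.
    private static final Map<String, Kind> KINDS = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    static {
        KINDS.put("Date", Kind.GENERAL);
        KINDS.put("Connection", Kind.GENERAL);
        KINDS.put("Host", Kind.REQUEST);
        KINDS.put("User-Agent", Kind.REQUEST);
        KINDS.put("Server", Kind.RESPONSE);
        KINDS.put("Location", Kind.RESPONSE);
        KINDS.put("Content-Type", Kind.ENTITY);
        KINDS.put("Content-Length", Kind.ENTITY);
    }

    static Kind kindOf(String headerName) {
        return KINDS.getOrDefault(headerName, Kind.UNKNOWN);
    }

    // General and entity headers may appear on either side; request headers
    // only in requests.
    static boolean allowedInRequest(String headerName) {
        Kind kind = kindOf(headerName);
        return kind == Kind.GENERAL || kind == Kind.ENTITY || kind == Kind.REQUEST;
    }

    public static void main(String[] args) {
        System.out.println(kindOf("content-type"));
        System.out.println(allowedInRequest("Server"));
    }
}
```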

The subsections below detail the headers from HTTP/1.0 and HTTP/1.1, and include the additional standard headers for authentication and cookie management.

For each header, the column Since indicates when the header was introduced. As shown, some existed only for HTTP/1.0 and were not included in the HTTP/1.1 specifications:

1.1: since HTTP/1.1
1.0: since HTTP/1.0 (core)
1.0*: since HTTP/1.0 (appendix)
1.0 only: only HTTP/1.0
*: Defined in associated specifications (Security and Cookies)
Note
No headers were defined for HTTP/0.9

General Headers

Header	Since	Use
Cache-Control	1.1	This is used to specify under what conditions should a cached response be returned (by a client), or for a server, under what conditions may the response be cached (if at all). Frequently used with no-cache, no-store, max-age etc.
Connection	1.1	Introduced in version 1.1 to allow connections to remain open to server for more than one request rather than having to reconnect for every resource required from a server. Options are close and keep-alive for persistent connections.
Date	1.0	The date/time that the request was served. Three date formats exist but the format "Tue, 01 Jan 2002 09:50:17 GMT" is preferred.
Pragma	1.0	Retained for backward compatibility with HTTP/1.0. Used in caching with no-cache for documents that should not be cached.
Trailer	1.1	This may be used when chunked transfer encoding is used for the message body.
Transfer-Encoding	1.1	Indicates the transfer encoding applied to the message body, most commonly chunked.
Upgrade	1.1	This may be used to indicate that communication should upgrade the protocol used for communication to a higher level or more preferable system (e.g. HTTPS).
URI	1.0* only	Used only in HTTP/1.0. Indicates some or all of the URIs by which the requested URI may be identified with.
Via	1.1	This header is useful in debugging problems, usually in association with the TRACE method, as each proxy adds its protocol and host details to the request.
Warning	1.1	This is used to include additional information that may not be included in the response status code.

Request Headers

Header	Since	Use
Accept	1.0*	This is used to indicate the media type or types (separated by commas) that are accepted by the client. Standard ones include "text/html", "application/x-java-serialized-object" etc.
Accept-Charset	1.0*	Indicates the charset(s) that the client is willing to accept.
Accept-Encoding	1.0*	Used to specify the types of encoding that the client understands, such as x-gzip, to reduce network traffic for large amounts of data.
Accept-Language	1.0*	This indicates the language(s) that the client prefers to receive. Useful in internationalization, possibly with Filters in Java servlets.
Authorization	1.0	This contains the client's encoded username and password to the selected resource, usually as a result of the server sending the WWW-Authenticate header.
Cookie	*	Contains cookie information, previously sent by the server to the client.
Cookie2	*	This is used to indicate the version of the state management specifications that the client supports.
Expect	1.1	This is used to indicate specific client expectation, which, if not fulfilled by the server will result in the server returning the 417 status code.
From	1.0	May be used to indicate the e-mail address of the client. This is rarely used due to privacy concerns and Spam e-mail.
Host	1.1	Used to indicate the host name (and optionally the port) of the server being addressed. This is required in version 1.1, as multiple web hosts may share the same IP address.
If-Match	1.1	Conditional request, only to be performed if the resource's ETag matches one of those supplied.
If-Modified-Since	1.0	Conditional request: the server returns the resource only if it has a copy modified later than the given date; otherwise the client/proxy uses its cached version.
If-None-Match	1.1	The reverse to If-Match, returning the entity if the ETag does not match one of those supplied.
If-Range	1.1	Used to retrieve part of the data when part of it is already cached.
If-Unmodified-Since	1.1	Obviously the reverse of the If-Modified-Since header indicating the server should not return the entity if it has been modified since the time specified.
Max-Forwards	1.1	This will limit the number of proxy servers or gateway servers that can forward the request. This may be specifically useful in debugging, in association with the TRACE method.
Proxy-Authorization	1.1	Used by the client to identify itself to the proxy server.
Range	1.1	Specifies the byte range of the resource data to return. Useful for large files or data where the download was interrupted.
Referer	1.0	Used to indicate the document or resource that referred the client (by a link) to the resource.
TE	1.1	Indicates the list of transfer encodings that the client will accept.
User-Agent	1.0	Identifies the client program making the request (e.g. browser or application).

Response Headers

Header	Since	Use
Accept-Ranges	1.1	Indicates if the server will accept Range requests, and if so, the units that the requests are made in (e.g. none, or bytes).
Age	1.1	This field is used to indicate the age of the document/data being returned, in seconds.
Authentication-Info	*	Used in authentication to indicate the client has been successfully authenticated.
ETag	1.1	Entity Tag associated with the specific resource requested and may be used for caching and conditional requests.
Location	1.0	This specifies the new location for the resource (either created or moved).
Proxy-Authenticate	1.1	This is used for authentication to the proxy, and is used when the client must authenticate with the proxy. The client may resubmit the request with their authentication details.
Retry-After	1.0*	This header indicates that the client may retry its request after the specified date/time or time interval (specified in seconds).
Server	1.0	This is used to identify the server software (including version) used to process the request.
Set-Cookie	*	This is used to send data to the client in a cookie to be returned to the server in subsequent requests, until the specified interval has elapsed (e.g. specified time or date or until the browser shuts down).
Set-Cookie2	*	Slightly modified version of Set-Cookie, but essentially performs the same service with cookies.
Vary	1.1	This header indicates to the client that the resource has multiple possible representations, and that the one returned was selected using the request header fields named in the Vary header.
WWW-Authenticate	1.0	This is used to indicate to the client that it must authenticate itself to the server before accessing the requested resource.

Entity Headers

Header	Since	Use
Allow	1.0	This is normally used by the server to indicate that the request method used was not supported (or allowed) and the included request methods are permitted.
Content-Encoding	1.0	This specifies the encoding algorithm used for the body of the request or response. Servers should only use encoding that is supported by the client in the Accept-Encoding header.
Content-Language	1.1	This header specifies the language that the body is in, or aimed at. Can be used by the server in conjunction with the client's Accept-Language header.
Content-Length	1.0	This specifies the length of the body of the entity in bytes.
Content-Location	1.1	Specifies the location that the content body was sourced from.
Content-MD5	1.1	This is used to ensure that the receiver received the entity body without modification or alteration. This is done by running the Message Digest 5 algorithm over the data to produce this header value.
Content-Range	1.1	This is normally used by the server to indicate the range of the data being returned to the client. This allows the client to resume receiving a large response (e.g. large file) from the server.
Content-Type	1.0	This header field specifies the media type of the entity body.
Expires	1.0	This is used to indicate the date/time after which the data becomes invalid and needs to be refreshed from the server.
Last-Modified	1.0	This is used to indicate the time of the last change or modification to the entity.
Link	1.0* only	This is used to indicate relationships between the entity and another resource or resources. (Dropped for HTTP/1.1)
MIME-Version	1.0* only	This is used to indicate the version of the MIME protocol used to construct the message. (Dropped for HTTP/1.1)
Title	1.0* only	Used to indicate the title of the entity. (Dropped for HTTP/1.1)
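
As an example of computing an entity header value, the Content-MD5 header described above can be produced with the standard Java security classes. This is a minimal sketch: the class name ContentMd5Sketch is hypothetical, and the header value is the Base64 encoding of the 128-bit MD5 digest of the body.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class ContentMd5Sketch {

    // Run MD5 over the entity body and Base64-encode the 16-byte digest.
    static String contentMd5(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(body);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "<html><body><h1>The page body</h1></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("Content-MD5: " + contentMd5(body));
    }
}
```

The receiver repeats the same calculation over the body it received and compares the result with the header value; a mismatch means the body was altered in transit.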
