Hypertext Transfer Protocol | Professional Java Servlets 2.3

< Free Open Study >

Two parties take part in an HTTP communication: a client and a server. The client sends an HTTP request and the server sends an HTTP response. The request can be thought of as a question by the client to the server, and the response as the server's answer to the client. Most applications act as either a client or a server, but there is no reason why an application can't be designed to act as both. For example, a web server might act as a client in order to gather information from other servers to present to its own client. Web service applications are beginning to look like this in cases where they communicate over HTTP (or other protocols) with other servers for information.
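The question-and-answer exchange can be sketched with nothing but the JDK. This is a minimal sketch, not servlet code: the JDK's built-in `com.sun.net.httpserver.HttpServer` stands in for a real web server, and the class name `RequestResponseSketch` and the `/hello` path are our own illustrative choices.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class RequestResponseSketch {

    /** Starts a throwaway server, sends it one GET request, and returns "<status> <body>". */
    static String roundTrip() {
        HttpServer server = null;
        try {
            // Port 0 asks the operating system for any free port.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/hello", exchange -> {
                // The server's answer: a status code followed by a body.
                byte[] answer = "hello".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, answer.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(answer);
                }
            });
            server.start();

            // The client's question: a GET request for /hello.
            int port = server.getAddress().getPort();
            URL url = URI.create("http://127.0.0.1:" + port + "/hello").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            try (InputStream in = conn.getInputStream()) {
                return status + " " + new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

The same program plays both roles here, which also illustrates the point that one application can act as a client and as a server.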

Under version 1.0 of the HTTP specification (HTTP 1.0), each request by a client to the server (for example, for a web page) is made over a separate connection. Downloading a complex web page could therefore require many connections to the server: one for each frame of the page, one for each image, one for each stylesheet, and so on. In HTTP 1.1, connections are kept open for a short time, which allows multiple requests to be made over the same connection. This reduces the overhead associated with opening separate connections.

We'll learn more about how we can manage this 'restriction' of HTTP in Chapter 5.

HTTP Methods

HTTP 1.1 defines eight methods that a client can use in a request to tell the server what action it wants performed. The methods are CONNECT, DELETE, GET, HEAD, OPTIONS, POST, PUT and TRACE.

The methods we will use most often are GET and POST:

GET

Clients use the GET method to request a resource from a server. By convention, the request will not change the data stored on the server. From the client's perspective, GET requests should be limited to retrieval of information only. On the server, limited side effects of the request may occur (such as logging); however, the principle is that the client is not aware of these and is not accountable for them.

POST

Clients use the POST method to post data to the server; for example, to submit a form on an HTML page. The response to a POST request may be nothing other than a status code that indicates the success or failure of the request.

Other useful methods include:

HEAD

Clients use the HEAD method to check basic information about a resource, such as its size, or time of its last modification. This information can then be used to decide if it is necessary to make a GET request for the full resource, or to use a cached copy.

DELETE

A DELETE request asks the server to delete the specified resource, which the server can of course refuse to do. The ability to delete resources on a server, if allowed at all, is normally restricted to authenticated users who hold an administrator role, or an equivalent role within the application. Depending upon the configuration, the server may deny DELETE requests from unknown users, or request that they authenticate using HTTP access authentication before proceeding. Normally the client will receive a success reply if the resource was removed, or if the server has accepted the request and intends to remove the resource at a later point (the server may need to delay the removal for performance or other reasons).

PUT

PUT requests are the converse of DELETE requests. They are used to request that the enclosed resource (a file or other resource) be stored at the requested destination on the server. The key difference between POST and PUT requests is that the URL of a PUT request identifies the destination of the enclosed resource, while the URL of a POST request identifies the servlet or other server resource that will process the enclosed data.
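The differences between these methods, and in particular between PUT and POST, can be made concrete with a toy model. The sketch below is not a servlet and does no networking: `MethodSemanticsSketch`, its in-memory map, and the status codes it chooses are our own assumptions about how a simple server might behave.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MethodSemanticsSketch {

    // Resources held by the toy server, keyed by path.
    private final Map<String, String> resources = new LinkedHashMap<>();
    private int nextId = 1;

    /** Applies one request to the store and returns the status code a server might send. */
    int handle(String method, String path, String body) {
        switch (method) {
            case "GET":
            case "HEAD":
                // Retrieval only: the stored data is never modified.
                return resources.containsKey(path) ? 200 : 404;
            case "PUT":
                // The path names the destination of the enclosed resource.
                boolean created = !resources.containsKey(path);
                resources.put(path, body);
                return created ? 201 : 200;
            case "POST":
                // The path names the processor; the server decides where the data ends up.
                resources.put(path + "/" + nextId++, body);
                return 201;
            case "DELETE":
                if (!resources.containsKey(path)) {
                    return 404;
                }
                resources.remove(path);
                return 200;
            default:
                return 405; // this toy server does not support the method
        }
    }

    public static void main(String[] args) {
        MethodSemanticsSketch server = new MethodSemanticsSketch();
        System.out.println(server.handle("PUT", "/notes/today", "buy milk")); // stored at /notes/today
        System.out.println(server.handle("POST", "/notes", "call home"));     // stored at /notes/1
        System.out.println(server.handle("GET", "/notes/1", null));
        System.out.println(server.handle("DELETE", "/notes/today", null));
        System.out.println(server.handle("GET", "/notes/today", null));
    }
}
```

Notice that the PUT request names the exact location of the new resource, whereas the POST request only names `/notes` and leaves the final location to the server.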

You can find a complete reference for HTTP in Appendix B, which includes details of the features supported by both version 1.0 and version 1.1 of the specification.

HTTP Headers

HTTP requests carry header information that describes the client and the request. Alongside the request method, a request can include information on the file types accepted by the client, information on the client itself (for example the browser being used), the client's language, and the URL of the page that linked to the requested one. Similarly, the server's response includes header information that the client can use to interpret the response. In addition to the headers already defined in HTTP, we can use our own header information in an application. Obviously, using non-standard header information means that the client applications communicating with our web application need to be able to understand this extra information as well.

There are four categories of header information used in HTTP communication. General and Entity headers are used in both client requests and server responses. In addition, client requests may include Request headers, and server responses may include Response headers.

General header information may cover the date of the request/document, caching information, warnings and other information. Entity headers are used to specify information about the body of the request or response. This may include the MIME type of the data being carried, its length, and other information about the entity body.

A client might include Request headers in order to add information about itself (what type of software or browser it is), to return any cookies it was previously sent, and to list the types of response (file or MIME types) that it can understand. A server might use Response headers to set cookies, to supply authentication information or requests, and to describe itself (the server software version).
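To make the four categories concrete, the following sketch sorts a handful of well-known header names into them. It is only an illustration: the class `HeaderCategorySketch` is our own, the lists are deliberately incomplete, and the cookie headers are grouped as the text above describes them even though they are defined in a separate cookie specification.

```java
import java.util.Locale;
import java.util.Set;

class HeaderCategorySketch {

    enum Category { GENERAL, REQUEST, RESPONSE, ENTITY, OTHER }

    // A small, incomplete selection of header names, stored in lower case
    // because HTTP header names are case-insensitive.
    private static final Set<String> GENERAL_HEADERS =
            Set.of("date", "cache-control", "warning", "connection");
    private static final Set<String> REQUEST_HEADERS =
            Set.of("user-agent", "accept", "accept-language", "referer", "cookie");
    private static final Set<String> RESPONSE_HEADERS =
            Set.of("server", "www-authenticate", "set-cookie");
    private static final Set<String> ENTITY_HEADERS =
            Set.of("content-type", "content-length", "last-modified");

    /** Returns the category of a header name, or OTHER for names not listed above. */
    static Category categoryOf(String headerName) {
        String name = headerName.toLowerCase(Locale.ROOT);
        if (GENERAL_HEADERS.contains(name)) return Category.GENERAL;
        if (REQUEST_HEADERS.contains(name)) return Category.REQUEST;
        if (RESPONSE_HEADERS.contains(name)) return Category.RESPONSE;
        if (ENTITY_HEADERS.contains(name)) return Category.ENTITY;
        return Category.OTHER;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Date", "User-Agent", "Server", "Content-Type", "X-Custom"}) {
            System.out.println(name + " -> " + categoryOf(name));
        }
    }
}
```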

All of this header information that is available in HTTP requests and responses is available to us as programmers. The HttpServletRequest interface exposes the client header information to the servlet, so that it can interpret and extract header information as required (if supplied).

For servlet responses, many headers are set automatically by the container. Exactly which headers are set is container dependent, but we can also set or override the response headers through the HttpServletResponse interface.
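The idea of reading a header from the request and setting one on the response can be tried without a servlet container. In this sketch the JDK's built-in `com.sun.net.httpserver.HttpServer` stands in for the container, and the header names `X-Client-Version` and `X-Seen-Client-Version` are hypothetical, non-standard headers invented for the illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;

class HeaderRoundTripSketch {

    /** Sends a custom request header and returns the value the server echoes back. */
    static String echoCustomHeader(String value) {
        HttpServer server = null;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                // Server side: read a request header, then set a response header.
                String seen = exchange.getRequestHeaders().getFirst("X-Client-Version");
                exchange.getResponseHeaders().set("X-Seen-Client-Version", String.valueOf(seen));
                exchange.sendResponseHeaders(200, -1); // -1 means the response has no body
                exchange.close();
            });
            server.start();

            // Client side: set a request header, then read a response header.
            int port = server.getAddress().getPort();
            URL url = URI.create("http://127.0.0.1:" + port + "/").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("X-Client-Version", value);
            conn.getResponseCode(); // sends the request and waits for the response
            return conn.getHeaderField("X-Seen-Client-Version");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(echoCustomHeader("1.0"));
    }
}
```

Both sides have to agree on the meaning of the custom header, which is the price of using non-standard header information.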

Being able to set header information on both the server and client can allow web applications and client applications to add useful information to their communications simply. We will show you exactly how to do this later in the chapter.

HTTP Status Codes

In addition to the information passed in headers between client and server, the server issues a status code with each response to indicate the result of the request. Often the code simply indicates that the request was successful, but status codes can also indicate that there was a problem, and distinguish between many different types of problem.

Even if you are not aware of it, you have probably seen at least one HTTP status code while surfing the Internet. Frequently, when you click a link on the web to a page or resource that is broken (the page has been moved or removed without notice), you will see an error page with 404 and Not Found (or a similar message). The number 404 is in fact an HTTP status code returned by the server to indicate that the file or resource you requested was not found. Depending on the configuration of the website, you might see a friendlier custom warning page instead.

The container handles the setting of status codes automatically. Normally, when our servlet processes a request successfully and returns a response to the client, the container will automatically include the 200 code that indicates that the request was processed normally (OK). When an unhandled error occurs in the servlet, as we occasionally see on the web (or in our own web applications during development), the server will automatically handle this by sending the response code 500, indicating Internal Server Error.

There are five ranges of status codes that correspond to five general states. Servers can return informational (1xx), success (2xx), redirection (3xx), client error (4xx), or server error (5xx) responses, depending on the result of the processing of the request.
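Because the range is determined by the first digit of the code, a client can classify any status code with a single integer division. The following is a minimal sketch; the class name `StatusClassSketch` and the category labels are our own.

```java
class StatusClassSketch {

    /** Maps a status code to the general state its range stands for. */
    static String category(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 404, 500}) {
            System.out.println(code + " is a " + category(code) + " code");
        }
    }
}
```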

However, we do not have to leave the setting of the status codes only to the container. The HttpServletResponse interface, as we will see later, has methods so that we can set the status code from the servlet to indicate the result of the request. This is important, as we are not always going to want to return the default code that the container may use. The HTTP status codes are provided as constants in the HttpServletResponse interface for servlets and in the J2SE java.net.HttpURLConnection class for clients needing to interpret them.

For example, if we develop an application that has areas of the site that are only available to authenticated users (or to users who meet any other relevant criteria), we can have our code return a specific error to non-authorized users, such as 401 Unauthorized or 403 Forbidden, or 405 Method Not Allowed when the request method itself is not permitted for the resource.

With custom clients we can make use of the status codes to clearly inform the client about the result of an action by simply setting the status code. The example later in the chapter makes specific use of setting status codes programmatically in the servlet, to let the Java application client know the result of its request. The client makes use of these codes to determine how to process the response, and to indicate to the user the reason a request may have failed (or succeeded).
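Both halves of that conversation can be sketched in plain Java. In a servlet the chosen code would typically be passed to the HttpServletResponse; to keep this sketch runnable without a container it uses the constants in `java.net.HttpURLConnection` on both sides. The class `StatusHandlingSketch`, its two methods, and the messages are hypothetical.

```java
import java.net.HttpURLConnection;

class StatusHandlingSketch {

    /** Server side: choose a status code for a request to a protected area. */
    static int decideStatus(boolean authenticated, boolean permitted) {
        if (!authenticated) {
            return HttpURLConnection.HTTP_UNAUTHORIZED; // 401: the user has not logged in
        }
        if (!permitted) {
            return HttpURLConnection.HTTP_FORBIDDEN;    // 403: logged in, but not allowed
        }
        return HttpURLConnection.HTTP_OK;               // 200
    }

    /** Client side: turn the status code into a message for the user. */
    static String describe(int status) {
        switch (status) {
            case HttpURLConnection.HTTP_OK:
                return "Request succeeded";
            case HttpURLConnection.HTTP_UNAUTHORIZED:
                return "Please log in and try again";
            case HttpURLConnection.HTTP_FORBIDDEN:
                return "You are not allowed to do this";
            case HttpURLConnection.HTTP_NOT_FOUND:
                return "The resource does not exist";
            default:
                return "Unexpected status " + status;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(decideStatus(false, false)));
        System.out.println(describe(decideStatus(true, false)));
        System.out.println(describe(decideStatus(true, true)));
    }
}
```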

HTTP Authentication

HTTP offers a basic method of authentication in which the client's credentials are sent using Base64 encoding. HTTP Basic Access Authentication works as follows:

1. A client makes a request for a file, servlet, or other protected resource.
2. The server replies with the Unauthorized response, including an authentication header (WWW-Authenticate).
3. The client has to respond with appropriate credentials (an encoded username and password) to gain access.
4. If the client fails authentication, the server rejects the request again, typically by repeating the Unauthorized response; a server may instead answer with a Forbidden response.

Browsers normally handle the authentication by presenting the user with a dialog box in which to enter the username and password. Custom applications will have to do the same, or retrieve the credentials from memory or storage. For servlet developers with browser clients, this is a convenient method of presenting authentication.

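Base64 is an encoding, not a form of encryption: anyone who intercepts the encoded credentials can decode them trivially. Basic authentication on its own should therefore not be used to exchange sensitive information over the Web. HTTPS is generally the preferred option.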
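A short sketch shows both how a client builds the credentials for Basic authentication and how easily they can be read back. It uses `java.util.Base64` from the JDK; the class `BasicAuthSketch` and its method names are our own, and the username and password are sample values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthSketch {

    /** Builds the value of the Authorization request header for Basic authentication. */
    static String authorizationHeader(String username, String password) {
        String credentials = username + ":" + password;
        byte[] bytes = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    /** Recovers "username:password" from the header value; no key or secret is needed. */
    static String decodeCredentials(String headerValue) {
        String encoded = headerValue.substring("Basic ".length());
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String header = authorizationHeader("Aladdin", "open sesame");
        System.out.println("Authorization: " + header);
        System.out.println("Decoded by an eavesdropper: " + decodeCredentials(header));
    }
}
```

Because decoding needs nothing but the header itself, the encoding offers no protection if the request is sent over an unencrypted connection.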

A more complex form of authentication, called Digest Access Authentication, eliminates the need to send passwords across the network. The server sends the client some data, which the client combines with the password to create a hash; only this hash is returned to the server. The server performs the same calculation with its own copy of the password and the data it sent, and checks that the result matches in order to verify the client's identity. These security procedures are under constant development and improvement.
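The calculation can be sketched as follows. This is a simplified sketch of the most basic variant of the Digest scheme, which uses MD5 and omits the optional extensions; the class `DigestAuthSketch`, its method names, and the sample values in `main` are our own.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestAuthSketch {

    /** Returns the MD5 hash of the text as 32 lower-case hexadecimal characters. */
    static String md5Hex(String text) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(text.getBytes(StandardCharsets.ISO_8859_1));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /**
     * Combines the password with the server-supplied nonce, so that the
     * password itself never travels across the network.
     */
    static String response(String username, String realm, String password,
                           String method, String uri, String nonce) {
        String ha1 = md5Hex(username + ":" + realm + ":" + password);
        String ha2 = md5Hex(method + ":" + uri);
        return md5Hex(ha1 + ":" + nonce + ":" + ha2);
    }

    public static void main(String[] args) {
        // The nonce is the "data from the server"; a new one yields a new hash.
        System.out.println(response("alice", "example", "secret", "GET", "/index.html", "nonce-1"));
        System.out.println(response("alice", "example", "secret", "GET", "/index.html", "nonce-2"));
    }
}
```

The server repeats the same calculation with the password it holds and compares the two results, so only someone who knows the password can produce a matching hash for a given nonce.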

An alternative approach is to use HTTP over Secure Sockets Layer (HTTPS), which uses public key cryptography. The SSL protocol sits between the TCP and HTTP layers and has become the standard for secure connections over the Internet. We can use HTTPS in conjunction with either Basic or Digest Authentication to improve security and authentication.

You can find more information about how to secure your web applications in Chapter 9.
