3.3 HTTP Methods | WebDAV. Next Generation Collaborative Web Authoring

GET, PUT, DELETE, POST, HEAD, and OPTIONS are all HTTP methods useful in collaborative authoring. This section provides a brief overview of each, and the next chapter provides more details on how to use these methods in a WebDAV client or server implementation.

Web servers frequently contain both static and dynamic pages, and some HTTP methods handle these differently. A static page is one that is stored, byte for byte, the same way it is transmitted over HTTP. A dynamic page is one that is stored containing some source code. The server interprets the code and uses its output as part or all of the entity transmitted over HTTP. Originally, dynamic resources were C or Perl programs hosted out of a "cgi-bin" directory on the server (CGI is Common Gateway Interface [CGI96]). Links to CGI-generated resources may look like this:

http://example.com/cgi-bin/hrdata?empname=alice

More recent technology allows HTML pages to contain embedded script that is evaluated by the server. On the server, these pages are stored in the same directories as static resources, and you can often tell this from the URL:

http://example.com/hr/info/empinfo.jsp?empname=alice

In general, there's no reliable way for clients to tell if a resource is static or dynamic. The client can download any of these pages consistently but can't author dynamic resources.

3.3.1 GET

GET is the workhorse request of HTTP, the one request used to retrieve every static Web page, every dynamic Web page, every image, and every document. A GET request must include the name (full path) of the requested resource and the name of the host on which the resource appears (see Listing 3-7).

Listing 3-7 Simple GET request.

 GET /index.html HTTP/1.1
 Host: example.com:80

A GET request commonly includes some of the following information from the client:

The languages the user prefers the response page to be in.
The formats the client can handle. For example, the client may handle image/jpeg and image/bmp but not image/tiff.
The encodings the client can handle. The client may be able to unzip large files compressed using the gzip format and can advertise this feature in an attempt to save bandwidth.
The conditions that must be met in order for the server to process the operation. For example, the client could specify that the file must have been modified in the last day; otherwise, the server can respond with 304 Not Modified instead of sending the file.
The part of the response to return (the range).
The browser software and version sending the request.
The user's authentication information.
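As a rough sketch of how a client might assemble such a request, the following Python builds the request text by hand. The helper name build_get_request and all of the header values are assumptions chosen for illustration; authentication is left out for brevity.

```python
def build_get_request(host, path, extra_headers=None):
    """Return the bytes of a minimal HTTP/1.1 GET request.

    Hypothetical helper: the request line and Host header are always
    sent, and any optional headers follow in the order given.
    """
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line ends the header section; a GET carries no body.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Illustrative values only, one per kind of information listed above.
request = build_get_request(
    "example.com:80",
    "/index.html",
    {
        "Accept-Language": "en-us",                # preferred language
        "Accept": "image/jpeg, image/bmp",         # formats the client handles
        "Accept-Encoding": "gzip",                 # encodings the client handles
        "If-Modified-Since": "Sat, 28 Jul 2001 15:24:17 GMT",  # condition
        "Range": "bytes=0-1023",                   # part of the response wanted
        "User-Agent": "ExampleClient/0.1",         # software sending the request
    },
)
print(request.decode("ascii"))
```

Only the request line and the Host header are needed for the simple case; everything else is optional information the server may use to tailor its response.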

The response includes the requested resource. If the requested resource is a dynamic resource, the response is the output of the process that is responsible for generating the page. If the request is a static resource, the response typically includes the page in whatever format it is stored in. The static page in its stored format may first be transformed to a compressed format, a transfer encoding, or a multipart/byteranges format under certain circumstances.

The response must include the date and the content type (see Listing 3-8).

Listing 3-8 Simple GET response.

 HTTP/1.1 200 OK
 Date: Sun, 29 Jul 2001 15:24:17 GMT
 Content-Length: 25
 Content-Type: text/html

 <body>Hello World!</body>

The response also commonly includes:

The language of the response body
The last-modified date of the resource
How to cache, or not to cache, the response body
The ETag for the resource (ETags are discussed in detail later; for now it's sufficient to think of it as an ID for the current version of the page)
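To illustrate how little is needed to read such a response, here is a minimal Python sketch that splits a raw response into its status, headers, and body. The function name parse_response is hypothetical, and the code assumes CRLF line endings, no folded headers, and no transfer encoding.

```python
def parse_response(raw):
    """Split a raw HTTP response into (status, reason, headers, body).

    Simplified sketch: header names are lowercased, and a repeated
    header would overwrite the earlier value.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


# The simple GET response from Listing 3-8, with explicit line endings.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Sun, 29 Jul 2001 15:24:17 GMT\r\n"
    b"Content-Length: 25\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<body>Hello World!</body>"
)
status, reason, headers, body = parse_response(raw)
print(status, reason, headers["content-type"], len(body))
# prints: 200 OK text/html 25
```

The declared Content-Length matches the number of body bytes, which is how a client knows where the response ends on a persistent connection.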

The GET request and its response illustrate the extremely simple flow of basic HTTP. In one simple request, perhaps as short as two lines of text, the client asks for a useful resource. In a single response, the server sends that resource. It's trivial to write a basic client application to download Web resources.

3.3.2 GET with File System Directories

Web servers construct directory URLs based on the directory path and name, just as with Web pages. The URL may or may not end with a / character, so either or both (or neither!) of these URLs may point to a directory:

http://www.example.com/hr
http://www.example.com/hr/

Section 3.1.4 explained that a GET to a file system directory can return two kinds of successful responses. The first, a dynamically generated content listing, is returned as Content-Type: text/html, which the browser displays as a Web page. The second, a default starting page usually called index.html, has the same content type. In fact, the browser can't tell the difference between a dynamically generated directory contents listing and an index page with links.

Hidden Pages

Even if the server administrator turns off the feature displaying directory contents as HTML, users can find and download publicly readable Web pages. A recent court case involved an unfortunate company with financial information publicly readable in a hidden page. Of course, a reporter guessed the name of the document and retrieved it [Delio02].

3.3.3 PUT

HTTP defines the PUT method to allow Web pages to be authored. When a client knows the URL to a Web page, image, or other document, it can send a PUT request to change the content of a document and set its content type (see Listing 3-9).

Clearly, the PUT request is core to WebDAV because it allows new Web resources to be created and existing resources to be updated. There are some subtleties that HTTP clients rarely have to handle without WebDAV, so I'll wait until Chapter 5 to deal with those.

When used to overwrite a resource, a PUT request may change some of the metadata of the existing resource as well as the body or entity. The client may send a new Content-Type value or a Content-Language value, as well as the new Content-Length value. The server must store this information in order to be able to respond to GET and HEAD requests with the same values.

PUT can reliably be used to edit static resources but not necessarily to edit dynamic resources. To see if a resource can be edited, the client must send an OPTIONS request and see if PUT is an allowed method on the resource (see Section 3.7.2).

Listing 3-9 Simple PUT request and response.

Request:

 PUT /index.html HTTP/1.1
 Host: example.com:80
 Content-Type: text/html
 Content-Length: 33

 <body>Hello World, part 2!</body>

Response:

 HTTP/1.1 204 No Content
 Date: Sun, 29 Jul 2001 15:24:07 GMT
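A client has to compute the Content-Length value from the body it is about to send. The following Python sketch shows one way to assemble the request from Listing 3-9; the helper name build_put_request is an assumption made for this example.

```python
def build_put_request(host, path, body, content_type):
    """Return the bytes of a simple PUT request (hypothetical helper)."""
    head = "\r\n".join([
        f"PUT {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Type: {content_type}",
        # Content-Length counts the bytes of the body, not characters.
        f"Content-Length: {len(body)}",
    ])
    # A blank line separates the headers from the entity body.
    return head.encode("ascii") + b"\r\n\r\n" + body


body = b"<body>Hello World, part 2!</body>"
request = build_put_request("example.com:80", "/index.html", body, "text/html")
print(request.decode("ascii"))
```

Passing the body as bytes keeps the length calculation honest when the content contains non-ASCII characters, since one character may occupy several bytes.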

PUT isn't as useful in basic HTTP as it is with WebDAV functionality. Without an explicit way to identify and create collections, clients can't be sure how or where to create a new Web page. PUT is also easier to use when combined with locking functionality: with locks, users can edit a file without worrying about other users changing the file. Many plain Web server administrators disable PUT since it isn't used, which does make security simpler.

3.3.4 DELETE

The DELETE request is defined in HTTP to delete resources (see Listing 3-10).

Listing 3-10 Simple DELETE example.

Request:

 DELETE /index.html HTTP/1.1
 Host: example.com:80

Response:

 HTTP/1.1 204 No Content
 Date: Sun, 29 Jul 2001 16:28:27 GMT

Like the PUT request, DELETE is not used much on servers that only support HTTP. Without all the functionality of file management, including copy, move, and create collections, DELETE just isn't sufficiently useful. As with PUT, plain Web server administrators often disable DELETE for added security.

Like PUT, DELETE can reliably be used with static resources, but not necessarily on dynamic resources. Servers can advertise which resources support DELETE with the Allow header on an OPTIONS response (see Section 3.7.2).

Again, there are more details to consider when DELETE is used in the context of WebDAV, but we'll get to that in the next chapter.

3.3.5 POST

HTTP/1.1 specifies that the behavior of a POST request on an ordinary resource depends on the nature of the resource. If the resource identified by the URL is a program or script, it may be able to accept any kind of custom client request. The most common use of POST is to accept form submissions (see [Krishnamurthy01] Section 6.2).

In particular, when a Web form allows the Web client to upload a file to the server, the POST method is used to send the file. WebDAV doesn't mention forms and form-based uploads at all, but there's nothing to prevent a WebDAV server from hosting a file upload form. Doing so allows the repository to support file upload via ordinary Web browsers as well as WebDAV client software.

The most common way for a Web browser to upload a file to a Web server is to complete a file upload form on a Web site. The way this form is constructed by the server and handled by the client is defined in HTML 3.2 and 4.0. The form has to specify that POST will be used to submit the form, and the form must have an input element of type file. The form must also define what URL to upload the file to (the action attribute), and what encoding or enctype to use with the file body. The multipart/form-data encoding type is the MIME type specifically defined for uploading a file when a form is submitted [RFC2388] (see Listing 3-11).

Listing 3-11 Minimal form for file upload using POST.

 <FORM NAME=fileUpload METHOD=POST
    ACTION='/xythoswfs/webui/lisa?action=upload'
    ENCTYPE='multipart/form-data'>
    <INPUT TYPE=FILE NAME=FILE1 ID=FILE1>
    <INPUT TYPE=HIDDEN NAME='targetpath' VALUE='/lisa'>
 </FORM>

Browsers heavily restrict use of the file input type in order to maintain privacy. Typically, the user must select each file manually; there's no way to upload a collection with all its contents. The input field must appear in the form (not be hidden), and it can't be given a starting value. These restrictions help save the user from being tricked into uploading sensitive information to a server, but they also make forms hard to use in some situations, particularly when the user wants to upload multiple files.

When the user submits the form, the browser sends a POST request to the server. The POST request includes a body of type multipart/form-data [RFC2388]. This MIME type was specifically defined to allow file uploads through HTML form submission. The document type is a standard MIME multipart type with boundaries as defined in RFC2046. Each part of the multipart document has one or more headers that can appear in any order and cover multiple lines.

The document includes one part for each file uploaded. These parts each contain:

The Content-Disposition header holding the name of the file input field and the name of the file uploaded
The Content-Type header with the type of the file being uploaded
A blank line to separate headers from the body
The contents of the file being uploaded

In addition, the multipart/form-data document includes one part for each other piece of form data from the HTML form (whether radio buttons, checkboxes, text fields, or even hidden fields). In theory, these parts are supposed to contain the Content-Type header too, but in practice these parts each contain:

The Content-Disposition header holding the name of the input field
A blank line to separate headers from the body
The body of the section consisting of the value assigned to the field in the form
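The part layout described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of how any browser does it: the function name build_multipart is hypothetical, the boundary is a random string, and field names and file names are assumed to need no escaping.

```python
import uuid


def build_multipart(fields, files):
    """Encode form fields and files as a multipart/form-data body.

    fields: dict of field name -> text value
    files:  dict of field name -> (filename, content_type, data_bytes)
    Returns (Content-Type header value, body bytes).
    """
    boundary = uuid.uuid4().hex  # random, so unlikely to occur in the data
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for name, value in fields.items():
        chunks += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",                  # blank line separates headers from body
            value.encode("utf-8"),
        ]
    for name, (filename, content_type, data) in files.items():
        disposition = (
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'
        )
        chunks += [
            delimiter,
            disposition.encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            data,
        ]
    chunks += [delimiter + b"--", b""]  # closing delimiter ends the document
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(chunks)


content_type, body = build_multipart(
    {"targetpath": "/lisa"},
    {"FILE1": ("ip-address.txt", "text/plain", b"198.144.203.248")},
)
print(content_type)
print(body.decode("ascii"))
```

Like the browsers discussed in this section, the sketch omits the Content-Type header on ordinary field parts and includes it only on file parts.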

Listing 3-12 could be the submission of a form by MS IE 6.0. Most of the request headers aren't relevant to this example, but they're shown for completeness.

Listing 3-12 POST request to upload file.

 POST /xythoswfs/webui/lisa?action=upload HTTP/1.1
 Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
    application/vnd.ms-excel, application/msword,
    application/vnd.ms-powerpoint, */*
 Referer: http://www.example.com/xythoswfs/webui/lisa
 Accept-Language: en-us
 Content-Type: multipart/form-data;
    boundary=---------------------------7d312541017a
 Accept-Encoding: gzip, deflate
 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
 Host: www.example.com
 Content-Length: 327
 Connection: Keep-Alive
 Cache-Control: no-cache

 -----------------------------7d312541017a
 Content-Disposition: form-data; name="targetpath"

 /lisa
 -----------------------------7d312541017a
 Content-Disposition: form-data; name="FILE1"; filename="C:\Documents and Settings\Lisa\My Documents\ip-address.txt"
 Content-Type: text/plain

 198.144.203.248
 -----------------------------7d312541017a--

When this POST request is received by the server, the server must separate each part using the boundary string, and then parse and evaluate the MIME headers from each part. What the server does with the bodies received is completely up to the server. POST has no guaranteed semantics, and neither does the multipart/form-data MIME type. A server might store an uploaded file, translate it and display the translation, email it, or print it out.
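The separation step can be sketched in a few lines of Python. This is a deliberately simplified illustration: the function name parse_multipart and the boundary string "frontier" are made up for the example, and the code assumes CRLF line endings, no folded headers, and file content that never contains the delimiter.

```python
def parse_multipart(body, boundary):
    """Split a multipart/form-data body into (headers, content) pairs."""
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    # Element 0 of the split is the preamble before the first delimiter.
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # "--boundary--" marks the end of the document
        head, _, content = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in head.strip().split(b"\r\n"):
            if not line:
                continue
            name, _, value = line.decode("iso-8859-1").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The CRLF just before the next delimiter is not part of the content.
        if content.endswith(b"\r\n"):
            content = content[:-2]
        parts.append((headers, content))
    return parts


body = (
    b"--frontier\r\n"
    b'Content-Disposition: form-data; name="targetpath"\r\n'
    b"\r\n"
    b"/lisa\r\n"
    b"--frontier\r\n"
    b'Content-Disposition: form-data; name="FILE1"; filename="ip-address.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"198.144.203.248\r\n"
    b"--frontier--\r\n"
)
for headers, content in parse_multipart(body, "frontier"):
    print(headers["content-disposition"], "->", content)
```

What happens to each part after this point is application logic; the parser only recovers the field names, file names, and bytes that the client sent.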

The POST method is frequently used to upload files in Web authoring, as discussed in Section 1.1.3. Form uploads vary widely in their implementation, however, on both the server and the client side.

On the server side, since each site hosts its own forms, every form is different. A site can put any fields in the form (Listing 3-12 used the targetpath field to select where to upload the file). The client provides the local file name and sometimes the path in the POST body, but there's no standard way to choose the destination location and file name. There's no standard way to specify if the file uploaded should overwrite a previous file or not.

On the client side, Web browsers don't all implement the multipart/form-data type consistently with each other or with the specification, although most follow the lead of IE. For example:

RFC2388 requires a Content-Type header in each part, but IE omits that header when sending a form value other than a file body, and so do other browsers.
Mac OS uploads files encoded as Mac Binary.
Netscape Navigator 4 on Macintosh escapes the name of the file that is uploaded. Most browsers don't escape the file name.
Some browsers give the full name and location of the file, as IE does (even showing the drive letter, as in Listing 3-12). Other browsers only provide the file name.

The bottom line for form uploads is that the mechanism offers wide interoperability and customizability but has several costs compared to PUT:

Each site must design and maintain its own forms and choose how to handle file names and target locations.
A form is difficult to use for frequent or multiple file uploads.
A form can't easily be used by an editing tool. Users make more mistakes manually selecting files to upload than when their editing tool can upload the file directly.

3.3.6 HEAD

HEAD requests allow the client to find out information about a resource without actually downloading the resource. A HEAD request returns the headers that a GET request (with the same options, to the same target) would have returned, but it does not return the body. Thus, a HEAD response is unique in that it contains statements about the body type and length without actually including a body (see Listing 3-13).

Listing 3-13 HEAD request and response.

Request:

 HEAD /index.html HTTP/1.1
 Host: nondav.example.com

Response:

 HTTP/1.1 200 OK
 Date: Sat, 13 Oct 2001 19:11:04 GMT
 Server: Apache/1.3.14 (Unix)
 Last-Modified: Thu, 19 Oct 2000 03:28:13 GMT
 ETag: "870be-8f0-39ee6a4d"
 Accept-Ranges: bytes
 Content-Length: 2288
 Content-Type: text/html

3.3.7 OPTIONS

The client uses the OPTIONS request to find out the features the server or a specified resource supports. Finding out the supported features of a resource is simple because the OPTIONS request can be addressed directly to that resource.

The response to OPTIONS varies depending on the target resource. Most Web resources support GET, HEAD, and OPTIONS. Some resources may support PUT and DELETE as well, and other resources may support POST. A client might go so far as to send an OPTIONS request for every resource it downloads, in order to see if the resource supports GET. In practice, however, clients don't go that far, because support for common methods (particularly GET) is rather predictable.

The example OPTIONS response is taken from a static resource on an Apache server. Static resources on this Apache server support only the GET, HEAD, OPTIONS, and TRACE methods (not PUT, DELETE, or POST). Other servers show very different OPTIONS responses (see Listing 3-14).

Listing 3-14 OPTIONS request to an individual resource and response.

Request:

 OPTIONS /index.html HTTP/1.1
 Host: nondav.example.com

Response:

 HTTP/1.1 200 OK
 Date: Sat, 13 Oct 2001 15:26:52 GMT
 Server: Apache/1.3.14 (Unix)
 Content-Length: 0
 Allow: GET, HEAD, OPTIONS, TRACE
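A client that wants to know whether PUT or DELETE is permitted only has to split the Allow header value on commas. The Python sketch below assumes the response headers have already been collected into a dictionary; the function name allowed_methods is hypothetical.

```python
def allowed_methods(headers):
    """Return the set of methods named in the Allow header, if any."""
    allow = headers.get("Allow", "")
    return {method.strip() for method in allow.split(",") if method.strip()}


# Headers from the OPTIONS response in Listing 3-14.
options_headers = {
    "Server": "Apache/1.3.14 (Unix)",
    "Content-Length": "0",
    "Allow": "GET, HEAD, OPTIONS, TRACE",
}
methods = allowed_methods(options_headers)
print("GET" in methods)  # True
print("PUT" in methods)  # False
```

A missing Allow header yields an empty set here, which a client should treat as "unknown" rather than as proof that no methods are supported.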

Although RFC2616 states that responses to OPTIONS requests are not cacheable, this is generally interpreted as saying that proxies must not cache OPTIONS responses. In practice, clients can't afford to send OPTIONS requests before every WebDAV request, so client software often stores OPTIONS information temporarily.

One minor problem with OPTIONS is that it doesn't work on an unmapped URL. An OPTIONS request to a resource that doesn't exist returns 404 Not Found, not a successful OPTIONS response. Therefore, a client cannot see what methods may be used on unmapped URLs without trying those methods out. For example, it's impossible to confirm that PUT can be used to create a new resource in a given location.

3.3.8 OPTIONS *

Many clients send OPTIONS / requests in order to find out what features the server supports. This isn't quite the right way to do things, because OPTIONS introduces a magic Request-URI for this purpose. A Request-URI consisting of a single asterisk or * means that the request does not apply to a single resource but to the server itself. A response to OPTIONS * is theoretically different from the response to OPTIONS / (e.g., PUT may be supported for some resources on the server, but not on the root directory). However, both Apache 1.3 and Microsoft IIS 5.0 generate the same OPTIONS response to both / and *.

Ideally, OPTIONS * responses would show all the features that are supported anywhere on the server. Unfortunately, that isn't always possible. Sometimes features are implemented to extend an existing Web server, and the base HTTP engine isn't aware of all the extension features. For example, one of the quickest ways of implementing a WebDAV server is to start with a Web server that supports an extension mechanism such as Java Servlets and then write the extension code to apply only to requests addressed to the servlet. Thus, OPTIONS * requests might not show WebDAV support, even if an OPTIONS request to /servlet/dav does show WebDAV support.

WebDAV clients should not rely on OPTIONS * alone. Some features might be available or disabled only in certain sections of a repository. This kind of information can only be found through an OPTIONS request directed at a specific resource.

3.3.9 TRACE

The TRACE method is used to loop back the request message to the sender. This can be useful to see how firewalls, proxies, and/or cache servers are modifying the request as it is transmitted to the final recipient.

WebDAV servers should theoretically support TRACE since it is a required part of HTTP/1.1, but it is vestigial and has even been the subject of a minor security hole. The security hole allowed attackers to trick a client into retrieving cookie values from a server not controlled by the attacker [Owen03].

3.3.10 CONNECT

SSL/TLS [RFC2246] can be used to protect the confidentiality of everything in an HTTP transaction, both request and response, including the method and target resource. However, HTTP proxies were originally designed to examine the HTTP request line and headers to know where to forward the request and how to cache it. Proxies aren't necessarily trustworthy intermediaries, so letting them read this information is a breach of confidentiality. The CONNECT method is used to work around this problem.

The CONNECT method is reserved in HTTP/1.1, but it is specified in a separate standard [RFC2817]. When the proxy receives a CONNECT request, it connects to the server and then blindly proxies any data sent by the client to the server and from the server to the client. This allows the client and server to have an end-to-end encrypted connection, even when a proxy is in the way.
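For illustration, the request a client sends to the proxy names only the destination host and port, never the resource it intends to fetch. The Python sketch below builds such a request; the helper name build_connect_request is an assumption, and the default port of 443 is simply the usual HTTPS port.

```python
def build_connect_request(host, port=443):
    """Return the bytes of a CONNECT request for a proxy (hypothetical helper).

    The request target is "host:port" rather than a path, so the proxy
    learns where to connect but nothing about the resource requested.
    """
    authority = f"{host}:{port}"
    lines = [f"CONNECT {authority} HTTP/1.1", f"Host: {authority}"]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_connect_request("www.example.com").decode("ascii"))
```

Once the proxy answers with a success status, the client starts the SSL/TLS handshake over the same connection, and everything after that point is opaque to the proxy. Python's standard http.client module wraps this exchange in HTTPConnection.set_tunnel.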