15.2 Performance Considerations | WebDAV. Next Generation Collaborative Web Authoring

Whenever the WebDAV repository will be accessed over the Internet or by a large number of users, performance considerations are important. Many tricks that improve performance also improve scalability and vice versa, so don't ignore one if the other is important.

Improving server performance is an art, but it's an art that depends more on programming choices (language, thread and memory management, and data lookup and caching) than on protocol usage choices. In other words, designing a high-performance WebDAV server is a lot like designing any other Internet server. Besides, not many implementors will write their own WebDAV server from scratch. So I'll ignore the server implementation performance considerations and refer readers to language-specific resources like [Bulka00].

The usual WebDAV performance problem isn't slow servers, it's inefficient clients. Clients that make more requests and more roundtrips than are needed not only slow their own performance, but they also make demands of the server that may affect other clients and the server's performance overall. There's very little the WebDAV server implementor or administrator can do about a client that makes more requests than it needs to.

WebDAV was designed to minimize roundtrips because the latency of an HTTP request across the open Internet is high enough to become quite painful when repeated. Early Web browser designers discovered this once Web page authors started putting a dozen small images (icons, bullets, logos) in a Web page [Padmanabhan95]. If the Web browser downloads the HTML document, then requests the first image and waits for the response before asking for the second image, the document takes at least 2 × (n + 1) × L seconds to load, where n is the number of images and L is the latency, that is, the number of seconds it takes for a short message to arrive at its destination. Each of the n + 1 requests costs one roundtrip, or 2 × L. Some HTTP mechanisms were designed to alleviate the latency cost.
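To make the formula concrete, here is a minimal sketch of the arithmetic. The function names and the sample values are illustrative assumptions, not figures from the text, and the second function models an idealized client that sends every image request at once after the HTML arrives.

```python
def sequential_load_time(num_images: int, latency: float) -> float:
    """Lower bound when each request waits for the previous response.

    One request/response exchange costs a roundtrip of 2 * latency, and
    the page needs num_images + 1 exchanges (the HTML plus each image).
    """
    return 2 * (num_images + 1) * latency


def overlapped_load_time(latency: float) -> float:
    """Idealized lower bound when all image requests are sent together.

    Assumes one roundtrip for the HTML and one more that covers every
    image request, ignoring transfer time and server processing.
    """
    return 2 * 2 * latency


# Assumed sample values: 12 images, 100 ms one-way latency.
print(sequential_load_time(12, 0.1))
print(overlapped_load_time(0.1))
```

Under these assumptions the sequential client spends roughly 2.6 seconds waiting on latency alone, while the idealized overlapped client spends roughly 0.4 seconds.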

15.2.1 Minimizing Unnecessary Requests

There are a number of ways to reduce roundtrips, and one of them is simply to think carefully whether each request is really required, or whether multiple requests can be combined.

Use the Depth header wisely. A request that applies to one resource can frequently be applied to many resources at once. For example, a client can build a full picture of the directory hierarchy more quickly with a single PROPFIND with Depth: infinity than by issuing a separate PROPFIND for every directory. There's always a tradeoff between sending many separate requests and asking for more information than is needed in one request. Servers may forbid Depth: infinity PROPFIND requests, but it should always be possible to get the properties of a collection and its children in one request with Depth: 1.
Cache authentication information. I've seen traces of WebDAV interactions where the client makes every request twice. The first time, the client attempts the request without authentication, and the second time it authenticates. It seems insane for the client not to continue sending the authentication information with each request after the initial authentication challenge, at least for a while.
Go directly to the target resource. Web Folders (IE 6.0 and Windows XP) provides a counter-example of this principle. When connecting to a URL like http://www.example.com/alice for the first time, Web Folders sends an OPTIONS request to the root directory as well as an OPTIONS request to /alice (see the sidebar, Web Folder Creation: Eight Roundtrips). Most WebDAV servers respond the same way to an OPTIONS request for both a folder and its parent. The extra OPTIONS request for the parent costs a roundtrip. It could at least have been done asynchronously rather than force the user to wait for both OPTIONS responses.
Don't keep pinging the server. Web Folders seems to do this as well. At each step of creating a Web Folder, it sends an identical PROPFIND (depth 0, allprop) request to the URL (see the sidebar).
Ask for exactly the properties that are needed. This would have avoided a roundtrip in the Web Folders interaction shown in the sidebar, because the PROPFIND allprop request doesn't return all the properties the client needs. In addition, this saves the server from having to calculate or look up values for properties that the client is going to ignore anyway.
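As a hedged illustration of the Depth advice and the "exactly the properties" advice, the sketch below builds (but does not send) a single PROPFIND request that covers a collection and its immediate children and names only the properties wanted. The helper name build_propfind and the particular properties chosen are assumptions made for the example; the DAV: namespace, the Depth header, and the propfind and prop elements are the standard WebDAV ones.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
ET.register_namespace("D", DAV_NS)  # serialize DAV: elements with a D: prefix


def build_propfind(prop_names, depth="1"):
    """Return (headers, body) for a PROPFIND that names specific properties.

    prop_names are local names of properties in the DAV: namespace.
    depth="1" covers the collection and its children in one roundtrip.
    """
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for name in prop_names:
        ET.SubElement(prop, f"{{{DAV_NS}}}{name}")
    body = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    headers = {
        "Depth": depth,
        "Content-Type": 'application/xml; charset="utf-8"',
        "Content-Length": str(len(body)),
    }
    return headers, body


# Hypothetical property selection for a folder listing.
headers, body = build_propfind(
    ["resourcetype", "getcontentlength", "getlastmodified"]
)
print(headers["Depth"])
print(body.decode("utf-8"))
```

The resulting pair could be handed to any HTTP client that accepts arbitrary methods, for example Python's http.client.HTTPConnection.request("PROPFIND", "/alice/", body=body, headers=headers).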

Web Folder Creation: Eight Roundtrips

When creating a new Web Folder for http://www.example.com/alice/, the client code generates the following requests:

OPTIONS /
OPTIONS /alice
PROPFIND /alice (depth 0, allprop, unauthenticated)
PROPFIND /alice (depth 0, allprop)
PROPFIND /alice (depth 0, asks for specific properties)

Now the client prompts the user to choose a name for the Web Folder. When the user chooses the Next button:

PROPFIND /alice (depth 0, allprop)

Now the client prompts the user to finish. When the user chooses Done:

PROPFIND /alice (depth 0, allprop)
PROPFIND /alice (depth 1, asks for specific properties)

Finally, Web Folders pops open an Explorer window with the contents of the folder.

15.2.2 Keeping Connections Alive

Recall that TCP also adds roundtrips to interactions whenever the TCP connection is terminated and needs to be restarted. HTTP has developed two tools to minimize TCP connection restart costs:

Persistent connections ("keep-alive") leave the TCP connection open between requests. HTTP/1.1 connections are persistent by default unless either side sends Connection: close, and HTTP/1.0 clients ask for the same behavior with a Connection: Keep-Alive header. With the connection left open, the client can either pipeline requests or wait for each response as usual, and in both cases it avoids paying the TCP connection setup cost again.

Pipelining spreads the roundtrip cost across many requests. The client sends multiple idempotent requests on the same connection without waiting for each response. The server still handles the requests, and returns the responses, in the order received. Many sets of operations, such as downloading all the images in a Web page, can benefit from pipelining.
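The effect of keeping a connection open can be observed with a small local experiment. The following is only a sketch under stated assumptions: CountingHandler and count_connections are hypothetical names, the server is a throwaway one bound to 127.0.0.1, and the client is Python's http.client, which reuses a connection but does not pipeline. It therefore illustrates connection reuse only, not pipelining.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Tiny local test server that counts the TCP connections it accepts."""

    protocol_version = "HTTP/1.1"  # lets http.server keep connections open
    connections = 0

    def setup(self):
        # One handler instance is created per accepted TCP connection.
        super().setup()
        CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(paths, reuse):
    """Fetch each path and return how many TCP connections the server saw."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = None
    try:
        for path in paths:
            if conn is None or not reuse:
                if conn is not None:
                    conn.close()
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", path)
            conn.getresponse().read()  # drain the body so the socket can be reused
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print(count_connections(["/a", "/b", "/c"], reuse=True))
    print(count_connections(["/a", "/b", "/c"], reuse=False))
```

With reuse enabled, the three requests should share a single TCP connection; without it, each request pays for its own connection setup.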

15.2.3 Multiple Connections

Some clients initiate many parallel HTTP requests using multiple TCP connections. This is often done to download all the images in a Web page. As soon as the image names are known, the client requests each of them, each in its own connection. This generally improves perceived performance for the user.

TCP connection startup costs are unavoidable when the client chooses to do this. An extra roundtrip is required to set up the connection. TCP slow start [Stevens98] makes the new connection less efficient at first.

Transport layer security (SSL or TLS) makes the situation only slightly worse because these protocols offer a connection resumption feature once initial shared secrets have been established. The place where SSL/TLS demands an unavoidably high cost is in setting up the very first connection and its security context, but users are accustomed to a startup cost the very first time they connect to a server.

Because multiple connections impose a higher load on the server, a considerate client implementation uses them only when the benefits outweigh the cost. For example, the client might initiate separate connections only for downloading or uploading large resources and use the original connection for all requests that are likely to be quick.

15.2.4 Minimizing Server Load

Another consideration, which sometimes conflicts with the goal of reducing roundtrips, is that many operations impose a high load on the server. Some of these have been discussed on the WebDAV mailing list:

Any Depth: infinity operation, potentially
PROPFIND allprop requests, particularly if some properties are calculated dynamically
Searches
Requests for the lockdiscovery property, which is expensive to compute on some servers
Locks that cover many resources

A general recommendation is to use Depth: infinity PROPFIND requests only when they are a clear communications improvement, not just a programming shortcut. If roundtrip reduction is the reason that the client is doing a Depth: infinity request, even though not all the information returned is being used, consider doing pipelined requests for the information that will be used.

An even stronger recommendation can be made in the case of allprop: Don't use it. WebDAV servers may have resources with many properties, even hundreds of properties. Even if a client retrieved the values for all properties on a resource, what would it do with the ones it was unfamiliar with? Thus:

If discovering what properties exist is the goal, use PROPFIND with the propname element.
If retrieving property values is the goal, use PROPFIND with the names of the desired properties.
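A minimal sketch of the propname approach follows. The request body uses the standard propname element; the sample Multi-Status response is invented for illustration (a real server's property list will differ), and the helper name property_names is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Request body asking only for the names of the properties that exist.
PROPNAME_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<D:propfind xmlns:D="DAV:"><D:propname/></D:propfind>'
)

# Hypothetical 207 Multi-Status response body, made up for this example.
SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/alice/</D:href>
    <D:propstat>
      <D:prop><D:resourcetype/><D:getlastmodified/></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>
"""


def property_names(multistatus_xml):
    """Map each href in a Multi-Status body to its qualified property names."""
    ns = {"D": "DAV:"}
    root = ET.fromstring(multistatus_xml)
    names_by_href = {}
    for response in root.findall("D:response", ns):
        href = response.findtext("D:href", namespaces=ns)
        names_by_href[href] = [
            child.tag
            for prop in response.findall("D:propstat/D:prop", ns)
            for child in prop
        ]
    return names_by_href


print(property_names(SAMPLE_RESPONSE))
```

Once the names are known, the client can follow up with a PROPFIND that lists only the properties it actually understands, rather than asking for allprop.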