15.1 Metadata Usage | WebDAV. Next Generation Collaborative Web Authoring

A large part of a custom application (like the problem-solving environment described in Chapter 14, Custom WebDAV Applications) can be the properties: the data they contain and the way they are organized. A little thought up front can make the application work better. It's not necessary to determine the entire schema up front, though: one of the benefits of WebDAV is that a resource's metadata is flexible and can easily be extended or altered later.

15.1.1 Choosing a Namespace

Choose at least one unique namespace. No other group should define elements or properties in this namespace unless there's some strong mechanism for coordination. To ensure this, use a domain name to which your group or your company retains rights. For example, Xythos Software, Inc., uses www.xythos.com as part of its namespaces, and the WebDAV resources site could use www.webdav.org. Only Xythos should use www.xythos.com namespaces, unless some other software implementor is specifically adding compatibility for properties defined by Xythos. Domain names work very well for ensuring uniqueness, because a domain name registry exists and because any legal URL is a legal namespace. Turn the domain name into a namespace by filling out the URL with a protocol scheme and perhaps a category or product name. For the Xythos WebFile server, the namespace used is http://www.xythos.com/namespaces/storageServer.
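A minimal Python sketch of that recipe, using the standard library's ElementTree. The domain www.example.com and the product name resumeRepository are hypothetical stand-ins. The last lines show why the namespace URI, not the prefix, is what has to be unique: two documents that bind different prefixes to the same URI yield elements with identical names.

```python
import xml.etree.ElementTree as ET


def namespace_for(domain: str, product: str) -> str:
    """Build a namespace URI from a domain your group controls by adding a
    scheme and a product name, following the Xythos pattern."""
    return f"http://{domain}/namespaces/{product}"


# Hypothetical domain and product name, for illustration only.
NS = namespace_for("www.example.com", "resumeRepository")

# ElementTree names namespaced elements as {namespace-uri}local-name.
prop = ET.Element(f"{{{NS}}}jobCategory")
prop.text = "development"

# The prefix is only a local alias: different prefixes bound to the same
# URI produce elements with the same fully qualified name.
a = ET.fromstring(f'<a:jobCategory xmlns:a="{NS}"/>')
b = ET.fromstring(f'<zz:jobCategory xmlns:zz="{NS}"/>')
print(NS)
print(a.tag == b.tag == prop.tag)
```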

15.1.2 Property Names

If the namespace is well chosen, it should be easy to pick a property name that does not conflict with any other property. This is just like naming variables inside code: Choose a name that communicates what the property stands for.

Note that a valid XML name (and therefore a valid property name) must begin with a letter or an underscore, although the rest of the name can contain letters, digits, hyphens, underscores, and periods. Thus, a property name of 800number is illegal, but toll-free-number and numberin800code are valid.
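A client or server could check candidate names before using them. The sketch below is an ASCII-only approximation of the rule: the full XML specification also allows many non-ASCII letters, which this pattern deliberately omits.

```python
import re

# ASCII-only approximation of the XML rule for namespaced element names:
# first character a letter or underscore; the rest letters, digits,
# hyphens, underscores, or periods. Non-ASCII letters are not handled.
_NAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_property_name(name: str) -> bool:
    return _NAME_ASCII.fullmatch(name) is not None


for candidate in ["800number", "toll-free-number", "numberin800code"]:
    print(candidate, is_valid_property_name(candidate))
```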

15.1.3 Property Values

Recall that one of the benefits the Pacific Northwest National Laboratory researchers found (see Section 14.4 in Chapter 14) was that WebDAV is easy to debug, in part because it is all text. A tracing tool can be used to capture a WebDAV request and response, and frequently a bug can be found simply by examining the text of that request and response. Keep this benefit working for you by using human-readable property names and property values. Exchange 2000 did not do this for all properties, and one of the challenges of interoperating with Exchange 2000 servers is figuring out what oddly named or formatted properties might mean.
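As an illustration, here is a sketch of a PROPPATCH request body whose property names are descriptive and whose date is in ISO 8601 form, so it reads clearly in a trace. The namespace and the property names receivedDate and jobCategory are hypothetical; the DAV: elements propertyupdate, set, and prop are the standard PROPPATCH structure.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

DAV = "DAV:"
NS = "http://www.example.com/namespaces/resumeRepository"  # hypothetical

# Prefixes used when serializing; they do not affect meaning.
ET.register_namespace("D", DAV)
ET.register_namespace("r", NS)


def proppatch_body(props: dict[str, str]) -> str:
    """Build a PROPPATCH body that sets each (local name -> text) pair in NS."""
    update = ET.Element(f"{{{DAV}}}propertyupdate")
    prop = ET.SubElement(ET.SubElement(update, f"{{{DAV}}}set"), f"{{{DAV}}}prop")
    for name, value in props.items():
        ET.SubElement(prop, f"{{{NS}}}{name}").text = value
    return ET.tostring(update, encoding="unicode")


received = datetime(2002, 3, 14, 9, 30, tzinfo=timezone.utc)
body = proppatch_body({
    # A descriptive name and an ISO 8601 date are easy to read in a trace,
    # unlike an abbreviated name holding a raw epoch number.
    "receivedDate": received.isoformat(),
    "jobCategory": "development",
})
print(body)
```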

Complex Values

Some data modeling involves heavy use of hierarchy. Where possible, flatten the hierarchy: with WebDAV, it is easier for a client to ask for individual properties and get back single values. However, when some property substructure is required, it can be handled using XML values. For example, a value for a userFullName property might be:

    <x:userFullName xmlns:x="http://www.example.com/namespace/">
      <x:first>Thomas</x:first>
      <x:last>Poole</x:last>
    </x:userFullName>

XML elements inside property values should also have namespaces to help guarantee uniqueness and extensibility.
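On the client side, a structured value like userFullName can be read by matching child elements on namespace plus local name, never on the prefix. A minimal sketch with ElementTree; the function name parse_full_name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.example.com/namespace/"

value = (
    '<x:userFullName xmlns:x="http://www.example.com/namespace/">'
    "<x:first>Thomas</x:first><x:last>Poole</x:last>"
    "</x:userFullName>"
)


def parse_full_name(xml_text: str) -> dict[str, str]:
    """Flatten the structured userFullName value into a dict. Children are
    located by {namespace}local-name, so any prefix works."""
    root = ET.fromstring(xml_text)
    return {
        "first": root.findtext(f"{{{NS}}}first", default=""),
        "last": root.findtext(f"{{{NS}}}last", default=""),
    }


print(parse_full_name(value))
```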

15.1.4 Property Location

Properties can be added to any collection or file. However, the location of a property matters, because clients need to address a particular resource to find out the value of a property. If the client can get the properties it needs with fewer requests, the system will perform better. For example, Exchange 2000 puts properties such as the number of children and the number of unread children on folders. Even though the client could calculate these counts itself (by asking for the unread property on every child resource), the server keeps track of them to improve performance.

Sometimes a resource must be used to hold properties that semantically apply to some other concept, such as a user or a repository. For example:

The root collection on a repository might be used to hold properties applying to the whole repository (rather than duplicate that property value on every resource in the repository).
Both Xythos WebFile Server and Apple's iDisk service use properties on the user's home directory to maintain information such as the user's quota for storage usage.

One consideration in determining the location for properties is performance. If a number of properties will be requested as part of a common client action, these properties ought to be retrievable in a small number of requests.
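For example, a listing view can request every property it needs for a collection and all of its members in a single PROPFIND with a Depth: 1 header, rather than one request per resource. A sketch using http.client from the standard library; the namespace, the properties jobCategory and receivedDate, and the helper names are hypothetical, while displayname and getlastmodified are standard DAV: properties.

```python
import http.client
import xml.etree.ElementTree as ET

DAV = "DAV:"
NS = "http://www.example.com/namespaces/resumeRepository"  # hypothetical

ET.register_namespace("D", DAV)
ET.register_namespace("r", NS)

# Everything the (hypothetical) listing view displays, requested together.
LISTING_PROPS = [
    (DAV, "displayname"),
    (DAV, "getlastmodified"),
    (NS, "jobCategory"),
    (NS, "receivedDate"),
]


def propfind_body(names: list[tuple[str, str]]) -> bytes:
    """Build one PROPFIND body naming every requested property."""
    root = ET.Element(f"{{{DAV}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV}}}prop")
    for ns, local in names:
        ET.SubElement(prop, f"{{{ns}}}{local}")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def fetch_listing(host: str, path: str) -> bytes:
    """Fetch the listing properties of a collection and its children in a
    single round trip. Not called here; it needs a live WebDAV server."""
    conn = http.client.HTTPConnection(host)
    try:
        conn.request(
            "PROPFIND", path, body=propfind_body(LISTING_PROPS),
            headers={"Depth": "1",
                     "Content-Type": 'application/xml; charset="utf-8"'},
        )
        # On success the server replies 207 Multi-Status.
        return conn.getresponse().read()
    finally:
        conn.close()
```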

15.1.5 Group Resources in Collections

It can be hard to decide what collections to put resources into. The choice is important because it governs how users browse collections and how fast or large PROPFIND responses are when used for browsing.

How would resources such as résumés be grouped into collections in a résumé repository replacing a résumé database? A natural semantic grouping is the job type the résumé submitter was seeking. However, semantic groupings can break down. Can a résumé belong both to "development" and to "user education"? Does it have to be copied from one collection to the other?

The other extreme is to put all resources in one massive collection. If a résumé repository tosses all résumés into one collection, the application interface can still allow résumé searching and selection via other mechanisms. Still, this may not work well on every WebDAV server: performance may suffer, and user browsing certainly will.

Another option is to sort documents into collections according to some type of information that isn't ambiguous and doesn't change. For example, résumés could be put into subcollections according to the month of the year they were received. Metadata can be used, rather than collection membership, to indicate that a résumé was submitted for "development," "user education," or both.
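A sketch of that scheme. It assumes, beyond the text, that the year is included with the month so that each collection stops growing once its month is over; the /resumes/ base path, the file name, the namespace, and the jobCategories and category element names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = "http://www.example.com/namespaces/resumeRepository"  # hypothetical
ET.register_namespace("r", NS)


def resume_url(base: str, received: date, filename: str) -> str:
    """Place a résumé by the year and month it arrived: a fact that is
    unambiguous and never changes after submission (year is an assumption)."""
    return f"{base.rstrip('/')}/{received:%Y-%m}/{filename}"


def job_categories_value(categories: list[str]) -> ET.Element:
    """Job categories are metadata, so one résumé can carry several
    without being copied between collections."""
    value = ET.Element(f"{{{NS}}}jobCategories")
    for name in categories:
        ET.SubElement(value, f"{{{NS}}}category").text = name
    return value


url = resume_url("/resumes/", date(2002, 3, 14), "tpoole.doc")
value = job_categories_value(["development", "user education"])
print(url)
print(ET.tostring(value, encoding="unicode"))
```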

15.1.6 Use WebDAV for Global Information

Consider one of the problems of traditional database-backed custom applications. One of the most common ways to deploy a custom client/server database-backed application is to design a database architecture, write client software that uses SQL to access the database, and deploy the client software.

A major problem with that kind of solution, common though it is, is the need to upgrade. For some simple upgrades, the database architecture could be enhanced in one step and the enhanced client software deployed in the next step. However, when more extensive database architecture changes were made that the client wasn't prepared to handle, the database upgrade and deployment of new versions of client software had to be carefully and expensively coordinated.

Designers of these systems came up with all kinds of tricks to minimize the need to upgrade. For example, imagine that the client software needs to present a drop-down list for "Operating System." If the values are hard-coded into the client software, then every time a new value comes up, the client software must be upgraded. Well, that's just so painful that a better solution was quickly found. A table called "OS_TYPES" would be created on the database and populated with values like "Windows 98," "Windows ME," and "Macintosh OS 9." When Apple ships Macintosh OS X, the new value can be added to the table on the server, and all the clients pick it up with a simple SQL query.

WebDAV can enable the same trick. Rather than create a table, the repository can provide a property on some globally readable resource. That property can contain all the values for some drop-down selection widget. Alternatively, a file (such as an XML schema) stored on the server can serve the same purpose.
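On the client, the drop-down values can then be read out of the PROPFIND response instead of being hard-coded. In this sketch a hard-coded sample reply stands in for what the server would return; the /config/ resource, the namespace, and the osTypes and value element names are hypothetical, while multistatus, response, propstat, prop, and status are the standard WebDAV response elements.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"
NS = "http://www.example.com/namespaces/appConfig"  # hypothetical

# Stand-in for a 207 Multi-Status reply to a PROPFIND on a globally
# readable resource (the hypothetical /config/ collection).
SAMPLE_RESPONSE = f"""<D:multistatus xmlns:D="DAV:" xmlns:c="{NS}">
  <D:response>
    <D:href>/config/</D:href>
    <D:propstat>
      <D:prop>
        <c:osTypes>
          <c:value>Windows 98</c:value>
          <c:value>Windows ME</c:value>
          <c:value>Macintosh OS 9</c:value>
          <c:value>Macintosh OS X</c:value>
        </c:osTypes>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""


def dropdown_choices(multistatus: str, prop_name: str) -> list[str]:
    """Return the <c:value> entries of a list-valued property, or [] if
    the server did not return the property with a 200 status."""
    root = ET.fromstring(multistatus)
    for propstat in root.iter(f"{{{DAV}}}propstat"):
        status = propstat.findtext(f"{{{DAV}}}status", default="")
        prop = propstat.find(f"{{{DAV}}}prop")
        if " 200 " not in status or prop is None:
            continue
        found = prop.find(f"{{{NS}}}{prop_name}")
        if found is not None:
            return [(v.text or "").strip() for v in found.findall(f"{{{NS}}}value")]
    return []


print(dropdown_choices(SAMPLE_RESPONSE, "osTypes"))
```

Adding a new operating system then means one PROPPATCH on the server; no client has to be redeployed.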

This kind of approach makes it easier still to build rich clients and custom applications that can be improved and upgraded without coordinating every change with a client deployment.

15.1.7 Handling Multiple Document Schemas

WebDAV already handles custom applications involving documents with various MIME types. For the photo album example design described in the last chapter, the situation is simple: Browsers already know how to display images based on file name extension (.gif, .jpg) or content type. All pictures in a photo album have the same set of properties (title, date taken, display order), so there's little need to distinguish between document schemas.

On the other hand, consider the email and calendaring example. Modern email and collaboration applications frequently handle many kinds of objects, and these different objects have very different sets of properties. Email messages have "to" and "from" properties; appointments have "start-time" and "end-time" properties. However, these objects could all be represented with XML bodies and properties and share the same MIME type. Where the objects really differ is in their metadata schema (whether it's formally defined, as in a DTD, or informally defined). For an application like this, here are the questions that need to be answered:

Are documents with different metadata schemas going to appear in the same folder, or will they be segregated (e.g., invoices go in the invoice folder, receipts in the receipt folder)?
If documents with different schemas appear in the same folder, how are they identified so that the client knows what properties to display for that resource?
If documents are segregated by their schemas into different folders, how are the folders identified or labeled so that the client knows what kind of resource is inside? (Hint: Don't use the name of the folder. This is likely to change for localization or other reasons. Use a property on the folder instead.)
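One way a client might answer these questions is sketched below. It assumes a hypothetical docType property: mixed folders put it on each resource, segregated folders put it on the folder, and the client never looks at the folder name. The document kinds and their property lists are likewise invented for illustration.

```python
# Which properties the client shows for each kind of document. Keys are
# values of a hypothetical docType property, not folder names, so renaming
# or localizing a folder does not break the client.
DISPLAY_PROPS = {
    "invoice": ["invoiceNumber", "amountDue", "dueDate"],
    "receipt": ["receiptNumber", "amountPaid", "paidDate"],
}


def display_props(resource_props: dict[str, str],
                  folder_props: dict[str, str]) -> list[str]:
    """Prefer the resource's own docType (mixed folders); fall back to the
    folder's docType (segregated folders); show nothing special otherwise."""
    doc_type = resource_props.get("docType") or folder_props.get("docType")
    return DISPLAY_PROPS.get(doc_type, [])


print(display_props({}, {"docType": "invoice"}))
```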

In extreme cases, repositories can have a large number of different schemas (invoices, receipts, applications, résumés, expense reports), and these can change over time. A very flexible and extensible approach to this problem is to have a special location somewhere on the repository for referring to schemas [Lee00]. Note that a schema is never needed for WebDAV properties, but it can sometimes provide helpful additional information to clients. With a schema, a client may be able to deal intelligently with properties it has never seen before.

The schema approach typically involves a property on each resource that points to the most appropriate schema for interpreting that resource. The schema might provide extra information about properties, such as data type, maximum value, or minimum value. It could indicate which properties should be displayed in listings and which are content-indexed and can therefore be searched with text-search forms, and it can include property descriptions for human consumption.
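A sketch of how a client might use such a schema once it has fetched it. The schema format below is invented for illustration and is not a standard; a real repository might use an existing schema language instead. The client reads per-property hints (type, minimum, maximum, whether to display, whether indexed) and uses them to pick listing columns, find searchable properties, and sanity-check values. Properties the schema does not mention are accepted, since WebDAV itself never requires a schema.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical schema format and namespace, invented for this sketch.
SNS = "{http://www.example.com/namespaces/schemas}"
SCHEMA_XML = """
<schema xmlns="http://www.example.com/namespaces/schemas">
  <property name="amountDue" type="decimal" min="0" max="100000"
            display="true" indexed="false">Amount owed, in US dollars</property>
  <property name="notes" type="string"
            display="false" indexed="true">Free-text notes</property>
</schema>
"""


def load_schema(text: str) -> dict[str, dict]:
    """Map each property name to the hints the schema gives about it."""
    rules = {}
    for p in ET.fromstring(text.strip()).iter(f"{SNS}property"):
        rules[p.get("name")] = {
            "type": p.get("type", "string"),
            "min": p.get("min"),
            "max": p.get("max"),
            "display": p.get("display") == "true",
            "indexed": p.get("indexed") == "true",
            "description": (p.text or "").strip(),
        }
    return rules


def check_value(rules: dict[str, dict], name: str, raw: str) -> bool:
    """Check a value against the schema hints; unknown properties pass."""
    rule = rules.get(name)
    if rule is None or rule["type"] != "decimal":
        return True
    try:
        value = Decimal(raw)
    except InvalidOperation:
        return False
    if not value.is_finite():
        return False
    if rule["min"] is not None and value < Decimal(rule["min"]):
        return False
    if rule["max"] is not None and value > Decimal(rule["max"]):
        return False
    return True


rules = load_schema(SCHEMA_XML)
listing_columns = [n for n, r in rules.items() if r["display"]]
searchable = [n for n, r in rules.items() if r["indexed"]]
print(listing_columns, searchable)
```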