Metadata is data about data: information about a resource, which is itself information. In WebDAV, all resources can have metadata. Metadata is increasingly useful on the World Wide Web because the Web has grown larger and more sophisticated. It's not always enough to have a Graphics Interchange Format (GIF) image on a Web site when the only information available about it is its name, lvrfplace.gif. Search engines should be able to find not only text documents with the words living room and fireplace but also images of a living room including a fireplace. Ideally, clients could find out more about the image, such as whether it's copyrighted and how much screen space to allocate to displaying it, all before downloading the image.
The data about the image is best made available separately from the image itself. If the information is only available inside the image file, the cost of downloading the entire file may be prohibitive for search engines or for browsers using slow links. Further, many file formats don't allow internal metadata.
WebDAV solves this problem by providing a framework for metadata outside the file body. It also defines a basic schema (a set of property names and types) for all kinds of resources. WebDAV also defines two methods to deal with properties: PROPFIND, to retrieve properties, and PROPPATCH, to change property values. Until Chapter 7, Property Operations, we'll simply take it for granted that these methods are used somehow to get or set property values.
The model for metadata is a flat property/value space. Every property on a resource must have a unique name. Properties cannot contain each other, nor can properties have properties.
Property Values
Property values are always Unicode strings, typically encoded as UTF-8 or UTF-16. The client and server can also negotiate another character encoding.
Just about any data type or structured object can be converted to and from a string format, so properties are pretty powerful.
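As a rough illustration of that claim, the sketch below keeps a flat mapping from property name to string value and converts an integer and a timestamp to and from strings. The store, the helper functions, and the property names page-count and reviewed-at are all hypothetical, chosen only for this example; WebDAV itself doesn't prescribe these string formats for custom properties.

```python
from datetime import datetime, timezone

# Hypothetical flat property store for one resource:
# every name is unique and every value is a plain string.
properties: dict[str, str] = {}

def set_int(name: str, value: int) -> None:
    # Store an integer as its decimal string form.
    properties[name] = str(value)

def get_int(name: str) -> int:
    # Parse the stored string back into an integer.
    return int(properties[name])

def set_datetime(name: str, value: datetime) -> None:
    # Normalize to UTC and store as an ISO 8601 string (an assumed convention).
    properties[name] = value.astimezone(timezone.utc).isoformat()

def get_datetime(name: str) -> datetime:
    # Parse the ISO 8601 string back into a timezone-aware datetime.
    return datetime.fromisoformat(properties[name])

set_int("page-count", 42)
set_datetime("reviewed-at", datetime(2003, 5, 1, 12, 0, tzinfo=timezone.utc))
print(properties)
```

The client that writes a value and the client that reads it have to agree on the string format, because the store itself only sees strings.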
Both PROPFIND and PROPPATCH use XML message bodies to marshal the XML property names and string property values. String values are usually fairly easy to transmit as text inside an XML element. It's only tricky when the WebDAV property value is also structured as XML, but that topic is covered in detail in Section 7.1.8. A property may exist but have an empty value. Sometimes this is because the mere existence of a property is used to convey information. A property with an empty value is different from a nonexistent property in that:
The flexibility of unlimited properties on every resource, property names, namespaces, and the ability to define any data type (as long as it can be represented as a string) means that WebDAV metadata is very powerful. This is the fundamental reason that WebDAV can be used for so many applications.
4.4.1 Live and Dead Properties
The distinction between live and dead properties is rather arbitrary, but it matters because WebDAV defines them to behave differently in certain protocol operations. All the properties defined in the previous section are live properties. WebDAV defines live and dead properties in RFC2518:
Live Property - A property whose semantics and syntax are enforced by the server. For example, the live "getcontentlength" property has its value, the length of the entity returned by a GET request, automatically calculated by the server.
Dead Property - A property whose semantics and syntax are not enforced by the server. The server only records the value of a dead property; the client is responsible for maintaining the consistency of the syntax and semantics of a dead property.
By this definition, live properties include:
Dead properties are written and read by clients without any interference from the WebDAV server. The server doesn't care whether a dead property is set, what the value is, what the data type is, or how big the property is. An example could be the employee fullname property used in the employee directory scenario: Clients set the full name and dynamically generated Web pages display it, but the WebDAV server engine isn't involved except by storing and retrieving the value when asked.
4.4.2 Required Live Properties
WebDAV requires certain properties to exist on any resource (see Table 4-1). Some of these, like resourcetype, are crucial to doing WebDAV operations correctly. Others, like creationdate, are more informational. Some are protected, which means clients may not set their values. These properties can be used to display file listings in Windows Explorer style (showing size, type, and last modified date) as well as other common displays. This makes it possible for a WebDAV collection to be displayed and used like a regular file system directory.
These semantic descriptions are based on the assumption that the resource is static. A static resource has a well-defined getlastmodified value, but who knows what value to give that property on a dynamic resource, where the content is being regenerated for every GET request? These properties are discussed in more detail in Chapter 7.
4.4.3 MOVE, COPY, and Properties
The behavior of live and dead properties matters most for MOVE and COPY. Since dead properties aren't ever checked or set by the server, dead properties are always moved or copied as part of the resource. However, for live properties, the most appropriate behavior depends on the semantics of the property and on the operation.
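One way to picture this split is the minimal sketch below, which is a toy model and not any real server's implementation. The Resource class and copy_resource function are hypothetical names. Dead properties are stored and copied verbatim, while the one live property modeled here, getcontentlength, is recomputed from the body of whichever resource is asked for it.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """Toy model of a resource: a body plus client-managed dead properties."""
    body: bytes
    dead: dict[str, str] = field(default_factory=dict)

    def live(self) -> dict[str, str]:
        # Live values are computed by the server on demand, never stored by clients.
        return {"getcontentlength": str(len(self.body))}

def copy_resource(src: Resource) -> Resource:
    # Dead properties travel with the resource unchanged (copied, not shared);
    # live properties are not copied because the destination recomputes them.
    return Resource(body=src.body, dead=dict(src.dead))

src = Resource(b"hello", {"fullname": "Ada Lovelace"})
dst = copy_resource(src)
print(dst.dead, dst.live())
```

In this model the live value at the destination happens to match the source because the body is identical. For other live properties the right outcome depends on the property's semantics, which this sketch doesn't attempt to capture.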
In the absence of specific information about how to treat the property in COPY or MOVE, the server should try to copy or move the property value but still provide the same semantics or calculations.
4.4.4 Property Names
Property names are XML elements, so there are strict rules about what characters can appear in property names. The first character can only be:
Subsequent characters can be:
That means that XML element names may not include, in any location, spacing or most punctuation. The only nonalphanumeric ASCII characters allowed are the hyphen, underscore, and period, plus the colon, which is reserved for namespace prefixes. The W3C Recommendation for XML lists every legal character range exhaustively. Here's a list of sample legal property names. Note that case is important, so none of these property names is equivalent to other names:
getcontentlanguage
getContentLanguage
GETCONTENTLANGUAGE
get-content-language
get_content_language
_getcontentlanguage
W-2_Income
catégorie
The following are illegal property names:
These rules are stricter than most property name rules in pre-existing software packages, which sometimes causes implementation problems. For example, Microsoft Exchange exposes a number of existing mail and calendar objects over WebDAV, but these objects already had property names, which did not always fit the XML element name restrictions. Some of the illegal property names were simply changed, but others were transformed into legal XML element names in a reversible manner.
Typically, clients display their own names for well-known properties. Windows Explorer (when configured to use English) displays directory listings with a "Size" column and displays information about each file including the "Type of File." The values come from the WebDAV properties getcontentlength and getcontenttype. In a French localized WebDAV client, those well-known properties might be displayed instead as "Taille" and "Type de contenu." These localized strings derive from the client's ability to display these well-known properties and localize all strings, not from the server.
A reversible transformation could allow users to type in any property name when creating a new property. It's easier for users if they don't have to learn the odd XML element name rules. However, neither the XML nor the WebDAV standard provides a standard transformation. The URL-encoding mechanism defined in the URL standard [RFC1738] is a familiar mechanism (see Section 3.1.3), but it uses the percent symbol (%), which is illegal in XML element names. Microsoft implemented transformations using an algorithm very similar to URL-encoding. Each illegal character is replaced with a sequence of seven characters. The first and last characters are underscores, the second character is an x, and the middle of the sandwich is the Unicode hex code for the character (preceded by enough zeros to make up four characters). Thus, "first name" is transformed to the property name first_x0020_name.
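The escaping scheme just described can be sketched as follows. This is a simplified illustration, not Microsoft's actual implementation. The function names are hypothetical, and the test for which characters are "legal" is an approximation: it keeps Unicode letters and digits plus hyphen, underscore, and period, ignores the stricter rule for the first character, and only handles characters whose code fits in four hex digits.

```python
import re

def _is_plain(ch: str) -> bool:
    # Simplified assumption: letters, digits, hyphen, underscore, and period
    # are kept as-is. The real XML name rules are more detailed than this.
    return ch.isalnum() or ch in "-_."

def escape_name(name: str) -> str:
    # Replace each disallowed character with _xHHHH_, where HHHH is the
    # character's Unicode code point as four zero-padded hex digits.
    return "".join(ch if _is_plain(ch) else f"_x{ord(ch):04X}_" for ch in name)

# Matches one escaped character, such as _x0020_.
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")

def unescape_name(escaped: str) -> str:
    # Reverse the transform by decoding every _xHHHH_ sequence.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)

print(escape_name("first name"))
print(unescape_name("first_x0020_name"))
```

A property that was never escaped but happens to contain a sequence like _x0020_ would be decoded by unescape_name anyway, so the reverse step can misfire on such names.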
Microsoft clients automatically detect the pattern and reverse the transform. There is no completely reliable way to detect whether a property name is escaped, but errors should be rare.
4.4.5 Property Namespaces
Property names must be unique, even if the property name is intended to be used only by a single software package. For example, a WebDAV client implementation might want to put some custom information in a property called lock-semantics. How is that client supposed to guarantee that the property isn't already being used by some other client or by the server? Interoperability would be quite difficult if custom properties could easily overlap. With overlapping property names, clients might expect to write a property that the server has protected, or clients might fail to parse a property with a new value in a different format.
To solve this problem, WebDAV requires the use of namespaces. A namespace is simply a qualifier that disambiguates names. In XML documents, namespaces are abbreviated with prefixes, and the prefixes are put in front of property names, but we'll see later how that's done. The namespace for all the required WebDAV properties is the DAV: namespace.
Namespaces are defined in a W3C Recommendation that extends XML [Bray99]. A namespace must be a valid URI, so it can be any well-formed URL. A URL used as a namespace doesn't need to refer to a real resource; it just has to be in the correct URL syntax. There are three basic approaches to choosing namespaces:
Table 4-2 contains actual examples of properties and namespaces used by a few shipping WebDAV implementations. Note how different the namespace names are.
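To make the role of namespaces concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The XML fragment is made up for illustration: the namespace http://example.com/ns/ and the value exclusive are assumptions, and the prefixes D and Z are arbitrary. The parser reports each property as a namespace plus a local name, so two properties with the same local name in different namespaces never collide.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: one property in the DAV: namespace and one custom
# property in a made-up namespace. The prefixes D and Z are arbitrary.
body = """<D:prop xmlns:D="DAV:" xmlns:Z="http://example.com/ns/">
  <D:creationdate>2003-05-01T12:00:00Z</D:creationdate>
  <Z:lock-semantics>exclusive</Z:lock-semantics>
</D:prop>"""

def split_tag(tag: str) -> tuple[str, str]:
    # ElementTree writes qualified names as "{namespace}localname".
    namespace, _, local = tag[1:].partition("}")
    return namespace, local

root = ET.fromstring(body)
names = [split_tag(child.tag) for child in root]
print(names)
```

Changing the prefixes in the fragment, for example from D to X, leaves the parsed result unchanged, because the prefix is only an abbreviation for the namespace.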
In ordinary text, namespaces are often placed in front of names: The property creationdate in the namespace DAV: is referred to as DAV:creationdate. However, this is only used as a convenience in human documents and is not actually part of any specification.
4.4.6 Property Values
All WebDAV properties are expressed in XML; that is, all property values are transmitted as strings within an XML envelope. However, some property values have additional syntax guidelines or value transformations.
Data typing is done only through specification writing in WebDAV. That is, a property is known to be a particular type only by reading the design specification. There's no standard way of signaling the data type of an unknown property. There have been some W3C proposals and recommendations involving data typing of XML elements [Biron01]. A proposal has been made to the WebDAV Working Group for data typing WebDAV properties [Reschke03a], but no standard has emerged.
4.4.7 Internationalization
Since property values may be in different languages in WebDAV, it must also be possible to store and retrieve language information for each property. WebDAV is designed to comply with the standard IETF internationalization practices [RFC2277]. Some text must be accompanied by a language identifier for the client to display, render, or otherwise handle the text correctly [Dürst02]. Some examples:
XML has explicit provisions for tagging property values with a language code in an attribute called lang in the xml namespace. These languages are identified with standard codes [RFC3066]. WebDAV reuses the standard XML mechanism. If the client sends a PROPPATCH request with a language specified for a property value, the server must store that information and return it whenever the property value is retrieved with PROPFIND.
The WebDAV specification doesn't say whether multiple variants of a property could exist in multiple languages. For example, if one client sets an employee title property to Engineer with a language value of en, and another client sets the same property to Ingénieur with a language value of fr, are both values stored, or does one replace the other? Most servers overwrite the property. These servers store a single language code (or nothing if the language is unknown) along with a single value for the property. On these servers and the clients that use these servers, the language identifier is still useful for display and rendering.