7.1 Property Representation | WebDAV. Next Generation Collaborative Web Authoring

WebDAV properties are expressed in XML in PROPFIND and PROPPATCH requests and responses. The first piece to put in place is how those properties are named and expressed in XML. This section attempts to build property representation from the ground up, combining rules about how to represent property names and property values. Then when I show complete PROPFIND and PROPPATCH request and response bodies in XML in Sections 7.2 and 7.3, all the pieces will be in place to understand those examples.

7.1.1 Basic Property Value Example

A property value is represented in WebDAV messages as the text contents of an XML element. The element name is the property name. The element namespace is the property namespace (see Listing 7-1).

Listing 7-1 Basic property name/value example.

 <D:getlastmodified xmlns:D="DAV:">Thu, 16 Aug 2001 23:24:33 GMT</D:getlastmodified>

In this example, getlastmodified is the property name, DAV: is the property namespace, and the value is a string formatted as a date.
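As a minimal sketch of how a recipient might pull these three pieces apart, the following uses Python's standard xml.etree.ElementTree parser. The helper name split_qname is an assumption made for illustration; ElementTree itself reports a namespaced element name as "{namespace}local".

```python
import xml.etree.ElementTree as ET

doc = ('<D:getlastmodified xmlns:D="DAV:">'
       'Thu, 16 Aug 2001 23:24:33 GMT</D:getlastmodified>')
elem = ET.fromstring(doc)

def split_qname(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local).

    Hypothetical helper, not part of ElementTree.
    """
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag

namespace, name = split_qname(elem.tag)  # property namespace and name
value = elem.text                        # property value (element text)
print(namespace, name, value)
```

Note that the prefix D never reaches the application; only the namespace it stands for does.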

7.1.2 Property Name Only

Sometimes only the property name will appear (PROPFIND requests, Section 7.2.1). When this is required, the property name element is shown the same way but without a value. For example, the getlastmodified property is named like this:

 <D:prop xmlns:D="DAV:">
     <D:getlastmodified/>
 </D:prop>

XML marshals empty elements two ways, so it's also possible to see:

 <D:getlastmodified></D:getlastmodified>

An XML parser will treat these two as equivalent, so the WebDAV implementation doesn't have to worry about both.
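A quick way to confirm the equivalence, sketched here with Python's xml.etree.ElementTree (other parsers may expose empty content differently, for example as an empty string rather than None):

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<D:getlastmodified xmlns:D="DAV:"/>')
long_form = ET.fromstring(
    '<D:getlastmodified xmlns:D="DAV:"></D:getlastmodified>')

# Both forms yield the same name, no text, and no child elements.
same_name = short_form.tag == long_form.tag
same_text = short_form.text == long_form.text
no_children = len(short_form) == 0 and len(long_form) == 0
print(same_name, same_text, no_children)
```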

7.1.3 Empty Property Values

An empty property value is different from a property that does not exist. When a property exists on a resource but has no value, it can appear empty. The formatting of empty property values appears identical to showing property names, but the context is different (this is used in PROPFIND responses, Section 7.2.2). The following example is excerpted from a larger response. The status is showing that the property value was returned successfully; therefore, the value must be empty (see Listing 7-2).

Listing 7-2 Empty property value.

 <D:prop>
     <D:resourcetype/>
 </D:prop>
 <D:status>HTTP/1.1 200 OK</D:status>

Again, the equivalent long form of an empty element may be used instead:

 <D:resourcetype></D:resourcetype>

7.1.4 Making Property Values Safe

XML needs a way to hold any kind of text without changing the XML parsing or making the XML document invalid. This is done by making the text "safe." In XML text, < and > delimit markup and & introduces an escape sequence, so these three characters are the ones that must be treated specially. (Strictly, XML 1.0 requires escaping only < and &, plus > where it follows ]], but escaping all three is the usual practice.) Any property value containing these characters must be made safe to keep the XML document parsable and valid. Otherwise, the XML document may be unparsable or the recipient may misinterpret which characters make up the property value.

There are two ways to make text safe for XML. One is to wrap the text in a special begin and end string, unlikely to occur naturally in text. This is called encapsulation. The other method, called escaping, replaces each illegal character with a string that can be used to restore the original character when the text is removed from the XML.

Encapsulation

XML defines a way of encapsulating text that may contain illegal characters: The text is preceded by <![CDATA[ and followed by ]]>. CDATA sections cannot nest and may not include ]]>. A property named transit has its value kelowna-->penticton encapsulated as:

 <x:transit><![CDATA[kelowna-->penticton]]></x:transit>

Character Escaping

Text can also be made safe for XML by escaping each illegal character individually. Characters are escaped with the same mechanism used in HTML. Angle brackets (< and >) are replaced with &lt; and &gt;, respectively. Natural occurrences of the ampersand character (&) must be replaced with the string &amp;.

A property named transit with a value of kelowna-->penticton is escaped as:

 <x:transit>kelowna--&gt;penticton</x:transit>
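The following sketch builds both safe forms of the transit value in Python and checks that a parser recovers the same string from each. The namespace URI bound to the x prefix is a placeholder chosen for this example, and the CDATA wrapping shown is naive: it would break if the value itself contained ]]>.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "kelowna-->penticton"
NS = 'xmlns:x="http://example.com/transit"'  # placeholder namespace

# Escaping: &, < and > are replaced one character at a time.
escaped_doc = f"<x:transit {NS}>{escape(raw)}</x:transit>"

# Encapsulation: the whole value is wrapped in a CDATA section.
cdata_doc = f"<x:transit {NS}><![CDATA[{raw}]]></x:transit>"

from_escaped = ET.fromstring(escaped_doc).text
from_cdata = ET.fromstring(cdata_doc).text
print(escape(raw), from_escaped == from_cdata == raw)
```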

The XML 1.0 specification states that character escaping must not be used inside the CDATA encapsulation (one character-safety operation inside another). That means that escaping and encapsulation aren't supposed to be done in the same step. However, you'll often see double transformations in the real world because one software process will escape characters, and then another process will encapsulate, or vice versa. For example, when the client uses HTML-style escaping for a piece of text, the server may later encapsulate the value in CDATA, even if it's already legal. A recipient can follow a simple rule: if the text begins with <![CDATA[, unencapsulate it; otherwise, unescape it. The recipient should only undo one layer of character-safety transformations at a time; otherwise, the value may actually be changed to a different string than it was originally.
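The one-layer rule can be sketched as a small Python function that works on raw text (normally an XML parser performs this step for you). The function name undo_one_layer is an assumption for illustration.

```python
from xml.sax.saxutils import unescape

CDATA_START = "<![CDATA["
CDATA_END = "]]>"

def undo_one_layer(text):
    """Remove exactly one layer of XML character-safety from raw text."""
    if text.startswith(CDATA_START) and text.endswith(CDATA_END):
        # Encapsulated: strip the wrapper and leave the contents alone.
        return text[len(CDATA_START):-len(CDATA_END)]
    # Otherwise treat the text as escaped: &lt; &gt; &amp; are restored.
    return unescape(text)

once = undo_one_layer("<![CDATA[kelowna--&gt;penticton]]>")
twice = undo_one_layer(once)
print(once)
print(twice)
```

Calling the function once on a doubly transformed value leaves the inner layer in place; whether to call it a second time is a decision for the application, not for the decoding step.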

Decimal and Hexadecimal Character References

In XML 1.0, it's legal although rare to escape characters using the decimal or hexadecimal (hex) representations defined in the ISO/IEC 10646 character set. The decimal code for a character is prefixed by &# and ends in ;, and the hex code for a character is prefixed by &#x and ends in ;. Thus, there are three valid escapings for the single character >:

 &gt; &#62; &#x3E;
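The greater-than sign is code point 62 (hex 3E) in ISO/IEC 10646. As a small check, sketched with Python's xml.etree.ElementTree and a throwaway element named v, a parser decodes the named, decimal, and hex forms to the same character:

```python
import xml.etree.ElementTree as ET

forms = ["&gt;", "&#62;", "&#x3E;"]

# Wrap each reference in a throwaway element and let the parser decode it.
decoded = [ET.fromstring(f"<v>{form}</v>").text for form in forms]
print(decoded)
print(ord(">"), hex(ord(">")))
```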

When property values are set by the client, the client may choose to encapsulate or encode the value when sending it to the server. The server may or may not use the same method for making the value safe when it returns the property value, so the client must be prepared to accept a different encoding than the one used when the property value was set.

7.1.5 Storing Property Value Text

Servers must store property names, namespaces, and values and the language of the property if it was provided by the client. There are several approaches to storing properties, and countless variations exist.

The server may store the property value as sent by the client (whether unadorned, escaped, or encapsulated). It may then return the property in exactly the format it was sent and stored. (This is probably easiest for a WebDAV-only system, where property values will always be in XML. Systems that present property values to non-WebDAV clients will probably prefer to store property values in their "real" format instead.)
The server may unescape or unencapsulate the property value when it is received, before storing the value. When sending the property value out in XML, the server may check to see if it needs to be made safe. If it does, the server could use either mechanism to make the data safe.
The server may unescape or unencapsulate the property value when it is received, before storing the value. When sending the property value out in XML, the server may always perform a transformation to make the data safe, whether it needs to be made safe or not.
The server may keep track of the data type of known properties in order to decide whether to escape, encapsulate, or leave alone. For example, if the property is known to be an integer, it doesn't have to be transformed to be safe text for XML.
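As one hedged illustration of the "store the real value, make it safe on the way out" approach, the sketch below decodes whatever form the client sent and always escapes on output. The function names receive and send and the un-namespaced property names note and count are assumptions made for this example, not part of any WebDAV server.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def receive(xml_fragment):
    """Return the decoded ("real") text of a property element."""
    return ET.fromstring(xml_fragment).text or ""

def send(name, stored):
    """Marshal a stored value, escaping it whether it needs it or not."""
    return f"<{name}>{escape(stored)}</{name}>"

# The client may encapsulate or escape; the stored value is the same.
from_cdata = receive("<note><![CDATA[a < b & c]]></note>")
from_escaped = receive("<note>a &lt; b &amp; c</note>")

outgoing = send("note", from_cdata)
harmless = send("count", "42")  # escaping an integer changes nothing
print(outgoing, harmless)
```

Because escaping leaves already-safe text untouched, always escaping costs little and removes the need to inspect each value first.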

Since servers may choose any variation on any of these options, and since clients may also submit property values that have been transformed (made safe) multiple times, clients should be prepared to encounter strings like any of the following examples. The third and fourth examples are technically illegal because they contain both kinds of escaping, but they might occur anyway.

 <![CDATA[<P> This value was made safe through encapsulation. </P>]]>
 &lt;P&gt; This value was made safe through escaping. &lt;/P&gt;
 <![CDATA[&lt;P&gt; This value was already made safe through escaping, but the server encapsulated it anyway. &lt;/P&gt;]]>
 &lt;![CDATA[&lt;P&gt;This value was already safe through encapsulation, but the server escaped it anyway.&lt;/P&gt;]]&gt;
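To see what a client actually receives in each case, the sketch below feeds shortened versions of the four forms to Python's xml.etree.ElementTree. The wrapper element v and the shortened text are assumptions for brevity. The parser undoes exactly one layer, so the doubly transformed values come back still carrying their inner layer.

```python
import xml.etree.ElementTree as ET

samples = [
    "<![CDATA[<P>text</P>]]>",                    # encapsulated
    "&lt;P&gt;text&lt;/P&gt;",                    # escaped
    "<![CDATA[&lt;P&gt;text&lt;/P&gt;]]>",        # escaped, then encapsulated
    "&lt;![CDATA[&lt;P&gt;text&lt;/P&gt;]]&gt;",  # encapsulated, then escaped
]

# Wrap each sample in a throwaway element and read back its text.
parsed = [ET.fromstring(f"<v>{sample}</v>").text for sample in samples]
for value in parsed:
    print(value)
```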

7.1.6 Whitespace

Between XML elements, it doesn't matter how many whitespace characters (tabs, carriage returns, new lines, or spaces) are included. However, whitespace does matter inside XML text element values. Thus, a string property could have a value of a single space, two spaces, or no spaces and these are all different, valid values. This sometimes causes confusion when whitespace characters are added for readability in testing or debugging. For example, if a test application put spaces before or after a date, inside the element, the recipient could find this to be an invalid date value, since date values are not supposed to have spaces. For the date property named getlastmodified, the following representation would be invalid unless the spaces were removed:

 <D:getlastmodified> 2001-05-11T17:33:11Z </D:getlastmodified>

For empty property values in particular, this can cause confusion. The following example is not an empty value for the resourcetype property; it is an illegal value consisting of whitespace:

 <D:resourcetype>  </D:resourcetype>
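The difference is easy to observe with a parser. In this sketch using Python's xml.etree.ElementTree, the empty element has no text at all, while the whitespace-only element carries a two-space string:

```python
import xml.etree.ElementTree as ET

empty = ET.fromstring('<D:resourcetype xmlns:D="DAV:"/>')
blank = ET.fromstring('<D:resourcetype xmlns:D="DAV:">  </D:resourcetype>')

# ElementTree reports missing text as None and keeps whitespace verbatim.
print(repr(empty.text), repr(blank.text))
```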

Implementors should think carefully before adding or stripping leading or trailing whitespace. That's why in this book I've been very careful to add whitespace for readability only where it doesn't change the meaning of the example. Where a new line couldn't be avoided, a line-continuation marker is used, even though it shouldn't be considered part of the example.

7.1.7 Internationalization

Property names and property values must both be internationalizable. They may contain characters such as accented characters, Arabic or Hebrew script, Chinese characters, and so on. The XML body of a WebDAV message may use one of several character sets, including the required character sets UTF-8 and UTF-16. Thus, any Unicode character may be represented in an XML document and included in a WebDAV property name or property value.

XML supports Unicode via the UTF-8 and UTF-16 encodings. The recipient may have to convert string properties from the XML encoding character set to an internal representation, but in some languages this is automatic. Properties must be stored in a format compatible with their character set.

All WebDAV implementations must support both UTF-8 and UTF-16 because either set can be used in requests and responses. Other character sets must be negotiated.

Careless handling of character sets may lead to problems:

One client may set a property using a PROPPATCH method with an XML body in UTF-8 format. Another client may set a property using UTF-16. The server must be able to return both properties in an XML document in which all properties are expressed in a consistent character set. Thus, the server must be able to do transformations between character sets.
Property names may be expressed with UTF-8 or UTF-16 characters. The server must be able to compare two property names to see if they are the same property.
If the server supports DAV Searching and Locating (DASL) or some other mechanism for comparing property values, the character set must be taken into account. Sort order is particularly difficult.
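As a sketch of the first two points, the example below sends the same property in UTF-8 and in UTF-16 and lets Python's xml.etree.ElementTree decode both to its internal Unicode strings, after which names and values compare equal. The property name título and the namespace http://example.com/props are made up for this example.

```python
import xml.etree.ElementTree as ET

body = '<x:título xmlns:x="http://example.com/props">café</x:título>'

# The same document serialized in the two required encodings.
utf8_bytes = ('<?xml version="1.0" encoding="utf-8"?>' + body).encode("utf-8")
utf16_bytes = ('<?xml version="1.0" encoding="utf-16"?>' + body).encode("utf-16")

from_utf8 = ET.fromstring(utf8_bytes)
from_utf16 = ET.fromstring(utf16_bytes)

# After parsing, both are ordinary Python strings and compare directly.
same_name = from_utf8.tag == from_utf16.tag
same_value = from_utf8.text == from_utf16.text
print(same_name, same_value, utf8_bytes == utf16_bytes)
```

The byte sequences differ, so comparing names or values before decoding would wrongly treat these as two different properties.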

7.1.8 XML-Valued Properties

Some WebDAV property values are strings intended to be parsed as XML. These values contain one or more self-contained XML elements. If the value is not well-formed XML or is incomplete, then the sender has no choice but to encapsulate or escape the value. If the value is well-formed and complete, then the sender might choose to put the value directly into the XML stream. Let's take the example of an XML-formatted value:

 <home>555-1234</home><work>555-4321</work>

We'll put this inside a property named phone in the http://example.com/contacts namespace:

 <x:phone xmlns:x="http://example.com/contacts">
   <home>555-1234</home><work>555-4321</work>
 </x:phone>

That was easy, but only because the inner value does not use namespaces, and there's no need to handle prefixes. When namespaces are used, namespace prefixes must be chosen so that they are unique within a scope. The scope of a namespace declaration includes the element where the declaration is placed and every element in the hierarchy underneath, but not any part of the document outside that XML element. If the namespace is defined on the root element, it applies to all elements inside the document.

If the property value uses a new namespace not already declared within the scope, a new prefix must be chosen. For example, we'll modify the preceding example so that the home and work elements are defined in the http://example.com/contacts/phonetypes namespace.

 <x:phone xmlns:x="http://example.com/contacts"
   xmlns:y="http://example.com/contacts/phonetypes">
   <y:home>555-1234</y:home><y:work>555-4321</y:work>
 </x:phone>

A sender might be tempted to apply a simple rule: "Always declare a new prefix for every namespace appearing in the value." However, XML scoping rules prevent this if the same namespace is already declared in the same scope. For example, when the resourcetype property, in the DAV: namespace, takes a value that includes the DAV: namespace, the same prefix must be reused:

 <D:resourcetype xmlns:D="DAV:"><D:collection/></D:resourcetype>

It would be incorrect for a sender to attempt simply to encapsulate an XML value:

 <D:resourcetype xmlns:D="DAV:"><![CDATA[<D:collection/>]]></D:resourcetype>

The problem with the preceding example is that it gives the resourcetype property a value that is the string <D:collection/>. That's not the same as a value that is an empty XML element named collection in the DAV: namespace. In the former case, an XML parser will return a string-typed variable. In the latter case, an XML parser will return an XML element or node variable, with its namespace.
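The two cases look like this to a parser, sketched here with Python's xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

as_xml = ET.fromstring(
    '<D:resourcetype xmlns:D="DAV:"><D:collection/></D:resourcetype>')
as_text = ET.fromstring(
    '<D:resourcetype xmlns:D="DAV:">'
    '<![CDATA[<D:collection/>]]></D:resourcetype>')

# XML-valued: one child element in the DAV: namespace, no text.
print(len(as_xml), as_xml[0].tag, as_xml.text)

# Encapsulated: no child elements, just a string that looks like XML.
print(len(as_text), as_text.text)
```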

WebDAV implementations have to go to some trouble to put XML-formatted properties into legal XML documents. Servers can have trouble storing XML-formatted property values such that the property is reconstructed together with its namespaces without any prefix collisions. A WebDAV server needs to detect whether the client has sent an XML-formatted value for a custom property so that later the server knows whether to do prefix correction when it marshals the value in XML. Luckily, XML parsers can do that easily.
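One possible approach, sketched with Python's xml.etree.ElementTree, is to treat a property as XML-valued when its element has child elements, and to store each child as a standalone fragment that carries its own namespace declaration. This is an illustration under those assumptions, not a description of how any particular server works; the serializer chooses its own prefixes, so the stored prefix will generally differ from the client's.

```python
import xml.etree.ElementTree as ET

request = ('<x:phone xmlns:x="http://example.com/contacts" '
           'xmlns:y="http://example.com/contacts/phonetypes">'
           '<y:home>555-1234</y:home><y:work>555-4321</y:work></x:phone>')
prop = ET.fromstring(request)

# Child elements present means the client sent an XML-formatted value.
is_xml_valued = len(prop) > 0

# Serialize each child on its own; the namespace declaration travels with it.
stored = [ET.tostring(child, encoding="unicode") for child in prop]

# Later, the fragments parse back to the same names and values.
restored = [ET.fromstring(fragment) for fragment in stored]
print(is_xml_valued, [(e.tag, e.text) for e in restored])
```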

7.1.9 Date and Time Properties

Date and time properties, such as creationdate, are represented as strings in a subset of the ISO 8601 format recommended by the IETF [RFC3339]. ISO 8601 allows dates alone, times alone, or dates and times together and can include time zone information.

Some examples of values for a timestamp property such as creationdate are:

 1997-12-01T17:42:21-08:00
 2001-05-11T17:33:11Z

The part up to the "T" is the date. The part after the "T" and before the hyphen or "Z" is the time. The last piece is the time zone, either "Z" for the GMT ("Zulu") time zone or an offset from GMT in hours and minutes. In the first example, the time zone is eight hours before GMT; a plus sign would appear if the time zone were after GMT.

Although the ISO 8601 format can be parsed by humans, it's typically transformed into a more readable format for actual display; for example, "7:33 a.m., Saturday May 5, 2001." When the date is formatted for display, some accuracy may be omitted.

Note that although ISO 8601 allows date and time formats that are incomplete (e.g., the date without a time specified, or the time without a date specified), WebDAV makes further restrictions on timestamps; the date must be fully specified and the time must be fully specified, including time zone.

The getlastmodified property is a little different from regular date/time representations. Since it's defined by its relationship to the Last-Modified header in HTTP/1.1, it uses the same format, even though this format has interoperability and internationalization problems and is no longer recommended for IETF protocols. The format for the Last-Modified header and thus the getlastmodified property is defined in RFC2616. An example of the format is Tue, 15 Nov 1994 12:45:26 GMT.
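Both formats can be read with Python's standard library, as in the sketch below. The helper parse_rfc3339 is an assumption for illustration: datetime.fromisoformat is not a strict RFC 3339 parser and, before Python 3.11, does not accept a trailing "Z", so the helper rewrites it and also enforces the rule that a time zone must be present.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

def parse_rfc3339(value):
    """Parse a full date-time with time zone (hypothetical helper)."""
    if value.endswith("Z"):
        # Older Python versions only understand numeric offsets.
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("timestamp must include a time zone")
    return parsed

# creationdate style (ISO 8601 subset).
creation = parse_rfc3339("1997-12-01T17:42:21-08:00")

# getlastmodified style (HTTP date, same as the Last-Modified header).
modified = parsedate_to_datetime("Tue, 15 Nov 1994 12:45:26 GMT")

print(creation.isoformat(), modified.isoformat())
```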

Timestamp Interoperability Challenges

Dates and times are difficult to do interoperably, particularly when the precision can vary. Some sample problems:

Precision: Does the time 7:42:21 match the time 7:42:21.00? Does it match 7:42:21.01? How are these two times sorted?
Is 15:00 before or after 16:00 if they are in different zones?
Is 24 a legal hour value? Is 60 a legal second value?
Should two-digit year representations (from legacy software) be translated to a four-digit year, or should the uncertainty be preserved?
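A few of these questions can be explored with Python's datetime module, which gives one possible set of answers; other libraries and other implementations may answer differently, which is exactly the interoperability problem. The dates and the UTC-8 zone below are arbitrary choices for the example.

```python
from datetime import datetime, timedelta, timezone

# Different zones: compare instants, not wall-clock readings.
utc_minus_8 = timezone(timedelta(hours=-8))
three_pm_west = datetime(2001, 5, 11, 15, 0, tzinfo=utc_minus_8)
four_pm_utc = datetime(2001, 5, 11, 16, 0, tzinfo=timezone.utc)
west_is_later = three_pm_west > four_pm_utc  # 15:00-08:00 is 23:00 UTC

# Precision: 7:42:21 versus 7:42:21.01.
whole = datetime(2001, 5, 11, 7, 42, 21, tzinfo=timezone.utc)
fraction = datetime(2001, 5, 11, 7, 42, 21, 10000, tzinfo=timezone.utc)
exact_match = whole == fraction
match_to_the_second = whole == fraction.replace(microsecond=0)

def accepts(hour, second):
    """Report whether this library accepts the given hour and second."""
    try:
        datetime(2001, 5, 11, hour, 0, second)
        return True
    except ValueError:
        return False

print(west_is_later, exact_match, match_to_the_second)
print(accepts(24, 0), accepts(23, 60))
```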