4.4 Metadata

Metadata is data about data. It's information about a resource, which is itself information. In WebDAV, all resources can have metadata.

Metadata is increasingly useful as the World Wide Web grows larger and more sophisticated. It's not always enough to have a Graphics Interchange Format (GIF) image on a Web site when the only information available about it is its name, lvrfplace.gif. Search engines should be able to find not only text documents with the words living room and fireplace but also images of a living room that includes a fireplace. Ideally, clients could find out more about the image, such as whether it's copyrighted and how much screen space to allocate to displaying it, all before downloading the image.

The data about the image is best made available separately from the image itself. If the information is only available inside the image file, the cost of downloading the entire file may be prohibitive for search engines or for browsers using slow links. Further, many file formats don't allow internal metadata. WebDAV solves this problem by providing a framework for metadata outside the file body. It also defines a basic schema (a set of property names and types) for all kinds of resources.

WebDAV also defines two methods to deal with properties: PROPFIND, to retrieve properties, and PROPPATCH, to change property values. Until Chapter 7, Property Operations, we'll simply take it for granted that these methods are used somehow to get or set property values.

The model for metadata is a flat property/value space. Every property on a resource must have a unique name. Properties cannot contain each other, nor can properties have properties.

Property Values

Property values are always Unicode strings, typically encoded as UTF-8 or UTF-16. The client and server can also negotiate another character encoding. Just about any data type or structured object can be converted to and from a string format, so properties are pretty powerful.

Representing Unicode

Unicode is a standard for representing alphabetic and other characters. It can represent most characters used by most written human languages. Although native Unicode environments exist, most Internet communication passes at some point through a system that may not be native Unicode. However, nearly all systems support ASCII. UTF-8 and UTF-16 are encodings that turn Unicode characters into byte sequences; UTF-8 is also backward compatible with ASCII, so plain ASCII text is unchanged when encoded as UTF-8.
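As a small illustration of how the same string looks under the two encodings, here is a sketch using Python's built-in codecs. The sample word is borrowed from the property name examples later in this chapter; nothing here is specific to WebDAV.

```python
# The same nine-character string under two Unicode encodings.
text = "cat\u00e9gorie"  # "categorie" with an accented e (U+00E9)

utf8_bytes = text.encode("utf-8")      # ASCII letters take 1 byte, the accented e takes 2
utf16_bytes = text.encode("utf-16-be") # every character here takes 2 bytes

print(len(text), len(utf8_bytes), len(utf16_bytes))

# Pure ASCII text produces identical bytes in ASCII and UTF-8.
print("fireplace".encode("utf-8") == "fireplace".encode("ascii"))
```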


Both PROPFIND and PROPPATCH use XML message bodies to marshal the XML property names and string property values. String values are usually fairly easy to transmit as text inside an XML element. It's only tricky when the WebDAV property value is also structured as XML, but that topic is covered in detail in Section 7.1.8.

A property may exist but have an empty value. Sometimes this is because the mere existence of a property is used to convey information. A property with an empty value is different from a nonexistent property in that:

  • A request for a nonexistent property will result in a 404 Not Found error response, whereas a request for an empty property will return a successful but empty result.

  • A list of the names of all the properties that exist on a resource does include the names of properties with empty values.
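The difference between an empty property and a missing one can be sketched with a plain dictionary standing in for a resource's property store. This is only an illustration: the function names are invented for this example, and the numbers stand for the per-property outcome described above, not for a complete WebDAV response.

```python
from typing import Dict, List, Tuple


def get_property(props: Dict[str, str], name: str) -> Tuple[int, str]:
    """Return (status, value) for one property on a resource."""
    if name not in props:
        return 404, ""        # nonexistent property: Not Found
    return 200, props[name]   # existing property, even if its value is empty


def list_property_names(props: Dict[str, str]) -> List[str]:
    """Names of all properties that exist, including empty ones."""
    return sorted(props)


# displayname exists but is empty; "reviewer" was never set.
resource_props = {"displayname": "", "getcontentlength": "1024"}

print(get_property(resource_props, "displayname"))
print(get_property(resource_props, "reviewer"))
print(list_property_names(resource_props))
```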

The flexibility of unlimited properties on every resource, property names, namespaces, and the ability to define any data type (as long as it can be represented as a string) means that WebDAV metadata is very powerful. This is the fundamental reason that WebDAV can be used for so many applications.

4.4.1 Live and Dead Properties

The distinction between live and dead properties is rather arbitrary, but it matters because WebDAV defines them to behave differently in certain protocol operations. All the properties defined in the previous section are live properties.

WebDAV defines live and dead properties in RFC2518:

Live Property - A property whose semantics and syntax are enforced by the server. For example, the live "getcontentlength" property has its value, the length of the entity returned by a GET request, automatically calculated by the server.

Dead Property - A property whose semantics and syntax are not enforced by the server. The server only records the value of a dead property; the client is responsible for maintaining the consistency of the syntax and semantics of a dead property.

By this definition, live properties include:

  • Properties calculated by the server. The creationdate property is calculated by the server when the resource is created. On some WebDAV servers, the client may later explicitly set or override the value, but regardless, the server sets the value at one point.

  • Protected properties. Clients aren't allowed to change the value of a protected property. The lockdiscovery property is protected because the only way clients can change LOCK information is through the LOCK and UNLOCK requests (and lockdiscovery is also calculated).

  • Properties used as configuration information. These properties may be given values by clients, but the server may check the syntax and then use the value to change its behavior. For example, a current draft [Korver03] proposes a quota-assigned property. A server supporting this property may allow clients with sufficient authorization to set the value of the quota-assigned property on a resource, as long as the value is an integer. The server uses the value to limit the amount of storage used within that resource.

  • Properties where the server verifies syntax or data type. Clients may provide values for use only by other clients, but the server helps enforce consistent syntax. For example, a custom server application might check the employee-start-date to make sure it is a valid date. On that server, the employee-start-date property is live, even though on any server not checking syntax, the property would be a dead property.

Dead properties are written and read by clients without any interference from the WebDAV server. The server doesn't care whether a dead property is set, what the value is, what the data type is, or how big the property is. An example could be the employee fullname property used in the employee directory scenario: Clients set the full name and dynamically generated Web pages display it, but the WebDAV server engine isn't involved except by storing and retrieving the value when asked.
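The categories above can be pictured as a small decision a server makes when a client tries to set a value. The following is a minimal sketch, not how any particular server is implemented: the sets and validators are hypothetical, and it assumes the employee-start-date example uses an ISO-style YYYY-MM-DD date.

```python
from datetime import date
from typing import Callable, Dict


def _is_iso_date(value: str) -> bool:
    """True if value parses as a calendar date (assumed YYYY-MM-DD format)."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def _is_integer(value: str) -> bool:
    return value.isascii() and value.isdigit()


# Hypothetical server configuration for this sketch.
PROTECTED = {"lockdiscovery", "getetag"}
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "employee-start-date": _is_iso_date,   # syntax-checked, so live on this server
    "quota-assigned": _is_integer,         # configuration property
}


def set_property(store: Dict[str, str], name: str, value: str) -> bool:
    """Try to set a property; return True if the server accepted it."""
    if name in PROTECTED:
        return False                       # clients may not change protected properties
    check = VALIDATORS.get(name)
    if check is not None and not check(value):
        return False                       # live property with invalid syntax
    store[name] = value                    # dead properties are stored untouched
    return True
```

With this sketch, a dead property such as fullname accepts any string, while employee-start-date is rejected unless it parses as a date.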

4.4.2 Required Live Properties

WebDAV requires certain properties to exist on any resource (see Table 4-1). Some of these, like resourcetype, are crucial to doing WebDAV operations correctly. Others, like creationdate, are more informational. Some are protected, which means clients may not set their values. These properties can be used to display file listings in Windows Explorer style (showing size, type, and last modified date) as well as other common displays. This makes it possible for a WebDAV collection to be displayed and used like a regular file system directory.

Table 4-1. WebDAV Required Properties

| Property Name | Meaning | Found On | Protected |
|---|---|---|---|
| creationdate | The date and time a resource was created. | All resources | Yes |
| displayname | The name of the resource to be shown to users (may be empty). | All resources | Maybe |
| getcontentlanguage | The language (e.g., English, Spanish) of the resource. Equivalent to the value of the Content-Language header on a GET response. | All resources that respond to a GET request with a content body | Maybe |
| getcontentlength | The length of the resource. Equivalent to the value of the Content-Length header on a GET response. | All resources that respond to a GET request with a content body | Yes |
| getcontenttype | Content type (e.g. text, text/xml, application/ms-word). Equivalent to the value of the Content-Type header on a GET response. May include character set information as well. | All resources that respond to a GET request with a content body | Maybe |
| getetag | The ETag of the resource. Equivalent to the value of the ETag header on a GET response (see Chapter 2, Section 2.4.2). | All resources that respond to a GET request with a content body | Yes |
| getlastmodified | The date and time a resource body was last modified. Equivalent to the value of the Last-Modified header in response to a GET request. | All resources that respond to a GET request with a content body | Yes |
| resourcetype | The type of resource (e.g., collection). | All resources | Yes |
| lockdiscovery | List of detailed information for each lock existing on the resource. | All locked resources | Yes |
| source | The location of the source code, for a resource that is dynamically generated. | Resources with dynamically generated content (on some servers) | Not specified |
| supportedlock | List of the kinds of locks that may be created on this resource. | All resources on a server that supports locking | Yes |

These semantic descriptions are based on the assumption that the resource is static. A static resource has a well-defined getlastmodified value, but who knows what value to give that property on a dynamic resource, where the content is being regenerated for every GET request?

These properties are discussed in more detail in Chapter 7.

Relationship of Properties and Headers

Some WebDAV properties have a close relationship to HTTP headers. These properties begin with "get" because they have the same value as headers that may be used in the GET response in HTTP/1.1. For example, getcontentlength must have the same value as the Content-Length header in response to a GET request for the same resource.
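The pairing of "get" properties with headers, as listed in Table 4-1, can be written down as a simple lookup. This sketch assumes the GET response headers are available as a dictionary; the function name is invented for illustration.

```python
from typing import Dict

# Property name -> HTTP header whose GET value it mirrors (from Table 4-1).
PROPERTY_TO_HEADER = {
    "getcontentlanguage": "Content-Language",
    "getcontentlength": "Content-Length",
    "getcontenttype": "Content-Type",
    "getetag": "ETag",
    "getlastmodified": "Last-Modified",
}


def properties_from_get_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Derive the 'get' property values from the headers of a GET response."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        prop: lowered[header.lower()]
        for prop, header in PROPERTY_TO_HEADER.items()
        if header.lower() in lowered
    }


print(properties_from_get_headers({"content-length": "2048", "Content-Type": "image/gif"}))
```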


4.4.3 MOVE, COPY, and Properties

The behavior of live and dead properties matters most for MOVE and COPY. Since dead properties aren't ever checked or set by the server, dead properties are always moved or copied as part of the resource. However, for live properties, the most appropriate behavior depends on the semantics of the property and on the operation.

  • A calculated property like creationdate or lockdiscovery may have to be recalculated. The creationdate might stay the same after a MOVE, but a COPY creates a new resource and the creationdate property is probably given a new value.

  • A configuration property like quota-assigned might even be removed, for example, if the resource is being moved to a section of the repository that doesn't enforce quota limits.

  • A property with data-type enforcement like the employee-start-date example will probably be moved or copied just like a dead property.

In the absence of specific information about how to treat the property in COPY or MOVE, the server should try to copy or move the property value but still provide the same semantics or calculations.
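One possible reading of the creationdate case is sketched below. It is an assumption-laden illustration, not required behavior: it treats every property other than creationdate as something that simply travels with the resource, keeps creationdate on MOVE, and gives a COPY a fresh creationdate. The function name and timestamp strings are made up for the example.

```python
from typing import Dict


def properties_after(operation: str, props: Dict[str, str], now: str) -> Dict[str, str]:
    """Return the property set of the destination resource after MOVE or COPY."""
    if operation not in ("MOVE", "COPY"):
        raise ValueError("operation must be MOVE or COPY")
    result = dict(props)              # dead properties are carried over unchanged
    if operation == "COPY":
        result["creationdate"] = now  # the copy is a new resource
    return result


source_props = {"creationdate": "2003-01-01T00:00:00Z", "fullname": "Pat Example"}
print(properties_after("MOVE", source_props, "2003-06-01T12:00:00Z"))
print(properties_after("COPY", source_props, "2003-06-01T12:00:00Z"))
```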

4.4.4 Property Names

Property names are XML element names, so there are strict rules about what characters can appear in them. The first character can only be:

  • A letter (any Unicode letter, not just Latin, and including accented letters)

  • Underscore ( _ )

Subsequent characters can be:

  • Letter (any Unicode letter again)

  • Digit (any Unicode digit, not just 0-9)

  • Hyphen (-)

  • Underscore ( _ )

  • Period (.)

That means that XML element names may not include spaces or, in any location, punctuation other than hyphen, underscore, and period. (XML itself also permits a colon, but namespaces reserve it for separating a prefix from a name.) The W3C Recommendation for XML lists every legal character range exhaustively.

Here's a list of sample legal property names. Note that case is important, so none of these property names is equivalent to any of the others:

getcontentlanguage
getContentLanguage
GETCONTENTLANGUAGE
get-content-language
get_content_language
_getcontentlanguage
W-2_Income
catégorie
X.509-Name

The following are illegal property names:

W-2 Income     contains a space
--issue--      begins with a hyphen
1099_Income    begins with a number
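A quick way to screen candidate property names is sketched below. It is deliberately conservative: it accepts only letters, digits, hyphens, and underscores, whereas the full grammar in the W3C Recommendation admits a few more characters, so it may reject some names that an XML parser would accept. The function name is made up for this example.

```python
def is_simple_property_name(name: str) -> bool:
    """Conservative check: letters, digits, hyphen, and underscore only."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # First character: any Unicode letter, or an underscore.
    if not (first.isalpha() or first == "_"):
        return False
    # Later characters: letters, decimal digits, hyphen, or underscore.
    return all(ch.isalpha() or ch.isdecimal() or ch in "-_" for ch in rest)


for candidate in ["W-2_Income", "cat\u00e9gorie", "W-2 Income", "--issue--", "1099_Income"]:
    print(candidate, is_simple_property_name(candidate))
```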

These rules are stricter than the property name rules in most pre-existing software packages, which sometimes causes implementation problems. For example, Microsoft Exchange exposes a number of existing mail and calendar objects over WebDAV, but these objects already had property names, which did not always fit the XML element name restrictions. Some of the illegal property names were simply changed, but others were transformed into legal XML element names in a reversible manner.

Typically, clients display their own names for well-known properties. Windows Explorer (when configured to use English) displays directory listings with a "Size" column and displays information about each file including the "Type of File." The values come from the WebDAV properties getcontentlength and getcontenttype. In a French localized WebDAV client, those well-known properties might be displayed instead as "Taille" and "Type de contenu." These localized strings derive from the client's ability to display these well-known properties and localize all strings, not from the server.

A reversible transformation could allow users to type in any property name when creating a new property. It's easier for users if they don't have to learn the odd XML element name rules. However, neither the XML nor the WebDAV standard provides a standard transformation. The URL-encoding mechanism defined in the URL standard [RFC1738] is a familiar mechanism (see Section 3.1.3), but it uses the percent symbol (%), which is illegal in XML element names.

Microsoft implemented transformations using an algorithm very similar to URL-encoding. Each illegal character is replaced with a sequence of seven characters. The first and last characters are underscores, the second character is an x, and the middle of the sandwich is the Unicode hex code for the character (preceded by enough zeros to make up four characters). Thus, "first name" is transformed to the property name first_x0020_name. Microsoft clients automatically detect the pattern and reverse the transform. There is no completely reliable way to detect whether a property name is escaped, but errors should be rare.
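The transformation just described can be sketched as a pair of functions. This is an illustration of the pattern in the text, not Microsoft's actual code: the function names are invented, the choice of uppercase hex digits is an assumption (the decoder accepts either case), and the sketch only handles characters whose code fits in four hex digits.

```python
import re


def _is_legal(ch: str, is_first: bool) -> bool:
    """Letters and underscore anywhere; digits and hyphen after the first character."""
    if ch.isalpha() or ch == "_":
        return True
    return (not is_first) and (ch.isdecimal() or ch == "-")


def escape_name(name: str) -> str:
    """Replace each illegal character with _xHHHH_ (four hex digits)."""
    parts = []
    for index, ch in enumerate(name):
        if _is_legal(ch, index == 0):
            parts.append(ch)
        else:
            parts.append("_x%04X_" % ord(ch))
    return "".join(parts)


_ESCAPE_PATTERN = re.compile(r"_x([0-9A-Fa-f]{4})_")


def unescape_name(name: str) -> str:
    """Reverse the transform wherever the _xHHHH_ pattern appears."""
    return _ESCAPE_PATTERN.sub(lambda match: chr(int(match.group(1), 16)), name)


print(escape_name("first name"))
print(unescape_name("first_x0020_name"))
```

The ambiguity mentioned above shows up directly: a property whose real name happened to contain the literal text _x0020_ would be "unescaped" into a name with a space, because the decoder has no way to tell the two cases apart.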

4.4.5 Property Namespaces

Property names must be unique, even if the property name is intended to be used only by a single software package. For example, a WebDAV client implementation might want to put some custom information in a property called lock-semantics. How is that client supposed to guarantee that the property isn't already being used by some other client or by the server? Interoperability would be quite difficult if custom properties could easily overlap. With overlapping property names, clients might expect to write a property that the server has protected, or clients might fail to parse a property with a new value in a different format.

To solve this problem, WebDAV requires the use of namespaces. A namespace is simply a qualifier that disambiguates names. In XML documents, namespaces are abbreviated with prefixes, and the prefixes are put in front of property names, but we'll see later how that's done. The namespace for all the required WebDAV properties is the DAV: namespace.

Namespaces are defined in a W3C Recommendation that extends XML [Bray99]. A namespace must be a valid URI, so it can be any well-formed URL. A URL used as a namespace doesn't need to refer to a real resource; it just has to follow correct URL syntax.

There are three basic approaches to choosing namespaces:

  • Construct a URI including a string already assigned for use by you or your organization. This can be an IP address, a network card address, an Object Identifier (OID, [RFC3061]) or a DNS address. Many Microsoft namespaces are URLs containing a Microsoft DNS address, although the URL doesn't actually refer to a real resource.

     http://schemas.microsoft.com/office2000/

  • Include a sufficiently random number or a Universally Unique Identifier (UUID, [Leach98]), and use that consistently. A couple of W3C proposals used UUID namespaces.

     urn:uuid:c2f41010-65b3-11d1-a29f-00aa00c14882/

  • Reserve a meaningful and human-readable name using a process such as the IETF standards process. The DAV: namespace was reserved in this way [RFC2518]. New names as short as DAV: are discouraged, and most new names begin with urn:. Xml.org submitted an RFC [RFC3120] to reserve all namespaces beginning with:

     urn:xmlorg:
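The first two approaches are easy to try out. The sketch below uses Python's standard uuid module for the UUID case; the helper names and the example.com domain are placeholders invented for illustration. A generated UUID namespace only helps if it is created once and then reused, so a real application would store the value rather than regenerate it.

```python
import uuid


def dns_namespace(domain: str, path: str) -> str:
    """Build a namespace URL from a DNS name the organization already owns."""
    return "http://%s/%s/" % (domain, path.strip("/"))


def new_uuid_namespace() -> str:
    """Generate a fresh urn:uuid: namespace (do this once, then reuse it)."""
    return uuid.uuid4().urn


print(dns_namespace("example.com", "webdav/props"))
print(new_uuid_namespace())
```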

Table 4-2 contains actual examples of properties and namespaces used by a few shipping WebDAV implementations. Note how different the namespace names are.

Table 4-2. Examples of Properties and Namespaces

| Namespace | Properties | Used By |
|---|---|---|
| DAV: | getcontentlength, displayname | All WebDAV clients and servers |
| urn:schemas-microsoft-com:office:office | Author, Words | Microsoft Office 2000 |
| urn:schemas:httpmail | subject, To | Microsoft Exchange 2000 |
| http://www.xythos.com/namespaces/StorageServer | quota, size | Xythos WebFile Server |

In ordinary text, namespaces are often placed in front of names: The property creationdate in the namespace DAV: is referred to as DAV:creationdate. However, this is only used as a convenience in human documents and is not actually part of any specification.
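To see how a namespace and a property name travel together in XML, here is a minimal sketch using Python's xml.etree.ElementTree, which writes a namespaced name as {namespace}name internally and emits a prefix when serializing. The D prefix and the helper name qname are choices made for this example; the Xythos namespace and quota property come from Table 4-2.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"
XYTHOS = "http://www.xythos.com/namespaces/StorageServer"


def qname(namespace: str, local_name: str) -> str:
    """ElementTree's notation for a namespace-qualified name."""
    return "{%s}%s" % (namespace, local_name)


# Ask the serializer to use the prefix "D" for the DAV: namespace.
ET.register_namespace("D", DAV)

prop = ET.Element(qname(DAV, "prop"))
ET.SubElement(prop, qname(DAV, "creationdate"))
ET.SubElement(prop, qname(XYTHOS, "quota"))   # gets an auto-generated prefix

xml_text = ET.tostring(prop, encoding="unicode")
print(xml_text)
```

Two properties with the same local name but different namespaces would produce different qualified names here, which is exactly the disambiguation that namespaces exist to provide.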

4.4.6 Property Values

All WebDAV properties are expressed in XML; that is, all property values are transmitted as strings within an XML envelope. However, some property values have additional syntax guidelines or value transformations.

  • Some string values contain XML control characters (<, >, and & characters). The control characters must be carefully escaped or the entire value must be encapsulated. Section 7.1.4 of Chapter 7 contains details on both escaping and encapsulation.

  • Some string values are valid XML fragments (e.g., the lockdiscovery property is defined to be valid XML) and thus can safely be embedded in XML without escaping.

  • Date and time values are usually expressed in a subset of the ISO8601 format, although the getlastmodified property is a special case. Section 7.1.9 of Chapter 7 defines date/time values.
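The escaping mentioned in the first bullet can be illustrated with the helpers in Python's standard library. This sketch covers only character escaping, not encapsulation, and the sample value is invented.

```python
from xml.sax.saxutils import escape, unescape

raw_value = "Research & Development <draft>"

# Replace &, <, and > with entity references so the value is safe inside an element.
escaped_value = escape(raw_value)
print(escaped_value)

# The receiver reverses the escaping to recover the original string.
print(unescape(escaped_value) == raw_value)
```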

Data typing is done only through specification writing in WebDAV. That is, a property is known to be a particular type only by reading the design specification. There's no standard way of signaling the data type of an unknown property. There have been some W3C proposals and recommendations involving data typing of XML elements [Biron01]. A proposal has been made to the WebDAV Working Group for data typing WebDAV properties [Reschke03a], but no standard has emerged.

4.4.7 Internationalization

Since property values may be in different languages in WebDAV, it must be possible also to store and retrieve language information for each property. WebDAV is designed to comply with the standard IETF internationalization practices [RFC2277].

Some text must be accompanied with a language identifier for the client to display, render, or otherwise handle the text correctly [Dürst02]. Some examples:

  • A text-to-audio engine must know whether the word location is a French or an English word to know how to pronounce it.

  • A semantic language parser or translation tool must know the language to guess the meaning of location, because the word means "hiring" or "rental" in French.

  • Many assume that any Unicode character is always displayed consistently, but this isn't true: some characters are displayed differently in the context of different languages.

  • Sorting depends on language. English dictionaries sort ll as two separate letters, with llama appearing before location. However, Spanish dictionaries sort ll as a single letter following l, with localidad appearing before llamar.

XML has explicit provisions for tagging property values with a language code in an attribute called lang in the xml namespace. These languages are identified with standard codes [RFC3066]. WebDAV reuses the standard XML mechanism. If the client sends a PROPPATCH request with a language specified for a property value, the server must store that information and return it whenever the property value is retrieved with PROPFIND.

The WebDAV specification doesn't say whether multiple variants of a property could exist in multiple languages. For example, if one client sets an employee title property to Engineer with a language value of en, and another client sets the same property to Ingénieur with a language value of fr, are both values stored, or does one replace the other? Most servers overwrite the property. These servers store a single language code (or nothing if the language is unknown) along with a single value for the property. On these servers and the clients that use these servers, the language identifier is still useful for display and rendering.
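The overwrite behavior described above can be sketched as a store that keeps one value and one language code per property. The class and method names are invented for illustration, and the bare element name title stands in for the employee title property; the xml:lang attribute itself is the standard XML mechanism.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# ElementTree's notation for the lang attribute in the reserved xml namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


class SingleLanguageStore:
    """Keeps a single (value, language) pair per property; later writes overwrite."""

    def __init__(self) -> None:
        self._props: Dict[str, Tuple[str, Optional[str]]] = {}

    def set(self, name: str, value: str, lang: Optional[str] = None) -> None:
        self._props[name] = (value, lang)

    def to_element(self, name: str) -> ET.Element:
        value, lang = self._props[name]
        element = ET.Element(name)
        element.text = value
        if lang is not None:
            element.set(XML_LANG, lang)   # serialized as xml:lang="..."
        return element


store = SingleLanguageStore()
store.set("title", "Engineer", lang="en")
store.set("title", "Ingénieur", lang="fr")   # replaces the English value
print(ET.tostring(store.to_element("title"), encoding="unicode"))
```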



WebDAV. Next Generation Collaborative Web Authoring
ISBN: 130652083
Year: 2003
Pages: 146
