Metadata and XML

Metadata is data that describes other data. Metadata about any entity, whether a programming object or a simple Web page, is information that describes that entity. Metadata and the Web have coexisted for a long time. Most Web pages broadcast meta-information about themselves for use by search engines. For example, the following lines of metadata appear in the header of the HTML documentation for Microsoft's Internet Information Services (IIS) 5.

 <META name="DESCRIPTION" content="Navigational page with links to an extensive glossary, late-breaking IIS information, installation instructions, descriptions of new features, concise quick-start procedures for experienced server administrators, and tips for the using the IIS documentation.">
 <META HTTP-EQUIV="Content-Type" content="text/html; charset=Windows-1252">
 <META NAME="MS.LOCALE" CONTENT="EN-US">

This information sits behind the scenes. Although it does not appear when you look at the Web page through a browser, it's useful for providing contextual information about the page to tools such as search engines.

In this example, metadata conveys a brief description of the page ("Navigational page with links to an extensive glossary..."), the content type and character set that the page uses ("text/html; charset=Windows-1252"), and locale information ("EN-US").
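As a rough illustration of how a tool such as a search-engine crawler might read these tags, here is a minimal Python sketch that uses only the standard library's html.parser module. The class name MetaCollector and the shortened sample header are assumptions made for this example; they are not part of IIS or of any crawler.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> tags into a dict keyed by name or http-equiv (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("name") or attr.get("http-equiv")
        if key is not None:
            self.meta[key.lower()] = attr.get("content")


# Shortened version of the header discussed in the text.
SAMPLE_HEAD = """
<META name="DESCRIPTION" content="Navigational page with links to an extensive glossary">
<META HTTP-EQUIV="Content-Type" content="text/html; charset=Windows-1252">
<META NAME="MS.LOCALE" CONTENT="EN-US">
"""

collector = MetaCollector()
collector.feed(SAMPLE_HEAD)
for key, value in collector.meta.items():
    print(f"{key}: {value}")
```

The keys are lowercased so that lookups do not depend on how the page author capitalized NAME or HTTP-EQUIV.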

Uses for Metadata

Uses for metadata extend beyond making your Web site more searchable on Excite.com. For example, Plug and Play devices have for years used metadata to report hardware and driver information whenever they are installed on PCs.

All URL requests made by Web browsers send metadata about the software the user is running to browse the Web, the IP address of the user's computer, and the content-type and character encoding of the request itself. Web server software such as IIS uses this information to filter requests or redirect users to pages that are more compatible with their browsers.
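The following Python sketch shows one way a server-side program could read that request metadata and act on it. The request text, the browser name ExampleBrowser, and the functions parse_request and pick_page are all invented for illustration. The client's IP address does not appear here because a server normally obtains it from the network connection rather than from the request headers.

```python
# A hypothetical browser request; every header value here is made up.
RAW_REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Accept-Charset: utf-8\r\n"
    "Accept-Language: en-us\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw HTTP request into its request line and a dict of headers."""
    lines = raw.split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


def pick_page(headers):
    """Toy redirect rule: send the hypothetical 1.x browser to a simpler page."""
    if "ExampleBrowser/1." in headers.get("user-agent", ""):
        return "/legacy/index.html"
    return "/index.html"


request_line, headers = parse_request(RAW_REQUEST)
print(request_line)
print(headers["user-agent"])
print(pick_page(headers))
```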

Transmission Control Protocol/Internet Protocol (TCP/IP) packets transmit metadata through packet headers. This information includes, among other details, the sender's and receiver's IP addresses (carried in the IP header) and the segment's sequence number (carried in the TCP header). Audio CDs and DVDs have table-of-contents data sections that tell the player how many tracks there are, their duration, and where they begin on the disc.
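To make the idea of header metadata concrete, here is a small Python sketch that unpacks the fixed 20-byte portion of a TCP header with the standard struct module. The segment is fabricated for the example, and describe_segment is a hypothetical helper name. The sketch reads only the TCP header; the IP addresses travel in the enclosing IP header, which it does not parse.

```python
import struct

# Fixed 20-byte TCP header, network byte order, no options:
# source port, destination port, sequence number, acknowledgment number,
# data offset and flags, window size, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")


def describe_segment(segment):
    """Return the header metadata of a TCP segment as a dict."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, _checksum, _urgent) = TCP_HEADER.unpack_from(segment)
    header_len = (offset_flags >> 12) * 4  # data offset counts 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "window": window,
        "header_len": header_len,
        "payload": segment[header_len:],
    }


# Made-up segment: port 49152 -> port 80, sequence number 1000, ACK flag set.
offset_flags = (5 << 12) | 0x010
segment = TCP_HEADER.pack(49152, 80, 1000, 0, offset_flags, 8192, 0, 0) + b"GET /"

info = describe_segment(segment)
print(info)
```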

What does this have to do with XML? Metadata is intrinsic to all XML documents. An XML document's elements and attributes provide extra information about the document's content. This allows people and programs to easily parse and understand information represented in XML format. We will spend the next few pages exploring XML comments, elements, and schemas and DTDs in greater depth.

Using XML for Comments

One straightforward way to include metadata in XML is by inserting comments into a transformed document. XSLT provides an <xsl:comment> element for this purpose. The XSLT processor replaces instances of this element with a well-formed comment in the result document. This form of metadata is mostly intended for human consumption.

Listing 8-1 contains a list of vegetable names and colors. In this example we will apply an XSLT transformation to the document and insert a comment along the way. Listing 8-2 shows the XSLT code behind the transformation.

Listing 8-1 vegetables.xml: A list of vegetables to be sorted and displayed.

 <vegetables>
   <vegetable>
     <name>cabbage</name>
     <color>red</color>
   </vegetable>
   <vegetable>
     <name>carrot</name>
     <color>orange</color>
   </vegetable>
   <vegetable>
     <name>asparagus</name>
     <color>green</color>
   </vegetable>
   <vegetable>
     <name>squash</name>
     <color>yellow</color>
   </vegetable>
 </vegetables>

Listing 8-2 vegetables.xsl: This style sheet generates a sorted HTML table with XML comments.

 <?xml version='1.0'?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:template match="/vegetables">
     <html>
       <body>
         <xsl:comment>order the vegetables by name</xsl:comment>
         <table>
           <tr>
             <th>Vegetable</th>
             <th>Color</th>
           </tr>
           <xsl:apply-templates select="vegetable">
             <xsl:sort select="name"/>
           </xsl:apply-templates>
         </table>
       </body>
     </html>
   </xsl:template>

   <xsl:template match="vegetable">
     <tr>
       <td><xsl:value-of select="name"/></td>
       <td><xsl:value-of select="color"/></td>
     </tr>
   </xsl:template>

 </xsl:stylesheet>

The Listing 8-2 transformation, when applied to the Listing 8-1 source document, produces the following HTML source.

 <html>
   <body>
     <!--order the vegetables by name-->
     <table>
       <tr>
         <th>Vegetable</th>
         <th>Color</th>
       </tr>
       <tr>
         <td>asparagus</td>
         <td>green</td>
       </tr>
       <tr>
         <td>cabbage</td>
         <td>red</td>
       </tr>
       <tr>
         <td>carrot</td>
         <td>orange</td>
       </tr>
       <tr>
         <td>squash</td>
         <td>yellow</td>
       </tr>
     </table>
   </body>
 </html>
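For readers without an XSLT processor at hand, the same result can be approximated in Python. The standard library does not include an XSLT engine, so this sketch swaps in xml.etree.ElementTree and rebuilds the logic of Listing 8-2 by hand: sort the vegetables by name, emit a comment, and build the table. The function name build_table is an assumption made for this example, and the output is not pretty-printed.

```python
import xml.etree.ElementTree as ET

# The data from Listing 8-1, compacted.
VEGETABLES_XML = """
<vegetables>
  <vegetable><name>cabbage</name><color>red</color></vegetable>
  <vegetable><name>carrot</name><color>orange</color></vegetable>
  <vegetable><name>asparagus</name><color>green</color></vegetable>
  <vegetable><name>squash</name><color>yellow</color></vegetable>
</vegetables>
"""


def build_table(source_xml):
    """Mimic Listing 8-2 by hand: comment, header row, rows sorted by name."""
    source = ET.fromstring(source_xml)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    # Plays the role of <xsl:comment> in the style sheet.
    body.append(ET.Comment("order the vegetables by name"))
    table = ET.SubElement(body, "table")
    header = ET.SubElement(table, "tr")
    ET.SubElement(header, "th").text = "Vegetable"
    ET.SubElement(header, "th").text = "Color"
    # Plays the role of <xsl:sort select="name"/>.
    ordered = sorted(source.findall("vegetable"), key=lambda v: v.findtext("name"))
    for vegetable in ordered:
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = vegetable.findtext("name")
        ET.SubElement(row, "td").text = vegetable.findtext("color")
    return ET.tostring(html, encoding="unicode")


result = build_table(VEGETABLES_XML)
print(result)
```

The comment survives serialization but is skipped by ElementTree's default parser when the result is read back, which fits the point that this kind of metadata is aimed at human readers.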

Elements

An XML document's elements are a good source of meta-information. For example, here is a record from a personnel database in the Microsoft Excel CSV format:

 John,Smith,200 Brattle Street,Cambridge,MA,02138,brown,11/30/1974,blue 

The meaning behind most of this data is obvious. The person's name is "John Smith" and his address is "200 Brattle Street; Cambridge, MA 02138." Unfortunately, the significance of the last three fields ("brown, 11/30/1974, blue") is less clear. Is "blue" John Smith's hair color or favorite kind of cheese?

Listing 8-3 contains the same data in XML format.

Listing 8-3 person.xml: An XML representation of a person's profile.

 <person>
   <firstName>John</firstName>
   <lastName>Smith</lastName>
   <address type="home">
     <address1>200 Brattle Street</address1>
     <city>Cambridge</city>
     <state>MA</state>
     <zip>02138</zip>
   </address>
   <hairColor>brown</hairColor>
   <birthDate>11/30/1974</birthDate>
   <favoriteColor>blue</favoriteColor>
 </person>

Although the actual data contained in both documents is the same, the metadata in the XML version makes it easier to understand. We were correct in guessing his name and address, and the meaning of "brown", "11/30/1974", and "blue" is now evident. Because each value carries its own label, the elements of a record like this one can also be reordered with no impact on their interpretation. The elements in the document could be ordered alphabetically (by level of hierarchy) just as well.

 <person>
   <address type="home">
     <address1>200 Brattle Street</address1>
     <city>Cambridge</city>
     <state>MA</state>
     <zip>02138</zip>
   </address>
   <birthDate>11/30/1974</birthDate>
   <favoriteColor>blue</favoriteColor>
   <firstName>John</firstName>
   <hairColor>brown</hairColor>
   <lastName>Smith</lastName>
 </person>

Reordering the elements has no effect on this document's meaning. The same is not true of delimited documents, where meaning is derived from arbitrary positioning. A simple swap of fields will change John Smith's hair color from brown to blue!
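The difference can be demonstrated with a short Python sketch that reads the hair color from both formats. The column index, the shortened XML records, and the helper names are assumptions chosen for this example. The CSV reader has to know the field's position in advance, while the XML reader asks for the element by name.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_RECORD = "John,Smith,200 Brattle Street,Cambridge,MA,02138,brown,11/30/1974,blue"
# The same record with the hair-color and favorite-color fields swapped.
CSV_SWAPPED = "John,Smith,200 Brattle Street,Cambridge,MA,02138,blue,11/30/1974,brown"

XML_RECORD = (
    "<person><hairColor>brown</hairColor>"
    "<birthDate>11/30/1974</birthDate>"
    "<favoriteColor>blue</favoriteColor></person>"
)
# The same elements in a different order.
XML_REORDERED = (
    "<person><favoriteColor>blue</favoriteColor>"
    "<birthDate>11/30/1974</birthDate>"
    "<hairColor>brown</hairColor></person>"
)

# The position lives in the reading program, not in the data itself.
HAIR_COLOR_COLUMN = 6


def hair_color_from_csv(line):
    """Look the value up by position."""
    row = next(csv.reader(io.StringIO(line)))
    return row[HAIR_COLOR_COLUMN]


def hair_color_from_xml(text):
    """Look the value up by element name."""
    return ET.fromstring(text).findtext("hairColor")


print(hair_color_from_csv(CSV_RECORD), hair_color_from_csv(CSV_SWAPPED))
print(hair_color_from_xml(XML_RECORD), hair_color_from_xml(XML_REORDERED))
```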

Schemas and DTDs

XML documents can also include a second kind of metadata in the form of a schema or a DTD. Schemas and DTDs serve the same purpose but use different syntaxes. They carve out a language, or class, of XML documents by governing the types of elements and attributes that may appear in conformant documents. Schemas and DTDs can also impose a rigid or loose structure on the XML documents they define. For example, a schema could require that a <person> element have <firstName> and <lastName> child elements while allowing an optional <favoriteColor> element. Listing 8-4 is the schema that defines a person XML document. Note that a valid coreXMLPerson document need not include all of the elements defined in the schema. Our example from earlier would not be a valid coreXMLPerson document because the schema names the address fields address.address1, address.city, address.state, and address.zip rather than address1, city, state, and zip.

Listing 8-4 PersonSchema.xml: This schema provides metadata describing the structure of a coreXMLPerson XML document.

 <?xml version="1.0"?>
 <Schema name="coreXMLPerson" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes">
   <ElementType name="firstName" content="textOnly" model="closed"/>
   <ElementType name="lastName" content="textOnly" model="closed"/>
   <ElementType name="birthDate" content="textOnly" model="closed"/>
   <ElementType name="hairColor" content="textOnly" model="closed"/>
   <ElementType name="favoriteColor" content="textOnly" model="closed"/>
   <ElementType name="address.address1" content="textOnly" model="closed"/>
   <ElementType name="address.address2" content="textOnly" model="closed"/>
   <ElementType name="address.address3" content="textOnly" model="closed"/>
   <ElementType name="address.city" content="textOnly" model="closed"/>
   <ElementType name="address.state" content="textOnly" model="closed"/>
   <ElementType name="address.zip" content="textOnly" model="closed"/>
   <ElementType name="address.country" content="textOnly" model="closed"/>
   <ElementType name="address" order="many" content="eltOnly" model="closed">
     <AttributeType name="type" />
     <element type="address.address1" minOccurs="0" maxOccurs="1"/>
     <element type="address.address2" minOccurs="0" maxOccurs="1"/>
     <element type="address.address3" minOccurs="0" maxOccurs="1"/>
     <element type="address.city" minOccurs="0" maxOccurs="1"/>
     <element type="address.state" minOccurs="0" maxOccurs="1"/>
     <element type="address.zip" minOccurs="0" maxOccurs="1"/>
     <element type="address.country" minOccurs="0" maxOccurs="1"/>
   </ElementType>
   <ElementType name="person" order="many" content="eltOnly" model="closed">
     <element type="firstName" minOccurs="0" maxOccurs="1"/>
     <element type="lastName" minOccurs="0" maxOccurs="1"/>
     <element type="address" minOccurs="0" maxOccurs="*"/>
     <element type="hairColor" minOccurs="0" maxOccurs="1"/>
     <element type="favoriteColor" minOccurs="0" maxOccurs="1"/>
   </ElementType>
 </Schema>

We can see that this schema strictly defines all of the ingredients that go into a coreXMLPerson document. We now know what types of elements and attributes can appear inside a document of this type. We also know the structure a coreXMLPerson document must have. For instance, address information cannot appear directly beneath a <person> element: it must reside within one of many possible <address> elements.
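Because an XDR schema is itself an XML document, a program can read it like any other. The Python sketch below loads a trimmed-down copy of Listing 8-4, records which child elements each ElementType permits, and reports children that the closed content model does not list. This is a deliberately simplified illustration and not a real XDR validator: it ignores minOccurs, maxOccurs, attributes, and data types, and the helper names allowed_children and find_violations are invented for this example.

```python
import xml.etree.ElementTree as ET

XDR = "{urn:schemas-microsoft-com:xml-data}"

# Trimmed-down copy of Listing 8-4; only a few element types are kept.
SCHEMA = """
<Schema name="coreXMLPerson" xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="firstName" content="textOnly" model="closed"/>
  <ElementType name="lastName" content="textOnly" model="closed"/>
  <ElementType name="address.address1" content="textOnly" model="closed"/>
  <ElementType name="address.city" content="textOnly" model="closed"/>
  <ElementType name="address" order="many" content="eltOnly" model="closed">
    <element type="address.address1" minOccurs="0" maxOccurs="1"/>
    <element type="address.city" minOccurs="0" maxOccurs="1"/>
  </ElementType>
  <ElementType name="person" order="many" content="eltOnly" model="closed">
    <element type="firstName" minOccurs="0" maxOccurs="1"/>
    <element type="lastName" minOccurs="0" maxOccurs="1"/>
    <element type="address" minOccurs="0" maxOccurs="*"/>
  </ElementType>
</Schema>
"""


def allowed_children(schema_xml):
    """Map each ElementType name to the set of child element names it lists."""
    schema = ET.fromstring(schema_xml)
    return {
        etype.get("name"): {ref.get("type") for ref in etype.findall(XDR + "element")}
        for etype in schema.findall(XDR + "ElementType")
    }


def find_violations(element, model):
    """Return 'parent/child' strings for children the model does not list."""
    problems = []
    permitted = model.get(element.tag, set())
    for child in element:
        if child.tag not in permitted:
            problems.append(f"{element.tag}/{child.tag}")
        problems.extend(find_violations(child, model))
    return problems


GOOD = ("<person><firstName>John</firstName>"
        "<address><address.city>Cambridge</address.city></address></person>")
BAD = ("<person><firstName>John</firstName>"
       "<address><city>Cambridge</city></address></person>")

model = allowed_children(SCHEMA)
print(find_violations(ET.fromstring(GOOD), model))
print(find_violations(ET.fromstring(BAD), model))
```

The second document is flagged because its address contains a city element where the schema lists address.city, which is the same kind of mismatch that keeps the earlier person example from being valid.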

Two types of schema definitions are in wide circulation today: XML Schema Definition (XSD) and XML-Data Reduced (XDR). XSD is the official schema language of the W3C. XDR is an interim schema language that Microsoft implemented, based on the XML-Data proposal submitted to the W3C, for developers to use while the W3C was drafting the XSD specification. Microsoft's MSXML 4 parser supports both types of schema definitions.

XML documents can reference an XDR schema through a namespace declaration on the root element. (A DTD is referenced through a DOCTYPE declaration instead.) To include a reference to the coreXMLPerson schema, we only need to change the root element of the Listing 8-3 document.

 <person xmlns="x-schema:personschema.xml"> ... </person>  

This tells the XML parser that a validating XDR schema can be found at the location personschema.xml. Now the person or program interpreting the XML data has more meta-information to use. We can see what other elements might appear inside similar documents. We also have a means to validate, or check the integrity of, the document we are analyzing.



XML Programming Bible
ISBN: 0764538292
Year: 2002
Pages: 134
