XML Data Schemas

One final XML-related technology that you should be aware of is the use of schemas to validate XML data. To understand what I mean by validate, consider the following scenario:

Company A is a retailer who wants to order goods from company B. To do this, companies A and B agree to exchange an XML order document. Company A sends an XML document containing order information to company B. When company B receives the order, the order must contain certain key pieces of information, such as products required, quantities required, and so on. Company B doesn’t want to start processing the order if that information isn’t there. In fact, if any of the required information is missing, the order can’t be processed because it is invalid.

In this scenario, companies A and B need to agree beforehand on what information an order document must include. Then both companies must work according to a common definition that they can refer to in order to ensure that the documents being exchanged are valid based on this common definition.

A schema is an XML document that contains the definition for a particular type of XML document. You and your trading partners can create schemas to define the business documents you will exchange, thus guaranteeing that you can always refer to a common specification to ensure document validity. Schemas are a more flexible approach to solving the problem of document validation than an older technology called Document Type Definitions (DTDs). This section will discuss some of the important aspects of schemas. For a full explanation of schemas, consult the MSDN library at http://msdn.microsoft.com/library.

The syntax used to create schemas isn’t yet fully approved by the World Wide Web Consortium (W3C). When the final specification is published, Microsoft is committed to supporting it in its products. Meanwhile, Microsoft does support a subset of a syntax named XML-Data, which was submitted as a proposal to the W3C in 1998. This subset is known as XML-Data Reduced (XDR).

Creating an XDR Schema

An XDR schema is an XML document containing declarations of the elements and attributes that can be used in an XML document based on that schema. The schema can also dictate the data types of the elements and attributes, the order in which these elements and attributes must appear, the minimum and maximum lengths of their values, and the minimum and maximum occurrences of these elements and attributes within a document. Furthermore, it can determine whether additional elements and attributes not defined in the schema can be included in a document based on that schema.

Schemas are based on the xml-data namespace. The xml-data namespace defines the syntax used to declare elements and attributes in a schema. The top-level element in a schema is the Schema element, which can contain an optional name attribute. A minimal schema is shown here:

    <?xml version="1.0"?>
    <Schema name="orderschema"
        xmlns="urn:schemas-microsoft-com:xml-data">
    </Schema>

Of course, this schema is fairly useless because it doesn’t actually contain any declarations. To define a valid XML document, we must add element and attribute declarations to the schema. You declare elements and attributes by using the ElementType and AttributeType keywords. Elements are declared at the top level of the schema. Attributes can be declared globally, in which case the same attribute can be used in multiple elements, or they can be scoped locally within an element declaration. Once you’ve declared an element or an attribute, you can define an instance of it by using the element or attribute keyword.

Remember that XML is case sensitive. You must use the correct case for the ElementType, AttributeType, element, and attribute keywords.

The following example shows a schema that declares elements and attributes for an XML order document:

    <?xml version="1.0"?>
    <Schema name="orderschema"
        xmlns="urn:schemas-microsoft-com:xml-data">
        <ElementType name="OrderDate"/>
        <ElementType name="Customer"/>
        <ElementType name="Product">
            <AttributeType name="ProductID"/>
            <AttributeType name="UnitPrice"/>
            <attribute type="ProductID"/>
            <attribute type="UnitPrice"/>
        </ElementType>
        <ElementType name="Quantity"/>
        <ElementType name="Item">
            <element type="Product"/>
            <element type="Quantity"/>
        </ElementType>
        <ElementType name="Order">
            <AttributeType name="OrderNo"/>
            <attribute type="OrderNo"/>
            <element type="OrderDate"/>
            <element type="Customer"/>
            <element type="Item"/>
        </ElementType>
    </Schema>

Each element in the order document is declared in the schema. The first two element declarations, the OrderDate and Customer elements, are relatively simple, but the declaration for the Product element bears closer scrutiny. The Product element declaration includes two AttributeType declarations to specify the attributes that will be used in this element. The attribute keyword is then used to define an instance of a single attribute of each type declared in the element.

Next, the Quantity element is declared, followed by a declaration of the Item element, which contains an instance of the Product element and an instance of the Quantity element. Finally, the Order element is declared. This element contains an attribute declaration for the OrderNo attribute, and instances of the OrderNo attribute and the OrderDate, Customer, and Item child elements. An XML order document based on this schema could look like the following example:

    <?xml version="1.0"?>
    <Order OrderNo="1234">
        <OrderDate>2001-01-01</OrderDate>
        <Customer>Graeme Malcolm</Customer>
        <Item>
            <Product ProductID="1" UnitPrice="18">Chai</Product>
            <Quantity>2</Quantity>
        </Item>
        <Item>
            <Product ProductID="2" UnitPrice="19">Chang</Product>
            <Quantity>1</Quantity>
        </Item>
    </Order>
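The relationship between declarations (ElementType, AttributeType) and instances (element, attribute) can be explored with a few lines of code. The following is a minimal sketch in Python, using the standard xml.etree.ElementTree module rather than the Microsoft XML parser; the summarize function is a hypothetical helper written for this illustration, not part of any XDR tool.

```python
import xml.etree.ElementTree as ET

# ElementTree prefixes every tag with its namespace in braces.
XDR = "{urn:schemas-microsoft-com:xml-data}"

SCHEMA = """<?xml version="1.0"?>
<Schema name="orderschema"
    xmlns="urn:schemas-microsoft-com:xml-data">
    <ElementType name="OrderDate"/>
    <ElementType name="Customer"/>
    <ElementType name="Product">
        <AttributeType name="ProductID"/>
        <AttributeType name="UnitPrice"/>
        <attribute type="ProductID"/>
        <attribute type="UnitPrice"/>
    </ElementType>
    <ElementType name="Quantity"/>
    <ElementType name="Item">
        <element type="Product"/>
        <element type="Quantity"/>
    </ElementType>
    <ElementType name="Order">
        <AttributeType name="OrderNo"/>
        <attribute type="OrderNo"/>
        <element type="OrderDate"/>
        <element type="Customer"/>
        <element type="Item"/>
    </ElementType>
</Schema>"""


def summarize(schema_text):
    """Map each declared element name to the attribute and element
    instances that appear inside its ElementType declaration."""
    root = ET.fromstring(schema_text)
    summary = {}
    for decl in root.findall(XDR + "ElementType"):
        summary[decl.get("name")] = {
            "attributes": [a.get("type") for a in decl.findall(XDR + "attribute")],
            "elements": [e.get("type") for e in decl.findall(XDR + "element")],
        }
    return summary


if __name__ == "__main__":
    for name, parts in summarize(SCHEMA).items():
        print(name, parts)
```

Note that the lookups use the exact case of ElementType, element, and attribute; a schema that spelled them differently would simply yield no matches.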

Specifying a Content Model

By default, the content model for an XDR schema is open, meaning that elements and attributes not defined in the schema can still be included in documents based on the schema. For example, the following document could be based on the schema we defined earlier and would be considered valid, even though it includes a DeliveryDate element that isn’t defined in the schema:

    <?xml version="1.0"?>
    <Order OrderNo="1234">
        <OrderDate>2001-01-01</OrderDate>
        <Customer>Graeme Malcolm</Customer>
        <DeliveryDate>2001-02-02</DeliveryDate>
        <Item>
            <Product ProductID="1" UnitPrice="18">Chai</Product>
            <Quantity>2</Quantity>
        </Item>
        <Item>
            <Product ProductID="2" UnitPrice="19">Chang</Product>
            <Quantity>1</Quantity>
        </Item>
    </Order>

You can set the content model of each element in the schema by specifying a model attribute with a value of open or closed. For example, the schema could be enhanced to limit the possible elements and attributes by specifying a closed content model for each element, as shown here:

    <?xml version="1.0"?>
    <Schema name="orderschema"
        xmlns="urn:schemas-microsoft-com:xml-data">
        <ElementType name="OrderDate" model="closed"/>
        <ElementType name="Customer" model="closed"/>
        <ElementType name="Product" model="closed">
            <AttributeType name="ProductID"/>
            <AttributeType name="UnitPrice"/>
            <attribute type="ProductID"/>
            <attribute type="UnitPrice"/>
        </ElementType>
        <ElementType name="Quantity" model="closed"/>
        <ElementType name="Item" model="closed">
            <element type="Product"/>
            <element type="Quantity"/>
        </ElementType>
        <ElementType name="Order" model="closed">
            <AttributeType name="OrderNo"/>
            <attribute type="OrderNo"/>
            <element type="OrderDate"/>
            <element type="Customer"/>
            <element type="Item"/>
        </ElementType>
    </Schema>

This subtle alteration ensures that a document based on this schema must contain only the elements and attributes defined in the schema. No additional data is allowed.
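To see what a closed content model rules out, consider the following minimal Python sketch. It is not how the Microsoft parser works internally; it only illustrates the rule, using the standard xml.etree.ElementTree module. The undeclared_content function and the shortened schema are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

XDR = "{urn:schemas-microsoft-com:xml-data}"

# A shortened, hypothetical version of the order schema.
SCHEMA = """<Schema xmlns="urn:schemas-microsoft-com:xml-data">
    <ElementType name="Customer" model="closed"/>
    <ElementType name="Order" model="closed">
        <AttributeType name="OrderNo"/>
        <attribute type="OrderNo"/>
        <element type="Customer"/>
    </ElementType>
</Schema>"""


def undeclared_content(schema_text, doc_text):
    """List child elements and attributes that a closed model forbids."""
    schema = ET.fromstring(schema_text)
    decls = {d.get("name"): d for d in schema.findall(XDR + "ElementType")}
    problems = []
    for node in ET.fromstring(doc_text).iter():
        decl = decls.get(node.tag)
        # An open model (the default) places no restriction on extra content.
        if decl is None or decl.get("model", "open") != "closed":
            continue
        allowed_children = {e.get("type") for e in decl.findall(XDR + "element")}
        allowed_attrs = {a.get("type") for a in decl.findall(XDR + "attribute")}
        problems += [f"{node.tag}/{c.tag}" for c in node if c.tag not in allowed_children]
        problems += [f"{node.tag}/@{a}" for a in node.attrib if a not in allowed_attrs]
    return problems


if __name__ == "__main__":
    doc = ('<Order OrderNo="1234"><Customer>Graeme Malcolm</Customer>'
           '<DeliveryDate>2001-02-02</DeliveryDate></Order>')
    print(undeclared_content(SCHEMA, doc))
```

With model="closed" the DeliveryDate element is reported; if every model attribute is changed to open, the same document produces no reports.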

Limiting the Content of an Element

Elements can contain text values and subelements. You can use the content attribute in a schema to limit the kind of data that can be used in an element. You can use the following values for the content attribute:

  • textOnly: Only text is allowed.
  • eltOnly: Only subelements are allowed.
  • mixed: Both text and subelements are allowed.
  • empty: The element must not contain any text or subelements.

The following enhanced order schema shows how you can use the content attribute to limit the types of XML data representation allowed in an XML document:

    <?xml version="1.0"?>
    <Schema name="orderschema"
        xmlns="urn:schemas-microsoft-com:xml-data">
        <ElementType name="OrderDate" model="closed" content="textOnly"/>
        <ElementType name="Customer" model="closed" content="textOnly"/>
        <ElementType name="Product" model="closed" content="mixed">
            <AttributeType name="ProductID"/>
            <AttributeType name="UnitPrice"/>
            <attribute type="ProductID"/>
            <attribute type="UnitPrice"/>
        </ElementType>
        <ElementType name="Quantity" model="closed" content="textOnly"/>
        <ElementType name="Item" model="closed" content="eltOnly">
            <element type="Product"/>
            <element type="Quantity"/>
        </ElementType>
        <ElementType name="Order" model="closed" content="mixed">
            <AttributeType name="OrderNo"/>
            <attribute type="OrderNo"/>
            <element type="OrderDate"/>
            <element type="Customer"/>
            <element type="Item"/>
        </ElementType>
    </Schema>

In this version of the schema, the OrderDate, Customer, and Quantity elements can contain only text, while the Item element can contain only subelements.
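The four content values describe what an element actually holds, which is easy to demonstrate. The sketch below, written in Python with the standard xml.etree.ElementTree module, classifies an element's content using the same vocabulary. The content_kind and satisfies functions are hypothetical helpers, and two choices in them are assumptions rather than documented parser behavior: whitespace between subelements is ignored, and an empty element is accepted wherever text or subelements are permitted.

```python
import xml.etree.ElementTree as ET


def content_kind(node):
    """Classify an element's actual content as textOnly, eltOnly,
    mixed, or empty."""
    has_children = len(node) > 0
    # Text can appear before the first child (node.text) or after
    # any child (that child's tail).
    pieces = [node.text] + [child.tail for child in node]
    has_text = any(p and p.strip() for p in pieces)  # assumption: ignore whitespace
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "eltOnly"
    if has_text:
        return "textOnly"
    return "empty"


def satisfies(node, declared):
    """Check an element against a declared content value."""
    allowed = {
        "empty": {"empty"},
        "textOnly": {"textOnly", "empty"},
        "eltOnly": {"eltOnly", "empty"},
        "mixed": {"mixed", "textOnly", "eltOnly", "empty"},
    }
    return content_kind(node) in allowed[declared]


if __name__ == "__main__":
    item = ET.fromstring(
        "<Item><Product>Chai</Product><Quantity>2</Quantity></Item>")
    print(content_kind(item), content_kind(item[0]))
    print(satisfies(item, "eltOnly"), satisfies(item, "textOnly"))
```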

Determining the Required Occurrences of Data

Another important constraint you can enforce by using schemas is the definition of what data is required in a document and what the minimum and maximum occurrences of each piece of data are. Elements and attributes declared in a schema can be omitted from documents based on that schema, and elements can appear multiple times. To define the rules for data occurrence, use the required, default, minOccurs, and maxOccurs attributes.

Requiring Attributes

An attribute can appear only once in its parent element, so the question of minimum or maximum occurrences is academic—either the attribute appears or it doesn’t. You can force the inclusion of an attribute by specifying a required attribute with a value of yes. The parser will consider invalid any documents that omit the attribute.

You can also specify a default value for an attribute. This value is used when the attribute is omitted in a document based on the schema. To declare a default value you must use the default attribute in the instance of the attribute for which you want to define a default value. Note that the default declaration goes in the instance tag for the attribute (attribute), rather than the declaration (AttributeType).
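The interplay of required and default can be sketched as follows. This Python example uses the standard xml.etree.ElementTree module and a cut-down, hypothetical schema for the Product element; check_attributes is an illustrative helper, not a function of the Microsoft XML parser.

```python
import xml.etree.ElementTree as ET

XDR = "{urn:schemas-microsoft-com:xml-data}"

# Hypothetical fragment: ProductID is required, UnitPrice has a default.
SCHEMA = """<Schema xmlns="urn:schemas-microsoft-com:xml-data">
    <ElementType name="Product">
        <AttributeType name="ProductID" required="yes"/>
        <AttributeType name="UnitPrice"/>
        <attribute type="ProductID"/>
        <attribute type="UnitPrice" default="10.00"/>
    </ElementType>
</Schema>"""


def check_attributes(schema_text, element_text):
    """Return (attribute values with defaults applied,
    names of required attributes that are missing)."""
    schema = ET.fromstring(schema_text)
    node = ET.fromstring(element_text)
    decl = next(d for d in schema.findall(XDR + "ElementType")
                if d.get("name") == node.tag)
    required = {t.get("name") for t in decl.findall(XDR + "AttributeType")
                if t.get("required") == "yes"}
    values, missing = dict(node.attrib), []
    for inst in decl.findall(XDR + "attribute"):
        name = inst.get("type")
        if name in values:
            continue  # the document supplies this attribute
        if inst.get("default") is not None:
            values[name] = inst.get("default")  # omitted, so the default applies
        elif name in required or inst.get("required") == "yes":
            missing.append(name)
    return values, missing


if __name__ == "__main__":
    print(check_attributes(SCHEMA, '<Product ProductID="1">Chai</Product>'))
    print(check_attributes(SCHEMA, '<Product UnitPrice="18">Chai</Product>'))
```

In the first call the omitted UnitPrice picks up the default value; in the second, the omitted ProductID is reported as missing.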

Limiting Occurrences of Elements

To control the number of times an element can be used in a document based on the schema, use the minOccurs and maxOccurs attributes. These attributes take an integer value to define how many times the element can appear, or a wildcard asterisk (*) value to specify that it can appear any number of times. To ensure that an element appears in the document, set the minOccurs attribute to 1. You can use the maxOccurs attribute to specify the maximum number of instances allowed for a particular element.

The following example shows how you can use a schema to constrain the occurrences of attributes and elements:

    <?xml version="1.0"?>
    <Schema name="orderschema"
        xmlns="urn:schemas-microsoft-com:xml-data">
        <ElementType name="OrderDate" model="closed" content="textOnly"/>
        <ElementType name="Customer" model="closed" content="textOnly"/>
        <ElementType name="Product" model="closed" content="mixed">
            <AttributeType name="ProductID" required="yes"/>
            <AttributeType name="UnitPrice"/>
            <attribute type="ProductID"/>
            <attribute type="UnitPrice" default="10.00"/>
        </ElementType>
        <ElementType name="Quantity" model="closed" content="textOnly"/>
        <ElementType name="Item" model="closed" content="eltOnly">
            <element type="Product" minOccurs="1" maxOccurs="1"/>
            <element type="Quantity" minOccurs="1" maxOccurs="1"/>
        </ElementType>
        <ElementType name="Order" model="closed" content="mixed">
            <AttributeType name="OrderNo" required="yes"/>
            <attribute type="OrderNo"/>
            <element type="OrderDate" minOccurs="1" maxOccurs="1"/>
            <element type="Customer" minOccurs="1" maxOccurs="1"/>
            <element type="Item" minOccurs="1" maxOccurs="*"/>
        </ElementType>
    </Schema>

In this schema, the OrderNo attribute of the Order element and the ProductID attribute of the Product element are both required. The UnitPrice attribute of the Product element has a default value of 10.00. The Product and Quantity elements in an Item element must appear exactly once, as must the OrderDate and Customer elements in the Order element. The Item element, however, must appear at least once, but can appear as many times as necessary.
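The occurrence rules amount to counting child elements and comparing the counts with minOccurs and maxOccurs. Here is a minimal Python sketch of that idea, using the standard xml.etree.ElementTree module. The occurrence_errors function and the reduced schema are hypothetical, and treating a missing minOccurs or maxOccurs as 1 is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XDR = "{urn:schemas-microsoft-com:xml-data}"

# Hypothetical fragment containing only the Order occurrence rules.
SCHEMA = """<Schema xmlns="urn:schemas-microsoft-com:xml-data">
    <ElementType name="OrderDate"/>
    <ElementType name="Customer"/>
    <ElementType name="Item"/>
    <ElementType name="Order">
        <element type="OrderDate" minOccurs="1" maxOccurs="1"/>
        <element type="Customer" minOccurs="1" maxOccurs="1"/>
        <element type="Item" minOccurs="1" maxOccurs="*"/>
    </ElementType>
</Schema>"""


def occurrence_errors(schema_text, doc_text):
    """Compare the child counts of the document's root element with
    the minOccurs and maxOccurs values declared for it."""
    schema = ET.fromstring(schema_text)
    node = ET.fromstring(doc_text)
    decl = next(d for d in schema.findall(XDR + "ElementType")
                if d.get("name") == node.tag)
    counts = Counter(child.tag for child in node)
    errors = []
    for inst in decl.findall(XDR + "element"):
        name = inst.get("type")
        low = int(inst.get("minOccurs", "1"))   # assumption: default of 1
        high = inst.get("maxOccurs", "1")       # assumption: default of 1
        found = counts[name]
        if found < low:
            errors.append(f"{name}: found {found}, need at least {low}")
        elif high != "*" and found > int(high):
            errors.append(f"{name}: found {found}, at most {high} allowed")
    return errors


if __name__ == "__main__":
    bad = "<Order><Customer>A</Customer><Customer>B</Customer></Order>"
    for message in occurrence_errors(SCHEMA, bad):
        print(message)
```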

Specifying Data Types

You can use a schema to define the data types of the elements and attributes in a valid XML document. This ensures that the data exchanged in XML documents can be processed correctly, and helps minimize the chance of an error caused by invalid data. To specify data types, your schema must refer to the datatypes namespace. Supported data types for elements include the primitive XML data types specified by the W3C (string, id, idref, idrefs, entity, entities, nmtoken, nmtokens, and notation) as well as more common data types such as boolean, char, dateTime, int, float, and others. You can declare attributes as XML primitive types, including enumeration, and the common data types.

Earlier versions of the Microsoft XML parser supported only a subset of data types for attributes. For full support make sure you are using the latest version of the parser.

The following schema shows how to declare data types for elements and attributes:

    <?xml version="1.0"?>
    <Schema name="orderschema"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
        <ElementType name="OrderDate" model="closed" content="textOnly"
            dt:type="date"/>
        <ElementType name="Customer" model="closed" content="textOnly"
            dt:type="string"/>
        <ElementType name="Product" model="closed" content="mixed">
            <AttributeType name="ProductID" required="yes" dt:type="int"/>
            <AttributeType name="UnitPrice" dt:type="fixed.14.4"/>
            <attribute type="ProductID"/>
            <attribute type="UnitPrice" default="10.00"/>
        </ElementType>
        <ElementType name="Quantity" model="closed" content="textOnly"
            dt:type="int"/>
        <ElementType name="Item" model="closed" content="eltOnly">
            <element type="Product" minOccurs="1" maxOccurs="1"/>
            <element type="Quantity" minOccurs="1" maxOccurs="1"/>
        </ElementType>
        <ElementType name="Order" model="closed" content="mixed">
            <AttributeType name="OrderNo" required="yes" dt:type="int"/>
            <attribute type="OrderNo"/>
            <element type="OrderDate" minOccurs="1" maxOccurs="1"/>
            <element type="Customer" minOccurs="1" maxOccurs="1"/>
            <element type="Item" minOccurs="1" maxOccurs="*"/>
        </ElementType>
    </Schema>

This version of the schema requires that the OrderNo, ProductID, and Quantity values are integer numbers, the OrderDate value is a date, the Customer value is a string, and the UnitPrice value is a fixed decimal number (with up to 14 digits in front of the decimal point and up to 4 digits behind the decimal).
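To make the type rules concrete, the following Python sketch checks text values against the four types used in this schema. It is an approximation written for illustration, not the parser's own type checking: the function names are hypothetical, the date check assumes the YYYY-MM-DD form used in the sample documents, and the fixed.14.4 pattern simply encodes the digit limits described above.

```python
import re
from datetime import date


def is_int(text):
    """Optional sign followed by digits only."""
    return re.fullmatch(r"[+-]?\d+", text) is not None


def is_date(text):
    """A real calendar date written as YYYY-MM-DD (assumed format)."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
    if match is None:
        return False
    try:
        date(*(int(part) for part in match.groups()))
        return True
    except ValueError:  # for example, February 30
        return False


def is_fixed_14_4(text):
    """Up to 14 digits before the decimal point and up to 4 after it."""
    return re.fullmatch(r"[+-]?\d{1,14}(\.\d{1,4})?", text) is not None


CHECKS = {
    "int": is_int,
    "date": is_date,
    "fixed.14.4": is_fixed_14_4,
    "string": lambda text: True,  # any text is a valid string
}


def conforms(dt_type, text):
    """Return True if text is acceptable for the named dt:type."""
    return CHECKS[dt_type](text.strip())


if __name__ == "__main__":
    for dt_type, value in [("int", "1234"), ("date", "2001-02-30"),
                           ("fixed.14.4", "10.00"), ("fixed.14.4", "10.00001")]:
        print(dt_type, value, conforms(dt_type, value))
```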

Validating an XML Document

To ensure that a document is validated using an appropriate schema, you can reference the schema in the document namespace by using the x-schema keyword. For example, the following document references a schema named Orderschema.xml:

    <?xml version="1.0"?>
    <Order OrderNo="1234" xmlns="x-schema:Orderschema.xml">
        <OrderDate>2001-01-01</OrderDate>
        <Customer>Graeme Malcolm</Customer>
        <Item>
            <Product ProductID="1" UnitPrice="18">Chai</Product>
            <Quantity>2</Quantity>
        </Item>
        <Item>
            <Product ProductID="2" UnitPrice="19">Chang</Product>
            <Quantity>1</Quantity>
        </Item>
    </Order>

The file name given in the x-schema namespace is a relative or absolute path to the schema file. In this case the schema is saved as Orderschema.xml in the same folder as the document. When the parser loads a document referencing a schema, that document can be validated against the schema to ensure that it meets all the requirements for documents of that type. If you use the Microsoft XML object library to load the document into an XMLDom object, the default behavior is for the document to be validated as it is parsed. You can check the parseError property to determine whether any errors occurred while loading the document; its error code is nonzero if the load failed. The following Microsoft Visual Basic code shows how to check for a parse error:

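    Dim objXML As MSXML2.DOMDocument30
    Set objXML = CreateObject("Microsoft.XMLDom")
    ' Load synchronously so that parseError is populated when Load returns
    objXML.async = False
    objXML.Load "C:\Order.xml"
    ' A nonzero error code means the document failed to parse or validate
    If objXML.parseError.errorCode <> 0 Then
        MsgBox "Error parsing XML"
    End If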

As an alternative approach, you can alter this behavior by setting the validateOnParse property to False. Then you can use the Validate method to validate the document against the schema:

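    Dim objXML As MSXML2.DOMDocument30
    Set objXML = CreateObject("Microsoft.XMLDom")
    objXML.async = False
    ' Skip validation during parsing; only well-formedness is checked by Load
    objXML.validateOnParse = False
    objXML.Load "C:\Order.xml"
    ' Validate explicitly; a nonzero error code means the document is invalid
    If objXML.validate.errorCode <> 0 Then
        MsgBox "The document is invalid."
    End If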

One potential area for confusion is the fact that Internet Explorer doesn’t automatically validate XML documents when you load them in the browser. It merely checks to ensure that the document is well formed. You can install the XML Validation Tool from the MSDN site at http://msdn.microsoft.com/downloads to enable document validation from within Internet Explorer. After you install this tool, right-click in an XML document in Internet Explorer and choose to validate the XML against a schema.
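The difference between a well-formed document and a valid one is worth making concrete. The Python sketch below uses the standard xml.etree.ElementTree module, which, like the browser behavior just described, checks only that a document is well formed and never consults a schema. Both function names are hypothetical; the second one merely extracts the path that follows x-schema: in the root element's namespace so that a separate validator could locate the schema.

```python
import xml.etree.ElementTree as ET


def well_formed_error(xml_text):
    """Return None if the text is well formed, otherwise the
    parser's error message. No schema validation takes place."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)


def schema_reference(xml_text):
    """Return the schema path from an x-schema namespace on the
    root element, or None if the root has no such namespace."""
    tag = ET.fromstring(xml_text).tag  # '{namespace}name' when namespaced
    prefix = "{x-schema:"
    if tag.startswith(prefix):
        return tag[len(prefix):tag.index("}")]
    return None


if __name__ == "__main__":
    good = ('<Order OrderNo="1234" xmlns="x-schema:Orderschema.xml">'
            '<Customer>Graeme Malcolm</Customer></Order>')
    broken = "<Order><Customer>Graeme Malcolm</Order>"
    print(well_formed_error(good), schema_reference(good))
    print(well_formed_error(broken))
```

A document can pass the first check and still be invalid: the well-formedness test says nothing about required attributes, occurrences, or data types.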



Programming Microsoft SQL Server(TM) 2000 with XML (Pro-Developer)
ISBN: 0735613699
Year: 2005
Pages: 89