|only for RuBoard|
Box, Don, Aaron Skonnard, and John Lam. Essential XML: Beyond Markup. DevelopMentor.
Similar to Document Type Definitions (DTDs), XML Schemas specify the structure of an XML document. This chapter looks at XML Schemas and how you can use them to validate XML documents.
XML Schemas are well-formed XML documents. This means that, unlike DTDs, they can be created and manipulated with XML parsers, such as the .NET objects in the System.Xml namespace or the MSXML DOM. XML Schemas have many advantages over DTDs, including support for data types and namespaces. They are readable and provide tremendous flexibility for describing an XML document.
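Because a schema is itself well-formed XML, any ordinary XML parser can read it. The following is a minimal sketch of that idea in Python, used here only for illustration in place of System.Xml or MSXML. The schema fragment and the function name declared_elements are hypothetical and invented for this example; only the XML Schema namespace URI comes from the W3C Recommendation.

```python
import xml.etree.ElementTree as ET

# Namespace that identifies the XSD vocabulary (W3C XML Schema).
XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment, invented for this illustration.
SCHEMA_TEXT = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="NAME" type="xs:string"/>
  <xs:element name="PRICE" type="xs:decimal"/>
</xs:schema>"""

def declared_elements(schema_text):
    """Return (name, type) pairs for every xs:element in the schema."""
    root = ET.fromstring(schema_text)
    return [(el.get("name"), el.get("type"))
            for el in root.iter("{%s}element" % XS)]

print(declared_elements(SCHEMA_TEXT))
```

Nothing here validates anything. The sketch only shows that a schema can be loaded and inspected with the same tools you use for any other XML document, which is not possible with a DTD.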
World Wide Web Consortium Recommendation
XML Schemas are a World Wide Web Consortium (W3C) Recommendation, which means that they have passed all earlier stages of the W3C approval process. It also means that tool support is more widespread, because the specification is no longer the moving target it was as a working draft.
Before you jump into the code for working with schemas, you must understand what schemas are. The word Extensible in Extensible Markup Language refers to the ability to define new elements, thereby creating documents that are self-describing. Schemas take the notion of extensibility and extend it to encompass defining new vocabularies for XML documents. Schemas help you define the structure of a document so that the structure can be easily conveyed within and outside your organization.
The word schema is often used in the context of databases to describe the layout of a database. A database schema defines the tables, columns, and datatypes for a database. XML Schemas are similar in that you can define the layout of the document not only in terms of relationships and datatypes, but also in terms of hierarchy and order of precedence. At their most basic, XML Schemas are XML files that use predefined elements to define the structure of an XML document. A parser that supports the XML Schema Recommendation knows how to react to the predefined elements when validating your XML document. Usually, a schema is defined in a file with an XSD extension that is separate from the XML document. An XML document can reference the schema to declare the structure to which it should conform, although it is not required to do so.
Two main acronyms are associated with XML schemas. XML Data Reduced (XDR) was based on a proposal submitted by Microsoft. XDR Schemas are used and supported in many Microsoft tools, such as SQL Server 2000, BizTalk, and Microsoft Office 2000. XDR has been supported in the MSXML parser since version 2.0; it is supported both in the version 4.0 parser and in .NET. The W3C Working Group evolved XDR into XML Schema Definition (XSD) language. XSD is the current implementation of XML Schemas and is currently a W3C Recommendation, which means that the proposal has passed all other stages of the W3C approval process. XSD has superseded XDR, so this book will focus primarily on XSD. However, XDR schemas are used in Chapter 9, "SQL Server 2000 and XML," in conjunction with SQL Server 2000 and are explained within the context. For more information on XDR Schemas, see the XDR Schema Developer's Guide on MSDN Online at http://msdn.microsoft.com/library/en-us/xmlsdk30/htm/xmconxmlschemadevelopersguide.asp.
Schemas allow for validation of the data contained in an XML document. Suppose that your application accepts XML data from a third party. How would you convey what the structure of the XML should be? You could email the structure: "The customer's ID field is 20 characters and is required." This would be troublesome, however, when you need to update the XML format. It would also be inconvenient for validation, because you would still have to write validation routines to confirm that the data is formatted according to your email. You would eventually have to recode the same implementation for another document, creating document-specific validation code that is not easily reusable.
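To make the problem concrete, here is a rough sketch of what such a hand-written routine might look like. It is written in Python purely for illustration. The element path CUSTOMER/ID, the function name check_customer_id, and the reading of "20 characters" as "at most 20 characters" are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def check_customer_id(text):
    """Return a list of problems; an empty list means the rule is satisfied."""
    root = ET.fromstring(text)
    node = root.find("CUSTOMER/ID")  # hypothetical element path
    # Rule 1: the ID element must exist and must not be blank.
    if node is None or not (node.text or "").strip():
        return ["CUSTOMER/ID is required"]
    # Rule 2 (assumed reading of the email): no more than 20 characters.
    if len(node.text.strip()) > 20:
        return ["CUSTOMER/ID must be at most 20 characters"]
    return []

good = "<PURCHASEORDER><CUSTOMER><ID>C-1001</ID></CUSTOMER></PURCHASEORDER>"
print(check_customer_id(good))
```

Every rule in the email becomes another branch in code like this, and none of it carries over to the next document format. That is the maintenance burden a schema is meant to remove.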
Instead of guessing that the XML document conforms to an agreed-upon protocol and risking breaking your application due to malformed XML or missing required elements, you can explicitly convey your document standard by providing an XML Schema. You can make the schema available to developers from other departments or organizations so that they will know how to format their XML documents to effectively share data. When they send you a document, you can validate it using a parser that supports schema validation, such as MSXML4 or the System.Xml objects in .NET. If the XML document conforms to the contract specified in the schema, the XML document is said to be valid. If it does not conform, the XML parser notifies you that the document is not valid.
Another reason to consider schemas is their integration throughout .NET. As explained throughout this book, XML is integrated heavily into .NET. For example, Chapter 8, "Database Access with ADO.NET and XML," shows how XML Schemas play an important role in ADO.NET and are used heavily within ADO.NET DataSets to describe the content of the underlying data. Another new concept in .NET is the notion of typed DataSets. By using XML Schemas, you have the ability to generate class structures from an XML Schema, or to generate schemas from an existing class structure. This is a powerful concept: You can define your class hierarchies by using an XML Schema document and then use a utility to generate class files so that most of the coding is done for you. This technique is demonstrated in Chapter 4, "XML Tool Support in Visual Studio .NET." Conversely, you can develop your class files and serialize them by using a schema document to validate the serialization. Serialization is covered in Chapter 10, "XML Serialization." Finally, XML Schemas are being integrated into many other Microsoft products, such as SQL Server. SQL Server's extensive use of XML Schemas is covered in Chapter 9, "SQL Server 2000 and XML."
A schema document is the document that describes the rules and layout for an XML document. An instance document is an XML document that's validated against the schema document.
Referencing a Schema Document Directly
An instance document does not necessarily have to include a reference to the schema document, although it is common to do so. An XML parser that supports XSD should provide a means to specify the schema document's location and/or text without requiring the instance document to include an embedded reference to the schema document.
For most of this chapter, you'll use instance documents with an embedded reference to a schema document. In the cases where an instance document is used without a schema document, it is explicitly stated.
The two things I wanted to know when I first started using schemas were how to represent a class object and how to represent a collection of those objects. Consider the XML document in Listing 2.1.
<?xml version="1.0" encoding="utf-8" ?>
<PURCHASEORDER>
  <CUSTOMER>
    <NAME>Carson Allen Evans</NAME>
    <PHONE>(800)555-1212</PHONE>
    <EMAIL>email@example.com</EMAIL>
  </CUSTOMER>
  <ORDER>
    <ITEM>
      <ITEMNAME>Easton BZ70-Z Z-Core Titanium Baseball Bat</ITEMNAME>
      <DESCRIPTION>The BZ70-Z Z-Core titanium baseball bat from Easton with Sc777 Triple Seven alloy construction.</DESCRIPTION>
      <SIZE>33/30</SIZE>
      <PRICE>229.99</PRICE>
    </ITEM>
    <ITEM>
      <ITEMNAME>Mizuno MZP11 Pro Limited</ITEMNAME>
      <DESCRIPTION>Mizuno MZP11 Pro Limited 12 Inch Baseball Glove, Pro Sized Pitcher's Glove.</DESCRIPTION>
      <SIZE>Left hand throw</SIZE>
      <PRICE>175.99</PRICE>
    </ITEM>
  </ORDER>
</PURCHASEORDER>
Here, you can see an order for some baseball equipment. This document is the instance document because it contains a data instance of the structure that's defined throughout this chapter. The root node is PURCHASEORDER, which describes the content. You have a CUSTOMER object representation with the properties NAME, PHONE, and EMAIL. Then you have an ORDER collection that contains multiple ITEM elements. You can use an XML Schema to validate the structure of the XML instance document. The schema acts as a template: any document that claims to be an instance of that template, along with the data it contains, can be validated against it.
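The object-and-collection shape of this document can be seen by reading it with a parser. The following Python sketch is an illustration only: it uses an abbreviated copy of Listing 2.1 (the DESCRIPTION elements are omitted for brevity), and the function name read_purchase_order is hypothetical. It reads the structure but does not validate it.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of Listing 2.1 (DESCRIPTION elements omitted).
PO_TEXT = """<PURCHASEORDER>
  <CUSTOMER>
    <NAME>Carson Allen Evans</NAME>
    <PHONE>(800)555-1212</PHONE>
    <EMAIL>email@example.com</EMAIL>
  </CUSTOMER>
  <ORDER>
    <ITEM>
      <ITEMNAME>Easton BZ70-Z Z-Core Titanium Baseball Bat</ITEMNAME>
      <SIZE>33/30</SIZE>
      <PRICE>229.99</PRICE>
    </ITEM>
    <ITEM>
      <ITEMNAME>Mizuno MZP11 Pro Limited</ITEMNAME>
      <SIZE>Left hand throw</SIZE>
      <PRICE>175.99</PRICE>
    </ITEM>
  </ORDER>
</PURCHASEORDER>"""

def read_purchase_order(text):
    """Return the customer as a dict and the order as a list of item dicts."""
    root = ET.fromstring(text)
    # CUSTOMER behaves like a single object: one dict of properties.
    customer = {child.tag: child.text for child in root.find("CUSTOMER")}
    # ORDER behaves like a collection: one dict per ITEM child.
    items = [{child.tag: child.text for child in item}
             for item in root.findall("ORDER/ITEM")]
    return customer, items

customer, items = read_purchase_order(PO_TEXT)
print(customer["NAME"], len(items))
```

Note that this code silently assumes the document has the expected shape. If CUSTOMER were missing, it would fail at run time, which is exactly the kind of surprise schema validation is intended to catch up front.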
XML Naming Conventions
Listing 2.1 uses all uppercase for element names. Although different conventions exist for naming XML elements, such as Pascal-style casing and camel-casing, the use of all uppercase naming was chosen for this example to yield an important point later regarding typed DataSets and Visual Basic .NET.
Begin by looking at how to associate the instance document with a schema document. Listing 2.1 is an example of an instance document without an embedded schema reference. To embed a reference to the external schema document, you must specify a namespace Uniform Resource Identifier (URI) that points to the schema document, as shown here:
<PURCHASEORDER xmlns="http://www.xmlandasp.net/sales/"> ... </PURCHASEORDER>
This code uses a namespace to indicate which schema the document is associated with. A namespace is simply a unique identifier; it is a name, not necessarily an address that a parser can fetch. A URL is used because it is unique across the Internet, but you could just as easily have used a Globally Unique Identifier (GUID) or any other string. For example, the following code is also valid:
<PURCHASEORDER xmlns="DeannaEvans"> ... </PURCHASEORDER>
Another convention is to use a Uniform Resource Name (URN) to identify your schema. For more information on URNs, search for RFC 1737 on the web by using your favorite search site. Here's an example of using a URN:
<PURCHASEORDER xmlns="urn:deanna-evans:wife"> ... </PURCHASEORDER>
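You can confirm that a parser treats all three forms as plain identifiers. The following Python sketch is offered only as an illustration; the function name root_namespace is hypothetical. It parses each of the preceding root elements, written here as empty elements, and reports the namespace name the parser recorded. No network access takes place.

```python
import xml.etree.ElementTree as ET

def root_namespace(text):
    """Return the namespace name of the root element, or None if unqualified."""
    tag = ET.fromstring(text).tag
    # ElementTree writes qualified names as "{namespace}localname".
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None

# The three namespace styles shown above, plus an unqualified root.
for doc in (
    '<PURCHASEORDER xmlns="http://www.xmlandasp.net/sales/"/>',
    '<PURCHASEORDER xmlns="DeannaEvans"/>',
    '<PURCHASEORDER xmlns="urn:deanna-evans:wife"/>',
    '<PURCHASEORDER/>',
):
    print(root_namespace(doc))
```

In each case the parser hands back the string exactly as written in the xmlns attribute, whether it looks like a URL, a bare word, or a URN.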
Because a namespace doesn't have to be a URL, how would the processor locate the schema document referenced in your XML instance document? To answer this question, another namespace is available for use in your instance document: the XML Schema instance namespace, which is described in "XML Schema Part 0: Primer" at www.w3.org/TR/xmlschema-0. This namespace defines the schemaLocation attribute, which provides hints about the physical location of the schema. Its value is a list of URI reference pairs: the first member of each pair names the namespace, and the second gives the physical location of the schema document for that namespace.
In this code snippet, you can see the use of the xsi namespace to associate a physical document with the associated URN:
<PURCHASEORDER xmlns="urn:schemas-xmlandasp-net:po"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:schemas-xmlandasp-net:po http://www.xmlandasp.net/schema/po.xsd"> ... </PURCHASEORDER>
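The pairing of namespace and location inside xsi:schemaLocation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular parser implements the hint: the function name schema_locations is hypothetical, the root element is written as an empty element, and the namespace URN is spelled identically in the xmlns declaration and in the schemaLocation value so that the two can be matched.

```python
import xml.etree.ElementTree as ET

# The XML Schema instance namespace, conventionally bound to the xsi prefix.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

PO_TEXT = (
    '<PURCHASEORDER xmlns="urn:schemas-xmlandasp-net:po" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="urn:schemas-xmlandasp-net:po '
    'http://www.xmlandasp.net/schema/po.xsd"/>'
)

def schema_locations(text):
    """Map each namespace named in xsi:schemaLocation to its location hint."""
    root = ET.fromstring(text)
    # The attribute value is a whitespace-separated list of URI references.
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    if len(tokens) % 2:
        raise ValueError("schemaLocation must hold namespace/location pairs")
    # Even positions are namespace names; odd positions are locations.
    return dict(zip(tokens[0::2], tokens[1::2]))

print(schema_locations(PO_TEXT))
```

The lookup works only when the namespace in the pair matches the namespace the document actually declares, character for character. A one-character difference between the two means the hint applies to a different namespace.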