Defining and Validating XML with Schemas

Whenever you use or manipulate data, you need to have a way of answering certain questions about that data. Is an Invoice ID stored as a text value or a numeric value? Is a phone number limited to 10 digits? For that matter, can a person have more than one phone number? What happens if the person has none?

All these questions have to do with the concepts of data definition and validation. Application developers have historically embedded validation logic in application code. Sophisticated designs can encapsulate validation logic in various ways, but in most cases, the data definition and validation logic aren't accessible to processes outside of your application. This defeats the purpose of XML on a number of levels. Remember that XML is designed to be interoperable and human readable. When you commit validation logic to code, you've almost inherently made the validation logic inaccessible to other processes that might come along later. It is, in essence, a black box. The concept of encapsulating data validation in a class is a good and useful thing, but if other developers can't easily access your data design from outside sources, it may not be as useful to them.

XML provides its own way to express and validate data designs: a standard descriptive format referred to as XML schemas (sometimes abbreviated XSD). Because the various .NET tools provide good support for XML schemas, we'll devote some time to discussing how schemas work, how to build them, and what you can do with them in your Internet applications.

About Document Type Definitions (DTDs)

The first technology used for validating XML structures was known as Document Type Definition (DTD). By linking a DTD document to an XML file, you can ensure that the XML document contains valid data types and structure.

The problem with DTDs is that they have limitations with respect to the things they can do. One glaring limitation is that DTDs can't define data types of elements that appear in a document.

But the most damning implication of using DTDs to validate XML is that DTDs are written using a syntax that is completely removed from that of XML itself; if you want to validate your data using the DTD format, you must ascend a learning curve.

A good example of a DTD is the DTD for XML itself, which resides at http://www.w3.org/XML/1998/06/xmlspec-v21.dtd. By looking at this DTD, you can get a sense for how different the syntax of DTD is. Indeed, the response of many developers who had to use DTDs in the early days of XML was, "Couldn't we use XML syntax to define the structure of an XML document?"

Microsoft chose to use a more evolved document definition technology for XML in the .NET universe: the XML schema. The most important benefit of XML schemas is that you can write an XML schema using the XML syntax you presumably already know. For these reasons, this chapter focuses on schemas rather than DTDs.

NOTE

The Visual Studio .NET development environment gives developers a graphical way to build XML schemas and contains little or no support for DTDs. You can certainly use DTDs with XML data in the .NET framework; you just won't get much help from the tools.


Before we proceed, it's worth noting that you're never required to validate the XML documents that you use in your application development; XML documents can live out their whole lives without ever knowing or using an XML schema. However, it's a good idea to validate them for the sake of consistency. In addition, certain tools (including Visual Studio .NET) use XML schemas in various interesting and useful ways. Having a handle on what an XML schema is and how it works will give you a leg up on using these tools.

NOTE

The official W3C documentation on XML schemas comes in three parts: a primer, which as of this writing runs 73 printed pages, and two sections that document the schema specification in detail. The whole thing starts at http://www.w3.org/XML/Schema.


About XML Data-Reduced Schemas

The COM-based Microsoft XML implementation that existed before the arrival of the .NET framework used a syntax known as XML Data-Reduced (XDR) schemas. Confusingly, the Microsoft documentation refers to XDR as "XML Schemas," even though that's really a different, albeit related, syntax from that provided by the W3C. The .NET tools support both the XDR and W3C way of expressing schemas, but Visual Studio .NET follows the W3C schema syntax, so you'll likely see the W3C syntax used in .NET applications more often.

We are including this section on XDR schemas so that you can identify and understand the difference between XDR and W3C XML schemas, and as a way to make it easier for users of the MSXML library who are moving to the .NET framework to migrate their applications. Also, because the XML-handling objects in the .NET framework can process XDR-validated documents, it's possible that you will need to use XDR at some point, even though it's being superseded by the W3C schema format.

Listing 10.32 shows an example of a complete XDR schema definition.

Listing 10.32 Example of an XDR Schema
 <Schema xmlns="urn:schemas-microsoft-com:xml-data"
         xmlns:dt="urn:schemas-microsoft-com:datatypes">
   <ElementType name='TITLE' content='textOnly' />
   <AttributeType name='IDType' dt:type='integer' />
   <ElementType name='AUTHOR' content='textOnly'>
     <attribute type='IDType' />
   </ElementType>
   <ElementType name='BOOK' content='mixed'>
     <element type = 'TITLE' />
     <element type = 'AUTHOR' />
   </ElementType>
 </Schema>

The example begins with a Schema node, to indicate that this is the start of a schema. The two xmlns attributes declare namespaces: the first identifies the XDR schema vocabulary itself, and the second identifies the data types that Microsoft defined for use in XDR schemas.

The ElementType nodes in the schema document form the definition of the nodes that compose the document. In this example, you can see two types of ElementTypes defined: a simple type (containing no child nodes, such as TITLE) and a complex type (containing child nodes and/or attributes, such as BOOK). We'll discuss simple and complex types in more detail in the section "Understanding Simple and Complex Types" later in this chapter.

NOTE

The Microsoft reference on XDR schemas is at http://msdn.microsoft.com/xml/reference/schema/start.asp. Another useful Microsoft link is the XML Schema Developer's Guide, located at http://msdn.microsoft.com/xml/xmlguide/schema-overview.asp.

This reference material was created to document the behavior of the MSXML parser found in Internet Explorer 5.0. It may not have direct applicability to XML applications that you build using the .NET tools (use the W3C specification for XML schemas you build in .NET). Note, again, that when Microsoft refers to "XML Schema," it may be referring to either XDR schemas or W3C-style XML schemas. In general, what you get in the .NET tools are true W3C XML schemas.


This section is intended to give you the briefest example of an XDR schema so you can understand what an XDR schema looks like. Because Visual Studio .NET uses the more recent W3C recommendation for XML schemas, however, we'll spend the rest of this section discussing the W3C syntax for XML document definition and validation.

NOTE

Using a tool that comes with the .NET framework SDK, you can convert existing XDR schemas to the W3C format described in the next section. Known as the XML Schema Definition Tool (xsd.exe), this command-line tool can also create basic schemas from existing XML files and build ADO.NET classes in Visual Basic.NET or C# from existing schemas.


Creating W3C XML Schemas

A W3C XML schema is conceptually similar to an XDR schema, but has a number of implementation differences. Because XML schema is on its way to becoming an Internet standard, it's better to use the W3C standard format because you can expect a better level of interoperability as the standard propagates. Fortunately, the new XML-handling tools included in the .NET framework and Visual Studio .NET tend to use the newer W3C versions of technologies such as schemas, so you'll have lots of help when building applications that are designed to be Internet standard and interoperable.

Our objective in this section is to perform a very simple validation on a simple XML document. Because the W3C XML schema language is a very complex syntax that could warrant a short book of its own, in this section we'll cover only the basics of XML schema, particularly how to create a schema that validates an XML document using the XML-handling objects in the .NET framework classes.

Listing 10.33 shows the document we'll be validating in this section.

Listing 10.33 Simplified book.xml Document
 <BOOK isbn="1234567890">
   <TITLE>Little Red Riding Hood</TITLE>
   <AUTHOR>Dave-Bob Grimm</AUTHOR>
 </BOOK>

As you can see, this is a greatly simplified version of the books.xml document we've used in examples throughout this chapter. Rather than containing an unlimited number of books, this document contains only a single book, so it might be used in situations where one software process hands off book information to another.

Like XDR schemas, W3C schemas are generally linked to external files that provide the basic definition of what a schema is. As a result, the W3C-compliant schemas you create will typically begin with a reference to the standard W3C schema definition (known as the schema of schemas). The W3C schema definition gives your schema access to basic data types and structures you'll need to construct schemas to define and validate your XML documents. This basic definition is shown in Listing 10.34.

Listing 10.34 W3C Schema Definition Boilerplate
 <?xml version="1.0"?>
 <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <!-- Your schema definition goes here -->
 </xsd:schema>

This is a boilerplate definition that you'll probably include in most, if not all, of your XML schema documents. (Note that the XML schema-editing function provided by Visual Studio .NET generates a slightly different boilerplate schema definition; what you see here is more streamlined.)

This boilerplate provides the same basic function as the initial definition of the XDR schema shown in a previous example, but it provides a different basic schema type and associates it with the xsd namespace. This means that you'll often see elements of a W3C schema prefixed with the xsd namespace; this is done to prevent namespace collisions between elements defined by the xsd schema definition and elements in your documents with the same name.

The next step to defining your own W3C schema is to define the data types that can appear in your document. To do this, you must have a handle on simple and complex data types in XML and how the W3C XML schema defines them.

Understanding Simple and Complex Types

As you know, an XML document is inherently hierarchical. Every XML document is composed of nodes that can contain child nodes, nested as deeply as necessary to represent a given data structure. When you're authoring an XML schema, you need to be able to make a distinction between simple and complex types. A complex type is defined as any node that has children or attributes; a simple type has no children or attributes. For example, in the book.xml document used as an example in Listing 10.33, BOOK is a complex type because it contains two child elements, AUTHOR and TITLE, as well as an attribute, isbn. AUTHOR, on the other hand, is a simple type, because it contains nothing but a text string (the name of the book's author).

The distinction between simple and complex types becomes important when building XML schemas because the two types are described in different ways in the schema format. In XML schema authoring, it's common to define the simple types first and the complex types later because the complex types are almost invariably built on the simple type definitions.

Listing 10.35 shows an example of a schema with a simple type definition.

Listing 10.35 W3C Schema Containing a Definition for a Simple Type
 <?xml version="1.0"?>
 <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:simpleType name="ISBNType">
     <xsd:restriction base="xsd:string">
       <xsd:maxLength value="10"/>
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:schema>

The first few lines of this schema definition are the same as the previous example; they refer to the external master schema maintained by the W3C that defines basic data types and so forth.

The xsd:simpleType node contains the definition for a simple type called ISBNType. You can see from the definition that this type contains a restriction, which provides additional information about the data type and (optionally) the nature of the data that this data type supports. In this case, ISBNType is declared to be a string that can have a maximum length of 10 characters. (Note that although ISBN stands for International Standard Book Number, an ISBN can contain alphabetic characters in addition to numbers, so we declare it to be a string type.)

Be careful typing the names of elements and attributes such as simpleType and maxLength in your XML schema definitions. As with all XML elements, the elements of an XML schema are case sensitive.

You don't have to define named types in your XML schemas. The advantage of doing so is reusability: after you've defined a named type, you can reuse it anywhere you like in the schema definition (or refer to the schema from another schema and reuse it that way). When used in this way, XML schemas behave a bit like class definitions.

After you've created a type definition, you can declare that your document will contain elements of this type. Listing 10.36 shows an example of a schema containing a reference to the ISBNType simple type.

Listing 10.36 W3C Schema Containing an Element Definition That Refers to a Type Definition
 <?xml version="1.0" ?>
 <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
     <xsd:element name="ISBN" type="ISBNType"></xsd:element>
     <xsd:simpleType name="ISBNType">
         <xsd:restriction base="xsd:string">
             <xsd:maxLength value="10" />
         </xsd:restriction>
     </xsd:simpleType>
 </xsd:schema>

The element definition indicates that XML documents based on this schema will contain an element named ISBN. The type definition for this element is the ISBNType type created in the previous section.
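To see the ISBNType restriction at work, the following is a minimal sketch that loads the schema from Listing 10.36 out of a string and checks two ISBN elements against it. It assumes .NET Framework 2.0 or later, where XmlReader.Create with an XmlReaderSettings object is the supported way to validate (the XmlValidatingReader class covered later in this chapter is marked obsolete in those versions). The class name IsbnDemo and the helper FirstError are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class IsbnDemo
{
    // The schema from Listing 10.36, held in a string so the sketch needs no files.
    const string Schema =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>" +
        "  <xsd:element name='ISBN' type='ISBNType'/>" +
        "  <xsd:simpleType name='ISBNType'>" +
        "    <xsd:restriction base='xsd:string'>" +
        "      <xsd:maxLength value='10'/>" +
        "    </xsd:restriction>" +
        "  </xsd:simpleType>" +
        "</xsd:schema>";

    // Hypothetical helper: returns null when the document is valid,
    // otherwise the message of the first validation error.
    public static string FirstError(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Schema)));

        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                // Validation happens as a side effect of reading each node.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlSchemaValidationException ex)
        {
            // With no event handler attached, the first violation is thrown.
            return ex.Message;
        }
    }

    public static void Main()
    {
        // Ten characters: within the maxLength facet, so no error is reported.
        Console.WriteLine(FirstError("<ISBN>1234567890</ISBN>") ?? "valid");
        // Eleven characters: exceeds maxLength, so an error message is printed.
        Console.WriteLine(FirstError("<ISBN>12345678901</ISBN>") ?? "valid");
    }
}
```

Keeping the schema in a string is only a convenience for the sketch; in an application you would more likely load it from a file, as the listings later in this chapter do.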

This schema would be sufficient if we were interested only in creating XML documents containing lists of ISBN numbers. But the book information we're working with contains much more than that; we need to transmit the title and author of the book as well. To do this, we'll need to modify our schema to include a new complex type, called BookType, that defines TITLE and AUTHOR elements, as well as an isbn attribute. The isbn attribute is defined as an ISBNType, which means it takes on the properties of that type definition; it's a string data type with a maximum length of 10 characters.

Listing 10.37 shows another version of the schema, this time with a more complete definition of the BookType type. This time, we've added TITLE and AUTHOR elements to the BookType.

Listing 10.37 W3C Schema Containing a Complex Type That Refers to a Simple Type
 <?xml version="1.0" ?>
 <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
     <!-- Element definition -->
     <xsd:element name="BOOK" type="BookType"></xsd:element>
     <!-- Complex type definition -->
     <xsd:complexType name="BookType">
         <xsd:all>
             <xsd:element name="TITLE" type="xsd:string" />
             <xsd:element name="AUTHOR" type="xsd:string" />
         </xsd:all>
         <xsd:attribute name="isbn" type="ISBNType" />
     </xsd:complexType>
     <!-- Simple type definition with restriction-->
     <xsd:simpleType name="ISBNType">
         <xsd:restriction base="xsd:string">
             <xsd:maxLength value="10" />
         </xsd:restriction>
     </xsd:simpleType>
 </xsd:schema>

This version of the schema completes the complex type definition BookType by adding two elements: TITLE and AUTHOR. Both elements are defined as conventional strings with no special validation logic attached. The <xsd:all> element indicates that two or more child elements can appear in any order beneath the parent element. If you need to specify that the child elements should appear in a particular order, use <xsd:sequence> instead of <xsd:all>.
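The difference between xsd:all and xsd:sequence is easiest to see by validating the same document against both. The sketch below is an illustration under stated assumptions, not code from this chapter's listings: it builds a cut-down BookType schema (without the isbn attribute) around whichever grouping element you pass in, then validates a document whose AUTHOR element comes before TITLE. It uses XmlReaderSettings and XmlSchemaSet, which require .NET Framework 2.0 or later; OrderDemo, BuildSchema, and IsValid are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class OrderDemo
{
    // Hypothetical helper: builds a simplified BookType schema whose children
    // are grouped by the given element name ("all" or "sequence").
    public static string BuildSchema(string compositor)
    {
        return
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>" +
            "  <xsd:element name='BOOK' type='BookType'/>" +
            "  <xsd:complexType name='BookType'>" +
            "    <xsd:" + compositor + ">" +
            "      <xsd:element name='TITLE' type='xsd:string'/>" +
            "      <xsd:element name='AUTHOR' type='xsd:string'/>" +
            "    </xsd:" + compositor + ">" +
            "  </xsd:complexType>" +
            "</xsd:schema>";
    }

    // Returns true if the document validates against the schema text.
    public static bool IsValid(string xml, string schemaText)
    {
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(schemaText)));

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;

        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlSchemaValidationException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // AUTHOR appears before TITLE, the reverse of the order in the schema.
        string doc = "<BOOK><AUTHOR>Dave-Bob Grimm</AUTHOR>" +
                     "<TITLE>Little Red Riding Hood</TITLE></BOOK>";

        // xsd:all accepts the children in either order.
        Console.WriteLine("all:      " + IsValid(doc, BuildSchema("all")));
        // xsd:sequence requires TITLE first, so the same document is rejected.
        Console.WriteLine("sequence: " + IsValid(doc, BuildSchema("sequence")));
    }
}
```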

The big change in this final version of the schema is the addition of the BOOK element definition. Because the type of book was defined previously as a complex type, this section is very straightforward; all we need to do is reference the BookType complex type.

You'll notice, too, that this version of the schema contains comments; comments in XML are syntactically identical to comments in HTML. You should include comments in any XML file you create wherever there's a chance that somebody who comes along later might misunderstand what's going on in your document.

Validating Documents Using W3C Schemas

To validate an XML document using a schema, you first create the schema and then link the schema to an XML document defined by the schema. When an XML parser processes a document that is linked to a schema, the parser will first download the schema document(s) associated with that file. If the file fails to conform with any of the rules specified in the schema, the parser will complain (and in most cases refuse to proceed).

Schemas are often contained in a file separate from the XML data file. This enables you to change the schema definition without having to slog through the data itself. By placing a schema file in a location accessible through the Internet, any file anywhere can access and utilize the data structure and validation rules found in the schema.

In addition to linking XML documents to schemas, schemas themselves can be linked to other schemas. This gives you the capability to build progressively more sophisticated schemas based on more basic schema definitions created in the past.

After you've created an XML schema and linked it to an associated XML data document that implements the schema, you should test it to ensure that it does what you want. Because Internet Explorer understands XML and can render XML in the browser, you can use Internet Explorer 5.0 or later to determine whether your XML document parses. To do this, simply load the file into Internet Explorer using the URL text box or by using the File, Open menu command.

As with any XML parser, Internet Explorer automatically downloads the schema definition when it encounters an XML document that is linked to a schema. After the file and the external schemas are downloaded, the browser then attempts to parse the document. If parsing is successful, Internet Explorer displays the document. If it's unsuccessful, it usually gives you an error message in the browser.

Problems with XML rendering and parsing in a validated context usually stem from one of two problems: The document is not well formed (meaning the document is lacking an end tag for a node, for example), or the document is invalid (according to the validation rules defined in the schema file).

XML parsers almost invariably reject documents that are not well formed; whether they reject invalid documents depends on the tool you use to handle the document. For example, an XmlTextReader object will not throw an error when it reads an invalid XML document, but the XmlValidatingReader object will. The XmlValidatingReader object is introduced in the next section, "Using .NET Framework Objects to Validate XML Schemas."
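The distinction between a document that is not well formed and one that is merely invalid can be sketched in a few lines. The example below is an assumption-laden illustration rather than the chapter's own code: it uses XmlReader.Create and XmlReaderSettings (.NET Framework 2.0 or later) in place of XmlTextReader and XmlValidatingReader, embeds the schema from Listing 10.37 in a string, and uses the hypothetical names WellFormedDemo, IsWellFormed, and IsValid.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class WellFormedDemo
{
    // The schema from Listing 10.37, embedded so the sketch needs no files.
    const string Schema =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>" +
        "  <xsd:element name='BOOK' type='BookType'/>" +
        "  <xsd:complexType name='BookType'>" +
        "    <xsd:all>" +
        "      <xsd:element name='TITLE' type='xsd:string'/>" +
        "      <xsd:element name='AUTHOR' type='xsd:string'/>" +
        "    </xsd:all>" +
        "    <xsd:attribute name='isbn' type='ISBNType'/>" +
        "  </xsd:complexType>" +
        "  <xsd:simpleType name='ISBNType'>" +
        "    <xsd:restriction base='xsd:string'>" +
        "      <xsd:maxLength value='10'/>" +
        "    </xsd:restriction>" +
        "  </xsd:simpleType>" +
        "</xsd:schema>";

    // Parses without consulting any schema: only well-formedness is checked.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Parses with schema validation switched on.
    public static bool IsValid(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Schema)));
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlSchemaValidationException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Missing </TITLE> end tag: not well formed, so any parser rejects it.
        string broken = "<BOOK><TITLE>Little Red Riding Hood</BOOK>";
        // Well formed, but the isbn attribute is longer than ISBNType allows.
        string invalid = "<BOOK isbn='12345678901'><TITLE>Little Red Riding Hood</TITLE>" +
                         "<AUTHOR>Dave-Bob Grimm</AUTHOR></BOOK>";

        Console.WriteLine(IsWellFormed(broken));   // False
        Console.WriteLine(IsWellFormed(invalid));  // True: a plain reader does not notice
        Console.WriteLine(IsValid(invalid));       // False: the validating reader does
    }
}
```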

NOTE

You should watch out for a couple of things when working with XML document validation with schemas. First, make sure your computer has a connection to the Internet when you load a validated document, because schemas must typically access other dependent schemas, and the way that's most commonly done is by downloading them over the Net.

Next, remember that XML is case sensitive: uppercase and lowercase matter. Spelling an element name BOOK in one place in the document and then attempting to refer to something called Book or book later on will cause problems.


Using .NET Framework Objects to Validate XML Schemas

Earlier in this chapter, we discussed how to read a document using the XmlTextReader object provided by the .NET framework. You can use an XML schema to validate a document when reading a document using the .NET framework. To do this, you use the XmlValidatingReader object.

NOTE

The XmlValidatingReader class is found in the System.Xml namespace. Like the XmlTextReader object described earlier in this chapter, XmlValidatingReader inherits from System.Xml.XmlReader. A reference to the classes, properties, and methods introduced in this chapter is included at the end of this chapter.


Because they both inherit from the same base class, the XmlValidatingReader object works similarly to the XmlTextReader object. To validate a document using the XmlValidatingReader object, set the object's ValidationType property to one of the values enumerated in System.Xml.ValidationType. The ValidationType property can be set to one of these values:

  • ValidationType.None (no validation)

  • ValidationType.DTD (use a document type definition for validation)

  • ValidationType.Schema (use an XSD schema)

  • ValidationType.XDR (use an XDR schema)

  • ValidationType.Auto (infer one of the preceding values)

The value ValidationType.Auto tells the XmlValidatingReader object to infer which type of schema to use based on what the document contains. This means that it's possible for no validation to occur when using the Auto type: if the XML document does not actually contain a link to a schema or DTD and the validation type is set to Auto, no validation will occur. For this reason, it's a good idea to explicitly set the validation type (if you know what it's going to be).

If the XML document does not contain a link to a schema, you can add a link to a schema programmatically. Do this by using the Add method of the Schemas collection contained by the XmlValidatingReader object.

After you set the validation type and assign a schema, you must then write code to actually perform the validation. This is similar to the code you write to read a document with the XmlTextReader object. Listing 10.38 shows an example of this.

Listing 10.38 XML Validation Subroutine Using the XmlValidatingReader Object
 <%@ Import Namespace="System.Xml" %>
 <%@ Import Namespace="System.Xml.Schema" %>
 <SCRIPT runat='server'>
     void Page_Load(Object Sender, EventArgs e)
     {
         XmlTextReader tr = new XmlTextReader(Server.MapPath("book-invalid.xml"));
         XmlValidatingReader vr = new XmlValidatingReader(tr);
         vr.ValidationType = ValidationType.Schema;
         vr.Schemas.Add(null, Server.MapPath("book.xsd"));
         while (vr.Read())
         {
             Response.Write("[" + vr.Name + "]" + vr.Value + "<BR>");
             if (vr.NodeType == XmlNodeType.Element)
             {
                 while (vr.MoveToNextAttribute())
                 {
                     Response.Write("[" + vr.Name + "]" + vr.Value + "<BR>");
                 }
             }
         }
     }
 </SCRIPT>

This code throws an error when it encounters an element of the document that violates the schema. The code will fail because it parses a version of book.xml that contains a validation error (an ISBN that is too long).

Raising errors when XML schema validation rules are broken is fine, but you may want a more granular level of control over how the document is validated, in addition to richer information on where validation errors were encountered. To do this, you can cause the XmlValidatingReader object to raise events when it encounters validation problems in the documents it parses.

To handle the events raised by an XmlValidatingReader object, you must create an event-handling procedure in your code and associate the events raised by the XmlValidatingReader object with your event-handling procedure. Listing 10.39 shows an example of this.

Listing 10.39 Responding to Validation Events Raised by the Validate Subroutine
 <%@ Import Namespace="System.Xml" %>
 <%@ Import Namespace="System.Xml.Schema" %>
 <SCRIPT runat='server'>
     void Page_Load(Object Sender, EventArgs e)
     {
         XmlTextReader tr = new XmlTextReader(Server.MapPath("book-invalid.xml"));
         XmlValidatingReader vr = new XmlValidatingReader(tr);
         vr.ValidationType = ValidationType.Schema;
         vr.ValidationEventHandler += new ValidationEventHandler(ValidationHandler);
         vr.Schemas.Add(null, Server.MapPath("book.xsd"));
         while (vr.Read())
         {
             Response.Write("[" + vr.Name + "]" + vr.Value + "<BR>");
             if (vr.NodeType == XmlNodeType.Element)
             {
                 while (vr.MoveToNextAttribute())
                     Response.Write("[" + vr.Name + "]" + vr.Value + "<BR>");
             }
         }
     }

     public void ValidationHandler(Object sender, ValidationEventArgs args)
     {
         Response.Write("<P><B>Validation error</B><BR>");
         Response.Write("Severity: " + args.Severity + "<BR>");
         Response.Write("Message: " + args.Message + "<BR>");
     }
 </SCRIPT>

You can see that the validation-handling procedure is a standard event handler, attached to the XmlValidatingReader's ValidationEventHandler event with the += operator. When a validation-handling procedure is assigned to the XmlValidatingReader in this way, the reader no longer throws an error on the first problem; instead, it calls the validation-handling procedure whenever it encounters a validation error in the document and then continues reading.

NOTE

Because the ValidationEventHandler delegate is a member of the System.Xml.Schema namespace, you should import this namespace at the beginning of any code that uses the XmlValidatingReader object. To do this, use this page directive:

 <%@ Import Namespace="System.Xml.Schema" %> 

You can test your validation code by changing the XML document that the page parses. Do this by altering the constructor of the XmlTextReader object in the code: book.xml should be valid, whereas book-invalid.xml will cause the validation event handler to be triggered. (Remember in our schema definition earlier in this chapter, we defined an ISBN data type to be an alphanumeric string of no more than 10 characters.)
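Outside an ASP.NET page, the same event-driven approach can gather every validation problem into a list instead of writing to the response. The following is a minimal sketch under several assumptions: it targets .NET Framework 2.0 or later and therefore uses XmlReaderSettings.ValidationEventHandler rather than XmlValidatingReader, it embeds the schema from Listing 10.37 in a string instead of reading book.xsd, and the names ErrorListDemo and CollectErrors are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ErrorListDemo
{
    // The schema from Listing 10.37, embedded so the sketch needs no files.
    const string Schema =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>" +
        "  <xsd:element name='BOOK' type='BookType'/>" +
        "  <xsd:complexType name='BookType'>" +
        "    <xsd:all>" +
        "      <xsd:element name='TITLE' type='xsd:string'/>" +
        "      <xsd:element name='AUTHOR' type='xsd:string'/>" +
        "    </xsd:all>" +
        "    <xsd:attribute name='isbn' type='ISBNType'/>" +
        "  </xsd:complexType>" +
        "  <xsd:simpleType name='ISBNType'>" +
        "    <xsd:restriction base='xsd:string'>" +
        "      <xsd:maxLength value='10'/>" +
        "    </xsd:restriction>" +
        "  </xsd:simpleType>" +
        "</xsd:schema>";

    // Reads the whole document and returns one entry per validation event.
    public static List<string> CollectErrors(string xml)
    {
        List<string> errors = new List<string>();

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Schema)));

        // Because a handler is attached, violations are reported here
        // rather than thrown, and reading carries on to the end.
        settings.ValidationEventHandler += (sender, args) =>
            errors.Add(args.Severity + ": " + args.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }

    public static void Main()
    {
        // Same shape as book-invalid.xml: the isbn value is longer than 10 characters.
        string invalid = "<BOOK isbn='12345678901'><TITLE>Little Red Riding Hood</TITLE>" +
                         "<AUTHOR>Dave-Bob Grimm</AUTHOR></BOOK>";

        foreach (string error in CollectErrors(invalid))
        {
            Console.WriteLine(error);
        }
    }
}
```

Returning the list, rather than printing inside the handler, leaves it to the caller to decide whether any error should stop processing.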

Creating XSD Schemas in Visual Studio .NET

You can use Visual Studio .NET to create XSD schemas, often without writing code. Visual Studio .NET provides a visual drag-and-drop interface for creating schemas; it supports IntelliSense and instant syntax checking, as you'd expect with any other kind of code you would write in Visual Studio.

To create an XSD schema in Visual Studio .NET, begin by creating a Web application project. Next, add an XSD file to the project by right-clicking the project in the Solution Explorer, and then selecting Add, Add New Item from the pop-up menu. Finally, from the Add New Item dialog box, choose XSD Schema.

The XSD schema designer looks similar to the other server-side designers in Visual Studio .NET: it's a blank page. At the bottom of the page are two tabs, labeled Schema and XML. These tabs enable you to easily switch back and forth between visual and code views of the schema; you can create the schema either by dragging and dropping schema definition objects onto the page or by editing the code directly.

You can add a definition to the schema visually in one of two ways: by right-clicking the page, selecting Add from the pop-up menu, and choosing a schema member, or by choosing a schema member from the toolbox. (The Visual Studio .NET toolbox has a whole section devoted to the elements, attributes, simple and complex types, and other members of an XML schema definition.)

Editing Schema-Validated XML Files in Visual Studio .NET

Visual Studio .NET enables you to edit XML files with the same color coding and syntax checking you'd expect from any other kind of code you edit in Visual Studio. If you use Visual Studio .NET to edit an XML file that is defined by an XSD schema, you gain a bonus benefit: IntelliSense support. This means that for an XML document that is defined by an XSD schema, Visual Studio .NET will provide drop-down lists of valid elements and attributes as you edit the XML document.

Creating Schemas from Data Sources Using Visual Studio .NET

Visual Studio .NET has the capability to create XML schemas automatically from a data source. This means that you can set up a database and Visual Studio will reverse engineer the structure of the database into XML schemas.

Because this function is tightly coupled to VS .NET's data-access features, we'll cover it in Chapter 11, "Creating Database Applications with ADO.NET."


C# Developer's Guide to ASP.NET, XML, and ADO.NET
ISBN: 672321556
EAN: N/A
Year: 2005
Pages: 103
