Validating XML Documents


An XML document must follow the specific standards laid down by the W3C in order to be acceptable; in particular, it must be well-formed. It must:

  • Have a single root element that encloses all other content except the document declaration, processing instructions, and comments.

  • Have matching closing tags for all the opening tags (or use the shorthand syntax of ending the element with a forward slash character).

  • Be properly nested so that elements are fully enclosed. You can't open an element as a child of another element and then close the parent element before closing the child element.

  • Contain only valid characters. Characters that would otherwise be read as markup must be escaped, that is, replaced by the correct entity equivalents, such as &amp; for an ampersand character.
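As a quick illustration outside .NET, the following Python sketch uses the standard library's xml.etree.ElementTree parser, which rejects any document that breaks these rules. It is an analogue of the idea rather than the System.Xml API, and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)  # raises ParseError on any well-formedness violation
        return True
    except ET.ParseError:
        return False


# Single root, properly nested and closed elements
print(is_well_formed("<Books><Title>ASP.NET</Title></Books>"))  # True
# Parent closed before its child: improper nesting
print(is_well_formed("<Books><Title>ASP.NET</Books></Title>"))  # False
# Two top-level elements: no single root
print(is_well_formed("<Books/><Books/>"))                      # False
# A raw ampersand must be escaped as &amp;
print(is_well_formed("<Title>Q&A</Title>"))                    # False
print(is_well_formed("<Title>Q&amp;A</Title>"))                # True
```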

An XML document can be well-formed and still not be valid. The validity of a document is defined using a schema or Document Type Definition (DTD). This lays out the structure of the elements, attributes, and other content, the ordering of the elements, and the permissible value ranges for the elements and attributes. The XML storage objects parse the XML to ensure that it's well-formed when they load it (it can't be loaded otherwise), but they don't automatically validate the XML. You have to look after that yourself.

The XmlValidatingReader Object

Documents are validated against a given XML schema or DTD using the XmlValidatingReader object. This object doesn't read the document by itself; it is a helper that is attached to another reader. Figure 11-16 shows the way it works. The document is read using an XmlTextReader (or an XmlNodeReader, if you only want to validate part of a document). This object automatically raises an error if the document is not well-formed.

Figure 11-16:

When you attach an XmlValidatingReader to the XmlTextReader, it automatically checks for the presence of a schema or DTD within the document, and validates the content of the document against that schema or DTD. Errors found during validation are raised through the ValidationEventHandler event, and the handler for this event receives a ValidationEventArgs object that contains a description of the error. You can access this object's properties when the event occurs to determine the validation errors that are present (you'll see how this is done shortly in the example page), or you can leave it to the default event handler to raise an error.
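The choice between supplying your own handler (report each problem and carry on) and leaving it to the default behaviour (the first problem becomes an error) can be sketched in Python. This is a hypothetical analogue of the pattern, not the XmlValidatingReader API: validate_tags, UnhandledValidationError, and the simple "allowed element names" rule are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional, Set


class UnhandledValidationError(Exception):
    """Raised when no handler is registered (the 'default handler' case)."""


def validate_tags(xml_text: str, allowed: Set[str],
                  handler: Optional[Callable[[str], None]] = None) -> None:
    """Check every element name against `allowed` (a hypothetical rule).

    With a handler, each problem is reported and validation continues;
    without one, the first problem raises an exception.
    """
    root = ET.fromstring(xml_text)  # well-formedness errors still raise ParseError
    for elem in root.iter():
        if elem.tag not in allowed:
            message = f"The element '{elem.tag}' is not declared."
            if handler is None:
                raise UnhandledValidationError(message)
            handler(message)


errors = []
validate_tags("<Books><ISBN/><MiddleInitial/></Books>", {"Books", "ISBN"}, errors.append)
print(errors)  # ["The element 'MiddleInitial' is not declared."]
```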

Creating an XmlValidatingReader Instance

Creating an XmlValidatingReader for use with a document that contains an inline schema or DTD (or one that specifies an external schema or DTD) is easy. You just need to create the XmlTextReader, specifying the XML document to load, and then use this as the basis for creating the XmlValidatingReader. Afterwards, you can set the ValidationType property to specify the type of schema you're using:

  'create the new XmlTextReader object and load the XML document
  objXTReader = New XmlTextReader(strXMLPath)

  'create an XmlValidatingReader for this XmlTextReader
  Dim objValidator As New XmlValidatingReader(objXTReader)

  'set the validation type to use an XML Schema
  objValidator.ValidationType = ValidationType.Schema

The acceptable values for the ValidationType property are:

  • Auto: The default. Validation is automatically performed against whichever type of schema or DTD is encountered.

  • DTD: Validate against a DTD.

  • Schema: Validate against a W3C-compliant XML Schema (XSD), including an inline schema. Schemas are located using the schemaLocation attribute or the Schemas property.

  • XDR: Validate against a schema that uses Microsoft's XML-Data Reduced (XDR) syntax, including an inline schema. XDR schemas are recognized through the "x-schema" namespace prefix or the Schemas property.

  • None: No validation is performed. This creates a non-validating parser that complies with the W3C XML 1.0 recommendation. Default attributes are reported and general entities can be resolved by calling the ResolveEntity method, but the DOCTYPE is not used for validation purposes. Can be used to "switch off" validation when not required.

Figure 11-16 also shows how to use a separate (not inline or linked) schema or DTD to validate the document. And, as schemas can import or include one another, there could be several schemas that you'd want to apply to the XML document (though there can only be one DTD). To cope with this, the XmlValidatingReader exposes a reference to an XmlSchemaCollection through the Schemas property. This collection contains all the desired schemas.

Validating XML When Loading a Document Object

If you are familiar with using the MSXML parser in ASP 3.0 or other environments, you may expect to be able to validate a document when you load it simply by setting some property. For example, with MSXML, the ValidateOnParse property can be set to True to validate a document that contains an inline schema or DTD, or a reference to an external schema or DTD.

However, things are different when using the .NET System.Xml classes. Loading a document that contains an inline schema or DTD, or one that references an external schema or DTD, into any of the XML storage objects such as XmlDocument, XmlDataDocument, and XPathDocument does not automatically validate that document. And there is no property that you can set to make it do this.

Instead, you can load the document via an XmlTextReader object to which you have attached an XmlValidatingReader. The Load method of the XmlDocument and XmlDataDocument objects can accept an XmlValidatingReader as the single parameter instead of a file path and name. Likewise, the constructor for the XPathDocument object can accept an XmlValidatingReader as the single parameter.

So all you have to do is set up the XmlValidatingReader and XmlTextReader combination, and pass this to the Load method or the constructor function (depending on which document object you're creating). The document will then be validated as it is loaded:

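  'create XmlTextReader, load XML document and create Validator
  objXTReader = New XmlTextReader(strXMLPath)
  Dim objValidator As New XmlValidatingReader(objXTReader)
  objValidator.ValidationType = ValidationType.Schema

  'use the validator/reader combination to create XPathDocument object
  Dim objXPathDoc As New XPathDocument(objValidator)
  objXTReader.Close()

  'a reader can only be read through once, so create a new
  'validator/reader combination before loading the XmlDocument
  objXTReader = New XmlTextReader(strXMLPath)
  objValidator = New XmlValidatingReader(objXTReader)
  objValidator.ValidationType = ValidationType.Schema

  'use the validator/reader combination to create XmlDocument object
  Dim objXmlDoc As New XmlDocument()
  objXmlDoc.Load(objValidator)
  objXTReader.Close()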

The XmlValidatingReader can also be used to validate XML held in a String . So, you can validate XML that's already loaded into an object or application by simply extracting it as a String object (using the GetXml method with a DataSet object, or the OuterXml property to get a document fragment, for example) and applying the XmlValidatingReader to this.
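In .NET, one way to do this is to wrap the string in a System.IO.StringReader and pass that to an XmlTextReader, so the same file-oriented reading code can consume it. The Python sketch below shows the same idea as a hedged analogue: io.StringIO turns a string into a file-like object that parsing code written for files can use unchanged. The helper root_tag_of is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO


def root_tag_of(source: IO[str]) -> str:
    """Parse from any file-like object and return the root element name."""
    return ET.parse(source).getroot().tag


xml_string = "<BookList><Books><ISBN>0764544020</ISBN></Books></BookList>"
# Wrap the in-memory string so it looks like an open file to the parser
print(root_tag_of(io.StringIO(xml_string)))  # BookList
```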

Validating XML When Loading a DataSet

Like the XML document objects, a DataSet does not automatically validate XML that's provided for the ReadXml method against any schema that is already in place within the DataSet or which is inline with the XML (that is, in the same document as the XML data content). In a DataSet , the schema is used solely to provide information about the intended structure of the data. It's not used for actual validation at all.

When you load the schema, the DataSet uses it as a specification for the table names, column names, data types, and so on. Then, when you load the XML data content, it arranges the data in the appropriate tables and columns as new data rows. An encountered value or element that doesn't match the schema is ignored, and that particular column in the current data row is left empty.

This makes sense, because the DataSet is designed to work with structured relational data, and so any superfluous content in the source file cannot be part of the correct data model. So, you should think of schemas in a DataSet as being a way to specify the data structure (rather than inferring the structure from the data, as happens if no schema is present). Don't think of this as a way of validating the data.
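The difference between schema-driven loading and validation can be sketched in Python. This is a hypothetical illustration of the behaviour described above, not the DataSet implementation: load_rows, the column list, and the row element name are assumptions. The columns come from the "schema", unknown elements are dropped without any error, and missing values stay empty (None).

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def load_rows(xml_text: str, row_tag: str, columns: List[str]) -> List[Row]:
    """Arrange XML content into rows using only the columns the schema defines."""
    rows: List[Row] = []
    for row_elem in ET.fromstring(xml_text).iter(row_tag):
        row: Row = {col: None for col in columns}  # unmatched columns stay empty
        for child in row_elem:
            if child.tag in row:
                row[child.tag] = child.text
            # elements the schema doesn't define are silently ignored, not reported
        rows.append(row)
    return rows


data = """<BookList>
  <Books><ISBN>0764544020</ISBN><MiddleInitial>J</MiddleInitial></Books>
</BookList>"""
print(load_rows(data, "Books", ["ISBN", "Title"]))
# [{'ISBN': '0764544020', 'Title': None}]
```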

A Document Validation Example

The Validating XML documents with an XmlValidatingReader object (validating-xml.aspx) example page shown in Figure 11-17 demonstrates how you can validate an XML document. When first opened, it displays a drop-down list of source documents that you can use, and it validates the selected document. As you can see from the screenshot, it reports no validation errors for a valid document.

Figure 11-17:
Note

You must run the page in a browser on the web server itself to be able to open the XML document and schema using the physical paths in the hyperlinks in the page.

However, if you select the well-formed but invalid document, it reports a series of validation errors, as shown in Figure 11-18:

Figure 11-18:

In this case the XML document contains an extra <MiddleInitial> child element within one of the <Books> elements, which is not permitted in the schema that's being used to validate it.

The following code shows the offending element. You can view the document and the schema using the hyperlinks provided in the page:

  <Books>
    <ISBN>0764544020</ISBN>
    <Title>Beginning Access 2002 VBA</Title>
    <PublicationDate>2000-04-01T00:00:00.0000000+01:00</PublicationDate>
    <FirstName>Mark</FirstName>
    <MiddleInitial>J</MiddleInitial>
    <LastName>Horner</LastName>
  </Books>

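To make the distinction concrete, here is a Python sketch that checks this fragment against a content model. The expected child sequence is an assumption inferred from the sample (the actual booklist-schema.xsd isn't reproduced here), and the messages are illustrative rather than the wording the XmlValidatingReader produces. The fragment parses without error, so it is well-formed, yet the check still reports the extra element.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed content model for <Books>, inferred from the sample document
EXPECTED_CHILDREN = ["ISBN", "Title", "PublicationDate", "FirstName", "LastName"]


def validate_book(book: ET.Element) -> List[str]:
    """Return a list of validation messages for one <Books> element."""
    errors: List[str] = []
    actual = [child.tag for child in book]
    for tag in actual:
        if tag not in EXPECTED_CHILDREN:
            errors.append(f"'{tag}' is not a permitted child of '{book.tag}'.")
    for tag in EXPECTED_CHILDREN:
        if tag not in actual:
            errors.append(f"Required child '{tag}' is missing from '{book.tag}'.")
    # Permitted children must appear once each, in the declared order
    known = [tag for tag in actual if tag in EXPECTED_CHILDREN]
    if known != [tag for tag in EXPECTED_CHILDREN if tag in known]:
        errors.append(f"Children of '{book.tag}' are out of order or repeated.")
    return errors


fragment = """<Books>
  <ISBN>0764544020</ISBN>
  <Title>Beginning Access 2002 VBA</Title>
  <PublicationDate>2000-04-01T00:00:00.0000000+01:00</PublicationDate>
  <FirstName>Mark</FirstName>
  <MiddleInitial>J</MiddleInitial>
  <LastName>Horner</LastName>
</Books>"""

book = ET.fromstring(fragment)  # parses fine: the fragment is well-formed
print(validate_book(book))  # ["'MiddleInitial' is not a permitted child of 'Books'."]
```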
The Code for the Validation Example

The code that follows performs the validation. We start by creating the paths to the schema and XML document. In this example, the document name comes from the selXMLFile drop-down list defined earlier in the page; the filename itself is the value attribute of the selected item.

We then reset the variable that holds the number of validation errors found. It is declared at page level, outside this routine, so that the validation event handler can update it. This is followed by code to create an XmlTextReader object, specifying the XML document as the source. The code also displays a hyperlink to this document:

  'create physical path to sample files (in same folder as ASPX page)
  Dim strCurrentPath As String = Request.PhysicalPath
  Dim strXMLPath As String = Left(strCurrentPath, _
      InStrRev(strCurrentPath, "\")) & selXMLFile.SelectedItem.Value
  Dim strSchemaPath As String = Left(strCurrentPath, _
      InStrRev(strCurrentPath, "\")) & "booklist-schema.xsd"

  'reset the count of validation errors found (intValidErrors, like
  'objXTReader, is declared at page level so the ValidationError handler can use it)
  intValidErrors = 0

  'create the new XmlTextReader object and load the XML document
  objXTReader = New XmlTextReader(strXMLPath)
  outXMLDoc.InnerHtml = "Loaded file: <a href=""" & strXMLPath _
      & """>" & strXMLPath & "</a><br />"

Creating the XmlValidatingReader and Specifying the Schema

The next step is to create the XmlValidatingReader object with the XmlTextReader as the source, and specify the validation type to suit the schema (you could, of course, use Auto to validate automatically against any type of schema or DTD).

The schema is in a separate document and there is no link or reference to it in the XML document, so it's necessary to specify which schema to use. You can create a new XmlSchemaCollection and add the schema to it using the Add method of the XmlSchemaCollection. You then add this collection to the one exposed by the XmlValidatingReader's Schemas property, and display a link to the schema:

  'create an XmlValidatingReader for this XmlTextReader
  Dim objValidator As New XmlValidatingReader(objXTReader)

  'set the validation type to use an XSD schema
  objValidator.ValidationType = ValidationType.Schema

  'create a new XmlSchemaCollection
  Dim objSchemaCol As New XmlSchemaCollection()

  'add the booklist-schema.xsd schema to it
  objSchemaCol.Add("", strSchemaPath)

  'add the schema collection to the XmlValidatingReader's Schemas collection
  objValidator.Schemas.Add(objSchemaCol)
  outXMLDoc.InnerHtml &= "Validating against: <a href=""" _
      & strSchemaPath & """>" & strSchemaPath & "</a>"

Note

In version 1.1, Microsoft has suggested an updated approach to loading stylesheets and schemas that are not fully trusted. See the Loading Stylesheets and Schemas with an XmlResolver section at the end of this chapter for details.

Specifying the Validation Event Handler

The XmlValidatingReader raises an event whenever it encounters a validation error in the document as the XmlTextReader reads it from the disk file. If you don't handle this event yourself, the error is raised as an exception and passed to the default error handler, which in our case is the Try...Catch construct included in the example page.

However, it's often better to handle validation events separately from other (usually fatal) errors, such as the XML file not actually existing on disk. To specify your own handler for the ValidationEventHandler event in Visual Basic, use the AddHandler statement, passing it the event you want to handle and the address of the handler routine (named ValidationError in this example), obtained with the AddressOf operator:

  'add the event handler for any validation errors found   AddHandler objValidator.ValidationEventHandler, AddressOf ValidationError  

In C#, you can add the validation event handler using the following syntax:

  objValidator.ValidationEventHandler +=
      new ValidationEventHandler(ValidationError);

Reading the Document and Catching Parser Errors

You are now ready to read the XML document from the disk file. In this case, you're only reading through to check for validation errors. In an application, you would have code here to perform whatever tasks you need against the XML, or alternatively use the XmlValidatingReader as the source for the Load method of an XmlDocument or XmlDataDocument object, or in the constructor for an XPathDocument object.

Once validation is complete, you can display a count of the number of errors found and close the reader object to release the disk file. If the document is not well-formed or cannot be loaded for any other reason (for example, because it doesn't exist), a parser error occurs. In this case, you can include a statement in the Catch section that displays the error. That's all you need to do to validate the document:

  Try
    'iterate through the document reading and validating each element
    While objValidator.Read()
      'use or display the XML content here as required
    End While

    'display count of errors found
    outXMLDoc.InnerHtml &= "Validation complete " & intValidErrors _
        & " error(s) found"

  Catch objError As Exception
    'will occur if there is a read error or the document cannot be parsed
    outXMLDoc.InnerHtml &= "Read/Parser error: " & objError.Message

  Finally
    'must remember to always close the XmlTextReader after use
    objXTReader.Close()

  End Try

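The overall shape of this code (read everything, count reported problems, trap fatal read or parse errors, and always release the file) carries over to other environments. Here is a Python sketch of the same structure; check_file and its allowed-names rule are hypothetical, and ET.iterparse stands in for the objValidator.Read() loop rather than performing real schema validation.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Optional, Set


def check_file(path: str, allowed: Set[str]) -> str:
    """Read a whole file, counting validation problems (hypothetical rule)."""
    error_count = 0

    def on_error(message: str) -> None:
        nonlocal error_count
        error_count += 1  # mirrors intValidErrors += 1 in the event handler

    reader: Optional[BinaryIO] = None
    try:
        reader = open(path, "rb")
        # Iterate through the document, checking each element as it starts
        for _event, elem in ET.iterparse(reader, events=("start",)):
            if elem.tag not in allowed:
                on_error(f"Unexpected element <{elem.tag}>")
        return f"Validation complete: {error_count} error(s) found"
    except (OSError, ET.ParseError) as exc:
        # Missing file, read failure, or a document that isn't well-formed
        return f"Read/Parser error: {exc}"
    finally:
        if reader is not None:
            reader.close()  # always release the file, as the Finally block does
```

Because the file is closed in the finally clause, it is released whether validation completes, a parse error occurs, or the file can't be opened at all.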
The Validation Event Handler

The XmlValidatingReader raises the ValidationEventHandler event whenever a validation error is discovered in the XML document, and we've specified that the ValidationError routine will be called when this event is raised. This event handler receives the usual reference to the object that raised the event, plus a ValidationEventArgs object containing information about the event.

In the event handler, we first increment the error counter, and then check what kind of error it is by using the Severity property of the ValidationEventArgs object. We then display a message that describes the error, followed by the line number and character position if these are available (although they are generally included in the error message anyway):

  Public Sub ValidationError(objSender As Object, _
                             objArgs As ValidationEventArgs)
    'event handler called when a validation error is found
    intValidErrors += 1 'increment count of errors

    'check the severity of the error (XmlSeverityType enumeration)
    Dim strSeverity As String = ""
    If objArgs.Severity = XmlSeverityType.Error Then strSeverity = "Error"
    If objArgs.Severity = XmlSeverityType.Warning Then strSeverity = "Warning"

    'display a message
    outXMLDoc.InnerHtml &= "<br />Validation error: " & objArgs.Message _
        & "<br />Severity level: " & strSeverity
    If objXTReader.LineNumber > 0 Then
      outXMLDoc.InnerHtml &= "<br />Line: " & objXTReader.LineNumber _
          & ", character: " & objXTReader.LinePosition
    End If

  End Sub
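The Severity property is an XmlSeverityType value, whose Error and Warning members have the numeric values 0 and 1. The Python sketch below mirrors that with an IntEnum and builds the same style of message text as the handler. Severity and format_validation_message are hypothetical names, not part of any .NET API.

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    """Mirrors the numeric values of .NET's XmlSeverityType."""
    ERROR = 0
    WARNING = 1


def format_validation_message(message: str, severity: Severity,
                              line: Optional[int] = None,
                              position: Optional[int] = None) -> str:
    """Build the HTML message text; add line details only when a line is known."""
    text = f"Validation error: {message}<br />Severity level: {severity.name.title()}"
    if line is not None and line > 0:
        text += f"<br />Line: {line}, character: {position}"
    return text


print(format_validation_message("example message", Severity(0), 6, 4))
# Validation error: example message<br />Severity level: Error<br />Line: 6, character: 4
```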

The previous screenshot displayed validation error messages caused by a well-formed but invalid document. We've also provided an XML document that is not well-formed, so that you can see the parser error that is raised and trapped by the Try...Catch construct. This also prevents the remainder of the document from being read, as shown in Figure 11-19:

Figure 11-19:

In this case, there is an illegal closing tag for one of the <Books> elements. One of the options provided even tries to load a non-existent XML document, so you can see that the page traps this error successfully as well.

  <Books>
    <ISBN>1861003382</ISBN>
    <Title>Beginning Active Server Pages 3.0</Title>
    <PublicationDate>1999-12-01T00:00:00</PublicationDate>
    <FirstName>David</FirstName>
    <LastName>Sussman</LastName>
  </BBoooks>

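When a document isn't well-formed, the parser can usually say where it gave up, much as the example page reports line and character positions. As a Python analogue (not the .NET API), ElementTree's ParseError exposes a position tuple; the helper name parse_error_location is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_error_location(xml_text: str) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return err.position  # line is 1-based, column is 0-based


malformed = "\n".join([
    "<Books>",
    "  <ISBN>1861003382</ISBN>",
    "  <Title>Beginning Active Server Pages 3.0</Title>",
    "  <PublicationDate>1999-12-01T00:00:00</PublicationDate>",
    "  <FirstName>David</FirstName>",
    "  <LastName>Sussman</LastName>",
    "</BBoooks>",
])
line, column = parse_error_location(malformed)
print(line)  # 7: the mismatched closing tag
```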



Professional ASP.NET 1.1