Validation and Entity Resolution

Once you use XML documents to exchange business information, you'll need additional functionality to help manage these documents. Data can flow smoothly between applications only if the applications agree on how the data will be formatted and what the structure of the data will be. XML provides the formatting and encoding for business data, but the required structure will depend on the applications involved.

Validating XML Documents

If XML documents are to be shared between applications, the receiving application must understand the contents of the document. Writing code to check that the document format is correct can be time consuming and prone to errors. To avoid coding, you can define the expected format of a document using a DTD or a schema.

The DTD syntax is inherited from the Standard Generalized Markup Language (SGML), which preceded XML, and it looks somewhat arcane. DTDs are limited in that they do not have a particularly strong type system; you cannot differentiate between strings and numbers, for example. There is also a limit of one DTD per document, which can cause problems if documents are being merged or parts are sourced from different places. The following sample file, CakeCatalog.dtd, shows the DTD for documents that contain a root element of CakeCatalog. I will not go into the precise syntax, but suffice it to say that this is not XML!

CakeCatalog.dtd
    <!ELEMENT CakeCatalog (CakeType)*>
    <!ELEMENT CakeType (Message, Description, Sizes?)>
    <!ATTLIST CakeType style CDATA #REQUIRED
                       filling CDATA #REQUIRED
                       shape CDATA #REQUIRED>
    <!ELEMENT Message (#PCDATA)>
    <!ELEMENT Description (#PCDATA)>
    <!ELEMENT Sizes (Option)*>
    <!ELEMENT Option EMPTY>
    <!ATTLIST Option sizeInInches CDATA #REQUIRED>

The XML Schema standard from the W3C defines document structure using an XML-based syntax. This standard has evolved from several early efforts to improve DTDs by using XML as the schema syntax and introducing a stronger type system. One of these early efforts was the XML Data submission to the W3C by Microsoft, DataChannel, and others. This standard was subsequently scaled down to a form known as XML Data Reduced (XDR). XDR was used for the first schema-based tools launched by Microsoft, so you might find many schemas still defined in this dialect. The latest XML Schema standard has many similarities to XDR, but the two are not interoperable. If you need to convert an XDR schema to an XML schema, you can use a tool such as the XML Schema Definition tool (XSD.exe), which comes with the .NET Framework SDK. A graphical schema editor is available with Visual Studio .NET.

The XML schema corresponding to the CakeCatalog is shown in the following sample file, SchemaCakeCatalog.xsd.

SchemaCakeCatalog.xsd
    <?xml version="1.0" ?>
    <xs:schema id="CakeCatalog"
        targetNamespace="http://www.fourthcoffee.com/SchemaCakeCatalog.xsd"
        xmlns:mstns="http://www.fourthcoffee.com/SchemaCakeCatalog.xsd"
        xmlns="http://www.fourthcoffee.com/SchemaCakeCatalog.xsd"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
        attributeFormDefault="qualified" elementFormDefault="qualified">
      <xs:element name="CakeCatalog" msdata:IsDataSet="true"
          msdata:Locale="en-GB" msdata:EnforceConstraints="False">
        <xs:complexType>
          <xs:choice maxOccurs="unbounded">
            <xs:element name="CakeType">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="Message" type="xs:string" minOccurs="0"
                      msdata:Ordinal="0" />
                  <xs:element name="Description" type="xs:string" minOccurs="0"
                      msdata:Ordinal="1" />
                  <xs:element name="Sizes" minOccurs="0" maxOccurs="unbounded">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="Option" minOccurs="0"
                            maxOccurs="unbounded">
                          <xs:complexType>
                            <xs:attribute name="sizeInInches"
                                form="unqualified" type="xs:string" />
                          </xs:complexType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
                <xs:attribute name="style" form="unqualified"
                    type="xs:string" />
                <xs:attribute name="filling" form="unqualified"
                    type="xs:string" />
                <xs:attribute name="shape" form="unqualified" type="xs:string" />
              </xs:complexType>
            </xs:element>
          </xs:choice>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Attaching a Schema or a DTD to an XML Document

A DTD can be associated with an XML document in one of two ways. First, a DTD can be defined within the XML document itself, as shown in the following sample file, DTDCakeCatalog.xml. The DOCTYPE statement indicates the name of the root element to which the DTD pertains.

DTDCakeCatalog.xml
    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE CakeCatalog [
    <!ELEMENT CakeCatalog (CakeType)*>
    <!ELEMENT CakeType (Message, Description, Sizes?)>
    <!ATTLIST CakeType style CDATA #REQUIRED
                       filling CDATA #REQUIRED
                       shape CDATA #REQUIRED>
    <!ELEMENT Message (#PCDATA)>
    <!ELEMENT Description (#PCDATA)>
    <!ELEMENT Sizes (Option)*>
    <!ELEMENT Option EMPTY>
    <!ATTLIST Option sizeInInches CDATA #REQUIRED>
    ]>
    <CakeCatalog>
      <CakeType style="Celebration" filling="sponge" shape="square">
        <Message>Congratulations</Message>
        <Description>General achievement</Description>
        <Sizes>
          <Option sizeInInches="10" />
          <Option sizeInInches="12" />
        </Sizes>
      </CakeType>
    </CakeCatalog>

Alternatively, you can provide the name of an external file that contains the DTD. The following DOCTYPE declaration could replace the DOCTYPE declaration in DTDCakeCatalog.xml. This declaration refers to the CakeCatalog.dtd sample file shown earlier:

    <!DOCTYPE CakeCatalog SYSTEM "CakeCatalog.dtd">

In either case (internal or external DTD definition), a validating parser can discover, resolve, and apply the structure information given within the document.
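What a validating parser does with an internal DTD can be sketched in a few lines. The example below is only an illustration, written in standard Java with the JAXP DocumentBuilderFactory class rather than the .NET classes this chapter goes on to describe. The class name DtdCheck and the shortened catalog documents are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdCheck {
    // Internal DTD subset taken from the DTDCakeCatalog.xml sample.
    static final String DOCTYPE =
        "<!DOCTYPE CakeCatalog [\n"
        + "<!ELEMENT CakeCatalog (CakeType)*>\n"
        + "<!ELEMENT CakeType (Message, Description, Sizes?)>\n"
        + "<!ATTLIST CakeType style CDATA #REQUIRED\n"
        + "                   filling CDATA #REQUIRED\n"
        + "                   shape CDATA #REQUIRED>\n"
        + "<!ELEMENT Message (#PCDATA)>\n"
        + "<!ELEMENT Description (#PCDATA)>\n"
        + "<!ELEMENT Sizes (Option)*>\n"
        + "<!ELEMENT Option EMPTY>\n"
        + "<!ATTLIST Option sizeInInches CDATA #REQUIRED>\n"
        + "]>\n";

    static final String VALID = DOCTYPE
        + "<CakeCatalog>"
        + "<CakeType style=\"Celebration\" filling=\"sponge\" shape=\"square\">"
        + "<Message>Congratulations</Message>"
        + "<Description>General achievement</Description>"
        + "</CakeType></CakeCatalog>";

    // Same document with the required Message element left out.
    static final String INVALID = DOCTYPE
        + "<CakeCatalog>"
        + "<CakeType style=\"Celebration\" filling=\"sponge\" shape=\"square\">"
        + "<Description>General achievement</Description>"
        + "</CakeType></CakeCatalog>";

    /** Returns true if the document is well formed and valid against its DTD. */
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow validity errors so that parsing stops at the first problem.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or not valid
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O failure
        }
    }

    public static void main(String[] args) {
        System.out.println("valid catalog accepted: " + isValid(VALID));
        System.out.println("invalid catalog accepted: " + isValid(INVALID));
    }
}
```

The parser finds the DTD through the DOCTYPE declaration alone; the calling code never names it.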

Most structure definitions created recently use an XML schema rather than a DTD. To attach the schema defined in SchemaCakeCatalog.xsd to an XML document, you can use a namespace declaration. The following sample file, SchemaCakeCatalog.xml, shows how this can be done so that the schema becomes the default schema for that document. You can define multiple schemas on a single start element tag using different namespace prefixes. (There can be only one default.)

SchemaCakeCatalog.xml
    <?xml version="1.0" encoding="utf-8"?>
    <CakeCatalog xmlns="http://www.fourthcoffee.com/SchemaCakeCatalog.xsd">
      <CakeType style="Celebration" filling="sponge" shape="square">
        <Message>Congratulations</Message>
        <Description>General achievement</Description>
        <Sizes>
          <Option sizeInInches="10" />
          <Option sizeInInches="12" />
        </Sizes>
      </CakeType>
    </CakeCatalog>

Again, any schema-aware, validating parser that encounters the document SchemaCakeCatalog.xml can resolve the location of the schema and use it to validate the XML document.
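The role of the namespace in schema validation can be illustrated with a short sketch. It uses the standard Java javax.xml.validation package as a stand-in for the .NET classes covered later in this chapter, and it hands the schema to the validator directly instead of relying on the parser to locate it. The class name XsdCheck, the cut-down schema, and the urn:example:other namespace are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class XsdCheck {
    static final String NS = "http://www.fourthcoffee.com/SchemaCakeCatalog.xsd";

    // Cut-down stand-in for SchemaCakeCatalog.xsd (an assumption of this sketch).
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"" + NS + "\" elementFormDefault=\"qualified\">"
        + "<xs:element name=\"CakeCatalog\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"CakeType\" minOccurs=\"0\" maxOccurs=\"unbounded\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"Message\" type=\"xs:string\"/>"
        + "<xs:element name=\"Description\" type=\"xs:string\"/>"
        + "</xs:sequence>"
        + "<xs:attribute name=\"style\" type=\"xs:string\" use=\"required\"/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Builds a small catalog whose elements live in the given namespace. */
    static String catalog(String namespace) {
        return "<CakeCatalog xmlns=\"" + namespace + "\">"
            + "<CakeType style=\"Celebration\">"
            + "<Message>Congratulations</Message>"
            + "<Description>General achievement</Description>"
            + "</CakeType></CakeCatalog>";
    }

    /** Compiles the schema text; a failure here means the schema itself is broken. */
    static Schema compileSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** Returns true if the document is valid against the compiled schema. */
    static boolean isValid(String xml) {
        try {
            // With no error handler set, the validator throws on the first error.
            compileSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("matching namespace: " + isValid(catalog(NS)));
        System.out.println("other namespace: " + isValid(catalog("urn:example:other")));
    }
}
```

A document whose elements are in a different namespace has no matching declaration in the schema, so it is rejected even though its structure is identical.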

Handling Validation Errors

You now have a schema or DTD associated with your document. How do you ensure that the document is valid? The ability to validate documents programmatically is what adds value: you do not have to write low-level checking code yourself. The first thing to do is to obtain a validating parser. The XmlValidatingReader class provides such functionality. This class can obtain the XML to be validated from the following sources:

  • An existing XmlReader: either an XmlTextReader or an XmlNodeReader.

  • A stream or a string. In this case, you must provide more information, including the type of XML fragment being parsed and the encoding used.

The simplest approach is to wrap an XmlValidatingReader around an XmlTextReader:

    XmlTextReader reader = new XmlTextReader(args[0]);
    reader.set_WhitespaceHandling(WhitespaceHandling.Significant);
    XmlValidatingReader vreader = new XmlValidatingReader(reader);

Calls to the Read method (or similar methods) of the XmlValidatingReader class will cause the document to be validated as the reading progresses. If an inconsistency is encountered, a System.Xml.Schema.XmlSchemaException will be thrown. This exception can then be caught and examined. However, there is a cleaner way of monitoring the progress of validation.

When a validation error occurs, the parser will raise an event to this effect; an exception will be thrown only if there is no event handler defined. Therefore, the preferred way of handling validation errors is to set an event handler. In J#, the event handler takes the form of a delegate with the signature containing the sender and validation event arguments, as shown here:

    public delegate void ValidationEventHandler(object sender,
        ValidationEventArgs e);

You can define an event handler by creating a delegate and calling the add_ValidationEventHandler method to add this to the list of event handlers for the XmlValidatingReader:

    ValidationEventHandler handler = new
        ValidationEventHandler(validationHandler);
    vreader.add_ValidationEventHandler(handler);

If a validation problem is encountered, the handler will be invoked. The problem could be an error or a warning as defined by the Severity property of the ValidationEventArgs object passed into the event handler. The event handler can then decide on a suitable course of action. The DTDValidateCatalog.jsl sample file shows how to handle such validation errors. This sample is provided with three XML files. DTDCakeCatalog.xml is a valid XML file that uses an internal DTD, ExtDTDCakeCatalog.xml is a valid XML file that uses an external DTD, and InvalidDTDCakeCatalog.xml is an invalid XML file that uses an internal DTD.
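The handler-based approach, in which problems are recorded with their severity and validation carries on, can be sketched as follows. This is not the DTDValidateCatalog.jsl sample. It is a minimal illustration in standard Java, where the SAX ErrorHandler interface plays a role similar to the one ValidationEventHandler plays here. The class name ValidationReport and the test documents are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidationReport {
    // Internal DTD subset taken from the DTDCakeCatalog.xml sample.
    static final String DOCTYPE =
        "<!DOCTYPE CakeCatalog [\n"
        + "<!ELEMENT CakeCatalog (CakeType)*>\n"
        + "<!ELEMENT CakeType (Message, Description, Sizes?)>\n"
        + "<!ATTLIST CakeType style CDATA #REQUIRED filling CDATA #REQUIRED"
        + " shape CDATA #REQUIRED>\n"
        + "<!ELEMENT Message (#PCDATA)>\n"
        + "<!ELEMENT Description (#PCDATA)>\n"
        + "<!ELEMENT Sizes (Option)*>\n"
        + "<!ELEMENT Option EMPTY>\n"
        + "<!ATTLIST Option sizeInInches CDATA #REQUIRED>\n"
        + "]>\n";

    static final String VALID_BODY =
        "<CakeCatalog>"
        + "<CakeType style=\"Celebration\" filling=\"sponge\" shape=\"square\">"
        + "<Message>Congratulations</Message>"
        + "<Description>General achievement</Description>"
        + "<Sizes><Option sizeInInches=\"10\"/></Sizes>"
        + "</CakeType></CakeCatalog>";

    // Two separate problems: no Message element, and Option lacks sizeInInches.
    static final String INVALID_BODY =
        "<CakeCatalog>"
        + "<CakeType style=\"Celebration\" filling=\"sponge\" shape=\"square\">"
        + "<Description>General achievement</Description>"
        + "<Sizes><Option/></Sizes>"
        + "</CakeType></CakeCatalog>";

    /** Validates the body against the DTD and returns every problem reported. */
    static List<String> validate(String body) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                // Record the severity and message, then let parsing continue.
                public void warning(SAXParseException e) {
                    problems.add("Warning: " + e.getMessage());
                }
                public void error(SAXParseException e) {
                    problems.add("Error: " + e.getMessage());
                }
                // A document that is not well formed cannot be processed further.
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        for (String problem : validate(INVALID_BODY)) {
            System.out.println(problem);
        }
    }
}
```

Because the handler returns normally, a single pass over the document reports every validity problem rather than stopping at the first one.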

Defining the Schema to Use

A document can be validated using an attached schema or a DTD. However, a particular document might have multiple schema types attached to it. In this case, you can set the XmlValidatingReader class's ValidationType property to indicate whether the document should be validated against a DTD, an XDR schema, or an XML schema.

Another issue is whether you're using the correct schema. In general terms, you might need to examine and possibly change the namespace attributes associated with a document to ensure that it is validated against the schema expected by your application.

The use of schemas can be made more efficient by caching the most frequently used schemas. You can create an XmlSchemaCollection and add common schemas to it. This collection can then be added to the schema collection held for each XmlValidatingReader used:

    XmlSchemaCollection schemas = new XmlSchemaCollection();
    schemas.Add("http://www.fourthcoffee.com/SchemaCakeCatalog.xsd",
        "SchemaCakeCatalog.xsd");
    ...
    XmlSchemaCollection xsc = vreader.get_Schemas();
    xsc.Add(schemas);
    // Validate as before...

The code to perform the validation of an XML document using an XML schema and a schema cache can be found in the sample file SchemaValidateCatalog.jsl, which uses the files SchemaCakeCatalog.xml and SchemaCakeCatalog.xsd.
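The idea behind a schema cache (compile each schema once, then reuse it for every document in that namespace) can be sketched independently of the .NET classes. The example below is an illustration in standard Java using javax.xml.validation, not the SchemaValidateCatalog.jsl sample. The SchemaCache class, its method names, and the cut-down schema are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaCache {
    static final String NS = "http://www.fourthcoffee.com/SchemaCakeCatalog.xsd";

    // Cut-down stand-in for SchemaCakeCatalog.xsd (an assumption of this sketch).
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"" + NS + "\" elementFormDefault=\"qualified\">"
        + "<xs:element name=\"CakeCatalog\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"Message\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compiled schemas keyed by target namespace.
    private final Map<String, Schema> byNamespace = new ConcurrentHashMap<>();

    /** Compiles the schema text once and stores it under its namespace. */
    void add(String namespace, String xsdText) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsdText)));
            byNamespace.put(namespace, schema);
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    int size() {
        return byNamespace.size();
    }

    /** Validates a document against the cached schema for the given namespace. */
    boolean isValid(String namespace, String xml) {
        Schema schema = byNamespace.get(namespace);
        if (schema == null) {
            throw new IllegalArgumentException("no cached schema for " + namespace);
        }
        try {
            // Validators are cheap to create; the compiled schema is what gets reused.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a tiny catalog, optionally without its required Message element. */
    static String catalog(boolean withMessage) {
        return "<CakeCatalog xmlns=\"" + NS + "\">"
            + (withMessage ? "<Message>Congratulations</Message>" : "")
            + "</CakeCatalog>";
    }

    public static void main(String[] args) {
        SchemaCache cache = new SchemaCache();
        cache.add(NS, XSD);
        System.out.println("complete catalog: " + cache.isValid(NS, catalog(true)));
        System.out.println("incomplete catalog: " + cache.isValid(NS, catalog(false)));
    }
}
```

Keying the cache by namespace mirrors the way the Add call above pairs a namespace URI with a schema file.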

Resolving Entities

An XML document might contain fixed markup or other content. For example, you might need to use the same supplier address in every XML-based invoice you issue. One way of inserting the address is to define the appropriate markup for the address in another file and include this file as the document is being processed. When used in this way, the external markup is referred to as an external entity. Such entities are declared in the DTD for a document.

The .NET Framework Class Library contains a class designed for entity resolution, XmlUrlResolver. If you need to resolve entities in any other way, you can create a custom resolver class, which you can set on the XmlReader using the XmlResolver property. This resolver is then used for all external resolution, such as the location of external DTDs and schemas. A full examination of entity resolution, schema includes and imports, and the use of nondefault resolvers is beyond the scope of this book. For more information, see the .NET Framework documentation.
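The supplier address scenario gives a feel for what a custom resolver does. The sketch below is an illustration in standard Java, where the SAX EntityResolver interface is roughly the counterpart of a custom XmlResolver. The file name supplier-address.xml, the Invoice and Address elements, and the class name SupplierAddressResolver are hypothetical and chosen only for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class SupplierAddressResolver implements EntityResolver {
    // Markup that would normally live in a separate file (hypothetical content).
    static final String ADDRESS_MARKUP = "<Address>1 Example Street</Address>";

    // The DTD declares an external entity; the document body refers to it.
    static final String INVOICE =
        "<!DOCTYPE Invoice [\n"
        + "<!ENTITY supplierAddress SYSTEM \"supplier-address.xml\">\n"
        + "]>\n"
        + "<Invoice>&supplierAddress;</Invoice>";

    /** Supplies the address markup from memory instead of reading a file. */
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // The parser may expand the system identifier to an absolute URI,
        // so match on its final segment only.
        if (systemId != null && systemId.endsWith("supplier-address.xml")) {
            return new InputSource(new StringReader(ADDRESS_MARKUP));
        }
        return null; // let the parser resolve anything else in its default way
    }

    /** Parses the invoice and returns the text of the included Address element. */
    static String supplierAddress(String invoiceXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(new SupplierAddressResolver());
            Document doc = builder.parse(new InputSource(new StringReader(invoiceXml)));
            return doc.getElementsByTagName("Address").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(supplierAddress(INVOICE));
    }
}
```

The invoice never contains the address itself; the parser asks the resolver for it when it meets the entity reference.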

Microsoft Visual J# .NET (Core Reference)
Microsoft Visual J# .NET (Core Reference) (Pro-Developer)
ISBN: 0735615500
Year: 2002
Pages: 128
