Validating an XML Document with a DTD

Problem

You want to verify that an XML document is valid according to a DTD.

Solution

Use the Xerces library with either the SAX2 (Simple API for XML) or the DOM parser.

To validate an XML document using SAX2, obtain a SAX2XMLReader, as in Example 14-8. Next, enable DTD validation by calling the parser's setFeature( ) method with the arguments xercesc::XMLUni::fgSAX2CoreValidation and true. Finally, register an ErrorHandler to receive notifications of DTD violations and call the parser's parse() method with your XML document's name as its argument.

To validate an XML document using DOM, first construct an instance of XercesDOMParser. Next, enable DTD validation by calling the parser's setValidationScheme( ) method with the argument xercesc::XercesDOMParser::Val_Always. Finally, register an ErrorHandler to receive notifications of DTD violations and call the parser's parse( ) method with your XML document's name as its argument.

Here I'm using the class XercesDOMParser, an XML parser that has been part of Xerces since before the DOM Level 3 DOMBuilder interface was introduced. Using a XercesDOMParser makes the example a bit simpler, but you can use a DOMBuilder instead if you like. See Discussion and Recipe 14.4.

For example, suppose you modify the XML document animals.xml from Example 14-1 to contain a reference to an external DTD, as illustrated in Examples 14-11 and 14-12. The code to validate this document using the SAX2 API is presented in Example 14-13; the code to validate it using the DOM parser is presented in Example 14-14.

Example 14-11. DTD animals.dtd for the file animals.xml

Example 14-12. The file animals.xml, modified to contain a DTD

Example 14-13. Validating the document animals.xml against a DTD using the SAX2 API

/*
 * Same includes as Example 14-8, except  is not needed
 */

#include <stdexcept> // runtime_error
#include <xercesc/sax2/DefaultHandler.hpp>

using namespace std;
using namespace xercesc;

/*
 * Define XercesInitializer as in Example 14-8
 * and CircusErrorHandler as in Example 14-7
 */

int main( )
{
    try {
        // Initialize Xerces and obtain a SAX2 parser
        XercesInitializer init;
        auto_ptr<SAX2XMLReader>
            parser(XMLReaderFactory::createXMLReader( ));

        // Enable validation
        parser->setFeature(XMLUni::fgSAX2CoreValidation, true);

        // Register error handler to receive notifications
        // of DTD violations
        CircusErrorHandler error;
        parser->setErrorHandler(&error);

        // Parse and validate
        parser->parse("animals.xml");
    } catch (const SAXException& e) {
        cout << "xml error: " << toNative(e.getMessage( )) << "\n";
        return EXIT_FAILURE;
    } catch (const XMLException& e) {
        cout << "xml error: " << toNative(e.getMessage( )) << "\n";
        return EXIT_FAILURE;
    } catch (const exception& e) {
        cout << e.what( ) << "\n";
        return EXIT_FAILURE;
    }
}

 

Example 14-14. Validating the document animals.xml against the DTD animals.dtd using XercesDOMParser

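#include <cstdlib> // EXIT_FAILURE
#include <exception> // exception
#include <iostream> // cout
#include <stdexcept> // runtime_error
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/sax/SAXException.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLException.hpp>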
#include "xerces_strings.hpp" // Example 14-4

using namespace std;
using namespace xercesc;

/*
 * Define XercesInitializer as in Example 14-8
 * and CircusErrorHandler as in Example 14-7
 */

int main( )
{
    try {
        // Initialize Xerces and construct a DOM parser
        XercesInitializer init;
        XercesDOMParser parser;

        // Enable DTD validation
        parser.setValidationScheme(XercesDOMParser::Val_Always);

        // Register an error handler to receive notifications
        // of DTD violations
        CircusErrorHandler handler;
        parser.setErrorHandler(&handler);

        // Parse and validate
        parser.parse("animals.xml");
    } catch (const SAXException& e) {
        cout << "xml error: " << toNative(e.getMessage( )) << "\n";
        return EXIT_FAILURE;
    } catch (const XMLException& e) {
        cout << "xml error: " << toNative(e.getMessage( )) << "\n";
        return EXIT_FAILURE;
    } catch (const exception& e) {
        cout << e.what( ) << "\n";
        return EXIT_FAILURE;
    }
}

 

Discussion

DTDs provide a simple way to constrain an XML document. For example, using a DTD, you can specify what elements may appear in a document; what attributes an element may have; and whether a particular element can contain child elements, text, or both. It's also possible to impose constraints on the type, order, and number of an element's children and on the values an attribute may take.

The purpose of DTDs is to identify the subset of well-formed XML documents that are interesting in a certain application domain. In Example 14-1, for instance, it's important that each animal element has child elements name, species, dateOfBirth, veterinarian, and trainer, that the name, species, and dateOfBirth elements contain only text, and that the veterinarian and trainer elements have both a name and a phone attribute. Furthermore, an animal element should have no phone attribute, and a veterinarian element should have no species children.

These are the types of restrictions enforced by the DTD in Example 14-11. For example, the following element declaration states that an animal element must have child elements name, species, dateOfBirth, veterinarian, and trainer, in that order.

    <!ELEMENT animal (name, species, dateOfBirth, veterinarian, trainer)>

Similarly, the following attribute declaration indicates that a trainer element must have name and phone attributes; the fact that no other attribute declarations for trainer appear in the DTD indicates that these are the only two attributes a trainer element may have:

    <!ATTLIST trainer
        name  CDATA #REQUIRED
        phone CDATA #REQUIRED
    >

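To make these two kinds of constraint concrete, here is a minimal sketch, independent of Xerces, of the checks a validating parser performs for a sequence content model and for a fixed set of required attributes. The Element struct and the functions matchesSequence and hasExactlyAttributes are hypothetical names invented for this illustration, and the code assumes nothing beyond the standard library; a real validating parser derives equivalent checks from the DTD itself rather than from hand-written code.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parsed XML element;
// this is not a Xerces type.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::vector<Element> children;
};

// Returns true if e's children are exactly the elements named in
// expected, in that order. This is the rule expressed by a sequence
// content model such as (name, species, dateOfBirth).
bool matchesSequence(const Element& e,
                     const std::vector<std::string>& expected)
{
    if (e.children.size() != expected.size())
        return false;
    for (std::size_t i = 0; i < expected.size(); ++i)
        if (e.children[i].name != expected[i])
            return false;
    return true;
}

// Returns true if e has every attribute named in required and no
// others, which is what a DTD demands when it declares only
// required attributes for an element.
bool hasExactlyAttributes(const Element& e,
                          const std::set<std::string>& required)
{
    if (e.attributes.size() != required.size())
        return false;
    for (std::set<std::string>::const_iterator it = required.begin();
         it != required.end(); ++it)
        if (e.attributes.find(*it) == e.attributes.end())
            return false;
    return true;
}
```

With this sketch, an animal element whose children are name, species, dateOfBirth, veterinarian, and trainer in that order satisfies matchesSequence, while one with a missing or reordered child does not. Likewise, a trainer element with an extra attribute fails hasExactlyAttributes.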
An XML document that contains a DTD and conforms to its constraints is said to be valid. An XML parser that checks for validity in addition to checking for syntax errors is called a validating parser. Although SAX2XMLReader and XercesDOMParser are not validating parsers by default, they both provide a validation feature that can be enabled as shown in Examples 14-13 and 14-14. Similarly, a DOMBuilder, described in Recipe 14.4, can be made to validate by calling its setFeature( ) method with the arguments XMLUni::fgDOMValidation and true.

The classes SAX2XMLReader, DOMBuilder, DOMWriter, and XercesDOMParser support a number of optional features. With SAX2XMLReader and DOMBuilder, you can enable and disable these features using the methods setFeature( ) and setProperty( ). The first method takes a string and a boolean value; the second takes a string and a void*. You can also query the enabled features using getFeature( ) and getProperty( ). For convenience, Xerces provides constants representing the names of features and properties. The class DOMWriter supports setFeature() but not setProperty( ). The class XercesDOMParser supports neither method; it provides separate setter and getter methods for each feature. See the Xerces documentation for a complete list of supported features.

 

See Also

Recipe 14.6


C++ Cookbook