
Recipe 12.8. Validating XML Documents

Credit: Paul Sholtz, Jeroen Jeroen, Marius Gedminas

Problem

You are handling XML documents and must check their validity with respect to either internal or external DTDs. You may also want to perform application-specific processing during validation.

Solution

You often want to validate an XML document file with respect to the <!DOCTYPE> declaration that the document file contains. On occasion, though, you may want to force loading of an external DTD from a given file. Moreover, a frequent need is to also perform application-specific processing during validation. A function with optional parameters, using modules from the PyXML package, can accommodate all of these needs:

from xml.parsers.xmlproc import utils, xmlval, xmldtd

def validate_xml_file(xml_filename, app=None, dtd_filename=None):
    # build validating parser object with appropriate error handler
    parser = xmlval.Validator()
    parser.set_error_handler(utils.ErrorPrinter(parser))
    if dtd_filename is not None:
        # DTD file specified, load and set it as the DTD to use
        dtd = xmldtd.load_dtd(dtd_filename)
        parser.val.dtd = parser.dtd = parser.ent = dtd
    if app is not None:
        # Application processing requested, set application object
        parser.set_application(app)
    # everything being set correctly, finally perform the parsing
    parser.parse_resource(xml_filename)

If your XML data is in a string s, rather than in a file, replace the parser.parse_resource call with the following two statements in a variant of the previously shown function:

    parser.feed(s)
    parser.close()
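The string-based variant might be sketched as follows. The function name validate_xml_string and its optional parser argument are hypothetical additions, not part of the recipe; the parser argument only exists so that a pre-built (or stand-in) parser object can be supplied. The PyXML imports are deferred into the function body, since PyXML is needed only when no parser is passed in.

```python
def validate_xml_string(s, app=None, dtd_filename=None, parser=None):
    """Validate XML held in the string s (sketch mirroring validate_xml_file).

    The `parser` argument is a hypothetical convenience, not part of the
    original recipe.
    """
    if parser is None:
        # Same setup as in the recipe; requires the PyXML package
        from xml.parsers.xmlproc import utils, xmlval
        parser = xmlval.Validator()
        parser.set_error_handler(utils.ErrorPrinter(parser))
    if dtd_filename is not None:
        # DTD file specified, load and set it as the DTD to use
        from xml.parsers.xmlproc import xmldtd
        dtd = xmldtd.load_dtd(dtd_filename)
        parser.val.dtd = parser.dtd = parser.ent = dtd
    if app is not None:
        parser.set_application(app)
    # Feed the whole string, then tell the parser that input has ended
    parser.feed(s)
    parser.close()
```

With PyXML installed, a call such as validate_xml_string(open('doc.xml').read()) should behave like validate_xml_file('doc.xml'), where doc.xml is a placeholder file name.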

Discussion

Documentation on XML parsing in general, and xmlproc in particular, is easy enough to come by. However, XML is a very large subject, and PyXML is a correspondingly large package. The package's documentation is often not entirely complete and up to date; even if it were, finding out how to perform specific tasks would still take quite a bit of digging. This recipe shows how to validate documents in a simple way that is easy to adapt to your specific needs.

If you need to perform application-specific processing, as well as validation, you need to make your own application object (an instance of some subclass of xmlproc.xmlproc.Application that appropriately overrides some or all of its various methods, most typically handle_start_tag, handle_end_tag, handle_data, and doc_end) and pass the application object as the app argument to the validate_xml_file function.
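As an illustration of such an application object, here is a minimal sketch. The class name TagCounter and its attributes are hypothetical, and the handler signatures shown (name and attrs for start tags; data, start, and end for character data) are my recollection of xmlproc's Application interface, so check them against your installed PyXML sources. The fallback base class is only a stand-in that lets the sketch be exercised where PyXML is not installed.

```python
try:
    # Base class named in the text above; requires the PyXML package
    from xml.parsers.xmlproc.xmlproc import Application
except ImportError:
    class Application(object):
        """Stand-in (assumption) used only when PyXML is unavailable."""


class TagCounter(Application):
    """Hypothetical application object: counts elements and text characters."""

    def __init__(self):
        Application.__init__(self)
        self.counts = {}        # element name -> number of start tags seen
        self.text_chars = 0     # total characters of text content
        self.finished = False   # set once the document has ended

    def handle_start_tag(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

    def handle_data(self, data, start, end):
        # The relevant text is assumed to be the slice data[start:end]
        self.text_chars += end - start

    def doc_end(self):
        self.finished = True
```

An instance would then be passed as the app argument, for example validate_xml_file('doc.xml', app=TagCounter()), with doc.xml standing in for a real file name; after parsing, the instance's counts attribute holds the tallies.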

If you need to handle errors and warnings differently from the emitting of copious error messages that xmlproc.utils.ErrorPrinter performs, you need to subclass (either that class or its base xmlproc.xmlapp.ErrorHandler directly) to perform whatever tweaking you need. (See the sources of the utils.py module for examples; that module will usually be at relative path _xmlplus/parsers/xmlproc/utils.py in your Python library directory, after you have installed the PyXML package.) Then, you need to alter the call to the method set_error_handler that you see in this recipe's validate_xml_file function so that it uses an instance of your own error-handling class. You might modify the validate_xml_file function to take yet another optional parameter err=None for the purpose, but this way overgeneralization lies. I've found ErrorPrinter's diagnostics normally cover my applications' needs, so, in the code shown in this recipe's Solution, I have not provided for this specific customization.
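A custom error handler along these lines might look like the following sketch, which records problems in a list instead of printing them. The class name CollectingErrorHandler and its problems attribute are hypothetical; the warning, error, and fatal method names and the locator constructor argument are what I recall of xmlproc's ErrorHandler interface, so verify them against utils.py and xmlapp.py in your PyXML installation. The fallback base class is only a stand-in for environments without PyXML.

```python
try:
    # Base class named in the text above; requires the PyXML package
    from xml.parsers.xmlproc.xmlapp import ErrorHandler
except ImportError:
    class ErrorHandler(object):
        """Stand-in (assumption) used only when PyXML is unavailable."""
        def __init__(self, locator):
            self.locator = locator


class CollectingErrorHandler(ErrorHandler):
    """Hypothetical handler: collects diagnostics rather than printing them."""

    def __init__(self, locator):
        ErrorHandler.__init__(self, locator)
        self.problems = []   # list of (severity, message) pairs

    def warning(self, msg):
        self.problems.append(("warning", msg))

    def error(self, msg):
        self.problems.append(("error", msg))

    def fatal(self, msg):
        # Recorded only; this sketch does not stop the parse itself
        self.problems.append(("fatal", msg))
```

To use it, the set_error_handler call in validate_xml_file would receive CollectingErrorHandler(parser) in place of utils.ErrorPrinter(parser), and the caller would need some way to reach the handler's problems list afterward.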

See Also

The PyXML web site at http://pyxml.sourceforge.net/.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420