Checking for Well-Formed XML Documents


from xml.sax.handler import ContentHandler
import xml.sax
xmlparser = xml.sax.make_parser()
xmlparser.setContentHandler(ContentHandler())
xmlparser.parse(fName)

One of the most common tasks when processing XML documents is checking whether a document is well formed. A straightforward way to do this is to parse the document with the xml.sax module inside a try statement and handle the exception that is raised if the document is not well formed.

First, create a parser object by calling the xml.sax.make_parser() function. It returns a SAX parser (an XMLReader object) that can be used to parse the XML file.

After you have created the parser object, attach a content handler to it using its setContentHandler(handler) method. In this phrase, a generic content handler is created by instantiating the xml.sax.handler.ContentHandler class. The callback methods of this base class do nothing, so the parser simply reads the document and reports any errors it finds.

Once the content handler has been attached to the parser object, each XML file can be parsed inside a try block. If the parser encounters a well-formedness error in the XML document, it raises an exception (an xml.sax.SAXParseException); if parsing completes without an exception, the document is well formed.
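The exception raised on a well-formedness error also records where the problem was found. The following is a minimal sketch in Python 3 syntax that parses in-memory byte strings with xml.sax.parseString() rather than files. The helper name check_string and the two sample documents are illustrative assumptions, not part of the original phrase.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def check_string(xml_bytes):
    """Return (True, None) if xml_bytes parses cleanly,
    otherwise (False, description of the first error).

    check_string is a hypothetical helper used only for this sketch.
    """
    try:
        # The generic ContentHandler ignores every parsing event,
        # so only errors surface.
        xml.sax.parseString(xml_bytes, ContentHandler())
    except xml.sax.SAXParseException as err:
        # The exception carries the position of the error and a message.
        return False, "line %d, column %d: %s" % (
            err.getLineNumber(), err.getColumnNumber(), err.getMessage())
    return True, None


# Illustrative sample documents (assumptions for this sketch).
good = b"<emails><email>hi</email></emails>"
bad = b"<emails><email>hi</emails>"  # <email> is never closed

print(check_string(good))
print(check_string(bad))
```

Catching SAXParseException instead of a bare Exception keeps unrelated failures, such as an unreadable file, from being reported as malformed XML.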

import sys
from xml.sax.handler import ContentHandler
import xml.sax

fileList = ["emails.xml", "bad.xml"]

# Create a parser object
xmlparser = xml.sax.make_parser()

# Attach a generic content handler to parser
xmlparser.setContentHandler(ContentHandler())

# Parse the files and handle exceptions
# on bad-formed XML files
for fName in fileList:
    try:
        xmlparser.parse(fName)
        print "%s is a well-formed file." % fName
    except Exception, err:
        print "ERROR %s:\n\t %s is not a well-formed file." % (err, fName)


xml_wellformed.py

emails.xml is a well-formed file.
ERROR bad.xml:5:12: not well-formed (invalid token):
	 bad.xml is not a well-formed file.


Output from xml_wellformed.py code.
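The xml_wellformed.py listing uses Python 2 syntax (the print statement and "except Exception, err"). A rough Python 3 adaptation might look like the sketch below. The function name is_well_formed, the choice to create a fresh parser for each file, and the throwaway sample files written to a temporary directory are all assumptions made so the example can run on its own; they are not taken from the book.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(path):
    """Parse the file at path and return (ok, message).

    is_well_formed is a hypothetical helper for this sketch.
    """
    # Create a parser and attach a generic content handler.
    parser = xml.sax.make_parser()
    parser.setContentHandler(ContentHandler())
    try:
        parser.parse(path)
    except xml.sax.SAXParseException as err:
        # Only well-formedness errors are reported here; other
        # exceptions (for example, a missing file) propagate.
        return False, "ERROR %s" % err
    return True, "%s is a well-formed file." % path


# Demonstration with throwaway files (assumed sample content).
samples = {"ok.xml": "<a><b/></a>", "broken.xml": "<a><b></a>"}
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        results[name] = is_well_formed(path)
        print(results[name][1])
```

Unlike the original listing, this version does not treat every exception as a sign of malformed XML, so a missing or unreadable file shows up as its own error.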



Python Phrasebook(c) Essential Code and Commands
Python Phrasebook
ISBN: 0672329107
Pages: 138
Authors: Brad Dayley
