Parsing XML Tags


import xml.sax class tagHandler(xml.sax.handler.ContentHandler):     def __init__(self):         self.tags = {}     def startElement(self,name, attr):         name = name.encode('ascii')         self.tags[name] = self.tags.get(name, 0) + 1         print "Tag %s = %d" % \               (name, self.tags.get(name)) xmlparser = xml.sax.make_parser() tHandler = tagHandler() xmlparser.setContentHandler(tHandler) xmlparser.parse(xmlFile)

Another fairly common task when processing XML files is to process the XML tags themselves. The xml.sax module provides a quick, clean interface to the XML tags by defining a custom content handler to deal with the tags.

This phrase demonstrates how to override the content handler of a sax XML parser to determine how many instances of a specific tag there are in the XML document.

First, define a tag handler class that inherits from the xml.sax.handler.ContentHandler class. Then override the startElement() method of the class to keep track of each encounter with specific tags.

After you have defined the tag handler class, create an xml.sax parser object using the make_parser() function. The make_parser() function will return a parser object that can be used to parse the XML file. Next, create an instance of the tag handler object.

After you have created the parser and tag handler objects, add the custom tag handler object to the parser object using the setContentHandler(handler) function.

After the content handler has been added to the parser object, parse the XML file using the parse(file) command of the parser object.

import xml.sax xmlFile = "emails.xml" xmlTag = "email" #Define handler to scan XML file and parse tags class tagHandler(xml.sax.handler.ContentHandler):     def __init__(self):         self.tags = {}     def startElement(self,name, attr):         name = name.encode('ascii')         self.tags[name] = self.tags.get(name, 0) + 1         print "Tag %s = %d" % \               (name, self.tags.get(name)) #Create a parser object xmlparser = xml.sax.make_parser() #Create a content handler object tHandler = tagHandler() #Attach the content handler to the parser xmlparser.setContentHandler(tHandler) #Parse the XML file xmlparser.parse(xmlFile) tags = tHandler.tags if tags.has_key(xmlTag):     print "%s has %d <%s> nodes." % \           (xmlFile, tags[xmlTag], xmlTag)


xml_tags.py

Tag emails = 1 Tag email = 1 Tag to = 1 Tag addr = 1 Tag addr = 2 Tag from = 1 Tag addr = 3 Tag subject = 1 Tag body = 1 Tag email = 2 Tag to = 2 Tag addr = 4 Tag addr = 5 Tag from = 2 Tag addr = 6 Tag subject = 2 Tag body = 2 emails.xml has 2 <email> nodes.


Output from xml_tags.py code




Python Phrasebook(c) Essential Code and Commands
Python Phrasebook
ISBN: 0672329107
EAN: 2147483647
Year: N/A
Pages: 138
Authors: Brad Dayley

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net