The xml.parsers.expat Module

(Optional) The xml.parsers.expat module is an interface to James Clark's Expat XML parser. Example 5-3 demonstrates this full-featured and fast parser, which is an excellent choice for production use.

Example 5-3. Using the xml.parsers.expat Module

File: xml-parsers-expat-example-1.py

from xml.parsers import expat

class Parser:

    def __init__(self):
        # create the parser and hook up the event handlers
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data

    def feed(self, data):
        self._parser.Parse(data, 0)

    def close(self):
        self._parser.Parse("", 1) # end of data
        del self._parser # get rid of circular references

    def start(self, tag, attrs):
        print "START", repr(tag), attrs

    def end(self, tag):
        print "END", repr(tag)

    def data(self, data):
        print "DATA", repr(data)

p = Parser()
p.feed("<tag>data</tag>")
p.close()

START u'tag' {}
DATA u'data'
END u'tag'
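
The listing above uses Python 2 print statements. As a minimal sketch of the same handler pattern under Python 3, the class below records events in a list instead of printing them. The name EventCollector and its events attribute are hypothetical and not part of the original example; only ParserCreate, Parse, and the three handler attributes come from xml.parsers.expat.

```python
from xml.parsers import expat

class EventCollector:
    """Hypothetical helper: records parser events instead of printing them."""

    def __init__(self):
        self.events = []
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data

    def feed(self, data):
        self._parser.Parse(data, False)  # more data may follow

    def close(self):
        self._parser.Parse("", True)  # signal end of data
        del self._parser  # drop the handler references to self

    def start(self, tag, attrs):
        self.events.append(("START", tag, attrs))

    def end(self, tag):
        self.events.append(("END", tag))

    def data(self, text):
        self.events.append(("DATA", text))

collector = EventCollector()
collector.feed("<tag>data</tag>")
collector.close()

for event in collector.events:
    print(event)
```

Collecting events rather than printing them makes the handlers easier to test. Expat may deliver the text of one element in several pieces, so code that needs the complete text should join consecutive DATA events.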

Note that the parser returns Unicode strings, even if you pass it ordinary text. By default, the parser interprets the source text as UTF-8 (as per the XML standard). To use another encoding, make sure the XML document begins with an XML declaration that names that encoding. Example 5-4 shows how to read ISO Latin-1 text using xml.parsers.expat.

Example 5-4. Using the xml.parsers.expat Module to Read ISO Latin-1 Text

File: xml-parsers-expat-example-2.py

from xml.parsers import expat

class Parser:

    def __init__(self):
        # create the parser and hook up the event handlers
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data

    def feed(self, data):
        self._parser.Parse(data, 0)

    def close(self):
        self._parser.Parse("", 1) # end of data
        del self._parser # get rid of circular references

    def start(self, tag, attrs):
        print "START", repr(tag), attrs

    def end(self, tag):
        print "END", repr(tag)

    def data(self, data):
        print "DATA", repr(data)

p = Parser()
# the XML declaration must come first, hence the backslash after the quotes
p.feed("""\
<?xml version='1.0' encoding='iso-8859-1'?>
<author>
<name>fredrik lundh</name>
<city>linköping</city>
</author>
"""
)
p.close()

START u'author' {}
DATA u'\012'
START u'name' {}
DATA u'fredrik lundh'
END u'name'
DATA u'\012'
START u'city' {}
DATA u'link\366ping'
END u'city'
DATA u'\012'
END u'author'
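
To illustrate the encoding behavior described above, here is a minimal Python 3 sketch that parses the same ISO Latin-1 bytes with and without an encoding declaration. The helper collect_text and the sample documents are assumptions made for illustration and are not part of the original example.

```python
from xml.parsers import expat

def collect_text(xml_bytes):
    """Hypothetical helper: return all character data in the document as one string."""
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_bytes, True)  # the whole document in a single call
    return "".join(chunks)

# The same element, encoded as ISO Latin-1 bytes (the o-umlaut is the byte 0xF6).
body = "<city>link\u00f6ping</city>".encode("iso-8859-1")
declared = b"<?xml version='1.0' encoding='iso-8859-1'?>\n" + body

# With the declaration, the byte 0xF6 is decoded to the character U+00F6.
city = collect_text(declared)
print(repr(city))

# Without it, the parser assumes UTF-8, where a lone 0xF6 byte is invalid.
try:
    collect_text(body)
    undeclared_ok = True
except expat.ExpatError as exc:
    undeclared_ok = False
    print("parse failed:", exc)
```

When the document itself cannot be changed, ParserCreate also accepts an optional encoding argument that overrides the document's own declaration.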

Python Standard Library (Nutshell Handbooks)
ISBN: 0596000960
Year: 2000
Pages: 252
Authors: Fredrik Lundh
