Chapter 19. Simple API for XML (SAX) | XML in a Nutshell, 2nd Edition

CONTENTS

19.1 The ContentHandler Interface
19.2 SAX Features and Properties
19.3 Filters

The Simple API for XML (SAX) is a straightforward, event-based API for reading XML documents. Many different XML parsers, including Xerces, Crimson, MSXML, the Oracle XML Parser for Java, and Ælfred, implement the SAX API. SAX was originally defined as a Java API and is primarily intended for parsers written in Java. Therefore, this chapter focuses on the Java version of the API. However, SAX has been ported to most other major object-oriented languages, including C++, Python, Perl, and Eiffel. The translation from Java is usually fairly obvious.

The SAX API is unusual among XML APIs because it's an event-based push model rather than a tree-based pull model. As the XML parser reads an XML document, it sends your program information from the document in real time. Each time the parser sees a start-tag, an end-tag, character data, or a processing instruction, it tells your program. The document is presented to your program one piece at a time from beginning to end. You can either save the pieces you're interested in until the entire document has been read or process the information as soon as you receive it. You do not have to wait for the entire document to be read before acting on the data at the beginning of the document. Most importantly, the entire document does not have to reside in memory. This feature makes SAX the API of choice for very large documents that do not fit into available memory.

This chapter covers SAX2 exclusively. In 2002 all major parsers that support SAX support SAX2. The major change in SAX2 from SAX1 is the addition of namespace support. This addition necessitated changing the names and signatures of almost every method and class in SAX. The old SAX1 methods and classes are still available, but they're now deprecated, and you shouldn't use them.

SAX is primarily a collection of interfaces in the org.xml.sax package. One such interface is XMLReader. This interface represents the XML parser. It declares methods to parse a document and configure the parsing process, for instance, by turning validation on or off. To parse a document with SAX, first create an instance of XMLReader with the XMLReaderFactory class in the org.xml.sax.helpers package. This class has a static createXMLReader( ) factory method that produces the parser-specific implementation of the XMLReader interface. The Java system property org.xml.sax.driver specifies the concrete class to instantiate:

    try {
      XMLReader parser = XMLReaderFactory.createXMLReader( );
      // parse the document...
    }
    catch (SAXException e) {
      // couldn't create the XMLReader
    }

The call to XMLReaderFactory.createXMLReader( ) is wrapped in a try-catch block that catches SAXException. This is the generic checked exception superclass for almost anything that can go wrong while parsing an XML document. In this case, it means either that the org.xml.sax.driver system property wasn't set or that it was set to the name of a class that Java couldn't find in the class path.

You can choose which concrete class to instantiate by passing its name as a string to the createXMLReader( ) method. This code fragment instantiates the Xerces parser by name:

    try {
      XMLReader parser = XMLReaderFactory.createXMLReader(
       "org.apache.xerces.parsers.SAXParser");
      // parse the document...
    }
    catch (SAXException e) {
      // couldn't create the XMLReader
    }

Now that you've created a parser, you're ready to parse some documents with it. Pass the system ID of the document you want to parse to the parse( ) method. The system ID is either an absolute or a relative URL encoded in a string. For example, this code fragment parses the document at http://www.slashdot.org/slashdot.xml:

    try {
      XMLReader parser = XMLReaderFactory.createXMLReader( );
      parser.parse("http://www.slashdot.org/slashdot.xml");
    }
    catch (SAXParseException e) {
      // Well-formedness error
    }
    catch (SAXException e) {
      // Could not find an XMLReader implementation class
    }
    catch (IOException e) {
      // Some sort of I/O error prevented the document from being completely
      // downloaded from the server
    }

The parse( ) method throws a SAXParseException if the document is malformed, an IOException if an I/O error such as a broken socket occurs while the document is being read, and a SAXException if anything else goes wrong. Otherwise, it returns void. To receive information from the parser as it reads the document, you must configure it with a ContentHandler.
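The way these exceptions divide up can be sketched with a small well-formedness check. This is a minimal illustration rather than part of SAX itself: the class name WellFormednessCheck and its isWellFormed( ) method are hypothetical, the document is read from a string through an InputSource instead of a system ID, and the parser is obtained through JAXP's SAXParserFactory on the assumption that the code runs on a recent JDK, where XMLReaderFactory is deprecated.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical helper class, not part of SAX.
class WellFormednessCheck {

  // Returns true if the text parses cleanly and false if the parser
  // reports a well-formedness error.
  static boolean isWellFormed(String xml) {
    try {
      // Assumption: the parser comes from JAXP instead of XMLReaderFactory.
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XMLReader parser = factory.newSAXParser().getXMLReader();
      parser.parse(new InputSource(new StringReader(xml)));
      return true;
    } catch (SAXParseException e) {
      return false;  // the document is malformed
    } catch (SAXException | ParserConfigurationException | IOException e) {
      // Anything else: no parser available, an I/O failure, and so on
      throw new IllegalStateException("Could not run the parser", e);
    }
  }
}
```

The more specific SAXParseException must be caught before SAXException, its superclass, or the code will not compile. Some parsers also print a message to System.err when they hit a fatal error and no ErrorHandler is installed.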

19.1 The ContentHandler Interface

ContentHandler, shown in stripped-down form in Example 19-1, is an interface in the org.xml.sax package. You implement this interface in a class of your own devising. Next, you configure an XMLReader with an instance of your implementation. As the XMLReader reads the document, it invokes the methods in your object to tell your program what's in the XML document. You can respond to these method invocations in any way you see fit.

This interface has no relation to the moribund java.net.ContentHandler class. However, you may encounter a name conflict if you import both java.net.* and org.xml.sax.* in the same class. It's better to import just the java.net classes you actually need, rather than the entire package.

Example 19-1. The org.xml.sax.ContentHandler Interface

    package org.xml.sax;
    public interface ContentHandler {
        public void setDocumentLocator(Locator locator);
        public void startDocument( ) throws SAXException;
        public void endDocument( ) throws SAXException;
        public void startPrefixMapping(String prefix, String uri)
         throws SAXException;
        public void endPrefixMapping(String prefix) throws SAXException;
        public void startElement(String namespaceURI, String localName,
         String qualifiedName, Attributes atts) throws SAXException;
        public void endElement(String namespaceURI, String localName,
         String qualifiedName) throws SAXException;
        public void characters(char[] text, int start, int length)
         throws SAXException;
        public void ignorableWhitespace(char[] text, int start, int length)
         throws SAXException;
        public void processingInstruction(String target, String data)
         throws SAXException;
        public void skippedEntity(String name) throws SAXException;
    }

Every time the XMLReader reads a piece of the document, it calls a method in its ContentHandler. Suppose a parser reads the simple document shown in Example 19-2.

Example 19-2. A simple XML document

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <?xml-stylesheet type='text/css' href='person.css'?>
    <!DOCTYPE person SYSTEM "person.dtd">
    <person xmlns="http://xml.oreilly.com/person">
      <name:name xmlns:name="http://xml.oreilly.com/name">
        <name:first>Sydney</name:first>
        <name:last>Lee</name:last>
      </name:name>
      <assignment project_id="p2"/>
    </person>

The parser will call these methods in its ContentHandler with these arguments in this order. The values of the arguments passed to each method are given after each method name:

setDocumentLocator(Locator locator) locator: org.apache.xerces.readers.DefaultEntityHandler@1f953d

```
startDocument( )
```

processingInstruction(String target, String data) target: "xml-stylesheet" data: "type='text/css' href='person.css'"

startPrefixMapping(String prefix, String namespaceURI) prefix: "" namespaceURI: "http://xml.oreilly.com/person"

startElement(String namespaceURI, String localName,  String qualifiedName, Attributes atts) namespaceURI: "http://xml.oreilly.com/person" localName: "person" qualifiedName: "person" atts: {} (no attributes, an empty list)

ignorableWhitespace(char[] text, int start, int length) text: <?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type='text/css' href='person.css'?> <!DOCTYPE person SYSTEM "person.dtd"> <person xmlns="http://xml.oreilly.com/person">   <name:name xmlns:name="http://xml.oreilly.com/name">     <name:first>Sydney</name:first>     <name:last>Lee</name:last>   </name:name>   <assignment project_id="p2"/> </person> start: 181 length: 3

startPrefixMapping(String prefix, String uri) prefix: "name" uri: "http://xml.oreilly.com/name"

startElement(String namespaceURI, String localName,  String qualifiedName, Attributes atts) namespaceURI: "http://xml.oreilly.com/name" localName: "name" qualifiedName: "name:name" atts: {} (no attributes, an empty list)

ignorableWhitespace(char[] text, int start, int length) text: <?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type='text/css' href='person.css'?> <!DOCTYPE person SYSTEM "person.dtd"> <person xmlns="http://xml.oreilly.com/person">   <name:name xmlns:name="http://xml.oreilly.com/name">     <name:first>Sydney</name:first>     <name:last>Lee</name:last>   </name:name>   <assignment project_id="p2"/> </person> start: 236 length: 5

startElement(String namespaceURI, String localName,  String qualifiedName, Attributes atts) namespaceURI: "http://xml.oreilly.com/name" localName: "first" qualifiedName: "name:first" atts: {} (no attributes, an empty list)

characters(char[] text, int start, int length) text: <?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type='text/css' href='person.css'?> <!DOCTYPE person SYSTEM "person.dtd"> <person xmlns="http://xml.oreilly.com/person">   <name:name xmlns:name="http://xml.oreilly.com/name">     <name:first>Sydney</name:first>     <name:last>Lee</name:last>   </name:name>   <assignment project_id="p2"/> </person> start: 253 length: 6

endElement(String namespaceURI, String localName, String qualifiedName) namespaceURI: "http://xml.oreilly.com/name" localName: "first" qualifiedName: "name:first"

ignorableWhitespace(char[] text, int start, int length) text: <?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type='text/css' href='person.css'?> <!DOCTYPE person SYSTEM "person.dtd"> <person xmlns="http://xml.oreilly.com/person">   <name:name xmlns:name="http://xml.oreilly.com/name">     <name:first>Sydney</name:first>     <name:last>Lee</name:last>   </name:name>   <assignment project_id="p2"/> </person> start: 272 length: 5

startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts) namespaceURI: "http://xml.oreilly.com/name" localName: "last" qualifiedName: "name:last" atts: {} (no attributes, an empty list)

characters(char[] text, int start, int length) text: <?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type='text/css' href='person.css'?> <!DOCTYPE person SYSTEM "person.dtd"> <person xmlns="http://xml.oreilly.com/person">   <name:name xmlns:name="http://xml.oreilly.com/name">     <name:first>Sydney</name:first>     <name:last>Lee</name:last>   </name:name>   <assignment project_id="p2"/> </person> start: 288 length: 3

endElement(String namespaceURI, String localName, String qualifiedName) namespaceURI: "http://xml.oreilly.com/name" localName: "last" qualifiedName: "name:last"

ignorableWhitespace(char[] text, int start, int length) text: <?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type='text/css' href='person.css'?> <!DOCTYPE person SYSTEM "person.dtd"> <person xmlns="http://xml.oreilly.com/person">   <name:name xmlns:name="http://xml.oreilly.com/name">     <name:first>Sydney</name:first>     <name:last>Lee</name:last>   </name:name>   <assignment project_id="p2"/> </person> start: 303 length: 3

endElement(String namespaceURI, String localName, String qualifiedName) namespaceURI: "http://xml.oreilly.com/name" localName: "name" qualifiedName: "name:name"

endPrefixMapping(String prefix) prefix: "name"

ignorableWhitespace(char[] text, int start, int length) text: <?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type='text/css' href='person.css'?> <!DOCTYPE person SYSTEM "person.dtd"> <person xmlns="http://xml.oreilly.com/person">   <name:name xmlns:name="http://xml.oreilly.com/name">     <name:first>Sydney</name:first>     <name:last>Lee</name:last>   </name:name>   <assignment project_id="p2"/> </person> start: 318 length: 3

startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts) namespaceURI: "http://xml.oreilly.com/person" localName: "assignment" qualifiedName: "assignment" atts: {project_id="p2"}

endElement(String namespaceURI, String localName, String qualifiedName) namespaceURI: "http://xml.oreilly.com/person" localName: "assignment" qualifiedName: "assignment"

ignorableWhitespace(char[] text, int start, int length) text: <?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type='text/css' href='person.css'?> <!DOCTYPE person SYSTEM "person.dtd"> <person xmlns="http://xml.oreilly.com/person">   <name:name xmlns:name="http://xml.oreilly.com/name">     <name:first>Sydney</name:first>     <name:last>Lee</name:last>   </name:name>   <assignment project_id="p2"/> </person> start: 350 length: 1

endElement(String namespaceURI, String localName, String qualifiedName) namespaceURI: "http://xml.oreilly.com/person" localName: "person" qualifiedName: "person"

endPrefixMapping(String prefix) prefix: ""

```
endDocument( )
```

Some pieces of this are not deterministic. Note that the char array passed to each call to characters( ) and ignorableWhitespace( ) actually contains the entire document! The specific text block that the parser really returns is indicated by the second two arguments. This is an optimization that Xerces-J performs. Other parsers are free to pass different char arrays as long as they set the start and length arguments to match. Indeed, the parser is also free to split a long run of plain text across multiple calls to characters( ) or ignorableWhitespace( ), so you cannot assume that these methods necessarily return the longest possible contiguous run of plain text. Other details that may change from parser to parser include attribute order within a tag and whether a Locator object is provided by calling setDocumentLocator( ).
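Because the text may arrive in several pieces, and because the array may hold far more than the piece being reported, a handler usually copies just the indicated slice into a buffer of its own. The following is a minimal sketch of that habit. TextAccumulator and its textOf( ) method are hypothetical names. The class extends DefaultHandler, the do-nothing adapter class in org.xml.sax.helpers, to keep the listing short, and it assumes the parser is obtained through JAXP on a recent JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that gathers all the character data in a document.
class TextAccumulator extends DefaultHandler {
  private final StringBuilder buffer = new StringBuilder();

  @Override
  public void startDocument() {
    buffer.setLength(0);  // start fresh for each document
  }

  @Override
  public void characters(char[] text, int start, int length) {
    // Copy only the slice this call describes. The rest of the array may
    // hold other parts of the document, and more slices may follow.
    buffer.append(text, start, length);
  }

  String getText() {
    return buffer.toString();
  }

  // Convenience entry point: parses a document held in a string.
  static String textOf(String xml) {
    TextAccumulator handler = new TextAccumulator();
    try {
      // Assumption: the parser is obtained through JAXP.
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XMLReader parser = factory.newSAXParser().getXMLReader();
      parser.setContentHandler(handler);
      parser.parse(new InputSource(new StringReader(xml)));
    } catch (SAXException | ParserConfigurationException | IOException e) {
      throw new IllegalStateException("Could not parse the document", e);
    }
    return handler.getText();
  }
}
```

Since the handler appends every slice in the order received, the result is the same whether the parser delivers a run of text in one call or in ten.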

Suppose you want to count the number of elements, attributes, processing instructions, and characters of plain text that exist in a given XML document. To do so, first write a class that implements the ContentHandler interface. The current count of each of the four items of interest is stored in a field. The field values are initialized to zero in the startDocument( ) method, which is called exactly once for each document parsed. Each callback method in the class increments the relevant field. The endDocument( ) method reports the total for that document. Example 19-3 is such a class.

Example 19-3. The XMLCounter ContentHandler

    import org.xml.sax.*;
    public class XMLCounter implements ContentHandler {
      private int numberOfElements;
      private int numberOfAttributes;
      private int numberOfProcessingInstructions;
      private int numberOfCharacters;
      public void startDocument( ) throws SAXException {
        numberOfElements = 0;
        numberOfAttributes = 0;
        numberOfProcessingInstructions = 0;
        numberOfCharacters = 0;
      }
      // We should count either the start-tag of the element or the end-tag,
      // but not both. Empty elements are reported by each of these methods.
      public void startElement(String namespaceURI, String localName,
       String qualifiedName, Attributes atts) throws SAXException {
        numberOfElements++;
        numberOfAttributes += atts.getLength( );
      }
      public void endElement(String namespaceURI, String localName,
       String qualifiedName) throws SAXException {}
      public void characters(char[] text, int start, int length)
       throws SAXException {
        numberOfCharacters += length;
      }
      public void ignorableWhitespace(char[] text, int start, int length)
       throws SAXException {
        numberOfCharacters += length;
      }
      public void processingInstruction(String target, String data)
       throws SAXException {
        numberOfProcessingInstructions++;
      }
      // Now that the document is done, we can print out the final results
      public void endDocument( ) throws SAXException {
        System.out.println("Number of elements: " + numberOfElements);
        System.out.println("Number of attributes: " + numberOfAttributes);
        System.out.println("Number of processing instructions: "
         + numberOfProcessingInstructions);
        System.out.println("Number of characters of plain text: "
         + numberOfCharacters);
      }
      // Do-nothing methods we have to implement only to fulfill
      // the interface requirements:
      public void setDocumentLocator(Locator locator) {}
      public void startPrefixMapping(String prefix, String uri)
       throws SAXException {}
      public void endPrefixMapping(String prefix) throws SAXException {}
      public void skippedEntity(String name) throws SAXException {}
    }

This class has to implement every method in the ContentHandler interface, even the ones it does not care about. However, if you only really want to provide one or two ContentHandler methods, you may want to subclass the DefaultHandler class in the org.xml.sax.helpers package instead. This adapter class implements all methods in the ContentHandler interface with do-nothing methods, so you only have to override methods in which you're genuinely interested.
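As a rough illustration of how much shorter a DefaultHandler subclass can be, the sketch below counts elements only. ElementCounter and countElements( ) are hypothetical names, and the parser is obtained through JAXP on the assumption of a recent JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that overrides only the callbacks it needs.
class ElementCounter extends DefaultHandler {
  private int numberOfElements;

  @Override
  public void startDocument() {
    numberOfElements = 0;  // reset for each document parsed
  }

  @Override
  public void startElement(String namespaceURI, String localName,
      String qualifiedName, Attributes atts) {
    numberOfElements++;
  }

  int getNumberOfElements() {
    return numberOfElements;
  }

  // Convenience entry point: counts the elements in a document string.
  static int countElements(String xml) {
    ElementCounter counter = new ElementCounter();
    try {
      // Assumption: the parser is obtained through JAXP.
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XMLReader parser = factory.newSAXParser().getXMLReader();
      parser.setContentHandler(counter);
      parser.parse(new InputSource(new StringReader(xml)));
    } catch (SAXException | ParserConfigurationException | IOException e) {
      throw new IllegalStateException("Could not parse the document", e);
    }
    return counter.getNumberOfElements();
  }
}
```

All the other ContentHandler methods are inherited from DefaultHandler and silently do nothing.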

Next, build an XMLReader, and configure it with an instance of this class. Finally, parse the documents you want to count, as in Example 19-4.

Example 19-4. The DocumentStatistics driver class

    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import java.io.IOException;
    public class DocumentStatistics {
      public static void main(String[] args) {
        XMLReader parser;
        try {
          parser = XMLReaderFactory.createXMLReader( );
        }
        catch (SAXException e) {
          // fall back on Xerces parser by name
          try {
            parser = XMLReaderFactory.createXMLReader(
             "org.apache.xerces.parsers.SAXParser");
          }
          catch (SAXException ee) {
            System.err.println("Couldn't locate a SAX parser");
            return;
          }
        }
        if (args.length == 0) {
          System.out.println(
           "Usage: java DocumentStatistics URL1 URL2...");
        }
        // Install the Content Handler
        parser.setContentHandler(new XMLCounter( ));
        // start parsing...
        for (int i = 0; i < args.length; i++) {
          // command line should offer URIs or file names
          try {
            parser.parse(args[i]);
          }
          catch (SAXParseException e) { // well-formedness error
            System.out.println(args[i] + " is not well formed.");
            System.out.println(e.getMessage( )
             + " at line " + e.getLineNumber( )
             + ", column " + e.getColumnNumber( ));
          }
          catch (SAXException e) { // some other kind of error
            System.out.println(e.getMessage( ));
          }
          catch (IOException e) {
            System.out.println("Could not report on " + args[i]
             + " because of the IOException " + e);
          }
        }
      }
    }

Running the program in Example 19-4 across the document in Example 19-2 results in the following output:

    D:\books\xian\examples\18>java DocumentStatistics 18-2.xml
    Number of elements: 5
    Number of attributes: 1
    Number of processing instructions: 1
    Number of characters of plain text: 29

This generic program of Example 19-4 works on any well-formed XML document. Most SAX programs are more specific and only work with certain XML applications. They look for particular elements or attributes in particular places and respond to them accordingly. They may rely on patterns that are enforced by a validating parser. Still, this behavior comprises the fundamentals of SAX.

The complicated part of most SAX programs is the data structure you must build to store information returned by the parser until you're ready to use it. Sometimes this information can be as complicated as the XML document itself, in which case you may be better off using DOM, which at least provides a ready-made data structure for an XML document. You usually want only some information, though, and the data structure you construct should be less complex than the document itself.
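To make the idea of a purpose-built data structure concrete, here is a minimal sketch that keeps far less than the whole document: a map from the path of each text-bearing element to its text. LeafTextCollector and collect( ) are hypothetical names. The sketch assumes a parser obtained through JAXP that supplies qualified names, and it assumes simple documents without mixed content.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the text of each element that has text.
class LeafTextCollector extends DefaultHandler {
  private final Deque<String> path = new ArrayDeque<>();  // open elements
  private final StringBuilder text = new StringBuilder();
  private final Map<String, String> leaves = new LinkedHashMap<>();

  @Override
  public void startElement(String namespaceURI, String localName,
      String qualifiedName, Attributes atts) {
    path.addLast(qualifiedName);
    text.setLength(0);
  }

  @Override
  public void characters(char[] chars, int start, int length) {
    text.append(chars, start, length);
  }

  @Override
  public void endElement(String namespaceURI, String localName,
      String qualifiedName) {
    String value = text.toString().trim();
    if (!value.isEmpty()) {
      // The deque iterates from the root element to the current one
      leaves.put("/" + String.join("/", path), value);
    }
    path.removeLast();
    text.setLength(0);
  }

  // Convenience entry point: parses a document held in a string.
  static Map<String, String> collect(String xml) {
    LeafTextCollector handler = new LeafTextCollector();
    try {
      // Assumption: the parser is obtained through JAXP.
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XMLReader parser = factory.newSAXParser().getXMLReader();
      parser.setContentHandler(handler);
      parser.parse(new InputSource(new StringReader(xml)));
    } catch (SAXException | ParserConfigurationException | IOException e) {
      throw new IllegalStateException("Could not parse the document", e);
    }
    return handler.leaves;
  }
}
```

The stack of open element names is the only context the handler carries from one callback to the next. Most SAX programs need some state of this kind, because each callback sees only a single event.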

19.2 SAX Features and Properties

SAX uses properties and features to control parser behavior. Each feature and property has a name that's an absolute URI. Like namespace URIs, these URIs are only used to name things and do not necessarily point to a real page you can load into a web browser. Features are either true or false; that is, they're Booleans. Properties have values of an appropriate Object type.

The http://xml.org/sax/features/validation feature controls whether a parser validates. If this feature is true, then the parser will report validity errors in the document to the registered ErrorHandler; otherwise, it won't. This feature is turned off by default. To turn a feature on, pass the feature's name and value to the XMLReader's setFeature( ) method:

    try {
      parser.setFeature("http://xml.org/sax/features/validation", true);
    }
    catch (SAXNotSupportedException e) {
      System.out.println("Cannot turn on validation right now.");
    }
    catch (SAXNotRecognizedException e) {
      System.out.println("This is not a validating parser.");
    }

Not all parsers can validate. If you try to turn on validation in a parser that doesn't validate, or set any other feature the parser doesn't provide, setFeature( ) throws a SAXNotRecognizedException. If you try to set a feature the parser does recognize but cannot change at the current time (e.g., you try to turn on validation when the parser has already read half of the document), setFeature( ) throws a SAXNotSupportedException. Both are subclasses of SAXException.

You can check a feature's current value using XMLReader's getFeature( ) method. This method returns a boolean and throws the same exceptions for the same reasons as setFeature( ). If you want to know whether the parser validates, you can ask in the following manner:

    try {
      boolean isValidating =
       parser.getFeature("http://xml.org/sax/features/validation");
    }
    catch (SAXException e) {
      System.out.println("This is not a validating parser");
    }
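When a program needs to tell the two failure modes apart, it can catch the two exception subclasses separately. The sketch below wraps that logic in a small helper. FeatureProbe, describe( ), and newReader( ) are hypothetical names, the example.com feature URI is made up to show what happens with an unknown name, and the parser is obtained through JAXP on the assumption of a recent JDK.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper for inspecting parser features.
class FeatureProbe {

  // Reports the state of one feature as a short human-readable string.
  static String describe(XMLReader parser, String featureURI) {
    try {
      return parser.getFeature(featureURI) ? "on" : "off";
    } catch (SAXNotRecognizedException e) {
      return "not recognized";  // the parser has never heard of it
    } catch (SAXNotSupportedException e) {
      return "not supported right now";  // known, but unavailable
    }
  }

  // Assumption: the parser is obtained through JAXP.
  static XMLReader newReader() {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newSAXParser().getXMLReader();
    } catch (SAXException | ParserConfigurationException e) {
      throw new IllegalStateException("Could not create a parser", e);
    }
  }
}
```

A call such as FeatureProbe.describe(parser, "http://xml.org/sax/features/validation") then reports whether validation is on, off, or unavailable in whatever parser is in use.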

Properties are similar to features, but they allow a broader choice than a simple Boolean on/off, true/false dichotomy. Each property value is an object of unspecified type. For example, if you want to know the literal string of data parsed to produce the current SAX event, you can ask by reading the http://xml.org/sax/properties/xml-string property with the getProperty( ) method:

    try {
      String tag = (String) parser.getProperty(
       "http://xml.org/sax/properties/xml-string");
    }
    catch (SAXNotSupportedException e) {
      System.out.println("This parser does not provide the original data");
    }
    catch (SAXNotRecognizedException e) {
      System.out.println("Parser does not recognize the " +
       "http://xml.org/sax/properties/xml-string property");
    }

You can change a property value by invoking the setProperty( ) method with two arguments. The first is the URI of the property to set. The second is the object specifying the value for the property. For example, this code fragment attempts to set the http://xml.org/sax/properties/lexical-handler property to a new instance of MyLexicalHandlerClass. The parser reports lexical events (comments, CDATA sections, and entity references) to the org.xml.sax.ext.LexicalHandler implementation object named by this property:

    try {
      parser.setProperty(
        "http://xml.org/sax/properties/lexical-handler",
        new MyLexicalHandlerClass( )
      );
    }
    catch (SAXException e) {
      System.out.println("This parser does not provide lexical events.");
    }

If you pass in the wrong kind of object for a property (e.g., an object that does not implement the LexicalHandler interface for the http://xml.org/sax/properties/lexical-handler property), then setProperty( ) throws a SAXNotSupportedException.
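For a fuller picture of a lexical handler at work, the sketch below collects the comments in a document. It registers the handler under the standard SAX2 property name http://xml.org/sax/properties/lexical-handler. CommentCollector and commentsIn( ) are hypothetical names. The class extends org.xml.sax.ext.DefaultHandler2, an adapter that implements LexicalHandler with do-nothing methods, and it assumes a JAXP-supplied parser that supports the property.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical lexical handler that remembers every comment it is shown.
class CommentCollector extends DefaultHandler2 {
  private final List<String> comments = new ArrayList<>();

  @Override
  public void comment(char[] text, int start, int length) {
    // As with characters( ), only the indicated slice is the comment
    comments.add(new String(text, start, length));
  }

  // Convenience entry point: lists the comments in a document string.
  static List<String> commentsIn(String xml) {
    CommentCollector collector = new CommentCollector();
    try {
      // Assumption: the parser is obtained through JAXP.
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XMLReader parser = factory.newSAXParser().getXMLReader();
      parser.setContentHandler(collector);
      // Lexical events are delivered only after the property is set
      parser.setProperty(
          "http://xml.org/sax/properties/lexical-handler", collector);
      parser.parse(new InputSource(new StringReader(xml)));
    } catch (SAXException | ParserConfigurationException | IOException e) {
      throw new IllegalStateException("Could not parse the document", e);
    }
    return collector.comments;
  }
}
```

A parser that does not recognize the property throws a SAXNotRecognizedException from setProperty( ), which this sketch simply rethrows as an unchecked exception.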

Not all features and properties can be set at all times. For example, you cannot suddenly decide to start validating when the parser is already halfway through a document. An attempt to do so will fail and throw a SAXNotSupportedException. However, you can change a parser's features between documents, that is, after parsing one document but before parsing the next. You can read most feature and property values at any time.

19.3 Filters

A SAX filter sits in between the parser and the client application and intercepts the messages that these two objects pass to each other. It can pass these messages unchanged or modify, replace, or block them. To a client application, the filter looks like a parser, that is, an XMLReader. To the parser, the filter looks like a client application, that is, a ContentHandler.

SAX filters are implemented by subclassing the org.xml.sax.helpers.XMLFilterImpl class.^[1] This class implements all the required interfaces of SAX for both parsers and client applications. That is, its signature is as follows:

    public class XMLFilterImpl implements XMLFilter, XMLReader,
     EntityResolver, DTDHandler, ContentHandler, ErrorHandler

Your own filters will extend this class and override those methods that correspond to the messages you want to filter. For example, if you wanted to filter out all processing instructions, you would write a filter that would override the processingInstruction( ) method to do nothing, as shown in Example 19-5.

Example 19-5. A SAX filter that removes processing instructions

    import org.xml.sax.helpers.XMLFilterImpl;
    public class ProcessingInstructionStripper extends XMLFilterImpl {
      public void processingInstruction(String target, String data) {
        // Because we do nothing, processing instructions read in the
        // document are *not* passed to client application
      }
    }

If instead you wanted to replace a processing instruction with an element whose name was the same as the processing instruction's target and whose text content was the processing instruction's data, you'd call the startElement( ), characters( ), and endElement( ) methods from inside the processingInstruction( ) method after filling in the arguments with the relevant data from the processing instruction, as shown in Example 19-6.

Example 19-6. A SAX filter that converts processing instructions to elements

    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    public class ProcessingInstructionConverter extends XMLFilterImpl {
      public void processingInstruction(String target, String data)
       throws SAXException {
        // AttributesImpl is an adapter class in the org.xml.sax.helpers
        // package for precisely this case. We don't really want to add any
        // attributes here, but we need to pass something as the fourth
        // argument to startElement( ).
        Attributes emptyAttributes = new AttributesImpl( );
        // We won't use any namespace for the element
        startElement("", target, target, emptyAttributes);
        // converts String data to char array
        char[] text = data.toCharArray( );
        characters(text, 0, text.length);
        endElement("", target, target);
      }
    }

We used this filter before passing Example 19-2 into a program that echoes an XML document onto System.out and were a little surprised to see this come out:

    <xml-stylesheet>type="text/css" href="person.css"</xml-stylesheet>
    <person xmlns="http://xml.oreilly.com/person">
      <name:name xmlns:name="http://xml.oreilly.com/name">
        <name:first>Sydney</name:first>
        <name:last>Lee</name:last>
      </name:name>
      <assignment project_id="p2"></assignment>
    </person>

This document is not well-formed! The specific problem is that there are two independent root elements. However, on further consideration that's really not too surprising. Well-formedness checking is normally done by the underlying parser when it reads the text form of an XML document. SAX filters should provide well-formed XML data to client applications, but they are not absolutely required to. Indeed, they can produce substantially more malformed data than this by including start-tags that are not matched by end-tags, text that contains illegal characters such as the form feed or the vertical tab, and XML names that contain non-name characters such as *. You need to be very careful before assuming data you receive from a filter is valid or well-formed.

If you want to invoke a method without filtering it or you want to invoke the same method in the underlying handler, you can prefix a call to it with the super keyword. This invokes the variant of the method from the superclass. By default, each method in XMLFilterImpl just passes the same arguments to the equivalent method in the parent handler. Example 19-7 demonstrates with a filter that changes all character data to uppercase by overriding the characters( ) method.

Example 19-7. A SAX filter that converts text to uppercase

    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    public class UpperCaseFilter extends XMLFilterImpl {
      public void characters(char[] text, int start, int length)
       throws SAXException {
        String temp = new String(text, start, length);
        temp = temp.toUpperCase( );
        text = temp.toCharArray( );
        super.characters(text, 0, text.length);
      }
    }

Actually, using a filter involves these steps:

1. Create a filter object, normally by invoking its own constructor.
2. Create the XMLReader that will actually parse the document, normally by calling XMLReaderFactory.createXMLReader( ).
3. Attach the filter to the parser using the filter's setParent( ) method.
4. Install a ContentHandler in the filter.
5. Parse the document by calling the filter's parse( ) method.

Details can vary a little from application to application. For instance, you might install other handlers besides the ContentHandler or change the parent between documents. However, once the filter has been attached to the underlying XMLReader, you should not directly invoke any methods on this underlying parser; you should only talk to it through the filter. For example, this is how you'd use the filter in Example 19-7 to parse a document:

    XMLFilter filter = new UpperCaseFilter( );
    filter.setParent(XMLReaderFactory.createXMLReader( ));
    filter.setContentHandler(yourContentHandlerObject);
    filter.parse(document);

Notice specifically that you invoke the filter's parse( ) method, not the underlying parser's parse( ) method.
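The five steps listed above can be sketched end to end with a different filter, one that blocks runs of text consisting only of whitespace. This is an illustration under stated assumptions rather than a canonical recipe: BlankTextFilter, FilterPipelineDemo, and filteredText( ) are hypothetical names, the client application is a bare-bones DefaultHandler that records the text it receives, and the underlying parser is obtained through JAXP on the assumption of a recent JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: drops text chunks made up entirely of whitespace.
class BlankTextFilter extends XMLFilterImpl {
  @Override
  public void characters(char[] text, int start, int length)
      throws SAXException {
    for (int i = start; i < start + length; i++) {
      if (!Character.isWhitespace(text[i])) {
        super.characters(text, start, length);  // pass the chunk through
        return;
      }
    }
    // All whitespace: block the event by not forwarding it
  }
}

// Hypothetical driver that wires parser, filter, and client together.
class FilterPipelineDemo {
  static String filteredText(String xml) {
    StringBuilder seen = new StringBuilder();
    // The client application: records whatever text reaches it
    DefaultHandler client = new DefaultHandler() {
      @Override
      public void characters(char[] text, int start, int length) {
        seen.append(text, start, length);
      }
    };
    try {
      BlankTextFilter filter = new BlankTextFilter();            // step 1
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XMLReader parser = factory.newSAXParser().getXMLReader();  // step 2
      filter.setParent(parser);                                  // step 3
      filter.setContentHandler(client);                          // step 4
      filter.parse(new InputSource(new StringReader(xml)));      // step 5
    } catch (SAXException | ParserConfigurationException | IOException e) {
      throw new IllegalStateException("Could not parse the document", e);
    }
    return seen.toString();
  }
}
```

The filter tests each chunk separately, so whitespace that arrives in the same characters( ) call as other text is passed along untouched.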

[1] There's also an org.xml.sax.XMLFilter interface. However, this interface is arranged exactly backwards for most use cases. It filters messages from the client application to the parser, but not the much more important messages from the parser to the client application. Furthermore, implementing the XMLFilter interface directly requires a lot more work than subclassing XMLFilterImpl. Almost no experienced SAX programmer would choose to implement XMLFilter directly rather than subclassing the XMLFilterImpl adapter class.
