Content Filters


By this point, you may have noticed a problem with the XMLFilter interface. It filters calls from the client application to the parser; however, most events travel in the opposite direction, from the parser to the client application, through the various callback interfaces, particularly ContentHandler. XMLFilter is set up exactly backwards for filtering these events: it intercepts what the client application says to the parser, but not the much more important calls from the parser to the client application!

It is possible to work around this. First, intercept the handlers passed to methods such as setContentHandler() and setDTDHandler(). Then, replace them with handlers of your own so that your handlers receive the callback events from the parent parser. Your handlers can either pass each event along to the client application's handler methods, or pass something different instead.

Filtering Tags

For example, let's suppose you want to convert a RDDL document to pure XHTML. RDDL, Resource Directory Description Language, is used for human- and machine-readable documents placed at the end of namespace URLs. RDDL is actually quite close to XHTML to begin with. It's just an XHTML Basic document that contains one extra element, rddl:resource, which can appear anywhere a p element can appear, and can contain anything a div element can contain. The customary rddl prefix is mapped to the http://www.rddl.org/ namespace URL, and as usual the prefix can change as long as the URL remains the same. For example, the following is a rddl:resource from the RDDL specification [http://www.rddl.org/] itself:

    <rddl:resource id="rec-xhtml"
            xlink:title="W3C REC XHTML"
            xlink:role="http://www.w3.org/1999/xhtml"
            xlink:arcrole="http://www.rddl.org/purposes#reference"
            xlink:href="http://www.w3.org/tr/xhtml1"
            >
    <li><a href="http://www.w3.org/tr/xhtml1">W3C XHTML 1.0</a></li>
    </rddl:resource>

The filter needs to throw away the <rddl:resource> start-tag and </rddl:resource> end-tag while leaving everything else intact. Example 8.6 does this. The startElement() and endElement() methods pass everything except rddl:resource tags to the parent's ContentHandler. However, rddl:resource tags are just dropped on the floor.

Example 8.6 A ContentHandler Filter
    import org.xml.sax.*;

    public class RDDLStripper implements ContentHandler {

      public final static String RDDL_NAMESPACE
       = "http://www.rddl.org/";
      public final static String XHTML_NAMESPACE
       = "http://www.w3.org/1999/xhtml";

      private ContentHandler parent;

      public RDDLStripper(ContentHandler parent) {
        this.parent = parent;
      }

      // Filter out <rddl:resource> start-tags
      public void startElement(String namespaceURI, String localName,
       String qualifiedName, Attributes atts) throws SAXException {

        if (localName.equals("resource")
            && namespaceURI.equals(RDDL_NAMESPACE)) {
          return; // having done nothing
        }
        else { // pass the element along
          parent.startElement(namespaceURI, localName, qualifiedName, atts);
        }

      }

      // Filter out </rddl:resource> end-tags
      public void endElement(String namespaceURI, String localName,
       String qualifiedName) throws SAXException {

        if (localName.equals("resource")
            && namespaceURI.equals(RDDL_NAMESPACE)) {
          return; // having done nothing
        }
        else {
          parent.endElement(namespaceURI, localName, qualifiedName);
        }

      }

      // Methods that pass data along unchanged:
      public void startDocument() throws SAXException {
        parent.startDocument();
      }

      public void startPrefixMapping(String prefix, String uri)
       throws SAXException {
        parent.startPrefixMapping(prefix, uri);
      }

      public void endPrefixMapping(String prefix)
       throws SAXException {
        parent.endPrefixMapping(prefix);
      }

      public void setDocumentLocator(Locator locator) {
        parent.setDocumentLocator(locator);
      }

      public void endDocument() throws SAXException {
        parent.endDocument();
      }

      public void characters(char[] text, int start, int length)
       throws SAXException {
        parent.characters(text, start, length);
      }

      public void ignorableWhitespace(char[] text, int start,
       int length) throws SAXException {
        parent.ignorableWhitespace(text, start, length);
      }

      public void processingInstruction(String target, String data)
       throws SAXException {
        parent.processingInstruction(target, data);
      }

      public void skippedEntity(String name)
       throws SAXException {
        parent.skippedEntity(name);
      }

    }

The next step is to install an RDDLStripper as an XMLFilter's ContentHandler. Example 8.7 does this. It is just a slight variation on the earlier TransparentFilter (demonstrated in Example 8.3). Here, however, the setContentHandler() method wraps the client application's handler in an RDDLStripper and installs that on the parent instead. [2]

[2] In fact, it probably should have been written as a subclass of TransparentFilter that overrode this one method. But I've discovered that whenever I use inheritance to extend one example from an earlier one, I get daily e-mail from readers asking me where the missing methods are. The next section will introduce a standard class that you can extend to avoid repeating all these methods.

Example 8.7 A Filter That Substitutes Its Own ContentHandler
    import org.xml.sax.*;
    import java.io.IOException;

    public class RDDLFilter implements XMLFilter {

      private XMLReader parent;

      public void setParent(XMLReader parent) {
        this.parent = parent;
      }

      public XMLReader getParent() {
        return this.parent;
      }

      // Wrap the client's handler so that events from the parent
      // pass through the RDDLStripper first
      public void setContentHandler(ContentHandler handler) {
        parent.setContentHandler(new RDDLStripper(handler));
      }

      // Transparent pass-along methods
      public ContentHandler getContentHandler() {
        return parent.getContentHandler();
      }

      public boolean getFeature(String name)
       throws SAXNotRecognizedException, SAXNotSupportedException {
        return parent.getFeature(name);
      }

      public void setFeature(String name, boolean value)
       throws SAXNotRecognizedException, SAXNotSupportedException {
        parent.setFeature(name, value);
      }

      public Object getProperty(String name)
       throws SAXNotRecognizedException, SAXNotSupportedException {
        return parent.getProperty(name);
      }

      public void setProperty(String name, Object value)
       throws SAXNotRecognizedException, SAXNotSupportedException {
        parent.setProperty(name, value);
      }

      public void setEntityResolver(EntityResolver resolver) {
        parent.setEntityResolver(resolver);
      }

      public EntityResolver getEntityResolver() {
        return parent.getEntityResolver();
      }

      public void setDTDHandler(DTDHandler handler) {
        parent.setDTDHandler(handler);
      }

      public DTDHandler getDTDHandler() {
        return parent.getDTDHandler();
      }

      public void setErrorHandler(ErrorHandler handler) {
        parent.setErrorHandler(handler);
      }

      public ErrorHandler getErrorHandler() {
        return parent.getErrorHandler();
      }

      public void parse(InputSource input)
       throws SAXException, IOException {
        parent.parse(input);
      }

      public void parse(String systemId)
       throws SAXException, IOException {
        parent.parse(systemId);
      }

    }

The final step is to write a client application that uses this filter to read either XHTML or RDDL documents, or both. Whichever kind you feed it, the client application should only see XHTML. For this client application, I'm going to use a ContentHandler that just prints the result on System.out. More specifically, I'm going to use David Megginson's public-domain com.megginson.sax.XMLWriter class [http://www.megginson.com/Software/index.html]. This class not only implements ContentHandler; it's also an XMLFilter. But because it uses a few features we haven't discussed yet, detailed analysis of this class will have to wait. In the meantime, all you really need to know is that XMLWriter logs all events the parser fires to a specified java.io.Writer in a well-formed way.

Example 8.8 demonstrates with a driver program that parses a document named by the first command line argument. It uses the XMLFilter class named by the second command line argument. This test program will allow us to inspect the output of various filters in this chapter.

Example 8.8 A Program That Filters Documents
    import org.xml.sax.*;
    import org.xml.sax.helpers.XMLReaderFactory;
    import java.io.*;
    import com.megginson.sax.XMLWriter;

    public class FilterTester {

      public static void main(String[] args) {

        if (args.length < 2) {
          System.out.println(
           "Usage: java FilterTester URL FilterClass");
          return;
        }
        String document = args[0];
        String filterClass = args[1];

        try {
          // Load the filter named on the command line
          XMLFilter filter
           = (XMLFilter) Class.forName(filterClass).newInstance();
          filter.setParent(XMLReaderFactory.createXMLReader());
          // Print whatever survives the filter on System.out
          filter.setContentHandler(
           new XMLWriter(new OutputStreamWriter(System.out))
          );
          filter.parse(document);
        }
        catch (SAXException e) {
          e.printStackTrace();
          System.out.println(e);
        }
        catch (IOException e) {
          e.printStackTrace();
          System.out.println(
           "Due to an IOException, the parser could not read "
           + args[0]
          );
        }
        catch (ClassCastException e) {
          System.out.println(filterClass
           + " does not implement org.xml.sax.XMLFilter");
        }
        catch (ClassNotFoundException e) {
          System.out.println(filterClass
           + " cannot be found in the CLASSPATH");
        }
        catch (InstantiationException e) {
          System.out.println(filterClass
           + " does not have a no-args constructor");
        }
        catch (Exception e) {
          System.err.println(e);
        }

      }

    }

For example, here are the actual last few lines of the RDDL specification (modulo the usual white-space adjustments):

    <rddl:resource
            id="note-xlink2rdf"
            xlink:title="W3C NOTE XLink2RDF"
            xlink:role="http://www.w3.org/TR/html4/"
            xlink:arcrole="http://www.rddl.org/purposes#reference"
            xlink:href="http://www.w3.org/TR/xlink2rdf/"
            >
    <li><a href="http://www.w3.org/TR/xlink2rdf/">W3C Note Harvesting RDF Statements from XLinks</a></li>
    </rddl:resource>
    <rddl:resource id="rec-modxhtml"
            xlink:title="W3C REC Modularization for XHTML"
            xlink:role="http://www.w3.org/1999/xhtml"
            xlink:arcrole="http://www.rddl.org/purposes#reference"
            xlink:href="http://www.w3.org/TR/xhtml-modularization/"
            >
    <li><a href="http://www.w3.org/TR/xhtml-modularization/">W3C Modularization of XHTML</a></li>
    </rddl:resource>
    </ol>
    </div>
    </body>
    </html>

Now here's some of the output when I passed the RDDL specification through this program. The rddl:resource tags have been removed.

    % java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser FilterTester http://www.rddl.org/ RDDLFilter
    ...
            <li><a href="http://www.w3.org/TR/xlink2rdf/">W3C Note
            Harvesting RDF Statements from XLinks</a></li>
            <li><a href="http://www.w3.org/TR/xhtml-modularization/">
            W3C Modularization of XHTML</a></li>
    </ol>
    </div>
    </body>
    </html>

The pattern followed here is nearly identical to that shown in Figure 8.2. If we revise that figure to include the specific class names used here, we get Figure 8.4.

Figure 8.4. How Data Flows through the RDDLStripper Program


Because SAX operates in a linear fashion rather than a hierarchical fashion, filtering the start-tag and end-tag of each rddl:resource element does not filter the contents of the rddl:resource element. In this case, that's exactly what we want. RDDL is a little unusual in that it both contains XHTML and is contained by it. Most applications that get mixed with XHTML, such as SVG and MathML, are contained in XHTML but do not themselves contain XHTML.
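The linear behavior described above can be checked with a small self-contained sketch. It is not the RDDLStripper/RDDLFilter pair from Examples 8.6 and 8.7; instead it assumes the standard org.xml.sax.helpers.XMLFilterImpl helper class and the JDK's built-in JAXP parser, and it uses a hypothetical Recorder handler in place of XMLWriter so that it runs without any third-party library. The class names TagStripDemo, Stripper, and Recorder are invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class TagStripDemo {

  static final String RDDL_NAMESPACE = "http://www.rddl.org/";

  // Drops rddl:resource start-tags and end-tags; all other events pass through.
  static class Stripper extends XMLFilterImpl {

    Stripper(XMLReader parent) {
      super(parent);
    }

    private static boolean isResource(String uri, String localName) {
      return "resource".equals(localName) && RDDL_NAMESPACE.equals(uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
      if (!isResource(uri, localName)) {
        super.startElement(uri, localName, qName, atts);
      }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
      if (!isResource(uri, localName)) {
        super.endElement(uri, localName, qName);
      }
    }
  }

  // Hypothetical client handler: records the events it receives as text.
  static class Recorder extends DefaultHandler {
    final StringBuilder log = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes atts) {
      log.append('<').append(localName).append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      log.append("</").append(localName).append('>');
    }

    @Override
    public void characters(char[] text, int start, int length) {
      log.append(text, start, length);
    }
  }

  // Parses the XML string through the filter and returns what the client saw.
  static String strip(String xml) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XMLReader parser = factory.newSAXParser().getXMLReader();
      Stripper filter = new Stripper(parser);
      Recorder recorder = new Recorder();
      filter.setContentHandler(recorder);
      filter.parse(new InputSource(new StringReader(xml)));
      return recorder.log.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<ol xmlns='http://www.w3.org/1999/xhtml'"
        + " xmlns:rddl='http://www.rddl.org/'>"
        + "<rddl:resource><li>XHTML 1.0</li></rddl:resource></ol>";
    System.out.println(strip(xml)); // prints <ol><li>XHTML 1.0</li></ol>
  }
}
```

The li element and its text survive even though the tags that surrounded them are gone, because each event is forwarded or dropped on its own.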

Filtering Elements

It's easy enough to drop out any elements that are not in the XHTML namespace. In the case of SVG, MathML, and most other applications, however, you'll want to remove the content of these elements as well. I'll assume that the namespace for text is the same as the namespace of the parent element. (This is not at all clear from the namespaces specification, but it makes sense in many cases.) To track the nearest namespace for nonelements, startElement() will push the element's namespace onto a stack, and endElement() will pop it off. Peeking at the top of the stack will tell you what namespace the nearest element uses. You can't use the NamespaceSupport class or the startPrefixMapping() and endPrefixMapping() methods here, because those track namespace declarations rather than the actual namespaces used. Example 8.9 is the ContentHandler that accomplishes this.

Example 8.9 A ContentHandler Filter That Throws Away Non-XHTML Elements
    import org.xml.sax.*;
    import java.util.*;

    public class PureXHTMLHandler implements ContentHandler {

      public final static String XHTML_NAMESPACE
       = "http://www.w3.org/1999/xhtml";

      private ContentHandler parent;
      private Stack namespaces;  // initialized in startDocument()

      public PureXHTMLHandler(ContentHandler parent) {
        this.parent = parent;
      }

      // Filter out anything that's not in the XHTML namespace
      public void startElement(String namespaceURI, String localName,
       String qualifiedName, Attributes atts) throws SAXException {

        namespaces.push(namespaceURI);
        if (namespaceURI.equals(XHTML_NAMESPACE)) {
          parent.startElement(namespaceURI, localName, qualifiedName, atts);
        }
        // else do nothing

      }

      // Filter out anything that's not in the XHTML namespace
      public void endElement(String namespaceURI, String localName,
       String qualifiedName) throws SAXException {

        namespaces.pop();
        if (namespaceURI.equals(XHTML_NAMESPACE)) {
          parent.endElement(namespaceURI, localName, qualifiedName);
        }
        // else do nothing

      }

      // simple utility method to determine whether or not the parent
      // element is an XHTML element
      private boolean inXHTML() {

        try {
          String namespace = (String) namespaces.peek();
          if (namespace.equals(XHTML_NAMESPACE)) return true;
          return false;
        }
        catch (EmptyStackException e) {
          // This means we're outside the root element in a
          // processing instruction. Such processing instructions are
          // legal in XHTML so I keep it.
          return true;
        }

      }

      public void characters(char[] text, int start, int length)
       throws SAXException {
        if (inXHTML()) parent.characters(text, start, length);
      }

      public void ignorableWhitespace(char[] text, int start,
       int length) throws SAXException {
        if (inXHTML()) parent.ignorableWhitespace(text, start, length);
      }

      public void processingInstruction(String target, String data)
       throws SAXException {
        if (inXHTML()) parent.processingInstruction(target, data);
      }

      public void skippedEntity(String name)
       throws SAXException {
        if (inXHTML()) parent.skippedEntity(name);
      }

      // I could track namespace declarations with these next two
      // methods and only pass along those for XHTML. However,
      // that is quite tricky because the endPrefixMapping()
      // method only reports the prefix, not the URI.
      public void startPrefixMapping(String prefix, String uri)
       throws SAXException {
        parent.startPrefixMapping(prefix, uri);
      }

      public void endPrefixMapping(String prefix)
       throws SAXException {
        parent.endPrefixMapping(prefix);
      }

      // Methods that pass data along unchanged:
      public void startDocument() throws SAXException {
        namespaces = new Stack();
        parent.startDocument();
      }

      public void endDocument() throws SAXException {
        parent.endDocument();
      }

      public void setDocumentLocator(Locator locator) {
        parent.setDocumentLocator(locator);
      }

    }

One thing to consider: What happens when a non-XHTML document such as a pure SVG or MathML document is passed through this filter? In other words, what if the root element is not an XHTML element? In this case, the entire document, except perhaps for a few comments and processing instructions that precede or follow the root element, is filtered out. The resulting document does not have a root element. It is thus not well-formed!

Believe it or not, there is no prohibition against this. XML documents must be well-formed, but the only normative definition of an XML document is as a sequence of characters. Here we no longer have a sequence of characters; instead, we have a sequence of method invocations. If those method invocations are coming straight from a parsed XML document, then there are certain guarantees of well-formedness, such as there being a single root element. However, if you're a little further removed from the actual XML document, these guarantees no longer hold. Indeed, filters can do considerably weirder things, such as passing along startElement() calls but blocking the corresponding endElement() calls, or passing non-white-space characters to the ignorableWhitespace() method. If your applications have reason to do things like this, go ahead; but document the filter's behavior exhaustively, and be very careful about the client applications that receive data from your filter. Filters can very easily violate a handler's implicit preconditions with predictably disastrous results.
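To make both behaviors concrete, here is a minimal sketch of the same namespace-stack technique. It is a condensed variant, not Example 8.9 itself: it assumes the standard XMLFilterImpl helper class and the JDK's JAXP parser, replaces java.util.Stack with an ArrayDeque, and handles only elements and text. PureXhtmlDemo, PureXhtmlFilter, and Recorder are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class PureXhtmlDemo {

  static final String XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

  // Forwards only XHTML elements, and only text whose nearest
  // enclosing element is in the XHTML namespace.
  static class PureXhtmlFilter extends XMLFilterImpl {
    private final Deque<String> namespaces = new ArrayDeque<>();

    PureXhtmlFilter(XMLReader parent) {
      super(parent);
    }

    private boolean inXhtml() {
      // An empty stack means we are outside the root element.
      return namespaces.isEmpty()
          || XHTML_NAMESPACE.equals(namespaces.peek());
    }

    @Override
    public void startDocument() throws SAXException {
      namespaces.clear();
      super.startDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
      namespaces.push(uri);
      if (XHTML_NAMESPACE.equals(uri)) {
        super.startElement(uri, localName, qName, atts);
      }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
      namespaces.pop();
      if (XHTML_NAMESPACE.equals(uri)) {
        super.endElement(uri, localName, qName);
      }
    }

    @Override
    public void characters(char[] text, int start, int length)
        throws SAXException {
      if (inXhtml()) super.characters(text, start, length);
    }
  }

  // Hypothetical client handler: records the events it receives as text.
  static class Recorder extends DefaultHandler {
    final StringBuilder log = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes atts) {
      log.append('<').append(localName).append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      log.append("</").append(localName).append('>');
    }

    @Override
    public void characters(char[] text, int start, int length) {
      log.append(text, start, length);
    }
  }

  // Parses the XML string through the filter and returns what the client saw.
  static String filter(String xml) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      PureXhtmlFilter filter =
          new PureXhtmlFilter(factory.newSAXParser().getXMLReader());
      Recorder recorder = new Recorder();
      filter.setContentHandler(recorder);
      filter.parse(new InputSource(new StringReader(xml)));
      return recorder.log.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String svg = "<svg xmlns='http://www.w3.org/2000/svg'>"
        + "<title>logo</title></svg>";
    String mixed = "<p xmlns='http://www.w3.org/1999/xhtml'>before"
        + svg + "after</p>";
    System.out.println(filter(mixed));          // prints <p>beforeafter</p>
    System.out.println(filter(svg).isEmpty());  // prints true
  }
}
```

The second call shows the root-less result discussed above: a pure SVG document produces no element events at all, so a client that expects a root element has to be prepared for that.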

Filtering Attributes

You can filter attributes in the startElement() method. Unlike element content, the full set of attributes for an element is available in a single method call as an Attributes object. Thus it's easy to read through the list and respond appropriately without building complicated data structures to maintain state between method invocations.

The only mildly tricky part is that the Attributes interface is read-only. It provides numerous getter methods but no corresponding setter methods. Thus if you want to do anything other than forward the original Attributes object along unchanged, you'll need to create your own object that implements the Attributes interface and is mutable. Although doing so would be a simple matter of programming, most of the time it's easier to use the class SAX provides for this purpose, org.xml.sax.helpers.AttributesImpl, summarized in Example 8.10. This class implements the Attributes interface and adds methods for copying existing Attributes objects, adding attributes to the list, and deleting attributes from it. This class is quite useful when writing filters that process attributes.

Example 8.10 The AttributesImpl Helper Class
    package org.xml.sax.helpers;

    public class AttributesImpl implements Attributes {

      public AttributesImpl();
      public AttributesImpl(Attributes atts);

      public int    getLength();
      public String getURI(int index);
      public String getLocalName(int index);
      public String getQName(int index);
      public String getType(int index);
      public String getValue(int index);
      public int    getIndex(String uri, String localName);
      public int    getIndex(String qualifiedName);
      public String getType(String uri, String localName);
      public String getType(String qualifiedName);
      public String getValue(String uri, String localName);
      public String getValue(String qualifiedName);

      public void clear();
      public void setAttributes(Attributes atts);
      public void addAttribute(String uri, String localName,
       String qualifiedName, String type, String value);
      public void setAttribute(int index, String uri,
       String localName, String qualifiedName, String type,
       String value);
      public void removeAttribute(int index);
      public void setURI(int index, String uri);
      public void setLocalName(int index, String localName);
      public void setQName(int index, String qualifiedName);
      public void setType(int index, String type);
      public void setValue(int index, String value);

    }

As an example, let's extend the pure XHTML filter of Example 8.9 to remove non-XHTML attributes such as xlink:href from XHTML elements. XHTML attributes are always in no namespace, so we can just drop out any attributes that are in a namespace. The only exceptions to this are the three xml: attributes: xml:lang, xml:space, and xml:base. These are in the http://www.w3.org/XML/1998/namespace namespace, conveniently represented in SAX by the named constant NamespaceSupport.XMLNS. The revised startElement() method would be as follows:

    // Requires imports of org.xml.sax.helpers.AttributesImpl
    // and org.xml.sax.helpers.NamespaceSupport
    public void startElement(String namespaceURI, String localName,
     String qualifiedName, Attributes atts) throws SAXException {

      namespaces.push(namespaceURI);
      if (namespaceURI.equals(XHTML_NAMESPACE)) {
        AttributesImpl newAttributes = new AttributesImpl();
        // copy only those attributes that are not in a namespace
        // and do not declare a non-XHTML namespace
        for (int i = 0; i < atts.getLength(); i++) {
          if (atts.getURI(i).equals("")
           || atts.getURI(i).equals(NamespaceSupport.XMLNS)) {
            if (!atts.getQName(i).startsWith("xmlns:")) {
              if (!atts.getQName(i).equals("xmlns")
               || atts.getValue(i).equals(XHTML_NAMESPACE)) {
                newAttributes.addAttribute(atts.getURI(i),
                                           atts.getLocalName(i),
                                           atts.getQName(i),
                                           atts.getType(i),
                                           atts.getValue(i));
              }
            }
          }
        }
        parent.startElement(namespaceURI, localName, qualifiedName,
         newAttributes);
      }
      // else do nothing

    }

The rest of the methods are the same as in Example 8.9, so I won't repeat them here.
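The attribute-copying loop can also be tried in isolation, without a parser, because AttributesImpl can be populated by hand. The following is a minimal sketch under that assumption; AttributeFilterDemo and keepUnqualified() are hypothetical names, and the sketch deliberately leaves out the xmlns checks shown in the method above.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.NamespaceSupport;

public class AttributeFilterDemo {

  // Copies only the attributes that are in no namespace or in the
  // xml: namespace; every other namespaced attribute is left out.
  static AttributesImpl keepUnqualified(Attributes atts) {
    AttributesImpl result = new AttributesImpl();
    for (int i = 0; i < atts.getLength(); i++) {
      String uri = atts.getURI(i);
      if (uri.isEmpty() || uri.equals(NamespaceSupport.XMLNS)) {
        result.addAttribute(uri, atts.getLocalName(i), atts.getQName(i),
            atts.getType(i), atts.getValue(i));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Build an attribute list by hand, as a parser would report it.
    AttributesImpl atts = new AttributesImpl();
    atts.addAttribute("", "href", "href", "CDATA",
        "http://www.w3.org/TR/xlink2rdf/");
    atts.addAttribute("http://www.w3.org/1999/xlink", "href", "xlink:href",
        "CDATA", "http://www.w3.org/TR/xlink2rdf/");
    atts.addAttribute(NamespaceSupport.XMLNS, "lang", "xml:lang",
        "NMTOKEN", "en");

    AttributesImpl kept = keepUnqualified(atts);
    for (int i = 0; i < kept.getLength(); i++) {
      System.out.println(kept.getQName(i) + "=" + kept.getValue(i));
    }
    // prints:
    // href=http://www.w3.org/TR/xlink2rdf/
    // xml:lang=en
  }
}
```

Building a fresh list and adding the attributes to keep is usually simpler than copying the whole list and calling removeAttribute(), since removing by index shifts the positions of the attributes that follow.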

Filters That Add Content

So far I've demonstrated filters that simply remove particular tags, elements, and attributes from the parsed document, but it's equally easy to develop filters that insert new content or replace existing content. For example, it's easy to imagine a filter that converts transitional XHTML to strict XHTML. Such a filter would have to replace deprecated elements such as <font size="-1"> with their strict equivalents such as <span style="font-size: small">. The details of listing all the tags that need to be fixed are a little tedious, but the algorithm for doing this is not particularly difficult.
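As a rough illustration of that algorithm, the sketch below handles exactly one case, the font element with size="-1" mentioned above, and nothing else. It assumes the standard XMLFilterImpl helper class and the JDK's JAXP parser, and it assumes XHTML is the default namespace so that the unprefixed qualified name span is appropriate. FontToSpanDemo, FontToSpan, and Recorder are hypothetical names; a real converter would need a much longer table of replacements.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FontToSpanDemo {

  static final String XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

  // Replaces every XHTML font element with a span element.
  static class FontToSpan extends XMLFilterImpl {

    FontToSpan(XMLReader parent) {
      super(parent);
    }

    private static boolean isFont(String uri, String localName) {
      return "font".equals(localName) && XHTML_NAMESPACE.equals(uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
      if (isFont(uri, localName)) {
        AttributesImpl spanAtts = new AttributesImpl();
        // Only the single mapping named in the text is handled here.
        if ("-1".equals(atts.getValue("", "size"))) {
          spanAtts.addAttribute("", "style", "style", "CDATA",
              "font-size: small");
        }
        super.startElement(XHTML_NAMESPACE, "span", "span", spanAtts);
      } else {
        super.startElement(uri, localName, qName, atts);
      }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
      if (isFont(uri, localName)) {
        super.endElement(XHTML_NAMESPACE, "span", "span");
      } else {
        super.endElement(uri, localName, qName);
      }
    }
  }

  // Hypothetical client handler: records elements, attributes, and text.
  static class Recorder extends DefaultHandler {
    final StringBuilder log = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes atts) {
      log.append('<').append(localName);
      for (int i = 0; i < atts.getLength(); i++) {
        if (atts.getQName(i).startsWith("xmlns")) continue;
        log.append(' ').append(atts.getQName(i)).append("=\"")
            .append(atts.getValue(i)).append('"');
      }
      log.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      log.append("</").append(localName).append('>');
    }

    @Override
    public void characters(char[] text, int start, int length) {
      log.append(text, start, length);
    }
  }

  // Parses the XML string through the filter and returns what the client saw.
  static String convert(String xml) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      FontToSpan filter =
          new FontToSpan(factory.newSAXParser().getXMLReader());
      Recorder recorder = new Recorder();
      filter.setContentHandler(recorder);
      filter.parse(new InputSource(new StringReader(xml)));
      return recorder.log.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(convert("<p xmlns='http://www.w3.org/1999/xhtml'>"
        + "<font size='-1'>fine print</font></p>"));
    // prints <p><span style="font-size: small">fine print</span></p>
  }
}
```

Because every font start-tag becomes a span start-tag regardless of its attributes, endElement() can make the same substitution without remembering anything about the matching start-tag.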

More simply, we can write a filter that modifies just one element. Example 8.11 demonstrates such a filter, replacing rddl:resource elements with a simple table. The various attributes of the resource are mapped to different parts of the table. In particular, you would start with an rddl:resource like this:

    <rddl:resource
            id="note-xlink2rdf"
            xlink:title="W3C NOTE XLink2RDF"
            xlink:role="http://www.w3.org/TR/html4/"
            xlink:arcrole="http://www.rddl.org/purposes#reference"
            xlink:href="http://www.w3.org/TR/xlink2rdf/"
            >
          <li>
            <a href="http://www.w3.org/TR/xlink2rdf/">
              W3C Note Harvesting RDF Statements from XLinks
            </a>
          </li>
    </rddl:resource>

The filter would turn it into an XHTML table that looks like this:

    <table id="note-xlink2rdf" border="1">
      <caption>W3C NOTE XLink2RDF</caption>
      <tr><td>Role: </td><td>http://www.w3.org/TR/html4/</td></tr>
      <tr><td>Arcrole: </td>
          <td>http://www.rddl.org/purposes#reference</td></tr>
      <tr><td>Href: </td><td><a href="http://www.w3.org/TR/xlink2rdf/">
       http://www.w3.org/TR/xlink2rdf/</a></td></tr>
      <tr>
        <td colspan="2">
          <li>
            <a href="http://www.w3.org/TR/xlink2rdf/">
              W3C Note Harvesting RDF Statements from XLinks
            </a>
          </li>
        </td>
      </tr>
    </table>

This means that a single <rddl:resource> start-tag must create several elements and tags in the output document. There's no guarantee of a one-to-one mapping between elements in the original and filtered content.

Example 8.11 Changing One Element into Another
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    public class ResourceToTable implements ContentHandler {

      public final static String RDDL_NAMESPACE
       = "http://www.rddl.org/";
      public final static String XHTML_NAMESPACE
       = "http://www.w3.org/1999/xhtml";
      public final static String XLINK_NAMESPACE
       = "http://www.w3.org/1999/xlink";

      private ContentHandler parent;

      public ResourceToTable(ContentHandler parent) {
        this.parent = parent;
      }

      // Replace <rddl:resource> start-tags with the beginning of
      // a table. Move the attributes into table cells.
      // Make the content of the element the last table row.
      public void startElement(String namespaceURI, String localName,
       String qualifiedName, Attributes atts) throws SAXException {

        if (localName.equals("resource")
            && RDDL_NAMESPACE.equals(namespaceURI)) {

          String hrefAtt    = atts.getValue(XLINK_NAMESPACE, "href");
          String roleAtt    = atts.getValue(XLINK_NAMESPACE, "role");
          String arcroleAtt
           = atts.getValue(XLINK_NAMESPACE, "arcrole");
          String titleAtt   = atts.getValue(XLINK_NAMESPACE, "title");

          // We need to copy the id, xml:lang, and xml:base
          // attributes, if any, from the rddl:resource element to
          // the XHTML table element
          AttributesImpl tableAttributes = new AttributesImpl();
          String id = atts.getValue("id");
          if (id != null) {
            tableAttributes.addAttribute("", "id", "id", "ID", id);
          }
          String lang = atts.getValue("xml:lang");
          if (lang != null) {
            tableAttributes.addAttribute(NamespaceSupport.XMLNS,
             "lang", "xml:lang", "NMTOKEN", lang);
          }
          String base = atts.getValue("xml:base");
          // xml:base is not legal in XHTML. Here I just drop it, but
          // it would be preferable to use it to resolve URLs in the
          // XHTML document before passing them along
          tableAttributes.addAttribute("", "border", "border",
           "NMTOKEN", "1");

          parent.startElement(
           XHTML_NAMESPACE, "table", "table", tableAttributes);
          Attributes noAtts = new AttributesImpl();
          if (titleAtt != null) {
            parent.startElement(
             XHTML_NAMESPACE, "caption", "caption", noAtts);
            characters(titleAtt.toCharArray(), 0, titleAtt.length());
            endElement(XHTML_NAMESPACE, "caption", "caption");
          }
          if (roleAtt != null) {
            parent.startElement(XHTML_NAMESPACE, "tr", "tr", noAtts);
            parent.startElement(XHTML_NAMESPACE, "td", "td", noAtts);
            String role = "Role: ";
            characters(role.toCharArray(), 0, role.length());
            endElement(XHTML_NAMESPACE, "td", "td");
            parent.startElement(XHTML_NAMESPACE, "td", "td", noAtts);
            characters(roleAtt.toCharArray(), 0, roleAtt.length());
            endElement(XHTML_NAMESPACE, "td", "td");
            endElement(XHTML_NAMESPACE, "tr", "tr");
          }
          if (arcroleAtt != null) {
            String arcrole = "Arcrole: ";
            parent.startElement(XHTML_NAMESPACE, "tr", "tr", noAtts);
            parent.startElement(XHTML_NAMESPACE, "td", "td", noAtts);
            characters(arcrole.toCharArray(), 0, arcrole.length());
            endElement(XHTML_NAMESPACE, "td", "td");
            parent.startElement(XHTML_NAMESPACE, "td", "td", noAtts);
            characters(
             arcroleAtt.toCharArray(), 0, arcroleAtt.length());
            endElement(XHTML_NAMESPACE, "td", "td");
            endElement(XHTML_NAMESPACE, "tr", "tr");
          }
          if (hrefAtt != null) {
            String href = "Href: ";
            AttributesImpl hrefAtts = new AttributesImpl();
            // The link target is the xlink:href value, not the label
            hrefAtts.addAttribute("", "href", "href", "CDATA", hrefAtt);
            parent.startElement(XHTML_NAMESPACE, "tr", "tr", noAtts);
            parent.startElement(XHTML_NAMESPACE, "td", "td", noAtts);
            characters(href.toCharArray(), 0, href.length());
            endElement(XHTML_NAMESPACE, "td", "td");
            parent.startElement(XHTML_NAMESPACE, "td", "td", noAtts);
            parent.startElement(XHTML_NAMESPACE, "a", "a", hrefAtts);
            characters(hrefAtt.toCharArray(), 0, hrefAtt.length());
            endElement(XHTML_NAMESPACE, "a", "a");
            endElement(XHTML_NAMESPACE, "td", "td");
            endElement(XHTML_NAMESPACE, "tr", "tr");
          }

          // Now open the contents of the element
          parent.startElement(XHTML_NAMESPACE, "tr", "tr", noAtts);
          AttributesImpl colspanAtts = new AttributesImpl();
          colspanAtts.addAttribute(
           "", "colspan", "colspan", "CDATA", "2");
          parent.startElement(
           XHTML_NAMESPACE, "td", "td", colspanAtts);
        }
        else { // pass the element along
          parent.startElement(namespaceURI, localName, qualifiedName, atts);
        }

      }

      // Replace </rddl:resource> end-tags with the end of a table.
      public void endElement(String namespaceURI, String localName,
       String qualifiedName) throws SAXException {

        if (localName.equals("resource")
            && RDDL_NAMESPACE.equals(namespaceURI)) {
          parent.endElement(XHTML_NAMESPACE, "td", "td");
          parent.endElement(XHTML_NAMESPACE, "tr", "tr");
          parent.endElement(XHTML_NAMESPACE, "table", "table");
        }
        else {
          parent.endElement(namespaceURI, localName, qualifiedName);
        }

      }

      // Methods that pass data along unchanged:
      public void startDocument() throws SAXException {
        parent.startDocument();
      }

      public void setDocumentLocator(Locator locator) {
        parent.setDocumentLocator(locator);
      }

      public void endDocument() throws SAXException {
        parent.endDocument();
      }

      public void characters(char[] text, int start, int length)
       throws SAXException {
        parent.characters(text, start, length);
      }

      public void ignorableWhitespace(char[] text, int start,
       int length) throws SAXException {
        parent.ignorableWhitespace(text, start, length);
      }

      public void processingInstruction(String target, String data)
       throws SAXException {
        parent.processingInstruction(target, data);
      }

      public void startPrefixMapping(String prefix, String uri)
       throws SAXException {
        parent.startPrefixMapping(prefix, uri);
      }

      public void endPrefixMapping(String prefix)
       throws SAXException {
        parent.endPrefixMapping(prefix);
      }

      public void skippedEntity(String name)
       throws SAXException {
        parent.skippedEntity(name);
      }

    }

Figure 8.5 shows the end of the RDDL specification before this filter is applied. Figure 8.6 shows the end of the RDDL specification after this filter is applied (by passing the filtered source through an XMLWriter and saving the result in a text file). Notice that the tables have been added as specified.

Figure 8.5. The End of the RDDL Specification as Normally Presented


Figure 8.6. The End of the RDDL Specification after Small Tables Have Replaced rddl:resource Elements


Filters versus Transforms

By this point, you may have noticed that all of these operations could perhaps have been implemented more easily in XSLT. Because XSLT is Turing complete, there really isn't a lot you can do with a SAX filter that you can't do with XSLT. The reason you choose one over the other is mostly a matter of convenience for the problem at hand. One factor in SAX's favor is that it is much more efficient when the input or output can be streamed. Unlike XSLT, SAX does not need to read the entire document into memory before performing the transformation. The flip side of this is that filters work well only for localized transformations, in which the output for one part of the document doesn't depend heavily on the input in another part of the document, in particular a part that comes later. For transformations that are not localized in this way, XSLT is preferable.

The second big advantage to using SAX filters instead of XSLT is that SAX gives you easy access to the full power of Java. For example, a SAX filter can read a book's ISBN and look up the current price of that book at amazon.com to insert as the value of a price attribute it adds to the element. Pure XSLT can't do this. Of course, you could write an extension function in Java to add this feature to XSLT, so it really is primarily a question of which approach feels more natural to the developers within their particular system and for their particular problem.
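A filter of this kind might be sketched as follows. Everything specific here is an assumption made for illustration: the book element and its isbn attribute, the PriceAdder class, and the lookup function, which stands in for whatever Java code would actually fetch a price. The sketch calls no real pricing service and the value it inserts is a placeholder, not data. It relies on the standard XMLFilterImpl helper class and on the AttributesImpl copy constructor.

```java
import java.io.StringReader;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class PriceFilterDemo {

  // Adds a price attribute to each (hypothetical) book element that
  // has an isbn attribute, using arbitrary Java code for the lookup.
  static class PriceAdder extends XMLFilterImpl {
    private final Function<String, String> priceLookup;

    PriceAdder(XMLReader parent, Function<String, String> priceLookup) {
      super(parent);
      this.priceLookup = priceLookup;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
      String isbn = atts.getValue("", "isbn");
      if ("book".equals(localName) && isbn != null) {
        String price = priceLookup.apply(isbn);
        if (price != null) {
          // Attributes is read-only, so copy it before adding to it.
          AttributesImpl copy = new AttributesImpl(atts);
          copy.addAttribute("", "price", "price", "CDATA", price);
          atts = copy;
        }
      }
      super.startElement(uri, localName, qName, atts);
    }
  }

  // Hypothetical client handler: records elements and their attributes.
  static class Recorder extends DefaultHandler {
    final StringBuilder log = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes atts) {
      log.append('<').append(localName);
      for (int i = 0; i < atts.getLength(); i++) {
        if (atts.getQName(i).startsWith("xmlns")) continue;
        log.append(' ').append(atts.getQName(i)).append("=\"")
            .append(atts.getValue(i)).append('"');
      }
      log.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      log.append("</").append(localName).append('>');
    }
  }

  // Parses the XML string through the filter and returns what the client saw.
  static String addPrices(String xml, Function<String, String> lookup) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      PriceAdder filter =
          new PriceAdder(factory.newSAXParser().getXMLReader(), lookup);
      Recorder recorder = new Recorder();
      filter.setContentHandler(recorder);
      filter.parse(new InputSource(new StringReader(xml)));
      return recorder.log.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Placeholder lookup; a real one might query a database or a web service.
    Function<String, String> lookup = isbn -> "PRICE-FOR-" + isbn;
    System.out.println(addPrices("<book isbn='0201771861'/>", lookup));
    // prints <book isbn="0201771861" price="PRICE-FOR-0201771861"></book>
  }
}
```

Passing the lookup in as a function keeps the filter itself independent of where the price comes from, which also makes it easy to test without network access.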



Processing XML with Java. A Guide to SAX, DOM, JDOM, JAXP, and TrAX
ISBN: 0201771861
Year: 2001
Pages: 191
