Receiving Elements | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

In a very real sense, SAX reports tags, not elements. When the parser encounters a start-tag, it calls the startElement() method. When the parser encounters an end-tag, it calls the endElement() method. When the parser encounters an empty-element tag, it calls the startElement() method and then the endElement() method.
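To make the tag-by-tag reporting concrete, here is a minimal sketch that records each callback as it arrives. The TagLogger class and its log() helper are names I've made up for illustration, and I'm assuming the JAXP parser bundled with the JDK rather than any particular third-party parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records one entry per element callback
class TagLogger extends DefaultHandler {

  final List<String> events = new ArrayList<>();

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {
    events.add("start " + qualifiedName);
  }

  @Override
  public void endElement(String namespaceURI, String localName,
   String qualifiedName) {
    events.add("end " + qualifiedName);
  }

  // Parses an XML string and returns the callbacks in the order received
  static List<String> log(String xml) throws Exception {
    XMLReader parser
     = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    TagLogger handler = new TagLogger();
    parser.setContentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.events;
  }

  public static void main(String[] args) throws Exception {
    // <br/> is an empty-element tag; <p> has separate start- and end-tags
    for (String event : log("<p>Hello<br/></p>")) {
      System.out.println(event);
    }
  }
}
```

The empty-element tag produces a start callback immediately followed by an end callback, so the handler cannot tell <br/> apart from <br></br>.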

Caution

Parsers and ContentHandlers are not thread safe or reentrant. Whereas it's straightforward to design a SAX program that operates on multiple documents in series, it is almost impossible to make a single parser and handler operate on multiple documents in parallel. If you need to perform XML parsing in multiple, simultaneous threads, give each thread its own XMLReader and ContentHandler objects. Similarly, if you want to parse another document from inside one of the ContentHandler methods, create a new XMLReader and a new ContentHandler object to parse it with. Do not try to reuse the existing XMLReader and ContentHandler before they've finished with the current document.
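One way to follow this advice is sketched below, under a few assumptions: the ParallelCounter and ElementCounter classes are hypothetical, the pool size of four is arbitrary, and the parser is whatever JAXP supplies by default. The point is only that every task creates its own XMLReader and its own handler, so no parsing state is shared between threads.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: counts elements in several documents at once
class ParallelCounter {

  // Per-document state lives in the handler, so handlers are never shared
  static class ElementCounter extends DefaultHandler {
    int count;

    @Override
    public void startDocument() {
      count = 0;
    }

    @Override
    public void startElement(String namespaceURI, String localName,
     String qualifiedName, Attributes atts) {
      count++;
    }
  }

  // Creates a fresh parser and a fresh handler on every call
  static int countElements(String xml) throws Exception {
    XMLReader parser
     = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    ElementCounter handler = new ElementCounter();
    parser.setContentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.count;
  }

  // Parses the documents on a small thread pool, one task per document
  static List<Integer> countAll(List<String> documents) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (String document : documents) {
        Callable<Integer> task = () -> countElements(document);
        futures.add(pool.submit(task));
      }
      List<Integer> results = new ArrayList<>();
      for (Future<Integer> future : futures) {
        results.add(future.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```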

If an end-tag does not match its corresponding start-tag, then the parser throws a SAXParseException. Beyond that, however, you are responsible for tracking the hierarchy. For example, if you want to treat a params element inside a methodCall element differently from a params element inside a fault element, then you'll need to store some form of state between calls to the startElement() and endElement() methods. This is actually quite common. Many SAX content handlers simply build up a data structure as the document is parsed, and then operate on that data structure once the document has been read completely. Provided that the data structure is simpler than the XML document itself, this is a reasonable approach. However, in the most general case you can find yourself inventing a complete object hierarchy to represent arbitrary XML documents. In this case, you're better off using DOM or JDOM instead of SAX, because they'll do the hard work of defining and building this object hierarchy for you.
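The params example might be handled along the following lines. This is only a sketch: the ParamsContext class, its classify() helper, and the strings it records are all invented for illustration, and the state kept between calls is nothing more than a stack of the names of the elements that are currently open.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: treats params differently depending on its parent
class ParamsContext extends DefaultHandler {

  // Names of the elements whose start-tag has been seen but not the end-tag
  private final Deque<String> open = new ArrayDeque<>();
  final List<String> seen = new ArrayList<>();

  @Override
  public void startDocument() {
    open.clear();
    seen.clear();
  }

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {
    if (qualifiedName.equals("params")) {
      String parent = open.peek(); // null if params is the root element
      if ("methodCall".equals(parent)) seen.add("params in methodCall");
      else if ("fault".equals(parent)) seen.add("params in fault");
      else seen.add("params elsewhere");
    }
    open.push(qualifiedName);
  }

  @Override
  public void endElement(String namespaceURI, String localName,
   String qualifiedName) {
    open.pop();
  }

  // Parses an XML string and reports the context of each params element
  static List<String> classify(String xml) throws Exception {
    XMLReader parser
     = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    ParamsContext handler = new ParamsContext();
    parser.setContentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.seen;
  }
}
```

A real handler would do something more useful than record a string, but the shape is the same: push in startElement(), pop in endElement(), and peek whenever the parent matters.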

The arguments to the startElement() and endElement() methods are similar:

 public void startElement(String namespaceURI, String localName,
  String qualifiedName, Attributes atts) throws SAXException

 public void endElement(String namespaceURI, String localName,
  String qualifiedName) throws SAXException

The sequence is as follows:

The namespace URI is passed as a String. If the element is unqualified (that is, if it is not in a namespace), then this argument is the empty string, not null.
The local name is passed as a String. This is the part of the name after the prefix and the colon, if any. For example, whether an element is named SOAP-ENV:Body or Body with no prefix, its local name is Body.
The third argument contains the qualified name as a String. This is the entire element name including the prefix and the colon, if any. For example, if an element is named SOAP-ENV:Body, then its qualified name is SOAP-ENV:Body. However, if an element is named Body with no prefix, then its qualified name is just Body.
Finally, in the startElement() method only, the set of attributes for that element is passed as a SAX-specific Attributes object. I'll discuss this in the next section.
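The three name arguments are easiest to compare side by side. The sketch below uses a hypothetical NameReporter class and assumes the JDK's bundled JAXP parser with namespace processing switched on; a parser configured differently may leave some of these arguments empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records the three names passed for each element
class NameReporter extends DefaultHandler {

  final List<String> lines = new ArrayList<>();

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {
    lines.add("uri=[" + namespaceURI + "] local=[" + localName
     + "] qName=[" + qualifiedName + "]");
  }

  // Parses an XML string with a namespace-aware parser
  static List<String> report(String xml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true); // JAXP factories default to false
    XMLReader parser = factory.newSAXParser().getXMLReader();
    NameReporter handler = new NameReporter();
    parser.setContentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.lines;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<SOAP-ENV:Envelope"
     + " xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'>"
     + "<SOAP-ENV:Body/><Body/></SOAP-ENV:Envelope>";
    for (String line : report(xml)) {
      System.out.println(line);
    }
  }
}
```

Both Body elements report the same local name. Only the namespace URI and the qualified name distinguish the prefixed one from the unprefixed one, and the unprefixed one reports an empty string for its URI.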

As an example, I'm going to build a GUI representation of the tree structure of an XML document that allows you to collapse and expand the individual elements. The GUI parts will be provided by a javax.swing.JTree. The tree will be filled in startElement() and displayed in a window in endDocument(). Example 6.7 shows how.

Example 6.7 A ContentHandler Class That Builds a GUI Representation of an XML Document

 import org.xml.sax.*;
 import org.xml.sax.helpers.*;
 import javax.swing.*;
 import javax.swing.tree.*;
 import java.util.*;

 public class TreeViewer extends DefaultHandler {

   private Stack nodes;

   // Initialize the per-document data structures
   public void startDocument() throws SAXException {
     // The stack needs to be reinitialized for each document
     // because an exception might have interrupted parsing of a
     // previous document, leaving a nonempty stack.
     nodes = new Stack();
   }

   // Make sure we always have the root element
   private TreeNode root;

   // Initialize the per-element data structures
   public void startElement(String namespaceURI, String localName,
    String qualifiedName, Attributes atts) {

     String data;
     if (namespaceURI.equals("")) data = localName;
     else {
       data = '{' + namespaceURI + "}" + qualifiedName;
     }
     MutableTreeNode node = new DefaultMutableTreeNode(data);
     try {
       // The element on top of the stack is this element's parent
       MutableTreeNode parent = (MutableTreeNode) nodes.peek();
       parent.insert(node, parent.getChildCount());
     }
     catch (EmptyStackException e) {
       // An empty stack means this is the root element
       root = node;
     }
     nodes.push(node);

   }

   public void endElement(String namespaceURI, String localName,
    String qualifiedName) {
     nodes.pop();
   }

   // Flush and commit the per-document data structures
   public void endDocument() {

     JTree tree           = new JTree(root);
     JScrollPane treeView = new JScrollPane(tree);
     JFrame f             = new JFrame("XML Tree");
     f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
     f.getContentPane().add(treeView);
     f.pack();
     f.setVisible(true);

   }

   public static void main(String[] args) {

     try {
       XMLReader parser = XMLReaderFactory.createXMLReader(
         "org.apache.xerces.parsers.SAXParser"
       );
       ContentHandler handler = new TreeViewer();
       parser.setContentHandler(handler);
       for (int i = 0; i < args.length; i++) {
         parser.parse(args[i]);
       }
     }
     catch (Exception e) {
       System.err.println(e);
     }

   }  // end main()

 }  // end TreeViewer

The JTree class provides a ready-made data structure for this program. We just have to fill it. In doing so, we need to track where we are in the XML hierarchy at all times so that the parent to which the current node will be added is accessible. For this purpose a stack is very helpful. Each element is pushed onto the stack in startElement() and popped off the stack in endElement(). Because SAX's beginning-to-end parsing of an XML document equates to a depth-first tree traversal, the top of the stack always holds the most recently opened element that has not yet been closed. That element is the parent of whichever element the parser reports next.

I find stacks like this to be very useful in many SAX programs. More complex programs may need to build more complicated tree or object structures. If your purpose is not simply to display a GUI for the tree, then you should probably roll your own tree structure rather than using JTree as I've done here.
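If you do roll your own tree, the handler barely changes. The following sketch swaps the Swing nodes for a made-up Node class holding a name and a list of children. OutlineBuilder, Node, build(), and outline() are all hypothetical names, and I've used the JDK's default JAXP parser so that the example runs without Xerces in the class path.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: builds a plain tree of element names
class OutlineBuilder extends DefaultHandler {

  // A deliberately tiny tree node: a name plus an ordered list of children
  static class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    Node(String name) { this.name = name; }
  }

  private Deque<Node> nodes;
  private Node root;

  @Override
  public void startDocument() {
    // Reinitialize per-document state in case an earlier parse failed
    nodes = new ArrayDeque<>();
    root = null;
  }

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {
    Node node = new Node(qualifiedName);
    Node parent = nodes.peek(); // null only for the root element
    if (parent == null) root = node;
    else parent.children.add(node);
    nodes.push(node);
  }

  @Override
  public void endElement(String namespaceURI, String localName,
   String qualifiedName) {
    nodes.pop();
  }

  // Parses an XML string and returns the root of the resulting tree
  static Node build(String xml) throws Exception {
    XMLReader parser
     = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    OutlineBuilder handler = new OutlineBuilder();
    parser.setContentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.root;
  }

  // Renders the tree as an outline, indenting two spaces per level
  static String outline(Node node, String indent) {
    StringBuilder result = new StringBuilder();
    result.append(indent).append(node.name).append('\n');
    for (Node child : node.children) {
      result.append(outline(child, indent + "  "));
    }
    return result.toString();
  }
}
```

Calling OutlineBuilder.outline(OutlineBuilder.build(xml), "") yields one line per element, indented by depth. Anything else you need to carry along, such as attributes or text, can be added as fields on Node.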

Note

TreeViewer runs with the default distribution of Java 1.2 and later. It can run with Java 1.1, but you'll need to make sure the swingall.jar archive is somewhere in your class path. The javax.swing classes used here are not bundled with the JDK 1.1.

Figure 6.1 shows this program displaying Example 1.7 from Chapter 1. Swing allows individual parts of the tree to be collapsed or expanded, but the entire element tree is always present even if it's hidden. JTree also allows you to customize the icons used, and even enable the user to edit the tree. But that's purely Swing programming and says little to nothing about XML, so I'll leave that as an exercise for the reader.

Figure 6.1. The Swing-Based TreeViewer


Caution

This makes a nice little example, but please don't regard it as more than that. The tantalizing ease of representing XML documents with widgets like javax.swing.JTree and similar features in Windows, Motif, and other GUIs has spawned a lot of editors and browsers that use these tree models as user interfaces. However, not a lot of thought went into whether users actually thought of XML documents this way or could be quickly trained to do so.

In actual practice, user interfaces of this sort have failed spectacularly. A good user interface for XML editors and viewers more closely resembles the user interfaces people are accustomed to from traditional programs, such as Microsoft Word, Netscape Navigator, and Adobe Illustrator. The whole point of a GUI is that it can decouple the user interface from the underlying data model. Just because an XML document is a tree is no excuse for making users edit trees when they don't want to.