Navigating JDOM Trees | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

Once you've parsed a document and formed a Document object, you'll probably want to search it to select the parts of it your program is interested in. In JDOM, most navigation takes place through the methods of the Element class. The complete children of each Element are available as a java.util.List returned by the getContent() method. Only the child elements of each Element are available as a java.util.List returned by the getChildren() method. ^[2]

^[2] Yes, the terminology is a little confusing here. This is a case in which JDOM is marching out of step with the rest of the XML world. JDOM uses the word children to refer only to child elements.

Because JDOM uses the Java Collections API to manage the tree, it is simultaneously too polymorphic (everything's an object and must be cast to the right type before you can use it) and not polymorphic enough (there's no useful generic interface or superclass for navigation, such as DOM's Node class). Consequently, you're going to find yourself doing numerous tests with instanceof and casting to the determined type. This is far and away my least-favorite part of JDOM's design. Furthermore, there's no standard traversal API as there is in DOM to help you avoid reinventing the wheel every time you need to walk a tree or iterate a document. There is a Filter interface that can simplify some of the polymorphism and casting issues a little, but it still won't let you walk more than one level down the tree at a time.
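As a brief aside, here is a rough sketch of what the Filter interface offers. It assumes the JDOM 1.x org.jdom.filter package (ElementFilter and ContentFilter) and the getContent(Filter) method of Element; the class name FilterSketch, its helper methods, and the sample tree are invented purely for illustration. It selects only the element children, or only the comment children, of a single parent without any instanceof tests, and it also shows that the nested emphasis element is never reached.

```java
import org.jdom.Comment;
import org.jdom.Element;
import org.jdom.filter.ContentFilter;
import org.jdom.filter.ElementFilter;
import org.jdom.filter.Filter;

import java.util.List;

// Hypothetical demo class; assumes JDOM 1.x (org.jdom) on the classpath.
class FilterSketch {

  // Builds a small in-memory tree so that no file needs to be parsed.
  static Element sampleTree() {
    Element root = new Element("chapter");
    root.addContent(new Comment("draft"));
    root.addContent(new Element("title").addContent("Navigating"));
    root.addContent("stray text");
    Element para = new Element("para");
    para.addContent(new Element("emphasis"));
    root.addContent(para);
    return root;
  }

  // Counts the immediate children of parent that the filter accepts.
  static int countMatching(Element parent, Filter filter) {
    List matches = parent.getContent(filter);
    return matches.size();
  }

  public static void main(String[] args) {
    Element root = sampleTree();
    // Direct element children only: title and para. The nested
    // emphasis element is one level further down and is not counted.
    System.out.println(countMatching(root, new ElementFilter()));
    // Direct comment children only.
    System.out.println(
        countMatching(root, new ContentFilter(ContentFilter.COMMENT)));
  }
}
```

The filter removes the type test from the calling code, but reaching emphasis would still require recursing into para by hand.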

Let's begin with Example 14.9, a simple program that reads a document and prints the names of the elements in that document, nicely indented to show the hierarchy. Pay special attention to the listChildren() method. This recursive method is the key to the whole program.

Example 14.9 A JDOM Program That Lists the Elements Used in a Document

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.io.IOException;
import java.util.*;


public class ElementLister {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java ElementLister URL");
      return;
    }

    SAXBuilder builder = new SAXBuilder();

    try {
      Document doc = builder.build(args[0]);
      Element root = doc.getRootElement();
      listChildren(root, 0);
    }
    // indicates a well-formedness error
    catch (JDOMException e) {
      System.out.println(args[0] + " is not well-formed.");
      System.out.println(e.getMessage());
    }
    catch (IOException e) {
      System.out.println(e);
    }

  }

  public static void listChildren(Element current, int depth) {

    printSpaces(depth);
    System.out.println(current.getName());
    List children = current.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Element child = (Element) iterator.next();
      listChildren(child, depth+1);
    }

  }

  private static void printSpaces(int n) {

    for (int i = 0; i < n; i++) {
      System.out.print(' ');
    }

  }

}

The main() method simply parses a document and passes its root element to the listChildren() method along with a depth of zero. The listChildren() method indents a number of spaces equal to the depth in the hierarchy. Then it prints the name of the current element. Next it asks for a list of the children of that element by invoking getChildren(). This returns a java.util.List from the Java Collections API. This list is live. That is, any changes you make to it will be reflected in the original Element. However, this program does not take advantage of that. Instead, it retrieves a java.util.Iterator object using the iterator() method. Then it iterates through the list. Because each item in the list is known to be a JDOM Element object, each item returned by next() can be safely cast to Element and passed recursively to listChildren(). Other than the knowledge that each object in the list is an Element, every step is just standard list iteration from the Java Collections API. Internally, JDOM actually uses a special package-private implementation of List, org.jdom.ContentList, but you don't need to know this. Everything you need to do can be accomplished through the documented java.util.List interface.

Following is the beginning of the output when this program is run across this chapter's source code:

% java ElementLister file://D/books/XMLJava/jdom.xml
chapter
 title
 caution
  para
 para
 itemizedlist
  listitem
   para
  listitem
   para
  listitem
   para
  listitem
   para
 para
 para
 caution
  para
 sect1
  title
  para
  blockquote
  ...
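ElementLister only reads the list returned by getChildren(), but the liveness of that list is easy to see in a small sketch. The following assumes JDOM 1.x (the org.jdom package used throughout this chapter); the class name LiveListSketch, the method editThroughList(), and the element names are made up for illustration. It edits the list directly and then inspects the parent element to confirm that the parent changed too.

```java
import org.jdom.Element;

import java.util.List;

// Hypothetical demo class; assumes JDOM 1.x (org.jdom) on the classpath.
class LiveListSketch {

  // Builds <list><a/><b/></list>, edits the list returned by
  // getChildren(), and returns the parent so callers can inspect it.
  static Element editThroughList() {
    Element parent = new Element("list");
    parent.addContent(new Element("a"));
    parent.addContent(new Element("b"));

    List children = parent.getChildren();
    children.remove(0);              // detaches <a/> from parent
    children.add(new Element("c"));  // attaches <c/> to parent
    return parent;
  }

  public static void main(String[] args) {
    Element parent = editThroughList();
    // Ask the parent again; the edits made through the list are visible.
    List children = parent.getChildren();
    for (int i = 0; i < children.size(); i++) {
      Element child = (Element) children.get(i);
      System.out.println(child.getName());
    }
  }
}
```

No call to a method of Element was needed to remove a or attach c; ordinary java.util.List operations did the work.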

The getChildren() method only returns elements. It misses everything else completely. For instance, it doesn't report comments, processing instructions, or text nodes. To get this material, you need to use the getContent() method, which returns everything. However, this makes life a little trickier because you can no longer assume that everything in the list returned is an Element. You'll probably need to use a big tree of if (o instanceof Element) {... } else if (o instanceof Text) {... } tests in order to choose the processing to perform on each member of the list. Example 14.10 demonstrates this with a simple program that recursively lists all of the nodes used in the document. Elements are identified by their names. All other items are identified just by their types.

Example 14.10 A JDOM Program That Lists the Nodes Used in a Document

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.io.IOException;
import java.util.*;


public class NodeLister {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java NodeLister URL");
      return;
    }

    SAXBuilder builder = new SAXBuilder();

    try {
      Document doc = builder.build(args[0]);
      listNodes(doc, 0);
    }
    // indicates a well-formedness error
    catch (JDOMException e) {
      System.out.println(args[0] + " is not well-formed.");
      System.out.println(e.getMessage());
    }
    catch (IOException e) {
      System.out.println(e);
    }

  }

  public static void listNodes(Object o, int depth) {

    printSpaces(depth);

    if (o instanceof Element) {
      Element element = (Element) o;
      System.out.println("Element: " + element.getName());
      List children = element.getContent();
      Iterator iterator = children.iterator();
      while (iterator.hasNext()) {
        Object child = iterator.next();
        listNodes(child, depth+1);
      }
    }
    else if (o instanceof Document) {
      System.out.println("Document");
      Document doc = (Document) o;
      List children = doc.getContent();
      Iterator iterator = children.iterator();
      while (iterator.hasNext()) {
        Object child = iterator.next();
        listNodes(child, depth+1);
      }
    }
    else if (o instanceof Comment) {
      System.out.println("Comment");
    }
    else if (o instanceof CDATA) {
      System.out.println("CDATA section");
      // CDATA is a subclass of Text so this test must come
      // before the test for Text.
    }
    else if (o instanceof Text) {
      System.out.println("Text");
    }
    else if (o instanceof EntityRef) {
      System.out.println("Entity reference");
    }
    else if (o instanceof ProcessingInstruction) {
      System.out.println("Processing Instruction");
    }
    else { // This really shouldn't happen
      System.out.println("Unexpected type: " + o.getClass());
    }

  }

  private static void printSpaces(int n) {

    for (int i = 0; i < n; i++) {
      System.out.print(' ');
    }

  }

}

Following is the beginning of the output when this program is run across this chapter's source code:

% java NodeLister file://D/books/XMLJava/jdom.xml
Document
 Element: chapter
  Text
  Element: title
   Text
  Text
  Element: caution
   Text
   Element: para
    Text
   Text
  Text
  Element: para
  ...

The only pieces that are missing here are the attributes and namespaces associated with each element. These are not included by either getContent() or getChildren(). If you want them, you have to ask for them explicitly using the getAttributes(), getNamespace(), getAdditionalNamespaces(), and related methods of the Element class.
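A minimal sketch of asking for those pieces explicitly might look like the following. It assumes the JDOM 1.x methods named above, together with Attribute.getQualifiedName(), Attribute.getValue(), Namespace.getPrefix(), and Namespace.getURI(); the class AttributeSketch, its describe() and sampleElement() methods, and the sample element itself are hypothetical and exist only for this illustration.

```java
import org.jdom.Attribute;
import org.jdom.Element;
import org.jdom.Namespace;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; assumes JDOM 1.x (org.jdom) on the classpath.
class AttributeSketch {

  // Collects one line of text per namespace and attribute of the element.
  static List<String> describe(Element element) {
    List<String> lines = new ArrayList<String>();

    // The namespace the element itself is in
    Namespace ns = element.getNamespace();
    lines.add("Namespace: " + ns.getPrefix() + " -> " + ns.getURI());

    // Namespaces declared on the element but not used by its own name
    Iterator extras = element.getAdditionalNamespaces().iterator();
    while (extras.hasNext()) {
      Namespace extra = (Namespace) extras.next();
      lines.add("Additional namespace: " + extra.getPrefix()
          + " -> " + extra.getURI());
    }

    // Attributes, which never appear in getContent() or getChildren()
    Iterator attributes = element.getAttributes().iterator();
    while (attributes.hasNext()) {
      Attribute attribute = (Attribute) attributes.next();
      lines.add("Attribute: " + attribute.getQualifiedName()
          + "=" + attribute.getValue());
    }
    return lines;
  }

  // Builds a sample element in memory; names and URIs are illustrative.
  static Element sampleElement() {
    Element rect = new Element("rect", "svg", "http://www.w3.org/2000/svg");
    rect.setAttribute("width", "10");
    rect.addNamespaceDeclaration(
        Namespace.getNamespace("xlink", "http://www.w3.org/1999/xlink"));
    return rect;
  }

  public static void main(String[] args) {
    for (String line : describe(sampleElement())) {
      System.out.println(line);
    }
  }
}
```

A method like describe() could be called from the Element branch of listNodes() in Example 14.10 if the attributes and namespaces should appear in the listing as well.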

In the next chapter, we'll look more closely at the classes of objects that appear when you're navigating a JDOM tree (Element, Text, Comment, and so on) and what you can learn from each one.