Section 24.4. DOM | Learning Java

24.4. DOM

In the last section, we used SAX to parse an XML document and build a Java object model representing it. In that case, we created specific Java types for each of our complex elements. If we were planning to use our model extensively in an application, this technique would give us a great deal of flexibility. But often it is sufficient (and much easier) to use a "generic" model that simply represents the content of the XML in a neutral form. The Document Object Model (DOM) is just that. The DOM API parses an XML document into a full, memory-resident representation consisting of types such as Element and Attr that hold their own values.

As we saw in our zoo example, once you have an object model, using the data is a breeze. So a generic DOM would seem like an appealing solution, especially when working mainly with text. The only catch in this case is that DOM didn't evolve first as a Java API, and it doesn't map well to Java. DOM is very complete and provides access to every facet of the original XML document, but it's so generic (and language-neutral) that it's cumbersome to use in Java. In our example, we'll start by making a couple of helper methods to smooth things over. Later, we'll also mention a native Java alternative to DOM called JDOM that is more pleasant to use.

24.4.1. The DOM API

The core DOM classes belong to the org.w3c.dom package. The result of parsing an XML document with DOM is a Document object from this package (see Figure 24-1). The Document is a factory and a container for a hierarchical collection of Node objects, representing the document structure. A node has a parent and may have children, which can be traversed using its getChildNodes( ), getFirstChild( ), or getLastChild( ) methods. A node may also have "attributes" associated with it, which consist of a named map of nodes.
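To make the parent/child relationships concrete, here is a minimal sketch that walks the direct children of a root element. The sample XML string and the WalkChildren class name are assumptions made for illustration (they are not the zoo inventory file), and the loop uses getNextSibling( ), a standard Node method that the text above does not mention.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WalkChildren {
    // Hypothetical sample document, used only for this illustration
    static final String XML =
        "<Inventory><Animal>Tiger</Animal><Animal>Panda</Animal></Inventory>";

    // Parse XML text held in a String into a DOM Document
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Collect the node names of a node's direct children, comma separated
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node child = parent.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            if (sb.length() > 0) sb.append(",");
            sb.append(child.getNodeName());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Node root = parse(XML).getDocumentElement();
        System.out.println(root.getNodeName() + " children: " + childNames(root));
        // Every child can navigate back up to its parent
        System.out.println("Parent of last child: "
            + root.getLastChild().getParentNode().getNodeName());
    }
}
```

Note that in a document with indentation between tags, the whitespace shows up as extra Text children, so a real traversal usually has to filter by node type.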

Figure 24-1. The parsed DOM

Subtypes of Node (Element, Text, and Attr) represent elements, text, and attributes in XML. Some types of nodes have a text "value." For example, the value of a Text node is the text it holds within its parent element. The same is true of an attribute, CDATA, or comment node. An Element node, by contrast, has no value of its own; its text lives in its Text children. The value of a node can be accessed by the getNodeValue( ) and setNodeValue( ) methods.
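The following sketch shows what getNodeValue( ) returns for a few node types. The XML snippet, the NodeValues class, and the describe( ) helper are hypothetical names introduced only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeValues {
    // Hypothetical sample: one element with an attribute, text, and a comment
    static final String XML =
        "<Animal class=\"mammal\">Tiger<!--big cat--></Animal>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Render a node as name=value
    static String describe(Node n) {
        return n.getNodeName() + "=" + n.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Element root = parse(XML).getDocumentElement();
        System.out.println(describe(root));                           // element: value is null
        System.out.println(describe(root.getAttributeNode("class"))); // attribute node
        System.out.println(describe(root.getFirstChild()));           // text node
        System.out.println(describe(root.getLastChild()));            // comment node

        // setNodeValue() changes the text held by the Text node
        root.getFirstChild().setNodeValue("Lion");
        System.out.println(describe(root.getFirstChild()));
    }
}
```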

The Element node provides "random" access to the elements beneath it through its getElementsByTagName( ) method, which returns a NodeList (a simple collection type). Note that this method matches all descendant elements with the given name, not just direct children. You can also fetch an attribute by name from the Element using the getAttribute( ) method.
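Here is a small sketch of both methods in use. The sample document, its "class" attribute, and the FindByTag class are assumptions for illustration only. The nested Name element is included to show that getElementsByTagName( ) searches the whole subtree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FindByTag {
    // Hypothetical sample data; the second Name is nested inside FoodRecipe
    static final String XML =
        "<Inventory>"
      + "<Animal class=\"mammal\"><Name>Tiger</Name>"
      + "<FoodRecipe><Name>Cat Chow</Name></FoodRecipe></Animal>"
      + "<Animal class=\"reptile\"><Name>Iguana</Name></Animal>"
      + "</Inventory>";

    static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element inventory = root(XML);

        NodeList animals = inventory.getElementsByTagName("Animal");
        for (int i = 0; i < animals.getLength(); i++) {
            Element animal = (Element) animals.item(i);
            // getAttribute() returns the attribute value as a String
            System.out.println("Animal class: " + animal.getAttribute("class"));
        }

        // Matches every Name element under the root, at any depth
        NodeList names = inventory.getElementsByTagName("Name");
        System.out.println("Name elements found: " + names.getLength());
    }
}
```

For an attribute that is not present, getAttribute( ) returns an empty string rather than null, so a missing attribute and an empty one look the same through this method.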

The javax.xml.parsers package contains a factory for DOM parsers, just as it does for SAX parsers. An instance of DocumentBuilderFactory can be used to create a DocumentBuilder object to parse the file and produce a Document result.
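The factory chain can be sketched in a few lines. This version parses from an in-memory byte stream so that it runs without any file on disk; the ParseWithFactory class and the sample XML are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseWithFactory {
    // Factory -> builder -> Document, reading the XML from a byte stream
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document
        Document doc = parse("<Inventory><Animal/></Inventory>");
        System.out.println("Root element: "
            + doc.getDocumentElement().getTagName());
        System.out.println("Child count: "
            + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

DocumentBuilder also has parse( ) overloads that accept a File, an InputSource, or a URI string, which is the form used with a filename in the example that follows.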

24.4.2. Test-Driving DOM

Let's use DOM to parse our zoo inventory and print the same information as our model-builder example. Using DOM saves us from having to create all those model classes and makes our example much shorter. But before we even begin, we're going to make a couple of utility methods to save us a great deal of pain. The following class, DOMUtil, covers two very common operations on an element: retrieving a simple (singular) child element by name and retrieving the text of a simple child element by name. Here is the code:

    import org.w3c.dom.*;

    public class DOMUtil
    {
       // Return the first element named 'name' beneath 'element'
       // (getElementsByTagName() searches all descendants)
       public static Element getFirstElement( Element element, String name ) {
          NodeList nl = element.getElementsByTagName( name );
          if ( nl.getLength() < 1 )
             throw new RuntimeException(
                "Element: "+element+" does not contain: "+name);
          return (Element)nl.item(0);
       }

       // Return the text of the first element named 'name' beneath 'node'
       public static String getSimpleElementText( Element node, String name )
       {
          Element namedElement = getFirstElement( node, name );
          return getSimpleElementText( namedElement );
       }

       // Concatenate the values of all Text children of 'node'
       public static String getSimpleElementText( Element node )
       {
          StringBuilder sb = new StringBuilder();
          NodeList children = node.getChildNodes();
          for(int i=0; i<children.getLength(); i++) {
             Node child = children.item(i);
             if ( child instanceof Text )
                sb.append( child.getNodeValue() );
          }
          return sb.toString();
       }
    }

With that out of the way, we can present our TestDOM class:

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    public class TestDOM
    {
       public static void main( String [] args ) throws Exception
       {
          DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
          DocumentBuilder parser = factory.newDocumentBuilder();
          Document document = parser.parse( "zooinventory.xml" );

          Element inventory = document.getDocumentElement();
          NodeList animals = inventory.getElementsByTagName("Animal");

          System.out.println("Animals = ");
          for( int i=0; i<animals.getLength(); i++ ) {
             String name = DOMUtil.getSimpleElementText(
                (Element)animals.item(i), "Name" );
             String species = DOMUtil.getSimpleElementText(
                (Element)animals.item(i), "Species" );
             System.out.println( "  "+ name +" ("+species+")" );
          }

          Element foodRecipe = DOMUtil.getFirstElement(
             (Element)animals.item(1), "FoodRecipe" );
          String name = DOMUtil.getSimpleElementText( foodRecipe, "Name" );
          System.out.println("Recipe = " + name );

          NodeList ingredients = foodRecipe.getElementsByTagName("Ingredient");
          for(int i=0; i<ingredients.getLength(); i++)
             System.out.println( "  " + DOMUtil.getSimpleElementText(
                (Element)ingredients.item(i) ) );
       }
    }

TestDOM creates an instance of a DocumentBuilder and uses it to parse our zooinventory.xml file. We use the Document getDocumentElement( ) method to get the root element of the document, from which we will begin our traversal. From there, we ask for all the Animal child nodes. The getElementsByTagName( ) method returns a NodeList object, which we then use to iterate through our creatures. For each animal, we use our DOMUtil.getSimpleElementText( ) method to retrieve the basic name and species information. Next, we use the DOMUtil.getFirstElement( ) method to retrieve the element called FoodRecipe from the second animal. We use it to fetch a NodeList for the tags matching Ingredient and print them as before. The output should contain the same information as our SAX-based example. But as you can see, the tradeoff in not having to create our own model classes is that we have to suffer through the use of the generic model and produce code that is harder to read.

24.4.3. Generating XML with DOM

Thus far, we've used the SAX and DOM APIs to parse XML. But what about generating XML? Sure, it's easy to generate trivial XML documents simply by emitting the appropriate strings. But if we plan to create a complex document on the fly, we might want some help with all those quotes and closing tags. We may also want to validate our model against an XML DTD or Schema before writing it out. What we can do is build a DOM representation of our object in memory and then transform it to text. This is also useful if we want to read a document and then make some alterations to it. To do this, we'll make use of the javax.xml.transform package. This package does a lot more than just print XML. As its name implies, it's part of a general transformation facility. It includes support for the XSL/XSLT languages for generating one XML document from another. (We'll talk about XSL later in this chapter.)

We won't discuss the details of constructing a DOM in memory here, but it follows fairly naturally from what you've learned about traversing the tree in our previous example. The following example, PrintDOM, simply parses our zooinventory.xml file to a DOM and then prints it back to the screen:

    import javax.xml.parsers.*;
    import org.w3c.dom.*;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;

    public class PrintDOM {
       public static void main( String [] args ) throws Exception
       {
          DocumentBuilder parser =
             DocumentBuilderFactory.newInstance().newDocumentBuilder();
          Document document = parser.parse( "zooinventory.xml" );

          Transformer transformer =
             TransformerFactory.newInstance().newTransformer();
          Source source = new DOMSource( document );
          Result output = new StreamResult( System.out );
          transformer.transform( source, output );
       }
    }

Note that the imports are almost as long as the entire program! Here, we are using an instance of a Transformer object in its simplest capacity to copy from a source to an output. We'll return to the Transformer later when we discuss XSL, at which point it will be doing a lot more work for us.
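For completeness, here is a rough sketch of the other half of the story: building a small DOM in memory with the Document factory methods and then serializing it with the same Transformer technique. The element names, the "class" attribute, and the BuildDOM class are hypothetical choices for illustration; the output is written to a StringWriter instead of System.out, and the XML declaration is suppressed only to keep the result short.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BuildDOM {
    // Build a tiny document: Inventory > Animal(class) > Name > text
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();

        // The Document acts as the factory for all of its nodes
        Element inventory = doc.createElement("Inventory");
        doc.appendChild(inventory);

        Element animal = doc.createElement("Animal");
        animal.setAttribute("class", "mammal");
        inventory.appendChild(animal);

        Element name = doc.createElement("Name");
        name.appendChild(doc.createTextNode("Tiger"));
        animal.appendChild(name);
        return doc;
    }

    // Serialize a DOM to a String using an identity Transformer
    static String toXml(Document doc) throws Exception {
        Transformer transformer =
            TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(build()));
    }
}
```

The Transformer takes care of quoting attribute values and closing tags, which is the help with "all those quotes and closing tags" that hand-built strings lack.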

24.4.4. JDOM

As we promised earlier, we'll now describe an easier DOM API: JDOM, created by Jason Hunter and Brett McLaughlin, two fellow O'Reilly authors (Java Servlet Programming and Java and XML, respectively). It is a more natural Java DOM that uses real Java collection types such as List for its hierarchy and provides more streamlined methods for building documents. You can get the latest JDOM from http://www.jdom.org/. Here's the JDOM version of our standard "test" program:

    import org.jdom.*;
    import org.jdom.input.*;
    import org.jdom.output.*;
    import java.util.*;

    public class TestJDOM {
       public static void main( String[] args ) throws Exception {
          Document doc = new SAXBuilder().build("zooinventory.xml");
          List animals = doc.getRootElement().getChildren("Animal");

          System.out.println("Animals = ");
          for( int i=0; i<animals.size(); i++ ) {
             String name = ((Element)animals.get(i)).getChildText("Name");
             String species = ((Element)animals.get(i)).getChildText("Species");
             System.out.println( "  "+ name +" ("+species+")" );
          }

          Element foodRecipe = ((Element)animals.get(1)).getChild("FoodRecipe");
          String name = foodRecipe.getChildText("Name");
          System.out.println("Recipe = " + name );

          List ingredients = foodRecipe.getChildren("Ingredient");
          for(int i=0; i<ingredients.size(); i++)
             System.out.println( "  "+((Element)ingredients.get(i)).getText() );
       }
    }

JDOM has its own convenience methods that take the place of our homemade DOM helper methods. Namely, the JDOM Element class has getChild( ) and getChildren( ) methods as well as a getChildText( ) method for retrieving node text.

Now that we've covered the basics of SAX and DOM, we're going to look at a new API that, in a sense, straddles the two. XPath allows us to target only the parts of a document that we want and gives us the option of getting at those components in DOM form.