Manipulating XML Documents Using the Document Object Model (DOM)

	Java™ 2 Primer Plus By Steven Haines, Steve Potts

	Chapter 25. XML

The Document Object Model (DOM) presents a tree representation of an XML document through a standards-based API maintained by the World Wide Web Consortium (W3C). The DOM is better suited than SAX for applications that read information from an XML file, manipulate the XML data, and then eventually write the data out to a destination. It also offers random access to the document: the program does not have to read through the document sequentially, as it does with SAX.

The DOM does have a serious limitation: it must read the entire XML document into memory. This isn't a problem for the small example we've been working through in this chapter, but in some real-world applications, such as B2B e-commerce, XML files can get considerably larger (in B2B e-commerce applications that mapped EDI to XML, we saw XML files that were over 100MB in size). If your application's architecture can make use of the DOM, however, you do not need all the code that we wrote to build an in-memory representation of an XML file using SAX.

Building an XML Tree in Memory

Building a DOM XML tree in memory is a fairly trivial task. Similar to the SAX model, the DOM model has a factory that creates parsers: javax.xml.parsers.DocumentBuilderFactory. This class enables the programmer to define the features that the parser should support, such as validation and namespace awareness (see the javadoc documentation for more information). A DocumentBuilderFactory can be obtained by calling its static newInstance() method. After the DocumentBuilderFactory is configured, its newDocumentBuilder() method returns an instance of the javax.xml.parsers.DocumentBuilder class; this is the DOM parser. The DocumentBuilder class has several parse() methods that construct a DOM document, shown in Table 25.3.

Table 25.3. `DocumentBuilder parse()` Methods

| Method | Description |
| --- | --- |
| `Document parse(java.io.File f)` | Parse the content of the given file as an XML document and return a new DOM Document object. |
| `Document parse(org.xml.sax.InputSource is)` | Parse the content of the given input source as an XML document and return a new DOM Document object. |
| `Document parse(java.io.InputStream is)` | Parse the content of the given `InputStream` as an XML document and return a new DOM Document object. |
| `Document parse(java.io.InputStream is, String systemId)` | Parse the content of the given `InputStream` as an XML document and return a new DOM Document object; `systemId` provides a base for resolving relative URIs. |
| `Document parse(String uri)` | Parse the content of the given URI as an XML document and return a new DOM Document object. |

Like their SAX counterparts, these parse() methods accept a variety of input sources; each one builds an org.w3c.dom.Document object. For example, consider building a DOM document from a file:

    File f = new File( "myfile.xml" );
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse( f );
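The factory configuration step described above can be sketched as follows. This is only an illustration under stated assumptions, not one of the chapter's listings: the class name `ParseFromString`, the helper `parse`, and the inline XML string are hypothetical, and the two factory settings are just examples of what can be configured. It uses the `parse(InputSource)` overload from Table 25.3 so that no file is needed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseFromString {
    // Hypothetical helper: parses XML held in a String instead of a file.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);    // ask for a namespace-aware parser
            factory.setIgnoringComments(true);  // leave comment nodes out of the tree
            DocumentBuilder builder = factory.newDocumentBuilder();
            // The parse(InputSource) overload reads from any character stream
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<books><!-- sample --><book category=\"fiction\"/></books>");
        System.out.println("root=" + doc.getDocumentElement().getTagName());
        System.out.println("books=" + doc.getElementsByTagName("book").getLength());
    }
}
```

Wrapping the checked exceptions in an unchecked one keeps the sketch short; a real application would probably report parse errors more carefully.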

Reading from the XML Tree

A DOM tree is an in-memory representation of an XML file and, as its name suggests, takes the form of a tree. Figure 25.2 shows how the book.xml file might appear in a DOM.

Figure 25.2. A sample DOM tree diagram.


Observe from Figure 25.2 that the <books> root element contains two <book> elements. Each <book> element has an attribute node named category, and has three elements: <author>, <title>, and <price>. Each of these elements has a child text node that contains the element's value.

After a Document object is obtained from the DocumentBuilder, its root node can be accessed through the Document's getDocumentElement() method; this method returns an instance of a class implementing the org.w3c.dom.Element interface. Table 25.4 describes some of the more useful retrieval methods of the Document interface.

Table 25.4. Document Retrieval Methods

| Method | Description |
| --- | --- |
| `DocumentType getDoctype()` | The Document Type Declaration (see `org.w3c.dom.DocumentType`) associated with this document. |
| `Element getDocumentElement()` | A convenience attribute that allows direct access to the child node that is the root element of the document. |
| `NodeList getElementsByTagName(String tagname)` | Returns a `NodeList` of all the Elements with a given tag name in the order in which they are encountered in a preorder traversal of the Document tree. |
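As a rough illustration of the retrieval methods in Table 25.4, the sketch below collects the text of every `<title>` element with `getElementsByTagName()`. The class name `TagLookup`, the helpers `parse` and `titles`, and the inline XML (a cut-down stand-in for book.xml) are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TagLookup {
    // Inline stand-in for book.xml (an assumption for this sketch)
    static final String XML =
        "<books>"
        + "<book category=\"fiction\"><title>Left Behind</title></book>"
        + "<book category=\"Computer Programming\"><title>Java 2 From Scratch</title></book>"
        + "</books>";

    // Hypothetical helper: parse XML from a String
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Collects the text of every <title> element, in document order
    static List<String> titles(Document doc) {
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // Each <title> keeps its value in a child text node
            result.add(nodes.item(i).getFirstChild().getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("root=" + doc.getDocumentElement().getTagName());
        System.out.println("titles=" + titles(doc));
    }
}
```

Note that `getFirstChild()` returns null for an empty element, so this shortcut is only safe when every `<title>` is known to contain text.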

The org.w3c.dom.Element interface defines a set of methods for traversing the DOM tree. Table 25.5 shows some of the more useful Element retrieval methods.

Table 25.5. Element Retrieval Methods

| Method | Description |
| --- | --- |
| `String getAttribute(String name)` | Retrieves an attribute value by name |
| `Attr getAttributeNode(String name)` | Retrieves an attribute node by name |
| `NodeList getElementsByTagName(String name)` | Returns a `NodeList` of all descendant Elements with a given tag name in the order in which they are encountered in a preorder traversal of this Element tree |
| `String getTagName()` | The name of the element |

The Element interface is derived from the org.w3c.dom.Node interface, which offers the additional methods shown in Table 25.6.

Table 25.6. Node Retrieval Methods

| Method | Description |
| --- | --- |
| `NamedNodeMap getAttributes()` | A `NamedNodeMap` containing the attributes of this node (if it is an Element) or null otherwise |
| `NodeList getChildNodes()` | A `NodeList` that contains all children of this node |
| `Node getFirstChild()` | The first child of this node |
| `Node getLastChild()` | The last child of this node |
| `String getNodeName()` | The name of this node, depending on its type |
| `short getNodeType()` | A code representing the type of the underlying object |
| `String getNodeValue()` | The value of this node, depending on its type |
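The Node methods in Table 25.6 are enough to walk an entire tree without knowing its shape in advance. The following is a minimal sketch of such a walk; the class name `NodeTypeWalker` and its helpers `parse`, `describe`, and `walk` are hypothetical names chosen for this example, and only a few node types are labeled.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeTypeWalker {
    // Hypothetical helper: parse XML from a String
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Turns the numeric node type into a short label
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                return "element " + node.getNodeName();
            case Node.TEXT_NODE:
                return "text \"" + node.getNodeValue().trim() + "\"";
            case Node.COMMENT_NODE:
                return "comment";
            default:
                return "other (type " + node.getNodeType() + ")";
        }
    }

    // Depth-first walk: one line per node, indented two spaces per level
    static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(describe(node)).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<books><book category=\"fiction\"><title>Left Behind</title></book></books>");
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        System.out.print(out);
    }
}
```

Attributes are not children of an element, so a walk like this never visits them; they would have to be read separately through `getAttributes()`.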

Together, the Element methods and the Node methods offer the capability to discover a node's type (element node, text node, attribute node, and so on; see the javadoc for the org.w3c.dom.Node interface), retrieve a list of child nodes, retrieve a list of attributes, retrieve the name of the node, and retrieve its value. Listing 25.7 shows a sample application that reads the same book.xml file into a DOM object and traverses it using the methods just discussed.

Listing 25.7 `DOMSample.java`

    001:import javax.xml.parsers.DocumentBuilder;
    002:import javax.xml.parsers.DocumentBuilderFactory;
    003:import javax.xml.parsers.FactoryConfigurationError;
    004:import javax.xml.parsers.ParserConfigurationException;
    005:
    006:import org.xml.sax.SAXException;
    007:import org.xml.sax.SAXParseException;
    008:
    009:import java.io.FileInputStream;
    010:import java.io.File;
    011:import java.io.IOException;
    012:
    013:import org.w3c.dom.Document;
    014:import org.w3c.dom.Element;
    015:import org.w3c.dom.NodeList;
    016:import org.w3c.dom.Node;
    017:import org.w3c.dom.DOMException;
    018:
    019:public class DOMSample
    020:{
    021:    public static void main( String[] args )
    022:    {
    023:        try
    024:        {
    025:            File file = new File( "book.xml" );
    026:            if( !file.exists() )
    027:            {
    028:                System.out.println( "Couldn't find file..." );
    029:                return;
    030:            }
    031:
    032:            // Parse the document
    033:            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    034:            DocumentBuilder builder = factory.newDocumentBuilder();
    035:            Document document = builder.parse( file );
    036:
    037:            // Walk the document
    038:            Element root = document.getDocumentElement();
    039:            System.out.println( "root=" + root.getTagName() );
    040:
    041:            // List the children of <books>; a set of <book> elements
    042:            NodeList list = root.getChildNodes();
    043:            for( int i=0; i<list.getLength(); i++ )
    044:            {
    045:                Node node = list.item( i );
    046:                if( node.getNodeType() == Node.ELEMENT_NODE )
    047:                {
    048:                    // Found a <book> element
    049:                    System.out.println( "Handling node: " + node.getNodeName() );
    050:                    Element element = ( Element )node;
    051:                    System.out.println( "\tCategory Attribute: " + element.getAttribute( "category" ) );
    052:
    053:                    // Get its children: <author>, <title>, <price>
    054:                    NodeList childList = element.getChildNodes();
    055:                    for( int j=0; j<childList.getLength(); j++ )
    056:                    {
    057:                        // Once we have one of these nodes we need to find its
    058:                        // text element
    059:                        Node childNode = childList.item( j );
    060:                        if( childNode.getNodeType() == Node.ELEMENT_NODE )
    061:                        {
    062:                            NodeList childNodeList = childNode.getChildNodes();
    063:                            for( int k=0; k<childNodeList.getLength(); k++ )
    064:                            {
    065:                                Node innerChildNode = childNodeList.item( k );
    066:                                System.out.println( "\t\tNode=" + innerChildNode.getNodeValue() );
    067:                            }
    068:                        }
    069:                    }
    070:                }
    071:            }
    072:        } catch( Exception e )
    073:        {
    074:            e.printStackTrace();
    075:        }
    076:    }
    077:}

Lines 25-30 obtain a reference to the book.xml file and verify that it exists.

    033:            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    034:            DocumentBuilder builder = factory.newDocumentBuilder();
    035:            Document document = builder.parse( file );

Lines 33-35 use the DocumentBuilderFactory to obtain a DocumentBuilder, and then have the DocumentBuilder parse the XML file. The return value is an instance of a class implementing the org.w3c.dom.Document interface.

 038:            Element root = document.getDocumentElement();

Line 38 gets the root element of the document as an org.w3c.dom.Element; this is the <books> element.

 042:            NodeList list = root.getChildNodes();

Line 42 gets all the child nodes of the <books> element; the list contains the <book> elements as well as a set of whitespace-only text nodes (which we ignore). The result is an instance of a class implementing the org.w3c.dom.NodeList interface. This interface defines two methods, as shown in Table 25.7.

Table 25.7. `NodeList` Methods

| Method | Description |
| --- | --- |
| `int getLength()` | The number of nodes in the list |
| `Node item(int index)` | Returns the node at the given zero-based index in the collection |

Line 46 takes the node from the child-node list and determines whether it is an element node (in this case, a <book> node). If it is, lines 50-51 extract the category attribute.

Lines 53-69 iterate over the <book> node's children: <author>, <title>, and <price>. Remember that these nodes do not themselves hold the values of the aforementioned tags; instead, they contain text nodes that hold the values, which are extracted in lines 59-67.

As you experiment with the DOM, you will notice that every node that has children also has a set of text nodes containing only white space. The reason is that whenever the elements in the document are separated by white space (line breaks or indentation), the DOM builds text nodes to hold it. The example avoids printing these blank text nodes because it was written with knowledge of the document's structure: it checks the node type and descends only into element nodes. The results of this sample application should resemble the following:

    root=books
    Handling node: book
            Category Attribute: fiction
                    Node=Left Behind
                    Node=Tim Lahaye
                    Node=14.95
    Handling node: book
            Category Attribute: Computer Programming
                    Node=Java 2 From Scratch
                    Node=Steven Haines
                    Node=39.95
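The whitespace-only text nodes discussed above can also be filtered out explicitly rather than by relying on knowledge of the document. The sketch below is one possible way to do that, not the approach used in Listing 25.7; the class name `BlankTextFilter` and the helpers `parse`, `isBlankText`, and `significantChildren` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class BlankTextFilter {
    // Hypothetical helper: parse XML from a String
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // True for text nodes that hold nothing but white space
    static boolean isBlankText(Node node) {
        return node.getNodeType() == Node.TEXT_NODE
                && node.getNodeValue().trim().isEmpty();
    }

    // Returns the children of parent, skipping whitespace-only text nodes
    static List<Node> significantChildren(Node parent) {
        List<Node> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (!isBlankText(child)) {
                result.add(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse("<books>\n  <book/>\n  <book/>\n</books>");
        Element root = doc.getDocumentElement();
        System.out.println("all children: " + root.getChildNodes().getLength());
        System.out.println("significant children: " + significantChildren(root).size());
    }
}
```

In this small document the root has five child nodes (three of them blank text nodes), and only the two `<book>` elements survive the filter.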

Outputting the XML Tree

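After all the work involved in parsing through the DOM tree, outputting the DOM tree to a stream is very simple. The implementation of the Element interface in the JAXP parser used for this chapter's examples overrides the toString() method to display the tree in XML form. This behavior is specific to that parser implementation and is not part of the DOM specification, so other parsers may print something different. With such an implementation, to display the entire DOM tree to the screen: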

 System.out.println( root );
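Because the result of toString() depends on the parser implementation, a more portable way to write a DOM tree is the identity transformation from the javax.xml.transform package. The following is a minimal sketch of that alternative, which is swapped in here and is not the technique the chapter itself uses; the class name `DomSerializer` and the helpers `parse` and `toXml` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomSerializer {
    // Hypothetical helper: parse XML from a String
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Writes the subtree rooted at node as XML text
    static String toXml(Node node) {
        try {
            // newTransformer() with no stylesheet copies the source unchanged
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<books><book category=\"fiction\"><title>Left Behind</title></book></books>");
        System.out.println(toXml(doc.getDocumentElement()));
    }
}
```

To write to the console or a file directly, the `StreamResult` can wrap `System.out` or a `java.io.File` instead of a `StringWriter`.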

Manipulating the XML Tree

After a DOM tree is constructed in memory, you might want to modify the tree by adding nodes, changing values, or deleting nodes.

Adding an Element to the XML Tree

When adding nodes to a DOM tree, recall the internal structure of the DOM tree (refer to Figure 25.2). An element in the DOM tree is a node composed of other nodes, including attribute nodes, text nodes, and other element nodes. Therefore, to add a new <book> to the <books> root element node, you must create <author>, <title>, and <price> nodes, give each a text node holding its value, and add them to a new <book> node.

The Document interface offers some helpful methods that create new nodes; see Table 25.8.

Table 25.8. Document Node Creation Methods

| Method | Description |
| --- | --- |
| `Attr createAttribute(String name)` | Creates an `Attr` of the given name |
| `CDATASection createCDATASection(String data)` | Creates a `CDATASection` node whose value is the specified string |
| `Comment createComment(String data)` | Creates a Comment node given the specified string |
| `DocumentFragment createDocumentFragment()` | Creates an empty `DocumentFragment` object |
| `Element createElement(String tagName)` | Creates an element of the type specified |
| `EntityReference createEntityReference(String name)` | Creates an `EntityReference` object |
| `ProcessingInstruction createProcessingInstruction(String target, String data)` | Creates a `ProcessingInstruction` node given the specified name and data strings |
| `Text createTextNode(String data)` | Creates a Text node given the specified string |

The Element and Node interfaces offer additional help in document creation through the methods described in Table 25.9.

Table 25.9. Element and Node Document Creation Methods

| Method | Description |
| --- | --- |
| `Node appendChild(Node newChild)` | Adds the node `newChild` to the end of the list of children of this node |
| `void setAttribute(String name, String value)` | Adds a new attribute |
| `Attr setAttributeNode(Attr newAttr)` | Adds a new attribute node |

Listing 25.8 shows a sample application that adds a new book to the DOM, and then displays the new document to the standard output.

Listing 25.8 `DOMSample2.java`

    import java.io.File;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Text;

    public class DOMSample2
    {
        public static void main( String[] args )
        {
            try
            {
                File file = new File( "book.xml" );
                if( !file.exists() )
                {
                    System.out.println( "Couldn't find file..." );
                    return;
                }

                // Parse the document
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                DocumentBuilder builder = factory.newDocumentBuilder();
                Document document = builder.parse( file );

                // Get the root of the document
                Element root = document.getDocumentElement();

                // Build a new book
                Element newAuthor = document.createElement( "author" );
                Text authorText = document.createTextNode( "Tim Lahaye" );
                newAuthor.appendChild( authorText );
                Element newTitle = document.createElement( "title" );
                Text titleText = document.createTextNode( "Desecration" );
                newTitle.appendChild( titleText );
                Element newPrice = document.createElement( "price" );
                Text priceText = document.createTextNode( "19.95" );
                newPrice.appendChild( priceText );
                Element newBook = document.createElement( "book" );
                newBook.setAttribute( "category", "fiction" );
                newBook.appendChild( newAuthor );
                newBook.appendChild( newTitle );
                newBook.appendChild( newPrice );

                // Add the book to the root
                root.appendChild( newBook );

                // Display the document
                System.out.println( root );
            } catch( Exception e )
            {
                e.printStackTrace();
            }
        }
    }

The output from Listing 25.8 should appear similar to the following:

    <books>
        <book category="fiction">
            <title>Left Behind</title>
            <author>Tim Lahaye</author>
            <price>14.95</price>
        </book>
        <book category="Computer Programming">
            <title>Java 2 From Scratch</title>
            <author>Steven Haines</author>
            <price>39.95</price>
        </book>
    <book category="fiction"><author>Tim Lahaye</author><title>Desecration</title><price>19.95</price></book></books>

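Notice that the XML content is correct, but why is the added node formatted so poorly? Remember all those aforementioned whitespace-only text nodes? They hold the line breaks and indentation of the original file. The new <book> was built without any such nodes, so it is written out on a single line.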
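The difference between a parsed element and a programmatically built one can be made visible by counting child nodes. The sketch below does that under a few assumptions: the class name `WhitespaceNodes`, its helpers, and the inline one-book XML string are hypothetical stand-ins, and the parser is assumed to be in its default configuration (no DTD, whitespace not ignored).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WhitespaceNodes {
    // Indented stand-in for book.xml (an assumption for this sketch)
    static final String XML =
        "<books>\n"
        + "  <book category=\"fiction\">\n"
        + "    <title>Left Behind</title>\n"
        + "    <author>Tim Lahaye</author>\n"
        + "    <price>14.95</price>\n"
        + "  </book>\n"
        + "</books>";

    // Hypothetical helper: parse XML from a String
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Creates <name>text</name> and appends it to parent
    static void appendTextElement(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);
    }

    // Builds a <book> with no whitespace text nodes between its children
    static Element buildBook(Document doc) {
        Element book = doc.createElement("book");
        book.setAttribute("category", "fiction");
        appendTextElement(doc, book, "author", "Tim Lahaye");
        appendTextElement(doc, book, "title", "Desecration");
        appendTextElement(doc, book, "price", "19.95");
        return book;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element parsedBook = (Element) doc.getElementsByTagName("book").item(0);
        Element builtBook = buildBook(doc);
        // The parsed book carries a blank text node around each child element
        System.out.println("parsed children: " + parsedBook.getChildNodes().getLength());
        System.out.println("built children:  " + builtBook.getChildNodes().getLength());
    }
}
```

The parsed `<book>` reports seven children (three elements plus four whitespace text nodes), while the built one reports only three, which is why it serializes without line breaks.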

Removing an Element from the XML Tree

Removing an element from the DOM tree is a simple operation:

1. Obtain a reference to the node to delete.
2. Call the Node interface's removeChild( Node node ) method on the parent node, passing it the child to delete.

For example:

    // Get document root into variable books
    // Find the book we are looking for: oldBook
    books.removeChild( oldBook );
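The two steps above can be sketched as a complete program. This is only one way to locate the node to delete (by matching its title text); the class name `RemoveBook`, the helpers `parse` and `removeBookByTitle`, and the inline XML are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RemoveBook {
    // Inline stand-in for book.xml (an assumption for this sketch)
    static final String XML =
        "<books>"
        + "<book category=\"fiction\"><title>Left Behind</title></book>"
        + "<book category=\"Computer Programming\"><title>Java 2 From Scratch</title></book>"
        + "</books>";

    // Hypothetical helper: parse XML from a String
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Removes the first <book> whose <title> text equals the given title.
    // Returns true if a book was removed.
    static boolean removeBookByTitle(Document doc, String title) {
        NodeList titles = doc.getElementsByTagName("title");
        for (int i = 0; i < titles.getLength(); i++) {
            Node titleNode = titles.item(i);
            Node text = titleNode.getFirstChild();
            if (text != null && title.equals(text.getNodeValue())) {
                Node book = titleNode.getParentNode();
                // removeChild() is called on the parent of the node to delete
                book.getParentNode().removeChild(book);
                // Return at once: the NodeList is live and has just changed
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("removed: " + removeBookByTitle(doc, "Left Behind"));
        System.out.println("books left: " + doc.getElementsByTagName("book").getLength());
    }
}
```

Returning immediately after the removal sidesteps the fact that a `NodeList` from `getElementsByTagName()` reflects changes to the tree as they happen.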

Modifying an Element in the XML Tree

Modifying an element involves retrieving the element and calling one of the Element or Node document modification methods; see Table 25.10.

Table 25.10. Element and Node Document Modification Methods

| Method | Description |
| --- | --- |
| `void removeAttribute(String name)` | Removes an attribute by name |
| `Attr removeAttributeNode(Attr oldAttr)` | Removes the specified attribute node |
| `void setAttribute(String name, String value)` | Adds a new attribute, or replaces the value of an existing attribute with that name |
| `Attr setAttributeNode(Attr newAttr)` | Adds a new attribute node, replacing any existing attribute node with the same name |
| `void setNodeValue(String nodeValue)` | Sets the value of a node |

The Element interface offers methods to remove and set attributes, and the Node interface offers a method that sets or replaces the value of a node.
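A short sketch of these modification methods follows. It changes a book's category attribute and replaces its price by setting the value of the `<price>` element's text node; the class name `ModifyBook`, the helpers `parse` and `setPrice`, the inline XML, and the new values are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ModifyBook {
    // Inline stand-in for book.xml (an assumption for this sketch)
    static final String XML =
        "<books><book category=\"fiction\">"
        + "<title>Left Behind</title><price>14.95</price>"
        + "</book></books>";

    // Hypothetical helper: parse XML from a String
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Replaces the text of the book's first <price> element
    static void setPrice(Element book, String newPrice) {
        Node price = book.getElementsByTagName("price").item(0);
        // The value lives in the child text node, not in the element itself
        price.getFirstChild().setNodeValue(newPrice);
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element book = (Element) doc.getElementsByTagName("book").item(0);

        book.setAttribute("category", "thriller");  // replaces the existing value
        setPrice(book, "12.50");

        System.out.println("category=" + book.getAttribute("category"));
        System.out.println("price="
                + book.getElementsByTagName("price").item(0).getFirstChild().getNodeValue());
    }
}
```

Calling `setNodeValue()` on the `<price>` element itself would have no effect, because an element node's value is defined to be null; the text node is what carries the data.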
