The Document Object Model (DOM) presents a tree representation of an XML document through a standards-based API maintained by the World Wide Web Consortium (W3C). The DOM is better suited to applications that read information from an XML file, manipulate the XML data, and then eventually write the data out to a destination. It also offers an advantage over SAX: the document can be accessed randomly, so the program does not have to read through the document sequentially. The DOM does have a serious limitation, however: it must read the entire XML document into memory. This isn't a problem for the small example we've been working through in this chapter, but in some real-world applications, such as B2B e-commerce, XML files can get considerably larger (in B2B e-commerce applications that mapped EDI to XML, we saw XML files over 100MB in size). If your application's architecture can accommodate the DOM, however, you avoid all the code we needed to build an in-memory representation of an XML file using SAX.

Building an XML Tree in Memory

Building a DOM XML tree in memory is actually a fairly trivial task. Similar to the SAX model, the DOM model has a factory that creates parsers: javax.xml.parsers.DocumentBuilderFactory. This class lets the programmer configure the parsers it creates, for example whether they are namespace aware or validating (see the javadoc documentation for more information). A DocumentBuilderFactory is obtained by calling its static newInstance() method. After the DocumentBuilderFactory is configured, its newDocumentBuilder() method returns an instance of the javax.xml.parsers.DocumentBuilder class; this is the DOM parser. The DocumentBuilder class has several parse() methods that construct a DOM document, shown in Table 25.3.

Table 25.3. DocumentBuilder parse() Methods

Method | Description
---|---
Document parse(java.io.File f) | Parse the content of the given file as an XML document and return a new DOM Document object.
Document parse(org.xml.sax.InputSource is) | Parse the content of the given input source as an XML document and return a new DOM Document object.
Document parse(java.io.InputStream is) | Parse the content of the given InputStream as an XML document and return a new DOM Document object.
Document parse(java.io.InputStream is, String systemId) | Parse the content of the given InputStream as an XML document and return a new DOM Document object; systemId provides a base for resolving relative URIs.
Document parse(String uri) | Parse the content of the given URI as an XML document and return a new DOM Document object.

These parse() methods are very similar to their SAX counterparts; the bottom line is that they build an org.w3c.dom.Document object from varying input sources. For example, consider building a DOM document from a file:

File f = new File( "myfile.xml" );
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse( f );

Reading from the XML Tree

A DOM tree is an in-memory representation of an XML file, and, just as the name suggests, it takes the form of a tree. Figure 25.2 shows how the book.xml file might appear in a DOM.

Figure 25.2. A sample DOM tree diagram.

Observe from Figure 25.2 that the <books> root element contains two <book> elements. Each <book> element has an attribute node named category and three child elements: <author>, <title>, and <price>. Each of these elements has a child text node that contains the element's value.

After a Document object is obtained from the DocumentBuilder, its root element can be accessed through the Document's getDocumentElement() method; this method returns an instance of a class implementing the org.w3c.dom.Element interface. Table 25.4 describes some of the more useful retrieval methods of the Document interface.

Table 25.4. Document Retrieval Methods

Method | Description
---|---
DocumentType getDoctype() | The Document Type Declaration (see org.w3c.dom.DocumentType) associated with this document
Element getDocumentElement() | A convenience attribute that allows direct access to the child node that is the root element of the document
NodeList getElementsByTagName(String tagname) | Returns a NodeList of all the Elements with a given tag name, in the order in which they are encountered in a preorder traversal of the Document tree

The org.w3c.dom.Element interface defines a set of methods for traversing the DOM tree. Table 25.5 shows some of the more useful Element retrieval methods.

Table 25.5. Element Retrieval Methods

Method | Description
---|---
String getAttribute(String name) | Retrieves an attribute value by name
Attr getAttributeNode(String name) | Retrieves an attribute node by name
NodeList getElementsByTagName(String name) | Returns a NodeList of all descendant Elements with a given tag name, in the order in which they are encountered in a preorder traversal of this Element tree
String getTagName() | The name of the element

The Element interface extends the org.w3c.dom.Node interface, which offers the additional methods shown in Table 25.6.

Table 25.6. Node Retrieval Methods

Method | Description
---|---
NamedNodeMap getAttributes() | A NamedNodeMap containing the attributes of this node (if it is an Element), or null otherwise
NodeList getChildNodes() | A NodeList that contains all children of this node
Node getFirstChild() | The first child of this node
Node getLastChild() | The last child of this node
String getNodeName() | The name of this node, depending on its type
short getNodeType() | A code representing the type of the underlying object
String getNodeValue() | The value of this node, depending on its type

Together, the Element methods and the Node methods let you discover a node's type (element node, text node, attribute node, and so on; see the javadoc for the org.w3c.dom.Node interface), retrieve a list of child nodes, retrieve a list of attributes, retrieve the name of the node, and retrieve its value. Listing 25.7 shows a sample application that reads the same book.xml file into a DOM object and traverses it using the methods just discussed.

Listing 25.7 DOMSample.java

001:import javax.xml.parsers.DocumentBuilder;
002:import javax.xml.parsers.DocumentBuilderFactory;
003:import javax.xml.parsers.FactoryConfigurationError;
004:import javax.xml.parsers.ParserConfigurationException;
005:
006:import org.xml.sax.SAXException;
007:import org.xml.sax.SAXParseException;
008:
009:import java.io.FileInputStream;
010:import java.io.File;
011:import java.io.IOException;
012:
013:import org.w3c.dom.Document;
014:import org.w3c.dom.Element;
015:import org.w3c.dom.NodeList;
016:import org.w3c.dom.Node;
017:import org.w3c.dom.DOMException;
018:
019:public class DOMSample
020:{
021:  public static void main( String[] args )
022:  {
023:    try
024:    {
025:      File file = new File( "book.xml" );
026:      if( !file.exists() )
027:      {
028:        System.out.println( "Couldn't find file..." );
029:        return;
030:      }
031:
032:      // Parse the document
033:      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
034:      DocumentBuilder builder = factory.newDocumentBuilder();
035:      Document document = builder.parse( file );
036:
037:      // Walk the document
038:      Element root = document.getDocumentElement();
039:      System.out.println( "root=" + root.getTagName() );
040:
041:      // List the children of <books>; a set of <book> elements
042:      NodeList list = root.getChildNodes();
043:      for( int i=0; i<list.getLength(); i++ )
044:      {
045:        Node node = list.item( i );
046:        if( node.getNodeType() == Node.ELEMENT_NODE )
047:        {
048:          // Found a <book> element
049:          System.out.println( "Handling node: " + node.getNodeName() );
050:          Element element = ( Element )node;
051:          System.out.println( "\tCategory Attribute: " + element.getAttribute( "category" ) );
052:
053:          // Get its children: <author>, <title>, <price>
054:          NodeList childList = element.getChildNodes();
055:          for( int j=0; j<childList.getLength(); j++ )
056:          {
057:            // Once we have one of these nodes we need to find its
058:            // text node
059:            Node childNode = childList.item( j );
060:            if( childNode.getNodeType() == Node.ELEMENT_NODE )
061:            {
062:              NodeList childNodeList = childNode.getChildNodes();
063:              for( int k=0; k<childNodeList.getLength(); k++ )
064:              {
065:                Node innerChildNode = childNodeList.item( k );
066:                System.out.println( "\t\tNode=" + innerChildNode.getNodeValue() );
067:              }
068:            }
069:          }
070:        }
071:      }
072:    } catch( Exception e )
073:    {
074:      e.printStackTrace();
075:    }
076:  }
077:}

Lines 25-30 obtain a reference to the book.xml file and verify that it exists.

033:      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
034:      DocumentBuilder builder = factory.newDocumentBuilder();
035:      Document document = builder.parse( file );

Lines 33-35 use the DocumentBuilderFactory to obtain a DocumentBuilder, and then have the DocumentBuilder parse the XML file. The return value is an instance of a class implementing the org.w3c.dom.Document interface.

038:      Element root = document.getDocumentElement();

Line 38 gets the root element of the document as an org.w3c.dom.Element; this is the <books> element.

042:      NodeList list = root.getChildNodes();

Line 42 gets all the child nodes of the <books> element; these include the <book> elements as well as whitespace-only text nodes (which the code skips). The result is an instance of a class implementing the org.w3c.dom.NodeList interface. This interface defines two methods, as shown in Table 25.7.

Table 25.7. NodeList Methods

Method | Description
---|---
int getLength() | The number of nodes in the list
Node item(int index) | Returns the item at the given zero-based index in the collection, or null if the index is out of range

Lines 45-46 take a node from the child-node list and determine whether it is an element node (here, a <book> element). If it is, lines 50-51 cast it to an Element and extract its category attribute. Lines 53-69 iterate over each <book> element's children: <author>, <title>, and <price>. Remember that these element nodes do not hold the values of their tags directly; instead, each contains a text node that holds the value, and lines 59-67 extract those values.

As you experiment with the DOM, you will notice that elements whose children are other elements also contain text nodes consisting only of white space. These text nodes hold the line breaks and indentation that separate the elements in the source file: unless there is no white space at all between two elements, the parser builds a text node to hold it. The example avoids printing these blank text nodes by processing only element nodes at the <books> and <book> levels; it relies on knowing that <author>, <title>, and <price> each contain a single text node.

The results of this sample application should resemble the following:

root=books
Handling node: book
    Category Attribute: fiction
        Node=Left Behind
        Node=Tim Lahaye
        Node=14.95
Handling node: book
    Category Attribute: Computer Programming
        Node=Java 2 From Scratch
        Node=Steven Haines
        Node=39.95

Outputting the XML Tree

After all the work involved in parsing the DOM tree, writing it out to a stream is simple. The DOM implementation in the JAXP used for this chapter overrides the Element's toString() method to display the tree in XML form, so the entire DOM tree can be displayed on the screen with:

System.out.println( root );

Be aware that this behavior is implementation specific rather than part of the DOM specification. The Xerces-derived DOM bundled with modern JDKs, for example, prints only the node name and value (something like [books: null]). A portable alternative is to serialize the tree with an identity Transformer from the javax.xml.transform package, passing a DOMSource that wraps the node and a StreamResult that wraps the output stream.

Manipulating the XML Tree

After a DOM tree is constructed in memory, you might want to modify the tree by adding nodes, changing values, or deleting nodes.

Adding an Element to the XML Tree

When adding nodes to a DOM tree, recall the internal structure of the DOM tree (refer to Figure 25.2).
An element in the DOM tree is a node composed of other nodes, including attribute nodes, text nodes, and other element nodes. Therefore, to add a new <book> to the <books> root element, you must create <author>, <title>, and <price> element nodes, give each one a text node containing its value, and append them to a new <book> element. The Document interface offers several helpful methods for creating new nodes; see Table 25.8.

Table 25.8. Document Node Creation Methods

Method | Description
---|---
Attr createAttribute(String name) | Creates an Attr of the given name
CDATASection createCDATASection(String data) | Creates a CDATASection node whose value is the specified string
Comment createComment(String data) | Creates a Comment node given the specified string
DocumentFragment createDocumentFragment() | Creates an empty DocumentFragment object
Element createElement(String tagName) | Creates an element of the type specified
EntityReference createEntityReference(String name) | Creates an EntityReference object
ProcessingInstruction createProcessingInstruction(String target, String data) | Creates a ProcessingInstruction node given the specified name and data strings
Text createTextNode(String data) | Creates a Text node given the specified string

The Element and Node interfaces offer additional help in document creation through the methods described in Table 25.9.

Table 25.9. Element and Node Document Creation Methods

Method | Description
---|---
Node appendChild(Node newChild) | Adds the node newChild to the end of the list of children of this node
void setAttribute(String name, String value) | Adds a new attribute
Attr setAttributeNode(Attr newAttr) | Adds a new attribute node

Listing 25.8 shows a sample application that adds a new book to the DOM and then writes the new document to standard output.

Listing 25.8 DOMSample2.java

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

public class DOMSample2
{
  public static void main( String[] args )
  {
    try
    {
      File file = new File( "book.xml" );
      if( !file.exists() )
      {
        System.out.println( "Couldn't find file..." );
        return;
      }

      // Parse the document
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      Document document = builder.parse( file );

      // Get the root of the document
      Element root = document.getDocumentElement();

      // Build a new book: each child element gets a text node holding its value
      Element newAuthor = document.createElement( "author" );
      Text authorText = document.createTextNode( "Tim Lahaye" );
      newAuthor.appendChild( authorText );
      Element newTitle = document.createElement( "title" );
      Text titleText = document.createTextNode( "Desecration" );
      newTitle.appendChild( titleText );
      Element newPrice = document.createElement( "price" );
      Text priceText = document.createTextNode( "19.95" );
      newPrice.appendChild( priceText );
      Element newBook = document.createElement( "book" );
      newBook.setAttribute( "category", "fiction" );
      newBook.appendChild( newAuthor );
      newBook.appendChild( newTitle );
      newBook.appendChild( newPrice );

      // Add the book to the root
      root.appendChild( newBook );

      // Display the document with an identity Transformer rather than
      // System.out.println( root ), because not every DOM implementation's
      // toString() produces XML
      Transformer transformer = TransformerFactory.newInstance().newTransformer();
      transformer.setOutputProperty( OutputKeys.OMIT_XML_DECLARATION, "yes" );
      transformer.transform( new DOMSource( root ), new StreamResult( System.out ) );
      System.out.println();
    } catch( Exception e )
    {
      e.printStackTrace();
    }
  }
}

The output from Listing 25.8 should appear similar to the following:

<books>
  <book category="fiction">
    <title>Left Behind</title>
    <author>Tim Lahaye</author>
    <price>14.95</price>
  </book>
  <book category="Computer Programming">
    <title>Java 2 From Scratch</title>
    <author>Steven Haines</author>
    <price>39.95</price>
  </book>
<book category="fiction"><author>Tim Lahaye</author><title>Desecration</title><price>19.95</price></book></books>

Notice that the XML content is correct, but why is the added node formatted so poorly? Remember the whitespace-only text nodes discussed earlier: the parser kept them for the elements read from book.xml, and they supply the line breaks and indentation you see there. The new elements were created without any such text nodes, so they are written out on a single line.
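One way to get uniformly formatted output is sketched below. It is not part of the chapter's listings, and a few assumptions apply: the class and method names (WhitespaceSketch, parse, stripWhitespace, toXml) are hypothetical, and the document is parsed from an inline string shaped like book.xml so the example runs on its own. It appends an unformatted <book> the same way Listing 25.8 does, removes every whitespace-only text node, and then asks an identity Transformer to indent the whole tree. OutputKeys.INDENT is a standard output property, but how many spaces each level receives depends on the JDK's Transformer implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceSketch
{
  // Parses an XML string into a DOM Document (hypothetical helper)
  public static Document parse( String xml ) throws Exception
  {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse( new InputSource( new StringReader( xml ) ) );
  }

  // Recursively removes text nodes that contain nothing but white space
  public static void stripWhitespace( Node node )
  {
    NodeList children = node.getChildNodes();
    // The NodeList is live, so walk it backwards: removing item i leaves
    // the indexes of the items not yet visited (0..i-1) unchanged
    for( int i = children.getLength() - 1; i >= 0; i-- )
    {
      Node child = children.item( i );
      if( child.getNodeType() == Node.TEXT_NODE
          && child.getNodeValue().trim().isEmpty() )
      {
        node.removeChild( child );
      }
      else
      {
        stripWhitespace( child );
      }
    }
  }

  // Serializes a node with an identity Transformer and requests indentation
  public static String toXml( Node node ) throws Exception
  {
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty( OutputKeys.OMIT_XML_DECLARATION, "yes" );
    transformer.setOutputProperty( OutputKeys.INDENT, "yes" );
    StringWriter out = new StringWriter();
    transformer.transform( new DOMSource( node ), new StreamResult( out ) );
    return out.toString();
  }

  public static void main( String[] args ) throws Exception
  {
    Document doc = parse( "<books>\n  <book category=\"fiction\">\n"
        + "    <title>Left Behind</title>\n  </book>\n</books>" );
    Element root = doc.getDocumentElement();

    // Append a new <book> with no surrounding white space, as Listing 25.8 does
    Element newBook = doc.createElement( "book" );
    newBook.setAttribute( "category", "fiction" );
    Element title = doc.createElement( "title" );
    title.appendChild( doc.createTextNode( "Desecration" ) );
    newBook.appendChild( title );
    root.appendChild( newBook );

    // Drop the parser-preserved white space, then let the Transformer indent everything
    stripWhitespace( root );
    System.out.println( toXml( root ) );
  }
}
```

Stripping first keeps old and new elements consistent; with INDENT turned on, some JDK versions also emit extra blank lines when the preserved whitespace nodes are left in place. The factory's setIgnoringElementContentWhitespace(true) is not a substitute here, because it only takes effect when the parser validates against a DTD that declares element-only content.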
Removing an Element from the XML Tree

Removing an element from the DOM tree is a simple operation:

- Obtain a reference to the node to delete.
- Call the Node interface's removeChild( Node oldChild ) method on the parent node, passing it the child to delete.

For example, assuming document holds the parsed book.xml:

Element books = document.getDocumentElement();
// Find the book to delete; here, simply the first <book> in document order
Element oldBook = ( Element )books.getElementsByTagName( "book" ).item( 0 );
books.removeChild( oldBook );

Modifying an Element in the XML Tree

Modifying an element involves retrieving the element and calling one of the Element or Node document modification methods; see Table 25.10.

Table 25.10. Element and Node Document Modification Methods

Method | Description
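---|---
void removeAttribute(String name) | Removes an attribute by name
Attr removeAttributeNode(Attr oldAttr) | Removes the specified attribute node
void setAttribute(String name, String value) | Adds a new attribute, or changes the value of an existing attribute with the same name
Attr setAttributeNode(Attr newAttr) | Adds a new attribute node, replacing any existing attribute with the same name
void setNodeValue(String nodeValue) | Sets the value of a node

The Element interface provides methods to remove and set attributes, and the Node interface offers a method that sets or replaces the value of a node. Keep in mind that an element's value lives in its child text node: the node value of an Element is defined to be null, and setting it has no effect. To change the text of an element such as <price>, call setNodeValue() on the element's text node, for example price.getFirstChild().setNodeValue( "19.95" ).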
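The removal and modification steps can be combined into one runnable sketch. Assumptions: the class ModifySketch and its helpers parse, findBook, and setPrice are hypothetical names, not part of any library; the books are parsed from an inline string shaped like book.xml rather than read from disk; and a book is identified by the text of its <title>. setPrice() calls setNodeValue() on the text node inside <price>, since that text node, not the element, holds the value.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ModifySketch
{
  // Parses an XML string into a DOM Document (hypothetical helper)
  public static Document parse( String xml ) throws Exception
  {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse( new InputSource( new StringReader( xml ) ) );
  }

  // Hypothetical helper: returns the first <book> whose <title> text matches, or null
  public static Element findBook( Element books, String title )
  {
    NodeList list = books.getElementsByTagName( "book" );
    for( int i = 0; i < list.getLength(); i++ )
    {
      Element book = ( Element )list.item( i );
      Node titleElement = book.getElementsByTagName( "title" ).item( 0 );
      if( titleElement != null && title.equals( titleElement.getTextContent() ) )
      {
        return book;
      }
    }
    return null;
  }

  // Hypothetical helper: replaces the value held by the text node inside <price>
  public static void setPrice( Element book, String newPrice )
  {
    Node price = book.getElementsByTagName( "price" ).item( 0 );
    if( price == null )
    {
      return; // no <price> element, nothing to update
    }
    Node text = price.getFirstChild();
    if( text == null )
    {
      // An empty <price/> has no text node yet, so create one
      price.appendChild( book.getOwnerDocument().createTextNode( newPrice ) );
    }
    else
    {
      text.setNodeValue( newPrice );
    }
  }

  public static void main( String[] args ) throws Exception
  {
    Document doc = parse( "<books>"
        + "<book category=\"fiction\"><title>Left Behind</title>"
        + "<author>Tim Lahaye</author><price>14.95</price></book>"
        + "<book category=\"Computer Programming\"><title>Java 2 From Scratch</title>"
        + "<author>Steven Haines</author><price>39.95</price></book>"
        + "</books>" );
    Element books = doc.getDocumentElement();

    // Modify: change one book's price and overwrite its category attribute
    Element javaBook = findBook( books, "Java 2 From Scratch" );
    setPrice( javaBook, "29.95" );
    javaBook.setAttribute( "category", "Programming" );

    // Remove: ask the parent <books> element to drop the other book
    books.removeChild( findBook( books, "Left Behind" ) );

    System.out.println( books.getElementsByTagName( "book" ).getLength() ); // prints 1
  }
}
```

Note that removeChild() is always called on the parent; a node cannot remove itself, which is why the sketch keeps a reference to the <books> element.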