Accessing XML Data Using DOM | JAX: Java APIs for XML Kick Start

	Java APIs for XML Kick Start By Aoyon Chowdhury, Parag Choudhary
	Chapter 5. The Document Object Model

You can access XML data from the Document object by using its member methods. You will update the MyDOMHandler application to get the following information from the CarParts.xml file:

DTD information
Element node types
An element node and its attributes (accessed randomly)
An element's text node

Accessing DTD Information

A DTD describes the structure of an XML document. It defines the elements, their order, and their relationships to each other, as well as each element's attributes and whether elements and attributes are mandatory or optional. A validating parser uses the DTD to check that a document is valid, that is, that it conforms to the declared structure; well-formedness is checked independently of any DTD. You might want to access the DTD information for various reasons. Suppose you build an XML document based on a publicly available DTD. When the DOM parser parses the document, it must retrieve the DTD at runtime, and the server that hosts the DTD might be down. Using the DTD information and the EntityResolver interface, you can redirect the DTD source to a copy cached locally on your machine.
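The redirection described above can be sketched as follows. This is a minimal illustration, not the book's code: the class name, the remote URL, and the in-memory string that stands in for a locally cached DTD file are all assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class LocalDtdResolverDemo {
    // Hypothetical stand-in for a DTD file cached on the local machine.
    static final String CACHED_DTD =
        "<!ELEMENT carparts (supplier)>\n"
      + "<!ELEMENT supplier (#PCDATA)>\n";

    // Hypothetical document whose DOCTYPE points at an unreachable server.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE carparts SYSTEM \"http://example.invalid/CarParts.dtd\">\n"
      + "<carparts><supplier>Heaven Car Parts</supplier></carparts>";

    // Parses XML and returns the system ID that the resolver redirected.
    static String resolvedSystemId() {
        try {
            final String[] seen = new String[1];
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                if (systemId != null && systemId.endsWith("CarParts.dtd")) {
                    seen[0] = systemId;
                    // Serve the cached copy instead of fetching the remote DTD.
                    InputSource local = new InputSource(new StringReader(CACHED_DTD));
                    local.setSystemId(systemId);
                    return local;
                }
                return null; // fall back to the parser's default resolution
            });
            builder.parse(new InputSource(new StringReader(XML)));
            return seen[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Redirected DTD: " + resolvedSystemId());
    }
}
```

In a real application the resolver would more likely return an InputSource opened on a file path; the string is used here only to keep the sketch self-contained.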

The information about the DTD can be accessed by using the getDoctype() method of the Document object. To call getDoctype(), add the lines that follow the "Getting the DTD Information" comment:

    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new MyErrorHandler());
        System.out.println("\n----------------- Creating the DOM from the CarParts.xml File ---------- \n");
        document = builder.parse( new File("CarParts.xml") );
        System.out.println("\n----------------- DOM Created from the CarParts.xml file ---------- \n");

        // Getting the DTD Information
        DocumentType docType = document.getDoctype();
        System.out.println("\n-----------The Name of the DTD is : " + docType.getName() + "\n");
        System.out.println("\n-----------The system ID of the doctype is : " + docType.getSystemId() + "\n");
    } catch (SAXParseException saxException) {

The getDoctype() method returns a DocumentType object. The DocumentType object has several methods that return information about the DTD of the XML document. In the preceding code, the getName() and getSystemId() methods are used to get the name and the system identifier of the DTD.
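One detail worth guarding against is that getDoctype() returns null when the document has no DOCTYPE declaration, in which case the calls in the preceding snippet would throw a NullPointerException. The following sketch shows a null-safe variant; the class name, method name, and inline sample documents are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeInfoDemo {
    // Hypothetical sample with an internal DTD subset (no external file needed).
    static final String WITH_DOCTYPE =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE carparts [\n"
      + "<!ELEMENT carparts (#PCDATA)>\n"
      + "]>\n"
      + "<carparts>demo</carparts>";

    // Hypothetical sample with no DOCTYPE declaration at all.
    static final String WITHOUT_DOCTYPE = "<carparts>demo</carparts>";

    static String describeDoctype(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            DocumentType docType = doc.getDoctype();
            if (docType == null) {
                return "no DOCTYPE"; // nothing to query
            }
            return "name=" + docType.getName()
                + ", publicId=" + docType.getPublicId()
                + ", systemId=" + docType.getSystemId();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeDoctype(WITH_DOCTYPE));
        System.out.println(describeDoctype(WITHOUT_DOCTYPE));
    }
}
```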

NOTE

The code discussed here is available in the example0502 folder. This folder also contains the sample CarParts.xml file.

Compile and run the application. The output should be similar to the displayed listing:

    ----------------- Creating the DOM from the CarParts.xml File ----------
    ----------------- DOM Created from the CarParts.xml file ----------
    -----------The Name of the DTD is : carparts
    -----------The system ID of the doctype is : CarParts.dtd

As shown in the preceding listing, the name of the DTD is carparts, and the system identifier is CarParts.dtd.

Next you will update the MyDOMHandler code to access and determine the node types.

Determining Element Node Types

In DOM, Node is the primary data type. Every piece of information, whether an element, an attribute, or an element's content, is represented as a node. You might want to determine a node's type for various reasons, such as to get a list of all element tags, specific attributes, comments, and so on.

For example, consider the following entry in the CarParts.xml file:

    <engine type="Alpha37" capacity="2500" price="3500">
        Engine 1
    </engine>

In the DOM representation, the engine element will be an Element node, the attributes will be Attr nodes, and the text for the engine element (Engine 1) will be a Text node.

The Document object provides a number of methods with which you can access a specific node of the document. Every Node has three attributes, nodeName, nodeValue, and attributes, which enable an application to get the node information. Their values are returned by the getNodeName(), getNodeValue(), and getAttributes() methods. A fourth method, getNodeType(), returns a numeric code that identifies the kind of node.
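As a rough illustration of these accessors, the sketch below parses the engine element shown earlier (with the surrounding whitespace removed for brevity) and reports the name, type code, and value of the element, one of its attributes, and its text. The class and method names are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeInfoDemo {
    // The engine entry from the text, without the padding around "Engine 1".
    static final String XML =
        "<engine type=\"Alpha37\" capacity=\"2500\" price=\"3500\">Engine 1</engine>";

    static String describe(Node node) {
        return node.getNodeName()
            + " | type " + node.getNodeType()
            + " | value " + node.getNodeValue();
    }

    // Describes the element, its "type" attribute, and its text child, in that order.
    static List<String> describeEngine() {
        try {
            Element engine = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
            List<String> lines = new ArrayList<>();
            lines.add(describe(engine));                         // Element node
            lines.add(describe(engine.getAttributeNode("type"))); // Attr node
            lines.add(describe(engine.getFirstChild()));          // Text node
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        describeEngine().forEach(System.out::println);
    }
}
```

The element reports a null nodeValue, whereas the Attr and Text nodes carry their content in nodeValue.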

The values of nodeName, nodeValue, and attributes for the different node types are listed in Table 5.1, shown earlier.

Next, update the MyDOMHandler application to get all the nodes in the document and determine their types and values. To do this, you need to do the following:

Define an array that contains the descriptions of the different node types. This is required because the getNodeType() method returns a short value, such as 1 for Element, 2 for Attr, and so on. The application will resolve the short value to a more comprehensible text entry.
Define a recursive function printDomNode() that takes a Node object as its parameter. The function displays the node name and node type of the node object and of its child nodes. Child nodes are the nodes that exist under a node. For example, the text of an element is held in a Text node that is a child of the element node.
Use the getDocumentElement() method to get the root node of the document, and pass it as a parameter to the printDomNode() method to get all the nodes of the CarParts.xml document.

First, define the array. To do so, add the nodeTypes declaration shown here:

    public class MyDOMHandler {
        static Document document;

        // Indexed by the short value that getNodeType() returns.
        static String nodeTypes[] = {
            "none",
            "Element",
            "Attr",
            "Text",
            "CDATA",
            "EntityRef",
            "Entity",
            "ProcInstr",
            "Comment",
            "Document",
            "DocType",
            "DocFragment",
            "Notation",
        };

Next, write the printDomNode() function. To do so, add the nodeLevel counter and the printDomNode() method after the array:

    public class MyDOMHandler {
        static Document document;

        static String nodeTypes[] = {
            "none",
            "Element",
            "Attr",
            "Text",
            "CDATA",
            "EntityRef",
            "Entity",
            "ProcInstr",
            "Comment",
            "Document",
            "DocType",
            "DocFragment",
            "Notation",
        };

        // Current depth in the tree; controls the indentation of the output.
        static int nodeLevel = 0;

        public static void printDomNode(Node node) {
            for (int j = 0; j < nodeLevel; j++) {
                System.out.print("\t");
            }
            System.out.println(node.getNodeName() + "\" [" + nodeTypes[node.getNodeType()] + "]");

            // Recurse into the children one level deeper.
            NodeList list = node.getChildNodes();
            nodeLevel++;
            for (int i = 0; i < list.getLength(); i++) {
                printDomNode(list.item(i));
            }
            nodeLevel--;
        }

printDomNode() is a simple recursive function that takes a Node object as its parameter. It calls the getNodeName() and getNodeType() methods to display the name and type of the node, indented by one tab per level of depth. It then gets the list of the node's child nodes and recurses into each of them.
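The same traversal can also be written without the shared nodeLevel counter by passing the depth as a parameter. The following standalone sketch is one possible variant, not the version used in the example folders; the class name, method names, and the tiny inline document are assumptions, and the output is collected in a StringBuilder so that it is easy to inspect.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTreePrinter {
    // Same lookup table as in the text, indexed by getNodeType().
    static final String[] NODE_TYPES = {
        "none", "Element", "Attr", "Text", "CDATA", "EntityRef", "Entity",
        "ProcInstr", "Comment", "Document", "DocType", "DocFragment", "Notation"
    };

    // Appends one line for the node, then recurses into its children.
    static void appendNode(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append('\t');
        }
        out.append(node.getNodeName())
           .append(" [").append(NODE_TYPES[node.getNodeType()]).append("]\n");
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            appendNode(children.item(i), depth + 1, out);
        }
    }

    static String dump(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            appendNode(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(dump("<a><b>x</b></a>"));
    }
}
```

Passing the depth explicitly keeps the method free of static state, which matters if the traversal is ever called from more than one thread or aborted by an exception partway through.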

Finally, the getDocumentElement() method needs to be called to get the root element. This will be passed on to the printDomNode() method to get all the nodes of the document. To do this, add the lines that follow the "Getting the root element" comment:

    public static void main(String argv[]) {
        ...............
        System.out.println("\n-----------The system ID of the doctype is : " + docType.getSystemId() + "\n");

        // Getting the root element
        Element element = document.getDocumentElement();
        System.out.println("\n---- The tag name of the document element is : " + element.getTagName() + "\n");
        printDomNode(document.getDocumentElement());
    } catch (SAXParseException saxException) {

In the preceding code snippet, the root element node is obtained from the Document object, and its name is displayed. Then the root element node is passed on as a parameter to the printDomNode() method.

NOTE

The code discussed here is available in the example0503 folder. This folder also contains the sample CarParts.xml file.

Compile and run the application. The output should be similar to that displayed in Listing 5.4.

Listing 5.4 Output of MyDOMHandler Application

    ----------------- Creating the DOM from the CarParts.xml File ----------
    ----------------- DOM Created from the CarParts.xml file ----------
    -----------The Name of the DTD is : carparts
    -----------The system ID of the doctype is : CarParts.dtd
    ---- The tag name of the document element is : carparts
    carparts" [Element]
        #text" [Text]
        supplierformat" [ProcInstr]
        #text" [Text]
        supplier" [Element]
            #text" [Text]
        #text" [Text]
        engines" [Element]
            #text" [Text]
            engine" [Element]
                #text" [Text]
            #text" [Text]
            engine" [Element]
                #text" [Text]
            #text" [Text]
        #text" [Text]
        carbodies" [Element]
            #text" [Text]
            carbody" [Element]
                #text" [Text]
            #text" [Text]
        #text" [Text]
        wheels" [Element]
            #text" [Text]
            wheel" [Element]
                #text" [Text]
            #text" [Text]
        #text" [Text]
        carstereos" [Element]
            #text" [Text]
            carstereo" [Element]
                #text" [Text]
            #text" [Text]
        #text" [Text]
        forCDATA" [Element]
            #cdata-section" [CDATA]
            #text" [Text]
        #text" [Text]

The output displays the different nodes in the XML document and their types. Notice that although the DTD specifies that the carparts element can have only child elements and no text, a text node still appears. Similarly, text nodes follow every element that has only child elements. These text nodes hold the whitespace (line breaks and indentation) between the tags. When the DocumentBuilder creates a DOM, it includes this lexical information in its entirety by default, so that the XML document can be reconstructed later in its original form, including the extra spaces and so forth.
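If these whitespace-only text nodes are not wanted, JAXP lets a validating parser drop them through DocumentBuilderFactory.setIgnoringElementContentWhitespace(). The sketch below is an illustration under stated assumptions: the class name and the small inline document with an internal DTD are invented for the example, and the option only takes effect when validation is on and the DTD declares that the element holds child elements only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WhitespaceDemo {
    // Hypothetical sample: carparts is declared to contain only a supplier element.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE carparts [\n"
      + "<!ELEMENT carparts (supplier)>\n"
      + "<!ELEMENT supplier (#PCDATA)>\n"
      + "]>\n"
      + "<carparts>\n"
      + "  <supplier>Heaven Car Parts</supplier>\n"
      + "</carparts>";

    // Returns how many child nodes the root element has after parsing.
    static int countRootChildren(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // the DTD tells the parser which whitespace is ignorable
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler silently ignores warnings and validation errors;
            // a real application would report them.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Whitespace kept:    " + countRootChildren(false));
        System.out.println("Whitespace ignored: " + countRootChildren(true));
    }
}
```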

Also note that although the attributes of an element are also nodes, they do not appear here. This is because attributes are treated as properties of the element rather than as its children, so getChildNodes() does not return them.

Next, you will update the application to randomly access an element node and determine its attributes and values.

Randomly Accessing an Element Node and Its Attributes

The biggest advantage of the DOM is the capability that it provides to an application to randomly access any node in the document tree. This helps the application to manipulate the DOM as per requirements. The Document object provides a number of methods to randomly access a node.

In the MyDOMHandler application, do the following to randomly access an element node and its attributes:

Use the getElementsByTagName() method to access an element by its tag name. In the MyDOMHandler application, access the supplier element.
Because the getElementsByTagName() method returns a NodeList object, iterate the list to get the elements with the same tag name and then call the printDomNode() method.
Update the printDomNode() method to get the attributes and display their names and values.

First, call getElementsByTagName() to access the elements by tag name. To do so, add the final line shown here:

    // Getting the root element
    Element element = document.getDocumentElement();
    System.out.println("\n---- The tag name of the document element is : " + element.getTagName() + "\n");
    NodeList list = document.getElementsByTagName("supplier");

Next, for each element with the supplier tag name, call the printDomNode() method. To do so, add the lines that follow the getElementsByTagName() call:

        NodeList list = document.getElementsByTagName("supplier");
        int ListLength = list.getLength();
        System.out.println("\n---- The number of supplier element nodes are : " + ListLength + "\n");
        for (int i = 0; i < ListLength; i++) {
            System.out.println("The name of the node is : " + list.item(i).getNodeName() + "\n");
            printDomNode(list.item(i));
        }
    } catch (SAXParseException saxException) {

Finally, update the printDomNode() method to get the attributes and display their names and values. To do so, add the attribute-printing block that follows the first println() call:

    public static void printDomNode(Node node) {
        for (int j = 0; j < nodeLevel; j++) {
            System.out.print("\t");
        }
        System.out.println(node.getNodeName() + "\" [" + nodeTypes[node.getNodeType()] + "]");

        // Prints the attributes and their values for an element.
        // getAttributes() returns null for nodes that are not elements.
        NamedNodeMap nodeMap = node.getAttributes();
        for (int j = 0; null != nodeMap && j < nodeMap.getLength(); j++) {
            Node attributeNode = nodeMap.item(j);
            for (int k = 0; k < nodeLevel; k++) {
                System.out.print("\t");
            }
            System.out.println("Attr Name: " + attributeNode.getNodeName() + " Value: " + attributeNode.getNodeValue());
        }

        NodeList list = node.getChildNodes();
        nodeLevel++;
        for (int i = 0; i < list.getLength(); i++) {
            printDomNode(list.item(i));
        }
        nodeLevel--;
    }

To get the attributes of an element, use the getAttributes() method, which returns a NamedNodeMap for element nodes and null for every other node type. Because attributes are also nodes, you can use the getNodeName() and getNodeValue() methods to get their names and values.
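When the attribute name is known in advance, looking it up by name is often simpler than iterating the map. The following sketch contrasts NamedNodeMap.getNamedItem() with Element.getAttribute(), using the engine entry from earlier in the chapter; the class and method names are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeAccessDemo {
    static final String XML =
        "<engine type=\"Alpha37\" capacity=\"2500\" price=\"3500\">Engine 1</engine>";

    static Element parseRoot() {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Looks the attribute up as a node in the element's NamedNodeMap.
    static String viaMap(String name) {
        return parseRoot().getAttributes().getNamedItem(name).getNodeValue();
    }

    // Asks the element directly; yields "" when the attribute is absent.
    static String direct(String name) {
        return parseRoot().getAttribute(name);
    }

    static int attributeCount() {
        return parseRoot().getAttributes().getLength();
    }

    // A Text node has no attribute map at all.
    static boolean textNodeHasNoMap() {
        return parseRoot().getFirstChild().getAttributes() == null;
    }

    public static void main(String[] args) {
        System.out.println("type (map)     = " + viaMap("type"));
        System.out.println("price (direct) = " + direct("price"));
        System.out.println("attributes     = " + attributeCount());
        System.out.println("text has map?  = " + !textNodeHasNoMap());
    }
}
```

Note that getNamedItem() returns null for a missing attribute, whereas getAttribute() returns an empty string.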

NOTE

The code discussed here is available in the example0504 folder. This folder also contains the sample CarParts.xml file.

Compile and run the application. The output should be similar to the following:

    ----------------- Creating the DOM from the CarParts.xml File ----------
    ----------------- DOM Created from the CarParts.xml file ----------
    -----------The Name of the DTD is : carparts
    -----------The system ID of the doctype is : CarParts.dtd
    ---- The tag name of the document element is : carparts
    ---- The number of supplier element nodes are : 1
    The name of the node is : supplier
    supplier" [Element]
    Attr Name: URL Value: http://carpartsheaven.com
    Attr Name: name Value: Heaven Car Parts (TM)
        #text" [Text]

Notice that in the CarParts.xml file, the attribute values of the supplier element are written as entity references. The DOM parser automatically replaces the entity references with the text they represent.
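The replacement of entity references can be reproduced with a small self-contained document. In the sketch below, the entity name companyname, the class name, and the inline DTD are assumptions made for the example; the actual entity names used in CarParts.xml are not shown in this section.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EntityExpansionDemo {
    // Hypothetical document: "companyname" is an internal entity used both
    // in an attribute value and as element content.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE carparts [\n"
      + "<!ENTITY companyname \"Heaven Car Parts (TM)\">\n"
      + "]>\n"
      + "<carparts><supplier name=\"&companyname;\">&companyname;</supplier></carparts>";

    static Element supplier() {
        try {
            return (Element) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getElementsByTagName("supplier")
                .item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // The attribute value as the DOM reports it, with the reference replaced.
    static String supplierName() {
        return supplier().getAttribute("name");
    }

    // The element's character data, likewise with the reference replaced.
    static String supplierText() {
        return supplier().getTextContent();
    }

    public static void main(String[] args) {
        System.out.println("Attribute: " + supplierName());
        System.out.println("Text:      " + supplierText());
    }
}
```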

Next you will learn how to access the text node of an element.

Accessing an Element's Text Node

The character data of an element is contained in Text nodes that are children of the element. When an element holds only character data, as the supplier element does in CarParts.xml, that text is the element's only child and therefore also its last child. A text node cannot have child nodes. An application can access the element's text node to get the value for an element. The application can then display this information for the end-user's benefit.

For example, in the DOM representation of an XML element <engine> Engine 1 </engine>, the engine element node will contain a text node whose value is Engine 1.

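Because the text node of the supplier element is its last child node, you can use the getLastChild() method to access it. This shortcut does not hold for elements with mixed content or trailing comments, where the last child may be some other kind of node. To understand how to access a text node, update the MyDOMHandler application to get the text node for the supplier element.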

To do so, comment out the printDomNode() call and add the two lines that follow it:

        NodeList list = document.getElementsByTagName("supplier");
        int ListLength = list.getLength();
        System.out.println("\n---- The number of supplier element nodes are : " + ListLength + "\n");
        for (int i = 0; i < ListLength; i++) {
            System.out.println("The name of the node is : " + list.item(i).getNodeName() + "\n");
            //printDomNode(list.item(i));
            // The text node is the last child node of the supplier element node
            Node textNode = list.item(i).getLastChild();
            System.out.println("The value of the text node is :" + textNode.getNodeValue().trim() + "\n");
        }
    } catch (SAXParseException saxException) {

Note that you could have also modified printDomNode() to display the value in the text node. However, the printDomNode() method will not be used at all in this example. As shown in the previous code snippet, comment out the call to printDomNode() and use the getLastChild() method to access the text node of the supplier element node.
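Where the last child cannot be relied on to be a text node, the DOM Level 3 method getTextContent() is a possible alternative, because it concatenates the element's character data and skips comments. The following sketch illustrates the difference; the class name and the inline document with a trailing comment are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TextContentDemo {
    // Hypothetical supplier element whose last child is a comment, not text.
    static final String XML =
        "<carparts><supplier>Heaven Car Parts<!-- verified --></supplier></carparts>";

    static Element supplier() {
        try {
            return (Element) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getElementsByTagName("supplier")
                .item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Node type code of the last child (see the nodeTypes array in the text).
    static short lastChildType() {
        return supplier().getLastChild().getNodeType();
    }

    // Character data of the element regardless of where the text node sits.
    static String text() {
        return supplier().getTextContent().trim();
    }

    public static void main(String[] args) {
        System.out.println("Last child type code: " + lastChildType());
        System.out.println("Text content: " + text());
    }
}
```

Here getLastChild() would hand back the Comment node, so reading its node value would return the comment text rather than the supplier name.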

NOTE

The code discussed here is available in the example0505 folder. This folder also contains the sample CarParts.xml file.

Compile and run the application. The output should be similar to the following:

    ----------------- Creating the DOM from the CarParts.xml File ----------
    ----------------- DOM Created from the CarParts.xml file ----------
    -----------The Name of the DTD is : carparts
    -----------The system ID of the doctype is : CarParts.dtd
    ---- The tag name of the document element is : carparts
    ---- The number of supplier element nodes are : 1
    The name of the node is : supplier
    The value of the text node is :Heaven Car Parts (TM)

Notice that the actual text for the supplier element in the CarParts.xml file was an entity reference. The DOM parser replaces the entity reference with the text.
