This first Java example will get us started by parsing an XML document and reporting how many times a particular element appears in it. In this chapter, I'm taking a look at using the XML DOM with Java, and I'll use the Java DocumentBuilder class, which creates a W3C DOM tree as its output. Here's the document we'll parse:

Listing ch11_01.xml

```xml
<?xml version = "1.0" standalone="yes"?>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2003</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2003</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2003</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>
```

In this first example, the code will scan ch11_01.xml and report how many <CUSTOMER> elements the document has. To start this program, I'll import the Java classes we'll need (which support the W3C DOM interfaces, such as Node and Element) and the XML parser classes we'll use:

```java
import javax.xml.parsers.*;
import org.w3c.dom.*;
. . .
```

I'll call this first program ch11_02.java, so the public class in that file is ch11_02:

```java
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class ch11_02 {
    public static void main(String[] args) {
        . . .
    }
}
```

To parse the XML document, you need a DocumentBuilderFactory object, which you use to create a DocumentBuilder object. (It's called a document builder factory because you can use it to create parsers from different parser vendors' Java classes, not just the default Java XML parser that we'll use here.)

```java
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class ch11_02 {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = null;
            try {
                db = dbf.newDocumentBuilder();
            } catch (ParserConfigurationException pce) {}
            . . .
    }
}
```

You can find the methods of the DocumentBuilderFactory class in Table 11-1 and the methods of the DocumentBuilder class in Table 11-2.

Table 11-1. Methods of the javax.xml.parsers.DocumentBuilderFactory Class
Table 11-2. Methods of the javax.xml.parsers.DocumentBuilder Class
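Because the factory is where parser options live, it can help to see it configured before newDocumentBuilder is called. The sketch below is one possible setup, not the book's listing: the class name BuilderSetup is hypothetical, and the three settings (setNamespaceAware, setValidating, setIgnoringComments) are illustrative choices for a namespace-free document with no DTD, such as ch11_01.xml. It also reports a ParserConfigurationException instead of silently continuing with a null builder.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

final class BuilderSetup {

    // Creates a DocumentBuilder after configuring the factory.
    // These settings are illustrative assumptions, not requirements.
    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(false);  // the sample document uses no namespaces
        dbf.setValidating(false);      // there is no DTD to validate against
        dbf.setIgnoringComments(true); // leave comment nodes out of the tree
        return dbf.newDocumentBuilder();
    }

    public static void main(String[] args) {
        try {
            DocumentBuilder db = newBuilder();
            System.out.println("Created parser: " + db.getClass().getName());
        } catch (ParserConfigurationException pce) {
            // Report the failure rather than carrying on with a null builder.
            System.err.println("Could not create a parser: " + pce.getMessage());
        }
    }
}
```

Keeping the factory configuration in one small helper means every program that needs the same parser settings gets them from one place.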
To actually parse the XML document, you use the parse method of the DocumentBuilder object. I'll let the user specify the name of the document to parse on the command line and pass args[0] to the parse method. Note that you don't need to pass the name of a local file to the parse method; you can pass the URL of a document on the Internet, and the parse method will retrieve that document. Here's how you can use the parse method:

```java
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class ch11_02 {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = null;
            try {
                db = dbf.newDocumentBuilder();
            } catch (ParserConfigurationException pce) {}
            Document doc = null;
            doc = db.parse(args[0]);
            . . .
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
```

If the document is successfully parsed, this code creates a Document object based on the W3C DOM. The Document interface is part of the W3C DOM, and you can find the methods of this interface in Table 11-3.

Table 11-3. Methods of the org.w3c.dom.Document Interface
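Passing a file name or URL is only one way to call parse; DocumentBuilder also has overloads that take a java.io.File, an InputStream, or an org.xml.sax.InputSource. As a minimal sketch, assuming the XML is already held in a String (the class ParseFromString and the trimmed one-customer sample are hypothetical), you can wrap the text in an InputSource:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ParseFromString {

    // Parses XML held in a String by wrapping it in a SAX InputSource,
    // which is one of the overloads accepted by DocumentBuilder.parse.
    static Document parseString(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, trimmed-down version of ch11_01.xml
        String xml = "<DOCUMENT><CUSTOMER><NAME><LAST_NAME>Smith</LAST_NAME>"
                   + "<FIRST_NAME>Sam</FIRST_NAME></NAME></CUSTOMER></DOCUMENT>";
        Document doc = parseString(xml);
        // getDocumentElement returns the root element; prints "DOCUMENT"
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

If the text is not well-formed, parse throws a SAXException, which a catch (Exception e) block like the one in ch11_02 would report.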
The Document interface extends the Node interface, which represents the W3C Node object. A Node represents a single node in the document tree (as you recall, everything in the document tree, including text and comments, is treated as a node). The Node interface has many methods that you can use to work with nodes. For example, you can use methods such as getNodeName and getNodeValue to get information about a node, and we'll use this kind of information a great deal in this chapter. This interface also has data members, called fields, which hold constant values corresponding to the various node types; we'll see them in this chapter as well. You'll find the Node interface fields in Table 11-4 and the methods of this interface in Table 11-5. As you see in Table 11-5, the Node interface contains all the standard W3C DOM methods for navigating in a document that we've already used with JavaScript in Chapter 7, "Handling XML Documents with JavaScript." These include getNextSibling, getPreviousSibling, getFirstChild, getLastChild, and getParentNode. We'll put those methods to work here as well.

Table 11-4. Node Interface Fields
Table 11-5. Methods of the org.w3c.dom.Node Interface
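The navigation methods and the node-type fields work together. The sketch below, assuming a small hypothetical document with whitespace between its tags (the class name WalkChildren is also hypothetical), walks a node's children with getFirstChild and getNextSibling, uses getNodeType with the Node.ELEMENT_NODE field to skip whitespace text nodes, and climbs back up with getParentNode:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class WalkChildren {

    // Counts the element children of a node by following the sibling chain.
    // Whitespace between tags appears as TEXT_NODE children, so each child's
    // type is compared with Node.ELEMENT_NODE before it is counted.
    static int countElementChildren(Node parent) {
        int count = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DOCUMENT>\n  <CUSTOMER/>\n  <CUSTOMER/>\n</DOCUMENT>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // 5 children: three whitespace text nodes and two elements
        System.out.println("child nodes:      " + root.getChildNodes().getLength());
        // 2 element children
        System.out.println("element children: " + countElementChildren(root));
        // The root element's parent is the Document node, so this prints true
        System.out.println(root.getParentNode().getNodeType() == Node.DOCUMENT_NODE);
    }
}
```

The gap between the two counts is why DOM code usually checks a node's type before treating it as an element.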
At this point, we have the Document object, which is the root node of the DOM tree. Our goal here is to check how many <CUSTOMER> elements the document has. I'll use the Document object's getElementsByTagName method to get a NodeList object containing all the <CUSTOMER> elements:

```java
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class ch11_02 {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = null;
            try {
                db = dbf.newDocumentBuilder();
            } catch (ParserConfigurationException pce) {}
            Document doc = null;
            doc = db.parse(args[0]);
            NodeList nodelist = doc.getElementsByTagName("CUSTOMER");
            . . .
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
```

The NodeList interface represents an ordered collection of nodes. You can access nodes in such a collection by index, and we'll do that in this chapter. You can find the methods of the NodeList interface in Table 11-6.

Table 11-6. NodeList Interface Methods
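Since a NodeList is indexed, you can visit each <CUSTOMER> element in a loop over getLength and item. As a sketch (the class name ListCustomers and the method lastNames are hypothetical), this collects each customer's last name. The cast to Element is safe because getElementsByTagName returns only element nodes, and Element has its own getElementsByTagName that searches just that element's subtree. getTextContent is a DOM Level 3 method, available in the JDK since Java 5.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ListCustomers {

    // Collects the <LAST_NAME> text of every <CUSTOMER>, visiting the
    // NodeList by index with getLength() and item(i).
    static List<String> lastNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList customers = doc.getElementsByTagName("CUSTOMER");
        for (int i = 0; i < customers.getLength(); i++) {
            Element customer = (Element) customers.item(i);
            // Search only this customer's subtree for its last name
            NodeList last = customer.getElementsByTagName("LAST_NAME");
            if (last.getLength() > 0) {
                names.add(last.item(0).getTextContent());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(args[0]);
        System.out.println(lastNames(doc));
    }
}
```

Run against ch11_01.xml, this should print [Smith, Jones, Weber].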
If you take a look at Table 11-6, you'll see that the NodeList interface supports a getLength method that returns the number of nodes in the list. This means that we can find how many <CUSTOMER> elements there are in the document like this:

Listing ch11_02.java

```java
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class ch11_02 {
    public static void main(String[] args) {
        try {
            // The factory supplies the parser (the document builder).
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = null;
            try {
                db = dbf.newDocumentBuilder();
            } catch (ParserConfigurationException pce) {}
            // Parse the file or URL named on the command line into a DOM tree.
            Document doc = null;
            doc = db.parse(args[0]);
            // Gather every <CUSTOMER> element and report how many there are.
            NodeList nodelist = doc.getElementsByTagName("CUSTOMER");
            System.out.println(args[0] + " has " + nodelist.getLength() + " <CUSTOMER> elements.");
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
```

And that's it. You can see the results of this code here, indicating that ch11_01.xml has three <CUSTOMER> elements, which is correct:

```
%java ch11_02 ch11_01.xml
ch11_01.xml has 3 <CUSTOMER> elements.
```

That's all it takes to get started with the Java XML parsers.
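The count in ch11_02 is hard-wired to <CUSTOMER>. One way to generalize it, sketched here with a hypothetical CountTags class, is to take the tag name as a second command-line argument. getElementsByTagName also accepts the special name "*", which matches every element in the document, including the root.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class CountTags {

    // Returns how many elements named tagName the document contains.
    // The special name "*" matches every element, including the root.
    static int countElements(Document doc, String tagName) {
        return doc.getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java CountTags <file-or-URL> <tag-name>");
            return;
        }
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(args[0]);
        System.out.println(args[0] + " has " + countElements(doc, args[1])
                + " <" + args[1] + "> elements.");
    }
}
```

For example, java CountTags ch11_01.xml ITEM should report 6 <ITEM> elements. To pass "*" from a Unix-style shell, quote it ('*') so the shell does not expand it into file names.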