Creating a Parser

This first Java example will get us started by parsing an XML document and reporting how many times a certain element appears in it. In this chapter, I'm taking a look at using the XML DOM with Java, and I'll use the Java DocumentBuilder class, which creates a W3C DOM tree as its output. Here's the document we'll parse:

Listing ch11_01.xml
    <?xml version = "1.0" standalone="yes"?>
    <DOCUMENT>
        <CUSTOMER>
            <NAME>
                <LAST_NAME>Smith</LAST_NAME>
                <FIRST_NAME>Sam</FIRST_NAME>
            </NAME>
            <DATE>October 15, 2003</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Tomatoes</PRODUCT>
                    <NUMBER>8</NUMBER>
                    <PRICE>.25</PRICE>
                </ITEM>
                <ITEM>
                    <PRODUCT>Oranges</PRODUCT>
                    <NUMBER>24</NUMBER>
                    <PRICE>.98</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
        <CUSTOMER>
            <NAME>
                <LAST_NAME>Jones</LAST_NAME>
                <FIRST_NAME>Polly</FIRST_NAME>
            </NAME>
            <DATE>October 20, 2003</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Bread</PRODUCT>
                    <NUMBER>12</NUMBER>
                    <PRICE>.95</PRICE>
                </ITEM>
                <ITEM>
                    <PRODUCT>Apples</PRODUCT>
                    <NUMBER>6</NUMBER>
                    <PRICE>.50</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
        <CUSTOMER>
            <NAME>
                <LAST_NAME>Weber</LAST_NAME>
                <FIRST_NAME>Bill</FIRST_NAME>
            </NAME>
            <DATE>October 25, 2003</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Asparagus</PRODUCT>
                    <NUMBER>12</NUMBER>
                    <PRICE>.95</PRICE>
                </ITEM>
                <ITEM>
                    <PRODUCT>Lettuce</PRODUCT>
                    <NUMBER>6</NUMBER>
                    <PRICE>.50</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
    </DOCUMENT>

In this first example, the code will scan ch11_01.xml and report how many <CUSTOMER> elements the document has.

To start this program, I'll import the Java classes we'll need (which support the W3C DOM interfaces, such as Node and Element) and the XML parser classes we'll use:

    import javax.xml.parsers.*;
    import org.w3c.dom.*;
        .
        .
        .

I'll call this first program ch11_02.java, so the public class in that file is ch11_02:

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    public class ch11_02
    {
        public static void main(String[] args)
        {
            .
            .
            .
        }
    }

To parse the XML document, you need a DocumentBuilderFactory object, which you use to create an object of the DocumentBuilder class (it's called a document builder factory because you can use it to create parsers using Java classes from different parser vendors, not just the default Java XML parser that we'll use here):

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    public class ch11_02
    {
        public static void main(String[] args)
        {
            try {
                DocumentBuilderFactory dbf =
                    DocumentBuilderFactory.newInstance();
                DocumentBuilder db = null;
                try {
                    db = dbf.newDocumentBuilder();
                }
                catch (ParserConfigurationException pce) {}
                .
                .
                .
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }

You can find the constructor and methods of the DocumentBuilderFactory class in Table 11-1 and those of the DocumentBuilder class in Table 11-2.

Table 11-1. Methods of the javax.xml.parsers.DocumentBuilderFactory Class
Method Does This
protected DocumentBuilderFactory() The default constructor
abstract Object getAttribute (String name) Returns specific attribute values
boolean isCoalescing() Is true if the factory is configured to produce parsers that convert CDATA nodes to text nodes
boolean isExpandEntityReferences() Is true if the factory is configured to produce parsers that expand XML entity reference nodes
boolean isIgnoringComments() Is true if the factory will produce parsers that ignore comments
boolean isIgnoringElementContentWhitespace() Is true if the factory will produce parsers that ignore ignorable whitespace (such as that used to indent elements) in element content
boolean isNamespaceAware() Is true if the factory will produce parsers that can use XML namespaces
boolean isValidating() Is true if the factory will produce parsers that validate the XML content during parsing operations
abstract DocumentBuilder newDocumentBuilder() Creates a new DocumentBuilder object
static DocumentBuilderFactory newInstance() Returns a new DocumentBuilderFactory object
abstract void setAttribute(String name, Object value) Sets specific attributes
void setCoalescing(boolean coalescing) Requires the parser produced to convert CDATA nodes to text nodes
void setExpandEntityReferences(boolean expandEntityRef) Requires the parser produced to expand XML entity reference nodes
void setIgnoringComments(boolean ignoreComments) Requires the parser produced to ignore comments
void setIgnoringElementContentWhitespace(boolean whitespace) Requires the parsers created to eliminate ignorable whitespace
void setNamespaceAware(boolean awareness) Requires the parser produced to provide support for XML namespaces
void setValidating(boolean validating) Requires the parser produced to validate documents as they are parsed

Table 11-2. Methods of the javax.xml.parsers.DocumentBuilder Class
Method Does This
protected DocumentBuilder() The default constructor
abstract DOMImplementation getDOMImplementation() Returns a DOMImplementation object
abstract boolean isNamespaceAware() Is true if this parser is configured to understand namespaces
abstract boolean isValidating() Is true if this parser is configured to validate XML documents
abstract Document newDocument() Returns a new instance of a DOM Document object to build a DOM tree
Document parse(File f) Parses the content of the file as an XML document and returns a new DOM Document object
abstract Document parse(InputSource is) Parses the content of the specified source as an XML document and returns a new DOM Document object
Document parse(InputStream is) Parses the content of the specified InputStream as an XML document and returns a new DOM Document object
Document parse(InputStream is, String systemId) Parses the content of the specified InputStream as an XML document and returns a new DOM Document object
Document parse(String uri) Parses the content of the specified URI as an XML document and returns a new DOM Document object
abstract void setEntityResolver(EntityResolver er) Specifies the EntityResolver object to be used to resolve entities
abstract void setErrorHandler(ErrorHandler eh) Specifies the ErrorHandler to be used to report errors
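
As a rough illustration of how the two tables fit together, the sketch below configures a factory with a few of the setters from Table 11-1 and then asks the resulting parser about its configuration using methods from Table 11-2. The class name FactoryConfigSketch and the helper newConfiguredBuilder are made up for this example, and the particular settings are only assumptions about what a program might want:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical class name, used only for this illustration.
class FactoryConfigSketch {

    // Builds a parser from a factory configured with setters from Table 11-1.
    static DocumentBuilder newConfiguredBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);    // parsers will understand XML namespaces
        dbf.setIgnoringComments(true);  // comments are left out of the DOM tree
        dbf.setCoalescing(true);        // CDATA sections are converted to text nodes
        return dbf.newDocumentBuilder();
    }

    public static void main(String[] args) throws ParserConfigurationException {
        DocumentBuilder db = newConfiguredBuilder();
        // The builder reports the settings it inherited from the factory.
        System.out.println("Namespace aware: " + db.isNamespaceAware());
        System.out.println("Validating: " + db.isValidating());
    }
}
```

The settings must be applied to the factory before newDocumentBuilder is called; a builder that already exists is not affected by later changes to the factory.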

To actually parse the XML document, you use the parse method of the DocumentBuilder object. I'll let the user specify the name of the document to parse on the command line by passing args[0] to the parse method. Note that you don't need to pass the name of a local file to the parse method; you can pass the URL of a document on the Internet, and the parse method will retrieve that document.

Here's how you can use the parse method:

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    public class ch11_02
    {
        public static void main(String[] args)
        {
            try {
                DocumentBuilderFactory dbf =
                    DocumentBuilderFactory.newInstance();
                DocumentBuilder db = null;
                try {
                    db = dbf.newDocumentBuilder();
                }
                catch (ParserConfigurationException pce) {}

                Document doc = null;
                doc = db.parse(args[0]);
                .
                .
                .
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }
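
Table 11-2 lists several other versions of the parse method besides the one that takes a string. As a minimal sketch, assuming the XML is already held in memory as a string, the following uses the parse(InputStream) version instead. The class name ParseStreamSketch and the helper parseString are invented for this illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical class name, used only for this illustration.
class ParseStreamSketch {

    // Parses XML held in a string by wrapping its bytes in an InputStream.
    static Document parseString(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        return db.parse(in);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseString("<DOCUMENT><CUSTOMER/></DOCUMENT>");
        // Prints the name of the root element, DOCUMENT.
        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
    }
}
```

Whichever version of parse you call, the result is the same kind of object.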

If the document is successfully parsed, this code creates a Document object based on the W3C DOM. The Document interface is part of the W3C DOM, and you can find the methods of this interface in Table 11-3.

Table 11-3. Methods of the org.w3c.dom.Document Interface
Method Does This
Attr createAttribute(String name) Creates an Attr object of the specified name
Attr createAttributeNS(String namespaceURI, String qualifiedName) Creates an attribute of the specified name and namespace
CDATASection createCDATASection(String data) Creates a CDATASection node whose value is the specified string
Comment createComment(String data) Creates a Comment node using the specified string
DocumentFragment createDocumentFragment() Creates an empty DocumentFragment object
Element createElement(String tagName) Creates an element of the type specified
Element createElementNS(String namespaceURI, String qualifiedName) Creates an element of the specified qualified name and namespace uniform resource identifier (URI)
EntityReference createEntityReference(String name) Creates an EntityReference object
ProcessingInstruction createProcessingInstruction(String target, String data) Creates a ProcessingInstruction node
Text createTextNode(String data) Creates a text node using the specified string
DocumentType getDoctype() Returns the document type definition (DTD) for this document
Element getDocumentElement() Provides direct access to the Document element
Element getElementById(String elementId) Returns the element whose ID is specified
NodeList getElementsByTagName(String tagname) Returns all the elements with a specified tag name
NodeList getElementsByTagNameNS(String namespaceURI, String localName) Returns all the elements with a specified name and namespace
DOMImplementation getImplementation() Gets the DOMImplementation object that handles this document
Node importNode(Node importedNode, boolean deep) Imports a node from another document to this document
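
To give a feel for the create methods in Table 11-3, here is a small sketch that builds a fragment of the sample document in memory rather than parsing it. It starts from the newDocument method of Table 11-2; the class name BuildDocumentSketch and the helper buildSample are hypothetical names chosen for this example:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this illustration.
class BuildDocumentSketch {

    // Builds <DOCUMENT><CUSTOMER><LAST_NAME>Smith</LAST_NAME></CUSTOMER></DOCUMENT>.
    static Document buildSample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        Element root = doc.createElement("DOCUMENT");
        doc.appendChild(root);                   // the document element

        Element customer = doc.createElement("CUSTOMER");
        root.appendChild(customer);

        Element lastName = doc.createElement("LAST_NAME");
        lastName.appendChild(doc.createTextNode("Smith"));
        customer.appendChild(lastName);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildSample();
        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
        System.out.println("CUSTOMER elements: "
                + doc.getElementsByTagName("CUSTOMER").getLength());
    }
}
```

Note that the create methods only make new nodes; a node does not become part of the tree until it is attached with a method such as appendChild.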

The Document interface is based on the Node interface, which supports the W3C Node object. A Node object represents a single node in the document tree (as you recall, everything in the document tree, including text and comments, is treated as a node). The Node interface has many methods that you can use to work with nodes. For example, you can use methods such as getNodeName and getNodeValue to get information about the node, and we'll use this kind of information a great deal in this chapter. This interface also has data members, called fields, which hold constant values corresponding to various node types; we'll see them in this chapter as well. You'll find the Node interface fields in Table 11-4 and the methods of this interface in Table 11-5. As you see in Table 11-5, the Node interface contains all the standard W3C DOM methods for navigating in a document that we've already used with JavaScript in Chapter 7, "Handling XML Documents with JavaScript." These include getNextSibling, getPreviousSibling, getFirstChild, getLastChild, and getParentNode. We'll put those methods to work here as well.

Table 11-4. Node Interface Fields
Field Summary
static short ATTRIBUTE_NODE
static short CDATA_SECTION_NODE
static short COMMENT_NODE
static short DOCUMENT_FRAGMENT_NODE
static short DOCUMENT_NODE
static short DOCUMENT_TYPE_NODE
static short ELEMENT_NODE
static short ENTITY_NODE
static short ENTITY_REFERENCE_NODE
static short NOTATION_NODE
static short PROCESSING_INSTRUCTION_NODE
static short TEXT_NODE

Table 11-5. Methods of the org.w3c.dom.Node Interface
Method Does This
Node appendChild(Node newChild) Adds the specified node to the end of the list of children of the current node
Node cloneNode(boolean deep) Returns a duplicate of this node
NamedNodeMap getAttributes() Returns the attributes of this node if it is an element
NodeList getChildNodes() Returns all the children of this node
Node getFirstChild() Returns the first child of this node
Node getLastChild() Returns the last child of this node
String getLocalName() Returns the local part of the full name of this node
String getNamespaceURI() Returns the namespace URI of this node
Node getNextSibling() Returns the node following this node
String getNodeName() Returns the name of this node
short getNodeType() Returns the type of the node's object
String getNodeValue() Returns the value of this node
Document getOwnerDocument() Returns the Document object for this node
Node getParentNode() Returns the parent of this node
String getPrefix() Returns the namespace prefix of this node
Node getPreviousSibling() Returns the node preceding this node
boolean hasAttributes() Is true if this node has any attributes
boolean hasChildNodes() Is true if this node has any children
Node insertBefore(Node newChild, Node refChild) Inserts the new node before the existing reference child node
boolean isSupported(String feature, String version) Is true if the specific feature is implemented
void normalize() Puts all text nodes into XML "normal" form
Node removeChild(Node oldChild) Removes the child node and returns it
Node replaceChild(Node newChild, Node oldChild) Replaces the child node in the list of children and returns the old child node
void setNodeValue(String nodeValue) Sets the value of a node
void setPrefix(String prefix) Sets the namespace prefix of this node
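
The navigation methods in Table 11-5 and the node type fields in Table 11-4 are typically used together. The following is a minimal sketch, not a listing from this chapter, that walks the children of a node with getFirstChild and getNextSibling and uses Node.ELEMENT_NODE to pick out the elements. The class name NodeWalkSketch, the helper countElementChildren, and the tiny inline document are assumptions made for the example:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical class name, used only for this illustration.
class NodeWalkSketch {

    // Counts the element children of a node, skipping text, comments,
    // and any other node types.
    static int countElementChildren(Node parent) {
        int count = 0;
        for (Node child = parent.getFirstChild();
                child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // A small indented document; the indentation becomes text nodes.
        String xml = "<DOCUMENT>\n  <CUSTOMER/>\n  <CUSTOMER/>\n</DOCUMENT>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Node root = doc.getDocumentElement();
        System.out.println("All child nodes: " + root.getChildNodes().getLength());
        System.out.println("Element children: " + countElementChildren(root));
    }
}
```

The two counts differ because the whitespace used to indent the elements is stored in the tree as text nodes, which is why checking getNodeType matters when you navigate by hand.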

At this point, we've gotten access to the root node of the document in Java. Our goal here is to check how many <CUSTOMER> elements the document has. I'll use the getElementsByTagName method to get a Java NodeList object containing a list of all <CUSTOMER> elements:

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    public class ch11_02
    {
        public static void main(String[] args)
        {
            try {
                DocumentBuilderFactory dbf =
                    DocumentBuilderFactory.newInstance();
                DocumentBuilder db = null;
                try {
                    db = dbf.newDocumentBuilder();
                }
                catch (ParserConfigurationException pce) {}

                Document doc = null;
                doc = db.parse(args[0]);

                NodeList nodelist = doc.getElementsByTagName("CUSTOMER");
                .
                .
                .
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }

The NodeList interface supports an ordered collection of nodes. You can access nodes in such a collection by index, and we'll do that in this chapter. You can find the methods of the NodeList interface in Table 11-6.

Table 11-6. NodeList Interface Methods
Method Summary
int getLength() Gets the number of nodes in this list
Node item(int index) Gets the item at index in the collection
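
Accessing a NodeList by index might look something like the sketch below, which loops over the <CUSTOMER> elements with getLength and item and collects the text of each customer's <LAST_NAME> element. The class name CustomerNamesSketch and the helper lastNames are hypothetical, and the code assumes a document laid out like ch11_01.xml:

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical class name, used only for this illustration.
class CustomerNamesSketch {

    // Returns the text of the first LAST_NAME element inside each CUSTOMER.
    static List<String> lastNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList customers = doc.getElementsByTagName("CUSTOMER");
        for (int i = 0; i < customers.getLength(); i++) {
            // getElementsByTagName returns only elements, so the cast is safe.
            Element customer = (Element) customers.item(i);
            NodeList lastNameNodes = customer.getElementsByTagName("LAST_NAME");
            if (lastNameNodes.getLength() > 0) {
                // The element's text is held in its first child, a text node.
                Node text = lastNameNodes.item(0).getFirstChild();
                if (text != null) {
                    names.add(text.getNodeValue());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(args[0]);
        System.out.println("Last names: " + lastNames(doc));
    }
}
```

You would run this the same way as the other examples in this section, passing the name of the XML document on the command line.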

If you take a look at Table 11-6, you'll see that the NodeList interface supports a getLength method that returns the number of nodes in the list. This means that we can find how many <CUSTOMER> elements there are in the document like this:

Listing ch11_02.java
    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    public class ch11_02
    {
        public static void main(String[] args)
        {
            try {
                // Create the factory and use it to create a parser
                DocumentBuilderFactory dbf =
                    DocumentBuilderFactory.newInstance();
                DocumentBuilder db = null;
                try {
                    db = dbf.newDocumentBuilder();
                }
                catch (ParserConfigurationException pce) {}

                // Parse the document named on the command line
                Document doc = null;
                doc = db.parse(args[0]);

                // Collect every <CUSTOMER> element and report how many there are
                NodeList nodelist = doc.getElementsByTagName("CUSTOMER");

                System.out.println(args[0] + " has " + nodelist.getLength() +
                    " <CUSTOMER> elements.");
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }

And that's it. You can see the results of this code here, indicating that ch11_01.xml has three <CUSTOMER> elements, which is correct:

    %java ch11_02 ch11_01.xml
    ch11_01.xml has 3 <CUSTOMER> elements.

That's all it takes to get started with the Java XML parsers.
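
If you want friendlier error reporting than a stack trace, one possible variation on ch11_02 is sketched below. It checks that a document name was supplied and handles the checked exceptions declared by newDocumentBuilder and parse separately. The class name CountCustomersSketch, the helper countElements, and the wording of the messages are all assumptions made for this illustration:

```java
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class CountCustomersSketch {

    // Parses the document at the given file name or URL and counts
    // the elements with the given tag name.
    static int countElements(String uri, String tagName)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(uri);
        return doc.getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java CountCustomersSketch <file-or-URL>");
            return;
        }
        try {
            int count = countElements(args[0], "CUSTOMER");
            System.out.println(args[0] + " has " + count + " <CUSTOMER> elements.");
        } catch (ParserConfigurationException e) {
            System.err.println("Could not create a parser: " + e.getMessage());
        } catch (SAXException e) {
            System.err.println("Could not parse " + args[0] + ": " + e.getMessage());
        } catch (IOException e) {
            System.err.println("Could not read " + args[0] + ": " + e.getMessage());
        }
    }
}
```

Run against ch11_01.xml, this variation reports the same count as ch11_02; the difference shows up only when something goes wrong.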



Real World XML
Real World XML (2nd Edition)
ISBN: 0735712867
EAN: 2147483647
Year: 2005
Pages: 440
Authors: Steve Holzner
