Parsing XML with DOM | Java Phrasebook

File file = new File("document.xml");
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
DocumentBuilder p = f.newDocumentBuilder();
Document doc = p.parse(file);

DocumentBuilderFactory, DocumentBuilder, and Document are the three types that we use to kick off the parsing of an XML document with a DOM parser. The DocumentBuilder class performs the parsing: it defines the API for obtaining DOM Document instances from an XML document. It can parse XML from a variety of input sources, including InputStreams, Files, URI strings, and SAX InputSources. In this phrase, we parse the XML from a File input source. The parse() method of the DocumentBuilder class parses the XML document and returns a Document object, which represents the DOM of the XML document. From the Document instance, you can then pull the document apart and get at the components that make up the XML document, such as its entities, elements, and attributes.
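To illustrate the other input sources mentioned above, here is a minimal sketch that parses the same XML from a String (wrapped in a SAX InputSource) and from an InputStream. The class name DomParseSources and its helper methods are hypothetical names chosen for this example; only the DocumentBuilder.parse() overloads come from the standard API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class DomParseSources {

    // Parses XML held in a String by wrapping it in a SAX InputSource.
    public static Document parseString(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Parses XML from any InputStream (file, socket, byte array, and so on).
    public static Document parseStream(InputStream in) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<Location><Address><City>Flat Rock</City></Address></Location>";

        Document fromString = parseString(xml);
        Document fromStream = parseStream(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Both documents have the same root element name.
        System.out.println(fromString.getDocumentElement().getTagName());
        System.out.println(fromStream.getDocumentElement().getTagName());
    }
}
```

Whichever overload is used, the result is the same kind of Document object, so the rest of the code does not need to know where the XML came from.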

The Document object is a container for a hierarchical collection of Node objects that represent the XML document's structure. A node can have a parent, children, and attributes associated with it. Three main subinterfaces of the Node type represent the major parts of an XML document: Element, Text, and Attr. Next, we show an example of further parsing a DOM using the Document class. Below is the sample XML document that we will use:

<Location>
  <Address>
    <City>Flat Rock</City>
    <State>Michigan</State>
  </Address>
</Location>
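To make the Node hierarchy concrete, the following sketch walks a parsed document recursively and records each Element and each non-blank Text node, indented by depth. The class DomTreeWalker and its methods are hypothetical names introduced for this illustration; whitespace-only text nodes (the indentation between tags) are skipped as a simplifying assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class DomTreeWalker {

    // Appends one line per Element or non-blank Text node, indented by depth.
    public static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            indent(out, depth);
            out.append("Element: ").append(node.getNodeName()).append('\n');
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                indent(out, depth);
                out.append("Text: ").append(text).append('\n');
            }
        }
        // Recurse into the children, one level deeper.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    private static void indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    // Parses the XML string and returns the outline of its node tree.
    public static String describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe(
            "<Location><Address><City>Flat Rock</City>"
            + "<State>Michigan</State></Address></Location>"));
    }
}
```

For the sample document, the outline shows that the city and state values are Text nodes nested one level below their City and State elements.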

Assuming that we've already obtained a Document instance using the parse technique demonstrated in the previous phrase, the Java code here will pull out the city and state text values:

NodeList cityList = doc.getElementsByTagName("City");
Element cityEl = (Element) cityList.item(0);
String city = ((Text) cityEl.getFirstChild()).getData();

NodeList stateList = doc.getElementsByTagName("State");
Element stateEl = (Element) stateList.item(0);
String state = ((Text) stateEl.getFirstChild()).getData();

The getElementsByTagName() method returns a NodeList containing all the elements that match the name passed in. Since our sample document contains only one City element and one State element, we just take the first (index zero) element out of each node list and cast it to the Element type. The City and State elements each have one child, which is a Text node. We use the getData() method of the Text type to get the actual value for the city and state.
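The lookup described above assumes that each element exists and has exactly one Text child. As a slightly more defensive sketch, the hypothetical helper below returns null when no matching element is found and uses Node.getTextContent(), a different standard DOM method from the getFirstChild()/getData() pair used in this phrase, to read the element's text. The class name LocationReader and the method firstText() are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class LocationReader {

    // Returns the text of the first element with the given tag name,
    // or null if the document contains no such element.
    public static String firstText(Document doc, String tagName) {
        NodeList list = doc.getElementsByTagName(tagName);
        if (list.getLength() == 0) {
            return null;
        }
        return list.item(0).getTextContent();
    }

    // Convenience method that parses an XML string into a Document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<Location><Address><City>Flat Rock</City>"
            + "<State>Michigan</State></Address></Location>");

        System.out.println(firstText(doc, "City"));   // Flat Rock
        System.out.println(firstText(doc, "State"));  // Michigan
        System.out.println(firstText(doc, "Zip"));    // null
    }
}
```

Checking getLength() first avoids a NullPointerException when item(0) returns null for a missing element.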

Unlike a SAX parser, a DOM parser reads an entire XML document into memory and builds a tree from it; the document is then processed from that in-memory tree. In this regard, a SAX parser is more memory efficient, because it does not store the entire XML document in memory. Instead, SAX scans the document in a streaming style.
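For comparison, here is a minimal sketch of the streaming style using the standard SAX API. It reads the text of the first element with a given name without building a tree. The class SaxTextReader, its nested handler, and the firstText() method are hypothetical names for this illustration, and the sketch assumes the target element does not contain a nested element with the same name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
public class SaxTextReader {

    // Collects the character data of the first element named `target`.
    static class TextCollector extends DefaultHandler {
        private final String target;
        private final StringBuilder text = new StringBuilder();
        private boolean inside = false;
        private boolean done = false;

        TextCollector(String target) {
            this.target = target;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if (!done && qName.equals(target)) {
                inside = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one run of text.
            if (inside) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inside && qName.equals(target)) {
                inside = false;
                done = true;
            }
        }
    }

    // Returns the text of the first matching element, or "" if none exists.
    public static String firstText(String xml, String tag) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector(tag);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Location><Address><City>Flat Rock</City>"
            + "<State>Michigan</State></Address></Location>";
        System.out.println(firstText(xml, "City"));
        System.out.println(firstText(xml, "State"));
    }
}
```

The handler keeps only the text it is asked for, so memory use stays small regardless of document size; the trade-off is that the code must track its own position in the document instead of navigating a tree.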