Displaying an Entire Document | Real World XML (2nd Edition)

In this next example, I'm going to write a program that will parse and display an entire document, indenting each element, processing instruction, and so on, as well as displaying attributes and their values. For example, if you pass ch11_01.xml to this program, which I'll call ch11_03.java, that program will display the whole document properly indented.

I start by letting the user specify what document to parse and parsing that document as before. To actually parse the document, I'll call a new method, displayDocument, from the main method:

    public static void main(String args[])
    {
        displayDocument(args[0]);
        .
        .
        .
    }

In the displayDocument method, I'll parse the document and get an object corresponding to that document:

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    public class ch11_03
    {
        static String displayStrings[] = new String[1000];
        static int numberDisplayLines = 0;

        public static void displayDocument(String uri)
        {
            try {
                DocumentBuilderFactory dbf =
                    DocumentBuilderFactory.newInstance();
                DocumentBuilder db = null;
                try {
                    db = dbf.newDocumentBuilder();
                }
                catch (ParserConfigurationException pce) {}

                Document document = null;
                document = db.parse(uri);
                .
                .
                .
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        .
        .
        .

The actual method that will walk through the parsed document, display, will be recursive (we saw recursion when working with JavaScript). I'll pass the document to that method, as well as the current indentation string (which will grow by four spaces for every successive level of recursion):

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    public class ch11_03
    {
        static String displayStrings[] = new String[1000];
        static int numberDisplayLines = 0;

        public static void displayDocument(String uri)
        {
            try {
                DocumentBuilderFactory dbf =
                    DocumentBuilderFactory.newInstance();
                DocumentBuilder db = null;
                try {
                    db = dbf.newDocumentBuilder();
                }
                catch (ParserConfigurationException pce) {}

                Document document = null;
                document = db.parse(uri);

                display(document, "");
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        .
        .
        .
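If you want to experiment with the parsing step without a file on disk, here's a minimal sketch that parses XML held in a string. The class name ParseSketch and the method parseString are my own hypothetical names, not part of ch11_03.java. Unlike the listing above, this version doesn't swallow ParserConfigurationException; it rethrows any failure unchecked so that you can see the real cause.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of ch11_03.java.
class ParseSketch {

    // Parses XML held in a string and returns the DOM Document.
    // Configuration, syntax, and I/O failures are rethrown unchecked
    // so that the caller sees the original cause.
    static Document parseString(String xml) {
        try {
            DocumentBuilder db =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        Document document = parseString("<DOCUMENT><CUSTOMER/></DOCUMENT>");
        // Print the name of the root element.
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```

The parse method that takes an InputSource is handy for tests; the parse(String uri) overload used in ch11_03.java is the one to use for files and URLs.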

In the display method, I'll check to see whether the node passed to us is null and, if so, return. The next job is to display the node. How we do that depends on the type of node we're working with. To get the type of node, you can use the node's getNodeType method. I'll set up a long switch statement to handle the different types:

    public static void display(Node node, String indent)
    {
        if (node == null) {
            return;
        }

        int type = node.getNodeType();

        switch (type) {
            .
            .
            .
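To get a feel for what getNodeType returns, here's a small sketch that maps the type constants used in this section to readable names. The class NodeTypeSketch and its describe method are hypothetical names of my own; the constants themselves come from the Node interface.

```java
import org.w3c.dom.Node;

// Hypothetical helper class, not part of ch11_03.java.
class NodeTypeSketch {

    // Returns a readable name for a DOM node type code.
    // getNodeType returns a short, so the parameter is a short as well.
    static String describe(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE:               return "document";
            case Node.ELEMENT_NODE:                return "element";
            case Node.ATTRIBUTE_NODE:              return "attribute";
            case Node.TEXT_NODE:                   return "text";
            case Node.CDATA_SECTION_NODE:          return "CDATA section";
            case Node.PROCESSING_INSTRUCTION_NODE: return "processing instruction";
            case Node.COMMENT_NODE:                return "comment";
            default:                               return "other (" + type + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Node.ELEMENT_NODE));
        System.out.println(describe(Node.TEXT_NODE));
    }
}
```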

To handle output from this program, I'll create an array of strings, displayStrings, placing each line of the output into one of those strings. I'll also store our current location in that array in an integer named numberDisplayLines:

    public class ch11_03
    {
        static String displayStrings[] = new String[1000];
        static int numberDisplayLines = 0;
        .
        .
        .

I'll start handling various types of nodes in this switch statement now.

Handling Document Nodes

At the top of the DOM tree is the document node; the type of this node matches the constant Node.DOCUMENT_NODE defined in the Node interface (see Table 11-6). The DOM doesn't hand us the XML declaration as a node of its own, so when I reach the document node, I'll start the first line of output with the current indent string, followed by a default XML declaration. That declaration takes up one line of output.

The next step is to get the document element of the document we're parsing (the root element), and you do that with the getDocumentElement method. The root element contains all other elements, so I pass that element to the display method, which will display all those elements:

    public static void display(Node node, String indent)
    {
        if (node == null) {
            return;
        }

        int type = node.getNodeType();

        switch (type) {
            case Node.DOCUMENT_NODE: {
                displayStrings[numberDisplayLines] = indent;
                displayStrings[numberDisplayLines] +=
                    "<?xml version=\"1.0\" encoding=\"" + "UTF-8" + "\"?>";
                numberDisplayLines++;
                display(((Document)node).getDocumentElement(), "");
                break;
            }
            .
            .
            .
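The code above always writes the same default declaration. As a sketch of an alternative, the DOM Level 3 methods getXmlVersion and getXmlEncoding on the Document interface report what the parser recorded for the document. The class DocumentNodeSketch and its method names are hypothetical. The sketch also counts the document node's direct children, which shows that a processing instruction or comment placed before the root element is a child of the document node; because display recurses only into getDocumentElement, nodes like that are not visited by ch11_03.java.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of ch11_03.java.
class DocumentNodeSketch {

    // Parses XML from a string; failures are rethrown unchecked.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Builds an XML declaration from the version and encoding the parser recorded.
    static String declaration(Document doc) {
        String encoding = doc.getXmlEncoding();   // null when no encoding was declared
        return "<?xml version=\"" + doc.getXmlVersion() + "\""
            + (encoding != null ? " encoding=\"" + encoding + "\"" : "")
            + "?>";
    }

    // Counts the direct children of the document node.
    static int topLevelNodeCount(Document doc) {
        return doc.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        Document doc = parse("<?xml-stylesheet href=\"s.css\"?><!--note--><DOCUMENT/>");
        System.out.println(declaration(doc));
        System.out.println(topLevelNodeCount(doc));
    }
}
```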

Handling Element Nodes

To handle an element node, we should display the name of the element as well as any attributes the element has. I start by checking whether the current node type is Node.ELEMENT_NODE; if so, I place the current indent string into a display string, followed by a < and the element's name, which I can get with the getNodeName method:

    switch (type) {
        .
        .
        .
        case Node.ELEMENT_NODE: {
            displayStrings[numberDisplayLines] = indent;
            displayStrings[numberDisplayLines] += "<";
            displayStrings[numberDisplayLines] += node.getNodeName();
            .
            .
            .

Handling Attributes

Now we've got to handle the attributes of this element, if it has any. Because the current node is an element node, you can use the method getAttributes to get a NamedNodeMap object that holds all its attributes, which are stored as Attr objects. I'll copy the items in that map into an array of Attr objects, attributes, like this. Note that I create the attributes array only after finding the number of items in the NamedNodeMap object with the getLength method:

    switch (type) {
        .
        .
        .
        case Node.ELEMENT_NODE: {
            displayStrings[numberDisplayLines] = indent;
            displayStrings[numberDisplayLines] += "<";
            displayStrings[numberDisplayLines] += node.getNodeName();

            int length = (node.getAttributes() != null) ?
                node.getAttributes().getLength() : 0;
            Attr attributes[] = new Attr[length];
            for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                attributes[loopIndex] =
                    (Attr)node.getAttributes().item(loopIndex);
            }
            .
            .
            .

You can find the methods of the Attr interface in Table 11-7.

Table 11-7. Attr Interface Methods

Method	Summary
`java.lang.String getName()`	Gets the name of this attribute
`Element getOwnerElement()`	Gets the `Element` node this attribute is attached to
`boolean getSpecified()`	Is `true` if this attribute was explicitly given a value in the original document
`java.lang.String getValue()`	Gets the value of the attribute as a string
`void setValue(String value)`	Sets the value of the attribute as a string

Because the Attr interface is built on the Node interface, you can use either the getNodeName and getNodeValue methods to get the attribute's name and value, or the Attr getName and getValue methods. I'll use getNodeName and getNodeValue here. In this case, I'm going to loop over all the attributes in the attributes array, adding them to the current display like this: AttrName="AttrValue" (note that I escape the quotation marks around the attribute values as \" so that Java doesn't interpret them as the end of the string):

    switch (type) {
        .
        .
        .
        case Node.ELEMENT_NODE: {
            displayStrings[numberDisplayLines] = indent;
            displayStrings[numberDisplayLines] += "<";
            displayStrings[numberDisplayLines] += node.getNodeName();

            int length = (node.getAttributes() != null) ?
                node.getAttributes().getLength() : 0;
            Attr attributes[] = new Attr[length];
            for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                attributes[loopIndex] =
                    (Attr)node.getAttributes().item(loopIndex);
            }

            for (int loopIndex = 0; loopIndex < attributes.length; loopIndex++) {
                Attr attribute = attributes[loopIndex];
                displayStrings[numberDisplayLines] += " ";
                displayStrings[numberDisplayLines] += attribute.getNodeName();
                displayStrings[numberDisplayLines] += "=\"";
                displayStrings[numberDisplayLines] += attribute.getNodeValue();
                displayStrings[numberDisplayLines] += "\"";
            }
            displayStrings[numberDisplayLines] += ">";
            numberDisplayLines++;
            .
            .
            .
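The two pairs of accessors mentioned above can be compared directly. Here's a minimal sketch, assuming a hypothetical class AttrSketch with helper methods of my own naming; it reads the attributes straight from the map that getAttributes returns, without the intermediate array, and uses the Attr methods getName and getValue from Table 11-7.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of ch11_03.java.
class AttrSketch {

    // Parses XML from a string; failures are rethrown unchecked.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Formats every attribute of an element as  NAME="VALUE".
    static String attributeText(Element element) {
        StringBuilder text = new StringBuilder();
        NamedNodeMap map = element.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            Attr attribute = (Attr) map.item(i);
            text.append(' ').append(attribute.getName())
                .append("=\"").append(attribute.getValue()).append('"');
        }
        return text.toString();
    }

    // Checks that the Attr accessors and the Node accessors report the same strings.
    static boolean accessorsAgree(Attr attribute) {
        return attribute.getName().equals(attribute.getNodeName())
            && attribute.getValue().equals(attribute.getNodeValue());
    }

    public static void main(String[] args) {
        Element customer = parse("<CUSTOMER STATUS=\"Good\"/>").getDocumentElement();
        System.out.println("<" + customer.getNodeName() + attributeText(customer) + ">");
        System.out.println(accessorsAgree(customer.getAttributeNode("STATUS")));
    }
}
```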

This element might have child elements, of course, and we've got to handle them as well. I do that by storing all the child nodes in a NodeList object with the getChildNodes method. If there are any child nodes, I add four spaces to the indent string and loop over those child nodes, calling display to display each of them:

    switch (type) {
        .
        .
        .
        case Node.ELEMENT_NODE: {
            displayStrings[numberDisplayLines] = indent;
            displayStrings[numberDisplayLines] += "<";
            displayStrings[numberDisplayLines] += node.getNodeName();

            int length = (node.getAttributes() != null) ?
                node.getAttributes().getLength() : 0;
            Attr attributes[] = new Attr[length];
            for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                attributes[loopIndex] =
                    (Attr)node.getAttributes().item(loopIndex);
            }

            for (int loopIndex = 0; loopIndex < attributes.length; loopIndex++) {
                Attr attribute = attributes[loopIndex];
                displayStrings[numberDisplayLines] += " ";
                displayStrings[numberDisplayLines] += attribute.getNodeName();
                displayStrings[numberDisplayLines] += "=\"";
                displayStrings[numberDisplayLines] += attribute.getNodeValue();
                displayStrings[numberDisplayLines] += "\"";
            }
            displayStrings[numberDisplayLines] += ">";
            numberDisplayLines++;

            NodeList childNodes = node.getChildNodes();
            if (childNodes != null) {
                length = childNodes.getLength();
                indent += "    ";
                for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                    display(childNodes.item(loopIndex), indent);
                }
            }
            break;
        }
        .
        .
        .

That's it for handling elements. I'll handle CDATA sections next.

Handling CDATA Section Nodes

Handling CDATA sections is particularly easy. All I have to do here is enclose the value of the CDATA section's node inside "<![CDATA[" and "]]>", and that looks like this:

        case Node.CDATA_SECTION_NODE: {
            displayStrings[numberDisplayLines] = indent;
            displayStrings[numberDisplayLines] += "<![CDATA[";
            displayStrings[numberDisplayLines] += node.getNodeValue();
            displayStrings[numberDisplayLines] += "]]>";
            numberDisplayLines++;
            break;
        }
        .
        .
        .
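Here's a small sketch you can use to see a CDATA section node on its own. The class CdataSketch and the method renderFirstChild are hypothetical names. It assumes the parser factory's default settings, under which CDATA sections are kept as separate nodes rather than merged into the surrounding text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of ch11_03.java.
class CdataSketch {

    // Parses XML from a string; failures are rethrown unchecked.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Renders the first child of the root element; a CDATA section is
    // wrapped in its delimiters again, anything else is returned as its value.
    static String renderFirstChild(String xml) {
        Node child = parse(xml).getDocumentElement().getFirstChild();
        if (child == null) {
            return "";
        }
        if (child.getNodeType() == Node.CDATA_SECTION_NODE) {
            return "<![CDATA[" + child.getNodeValue() + "]]>";
        }
        return String.valueOf(child.getNodeValue());
    }

    public static void main(String[] args) {
        System.out.println(renderFirstChild("<NOTE><![CDATA[a < b && c]]></NOTE>"));
    }
}
```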

Handling Text Nodes

The W3C DOM specifies that the text in elements must be stored in text nodes, and those nodes have the type Node.TEXT_NODE. For these nodes, I'll add the current indent string to the display string and then trim off leading and trailing whitespace from the node's value with the Java String object's trim method:

        case Node.TEXT_NODE: {
            displayStrings[numberDisplayLines] = indent;
            String newText = node.getNodeValue().trim();
            .
            .
            .

XML parsers treat all text as text nodes, including the spaces used for indenting elements in ch11_01.xml. I'll filter out the text nodes corresponding to indentation spacing. If a text node contains only displayable text, on the other hand, I'll add that text to the strings in the displayStrings array:

        case Node.TEXT_NODE: {
            displayStrings[numberDisplayLines] = indent;
            String newText = node.getNodeValue().trim();
            if (newText.indexOf("\n") < 0 && newText.length() > 0) {
                displayStrings[numberDisplayLines] += newText;
                numberDisplayLines++;
            }
            break;
        }
        .
        .
        .
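To see the indentation text nodes for yourself, here's a sketch that counts the whitespace-only children of an element and repeats the test used above in a small method. The class TextNodeSketch and its method names are hypothetical. Note what the test does with text that still contains a newline after trimming: it is skipped along with the indentation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of ch11_03.java.
class TextNodeSketch {

    // Parses XML from a string; failures are rethrown unchecked.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Same condition as the TEXT_NODE case: keep trimmed text that is
    // non-empty and contains no newline.
    static boolean isDisplayed(String rawText) {
        String trimmed = rawText.trim();
        return trimmed.indexOf("\n") < 0 && trimmed.length() > 0;
    }

    // Counts the child text nodes that hold nothing but whitespace.
    static int whitespaceOnlyChildren(Element element) {
        int count = 0;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Element root =
            parse("<DOCUMENT>\n    <NAME>Sam</NAME>\n</DOCUMENT>").getDocumentElement();
        System.out.println(whitespaceOnlyChildren(root));
        System.out.println(isDisplayed("  Sam  "));
        System.out.println(isDisplayed("line one\nline two"));
    }
}
```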

Handling Processing Instruction Nodes

The W3C DOM also lets you handle processing instructions. Here, the node type is Node.PROCESSING_INSTRUCTION_NODE, the node name is the processing instruction's target, and the node value is the rest of the instruction. For example, if the processing instruction is <?xml-stylesheet type="text/css" href="style.css"?>, getNodeName returns xml-stylesheet and getNodeValue returns type="text/css" href="style.css". That means all we have to do is place the name, a space, and the value between <? and ?>. Here's what the code looks like:

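        case Node.PROCESSING_INSTRUCTION_NODE: {
            displayStrings[numberDisplayLines] = indent;
            displayStrings[numberDisplayLines] += "<?";
            // The node name is the target, such as xml-stylesheet
            displayStrings[numberDisplayLines] += node.getNodeName();
            String text = node.getNodeValue();
            if (text != null && text.length() > 0) {
                // Separate the target from the data with a space
                displayStrings[numberDisplayLines] += " " + text;
            }
            displayStrings[numberDisplayLines] += "?>";
            numberDisplayLines++;
            break;
        }
    }
    .
    .
    .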
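In the DOM, a processing instruction node keeps its target and its data separately: getNodeName (or getTarget on the ProcessingInstruction interface) returns the target, and getNodeValue (or getData) returns everything after it. Here's a minimal sketch that puts the two back together; the class PiSketch and its method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of ch11_03.java.
class PiSketch {

    // Parses XML from a string; failures are rethrown unchecked.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Returns the first child of the document node, which this sketch
    // assumes is a processing instruction.
    static ProcessingInstruction firstInstruction(String xml) {
        return (ProcessingInstruction) parse(xml).getFirstChild();
    }

    // Rebuilds the instruction from its target and data.
    static String render(ProcessingInstruction pi) {
        String data = pi.getData();
        boolean hasData = data != null && !data.isEmpty();
        return "<?" + pi.getTarget() + (hasData ? " " + data : "") + "?>";
    }

    public static void main(String[] args) {
        ProcessingInstruction pi = firstInstruction(
            "<?xml-stylesheet type=\"text/css\" href=\"style.css\"?><DOCUMENT/>");
        System.out.println(pi.getNodeName());
        System.out.println(pi.getNodeValue());
        System.out.println(render(pi));
    }
}
```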

And that finishes the switch statement that handles the various types of nodes. There's only one more point to cover.

Closing Element Tags

Displaying element nodes takes a little more thought than displaying other types of nodes because, in addition to displaying < , the name of the element, and > , you also have to display a closing tag, </ , the name of the element, and > at the end of the element.

For that reason, I'll place some code after the switch statement to add closing tags to elements after all their children have been displayed (note that I'm also subtracting four spaces from the indent string, using the Java String substring method, so that the closing tag lines up vertically with the opening tag):

        if (type == Node.ELEMENT_NODE) {
            displayStrings[numberDisplayLines] =
                indent.substring(0, indent.length() - 4);
            displayStrings[numberDisplayLines] += "</";
            displayStrings[numberDisplayLines] += node.getNodeName();
            displayStrings[numberDisplayLines] += ">";
            numberDisplayLines++;
            indent += "    ";
        }
    }

And that's it. Here's the entire code, ch11_03.java:

Listing ch11_03.java

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    public class ch11_03
    {
        static String displayStrings[] = new String[1000];
        static int numberDisplayLines = 0;

        public static void displayDocument(String uri)
        {
            try {
                DocumentBuilderFactory dbf =
                    DocumentBuilderFactory.newInstance();
                DocumentBuilder db = null;
                try {
                    db = dbf.newDocumentBuilder();
                }
                catch (ParserConfigurationException pce) {}

                Document document = null;
                document = db.parse(uri);

                display(document, "");
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }

        public static void display(Node node, String indent)
        {
            if (node == null) {
                return;
            }

            int type = node.getNodeType();

            switch (type) {
                case Node.DOCUMENT_NODE: {
                    displayStrings[numberDisplayLines] = indent;
                    displayStrings[numberDisplayLines] +=
                        "<?xml version=\"1.0\" encoding=\"" + "UTF-8" + "\"?>";
                    numberDisplayLines++;
                    display(((Document)node).getDocumentElement(), "");
                    break;
                }

                case Node.ELEMENT_NODE: {
                    displayStrings[numberDisplayLines] = indent;
                    displayStrings[numberDisplayLines] += "<";
                    displayStrings[numberDisplayLines] += node.getNodeName();

                    int length = (node.getAttributes() != null) ?
                        node.getAttributes().getLength() : 0;
                    Attr attributes[] = new Attr[length];
                    for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                        attributes[loopIndex] =
                            (Attr)node.getAttributes().item(loopIndex);
                    }

                    for (int loopIndex = 0; loopIndex < attributes.length; loopIndex++) {
                        Attr attribute = attributes[loopIndex];
                        displayStrings[numberDisplayLines] += " ";
                        displayStrings[numberDisplayLines] += attribute.getNodeName();
                        displayStrings[numberDisplayLines] += "=\"";
                        displayStrings[numberDisplayLines] += attribute.getNodeValue();
                        displayStrings[numberDisplayLines] += "\"";
                    }
                    displayStrings[numberDisplayLines] += ">";
                    numberDisplayLines++;

                    NodeList childNodes = node.getChildNodes();
                    if (childNodes != null) {
                        length = childNodes.getLength();
                        indent += "    ";
                        for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                            display(childNodes.item(loopIndex), indent);
                        }
                    }
                    break;
                }

                case Node.CDATA_SECTION_NODE: {
                    displayStrings[numberDisplayLines] = indent;
                    displayStrings[numberDisplayLines] += "<![CDATA[";
                    displayStrings[numberDisplayLines] += node.getNodeValue();
                    displayStrings[numberDisplayLines] += "]]>";
                    numberDisplayLines++;
                    break;
                }

                case Node.TEXT_NODE: {
                    displayStrings[numberDisplayLines] = indent;
                    String newText = node.getNodeValue().trim();
                    if (newText.indexOf("\n") < 0 && newText.length() > 0) {
                        displayStrings[numberDisplayLines] += newText;
                        numberDisplayLines++;
                    }
                    break;
                }

                case Node.PROCESSING_INSTRUCTION_NODE: {
                    displayStrings[numberDisplayLines] = indent;
                    displayStrings[numberDisplayLines] += "<?";
                    // The node name is the target, such as xml-stylesheet
                    displayStrings[numberDisplayLines] += node.getNodeName();
                    String text = node.getNodeValue();
                    if (text != null && text.length() > 0) {
                        // Separate the target from the data with a space
                        displayStrings[numberDisplayLines] += " " + text;
                    }
                    displayStrings[numberDisplayLines] += "?>";
                    numberDisplayLines++;
                    break;
                }
            }

            if (type == Node.ELEMENT_NODE) {
                displayStrings[numberDisplayLines] =
                    indent.substring(0, indent.length() - 4);
                displayStrings[numberDisplayLines] += "</";
                displayStrings[numberDisplayLines] += node.getNodeName();
                displayStrings[numberDisplayLines] += ">";
                numberDisplayLines++;
                indent += "    ";
            }
        }

        public static void main(String args[])
        {
            displayDocument(args[0]);

            for (int loopIndex = 0; loopIndex < numberDisplayLines; loopIndex++) {
                System.out.println(displayStrings[loopIndex]);
            }
        }
    }
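The listing stores its output in a fixed array of 1,000 strings, so a document that produces more lines than that will overflow it. As a sketch of one alternative, and not the approach this chapter uses, you can collect the lines in a java.util.List that grows as needed, and pass a longer indent string down to the children instead of trimming it back afterward. The class ListOutputSketch and its methods are hypothetical, and to keep the sketch short it handles only elements and non-blank text, leaving out attributes and the other node types.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of ch11_03.java.
class ListOutputSketch {

    // Parses XML from a string and returns the indented output lines.
    static List<String> lines(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Adds one line per opening tag, closing tag, and non-blank text node.
    static void walk(Node node, String indent, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(indent + "<" + node.getNodeName() + ">");
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                // Children get four more spaces; this call's indent is unchanged.
                walk(children.item(i), indent + "    ", out);
            }
            out.add(indent + "</" + node.getNodeName() + ">");
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(indent + text);
            }
        }
    }

    public static void main(String[] args) {
        for (String line : lines("<DOCUMENT><NAME>Sam</NAME></DOCUMENT>")) {
            System.out.println(line);
        }
    }
}
```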

I'll parse and display ch11_01.xml like this after compiling ch11_03.java. In this case, I'll pipe the output through the more filter to stop it from scrolling off the screen (the more filter is available in MS DOS and certain UNIX ports):

    %java ch11_03 ch11_01.xml | more

You can see the results in Figure 11-1. As you see in this figure, the program works as it should: The document appears with all elements and text intact, indented properly. Congratulations, now you're able to handle most of what you'll find in XML documents using Java! Note that you can use this program as a text-based browser: You can give it the name of any XML document on the Internet, not just a local document, and it'll fetch that document and parse it.

Figure 11-1. Parsing an XML document.
