Displaying an Entire Document

In this next example, as in the previous chapter, I'm going to write a program that parses and displays an entire document, indenting each element, processing instruction, and so on, as well as displaying attributes and their values. Here, however, I'll use SAX methods, not DOM methods. If you pass ch12_01.xml to this program, which I'll call ch12_03.java, it will display the whole document, properly indented.

I start by letting the user specify on the command line which document to parse. To actually parse the document, I'll create an object of the current class and call that object's displayDocument method from the main method, as before. The displayDocument method will fill the array displayStrings with the formatted document, and the main method will print it:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
public class ch12_03 extends DefaultHandler {
    public void displayDocument(String uri)
    {
        .
        .
        .
    }
    public static void main(String args[])
    {
        ch12_03 obj = new ch12_03();
        obj.displayDocument(args[0]);
        for(int index = 0; index < numberDisplayLines; index++){
            System.out.println(displayStrings[index]);
        }
    }
}

In the displayDocument method, I'll create a SAX parser. You also need to register an object whose callback methods the parser will call; as in the previous example in this chapter, we can use the current object for that, because ch12_03 extends DefaultHandler:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
public class ch12_03 extends DefaultHandler {
    public void displayDocument(String uri)
    {
        DefaultHandler handler = this;
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new File(uri), handler);
        } catch (Throwable t) {
            // Report problems such as a missing file instead of failing silently
            t.printStackTrace();
        }
    }
    public static void main(String args[])
    {
        ch12_03 obj = new ch12_03();
        obj.displayDocument(args[0]);
        for(int index = 0; index < numberDisplayLines; index++){
            System.out.println(displayStrings[index]);
        }
    }
}
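As a quick check on how the parser drives a handler, here is a minimal sketch that uses the same SAXParserFactory and SAXParser calls as displayDocument but parses a short in-memory string, wrapped in an InputSource and a StringReader, instead of a File. The class CallbackOrder, its events helper, and the sample markup are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records the name of each callback the parser makes.
class CallbackOrder extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    // Parses xml from a string rather than a file and returns the recorded events.
    static List<String> events(String xml) {
        CallbackOrder handler = new CallbackOrder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        // [startDocument, startElement DOCUMENT, startElement CUSTOMER,
        //  endElement CUSTOMER, endElement DOCUMENT, endDocument]
        System.out.println(events("<DOCUMENT><CUSTOMER/></DOCUMENT>"));
    }
}
```

The recorded order shows startDocument arriving before any element callbacks and endDocument after the last one, which is why startDocument is a convenient place to emit the declaration line.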

All that's left is to create the various methods that will be called for SAX events, and I'll start with the beginning of the document.

Handling the Beginning of Documents

When the SAX parser encounters the beginning of the document to parse, it calls the startDocument method. This method is not passed any arguments, and SAX never reports the document's own XML declaration, so I'll just have the program display a standard declaration (version 1.0, UTF-8 encoding). As in the previous chapter, I'll store the text to display in the array of String objects named displayStrings, our location in that array in the integer variable numberDisplayLines, and the current indentation level in a String object named indent. Using an array of strings like this facilitates the conversion process when we adapt this program to display in a Java window.

Here's how I add the XML declaration to the display strings in startDocument:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
public class ch12_03 extends DefaultHandler {
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0;
    static String indent = "";
    public void displayDocument(String uri)
    {
        DefaultHandler handler = this;
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new File(uri), handler);
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
    public void startDocument()
    {
        displayStrings[numberDisplayLines] = indent;
        displayStrings[numberDisplayLines] += "<?xml version=\"1.0\" encoding=\"" +
            "UTF-8" + "\"?>";
        numberDisplayLines++;
    }
    .
    .
    .
    public static void main(String args[])
    {
        ch12_03 obj = new ch12_03();
        obj.displayDocument(args[0]);
        for(int index = 0; index < numberDisplayLines; index++){
            System.out.println(displayStrings[index]);
        }
    }
}

I'll take a look at handling processing instructions next.

Handling Processing Instructions

You can handle processing instructions with the processingInstruction callback. This method is called with two arguments, the processing instruction's target and its data. For example, in the processing instruction <?xml-stylesheet type="text/css" href="style.css"?>, the target is xml-stylesheet and the data is type="text/css" href="style.css".

Here's how I handle processing instructions, adding them to the display strings. Note that I check first to make sure there is some data before adding it to the processing instruction's display:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
public class ch12_03 extends DefaultHandler {
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0;
    static String indent = "";
    public void processingInstruction(String target, String data)
    {
        displayStrings[numberDisplayLines] = indent;
        displayStrings[numberDisplayLines] += "<?";
        displayStrings[numberDisplayLines] += target;
        if (data != null && data.length() > 0) {
            displayStrings[numberDisplayLines] += ' ';
            displayStrings[numberDisplayLines] += data;
        }
        displayStrings[numberDisplayLines] += "?>";
        numberDisplayLines++;
    }
    public static void main(String args[])
    {
        .
        .
        .
    }
}
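To see exactly how the parser splits a processing instruction, the following sketch records the target and data of the first processing instruction it encounters, using the xml-stylesheet example above. The class PiSplitter and its parse helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: captures the target and data of the first processing instruction.
class PiSplitter extends DefaultHandler {
    static final String SAMPLE =
        "<?xml-stylesheet type=\"text/css\" href=\"style.css\"?><DOCUMENT/>";

    String target;
    String data;

    @Override
    public void processingInstruction(String target, String data) {
        if (this.target == null) { // keep only the first one
            this.target = target;
            this.data = data;
        }
    }

    static PiSplitter parse(String xml) {
        PiSplitter handler = new PiSplitter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        PiSplitter pi = parse(SAMPLE);
        System.out.println("target: " + pi.target); // xml-stylesheet
        System.out.println("data:   " + pi.data);   // type="text/css" href="style.css"
    }
}
```

The SAX documentation notes that the data never includes the whitespace separating it from the target, and that it may be null when no data is supplied, which is why the handler above checks data before appending it.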

Handling the Beginning of Elements

You can handle the start of elements with the startElement method. Because we've found a new element, I'll add four spaces to the current indentation to handle any children the element has, and I'll display its name using the qualifiedName argument:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
public class ch12_03 extends DefaultHandler {
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0;
    static String indent = "";
    public void startElement(String uri, String localName, String qualifiedName,
        Attributes attributes)
    {
        displayStrings[numberDisplayLines] = indent;
        indent += "    ";
        displayStrings[numberDisplayLines] += '<';
        displayStrings[numberDisplayLines] += qualifiedName;
        displayStrings[numberDisplayLines] += '>';
        numberDisplayLines++;
    }
    public static void main(String args[])
    {
        .
        .
        .
    }
}
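The code above displays qualifiedName rather than localName. That matters because SAXParserFactory.newInstance() returns a factory that is not namespace-aware by default, and in that mode the SAX documentation allows uri and localName to be empty strings, while qualifiedName carries the name as written. The sketch below, with a hypothetical ElementNames class and a made-up namespace URI urn:example:customers, compares the arguments with namespace awareness off and on.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports the names passed to startElement for the root element.
class ElementNames extends DefaultHandler {
    static final String SAMPLE = "<x:CUSTOMER xmlns:x=\"urn:example:customers\"/>";

    String uri, localName, qName;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (this.qName == null) { // keep only the first (root) element
            this.uri = uri;
            this.localName = localName;
            this.qName = qName;
        }
    }

    static ElementNames parse(String xml, boolean namespaceAware) {
        ElementNames handler = new ElementNames();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware); // the JAXP default is false
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        for (boolean aware : new boolean[] {false, true}) {
            ElementNames n = parse(SAMPLE, aware);
            System.out.println("namespaceAware=" + aware + " uri='" + n.uri
                + "' localName='" + n.localName + "' qName='" + n.qName + "'");
        }
    }
}
```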

That's enough to display the opening tag of an element, but what if the element has attributes?

Handling Attributes

One of the arguments passed to the startElement method is an object that implements the Attributes interface:

public void startElement(String uri, String localName, String qualifiedName,
    Attributes attributes)
{
    .
    .
    .
}

This object gives you access to the element's attributes; you can find the methods of the Attributes interface in Table 12-4. You can look up an attribute by its index, by its qualified name, or by its namespace URI and local name.

Table 12-4. Attributes Interface Methods

Return type | Method | Description
--- | --- | ---
int | getIndex(java.lang.String qualifiedName) | Gets the index of an attribute, given its qualified name (-1 if there is no such attribute)
int | getIndex(java.lang.String uri, java.lang.String localName) | Gets the index of an attribute by namespace URI and local name (-1 if there is no such attribute)
int | getLength() | Gets the number of attributes in the list
java.lang.String | getLocalName(int index) | Gets an attribute's local name by index
java.lang.String | getQName(int index) | Gets an attribute's qualified name by index
java.lang.String | getType(int index) | Gets an attribute's type by index
java.lang.String | getType(java.lang.String qualifiedName) | Gets an attribute's type by qualified name
java.lang.String | getType(java.lang.String uri, java.lang.String localName) | Gets an attribute's type by namespace URI and local name
java.lang.String | getURI(int index) | Gets an attribute's namespace URI by index
java.lang.String | getValue(int index) | Gets an attribute's value by index
java.lang.String | getValue(java.lang.String qualifiedName) | Gets an attribute's value by qualified name
java.lang.String | getValue(java.lang.String uri, java.lang.String localName) | Gets an attribute's value by namespace URI and local name
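As a sketch of lookups by position and by qualified name, the following hypothetical ProductAttributes class captures the attributes of a <PRODUCT> element modeled on the one in ch12_01.xml. It copies them into an AttributesImpl, because the parser may reuse its Attributes object once startElement returns.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: keeps a copy of the attributes of the first PRODUCT element.
class ProductAttributes extends DefaultHandler {
    static final String SAMPLE =
        "<PRODUCT ID = \"5231\" TYPE = \"3133\">Lettuce</PRODUCT>";

    private Attributes product;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (product == null && qName.equals("PRODUCT")) {
            // Take a snapshot; the parser may reuse atts after this call.
            product = new AttributesImpl(atts);
        }
    }

    static Attributes parse(String xml) {
        ProductAttributes handler = new ProductAttributes();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.product;
    }

    public static void main(String[] args) {
        Attributes atts = parse(SAMPLE);
        System.out.println("count: " + atts.getLength());                // 2
        System.out.println("ID: " + atts.getValue("ID"));                // 5231, by qualified name
        System.out.println("TYPE index: " + atts.getIndex("TYPE"));      // 1
        System.out.println("PRICE index: " + atts.getIndex("PRICE"));    // -1, not present
        System.out.println("type of ID: " + atts.getType("ID"));         // CDATA, no DTD declaration
    }
}
```

getIndex returns -1 for an attribute that isn't present, and getType reports CDATA here because no DTD declares the attributes.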

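How do we find and display all the attributes an element has? I'll find the number of attributes using the Attributes interface's getLength method. Then I'll get the names and values of the attributes with the getQName and getValue methods, referring to attributes by index. I use getQName rather than getLocalName because the SAX documentation allows getLocalName to return an empty string when the parser isn't performing namespace processing, which is the JAXP default. I also check that the attributes argument isn't null before using it. The SAX specification says an element without attributes receives an empty Attributes object rather than null, so this check is only a safeguard; for an empty list, getLength returns 0 and the loop body never runs: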

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
public class ch12_03 extends DefaultHandler {
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0;
    static String indent = "";
    public void startElement(String uri, String localName, String qualifiedName,
        Attributes attributes)
    {
        displayStrings[numberDisplayLines] = indent;
        indent += "    ";
        displayStrings[numberDisplayLines] += '<';
        displayStrings[numberDisplayLines] += qualifiedName;
        if (attributes != null) {
            int numberAttributes = attributes.getLength();
            for (int loopIndex = 0; loopIndex < numberAttributes; loopIndex++) {
                displayStrings[numberDisplayLines] += ' ';
                // getQName is filled in even when the parser is not namespace-aware
                displayStrings[numberDisplayLines] += attributes.getQName(loopIndex);
                displayStrings[numberDisplayLines] += "=\"";
                displayStrings[numberDisplayLines] += attributes.getValue(loopIndex);
                displayStrings[numberDisplayLines] += '"';
            }
        }
        displayStrings[numberDisplayLines] += '>';
        numberDisplayLines++;
    }
    public static void main(String args[])
    {
        .
        .
        .
    }
}
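The same attribute loop can be written with a StringBuilder, which avoids building a new String for every += on displayStrings. Here is a minimal sketch of that alternative; the StartTagFormatter class and its formatStartTag method are hypothetical names, and like the code above it does not escape quotes or ampersands inside attribute values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: formats each start tag, attributes included, on one line.
class StartTagFormatter extends DefaultHandler {
    final List<String> tags = new ArrayList<>();

    // Builds <name a="v" ...> from the qualified names and values SAX reports.
    static String formatStartTag(String qName, Attributes attributes) {
        StringBuilder tag = new StringBuilder("<").append(qName);
        for (int i = 0; i < attributes.getLength(); i++) {
            tag.append(' ').append(attributes.getQName(i))
               .append("=\"").append(attributes.getValue(i)).append('"');
        }
        return tag.append('>').toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        tags.add(formatStartTag(qName, atts));
    }

    static List<String> parse(String xml) {
        StartTagFormatter handler = new StartTagFormatter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.tags;
    }

    public static void main(String[] args) {
        // [<ITEM>, <PRODUCT ID="5231" TYPE="3133">]
        System.out.println(parse(
            "<ITEM><PRODUCT ID = \"5231\" TYPE = \"3133\">Lettuce</PRODUCT></ITEM>"));
    }
}
```

Note that the spaces around = in the source markup are not preserved; SAX reports only each attribute's name and value.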

That's all it takes; now we're handling the element's attributes as well.

Handling Text

Many of the elements in ch12_01.xml, such as the <FIRST_NAME> and <LAST_NAME> elements, contain text; we want to display that text. To handle element text, you use the characters callback.

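This method is called with three arguments: an array of type char that holds the character text, the index in that array where the text starts, and the length of the text. Only the characters from characters[start] through characters[start + length - 1] belong to this call, so don't assume that start is 0 or that the text fills the whole array. Also note that the parser is free to deliver a single run of text in several characters calls.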

To add the text inside an element to the display strings, I implement the characters method, converting the relevant part of the character array to a Java String object named characterData. Note that I use the String class's trim method to strip leading and trailing whitespace from the text:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
public class ch12_03 extends DefaultHandler {
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0;
    static String indent = "";
    public void characters(char characters[], int start, int length)
    {
        String characterData = (new String(characters, start, length)).trim();
        .
        .
        .
    }
    public static void main(String args[])
    {
        .
        .
        .
    }
}

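To skip the indentation text (the spaces and line breaks used to indent the elements in the file ch12_01.xml), I add an if statement and then add the remaining text to the display strings. Because trim has already removed leading and trailing whitespace, the length test discards text that was nothing but indentation; the indexOf test additionally skips any text that still contains a line break: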

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
public class ch12_03 extends DefaultHandler {
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0;
    static String indent = "";
    public void characters(char characters[], int start, int length)
    {
        String characterData = (new String(characters, start, length)).trim();
        if(characterData.indexOf("\n") < 0 && characterData.length() > 0) {
            displayStrings[numberDisplayLines] = indent;
            displayStrings[numberDisplayLines] += characterData;
            numberDisplayLines++;
        }
    }
    public static void main(String args[])
    {
        .
        .
        .
    }
}

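That's all there is to it. The if statement is needed because, when the parser has no grammar to go on, it reports the whitespace a document uses for indentation through the characters callback, just like any other text. Whitespace of this kind, which appears only between child elements, is called "ignorable" whitespace.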
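One caveat about handling each characters call on its own: because the parser may split a single run of text across several calls (it may, for example, report the text on either side of an entity reference such as &amp; separately), each piece would then be trimmed and displayed on its own line. A common remedy, sketched below under the assumption that text only needs to be displayed between tags, is to buffer the text and flush it when the next start or end tag arrives. BufferedText and its sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant: buffer every characters() chunk and emit the text
// once, when the next start or end tag arrives.
class BufferedText extends DefaultHandler {
    static final String SAMPLE = "<NAME>\n  <LAST_NAME>Smith &amp; Sons</LAST_NAME>\n"
        + "  <FIRST_NAME>Sam</FIRST_NAME>\n</NAME>";

    final List<String> texts = new ArrayList<>();
    private final StringBuilder pending = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length); // only this slice of ch belongs to the call
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Records the buffered text, trimmed, unless it was whitespace only.
    private void flush() {
        String text = pending.toString().trim();
        if (!text.isEmpty()) {
            texts.add(text);
        }
        pending.setLength(0);
    }

    static List<String> parse(String xml) {
        BufferedText handler = new BufferedText();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.texts;
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE)); // [Smith & Sons, Sam]
    }
}
```

Because the whitespace test runs on the whole buffered string, this version also keeps text that spans several lines, which the indexOf test in the code above would drop.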

Handling Ignorable Whitespace

So how do you actually ignore "ignorable" whitespace? The SAX parser can tell which whitespace is ignorable only if it knows the document's grammar, that is, which elements may contain only other elements, so you must supply that grammar. You could do this with a DTD in ch12_01.xml:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ATTLIST PRODUCT ID CDATA #IMPLIED TYPE CDATA #IMPLIED>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2003</DATE>
        <ORDERS>
        .
        .
        .
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT ID = "5231" TYPE = "3133">Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

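Now, the SAX parser will not call the characters callback when it sees ignorable whitespace (such as indentation spaces); it will call a method named ignorableWhitespace instead. (The SAX documentation requires a validating parser to report this whitespace through ignorableWhitespace and permits, but doesn't require, a non-validating parser to do so; calling setValidating(true) on the SAXParserFactory makes the behavior dependable.) That means you can comment out the if statement I used to filter out ignorable whitespace before: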

public void characters(char characters[], int start, int length)
{
    String characterData = (new String(characters, start, length)).trim();
    //if(characterData.indexOf("\n") < 0 && characterData.length() > 0) {
        displayStrings[numberDisplayLines] = indent;
        displayStrings[numberDisplayLines] += characterData;
        numberDisplayLines++;
    //}
}

That's all it takes. To filter out ignorable whitespace, just give the SAX parser some way of figuring out what is ignorable, such as adding a DTD to your document.

You can also add code to the ignorableWhitespace method to handle that whitespace, if you like. In fact, you can even pass it on to the characters callback, as shown here:

public void ignorableWhitespace(char characters[], int start, int length)
{
    characters(characters, start, length);
}
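To confirm that a DTD really moves the indentation out of characters, here is a minimal sketch that tallies what each callback receives for a cut-down version of the document above. It turns on validation with setValidating(true) so that the ignorableWhitespace reporting does not depend on parser-specific behavior; the WhitespaceCounter class and its sample are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects characters() text and counts ignorable whitespace.
class WhitespaceCounter extends DefaultHandler {
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE NAME [\n"
        + "  <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>\n"
        + "  <!ELEMENT LAST_NAME (#PCDATA)>\n"
        + "  <!ELEMENT FIRST_NAME (#PCDATA)>\n"
        + "]>\n"
        + "<NAME>\n"
        + "    <LAST_NAME>Smith</LAST_NAME>\n"
        + "    <FIRST_NAME>Sam</FIRST_NAME>\n"
        + "</NAME>";

    final StringBuilder text = new StringBuilder();
    int ignorable = 0;

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorable += length; // indentation between child elements lands here
    }

    static WhitespaceCounter parse(String xml) {
        WhitespaceCounter handler = new WhitespaceCounter();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // validating parsers must use ignorableWhitespace
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        WhitespaceCounter counts = parse(SAMPLE);
        System.out.println("characters text: " + counts.text);
        System.out.println("ignorable whitespace chars: " + counts.ignorable);
    }
}
```

The text collected through characters should be just SmithSam, with all of the indentation counted by ignorableWhitespace instead.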

Handling the End of Elements

So far, we've handled the start of each element and incremented the indentation level each time to handle any possible children. We also have to display the end tag for each element and decrement the indentation level. I'll do that in the endElement callback, which is called each time the SAX parser reaches the end of an element. Here's what that looks like in code:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
public class ch12_03 extends DefaultHandler {
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0;
    static String indent = "";
    public void endElement(String uri, String localName, String qualifiedName)
    {
        indent = indent.substring(0, indent.length() - 4);
        displayStrings[numberDisplayLines] = indent;
        displayStrings[numberDisplayLines] += "</";
        displayStrings[numberDisplayLines] += qualifiedName;
        displayStrings[numberDisplayLines] += '>';
        numberDisplayLines++;
    }
    public static void main(String args[])
    {
        .
        .
        .
    }
}
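The substring call works because every start tag adds exactly four spaces. An alternative, sketched below, is to track the nesting depth as an integer and build the indentation from it; this version also collects lines in an ArrayList, so unlike the fixed 1000-element displayStrings array it cannot run out of room on a large document. DepthIndenter is a hypothetical name, and the sketch omits attributes and processing instructions for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant: indentation derived from a depth counter,
// output lines kept in a growable list.
class DepthIndenter extends DefaultHandler {
    final List<String> lines = new ArrayList<>();
    private int depth = 0;

    // Four spaces per nesting level, matching the indent string used above.
    private String indent() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("    ");
        }
        return sb.toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        lines.add(indent() + "<" + qName + ">");
        depth++; // children are indented one level deeper
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--; // back to the level of the matching start tag
        lines.add(indent() + "</" + qName + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            lines.add(indent() + text);
        }
    }

    static List<String> parse(String xml) {
        DepthIndenter handler = new DepthIndenter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        for (String line : parse("<NAME><LAST_NAME>Smith</LAST_NAME></NAME>")) {
            System.out.println(line);
        }
    }
}
```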

There's one last topic to cover: handling errors and warnings.

Handling Errors and Warnings

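The DefaultHandler class implements the SAX ErrorHandler interface, which defines three callbacks for warnings and errors from the parser: warning, which handles parser warnings; error, which handles recoverable parser errors, such as violations of validity constraints; and fatalError, which handles errors so severe that the parser can't continue, such as well-formedness errors. DefaultHandler's own versions of warning and error do nothing, and its fatalError simply throws the exception it is passed, so you override these methods to display the messages.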

Each of these methods is passed a SAXParseException object, whose getMessage method returns the warning or error message. I display those messages with System.err.println, which prints to the standard error stream (the console, by default):

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
public class ch12_03 extends DefaultHandler {
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0;
    static String indent = "";
    .
    .
    .
    public void warning(SAXParseException exception)
    {
        System.err.println("WARNING! " +
            exception.getMessage());
    }
    public void error(SAXParseException exception)
    {
        System.err.println("ERROR! " +
            exception.getMessage());
    }
    public void fatalError(SAXParseException exception)
    {
        System.err.println("FATAL ERROR! " +
            exception.getMessage());
    }
    public static void main(String args[])
    {
        .
        .
        .
    }
}
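Besides getMessage, SAXParseException provides getLineNumber and getColumnNumber, which tell you where in the document the problem was found. A minimal sketch of error callbacks that include the location follows; LocatedErrors and its sample are hypothetical. Unlike the code above, its fatalError rethrows the exception, as DefaultHandler's own fatalError does, so parsing stops at the first fatal error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant: records each problem together with its line and column.
class LocatedErrors extends DefaultHandler {
    // Mismatched end tag on line 2, a well-formedness (fatal) error.
    static final String SAMPLE = "<NAME>\n  <LAST_NAME>Smith</NAME>";

    final List<String> reports = new ArrayList<>();

    private void report(String kind, SAXParseException e) {
        reports.add(kind + " line " + e.getLineNumber()
            + ", column " + e.getColumnNumber() + ": " + e.getMessage());
    }

    @Override
    public void warning(SAXParseException e) { report("WARNING!", e); }

    @Override
    public void error(SAXParseException e) { report("ERROR!", e); }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        report("FATAL ERROR!", e);
        throw e; // the document is unusable, so stop parsing
    }

    static List<String> parse(String xml) {
        LocatedErrors handler = new LocatedErrors();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Expected for malformed input; fatalError has already recorded it.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.reports;
    }

    public static void main(String[] args) {
        for (String r : parse(SAMPLE)) {
            System.err.println(r);
        }
    }
}
```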

That's all we need. You can see the results of parsing ch12_01.xml in Figure 12-1, where I'm using the MS-DOS more filter to stop the display from scrolling off the top of the window.

Figure 12-1. Parsing an XML document with a SAX parser.


This program is a success, and here's the complete code:

Listing ch12_03.java
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
public class ch12_03 extends DefaultHandler {
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0;
    static String indent = "";
    public void displayDocument(String uri)
    {
        DefaultHandler handler = this;
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new File(uri), handler);
        } catch (Throwable t) {
            // Report problems such as a missing file instead of failing silently
            t.printStackTrace();
        }
    }
    public void startDocument()
    {
        displayStrings[numberDisplayLines] = indent;
        displayStrings[numberDisplayLines] += "<?xml version=\"1.0\" encoding=\"" +
            "UTF-8" + "\"?>";
        numberDisplayLines++;
    }
    public void processingInstruction(String target, String data)
    {
        displayStrings[numberDisplayLines] = indent;
        displayStrings[numberDisplayLines] += "<?";
        displayStrings[numberDisplayLines] += target;
        if (data != null && data.length() > 0) {
            displayStrings[numberDisplayLines] += ' ';
            displayStrings[numberDisplayLines] += data;
        }
        displayStrings[numberDisplayLines] += "?>";
        numberDisplayLines++;
    }
    public void startElement(String uri, String localName, String qualifiedName,
        Attributes attributes)
    {
        displayStrings[numberDisplayLines] = indent;
        indent += "    ";
        displayStrings[numberDisplayLines] += '<';
        displayStrings[numberDisplayLines] += qualifiedName;
        if (attributes != null) {
            int numberAttributes = attributes.getLength();
            for (int loopIndex = 0; loopIndex < numberAttributes; loopIndex++) {
                displayStrings[numberDisplayLines] += ' ';
                displayStrings[numberDisplayLines] += attributes.getQName(loopIndex);
                displayStrings[numberDisplayLines] += "=\"";
                displayStrings[numberDisplayLines] += attributes.getValue(loopIndex);
                displayStrings[numberDisplayLines] += '"';
            }
        }
        displayStrings[numberDisplayLines] += '>';
        numberDisplayLines++;
    }
    public void characters(char characters[], int start, int length)
    {
        String characterData = (new String(characters, start, length)).trim();
        if(characterData.indexOf("\n") < 0 && characterData.length() > 0) {
            displayStrings[numberDisplayLines] = indent;
            displayStrings[numberDisplayLines] += characterData;
            numberDisplayLines++;
        }
    }
    public void ignorableWhitespace(char characters[], int start, int length)
    {
        //characters(characters, start, length);
    }
    public void endElement(String uri, String localName, String qualifiedName)
    {
        indent = indent.substring(0, indent.length() - 4);
        displayStrings[numberDisplayLines] = indent;
        displayStrings[numberDisplayLines] += "</";
        displayStrings[numberDisplayLines] += qualifiedName;
        displayStrings[numberDisplayLines] += '>';
        numberDisplayLines++;
    }
    public void warning(SAXParseException exception)
    {
        System.err.println("WARNING! " +
            exception.getMessage());
    }
    public void error(SAXParseException exception)
    {
        System.err.println("ERROR! " +
            exception.getMessage());
    }
    public void fatalError(SAXParseException exception)
    {
        System.err.println("FATAL ERROR! " +
            exception.getMessage());
    }
    public static void main(String args[])
    {
        ch12_03 obj = new ch12_03();
        obj.displayDocument(args[0]);
        for(int index = 0; index < numberDisplayLines; index++){
            System.out.println(displayStrings[index]);
        }
    }
}


Real World XML
Real World XML (2nd Edition)
ISBN: 0735712867
EAN: 2147483647
Year: 2005
Pages: 440
Authors: Steve Holzner
