The Comment Interface | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

XML comments don't have a lot of structure. They're really just some undifferentiated text inside <!-- and -->. Therefore, the Comment interface, shown in Example 11.19, is a subinterface of CharacterData that declares no methods of its own; everything it offers is inherited from CharacterData. However, your code can use the type to determine that a node is a comment and treat it appropriately. Serializers will be smart enough to output a Comment with the right markup around it.

Example 11.19 The Comment Interface

    package org.w3c.dom;

    public interface Comment extends CharacterData {

    }
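Because Comment declares nothing of its own, working with a comment means calling the methods it inherits from CharacterData and checking the node type when you need to tell it apart from other nodes. The following is a minimal sketch of that idea, assuming a JAXP implementation is available on the class path. The class name CommentDemo, the helper buildComment(), and the comment text are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class CommentDemo {

  // Hypothetical helper: creates a comment in a new, empty document.
  public static Comment buildComment() {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().newDocument();
      Comment comment = doc.createComment(" draft only ");
      doc.appendChild(comment); // a Document may have comment children
      return comment;
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException("Could not locate a JAXP parser", e);
    }
  }

  public static void main(String[] args) {
    Node node = buildComment();
    // The node type is what distinguishes a comment from other nodes
    if (node.getNodeType() == Node.COMMENT_NODE) {
      Comment comment = (Comment) node;
      // appendData(), getData(), and getLength() are all inherited
      // from CharacterData; Comment adds none of its own
      comment.appendData("revisit ");
      System.out.println(comment.getData());
      System.out.println(comment.getLength());
    }
  }

}
```

The cast to Comment is needed only to reach the CharacterData methods; getNodeValue() on the plain Node would return the same text.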

In Example 7.12, I demonstrated a SAX program that reads comments. Now in Example 11.20, you can see the DOM equivalent. The approach is different (actively walking a tree instead of passively receiving events), but the effect is the same: printing the contents of comments and only comments on System.out.

Example 11.20 A DOM Program That Prints Comments

    import javax.xml.parsers.*;
    import org.w3c.dom.*;
    import org.xml.sax.SAXException;
    import java.io.IOException;


    public class DOMCommentReader {

      // note use of recursion
      public static void printComments(Node node) {

        int type = node.getNodeType();
        if (type == Node.COMMENT_NODE) {
          Comment comment = (Comment) node;
          System.out.println(comment.getData());
          System.out.println();
        }
        else {
          if (node.hasChildNodes()) {
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
              printComments(children.item(i));
            }
          }
        }

      }

      public static void main(String[] args) {

        if (args.length <= 0) {
          System.out.println("Usage: java DOMCommentReader URL");
          return;
        }

        String url = args[0];

        try {
          DocumentBuilderFactory factory
           = DocumentBuilderFactory.newInstance();
          DocumentBuilder parser = factory.newDocumentBuilder();

          // Read the document
          Document document = parser.parse(url);

          // Process the document
          DOMCommentReader.printComments(document);
        }
        catch (SAXException e) {
          System.out.println(url + " is not well-formed.");
        }
        catch (IOException e) {
          System.out.println(
           "Due to an IOException, the parser could not check " + url
          );
        }
        catch (FactoryConfigurationError e) {
          System.out.println("Could not locate a factory class");
        }
        catch (ParserConfigurationException e) {
          System.out.println("Could not locate a JAXP parser");
        }

      } // end main

    }

The following is the result of running this program on the XML Schema Datatypes specification:

    D:\books\XMLJAVA> java DOMCommentReader http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/datatypes.xml
     commenting these out means only that they won't show up in the
        stylesheet generated "Revisions from previous draft" appendix

     Changes before Sept public draft commented out...
    <sitem>
    19990521: PVB: corrected definition of length and maxLengths facet for strings to be in terms of <emph>characters</emph> not <emph>bytes</emph>
    </sitem>
    <sitem>
    19990521: PVB: removed issue "other-date-representations". We don't want other separators, left mention of aggregate reps for dates as an ednote.
    </sitem>
    <sitem>
    19990521: PVB: fixed "holidays" example, "-0101" ==> "==0101" (where == in the correction should be two hyphens, but that would not allow us to comment out this sitem)
    ...

It's not obvious from this output sample, but there is a big difference in the behavior of the SAX and DOM versions of this program. The SAX version begins producing output almost immediately because it works in streaming mode. By contrast, the DOM version first has to read the entire document from the remote URL, parse it, and only then begin walking the tree to look for comments. Both versions are limited by the speed of the network connection, so they take roughly the same amount of time to run on the same input data. However, the SAX version returns its first results much sooner than the DOM version, which doesn't present any results until the entire document has been read. This may not be a big concern in a batch-mode application, but it can be very important when a human user is waiting. The SAX version will feel a lot more responsive.
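To make the ordering concrete, here is a rough sketch of the streaming side. It is not the SAX program from Example 7.12. It is a small stand-in, assuming the parser supports the standard SAX lexical-handler property. The class name StreamingCommentSketch, the trace() method, and the sample document are hypothetical. Each comment is logged from inside the parser's callback, that is, while parse() is still running, whereas the DOM program cannot look at a single comment until parse() has returned.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class StreamingCommentSketch {

  // Hypothetical helper: returns the order in which events were seen.
  public static List<String> trace(String xml) {
    final List<String> log = new ArrayList<>();
    // DefaultHandler2 implements both ContentHandler and LexicalHandler
    DefaultHandler2 handler = new DefaultHandler2() {
      @Override
      public void comment(char[] text, int start, int length) {
        // Called as soon as the parser reaches the comment
        log.add("comment:" + new String(text, start, length));
      }
      @Override
      public void endDocument() {
        log.add("endDocument");
      }
    };
    try {
      XMLReader reader
       = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
      reader.setContentHandler(handler);
      reader.setProperty(
       "http://xml.org/sax/properties/lexical-handler", handler);
      reader.parse(new InputSource(new StringReader(xml)));
    }
    catch (SAXException | IOException | ParserConfigurationException e) {
      throw new IllegalStateException("Could not parse the sample", e);
    }
    return log;
  }

  public static void main(String[] args) {
    String xml = "<root><!--first--><a/><!--second--></root>";
    for (String entry : trace(xml)) {
      System.out.println(entry);
    }
  }

}
```

Both comment entries land in the log ahead of the endDocument entry. With a large document arriving over a slow connection, that gap is the difference a user would notice.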