The CDATASection Interface | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

The CDATASection interface, shown in Example 11.11, is a subinterface of Text that specifically represents CDATA sections. It has no unique methods of its own, but when a CDATASection is serialized into a file, the text of the node may be wrapped inside CDATA section markers. As a result, characters such as the ampersand and the less-than sign do not need to be escaped as &amp; and &lt;.

Example 11.11 The CDATASection Interface

 package org.w3c.dom;

 public interface CDATASection extends Text {
 }
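Because CDATASection extends Text, a program can treat one exactly like any other text node; the difference only shows up when the tree is written back out. The following sketch, which is not part of the book's example code, builds a one-element document with the standard Document.createCDATASection() factory method and serializes it with the DOM Level 3 Load and Save LSSerializer. The class and method names (CDATASectionDemo, buildCodeElement(), serialize()) are hypothetical, and the exact serialized text may vary slightly between DOM implementations.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class CDATASectionDemo {

    // Builds a <code> element whose only child is a CDATASection.
    static Element buildCodeElement(String snippet) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element code = doc.createElement("code");
            CDATASection cdata = doc.createCDATASection(snippet);
            code.appendChild(cdata);
            doc.appendChild(code);
            return code;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes with DOM Level 3 Load and Save. The "cdata-sections"
    // parameter defaults to true, so CDATA markers are kept; the XML
    // declaration is switched off to keep the output short.
    static String serialize(Element element) {
        DOMImplementationLS ls = (DOMImplementationLS) element.getOwnerDocument()
                .getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(element);
    }

    public static void main(String[] args) {
        Element code = buildCodeElement("if (a < b && c) {}");
        // A CDATASection is a Text, so Text methods work on it directly.
        Text text = (Text) code.getFirstChild();
        System.out.println(text.getData());   // the raw, unescaped characters
        System.out.println(serialize(code));
    }
}
```

With the serializer's default cdata-sections setting, the < and & in the snippet should come out inside <![CDATA[ ... ]]> markers rather than as &lt; and &amp;.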

CDATA sections are convenient syntactic sugar for documents that will sometimes be read or authored by human beings in source code form. The source code for this book uses them frequently for examples. Please don't use CDATA sections for more than that. With the possible exception of editors, all programs that process XML documents should treat CDATA sections as identical to the same text with every less-than sign changed to &lt; and every ampersand changed to &amp;. In particular, do not use CDATA sections as a sort of pseudo-element to hide HTML in your XML documents, like this:

 <Product>
   <Name>Brass Ship's Bell</Name>
   <Quantity>1</Quantity>
   <Price currency="USD">144.95</Price>
   <Discount>.10</Discount>
   <![CDATA[<html><body>
     <b>Happy Father&rsquo;s Day to a great Dad!<P></b>
     <i>Love,<br>
     Sam and Beatrice<body></html>]]>
 </Product>

Instead, write well-formed HTML inside an appropriate element, like this:

 <Product>
   <Name>Brass Ship's Bell</Name>
   <Quantity>1</Quantity>
   <Price currency="USD">144.95</Price>
   <Discount>.10</Discount>
   <GiftMessage><html><body>
     <p><b>Happy Father's Day to a great Dad!</b></p>
     <i>Love,<br />
     Sam and Beatrice</i></body></html>
   </GiftMessage>
 </Product>

The second example is much more flexible and much more robust. DOM parsers are not required to report CDATA sections, and other processes are even less likely to maintain them, so you should not use CDATA sections as a substitute for elements.
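Both claims in this paragraph are easy to check from JAXP. The sketch below, with hypothetical class and method names, parses trimmed-down copies of the two Product documents. It shows that getElementsByTagName() finds the b element only when it is real markup, and that a parser configured with DocumentBuilderFactory.setCoalescing(true) never reports the CDATA section at all, even though the character data comes out the same either way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataVersusMarkup {

    // Trimmed-down versions of the two <Product> documents above.
    static final String CDATA_VERSION =
          "<Product><Name>Brass Ship's Bell</Name>"
        + "<![CDATA[<html><body><b>Happy Father&rsquo;s Day!<P></b></body></html>]]>"
        + "</Product>";
    static final String MARKUP_VERSION =
          "<Product><Name>Brass Ship's Bell</Name>"
        + "<GiftMessage><html><body><p><b>Happy Father's Day!</b></p>"
        + "</body></html></GiftMessage></Product>";

    // Parses a string; coalescing=true asks the parser to turn CDATA
    // sections into Text nodes merged with any adjacent text.
    static Document parse(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(coalescing);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    // Counts CDATASection nodes anywhere below the given node.
    static int countCdata(Node node) {
        int count = 0;
        for (Node kid = node.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            if (kid.getNodeType() == Node.CDATA_SECTION_NODE) count++;
            count += countCdata(kid);
        }
        return count;
    }

    public static void main(String[] args) {
        // Only real elements can be found by element-based queries.
        System.out.println(parse(CDATA_VERSION, false).getElementsByTagName("b").getLength());  // 0
        System.out.println(parse(MARKUP_VERSION, false).getElementsByTagName("b").getLength()); // 1

        // A coalescing parser does not report the CDATA section at all...
        System.out.println(countCdata(parse(CDATA_VERSION, false)));  // 1
        System.out.println(countCdata(parse(CDATA_VERSION, true)));   // 0

        // ...yet the character data is identical either way.
        String kept = parse(CDATA_VERSION, false).getDocumentElement().getTextContent();
        String merged = parse(CDATA_VERSION, true).getDocumentElement().getTextContent();
        System.out.println(kept.equals(merged));  // true
    }
}
```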

The normalize() method in the Node interface does not combine CDATA sections with adjacent text nodes or with other CDATA sections. Example 11.12 provides a static utility method that does. It takes a Node as its argument, converts all CDATASection descendants of that node to plain Text objects, and then merges all adjacent Text objects. Because the argument is modified in place, the method returns void.
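To see the limitation for yourself, the following sketch (the class name NormalizeDemo is hypothetical) builds an element whose children are two adjacent Text nodes, a CDATASection, and another Text node, and then calls normalize(). The two leading Text nodes are merged, but the CDATA section and the Text node after it are left as separate children.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NormalizeDemo {

    // Builds <a> with children Text, Text, CDATASection, Text,
    // then calls the standard Node.normalize() on the element.
    static Element buildAndNormalize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element a = doc.createElement("a");
            doc.appendChild(a);
            a.appendChild(doc.createTextNode("one "));
            a.appendChild(doc.createTextNode("two "));
            a.appendChild(doc.createCDATASection("<three>"));
            a.appendChild(doc.createTextNode(" four"));
            a.normalize();
            return a;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NodeList kids = buildAndNormalize().getChildNodes();
        // Print each remaining child and whether it is still a CDATA section.
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            boolean isCdata = kid.getNodeType() == Node.CDATA_SECTION_NODE;
            System.out.println((isCdata ? "CDATA: " : "Text:  ") + kid.getNodeValue());
        }
    }
}
```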

Example 11.12 Merging CDATA Sections with Text Nodes

 import org.w3c.dom.*;

 public class CDATAUtility {

   // Recursively descend the tree converting all CDATA sections
   // to text nodes and merging them with adjacent text nodes.
   public static void superNormalize(Node parent) {

     // We'll need this to create new Text objects
     Document factory = parent.getOwnerDocument();

     Node current = parent.getFirstChild();
     while (current != null) {
       int type = current.getNodeType();
       if (type == Node.CDATA_SECTION_NODE) {
         // Convert CDATA section to a text node
         CDATASection cdata = (CDATASection) current;
         String data = cdata.getData();
         Text newNode = factory.createTextNode(data);
         parent.replaceChild(newNode, cdata);
         current = newNode;
       }
       // Recheck in case we changed type above
       type = current.getNodeType();
       if (type == Node.TEXT_NODE) {
         // If previous node is a text node, then append this
         // node's data to that node, and delete this node
         Node previous = current.getPreviousSibling();
         if (previous != null) {
           int previousType = previous.getNodeType();
           if (previousType == Node.TEXT_NODE) {
             Text previousText = (Text) previous;
             Text currentText = (Text) current;
             String data = currentText.getData();
             previousText.appendData(data);
             parent.removeChild(current);
             current = previous;
           }
         }
       } // end if
       else { // recurse
         superNormalize(current);
       }

       // increment node
       current = current.getNextSibling();
     } // end while

   } // end superNormalize()

 }
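If your parser supports DOM Level 3, there is also a standard alternative to hand-written code like superNormalize(). Setting the "cdata-sections" parameter of the document's DOMConfiguration to false and then calling Document.normalizeDocument() asks the implementation to replace CDATA sections with Text nodes and combine them with adjacent text. The sketch below assumes a DOM Level 3 implementation such as the one bundled with the JDK; the class and helper names are hypothetical. Note that normalizeDocument() also applies the other normalizations controlled by the same DOMConfiguration (namespace fixup, for example), so it does more than superNormalize() does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class Level3Normalize {

    // Namespace-aware parsing creates DOM Level 2 nodes, which the default
    // "namespaces" setting of normalizeDocument() expects.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    // With "cdata-sections" false, normalizeDocument() turns CDATA sections
    // into Text nodes and merges them with adjacent text.
    static Document mergeCdata(Document doc) {
        doc.getDomConfig().setParameter("cdata-sections", Boolean.FALSE);
        doc.normalizeDocument();
        return doc;
    }

    // True if any CDATASection remains anywhere below node.
    static boolean hasCdata(Node node) {
        for (Node kid = node.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            if (kid.getNodeType() == Node.CDATA_SECTION_NODE || hasCdata(kid)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Document doc = parse("<p>one <![CDATA[<two>]]> three<q><![CDATA[&four]]></q></p>");
        System.out.println("before: " + hasCdata(doc));
        mergeCdata(doc);
        System.out.println("after:  " + hasCdata(doc));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```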

More than anything else, superNormalize() is an exercise in navigating the DOM tree. It uses the Node methods getFirstChild(), getNextSibling(), and getPreviousSibling() in a while loop instead of iterating through a NodeList in a for loop, because it constantly changes the contents of the node list. Node lists are live, but keeping the loop counter pointed at the right node as the list changes is tricky (certainly not impossible, just not as straightforward as the approach used here).
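The trap this paragraph alludes to is easy to fall into. The sketch below, with hypothetical class and method names, removes every child of an element twice: once with an index-based for loop over the live NodeList returned by getChildNodes(), and once by walking siblings in the same spirit as superNormalize(). Each removal shifts the remaining nodes down one position in the live list, so the index-based loop skips every other node.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class LiveNodeListDemo {

    // Builds a <root> element with n empty <item/> children.
    static Element buildRoot(int n) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            for (int i = 0; i < n; i++) {
                root.appendChild(doc.createElement("item"));
            }
            return root;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Naive index loop over a live NodeList: each removal shifts the later
    // nodes down one slot, so every other node is skipped.
    static int removeAllByIndex(Element root) {
        NodeList kids = root.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            root.removeChild(kids.item(i));
        }
        return root.getChildNodes().getLength();  // nodes left behind
    }

    // Sibling navigation: remember the next sibling before detaching the
    // current node, because a removed node no longer has siblings.
    static int removeAllBySibling(Element root) {
        Node current = root.getFirstChild();
        while (current != null) {
            Node next = current.getNextSibling();
            root.removeChild(current);
            current = next;
        }
        return root.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println(removeAllByIndex(buildRoot(4)));    // 2
        System.out.println(removeAllBySibling(buildRoot(4)));  // 0
    }
}
```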