The Text Class | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

JDOM uses the Text class internally to represent text nodes. In normal usage, you don't deal with this class directly; you just use strings. The one time you may encounter it is when you use getContent() to retrieve all the children of a node and iterate through the returned list. In this case, you will see Text objects.

Each Text object has a parent Element (which may be null) and a String value that holds the content of the node. This value may contain characters like < and &. If so, they will be escaped when the node is serialized. They do not need to be escaped before being inserted into a Text object.
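The escaping applied at serialization time can be pictured with a minimal sketch. The escapeText() helper and the EscapeSketch class below are hypothetical and are not part of JDOM; they only illustrate the kind of substitution a serializer performs, and a real serializer may escape a somewhat different set of characters.

```java
public class EscapeSketch {

    // Hypothetical helper, not JDOM code: replaces the characters that
    // cannot appear literally in XML character data with entity references.
    public static String escapeText(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The raw string is what you would hand to a Text object as is.
        String raw = "AT&T <rocks>";
        // The escaped form is what would appear in the serialized document.
        System.out.println(escapeText(raw)); // prints AT&amp;T &lt;rocks&gt;
    }
}
```

With JDOM itself you would never call such a helper; you store the raw string and let the outputter handle the rest.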

The Text class, summarized in Example 15.11, has methods to get, set, and detach the parent Element; to get and set the text content as a String; to append more text to the node; and to get the content of the node after trimming or normalizing white space. And of course, it has the other usual Java methods such as equals(), hashCode(), and clone() that all JDOM objects possess.

Example 15.11 The JDOM Text Class

 package org.jdom;

 public class Text implements Serializable, Cloneable {

   protected String value;
   protected Object parent;

   protected Text();
   public    Text(String s);

   public String getText();
   public String getTextTrim();
   public String getTextNormalize();
   public static String normalizeString(String s);
   public Text     setText(String s);
   public void     append(String s);
   public void     append(Text text);

   public Element  getParent();
   public Document getDocument();
   protected Text  setParent(Element parent);
   public Text     detach();

   public       String  toString();
   public final int     hashCode();
   public final boolean equals(Object ob);
   public       Object  clone();

 }
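The difference between the three getters is easiest to see on a concrete string. The following is a sketch in plain Java rather than a call into JDOM: it assumes that trimming removes only leading and trailing white space and that normalizing additionally collapses each internal run of white space to a single space. The class and method names are illustrative, and the standard-library calls used here only approximate JDOM's notion of XML white space.

```java
public class WhitespaceSketch {

    // Approximates getTextTrim(): strips leading and trailing white space,
    // leaving interior white space untouched.
    public static String trim(String s) {
        return s.trim();
    }

    // Approximates getTextNormalize(): trims, then collapses every
    // interior run of white space to one space.
    public static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String raw = "\n   Gus's  Crawfish \n";
        System.out.println("[" + trim(raw) + "]");      // prints [Gus's  Crawfish]
        System.out.println("[" + normalize(raw) + "]"); // prints [Gus's Crawfish]
    }
}
```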

JDOM does not guarantee that each run of text is represented by a single text node. Rather, Text objects can be adjacent to each other, which can make it a little tricky to retrieve the complete content of an element. For example, consider this element:

 <vendor>    Gus's  Crawfish </vendor>

Just from looking at the XML, there's no way to say whether the Element object representing the vendor element contains one Text object or two. Indeed, in extreme cases, it may contain three, four, or even more. If this element was read by SAXBuilder, then JDOM treats it as a single Text object. On the other hand, if a program created it or modified it in memory, then all bets are off.

In fact, this would be a concern even if JDOM did not allow adjacent text nodes. For example, consider this element:

 <vendor>
   Gus's <!-- This is my brother-in-law. My wife asked me to
        throw him some business. --> Crawfish
 </vendor>

The text content of the vendor element is the same as before, but now there's no way for JDOM to represent it as a single Text object.

You must also consider the case in which an element contains child elements such as this one:

 <vendor>    Gus's <seafood>Crawfish</seafood> </vendor>

To accumulate the complete text of an element, you need to iterate through its children while recursively processing any element children. This getFullText() method demonstrates:

 public static String getFullText(Element element) {

   StringBuffer result = new StringBuffer();
   List content = element.getContent();
   Iterator iterator = content.iterator();
   while (iterator.hasNext()) {
     Object o = iterator.next();
     if (o instanceof Text) {
       // Plain text child: append its value as is
       Text t = (Text) o;
       result.append(t.getText());
     }
     else if (o instanceof Element) {
       // Element child: recurse to collect its text too
       Element child = (Element) o;
       result.append(getFullText(child));
     }
     // Comments, processing instructions, and so on are skipped
   }

   return result.toString();

 }

Chapter 11 demonstrated a program that encoded all the text of a document, but not its markup, in rot-13 using DOM. Let's repeat that example here, but with JDOM instead. You can compare Example 15.12 with Example 11.8 to get a good feeling for the differences between DOM and JDOM. The DOM version is significantly more complex, particularly when it comes to building the document and then serializing it.

Example 15.12 JDOM-Based Rot-13 Encoder for XML Documents

 import org.jdom.*;
 import org.jdom.output.XMLOutputter;
 import org.jdom.input.SAXBuilder;
 import java.io.IOException;
 import java.util.*;


 public class ROT13XML {

   // note use of recursion
   public static void encode(Element element) {

     List content = element.getContent();
     Iterator iterator = content.iterator();
     while (iterator.hasNext()) {
       Object o = iterator.next();
       if (o instanceof Text) {
         Text t = (Text) o;
         String cipherText = rot13(t.getText());
         t.setText(cipherText);
       }
       else if (o instanceof Element) {
         Element child = (Element) o;
         encode(child);
       }
     }

   }

   public static String rot13(String s) {

     StringBuffer out = new StringBuffer(s.length());
     for (int i = 0; i < s.length(); i++) {
       int c = s.charAt(i);
       if (c >= 'A' && c <= 'M') out.append((char) (c+13));
       else if (c >= 'N' && c <= 'Z') out.append((char) (c-13));
       else if (c >= 'a' && c <= 'm') out.append((char) (c+13));
       else if (c >= 'n' && c <= 'z') out.append((char) (c-13));
       else out.append((char) c);
     }
     return out.toString();

   }

   public static void main(String[] args) {

     if (args.length <= 0) {
       System.out.println("Usage: java ROT13XML URL");
       return;
     }

     String url = args[0];

     try {
       SAXBuilder parser = new SAXBuilder();

       // Read the document
       Document document = parser.build(url);

       // Modify the document
       ROT13XML.encode(document.getRootElement());

       // Write it out again
       XMLOutputter outputter = new XMLOutputter();
       outputter.output(document, System.out);

     }
     catch (JDOMException e) {
       System.out.println(url + " is not well-formed.");
     }
     catch (IOException e) {
       System.out.println(
        "Due to an IOException, the parser could not encode " + url
       );
     }

   } // end main

 }

Here is a joke encoded by this program. You'll have to run the program if you want to find out what it says. :-)

 D:\books\XMLJAVA> java ROT13XML joke.xml
 <?xml version="1.0" encoding="UTF-8"?>
 <joke>
   Gur qrsvavgvba bs n yvoregnevna vf n pbafreingvir
   haqre vaqvpgzrag.
 </joke>
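Because rot-13 shifts each letter by half the alphabet, applying it a second time restores the original text, so the same program also serves as the decoder. The standalone sketch below repeats the letter-rotation logic of the rot13() method from Example 15.12 and checks the round trip. The class name Rot13RoundTrip and the sample string are illustrative only.

```java
public class Rot13RoundTrip {

    // Same rotation rule as rot13() in Example 15.12: letters move 13
    // places within their own case; every other character is unchanged.
    public static String rot13(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'M') out.append((char) (c + 13));
            else if (c >= 'N' && c <= 'Z') out.append((char) (c - 13));
            else if (c >= 'a' && c <= 'm') out.append((char) (c + 13));
            else if (c >= 'n' && c <= 'z') out.append((char) (c - 13));
            else out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String plain = "Hello, XML!";
        String cipher = rot13(plain);
        System.out.println(cipher);        // prints Uryyb, KZY!
        System.out.println(rot13(cipher)); // prints Hello, XML!
    }
}
```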