The EntityRef Class | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

The EntityRef class shown in Example 15.21 represents a defined entity reference such as &copy; or &chapter1;. It is used only for entity references that the parser does not expand. Given a fully validating parser, or even just one that reads the external DTD subset, no EntityRef objects will normally be present in the tree.

Example 15.21 The JDOM EntityRef Class

  package org.jdom;

  public class EntityRef implements Serializable, Cloneable {

    protected String name;
    protected String publicID;
    protected String systemID;
    protected Object parent;

    protected EntityRef();
    public EntityRef(String name);
    public EntityRef(String name, String systemID);
    public EntityRef(String name, String publicID,
     String systemID);

    public EntityRef detach();
    public Document  getDocument();
    public String    getName();
    public EntityRef setName(String name);
    public Element   getParent();
    public String    getPublicID();
    public EntityRef setPublicID(String newPublicID);
    public String    getSystemID();
    public EntityRef setSystemID(String newSystemID);

    public final boolean equals(Object ob);
    public final int     hashCode();
    public       String  toString();
    public       Object  clone();

  }

Each EntityRef object has these four properties:

The entity name
The public ID of the entity
The system ID of the entity
The parent Element of the entity

The public and system IDs will be null if the parser did not read the part of the DTD that defined the entity.

The one thing you might expect that is not available is the entity's replacement text. Unlike the EntityReference interface in DOM, JDOM EntityRef objects do not have any children. If the builder knows the replacement text of the entity, then it will insert the corresponding nodes in the tree rather than including an EntityRef object.

You will rarely need to use this class directly. One possible use is to add an entity reference in place of characters that you know are going to cause problems in your choice of encoding. On the other hand, you're probably better off just letting the XMLOutputter emit numeric character references instead. If you do choose to insert EntityRef objects into your JDOM tree, then be sure to use a DocType that either points to an external DTD subset or includes an internal DTD subset that defines your entities. JDOM will not do this for you automatically, so if you aren't careful you can produce a malformed document.
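The risk of malformedness can be checked with any XML parser. The following is a minimal sketch that uses the JDK's built-in JAXP parser rather than JDOM. The class name EntityDeclarationCheck, the method isWellFormed, and the two sample book documents are illustrative assumptions, not part of JDOM. It parses the same entity reference twice, once with no declaration and once with an internal DTD subset that declares the entity.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EntityDeclarationCheck {

    // Hypothetical helper: reports whether the JDK's default
    // (nonvalidating) parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness errors arrive as SAXParseException
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No DOCTYPE at all, so &chapter1; is undeclared
        String undeclared = "<book>&chapter1;</book>";
        // The internal DTD subset declares the entity
        String declared =
            "<!DOCTYPE book [<!ENTITY chapter1 \"Chapter One\">]>"
            + "<book>&chapter1;</book>";

        System.out.println(isWellFormed(undeclared)); // false
        System.out.println(isWellFormed(declared));   // true
    }
}
```

A document serialized from a JDOM tree that contains an EntityRef but no matching DocType would look like the first string, which is why the declaration matters.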

For an example, let's turn once again to XHTML. Browsers generally use nonvalidating parsers and tend not to read the external DTD subset by default. Thus they're likely to encounter skipped entity references. The XHTML specification states:

If it encounters an entity reference (other than one of the predefined entities) for which the User Agent has processed no declaration (which could happen if the declaration is in the external subset which the User Agent hasn't read), the entity reference should be rendered as the characters (starting with the ampersand and ending with the semi-colon) that make up the entity reference.

Here is a simple method that assists with this requirement by converting all EntityRef objects in a tree to Text objects of the form &name;.

  public static void entityRefToText(Element element) {

    List content = element.getContent();
    ListIterator iterator = content.listIterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        Element child = (Element) o;
        entityRefToText(child);
      }
      else if (o instanceof EntityRef) {
        EntityRef ref = (EntityRef) o;
        Text fauxRef = new Text("&");
        fauxRef.append(ref.getName());
        fauxRef.append(";");
        iterator.set(fauxRef);
      }
    }

  }

There's one technique here you may not have seen before. Instead of a basic Iterator, I used a ListIterator. The reason is that ListIterator has an optional set() method (which JDOM does implement) that replaces the last object returned by next() with another object. That's how I replace the EntityRef with a Text.
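The set() technique can be tried in isolation with nothing but java.util. In this minimal sketch the Ref class is a hypothetical stand-in for EntityRef, and plain strings stand in for Text objects; neither the class ListIteratorSetDemo nor the method refsToText comes from JDOM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

public class ListIteratorSetDemo {

    // Hypothetical stand-in for an unexpanded entity reference
    static final class Ref {
        final String name;
        Ref(String name) { this.name = name; }
    }

    // Replaces every Ref in the list with the literal string "&name;"
    static void refsToText(List<Object> content) {
        ListIterator<Object> iterator = content.listIterator();
        while (iterator.hasNext()) {
            Object o = iterator.next();
            if (o instanceof Ref) {
                // set() overwrites the element most recently returned
                // by next(), in place, without changing the list size
                iterator.set("&" + ((Ref) o).name + ";");
            }
        }
    }

    public static void main(String[] args) {
        List<Object> content = new ArrayList<>();
        content.add("Copyright ");
        content.add(new Ref("copy"));
        content.add(" 2002");

        refsToText(content);
        for (Object o : content) {
            System.out.println(o.getClass().getSimpleName() + ": " + o);
        }
    }
}
```

Because set() replaces the element in place, the loop never has to track indexes or remove and reinsert items, which is what a basic Iterator would force on you.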

Caution

Be sure you understand the difference here. A Text object always contains plain text, never an entity reference or a tag, even if it contains some characters such as & and < that need to be escaped when the document is serialized. For example, invoking element.setText("&lt;") sets the content of element to the four characters &, l, t, and ; in that order. It does not set it to the single character <. When element is serialized, its content will be written as &amp;lt;.
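To see why text that looks like an entity reference is escaped a second time on output, here is a minimal sketch of the escaping a serializer applies to element content. The class TextEscapeSketch and the method escapeElementText are hypothetical; this is not the XMLOutputter implementation, only an illustration of the rule.

```java
public class TextEscapeSketch {

    // Hypothetical helper: escapes the characters that cannot appear
    // literally in element content. Every character is treated as
    // plain data, so an ampersand is always escaped, even when it
    // happens to begin something that looks like an entity reference.
    static String escapeElementText(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Four characters of plain text: &, l, t, ;
        System.out.println(escapeElementText("&lt;")); // &amp;lt;
        // One character of plain text: <
        System.out.println(escapeElementText("<"));    // &lt;
    }
}
```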