The EntityReference Interface


The EntityReference interface represents a general entity reference such as &nbsp; or &copyright_notice;. (It is not used for the five predefined entity references &amp;, &lt;, &gt;, &apos;, and &quot;.)

Example 11.13 summarizes the EntityReference interface. You'll notice it declares exactly zero methods of its own. It inherits all of its functionality from the Node superinterface. In an XML document, an entity reference is just a placeholder for the text that will replace it. In a DOM tree, an EntityReference object merely contains the things that will replace the entity reference.

Example 11.13 The EntityReference Interface
 package org.w3c.dom;

 public interface EntityReference extends Node {

 }

The name of the entity reference is returned by the getNodeName() method. The replacement text for the entity (assuming that the parser has resolved the entity) can be read through the usual methods of the Node interface, such as getFirstChild(). However, entity references are read-only. You cannot change their children using methods such as appendChild() or replaceChild(), and you cannot change their names at all, because the Node interface has no method for renaming a node. An attempt to modify the children throws a DOMException with the error code NO_MODIFICATION_ALLOWED_ERR.
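The read-only behavior can be checked with a few lines of code. What follows is only a sketch: the class name EntityReferenceBasics and its helper methods are invented for illustration, and it assumes the DOM implementation that JAXP returns by default enforces the read-only rule, as the specification requires. The reference is created in an empty document with no DTD, so it has no children to read.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.EntityReference;

class EntityReferenceBasics {

  // Creates an entity reference in a new, empty document.
  // (Hypothetical helper; the document has no DTD.)
  static EntityReference newReference(String name) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      return doc.createEntityReference(name);
    }
    catch (ParserConfigurationException ex) {
      throw new IllegalStateException(ex);
    }
  }

  // Tries to append a child to the entity reference. Returns the
  // DOMException error code, or -1 if no exception was thrown.
  static short tryToModify(EntityReference ref) {
    Document doc = ref.getOwnerDocument();
    try {
      ref.appendChild(doc.createTextNode("new text"));
      return -1;
    }
    catch (DOMException ex) {
      return ex.code;
    }
  }

  public static void main(String[] args) {
    EntityReference ref = newReference("copyright_notice");
    System.out.println("Name: " + ref.getNodeName());
    short code = tryToModify(ref);
    System.out.println("Modification refused: "
     + (code == DOMException.NO_MODIFICATION_ALLOWED_ERR));
  }

}
```

Returning the error code rather than letting the exception propagate keeps the example easy to inspect. Real code would more likely just avoid modifying entity references in the first place.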

EntityReference objects do not know their own system ID (URL) or public ID. Using the entity reference's name, however, you can look up this information in the NamedNodeMap of Entity objects returned by the getEntities() method of the DocumentType interface. I'll show you an example of this when we get to the Entity interface. In the meantime, let's consider an example that creates new entity references in the tree.
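Until then, the lookup just described can be sketched roughly as follows. The class EntityLookup, its methods, and the sample document with its logo entity and example.com URL are all made up for illustration; only getDoctype(), getEntities(), getNamedItem(), and getSystemId() come from DOM itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Entity;
import org.w3c.dom.EntityReference;
import org.xml.sax.InputSource;

class EntityLookup {

  // Returns the Entity declaration that matches the reference's name,
  // or null if the document has no DOCTYPE or no such declaration.
  static Entity findDeclaration(EntityReference ref) {
    DocumentType doctype = ref.getOwnerDocument().getDoctype();
    if (doctype == null) return null;
    return (Entity) doctype.getEntities().getNamedItem(ref.getNodeName());
  }

  // Parses a small, hypothetical document whose internal DTD subset
  // declares one unparsed external entity named logo.
  static Document parseSample() {
    String xml = "<!DOCTYPE doc [\n"
     + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
     + "  <!ENTITY logo SYSTEM \"http://www.example.com/logo.gif\" NDATA gif>\n"
     + "]>\n"
     + "<doc/>";
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  // Returns the system ID declared for the named entity in the
  // sample document, or null if the entity is not declared.
  static String systemIdOf(String name) {
    Document doc = parseSample();
    Entity entity = findDeclaration(doc.createEntityReference(name));
    return (entity == null) ? null : entity.getSystemId();
  }

  public static void main(String[] args) {
    System.out.println("logo: " + systemIdOf("logo"));
    System.out.println("missing: " + systemIdOf("missing"));
  }

}
```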

One common complaint about XML is that it doesn't support the entity references like &nbsp; and &eacute; which developers are accustomed to from HTML. Using DOM, it's straightforward to replace any inconvenient character with an entity reference, as Example 11.14 shows. This program recursively descends the element tree looking for any nonbreaking space characters (Unicode code point 0xA0). It replaces any it finds with an entity reference with the name nbsp. To do so, it has to split the text node around the nonbreaking space.

Example 11.14 Inserting Entity References into a Document
 import org.w3c.dom.*;

 public class NBSPUtility {

   // Recursively descend the tree replacing all nonbreaking
   // spaces with &nbsp;
   public static void addEntityReferences(Node node) {

     int type = node.getNodeType();
     if (type == Node.TEXT_NODE) { // character data lives in text nodes
       Text text = (Text) node;
       String s = text.getNodeValue();
       int nbsp = s.indexOf('\u00A0'); // finds the first A0
       if (nbsp != -1) {
         // Split into three nodes: before, the A0 itself, and after
         Text middle = text.splitText(nbsp);
         Text end = middle.splitText(1);
         Node parent = text.getParentNode();
         Document factory = text.getOwnerDocument();
         EntityReference ref =
          factory.createEntityReference("nbsp");
         parent.replaceChild(ref, middle);
         addEntityReferences(end); // finds any subsequent A0s
         System.out.println("Added");
       }
     } // end if
     // The children of an entity reference are read-only,
     // so do not descend into entity references
     else if (type != Node.ENTITY_REFERENCE_NODE
      && node.hasChildNodes()) {
       NodeList children = node.getChildNodes();
       for (int i = 0; i < children.getLength(); i++) {
         Node child = children.item(i);
         addEntityReferences(child);
       } // end for
     } // end if

   }  // end addEntityReferences()

 }

It would be easy enough to make it replace all of the Latin-1 characters, or all of the characters that have standard entity references in HTML, or some such. You'd just need to keep a table of the characters and their corresponding entity references. You could even build such a table from the entities map available from the DTD.
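A table-driven variant might look something like the following sketch. The class name EntityTableUtility, the two-entry table, and the sample text are assumptions made for illustration; a fuller table would list every character you care about. This version also skips entity reference nodes when descending, because their children are read-only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;

class EntityTableUtility {

  // Returns the index of the first character in s that has an
  // entry in the table, or -1 if there is no such character.
  static int indexOfMapped(String s, Map<Character, String> table) {
    for (int i = 0; i < s.length(); i++) {
      if (table.containsKey(s.charAt(i))) return i;
    }
    return -1;
  }

  // Recursively replaces every character listed in the table
  // with an entity reference of the corresponding name.
  static void addEntityReferences(Node node, Map<Character, String> table) {
    int type = node.getNodeType();
    if (type == Node.TEXT_NODE) {
      Text text = (Text) node;
      String s = text.getNodeValue();
      int index = indexOfMapped(s, table);
      if (index != -1) {
        String name = table.get(s.charAt(index));
        Text middle = text.splitText(index);
        Text end = middle.splitText(1);
        EntityReference ref
         = text.getOwnerDocument().createEntityReference(name);
        text.getParentNode().replaceChild(ref, middle);
        addEntityReferences(end, table); // handles later matches
      }
    }
    else if (type != Node.ENTITY_REFERENCE_NODE && node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        addEntityReferences(children.item(i), table);
      }
    }
  }

  // Builds a one-element sample document, runs the replacement, and
  // returns the names of the inserted references in document order.
  static String demo() {
    Map<Character, String> table = new LinkedHashMap<>();
    table.put('\u00A0', "nbsp");
    table.put('\u00E9', "eacute");
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
    }
    catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
    Element root = doc.createElement("p");
    doc.appendChild(root);
    root.appendChild(doc.createTextNode("caf\u00E9\u00A0au lait"));
    addEntityReferences(doc, table);

    StringBuilder names = new StringBuilder();
    NodeList children = root.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
        if (names.length() > 0) names.append(',');
        names.append(child.getNodeName());
      }
    }
    return names.toString();
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }

}
```

Like Example 11.14, this sketch can leave empty text nodes behind when two mapped characters are adjacent.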

Although this code runs, the documents it produces are not necessarily well-formed. In particular, only entities defined in the DTD should be used; a reference to an entity that has not been declared makes the serialized document malformed. If the entity is declared in the DTD, then the child list of the entity reference is filled in automatically from the entity's replacement text. Unfortunately, however, DOM does not offer any means of defining new entities that are not part of the document's original DTD.
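One way to guard against this is to check the DTD before creating a reference. The following is a minimal sketch; the class DeclaredEntityCheck, its method names, and the sample document are hypothetical, and the check covers only entities that the parser reports through getEntities().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DeclaredEntityCheck {

  // Returns true only if the document has a DOCTYPE whose
  // entities map contains a declaration with the given name.
  static boolean isDeclared(Document doc, String name) {
    DocumentType doctype = doc.getDoctype();
    return doctype != null
     && doctype.getEntities().getNamedItem(name) != null;
  }

  // Parses a document from a string, wrapping any checked exception.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  // Checks the name against a hypothetical sample document whose
  // internal DTD subset declares nbsp and nothing else.
  static boolean sampleDeclares(String name) {
    Document doc = parse(
     "<!DOCTYPE doc [<!ENTITY nbsp \"&#160;\">]><doc/>");
    return isDeclared(doc, name);
  }

  public static void main(String[] args) {
    System.out.println("nbsp declared: " + sampleDeclares("nbsp"));
    System.out.println("eacute declared: " + sampleDeclares("eacute"));
  }

}
```

A program like Example 11.14 could call such a check once per entity name and leave the character in place when the entity is not declared.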



Processing XML with Java. A Guide to SAX, DOM, JDOM, JAXP, and TrAX
ISBN: 0201771861
Year: 2001
Pages: 191
