The Document Object Model defines a tree-based representation of XML documents. The org.w3c.dom package contains the basic node classes that represent the different components which make up the tree. The org.w3c.dom.traversal package includes some useful utility classes for navigating, searching, and querying the tree.

DOM Level 2, the version described here, is incomplete. It does not define how a DOMImplementation is loaded, how a document is parsed, or how a document is serialized. For the moment, JAXP provides a stopgap solution. Eventually, DOM3 will fill in these holes, but because it was far from complete at the time of this writing, this appendix covers DOM2 exclusively.

The DOM Data Model

Table A.1 summarizes the DOM data model with the name, value, parent, and possible children for each kind of node.

Table A.1. DOM2 Node Properties
Keep in mind the parts of the XML document that are not exposed in this data model:
A DOM program cannot manipulate any of these constructs. It cannot, for example, read in an XML document and then write it out again in the same encoding as in the original document, because it doesn't know what encoding the original document used. It cannot treat $var differently than &#x24;var, because it doesn't know which was originally written.

org.w3c.dom

The org.w3c.dom package contains the core interfaces that are used to form DOM documents. Node is the common superinterface that all of these node types share. In addition, this package contains a few data structures used to hold collections of DOM nodes and one exception class.

Attr

The Attr interface represents an attribute node. Its node properties are defined as follows:
Attr objects are not part of the tree, and they have neither parents nor siblings. getParentNode(), getPreviousSibling(), and getNextSibling() all return null when invoked on an Attr object. Attr objects do have children (Text and EntityReference objects), but it's generally best to ignore this and simply use the getValue() method to read the value of an attribute.

package org.w3c.dom;

public interface Attr extends Node {
  public String getName();
  public boolean getSpecified();
  public String getValue();
  public void setValue(String value) throws DOMException;
  public Element getOwnerElement();
}

CDATASection

The CDATASection interface represents a CDATA section. DOM parsers are not required to use this interface to report CDATA sections. They may just use Text objects to report the content of CDATA sections. Do not write code that depends on recognizing CDATA sections in text. The node properties of CDATASection are defined as follows:
package org.w3c.dom;

public interface CDATASection extends Text {
}

CharacterData

The CharacterData interface is the generic superinterface for nodes composed of plain text: Comment, Text, and CDATASection. All actual instances of CharacterData should be instances of one of these subinterfaces. The node properties depend on the specific subinterface.

package org.w3c.dom;

public interface CharacterData extends Node {
  public String getData() throws DOMException;
  public void setData(String data) throws DOMException;
  public int getLength();
  public String substringData(int offset, int count) throws DOMException;
  public void appendData(String s) throws DOMException;
  public void insertData(int offset, String s) throws DOMException;
  public void deleteData(int offset, int count) throws DOMException;
  public void replaceData(int offset, int count, String s) throws DOMException;
}

Comment

The Comment interface represents a comment node. It inherits all of its methods from the CharacterData and Node superinterfaces. Its node properties are defined as follows:
package org.w3c.dom;

public interface Comment extends CharacterData {
}

Document

The Document interface represents the root node of the tree. It also serves as an abstract factory to create the other kinds of nodes (element, attribute, comment, and so on) that will be stored in the tree. Its node properties are defined as follows:
package org.w3c.dom;

public interface Document extends Node {
  public DocumentType getDoctype();
  public DOMImplementation getImplementation();
  public Element getDocumentElement();
  public Element createElement(String tagName) throws DOMException;
  public Element createElementNS(String namespaceURI, String qualifiedName) throws DOMException;
  public Attr createAttribute(String name) throws DOMException;
  public Attr createAttributeNS(String namespaceURI, String qualifiedName) throws DOMException;
  public DocumentFragment createDocumentFragment();
  public Text createTextNode(String data);
  public Comment createComment(String data);
  public CDATASection createCDATASection(String data) throws DOMException;
  public ProcessingInstruction createProcessingInstruction(String target, String data) throws DOMException;
  public EntityReference createEntityReference(String name) throws DOMException;
  public NodeList getElementsByTagName(String tagName);
  public Node importNode(Node importedNode, boolean deep) throws DOMException;
  public NodeList getElementsByTagNameNS(String namespaceURI, String localName);
  public Element getElementById(String id);
}

DocumentFragment

The DocumentFragment interface is used to hold lists of element, text, comment, CDATA section, and processing instruction nodes when those nodes do not have a parent. It's convenient for cutting and pasting or inserting and moving fragments of an XML document that don't necessarily contain a single element. The node properties of DocumentFragment are defined as follows:
package org.w3c.dom;

public interface DocumentFragment extends Node {
}

This interface is for advanced use only. DOM trees created by a parser won't contain any DocumentFragment objects, and adding a DocumentFragment to a Document actually adds the contents of the fragment instead.

DocumentType

The DocumentType interface represents a document type declaration. It contains the root element name it declares, the system ID and public ID for the external DTD subset, and the complete internal DTD subset as a String. It also contains lists of the notations and general entities declared in the DTD. Otherwise it contains no information from the DTD. The node properties of a DocumentType object are defined as follows:
package org.w3c.dom;

public interface DocumentType extends Node {
  public String getName();
  public String getPublicId();
  public String getSystemId();
  public String getInternalSubset();
  public NamedNodeMap getEntities();
  public NamedNodeMap getNotations();
}

In DOM2, the entire DocumentType object is read-only. No part of it can be modified. Furthermore, a Document object's DocumentType cannot be changed after the Document object is created. This restriction is lifted in DOM3. DOM2 does not provide any representation of the document type definition (DTD) as distinguished from the document type declaration.

DOMImplementation

DOMImplementation is an abstract factory used to create new Document and DocumentType objects. The javax.xml.parsers.DocumentBuilder class can create new DOMImplementation objects.

package org.w3c.dom;

public interface DOMImplementation {
  public DocumentType createDocumentType(String qualifiedName, String publicID, String systemID) throws DOMException;
  public Document createDocument(String namespaceURI, String qualifiedName, DocumentType doctype) throws DOMException;
  public boolean hasFeature(String feature, String version);
}

Element

The Element interface represents an element node. The most important methods for this interface are inherited from the Node superinterface. Its node properties are defined as follows:
package org.w3c.dom;

public interface Element extends Node {
  public String getTagName();
  public NodeList getElementsByTagNameNS(String namespaceURI, String localName);
  public NodeList getElementsByTagName(String name);
  public String getAttribute(String name);
  public void setAttribute(String name, String value) throws DOMException;
  public void removeAttribute(String name) throws DOMException;
  public Attr getAttributeNode(String name);
  public Attr setAttributeNode(Attr newAttr) throws DOMException;
  public Attr removeAttributeNode(Attr oldAttr) throws DOMException;
  public String getAttributeNS(String namespaceURI, String localName);
  public void setAttributeNS(String namespaceURI, String qualifiedName, String value) throws DOMException;
  public void removeAttributeNS(String namespaceURI, String localName) throws DOMException;
  public Attr getAttributeNodeNS(String namespaceURI, String localName);
  public Attr setAttributeNodeNS(Attr newAttr) throws DOMException;
  public boolean hasAttribute(String name);
  public boolean hasAttributeNS(String namespaceURI, String localName);
}

Entity

The Entity interface represents an entity node. It does not appear directly in the tree; instead, an EntityReference node appears in the tree. The name of the EntityReference identifies a member of the document's entities map, which is accessible through the DocumentType interface. If the Entity object represents a parsed entity and the parser resolved the entity, then this node will have children that represent its replacement text. All aspects of the Entity object, including all of its children, are read-only. They cannot be modified or changed in any way. The node properties of Entity are defined as follows:
package org.w3c.dom;

public interface Entity extends Node {
  public String getPublicId();
  public String getSystemId();
  public String getNotationName();
}

Because Entity objects are not part of the tree, they have neither parents nor siblings. getParentNode(), getPreviousSibling(), and getNextSibling() all return null when invoked on an Entity object.

EntityReference

The EntityReference interface represents a parsed entity reference that appears in the document tree. Parsers are not required to use this class. Some parsers silently resolve all entity references to their replacement text. If a parser does not resolve external entity references, then it must include EntityReference objects instead, although the only information available from these objects will be the name.

A parser that does resolve external entity references and chooses to include EntityReference objects anyway will also set the children of this node, so as to represent the entity's replacement text. In this case, you can use the methods inherited from the Node superinterface to walk the entity's tree. Note, however, that all of these children and their descendants are completely read-only, and you will not be able to change them in any way. If you need to modify them, you must first clone each of the EntityReference children, and then replace the EntityReference with the cloned children.

EntityReference objects are never used for the five predefined entity references (&lt;, &gt;, &amp;, &quot;, and &apos;) or for character references such as &#xA0; or &#160;. The node properties of EntityReference are defined as follows:
package org.w3c.dom;

public interface EntityReference extends Node {
}

NamedNodeMap

DOM uses NamedNodeMap data structures to hold unordered sets of attributes, notations, and entities. You can iterate through a map using item() and getLength(). The first item in the map is at index 0. Note that the particular order the implementation chooses is not significant or even reproducible.

package org.w3c.dom;

public interface NamedNodeMap {
  public Node getNamedItem(String name);
  public Node setNamedItem(Node node) throws DOMException;
  public Node removeNamedItem(String name) throws DOMException;
  public Node item(int index);
  public int getLength();
  public Node getNamedItemNS(String namespaceURI, String localName);
  public Node setNamedItemNS(Node node) throws DOMException;
  public Node removeNamedItemNS(String namespaceURI, String localName) throws DOMException;
}

NamedNodeMaps are live. That is, adding an item to the map or removing an item from the map will add it to or remove it from whatever construct produced the map in the first place.

Node

Node is the key superinterface for almost all of the other classes in the org.w3c.dom package. It is the primary means by which you navigate, search, query, and occasionally even update an XML document with DOM.
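As a rough illustration of navigating with nothing but the Node interface, the following sketch walks a tree recursively using getFirstChild() and getNextSibling() and records the name of every element it meets. The class name NodeWalkSketch, its helper methods, and the sample XML are hypothetical, and the document is parsed through JAXP on the assumption that a JAXP parser is available, since DOM2 itself defines no parsing API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of DOM or JAXP.
final class NodeWalkSketch {

    // Parses an XML string with JAXP and wraps any failure in an unchecked exception.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    // Appends the names of all element nodes at or below node, in document order.
    static void collectElementNames(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        // Visit each child in turn; text and comment nodes simply have no children.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectElementNames(child, names);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<order><item>tea</item><!-- note --><item>milk</item></order>");
        List<String> names = new ArrayList<>();
        collectElementNames(doc, names);
        System.out.println(names); // prints [order, item, item]
    }
}
```

The walk starts at the Document node itself, which is why the root element shows up as an ordinary child rather than needing special handling.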
package org.w3c.dom;

public interface Node {

  // Node type constants
  public static final short ELEMENT_NODE;
  public static final short ATTRIBUTE_NODE;
  public static final short TEXT_NODE;
  public static final short CDATA_SECTION_NODE;
  public static final short ENTITY_REFERENCE_NODE;
  public static final short ENTITY_NODE;
  public static final short PROCESSING_INSTRUCTION_NODE;
  public static final short COMMENT_NODE;
  public static final short DOCUMENT_NODE;
  public static final short DOCUMENT_TYPE_NODE;
  public static final short DOCUMENT_FRAGMENT_NODE;
  public static final short NOTATION_NODE;

  // Basic getter methods
  public String getNodeName();
  public String getNodeValue() throws DOMException;
  public void setNodeValue(String value) throws DOMException;
  public short getNodeType();
  public String getNamespaceURI();
  public String getPrefix();
  public void setPrefix(String prefix) throws DOMException;
  public String getLocalName();

  // Navigation methods
  public Node getParentNode();
  public boolean hasChildNodes();
  public NodeList getChildNodes();
  public Node getFirstChild();
  public Node getLastChild();
  public Node getPreviousSibling();
  public Node getNextSibling();
  public Document getOwnerDocument();

  // Attribute methods
  public boolean hasAttributes();
  public NamedNodeMap getAttributes();

  // Tree modification methods
  public Node insertBefore(Node newChild, Node refChild) throws DOMException;
  public Node replaceChild(Node newChild, Node oldChild) throws DOMException;
  public Node removeChild(Node oldChild) throws DOMException;
  public Node appendChild(Node newChild) throws DOMException;

  // Utility methods
  public Node cloneNode(boolean deep);
  public void normalize();
  public boolean isSupported(String feature, String version);
}

NodeList

NodeList is the basic DOM list type. Its most common use is for lists of children of an Element or Document. The index of the first item in the list is 0, as with Java arrays.
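A minimal sketch of zero-based iteration over a NodeList might look like the following. The class NodeListSketch and its methods are hypothetical names chosen for illustration, and the small document is built in memory through JAXP's DocumentBuilder rather than parsed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; not part of DOM or JAXP.
final class NodeListSketch {

    // Builds <list><a/><b/><c/></list> in memory and returns the <list> element.
    static Element buildList() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element list = doc.createElement("list");
            doc.appendChild(list);
            for (String name : new String[] {"a", "b", "c"}) {
                list.appendChild(doc.createElement(name));
            }
            return list;
        } catch (Exception e) {
            throw new IllegalStateException("could not build sample document", e);
        }
    }

    // Formats each child as index=name, starting from index 0.
    static String childNames(Node parent) {
        NodeList children = parent.getChildNodes();
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            if (i > 0) {
                result.append(' ');
            }
            result.append(i).append('=').append(children.item(i).getNodeName());
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(childNames(buildList())); // prints 0=a 1=b 2=c
    }
}
```

The loop calls getLength() on every pass instead of caching it, which is a cautious habit when the tree might change while the list is being read.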
The actual data structure used to implement the list can vary from implementation to implementation, but one constant is that the lists are live. In other words, if a node is deleted or moved from its parent, then it is also deleted from all lists that were built from the children of that parent. Similarly, if a new node is added to some node, then it is also added to all lists that point to the children of that node.

package org.w3c.dom;

public interface NodeList {
  public Node item(int index);
  public int getLength();
}

Notation

The Notation interface represents a notation declared in the document's DTD. It does not have a position in the tree. The complete list of notations in the document is accessible through the getNotations() method of the DocumentType interface. Both this list and the individual Notation objects are read-only. The node properties of Notation are defined as follows:
package org.w3c.dom;

public interface Notation extends Node {
  public String getPublicId();
  public String getSystemId();
}

ProcessingInstruction

The ProcessingInstruction interface represents a processing instruction node. Its node properties are defined as follows:
package org.w3c.dom;

public interface ProcessingInstruction extends Node {
  public String getTarget();
  public String getData();
  public void setData(String data) throws DOMException;
}

Text

The Text interface represents a text node. It can contain any characters that are legal in XML text, including characters such as the less-than sign and ampersand that may need to be escaped when the document is serialized. When a parser reads an XML document and builds a DOM tree, each Text object will contain the longest-possible contiguous run of text. However, DOM does not maintain this constraint as the document is manipulated in memory. Its node properties are defined as follows:
The Text interface declares only one method of its own, splitText(). Most of its functionality is inherited from the superinterfaces CharacterData and Node.

package org.w3c.dom;

public interface Text extends CharacterData {
  public Text splitText(int offset) throws DOMException;
}

Exceptions and Errors

DOM2 defines only one exception class, DOMException. This is a runtime exception used for almost anything that can go wrong while constructing or manipulating a DOM Document. The details are provided by a short field, code, which is set to any of several named constants.

package org.w3c.dom;

public class DOMException extends RuntimeException {
  public short code;

  public static final short INDEX_SIZE_ERR;
  public static final short DOMSTRING_SIZE_ERR;
  public static final short HIERARCHY_REQUEST_ERR;
  public static final short WRONG_DOCUMENT_ERR;
  public static final short INVALID_CHARACTER_ERR;
  public static final short NO_DATA_ALLOWED_ERR;
  public static final short NO_MODIFICATION_ALLOWED_ERR;
  public static final short NOT_FOUND_ERR;
  public static final short NOT_SUPPORTED_ERR;
  public static final short INUSE_ATTRIBUTE_ERR;
  public static final short INVALID_STATE_ERR;
  public static final short SYNTAX_ERR;
  public static final short INVALID_MODIFICATION_ERR;
  public static final short NAMESPACE_ERR;
  public static final short INVALID_ACCESS_ERR;

  public DOMException(short code, String message);
}

org.w3c.dom.traversal

The DOM traversal API in the org.w3c.dom.traversal package provides some convenience classes for navigating and searching an XML document. The most useful aspect of this package is the capability to get lists and trees that contain the kinds of nodes that you're interested in while ignoring everything else.

DocumentTraversal

DocumentTraversal is a factory interface for creating new NodeIterator and TreeWalker objects that present a filtered view of the content of an element or a document.
(You can filter other kinds of nodes, too, but there's not much point to this if they don't have any children.) In implementations that support the traversal API (which can be determined by invoking hasFeature("Traversal", "2.0") on the DOMImplementation or isSupported("Traversal", "2.0") on the Document), all objects that implement Document also implement DocumentTraversal. That is, to create a DocumentTraversal object, just cast a Document to DocumentTraversal.

package org.w3c.dom.traversal;

public interface DocumentTraversal {
  public NodeIterator createNodeIterator(Node root, int whatToShow, NodeFilter filter, boolean expandEntities) throws DOMException;
  public TreeWalker createTreeWalker(Node root, int whatToShow, NodeFilter filter, boolean expandEntities) throws DOMException;
}

NodeFilter

The NodeFilter interface is used by NodeIterators and TreeWalkers to determine which nodes are included in the view of the document that they present to the client. Each node in the subtree will be passed to the filter's acceptNode() method. This returns one of the three named constants:
In addition, this interface has 13 named constants that can be combined with the bitwise operators and passed to createNodeIterator() and createTreeWalker() to specify which kinds of nodes should be included in their views.

package org.w3c.dom.traversal;

public interface NodeFilter {
  public static final short FILTER_ACCEPT;
  public static final short FILTER_REJECT;
  public static final short FILTER_SKIP;

  public static final int SHOW_ALL;
  public static final int SHOW_ELEMENT;
  public static final int SHOW_ATTRIBUTE;
  public static final int SHOW_TEXT;
  public static final int SHOW_CDATA_SECTION;
  public static final int SHOW_ENTITY_REFERENCE;
  public static final int SHOW_ENTITY;
  public static final int SHOW_PROCESSING_INSTRUCTION;
  public static final int SHOW_COMMENT;
  public static final int SHOW_DOCUMENT;
  public static final int SHOW_DOCUMENT_TYPE;
  public static final int SHOW_DOCUMENT_FRAGMENT;
  public static final int SHOW_NOTATION;

  public short acceptNode(Node node);
}

NodeIterator

The NodeIterator interface presents a subset of nodes from the document as a list in document order. The list is live; that is, changes to the document are reflected in the list.

package org.w3c.dom.traversal;

public interface NodeIterator {
  public Node nextNode() throws DOMException;
  public Node previousNode() throws DOMException;
  public Node getRoot();
  public int getWhatToShow();
  public NodeFilter getFilter();
  public boolean getExpandEntityReferences();
  public void detach();
}

TreeWalker

The TreeWalker interface presents a subset of nodes from the document as a tree. Walking the TreeWalker is much like walking a full Document or Element, except that many of the node's descendants in which you aren't interested can be filtered out so they don't get in your way. The tree is live; that is, changes to the document are reflected in the tree.
package org.w3c.dom.traversal;

public interface TreeWalker {
  public Node parentNode();
  public Node firstChild();
  public Node lastChild();
  public Node previousSibling();
  public Node nextSibling();
  public Node previousNode();
  public Node nextNode();
  public Node getRoot();
  public int getWhatToShow();
  public NodeFilter getFilter();
  public boolean getExpandEntityReferences();
  public Node getCurrentNode();
  public void setCurrentNode(Node node) throws DOMException;
}
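To make the filtered view concrete, here is a minimal sketch that casts a Document to DocumentTraversal, creates a TreeWalker restricted to elements, and uses a NodeFilter to reject one subtree. The class TreeWalkerSketch, its methods, the sample XML, and the element name draft are all hypothetical. The sketch also assumes the parser that JAXP supplies produces documents that implement DocumentTraversal, which it checks with instanceof before casting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of DOM or JAXP.
final class TreeWalkerSketch {

    // Parses an XML string with JAXP and wraps any failure in an unchecked exception.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    // Lists element names in document order, leaving out <draft> elements and their descendants.
    static List<String> visibleElements(Document doc) {
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("this DOM implementation lacks traversal support");
        }
        DocumentTraversal traversal = (DocumentTraversal) doc;

        // FILTER_REJECT hides a node together with its whole subtree in a TreeWalker.
        NodeFilter noDrafts = node -> "draft".equals(node.getNodeName())
                ? NodeFilter.FILTER_REJECT
                : NodeFilter.FILTER_ACCEPT;

        TreeWalker walker = traversal.createTreeWalker(doc, NodeFilter.SHOW_ELEMENT, noDrafts, true);
        List<String> names = new ArrayList<>();
        // The walker starts at the root; nextNode() returns null once the view is exhausted.
        for (Node node = walker.nextNode(); node != null; node = walker.nextNode()) {
            names.add(node.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<book><title>DOM</title><draft><title>old</title></draft>"
                + "<chapter><title>Nodes</title></chapter></book>");
        System.out.println(visibleElements(doc)); // prints [book, title, chapter, title]
    }
}
```

Returning FILTER_SKIP instead of FILTER_REJECT would hide only the draft element itself while still visiting the title inside it.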