Having successfully mastered JavaScript in the previous chapter (for our purposes, anyway), we're going to use it in this chapter to work with the W3C Document Object Model (DOM), the W3C-standardized programming interface for handling XML documents. Before the introduction of the DOM, all XML parsers and processors had different ways of interacting with XML documents and, worse, they kept changing all the time. With the introduction of the XML DOM, things have settled down (to some extent). Note that this chapter relies on Microsoft Internet Explorer, which provides the most complete JavaScript-accessible implementation of the DOM.
The W3C DOM specifies a way of treating a document as a tree of nodes. In this model, every discrete data item is a node, and child elements or enclosed text become subnodes. Treating a document as a tree of nodes is one good way of handling XML documents (although there are others, as we'll see when we start working with Java) because it makes it relatively easy to explicitly state which elements contain which other elements; the contained elements become subnodes of the container nodes. Everything in a document becomes a node in this model: elements, element attributes, text, and so on. Here are the possible node types in the W3C DOM:
Element
Attribute
Text
CDATA section
Entity reference
Entity
Processing instruction
Comment
Document
Document type
Document fragment
Notation
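In the W3C DOM Level 1 interface, each of these node types is identified by a numeric code that a node exposes through its nodeType property. The lookup table below is a small sketch of that mapping, in the same order as the list above; the helper name nodeTypeName is made up for illustration and is not part of any DOM implementation.

```javascript
// W3C DOM Level 1 nodeType codes, in the same order as the list above.
var NODE_TYPE_NAMES = {
  1: "Element",
  2: "Attribute",
  3: "Text",
  4: "CDATA section",
  5: "Entity reference",
  6: "Entity",
  7: "Processing instruction",
  8: "Comment",
  9: "Document",
  10: "Document type",
  11: "Document fragment",
  12: "Notation"
};

// Hypothetical helper: turns a numeric nodeType into a readable name.
function nodeTypeName(nodeType) {
  return NODE_TYPE_NAMES[nodeType] || "Unknown (" + nodeType + ")";
}

console.log(nodeTypeName(1));
console.log(nodeTypeName(3));
```

A helper like this can be handy when printing a document tree, because nodeType on its own is just a number.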
For example, take a look at this document:
    <?xml version="1.0" encoding="UTF-8"?>
    <DOCUMENT>
        <GREETING>
            Hello From XML
        </GREETING>
        <MESSAGE>
            Welcome to the wild and woolly world of XML.
        </MESSAGE>
    </DOCUMENT>
This document has a processing instruction node and a root element node corresponding to the <DOCUMENT> element. The <DOCUMENT> node has two subnodes, the <GREETING> and <MESSAGE> nodes. These nodes are child nodes of the <DOCUMENT> node and sibling nodes of each other. Both the <GREETING> and <MESSAGE> elements have one subnode: a text node that holds character data. We'll get used to handling documents like this one as a tree of nodes in this chapter. Figure 7.1 shows what this document looks like.
Every discrete data item is itself treated as a node. Using the properties defined in the W3C DOM, you can navigate along the various branches of a document's tree, using firstChild to move to the first child node of the current node, or nextSibling to move to the next sibling node. Working with a document this way takes a little practice, and that's what this chapter is all about.
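As a rough illustration of that kind of navigation, the sketch below walks a tree by following firstChild and nextSibling links. To stay self-contained, it uses plain JavaScript objects as a hand-built stand-in for the greeting document rather than a real parsed document; the helper names makeNode and collectNames are hypothetical.

```javascript
// Hypothetical stand-in for a DOM node: only the W3C property names
// nodeName, nodeValue, firstChild, and nextSibling are modeled.
function makeNode(nodeName, nodeValue, children) {
  var node = { nodeName: nodeName, nodeValue: nodeValue,
               firstChild: null, nextSibling: null };
  children = children || [];
  for (var i = 0; i < children.length; i++) {
    if (i === 0) node.firstChild = children[i];
    if (i + 1 < children.length) children[i].nextSibling = children[i + 1];
  }
  return node;
}

// Depth-first walk: record a node, then visit each child in turn,
// moving from child to child with nextSibling.
function collectNames(node, depth, lines) {
  lines.push(new Array(depth + 1).join("  ") + node.nodeName);
  for (var child = node.firstChild; child !== null; child = child.nextSibling) {
    collectNames(child, depth + 1, lines);
  }
  return lines;
}

var greeting = makeNode("GREETING", null,
  [makeNode("#text", "Hello From XML")]);
var message = makeNode("MESSAGE", null,
  [makeNode("#text", "Welcome to the wild and woolly world of XML.")]);
var root = makeNode("DOCUMENT", null, [greeting, message]);

console.log(collectNames(root, 0, []).join("\n"));
```

The same loop shape should carry over to a real document object, because it relies only on property names that the W3C DOM defines.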
There are a number of different levels of DOM:
Level 0. There is no official DOM "level 0," but that's the way W3C refers to the DOM as implemented in relatively early versions of the popular browsers, in particular Netscape Navigator 3.0 and Microsoft Internet Explorer 3.0.
Level 1. This level of the DOM is the current W3C recommendation, and it concentrates on the HTML and XML document models. You can find the documentation for this level at http://www.w3.org/TR/REC-DOM-Level-1/.
Level 2. Currently at the Candidate Recommendation stage, this level of the DOM is more advanced and includes a style sheet object model. It also adds functionality for manipulating the style information attached to a document. In addition, it enables you to traverse a document, has a built-in event model, and supports XML namespaces. You can find the documentation for this level at http://www.w3.org/TR/DOM-Level-2/.
Level 3. This level is still in the planning stage and will address document loading and saving, as well as content models (such as DTDs and schemas) with document validation support. In addition, it will also address document views and formatting, key events, and event groups. There is no documentation on this level yet.
Practically speaking, the only nearly complete implementation of the XML DOM today is the one in Internet Explorer version 5 or later. You can find the documentation for the Microsoft DOM at http://msdn.microsoft.com/library/psdk/xmlsdk/xmld20ab.htm as of this writing. However, the Microsoft sites are continually (and annoyingly) being reorganized, so it's quite possible that by the time you read this, that page will be long gone. In that case, your best bet is to go to http://msdn.microsoft.com and search for "xml dom." (The general rule is not to trust a URL at a Microsoft site for more than about two months.)
Because Internet Explorer provides substantial support for the W3C DOM level 1, I'm going to use it in this chapter. Let's hope that the translation to other W3C-compliant browsers, as those browsers begin to support the W3C DOM, won't be terribly difficult.
Here are the official W3C DOM level 1 objects:
Object | Description |
---|---|
Document | The document object. |
DocumentFragment | Reference to a fragment of a document. |
DocumentType | Reference to the <!DOCTYPE> declaration. |
EntityReference | Reference to an entity. |
Element | An element. |
Attr | An attribute. |
ProcessingInstruction | A processing instruction. |
Comment | Content of an XML comment. |
Text | Text content of an element or attribute. |
CDATASection | A CDATA section. |
Entity | Indication of a parsed or unparsed entity in the XML document. |
Notation | Holder for a notation. |
Node | A single node in the document tree. |
NodeList | A list of node objects. This allows iteration and indexed access operations. |
NamedNodeMap | Allows iteration and access by name to the collection of attributes. |
Microsoft uses different names for these objects and adds its own. In particular, Microsoft defines a set of "base objects" that form the foundation of its XML DOM. The top-level object is the DOMDocument object, and it's the only one that you create directly; you reach the other objects through that object. Here's the list of base objects in Internet Explorer. Note the objects designed to treat a document as a tree of nodes (XMLDOMNode, XMLDOMNodeList, and so on):
Object | Description |
---|---|
DOMDocument | The top node of the XML DOM tree. |
XMLDOMNode | A single node in the document tree. It includes support for data types, namespaces, DTDs, and XML schemas. |
XMLDOMNodeList | A list of node objects. It allows iteration and indexed access operations. |
XMLDOMNamedNodeMap | Allows iteration and access by name to the collection of attributes. |
XMLDOMParseError | Information about the most recent error. It includes error number, line number, character position, and a text description. |
XMLHttpRequest | Allows communication with HTTP servers. |
XTLRuntime | Supports methods that you can call from XSL style sheets. |
Besides these base objects, the Microsoft XML DOM also provides objects that you use when working with documents in code. These include objects for the various types of nodes, such as XMLDOMAttribute, XMLDOMCharacterData, and XMLDOMElement:
Object | Description |
---|---|
XMLDOMAttribute | Stands for an attribute object. |
XMLDOMCDATASection | Handles CDATA sections so that text is not interpreted as markup language. |
XMLDOMCharacterData | Provides methods used for text manipulation. |
XMLDOMComment | Gives the content of an XML comment. |
XMLDOMDocumentFragment | Is a lightweight object useful for tree insert operations. |
XMLDOMDocumentType | Holds information connected to the document type declaration. |
XMLDOMElement | Stands for the element object. |
XMLDOMEntity | Stands for a parsed or unparsed entity in the XML document. |
XMLDOMEntityReference | Stands for an entity reference node. |
XMLDOMImplementation | Supports general DOM methods. |
XMLDOMNotation | Holds a notation (as declared in the DTD or schema). |
XMLDOMProcessingInstruction | Is a processing instruction. |
XMLDOMText | Is text content of an element or attribute. |
We'll put many of these objects to work in this chapter, seeing how to parse and access XML documents using the Microsoft XML DOM and handling events as documents are loaded. We'll also see how to alter an XML document at run time.
The preceding list of objects is pretty substantial, and each object can contain its own properties, methods, and events. Although most of these properties, methods, and events are specified in the W3C XML DOM, many are added by Microsoft as well (and so are nonstandard). If we're going to work with the XML DOM in practice, it's essential to have a good understanding of these objects, both practically for the purposes of this chapter and also for reference. I'll go through the major objects in some detail to make handling the XML DOM clear, starting with the main object, the DOMDocument object.
The DOMDocument object is the main object that you work with, and it represents the top node in every document tree. When working with the DOM, this is the only object that you create directly.
As we'll see in this chapter, there are two ways to create document objects in Internet Explorer: using the Microsoft.XMLDOM class and using XML data islands. Creating a document object with the Microsoft.XMLDOM class looks like this, where you explicitly load a document into the object with the load method:
    function readXMLDocument()
    {
        var xmldoc
        xmldoc = new ActiveXObject("Microsoft.XMLDOM")
        xmldoc.load("meetings.xml")
        .
        .
        .
We'll also see that you can use the <XML> HTML element to create a data island in Internet Explorer, and then use the XMLDocument property of that element to gain access to the corresponding document object:
    <XML ID="meetingsXML" SRC="meetings.xml"></XML>
    <SCRIPT LANGUAGE="JavaScript">
        function readXMLDocument()
        {
            xmldoc = document.all("meetingsXML").XMLDocument
            .
            .
            .
XML DOM and Multithreaded Programs
There's also a "free-threaded" version of the Microsoft.XMLDOM class that you can use in multithreaded programs:
    var xmldoc = new ActiveXObject("Microsoft.FreeThreadedXMLDOM")
For more information on this advanced topic, take a look at the Microsoft XML DOM site (at http://msdn.microsoft.com/library/psdk/xmlsdk/xmld20ab.htm as of this writing, but very probably moved by the time you read this).
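The two class names above can be wrapped in one small helper. This is only a sketch: the function names progIdFor and createXmlDocument are invented here, and the try/catch means the function simply returns null anywhere that ActiveXObject is not available (that is, outside Internet Explorer).

```javascript
// Hypothetical helper: picks one of the two class names mentioned above.
function progIdFor(freeThreaded) {
  return freeThreaded ? "Microsoft.FreeThreadedXMLDOM" : "Microsoft.XMLDOM";
}

// Tries to create a document object. Where ActiveXObject does not exist
// or the class cannot be created, the error is caught and null is
// returned so that callers can detect the missing support.
function createXmlDocument(freeThreaded) {
  try {
    return new ActiveXObject(progIdFor(freeThreaded));
  } catch (e) {
    return null;
  }
}

var xmldoc = createXmlDocument(false);
if (xmldoc === null) {
  console.log("No Microsoft XML DOM available in this environment.");
}
```

Returning null instead of letting the error escape keeps the calling code to a single check before it goes on to use the document object.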
Here are the properties of this object:
Property | Description |
---|---|
async[*] | Indicates whether asynchronous download is allowed. Read/write. |
attributes | Holds the list of attributes for this node. Read-only. |
baseName[*] | Is the base name qualified with the namespace. Read-only. |
childNodes | Holds a node list containing child nodes for nodes that may have children. Read-only. |
dataType[*] | Gives the data type for this node. Read/write. |
definition[*] | Gives the definition of the node in the DTD or schema. Read-only. |
doctype | Specifies the document type node, which is what specifies the DTD for this document. Read-only. |
documentElement | Gives the root element of the document. Read/write. |
firstChild | Gives the first child of the current node. Read-only. |
implementation | Specifies the XMLDOMImplementation object for this document. Read-only. |
lastChild | Gives the last child node of the current node. Read-only. |
namespaceURI[*] | Gives the URI for the namespace. Read-only. |
nextSibling | Specifies the next sibling of the current node. Read-only. |
nodeName | Specifies the qualified name of the element, attribute, or entity reference. Holds a fixed string for other node types. Read-only. |
nodeType | Gives the XML DOM node type. Read-only. |
nodeTypedValue[*] | Holds this node's value. Read/write. |
nodeTypeString[*] | Gives the node type expressed as a string. Read-only. |
nodeValue | Is the text associated with the node. Read/write. |
ondataavailable[*] | Is the event handler for the ondataavailable event. Read/write. |
onreadystatechange[*] | Is the event handler that handles readyState property changes. Read/write. |
ontransformnode[*] | Is the event handler for the ontransformnode event. Read/write. |
ownerDocument | Gives the root of the document that contains this node. Read-only. |
parentNode | Specifies the parent node (for nodes that can have parents). Read-only. |
parsed[*] | Is true if this node and all descendants have been parsed; is false otherwise. Read-only. |
parseError[*] | Is an XMLDOMParseError object with information about the most recent parsing error. Read-only. |
prefix[*] | Gives the namespace prefix. Read-only. |
preserveWhiteSpace[*] | Is true if processing should preserve whitespace; is false otherwise. Read/write. |
previousSibling | Specifies the previous sibling of this node. Read-only. |
readyState[*] | Gives the current state of the XML document. Read-only. |
resolveExternals[*] | Indicates whether external definitions are to be resolved at parse time. Read/write. |
specified[*] | Indicates whether the node is explicitly given or derived from a default value. Read-only. |
text[*] | Gives the text content of the node and its subtrees. Read/write. |
url[*] | Specifies the canonical URL for the most recently loaded XML document. Read-only. |
validateOnParse[*] | Indicates whether the parser should validate this document. Read/write. |
xml[*] | Gives the XML representation of the node and all its descendants. Read-only. |
[*] Microsoft extension to the W3C DOM.
Here are the methods of the document object:
Method | Description |
---|---|
abort[*] | Aborts an asynchronous download |
appendChild | Appends a new child as the last child of the current node |
cloneNode | Returns a new node that is a copy of this node |
createAttribute | Returns a new attribute with the given name |
createCDATASection | Returns a CDATA section node that contains the given data |
createComment | Returns a comment node |
createDocumentFragment | Returns an empty DocumentFragment object |
createElement | Returns an element node using the given name |
createEntityReference | Returns a new EntityReference object |
createNode[*] | Returns a node using the given type, name, and namespace |
createProcessingInstruction | Returns a processing instruction node |
createTextNode | Returns a text node that contains the given data |
getElementsByTagName | Yields a collection of elements that have the given name |
hasChildNodes | Is true if this node has children |
insertBefore | Inserts a child node before the given node |
load[*] | Loads an XML document from the given location |
loadXML[*] | Loads an XML document using the given string |
nodeFromID[*] | Yields the node whose ID attribute matches the given value |
removeChild | Removes the given child node from the list of children |
replaceChild | Replaces the given child node with the given new child node |
save[*] | Saves an XML document to the given location |
selectNodes[*] | Applies the given pattern-matching operation to this node's context, returning a list of matching nodes |
selectSingleNode[*] | Applies the given pattern-matching operation to this node's context, returning the first matching node |
transformNode[*] | Transforms this node and its children using the given XSL style sheet |
transformNodeToObject[*] | Transforms this node and its children to an object, using the given XSL style sheet |
Here are the events of the document object:
Event | Description |
---|---|
ondataavailable[*] | Indicates that XML document data is available |
onreadystatechange[*] | Indicates when the readyState property changes |
ontransformnode[*] | Happens before each node in the style sheet is applied in the XML source |
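Because async is read/write and onreadystatechange fires as readyState changes, a script can start a load and react once the document is complete. The sketch below assumes the Microsoft convention that a readyState of 4 means loading has completed; whenLoaded is a hypothetical helper, and the fakeDoc object only imitates the relevant properties so that the example can run outside Internet Explorer.

```javascript
// Assumption: in the Microsoft XML DOM, readyState 4 means "completed".
var READY_STATE_COMPLETED = 4;

// Hypothetical helper: starts an asynchronous load and calls onDone
// once the document reports that it has finished loading.
function whenLoaded(xmldoc, url, onDone) {
  xmldoc.async = true;
  xmldoc.onreadystatechange = function () {
    if (xmldoc.readyState === READY_STATE_COMPLETED) {
      onDone(xmldoc);
    }
  };
  xmldoc.load(url);
}

// Stand-in for a document object, imitating only what whenLoaded uses.
// It steps through states 1 to 4 immediately instead of downloading.
var fakeDoc = {
  async: false,
  readyState: 0,
  onreadystatechange: null,
  url: "",
  load: function (url) {
    this.url = url;
    for (var state = 1; state <= 4; state++) {
      this.readyState = state;
      if (this.onreadystatechange) this.onreadystatechange();
    }
  }
};

var completedCount = 0;
whenLoaded(fakeDoc, "meetings.xml", function (doc) {
  completedCount++;
  console.log("Finished loading " + doc.url);
});
```

The handler is assigned before load is called so that no state change can be missed.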
The Microsoft XMLDOMNode object extends the core XML DOM node interface by adding support for data types, namespaces, DTDs, and schemas as implemented in Internet Explorer. We'll use this object a good deal as we traverse document trees. Here are the properties of this object:
Property | Description |
---|---|
attributes | The list of attributes for this node. Read-only. |
baseName[*] | The base name for the name qualified with the namespace. Read-only. |
childNodes | A node list containing the child nodes of the current node. Read-only. |
dataType[*] | The data type for this node. Read/write. |
definition[*] | The definition of the node in the DTD or schema. Read-only. |
firstChild | The first child of the current node. Read-only. |
lastChild | The last child of the current node. Read-only. |
namespaceURI[*] | The URI for the namespace. Read-only. |
nextSibling | The next sibling of this node. Read-only. |
nodeName | Holder for a qualified name for an element, attribute, or entity reference, or a string for other node types. Read-only. |
nodeType | The XML DOM node type. Read-only. |
nodeTypedValue[*] | The node's value. Read/write. |
nodeTypeString[*] | The node type in string form. Read-only. |
nodeValue | The text associated with the node. Read/write. |
ownerDocument | The root of the document. Read-only. |
parentNode | The parent node. Read-only. |
parsed[*] | True if this node and all descendants have been parsed; false otherwise. Read-only. |
prefix[*] | The namespace prefix. Read-only. |
previousSibling | The previous sibling of this node. Read-only. |
specified[*] | Indication of whether a node is explicitly given or derived from a default value. Read-only. |
text[*] | The text content of the node and its subtrees. Read/write. |
xml[*] | The XML representation of the node and all its descendants. Read-only. |
Here are the methods of this object:
Method | Description |
---|---|
appendChild | Appends a new child as the last child of this node |
cloneNode | Creates a new node that is a copy of this node |
hasChildNodes | Is true if this node has children |
insertBefore | Inserts a child node before the given node |
removeChild | Removes the given child node |
replaceChild | Replaces the given child node with the given new child node |
selectNodes[*] | Applies the given pattern-matching operation to this node's context, returning a list of matching nodes |
selectSingleNode[*] | Applies the given pattern-matching operation to this node's context, returning the first matching node |
transformNode[*] | Transforms this node and its children using the given XSL style sheet |
transformNodeToObject[*] | Transforms this node and its children using the given XSL style sheet, returning the result in an object |
This object has no events.
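To show how childNodes and hasChildNodes fit together when traversing a tree, here is a minimal sketch that counts every node below a starting node. The makeNode function builds a hypothetical stand-in object (not a real XMLDOMNode), and countDescendants is an illustrative name.

```javascript
// Hypothetical stand-in for a node: models only nodeName, childNodes
// (with length and item), and hasChildNodes.
function makeNode(nodeName, children) {
  children = children || [];
  return {
    nodeName: nodeName,
    childNodes: {
      length: children.length,
      item: function (i) { return children[i]; }
    },
    hasChildNodes: function () { return children.length > 0; }
  };
}

// Counts every node below the given node by recursing through childNodes.
function countDescendants(node) {
  if (!node.hasChildNodes()) return 0;
  var total = 0;
  for (var i = 0; i < node.childNodes.length; i++) {
    // One for the child itself, plus everything beneath it.
    total += 1 + countDescendants(node.childNodes.item(i));
  }
  return total;
}

var person = makeNode("PERSON", [
  makeNode("FIRST_NAME", [makeNode("#text")]),
  makeNode("LAST_NAME", [makeNode("#text")])
]);

console.log(countDescendants(person));
```

Here the count is 4: two child elements plus one text node inside each of them.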
You use the XMLDOMNodeList to handle lists of nodes. Node lists are useful because a node itself can have many child nodes. Using a node list, you can handle all the children of a node at once.
For example, here I'm loading a document and getting a list of all <PERSON> elements as a node list, using the document object's getElementsByTagName method:
    function readXMLDocument()
    {
        var xmldoc, nodeList
        xmldoc = new ActiveXObject("Microsoft.XMLDOM")
        xmldoc.load("meetings.xml")
        nodeList = xmldoc.getElementsByTagName("PERSON")
        .
        .
        .
The XMLDOMNodeList object has a single property, length, which describes the number of items in the collection and is read-only.
Here are the methods of the XMLDOMNodeList object:
Method | Description |
---|---|
item | Allows random access to nodes in the collection |
nextNode[*] | Indicates the next node in the collection |
reset[*] | Resets the list iterator |
This object has no events.
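The two styles of access, indexed with item and sequential with nextNode and reset, can be sketched as follows. The fake node list here is a hypothetical stand-in that holds plain strings in place of <PERSON> nodes, and the assumption that nextNode returns null at the end of the list is labeled in the code.

```javascript
// Copies a node list into an ordinary array using length and item.
function nodeListToArray(nodeList) {
  var result = [];
  for (var i = 0; i < nodeList.length; i++) {
    result.push(nodeList.item(i));
  }
  return result;
}

// Same result using the Microsoft extensions reset and nextNode.
// Assumption: nextNode returns null once the list is exhausted.
function nodeListToArrayByIterator(nodeList) {
  var result = [];
  nodeList.reset();
  for (var node = nodeList.nextNode(); node != null; node = nodeList.nextNode()) {
    result.push(node);
  }
  return result;
}

// Hypothetical stand-in for a node list; strings stand in for nodes.
function makeFakeNodeList(nodes) {
  var position = 0;
  return {
    length: nodes.length,
    item: function (i) { return nodes[i]; },
    nextNode: function () {
      return position < nodes.length ? nodes[position++] : null;
    },
    reset: function () { position = 0; }
  };
}

var people = makeFakeNodeList(["Edward", "Ernestine", "Betty"]);
console.log(nodeListToArray(people).join(", "));
console.log(nodeListToArrayByIterator(people).join(", "));
```

Only length and item come from the W3C DOM, so the first function is the more portable of the two.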
The Microsoft XML DOM also supports an XMLDOMNamedNodeMap object, which holds a collection of attributes that you can iterate over or access by name, with added support for namespaces. Here are the properties of this object:
Property | Description |
---|---|
length | Gives the number of items in the collection. Read-only. |
item | Allows random access to nodes in the collection. Read-only. |
Here are the methods of this object:
Method | Description |
---|---|
getNamedItem | Gets the attribute with the given name |
getQualifiedItem[*] | Gets the attribute with the given namespace and attribute name |
nextNode[*] | Gets the next node |
removeNamedItem | Removes an attribute |
removeQualifiedItem[*] | Removes the attribute with the given namespace and attribute name |
reset[*] | Resets the list iterator |
setNamedItem | Adds the given node |
This object has no events.
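As a sketch of how a named node map is typically read, the code below copies each attribute's nodeName and nodeValue into an ordinary JavaScript object and also looks one attribute up by name. The makeFakeAttributes function is a hypothetical stand-in for a real attribute collection, and attributesToObject is an illustrative helper name.

```javascript
// Copies every attribute in a named node map into a plain object,
// keyed by attribute name.
function attributesToObject(attributes) {
  var result = {};
  for (var i = 0; i < attributes.length; i++) {
    var attr = attributes.item(i);
    result[attr.nodeName] = attr.nodeValue;
  }
  return result;
}

// Hypothetical stand-in for a named node map: models only length,
// item, and getNamedItem.
function makeFakeAttributes(pairs) {
  return {
    length: pairs.length,
    item: function (i) { return pairs[i]; },
    getNamedItem: function (name) {
      for (var i = 0; i < pairs.length; i++) {
        if (pairs[i].nodeName === name) return pairs[i];
      }
      return null;
    }
  };
}

var attrs = makeFakeAttributes([
  { nodeName: "TYPE", nodeValue: "informal" },
  { nodeName: "ATTENDANCE", nodeValue: "present" }
]);

console.log(attributesToObject(attrs).TYPE);
console.log(attrs.getNamedItem("ATTENDANCE").nodeValue);
```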
The Microsoft XMLDOMParseError object holds information about the most recent parse error, including the error number, line number, character position, and a text description. Although it's not obvious to anyone who loads an XML document into Internet Explorer, the browser does actually validate the document using either a DTD or schema if one is supplied. It's not obvious that this happens because, by default, Internet Explorer does not display any validation error messages. However, if you use the XMLDOMParseError object, you can get a full validation report, and I'll do so later in this chapter.
Here are the properties of this object:
Property | Description |
---|---|
errorCode | The error code of the most recent parse error. Read-only. |
filepos | The file position where the error occurred. Read-only. |
line | The line number that contains the error. Read-only. |
linepos | The character position in the line where the error happened. Read-only. |
reason | The reason for the error. Read-only. |
srcText | The full text of the line containing the error. Read-only. |
url | The URL of the XML document containing the last error. Read-only. |
Note that this object does not have any methods or events, and it does not correspond to any official W3C object in the W3C DOM.
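A short sketch of how these properties might be combined into a readable report follows. It assumes the Microsoft convention that an errorCode of 0 means the most recent parse succeeded; describeParseError is a hypothetical helper, and the sample error object holds made-up values purely so that the code can run outside Internet Explorer.

```javascript
// Hypothetical helper: builds a report from the XMLDOMParseError
// properties listed above.
function describeParseError(parseError) {
  // Assumption: errorCode is 0 when the most recent parse succeeded.
  if (parseError.errorCode === 0) {
    return "No parse error.";
  }
  return "Error " + parseError.errorCode +
    " in " + parseError.url +
    " at line " + parseError.line +
    ", character " + parseError.linepos +
    ": " + parseError.reason +
    "\nSource: " + parseError.srcText;
}

// Made-up values, for illustration only.
var sampleError = {
  errorCode: -1,
  url: "meetings.xml",
  line: 4,
  linepos: 9,
  reason: "Sample reason text.",
  srcText: "<SUBJECT>XML</SUBJEC>"
};

console.log(describeParseError(sampleError));
console.log(describeParseError({ errorCode: 0 }));
```

In Internet Explorer, you would pass the document object's parseError property to a function like this after calling load.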
In both the W3C and Microsoft DOM, attribute objects are node objects (that is, they are based on the node object), but they are not actually child nodes of an element and are not considered part of the document tree. Instead, attributes are considered properties of their associated elements. (This means that properties such as parentNode, previousSibling, or nextSibling are meaningless for attributes.) We'll see how to work with attributes in this chapter.
Here are the properties of the XMLDOMAttribute object:
Property | Description |
---|---|
attributes | The list of attributes for this node. Read-only. |
baseName[*] | The base name for the name qualified with the namespace. Read-only. |
childNodes | A node list containing child nodes. Read-only. |
dataType[*] | The data type of this node. Read/write. |
definition[*] | The definition of the node in the DTD or schema. Read-only. |
firstChild | The first child of the current node. Read-only. |
lastChild | The last child of the current node. Read-only. |
name | The attribute name. Read-only. |
namespaceURI[*] | The URI for the namespace. Read-only. |
nextSibling | The next sibling of this node. Read-only. |
nodeName | The qualified name for an element, attribute, or entity reference, or a string for other node types. Read-only. |
nodeType | The XML DOM node type. Read-only. |
nodeTypedValue[*] | The node's value. Read/write. |
nodeTypeString[*] | The node type in string form. Read-only. |
nodeValue | The text associated with the node. Read/write. |
ownerDocument | The root of the document. Read-only. |
parentNode | Holder for the parent node (for nodes that can have parents). Read-only. |
parsed[*] | True if this node and all descendants have been parsed; false otherwise. Read-only. |
prefix[*] | The namespace prefix. Read-only. |
previousSibling | The previous sibling of this node. Read-only. |
specified | Indication of whether the node (usually an attribute) is explicitly specified or derived from a default value. Read-only. |
text[*] | The text content of the node and its subtrees. Read/write. |
value | The attribute's value. Read/write. |
xml[*] | The XML representation of the node and all its descendants. Read-only. |
Here are the methods of the XMLDOMAttribute object:
Method | Description |
---|---|
appendChild | Appends a new child as the last child of this node |
cloneNode | Returns a new node that is a copy of this node |
hasChildNodes | Is true if this node has children |
insertBefore | Inserts a child node before the given node |
removeChild | Removes the given child node from the list |
replaceChild | Replaces the given child node with the given new child node |
selectNodes[*] | Applies the given pattern-matching operation to this node's context, returning a list of matching nodes |
selectSingleNode[*] | Applies the given pattern-matching operation to this node's context, returning the first matching node |
transformNode[*] | Transforms this node and its children using the given XSL style sheet |
transformNodeToObject[*] | Transforms this node and its children using the given XSL style sheet, and returns the result in an object |
This object does not support any events.
XMLDOMElement objects represent elements and are probably the most common node objects that you'll deal with. Because attributes are not considered child nodes of an element object, you use special properties and methods to get the attributes of an element. For example, you can use the attributes property, which returns an XMLDOMNamedNodeMap object that contains all the element's attributes, or the getAttribute method, which returns the value of a single named attribute.
Here are the properties of the XMLDOMElement object:
Property | Description |
---|---|
attributes | The list of attributes for this node. Read-only. |
baseName[*] | The base name for the name qualified with the namespace. Read-only. |
childNodes | A node list containing the children. Read-only. |
dataType[*] | The data type for this node. Read/write. |
definition[*] | The definition of the node in the DTD or schema. Read-only. |
firstChild | The first child of this node. Read-only. |
lastChild | The last child node of this node. Read-only. |
namespaceURI[*] | The URI for the namespace. Read-only. |
nextSibling | The next sibling of this node. Read-only. |
nodeName | Holder for the qualified name of an element, attribute, or entity reference, or a string for other node types. Read-only. |
nodeType | Indication of the XML DOM node type. Read-only. |
nodeTypeString[*] | The node type in string form. Read-only. |
nodeValue | The text associated with the node. Read/write. |
ownerDocument | The root of the document. Read-only. |
parentNode | The parent node of the current node. Read-only. |
parsed[*] | True if this node and all descendants have been parsed; false otherwise. Read-only. |
prefix[*] | The namespace prefix. Read-only. |
previousSibling | The previous sibling of this node. Read-only. |
specified[*] | Indication of whether the node is explicitly specified or derived from a default value in the DTD or schema. Read-only. |
tagName | Holder for the element name. Read-only. |
text[*] | Holder for the text content of the node and its subtrees. Read/write. |
xml[*] | Holder for the XML representation of the node and all its descendants. Read-only. |
Here are the methods of the XMLDOMElement object:
Method | Description |
---|---|
appendChild | Appends a new child as the last child of the current node |
cloneNode | Returns a new node that is a copy of this node |
getAttribute | Gets the value of the named attribute |
getAttributeNode | Gets the named attribute node |
getElementsByTagName | Returns a list of all descendant elements that match the given name |
hasChildNodes | Is true if this node has children |
insertBefore | Inserts a child node before the given node |
normalize | Normalizes all descendant elements, combining two or more text nodes next to each other into one text node |
removeAttribute | Removes or replaces the named attribute |
removeAttributeNode | Removes the given attribute from this element |
removeChild | Removes the given child node |
replaceChild | Replaces the given child node with the given new child node |
selectNodes[*] | Applies the given pattern-matching operation to this node's context, returning the list of matching nodes |
selectSingleNode[*] | Applies the given pattern-matching operation to this node's context, returning the first matching node |
setAttribute | Sets the value of a named attribute |
setAttributeNode | Adds or changes the given attribute node on this element |
transformNode[*] | Transforms this node and its children using the given XSL style sheet |
transformNodeToObject[*] | Transforms this node and its children using the given XSL style sheet, and returns the resulting transformation as an object |
This object has no events.
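To illustrate getAttribute and setAttribute, the sketch below counts elements by attribute value, using the ATTENDANCE attribute from this chapter's meetings.xml as the example. The elements are hypothetical stand-ins built by makeFakeElement, and the choice to return null for a missing attribute is an assumption of the stand-in, not a statement about any particular DOM implementation.

```javascript
// Hypothetical stand-in for an element: models only tagName,
// getAttribute, and setAttribute.
function makeFakeElement(tagName, attributes) {
  return {
    tagName: tagName,
    getAttribute: function (name) {
      // Assumption: a missing attribute yields null.
      return Object.prototype.hasOwnProperty.call(attributes, name)
        ? attributes[name] : null;
    },
    setAttribute: function (name, value) {
      attributes[name] = String(value);
    }
  };
}

// Counts the elements whose named attribute has the given value.
// Here elements is a plain array; a real node list would be read
// with length and item instead.
function countByAttribute(elements, name, value) {
  var count = 0;
  for (var i = 0; i < elements.length; i++) {
    if (elements[i].getAttribute(name) === value) count++;
  }
  return count;
}

var people = [
  makeFakeElement("PERSON", { ATTENDANCE: "present" }),
  makeFakeElement("PERSON", { ATTENDANCE: "absent" }),
  makeFakeElement("PERSON", { ATTENDANCE: "present" })
];

console.log(countByAttribute(people, "ATTENDANCE", "present"));
people[1].setAttribute("ATTENDANCE", "present");
console.log(countByAttribute(people, "ATTENDANCE", "present"));
```

The count goes from 2 to 3 after the second person's attribute is changed with setAttribute.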
The XMLDOMText object holds the text content of an element or attribute. If there is no markup inside an element, but there is text, that element will contain only one node: a text node that holds the text. (In mixed-content models, text nodes can have sibling element nodes.)
When a document is first made available to the XML DOM, all text is normalized, which means that there is only one text node for each block of text. You can actually create text nodes that are adjacent to each other, although they will not be saved as distinct the next time that the document is opened. (It's worth noting that the normalize method on the XMLDOMElement object merges adjacent text nodes into a single node.)
Here are the properties of the XMLDOMText object:
Property | Description |
---|---|
attributes | Holder for the list of attributes for this node. Read-only. |
baseName[*] | The base name for the name qualified with the namespace. Read-only. |
childNodes | A node list containing the child nodes. Read-only. |
data | This node's data (what's actually stored depends on the node type). Read/write. |
dataType[*] | The data type for this node. Read/write. |
definition[*] | The definition of the node in the DTD or schema. Read-only. |
firstChild | The first child of the current node. Read-only. |
lastChild | The last child of the current node. Read-only. |
length | The length, in characters, of the data. Read-only. |
namespaceURI[*] | The URI for the namespace. Read-only. |
nextSibling | The next sibling of this node. Read-only. |
nodeName | The qualified name of an element, attribute, or entity reference, or a string for other node types. Read-only. |
nodeType | Indication of the XML DOM node type. Read-only. |
nodeTypedValue[*] | This node's value. Read/write. |
nodeTypeString[*] | The node type in string form. Read-only. |
nodeValue | The text associated with the node. Read/write. |
ownerDocument | The root of the document. Read-only. |
parentNode | The parent node. Read-only. |
parsed[*] | True if this node and all descendants have been parsed; false otherwise. Read-only. |
prefix[*] | The namespace prefix. Read-only. |
previousSibling | The previous sibling of this node. Read-only. |
specified[*] | Indication of whether the node is explicitly specified or derived from a default value. Read-only. |
text[*] | Holder for the text content of the node and its subtrees. Read/write. |
xml[*] | Holder for the XML representation of the node and all its descendants. Read-only. |
Here are the methods of the XMLDOMText object:
Method | Description |
---|---|
appendChild | Appends a new child as the last child of this node |
appendData | Appends the given string to the existing string data |
cloneNode | Returns a new node that is a copy of this node |
deleteData | Removes the given substring within the string data |
hasChildNodes | Is true if this node has children |
insertBefore | Inserts a child node before the specified node |
insertData | Inserts the supplied string at the specified offset |
removeChild | Removes the specified child node from the list of children |
replaceChild | Replaces the specified child node with the given new child node |
replaceData | Replaces the given number of characters with the given string |
selectNodes[*] | Applies the given pattern-matching operation to this node's context, returning a list of matching nodes |
selectSingleNode[*] | Applies the given pattern-matching operation to this node's context, returning the first matching node |
splitText | Breaks this text node into two text nodes |
substringData | Returns a substring of the full string |
transformNode[*] | Transforms this node and its children using the given XSL style sheet |
transformNodeToObject[*] | Transforms this node and its children using the given XSL style sheet, and returns the resulting transformation as an object |
This object doesn't support any events.
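The text-manipulation methods in this table take zero-based character offsets, as defined by the W3C CharacterData interface. The sketch below imitates their effect on an ordinary JavaScript string so that the offset arithmetic is easy to see; these functions are plain-string emulations written for illustration, not the DOM methods themselves, and they take the text as an extra first argument.

```javascript
// Plain-string emulations of the character data methods.
// Each takes the current text and returns the resulting text.
function substringData(data, offset, count) {
  return data.slice(offset, offset + count);
}

function insertData(data, offset, arg) {
  return data.slice(0, offset) + arg + data.slice(offset);
}

function deleteData(data, offset, count) {
  return data.slice(0, offset) + data.slice(offset + count);
}

function replaceData(data, offset, count, arg) {
  return data.slice(0, offset) + arg + data.slice(offset + count);
}

// splitText breaks one text node into two at the given offset;
// here the two halves come back as a two-element array.
function splitText(data, offset) {
  return [data.slice(0, offset), data.slice(offset)];
}

var text = "Hello From XML";
console.log(substringData(text, 0, 5));
console.log(insertData(text, 6, "Again "));
console.log(deleteData(text, 5, 5));
console.log(replaceData(text, 11, 3, "DOM"));
console.log(splitText(text, 5));
```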
That gives us an overview of the most commonly used objects in the Microsoft XML DOM. Now I'm going to put them to work in the rest of the chapter. I'll start at the beginning: loading an XML document.
Our first step will be to load an XML document into Internet Explorer using code, and to create a document object. Using this object, we'll be able to access all aspects of the document itself.
As mentioned earlier in this chapter, there are two ways to load an XML document into Internet Explorer so that you have access to it using JavaScript. To see how this works, I'll use this XML document, meetings.xml, throughout this chapter. This document records business meetings, including who was present and when the meeting occurred:
<?xml version="1.0"?>
<MEETINGS>
    <MEETING TYPE="informal">
        <MEETING_TITLE>XML In The Real World</MEETING_TITLE>
        <MEETING_NUMBER>2079</MEETING_NUMBER>
        <SUBJECT>XML</SUBJECT>
        <DATE>6/1/2002</DATE>
        <PEOPLE>
            <PERSON ATTENDANCE="present">
                <FIRST_NAME>Edward</FIRST_NAME>
                <LAST_NAME>Samson</LAST_NAME>
            </PERSON>
            <PERSON ATTENDANCE="absent">
                <FIRST_NAME>Ernestine</FIRST_NAME>
                <LAST_NAME>Johnson</LAST_NAME>
            </PERSON>
            <PERSON ATTENDANCE="present">
                <FIRST_NAME>Betty</FIRST_NAME>
                <LAST_NAME>Richardson</LAST_NAME>
            </PERSON>
        </PEOPLE>
    </MEETING>
</MEETINGS>
The first way of loading an XML document into Internet Explorer is to create a document object using the Microsoft.XMLDOM class.
To see this in action, I'm going to create an example that reads in meetings.xml and retrieves the name of the third person in that document (Betty Richardson). I start by creating a new document object like this (recall that you use the new operator to create a new object): xmldoc = new ActiveXObject("Microsoft.XMLDOM"). Here's how it looks in code:
<HTML>
    <HEAD>
        <TITLE>Reading XML element values</TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc
                xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                . . .
    </HEAD>
</HTML>
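Now I can load in the XML document meetings.xml. (Setting the document object's async property to false first makes the load method wait until the document has been read, so the code that follows can use the document right away.)
<HTML>
    <HEAD>
        <TITLE>Reading XML element values</TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc
                xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                xmldoc.async = false    // make load finish before the next line runs
                xmldoc.load("meetings.xml")
                . . .
    </HEAD>
</HTML>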
The next step is to get a node object corresponding to the document's root element, <MEETINGS>. You do that with the documentElement property:
<HTML>
    <HEAD>
        <TITLE>Reading XML element values</TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode
                xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                xmldoc.async = false    // make load finish before the next line runs
                xmldoc.load("meetings.xml")
                meetingsNode = xmldoc.documentElement
                . . .
    </HEAD>
</HTML>
At this point, I'm free to move around the document as I like, using properties such as firstChild, lastChild, and childNodes, which let you access the children of a node, and nextSibling and previousSibling, which let you access nodes on the same nesting level. For example, the <MEETING> element is the first child of the document root element, <MEETINGS>, so I can get a node corresponding to the <MEETING> element using the firstChild property:
<HTML>
    <HEAD>
        <TITLE>Reading XML element values</TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode
                xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                xmldoc.async = false    // make load finish before the next line runs
                xmldoc.load("meetings.xml")
                meetingsNode = xmldoc.documentElement
                meetingNode = meetingsNode.firstChild
                . . .
    </HEAD>
</HTML>
I want to track down the third <PERSON> element inside the <PEOPLE> element. The <PEOPLE> element is the last child of the <MEETING> element, so I can get a node corresponding to the <PEOPLE> element this way:
<HTML>
    <HEAD>
        <TITLE>Reading XML element values</TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode, peopleNode
                var first_nameNode, last_nameNode
                xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                xmldoc.async = false    // make load finish before the next line runs
                xmldoc.load("meetings.xml")
                meetingsNode = xmldoc.documentElement
                meetingNode = meetingsNode.firstChild
                peopleNode = meetingNode.lastChild
                . . .
    </HEAD>
</HTML>
I want the third person in the <PEOPLE> element, which is the last child of this element, so I get access to that person with the lastChild property:
<HTML>
    <HEAD>
        <TITLE>Reading XML element values</TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode, peopleNode
                xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                xmldoc.async = false    // make load finish before the next line runs
                xmldoc.load("meetings.xml")
                meetingsNode = xmldoc.documentElement
                meetingNode = meetingsNode.firstChild
                peopleNode = meetingNode.lastChild
                personNode = peopleNode.lastChild
                . . .
    </HEAD>
</HTML>
Finally, I can get nodes corresponding to the <FIRST_NAME> and <LAST_NAME> elements that hold the appropriate person's name using the firstChild and nextSibling properties (nextSibling gets the current node's next sibling node):
<HTML>
    <HEAD>
        <TITLE>Reading XML element values</TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode, peopleNode
                var first_nameNode, last_nameNode
                xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                xmldoc.async = false    // make load finish before the next line runs
                xmldoc.load("meetings.xml")
                meetingsNode = xmldoc.documentElement
                meetingNode = meetingsNode.firstChild
                peopleNode = meetingNode.lastChild
                personNode = peopleNode.lastChild
                first_nameNode = personNode.firstChild
                last_nameNode = first_nameNode.nextSibling
                . . .
    </HEAD>
</HTML>
Now I've walked the tree to get nodes corresponding to the actual elements that I want. Note, however, that the nodes I want are actually the text nodes inside the <FIRST_NAME> and <LAST_NAME> elements, which hold the person's name. That means that I have to get the first child of each of those elements (that is, the text node), and then use the nodeValue property of that text node to read the person's name.
To actually display the person's first and last names, I'll use a little dynamic HTML. Here, I'm going to use an HTML <DIV> element and the innerHTML property of that element (which holds the HTML content of the <DIV> element) to display the person's name, like this:
<HTML>
    <HEAD>
        <TITLE>Reading XML element values</TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode, peopleNode
                var first_nameNode, last_nameNode, outputText
                xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                xmldoc.async = false    // make load finish before the next line runs
                xmldoc.load("meetings.xml")
                meetingsNode = xmldoc.documentElement
                meetingNode = meetingsNode.firstChild
                peopleNode = meetingNode.lastChild
                personNode = peopleNode.lastChild
                first_nameNode = personNode.firstChild
                last_nameNode = first_nameNode.nextSibling
                outputText = "Third name: " +
                    first_nameNode.firstChild.nodeValue + ' ' +
                    last_nameNode.firstChild.nodeValue
                messageDIV.innerHTML = outputText
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>Reading XML element values</H1>
            <INPUT TYPE="BUTTON" VALUE="Get the name of the third person"
                ONCLICK="readXMLDocument()">
            <P>
            <DIV ID="messageDIV"></DIV>
        </CENTER>
    </BODY>
</HTML>
I've also added a button with the caption Get the name of the third person. Clicking it calls the JavaScript function we've defined, readXMLDocument, which reads the document and displays the name.
You can see this page at work in Internet Explorer in Figure 7.2. When the user clicks the button, the XML document meetings.xml is read and parsed, and we retrieve and display the third person's name. We've made substantial progress.
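The walk above depends on knowing exactly where each element sits in meetings.xml. As a hedged sketch of a slightly more defensive approach, the two helpers below use only the firstChild, nextSibling, nodeType, nodeName, and nodeValue properties to look up a child element by name and to read its text. The helper names childElement and textOf, and the makeText and makeElement functions that build stand-in nodes so the sketch can run outside the browser, are my own inventions for illustration; in a page you would pass real nodes from the document object instead.

```javascript
// Find the first child element with the given name by following
// firstChild and then nextSibling; return null if there is none.
function childElement(parent, name) {
    for (var node = parent.firstChild; node != null; node = node.nextSibling) {
        if (node.nodeType == 1 && node.nodeName == name) {
            return node;
        }
    }
    return null;
}

// Return the value of an element's first text node, or "" if it has none.
function textOf(element) {
    var child = element ? element.firstChild : null;
    return (child && child.nodeType == 3) ? child.nodeValue : "";
}

// Stand-in nodes (hypothetical) so the sketch runs without a browser.
function makeText(value) {
    return { nodeType: 3, nodeName: "#text", nodeValue: value,
             firstChild: null, lastChild: null, nextSibling: null };
}
function makeElement(name, children) {
    for (var i = 0; i + 1 < children.length; i++) {
        children[i].nextSibling = children[i + 1];
    }
    return { nodeType: 1, nodeName: name, nodeValue: null,
             firstChild: children.length ? children[0] : null,
             lastChild: children.length ? children[children.length - 1] : null,
             nextSibling: null };
}

var person = makeElement("PERSON", [
    makeElement("FIRST_NAME", [makeText("Betty")]),
    makeElement("LAST_NAME", [makeText("Richardson")])
]);

var fullName = textOf(childElement(person, "FIRST_NAME")) + " " +
               textOf(childElement(person, "LAST_NAME"));
console.log(fullName);   // Betty Richardson
```

Because textOf accepts null, a missing element produces an empty string rather than a script error.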
As of Internet Explorer version 5, you can also use XML data islands to embed XML inside HTML pages. Internet Explorer supports an HTML <XML> element (which is not part of the HTML standard) in which you can simply enclose an XML document, like this:
<XML ID="greeting">
    <DOCUMENT>
        <GREETING>Hi there XML!</GREETING>
    </DOCUMENT>
</XML>
The Internet Explorer <XML> element has some attributes worth noting:
Attribute | Description |
---|---|
ID | The ID with which you can refer to the <XML> element in code. Set to an alphanumeric string. |
NS | The URI of the XML namespace used by the XML content. Set to a URI. |
PREFIX | Namespace prefix of the XML contents. Set to an alphanumeric string. |
SRC | Source for the XML document, if the document is external. Set to a URI. |
When you use this element, you access it using its ID value in code. To reach the element, you can use the all collection, passing it the ID that you gave the element, like this, for the above example: document.all("greeting"). To get the document object corresponding to the XML document, you can then use the XMLDocument property. Here's how I convert the previous example to use a data island instead of the Microsoft.XMLDOM object:
<HTML>
    <HEAD>
        <TITLE>Reading element values with XML data islands</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode, peopleNode
                var first_nameNode, last_nameNode, outputText
                xmldoc = document.all("meetingsXML").XMLDocument
                meetingsNode = xmldoc.documentElement
                meetingNode = meetingsNode.firstChild
                peopleNode = meetingNode.lastChild
                personNode = peopleNode.lastChild
                first_nameNode = personNode.firstChild
                last_nameNode = first_nameNode.nextSibling
                outputText = "Third name: " +
                    first_nameNode.firstChild.nodeValue + ' ' +
                    last_nameNode.firstChild.nodeValue
                messageDIV.innerHTML = outputText
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>Reading element values with XML data islands</H1>
            <INPUT TYPE="BUTTON" VALUE="Get the name of the third person"
                ONCLICK="readXMLDocument()">
            <P>
            <DIV ID="messageDIV"></DIV>
        </CENTER>
    </BODY>
</HTML>
This example works as the previous example did, as shown in Figure 7.3.
In the previous example, I used an external XML document, meetings.xml, which I referenced with the <XML> element's SRC attribute. However, you can also enclose the entire XML document in the <XML> element, like this:
<HTML>
    <HEAD>
        <TITLE>Creating An XML Data Island</TITLE>
        <XML ID="meetingsXML">
            <?xml version="1.0"?>
            <MEETINGS>
                <MEETING TYPE="informal">
                    <MEETING_TITLE>XML In The Real World</MEETING_TITLE>
                    <MEETING_NUMBER>2079</MEETING_NUMBER>
                    <SUBJECT>XML</SUBJECT>
                    <DATE>6/1/2002</DATE>
                    <PEOPLE>
                        <PERSON ATTENDANCE="present">
                            <FIRST_NAME>Edward</FIRST_NAME>
                            <LAST_NAME>Samson</LAST_NAME>
                        </PERSON>
                        <PERSON ATTENDANCE="absent">
                            <FIRST_NAME>Ernestine</FIRST_NAME>
                            <LAST_NAME>Johnson</LAST_NAME>
                        </PERSON>
                        <PERSON ATTENDANCE="present">
                            <FIRST_NAME>Betty</FIRST_NAME>
                            <LAST_NAME>Richardson</LAST_NAME>
                        </PERSON>
                    </PEOPLE>
                </MEETING>
            </MEETINGS>
        </XML>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode, peopleNode
                var first_nameNode, last_nameNode, outputText
                xmldoc = document.all("meetingsXML").XMLDocument
                meetingsNode = xmldoc.documentElement
                meetingNode = meetingsNode.firstChild
                peopleNode = meetingNode.lastChild
                personNode = peopleNode.lastChild
                first_nameNode = personNode.firstChild
                last_nameNode = first_nameNode.nextSibling
                outputText = "Third name: " +
                    first_nameNode.firstChild.nodeValue + ' ' +
                    last_nameNode.firstChild.nodeValue
                messageDIV.innerHTML = outputText
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>Reading element values with XML data islands</H1>
            <INPUT TYPE="BUTTON" VALUE="Get the name of the third person"
                ONCLICK="readXMLDocument()">
            <P>
            <DIV ID="messageDIV"></DIV>
        </CENTER>
    </BODY>
</HTML>
So far, I've used the XMLDocument property of the object corresponding to the XML data island to get the document object, but you can also use the documentElement property of the data island directly to get the root element of the XML document, like this:
<HTML>
    <HEAD>
        <TITLE>Reading XML element values</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode, peopleNode
                var first_nameNode, last_nameNode, outputText
                meetingsNode = meetingsXML.documentElement
                meetingNode = meetingsNode.firstChild
                peopleNode = meetingNode.lastChild
                personNode = peopleNode.lastChild
                first_nameNode = personNode.firstChild
                last_nameNode = first_nameNode.nextSibling
                . . .
</HTML>
So far in this chapter, I've used navigation properties such as firstChild and nextSibling to move through XML documents. However, you can also get individual elements by searching for them by name. Here's an example; in this case, I'll use the document object's getElementsByTagName method to return a node list object holding all elements of a given name. In particular, I'm searching for <FIRST_NAME> and <LAST_NAME> elements, so I get lists of those elements like this:
<HTML>
    <HEAD>
        <TITLE>Reading XML element values</TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            function loadDocument() {
                var xmldoc, listNodesFirstName, listNodesLastName
                xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                xmldoc.async = false    // make load finish before the next line runs
                xmldoc.load("meetings.xml")
                listNodesFirstName = xmldoc.getElementsByTagName("FIRST_NAME")
                listNodesLastName = xmldoc.getElementsByTagName("LAST_NAME")
                . . .
Like all node lists, the listNodesFirstName and listNodesLastName node lists are indexed by number starting at 0, so the third element in these lists is element number 2, which you refer to as, for example, listNodesLastName.item(2). This means that I can find the first and last name of the third person. (Recall that I actually need the first child of the <FIRST_NAME> and <LAST_NAME> nodes, which is the text node inside those elements that holds the person's name, so I use the firstChild property here.)
<HTML>
    <HEAD>
        <TITLE>Reading XML element values</TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            function loadDocument() {
                var xmldoc, listNodesFirstName, listNodesLastName
                xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                xmldoc.async = false    // make load finish before the next line runs
                xmldoc.load("meetings.xml")
                listNodesFirstName = xmldoc.getElementsByTagName("FIRST_NAME")
                listNodesLastName = xmldoc.getElementsByTagName("LAST_NAME")
                outputText = "Third name: " +
                    listNodesFirstName.item(2).firstChild.nodeValue + ' ' +
                    listNodesLastName.item(2).firstChild.nodeValue
                messageDIV.innerHTML = outputText
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>Reading XML element values</H1>
            <INPUT TYPE="BUTTON" VALUE="Get the name of the third person"
                ONCLICK="loadDocument()">
            <P>
            <DIV ID="messageDIV"></DIV>
        </CENTER>
    </BODY>
</HTML>
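To make the idea of searching by name more concrete, here is a rough model of the kind of search getElementsByTagName performs: a recursive walk that collects every matching element in document order. This is only a sketch under stated assumptions, not how Internet Explorer implements the method. The names collectByTagName, text, element, and person are hypothetical, and the nodes are plain JavaScript objects whose childNodes is an ordinary array (with a real node list you would write childNodes.item(i) instead of childNodes[i]).

```javascript
// Collect, in document order, every descendant element with the given name.
function collectByTagName(node, name, found) {
    found = found || [];
    for (var i = 0; i < node.childNodes.length; i++) {
        var child = node.childNodes[i];
        if (child.nodeType == 1 && child.nodeName == name) {
            found.push(child);
        }
        collectByTagName(child, name, found);   // search below this child too
    }
    return found;
}

// Stand-in nodes (hypothetical) so the sketch runs without a browser.
function text(value) {
    return { nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [] };
}
function element(name, children) {
    return { nodeType: 1, nodeName: name, nodeValue: null, childNodes: children };
}
function person(first, last) {
    return element("PERSON", [
        element("FIRST_NAME", [text(first)]),
        element("LAST_NAME", [text(last)])
    ]);
}

var people = element("PEOPLE", [
    person("Edward", "Samson"),
    person("Ernestine", "Johnson"),
    person("Betty", "Richardson")
]);

var firstNames = collectByTagName(people, "FIRST_NAME");
var lastNames = collectByTagName(people, "LAST_NAME");
var names = [];
for (var index = 0; index < firstNames.length; index++) {
    names.push(firstNames[index].childNodes[0].nodeValue + " " +
               lastNames[index].childNodes[0].nodeValue);
}
console.log(names.join(", "));   // Edward Samson, Ernestine Johnson, Betty Richardson
```

Looping over the whole list, rather than picking item 2, lists every person without the code needing to know how many there are.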
We've made some progress here and have been able to read in an XML document in various ways and to access specific elements in the document. I'll move on to the next step now: accessing not just an element's text content, but also the element's attributes.
To see how to read attribute values from an XML document, I'll read the value of the ATTENDANCE attribute of the third person in the XML document meetings.xml:
<?xml version="1.0"?>
<MEETINGS>
    <MEETING TYPE="informal">
        <MEETING_TITLE>XML In The Real World</MEETING_TITLE>
        <MEETING_NUMBER>2079</MEETING_NUMBER>
        <SUBJECT>XML</SUBJECT>
        <DATE>6/1/2002</DATE>
        <PEOPLE>
            <PERSON ATTENDANCE="present">
                <FIRST_NAME>Edward</FIRST_NAME>
                <LAST_NAME>Samson</LAST_NAME>
            </PERSON>
            <PERSON ATTENDANCE="absent">
                <FIRST_NAME>Ernestine</FIRST_NAME>
                <LAST_NAME>Johnson</LAST_NAME>
            </PERSON>
            <PERSON ATTENDANCE="present">
                <FIRST_NAME>Betty</FIRST_NAME>
                <LAST_NAME>Richardson</LAST_NAME>
            </PERSON>
        </PEOPLE>
    </MEETING>
</MEETINGS>
How do you read attribute values? You start by getting a named node map object of the attributes of the current element using that element's attributes property. In this case, we want the attributes of the third <PERSON> element, and we get a named node map of those attributes, like this:
<HTML>
    <HEAD>
        <TITLE>Reading attribute values from XML documents</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode, peopleNode
                var first_nameNode, last_nameNode, outputText
                var attributes
                xmldoc = document.all("meetingsXML").XMLDocument
                meetingsNode = xmldoc.documentElement
                meetingNode = meetingsNode.firstChild
                peopleNode = meetingNode.lastChild
                personNode = peopleNode.lastChild
                first_nameNode = personNode.firstChild
                last_nameNode = first_nameNode.nextSibling
                attributes = personNode.attributes
                . . .
</HTML>
Now I can recover the actual node for the ATTENDANCE attribute with the named node map object's getNamedItem method:
<HTML>
    <HEAD>
        <TITLE>Reading attribute values from XML documents</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode, peopleNode
                var first_nameNode, last_nameNode, outputText
                xmldoc = document.all("meetingsXML").XMLDocument
                meetingsNode = xmldoc.documentElement
                meetingNode = meetingsNode.firstChild
                peopleNode = meetingNode.lastChild
                personNode = peopleNode.lastChild
                first_nameNode = personNode.firstChild
                last_nameNode = first_nameNode.nextSibling
                attributes = personNode.attributes
                attendancePerson = attributes.getNamedItem("ATTENDANCE")
                . . .
</HTML>
Now I have a node corresponding to the ATTENDANCE attribute, and I can get the value of that attribute using the value property (unlike with an element, there's no need to dig down to a child text node first):
<HTML>
    <HEAD>
        <TITLE>Reading attribute values from XML documents</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode, peopleNode
                var first_nameNode, last_nameNode, outputText
                var attributes, attendancePerson
                xmldoc = document.all("meetingsXML").XMLDocument
                meetingsNode = xmldoc.documentElement
                meetingNode = meetingsNode.firstChild
                peopleNode = meetingNode.lastChild
                personNode = peopleNode.lastChild
                first_nameNode = personNode.firstChild
                last_nameNode = first_nameNode.nextSibling
                attributes = personNode.attributes
                attendancePerson = attributes.getNamedItem("ATTENDANCE")
                outputText = first_nameNode.firstChild.nodeValue + ' ' +
                    last_nameNode.firstChild.nodeValue +
                    " is " + attendancePerson.value
                messageDIV.innerHTML = outputText
                . . .
</HTML>
And that's all it takes. Here's what the whole page looks like:
<HTML>
    <HEAD>
        <TITLE>Reading attribute values from XML documents</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function readXMLDocument() {
                var xmldoc, meetingsNode, meetingNode, peopleNode
                var first_nameNode, last_nameNode, outputText
                var attributes, attendancePerson
                xmldoc = document.all("meetingsXML").XMLDocument
                meetingsNode = xmldoc.documentElement
                meetingNode = meetingsNode.firstChild
                peopleNode = meetingNode.lastChild
                personNode = peopleNode.lastChild
                first_nameNode = personNode.firstChild
                last_nameNode = first_nameNode.nextSibling
                attributes = personNode.attributes
                attendancePerson = attributes.getNamedItem("ATTENDANCE")
                outputText = first_nameNode.firstChild.nodeValue + ' ' +
                    last_nameNode.firstChild.nodeValue +
                    " is " + attendancePerson.value
                messageDIV.innerHTML = outputText
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>Reading attribute values from XML documents</H1>
            <INPUT TYPE="BUTTON" VALUE="Get attendance of the third person"
                ONCLICK="readXMLDocument()">
            <P>
            <DIV ID="messageDIV"></DIV>
        </CENTER>
    </BODY>
</HTML>
Figure 7.4 shows the results; the attendance of the third person is present.
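The page above assumes that the ATTENDANCE attribute is always there. If an attribute might be missing, getNamedItem returns null, and reading the value property of null is a script error. As a minimal sketch of guarding against that, the hypothetical helper attributeValue below returns a fallback string instead. The makeAttributes function is also my own invention: it builds a stand-in for a named node map (with length, item, and getNamedItem) so that the example can run outside Internet Explorer.

```javascript
// Return the named attribute's value, or the fallback if the node has no
// attributes collection or no attribute with that name.
function attributeValue(element, name, fallback) {
    var attributes = element.attributes;
    var attribute = attributes ? attributes.getNamedItem(name) : null;
    return attribute ? attribute.value : fallback;
}

// Stand-in for a named node map (hypothetical; the real DOM supplies one).
function makeAttributes(pairs) {
    var items = [];
    for (var key in pairs) {
        if (pairs.hasOwnProperty(key)) {
            items.push({ nodeType: 2, nodeName: key, name: key,
                         nodeValue: pairs[key], value: pairs[key] });
        }
    }
    return {
        length: items.length,
        item: function (i) { return items[i] || null; },
        getNamedItem: function (wanted) {
            for (var i = 0; i < items.length; i++) {
                if (items[i].name == wanted) {
                    return items[i];
                }
            }
            return null;
        }
    };
}

var personNode = { nodeName: "PERSON",
                   attributes: makeAttributes({ ATTENDANCE: "present" }) };
var textNode = { nodeName: "#text", attributes: null };

console.log(attributeValue(personNode, "ATTENDANCE", "unknown"));   // present
console.log(attributeValue(personNode, "ROLE", "unknown"));         // unknown
console.log(attributeValue(textNode, "ATTENDANCE", "unknown"));     // unknown
```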
Up to this point, I've gone after a specific element in an XML document, but there are other ways of handling documents, too. For example, you can parse (that is, read and interpret) the entire document at once. Here's an example; in this case, I'll work through the entire XML document, meetings.xml, displaying all its nodes in an HTML Web page.
To handle this document, I'll create a function, iterateChildren, that will read and display all the children of a node. As with most parsers, this function is a recursive function, which means that it can call itself to get the children of the current node. To get the name of a node, I will use the nodeName property. To parse an entire document, then, you just have to pass the root node of the entire document to the iterateChildren function, and it will work through the entire document, displaying all the nodes in that document:
<HTML>
    <HEAD>
        <TITLE>Parsing an XML Document</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument() {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }
            . . .
Note that I've also passed an empty string ("") to the iterateChildren function. I'll use this string to indent the various levels of the display, to indicate what nodes are nested inside what other nodes. In the iterateChildren function, I start by creating a new text string with the current indentation string (which is either an empty string or a string of spaces), as well as the name of the current node and a <BR> element so that the browser will skip to the next line:
<HTML>
    <HEAD>
        <TITLE>Parsing an XML Document</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument() {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }
            function iterateChildren(theNode, indentSpacing) {
                var text = indentSpacing + theNode.nodeName + "<BR>"
                . . .
                return text
            }
        </SCRIPT>
    </HEAD>
    . . .
I can determine whether the current node has children by checking the childNodes property, which holds a node list of the children of the current node, and testing the length of this list with its length property. If the node does have children, I call iterateChildren on all child nodes. (Note also that I indent this next level of the display by adding four nonbreaking spaces, which you specify with the &nbsp; entity reference in HTML, to the current indentation string.)
<HTML>
    <HEAD>
        <TITLE>Parsing an XML Document</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument() {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }
            function iterateChildren(theNode, indentSpacing) {
                var text = indentSpacing + theNode.nodeName + "<BR>"
                if (theNode.childNodes.length > 0) {
                    for (var loopIndex = 0; loopIndex <
                        theNode.childNodes.length; loopIndex++) {
                        text += iterateChildren(theNode.childNodes(loopIndex),
                            indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")
                    }
                }
                return text
            }
        </SCRIPT>
    </HEAD>
    . . .
And that's all it takes; here's the whole Web page:
<HTML>
    <HEAD>
        <TITLE>Parsing an XML Document</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument() {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }
            function iterateChildren(theNode, indentSpacing) {
                var text = indentSpacing + theNode.nodeName + "<BR>"
                if (theNode.childNodes.length > 0) {
                    for (var loopIndex = 0; loopIndex <
                        theNode.childNodes.length; loopIndex++) {
                        text += iterateChildren(theNode.childNodes(loopIndex),
                            indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")
                    }
                }
                return text
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>Parsing an XML Document</H1>
        </CENTER>
        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Parse and display the document"
                ONCLICK="parseDocument()">
        </CENTER>
        <DIV ID="resultsDIV"></DIV>
    </BODY>
</HTML>
When you click the button in this page, it will read meetings.xml and display its structure as shown in Figure 7.5. You can see all the nodes listed there, indented as they should be. Note also the "meta-names" that Internet Explorer gives to document and text nodes: #document and #text.
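To see the recursion on its own, apart from the HTML page, here is a sketch of the same idea that produces a plain-text listing. It is a variation of my own rather than the page's code: it tracks a nesting depth instead of an indentation string, indents with ordinary spaces, and returns an array of lines. The listNodes, text, and node names are hypothetical, and the stand-in nodes use a plain array for childNodes so that the example can run outside the browser.

```javascript
// Return one line per node, indented four spaces per nesting level.
function listNodes(theNode, depth) {
    var indent = "";
    for (var i = 0; i < depth; i++) {
        indent += "    ";
    }
    var lines = [indent + theNode.nodeName];
    for (var j = 0; j < theNode.childNodes.length; j++) {
        // The function calls itself for each child, one level deeper.
        lines = lines.concat(listNodes(theNode.childNodes[j], depth + 1));
    }
    return lines;
}

// Stand-in nodes (hypothetical; a real DOM supplies these objects).
function text(value) {
    return { nodeName: "#text", nodeValue: value, childNodes: [] };
}
function node(name, children) {
    return { nodeName: name, nodeValue: null, childNodes: children };
}

var doc = node("#document", [
    node("MEETINGS", [
        node("MEETING", [
            node("SUBJECT", [text("XML")]),
            node("DATE", [text("6/1/2002")])
        ])
    ])
]);

var listing = listNodes(doc, 0);
console.log(listing.join("\n"));
```

The recursion stops by itself: a node with no children never enters the loop, so no further calls are made.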
In the previous example, the code listed the name of each node in the meetings.xml document. However, you can do more than that: you can also use the nodeValue property to list the value of each node, and I'll do that in this section. In addition, you can indicate the type of each node that you come across by checking the nodeType property. Here are the possible values for this property:
Value | Description |
---|---|
1 | Element |
2 | Attribute |
3 | Text |
4 | CDATA section |
5 | Entity reference |
6 | Entity |
7 | Processing instruction |
8 | Comment |
9 | Document |
10 | Document type |
11 | Document fragment |
12 | Notation |
Here's how I determine the type of a particular node, using a JavaScript switch statement of the kind that we saw in the previous chapter:
<HTML>
    <HEAD>
        <TITLE>Parsing an XML document and displaying node type and content</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument() {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }
            function iterateChildren(theNode, indentSpacing) {
                var typeData
                switch (theNode.nodeType) {
                    case 1: typeData = "element"; break
                    case 2: typeData = "attribute"; break
                    case 3: typeData = "text"; break
                    case 4: typeData = "CDATA section"; break
                    case 5: typeData = "entity reference"; break
                    case 6: typeData = "entity"; break
                    case 7: typeData = "processing instruction"; break
                    case 8: typeData = "comment"; break
                    case 9: typeData = "document"; break
                    case 10: typeData = "document type"; break
                    case 11: typeData = "document fragment"; break
                    case 12: typeData = "notation"
                }
                . . .
If the node has a value (which I check by comparing nodeValue to null, which is the value that it will have if there is no actual node value), I can display that value like this:
<HTML>
    <HEAD>
        <TITLE>Parsing an XML document and displaying node type and content</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument() {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }
            function iterateChildren(theNode, indentSpacing) {
                var typeData
                switch (theNode.nodeType) {
                    case 1: typeData = "element"; break
                    case 2: typeData = "attribute"; break
                    case 3: typeData = "text"; break
                    case 4: typeData = "CDATA section"; break
                    case 5: typeData = "entity reference"; break
                    case 6: typeData = "entity"; break
                    case 7: typeData = "processing instruction"; break
                    case 8: typeData = "comment"; break
                    case 9: typeData = "document"; break
                    case 10: typeData = "document type"; break
                    case 11: typeData = "document fragment"; break
                    case 12: typeData = "notation"
                }
                var text
                if (theNode.nodeValue != null) {
                    text = indentSpacing + theNode.nodeName +
                        " = " + theNode.nodeValue +
                        " (Node type: " + typeData + ")<BR>"
                } else {
                    text = indentSpacing + theNode.nodeName +
                        " (Node type: " + typeData + ")<BR>"
                }
                if (theNode.childNodes.length > 0) {
                    for (var loopIndex = 0; loopIndex <
                        theNode.childNodes.length; loopIndex++) {
                        text += iterateChildren(theNode.childNodes(loopIndex),
                            indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")
                    }
                }
                return text
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>Parsing an XML document and displaying node type and content</H1>
        </CENTER>
        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Parse and display the document"
                ONCLICK="parseDocument()">
        </CENTER>
        <DIV ID="resultsDIV"></DIV>
    </BODY>
</HTML>
And that's all it takes; the results are shown in Figure 7.6. As you see there, the entire document is listed, as is the type of each node. In addition, if the node has a value, that value is displayed.
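Because the node type values run from 1 to 12 with no gaps, the long switch statement can also be written as a lookup in an array. The following is just one possible sketch of that alternative, not the code used in the figure; the names NODE_TYPE_NAMES, nodeTypeName, and describeNode are my own, and the "unknown" fallback for out-of-range values is an added assumption.

```javascript
// Index n holds the description for nodeType n; index 0 is unused.
var NODE_TYPE_NAMES = [
    null,
    "element",
    "attribute",
    "text",
    "CDATA section",
    "entity reference",
    "entity",
    "processing instruction",
    "comment",
    "document",
    "document type",
    "document fragment",
    "notation"
];

// Translate a nodeType number into its description.
function nodeTypeName(nodeType) {
    return NODE_TYPE_NAMES[nodeType] || "unknown";
}

// Build the same kind of label the page displays for one node.
function describeNode(theNode) {
    var label = theNode.nodeName;
    if (theNode.nodeValue != null) {
        label += " = " + theNode.nodeValue;
    }
    return label + " (Node type: " + nodeTypeName(theNode.nodeType) + ")";
}

console.log(describeNode({ nodeName: "#text", nodeValue: "Betty", nodeType: 3 }));
// #text = Betty (Node type: text)
console.log(describeNode({ nodeName: "PERSON", nodeValue: null, nodeType: 1 }));
// PERSON (Node type: element)
```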
This example listed the nodes of a document. On the other hand, some of the elements in meetings.xml have attributes as well. So how do you handle attributes?
You can get access to an element's attributes with the element's attributes property. You can get attribute names and values with the name and value properties of attribute objects; I used the value property earlier in this chapter. It's also worth noting that because attributes are themselves nodes, you can use the nodeName and nodeValue properties to do the same thing; I'll do that in this example to show how it works.
Here's how I augment the previous example, looping over all the attributes that an element has and listing them. (Note that you could use the name and value properties here instead of nodeName and nodeValue.)
<HTML>
    <HEAD>
        <TITLE>Parsing XML to read attributes</TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument() {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }
            function iterateChildren(theNode, indentSpacing) {
                var typeData
                switch (theNode.nodeType) {
                    case 1: typeData = "element"; break
                    case 2: typeData = "attribute"; break
                    case 3: typeData = "text"; break
                    case 4: typeData = "CDATA section"; break
                    case 5: typeData = "entity reference"; break
                    case 6: typeData = "entity"; break
                    case 7: typeData = "processing instruction"; break
                    case 8: typeData = "comment"; break
                    case 9: typeData = "document"; break
                    case 10: typeData = "document type"; break
                    case 11: typeData = "document fragment"; break
                    case 12: typeData = "notation"
                }
                var text
                if (theNode.nodeValue != null) {
                    text = indentSpacing + theNode.nodeName +
                        " = " + theNode.nodeValue +
                        " (Node type: " + typeData + ")"
                } else {
                    text = indentSpacing + theNode.nodeName +
                        " (Node type: " + typeData + ")"
                }
                if (theNode.attributes != null) {
                    if (theNode.attributes.length > 0) {
                        for (var loopIndex = 0; loopIndex <
                            theNode.attributes.length; loopIndex++) {
                            text += " (Attribute: " +
                                theNode.attributes(loopIndex).nodeName +
                                " = \"" +
                                theNode.attributes(loopIndex).nodeValue +
                                "\")"
                        }
                    }
                }
                text += "<BR>"
                if (theNode.childNodes.length > 0) {
                    for (var loopIndex = 0; loopIndex <
                        theNode.childNodes.length; loopIndex++) {
                        text += iterateChildren(theNode.childNodes(loopIndex),
                            indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")
                    }
                }
                return text
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>Parsing XML to read attributes</H1>
        </CENTER>
        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Parse and display the document"
                ONCLICK="parseDocument()">
        </CENTER>
        <DIV ID="resultsDIV"></DIV>
    </BODY>
</HTML>
You can see the results of this page in Figure 7.7; both elements and attributes are listed in that figure.
Internet Explorer also lets you track the progress of an XML document as it's being loaded. In particular, you can use the onreadystatechange and ondataavailable events to watch what's happening. The document object's readyState property, which you can check each time the onreadystatechange event fires, informs you about the current status of the document. Here's an example showing how this works:
<HTML>
    <HEAD>
        <TITLE>Handling document loading events</TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            var xmldoc
            function loadDocument() {
                xmldoc = new ActiveXObject("microsoft.XMLDOM")
                xmldoc.ondataavailable = dataAvailableHandler
                xmldoc.onreadystatechange = stateChangeHandler
                xmldoc.load('meetings.xml')
            }
            function dataAvailableHandler() {
                messageDIV.innerHTML += "Status: data available.<BR>"
            }
            function stateChangeHandler() {
                switch (xmldoc.readyState) {
                    case 1:
                        messageDIV.innerHTML += "Status: data uninitialized.<BR>"
                        break
                    case 2:
                        messageDIV.innerHTML += "Status: data loading.<BR>"
                        break
                    case 3:
                        messageDIV.innerHTML += "Status: data loaded.<BR>"
                        break
                    case 4:
                        messageDIV.innerHTML += "Status: data loading complete.<BR>"
                        if (xmldoc.parseError.errorCode != 0) {
                            messageDIV.innerHTML += "Status: error.<BR>"
                        } else {
                            messageDIV.innerHTML += "Status: data loaded alright.<BR>"
                        }
                        break
                }
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>Handling document loading events</H1>
        </CENTER>
        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Load the document"
                ONCLICK="loadDocument()">
        </CENTER>
        <DIV ID="messageDIV"></DIV>
    </BODY>
</HTML>
The results of this Web page appear in Figure 7.8, and you can see the progress that Internet Explorer made in loading a document in that page.
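In practice, the state you usually care about is the last one, 4, at which point you can check parseError and start using the document. As a hedged sketch, the hypothetical helper watchLoad below wraps that check so that the rest of the code only supplies a success handler and an error handler. To make the example runnable outside Internet Explorer, it uses a stand-in document object built by makeFakeDocument (also my own invention) that simply steps readyState from 1 to 4; with a real document object you would call watchLoad before calling load.

```javascript
// Call onSuccess or onError once the document reaches readyState 4.
function watchLoad(xmldoc, onSuccess, onError) {
    xmldoc.onreadystatechange = function () {
        if (xmldoc.readyState != 4) {
            return;                          // still loading
        }
        if (xmldoc.parseError.errorCode != 0) {
            onError(xmldoc.parseError);
        } else {
            onSuccess(xmldoc);
        }
    };
}

// Stand-in document object (hypothetical): load just walks through the
// ready states and fires the handler, as the browser would over time.
function makeFakeDocument(errorCode) {
    var doc = {
        readyState: 0,
        parseError: { errorCode: errorCode,
                      reason: errorCode ? "stand-in failure" : "" },
        onreadystatechange: null
    };
    doc.load = function (url) {
        for (var state = 1; state <= 4; state++) {
            doc.readyState = state;
            if (doc.onreadystatechange) {
                doc.onreadystatechange();
            }
        }
    };
    return doc;
}

var log = [];
function loaded() { log.push("loaded"); }
function failed(error) { log.push("error: " + error.reason); }

var good = makeFakeDocument(0);
watchLoad(good, loaded, failed);
good.load("meetings.xml");

var bad = makeFakeDocument(-1);
watchLoad(bad, loaded, failed);
bad.load("error.xml");

console.log(log.join("; "));   // loaded; error: stand-in failure
```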
By default, Internet Explorer actually does validate XML documents as it loads them, but you won't see any validation errors unless you check the parseError object.
Turning Validation On and Off: You can turn document validation on or off with the document object's validateOnParse property, which is set to true by default.
Here's an example; in this case, I'll load this XML document, error.xml. This document has a validation problem because the <NAME> element is declared to contain only a <FIRST_NAME> element, not a <LAST_NAME> element:
<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>
Here's what the Web page that reads in and checks this document looks like. Here I'm using the parseError object's errorCode, url, line, linepos, srcText, and reason properties to track down the error:
<HTML>
    <HEAD>
        <TITLE>
            Validating documents
        </TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            var xmldoc

            function loadDocument()
            {
                xmldoc = new ActiveXObject("microsoft.XMLDOM")
                xmldoc.onreadystatechange = stateChangeHandler
                xmldoc.ondataavailable = dataAvailableHandler
                xmldoc.load('error.xml')
            }

            function dataAvailableHandler()
            {
                messageDIV.innerHTML += "Status: data available.<BR>"
            }

            function stateChangeHandler()
            {
                if(xmldoc.readyState == 4){
                    // Escape < and > so the browser displays the source
                    // text instead of interpreting it as markup
                    var errorString = xmldoc.parseError.srcText
                    errorString = errorString.replace(/\</g, "&lt;")
                    errorString = errorString.replace(/\>/g, "&gt;")
                    if (xmldoc.parseError.errorCode != 0) {
                        messageDIV.innerHTML = "Problem in " +
                            xmldoc.parseError.url +
                            " line " + xmldoc.parseError.line +
                            " position " + xmldoc.parseError.linepos +
                            ":<BR>Error source: " + errorString +
                            "<BR>" + xmldoc.parseError.reason +
                            "<BR>" + "Error: " +
                            xmldoc.parseError.errorCode
                    }
                    else {
                        messageDIV.innerHTML =
                            "Status: document loaded alright.<BR>"
                    }
                }
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>
                Validating documents
            </H1>
        </CENTER>
        <DIV ID="messageDIV"></DIV>
        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Load the document"
                ONCLICK="loadDocument()">
        </CENTER>
    </BODY>
</HTML>
Figure 7.9 shows the results of this Web page, where the validation error is reported.
You might note that the errorString variable holds the error-causing text from the XML document, which comes from the parseError object's srcText property. Because that text is <LAST_NAME>Smith</LAST_NAME>, there's a problem: the browser will try to interpret it as markup. To avoid that, I use the JavaScript String object's replace method to replace < with &lt; and > with &gt;. (You pass a regular expression to the replace method; to change all < characters to &lt;, the regular expression that you use is /\</g. To change all > characters to &gt;, you match to the regular expression /\>/g.)
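The escaping and the error report can also be pulled out into two small functions. This is a minimal sketch, not the code from the page above: escapeMarkup and describeParseError are hypothetical names, and the sketch additionally escapes the & character, which the page does not do. It assumes only that the object passed in has the parseError properties named earlier.

```javascript
// Escapes the characters a browser would otherwise treat as markup.
// The ampersand is handled first so the entities added by the later
// replacements are not escaped a second time.
function escapeMarkup(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
}

// Builds the HTML report from any object that has the parseError
// properties errorCode, url, line, linepos, srcText, and reason.
function describeParseError(parseError) {
    if (parseError.errorCode == 0) {
        return "Status: document loaded alright.<BR>"
    }
    return "Problem in " + escapeMarkup(parseError.url) +
        " line " + parseError.line +
        " position " + parseError.linepos +
        ":<BR>Error source: " + escapeMarkup(parseError.srcText) +
        "<BR>" + escapeMarkup(parseError.reason) +
        "<BR>Error: " + parseError.errorCode
}
```

With these in place, the handler could reduce to messageDIV.innerHTML = describeParseError(xmldoc.parseError) once readyState reaches 4.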
Internet Explorer provides limited support for scripting XML elements. For example, I can add an onclick event attribute to an XML element named <xlink> in an XHTML document. (We'll take a look at XLinks and XHTML later in this book; see Chapters 15, 16, and 17.)
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet TYPE="text/css" href="xlink.css"?> <!DOCTYPE html SYSTEM "t3.dtd"> <html> <head> </head> <body> Want to check out <xlink xml:link = "simple" inline="false" href = "http://www.w3c.org" onclick="location.href='http://www.w3c.org'">W3C</xlink>? </body> </html>
I can specify in a style sheet, xlink.css, that <xlink> elements should be displayed in blue and underlined, as a hyperlink might appear, and I can also specify that the mouse cursor should change to a hand when over this element, just as it would for an HTML hyperlink:
xlink {color: #0000FF; text-decoration: underline; cursor: hand}
The results appear in Figure 7.10. When the user clicks the <xlink> element, Internet Explorer executes the code in the onclick event attribute. In this case, that navigates the browser to http://www.w3c.org. As you can see, you can script XML elements in Internet Explorer, adding event attributes such as onclick.
You can alter the contents of an XML document in Internet Explorer. To do this, you use methods such as createElement, insertBefore, createTextNode, and appendChild.
As an example, I'll alter the document meetings.xml by inserting a new element, <MEETING_CHAIR>, like this:
<?xml version="1.0"?> <MEETINGS> <MEETING TYPE="informal"> <MEETING_CHAIR>Ted Bond</MEETING_CHAIR> <MEETING_TITLE>XML In The Real World</MEETING_TITLE> <MEETING_NUMBER>2079</MEETING_NUMBER> <SUBJECT>XML</SUBJECT> <DATE>6/1/2002</DATE> <PEOPLE> <PERSON ATTENDANCE="present"> <FIRST_NAME>Edward</FIRST_NAME> <LAST_NAME>Samson</LAST_NAME> </PERSON> . . .
I begin by creating the new node, corresponding to the <MEETING_CHAIR> element, and inserting it into the document with the insertBefore method:
<HTML> <HEAD> <XML ID="meetingsXML" SRC="meetings.xml"></XML> <SCRIPT LANGUAGE="JavaScript"> <!-- function alterDocument() { var xmldoc, rootNode, meetingsNode, meetingNode, createdNode, createdTextNode xmldoc = document.all.meetingsXML rootNode = xmldoc.documentElement meetingsNode = rootNode.firstChild meetingNode = meetingsNode.firstChild createdNode = xmldoc.createElement("MEETING_CHAIR") createdNode = meetingsNode.insertBefore(createdNode, meetingNode) . . .
Now I will create the text node inside this new element. The text node will hold the text "Ted Bond", and I'll create it with the createTextNode method and append it to the <MEETING_CHAIR> element with the appendChild method:
<HTML> <HEAD> <XML ID="meetingsXML" SRC="meetings.xml"></XML> <SCRIPT LANGUAGE="JavaScript"> <!-- function alterDocument() { var xmldoc, rootNode, meetingsNode, meetingNode, createdNode, createdTextNode xmldoc = document.all.meetingsXML rootNode = xmldoc.documentElement meetingsNode = rootNode.firstChild meetingNode = meetingsNode.firstChild createdNode = xmldoc.createElement("MEETING_CHAIR") createdNode = meetingsNode.insertBefore(createdNode,meetingNode) createdTextNode = xmldoc.createTextNode("Ted Bond") createdNode.appendChild(createdTextNode) . . .
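The two steps so far can be sketched as one helper function. This is an illustrative variation rather than the code used in the page: insertMeetingChair is a hypothetical name, and instead of walking down with firstChild it looks up the first <MEETING> element by tag name, on the assumption that this is less sensitive to how the parser treats whitespace between elements.

```javascript
// Hypothetical helper: inserts <MEETING_CHAIR>chairName</MEETING_CHAIR>
// as the first child of the first <MEETING> element in xmldoc.
// Returns the new element, or null if there is no <MEETING> element.
function insertMeetingChair(xmldoc, chairName) {
    var meetings = xmldoc.getElementsByTagName("MEETING")
    if (meetings.length == 0) {
        return null
    }
    var meetingNode = meetings.item(0)

    // Create the element and give it its text node.
    var createdNode = xmldoc.createElement("MEETING_CHAIR")
    createdNode.appendChild(xmldoc.createTextNode(chairName))

    // Insert the new element ahead of the current first child.
    meetingNode.insertBefore(createdNode, meetingNode.firstChild)
    return createdNode
}
```

A call such as insertMeetingChair(document.all.meetingsXML, "Ted Bond") would then do the same work as the createElement, insertBefore, createTextNode, and appendChild lines in alterDocument.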
Now I've altered the document, but at this point it exists only inside the xmldoc object. How do I display it in the browser? The DOMDocument object actually has a save method that enables you to save the document to a new file like this: xmldoc.save("new.xml"). However, you can't use that method without changing the security settings in Internet Explorer; by default, browsers aren't supposed to be capable of writing files on the host machine.
I'll take a different approach. In this case, I'll store the XML document's text in a hidden control in an HTML form (a hidden control simply holds text invisible to the user), and send the data in that form to a server-side Active Server Pages (ASP) script. That script will just echo the document back to the browser, which, in turn, will display it. Here's the ASP script, echo.asp, where I set the MIME type of this document to "text/xml", add an <?xml?> processing instruction, and echo the XML data back to Internet Explorer. (ASP scripts such as this one are beyond the scope of this book, but we'll take a brief look at them in Chapter 20, "WML, ASP, JSP, Servlets, and Perl.")
<%@ LANGUAGE="VBSCRIPT" %> <% Response.ContentType = "text/xml" Response.Write "<?xml version=" & Chr(34) & "1.0" & Chr(34) & "?>" & Chr(13) & Chr(10) Response.Write Request("data") %>
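For readers who don't know VBScript, the string that this script writes back can be expressed in JavaScript. This is only a sketch of the same string handling, not a replacement for the ASP script, and buildEchoBody is a hypothetical name.

```javascript
// Mirrors the two Response.Write calls above: an XML declaration,
// a carriage return and line feed (Chr(13) & Chr(10)), and then the
// posted data unchanged.
function buildEchoBody(data) {
    return '<?xml version="1.0"?>' + "\r\n" + data
}
```

The declaration is added on the server because the text placed in the hidden control comes from the document element's xml property, which holds the markup for that element and its contents but not the XML declaration.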
I have an ASP server on my host machine, so the URI that I'll send the XML document to is http://default/db/echo.asp. I do that by using the HTML form's submit method (which works exactly as if the user had clicked a Submit button in the form) after loading the XML document into the page's hidden control:
<HTML> <HEAD> <XML ID="meetingsXML" SRC="meetings.xml"></XML> <SCRIPT LANGUAGE="JavaScript"> <!-- function alterDocument() { var xmldoc, rootNode, meetingsNode, meetingNode, createdNode, createdTextNode xmldoc = document.all.meetingsXML rootNode = xmldoc.documentElement meetingsNode = rootNode.firstChild meetingNode = meetingsNode.firstChild createdNode = xmldoc.createElement("MEETING_CHAIR") createdNode = meetingsNode.insertBefore(createdNode, meetingNode) createdTextNode = xmldoc.createTextNode("Ted Bond") createdNode.appendChild(createdTextNode) document.all.data.value = meetingsXML.documentElement.xml document.form1.submit() } //--> </SCRIPT> </HEAD> <BODY> <CENTER> <FORM NAME="form1" ACTION="http://default/db/echo.asp" METHOD="POST"> <INPUT TYPE="HIDDEN" NAME="data"> <INPUT TYPE="BUTTON" VALUE="Alter the document" onclick="alterDocument()"> </FORM> </CENTER> </BODY> </HTML>
Now when the user clicks the button with the caption Alter the document, the code in this page alters the XML document and sends it to the server. The ASP script on the server echoes the XML document back to the browser, which displays it, as you see in Figure 7.11. You can see the new <MEETING_CHAIR> element in that figure.
We've put JavaScript to work in this chapter, parsing and accessing XML documents. In the next chapter, I'm going to put JavaScript to work treating XML data as database objects.