Chapter 7. Handling XML Documents with JavaScript

CONTENTS
  •  The W3C DOM
  •  Loading XML Documents
  •  Getting Elements by Name
  •  Getting Attribute Values from XML Elements
  •  Parsing XML Documents in Code
  •  Handling Events While Loading XML Documents
  •  Validating XML Documents with Internet Explorer
  •  Scripting XML Elements
  •  Editing XML Documents with Internet Explorer

Having successfully mastered JavaScript in the previous chapter (for our purposes, anyway), we're going to use it in this chapter to work with the W3C Document Object Model (DOM), the W3C-standardized programming interface for handling XML documents. Before the introduction of the DOM, all XML parsers and processors had different ways of interacting with XML documents and, worse, they kept changing all the time. With the introduction of the XML DOM, things have settled down (to some extent). Note that this chapter relies on Microsoft Internet Explorer, which provides the most complete JavaScript-accessible implementation of the DOM.

The W3C DOM

The W3C DOM specifies a way of treating a document as a tree of nodes. In this model, every discrete data item is a node, and child elements or enclosed text become subnodes. Treating a document as a tree of nodes is one good way of handling XML documents (although there are others, as we'll see when we start working with Java) because it makes it relatively easy to explicitly state which elements contain which other elements; the contained elements become subnodes of the container nodes. Everything in a document becomes a node in this model: elements, element attributes, text, and so on. Here are the possible node types in the W3C DOM:

  • Element

  • Attribute

  • Text

  • CDATA section

  • Entity reference

  • Entity

  • Processing instruction

  • Comment

  • Document

  • Document type

  • Document fragment

  • Notation
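Each of these node types also has a numeric code in the W3C DOM level 1 specification, running from 1 to 12 in the order listed above, and a node reports its code through its nodeType property. As a small sketch (NODE_TYPE_NAMES and nodeTypeName are names I've made up for this illustration; they are not part of any DOM), here's one way to turn such a code back into a readable name:

```javascript
// Names of the W3C DOM level 1 node types, indexed by their numeric
// nodeType code. Codes start at 1, so index 0 is unused.
// NODE_TYPE_NAMES and nodeTypeName are illustrative names only.
var NODE_TYPE_NAMES = [
    null,
    "Element",                // 1
    "Attribute",              // 2
    "Text",                   // 3
    "CDATA section",          // 4
    "Entity reference",       // 5
    "Entity",                 // 6
    "Processing instruction", // 7
    "Comment",                // 8
    "Document",               // 9
    "Document type",          // 10
    "Document fragment",      // 11
    "Notation"                // 12
];

// Return the readable name for a nodeType code, or "Unknown" for a code
// that is outside the range 1 to 12.
function nodeTypeName(code) {
    return NODE_TYPE_NAMES[code] || "Unknown";
}
```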

For example, take a look at this document:

<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>

This document has a processing instruction node and a root element node corresponding to the <DOCUMENT> element. The <DOCUMENT> node has two subnodes, the <GREETING> and <MESSAGE> nodes. These nodes are child nodes of the <DOCUMENT> node and sibling nodes of each other. Both the <GREETING> and <MESSAGE> elements have one subnode: a text node that holds character data. We'll get used to handling documents like this one as a tree of nodes in this chapter. Figure 7.1 shows what this document looks like.

Figure 7.1. Viewing the document as a tree.

Every discrete data item is itself treated as a node. Using the properties defined in the W3C DOM, you can navigate along the various branches of a document's tree, using nextSibling to move to the next sibling node, for example, or lastChild to move to the last child node of the current node. Working with a document this way takes a little practice, and that's what this chapter is all about.
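To make the idea of walking a tree concrete before we turn to the real thing, here's a minimal sketch that models the element part of the previous document with plain JavaScript objects. These objects are only a stand-in, not actual DOM nodes, and makeNode and outline are helper names I've invented for the illustration. The property names firstChild, lastChild, nextSibling, previousSibling, and parentNode match those listed for node objects later in this chapter.

```javascript
// makeNode builds a plain-object stand-in for a DOM node (NOT a real DOM
// node) and links its children together the way the DOM does.
function makeNode(nodeName, children) {
    var node = {
        nodeName: nodeName,
        parentNode: null,
        firstChild: null,
        lastChild: null,
        previousSibling: null,
        nextSibling: null
    };
    children = children || [];
    for (var i = 0; i < children.length; i++) {
        children[i].parentNode = node;
        children[i].previousSibling = i > 0 ? children[i - 1] : null;
        children[i].nextSibling =
            i < children.length - 1 ? children[i + 1] : null;
    }
    if (children.length > 0) {
        node.firstChild = children[0];
        node.lastChild = children[children.length - 1];
    }
    return node;
}

// outline walks the tree using only firstChild and nextSibling and
// returns one indented line per node.
function outline(node, depth) {
    depth = depth || 0;
    var indent = "";
    for (var i = 0; i < depth; i++) {
        indent += "  ";
    }
    var lines = [indent + node.nodeName];
    for (var child = node.firstChild; child !== null; child = child.nextSibling) {
        lines = lines.concat(outline(child, depth + 1));
    }
    return lines;
}

// The element subtree of the example document; "#text" is the nodeName
// that the DOM gives to text nodes.
var tree = makeNode("DOCUMENT", [
    makeNode("GREETING", [makeNode("#text")]),
    makeNode("MESSAGE", [makeNode("#text")])
]);

console.log(outline(tree).join("\n"));
```

The loop in outline is the pattern to remember: start at firstChild and follow nextSibling until it runs out.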

There are a number of different levels of DOM:

  • Level 0. There is no official DOM "level 0," but that's the way W3C refers to the DOM as implemented in relatively early versions of the popular browsers, in particular Netscape Navigator 3.0 and Microsoft Internet Explorer 3.0.

  • Level 1. This level of the DOM is the current W3C recommendation, and it concentrates on the HTML and XML document models. You can find the documentation for this level at http://www.w3.org/TR/REC-DOM-Level-1/.

  • Level 2. Currently at the Candidate Recommendation stage, this level of the DOM is more advanced and includes a style sheet object model. It also adds functionality for manipulating the style information attached to a document. In addition, it enables you to traverse a document, has a built-in event model, and supports XML namespaces. You can find the documentation for this level at http://www.w3.org/TR/DOM-Level-2/.

  • Level 3. This level is still in the planning stage and will address document loading and saving, as well as content models (such as DTDs and schemas) with document validation support. In addition, it will also address document views and formatting, key events, and event groups. There is no documentation on this level yet.

Practically speaking, the only nearly complete implementation of the XML DOM today is that in Internet Explorer version 5 or later. You can find the documentation for the Microsoft DOM at http://msdn.microsoft.com/library/psdk/xmlsdk/xmld20ab.htm as of this writing. However, the Microsoft sites are continually (and annoyingly) being reorganized, so it's quite possible that by the time you read this, that page will be long gone. In that case, your best bet is to go to http://msdn.microsoft.com and search for "xml dom." (The general rule is not to trust a URL at a Microsoft site for more than about two months.)

Because Internet Explorer provides substantial support for the W3C DOM level 1, I'm going to use it in this chapter. Let's hope that the translation to other W3C-compliant browsers, as those browsers begin to support the W3C DOM, won't be terribly difficult.

The XML DOM Objects

Here are the official W3C DOM level 1 objects:

Object Description
Document The document object.
DocumentFragment Reference to a fragment of a document.
DocumentType Reference to the <!DOCTYPE> declaration.
EntityReference Reference to an entity.
Element An element.
Attr An attribute.
ProcessingInstruction A processing instruction.
Comment Content of an XML comment.
Text Text content of an element or attribute.
CDATASection A CDATA section.
Entity Indication of a parsed or unparsed entity in the XML document.
Notation Holder for a notation.
Node A single node in the document tree.
NodeList A list of node objects. This allows iteration and indexed access operations.
NamedNodeMap Allows iteration and access by name to the collection of attributes.

Microsoft uses different names for these objects and adds its own. In particular, Microsoft defines a set of "base objects" that form the foundation of its XML DOM. The top-level object is the DOMDocument object, and it's the only one that you create directly; you reach the other objects through that object. Here's the list of base objects in Internet Explorer. Note the objects designed to treat a document as a tree of nodes: XMLDOMNode, XMLDOMNodeList, and so on:

Object Description
DOMDocument The top node of the XML DOM tree.
XMLDOMNode A single node in the document tree. It includes support for data types, namespaces, DTDs, and XML schemas.
XMLDOMNodeList A list of node objects. It allows iteration and indexed access operations.
XMLDOMNamedNodeMap Allows iteration and access by name to the collection of attributes.
XMLDOMParseError Information about the most recent error. It includes error number, line number, character position, and a text description.
XMLHttpRequest Allows communication with HTTP servers.
XTLRuntime Supports methods that you can call from XSL style sheets.

Besides these base objects, the Microsoft XML DOM also provides the following objects that you use when working with documents in code. They include objects for the various types of nodes, such as XMLDOMAttribute, XMLDOMCharacterData, and XMLDOMElement:

Object Description
XMLDOMAttribute Stands for an attribute object.
XMLDOMCDATASection Handles CDATA sections so that text is not interpreted as markup language.
XMLDOMCharacterData Provides methods used for text manipulation.
XMLDOMComment Gives the content of an XML comment.
XMLDOMDocumentFragment Is a lightweight object useful for tree insert operations.
XMLDOMDocumentType Holds information connected to the document type declaration.
XMLDOMElement Stands for the element object.
XMLDOMEntity Stands for a parsed or unparsed entity in the XML document.
XMLDOMEntityReference Stands for an entity reference node.
XMLDOMImplementation Supports general DOM methods.
XMLDOMNotation Holds a notation (as declared in the DTD or schema).
XMLDOMProcessingInstruction Is a processing instruction.
XMLDOMText Is text content of an element or attribute.

We'll put many of these objects to work in this chapter, seeing how to parse and access XML documents using the Microsoft XML DOM and handling events as documents are loaded. We'll also see how to alter an XML document at run time.

The previous list of objects is pretty substantial, and each object can contain its own properties, methods, and events. Although most of these properties, methods, and events are specified in the W3C XML DOM, many are added by Microsoft as well (and so are nonstandard). If we're going to work with the XML DOM in practice, it's essential to have a good understanding of these objects, both practically for the purposes of this chapter and also for reference. I'll go through the major objects in some detail to make handling the XML DOM clear, starting with the main object, the DOMDocument object.

The DOMDocument Object

The DOMDocument object is the main object that you work with, and it represents the top node in every document tree. When working with the DOM, this is the only object that you create directly.

As we'll see in this chapter, there are two ways to create document objects in Internet Explorer: using the Microsoft.XMLDOM class and using XML data islands. Creating a document object with the Microsoft.XMLDOM class looks like this, where you explicitly load a document into the object with the load method:

function readXMLDocument()
{
    var xmldoc
    xmldoc = new ActiveXObject("Microsoft.XMLDOM")
    xmldoc.load("meetings.xml")
    .
    .
    .
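In practice, it's worth checking that a load actually worked before you use the document. Here's a hedged sketch of one way to do that with the async and parseError properties described in the tables that follow. The loadXMLFile function and its createDocument parameter are my own inventions for this illustration; the optional factory exists only so that the function can be tried outside Internet Explorer, where ActiveXObject is not available.

```javascript
// loadXMLFile is a hypothetical helper, not part of the Microsoft XML DOM.
// createDocument is an optional factory function; when it is omitted, the
// Microsoft.XMLDOM object used in this chapter is created instead.
function loadXMLFile(url, createDocument) {
    var xmldoc = createDocument
        ? createDocument()
        : new ActiveXObject("Microsoft.XMLDOM");

    // Turn off asynchronous download so that load() returns only after
    // the document has been read and parsed.
    xmldoc.async = false;
    xmldoc.load(url);

    // A nonzero errorCode in the parseError object means the load failed.
    if (xmldoc.parseError.errorCode != 0) {
        return null;
    }
    return xmldoc;
}
```

In Internet Explorer, you would call this as loadXMLFile("meetings.xml") and test the result for null before going on.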

We'll also see that you can use the <XML> HTML element to create a data island in Internet Explorer, and then use the XMLDocument property of that element to gain access to the corresponding document object:

<XML ID="meetingsXML" SRC="meetings.xml"></XML>
<SCRIPT LANGUAGE="JavaScript">
    function readXMLDocument()
    {
         xmldoc = document.all("meetingsXML").XMLDocument
         .
         .
         .

XML DOM and Multithreaded Programs

There's also a "free-threaded" version of the Microsoft.XMLDOM class that you can use in multithreaded programs:

var xmldoc = new ActiveXObject("Microsoft.FreeThreadedXMLDOM")

For more information on this advanced topic, take a look at the Microsoft XML DOM site (at http://msdn.microsoft.com/library/psdk/xmlsdk/xmld20ab.htm as of this writing, but very probably moved by the time you read this).

Here are the properties of the DOMDocument object:

Property Description
async[*] Indicates whether asynchronous download is allowed. Read/write.
attributes Holds the list of attributes for this node. Read-only.
baseName[*] Is the base name qualified with the namespace. Read-only.
childNodes Holds a node list containing child nodes for nodes that may have children. Read-only.
dataType[*] Gives the data type for this node. Read/write.
definition[*] Gives the definition of the node in the DTD or schema. Read-only.
doctype Specifies the document type node, which is what specifies the DTD for this document. Read-only.
documentElement Gives the root element of the document. Read/write.
firstChild Gives the first child of the current node. Read-only.
implementation Specifies the XMLDOMImplementation object for this document. Read-only.
lastChild Gives the last child node of the current node. Read-only.
namespaceURI[*] Gives the URI for the namespace. Read-only.
nextSibling Specifies the next sibling of the current node. Read-only.
nodeName Specifies the qualified name of the element, attribute, or entity reference. Holds a fixed string for other node types. Read-only.
nodeType Gives the XML DOM node type. Read-only.
nodeTypedValue[*] Holds this node's value. Read/write.
nodeTypeString[*] Gives the node type expressed as a string. Read-only.
nodeValue Is the text associated with the node. Read/write.
ondataavailable[*] Is the event handler for the ondataavailable event. Read/write.
onreadystatechange[*] Is the event handler that handles readyState property changes. Read/write.
ontransformnode[*] Is the event handler for the ontransformnode event. Read/write.
ownerDocument Gives the root of the document that contains this node. Read-only.
parentNode Specifies the parent node (for nodes that can have parents). Read-only.
parsed[*] Is true if this node and all descendants have been parsed; is false otherwise. Read-only.
parseError[*] Is an XMLDOMParseError object with information about the most recent parsing error. Read-only.
prefix[*] Gives the namespace prefix. Read-only.
preserveWhiteSpace[*] Is true if processing should preserve whitespace; is false otherwise. Read/write.
previousSibling Specifies the previous sibling of this node. Read-only.
readyState[*] Gives the current state of the XML document. Read-only.
resolveExternals[*] Indicates whether external definitions are to be resolved at parse time. Read/write.
specified[*] Indicates whether the node is explicitly given or derived from a default value. Read-only.
text[*] Gives the text content of the node and its subtrees. Read/write.
url[*] Specifies the canonical URL for the most recently loaded XML document. Read-only.
validateOnParse[*] Indicates whether the parser should validate this document. Read/write.
xml[*] Gives the XML representation of the node and all its descendants. Read-only.

[*] Microsoft extension to the W3C DOM.

Here are the methods of the document object:

Method Description
abort[*] Aborts an asynchronous download
appendChild Appends a new child as the last child of the current node
cloneNode Returns a new node that is a copy of this node
createAttribute Returns a new attribute with the given name
createCDATASection Returns a CDATA section node that contains the given data
createComment Returns a comment node
createDocumentFragment Returns an empty DocumentFragment object
createElement Returns an element node using the given name
createEntityReference Returns a new EntityReference object
createNode[*] Returns a node using the given type, name, and namespace
createProcessingInstruction Returns a processing instruction node
createTextNode Returns a text node that contains the given data
getElementsByTagName Yields a collection of elements that have the given name
hasChildNodes Is true if this node has children
insertBefore Inserts a child node before the given node
load[*] Loads an XML document from the given location
loadXML[*] Loads an XML document using the given string
nodeFromID[*] Yields the node whose ID attribute matches the given value
removeChild Removes the given child node from the list of children
replaceChild Replaces the given child node with the given new child node
save[*] Saves an XML document to the given location
selectNodes[*] Applies the given pattern-matching operation to this node's context, returning a list of matching nodes
selectSingleNode[*] Applies the given pattern-matching operation to this node's context, returning the first matching node
transformNode[*] Transforms this node and its children using the given XSL style sheet
transformNodeToObject[*] Transforms this node and its children to an object, using the given XSL style sheet

Here are the events of the document object:

Event Description
ondataavailable[*] Indicates that XML document data is available
onreadystatechange[*] Indicates when the readyState property changes
ontransformnode[*] Happens before each node in the style sheet is applied in the XML source

The XMLDOMNode Object

The Microsoft XMLDOMNode object extends the core XML DOM node interface by adding support for data types, namespaces, DTDs, and schemas as implemented in Internet Explorer. We'll use this object a good deal as we traverse document trees. Here are the properties of this object:

Property Description
attributes The list of attributes for this node. Read-only.
baseName[*] The base name for the name qualified with the namespace. Read-only.
childNodes A node list containing the child nodes of the current node. Read-only.
dataType[*] The data type for this node. Read/write.
definition[*] The definition of the node in the DTD or schema. Read-only.
firstChild The first child of the current node. Read-only.
lastChild The last child of the current node. Read-only.
namespaceURI[*] The URI for the namespace. Read-only.
nextSibling The next sibling of this node. Read-only.
nodeName Holder for a qualified name for an element, attribute, or entity reference, or a string for other node types. Read-only.
nodeType The XML DOM node type. Read-only.
nodeTypedValue[*] The node's value. Read/write.
nodeTypeString[*] The node type in string form. Read-only.
nodeValue The text associated with the node. Read/write.
ownerDocument The root of the document. Read-only.
parentNode The parent node. Read-only.
parsed[*] True if this node and all descendants have been parsed; false otherwise. Read-only.
prefix[*] The namespace prefix. Read-only.
previousSibling The previous sibling of this node. Read-only.
specified[*] Indication of whether a node is explicitly given or derived from a default value. Read-only.
text[*] The text content of the node and its subtrees. Read/write.
xml[*] The XML representation of the node and all its descendants. Read-only.

Here are the methods of this object:

Method Description
appendChild Appends a new child as the last child of this node
cloneNode Creates a new node that is a copy of this node
hasChildNodes Is true if this node has children
insertBefore Inserts a child node before the given node
removeChild Removes the given child node
replaceChild Replaces the given child node with the given new child node
selectNodes[*] Applies the given pattern-matching operation to this node's context, returning a list of matching nodes
selectSingleNode[*] Applies the given pattern-matching operation to this node's context, returning the first matching node
transformNode[*] Transforms this node and its children using the given XSL style sheet
transformNodeToObject[*] Transforms this node and its children using the given XSL style sheet, returning the result in an object

This object has no events.

The XMLDOMNodeList Object

You use the XMLDOMNodeList to handle lists of nodes. Node lists are useful because a node itself can have many child nodes. Using a node list, you can handle all the children of a node at once.

For example, here I'm loading a document and getting a list of all <PERSON> elements as a node list, using the document object's getElementsByTagName method:

function readXMLDocument()
{
    var xmldoc, nodeList
    xmldoc = new ActiveXObject("Microsoft.XMLDOM")
    xmldoc.load("meetings.xml")
    nodeList = xmldoc.getElementsByTagName("PERSON")
    .
    .
    .

The XMLDOMNodeList object has a single property, length, which describes the number of items in the collection and is read-only.

Here are the methods of the XMLDOMNodeList object:

Method Description
item Allows random access to nodes in the collection
nextNode[*] Indicates the next node in the collection
reset[*] Resets the list iterator

This object has no events.
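As a quick illustration of how the length property and the item method work together, here's a minimal sketch that collects the name of every node in a node list. The nodeNames function is a hypothetical helper of my own, not part of the XML DOM; it assumes only that the list it's given has a length property and an item method, as described above.

```javascript
// nodeNames is a hypothetical helper: it returns an array holding the
// nodeName of each node in a node list, in order.
function nodeNames(nodeList) {
    var names = [];
    for (var i = 0; i < nodeList.length; i++) {
        // item(i) gives indexed access to the ith node in the list.
        names[names.length] = nodeList.item(i).nodeName;
    }
    return names;
}
```

In Internet Explorer, you might pass this function the list returned by xmldoc.getElementsByTagName("PERSON"). The sketch assigns to names[names.length] rather than calling push because the JScript engine in Internet Explorer 5.0 does not support the array push method.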

The XMLDOMNamedNodeMap Object

The Microsoft XML DOM also supports an XMLDOMNamedNodeMap object, which provides support for namespaces. Here are the properties of this object:

Property Description
length Gives the number of items in the collection. Read-only.
item Allows random access to nodes in the collection. Read-only.

Here are the methods of this object:

Method Description
getNamedItem Gets the attribute with the given name
getQualifiedItem[*] Gets the attribute with the given namespace and attribute name
nextNode[*] Gets the next node
removeNamedItem Removes an attribute
removeQualifiedItem[*] Removes the attribute with the given namespace and attribute name
reset[*] Resets the list iterator
setNamedItem Adds the given node

This object has no events.
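Here's a small sketch of how you might read attributes out of a named node map, both by position and by name. The describeAttributes and attributeValue functions are hypothetical helpers that I've made up for this illustration; they rely only on length, item, and getNamedItem, together with the nodeName and nodeValue properties that every node supports.

```javascript
// describeAttributes is a hypothetical helper: it returns one
// "NAME=value" string for each attribute in a named node map.
function describeAttributes(attributes) {
    var result = [];
    for (var i = 0; i < attributes.length; i++) {
        var attr = attributes.item(i);
        result[result.length] = attr.nodeName + "=" + attr.nodeValue;
    }
    return result;
}

// attributeValue is also hypothetical: it looks up an attribute by name
// and returns its value, or null when there is no such attribute.
function attributeValue(attributes, name) {
    var attr = attributes.getNamedItem(name);
    return attr ? attr.nodeValue : null;
}
```

You would typically pass these functions the value of an element's attributes property.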

The XMLDOMParseError Object

The Microsoft XMLDOMParseError object holds information about the most recent parse error, including the error number, line number, character position, and a text description. Although it's not obvious to anyone who loads an XML document into Internet Explorer, the browser does actually validate the document using either a DTD or schema if one is supplied. It's not obvious that this happens because, by default, Internet Explorer does not display any validation error messages. However, if you use the XMLDOMParseError object, you can get a full validation report, and I'll do so later in this chapter.

Here are the properties of this object:

Property Description
errorCode The error code of the most recent parse error. Read-only.
filepos The file position where the error occurred. Read-only.
line The line number that contains the error. Read-only.
linepos The character position in the line where the error happened. Read-only.
reason The reason for the error. Read-only.
srcText The full text of the line containing the error. Read-only.
url The URL of the XML document containing the last error. Read-only.

Note that this object does not have any methods or events, and it does not correspond to any official W3C object in the W3C DOM.
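To give you a feel for how these properties fit together, here's a minimal sketch that turns a parse error object into a one-line report. The describeParseError function is a hypothetical helper of my own, and the wording of the message is arbitrary; the sketch assumes only the properties listed in the previous table.

```javascript
// describeParseError is a hypothetical helper: it builds a readable,
// one-line summary from the properties of a parse error object.
function describeParseError(parseError) {
    // An errorCode of zero means the most recent parse succeeded.
    if (parseError.errorCode == 0) {
        return "No error";
    }
    return "Error " + parseError.errorCode +
        " at line " + parseError.line +
        ", position " + parseError.linepos +
        " in " + parseError.url +
        ": " + parseError.reason;
}
```

In Internet Explorer, you would pass this function the parseError property of a document object after loading a document.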

The XMLDOMAttribute Object

In both the W3C and Microsoft DOM, attribute objects are node objects (that is, they are based on the node object), but they are not actually child nodes of an element and are not considered part of the document tree. Instead, attributes are considered properties of their associated elements. (This means that properties such as parentNode, previousSibling, or nextSibling are meaningless for attributes.) We'll see how to work with attributes in this chapter.

Here are the properties of the XMLDOMAttribute object:

Property Description
attributes The list of attributes for this node. Read-only.
baseName[*] The base name for the name qualified with the namespace. Read-only.
childNodes A node list containing child nodes. Read-only.
dataType[*] The data type of this node. Read/write.
definition[*] The definition of the node in the DTD or schema. Read-only.
firstChild The first child of the current node. Read-only.
lastChild The last child of the current node. Read-only.
name The attribute name. Read-only.
namespaceURI[*] The URI for the namespace. Read-only.
nextSibling The next sibling of this node. Read-only.
nodeName The qualified name for an element, attribute, or entity reference, or a string for other node types. Read-only.
nodeType The XML DOM node type. Read-only.
nodeTypedValue[*] The node's value. Read/write.
nodeTypeString[*] The node type in string form. Read-only.
nodeValue The text associated with the node. Read/write.
ownerDocument The root of the document. Read-only.
parentNode Holder for the parent node (for nodes that can have parents). Read-only.
parsed[*] True if this node and all descendants have been parsed; false otherwise. Read-only.
prefix[*] The namespace prefix. Read-only.
previousSibling The previous sibling of this node. Read-only.
specified Indication of whether the node (usually an attribute) is explicitly specified or derived from a default value. Read-only.
text[*] The text content of the node and its subtrees. Read/write.
value The attribute's value. Read/write.
xml[*] The XML representation of the node and all its descendants. Read-only.

Here are the methods of the XMLDOMAttribute object:

Method Description
appendChild Appends a new child as the last child of this node
cloneNode Returns a new node that is a copy of this node
hasChildNodes Is true if this node has children
insertBefore Inserts a child node before the given node
removeChild Removes the given child node from the list
replaceChild Replaces the given child node with the given new child node
selectNodes[*] Applies the given pattern-matching operation to this node's context, returning a list of matching nodes
selectSingleNode[*] Applies the given pattern-matching operation to this node's context, returning the first matching node
transformNode[*] Transforms this node and its children using the given XSL style sheet
transformNodeToObject[*] Transforms this node and its children using the given XSL style sheet, and returns the result in an object

This object does not support any events.

The XMLDOMElement Object

XMLDOMElement objects represent elements and are probably the most common node objects that you'll deal with. Because attributes are not considered child nodes of an element object, you use special properties and methods to get the attributes of an element. For example, the attributes property returns an XMLDOMNamedNodeMap object that contains all the element's attributes, and the getAttribute method returns the value of a single named attribute.

Here are the properties of the XMLDOMElement object:

Property Description
attributes The list of attributes for this node. Read-only.
baseName[*] The base name for the name qualified with the namespace. Read-only.
childNodes A node list containing the children. Read-only.
dataType[*] The data type for this node. Read/write.
definition[*] The definition of the node in the DTD or schema. Read-only.
firstChild The first child of this node. Read-only.
lastChild The last child node of this node. Read-only.
namespaceURI[*] The URI for the namespace. Read-only.
nextSibling The next sibling of this node. Read-only.
nodeName Holder for the qualified name of an element, attribute, or entity reference, or a string for other node types. Read-only.
nodeType Indication of the XML DOM node type. Read-only.
nodeTypeString[*] The node type in string form. Read-only.
nodeValue The text associated with the node. Read/write.
ownerDocument The root of the document. Read-only.
parentNode The parent node of the current node. Read-only.
parsed[*] True if this node and all descendants have been parsed; false otherwise. Read-only.
prefix[*] The namespace prefix. Read-only.
previousSibling The previous sibling of this node. Read-only.
specified[*] Indication of whether the node is explicitly specified or derived from a default value in the DTD or schema. Read-only.
tagName Holder for the element name. Read-only.
text[*] Holder for the text content of the node and its subtrees. Read/write.
xml[*] Holder for the XML representation of the node and all its descendants. Read-only.

Here are the methods of the XMLDOMElement object:

Method Description
appendChild Appends a new child as the last child of the current node
cloneNode Returns a new node that is a copy of this node
getAttribute Gets the value of the named attribute
getAttributeNode Gets the named attribute node
getElementsByTagName Returns a list of all descendant elements that match the given name
hasChildNodes Is true if this node has children
insertBefore Inserts a child node before the given node
normalize Normalizes all descendant elements, combining two or more text nodes next to each other into one text node
removeAttribute Removes or replaces the named attribute
removeAttributeNode Removes the given attribute from this element
removeChild Removes the given child node
replaceChild Replaces the given child node with the given new child node
selectNodes[*] Applies the given pattern-matching operation to this node's context, returning the list of matching nodes
selectSingleNode[*] Applies the given pattern-matching operation to this node's context, returning the first matching node
setAttribute Sets the value of a named attribute
setAttributeNode Adds or changes the given attribute node on this element
transformNode[*] Transforms this node and its children using the given XSL style sheet
transformNodeToObject[*] Transforms this node and its children using the given XSL style sheet, and returns the resulting transformation as an object

This object has no events.
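Here's a brief sketch that puts two of these methods, getElementsByTagName and getAttribute, to work together. It assumes a document like the meetings.xml file shown later in this chapter, in which each <PERSON> element carries an ATTENDANCE attribute. The countAttendance function is a hypothetical helper that I've invented for this illustration.

```javascript
// countAttendance is a hypothetical helper: it counts the <PERSON>
// descendants of an element whose ATTENDANCE attribute has the given
// value (for example, "present" or "absent").
function countAttendance(element, status) {
    var people = element.getElementsByTagName("PERSON");
    var count = 0;
    for (var i = 0; i < people.length; i++) {
        // getAttribute returns the value of the named attribute.
        if (people.item(i).getAttribute("ATTENDANCE") == status) {
            count++;
        }
    }
    return count;
}
```

In Internet Explorer, you might call this as countAttendance(xmldoc.documentElement, "present") after loading the document.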

The XMLDOMText Object

The XMLDOMText object holds the text content of an element or attribute. If there is no markup inside an element, but there is text, that element will contain only one node: a text node that holds the text. (In mixed-content models, text nodes can have sibling element nodes.)

When a document is first made available to the XML DOM, all text is normalized, which means that there is only one text node for each block of text. You can actually create text nodes that are adjacent to each other, although they will not be saved as distinct the next time that the document is opened. (It's worth noting that the normalize method on the XMLDOMElement object merges adjacent text nodes into a single node.)
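To show what merging adjacent text nodes amounts to, here's a minimal sketch that works on a plain array of plain objects. It is only a model of the idea, not the normalize method itself, and mergeAdjacentText is a name I've made up for this illustration. The one fact it borrows from the W3C DOM is that text nodes have a nodeType of 3.

```javascript
// mergeAdjacentText is a hypothetical helper that works on plain objects,
// NOT on real DOM nodes. It returns a new array in which every run of
// adjacent text nodes (nodeType 3) has been folded into one text node.
function mergeAdjacentText(children) {
    var merged = [];
    for (var i = 0; i < children.length; i++) {
        var node = children[i];
        var last = merged.length > 0 ? merged[merged.length - 1] : null;
        if (node.nodeType == 3 && last !== null && last.nodeType == 3) {
            // Fold this text into the text node just before it.
            last.nodeValue += node.nodeValue;
        } else {
            // Copy the node so that the original array is left untouched.
            merged[merged.length] = {
                nodeType: node.nodeType,
                nodeName: node.nodeName,
                nodeValue: node.nodeValue
            };
        }
    }
    return merged;
}
```

Given two text nodes holding "Hello " and "From XML" side by side, the result holds a single text node with the value "Hello From XML"; text nodes separated by an element are left alone.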

Here are the properties of the XMLDOMText object:

Property Description
attributes Holder for the list of attributes for this node. Read-only.
baseName[*] The base name for the name qualified with the namespace. Read-only.
childNodes A node list containing the child nodes. Read-only.
data This node's data (what's actually stored depends on the node type). Read/write.
dataType[*] The data type for this node. Read/write.
definition[*] The definition of the node in the DTD or schema. Read-only.
firstChild The first child of the current node. Read-only.
lastChild The last child of the current node. Read-only.
length The length, in characters, of the data. Read-only.
namespaceURI[*] The URI for the namespace. Read-only.
nextSibling The next sibling of this node. Read-only.
nodeName The qualified name of an element, attribute, or entity reference, or a string for other node types. Read-only.
nodeType Indication of the XML DOM node type. Read-only.
nodeTypedValue[*] This node's value. Read/write.
nodeTypeString[*] The node type in string form. Read-only.
nodeValue The text associated with the node. Read/write.
ownerDocument The root of the document. Read-only.
parentNode The parent node. Read-only.
parsed[*] True if this node and all descendants have been parsed; false otherwise. Read-only.
prefix[*] The namespace prefix. Read-only.
previousSibling The previous sibling of this node. Read-only.
specified[*] Indication of whether the node is explicitly specified or derived from a default value. Read-only.
text[*] Holder for the text content of the node and its subtrees. Read/write.
xml[*] Holder for the XML representation of the node and all its descendants. Read-only.

Here are the methods of the XMLDOMText object:

Method Description
appendChild Appends a new child as the last child of this node
appendData Appends the given string to the existing string data
cloneNode Returns a new node that is a copy of this node
deleteData Removes the given substring within the string data
hasChildNodes Is true if this node has children
insertBefore Inserts a child node before the specified node
insertData Inserts the supplied string at the specified offset
removeChild Removes the specified child node from the list of children
replaceChild Replaces the specified child node with the given new child node
replaceData Replaces the given number of characters with the given string
selectNodes[*] Applies the given pattern-matching operation to this node's context, returning a list of matching nodes
selectSingleNode[*] Applies the specified pattern-matching operation to this node's context, returning the first matching node
splitText Breaks this text node into two text nodes
substringData Returns a substring of the full string
transformNode[*] Transforms this node and its children using the given XSL style sheet
transformNodeToObject[*] Transforms this node and its children using the given XSL style sheet, and returns the resulting transformation as an object

This object doesn't support any events.

That gives us an overview of the most commonly used objects in the Microsoft XML DOM. Now I'm going to put them to work in the rest of the chapter. I'll start at the beginning: loading an XML document.

Loading XML Documents

Our first step will be to load an XML document into Internet Explorer using code, and to create a document object. Using this object, we'll be able to access all aspects of the document itself.

As mentioned earlier in this chapter, there are two ways to load an XML document into Internet Explorer so that you have access to it using JavaScript. To see how this works, I'll use this XML document, meetings.xml, throughout this chapter. It records business meetings, including who was present and when the meeting occurred:

<?xml version="1.0"?>
<MEETINGS>
   <MEETING TYPE="informal">
       <MEETING_TITLE>XML In The Real World</MEETING_TITLE>
       <MEETING_NUMBER>2079</MEETING_NUMBER>
       <SUBJECT>XML</SUBJECT>
       <DATE>6/1/2002</DATE>
       <PEOPLE>
           <PERSON ATTENDANCE="present">
               <FIRST_NAME>Edward</FIRST_NAME>
               <LAST_NAME>Samson</LAST_NAME>
           </PERSON>
           <PERSON ATTENDANCE="absent">
               <FIRST_NAME>Ernestine</FIRST_NAME>
               <LAST_NAME>Johnson</LAST_NAME>
           </PERSON>
           <PERSON ATTENDANCE="present">
               <FIRST_NAME>Betty</FIRST_NAME>
               <LAST_NAME>Richardson</LAST_NAME>
           </PERSON>
       </PEOPLE>
   </MEETING>
</MEETINGS>

The first way of loading an XML document into Internet Explorer is to create a document object using the Microsoft.XMLDOM class.

To see this in action, I'm going to create an example that reads in meetings.xml and retrieves the name of the third person in that document (Betty Richardson). I start by creating a new document object like this (recall that you use the new operator to create a new object): xmldoc = new ActiveXObject("Microsoft.XMLDOM"). Here's how it looks in code:

<HTML>     <HEAD>          <TITLE>              Reading XML element values          </TITLE>          <SCRIPT LANGUAGE="JavaScript">               function readXMLDocument()               {                   var xmldoc                   xmldoc = new ActiveXObject("Microsoft.XMLDOM")                   .                   .                   .     </HEAD> </HTML>

Now I can load in the XML document meetings.xml:

<HTML>     <HEAD>          <TITLE>              Reading XML element values          </TITLE>          <SCRIPT LANGUAGE="JavaScript">               function readXMLDocument()               {                   var xmldoc                   xmldoc = new ActiveXObject("Microsoft.XMLDOM")                   xmldoc.load("meetings.xml")                   .                   .                   .     </HEAD> </HTML>

The next step is to get a node object corresponding to the document's root element, <MEETINGS>. You do that with the documentElement property:

<HTML>     <HEAD>          <TITLE>              Reading XML element values          </TITLE>          <SCRIPT LANGUAGE="JavaScript">               function readXMLDocument()               {                   var xmldoc, meetingsNode                   xmldoc = new ActiveXObject("Microsoft.XMLDOM")                   xmldoc.load("meetings.xml")                   meetingsNode = xmldoc.documentElement                   .                   .                   .     </HEAD> </HTML>

At this point, I'm free to move around the document as I like, using properties such as firstChild and lastChild, which let you access the child nodes of an element, and nextSibling and previousSibling, which let you access nodes on the same nesting level. For example, the <MEETING> element is the first child of the document root element, <MEETINGS>, so I can get a node corresponding to the <MEETING> element using the firstChild property:

<HTML>
    <HEAD>
         <TITLE>
             Reading XML element values
         </TITLE>
         <SCRIPT LANGUAGE="JavaScript">
              function readXMLDocument()
              {
                  var xmldoc, meetingsNode, meetingNode
                  xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                  xmldoc.load("meetings.xml")
                  meetingsNode = xmldoc.documentElement
                  meetingNode = meetingsNode.firstChild
                  .
                  .
                  .
    </HEAD>
</HTML>

I want to track down the third <PERSON> element inside the <PEOPLE> element. The <PEOPLE> element is the last child of the <MEETING> element, so I can get a node corresponding to the <PEOPLE> element this way:

<HTML>     <HEAD>          <TITLE>              Reading XML element values          </TITLE>          <SCRIPT LANGUAGE="JavaScript">               function readXMLDocument()               {                   var xmldoc, meetingsNode, meetingNode, peopleNode                   var first_nameNode, last_nameNode                   xmldoc = new ActiveXObject("Microsoft.XMLDOM")                   xmldoc.load("meetings.xml")                   meetingsNode = xmldoc.documentElement                   meetingNode = meetingsNode.firstChild                   peopleNode = meetingNode.lastChild                   .                   .                   .     </HEAD> </HTML>

I want the third person in the <PEOPLE> element, which is the last child of this element, so I get access to that person with the lastChild property:

<HTML>     <HEAD>          <TITLE>              Reading XML element values          </TITLE>          <SCRIPT LANGUAGE="JavaScript">               function readXMLDocument()               {                   var xmldoc, meetingsNode, meetingNode, peopleNode                   xmldoc = new ActiveXObject("Microsoft.XMLDOM")                   xmldoc.load("meetings.xml")                   meetingsNode = xmldoc.documentElement                   meetingNode = meetingsNode.firstChild                   peopleNode = meetingNode.lastChild                   personNode = peopleNode.lastChild                   .                   .                   .     </HEAD> </HTML>

Finally, I can get nodes corresponding to the <FIRST_NAME> and <LAST_NAME> elements that hold this person's name using the firstChild and nextSibling properties (nextSibling gets the current node's next sibling node):

<HTML>     <HEAD>          <TITLE>              Reading XML element values          </TITLE>          <SCRIPT LANGUAGE="JavaScript">               function readXMLDocument()               {                   var xmldoc, meetingsNode, meetingNode, peopleNode                   var first_nameNode, last_nameNode                   xmldoc = new ActiveXObject("Microsoft.XMLDOM")                   xmldoc.load("meetings.xml")                   meetingsNode = xmldoc.documentElement                   meetingNode = meetingsNode.firstChild                   peopleNode = meetingNode.lastChild                   personNode = peopleNode.lastChild                   first_nameNode = personNode.firstChild                   last_nameNode = first_nameNode.nextSibling                   .                   .                   .     </HEAD> </HTML>

Now I've walked the tree to get nodes corresponding to the actual elements that I want. Note, however, that the nodes I want are actually the text nodes inside the <FIRST_NAME> and <LAST_NAME> elements, which hold the person's name. That means that I have to get the first child of each of those elements (that is, the text node), and then use the nodeValue property of that text node to read the person's name.
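The difference between an element node and the text node inside it is easy to see in a small sketch that runs outside the browser. The makeNode, element, and text helpers here are hypothetical stand-ins that I've made up for illustration (they are not part of the Microsoft XML DOM); they give plain JavaScript objects the same firstChild, lastChild, nextSibling, and nodeValue properties used in this example:

```javascript
// Hypothetical stand-in for DOM nodes: just enough structure to show the walk.
function makeNode(nodeName, nodeValue, children) {
    var node = {
        nodeName: nodeName,
        nodeValue: nodeValue,        // null for elements, a string for text nodes
        childNodes: children || [],
        firstChild: null,
        lastChild: null,
        nextSibling: null,
        previousSibling: null
    };
    var kids = node.childNodes;
    // Link each child to its neighbors on the same nesting level.
    for (var i = 0; i < kids.length; i++) {
        kids[i].previousSibling = i > 0 ? kids[i - 1] : null;
        kids[i].nextSibling = i < kids.length - 1 ? kids[i + 1] : null;
    }
    if (kids.length > 0) {
        node.firstChild = kids[0];
        node.lastChild = kids[kids.length - 1];
    }
    return node;
}

function element(name, children) { return makeNode(name, null, children); }
function text(value) { return makeNode("#text", value, []); }

// A cut-down version of the third <PERSON> element in meetings.xml.
var personNode = element("PERSON", [
    element("FIRST_NAME", [text("Betty")]),
    element("LAST_NAME", [text("Richardson")])
]);

var first_nameNode = personNode.firstChild;
var last_nameNode = first_nameNode.nextSibling;

// The element node itself has no value; the text lives one level down.
var elementValue = first_nameNode.nodeValue;
var fullName = first_nameNode.firstChild.nodeValue + " " +
    last_nameNode.firstChild.nodeValue;

console.log(elementValue);   // null
console.log(fullName);       // Betty Richardson
```

Reading nodeValue on the element gives null, which is why the code goes one step further to the text node before reading the name.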

To actually display the person's first and last names, I'll use a little dynamic HTML. Here, I'm going to use an HTML <DIV> element and the innerHTML property of that element (which holds the HTML content of the <DIV> element) to display the person's name, like this:

<HTML>
    <HEAD>
         <TITLE>
             Reading XML element values
         </TITLE>
         <SCRIPT LANGUAGE="JavaScript">
              function readXMLDocument()
              {
                  var xmldoc, meetingsNode, meetingNode, peopleNode, personNode
                  var first_nameNode, last_nameNode, outputText
                  xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                  xmldoc.load("meetings.xml")
                  meetingsNode = xmldoc.documentElement
                  meetingNode = meetingsNode.firstChild
                  peopleNode = meetingNode.lastChild
                  personNode = peopleNode.lastChild
                  first_nameNode = personNode.firstChild
                  last_nameNode = first_nameNode.nextSibling
                  outputText = "Third name: " +
                        first_nameNode.firstChild.nodeValue + ' '
                      + last_nameNode.firstChild.nodeValue
                  messageDIV.innerHTML=outputText
              }
         </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>
                Reading XML element values
            </H1>
            <INPUT TYPE="BUTTON" VALUE="Get the name of the third person"
                ONCLICK="readXMLDocument()">
            <P>
            <DIV ID="messageDIV"></DIV>
        </CENTER>
    </BODY>
</HTML>

I've also added a button with the caption Get the name of the third person that will call the JavaScript function we've defined, readXMLDocument, and that function reads and displays the document.

You can see this page at work in Internet Explorer in Figure 7.2. When the user clicks the button, the XML document meetings.xml is read and parsed, and we retrieve and display the third person's name. We've made substantial progress.

Figure 7.2. Reading an XML element in Internet Explorer.

graphics/07fig02.gif
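One caution about this way of loading documents: the Microsoft.XMLDOM document object loads asynchronously by default, so load can return before parsing has finished. Here's a minimal sketch of a more defensive loader, assuming you want the call to block until the document is ready. It sets the object's async property to false and checks the parseError object afterward. The loadXmlDocument function, its createDocument parameter, and the makeFakeDocument helper are names I've made up for illustration; the fake document exists only so that the sketch can run outside Internet Explorer:

```javascript
// createDocument is a hypothetical hook so that the sketch can run outside IE;
// when it is omitted, the function creates a real Microsoft.XMLDOM object.
function loadXmlDocument(url, createDocument) {
    var xmldoc = createDocument
        ? createDocument()
        : new ActiveXObject("Microsoft.XMLDOM");
    xmldoc.async = false;            // make load() block until parsing finishes
    xmldoc.load(url);
    if (xmldoc.parseError.errorCode !== 0) {
        throw new Error("Could not load " + url + ": " +
            xmldoc.parseError.reason);
    }
    return xmldoc;
}

// Fake document object (an assumption for this demo only). It mimics the
// async, load, and parseError members that the function above relies on.
function makeFakeDocument(errorCode, reason) {
    return {
        async: true,
        parseError: { errorCode: errorCode, reason: reason },
        load: function (url) { this.url = url; return errorCode === 0; }
    };
}

var okDoc = loadXmlDocument("meetings.xml", function () {
    return makeFakeDocument(0, "");
});

var failureMessage = "";
try {
    loadXmlDocument("missing.xml", function () {
        return makeFakeDocument(-1, "file not found");
    });
} catch (e) {
    failureMessage = e.message;
}

console.log(okDoc.async);
console.log(failureMessage);
```

In Internet Explorer itself, you would call loadXmlDocument("meetings.xml") with no second argument and then work with documentElement as before.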

Using XML Data Islands

As of Internet Explorer version 5, you can also use XML data islands to embed XML inside HTML pages. Internet Explorer supports an HTML <XML> element (which is not part of the HTML standard) in which you can simply enclose an XML document, like this:

<XML ID="greeting">
    <DOCUMENT>
        <GREETING>Hi there XML!</GREETING>
    </DOCUMENT>
</XML>

The Internet Explorer <XML> element has some attributes worth noting:

Attribute Description
ID The ID with which you can refer to the <XML> element in code. Set to an alphanumeric string.
NS The URI of the XML namespace used by the XML content. Set to a URI.
PREFIX Namespace prefix of the XML contents. Set to an alphanumeric string.
SRC Source for the XML document, if the document is external. Set to a URI.

When you use this element, you access it using its ID value in code. To reach the element, you can use the all collection, passing it the ID that you gave the element, like this, for the above example: document.all("greeting"). To get the document object corresponding to the XML document, you can then use the XMLDocument property. Here's how I convert the previous example to use a data island instead of the Microsoft.XMLDOM object:

<HTML>     <HEAD>          <TITLE>              Reading element values with XML data islands          </TITLE>          <XML ID="meetingsXML" SRC="meetings.xml"></XML>          <SCRIPT LANGUAGE="JavaScript">              function readXMLDocument()              {                   var xmldoc, meetingsNode, meetingNode, peopleNode                   var first_nameNode, last_nameNode, outputText                   xmldoc= document.all("meetingsXML").XMLDocument                   meetingsNode = xmldoc.documentElement                   meetingNode = meetingsNode.firstChild                   peopleNode = meetingNode.lastChild                   personNode = peopleNode.lastChild                   first_nameNode = personNode.firstChild                   last_nameNode = first_nameNode.nextSibling                   outputText = "Third name: " +                         first_nameNode.firstChild.nodeValue + ' '                       + last_nameNode.firstChild.nodeValue                   messageDIV.innerHTML=outputText              }          </SCRIPT>     </HEAD>     <BODY>         <CENTER>             <H1>                 Reading element values with XML data islands             </H1>             <INPUT TYPE="BUTTON" VALUE="Get the name of the third person"                 ONCLICK="readXMLDocument()">             <P>             <DIV ID="messageDIV"></DIV>         </CENTER>     </BODY> </HTML>

This example works as the previous example did, as shown in Figure 7.3.

Figure 7.3. Using XML data islands in Internet Explorer.

graphics/07fig03.gif

In the previous example, I used an external XML document, meetings.xml, which I referenced with the <XML> element's SRC attribute. However, you can also enclose the entire XML document in the <XML> element, like this:

<HTML>
    <HEAD>
         <TITLE>
             Creating An XML Data Island
         </TITLE>
         <XML ID="meetingsXML">
            <?xml version="1.0"?>
            <MEETINGS>
               <MEETING TYPE="informal">
                   <MEETING_TITLE>XML In The Real World</MEETING_TITLE>
                   <MEETING_NUMBER>2079</MEETING_NUMBER>
                   <SUBJECT>XML</SUBJECT>
                   <DATE>6/1/2002</DATE>
                   <PEOPLE>
                       <PERSON ATTENDANCE="present">
                           <FIRST_NAME>Edward</FIRST_NAME>
                           <LAST_NAME>Samson</LAST_NAME>
                       </PERSON>
                       <PERSON ATTENDANCE="absent">
                           <FIRST_NAME>Ernestine</FIRST_NAME>
                           <LAST_NAME>Johnson</LAST_NAME>
                       </PERSON>
                       <PERSON ATTENDANCE="present">
                           <FIRST_NAME>Betty</FIRST_NAME>
                           <LAST_NAME>Richardson</LAST_NAME>
                       </PERSON>
                   </PEOPLE>
               </MEETING>
            </MEETINGS>
        </XML>
         <SCRIPT LANGUAGE="JavaScript">
             function readXMLDocument()
             {
                  var xmldoc, meetingsNode, meetingNode, peopleNode
                  var first_nameNode, last_nameNode, outputText
                  xmldoc= document.all("meetingsXML").XMLDocument
                  meetingsNode = xmldoc.documentElement
                  meetingNode = meetingsNode.firstChild
                  peopleNode = meetingNode.lastChild
                  personNode = peopleNode.lastChild
                  first_nameNode = personNode.firstChild
                  last_nameNode = first_nameNode.nextSibling
                  outputText = "Third name: " +
                        first_nameNode.firstChild.nodeValue + ' '
                      + last_nameNode.firstChild.nodeValue
                  messageDIV.innerHTML=outputText
             }
         </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>
                Reading element values with XML data islands
            </H1>
            <INPUT TYPE="BUTTON" VALUE="Get the name of the third person"
                ONCLICK="readXMLDocument()">
            <P>
            <DIV ID="messageDIV"></DIV>
        </CENTER>
    </BODY>
</HTML>

So far, I've used the XMLDocument property of the object corresponding to the XML data island to get the document object, but you can also use the documentElement property of the data island directly to get the root element of the XML document, like this:

<HTML>     <HEAD>          <TITLE>              Reading XML element values          </TITLE>          <XML ID="meetingsXML" SRC="meetings.xml"></XML>          <SCRIPT LANGUAGE="JavaScript">               function readXMLDocument()               {                   var xmldoc, meetingsNode, meetingNode, peopleNode                   var first_nameNode, last_nameNode, outputText                   meetingsNode = meetingsXML.documentElement                   meetingNode = meetingsNode.firstChild                   peopleNode = meetingNode.lastChild                   personNode = peopleNode.lastChild                   first_nameNode = personNode.firstChild                   last_nameNode = first_nameNode.nextSibling                   .                   .                   . </HTML>

Getting Elements by Name

So far in this chapter, I've used navigation properties such as firstChild and nextSibling to navigate through XML documents. However, you can also get individual elements by searching for them by name. Here's an example; in this case, I'll use the document object's getElementsByTagName method to return a node list object holding all elements of a given name. In particular, I'm searching for <FIRST_NAME> and <LAST_NAME> elements, so I get lists of those elements like this:

<HTML>
    <HEAD>
         <TITLE>
             Reading XML element values
         </TITLE>
         <SCRIPT LANGUAGE="JavaScript">
              function loadDocument()
              {
                  var xmldoc, listNodesFirstName, listNodesLastName
                  xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                  xmldoc.load("meetings.xml")
                  listNodesFirstName = xmldoc.getElementsByTagName("FIRST_NAME")
                  listNodesLastName = xmldoc.getElementsByTagName("LAST_NAME")
                  .
                  .
                  .

Like all node lists, the listNodesFirstName and listNodesLastName node lists are indexed by number starting at 0, so the third element in these lists is element number 2, which you refer to as listNodesLastName.item(2). This means that I can find the first and last name of the third person. (Recall that I actually need the first child of the <FIRST_NAME> and <LAST_NAME> nodes, which is the text node inside those elements that holds the person's name, so I use the firstChild property here.)

<HTML>
    <HEAD>
         <TITLE>
             Reading XML element values
         </TITLE>
         <SCRIPT LANGUAGE="JavaScript">
              function loadDocument()
              {
                  var xmldoc, listNodesFirstName, listNodesLastName
                  var outputText
                  xmldoc = new ActiveXObject("Microsoft.XMLDOM")
                  xmldoc.load("meetings.xml")
                  listNodesFirstName = xmldoc.getElementsByTagName("FIRST_NAME")
                  listNodesLastName = xmldoc.getElementsByTagName("LAST_NAME")
                  outputText = "Third name: " +
                        listNodesFirstName.item(2).firstChild.nodeValue + ' '
                      + listNodesLastName.item(2).firstChild.nodeValue
                  messageDIV.innerHTML=outputText
              }
         </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>
                Reading XML element values
            </H1>
            <INPUT TYPE="BUTTON" VALUE="Get the name of the third person"
                ONCLICK="loadDocument()">
            <P>
            <DIV ID="messageDIV"></DIV>
        </CENTER>
    </BODY>
</HTML>

We've made some progress here and have been able to read in an XML document in various ways to access specific elements in the document. I'll move on to the next step now: accessing not just an element's text content, but also the element's attributes.
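To get a feel for what a search by name involves, here's a rough sketch of a depth-first search that collects matching elements in document order. This is only an illustration of the idea, not how Internet Explorer actually implements getElementsByTagName; the element, person, and collectByTagName helpers are hypothetical, and the nodes are plain JavaScript objects so that the sketch can run anywhere:

```javascript
// Hypothetical plain-object nodes standing in for DOM elements.
function element(nodeName, children) {
    return { nodeName: nodeName, childNodes: children || [] };
}

// Collect every element called tagName, in document order (depth-first).
function collectByTagName(node, tagName, found) {
    found = found || [];
    for (var i = 0; i < node.childNodes.length; i++) {
        var child = node.childNodes[i];
        if (child.nodeName === tagName) {
            found.push(child);
        }
        collectByTagName(child, tagName, found);
    }
    return found;
}

// Build one <PERSON> element with text nodes inside the name elements.
function person(first, last) {
    return element("PERSON", [
        element("FIRST_NAME",
            [{ nodeName: "#text", nodeValue: first, childNodes: [] }]),
        element("LAST_NAME",
            [{ nodeName: "#text", nodeValue: last, childNodes: [] }])
    ]);
}

var people = element("PEOPLE", [
    person("Edward", "Samson"),
    person("Ernestine", "Johnson"),
    person("Betty", "Richardson")
]);

var firstNames = collectByTagName(people, "FIRST_NAME");
var lastNames = collectByTagName(people, "LAST_NAME");

// Index 2 is the third match, just as with item(2) on a node list.
var thirdName = firstNames[2].childNodes[0].nodeValue + " " +
    lastNames[2].childNodes[0].nodeValue;

console.log(thirdName);   // Betty Richardson
```

Because the matches come back in document order, the two lists line up: entry 2 in each list belongs to the same <PERSON> element, as long as every person has exactly one first name and one last name.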

Getting Attribute Values from XML Elements

To see how to read attribute values from an XML document, I'll read the value of the ATTENDANCE attribute of the third person in the XML document meetings.xml:

<?xml version="1.0"?> <MEETINGS>    <MEETING TYPE="informal">        <MEETING_TITLE>XML In The Real World</MEETING_TITLE>        <MEETING_NUMBER>2079</MEETING_NUMBER>        <SUBJECT>XML</SUBJECT>        <DATE>6/1/2002</DATE>        <PEOPLE>            <PERSON ATTENDANCE="present">                <FIRST_NAME>Edward</FIRST_NAME>                <LAST_NAME>Samson</LAST_NAME>            </PERSON>            <PERSON ATTENDANCE="absent">                <FIRST_NAME>Ernestine</FIRST_NAME>                <LAST_NAME>Johnson</LAST_NAME>            </PERSON>            <PERSON ATTENDANCE="present">                <FIRST_NAME>Betty</FIRST_NAME>                <LAST_NAME>Richardson</LAST_NAME>            </PERSON>        </PEOPLE>    </MEETING> </MEETINGS>

How do you read attribute values? You start by getting a named node map object of the attributes of the current element using that element's attributes property. In this case, we want the attributes of the third <PERSON> element, and we get a named node map of those attributes, like this:

<HTML>     <HEAD>          <TITLE>              Reading attribute values from XML documents          </TITLE>          <XML ID="meetingsXML" SRC="meetings.xml"></XML>          <SCRIPT LANGUAGE="JavaScript">              function readXMLDocument()              {                   var xmldoc, meetingsNode, meetingNode, peopleNode                   var first_nameNode, last_nameNode, outputText                   var attributes                   xmldoc= document.all("meetingsXML").XMLDocument                   meetingsNode = xmldoc.documentElement                   meetingNode = meetingsNode.firstChild                   peopleNode = meetingNode.lastChild                   personNode = peopleNode.lastChild                   first_nameNode = personNode.firstChild                   last_nameNode = first_nameNode.nextSibling                   attributes = personNode.attributes                   .                   .                   . </HTML>

Now I can recover the actual node for the ATTENDANCE attribute with the named node map object's getNamedItem method:

<HTML>     <HEAD>          <TITLE>              Reading attribute values from XML documents          </TITLE>          <XML ID="meetingsXML" SRC="meetings.xml"></XML>          <SCRIPT LANGUAGE="JavaScript">              function readXMLDocument()              {                   var xmldoc, meetingsNode, meetingNode, peopleNode                   var first_nameNode, last_nameNode, outputText                   xmldoc= document.all("meetingsXML").XMLDocument                   meetingsNode = xmldoc.documentElement                   meetingNode = meetingsNode.firstChild                   peopleNode = meetingNode.lastChild                   personNode = peopleNode.lastChild                   first_nameNode = personNode.firstChild                   last_nameNode = first_nameNode.nextSibling                   attributes = personNode.attributes                   attendancePerson = attributes.getNamedItem("ATTENDANCE")                   .                   .                   . </HTML>

Now I have a node corresponding to the ATTENDANCE attribute, and I can get the value of that attribute using the value property (attribute nodes don't have internal text nodes):

<HTML>     <HEAD>          <TITLE>              Reading attribute values from XML documents          </TITLE>          <XML ID="meetingsXML" SRC="meetings.xml"></XML>          <SCRIPT LANGUAGE="JavaScript">              function readXMLDocument()              {                   var xmldoc, meetingsNode, meetingNode, peopleNode                   var first_nameNode, last_nameNode, outputText                   var attributes, attendancePerson                   xmldoc= document.all("meetingsXML").XMLDocument                   meetingsNode = xmldoc.documentElement                   meetingNode = meetingsNode.firstChild                   peopleNode = meetingNode.lastChild                   personNode = peopleNode.lastChild                   first_nameNode = personNode.firstChild                   last_nameNode = first_nameNode.nextSibling                   attributes = personNode.attributes                   attendancePerson = attributes.getNamedItem("ATTENDANCE")                   outputText = first_nameNode.firstChild.nodeValue                       + ' ' + last_nameNode.firstChild.nodeValue                       + " is " + attendancePerson.value                   messageDIV.innerHTML=outputText                   .                   .                   . </HTML>

And that's all it takes. Here's what the whole page looks like:

<HTML>     <HEAD>          <TITLE>              Reading attribute values from XML documents          </TITLE>          <XML ID="meetingsXML" SRC="meetings.xml"></XML>          <SCRIPT LANGUAGE="JavaScript">              function readXMLDocument()              {                   var xmldoc, meetingsNode, meetingNode, peopleNode                   var first_nameNode, last_nameNode, outputText                   var attributes, attendancePerson                   xmldoc= document.all("meetingsXML").XMLDocument                   meetingsNode = xmldoc.documentElement                   meetingNode = meetingsNode.firstChild                   peopleNode = meetingNode.lastChild                   personNode = peopleNode.lastChild                   first_nameNode = personNode.firstChild                   last_nameNode = first_nameNode.nextSibling                   attributes = personNode.attributes                   attendancePerson = attributes.getNamedItem("ATTENDANCE")                   outputText = first_nameNode.firstChild.nodeValue                       + ' ' + last_nameNode.firstChild.nodeValue                       + " is " + attendancePerson.value                   messageDIV.innerHTML=outputText              }          </SCRIPT>     </HEAD>     <BODY>         <CENTER>             <H1>                 Reading attribute values from XML documents             </H1>             <INPUT TYPE="BUTTON" VALUE="Get attendance of the third person"                 ONCLICK="readXMLDocument()">             <P>             <DIV ID="messageDIV"></DIV>         </CENTER>     </BODY> </HTML>

Figure 7.4 shows the results; the attendance of the third person is present.

Figure 7.4. Reading attributes in Internet Explorer.

graphics/07fig04.gif
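One thing to watch for when reading attributes is that getNamedItem returns null when the element has no attribute of the given name, so reading the value property directly can fail for optional attributes. Here's a minimal sketch of the guard, using a hypothetical makeNamedNodeMap helper that imitates a named node map with plain JavaScript objects (the ROLE attribute is also made up for the demonstration and does not appear in meetings.xml):

```javascript
// Hypothetical stand-in for a named node map: attribute nodes keyed by name.
function makeNamedNodeMap(pairs) {
    var nodes = [];
    for (var name in pairs) {
        if (Object.prototype.hasOwnProperty.call(pairs, name)) {
            nodes.push({ nodeName: name, nodeType: 2, value: pairs[name] });
        }
    }
    return {
        length: nodes.length,
        item: function (index) { return nodes[index] || null; },
        getNamedItem: function (name) {
            for (var i = 0; i < nodes.length; i++) {
                if (nodes[i].nodeName === name) { return nodes[i]; }
            }
            return null;   // no attribute of that name
        }
    };
}

var personNode = {
    nodeName: "PERSON",
    attributes: makeNamedNodeMap({ ATTENDANCE: "present" })
};

var attendancePerson = personNode.attributes.getNamedItem("ATTENDANCE");
var missing = personNode.attributes.getNamedItem("ROLE");

// Guard against null before reading value when an attribute is optional.
var attendance = attendancePerson ? attendancePerson.value : "(not recorded)";
var role = missing ? missing.value : "(not recorded)";

console.log(attendance);   // present
console.log(role);         // (not recorded)
```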

Parsing XML Documents in Code

Up to this point, I've gone after a specific element in an XML document, but there are other ways of handling documents, too. For example, you can parse (that is, read and interpret) the entire document at once. Here's an example; in this case, I'll work through the entire XML document, meetings.xml, displaying all its nodes in an HTML Web page.

To handle this document, I'll create a function, iterateChildren, that will read and display all the children of a node. As with most parsers, this function is a recursive function, which means that it can call itself to get the children of the current node. To get the name of a node, I will use the nodeName property. To parse an entire document, then, you just have to pass the root node of the entire document to the iterateChildren function, and it will work through the entire document, displaying all the nodes in that document:

<HTML>     <HEAD>         <TITLE>             Parsing an XML Document         </TITLE>         <XML ID="meetingsXML" SRC="meetings.xml"></XML>         <SCRIPT LANGUAGE="JavaScript">             function parseDocument()             {                 documentXML = document.all("meetingsXML").XMLDocument                 resultsDIV.innerHTML = iterateChildren(documentXML, "")             }     .     .     .

Note that I've also passed an empty string ("") to the iterateChildren function. I'll use this string to indent the various levels of the display, to indicate what nodes are nested inside what other nodes. In the iterateChildren function, I start by creating a new text string with the current indentation string (which is either an empty string or a string of spaces), as well as the name of the current node and a <BR> element so that the browser will skip to the next line:

<HTML>     <HEAD>         <TITLE>             Parsing an XML Document         </TITLE>         <XML ID="meetingsXML" SRC="meetings.xml"></XML>         <SCRIPT LANGUAGE="JavaScript">             function parseDocument()             {                 documentXML = document.all("meetingsXML").XMLDocument                 resultsDIV.innerHTML = iterateChildren(documentXML, "")             }             function iterateChildren(theNode, indentSpacing)             {                 var text = indentSpacing + theNode.nodeName + "<BR>"                 .                 .                 .                 return text             }         </SCRIPT>     </HEAD>     .     .     .

I can determine whether the current node has children by checking the childNodes property, which holds a node list of the children of the current node. The length property of this list gives the number of children; if the node does have children, I call iterateChildren on each child node. (Note also that I indent this next level of the display by adding four nonbreaking spaces, which you specify with the &nbsp; entity reference in HTML, to the current indentation string.)

<HTML>     <HEAD>         <TITLE>             Parsing an XML Document         </TITLE>         <XML ID="meetingsXML" SRC="meetings.xml"></XML>         <SCRIPT LANGUAGE="JavaScript">             function parseDocument()             {                 documentXML = document.all("meetingsXML").XMLDocument                 resultsDIV.innerHTML = iterateChildren(documentXML, "")             }             function iterateChildren(theNode, indentSpacing)             {                 var text = indentSpacing + theNode.nodeName + "<BR>"                 if (theNode.childNodes.length > 0) {                     for (var loopIndex = 0; loopIndex <                         theNode.childNodes.length; loopIndex++) {                         text += iterateChildren(theNode.childNodes(loopIndex),                         indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")                     }                 }                 return text             }         </SCRIPT>     </HEAD>     .     .     .

And that's all it takes; here's the whole Web page:

<HTML>     <HEAD>         <TITLE>             Parsing an XML Document         </TITLE>         <XML ID="meetingsXML" SRC="meetings.xml"></XML>         <SCRIPT LANGUAGE="JavaScript">             function parseDocument()             {                 documentXML = document.all("meetingsXML").XMLDocument                 resultsDIV.innerHTML = iterateChildren(documentXML, "")             }             function iterateChildren(theNode, indentSpacing)             {                 var text = indentSpacing + theNode.nodeName + "<BR>"                 if (theNode.childNodes.length > 0) {                     for (var loopIndex = 0; loopIndex <                         theNode.childNodes.length; loopIndex++) {                         text += iterateChildren(theNode.childNodes(loopIndex),                         indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")                     }                 }                 return text             }         </SCRIPT>     </HEAD>     <BODY>         <CENTER>             <H1>                 Parsing an XML Document             </H1>         </CENTER>         <CENTER>             <INPUT TYPE="BUTTON" VALUE="Parse and display the document"                 ONCLICK="parseDocument()">         </CENTER>         <DIV ID="resultsDIV"></DIV>     </BODY> </HTML>

When you click the button in this page, it will read meetings.xml and display its structure as shown in Figure 7.5. You can see all the nodes listed there, indented as they should be. Note also the "meta-names" that Internet Explorer gives to document and text nodes: #document and #text.

Figure 7.5. Parsing a document in Internet Explorer.

graphics/07fig05.gif
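The recursion is the heart of this example, so here's a stripped-down sketch of the same traversal that runs outside the browser. As an assumption for illustration, it works on plain JavaScript objects built by a hypothetical node helper, and it returns an array of indented lines instead of an HTML string:

```javascript
// Hypothetical plain-object node with just a name and a list of children.
function node(nodeName, children) {
    return { nodeName: nodeName, childNodes: children || [] };
}

// Visit theNode, then each of its children with a deeper indentation.
function iterateChildren(theNode, indentSpacing) {
    var lines = [indentSpacing + theNode.nodeName];
    for (var i = 0; i < theNode.childNodes.length; i++) {
        lines = lines.concat(
            iterateChildren(theNode.childNodes[i], indentSpacing + "    "));
    }
    return lines;
}

// A small slice of the structure of meetings.xml.
var documentNode = node("#document", [
    node("MEETINGS", [
        node("MEETING", [
            node("MEETING_TITLE", [node("#text")]),
            node("PEOPLE", [node("PERSON")])
        ])
    ])
]);

var outline = iterateChildren(documentNode, "");
console.log(outline.join("\n"));
```

Returning the lines as an array keeps the traversal separate from the display; to show the outline in a Web page, you could replace the spaces with &nbsp; entity references and join the lines with <BR> elements, as the listing in this section does.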

Parsing an XML Document to Display Node Type and Content

In the previous example, the code listed the names of each node in the meetings.xml document. However, you can do more than that: you can also use the nodeValue property to list the value of each node, and I'll do that in this section. In addition, you can indicate the type of each node that you come across by checking the nodeType property. Here are the possible values for this property:

Value Description
1 Element
2 Attribute
3 Text
4 CDATA section
5 Entity reference
6 Entity
7 Processing instruction
8 Comment
9 Document
10 Document type
11 Document fragment
12 Notation

Here's how I determine the type of a particular node, using a JavaScript switch statement of the kind that we saw in the previous chapter:

<HTML>     <HEAD>         <TITLE>             Parsing an XML document and displaying node type and content         </TITLE>         <XML ID="meetingsXML" SRC="meetings.xml"></XML>         <SCRIPT LANGUAGE="JavaScript">             function parseDocument()             {                 documentXML = document.all("meetingsXML").XMLDocument                 resultsDIV.innerHTML = iterateChildren(documentXML, "")             }             function iterateChildren(theNode, indentSpacing)             {                 var typeData                 switch (theNode.nodeType) {                     case 1:                         typeData = "element"                         break                     case 2:                         typeData = "attribute"                         break                     case 3:                         typeData = "text"                         break                     case 4:                         typeData = "CDATA section"                         break                     case 5:                         typeData = "entity reference"                         break                     case 6:                         typeData = "entity"                         break                     case 7:                         typeData = "processing instruction"                         break                     case 8:                         typeData = "comment"                         break                     case 9:                         typeData = "document"                         break                     case 10:                         typeData = "document type"                         break                     case 11:                         typeData = "document fragment"                         break                     case 12:                         typeData = "notation"                 }                 .                 .                 .
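Because the nodeType values run consecutively from 1 to 12, the same mapping can also be written as an array lookup. This is just an alternative sketch, not the approach this example uses; the NODE_TYPE_NAMES array and the describeNodeType function are names I've made up for illustration:

```javascript
// The array index is the nodeType value; index 0 is unused.
var NODE_TYPE_NAMES = [
    null,
    "element",
    "attribute",
    "text",
    "CDATA section",
    "entity reference",
    "entity",
    "processing instruction",
    "comment",
    "document",
    "document type",
    "document fragment",
    "notation"
];

// Return a readable name for a nodeType value, or "unknown" if out of range.
function describeNodeType(nodeType) {
    return NODE_TYPE_NAMES[nodeType] || "unknown";
}

console.log(describeNodeType(1));    // element
console.log(describeNodeType(9));    // document
console.log(describeNodeType(42));   // unknown
```

The lookup is shorter than the switch statement and handles unexpected values in one place, while the switch statement makes each case explicit.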

If the node has a value, I can display it along with the node's name and type. I check for a value by comparing nodeValue to null, which is what that property holds when a node has no value of its own:

<HTML>
    <HEAD>
        <TITLE>
            Parsing an XML document and displaying node type and content
        </TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument()
            {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }
            function iterateChildren(theNode, indentSpacing)
            {
                var typeData
                switch (theNode.nodeType) {
                    case 1:
                        typeData = "element"
                        break
                    case 2:
                        typeData = "attribute"
                        break
                    case 3:
                        typeData = "text"
                        break
                    case 4:
                        typeData = "CDATA section"
                        break
                    case 5:
                        typeData = "entity reference"
                        break
                    case 6:
                        typeData = "entity"
                        break
                    case 7:
                        typeData = "processing instruction"
                        break
                    case 8:
                        typeData = "comment"
                        break
                    case 9:
                        typeData = "document"
                        break
                    case 10:
                        typeData = "document type"
                        break
                    case 11:
                        typeData = "document fragment"
                        break
                    case 12:
                        typeData = "notation"
                }
                var text
                if (theNode.nodeValue != null) {
                    text = indentSpacing + theNode.nodeName
                    + "&nbsp; = " + theNode.nodeValue
                    + "&nbsp; (Node type: " + typeData
                    + ")<BR>"
                } else {
                    text = indentSpacing + theNode.nodeName
                    + "&nbsp; (Node type: " + typeData
                    + ")<BR>"
                }
                if (theNode.childNodes.length > 0) {
                    for (var loopIndex = 0; loopIndex <
                        theNode.childNodes.length; loopIndex++) {
                        text += iterateChildren(theNode.childNodes(loopIndex),
                        indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")
                    }
                }
                return text
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>
                Parsing an XML document and displaying node type and content
            </H1>
        </CENTER>
        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Parse and display the document"
                ONCLICK="parseDocument()">
        </CENTER>
        <DIV ID="resultsDIV"></DIV>
    </BODY>
</HTML>

And that's all it takes; the results are shown in Figure 7.6. As you see there, the entire document is listed, as is the type of each node. In addition, if the node has a value, that value is displayed.

Figure 7.6. Using JavaScript to display element content and type.

graphics/07fig06.gif
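The switch statement in the listing maps each numeric nodeType value to a label. As a compact alternative, here is a minimal sketch that does the same mapping with a lookup array. The names NODE_TYPE_NAMES, nodeTypeName, and describeNode are my own and are not part of the DOM, and the "unknown" fallback for values outside 1 through 12 is an assumption.

```javascript
// Labels indexed by DOM nodeType value (1 through 12); index 0 is unused.
var NODE_TYPE_NAMES = [
    null, "element", "attribute", "text", "CDATA section",
    "entity reference", "entity", "processing instruction", "comment",
    "document", "document type", "document fragment", "notation"
];

// Hypothetical helper: returns the label for a nodeType value,
// or "unknown" (an assumed fallback) when the value is not in the table.
function nodeTypeName(nodeType) {
    return NODE_TYPE_NAMES[nodeType] || "unknown";
}

// Hypothetical helper: builds one line of the listing for a single node.
// The value is included only when nodeValue is not null.
function describeNode(node, indentSpacing) {
    var text = indentSpacing + node.nodeName;
    if (node.nodeValue != null) {
        text += "&nbsp; = " + node.nodeValue;
    }
    return text + "&nbsp; (Node type: " + nodeTypeName(node.nodeType) + ")<BR>";
}
```

Because describeNode reads only the nodeName, nodeValue, and nodeType properties, it can be tried outside the browser with a plain object standing in for a node.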

This example listed the nodes of a document. However, some of the elements in meetings.xml have attributes as well. So how do you handle attributes?

Parsing an XML Document to Display Attribute Values

You can get access to an element's attributes with the element's attributes property. You can get attribute names and values with the name and value properties of attribute objects; I used the value property earlier in this chapter. It's also worth noting that because attributes are themselves nodes, you can use the nodeName and nodeValue properties to do the same thing; I'll do that in this example to show how it works.

Here's how I augment the previous example, looping over all the attributes that an element has and listing them. (Note that you could use the name and value properties here instead of nodeName and nodeValue.)

<HTML>
    <HEAD>
        <TITLE>
            Parsing XML to read attributes
        </TITLE>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument()
            {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }
            function iterateChildren(theNode, indentSpacing)
            {
                var typeData
                switch (theNode.nodeType) {
                    case 1:
                        typeData = "element"
                        break
                    case 2:
                        typeData = "attribute"
                        break
                    case 3:
                        typeData = "text"
                        break
                    case 4:
                        typeData = "CDATA section"
                        break
                    case 5:
                        typeData = "entity reference"
                        break
                    case 6:
                        typeData = "entity"
                        break
                    case 7:
                        typeData = "processing instruction"
                        break
                    case 8:
                        typeData = "comment"
                        break
                    case 9:
                        typeData = "document"
                        break
                    case 10:
                        typeData = "document type"
                        break
                    case 11:
                        typeData = "document fragment"
                        break
                    case 12:
                        typeData = "notation"
                }
                var text
                if (theNode.nodeValue != null) {
                    text = indentSpacing + theNode.nodeName
                    + "&nbsp; = " + theNode.nodeValue
                    + "&nbsp; (Node type: " + typeData
                    + ")"
                } else {
                    text = indentSpacing + theNode.nodeName
                    + "&nbsp; (Node type: " + typeData
                    + ")"
                }
                if (theNode.attributes != null) {
                    if (theNode.attributes.length > 0) {
                        for (var loopIndex = 0; loopIndex <
                            theNode.attributes.length; loopIndex++) {
                            text += " (Attribute: " +
                                theNode.attributes(loopIndex).nodeName +
                                " = \"" +
                                theNode.attributes(loopIndex).nodeValue
                                + "\")"
                        }
                    }
                }
                text += "<BR>"
                if (theNode.childNodes.length > 0) {
                    for (var loopIndex = 0; loopIndex <
                        theNode.childNodes.length; loopIndex++) {
                        text += iterateChildren(theNode.childNodes(loopIndex),
                        indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")
                    }
                }
                return text
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>
                Parsing XML to read attributes
            </H1>
        </CENTER>
        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Parse and display the document"
                ONCLICK="parseDocument()">
        </CENTER>
        <DIV ID="resultsDIV"></DIV>
    </BODY>
</HTML>

You can see the results of this page in Figure 7.7; both elements and attributes are listed in that figure.

Figure 7.7. Listing elements and attributes in Internet Explorer.

graphics/07fig07.gif
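The attribute loop in that listing can also be isolated in a small helper, which makes it easier to see on its own. The sketch below is an illustration, not the listing's code: describeAttributes is a name I made up, and it reaches each attribute through the item method that the DOM defines for attribute collections, rather than the attributes(loopIndex) call syntax that the listing relies on.

```javascript
// Hypothetical helper: returns the " (Attribute: name = "value")" text
// for every attribute of a node, or an empty string if there are none.
function describeAttributes(node) {
    var text = "";
    // Non-element nodes (text nodes, for example) have a null attributes property.
    if (node.attributes != null) {
        for (var i = 0; i < node.attributes.length; i++) {
            // Attributes are nodes too, so nodeName and nodeValue work here.
            var attribute = node.attributes.item(i);
            text += " (Attribute: " + attribute.nodeName +
                " = \"" + attribute.nodeValue + "\")";
        }
    }
    return text;
}
```

In iterateChildren, the result of a helper like this would be appended to text just before the "<BR>" is added.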

Handling Events While Loading XML Documents

Internet Explorer also lets you track the progress of an XML document as it's being loaded. In particular, you can use the onreadystatechange and ondataavailable events to watch what's happening. The readyState property in the onreadystatechange event informs you about the current status of a document. Here's an example showing how this works:

<HTML>
    <HEAD>
        <TITLE>
            Handling document loading events
        </TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            var xmldoc
            function loadDocument()
            {
                xmldoc = new ActiveXObject("microsoft.XMLDOM")
                xmldoc.ondataavailable = dataAvailableHandler
                xmldoc.onreadystatechange = stateChangeHandler
                xmldoc.load('meetings.xml')
            }
            function dataAvailableHandler()
            {
                messageDIV.innerHTML += "Status: data available.<BR>"
            }
            function stateChangeHandler()
            {
                switch (xmldoc.readyState)
                {
                    case 1:
                        messageDIV.innerHTML +=
                            "Status: data uninitialized.<BR>"
                        break
                    case 2:
                        messageDIV.innerHTML += "Status: data loading.<BR>"
                        break
                    case 3:
                        messageDIV.innerHTML += "Status: data loaded.<BR>"
                        break
                    case 4:
                        messageDIV.innerHTML +=
                            "Status: data loading complete.<BR>"
                        if (xmldoc.parseError.errorCode != 0) {
                            messageDIV.innerHTML += "Status: error.<BR>"
                        }
                        else {
                            messageDIV.innerHTML +=
                                "Status: data loaded alright.<BR>"
                        }
                        break
                }
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>
                Handling document loading events
            </H1>
        </CENTER>
        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Load the document"
                ONCLICK="loadDocument()">
        </CENTER>
        <DIV ID="messageDIV"></DIV>
    </BODY>
</HTML>

The results of this Web page appear in Figure 7.8, and you can see the progress that Internet Explorer made in loading a document in that page.

Figure 7.8. Monitoring XML loading events in Internet Explorer.

graphics/07fig08.gif

Validating XML Documents with Internet Explorer

By default, Internet Explorer actually does validate XML documents as it loads them, but you won't see any validation errors unless you check the parseError object.

Turning Validation On and Off

You can turn document validation on or off with the document object's validateOnParse property, which is set to true by default.
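As a minimal sketch of setting that property, the helper below switches validation on or off before loading a document. It assumes an MSXML document object such as the one created with new ActiveXObject("microsoft.XMLDOM") in this chapter's listings. The function name loadXml is hypothetical, and loading synchronously (async set to false) is my choice here so that the return value of load can be checked right away.

```javascript
// Hypothetical helper: loads url into an MSXML document object with
// validation switched on or off, and reports whether the load succeeded.
function loadXml(xmldoc, url, validate) {
    xmldoc.async = false;               // wait for the load to finish before returning
    xmldoc.validateOnParse = validate;  // must be set before load is called
    return xmldoc.load(url);            // true on success, false on a parse error
}

// Usage in Internet Explorer (assumes ActiveX is available):
//
//     var xmldoc = new ActiveXObject("microsoft.XMLDOM")
//     if (!loadXml(xmldoc, "meetings.xml", false)) {
//         alert(xmldoc.parseError.reason)
//     }
```

With validation turned off, the parser still reports documents that are not well-formed; only validity errors are skipped.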

Here's an example; in this case, I'll load this XML document, error.xml. This document has a validation problem because the <NAME> element is declared to contain only a <FIRST_NAME> element, not a <LAST_NAME> element:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

Here's what the Web page that reads in and checks this document looks like. Here I'm using the parseError object's errorCode, url, line, linepos, srcText, and reason properties to track down the error:

<HTML>
    <HEAD>
        <TITLE>
            Validating documents
        </TITLE>
        <SCRIPT LANGUAGE="JavaScript">
            var xmldoc
            function loadDocument()
            {
                xmldoc = new ActiveXObject("microsoft.XMLDOM")
                xmldoc.onreadystatechange = stateChangeHandler
                xmldoc.ondataavailable = dataAvailableHandler
                xmldoc.load('error.xml')
            }
            function dataAvailableHandler()
            {
                messageDIV.innerHTML += "Status: data available.<BR>"
            }
            function stateChangeHandler()
            {
                if(xmldoc.readyState == 4){
                    var errorString = xmldoc.parseError.srcText
                    errorString =
                    xmldoc.parseError.srcText.replace(/\</g, "&lt;")
                    errorString = errorString.replace(/\>/g, "&gt;")
                    if (xmldoc.parseError.errorCode != 0) {
                        messageDIV.innerHTML = "Problem in " +
                        xmldoc.parseError.url +
                        " line " + xmldoc.parseError.line +
                        " position " + xmldoc.parseError.linepos +
                        ":<BR>Error source: " + errorString +
                        "<BR>" + xmldoc.parseError.reason +
                        "<BR>" +  "Error: " +
                        xmldoc.parseError.errorCode
                    }
                    else {
                        messageDIV.innerHTML =
                        "Status: document loaded alright.<BR>"
                    }
                }
            }
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>
                Validating documents
            </H1>
        </CENTER>
        <DIV ID="messageDIV"></DIV>
        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Load the document"
                ONCLICK="loadDocument()">
        </CENTER>
    </BODY>
</HTML>

Figure 7.9 shows the results of this Web page, where the validation error is reported.

Figure 7.9. Validating XML documents in Internet Explorer.

graphics/07fig09.gif

You might note that the parseError object's srcText property holds the error-causing text from the XML document, which the code copies into the errorString variable. Because that text is <LAST_NAME>Smith</LAST_NAME>, there's a problem: the browser will try to interpret it as markup. To avoid that, I use the JavaScript String object's replace method to replace < with &lt; and > with &gt;. (You pass a regular expression to the replace method; to change all < characters to &lt;, the regular expression that you use is /\</g. To change all > characters to &gt;, you match to the regular expression /\>/g.)
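The escaping step can be pulled out into a small helper so that it can be reused and tried on its own. This is a sketch under a few assumptions: escapeMarkup and formatParseError are names I made up, the message layout simply copies the listing, and escaping & as well (done first, so the ampersands in &lt; and &gt; are not escaped a second time) is an addition that the listing does not make.

```javascript
// Hypothetical helper: makes text safe to assign to innerHTML by
// replacing the characters that the browser would treat as markup.
function escapeMarkup(text) {
    return String(text)
        .replace(/&/g, "&amp;")   // first, so later entities are left alone
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Hypothetical helper: builds the status message from an object that has
// the parseError properties used in the listing (errorCode, url, line,
// linepos, srcText, and reason).
function formatParseError(parseError) {
    if (parseError.errorCode == 0) {
        return "Status: document loaded alright.<BR>";
    }
    return "Problem in " + parseError.url +
        " line " + parseError.line +
        " position " + parseError.linepos +
        ":<BR>Error source: " + escapeMarkup(parseError.srcText) +
        "<BR>" + parseError.reason +
        "<BR>Error: " + parseError.errorCode;
}
```

Because formatParseError only reads properties, it works with xmldoc.parseError in Internet Explorer and with a plain object elsewhere.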

Scripting XML Elements

Internet Explorer provides limited support for scripting XML elements. For example, I can add an onclick event attribute to an XML element named <xlink> in an XHTML document. (We'll take a look at XLinks and XHTML later in this book; see Chapters 15, 16, and 17.)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="xlink.css"?>
<!DOCTYPE html SYSTEM "t3.dtd">
<html>
    <head>
    </head>
    <body>
    Want to check out <xlink xml:link = "simple" inline="false"
    href = "http://www.w3c.org"
    onclick="location.href='http://www.w3c.org'">W3C</xlink>?
    </body>
</html>

I can specify in a style sheet, xlink.css, that <xlink> elements should be displayed in blue and underlined, as a hyperlink might appear, and I can also specify that the mouse cursor should change to a hand when over this element, just as it would for an HTML hyperlink:

xlink {color: #0000FF; text-decoration: underline; cursor: hand}

The results appear in Figure 7.10. When the user clicks the <xlink> element, Internet Explorer executes the code in the onclick event attribute. In this case, that navigates the browser to http://www.w3c.org. As you can see, you can script XML elements in Internet Explorer, adding event attributes such as onclick.

Figure 7.10. Creating a "hyperlink" in an XML document in Internet Explorer.

graphics/07fig10.gif

Editing XML Documents with Internet Explorer

You can alter the contents of an XML document in Internet Explorer. To do this, you use methods such as createElement, insertBefore, createTextNode, and appendChild.

As an example, I'll alter the document meetings.xml by inserting a new element, <MEETING_CHAIR>, like this:

<?xml version="1.0"?>
<MEETINGS>
   <MEETING TYPE="informal">
       <MEETING_CHAIR>Ted Bond</MEETING_CHAIR>
       <MEETING_TITLE>XML In The Real World</MEETING_TITLE>
       <MEETING_NUMBER>2079</MEETING_NUMBER>
       <SUBJECT>XML</SUBJECT>
       <DATE>6/1/2002</DATE>
       <PEOPLE>
           <PERSON ATTENDANCE="present">
               <FIRST_NAME>Edward</FIRST_NAME>
               <LAST_NAME>Samson</LAST_NAME>
           </PERSON>
            .
            .
            .

I begin by creating the new node, corresponding to the <MEETING_CHAIR> element, and inserting it into the document with the insertBefore method:

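<HTML>
    <HEAD>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
        <!--
            function alterDocument()
            {
                var xmldoc, rootNode, meetingsNode, meetingNode, createdNode, createdTextNode
                xmldoc = document.all.meetingsXML
                // rootNode is the document element, <MEETINGS>
                rootNode = xmldoc.documentElement
                // Despite its name, meetingsNode is the first <MEETING> element
                meetingsNode = rootNode.firstChild
                // meetingNode is that element's first child, <MEETING_TITLE>
                meetingNode = meetingsNode.firstChild
                createdNode = xmldoc.createElement("MEETING_CHAIR")
                // Insert the new element in <MEETING>, ahead of <MEETING_TITLE>
                createdNode = meetingsNode.insertBefore(createdNode, meetingNode)
                .
                .
                .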

Now I will create the text node inside this new element. The text node will hold the text "Ted Bond", and I'll create it with the createTextNode method and append it to the <MEETING_CHAIR> element with the appendChild method:

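<HTML>
    <HEAD>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
        <!--
            function alterDocument()
            {
                var xmldoc, rootNode, meetingsNode, meetingNode, createdNode, createdTextNode
                xmldoc = document.all.meetingsXML
                // rootNode is the document element, <MEETINGS>
                rootNode = xmldoc.documentElement
                // Despite its name, meetingsNode is the first <MEETING> element
                meetingsNode = rootNode.firstChild
                // meetingNode is that element's first child, <MEETING_TITLE>
                meetingNode = meetingsNode.firstChild
                createdNode = xmldoc.createElement("MEETING_CHAIR")
                // Insert the new element in <MEETING>, ahead of <MEETING_TITLE>
                createdNode = meetingsNode.insertBefore(createdNode, meetingNode)
                // Give the new element its text content
                createdTextNode = xmldoc.createTextNode("Ted Bond")
                createdNode.appendChild(createdTextNode)
                .
                .
                .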
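The create, fill, and insert steps used so far can be gathered into one reusable function. The sketch below is only an illustration: insertElementWithText is a hypothetical name, and it assumes that doc provides createElement and createTextNode and that parent provides insertBefore, as the objects in the listing do.

```javascript
// Hypothetical helper: creates <tagName>text</tagName> and inserts it
// into parent, immediately before referenceNode. Returns the new element.
function insertElementWithText(doc, parent, referenceNode, tagName, text) {
    var element = doc.createElement(tagName);
    // Fill the element before attaching it, so the document never
    // contains a half-built node.
    element.appendChild(doc.createTextNode(text));
    return parent.insertBefore(element, referenceNode);
}

// Usage with the variables from the listing:
//
//     insertElementWithText(xmldoc, meetingsNode, meetingNode,
//         "MEETING_CHAIR", "Ted Bond")
```

The only difference from the listing is the order of operations: here the text node is appended before the element is inserted, which gives the same final document.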

Now I've altered the document, but at this point, it exists only inside the xmldoc object. How do I display it in the browser? The DOMDocument object actually has a save method that enables you to save the document to a new file like this: xmldoc.save("new.xml"). However, you can't use that method without changing the security settings in Internet Explorer; by default, browsers aren't supposed to be capable of writing files on the host machine.

I'll take a different approach. In this case, I'll store the XML document's text in a hidden control in an HTML form (a hidden control simply holds text invisible to the user), and send the data in that form to a server-side Active Server Pages (ASP) script. That script will just echo the document back to the browser, which, in turn, will display it. Here's the ASP script, echo.asp, where I set the MIME type of this document to "text/xml", add an <?xml?> declaration, and echo the XML data back to Internet Explorer. (ASP scripts such as this one are beyond the scope of this book, but we'll take a brief look at them in Chapter 20, "WML, ASP, JSP, Servlets, and Perl.")

<%@ LANGUAGE="VBSCRIPT" %>
<%
Response.ContentType = "text/xml"
Response.Write "<?xml version=" & Chr(34) & "1.0" & Chr(34) & "?>" & Chr(13) & Chr(10)
Response.Write Request("data")
%>

I have an ASP server on my host machine, so the URI that I'll send the XML document to is http://default/db/echo.asp. I do that by using the HTML form's submit method (which works exactly as if the user had clicked a Submit button in the form) after loading the XML document into the page's hidden control:

<HTML>
    <HEAD>
        <XML ID="meetingsXML" SRC="meetings.xml"></XML>
        <SCRIPT LANGUAGE="JavaScript">
        <!--
            function alterDocument()
            {
                var xmldoc, rootNode, meetingsNode, meetingNode, createdNode, createdTextNode
                xmldoc = document.all.meetingsXML
                // rootNode is the document element, <MEETINGS>
                rootNode = xmldoc.documentElement
                // Despite its name, meetingsNode is the first <MEETING> element
                meetingsNode = rootNode.firstChild
                // meetingNode is that element's first child, <MEETING_TITLE>
                meetingNode = meetingsNode.firstChild
                createdNode = xmldoc.createElement("MEETING_CHAIR")
                // Insert the new element in <MEETING>, ahead of <MEETING_TITLE>
                createdNode = meetingsNode.insertBefore(createdNode, meetingNode)
                // Give the new element its text content
                createdTextNode = xmldoc.createTextNode("Ted Bond")
                createdNode.appendChild(createdTextNode)
                // Copy the altered document's text into the hidden control and post it
                document.all.data.value = meetingsXML.documentElement.xml
                document.form1.submit()
            }
        //-->
        </SCRIPT>
    </HEAD>
    <BODY>
        <CENTER>
            <FORM NAME="form1" ACTION="http://default/db/echo.asp" METHOD="POST">
            <INPUT TYPE="HIDDEN" NAME="data">
            <INPUT TYPE="BUTTON" VALUE="Alter the document" onclick="alterDocument()">
            </FORM>
        </CENTER>
    </BODY>
</HTML>

Now when the user clicks the button with the caption Alter the document, the code in this page alters the XML document and sends it to the server. The ASP script on the server echoes the XML document back to the browser, which displays it, as you see in Figure 7.11. You can see the new <MEETING_CHAIR> element in that figure.

Figure 7.11. Altering an XML document in Internet Explorer.

graphics/07fig11.gif

We've put JavaScript to work in this chapter, parsing and accessing XML documents. In the next chapter, I'm going to put JavaScript to work treating XML data as database objects.



Inside XML
Real World XML (2nd Edition)
ISBN: 0735712867
EAN: 2147483647
Year: 2005
Pages: 23
Authors: Steve Holzner
