What Is DOM? | Professional XML (Programmer to Programmer)

Working with XML documents involves some overhead, because extracting data from the tags in an XML document can be arduous. A parser takes care of checking a document's validity and extracting the data from the XML syntax. The XML Document Object Model (DOM) specification, which has been standardized by the W3C, provides a layer of abstraction between the application and the XML document. This layer of abstraction comes in the form of interfaces that have methods and properties to manipulate an XML document. In other words, when using the DOM, you don't need to worry about the XML syntax directly. For example, the getAttribute(…) and setAttribute(…) methods enable you to manipulate the attributes on an element in an elegant fashion. Legacy systems can also implement the DOM interfaces on top of a legacy database, so that legacy data can be accessed as if it were natively stored in XML.

Why Client-Side XML Processing?

At first glance, it seems pretty silly to process XML data on the client side when powerful server-side technologies such as ASP.NET, Java, and Perl exist to handle processing on the back end. But if you have been around the world of Web development for any length of time, you will know that some circumstances call for handling things on the server side, while others suit processing on the client side.

Processing data on the client side can help relieve server load and give the visitor a better, more responsive experience on your site. For example, the use of server-side programming to perform a task as simple as sorting a column in a table, or formatting some data, is unnecessary; it also forces the users to wait longer than they should have to for such trivial operations. Client-side processing of XML data can be a big help in situations like this.

XML DOM Object Model

The Document Object Model (DOM) is a W3C standard that allows you to put together a document dynamically, and to navigate and manipulate its structure and content. To work with the DOM, you use an XML parser to load XML documents into memory. After the documents are loaded, you can manipulate the information in them through the DOM interfaces.

You can visualize the DOM's structure as a tree of nodes. The root of the tree is a Document node, which has one or more child nodes that branch off from this trunk. Each of these child nodes may in turn contain child nodes of its own, and so on. For example, consider the XML file shown in Listing 12-1.

Listing 12-1: A sample XML file

      <?xml version="1.0" encoding="utf-8"?>
      <Products>
        <Product Category="Helmets">
          <ProductID>707</ProductID>
          <Name>Sport-100 Helmet, Red</Name>
          <ProductNumber>HL-U509-R</ProductNumber>
        </Product>
        <Product Category="Socks">
          <ProductID>709</ProductID>
          <Name>Mountain Bike Socks, M</Name>
          <ProductNumber>SO-B909-M</ProductNumber>
        </Product>
        <Product Category="Socks">
          <ProductID>710</ProductID>
          <Name>Mountain Bike Socks, L</Name>
          <ProductNumber>SO-B909-L</ProductNumber>
        </Product>
        <Product Category="Caps">
          <ProductID>712</ProductID>
          <Name>AWC Logo Cap</Name>
          <ProductNumber>CA-1098</ProductNumber>
        </Product>
      </Products>

The root element of this XML document is <Products>, which contains an arbitrary number of <Product> elements. Each <Product> element, in turn, contains <ProductID>, <Name>, and <ProductNumber> elements. In addition, each <Product> element has a Category attribute.

If you load this XML file into DOM, DOM loads the XML file into a tree-like structure with the elements, attributes, and text defined as nodes. Some of these node objects have child objects or child nodes. Nodes with no child object are called leaf nodes. Figure 12-1 provides a visual representation of the Products.xml file.

Figure 12-1
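The tree structure described above can be sketched with plain JavaScript objects. This is only an illustration: the element() and text() helpers and the collectLeaves() function are hypothetical stand-ins rather than real DOM objects, and attribute nodes are left out to keep the sketch short.

```javascript
// Hypothetical stand-ins for DOM nodes: plain objects, not a real parser tree.
function element(name, children) {
  return { nodeName: name, childNodes: children || [] };
}
function text(value) {
  return { nodeName: "#text", nodeValue: value, childNodes: [] };
}

// The first Product element from Listing 12-1, built by hand.
var tree = element("Products", [
  element("Product", [
    element("ProductID", [text("707")]),
    element("Name", [text("Sport-100 Helmet, Red")]),
    element("ProductNumber", [text("HL-U509-R")])
  ])
]);

// Walk the tree depth-first and collect the leaf nodes (nodes without children).
function collectLeaves(node, leaves) {
  leaves = leaves || [];
  if (node.childNodes.length === 0) {
    leaves.push(node);
  } else {
    for (var i = 0; i < node.childNodes.length; i++) {
      collectLeaves(node.childNodes[i], leaves);
    }
  }
  return leaves;
}

var leafValues = collectLeaves(tree).map(function (n) { return n.nodeValue; });
console.log(leafValues.join(" | "));
```

In this model the text nodes are the leaves, and every element sits above at least one of them.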

According to the W3C recommendations, DOM Level 1 allows navigation within an HTML or XML document and the manipulation of its content. DOM Level 2 extends Level 1 with a number of features such as XML Namespace support, filtered views, ranges, and events. DOM Level 3 builds on Level 2 and continues to allow programs to dynamically access and update the content, structure, and style of documents. The following table describes the main interfaces that form the DOM Level 3 Core module.

Interface	Description
Attr	The Attr interface represents an attribute in an `Element` object
CDATASection	CDATA sections escape blocks of text containing characters that would otherwise be regarded as markup
CharacterData	The CharacterData interface extends Node with a set of attributes and methods for accessing character data in the DOM
Comment	This interface inherits from CharacterData and represents the content of a comment (in other words, all the characters between the starting `<!--` and ending `-->`)
Document	The Document interface represents the entire Hypertext Markup Language (HTML) or XML document
DocumentFragment	DocumentFragment is a light-weight or minimal Document object
DocumentType	Each Document has a `doctype` attribute whose value is either null or a DocumentType object
DOMImplementation	The DOMImplementation interface provides a number of methods for performing operations that are independent of any particular instance of the DOM
Element	The Element interface represents an element in an HTML or XML document
Entity	This interface represents an entity, either parsed or unparsed, in an XML document
EntityReference	EntityReference objects may be inserted into the structure model when an entity reference is in the source document or when the user wants to insert an entity reference
NamedNodeMap	Objects implementing the NamedNodeMap interface represent collections of nodes that can be accessed by name
Node	The Node interface is the primary data type for the entire DOM
NodeList	The NodeList interface provides the abstraction of an ordered collection of nodes, without defining or constraining how this collection is implemented
Notation	This interface represents a notation declared in the document type definition (DTD)
ProcessingInstruction	The ProcessingInstruction interface represents a PI, which is used in XML as a way to keep processor-specific information in the text of the document
Text	The Text interface inherits from CharacterData and represents the textual content (termed character data in XML) of an Element or Attr

Note that every interface that represents a node in the DOM tree extends the Node interface. The next few sections explore some of the important interfaces and the steps involved in using their methods and properties.

Using the Document Interface

The Document interface is the uppermost object in the XML DOM hierarchy. It implements all the basic DOM methods required to work with an XML document. It also provides methods that help you navigate, query, and modify the content and the structure of an XML document. Some of the important methods of Microsoft's implementation of the Document object are described in the following table:

Method	Description
`createElement`	Takes an element name as a parameter and creates an element node by using the name. You cannot create namespace-qualified elements using the `createElement()` method. To create namespace-qualified elements in the Microsoft implementation, use the `createNode()` method with a namespace URI (the W3C DOM Level 2 equivalent is `createElementNS()`)
`createAttribute`	Takes an attribute name as a parameter and creates an attribute node with that name
`createTextNode`	Takes a string as a parameter and creates a text node containing the specified string
`createNode`	Takes three parameters. The `type` parameter is a variant that can be either a string or an integer. The second parameter is a string that represents the name of the node to be created. The third parameter is a string that represents the namespace-URI
`createComment`	Takes a string as a parameter and creates a comment node containing this string
`getElementsByTagName`	Takes a string as a parameter. The string represents the element to be searched. This method returns an instance of the `IXMLDOMNodeList` object, which contains the collection of nodes with the specified element name. You can use the node list to navigate and manipulate the values stored in the named elements
`load`	Takes a string that represents the URL or the path of an XML document and loads the specified document into the DOMDocument object
`loadXML`	Takes a string as a parameter, which contains well-formed XML code or an entire XML document, to load it in the DOMDocument object
`transformNode`	Takes a style sheet object as a parameter, processes the node by applying the corresponding style sheet template on the XML document, and returns the result of transformation
`save`	Takes an object as a parameter. This object can be either DOMDocument or a filename. The `save()` method saves the DOMDocument object at the specified destination

In addition to the preceding methods, the Microsoft implementation of the Document interface also exposes the following properties that can be used to manipulate the information contained in the Document object.

Property	Description
`async`	Specifies whether an asynchronous download is permitted. If you set this property to `true`, the script executes while the XML document is still being loaded. If this property is set to `false`, the script waits until the XML document is loaded before it starts processing the content.
`childNodes`	Returns a list of child nodes that belong to a parent node. The value of this property is of the type `IXMLDOMNodeList`.
`documentElement`	Contains the root element of the XML document represented by the DOMDocument object.
`firstChild`	Returns the first child node of a parent element. This is a read-only property.
`lastChild`	Returns the last child of a parent node.
`parseError`	Returns an `IXMLDOMParseError` object that contains information about the most recently generated error.
`readyState`	Returns the state of the XML document. It indicates whether the document has been loaded completely.
`xml`	Returns an XML representation of a node and its child nodes.
`validateOnParse`	Specifies whether the parser should validate the XML document when parsing.

Now that you have had a brief look at the properties and methods of the Document interface, take a look at an example that shows how to load an XML document through the Document interface.

Loading an XML Document

To traverse an XML document in Internet Explorer, you first have to instantiate the Microsoft XMLDOM parser. In Internet Explorer 5.0 and above, you can instantiate the parser using JavaScript:

      <script type="text/javascript">
        function loadDocument()
        {
          var doc = new ActiveXObject("Microsoft.XMLDOM");
          ...
        }
      </script>

Note that the previous XML parser is implemented as an ActiveX object and works only in Internet Explorer.
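Because the ActiveX parser is specific to Internet Explorer, a page that must run elsewhere needs to choose a parser at run time. The following is a minimal sketch of that idea, not part of the chapter's own code: parseXml is a hypothetical helper name, and the first branch relies on the standard DOMParser object that other browsers provide.

```javascript
// Sketch: pick an XML parser by feature detection.
// parseXml is a hypothetical helper name used only for this illustration.
function parseXml(xmlText) {
  if (typeof DOMParser !== "undefined") {
    // Standards-based browsers expose DOMParser.
    return new DOMParser().parseFromString(xmlText, "application/xml");
  }
  if (typeof ActiveXObject !== "undefined") {
    // Older Internet Explorer: the ActiveX parser used in this chapter.
    var doc = new ActiveXObject("Microsoft.XMLDOM");
    doc.async = false;
    doc.loadXML(xmlText);
    return doc;
  }
  // Neither parser is available in this environment.
  return null;
}

var parsed = parseXml("<Products><Product/></Products>");
```

Keep in mind that a document produced by DOMParser does not offer the Microsoft-only members discussed in this chapter, such as text and xml; the standard textContent property is the usual substitute for text.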

After the parser is instantiated, you can load a file into it using a series of commands. For example, to load the Products.xml file in the parser:

      <script type="text/javascript">
        function loadDocument()
        {
          var doc = new ActiveXObject("Microsoft.XMLDOM");
          doc.async = false;
          doc.load("Products.xml");
          ...
        }
      </script>

Note that you set the async property of the XMLDOM object to false to ensure that the parser will wait until the document is fully loaded before it does anything else. Next, you invoke the load() method to load the contents of the Products.xml file into the parser.

At times you might want to load the XML from a string variable and then feed it directly to the parser. To do this, you must use the loadXML() method instead of the load() method, as in the following example:

      <script type="text/javascript">
        function loadDocument()
        {
          var xmlContents = '<?xml version="1.0" encoding="iso-8859-1"?>';
          xmlContents += '<Products><Product>';
          xmlContents += '<ProductID>707</ProductID>';
          xmlContents += '<Name>Sport-100 Helmet, Red</Name>';
          xmlContents += '<ProductNumber>HL-U509-R</ProductNumber>';
          xmlContents += '</Product></Products>';
          var doc = new ActiveXObject("Microsoft.XMLDOM");
          doc.async = false;
          doc.loadXML(xmlContents);
          ...
        }
      </script>

The loadXML() method can be extremely useful in scenarios where you are retrieving XML data from the server side dynamically as a string variable. You can take that XML and load it onto an XML DOM object using the loadXML() method for subsequent processing.
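Whichever loading method you use, it is worth checking the parseError property (listed in the property table earlier) before working with the document. The sketch below assumes the errorCode, reason, and line properties of the parse error object; describeParseError is a hypothetical helper name, and the two objects at the end are stand-ins so that the sketch also runs outside Internet Explorer.

```javascript
// Returns null when the document parsed cleanly, otherwise a readable message.
// describeParseError is a hypothetical helper, not part of the parser's API.
function describeParseError(doc) {
  var err = doc.parseError;
  if (err.errorCode === 0) {
    return null; // an error code of 0 means no error was recorded
  }
  return "Parse error on line " + err.line + ": " + err.reason;
}

// Hypothetical stand-ins that mimic the shape of a loaded document.
var goodDoc = { parseError: { errorCode: 0, reason: "", line: 0 } };
var badDoc = { parseError: { errorCode: -1, reason: "illustrative message", line: 3 } };

console.log(describeParseError(goodDoc));
console.log(describeParseError(badDoc));
```

In a real page you would pass the doc object returned by the ActiveX parser right after calling load() or loadXML().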

Using the readyState Property

To check whether a document has been loaded completely, use the readyState property. This property stores a numeric value, which represents one of the following states:

- LOADING (1): The loading process is in progress, and data is not yet parsed.
- LOADED (2): The data has been read and parsed, but the object model is not ready.
- INTERACTIVE (3): The object model is available with a partially retrieved data set and is in read-only mode.
- COMPLETED (4): The loading process is complete.

To determine whether the XML document is completely loaded and display a message using JavaScript, use the code:

      if (doc.readyState == 4)
      {
        alert("Document is completely loaded");
      }
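To avoid scattering the numeric codes through a script, the states listed above can be wrapped in a small lookup. This is a sketch only: READY_STATE_NAMES, describeReadyState, and isFullyLoaded are hypothetical names introduced for illustration, and the object at the end is a stand-in for a real document.

```javascript
// Lookup table for the four states listed above.
var READY_STATE_NAMES = {
  1: "LOADING",
  2: "LOADED",
  3: "INTERACTIVE",
  4: "COMPLETED"
};

// Translate a numeric readyState into its name; anything else is reported as UNKNOWN.
function describeReadyState(state) {
  return READY_STATE_NAMES[state] || "UNKNOWN";
}

// True only when the document reports the COMPLETED state.
function isFullyLoaded(doc) {
  return doc.readyState === 4;
}

// Hypothetical stand-in for a document that has finished loading.
var loadedDoc = { readyState: 4 };
console.log(describeReadyState(loadedDoc.readyState) + ", fully loaded: " + isFullyLoaded(loadedDoc));
```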

Using the Element Interface

The Element interface represents each element in the XML document. It supports the manipulation of elements and the attributes associated with the elements. If the element node contains text, this text is represented in a text node. The Element interface helps manage attributes because this is the only node type that has attributes. This interface has only one read-only property, tagName, which retrieves the tag name of the element as a string.

An element is also a Node object and inherits different properties of the Node object. The methods of the Element interface are shown in the following table:

Method	Description
`getAttribute`	Returns the string containing the value of the specified attribute
`getAttributeNode`	Returns the specified attribute node as an `Attr` object
`getElementsByTagName`	Returns the NodeList of all descendant elements with a given tag name
`removeAttribute`	Removes the specified attribute; if the attribute has a default value in the DTD, it is reset to that default
`removeAttributeNode`	Removes the specified attribute node
`setAttribute`	Creates a new attribute and sets the value for the attribute. If an attribute is present, changes the value for it
`setAttributeNode`	Inserts a new specified attribute to the element, replacing any existing attribute

As mentioned previously, the getElementsByTagName() method retrieves all elements of the specified name that occur under the node on which the method is called. For example, to print the value contained in the Name element of the first product, you could write the following code:

      document.write(doc.getElementsByTagName("Name").item(0).text);

To display all the values of the Name elements, you could loop through the NodeList object returned by the getElementsByTagName() method:

      var names = doc.getElementsByTagName("Name");
      for (var i = 0; i < names.length; i++)
      {
        document.write(names.item(i).text + "  ");
      }

Creating a New Element

You can create a new element for an XML document using the createElement() method of the DOM object. The createElement() method takes one parameter, the name of the element to create, as shown:

      var prodElement = doc.createElement("Product");

In the previous code, a variable named prodElement is declared and a new element node, Product, is created. The reference of the new node is stored in the prodElement variable.

Using the Node Interface

The Node interface represents a single node in the document tree structure. All the other node objects inherit their basic properties and methods from the Node interface. The Node interface provides basic information about a node, such as its name, its text, and its content. The following table lists the different properties of the Node interface:

Property	Description
`attributes`	This returns a `NamedNodeMap` for nodes that have attributes
`baseName`	A read-only property that returns the base name for a node
`childNodes`	A read-only property containing a node list of all children for all the elements that can have them
`dataType`	A read-only property that specifies the data type for the node
`definition`	This property returns the definition of the node in the DTD
`firstChild`	A read-only property that returns the first child node of a node
`lastChild`	A read-only property that returns the last child node of a node
`namespaceURI`	A read-only property that returns the Uniform Resource Identifier (URI) of the namespace
`nextSibling`	This property returns the next node in the parent's child list
`nodeName`	A read-only property that contains the name of the node, depending on node type
`nodeType`	A read-only property specifying the type of the node
`nodeTypedValue`	This property contains the value of this node as expressed in its data type
`nodeTypeString`	A read-only property that returns the node type in string form
`nodeValue`	This property contains the value of the node, depending on its type
`ownerDocument`	This property returns the Document interface to which the node belongs
`parentNode`	A read-only property that returns the parent node of all nodes except `Document`, `DocumentFragment` and `Attr`, which cannot have parent nodes
`parsed`	This property returns a value of `True` if this node and all of its child nodes have been parsed. Otherwise, it returns `False`
`prefix`	A read-only property that returns the namespace prefix
`previousSibling`	This property returns the previous node in the parent's child list
`specified`	This property returns a value indicating whether this node is specified or derived from a default value in the DTD or schema
`text`	This property returns the text content of this node and its sub trees
`xml`	This property contains the XML representation of this node and its child nodes

Note

Note that the properties baseName, dataType, definition, nodeTypedValue, nodeTypeString, parsed, text, and xml are available only in the Microsoft implementation of DOM.

The following table lists the different methods of the Node interface:

Method	Description
`appendChild`	Adds a new child node to the list of children for this node
`cloneNode`	Creates a clone node that is an exact duplicate of this node
`hasChildNodes`	Determines whether a node has child nodes
`insertBefore`	Inserts a new child node before an existing one. If no child node exists, the new child node becomes the first
`removeChild`	Removes the specified node from the list of child nodes
`replaceChild`	Replaces one child of a node with another and returns the old child
`selectNodes`	Applies the specified pattern to this node's context and returns the matching nodes as a NodeList
`selectSingleNode`	Applies the specified pattern to this node's context and returns the first matching node
`transformNode`	Processes this node and its child nodes using the specified XSL style sheet and returns the resulting transformation
`transformNodeToObject`	Processes this node and its descendants using the specified XSL style sheet and returns the resulting transformation in the specified object

Note

Note that the methods selectNodes, selectSingleNode, transformNode, and transformNodeToObject are available only in the Microsoft implementation of DOM.

Now that you understand the properties and methods of the Node object, look at an example.

When the parser loads an XML document, it gives you a reference to the document itself. From this, you can get a reference to the root element in the document (in this example, the Products element) with the property name documentElement. The children of that element are, in turn, accessible through the childNodes property.

      var nodes = doc.documentElement.childNodes;

The childNodes property, and thus the nodes variable in this example, contains a node list that is represented by the NodeList interface. In accordance with the DOM standard, you can access the elements of a node list by passing a numerical index to the item() method, with 0 corresponding to the first node in the list. In this example, therefore, nodes.item(0) returns a reference to the first child element of the Products element, which is the first Product element.

      document.write(nodes.item(0).text);

The result should look something like this:

      707 Sport-100 Helmet, Red HL-U509-R

As you can see, the output shows the concatenated values of the ProductID, Name, and ProductNumber elements. If you just want to print the ProductID element value of the first Product element, you need to modify the code to look as follows:

      var nodes = doc.documentElement.childNodes.item(0).childNodes;
      document.write(nodes.item(0).text);

When you run the code now, the text 707 is displayed in the browser.

Note that Internet Explorer (and indeed many other DOM implementations) allows you to treat NodeList objects as arrays to simplify the code you need to work with them. For example, you could use array syntax to access nodes instead of the item method:

      var nodes = doc.documentElement.childNodes[0].childNodes;
      alert(nodes[0].text);

This method of accessing text values within an XML file by numerical index is useful, but it can get a little cumbersome and it can be sometimes error prone as well. Fortunately, there is another way to approach the problem.

Creating a New Node

You create a new node using the createNode() method. To create a root element using the createNode() method in JavaScript, use the following code:

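      var doc = new ActiveXObject("Microsoft.XMLDOM");
      doc.async = false;
      doc.load("Products.xml");
      // A document without child nodes has no root element yet
      if (doc.childNodes.length == 0)
      {
        // 1 = element node; the empty string means no namespace URI
        var rootNode = doc.createNode(1, "Products", "");
        doc.appendChild(rootNode);
        doc.save("Products.xml");
      }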

In the previous code, the DOM object serves as the root node for the tree structure. The length property of the NodeList object is used to check the number of child nodes that the root node contains. If this number is equal to 0, a new node is created using the createNode() method. This new node is then added as the root document element using the appendChild() method.

Appending a New Child Node

You append a new child node to a DOM tree using the appendChild() method of the Node object, as shown:

      var rootElement = doc.documentElement;
      var prodElement = doc.createElement("Product");
      rootElement.appendChild(prodElement);

In the previous code, you first obtain a reference to the root element of the DOM object. You then create a new element using the createElement() method of the DOMDocument object in JavaScript. Finally, you append the created element as the last child of the root element using the appendChild() method of the Node object.

Inserting a Node Before an Existing Node

You insert a node before an existing node in a DOM tree using the insertBefore() method of the Node object, as shown:

      var newElement = doc.createElement("ProductIdentifier");
      var oldElement = doc.documentElement.childNodes.item(0).childNodes.item(0);
      doc.documentElement.childNodes.item(0).insertBefore(newElement, oldElement);

In the previous code, you first create a new element called ProductIdentifier. You then obtain a reference to the first child of the first Product element within the root element and store it in a variable, oldElement. Finally, you insert the newly created node before that first child node using the insertBefore() method of the Node object.

Removing a Child Node

You can remove a child node from a DOM tree using the removeChild() method of the Node object, as shown:

      var elementToBeRemoved = doc.documentElement.childNodes.item(0).firstChild;
      doc.documentElement.childNodes.item(0).removeChild(elementToBeRemoved);

In the previous code, you first obtain a reference to the first child node of the first Product element within the root element and store this reference in the variable elementToBeRemoved. You then use the removeChild() method of the Node object to remove the node contained in elementToBeRemoved.

Replacing a Node

You replace an existing node with a new node using the replaceChild() method of the Node object. The replaceChild() method takes two parameters: the first is the new element, and the second is the existing element that needs to be replaced. The method must be called on the parent of the existing element. In the following code, the first ProductID element in the document is replaced with the new element named ProductIdentifier.

      var newElement = doc.createElement("ProductIdentifier");
      var firstProduct = doc.documentElement.childNodes.item(0);
      var oldElement = firstProduct.childNodes.item(0);
      firstProduct.replaceChild(newElement, oldElement);
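The effect of the insert, remove, and replace operations on a parent's child list can be pictured with ordinary arrays. The sketch below is a simplified model under stated assumptions, not DOM code: each function here is a hypothetical stand-in that returns a new array of element names, whereas the real methods modify the tree in place and return a node.

```javascript
// Simplified model: a parent's child list is an array of element names.
function indexOfChild(children, child) {
  var i = children.indexOf(child);
  if (i === -1) {
    throw new Error(child + " is not a child of this node");
  }
  return i;
}

function appendChild(children, newChild) {
  return children.concat([newChild]);
}

function insertBefore(children, newChild, refChild) {
  if (refChild === null) {
    return appendChild(children, newChild); // no reference node: add at the end
  }
  var i = indexOfChild(children, refChild);
  return children.slice(0, i).concat([newChild], children.slice(i));
}

function removeChild(children, oldChild) {
  var i = indexOfChild(children, oldChild);
  return children.slice(0, i).concat(children.slice(i + 1));
}

function replaceChild(children, newChild, oldChild) {
  var i = indexOfChild(children, oldChild);
  return children.slice(0, i).concat([newChild], children.slice(i + 1));
}

// Children of the first Product element in Products.xml.
var product = ["ProductID", "Name", "ProductNumber"];
console.log(insertBefore(product, "ProductIdentifier", "ProductID").join(", "));
console.log(removeChild(product, "ProductID").join(", "));
console.log(replaceChild(product, "ProductIdentifier", "ProductID").join(", "));
```

The model also shows why the reference node matters: passing a node that is not a child of the parent is an error rather than a silent no-op.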

Accessing Text Values of Elements

In the Microsoft implementation of DOM, the text enclosed within the tags in an XML document is used as a node value, which can be the value of an attribute or the text within an element.

You can display the text within an element using the text property of the Node object, as shown:

      alert(productIDElement.text);

You can also set the value of an element or an attribute using this property, as shown:

      productIDElement.text="100";

Using the NodeList Interface

The NodeList interface represents an ordered collection of nodes, such as the child nodes of an element. Its length property returns the number of items in the collection. The following table describes the methods of the NodeList interface; nextNode and reset are Microsoft extensions to the W3C interface.

Method	Description
`item`	Returns the node at the specified index in the collection, or null if the index is out of range
`nextNode`	Returns the next node in the collection, or null when there are no more nodes
`reset`	Resets the iterator used by `nextNode` to the start of the collection

The following code obtains a NodeList of the Product elements using the XML document's getElementsByTagName() method. With the length property, you can determine the number of nodes in the list and display the node values by accessing each node through its index.

      var productNodes = doc.getElementsByTagName("Product");
      var length = productNodes.length;
      for (var i = 0; i < length; i++)
        document.write(productNodes.item(i).text + "<br>");

When you open the HTML file in the browser, the browser displays the output shown in Figure 12-2.

Figure 12-2
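If you prefer to work with ordinary arrays, any object that offers length and item() can be copied into one. The following is a minimal sketch: nodeListToArray and fakeNodeList are hypothetical helpers, and the stand-in list exists only so that the example runs without a parser.

```javascript
// Copies any NodeList-like object (length plus item(i)) into a plain array.
function nodeListToArray(list) {
  var result = [];
  for (var i = 0; i < list.length; i++) {
    result.push(list.item(i));
  }
  return result;
}

// Hypothetical stand-in for a NodeList whose nodes expose a text property.
function fakeNodeList(texts) {
  return {
    length: texts.length,
    item: function (i) {
      return i >= 0 && i < texts.length ? { text: texts[i] } : null;
    }
  };
}

var productNames = nodeListToArray(fakeNodeList(["Sport-100 Helmet, Red", "AWC Logo Cap"]))
  .map(function (node) { return node.text; });
console.log(productNames.join(" | "));
```

With a real document you would pass the result of doc.getElementsByTagName("Name") instead of the stand-in.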

Using the NamedNodeMap Interface

The NamedNodeMap interface represents a collection of nodes that can be accessed by name. The following code obtains a NamedNodeMap of all the attribute nodes of the first Product element, and then iterates through the collection using the item method to display each attribute's name and associated text.

      var firstChildElement = doc.documentElement.firstChild;
      var attributes = firstChildElement.attributes;
      for (var i = 0; i < attributes.length; i++)
        document.write(attributes.item(i).name + "="
          + attributes.item(i).text + "<br>");

When you open the HTML file in the browser, the browser displays the attribute name and associated text. If you use the Products.xml file as an example, you will get “Category=Helmets” as the output because the Product element has only one attribute.

Using the Attr Interface

The Attr interface represents an attribute of an Element object. The DOM treats an Attr as a property of its element rather than as a child node in the document tree. The values allowed for an attribute are typically defined in the DTD. Attr extends the Node interface and therefore has the properties and methods of Node. The following table lists the important properties of the Attr interface.

Property	Description
`name`	Returns the name of the attribute. It is the same as the nodeName property for this Node interface
`specified`	Indicates if the value of the attribute is set in the document
`value`	Returns or sets the value of the attribute

In addition to the previous properties, all the properties and methods of the Node interface also apply to Attr because Attr is also a Node. The following code shows a simple example of using the Attr interface to retrieve the name and value of attributes in an XML document.

      var firstChildElement = doc.documentElement.firstChild;
      var attributes = firstChildElement.attributes;
      for (var i = 0; i < attributes.length; i++)
        document.write(attributes.item(i).name + "=" +
          attributes.item(i).value + "<br>");

When you open the HTML file in the browser, the browser displays the name and the value of the attribute of the first node. In the case of Products.xml file, it just displays Category=Helmets as the output.
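The same loop can be packaged as a reusable function that returns strings instead of writing to the page. This is a sketch under assumptions: formatAttributes is a hypothetical helper, and fakeAttributes is a stand-in that mimics the length and item() members of a NamedNodeMap for the first Product element.

```javascript
// Builds "name=value" strings from any NamedNodeMap-like object.
function formatAttributes(attributes) {
  var parts = [];
  for (var i = 0; i < attributes.length; i++) {
    var attr = attributes.item(i);
    parts.push(attr.name + "=" + attr.value);
  }
  return parts;
}

// Hypothetical stand-in for the attributes of the first Product element.
var fakeAttributes = {
  length: 1,
  item: function (i) {
    return i === 0 ? { name: "Category", value: "Helmets" } : null;
  }
};

console.log(formatAttributes(fakeAttributes).join(", ")); // prints "Category=Helmets"
```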

Creating Attributes

Much of the functionality of the Element interface concerns the management of attributes. This example shows how to add a new attribute to an existing element and how to read its value. You can create an attribute with the Document method createAttribute(…) and then attach it to an element with setAttributeNode(…). A simpler approach is the setAttribute(…) method of the Element interface, which works with attribute names and values as strings instead of attribute nodes. Listing 12-2 shows an example of how to create an attribute and retrieve its value for display purposes.

Listing 12-2: Using XML DOM to manipulate attributes

      <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Working with Attributes</title>
        <script type="text/javascript" language="javascript">
          var doc;
          function btnCreateAndDisplayAttribute_Click()
          {
            loadDocument();
            createAndDisplayAttribute();
          }
          function loadDocument()
          {
            doc = new ActiveXObject("Microsoft.XMLDOM");
            doc.async = false;
            doc.load("Products.xml");
          }
          function createAndDisplayAttribute()
          {
            var docElement = doc.documentElement;
            //Put the attribute CategoryID='1' on the root element
            docElement.setAttribute('CategoryID', '1');
            //Display the value of the added attribute
            result.innerText = docElement.getAttribute('CategoryID');
          }
        </script>
      </head>
      <body>
        <input type="button"
          value="Create and display attribute"
          onclick="btnCreateAndDisplayAttribute_Click()" />
        <br/><br/><br/>
        <div id="result"></div>
      </body>
      </html>

When you click the button control, the page displays the value of the CategoryID attribute, which is 1 in this case.
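For comparison, the longer route through createAttribute(…) and setAttributeNode(…) mentioned earlier can be sketched as a small function. The function name addAttributeNode is hypothetical, and fakeDoc and fakeElement are minimal stand-ins that implement only the members the sketch calls, so that it can run without the ActiveX parser.

```javascript
// Long form of element.setAttribute(name, value), using an explicit Attr node.
function addAttributeNode(doc, element, name, value) {
  var attr = doc.createAttribute(name); // Document method: creates the Attr node
  attr.value = value;                   // Attr property: sets the attribute value
  element.setAttributeNode(attr);       // Element method: attaches the Attr node
  return attr;
}

// Hypothetical stand-ins that mimic only the calls used above.
var fakeDoc = {
  createAttribute: function (name) {
    return { name: name, value: "" };
  }
};
var fakeElement = {
  attrs: {},
  setAttributeNode: function (attr) {
    this.attrs[attr.name] = attr;
  },
  getAttribute: function (name) {
    return this.attrs[name] ? this.attrs[name].value : null;
  }
};

addAttributeNode(fakeDoc, fakeElement, "CategoryID", "1");
console.log(fakeElement.getAttribute("CategoryID")); // prints "1"
```

The string-based setAttribute(…) call in Listing 12-2 achieves the same result in one step, which is why it is usually the more convenient choice.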

Using the CharacterData Interface

The CharacterData interface extends Node with properties and methods for manipulating text. It can handle very large amounts of text and is implemented by the CDATASection, Comment, and Text node types. The CharacterData interface has the following properties:

Property	Description
`data`	This property contains the data for this node, depending on node type
`length`	This property is read-only and contains the length of the data string in characters

The following table lists the methods of the CharacterData interface.


Method	Description
`appendData`	Adds the specified string to existing string data
`deleteData`	Deletes the specified range of characters from string data
`insertData`	Inserts a string of data at the specified position in the string
`replaceData`	Replaces the characters from the specified position in the string with the supplied string data
`substringData`	Returns a substring consisting of the specified range of characters

Look at the following simple example to understand the use of one of the methods of the CharacterData interface.

      var prodElement = doc.documentElement.firstChild;
      var text = prodElement.firstChild.firstChild;
      document.write(text.data + "<br>");
      var lastTwoCharacters = text.substringData(1, 2);
      document.write(lastTwoCharacters + "<br>");

The previous code displays the character data of the first ProductID element using the data property. The substringData() method then extracts a range of characters from that text, starting at character offset 1 and taking 2 characters, and the code displays the result. The output produced by the page looks as follows in the browser:

      707
      07
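To experiment with the offset and count arithmetic of the methods listed in the table without loading a document, you can model them on a plain string. The following is an illustrative stand-in only: CharacterDataModel is a hypothetical name, it is not a DOM node, and unlike a real implementation it does not raise an error for out-of-range offsets.

```javascript
// Illustrative stand-in (hypothetical, not a DOM node): mimics the
// CharacterData members on top of an ordinary JavaScript string.
function CharacterDataModel(initial) {
  this.data = String(initial);
}
// length is read-only and always reflects the current data.
Object.defineProperty(CharacterDataModel.prototype, "length", {
  get: function () { return this.data.length; }
});
CharacterDataModel.prototype.appendData = function (s) {
  this.data += s;
};
CharacterDataModel.prototype.deleteData = function (offset, count) {
  this.data = this.data.slice(0, offset) + this.data.slice(offset + count);
};
CharacterDataModel.prototype.insertData = function (offset, s) {
  this.data = this.data.slice(0, offset) + s + this.data.slice(offset);
};
CharacterDataModel.prototype.replaceData = function (offset, count, s) {
  this.data = this.data.slice(0, offset) + s + this.data.slice(offset + count);
};
CharacterDataModel.prototype.substringData = function (offset, count) {
  return this.data.slice(offset, offset + count);
};

var model = new CharacterDataModel("707");
var lastTwo = model.substringData(1, 2); // same offset and count as above
```

Here lastTwo holds the same two characters that the previous example wrote to the page.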

Using the Comment Interface

The Comment object represents the content that appears between ‘<!--’ and ‘-->’ as a comment. It does not have any properties or methods of its own; it inherits those of the Node and CharacterData objects.

Using the Text Interface

The Text object represents the textual content of an Element or an Attr object. When a document is first loaded, each contiguous block of text is held in a single Text node. The Text object is also a Node object and therefore inherits the properties and methods of the Node and CharacterData objects. The Text interface has one method of its own, named splitText(number). This method splits the node in two at the specified character offset: the original node keeps the text before the offset, and the method returns a new Text node that contains the rest of the text up to the end of the string.
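The effect of splitText(…) on both nodes can be sketched with a small helper. The name splitAndReport is hypothetical, and the sketch assumes that textNode is a Text node obtained from a loaded document.

```javascript
// Hypothetical helper: splits a Text node and reports both halves.
function splitAndReport(textNode, offset) {
  // After the call, textNode keeps only the characters before the offset;
  // the returned node holds everything from the offset to the end.
  var tail = textNode.splitText(offset);
  return { head: textNode.data, tail: tail.data };
}
```

For a Text node whose data is "Cotton Shirt", an offset of 6 would leave "Cotton" in the original node and place " Shirt" in the new one.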

Using the CDATA Section Interface

The CDATA Section interface represents the content enclosed between the CDATA section delimiters <![CDATA[ and ]]>. A CDATA section holds characters that the XML parser should not interpret as markup. Its content is stored as the character data of the CDATA Section node itself rather than in a child node. The CDATA Section interface has no methods or properties of its own but inherits those of the Text, CharacterData, and Node objects.

If a CDATA section contains text that includes HTML tags, the parser treats those tags as plain character data rather than as markup. The content of the CDATA section is returned without the <![CDATA[ and ]]> delimiters. You can use a CDATA section to keep HTML tags from being parsed as XML markup, as shown here:

      <?xml version="1.0"?>
      <Products>
        <Product>
          <ProductID></ProductID>
          <Name><![CDATA[<span style="color:red"> Cotton Shirt </span>]]> </Name>
          ----
          ----
      </Products>

The code required to handle a CDATA Section is exactly the same as processing any other node since the CDATA Section is also a node.
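As a rough illustration of that point, the following sketch reads Text and CDATA Section children through the same Node members. The helper name collectCharacterData is hypothetical; the numeric values 3 and 4 are the standard DOM nodeType codes for Text and CDATA Section nodes.

```javascript
var TEXT_NODE = 3;          // nodeType of a Text node
var CDATA_SECTION_NODE = 4; // nodeType of a CDATA Section node

// Hypothetical helper: concatenates the character data of an element's
// direct Text and CDATA Section children, treating both kinds alike.
function collectCharacterData(element) {
  var out = "";
  var children = element.childNodes;
  for (var i = 0; i < children.length; i++) {
    var child = children.item(i);
    if (child.nodeType === TEXT_NODE || child.nodeType === CDATA_SECTION_NODE) {
      // nodeValue of a CDATA Section is its content without the delimiters.
      out += child.nodeValue;
    }
  }
  return out;
}
```

Applied to the Name element of the sample document, the helper would be expected to return the span markup as ordinary text, followed by the trailing space that sits outside the CDATA section.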

Handling Errors in XML DOM

At times, XML parsing generates errors for reasons such as malformed XML or a document that fails schema validation. To process these errors, the Document object exposes a property called parseError, through which you can get more details about the failure. This object, which implements the IXMLDOMParseError interface, provides a set of properties for retrieving the error information. The following table describes the commonly used properties of the IXMLDOMParseError object:


Property	Description
`reason`	Stores a string explaining the reason for the error
`line`	Stores a long integer representing the line number for the error
`errorCode`	Contains a long integer error code. This property contains the value `0` if there are no errors in the XML document
`linepos`	Stores a long integer representing the line position for the error
`srcText`	Stores a string containing the line that caused the error

You use the IXMLDOMParseError object to display the information about the errors encountered while parsing an XML document, as shown here:

      var doc = new ActiveXObject("Microsoft.XMLDOM");
      doc.async = false;
      doc.load("Products.xml");
      if (doc.parseError.errorCode != 0)
      {
        alert("Error Code: " + doc.parseError.errorCode);
        alert("Error Reason: " + doc.parseError.reason);
        alert("Error Line: " + doc.parseError.line);
      }
      else
      {
        alert(doc.documentElement.xml);
      }

In the previous code, you first create a new DOM object and load the document, and then use the if construct to determine whether the errorCode property of the parseError object is nonzero. If it is, you display the details of the error: the error code, the reason, and the line number where the error occurred. Otherwise, you display a message box showing the XML of the document.
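If you prefer a single message to three separate alerts, the properties from the table can be combined in one place. The following is a minimal sketch, assuming an object that exposes the properties described above; the helper name formatParseError is hypothetical.

```javascript
// Hypothetical helper: builds one multi-line message from a parseError
// object. Returns an empty string when errorCode is 0 (no error).
function formatParseError(parseError) {
  if (parseError.errorCode === 0) {
    return "";
  }
  return [
    "Error Code: " + parseError.errorCode,
    "Error Reason: " + parseError.reason,
    "Error Line: " + parseError.line,
    "Error Position: " + parseError.linepos,
    "Source Text: " + parseError.srcText
  ].join("\n");
}
```

In the previous example, you could then call formatParseError(doc.parseError) and show an alert only when the returned string is not empty.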

XML Transformation Using XSL

In this section, you see the steps involved in transforming the contents of an XML file into HTML using the built-in support provided by XML DOM. You can accomplish this on the client side by invoking the methods of XML DOM through JavaScript. First, create the XSL file that will be used to transform the Products.xml file, as shown in Listing 12-3.

Listing 12-3: Products.xsl file used for transforming the Products.xml file

      <?xml version="1.0" ?>
      <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="html" />
        <xsl:template match="/">
          <table border="1" cellSpacing="1" cellPadding="1">
            <center>
              <xsl:element name="tr">
                <xsl:element name="td">Product ID</xsl:element>
                <xsl:element name="td">
                  <xsl:attribute name="align">center</xsl:attribute>
                  Name
                </xsl:element>
                <xsl:element name="td">Product Number</xsl:element>
              </xsl:element>
              <xsl:for-each select="//Product">
                <!-- Each product on a separate row -->
                <xsl:element name="tr">
                  <xsl:element name="td">
                    <xsl:value-of select="ProductID" />
                  </xsl:element>
                  <xsl:element name="td">
                    <xsl:value-of select="Name" />
                  </xsl:element>
                  <xsl:element name="td">
                    <xsl:value-of select="ProductNumber" />
                  </xsl:element>
                </xsl:element>
              </xsl:for-each>
            </center>
          </table>
        </xsl:template>
      </xsl:stylesheet>

The XSL logic shown in Listing 12-3 simply loops through all the <Product> elements and for each element it retrieves the values of the ProductID, Name, and ProductNumber elements and displays them in the browser. Now that you have created the XSL file, look at the code of the Web page in Listing 12-4 to perform the transformation.

Listing 12-4: Transforming XML to HTML using XML DOM

      <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Transforming XML to HTML</title>
        <script type="text/javascript" language="javascript">
          var xmlDoc;
          var xslDoc;
          function btnTransformXmlToHtml_Click()
          {
            loadDocuments();
            tranformXmlToHtml();
          }
          function loadDocuments()
          {
            //Load the XML Document
            xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
            xmlDoc.async = false;
            xmlDoc.load("Products.xml");
            //Load the XSL Document
            xslDoc = new ActiveXObject("Microsoft.XMLDOM");
            xslDoc.async = false;
            xslDoc.load("Products.xsl");
          }
          function tranformXmlToHtml()
          {
            //Apply the style sheet and show the resulting HTML in the div
            var output = xmlDoc.transformNode(xslDoc);
            result.innerHTML = output;
          }
        </script>
      </head>
      <body>
        <input type="button" value="Transform XML"
          onclick="btnTransformXmlToHtml_Click()" />
        <br/><br/><br/>
        <div id="result"></div>
      </body>
      </html>

The preceding Web page contains mostly JavaScript code that loads the XML and XSLT files into memory, processes them, and displays the results. First, you create an instance of the XML DOM and load the Products.xml file into memory. Next, you create another instance of XML DOM and load the Products.xsl file into memory. Since XSLT files are formatted as XML, you can load them just as you would any other XML file.

You then transform the XML document using the XSL style sheet, and assign the HTML output of the transformation to the innerHTML property of the div control.

      function tranformXmlToHtml()
      {
        var output = xmlDoc.transformNode(xslDoc);
        result.innerHTML = output;
      }

The transformNode() method takes the object that holds the XSL file as an argument. Figure 12-3 shows how the output looks when you click the Transform XML button in the browser.

Figure 12-3
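The transformNode() method belongs to the Microsoft XML DOM used in Listing 12-4. As a hedged sketch only, the transformation step could be wrapped so that it falls back to the XSLTProcessor and XMLSerializer objects that other browsers have traditionally provided. The helper name transformToString is hypothetical, and the fallback assumes that both documents were loaded by some means other than ActiveXObject.

```javascript
// Hypothetical helper: returns the transformed markup as a string.
function transformToString(xmlDoc, xslDoc) {
  // Microsoft XML DOM documents expose transformNode() directly.
  if (typeof xmlDoc.transformNode !== "undefined") {
    return xmlDoc.transformNode(xslDoc);
  }
  // Otherwise try the XSLTProcessor object, where the browser offers it.
  if (typeof XSLTProcessor !== "undefined") {
    var processor = new XSLTProcessor();
    processor.importStylesheet(xslDoc);
    var resultDoc = processor.transformToDocument(xmlDoc);
    return new XMLSerializer().serializeToString(resultDoc);
  }
  throw new Error("No XSLT support available in this environment");
}
```

The string that the helper returns can be assigned to the innerHTML property of the div in the same way as in Listing 12-4.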