Section 19.8. (Optional) Document Object Model (DOM)

Although an XML document is a text file, retrieving data from the document using traditional sequential file processing techniques is neither practical nor efficient, especially for adding and removing elements dynamically.

Upon successfully parsing a document, some XML parsers store document data as tree structures in memory. Figure 19.21 illustrates the tree structure for the root element of the document article.xml discussed in Fig. 19.2. This hierarchical tree structure is called a Document Object Model (DOM) tree, and an XML parser that creates this type of structure is known as a DOM parser. Each element name (e.g., article, date, firstName) is represented by a node. A node that contains other nodes (called child nodes or children) is called a parent node (e.g., author). A parent node can have many children, but a child node can have only one parent node. Nodes that are peers (e.g., firstName and lastName) are called sibling nodes. A node's descendant nodes include its children, its children's children and so on. A node's ancestor nodes include its parent, its parent's parent and so on.

Figure 19.21. Tree structure for the document `article.xml` of Fig. 19.2.

The DOM tree has a single root node, which contains all the other nodes in the document. For example, the root node of the DOM tree that represents article.xml (Fig. 19.2) contains a node for the XML declaration (line 1), two nodes for the comments (lines 2–3) and a node for the XML document's root element article (line 5).
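The node relationships described above can be explored with class XmlDocument, the System.Xml class that holds a DOM tree in memory. The following console program is a minimal sketch rather than one of this chapter's figures: the names DomTreeSketch, SampleXml and TopLevelNodes are hypothetical, and the inline XML string only imitates the element names of article.xml (its content is invented for illustration).

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module DomTreeSketch
   ' hypothetical stand-in for article.xml; the element names follow the
   ' text, but the content is invented for this sketch
   Public Const SampleXml As String = _
      "<?xml version=""1.0""?>" & _
      "<!-- first comment -->" & _
      "<!-- second comment -->" & _
      "<article><title>Simple XML</title>" & _
      "<author><firstName>John</firstName>" & _
      "<lastName>Doe</lastName></author></article>"

   ' return "NodeType: Name" for each child of the DOM tree's root node
   Public Function TopLevelNodes(ByVal markup As String) As List(Of String)
      Dim document As New XmlDocument()
      document.LoadXml(markup) ' parse the string into an in-memory DOM tree

      Dim result As New List(Of String)()
      For Each child As XmlNode In document.ChildNodes
         result.Add(child.NodeType.ToString() & ": " & child.Name)
      Next
      Return result
   End Function

   Sub Main()
      ' children of the root: declaration, comments and root element
      For Each entry As String In TopLevelNodes(SampleXml)
         Console.WriteLine(entry)
      Next

      Dim document As New XmlDocument()
      document.LoadXml(SampleXml)
      Dim firstName As XmlNode = _
         document.SelectSingleNode("/article/author/firstName")

      ' parent, sibling and ancestor relationships
      Console.WriteLine("parent: " & firstName.ParentNode.Name)
      Console.WriteLine("next sibling: " & firstName.NextSibling.Name)
      Console.WriteLine("grandparent: " & _
         firstName.ParentNode.ParentNode.Name)
   End Sub
End Module
```

For this sample string, the root node has four children (the XML declaration, the two comments and the article element), and firstName has author as its parent and lastName as its next sibling.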

Classes for creating, reading and manipulating XML documents are located in the FCL namespace System.Xml. This namespace also contains additional namespaces that provide other XML-related operations.

Reading an XML Document with an `XmlReader`

In this section, we present several examples that use DOM trees. Our first example, the program in Fig. 19.22, loads the XML document presented in Fig. 19.2 and displays its data in a text box. This example uses class XmlReader to iterate through each node in the XML document.

Figure 19.22. XmlReader iterating through an XML document.

   1  ' Fig. 19.22: FrmXmlReaderTest.vb
   2  ' Reading an XML document.
   3  Imports System.Xml
   4
   5  Public Class FrmXmlReaderTest
   6     ' read XML document and display its content
   7     Private Sub FrmXmlReaderTest_Load(ByVal sender As System.Object, _
   8        ByVal e As System.EventArgs) Handles MyBase.Load
   9        ' create the XmlReader object
  10        Dim settings As New XmlReaderSettings()
  11        Dim reader As XmlReader = XmlReader.Create("article.xml", settings)
  12
  13        Dim depth As Integer = -1 ' tree depth is -1, no indentation
  14
  15        While reader.Read() ' display each node's content
  16           Select Case (reader.NodeType)
  17              Case XmlNodeType.Element ' XML Element, display its name
  18                 depth += 1 ' increase tab depth
  19                 TabOutput(depth) ' insert tabs
  20                 txtOutput.Text &= "<" & reader.Name & ">" & vbCrLf
  21
  22                 ' if empty element, decrease depth
  23                 If reader.IsEmptyElement Then
  24                    depth -= 1
  25                 End If
  26              Case XmlNodeType.Comment ' XML Comment, display it
  27                 TabOutput(depth) ' insert tabs
  28                 txtOutput.Text &= "<!--" & reader.Value & "-->" & vbCrLf
  29              Case XmlNodeType.Text ' XML Text, display it
  30                 TabOutput(depth) ' insert tabs
  31                 txtOutput.Text &= vbTab & reader.Value & vbCrLf
  32              Case XmlNodeType.XmlDeclaration ' XML Declaration, display it
  33                 TabOutput(depth) ' insert tabs
  34                 txtOutput.Text &= "<?" & reader.Name & " " & _
  35                    reader.Value & "?>" & vbCrLf
  36              Case XmlNodeType.EndElement ' XML EndElement, display it
  37                 TabOutput(depth) ' insert tabs
  38                 txtOutput.Text &= "</" & reader.Name & ">" & vbCrLf
  39                 depth -= 1 ' decrement depth
  40           End Select
  41        End While
  42     End Sub ' FrmXmlReaderTest_Load
  43
  44     ' insert tabs
  45     Private Sub TabOutput(ByVal number As Integer)
  46        For i As Integer = 1 To number
  47           txtOutput.Text &= vbTab
  48        Next
  49     End Sub ' TabOutput
  50  End Class ' FrmXmlReaderTest

Line 3 imports the System.Xml namespace, which contains the XML classes used in this example. Class XmlReader is a MustInherit class that defines the interface for reading XML documents. We cannot create an XmlReader object directly. Instead, we must invoke XmlReader's Shared method Create to obtain an XmlReader reference (line 11). Before doing so, however, we must prepare an XmlReaderSettings object that specifies how we would like the XmlReader to behave (line 10). In this example, we use the default settings of the properties of an XmlReaderSettings object. Later, you will learn how to set certain properties of the XmlReaderSettings class to instruct the XmlReader to perform validation, which it does not do by default. The Shared method Create receives as arguments the name of the XML document to read and an XmlReaderSettings object. In this example the XML document article.xml (Fig. 19.2) is opened when method Create is invoked in line 11. Once the XmlReader is created, the XML document's contents can be read programmatically.

Method Read of XmlReader reads the next node in the document. By calling this method in the loop condition (line 15), reader reads all the document nodes in turn; the loop ends when Read returns False. The Select Case statement (lines 16–40) processes each node. Either the Name property (lines 20, 34 and 38), which contains the node's name, or the Value property (lines 28 and 31), which contains the node's data, is formatted and concatenated to the String assigned to the TextBox's Text property. The XmlReader's NodeType property specifies whether the node is an element, comment, text, XML declaration or end element. Note that each Case specifies a node type using XmlNodeType enumeration constants. For example, XmlNodeType.Element (line 17) indicates the start tag of an element.

The displayed output emphasizes the structure of the XML document. Variable depth (line 13) maintains the number of tab characters to indent each element. We increment the depth each time the program encounters an Element and decrement it each time the program encounters an EndElement or empty element. We use a similar technique in the next example to emphasize the tree structure of the XML document being displayed.
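As an alternative to maintaining the depth variable by hand, XmlReader also provides a Depth property that reports how deeply the current node is nested. The console sketch below uses that property to indent elements, text and end elements. It is an illustration under stated assumptions, not a replacement for Fig. 19.22: the names ReaderDepthSketch and Outline are hypothetical, the sample markup is invented, and the XML is read from a StringReader rather than from article.xml.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module ReaderDepthSketch
   ' build an indented outline of the nodes in an XML string
   Public Function Outline(ByVal markup As String) As String
      Dim output As New StringBuilder()

      Using reader As XmlReader = XmlReader.Create(New StringReader(markup))
         While reader.Read() ' advance to the next node
            ' three spaces of indentation per nesting level
            Dim indent As New String(" "c, 3 * reader.Depth)

            Select Case reader.NodeType
               Case XmlNodeType.Element ' start tag (or empty element)
                  output.AppendLine(indent & "<" & reader.Name & ">")
               Case XmlNodeType.Text ' character data
                  output.AppendLine(indent & reader.Value)
               Case XmlNodeType.EndElement ' end tag
                  output.AppendLine(indent & "</" & reader.Name & ">")
            End Select
         End While
      End Using

      Return output.ToString()
   End Function

   Sub Main()
      ' hypothetical fragment using element names from the text
      Dim markup As String = _
         "<author><firstName>John</firstName></author>"
      Console.Write(Outline(markup))
   End Sub
End Module
```

Because Depth is maintained by the reader itself, this version needs no special handling for empty elements; the trade-off is that the indentation policy is tied to the reader's notion of nesting rather than to a counter the program controls.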

Displaying a DOM Tree Graphically in a `TreeView` Control

XmlReaders do not provide features for displaying their content graphically. In this example, we display an XML document's contents using a TreeView control. We use class TreeNode to represent each node in the tree. Class TreeView and class TreeNode are part of the System.Windows.Forms namespace. TreeNodes are added to the TreeView to emphasize the structure of the XML document.

The program in Fig. 19.23 demonstrates how to manipulate a DOM tree programmatically to display it graphically in a TreeView control. The GUI for this application contains a TreeView control named treeXml (declared in FrmXmlDom.Designer.vb). The application loads letter.xml (Fig. 19.24) into an XmlReader (line 17), then displays the document's tree structure in the TreeView control. [Note: The version of letter.xml in Fig. 19.24 is nearly identical to the one in Fig. 19.4, except that Fig. 19.24 does not reference a DTD as line 5 of Fig. 19.4 does.]

Figure 19.23. DOM structure of an XML document displayed in a TreeView.

   1  ' Fig. 19.23: FrmXmlDom.vb
   2  ' Demonstrates DOM tree manipulation.
   3  Imports System.Xml
   4
   5  Public Class FrmXmlDom
   6     Private tree As TreeNode ' TreeNode reference
   7
   8     ' initialize instance variables
   9     Private Sub FrmXmlDom_Load(ByVal sender As Object, _
  10        ByVal e As EventArgs) Handles MyBase.Load
  11        ' create XmlReaderSettings and
  12        ' set the IgnoreWhitespace property
  13        Dim settings As New XmlReaderSettings()
  14        settings.IgnoreWhitespace = True
  15
  16        ' create XmlReader object
  17        Dim reader As XmlReader = XmlReader.Create("letter.xml", settings)
  18        tree = New TreeNode() ' instantiate TreeNode
  19        tree.Text = "letter.xml" ' assign name to TreeNode
  20        treeXml.Nodes.Add(tree) ' add TreeNode to TreeView control
  21        BuildTree(reader, tree) ' build node and tree hierarchy
  22     End Sub ' FrmXmlDom_Load
  23
  24     ' construct TreeView based on DOM tree
  25     Private Sub BuildTree(ByVal reader As XmlReader, _
  26        ByVal treeNode As TreeNode)
  27        ' treeNode to add to existing tree
  28        Dim newNode As New TreeNode()
  29
  30        While reader.Read()
  31           ' build tree based on node type
  32           Select Case reader.NodeType
  33              Case XmlNodeType.Text ' add Text node's value to tree
  34                 newNode.Text = reader.Value
  35                 treeNode.Nodes.Add(newNode)
  36              Case XmlNodeType.EndElement ' move up tree
  37                 treeNode = treeNode.Parent
  38              Case XmlNodeType.Element ' add element name and traverse tree
  39                 ' determine whether element contains information
  40                 If Not reader.IsEmptyElement Then
  41                    newNode.Text = reader.Name ' assign node text
  42                    treeNode.Nodes.Add(newNode) ' add newNode as child
  43                    treeNode = newNode ' set treeNode to last child
  44                 Else ' do not traverse empty elements
  45                    ' assign NodeType string to newNode and add it to tree
  46                    newNode.Text = reader.NodeType.ToString()
  47                    treeNode.Nodes.Add(newNode)
  48                 End If
  49              Case Else ' all other types, display node type
  50                 newNode.Text = reader.NodeType.ToString()
  51                 treeNode.Nodes.Add(newNode)
  52           End Select
  53
  54           newNode = New TreeNode()
  55        End While
  56
  57        ' update TreeView control
  58        treeXml.ExpandAll() ' expand tree nodes in TreeView
  59        treeXml.Refresh() ' force TreeView to update
  60     End Sub ' BuildTree
  61  End Class ' FrmXmlDom

Figure 19.24. Business letter marked up as XML.

   1  <?xml version = "1.0"?>
   2  <!-- Fig. 19.24: letter.xml             -->
   3  <!-- Business letter formatted with XML -->
   4
   5  <letter>
   6     <contact type = "sender">
   7        <name>Jane Doe</name>
   8        <address1>Box 12345</address1>
   9        <address2>15 Any Ave.</address2>
  10        <city>Othertown</city>
  11        <state>Otherstate</state>
  12        <zip>67890</zip>
  13        <phone>555-4321</phone>
  14        <flag gender = "F" />
  15     </contact>
  16
  17     <contact type = "receiver">
  18        <name>John Doe</name>
  19        <address1>123 Main St.</address1>
  20        <address2></address2>
  21        <city>Anytown</city>
  22        <state>Anystate</state>
  23        <zip>12345</zip>
  24        <phone>555-1234</phone>
  25        <flag gender = "M" />
  26     </contact>
  27
  28     <salutation>Dear Sir:</salutation>
  29
  30     <paragraph>It is our privilege to inform you about our new database
  31        managed with XML. This new system allows you to reduce the
  32        load on your inventory list server by having the client machine
  33        perform the work of sorting and filtering the data.
  34     </paragraph>
  35
  36     <paragraph>Please visit our Web site for availability
  37        and pricing.
  38     </paragraph>
  39
  40     <closing>Sincerely,</closing>
  41     <signature>Ms. Doe</signature>
  42  </letter>

In FrmXmlDom's Load event handler (lines 9–22), lines 13–14 create an XmlReaderSettings object and set its IgnoreWhitespace property to True so that insignificant whitespace in the XML document is ignored. Line 17 then invokes Shared XmlReader method Create to parse and load letter.xml.
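To see what the IgnoreWhitespace setting changes, the following console sketch counts the Whitespace nodes that an XmlReader reports for the same markup with the property turned off and on. The names WhitespaceSketch and CountWhitespaceNodes are hypothetical, and the three-line fragment is an invented sample modeled on letter.xml, read from a StringReader instead of a file.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module WhitespaceSketch
   ' count the Whitespace nodes an XmlReader reports for an XML string
   Public Function CountWhitespaceNodes(ByVal markup As String, _
      ByVal ignoreWhitespace As Boolean) As Integer

      Dim settings As New XmlReaderSettings()
      settings.IgnoreWhitespace = ignoreWhitespace

      Dim count As Integer = 0
      Using reader As XmlReader = _
         XmlReader.Create(New StringReader(markup), settings)

         While reader.Read()
            If reader.NodeType = XmlNodeType.Whitespace Then
               count += 1 ' line breaks and indentation between tags
            End If
         End While
      End Using

      Return count
   End Function

   Sub Main()
      ' hypothetical fragment modeled on letter.xml
      Dim markup As String = "<letter>" & Environment.NewLine & _
         "   <name>Jane Doe</name>" & Environment.NewLine & "</letter>"

      Console.WriteLine(CountWhitespaceNodes(markup, False))
      Console.WriteLine(CountWhitespaceNodes(markup, True))
   End Sub
End Module
```

With the property set to False, the reader reports the line breaks and indentation between tags as separate nodes, which a program such as Fig. 19.23 would otherwise add to the TreeView through its Case Else branch; with the property set to True, those nodes are skipped.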

Line 18 creates the TreeNode tree (declared in line 6). This TreeNode is used as a graphical representation of a DOM tree node in the TreeView control. Line 19 assigns the XML document's name (i.e., letter.xml) to tree's Text property. Line 20 calls method Add to add the new TreeNode to the TreeView's Nodes collection. Line 21 calls our Private method BuildTree to update the TreeView so that it displays the complete DOM tree.

Method BuildTree (lines 25–60) receives an XmlReader for reading the XML document and a TreeNode referencing the current location in the tree (i.e., the TreeNode most recently added to the TreeView control). Line 28 declares TreeNode reference newNode, which will be used for adding new nodes to the TreeView. Lines 30–55 iterate through each node in the XML document's DOM tree.

The Select Case statement in lines 32–52 adds a node to the TreeView, based on the XmlReader's current node. When a text node is encountered, the Text property of the new TreeNode, newNode, is assigned the current node's value (line 34). Line 35 adds this TreeNode to treeNode's node list (i.e., adds the node to the TreeView control).

Line 36 matches an EndElement node type. This Case moves up the tree to the current node's parent because the end of an element has been encountered. Line 37 accesses treeNode's Parent property to retrieve the node's current parent.

Line 38 matches Element node types. Each non-empty Element NodeType (line 40) increases the depth of the tree; thus, we assign the current reader.Name to the newNode's Text property and add the newNode to treeNode's node list (lines 41–42). Line 43 assigns the newNode's reference to treeNode to ensure that treeNode refers to the last child TreeNode in the node list. If the current Element node is an empty element (line 44), we assign to the newNode's Text property the string representation of the NodeType (line 46). Next, the newNode is added to the treeNode node list (line 47). The default case (lines 49–51) assigns the string representation of the node type to the newNode Text property, then adds the newNode to the treeNode node list.

After the entire DOM tree is processed, the treeNode node list is displayed in the TreeView control (lines 58–59). TreeView method ExpandAll causes all the nodes of the tree to be displayed. TreeView method Refresh updates the display to show the newly added TreeNodes. Note that while the application is running, clicking nodes (i.e., the + or − boxes) in the TreeView either expands or collapses them.

Locating Data in XML Documents with XPath

Although XmlReader includes methods for reading node values, it provides forward-only, read-only access and is not the most efficient means of locating data in a DOM tree. The Framework Class Library provides class XPathNavigator in the System.Xml.XPath namespace for iterating through node lists that match search criteria, which are written as XPath expressions. Recall that XPath (XML Path Language) provides a syntax for locating specific nodes in XML documents effectively and efficiently. XPath is a string-based language of expressions used by XML and many of its related technologies (such as XSLT, discussed in Section 19.7).

Figure 19.25 uses an XPathNavigator to navigate an XML document and uses a TreeView control and TreeNode objects to display the XML document's structure. In this example, the TreeNode node list is updated each time the XPathNavigator is positioned to a new node, rather than displaying the entire DOM tree at once. Nodes are added to and deleted from the TreeView to reflect the XPathNavigator's location in the DOM tree. Figure 19.26 shows the XML document sports.xml that we use in this example. [Note: The versions of sports.xml presented in Fig. 19.26 and Fig. 19.16 are nearly identical. In the current example, we do not want to apply an XSLT, so we omit the processing instruction found in line 2 of Fig. 19.16.]

Figure 19.25. XPathNavigator navigating selected nodes.

    1  ' Fig. 19.25: FrmPathNavigator.vb
    2  ' Demonstrates class XPathNavigator.
    3  Imports System.Xml.XPath
    4
    5  Public Class FrmPathNavigator
    6     Private xPath As XPathNavigator ' navigator to traverse document
    7     Private document As XPathDocument ' document for use by XPathNavigator
    8     Private tree As TreeNode ' TreeNode used by TreeView control
    9
   10     ' initialize variables and TreeView control
   11     Private Sub FrmPathNavigator_Load(ByVal sender As Object, _
   12        ByVal e As EventArgs) Handles MyBase.Load
   13        document = New XPathDocument("sports.xml") ' load XML document
   14        xPath = document.CreateNavigator() ' create navigator
   15        tree = New TreeNode() ' create root node for TreeNodes
   16
   17        tree.Text = xPath.NodeType.ToString() ' root
   18        treePath.Nodes.Add(tree) ' add tree
   19
   20        ' update TreeView control
   21        treePath.ExpandAll() ' expand tree node in TreeView
   22        treePath.Refresh() ' force TreeView update
   23        treePath.SelectedNode = tree ' highlight root
   24     End Sub ' FrmPathNavigator_Load
   25
   26     ' process btnSelect_Click event
   27     Private Sub btnSelect_Click(ByVal sender As Object, _
   28        ByVal e As EventArgs) Handles btnSelect.Click
   29        Dim iterator As XPathNodeIterator ' enables node iteration
   30
   31        Try ' get specified node from ComboBox
   32           iterator = xPath.Select(cboSelect.Text) ' select specified node
   33           DisplayIterator(iterator) ' display selection
   34        Catch argumentException As XPathException
   35           MessageBox.Show(argumentException.Message, "Error", _
   36              MessageBoxButtons.OK, MessageBoxIcon.Error)
   37        End Try
   38     End Sub ' btnSelect_Click
   39
   40     ' traverse to first child on btnFirstChild_Click event
   41     Private Sub btnFirstChild_Click(ByVal sender As Object, _
   42        ByVal e As EventArgs) Handles btnFirstChild.Click
   43        Dim newTreeNode As TreeNode
   44
   45        ' move to first child
   46        If xPath.MoveToFirstChild() Then
   47           newTreeNode = New TreeNode() ' create new node
   48
   49           ' set node's Text property to either navigator's name or value
   50           DetermineType(newTreeNode, xPath)
   51           tree.Nodes.Add(newTreeNode) ' add nodes to TreeNode node list
   52           tree = newTreeNode ' assign tree newTreeNode
   53
   54           ' update TreeView control
   55           treePath.ExpandAll() ' expand node in TreeView
   56           treePath.Refresh() ' force TreeView to update
   57           treePath.SelectedNode = tree ' highlight current node
   58        Else ' node has no children
   59           MessageBox.Show("Current Node has no children.", _
   60              "", MessageBoxButtons.OK, MessageBoxIcon.Information)
   61        End If
   62     End Sub ' btnFirstChild_Click
   63
   64     ' traverse to node's parent on btnParent_Click event
   65     Private Sub btnParent_Click(ByVal sender As Object, _
   66        ByVal e As EventArgs) Handles btnParent.Click
   67        ' move to parent
   68        If xPath.MoveToParent() Then
   69           tree = tree.Parent
   70
   71           ' get number of child nodes, not including sub trees
   72           Dim count As Integer = tree.GetNodeCount(False)
   73
   74           ' remove all children
   75           For i As Integer = 0 To count - 1
   76              tree.Nodes.Remove(tree.FirstNode) ' remove child node
   77           Next
   78
   79           ' update TreeView control
   80           treePath.ExpandAll() ' expand node in TreeView
   81           treePath.Refresh() ' force TreeView to update
   82           treePath.SelectedNode = tree ' highlight current node
   83        Else ' if node has no parent (root node)
   84           MessageBox.Show("Current node has no parent.", "", _
   85              MessageBoxButtons.OK, MessageBoxIcon.Information)
   86        End If
   87     End Sub ' btnParent_Click
   88
   89     ' find next sibling on btnNext_Click event
   90     Private Sub btnNext_Click(ByVal sender As Object, _
   91        ByVal e As EventArgs) Handles btnNext.Click
   92        ' declare and initialize two TreeNodes
   93        Dim newTreeNode As TreeNode = Nothing
   94        Dim newNode As TreeNode = Nothing
   95
   96        ' move to next sibling
   97        If xPath.MoveToNext() Then
   98           newTreeNode = tree.Parent ' get parent node
   99           newNode = New TreeNode() ' create new node
  100
  101           ' decide whether to display current node
  102           DetermineType(newNode, xPath)
  103           newTreeNode.Nodes.Add(newNode) ' add to parent node
  104
  105           tree = newNode ' set current position for display
  106
  107           ' update TreeView control
  108           treePath.ExpandAll() ' expand node in TreeView
  109           treePath.Refresh() ' force TreeView to update
  110           treePath.SelectedNode = tree ' highlight current node
  111        Else ' node has no additional siblings
  112           MessageBox.Show("Current node is last sibling.", "", _
  113              MessageBoxButtons.OK, MessageBoxIcon.Information)
  114        End If
  115     End Sub ' btnNext_Click
  116
  117     ' get previous sibling on btnPrevious_Click
  118     Private Sub btnPrevious_Click(ByVal sender As Object, _
  119        ByVal e As EventArgs) Handles btnPrevious.Click
  120        Dim parentTreeNode As TreeNode = Nothing
  121
  122        ' move to previous sibling
  123        If xPath.MoveToPrevious() Then
  124
  125           parentTreeNode = tree.Parent ' get parent node
  126           parentTreeNode.Nodes.Remove(tree) ' delete current node
  127           tree = parentTreeNode.LastNode ' move to previous node
  128
  129           ' update TreeView control
  130           treePath.ExpandAll() ' expand tree node in TreeView
  131           treePath.Refresh() ' force TreeView to update
  132           treePath.SelectedNode = tree ' highlight current node
  133        Else ' if current node has no previous siblings
  134           MessageBox.Show("Current node is first sibling.", "", _
  135              MessageBoxButtons.OK, MessageBoxIcon.Information)
  136        End If
  137     End Sub ' btnPrevious_Click
  138
  139     ' print values for XPathNodeIterator
  140     Private Sub DisplayIterator(ByVal iterator As XPathNodeIterator)
  141        txtSelect.Clear()
  142
  143        ' display selected node's values
  144        While iterator.MoveNext()
  145           txtSelect.Text &= iterator.Current.Value.Trim() & vbCrLf
  146        End While
  147     End Sub ' DisplayIterator
  148
  149     ' determine if TreeNode should display current node name or value
  150     Private Sub DetermineType(ByVal node As TreeNode, _
  151        ByVal xPath As XPathNavigator)
  152
  153        Select Case xPath.NodeType ' determine NodeType
  154           Case XPathNodeType.Element ' if Element, get its name
  155              ' get current node name, and remove whitespaces
  156              node.Text = xPath.Name.Trim()
  157           Case Else ' obtain node values
  158              ' get current node value and remove whitespaces
  159              node.Text = xPath.Value.Trim()
  160        End Select
  161     End Sub ' DetermineType
  162  End Class ' FrmPathNavigator

(a)–(e) Sample output windows for Fig. 19.25 (screen captures not shown).

Figure 19.26. XML document that describes various sports.

   1  <?xml version = "1.0"?>
   2  <!-- Fig. 19.26: sports.xml -->
   3  <!-- Sports Database -->
   4
   5  <sports>
   6     <game id = "783">
   7        <name>Cricket</name>
   8
   9        <paragraph>
  10           More popular among commonwealth nations.
  11        </paragraph>
  12     </game>
  13
  14     <game id = "239">
  15        <name>Baseball</name>
  16
  17        <paragraph>
  18           More popular in America.
  19        </paragraph>
  20     </game>
  21
  22     <game id = "418">
  23        <name>Soccer (Futbol)</name>
  24
  25        <paragraph>
  26           Most popular sport in the world.
  27        </paragraph>
  28     </game>
  29  </sports>

The program of Fig. 19.25 loads XML document sports.xml (Fig. 19.26) into an XPathDocument object by passing the document's file name to the XPathDocument constructor (line 13). Method CreateNavigator (line 14) creates and returns an XPathNavigator reference to the XPathDocument's tree structure.

The navigation methods of XPathNavigator are MoveToFirstChild (line 46), MoveToParent (line 68), MoveToNext (line 97) and MoveToPrevious (line 123). Each method performs the action that its name implies. Method MoveToFirstChild moves to the first child of the node referenced by the XPathNavigator, MoveToParent moves to the parent node of the node referenced by the XPathNavigator, MoveToNext moves to the next sibling of the node referenced by the XPathNavigator and MoveToPrevious moves to the previous sibling of the node referenced by the XPathNavigator. Each method returns a Boolean indicating whether the move was successful. Whenever a move operation fails, we display a warning in a MessageBox. Furthermore, each method is called in the event handler of the button that matches its name (e.g., clicking the First Child button in Fig. 19.25(a) triggers btnFirstChild_Click, which calls MoveToFirstChild).
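The Boolean results of these navigation methods can be observed without a GUI. The console sketch below walks a small document and prints the result of each move. It is an illustration only: the names NavigatorSketch, SampleXml and Describe are hypothetical, and the inline string is a condensed, invented stand-in for sports.xml that is loaded through a StringReader.

```vb
Imports System
Imports System.IO
Imports System.Xml.XPath

Module NavigatorSketch
   ' hypothetical, condensed stand-in for sports.xml
   Public Const SampleXml As String = _
      "<sports><game id=""783""><name>Cricket</name></game>" & _
      "<game id=""239""><name>Baseball</name></game></sports>"

   ' describe the navigator's current node as "NodeType name"
   Public Function Describe(ByVal navigator As XPathNavigator) As String
      Return navigator.NodeType.ToString() & " " & navigator.Name
   End Function

   Sub Main()
      Dim document As New XPathDocument(New StringReader(SampleXml))
      Dim navigator As XPathNavigator = document.CreateNavigator()
      Console.WriteLine(Describe(navigator)) ' navigator starts at the root

      navigator.MoveToFirstChild() ' root to sports
      navigator.MoveToFirstChild() ' sports to first game
      Console.WriteLine(Describe(navigator))

      ' each move reports success or failure instead of throwing
      Console.WriteLine("next: " & navigator.MoveToNext().ToString())
      Console.WriteLine("next: " & navigator.MoveToNext().ToString())
      Console.WriteLine("previous: " & _
         navigator.MoveToPrevious().ToString())
      Console.WriteLine("parent: " & navigator.MoveToParent().ToString())
      Console.WriteLine(Describe(navigator))
   End Sub
End Module
```

In this sample the second call to MoveToNext fails because the second game element has no further siblings; when a move fails, the navigator stays on its current node, which is why the program in Fig. 19.25 can simply report the failure and continue.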

Whenever we move forward using XPathNavigator, as with MoveToFirstChild and MoveToNext, nodes are added to the TreeNode node list. The Private method DetermineType (lines 150–161) determines whether to assign the node's Name property or Value property to the TreeNode (lines 156 and 159). Whenever MoveToParent is called, all the children of the parent node are removed from the display. Similarly, a call to MoveToPrevious removes the current sibling node. Note that the nodes are removed only from the TreeView, not from the tree representation of the document.

The btnSelect_Click event handler (lines 27–38) corresponds to the Select button. XPathNavigator method Select (line 32) takes search criteria in the form of either an XPathExpression or a String that represents an XPath expression, and returns as an XPathNodeIterator object any node that matches the search criteria. Figure 19.27 summarizes the XPath expressions provided by this program's combo box. We show the result of some of these expressions in Figs. 19.25(b)–(d).

Figure 19.27. XPath expressions and descriptions.
XPath Expression	Description
`/sports`	Matches all `sports` nodes that are child nodes of the document root node.
`/sports/game`	Matches all `game` nodes that are child nodes of `sports`, which is a child of the document root.
`/sports/game/name`	Matches all `name` nodes that are child nodes of `game`. The `game` is a child of `sports`, which is a child of the document root.
`/sports/game/paragraph`	Matches all `paragraph` nodes that are child nodes of `game`. The `game` is a child of `sports`, which is a child of the document root.
`/sports/game [name='Cricket']`	Matches all `game` nodes that contain a child element `name` whose value is `Cricket`. The `game` is a child of `sports`, which is a child of the document root.

Method DisplayIterator (defined in lines 140–147) appends the node values from the given XPathNodeIterator to the txtSelect TextBox. Note that we call String method Trim to remove unnecessary whitespace. Method MoveNext (line 144) advances to the next node, which property Current (line 145) can access.
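The combination of Select, MoveNext and Current can also be tried outside the Windows Forms application. The console sketch below collects the trimmed values of all nodes that match an XPath expression. The names SelectSketch, SampleXml and SelectValues are hypothetical, and the inline markup is a condensed, invented stand-in for sports.xml; the expressions themselves are taken from Fig. 19.27.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml.XPath

Module SelectSketch
   ' hypothetical, condensed stand-in for sports.xml
   Public Const SampleXml As String = _
      "<sports>" & _
      "<game id=""783""><name>Cricket</name>" & _
      "<paragraph>More popular among commonwealth nations." & _
      "</paragraph></game>" & _
      "<game id=""239""><name>Baseball</name>" & _
      "<paragraph>More popular in America.</paragraph></game>" & _
      "</sports>"

   ' return the trimmed value of every node matching an XPath expression
   Public Function SelectValues(ByVal markup As String, _
      ByVal expression As String) As List(Of String)

      Dim document As New XPathDocument(New StringReader(markup))
      Dim navigator As XPathNavigator = document.CreateNavigator()
      Dim values As New List(Of String)()

      Dim iterator As XPathNodeIterator = navigator.Select(expression)
      While iterator.MoveNext() ' advance to the next matching node
         values.Add(iterator.Current.Value.Trim())
      End While

      Return values
   End Function

   Sub Main()
      For Each gameName As String In _
         SelectValues(SampleXml, "/sports/game/name")
         Console.WriteLine(gameName)
      Next

      ' the predicate keeps only game elements whose name child is Cricket
      Dim matches As List(Of String) = _
         SelectValues(SampleXml, "/sports/game[name='Cricket']")
      Console.WriteLine(matches.Count)

      Try ' a malformed expression raises an XPathException
         SelectValues(SampleXml, "/sports/game[")
      Catch ex As XPathException
         Console.WriteLine("Invalid XPath expression: " & ex.Message)
      End Try
   End Sub
End Module
```

Note that the predicate compares the complete string value of the name element, so an expression such as the last row of Fig. 19.27 matches only when the element's text is exactly Cricket, with no surrounding whitespace.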