19.8. (Optional) Document Object Model (DOM)

Although an XML document is a text file, retrieving data from it with traditional sequential file-processing techniques is neither practical nor efficient, especially when elements must be added or removed dynamically. Upon successfully parsing a document, some XML parsers store the document's data as a tree structure in memory. Figure 19.21 illustrates the tree structure for the root element of the document article.xml discussed in Fig. 19.2. This hierarchical tree structure is called a Document Object Model (DOM) tree, and an XML parser that creates this type of structure is known as a DOM parser. Each element name (e.g., article, date, firstName) is represented by a node. A node that contains other nodes (called child nodes or children) is called a parent node (e.g., author). A parent node can have many children, but a child node can have only one parent node. Nodes that share the same parent (e.g., firstName and lastName) are called sibling nodes. A node's descendant nodes include its children, its children's children and so on. A node's ancestor nodes include its parent, its parent's parent and so on.

Figure 19.21. Tree structure for the document article.xml of Fig. 19.2.

The DOM tree has a single root node, which contains all the other nodes in the document. For example, the root node of the DOM tree that represents article.xml (Fig. 19.2) contains a node for the XML declaration (line 1), two nodes for the comments (lines 2-3) and a node for the XML document's root element article (line 5).

Classes for creating, reading and manipulating XML documents are located in the FCL namespace System.Xml. This namespace also contains nested namespaces (e.g., System.Xml.XPath and System.Xml.Xsl) that provide other XML-related operations.

Reading an XML Document with an XmlReader

In this section, we present several examples that work with the tree structure of XML documents. Our first example, the program in Fig. 19.22, loads the XML document presented in Fig. 19.2 and displays its data in a text box.
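The parent, child and sibling terminology described above can be tried out with class XmlDocument from System.Xml, which is a DOM parser: it builds the entire tree in memory. The console sketch below is our own illustration, not one of the chapter's figures; the module and function names (DomTreeSketch, RootChildTypes, Relatives) are hypothetical, and the inline XML is a trimmed stand-in for article.xml rather than the full Fig. 19.2 file. It lists the node types directly under the document's root node and reports the parent and next sibling of one element.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module DomTreeSketch
    ' Lists the node type of each child of the document (root) node.
    Public Function RootChildTypes(xml As String) As List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(xml) ' parse the string and build the whole DOM tree in memory
        Dim types As New List(Of String)()
        For Each child As XmlNode In doc.ChildNodes
            types.Add(child.NodeType.ToString())
        Next
        Return types
    End Function

    ' Reports the parent and next sibling of the first element with the given name.
    Public Function Relatives(xml As String, elementName As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xml)
        Dim node As XmlNode = doc.GetElementsByTagName(elementName).Item(0)
        Dim sibling As String = If(node.NextSibling Is Nothing, "(none)", node.NextSibling.Name)
        Return "parent=" & node.ParentNode.Name & ", next sibling=" & sibling
    End Function

    Sub Main()
        ' trimmed, inline stand-in for article.xml (assumption, not the full Fig. 19.2 file)
        Dim xml As String = "<?xml version=""1.0""?><!-- first --><!-- second -->" &
            "<article><author><firstName>John</firstName><lastName>Doe</lastName></author></article>"
        Console.WriteLine(String.Join(", ", RootChildTypes(xml)))
        Console.WriteLine(Relatives(xml, "firstName"))
    End Sub
End Module
```

For this input, RootChildTypes should report an XmlDeclaration, two Comment nodes and one Element, which mirrors the description of the root node of article.xml; Relatives reports author as the parent of firstName and lastName as its next sibling.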
Fig. 19.22 uses class XmlReader to iterate through each node in the XML document.

Figure 19.22. XmlReader iterating through an XML document.

 1 ' Fig. 19.22: FrmXmlReaderTest.vb
 2 ' Reading an XML document.
 3 Imports System.Xml
 4
 5 Public Class FrmXmlReaderTest
 6    ' read XML document and display its content
 7    Private Sub FrmXmlReaderTest_Load(ByVal sender As System.Object, _
 8       ByVal e As System.EventArgs) Handles MyBase.Load
 9       ' create the XmlReader object
10       Dim settings As New XmlReaderSettings()
11       Dim reader As XmlReader = XmlReader.Create("article.xml", settings)
12
13       Dim depth As Integer = -1 ' tree depth is -1, no indentation
14
15       While reader.Read() ' display each node's content
16          Select Case (reader.NodeType)
17             Case XmlNodeType.Element ' XML Element, display its name
18                depth += 1 ' increase tab depth
19                TabOutput(depth) ' insert tabs
20                txtOutput.Text &= "<" & reader.Name & ">" & vbCrLf
21
22                ' if empty element, decrease depth
23                If reader.IsEmptyElement Then
24                   depth -= 1
25                End If
26             Case XmlNodeType.Comment ' XML Comment, display it
27                TabOutput(depth) ' insert tabs
28                txtOutput.Text &= "<!--" & reader.Value & "-->" & vbCrLf
29             Case XmlNodeType.Text ' XML Text, display it
30                TabOutput(depth) ' insert tabs
31                txtOutput.Text &= vbTab & reader.Value & vbCrLf
32             Case XmlNodeType.XmlDeclaration ' XML Declaration, display it
33                TabOutput(depth) ' insert tabs
34                txtOutput.Text &= "<?" & reader.Name & " " & _
35                   reader.Value & "?>" & vbCrLf
36             Case XmlNodeType.EndElement ' XML EndElement, display it
37                TabOutput(depth) ' insert tabs
38                txtOutput.Text &= "</" & reader.Name & ">" & vbCrLf
39                depth -= 1 ' decrement depth
40          End Select
41       End While
42    End Sub ' FrmXmlReaderTest_Load
43
44    ' insert tabs
45    Private Sub TabOutput(ByVal number As Integer)
46       For i As Integer = 1 To number
47          txtOutput.Text &= vbTab
48       Next
49    End Sub ' TabOutput
50 End Class ' FrmXmlReaderTest

Line 3 imports the System.Xml namespace, which contains the XML classes used in this example. Class XmlReader is a MustInherit class that defines the interface for reading XML documents, so we cannot create an XmlReader object directly. Instead, we invoke XmlReader's Shared method Create to obtain an XmlReader reference (line 11). Before doing so, we prepare an XmlReaderSettings object that specifies how the XmlReader should behave (line 10). In this example, we use the default values of the XmlReaderSettings object's properties. Later, you will learn how to set certain XmlReaderSettings properties to instruct the XmlReader to perform validation, which it does not do by default. Method Create receives as arguments the name of the XML document to read and an XmlReaderSettings object. In this example, the XML document article.xml (Fig. 19.2) is opened when method Create is invoked in line 11.

Once the XmlReader is created, the XML document's contents can be read programmatically. XmlReader is a forward-only reader: it does not keep the whole tree in memory, but it visits the nodes in the same order as a depth-first traversal of the DOM tree. Each call to method Read advances the reader to the next node and returns False when no nodes remain. By calling this method in the loop condition (line 15), reader visits every node in the document. The Select Case statement (lines 16-40) processes each node. Either the Name property (lines 20, 34 and 38), which contains the node's name, or the Value property (lines 28 and 31), which contains the node's data, is formatted and concatenated to the String assigned to the TextBox's Text property.
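The Read loop of Fig. 19.22 can also be written as a console function. The sketch below is our own variation, not the book's code: it uses the reader's Depth property instead of a hand-maintained depth counter, reads from an in-memory string through a StringReader, and wraps the reader in a Using block so the reader is closed when the loop ends. The names XmlReaderSketch and ListNodes are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module XmlReaderSketch
    ' Reads the document node by node (forward only) and records each node,
    ' indented two spaces per nesting level as reported by reader.Depth.
    Public Function ListNodes(xml As String) As List(Of String)
        Dim result As New List(Of String)()
        Dim settings As New XmlReaderSettings()
        settings.IgnoreWhitespace = True ' skip indentation between elements

        ' Using closes the reader when the loop finishes
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml), settings)
            While reader.Read()
                Dim indent As New String(" "c, 2 * reader.Depth)
                Select Case reader.NodeType
                    Case XmlNodeType.Element
                        result.Add(indent & "<" & reader.Name & If(reader.IsEmptyElement, " />", ">"))
                    Case XmlNodeType.EndElement
                        result.Add(indent & "</" & reader.Name & ">")
                    Case XmlNodeType.Text
                        result.Add(indent & reader.Value)
                    Case XmlNodeType.Comment
                        result.Add(indent & "<!--" & reader.Value & "-->")
                End Select
            End While
        End Using
        Return result
    End Function

    Sub Main()
        For Each line As String In ListNodes("<article><!--c--><title>Simple XML</title><empty/></article>")
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Because Depth reports the nesting level of the current node, empty elements need no special-case decrement; the text inside title is printed one level deeper than the title tags themselves.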
The XmlReader's NodeType property specifies whether the current node is an element, comment, text, XML declaration or end element. Note that each Case specifies a node type using an XmlNodeType enumeration constant. For example, XmlNodeType.Element (line 17) indicates the start tag of an element.

The displayed output emphasizes the structure of the XML document. Variable depth (line 13) maintains the number of tab characters used to indent each element. We increment depth each time the program encounters an Element and decrement it each time the program encounters an EndElement or an empty element. We use a similar technique in the next example to emphasize the tree structure of the XML document being displayed.

Displaying a DOM Tree Graphically in a TreeView Control

XmlReaders do not provide features for displaying their content graphically. In this example, we display an XML document's contents using a TreeView control. We use class TreeNode to represent each node in the tree. Classes TreeView and TreeNode are part of the System.Windows.Forms namespace. TreeNodes are added to the TreeView to emphasize the structure of the XML document.

The program in Fig. 19.23 demonstrates how to manipulate a DOM tree programmatically to display it graphically in a TreeView control. The GUI for this application contains a TreeView control named treeXml (declared in FrmXmlDom.Designer.vb). The application loads letter.xml (Fig. 19.24) into an XmlReader (line 17), then displays the document's tree structure in the TreeView control. [Note: The version of letter.xml in Fig. 19.24 is nearly identical to the one in Fig. 19.4, except that Fig. 19.24 does not reference a DTD as line 5 of Fig. 19.4 does.]

Figure 19.23. DOM structure of an XML document displayed in a TreeView.

 1 ' Fig. 19.23: FrmXmlDom.vb
 2 ' Demonstrates DOM tree manipulation.
 3 Imports System.Xml
 4
 5 Public Class FrmXmlDom
 6    Private tree As TreeNode ' TreeNode reference
 7
 8    ' initialize instance variables
 9    Private Sub FrmXmlDom_Load(ByVal sender As Object, _
10       ByVal e As EventArgs) Handles MyBase.Load
11       ' create XmlReaderSettings and
12       ' set the IgnoreWhitespace property
13       Dim settings As New XmlReaderSettings()
14       settings.IgnoreWhitespace = True
15
16       ' create XmlReader object
17       Dim reader As XmlReader = XmlReader.Create("letter.xml", settings)
18       tree = New TreeNode() ' instantiate TreeNode
19       tree.Text = "letter.xml" ' assign name to TreeNode
20       treeXml.Nodes.Add(tree) ' add TreeNode to TreeView control
21       BuildTree(reader, tree) ' build node and tree hierarchy
22    End Sub ' FrmXmlDom_Load
23
24    ' construct TreeView based on DOM tree
25    Private Sub BuildTree(ByVal reader As XmlReader, _
26       ByVal treeNode As TreeNode)
27       ' treeNode to add to existing tree
28       Dim newNode As New TreeNode()
29
30       While reader.Read()
31          ' build tree based on node type
32          Select Case reader.NodeType
33             Case XmlNodeType.Text ' add Text node's value to tree
34                newNode.Text = reader.Value
35                treeNode.Nodes.Add(newNode)
36             Case XmlNodeType.EndElement ' move up tree
37                treeNode = treeNode.Parent
38             Case XmlNodeType.Element ' add element name and traverse tree
39                ' determine whether element contains information
40                If Not reader.IsEmptyElement Then
41                   newNode.Text = reader.Name ' assign node text
42                   treeNode.Nodes.Add(newNode) ' add newNode as child
43                   treeNode = newNode ' set treeNode to last child
44                Else ' do not traverse empty elements
45                   ' assign NodeType string to newNode and add it to tree
46                   newNode.Text = reader.NodeType.ToString()
47                   treeNode.Nodes.Add(newNode)
48                End If
49             Case Else ' all other types, display node type
50                newNode.Text = reader.NodeType.ToString()
51                treeNode.Nodes.Add(newNode)
52          End Select
53
54          newNode = New TreeNode()
55       End While
56
57       ' update TreeView control
58       treeXml.ExpandAll() ' expand tree nodes in TreeView
59       treeXml.Refresh() ' force TreeView to update
60    End Sub ' BuildTree
61 End Class ' FrmXmlDom

Figure 19.24. Business letter marked up as XML.

 1 <?xml version = "1.0"?>
 2 <!-- Fig. 19.24: letter.xml -->
 3 <!-- Business letter formatted with XML -->
 4
 5 <letter>
 6    <contact type = "sender">
 7       <name>Jane Doe</name>
 8       <address1>Box 12345</address1>
 9       <address2>15 Any Ave.</address2>
10       <city>Othertown</city>
11       <state>Otherstate</state>
12       <zip>67890</zip>
13       <phone>555-4321</phone>
14       <flag gender = "F" />
15    </contact>
16
17    <contact type = "receiver">
18       <name>John Doe</name>
19       <address1>123 Main St.</address1>
20       <address2></address2>
21       <city>Anytown</city>
22       <state>Anystate</state>
23       <zip>12345</zip>
24       <phone>555-1234</phone>
25       <flag gender = "M" />
26    </contact>
27
28    <salutation>Dear Sir:</salutation>
29
30    <paragraph>It is our privilege to inform you about our new database
31       managed with XML. This new system allows you to reduce the
32       load on your inventory list server by having the client machine
33       perform the work of sorting and filtering the data.
34    </paragraph>
35
36    <paragraph>Please visit our Web site for availability
37       and pricing.
38    </paragraph>
39
40    <closing>Sincerely,</closing>
41    <signature>Ms. Doe</signature>
42 </letter>

In FrmXmlDom's Load event handler (lines 9-22), lines 13-14 create an XmlReaderSettings object and set its IgnoreWhitespace property to True so that insignificant whitespace in the XML document is ignored. Line 17 then invokes XmlReader's Shared method Create to open letter.xml for reading. Line 18 creates the TreeNode tree (declared in line 6). This TreeNode is used as a graphical representation of a DOM tree node in the TreeView control. Line 19 assigns the XML document's name (i.e., letter.xml) to tree's Text property. Line 20 calls method Add to add the new TreeNode to the TreeView's Nodes collection.
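The effect of setting IgnoreWhitespace (lines 13-14) can be observed by counting the nodes an XmlReader reports for the same input with the setting off and on. The helper below is a hypothetical sketch of our own (WhitespaceSketch and CountNodes are not part of the chapter), and its input is a tiny letter-like string containing a line break and indentation.

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports Microsoft.VisualBasic

Module WhitespaceSketch
    ' Counts every node an XmlReader reports for the input,
    ' with IgnoreWhitespace switched on or off.
    Public Function CountNodes(xml As String, skipWhitespace As Boolean) As Integer
        Dim settings As New XmlReaderSettings()
        settings.IgnoreWhitespace = skipWhitespace
        Dim count As Integer = 0
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml), settings)
            While reader.Read()
                count += 1
            End While
        End Using
        Return count
    End Function

    Sub Main()
        ' hypothetical letter-like input with line breaks and indentation
        Dim sample As String = "<letter>" & vbLf & "  <salutation>Dear Sir:</salutation>" & vbLf & "</letter>"
        Console.WriteLine("IgnoreWhitespace = False: " & CountNodes(sample, False))
        Console.WriteLine("IgnoreWhitespace = True:  " & CountNodes(sample, True))
    End Sub
End Module
```

With the setting off, the reader should report 7 nodes, two of them Whitespace nodes for the line breaks and indentation; with it on, it should report 5. Without IgnoreWhitespace, the Case Else branch of BuildTree in Fig. 19.23 would add those Whitespace nodes to the TreeView.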
Line 21 calls our Private method BuildTree to update the TreeView so that it displays the complete DOM tree. Method BuildTree (lines 25-60) receives an XmlReader for reading the XML document and a TreeNode referencing the current location in the tree (i.e., the TreeNode most recently added to the TreeView control). Line 28 declares TreeNode reference newNode, which will be used for adding new nodes to the TreeView. Lines 30-55 iterate through each node in the XML document. The Select Case statement in lines 32-52 adds a node to the TreeView, based on the XmlReader's current node. When a text node is encountered, the Text property of the new TreeNode (newNode) is assigned the current node's value (line 34). Line 35 adds this TreeNode to treeNode's node list (i.e., adds the node to the TreeView control).

Line 36 matches an EndElement node type. This Case moves up the tree to the current node's parent because the end of an element has been encountered. Line 37 accesses TreeNode's Parent property to retrieve the current node's parent. Line 38 matches Element node types. Each non-empty Element (line 40) increases the depth of the tree; thus, we assign the current reader.Name to newNode's Text property and add newNode to treeNode's node list (lines 41-42). Line 43 assigns newNode's reference to treeNode to ensure that treeNode refers to the last child TreeNode in the node list. If the current Element node is an empty element (line 44), that is, one written with the self-closing syntax such as <flag gender = "F" />, we assign to newNode's Text property the string representation of the NodeType (line 46). (An element written as <address2></address2> is not reported as empty by IsEmptyElement, so it is handled like any other non-empty element.) Next, newNode is added to treeNode's node list (line 47). The Case Else (lines 49-51) assigns the string representation of the node type to newNode's Text property, then adds newNode to treeNode's node list. Line 54 creates a fresh TreeNode for the next iteration of the loop.

After the entire document is processed, the TreeView control is updated to display the tree (lines 58-59). TreeView method ExpandAll causes all the nodes of the tree to be displayed.
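The current-node technique used by BuildTree (descend on a non-empty start tag, climb back through Parent on an end tag) does not depend on Windows Forms. The sketch below reproduces that walk with a small hypothetical OutlineNode class standing in for TreeNode and prints the result as an indented outline. Unlike line 46 of Fig. 19.23, which labels empty elements with their node type, this sketch labels them with the element name; OutlineNode, BuildTreeSketch, BuildOutline and Render are all our own names.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

' Hypothetical stand-in for TreeNode: a label, a parent link and children.
Public Class OutlineNode
    Public Text As String
    Public Parent As OutlineNode
    Public ReadOnly Children As New List(Of OutlineNode)()

    ' Appends a child labeled with the given text and returns it.
    Public Function AddChild(label As String) As OutlineNode
        Dim child As New OutlineNode()
        child.Text = label
        child.Parent = Me
        Children.Add(child)
        Return child
    End Function
End Class

Module BuildTreeSketch
    ' Same walk as BuildTree: descend on non-empty elements, climb on EndElement.
    Public Function BuildOutline(xml As String) As OutlineNode
        Dim root As New OutlineNode()
        root.Text = "document"
        Dim current As OutlineNode = root ' plays the role of BuildTree's treeNode parameter
        Dim settings As New XmlReaderSettings()
        settings.IgnoreWhitespace = True
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml), settings)
            While reader.Read()
                Select Case reader.NodeType
                    Case XmlNodeType.Element
                        If reader.IsEmptyElement Then
                            current.AddChild(reader.Name) ' no EndElement follows, so stay put
                        Else
                            current = current.AddChild(reader.Name) ' descend into the element
                        End If
                    Case XmlNodeType.EndElement
                        current = current.Parent ' element closed: move back up
                    Case XmlNodeType.Text
                        current.AddChild(reader.Value)
                    Case Else
                        current.AddChild(reader.NodeType.ToString())
                End Select
            End While
        End Using
        Return root
    End Function

    ' Prints the outline, indenting two spaces per level.
    Public Sub Render(node As OutlineNode, level As Integer)
        Console.WriteLine(New String(" "c, 2 * level) & node.Text)
        For Each child As OutlineNode In node.Children
            Render(child, level + 1)
        Next
    End Sub

    Sub Main()
        Render(BuildOutline("<letter><contact type=""sender""><name>Jane Doe</name>" &
                            "<flag gender=""F""/></contact></letter>"), 0)
    End Sub
End Module
```

Keeping a single "current node" reference avoids recursion: the reader already delivers nodes in depth-first order, so moving down on a start tag and up on an end tag is enough to rebuild the hierarchy. Attributes such as type and gender are not visited by Read, so they do not appear in the outline.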
TreeView method Refresh updates the display to show the newly added TreeNodes. Note that while the application is running, clicking nodes (i.e., the + or - boxes) in the TreeView either expands or collapses them.

Locating Data in XML Documents with XPath

Although XmlReader includes methods for reading node values, it is not the most efficient means of locating specific data in a document. The Framework Class Library provides class XPathNavigator in the System.Xml.XPath namespace for iterating through node lists that match search criteria, which are written as XPath expressions. Recall that XPath (XML Path Language) provides a syntax for locating specific nodes in XML documents effectively and efficiently. XPath is a string-based language of expressions used by XML and many of its related technologies (such as XSLT, discussed in Section 19.7).

Figure 19.25 uses an XPathNavigator to navigate an XML document and uses a TreeView control and TreeNode objects to display the XML document's structure. In this example, the TreeNode node list is updated each time the XPathNavigator is positioned on a new node, rather than displaying the entire DOM tree at once. Nodes are added to and deleted from the TreeView to reflect the XPathNavigator's location in the DOM tree. Fig. 19.26 shows the XML document sports.xml that we use in this example. [Note: The versions of sports.xml presented in Fig. 19.26 and Fig. 19.16 are nearly identical. In the current example, we do not want to apply an XSLT, so we omit the processing instruction found in line 2 of Fig. 19.16.]

Figure 19.25. XPathNavigator navigating selected nodes.

  1 ' Fig. 19.25: FrmPathNavigator.vb
  2 ' Demonstrates class XPathNavigator.
  3 Imports System.Xml.XPath
  4
  5 Public Class FrmPathNavigator
  6    Private xPath As XPathNavigator ' navigator to traverse document
  7    Private document As XPathDocument ' document for use by XPathNavigator
  8    Private tree As TreeNode ' TreeNode used by TreeView control
  9
 10    ' initialize variables and TreeView control
 11    Private Sub FrmPathNavigator_Load(ByVal sender As Object, _
 12       ByVal e As EventArgs) Handles MyBase.Load
 13       document = New XPathDocument("sports.xml") ' load XML document
 14       xPath = document.CreateNavigator() ' create navigator
 15       tree = New TreeNode() ' create root node for TreeNodes
 16
 17       tree.Text = xPath.NodeType.ToString() ' root
 18       treePath.Nodes.Add(tree) ' add tree
 19
 20       ' update TreeView control
 21       treePath.ExpandAll() ' expand tree node in TreeView
 22       treePath.Refresh() ' force TreeView update
 23       treePath.SelectedNode = tree ' highlight root
 24    End Sub ' FrmPathNavigator_Load
 25
 26    ' process btnSelect_Click event
 27    Private Sub btnSelect_Click(ByVal sender As Object, _
 28       ByVal e As EventArgs) Handles btnSelect.Click
 29       Dim iterator As XPathNodeIterator ' enables node iteration
 30
 31       Try ' get specified node from ComboBox
 32          iterator = xPath.Select(cboSelect.Text) ' select specified node
 33          DisplayIterator(iterator) ' display selection
 34       Catch argumentException As XPathException
 35          MessageBox.Show(argumentException.Message, "Error", _
 36             MessageBoxButtons.OK, MessageBoxIcon.Error)
 37       End Try
 38    End Sub ' btnSelect_Click
 39
 40    ' traverse to first child on btnFirstChild_Click event
 41    Private Sub btnFirstChild_Click(ByVal sender As Object, _
 42       ByVal e As EventArgs) Handles btnFirstChild.Click
 43       Dim newTreeNode As TreeNode
 44
 45       ' move to first child
 46       If xPath.MoveToFirstChild() Then
 47          newTreeNode = New TreeNode() ' create new node
 48
 49          ' set node's Text property to either navigator's name or value
 50          DetermineType(newTreeNode, xPath)
 51          tree.Nodes.Add(newTreeNode) ' add nodes to TreeNode node list
 52          tree = newTreeNode ' make newTreeNode the current node
 53
 54          ' update TreeView control
 55          treePath.ExpandAll() ' expand node in TreeView
 56          treePath.Refresh() ' force TreeView to update
 57          treePath.SelectedNode = tree ' highlight current node
 58       Else ' node has no children
 59          MessageBox.Show("Current Node has no children.", _
 60             "", MessageBoxButtons.OK, MessageBoxIcon.Information)
 61       End If
 62    End Sub ' btnFirstChild_Click
 63
 64    ' traverse to node's parent on btnParent_Click event
 65    Private Sub btnParent_Click(ByVal sender As Object, _
 66       ByVal e As EventArgs) Handles btnParent.Click
 67       ' move to parent
 68       If xPath.MoveToParent() Then
 69          tree = tree.Parent
 70
 71          ' get number of child nodes, not including sub trees
 72          Dim count As Integer = tree.GetNodeCount(False)
 73
 74          ' remove all children
 75          For i As Integer = 0 To count - 1
 76             tree.Nodes.Remove(tree.FirstNode) ' remove child node
 77          Next
 78
 79          ' update TreeView control
 80          treePath.ExpandAll() ' expand node in TreeView
 81          treePath.Refresh() ' force TreeView to update
 82          treePath.SelectedNode = tree ' highlight current node
 83       Else ' if node has no parent (root node)
 84          MessageBox.Show("Current node has no parent.", "", _
 85             MessageBoxButtons.OK, MessageBoxIcon.Information)
 86       End If
 87    End Sub ' btnParent_Click
 88
 89    ' find next sibling on btnNext_Click event
 90    Private Sub btnNext_Click(ByVal sender As Object, _
 91       ByVal e As EventArgs) Handles btnNext.Click
 92       ' declare and initialize two TreeNodes
 93       Dim newTreeNode As TreeNode = Nothing
 94       Dim newNode As TreeNode = Nothing
 95
 96       ' move to next sibling
 97       If xPath.MoveToNext() Then
 98          newTreeNode = tree.Parent ' get parent node
 99          newNode = New TreeNode() ' create new node
100
101          ' decide whether to display current node
102          DetermineType(newNode, xPath)
103          newTreeNode.Nodes.Add(newNode) ' add to parent node
104
105          tree = newNode ' set current position for display
106
107          ' update TreeView control
108          treePath.ExpandAll() ' expand node in TreeView
109          treePath.Refresh() ' force TreeView to update
110          treePath.SelectedNode = tree ' highlight current node
111       Else ' node has no additional siblings
112          MessageBox.Show("Current node is last sibling.", "", _
113             MessageBoxButtons.OK, MessageBoxIcon.Information)
114       End If
115    End Sub ' btnNext_Click
116
117    ' get previous sibling on btnPrevious_Click
118    Private Sub btnPrevious_Click(ByVal sender As Object, _
119       ByVal e As EventArgs) Handles btnPrevious.Click
120       Dim parentTreeNode As TreeNode = Nothing
121
122       ' move to previous sibling
123       If xPath.MoveToPrevious() Then
124
125          parentTreeNode = tree.Parent ' get parent node
126          parentTreeNode.Nodes.Remove(tree) ' delete current node
127          tree = parentTreeNode.LastNode ' move to previous node
128
129          ' update TreeView control
130          treePath.ExpandAll() ' expand tree node in TreeView
131          treePath.Refresh() ' force TreeView to update
132          treePath.SelectedNode = tree ' highlight current node
133       Else ' if current node has no previous siblings
134          MessageBox.Show("Current node is first sibling.", "", _
135             MessageBoxButtons.OK, MessageBoxIcon.Information)
136       End If
137    End Sub ' btnPrevious_Click
138
139    ' print values for XPathNodeIterator
140    Private Sub DisplayIterator(ByVal iterator As XPathNodeIterator)
141       txtSelect.Clear()
142
143       ' display selected node's values
144       While iterator.MoveNext()
145          txtSelect.Text &= iterator.Current.Value.Trim() & vbCrLf
146       End While
147    End Sub ' DisplayIterator
148
149    ' determine if TreeNode should display current node name or value
150    Private Sub DetermineType(ByVal node As TreeNode, _
151       ByVal xPath As XPathNavigator)
152
153       Select Case xPath.NodeType ' determine NodeType
154          Case XPathNodeType.Element ' if Element, get its name
155             ' get current node name, and remove whitespaces
156             node.Text = xPath.Name.Trim()
157          Case Else ' obtain node values
158             ' get current node value and remove whitespaces
159             node.Text = xPath.Value.Trim()
160       End Select
161    End Sub ' DetermineType
162 End Class ' FrmPathNavigator
[Screenshots for Fig. 19.25: panels (a) through (e).]

Figure 19.26. XML document that describes various sports.

 1 <?xml version = "1.0"?>
 2 <!-- Fig. 19.26: sports.xml -->
 3 <!-- Sports Database -->
 4
 5 <sports>
 6    <game id = "783">
 7       <name>Cricket</name>
 8
 9       <paragraph>
10          More popular among commonwealth nations.
11       </paragraph>
12    </game>
13
14    <game id = "239">
15       <name>Baseball</name>
16
17       <paragraph>
18          More popular in America.
19       </paragraph>
20    </game>
21
22    <game id = "418">
23       <name>Soccer (Futbol)</name>
24
25       <paragraph>
26          Most popular sport in the world.
27       </paragraph>
28    </game>
29 </sports>

The program of Fig. 19.25 loads XML document sports.xml (Fig. 19.26) into an XPathDocument object by passing the document's file name to the XPathDocument constructor (line 13). Method CreateNavigator (line 14) creates and returns an XPathNavigator reference to the XPathDocument's tree structure.

The navigation methods of XPathNavigator are MoveToFirstChild (line 46), MoveToParent (line 68), MoveToNext (line 97) and MoveToPrevious (line 123). Each method performs the action that its name implies: MoveToFirstChild moves to the first child of the node referenced by the XPathNavigator, MoveToParent moves to that node's parent, MoveToNext moves to its next sibling and MoveToPrevious moves to its previous sibling. Each method returns a Boolean indicating whether the move was successful; when a move fails, the navigator stays on its current node and we display a message in a MessageBox. Each method is called in the event handler of the button that matches its name (e.g., clicking the First Child button in Fig. 19.25(a) triggers btnFirstChild_Click, which calls MoveToFirstChild). Whenever we move forward using the XPathNavigator, as with MoveToFirstChild and MoveToNext, nodes are added to the TreeNode node list.
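The four navigation methods, and the fact that a failed move leaves the navigator where it was, can be traced outside the GUI. The console sketch below is our own (NavigatorSketch, Describe and TraceMoves are hypothetical names) and uses a trimmed, single-line version of sports.xml with two games. It performs a fixed sequence of moves and records, after each one, either the node's name and id attribute or "(no move)".

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml.XPath

Module NavigatorSketch
    ' Labels the navigator's position after a move: "(no move)" if the move failed,
    ' otherwise the node name plus its id attribute, if any.
    Private Function Describe(moved As Boolean, nav As XPathNavigator) As String
        If Not moved Then Return "(no move)" ' a failed move leaves the navigator where it was
        Dim id As String = nav.GetAttribute("id", String.Empty)
        Return If(id = String.Empty, nav.Name, nav.Name & "#" & id)
    End Function

    ' Performs a fixed sequence of moves and records the result of each one.
    Public Function TraceMoves(xml As String) As List(Of String)
        Dim document As New XPathDocument(New StringReader(xml))
        Dim nav As XPathNavigator = document.CreateNavigator() ' starts on the root node
        Dim trace As New List(Of String)()
        trace.Add(Describe(nav.MoveToFirstChild(), nav)) ' root -> sports
        trace.Add(Describe(nav.MoveToFirstChild(), nav)) ' sports -> first game
        trace.Add(Describe(nav.MoveToNext(), nav))       ' first game -> second game
        trace.Add(Describe(nav.MoveToNext(), nav))       ' no third game, so this fails
        trace.Add(Describe(nav.MoveToPrevious(), nav))   ' back to the first game
        trace.Add(Describe(nav.MoveToParent(), nav))     ' up to sports
        Return trace
    End Function

    Sub Main()
        ' trimmed, two-game stand-in for sports.xml (assumption)
        Dim xml As String = "<sports><game id=""783""><name>Cricket</name></game>" &
            "<game id=""239""><name>Baseball</name></game></sports>"
        Console.WriteLine(String.Join(" | ", TraceMoves(xml)))
    End Sub
End Module
```

For this input the trace should read sports, game#783, game#239, (no move), game#783, sports. Attributes are not treated as children here: MoveToFirstChild on a game element goes to its name element, not to its id attribute.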
The Private method DetermineType (lines 150-161) determines whether to assign the navigator's Name property or Value property to the TreeNode (lines 156 and 159). Whenever MoveToParent is called, all the children of the parent node are removed from the display. Similarly, a call to MoveToPrevious removes the current sibling node. Note that the nodes are removed only from the TreeView, not from the tree representation of the document.

The btnSelect_Click event handler (lines 27-38) corresponds to the Select button. XPathNavigator method Select (line 32) takes search criteria in the form of either an XPathExpression or a String that represents an XPath expression, and returns an XPathNodeIterator over the nodes that match the search criteria. If the expression is not valid XPath, Select throws an XPathException, which lines 34-36 catch and report. Figure 19.27 summarizes the XPath expressions provided by this program's combo box. We show the results of some of these expressions in Figs. 19.25(b)-(d).

Figure 19.27. XPath expressions and descriptions.

| XPath Expression | Description |
|---|---|
| /sports | Matches all sports nodes that are child nodes of the document root node. |
| /sports/game | Matches all game nodes that are child nodes of sports, which is a child of the document root. |
| /sports/game/name | Matches all name nodes that are child nodes of game. The game is a child of sports, which is a child of the document root. |
| /sports/game/paragraph | Matches all paragraph nodes that are child nodes of game. The game is a child of sports, which is a child of the document root. |
| /sports/game[name='Cricket'] | Matches all game nodes that contain a name child element whose value is Cricket. The game is a child of sports, which is a child of the document root. |
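To check what each expression in Fig. 19.27 selects, a small console sketch can count the matches with Select and the iterator's Count property. The names XPathSelectSketch, SportsXml and CountMatches are our own, and SportsXml is a trimmed, single-line stand-in for sports.xml, not Fig. 19.26 verbatim.

```vb
Imports System
Imports System.IO
Imports System.Xml.XPath

Module XPathSelectSketch
    ' Trimmed, single-line stand-in for sports.xml (assumption, not Fig. 19.26 verbatim)
    Public Const SportsXml As String = "<sports>" &
        "<game id=""783""><name>Cricket</name><paragraph>Commonwealth</paragraph></game>" &
        "<game id=""239""><name>Baseball</name><paragraph>America</paragraph></game>" &
        "<game id=""418""><name>Soccer (Futbol)</name><paragraph>World</paragraph></game>" &
        "</sports>"

    ' Returns how many nodes an XPath expression selects; an invalid
    ' expression makes Select throw an XPathException.
    Public Function CountMatches(xml As String, expression As String) As Integer
        Dim document As New XPathDocument(New StringReader(xml))
        Return document.CreateNavigator().Select(expression).Count
    End Function

    Sub Main()
        Dim expressions As String() = {"/sports", "/sports/game", "/sports/game/name",
                                       "/sports/game/paragraph", "/sports/game[name='Cricket']"}
        For Each expression As String In expressions
            Console.WriteLine(expression & " -> " & CountMatches(SportsXml, expression))
        Next
    End Sub
End Module
```

With three games, the expressions should select 1, 3, 3, 3 and 1 nodes respectively; the predicate in the last row narrows the three game nodes down to the one whose name child is Cricket.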
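Method DisplayIterator (lines 140-147) appends the node values from the given XPathNodeIterator to the txtSelect TextBox. Note that we call String method Trim to remove unnecessary whitespace. Method MoveNext (line 144) advances the iterator to the next matching node and returns False when no matches remain; property Current (line 145) then provides an XPathNavigator positioned on that node, whose Value property returns the node's text (for an element, the concatenated text of all its descendants).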
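The DisplayIterator pattern can be sketched as a function that returns the selected values instead of writing them to a TextBox. This is a hypothetical helper of our own (IteratorSketch and SelectValues are not part of Fig. 19.25), and the inline XML deliberately pads one name with spaces to show the effect of Trim.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml.XPath

Module IteratorSketch
    ' Collects the trimmed string value of every node an XPath expression selects.
    Public Function SelectValues(xml As String, expression As String) As List(Of String)
        Dim document As New XPathDocument(New StringReader(xml))
        Dim iterator As XPathNodeIterator = document.CreateNavigator().Select(expression)
        Dim values As New List(Of String)()

        ' The iterator starts before the first match, so MoveNext is called before Current is read
        While iterator.MoveNext()
            ' Current is an XPathNavigator on the match; Value is its text content
            values.Add(iterator.Current.Value.Trim())
        End While
        Return values
    End Function

    Sub Main()
        Dim xml As String = "<sports><game><name> Cricket </name></game>" &
            "<game><name>Baseball</name></game></sports>"
        For Each value As String In SelectValues(xml, "/sports/game/name")
            Console.WriteLine("[" & value & "]")
        Next
    End Sub
End Module
```

The brackets in the output make the trimming visible: both values should print without surrounding spaces, as [Cricket] and [Baseball].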