19.8. (Optional) Document Object Model (DOM)

Although an XML document is a text file, retrieving data from the document using traditional sequential file processing techniques is neither practical nor efficient, especially for adding and removing elements dynamically.

Upon successfully parsing a document, some XML parsers store document data as tree structures in memory. Figure 19.21 illustrates the tree structure for the root element of the document article.xml discussed in Fig. 19.2. This hierarchical tree structure is called a Document Object Model (DOM) tree, and an XML parser that creates this type of structure is known as a DOM parser. Each element name (e.g., article, date, firstName) is represented by a node. A node that contains other nodes (called child nodes or children) is called a parent node (e.g., author). A parent node can have many children, but a child node can have only one parent node. Nodes that are peers (e.g., firstName and lastName) are called sibling nodes. A node's descendant nodes include its children, its children's children and so on. A node's ancestor nodes include its parent, its parent's parent and so on.

Figure 19.21. Tree structure for the document article.xml of Fig. 19.2.

The DOM tree has a single root node, which contains all the other nodes in the document. For example, the root node of the DOM tree that represents article.xml (Fig. 19.2) contains a node for the XML declaration (line 1), two nodes for the comments (lines 2–3) and a node for the XML document's root element article (line 5).

Classes for creating, reading and manipulating XML documents are located in the FCL namespace System.Xml. This namespace also contains additional namespaces that provide other XML-related operations.

Reading an XML Document with an XmlReader

In this section, we present several examples that use DOM trees. Our first example, the program in Fig. 19.22, loads the XML document presented in Fig. 19.2 and displays its data in a text box. This example uses class XmlReader to iterate through each node in the XML document.

Figure 19.22. XmlReader iterating through an XML document.
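As a rough illustration of the kind of reading loop this example is built around, the following console sketch walks a small in-memory document with an XmlReader and builds an indented outline. It is not the listing of Fig. 19.22, so the line numbers cited in the discussion do not apply to it. The names XmlReaderSketch, SampleXml, Outline and Indent are assumptions made for the sketch, the sample document is a hypothetical stand-in for article.xml, and the reader is created from a StringReader rather than from a file name.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module XmlReaderSketch
   ' Hypothetical stand-in for article.xml (the real document is Fig. 19.2).
   Private Const SampleXml As String = _
      "<?xml version=""1.0""?>" & _
      "<!-- sample document -->" & _
      "<article><title>Simple XML</title><summary/></article>"

   ' Returns an indented outline of the nodes visited by an XmlReader.
   Function Outline(ByVal xmlText As String) As String
      Dim settings As New XmlReaderSettings() ' default settings, no validation
      Dim output As New StringBuilder()
      Dim depth As Integer = 0 ' current indentation level

      Using reader As XmlReader = _
         XmlReader.Create(New StringReader(xmlText), settings)

         While reader.Read() ' returns False once no nodes remain
            Select Case reader.NodeType
               Case XmlNodeType.Element ' start tag or empty element
                  Indent(output, depth)
                  output.AppendLine("<" & reader.Name & ">")
                  ' an empty element has no end tag, so it adds no nesting
                  If Not reader.IsEmptyElement Then depth += 1
               Case XmlNodeType.Comment
                  Indent(output, depth)
                  output.AppendLine("<!--" & reader.Value & "-->")
               Case XmlNodeType.Text
                  Indent(output, depth)
                  output.AppendLine(reader.Value)
               Case XmlNodeType.XmlDeclaration
                  Indent(output, depth)
                  output.AppendLine("<?" & reader.Name & " " & reader.Value & "?>")
               Case XmlNodeType.EndElement
                  depth -= 1
                  Indent(output, depth)
                  output.AppendLine("</" & reader.Name & ">")
            End Select
         End While
      End Using

      Return output.ToString()
   End Function

   ' Appends three spaces per nesting level.
   Private Sub Indent(ByVal output As StringBuilder, ByVal depth As Integer)
      output.Append(New String(" "c, depth * 3))
   End Sub

   Sub Main()
      Console.Write(Outline(SampleXml))
   End Sub
End Module
```

The sketch raises depth only for non-empty elements, which has the same net effect as incrementing on every element and decrementing again for empty ones.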
Line 3 imports the System.Xml namespace, which contains the XML classes used in this example. Class XmlReader is a MustInherit class that defines the interface for reading XML documents. We cannot create an XmlReader object directly. Instead, we must invoke XmlReader's Shared method Create to obtain an XmlReader reference (line 11). Before doing so, however, we must prepare an XmlReaderSettings object that specifies how we would like the XmlReader to behave (line 10). In this example, we use the default settings of the properties of an XmlReaderSettings object. Later, you will learn how to set certain properties of the XmlReaderSettings class to instruct the XmlReader to perform validation, which it does not do by default. The Shared method Create receives as arguments the name of the XML document to read and an XmlReaderSettings object. In this example, the XML document article.xml (Fig. 19.2) is opened when method Create is invoked in line 11.

Once the XmlReader is created, the XML document's contents can be read programmatically. Method Read of XmlReader reads the next node in the document. By calling this method in the loop condition (line 15), reader reads all the document nodes. The Select Case statement (lines 16–40) processes each node. Either the Name property (lines 20, 34 and 38), which contains the node's name, or the Value property (lines 28 and 31), which contains the node's data, is formatted and concatenated to the String assigned to the TextBox's Text property. The XmlReader's NodeType property specifies whether the node is an element, comment, text, XML declaration or end element. Note that each Case specifies a node type using XmlNodeType enumeration constants. For example, XmlNodeType.Element (line 17) indicates the start tag of an element.

The displayed output emphasizes the structure of the XML document. Variable depth (line 13) maintains the number of tab characters to indent each element. We increment depth each time the program encounters an Element and decrement it each time the program encounters an EndElement or an empty element. We use a similar technique in the next example to emphasize the tree structure of the XML document being displayed.

Displaying a DOM Tree Graphically in a TreeView Control

XmlReaders do not provide features for displaying their content graphically. In this example, we display an XML document's contents using a TreeView control. We use class TreeNode to represent each node in the tree. Class TreeView and class TreeNode are part of the System.Windows.Forms namespace. TreeNodes are added to the TreeView to emphasize the structure of the XML document.

The program in Fig. 19.23 demonstrates how to manipulate a DOM tree programmatically to display it graphically in a TreeView control. The GUI for this application contains a TreeView control named treeXML (declared in FrmXmlDom.Designer.vb). The application loads letter.xml (Fig. 19.24) into an XmlReader (line 17), then displays the document's tree structure in the TreeView control. [Note: The version of letter.xml in Fig. 19.24 is nearly identical to the one in Fig. 19.4, except that Fig. 19.24 does not reference a DTD as line 5 of Fig. 19.4 does.]

Figure 19.23. DOM structure of an XML document displayed in a TreeView.
Figure 19.24. Business letter marked up as XML.
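The tree-building idea behind this example can be sketched without a GUI. The console sketch below is not the listing of Fig. 19.23: it assumes a small hypothetical class, SimpleNode, standing in for System.Windows.Forms.TreeNode (with a Text field, a Parent reference and a list of children), and a hypothetical, heavily abbreviated stand-in for letter.xml. The names BuildTreeSketch, Build and Print are likewise assumptions. Non-empty elements descend one level, end elements climb back to the parent, and text nodes and empty elements become leaves.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

' Hypothetical stand-in for System.Windows.Forms.TreeNode.
Public Class SimpleNode
   Public Text As String
   Public Parent As SimpleNode
   Public ReadOnly Children As New List(Of SimpleNode)()

   Public Sub New(ByVal nodeText As String)
      Text = nodeText
   End Sub

   ' Adds child to this node's list and returns the child.
   Public Function Add(ByVal child As SimpleNode) As SimpleNode
      child.Parent = Me
      Children.Add(child)
      Return child
   End Function
End Class

Module BuildTreeSketch
   ' Builds a SimpleNode tree that mirrors the nodes an XmlReader visits.
   Function Build(ByVal xmlText As String, ByVal rootText As String) As SimpleNode
      Dim settings As New XmlReaderSettings()
      settings.IgnoreWhitespace = True ' skip insignificant whitespace

      Dim root As New SimpleNode(rootText)
      Dim current As SimpleNode = root ' current location in the tree

      Using reader As XmlReader = _
         XmlReader.Create(New StringReader(xmlText), settings)

         While reader.Read()
            Select Case reader.NodeType
               Case XmlNodeType.Text ' leaf holding the text value
                  current.Add(New SimpleNode(reader.Value))
               Case XmlNodeType.EndElement ' element finished: move up
                  current = current.Parent
               Case XmlNodeType.Element
                  If Not reader.IsEmptyElement Then
                     ' non-empty element: add it and descend into it
                     current = current.Add(New SimpleNode(reader.Name))
                  Else
                     ' empty element: leaf labeled with the node type
                     current.Add(New SimpleNode(reader.NodeType.ToString()))
                  End If
               Case Else ' any other node type: leaf labeled with its type
                  current.Add(New SimpleNode(reader.NodeType.ToString()))
            End Select
         End While
      End Using

      Return root
   End Function

   ' Prints the tree with three spaces of indentation per level.
   Sub Print(ByVal node As SimpleNode, ByVal depth As Integer)
      Console.WriteLine(New String(" "c, depth * 3) & node.Text)
      For Each child As SimpleNode In node.Children
         Print(child, depth + 1)
      Next
   End Sub

   Sub Main()
      ' Hypothetical, abbreviated stand-in for letter.xml.
      Dim sample As String = _
         "<letter><contact type=""sender""><name>Jane Doe</name>" & _
         "<flag gender=""F""/></contact>" & _
         "<salutation>Dear Sir:</salutation></letter>"
      Print(Build(sample, "letter.xml"), 0)
   End Sub
End Module
```

Keeping a reference to the current node and reassigning it on start and end tags avoids recursion; the reader's forward-only order is enough to rebuild the nesting.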
In FrmXmlDom's Load event handler (lines 9–22), lines 13–14 create an XmlReaderSettings object and set its IgnoreWhitespace property to True so that the insignificant whitespace in the XML document is ignored. Line 17 then invokes Shared XmlReader method Create to parse and load letter.xml. Line 18 creates the TreeNode tree (declared in line 6). This TreeNode is used as a graphical representation of a DOM tree node in the TreeView control. Line 19 assigns the XML document's name (i.e., letter.xml) to tree's Text property. Line 20 calls method Add to add the new TreeNode to the TreeView's Nodes collection. Line 21 calls our Private method BuildTree to update the TreeView so that it displays the complete DOM tree.

Method BuildTree (lines 25–60) receives an XmlReader for reading the XML document and a TreeNode, treeNode, referencing the current location in the tree (i.e., the TreeNode most recently added to the TreeView control). Line 28 declares TreeNode reference newNode, which will be used for adding new nodes to the TreeView. Lines 30–55 iterate through each node in the XML document's DOM tree.

The Select Case statement in lines 32–52 adds a node to the TreeView, based on the XmlReader's current node. When a text node is encountered, the Text property of the new TreeNode, newNode, is assigned the current node's value (line 34). Line 35 adds this TreeNode to treeNode's node list (i.e., adds the node to the TreeView control).

Line 36 matches an EndElement node type. This Case moves up the tree to the current node's parent because the end of an element has been encountered. Line 37 accesses TreeNode's Parent property to retrieve the node's current parent.

Line 38 matches Element node types. Each non-empty Element NodeType (line 40) increases the depth of the tree; thus, we assign the current reader.Name to newNode's Text property and add newNode to treeNode's node list (lines 41–42). Line 43 assigns newNode's reference to treeNode to ensure that treeNode refers to the last child TreeNode in the node list. If the current Element node is an empty element (line 44), we assign to newNode's Text property the string representation of the NodeType (line 46). Next, newNode is added to treeNode's node list (line 47). The default case (lines 49–51) assigns the string representation of the node type to newNode's Text property, then adds newNode to treeNode's node list.

After the entire DOM tree is processed, the treeNode node list is displayed in the TreeView control (lines 58–59). TreeView method ExpandAll causes all the nodes of the tree to be displayed. TreeView method Refresh updates the display to show the newly added TreeNodes. Note that while the application is running, clicking nodes (i.e., the + or - boxes) in the TreeView either expands or collapses them.

Locating Data in XML Documents with XPath

Although XmlReader includes methods for reading node values, it is not the most efficient means of locating data in a DOM tree. The Framework Class Library provides class XPathNavigator in the System.Xml.XPath namespace for iterating through node lists that match search criteria, which are written as XPath expressions. Recall that XPath (XML Path Language) provides a syntax for locating specific nodes in XML documents effectively and efficiently. XPath is a string-based language of expressions used by XML and many of its related technologies (such as XSLT, discussed in Section 19.7).

Figure 19.25 uses an XPathNavigator to navigate an XML document and uses a TreeView control and TreeNode objects to display the XML document's structure. In this example, the TreeNode node list is updated each time the XPathNavigator is positioned to a new node, rather than displaying the entire DOM tree at once. Nodes are added to and deleted from the TreeView to reflect the XPathNavigator's location in the DOM tree. Figure 19.26 shows the XML document sports.xml that we use in this example. [Note: The versions of sports.xml presented in Fig. 19.26 and Fig. 19.16 are nearly identical. In the current example, we do not want to apply an XSLT, so we omit the processing instruction found in line 2 of Fig. 19.16.]

Figure 19.25. XPathNavigator navigating selected nodes.
Figure 19.26. XML document that describes various sports.
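To make the cursor-style navigation of an XPathNavigator concrete, here is a minimal console sketch. It is not the listing of Fig. 19.25. The document is a hypothetical, abbreviated stand-in for sports.xml, and the names NavigatorSketch, SampleXml, Describe and Report are assumptions made for the sketch. It shows that each move method returns a Boolean and that a failed move leaves the navigator where it was.

```vb
Imports System
Imports System.IO
Imports System.Xml.XPath

Module NavigatorSketch
   ' Hypothetical, abbreviated stand-in for sports.xml (Fig. 19.26).
   Private Const SampleXml As String = _
      "<sports>" & _
      "<game id=""783""><name>Cricket</name></game>" & _
      "<game id=""239""><name>Baseball</name></game>" & _
      "</sports>"

   ' Returns a short label for the node the navigator is positioned on.
   Function Describe(ByVal navigator As XPathNavigator) As String
      If navigator.NodeType = XPathNodeType.Root Then Return "(root)"
      If navigator.NodeType = XPathNodeType.Text Then Return navigator.Value
      Return navigator.Name
   End Function

   ' Prints whether a move succeeded and where the navigator is now.
   Sub Report(ByVal action As String, ByVal moved As Boolean, _
      ByVal navigator As XPathNavigator)

      If moved Then
         Console.WriteLine(action & " -> " & Describe(navigator))
      Else
         Console.WriteLine(action & " failed; still at " & Describe(navigator))
      End If
   End Sub

   Sub Main()
      Dim document As New XPathDocument(New StringReader(SampleXml))
      Dim navigator As XPathNavigator = document.CreateNavigator()

      ' The navigator starts at the root node, above the sports element.
      Report("MoveToFirstChild", navigator.MoveToFirstChild(), navigator)
      Report("MoveToFirstChild", navigator.MoveToFirstChild(), navigator)
      Report("MoveToNext", navigator.MoveToNext(), navigator)
      Report("MoveToNext", navigator.MoveToNext(), navigator) ' no sibling left
      Report("MoveToPrevious", navigator.MoveToPrevious(), navigator)
      Report("MoveToParent", navigator.MoveToParent(), navigator)
      Report("MoveToParent", navigator.MoveToParent(), navigator)
      Report("MoveToParent", navigator.MoveToParent(), navigator) ' root has no parent
   End Sub
End Module
```

Attributes such as id are not children in the XPath data model, so MoveToFirstChild on a game element lands on its name element.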
The program of Fig. 19.25 loads XML document sports.xml (Fig. 19.26) into an XPathDocument object by passing the document's file name to the XPathDocument constructor (line 13). Method CreateNavigator (line 14) creates and returns an XPathNavigator reference to the XPathDocument's tree structure.

The navigation methods of XPathNavigator are MoveToFirstChild (line 46), MoveToParent (line 68), MoveToNext (line 97) and MoveToPrevious (line 123). Each method performs the action that its name implies. Method MoveToFirstChild moves to the first child of the node referenced by the XPathNavigator, MoveToParent moves to the parent of that node, MoveToNext moves to its next sibling and MoveToPrevious moves to its previous sibling. Each method returns a Boolean indicating whether the move was successful. Whenever a move operation fails, we display a warning in a MessageBox. Furthermore, each method is called in the event handler of the button that matches its name (e.g., clicking the First Child button in Fig. 19.25(a) triggers btnFirstChild_Click, which calls MoveToFirstChild).

Whenever we move forward using XPathNavigator, as with MoveToFirstChild and MoveToNext, nodes are added to the TreeNode node list. The Private method DetermineType (lines 150–161) determines whether to assign the node's Name property or Value property to the TreeNode (lines 156 and 159). Whenever MoveToParent is called, all the children of the parent node are removed from the display. Similarly, a call to MoveToPrevious removes the current sibling node. Note that the nodes are removed only from the TreeView, not from the tree representation of the document.

The btnSelect_Click event handler (lines 27–38) corresponds to the Select button. XPathNavigator method Select (line 32) takes search criteria in the form of either an XPathExpression or a String that represents an XPath expression, and returns an XPathNodeIterator over the nodes that match the search criteria. Figure 19.27 summarizes the XPath expressions provided by this program's combo box. We show the result of some of these expressions in Figs. 19.25(b)–(d).

Method DisplayIterator (defined in lines 140–147) appends the node values from the given XPathNodeIterator to the txtSelect TextBox. Note that we call String method Trim to remove unnecessary whitespace. Method MoveNext (line 144) advances to the next node, which property Current (line 145) can access.
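The Select, MoveNext and Current pattern described above can be sketched in a few lines. This is a console illustration under stated assumptions, not the book's DisplayIterator: the document is a hypothetical, abbreviated stand-in for sports.xml, the names SelectSketch, SampleXml and SelectValues are made up for the sketch, and the XPath expressions are illustrative rather than necessarily those listed in Fig. 19.27.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml.XPath

Module SelectSketch
   ' Hypothetical, abbreviated stand-in for sports.xml (Fig. 19.26).
   Private Const SampleXml As String = _
      "<sports>" & _
      "<game id=""783""><name>Cricket</name>" & _
      "<paragraph>  Played with a bat and ball.  </paragraph></game>" & _
      "<game id=""239""><name>Baseball</name>" & _
      "<paragraph>  Also played with a bat and ball.  </paragraph></game>" & _
      "</sports>"

   ' Returns the trimmed value of every node matching expression, one per line.
   Function SelectValues(ByVal navigator As XPathNavigator, _
      ByVal expression As String) As String

      Dim matches As XPathNodeIterator = navigator.Select(expression)
      Dim output As New StringBuilder()

      While matches.MoveNext() ' advance to the next matching node
         ' Current is a navigator positioned on the matching node
         output.AppendLine(matches.Current.Value.Trim())
      End While

      Return output.ToString()
   End Function

   Sub Main()
      Dim document As New XPathDocument(New StringReader(SampleXml))
      Dim navigator As XPathNavigator = document.CreateNavigator()

      ' names of all games
      Console.Write(SelectValues(navigator, "/sports/game/name"))

      ' paragraph of the game whose name is Cricket
      Console.Write(SelectValues(navigator, _
         "/sports/game[name='Cricket']/paragraph"))

      ' id attribute of every game
      Console.Write(SelectValues(navigator, "/sports/game/@id"))
   End Sub
End Module
```

Passing the expression as a String is convenient for one-off queries; an XPathExpression is the alternative form that Select accepts.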