Working with DOM Trees


The DOM is a specification for how to store and manipulate XML documents in memory. This differs significantly from the forward-only access just discussed, because for that method only a single node of the XML document is in memory at any one time. Having the entire document in memory has some major advantages and a couple of significant disadvantages compared to forward-only access.

The most important advantage is that because the entire XML document is in memory, you have the ability to access any portion of the XML document at any time. This means you can read, search, write, change, and delete anywhere at any time in the document. Best of all, once you are through, you can dump the XML document back to disk with a single command.

The major disadvantages are that the DOM tree uses up a lot more memory than forward-only access and that there is a slight delay as the DOM tree is loaded. Are these disadvantages significant? In most cases the answer is not really. Most computers have more than enough memory to handle all but the very largest XML documents (and when a document gets that large, the data should probably be in a database anyway). The slight delay is usually masked in the start-up of the application, and for the delay to be noticeable at all, the XML document needs to be quite sizable. (Again, when an XML document gets that large, it should probably be placed in a database.)

The core underlying class of the DOM tree is the abstract class XmlNode. You should be able to get comfortable quickly with XmlNode, as the classes derived from XmlNode have a close resemblance to the node types you worked with in the previous section. As you can see in Table 13-3, every type of node that is part of an XML document inherits from XmlNode. In fact, even the XmlDocument class is inherited from XmlNode.

Table 13-3: Classes Derived from XmlNode

CLASS                       DESCRIPTION
XmlAttribute                Represents an attribute
XmlCDataSection             Represents a CDATA section
XmlCharacterData            Provides text manipulation methods that are used by several inherited classes
XmlComment                  Represents an XML comment
XmlDataDocument             Provides the ability to store, retrieve, and manipulate data through a relational DataSet
XmlDeclaration              Represents the XML declaration node
XmlDocument                 Represents an XML document
XmlDocumentFragment         Represents a fragment or hierarchical branch of the XML document tree
XmlDocumentType             Represents the DTD
XmlElement                  Represents an element
XmlEntity                   Represents an entity declaration
XmlEntityReference          Represents an entity reference node
XmlLinkedNode               Provides the ability to get the node before and after the current node
XmlNotation                 Represents a notation declaration
XmlProcessingInstruction    Represents a processing instruction
XmlSignificantWhitespace    Represents white space between markup in a mixed content mode or white space within an xml:space='preserve' scope
XmlText                     Represents the text content of an element or attribute
XmlWhitespace               Represents white space in element content

Because it's easier to visualize the XmlNode hierarchy than describe it in text, I've included the following illustration:

[Illustration: the XmlNode class hierarchy]

You use the properties and the methods defined in the XmlNode class to navigate, manipulate, and remove the nodes of the DOM tree. Here are some of the more common XmlNode properties:

  • Attributes is an XmlAttributeCollection containing the attributes of the current node.

  • ChildNodes is an XmlNodeList containing all the child nodes of the current node.

  • FirstChild is the XmlNode that is the first child of the current node. (For an XmlDocument, this is typically the XML declaration.) If the node has no children, the value is null.

  • HasChildNodes is a Boolean that is true if the node has any children; otherwise, it is false.

  • InnerText is a String concatenation of the value of the current node and all of its children.

  • InnerXml is a String representing the markup of the children of the current node. Setting this property replaces all the children of the current node.

  • IsReadOnly is a Boolean that is true if the node is read-only; otherwise, it is false.

  • Item is the first XmlElement child of the current node that has the specified name.

  • LastChild is an XmlNode of the last child of the current node.

  • LocalName is a String representing the name of the current node without the namespace prefix.

  • Name is a String representing the qualified name of the current node.

  • NextSibling is the XmlNode with the same parent immediately following the current node. It has a value of null if no subsequent sibling exists.

  • NodeType is an XmlNodeType enum that represents the node type (see Table 13-2) of the current node.

  • OuterXml is a String representing the markup of the current node and of the children of the current node.

  • OwnerDocument is the XmlDocument to which the current node belongs.

  • ParentNode is the XmlNode of the parent of the current node.

  • PreviousSibling is the XmlNode with the same parent immediately before the current node. It has a value of null if no prior sibling exists.

  • Value is a String representing the value of the current node.

As mentioned previously, XmlNode has methods. Here are some of the more common ones:

  • AppendChild() adds a child to the end of the list of children for the current node.

  • CloneNode() creates a duplicate of the current node.

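  • CreateAttribute() creates an XmlAttribute. (Strictly speaking, this method is defined by XmlDocument rather than XmlNode.)

  • CreateNavigator() creates an XPathNavigator.

  • CreateNode() creates an XmlNode. (This method is also defined by XmlDocument.)

  • InsertAfter() inserts a node into the current node's list of children, immediately after a specified reference child.

  • InsertBefore() inserts a node into the current node's list of children, immediately before a specified reference child.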

  • PrependChild() adds a child at the beginning of the list of children for the current node.

  • RemoveAll() removes all the children and attributes of the current node.

  • RemoveChild() removes the specified child node.

  • ReplaceChild() replaces the specified child node.

  • SelectNodes() selects a list of nodes that matches a specified XPath expression.

  • SelectSingleNode() selects the first node that matches a specified XPath expression.

  • WriteContentTo() saves all the children of the current node to an XmlWriter.

  • WriteTo() saves the current node to an XmlWriter.

XmlNodes are placed in an XmlNodeList. This list is ordered and supports indexed as well as enumerated access. Any changes that you make to the XmlNodes in the DOM tree are immediately reflected in the XmlNodeList in which the XmlNodes reside. You can find the root of all XmlNodeLists in the DocumentElement property of the XmlDocument class.

The starting point of working with DOM trees is the XmlDocument class. Not only do you use this class to load and save the XML document to and from disk, but you also use it to query the DOM tree and create nodes to be added to the tree. As you might have noticed in Table 13-3, XmlDocument inherits from XmlNode, so the XmlDocument class has all the XmlNode class's properties and methods. Here are some of the more common properties unique to XmlDocument:

  • DocumentElement is an XmlElement representing the root element of the document.

  • DocumentType is an XmlDocumentType containing the DocumentType or DOCTYPE declaration if the document has one.

  • PreserveWhitespace is a Boolean that is true if white space is to be preserved; otherwise, it is false.

As you can see, the XmlDocument class provides quite a bit of additional functionality over the XmlNode class. The following are some of the XmlDocument class's unique methods:

  • CreateCDataSection() creates an XmlCDataSection.

  • CreateComment() creates an XmlComment.

  • CreateDocumentFragment() creates an XmlDocumentFragment.

  • CreateDocumentType() creates an XmlDocumentType.

  • CreateElement() creates an XmlElement.

  • CreateEntityReference() creates an XmlEntityReference.

  • CreateTextNode() creates an XmlText.

  • CreateXmlDeclaration() creates an XmlDeclaration.

  • GetElementById() gets an element based on a specified ID.

  • GetElementsByTagName() gets an XmlNodeList of all elements that match the specified tag.

  • ImportNode() imports a node from another XmlDocument.

  • Load() loads into the XmlDocument a File, Stream, TextReader, or XmlReader.

  • LoadXml() loads into the XmlDocument a String.

  • ReadNode() creates an XmlNode based on the current position of an XmlReader.

  • Save() saves the XmlDocument to a specified filename, Stream, TextWriter, or XmlWriter.

Reading a DOM Tree

You have many different ways of navigating through a DOM tree. You'll start out by using only the basic methods found in XmlDocument, XmlNode, and XmlNodeList. Later you'll look at an easier way of navigating using XPaths.

Because the DOM is stored as a tree in memory, it's a good candidate for navigating via recursion. The example in Listing 13-9 recursively follows each branch of the tree and dumps the information of every node it passes along the way. You dump the tree to a ListBox. (The code for the ListBox isn't included. Chapter 9 covers the ListBox.)

Listing 13-9: Reading a DOM Tree Recursively

    void BuildListBox()
    {
        XmlDocument *doc = new XmlDocument();
        try
        {
            XmlTextReader *reader = new XmlTextReader(S"Monsters.xml");
            doc->Load(reader);
            reader->Close();

            XmlNode *node = doc->FirstChild;  // I want the Xml Declaration

            // Recursive navigation of the DOM tree
            Navigate(node, 0);
        }
        catch (Exception *e)
        {
            MessageBox::Show(e->Message, S"Navigate Aborted");
        }
    }

    void Navigate(XmlNode *node, Int32 depth)
    {
        if (node == 0)
            return;

        Output->Items->Add(String::Format(S"{0}: Name='{1}' Value='{2}'",
            String::Concat(indent(depth), __box(node->NodeType)->ToString()),
            node->Name, node->Value));

        if (node->Attributes != 0)
        {
            for (Int32 i = 0; i < node->Attributes->Count; i++)
            {
                Output->Items->Add(String::Format(
                    S"{0}Attribute: Name='{1}' Value='{2}'",
                    indent(depth+1),
                    node->Attributes->ItemOf[i]->Name,
                    node->Attributes->ItemOf[i]->Value));
            }
        }
        Navigate(node->FirstChild, depth+1);
        Navigate(node->NextSibling, depth);
    }

As I stated before, you process all XML documents within an exception try block because every XML method in the .NET Framework class library can throw an exception.

Before you start reading the DOM tree, you need to load it. First, you create an XmlDocument to hold the tree. You do this using a standard constructor:

 XmlDocument *doc = new XmlDocument(); 

Then you load the XML document into the XmlDocument. You can pass the name of the XML file directly to the Load() method, which I think is a little easier. If you load through a reader instead, as shown here, make sure you close the reader once the load is complete. Otherwise the file stays open longer than it needs to, and any attempt to write to the file while it is still open throws an exception.

    XmlTextReader *reader = new XmlTextReader(S"Monsters.xml");
    doc->Load(reader);
    reader->Close();

In the previous example, I use the XmlDocument class's FirstChild property instead of the DocumentElement property because I want to start reading the XML document at the XML declaration and not at the first element of the document.

 XmlNode *node = doc->FirstChild;  // I want the Xml Declaration 

Finally, you call a simple recursive method to navigate the tree. The first thing this method does is check to make sure that you have not already reached the end of the current branch of the tree:

    if (node == 0)
        return;

Then it dumps to the ListBox the current node's type, name, and value. Notice that I use the little trick I mentioned in Chapter 3 to display the enum's (in this case, the NodeType's) String name:

    Output->Items->Add(String::Format(S"{0}: Name='{1}' Value='{2}'",
        String::Concat(indent(depth), __box(node->NodeType)->ToString()),
        node->Name, node->Value));

The method then checks to see if the element has any attributes. If it does, it then iterates through them, dumping each to the ListBox as it goes:

    if (node->Attributes != 0)
    {
        for (Int32 i = 0; i < node->Attributes->Count; i++)
        {
            Output->Items->Add(String::Format(
                S"{0}Attribute: Name='{1}' Value='{2}'",
                indent(depth+1),
                node->Attributes->ItemOf[i]->Name,
                node->Attributes->ItemOf[i]->Value));
        }
    }

The last thing the method does is call itself to navigate down through its children, and then it calls itself to navigate through its siblings:

    Navigate(node->FirstChild, depth+1);
    Navigate(node->NextSibling, depth);
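The child-then-sibling recursion can also be tried outside .NET. The following is a minimal sketch in standard C++ rather than Managed C++. Node is a hypothetical stand-in for XmlNode that keeps only a name and the two links Navigate() follows, and navigate() collects indented names into a vector instead of a ListBox.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for XmlNode (not the .NET class):
// just a name plus the two links the traversal follows.
struct Node {
    std::string name;
    Node *firstChild = nullptr;
    Node *nextSibling = nullptr;
};

// Pre-order walk: record the node, then its children, then its later siblings.
void navigate(const Node *node, int depth, std::vector<std::string> &out) {
    if (node == nullptr)
        return;                                  // end of this branch
    assert(depth >= 0);                          // depth only grows from the first call
    out.push_back(std::string(static_cast<std::size_t>(depth) * 2, ' ') + node->name);
    navigate(node->firstChild, depth + 1, out);  // down one level
    navigate(node->nextSibling, depth, out);     // across the same level
}
```

Called on the root of a small tree with a depth of 0, navigate() fills the vector in document order, indenting each name by two spaces per level. Because the sibling step is itself a recursive call, the call depth grows with the number of siblings as well as with the nesting depth, which might matter for very long child lists.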

Figure 13-5 shows the resulting ListBox dump for ReadXMLDOM.exe of all the nodes and attributes that make up the monster DOM tree.

Figure 13-5: The ListBox dump of the monster DOM tree

Updating a DOM Tree

The process of updating a DOM tree is as simple as finding the correct node and changing the appropriate values. Then, once all of the changes are made, you save the document.

In Listing 13-10, you continue to navigate the DOM tree of Listing 13-1 recursively, but this time you're looking for a Goblin node that was mistakenly given a Dagger. The Goblin was supposed to have a Saber. The trick is that you can't just globally change all Daggers to Sabers, because the Succubus node also has a Dagger, so you have to verify that it is the Goblin node's Dagger. There are many ways of doing this, and I can think of a couple of better ones using flags, but the approach in Listing 13-10 exercises the largest number of different ways to find a node (without being redundant).

Listing 13-10: Updating the Monster DOM Tree

    using namespace System;
    using namespace System::Xml;

    void Navigate(XmlNode *node)
    {
        if (node == 0)
            return;

        if (node->Value != 0 && node->Value->Equals(S"Dagger"))
        {
            if (node->ParentNode->ParentNode->Item[S"Name"]->FirstChild->Value->
                Equals(S"Goblin"))
            {
                node->Value = S"Saber";
                node->ParentNode->Attributes->ItemOf[S"Damage"]->Value = S"1d8";
            }
        }
        Navigate(node->FirstChild);
        Navigate(node->NextSibling);
    }

    Int32 main(void)
    {
        XmlDocument *doc = new XmlDocument();
        try
        {
            doc->Load(S"Monsters.xml");
            XmlNode *root = doc->DocumentElement;

            // Recursive navigation of the DOM tree
            Navigate(root);

            doc->Save(S"New_Monsters.xml");
        }
        catch (Exception *e)
        {
            Console::WriteLine(S"Navigate Aborted: {0}", e->Message);
        }
    }

The main method looks familiar enough. The main difference is that you will write out the DOM tree when you are done to make sure the change actually occurred:

 doc->Save(S"New_Monsters.xml"); 

The recursive function is pretty similar. Let's look closely at the if statement that does the update. First, you make sure the node has a value, as not all nodes have one. Calling the Equals() method on a node that doesn't have a value will cause an exception to be thrown:

 if (node->Value != 0 && node->Value->Equals(S"Dagger")) 

So you now know that you have a node with a value of Dagger. How do you check to make sure it belongs to a Goblin? You do this by checking the current node's grandparent's Name element for the value of Goblin:

    if (node->ParentNode->ParentNode->Item[S"Name"]->FirstChild->Value->
        Equals(S"Goblin"))

What I really want you to focus on in the preceding statement is Item[S"Name"]. The Item property of an XmlNode is an indexed property: given an element name, it returns the first child element with that name, or null if there is none. To reach a child by its numeric position instead, go through the node's ChildNodes list: ChildNodes->Item(0).

To change the value of a node, you simply assign it a new value:

 node->Value = S"Saber"; 

The damage done by a Saber differs from that of a Dagger, so you need to change the Damage attribute of the Weapon node. Notice that it is the Weapon node, not the Saber node. The Saber node is an XmlText node. You need to navigate to the Saber node's parent first and then to its attributes. Notice that the attribute collection has a property similar to XmlNode's Item, but it is called ItemOf.

 node->ParentNode->Attributes->ItemOf[S"Damage"]->Value = S"1d8"; 
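The chained lookup in the listing assumes that every link along the way exists. As a rough illustration of the same check with each step guarded, here is a sketch in standard C++ rather than Managed C++. Element, attach(), childNamed(), and rearm() are all hypothetical names, and the model folds the text node into its element, so it needs one parent hop where the DOM version needs two.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an element (not the .NET class).
struct Element {
    std::string name;
    std::string text;                               // body text, folded into the element
    std::map<std::string, std::string> attributes;
    Element *parent = nullptr;
    std::vector<Element *> children;
};

// Links child under parent (helper for building small trees).
void attach(Element &parent, Element &child) {
    child.parent = &parent;
    parent.children.push_back(&child);
}

// Returns the first child with the given name, or nullptr if there is none.
// This plays the role that Item[S"Name"] plays in the listing.
Element *childNamed(const Element *e, const std::string &name) {
    for (Element *c : e->children)
        if (c->name == name)
            return c;
    return nullptr;
}

// Swaps the weapon only when it belongs to the named monster.
// Returns true if a change was made.
bool rearm(Element *weapon, const std::string &monster,
           const std::string &oldWeapon, const std::string &newWeapon,
           const std::string &newDamage) {
    assert(weapon != nullptr);
    if (weapon->text != oldWeapon || weapon->parent == nullptr)
        return false;
    const Element *owner = childNamed(weapon->parent, "Name");
    if (owner == nullptr || owner->text != monster)
        return false;                               // someone else's weapon
    weapon->text = newWeapon;
    weapon->attributes["Damage"] = newDamage;
    return true;
}
```

With a Goblin and a Succubus that both carry a Dagger, rearm(&goblinWeapon, "Goblin", "Dagger", "Saber", "1d8") changes only the Goblin's weapon, and the same call on the Succubus's weapon returns false and leaves it alone.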

Figure 13-6 shows the new copy of the XML monster file created by UpdateXMLDOM.exe in the Visual Studio .NET editor.

Figure 13-6: The updated XML monster file

Writing XmlNodes in a DOM Tree

You can truly get a good understanding of how a DOM tree is stored in memory by building a few XmlNodes manually. The basic process is to create a node and then append all its children to it. Then for each of the children, append all their children, and so on.

The last example (see Listing 13-11) before you get to XPaths shows how to add a new monster (a Skeleton) after the Goblin.

Listing 13-11: Adding a New Monster to the DOM Tree

    using namespace System;
    using namespace System::Xml;

    XmlElement *CreateMonster(XmlDocument *doc)
    {
        XmlElement *skeleton = doc->CreateElement(S"Monster");

        // <Name>Skeleton</Name>
        XmlElement *name = doc->CreateElement(S"Name");
        name->AppendChild(doc->CreateTextNode(S"Skeleton"));
        skeleton->AppendChild(name);

        // <HitDice Dice="1/2 d12" Default="3" />
        XmlElement *hitdice = doc->CreateElement(S"HitDice");
        XmlAttribute *att = doc->CreateAttribute(S"Dice");
        att->Value = S"1/2 d12";
        hitdice->Attributes->Append(att);
        att = doc->CreateAttribute(S"Default");
        att->Value = S"3";
        hitdice->Attributes->Append(att);
        skeleton->AppendChild(hitdice);

        // <Weapon Number="2" Damage="1d3-1">Claw</Weapon>
        XmlElement *weapon = doc->CreateElement(S"Weapon");
        att = doc->CreateAttribute(S"Number");
        att->Value = S"2";
        weapon->Attributes->Append(att);
        att = doc->CreateAttribute(S"Damage");
        att->Value = S"1d3-1";
        weapon->Attributes->Append(att);
        weapon->AppendChild(doc->CreateTextNode(S"Claw"));
        skeleton->AppendChild(weapon);

        return skeleton;
    }

    Int32 main(void)
    {
        XmlDocument *doc = new XmlDocument();
        try
        {
            doc->Load(S"Monsters.xml");
            XmlNode *root = doc->DocumentElement;

            // Skip comment and goblin
            XmlNode *child = root->FirstChild->NextSibling;

            // Insert new monster
            root->InsertAfter(CreateMonster(doc), child);

            doc->Save(S"New_Monsters.xml");
        }
        catch (Exception *e)
        {
            Console::WriteLine(S"Navigate Aborted: {0}", e->Message);
        }
    }

The method of inserting XmlNodes, though not difficult, needs a quick explanation. At first I wondered why you needed to pass a pointer to the existing XmlNode that the new XmlNode is to be placed before or after. Why not just call the Insert method on that node instead, like this:

    childNode->InsertBefore(newNode);  // wrong
    childNode->InsertAfter(newNode);   // wrong

Then I realized that I am not actually inserting after the child node. Instead I am inserting into the parent node after or before the child node. Thus the correct syntax:

    parentNode->InsertBefore(newNode, childNode);
    parentNode->InsertAfter(newNode, childNode);

Or as in the previous code:

 root->InsertAfter(CreateMonster(doc), child); 
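One way to see why the call goes through the parent is to model the child list directly. The sketch below is standard C++ rather than Managed C++, and Node and insertAfter() are hypothetical stand-ins, not the .NET API. The ordered list of children lives in the parent, so the parent is the only place an insertion can happen, and the reference child merely marks the position.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node: the ordered child list belongs to the parent.
struct Node {
    std::string name;
    std::vector<Node *> children;
};

// Inserts newChild into parent's child list immediately after refChild.
// Returns false (and changes nothing) if refChild is not a child of parent.
bool insertAfter(Node &parent, Node *newChild, const Node *refChild) {
    assert(newChild != nullptr);
    std::vector<Node *>::iterator pos =
        std::find(parent.children.begin(), parent.children.end(), refChild);
    if (pos == parent.children.end())
        return false;                          // refChild marks no position here
    parent.children.insert(pos + 1, newChild); // modify the parent's list
    return true;
}
```

Returning false for an unknown reference child is a choice made for this sketch only. It says nothing about how the real InsertAfter() reports that case.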

As with the writing methods of forward-only access, it seems that a lot of effort is needed to create such a simple XmlElement. Keep in mind, though, that the correct way to do this is without hard-coding the values, which makes the code reusable.

The first issue with creating nodes dynamically is that you need access to the XmlDocument, as all the XmlNode creation methods are found in it. You have two choices: pass the XmlDocument as a parameter, as was done in this example, or make the XmlDocument a private member variable that all the methods of the class can access.

Now that you have access to the creation methods, it is a simple matter to create the element:

 XmlElement *skeleton = doc->CreateElement(S"Monster"); 

Then you create and append any of its child elements:

    XmlElement *weapon = doc->CreateElement(S"Weapon");
    skeleton->AppendChild(weapon);

Of course, to create these child elements, you need to create and append each child element's attribute(s) and body text (which might require creating grandchild nodes, and so on):

    XmlAttribute *att = doc->CreateAttribute(S"Number");
    att->Value = S"2";
    weapon->Attributes->Append(att);
    att = doc->CreateAttribute(S"Damage");
    att->Value = S"1d3-1";
    weapon->Attributes->Append(att);
    weapon->AppendChild(doc->CreateTextNode(S"Claw"));
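The create, set, and append steps repeat for every attribute, which suggests folding them into a reusable helper instead of spelling them out each time. The following is only a sketch of that idea in standard C++ rather than Managed C++. Element, Attributes, makeElement(), and createMonster() are hypothetical names, not part of System::Xml.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Attribute name/value pairs, kept in insertion order.
typedef std::vector<std::pair<std::string, std::string> > Attributes;

// Hypothetical stand-in for XmlElement; each element owns its children.
struct Element {
    std::string name;
    std::string text;
    Attributes attributes;
    std::vector<std::unique_ptr<Element> > children;
};

// Creates an element together with its attributes and body text in one call.
std::unique_ptr<Element> makeElement(const std::string &name,
                                     const Attributes &attributes = Attributes(),
                                     const std::string &text = "") {
    assert(!name.empty());
    std::unique_ptr<Element> e(new Element);
    e->name = name;
    e->attributes = attributes;
    e->text = text;
    return e;
}

// Builds the Skeleton monster of Listing 13-11 with the helper.
std::unique_ptr<Element> createMonster() {
    std::unique_ptr<Element> monster = makeElement("Monster");
    monster->children.push_back(makeElement("Name", Attributes(), "Skeleton"));
    monster->children.push_back(
        makeElement("HitDice", {{"Dice", "1/2 d12"}, {"Default", "3"}}));
    monster->children.push_back(
        makeElement("Weapon", {{"Number", "2"}, {"Damage", "1d3-1"}}, "Claw"));
    return monster;
}
```

In Managed C++, the equivalent helper would also need the XmlDocument, because that is where the creation methods live. For the attribute steps in particular, XmlElement's SetAttribute() method, which takes a name and a value, may be a shorter route.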

Figure 13-7 shows the resulting new copy of the XML monster file from WriteXMLDOM.exe with the new inserted monster in the Visual Studio .NET editor.

Figure 13-7: The XML monster file with a new monster




Managed C++ and .NET Development: Visual Studio .NET 2003 Edition
ISBN: 1590590333
Year: 2005
Pages: 169
