Using XmlDocument

Our handling of XML so far has been forward-only, which is very light on resource usage but isn’t so useful if you need to move around within the XML document. The XmlDocument class is based on the W3C DOM, and it’s the class that you want to use if you need to browse, modify, or create an XML document.

What Is the W3C DOM?

The DOM is a specification for an API that lets programmers manipulate XML held in memory. The DOM specification is language-independent, and bindings are available for many programming languages, including C++. XmlDocument is based on the DOM, with Microsoft extensions.

Because XmlDocument works with XML in memory, it has several advantages and disadvantages when compared with the XmlTextReader forward-only approach.

One advantage is that, in reading the entire document and building a tree in memory, you have access to all the elements and can wander through the document at will. You can also edit the document by changing, adding, or deleting nodes, and you can write the changed document back to disk again. It’s even possible to create an entire XML document from scratch in memory and write it out—serialize it—which is a useful alternative to using XmlTextWriter.

The main disadvantage is that all of an XML document is held in memory at once, so the amount of memory needed by your program is going to be proportional to the size of the XML document you’re working with. Therefore, if you’re working with a very large XML document—or have limited memory—you might not be able to use XmlDocument.

The XmlDocument class has a number of properties, methods, and events, the most important of which are summarized in the following three tables.

Property	Description
Attributes	Gets an XmlAttributeCollection representing the attributes of a node.
ChildNodes	Gets all the child nodes of a node.
DocumentElement	Returns the root element for the document.
DocumentType	Returns the DOCTYPE node, if one is present.
FirstChild, LastChild	Gets the first or last child nodes of a node.
HasChildNodes	Value is true if a node has child nodes.
InnerText	Returns the concatenated values of a node and all its child nodes.
InnerXml	Gets or sets the markup representing the children of the current node.
IsReadOnly	Gets a value indicating whether the current node is read-only.
LocalName	Gets the name of the current node without a namespace prefix.
Name	Gets the fully qualified name of the current node.
NodeType	Gets the type of the current node. The node type will be one of the XmlNodeType values listed in the table on page 409.
OwnerDocument	Gets the XmlDocument to which the current node belongs.
ParentNode	Gets the parent of a node.
PreserveWhitespace	Determines whether white space should be regarded as significant. The default is false.
Value	Gets or sets the value of a node.

Method	Description
AppendChild	Appends a child node to a node
CloneNode	Creates a duplicate of the current node
CreateAttribute	Creates an XmlAttribute object
CreateCDataSection	Creates an XmlCDataSection object
CreateComment	Creates an XmlComment object
CreateDefaultAttribute	Creates a default XmlAttribute object
CreateDocumentType	Creates an XmlDocumentType object
CreateElement	Creates an XmlElement object
CreateEntityReference	Creates an XmlEntityReference object
CreateNavigator	Creates an XPathNavigator for navigating the object and its contents
CreateNode	Creates a plain XmlNode
CreateProcessingInstruction	Creates an XmlProcessingInstruction object
CreateTextNode	Creates an XmlText object
CreateXmlDeclaration	Creates an XmlDeclaration object
GetElementById	Returns an XML element with the specified ID attribute
GetElementsByTagName	Gets a list of descendant nodes matching a name
ImportNode	Imports a node from another document
InsertBefore, InsertAfter	Inserts a node before or after a reference node
Load	Loads XML from a file, a URL, a stream, or an XmlReader object
LoadXml	Loads XML from a string
ReadNode	Creates an XmlNode based on the current position of an XmlReader
RemoveAll	Removes all child nodes and attributes from a node
RemoveChild, ReplaceChild	Removes or replaces a child node
Save	Saves the XML document to a file, a stream, or an XmlWriter
SelectNodes, SelectSingleNode	Select one or more nodes matching an XPath expression
WriteContentTo	Saves all the children of the XmlDocument node to an XmlWriter
WriteTo	Saves the XmlDocument to an XmlWriter

Event	Description
NodeChanged	Fired when the value of a node has been changed
NodeChanging	Fired when the value of a node is about to be changed
NodeInserted	Fired when a node has been inserted
NodeInserting	Fired when a node is about to be inserted
NodeRemoved	Fired when a node has been removed
NodeRemoving	Fired when a node is about to be removed

The XmlNode Class

You’ll notice a lot of references to nodes in the preceding tables. The DOM tree that an XmlDocument object builds in memory is composed of nodes, each of which is an object of a class that inherits from the abstract XmlNode base class. Just about everything in an XML document is represented by a node. For example:

Elements are represented by the XmlElement class.
Attributes are represented by the XmlAttribute class.
The text content of elements is represented by the XmlText class.
Comments are represented by the XmlComment class.
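
To make the idea of a shared node base class concrete, here is a minimal sketch written in standard C++ rather than Managed C++. It is a toy model under stated assumptions, not the System::Xml implementation: the names Node, Element, Text, Comment, and innerText are hypothetical stand-ins that only mirror the roles of XmlNode, XmlElement, XmlText, XmlComment, and the InnerText property.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for XmlNodeType (hypothetical, not the .NET enumeration).
enum class NodeType { Element, Text, Comment };

// Abstract base class: plays the role that XmlNode plays in the DOM.
class Node {
public:
    virtual ~Node() = default;
    virtual NodeType nodeType() const = 0;
    virtual std::string name() const = 0;

    // Concatenated text of all descendants (overridden by leaf nodes).
    virtual std::string innerText() const {
        std::string result;
        for (const auto& child : children_) result += child->innerText();
        return result;
    }

    // Takes ownership of the child and returns a raw pointer for convenience.
    Node* appendChild(std::unique_ptr<Node> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
        return children_.back().get();
    }

    const std::vector<std::unique_ptr<Node>>& childNodes() const { return children_; }

private:
    std::vector<std::unique_ptr<Node>> children_;
};

class Element : public Node {
public:
    explicit Element(std::string tag) : tag_(std::move(tag)) {}
    NodeType nodeType() const override { return NodeType::Element; }
    std::string name() const override { return tag_; }
private:
    std::string tag_;
};

class Text : public Node {
public:
    explicit Text(std::string value) : value_(std::move(value)) {}
    NodeType nodeType() const override { return NodeType::Text; }
    std::string name() const override { return "#text"; }
    std::string innerText() const override { return value_; }
private:
    std::string value_;
};

class Comment : public Node {
public:
    explicit Comment(std::string value) : value_(std::move(value)) {}
    NodeType nodeType() const override { return NodeType::Comment; }
    std::string name() const override { return "#comment"; }
    const std::string& value() const { return value_; }
    // Simplification in this sketch: comments add nothing to ancestor text.
    std::string innerText() const override { return ""; }
private:
    std::string value_;
};

// Builds <volcano><!--note--><location>Washington State, USA</location></volcano>
std::unique_ptr<Node> buildSample() {
    std::unique_ptr<Node> volcano = std::make_unique<Element>("volcano");
    volcano->appendChild(std::make_unique<Comment>("note"));
    Node* location = volcano->appendChild(std::make_unique<Element>("location"));
    location->appendChild(std::make_unique<Text>("Washington State, USA"));
    return volcano;
}
```

Calling buildSample() and then innerText() on the result gives the text of the location element only, because the sketch skips comments when concatenating. Code that walks the tree needs nothing beyond the base-class interface, which is the point of having a common node type.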

The XmlNode class provides common functionality for all these node types. Because this functionality is so important when working with XmlDocument, I’ve listed the properties and methods of XmlNode in the following two tables.

Property	Description
Attributes	Gets the collection of attributes for the node.
ChildNodes	Gets all the children of the node as an XmlNodeList.
FirstChild, LastChild	Gets a pointer to the first and last children of the node.
HasChildNodes	Value is true if a node has child nodes.
InnerText	Represents the concatenated values of the node and all its children.
InnerXml, OuterXml	InnerXml gets or sets the markup representing the children of the node. OuterXml includes the node and its children.
IsReadOnly	Returns the read-only status of the node.
Item	Gets a child element by name.
Name, LocalName	The name of the node, with or without namespace information.
NextSibling, PreviousSibling	Gets a pointer to the node immediately following or preceding a node.
NodeType	Returns an XmlNodeType value representing the type of the node.
OwnerDocument	Gets a pointer to the XmlDocument that owns this node.
ParentNode	Gets the node’s parent node.
Prefix	Gets or sets the namespace prefix for the node.
Value	Gets or sets the value of the node. What the value represents will depend on the node type.

Method	Description
AppendChild, PrependChild	Adds a child to the end or beginning of a node’s list of child nodes
Clone, CloneNode	Clones a node
CreateNavigator	Creates an XPathNavigator for navigating the object and its contents
GetEnumerator	Returns an enumerator for the collection of child nodes
InsertAfter, InsertBefore	Inserts a node after or before a specified node
Normalize	Normalizes the tree so that there are no adjacent XmlText nodes
RemoveAll	Removes all children and attributes of a node
RemoveChild	Removes a specified child node
ReplaceChild	Replaces a specified child node
SelectNodes	Selects a list of nodes matching an XPath expression
SelectSingleNode	Selects the first node that matches an XPath expression
Supports	Tests whether the underlying DOM implementation supports a particular feature
WriteContentTo	Saves all children of the current node
WriteTo	Saves the current node

Perhaps the most important descendant of XmlNode is XmlElement, which represents an element within a document. This class adds a number of methods to XmlNode, most of which are concerned with getting, setting, and removing attributes.

The following exercise shows you how to use XmlDocument. You’ll write a program that reads the volcano XML file into memory and then inserts a new element into the structure.

Start a new Visual C++ Console Application (.NET) project named CppDom.
Add the two following lines to the top of CppDom.cpp. These lines reference the XML DLL and help you access the namespace members.
```
#using <System.xml.dll>
using namespace System::Xml;
```
You’re going to supply the name of the XML document to read when you run the program from the command line, so change the declaration of the _tmain function to include the command-line argument parameters, as shown here:
```
int _tmain(int argc, char* argv[])
```

Add this code to the start of the _tmain function to check the number of arguments and save the path:

// Check for required arguments
if (argc < 2)
{
    Console::WriteLine(S"Usage: CppDom path");
    return -1;
}

String* path = new String(argv[1]);

Create a new managed class named XmlBuilder, and give it an XmlDocument* as a data member:
```
__gc class XmlBuilder
{
    XmlDocument* doc;
};
```
You need a managed class because it will be necessary to pass the XmlDocument pointer around between functions. You could pass the pointer explicitly in the argument list of each function, but it’s better to make it a member of a class so that it can be accessed by all the member functions.
Add a constructor that creates an XmlDocument object, and tell it to load the file that was specified on the command line.
```
public:
    XmlBuilder(String* path)
    {
        // Create the XmlDocument
        doc = new XmlDocument();

        // Load the data
        doc->Load(path);
        Console::WriteLine(S"Document loaded");
    }
```
Unlike XmlTextReader, which reads one node at a time on demand, XmlDocument reads and parses the entire file as soon as you call Load, so the document is fully loaded by the time the constructor returns. Note that you’re not catching exceptions here. Something might go wrong when opening or parsing the file, but exceptions are left for the caller to handle.
Add some code to the _tmain function to create an XmlBuilder object. Make sure you are prepared to handle any exceptions that occur.
```
// Create a Builder and get it to read the file
try
{
    XmlBuilder* pf = new XmlBuilder(path);
}
catch(Exception* pe)
{
    Console::WriteLine(pe->Message);
}
```
You can try building and running the code at this point. First copy the volcano.xml and geology.dtd files you created earlier into the project folder. If you see the “Document loaded” message displayed when you run the program, you know that the document has been loaded and parsed.

The next step is to access the nodes in the tree. The current XML document contains three volcano elements; what you’ll do is find the second element and insert a new element before it. There are a number of ways in which you could do this, and I’ll just illustrate one method. It isn’t the most efficient way to do the job, but it does show how to use several XmlDocument and XmlNode methods and properties.

Continue working on the CppDom project. Start working with the tree by getting a pointer to its root. Because you’ll use this root several times, add an XmlNode* member to the XmlBuilder class to hold the root, like this:
```
private: XmlNode* root;
```
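Add the following code to the constructor to get the root node:
```
// Get the root element of the document
root = doc->DocumentElement;
```
DocumentElement returns the root element of the XML document, which in this case is the geology element. Note that this is not the top of the DOM tree: the XmlDocument object itself sits one level up, and it also holds the XML declaration, the DOCTYPE, and any top-level comments.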
You also need to get the list of child nodes of the document itself. Because you’ll be using this list again, add an XmlNodeList* member to the class to hold the list.
```
private: XmlNodeList* xnl; 
```
The following code shows how you can get a list of child nodes and iterate over it. Add this code to the constructor:
```
// Get the child node list
xnl = doc->ChildNodes;

IEnumerator* ie = xnl->GetEnumerator();
while (ie->MoveNext() == true)
    Console::WriteLine(S"Child: {0}",
        (dynamic_cast<XmlNode*>(ie->Current))->Name);
```
The ChildNodes property returns a list of child nodes as an XmlNodeList. The XmlNodeList is a typical .NET collection class, which means that you can get an enumerator to iterate over the nodes. The code iterates over the child nodes, printing the name of each. Note that because Current returns an Object*, it has to be cast to an XmlNode* before you can use the Name property.

The IEnumerator interface is part of the System::Collections namespace, so you need to add the following code near the top of the CppDom.cpp file, after the other using directives:

using namespace System::Collections;

If you run this code on the volcano.xml file, you should see output similar to the following:

Document loaded
Child: xml
Child: geology
Child: #comment
Child: geology

The document node at the top of the tree has four child nodes: the XML declaration, the DOCTYPE declaration, a comment, and the root element.

Note

Once you’ve verified the existence of the child nodes, you can remove the lines that declare and use the enumerator because you won’t need them again. Make sure you don’t remove the line that assigns the value to xnl!

Now that you have the document’s list of child nodes, you need to find the root element of the XML. Add a public member function named ProcessChildNodes, as shown here:
```
void ProcessChildNodes()
{
    // Declare an enumerator
    IEnumerator* ie = xnl->GetEnumerator();

    while (ie->MoveNext() == true)
    {
        // Get a pointer to the node
        XmlNode* pNode = dynamic_cast<XmlNode*>(ie->Current);

        // See if it is the root
        if (pNode->NodeType == XmlNodeType::Element &&
            pNode->Name->Equals(S"geology"))
        {
            Console::WriteLine(S" Found the root");
            ProcessRoot(pNode);
        }
    }
}
```
The function creates an enumerator and iterates over the child nodes of the document. The root XML element will be of type XmlNodeType::Element and will have the name geology; checking the type matters because the DOCTYPE node is also named geology. Once that element has been identified, ProcessRoot is called to process its children.
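
The reason for testing both the node type and the name can be shown with a small sketch in standard C++. It is only an illustration of the filtering logic, not the System::Xml API: Kind, TopLevelNode, findRootElement, and sampleChildren are hypothetical names, and the sample data simply restates the four child nodes listed in the output earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the XmlNodeType values that matter here.
enum class Kind { XmlDeclaration, DocumentType, Comment, Element };

// A top-level entry of the document: just a kind and a name.
struct TopLevelNode {
    Kind kind;
    std::string name;
};

// Returns the index of the first child that is an element with the given
// name, or -1 if there is none.
int findRootElement(const std::vector<TopLevelNode>& children,
                    const std::string& name) {
    for (std::size_t i = 0; i < children.size(); ++i) {
        if (children[i].kind == Kind::Element && children[i].name == name)
            return static_cast<int>(i);
    }
    return -1;
}

// The four top-level nodes reported for the volcano document.
std::vector<TopLevelNode> sampleChildren() {
    return {
        {Kind::XmlDeclaration, "xml"},
        {Kind::DocumentType, "geology"},
        {Kind::Comment, "#comment"},
        {Kind::Element, "geology"},
    };
}
```

A search by name alone would stop at the DOCTYPE entry at index 1. With the type test added, findRootElement(sampleChildren(), "geology") skips it and returns the index of the element.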

Here’s the public ProcessRoot member function:
```
void ProcessRoot(XmlNode* rootNode)
{
    XmlNode* pVolc =
        dynamic_cast<XmlNode*>(rootNode->ChildNodes->Item(1));

    // Create a new volcano element
    XmlElement* newVolcano = CreateNewVolcano();

    // Link it in
    root->InsertBefore(newVolcano, pVolc);
}
```
The function is passed the root element. I know that the file I’m working with has more than two volcano elements, and I know that I want to insert a new one before the second element. So, I can get a direct reference to the second element by using the Item method on ChildNodes to access a child node by index; indexes are zero-based, so Item(1) is the second child. In real code, you’d obviously need to put in a lot more checking to make sure you were retrieving the desired node.

Once the node has been retrieved, you call CreateNewVolcano to create a new volcano element. Then you use InsertBefore to insert the new one immediately before the node you just retrieved by index.
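
As a hedged illustration of the extra checking mentioned above, the following sketch uses a deliberately tiny tree model in standard C++, not the System::Xml classes. SimpleNode, insertBefore, and insertBeforeNthElement are hypothetical names. Instead of trusting a raw index, it looks for the n-th child that really is an element with the expected name, and it reports failure rather than inserting in the wrong place.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Deliberately tiny node: a name, an element flag, and owned children.
struct SimpleNode {
    std::string name;
    bool isElement;
    std::vector<std::unique_ptr<SimpleNode>> children;

    SimpleNode(std::string n, bool element)
        : name(std::move(n)), isElement(element) {}

    // Inserts newChild immediately before refChild. refChild must be a
    // child of this node; otherwise nothing is inserted and false is returned.
    bool insertBefore(std::unique_ptr<SimpleNode> newChild,
                      const SimpleNode* refChild) {
        assert(newChild != nullptr);
        for (std::size_t i = 0; i < children.size(); ++i) {
            if (children[i].get() == refChild) {
                children.insert(
                    children.begin() + static_cast<std::ptrdiff_t>(i),
                    std::move(newChild));
                return true;
            }
        }
        return false;
    }
};

// Finds the n-th (zero-based) child element called `name`, skipping comments
// and other non-element children, and inserts newChild in front of it.
bool insertBeforeNthElement(SimpleNode& parent, const std::string& name,
                            std::size_t n,
                            std::unique_ptr<SimpleNode> newChild) {
    std::size_t seen = 0;
    for (const auto& child : parent.children) {
        if (child->isElement && child->name == name) {
            if (seen == n)
                return parent.insertBefore(std::move(newChild), child.get());
            ++seen;
        }
    }
    return false;  // fewer than n + 1 matching elements: nothing inserted
}
```

With this approach a stray comment between the volcano elements no longer shifts the insertion point, and a document with too few volcano elements produces a false return value that the caller can handle.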
Now add the public CreateNewVolcano function, which creates a new volcano element. To save space, I haven’t given the code for creating the whole element, but just enough that you can see it working.
```
XmlElement* CreateNewVolcano()
{
    // Create a new element
    XmlElement* newElement = doc->CreateElement(S"volcano");

    // Set the name attribute
    XmlAttribute* pAtt = doc->CreateAttribute(S"name");
    pAtt->Value = S"Mount St.Helens";
    newElement->Attributes->Append(pAtt);

    // Create the location element
    XmlElement* locElement = doc->CreateElement(S"location");
    XmlText* xt = doc->CreateTextNode(S"Washington State, USA");
    locElement->AppendChild(xt);
    newElement->AppendChild(locElement);

    return newElement;
}
```
The function creates a new XmlElement for the volcano. Note that the node classes—XmlElement, XmlComment, and so on—don’t have public constructors, so you need to create them by calling the appropriate factory method. The name attribute gets appended to the element’s collection of attributes, and then the location element is created with its content. Building DOM trees like this is a process of creating new nodes and appending them to one another.
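
The factory-method arrangement described above can be sketched in standard C++. This is a toy model under stated assumptions, not how System::Xml is implemented: MiniDocument, MiniNode, setAttribute, and toXml are hypothetical names. The node constructor is private, so the only way to obtain a node is to ask the document for one, and the tree is then assembled by appending nodes to one another.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class MiniDocument;

// Node with a private constructor: only MiniDocument can create one, in the
// same spirit as XmlDocument's CreateElement and CreateTextNode.
class MiniNode {
public:
    // Appends a child and returns this node so calls can be chained.
    MiniNode& appendChild(std::unique_ptr<MiniNode> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
        return *this;
    }

    void setAttribute(const std::string& name, const std::string& value) {
        attributes_.emplace_back(name, value);
    }

    // Serializes the node and its subtree (no escaping: sketch only).
    std::string toXml() const {
        if (isText_) return value_;
        std::string out = "<" + value_;
        for (const auto& attr : attributes_)
            out += " " + attr.first + "=\"" + attr.second + "\"";
        out += ">";
        for (const auto& child : children_) out += child->toXml();
        return out + "</" + value_ + ">";
    }

private:
    friend class MiniDocument;
    MiniNode(bool isText, std::string value)
        : isText_(isText), value_(std::move(value)) {}

    bool isText_;
    std::string value_;  // tag name for elements, content for text nodes
    std::vector<std::pair<std::string, std::string>> attributes_;
    std::vector<std::unique_ptr<MiniNode>> children_;
};

// The document acts as the factory for every node type.
class MiniDocument {
public:
    std::unique_ptr<MiniNode> createElement(const std::string& tag) const {
        return std::unique_ptr<MiniNode>(new MiniNode(false, tag));
    }
    std::unique_ptr<MiniNode> createTextNode(const std::string& text) const {
        return std::unique_ptr<MiniNode>(new MiniNode(true, text));
    }
};

// Builds the same fragment as CreateNewVolcano in the walkthrough.
std::unique_ptr<MiniNode> createNewVolcano(const MiniDocument& doc) {
    auto volcano = doc.createElement("volcano");
    volcano->setAttribute("name", "Mount St.Helens");
    auto location = doc.createElement("location");
    location->appendChild(doc.createTextNode("Washington State, USA"));
    volcano->appendChild(std::move(location));
    return volcano;
}
```

Routing creation through the document keeps every node tied to a known owner, which is one plausible reason for the design; the sketch does not attempt to model ownership checks or the importing of nodes between documents.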
It would be useful to be able to print out the modified tree, so add a public function named PrintTree to the class, as shown here:
```
void PrintTree()
{
    XmlTextWriter* xtw = new XmlTextWriter(Console::Out);
    xtw->Formatting = Formatting::Indented;

    doc->WriteTo(xtw);
    xtw->Flush();
    Console::WriteLine();
}
```
You’ve already seen the use of XmlTextWriter to create XML manually. You can also use it to output XML from a DOM tree, by linking it up to an XmlDocument, as shown in the preceding code.
Add calls to ProcessChildNodes and PrintTree to the _tmain function, and you can build and test the program.
```
try
{
    XmlBuilder* pf = new XmlBuilder(path);
    pf->ProcessChildNodes();
    pf->PrintTree();
}
catch(Exception* pe)
{
    Console::WriteLine(pe->Message);
}
```
When you run the program, you’ll be able to see that the new node has been added to the tree. Remember that this operation has modified only the DOM tree in memory; the original XML file has not been changed.