Our handling of XML so far has been forward-only, which is very light on resource usage but isn’t so useful if you need to move around within the XML document. The XmlDocument class is based on the W3C DOM, and it’s the class that you want to use if you need to browse, modify, or create an XML document.
The DOM is a specification for an API that lets programmers manipulate XML held in memory. The DOM specification is language-independent, and bindings are available for many programming languages, including C++. XmlDocument is based on the DOM, with Microsoft extensions.
Because XmlDocument works with XML in memory, it has several advantages and disadvantages when compared with the XmlTextReader forward-only approach.
One advantage is that, in reading the entire document and building a tree in memory, you have access to all the elements and can wander through the document at will. You can also edit the document by changing, adding, or deleting nodes, and you can write the changed document back to disk again. It’s even possible to create an entire XML document from scratch in memory and write it out—serialize it—which is a useful alternative to using XmlTextWriter.
The main disadvantage is that all of an XML document is held in memory at once, so the amount of memory needed by your program is going to be proportional to the size of the XML document you’re working with. Therefore, if you’re working with a very large XML document—or have limited memory—you might not be able to use XmlDocument.
The XmlDocument class has a number of properties, methods, and events, the most important of which are summarized in the following three tables.
Property | Description |
---|---|
Attributes | Gets an XmlAttributeCollection representing the attributes of a node. |
ChildNodes | Gets all the child nodes of a node. |
DocumentElement | Returns the root element for the document. |
DocumentType | Returns the DOCTYPE node, if one is present. |
FirstChild, LastChild | Gets the first or last child nodes of a node. |
HasChildNodes | Value is true if a node has child nodes. |
InnerText | Returns the concatenated values of a node and all its child nodes. |
InnerXml | Gets or sets the markup representing the children of the current node. |
IsReadOnly | Gets a value indicating whether the current node is read-only. |
LocalName | Gets the name of the current node without a namespace prefix. |
Name | Gets the fully qualified name of the current node. |
NodeType | Gets the type of the current node. The node type will be one of the XmlNodeType values listed in the table on page 409. |
OwnerDocument | Gets the XmlDocument to which the current node belongs. |
ParentNode | Gets the parent of a node. |
PreserveWhitespace | Determines whether white space should be regarded as significant. The default is false. |
Value | Gets or sets the value of a node. |

Method | Description |
---|---|
AppendChild | Appends a child node to a node |
CloneNode | Creates a duplicate of the current node |
CreateAttribute | Creates an XmlAttribute object |
CreateCDataSection | Creates an XmlCDataSection object |
CreateComment | Creates an XmlComment object |
CreateDefaultAttribute | Creates a default XmlAttribute object |
CreateDocumentType | Creates an XmlDocumentType object |
CreateElement | Creates an XmlElement object |
CreateEntityReference | Creates an XmlEntityReference object |
CreateNavigator | Creates an XPathNavigator for navigating the object and its contents |
CreateNode | Creates an XmlNode of a specified node type
CreateProcessingInstruction | Creates an XmlProcessingInstruction object |
CreateTextNode | Creates an XmlText object |
CreateXmlDeclaration | Creates an XmlDeclaration object |
GetElementById | Returns an XML element with the specified ID attribute |
GetElementsByTagName | Gets a list of descendant nodes matching a name |
ImportNode | Imports a node from another document |
InsertBefore, InsertAfter | Inserts a node before or after a reference node |
Load | Loads XML from a file, a URL, a stream, or an XmlReader object |
LoadXml | Loads XML from a string |
ReadNode | Creates an XmlNode based on the current position of an XmlReader |
RemoveAll | Removes all child nodes and attributes from a node |
RemoveChild, ReplaceChild | Removes or replaces a child node |
Save | Saves the XML document to a file, a stream, or an XmlWriter |
SelectNodes, SelectSingleNode | Select one or more nodes matching an XPath expression |
WriteContentTo | Saves all the children of the XmlDocument node to an XmlWriter |
WriteTo | Saves the XmlDocument to an XmlWriter |

Event | Description |
---|---|
NodeChanged | Fired when the value of a node has been changed |
NodeChanging | Fired when the value of a node is about to be changed |
NodeInserted | Fired when a node has been inserted |
NodeInserting | Fired when a node is about to be inserted |
NodeRemoved | Fired when a node has been removed |
NodeRemoving | Fired when a node is about to be removed |
You’ll notice a lot of references to nodes in the preceding tables. The DOM tree that an XmlDocument object builds in memory is composed of nodes, each of which is an object of a class that inherits from the abstract XmlNode base class. Just about everything in an XML document is represented by a node. For example:
Elements are represented by the XmlElement class.
Attributes are represented by the XmlAttribute class.
The text content of elements is represented by the XmlText class.
Comments are represented by the XmlComment class.
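The relationship between the abstract base class and the concrete node classes can be pictured with a small standalone model. The sketch below is plain standard C++, not the .NET classes themselves; Node, Element, Text, Comment, and describe are hypothetical names that only mirror the idea that every kind of node shares one base interface.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the abstract XmlNode base class.
class Node {
public:
    virtual ~Node() = default;
    virtual std::string name() const = 0;              // what the node is called
    virtual std::string value() const { return ""; }   // meaning depends on the node kind
};

// Stand-in for XmlElement: named by its tag, no value of its own.
class Element : public Node {
public:
    explicit Element(std::string tag) : tag_(std::move(tag)) {
        assert(!tag_.empty());  // an element must have a name
    }
    std::string name() const override { return tag_; }
private:
    std::string tag_;
};

// Stand-in for XmlText: fixed name, value is the text itself.
class Text : public Node {
public:
    explicit Text(std::string text) : text_(std::move(text)) {}
    std::string name() const override { return "#text"; }
    std::string value() const override { return text_; }
private:
    std::string text_;
};

// Stand-in for XmlComment: fixed name, value is the comment body.
class Comment : public Node {
public:
    explicit Comment(std::string body) : body_(std::move(body)) {}
    std::string name() const override { return "#comment"; }
    std::string value() const override { return body_; }
private:
    std::string body_;
};

// Code written against the base class works for every node kind.
std::string describe(const Node& n) {
    const std::string v = n.value();
    return v.empty() ? n.name() : n.name() + "=" + v;
}
```

Because describe takes a reference to the base class, it never needs to know which concrete kind of node it has been given.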
The XmlNode class provides common functionality for all these node types. Because this functionality is so important when working with XmlDocument, I’ve listed the properties and methods of XmlNode in the following two tables.
Property | Description |
---|---|
Attributes | Gets the collection of attributes for the node. |
ChildNodes | Gets all the children of the node as an XmlNodeList. |
FirstChild, LastChild | Gets a pointer to the first and last children of the node. |
HasChildNodes | Value is true if a node has child nodes. |
InnerText | Represents the concatenated values of the node and all its children. |
InnerXml, OuterXml | InnerXml gets or sets the markup representing the children of the node. OuterXml includes the node and its children. |
IsReadOnly | Returns the read-only status of the node. |
Item | Gets a child element by name. |
Name, LocalName | The name of the node, with or without namespace information. |
NextSibling, PreviousSibling | Gets a pointer to the node immediately following or preceding a node. |
NodeType | Returns an XmlNodeType value representing the type of the node. |
OwnerDocument | Gets a pointer to the XmlDocument that owns this node. |
ParentNode | Gets the node’s parent node. |
Prefix | Gets or sets the namespace prefix for the node. |
Value | Gets or sets the value of the node. What the value represents will depend on the node type. |

Method | Description |
---|---|
AppendChild, PrependChild | Adds a child to the end or beginning of a node’s list of child nodes |
Clone, CloneNode | Clones a node |
CreateNavigator | Creates an XPathNavigator for navigating the object and its contents |
GetEnumerator | Returns an enumerator for the collection of child nodes |
InsertAfter, InsertBefore | Inserts a node after or before a specified node |
Normalize | Normalizes the tree so that there are no adjacent XmlText nodes |
RemoveAll | Removes all children and attributes of a node |
RemoveChild | Removes a specified child node |
ReplaceChild | Replaces a specified child node |
SelectNodes | Selects a list of nodes matching an XPath expression
SelectSingleNode | Selects the first node that matches an XPath expression |
Supports | Tests whether the underlying DOM implementation supports a particular feature |
WriteContentTo | Saves all children of the current node |
WriteTo | Saves the current node |
Perhaps the most important descendant of XmlNode is XmlElement, which represents an element within a document. This class adds a number of methods to XmlNode, most of which are concerned with getting, setting, and removing attributes.
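As a rough picture of what attribute handling on an element involves, the following standalone sketch models an element's attributes as a name-to-value map. It is plain standard C++ rather than the .NET class, and the Element struct and its member names are hypothetical. Returning an empty string for a missing attribute is a choice made in this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of an element that carries named attributes.
struct Element {
    std::string tag;
    std::map<std::string, std::string> attributes;

    // Adds the attribute, or overwrites it if it already exists.
    void set_attribute(const std::string& name, const std::string& value) {
        assert(!name.empty());
        attributes[name] = value;
    }

    // Returns the attribute value, or an empty string if it is absent.
    std::string get_attribute(const std::string& name) const {
        auto it = attributes.find(name);
        return it == attributes.end() ? std::string() : it->second;
    }

    bool has_attribute(const std::string& name) const {
        return attributes.count(name) != 0;
    }

    // Removing an attribute that does not exist is a no-op.
    void remove_attribute(const std::string& name) {
        attributes.erase(name);
    }
};
```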
The following exercise shows you how to use XmlDocument. You’ll write a program that reads the volcano XML file into memory and then inserts a new element into the structure.
Start a new Visual C++ Console Application (.NET) project named CppDom.
Add the two following lines to the top of CppDom.cpp. These lines reference the XML DLL and help you access the namespace members.

    #using <System.xml.dll>
    using namespace System::Xml;

You’re going to supply the name of the XML document to read when you run the program from the command line, so change the declaration of the _tmain function to include the command-line argument parameters, as shown here:

    int _tmain(int argc, char* argv[])

Add this code to the start of the _tmain function to check the number of arguments and save the path:

    // Check for required arguments
    if (argc < 2)
    {
        Console::WriteLine(S"Usage: CppDom path");
        return -1;
    }

    String* path = new String(argv[1]);

Create a new managed class named XmlBuilder, and give it an XmlDocument* as a data member:

    __gc class XmlBuilder
    {
        XmlDocument* doc;
    };

You need a managed class because it will be necessary to pass the XmlDocument pointer around between functions. You could pass the pointer explicitly in the argument list of each function, but it’s better to make it a member of a class so that it can be accessed by all the member functions.
Add a constructor that creates an XmlDocument object, and tell it to load the file that was specified on the command line.

    public:
        XmlBuilder(String* path)
        {
            // Create the XmlDocument
            doc = new XmlDocument();

            // Load the data
            doc->Load(path);
            Console::WriteLine(S"Document loaded");
        }

Unlike XmlTextReader, which reads one node at a time as you ask for it, XmlDocument reads and parses the entire file as soon as Load is called. Note that you’re not catching exceptions here. Something might go wrong when opening or parsing the file, but exceptions are left for the caller to handle.
Add some code to the _tmain function to create an XmlBuilder object. Make sure you are prepared to handle any exceptions that occur.

    // Create a Builder and get it to read the file
    try
    {
        XmlBuilder* pf = new XmlBuilder(path);
    }
    catch(Exception* pe)
    {
        Console::WriteLine(pe->Message);
    }

You can try building and running the code at this point. First copy the volcano.xml and geology.dtd files you created earlier into the project folder. If you see the “Document loaded” message displayed when you run the program, you know that the document has been loaded and parsed.
The next step is to access the nodes in the tree. The current XML document contains three volcano elements; what you’ll do is find the second element and insert a new element before it. There are a number of ways in which you could do this, and I’ll just illustrate one method. It isn’t the most efficient way to do the job, but it does show how to use several XmlDocument and XmlNode methods and properties.
Continue working on the CppDom project. Start working with the tree by getting a pointer to the root element. Because you’ll use this node several times, add an XmlNode* member to the XmlBuilder class to hold it, like this:

    private:
        XmlNode* root;

Add the following code to the constructor to get the root element:

    // Get the root element of the document
    root = doc->DocumentElement;

DocumentElement returns the root element of the XML document, which here is geology. Note that this is not the top of the DOM tree: the top is the XmlDocument object itself, one level up, and its children include the XML declaration and the DOCTYPE as well as the root element.
You also need the list of child nodes of the document itself. Because you’ll be using this list again, add an XmlNodeList* member to the class to hold the list.

    private:
        XmlNodeList* xnl;

The following code shows how you can get a list of child nodes and iterate over it. Add this code to the constructor:

    // Get the child node list
    xnl = doc->ChildNodes;

    IEnumerator* ie = xnl->GetEnumerator();
    while (ie->MoveNext() == true)
        Console::WriteLine(S"Child: {0}",
            (dynamic_cast<XmlNode*>(ie->Current))->Name);

The ChildNodes property returns a list of child nodes as an XmlNodeList. The XmlNodeList is a typical .NET collection class, which means that you can get an enumerator to iterate over the nodes. The code iterates over the child nodes, printing the name of each. Note that because Current returns an Object*, it has to be cast to an XmlNode* before you can use the Name property.
The IEnumerator interface is part of the System::Collections namespace, so you need to add the following code near the top of the CppDom.cpp file, after the other using directives:

    using namespace System::Collections;

If you run this code on the volcanoes.xml file, you should see output similar to the following:

    Document loaded
    Child: xml
    Child: geology
    Child: #comment
    Child: geology

The document node at the top of the tree has four child nodes: the XML declaration, the DOCTYPE declaration, a comment, and the root element.
Note | Once you’ve verified the existence of the child nodes, you can remove the lines that declare and use the enumerator because you won’t need them again. Make sure you don’t remove the line that assigns the value to xnl! |
Now that you’ve got the child nodes of the document, you need to find the root element of the XML among them by using a public member function named ProcessChildNodes, as shown here:

    void ProcessChildNodes()
    {
        // Declare an enumerator
        IEnumerator* ie = xnl->GetEnumerator();

        while (ie->MoveNext() == true)
        {
            // Get a pointer to the node
            XmlNode* pNode = dynamic_cast<XmlNode*>(ie->Current);

            // See if it is the root element
            if (pNode->NodeType == XmlNodeType::Element &&
                pNode->Name->Equals(S"geology"))
            {
                Console::WriteLine(S" Found the root");
                ProcessRoot(pNode);
            }
        }
    }

The function creates an enumerator and iterates over the children of the document node. The root XML element is of type XmlNodeType::Element and has the name geology. Once that element has been identified, ProcessRoot is called to process its children.
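The type check matters because, as the earlier output shows, two of the children report the name geology: the DOCTYPE node and the root element. The following standalone sketch models that filtering step in plain standard C++; it does not use the .NET classes, and NodeType, Node, and find_root_element are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of node kinds that appear at the top of a document.
enum class NodeType { XmlDeclaration, DocumentType, Comment, Element };

struct Node {
    NodeType type;
    std::string name;
};

// Returns the first child that is an element with the given name,
// or nullptr if there is none. Matching on the name alone would
// wrongly pick up a DOCTYPE node that shares the element's name.
const Node* find_root_element(const std::vector<Node>& children,
                              const std::string& name) {
    assert(!name.empty());
    for (const Node& child : children) {
        if (child.type == NodeType::Element && child.name == name)
            return &child;
    }
    return nullptr;
}
```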
Here’s the public ProcessRoot member function:

    void ProcessRoot(XmlNode* rootNode)
    {
        // Item(1) is the second child of the root element
        XmlNode* pVolc =
            dynamic_cast<XmlNode*>(rootNode->ChildNodes->Item(1));

        // Create a new volcano element
        XmlElement* newVolcano = CreateNewVolcano();

        // Link it in
        root->InsertBefore(newVolcano, pVolc);
    }

The function is passed the root element. I know that the file I’m working with has more than two volcano elements, and I know that I want to insert a new one before the second element. So, I can get a direct reference to the second element by calling the Item method on ChildNodes to access a child node by index; indexes start at zero, so Item(1) is the second child. In real code, you’d obviously need to put in a lot more checking to make sure you were retrieving the desired node.
Once the node has been retrieved, you call CreateNewVolcano to create a new volcano element. Then you use InsertBefore to insert the new one immediately before the node you just retrieved by index.
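The extra checking mentioned above can be sketched in isolation. The following is a standalone model in plain standard C++, not the .NET API: Node, make_node, insert_before_nth, and the bool return value are hypothetical names and choices made for this illustration. It refuses to insert when the index is out of range or when the child at that index does not have the expected name.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node: a name plus an ordered list of children.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

// Convenience helper for building nodes in this sketch.
std::unique_ptr<Node> make_node(const std::string& name) {
    auto n = std::make_unique<Node>();
    n->name = name;
    return n;
}

// Inserts new_child immediately before the child at position index,
// but only if that child exists and has the expected name.
// Returns true on success; on failure the tree is left unchanged.
bool insert_before_nth(Node& parent, std::size_t index,
                       const std::string& expected_name,
                       std::unique_ptr<Node> new_child) {
    assert(new_child != nullptr);
    if (index >= parent.children.size())
        return false;                       // no such child
    if (parent.children[index]->name != expected_name)
        return false;                       // not the node we expected
    parent.children.insert(
        parent.children.begin() + static_cast<std::ptrdiff_t>(index),
        std::move(new_child));
    return true;
}
```

Reporting failure through the return value is only one option; throwing an exception, as the surrounding program already expects from Load, would fit equally well.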
Now add the public CreateNewVolcano function, which creates a new volcano element. To save space, I haven’t given the code for creating the whole element, but just enough that you can see it working.

    XmlElement* CreateNewVolcano()
    {
        // Create a new element
        XmlElement* newElement = doc->CreateElement(S"volcano");

        // Set the name attribute
        XmlAttribute* pAtt = doc->CreateAttribute(S"name");
        pAtt->Value = S"Mount St. Helens";
        newElement->Attributes->Append(pAtt);

        // Create the location element
        XmlElement* locElement = doc->CreateElement(S"location");
        XmlText* xt = doc->CreateTextNode(S"Washington State, USA");
        locElement->AppendChild(xt);
        newElement->AppendChild(locElement);

        return newElement;
    }

The function creates a new XmlElement for the volcano. Note that the node classes—XmlElement, XmlComment, and so on—don’t have public constructors, so you need to create them by calling the appropriate factory method. The name attribute gets appended to the element’s collection of attributes, and then the location element is created with its content. Building DOM trees like this is a process of creating new nodes and appending them to one another.
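The factory-method arrangement described above can be modelled in a few lines. This is a standalone sketch in plain standard C++, with hypothetical Document and Element classes rather than the .NET ones: the Element constructor is private, so the only way to obtain an element is to ask a Document for one, and the Document records itself as the owner.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class Document;

// Hypothetical element type that cannot be constructed directly.
class Element {
public:
    const std::string& name() const { return name_; }
    const Document* owner() const { return owner_; }

private:
    friend class Document;  // only Document may construct elements
    Element(std::string name, const Document* owner)
        : name_(std::move(name)), owner_(owner) {}

    std::string name_;
    const Document* owner_;
};

// Hypothetical document type acting as the factory for its nodes.
class Document {
public:
    // Factory method: the one public route to a new Element.
    std::unique_ptr<Element> create_element(const std::string& name) const {
        assert(!name.empty());
        // std::make_unique cannot reach the private constructor,
        // so the element is created with new inside the friend class.
        return std::unique_ptr<Element>(new Element(name, this));
    }
};
```

Routing creation through the document is what lets every node know which document it belongs to, which is the role the OwnerDocument property plays in the tables earlier.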
It would be useful to be able to print out the modified tree, so add a public function named PrintTree to the class, as shown here:

    void PrintTree()
    {
        XmlTextWriter* xtw = new XmlTextWriter(Console::Out);
        xtw->Formatting = Formatting::Indented;

        doc->WriteTo(xtw);
        xtw->Flush();
        Console::WriteLine();
    }

You’ve already seen the use of XmlTextWriter to create XML manually. You can also use it to output XML from a DOM tree, by linking it up to an XmlDocument, as shown in the preceding code.
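What an indenting writer does with a tree can be sketched as a short recursive function. The code below is a standalone model in plain standard C++, not XmlTextWriter; Node and write_indented are hypothetical names, the two-space indent is an arbitrary choice, and attributes and escaping of special characters are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tree node: an element with optional text or child elements.
struct Node {
    std::string name;            // element name
    std::string text;            // text content, used when there are no children
    std::vector<Node> children;  // child elements, in document order
};

// Appends an indented rendering of node and its descendants to out.
// Each nesting level is indented by two spaces.
void write_indented(const Node& node, int depth, std::string& out) {
    assert(depth >= 0);
    const std::string pad(static_cast<std::size_t>(depth) * 2, ' ');

    if (node.children.empty()) {
        // Leaf element: start tag, text, and end tag on one line.
        out += pad + "<" + node.name + ">" + node.text +
               "</" + node.name + ">\n";
        return;
    }

    out += pad + "<" + node.name + ">\n";
    for (const Node& child : node.children)
        write_indented(child, depth + 1, out);
    out += pad + "</" + node.name + ">\n";
}
```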
Add calls to ProcessChildNodes and PrintTree to the _tmain function, and you can build and test the program.

    try
    {
        XmlBuilder* pf = new XmlBuilder(path);
        pf->ProcessChildNodes();
        pf->PrintTree();
    }
    catch(Exception* pe)
    {
        Console::WriteLine(pe->Message);
    }

When you run the program, you’ll be able to see that the new node has been added to the tree. Remember that this operation has modified only the DOM tree in memory; the original XML file has not been changed.