Accessing XML Data Using .NET Framework Classes

Now that you've seen how to create an XML document, we get to the fun part: how to write code to extract and manipulate data from an XML document using classes found in the .NET framework. There's no one right way to do this; in fact, before .NET came along, two predominant ways were used to parse an XML document: the XML Document Object Model (DOM) and Simple API for XML (SAX).

An implementation of the XML DOM exists in the .NET framework. However, in this chapter we'll primarily focus on .NET's own XML handlers, such as the XmlNodeReader, XmlTextReader, and XmlTextWriter objects. These objects are the standard .NET way to access XML data; they provide a good combination of high performance, .NET integration, and ease of programming. But you should know about the other ways to deal with XML, too, particularly because the specialized .NET reader and writer objects are designed to interact with the Internet-standard DOM objects. So for the remainder of this chapter, we'll include brief examples of how to work with the DOM, as well.

About Simple API for XML (SAX)

Simple API for XML (SAX) was designed to provide a higher level of performance and a simpler programmability model than XML DOM. It uses a fundamentally different programmability model. Instead of reading in the entire document at once and exposing the elements of the document as nodes, SAX provides an event-driven model for parsing XML.

SAX is not supported in .NET, at least not yet. In fact, it's not even an official Internet standard. It's a programming interface for XML that was created by developers who wanted an XML parser with higher performance and a smaller memory footprint, especially when parsing very large documents.

If you are currently writing applications using SAX and want to use SAX in your .NET applications today, you can do so by using the MSXML 3.0 COM library through the COM interoperability features in .NET.

NOTE

Although it is not yet supported in the .NET framework, SAX is supported in Microsoft's COM-based XML parser implementation. For more information on this tool, see http://msdn.microsoft.com/xml/.

Using the XML Document Object Model

The XML Document Object Model (DOM) is a programming interface used to parse XML documents. It was the first programming interface provided for XML by Microsoft; XML DOM implementations are available that target other languages and other operating systems.

The original Microsoft XML DOM implementation is COM based, so it is accessible from any COM-compliant language. The XML parsers in .NET are, naturally, accessible from any .NET-compliant language.

The XML DOM does its magic by taking an XML document and exposing it in the form of a complex object hierarchy. This kind of hierarchy may be familiar to you if you've done client-side HTML Document Object Model programming in JavaScript or VBScript. The number of objects in XML DOM is fairly daunting; no fewer than 20 objects are in the base implementation, and then the Microsoft implementation adds a number of additional interfaces and proprietary extensions.

Fortunately, the number of objects you need to work with on a regular basis in the XML DOM is minimal. In fact, the XML DOM recommendation segregates the objects in the DOM into two groups: fundamental classes and extended classes. Fundamental classes are the ones that application developers find most useful; the extended classes are primarily useful to tools developers and people who like to pummel themselves with detail.

The fundamental classes of the XML DOM as implemented in the .NET framework are XmlNode, XmlNodeList, and XmlNamedNodeMap. These classes, as well as the parent XmlDocument class, are illustrated in Figure 10.1.

Figure 10.1. Fundamental XML DOM objects.

Note that the XmlDocument object is technically an extended class, not a fundamental class, because it inherits from XmlNode. We're including discussion of it in this chapter because it's kind of tricky to do useful stuff in XML without it. The class adds some useful file- and URL-handling capabilities to XmlNode.

NOTE

The XmlNode and XmlDocument classes are found in the System.Xml namespace. The XmlDocument class inherits from System.Xml.XmlNode. A reference to the classes, properties, and methods introduced in this chapter is included at the end of this chapter.

In general, to work with an XML document using the Document Object Model, you first open the document (using the .Load() or .LoadXml() method of the XmlDocument object). The .Load() method is overloaded and can take any one of four arguments: a string, a System.IO.TextReader object, a System.Xml.XmlReader object, or an instance of System.IO.Stream.

The easiest way to demonstrate how to load an XML document from a file on disk is to pass the .Load() method a string. The string can either be a local file on disk or a URL. If the string is a URL, the XmlDocument retrieves the document from a Web server. This is pretty handy; it makes you wish that every file-handling object worked this way.
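The other ways of loading a document mentioned above can be tried without touching the disk at all. The following is a minimal sketch, not a definitive implementation: the class name LoadOverloadsSketch and the SampleXml markup are our own inventions for illustration, and the code is written as a plain C# class rather than an ASP.NET page. It shows LoadXml() parsing a string of markup, and Load() accepting a TextReader and a Stream.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class; none of these names come from the .NET framework.
public static class LoadOverloadsSketch
{
    // Hypothetical sample markup used only for this illustration.
    public const string SampleXml =
        "<BOOKS><BOOK><TITLE>Sample Title</TITLE></BOOK></BOOKS>";

    // LoadXml() parses markup that is already in memory as a string.
    public static XmlDocument FromString(string xml)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(xml);
        return xd;
    }

    // Load() accepts any System.IO.TextReader; here, a StringReader.
    public static XmlDocument FromTextReader(string xml)
    {
        XmlDocument xd = new XmlDocument();
        using (TextReader reader = new StringReader(xml))
        {
            xd.Load(reader);
        }
        return xd;
    }

    // Load() also accepts a System.IO.Stream; here, UTF-8 bytes in memory.
    public static XmlDocument FromStream(string xml)
    {
        XmlDocument xd = new XmlDocument();
        using (Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            xd.Load(stream);
        }
        return xd;
    }
}
```

Whichever method you call, you end up with the same populated XmlDocument; for example, LoadOverloadsSketch.FromStream(LoadOverloadsSketch.SampleXml).DocumentElement.Name is the root element's name, BOOKS.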

For most of the examples in this chapter, we'll use a small XML file on disk called books.xml. Listing 10.10 contains the full contents of books.xml.

Listing 10.10 The Full Contents of the books.xml Document Example

    <BOOKS>
      <BOOK>
        <TITLE>C# Developer's Guide To ASP.NET, XML and ADO.NET</TITLE>
        <AUTHOR id='101' location='San Francisco'>Jeffrey P. McManus</AUTHOR>
        <AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR>
      </BOOK>
    </BOOKS>

Listing 10.11 shows an example of how to load this XML document from disk using an XmlDocument object.

Listing 10.11 Loading a Local XML File Using the XmlDocument's `.Load()` Method

    <%@ Page language="C#" debug="true" %>
    <%@ Import Namespace="System.Xml" %>
    <SCRIPT runat='server'>
    void Page_Load(Object Sender, EventArgs e)
    {
      XmlDocument xd = new XmlDocument();
      xd.Load(Server.MapPath("books.xml"));
      Response.Write(xd.OuterXml);
      xd = null;
    }
    </SCRIPT>

This code works for any XML document accessible to the local file system. Listing 10.12 demonstrates how to load an XML document via HTTP from a remote Web server.

Listing 10.12 Loading an XML File That Resides on a Web Server

    <%@ Page language="C#" debug="true" %>
    <%@ Import Namespace="System.Xml" %>
    <SCRIPT runat='server'>
    void Page_Load(Object Sender, EventArgs e)
    {
      XmlDocument xd = new XmlDocument();
      xd.Load("http://www.myserver.com/books.xml");
      Response.Write(xd.OuterXml);
      xd = null;
    }
    </SCRIPT>

As you can see, the syntax is nearly identical whether you're loading the file from the local file system or over HTTP. Both examples are extremely simple; they demonstrate how easy it is to open and view an XML document using the DOM. The next step is to start doing things with the data in the document you've retrieved.

NOTE

Don't use the DOM, or any of the other XML-reading techniques demonstrated in this chapter, to read the Web application configuration file Web.config. The ASP.NET page framework provides an object that is used specifically for retrieving configuration settings from Web.config: the AppSettings property of the ConfigurationSettings object. For more information on how this works, see Chapter 5, "Configuration and Deployment."

Viewing Document Data Using the `XmlNode` Object

After you've loaded a document, you need some way to programmatically visit each of its nodes to determine what's inside. In the XML DOM, several ways exist to do this, all of which are centered around the XmlNode object.

The XmlNode object represents a node in the XML document. It exposes an object hierarchy that exposes attributes and child nodes, as well as every other part of an XML document.

When you've loaded an XML document to parse it (as we demonstrated in the previous code examples), your next step usually involves retrieving that document's top-level node. Use the .FirstChild property to do this.

Listing 10.13 shows an example of retrieving and displaying the name of the top-level node in the document using .FirstChild.

Listing 10.13 Displaying the Name of the Top-Level Node Using the `.FirstChild` Property

    <%@ Page language="C#" debug="true" %>
    <%@ Import Namespace="System.Xml" %>
    <SCRIPT runat='server'>
    void Page_Load(Object Sender, EventArgs e)
    {
      XmlDocument xd = new XmlDocument();
      xd.Load(Server.MapPath("books.xml"));
      // FirstChild is the top-level node; for books.xml this is BOOKS
      Response.Write(xd.FirstChild.Name);
      xd = null;
    }
    </SCRIPT>

The code demonstrates how the .FirstChild property returns an XmlNode object with its own set of properties and methods. In the example, we call the Name property of the XmlNode object represented by .FirstChild.

You can do more useful and interesting things with the XmlNode object. One common operation is drilling down and retrieving data from the ChildNodes object owned by XmlNode. Two features of ChildNodes make this possible: its status as an enumerable class, and the InnerText property of each child node.

Enumerable classes implement the .NET IEnumerable interface. This is the same interface definition that arrays, collections, and more complex constructs such as ADO.NET DataSets support. (You may think of ChildNodes as just another collection, but in .NET, Collection is a distinct data type.)

When an object supports IEnumerable, it exposes functionality (through a behind-the-scenes object called an enumerator) that enables other processes to visit each of its child members. In the case of ChildNodes, the enumerator lets your code visit the object's child XmlNode objects. The foreach block in C# is the construct that is most commonly used to traverse an enumerable class. Listing 10.14 shows an example of this.

Listing 10.14 Traversing the Enumerable `ChildNodes` Class

    <%@ Page language="C#" debug="true" %>
    <%@ Import Namespace="System.Xml" %>
    <SCRIPT runat='server'>
    void Page_Load(Object Sender, EventArgs e)
    {
      XmlDocument xd = new XmlDocument();
      XmlNode ndBook;
      xd.Load(Server.MapPath("books.xml"));
      ndBook = xd.FirstChild["BOOK"];
      foreach(XmlNode nd in ndBook.ChildNodes)
      {
        if(nd.Name == "AUTHOR")
          Response.Write("The author's name is " + nd.InnerText + "<BR>");
      }
    }
    </SCRIPT>

In this code example, the foreach loop goes through the set of XmlNode objects found in ChildNodes. When it finds one whose Name property is AUTHOR, it displays the node's value. Note that for the books.xml file example, two author names appear because the book example has two authors.
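To make the behind-the-scenes enumerator a little less mysterious, here is a rough sketch of what a foreach loop over ChildNodes amounts to when written out by hand. The class and method names (EnumeratorSketch, ChildNames) are hypothetical, and the code is a plain C# class rather than an ASP.NET page; the GetEnumerator, MoveNext, and Current members are the standard IEnumerable and IEnumerator members.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class EnumeratorSketch
{
    // Returns the name of each child node, visiting the children
    // through the enumerator directly instead of using foreach.
    public static List<string> ChildNames(XmlNode parent)
    {
        List<string> names = new List<string>();

        // ChildNodes is an XmlNodeList, which implements IEnumerable.
        IEnumerator enumerator = parent.ChildNodes.GetEnumerator();

        // MoveNext() advances to the next child and returns false at the end.
        while (enumerator.MoveNext())
        {
            // Current is typed as object, so it must be cast to XmlNode.
            XmlNode child = (XmlNode)enumerator.Current;
            names.Add(child.Name);
        }
        return names;
    }
}
```

In everyday code you would still reach for foreach; the point of the sketch is only that foreach is a convenience layered on top of these calls.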

Note also that the value contained in an XML node is returned by the InnerText property in .NET, not by the .text property as it was in the COM-based MSXML library. Making a more granular distinction between a simple text property versus inner and outer text or inner and outer XML gives you a greater degree of power and flexibility. Use the outer properties when you want to preserve markup; the inner properties return the values themselves.
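The difference between the inner and outer properties is easiest to see side by side. The following sketch, which assumes a hypothetical helper class named InnerOuterSketch and runs as plain C# rather than as an ASP.NET page, loads a fragment of markup and returns the InnerText, InnerXml, and OuterXml of its root element.

```csharp
using System.Xml;

// Hypothetical helper class for illustration only.
public static class InnerOuterSketch
{
    // Returns { InnerText, InnerXml, OuterXml } for the root element.
    public static string[] Describe(string xml)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(xml);
        XmlNode root = xd.DocumentElement;

        return new string[]
        {
            root.InnerText, // text only, with all markup stripped
            root.InnerXml,  // markup of the children, without the root's own tags
            root.OuterXml   // markup of the root element and everything inside it
        };
    }
}
```

For the input `<BOOK><TITLE>Sample</TITLE></BOOK>`, the three values are `Sample`, `<TITLE>Sample</TITLE>`, and `<BOOK><TITLE>Sample</TITLE></BOOK>`, respectively.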

With the few aspects of the XmlDocument and XmlNode objects we've discussed so far, you now have the ability to perform rudimentary retrieval of data in an XML document using the DOM. However, looping through a collection of nodes using foreach leaves something to be desired. For example, what happens when your book node contains a set of 50 child nodes, and you're interested in extracting only a single child node from that?

Fortunately, .NET provides several objects that enable you to easily navigate the hierarchical structure of an XML document. These include the XmlTextReader and the XmlNodeReader object described in the next few sections of this chapter.

Using the `XmlTextReader` Object

The XmlTextReader object provides a method of accessing XML data that is both easier to code and potentially more efficient than the full-blown XML DOM. At the same time, the XmlTextReader understands DOM objects in a way that lets you use both types of access cooperatively.

NOTE

XmlTextReader is found in the System.Xml namespace. It inherits from System.Xml.XmlReader, an abstract class. A reference to the classes, properties, and methods introduced in this chapter is included at the end of this chapter.

If you've used the XML DOM in the past, the XmlTextReader will change the way you think about XML parsing in general. The XmlTextReader doesn't load an entire XML document and expose its various nodes and attributes to you in the form of a large hierarchical tree; that process causes a large performance hit as data is parsed and buffered. Instead, think of the XmlTextReader object as a truck that bounces along the road from one place to another. Each time the truck moves across another interesting aspect of the landscape, you have the ability to take some kind of interesting action based on what's there.

Parsing an XML document using the XmlTextReader object involves a few steps. First, you create the object, optionally passing in a filename or URL that represents the source of XML to parse. Next, execute the Read method of the XmlTextReader object until that method returns false. (You'll typically set up a loop to do this so that you can move from the beginning to the end of the document.)

Each time you execute the XmlTextReader object's Read method, the XmlTextReader object's properties are populated with fragments of information from the XML document you're parsing. This information includes the type of the data the object just read and the value of the data itself (if any).

The type of data is exposed through the XmlTextReader object's NodeType property. The value of the data can be retrieved as a string through the Value property of the XmlTextReader object. The ReadString method also returns the text content of an element or text node as a string.

Most of the time, the NodeType property will be XmlNodeType.Element (an element tag), XmlNodeType.Text (the data contained in a tag), or XmlNodeType.Attribute (an attribute, which you reach by moving to an element's attributes rather than through Read alone).

Listing 10.15 shows an example of how this works. The objective of this example is to retrieve the title of a book from an XML file that is known to contain any one of a number of nodes pertaining to the book itself.

Listing 10.15 Extracting a Book Title Using the `XmlTextReader` Object

    <%@ Page language="C#" debug="true" %>
    <%@ Import Namespace="System.Xml" %>
    <SCRIPT runat='server'>
    void Page_Load(Object Sender, EventArgs e)
    {
      XmlTextReader xr = new XmlTextReader(Server.MapPath("books.xml"));
      Boolean bTitle = false;
      while(xr.Read())
      {
        switch(xr.NodeType)
        {
          case XmlNodeType.Element:
            if(xr.Name == "TITLE")
              bTitle = true;
            break;
          case XmlNodeType.Text:
            if(bTitle)
            {
              Response.Write("Book title: " + xr.ReadString());
              bTitle = false;
            }
            break;
        }
      }
    }
    </SCRIPT>

The example opens the XML file by passing the name of the XML file to the constructor of the XmlTextReader object. It then reads one chunk of the document at a time through successive calls to the XmlTextReader object's Read method. If the current data represents the element name "TITLE", the code sets a flag, bTitle.

When the bTitle flag is set to true, it means "get ready, a book title is coming next." The book title itself is extracted in the next few lines of code. When the code encounters the text chunk, it extracts it from the XML document in the form of a string.

Note that the values XmlNodeType.Element and XmlNodeType.Text are predefined members of the XmlNodeType enumeration. You can set up more involved parsing structures based on any XML element type found in the DOM if you want. For example, if you included a case to process the element type XmlNodeType.XmlDeclaration, you could process the XML declaration that appears (but that is not required to appear) as the first line of the XML document.
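As a rough illustration of a more involved parsing structure, the sketch below adds a case for XmlNodeType.XmlDeclaration and also visits each element's attributes with the reader's MoveToNextAttribute method. The class name ReaderNodeTypesSketch and the format of the strings it collects are our own assumptions; the code reads from an in-memory string through a StringReader so that it can run outside an ASP.NET page.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class ReaderNodeTypesSketch
{
    // Walks the markup and records one line of text per interesting node.
    public static List<string> Describe(string xml)
    {
        List<string> events = new List<string>();
        XmlTextReader xr = new XmlTextReader(new StringReader(xml));
        try
        {
            while (xr.Read())
            {
                switch (xr.NodeType)
                {
                    case XmlNodeType.XmlDeclaration:
                        events.Add("declaration: " + xr.Value);
                        break;
                    case XmlNodeType.Element:
                        events.Add("element: " + xr.Name);
                        // Read() does not stop on attributes, so step
                        // through them explicitly for the current element.
                        while (xr.MoveToNextAttribute())
                        {
                            events.Add("attribute: " + xr.Name + "=" + xr.Value);
                        }
                        break;
                    case XmlNodeType.Text:
                        events.Add("text: " + xr.Value);
                        break;
                    // Other node types (end tags, whitespace) are ignored.
                }
            }
        }
        finally
        {
            xr.Close();
        }
        return events;
    }
}
```

Because unrecognized node types simply fall through the switch, a document with extra elements or comments is still parsed without complaint.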

As you can see from these examples, a beautiful thing about XML is that if the structure of the document changes, your parsing code will still work correctly, as long as the document contains a TITLE node. (In the previous code example, if for some reason the document contains no book title, no action is taken.) So the problems with delimited data that we discussed at the beginning of this chapter go away in the new world of XML parsing.

The XmlTextReader works well both for large and small documents. Under most circumstances (particularly for large documents), it should perform better than the XML DOM parser. However, like the DOM, it too has its own set of limitations. The XmlTextReader object doesn't have the capability to scroll, that is, to jump around among various areas in the document. (If you're a database developer, you can think of an XmlTextReader as being analogous to a cursorless or forward-only resultset.) Also, as its name implies, the XmlTextReader object permits you only to read data; you can't use it to make changes in existing node values or add new nodes to an existing document.

Writing XML Data Using the `XmlTextWriter` Object

You can create an XML document using any text editor. Similarly, you can create an XML document programmatically using any object capable of writing to a file. For example, the TextWriter class, found in the namespace System.IO, is often used for general-purpose generation of text files. Because XML files are normal text files containing markup tags and a specific, defined structure, you could use a TextWriter object to create XML files.

However, the XmlTextWriter object provides some advantages over creating XML files with a general-purpose object such as TextWriter. The main benefit of using XmlTextWriter is that it helps ensure that the XML you generate is well formed as you write it (it does not validate the output against a DTD or schema). The class also has a number of useful features, such as the ability to specify and apply formatting, delimiter, and encoding modes automatically.

NOTE

XmlTextWriter is found in the System.Xml namespace. It inherits from System.Xml.XmlWriter, an abstract class. A reference to the classes, properties, and methods introduced in this chapter is included at the end of this chapter.

To get started creating XML documents with the XmlTextWriter class, you must first create an instance of the class. XmlTextWriter has three constructors. Two of them enable you to create an instance of the object given an existing TextWriter or Stream object, but it's likely that the most common form you'll use is the constructor that takes a filename and an encoding type. In the code examples in this section, we use the encoding type UTF-8, which is the default. You can explicitly denote this encoding type by setting the encoding type to Encoding.UTF8.

Now you're ready to write data to the XmlTextWriter object using its methods. To create elements in the document, you use the WriteElementString method exposed by the object.

Listing 10.16 shows an example that demonstrates how to create a minimal version of the books.xml file using these methods.

Listing 10.16 Creating a Minimal XML Document Using the `XmlTextWriter` Object

    <%@ Page language="C#" %>
    <%@ Import Namespace="System.Text" %>
    <%@ Import Namespace="System.Xml" %>
    <SCRIPT runat='server'>
    void Page_Load(Object Sender, EventArgs e)
    {
      // Encoding.UTF8 lives in System.Text, hence the extra Import above
      XmlTextWriter xw = new XmlTextWriter(Server.MapPath("books2.xml"), Encoding.UTF8);
      try
      {
        xw.WriteStartDocument();
        xw.WriteStartElement("BOOK");
        xw.WriteElementString("TITLE", "C# Developer's Guide");
        xw.WriteEndDocument();
        Response.Write("Your file has been written.");
      }
      catch(Exception ex)
      {
        Response.Write("Exception: " + ex.Message);
      }
      finally
      {
        xw.Flush();
        xw.Close();
      }
    }
    </SCRIPT>

Normally we don't include exception-handling code in our brief code examples (mainly because we're lazy sods, but also because they sometimes detract from the point of the code example). But in this case, we've included a handler to emphasize that it's important to handle exceptions in code that creates or modifies files. If you fail to include an exception handler in file-handling code, it's easy to make a mistake that prevents a file from being closed properly, for example, which is a bad thing.

In the previous example, you can see that we first begin by creating the XmlTextWriter object, passing it a filename and encoding scheme. We then call the WriteStartDocument method to begin working with the document. This has the side effect of sending an XML declaration to the document. Calling WriteStartDocument first is the conventional way to begin a complete document with the XmlTextWriter, even though XML itself does not require that a declaration be present in a document.

Next we create the root node of the document with a call to the WriteStartElement method. The simplest form of WriteStartElement takes a single string argument: the name of the node to create. This is the form of the method we've used in our example.

Next, we insert a node underneath the root node with a call to WriteElementString . The form of WriteElementString we're using here lets us pass two strings: a node name and contents of the node.

When we're done with the document, we call the WriteEndDocument method to close the root node and then call the Flush and Close methods to finish the process of committing the file to disk. When this code is executed, it produces the file shown in Listing 10.17.

Listing 10.17 XML Document Produced by Previous Code Example

 <?xml version="1.0" encoding="utf-8"?> <BOOK><TITLE>C# Developer's Guide</TITLE></BOOK>

This is adequate (it's parsable), but the formatting leaves something to be desired. If you want line breaks and indented child nodes in your document, you must set the Formatting property of the XmlTextWriter object to the enumerated value Formatting.Indented.

To create attributes associated with nodes, use the XmlTextWriter object's WriteAttributeString method.
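The following sketch combines the last two points: it sets Formatting.Indented and writes an attribute with WriteAttributeString. To keep it runnable outside an ASP.NET page, it writes to an in-memory StringWriter (using the constructor that accepts a TextWriter) instead of a file; the class name IndentedWriterSketch and the sample title and author values are hypothetical.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class IndentedWriterSketch
{
    // Builds a small indented document in memory and returns it as a string.
    public static string Build()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter xw = new XmlTextWriter(sw);

        // Ask for line breaks and indented child nodes.
        xw.Formatting = Formatting.Indented;

        xw.WriteStartDocument();
        xw.WriteStartElement("BOOK");
        xw.WriteElementString("TITLE", "Sample Title");

        // An element with an attribute: start the element, write the
        // attribute, then the text content, then close the element.
        xw.WriteStartElement("AUTHOR");
        xw.WriteAttributeString("id", "101");
        xw.WriteString("Sample Author");
        xw.WriteEndElement();

        // Closes any elements that are still open (here, BOOK).
        xw.WriteEndDocument();
        xw.Close();

        return sw.ToString();
    }
}
```

Note that WriteAttributeString must be called after WriteStartElement and before any content is written to that element; because the output goes to a StringWriter here, the encoding named in the XML declaration will differ from the UTF-8 file example.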

Navigating and Updating Documents Using the `XmlNodeReader` Object

So far in this chapter, you've seen two distinct ways to access XML data provided by the .NET framework: the XML Document Object Model and the XmlTextReader object. Both have their advantages and drawbacks. A third alternative exists: XmlNodeReader.

In many ways, the XmlNodeReader object represents the best of all worlds. It provides a simpler programmability model than the XmlDocument object, yet it integrates with the standard DOM objects nicely. In fact, in most cases when you're working with XML data in .NET, you'll typically create an XmlNodeReader by creating a DOM XmlDocument object first.

NOTE

The XmlNodeReader class is found in the System.Xml namespace. It inherits from System.Xml.XmlReader, an abstract class. A reference to the classes, properties, and methods introduced in this chapter is included at the end of this chapter.

Listing 10.18 shows an example of creating an XmlNodeReader object from an existing XmlDocument object that has been populated with data.

Listing 10.18 Creating an `XmlNodeReader` Object from an `XmlDocument` Object

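    <%@ Page language="C#" %>
    <%@ Import Namespace="System.Xml" %>
    <SCRIPT runat='server'>
    void Page_Load(Object Sender, EventArgs e)
    {
      XmlDocument xd = new XmlDocument();
      // Populate the document before handing it to the reader
      xd.Load(Server.MapPath("books.xml"));
      XmlNodeReader xn = new XmlNodeReader(xd);
      // Code to work with the XmlNodeReader goes here
    }
    </SCRIPT>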

Navigating Through the Document Using the `XmlNodeReader` Object

After you've created and populated the XmlNodeReader, you can use it to move through the document programmatically. You do this by placing calls to the XmlNodeReader's Read method, which iterates through the document one element at a time. Listing 10.19 demonstrates this.

Listing 10.19 Using the `XmlNodeReader` Object to Traverse an XML Document

    <%@ Page language="C#" %>
    <%@ Import Namespace="System.Xml" %>
    <SCRIPT runat='server'>
    void Page_Load(Object Sender, EventArgs e)
    {
      XmlDocument xd = new XmlDocument();
      xd.Load(Server.MapPath("books.xml"));
      XmlNodeReader xn = new XmlNodeReader(xd);
      while(xn.Read())
      {
        Response.Write(xn.Name + " - " + xn.Value + "<BR>");
      }
    }
    </SCRIPT>

This is another example of how repeated calls to the Read method control the looping structure, the same way you use the Read method of the XmlTextReader object (discussed earlier in this chapter). Because Read returns true when it successfully navigates to a new element and false when no more data is left to traverse to, it's easy to set up a while loop that displays all the data in the document.
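A loop like the one in Listing 10.19 visits every node the reader encounters, including element tags that carry no value of their own. As a hedged variation, the sketch below starts the XmlNodeReader at a single node rather than at the whole document and keeps only the text nodes. The class and method names (NodeReaderSketch, TextValues) are hypothetical, and the code is plain C# rather than an ASP.NET page.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class NodeReaderSketch
{
    // Returns the text found beneath the given node, in document order.
    public static List<string> TextValues(XmlNode start)
    {
        List<string> values = new List<string>();

        // The reader covers only the subtree rooted at the start node.
        XmlNodeReader xn = new XmlNodeReader(start);
        try
        {
            while (xn.Read())
            {
                // Skip element and end-element nodes; keep only text.
                if (xn.NodeType == XmlNodeType.Text)
                {
                    values.Add(xn.Value);
                }
            }
        }
        finally
        {
            xn.Close();
        }
        return values;
    }
}
```

Passing the first BOOK node of a loaded document, for instance, yields the title and author text for that book alone.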

You've seen that navigating through an XML document using the XmlTextReader and XmlNodeReader objects' Read method works well enough. But if the process of repeatedly executing the Read method to blast through a document seems a little weak to you, you're right. Reading elements makes sense when the structure of a document is known, but how do you go directly to a node when you know the name of the node and can be reasonably sure that the node exists? And how do you get rid of the inelegant process of calling Read repeatedly to drill down to the place in the document where useful data exists?

Fortunately, there is another object that provides a number of more sophisticated techniques for drilling into the document hierarchy using an XML query technology known as XPath. We'll discuss this object in more detail in the next few sections.

Using XPath Queries to Retrieve XML Data

XPath is a standard that defines a syntax for retrieving data from an XML document. You can use XPath query syntax to retrieve data from an XML document without having to traverse the entire document. In .NET, you do this using the XPathDocument and XPathNavigator objects.

NOTE

The XPathDocument and XPathNavigator objects are members of the System.Xml.XPath namespace. A complete listing of the members of the XPathDocument and XPathNavigator objects is given in the reference section at the end of this chapter.

XPath syntax is described in more detail in the section "Querying XML Documents Using XPath Expressions" later in this chapter.

To begin performing XPath queries, you start by creating an XPathDocument object. This object is analogous to the XmlDocument object. You use its constructor to open an XML file on disk.

After you've created an XPathDocument object, you use the object's CreateNavigator method to create an instance of the XPathNavigator object. The XPathNavigator object is responsible for performing the actual XPath query of the document; it returns an iterator (an instance of System.Xml.XPath.XPathNodeIterator) that you can use to access each of the elements returned by the query. (If you're familiar with cursor-based database programming, you can think of the XPathNodeIterator as a type of cursor.)

The Select method of the XPathNavigator object enables you to filter and retrieve subsets of XML data from any XML document. You do this by constructing an XPath expression and passing the expression to the Select method of the XPathNavigator object. An XPath expression is a compact way of querying an XML document without going to the trouble of parsing the whole thing first. Using XPath, it's possible to retrieve very useful subsets of information from an XML document, often with only a single line of code.

Listing 10.20 shows a very simple example of using an XPath expression passed to the Select method to move to and display the authors of the book in the document books.xml.

Listing 10.20 Using the `Select` Method of the `XPathNavigator` Object to Retrieve a Subset of Nodes

    <%@ Page language="C#" %>
    <%@ Import Namespace="System.Xml" %>
    <%@ Import Namespace="System.Xml.XPath" %>
    <SCRIPT runat='server'>
    void Page_Load(Object Sender, EventArgs e)
    {
      XPathDocument xpd = new XPathDocument(Server.MapPath("books.xml"));
      XPathNavigator nav = xpd.CreateNavigator();
      XPathNodeIterator iterator = nav.Select("BOOKS/BOOK/AUTHOR");
      while(iterator.MoveNext())
      {
        Response.Write(iterator.Current.Value + "<BR>");
      }
    }
    </SCRIPT>

When the Select method in this example is executed, you're telling the XPathNavigator object to retrieve all the AUTHOR nodes owned by BOOK nodes contained in the BOOKS root node. The XPath expression "BOOKS/BOOK/AUTHOR" means "all the authors owned by BOOK nodes under the BOOKS root node." Any AUTHOR nodes in the document owned by parent nodes other than BOOK won't be retrieved, although you could construct an XPath expression to retrieve AUTHOR nodes anywhere in the document regardless of their parentage.

The product of this operation is a selection, a subset of XML nodes that can then be manipulated independently of the main document. You can traverse the selection using the XPathNodeIterator object returned from your call to the Select method of the XPathNavigator object. After you have an iterator, you can retrieve and display the data from the selected nodes.
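To give a feel for other expressions you might pass to Select, the sketch below wraps the query in a small helper so that different XPath strings can be tried against the same markup. The class name XPathSelectSketch is hypothetical, and the markup is read from a string through a StringReader so the code runs outside an ASP.NET page. The expression `//AUTHOR` is standard XPath for "AUTHOR nodes anywhere in the document," and `//AUTHOR[@location='Seattle']` adds a predicate that keeps only the authors whose location attribute is Seattle.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical helper class for illustration only.
public static class XPathSelectSketch
{
    // Runs an XPath query and returns the value of each selected node.
    public static List<string> Select(string xml, string xpath)
    {
        XPathDocument xpd = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = xpd.CreateNavigator();
        XPathNodeIterator iterator = nav.Select(xpath);

        List<string> results = new List<string>();
        while (iterator.MoveNext())
        {
            // Current is a navigator positioned on the selected node.
            results.Add(iterator.Current.Value);
        }
        return results;
    }
}
```

Against the books.xml markup from Listing 10.10, XPathSelectSketch.Select(xml, "//AUTHOR") would return both authors, and the predicate form would return only the author located in Seattle.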

Manipulating the Current Node Using the XPath Iterator's Current Property

In the previous example, you saw that you could query an XML document using the XPathNavigator object. The product of this query was an iterator object, an instance of XPathNodeIterator, that enabled you to access the value of each of the nodes returned in the XPath query one at a time.

When you retrieve data in this manner, you may need to further manipulate each node. For example, rather than simply accessing the value of a node, you may need to retrieve an attribute associated with the node. You can do this by using the Current property of the XPathNodeIterator object. This property, an instance of an XPathNavigator object, contains a rich set of properties and methods for manipulating properties of an XML node retrieved by an XPath query.

The AUTHOR nodes of the books.xml file contain two attributes: id and location. After you've moved to an AUTHOR node, you can display the values of these attributes by using the GetAttribute method of the Current object. Listing 10.21 shows an example of this.

Listing 10.21 Using the `Current` Property to Extract the Value of Attributes on a Node

    <%@ Page language="C#" debug="true" %>
    <%@ Import Namespace="System.Xml" %>
    <%@ Import Namespace="System.Xml.XPath" %>
    <SCRIPT runat='server'>
    void Page_Load(Object Sender, EventArgs e)
    {
      XPathDocument xpd = new XPathDocument(Server.MapPath("books.xml"));
      XPathNavigator nav = xpd.CreateNavigator();
      XPathNodeIterator iterator = nav.Select("BOOKS/BOOK/AUTHOR");
      while(iterator.MoveNext())
      {
        // Write the node's text explicitly rather than the navigator object
        Response.Write(iterator.Current.Value);
        // The second argument is the attribute's namespace URI (none here)
        Response.Write(" ID: " + iterator.Current.GetAttribute("id", "") + "<BR>");
      }
    }
    </SCRIPT>

Changing Values in an XML Document

In addition to navigating in an XML document using the various objects described in this chapter, you can also use the XML DOM to make changes in an XML document. Using DOM objects, you can:

Insert a node into the document
Remove a child node
Change the value of an element

To insert a node, you use the InsertAfter or InsertBefore method of the XmlNode object. Each method takes two parameters: the new child node to insert and a reference to an existing node. The location of the existing node determines where the new node should go.

Listing 10.22 shows an example of how to insert a new book into the books.xml document using the InsertAfter method of the XmlNode object.

Listing 10.22 Inserting a New Item into the Document Using the `InsertAfter` Method

 <%@ Page debug="true" %> <%@ Import Namespace="System.Xml" %> <%@ Import Namespace="System.Xml.XPath" %> <SCRIPT runat='server'>    void Page_Load(Object Sender,EventArgs e)    {      XmlDocument xd = new XmlDocument();      xd.Load(Server.MapPath("books.xml"));      XmlNode root  = xd.DocumentElement;   // BOOKS      // Insert BOOK element      XmlElement elemBook  = xd.CreateElement("BOOK");      root.InsertAfter(elemBook, root.FirstChild);      xd.Save(Server.MapPath("output.xml"));      Response.Write("Open the file output.xml to view the results.");   } </SCRIPT>

This code loads the XML document from disk and then adds a node using the InsertAfter method. It then saves the file to disk using the filename output.xml.

The contents of output.xml are the same as books.xml, but with a blank BOOK node included. However, a book is actually composed of at least two nodes: a BOOK node and a child TITLE node. To insert the child TITLE node, you can use the AppendChild method of the BOOK node, as shown in Listing 10.23.

Listing 10.23 Inserting a New Child Node Using the `AppendChild` Method

    <%@ Page Language="C#" debug="true" %>
    <%@ Import Namespace="System.Xml" %>
    <%@ Import Namespace="System.Xml.XPath" %>
    <SCRIPT runat='server'>
        void Page_Load(Object Sender, EventArgs e)
        {
            XmlDocument xd = new XmlDocument();
            xd.Load(Server.MapPath("books.xml"));
            XmlNode root = xd.DocumentElement;   // BOOKS

            // Insert BOOK element
            XmlElement elemBook = xd.CreateElement("BOOK");
            root.InsertAfter(elemBook, root.FirstChild);

            // Insert TITLE element beneath the book
            XmlElement elemTitle = xd.CreateElement("TITLE");
            elemTitle.InnerText = "MY TITLE";
            elemBook.AppendChild(elemTitle);

            xd.Save(Server.MapPath("output.xml"));
            Response.Write("Open the file output.xml to view the results.");
        }
    </SCRIPT>

This code is the same as Listing 10.22, except that it contains an additional call to the AppendChild method of the BOOK node to add an associated TITLE child node.

You can see from the previous two listings that changes to a document using the DOM change only the in-memory representation of the object, not the way the document is stored on disk. If you want to persist the changes to disk, you must call the Save method of the XmlDocument object that holds the document.
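The listings in this section cover insertion only, so here is a hedged sketch of the other two operations mentioned at the start of the section: changing the value of an element and removing a child node. The class and method names are hypothetical, and the sketch returns the modified XML as a string instead of calling Save so that it runs without touching the file system; in a real page you would finish with a call to Save, as in the listings above.

```csharp
using System;
using System.Xml;

public static class ModifyBooksSketch
{
    // Hypothetical helper: retitles the first book, removes the last book,
    // and returns the in-memory document as a string.
    public static string RetitleAndTrim(string xml)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(xml);
        XmlNode root = xd.DocumentElement;

        // Change the value of an element by assigning to InnerText.
        XmlNode title = root.SelectSingleNode("BOOK/TITLE");
        title.InnerText = "Revised Title";

        // Remove a child node; RemoveChild is called on the parent.
        root.RemoveChild(root.LastChild);

        // Nothing has been written to disk at this point.
        return xd.OuterXml;
    }

    public static void Main()
    {
        string xml =
            "<BOOKS>" +
            "<BOOK><TITLE>First</TITLE></BOOK>" +
            "<BOOK><TITLE>Second</TITLE></BOOK>" +
            "</BOOKS>";
        Console.WriteLine(RetitleAndTrim(xml));
    }
}
```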

Querying XML Documents Using XPath Expressions

XPath is a set-based query syntax for extracting data from an XML document. If you're accustomed to database programming using Structured Query Language (SQL), you can think of XPath as being somewhat equivalent to SQL. But as with so many analogies between relational and XML data, the similarities run out quickly. XPath provides a different syntax that supports the retrieval of hierarchical data in an XML document.

NOTE

The XPath syntax is a World Wide Web Consortium (W3C) recommendation. You can get more information about XPath from the W3C site at http://www.w3.org/TR/xpath. Information on the Microsoft XML 3.0 (COM) implementation of XPath is at http://msdn.microsoft.com/library/psdk/xmlsdk/xslr0fjs.htm.

XPath enables you to extract data from an XML document using a compact expression, ideally with a single line of code. It's generally a more concise way to extract information buried deep within an XML document. (The alternative to using XPath is to write loops or recursive functions, as most of the examples used earlier in this chapter did.) The compactness of XPath can come at a price, however: readability. Unless you're well versed in the XPath syntax, you may have trouble figuring out what the author of a complicated XPath expression was trying to look up. Bear this in mind as you use XPath in your applications.

Although the complete XPath syntax is quite involved (and beyond the scope of this book), you should know about certain commonly used operations as you approach XML processing using the .NET framework classes. The three most common XPath scenarios are

Retrieving a subset of nodes that match a certain value (for example, all the orders associated with customers)
Retrieving one or more nodes based on the value of an attribute (such as retrieving all the orders for customer ID 1006)
Retrieving all the parent and child nodes where an attribute of a child node matches a certain value (such as retrieving all the customers and orders where the Item attribute of the order node equals 'Tricycle' )
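To make these three scenarios concrete, the following sketch runs one XPath expression for each of them against a small customers-and-orders document. The document itself, its element and attribute names (CUSTOMERS, CUSTOMER, ORDER, id, Item), and the helper names are all assumptions made up for this illustration; they do not come from a file used elsewhere in this chapter.

```csharp
using System;
using System.Xml;

public static class XPathScenariosSketch
{
    // Hypothetical sample data, invented for this example.
    public const string Customers =
        "<CUSTOMERS>" +
        "<CUSTOMER id='1006'><ORDER Item='Tricycle'/><ORDER Item='Wagon'/></CUSTOMER>" +
        "<CUSTOMER id='1007'><ORDER Item='Scooter'/></CUSTOMER>" +
        "</CUSTOMERS>";

    // Returns the number of nodes the expression selects.
    public static int Count(string xpath)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(Customers);
        return xd.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // 1. All the orders associated with customers
        Console.WriteLine(Count("/CUSTOMERS/CUSTOMER/ORDER"));                   // 3
        // 2. All the orders for customer ID 1006
        Console.WriteLine(Count("/CUSTOMERS/CUSTOMER[@id='1006']/ORDER"));       // 2
        // 3. Customers (with their orders) that have an order for a Tricycle
        Console.WriteLine(Count("/CUSTOMERS/CUSTOMER[ORDER/@Item='Tricycle']")); // 1
    }
}
```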

To make it easy to experiment with different kinds of XPath expressions and see how they retrieve different sections of the XML document, we've created an ASP.NET page that enables you to input an XPath query that is then applied against the books.xml document. We'll use this page as a way of demonstrating various XPath techniques in this section. Listing 10.24 shows a complete listing of this page.

Listing 10.24 The XPath Query Page Used to Test XPath Queries Against the books.xml File

    <%@ Page Language="C#" %>
    <%@ Import Namespace="System.Xml" %>
    <%@ Import Namespace="System.Xml.XPath" %>
    <SCRIPT runat='server'>
        private void btnQuery_Click(System.Object sender, System.EventArgs e)
        {
            XmlDocument xd = new XmlDocument();
            XmlNodeList nl;

            xd.Load(Server.MapPath("books.xml"));
            txtOutput.Text = "";

            // Run the user's XPath expression and show each matching node
            nl = xd.SelectNodes(txtXPath.Text);
            foreach (XmlNode nd in nl)
            {
                txtOutput.Text += nd.OuterXml;
            }
        }
    </SCRIPT>
    <HTML>
        <HEAD>
            <title>ASP.NET XPath</title>
        </HEAD>
        <body>
            <form runat="server">
                XPath Expression:<br>
                <asp:TextBox id="txtXPath"
                             Text="BOOKS/BOOK/AUTHOR"
                             TextMode="MultiLine"
                             Rows="3" Width="200" runat="server" />
                <asp:Button id="btnQuery"
                            OnClick="btnQuery_Click"
                            Text="Query" Runat="server" /><br>
                <asp:TextBox id="txtOutput"
                             TextMode="MultiLine"
                             Rows="15" Width="400" runat="server" />
            </form>
        </body>
    </HTML>

This page contains two sections: the XML- and XPath-processing code contained in the btnQuery_Click event procedure, and the HTML and ASP.NET server controls that enable the user to interact with the code. The page works by letting the user type in an XPath expression, which is then executed against the books.xml document. The results from the XPath query are placed into the output text box called txtOutput.

You can see how this page works by typing /BOOKS into the XPath Expression box. If everything works correctly, the output shown in Listing 10.25 should appear in the output text box.
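If you would rather experiment outside a Web page, the same query logic can be sketched as a console helper. This is our own illustrative variant, not part of the page above: the class name, the RunQuery method, and the inline two-author document are assumptions. It also catches XPathException, which SelectNodes throws when the expression is not valid XPath; the query page as written lets that exception surface as a page error.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.XPath;

public static class XPathQuerySketch
{
    // Hypothetical helper: runs an XPath expression against an XML string
    // and returns the OuterXml of each matching node, one per line.
    public static string RunQuery(string xml, string xpath)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(xml);
        try
        {
            StringBuilder sb = new StringBuilder();
            foreach (XmlNode nd in xd.SelectNodes(xpath))
            {
                sb.AppendLine(nd.OuterXml);
            }
            return sb.ToString();
        }
        catch (XPathException ex)
        {
            // Malformed expressions end up here.
            return "Invalid XPath expression: " + ex.Message;
        }
    }

    public static void Main()
    {
        string xml =
            "<BOOKS><BOOK>" +
            "<AUTHOR id='101' location='San Francisco'>Jeffrey P. McManus</AUTHOR>" +
            "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR>" +
            "</BOOK></BOOKS>";
        Console.Write(RunQuery(xml, "BOOKS/BOOK/AUTHOR"));
        Console.WriteLine(RunQuery(xml, "BOOKS/BOOK["));
    }
}
```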

Listing 10.25 Results of a Simple XPath Query

    <BOOKS>
      <BOOK>
        <TITLE>C# Developer's Guide To ASP.NET, XML and ADO.NET</TITLE>
        <AUTHOR id="101" location="San Francisco">Jeffrey P. McManus</AUTHOR>
        <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
      </BOOK>
    </BOOKS>

When you run the code on your own machine, you'll probably notice that the XML is run together without any spaces or line breaks; we've inserted them here for readability.

NOTE

A shortcut exists to retrieve the root node of a document, and the shortcut doesn't even require you to know the name of the root node. The XPath expression /* will always retrieve the root node of a document (and, by extension, all the descendants of the root node).
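A quick way to convince yourself of this is to run /* against documents with different root names. The sketch below is only an illustration; the class name, the RootName helper, and the CATALOG document are invented for the example.

```csharp
using System;
using System.Xml;

public static class RootShortcutSketch
{
    // Hypothetical helper: returns the name of whatever element /* selects.
    public static string RootName(string xml)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(xml);
        XmlNode root = xd.SelectSingleNode("/*");
        return root.Name;
    }

    public static void Main()
    {
        Console.WriteLine(RootName("<BOOKS><BOOK/></BOOKS>"));   // BOOKS
        Console.WriteLine(RootName("<CATALOG><ITEM/></CATALOG>")); // CATALOG
    }
}
```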

This kind of query is interesting, but less than useful. It simply returns the entire contents of the document, and we could have done that with much less code. More interesting is XPath's capability to selectively filter information based on criteria that you specify.

In an earlier code example, you saw an XPath query that retrieved author information. Listing 10.26 shows a version of this query that filters on the author's id attribute, followed by the expected output in the XPath query page.

Listing 10.26 Using XPath to Retrieve an Author Given a Particular ID

    XPath Query Expression:

    /BOOKS/BOOK/AUTHOR[@id = "107"]

    <OUTPUT>
      <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
    </OUTPUT>

In this case, the @ symbol indicates that id is an attribute instead of the name of a node. Note that this expression will retrieve multiple instances of a given author in a case where a document contains multiple books with the same author. Listing 10.27 shows an example.

Listing 10.27 Using XPath to Retrieve Multiple Instances of the Same Author

    Source Document:

    <BOOKS>
      <BOOK>
        <TITLE>C# Developer's Guide To ASP.NET, XML and ADO.NET</TITLE>
        <AUTHOR id="101" location="San Francisco">Jeffrey P. McManus</AUTHOR>
        <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
      </BOOK>
      <BOOK>
        <TITLE>How to Pluck a Purdue Chicken</TITLE>
        <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
      </BOOK>
      <BOOK>
        <TITLE>My Life Among the Baboons</TITLE>
        <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
      </BOOK>
    </BOOKS>

    XPath Query Expression:

    /BOOKS/BOOK/AUTHOR[@id = "107"]/parent::*

    <BOOK>
      <TITLE>C# Developer's Guide To ASP.NET, XML and ADO.NET</TITLE>
      <AUTHOR id="101" location="San Francisco">Jeffrey P. McManus</AUTHOR>
      <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
    </BOOK>
    <BOOK>
      <TITLE>How to Pluck a Purdue Chicken</TITLE>
      <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
    </BOOK>
    <BOOK>
      <TITLE>My Life Among the Baboons</TITLE>
      <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
    </BOOK>

The parent::* step added to the end of the XPath expression in this example tells the query to return the parent of each matching node rather than the matching node itself. Rather than just returning the same author three times, as it would if we'd specified only /BOOKS/BOOK/AUTHOR[@id = "107"], we instead get each BOOK parent node (with the AUTHOR node inside it), which makes more sense.
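The effect of the parent step is easy to see if you print just the names of the nodes each expression selects. The sketch below is a hedged illustration against an inline copy of the three-book document; the class and helper names are our own. It also uses the abbreviation .. which the XPath recommendation defines as shorthand for parent::node() and which selects the same BOOK nodes here.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ParentAxisSketch
{
    // Inline copy of the three-book sample document (single-quoted attributes).
    public const string Books =
        "<BOOKS>" +
        "<BOOK><TITLE>C# Developer's Guide To ASP.NET, XML and ADO.NET</TITLE>" +
        "<AUTHOR id='101' location='San Francisco'>Jeffrey P. McManus</AUTHOR>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "<BOOK><TITLE>How to Pluck a Purdue Chicken</TITLE>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "<BOOK><TITLE>My Life Among the Baboons</TITLE>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "</BOOKS>";

    // Hypothetical helper: comma-separated names of the selected nodes.
    public static string NamesOf(string xpath)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(Books);
        List<string> names = new List<string>();
        foreach (XmlNode nd in xd.SelectNodes(xpath))
        {
            names.Add(nd.Name);
        }
        return string.Join(",", names.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(NamesOf("/BOOKS/BOOK/AUTHOR[@id='107']"));           // AUTHOR,AUTHOR,AUTHOR
        Console.WriteLine(NamesOf("/BOOKS/BOOK/AUTHOR[@id='107']/parent::*")); // BOOK,BOOK,BOOK
        Console.WriteLine(NamesOf("/BOOKS/BOOK/AUTHOR[@id='107']/.."));        // BOOK,BOOK,BOOK
    }
}
```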

If you change the query expression to /BOOKS/BOOK/AUTHOR[@id = "101"]/parent::*, then only one book is retrieved (the book authored by author 101, McManus).

If you were interested in retrieving only one instance of the AUTHOR node, you could use the SelectSingleNode method of the XmlNode object (rather than the SelectNodes method). This ensures that you retrieve only the first matching node.

You can combine multiple query criteria using AND and OR logic; in the expression itself, the XPath operators are written in lowercase (and, or). For example, to retrieve all the book authors who live either in San Francisco or Seattle, use the expression shown in Listing 10.28.
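As a rough sketch of retrieving only the first match, the following example calls SelectSingleNode, which XmlDocument inherits from XmlNode, and reports the title of the book that contains the matching author. The class name, the FirstMatchTitle helper, and the inline copy of the sample data are assumptions for illustration. Note that SelectSingleNode returns null when nothing matches, so the result should be checked before it is used.

```csharp
using System;
using System.Xml;

public static class SingleNodeSketch
{
    // Inline copy of the three-book sample document (single-quoted attributes).
    public const string Books =
        "<BOOKS>" +
        "<BOOK><TITLE>C# Developer's Guide To ASP.NET, XML and ADO.NET</TITLE>" +
        "<AUTHOR id='101' location='San Francisco'>Jeffrey P. McManus</AUTHOR>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "<BOOK><TITLE>How to Pluck a Purdue Chicken</TITLE>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "</BOOKS>";

    // Hypothetical helper: title of the book holding the first matching
    // AUTHOR node, or null when the expression matches nothing.
    public static string FirstMatchTitle(string xpath)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(Books);
        XmlNode author = xd.SelectSingleNode(xpath);
        if (author == null)
        {
            return null;
        }
        // The indexer returns the first child element with the given name.
        return author.ParentNode["TITLE"].InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(FirstMatchTitle("/BOOKS/BOOK/AUTHOR[@id='107']"));
        Console.WriteLine(FirstMatchTitle("/BOOKS/BOOK/AUTHOR[@id='999']") == null);
    }
}
```

Even though author 107 appears in two books in this sample, only the first occurrence, in document order, is returned.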

Listing 10.28 Using `OR` in an XPath to Query Based on Multiple Search Criteria

    XPath Query Expression:

    BOOKS/BOOK/AUTHOR[@location="Seattle" or @location="San Francisco"]

    <AUTHOR id="101" location="San Francisco">Jeffrey P. McManus</AUTHOR>
    <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
    <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
    <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>

This query retrieves both authors because each one fits one or the other criterion. Kinsman is retrieved three times because his name is associated with three books in the list. Any book authors located in Saskatoon would be excluded from the result set.
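The same pattern works with the and operator, which requires both conditions to hold. The sketch below counts the matches for a few combinations against an inline copy of the three-book document; the class and helper names are invented for the example.

```csharp
using System;
using System.Xml;

public static class AndOrSketch
{
    // Inline copy of the three-book sample document (single-quoted attributes).
    public const string Books =
        "<BOOKS>" +
        "<BOOK><TITLE>C# Developer's Guide To ASP.NET, XML and ADO.NET</TITLE>" +
        "<AUTHOR id='101' location='San Francisco'>Jeffrey P. McManus</AUTHOR>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "<BOOK><TITLE>How to Pluck a Purdue Chicken</TITLE>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "<BOOK><TITLE>My Life Among the Baboons</TITLE>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "</BOOKS>";

    // Hypothetical helper: number of nodes the expression selects.
    public static int Count(string xpath)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(Books);
        return xd.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Either condition may hold.
        Console.WriteLine(Count("/BOOKS/BOOK/AUTHOR[@location='Seattle' or @location='San Francisco']")); // 4
        // Both conditions must hold.
        Console.WriteLine(Count("/BOOKS/BOOK/AUTHOR[@location='Seattle' and @id='107']"));                // 3
        // No author is both in San Francisco and numbered 107.
        Console.WriteLine(Count("/BOOKS/BOOK/AUTHOR[@location='San Francisco' and @id='107']"));          // 0
    }
}
```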

Now suppose you want to exclude nodes, perhaps by returning all the authors who don't live in Seattle. Listing 10.29 shows an example of an expression that does this, as well as the expected output.

Listing 10.29 Using XPath to Exclude Nodes Based on the Value of Attributes

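    XPath Query Expression:

    /BOOKS/BOOK/AUTHOR[@location != "Seattle"]

    <AUTHOR id="101" location="San Francisco">Jeffrey P. McManus</AUTHOR>

As you can see, the inequality operator != is the way you use XPath to retrieve all the nodes whose location attribute has a value other than "Seattle". An AUTHOR node with no location attribute at all is not returned by this expression; to include such nodes as well, use not(@location = "Seattle") as the test instead.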

You can also perform greater-than and less-than comparisons using this technique. The XPath expression /BOOKS/BOOK/AUTHOR[@id > 105] retrieves all authors whose id attribute is greater than 105.
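Because the id values in our sample are numeric strings, XPath converts them to numbers before making these comparisons. The following sketch lists the id attribute of each author that a comparison selects; the class name, the IdsOf helper, and the inline copy of the sample data are our own assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ComparisonSketch
{
    // Inline copy of the three-book sample document (single-quoted attributes).
    public const string Books =
        "<BOOKS>" +
        "<BOOK><TITLE>C# Developer's Guide To ASP.NET, XML and ADO.NET</TITLE>" +
        "<AUTHOR id='101' location='San Francisco'>Jeffrey P. McManus</AUTHOR>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "<BOOK><TITLE>How to Pluck a Purdue Chicken</TITLE>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "<BOOK><TITLE>My Life Among the Baboons</TITLE>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "</BOOKS>";

    // Hypothetical helper: comma-separated id attributes of the selected elements.
    public static string IdsOf(string xpath)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(Books);
        List<string> ids = new List<string>();
        foreach (XmlNode nd in xd.SelectNodes(xpath))
        {
            ids.Add(((XmlElement)nd).GetAttribute("id"));
        }
        return string.Join(",", ids.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(IdsOf("/BOOKS/BOOK/AUTHOR[@id > 105]"));  // 107,107,107
        Console.WriteLine(IdsOf("/BOOKS/BOOK/AUTHOR[@id < 105]"));  // 101
        Console.WriteLine(IdsOf("/BOOKS/BOOK/AUTHOR[@id >= 101 and @id <= 107]")); // 101,107,107,107
    }
}
```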

In addition to querying on attributes, you can also query on the text contained in nodes. A good example of a node that contains text in the books.xml document is the TITLE node, contained beneath the BOOK node in our hierarchy.

Listing 10.30 shows an example of a query that retrieves the books in which the TITLE node contains specific text.

Listing 10.30 Retrieving a Specific TITLE Node by Querying on Its Text

    XPath Query Expression:

    /BOOKS/BOOK/TITLE[. = "How to Pluck a Purdue Chicken"]/parent::*

    <BOOK>
      <TITLE>How to Pluck a Purdue Chicken</TITLE>
      <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
    </BOOK>

The dot (.) operator is XPath-ese for "right here": it refers to the node currently being tested, in this case the TITLE node.

There is another way to retrieve all the books whose title matches a given text string without using the parent::* expression. You can instead put the test on the child element inside square brackets that follow BOOK. Listing 10.31 shows an example of this syntax, retrieving the BOOK nodes (along with their TITLE and other child nodes) that have the title you specify.

Listing 10.31 Retrieving a Specific BOOK Node by Querying on the Text Found in the Book's TITLE Node

    XPath Query Expression:

    /BOOKS/BOOK[TITLE = 'How to Pluck a Purdue Chicken']

    <BOOK>
      <TITLE>How to Pluck a Purdue Chicken</TITLE>
      <AUTHOR id="107" location="Seattle">Chris Kinsman</AUTHOR>
    </BOOK>

Comparing this XPath expression, and the way it is phrased, to the query we ran in the previous example illustrates the difference between selecting a node and then stepping up to its parent, and selecting the parent directly. In this example we're saying, "Give me the books whose TITLE text matches the following title." In the previous example, the request was, "Give me the TITLE nodes whose text matches the following title, and then give me their parents." Both return the same BOOK node here. It's a subtle distinction, but important as you delve into getting XPath to work for you.

This section doesn't represent a complete discussion of how to use XPath queries to retrieve data from existing XML documents. However, it should get you on the right path. XPath represents a whole mini-language with additional functions and operators; for more information on the remaining details, consult the .NET framework documentation or (if you're desperate and/or have trouble getting to sleep at night) the W3C XPath recommendation.
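To see which node each phrasing actually hands back, you can print the name of the first node selected. This is a small sketch under our own assumptions: the class name, the FirstNodeName helper, and the two-book inline document are invented for the example.

```csharp
using System;
using System.Xml;

public static class TitleQuerySketch
{
    // A two-book excerpt of the sample data (single-quoted attributes).
    public const string Books =
        "<BOOKS>" +
        "<BOOK><TITLE>How to Pluck a Purdue Chicken</TITLE>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "<BOOK><TITLE>My Life Among the Baboons</TITLE>" +
        "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR></BOOK>" +
        "</BOOKS>";

    // Hypothetical helper: name of the first selected node, or null if none.
    public static string FirstNodeName(string xpath)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(Books);
        XmlNode nd = xd.SelectSingleNode(xpath);
        return nd == null ? null : nd.Name;
    }

    public static void Main()
    {
        string title = "How to Pluck a Purdue Chicken";
        // Selects the TITLE node itself.
        Console.WriteLine(FirstNodeName("/BOOKS/BOOK/TITLE[. = '" + title + "']"));            // TITLE
        // Selects the TITLE node, then steps up to its parent.
        Console.WriteLine(FirstNodeName("/BOOKS/BOOK/TITLE[. = '" + title + "']/parent::*")); // BOOK
        // Selects the BOOK directly, using TITLE only as a filter.
        Console.WriteLine(FirstNodeName("/BOOKS/BOOK[TITLE = '" + title + "']"));              // BOOK
    }
}
```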


