13.1 Reading XML Documents with XmlTextReader


You want to open an XML file and parse its contents using an XmlTextReader object.

Technique

Like many other classes in the .NET Framework, the XmlTextReader class has several constructors you can use to create and initialize an object. Grouped by how the XmlTextReader retrieves the XML data, they fall into three main categories: you can pass a string specifying the file path or URL of the XML file, a Stream object created from one of the many classes derived from Stream, or a TextReader object. The previous chapter demonstrates how to use Stream and TextReader objects. Some constructors also let you pass an XmlNameTable, or an XmlNodeType with an associated XmlParserContext object.
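As a rough illustration of the three groupings, the following sketch opens the same small document from a file path, from a Stream, and from a TextReader. The class name ReaderSources, the helper CountElements, the sample XML, and the temporary file name are all made up for this example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ReaderSources
{
    // Hypothetical helper: counts element nodes so that each source
    // can be shown to yield the same parse result.
    public static int CountElements( XmlTextReader rdr )
    {
        int count = 0;
        try
        {
            while( rdr.Read() )
            {
                if( rdr.NodeType == XmlNodeType.Element )
                    count++;
            }
        }
        finally
        {
            rdr.Close();
        }
        return count;
    }

    public static void Main()
    {
        // Sample data and file name are assumptions for this sketch
        string xml = "<people><person><last_name>Limbach</last_name></person></people>";
        string path = Path.Combine( Path.GetTempPath(), "people_sample.xml" );
        File.WriteAllText( path, xml );

        // 1. A string holding a file path or URL
        Console.WriteLine( CountElements( new XmlTextReader( path ) ) );

        // 2. A Stream (here a MemoryStream over the same bytes)
        Stream stream = new MemoryStream( Encoding.UTF8.GetBytes( xml ) );
        Console.WriteLine( CountElements( new XmlTextReader( stream ) ) );

        // 3. A TextReader (here a StringReader)
        TextReader text = new StringReader( xml );
        Console.WriteLine( CountElements( new XmlTextReader( text ) ) );
    }
}
```

All three calls should report the same element count, because only the source of the characters differs, not the parsing.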

An XmlNameTable is an object designed to optimize string comparisons for element and attribute names within an XML document. When the document is parsed, each unique element and attribute name is added to the table as an object. You can then use this to your advantage by performing object (reference) comparisons rather than the more expensive string comparisons. For this to work, the reader must use the same table in which you registered the names, so you pass the table to the XmlTextReader constructor. Suppose, for example, that you have an XML file containing a list of names and phone numbers for the city you live in. If your city is large, the file is also large. To extract just the last names from the XML file as it is read, you can do a string comparison on each element name. However, that costs more clock cycles than the XmlNameTable approach. An example of using an XmlNameTable for this problem might be the following:

    NameTable names = new NameTable();
    // Add returns the single string instance the table keeps for this name
    object lastName = names.Add( "last_name" );
    // give the reader the same table so element names resolve to that instance
    XmlTextReader rdr = new XmlTextReader( "cityphone.xml", names );
    while( rdr.Read() )
    {
        // comparing as object tests references, not characters
        if( rdr.NodeType == XmlNodeType.Element && (object) rdr.Name == lastName )
            Console.WriteLine( "Found last name of {0}", rdr.ReadString() );
    }
    rdr.Close();

You use the XmlNodeType and its associated XmlParserContext object to specify the kind of XML data you are parsing. They apply when you want to parse a small fragment of XML rather than an entire document. For instance, if you want to parse a single element without the other parts of a complete XML document, such as the XML declaration, you pass a value of XmlNodeType.Element. However, the XML data that you pass might still contain entity or namespace references that the parser does not know about. You use the XmlParserContext object to resolve them. For example, if the XML fragment is an XmlNodeType.Element and one of the inner elements uses a namespace prefix named phone, you have to create an XmlParserContext object and add the namespace to it so that the prefix is properly resolved during parsing:

    string xmlFrag = "<person>" +
        "<first_name>Scott</first_name>" +
        "<last_name>Limbach</last_name>" +
        "<phone:area_code>360</phone:area_code>" +
        "<phone:main>555-5555</phone:main>" +
        "</person>";

    NameTable nt = new NameTable();
    XmlNamespaceManager nsmgr = new XmlNamespaceManager(nt);
    nsmgr.AddNamespace("phone", "urn:cityphonedb");

    // Create the XmlParserContext.
    XmlParserContext context = new XmlParserContext(null, nsmgr, null, XmlSpace.None);

    // Create the reader.
    XmlTextReader reader = new XmlTextReader(xmlFrag, XmlNodeType.Element, context);
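To see what the parser context contributes, a sketch along the following lines reads a fragment and reports the namespace URI the reader resolved for each element. The class name FragmentDemo and the method DescribeElements are hypothetical. The phone prefix and the urn:cityphonedb URI come from the example above.

```csharp
using System;
using System.Text;
using System.Xml;

public static class FragmentDemo
{
    // Hypothetical helper: returns one "localName [namespaceURI]" line per element
    public static string DescribeElements( string xmlFrag )
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsmgr = new XmlNamespaceManager( nt );
        nsmgr.AddNamespace( "phone", "urn:cityphonedb" );

        XmlParserContext context =
            new XmlParserContext( null, nsmgr, null, XmlSpace.None );
        XmlTextReader reader =
            new XmlTextReader( xmlFrag, XmlNodeType.Element, context );

        StringBuilder sb = new StringBuilder();
        try
        {
            while( reader.Read() )
            {
                if( reader.NodeType == XmlNodeType.Element )
                    sb.Append( reader.LocalName + " [" + reader.NamespaceURI + "]\n" );
            }
        }
        finally
        {
            reader.Close();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write( DescribeElements(
            "<person><phone:area_code>360</phone:area_code></person>" ) );
    }
}
```

Elements that use the phone prefix should report urn:cityphonedb, and unprefixed elements an empty namespace. Without the AddNamespace call, the reader has no declaration for the phone prefix and the fragment fails to parse.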

Once you create an instance of an XmlTextReader object, you are ready to begin parsing the XML file. You read XML data by calling the Read method defined in the XmlTextReader class, which advances the reader to the next node in the document. To determine the type of the node that was just read, you can use a switch statement on the NodeType property of the XmlTextReader object, using the values of the XmlNodeType enumeration. Depending on the type of the node, you can access the associated data through the Name or Value property.

Listing 13.1 shows how to populate a TreeView control on a Windows Form. The parsing occurs in the PopulateTreeView method. When the XmlTextReader encounters an element, a new TreeNode is created. Any attributes of the element are placed in a Hashtable, which is then assigned to the Tag property of the TreeNode object. When all the attributes have been read using the MoveToNextAttribute method in the XmlTextReader class, the TreeNode is added to the tree.

One thing to note is the Stack object. Because XML is hierarchical, the TreeView must display that hierarchy by creating TreeNode objects as children of other TreeNode objects. When the XmlTextReader finds a new element, the last TreeNode that was created is pushed onto the stack. When an EndElement is encountered, that parent node is "popped" off the stack and made current. You can also use recursion, but a nonrecursive, stack-based mechanism avoids deep call chains on heavily nested documents and keeps the parsing loop simple.
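The Read-and-switch pattern can also be tried without a form. The following console-sized sketch prints an indented outline of a document, using the reader's Depth property in place of the stack. The class name OutlineDemo, the method Outline, and the sample XML are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class OutlineDemo
{
    // Hypothetical helper: builds an indented outline of elements,
    // attributes, and text nodes.
    public static string Outline( TextReader input )
    {
        StringBuilder sb = new StringBuilder();
        XmlTextReader rdr = new XmlTextReader( input );
        // skip whitespace-only nodes between elements
        rdr.WhitespaceHandling = WhitespaceHandling.None;
        try
        {
            while( rdr.Read() )
            {
                // Depth is the nesting level of the node just read
                string indent = new string( ' ', rdr.Depth * 2 );
                switch( rdr.NodeType )
                {
                    case XmlNodeType.Element:
                        sb.Append( indent + "<" + rdr.Name );
                        // Name and Value describe the current attribute here
                        while( rdr.MoveToNextAttribute() )
                            sb.Append( " " + rdr.Name + "=" + rdr.Value );
                        sb.Append( ">\n" );
                        break;
                    case XmlNodeType.Text:
                        sb.Append( indent + rdr.Value + "\n" );
                        break;
                }
            }
        }
        finally
        {
            rdr.Close();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write( Outline( new StringReader(
            "<person id=\"7\"><last_name>Limbach</last_name></person>" ) ) );
    }
}
```

Element nodes expose their tag through Name, text nodes expose their content through Value, and attributes expose both once MoveToNextAttribute has positioned the reader on them.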

Listing 13.1 Populating a `TreeView` Using XML

    using System;
    using System.Drawing;
    using System.Collections;
    using System.ComponentModel;
    using System.Windows.Forms;
    using System.Data;
    using System.Xml;

    namespace _1_XmlTextReader
    {
        public class Form1 : System.Windows.Forms.Form
        {
            private System.Windows.Forms.MainMenu mainMenu1;
            private System.Windows.Forms.MenuItem menuItem1;
            private System.Windows.Forms.MenuItem mnuOpen;
            private System.Windows.Forms.MenuItem menuItem3;
            private System.Windows.Forms.TreeView tvXML;
            private System.Windows.Forms.OpenFileDialog openFileDialog1;
            private System.Windows.Forms.ListView lvAttributes;
            private System.Windows.Forms.ColumnHeader columnHeader1;
            private System.Windows.Forms.ColumnHeader columnHeader2;
            private System.Windows.Forms.Label label1;
            private System.Windows.Forms.Label lblTextNode;
            private System.ComponentModel.Container components = null;

            public Form1()
            {
                InitializeComponent();
            }

            protected override void Dispose( bool disposing )
            {
                if( disposing )
                {
                    if (components != null)
                    {
                        components.Dispose();
                    }
                }
                base.Dispose( disposing );
            }

            // the designer-generated InitializeComponent method is omitted from this listing
            #region Windows Form Designer generated code
            #endregion

            [STAThread]
            static void Main()
            {
                Application.Run(new Form1());
            }

            private void mnuOpen_Click(object sender, System.EventArgs e)
            {
                if( openFileDialog1.ShowDialog(this) == DialogResult.OK )
                {
                    PopulateTreeView( openFileDialog1.FileName );
                }
            }

            private void PopulateTreeView( string fileName )
            {
                XmlTextReader rdr = new XmlTextReader( fileName );
                Stack nodeStack = new Stack();
                TreeNode curTreeNode = null;

                // clear tree view
                tvXML.Nodes.Clear();

                try
                {
                    while( rdr.Read() )
                    {
                        switch (rdr.NodeType)
                        {
                            // new start element found
                            case XmlNodeType.Element:
                            {
                                // note now whether this is an empty element such as <item/>;
                                // the reader is repositioned once attributes are read
                                bool isEmptyElement = rdr.IsEmptyElement;

                                // push last element onto stack
                                if( curTreeNode != null )
                                    nodeStack.Push( curTreeNode );

                                // create new element
                                curTreeNode = new TreeNode( rdr.Name );
                                curTreeNode.Tag = new Hashtable();

                                // populate attribute hashtable for element
                                if( rdr.HasAttributes == true )
                                {
                                    curTreeNode.ForeColor = Color.Red;
                                    while( rdr.MoveToNextAttribute() )
                                    {
                                        ((Hashtable) curTreeNode.Tag).Add(
                                            rdr.Name, rdr.Value );
                                    }
                                }

                                // add element to proper place in tree.
                                // Parent node is on top of stack
                                if( nodeStack.Count > 0 )
                                    ((TreeNode)nodeStack.Peek()).Nodes.Add(curTreeNode);
                                else
                                    tvXML.Nodes.Add( curTreeNode );

                                // an empty element produces no EndElement node,
                                // so make its parent current again here
                                if( isEmptyElement && nodeStack.Count > 0 )
                                    curTreeNode = (TreeNode) nodeStack.Pop();
                                break;
                            }
                            case XmlNodeType.Text:
                            {
                                // the indexer overwrites an earlier entry, so an element
                                // with several text nodes does not throw as Add would
                                ((Hashtable) curTreeNode.Tag)["Text"] = rdr.Value;
                                break;
                            }
                            case XmlNodeType.EndElement:
                            {
                                // pop the last parent node off the stack
                                if( nodeStack.Count > 0 )
                                    curTreeNode = (TreeNode) nodeStack.Pop();
                                break;
                            }
                            default:
                            {
                                break;
                            }
                        }
                    }
                }
                finally
                {
                    // release the file even if Read throws
                    rdr.Close();
                }
            }

            private void tvXML_AfterSelect(
                object sender,
                System.Windows.Forms.TreeViewEventArgs e )
            {
                // clear attribute list view
                lvAttributes.Items.Clear();

                Hashtable atts = (Hashtable) tvXML.SelectedNode.Tag;
                IDictionaryEnumerator attsEnum = atts.GetEnumerator();

                // enumerate tree node attribute hashtable and add to listview
                while( attsEnum.MoveNext() )
                {
                    if( attsEnum.Key.ToString() != "Text" )
                        lvAttributes.Items.Add(new ListViewItem(
                            new string[]{attsEnum.Key.ToString(),
                            attsEnum.Value.ToString()} ));
                }

                // show the element's text content, if any
                if( atts.ContainsKey( "Text" ))
                {
                    lblTextNode.Text = atts["Text"].ToString();
                }
                else
                {
                    lblTextNode.Text = "";
                }
            }
        }
    }

Figure 13.1. The XML Viewer application parses an XML document using `XmlTextReader` and displays the information in a `TreeView`.


Comments

The XmlTextReader is one of three readers that parse XML data. Each of these classes parses XML data, allowing you to place the data in a data structure more appropriate to your application. The other two classes, XmlNodeReader and XmlValidatingReader, are covered in Sections 13.2 and 13.5.

XmlTextReader is forward only, noncached, and nonvalidating. As the parser encounters a new element, that element is not validated against a Document Type Definition (DTD) or XML schema, and you cannot reverse the parsing process to revisit a node, because the parser does not keep nodes in memory after they have been read. If you have used the Simple API for XML (SAX) model of reading XML, these terms might seem familiar. However, XmlTextReader uses a pull model, in which your code asks the reader for each node, whereas SAX uses a push model, in which the parser calls back into your code. One of the advantages of using an XmlTextReader is performance. If you aren't concerned with validating XML against a schema or DTD, XmlTextReader should be high on your list of candidates. Even though validation does not occur, the XML file or fragment must still be well formed. If it is not, the Read method throws an exception (an XmlException), which you must handle.

Sections 13.2, "Reading DOM Tree with XmlNodeReader," and 13.5, "Validating XML Documents with Schemas," look at the remaining two XML readers that inherit from the XmlReader class. The XmlValidatingReader allows you to perform validation against a schema or DTD, and you use an XmlNodeReader object when you want to parse XML data from an existing XmlNode object.
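Handling the exception that Read throws for malformed input might look like the following sketch. The class name WellFormedCheck and the method IsWellFormed are hypothetical. XmlException and its Message property are part of System.Xml.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedCheck
{
    // Hypothetical helper: reads the whole input and reports whether
    // the parser accepted it as well formed.
    public static bool IsWellFormed( string xml, out string error )
    {
        error = null;
        XmlTextReader rdr = new XmlTextReader( new StringReader( xml ) );
        try
        {
            // pull every node; nothing is validated against a schema or DTD
            while( rdr.Read() )
            {
            }
            return true;
        }
        catch( XmlException ex )
        {
            // the message describes the problem the parser found
            error = ex.Message;
            return false;
        }
        finally
        {
            rdr.Close();
        }
    }

    public static void Main()
    {
        string error;
        // the <b> element is never closed, so Read throws partway through
        if( !IsWellFormed( "<a><b></a>", out error ) )
            Console.WriteLine( "Not well formed: {0}", error );
    }
}
```

Because the reader is forward only, any nodes returned before the error have already been processed by the time the exception arrives, so an application that needs all-or-nothing behavior has to buffer or roll back its own results.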
