Recipe 15.1. Reading and Accessing XML Data in Document Order

Problem

You need to read in all the elements of an XML document and obtain information about each element, such as its name and attributes.

Solution

Create an XmlReader and use its Read method to process the document as shown in Example 15-1.

Example 15-1. Reading an XML document

    using System;
    using System.IO;    // MemoryStream
    using System.Text;  // Encoding
    using System.Xml;

    // …

    // Writes one space per nesting level.
    public static void Indent(int level)
    {
        for (int i = 0; i < level; i++)
            Console.Write(" ");
    }

    public static void AccessXML()
    {
        string xmlFragment = "<?xml version='1.0'?>" +
            "<!-- My sample XML -->" +
            "<?pi myProcessingInstruction?>" +
            "<Root>" +
            "<Node1 nodeId='1'>First Node</Node1>" +
            "<Node2 nodeId='2'>Second Node</Node2>" +
            "<Node3 nodeId='3'>Third Node</Node3>" +
            "</Root>";

        byte[] bytes = Encoding.UTF8.GetBytes(xmlFragment);
        using (MemoryStream memStream = new MemoryStream(bytes))
        {
            XmlReaderSettings settings = new XmlReaderSettings();
            // Check for any illegal characters in the XML.
            settings.CheckCharacters = true;

            using (XmlReader reader = XmlReader.Create(memStream, settings))
            {
                int level = 0;
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.CDATA:
                            Indent(level);
                            Console.WriteLine("CDATA: {0}", reader.Value);
                            break;
                        case XmlNodeType.Comment:
                            Indent(level);
                            Console.WriteLine("COMMENT: {0}", reader.Value);
                            break;
                        case XmlNodeType.DocumentType:
                            Indent(level);
                            Console.WriteLine("DOCTYPE: {0}={1}",
                                reader.Name, reader.Value);
                            break;
                        case XmlNodeType.Element:
                            Indent(level);
                            Console.WriteLine("ELEMENT: {0}", reader.Name);
                            // Children and attributes are one level deeper.
                            level++;
                            while (reader.MoveToNextAttribute())
                            {
                                Indent(level);
                                Console.WriteLine("ATTRIBUTE: {0}='{1}'",
                                    reader.Name, reader.Value);
                            }
                            break;
                        case XmlNodeType.EndElement:
                            level--;
                            break;
                        case XmlNodeType.EntityReference:
                            Indent(level);
                            Console.WriteLine("ENTITY: {0}", reader.Name);
                            break;
                        case XmlNodeType.ProcessingInstruction:
                            Indent(level);
                            Console.WriteLine("INSTRUCTION: {0}={1}",
                                reader.Name, reader.Value);
                            break;
                        case XmlNodeType.Text:
                            Indent(level);
                            Console.WriteLine("TEXT: {0}", reader.Value);
                            break;
                        case XmlNodeType.XmlDeclaration:
                            Indent(level);
                            Console.WriteLine("DECLARATION: {0}={1}",
                                reader.Name, reader.Value);
                            break;
                    }
                }
            }
        }
    }

This code dumps the XML document in a hierarchical format:

    DECLARATION: xml=version='1.0'
    COMMENT: My sample XML
    INSTRUCTION: pi=myProcessingInstruction
    ELEMENT: Root
     ELEMENT: Node1
      ATTRIBUTE: nodeId='1'
      TEXT: First Node
     ELEMENT: Node2
      ATTRIBUTE: nodeId='2'
      TEXT: Second Node
     ELEMENT: Node3
      ATTRIBUTE: nodeId='3'
      TEXT: Third Node

Discussion

Reading existing XML and identifying the different node types is one of the fundamental tasks you will perform when dealing with XML. The code in the Solution creates an XmlReader over a MemoryStream built from a string (XmlReader.Create also accepts a TextReader or a URI), then iterates over the nodes in document order, writing an indented outline of the document to the console window.
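As a sketch of the alternative to a stream, the XML string can be wrapped in a StringReader and handed to the XmlReader.Create overload that takes a TextReader. The class name StringReaderExample and the helper ElementNames are hypothetical names used only for this illustration; they are not part of the recipe.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StringReaderExample
{
    // Hypothetical helper: collects element names in document order.
    public static List<string> ElementNames(string xml)
    {
        List<string> names = new List<string>();
        using (StringReader text = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(text))
        {
            while (reader.Read())
            {
                // Only start tags are reported as Element nodes.
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
            }
        }
        return names;
    }

    public static void Main()
    {
        string xml = "<Root>" +
            "<Node1 nodeId='1'>First Node</Node1>" +
            "<Node2 nodeId='2'>Second Node</Node2>" +
            "</Root>";
        // Prints: Root, Node1, Node2
        Console.WriteLine(string.Join(", ", ElementNames(xml)));
    }
}
```

Reading from a StringReader skips the round trip through a byte array, which may be convenient when the XML is already in memory as a string.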

The Solution shows creating a MemoryStream from an XML fragment in a string like this:

    string xmlFragment = "<?xml version='1.0'?>" +
        "<!-- My sample XML -->" +
        "<?pi myProcessingInstruction?>" +
        "<Root>" +
        "<Node1 nodeId='1'>First Node</Node1>" +
        "<Node2 nodeId='2'>Second Node</Node2>" +
        "<Node3 nodeId='3'>Third Node</Node3>" +
        "</Root>";

    byte[] bytes = Encoding.UTF8.GetBytes(xmlFragment);
    MemoryStream memStream = new MemoryStream(bytes);

Once the MemoryStream has been established, the settings for the XmlReader are configured on an XmlReaderSettings instance. Here CheckCharacters is set to true (which is also its default value), telling the XmlReader to check for any illegal characters in the XML fragment:

    XmlReaderSettings settings = new XmlReaderSettings();
    // Check for any illegal characters in the XML.
    settings.CheckCharacters = true;
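To make the effect of CheckCharacters more concrete, the following is a minimal sketch that reads the same input with the setting on and off. It assumes the input contains a character reference (&#x1;) that is not a legal XML 1.0 character; the class CheckCharactersExample and the helper CanRead are hypothetical names introduced for this illustration. With checking enabled, the reader is expected to throw an XmlException when it reaches the reference; with checking disabled, character references are not validated.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CheckCharactersExample
{
    // Hypothetical helper: returns true if the whole document can be
    // read without the reader throwing an XmlException.
    public static bool CanRead(string xml, bool checkCharacters)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.CheckCharacters = checkCharacters;
        try
        {
            using (StringReader text = new StringReader(xml))
            using (XmlReader reader = XmlReader.Create(text, settings))
            {
                // Walk every node; errors surface as the reader advances.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // &#x1; refers to a control character that XML 1.0 does not allow.
        string xml = "<Root>&#x1;</Root>";
        Console.WriteLine("Checked:   {0}", CanRead(xml, true));
        Console.WriteLine("Unchecked: {0}", CanRead(xml, false));
    }
}
```

Leaving the check enabled is usually the safer choice, since it surfaces malformed input at read time rather than further downstream.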

The while loop iterates over the XML by reading one node at a time and examining the reader's NodeType property to determine what type of XML node the reader is currently positioned on:

    while (reader.Read())
    {
        switch (reader.NodeType)
        {

The NodeType property returns an XmlNodeType enumeration value that identifies the type of the current node. The XmlNodeType enumeration values are shown in Table 15-1.

Table 15-1. The XmlNodeType enumeration values
Name	Description
`Attribute`	An attribute node of an element.
`CDATA`	A CDATA section: text that is escaped so that characters that would usually be treated as markup are read literally.
`Comment`	A comment in the XML: `<!-- my comment -->`.
`Document`	The root of the XML document tree.
`DocumentFragment`	Document fragment node.
`DocumentType`	The document type declaration.
`Element`	An element tag: `<myelement>`.
`EndElement`	An end element tag: `</myelement>`.
`EndEntity`	Returned at the end of an entity after calling ResolveEntity.
`Entity`	Entity declaration.
`EntityReference`	A reference to an entity.
`None`	This is the node returned if `Read` has not yet been called on the `XmlReader`.
`Notation`	A notation in the DTD (document type definition).
`ProcessingInstruction`	The processing instruction: `<?pi myProcessingInstruction?>`.
`SignificantWhitespace`	Whitespace when mixed content model is used or when whitespace is being preserved.
`Text`	Text content for a node.
`Whitespace`	The whitespace between markup entries.
`XmlDeclaration`	The XML declaration, which must be the first node in the document and cannot have children: `<?xml version='1.0'?>`.
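To see which of these node types Read actually reports for a given document, the nodes can be tallied by type. The sketch below is illustrative only: the class NodeTypeTally and the helper Count are hypothetical names, and the input is a small indented document chosen to show how the IgnoreWhitespace setting on XmlReaderSettings affects the Whitespace entries.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeTypeTally
{
    // Hypothetical helper: counts how many nodes of each type Read returns.
    public static Dictionary<XmlNodeType, int> Count(string xml, bool ignoreWhitespace)
    {
        Dictionary<XmlNodeType, int> counts = new Dictionary<XmlNodeType, int>();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = ignoreWhitespace;

        using (StringReader text = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(text, settings))
        {
            while (reader.Read())
            {
                int seen;
                // seen stays 0 when the type has not been counted yet.
                counts.TryGetValue(reader.NodeType, out seen);
                counts[reader.NodeType] = seen + 1;
            }
        }
        return counts;
    }

    public static void Main()
    {
        string xml = "<Root>\n  <Node1 nodeId='1'>First Node</Node1>\n</Root>";

        foreach (KeyValuePair<XmlNodeType, int> pair in Count(xml, false))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);

        Console.WriteLine("--- with IgnoreWhitespace = true ---");

        foreach (KeyValuePair<XmlNodeType, int> pair in Count(xml, true))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```

Attribute does not appear in either tally: Read steps over attributes, and they are reached from the owning element with MoveToNextAttribute, as the Solution does.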