Reading XML Nodes

As you traverse the XML stream, the XmlTextReader acts as a pointer to each node it encounters. Once the XmlTextReader points at a valid node, there are four vital pieces of information that you can query: Node Name, Node Namespace, Node Value, and Node Attributes. The XmlTextReader has accessors for each of these properties, but not all of them apply to every node type. The node value and node attributes properties depend directly on the type of the current node.

The node value property depends directly on the node type. Table 10.2 shows which node types have a value and the actual value that is returned for each. If the node type is not in the list, String.Empty is returned.

Table 10.2. XML Nodes with Values

NODE TYPE	VALUE
`Attribute`	The string value of the attribute
`CDATA`	The content of the CDATA section
`Comment`	The content of the comment
`ProcessingInstruction`	The entire content, not including the target
`SignificantWhitespace`	The white space within an xml:space = 'preserve' scope
`Text`	The content of the text node
`Whitespace`	The white space between markup
`XmlDeclaration`	The content of the declaration

The node attributes property depends on two things: the node type and whether the attributes are actually present on the node. Table 10.3 lists which nodes can contain attributes and which attributes are available on the node. Keep in mind that a node that has the ability to contain attributes is different from a node that actually contains attributes.

Table 10.3. XML Nodes That Can Contain Attributes

NODE TYPE	AVAILABLE ATTRIBUTE
Element	Any custom attribute
XmlDeclaration	version, encoding, and standalone

The XmlTextReader also provides convenient properties to check whether a node can return a value or has any attributes. The HasValue property returns true if the current node can have a value and false otherwise. Note that when HasValue is equal to true, the value may still be the empty string. It means only that it is possible for a node of this type to have a value. HasAttributes returns true if the current node has attributes and false if it does not.
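The following is a minimal sketch of how HasValue and HasAttributes might be queried while walking a document. The sample XML, the class name, and the use of Console.WriteLine in place of MessageBox.Show are assumptions made only to keep the example self-contained.

```csharp
using System;
using System.IO;
using System.Xml;

class NodeInfoDemo
{
    static void Main()
    {
        // Hypothetical sample document, not taken from the chapter.
        string xml = "<Exercise id=\"7\"><Name>Squat</Name><Notes></Notes></Exercise>";

        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        while (reader.Read())
        {
            // HasValue reports whether this node type can carry a value,
            // not whether the value is non-empty.
            Console.WriteLine("{0} '{1}': HasValue={2}, HasAttributes={3}",
                reader.NodeType, reader.Name, reader.HasValue, reader.HasAttributes);

            if (reader.HasAttributes)
            {
                // Visit each attribute, then return to the owning element.
                while (reader.MoveToNextAttribute())
                {
                    Console.WriteLine("  attribute {0} = {1}", reader.Name, reader.Value);
                }
                reader.MoveToElement();
            }
        }
        reader.Close();
    }
}
```

Calling MoveToElement after the attribute loop returns the reader to the owning element, so later code sees the element rather than its last attribute.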

To facilitate checking the current node type, the XmlTextReader provides the NodeType property. The property returns an enumeration value of System.Xml.XmlNodeType. There is an enumeration value for each type of node that can be contained in an XML document. Some of these types, such as XmlNodeType.DocumentType or XmlNodeType.Notation, will never be encountered when using the .NET Compact Framework.

Reading XML Nodes with the `Read` Method

The XmlTextReader provides several methods to read through the XML stream. The Read method is the most basic of these methods. It iterates through each node in the document. When an XmlTextReader is first created and initialized, it is not yet positioned on any node, so the Read method must be called to read the very first node. This method returns true if the next node was read successfully. It returns false if there are no more nodes to be read.

Listing 10.7 illustrates how to use the Read method to walk through the XML stream and print out data on the nodes that are found.

Listing 10.7

C#

    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                // Distinguish <Name/> from <Name>
                if (reader.IsEmptyElement)
                    MessageBox.Show("<" + reader.Name + "/>");
                else
                    MessageBox.Show("<" + reader.Name + ">");
                break;
            case XmlNodeType.EndElement:
                MessageBox.Show("</" + reader.Name + ">");
                break;
            case XmlNodeType.CDATA:
                MessageBox.Show("<![CDATA[" + reader.Value + "]]>");
                break;
            case XmlNodeType.Comment:
                MessageBox.Show("<!-- " + reader.Value + " -->");
                break;
            case XmlNodeType.Document:
                MessageBox.Show("Reading an XML document");
                break;
            case XmlNodeType.DocumentFragment:
                MessageBox.Show("Reading an XML document fragment");
                break;
            case XmlNodeType.ProcessingInstruction:
                // Name holds the target; Value holds the remaining content
                MessageBox.Show("<?" + reader.Name + " " + reader.Value + "?>");
                break;
            case XmlNodeType.Text:
                MessageBox.Show("Text: " + reader.Value);
                break;
            case XmlNodeType.XmlDeclaration:
                MessageBox.Show("<?xml " + reader.Value + "?>");
                break;
        }
    }

VB

    While (reader.Read())
        If (reader.NodeType = XmlNodeType.Element) Then
            ' Distinguish <Name/> from <Name>
            If (reader.IsEmptyElement) Then
                MessageBox.Show("<" & reader.Name & "/>")
            Else
                MessageBox.Show("<" & reader.Name & ">")
            End If
        ElseIf (reader.NodeType = XmlNodeType.EndElement) Then
            MessageBox.Show("</" & reader.Name & ">")
        ElseIf (reader.NodeType = XmlNodeType.CDATA) Then
            MessageBox.Show("<![CDATA[" & reader.Value & "]]>")
        ElseIf (reader.NodeType = XmlNodeType.Comment) Then
            MessageBox.Show("<!-- " & reader.Value & " -->")
        ElseIf (reader.NodeType = XmlNodeType.Document) Then
            MessageBox.Show("Reading an XML document")
        ElseIf (reader.NodeType = XmlNodeType.DocumentFragment) Then
            MessageBox.Show("Reading an XML document fragment")
        ElseIf (reader.NodeType = XmlNodeType.ProcessingInstruction) Then
            ' Name holds the target; Value holds the remaining content
            MessageBox.Show("<?" & reader.Name & " " & _
                            reader.Value & "?>")
        ElseIf (reader.NodeType = XmlNodeType.Text) Then
            MessageBox.Show("Text: " & reader.Value)
        ElseIf (reader.NodeType = XmlNodeType.XmlDeclaration) Then
            MessageBox.Show("<?xml " & reader.Value & "?>")
        End If
    End While

Reading the Start Element Tag

The ReadStartElement method is a helper method that checks whether the current node is a start element and advances the reader to the next node. Internally, this method first calls IsStartElement. If IsStartElement returns false, an XmlException is thrown. If IsStartElement returns true, the Read method is called. This leaves the XmlTextReader positioned on the content of the element. If the current node is a true empty element, <Empty/>, calling ReadStartElement leaves the XmlTextReader on the next node in the stream. If the element has no content but is written with separate start and end tags, <Empty></Empty>, the XmlTextReader is left on the end element, </Empty>.

You can optionally supply a name and/or namespace to this method. If this data is supplied, IsStartElement will check whether the current node has the matching name and/or namespace. If not, an XmlException is thrown.

The ReadStartElement method should be used in conjunction with the ReadEndElement method. After you have called ReadStartElement and consumed any content the element contains, call ReadEndElement. Note that ReadEndElement throws an XmlException if the current node is not an end element. Therefore, when ReadStartElement reads a true empty element, an XmlException is thrown if ReadEndElement is called next, because a true empty element has no end element. To prevent this XmlException, the XmlTextReader provides the IsEmptyElement property, which returns true if the current node is a true empty element and false otherwise. Check IsEmptyElement before calling ReadStartElement, because afterward the reader has already moved past the element.
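The interplay between IsEmptyElement, ReadStartElement, and ReadEndElement can be sketched as follows. The helper ReadTextElement, the class name, and the sample XML are hypothetical, and Console.WriteLine stands in for MessageBox.Show so that the example is self-contained.

```csharp
using System;
using System.IO;
using System.Xml;

class EmptyElementDemo
{
    // Hypothetical helper: reads the text of the next element with the given
    // name, returning String.Empty for a true empty element such as <Notes/>.
    static string ReadTextElement(XmlTextReader reader, string name)
    {
        reader.MoveToContent();
        // Record this before ReadStartElement moves the reader off the element.
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement(name);
        if (isEmpty)
        {
            // There is no end element to consume for <Notes/>.
            return String.Empty;
        }
        string text = reader.ReadString();
        reader.ReadEndElement();
        return text;
    }

    static void Main()
    {
        // Hypothetical sample document, not taken from the chapter.
        string xml = "<Exercise><Name>Squat</Name><Notes/></Exercise>";
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));

        reader.ReadStartElement("Exercise");
        string name = ReadTextElement(reader, "Name");
        string notes = ReadTextElement(reader, "Notes");
        reader.ReadEndElement();
        reader.Close();

        Console.WriteLine("Name = '{0}', Notes = '{1}'", name, notes);
    }
}
```

The same helper also handles <Notes></Notes>, because in that case IsEmptyElement is false and the end element is consumed by ReadEndElement.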

The ReadStartElement method should be used when you want to move the reader past the start element and on to the content of the node. Listing 10.8 shows how to use ReadStartElement.

Listing 10.8

C#

    reader.Read();
    reader.ReadStartElement("Exercise");

    // Read <Name>...</Name>
    reader.ReadStartElement("Name");
    string exName = reader.ReadString();
    reader.ReadEndElement();

    // Read <BodyPart>...</BodyPart>
    reader.ReadStartElement("BodyPart");
    string bp = reader.ReadString();
    reader.ReadEndElement();

    reader.Close();
    MessageBox.Show("The " + exName + " exercise works the " + bp);

VB

    reader.Read()
    reader.ReadStartElement("Exercise")

    ' Read <Name>...</Name>
    reader.ReadStartElement("Name")
    Dim exName As String = reader.ReadString()
    reader.ReadEndElement()

    ' Read <BodyPart>...</BodyPart>
    reader.ReadStartElement("BodyPart")
    Dim bp As String = reader.ReadString()
    reader.ReadEndElement()

    reader.Close()
    MessageBox.Show("The " & exName & " exercise works the " & bp)

Reading Element Content as a String

The ReadElementString method is a helper method to read a text-only element. It first calls the MoveToContent method, discussed in the "Jumping to an Element's Content" section, to position the XmlTextReader on the next content node, which must be a start element. Then the method parses the element's content as a string value. The parsed string is returned, or if the element is empty, String.Empty is returned. This method throws an XmlException in two cases: the current node is not a start element, or the element does not contain simple text. Simple text does not include markup such as child elements, comments, or processing instructions. If the element contains several text nodes or CDATA nodes, the text is concatenated and returned to the caller. ReadElementString also consumes the end element while reading the string content.

You can optionally supply a name and/or namespace to this method. If this data is supplied, the method checks, after calling MoveToContent, whether the element it found has the matching name and/or namespace URI. If not, an XmlException is thrown.

This method should be used when you need to pull all of the data from an element as a string. This method also alleviates the need to read the start and end elements with separate API calls. Listing 10.9 demonstrates how to use ReadElementString.

Listing 10.9

C#

    reader.Read();
    reader.ReadStartElement("Exercise");
    string name = reader.ReadElementString();
    string bodypart = reader.ReadElementString();
    reader.Close();
    MessageBox.Show("The " + name + " exercise works the " + bodypart);

VB

    reader.Read()
    reader.ReadStartElement("Exercise")
    Dim name As String
    Dim bodypart As String
    name = reader.ReadElementString()
    bodypart = reader.ReadElementString()
    reader.Close()
    MessageBox.Show("The " & name & " exercise works the " & bodypart)
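As a complement, here is a small sketch of the overload of ReadElementString that takes an element name, showing the XmlException raised when the document does not have the expected shape. The sample XML, the class name, and the deliberately wrong name "Muscle" are assumptions for illustration, and Console.WriteLine replaces MessageBox.Show to keep the example self-contained.

```csharp
using System;
using System.IO;
using System.Xml;

class ReadElementStringDemo
{
    static void Main()
    {
        // Hypothetical sample document, not taken from the chapter.
        string xml = "<Exercise><Name>Squat</Name><BodyPart>Legs</BodyPart></Exercise>";
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            reader.ReadStartElement("Exercise");

            // Succeeds: the next content node is the <Name> start element.
            string name = reader.ReadElementString("Name");

            // The next element is <BodyPart>, so asking for "Muscle"
            // is expected to raise an XmlException.
            string muscle = reader.ReadElementString("Muscle");

            Console.WriteLine(name + " / " + muscle);
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Unexpected document shape: " + ex.Message);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Supplying the name turns a silent misread into an explicit failure, which is usually preferable when the element order matters.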

Jumping to an Element's Content

The MoveToContent method is intended to get the reader to an element's content as quickly and reliably as possible. The method first checks whether the current node is a content node. A content node is an element, an end element, an entity reference, an end entity, a CDATA section, or non-white-space text. If the node is not a content node, this method keeps reading nodes until it finds the next content node in the XML stream or reaches the end of the file. While searching for the next content node, MoveToContent skips over DocumentType nodes, ProcessingInstruction nodes, Comment nodes, Whitespace nodes, and SignificantWhitespace nodes. If the current node is an attribute of a content node, this method moves the reader back to the element that owns the attribute.
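The skipping and attribute behavior described above can be observed with a short sketch like the one below. The sample XML, the isbn attribute, and the class name are hypothetical, and Console.WriteLine is used instead of MessageBox.Show so the example is self-contained.

```csharp
using System;
using System.IO;
using System.Xml;

class MoveToContentDemo
{
    static void Main()
    {
        // Hypothetical sample: a declaration and a comment precede the element.
        string xml = "<?xml version=\"1.0\"?><!-- catalog --><book isbn=\"123\">Title</book>";
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));

        // Skips the XML declaration and the comment and stops on <book>.
        XmlNodeType type = reader.MoveToContent();
        Console.WriteLine("{0} {1}", type, reader.Name);

        // Step onto the attribute, which is not a content node itself.
        reader.MoveToFirstAttribute();
        Console.WriteLine("{0} {1}", reader.NodeType, reader.Name);

        // MoveToContent returns the reader to the owning element.
        type = reader.MoveToContent();
        Console.WriteLine("{0} {1}", type, reader.Name);

        reader.Close();
    }
}
```

Note that MoveToContent does not advance when the reader is already on a content node, so a loop built on it still needs another call, such as Read, to make progress.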

Conveniently, the MoveToContent method returns the System.Xml.XmlNodeType of the content node it finds. XmlNodeType.None is returned if the XmlTextReader has reached the end of the file. The MoveToContent method is extremely helpful when you want to skip over all non-content nodes and find content nodes of specific types. Listing 10.10 illustrates how to use MoveToContent to perform such a task.

Listing 10.10

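C#

    while (XmlNodeType.None != reader.MoveToContent())
    {
        if (XmlNodeType.Element == reader.NodeType
            && reader.Name == "book")
        {
            // Reads the text and moves past the end element
            MessageBox.Show(reader.ReadElementString());
        }
        else
        {
            // MoveToContent does not advance past a content node,
            // so step forward explicitly to avoid looping forever
            reader.Read();
        }
    }

VB

    While (XmlNodeType.None <> reader.MoveToContent())
        If (XmlNodeType.Element = reader.NodeType _
            AndAlso reader.Name = "book") Then
            ' Reads the text and moves past the end element
            MessageBox.Show(reader.ReadElementString())
        Else
            ' MoveToContent does not advance past a content node,
            ' so step forward explicitly to avoid looping forever
            reader.Read()
        End If
    End While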

The code in Listing 10.10 walks through the input XML stream and displays the text content of each element named "book". Code written around the MoveToContent method is quite robust: it is unaffected by comments, processing instructions, white space, or attributes added to the input XML stream. One caveat remains: ReadElementString still throws an XmlException if a "book" element contains child elements rather than simple text.