Reading and Writing Streamed XML | Professional C# 2005 with .NET 3.0

The XmlReader and XmlWriter classes will feel familiar if you have ever used SAX. XmlReader-based classes provide a very fast, forward-only, read-only cursor that streams the XML data for processing. Because it is a streaming model, the memory requirements are modest. However, you don’t get the navigation flexibility or the combined read and write capabilities of a DOM-based model. XmlWriter-based classes produce an XML document that conforms to the W3C’s XML 1.0 and Namespaces in XML recommendations.

XmlReader and XmlWriter are both abstract classes. The following classes are derived from XmlReader:

XmlNodeReader
XmlTextReader
XmlValidatingReader

The following classes are derived from XmlWriter:

XmlTextWriter
XmlQueryOutput

XmlTextReader and XmlTextWriter work with either a stream-based object from the System.IO namespace or TextReader/TextWriter objects. XmlNodeReader uses an XmlNode as its source instead of a stream. XmlValidatingReader adds DTD and schema validation to the reading process. You look at these a bit more closely later in this chapter.

Using the XmlReader Class

As mentioned previously, XmlReader is a lot like SAX. One of the biggest differences, however, is that whereas SAX is a push type of model (that is, it pushes data out to the application, and the developer has to be ready to accept it), the XmlReader has a pull model, where data is pulled into an application requesting it. This provides an easier and more intuitive programming model. Another advantage to this is that a pull model can be selective about the data that is sent to the application: if you don’t want all of the data, you don’t need to process it. In a push model, all of the XML data has to be processed by the application, whether it is needed or not.

The following is a very simple example of reading XML data; you take a closer look at the XmlReader class later. You’ll find the code in the XmlReaderSample folder. The code reads in the books.xml document. As each node is read, the NodeType property is checked. If the node is a text node, its value is appended to the text box:

  using System.Xml;

  private void button3_Click(object sender, EventArgs e)
  {
    richTextBox1.Clear();
    XmlReader rdr = XmlReader.Create("books.xml");
    while (rdr.Read())
    {
      if (rdr.NodeType == XmlNodeType.Text)
        richTextBox1.AppendText(rdr.Value + "\r\n");
    }
  }

Earlier it was mentioned that XmlReader is an abstract class. So that you can use XmlReader directly, a static Create method has been added. The Create method returns an XmlReader object and has a number of overloads. In the preceding example, a string that represents the file name of the XML document is passed in as a parameter. Stream-based objects and TextReader-based objects can also be passed in.

Another object that can be passed to Create is an XmlReaderSettings object. XmlReaderSettings specifies the features of the reader. For example, a schema can be used to validate the stream. Set the Schemas property to a valid XmlSchemaSet object, which is a cache of XSD schemas. Then set the ValidationType property on the XmlReaderSettings object to ValidationType.Schema.

Several Ignore properties exist that can be used to control the way the reader processes certain nodes and values. These properties are IgnoreComments, IgnoreProcessingInstructions, and IgnoreWhitespace, and they can be used to strip those items from the document as it is read. Schema-related behavior, such as whether inline schemas, schema location hints, and identity constraints are processed, is controlled separately through the ValidationFlags property.
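To see what the Ignore properties do, the following minimal sketch reads a small in-memory document and records the node types the reader reports. The sample XML, the class name IgnoreSettingsSketch, and the method name ReadNodeTypes are made up for illustration; they are not part of the chapter's sample project.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class IgnoreSettingsSketch
{
    // Hypothetical sample data standing in for books.xml.
    const string SampleXml =
        "<?xml version=\"1.0\"?>\n" +
        "<!-- inventory -->\n" +
        "<bookstore>\n" +
        "  <?audit checked?>\n" +
        "  <book genre=\"novel\"><title>The Confidence Man</title></book>\n" +
        "</bookstore>";

    // Returns the node types the reader reports when the Ignore settings are on.
    public static List<XmlNodeType> ReadNodeTypes()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = true;
        settings.IgnoreProcessingInstructions = true;
        settings.IgnoreWhitespace = true;

        List<XmlNodeType> types = new List<XmlNodeType>();
        using (XmlReader rdr = XmlReader.Create(new StringReader(SampleXml), settings))
        {
            while (rdr.Read())
                types.Add(rdr.NodeType);
        }
        return types;
    }

    public static void Main()
    {
        foreach (XmlNodeType type in ReadNodeTypes())
            Console.WriteLine(type);
    }
}
```

With these settings the comment, the processing instruction, and the whitespace between elements should not show up in the list, while the elements and the title text still do.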

Read Methods

Several ways exist to move through the document. As shown in the previous example, Read() takes you to the next node. You can then check whether the node has a value (the HasValue property) or, as you see shortly, whether the node has any attributes (the HasAttributes property). You can also use the ReadStartElement() method, which verifies that the current node is a start element and then positions you on the next node. If you are not on a start element, an XmlException is raised. Calling this method is the same as calling the IsStartElement() method followed by a Read() method.
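As a rough illustration of ReadStartElement() and IsStartElement(), the sketch below steps into a tiny in-memory document. The XML string and the names StartElementSketch, ReadFirstTitle, and WrongNameThrows are assumptions made for this example only.

```csharp
using System;
using System.IO;
using System.Xml;

public static class StartElementSketch
{
    // Hypothetical sample data; books.xml itself is not needed here.
    const string SampleXml =
        "<bookstore><book><title>Emma</title></book></bookstore>";

    // Steps through the expected start tags and returns the title text.
    public static string ReadFirstTitle()
    {
        using (XmlReader rdr = XmlReader.Create(new StringReader(SampleXml)))
        {
            rdr.ReadStartElement("bookstore"); // now positioned on <book>
            rdr.ReadStartElement("book");      // now positioned on <title>
            if (rdr.IsStartElement("title"))
            {
                rdr.Read();                    // move on to the text node
                return rdr.Value;
            }
            return null;
        }
    }

    // Returns true if asking for the wrong element name raises an XmlException.
    public static bool WrongNameThrows()
    {
        using (XmlReader rdr = XmlReader.Create(new StringReader(SampleXml)))
        {
            try
            {
                rdr.ReadStartElement("library"); // the root is <bookstore>
                return false;
            }
            catch (XmlException)
            {
                return true;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadFirstTitle());
        Console.WriteLine(WrongNameThrows());
    }
}
```

ReadStartElement() suits documents whose layout you know in advance; for documents of unknown shape, checking NodeType after each Read() is usually the safer route.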

ReadElementString() is similar to ReadString(), except that you can optionally pass in the name of an element. If the next content node is not a start tag, or if the Name parameter does not match the current node Name, an exception is raised.

Here is an example of how ReadElementString() can be used. Notice that this example uses FileStreams, so you will need to make sure that you include the System.IO namespace via a using statement:

  private void button6_Click(object sender, EventArgs e)
  {
    FileStream fs = new FileStream("books.xml", FileMode.Open);
    XmlReader tr = XmlReader.Create(fs);
    while (!tr.EOF)
    {
      //if we hit a title element, load its text into the text box
      if (tr.MoveToContent() == XmlNodeType.Element && tr.Name == "title")
      {
        richTextBox1.AppendText(tr.ReadElementString() + "\r\n");
      }
      else
      {
        //otherwise move on
        tr.Read();
      }
    }
  }

In the while loop, you use MoveToContent() to find each node of type XmlNodeType.Element with the name title. You use the EOF property of the XmlReader as the loop condition. If the node is not of type Element or not named title, the else clause issues a Read() call to move to the next node. When you find a node that matches the criteria, you add the result of ReadElementString() to the text box. This should leave you with just the book titles in the text box. Note that you don’t have to issue a Read() call after a successful ReadElementString(). This is because ReadElementString() consumes the entire element and positions you on the next node.

If you remove && tr.Name == "title" from the if clause, you will have to catch the XmlException when it is thrown. If you look at the data file, you will see that the first element that MoveToContent() will find is the <bookstore> element. Because it is an element, it will pass the check in the if statement. However, because it does not contain simple text content, it will cause ReadElementString() to raise an XmlException. One way to work around this is to put the ReadElementString() call in a function of its own. Then, if the call to ReadElementString() fails inside this function, you can deal with the error and return to the calling function.

Go ahead and do this; call this new method LoadTextBox() and pass in the XmlReader as a parameter. This is what the LoadTextBox() method looks like with these changes:

  private void LoadTextBox(XmlReader reader)
  {
    try
    {
      richTextBox1.AppendText(reader.ReadElementString() + "\r\n");
    }
    // if an XmlException is raised, ignore it.
    catch (XmlException) { }
  }

This section from the previous example:

  if (tr.MoveToContent() == XmlNodeType.Element && tr.Name == "title")
  {
    richTextBox1.AppendText(tr.ReadElementString() + "\r\n");
  }
  else
  {
    //otherwise move on
    tr.Read();
  }

will have to change to the following:

  if (tr.MoveToContent() == XmlNodeType.Element)
  {
    LoadTextBox(tr);
  }
  else
  {
    //otherwise move on
    tr.Read();
  }

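After running this example, the titles appear as before. Because the name check is gone, however, any other element that contains only simple text (such as price) is now written to the text box as well. What you are seeing is that there is more than one way to work through a document. This is where the flexibility of the classes in the System.Xml namespace starts to become apparent.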

The XmlReader can also read strongly typed data. There are several ReadElementContentAs methods, such as ReadElementContentAsDouble, ReadElementContentAsBoolean, and so on. The following example shows how to read in the values as a decimal and do some math on the value. In this case, the value from the price element is increased by 25 percent:

  private void button5_Click(object sender, EventArgs e)
  {
    richTextBox1.Clear();
    XmlReader rdr = XmlReader.Create("books.xml");
    while (rdr.Read())
    {
      if (rdr.NodeType == XmlNodeType.Element)
      {
        if (rdr.Name == "price")
        {
          decimal price = rdr.ReadElementContentAsDecimal();
          richTextBox1.AppendText("Current Price = " + price + "\r\n");
          //increase the price by 25 percent
          price += price * 0.25m;
          richTextBox1.AppendText("New Price = " + price + "\r\n\r\n");
        }
        else if (rdr.Name == "title")
          richTextBox1.AppendText(rdr.ReadElementContentAsString() + "\r\n");
      }
    }
  }

If the value cannot be converted to a decimal value, a FormatException is raised. This is a more efficient approach than reading the value as a string and then converting it to the proper data type yourself.
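One detail worth keeping in mind is that the ReadElementContentAs methods, like ReadElementString(), leave the reader positioned on the node after the end tag. If the loop then calls Read() unconditionally, that node is skipped, which matters when elements follow each other with no whitespace in between. The following sketch sums prices without skipping any; the class name TypedReadSketch, the method SumPrices, and the sample XML are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class TypedReadSketch
{
    // Adds up every <price> element in the given XML text.
    public static decimal SumPrices(string xml)
    {
        decimal total = 0m;
        using (XmlReader rdr = XmlReader.Create(new StringReader(xml)))
        {
            rdr.MoveToContent();
            while (!rdr.EOF)
            {
                if (rdr.NodeType == XmlNodeType.Element && rdr.Name == "price")
                {
                    // The typed read already advances past </price>,
                    // so no extra Read() call is made on this path.
                    total += rdr.ReadElementContentAsDecimal();
                }
                else
                {
                    rdr.Read();
                }
            }
        }
        return total;
    }

    public static void Main()
    {
        // Hypothetical data with adjacent price elements and no whitespace.
        string xml = "<prices><price>8.99</price><price>11.99</price></prices>";
        Console.WriteLine(SumPrices(xml));
    }
}
```

The loop follows the same pattern as the earlier ReadElementString() example: call Read() only when the current node was not consumed by one of the element-reading methods.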

Retrieving Attribute Data

As you play with the sample code, you might notice that when the nodes are read in, you don’t see any attributes. This is because the reader does not treat attributes as separate nodes in the stream that Read() walks through; they belong to the element they appear on. When you are on an element node, you can check for the existence of attributes and optionally retrieve the attribute values.

For example, the HasAttributes property returns true if there are any attributes; otherwise, it returns false. The AttributeCount property tells you how many attributes there are, and the GetAttribute() method gets an attribute by name or by index. If you want to iterate through the attributes one at a time, you can use MoveToFirstAttribute() and MoveToNextAttribute() methods.

Here is an example of iterating through the attributes of the books.xml document:

  private void button7_Click(object sender, EventArgs e)
  {
    richTextBox1.Clear();
    XmlReader tr = XmlReader.Create("books.xml");
    //Read one node at a time
    while (tr.Read())
    {
      //check to see if it's a NodeType element
      if (tr.NodeType == XmlNodeType.Element)
      {
        //if it's an element, then let's look at the attributes.
        for (int i = 0; i < tr.AttributeCount; i++)
        {
          richTextBox1.AppendText(tr.GetAttribute(i) + "\r\n");
        }
      }
    }
  }

This time you are looking for element nodes. When you find one, you loop through all of the attributes and, using the GetAttribute() method, you load the value of each attribute into the text box. In this example, those attributes would be genre, publicationdate, and ISBN.
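The MoveToFirstAttribute() and MoveToNextAttribute() methods mentioned earlier offer another way to walk the attributes, and they give you the attribute name as well as its value. Here is a minimal sketch of that approach, using an in-memory document; the names AttributeSketch and ListAttributes and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AttributeSketch
{
    // Hypothetical sample data shaped like one entry of books.xml.
    const string SampleXml =
        "<book genre=\"novel\" publicationdate=\"1967\" ISBN=\"0-201-63361-2\">" +
        "<title>The Confidence Man</title></book>";

    // Collects "name=value" for every attribute in the document.
    public static List<string> ListAttributes()
    {
        List<string> result = new List<string>();
        using (XmlReader rdr = XmlReader.Create(new StringReader(SampleXml)))
        {
            while (rdr.Read())
            {
                if (rdr.NodeType == XmlNodeType.Element && rdr.HasAttributes)
                {
                    if (rdr.MoveToFirstAttribute())
                    {
                        do
                        {
                            result.Add(rdr.Name + "=" + rdr.Value);
                        }
                        while (rdr.MoveToNextAttribute());

                        // Return to the owning element before reading on.
                        rdr.MoveToElement();
                    }
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        foreach (string attribute in ListAttributes())
            Console.WriteLine(attribute);
    }
}
```

While the reader is positioned on an attribute, Name and Value refer to that attribute, which is why MoveToElement() is called once the loop finishes.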

Validating with XmlReader

Sometimes it’s important to know not only that the document is well formed but also that the document is valid. An XmlReader can validate the XML according to an XSD schema by using the XmlReaderSettings class. The XSD schema is added to the XmlSchemaSet that is exposed through the Schemas property. The ValidationType property must also be set to ValidationType.Schema; the default for this property is ValidationType.None.

The following example demonstrates the use of the XmlReaderSettings class. The following is the XSD schema that will be used to validate the books.xml document:

  <?xml version="1.0" encoding="utf-8"?>
  <xs:schema attributeFormDefault="unqualified"
             elementFormDefault="qualified"
             xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="bookstore">
      <xs:complexType>
        <xs:sequence>
          <xs:element maxOccurs="unbounded" name="book">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="title" type="xs:string" />
                <xs:element name="author">
                  <xs:complexType>
                    <xs:sequence>
                      <xs:element minOccurs="0" name="name" type="xs:string" />
                      <xs:element minOccurs="0" name="first-name" type="xs:string" />
                      <xs:element minOccurs="0" name="last-name" type="xs:string" />
                    </xs:sequence>
                  </xs:complexType>
                </xs:element>
                <xs:element name="price" type="xs:decimal" />
              </xs:sequence>
              <xs:attribute name="genre" type="xs:string" use="required" />
              <!-- <xs:attribute name="publicationdate"
                                 type="xs:unsignedShort" use="required" /> -->
              <xs:attribute name="ISBN" type="xs:string" use="required" />
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>

This schema was generated from books.xml in Visual Studio. Notice that the publicationdate attribute has been commented out. Because books.xml still contains that attribute and the schema no longer declares it, validation will fail at that point.

The following is the code that uses the schema to validate the books.xml document:

  private void button8_Click(object sender, EventArgs e)
  {
    richTextBox1.Clear();
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.Schemas.Add(null, "books.xsd");
    settings.ValidationType = ValidationType.Schema;
    settings.ValidationEventHandler +=
      new System.Xml.Schema.ValidationEventHandler(settings_ValidationEventHandler);
    XmlReader rdr = XmlReader.Create("books.xml", settings);
    while (rdr.Read())
    {
      if (rdr.NodeType == XmlNodeType.Text)
        richTextBox1.AppendText(rdr.Value + "\r\n");
    }
  }

After the XmlReaderSettings object, settings, is created, the schema books.xsd is added to the XmlSchemaSet object. The Add method for XmlSchemaSet has four overloads. One takes an XmlSchema object. The XmlSchema object can be used to create a schema on the fly without having to create the schema file on disk. Another overload takes another XmlSchemaSet object as a parameter. Another takes two string values: the first is the target namespace and the other is the URL for the XSD document. If the target namespace parameter is null, the targetNamespace of the schema will be used. The last overload takes the targetNamespace as the first parameter as well, but it uses an XmlReader-based object to read in the schema. The XmlSchemaSet preprocesses the schema before the document to be validated is processed.

After the schema is referenced, the ValidationType property is set to one of the ValidationType enumeration values. The values of interest here are DTD, Schema, and None. If the value is set to None, no validation will occur.

Because the XmlReader object is being used, if there is a validation problem with the document, it will not be found until that attribute or element is read by the reader. When the validation failure does occur, an XmlSchemaValidationException is raised. This exception can be handled in a catch block; however, handling exceptions can make controlling the flow of the data difficult. To help with this, the ValidationEventHandler event is available in the XmlReaderSettings class. This way the validation failure can be handled without having to use exception handling. The event is also raised by validation warnings, which do not raise an exception. The event handler receives a ValidationEventArgs object that contains a Severity property. This property indicates whether the event was raised by an error or a warning. If the event was raised by an error, the exception that caused the event to be raised is passed in as well, through the Exception property. There is also a Message property. In the example, the message is displayed in a MessageBox.
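The body of the event handler is not listed in this section, so here is a console-style sketch of the same idea that collects the messages instead of showing a MessageBox. It is not the sample project's handler: the cut-down schema, the one-book document, and the names ValidationSketch and Validate are assumptions made so the example can run without books.xml or books.xsd on disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ValidationSketch
{
    // Hypothetical schema: it declares genre but not publicationdate.
    const string Xsd =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">" +
        "<xs:element name=\"book\"><xs:complexType>" +
        "<xs:sequence><xs:element name=\"title\" type=\"xs:string\"/></xs:sequence>" +
        "<xs:attribute name=\"genre\" type=\"xs:string\" use=\"required\"/>" +
        "</xs:complexType></xs:element></xs:schema>";

    // Hypothetical document carrying an attribute the schema does not declare.
    const string Xml =
        "<book genre=\"novel\" publicationdate=\"1967\">" +
        "<title>The Confidence Man</title></book>";

    // Validates the document and returns one line per validation event.
    public static List<string> Validate()
    {
        List<string> messages = new List<string>();

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e)
            {
                // Severity is Error or Warning; Message describes the problem.
                messages.Add(e.Severity + ": " + e.Message);
            };

        using (XmlReader rdr = XmlReader.Create(new StringReader(Xml), settings))
        {
            // Problems are reported only as the reader reaches them.
            while (rdr.Read()) { }
        }
        return messages;
    }

    public static void Main()
    {
        foreach (string message in Validate())
            Console.WriteLine(message);
    }
}
```

Because a handler is attached, the reader keeps going after the first problem, so a single pass can report every validation failure in the document.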

Using the XmlWriter Class

The XmlWriter class allows you to write XML to a stream, a file, a StringBuilder, a TextWriter, or another XmlWriter object. Like XmlReader, it does so in a forward-only, noncached manner. XmlWriter is highly configurable, allowing you to specify such things as whether or not to indent content, the characters used to indent, how line breaks are written, and whether the XML declaration is omitted. As with XmlReader, this configuration is done using a settings object, in this case XmlWriterSettings.

Here’s a simple example that shows how the XmlWriter class can be used:

  private void button9_Click(object sender, EventArgs e)
  {
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.NewLineOnAttributes = true;
    XmlWriter writer = XmlWriter.Create("booknew.xml", settings);
    writer.WriteStartDocument();
    //Start creating elements and attributes
    writer.WriteStartElement("book");
    writer.WriteAttributeString("genre", "Mystery");
    writer.WriteAttributeString("publicationdate", "2001");
    writer.WriteAttributeString("ISBN", "123456789");
    writer.WriteElementString("title", "Case of the Missing Cookie");
    writer.WriteStartElement("author");
    writer.WriteElementString("name", "Cookie Monster");
    writer.WriteEndElement();
    writer.WriteElementString("price", "9.99");
    writer.WriteEndElement();
    writer.WriteEndDocument();
    //clean up
    writer.Flush();
    writer.Close();
  }

Here, you are writing to a new XML file called booknew.xml, adding the data for a new book. Note that XmlWriter will overwrite an existing file with a new one. You look at inserting a new element or node into an existing document later in this chapter. You are instantiating the XmlWriter object using the static Create method. In this example, a string representing a file name is passed as a parameter along with an instance of the XmlWriterSettings class.

The XmlWriterSettings class has properties that control the way that the XML is generated. The CheckCharacters property is a Boolean; when it is true, an exception is raised if a character in the XML does not conform to the W3C XML 1.0 recommendation. The Encoding property sets the encoding used for the XML being generated; the default is Encoding.UTF8. The Indent property is a Boolean value that determines whether elements should be indented. The IndentChars property is set to the character string that is used to indent; the default is two spaces. The NewLineChars property determines the characters used for line breaks. In the preceding example, NewLineOnAttributes is set to true. This puts each attribute on a separate line, which can make the generated XML a little easier to read.
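To see a few of these settings in action without touching the file system, the following sketch writes to a StringBuilder. The tab indent, the "\n" line break, and the OmitXmlDeclaration setting are choices made for this illustration, and the names WriterSettingsSketch and WriteBook are hypothetical.

```csharp
using System;
using System.Text;
using System.Xml;

public static class WriterSettingsSketch
{
    // Writes a small book element to a string using custom settings.
    public static string WriteBook()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.IndentChars = "\t";        // indent with a tab instead of two spaces
        settings.NewLineChars = "\n";       // line break used between elements
        settings.OmitXmlDeclaration = true; // leave out the <?xml ...?> line

        StringBuilder output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("book");
            writer.WriteAttributeString("genre", "Mystery");
            writer.WriteElementString("title", "Case of the Missing Cookie");
            writer.WriteEndElement();
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteBook());
    }
}
```

Disposing the writer through the using statement flushes and closes it, so the StringBuilder holds the complete output by the time ToString() is called.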

WriteStartDocument() adds the document declaration. Now you start writing data. First comes the book element, then you add the genre, publicationdate, and ISBN attributes, and then you write the title, author, and price elements. Note that the author element has a child element name.

When you click the button, you produce the booknew.xml file, which looks like this:

  <?xml version="1.0" encoding="utf-8"?>
  <book
    genre="Mystery"
    publicationdate="2001"
    ISBN="123456789">
    <title>Case of the Missing Cookie</title>
    <author>
      <name>Cookie Monster</name>
    </author>
    <price>9.99</price>
  </book>

The nesting of elements is controlled by paying attention to when you start and finish writing elements and attributes. You can see this when you add the name child element to the author element. Note how the WriteStartElement() and WriteEndElement() method calls are arranged and how that arrangement produces the nested elements in the output file.

To go along with the WriteElementString() and WriteAttributeString() methods, there are several other specialized write methods. WriteCData() outputs a CDATA section (<![CDATA[...]]>), writing out the text it takes as a parameter. WriteComment() writes out a comment in proper XML format. WriteChars() writes out the contents of a char buffer. This works in a similar fashion to the ReadChars() method of XmlTextReader; they both use the same type of parameters. WriteChars() needs a buffer (an array of characters), the starting position for writing (an integer), and the number of characters to write (an integer).
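The following minimal sketch exercises the three methods just described. The element names, the text being written, and the names SpecialWritesSketch and WriteSpecialNodes are made up for the example.

```csharp
using System;
using System.Text;
using System.Xml;

public static class SpecialWritesSketch
{
    // Writes a comment, a CDATA section, and part of a char buffer.
    public static string WriteSpecialNodes()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        StringBuilder output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("note");
            writer.WriteComment("generated sample");

            writer.WriteStartElement("code");
            // CDATA lets the < character through without escaping.
            writer.WriteCData("if (a < b) return;");
            writer.WriteEndElement();

            writer.WriteStartElement("text");
            char[] buffer = "Hello, XmlWriter".ToCharArray();
            // Start at index 7 and write 9 characters: "XmlWriter".
            writer.WriteChars(buffer, 7, 9);
            writer.WriteEndElement();

            writer.WriteEndElement();
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteSpecialNodes());
    }
}
```

The output should contain the comment as <!--generated sample-->, the code text wrapped in a CDATA section, and only the XmlWriter portion of the buffer inside the text element.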

Reading and writing XML using the XmlReader- and XmlWriter-based classes is surprisingly flexible and simple to do. Next, you learn how the DOM is implemented in the System.Xml namespace, through the XmlDocument and XmlNode classes.