XML Stream-Style Parsers | Professional VB 2005 with .NET 3.0 (Programmer to Programmer)

When demonstrating XML serialization, XML stream-style parsers were mentioned. After all, when an instance of an object is serialized to XML, it has to be written to a stream, and when it is deserialized, it is read from a stream. When an XML document is parsed using a stream parser, the parser always points to the current node in the document. The basic architecture of stream parsers is shown in Figure 11-1.

Figure 11-1

The following classes that access a stream of XML (read XML) and generate a stream of XML (write XML) are contained in the System.Xml namespace:

XmlWriter - This abstract class specifies a noncached, forward-only stream that writes an XML document (data and schema).
XmlReader - This abstract class specifies a noncached, forward-only stream that reads an XML document (data and schema).

The diagram of the classes associated with the XML stream-style parser refers to one other class, XslTransform. This class is found in the System.Xml.Xsl namespace and is not an XML stream-style parser. Rather, it is used in conjunction with XmlWriter and XmlReader. This class is covered in detail later.

The System.Xml namespace exposes many XML manipulation classes beyond those shown in the architecture diagram. The classes shown in the diagram also include the following:

XmlResolver - This abstract class resolves an external XML resource using a Uniform Resource Identifier (URI). XmlUrlResolver is an implementation of an XmlResolver.
XmlNameTable - This abstract class provides a fast means by which an XML parser can access element or attribute names.

Writing an XML Stream

An XML document can be created programmatically in .NET. One way to perform this task is by writing the individual components of an XML document (schema, attributes, elements, and so on) to an XML stream. Using a unidirectional write-stream means that each element and its attributes must be written in order - the idea is that data is always written at the head of the stream. To accomplish this, you use a writable XML stream class (a class derived from XmlWriter). Such a class ensures that the XML document you generate correctly implements the W3C Extensible Markup Language (XML) 1.0 specification and the Namespaces in XML specification.

Why is this necessary when you have XML serialization? You need to be very careful here to separate interface from implementation. XML serialization works for a specific class, such as the ElokuvaTilaus class. This class is a proprietary implementation and not the format in which data is exchanged. For this one specific case, the XML document generated when ElokuvaTilaus is serialized just so happens to be the XML format used when placing an order for some movies. ElokuvaTilaus was given a little help from Source Code Style attributes so that it would conform to a standard XML representation of a film order summary.

In a different application, if the software used to manage an entire movie distribution business wants to generate movie orders, then it will have to generate a document of the appropriate form. The movie distribution management software achieves this using the XmlWriter object.

Before reviewing the subtleties of XmlWriter, note that this class exposes over 40 methods and properties. The example in this section provides an overview that touches on a subset of these methods and properties. This subset allows an XML document that corresponds to a movie order to be generated.

This example builds a module that generates an XML document corresponding to a movie order. You will use an instance of XmlWriter, called FilmOrdersWriter, which is backed by a file on disk. This means that the XML document generated is streamed to this file. Because the FilmOrdersWriter variable represents a file, you have to take a few actions against the file. For instance, you have to make sure the file is:

Created - The instance of XmlWriter, FilmOrdersWriter, is created using the Create() method, which also applies the properties configured in the XmlWriterSettings object.
Opened - The file the XML is streamed to, FilmOrdersProgrammatic.xml, is opened by passing the filename to the Create() method of XmlWriter. (XmlWriter is an abstract class, so it is not instantiated through a constructor.)
Generated - The process of generating the XML document is described in detail at the end of this section.
Closed - The file (the XML stream) is closed using the Close() method of XmlWriter or by simply making use of the Using keyword, which ensures that the object is closed at the end of the Using statement.

Before you create the XmlWriter object, you first need to customize how the object operates by using the XmlWriterSettings object. This object, new to .NET 2.0, enables you to configure the behavior of the XmlWriter object before you instantiate it:

  Dim myXmlSettings As New XmlWriterSettings
  myXmlSettings.Indent = True
  myXmlSettings.NewLineOnAttributes = True

You can specify a few settings for the XmlWriterSettings object that define how XML creation will be handled by the XmlWriter object. The following table details the properties of the XmlWriterSettings class:

Property	Initial Value	Description
CheckCharacters	True	This property, if set to True, performs a character check on the contents of the XmlWriter object. Legal characters can be found at www.w3.org/TR/REC-xml#charsets.
CloseOutput	False	Specifies whether the XmlWriter should also close the stream or the System.IO.TextWriter object
ConformanceLevel	ConformanceLevel.Document	Allows the XML to be checked to ensure that it follows certain specified rules. Possible conformance-level settings include Document, Fragment, and Auto.
Encoding	Encoding.UTF8	Defines the encoding of the XML generated
Indent	False	Defines whether the XML generated should be indented or not. Leaving this value set to False will not indent child nodes from parent nodes.
IndentChars	Two spaces	Specifies the string used to indent child nodes from parent nodes. This setting only works when the Indent property is set to True. If you want, you can assign this any string value you choose.
NewLineChars	\r\n	Assigns the characters that are used to define line breaks
NewLineHandling	NewLineHandling.Replace	Defines whether to normalize line breaks in the output. Possible values include Replace, Entitize, and None.
NewLineOnAttributes	False	Defines whether a node’s attributes should be written to a new line in the construction. This will occur if set to True.
OmitXmlDeclaration	False	Defines whether an XML declaration should be generated in the output. This omission only occurs if set to True.
OutputMethod	XmlOutputMethod.Xml	Defines the method to serialize the output. Possible values include Xml, Html, Text, and AutoDetect.
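
As a minimal sketch of how a few of these properties interact, the following module writes two sibling elements to an in-memory StringBuilder rather than to a file. The module name WriterSettingsSketch and the function BuildFragment() are hypothetical names chosen for illustration, and the second movie title is only sample data.

```vb
Imports System
Imports System.Text
Imports System.Xml

Module WriterSettingsSketch
    ' Hypothetical helper: writes two sibling elements using non-default settings.
    Function BuildFragment() As String
        Dim settings As New XmlWriterSettings()
        settings.OmitXmlDeclaration = True                     ' suppress the <?xml ...?> line
        settings.ConformanceLevel = ConformanceLevel.Fragment  ' permit more than one top-level element
        settings.Indent = True
        settings.IndentChars = "    "                          ' four spaces instead of the default two

        Dim output As New StringBuilder()
        Using writer As XmlWriter = XmlWriter.Create(output, settings)
            writer.WriteElementString("Nimi", "Grease")
            writer.WriteElementString("Nimi", "Star Wars")
        End Using
        Return output.ToString()
    End Function

    Sub Main()
        Console.WriteLine(BuildFragment())
    End Sub
End Module
```

With the default ConformanceLevel.Document setting, the second top-level WriteElementString() call would raise an exception, because a complete XML document may have only one root element.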

Once the XmlWriterSettings object has been instantiated and assigned the values you deem necessary, the next steps are to create the XmlWriter object and make the association between the XmlWriterSettings object and the XmlWriter object.

The basic infrastructure for managing the file (the XML text stream) and applying the settings class is

  Dim FilmOrdersWriter As XmlWriter = _
     XmlWriter.Create("..\FilmOrdersProgrammatic.xml", myXmlSettings)
  FilmOrdersWriter.Close()

or the following, if you are utilizing the Using keyword, which is new to the .NET Framework 2.0 and highly recommended:

  Using FilmOrdersWriter As XmlWriter = _
     XmlWriter.Create("..\FilmOrdersProgrammatic.xml", myXmlSettings)
  End Using

With the preliminaries completed (file created and formatting configured), the process of writing the actual attributes and elements of your XML document can begin. The sequence of steps used to generate your XML document is as follows:

Write an XML comment using the WriteComment() method. This comment describes where the concept for this XML document originated and generates the following XML:
```
 <!-- Same as generated by serializing, ElokuvaTilaus -->
```
Begin writing the XML element, <ElokuvaTilaus>, by calling the WriteStartElement() method. You can only begin writing this element because its attributes and child elements must be written before the element can be ended with a corresponding </ElokuvaTilaus>. The XML generated by the WriteStartElement() method is as follows:
```
 <ElokuvaTilaus>
```
Write the attributes associated with <ElokuvaTilaus> by calling the WriteAttributeString() method twice. The two calls extend the <ElokuvaTilaus> start tag that is currently being written, so that it becomes the following:
```
 <ElokuvaTilaus ElokuvaId="101" Maara="10">
```
Using the WriteElementString() method, write the child XML element <Nimi> contained in the XML element, <ElokuvaTilaus>. The XML generated by calling this method is as follows:
```
 <Nimi>Grease</Nimi>
```
Complete writing the <ElokuvaTilaus> parent XML element by calling the WriteEndElement() method. The XML generated by calling this method is as follows:
```
 </ElokuvaTilaus>
```

Let’s now put all this together in the Module1.vb file shown here:

  Imports System.Xml
  Imports System.Xml.Serialization
  Imports System.IO

  Module Module1
      Sub Main()
          Dim myXmlSettings As New XmlWriterSettings
          myXmlSettings.Indent = True
          myXmlSettings.NewLineOnAttributes = True
          Using FilmOrdersWriter As XmlWriter = _
              XmlWriter.Create("..\FilmOrdersProgrammatic.xml", myXmlSettings)
              FilmOrdersWriter.WriteComment(" Same as generated " & _
                 "by serializing, ElokuvaTilaus ")
              FilmOrdersWriter.WriteStartElement("ElokuvaTilaus")
              FilmOrdersWriter.WriteAttributeString("ElokuvaId", "101")
              FilmOrdersWriter.WriteAttributeString("Maara", "10")
              FilmOrdersWriter.WriteElementString("Nimi", "Grease")
              FilmOrdersWriter.WriteEndElement() ' End ElokuvaTilaus
          End Using
      End Sub
  End Module

Once this is run, you will find the XML file FilmOrdersProgrammatic.xml created in the same folder as the Module1.vb file or in the bin/Debug or bin/Release folders. The content of this file is as follows:

 <?xml version="1.0" encoding="utf-8"?>
 <!-- Same as generated by serializing, ElokuvaTilaus -->
 <ElokuvaTilaus
   ElokuvaId="101"
   Maara="10">
   <Nimi>Grease</Nimi>
 </ElokuvaTilaus>

The previous XML document is the same in form as the XML document generated by serializing the ElokuvaTilaus class. Notice that in the previous XML document the <Nimi> element is indented two characters and that each attribute is on a different line in the document. This was achieved using the XmlWriterSettings class.

The sample application covered only a small portion of the methods and properties exposed by the XML stream-writing class, XmlWriter. Other methods implemented by this class manipulate the underlying file, such as the Flush() method, and some methods allow XML text to be written directly to the stream, such as the WriteRaw() method.

The XmlWriter class also exposes a variety of methods that write a specific type of XML data to the stream. These methods include WriteBinHex(), WriteCData(), WriteString(), and WriteWhitespace().
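
The difference between two of these methods can be sketched as follows. This is an illustrative example only: the module name TypedWriteSketch, the function BuildOrderWithNote(), the <Huomautus> element, and the sample text are all assumptions that are not part of the movie order format described in this chapter.

```vb
Imports System
Imports System.Text
Imports System.Xml

Module TypedWriteSketch
    ' Hypothetical helper: contrasts WriteString() with WriteCData().
    Function BuildOrderWithNote() As String
        Dim settings As New XmlWriterSettings()
        settings.OmitXmlDeclaration = True
        Dim output As New StringBuilder()
        Using writer As XmlWriter = XmlWriter.Create(output, settings)
            writer.WriteStartElement("ElokuvaTilaus")
            writer.WriteStartElement("Nimi")
            writer.WriteString("Tom & Jerry")       ' the ampersand is escaped as &amp;
            writer.WriteEndElement()                ' </Nimi>
            writer.WriteStartElement("Huomautus")   ' hypothetical element for illustration
            writer.WriteCData("<b>rush order</b>")  ' markup characters are kept verbatim in CDATA
            writer.WriteEndElement()                ' </Huomautus>
            writer.WriteEndElement()                ' </ElokuvaTilaus>
            writer.Flush()                          ' push buffered output to the StringBuilder
        End Using
        Return output.ToString()
    End Function

    Sub Main()
        Console.WriteLine(BuildOrderWithNote())
    End Sub
End Module
```

WriteString() escapes characters that are not legal in element content, whereas WriteCData() wraps the text in a CDATA section so that it can be written unescaped.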

You can now generate the same XML document in two different ways. You have used two different applications that took two different approaches to generating a document that represents a standardized movie order. However, there are even more ways to generate XML, depending on the circumstances. Using the previous scenario, you could receive a movie order from a store, and this order would have to be transformed from the XML format used by the supplier to your own order format.

Reading an XML Stream

In .NET, XML documents can be read from a stream as well. Data is traversed in the stream in order (first XML element, second XML element, and so on). This traversal is very quick because the data is processed in one direction and features such as writing and moving backward in the traversal are not supported. At any given instant, only data at the current position in the stream can be accessed.

Before exploring how an XML stream can be read, you need to understand why it should be read in the first place. Let’s return to our movie supplier example. Imagine that the application that manages the movie orders can generate a variety of XML documents corresponding to current orders, preorders, and returns. All the documents (current orders, preorders, and returns) can be extracted in stream form and processed by a report-generating application. This application prints the orders for a given day, the preorders that are going to be due, and the returns that are coming back to the supplier. The report-generating application processes the data by reading in and parsing a stream of XML.

One class that can be used to read and parse such an XML stream is XmlReader. Other classes in the .NET Framework are derived from XmlReader, such as XmlTextReader, which can read XML from a file (specified by a string corresponding to the file’s name), a Stream, or a TextReader. This example uses an XmlReader to read an XML document contained in a file. Reading XML from a file and writing it to a file is not the norm when it comes to XML processing, but a file is the simplest way to access XML data. This simplified access enables you to focus on XML-specific issues.

In creating a sample, the first step is to make the proper imports into the Module1.vb file:

  Imports System.Xml
  Imports System.Xml.Serialization
  Imports System.IO

From there, the next step in accessing a stream of XML data is to create an instance of the object that will open the stream (the readMovieInfo variable of type XmlReader) and then open the stream itself. Your application performs this as follows (where fileName holds the name of the file containing the XML document, MovieManage.xml in this example):

  Dim myXmlSettings As New XmlReaderSettings()
  Using readMovieInfo As XmlReader = XmlReader.Create(fileName, myXmlSettings)

Note that just as the XmlWriter has a settings class, the XmlReader also has a settings class. Though you can make assignments to the XmlReaderSettings object, in this case you do not. Later, this chapter covers the XmlReaderSettings object.

The basic mechanism for traversing each stream is to traverse from node to node using the Read() method. Node types in XML include Element and Whitespace. Numerous other node types are defined, but this example focuses on traversing XML elements and the white space that is used to make the elements more readable (carriage returns, linefeeds, and indentation spaces). Once the stream is positioned at a node, the MoveToNextAttribute() method can be called to read each attribute contained in an element. The MoveToNextAttribute() method only traverses attributes for nodes that contain attributes (nodes of type element). An example of an XmlReader traversing each node and then traversing the attributes of each node follows:

  While readMovieInfo.Read()
     ' Process node here.
     While readMovieInfo.MoveToNextAttribute()
        ' Process attribute here.
     End While
  End While

This code, which reads the contents of the XML stream, does not utilize any knowledge of the stream’s contents. However, a great many applications know exactly how the stream they are going to traverse is structured. Such applications can use XmlReader in a more deliberate manner and not simply traverse the stream without foreknowledge.

Once the example stream has been read, it can be cleaned up using the End Using call:

  End Using

This ReadMovieXml subroutine takes the filename containing the XML to read as a parameter. The code for the subroutine is as follows (and is basically the code just outlined):

  Private Sub ReadMovieXml(ByVal fileName As String)
     Dim myXmlSettings As New XmlReaderSettings()
     Using readMovieInfo As XmlReader = XmlReader.Create(fileName, myXmlSettings)
        While readMovieInfo.Read()
           ShowXmlNode(readMovieInfo)
           While readMovieInfo.MoveToNextAttribute()
              ShowXmlNode(readMovieInfo)
           End While
        End While
     End Using
     Console.ReadLine()
  End Sub

For each node encountered after a call to the Read() method, ReadMovieXml() calls the ShowXmlNode() subroutine. Similarly, for each attribute traversed, the ShowXmlNode() subroutine is called. This subroutine breaks down each node into its sub-entities:

Depth - The Depth property of XmlReader determines the level at which a node resides in the XML document tree. To understand depth, consider the following XML document composed solely of elements: <A><B></B><C><D></D></C></A>.

Element <A> is the root element, and when parsed would return a Depth of 0. Elements <B> and <C> are contained in <A> and hence have a Depth value of 1. Element <D> is contained in <C>. The Depth property value associated with <D> (depth of 2) should, therefore, be one more than the Depth property associated with <C> (depth of 1).
Type - The type of each node is determined using the NodeType property of XmlReader. The value returned is of the enumeration type XmlNodeType. Permissible node types include Attribute, Element, and Whitespace. (Numerous other node types can also be returned, including CDATA, Comment, Document, Entity, and DocumentType.)
Name - The name of each node is retrieved using the Name property of XmlReader. The name of the node could be an element name, such as <ElokuvaTilaus>, or an attribute name, such as ElokuvaId.
Attribute Count - The number of attributes associated with a node is retrieved using the AttributeCount property of XmlReader.
Value - The value of a node is retrieved using the Value property of XmlReader. For example, the text node contained in the <Nimi> element has a value of Grease, whereas the element node itself has an empty value.

Subroutine ShowXmlNode() is implemented as follows:

  Private Sub ShowXmlNode(ByVal reader As XmlReader)
     If reader.Depth > 0 Then
        For depthCount As Integer = 1 To reader.Depth
           Console.Write(" ")
        Next
     End If
     If reader.NodeType = XmlNodeType.Whitespace Then
        Console.Out.WriteLine("Type: {0} ", reader.NodeType)
     ElseIf reader.NodeType = XmlNodeType.Text Then
        Console.Out.WriteLine("Type: {0}, Value: {1} ", _
                             reader.NodeType, _
                             reader.Value)
     Else
        Console.Out.WriteLine("Name: {0}, Type: {1}, " & _
                             "AttributeCount: {2}, Value: {3} ", _
                             reader.Name, _
                             reader.NodeType, _
                             reader.AttributeCount, _
                             reader.Value)
     End If
  End Sub

Within the ShowXmlNode() subroutine, each level of node depth adds one space to the output generated:

  If reader.Depth > 0 Then
     For depthCount As Integer = 1 To reader.Depth
        Console.Write(" ")
     Next
  End If

You add these spaces in order to make human-readable output (so you can easily determine the depth of each node displayed). For each type of node, ShowXmlNode() displays the value of the NodeType property. The ShowXmlNode() subroutine makes a distinction between nodes of type Whitespace and other types of nodes. The reason for this is simple: A node of type Whitespace does not contain a name or attribute count. The value of such a node is any combination of white-space characters (space, tab, carriage return, and so on). Therefore, it does not make sense to display those properties if the NodeType is XmlNodeType.Whitespace. Nodes of type Text have no name associated with them, so for this type, subroutine ShowXmlNode() only displays the properties NodeType and Value. For all other node types, the Name, AttributeCount, Value, and NodeType properties are displayed.

To complete this module, add a Sub Main as follows:

  Sub Main(ByVal args() As String)
     ReadMovieXml("..\MovieManage.xml")
  End Sub

Here’s an example construction of the MovieManage.xml file:

  <?xml version="1.0" encoding="utf-8" ?>
  <MovieOrderDump>
     <FilmOrder_Multiple>
        <multiFilmOrders>
           <FilmOrder>
              <name>Grease</name>
              <filmId>101</filmId>
              <quantity>10</quantity>
           </FilmOrder>
           <FilmOrder>
              <name>Lawrence of Arabia</name>
              <filmId>102</filmId>
              <quantity>10</quantity>
           </FilmOrder>
           <FilmOrder>
              <name>Star Wars</name>
              <filmId>103</filmId>
              <quantity>10</quantity>
           </FilmOrder>
        </multiFilmOrders>
     </FilmOrder_Multiple>
     <PreOrder>
        <FilmOrder>
           <name>Shrek III - Shrek Becomes a Programmer</name>
           <filmId>104</filmId>
           <quantity>10</quantity>
        </FilmOrder>
     </PreOrder>
     <Returns>
        <FilmOrder>
           <name>Star Wars</name>
           <filmId>103</filmId>
           <quantity>2</quantity>
        </FilmOrder>
     </Returns>
  </MovieOrderDump>

Running this module produces the following output (a partial display, as it would be rather lengthy):

 Name: xml, Type: XmlDeclaration, AttributeCount: 2, Value: version="1.0" encoding="utf-8"
 Name: version, Type: Attribute, AttributeCount: 2, Value: 1.0
 Name: encoding, Type: Attribute, AttributeCount: 2, Value: utf-8
 Type: Whitespace
 Name: MovieOrderDump, Type: Element, AttributeCount: 0, Value:
  Type: Whitespace
  Name: FilmOrder_Multiple, Type: Element, AttributeCount: 0, Value:
   Type: Whitespace
   Name: multiFilmOrders, Type: Element, AttributeCount: 0, Value:
    Type: Whitespace
    Name: FilmOrder, Type: Element, AttributeCount: 0, Value:
     Type: Whitespace
     Name: name, Type: Element, AttributeCount: 0, Value:
      Type: Text, Value: Grease

This example managed to use three methods and five properties of XmlReader. The output generated was informative but far from practical. XmlReader exposes over 50 methods and properties, which means that we have only scratched the surface of this highly versatile class. The remainder of this section looks at the XmlReaderSettings class, introduces a more realistic use of XmlReader, and demonstrates how the classes of System.Xml handle errors.

The XmlReaderSettings Class

Just like the XmlWriter object, the XmlReader object accepts a settings object when it is instantiated. This means that you can apply settings specifying how the XmlReader object behaves when it is reading whatever XML you might have for it. This includes settings for dealing with white space, schemas, and more:

Property	Initial Value	Description
CheckCharacters	True	This property, if set to True, performs a character check on the contents of the retrieved object. Legal characters can be found at www.w3.org/TR/REC-xml#charsets.
CloseInput	False	Specifies whether the underlying stream or System.IO.TextReader should be closed when the XmlReader is closed
ConformanceLevel	ConformanceLevel.Document	Allows for the XML to be checked to ensure that it follows certain specified rules. Possible conformance-level settings include Document, Fragment, and Auto.
IgnoreComments	False	Defines whether comments should be ignored or not
IgnoreProcessingInstructions	False	Defines whether processing instructions contained within the XML should be ignored
IgnoreWhitespace	False	Defines whether the XmlReader object should ignore all insignificant white space
LineNumberOffset	0	Defines the line number at which the LineNumber property starts counting within the XML file
LinePositionOffset	0	Defines the position within the line at which the LinePosition property starts counting within the XML file
NameTable	An empty XmlNameTable object	Enables the XmlReader to work with a specific XmlNameTable object that is used for atomized string comparisons
ProhibitDtd	True	Defines whether DTD processing is prohibited. When set to True, the XmlReader raises an exception if it encounters a DTD.
Schemas	An empty XmlSchemaSet object	Enables the XmlReader to work with an instance of the XmlSchemaSet class
ValidationFlags	XmlSchemaValidationFlags.AllowXmlAttributes and XmlSchemaValidationFlags.ProcessIdentityConstraints	Enables you to apply validation schema settings. Possible values include AllowXmlAttributes, ProcessIdentityConstraints, ProcessInlineSchema, ProcessSchemaLocation, ReportValidationWarnings, and None.
ValidationType	None	Specifies whether the XmlReader will perform validation or type assignment when reading. Possible values include Auto, DTD, None, Schema, and XDR.
XmlResolver		A write-only property that enables you to access external documents

An example of using this setting class to modify the behavior of the XmlReader class is as follows:

  Dim myXmlSettings As New XmlReaderSettings()
  myXmlSettings.IgnoreWhitespace = True
  myXmlSettings.IgnoreComments = True
  Using readMovieInfo As XmlReader = XmlReader.Create(fileName, myXmlSettings)
     ' Use XmlReader object here.
  End Using

In this case, the XmlReader object that is created ignores the white space that it encounters as well as any of the XML comments. These settings, once established with the XmlReaderSettings object, are then associated with the XmlReader object through its Create() method.

Traversing XML Using XmlTextReader

An application can easily use XmlReader to traverse a document that is received in a known format. The document can thus be traversed in a deliberate manner. You implemented a class that serialized arrays of movie orders. The next example takes an XML document containing multiple XML documents of that type and traverses them. Each movie order is forwarded to the movie supplier by sending a fax. The document is traversed as follows:

 Read root element: <MovieOrderDump>
     Process each <FilmOrder_Multiple> element
         Read <multiFilmOrders> element
             Process each <FilmOrder>
                 Send fax for each movie order here

The basic outline for the program’s implementation is to open a file containing the XML document to parse and to traverse it from element to element:

 Dim myXmlSettings As New XmlReaderSettings()
 Using readMovieInfo As XmlReader = XmlReader.Create(fileName, myXmlSettings)
    readMovieInfo.Read()
    readMovieInfo.ReadStartElement("MovieOrderDump")
    Do While (True)
       '****************************************************
       '* Process FilmOrder elements here                  *
       '****************************************************
    Loop
    readMovieInfo.ReadEndElement()  '  </MovieOrderDump>
 End Using

The preceding code opened the file using the Create() method of XmlReader, and the End Using statement takes care of shutting everything down for you. The code also introduced two methods of the XmlReader class:

ReadStartElement(String) - This verifies that the current node in the stream is an element and that the element’s name matches the string passed to method ReadStartElement(). If the verification is successful, then the stream is advanced to the next node.
ReadEndElement() - This verifies that the current node is an end tag, and if the verification is successful, then the stream is advanced to the next node.

The application knows that an element, <MovieOrderDump>, will be found at a specific point in the document. The ReadStartElement() method verifies this foreknowledge of the document format. Once all the elements contained in element <MovieOrderDump> have been traversed, the stream should point to the end tag </MovieOrderDump>. The ReadEndElement() method verifies this.

The code that traverses each element of type <FilmOrder> similarly uses the ReadStartElement() and ReadEndElement() methods to indicate the start and end of the <FilmOrder> and <multiFilmOrders> elements. The code that ultimately parses the list of movie orders and faxes the movie supplier (using the FranticallyFaxTheMovieSupplier() subroutine) is as follows:

 Dim myXmlSettings As New XmlReaderSettings()
 Using readMovieInfo As XmlReader = XmlReader.Create(fileName, myXmlSettings)
    readMovieInfo.Read()
    readMovieInfo.ReadStartElement("MovieOrderDump")
    Do While (True)
       readMovieInfo.ReadStartElement("FilmOrder_Multiple")
       readMovieInfo.ReadStartElement("multiFilmOrders")
       Do While (True)
          readMovieInfo.ReadStartElement("FilmOrder")
          movieName = readMovieInfo.ReadElementString()
          movieId = readMovieInfo.ReadElementString()
          quantity = readMovieInfo.ReadElementString()
          readMovieInfo.ReadEndElement() ' clear </FilmOrder>
          FranticallyFaxTheMovieSupplier(movieName, movieId, quantity)
          ' Should read next FilmOrder node
          ' else quits
          readMovieInfo.Read()
          If ("FilmOrder" <> readMovieInfo.Name) Then
             Exit Do
          End If
       Loop
       readMovieInfo.ReadEndElement() ' clear </multiFilmOrders>
       readMovieInfo.ReadEndElement() ' clear </FilmOrder_Multiple>
       ' Should read next FilmOrder_Multiple node
       ' else you quit
       readMovieInfo.Read() ' clear </MovieOrderDump>
       If ("FilmOrder_Multiple" <> readMovieInfo.Name) Then
          Exit Do
       End If
    Loop
    readMovieInfo.ReadEndElement()  '  </MovieOrderDump>
 End Using

Three lines within the preceding code contain a call to the ReadElementString method:

 movieName = readMovieInfo.ReadElementString()
 movieId = readMovieInfo.ReadElementString()
 quantity = readMovieInfo.ReadElementString()

While parsing the stream, it was known that an element named <name> existed and that this element contained the name of the movie. Rather than parse the start tag, get the value, and parse the end tag, it was easier just to get the data using the ReadElementString() method. This method retrieves the data string associated with an element and advances the stream to the next element. The ReadElementString() method was also used to retrieve the data associated with the XML elements <filmId> and <quantity>.

The output of this example is a fax, not shown here because the point of this example is to demonstrate that it is simpler to traverse a document when its form is known. The format of the document is still verified by XmlReader as it is parsed.

The XmlReader class also exposes properties that provide more insight into the data contained in the XML document and the state of parsing: IsEmptyElement, EOF, and IsStartElement.
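
A minimal sketch of how these three members might be combined is shown below. The module name ReaderStateSketch and the function Summarize() are hypothetical, and the XML is supplied as a string through a StringReader instead of a file to keep the example self-contained.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module ReaderStateSketch
    ' Hypothetical helper: counts start tags and empty elements in an XML string.
    Function Summarize(ByVal xml As String) As String
        Dim startTags As Integer = 0
        Dim emptyElements As Integer = 0
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            While Not reader.EOF
                ' IsStartElement() first skips non-content nodes such as
                ' white space and comments, then tests the current node.
                If reader.IsStartElement() Then
                    startTags += 1
                    ' True for elements written in the form <note/>.
                    If reader.IsEmptyElement Then emptyElements += 1
                End If
                reader.Read()
            End While
        End Using
        Return String.Format("{0} start tags, {1} empty", startTags, emptyElements)
    End Function

    Sub Main()
        Console.WriteLine(Summarize("<FilmOrder><name>Grease</name><note/></FilmOrder>"))
    End Sub
End Module
```

Strictly speaking, IsStartElement() is a method rather than a property; it is grouped with EOF and IsEmptyElement here because all three report on the state of parsing.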

.NET CLR-compliant types do not map one-to-one to XML types, so the .NET Framework 2.0 has introduced some new methods in the XmlReader that make the process of converting from one of these XML types to .NET types easier.

Using the ReadElementContentAs() method, you can easily perform the necessary casting required:

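  ' The second argument is an IXmlNamespaceResolver; Nothing is sufficient
  ' when no namespace prefixes need to be resolved during the conversion.
  Dim username As String = _
     CStr(myXmlReader.ReadElementContentAs(GetType(String), Nothing))
  Dim myDate As DateTime = _
     CDate(myXmlReader.ReadElementContentAs(GetType(DateTime), Nothing))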

Also available is a whole series of direct casts through new methods such as the following:

ReadElementContentAsBase64()
ReadElementContentAsBinHex()
ReadElementContentAsBoolean()
ReadElementContentAsDateTime()
ReadElementContentAsDecimal()
ReadElementContentAsDouble()
ReadElementContentAsFloat()
ReadElementContentAsInt()
ReadElementContentAsLong()
ReadElementContentAsObject()
ReadElementContentAsString()

In addition to these methods, the raw XML associated with the document can also be retrieved, using ReadInnerXml() and ReadOuterXml(). Again, this only scratches the surface of the XmlReader class, a class you will find to be quite rich in functionality.
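
The difference between the two methods can be illustrated with a short sketch. The module name RawXmlSketch and the function ReadFirst() are hypothetical names introduced for this example, which again reads from a string for simplicity.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module RawXmlSketch
    ' Hypothetical helper: returns the raw markup of the first element with the given name.
    Function ReadFirst(ByVal xml As String, ByVal elementName As String, ByVal includeOwnTags As Boolean) As String
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            ' ReadToFollowing() advances the reader to the next matching element, if any.
            If reader.ReadToFollowing(elementName) Then
                If includeOwnTags Then
                    Return reader.ReadOuterXml()   ' the element's own tags plus its content
                Else
                    Return reader.ReadInnerXml()   ' only the content between the tags
                End If
            End If
        End Using
        Return String.Empty
    End Function

    Sub Main()
        Dim xml As String = "<PreOrder><FilmOrder><name>Grease</name></FilmOrder></PreOrder>"
        Console.WriteLine(ReadFirst(xml, "FilmOrder", True))
        Console.WriteLine(ReadFirst(xml, "FilmOrder", False))
    End Sub
End Module
```

For the sample string, the first call returns the complete <FilmOrder> element, and the second returns only the <name> element it contains.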

Handling Exceptions

XML is text and could easily be read using mundane methods such as Read() and ReadLine(). A key feature of each class that reads and traverses XML is inherent support for error detection and handling. To demonstrate this, consider the following malformed XML document found in the file named Malformed.xml:

 <?xml version="1.0" encoding="IBM437" ?>
 <ElokuvaTilaus ElokuvaId="101", Maara="10">
    <Nimi>Grease</Nimi>
 <ElokuvaTilaus>

This document may not immediately appear to be malformed. By wrapping a call to the method you developed (ReadMovieXml) in a Try...Catch block, you can see what type of exception is raised when XmlReader detects the malformed XML within this document:

    Try
        movieReadXML("Malformed.xml")
    Catch xmlEx As XmlException
        Console.Error.WriteLine("XML Error: " & xmlEx.ToString())
    Catch ex As Exception
        Console.Error.WriteLine("Some other error: " & ex.ToString())
    End Try

When the XmlReader class encounters XML that is not well formed, its methods and properties raise exceptions of type System.Xml.XmlException. The other parsing classes in the System.Xml namespace report well-formedness errors the same way, although conditions such as invalid arguments still raise the usual .NET exceptions. Although this is a discussion of errors using an instance of type XmlReader, the concepts reviewed apply to parsing errors generated by classes found in the System.Xml namespace.

The properties exposed by XmlException include the following:

Data - A set of key/value pairs that enable you to display user-defined information about the exception
HelpLink - The link to the help page that deals with the exception
InnerException - The System.Exception instance indicating what caused the current exception
LineNumber - The number of the line within an XML document where the error occurred
LinePosition - The position within the line specified by LineNumber where the error occurred
Message - The error message that corresponds to the error that occurred. This error took place at the line in the XML document specified by LineNumber and within the line at the position specified by LinePosition.
Source - Provides the name of the application or object that triggered the error
SourceUri - Provides the URI of the element or document in which the error occurred
StackTrace - Provides a string representation of the frames on the call stack when the error was triggered
TargetSite - The method that triggered the error

The error displayed when subroutine movieReadXML processes Malformed.xml is as follows:

 XML Error: System.Xml.XmlException: The ',' character, hexadecimal value 0x2C,  cannot begin a name. Line 2, position 49.

This indicates that a comma separates the attributes in the <ElokuvaTilaus> element (ElokuvaId="101", Maara="10"). This comma is invalid. Removing the comma and running the code again gives the following output:

 XML Error: System.Xml.XmlException: This is an unexpected token. Expected 'EndElement'. Line 5, position 27.

Again, you can recognize the precise error. In this case, the document has no end element, </ElokuvaTilaus>; instead, it contains a second opening element, <ElokuvaTilaus>.

The properties provided by the XmlException class (such as LineNumber, LinePosition, and Message) provide a useful level of precision when tracking down errors. Readers can expose the same level of precision while parsing the XML document: those that implement the IXmlLineInfo interface, such as XmlTextReader, report the current position through their LineNumber and LinePosition properties.
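As a minimal sketch of how these properties might be used, the following code parses a deliberately malformed string and reports where the parser stopped. The input string and the module name are assumptions for illustration; the exact message text depends on the framework version.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module XmlErrorSketch
    Sub Main()
        ' Illustrative malformed input: a comma is not allowed between attributes.
        Dim badXml As String = "<FilmOrder filmId=""101"", quantity=""10"" />"

        Try
            Using reader As XmlReader = XmlReader.Create(New StringReader(badXml))
                ' Reading each node forces the parser to check well-formedness.
                While reader.Read()
                End While
            End Using
        Catch xmlEx As XmlException
            ' LineNumber and LinePosition locate the offending character.
            Console.Error.WriteLine("Line {0}, position {1}: {2}", _
                xmlEx.LineNumber, xmlEx.LinePosition, xmlEx.Message)
        End Try
    End Sub
End Module
```

Catching XmlException separately from the general Exception type keeps parsing problems distinct from I/O failures such as a missing file.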

Using the MemoryStream Object

A very useful class that can greatly help you when working with XML is System.IO.MemoryStream. Rather than needing a network or disk resource backing the stream (as in System.Net.Sockets.NetworkStream and System.IO.FileStream), MemoryStream is backed by a block of memory. Imagine that you want to generate an XML document and e-mail it. The built-in classes for sending e-mail rely on having a System.String containing a block of text for the message body, but if you want to generate an XML document, then you need a stream.

If the document is reasonably sized, then write the document directly to memory and copy that block of memory to the e-mail. This is good from a performance and reliability perspective because you don’t have to open a file, write it, rewind it, and read the data back in again. However, you must consider scalability in this situation because if the file is very large, or if you have a great number of smaller files, then you could run out of memory (in which case you have to go the “file” route).
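The in-memory approach can be sketched in a few lines before building the full example. This sketch uses an XmlWriter rather than the XmlSerializer used later in this section, which is a choice made here for brevity; the function and module names are hypothetical.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module MemoryStreamSketch
    ' Writes a small XML document to memory and returns it as a String.
    Function BuildXmlString() As String
        Using stream As New MemoryStream()
            Dim settings As New XmlWriterSettings()
            settings.Encoding = New UTF8Encoding(False) ' no byte-order mark
            settings.Indent = True

            Using writer As XmlWriter = XmlWriter.Create(stream, settings)
                writer.WriteStartElement("Customer")
                writer.WriteElementString("FirstName", "Bill")
                writer.WriteElementString("LastName", "Gates")
                writer.WriteEndElement()
            End Using ' disposing the writer flushes it; the stream stays open

            ' Rewind the stream and read the bytes back as text.
            stream.Seek(0, SeekOrigin.Begin)
            Dim reader As New StreamReader(stream, Encoding.UTF8)
            Return reader.ReadToEnd()
        End Using
    End Function

    Sub Main()
        Console.WriteLine(BuildXmlString())
    End Sub
End Module
```

No file is created at any point, which is the performance and reliability benefit described above; the whole document does, however, live in memory until the stream is disposed.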

This section describes how to generate an XML document to a MemoryStream object, reading the document back out again as a System.String value and e-mailing it. What you’ll do is create a new class called EmailStream that extends MemoryStream. This new class contains an extra method called CloseAndSend() that, as its name implies, closes the stream and sends the e-mail message.

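First create a new console application project called EmailStream. The first task is to create a basic Customer class that contains a few public members. It is marked with the SerializableAttribute attribute so that the .NET formatters can serialize it automatically; the XmlSerializer used later in this example needs only a public class with a default constructor and public members: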

    <Serializable()> Public Class Customer
        ' members...
        Public Id As Integer
        Public FirstName As String
        Public LastName As String
        Public Email As String
    End Class

The fun part now is the EmailStream class itself. This needs access to the System.Web.Mail namespace, so you need to add a reference to the System.Web assembly. The new class should also extend System.IO.MemoryStream, as shown here:

    Imports System.IO
    Imports System.Web.Mail

    Public Class EmailStream
        Inherits MemoryStream

The first job of CloseAndSend() is to start putting together the mail message. This is done by creating a new System.Web.Mail.MailMessage object and configuring the sender, recipient, and subject:

        ' CloseAndSend - close the stream and send the email...
        Public Sub CloseAndSend(ByVal fromAddress As String, _
                                ByVal toAddress As String, _
                                ByVal subject As String)
            ' Create the new message...
            Dim message As New MailMessage
            message.From = fromAddress
            message.To = toAddress
            message.Subject = subject

This method will be called once the XML document has been written to the stream, so you can assume at this point that the stream contains a block of data. To read the data back out again, you have to rewind the stream and use a System.IO.StreamReader. Before you do this, however, call Flush(). Traditionally, streams have always been buffered - that is, the data is not sent to the final destination (the memory block in this case, but a file in the case of a FileStream, and so on) each time the stream is written. Instead, the data is written in (pretty much) a nondeterministic way. Because you need all the data to be written, you call Flush() to ensure that all the data has been sent to the destination and that the buffer is empty.
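The effect of buffering can be observed with a small experiment. In this sketch the buffer belongs to a StreamWriter layered over a MemoryStream, which is an assumption chosen because it makes the delay visible; the module name is hypothetical.

```vb
Imports System
Imports System.IO

Module FlushSketch
    Sub Main()
        Dim stream As New MemoryStream()
        Dim writer As New StreamWriter(stream)

        writer.Write("<Customer />")
        ' The text is normally still held in the writer's buffer here,
        ' so the underlying stream has not grown yet.
        Console.WriteLine("Before Flush: {0} bytes", stream.Length)

        writer.Flush()
        ' After flushing, the bytes have reached the memory block.
        Console.WriteLine("After Flush: {0} bytes", stream.Length)

        writer.Close()
    End Sub
End Module
```

Reading the stream back before the flush would return an incomplete document, which is why the flush comes before the rewind.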

In a way, EmailStream is a great example of buffering. All the data is held in a memory “buffer” until you finally send the data on to its destination in response to an explicit call to this method:

            ' Flush and rewind the stream...
            Flush()
            Seek(0, SeekOrigin.Begin)

Once you’ve flushed and rewound the stream, you can create a StreamReader and dredge all the data out into the Body property of the MailMessage object:

            ' Read out the data...
            Dim reader As New StreamReader(Me)
            message.Body = reader.ReadToEnd()

After you’ve done that, close the stream by calling the base class method:

            ' Close the stream...
            Close()

Finally, send the message:

            ' Send the message...
            SmtpMail.Send(message)
        End Sub
    End Class

To call this method, you need to add some code to the Main method. First create a new Customer object and populate it with some test data:

    Imports System.Xml.Serialization

    Module Module1
        Sub Main()
            ' Create a new customer...
            Dim customer As New Customer
            customer.Id = 27
            customer.FirstName = "Bill"
            customer.LastName = "Gates"
            customer.Email = "bill.gates@microsoft.com"

After you’ve done that, you can create a new EmailStream object. You then use XmlSerializer to write an XML document representing the newly created Customer instance to the block of memory that backs the EmailStream:

            ' Create a new email stream...
            Dim stream As New EmailStream

            ' Serialize...
            Dim serializer As New XmlSerializer(customer.GetType())
            serializer.Serialize(stream, customer)

At this point, the stream will be filled with data, and after all the data has been flushed, the block of memory behind the EmailStream will contain the complete document. Now you can call CloseAndSend to e-mail the document:

            ' Send the email...
            stream.CloseAndSend("evjen@yahoo.com", _
                "evjen@yahoo.com", "XML Customer Document")
        End Sub
    End Module

You probably already have the Microsoft SMTP service properly configured - this service is necessary to send e-mail. You also need to make sure that the e-mail addresses used in your code go to your e-mail address! Run the project and check your e-mail; you should see something similar to what is shown in Figure 11-2.

Figure 11-2

Document Object Model (DOM)

The classes of the System.Xml namespace that support the Document Object Model (DOM) interact as illustrated in Figure 11-3.

Figure 11-3

Within this diagram, an XML document is contained in a class named XmlDocument. Each node within this document is accessible and managed using XmlNode. Nodes can also be accessed and managed using a class specifically designed to process a specific node’s type (XmlElement, XmlAttribute, and so on). XML documents are extracted from XmlDocument using a variety of mechanisms exposed through such classes as XmlWriter, TextWriter, Stream, and a file (specified by filename of type String). XML documents are consumed by an XmlDocument using a variety of load mechanisms exposed through the same classes.

Where a DOM-style parser differs from a stream-style parser is with respect to movement. Using the DOM, the nodes can be traversed forward and backward. Nodes can be added to the document, removed from the document, and updated. However, this flexibility comes at a performance cost. It is faster to read or write XML using a stream-style parser.
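The difference in movement can be illustrated with a short sketch that adds a node, steps backward to a sibling, updates it, and then removes a node. The document content reuses element names from this chapter's examples; the module name and the specific edits are assumptions for illustration.

```vb
Imports System
Imports System.Xml

Module DomEditSketch
    Sub Main()
        Dim xmlDoc As New XmlDocument()
        xmlDoc.LoadXml( _
            "<multiFilmOrders><FilmOrder><name>Grease</name></FilmOrder></multiFilmOrders>")

        ' Forward: from the root element to its first child.
        Dim order As XmlNode = xmlDoc.DocumentElement.FirstChild

        ' Add a <quantity> element to the order.
        Dim quantity As XmlElement = xmlDoc.CreateElement("quantity")
        quantity.InnerText = "10"
        order.AppendChild(quantity)

        ' Backward: from the new node to its previous sibling, then update it.
        Dim nameNode As XmlNode = quantity.PreviousSibling
        nameNode.InnerText = "Lawrence of Arabia"

        ' Remove the node that was just added.
        order.RemoveChild(quantity)

        Console.WriteLine(xmlDoc.OuterXml)
    End Sub
End Module
```

None of these steps is possible with a forward-only reader, because once a node has been passed it cannot be revisited or changed.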

The DOM-specific classes exposed by System.Xml include the following:

XmlDocument - Corresponds to an entire XML document. A document is loaded using the Load() method. XML documents are loaded from a file (the filename specified as type String), TextReader, or XmlReader. A document can be loaded using LoadXml() in conjunction with a string containing the XML document. The Save() method is used to save XML documents. The methods exposed by XmlDocument reflect the intricate manipulation of an XML document. For example, the following self-documenting creation methods are implemented by this class: CreateAttribute(), CreateCDataSection(), CreateComment(), CreateDocumentFragment(), CreateDocumentType(), CreateElement(), CreateEntityReference(), CreateNavigator(), CreateNode(), CreateProcessingInstruction(), CreateSignificantWhitespace(), CreateTextNode(), CreateWhitespace(), and CreateXmlDeclaration(). The elements contained in the document can be retrieved. Other methods support the retrieving, importing, cloning, loading, and writing of nodes.
XmlNode - Corresponds to a node within the DOM tree. This class supports datatypes, namespaces, and DTDs. A robust set of methods and properties are provided to create, delete, and replace nodes: AppendChild(), Clone(), CloneNode(), CreateNavigator(), InsertAfter(), InsertBefore(), Normalize(), PrependChild(), RemoveAll(), RemoveChild(), ReplaceChild(), SelectNodes(), SelectSingleNode(), Supports(), WriteContentTo(), and WriteTo(). The contents of a node can similarly be traversed in a variety of ways: FirstChild, LastChild, NextSibling, ParentNode, and PreviousSibling.
XmlElement - Corresponds to an element within the DOM tree. The functionality exposed by this class contains a variety of methods used to manipulate an element’s attributes: AppendChild(), Clone(), CloneNode(), CreateNavigator(), GetAttribute(), GetAttributeNode(), GetElementsByTagName(), GetNamespaceOfPrefix(), GetPrefixOfNamespace(), InsertAfter(), InsertBefore(), Normalize(), PrependChild(), RemoveAll(), RemoveAllAttributes(), RemoveAttributeAt(), RemoveAttributeNode(), RemoveChild(), ReplaceChild(), SelectNodes(), SelectSingleNode(), SetAttribute(), SetAttributeNode(), Supports(), WriteContentTo(), and WriteTo().
XmlAttribute - Corresponds to an attribute of an element (XmlElement) within the DOM tree. An attribute holds a name and a value rather than a tree of subordinate elements, so it is a less complicated object than an XmlNode or an XmlElement. An XmlAttribute can retrieve its owner document (property, OwnerDocument), retrieve its owner element (property, OwnerElement), retrieve its parent node (property, ParentNode), and retrieve its name (property, Name). The value of an XmlAttribute is available via a read-write property named Value. The methods available to XmlAttribute include AppendChild(), Clone(), CloneNode(), CreateNavigator(), GetNamespaceOfPrefix(), GetPrefixOfNamespace(), InsertAfter(), InsertBefore(), Normalize(), PrependChild(), RemoveAll(), RemoveChild(), ReplaceChild(), SelectNodes(), SelectSingleNode(), WriteContentTo(), and WriteTo().

Given the number of methods and properties (and there are many more than those listed here) exposed by XmlDocument, XmlNode, XmlElement, and XmlAttribute, it should be clear that any XML 1.0-compliant document can be generated and manipulated using these classes. Compared to their XML stream counterparts, these classes afford more flexible movement within the XML document and more flexible editing of it.

A similar comparison could be made between DOM and data serialized and deserialized using XML. Using serialization, the type of node (for example, attribute or element) and the node name are specified at compile time. There is no on-the-fly modification of the XML generated by the serialization process.
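To make the compile-time nature of serialization concrete, here is a minimal sketch in which the shape of the XML is fixed by attributes on a class. The FilmOrder class shown here is hypothetical and exists only for this illustration.

```vb
Imports System
Imports System.Xml.Serialization

' Hypothetical class: the attributes decide, at compile time, that FilmId
' becomes an XML attribute and Name becomes a child element.
Public Class FilmOrder
    <XmlAttributeAttribute("filmId")> Public FilmId As Integer
    <XmlElementAttribute("name")> Public Name As String
End Class

Module SerializationShapeSketch
    Sub Main()
        Dim order As New FilmOrder()
        order.FilmId = 101
        order.Name = "Grease"

        ' The serializer follows the attributes above; the calling code
        ' cannot reshape the output while it is being written.
        Dim serializer As New XmlSerializer(GetType(FilmOrder))
        serializer.Serialize(Console.Out, order)
    End Sub
End Module
```

Changing filmId from an attribute to an element would require editing the class and recompiling, whereas the DOM allows the same change to be made on a loaded document at runtime.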

Other technologies that generate and consume XML are not as flexible as the DOM. This includes ADO.NET and ADO, which generate XML of a particular form. The default install of SQL Server 2000 does expose a certain amount of flexibility when it comes to the generation (FOR XML queries) and consumption of XML (OPENXML). SQL Server 2005 has more support for XML and even supports an XML datatype. SQL Server 2005 also expands upon FOR XML queries with the TYPE directive, which returns the result as a value of the XML datatype. The choice between using classes within the DOM and a version of SQL Server is a choice between using a language, such as Visual Basic, to manipulate objects or installing SQL Server and performing most of the XML manipulation in SQL.

DOM Traversing Raw XML Elements

The first DOM example loads an XML document into an XmlDocument object using a string that contains the actual XML document. This scenario is typical of an application that uses ADO.NET to generate XML, but then uses the objects of the DOM to traverse and manipulate this XML. ADO.NET’s DataSet object contains the results of ADO.NET data access operations. The DataSet class exposes a GetXml method that retrieves the underlying XML associated with the DataSet. The following code demonstrates how the contents of the DataSet are loaded into the XmlDocument:

    Dim xmlDoc As New XmlDocument
    Dim ds As New DataSet

    ' Set up ADO.NET DataSet() here
    xmlDoc.LoadXml(ds.GetXml())

This example simply traverses each XML element (XmlNode) in the document (XmlDocument) and displays the data accordingly. The data associated with this example is not retrieved from a DataSet but is instead contained in a string, rawData, which is initialized as follows:

    Dim rawData As String = _
        "<multiFilmOrders>" & _
        "  <FilmOrder>" & _
        "    <name>Grease</name>" & _
        "    <filmId>101</filmId>" & _
        "    <quantity>10</quantity>" & _
        "  </FilmOrder>" & _
        "  <FilmOrder>" & _
        "    <name>Lawrence of Arabia</name>" & _
        "    <filmId>102</filmId>" & _
        "    <quantity>10</quantity>" & _
        "  </FilmOrder>" & _
        "</multiFilmOrders>"

The XML document in rawData is a portion of the XML hierarchy associated with a movie order. The basic idea in processing this data is to traverse each <FilmOrder> element in order to display the data it contains. Each node corresponding to a <FilmOrder> element can be retrieved from your XmlDocument using the GetElementsByTagName() method (specifying a tag name of FilmOrder). The GetElementsByTagName() method returns a list of XmlNode objects in the form of a collection of type XmlNodeList. Using a For Each statement to iterate over this list, the XmlNodeList (movieOrderNodes) can be traversed as individual XmlNode elements (movieOrderNode). The code for handling this is as follows:

    Dim xmlDoc As New XmlDocument
    Dim movieOrderNodes As XmlNodeList
    Dim movieOrderNode As XmlNode

    xmlDoc.LoadXml(rawData)

    ' Traverse each <FilmOrder>
    movieOrderNodes = xmlDoc.GetElementsByTagName("FilmOrder")
    For Each movieOrderNode In movieOrderNodes
        '**********************************************************
        ' Process <name>, <filmId> and <quantity> here
        '**********************************************************
    Next

Each XmlNode can then have its contents displayed by traversing the children of this node using the ChildNodes property. This property returns an XmlNodeList (baseDataNodes) that can be traversed one XmlNode list element at a time:

    Dim baseDataNodes As XmlNodeList
    Dim bFirstInRow As Boolean

    baseDataNodes = movieOrderNode.ChildNodes
    bFirstInRow = True
    For Each baseDataNode As XmlNode In baseDataNodes
        If (bFirstInRow) Then
            bFirstInRow = False
        Else
            Console.Out.Write(", ")
        End If
        Console.Out.Write(baseDataNode.Name & ": " & baseDataNode.InnerText)
    Next
    Console.Out.WriteLine()

The bulk of the preceding code retrieves the name of each node using the Name property and its content using the InnerText property. The InnerText property of each XmlNode retrieved contains the data associated with the XML elements (nodes) <name>, <filmId>, and <quantity>. The example displays the contents of the XML elements using Console.Out. The XML document is displayed as follows:

    name: Grease, filmId: 101, quantity: 10
    name: Lawrence of Arabia, filmId: 102, quantity: 10

Other, more practical, methods for using this data could have been implemented, including the following:

The contents could have been directed to an ASP.NET Response object, and the data retrieved could have been used to create an HTML table (<table> table, <tr> row, and <td> data) that would be written to the Response object.
The data traversed could have been directed to a ListBox or ComboBox Windows Forms control. This would enable the data returned to be selected as part of a GUI application.
The data could have been edited as part of your application’s business rules. For example, you could have used the traversal to verify that the <filmId> matched the <name>. This kind of check is useful whenever you want to validate the data entered into the XML document.

Here is the example in its entirety:

    Dim rawData As String = _
        "<multiFilmOrders>" & _
        "  <FilmOrder>" & _
        "    <name>Grease</name>" & _
        "    <filmId>101</filmId>" & _
        "    <quantity>10</quantity>" & _
        "  </FilmOrder>" & _
        "  <FilmOrder>" & _
        "    <name>Lawrence of Arabia</name>" & _
        "    <filmId>102</filmId>" & _
        "    <quantity>10</quantity>" & _
        "  </FilmOrder>" & _
        "</multiFilmOrders>"
    Dim xmlDoc As New XmlDocument
    Dim movieOrderNodes As XmlNodeList
    Dim movieOrderNode As XmlNode
    Dim baseDataNodes As XmlNodeList
    Dim bFirstInRow As Boolean

    xmlDoc.LoadXml(rawData)

    ' Traverse each <FilmOrder>
    movieOrderNodes = xmlDoc.GetElementsByTagName("FilmOrder")
    For Each movieOrderNode In movieOrderNodes
        baseDataNodes = movieOrderNode.ChildNodes
        bFirstInRow = True
        For Each baseDataNode As XmlNode In baseDataNodes
            If (bFirstInRow) Then
                bFirstInRow = False
            Else
                Console.Out.Write(", ")
            End If
            Console.Out.Write(baseDataNode.Name & ": " & baseDataNode.InnerText)
        Next
        Console.Out.WriteLine()
    Next

DOM Traversing XML Attributes

This next example demonstrates how to traverse data contained in attributes and how to update the attributes based on a set of business rules. In this example, the XmlDocument object is populated by retrieving an XML document from a file. After the business rules edit the object, the data is persisted back to the file:

    Dim xmlDoc As New XmlDocument

    xmlDoc.Load("..\MovieSupplierShippingListV2.xml")
    '*******************************************
    ' Business rules process document here
    '*******************************************
    xmlDoc.Save("..\MovieSupplierShippingListV2.xml")

The data contained in the file, MovieSupplierShippingListV2.xml, is a variation of the movie order. You have altered your rigid standard (for the sake of example) so that the data associated with individual movie orders is contained in XML attributes instead of XML elements. An example of this movie order data is as follows:

    <FilmOrder name="Grease" filmId="101" quantity="10" />

You have already seen how to traverse the XML elements associated with a document, so let’s assume that you have successfully retrieved the XmlNode associated with the <FilmOrder> element:

    Dim attributes As XmlAttributeCollection
    Dim filmId As Integer
    Dim quantity As Integer

    attributes = node.Attributes
    For Each attribute As XmlAttribute In attributes
        ' Attribute values are strings, so convert them explicitly.
        If 0 = String.Compare(attribute.Name, "filmId") Then
            filmId = CInt(attribute.Value)
        ElseIf 0 = String.Compare(attribute.Name, "quantity") Then
            quantity = CInt(attribute.Value)
        End If
    Next

The preceding code traverses the attributes of an XmlNode by retrieving a list of attributes using the Attributes property. The value of this property is assigned to the attributes variable (of type XmlAttributeCollection). The individual XmlAttribute objects (variable, attribute) contained in attributes are traversed using a For Each loop. Within the loop, the contents of the filmId and quantity attributes are saved for processing by your business rules.

Your business rules execute an algorithm that ensures that the movies in the company’s order are provided in the correct quantity. This rule specifies that the movie associated with filmId=101 must be sent to the customer in batches of six at a time due to packaging. In the event of an invalid quantity, the code for enforcing this business rule keeps removing a single order from the quantity value until the number is divisible by six. Then this number is assigned to the quantity attribute. The Value property of the XmlAttribute object is used to set the correct value of the order’s quantity. The code performing this business rule is as follows:

    If filmId = 101 Then
        ' This film comes packaged in batches of six.
        ' Reduce the quantity until it is evenly divisible by six.
        Do Until (quantity Mod 6) = 0
            quantity -= 1
        Loop
        attributes.ItemOf("quantity").Value = quantity.ToString()
    End If
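The loop above reduces the quantity one unit at a time. For non-negative quantities, the same result can be computed directly with the Mod operator, as in this sketch. The function name RoundDownToBatch and its batchSize parameter are hypothetical and are not part of the chapter's sample code.

```vb
Imports System

Module BatchSketch
    ' Rounds a non-negative quantity down to the nearest multiple of batchSize.
    Function RoundDownToBatch(ByVal quantity As Integer, _
                              ByVal batchSize As Integer) As Integer
        ' quantity Mod batchSize is the number of units left over after
        ' filling as many whole batches as possible.
        Return quantity - (quantity Mod batchSize)
    End Function

    Sub Main()
        ' An order of 10 leaves 4 units over, so it is reduced to 6.
        Console.WriteLine(RoundDownToBatch(10, 6))
        ' An order of 12 is already a whole number of batches.
        Console.WriteLine(RoundDownToBatch(12, 6))
    End Sub
End Module
```

For example, 10 Mod 6 is 4, so an order of 10 becomes 10 - 4 = 6, which matches what the loop produces after four iterations.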

What is elegant about this example is that the list of attributes was traversed using For Each. Then ItemOf was used to look up a specific attribute that had already been traversed. This would not have been possible by reading an XML stream with an object derived from the XML stream reader class, XmlReader.

You can use this code as follows:

    Sub TraverseAttributes(ByRef node As XmlNode)
        Dim attributes As XmlAttributeCollection
        Dim filmId As Integer
        Dim quantity As Integer

        attributes = node.Attributes
        For Each attribute As XmlAttribute In attributes
            ' Attribute values are strings, so convert them explicitly.
            If 0 = String.Compare(attribute.Name, "filmId") Then
                filmId = CInt(attribute.Value)
            ElseIf 0 = String.Compare(attribute.Name, "quantity") Then
                quantity = CInt(attribute.Value)
            End If
        Next

        If filmId = 101 Then
            ' This film comes packaged in batches of six.
            ' Reduce the quantity until it is evenly divisible by six.
            Do Until (quantity Mod 6) = 0
                quantity -= 1
            Loop
            attributes.ItemOf("quantity").Value = quantity.ToString()
        End If
    End Sub

    Sub WXReadMovieDOM()
        Dim xmlDoc As New XmlDocument
        Dim movieOrderNodes As XmlNodeList

        xmlDoc.Load("..\MovieSupplierShippingListV2.xml")

        ' Traverse each <FilmOrder>
        movieOrderNodes = xmlDoc.GetElementsByTagName("FilmOrder")
        For Each movieOrderNode As XmlNode In movieOrderNodes
            TraverseAttributes(movieOrderNode)
        Next

        xmlDoc.Save("..\MovieSupplierShippingListV2.xml")
    End Sub