Understanding How to Perform Full-Content Reads


While parsing an XML document, a program may need to read the entire content of an element. For example, an XML element might contain a GIF image encoded as Base64. A program would need to locate the element, read all of its text, and convert the text back to image data. Or an element might contain a complete chapter of a book, where reading the whole text into one string is impractical because of memory constraints. In that case it is better to read the text in chunks and process the data chunk by chunk.

The XmlTextReader provides methods to support these scenarios and many more. The methods in the following list are referred to as full-content read methods:

  • ReadString

  • ReadChars

  • ReadBase64

  • ReadBinHex

  • ReadInnerXml

  • ReadOuterXml

We will now examine each method in detail. Let's start with reading the complete content as character data. The XmlTextReader provides two methods for this: ReadChars and ReadString.

Reading All Content as String Data

The ReadString method reads the content of an element or a text node as a string. It behaves differently depending on whether the current node is an element or a text node. If the current node is an element, the method concatenates all text, white space, significant white space, and CDATA nodes into one string and returns that string. The concatenation stops when any markup is encountered, which includes child elements in mixed content as well as the end tag of the element. If the current node is a text node, the method concatenates the same kinds of nodes from the current text node up to the element end tag or the first markup that is encountered. The reader can also be positioned on an attribute. In this case the method behaves as if the reader were on the start tag of the element.

The returned string is either the concatenated string described previously or the empty string. The empty string can signify two things: there is no more text to be read from the current node, or the current node is not an element or a text node. Listing 10.16 demonstrates how to use the ReadString method.

Listing 10.16
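C#

// Requires: using System.IO; using System.Xml; using System.Windows.Forms;
string data =
  @"<Root>Text data followed by mark up.<Child/></Root>";
StringReader str = new StringReader(data);
XmlTextReader reader = new XmlTextReader(str);
reader.WhitespaceHandling = WhitespaceHandling.None;
reader.MoveToContent();  // position the reader on <Root>
// ReadString returns the text up to the <Child/> markup
MessageBox.Show("Content of Root: " + reader.ReadString());

VB

' Requires: Imports System.IO, System.Xml, System.Windows.Forms
Dim strData As String
strData = "<Root>Text data followed by mark up.<Child/></Root>"
' Typed declarations avoid late binding
Dim str As New StringReader(strData)
Dim reader As New XmlTextReader(str)
reader.WhitespaceHandling = WhitespaceHandling.None
reader.MoveToContent()  ' position the reader on <Root>
' ReadString returns the text up to the <Child/> markup
MessageBox.Show("Content of Root: " & reader.ReadString())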
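To make the stopping rule concrete, the following is a minimal sketch of ReadString on mixed content. The class name ReadStringSketch, the helper ReadLeadingText, and the sample XML are hypothetical and are not part of the book's listings. The sketch assumes a .NET version in which XmlTextReader can be used in a using statement; on older frameworks, call Close instead.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadStringSketch
{
    // Returns the text that ReadString collects from the first content element
    // and reports the node on which the reader stopped.
    public static string ReadLeadingText(string xml, out string stoppedOn)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();             // position on the root element
            string text = reader.ReadString();  // text and CDATA up to the first markup
            stoppedOn = reader.NodeType + " " + reader.Name;
            return text;
        }
    }

    public static void Main()
    {
        // Text and CDATA nodes are concatenated; <Child/> ends the read.
        string xml =
            "<Root>Text, <![CDATA[<CDATA>,]]> more text.<Child/>Trailing text</Root>";
        string stoppedOn;
        string text = ReadLeadingText(xml, out stoppedOn);
        Console.WriteLine(text);       // Text, <CDATA>, more text.
        Console.WriteLine(stoppedOn);  // Element Child
    }
}
```

The trailing text after the child element is not part of the returned string. To reach it, the program has to move the reader past the child element and call ReadString again.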

Reading String Data in Chunks

The ReadChars method reads the text content of an element into a character buffer. This method takes three parameters. The first parameter is the character buffer that the text will be copied into; an ArgumentNullException is thrown if this parameter is null. The second parameter is the offset into the buffer at which the method should start writing the data; an ArgumentOutOfRangeException is thrown if the offset is less than zero. The third parameter is the number of characters to copy into the buffer; an ArgumentOutOfRangeException is thrown if this parameter is less than zero. Also, an ArgumentException is thrown if the number of characters to copy is greater than the space available in the buffer starting from the specified offset.

The ReadChars method returns the number of characters actually read from the XML stream. This number can be zero for one of two reasons: there are no more characters to be read from the element, or the reader is not positioned on an element.

This method should be used when an element contains a large amount of data or if allocating huge string objects would hurt performance and put a strain on memory usage. This method allows you to read the character data in smaller chunks, process the small chunks, and possibly reuse the same character buffer.

However, some characteristics particular to the ReadChars method should be pointed out. The ReadChars method works only on element nodes. You cannot use it even if you are positioned on a text node; the text must be wrapped in an element node. The ReadChars method reads everything from the start tag to the end tag, including any markup or CDATA nodes. Well-formedness checking and normalization are turned off when using ReadChars. Also, this method consumes the end tag when it finishes reading the content of the element. This is important to note because attempting to call ReadEndElement after reading the content of an element with ReadChars will raise an XmlException. Finally, reading attributes while using ReadChars is not possible. Examine all of the attributes you need before calling ReadChars, because after the method is called, the attribute information is lost.

Even with these characteristics, the ReadChars method is still very useful. It can be a fast, efficient, and memory-conservative way to read text data from an XML element. Listing 10.17 demonstrates how to use the ReadChars method to buffer input from the XmlTextReader.

Listing 10.17
C#

reader.WhitespaceHandling = WhitespaceHandling.None;
reader.MoveToContent();
reader.ReadStartElement("Hamlet");
int charsRead;
char[] buffer = new char[64];
while(0 != (charsRead = reader.ReadChars(buffer, 0, 64)))
{
  MessageBox.Show("Characters Read: " + charsRead);
  // Show only the characters filled in by this call
  MessageBox.Show("Chars Read: " + new String(buffer, 0, charsRead));
}

VB

reader.WhitespaceHandling = WhitespaceHandling.None
reader.MoveToContent()
reader.ReadStartElement("Hamlet")
Dim charsRead As Integer
Dim buffer(63) As Char  ' 64 elements, indexes 0 to 63
' VB treats = inside an expression as a comparison,
' so the assignment is made in separate statements
charsRead = reader.ReadChars(buffer, 0, 64)
While charsRead <> 0
  MessageBox.Show("Characters Read: " & charsRead)
  MessageBox.Show("Chars Read: " & New String(buffer, 0, charsRead))
  charsRead = reader.ReadChars(buffer, 0, 64)
End While
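Because the listing above does not show how the reader is created, a self-contained sketch of the same chunked-read pattern may help. The class name ReadCharsSketch, the helper ReadElementInChunks, the sample XML, and the chunk size are hypothetical choices for illustration. The sketch assumes the reader is positioned on the element itself when ReadChars is first called, as the text requires.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ReadCharsSketch
{
    // Reads the content of the first content element in fixed-size chunks,
    // reusing one buffer, and returns everything that was read.
    public static string ReadElementInChunks(string xml, int chunkSize)
    {
        StringBuilder result = new StringBuilder();
        char[] buffer = new char[chunkSize];
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();  // ReadChars works only while on an element
            int charsRead;
            while ((charsRead = reader.ReadChars(buffer, 0, buffer.Length)) > 0)
            {
                // Only the first charsRead characters of the buffer are valid.
                result.Append(buffer, 0, charsRead);
            }
        }
        return result.ToString();
    }

    public static void Main()
    {
        string xml = "<Hamlet>To be, or not to be</Hamlet>";
        Console.WriteLine(ReadElementInChunks(xml, 8));
    }
}
```

A real program would process each chunk as it arrives instead of collecting the chunks in a StringBuilder; the accumulation here only makes the result easy to inspect.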

Reading Binary-Encoded Data

The XmlTextReader provides two methods for decoding binary-encoded data: ReadBase64 and ReadBinHex. Both methods read the text form of the binary data and decode it into an array of bytes. Because the methods are so similar, we will discuss them in parallel. Both methods take three parameters. The first parameter is a byte buffer that receives the decoded data; an ArgumentNullException is thrown if the buffer is null. The second parameter is the offset into the buffer at which the method should start writing the data; an ArgumentOutOfRangeException is thrown if the offset is less than zero. The third parameter is the number of bytes that should be copied into the buffer; an ArgumentOutOfRangeException is thrown if this parameter is less than zero or if the number of bytes to copy is greater than the space available in the buffer, starting from the specified offset.

Both methods return the number of bytes that were actually read from the XML stream. Zero is returned when there is no more data to be obtained from the XML stream. As with ReadChars, zero is also returned if the current node is not an element. Listing 10.18 demonstrates how to read binary-encoded data with the XmlTextReader.

Listing 10.18
C#

byte[] buffer = new byte[1024];
XmlTextReader reader = new XmlTextReader("input.xml");
reader.MoveToContent();
reader.ReadStartElement("Image");
int bytesRead;
// Decode the Base64 element, if the reader is positioned on it
while(reader.Name == "Base64" &&
      0 != (bytesRead = reader.ReadBase64(buffer, 0, 1024)))
{
  MessageBox.Show("Bytes Read: " + bytesRead);
}
// Decode the BinHex element, if the reader is positioned on it
while(reader.Name == "BinHex" &&
      0 != (bytesRead = reader.ReadBinHex(buffer, 0, 1024)))
{
  MessageBox.Show("Bytes Read: " + bytesRead);
}
reader.Close();

VB

Dim buffer(1023) As Byte  ' 1024 elements, indexes 0 to 1023
Dim reader As New XmlTextReader("input.xml")
reader.MoveToContent()
reader.ReadStartElement("Image")
Dim bytesRead As Integer
' VB treats = inside an expression as a comparison,
' so the assignment is made in a separate statement
While reader.Name = "Base64"
  bytesRead = reader.ReadBase64(buffer, 0, 1024)
  If bytesRead = 0 Then Exit While
  MessageBox.Show("Bytes Read: " & bytesRead.ToString())
End While
While reader.Name = "BinHex"
  bytesRead = reader.ReadBinHex(buffer, 0, 1024)
  If bytesRead = 0 Then Exit While
  MessageBox.Show("Bytes Read: " & bytesRead.ToString())
End While
reader.Close()
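The listing above depends on an input.xml file that is not shown. As a self-contained alternative, the following sketch builds the Base64 text in memory and decodes it in small chunks. The class name ReadBase64Sketch, the helper DecodeElement, the Image element name, and the sample bytes are hypothetical; ReadBinHex could be substituted in the same loop for BinHex-encoded content.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadBase64Sketch
{
    // Decodes the Base64 content of the first content element in chunks
    // and returns all decoded bytes.
    public static byte[] DecodeElement(string xml, int chunkSize)
    {
        byte[] buffer = new byte[chunkSize];
        using (MemoryStream decoded = new MemoryStream())
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();  // ReadBase64 returns 0 unless on an element
            int bytesRead;
            while ((bytesRead = reader.ReadBase64(buffer, 0, buffer.Length)) > 0)
            {
                decoded.Write(buffer, 0, bytesRead);
            }
            return decoded.ToArray();
        }
    }

    public static void Main()
    {
        // Sample bytes standing in for image data (assumption for illustration).
        byte[] original = { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61, 1, 2, 3, 4 };
        string xml = "<Image>" + Convert.ToBase64String(original) + "</Image>";

        byte[] roundTripped = DecodeElement(xml, 4);
        Console.WriteLine(BitConverter.ToString(roundTripped));
    }
}
```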

Reading Element Content as Markup

There will be times when you need to read all of the content of an element, whether it is plain text or mixed content, as one block of text. The XmlTextReader provides the methods ReadInnerXml and ReadOuterXml, which read all content from an element and return it in string form.

These methods differ only in that ReadOuterXml returns the start tag, content, and end tag of the current node, while ReadInnerXml returns only the content of the current node. The methods were designed to work against elements and attributes. If the reader is positioned on any other type of node, the empty string is returned. If the reader is positioned on an element, calling either method consumes the content of the current element and moves the reader to the node that follows the element's end tag, despite the fact that ReadInnerXml does not return the end tag. Table 10.4 describes the string returned by each method when positioned on the same node.

Table 10.4. Return Values of ReadInnerXml and ReadOuterXml

ELEMENT XML        POSITIONED ON   ReadInnerXml       ReadOuterXml
<author>           <author>        <fn>Ronnie</fn>    <author>
<fn>Ronnie</fn>                    <ln>Yates</ln>     <fn>Ronnie</fn>
<ln>Yates</ln>                                        <ln>Yates</ln>
</author>                                             </author>
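The values in Table 10.4 can be reproduced with a short sketch. The class name InnerOuterSketch and its helper methods are hypothetical; the XML is the author element from the table, written on one line so that no white space nodes appear in the results.

```csharp
using System;
using System.IO;
using System.Xml;

public static class InnerOuterSketch
{
    // Returns the content of the first content element without its own tags.
    public static string Inner(string xml)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadInnerXml();
        }
    }

    // Returns the first content element including its start and end tags.
    public static string Outer(string xml)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadOuterXml();
        }
    }

    public static void Main()
    {
        string xml = "<author><fn>Ronnie</fn><ln>Yates</ln></author>";
        Console.WriteLine(Inner(xml));  // <fn>Ronnie</fn><ln>Yates</ln>
        Console.WriteLine(Outer(xml));  // <author><fn>Ronnie</fn><ln>Yates</ln></author>
    }
}
```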

If the reader is positioned on an attribute, ReadInnerXml returns the value of the attribute. ReadOuterXml returns the entire attribute as it appears in the XML stream. Interestingly, in both cases the reader is not moved to the next attribute. Instead, the reader remains positioned on the same attribute. Table 10.5 illustrates the values that are returned from ReadInnerXml and ReadOuterXml when the reader is positioned on an attribute.

Table 10.5. Return Values of ReadInnerXml and ReadOuterXml

ELEMENT XML            POSITIONED ON   ReadInnerXml   ReadOuterXml
<auth fn="Ronnie"/>    fn              Ronnie         fn="Ronnie"
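The attribute case in Table 10.5 can be sketched the same way. The class name AttributeXmlSketch and the helper ReadAttribute are hypothetical. The sketch calls both methods on the same reader, which relies on the reader staying on the attribute after the first call, as described above.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttributeXmlSketch
{
    // Returns { ReadInnerXml result, reader.Name after that call, ReadOuterXml result }
    // for the named attribute of the first content element.
    public static string[] ReadAttribute(string xml, string attributeName)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();
            if (!reader.MoveToAttribute(attributeName))
            {
                throw new ArgumentException("Attribute not found: " + attributeName);
            }
            string inner = reader.ReadInnerXml();  // attribute value only
            string nameAfter = reader.Name;        // still the same attribute
            string outer = reader.ReadOuterXml();  // name, quotes, and value
            return new string[] { inner, nameAfter, outer };
        }
    }

    public static void Main()
    {
        string[] result = ReadAttribute("<auth fn=\"Ronnie\"/>", "fn");
        Console.WriteLine(result[0]);  // Ronnie
        Console.WriteLine(result[1]);  // fn
        Console.WriteLine(result[2]);  // fn="Ronnie"
    }
}
```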

These methods are designed to be used when you need to treat the XML content of an element as a separate entity. For example, imagine that an application has a layer of business objects capable of saving and restoring their data in XML format. In this system there might be a central object serializer that manages writing the data to and reading it from a single XML file. This is not a difficult scenario to imagine, because the .NET Compact Framework does not provide an object serializer, such as the XmlSerializer or the BinaryFormatter. Using ReadInnerXml or ReadOuterXml, the object serializer could hand off chunks of XML data to the separate business objects for them to parse and use for initialization.



Microsoft .NET Compact Framework Kick Start
ISBN: 0672325705
Year: 2003
Pages: 206
