Single-Pass Processing of XML Documents

As mentioned previously, it is often convenient to process an XML document in a single pass without retaining the document in memory. This is true both when you're reading a document from an external source and when you're creating a document to be transmitted to an external destination. The .NET Framework provides XML readers and XML writers for reading and creating XML documents in a serial fashion.

Parsing XML Documents Using the XmlReader Class

Historically, forward-only, noncached parsing of XML has been performed using the SAX API. Under the SAX model, you register a callback interface that acts as a sink for events generated by the SAX parser. The parser generates an event as it encounters each part of the document, such as a start tag, some text content, or an end tag. In effect, the parser pushes the information from the document to your callback handler, so SAX is termed a push model. The parser informs you of every part of the document it discovers (elements, text, white space, and so forth), even if you're not interested in them. You cannot unregister a particular type of event; you must simply ignore those calls by not providing any code in the corresponding event handler method. Because the parser still makes every one of those calls, this has obvious implications for the efficiency and speed of processing.

The forward-only, noncached parsing provided by the .NET Framework is similar in concept to SAX processing, with one important difference. In the equivalent .NET model, data is pulled from the parser on request. This means that the application never sees data in which it has no interest. As you'll see, it is possible to skip unwanted data and to process only elements with specific names.

The central class for forward-only, noncached processing in the .NET Framework is System.Xml.XmlReader. XmlReader is an abstract class that defines the properties and methods required to process an XML document under the pull model. As such, you can use it polymorphically to process XML documents from different sources. Because XmlReader is an abstract class, you must pick a concrete subclass to use the features it defines. Three concrete subclasses of XmlReader are provided in the System.Xml namespace:

  • XmlTextReader, which is the fastest form of XmlReader. XmlTextReader ensures that the document is well-formed, but it does not validate the document being processed, nor does it resolve entities (either internal or external) except to ensure their well-formedness. If no validation or entity resolution is required, this is usually the best choice.

  • XmlValidatingReader, which adds a validation layer on top of an XmlTextReader. It can validate the document against a DTD or a schema, depending on the properties set.

  • XmlNodeReader, which provides forward-only processing over an XmlNode that forms part of a DOM tree. The XmlNodeReader does not implement any validation of the document being processed. (The XmlNode class is discussed later in the chapter.)

All of these subclasses ensure the well-formedness of the document being processed. Each one will be described later in the chapter.

Processing XML Using an XmlTextReader Instance

The XmlTextReader provides fast and simple access to an XML document. The constructor specifies the source of the XML document to be processed. Many alternatives are available because the constructor is highly overloaded, but you can obtain the XML document from one of three sources:

  • A file or other form of URL (such as the URL of an ASP.NET application that generates XML). If a single String is passed to the constructor, it is presumed to contain a URL.

  • A System.IO.Stream that is passed in as one of the constructor parameters

  • A System.IO.TextReader that is passed in as one of the constructor parameters
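As a rough illustration of the three kinds of source, the following sketch creates one reader for each. The class name ReaderSourcesSketch and the inline XML string are made up for this example, and the code assumes that CakeCatalog.xml is in the current directory.

```jsharp
import System.IO.*;
import System.Xml.*;

public class ReaderSourcesSketch
{
    public static void main(String[] args)
    {
        // 1. A single String is treated as a URL (here, a relative file path)
        XmlTextReader fromUrl = new XmlTextReader("CakeCatalog.xml");

        // 2. A Stream; note that the FileStream itself opens the file immediately
        Stream stream = new FileStream("CakeCatalog.xml",
                                       FileMode.Open, FileAccess.Read);
        XmlTextReader fromStream = new XmlTextReader(stream);

        // 3. A TextReader; a StringReader wraps XML held in memory
        TextReader text = new StringReader("<CakeCatalog/>");
        XmlTextReader fromText = new XmlTextReader(text);

        // Release the underlying resources
        fromUrl.Close();
        fromStream.Close();
        fromText.Close();
    }
}
```

The StringReader form is convenient for small experiments because it needs no file on disk.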

The XML Document

To start looking at how you can use an XmlTextReader to process an XML file, consider the sample file CakeCatalog.xml. (Yes, we're heading toward an online cake store.)

CakeCatalog.xml

    <CakeCatalog>
      <CakeType style="Celebration" filling="sponge" shape="round">
        <Message>Happy Birthday</Message>
        <Description>One of our most popular cakes</Description>
        <Sizes>
          <Option value="10 inch"/>
          <Option value="12 inch"/>
          <Option value="14 inch"/>
        </Sizes>
      </CakeType>
      <CakeType style="Wedding" filling="sponge" shape="square">
        <Message/>
        <Description>A 3-tier creation to grace any ceremony</Description>
      </CakeType>
      <CakeType style="Wedding" filling="fruit" shape="round">
        <Message/>
        <Description>A heavier cake for hungrier guests</Description>
      </CakeType>
      <CakeType style="Christmas" filling="fruit" shape="square">
        <Message>Season's Greetings</Message>
        <Description>Spicy fruit cake for cold evenings</Description>
        <Sizes>
          <Option value="12 inch"/>
          <Option value="14 inch"/>
        </Sizes>
      </CakeType>
    </CakeCatalog>

As you can see, there are different CakeType elements, each defined with attributes representing the style, filling, and shape of the cake. Within each CakeType element is more information about the message displayed on the cake, a description of the cake, and any size options. As a first pass, let's examine how to get hold of the CakeType elements and list out their attributes.

Finding and Processing Elements and Attributes

Before you can process the document, you must create an XmlReader based on it:

    XmlReader reader = new XmlTextReader(args[0]);

Warning

You might expect this line to generate an exception if the specified file does not exist or does not contain well-formed XML. However, the source file (or stream) and its contents are checked only on the first call to read content from the XML document.
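A minimal sketch of this deferred check follows. It assumes that no file called NoSuchCatalog.xml exists in the current directory; both that file name and the class name LazyOpenSketch are hypothetical.

```jsharp
import System.*;
import System.IO.*;
import System.Xml.*;

public class LazyOpenSketch
{
    public static void main(String[] args)
    {
        // Hypothetical file name; the sketch assumes the file does not exist
        XmlReader reader = new XmlTextReader("NoSuchCatalog.xml");
        Console.WriteLine("Reader created without an exception");

        try
        {
            // The source is opened and checked only on this first read
            reader.Read();
        }
        catch (FileNotFoundException e)
        {
            Console.WriteLine("First Read failed: " + e.get_Message());
        }
        finally
        {
            reader.Close();
        }
    }
}
```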


Using an XmlReader, you can read nodes in sequence from the XML document. The XmlReader class exposes properties containing information about the current node, such as its name and value. If the current node is an element that has attributes or contains other XML or text, you can make calls to retrieve its attributes or contents. You can move the XmlReader on to the next node in the document by calling the Read method:

 reader.Read() 

The Read method returns a Boolean value indicating whether the operation was successful (true) or not (false). Initially, the XmlReader does not point to any node in the document. After the first call to Read, the XmlReader will refer to the first node in the document, which is typically the XML declaration. After each subsequent call to Read, the XmlReader will point to the next node in the document. When the current node changes, the exposed properties of the XmlReader will also change to reflect those of the current node. Remember, you cannot go back, so you must ensure that you've obtained all the information you need from a node before proceeding further.

When the XmlReader encounters the end of the document, a call to Read will return false. The following code fragment shows how you can list out the nodes in a document and display certain information about them, such as their type, local name, and namespace, along with their attribute and content information:

    while (reader.Read())
    {
        Console.WriteLine("---START NODE---");
        Console.WriteLine("Node type: " + reader.get_NodeType());
        Console.WriteLine("Name: " + reader.get_LocalName());
        Console.WriteLine("Namespace: " + reader.get_NamespaceURI());
        if (reader.get_HasAttributes())
        {
            Console.WriteLine("Has attributes: yes");
            Console.WriteLine("Num attributes: " + reader.get_AttributeCount());
        }
        else
        {
            Console.WriteLine("Has attributes: no");
        }
        if (reader.get_HasValue())
        {
            Console.WriteLine("Has value: yes");
            // Surround value with asterisks to delimit white space
            Console.WriteLine("Value: ***" + reader.get_Value() + "***");
        }
        else
        {
            Console.WriteLine("Has value: no");
        }
        Console.WriteLine("----END NODE----");
    }

The output produced by processing the CakeCatalog.xml document using the code shown above looks like this:

    ---START NODE---
    Node type: XmlDeclaration
    Name: xml
    Namespace:
    Has attributes: yes
    Num attributes: 2
    Has value: yes
    Value: ***version="1.0" encoding="utf-8"***
    ----END NODE----
    ---START NODE---
    Node type: Whitespace
    Name:
    Namespace:
    Has attributes: no
    Has value: yes
    Value: *** ***
    ----END NODE----
    ---START NODE---
    Node type: Element
    Name: CakeCatalog
    Namespace:
    Has attributes: no
    Has value: no
    ----END NODE----
    ---START NODE---
    Node type: Whitespace
    Name:
    Namespace:
    Has attributes: no
    Has value: yes
    Value: *** ***
    ----END NODE----
    ---START NODE---
    Node type: Element
    Name: CakeType
    Namespace:
    Has attributes: yes
    Num attributes: 3
    Has value: no
    ----END NODE----

Note

When performing read operations, the XmlReader does not consider the attributes of an element to be nodes in their own right. If you were to process the CakeCatalog.xml document using the code shown above, it would list out nodes of type Element, EndElement, XmlDeclaration, Whitespace, and Text. Attributes would not be listed. If you need to treat an attribute as a node, use the MoveToAttribute method to point the XmlReader at the given attribute.


You'll find the code to perform this simple listing of the CakeCatalog.xml file in the SimpleXmlReaderCatalogLister.jsl file in the SimpleXmlReaderCatalogLister sample project. This program takes the name of the XML document as a command-line parameter, creates an XmlTextReader for it, and performs the while loop shown previously to print out all of the nodes in the document.

If you were processing CakeCatalog.xml in an application, you would want to extract the CakeType elements and examine their attributes and contents. You'll find the code for extracting the different CakeType elements from CakeCatalog.xml in the XmlReaderCatalogLister.jsl sample file. This program takes the name of the XML document as a command-line parameter, creates an XmlTextReader for it, and calls the listCakes method. This method tests the type of each node encountered, ignoring any nonelements such as text or white space. To determine whether the current node is an element, the method retrieves the value of the XmlReader object's NodeType property and compares it to XmlNodeType.Element. Note that XmlNodeType is a .NET enumeration that contains a value for each type of node, so it has to be cast to a Java int to be used in a switch-case statement:

    switch ((int) reader.get_NodeType())
    {
    }

The name of the current element can then be checked to see whether it is CakeType. If so, the attributes can be retrieved and output:

    if (reader.get_Name().CompareTo("CakeType") == 0)
    {
    }

The name returned by get_Name is the qualified name of the element. As you saw earlier, you can retrieve the local name and namespace separately using get_LocalName and get_NamespaceURI.

The attribute information is extracted using the GetAttribute method. This is an overloaded method that allows you to access attributes by name or by index. Accessing attribute values does not change the current node.
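The sample file itself is not reproduced here. The following is only a sketch of what a listCakes method along these lines could look like; it uses a plain if test instead of the switch statement described above, and the class name CakeListerSketch is hypothetical. It shows GetAttribute being called both by name and by index.

```jsharp
import System.*;
import System.Xml.*;

public class CakeListerSketch
{
    // Print one line per CakeType element found in the document
    static void listCakes(XmlReader reader)
    {
        while (reader.Read())
        {
            // Ignore everything except CakeType start tags
            if (reader.get_NodeType() == XmlNodeType.Element &&
                reader.get_Name().CompareTo("CakeType") == 0)
            {
                // Access by name
                String shape = reader.GetAttribute("shape");
                String filling = reader.GetAttribute("filling");
                // Access by index; style is the first attribute
                // in the sample document
                String style = reader.GetAttribute(0);

                Console.WriteLine("A " + shape + ", " + filling +
                                  "-filled " + style + " cake");
            }
        }
    }

    public static void main(String[] args)
    {
        // The document name is taken from the command line
        XmlReader reader = new XmlTextReader(args[0]);
        listCakes(reader);
        reader.Close();
    }
}
```

Access by name is usually the safer choice, because access by index depends on the order in which the attributes appear in the document.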

The results of processing CakeCatalog.xml are shown here:

    A round, sponge-filled Celebration cake
    A square, sponge-filled Wedding cake
    A round, fruit-filled Wedding cake
    A square, fruit-filled Christmas cake

The two programs shown so far are fairly simple, but they introduce the basic processing model of the forward-only, noncached style:

  • Use a method call to change the current node. You can skip nodes or ask for nodes by name.

  • Retrieve the properties of the current node to access data and metadata.

As you progress through this chapter, you'll learn various techniques that improve on this style of processing.

Traversing Hierarchies and Reading Content

The document CakeCatalog.xml contains a lot more than just the CakeType elements. Each CakeType element contains a Message, a Description, and possibly a list of Sizes. To process these child elements, you have three choices:

  • Add some extra else if statements to the element handling case. The extra if statements will check for the child element names and write out a message based on their content or attributes. This would work reasonably well for our simple document but not for any form of processing that needs to know the context of the child node (what type of cake is currently being processed, for example). Indeed, if a Message element were added to the document outside of a CakeType element, this simple processing would merrily print out a message for it even though it is not an appropriate place to find this element.

  • Perform the same processing as described above but set flags as each element is encountered. You could, for example, set a Boolean flag called inCakeTypeElement to true when you encounter a CakeType element. All of the child element handling code would then verify that this flag is set to true before processing the child element. This is better, but it would mean that you'd be building a fairly complex state machine to process your document. In this case, the pull model has few advantages over the SAX push model.

  • Extend the code that handles the CakeType element so that it walks through subsequent nodes, discovering and processing the child elements as it goes. This approach does not rely on any saved context or a complex state machine.

The code in the ChildElementXmlReaderCatalogLister.jsl sample file illustrates how the children of a CakeType element can be processed using the last approach described. After the attributes of the CakeType element are displayed, the ReadStartElement method checks that the current node is still the CakeType element and then moves to the next node. ReadStartElement is a useful way of ensuring that you know where you are before proceeding further:

 reader.ReadStartElement("CakeType"); 

Depending on the amount of white space in the document, the next node might be the Message element or some white space. In many applications, the white space in the XML document will be irrelevant, so you'd want to skip over the white space and go right to the content. One way to do this is to use the MoveToContent method, which ignores any white space or comments and returns only when it finds the next node with meaningful content. MoveToContent returns the type of the node found (such as Element, EndElement, or some text). In the handling code for CakeType, you can check that the node found is an element and that its name is Message:

    XmlNodeType content = reader.MoveToContent();
    if (content == XmlNodeType.Element &&
        reader.get_Name().CompareTo("Message") == 0)
    {
        ...
    }

White space in an XML document is classified as significant or insignificant. Significant white space occurs in text content or where content is a mixture of text and elements. When you manipulate an XML document, you should usually preserve significant white space between the original document and the final document. Insignificant white space occurs, for example, between elements where tab and newline characters have been added for readability. This latter form of white space can be stripped out or ignored without affecting the contents of the document.

An alternative approach to avoiding insignificant white space is to set the WhitespaceHandling property on the XmlTextReader to WhitespaceHandling.Significant so that the insignificant white space between the tags is ignored:

    XmlTextReader reader = new XmlTextReader(args[0]);
    reader.set_WhitespaceHandling(WhitespaceHandling.Significant);

Note

In this case, the reader variable is of type XmlTextReader, and not the superclass XmlReader. The WhitespaceHandling property is not part of the superclass, so you cannot set it using a reference to XmlReader.


Limiting white space notification to only significant white space can safely be combined with the use of the MoveToContent method.

Returning to the sample code (once you've found the Message element), you can use the IsEmptyElement property to determine whether the Message element has content to be processed. A simple way to retrieve the text content of the Message element is to move on to the text node and retrieve its value:

    if (!reader.get_IsEmptyElement())
    {
        reader.ReadStartElement("Message");
        // Process the text
        Console.WriteLine(" Message: " + reader.get_Value());
        reader.Read();  // Move to the EndElement
    }

Once the text content has been retrieved, the XmlReader will still point at the text content node. If you issue a MoveToContent method call at this point, it will not move the position of the XmlReader because you're already on a content node (the text node). You must therefore move away from the text node to progress. Hence the call to Read after the content is retrieved; it moves the XmlReader on to the EndElement node.

If the node is a Text node or any other form of character data, an alternative to get_Value is the ReadString method. This method performs an explicit read operation, just like the simple Read method, so the call to this method moves the XmlReader on to the EndElement node. Therefore, there is no need for the additional Read method call as there is with get_Value. Making this change would improve the efficiency of the code, but this option is viable only for text-based nodes.
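The ReadString variant might look like the following sketch. It reads a single Message element from an in-memory string so that it can run on its own; the class name ReadStringSketch is hypothetical, and this is not the code from the book's sample files.

```jsharp
import System.*;
import System.IO.*;
import System.Xml.*;

public class ReadStringSketch
{
    public static void main(String[] args)
    {
        XmlTextReader reader = new XmlTextReader(
            new StringReader("<Message>Happy Birthday</Message>"));

        reader.MoveToContent();                  // Now on the Message start tag
        if (!reader.get_IsEmptyElement())
        {
            reader.ReadStartElement("Message");  // Now on the text node
            // ReadString returns the text and moves the reader forward,
            // so no separate Read call is needed afterward
            Console.WriteLine(" Message: " + reader.ReadString());
            // Show where the reader has ended up
            Console.WriteLine(" Current node type: " + reader.get_NodeType());
        }
        reader.Close();
    }
}
```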

The next step is to issue another Read before the MoveToContent call. If text content is present in the Message element, doing this will move beyond the EndElement node. This move is necessary because an EndElement counts as content as far as MoveToContent is concerned. If there is no content in the element, this Read call will move beyond the Message element (on to white space or the Description element):

    // Move to what should be the Description element
    reader.Read();
    content = reader.MoveToContent();

You can then perform the same testing and processing of the Description element as you did on the Message element.

If the CakeType element has a Sizes child element, the Sizes child element will contain a set of Option elements. You should traverse each of these Option elements, retrieving and displaying the value (the size of the cake). Again, you must issue a Read call to move away from the Sizes element, and you can then loop through the Option elements, calling GetAttribute on each of them to retrieve the value.
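One possible shape for that loop is sketched below. It works on a Sizes fragment held in a string so that it is self-contained; the class name SizesLoopSketch and the output wording are assumptions made for this example.

```jsharp
import System.*;
import System.IO.*;
import System.Xml.*;

public class SizesLoopSketch
{
    public static void main(String[] args)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(
            "<Sizes><Option value=\"10 inch\"/>" +
            "<Option value=\"12 inch\"/></Sizes>"));

        XmlNodeType content = reader.MoveToContent();
        if (content == XmlNodeType.Element &&
            reader.get_Name().CompareTo("Sizes") == 0)
        {
            reader.Read();                      // Move away from the Sizes start tag
            content = reader.MoveToContent();

            // Keep going for as long as the next content node is an Option
            while (content == XmlNodeType.Element &&
                   reader.get_Name().CompareTo("Option") == 0)
            {
                Console.WriteLine("  Size option: " +
                                  reader.GetAttribute("value"));
                reader.Read();                  // Move past this empty Option
                content = reader.MoveToContent();
            }
            // The loop ends when MoveToContent reports something other than
            // an Option element, here the Sizes end tag
        }
        reader.Close();
    }
}
```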

Other Options for Reading and Navigation

You've seen how to navigate and retrieve content in a straightforward way using Read, MoveToContent, and get_Value. Other methods are available that can help to simplify code that handles particular types of nodes.

When you read the content of a simple element that contains only a text string, you can execute the ReadElementString method. This method returns the text content of the element and moves the XmlReader onto the next node after the EndElement . This approach simplifies the message and description-reading code of the previous example:

    if (content == XmlNodeType.Element &&
        reader.get_Name().CompareTo("Message") == 0)
    {
        if (!reader.get_IsEmptyElement())
        {
            Console.WriteLine(" Message: " + reader.ReadElementString());
        }
        else
        {
            Console.WriteLine(" Empty message");
            reader.Skip();
        }
    }

    // Move to what should be the "Description" element
    content = reader.MoveToContent();
    if (content == XmlNodeType.Element &&
        reader.get_Name().CompareTo("Description") == 0)
    {
        if (!reader.get_IsEmptyElement())
        {
            Console.WriteLine(" Description: " + reader.ReadElementString());
        }
        else
        {
            Console.WriteLine(" Empty description");
            reader.Skip();
        }
    }

As you can see, the use of ReadElementString removes the need to Read to the EndElement. Also, no extra read is required before the MoveToContent that takes the XmlReader on to the Description element. What is required, however, is an extra line of code in the else condition, where the element is empty. In this case, the Skip method is used to move the XmlReader beyond the empty element. This leaves the XmlReader in the right position for the next call to MoveToContent. You can use Skip anywhere that you would use the Read method; the difference is that Skip moves past the current element, including any children it has, without stopping at them. The code for this form of the application can be found in the ReadElementStringXmlReaderCatalogLister.jsl sample file.

Another option is to use the ReadInnerXml method to retrieve the contents of the element, as in this example:

    Console.WriteLine(" Description: " + reader.ReadInnerXml());

In the case of a simple text-containing element, the call to ReadInnerXml has the same effect as ReadElementString. If the XML contained in the current element is more complex, the XML content is returned as a single string. You can use ReadInnerXml on elements or attributes. In the case of an element, the returned string represents all of the children of the current element, including any markup.

There is also the method ReadOuterXml, which returns a string representing the current node (and all of its children if the node is an element), including any markup. For example, if the XmlReader object points at the first Message element in the document, ReadInnerXml will return "Happy Birthday" and ReadOuterXml will return "<Message>Happy Birthday</Message>".
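The difference can be tried out with a sketch like the following, which reads the same Message element twice from an in-memory string. Two readers are used because each call consumes the element; the class name InnerOuterSketch is hypothetical.

```jsharp
import System.*;
import System.IO.*;
import System.Xml.*;

public class InnerOuterSketch
{
    public static void main(String[] args)
    {
        String xml = "<Message>Happy Birthday</Message>";

        // First reader: contents only
        XmlTextReader inner = new XmlTextReader(new StringReader(xml));
        inner.MoveToContent();      // Position on the Message start tag
        Console.WriteLine("Inner: " + inner.ReadInnerXml());
        inner.Close();

        // Second reader: the element together with its own markup
        XmlTextReader outer = new XmlTextReader(new StringReader(xml));
        outer.MoveToContent();      // Position on the Message start tag
        Console.WriteLine("Outer: " + outer.ReadOuterXml());
        outer.Close();
    }
}
```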

The XmlReader object allows you to navigate through attributes in a similar way to navigating through elements. Rather than just call GetAttribute, you can use MoveToFirstAttribute, MoveToNextAttribute, and MoveToAttribute to position the XmlReader on one of the attribute nodes of the current start element tag. When you point at an attribute, you can use all of the applicable methods and properties of the XmlReader object. The MoveToElement method repositions the XmlReader to point at the element to which the current attribute belongs.
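A minimal sketch of attribute navigation, assuming a single CakeType element supplied as a string (the class name AttributeWalkSketch is hypothetical):

```jsharp
import System.*;
import System.IO.*;
import System.Xml.*;

public class AttributeWalkSketch
{
    public static void main(String[] args)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(
            "<CakeType style=\"Wedding\" filling=\"fruit\" shape=\"round\"/>"));

        reader.MoveToContent();     // Position on the CakeType start tag

        // MoveToFirstAttribute returns false if there are no attributes
        if (reader.MoveToFirstAttribute())
        {
            do
            {
                // While on an attribute, the usual properties describe it
                Console.WriteLine(reader.get_Name() + " = " +
                                  reader.get_Value());
            }
            while (reader.MoveToNextAttribute());

            // Return to the element that owns the attributes
            reader.MoveToElement();
        }
        reader.Close();
    }
}
```

This form is useful when you do not know the attribute names in advance.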

Types and Namespaces

So far, you've encountered text strings as attribute and element values. In addition to strings, XML documents will regularly contain nontext values such as integer or floating point values, although these must be encoded as strings within the document. The XmlConvert class provides a convenient way of converting common XML types into .NET Framework types.

Consider a scenario in which the cake size information is held as a number:

    <Option sizeInInches="12"/>

You can use XmlConvert to convert this into a runtime type:

    String sizeStr = reader.GetAttribute("sizeInInches");
    Console.WriteLine(" Size option: " + sizeStr + " inch");
    int feedsApprox = XmlConvert.ToInt32(sizeStr) * 2;
    Console.WriteLine(" This will feed " + feedsApprox + " people (approx)");

You can also use the XmlConvert class to create and interpret XML-compliant names. The XML standard sets out characters that are forbidden in names in an XML document. The methods EncodeName and EncodeLocalName convert a J# string into an XML-compliant name; DecodeName performs the reverse conversion.

The difference between EncodeName and EncodeLocalName is in the way they handle colons. This is important because the colon delimits namespace information in an XML-compliant name. All of the names we've looked at so far are simple names that do not include a namespace, so the local name and the qualified name have been the same. If you use namespaces, the code must change to handle this extra information.
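The following sketch shows the difference on two made-up names. The class name NameEncodingSketch and the sample names are hypothetical; the comments describe the escaped forms as XmlConvert is documented to produce them.

```jsharp
import System.*;
import System.Xml.*;

public class NameEncodingSketch
{
    public static void main(String[] args)
    {
        // A space is not allowed in an XML name, so it is escaped
        // (as _x0020_)
        String encoded = XmlConvert.EncodeName("Cake Type");
        Console.WriteLine(encoded);

        // EncodeName leaves the colon alone, so a prefix survives
        Console.WriteLine(XmlConvert.EncodeName("cakes:CakeType"));

        // EncodeLocalName also escapes the colon (as _x003A_)
        Console.WriteLine(XmlConvert.EncodeLocalName("cakes:CakeType"));

        // DecodeName reverses the escaping
        Console.WriteLine(XmlConvert.DecodeName(encoded));
    }
}
```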

Consider a version of the cake catalog that uses namespaces. This cake catalog is issued by the fictional Fourth Coffee company (whose URL is http://www.fourthcoffee.com). The URI for the namespace is based on Fourth Coffee's URL, and the document associates it with the prefix cakes:

    <cakes:CakeCatalog xmlns:cakes="http://www.fourthcoffee.com/xmlcakes">
      <cakes:CakeType cakes:style="Celebration" cakes:filling="sponge"
                      cakes:shape="round">
        <cakes:Message>Happy Birthday</cakes:Message>

The prefix might change from document to document, so you should use the namespace-aware forms of methods to navigate through the document. To see what impact the use of namespaces will have, consider how it would change the first few lines of the CakeType handling:

    String cakeNamespace = "http://www.fourthcoffee.com/xmlcakes";
    if (reader.get_LocalName().CompareTo("CakeType") == 0 &&
        reader.get_NamespaceURI().CompareTo(cakeNamespace) == 0)
    {
        Console.WriteLine("A " +
            reader.GetAttribute("shape", cakeNamespace) + ", " +
            reader.GetAttribute("filling", cakeNamespace) + "-filled " +
            reader.GetAttribute("style", cakeNamespace) + " cake");

        // Move to what should be the "Message" element
        reader.ReadStartElement("CakeType", cakeNamespace);
        XmlNodeType content = reader.MoveToContent();

The initial if test now makes sure that the LocalName of the element (CakeType) and the associated namespace (http://www.fourthcoffee.com/xmlcakes) are correct before proceeding. Using the get_Name method at this point would return cakes:CakeType. It is still possible for you to use the get_Name method and to look for names with a given prefix, in this case cakes:. However, you would have to have already ensured that the correct namespace was associated with cakes for this code to work correctly. Generally, you're better off sticking with the forms of methods that provide explicit namespace information.

The namespace-aware form of the GetAttribute method is used to retrieve the attribute values because these, too, are namespace-qualified in the XML document. You should also use the namespace-aware versions of any method to which you pass an element or attribute name, such as the ReadStartElement method. The code for processing a version of the cake catalog that uses namespaces (CakeCatalogNS.xml) can be found in the NamespaceXmlReaderCatalogLister.jsl sample file.

Exception Handling

While processing an XML document, you might encounter errors. Some of these might relate to the underlying source of the XML, such as an error when reading data from a file or stream. They will generate exceptions unrelated to XML, such as FileNotFoundException, and should be handled appropriately.

If there is a problem with the XML document itself, an XmlException will be raised. For example, an exception will be raised if the document is not well formed or if you request a node with a particular name, using a method such as ReadStartElement, and the current node is not an element with that name.

Note

As with any other .NET exception, XmlException is not a Java language "checked" exception, so you must remember to add an appropriate try/catch block.
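A minimal sketch of such a try/catch block follows. The input is a deliberately malformed fragment (the CakeType element is never closed) supplied as a string, and the class name XmlErrorSketch is hypothetical.

```jsharp
import System.*;
import System.IO.*;
import System.Xml.*;

public class XmlErrorSketch
{
    public static void main(String[] args)
    {
        // Not well formed: CakeType has no matching end tag
        XmlTextReader reader = new XmlTextReader(new StringReader(
            "<CakeCatalog><CakeType></CakeCatalog>"));
        try
        {
            while (reader.Read())
            {
                // Node processing would go here
            }
        }
        catch (XmlException e)
        {
            // XmlException reports where in the document the problem was found
            Console.WriteLine("XML error at line " + e.get_LineNumber() +
                              ", position " + e.get_LinePosition() +
                              ": " + e.get_Message());
        }
        finally
        {
            reader.Close();
        }
    }
}
```

When the reader is created from a file name or URL, a second catch block for the I/O exceptions mentioned above would normally sit alongside this one.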


Writing XML Documents Using the XmlWriter Class

The SAX API was developed because the preexisting DOM model of processing was too cumbersome for some applications that simply needed to parse XML documents. However, SAX is a read-only mechanism, so to output an XML document in the SAX world you could employ DOM to create an in-memory DOM tree and then write it out, or you could use a lot of println statements. The former can be too memory-intensive and unwieldy, and the latter is messy and prone to errors. What you need is a lightweight equivalent of the XmlReader that makes it easier for applications to write out XML documents.

The XmlWriter class in the System.Xml namespace provides what you might think of as a reverse-SAX mechanism. XmlWriter is an abstract class that contains a set of methods that allow you to write out individual pieces of an XML document, such as elements and attributes. The only subclass of XmlWriter is XmlTextWriter. XmlTextWriter contains additional properties, such as those that allow you to configure the output formatting of the document.

To see how a writer works, consider the operations required to write out a cake catalog with a single cake type. The cake type will have a message, description, and size options. All output should have the appropriate namespace information associated with it. For example, the output can look like this:

    <?xml version="1.0" encoding="utf-8"?>
    <cakes:CakeCatalog xmlns:cakes="http://www.fourthcoffee.com/xmlcakes">
      <cakes:CakeType cakes:style="Celebration" cakes:filling="sponge"
                      cakes:shape="square">
        <cakes:Message>Congratulations!</cakes:Message>
        <cakes:Description>General achievement</cakes:Description>
        <cakes:Sizes>
          <cakes:Option cakes:sizeInInches="10" />
          <cakes:Option cakes:sizeInInches="12" />
        </cakes:Sizes>
      </cakes:CakeType>
    </cakes:CakeCatalog>

The first step is to create an XmlWriter object. As with the sources for an XmlReader, you can specify multiple types of destination for an XmlWriter:

  • A filename passed as a string

  • A System.IO.Stream

  • A System.IO.TextWriter

In the first two cases, you must also specify an encoding that will be used when writing the output document. The XML declaration that is output will contain this encoding as an attribute. The System.Text.Encoding abstract class has a set of encoding-specific subclasses, one of which is UTF8Encoding. It provides a standard Unicode-compliant, 8-bit encoding for the output document. You can use this to create a new XmlTextWriter:

    XmlTextWriter writer = new XmlTextWriter("catalog.xml", new UTF8Encoding());

At this stage, nothing has been written to the document. Before you start to output the individual parts of the document, you should consider the formatting required. If you do not specify any formatting, the XmlWriter will output the XML document without any white space (that is, new lines or indentation). This is fine if the document is to be processed by another application, but it will be awkward for humans to read. To make the document friendlier, set the formatting to System.Xml.Formatting.Indented:

 writer.set_Formatting(Formatting.Indented); 

The XmlTextWriter also has properties you can set to define the character to be used for indenting (IndentChar; by default, a space) and the number of indenting characters for each level of nesting (Indentation; by default, 2).
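As a small illustration, the following sketch changes both settings and writes a tiny document to the console so that the effect is visible. The element content is made up and no namespaces are used; the class name IndentSketch is hypothetical.

```jsharp
import System.*;
import System.Xml.*;

public class IndentSketch
{
    public static void main(String[] args)
    {
        // Console.Out is a TextWriter, so it can act as the destination
        XmlTextWriter writer = new XmlTextWriter(Console.get_Out());

        writer.set_Formatting(Formatting.Indented);
        writer.set_IndentChar('\t');   // Indent with tabs instead of spaces
        writer.set_Indentation(1);     // One tab per level of nesting

        writer.WriteStartElement("CakeCatalog");
        writer.WriteStartElement("CakeType");
        writer.WriteElementString("Message", "Congratulations!");
        writer.WriteEndElement();      // CakeType
        writer.WriteEndElement();      // CakeCatalog

        // Flush rather than Close, so that the console stream stays open
        writer.Flush();
    }
}
```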

Because you are creating a well-formed XML document, you must write out the XML declaration, followed by the root element:

    writer.WriteStartDocument();
    String cakeNamespace = "http://www.fourthcoffee.com/xmlcakes";
    writer.WriteStartElement("cakes", "CakeCatalog", cakeNamespace);
    ...
    writer.WriteEndElement();  // CakeCatalog

The WriteStartElement method generates a start element tag with the given local name, namespace, and namespace prefix (if specified). This method has various overloaded forms. In this case, the call will create a CakeCatalog element with a declaration of the Fourth Coffee namespace associated with the cakes prefix. If you need to add additional namespace attributes, you can add them in the same way as any other attribute (as you'll see in a moment).

Because this information is being written as a sequential stream, at this stage only the start tag (and its attributes) will be in the output buffer. At some point before the end of the document, you should write out the associated end element using the WriteEndElement method.

Note

In some cases, you can get away with not explicitly writing end elements. The code that handles start elements and other "major" changes, such as the end of the document, will automatically close any open elements for you. However, you must be careful to ensure that each element is at the correct level of nesting to take advantage of this feature. It is safest to write the end elements explicitly.


The next task is to write out the CakeType element and its attributes. The WriteStartElement method is used again, but this time no namespace prefix is required. Given the local name and the namespace, the XmlWriter can work out which namespace prefix should be used from the namespaces in scope. If you need to discover the namespace prefix associated with a namespace, use the LookupPrefix method. The CakeType element requires that the attributes style, filling, and shape be added to it and that their values be defined. You can use the WriteAttributeString method to add an attribute to an element and to define its value. Again, the namespace is provided and the writer will work out the appropriate prefix:

    writer.WriteStartElement("CakeType", cakeNamespace);
    writer.WriteAttributeString("style", cakeNamespace, "Celebration");
    writer.WriteAttributeString("filling", cakeNamespace, "sponge");
    writer.WriteAttributeString("shape", cakeNamespace, "square");

All of the attributes are attached to the current start element tag. If the attribute value being generated is complex, you can use the sequence of method calls WriteStartAttribute, WriteString, and WriteEndAttribute to take more control of attribute writing.
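A sketch of that three-call sequence follows. It builds the value of a hypothetical value attribute from two pieces, a number and a unit, and writes to a StringWriter so that the result can be printed; the class name AttributePartsSketch is also an assumption.

```jsharp
import System.*;
import System.IO.*;
import System.Xml.*;

public class AttributePartsSketch
{
    public static void main(String[] args)
    {
        String cakeNamespace = "http://www.fourthcoffee.com/xmlcakes";
        StringWriter buffer = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(buffer);

        writer.WriteStartElement("cakes", "Option", cakeNamespace);

        // Open the attribute, add its value piece by piece, then close it
        writer.WriteStartAttribute("value", cakeNamespace);
        writer.WriteString(XmlConvert.ToString(10));
        writer.WriteString(" inch");
        writer.WriteEndAttribute();

        writer.WriteEndElement();      // Option
        writer.Close();

        Console.WriteLine(buffer.ToString());
    }
}
```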

The Message and Description elements are simple elements with text content and no attributes. To write these, you use the WriteElementString method, as in these two examples:

    writer.WriteElementString("Message", cakeNamespace, "Congratulations!");
    writer.WriteElementString("Description", cakeNamespace,
                              "General achievement");

As with attributes, if the text content of an element is complex, you can use the sequence of method calls WriteStartElement, WriteString, and WriteEndElement to generate it.

The Sizes and Option elements can be written using methods you've already seen. Once you have output an end element for each of your start elements, you should end the document and close it:

    writer.WriteEndDocument();
    writer.Close();

The document is not guaranteed to be written to the underlying file until Close is called, because the XmlTextWriter class uses buffered output for optimization. At any point, you can write out the part of the document created so far using the Flush method. If you're creating a large document, you should flush the output buffer regularly to avoid taking up too much memory. (The size of a document for which you need to do this will vary depending on how much memory your system has, but certainly most applications would want to flush partial documents exceeding 500 KB.) You'll find the full code for writing the desired XML document in the book's XmlWriterCatalogWriter.jsl sample file.
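The sample file is not reproduced here. As a rough sketch of how the pieces described in this section might fit together, including the Sizes and Option elements, the following program aims to produce a document like the one shown earlier. The class name CatalogWriterSketch and the sizes array are assumptions made for this example.

```jsharp
import System.Text.*;
import System.Xml.*;

public class CatalogWriterSketch
{
    public static void main(String[] args)
    {
        String cakeNamespace = "http://www.fourthcoffee.com/xmlcakes";
        XmlTextWriter writer =
            new XmlTextWriter("catalog.xml", new UTF8Encoding());
        writer.set_Formatting(Formatting.Indented);

        writer.WriteStartDocument();
        writer.WriteStartElement("cakes", "CakeCatalog", cakeNamespace);

        writer.WriteStartElement("CakeType", cakeNamespace);
        writer.WriteAttributeString("style", cakeNamespace, "Celebration");
        writer.WriteAttributeString("filling", cakeNamespace, "sponge");
        writer.WriteAttributeString("shape", cakeNamespace, "square");

        writer.WriteElementString("Message", cakeNamespace,
                                  "Congratulations!");
        writer.WriteElementString("Description", cakeNamespace,
                                  "General achievement");

        // Sizes holds one empty Option element per available size
        int[] sizes = { 10, 12 };
        writer.WriteStartElement("Sizes", cakeNamespace);
        for (int i = 0; i < sizes.length; i++)
        {
            writer.WriteStartElement("Option", cakeNamespace);
            writer.WriteAttributeString("sizeInInches", cakeNamespace,
                                        XmlConvert.ToString(sizes[i]));
            writer.WriteEndElement();  // Option
        }
        writer.WriteEndElement();      // Sizes

        // Optional here; in a large document you would flush periodically
        writer.Flush();

        writer.WriteEndElement();      // CakeType
        writer.WriteEndElement();      // CakeCatalog
        writer.WriteEndDocument();
        writer.Close();
    }
}
```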

Escaping and Copying When Writing

Certain characters are not allowed in XML documents. Characters that correspond to standard escaped entities, such as the ampersand (&), will automatically be converted into their entity equivalent (in this case, &amp;), whether they occur in text content or as part of an element or attribute name. However, other characters that might occur as part of a name, such as the colon (:), will not be escaped. In this case, you should use the XmlConvert class to encode the name of the element or attribute before writing it out.

If your application is acting as a filter (that is, it is taking XML input and performing targeted transformations), you have several useful copying methods available in the XmlWriter class. For example, the WriteAttributes method will copy all of the attribute values from the node pointed to by an XmlReader into the current element being written by the XmlWriter:

    writer.WriteStartElement("cakes", "CakeCatalog", cakeNamespace);
    writer.WriteAttributes(reader, false);

The boolean flag parameter to WriteAttributes determines whether default attribute values are copied.
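A very small filter along these lines is sketched below. It renames a CakeType element to Cake while copying its attributes unchanged; the new element name, the inline input, and the class name AttributeCopySketch are all made up for this example, and namespaces are left out to keep it short.

```jsharp
import System.*;
import System.IO.*;
import System.Xml.*;

public class AttributeCopySketch
{
    public static void main(String[] args)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(
            "<CakeType style=\"Wedding\" filling=\"fruit\" shape=\"round\"/>"));
        StringWriter buffer = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(buffer);

        reader.MoveToContent();                 // Position on the CakeType start tag

        writer.WriteStartElement("Cake");       // Hypothetical new element name
        writer.WriteAttributes(reader, false);  // Copy style, filling, and shape
        writer.WriteEndElement();               // Cake

        writer.Close();
        reader.Close();

        Console.WriteLine(buffer.ToString());
    }
}
```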


Microsoft Visual J# .NET (Core Reference)
Microsoft Visual J# .NET (Core Reference) (Pro-Developer)
ISBN: 0735615500
EAN: 2147483647
Year: 2002
Pages: 128
