Transforming XML in .NET Applications


You might have many reasons for needing to transform XML in one dialect to XML in another dialect in your application. For example, you might be part of a supply chain in which orders are passed in XML format from customers to suppliers. Or you might have dozens of files containing medical records that must be converted to a common format. In such situations, you must know what conversion is required, obtain an XSLT stylesheet to perform this transformation, and then apply that transformation programmatically using the capabilities provided by the .NET Framework.

The Need for Transformation

In Chapter 5, you saw how the catalog information for the cakes sold by the Fourth Coffee company could be encoded in XML. One form of this catalog is shown in the CakeCatalog.xml sample file, shown here:

CakeCatalog.xml
<?xml version="1.0" encoding="utf-8" ?>
<CakeCatalog>
  <CakeType style="Celebration" filling="sponge" shape="round">
    <Message>Happy Birthday</Message>
    <Description>One of our most popular cakes</Description>
  </CakeType>
  <CakeType style="Wedding" filling="sponge" shape="square">
    <Message/>
    <Description>A 3-tier creation to grace any ceremony</Description>
  </CakeType>
  <CakeType style="Wedding" filling="fruit" shape="round">
    <Message/>
    <Description>A heavier cake for hungrier guests</Description>
  </CakeType>
  <CakeType style="Christmas" filling="fruit" shape="square">
    <Message>Season's Greetings</Message>
    <Description>Spicy fruit cake for cold evenings</Description>
  </CakeType>
</CakeCatalog>

Imagine that this catalog information will form part of a larger catalog, perhaps because another firm wants to offer these cakes as part of its own catalog. For this integration to be seamless, the cake catalog must be converted into a more generic form such as that shown in the TxCatalog.xml sample file here:

TxCatalog.xml
<?xml version="1.0" encoding="utf-8" ?>
<Catalog type="Cake" vendor="FourthCoffee">
  <Entry>
    <EntryElement type="CakeStyle" value="Celebration" />
    <EntryElement type="Filling" value="sponge" />
    <EntryElement type="Shape" value="round" />
    <EntryElement type="Message" value="TEXT_CONTENT">Happy Birthday</EntryElement>
    <EntryElement type="Description" value="TEXT_CONTENT">One of our most popular cakes</EntryElement>
  </Entry>
  <Entry>
    <EntryElement type="CakeStyle" value="Wedding" />
    <EntryElement type="Filling" value="sponge" />
    <EntryElement type="Shape" value="square" />
    <EntryElement type="Message" value="TEXT_CONTENT" />
    <EntryElement type="Description" value="TEXT_CONTENT">A 3-tier creation to grace any ceremony</EntryElement>
  </Entry>
  <Entry>
    <EntryElement type="CakeStyle" value="Wedding" />
    <EntryElement type="Filling" value="fruit" />
    <EntryElement type="Shape" value="round" />
    <EntryElement type="Message" value="TEXT_CONTENT" />
    <EntryElement type="Description" value="TEXT_CONTENT">A heavier cake for hungrier guests</EntryElement>
  </Entry>
  <Entry>
    <EntryElement type="CakeStyle" value="Christmas" />
    <EntryElement type="Filling" value="fruit" />
    <EntryElement type="Shape" value="square" />
    <EntryElement type="Message" value="TEXT_CONTENT">Season's Greetings</EntryElement>
    <EntryElement type="Description" value="TEXT_CONTENT">Spicy fruit cake for cold evenings</EntryElement>
  </Entry>
</Catalog>

As you can see, the format of TxCatalog.xml is far more general than that of CakeCatalog.xml. Both documents contain the same information, but the structure is different. You saw in the previous chapter how you can access parts of an XML document and obtain information from it. You also saw how you can create a new XML document using the same APIs. Hence, you can perform such a conversion between these two document structures programmatically using a combination of XmlReader and XmlWriter or using the DOM features of an XmlDocument. However, such a conversion would require document-specific and potentially complex code.
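To make the cost of the hand-coded route concrete, here is a minimal sketch of a DOM-based conversion. It is written in Java against the JDK's org.w3c.dom API as a stand-in for the .NET XmlDocument classes the chapter refers to, so treat it as an illustration rather than the book's code. The class name ManualCatalogConverter and its helper methods are hypothetical, and the vendor value is copied from the TxCatalog.xml listing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: converts the CakeCatalog structure to the generic
// Catalog structure by hand. Every element and attribute name is hard-wired.
class ManualCatalogConverter {

    static String convert(String cakeCatalogXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(cakeCatalogXml)));
            Document target = builder.newDocument();

            Element catalog = target.createElement("Catalog");
            catalog.setAttribute("type", "Cake");
            catalog.setAttribute("vendor", "FourthCoffee");
            target.appendChild(catalog);

            NodeList cakeTypes = source.getElementsByTagName("CakeType");
            for (int i = 0; i < cakeTypes.getLength(); i++) {
                Element cake = (Element) cakeTypes.item(i);
                Element entry = target.createElement("Entry");
                catalog.appendChild(entry);
                entry.appendChild(attributeEntry(target, "CakeStyle", cake.getAttribute("style")));
                entry.appendChild(attributeEntry(target, "Filling", cake.getAttribute("filling")));
                entry.appendChild(attributeEntry(target, "Shape", cake.getAttribute("shape")));
                entry.appendChild(textEntry(target, cake, "Message"));
                entry.appendChild(textEntry(target, cake, "Description"));
            }

            // Serialize the new DOM tree to a string (XML declaration omitted).
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(target), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Conversion failed", e);
        }
    }

    // Builds <EntryElement type="..." value="..."/> from a source attribute value.
    static Element attributeEntry(Document target, String type, String value) {
        Element entry = target.createElement("EntryElement");
        entry.setAttribute("type", type);
        entry.setAttribute("value", value);
        return entry;
    }

    // Builds <EntryElement type="..." value="TEXT_CONTENT">text</EntryElement>
    // from the named child element of a CakeType; empty children stay empty.
    static Element textEntry(Document target, Element cake, String name) {
        Element entry = target.createElement("EntryElement");
        entry.setAttribute("type", name);
        entry.setAttribute("value", "TEXT_CONTENT");
        Element child = (Element) cake.getElementsByTagName(name).item(0);
        if (child != null && !child.getTextContent().isEmpty()) {
            entry.appendChild(target.createTextNode(child.getTextContent()));
        }
        return entry;
    }

    public static void main(String[] args) {
        String cakes = "<CakeCatalog><CakeType style='Wedding' filling='fruit' shape='round'>"
                + "<Message/><Description>A heavier cake for hungrier guests</Description>"
                + "</CakeType></CakeCatalog>";
        System.out.println(convert(cakes));
    }
}
```

Even for this small document, the code has to know the name of every element and attribute involved, and any change to either format means changing and redeploying the code.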

The XSLT language and processing model is designed to aid transformation between XML dialects, so you should use it where transformation is required unless there is a compelling reason not to do so, for example, when the type of transformation you require is difficult to code in XSLT.

The XSLT Processing Model

XSLT stylesheets are themselves XML documents, so the basic XSLT processing model involves three documents:

  • The source XML document that contains the XML to be transformed

  • The XSLT document that specifies the transformation required

  • The target XML document that will contain the results of the transformation

The source XML document and XSLT document are read and parsed by the XSLT processor, as shown in Figure 6-1. The XSLT transformation is applied to the source document to generate the target XML document.

Note

The target document does not have to be in XML format; it can be plain text. However, for the purposes of this chapter we will concentrate on the generation of XML documents.
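As a brief illustration of non-XML output, the sketch below uses a stylesheet with xsl:output method="text" to turn the cake catalog into comma-separated lines. It runs on the JDK's javax.xml.transform API rather than the .NET classes covered in this chapter. The class name TextOutputDemo and the stylesheet itself are assumptions made for this example and are not part of the book's sample files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: produces plain text, not XML, from the catalog.
class TextOutputDemo {

    // Illustrative stylesheet: one "style,filling,shape" line per CakeType.
    static final String STYLESHEET =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='CakeCatalog/CakeType'>"
        + "<xsl:value-of select='@style'/><xsl:text>,</xsl:text>"
        + "<xsl:value-of select='@filling'/><xsl:text>,</xsl:text>"
        + "<xsl:value-of select='@shape'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String toText(String cakeCatalogXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(cakeCatalogXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String cakes = "<CakeCatalog>"
                + "<CakeType style='Celebration' filling='sponge' shape='round'/>"
                + "<CakeType style='Wedding' filling='sponge' shape='square'/>"
                + "</CakeCatalog>";
        System.out.print(toText(cakes));
    }
}
```

Because the output method is text, the processor writes no tags and no XML declaration; only the selected attribute values and the literal separators reach the output.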


Figure 6-1. An XSLT processor transforms a source XML document based on an XSLT stylesheet.

XSLT is a declarative rather than a procedural language, so it can seem strange at first. An XSLT stylesheet consists of a set of rules that tell the processor what to output when it finds particular types of nodes or combinations of nodes in the source document. These rules are defined using XSLT templates. Each template consists of the following:

  • An XPath expression that identifies a particular part of the source document to transform. XPath allows you to select attributes, elements, or other content by providing a pattern that describes a path into the structure of the source document based on such information as its location in the document or the name of a particular element or attribute.

  • A rule dictating what should be output when that particular pattern is found in the source document.
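To give a feel for the XPath side on its own, the following sketch evaluates a few expressions against a cut-down cake catalog. It uses the JDK's javax.xml.xpath API as a stand-in for the .NET XPath classes, and the class name XPathDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates an XPath expression and returns
// its result converted to a string.
class XPathDemo {

    static String evaluate(String expression, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String cakes = "<CakeCatalog>"
                + "<CakeType style='Celebration' filling='sponge' shape='round'>"
                + "<Message>Happy Birthday</Message></CakeType>"
                + "<CakeType style='Wedding' filling='fruit' shape='round'>"
                + "<Message/></CakeType>"
                + "<CakeType style='Christmas' filling='fruit' shape='square'>"
                + "<Message>Season's Greetings</Message></CakeType>"
                + "</CakeCatalog>";

        // Select by location: the style attribute of the first CakeType.
        System.out.println(evaluate("/CakeCatalog/CakeType[1]/@style", cakes));

        // Select by attribute value: the message of the Christmas cake.
        System.out.println(evaluate("/CakeCatalog/CakeType[@style='Christmas']/Message", cakes));

        // Count the elements whose filling attribute is 'fruit'.
        System.out.println(evaluate("count(//CakeType[@filling='fruit'])", cakes));
    }
}
```

The same kinds of expressions appear in a stylesheet, both in the match attribute of a template and in attribute value templates such as {@style}.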

We cannot go into all aspects of XSLT syntax at this point. If you need to investigate XSLT and XPath syntax in more detail, see Michael Kay's XSLT Programmer's Reference 2nd Edition (Wrox Press, 2001). However, to get an idea of what you can do, you can examine the sample file CatalogTransform.xsl, which defines an XSLT stylesheet that will transform CakeCatalog.xml into TxCatalog.xml:

CatalogTransform.xsl
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" />
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="CakeCatalog">
    <Catalog type="Cake" vendor="FourthCoffee">
      <xsl:apply-templates/>
    </Catalog>
  </xsl:template>
  <xsl:template match="CakeType">
    <Entry>
      <EntryElement type="CakeStyle" value="{@style}" />
      <EntryElement type="Filling" value="{@filling}" />
      <EntryElement type="Shape" value="{@shape}" />
      <xsl:apply-templates/>
    </Entry>
  </xsl:template>
  <xsl:template match="Message">
    <EntryElement type="Message" value="TEXT_CONTENT">
      <xsl:apply-templates/>
    </EntryElement>
  </xsl:template>
  <xsl:template match="Description">
    <EntryElement type="Description" value="TEXT_CONTENT">
      <xsl:apply-templates/>
    </EntryElement>
  </xsl:template>
</xsl:stylesheet>

Within this sample document, you can see some of the aspects common to all XSLT documents:

  1. A namespace declaration indicates that the prefix xsl is used for XSLT-specific tags.

  2. The root element indicates that it is an XSLT stylesheet (xsl:stylesheet).

  3. The body of the document consists of a set of xsl:template tags. Each of these defines a rule to be applied.

  4. The pattern in the source document to be matched by each template is defined by the match attribute. The first template matches the root node (indicated by "/").

    Caution

    It is important to understand that in XSLT terms, the root node is not the same as the DOM root element. The XSLT root node is equivalent to the DOM XmlDocument.


  5. The xsl:output element tells the XSLT processor that the output from the transformation will be an XML document and should include an XML declaration and all XML tags generated by the transformation.

  6. When the root node is matched, the contents of the template are applied. In this case, the template simply contains an xsl:apply-templates element that tells the XSLT processor to carry on and examine the children of the root node to see if any of the other template rules apply to them.

  7. The next node to be matched is the root element (or document element) of the document, which is the CakeCatalog element in our example. The template rule for this contains a mixture of literal output (for example, the Catalog tag and its attributes) and another xsl:apply-templates element.

  8. The xsl:apply-templates instruction in the template for the root element tells the XSLT processor to continue down the tree and process the child nodes of the root element. In the example document, this means that the children of CakeCatalog (the CakeType elements) are then processed. Again, the template body contains a mixture of literal content and XSLT instructions. Some of these instructions relate to obtaining attribute values from the current element in the source document (CakeType) and substituting them into the output document. For example, the string {@shape} is replaced with the value of the shape attribute on the current CakeType element being processed from the input document.

  9. The template body for CakeType element processing contains an xsl:apply-templates instruction. Once again, this tells the XSLT processor to process the children of such elements. In the example document, the xsl:apply-templates instruction causes the processing to cascade down to the Message and Description child elements. The rules for these elements are again a mixture of literal content and xsl:apply-templates instructions. Because there are no other elements below Message and Description, this instruction simply causes their text content to be evaluated and transferred to the output document.
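The steps above can be tried out with any XSLT 1.0 processor. The sketch below inlines CatalogTransform.xsl and applies it to a one-cake catalog using the JDK's javax.xml.transform API. This is a stand-in chosen to keep the example self-contained, not the .NET API that this chapter goes on to describe, and the class name StylesheetRunner is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: applies a stylesheet held in a string to a
// source document held in a string and returns the result as a string.
class StylesheetRunner {

    // CatalogTransform.xsl from the text, inlined so the sketch is self-contained.
    static final String CATALOG_TRANSFORM =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='xml'/>"
        + "<xsl:template match='/'><xsl:apply-templates/></xsl:template>"
        + "<xsl:template match='CakeCatalog'>"
        + "<Catalog type='Cake' vendor='FourthCoffee'><xsl:apply-templates/></Catalog>"
        + "</xsl:template>"
        + "<xsl:template match='CakeType'><Entry>"
        + "<EntryElement type='CakeStyle' value='{@style}'/>"
        + "<EntryElement type='Filling' value='{@filling}'/>"
        + "<EntryElement type='Shape' value='{@shape}'/>"
        + "<xsl:apply-templates/></Entry></xsl:template>"
        + "<xsl:template match='Message'>"
        + "<EntryElement type='Message' value='TEXT_CONTENT'>"
        + "<xsl:apply-templates/></EntryElement>"
        + "</xsl:template>"
        + "<xsl:template match='Description'>"
        + "<EntryElement type='Description' value='TEXT_CONTENT'>"
        + "<xsl:apply-templates/></EntryElement>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String stylesheet, String sourceXml) {
        try {
            // Compile the stylesheet, then run it over the source document.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String cake = "<CakeCatalog>"
                + "<CakeType style='Celebration' filling='sponge' shape='round'>"
                + "<Message>Happy Birthday</Message>"
                + "<Description>One of our most popular cakes</Description>"
                + "</CakeType></CakeCatalog>";
        System.out.println(transform(CATALOG_TRANSFORM, cake));
    }
}
```

The source document here is written without whitespace between its elements. With an indented source file, the built-in template rule for text nodes also copies that whitespace into the output.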

As you can see, with relatively few instructions, the contents of the document can be easily converted. The processing model for each type of element follows a similar pattern. Obviously, some transformations are easier to achieve than others, and sometimes it might be easier to perform a particular conversion using DOM or XmlReader/XmlWriter. However, stylesheets usually allow you to apply the Pareto principle of obtaining 80 percent of the result for 20 percent of the effort. As you'll see later, you can combine XSLT stylesheets with more traditional programmatic manipulation.

XSLT has certain advantages over programmatic mechanisms when it comes to performing transformations:

  • All XSLT stylesheets are defined in standard XSLT syntax and therefore do not limit you to any particular procedural programming language, toolset, or platform.

  • If a transformation is to be changed, it is generally easier to redeploy a stylesheet file than to redeploy application code.

  • Many XSLT processors are available in a variety of programming languages. You can use the Microsoft XSLT processor (which is part of the MSXML component) from Visual Basic, C#, J#, and C++. Other XSLT processors are accessible from Java and C++, such as Apache Xalan and James Clark's XT.

XSLT also has some drawbacks:

  • You have to learn the XSLT stylesheet language and become familiar with XPath expressions.

  • The use of stylesheets can be slower than programmatic transformation in some cases.

For most transformations, the advantages of using XSLT stylesheets will outweigh the disadvantages.

Applying Transformations

Transformations are required when an application or component needs data in a format different from the one that the data is currently in. When you're using XSLT, both of these formats will typically be XML, although the output can be almost any form of text. Transformations typically occur in the following situations:

  • When the application is delivering information to the user, for example, to convert XML into HTML or WML for display in a Web browser or on a mobile phone.

  • In a B2B scenario, for example, when you need to convert between your own data format and that of a customer or supplier. There might be a series of different transformations that convert multiple supplier formats into a common document structure for internal use. This form of conversion might also apply when you're dealing with legacy data.

  • When you're dealing with multiple inputs to aggregate data from different sources.

  • When filtering is required to reduce the amount of data being processed. Filtering creates a required subset of the data from a more data-rich document.
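As an illustration of the filtering case, the sketch below copies the cake catalog unchanged except that it drops every CakeType whose style is not Wedding. The stylesheet combines the common identity-copy template with an empty, more specific template for the elements to discard. Both the stylesheet and the class name CatalogFilter are assumptions made for this example, and the code runs on the JDK's javax.xml.transform API rather than the .NET classes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: reduces the catalog to its wedding cakes.
class CatalogFilter {

    static final String FILTER_STYLESHEET =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Identity template: copy every node and attribute as it is.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // Empty template: matching elements produce no output at all.
        + "<xsl:template match=\"CakeType[not(@style = 'Wedding')]\"/>"
        + "</xsl:stylesheet>";

    static String keepWeddingCakes(String cakeCatalogXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(FILTER_STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(cakeCatalogXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Filtering failed", e);
        }
    }

    public static void main(String[] args) {
        String cakes = "<CakeCatalog>"
                + "<CakeType style='Celebration' filling='sponge' shape='round'/>"
                + "<CakeType style='Wedding' filling='sponge' shape='square'/>"
                + "<CakeType style='Wedding' filling='fruit' shape='round'/>"
                + "<CakeType style='Christmas' filling='fruit' shape='square'/>"
                + "</CakeCatalog>";
        System.out.println(keepWeddingCakes(cakes));
    }
}
```

The empty template wins over the identity template for non-wedding cakes because a pattern with a predicate has a higher default priority than the general node() pattern.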

You can apply transformations as part of your application when it receives XML documents from external sources, when it generates documents as part of your application's output, or both. The appropriate type of transformation and the correct place to apply it will depend on your application. Figure 6-2 shows an application that accepts XML-based data from various sources and uses XSLT to convert it into a common format before it is processed and used to deliver information to various clients.

Figure 6-2. An application can apply XSLT in many ways for importing and exporting data.

.NET Support for XML Transformations

Visual J# and the .NET Framework provide support for the programmatic transformation of XML documents using XSLT. This support is based on widely accepted standards for the processing and transformation of XML.

Standards and Mechanisms Supported by the .NET Framework

To transform an XML document, you must identify parts of the input document to be converted and then specify the conversion to be performed. The W3C defines two standards to help define the transformation, both of which the .NET Framework supports. XPath defines a string-based syntax that allows you to traverse and search an XML document, which then allows you to specify the nodes in the input document that you want to transform. The XSLT standard builds on XPath and provides a set of element definitions with which you can define templates that specify the required structure of the output XML document.

Classes in the .NET Framework

As you saw in the previous chapter, the basic XML support required for document manipulation is provided in the System.Xml namespace. There are also two subnamespaces that provide functionality specific to XSLT and transformations:

  • System.Xml.XPath, which contains classes that support the navigation of XML documents in a flexible way based on XPath expressions.

  • System.Xml.Xsl, which comprises classes that support the transformation of XML documents using XSLT stylesheets.

This chapter will cover the use of the classes in these two namespaces and will use classes from the System.Xml namespace as required.



Microsoft Visual J# .NET (Core Reference)
ISBN: 0735615500
Year: 2002
Pages: 128
