XSLT Overview | Using XML with Legacy Business Applications

XSLT is another markup language based on XML. While I've witnessed quite a few doofus speakers and writers referring to XML itself as a programming language, when it comes to XSLT I think I might agree with them. (I'm sure a few computer science purists might think that even XSLT doesn't qualify as a programming language, but I'm not that dogmatic.) Like the DOM, XSLT deals with XML documents as trees. In very broad terms, XSLT specifies how an input document, referred to as a source tree, is transformed into an output document, referred to as a result tree, based on the contents of an XSLT document called a stylesheet. An XSLT processor is a software component that performs this transformation. The XSLT language uses XPath (XML Path Language), another language defined by a W3C recommendation. Among other things, XPath is used to identify specific parts of the source tree for processing.

XSLT and XPath are both very broad, capable languages. I can't cover all of their depth in just one chapter. However, I don't have to. XSLT and XPath can be used to perform XML transformations for a broad range of applications including not just application integration and electronic commerce but also publishing, word processing, and Web page design. XSLT and XPath were designed primarily to be used with XSL for formatting. People write whole books on each of these two technologies. But the fact of the matter is that most commonly performed business document transformations can be accomplished using a small subset of XSLT and XPath features. So, the basics of what you need to know can be covered in one chapter.

If you want to know more, which you probably will if you work with XSLT very much, there are several places you can go for help. Unlike the Schema Recommendation, the XSLT and XPath Recommendations are, I find, well written and easy to understand (for the most part, once you get past the terminology). I highly recommend them. One of the reasons for their readability is that the authors put many of the conformance requirements in separate sections rather than interspersing them with other user-oriented content. On my Web site I also recommend several very good books that deal with more detailed features of XSLT and XPath and offer examples beyond what the Recommendations provide.

A Simple Example: Hello World

I think the best way to teach a programming language is to start with a very simple, contrived example. So, here is an XSLT version of Hello World.

Stylesheet (HelloWorld.xsl)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <HelloWorld>Howdy!</HelloWorld>
  </xsl:template>
</xsl:stylesheet>

Here is our source document.

Source (Goodbye.xml)

<?xml version="1.0" encoding="UTF-8"?>
<Goodbye>
  you all
</Goodbye>

We can run the source document through an XSLT processor using our HelloWorld.xsl stylesheet and produce the following result.

Result (HelloWorld.xml)

<?xml version="1.0" encoding="UTF-8"?>
<HelloWorld>Howdy!</HelloWorld>

In fact, we can process any XML document using this stylesheet and produce exactly this same result. Let's look at the major parts of this stylesheet and understand why.

The first thing to note is that the stylesheet is itself a well-formed XML document. The root xsl:stylesheet Element declares the XSLT namespace and uses xsl: as a namespace prefix. One of the restrictions on using XSLT is that all the XSLT Elements in a stylesheet must belong to the XSLT namespace, conventionally by way of a prefix such as xsl: that resolves to it. Attributes are handled a bit differently. The Attributes of XSLT Elements, such as version and match in our example, are written without a prefix, while the few XSLT-defined Attributes that can appear on non-XSLT Elements in a stylesheet must carry the prefix. You need to know that xsl:transform can be used as a synonym for xsl:stylesheet. That the Recommendation allows either indicates to me some serious philosophical differences in the work group that wrote it. So serious, in fact, that rather than achieving consensus on one name over the other they took the very unusual step of punting and allowing either. However, it takes only a cursory reading of the Recommendation to perceive a preference for stylesheet over transform, so that's what I use.
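To make the synonym concrete, here is a sketch of the same Hello World stylesheet with xsl:transform as the root Element. Nothing else changes, and a conforming XSLT 1.0 processor should treat the two versions identically.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same as HelloWorld.xsl, but with xsl:transform as the root Element -->
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <HelloWorld>Howdy!</HelloWorld>
  </xsl:template>
</xsl:transform>
```

The rest of this chapter continues to refer to HelloWorld.xsl as originally written, with xsl:stylesheet as the root.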

In our example there is only one Element in the body of the stylesheet document, the xsl:template Element. To aid clarity, in this chapter's text I follow the convention of writing XSLT Element names with the xsl: prefix. However, there are times when the generic "template" by itself rather than "xsl:template" reads better, so I will use it too.

The xsl:template Element defines what is called a template rule. It is very aptly named because the contents of the template Element function as a template (or pattern or cookie cutter if you prefer) for a portion of the result tree. The analogy of using a template in manufacturing to stamp out several copies of a part is not quite so evident in this example because we just produce one fragment of the result tree. In our example the content of the template Element is the entire body of the result tree document. This cookie cutter aspect will become more obvious as we look at other examples.

The rule aspect of the template Element relates to the conditions under which the template is invoked or applied. These are determined by various Attributes of xsl:template or by the contents of another xsl:template. In our example the rule is defined by the presence of the match Attribute. The value of the match Attribute is a pattern, based on a subset of XPath, that identifies a particular set of nodes in the source tree. Such a set is called a node-set. An XSLT processor basically functions by starting with the root of the source tree and doing a preorder traversal of the tree (top down, left to right). When it finds a node in a node-set that matches the pattern of a template's match Attribute, the contents of the xsl:template are written to the result tree. Any XSLT Elements in the body of the template Element are processed before writing the contents to the result tree. Since there are various means by which these child Elements may invoke other templates, this entire process can be recursive. In our example the value of the match Attribute is "/", a single forward slash. This is a special abbreviation for the source tree's root.

So, now that we know what the pieces are we can understand how this stylesheet works. The XSLT processor starts at the root of the source tree and sees if there is a template rule that matches it. Our xsl:template has a match Attribute value indicating the source tree root, so the processor examines the contents. There are no Elements from the XSLT namespace, so it writes the entire contents to the result tree. There are no other templates, so it terminates. Now, since all we are matching on is the source tree root, it should now be fairly obvious why we can use any XML document as a source and produce our Hello World document as the result.
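By way of contrast, here is a minimal sketch of a stylesheet in which the template body does contain an Element from the XSLT namespace. It is not one of this chapter's examples. It uses xsl:apply-templates, a standard XSLT 1.0 Element not otherwise introduced at this point, and the Farewell Element name is made up purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Matches the source tree's root, as in HelloWorld.xsl -->
  <xsl:template match="/">
    <HelloWorld>
      <!-- An XSLT Element in the template body: it is processed, not copied.
           It asks the processor to find templates for the root's children. -->
      <xsl:apply-templates/>
    </HelloWorld>
  </xsl:template>
  <!-- Invoked through xsl:apply-templates when the document Element is Goodbye -->
  <xsl:template match="Goodbye">
    <!-- Farewell is a hypothetical Element name used only in this sketch -->
    <Farewell>So long!</Farewell>
  </xsl:template>
</xsl:stylesheet>
```

Applied to Goodbye.xml, this should produce a HelloWorld Element containing a single Farewell Element. Unlike HelloWorld.xsl, the result now depends on the source document, because the second template is applied only when the source contains a matching Goodbye Element.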

Those are the bare-bones basics. Let's look at another example before we start digging into more details and concepts.

Another Simple Example: Changing Tag Names

Hello World has value only as an instructional tool. However, this next example comes very close to something you might actually use in a real-life situation. Suppose that your desktop bookkeeping system and your customer contact organizer are both able to import and export XML documents. Although it is unlikely, let us further suppose for this example that both systems use exactly the same information. However, they use different Element names to tag that information. Suppose that a prospect you've been courting for quite a while finally places an order, and you need to move their contact and billing information into the bookkeeping system. You can extract the information from your contact management system, but you need to change the Element names before you can import the document into your bookkeeping system. Here is a stylesheet that does it.

Stylesheet (ProspectToCustomer.xsl)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="Prospect">
    <Customer>
      <FirstName>
        <xsl:value-of select="FName"/>
      </FirstName>
      <MiddleName>
        <xsl:value-of select="MI"/>
      </MiddleName>
      <LastName>
        <xsl:value-of select="LName"/>
      </LastName>
      <Organization>
        <xsl:value-of select="Company"/>
      </Organization>
      <StreetLine1>
        <xsl:value-of select="Street1"/>
      </StreetLine1>
      <StreetLine2>
        <xsl:value-of select="Street2"/>
      </StreetLine2>
      <CityOrMunicipalUnit>
        <xsl:value-of select="City"/>
      </CityOrMunicipalUnit>
      <StateOrProvince>
        <xsl:value-of select="State"/>
      </StateOrProvince>
      <PostalCode>
        <xsl:value-of select="Zip"/>
      </PostalCode>
    </Customer>
  </xsl:template>
</xsl:stylesheet>

Here are the source document and the result produced by applying the stylesheet.

Source (Prospect.xml)

<?xml version="1.0" encoding="UTF-8"?>
<Prospect>
  <FName>Wilma</FName>
  <MI>J.</MI>
  <LName>Peterson</LName>
  <Company>Consolidated Consolidators</Company>
  <Street1>Suite 35</Street1>
  <Street2>14 Friendly Lane</Street2>
  <City>Amarillo</City>
  <State>TX</State>
  <Zip>79101</Zip>
</Prospect>

Result (Customer.xml)

<?xml version="1.0" encoding="UTF-8"?>
<Customer>
  <FirstName>Wilma</FirstName>
  <MiddleName>J.</MiddleName>
  <LastName>Peterson</LastName>
  <Organization>Consolidated Consolidators</Organization>
  <StreetLine1>Suite 35</StreetLine1>
  <StreetLine2>14 Friendly Lane</StreetLine2>
  <CityOrMunicipalUnit>Amarillo</CityOrMunicipalUnit>
  <StateOrProvince>TX</StateOrProvince>
  <PostalCode>79101</PostalCode>
</Customer>

You'll notice one new XSLT Element and another new concept in this stylesheet. The new Element is the xsl:value-of Element. There are various ways to use it, but here we insert a Text Node that contains the string value of the Element named in the select Attribute. You'll also notice that we used the value of "Prospect" in the match Attribute of the xsl:template rather than the "/" value for the document root. The reason for this is that Prospect as the document Element is not the same thing as the document root. We often refer to the "document Element" as the "root Element," but neither of these is the same thing as the document root. The document root is the ultimate parent of everything in the document and has just one child Element that is the root or document Element. Aside from the document Element, the direct children of the document root are any processing instructions or comments that appear outside the document Element. (The XML declaration and the document type declaration are part of the prolog, but in the XPath data model they are not nodes in the tree.) Even though our stylesheet has no template that matches the document root, the template for Prospect is still applied. XSLT defines built-in template rules, and the built-in rule for the root simply tells the processor to go on and process the root's children. So, having set our so-called context node to the root Element Prospect, the select Attributes of the xsl:value-of Elements identify the specific children of that Element by name using an XPath expression. We'll talk more about XPath in a later section in this chapter.
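To illustrate the difference between the document root and the document Element, here is a sketch of a shortened variant of ProspectToCustomer.xsl that matches on "/" instead of "Prospect". It is only an illustration, not the approach this chapter uses, and it copies just two of the fields to keep it short. Because the context node is now the document root, each select path has to step through Prospect first.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- The context node is the document root, not the Prospect Element -->
  <xsl:template match="/">
    <Customer>
      <!-- Paths must now step through the document Element -->
      <FirstName>
        <xsl:value-of select="Prospect/FName"/>
      </FirstName>
      <LastName>
        <xsl:value-of select="Prospect/LName"/>
      </LastName>
      <!-- select="FName" alone would select nothing here, because
           FName is not a child of the document root -->
    </Customer>
  </xsl:template>
</xsl:stylesheet>
```

Run against Prospect.xml, this variant should yield a Customer Element with FirstName containing Wilma and LastName containing Peterson.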

These two stylesheets offer examples of one particular approach to using XSLT. There are others. In the next section I briefly review some of these other approaches and describe the one we'll be using in this chapter.