XSL Transformations | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

XSLT is a transformation language. An XSLT stylesheet describes how documents in one format are converted to documents in another format. Both input and output documents are represented by the XPath data model. XPath expressions select nodes from the input document for further processing. Templates that contain XSLT instructions are applied to the selected nodes to generate new nodes, which are added to the output document. The final document can be identical to the input document, a little different, or a lot different. Most of the time it's somewhere in the middle.

XSLT is based on the notion of templates. An XSLT stylesheet contains semi-independent templates for each element or other node that will be processed. An XSLT processor parses the stylesheet and an input document. Then it compares the nodes in the input document to the templates in the stylesheet. When it finds a match, it instantiates the template and adds the result to the output tree.

The biggest difference between XSLT and traditional programming languages is that the input document drives the flow of the program, rather than the stylesheet controlling it explicitly. When designing an XSLT stylesheet, you concentrate on which input constructs map to which output constructs rather than on how or when the processor reads the input and generates the output.

In some sense, XSLT is a push model like SAX rather than a pull model like DOM. This approach is initially uncomfortable for programmers accustomed to more procedural languages, but it has the advantage of being much more robust against unexpected changes in the structure of the input data. An XSLT transform rarely fails completely just because an expected element is missing or misplaced, or because an unexpected, invalid element is encountered.

Tip

If you are concerned about the exact structure of the input data and want to respond differently if it's not precisely correct (for example, if an XML-RPC server should respond to a malformed request with a fault document rather than a best guess), then validate the documents with a DTD or a schema before transforming them. XSLT doesn't provide the means to do this, but you can implement this in Java in a separate layer before deciding whether to pass the input document to the XSLT processor for transformation.
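As a rough sketch of such a separate validation layer, the following Java class uses a validating JAXP parser to accept or reject a request before any stylesheet sees it. The tiny internal DTD is a hypothetical stand-in for a real XML-RPC DTD, and the class and method names are illustrative assumptions rather than part of any library:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ValidateFirst {
    // Hypothetical, deliberately tiny DTD; it is not the full XML-RPC grammar.
    static final String DOCTYPE =
        "<!DOCTYPE methodCall [\n"
      + "  <!ELEMENT methodCall (methodName, params)>\n"
      + "  <!ELEMENT methodName (#PCDATA)>\n"
      + "  <!ELEMENT params (param*)>\n"
      + "  <!ELEMENT param (value)>\n"
      + "  <!ELEMENT value (int)>\n"
      + "  <!ELEMENT int (#PCDATA)>\n"
      + "]>\n";

    // Returns true only if the body parses and is valid against the DTD above.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Treat validity errors as failures instead of letting the parser continue.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<methodCall><methodName>calculateFibonacci</methodName>"
            + "<params><param><value><int>10</int></value></param></params></methodCall>";
        String bad = "<methodCall><params/></methodCall>"; // methodName is missing
        System.out.println("good request valid? " + isValid(good));
        System.out.println("bad request valid?  " + isValid(bad));
    }
}
```

A caller would pass the document to the XSLT processor only when isValid returns true, and build a fault response otherwise.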

Template Rules

An XSLT stylesheet contains examples of what belongs in the output document, roughly one example for each significantly different construct that exists in the input documents. It also contains instructions that tell the XSLT processor how to convert input nodes into the example output nodes. The XSLT processor uses those examples and instructions to convert nodes in the input documents to nodes in the output document.

Examples and instructions are written as template rules. Each template rule has a pattern and a template. The template rule is represented by an xsl:template element. The customary prefix xsl is bound to the namespace URI http://www.w3.org/1999/XSL/Transform, and as usual the prefix can change as long as the URI remains the same. The pattern, a limited form of an XPath expression, is stored in this element's match attribute. The contents of the xsl:template element form the template. For example, the following is a template rule that matches methodCall elements and responds with a template consisting of a single methodResponse element:

 <xsl:template match="methodCall">
   <methodResponse>
     <params>
       <param>
         <value><string>Hello</string></value>
       </param>
     </params>
   </methodResponse>
 </xsl:template>

Stylesheets

A complete XSLT stylesheet is a well-formed XML document. The root element of this document is xsl:stylesheet, which has a version attribute with the value 1.0. In practice, stylesheets normally contain multiple template rules to match different kinds of input nodes, but for now Example 17.1 shows one that contains just one template rule.

Example 17.1 An XSLT Stylesheet for XML-RPC Request Documents

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="methodCall">
     <methodResponse>
       <params>
         <param>
           <value><string>Hello</string></value>
         </param>
       </params>
     </methodResponse>
   </xsl:template>
 </xsl:stylesheet>

Applying this stylesheet to any XML-RPC request document produces the following result:

 <?xml version="1.0" encoding="utf-8"?><methodResponse><params>  <param><value><string>Hello</string></value></param></params> </methodResponse>

The template in Example 17.1's template rule consists exclusively of literal result elements and literal data that are copied directly to the output document from the stylesheet. It also contains some white-space-only text nodes, but by default an XSLT processor strips these out. ^[1]

^[1] You can keep the white space by adding an xml:space="preserve" attribute to the xsl:template element if you want.

A template can also contain XSLT instructions that copy data from the input document to the output document, or create new data algorithmically.

Taking the Value of a Node

Perhaps the most common XSLT instruction is xsl:value-of. This returns the XPath string value of an object selected by an XPath expression. For example, the value of an element is the concatenation of all the character data but none of the markup contained between the element's start-tag and end-tag. Each xsl:value-of element has a select attribute whose value contains an XPath expression. This XPath expression identifies the object to take the value of. For example, this xsl:value-of element takes the value of the root methodCall element:

 <xsl:value-of select="/methodCall" />

This xsl:value-of element takes the value of the int element further down the tree:

 <xsl:value-of select="/methodCall/params/param/value/int" />

This xsl:value-of element uses a relative location path. It calculates the string value of the int child of the value child of the param child of the params child of the context node (normally the node matched by the containing template):

 <xsl:value-of select="params/param/value/int" />

The xsl:value-of element can calculate the value of any of the four XPath data types (number, boolean, string, and node-set). For example, this expression calculates the value of e times π:

 <xsl:value-of select="2.71828 * 3.141592" />

In fact, you can use absolutely any legal XPath expression in the select attribute. This xsl:value-of element multiplies the number in the int element by ten and returns it:

 <xsl:value-of select="10 * params/param/value/int" />

In all cases, the value of an object is the XPath string value that the XPath string() function would return.
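To experiment with expressions like these, a small Java harness can wrap each one in an xsl:value-of and run it against a request document. This is only a sketch: the class and method names are made up for illustration, the request is a compact copy of an XML-RPC call with the usual params/param/value/int nesting, and the javax.xml.transform calls are the standard JAXP ones.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ValueOfDemo {
    // A compact XML-RPC request used as the input document.
    static final String REQUEST =
        "<methodCall><methodName>calculateFibonacci</methodName>"
      + "<params><param><value><int>10</int></value></param></params></methodCall>";

    // Evaluates one XPath expression with xsl:value-of; methodCall is the context node.
    // The expression is pasted into a single-quoted attribute, so it must not
    // contain apostrophes or unescaped < characters.
    static String valueOf(String xpath) {
        String stylesheet =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='methodCall'>"
          + "<xsl:value-of select='" + xpath + "'/>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(REQUEST)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(valueOf("/methodCall"));                 // string value of the root element
        System.out.println(valueOf("params/param/value/int"));      // relative location path
        System.out.println(valueOf("10 * params/param/value/int")); // arithmetic on a node's value
        System.out.println(valueOf("2.71828 * 3.141592"));          // plain number expression
    }
}
```

The text output method is used so that only the string value appears in the result, with no XML declaration around it.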

When xsl:value-of is used in a template, the context node is a node matched by the template and for which the template is being instantiated. The template in Example 17.2 copies the string value of the value element in the input document to the string element in the output document.

Example 17.2 An XSLT Stylesheet That Echoes XML-RPC Requests

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:template match="methodCall" xml:space="preserve">
 <methodResponse>
   <params>
     <param>
       <value>
         <string>
           <xsl:value-of select="params/param/value" />
         </string>
       </value>
     </param>
   </params>
 </methodResponse>
 </xsl:template>
 </xsl:stylesheet>

When this stylesheet is applied to the XML document in Example 17.3, the result is the XML document shown in Example 17.4. A number of GUI, command-line, and server-side programs can apply a stylesheet to a document. Because our interest is in integrating XSLT stylesheets into Java programs, I'll omit the details of exactly how this takes place for the moment and pick them up again in the next section.

Example 17.3 An XML-RPC Request Document

 <?xml version="1.0"?>
 <methodCall>
   <methodName>calculateFibonacci</methodName>
   <params>
     <param>
       <value><int>10</int></value>
     </param>
   </params>
 </methodCall>

Example 17.4 An XML-RPC Response Document

 <?xml version="1.0" encoding="utf-8"?>
 <methodResponse>
   <params>
     <param>
       <value>
         <string>
           10
         </string>
       </value>
     </param>
   </params>
 </methodResponse>

The white space in the output is a little prettier here than in the last example, because I used an xml:space="preserve" attribute in the stylesheet. More important, the content of the string element has now been copied from the input document. The output is a combination of literal result data from the stylesheet and information read from the transformed XML document.

Applying Templates

Probably the single most important XSLT instruction is the one that tells the processor to continue processing other nodes in the input document and instantiate their matching templates. This instruction is the xsl:apply-templates element. Its select attribute contains an XPath expression that identifies the nodes to apply templates to. The currently matched node is the context node for this expression. For example, the following template matches methodCall elements. However, rather than outputting a fixed response, it generates a methodResponse element whose contents are formed by instantiating the template for each params child element in turn:

 <xsl:template match="methodCall">
   <methodResponse>
     <xsl:apply-templates select="child::params"/>
   </methodResponse>
 </xsl:template>

The complete output this template rule generates depends on what the template rule for the params element does. That template rule may further apply templates to its own param children, like this:

 <xsl:template match="params">
   <params>
     <xsl:apply-templates select="child::param"/>
   </params>
 </xsl:template>

The param template rule may apply templates to its value child, like this:

 <xsl:template match="param">
   <param>
     <xsl:apply-templates select="child::value"/>
   </param>
 </xsl:template>

The value template rule may apply templates to all of its child elements whatever their type, by using the * wildcard:

 <xsl:template match="value">
   <value>
     <xsl:apply-templates select="child::*"/>
   </value>
 </xsl:template>

Finally, one template rule may take the value of all the possible children of value, by using the union operator |:

 <xsl:template match="int | i4 | string | boolean | double
                      | dateTime.iso8601 | base64 | struct">
   <string>
     <xsl:value-of select="."/>
   </string>
 </xsl:template>

This example descended straight down the expected tree, and mostly copied the existing markup. However, XSLT is a lot more flexible and can move in many different directions at any point. Templates can skip nodes, move to preceding or following siblings, reprocess previously processed nodes, or move along any of the axes defined in XPath.
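The descent just described can be tried end to end with a short Java sketch. It loads the five template rules into one stylesheet (with the union pattern shortened to a few types for brevity) and applies them to a compact XML-RPC request. The class and method names are illustrative assumptions; the transformation calls are standard JAXP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ApplyTemplatesDemo {
    // The chain of template rules from this section, one rule per level of the tree.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='methodCall'><methodResponse>"
      + "<xsl:apply-templates select='child::params'/></methodResponse></xsl:template>"
      + "<xsl:template match='params'><params>"
      + "<xsl:apply-templates select='child::param'/></params></xsl:template>"
      + "<xsl:template match='param'><param>"
      + "<xsl:apply-templates select='child::value'/></param></xsl:template>"
      + "<xsl:template match='value'><value>"
      + "<xsl:apply-templates select='child::*'/></value></xsl:template>"
      + "<xsl:template match='int | i4 | string | boolean | double'>"
      + "<string><xsl:value-of select='.'/></string></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to a request document and returns the serialized result.
    static String respond(String requestXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(requestXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String request =
            "<methodCall><methodName>calculateFibonacci</methodName>"
          + "<params><param><value><int>10</int></value></param></params></methodCall>";
        System.out.println(respond(request));
    }
}
```

Because the methodCall rule selects only its params child, the methodName element is never visited and its text does not reach the output.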

The Default Template Rules

XSLT defines default template rules that are used when no explicit rule matches a node. The first such rule applies to the root node and to element nodes. It simply applies templates to the children of that node but does not specifically generate any output:

 <xsl:template match="*|/">
   <xsl:apply-templates/>
 </xsl:template>

This allows the XSLT processor to walk the tree from top to bottom by default, unless told to do something else by other templates. Any explicit templates override the default templates.

The second default rule applies to text and attribute nodes. It copies the value of each of these nodes into the output tree:

 <xsl:template match="text()|@*">
   <xsl:value-of select="."/>
 </xsl:template>

Together, these rules have the effect of copying the text contents of an element or document to the output, but deleting the markup structure. Of course you can change this behavior by overriding the built-in template rules with your own template rules.
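One way to see the built-in rules at work is to run a stylesheet that contains no template rules at all. The following Java sketch does that with the standard JAXP API; the class name and the compact request document are illustrative assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DefaultRulesDemo {
    // A stylesheet with no template rules, so only the built-in rules apply.
    static final String EMPTY_STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "</xsl:stylesheet>";

    // Returns whatever the built-in rules alone produce for the given document.
    static String textOnly(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(EMPTY_STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String request =
            "<methodCall><methodName>calculateFibonacci</methodName>"
          + "<params><param><value><int>10</int></value></param></params></methodCall>";
        // The markup disappears; only the text of methodName and int remains.
        System.out.println(textOnly(request));
    }
}
```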

Selection

Adding XPath predicates to match patterns and select expressions offers much of the if-then functionality you need. For the times when that's not quite enough, you can take advantage of the xsl:if and xsl:choose elements.

xsl:if

The xsl:if instruction enables the stylesheet to decide whether or not to do something. It contains a template. If the XPath expression in the test attribute evaluates to true, then the template is instantiated and added to the result tree. If the XPath expression evaluates to false, then it isn't. If the XPath expression evaluates to something that isn't a boolean, then it is converted to true or false using the XPath boolean() function described in Chapter 16. That is, 0 and NaN are false; all other numbers are true. Empty strings and node-sets are false; nonempty strings and node-sets are true.

For example, when evaluating an XML-RPC request, you might want to check that the request document indeed adheres to the specification, or at least that it's close enough for what you need it for. This requires checking that the root element is methodCall , that the root element has exactly one methodName child and one params child, and so forth. Following is an XPath expression that checks for various violations of XML-RPC syntax. (Remember that according to XPath, empty node-sets are false, and nonempty node-sets are true.)

 count(/methodCall/methodName) != 1   or count(/methodCall/params) != 1  or not(/methodCall/params/param/value)

I could check considerably more than this, but this suffices for an example. Now we can use this XPath expression in an xsl:if test inside the template for the root node. If the test succeeds (that is, if the request document is incorrect), then the xsl:message instruction terminates the processing:

 <xsl:template match="/">
   <xsl:if test="count(/methodCall/methodName) != 1
                 or count(/methodCall/params) != 1
                 or not(/methodCall/params/param/value)">
     <xsl:message terminate="yes">
       The request document is invalid.
     </xsl:message>
   </xsl:if>
   <xsl:apply-templates select="child::methodCall"/>
 </xsl:template>

The exact behavior of the xsl:message instruction is processor dependent. The message might be delivered by printing it on System.out, writing it into a log file, or popping up a dialog box. Soon, you'll see how to generate a fault document instead.

There is no xsl:else or xsl:else-if instruction. To choose from multiple alternatives, use the xsl:choose instruction instead.

xsl:choose

The xsl:choose instruction selects from multiple alternative templates. It contains one or more xsl:when elements, each of which contains a test attribute and a template. The template of the first xsl:when element whose test attribute evaluates to true is instantiated. The others are ignored. There may also be an optional final xsl:otherwise element whose template is instantiated only if all the xsl:when tests are false.

For example, when an XML-RPC request is well-formed but syntactically incorrect, the server should respond with a fault document. This template tests for a number of possible problems with an XML-RPC request and processes the request only if none of the problems arise. Otherwise it emits an error message:

 <xsl:template match="/">
   <xsl:choose>
     <xsl:when test="not(/methodCall/methodName)">
       Missing methodName
     </xsl:when>
     <xsl:when test="count(/methodCall/methodName) &gt; 1">
       Multiple methodNames
     </xsl:when>
     <xsl:when test="count(/methodCall/params) &gt; 1">
       Multiple params elements
     </xsl:when>
     <xsl:otherwise>
       <xsl:apply-templates select="child::methodCall"/>
     </xsl:otherwise>
   </xsl:choose>
 </xsl:template>

XSLT does not have any real exception handling or error reporting mechanism. In the worst case, the processor simply gives up and prints an error message on the console. This template, like every other template, is instantiated to create a node-set. The nodes contained in that set will depend on which conditions are true. If there is an error condition, then this set will contain a single text node with the contents, "Missing methodName," "Multiple methodNames," or "Multiple params elements." Otherwise it will contain whatever nodes are created by applying templates to the methodCall child element. In either case, a node-set is returned that is inserted into the output document.
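The branching behavior can be observed from Java with a sketch like the following. It uses the same three tests, but trims the white space around each message, and it adds a placeholder methodCall rule that simply emits "Request accepted" so that the otherwise branch has visible output. Those two changes, and the class and method names, are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChooseDemo {
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:choose>"
      + "<xsl:when test='not(/methodCall/methodName)'>Missing methodName</xsl:when>"
      + "<xsl:when test='count(/methodCall/methodName) &gt; 1'>Multiple methodNames</xsl:when>"
      + "<xsl:when test='count(/methodCall/params) &gt; 1'>Multiple params elements</xsl:when>"
      + "<xsl:otherwise><xsl:apply-templates select='child::methodCall'/></xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:template>"
      // Placeholder rule (an assumption) so that a valid request yields visible output.
      + "<xsl:template match='methodCall'>Request accepted</xsl:template>"
      + "</xsl:stylesheet>";

    // Returns the text produced by whichever branch of xsl:choose is taken.
    static String check(String requestXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(requestXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(check("<methodCall><params/></methodCall>"));
        System.out.println(check("<methodCall><methodName>a</methodName>"
            + "<methodName>b</methodName><params/></methodCall>"));
        System.out.println(check("<methodCall><methodName>a</methodName><params/></methodCall>"));
    }
}
```

Only the first matching xsl:when contributes output, which is why a request with no methodName reports that problem and nothing else.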

Calling Templates by Name

There is a second way in which a template can be instantiated. As well as matching a node in the input document, a template can be called by name using the xsl:call-template element. Parameters can be passed to such templates, and templates can even be called recursively. Indeed, it is recursion that makes XSLT Turing complete.

For example, here's a template named faultResponse that generates a complete XML-RPC fault document when invoked:

 <xsl:template name="faultResponse">
   <methodResponse>
     <fault>
       <value>
         <struct>
           <member>
             <name>faultCode</name>
             <value><int>0</int></value>
           </member>
           <member>
             <name>faultString</name>
             <value><string>Invalid request document</string></value>
           </member>
         </struct>
       </value>
     </fault>
   </methodResponse>
 </xsl:template>

The xsl:call-template element applies a named template to the context node. For example, earlier in this chapter you saw a root node template that terminated the processing if it detected an invalid document. Now it can call the fault template instead:

 <xsl:template match="/">
   <xsl:choose>
     <xsl:when test="count(/methodCall/methodName) != 1
                     or count(/methodCall/params) != 1
                     or not(/methodCall/params/param/value)">
       <xsl:call-template name="faultResponse"/>
     </xsl:when>
     <xsl:otherwise>
       <xsl:apply-templates select="child::methodCall"/>
     </xsl:otherwise>
   </xsl:choose>
 </xsl:template>

Named templates can factor out common code that's used in multiple places through top-down design, just as a complicated algorithm in a Java program may be broken into multiple methods rather than being kept as one large monolithic method. Indeed some large stylesheets, including the DocBook XSL stylesheets that produced this book, do use named templates for this purpose. However, named templates become even more important when you add parameters and recursion to the mix.

Passing Parameters to Templates

Each template rule can have any number of parameters represented as xsl:param elements. These appear inside the xsl:template element before the template itself. Each xsl:param element has a name attribute and an optional select attribute. The select attribute provides a default value for that parameter when the template is invoked but can be overridden. If the select attribute is omitted, then the default value for the parameter is set by the contents of the xsl:param element. (For a non-overridable variable, you can use a local xsl:variable element instead.)

For example, the parameters in the following fault template specify the fault code and the fault string. The default fault code is 0. The default fault string is Unspecified Error.

 <xsl:template name="faultResponse">
   <xsl:param name="err_code" select="0"/>
   <xsl:param name="err_message">Unspecified Error</xsl:param>
   <methodResponse>
     <fault>
       <value>
         <struct>
           <member>
             <name>faultCode</name>
             <value>
               <int><xsl:value-of select="$err_code"/></int>
             </value>
           </member>
           <member>
             <name>faultString</name>
             <value>
               <string>
                 <xsl:value-of select="$err_message"/>
               </string>
             </value>
           </member>
         </struct>
       </value>
     </fault>
   </methodResponse>
 </xsl:template>

XSLT is weakly typed. There is no type attribute on the xsl:param element. You can pass in pretty much any object as the value of one of these parameters. If you use such a variable in a place where an item of that type can't be used and can't be converted to the right type, then the processor will stop and report an error.

The xsl:call-template element can provide values for each of the named parameters using xsl:with-param child elements, or it can accept the default values specified by the xsl:param elements. For example, the following template rule for the root node uses different error codes and messages for different problems:

 <xsl:template match="/">
   <xsl:choose>
     <xsl:when test="not(/methodCall/methodName)">
       <xsl:call-template name="faultResponse">
         <xsl:with-param name="err_code" select="1" />
         <xsl:with-param name="err_message">
           Missing methodName
         </xsl:with-param>
       </xsl:call-template>
     </xsl:when>
     <xsl:when test="count(/methodCall/methodName) &gt; 1">
       <xsl:call-template name="faultResponse">
         <xsl:with-param name="err_code" select="1" />
         <xsl:with-param name="err_message">
           Multiple method names
         </xsl:with-param>
       </xsl:call-template>
     </xsl:when>
     <xsl:when test="count(/methodCall/params) &gt; 1">
       <xsl:call-template name="faultResponse">
         <xsl:with-param name="err_code" select="2" />
         <xsl:with-param name="err_message">
           Multiple params elements
         </xsl:with-param>
       </xsl:call-template>
     </xsl:when>
     <!-- etc. -->
     <xsl:otherwise>
       <xsl:apply-templates select="child::methodCall"/>
     </xsl:otherwise>
   </xsl:choose>
 </xsl:template>

I'm not sure I would always recommend this approach for validation. Most of the time, writing a schema is easier, but this technique can verify things a schema can't. For example, it could test that a value element contains either an ASCII string or a type element such as int, but not a type element and an ASCII string.

Recursion

The ability for a template to call itself (recursion) is the final ingredient of a fully Turing-complete language. For example, here's a template that implements the factorial function:

 <xsl:template name="factorial">
   <xsl:param name="arg" select="0"/>
   <xsl:param name="return_value" select="1"/>
   <xsl:choose>
     <xsl:when test="$arg = 0">
       <xsl:value-of select="$return_value"/>
     </xsl:when>
     <xsl:when test="$arg &gt; 0">
       <xsl:call-template name="factorial">
         <xsl:with-param name="arg" select="$arg - 1"/>
         <xsl:with-param name="return_value"
                         select="$return_value * $arg"/>
       </xsl:call-template>
     </xsl:when>
     <xsl:when
       test="$arg &lt; 0">Error: function undefined!</xsl:when>
   </xsl:choose>
 </xsl:template>

The factorial template has two arguments, $arg and $return_value. $arg is the number whose factorial the client wants, and must be passed as a parameter the first time this template is invoked. $return_value is initially 1. When $arg reaches zero, the template returns $return_value. If $arg is greater than 0, the template multiplies $return_value by the current value of $arg, decrements $arg by 1, and calls itself again with the new values. If $arg is negative, the template emits an error message instead.
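To watch the recursion run, the factorial template can be driven from Java. In this sketch the argument is carried in a one-element input document, `<n>5</n>`, and a root template reads it with number(n). That input format, the root template, and the class and method names are assumptions made for the demonstration; the factorial template itself is the one shown above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FactorialDemo {
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      // Hypothetical driver: reads the argument from an <n> document element.
      + "<xsl:template match='/'>"
      + "<xsl:call-template name='factorial'>"
      + "<xsl:with-param name='arg' select='number(n)'/>"
      + "</xsl:call-template>"
      + "</xsl:template>"
      // The recursive factorial template from the text.
      + "<xsl:template name='factorial'>"
      + "<xsl:param name='arg' select='0'/>"
      + "<xsl:param name='return_value' select='1'/>"
      + "<xsl:choose>"
      + "<xsl:when test='$arg = 0'><xsl:value-of select='$return_value'/></xsl:when>"
      + "<xsl:when test='$arg &gt; 0'>"
      + "<xsl:call-template name='factorial'>"
      + "<xsl:with-param name='arg' select='$arg - 1'/>"
      + "<xsl:with-param name='return_value' select='$return_value * $arg'/>"
      + "</xsl:call-template>"
      + "</xsl:when>"
      + "<xsl:when test='$arg &lt; 0'>Error: function undefined!</xsl:when>"
      + "</xsl:choose>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet on <n>value</n> and returns the text it produces.
    static String factorial(int n) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader("<n>" + n + "</n>")),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int n = -1; n <= 5; n++) {
            System.out.println(n + "! = " + factorial(n));
        }
    }
}
```

Note that $return_value acts as an accumulator: each call passes the partial product forward, so the final call can emit the result directly.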

Functional languages such as XSLT neither allow variables to change their values nor permit side effects. This can seem a little strange to programmers accustomed to imperative languages like Java. The key is to remember that almost any task a loop performs in Java, recursion performs in XSLT. For example, consider the most basic CS101 problem, printing out the integers from 1 to 10. In Java it's a simple for loop:

 for (int i = 1; i <= 10; i++) {
   System.out.print(i);
 }

In XSLT you'd use recursion in this fashion:

 <xsl:template name="CS101">
   <xsl:param name="index" select="1"/>
   <xsl:if test="$index &lt;= 10">
     <xsl:value-of select="$index"/>
     <xsl:call-template name="CS101">
       <xsl:with-param name="index" select="$index + 1"/>
     </xsl:call-template>
   </xsl:if>
 </xsl:template>

Similar recursive techniques can be used for other looping operations such as sums, averages, sorting, and more. Neither iteration nor recursion is mathematically better or fundamentally faster than the other. They produce the same results in the end.

Note

The XSLT solution is more complex and less obvious than the Java equivalent, but that has more to do with XSLT's XML syntax than with recursion itself. In Java the same operation could be written recursively like this:

 public void fakeLoop(int i) {
   System.out.print(i);
   if (i < 10) fakeLoop(i + 1);
 }

In fact, it has been proven that, given sufficient memory, any recursive algorithm can be transformed into an iterative one and vice versa.

Let's look at a more complex example. Example 17.5 is a simple XSLT stylesheet that reads input XML-RPC requests in the form of Example 17.3 and converts them into output XML-RPC responses.

Example 17.5 An XSLT Stylesheet That Calculates Fibonacci Numbers

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:template match="/">
     <xsl:choose>
       <!-- Basic sanity check on the input -->
       <xsl:when
         test="count(methodCall/params/param/value/int) = 1">
         <xsl:apply-templates select="child::methodCall"/>
       </xsl:when>
       <xsl:otherwise>
         <!-- Sanity check failed -->
         <xsl:call-template name="faultResponse"/>
       </xsl:otherwise>
     </xsl:choose>
   </xsl:template>

   <xsl:template match="methodCall">
     <methodResponse>
       <params>
         <param>
           <value>
             <xsl:apply-templates
               select="params/param/value/int"/>
           </value>
         </param>
       </params>
     </methodResponse>
   </xsl:template>

   <xsl:template match="int">
     <int>
       <xsl:call-template name="calculateFibonacci">
         <xsl:with-param name="index" select="number(.)"/>
       </xsl:call-template>
     </int>
   </xsl:template>

   <xsl:template name="calculateFibonacci">
     <xsl:param name="index"/>
     <xsl:param name="low"  select="1"/>
     <xsl:param name="high" select="1"/>
     <xsl:choose>
       <xsl:when test="$index &lt;= 1">
         <xsl:value-of select="$low"/>
       </xsl:when>
       <xsl:otherwise>
         <xsl:call-template name="calculateFibonacci">
           <xsl:with-param name="index" select="$index - 1"/>
           <xsl:with-param name="low"   select="$high"/>
           <xsl:with-param name="high"  select="$high + $low"/>
         </xsl:call-template>
       </xsl:otherwise>
     </xsl:choose>
   </xsl:template>

   <xsl:template name="faultResponse">
     <xsl:param name="err_code"    select="0" />
     <xsl:param name="err_message" select="'Unspecified Error'"/>
     <methodResponse>
       <fault>
         <value>
           <struct>
             <member>
               <name>faultCode</name>
               <value>
                 <int><xsl:value-of select="$err_code"/></int>
               </value>
             </member>
             <member>
               <name>faultString</name>
               <value>
                 <string>
                   <xsl:value-of select="$err_message"/>
                 </string>
               </value>
             </member>
           </struct>
         </value>
       </fault>
     </methodResponse>
   </xsl:template>

 </xsl:stylesheet>

Although XSLT is Turing complete in a theoretical sense, in practical use it is missing a lot of the functionality you'd expect from a modern programming language, such as mathematical functions, I/O, and network access. To actually use this stylesheet as an XML-RPC server, we need to wrap it up in a Java program that can provide all this.
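As a preview of what the transformation step of such a wrapper might look like, here is a minimal sketch using the standard JAXP transformation API. It covers only the stylesheet invocation, not the networking a real XML-RPC server needs. The stylesheet is an abbreviated copy of Example 17.5 with the sanity check and fault template left out to keep the listing short, and the class and method names are illustrative assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FibonacciWrapper {
    // Abbreviated version of Example 17.5 (no sanity check, no fault template).
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='methodCall'>"
      + "<methodResponse><params><param><value>"
      + "<xsl:apply-templates select='params/param/value/int'/>"
      + "</value></param></params></methodResponse>"
      + "</xsl:template>"
      + "<xsl:template match='int'><int>"
      + "<xsl:call-template name='calculateFibonacci'>"
      + "<xsl:with-param name='index' select='number(.)'/>"
      + "</xsl:call-template>"
      + "</int></xsl:template>"
      + "<xsl:template name='calculateFibonacci'>"
      + "<xsl:param name='index'/>"
      + "<xsl:param name='low' select='1'/>"
      + "<xsl:param name='high' select='1'/>"
      + "<xsl:choose>"
      + "<xsl:when test='$index &lt;= 1'><xsl:value-of select='$low'/></xsl:when>"
      + "<xsl:otherwise>"
      + "<xsl:call-template name='calculateFibonacci'>"
      + "<xsl:with-param name='index' select='$index - 1'/>"
      + "<xsl:with-param name='low' select='$high'/>"
      + "<xsl:with-param name='high' select='$high + $low'/>"
      + "</xsl:call-template>"
      + "</xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms an XML-RPC request string into an XML-RPC response string.
    static String respond(String requestXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(requestXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String request =
            "<methodCall><methodName>calculateFibonacci</methodName>"
          + "<params><param><value><int>10</int></value></param></params></methodCall>";
        System.out.println(respond(request));
    }
}
```

In a server, the request string would come from the body of an HTTP POST and the returned string would be written back as the response body; both of those steps are outside what this sketch shows.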