Extending XSLT with Java | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

TrAX enables you to integrate XSLT code with Java programs. Most XSLT processors written in Java also let you go the other way, integrating Java code with XSLT stylesheets. The most common reason to do this is to provide access to operating system functionality that XSLT doesn't offer, such as querying a database, listing the files in a directory, or asking the user for more information with a dialog box. You can also use Java when you simply find it easier to implement some complex algorithm in imperative Java rather than in functional XSLT. For example, although you can do complicated string search and replace in XSLT, I guarantee you it will be about a thousand times easier in Java, especially with a good regular expression library. And finally, even though it might be relatively easy to implement a function in pure XSLT, you may choose to write it in Java anyway purely for performance reasons. This is especially true for mathematical functions such as factorial and Fibonacci numbers. XSLT optimizers are not nearly as mature or as reliable as Java optimizers, and they mostly focus on optimizing XPath search and evaluation on node-sets rather than on mathematical operations on numbers.
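To make the string-handling point concrete, here is a minimal sketch of the kind of small static method that could be exposed to a stylesheet. The class name StringExtensions and its method are hypothetical, chosen only for illustration, and the class is left package-private so the snippet compiles on its own; a real extension class would be public and live in a properly named package.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
// For real use, declare it public and give it a proper package so that
// an XSLT processor can load it.
class StringExtensions {

  // Replaces every match of the regular expression with the replacement.
  public static String replaceAll(String input, String regex,
                                  String replacement) {
    return Pattern.compile(regex).matcher(input).replaceAll(replacement);
  }

}
```

For instance, StringExtensions.replaceAll("2002-3-26", "-", "/") returns "2002/3/26".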

XSLT defines two mechanisms for integrating Java code into stylesheets: extension functions and extension elements. These are invoked in exactly the same way as built-in functions and elements such as document() and xsl:template. However, rather than being provided by the processor, they're written in Java. Furthermore, they have names in some non-XSLT namespace. The exact way such functions and elements are linked with the processor varies from processor to processor.

Regardless of which XSLT processor you're using, there are two basic parts to writing and using extension functions and elements:

Binding the extensions to the stylesheet. This is done via namespaces, class names, and the Java class path.
Mapping the five XSLT types (number, boolean, string, node-set, and result tree fragment) to Java types, and vice versa.

Extension Functions

Example 17.19 is a simple extension function that calculates Fibonacci numbers. It is a faster alternative to the earlier recursive template. The entire class is in the com.macfaq.math package. When writing extension functions and elements, you really have to use proper Java package naming and set up your class path appropriately.

Example 17.19 A Java Class That Calculates Fibonacci Numbers

    package com.macfaq.math;

    import java.math.BigInteger;

    public class FibonacciNumber {

      public static BigInteger calculate(int n) {
        if (n <= 0) {
          throw new IllegalArgumentException(
           "Fibonacci numbers are only defined for positive integers"
          );
        }
        BigInteger low  = BigInteger.ONE;
        BigInteger high = BigInteger.ONE;
        for (int i = 3; i <= n; i++) {
          BigInteger temp = high;
          high = high.add(low);
          low = temp;
        }
        return high;
      }

    }

Notice that there's nothing about XSLT in this example. This is just like any other Java class. On the Java side, all that's needed to make it accessible to the XSLT processor is to compile it and install the .class file in the proper place in the processor's class path.
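Because the class has no XSLT dependencies, it can be checked from plain Java before any stylesheet is involved. The following is a minimal sketch of such a check. FibonacciSmokeTest is a hypothetical name, and the calculate() method is repeated here without its package declaration and public modifier only so that the snippet compiles on its own.

```java
import java.math.BigInteger;

// Same algorithm as Example 17.19, repeated without the package
// declaration so that this sketch is self-contained.
class FibonacciNumber {

  public static BigInteger calculate(int n) {
    if (n <= 0) {
      throw new IllegalArgumentException(
       "Fibonacci numbers are only defined for positive integers");
    }
    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;
    for (int i = 3; i <= n; i++) {
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }
    return high;
  }

}

// Hypothetical smoke test; the class name is illustrative only.
class FibonacciSmokeTest {

  public static void main(String[] args) {
    System.out.println(FibonacciNumber.calculate(1));   // prints 1
    System.out.println(FibonacciNumber.calculate(10));  // prints 55
    // Far too large for an int or a long, hence BigInteger:
    System.out.println(FibonacciNumber.calculate(100));
  }

}
```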

If the extension function throws an exception, as calculate() will if it's passed zero or a negative number as an argument, then the XSLT processing will halt. XSLT has no way to catch and respond to exceptions thrown by extension functions. Consequently, if you want to handle them, you'll need to do so in the Java code. After catching the exception, you'll want to return something. Possibilities include:

A String that contains an error message
A NodeList that contains a fault document
An integer error code

This may not be the same type you normally return, so you'll probably need to declare that the method returns Object in order to gain the additional flexibility. For example, the following method returns an error message inside a String instead of throwing an exception:

    public static Object calculate(int n) {
      if (n <= 0) {
       return
        "Fibonacci numbers are only defined for positive integers";
      }
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;
      for (int i = 3; i <= n; i++) {
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      return high;
    }

This method returns -1 (an illegal value for a Fibonacci number) instead of throwing an exception:

    public static BigInteger calculate(int n) {
      if (n <= 0) return new BigInteger("-1");
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;
      for (int i = 3; i <= n; i++) {
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      return high;
    }

It would be up to the stylesheet to check for the error code before using the result, and to handle such a situation appropriately. In this example, that might require calling the extension function before any output is generated, storing the result in a variable, and deciding whether to output a successful response or a fault document based on the value of that variable. Waiting until the template for the int element is activated would be too late, because by that point, substantial parts of a successful response document have already been generated.

Now we need a stylesheet that uses this function to calculate Fibonacci numbers instead of the XSLT template. The details at this point are somewhat processor specific, so I will cover the two most popular: Saxon and Xalan. As you'll see, there are quite a few points of similarity between them (although I think Saxon's approach is the cleaner of the two). Most other processors are likely to use something similar.

Tip

Before spending a lot of time writing your own extension functions, check to see if the EXSLT library [http://www.exslt.org/] already has the extension function you need. EXSLT provides many useful extension functions and elements for working with dates and times, functions, math, strings, regular expressions, sets, and more. This library has been ported to many different processors in many platforms and languages. I used some of the date functions in the stylesheets for this book.

Extension Functions in Saxon

Saxon allows you to bind any Java class to a namespace prefix. The trick is to use the custom URI scheme java followed by a colon and the fully package-qualified name of the class. For example, the following attribute binds the namespace prefix fib to the com.macfaq.math.FibonacciNumber class:

 xmlns:fib="java:com.macfaq.math.FibonacciNumber"

As long as this mapping is in scope, you can invoke any static function in the com.macfaq.math.FibonacciNumber class by using the prefix fib and the name of the method. For example, the old template for the int element could be replaced by this one:

    <xsl:template match="int"
                  xmlns:fib="java:com.macfaq.math.FibonacciNumber">
      <int>
        <xsl:value-of select="fib:calculate(number(.))"/>
      </int>
    </xsl:template>

Here the number() function converts the value of the context node to an XSLT number. Then the processor looks for a static method named calculate() in the Java class mapped to the fib prefix that takes a single argument. It finds one, invokes it, and inserts the return value into the result tree.

XSLT is much more weakly typed than Java, and this can be useful when writing extension functions. Saxon will only invoke methods that have the right name and the right number of arguments; however, it will often convert the types of arguments and return values as necessary to make a function fit. Here, the calculate() method expects to receive an int, but an XSLT number is really more like a Java double. Because Saxon can't find a matching method that takes a double, it truncates the fractional part of the double to get an int and invokes the method that takes an int. This is a conversion that Java itself would not do without an explicit cast.
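In plain Java terms, the narrowing described above corresponds roughly to an explicit cast. The following sketch is only an illustration of that cast, not Saxon's actual implementation; the class and method names are hypothetical.

```java
// Hypothetical illustration; this is not code from Saxon.
class NumberNarrowing {

  // Mimics the conversion described in the text: an XSLT number
  // (a double) is narrowed to an int by discarding the fractional part.
  public static int toIntArgument(double xsltNumber) {
    return (int) xsltNumber;
  }

}
```

With this cast, 10.9 becomes 10 rather than 11. Note also that a Java cast turns NaN into 0, so it is worth testing how your processor treats a non-numeric argument rather than assuming it behaves the same way.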

Working in the opposite direction, the calculate() method returns a BigInteger, which is not equivalent to any of XSLT's types. Thus Saxon converts it to a string using its toString() method before inserting it into the result tree. Other more recognizable return types may be converted differently. For example, void is converted to an empty node-set, and primitive number types such as int and double are converted to XSLT numbers, as are type wrapper classes such as Integer and Double. A DOM NodeList is converted to an XPath node-set; however, the nodes in the list must all be created by Saxon's own DOM implementation. You can't use third-party DOM implementations such as Xerces or GNU JAXP in a Saxon extension function.

Tip

Normally, namespace mappings for extension functions and elements are relevant only in the stylesheet. Nonetheless, they often have an annoying habit of popping up in the output document. If you know that an extension element or function prefix will not be used in the output document (and 99 percent of the time you do know exactly this), then you can add an exclude-result-prefixes attribute to the stylesheet root element that contains a list of the namespace prefixes whose declarations should not be copied into the output document. For example,

    <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:fib="java:com.macfaq.math.FibonacciNumber"
      xmlns:saxon="http://icl.com/saxon"
      exclude-result-prefixes="fib saxon">

Instance Methods and Constructors in Saxon

XSLT is not an object-oriented language. Static methods fit much more neatly into its structures than do objects and instance methods. If I'm writing a method just for XSLT, I'll normally make it static, if at all possible. However, Saxon can use instance methods as extension functions too. As before, the fully package-qualified class name must be bound to a namespace prefix. The constructor for the class can be called using the special local function name new(). For example, the following template retrieves the current time using the Java Date class:

    <xsl:template name="currentTime"
                  xmlns:date="java:java.util.Date">
      <xsl:value-of select="date:new()"/>
    </xsl:template>

date:new() in XSLT is basically the same thing as new Date() in Java. When the Date constructor is invoked with no arguments, Java initializes the resulting Date object to the current time. You can also pass arguments to constructors, just like you can to static methods.

The object that the new() function returns is normally assigned to a variable. You can pass this variable to other extension functions as an argument. To invoke instance methods on that object, pass the variable that points to the object whose instance method you're invoking as the first argument to the instance method. This causes the normal first argument to get pushed over to become the second argument, the second argument to become the third, and so on. For example, the following template uses the GregorianCalendar class to get today's date. First it uses the static getInstance() method to return a GregorianCalendar object initialized to the current time. Then it passes the appropriate integer constants to the get() instance method to retrieve the month, day, and year. It produces the current date in the form 2002-3-26.

    <xsl:template name="today"
                  xmlns:cal="java:java.util.GregorianCalendar">
      <xsl:variable name="rightNow" select="cal:getInstance()" />
      <!-- The Calendar class uses zero-based months;
           i.e., January is month 0, February is month 1, and
           so on. We have to add one to get the customary month
           number. -->
      <xsl:variable name="month" select="cal:get($rightNow, 2) + 1" />
      <xsl:variable name="day" select="cal:get($rightNow, 5)" />
      <xsl:variable name="year" select="cal:get($rightNow, 1)" />
      <xsl:value-of
       select="$year" />-<xsl:value-of
       select="$month" />-<xsl:value-of
       select="$day" />
    </xsl:template>

Note

If I were writing this in Java rather than in XSLT, the code would look like this:

    Calendar rightNow = Calendar.getInstance();
    // Months are zero-based; i.e., January is month 0, February
    // is month 1, and so on. We have to add one to get the
    // customary month number.
    int month = rightNow.get(Calendar.MONTH) + 1;
    int date  = rightNow.get(Calendar.DATE);
    int year  = rightNow.get(Calendar.YEAR);
    String result = year + "-" + month + "-" + date;

However, Saxon doesn't support extension fields, so XSLT must use the actual constant key values instead of the named constants.

If you absolutely must use the value of a field (for example, because a method expects an instance of the type-safe enum pattern instead of an int constant), you can always write an extension function whose sole purpose is to return the relevant field.
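A minimal sketch of such field-returning functions follows, using the Calendar constants from the template above. CalendarFields and its method names are hypothetical; the class is left package-private so the snippet compiles on its own, but it would need to be public and on the class path before a processor could call it.

```java
import java.util.Calendar;

// Hypothetical helper whose only job is to expose static fields
// as zero-argument methods that a stylesheet can call.
class CalendarFields {

  // Returns the field number for the month (Calendar.MONTH).
  public static int month() {
    return Calendar.MONTH;
  }

  // Returns the field number for the day of the month (Calendar.DATE).
  public static int dayOfMonth() {
    return Calendar.DATE;
  }

  // Returns the field number for the year (Calendar.YEAR).
  public static int year() {
    return Calendar.YEAR;
  }

}
```

Assuming a prefix such as f were bound to this class, a stylesheet could then write something like cal:get($rightNow, f:month()) + 1 instead of hard-coding the constant 2.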

Extension Functions in Xalan

Xalan's extension function mechanism is a little more complicated and a little more powerful than Saxon's, but not a great deal more. Xalan offers somewhat greater access to the XSLT context inside extension functions if you need it, and has some additional shortcuts for mapping Java classes to namespace prefixes. Most important, it allows extension functions to work with any compliant DOM2 implementation, rather than requiring its own custom DOM.

Xalan uses the custom URI scheme xalan to bind namespace prefixes to classes. To bind a Java class to a namespace prefix in Xalan, you add an attribute of the form xmlns:prefix="xalan://packagename.classname" to the root element of the stylesheet or some other ancestor element. For example, the following attribute binds the namespace prefix fib to the com.macfaq.math.FibonacciNumber class:

 xmlns:fib="xalan://com.macfaq.math.FibonacciNumber"

As long as this mapping is in scope, you can invoke any static function in the com.macfaq.math.FibonacciNumber class by using the prefix fib and the name of the method. For example, the pure XSLT template for the int element could be replaced by this one:

    <xsl:template match="int"
        xmlns:fib="xalan://com.macfaq.math.FibonacciNumber">
      <int>
        <xsl:value-of select="fib:calculate(number(.))"/>
      </int>
    </xsl:template>

Xalan also allows you to define a namespace prefix for the entire Java class library by associating it with the URI http://xml.apache.org/xslt/java. The function calls must then use fully qualified class names. For example, the following template uses the prefix java to identify extension functions:

    <xsl:template match="int"
        xmlns:java="http://xml.apache.org/xslt/java">
      <int>
        <xsl:value-of select=
         "java:com.macfaq.math.FibonacciNumber.calculate(number(.))"
        />
      </int>
    </xsl:template>

This form is convenient if your stylesheets use many different classes. It of course is not limited to classes you write yourself. It works equally well for classes from the standard library and third-party libraries. For example, following is a random template that uses Java's Math.random() method:

    <xsl:template name="random"
                  xmlns:java="http://xml.apache.org/xslt/java">
      <xsl:value-of select="java:java.lang.Math.random()" />
    </xsl:template>

Constructors and Instance Methods in Xalan

Xalan can use instance methods as extension functions too. The new() function invokes the constructor for the class and can take whatever arguments the constructor requires. For example, the following template retrieves the current time using the Java Date class:

    <xsl:template name="currentTime"
                  xmlns:java="http://xml.apache.org/xslt/java">
      <xsl:value-of select="java:java.util.Date.new()"/>
    </xsl:template>

If the prefix is bound to a specific class, you can omit the class name. For example,

    <xsl:template name="currentTime"
                  xmlns:date="xalan://java.util.Date">
      <xsl:value-of select="date:new()"/>
    </xsl:template>

The object the new() function returns can be assigned to an XSLT variable, which then can be passed as an argument to other extension functions or used to invoke instance methods on the object. As in Saxon, to invoke an instance method you pass the object whose method you're invoking as the first argument to the method. For example, following is the Xalan version of the GregorianCalendar template that produces the current date in the form 2002-3-26.

    <xsl:template name="today"
                  xmlns:cal="xalan://java.util.GregorianCalendar">
      <xsl:variable name="rightNow" select="cal:getInstance()" />
      <!-- The GregorianCalendar class counts months from zero
           so we have to add one to get the customary number -->
      <xsl:variable name="month" select="cal:get($rightNow, 2) + 1" />
      <xsl:variable name="day" select="cal:get($rightNow, 5)" />
      <xsl:variable name="year" select="cal:get($rightNow, 1)" />
      <xsl:value-of
       select="$year" />-<xsl:value-of
       select="$month" />-<xsl:value-of
       select="$day" />
    </xsl:template>

Like Saxon, Xalan does not permit access to fields in a class, so once again it's necessary to use the actual values instead of the named constants for the arguments to the get() method.

Exceptions thrown by extension functions have the same results in Xalan as in Saxon; that is, the XSLT processing halts, possibly in the middle of transforming a document. Once again, it's probably a good idea to design your extension functions so that they handle all probable exceptions internally and always return a sensible result.
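One way to follow that advice is to wrap the real computation in a method that never lets an exception escape. The sketch below is an assumption-laden illustration, not the book's code: SafeFibonacci is a hypothetical class, it takes its argument as a string, and it reports every problem through a single fixed error message. It is package-private only so that the snippet compiles on its own.

```java
import java.math.BigInteger;

// Hypothetical wrapper that always returns a string and never throws.
class SafeFibonacci {

  private static final String ERROR =
   "error: index must be a positive integer";

  // Returns the Fibonacci number as a string, or an error message
  // if the argument is missing, non-numeric, or not positive.
  public static String calculate(String index) {
    if (index == null) return ERROR;
    try {
      int n = Integer.parseInt(index.trim());
      return fibonacci(n).toString();
    }
    catch (IllegalArgumentException e) {
      // NumberFormatException is a subclass of
      // IllegalArgumentException, so both cases land here.
      return ERROR;
    }
  }

  // Same iterative algorithm as Example 17.19.
  private static BigInteger fibonacci(int n) {
    if (n <= 0) {
      throw new IllegalArgumentException(ERROR);
    }
    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;
    for (int i = 3; i <= n; i++) {
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }
    return high;
  }

}
```

The stylesheet still has to test the returned string for the error prefix before using it, just as it would have to test for an error code.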

Type Conversion in Xalan

Xalan converts method arguments and return types between Java and XSLT types in a mostly intuitive way. Table 17.1 lists the conversions from XSLT's five types to Java types in order of preference:

Table 17.1. Xalan Conversions from XSLT to Java

XSLT Type	Java Types, in Decreasing Order of Preference
Node-set	`org.w3c.dom.traversal.NodeIterator, org.w3c.dom.NodeList, org.w3c.dom.Node, String, Object, char, double, float, long, int, short, byte, boolean`
String	`String, Object, char, double, float, long, int, short, byte, boolean`
Boolean	`boolean, Boolean, Object, String`
Number	`double, Double, float, long, int, short, char, byte, boolean, String, Object`
Result tree fragment	`org.w3c.dom.traversal.NodeIterator, org.w3c.dom.NodeList, org.w3c.dom.Node, String, Object, char, double, float, long, int, short, byte, boolean`

Working in the reverse direction from Java to XSLT, the conversions are fairly obvious. Table 17.2 summarizes them. In addition to the ones listed here, other object types will normally be converted to a string using their toString() method if they're actually de-referenced somewhere in the stylesheet. However, their original type will be maintained when they're passed back to another extension function.

Table 17.2. Xalan Conversions from Java to XSLT

Java Type	Xalan XSLT Type
`org.w3c.dom.traversal.NodeIterator`	Node-set
`org.apache.xml.dtm.DTM`	Node-set
`org.apache.xml.dtm.DTMAxisIterator`	Node-set
`org.apache.xml.dtm.DTMIterator`	Node-set
`org.w3c.dom.Node` and its subtypes ( `Element` , `Attr` , etc.)	Node-set
`org.w3c.dom.DocumentFragment`	Result tree fragment
`String`	String
`Boolean`	Boolean
`Number` and its subclasses ( `Double` , `Integer` , etc.)	Number
`double`	Number
`float`	Number
`int`	Number
`long`	Number
`short`	Number
`byte`	Number
`char`	Object
`boolean`	Boolean
`null`	Empty string
`void`	Empty string
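As an illustration of the Node row in Table 17.2, the following sketch builds a one-element DOM tree with the standard JAXP API and returns it, which Xalan would then treat as a node-set. FaultBuilder, its method, and the fault element name are hypothetical. The method is declared to return Object so that it can fall back to a plain string if no DOM parser can be configured.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical extension class; for real use it would be public
// and placed in a package on the processor's class path.
class FaultBuilder {

  // Returns a <fault> element wrapping the message, or the bare
  // message string if a DOM document cannot be created.
  public static Object fault(String message) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      Element fault = doc.createElement("fault");
      fault.appendChild(doc.createTextNode(message));
      return fault;
    }
    catch (ParserConfigurationException e) {
      // Never let the exception reach the stylesheet.
      return message;
    }
  }

}
```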

Expression Context in Xalan

There is one thing Xalan extension functions can do that Saxon extension functions can't do. A Xalan extension function can receive the current XSLT context as an argument. This provides information about the context node, the context node position, the context node list, and variable bindings. Admittedly, needing to know this information inside an extension function is rare. Most operations that consider the current context are more easily implemented in XSLT than in Java. Nonetheless, if you need to know this for some reason, you can declare that the initial argument to your function has type org.apache.xalan.extensions.ExpressionContext; for example,

    public static Node findMaximum(ExpressionContext context)

You do not need to pass an argument of this type explicitly. Xalan will create an ExpressionContext object for you and pass it to the method automatically. Furthermore, Xalan will always pick a method that takes an ExpressionContext over one that does not.

This Xalan-J ExpressionContext interface, shown in Example 17.20, provides methods to get the context node and the context node list, to convert a node into either its string or number value (as defined by the XPath string() and number() functions), and to get the XPath object bound to a known variable or parameter.

Example 17.20 The Xalan ExpressionContext Interface

    package org.apache.xalan.extensions;

    import org.apache.xpath.objects.XObject;
    import org.w3c.dom.Node;
    import org.w3c.dom.traversal.NodeIterator;

    public interface ExpressionContext {

      public Node         getContextNode();
      public NodeIterator getContextNodes();
      public double       toNumber(Node n);
      public String       toString(Node n);
      public XObject      getVariableOrParam(
       org.apache.xml.utils.QName qualifiedName)
       throws javax.xml.transform.TransformerException;

    }

Extension Elements

An extension element is much like an extension function. In the stylesheet, however, it appears as an entire element such as <saxon:script/> or <redirect:write /> rather than as a mere function in an XPath expression contained in a select or test attribute. Any value it returns is placed directly in the result tree.

For example, suppose you wanted to define a fibonacci element like this one:

    <fib:fibonacci xmlns:fib="java:com.macfaq.math.FibonacciNumber">
      10
    </fib:fibonacci>

When processed, this element would be replaced by the specified Fibonacci number.

The first question is how the XSLT processor should recognize this as an extension element. After all, fib:fibonacci looks just like a literal result element that should be copied verbatim. The answer is that the xsl:stylesheet root element (or some other ancestor element) should have an extension-element-prefixes attribute containing a whitespace-separated list of namespace prefixes that identify extension elements. For example, the following stylesheet uses the saxon and fib prefixes for extension elements:

    <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:saxon="http://icl.com/saxon"
      xmlns:fib="java:com.macfaq.math.FibonacciNumber"
      extension-element-prefixes="saxon fib">
      <!-- ... -->
    </xsl:stylesheet>

Because you can't be sure which extension elements are likely to be available across processors, it's customary to include one or more xsl:fallback elements as children of each extension element. Each such element contains a template that is instantiated if and only if the parent extension element can't be found. Example 17.21 demonstrates a stylesheet that attempts to use the fib:fibonacci extension element. If that element cannot be found, then a pure XSLT solution is used instead.

Example 17.21 A Stylesheet That Uses an Extension Element

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:fib="http://namespaces.cafeconleche.org/fibonacci"
      extension-element-prefixes="fib">

      <!-- I deleted the validation code from this stylesheet to
           save space, but it would be easy to add back in
           for production use. -->

      <xsl:template match="/methodCall">
        <methodResponse>
          <params>
            <param>
              <value>
                <xsl:apply-templates select="params/param/value" />
              </value>
            </param>
          </params>
        </methodResponse>
      </xsl:template>

      <xsl:template match="value">
        <int>
          <fib:fibonacci>
            <xsl:value-of select="number(.)"/>
            <xsl:fallback>
              <!-- This template will be called only if the
                   fib:fibonacci code can't be loaded. -->
              <xsl:call-template name="calculateFibonacci">
                <xsl:with-param name="index" select="number(.)" />
              </xsl:call-template>
            </xsl:fallback>
          </fib:fibonacci>
        </int>
      </xsl:template>

      <xsl:template name="calculateFibonacci">
        <xsl:param name="index"/>
        <xsl:param name="low"  select="1"/>
        <xsl:param name="high" select="1"/>
        <xsl:choose>
          <xsl:when test="$index &lt;= 1">
            <xsl:value-of select="$low"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="calculateFibonacci">
              <xsl:with-param name="index" select="$index - 1"/>
              <xsl:with-param name="low"   select="$high"/>
              <xsl:with-param name="high"  select="$high + $low"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>

    </xsl:stylesheet>

Alternatively, you can pass the namespace-qualified name of the extension element to the element-available() function to figure out whether or not the extension is available. For example,

    <xsl:template match="value">
      <int>
        <xsl:choose>
          <xsl:when test="element-available('fib:fibonacci')">
            <fib:fibonacci>
              <xsl:value-of select="number(.)"/>
            </fib:fibonacci>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="calculateFibonacci">
              <xsl:with-param name="index" select="number(.)" />
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </int>
    </xsl:template>

From this point on, the exact details of how to code the extension element in Java are implementation dependent. Consult the documentation for your XSLT processor to learn how to write an extension element and install it. You will not be able to use preexisting methods and classes as extension elements; rather, you will need to custom code extension elements to fit in with the processor's own code.

Caution

Writing an extension element is much more complex than writing an extension function. It requires intimate knowledge of and interaction with the XSLT processor. If at all possible, it's advisable to use an extension function, perhaps one that returns a node-set, instead of an extension element.