Extension Functions | NetBeans™ IDE Field Guide: Developing Desktop, Web, Enterprise, and Mobile Applications (2nd Edition)

Extending the library of functions that can be called from XPath expressions has proved to be by far the most important way in which vendors extend the capability of the language, and so we will concentrate most of our attention on this particular extensibility mechanism.

When are Extension Functions Needed?

There are a number of reasons you might want to call an extension function from your stylesheet:

You might want to get data held externally, perhaps in a database or in an application.
You may need to access system services that are not directly available in XSLT or XPath. For example, you might want to use a random number generator, or append a record to a log file.
You might want to perform a complex calculation that is cumbersome to express in XSLT, or that performs poorly. For example, if you are generating SVG graphics, you might need to use trigonometric functions such as sin() and cos(). This situation arises far less with XSLT 2.0 than it did in 1.0, because the core function library is so much richer, especially in its ability to do string manipulation and date/time arithmetic. But if the function you need is out there in some Java library, it's no crime to call it.
A more questionable use of external functions is to get around the "no side effects" rule in XSLT, for example to update a counter. Avoid this if you can; if you need such facilities, then you haven't yet learned to think about solving problems in the way that is natural for XSLT. More on this in the next chapter.

There are two ways of using extension functions in XSLT. You can write your own extension functions, or you can call extension functions that already exist. These functions might be provided by your XSLT vendor, or they might come from a third-party library such as the EXSLT library found at http://www.exslt.org/ (many EXSLT functions provide capabilities that are no longer needed in 2.0, but some of them, such as the mathematical functions, are still very relevant).

Many vendors designed the interfaces for Java and JavaScript so that the extensive class libraries available in both these languages would be directly accessible to the stylesheet, with no further coding required. This is certainly true for mathematical functions, string manipulation, and date handling. Which language you choose to use to write extension functions is a matter of personal choice, though it will be heavily constrained by the XSLT processor you are using. With a Java-based processor such as Saxon or Xalan-J, the natural choice is to write extension functions in Java. With Microsoft processors, the natural choice is a .NET language such as C#. If you are using the 4XSLT processor, it is probably because your favorite language is Python. Processors written in C or C++ tend to require a more complex procedure for linking extension functions, if they are supported at all.

When are Extension Functions not Needed?

There is probably a tendency for newcomers to XSLT to write extension functions simply because they haven't worked out how to code the logic in an XSLT stylesheet function. Slipping back into a programming language you have used for years, rather than battling with an unfamiliar one, is always going to be tempting when you have deadlines to meet. It's understandable, but it's not the right thing to do.

There are other wrong reasons for using extension functions. These include:

Believing that an XSLT implementation of the logic is bound to be slower: Don't believe this until you have proved it by measurement. Dimitre Novatchev has written XSLT functions that do numerical computations in XSLT (calculating square roots, for example) and has found that the performance can be quite acceptable. In some cases, the XSLT logic is intrinsically slower, but this may not matter, because it avoids the overhead of switching languages.
Supplying external data to the stylesheet: The best way to supply information to the stylesheet is in the form of a stylesheet parameter. Another good way is to provide the data in the form of an XML document, in response to a call on the document() function (many processors allow you to write logic that intercepts the URI supplied to the document() function, or you could use a URI that invokes a servlet or a Web service).
Achieving side effects: There are some side effects that are reasonably acceptable, for example writing messages to a log file; these are basically actions that do not affect the subsequent processing of the stylesheet, where the order of events is not critically important. But trying to get round the no-side-effects rule in other ways is nearly always the wrong thing to do, though it can be very tempting. Sooner or later the optimizer will rearrange your code in a way that stops your extension function working.
Using XSLT as a job control language: I have seen stylesheets that consist entirely of calls to external services, effectively using XSLT as a scripting language to invoke a sequence of external tasks. XSLT wasn't designed for this role, and the fact that the order of execution in XSLT is undefined makes it a very poor choice of tool for this job. Use a shell scripting language, or the Apache Ant build tool.

Calling Extension Functions

Extension functions are always called from within an XPath expression. A typical function call looks like this:

  my:function($arg1, 23, string(title))

The name of an extension function will always contain a namespace prefix and a colon. The prefix («my» in this example) must be declared in a namespace declaration on some containing element in the stylesheet, in the usual way. The function may take any number of arguments (zero or more), and the parentheses are needed even if there are no arguments. The arguments can be any XPath expressions; in our example, the first argument is a variable reference, the second is a number, and the third is a function call. The arguments are passed to the function by value, which means that the function can never modify the values of the arguments (though if you pass nodes, the function may be able to modify the contents of the nodes). The function always returns a result.

We'll have more to say about the data types of the arguments, and the data type of the result, in due course.

What Language is Best?

Many processors offer only one language for writing extension functions (if indeed they allow extension functions at all) so the choice may already be made for you. Some processors offer a choice, for example Xalan supports both Java and JavaScript, while Microsoft supports any of the usual scripting languages, for example JScript and VBScript.

Generally, I'd suggest using the native language for your chosen processor: for example Java for Oracle, Saxon, and Xalan-J; JScript for MSXML3; Python for 4XSLT. If you want to use your stylesheet with more than one processor, write one version of the extension function for each language.

Client-Side Script

If you are generating HTML pages, your stylesheet can put anything it likes in the HTML page that it is generating. This includes <script> elements containing JavaScript code to be executed when the HTML page is displayed.

Don't get confused between this kind of script, and script that is executed in your stylesheet during the course of the transformation. It's especially easy to get the two confused when the transformation itself is running within the browser. Remember that stylesheet extension functions are always called using function calls in XPath expressions, while you are generating the HTML to be displayed. HTML <script> is always called in response to browser events such as the user clicking on a button.

Binding Extension Functions

When you call «my:function()» from within an XPath expression, the XSLT processor needs to find a suitable function to call. This process is called binding. The XSLT specification does not define how the binding is done, but two mechanisms have become popular, which I call explicit binding and implicit binding. Explicit binding uses a top-level declaration in the stylesheet (under a vendor-specific namespace) to define where specific functions or collections of functions are to be found, while implicit binding relies solely on the name of the function, typically using the namespace URI to identify a collection of functions, and the local name to identify the specific function within that collection.

Explicit Binding

Two popular processors that use an explicit binding technique are MSXML and Xalan. It's also available in Saxon, but rarely used.

MSXML uses a special top-level element <msxsl:script> to define extension functions, which may be written in a variety of languages, though JavaScript is the most popular.

Here is an example stylesheet that uses an extension written in VBScript, just to be different. The implementation of the function is written inline within the <msxsl:script> element.

Using VBScript in an MSXML3 Stylesheet

This example shows a stylesheet that converts dimensions in inches to the equivalent in millimeters.

Source

The source file is inches.xml. Double-click on it in Windows Explorer to invoke the stylesheet.

  <?xml version="1.0" encoding="iso-8859-1"?>
  <?xml-stylesheet type="text/xsl" href="to-mm.xsl"?>
  <dimensions>
  The size of the picture is <inches>5</inches> by
  <inches>12</inches>.
  </dimensions>

Stylesheet

The stylesheet is to-mm.xsl.

It contains a simple VBScript function within an <msxsl:script> element, and invokes this as an extension function from the template rule for the <inches> element.

  <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="1.0"
      xmlns:extra="urn:extra-functions">
  <msxsl:script xmlns:msxsl="urn:schemas-microsoft-com:xslt"
                language="VBScript"
                implements-prefix="extra">
  Function ToMillimetres(inches)
    ToMillimetres = inches * 25.4
  End Function
  </msxsl:script>
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html><body><p>
      <xsl:apply-templates/>
    </p></body></html>
  </xsl:template>
  <xsl:template match="inches">
    <xsl:text> </xsl:text>
    <xsl:value-of select="format-number(extra:ToMillimetres(number(.)), '0.00')"/>
    <xsl:text>mm </xsl:text>
  </xsl:template>
  </xsl:stylesheet>

Output

The following text is displayed in the browser:

  The size of the picture is 127.00mm by 304.80mm.

Note that this doesn't work in Netscape, which doesn't recognize the <msxsl:script> element.

These scripts can call COM objects named in the system registry in the usual way. However, if the stylesheet is running in the browser, the user's security settings may prevent your script from instantiating a client-side object.

Of course, this example uses an extension function to do something that could be trivially done within an XSLT 2.0 stylesheet function.

JavaScript and VBScript are both dynamically typed languages, which made them a good fit with XSLT 1.0 and XPath 1.0. However, it's easy to get tripped up by the fact that the function calling conventions aren't always what you expect. For example, both in XPath 1.0 and in 2.0, if a function in the core library such as starts-with() expects a string, then you can supply an attribute node, and the value of the attribute will be extracted automatically (in XPath 2.0 this process is called atomization). JavaScript doesn't declare the types of function parameters, which means that no such conversion is possible: if you supply an attribute node, that's what the JavaScript code will see, and if it was expecting a string, it will probably fail.

The Xalan product also supports JavaScript extension functions using a similar mechanism, but this time the binding element is <xalan:script>, where the «xalan» prefix represents the URI http://xml.apache.org/xslt. Several <xalan:script> elements can be grouped together in a <xalan:component> element.

There's nothing to stop you having an <msxsl:script> declaration and a <xalan:script> declaration in the same stylesheet. An XSLT processor is required to ignore top-level declarations in an unknown namespace, so each processor will ignore the declaration that's intended for the other. This means you can have two implementations of the same extension function in your stylesheet, one for use when you're running MSXML, another for use when running Xalan.

Implicit Binding

Most of the Java XSLT processors (Saxon, Xalan, jd.xslt, Oracle, xt) also support an implicit binding of extension functions to Java methods. This is generally based on the idea that the namespace URI used in the function identifies the Java class, and the local name of the function corresponds to the method name.

For example, the following stylesheet can be used in Saxon to calculate a square root.

An Extension Function to Calculate a Square Root

This example shows a stylesheet that returns the square root of a number in the source document.

Source

The source document is sqrt.xml:

  <number>2.0</number>

Stylesheet

The stylesheet is sqrt.xsl.

  <xsl:transform
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      exclude-result-prefixes="xs"
      version="2.0">
  <xsl:template match="number">
    <result>
      <xsl:value-of select="Math:sqrt(xs:double(.))"
                    xmlns:Math="java:java.lang.Math"/>
    </result>
  </xsl:template>
  </xsl:transform>

Output

  <?xml version="1.0" encoding="UTF-8"?>
  <result>1.4142135623730951</result>

This stylesheet calls an external function Math:sqrt(), where the namespace prefix «Math» is bound to the namespace URI «java:java.lang.Math». Saxon recognizes namespace URIs beginning with «java:» as special: the part of the URI after «java:» is interpreted as a Java class name. The processor loads this Java class and looks to see whether it contains a static method called «sqrt» that can take an argument that is a double. It does, so this method is called, and the result is taken as the return value from the function call.
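The lookup just described can be sketched with a few lines of Java reflection. This is a minimal illustration that assumes the processor accepts only a public static method with a single double parameter. The class ImplicitBinding and its methods are hypothetical, and real processors such as Saxon apply far more elaborate matching and conversion rules.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical sketch of implicit binding: the namespace URI names a Java class,
// and the local name of the function names a static method in that class.
class ImplicitBinding {

    // Find a public static method that could implement prefix:localName(xs:double),
    // where the prefix is bound to a URI such as "java:java.lang.Math".
    static Method bind(String namespaceUri, String localName) {
        if (!namespaceUri.startsWith("java:")) {
            return null; // not a URI this binding scheme recognizes
        }
        Class<?> cls;
        try {
            // Everything after "java:" is treated as a fully qualified class name.
            cls = Class.forName(namespaceUri.substring("java:".length()));
        } catch (ClassNotFoundException e) {
            return null;
        }
        for (Method m : cls.getMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (Modifier.isStatic(m.getModifiers())
                    && m.getName().equals(localName)
                    && params.length == 1
                    && params[0] == double.class) {
                return m;
            }
        }
        return null; // a real processor would report an error instead
    }

    // Evaluate the call, as the processor would when it reaches Math:sqrt(...).
    static double call(String namespaceUri, String localName, double arg) {
        Method m = bind(namespaceUri, localName);
        if (m == null) {
            throw new IllegalArgumentException("No binding for " + localName);
        }
        try {
            // A null target is allowed because the method is static.
            return ((Number) m.invoke(null, arg)).doubleValue();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call("java:java.lang.Math", "sqrt", 2.0)); // prints 1.4142135623730951
    }
}
```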

Although each of the Java XSLT processors supports implicit bindings of Java methods to extension functions in much this kind of way, the details vary from one processor to another, and it may be difficult to write code that is completely portable across processors. In particular, processors are likely to vary in how they map between the XPath data types and Java data types. This is especially true if the Java class contains several methods of the same name, but with different argument types (method overloading). To find out the detail of how each processor handles this, you will need to consult the documentation for your specific product.
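Overloading is easy to see in practice. The following sketch uses reflection to list the public static methods of java.lang.Math named abs; the class and method names are hypothetical. Separate overloads exist for int, long, float, and double, so for a call such as Math:abs($x) the processor has to choose one according to its own rules for matching XPath types to Java types.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Sketch: list every public static overload of a method, to show the choices
// an implicit-binding processor faces when only the name is given.
class OverloadDemo {

    static List<String> overloads(Class<?> cls, String name) {
        List<String> sigs = new ArrayList<>();
        for (Method m : cls.getMethods()) {
            if (Modifier.isStatic(m.getModifiers()) && m.getName().equals(name)) {
                StringBuilder sb = new StringBuilder(name).append('(');
                Class<?>[] p = m.getParameterTypes();
                for (int i = 0; i < p.length; i++) {
                    if (i > 0) {
                        sb.append(", ");
                    }
                    sb.append(p[i].getSimpleName());
                }
                sigs.add(sb.append(')').toString());
            }
        }
        sigs.sort(null); // natural order, so output does not depend on reflection order
        return sigs;
    }

    public static void main(String[] args) {
        // Math:abs($x) could plausibly bind to any of these.
        overloads(Math.class, "abs").forEach(System.out::println);
    }
}
```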

Most of what I've said so far about extension functions applies equally to XSLT 1.0 and XSLT 2.0. In fact, most of the processors mentioned do not yet have an XSLT 2.0 version. So it remains to be seen how vendors will tackle the challenge of mapping the much richer type system in XSLT 2.0 to Java classes.

Many XSLT 1.0 processors allow a call on a Java method to return a wrapped Java object, which can then be supplied as an argument to another extension function. For example, you might have a function sql:connect() that returns an object of type "SQL connection," and another function sql:query() that accepts a SQL connection as its first argument. In XSLT 1.0, with its limited type system, this object is generally modeled using a single extra data type "external object." With XSLT 2.0, it is possible to go further than this and implicitly import any number of user-defined types into the stylesheet. Saxon takes this to its logical extreme, and implicitly imports the whole of the Java class hierarchy, mapping class names into the namespace http://saxon.sf.net/java-type. The result is that (assuming the prefix «class» is bound to this namespace) you can declare a variable such as:

  <xsl:variable name="connection" as="class:java.sql.Connection"
                select="sql:connect(...)"/>

This means that these external objects can be used with complete type safety, because the Java class hierarchy has been mapped to the XSLT/XPath type hierarchy.

Generally, processors map the common data types into their obvious equivalent in the external programming language. For example, in Java, an xs:double maps to a Java double; an xs:string to a String; an xs:boolean to a Java boolean; and so on. The xs:integer type is a little tricky because XML Schema doesn't define its maximum range; Saxon maps it to a Java long, but other products may make a different choice, for example «java.math.BigInteger». Bindings for the more common data types are defined in the Java Architecture for XML Binding (JAXB, see http://java.sun.com/xml/downloads/jaxb.html) and it's quite likely that these will be adopted by XSLT vendors, though they were not originally defined for that purpose. This doesn't currently cover the less common types like xs:gYearMonth, where you can certainly expect variations between products. However, there is an initiative in the Java Community Process to complete the mappings between Java classes and XML Schema data types (watch out for a package called javax.xml.datatype in JDK 1.5) and this may lead to further harmonization between XSLT processors.
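The practical consequence of the xs:integer choice is easy to demonstrate. In this sketch, the class IntegerMapping and its two conversion methods are hypothetical and are not any product's API. A value just beyond the 64-bit range converts cleanly to java.math.BigInteger but cannot be represented as a long. BigInteger.longValueExact() reports the overflow with an ArithmeticException instead of silently wrapping around.

```java
import java.math.BigInteger;

// Sketch: two possible Java targets for an xs:integer argument.
class IntegerMapping {

    // Target type long: fails for values outside the 64-bit range.
    static long toLong(String lexical) {
        return new BigInteger(lexical.trim()).longValueExact(); // ArithmeticException on overflow
    }

    // Target type BigInteger: any xs:integer value is representable.
    static BigInteger toBigInteger(String lexical) {
        return new BigInteger(lexical.trim());
    }

    public static void main(String[] args) {
        String big = "9223372036854775808"; // Long.MAX_VALUE + 1
        System.out.println(toBigInteger(big));
        try {
            toLong(big);
        } catch (ArithmeticException e) {
            System.out.println("value does not fit in a Java long");
        }
    }
}
```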

With XSLT 1.0 (and in the draft XSLT 1.1 specification) most XSLT processors naturally followed the weak typing approach of implicitly converting the supplied parameters in the XPath function call to the required type declared in the Java method. Current releases of Saxon still follow this approach. However, it would be more logical to switch to a stricter model aligned with the XPath 2.0 function calling rules, where only very limited conversions between the supplied value and the required type are supported.
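Here is an example of what implicit conversion involves. The hypothetical helper below converts a supplied string to a Java double using the XPath 1.0 rules for number(): optional surrounding whitespace, an optional minus sign, and digits with an optional decimal point. Anything else becomes NaN rather than an error. It is deliberately stricter than Java's Double.parseDouble(), which also accepts exponents, a leading plus sign, and strings such as "Infinity". A weakly typed binding layer has to pick one set of rules or the other.

```java
import java.util.regex.Pattern;

// Hypothetical helper: XPath 1.0 string-to-number conversion, as a weakly typed
// binding might apply it before calling a Java method that expects a double.
class XPath1Number {

    // XPath 1.0: Number ::= Digits ('.' Digits?)? | '.' Digits, with an optional leading '-'.
    private static final Pattern NUMBER = Pattern.compile("-?(\\d+(\\.\\d*)?|\\.\\d+)");

    static double toNumber(String s) {
        // Strip leading and trailing XML whitespace (space, tab, CR, LF).
        String t = s.replaceAll("^[ \\t\\r\\n]+|[ \\t\\r\\n]+$", "");
        if (!NUMBER.matcher(t).matches()) {
            return Double.NaN; // no error: anything unrecognized becomes NaN
        }
        return Double.parseDouble(t);
    }

    public static void main(String[] args) {
        System.out.println(toNumber(" 2.5 ")); // prints 2.5
        System.out.println(toNumber("1e3"));   // prints NaN: no exponents in XPath 1.0
    }
}
```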

When the values passed to an extension function are nodes, rather than atomic values, the data mapping issues become more complicated. The accepted standard for manipulating XML trees in most languages is the DOM, and it's likely that many processors will offer extension functions the ability to manipulate nodes using the DOM interface, even though the DOM does not match the XSLT/XPath data model particularly well. This is discussed in the next section.

XPath Trees and the DOM

We haven't got space in this book for a detailed description of the DOM interface, but most readers will already have come across it in some form, and it is described in detail in most good books on XML. The DOM provides an object model (and therefore an API) for navigating and manipulating XML data in tree form. Many XSLT processors allow extension functions to access nodes, using the methods defined in the DOM API.

If you want extension functions to access the XSLT source tree, or a secondary input tree that was loaded using the document() function, or even a temporary tree constructed during the course of the XSLT transformation, then you can generally do this by passing a node as one of the function arguments. The extension function can then manipulate this node, and other related nodes such as its children and parent, as objects in a DOM structure. It may also be possible for an extension function to construct a new tree, and return it (typically as a DOM Document object) to the calling XPath expression, where it can be manipulated as a secondary input tree in the same way as the result of the XSLT document() function. Some products may also allow a DOM that's passed to an extension function to be modified in situ; this is definitely a dubious practice, because it creates a dependency on order of execution, but it's not absolutely prohibited.

The only problem with using the DOM in this way is that there are many small but significant differences between the tree model used by XSLT and XPath, and the tree model defined in the DOM specification. For example, the DOM exposes entity references and CDATA sections; the XPath model doesn't.

This is exacerbated by the fact that there are two different implementation approaches adopted by XSLT vendors: both are perfectly valid and both need to be catered for. Some products, such as Microsoft MSXML3, are DOM oriented. This processor uses a DOM as its internal tree model, and provides the XPath model as a virtual data structure (a view or wrapper) on top of this. This means, for example, that CDATA sections will be physically present on the tree, and XPath operations such as following-sibling will dynamically merge the CDATA contents with the surrounding text nodes. When an extension function is called, such a product will present the native underlying DOM to the called function, CDATA nodes and all. Other products (Saxon is an example, as is the XSLT processor in Microsoft .NET) use an internal data structure that is closely aligned to the XPath model described in Chapter 2. This data structure will have discarded any information that is not needed for XPath processing, such as CDATA sections and entity references. When an external function is called, the situation is now reversed; such a product will provide the DOM interface as a wrapper on top of the native XPath model.
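The two tree shapes are easy to reproduce with the DOM parser built into the JDK. In this sketch (the class name is hypothetical), the same document is parsed twice. With the default settings, the CDATA section survives as a separate node between two text nodes. With DocumentBuilderFactory.setCoalescing(true), the CDATA section is converted to text and merged with its neighbours, which is closer to the XPath view. This only illustrates the difference. It is not how any particular XSLT processor builds its tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: one XML string, two DOM shapes, depending on the parser's coalescing setting.
class CdataDemo {

    static Element parse(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // true: CDATA sections become text and are merged with adjacent text nodes
            f.setCoalescing(coalescing);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a>x<![CDATA[y]]>z</a>";
        for (boolean coalescing : new boolean[] {false, true}) {
            StringBuilder kinds = new StringBuilder();
            for (Node c = parse(xml, coalescing).getFirstChild(); c != null; c = c.getNextSibling()) {
                kinds.append(c.getNodeType() == Node.CDATA_SECTION_NODE ? "CDATA " : "Text ");
            }
            System.out.println("coalescing=" + coalescing + ": " + kinds.toString().trim());
        }
    }
}
```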

It's impossible to hide all the differences between these two approaches. For example, where the XSLT specifications dictate that white-space nodes must be stripped from the tree, a DOM-oriented product will probably not remove these nodes physically from the tree, but will simply hide them from XPath view. A product that uses a native XPath tree is likely to remove the unwanted white-space nodes from the tree while building the tree. This means that with one approach, the stripped white-space nodes will be present in the DOM as seen by extension functions, and with the other, they will be absent.
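A similar sketch shows the white-space case (the names are hypothetical). When parsed without a DTD, the element below has three children, two of which are whitespace-only text nodes. The helper stripWhitespace removes such nodes, which is roughly what <xsl:strip-space elements="*"/> asks an XSLT processor to do. An extension function may see either shape, depending on the processor. The sketch only modifies a DOM it parsed itself, not one handed over by a processor.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: whitespace-only text nodes are present in a freshly parsed DOM.
class WhitespaceDemo {

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Remove whitespace-only text nodes, roughly what xsl:strip-space requests.
    static void stripWhitespace(Element e) {
        Node child = e.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                e.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace((Element) child);
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        Element a = parse("<a>\n  <b/>\n</a>");
        System.out.println(a.getChildNodes().getLength()); // whitespace text, <b/>, whitespace text
        stripWhitespace(a);
        System.out.println(a.getChildNodes().getLength()); // only <b/> remains
    }
}
```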

Another difference is that with a native XPath tree, adjacent text nodes will be merged (or normalized) into a single node, whereas with a native DOM tree, they may be unnormalized. (Actually, MSXML3 doesn't always normalize text nodes correctly even in the XPath tree view.)
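The normalization difference can be seen by building a DOM by hand. In this sketch (the class name is hypothetical), two adjacent text nodes are appended to an element. The standard DOM method Node.normalize() merges them into one, which is the form a native XPath tree always has.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: a DOM may hold adjacent text nodes; an XPath tree never does.
class NormalizeDemo {

    // Build <p> containing two separate, adjacent text nodes.
    static Element buildUnnormalized() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element p = doc.createElement("p");
        doc.appendChild(p);
        p.appendChild(doc.createTextNode("Hello, "));
        p.appendChild(doc.createTextNode("world"));
        return p;
    }

    public static void main(String[] args) {
        Element p = buildUnnormalized();
        System.out.println(p.getChildNodes().getLength()); // two text nodes
        p.normalize();                                      // merge adjacent text nodes
        System.out.println(p.getChildNodes().getLength()); // one text node
        System.out.println(p.getFirstChild().getNodeValue());
    }
}
```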

What all this means is that if you want your extension functions to be fully portable between different processors, you have to be aware of these possible differences, and work around them. The following table lists the areas of potential differences between the DOM view and the XPath view.

XPath Node	DOM Node	Correspondence
Document	Document	One to one.
Element	Element	One to one.
Attribute	Attribute	The XPath tree never represents namespace declarations as attributes named xmlns or xmlns:*. The DOM might or might not have such Attr nodes. If the source document used entity references within the attribute value, these might or might not be preserved in the DOM. The value of the getSpecified property in the DOM is unpredictable.
Text	Text	The DOM text nodes might or might not be normalized. If CDATA sections were used in the original document, CDATA nodes might or might not be present in the DOM. If the source document used entity references within the text value, these might or might not be preserved in the DOM. Whitespace nodes that have been stripped as far as XSLT processing is concerned might or might not be present as text nodes in the DOM.
Processing instruction	Processing instruction	One to one.
Comment	Comment	One to one.
Namespace	N/A	There is no direct equivalent in the DOM to XPath's namespace nodes. It is possible in a DOM for elements and attributes to use namespace URIs that are not declared anywhere on the tree.
N/A	CDATA section	CDATA section nodes may be present on the DOM tree if this is the native data structure used by the processor, but they are unlikely to be present if the processor constructs a DOM from the XPath tree.
N/A	Entity reference	Entity reference nodes may be present on the DOM tree if this is the native data structure used by the processor, but they are unlikely to be present if the processor constructs a DOM from the XPath tree.

When you call methods defined in the DOM, the result will follow the DOM rules, not the XPath rules. For example, in XPath the string value of an element node is the concatenation of all the text content within that element; but in the DOM, the apparently similar nodeValue property (getNodeValue() in Java) returns null for an element node.
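The following sketch makes the contrast concrete; the class and helper names are hypothetical. For an element, getNodeValue() returns null. The XPath string value has to be assembled by walking the descendant text nodes. Alternatively, it can be approximated with the DOM Level 3 getTextContent() method, which likewise skips comments and processing instructions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: DOM nodeValue versus an XPath-style string value for an element.
class StringValueDemo {

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Concatenate descendant text (and CDATA) content in document order.
    static String stringValue(Node n) {
        StringBuilder sb = new StringBuilder();
        collectText(n, sb);
        return sb.toString();
    }

    private static void collectText(Node n, StringBuilder sb) {
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            switch (c.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(c.getNodeValue());
                    break;
                case Node.ELEMENT_NODE:
                case Node.ENTITY_REFERENCE_NODE:
                    collectText(c, sb);
                    break;
                default:
                    break; // comments and processing instructions do not contribute
            }
        }
    }

    public static void main(String[] args) {
        Element a = parse("<a>x<b>y</b>z<!--not text--></a>");
        System.out.println(a.getNodeValue());   // null: DOM rule for elements
        System.out.println(stringValue(a));     // XPath-style string value
        System.out.println(a.getTextContent()); // DOM Level 3 equivalent here
    }
}
```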

It's not a good idea to attempt to update a DOM that is passed to an extension function. Three things might happen, depending on the implementation:

The attempt to update the DOM may cause an exception.
If the DOM was constructed as a copy of the XPath tree, the updates may succeed, but have no effect on the tree as seen subsequently within the stylesheet.
If the DOM and the XPath tree are different views of the same data, then updates may affect the subsequent XSLT processing. This might cause subsequent failures, for example if nodes have been deleted while the XSLT processor holds references to them.

Constructing a new tree, in the form of a DOM, and returning this to the stylesheet as the result of the extension function, is perfectly OK if the implementation allows it.
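Here is a sketch of such a function, assuming a processor that accepts a DOM Document as the return value of a Java extension function. The class TreeBuilder and its method makeItems are hypothetical. The tree is freshly built and nothing else refers to it, so none of the update hazards just listed apply.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical extension function that builds a new tree and returns it to the caller.
class TreeBuilder {

    // Turn "a, b,c" into <items><item>a</item><item>b</item><item>c</item></items>.
    public static Document makeItems(String csv) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("items");
        doc.appendChild(root);
        for (String part : csv.split(",")) {
            Element item = doc.createElement("item");
            item.setTextContent(part.trim());
            root.appendChild(item);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document d = makeItems("a, b,c");
        System.out.println(d.getDocumentElement().getChildNodes().getLength());
    }
}
```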

These rules for the mapping of XPath trees probably seem rather complicated, and there are certainly lots of potential pitfalls. My own advice would be to steer clear of this area if you possibly can. Navigating around the tree is something you can do perfectly well within XSLT and XPath; you don't need to escape into a different language for this. It's simpler, and usually quite adequate, to pass simple strings and numbers to your extension functions.

If you want to write an extension function that constructs and returns a new tree, you might well find that a simpler alternative is to call the document() function and implement a URIResolver (or in .NET, an XmlResolver) that takes the URI provided in this call, and returns the relevant data source. The JAXP URIResolver interface is described in Appendix D, and an overview of the .NET transformation API is in Appendix C.
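A minimal sketch of such a resolver, using the JAXP javax.xml.transform.URIResolver interface, follows. The «gen:» URI scheme and the class name are invented for illustration. Returning null tells the processor to resolve any other URI in its usual way.

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Sketch: serve generated XML for document('gen:...') calls instead of reading files.
class GeneratedDataResolver implements URIResolver {

    @Override
    public Source resolve(String href, String base) {
        if (!href.startsWith("gen:")) {
            return null; // let the processor handle ordinary URIs itself
        }
        String name = href.substring("gen:".length());
        // A real resolver might query a database or call a service here.
        String xml = "<greeting>Hello, " + escape(name) + "</greeting>";
        StreamSource source = new StreamSource(new StringReader(xml));
        source.setSystemId(href); // gives the generated document a base URI
        return source;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        Source s = new GeneratedDataResolver().resolve("gen:world", null);
        System.out.println(s.getSystemId());
    }
}
```

To use it, register an instance with TransformerFactory.setURIResolver() or Transformer.setURIResolver(). A stylesheet call such as document('gen:world') should then receive the generated document instead of a fetched file.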

Calling External Functions within a Loop

I wanted to show an example that includes a reasonably realistic stylesheet with multiple calls on extension functions. It turns out that all the examples I used for this in XSLT 1.0 are things that can be done quite straightforwardly with standard facilities in XSLT 2.0. However, with this caveat, I've decided to retain this example to show the principles.

This example is specific to the Saxon processor. It can be made to work with any processor that supports Java extension functions, but it will need minor alterations.

Calling External Functions within a Loop

In this example, we will use a Java BufferedReader object to read an external file, copying it to the output one line at a time, each line being followed by an empty <br/> element. (The alternative way of doing this would be to read the file using the unparsed-text() function described in Chapter 7, and then to break it into its lines using <xsl:analyze-string>.)

Source

This stylesheet doesn't need a primary source document.

The real input is a serial file, which can be any text file. For example, the following hiawatha.txt:

  Take your bow, O Hiawatha,
  Take your arrows, jasper-headed,
  Take your war-club, Puggawaugun,
  And your mittens, Minjekahwan,
  And your birch-canoe for sailing,
  And the oil of Mishe-Nama.

Stylesheet

The stylesheet can be downloaded as reader.xsl.

First we declare the namespaces we will need. It's often easiest to declare these namespaces on the <xsl:stylesheet> element itself. I shall stick to the convention of using the same «java:*» URI to identify the name of the Java class, and I will also use the abbreviated class name as the namespace prefix. You won't usually want these namespaces appearing in the result document, so you can suppress them using exclude-result-prefixes.

  <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="2.0"
      xmlns:FileReader="java:java.io.FileReader"
      xmlns:BufferedReader="java:java.io.BufferedReader"
      exclude-result-prefixes="FileReader BufferedReader">

The name of the file we want to read from will be supplied as a parameter to the stylesheet.

  <xsl:param name="filename"/>

When we are ready to read the file, we create the BufferedReader in a variable. Then we call a template to read the file, line by line.

  <xsl:template name="main">
    <out>
      <xsl:variable name="reader"
          select="BufferedReader:new(FileReader:new($filename))"/>
      <xsl:call-template name="read-lines">
        <xsl:with-param name="reader" select="$reader"/>
      </xsl:call-template>
    </out>
  </xsl:template>

The read-lines template reads and outputs the first line of the file, and then calls itself recursively to process the remainder. The readLine() method of the BufferedReader class returns null to indicate that the end of file has been reached, and in Saxon, a Java null is translated to a return value of an empty sequence. So we test whether to continue the recursion using the test «exists($line)», which returns false when the return value was null.

  <xsl:template name="read-lines">
    <xsl:param name="reader"/>
    <xsl:variable name="line"
        select="BufferedReader:readLine($reader)"/>
    <xsl:if test="exists($line)">
      <xsl:value-of select="$line"/><br/>
      <xsl:call-template name="read-lines">
        <xsl:with-param name="reader" select="$reader"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
  </xsl:stylesheet>

Note that this template is tail-recursive: it does no further work after calling itself. This means that a processor that provides tail-call optimization should be able to handle arbitrarily long input files. A processor without this feature may fail with a stack overflow, perhaps after reading 500 or 1000 lines of text.
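For comparison, here is the same logic written directly in Java. This is a sketch; the class ReadLines is hypothetical and is not part of the stylesheet. The while loop does the job of the tail-recursive call, so the stack does not grow. The null test on readLine() corresponds to exists($line).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Java counterpart of the read-lines template: one line, then <br/>, until end of file.
class ReadLines {

    static String readAll(Reader source) {
        StringBuilder out = new StringBuilder("<out>");
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() returns null at end of file, the equivalent of not(exists($line)).
            while ((line = reader.readLine()) != null) {
                out.append(escape(line)).append("<br/>");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.append("</out>").toString();
    }

    // xsl:value-of output is escaped by the serializer; do the same for markup characters.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(readAll(new StringReader(
                "Take your bow, O Hiawatha,\nTake your arrows, jasper-headed,")));
    }
}
```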

Output

When you run this stylesheet, you need to supply a value for the filename parameter. For example:

  java net.sf.saxon.Transform -it main reader.xsl filename=hiawatha.txt

This command line invokes Saxon without a source document, specifying «main» as the name of the first template to be executed, and «hiawatha.txt» as the value of the «filename» parameter.

The output looks like this, adding newlines for clarity.

  <?xml version="1.0" encoding="UTF-8"?>
  <out>
  Take your bow, O Hiawatha,<br/>
  Take your arrows, jasper-headed,<br/>
  Take your war-club, Puggawaugun,<br/>
  And your mittens, Minjekahwan,<br/>
  And your birch-canoe for sailing,<br/>
  And the oil of Mishe-Nama.<br/>
  </out>

In this example, the function call does have side effects, because the $reader variable is an external Java object that holds information about the current position in the file being read, and advances this position each time a line is read from the file. In general, function calls with side effects are dangerous, because XSLT does not define the order in which statements are executed. But in this case, the logic of the stylesheet is such that an XSLT processor would have to be very devious indeed to execute the statements in any order other than the obvious one. The fact that the recursive call on the read-lines template is within an <xsl:if> instruction that tests the $line variable means that the processor is forced to read a line, test the result, and then, if necessary, make the recursive call to read further lines.

The next example uses side effects in a much less controlled way, and in this case causes results that will vary from one XSLT processor to another.

Functions with Uncontrolled Side Effects

Just to illustrate the dangers of using functions with side effects, we'll include an example where the effects are not predictable.

A Function with Uncontrolled Side Effects

This example shows how a processor can call extension functions in an unpredictable order, causing incorrect results if the functions have side effects. This can apply even when the extension function is apparently read-only.

Source

Like the previous example, this stylesheet doesn't use a source document.

In this example we'll read an input file containing names and addresses, for example addresses.txt. We'll assume this file is created by a legacy application and consists of groups of five lines. Each group contains a customer number on the first line, the customer's name on the second, an address on lines three and four, and a telephone number on line five. Because that's the way legacy data files often work, we'll assume that the last line of the file contains the string "****".

  15668
  Mary Cousens
  15 Birch Drive
  Wigan
  01367-844355
  17796
  John Templeton
  17 Spring Gardens
  Wolverhampton
  01666-932865
  19433
  Jane Arbuthnot
  92 Mountain Avenue
  Swansea
  01775-952266
  ****

Stylesheet

We might be tempted to write the stylesheet as follows (addresses.xsl), modifying the previous example:

  <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="2.0"
      xmlns:FileReader="java:java.io.FileReader"
      xmlns:BufferedReader="java:java.io.BufferedReader"
      exclude-result-prefixes="FileReader BufferedReader">

  <xsl:output indent="yes"/>
  <xsl:param name="filename"/>

  <xsl:template name="main">
    <xsl:variable name="reader"
        select="BufferedReader:new(FileReader:new($filename))"/>
    <xsl:call-template name="read-addresses">
      <xsl:with-param name="reader" select="$reader"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="read-addresses">
    <xsl:param name="reader"/>
    <xsl:variable name="line1"
        select="BufferedReader:readLine($reader)"/>
    <xsl:if test="$line1 != '****'">
      <xsl:variable name="line2"
          select="BufferedReader:readLine($reader)"/>
      <xsl:variable name="line3"
          select="BufferedReader:readLine($reader)"/>
      <xsl:variable name="line4"
          select="BufferedReader:readLine($reader)"/>
      <xsl:variable name="line5"
          select="BufferedReader:readLine($reader)"/>
      <label>
        <address>
          <xsl:value-of select="$line3"/><br/>
          <xsl:value-of select="$line4"/><br/>
        </address>
        <recipient>Attn: <xsl:value-of select="$line2"/></recipient>
      </label>
      <xsl:call-template name="read-addresses">
        <xsl:with-param name="reader" select="$reader"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  </xsl:stylesheet>

What's the difference? This time we are assuming that the four variables $line2, $line3, $line4, and $line5 will be evaluated in the order we've written them. There is no guarantee of this. The processor is quite at liberty, for example, not to evaluate a variable until it is used, which means that $line3 could be evaluated before $line2. Worse still, $line5 (because it is never used) may not be evaluated at all, so that instead of reading a group of five lines from the file, the template reads only four lines each time it is invoked.

Output

The result, in the case of Saxon, is a disaster.

  <?xml version="1.0" encoding="UTF-8"?>
  <label>
    <address>15 Birch Drive<br/>Wigan<br/>
    </address>
    <recipient>Attn: Mary Cousens</recipient>
  </label>
  <label>
    <address>John Templeton<br/>17 Spring Gardens<br/>
    </address>
    <recipient>Attn: 17796</recipient>
  </label>
  <label>
    <address>19433<br/>Jane Arbuthnot<br/>
    </address>
    <recipient>Attn: 01666-932865</recipient>
  </label>
  <label>
    <address>01775-952266<br/>****<br/>
    </address>
    <recipient>Attn: Swansea</recipient>
  </label>

Saxon doesn't evaluate a variable until you refer to it, and it doesn't evaluate the variable at all if you never refer to it. This becomes painfully visible in the output, which reveals that it's simply not safe for an XSLT stylesheet to make assumptions about the order of execution of different instructions.

This stylesheet might work on some XSLT processors, but it certainly won't work on all.
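To see why the outcome depends on the processor's evaluation strategy, here is a Java model of the read-addresses template. A loop stands in for the recursion, and the names LazyReadDemo, LazyVar, and readAddresses are hypothetical. Each of $line2 to $line5 is modeled as a LazyVar that calls readLine() at most once. With lazy set to false, all four are evaluated in declaration order, which is what the stylesheet assumes. With lazy set to true, each is evaluated only when the label refers to it, in the order $line3, $line4, $line2, and $line5 is never read. This is just one legal strategy, and it scrambles the data differently from the Saxon output shown above, which is exactly the point: the result depends on choices the processor is free to make.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LazyReadDemo {

    // The sample addresses.txt data: groups of five lines, terminated by "****".
    static final String DATA = String.join("\n",
        "15668", "Mary Cousens", "15 Birch Drive", "Wigan", "01367-844355",
        "17796", "John Templeton", "17 Spring Gardens", "Wolverhampton", "01666-932865",
        "19433", "Jane Arbuthnot", "92 Mountain Avenue", "Swansea", "01775-952266",
        "****");

    // A variable evaluated at most once, on first reference, then cached.
    static final class LazyVar implements Supplier<String> {
        private final Supplier<String> expr;
        private boolean evaluated;
        private String value;

        LazyVar(Supplier<String> expr) { this.expr = expr; }

        @Override
        public String get() {
            if (!evaluated) {
                value = expr.get();
                evaluated = true;
            }
            return value;
        }
    }

    static String readLine(BufferedReader reader) {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // lazy == false: line2..line5 read in declaration order (what the stylesheet assumes).
    // lazy == true: each variable read only when referenced; line5 is never read.
    static List<String> readAddresses(String text, boolean lazy) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        List<String> labels = new ArrayList<>();
        while (true) {
            String line1 = readLine(reader);          // needed by the test, so always read
            if (line1 == null || line1.equals("****")) {
                break;                                 // null models the empty sequence
            }
            List<LazyVar> vars = new ArrayList<>();
            for (int i = 0; i < 4; i++) {
                vars.add(new LazyVar(() -> readLine(reader)));   // line2..line5
            }
            if (!lazy) {
                vars.forEach(LazyVar::get);            // eager: read all four now, in order
            }
            LazyVar line2 = vars.get(0), line3 = vars.get(1), line4 = vars.get(2);
            // The label refers to line3, line4, then line2, as the template does.
            labels.add(line3.get() + " / " + line4.get() + " / Attn: " + line2.get());
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println("eager: " + readAddresses(DATA, false));
        System.out.println("lazy:  " + readAddresses(DATA, true));
    }
}
```

In eager mode the first label is "15 Birch Drive / Wigan / Attn: Mary Cousens" and there are three labels. In lazy mode the first label is "Mary Cousens / 15 Birch Drive / Attn: Wigan" and there are four, because only four lines are consumed per group.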

The correct way to tackle this problem in XSLT 2.0 is to read the whole text using the unparsed-text() function, split it into lines using either <xsl:analyze-string> or the tokenize() function, and then use the grouping facilities (<xsl:for-each-group>) to split it into groups of five lines each. There is no need for extension functions at all.
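The sketch below mirrors those steps in Java, the language of the extension functions used in this section, to show the structural difference: all the input is obtained first as a single value, and everything after that is a pure computation over it, so evaluation order no longer matters. The class name GroupAddresses and the label format are illustrative assumptions, and the input is a string constant standing in for what unparsed-text() would return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupAddresses {

    // Stands in for the result of unparsed-text() on addresses.txt.
    static final String DATA = String.join("\n",
        "15668", "Mary Cousens", "15 Birch Drive", "Wigan", "01367-844355",
        "17796", "John Templeton", "17 Spring Gardens", "Wolverhampton", "01666-932865",
        "19433", "Jane Arbuthnot", "92 Mountain Avenue", "Swansea", "01775-952266",
        "****");

    static List<String> labels(String text) {
        // Like tokenize(): split the whole text into lines.
        // (Unlike tokenize(), String.split drops trailing empty strings.)
        List<String> lines = new ArrayList<>(Arrays.asList(text.split("\\r?\\n")));
        // Keep only the lines before the "****" terminator, if there is one.
        int end = lines.indexOf("****");
        if (end >= 0) {
            lines = lines.subList(0, end);
        }
        // Grouping: five lines per customer; an incomplete final group is ignored.
        List<String> result = new ArrayList<>();
        for (int start = 0; start + 5 <= lines.size(); start += 5) {
            List<String> group = lines.subList(start, start + 5);
            // 0 = customer number, 1 = name, 2-3 = address, 4 = telephone
            result.add(group.get(2) + " / " + group.get(3) + " / Attn: " + group.get(1));
        }
        return result;
    }

    public static void main(String[] args) {
        labels(DATA).forEach(System.out::println);
    }
}
```

Because no line is read as a side effect of evaluating a variable, the three labels come out correctly whatever order the individual expressions are evaluated in.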

This example raises the question of whether there is any way you can write a call to an extension function and be sure that the call will actually be executed, given that the function is one that returns no result. It's hard to give a categorical answer to this because there is no limit on the ingenuity of optimizers to avoid doing work that makes no contribution to the result tree. However, with Saxon today a function that returns no result is treated in the same way as one that returns null, which is interpreted in XPath as an empty sequence. So you can call a void method using:

  <xsl:sequence select="class:voidMethod()"/>

and provided the <xsl:sequence> instruction itself is evaluated, the method will always be called.
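The mapping just described can be sketched with Java reflection. This is an illustrative model only, not Saxon's implementation; AuditLog and callStatic are hypothetical names. callStatic invokes a static method and converts the result to a sequence (here a List): a void method, for which Method.invoke() returns null, or a method that returns null, yields the empty list, and any other result yields a one-item list. The side effect happens when the call is made, whatever the result, which is why evaluating the <xsl:sequence> instruction is enough to guarantee that the method runs.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class VoidCallModel {

    // Hypothetical extension class: append() returns nothing but has a side effect.
    public static class AuditLog {
        static final List<String> entries = new ArrayList<>();

        public static void append(String message) {
            entries.add(message);
        }
    }

    // Illustrative binding from a static Java method to an XPath-style sequence.
    // Simplification: the method is looked up by the exact runtime classes of the arguments.
    static List<Object> callStatic(Class<?> cls, String name, Object... args) {
        try {
            Class<?>[] types = new Class<?>[args.length];
            for (int i = 0; i < args.length; i++) {
                types[i] = args[i].getClass();
            }
            Method method = cls.getMethod(name, types);
            Object result = method.invoke(null, args);     // the side effect happens here
            if (method.getReturnType() == void.class || result == null) {
                return Collections.emptyList();            // no result: empty sequence
            }
            return Collections.singletonList(result);      // otherwise: one-item sequence
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("call failed: " + name, e);
        }
    }

    public static void main(String[] args) {
        List<Object> seq = callStatic(AuditLog.class, "append", "label printed");
        System.out.println(seq.isEmpty() + " " + AuditLog.entries);      // true [label printed]
        System.out.println(callStatic(Integer.class, "parseInt", "42")); // [42]
    }
}
```

The void call returns an empty list yet still records its message, while the call to Integer.parseInt returns a one-item list containing 42.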