xsl:function

The <xsl:function> declaration defines a stylesheet function that can be invoked using a function call from any XPath expression.

Changes in 2.0

This element is new in XSLT 2.0.

Format

 <xsl:function
   name = qname
   as? = sequence-type
   override? = "yes" | "no">
   <!-- Content: (xsl:param*, sequence-constructor) -->
 </xsl:function>

Position

<xsl:function> is a top-level declaration, which means that it always appears as a child of the <xsl:stylesheet> element.

Attributes

Name	Value	Meaning
name (mandatory)	Lexical QName	The name of the function.
as (optional)	SequenceType	The type of the value returned when this function is evaluated. A type error is reported if the result does not match this type.
override (optional)	«yes» or «no»	Indicates whether this function overrides any vendor-supplied function of the same name.

The construct SequenceType is outlined in Chapter 4, and is described in full in Chapter 9 of XPath 2.0 Programmer's Reference.

Content

Zero or more <xsl:param> elements, followed by a sequence constructor.

Effect

Stylesheet functions can be called from XPath expressions in the same way as system-provided functions. The function defined by this <xsl:function> element is added to the static context for every XPath expression in the stylesheet, which means that the function will be invoked when evaluating a function call in an XPath expression that has a matching name and number of arguments (arity).

When a stylesheet function is called from an XPath expression, the parameters supplied in the function call are evaluated and bound to the variables defined in the <xsl:param> elements, the sequence constructor contained in the <xsl:function> element is evaluated, and the result of this evaluation is returned as the result of the XPath function call.

The name of the function is given as a lexical QName in the name attribute. This name must have a namespace prefix, which ensures that it does not clash with the names of functions in the standard function library. The XSLT 2.0 specification defines several namespaces (all starting with «http://www.w3.org/») that are reserved; that is, they cannot be used for the names of user-defined functions, variables, or other stylesheet objects.

The stylesheet is allowed to contain two functions of the same name if they have different arity.

It is an error to have two functions in the stylesheet with the same name, arity, and import precedence, unless there is another with higher import precedence. When a function call in an XPath expression is evaluated, the function with the highest import precedence is chosen.

The parameters to a function (which are defined using <xsl:param> elements as children to the <xsl:function> element) are mandatory parameters; it is not possible to use the required attribute to specify that a parameter is optional, or to specify a default value. The parameters are interpreted positionally: the first argument in the function call binds to the first <xsl:param> element, the second argument binds to the second <xsl:param> , and so on.
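Because parameters are always mandatory, one way to get the effect of an optional argument is to declare two functions with the same name and different arity, with the shorter one supplying the default. The following is only a sketch: the f: prefix, its namespace URI, and the function f:greet are hypothetical names chosen for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Two-argument form: does the real work. Arguments bind positionally. -->
  <xsl:function name="f:greet" as="xs:string">
    <xsl:param name="name" as="xs:string"/>
    <xsl:param name="greeting" as="xs:string"/>
    <xsl:sequence select="concat($greeting, ', ', $name, '!')"/>
  </xsl:function>

  <!-- One-argument form: same name, different arity. It supplies a default
       greeting by delegating to the two-argument form. -->
  <xsl:function name="f:greet" as="xs:string">
    <xsl:param name="name" as="xs:string"/>
    <xsl:sequence select="f:greet($name, 'Hello')"/>
  </xsl:function>

  <xsl:template match="/">
    <out>
      <xsl:value-of select="f:greet('World'), f:greet('World', 'Goodbye')"
                    separator=" "/>
    </out>
  </xsl:template>

</xsl:stylesheet>
```

The processor chooses between the two declarations purely by the number of arguments in the call, so the caller sees what looks like a single function with an optional second argument.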

The values supplied as arguments to the function in the XPath function call are converted to the types defined by as attributes on the corresponding <xsl:param> elements if required, using the standard conversion rules described on page 476. If this conversion fails, a type error is reported. If an <xsl:param> element has no as attribute, then any value of any type is acceptable, and no checking or conversion takes place. This is equivalent to specifying «as="item()*"».
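To make the conversion rules concrete, here is a small sketch that assumes an unvalidated source document containing elements such as <item qty="3"/>. The f: prefix, the function f:double, and the item and qty names are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <xsl:function name="f:double" as="xs:integer">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence select="$n * 2"/>
  </xsl:function>

  <xsl:template match="item">
    <doubled>
      <!-- @qty is an attribute node. It is atomized, and because the source
           is assumed to be unvalidated the result is xs:untypedAtomic, which
           the conversion rules cast to the required xs:integer. -->
      <xsl:value-of select="f:double(@qty)"/>
    </doubled>
    <!-- By contrast, f:double('3') would be a type error: a value that is
         already an xs:string is not cast to xs:integer. -->
  </xsl:template>

</xsl:stylesheet>
```

If an item element has no qty attribute, the argument is an empty sequence, which does not match xs:integer, so the call fails with a type error; declaring the parameter as «xs:integer?» would be one way to allow for that case.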

On entry to the function, the context item, position, and size are undefined. It is therefore an error to use the expression «.», or any relative path expression, or the functions position() and last(). Even a path expression beginning with «/» is not allowed, because such path expressions select from the root of the tree containing the context node. This means that all information needed by the function must either be passed explicitly as a parameter, or be available in a global variable. Path expressions such as «$par/a/b/c» can be used to navigate from nodes that are supplied as parameters to other nodes in the same tree.
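The following sketch shows the usual workaround: the caller passes the node explicitly, and the function navigates from that parameter. The root() function stands in for a leading «/». The f: prefix, both function names, and the section and title element names are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Writing count(preceding-sibling::section) here would be an error,
       because there is no context item inside the function body.
       Navigation has to start from the parameter. -->
  <xsl:function name="f:section-number" as="xs:integer">
    <xsl:param name="sec" as="element(section)"/>
    <xsl:sequence select="count($sec/preceding-sibling::section) + 1"/>
  </xsl:function>

  <!-- root($n) finds the top of the tree containing $n, which is what a
       leading "/" would do if a context node were available. -->
  <xsl:function name="f:doc-title" as="xs:string">
    <xsl:param name="n" as="node()"/>
    <xsl:sequence select="string((root($n)/*/title)[1])"/>
  </xsl:function>

  <xsl:template match="section">
    <heading>
      <!-- The caller has a context item and passes it explicitly as "." -->
      <xsl:value-of select="f:doc-title(.), f:section-number(.)"
                    separator=" / "/>
    </heading>
    <xsl:apply-templates select="section"/>
  </xsl:template>

</xsl:stylesheet>
```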

Other values in the dynamic context, such as the current template, current mode, current group, and current grouping key, are also either undefined or empty on entry to a stylesheet function.

The result of the function is obtained by evaluating the sequence constructor. This result may be a sequence consisting of nodes (either newly constructed nodes or references to existing nodes) or atomic values or both. If there is an as attribute on the <xsl:function> element, then the result of evaluating the sequence constructor is converted, if necessary, to the specified type. Once again, the standard conversion rules defined on page 476 are used. A type error is reported if this conversion fails. If the <xsl:function> element has no «as» attribute, then a result of any type may be returned, and no checking or conversion takes place.

The override Attribute

The override attribute controls what happens if a user-written function and a vendor-supplied function have the same name.

«override="yes"» means that the user-written function wins. This is the default. This value maximizes portability: the same implementation of the function will be used on all XSLT processors.
«override="no"» means that the vendor-supplied function wins. This setting is useful when the stylesheet function has been written as a fallback implementation of the function, for use in environments where no vendor-supplied implementation exists. For example, at http://www.exslt.org/ there is a definition of a mathematical function library including the function math:sqrt(), which evaluates the square root of its argument. This function is likely to be available with a number of XSLT processors, but not all. By supplying an XSLT implementation of this function, and specifying «override="no"», the stylesheet author can ensure that a call to math:sqrt() will execute on any XSLT processor, and will take advantage of the vendor's native implementation when available.

You can find a square root function implemented in XSLT on Dimitre Novatchev's FXSL site at http://fxsl.sourceforge.net/. It's not as inefficient as you might imagine. Nor is it a purely academic exercise. XSLT can be used to create graphical renditions of your data in SVG format, and this will often require such computations.
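As an illustration of the fallback pattern, here is a minimal sketch of a stylesheet-level math:sqrt() declared with «override="no"». It is not the FXSL implementation: it uses a plain Newton iteration with a fixed cap on the number of steps, which is an assumption that should be adequate for moderate magnitudes but is not tuned for the full range of xs:double. The helper loc:newton and its namespace are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://exslt.org/math"
    xmlns:loc="http://example.com/local"
    exclude-result-prefixes="xs math loc">

  <!-- Used only when the processor has no built-in math:sqrt -->
  <xsl:function name="math:sqrt" as="xs:double" override="no">
    <xsl:param name="x" as="xs:double"/>
    <xsl:sequence select="
        if ($x lt 0) then xs:double('NaN')
        else if ($x eq 0) then $x
        else loc:newton($x, ($x + 1) div 2, 60)"/>
  </xsl:function>

  <!-- One Newton step per call; stops when the estimate no longer
       changes or when the step budget runs out. -->
  <xsl:function name="loc:newton" as="xs:double">
    <xsl:param name="x" as="xs:double"/>
    <xsl:param name="guess" as="xs:double"/>
    <xsl:param name="steps" as="xs:integer"/>
    <xsl:variable name="next" as="xs:double"
                  select="($guess + $x div $guess) div 2"/>
    <xsl:sequence select="
        if ($steps le 0 or $next eq $guess) then $next
        else loc:newton($x, $next, $steps - 1)"/>
  </xsl:function>

  <xsl:template match="/">
    <root-of-two>
      <xsl:value-of select="math:sqrt(2)"/>
    </root-of-two>
  </xsl:template>

</xsl:stylesheet>
```

The calling code is the same whichever implementation ends up being used, which is the point of the technique.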

Usage and Examples

In this section I will first outline a few ways in which stylesheet functions can be used. I will then look more specifically at the differences between stylesheet functions and named templates. Then I will discuss the use of recursive functions, which provide an extremely powerful programming tool.

Using Stylesheet Functions

Stylesheet functions can be used in many different ways. Here is a simple function, which can be applied to an <employee> element to calculate the employee's annual leave entitlement in days:

  <xsl:function name="pers:annual-leave" as="xs:integer"
                xpath-default-namespace="http://ns.megacorp.com/hr">
    <xsl:param name="emp" as="element(employee)"/>
    <xsl:variable name="service"
                  as="xdt:yearMonthDuration"
                  select="subtract-dateTimes-yielding-yearMonthDuration(
                             current-date(),
                             $emp/date-of-joining)"/>
    <xsl:choose>
      <xsl:when test="$service gt xdt:yearMonthDuration('P10Y')">
        <xsl:copy-of select="20"/>
      </xsl:when>
      <xsl:when test="$service gt xdt:yearMonthDuration('P3Y')">
        <xsl:copy-of select="17"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="15"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

This function can now be called from any XPath expression. For example, you can process all the employees with more than 16 days' annual leave by writing:

  <xsl:apply-templates select="//employee[pers:annual-leave(.) gt 16]"/>

Or you could process the employees sorted according to the number of days they are entitled to:

  <xsl:apply-templates select="//employee">
    <xsl:sort select="pers:annual-leave(.)"/>
  </xsl:apply-templates>

This function could be packaged in a library module with other similar functions, allowing reuse of the code, and allowing the algorithms to be changed in one place rather than having them scattered around many different stylesheets. For calculating properties of nodes, functions are much more flexible than named templates because of the way they can be called.

As well as encapsulating properties of elements, functions can be used to encapsulate relationships. For example, the following function determines the responsible line manager for an employee:

  <xsl:function name="pers:line-manager" as="element(pers:employee)"
                xpath-default-namespace="http://ns.megacorp.com/hr">
    <xsl:param name="emp" as="element(employee)"/>
    <xsl:variable name="mgr-nr"
                  select="doc('departments.xml')
                            /departments
                            /department[@dept-no = $emp/department]
                            /manager-nr"/>
    <xsl:sequence select="doc('employees.xml')
                            /key('emp', $mgr-nr)"/>
  </xsl:function>

Users of this function do not need to know how the relationship between employees and their line manager is actually represented in the XML source documents, only that the information is obtainable. This function can then be used in a path expression, rather like a virtual axis:

  <xsl:template match="pers:employee">
    <xsl:text>Manager:</xsl:text>
    <xsl:value-of select="pers:line-manager(.)/name"/>
  </xsl:template>

In these examples I have declared the types of the parameters and the result by reference to types defined in a schema. This helps to document what the function is intended for, and it ensures that you will get an error message (rather than garbage output) if you call the function with incorrect parameters, for example a department rather than an employee element.

I have used the xpath-default-namespace attribute, which can be used on any XSLT element to define the namespace that is used for unprefixed element and type names occurring in XPath expressions. For details of this attribute, see the entry for <xsl:stylesheet> on page 433.

I also chose to put the functions in the same namespace as the elements that they operate on. This is not the only approach possible, but to my mind it establishes clearly that there is a close relationship between the types (such as pers:employee) and the functions designed to operate on those types: the functions act like methods on a class.

Functions versus Named Templates

XSLT offers two very similar constructs: named templates and stylesheet functions. This section discusses the differences between them, and suggests when they might be used.

The main difference between named templates and stylesheet functions is the way they are called: templates are called from the XSLT level using the <xsl:call-template> instruction, while stylesheet functions are called from XPath expressions using a function call.

In earlier working drafts of XSLT 2.0, the differences between named templates and stylesheet functions were considerably greater. Named templates were always implemented using XSLT instructions, while stylesheet functions were largely implemented using XPath expressions. Named templates always returned newly constructed nodes, while stylesheet functions returned references to existing nodes, or atomic values.

In the final version of the specification, these differences have largely disappeared. The content model for the <xsl:template> and <xsl:function> elements is identical, and there is no difference in the way they are evaluated to produce a result, or in the kinds of result they can return. The only difference is in the way they are called.

If you need to invoke the same functionality from both the XSLT and the XPath levels, it is very easy to define a named template as a wrapper for a stylesheet function:

  <xsl:template name="my:func">
    <xsl:param name="p1" required="yes"/>
    <xsl:param name="p2" required="yes"/>
    <xsl:sequence select="my:func($p1, $p2)"/>
  </xsl:template>

or to define a stylesheet function as a wrapper for a named template,

  <xsl:function name="my:func">
    <xsl:param name="p1"/>
    <xsl:param name="p2"/>
    <xsl:call-template name="my:func">
      <xsl:with-param name="p1" select="$p1"/>
      <xsl:with-param name="p2" select="$p2"/>
    </xsl:call-template>
  </xsl:function>

My own preference is to use stylesheet functions when I want to compute a value or to select existing nodes, and to use a named template when I want to construct new nodes. This reflects the fact that in general, the role of XSLT instructions is to construct nodes in the result tree, while the role of XPath expressions is to select nodes in the source tree and compute values derived from their content. Implementations are quite likely to reflect this distinction, with an XPath engine that is better at one kind of job and an XSLT engine that is better at another. In particular, XSLT engines are very likely to be able to bypass the formal process of constructing a sequence of nodes, then copying the nodes in this sequence to construct a tree, and serializing the tree. Many XSLT engines will be able to collapse this into a single operation, where the new nodes are serialized as soon as they are constructed. This optimization is much more difficult if the constructed nodes have to be passed to an XPath engine first, as the result of a function call.

The fact that stylesheet functions do not have access to the context item may seem at first to be an inconvenience. But I think that the fact that all parameters to the function are explicit greatly helps programming discipline, and produces code that is easier to maintain. It also makes life much easier for the optimizer, which brings another benefit in terms of faster execution.

Functions with side effects can cause some surprises at the XPath level, and creating a new node is a kind of a side effect. For example, one might expect that the result of the expression:

 my:f($x) is my:f($x)

is always true. But if the function my:f() creates a new node, this is not the case. It is no longer a pure function, because it returns different results on different invocations. An optimizer has to recognize this possibility when rearranging such an expression: It must make sure that the function is actually called twice.
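The effect can be seen in a small sketch that contrasts a function constructing a new element with one that merely selects an existing node. The function names my:new and my:existing, the my: namespace URI, and the element names in the output are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">

  <!-- Constructs a fresh element on every call -->
  <xsl:function name="my:new" as="element(wrapper)">
    <xsl:param name="x" as="xs:string"/>
    <wrapper><xsl:value-of select="$x"/></wrapper>
  </xsl:function>

  <!-- Returns a reference to a node that already exists -->
  <xsl:function name="my:existing" as="node()?">
    <xsl:param name="n" as="node()"/>
    <xsl:sequence select="$n/.."/>
  </xsl:function>

  <xsl:template match="/*">
    <result>
      <!-- Two calls create two distinct nodes, so a conforming processor
           should report false here. -->
      <constructed>
        <xsl:value-of select="my:new('a') is my:new('a')"/>
      </constructed>
      <!-- Both calls return the same existing parent node, so this
           comparison should be true. -->
      <selected>
        <xsl:value-of select="my:existing(.) is my:existing(.)"/>
      </selected>
    </result>
  </xsl:template>

</xsl:stylesheet>
```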

Another example of this effect is the (admittedly rather perverse) expression:

 count(//a/../my:f(.))

Normally when evaluating a path expression, the processor can first find all the nodes that the path expression locates, then sort them into document order and eliminate duplicates. But with the expression above, if «my:f()» creates new nodes then the result of the final count() depends critically on how many times the function «my:f()» is called, and the correct answer is that it must be called exactly once for each distinct parent of an <a> element in the source document.

This is the kind of expression that is used to sort out the sheep from the goats when doing XPath conformance testing. At the time of writing, I have to confess that Saxon fails this test miserably: It only eliminates duplicates from the final result of a path expression, and not from intermediate results.

The bottom line is that functions that create and return new nodes are likely to play havoc with XPath optimization and are best avoided. Creating a temporary tree as local working data within the function is, of course, a different matter: the thing that causes problems is two invocations of the same function, with the same parameters, returning different results.

Recursion

Stylesheet functions can be recursive: They can call themselves either directly or indirectly. Because XSLT is a functional programming language without side effects (specifically, without updateable variables), recursion plays a key role in writing algorithms of any complexity.

In XSLT 1.0, such algorithms were written using recursive named templates. Apart from being rather long-winded, this had the drawback that it was difficult to return certain kinds of result: Templates in XSLT 1.0 could only return results by constructing new nodes. In XSLT 2.0 both templates and functions offer much more flexibility in the types of result they can return, and stylesheet functions offer additional flexibility in the way they can be called (for example, they can be called from within a predicate used in a match pattern).
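Calling a function from a predicate in a match pattern might look like the following sketch. The function f:is-long, its 200-character threshold, and the para element are hypothetical choices made for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- True if the element's text exceeds an arbitrary length threshold -->
  <xsl:function name="f:is-long" as="xs:boolean">
    <xsl:param name="p" as="element()"/>
    <xsl:sequence select="string-length(normalize-space($p)) gt 200"/>
  </xsl:function>

  <!-- The pattern with a predicate has a higher default priority (0.5)
       than the plain name test below (0), so it wins when both match. -->
  <xsl:template match="para[f:is-long(.)]">
    <para class="long"><xsl:apply-templates/></para>
  </xsl:template>

  <xsl:template match="para">
    <para><xsl:apply-templates/></para>
  </xsl:template>

</xsl:stylesheet>
```

A named template could not be used this way, because <xsl:call-template> is an instruction and cannot appear inside a pattern.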

As it happens, many of the simple problems where recursion was needed in XSLT 1.0 can now be solved in other ways, because XPath 2.0 offers a wider range of aggregation functions, and functions such as tokenize() to break up a string into its parts . But there are still cases where recursion is needed, and even when it isn't, many people find the recursive solution more elegant than an iterative solution using <xsl:for-each> or the XPath «for » expression.

Here is a function that extracts the part of a string after the last «/ » character:

  <xsl:function name="str:suffix" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <xsl:sequence select="
        if (contains($in, '/'))
        then str:suffix(substring-after($in, '/'))
        else $in"/>
  </xsl:function>

Note how all the logic is contained in a single XPath expression. I could have used <xsl:choose> to express the same logic, but with this kind of function I personally find it clearer to write the whole algorithm at the XPath level.

The function result is returned using an <xsl:sequence> instruction. It sometimes feels a little strange to use <xsl:sequence> when the value being computed is a single integer or string, but in the XPath data model a single value is the same as a sequence of length 1, so it's worth getting used to the idea that everything is a sequence. When returning an atomic value, one could just as easily use <xsl:copy-of>, but I prefer <xsl:sequence> because it works in all cases: when you are returning nodes, you don't want to copy them unnecessarily. You might also be tempted to use <xsl:value-of>, and in this case it would work, but the semantics aren't quite what we want: <xsl:value-of> would convert the selected string into a text node, which would then be atomized back to a string by virtue of the function's declared return type. Even if the optimizer can sort this out, it's making unnecessary work.
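The difference between returning a node by reference and returning a copy can be observed directly, as in this sketch. The function names f:by-reference and f:by-copy and the output element names are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- xsl:sequence returns the original node itself -->
  <xsl:function name="f:by-reference" as="element()">
    <xsl:param name="e" as="element()"/>
    <xsl:sequence select="$e"/>
  </xsl:function>

  <!-- xsl:copy-of returns a new, parentless copy of the node -->
  <xsl:function name="f:by-copy" as="element()">
    <xsl:param name="e" as="element()"/>
    <xsl:copy-of select="$e"/>
  </xsl:function>

  <xsl:template match="/*">
    <result>
      <!-- Expected to be true: the same node comes back -->
      <reference-is-original>
        <xsl:value-of select="f:by-reference(.) is ."/>
      </reference-is-original>
      <!-- Expected to be false: the copy has its own identity -->
      <copy-is-original>
        <xsl:value-of select="f:by-copy(.) is ."/>
      </copy-is-original>
      <!-- Expected to be false: the copy is detached from the source tree,
           so the caller cannot navigate upward from it -->
      <copy-has-parent>
        <xsl:value-of select="exists(f:by-copy(.)/..)"/>
      </copy-has-parent>
    </result>
  </xsl:template>

</xsl:stylesheet>
```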

Another observation about this function is that it is tail-recursive. This means that after calling itself, the function does nothing else before it returns to its caller. This property is important because tail-recursive functions can be optimized to save memory. Sometimes a problem with recursion is that if you recurse too deeply, say to 1000 levels or so, the system runs out of stack space and terminates with a fatal error. A good XSLT processor can optimize a tail-recursive function so that this doesn't happen. Basically the trick is that instead of making the recursive call and then unwinding the stack when it regains control, the function can unwind the stack first, and then make the recursive call. This is because the stack frame isn't needed once the recursive call returns.
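To show what the property looks like in practice, here is a sketch of the same computation written both ways. The function names f:sum-to and f:sum-to-acc are hypothetical, and whether a given processor actually optimizes the second form is implementation-dependent.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Not tail-recursive: the addition is performed after the recursive
       call returns, so each level must keep its stack frame. -->
  <xsl:function name="f:sum-to" as="xs:integer">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence select="
        if ($n le 0) then 0
        else $n + f:sum-to($n - 1)"/>
  </xsl:function>

  <!-- Tail-recursive: the running total is carried in an accumulator
       parameter, and the recursive call is the last thing evaluated. -->
  <xsl:function name="f:sum-to-acc" as="xs:integer">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:param name="acc" as="xs:integer"/>
    <xsl:sequence select="
        if ($n le 0) then $acc
        else f:sum-to-acc($n - 1, $acc + $n)"/>
  </xsl:function>

  <xsl:template match="/">
    <sums>
      <xsl:value-of select="f:sum-to(100), f:sum-to-acc(100, 0)"
                    separator=" "/>
    </sums>
  </xsl:template>

</xsl:stylesheet>
```

Both calls compute the sum of the integers from 1 to 100; the only difference is where the pending work lives while the recursion is in progress.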

I mentioned earlier that many functions can now be implemented easily without using recursion, and this one is no exception. It could also be written using regular expressions:

  <xsl:function name="str:suffix" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <xsl:sequence select="replace($in, '.*/([^/]*)', '$1')"/>
  </xsl:function>

But this is a matter of personal style. It's a good idea to have more than one tool in your kitbag, and recursion is the most powerful one available.
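A third variant uses tokenize(), the function mentioned earlier for breaking a string into its parts. This is a sketch rather than a drop-in replacement: the namespace URI bound to the str: prefix is a placeholder, since the text does not give one.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:str="http://example.com/string"
    exclude-result-prefixes="xs str">

  <xsl:function name="str:suffix" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <!-- Split on "/" and keep the last token. string() turns the empty
         sequence (returned when $in is zero-length) into '' so that the
         declared return type is always satisfied. -->
    <xsl:sequence select="string(tokenize($in, '/')[last()])"/>
  </xsl:function>

  <xsl:template match="/">
    <suffix>
      <xsl:value-of select="str:suffix('usr/local/bin/saxon')"/>
    </suffix>
  </xsl:template>

</xsl:stylesheet>
```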

One kind of problem where you will need recursion is when you need to analyze a graph (that is, a network of related objects). The next example shows the technique.

Looking for Cycles among Attribute Sets

This example examines a stylesheet module (we'll stick to a single module for simplicity) and determines whether it contains any cycles among its attribute set definitions. Also to keep it simple, I'll assume that attribute sets have simple names and ignore the complications caused by namespace prefixes.

Source

The source document is any stylesheet module. But the example is only interesting if you run it on a stylesheet module that contains attribute sets that are (incorrectly) defined in terms of themselves. I've included such a sample as cyclic-stylesheet.xsl .

This example requires a schema-aware processor, and assumes that the source document is validated against the schema for XSLT 2.0 stylesheets, which is included in the download as xslt20.xsd .

Output

The stylesheet will report the name of any attribute set that is defined directly or indirectly in terms of itself.

Stylesheet

This stylesheet is available as find-cycles.xsl . We'll start with a function that returns the attribute sets that are directly referenced from a given attribute set:

  <xsl:function name="cyc:direct" as="schema-element(xsl:attribute-set)*">
    <xsl:param name="in" as="schema-element(xsl:attribute-set)"/>
    <xsl:sequence select="$in/../xsl:attribute-set
                            [@name=$in/@use-attribute-sets]"/>
  </xsl:function>

This returns any attribute set in the containing document whose name attribute is equal to any of the strings listed in the use-attribute-sets attribute of the supplied attribute set. Notice what's going on: the schema for XSLT stylesheets tells us that the use-attribute-sets attribute is a sequence of strings. The «=» operator causes this attribute to be atomized, which returns the typed value of the attribute, namely this sequence of strings; it then returns true if any of the strings in this sequence is equal to the other operand. This will only work if the input document has been validated against its schema.

Now observe that an attribute set A references another attribute set B if either cyc:direct(A) includes B , or there is an attribute set in cyc:direct(A) that references B . This translates into the recursive function:

  <xsl:function name="cyc:references" as="xs:boolean">
    <xsl:param name="A" as="schema-element(xsl:attribute-set)"/>
    <xsl:param name="B" as="schema-element(xsl:attribute-set)"/>
    <xsl:sequence select="
        if (cyc:direct($A) intersect $B)
        then true()
        else some $X in cyc:direct($A) satisfies cyc:references($X, $B)"/>
  </xsl:function>

Now, finally, you can discover whether there are any cycles, that is, any attribute sets that reference themselves directly or indirectly:

  <xsl:value-of select="some $X in /*/xsl:attribute-set
                          satisfies cyc:references($X, $X)"/>

Note that once again, the function is tail-recursive.

An interesting observation about this function is that it is written entirely using constructs that are new in XSLT 2.0 and XPath 2.0: stylesheet functions, type checking based on schema-defined types, atomization of a list-valued attribute, the <xsl:sequence> instruction that allows XSLT instructions to return values other than nodes, the XPath intersect operator, and the XPath some ... satisfies expression.
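One caveat is that a search of this kind may fail to terminate when the graph contains a cycle that does not pass through the attribute set being tested (for example, A uses C, C uses D, and D uses C). The sketch below is one possible variation, not the find-cycles.xsl stylesheet from the download: it carries a set of already-visited attribute sets so that no node is expanded twice. It also assumes a processor that is not schema-aware, so tokenize() stands in for atomization of the list-valued attribute. The helper cyc:reaches and the cyc: namespace URI are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:cyc="http://example.com/cycles"
    exclude-result-prefixes="xs cyc">

  <xsl:output method="text"/>

  <!-- Attribute sets named directly in $in/@use-attribute-sets.
       Assumption: no schema validation, so the list is split by hand. -->
  <xsl:function name="cyc:direct" as="element(xsl:attribute-set)*">
    <xsl:param name="in" as="element(xsl:attribute-set)"/>
    <xsl:sequence select="$in/../xsl:attribute-set
        [@name = tokenize(normalize-space($in/@use-attribute-sets), ' ')]"/>
  </xsl:function>

  <!-- Breadth-first search from $frontier towards $target. $visited grows
       on every call, so the recursion ends once no unvisited sets remain. -->
  <xsl:function name="cyc:reaches" as="xs:boolean">
    <xsl:param name="frontier" as="element(xsl:attribute-set)*"/>
    <xsl:param name="target" as="element(xsl:attribute-set)"/>
    <xsl:param name="visited" as="element(xsl:attribute-set)*"/>
    <xsl:sequence select="
        if (empty($frontier)) then false()
        else if (exists($frontier intersect $target)) then true()
        else cyc:reaches(
               $frontier/cyc:direct(.) except ($visited | $frontier),
               $target,
               $visited | $frontier)"/>
  </xsl:function>

  <!-- Same signature as before: does A reference B, directly or indirectly? -->
  <xsl:function name="cyc:references" as="xs:boolean">
    <xsl:param name="A" as="element(xsl:attribute-set)"/>
    <xsl:param name="B" as="element(xsl:attribute-set)"/>
    <xsl:sequence select="cyc:reaches(cyc:direct($A), $B, ())"/>
  </xsl:function>

  <!-- Report each attribute set that is defined in terms of itself -->
  <xsl:template match="/">
    <xsl:for-each select="/*/xsl:attribute-set[cyc:references(., .)]">
      <xsl:value-of
          select="concat('Cyclic attribute set: ', @name, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

As in the original example, attribute set names are treated as simple strings, ignoring namespace prefixes.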

For a more generalized function that looks for cycles in any source data file, regardless of how the relationships are represented, see the section Simulating Higher Order Functions under <xsl:apply-templates> on page 198.