Section 4.4. XSLT extensions | ASP.Net 2.0 Cookbook (Cookbooks (OReilly))



4.4 XSLT extensions

Implementors of XSLT 1.0 soon discovered that the language, while being fairly complete for its stated goal, lacks many convenience functions that are common in practical programming languages. As a result, almost every XSLT processor included custom extensions. Typically, functions for dealing with nodesets, strings, and regular expressions, as well as common mathematical functions, were added. Some processors also implemented extension instructions or attributes.

4.4.1 EXSLT

EXSLT (Extended XSLT) emerged as a standard unifying common XSLT extensions to ensure interoperability between processors. A lot of EXSLT's goodies have become part of XSLT 2.0 and XPath 2.0.

The EXSLT web site ^[10] provides complete details on each extension and lists the processors implementing it natively. For some functions, the site provides freely downloadable implementations in scripting languages (notably JavaScript).

^[10] www.exslt.org

EXSLT extensions include functions and instructions for:

manipulating date and time values;
dynamic (runtime) evaluation of strings containing XPath expressions;
identifying and converting data types;
mathematical calculations;
matching and replacing with regular expressions;
nodeset manipulation (difference, intersection, etc.);
defining XSLT functions;
creating multiple output documents.
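
As a rough illustration of how a few of these modules look in practice, here is a minimal XSLT 1.0 sketch. The namespace URIs are the ones published by EXSLT; the source vocabulary (page, item, price, and the sold attribute) is hypothetical. Whether a given function is actually available depends on your processor, so the sketch guards one call with the standard function-available() test.

```xml
<?xml version="1.0"?>
<!-- Sketch only: page, item, price, and @sold are made-up source markup. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="http://exslt.org/math"
    xmlns:set="http://exslt.org/sets"
    xmlns:date="http://exslt.org/dates-and-times"
    exclude-result-prefixes="math set date">

  <xsl:template match="/page">
    <summary>
      <!-- Dates and times: the current date and time as a string -->
      <generated>
        <xsl:value-of select="date:date-time()"/>
      </generated>

      <!-- Math: the largest numeric value in a nodeset,
           guarded in case the processor lacks the function -->
      <max-price>
        <xsl:choose>
          <xsl:when test="function-available('math:max')">
            <xsl:value-of select="math:max(//price)"/>
          </xsl:when>
          <xsl:otherwise>unknown</xsl:otherwise>
        </xsl:choose>
      </max-price>

      <!-- Sets: all items minus those marked as sold -->
      <available>
        <xsl:value-of
            select="count(set:difference(//item, //item[@sold='yes']))"/>
      </available>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```

The function-available() guard is worth the extra lines whenever a stylesheet may be run by more than one processor.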

4.4.2 Saxon extensions

Saxon ^[11] is a well-known, fast, and standards-compliant open source XSLT processor written by Michael Kay. Its author is also the editor of the W3C's XSLT 2.0 specification.

^[11] saxon.sf.net

Saxon is written in Java. Stable versions of Saxon (the 6.* series) implement XSLT 1.0; beta versions (the 7.* series) are a testbed implementation of XSLT 2.0. Saxon 7 was very robust and stable in my testing.

Among other things, Saxon is notable for its extensibility. You can write your own extension functions in Java and call them from your XSLT stylesheet. Saxon also offers a wide range of useful built-in extensions that are unique to this processor.

Using proprietary Saxon extensions is not recommended unless absolutely necessary. Not only are they nonportable to other processors, but in the beta 7.* series they sometimes change or disappear between Saxon releases. Still, even if you never need any of these extensions, or if you use a different processor altogether, knowing what's available in a leading XSLT tool is entertaining and mind-widening.

4.4.2.1 The assignability dilemma

One of the most controversial extensions is the saxon:assign ^[12] instruction. It lets you change the value of a variable whose xsl:variable declaration carries the saxon:assignable="yes" attribute.

^[12] In Saxon 7.*, the saxon prefix corresponds to the URI http://saxon.sf.net/ .

This extension clearly violates the principles of functional programming, at the same time making the language easier to use for novice XSLT programmers by allowing all sorts of imperative constructs. However tempting saxon:assign may look to you, remember that it can make your stylesheet a nightmare to debug because of side effects.

You cannot count on saxon:assign always being available in Saxon nor on it ever becoming part of the XSLT standard. Quoth Michael Kay: "Using saxon:assign is cheating." The author of Saxon spends considerable effort improving its performance, and he has hinted that as soon as saxon:assign becomes an obstacle to these optimization efforts, it will be gone. So, beware of locking yourself into saxon:assign, as you may end up with an obsolete Saxon version without an upgrade path.

A related extension is the saxon:while instruction that lets you create a conventional "while" loop. About the only sensible way to use such a loop is in combination with saxon:assignable variables.
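
To make the trade-off concrete, the following sketch shows the same three-item loop written twice: once with saxon:while and saxon:assign, and once with a portable recursive template. It assumes the Saxon 7.* namespace URI given in the footnote above (the 6.* series binds the saxon prefix to a different URI), and the count-up template name and the list/item output elements are invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch only; element and template names are hypothetical. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    extension-element-prefixes="saxon">

  <!-- A variable must opt in before saxon:assign may change it -->
  <xsl:variable name="i" select="1" saxon:assignable="yes"/>

  <xsl:template match="/">
    <lists>
      <!-- Imperative version: Saxon only, relies on a side effect -->
      <list>
        <saxon:while test="$i &lt;= 3">
          <item><xsl:value-of select="$i"/></item>
          <saxon:assign name="i" select="$i + 1"/>
        </saxon:while>
      </list>

      <!-- Portable version: the loop counter is a template parameter -->
      <list>
        <xsl:call-template name="count-up"/>
      </list>
    </lists>
  </xsl:template>

  <xsl:template name="count-up">
    <xsl:param name="n" select="1"/>
    <xsl:param name="max" select="3"/>
    <xsl:if test="$n &lt;= $max">
      <item><xsl:value-of select="$n"/></item>
      <xsl:call-template name="count-up">
        <xsl:with-param name="n" select="$n + 1"/>
        <xsl:with-param name="max" select="$max"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The recursive version is longer, but it runs on any processor and leaves no mutable state behind for the next template to trip over.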

4.4.2.2 Optimization-related extensions

Besides being a fast processor by itself, Saxon also offers tools that you can use to speed up the execution of your stylesheet. These tools include:

The saxon:memo-function="yes" attribute can be set on an xsl:function element if you want Saxon to cache the function's returned values. If the function is called again with the same arguments, the result is not calculated again but retrieved from the cache. Obviously, this optimization may break if the function is affected by side effects, so it is incompatible with saxon:assign.
The saxon:expression(string) function takes an XPath expression as a string, parses it, and returns an object representing this expression in a parsed and ready-to-run form. Then, you use the saxon:eval(expr) function to "call" this stored expression in the current context. If you just need to evaluate an expression stored in a string, use saxon:evaluate(string). Unfortunately, both the eval() and evaluate() functions are limited in that you cannot use your stylesheet's variables or functions within the evaluated expression.

Why bother with turning an expression into an object? Because it is supposed to be faster than just using the same expression every time; whether it's actually faster is a subject for experimentation. More importantly, the ability to evaluate an expression stored in a string may make your source definition more powerful ( 3.9.1.3 ).
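
The sketch below puts both tools side by side. It is written against the xsl:function syntax of the final XSLT 2.0 specification, which may differ from what a particular Saxon 7.* beta accepts; the my prefix, its namespace URI, the my:fib function, and the section elements are all hypothetical.

```xml
<?xml version="1.0"?>
<!-- Sketch only; my:fib and the section markup are invented examples. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:my="http://example.org/my-functions"
    exclude-result-prefixes="xs saxon my">

  <!-- A side-effect-free recursive function: a natural candidate
       for caching, since the same arguments recur many times -->
  <xsl:function name="my:fib" as="xs:integer" saxon:memo-function="yes">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence select="if ($n lt 2) then $n
                          else my:fib($n - 1) + my:fib($n - 2)"/>
  </xsl:function>

  <xsl:template match="/">
    <results>
      <fib><xsl:value-of select="my:fib(30)"/></fib>

      <!-- Parse the expression once... -->
      <xsl:variable name="is-big"
          select="saxon:expression('count(*) &gt; 2')"/>

      <!-- ...then evaluate it with each section as the context node -->
      <xsl:for-each select="//section">
        <xsl:if test="saxon:eval($is-big)">
          <big-section children="{count(*)}"/>
        </xsl:if>
      </xsl:for-each>

      <!-- One-off evaluation of an expression held in a string -->
      <p-count><xsl:value-of select="saxon:evaluate('count(//p)')"/></p-count>
    </results>
  </xsl:template>

</xsl:stylesheet>
```

In a real stylesheet the expression strings would typically come from the source document or a parameter rather than from literals, which is where string evaluation earns its keep.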

Before you jump into optimization, remember the old wisdom: "Premature optimization is the root of all evil." ^[13] Never spend a minute on optimization unless you are absolutely forced by performance constraints, and never optimize except where the real bottlenecks occur (see 6.4.4 for some XSLT profiling software).

^[13] Attributed to Donald Knuth.

4.4.2.3 Debugging extensions

Compared to other languages, debugging XSLT is made easier by the lack of side effects but more difficult by the fact that the processor may, for optimization, reorder templates during execution and skip unused variables ( 4.5.3 ), functions, or templates. Saxon's debugging tools are therefore very useful for a serious XSLT developer.

The saxon:path() function returns the "canonical" XPath to the current node. It is an absolute path uniquely identifying the current node. For example, the template
```
<xsl:template match="p">
  <xsl:message><xsl:value-of select="saxon:path()"/></xsl:message>
</xsl:template>
```
will output absolute XPaths for all p elements in the source (see also 6.3.2.1 ):
```
/page[1]/block[1]/p[1]
/page[1]/block[1]/section[1]/p[1]
/page[1]/block[1]/section[1]/p[2]
...
```
The saxon:line-number() function returns the line number corresponding to the context node in the source document. We'll use this function in our Schematron wrapper ( 5.1.2 ) in order to report the source line number for each validity error.
The saxon:systemId() function returns the URI of the source document. It is useful not only for debugging; by parsing this string, you can get the absolute URI of the source document's directory, and this URI may be used for resolving all sorts of relative URIs or translating absolute URIs from one base directory to another. We will use this function in our stylesheet setup ( 5.1.1 ).
The saxon:explain="yes" attribute can be set on any element, including literal result elements. This will cause Saxon, when compiling the stylesheet, to print the static type and internal representation of all XPath expressions in that element's attributes.
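
Here is a sketch of how these debugging aids might be combined into a single diagnostic message. The function names are used as spelled in this section and the namespace URI is the Saxon 7.* one; the rule itself (flagging empty p elements) and the warning output element are made up for illustration. Note also that, as far as I know, Saxon only retains source line numbers when asked to (the -l command-line option), so treat that as an assumption to verify against your version's documentation.

```xml
<?xml version="1.0"?>
<!-- Sketch only; the empty-paragraph rule is a hypothetical example. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Report every empty paragraph with file, line, and canonical path -->
  <xsl:template match="p[not(normalize-space())]">
    <xsl:message>
      <xsl:text>Empty paragraph in </xsl:text>
      <xsl:value-of select="saxon:systemId()"/>
      <xsl:text>, line </xsl:text>
      <xsl:value-of select="saxon:line-number()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="saxon:path()"/>
    </xsl:message>
  </xsl:template>

  <!-- Ask Saxon to print how it compiled the XPath expression
       in this literal result element's attribute -->
  <xsl:template match="section">
    <warning count="{count(p[not(normalize-space())])}"
             saxon:explain="yes"/>
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Because xsl:message writes to the processor's message stream rather than to the result tree, diagnostics like these can stay in the stylesheet without polluting the output.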

4.4.2.4 Miscellaneous goodies

Nodesets. As mentioned in 4.2, XPath 2.0 has a whole bunch of new functions for manipulating nodesets and sequences, as does EXSLT (4.4.1). Saxon adds a few more, such as the very useful saxon:has-same-nodes() function that tests if two nodesets contain exactly the same nodes (unlike the = operator, which compares the nodes' string values).

PIs, DTDs, entities. Saxon allows you to access or create those aspects of XML documents that are out of reach of standard XSLT. Thus, there's a function to access the pseudo-attributes of a processing instruction, a set of instructions for creating an internal DTD subset (2.2.4.3) in the output document, and an instruction to create an entity reference.

Output control. Saxon offers several additional xsl:output attributes to control how the output document is serialized. You can set the number of spaces to use for one level of indentation, control the entity representation of characters outside of the output character set, plug in a custom Java class as an alternative serializer (with optional well-formedness control), or pass the output to another stylesheet without serialization.

Parsing strings and serializing trees. A new function in XSLT 2.0, unparsed-text(), loads a text file into memory as a data string, that is, without XML parsing. Saxon offers two complementary and probably more useful functions: saxon:parse() parses a string representing a well-formed XML document and returns the root node of the resulting tree; saxon:serialize(), conversely, returns a string with a serialization of the node tree passed to it as an argument.
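
A small sketch of the parse/serialize pair follows. The snippet and example elements are hypothetical source markup. The second argument to saxon:serialize(), naming an xsl:output declaration, reflects how later Saxon releases document the function and is an assumption here; check the extension reference for the version you run.

```xml
<?xml version="1.0"?>
<!-- Sketch only; snippet and example are made-up source elements. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Named output format referenced by saxon:serialize() below
       (assumed two-argument form) -->
  <xsl:output name="plain" method="xml" omit-xml-declaration="yes"/>

  <!-- Source: <snippet>&lt;b&gt;bold&lt;/b&gt;</snippet>
       The escaped markup is parsed back into real nodes. -->
  <xsl:template match="snippet">
    <xsl:variable name="tree" select="saxon:parse(string(.))"/>
    <parsed>
      <xsl:copy-of select="$tree/*"/>
    </parsed>
  </xsl:template>

  <!-- The reverse: show the first child element's markup as text -->
  <xsl:template match="example">
    <pre>
      <xsl:value-of select="saxon:serialize(*[1], 'plain')"/>
    </pre>
  </xsl:template>

</xsl:stylesheet>
```

Remember that saxon:parse() expects a well-formed document, so a string holding a fragment with several top-level elements would need to be wrapped in a single root element first.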

4.4.3 Custom extensions

Most XSLT processors let you write your own extensions in a programming language other than XSLT. Usually, you can add your custom functions to the set of built-in XPath functions, but some processors also let you add custom instructions (stylesheet element types).

The abandoned XSLT 1.1 draft ( 4.1 ) provided the xsl:script instruction that made it possible to place extension code right into your stylesheet. For good or for bad, this is now gone. To quote the 2.0 Working Draft, "This specification does not define any mechanism for creating or binding implementations of extension instructions or extension functions, and does not require that implementations support any such mechanism. Such mechanisms, if they exist, are implementation-defined."

For extension functions, you have to declare a namespace that will allow the processor to find the extension you created. For Java-based processors, this namespace's URI usually includes the complete class name of your extension function (e.g., com.projectname.xslt.graph ). After that, functions whose names contain that namespace's prefix will be sought in the corresponding class. We'll see many examples of this in Chapter 5.
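
As a preview, here is a minimal sketch of the stylesheet side of such a binding. To stay self-contained it calls a class that ships with every Java runtime, java.lang.Math, instead of a custom one. The java: URI form is the convention Saxon has used for this kind of binding; other processors use their own URI conventions, and some Saxon editions do not support it at all, so the call is guarded with function-available(). The jmath prefix and the result element are arbitrary.

```xml
<?xml version="1.0"?>
<!-- Sketch only; assumes a Java-based processor with Saxon-style binding. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:jmath="java:java.lang.Math"
    exclude-result-prefixes="jmath">

  <xsl:template match="/">
    <result>
      <xsl:choose>
        <!-- jmath:sqrt resolves to the static method Math.sqrt(double) -->
        <xsl:when test="function-available('jmath:sqrt')">
          <xsl:value-of select="jmath:sqrt(2)"/>
        </xsl:when>
        <xsl:otherwise>extension function not available</xsl:otherwise>
      </xsl:choose>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Binding your own class works the same way: put the compiled class on the processor's classpath and replace java.lang.Math in the namespace URI with its fully qualified name.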

