Section 4.3. Taming a functional language


If you're coming from a different programming background, one feature of XSLT (all versions) may seem especially difficult to grasp. I'm not referring to the XML-based syntax; once you get a feel for it, it is surprisingly transparent (even if bulky). For many novices, much more puzzling is XSLT's lack of an assignment operator. ^[7]

^[7] Note for C-literate readers: "=" in XPath always means comparison, never assignment.

Everything is possible by asking the right questions. XSLT was designed as a functional programming language. The functional programming paradigm has roots going back to Lisp in the late 1950s ^[8] and has proved very useful, even if its use remains limited. Other established functional languages include Haskell and Scheme.

^[8] Search the Web for "Why Functional Programming Matters" by John Hughes, a good historical article on functional programming.

A functional program, as the name implies, consists of functions. Unlike those in conventional programming languages, however, these functions are absolutely independent from each other. Each function has a set of arguments and returns a value, but it cannot produce, or be affected by, any side effects. In other words, if you pass the same set of arguments to a function, you will always get the same result, no matter at what point of program execution this happens or what other functions were called before it.

A conventional programming language encourages the imperative style of programming; in it, you give orders such as "take this, add to that, put the result there." Functional programming, on the other hand, is expressive; here you don't give orders, but write expressions that nest all the way up from built-in primitives to the final program output. You can think of it this way: The goal of each function is not to perform a task, but to answer a question. Naturally, if you ask the same question, you should always get the same answer, hence the ban on side effects.

Why is XSLT functional? This paradigm is naturally applicable to XSLT where a stylesheet consists of a number of largely independent templates (i.e., functions). It also enables an XSLT processor to perform efficient optimizations at runtime, for example by reordering or parallelizing the execution of templates.

The lack of variable assignment is thus a direct consequence of the functional programming paradigm. XSLT's variables are not in fact variable; once assigned a value, they never change for the rest of their life. You can create any number of new variables within the scope of a template or function, but they cannot be changed within this scope and are not accessible outside it.
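As a small sketch of what this means in practice (the variable name `label` and the `price` attribute are made up for illustration), you do not declare a variable and update it later; you compute its final value inside the declaration itself:

```xml
<!-- Hypothetical example: the value is decided once, inside the declaration -->
<xsl:variable name="label">
  <xsl:choose>
    <xsl:when test="@price &gt; 100">expensive</xsl:when>
    <xsl:otherwise>affordable</xsl:otherwise>
  </xsl:choose>
</xsl:variable>

<!-- From here on, $label can be read but never reassigned -->
<xsl:value-of select="$label"/>
```

In XSLT 2.0 the same choice could also be written as a single XPath expression in the select attribute, using if ... then ... else.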

Relearn, rethink, rewrite. Even with immutable variables, functional languages are Turing-complete, which means you can use them to implement any imaginable algorithm. Let's examine a few typical situations where the functional paradigm may appear especially limiting, and see how we can cope.

Local variables. Sometimes, XSLT novices attempt to use global variables where local ones would suffice. If several templates of your stylesheet use a variable with the same meaning but unconnected values, you don't need to make this variable global. Each template may have its own local declaration and provide its own value for the same-named variable.

Passing parameters. But what if you need to pass the value around, that is, to set the value in one template and use it in another? In accordance with the functional programming principles, you should make the second template callable (4.5.1) and explicitly pass the required value to it via a parameter. Thus, even though global variables are immutable, you can still exchange information among templates via parameters.
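A minimal sketch of such a hand-over might look as follows. The element names (chapter, title) and the template name heading are assumptions chosen for illustration, not taken from any particular stylesheet:

```xml
<!-- Hypothetical: the "chapter" template computes a value... -->
<xsl:template match="chapter">
  <xsl:call-template name="heading">
    <xsl:with-param name="number"
                    select="count(preceding-sibling::chapter) + 1"/>
  </xsl:call-template>
</xsl:template>

<!-- ...and the callable "heading" template receives it as a parameter -->
<xsl:template name="heading">
  <xsl:param name="number"/>
  <h1>
    <xsl:value-of select="$number"/>
    <xsl:text>. </xsl:text>
    <xsl:value-of select="title"/>
  </h1>
</xsl:template>
```

Because xsl:call-template does not change the context node, the called template can still read the title child of the current chapter.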

Quite often, however, the urge to share variables between templates is a sign of bad stylesheet design. See if you can rearrange your code into different template chunks, eliminating the need to communicate variable values among them. You may have to use template modes to let one source node trigger different templates (see Example 4.2, page 169).

Recursion. You cannot change the value of a variable, but you can call a function or template with different values of parameters. This means that by recursively calling a function or template from itself, you can keep track of a counter or accumulator variable. Example 4.1 shows a definition for the function eg:fact() that calculates the factorial of its argument.

Example 4.1. A function calculating factorial as an example of XSLT recursion.
```
<xsl:function name="eg:fact">
  <xsl:param name="n"/>
  <xsl:choose>
    <xsl:when test="$n = 1 or $n = 0">
      <xsl:value-of select="1"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$n * eg:fact($n - 1)"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
```
You can now use this function in your XPath expressions, for example:
```
<xsl:value-of select="eg:fact(12)"/>
```
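The accumulator idea mentioned above can be sketched as a variant of the same function. The name eg:fact-acc and its second parameter are hypothetical, and the eg prefix is assumed to be bound to a namespace in the stylesheet, as it must be for Example 4.1:

```xml
<!-- Hypothetical helper: $acc carries the running product from call to call -->
<xsl:function name="eg:fact-acc">
  <xsl:param name="n"/>
  <xsl:param name="acc"/>
  <xsl:sequence select="if ($n &lt;= 1)
                        then $acc
                        else eg:fact-acc($n - 1, $acc * $n)"/>
</xsl:function>
```

Here the "assignment" to the accumulator happens only by passing a new value to the next call; eg:fact-acc(5, 1) evaluates to 120.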
A downside to this approach is that in most processors, XSLT recursion is costly in terms of memory and may be very slow. If this is becoming a problem, read on for other suggestions.

XPath tools. In some cases, algorithmic patterns that are difficult to express in a purely functional style become much more accessible if you take advantage of XPath functions and operators that work with sequences. For example, if you want to do something with each character of a string, in most programming languages you write a while loop with an index variable incremented on each iteration. With XSLT 2.0, you can use the to operator of XPath to create a sequence of integers and iterate over that sequence with an xsl:for-each:
```
<xsl:for-each select="1 to string-length($s)">
  The character at position <xsl:value-of select="."/>
  is '<xsl:value-of select="substring($s, ., 1)"/>'.
</xsl:for-each>
```
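The same sequence tools can often replace a counter or accumulator altogether. As a rough sketch (both expressions are illustrations, and $s is assumed to hold a string as in the listing above), an XPath 2.0 for expression does the iterating inside a single expression:

```xml
<!-- Sum of the squares of 1 to 10, with no counter or accumulator variable -->
<xsl:value-of select="sum(for $i in 1 to 10 return $i * $i)"/>

<!-- The characters of $s in reverse order, joined back into one string -->
<xsl:value-of select="string-join(
                        for $i in reverse(1 to string-length($s))
                        return substring($s, $i, 1),
                        '')"/>
```

The first expression evaluates to 385.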
You cannot change values of global variables, but you can store any values in temporary XML documents (with the xsl:result-document instruction) and read them back (with the document() function). The XSLT 2.0 specification forbids you from reading back the document you have just created in the same stylesheet run, but you can rely on it being there when you run that (or any other) stylesheet next time. Thus, temporary documents may be a complete functional substitute for assignable variables so long as you break your transformation algorithm into separate stylesheets so that no such "variable" is written and read in the same stylesheet run.
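A minimal sketch of this write-then-read pattern, split across two stylesheets, might look like this. The file name headings.xml and all element names are assumptions made for illustration:

```xml
<!-- Stylesheet A (first run): save section headings to an auxiliary file -->
<xsl:template match="/">
  <xsl:result-document href="headings.xml">
    <headings>
      <xsl:for-each select="//section/title">
        <heading><xsl:value-of select="."/></heading>
      </xsl:for-each>
    </headings>
  </xsl:result-document>
</xsl:template>

<!-- Stylesheet B (a later, separate run): read the saved values back -->
<xsl:template match="/">
  <toc>
    <xsl:copy-of select="document('headings.xml')/headings/heading"/>
  </toc>
</xsl:template>
```

Note that relative URIs are resolved differently in the two places (xsl:result-document against the base output URI, document() against the base URI of the stylesheet), so in a real setup the two paths may need adjusting to point at the same file.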

This mechanism is especially useful when implementing complex multidocument transformations. For example, this is how the Index and the Table of Contents for this book were produced. When transforming each chapter (stored in a separate file), two auxiliary documents are created containing extracted index terms and section headings. Later, a separate stylesheet reads in these auxiliary documents from all chapters, merges them together, and processes the result to produce the Index and TOC. ^[9]

^[9] Actually, this process is a bit more complex, since it also involves extracting the corresponding page numbers from formatted chapters.

With a wee bit of extension programming, you can even run one stylesheet from within another (5.6).

Chaining templates. If you want to make some preliminary changes to the input document and then process this changed version, you don't even need two stylesheet runs for this. Just assign a special mode attribute value, e.g. first-pass, to all the templates performing the preprocessing, save their output into a variable, and apply the second-pass templates to this variable instead of the input document. All of this could be done in a template matching /, as Example 4.2 demonstrates.

Example 4.2. Processing input in two independent passes, storing the intermediate tree in a variable.
```
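<xsl:template match="/">

  <!-- First pass: collect the preprocessed tree in a variable -->
  <xsl:variable name="intermediate">
    <xsl:apply-templates mode="first-pass" select="*"/>
  </xsl:variable>

  <!-- Also write the intermediate tree out to a file -->
  <xsl:result-document href="intermediate.xml">
    <xsl:copy-of select="$intermediate"/>
  </xsl:result-document>

  <!-- Second pass: apply the default-mode templates to the intermediate tree -->
  <xsl:apply-templates select="$intermediate/*"/>
</xsl:template>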
```
If you are adding a preprocessing pass to an existing stylesheet, no other changes are necessary. The templates of the second pass (those without any mode attribute) won't have the slightest suspicion that what they work with is not the genuine source document but its preprocessed version stored in a variable. This demonstrates that even though you cannot change global variable values within templates, you can still pass the value returned by one template as input to another using a third template's local variable.
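Example 4.2 does not show the first-pass templates themselves. A minimal sketch of what they could look like follows; the draft element is a made-up example of something to filter out before the second pass:

```xml
<!-- Hypothetical first pass: copy everything unchanged by default... -->
<xsl:template match="@*|node()" mode="first-pass">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" mode="first-pass"/>
  </xsl:copy>
</xsl:template>

<!-- ...except draft elements, which are dropped from the intermediate tree -->
<xsl:template match="draft" mode="first-pass"/>
```

Because every template here carries mode="first-pass", none of them interferes with the existing second-pass templates.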
If none of the above methods work for you, you can write your own extensions in a nonfunctional programming language and link them up to your stylesheet. The language most frequently used for this purpose is Java, in part because some of the major XSLT processors (6.4.1) are also written in Java and their extensibility mechanisms for this language are well defined.

The main advantage of this method is efficiency: Extension functions are usually faster than those written in XSLT. On the downside, with extension functions it may be difficult to pass and return complex values such as nodesets. Also, in an extension function you may have (depending on the processor) little or no access to the XPath engine or to the parsed tree of the source document. Because of this, extensions work best for simple but performance-critical tasks such as processing a document's data (5.4.2).
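How an extension is linked to a stylesheet depends entirely on the processor, so the following is only a sketch under one assumption: a Saxon version that supports binding a namespace of the form java:classname to a Java class. Other processors, and some Saxon editions, use different conventions or do not offer this mechanism at all.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="java:java.lang.Math">

  <xsl:template match="/">
    <!-- Assumed binding: calls the static Java method Math.sqrt(double) -->
    <xsl:value-of select="math:sqrt(2)"/>
  </xsl:template>

</xsl:stylesheet>
```

A simple numeric call like this one avoids the difficulties with complex values mentioned above, since only a number crosses the boundary in each direction.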
As an absolutely last resort, Saxon offers the saxon:assign extension instruction; see 4.4.2.1 for a discussion.

