XSLT 2.0 as a Language | NetBeans™ IDE Field Guide: Developing Desktop, Web, Enterprise, and Mobile Applications (2nd Edition)

What are the most significant characteristics of XSLT as a language, which distinguish it from other languages? In this section I shall pick four of the most striking features: the fact that it is written in XML syntax, the fact that it is a language free of side effects, the fact that processing is described as a set of independent pattern-matching rules, and the fact that it has a type system based on XML Schema.

Use of XML Syntax

As we've seen, the use of SGML syntax for stylesheets was proposed as long ago as 1994, and it seems that this idea gradually became the accepted wisdom. It's difficult to trace exactly what the overriding arguments were, and when you find yourself writing something like:

  <xsl:variable name="y">
    <xsl:call-template name="f">
      <xsl:with-param name="x"/>
    </xsl:call-template>
  </xsl:variable>

to express what in other languages would be written as «y=f(x);», then you may find yourself wondering how such a decision came to be made.

In fact, it could have been worse; in the very early drafts, the syntax for writing what are now XPath expressions was also expressed in XML, so instead of writing «select="book/author/first-name"» you had to write something along the lines of:

  <select>
    <path>
      <element type="book"/>
      <element type="author"/>
      <element type="first-name"/>
    </path>
  </select>

The most obvious arguments for expressing XSLT stylesheets in XML are perhaps as follows:

There is already an XML parser in the browser; so it keeps the footprint small if this can be reused.
Everyone had got fed up with the syntactic inconsistencies between HTML/XML and CSS and didn't want the same thing to happen again.
The Lisp-like syntax of DSSSL was widely seen as a barrier to its adoption; so it would be better to have a syntax that was already familiar in the target community.
Many existing popular template languages (including simple ASP and JSP pages) are expressed as an outline of the output document with embedded instructions; so this is a familiar concept.
The lexical apparatus is reusable, for example Unicode support, character and entity references, whitespace handling, namespaces.
It's occasionally useful to have a stylesheet as the input or output of a transformation (witness the Microsoft XSL converter as an example); so it's a benefit if a stylesheet can read and write other stylesheets.
The inconvenience of having to type lots of angle brackets is easily solved by providing visual development tools.

Like it or not, the XML-based syntax is now an intrinsic feature of the language that has both benefits and drawbacks. It does make the language verbose, but in the end, the number of keystrokes has very little bearing on the ease or difficulty of solving particular transformation problems.

In XSLT 2.0, the long-windedness of the language has been reduced considerably by increasing the expressiveness of the non-XML part of the syntax, namely XPath expressions. Many computations that required five lines of XSLT code in 1.0 can be expressed in a single XPath expression in 2.0. Two constructs in particular led to this simplification: the conditional expression (if..then..else) in XPath 2.0; and the ability to define a function in XSLT (using <xsl:function>) that can be called directly from an XPath expression. To take the example discussed earlier, if you replace the template «f» by a user-written function «f», you can replace the five lines in the example with:

  <xsl:variable name="y" select="f($x)"/>
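As a minimal sketch of how the two constructs fit together, the stylesheet below declares a function with <xsl:function> and calls it from an XPath expression that also uses a conditional. One detail the shorthand «f» glosses over is that XSLT 2.0 requires a user-written function to have a prefixed name, so the sketch calls it my:f. The prefix my, its namespace URI, the value given to $x, and the body of the function are all assumptions made purely for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="http://example.com/my-functions"
    exclude-result-prefixes="my">

  <!-- Hypothetical function: classifies a number as 'large' or 'small'.
       The name must be in a namespace, hence the my: prefix. -->
  <xsl:function name="my:f">
    <xsl:param name="x"/>
    <xsl:sequence select="if ($x gt 10) then 'large' else 'small'"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Illustrative value; in a real stylesheet this would come from the source -->
    <xsl:variable name="x" select="21"/>
    <!-- The one-line replacement for the five-line call-template version -->
    <xsl:variable name="y" select="my:f($x)"/>
    <result><xsl:value-of select="$y"/></result>
  </xsl:template>

</xsl:stylesheet>
```

Because the function is called from within an XPath expression, it can be used anywhere an expression is allowed, including predicates and sort keys, which a named template cannot.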

The decision to base the XSLT syntax on XML has proved its worth in several ways that I would not have predicted in advance:

It has proved very easy to extend the syntax. Adding new elements and attributes is trivial; there is no risk of introducing parsing difficulties when doing so, and it is easy to manage backwards compatibility. (In contrast, extending XQuery's non-XML syntax without introducing parsing ambiguities is a highly delicate operation.)
The separation of XML parsing from XSLT processing leads to good error reporting and recovery in the compiler. It makes it much easier to report the location of an error with precision and to report many errors in one run of the compiler. This leads to a faster development cycle.
It makes it easier to maintain stylistic consistency between different constructs in the language. The discipline of defining the language through elements and attributes creates a constrained vocabulary with which the language designers must work, and these constraints impose a certain consistency of design.

No Side Effects

The idea that XSL should be a declarative language free of side effects appears repeatedly in the early statements about the goals and design principles of the language, but no one ever seems to explain why: what would be the user benefit?

A function or procedure in a programming language is said to have side effects if it makes changes to its environment; for example, if it can update a global variable that another function or procedure can read, or if it can write messages to a log file, or prompt the user. If functions have side effects, it becomes important to call them the right number of times and in the correct order. Functions that have no side effects (sometimes called pure functions) can be called any number of times and in any order. It doesn't matter how many times you evaluate the area of a triangle, you will always get the same answer; but if the function to calculate the area has a side effect such as changing the size of the triangle, or if you don't know whether it has side effects or not, then it becomes important to call it once only.

I expand further on this concept in the section on Computational Stylesheets in Chapter 9, page 625.

It is possible to find hints at the reason why this was considered desirable in the statements that the language should be equally suitable for batch or interactive use, and that it should be capable of progressive rendering. There is a concern that when you download a large XML document, you won't be able to see anything on your screen until the last byte has been received from the server. Equally, if a small change were made to the XML document, it would be nice to be able to determine the change needed to the screen display, without recalculating the whole thing from scratch. If a language has side effects then the order of execution of the statements in the language has to be defined, or the final result becomes unpredictable. Without side effects, the statements can be executed in any order, which means it is possible, in principle, to process the parts of a stylesheet selectively and independently.

Whether XSLT has actually achieved these goals is somewhat debatable. Certainly, determining which parts of the output document are affected by a small change to one part of the input document is not easy, given the flexibility of the expressions and patterns that are now permitted in the language. Equally, most existing XSLT processors require the whole document to be loaded into memory (there is a version of jd.xslt that is disk-based, but this is the exception to the rule). However, there has been research work that suggests the goals are achievable (search for papers on "Incremental XSLT Transformation" or "Lazy XSLT Transformation"). When E. F. Codd published the relational calculus in 1970, he made the claim that a declarative language was desirable because it was possible to optimize it, which was not possible with the navigational data access languages in use at the time. In fact, it took another 15 years before relational optimization techniques (and, to be fair, the price of hardware) reached the point where large relational databases were commercially viable. But in the end he was proved right, and the hope is that the same principle will also eventually deliver similar benefits in the area of transformation and styling languages.

Of course, there will always be some transformations where the whole document needs to be available before you can produce any output; examples are where the stylesheet sorts the data, or where it starts with a table of contents. But there are many other transformations where the order of the output directly reflects the order of the input, and progressive rendering should be possible in such cases. Xalan-J made a start on this by running a transformation thread in parallel with the parsing thread, so the transformer can produce output before the parser has finished, and MSXML3 (from the evidence of its API) seems to be designed on a similar principle. The Stylus Studio debugging tool tracks the dependencies between parts of the output document and the template rules that were used to generate them, and so one can start to see the potential to regenerate the output selectively when small changes are made.

What it means in practice to be free of side effects is that you cannot update the value of a variable. This restriction is something many users find very frustrating at first, and a big price to pay for these rather remote benefits. But as you get the feel of the language and learn to think about using it the way it was designed to be used, rather than the way you are familiar with from other languages, you will find you stop thinking about this as a restriction. In fact, one of the benefits is that it eliminates a whole class of bugs from your code. I shall come back to this subject in Chapter 9, where I outline some of the common design patterns for XSLT stylesheets and, in particular, describe how to use recursive code to handle situations where in the past you would probably have used updateable variables to keep track of the current state.
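To give a flavor of the recursive style before Chapter 9, here is a minimal sketch of a named template that accumulates a total without ever updating a variable: each recursive call receives a new value for its parameters instead. The source vocabulary (an <order> element with <item price="..."/> children) and the template name total are hypothetical, chosen only for this illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: <order><item price="10"/><item price="2.5"/>...</order> -->
  <xsl:template match="order">
    <total>
      <xsl:call-template name="total">
        <xsl:with-param name="items" select="item/@price"/>
      </xsl:call-template>
    </total>
  </xsl:template>

  <!-- Adds the first item to the running sum, then calls itself
       with the remaining items and the new sum. -->
  <xsl:template name="total">
    <xsl:param name="items"/>
    <xsl:param name="sum" select="0"/>
    <xsl:choose>
      <xsl:when test="$items">
        <xsl:call-template name="total">
          <xsl:with-param name="items" select="$items[position() &gt; 1]"/>
          <xsl:with-param name="sum" select="$sum + $items[1]"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No items left: the accumulated sum is the result -->
        <xsl:value-of select="$sum"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

For a plain total, the built-in sum() function does the job directly; the recursion is shown because the same shape carries over to accumulations for which no built-in function exists.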

Rule-Based

The dominant feature of a typical XSLT stylesheet is that it consists of a sequence of template rules, each of which describes how a particular element type or other construct should be processed. The rules are not arranged in any particular order; they don't have to match the order of the input or the order of the output, and in fact there are very few clues as to what ordering or nesting of elements the stylesheet author expects to encounter in the source document. It is this that makes XSLT a declarative language, because you specify what output should be produced when particular patterns occur in the input, as distinct from a procedural program where you have to say what tasks to perform in what order.

This rule-based structure is very like CSS, but with the major difference that both the patterns (the description of which nodes a rule applies to) and the actions (the description of what happens when the rule is matched) are much richer in functionality.

Displaying a Poem

Let's see how we can use the rule-based approach to format a poem. Again, we haven't introduced all the concepts yet, and so I won't try to explain every detail of how this works, but it's useful to see what the template rules actually look like in practice.

Input

Let's take this poem as our XML source. The source file is called poem.xml, and the stylesheet is poem.xsl.

  <poem>
    <author>Rupert Brooke</author>
    <date>1912</date>
    <title>Song</title>
    <stanza>
      <line>And suddenly the wind comes soft,</line>
      <line>And Spring is here again;</line>
      <line>And the hawthorn quickens with buds of green</line>
      <line>And my heart with buds of pain.</line>
    </stanza>
    <stanza>
      <line>My heart all Winter lay so numb,</line>
      <line>The earth so dead and frore,</line>
      <line>That I never thought the Spring would come again</line>
      <line>Or my heart wake any more.</line>
    </stanza>
    <stanza>
      <line>But Winter's broken and earth has woken,</line>
      <line>And the small birds cry again;</line>
      <line>And the hawthorn hedge puts forth its buds,</line>
      <line>And my heart puts forth its pain.</line>
    </stanza>
  </poem>

Output

Let's write a stylesheet such that this document appears in the browser, as shown in Figure 1-8.

Figure 1-8

Stylesheet

It starts with the standard header.

  <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="1.0">

Now we write one template rule for each element type in the source document. The rule for the <poem> element creates the skeleton of the HTML output, defining the ordering of the elements in the output (which doesn't have to be the same as the input order). The <xsl:value-of> instruction inserts the value of the selected element at this point in the output. The <xsl:apply-templates> instructions cause the selected child elements to be processed, each using its own template rule.

  <xsl:template match="poem">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <xsl:apply-templates select="title"/>
        <xsl:apply-templates select="author"/>
        <xsl:apply-templates select="stanza"/>
        <xsl:apply-templates select="date"/>
      </body>
    </html>
  </xsl:template>

In XSLT 2.0 we could replace the four <xsl:apply-templates> instructions with one, written as follows:

  <xsl:apply-templates select="title, author, stanza, date"/>

This takes advantage of the fact that the type system for the language now supports ordered sequences. The «,» operator performs list concatenation and is used here to form a list containing the <title>, <author>, <stanza>, and <date> elements in that order. Note that this includes all the <stanza> elements, so in general this will be a sequence containing more than four items.
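The significance of the ordering is easiest to see by contrast with the union operator «|», which always returns nodes in document order. The fragment below is a sketch of the two alternatives as they might appear inside the template rule for <poem>; only one of them would be used in a real stylesheet.

```xml
<!-- XSLT 2.0 sequence: processed in the order written, that is
     title, author, every stanza, and finally date -->
<xsl:apply-templates select="title, author, stanza, date"/>

<!-- Union: nodes come back in document order, so for poem.xml this
     processes author, date, title, and then the stanzas -->
<xsl:apply-templates select="title | author | stanza | date"/>
```

With the union form, the output order would follow the source document and not the order intended for the HTML page, which is why the 1.0 stylesheet needs four separate instructions.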

The template rules for the <title>, <author>, and <date> elements are very simple; they take the content of the element (denoted by «select="."»), and surround it within appropriate HTML tags to define its display style.

  <xsl:template match="title">
    <div align="center">
      <h1><xsl:value-of select="."/></h1>
    </div>
  </xsl:template>

  <xsl:template match="author">
    <div align="center">
      <h2>By <xsl:value-of select="."/></h2>
    </div>
  </xsl:template>

  <xsl:template match="date">
    <p><i><xsl:value-of select="."/></i></p>
  </xsl:template>

The template rule for the <stanza> element puts each stanza into an HTML paragraph, and then invokes processing of the lines within the stanza, as defined by the template rule for lines:

  <xsl:template match="stanza">
    <p><xsl:apply-templates select="line"/></p>
  </xsl:template>

The rule for <line> elements is a little more complex: if the position of the line within the stanza is an even number, it precedes the line with two non-breaking-space characters (&#160;). The <xsl:if> instruction tests a boolean condition, which in this case calls the position() function to determine the relative position of the current line. It then outputs the contents of the line, followed by an empty HTML <br> element to end the line.

  <xsl:template match="line">
    <xsl:if test="position() mod 2 = 0">&#160;&#160;</xsl:if>
    <xsl:value-of select="."/><br/>
  </xsl:template>

And to finish off, we close the <xsl:stylesheet> element.

  </xsl:stylesheet>

Although template rules are a characteristic feature of the XSLT language, we'll see that this is not the only way of writing a stylesheet. In Chapter 9, I will describe four different design patterns for XSLT stylesheets, only one of which makes extensive use of template rules. In fact, the Hello World stylesheet I presented earlier in this chapter doesn't make any real use of template rules; it fits into the design pattern I call fill-in-the-blanks, because the stylesheet essentially contains the fixed part of the output with embedded instructions saying where to get the data to put in the variable parts.

Types Based on XML Schema

I have described three characteristics of the XSLT language (the use of XML syntax, the principle of no side-effects, and the rule-based processing model) that were essential features of XSLT 1.0 and that have been retained essentially unchanged in XSLT 2.0. The fourth characteristic is new in XSLT 2.0, and creates a fundamental change in the nature of XSLT as a language. This is the adoption of a type system based on XML Schema.

There are two aspects to the type system of any programming language. The first is the set of types that are supported (for example, integers, strings, lists, tuples), together with the mechanisms for creating user-defined types. The second aspect is the rules that the language enforces to ensure type-correctness.

XSLT 1.0 had a very small set of types (numbers, booleans, strings, node-sets, and result tree fragments), and the rules it applied were what is often called "weak typing": this means that the processor would always attempt to convert the value supplied in an expression or function call to the type that was required in that context. This makes for a very happy-go-lucky environment: if you supply a number where a string is expected, or vice versa, nothing will break.
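A small sketch makes the weak typing of XSLT 1.0 concrete. The stylesheet below is illustrative only; it ignores its source document and simply evaluates three expressions in which a value of the "wrong" type is supplied and silently converted.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- A string where a number is expected: '3' is converted to 3, giving 7 -->
    <xsl:value-of select="'3' + 4"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Numbers where strings are expected: converted to '3' and '4', giving 34 -->
    <xsl:value-of select="concat(3, 4)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- A number passed to a string function: converted to '12345', giving 5 -->
    <xsl:value-of select="string-length(12345)"/>
  </xsl:template>

</xsl:stylesheet>
```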

XSLT 2.0 has changed both aspects of the type system. There is now a much richer set of types available (and this set is user-extensible), and the rules for type-checking are stricter.
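As a rough sketch of what the stricter rules look like in practice, the stylesheet below declares the types of a function's parameter and result using the «as» attribute. The function my:double, the prefix my, and its namespace URI are hypothetical names introduced for this illustration; the xs prefix is bound to the XML Schema namespace.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my-functions"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical function: accepts exactly one xs:integer, returns an xs:integer -->
  <xsl:function name="my:double" as="xs:integer">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence select="$n * 2"/>
  </xsl:function>

  <xsl:template match="/">
    <result>
      <!-- Accepted: the literal 21 is an xs:integer -->
      <xsl:value-of select="my:double(21)"/>
      <xsl:text> </xsl:text>
      <!-- A call such as my:double('21') would be reported as a type error,
           because a string is not implicitly converted to an integer.
           The conversion has to be requested explicitly: -->
      <xsl:value-of select="my:double(xs:integer('21'))"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

The explicit cast is the price of the stricter regime; the benefit is that a mistake such as passing the wrong kind of value is reported where it occurs, not hidden behind a silent conversion.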

We will look at the implications of this in greater detail in Chapter 4.