XSLT: A Quick Introduction | Beginning ASP.NET Databases Using VB.NET

XSLT is a language for describing XML transformations: that is, operations that take one or more XML documents as input and produce one or more XML documents as output. The language was developed within W3C as part of a wider exercise concerned with styling, or rendition, of XML; hence the name "eXtensible Stylesheet Language: Transformations."

XSLT 1.0 was published as a W3C Recommendation on 16 November 1999. During the first three years of its life it has attracted a substantial number of implementations (probably as many as twenty), including implementations built into the two main web browsers, Internet Explorer and Netscape, and a number of open-source implementations, one of these being Saxon, distributed by the author of this chapter. Most of the implementations achieve an excellent level of conformance to the W3C specification, although the existence of vendor extensions means that portability across implementations is not always as easy to achieve as one might like. The language has been widely adopted by the user community, despite having a reputation in some quarters for being difficult to learn and sluggish in performance.

Probably 80 percent of the actual usage of XSLT today is for transforming XML to HTML. This is handled by treating the result document as a well-formed XML tree, with the transformation being followed by a serialization phase that translates this tree into an HTML output file. Another 10 percent of XSLT usage performs the function of rendering XML into other display formats, such as SVG, WML, or PDF (via the other part of XSL, the formatting objects vocabulary). The remaining 10 percent of usage is in XML-to-XML applications, notably the transformation of messages sent between applications in an enterprise integration infrastructure, either within an organization or across organization boundaries. Although small today, this segment of the market is probably the one that is growing fastest, and the one that offers the greatest commercial returns for suppliers.

Some of the key characteristics of XSLT as a language are listed below:

XML-based syntax: An XSLT transformation program (referred to, for historic reasons, as a stylesheet) is itself an XML document. This feature is particularly useful when large parts of the stylesheet contain fixed, or relatively fixed, XML elements and attributes to be written directly to the output, because then we can regard the stylesheet as a template for the result document. Another useful consequence of this design decision is that we can use XSLT stylesheets as the source or target of further transformations. Although this appears at first sight to be a rather exotic idea, it is actually common in large-scale applications for stylesheets to be generated or adapted using "meta-stylesheets," which are themselves written in XSLT.
Declarative, functional programming model: The basic programming paradigm of XSLT is functional programming. A stylesheet describes a transformation of a source tree to a result tree. The result tree is a function of the source, and individual subtrees of the result are functions of the source information from which they are derived. Although a stylesheet contains constructs such as conditionals and iterations that are familiar from procedural programming, nothing in the language prescribes a particular order of execution. In particular, there are no assignment statements and no updateable variables. This feature probably accounts for the reputation of the language as being hard to learn, because web authors accustomed to languages like JavaScript find that it can require a considerable mental readjustment. It also accounts for the poor performance that is sometimes reported, because without a simple imperative model of what the machine is actually doing, it is easy for programmers to write extremely inefficient code. (For further discussion, see the section on optimization later in this chapter.)
Rule-based: An XSLT stylesheet is expressed as a collection of rules, in the tradition of text-processing languages like awk and sed. Each rule consists of a pattern to be matched in the input, and instructions for generating nodes in the result tree (a template) when the pattern is matched. Unlike the rules in text-processing languages, however, the rules are not applied sequentially to each line of input text in turn; instead, they perform a traversal of the input tree. In most simple transformations, each template rule for a parent node triggers the activation of the rules for its children, which results in a recursive, depth-first traversal of the source tree. But this is entirely under the control of the stylesheet author, and it is possible to traverse the input tree in any way the author chooses.

The advantage of this rule-based approach is that the stylesheet can be made very resilient to changes in the details of the structure of the input document. It is particularly good at handling the recursive structures that occur in "document-oriented" XML, which often have very liberal rules for the nesting of one tag within another. For "data-oriented" XML transformations, where the structures are more rigid, this style of processing has fewer advantages, and in fact there is no need to write every stylesheet in this way.
Tree-to-tree transformation: The input and output of a transformation are modeled as trees, not as serial XML. The construction of a source tree (using an XML parser) and the serialization of a final result tree are separate operations from the transformation itself, and in many applications they are not actually performed; for example, it is common to build a pipeline of transformations, so that the output of one is used directly as the input to the next, without intermediate serialization. This means that incidental details of the source XML (for example, the distinction between single and double quotes around attributes) are not visible to the application, and in general are not preserved through a transformation. Sometimes this can cause usability problems; for example, a transformation will always expand entity references and attribute defaults defined in a DTD, which is not ideal if the result document is intended to be further edited by the user.
Two-language model: XSLT uses XPath as a sublanguage. We examine the relationship of XSLT and XPath in more detail in the following section. Roughly speaking, XSLT instructions are used to produce nodes in the result tree and to control the sequence of processing. XPath expressions are used to select data from the source tree. XPath expressions are always invoked from XSLT instructions; XSLT 1.0 offers no callback in the opposite direction, so an XPath expression cannot invoke an XSLT instruction. This means that the language is not fully composable, in the sense that not every construct can be nested inside every other.
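
To make the point about the absence of assignment statements more concrete, here is a minimal sketch (not part of the original chapter) of how a running total might be computed in XSLT 1.0. Instead of updating a variable inside a loop, a named template calls itself recursively and passes the accumulated value as a parameter. The element names order, item, price, and qty, and the template name total, are assumptions made purely for illustration.

```xml
<xslt:stylesheet version="1.0"
    xmlns:xslt="http://www.w3.org/1999/XSL/Transform">

  <xslt:output method="text"/>

  <!-- Template rule: when an order element is matched, start the
       recursion over its item children.
       (order, item, price, and qty are hypothetical names.) -->
  <xslt:template match="order">
    <xslt:call-template name="total">
      <xslt:with-param name="items" select="item"/>
    </xslt:call-template>
  </xslt:template>

  <!-- No variable is ever updated: each recursive call receives a
       fresh $sum and the node-set of items still to be processed. -->
  <xslt:template name="total">
    <xslt:param name="items"/>
    <xslt:param name="sum" select="0"/>
    <xslt:choose>
      <xslt:when test="$items">
        <xslt:call-template name="total">
          <xslt:with-param name="items"
              select="$items[position() &gt; 1]"/>
          <xslt:with-param name="sum"
              select="$sum + $items[1]/price * $items[1]/qty"/>
        </xslt:call-template>
      </xslt:when>
      <xslt:otherwise>
        <!-- Empty node-set: the recursion ends and the total is written -->
        <xslt:value-of select="$sum"/>
      </xslt:otherwise>
    </xslt:choose>
  </xslt:template>

</xslt:stylesheet>
```

The sketch also shows the division of labor between the two languages: the XSLT instructions (xslt:call-template, xslt:choose) control processing, while the XPath expressions in the select and test attributes pick out data and do the arithmetic.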

The segment of XSLT code in Listing 3.1 illustrates these features.

Listing 3.1 Code Illustrating Key Features of XSLT

 <xslt:template match="appendix">
    <h2>
       Appendix <xslt:number format="A"/>
       <xslt:text>&#xa0;</xslt:text>
       <a name="{@id}"/>
       <xslt:value-of select="@title"/>
    </h2>
    <xslt:apply-templates/>
 </xslt:template>

This shows a single template rule. The pattern is very simple: match="appendix" indicates that the template rule matches elements named appendix. The body of the template rule defines nodes to be written to the output tree. Here xslt:number is an instruction for generating a sequence number; xslt:text indicates literal text to be written to the output tree; xslt:value-of computes the result of an XPath expression and writes that as text to the output tree; and xslt:apply-templates selects further nodes from the source tree (by default, the children of the current node), and causes them to be processed by firing the appropriate template rule for each one. The elements a and h2, which are not in the XSLT namespace, are copied directly to the output. The curly brackets in the name attribute of the a element indicate an attribute value template: they enclose an XPath expression that computes a string value to be inserted in the content of the attribute. This construct is used because the constraints of XML syntax make it impossible to nest instructions inside an attribute value.
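
As a rough illustration of what the attribute value template stands for, the fragment below (an added sketch, meant to appear inside a template body) shows the shorthand next to a longhand form that builds the same attribute with the xslt:attribute instruction. Both forms should produce an a element whose name attribute holds the value of the current node's id attribute.

```xml
<!-- Shorthand: attribute value template, as used in Listing 3.1 -->
<a name="{@id}"/>

<!-- Longhand equivalent: the attribute is created by an instruction
     nested inside the element, not inside the attribute value -->
<a>
  <xslt:attribute name="name">
    <xslt:value-of select="@id"/>
  </xslt:attribute>
</a>
```

The longhand form is useful when the attribute value needs conditional logic, which cannot be expressed inside the curly brackets.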

Suppose that the input document looks like this:

 <appendix id="bibl" title="Bibliography">
    <para>A reference</para>
 </appendix>

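We haven't shown the template rule that processes the para elements, but assuming it outputs an HTML p element and copies the textual content, and assuming this appendix is the third appendix element among its siblings in the full document (so that xslt:number generates the letter C), the result of applying this stylesheet is likely to be: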

 <h2>Appendix C&nbsp;<a name="bibl"/>Bibliography</h2>
 <p>A reference</p>
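
For readers who want to try this, the following is a minimal sketch of a complete stylesheet built around the template rule from Listing 3.1. The enclosing xslt:stylesheet element, the xslt:output declaration, and the rule for para are assumptions added here; the chapter does not show them. The prefix xslt is bound to the standard XSLT namespace, and the word "Appendix" is wrapped in xslt:text so that the surrounding line breaks and indentation are not copied to the output.

```xml
<?xml version="1.0"?>
<xslt:stylesheet version="1.0"
    xmlns:xslt="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask for the result tree to be serialized as HTML, not XML -->
  <xslt:output method="html"/>

  <!-- The template rule from Listing 3.1 -->
  <xslt:template match="appendix">
    <h2>
      <xslt:text>Appendix </xslt:text>
      <xslt:number format="A"/>
      <xslt:text>&#xa0;</xslt:text>
      <a name="{@id}"/>
      <xslt:value-of select="@title"/>
    </h2>
    <xslt:apply-templates/>
  </xslt:template>

  <!-- Assumed rule for para: emit a p element and process the
       children, so that the text content is copied by the
       built-in rule for text nodes -->
  <xslt:template match="para">
    <p><xslt:apply-templates/></p>
  </xslt:template>

</xslt:stylesheet>
```

By default xslt:number counts the current element and its preceding siblings of the same name, so the letter generated depends on where the appendix sits in the source document: the letter C corresponds to an appendix with two preceding appendix siblings. The exact serialization (for example, whether the non-breaking space is written as an entity reference, and how the empty a element is written) is left to the processor's HTML output method.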