Chapter 2: Xalan | Professional XML Development with Apache Tools: Xerces, Xalan, FOP, Cocoon, Axis, Xindice (Wrox Professional Guides)

The Apache XSLT processing engine is Xalan (named after a rare musical instrument). There are both Java and C++ versions of Xalan. The current version of Xalan implements the XSLT 1.0 and XPath 1.0 Recommendations from the World Wide Web Consortium (W3C). The Xalan code base was originally donated by IBM’s Lotus subsidiary, where it was known as LotusXSL. Xalan is now on its second-generation code base, which differs substantially from that of Xalan-Java 1. Since version 2.1.0, released in 2001, Xalan has also included the XSLTC compiler originally developed by Sun Microsystems.

Xalan works by interpreting the XSLT stylesheet—it translates the stylesheet as it’s processing it. XSLTC compiles the stylesheet into a Java class called a translet. This avoids the overhead of translating the stylesheet over and over, which is particularly important in server applications where you’re using the same stylesheet to transform many different XML documents.
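JAXP, the standard Java transformation API that Xalan implements, expresses this compile-once idea through javax.xml.transform.Templates: you compile a stylesheet once and then create a cheap Transformer from it for each document. The following is a minimal sketch, assuming a JAXP-compliant processor on the classpath (the JDK’s built-in TransformerFactory is itself derived from XSLTC). The one-template stylesheet and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CompiledStylesheetDemo {
    // Hypothetical one-template stylesheet: prints a greeting for the <name> element.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>Hello, <xsl:value-of select='name'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Parse and compile the stylesheet once. A Templates object is thread-safe
    // and can be shared, which is what makes compiling up front pay off.
    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Each transformation gets its own lightweight (not thread-safe) Transformer.
    static String transform(Templates compiled, String xml) {
        try {
            Transformer t = compiled.newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Templates compiled = compile(XSL); // compiled once...
        for (String who : new String[] {"Xalan", "XSLTC"}) {
            System.out.println(transform(compiled, "<name>" + who + "</name>")); // ...reused
        }
    }
}
```

If the Xalan-Java 2 jars are on the classpath, setting the javax.xml.transform.TransformerFactory system property to org.apache.xalan.xsltc.trax.TransformerFactoryImpl should select Xalan’s own XSLTC rather than the JDK’s internal copy.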

Prerequisites

The XPath and XSLT technologies are used in a number of the xml.apache.org projects, so the next two sections of this chapter will review their key concepts. If you’re proficient with XPath and XSLT, you can probably skim or skip these sections. XPath and XSLT are topics large enough for their own books, so be aware that what you’re about to read is a quick review.

XPath

XPath is a language for specifying (addressing) parts of an XML document. Its data model treats an XML document as a tree of nodes. There are seven types of nodes, as shown in the following table.

Node Type	Description
Root	The root node of the XPath tree. The root element of the XML document is a child of this node.
Element	There is one element node for every element in the XML document.
Text	Character data inside elements is represented as text nodes. This includes data in CDATA sections. Unlike the DOM, the XPath data model puts as much character data into a single text node as possible, so a text node never has another text node as an adjacent sibling.
Attribute	There is one attribute node for every attribute of every element in the XML document. The element to which the attributes are attached is the parent of those attribute nodes. The attribute nodes aren’t children of that element node.
Namespace	There is one namespace node for every namespace prefix in scope for an element. The element for which the prefixes are in scope is the parent of those namespace nodes. The namespace nodes aren’t children of that element node.
Processing instruction	There is a processing instruction node for every processing instruction outside of the DTD.
Comment	There is a comment node for every comment outside of the DTD.

XPath is an expression-based language. This means every syntactically correct construct in XPath yields a value. XPath expressions can produce four different types of results: node sets (a set of nodes), booleans, numbers, and strings. The most important kinds of expressions in XPath are those that produce node sets. The type of expression used most commonly to generate a node set is a location path.
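The four result types map directly onto the JAXP XPath API (javax.xml.xpath), where an XPathConstants value names the type you want back. A minimal sketch, assuming a hypothetical two-item document rather than the books listing so that no namespace handling is needed:

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathResultTypes {
    // Hypothetical un-namespaced document, so no prefix mapping is needed.
    static final String XML = "<list><item>3</item><item>4</item></list>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluate an expression and request one of the four XPath 1.0 result types.
    static Object eval(String expr, QName resultType) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, parse(XML), resultType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NodeList items = (NodeList) eval("/list/item", XPathConstants.NODESET);          // node set
        Boolean many = (Boolean) eval("count(/list/item) > 1", XPathConstants.BOOLEAN); // boolean
        Double total = (Double) eval("sum(/list/item)", XPathConstants.NUMBER);         // number
        String first = (String) eval("string(/list/item)", XPathConstants.STRING);      // string
        System.out.println(items.getLength() + " " + many + " " + total + " " + first);
    }
}
```

Note that string() applied to a node set uses only the first node in document order, which is why first is "3" rather than "34".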

Location Paths

A location path is an expression that selects a node set from the XPath data model for an XML document. Location paths are evaluated relative to a context node, which is part of the overall context used to evaluate XPath expressions. XPath was designed for use in both XSLT and XPointer, and because these two specifications have different goals, the context for an XPath expression is set up differently depending on whether the expression is used inside XSLT or XPointer. To understand XPath itself, you don’t need to know which of the two is in use (we’ll look at how XSLT defines the XPath context when we get to XSLT). For now, you just need to know that every XPath expression is evaluated in a context provided by either XSLT or XPointer. The context consists of a node (the context node), the context position, the context size, a set of variable bindings, a function library, and the set of namespace declarations in scope for the expression.

There are two types of location paths. Relative location paths select nodes relative to the context node. Absolute location paths begin with the slash (/) character and select nodes relative to the root node of the document. You can think of an absolute location path as a relative location path whose context node is the root node of the document.

A location path is composed of a number of location steps, separated by a slash. The first location step selects a set of nodes relative to the context node. This node set is then used to form context nodes for the next location step. Every node in the node set obtained from the first location step is used as a context node, and the next location step is applied to those context nodes in order to select another set of nodes. This results in a set of node sets, one node set for every derived context node. This set of node sets is unioned together to produce a single node set. This process is repeated for each location step until there are no more location steps in the location path.
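The step-by-step process can be mimicked by hand over a DOM tree. The following hypothetical mini-evaluator supports only child-axis steps that test for a local name in the books namespace; it is a sketch of the idea described above, not how Xalan itself is implemented. It works on a trimmed copy of the books document (titles only):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class StepByStep {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";
    // Trimmed copy of the books listing: titles only.
    static final String XML = "<books xmlns='" + NS + "'>"
        + "<book><title>Professional XML Development with Apache Tools</title></book>"
        + "<book><title>Effective Java</title></book>"
        + "<book><title>Design Patterns</title></book></books>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical evaluator for an absolute path of child-axis steps,
    // each matching elements in NS with the given local name.
    static List<Node> evaluate(Document doc, String... localNames) {
        Set<Node> context = new LinkedHashSet<>();
        context.add(doc); // absolute path: the root node is the first context node
        for (String name : localNames) {
            // Union of the node sets produced from each context node. LinkedHashSet
            // keeps insertion order, which is document order for child-only steps.
            Set<Node> next = new LinkedHashSet<>();
            for (Node ctx : context) {
                for (Node c = ctx.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (c.getNodeType() == Node.ELEMENT_NODE
                            && NS.equals(c.getNamespaceURI())
                            && name.equals(c.getLocalName())) {
                        next.add(c);
                    }
                }
            }
            context = next; // this step's result supplies the next step's context nodes
        }
        return new ArrayList<>(context);
    }

    public static void main(String[] args) {
        for (Node title : evaluate(parse(XML), "books", "book", "title")) {
            System.out.println(title.getTextContent());
        }
    }
}
```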

Let’s look at an example to demonstrate this process. The following listing shows an XML document that represents a collection of books:

  <?xml version="1.0" encoding="UTF-8"?>
  <books xmlns="http://sauria.com/schemas/apache-xml-book/books"
    xmlns:tns="http://sauria.com/schemas/apache-xml-book/books"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation=
      "http://sauria.com/schemas/apache-xml-book/books
       http://www.sauria.com/schemas/apache-xml-book/books.xsd"
    version="1.0">
    <book>
      <title>Professional XML Development with Apache Tools</title>
      <author>Theodore W. Leung</author>
      <isbn>0-7645-4355-5</isbn>
      <month>December</month>
      <year>2003</year>
      <publisher>Wrox</publisher>
      <address>Indianapolis, Indiana</address>
    </book>
    <book>
      <title>Effective Java</title>
      <author>Joshua Bloch</author>
      <isbn>0-201-31005-8</isbn>
      <month>August</month>
      <year>2001</year>
      <publisher>Addison-Wesley</publisher>
      <address>New York, New York</address>
    </book>
    <book>
      <title>Design Patterns</title>
      <author>Erich Gamma</author>
      <author>Richard Helm</author>
      <author>Ralph Johnson</author>
      <author>John Vlissides</author>
      <isbn>0-201-63361-2</isbn>
      <month>October</month>
      <year>1994</year>
      <publisher>Addison-Wesley</publisher>
      <address>Reading, Massachusetts</address>
    </book>
  </books>

Now let’s look at a simple location path: /tns:books/tns:book. This location path selects the three book elements that are children of the books element. Let’s start at the beginning of the location path and see what’s going on. The initial slash tells you that this is an absolute location path. The first location step is tns:books. This location step selects any child of the root node whose name is tns:books. Here tns:books represents the namespace-qualified books element. It’s important to include the tns: prefix because the books element is in a namespace; otherwise no nodes will match. The result of applying tns:books is a node set that contains the single node representing the books element in the document. Now you use that node set as the set of context nodes for the next location step, tns:book. This location step selects any child of the context node whose name is tns:book, which yields a node set containing the three book elements from the document. At this point you’ve run out of location steps, so this node set is the final value of the location path.
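You can check this with the JAXP XPath API. The following is a minimal sketch, assuming a trimmed copy of the books document (titles only). The tns prefix has to be bound explicitly through a NamespaceContext, and the DOM parser must be namespace-aware; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BookLocationPath {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";
    // Trimmed copy of the books document (titles only).
    static final String XML = "<books xmlns='" + NS + "' version='1.0'>"
        + "<book><title>Professional XML Development with Apache Tools</title></book>"
        + "<book><title>Effective Java</title></book>"
        + "<book><title>Design Patterns</title></book></books>";

    // Bind the tns prefix used in the text to the books namespace URI.
    static XPath newXPath() {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "tns".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return NS.equals(uri) ? "tns" : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                return Collections.emptyIterator();
            }
        });
        return xp;
    }

    // Evaluate a location path and return the string value of each selected node.
    static List<String> select(String path) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // without this, namespace-qualified steps never match
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) newXPath().evaluate(path, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/tns:books/tns:book")); // the three book elements
        System.out.println(select("/books/book"));         // empty: no namespace given
    }
}
```

The second path selects nothing, because an unprefixed name in an XPath 1.0 expression always means “no namespace,” even though the document itself uses a default namespace.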

In the expression /tns:books/tns:book, you’re looking at fairly simple location steps that select nodes according to their expanded names. An expanded name is composed of a namespace URI and a local part. Any node that can be affected by namespace declarations has an expanded name. Two expanded names are equal if they have the same local part and either both have a null namespace URI or both have the same non-null namespace URI. Location steps can be quite a bit more complicated than this, though. A location step has three parts:
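Java’s javax.xml.namespace.QName class follows the same equality rule: two QName objects are equal when their namespace URIs and local parts are equal, and the prefix plays no part. A short sketch (QName normalizes a null namespace URI to the empty string, which stands for “no namespace”):

```java
import javax.xml.namespace.QName;

final class ExpandedNames {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";

    public static void main(String[] args) {
        QName prefixed = new QName(NS, "books", "tns"); // tns:books
        QName unprefixed = new QName(NS, "books");      // books under a default namespace
        QName noNamespace = new QName("books");         // books in no namespace
        QName nullUri = new QName(null, "books");       // null URI becomes ""

        System.out.println(prefixed.equals(unprefixed));  // same URI and local part
        System.out.println(prefixed.equals(noNamespace)); // URIs differ
        System.out.println(noNamespace.equals(nullUri));  // both have no namespace
    }
}
```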

An axis—A number of axes are defined on the XPath data model tree. These axes define sets of nodes that have a specified relationship with the context node. So far, you’ve only seen the child axis, which contains the children of the context node. The location step selects nodes only from its axis.
A node test—Node tests specify the type and expanded name of the nodes the location step should select.
Zero or more predicates—Predicates filter nodes out of the node set being selected by the location step. Predicates are defined using XPath expressions.

All the location steps we’ll be looking at use an abbreviated syntax that makes them easier to write. The full syntax for a location step is axis::node-test[predicate-0]…[predicate-n].

Now, let’s look at each of these components in more detail.

Axes

Let’s begin with axes. The following table describes all the XPath axes. It’s important to keep in mind that everything else in a location step is done relative to the axis (unless the axis is explicitly specified in a predicate).

Axis Name	Description
child	Contains the children of the context node.
descendant	Contains the descendants of the context node, that is, the children, the children of the children, and so on. Note that attribute and namespace nodes aren’t included because they aren’t children.
parent	Contains the parent of the context node if there is one.
ancestor	Contains the parent of the context node, and the parent of the parent, and so on, all the way up to and including the root node.
following-sibling	Contains all the nodes that are siblings of the context node (have the same parent) and that come after the context node in document order. If the context node is an attribute or namespace node, then this axis is empty.
preceding-sibling	Contains all the nodes that are siblings of the context node (have the same parent) and that come before the context node in document order. If the context node is an attribute or namespace node, then this axis is empty.
following	Contains all the nodes that come after the context node in document order, excluding its descendants, attribute nodes, and namespace nodes.
preceding	Contains all the nodes that come before the context node in document order, excluding its descendants, attribute nodes, and namespace nodes.
attribute	Contains the attribute nodes of the context node if the context node is an element node; otherwise this axis is empty.
namespace	Contains the namespace nodes of the context node if the context node is an element node; otherwise this axis is empty.
self	Contains the context node itself.
descendant-or-self	Contains the context node and the descendants of the context node.
ancestor-or-self	Contains the context node and the ancestors of the context node, all the way up to the root node.
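To see the axes in action, the following sketch evaluates a few explicit-axis location steps with the JAXP XPath API against a trimmed copy of the books document (title and year only). The class name and the tns binding are assumptions carried over from the earlier discussion.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class AxesDemo {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";
    // Trimmed copy of the books document: title and year only.
    static final String XML = "<books xmlns='" + NS + "' version='1.0'>"
        + "<book><title>Professional XML Development with Apache Tools</title><year>2003</year></book>"
        + "<book><title>Effective Java</title><year>2001</year></book>"
        + "<book><title>Design Patterns</title><year>1994</year></book></books>";

    // Parse namespace-aware, bind tns, and return the string value of the result.
    static String eval(String expr) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "tns".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return NS.equals(uri) ? "tns" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            return xp.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {
            "count(/tns:books/tns:book[3]/preceding-sibling::tns:book)",    // 2
            "count(/tns:books/tns:book[1]/following-sibling::tns:book)",    // 2
            "count(/descendant::tns:title)",                                // 3
            "count(/descendant::tns:title/ancestor::*)",                    // 4: three books + books
            "count(/tns:books/tns:book[1]/tns:title/following::tns:title)", // 2
            "local-name(/tns:books/tns:book[2]/tns:year/parent::*)"         // book
        };
        for (String e : exprs) {
            System.out.println(e + " -> " + eval(e));
        }
    }
}
```

The ancestor count is 4 rather than 6 because each step’s results are unioned: every title shares the same books ancestor, so it appears only once.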

Node Tests

Node tests work on the principal node type of the location step’s axis. The following table shows the principal node types by axis.

Axis	Principal Node Type
attribute	Attribute
namespace	Namespace
All others	Element

There are four kinds of node tests:

QName—Any node of the principal node type whose expanded name is the same as the expanded name of the QName. The location step tns:books is a QName node test.
NCName:*—Any node of the principal node type whose expanded name’s namespace URI is the same as the namespace URI for the prefix denoted by the NCName. The location step tns:* uses an NCName node test.
*—Selects any node of the principal node type. So, child::* selects all element children of the context node, because elements are the principal node type for the child axis.
NodeType—Selects any node of the given type. The node type tests are comment(), text(), processing-instruction(), and node(); node() matches any node of any type. The location step child::text() selects all text nodes in the child axis.
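The following sketch compares node tests on the same axis, using the JAXP XPath API and a trimmed copy of the books document with a comment added (the comment is hypothetical), so that child::node() and child::* give different answers.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NodeTestsDemo {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";
    // Trimmed books document plus a hypothetical comment as the first child.
    static final String XML = "<books xmlns='" + NS + "'><!-- inventory -->"
        + "<book><title>Professional XML Development with Apache Tools</title></book>"
        + "<book><title>Effective Java</title></book>"
        + "<book><title>Design Patterns</title></book></books>";

    static String eval(String expr) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "tns".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return NS.equals(uri) ? "tns" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            return xp.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("count(/tns:books/child::node())"));    // 4: comment + 3 books
        System.out.println(eval("count(/tns:books/child::*)"));         // 3: elements only
        System.out.println(eval("count(/tns:books/tns:*)"));            // 3: any name in tns
        System.out.println(eval("count(/tns:books/child::comment())")); // 1
        System.out.println(eval("string(/tns:books/tns:book[2]/tns:title/child::text())"));
    }
}
```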

Predicates

Predicates are expressions used to filter nodes out of a node set. A predicate expression is evaluated once for each node in the node set, with that node as the context node. Predicate expressions are normally boolean-valued (a result that isn’t a number is converted as if by the boolean() function). If the expression evaluates to true, the node is part of the result; if it evaluates to false, it isn’t.

There’s one additional twist to predicates. The predicate expression can evaluate to a number. This number is then automatically compared to the context position, yielding true if the number and the context position are the same, or false otherwise. You’re probably wondering how the context position is determined. To describe this, we need to take a detour and define the proximity position of a node with respect to an axis. Axes can be either forward axes or reverse axes. A forward axis contains only the context node or nodes after the context node in document order (we’ll define this in a minute). A reverse axis contains only the context node or nodes before the context node in document order. Document order is an ordering on the nodes in a node set that is supposed to correspond to the natural order of the content of the XML document. Element nodes occur before their children. An element’s attribute and namespace nodes occur after the element node but before its children, and namespace nodes occur before the attribute nodes. Reverse document order is the reverse of document order. The proximity position of a node with respect to an axis is the position of the node when the node set is ordered in document order (for a forward axis) or reverse document order (for a reverse axis). Positions are numbered starting at 1. The context position of the context node in a predicate expression is the proximity position of the node with respect to the axis used by the predicate. In this setting, the context size is the size of the node set being used as input to the predicate.
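Proximity positions matter most on reverse axes, where [1] means the nearest node rather than the first one in document order. A minimal sketch using the JAXP XPath API over a trimmed copy of the books document (titles only); the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ProximityDemo {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";
    static final String XML = "<books xmlns='" + NS + "'>"
        + "<book><title>Professional XML Development with Apache Tools</title></book>"
        + "<book><title>Effective Java</title></book>"
        + "<book><title>Design Patterns</title></book></books>";

    static String eval(String expr) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "tns".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return NS.equals(uri) ? "tns" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            return xp.evaluate(expr, doc); // string value of the first selected node
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Reverse axis: position 1 is the sibling closest to the context node.
        System.out.println(eval("/tns:books/tns:book[3]/preceding-sibling::tns:book[1]/tns:title"));
        // last() is the context size, so this is the sibling farthest away.
        System.out.println(eval("/tns:books/tns:book[3]/preceding-sibling::tns:book[last()]/tns:title"));
        // Forward axis: position 1 is again the closest sibling.
        System.out.println(eval("/tns:books/tns:book[1]/following-sibling::tns:book[1]/tns:title"));
    }
}
```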

Let’s look at a few predicate examples:

The location path /tns:books/tns:book[tns:year='2003'] selects the tns:book children of the tns:books element (from the root) that have a tns:year child whose value is 2003. The predicate [tns:year='2003'] narrows down the node set produced by the /tns:books/tns:book location path.

Here’s an example using the context position. The XPath /tns:books/tns:book[2] selects the second tns:book child of the tns:books child of the root node.

Our last example shows how you can specify the axis explicitly to get what you’re looking for. The path /tns:books[attribute::version='1.0'] selects the tns:books child of the root node, but only if its version attribute (from the attribute axis) is equal to '1.0'.
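The three predicate examples can be evaluated directly. This sketch uses the JAXP XPath API and a trimmed copy of the books document (title, year, and the version attribute only); the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class PredicateDemo {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";
    static final String XML = "<books xmlns='" + NS + "' version='1.0'>"
        + "<book><title>Professional XML Development with Apache Tools</title><year>2003</year></book>"
        + "<book><title>Effective Java</title><year>2001</year></book>"
        + "<book><title>Design Patterns</title><year>1994</year></book></books>";

    static String eval(String expr) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "tns".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return NS.equals(uri) ? "tns" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            return xp.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Boolean predicate: keep books that have a year child equal to '2003'.
        System.out.println(eval("/tns:books/tns:book[tns:year='2003']/tns:title"));
        // Numeric predicate: compared with the context position.
        System.out.println(eval("/tns:books/tns:book[2]/tns:title"));
        // Explicit attribute axis inside the predicate.
        System.out.println(eval("count(/tns:books[attribute::version='1.0'])"));
        System.out.println(eval("count(/tns:books[attribute::version='2.0'])"));
    }
}
```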

Expressions

We’re going to wind up our coverage of XPath by filling in some details related to expressions. We’ll look at the functions XPath allows you to call, and then we’ll examine the abbreviated syntax for XPath. First, though, let’s finish some details of the syntax for building expressions.

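Expressions can be combined using familiar operators. The operators that combine expressions are as follows, listed in order of decreasing precedence (the arithmetic operators, covered with the number functions later, bind more tightly than the comparison operators but less tightly than |):

| (union of two node sets)
<, >, <=, >=
=, !=
and
or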
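A quick way to confirm the precedence rules is to evaluate expressions whose value depends on them. A minimal sketch with the JAXP XPath API over a trimmed copy of the books document (title and year only); the class name is hypothetical. For example, and binds more tightly than or, and the relational operators bind more tightly than =.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class PrecedenceDemo {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";
    static final String XML = "<books xmlns='" + NS + "'>"
        + "<book><title>Professional XML Development with Apache Tools</title><year>2003</year></book>"
        + "<book><title>Effective Java</title><year>2001</year></book>"
        + "<book><title>Design Patterns</title><year>1994</year></book></books>";

    static String eval(String expr) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "tns".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return NS.equals(uri) ? "tns" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            return xp.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("true() or false() and false()"));   // parsed as true or (false and false)
        System.out.println(eval("(true() or false()) and false()")); // parentheses change the result
        System.out.println(eval("1 < 2 = true()"));                  // parsed as (1 < 2) = true()
        System.out.println(eval(                                     // union of titles and years
            "count(/tns:books/tns:book/tns:title | /tns:books/tns:book/tns:year)"));
    }
}
```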

All XPath expressions yield one of the following four types: node-set, boolean, number, or string.

XPath Core Function Library

XPath defines a standard library of functions to be used in expressions. XSLT and XPointer extend this library with functions specific to their vocabularies. There are four groupings of functions, according to the four types available in XPath 1.0 (see the following tables).

Function	Description
number last()	Returns the context size.
number position()	Returns the context position.
number count(node-set)	Returns the number of nodes in the node set.
node-set id(object)	Returns the element or elements whose unique ID (the value of an ID-typed attribute) matches object.
string local-name(node-set)	Returns the local name of the node from the node set that is first in document order.
string namespace-uri(node-set)	Returns the namespace URI of the node from the node set that is first in document order.
string name(node-set)	Returns a QName representing the expanded name of the node from the node set that is first in document order.
string string(object)	Returns an object as a string.
string concat(string, string, string*)	Returns the string concatenation of all the arguments.
boolean starts-with(string,string)	Returns true if the first argument starts with the second argument.
boolean contains(string, string)	Returns true if the first argument contains the second argument.
string substring-before(string, string)	If the second string is a substring of the first string, returns the portion of the first string before the first occurrence of the second string.
string substring-after(string, string)	If the second string is a substring of the first string, returns the portion of the first string after the first occurrence of the second string.
string substring(string, number, number?)	Returns the substring of the first argument starting at the position indicated by the second argument (the first character is at position 1). If the third argument is present, it gives the length of the substring; otherwise the substring continues to the end of the string. For example, substring("12345", 2, 3) returns "234".
number string-length(string?)	Returns the length of the string. If no argument is provided, converts the context node to a string and returns the length of that string.
string normalize-space(string?)	Returns a normalized version of the string stripped of leading and trailing whitespace and with all sequences of whitespace characters replaced with a single space.
string translate(string, string, string)	Returns the first argument with characters replaced according to the second and third arguments. The characters in the first argument are examined one by one. If a character occurs in the second argument, it’s replaced by the character at the same position in the third argument, or removed if the third argument has no character at that position. For example, translate("characters", "aeiou", "AEIOU") returns "chArActErs".
boolean boolean(object)	Converts an object to a boolean. A number is true if and only if it’s neither positive or negative zero nor NaN. A node set is true if and only if it’s non-empty. A string is true if and only if its length is non-zero.
boolean not(boolean)	Returns true if the argument is false, or false otherwise.
boolean true()	Returns true.
boolean false()	Returns false.
boolean lang(string)	Returns true if the language of the context node, as given by the xml:lang attribute on the context node or on its nearest ancestor that has one, is the same as, or a sublanguage of, the argument.

XPath provides some basic operators for numbers, including +, -, *, div, mod, and unary -. It also provides the functions in the following table.

Function	Description
number number(object)	Converts an object to a number. A string that looks like a decimal number (optionally signed and surrounded by whitespace) becomes the nearest IEEE 754 number; any other string becomes NaN. Boolean true becomes 1; false becomes 0. A node set is converted to a string, and that string is converted to a number.
number sum(node-set)	Returns the sum of the values obtained by converting each node in the node set to a string and then to a number.
number floor(number)	Returns the largest integer that isn’t greater than the number.
number ceiling(number)	Returns the smallest integer that isn’t less than the number.
number round(number)	Returns the integer that’s closest to the number. If there is a tie, returns the integer closer to positive infinity (so round(-2.5) is -2).
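Several of these functions are easy to misremember: substring takes a length rather than an end position, and round breaks ties toward positive infinity. A minimal sketch that evaluates them with the JAXP XPath API against an empty DOM document, since none of these expressions needs a context node; the class and helper names are hypothetical.

```java
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

final class CoreFunctions {
    // Evaluate against an empty document: these expressions ignore the context node.
    static Object eval(String expr, QName type) {
        try {
            Document empty = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            return XPathFactory.newInstance().newXPath().evaluate(expr, empty, type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String str(String expr) { return (String) eval(expr, XPathConstants.STRING); }
    static double num(String expr) { return (Double) eval(expr, XPathConstants.NUMBER); }
    static boolean bool(String expr) { return (Boolean) eval(expr, XPathConstants.BOOLEAN); }

    public static void main(String[] args) {
        System.out.println(str("substring('12345', 2, 3)"));                  // 234: 3 is a length
        System.out.println(str("translate('characters', 'aeiou', 'AEIOU')")); // chArActErs
        System.out.println(str("normalize-space('  a   b ')"));               // a b
        System.out.println(num("round(2.5)") + " " + num("round(-2.5)"));     // 3 and -2
        System.out.println(bool("boolean('false')"));                         // true: non-empty string
        System.out.println(bool("boolean(0 div 0)"));                         // false: NaN
    }
}
```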

Abbreviated Syntax

To make it easier to write XPath expressions, you can use the following abbreviated syntax:

If no axis is specified, child:: is assumed.
The attribute axis attribute:: can be abbreviated @.
// is an abbreviation for /descendant-or-self::node()/.
. is an abbreviation for self::node().
.. is an abbreviation for parent::node().

Let’s look at some of the examples we’ve used so far in this chapter, and compare abbreviated and unabbreviated forms:

/tns:books/tns:book => /child::tns:books/child::tns:book
/tns:books/tns:book[tns:year='2003'] => /child::tns:books/child::tns:book[child::tns:year = '2003']
/tns:books/tns:book[2] => /child::tns:books/child::tns:book[2]
/tns:books[attribute::version='1.0'] => /child::tns:books[(attribute::version = "1.0")] (full)
/tns:books[attribute::version='1.0'] => /tns:books[@version='1.0'] (fully abbreviated)
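These equivalences can be checked mechanically: evaluate each abbreviated path and its unabbreviated form, and compare the selected nodes one by one. A minimal sketch using the JAXP XPath API over a trimmed copy of the books document; the pair list and class name are hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AbbreviationCheck {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";
    static final String XML = "<books xmlns='" + NS + "' version='1.0'>"
        + "<book><title>Professional XML Development with Apache Tools</title><year>2003</year></book>"
        + "<book><title>Effective Java</title><year>2001</year></book>"
        + "<book><title>Design Patterns</title><year>1994</year></book></books>";

    // Abbreviated form first, unabbreviated form second.
    static final String[][] PAIRS = {
        {"/tns:books/tns:book", "/child::tns:books/child::tns:book"},
        {"/tns:books/tns:book[tns:year='2003']",
         "/child::tns:books/child::tns:book[child::tns:year = '2003']"},
        {"/tns:books/tns:book[2]", "/child::tns:books/child::tns:book[2]"},
        {"/tns:books[@version='1.0']", "/child::tns:books[attribute::version = '1.0']"},
        {"//tns:title", "/descendant-or-self::node()/child::tns:title"},
        {"/tns:books/tns:book[2]/tns:title/..",
         "/child::tns:books/child::tns:book[2]/child::tns:title/parent::node()"}
    };

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static XPath newXPath() {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "tns".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) { return NS.equals(uri) ? "tns" : null; }
            @Override public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
        });
        return xp;
    }

    // True when both expressions select exactly the same nodes, in the same order.
    static boolean sameNodes(Document doc, String a, String b) {
        try {
            XPath xp = newXPath();
            NodeList x = (NodeList) xp.evaluate(a, doc, XPathConstants.NODESET);
            NodeList y = (NodeList) xp.evaluate(b, doc, XPathConstants.NODESET);
            if (x.getLength() != y.getLength()) {
                return false;
            }
            for (int i = 0; i < x.getLength(); i++) {
                if (!x.item(i).isSameNode(y.item(i))) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        for (String[] p : PAIRS) {
            System.out.println(sameNodes(doc, p[0], p[1]) + "  " + p[0] + "  =>  " + p[1]);
        }
    }
}
```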

That concludes our quick review of XPath. Now we’ll see how XPath fits into XSLT to give you an XML transformation system.

XSLT

XSL Transformations, or XSLT, is a language for transforming XML documents into other XML documents. It’s based on a declarative programming paradigm, which means you specify what you want the results of the transformation to look like as opposed to specifying how you want the transformation to be performed. XSLT looks at XML documents as trees, using a data model that’s very similar to the one used by XPath. The biggest difference between the XPath data model and the XSLT data model is whitespace stripping: after the source and stylesheet trees are constructed, but before stylesheet processing begins, some text nodes that contain only whitespace are removed. By default, such nodes are stripped throughout the stylesheet (except inside xsl:text), but in the source document only inside elements named by an xsl:strip-space declaration. An XSLT transformation gives a set of rules that are used to transform a source tree into a result tree.

An XSLT transformation is expressed as an XML document. The document must be well-formed and obey the rules for namespace usage in XML. This document is also known as a stylesheet. The stylesheet contains XML elements that are defined by XSLT as well as elements that aren’t defined by XSLT, but that appear in either the source or result document. Namespaces are used to keep the XSLT source and result elements from getting mixed up.

Templates

A stylesheet contains a number of XSLT constructs, but the most important content is a set of template rules, or templates. These templates do the job of transforming the source document. A template rule has two parts: a pattern that is matched against nodes from the source tree, and a template that can be instantiated as part of the result tree. The result tree is constructed by processing the current node list until it’s empty. Initially, the current node list contains only the root node of the source tree, so the XSLT engine begins by processing the root node as the current node. To process a node, the engine looks for all templates whose pattern matches the current node. If more than one template’s pattern matches, a conflict-resolution procedure is applied to select a single template. Once a single template has been selected, it’s instantiated to create part of the result tree. As part of the template instantiation, the template may add more nodes from the source tree to the current node list. The process then repeats by selecting a new current node from the current node list until the current node list is empty.

Let’s see some of this in action in a real XSLT stylesheet. You’ll write a stylesheet that converts the earlier books inventory into HTML. Here’s the beginning of that stylesheet:

  1: <?xml version="1.0" encoding="UTF-8"?>
  2: <xsl:stylesheet version="1.0"
  3:   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  4:   xmlns:books="http://sauria.com/schemas/apache-xml-book/books">

The root element of the stylesheet comes from the XSLT namespace. This namespace’s prefix is xsl by convention. Note that you declare both the xsl prefix and the books prefix on the root element. Because the books vocabulary is in a namespace, the stylesheet needs a prefix bound to that namespace in order to refer to its elements. Elements from HTML will be unprefixed.

The xsl:output element defines the output method for this stylesheet. There are output methods for XML, HTML, and text:

  5:     <xsl:output method="html" indent="yes"/>

Lines 6-17 contain a pair of template definitions:

  6:     <xsl:template match="books:books">
  7:       <html>
  8:         <head><title>Book Inventory</title></head>
  9:         <body>
 10:           <xsl:apply-templates/>
 11:         </body>
 12:       </html>
 13:     </xsl:template>
 14:     <xsl:template match="books:book">
 15:       <xsl:apply-templates/>
 16:       <p/>
 17:     </xsl:template>
 18: </xsl:stylesheet>
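To try this stylesheet from Java, you can hand it to a JAXP TransformerFactory. A minimal sketch, assuming the JDK’s built-in processor (derived from XSLTC) or Xalan on the classpath, and a trimmed two-book copy of the input document; the exact whitespace, and the META element the HTML output method inserts, may vary between processors.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BookInventoryHtml {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";
    // The two-template stylesheet from the listing, inlined as a Java string.
    static final String XSL = "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:books='" + NS + "'>"
        + "<xsl:output method='html' indent='yes'/>"
        + "<xsl:template match='books:books'>"
        + "<html><head><title>Book Inventory</title></head>"
        + "<body><xsl:apply-templates/></body></html>"
        + "</xsl:template>"
        + "<xsl:template match='books:book'><xsl:apply-templates/><p/></xsl:template>"
        + "</xsl:stylesheet>";
    // Trimmed two-book copy of the input document.
    static final String XML = "<books xmlns='" + NS + "'>"
        + "<book><title>Effective Java</title><author>Joshua Bloch</author></book>"
        + "<book><title>Design Patterns</title><author>Erich Gamma</author></book>"
        + "</books>";

    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Count non-overlapping occurrences of a substring.
    static int occurrences(String text, String sub) {
        int n = 0;
        for (int i = text.indexOf(sub); i >= 0; i = text.indexOf(sub, i + sub.length())) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```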

The match attribute of the xsl:template element defines the pattern that is checked against the current node. This stylesheet is processed by starting with the root node of the document and looking for a pattern that matches. The first template’s pattern books:books selects any child node named books:books, which is the child of the root node in this case. (Note that you could have written the pattern as /books:books to make sure the pattern started with the root node, but making the path relative is more flexible.) The selected books:books node is now the current node. The body of the template (between the xsl:template start and end tags) contains the template that’s instantiated. In this template, all elements without a namespace prefix are from HTML and are copied directly into the result tree. This creates the skeleton of an HTML document, which then needs to be filled in by the rest of the stylesheet. As the XSLT engine is instantiating this template, it encounters the xsl:apply-templates element. This element says (because it has no attributes), select all the children of the current node (now the books:books node), find the templates that match them, and instantiate those templates. This is how the rest of the source tree gets processed.

Now let’s follow what happens with the xsl:apply-templates element. The books:books element has three children, each of which is a books:book element. So, xsl:apply-templates tries to find a template for each of these children. In this case it’s easy, because the children are all the same kind of node and there are only two templates in the stylesheet. The XSLT engine finds the second template, which selects child nodes named books:book. The body of this template contains xsl:apply-templates and an HTML <p> (paragraph) element. Let’s put off processing of xsl:apply-templates for a moment, and step back to the books:books template. The result tree now has the HTML skeleton and the result of processing the books:book template (some unknown data and a <p> element) for each of the books:book elements in the source tree. Now we can look at the processing of the xsl:apply-templates element in the books:book template. This is a bit confusing, because you don’t have any additional templates in the stylesheet and none of the templates in the stylesheet match.

XSLT defines a built-in template rule for each node type in the XSLT data model. For the root node and element nodes, the built-in rule evaluates xsl:apply-templates for all the children of the current node. The built-in rule for text and attribute nodes copies the text or attribute value into the result tree, while the rules for processing instructions and comments do nothing. So, when xsl:apply-templates for books:book is processed, the built-in rules are used for the children of books:book. The result of applying the stylesheet looks like this:
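The built-in rules are easy to observe with a stylesheet that contains no template rules at all. A minimal sketch using JAXP; the input document and the class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BuiltInRules {
    // A stylesheet with no template rules: only XSLT's built-in rules apply.
    static final String XSL = "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: text, an attribute, a comment, and a processing instruction.
        System.out.println(transform("<a>x<b y='1'>z</b><!--note--><?pi data?></a>"));
    }
}
```

Only "xz" reaches the output. The element rules recurse into children, and attributes aren’t children, so the y attribute is never selected; the comment and processing instruction produce nothing.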

  <html
    xmlns:books="http://sauria.com/schemas/apache-xml-book/books">
  <head>
  <META
    http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <title>Book Inventory</title>
  </head>
  <body>
  
     Professional XML Development with Apache Tools
     Theodore W. Leung
     0-7645-4355-5
     December
     2003
     Wrox
     Indianapolis, Indiana
    <p></p>
  
     Effective Java
     Joshua Bloch
     0-201-31005-8
     August
     2001
     Addison-Wesley
     New York, New York
    <p></p>
  
     Design Patterns
     Erich Gamma
     Richard Helm
     Ralph Johnson
     John Vlissides
     0-201-63361-2
     October
     1994
     Addison-Wesley
     Reading, Massachusetts
    <p></p>
  
  </body>
  </html>

You can see how the built-in templates caused the text children of books:book to be copied through unchanged.

Now let’s expand the stylesheet to let you control the formatting of the children of books:book. In order to do this, you need to add a bunch of templates. Almost all of these templates select the specific child of books:book that they’re supposed to format. The body of the templates includes some HTML elements to do the formatting. They use the xsl:value-of element to insert a value from the source tree into the result tree. In all of these templates, you use a dot (.) (the location path for the current node), but you could use any location path you wanted. The full stylesheet is as follows:

  <?xml version="1.0" encoding="UTF-8"?>
  <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:books="http://sauria.com/schemas/apache-xml-book/books">
      <xsl:output method="xml" version="1.0" encoding="UTF-8"
          indent="yes"/>
      <xsl:template match="books:books">
        <html>
          <head><title>Book Inventory</title></head>
          <body>
            <xsl:apply-templates/>
          </body>
        </html>
      </xsl:template>
      <xsl:template match="books:book">
        <xsl:apply-templates/>
        <p />
      </xsl:template>
      <xsl:template match="books:title">
        <em><xsl:value-of select="."/></em><br />
      </xsl:template>
      <xsl:template match="books:author">
        <b><xsl:value-of select="."/></b><br />
      </xsl:template>
      <xsl:template match="books:isbn">
        <xsl:value-of select="."/><br />
      </xsl:template>
      <xsl:template match="books:month">
        <xsl:value-of select="."/>,
      </xsl:template>
      <xsl:template match="books:year">
        <xsl:value-of select="."/><br />
      </xsl:template>
      <xsl:template match="books:publisher">
        <xsl:value-of select="."/><br />
      </xsl:template>
      <xsl:template match="books:address">
        <xsl:value-of select="."/><br />
      </xsl:template>
  </xsl:stylesheet>

Note that this stylesheet handles multiple authors with ease. The third books:book element has four books:author elements, so as xsl:apply-templates is processing the children of that element, it happily matches each of the books:author elements in the source tree and processes each one using the correct template. The fully formatted version looks like this:

  <?xml version="1.0" encoding="UTF-8"?>
  <html
   xmlns:books="http://sauria.com/schemas/apache-xml-book/books">
  <head>
  <title>Book Inventory</title>
  </head>
  <body>
  
     <em>Professional XML Development with Apache Tools</em>
  <br/>
     <b>Theodore W. Leung</b>
  <br/>
     0-7645-4355-5<br/>
     December,
  
     2003<br/>
     Wrox<br/>
     Indianapolis, Indiana<br/>
    <p/>
  
     <em>Effective Java</em>
  <br/>
     <b>Joshua Bloch</b>
  <br/>
     0-201-31005-8<br/>
     August,
  
     2001<br/>
     Addison-Wesley<br/>
     New York, New York<br/>
    <p/>
  
     <em>Design Patterns</em>
  <br/>
     <b>Erich Gamma</b>
  <br/>
     <b>Richard Helm</b>
  <br/>
     <b>Ralph Johnson</b>
  <br/>
     <b>John Vlissides</b>
  <br/>
     0-201-63361-2<br/>
     October,
  
     1994<br/>
     Addison-Wesley<br/>
     Reading, Massachusetts<br/>
    <p/>
  </body>
  </html>

The xsl:apply-templates element can have an attribute named select, whose value is an expression (typically a location path) that must evaluate to a node set. This expression selects the nodes that are processed by xsl:apply-templates. None of the stylesheets so far have used the select attribute, and without it all children of the current node are processed. The select attribute gives you more control over what’s included in the result tree. The uses of select by xsl:value-of in the previous listing are good pointers to the way you would use select with xsl:apply-templates.
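Here is a sketch of select in use: the books:books template applies templates only to the authors, so titles and the other children never reach the result tree. It assumes JAXP and a trimmed copy of the books document (titles and authors only); the stylesheet and class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SelectAuthors {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";
    static final String XSL = "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:books='" + NS + "'>"
        + "<xsl:output method='text'/>"
        // Only the author grandchildren are processed; titles are never selected.
        + "<xsl:template match='books:books'>"
        + "<xsl:apply-templates select='books:book/books:author'/>"
        + "</xsl:template>"
        + "<xsl:template match='books:author'><xsl:value-of select='.'/>;</xsl:template>"
        + "</xsl:stylesheet>";
    // Trimmed copy of the books document: titles and authors only.
    static final String XML = "<books xmlns='" + NS + "'>"
        + "<book><title>Professional XML Development with Apache Tools</title>"
        + "<author>Theodore W. Leung</author></book>"
        + "<book><title>Effective Java</title><author>Joshua Bloch</author></book>"
        + "<book><title>Design Patterns</title><author>Erich Gamma</author>"
        + "<author>Richard Helm</author><author>Ralph Johnson</author>"
        + "<author>John Vlissides</author></book></books>";

    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```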

Output

XSLT uses the xsl:output element to control how the result tree is output. The xsl:output element has a number of attributes that control the output process. The most important is the method attribute, which selects the output method to use. XSLT includes three predefined output methods: one for XML, one for HTML, and one for text. The XML and HTML output methods obey the rules for XML and HTML, respectively. The HTML method has more work to do; for example, it doesn't output end tags for empty HTML elements such as area, br, col, hr, img, and a few others. The text output method outputs only the string values of the text nodes in the result tree, in document order and without any escaping; all other nodes are ignored. The value of the method attribute should be “xml” for XML output, “html” for HTML output, and “text” for text output. The method attribute can also be a QName with a namespace prefix, in which case the output method is supplied by the XSLT processor and its behavior is implementation-defined.
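To see the difference between output methods, the sketch below applies one stylesheet twice, changing only the method attribute. It assumes the JAXP API and a hypothetical one-book source document; the exclude-result-prefixes attribute just keeps the books namespace declaration off the literal p element. With method="text" only the text content survives, and with method="html" the br element is written without an end tag.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OutputMethodDemo {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";

    // Same templates for every run; only the xsl:output method changes.
    static String stylesheet(String method) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:books='" + NS + "' exclude-result-prefixes='books'>"
            + "<xsl:output method='" + method + "'/>"
            + "<xsl:template match='/'>"
            + "<p><b><xsl:value-of select='books:books/books:book/books:title'/></b>"
            + " by <xsl:value-of select='books:books/books:book/books:author'/><br/></p>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    // Hypothetical source document.
    static final String XML = "<books xmlns='" + NS + "'><book><title>Effective Java</title>"
        + "<author>Joshua Bloch</author></book></books>";

    static String run(String method) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet(method))))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("text")); // literal result elements are dropped
        System.out.println(run("html")); // br is written with no end tag
    }
}
```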

The XML and HTML output methods escape the & and < characters when they appear in a text node, to ensure that the output document is well-formed XML or HTML. The xsl:value-of and xsl:text (described later) elements can have an attribute named disable-output-escaping, which you can set to “yes” to turn off this escaping.
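A sketch of disable-output-escaping, assuming the JAXP API and a throwaway source document that the stylesheet never reads: the same text is written once with normal escaping and once with escaping disabled. The XSLT 1.0 Recommendation advises using the attribute only when there's no alternative, because not every processor supports it and it can easily produce output that isn't well-formed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EscapingDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><out>"
        // Default behavior: '<' in the text node is escaped on output.
        + "<escaped><xsl:text>&lt;b&gt;bold&lt;/b&gt;</xsl:text></escaped>"
        // Escaping disabled: the same characters are written verbatim.
        + "<raw><xsl:text disable-output-escaping='yes'>&lt;b&gt;bold&lt;/b&gt;</xsl:text></raw>"
        + "</out></xsl:template>"
        + "</xsl:stylesheet>";

    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```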

Creating the Result Tree

XPath and xsl:apply-templates cover a lot of the details of matching parts of the source tree and instantiating templates. What remains is to flesh out how the result tree is constructed. All the examples of xsl:apply-templates shown so far have built their output with literal result elements.

A literal result element is an element in a template that belongs neither to the XSLT namespace nor to a declared extension namespace. When a literal result element occurs in an xsl:template, it's instantiated to create an element node with the same name, which is then inserted into the proper place in the result tree. This makes it fairly natural to specify how the result tree should be built. The literal result elements inside a template can be interspersed with elements from the XSLT namespace, such as xsl:apply-templates or xsl:value-of; you saw examples of this in the previous listing. The element node created from a literal result element brings with it an attribute node for each attribute specified on the literal result element in the stylesheet. It also brings along the namespace nodes in scope on the literal result element, except those whose namespace URI is the XSLT namespace URI or the URI of a declared extension namespace.
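The following sketch makes these copying rules visible, assuming the JAXP API and a hypothetical two-book source document. The literal inventory element keeps its status attribute and the in-scope books namespace declaration (just as the html element does in the earlier output), but not the XSLT namespace. The names inventory and status are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LiteralResultDemo {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";
    static final String XSLT_NS = "http://www.w3.org/1999/XSL/Transform";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSLT_NS + "' xmlns:books='" + NS + "'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        // Literal result element: its status attribute and in-scope namespaces are copied.
        + "<inventory status='draft'><xsl:value-of select='count(books:books/books:book)'/></inventory>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical source document with two (empty) books.
    static final String XML = "<books xmlns='" + NS + "'><book/><book/></books>";

    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```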

The uses of literal result elements that you have seen show how to compute the contents of an element from values in the source tree using either xsl:apply-templates or xsl:value-of, but we haven't shown how to do this for attribute values. XSLT uses a mechanism called attribute value templates to allow the values of attributes in literal result elements to be computed. To use an attribute value template, you enclose the expression that computes the attribute value in curly braces ({}) inside the value of the attribute. The expression is evaluated, converted to a string, and inserted at the point delimited by the curly braces. Here's a template that converts the books:books element to a tome element. An attribute value template is used to compute the versionNumber attribute in the result tree from the version attribute in the source tree:

<xsl:template match="books:books">
  <tome versionNumber="{@version}">
    <xsl:apply-templates/>
  </tome>
</xsl:template>
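Here is a sketch that runs the tome template, assuming the JAXP API and a source books element whose version attribute is 1.0 (an assumed value). The hypothetical label attribute shows that an attribute value template can also mix literal text with a computed part.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AvtDemo {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:books='" + NS + "' exclude-result-prefixes='books'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='books:books'>"
        // {@version} is evaluated against the current books:books node.
        + "<tome versionNumber='{@version}' label='inventory-{@version}'>"
        + "<xsl:apply-templates/>"
        + "</tome>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical source document; the version value is assumed.
    static final String XML =
        "<books xmlns='" + NS + "' version='1.0'><book><title>Effective Java</title></book></books>";

    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```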

Literal result elements are a mostly static way of specifying the result tree. XSLT also provides elements that allow you to construct any part of the result tree dynamically. To insert a dynamically generated node into the result tree, you place one of the following elements in the template at the point where you want the computed node to appear:

<xsl:element>—Inserts a dynamically created element node. The name attribute is an attribute value template that computes the QName to be used for the generated element. The content of xsl:element is itself a template, in which literal result elements or dynamically created nodes can appear; it supplies the attributes and children of the dynamically created element.
<xsl:attribute>—Inserts a dynamically created attribute node. The name attribute is an attribute value template that computes the QName to be used for the generated attribute. The content of xsl:attribute is a template for the value of the attribute.
<xsl:text>—Inserts a static text node. The content of the text node is the content of xsl:text. See xsl:value-of to dynamically generate text.
<xsl:processing-instruction>—Inserts a dynamically created processing instruction node. The name attribute is an attribute value template that allows you to specify the name of the processing instruction. The content of xsl:processing-instruction is a template for the string value of the processing instruction.
<xsl:comment>—Inserts a dynamically generated comment. The content of xsl:comment is a template for the value of the comment.
<xsl:value-of>—Inserts a dynamically generated text node. The text to be inserted is specified by the select attribute, an XPath expression whose result is converted to a string. No text node is created if that string is empty.
<xsl:copy>—Copies the current node to the result tree, including namespace nodes but excluding attribute nodes and children. The content of xsl:copy is a template for the attributes and children of the node being copied.
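A minimal sketch of xsl:element, xsl:attribute, and xsl:comment working together, assuming the JAXP API and a hypothetical entry element that isn't part of the book inventory. The element name comes from the kind attribute and the attribute name is built from the label attribute, both through attribute value templates.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DynamicNodesDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='entry'>"
        // Element name computed from the source: kind='note' yields a note element.
        + "<xsl:element name='{@kind}'>"
        // Attributes must be added before any child nodes of the new element.
        + "<xsl:attribute name='data-{@label}'>yes</xsl:attribute>"
        + "<xsl:comment>generated</xsl:comment>"
        + "<xsl:value-of select='.'/>"
        + "</xsl:element>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical source document.
    static final String XML = "<entry kind='note' label='first'>Hello</entry>";

    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```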

XSLT allows you to sort the nodes selected by xsl:apply-templates. Sorting is accomplished by adding any number of xsl:sort elements as children of the xsl:apply-templates element whose processing you want sorted. The first xsl:sort element specifies the primary sort key, the second specifies the secondary sort key, and so on. The data to sort on is specified by the select attribute, an expression that is evaluated with each selected node as the current node; the string value of the result is that node's sort key. You control sort order with the order attribute, whose value can be “ascending” (the default) or “descending”. You can also ask for textual or numeric sorting with the data-type attribute by specifying “text” (the default) or “number”.
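A sketch of sorting, assuming the JAXP API and a trimmed copy of the three-book inventory (titles and years only, with the books namespace as the default namespace). The books are processed in ascending numeric order of year rather than in document order.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SortDemo {
    static final String NS = "http://sauria.com/schemas/apache-xml-book/books";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:books='" + NS + "'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='books:books'>"
        + "<xsl:apply-templates select='books:book'>"
        // Primary (and only) sort key: the year, compared as a number.
        + "<xsl:sort select='books:year' data-type='number' order='ascending'/>"
        + "</xsl:apply-templates>"
        + "</xsl:template>"
        + "<xsl:template match='books:book'>"
        + "<xsl:value-of select='books:year'/><xsl:text> </xsl:text>"
        + "<xsl:value-of select='books:title'/><xsl:text>;</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Trimmed, assumed version of the inventory, in its original document order.
    static final String XML =
        "<books xmlns='" + NS + "'>"
        + "<book><title>Professional XML Development with Apache Tools</title><year>2003</year></book>"
        + "<book><title>Effective Java</title><year>2001</year></book>"
        + "<book><title>Design Patterns</title><year>1994</year></book>"
        + "</books>";

    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```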

Extensions

XSLT provides two mechanisms for using extensions: extension elements and extension functions. XSLT doesn't specify how extensions are implemented; it only specifies how they're identified in a stylesheet. Extension elements are elements in specially designated extension namespaces. You designate extension namespaces by listing their prefixes as the value of either an extension-element-prefixes attribute on xsl:stylesheet, or an xsl:extension-element-prefixes attribute on a literal result element or extension element. Both of these attributes take a whitespace-separated list of namespace prefixes, so each listed prefix must be bound by a namespace declaration in scope in the stylesheet.

Extension functions work a little differently. A function name that contains a colon is assumed to be a call to an extension function. The function name is expanded using the set of namespace declarations in the current context.
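Because XSLT leaves extensions implementation-defined, portable stylesheets often check for them before use. The sketch below, assuming the JAXP API and a hypothetical namespace bound to the prefix hypo, uses the XSLT 1.0 core functions function-available and element-available to probe a built-in function, a built-in instruction, and an extension function and element that no processor provides. How a processor resolves extension namespaces (Xalan, for example, can bind them to Java classes) is processor-specific, so the expected result assumes the JDK's bundled XSLTC.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ExtensionProbeDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        // Hypothetical extension namespace; nothing implements it.
        + " xmlns:hypo='http://example.com/hypothetical'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        // The prefixed names are expanded using the namespace declarations in scope.
        + "<xsl:value-of select=\"concat("
        + "function-available('concat'), ' ',"
        + " function-available('hypo:widget'), ' ',"
        + " element-available('xsl:value-of'), ' ',"
        + " element-available('hypo:gadget'))\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```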

This concludes our brief review of XSLT. There are many topics we didn’t cover, and we didn’t explore any subjects in full detail. But just in case you picked up this book and don’t know XSLT, at least you have a fighting chance of making it through the rest of the chapter.