5.1 XSLT Basics | XML Publishing with AxKit

An XSLT stylesheet is made up of a single top-level xsl:stylesheet element that contains one or more xsl:template elements. These templates can contain literal elements that become part of the generated result, functional elements from the XSLT grammar that control such things as which parts of the source document to process, and often a combination of the two. The contents of the source XML document are accessed and evaluated from within the stylesheet's templates and other functional elements using the XPath language. The following shows a few XSLT elements; in each, the XPath expression is the value of the select attribute:

 <xsl:value-of select="price"/>
 <xsl:apply-templates select="/article/section"/>
 <xsl:copy-of select="order/items"/>

5.1.1 The XPath Language

XPath is a language used to select and evaluate various properties of the elements, attributes, and other types of nodes found in an XML document. In the context of XSLT, XPath is used to provide access to the various nodes contained by the source XML document being transformed. The XPath expressions used to access those nodes are based on the relationships between the nodes themselves. Nodes are selected using either a shortcut syntax that is somewhat reminiscent of the file and directory paths used to describe the structure of a Unix filesystem, or a longer syntax that describes the abstract relationship axes between nodes (parent, child, sibling, ancestor, descendant, etc.). In addition, XPath provides a number of useful built-in functions that allow you to evaluate certain properties of the nodes selected from a given document tree. The most common components of an XPath expression are location paths and relationship axes, function calls, and predicate expressions.

5.1.1.1 Location paths and relationship axes

Borrowing from the Document Object Model, XPath visualizes the contents of an XML document as an abstract tree of nodes. At the top level of that tree is the root node, represented by the string /. The root node is not the same as the top-level element (often called the document element) in the XML document, but is rather an abstract node above that level, which contains the document element and any special nodes, such as processing instructions:

 <?xml version="1.0"?>
 <?xml-stylesheet href="mystyle.xsl" type="text/xsl"?>
 <page>
   <para>I &#x2665; the XML Infoset</para>
 </page>

In this document, the xml-stylesheet processing instruction is a meaningful part of the document as a whole, but it is not contained by the page document element. Were it not for the abstract root node floating above the document element, you would have no way to access the processing instruction from within your stylesheets.
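As a minimal sketch of this point, the following stylesheet (not part of the original example) reaches the processing instruction through the root node. It assumes the sample document above as its input; processing-instruction() is the standard XPath node test for this kind of node.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- the root node (/) is the parent of both the processing
       instruction and the page document element -->
  <xsl:template match="/">
    <!-- the string value of a processing instruction is the
         content that follows its target name -->
    <xsl:value-of select="/processing-instruction('xml-stylesheet')"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the sample document, this should print the instruction's content, href="mystyle.xsl" type="text/xsl". The same path written as /page/processing-instruction('xml-stylesheet') would select nothing, because the instruction is a sibling of page, not a child.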

The practical result of having a root node above the top-level document element is that all XPath expressions that attempt to select nodes using an absolute path from the root node to a node contained in the document must include both a / for the root node and the name of the document element.

Here are a few examples of absolute location paths:

 /
 /html/body
 /book/chapter/sect1/title
 /recordset/row/order-quantity

Relative location paths are resolved within the context of the current node (element, attribute, etc.) being processed. The following are functionally identical:

 chapter/sect1
 child::chapter/sect1

Attribute nodes are accessed by using the attribute:: axis (or the shortcut, @), followed by the name of the attribute:

 attribute::class
 @class
 sect1/attribute::id
 sect1/@id
 /html/body/@bgcolor

Relationship axes provide a way to access nodes in the document based on relative relationships to the context node that often cannot be captured by a simple location path. For example, you can look back up the node tree from the current node using the ancestor or parent axis:

 ancestor::chapter/title
 parent::chapter/title

or across the tree at the same level using the preceding-sibling or following-sibling axes:

 preceding-sibling::product/@id
 following-sibling::div/@class

XPath also provides several useful shortcuts to help make things easier. The . (dot) is short for self::node(), which selects the context node itself, and .. is short for parent::node(), which selects its parent. The // path abbreviation is short for /descendant-or-self::node()/. While using // can be expensive to process, it is hard to fault the simplicity it offers. For example, collecting all the hyperlinks in an XHTML document into a single nodeset, regardless of the current context or where the links may appear in the document, is as simple as:

//a

Similarly, you can select all the para descendants of the context node (the node currently being processed) using:

 .//para

If all these relationship axes are confusing, recall that XPath visualizes the document as a hierarchical tree of nodes in much the same way that the filesystem is presented in most Unix-like operating systems. (Parents and ancestors are "up" towards the "root," siblings are at the same level on a given branch, and children and descendants are contents of the current node.)
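To make the directions concrete, here is a small sketch that looks up, down, and across the tree from a single context node. The book/chapter/sect1 structure with title children is an assumption borrowed from the path examples above, not a complete DocBook document.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- visit only the sect1 elements, wherever they appear -->
  <xsl:template match="/">
    <xsl:apply-templates select="//sect1"/>
  </xsl:template>

  <xsl:template match="sect1">
    <!-- up: the title of the enclosing chapter -->
    <xsl:value-of select="ancestor::chapter/title"/>
    <xsl:text> / </xsl:text>
    <!-- down: the title child of the context node -->
    <xsl:value-of select="title"/>
    <xsl:text> (</xsl:text>
    <!-- across: the sect1 siblings that come before this one -->
    <xsl:value-of select="count(preceding-sibling::sect1)"/>
    <xsl:text> earlier sections)&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Each sect1 produces one line of text made up of its chapter's title, its own title, and the number of sections that precede it in the same chapter.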

5.1.1.2 Functions

XPath provides a nice list of built-in functions to help with node selection, evaluation, and processing. This list includes functions for accessing the properties of nodesets (e.g., position() and count()), functions for accessing the abstract components of a given node (e.g., namespace-uri() and name()), string processing functions (e.g., substring-before(), concat(), and translate()), number processing functions (e.g., round(), sum(), and ceiling()), and Boolean functions (e.g., true(), false(), and not()). A detailed reference covering each function is not appropriate here, but a few useful examples are.

Get the number of para elements in the document:

 count(//para)

Create a fully qualified URL based on the relative_url attribute of the context node:

 concat('http://mysite.com/', @relative_url)

Replace dashes with underscores in the text of the context node:

 translate(., '-', '_')

Get the scheme name from a fully qualified URL:

 substring-before(@url, '://')

Get the total number of items ordered:

 sum(/order/products/item/quantity)
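Inside a stylesheet, expressions like these usually appear in the select attribute of xsl:value-of. The following is a minimal sketch; the order document shown in the comment is an assumed input made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- assumed input:
       <order>
         <products>
           <item><quantity>2</quantity></item>
           <item><quantity>3</quantity></item>
         </products>
       </order> -->
  <xsl:template match="/order">
    <!-- count() returns the number of nodes in the nodeset -->
    <xsl:value-of select="count(products/item)"/>
    <xsl:text> line items, </xsl:text>
    <!-- sum() converts each node's text to a number and adds them up -->
    <xsl:value-of select="sum(products/item/quantity)"/>
    <xsl:text> units in total</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the assumed input, the result is the text "2 line items, 5 units in total".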

In addition to the core functions provided by XPath, XSLT adds several functions to make the task of transforming documents easier or more robust. Two are especially useful: the document() function (which provides a way to include all or part of separate XML documents in the current result) and the key() function (which offers an easy way to select specific nodes from larger, regularly shaped sets by using part of the individual nodes as a lookup). You can find a complete listing of XSLT's functions on the Web at http://www.w3.org/TR/xslt#add-func.
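A short sketch of both functions follows. Every name in it is hypothetical: it assumes an order document holding a catalog of product elements (each with an id attribute and a name child), a list of item elements that refer to products through a product-ref attribute, and a separate file named disclaimer.xml whose document element is disclaimer.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- index every product element by the value of its id attribute -->
  <xsl:key name="product-by-id" match="product" use="@id"/>

  <xsl:template match="/">
    <order-summary>
      <xsl:apply-templates select="order/items/item"/>
      <!-- copy the contents of a separate document into the result -->
      <xsl:copy-of select="document('disclaimer.xml')/disclaimer/node()"/>
    </order-summary>
  </xsl:template>

  <xsl:template match="item">
    <line>
      <!-- look up the product whose id equals this item's product-ref -->
      <xsl:value-of select="key('product-by-id', @product-ref)/name"/>
    </line>
  </xsl:template>
</xsl:stylesheet>
```

The xsl:key declaration must appear at the top level of the stylesheet. A relative URI passed to document() as a string is resolved against the location of the stylesheet, so disclaimer.xml is expected to sit next to it.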

Many functions provided by XPath and XSLT can be useful for transforming the source document's content; others are only useful when combined with XPath predicate expressions.

5.1.1.3 Predicates

Predicate expressions provide the ability to examine the properties of a nodeset in a way similar to the WHERE clause in SQL. When a predicate expression is used, only those nodes that meet the criteria established by the predicate are selected. Predicates are set off from the rest of the expression using square brackets and can appear after any node identifier in the larger XPath expression. When the predicate expression evaluates to an integer, the node at that position is returned. The following all select the second div child of the context node:

 div[2]
 div[1 + 1]
 div[position() = 2]
 div[position() > 1 and position() < 3]

More than one predicate can be used in a given expression. The following returns a nodeset containing the title of the first section of the fifth chapter of a book in DocBook XML format.

 /book/chapter[5]/sect1[1]/title

Predicate expressions may also contain function calls. The following selects all descendants of the current node that contain significant text data but no child elements:

 .//*[string-length(normalize-space(text())) > 0 and count(child::*) = 0]
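The sketch below puts the two kinds of predicate side by side in a stylesheet. It assumes a recordset document made up of row elements, each with an order-quantity child, as in the absolute path example earlier; the threshold of 10 is arbitrary.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/recordset">
    <!-- Boolean predicate: keep only rows with a large order-quantity -->
    <xsl:text>Large orders: </xsl:text>
    <xsl:value-of select="count(row[order-quantity &gt; 10])"/>
    <xsl:text>&#10;</xsl:text>

    <!-- numeric predicate: last() is a number, so this is a position test -->
    <xsl:text>Quantity in final row: </xsl:text>
    <xsl:value-of select="row[last()]/order-quantity"/>
    <xsl:text>&#10;</xsl:text>

    <!-- chained predicates are applied from left to right -->
    <xsl:text>First large order: </xsl:text>
    <xsl:value-of select="row[order-quantity &gt; 10][1]/order-quantity"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Note that the order of chained predicates matters: row[order-quantity > 10][1] is the first of the large orders, while row[1][order-quantity > 10] is the first row, and only if it happens to be a large order.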

5.1.2 Stylesheet Templates

The basic building block of an XSLT stylesheet is the xsl:template element. The xsl:template element is used to create the output of the transformation. A template is invoked either by matching nodes in the source document against a pattern contained by the template element's match attribute, or by giving the template a name via the name attribute and calling it explicitly.

The xsl:template element's match attribute takes a pattern expression that determines to which nodes in the source tree the template will be applied. These match rules can be evaluated and invoked from within other templates using the xsl:apply-templates element. The pattern expressions take the form of an XPath location expression that may also include predicate expressions. For example, the following tells the XSLT processor to apply that template to all div elements that have body parents:

 <xsl:template match="body/div">
   . . .
 </xsl:template>

You can also add predicates to your patterns for more fine-tuned matching:

 <xsl:template match="body/div[@class='special']">
   . . .
 </xsl:template>

By adding the @class='special' predicate, this template would only be applied to the subset of those elements matched by the previous example that have a class attribute with the value special.

Templates that contain match rules are invoked using the xsl:apply-templates element. If the xsl:apply-templates element's optional select attribute is present, the nodes returned by its expression are evaluated one at a time against the match pattern of each template in the stylesheet. When a match is found, the node is processed by the matching template. (If no match is found, a built-in template is used, and the children of the selected nodes are processed, and so on, recursively.) If the select attribute is not present, all child nodes of the current node are evaluated. This allows straightforward recursive processing of tree-shaped data.

 <?xml version="1.0"?>
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <xsl:template match="/">
     <!-- root template, always matches -->
     <html>
       <xsl:apply-templates/>
     </html>
   </xsl:template>

   <xsl:template match="article">
     <!-- matches element nodes named 'article' -->
     <body>
       <xsl:apply-templates/>
     </body>
   </xsl:template>

   <xsl:template match="para">
     <!-- matches element nodes named 'para' -->
     <p>
       <xsl:apply-templates/>
     </p>
   </xsl:template>

   <xsl:template match="emphasis">
     <!-- matches element nodes named 'emphasis' -->
     <em>
       <xsl:apply-templates/>
     </em>
   </xsl:template>

 </xsl:stylesheet>

Applying this stylesheet to the following XML document:

 <?xml version="1.0"?>
 <article>
   <para>
     I was <emphasis>not</emphasis> pleased with the result.
   </para>
 </article>

gives the following result:

 <?xml version="1.0"?>
 <html>
   <body>
     <p>
       I was <em>not</em> pleased with the result.
     </p>
   </body>
 </html>
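The text inside the para element reaches the output even though no template in the stylesheet mentions text nodes. That is the work of the built-in templates mentioned earlier. As a sketch, the built-in rules of XSLT 1.0 for elements, the root node, text, and attributes behave as if every stylesheet also contained the following two templates:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- elements and the root node: keep descending into the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- text and attribute nodes: copy their string value to the result -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Any template you write for the same nodes takes precedence over these rules, which is why an empty template such as one matching text() is a common way to suppress unwanted text.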

The name attribute provides a way to explicitly invoke a given template from within other templates using the xsl:call-template element. For example, the following inserts the standard disclaimer contained in the template named document_footer at the bottom of an HTML page:

 <xsl:template match="/">
   . . .  more processing here  . . .
       <xsl:call-template name="document_footer"/>
     </body>
   </html>
 </xsl:template>

 <xsl:template name="document_footer">
   <div class="footer">
     <p>copyright &#169; 2001 Initech LLC. All rights reserved.</p>
   </div>
 </xsl:template>

The xsl:template element's mode attribute provides a way to have template rules that have the same match expression (match the same nodeset) but process the matched nodes in a very different way. For example, the following snippet shows two templates whose expressions match the element nodes named section. One displays the main view of the document (the default), and the other builds a table of contents:

 <xsl:template match="article">
   <!-- create the table of contents first -->
   <div class="toc">
     <xsl:apply-templates select="section" mode="toc"/>
   </div>
   <!-- then process the document as usual -->
   <xsl:apply-templates select="section"/>
 </xsl:template>

 <xsl:template match="section">
   <div class="section">
     <h2>
       <a name="{generate-id(title)}">
         <xsl:value-of select="title"/>
       </a>
     </h2>
     <xsl:apply-templates/>
   </div>
 </xsl:template>

 <xsl:template match="section" mode="toc">
   <a href="#{generate-id(title)}">
     <xsl:value-of select="title"/>
   </a>
   <br />
   <xsl:apply-templates select="section" mode="toc"/>
 </xsl:template>

5.1.3 Loop Constructs

Adding to its flexibility, XSLT borrows the concepts of iterative loops and conditional processing from traditional programming languages. The xsl:for-each element is used for iterating over the nodes in the nodeset returned by the expression contained in its select attribute. In spirit, this corresponds to Perl's foreach (@some_list) loop.

 <xsl:template match="para">
   <xsl:for-each select="xlink">
     . . .  do something with each xlink child of the current para element
   </xsl:for-each>
 </xsl:template>

In the earlier example showing how template modes are used, you created two templates for the section elements: one for processing those nodes in the context of the main body of the document, and one for building the table of contents. You could just as easily have used an xsl:for-each element to create the table of contents:

 <xsl:template match="article">
   <!-- create the table of contents first -->
   <div class="toc">
     <xsl:for-each select=".//section">
       <a href="#{generate-id(title)}">
         <xsl:value-of select="title"/>
       </a>
       <br />
     </xsl:for-each>
   </div>
   <!-- then process the document as usual -->
   <xsl:apply-templates select="section"/>
 </xsl:template>

So, which type of processing is better: iteration or recursion? There is no hard-and-fast rule. The answer depends largely on the shape of the node trees being processed. Generally, an iterative approach using xsl:for-each is appropriate for nodesets that contain regularly shaped data structures (such as line items in a product list), while recursion tends to work better for irregular trees containing mixed content (elements that can have both text data and child elements, such as articles or books). XSLT's processing model is founded on the notion of recursion (process the current node, apply templates to all or some of the current node's children, and so on). The point is that one size does not fit all. Having a working understanding of both styles of processing is key to the efficient and professional use of XSLT.

5.1.4 Conditional Blocks

Conditional "if/then" processing is available in XSLT using xsl:if , xsl:choose , and their associated child elements.

The xsl:if element offers an all-or-nothing approach to conditional processing. If the expression passed to the processor through the test attribute evaluates to true, the block is processed; otherwise, it is skipped altogether.

Consider the following template. It prints a list of an employee's coworkers, adding the appropriate commas between the coworkers' names (plus an "and" just before the final name) by testing the position() of each coworker child:

 <xsl:template match="employee">
   <p>
     Our employee <xsl:value-of select="first-name"/> works with:
     <xsl:for-each select="coworker">
       <xsl:value-of select="."/>
       <xsl:if test="position() != last()">, </xsl:if>
       <xsl:if test="position() = last()-1"> and </xsl:if>
     </xsl:for-each>
     on a daily basis.
   </p>
 </xsl:template>

In cases in which you need to emulate the if-then-else block or switch statement found in most programming languages, use the xsl:choose element. An xsl:choose must contain one or more xsl:when elements and may contain an optional xsl:otherwise element. The test attribute of each xsl:when is evaluated in turn, and the contents are processed for the first expression that evaluates to true. If none of the test conditions returns a true value and an xsl:otherwise element is included, the contents of that element are processed:

 <xsl:template match="article">
   <xsl:choose>
     <xsl:when test="$page-view='toc'">
       <xsl:apply-templates select="section" mode="toc"/>
     </xsl:when>
     <xsl:when test="$page-view='summary'">
       <xsl:apply-templates select="abstract" mode="summary"/>
     </xsl:when>
     <xsl:otherwise>
       <xsl:apply-templates select="section"/>
     </xsl:otherwise>
   </xsl:choose>
 </xsl:template>

5.1.5 Parameters and Variables

XSLT offers a way to capture and reuse arbitrary chunks of data during processing via the xsl:param and xsl:variable elements. In both cases, a unique name is given to the parameter or variable using a name attribute, and the contents can be accessed elsewhere in the stylesheet by prepending a $ (dollar sign) to that name. Therefore, the value of a variable declaration whose name attribute is myVar will be accessible later as $myVar.

The xsl:param element serves two purposes: it provides a mechanism for passing simple key/value data to the stylesheet from the outside, and it offers a way to pass information between templates within the stylesheet. One benefit of using XSLT in an environment such as AxKit is that all HTTP parameters are available from within your stylesheets via xsl:param elements:

 <?xml version="1.0"?>
 <xsl:stylesheet
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

 <xsl:param name="user"/>

 <xsl:template match="/">
   . . .
   <p>Greetings, <xsl:value-of select="$user"/>. Welcome to our site.</p>
   . . .
 </xsl:template>
 . . .

The result of a request to a document that is transformed by this stylesheet, such as http://myhost.tld/mypage.xml?user=ahclem (or a POST request that contains a defined "user" parameter), contains the following:

 <p>Greetings, ahclem. Welcome to our site.</p>

Default values can be set by adding text content to the xsl:param element:

 <xsl:param name="user">Mystery Guest</xsl:param>

or by passing a valid XPath expression to its select attribute:

 <xsl:param name="user" select="'Mystery Guest'"/>
 <xsl:param name="my-value" select="/path/to/default"/>

Parameters can also be used to pass data to other templates during the transformation process. In this second form, a template rule is invoked either using the xsl:call-template element, whose required name attribute must correspond to the name attribute of an xsl:template, or using an xsl:apply-templates element whose selected nodes are expected to match one of the templates' match attributes. In both cases, data can be passed to the template via one or more xsl:with-param elements, whose values are then available inside the invoked template through xsl:param elements of the same name. The result returned by the called template is inserted into the transformation's output at the point that the template is called.

Calling named templates and passing in parameters is the closest thing that XSLT has to the user-defined functions or subroutines provided in traditional programming languages. It can be employed to create reusable pseudofunctions:

 <xsl:template match="guestlist">
   <xsl:for-each select="visitor">
     <xsl:call-template name="lastname-first">
       <xsl:with-param name="fullname" select="name"/>
     </xsl:call-template>
   </xsl:for-each>
   . . .
 </xsl:template>
 . . .
 <xsl:template name="lastname-first">
   <xsl:param name="fullname"/>
   <xsl:value-of select="$fullname/lastname"/>
   <xsl:text>, </xsl:text>
   <xsl:value-of select="$fullname/firstname"/>
   <xsl:if test="$fullname/middle-initial">
     <xsl:text> </xsl:text>
     <xsl:value-of select="$fullname/middle-initial"/>
     <xsl:text>.</xsl:text>
   </xsl:if>
 </xsl:template>
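Parameters can be passed through xsl:apply-templates in the same way. The following sketch reuses the guestlist and visitor elements from the example above; the greeting parameter and the list markup are hypothetical additions for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="guestlist">
    <ul>
      <!-- every visitor template invoked here receives the parameter -->
      <xsl:apply-templates select="visitor">
        <xsl:with-param name="greeting" select="'Welcome'"/>
      </xsl:apply-templates>
    </ul>
  </xsl:template>

  <xsl:template match="visitor">
    <!-- the default applies only when the caller passes no value -->
    <xsl:param name="greeting" select="'Hello'"/>
    <li>
      <xsl:value-of select="$greeting"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="name/firstname"/>
    </li>
  </xsl:template>

</xsl:stylesheet>
```

The names must agree: the xsl:with-param is matched to the xsl:param of the same name in whichever template ends up handling each visitor node, and a template that declares no such parameter simply ignores it.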

The xsl:variable element is similar to xsl:param in that it can be assigned an arbitrary value such as a nodeset or string. Unlike parameters, though, variables only provide a way to store chunks of data; they are not used to pass information in from the environment or between templates.

They can be useful for such things as creating shortcuts to complex nodesets:

 <xsl:variable name="super-stars"
               select="/roster/players[points-per-game > 25]" />

or setting default values for data that may be missing from a given part of the document:

 <xsl:variable name="username">
   <xsl:choose>
     <xsl:when test="/application/session/username">
       <xsl:value-of select="/application/session/username"/>
     </xsl:when>
     <xsl:otherwise>Unknown User</xsl:otherwise>
   </xsl:choose>
 </xsl:variable>

Once a variable or parameter has been assigned a value, it becomes read-only. This behavior trips up most web developers who are used to doing such things as:

 my $grand_total = 0;
 foreach my $row (@order_data) {
     $grand_total += $row->{quantity} * $row->{price};
 }
 print "Order Total: $grand_total";

There is a way around this limitation, but it requires creating a template that recursively consumes a nodeset while passing the sum of the previous value and current value back to itself through a parameter as each node is processed:

 <xsl:template match="/">
   <root>
     . . .
     Order total:
     <xsl:call-template name="price-total">
       <xsl:with-param name="items" select="order/item"/>
     </xsl:call-template>
   </root>
 </xsl:template>

 <xsl:template name="price-total">
   <xsl:param name="items"/>
   <xsl:param name="total">0</xsl:param>
   <xsl:choose>
     <xsl:when test="$items">
       <xsl:call-template name="price-total">
         <xsl:with-param name="items"
                         select="$items[position() != 1]"/>
         <xsl:with-param name="total"
                         select="$total + $items[position()=1]/quantity/text() * $items[position()=1]/price/text()"/>
       </xsl:call-template>
     </xsl:when>
     <xsl:otherwise>
       <xsl:value-of select="format-number($total, '#,##0.00')"/>
     </xsl:otherwise>
   </xsl:choose>
 </xsl:template>
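The recursion is needed here because each term of the total is a product of two values. When the total is a plain sum of values that already exist in the document, the sum() function is enough and no running variable is required. A minimal sketch, assuming the same order/item structure with a quantity child:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- the variable is bound once and never reassigned -->
    <xsl:variable name="units" select="sum(order/item/quantity)"/>
    <xsl:text>Total units: </xsl:text>
    <xsl:value-of select="$units"/>
  </xsl:template>

</xsl:stylesheet>
```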

If you are more used to thinking in Perl, the following snippet illustrates the same principle:

 my $order_total = price_total(0, @order_items);

 sub price_total {
     my ($total, @items) = @_;
     if (@items) {
         # consume the first item and recurse on the rest
         my $data = shift @items;
         return price_total($total + $data->{price} * $data->{quantity}, @items);
     }
     else {
         # nothing left to process: the accumulated total is the answer
         return $total;
     }
 }

This short introduction to XSLT's syntax and features really only touches the surface of what it can achieve. If what you read here intrigues you, I strongly recommend picking up one of the many fine books that cover the language in much greater depth.