Chapter 13. XSL Transformations

CONTENTS
  •  Using XSLT Style Sheets in XML Documents
  •  Creating XSLT Style Sheets
  •  Altering Document Structure Based on Input
  •  Generating Comments with xsl:comment
  •  Generating Text with xsl:text
  •  Copying Nodes
  •  Sorting Elements
  •  Using xsl:if
  •  Using xsl:choose
  •  Controlling Output Type

In this chapter, I'm going to start working with the Extensible Stylesheet Language (XSL). XSL has two parts: a transformation language and a formatting language.

The transformation language lets you transform documents into different forms, while the formatting language actually formats and styles documents in various ways. These two parts of XSL can function quite independently, and you can think of XSL as two languages, not one. In practice, you often transform a document before formatting it because the transformation process lets you add the tags the formatting process requires. In fact, that is one of the main reasons that W3C supports XSLT as the first stage in the formatting process, as we'll see in the next chapter.

This chapter covers the transformation language, and the next details the formatting language. The XSL transformation language is often called XSLT, and it has been a W3C recommendation since November 16, 1999. You can find the W3C recommendation for XSLT at http://www.w3.org/TR/xslt.

XSLT is a relatively new specification, and it's still developing in many ways. There are some XSLT processors of the kind we'll use in this chapter, but bear in mind that the support offered by publicly available software is not very strong as yet. A few packages support XSLT fully, and we'll see them here. However, no browser supports XSLT fully yet.

I'll start this chapter with an example to show how XSLT works.

Using XSLT Style Sheets in XML Documents

You use XSLT to manipulate documents, changing and working with their markup as you want. One of the most common transformations is from XML documents to HTML documents, and that's the kind of transformation we'll see in the examples in this chapter.

To create an XSLT transformation, you need two documents: the document to transform, and the style sheet that specifies the transformation. Both documents are well-formed XML documents.

Here's an example; this document, planets.xml, is a well-formed XML document that holds data about three planets: Mercury, Venus, and Earth. Throughout this chapter, I'll transform this document to HTML in various ways. For programs that can understand it, you can use the <?xml-stylesheet?> processing instruction to indicate what XSLT style sheet to use, where you set the type attribute to "text/xml" and the href attribute to the URI of the XSLT style sheet, such as planets.xsl in this example (XSLT style sheets usually have the extension .xsl).

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="planets.xsl"?>
<PLANETS>
    <PLANET>
        <NAME>Mercury</NAME>
        <MASS UNITS="(Earth = 1)">.0553</MASS>
        <DAY UNITS="days">58.65</DAY>
        <RADIUS UNITS="miles">1516</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
        <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
    </PLANET>
    <PLANET>
        <NAME>Venus</NAME>
        <MASS UNITS="(Earth = 1)">.815</MASS>
        <DAY UNITS="days">116.75</DAY>
        <RADIUS UNITS="miles">3716</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
        <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
    </PLANET>
    <PLANET>
        <NAME>Earth</NAME>
        <MASS UNITS="(Earth = 1)">1</MASS>
        <DAY UNITS="days">1</DAY>
        <RADIUS UNITS="miles">2107</RADIUS>
        <DENSITY UNITS="(Earth = 1)">1</DENSITY>
        <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion-->
    </PLANET>
</PLANETS>

XSL Style Sheets

Here's what the style sheet planets.xsl might look like. In this case, I'm converting planets.xml into HTML, extracting the names of the planets, and surrounding those names with HTML <P> elements:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <P>
            <xsl:value-of select="NAME"/>
        </P>
    </xsl:template>
</xsl:stylesheet>

All right, we have an XML document and the style sheet we'll use to transform it. So, how exactly do you transform the document?

Making a Transformation Happen

You can transform documents in three ways:

  • In the server. A server program, such as a Java servlet, can use a style sheet to transform a document automatically and serve it to the client. One such example is the XML Enabler, which is a servlet that you'll find at the XML for Java Web site, http://www.alphaworks.ibm.com/tech/xml4j.

  • In the client. A client program, such as a browser, can perform the transformation, reading in the style sheet that you specify with the <?xml-stylesheet?> processing instruction. Internet Explorer can handle transformations this way to some extent.

  • With a separate program. Several standalone programs, usually based on Java, will perform XSLT transformations. I'll use these programs primarily in this chapter.

In this chapter, I'll use standalone programs to perform transformations because those programs offer by far the most complete implementations of XSLT. I'll also take a look at using XSLT in Internet Explorer.

Two popular programs will perform XSLT transformations: XT and XML for Java.

James Clark's XT

You can get James Clark's XT at http://www.jclark.com/xml/xt.html. Besides XT itself, you'll also need a SAX-compliant XML parser, such as the one we used in the previous chapter that comes with the XML for Java packages, or James Clark's own XP parser, which you can get at http://www.jclark.com/xml/xp/index.html.

XT is a Java application. Included in the XT download is the JAR file you'll need, xt.jar. The XT download also comes with sax.jar, which holds James Clark's SAX parser. You can also use the XML for Java parser with XT; to do that, you must include both xt.jar and xerces.jar in your CLASSPATH, something like this:

%set CLASSPATH=%CLASSPATH%;C:\XML4J_3_0_1\xerces.jar;C:\xt\xt.jar;

Then you can use the XT transformation class, com.jclark.xsl.sax.Driver. You supply the name of the SAX parser you want to use, such as the XML for Java class org.apache.xerces.parsers.SAXParser, by setting the com.jclark.xsl.sax.parser system property with the java -D switch. Here's how I use XT to transform planets.xml, using planets.xsl, into planets.html (note that there must be no space after the equals sign):

%java -Dcom.jclark.xsl.sax.parser=org.apache.xerces.parsers.SAXParser com.jclark.xsl.sax.Driver planets.xml planets.xsl planets.html

XT is also packaged as a Win32 exe. To use xt.exe, however, you will need the Microsoft Java Virtual Machine (VM) installed (included with Internet Explorer). Here's an example in Windows that performs the same transformation as the previous command:

C:\>xt planets.xml planets.xsl planets.html

XML for Java

You can also use the IBM alphaWorks XML for Java XSLT package, called LotusXSL. LotusXSL implements an XSLT processor in Java that can be used from the command line, in an applet or a servlet, or as a module in another program. By default, it uses the XML4J XML parser, but it can interface to any XML parser that conforms to either the DOM or the SAX specification.

Here's what the XML for Java site says about LotusXSL: "LotusXSL 1.0.1 is a complete and a robust reference implementation of the W3C Recommendations for XSL Transformations (XSLT) and the XML Path Language (XPath)."

You can get LotusXSL at http://www.alphaworks.ibm.com/tech/xml4j; just click the XML item in the frame at left, click LotusXSL, and then click the Download button (or you can go directly to http://www.alphaworks.ibm.com/tech/lotusxsl, although that URL may change). The download includes xerces.jar, which includes the parsers that the rest of the LotusXSL package uses (although you can use other parsers), and xalan.jar, which is the LotusXSL JAR file. To use LotusXSL, make sure that you have xalan.jar in your CLASSPATH; to use the XML for Java SAX parser, make sure that you also have xerces.jar in your CLASSPATH, something like this:

%set CLASSPATH=%CLASSPATH%;C:\lotusxsl_1_0_1\xalan.jar;C:\xsl\lotusxsl_1_0_1\xerces.jar;

Unfortunately, the LotusXSL package does not have a built-in class that will take a document name, a style sheet name, and an output file name as XT does. However, I'll create one named xslt, and you can use this class quite generally for transformations. Here's what xslt.java looks like:

import org.apache.xalan.xslt.*;

public class xslt
{
    public static void main(String[] args)
    {
        try {
            // Get an XSLT processor from the LotusXSL factory
            XSLTProcessor processor = XSLTProcessorFactory.getProcessor();

            // args[0] is the document to transform, args[1] is the
            // style sheet, and args[2] is the output file
            processor.process(new XSLTInputSource(args[0]),
                new XSLTInputSource(args[1]),
                new XSLTResultTarget(args[2]));
        }
        catch (Exception e)
        {
            System.err.println(e.getMessage());
        }
    }
}

After you've set the CLASSPATH as indicated, you can create xslt.class with javac like this:

%javac xslt.java

The file xslt.class is all you need. After you've set the CLASSPATH as indicated, you can use xslt.class like this to transform planets.xml, using the style sheet planets.xsl, into planets.html:

%java xslt planets.xml planets.xsl planets.html

What does planets.html look like? In this case, I've set up planets.xsl to simply place the names of the planets in <P> HTML elements. Here are the results, in planets.html:

<HTML>
    <P>Mercury</P>
    <P>Venus</P>
    <P>Earth</P>
</HTML>

That's the kind of transformation we'll see in this chapter.

There's another way to transform XML documents without a standalone program: you can use a client program such as a browser to transform documents.

Using Browsers to Transform XML Documents

Internet Explorer includes a partial implementation of XSLT; you can read about Internet Explorer support at http://msdn.microsoft.com/xml/XSLGuide/. That support is based on the W3C XSL working draft of December 16, 1998 (which you can find at http://www.w3.org/TR/1998/WD-xsl-19981216.html); as you can imagine, things have changed considerably since then.

To use planets.xml with Internet Explorer, I have to make a few modifications. For example, I have to convert the type attribute in the <?xml-stylesheet?> processing instruction from "text/xml" to "text/xsl":

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="planets.xsl"?>
<PLANETS>
    <PLANET>
        <NAME>Mercury</NAME>
        <MASS UNITS="(Earth = 1)">.0553</MASS>
        <DAY UNITS="days">58.65</DAY>
        <RADIUS UNITS="miles">1516</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
        <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
    </PLANET>
    <PLANET>
        <NAME>Venus</NAME>
        <MASS UNITS="(Earth = 1)">.815</MASS>
        <DAY UNITS="days">116.75</DAY>
        <RADIUS UNITS="miles">3716</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
        <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
    </PLANET>
    <PLANET>
        <NAME>Earth</NAME>
        <MASS UNITS="(Earth = 1)">1</MASS>
        <DAY UNITS="days">1</DAY>
        <RADIUS UNITS="miles">2107</RADIUS>
        <DENSITY UNITS="(Earth = 1)">1</DENSITY>
        <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion-->
    </PLANET>
</PLANETS>

I can also convert the style sheet planets.xsl for use in Internet Explorer. A major difference between the W3C XSL recommendation and the XSL implementation in Internet Explorer is that Internet Explorer doesn't implement any default XSL rules (which I'll discuss in this chapter). This means that I have to explicitly include an XSL rule for the root of the document, which you specify with /. I also have to use a different namespace in the style sheet, http://www.w3.org/TR/WD-xsl, and omit the version attribute in the <xsl:stylesheet> element:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
    <xsl:template match="/">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANETS">
        <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="PLANET">
        <P>
            <xsl:value-of select="NAME"/>
        </P>
    </xsl:template>
</xsl:stylesheet>

You can see the results of this transformation in Figure 13.1.

Figure 13.1. Performing an XSL transformation in Internet Explorer.


We now have an overview of XSL transformations and have seen them at work. It's time to see how to create XSLT style sheets in detail.

Creating XSLT Style Sheets

XSLT transformations accept a document tree as input and produce a tree as output. From the XSLT point of view, documents are trees built of nodes, and there are seven types of nodes XSLT recognizes; here are those nodes, and how XSLT processors treat them:

Node                    Description
Document root           Is the very start of the document
Attribute               Holds the value of an attribute after entity references have been expanded and surrounding whitespace has been trimmed
Comment                 Holds the text of a comment, not including <!-- and -->
Element                 Consists of all character data in the element, which includes character data in any of the children of the element
Namespace               Holds the namespace's URI
Processing instruction  Holds the text of the processing instruction, which does not include <? and ?>
Text                    Holds the text of the node

To indicate what node or nodes you want to work on, XSLT supports various ways of matching or selecting nodes. For example, the character / stands for the root node. To get us started, I'll create a short example here that will replace the root node and, therefore, the whole document with an HTML page.

As you might expect, XSLT style sheets must be well-formed XML documents, so you start a style sheet with the XML declaration. Next, you use an <xsl:stylesheet> element; XSLT style sheets use the namespace prefix xsl, which, now that XSLT has been standardized, is bound to the namespace http://www.w3.org/1999/XSL/Transform. You must also include the version attribute in the <xsl:stylesheet> element, setting that attribute to the only current version, 1.0:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    .
    .
    .

That's how you start an XSLT style sheet (in fact, XSLT also defines a simplified syntax that lets you omit the <xsl:stylesheet> element when the style sheet consists of a single template for the root node). To work with specific nodes in an XML document, XSLT uses templates. When you match or select nodes, a template tells the XSLT processor how to transform the node for output. In this example, I want to replace the root node with a whole new HTML document, so I start by creating a template with the <xsl:template> element, setting the match attribute to the node to match, "/":

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
    .
    .
    .
    </xsl:template>
</xsl:stylesheet>

When the root node is matched, the template is applied to that node. In this case, I want to replace the root node with an HTML document, so I just include that HTML document directly as the content of the <xsl:template> element:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <HTML>
            <HEAD>
                <TITLE>
                    A trivial transformation
                </TITLE>
            </HEAD>
            <BODY>
                This transformation has replaced
                the entire document.
            </BODY>
        </HTML>
    </xsl:template>
</xsl:stylesheet>

And that's all it takes; by using the <xsl:template> element, I've set up a rule in the style sheet. When the XSL processor reads the document, the first node that it sees is the root node. This rule matches that root node, so the XSL processor replaces it with the HTML document, producing this result:

<HTML>
    <HEAD>
        <TITLE>
            A trivial transformation
        </TITLE>
    </HEAD>
    <BODY>
        This transformation has replaced
        the entire document.
    </BODY>
</HTML>

That's our first, rudimentary transformation. All we've done is replace the entire document with another one. But, of course, that's just the beginning.
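
As mentioned earlier, the <xsl:stylesheet> element can sometimes be omitted. The XSLT 1.0 recommendation allows this through its simplified syntax, in which the style sheet is just the result document itself, marked with an xsl:version attribute. Here's a minimal sketch of the trivial transformation written that way. It should behave like a style sheet with a single template that matches the root node, but support for the simplified form may vary between processors, so treat that as an assumption to verify with the processor you use:

```xml
<?xml version="1.0"?>
<!-- Simplified syntax (sketch): the outermost literal result element carries
     xsl:version and stands in for <xsl:stylesheet> plus one template
     matching "/". Processor support is assumed, not guaranteed. -->
<HTML xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <HEAD>
        <TITLE>
            A trivial transformation
        </TITLE>
    </HEAD>
    <BODY>
        This transformation has replaced
        the entire document.
    </BODY>
</HTML>
```

The simplified form cannot hold more than that one implied template, so as soon as you need a second rule you have to go back to a full <xsl:stylesheet> element.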

The xsl:apply-templates Element

The template I used in the previous section applied to only one node, the root node, and performed a trivial action, replacing the entire XML document with an HTML document. However, you can also apply templates to the children of a node that you've matched, and you do that with the <xsl:apply-templates> element.

For example, say that I want to convert planets.xml to HTML. The document element in that document is <PLANETS>, so I can match that element with a template, setting the match attribute to the name of the element I want to match. Then I replace the <PLANETS> element with an <HTML> element, like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
    .
    .
    .
        </HTML>
    </xsl:template>
    .
    .
    .
</xsl:stylesheet>

But what about the children of the <PLANETS> element? To make sure that they are transformed correctly, you use the <xsl:apply-templates> element this way:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>
    .
    .
    .
</xsl:stylesheet>

Now you can provide templates for the child nodes. In this case, I'll just replace each of the three <PLANET> elements with some text, which I place directly into the template for the <PLANET> element:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <P>
            Planet data will go here .
        </P>
    </xsl:template>
</xsl:stylesheet>

And that's it; now the <PLANETS> element is replaced by an <HTML> element, and the <PLANET> elements are also replaced:

<HTML>
    <P>
        Planet data will go here .
    </P>
    <P>
        Planet data will go here .
    </P>
    <P>
        Planet data will go here .
    </P>
</HTML>

You can see that this transformation works, but it's still less than useful; all we've done is replace the <PLANET> elements with some text. What if we wanted to access some of the data in the <PLANET> element? For example, say that we wanted to place the text from the <NAME> element in each <PLANET> element in the output document:

<PLANET>
    <NAME>Mercury</NAME>
    <MASS UNITS="(Earth = 1)">.0553</MASS>
    <DAY UNITS="days">58.65</DAY>
    <RADIUS UNITS="miles">1516</RADIUS>
    <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
    <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
</PLANET>

To gain access to this kind of data, you can use the select attribute of the <xsl:value-of> element.

Getting the Value of Nodes with xsl:value-of

In this example, I'll extract the name of each planet and insert that name into the output document. To get the name of each planet, I'll use the <xsl:value-of> element in a template targeted at the <PLANET> element, and I'll select the <NAME> element with the select attribute like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <xsl:value-of select="NAME"/>
    </xsl:template>
</xsl:stylesheet>

Using select like this, you can select nodes. The select attribute is much like the match attribute of the <xsl:template> element, except that the select attribute is more powerful. With it, you can specify the node or nodes to select using the full XPath specification, as we'll see later in this chapter. The select attribute is an attribute of the <xsl:apply-templates>, <xsl:value-of>, <xsl:for-each>, and <xsl:sort> elements, all of which we'll also see in this chapter.

Applying the previous style sheet, the <xsl:value-of select="NAME"/> element directs the XSLT processor to insert the name of each planet into the output document, so that document looks like this:

<HTML>
  Mercury
  Venus
  Earth
</HTML>

Handling Multiple Selections with xsl:for-each

When the expression in its select attribute matches several nodes, the <xsl:value-of> element outputs the value of only the first one. What if you have multiple nodes that could match? For example, say that you can have multiple <NAME> elements for each planet:

<PLANET>
    <NAME>Mercury</NAME>
    <NAME>Closest planet to the sun</NAME>
    <MASS UNITS="(Earth = 1)">.0553</MASS>
    <DAY UNITS="days">58.65</DAY>
    <RADIUS UNITS="miles">1516</RADIUS>
    <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
    <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
</PLANET>

The <xsl:value-of> element's select attribute by itself will select only the first <NAME> element; to loop over all possible matches, you can use the <xsl:for-each> element like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <xsl:for-each select="NAME">
            <P>
                <xsl:value-of select="."/>
            </P>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

This style sheet will catch all <NAME> elements, place their values in a <P> element, and add them to the output document, like this:

<HTML>
    <P>Mercury</P>
    <P>Closest planet to the sun</P>
    <P>Venus</P>
    <P>Earth</P>
</HTML>
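
The same loop can also be written without <xsl:for-each>, by applying templates to the <NAME> elements and giving them a rule of their own. Here's a sketch of that alternative; it uses only elements already introduced in this chapter and should place each name in its own <P> element just as the <xsl:for-each> version does:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>
    <!-- Hand every NAME child of this PLANET to the rule below -->
    <xsl:template match="PLANET">
        <xsl:apply-templates select="NAME"/>
    </xsl:template>
    <!-- Runs once for each selected NAME element -->
    <xsl:template match="NAME">
        <P>
            <xsl:value-of select="."/>
        </P>
    </xsl:template>
</xsl:stylesheet>
```

Which form to use is largely a matter of taste: <xsl:for-each> keeps the logic in one place, while separate rules are easier to reuse when <NAME> elements can turn up in several contexts.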

We've seen now that you can use the match and select attributes to indicate what nodes you want to work with. The actual syntax that you can use with these attributes is fairly complex but worth knowing. I'll take a look at the match attribute in more detail first, and I'll examine the select attribute later in this chapter.

Specifying Patterns for the match Attribute

You can use an involved syntax with the <xsl:template> element's match attribute, and an even more involved syntax with the select attribute of the <xsl:apply-templates>, <xsl:value-of>, <xsl:for-each>, <xsl:copy-of>, and <xsl:sort> elements. We'll see them both in this chapter, starting with the syntax you can use with the match attribute.

Matching the Root Node

As we've already seen, you can match the root node with /, like this:

<xsl:template match="/">
    <HTML>
        <xsl:apply-templates/>
    </HTML>
</xsl:template>

Matching Elements

You can match specific XML elements simply by giving their name, as we've also seen:

<xsl:template match="PLANETS">
    <HTML>
        <xsl:apply-templates/>
    </HTML>
</xsl:template>

Matching Children

You can use the / operator to separate element names when you want to refer to a child of a particular node. For example, say that you wanted to create a rule that applies only to <NAME> elements that are children of <PLANET> elements. In that case, you can match to the expression "PLANET/NAME". Here's a rule that will surround the text of such elements in an <H3> element:

<xsl:template match="PLANET/NAME">
    <H3><xsl:value-of select="."/></H3>
</xsl:template>

Notice the expression "." here. You use "." with the select attribute to specify the current node, as we'll see when discussing the select attribute.

You can also use the * character as a wildcard, standing for any element (* can match only elements). For example, this rule applies to all <NAME> elements that are grandchildren of <PLANET> elements:

<xsl:template match="PLANET/*/NAME">
    <H3><xsl:value-of select="."/></H3>
</xsl:template>

Matching Element Descendants

In the previous section, I used the expression "PLANET/NAME" to match all <NAME> elements that are direct children of <PLANET> elements, and I used the expression "PLANET/*/NAME" to match all <NAME> elements that are grandchildren of <PLANET> elements. However, there's an easier way to perform both matches: Just use the expression "PLANET//NAME", which matches all <NAME> elements that are inside <PLANET> elements, no matter how many levels deep. (The matched elements are called descendants of the <PLANET> element). In other words, "PLANET//NAME" matches "PLANET/NAME", "PLANET/*/NAME", "PLANET/*/*/NAME", and so on:

<xsl:template match="PLANET//NAME">
    <H3><xsl:value-of select="."/></H3>
</xsl:template>

Matching Attributes

You can match attributes if you preface their name with @. Here's an example; in this case, I'll display the data in planets.xml in an HTML table. You might note, however, that the units for the various measurements are stored in attributes, like this:

<PLANET>
    <NAME>Earth</NAME>
    <MASS UNITS="(Earth = 1)">1</MASS>
    <DAY UNITS="days">1</DAY>
    <RADIUS UNITS="miles">2107</RADIUS>
    <DENSITY UNITS="(Earth = 1)">1</DENSITY>
    <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion-->
</PLANET>

To recover the units and display them as well as the values for the mass and so on, I'll match the UNITS attribute with @UNITS. Here's how that looks; note that I'm using the <xsl:text> element to insert a space into the output document (more on <xsl:text> later):

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/PLANETS">
        <HTML>
            <HEAD>
                <TITLE>
                    The Planets Table
                </TITLE>
            </HEAD>
            <BODY>
                <H1>
                    The Planets Table
                </H1>
                <TABLE>
                    <TR>
                        <TD>Name</TD>
                        <TD>Mass</TD>
                        <TD>Radius</TD>
                        <TD>Day</TD>
                    </TR>
                    <xsl:apply-templates/>
                </TABLE>
            </BODY>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <TR>
            <TD><xsl:value-of select="NAME"/></TD>
            <TD><xsl:apply-templates select="MASS"/></TD>
            <TD><xsl:apply-templates select="RADIUS"/></TD>
            <TD><xsl:apply-templates select="DAY"/></TD>
        </TR>
    </xsl:template>
    <xsl:template match="MASS">
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="@UNITS"/>
    </xsl:template>
    <xsl:template match="RADIUS">
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="@UNITS"/>
    </xsl:template>
    <xsl:template match="DAY">
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="@UNITS"/>
    </xsl:template>
</xsl:stylesheet>

Now the resulting HTML table includes not only values, but also their units of measurement. (The spacing leaves a little to be desired, but HTML browsers will have no problem with it; we'll take a look at ways of handling whitespace later in this chapter.)

<HTML>
<HEAD>
<TITLE>
                    The Planets Table
                </TITLE>
</HEAD>
<BODY>
<H1>
                    The Planets Table
                </H1>
<TABLE>
<TR>
<TD>Name</TD><TD>Mass</TD><TD>Radius</TD><TD>Day</TD>
</TR>
    <TR>
<TD>Mercury</TD><TD>.0553 (Earth = 1)</TD><TD>1516 miles</TD><TD>58.65 days</TD>
</TR>
    <TR>
<TD>Venus</TD><TD>.815 (Earth = 1)</TD><TD>3716 miles</TD><TD>116.75 days</TD>
</TR>
    <TR>
<TD>Earth</TD><TD>1 (Earth = 1)</TD><TD>2107 miles</TD><TD>1 days</TD>
</TR>
</TABLE>
</BODY>
</HTML>

You can also use the @* wildcard to select all attributes of an element. For example, "PLANET/@*" selects all attributes of <PLANET> elements.
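
Here's a short sketch of the @* wildcard at work. In planets.xml the attributes sit on the children of <PLANET>, not on <PLANET> itself, so I'm assuming the expression "*/@*" (every attribute of every child element) is what you'd want for this document; the name() function, which returns a node's name, is part of XPath but has not been introduced yet in this chapter:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates select="PLANET"/>
        </HTML>
    </xsl:template>
    <!-- Attributes are not children, so they must be selected explicitly -->
    <xsl:template match="PLANET">
        <xsl:apply-templates select="*/@*"/>
    </xsl:template>
    <!-- @* matches any attribute, whatever its name -->
    <xsl:template match="@*">
        <P>
            <!-- name(..) is the element that owns the attribute -->
            <xsl:value-of select="name(..)"/>
            <xsl:text> </xsl:text>
            <xsl:value-of select="name()"/>
            <xsl:text>: </xsl:text>
            <xsl:value-of select="."/>
        </P>
    </xsl:template>
</xsl:stylesheet>
```

For each planet, this should write one paragraph per attribute, along the lines of MASS UNITS: (Earth = 1). The <NAME> elements contribute nothing because they have no attributes.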

Matching by ID

You can also match elements that have a specific ID value using the pattern id(). To use this selector, you must give elements an ID attribute, and you must declare that attribute of type ID, as you can do in a DTD. Because ID values must be unique within a document, the pattern matches at most one element. Here's an example rule that adds the text of the element that has the ID Christine:

<xsl:template match="id('Christine')">
    <H3><xsl:value-of select="."/></H3>
</xsl:template>
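
Because planets.xml has no ID attributes, here's a hypothetical document that the rule above could be applied to. The element names PEOPLE and PERSON and the attribute name NICKNAME are made up for this sketch; the point is that what counts is the attribute's declared type of ID, not its name:

```xml
<?xml version="1.0"?>
<!-- Hypothetical document: PEOPLE, PERSON, and NICKNAME are invented names -->
<!DOCTYPE PEOPLE [
    <!ELEMENT PEOPLE (PERSON*)>
    <!ELEMENT PERSON (#PCDATA)>
    <!-- Declaring the attribute with type ID is what lets id() find it -->
    <!ATTLIST PERSON NICKNAME ID #REQUIRED>
]>
<PEOPLE>
    <PERSON NICKNAME="Christine">Christine Smith</PERSON>
    <PERSON NICKNAME="Edward">Edward Jones</PERSON>
</PEOPLE>
```

With this document, the pattern id('Christine') should match only the first <PERSON> element. If the processor's parser does not read the DTD, it has no way of knowing which attributes are IDs, and the pattern will match nothing.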
Matching Comments

You can match the text of comments with the pattern comment(). You should not store data that should go into the output document in comments in the input document, of course. However, you might want to convert comments from the <!--comment--> form into something another markup language might use, such as a <COMMENT> element.

Here's an example; planets.xml was designed to include comments so that we could see how to extract them:

<PLANET>
    <NAME>Venus</NAME>
    <MASS UNITS="(Earth = 1)">.815</MASS>
    <DAY UNITS="days">116.75</DAY>
    <RADIUS UNITS="miles">3716</RADIUS>
    <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
    <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
</PLANET>

To extract comments and put them into <COMMENT> elements, I'll include a rule just for comments:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>
    <xsl:template match="comment()">
        <COMMENT>
            <xsl:value-of select="."/>
        </COMMENT>
    </xsl:template>
</xsl:stylesheet>

Here's what the result is for Venus, where I've transformed the comment into a <COMMENT> element:

Venus .815 116.75 3716 .943 66.8<COMMENT>At perihelion</COMMENT>

Note that the text for the other elements in the <PLANET> element is also inserted into the output document. The reason is that XSLT's default rules apply: when no rule matches an element, the processor applies templates to that element's children, and the default rule for text nodes copies their text to the output document. Because I haven't provided a rule for those elements, their text is simply included in the output document. I'll take a closer look at default rules later in the chapter.
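
To make that behavior concrete, here's a sketch of what two of the default rules would look like if you wrote them out yourself, based on the built-in template rules described in the XSLT 1.0 recommendation. Writing them explicitly is never required; this is only an illustration of what the processor does on its own:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Default rule for the root node and for elements:
         produce nothing here, but keep processing the children -->
    <xsl:template match="*|/">
        <xsl:apply-templates/>
    </xsl:template>
    <!-- Default rule for comments and processing instructions:
         produce nothing at all -->
    <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

The first rule is why the processor descends through <PLANET> and its children even though the style sheet has no rule for them, and the second is why comments disappear from the output unless you supply a rule for comment() of your own.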

Matching Text Nodes with text()

You can match the text in a node with the pattern text(). There's really not much reason to ever use text(), however, because XSLT includes a default rule: If there are no other rules for a text node, the text in that node is inserted into the output document. If you were to make that default rule explicit, it might look like this:

<xsl:template match="text()">
    <xsl:value-of select="."/>
</xsl:template>

You can override this rule by not sending the text in text nodes to the output document, like this:

<xsl:template match="text()">
</xsl:template>

In the previous example, you can see that a great deal of text made it from the input document to the output document because there was no explicit rule besides the default one for text nodes; the only output rule that I used was for comments. If you turn off the default rule for text nodes by adding the previous two lines to the version of planets.xsl used in the previous example, the text of those text nodes does not go into the output document. This is the result:

<HTML>
<COMMENT>At perihelion</COMMENT>
<COMMENT>At perihelion</COMMENT>
<COMMENT>At perihelion</COMMENT>
</HTML>

Matching Processing Instructions

You can use the pattern processing-instruction() to match processing instructions.

<xsl:template match="/processing-instruction()">
    <I>
        Found a processing instruction.
    </I>
</xsl:template>

You can also specify what processing instruction you want to match by giving the name of the processing instruction (excluding <? and ?>) as a quoted string, as in this case, where I'm matching the processing instruction <?xml-include?>:

<xsl:template match="/processing-instruction('xml-include')">
    <I>
        Found an xml-include processing instruction.
    </I>
</xsl:template>

One of the major reasons that XSLT makes a distinction between the root node (at the very beginning of the document) and the document element is so that you have access to the processing instructions and other nodes in the document's prolog.

Using the Or Operator

You can match to a number of possible patterns, which is very useful when your documents get a little more involved than the ones we've been using so far in this chapter. Here's an example; in this case, I want to display <NAME> and <MASS> elements in bold, which I'll do with the HTML <B> tag. To match either <NAME> or <MASS> elements, I'll use the Or operator, which is a vertical bar (|), in a new rule, like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <P>
            <xsl:apply-templates/>
        </P>
    </xsl:template>
    <xsl:template match="NAME | MASS">
        <B>
            <xsl:apply-templates/>
        </B>
    </xsl:template>
</xsl:stylesheet>

Here are the results; note that the name and mass values are both enclosed in <B> elements. (Also note that, because of the XSL default rules, the text from the other child elements of the <PLANET> element is also displayed.)

<HTML>
  <P>
    <B>Mercury</B>
    <B>.0553</B>
    58.65
    1516
    .983
    43.4
  </P>
  <P>
    <B>Venus</B>
    <B>.815</B>
    116.75
    3716
    .943
    66.8
  </P>
  <P>
    <B>Earth</B>
    <B>1</B>
    1
    2107
    1
    128.4
  </P>
</HTML>

You can use any valid pattern with the | operator, such as expressions like PLANET | PLANET//NAME, and you can use multiple | operators, such as NAME | MASS | DAY, and so on.

Testing with []

You can use the [] operator to test whether a certain condition is true. For example, you can test the following:

  • Whether an attribute has a given string value

  • The value of an element

  • Whether an element encloses a particular child, attribute, or other element

  • The position of a node in the node tree

Here are some examples:

  • This expression matches <PLANET> elements that have child <NAME> elements:

    <xsl:template match = "PLANET[NAME]">
  • This expression matches any element that has a <NAME> child element:

    <xsl:template match = "*[NAME]">
  • This expression matches any <PLANET> element that has either a <NAME> or a <MASS> child element:

    <xsl:template match="PLANET[NAME | MASS]">
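
The list of tests above also mentions node position, which the examples here don't show. As a minimal sketch, assuming the planets.xml document used throughout this chapter, a number in the [] operator tests an element's position among the like-named children of its parent. The wording of the output text is my own choice for illustration:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates select="PLANET"/>
        </HTML>
    </xsl:template>
    <!-- PLANET[1] is short for PLANET[position() = 1]: the first PLANET child of its parent -->
    <xsl:template match="PLANET[1]">
        <P>First planet: <xsl:value-of select="NAME"/></P>
    </xsl:template>
    <!-- Every other PLANET element falls through to this less specific rule -->
    <xsl:template match="PLANET">
        <P><xsl:value-of select="NAME"/></P>
    </xsl:template>
</xsl:stylesheet>
```

Both of the last two rules match the first <PLANET> element, but a pattern with a predicate has a higher default priority than a bare element name, so the PLANET[1] rule should be the one applied to Mercury.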

Say that we gave the <PLANET> elements in planets.xml a new attribute, COLOR, that holds the planet's color:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="planets.xsl"?>
<PLANETS>
  <PLANET COLOR="RED">
    <NAME>Mercury</NAME>
    <MASS UNITS="(Earth = 1)">.0553</MASS>
    <DAY UNITS="days">58.65</DAY>
    <RADIUS UNITS="miles">1516</RADIUS>
    <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
    <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
  </PLANET>
  <PLANET COLOR="WHITE">
    <NAME>Venus</NAME>
    <MASS UNITS="(Earth = 1)">.815</MASS>
    <DAY UNITS="days">116.75</DAY>
    <RADIUS UNITS="miles">3716</RADIUS>
    <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
    <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
  </PLANET>
  <PLANET COLOR="BLUE">
    <NAME>Earth</NAME>
    <MASS UNITS="(Earth = 1)">1</MASS>
    <DAY UNITS="days">1</DAY>
    <RADIUS UNITS="miles">2107</RADIUS>
    <DENSITY UNITS="(Earth = 1)">1</DENSITY>
    <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion-->
  </PLANET>
</PLANETS>

This expression matches <PLANET> elements that have COLOR attributes:

<xsl:template match="PLANET[@COLOR]">

What if you wanted to match planets whose COLOR attribute was BLUE? You can do that with the = operator, like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET[@COLOR = 'BLUE']">
            The <xsl:value-of select="NAME"/> is blue.
    </xsl:template>
    <xsl:template match="text()">
    </xsl:template>
</xsl:stylesheet>

This style sheet picks out the planets whose COLOR attribute is BLUE and omits the others by turning off the default rule for text nodes. Here's the result:

<HTML>
        The Earth is blue.
</HTML>

In fact, the expressions you can use in the [] operators are W3C XPath expressions. XPath expressions give you ways of specifying nodes in an XML document using a fairly involved syntax. And because the select attribute, which we're about to cover, uses XPath, I'll take a look at XPath as well.

Specifying Patterns for the select Attribute

I've taken a look at the kinds of expressions that you can use with the <xsl:template> element's match attribute. You can use an even more involved syntax with the select attribute of the <xsl:apply-templates>, <xsl:value-of>, <xsl:for-each>, <xsl:copy-of>, and <xsl:sort> elements.

The select attribute uses XPath expressions; XPath has been a W3C recommendation since November 16, 1999. You can find the XPath specification at http://www.w3.org/TR/xpath.

We've seen that you can use the match attribute to find nodes by name, child element(s), attributes, or even descendant. We've also seen that you can make some tests to see whether elements or attributes have certain values. You can do all that and more with the XPath specification supported by the select attribute, including finding nodes by parent or sibling elements, as well as much more involved tests. XPath is much more of a true language than the expressions you can use with the match attribute; for example, XPath expressions can return not only lists of nodes, but also Boolean, string, and numeric values.

The XML for Java package has a handy example program, ApplyXPath.java, that enables you to apply an XPath expression to a document and see what the results would be. This is great for testing. For example, if I applied the XPath expression "PLANET/NAME" to planets.xml, here is what the result would look like, displaying the values of all <NAME> elements that are children of <PLANET> elements (the <output> tags are added by ApplyXPath):

%java ApplyXPath planets.xml PLANET/NAME
<output>
<NAME>Mercury</NAME><NAME>Venus</NAME><NAME>Earth</NAME></output>

XPath expressions are more powerful than the match expressions we've seen; for one thing, they're not restricted to working with the current node or child nodes because you can work with parent nodes, ancestor nodes, and more. Specifying what node you want to work in relation to is called specifying an axis in XPath. I'll take a look at XPath syntax in detail next.

Understanding XPath

To specify a node or set of nodes in XPath, you use a location path. A location path, in turn, consists of one or more location steps, separated by / or //. If you start the location path with /, the location path is called an absolute location path because you're specifying the path from the root node; otherwise, the location path is relative, starting with the current node, which is called the context node. Got all that? Good, because there's more.

A location step is made up of an axis, a node test, and zero or more predicates. For example, in the expression child::PLANET[position() = 5], child is the name of the axis, PLANET is the node test, and [position() = 5] is a predicate. You can create location paths with one or more location steps, such as /descendant::PLANET/child::NAME, which selects all the <NAME> elements that have a <PLANET> parent. The best way to understand all this is by example, and we'll see plenty of them in a few pages. In the meantime, I'll take a look at what kind of axes, node tests, and predicates XPath supports.

XPath Axes

In the location path child::NAME, which refers to a <NAME> element that is a child of the current node, child is called the axis. XPath supports many different axes, and it's important to know what they are. Here's the list:

Axis Description
ancestor Holds the ancestors of the context node. The ancestors of the context node are the parent of the context node and the parent's parent and so forth, back to and including the root node.
ancestor-or-self Holds the context node and the ancestors of the context node.
attribute Holds the attributes of the context node.
child Holds the children of the context node.
descendant Holds the descendants of the context node. A descendant is a child or a child of a child, and so on.
descendant-or-self Contains the context node and the descendants of the context node.
following Holds all nodes in the same document as the context node that come after the context node in document order, excluding its descendants and any attribute or namespace nodes.
following-sibling Holds all the following siblings of the context node. A sibling is a node on the same level as the context node.
namespace Holds the namespace nodes of the context node.
parent Holds the parent of the context node.
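preceding Contains all nodes in the same document as the context node that come before the context node in document order, excluding its ancestors and any attribute or namespace nodes.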
preceding-sibling Contains all the preceding siblings of the context node. A sibling is a node on the same level as the context node.
self Contains the context node.

You can use axes to specify a location step or path, as in this example, where I'm using the child axis to indicate that I want to match to child nodes of the context node, which is a <PLANET> element. (We'll see later that an abbreviated version lets you omit the child:: part.)

<xsl:template match="PLANET">
    <HTML>
        <CENTER>
            <xsl:value-of select="child::NAME"/>
        </CENTER>
        <CENTER>
            <xsl:value-of select="child::MASS"/>
        </CENTER>
        <CENTER>
            <xsl:value-of select="child::DAY"/>
        </CENTER>
    </HTML>
</xsl:template>

In these expressions, child is the axis, and the element names NAME, MASS, and DAY are node tests.
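
The child axis is only one of the axes in the table above. Here's a rough sketch of a few of the others at work, assuming planets.xml as shown earlier in the chapter; the wording of the output text is my own and only there to label the values:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates select="child::PLANET"/>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <P>
            <xsl:value-of select="child::NAME"/>
            <!-- attribute axis: the UNITS attribute of the MASS child -->
            (mass units: <xsl:value-of select="child::MASS/attribute::UNITS"/>)
            <!-- following-sibling axis: later PLANET elements with the same parent -->
            followed by <xsl:value-of select="count(following-sibling::PLANET)"/> more,
            <!-- ancestor axis: here the only PLANETS ancestor is the parent -->
            inside <xsl:value-of select="name(ancestor::PLANETS)"/>
        </P>
    </xsl:template>
</xsl:stylesheet>
```

For Mercury, the following-sibling count should be 2 (Venus and Earth), and for Earth it should be 0, because no <PLANET> element follows it.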

XPath Node Tests

You can use names of nodes as node tests, or you can use the wild card * to select element nodes. For example, the expression child::*/child::NAME selects all <NAME> elements that are grandchildren of the context node. Besides node names and the wild card character, you can also use these node tests:

Node Test Description
comment() Selects comment nodes.
node() Selects any type of node.
processing-instruction() Selects a processing instruction node. You can specify the name of the processing instruction to select in the parentheses.
text() Selects a text node.
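
To see how the node tests differ, here's a small sketch that counts the children of each <PLANET> element three ways. It assumes planets.xml as shown earlier in the chapter, and the output wording is only illustrative:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates select="child::PLANET"/>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <P>
            <xsl:value-of select="child::NAME"/>:
            <!-- child::* counts only element children -->
            <xsl:value-of select="count(child::*)"/> elements,
            <!-- child::comment() counts only comment children -->
            <xsl:value-of select="count(child::comment())"/> comments,
            <!-- child::node() counts children of every type -->
            <xsl:value-of select="count(child::node())"/> nodes in all
        </P>
    </xsl:template>
</xsl:stylesheet>
```

With the planets.xml listing used in this chapter, each <PLANET> element should report six elements and one comment. The total for node() depends on whether the processor keeps the whitespace-only text nodes between the child elements, so I won't quote a figure for it.
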

XPath Predicates

The predicate part of an XPath step is perhaps its most intriguing part because it gives you the most power. You can work with all kinds of expressions in predicates; here are the possible types:

  • Node sets

  • Booleans

  • Numbers

  • Strings

  • Result tree fragments

I'll take a look at these various types in turn.

XPath Node Sets

As its name implies, a node set is simply a set of nodes. An expression such as child::PLANET returns a node set of all <PLANET> elements. The expression child::PLANET/child::NAME returns a node set of all <NAME> elements that are children of <PLANET> elements. To select a node or nodes from a node set, you can use various functions that work on node sets in predicates.

Function Description
last() Returns the number of nodes in the context node set.
position() Returns the position of the context node in the context node set (starting with 1).
count(node-set) Returns the number of nodes in node-set. The node-set argument is required.
id(string ID) Returns a node set containing the element whose ID matches the string passed to the function, or returns an empty node set if no element has the specified ID. You can list multiple IDs separated by whitespace, and this function will return a node set of the elements with those IDs.
local-name(node-set) Returns the local name of the first node in the node set. Omitting node-set makes this function use the context node.
namespace-uri(node-set) Returns the URI of the namespace of the first node in the node set. Omitting node-set makes this function use the context node.
name(node-set) Returns the full, qualified name of the first node in the node set. Omitting node-set makes this function use the context node.

Here's an example; in this case, I'll number the elements in the output document using the position() function:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <HEAD>
                <TITLE>
                    The Planets
                </TITLE>
            </HEAD>
            <BODY>
                <xsl:apply-templates select="PLANET"/>
            </BODY>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <P>
            <xsl:value-of select="position()"/>.
            <xsl:value-of select="NAME"/>
        </P>
    </xsl:template>
</xsl:stylesheet>

Here's the result, where you can see that the planets are numbered:

<HTML> <HEAD> <TITLE>                     The Planets                 </TITLE> </HEAD> <BODY> <P>1.             Mercury</P> <P>2.             Venus</P> <P>3.             Earth</P> </BODY> </HTML>

You can use functions that operate on node sets in predicates, as in child::PLANET[position() = last()], which selects the last <PLANET> child of the context node.
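
Here's a short sketch that puts count(), position(), and last() side by side, again assuming the planets.xml document used in this chapter. The output wording is my own:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <BODY>
                <!-- count() needs a node set argument: here, the PLANET children -->
                <P>Planets listed: <xsl:value-of select="count(PLANET)"/></P>
                <xsl:apply-templates select="PLANET"/>
            </BODY>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <P>
            <!-- position() and last() refer to the node set selected by apply-templates -->
            <xsl:value-of select="position()"/> of <xsl:value-of select="last()"/>:
            <xsl:value-of select="NAME"/>
        </P>
    </xsl:template>
</xsl:stylesheet>
```

Because <xsl:apply-templates select="PLANET"/> selects three nodes, last() should be 3 in every <P> element, while position() runs from 1 to 3.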

XPath Booleans

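You can also use Boolean values in XPath expressions. Numbers are considered false if they're zero (or NaN, "not a number") and are considered true otherwise. An empty string ("") is also considered false, and all other strings are considered true. A node set is considered true if it holds at least one node and false if it is empty, which is why a test like PLANET[NAME] works.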

You can use XPath logical operators to produce Boolean true/false results; here are the logical operators:

Operator Description
!= Is not equal to.
< Is less than. (Use &lt; in XML documents.)
<= Is less than or equal to. (Use &lt;= in XML documents.)
= Is equal to. (C, C++, Java, and JavaScript programmers take note: this operator is one = sign, not two.)
> Is greater than.
>= Is greater than or equal to.

You shouldn't use < directly in XML documents; use the entity reference &lt; instead.

You can also use the keywords and and or to connect Boolean clauses with a logical And or Or operation, as we've seen when working with JavaScript and Java.

Here's an example using the logical operator >. This rule applies to all <PLANET> elements after position 5:

<xsl:template match="PLANET[position() > 5]">
    <xsl:value-of select="."/>
</xsl:template>

There is also a true() function that always returns a value of true, and a false() function that always returns a value of false.

You can also use the not() function to reverse the logical sense of an expression, as in this case, where I'm selecting all but the last <PLANET> element:

<xsl:template match="PLANET[not(position() = last())]">
    <xsl:value-of select="."/>
</xsl:template>

Finally, the lang() function returns true or false, depending on whether the language of the context node (which is given by xml:lang attributes) is the same as the language you pass to this function.
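
To show the and, or, and not() pieces working together, here's a minimal sketch that assumes the planets.xml values used in this chapter. Note that < is written as &lt; inside the attribute values:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <BODY>
                <!-- and: both comparisons must be true -->
                <xsl:apply-templates select="PLANET[MASS &lt; 1 and DAY > 100]"/>
                <HR/>
                <!-- or: either clause is enough; not() reverses the first one -->
                <xsl:apply-templates select="PLANET[not(MASS &lt; 1) or DAY &lt; 60]"/>
            </BODY>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <P><xsl:value-of select="NAME"/></P>
    </xsl:template>
</xsl:stylesheet>
```

With the masses and day lengths listed in planets.xml, the first test should pick out only Venus (mass .815, day 116.75), and the second should pick out Mercury (day 58.65) and Earth (mass 1).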

XPath Numbers

In XPath, numbers are actually stored in double-precision floating-point format. (See Chapter 10, "Understanding Java," for more details on doubles; technically speaking, all XPath numbers are stored in 64-bit IEEE 754 floating-point double-precision format.) All numbers are stored as doubles, even integers such as 5, as in the example we just saw:

<xsl:template match="PLANET[position() > 5]">
    <xsl:value-of select="."/>
</xsl:template>

You can use several operators on numbers:

Operator Action
+ Adds.
- Subtracts.
* Multiplies.
div Divides. (The / character, which stands for division in other languages, is already heavily used in XML and XPath.)
mod Returns the modulus of two numbers (the remainder after dividing the first by the second).

For example, the element <xsl:value-of select="180 + 420"/> inserts the string "600" into the output document. This example selects all planets whose day (measured in earth days) divided by its mass (where the mass of Earth = 1) is greater than 100:

<xsl:template match="PLANETS">
    <HTML>
        <BODY>
            <xsl:apply-templates select="PLANET[DAY div MASS > 100]"/>
        </BODY>
    </HTML>
</xsl:template>

XPath also supports these functions that operate on numbers:

Function Description
ceiling() Returns the smallest integer that is not less than the number that you pass it
floor() Returns the largest integer that is not greater than the number that you pass it
round() Rounds the number that you pass it to the nearest integer
sum() Returns the sum of the numeric values of the nodes in the node set that you pass it

For example, here's how you can find the average mass of the planets in planets.xml:

<xsl:template match="PLANETS">
    <HTML>
        <BODY>
            The average planetary mass is:
            <xsl:value-of select="sum(descendant::MASS)
            div count(descendant::MASS)"/>
        </BODY>
    </HTML>
</xsl:template>
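
The rounding functions in the table are easy to try out as well. This is just a sketch, assuming the <DAY> values in planets.xml; the labels in the output are my own:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <BODY>
                <xsl:apply-templates select="PLANET"/>
            </BODY>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <P>
            <xsl:value-of select="NAME"/>:
            <!-- The text of DAY is converted to a number before each function is applied -->
            floor <xsl:value-of select="floor(DAY)"/>,
            ceiling <xsl:value-of select="ceiling(DAY)"/>,
            round <xsl:value-of select="round(DAY)"/>
        </P>
    </xsl:template>
</xsl:stylesheet>
```

For Mercury, whose day is 58.65 Earth days, floor() should give 58, and both ceiling() and round() should give 59.
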

XPath Strings

In XPath, strings are made up of Unicode characters. A number of functions are specially designed to work on strings, as shown in this table.

Function Description
starts-with(string string1, string string2) Returns true if the first string starts with the second string
contains(string string1, string string2) Returns true if the first string contains the second one
substring(string string1, number offset, number length) Returns length characters from the string, starting at offset (the first character is at position 1); if you omit length, returns the rest of the string
substring-before(string string1, string string2) Returns the part of string1 up to the first occurrence of string2
substring-after(string string1, string string2) Returns the part of string1 after the first occurrence of string2
string-length(string string1) Returns the number of characters in string1
normalize-space(string string1) Returns string1 after leading and trailing whitespace is stripped and multiple consecutive whitespace is replaced with a single space
translate(string string1, string string2, string string3) Returns string1 with all occurrences of the characters in string2 replaced by the matching characters in string3
concat(string string1, string string2, ...) Returns all strings concatenated (that is, joined) together
format-number(number number1, string string2, string string3) Returns a string holding the formatted string version of number1, using string2 as a formatting string (create formatting strings as you would for Java's java.text.DecimalFormat class), and string3 as the optional name of a decimal format declared with <xsl:decimal-format>. (This function is defined by XSLT rather than by XPath itself.)
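
Here's a brief sketch of a few of these string functions applied to the <NAME> elements in planets.xml. The particular functions and output wording are only my choices for illustration:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <BODY>
                <xsl:apply-templates select="PLANET"/>
            </BODY>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <P>
            <!-- translate() maps each lowercase letter to its uppercase partner -->
            <xsl:value-of select="translate(NAME, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
            <!-- concat() joins its arguments; string-length() counts characters -->
            <xsl:value-of select="concat(' (', string-length(NAME), ' letters)')"/>
            <!-- substring() counts from 1, so this is the first three characters -->
            <xsl:value-of select="concat(' abbreviated ', substring(NAME, 1, 3))"/>
        </P>
    </xsl:template>
</xsl:stylesheet>
```

For the first planet, the text in the <P> element should read MERCURY (7 letters) abbreviated Mer.
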

XPath Result Tree Fragments

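A result tree fragment is a part of the result tree that is treated much like a node set holding a single root node. Strictly speaking, it is a data type that XSLT adds to the four XPath types. You create a result tree fragment when you give an <xsl:variable> or <xsl:param> element content instead of a select attribute.

You really can't do much with result tree fragments in XPath. You can work with one only in the ways you could work with a string, for example, by using the string() or boolean() functions to turn it into a string or Boolean; operators such as / and [] that work on true node sets are not allowed. In XSLT, you can also copy a result tree fragment to the output with <xsl:copy-of>.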
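
As a rough sketch of what this looks like in practice: under the XSLT 1.0 recommendation, an <xsl:variable> element that has content (rather than a select attribute) holds a result tree fragment. The variable name label below is hypothetical, and the example assumes planets.xml as used in this chapter:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <BODY>
                <xsl:apply-templates select="PLANET"/>
            </BODY>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <!-- The content of xsl:variable makes $label a result tree fragment -->
        <xsl:variable name="label">
            <B><xsl:value-of select="NAME"/></B>
        </xsl:variable>
        <P>
            <!-- xsl:copy-of copies the fragment to the output, markup included -->
            <xsl:copy-of select="$label"/>
            <!-- string() keeps only the text of the fragment -->
            has <xsl:value-of select="string-length(string($label))"/> letters
        </P>
    </xsl:template>
</xsl:stylesheet>
```

An expression such as $label/B would be an error in XSLT 1.0, because a result tree fragment is not a true node set.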

XPath Examples

We've seen a lot of XPath in theory; how about some examples? Here are a number of location path examples; note that XPath enables you to use and or or in predicates to apply logical tests using multiple patterns.

Example Action
child::PLANET Returns the <PLANET> element children of the context node.
child::* Returns all element children (* only matches elements) of the context node.
child::text() Returns all text node children of the context node.
child::node() Returns all the children of the context node, no matter what their node type is.
attribute::UNITS Returns the UNITS attribute of the context node.
descendant::PLANET Returns the <PLANET> element descendants of the context node.
ancestor::PLANET Returns all <PLANET> ancestors of the context node.
ancestor-or-self::PLANET Returns the <PLANET> ancestors of the context node. If the context node is a <PLANET> as well, also returns the context node.
descendant-or-self::PLANET Returns the <PLANET> element descendants of the context node. If the context node is a <PLANET> as well, also returns the context node.
self::PLANET Returns the context node if it is a <PLANET> element.
child::NAME/descendant::PLANET Returns the <PLANET> element descendants of the child <NAME> elements of the context node.
child::*/child::PLANET Returns all <PLANET> grandchildren of the context node.
/ Returns the document root (that is, the parent of the document element).
/descendant::PLANET Returns all the <PLANET> elements in the document.
/descendant::PLANET/child::NAME Returns all the <NAME> elements that have a <PLANET> parent.
child::PLANET[position() = 3] Returns the third <PLANET> child of the context node.
child::PLANET[position() = last()] Returns the last <PLANET> child of the context node.
/descendant::PLANET[position() = 3] Returns the third <PLANET> element in the document.
child::PLANETS/child::PLANET[position() = 4 ]/child::NAME[position() = 3] Returns the third <NAME> element of the fourth <PLANET> element of the <PLANETS> element.
child::PLANET[position() > 3] Returns all the <PLANET> children of the context node after the first three.
preceding-sibling::NAME[position() = 2] Returns the second previous <NAME> sibling element of the context node.
child::PLANET[attribute::COLOR = "RED"] Returns all <PLANET> children of the context node that have a COLOR attribute with value of RED.
child::PLANET[attribute::COLOR = "RED"][position() = 3] Returns the third <PLANET> child of the context node that has a COLOR attribute with value of RED.
child::PLANET[position() = 3][attribute::COLOR="RED"] Returns the third <PLANET> child of the context node, only if that child has a COLOR attribute with value of RED.
child::MASS[child::NAME = "VENUS" ] Returns the <MASS> children of the context node that have <NAME> children whose text is VENUS.
child::PLANET[child::NAME] Returns the <PLANET> children of the context node that have <NAME> children.
child::*[self::NAME or self::MASS ] Returns both the <NAME> and <MASS> children of the context node.
child::*[self::NAME or self::MASS][position() = 1] Returns the first <NAME> or <MASS> child of the context node.

As you can see, some of this syntax is pretty involved and a little lengthy to type. However, there is an abbreviated form of XPath syntax.

XPath Abbreviated Syntax

You can take advantage of a number of abbreviations in XPath syntax. Here are the rules:

Expression Abbreviation
self::node() .
parent::node() ..
child::childname childname
attribute::attributename @attributename
/descendant-or-self::node()/ //

 

You can also abbreviate predicate expressions such as [position() = 3] as [3], [position() = last()] as [last()], and so on. Using the abbreviated syntax makes XPath expressions a lot easier to use. Here are some examples of location paths using abbreviated syntax; note how well these fit the syntax we saw with the match attribute earlier in the chapter:

Path Description
PLANET Returns the <PLANET> element children of the context node.
* Returns all element children of the context node.
text() Returns all text node children of the context node.
@UNITS Returns the UNITS attribute of the context node.
@* Returns all the attributes of the context node.
PLANET[3] Returns the third <PLANET> child of the context node.
PLANET[1] Returns the first <PLANET> child of the context node.
*/PLANET Returns all <PLANET> grandchildren of the context node.
/PLANETS/PLANET[3]/NAME[2] Returns the second <NAME> element of the third <PLANET> element of the <PLANETS> element.
//PLANET Returns all the <PLANET> descendants of the document root.
PLANETS//PLANET Returns the <PLANET> element descendants of the <PLANETS> element children of the context node.
//PLANET/NAME Returns all the <NAME> elements that have a <PLANET> parent.
. Returns the context node itself.
.//PLANET Returns the <PLANET> element descendants of the context node.
.. Returns the parent of the context node.
../@UNITS Returns the UNITS attribute of the parent of the context node.
PLANET[NAME] Returns the <PLANET> children of the context node that have <NAME> children.
PLANET[NAME="Venus"] Returns the <PLANET> children of the context node that have <NAME> children with text equal to Venus.
PLANET[@UNITS = "days"] Returns all <PLANET> children of the context node that have a UNITS attribute with value days.
PLANET[6][@UNITS = "days"] Returns the sixth <PLANET> child of the context node, only if that child has a UNITS attribute with value days. This is not the same as PLANET[@UNITS = "days"][6], which returns the sixth of the <PLANET> children that have a UNITS attribute with value days.
PLANET[@COLOR and @UNITS] Returns all the <PLANET> children of the context node that have both a COLOR attribute and a UNITS attribute.

Here's an example in which I put the abbreviated syntax to work, moving up and down inside a <PLANET> element:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates select="PLANET"/>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <xsl:apply-templates select="MASS"/>
    </xsl:template>
    <xsl:template match="MASS">
        <xsl:value-of select="../NAME"/>
        <xsl:value-of select="../DAY"/>
        <xsl:value-of select="."/>
    </xsl:template>
</xsl:stylesheet>

Default XSLT Rules

XSLT has some built-in, default rules that we've already seen in action. For example, the default rule for text nodes is to add the text in that node to the output document.

The most important default rule applies to elements and can be expressed like this:

<xsl:template match="/ | *">
    <xsl:apply-templates/>
</xsl:template>

This rule is simply there to make sure that every element, from the root on down, is processed with <xsl:apply-templates/> if you don't supply some other rule. If you do supply another rule, it overrides the corresponding default rule.

The default rule for text can be expressed like this, where, by default, the text of a text node is added to the output document:

<xsl:template match="text()">
    <xsl:value-of select="."/>
</xsl:template>

The same kind of default rule applies to attributes, which are added to the output document with a default rule like this:

<xsl:template match="@*">
    <xsl:value-of select="."/>
</xsl:template>

By default, processing instructions are not inserted in the output document, so their default rule can be expressed simply like this:

<xsl:template match="processing-instruction()"/>

The same goes for comments, whose default rule can be expressed this way:

<xsl:template match="comment()"/>

The upshot of the default rules is that if you don't supply any rules at all, all the parsed character data in the input document is inserted in the output document. Here's what an XSLT style sheet with no explicit rules looks like:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
</xsl:stylesheet>

Here are the results of applying this style sheet to planets.xml:

<?xml version="1.0" encoding="UTF-8"?>     Mercury     .0553     58.65     1516     .983     43.4     Venus     .815     116.75     3716     .943     66.8     Earth     1     1     2107     1     128.4

XSLT Rules and Internet Explorer

One of the problems of working with XSLT in Internet Explorer is that that browser doesn't supply any default rules. You have to supply all the rules yourself.

Altering Document Structure Based on Input

So far, the templates in this chapter have been fairly rigid skeletons, specifying exactly what should go into the output document in what order. But you can use XSLT elements such as <xsl:element>, <xsl:attribute>, <xsl:text>, and so on to create new nodes on the fly, based on what you find in the input document. I'll take a look at how this works now.

Creating Attribute Templates

Say that you wanted to convert the text in some elements to attributes in other elements. How could you do it? Attribute values must be quoted in XML, but you can't just use expressions like these, where I'm taking the values of <NAME>, <MASS>, and <DAY> elements and trying to make them into attribute values:

<xsl:template match="PLANET">
    <PLANET NAME="<xsl:value-of select='NAME'/>"
        MASS="<xsl:value-of select='MASS'/>"
        DAY="<xsl:value-of select='DAY'/>"
    />

This won't work because you can't use < inside attribute values, as I have here. Instead, you must use an attribute value template, which is an expression enclosed in curly braces, like {NAME}; here's the proper XSLT:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <HEAD>
            <TITLE>
                Planets
            </TITLE>
        </HEAD>
        <BODY>
            <xsl:apply-templates select="PLANET"/>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <PLANET NAME="{NAME}"
        MASS="{MASS}"
        DAY="{DAY}"
    />
</xsl:template>
</xsl:stylesheet>

Here's the resulting document; note that I've been able to convert the values in various elements to attributes:

<HTML> <HEAD> <TITLE>                 Planets             </TITLE> </HEAD> <BODY> <PLANET DAY="58.65" MASS=".0553" NAME="Mercury"> </PLANET> <PLANET DAY="116.75" MASS=".815" NAME="Venus"> </PLANET> <PLANET DAY="1" MASS="1" NAME="Earth"> </PLANET> </BODY> </HTML>

You can even include multiple expressions in curly braces, like this, where I'm adding the units for mass and day measurements from the UNITS attribute in the original elements:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <HEAD>
            <TITLE>
                Planets
            </TITLE>
        </HEAD>
        <BODY>
            <xsl:apply-templates select="PLANET"/>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <PLANET NAME="{NAME}"
        MASS="{MASS} {MASS/@UNITS}"
        DAY="{DAY} {DAY/@UNITS}"
    />
</xsl:template>
</xsl:stylesheet>

Creating New Elements

You can create new elements with the <xsl:element> element. For example, say that I store the name of planets in a NAME attribute instead of a <NAME> element in planets.xml, like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="planets.xsl"?>
<PLANETS>
  <PLANET NAME="Mercury">
      <MASS UNITS="(Earth = 1)">.0553</MASS>
      <DAY UNITS="days">58.65</DAY>
      <RADIUS UNITS="miles">1516</RADIUS>
      <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
      <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
  </PLANET>
    .
    .
    .

I could create a new element using the name of the planet with <xsl:element>, supplying the name of the new element with the name attribute, and enclosing a <MASS> element this way:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <HEAD>
            <TITLE>
                Planets
            </TITLE>
        </HEAD>
        <BODY>
            <xsl:apply-templates select="PLANET"/>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <xsl:element name="{@NAME}">
        <MASS><xsl:value-of select="MASS"/></MASS>
    </xsl:element>
</xsl:template>
</xsl:stylesheet>

Here is the result, where I've created a new <Mercury> element:

<HTML> <HEAD> <TITLE>                 Planets             </TITLE> </HEAD> <BODY> <Mercury> <MASS>.0553</MASS> </Mercury>     .     .     .

In this way, you can create new elements and name them when the XSLT transformation takes place.

Creating New Attributes

Just as you can create new elements with <xsl:element> and set the element name and content under programmatic control, you can do the same for attributes using the <xsl:attribute> element.

Here's an example; in this case, I'm creating new <PLANET> elements with attributes corresponding to the various planet names, and values taken from the COLOR attribute in the original <PLANET> elements:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <HEAD>
            <TITLE>
                Planets
            </TITLE>
        </HEAD>
        <BODY>
            <xsl:apply-templates select="PLANET"/>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <PLANET>
        <xsl:attribute name="{NAME}">
            <xsl:value-of select="@COLOR"/>
        </xsl:attribute>
    </PLANET>
</xsl:template>
</xsl:stylesheet>

Here are the results; as you can see, I've created new attributes on the fly, using the names of the planets:

<HTML>
<HEAD>
<TITLE>
                Planets
            </TITLE>
</HEAD>
<BODY>
<PLANET Mercury="RED">
</PLANET>
<PLANET Venus="WHITE">
</PLANET>
<PLANET Earth="BLUE">
</PLANET>
</BODY>
</HTML>
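The attribute name doesn't have to be computed; often only the value is. As a minimal sketch, assuming the version of planets.xml that has <NAME> elements and COLOR attributes, the following template gives each output <PLANET> element fixed NAME and COLOR attributes and then adds the mass as element content:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <BODY>
            <xsl:apply-templates select="PLANET"/>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <PLANET>
        <!-- Attributes must be created before any child content is added -->
        <xsl:attribute name="NAME"><xsl:value-of select="NAME"/></xsl:attribute>
        <xsl:attribute name="COLOR"><xsl:value-of select="@COLOR"/></xsl:attribute>
        <!-- Content comes after all the xsl:attribute elements -->
        <xsl:value-of select="MASS"/>
    </PLANET>
</xsl:template>
</xsl:stylesheet>
```

The order matters here: adding an attribute after the element already has children is an error in XSLT 1.0, and a processor may either report it or silently drop the attribute.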

Generating Comments with xsl:comment

You can also create comments on the fly with the <xsl:comment> element. Here's an example; in this case, I'm creating comments that will replace <PLANET> elements, and I'll include the name of the planet in the text of the comment:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <HEAD>
            <TITLE>
                Planets
            </TITLE>
        </HEAD>
        <BODY>
            <xsl:apply-templates select="PLANET"/>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <xsl:comment>This was the <xsl:value-of select="NAME"/> element</xsl:comment>
</xsl:template>
</xsl:stylesheet>

Here's the result:

<HTML>
<HEAD>
<TITLE>
                Planets
            </TITLE>
</HEAD>
<BODY>
<!--This was the Mercury element-->
<!--This was the Venus element-->
<!--This was the Earth element-->
</BODY>
</HTML>

Generating Text with xsl:text

You can create text nodes with the <xsl:text> element, allowing you to do things such as replace whole elements with text on the fly. One reason to use <xsl:text> is to preserve whitespace, as in this example from earlier in the chapter, where I used <xsl:text> to insert spaces:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/PLANETS">
        <HTML>
            <HEAD>
                <TITLE>
                    The Planets Table
                </TITLE>
            </HEAD>
            <BODY>
                <H1>
                    The Planets Table
                </H1>
                <TABLE>
                    <TD>Name</TD>
                    <TD>Mass</TD>
                    <TD>Radius</TD>
                    <TD>Day</TD>
                    <xsl:apply-templates/>
                </TABLE>
            </BODY>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
       <TR>
        <TD><xsl:value-of select="NAME"/></TD>
        <TD><xsl:apply-templates select="MASS"/></TD>
        <TD><xsl:apply-templates select="RADIUS"/></TD>
        <TD><xsl:apply-templates select="DAY"/></TD>
      </TR>
   </xsl:template>
    <xsl:template match="MASS">
      <xsl:value-of select="."/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="@UNITS"/>
    </xsl:template>
    <xsl:template match="RADIUS">
      <xsl:value-of select="."/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="@UNITS"/>
    </xsl:template>
    <xsl:template match="DAY">
      <xsl:value-of select="."/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="@UNITS"/>
    </xsl:template>
</xsl:stylesheet>

Another reason to use <xsl:text> is when you want characters such as < and & to appear in your output document, not &lt; and &amp;. To do that, you set the <xsl:text> element's disable-output-escaping attribute to "yes":

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <HEAD>
            <TITLE>
                Planets
            </TITLE>
        </HEAD>
        <BODY>
            <xsl:apply-templates select="PLANET"/>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <xsl:text disable-output-escaping = "yes">
        &lt;PLANET&gt;
    </xsl:text>
</xsl:template>
</xsl:stylesheet>

Here is the result:

<HTML>
<HEAD>
<TITLE>
                Planets
            </TITLE>
</HEAD>
<BODY>
        <PLANET>
        <PLANET>
        <PLANET>
</BODY>
</HTML>

Copying Nodes

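You can use the <xsl:copy> element to copy nodes, specifying just what parts you want to copy. The default rule for elements is that only the text in the element is copied. However, you can change that with <xsl:copy>, which copies the current node, whether that is an element, a text node, an attribute, a processing instruction, or something else. The copy does not automatically include the node's attributes or children; what goes inside it is determined by what you put inside the <xsl:copy> element.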

Here's an example; in this case, I'll strip all comments, processing instructions, and attributes out of planets.xml, simply by copying only text and elements:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="* | text()">
    <xsl:copy>
        <xsl:apply-templates select="* | text()"/>
    </xsl:copy>
</xsl:template>
</xsl:stylesheet>

Here's the output of this transformation:

<?xml version="1.0" encoding="UTF-8"?>
<PLANETS>
  <PLANET>
    <NAME>Mercury</NAME>
    <MASS>.0553</MASS>
    <DAY>58.65</DAY>
    <RADIUS>1516</RADIUS>
    <DENSITY>.983</DENSITY>
    <DISTANCE>43.4</DISTANCE>
  </PLANET>
    .
    .
    .
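If you want to keep the attributes as well, a small variation should do it. This is a sketch along the same lines as the stylesheet above, with @* added to both the match pattern and the select expression so that attribute nodes are copied too; comments and processing instructions are still dropped because no template copies them:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Match attributes (@*) in addition to elements and text nodes -->
<xsl:template match="@* | * | text()">
    <xsl:copy>
        <!-- Copy the current node's attributes first, then its
             element and text children -->
        <xsl:apply-templates select="@* | * | text()"/>
    </xsl:copy>
</xsl:template>
</xsl:stylesheet>
```

With this version, an element such as <MASS UNITS="(Earth = 1)">.0553</MASS> keeps its UNITS attribute in the output.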

Sorting Elements

You can use the <xsl:sort> element to sort node sets. You use this element inside <xsl:apply-templates> (or <xsl:for-each>) and then use its select attribute to specify what to sort on. For example, here's how I sort the planets based on density:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="PLANETS">
        <HTML>
            <HEAD>
                <TITLE>
                    Planets
                </TITLE>
            </HEAD>
            <BODY>
                <H1>Planets sorted by density</H1>
                <TABLE>
                    <TD>Planet</TD>
                    <TD>Mass</TD>
                    <TD>Day</TD>
                    <TD>Density</TD>
                    <xsl:apply-templates>
                        <xsl:sort select="DENSITY"/>
                    </xsl:apply-templates>
                </TABLE>
            </BODY>
        </HTML>
    </xsl:template>
    <xsl:template match="PLANET">
        <TR>
            <TD><xsl:apply-templates select="NAME"/></TD>
            <TD><xsl:apply-templates select="MASS"/></TD>
            <TD><xsl:apply-templates select="DAY"/></TD>
            <TD><xsl:apply-templates select="DENSITY"/></TD>
        </TR>
    </xsl:template>
</xsl:stylesheet>

Here are the results of this transformation:

<HTML>
<HEAD>
<TITLE>
                    Planets
                </TITLE>
</HEAD>
<BODY>
<H1>Planets sorted by density</H1>
<TABLE>
<TD>Planet</TD><TD>Mass</TD><TD>Day</TD><TD>Density</TD>
<TR>
<TD>Venus</TD><TD>.815</TD><TD>116.75</TD><TD>.943</TD>
</TR>
<TR>
<TD>Mercury</TD><TD>.0553</TD><TD>58.65</TD><TD>.983</TD>
</TR>
<TR>
<TD>Earth</TD><TD>1</TD><TD>1</TD><TD>1</TD>
</TR>
</TABLE>
</BODY>
</HTML>

You can see this HTML page in Figure 13.2.

Figure 13.2. Sorting elements.

Note that, by default, <xsl:sort> performs an alphabetic sort, which means that 10 will come before 2. You can perform a true numeric sort by setting the data-type attribute to "number", like this:

<xsl:sort data-type="number" select="DENSITY"/>

You can also create descending sorts by setting the <xsl:sort> element's order attribute to "descending".
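Putting the two attributes together, here is a short sketch of a numeric, descending sort on density. It assumes the same three-planet planets.xml used throughout this chapter; the simplified page layout is only for illustration:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <BODY>
            <xsl:apply-templates select="PLANET">
                <!-- Compare DENSITY values as numbers, largest first -->
                <xsl:sort select="DENSITY" data-type="number" order="descending"/>
            </xsl:apply-templates>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <P>
        <xsl:value-of select="NAME"/>
        <xsl:text>: </xsl:text>
        <xsl:value-of select="DENSITY"/>
    </P>
</xsl:template>
</xsl:stylesheet>
```

With the densities in planets.xml (1, .983, and .943), the planets come out in the order Earth, Mercury, Venus.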

Using xsl:if

You can make choices based on the input document using the <xsl:if> element. To use this element, you simply set its test attribute to an expression that evaluates to a Boolean value.

Here's an example. In this case, I'll list the planets one after the other and add an HTML horizontal rule element, <HR>, after the last planet, and only after the last one. I can do that with <xsl:if>, like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <HEAD>
            <TITLE>
                Planets
            </TITLE>
        </HEAD>
        <BODY>
            <xsl:apply-templates select="PLANET"/>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <P>
    <xsl:value-of select="NAME"/>
    is planet number <xsl:value-of select="position()"/> from the sun.
    </P>
    <xsl:if test="position() = last()"><xsl:element name="HR"/></xsl:if>
</xsl:template>
</xsl:stylesheet>

Here is the result; as you can see, the <HR> element appears only after the last planet has been listed:

<HTML>
<HEAD>
<TITLE>
                Planets
            </TITLE>
</HEAD>
<BODY>
<P>Mercury
    is planet number 1 from the sun.
    </P>
<P>Venus
    is planet number 2 from the sun.
    </P>
<P>Earth
    is planet number 3 from the sun.
    </P>
<HR>
</BODY>
</HTML>
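The opposite test is just as handy. As a small sketch, again assuming the three-planet planets.xml, the following stylesheet uses position() != last() to put a comma after every planet name except the final one:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <BODY>
            <P>
                <xsl:apply-templates select="PLANET"/>
            </P>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <xsl:value-of select="NAME"/>
    <!-- Add a separator after every planet except the last one -->
    <xsl:if test="position() != last()">, </xsl:if>
</xsl:template>
</xsl:stylesheet>
```

The paragraph in the output then reads Mercury, Venus, Earth, with no trailing comma.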

Using xsl:choose

The <xsl:choose> element is much like the Java switch statement in that it lets you pick one of several alternatives. The difference is that each alternative carries its own Boolean test, and the first test that is true is the one that is used. Suppose that we add COLOR attributes to each <PLANET> element in planets.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="planets.xsl"?>
<PLANETS>
  <PLANET COLOR="RED">
    <NAME>Mercury</NAME>
    <MASS UNITS="(Earth = 1)">.0553</MASS>
    <DAY UNITS="days">58.65</DAY>
    <RADIUS UNITS="miles">1516</RADIUS>
    <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
    <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
  </PLANET>
  <PLANET COLOR="WHITE">
    <NAME>Venus</NAME>
    <MASS UNITS="(Earth = 1)">.815</MASS>
    <DAY UNITS="days">116.75</DAY>
    <RADIUS UNITS="miles">3716</RADIUS>
    <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
    <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
  </PLANET>
  <PLANET COLOR="BLUE">
    <NAME>Earth</NAME>
    <MASS UNITS="(Earth = 1)">1</MASS>
    <DAY UNITS="days">1</DAY>
    <RADIUS UNITS="miles">2107</RADIUS>
    <DENSITY UNITS="(Earth = 1)">1</DENSITY>
    <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion-->
  </PLANET>
</PLANETS>

Now say that we want to display the names of the various planets, formatted in different ways using HTML <B>, <I>, and <U> tags, depending on the value of the COLOR attribute. I can do this with an <xsl:choose> element. Each case in the <xsl:choose> element is specified with an <xsl:when> element, and you specify the actual test for the case with the test attribute. Here's what it looks like:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <HEAD>
            <TITLE>
                Planets
            </TITLE>
        </HEAD>
        <BODY>
            <xsl:apply-templates select="PLANET"/>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <xsl:choose>
        <xsl:when test="@COLOR = 'RED'">
            <B>
                <xsl:value-of select="NAME"/>
            </B>
        </xsl:when>
        <xsl:when test="@COLOR = 'WHITE'">
            <I>
                <xsl:value-of select="NAME"/>
            </I>
        </xsl:when>
        <xsl:when test="@COLOR = 'BLUE'">
            <U>
                <xsl:value-of select="NAME"/>
            </U>
        </xsl:when>
        <xsl:otherwise>
             <PRE>
                 <xsl:value-of select="."/>
             </PRE>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>
</xsl:stylesheet>

Note also the <xsl:otherwise> element in this example, which acts the same way as the default: case in a switch statement; that is, if no other case matches, the <xsl:otherwise> element is applied. Here is the result of this XSLT:

<HTML>
<HEAD>
<TITLE>
                Planets
            </TITLE>
</HEAD>
<BODY>
<B>Mercury</B>
<I>Venus</I>
<U>Earth</U>
</BODY>
</HTML>
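Because each <xsl:when> has its own test, the tests don't have to be simple equality checks. Here is a hedged sketch that classifies planets by mass instead of color; the thresholds and the labels light, medium, and heavy are made up for illustration. Note that < has to be written as &lt; inside the test attribute, and that the first test that is true is the only one used:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="PLANETS">
    <HTML>
        <BODY>
            <xsl:apply-templates select="PLANET"/>
        </BODY>
    </HTML>
</xsl:template>
<xsl:template match="PLANET">
    <P>
        <xsl:value-of select="NAME"/>
        <xsl:text> is </xsl:text>
        <xsl:choose>
            <!-- Tests are tried in order; only the first true one is used -->
            <xsl:when test="MASS &lt; 0.5">light</xsl:when>
            <xsl:when test="MASS &lt; 1">medium</xsl:when>
            <!-- Used when none of the tests above is true -->
            <xsl:otherwise>heavy</xsl:otherwise>
        </xsl:choose>
    </P>
</xsl:template>
</xsl:stylesheet>
```

With the masses in planets.xml, Mercury (.0553) is reported as light, Venus (.815) as medium, and Earth (1) falls through to <xsl:otherwise>.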

Controlling Output Type

A lot of the examples in this chapter have converted XML into HTML, and you might have wondered how an XSLT processor knows to omit the <?xml?> declaration from the beginning of such output documents. It turns out that there's a special rule here: If the document element of the output document is <HTML> (in uppercase or lowercase), the XSLT processor knows that the output document type is HTML and writes the document accordingly.

In fact, you can specify three types of output documents:

  • XML. This is the default, and such documents start with an <?xml?> declaration. In addition, characters such as < and & in text are written to the output as the entity references &lt; and &amp;, not as the raw characters.

  • HTML. This is standard HTML 4.0, without an XML declaration or any need to close elements that don't normally have a closing tag in HTML 4.0. Empty elements can end with >, not />, and < and & characters inside <SCRIPT> and <STYLE> elements are not escaped with the corresponding character entity references, although they are still escaped in ordinary text.

  • Text. This type of output represents pure text. In this case, the output document is simply the plain text of the document tree.

You can set the output method by setting the <xsl:output> element's method attribute to "xml", "html", or "text". For example, if you want to create an HTML document, even though the root element is not <HTML>, you can use this <xsl:output> element:

<xsl:output method = "html"/>

Another useful attribute of <xsl:output> is the indent attribute, which allows the XSLT processor (but does not require it) to insert whitespace to indent the output. Here's how you can use this attribute:

<xsl:output indent = "yes"/>

This next table shows some <xsl:output> attributes that you can use to create or modify XML declarations:

Attribute               Description
encoding                Specifies the value for the XML declaration's encoding attribute.
omit-xml-declaration    Specifies whether the processor should omit the XML declaration. Set this to yes or no.
standalone              Specifies the value for the XML declaration's standalone attribute. Set this to yes or no.
version                 Specifies the value for the XML declaration's version attribute.
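As a sketch of how these attributes fit together, the following stylesheet copies its input unchanged and asks for an explicit XML declaration. The particular values are only examples, and the exact text a given processor writes may vary:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Request an XML declaration with these version, encoding,
     and standalone values -->
<xsl:output method="xml"
    version="1.0"
    encoding="UTF-8"
    standalone="yes"
    omit-xml-declaration="no"/>
<!-- Copy every node and attribute through unchanged -->
<xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
</xsl:template>
</xsl:stylesheet>
```

Changing omit-xml-declaration to "yes" should suppress the declaration altogether, which can be useful when the output is a fragment that will be included in another document.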

Another useful attribute of <xsl:output> is media-type, which enables you to specify the MIME type of the output document. Here's an example:

<xsl:output media-type="text/xml"/>

You can also use the <xsl:output> doctype-system and doctype-public attributes to specify an external DTD. For example, take a look at the following <xsl:output> element:

<xsl:output doctype-system = "planets.dtd"/>

It produces a <!DOCTYPE> declaration in the output document, like this:

<!DOCTYPE PLANETS SYSTEM "planets.dtd">
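The doctype-public attribute works the same way. Here is a minimal sketch that combines it with the html output method, using the public and system identifiers of the HTML 4.0 Transitional DTD; the page content is only for illustration:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Ask for HTML output with a DOCTYPE that names the
     HTML 4.0 Transitional DTD -->
<xsl:output method="html"
    doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN"
    doctype-system="http://www.w3.org/TR/REC-html40/loose.dtd"/>
<xsl:template match="PLANETS">
    <HTML>
        <HEAD>
            <TITLE>Planets</TITLE>
        </HEAD>
        <BODY>
            <xsl:for-each select="PLANET">
                <P><xsl:value-of select="NAME"/></P>
            </xsl:for-each>
        </BODY>
    </HTML>
</xsl:template>
</xsl:stylesheet>
```

The output should then begin with a <!DOCTYPE HTML PUBLIC ...> declaration carrying both identifiers, placed immediately before the <HTML> element.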

As you can see, there's a tremendous amount going on in XSL transformations. In fact, there's more than we can cover here. For plenty of additional details, take a look at the W3C XSLT specification at http://www.w3.org/TR/xslt, and the XPath specification at http://www.w3.org/TR/xpath.

There's more to XSL than XSL transformations. XSL also includes a whole formatting language, and I'm going to take a look at that in the next chapter.

Inside XML
Real World XML (2nd Edition)
ISBN: 0735712867
EAN: 2147483647
Year: 2005
Pages: 23
Authors: Steve Holzner
