9.2 Location Paths | XML in a Nutshell, Third Edition

The most useful XPath expression is a location path. A location path identifies a set of nodes in a document. This set may be empty, may contain a single node, or may contain several nodes. These can be element nodes, attribute nodes, namespace nodes, text nodes, comment nodes, processing-instruction nodes, root nodes, or any combination of these. A location path is built out of successive location steps. Each location step is evaluated relative to a particular node in the document called the context node.

9.2.1 The Root Location Path

The simplest location path is the one that selects the root node of the document. This is simply the forward slash, /. (You'll notice that a lot of XPath syntax is deliberately similar to the syntax used by the Unix shell. In Unix, / is the root of the filesystem; in XPath, / is the root node of an XML document.) For example, this XSLT template rule uses the XPath pattern / to match the entire input document tree and wrap it in an html element:

<xsl:template match="/">
  <html><xsl:apply-templates/></html>
</xsl:template>

/ is an absolute location path because no matter what the context node is (that is, no matter where the processor was in the input document when this template rule was applied), it always means the same thing: the root node of the document. It does depend on which document you're processing, but not on anything within that document.
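To see that / really is independent of the context node, here is a small sketch using the XPath 1.0 engine built into the JDK (javax.xml.xpath). The class name RootPathDemo and the trimmed-down people document are illustrative assumptions, not Example 9-1 as published. DOM represents the XPath root node as the Document object, so that is what comes back whichever node the evaluation starts from.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RootPathDemo {
    // Hypothetical, cut-down stand-in for Example 9-1.
    static final String XML = "<people><person born='1912' died='1954'>"
        + "<name><first_name>Alan</first_name><last_name>Turing</last_name></name>"
        + "</person></people>";

    /** Evaluates the absolute path "/" starting from whatever node contextPath selects. */
    static Node rootFrom(String contextPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath, doc, XPathConstants.NODE);
            // "/" ignores the context node and always yields the root node,
            // which DOM represents as the Document object.
            return (Node) xpath.evaluate("/", context, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String ctx : new String[] {"/people", "/people/person/name/last_name"}) {
            Node root = rootFrom(ctx);
            System.out.println(ctx + " -> " + root.getNodeName()); // "#document" each time
        }
    }
}
```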

9.2.2 Child Element Location Steps

The second-simplest location path is a single element name. This path selects all child elements of the context node with the specified name. For example, the XPath profession refers to all profession child elements of the context node. Exactly which elements these are depends on what the context node is, so this is a relative XPath. For example, if the context node is the Alan Turing person element in Example 9-1, then the location path profession refers to these three profession child elements of that element:

<profession>computer scientist</profession>
<profession>mathematician</profession>
<profession>cryptographer</profession>

However, if the context node is the Richard Feynman person element in Example 9-1, then the XPath profession refers to its single profession child element:

 <profession>physicist</profession>

If the context node is the name child element of Richard Feynman's or Alan Turing's person element, then this XPath selects nothing at all, because neither name element has any profession child elements.
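The following sketch, again using the JDK's javax.xml.xpath API, evaluates the relative path profession from the three context nodes just discussed. The sample document is a hypothetical reduction of Example 9-1 containing only the details quoted here, and the positional predicates [1] and [2] are merely a convenient way to pick each context node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildStepDemo {
    // Hypothetical stand-in for Example 9-1, limited to the elements quoted in this section.
    static final String XML = "<people>"
        + "<person born='1912' died='1954'>"
        + "<name><first_name>Alan</first_name><last_name>Turing</last_name></name>"
        + "<profession>computer scientist</profession>"
        + "<profession>mathematician</profession>"
        + "<profession>cryptographer</profession>"
        + "</person>"
        + "<person born='1918' died='1988'>"
        + "<name><first_name>Richard</first_name><middle_initial>P</middle_initial>"
        + "<last_name>Feynman</last_name></name>"
        + "<profession>physicist</profession>"
        + "</person>"
        + "</people>";

    /** Evaluates the relative path "profession" from the node that contextPath selects. */
    static List<String> professions(String contextPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // contextPath only positions us; the relative step below does the real work.
            Node context = (Node) xpath.evaluate(contextPath, doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate("profession", context, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(hits.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Alan Turing's person element: [computer scientist, mathematician, cryptographer]
        System.out.println(professions("/people/person[1]"));
        // Richard Feynman's person element: [physicist]
        System.out.println(professions("/people/person[2]"));
        // A name element has no profession children: []
        System.out.println(professions("/people/person[1]/name"));
    }
}
```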

In XSLT, the context node for an XPath expression used in the select attribute of xsl:apply-templates and similar elements is the node that is currently matched. For example, consider the simple stylesheet in Example 9-2. In particular, look at the template rule for the person element. The XSLT processor will activate this rule twice, once for each person node in the document. The first time the context node is set to Alan Turing's person element. The second time the context node is set to Richard Feynman's person element. When the same template is instantiated with a different context node, the XPath expression in <xsl:value-of select="name"/> refers to a different element, and the output produced is therefore different.

Example 9-2. A very simple stylesheet for Example 9-1

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="people">
    <xsl:apply-templates select="person"/>
  </xsl:template>
  <xsl:template match="person">
    <xsl:value-of select="name"/>
  </xsl:template>
</xsl:stylesheet>

When XPath is used in other systems, such as XPointer or XForms, other means are provided for determining what the context node is.
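You can watch the context node shift by running Example 9-2 through the XSLT 1.0 processor bundled with the JDK (javax.xml.transform). This is a sketch: the input is a hypothetical whitespace-free stand-in for Example 9-1, and ContextNodeDemo is an illustrative name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ContextNodeDemo {
    // Example 9-2, with only its layout changed.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='people'><xsl:apply-templates select='person'/></xsl:template>"
      + "<xsl:template match='person'><xsl:value-of select='name'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical, whitespace-free stand-in for Example 9-1.
    static final String XML = "<people>"
      + "<person><name><first_name>Alan</first_name><last_name>Turing</last_name></name></person>"
      + "<person><name><first_name>Richard</first_name><last_name>Feynman</last_name></name></person>"
      + "</people>";

    static String run() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The person rule fires once per person element; each time, "name" is
        // resolved against a different context node.
        System.out.println(run()); // AlanTuringRichardFeynman
    }
}
```

Because the sample has no whitespace between elements, each name's string value (the concatenation of its descendant text) comes out as AlanTuring and RichardFeynman; with an indented source document, its spaces and line breaks would appear in the output too.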

9.2.3 Attribute Location Steps

Attributes are also addressable by XPath. To select a particular attribute of an element, use an @ sign followed by the name of the attribute you want. For example, the XPath expression @born selects the born attribute of the context node. Example 9-3 is a simple XSLT stylesheet that generates an HTML table of names and birth and death dates from documents like Example 9-1.

Example 9-3. An XSLT stylesheet that uses root element, child element, and attribute location steps

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <xsl:apply-templates select="people"/>
    </html>
  </xsl:template>
  <xsl:template match="people">
    <table>
      <xsl:apply-templates select="person"/>
    </table>
  </xsl:template>
  <xsl:template match="person">
    <tr>
      <td><xsl:value-of select="name"/></td>
      <td><xsl:value-of select="@born"/></td>
      <td><xsl:value-of select="@died"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>

The stylesheet in Example 9-3 has three template rules. The first template rule has a match pattern that matches the root node, /. The XSLT processor activates this template rule and sets the context node to the root node. Then it outputs the start-tag <html>. This is followed by an xsl:apply-templates element that selects nodes matching the XPath expression people. If the input document is Example 9-1, then there is exactly one such node, the root people element. This is selected and its template rule, the one with the match pattern people, is applied. The XSLT processor sets the context node to the root people element and then begins processing the people template. It outputs a <table> start-tag and then encounters an xsl:apply-templates element that selects nodes matching the XPath expression person. Two child elements of this context node match the XPath expression person, so they're each processed in turn using the person template rule. When the XSLT processor begins processing each person element, it sets the context node to that element. It outputs the string value of that element's name child and the values of its born and died attributes, wrapped in a table row and three table cells. The net result is:

<html>
   <table>
      <tr>
         <td>
            Alan
            Turing
         </td>
         <td>1912</td>
         <td>1954</td>
      </tr>
      <tr>
         <td>
            Richard
            P
            Feynman
         </td>
         <td>1918</td>
         <td>1988</td>
      </tr>
   </table>
</html>
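The attribute steps in Example 9-3 can also be tried directly against the JDK's XPath engine. In this hypothetical sketch the name element is simplified to plain text, and valueOf is an illustrative helper that converts each result to a string with XPath's string() conversion, the same conversion xsl:value-of applies.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributeStepDemo {
    // Hypothetical data; name is plain text here instead of first_name/last_name children.
    static final String XML = "<people>"
        + "<person born='1912' died='1954'><name>Alan Turing</name></person>"
        + "<person born='1918' died='1988'><name>Richard P Feynman</name></person>"
        + "</people>";

    /** Evaluates expr from the node contextPath selects and returns its XPath string value. */
    static String valueOf(String contextPath, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath, doc, XPathConstants.NODE);
            // The two-argument evaluate applies XPath's string() conversion to the result.
            return xpath.evaluate(expr, context);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String person : new String[] {"/people/person[1]", "/people/person[2]"}) {
            System.out.println(valueOf(person, "name") + " | "
                + valueOf(person, "@born") + " | " + valueOf(person, "@died"));
        }
        // Without the @, "born" looks for a child element named born; there is none,
        // and the string value of an empty node-set is the empty string.
        System.out.println("[" + valueOf("/people/person[1]", "born") + "]"); // prints []
    }
}
```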

9.2.4 The comment( ), text( ), and processing-instruction( ) Location Steps

Although element, attribute, and root nodes account for 90% or more of what you need to do with XML documents, this still leaves four kinds of nodes that need to be addressed: namespace nodes, text nodes, processing-instruction nodes, and comment nodes. Namespace nodes are rarely handled explicitly. The other three node types have special node tests to match them. These are as follows:

comment( )
text( )
processing-instruction( )

Since comments and text nodes don't have names, the comment( ) and text( ) location steps match any comment or text node child of the context node. Each comment is a separate comment node. Each text node contains the maximum possible contiguous run of text not interrupted by any tag. Entity references and CDATA sections are resolved into text and markup and do not interrupt text nodes.
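The following hypothetical sketch checks these rules with the JDK's XPath engine. One wrinkle is specific to Java's DOM: by default the parser keeps a CDATA section as a separate DOM node, so the sketch turns on setCoalescing(true) to make the DOM line up with XPath's view that a CDATA section does not start a new text node. The sample strings and helper names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TextCommentDemo {
    // Hypothetical samples.
    static final String MIXED = "<p>one<b>two</b>three<!-- first --><!-- second --></p>";
    static final String MERGED = "<q>AT&amp;T <![CDATA[<rocks>]]></q>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // Fold CDATA sections into the adjacent DOM text so the DOM matches
            // XPath's model, in which CDATA does not start a new text node.
            f.setCoalescing(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static double number(String xml, String expr) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expr, parse(xml), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String string(String xml, String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, parse(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // <b> interrupts the text, so p has two text children: "one" and "three".
        System.out.println(number(MIXED, "count(/p/text())"));    // 2.0
        // Each comment is a separate comment node.
        System.out.println(number(MIXED, "count(/p/comment())")); // 2.0
        // The entity reference and the CDATA section stay inside one text node.
        System.out.println(number(MERGED, "count(/q/text())"));   // 1.0
        System.out.println(string(MERGED, "/q/text()"));          // AT&T <rocks>
    }
}
```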

By default, XSLT stylesheets do process text nodes but do not process comment nodes. You can add a comment template rule to an XSLT stylesheet so it will process comments too. For example, this template rule replaces each comment with the text "Comment Deleted" in italic:

<xsl:template match="comment( )">
  <i>Comment Deleted</i>
</xsl:template>
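A quick way to confirm the default behavior is to run the same input with and without the comment rule. This sketch uses the JDK's bundled XSLT processor; the input document and the transform helper are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CommentTemplateDemo {
    // Hypothetical input containing one comment.
    static final String XML = "<doc>Keep this<!-- drop this --></doc>";

    // The comment template rule quoted in the text.
    static final String COMMENT_RULE =
        "<xsl:template match='comment()'><i>Comment Deleted</i></xsl:template>";

    /** Runs a stylesheet containing only the given template rules (possibly none). */
    static String transform(String rules) {
        String xsl = "<xsl:stylesheet version='1.0' "
            + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" + rules + "</xsl:stylesheet>";
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Built-in rules only: the text is copied, the comment produces nothing.
        System.out.println(transform(""));           // Keep this
        // With the comment rule, the comment is replaced by the italic notice.
        System.out.println(transform(COMMENT_RULE)); // Keep this<i>Comment Deleted</i>
    }
}
```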

With no arguments, the processing-instruction( ) location step selects all processing-instruction children of the context node. If it has an argument, then it only selects the processing-instruction children with the specified target. For example, the XPath expression processing-instruction('xml-stylesheet') selects all processing-instruction children of the context node whose target is xml-stylesheet .
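Here is a sketch of both forms of the test, evaluated from the root node with the JDK's XPath engine. The document is hypothetical; its second processing instruction, with the made-up target my-app, exists only to show that the argument filters by target.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ProcessingInstructionDemo {
    // Hypothetical prolog: an xml-stylesheet PI and a second PI with a made-up target.
    static final String XML = "<?xml version='1.0'?>"
        + "<?xml-stylesheet type='text/xsl' href='people.xsl'?>"
        + "<?my-app refresh?>"
        + "<people/>";

    /** Evaluates expr with the root node (the DOM Document) as the context node. */
    static String eval(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No argument: every processing-instruction child of the root (the XML declaration is not one).
        System.out.println(eval("count(processing-instruction())"));                  // 2
        // With an argument: only PIs whose target matches.
        System.out.println(eval("count(processing-instruction('xml-stylesheet'))"));  // 1
        // name() of a PI is its target; its string value is the data after the target.
        System.out.println(eval("name(processing-instruction('xml-stylesheet'))"));   // xml-stylesheet
        System.out.println(eval("string(processing-instruction('xml-stylesheet'))"));
    }
}
```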

9.2.5 Wildcards

Wildcards match different element and node types at the same time. There are three wildcards: * , node( ) , and @* .

The asterisk (*) matches any element node regardless of name. For example, this XSLT template rule says that all elements should have their child elements processed but should not result in any output in and of themselves:

 <xsl:template match="*"><xsl:apply-templates select="*"/></xsl:template>

The * does not match attributes, text nodes, comments, or processing-instruction nodes. Thus, in the previous example, output will only come from child elements that have their own template rules that override this one.
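The following sketch runs the * rule above together with one hypothetical rule for profession, using the JDK's bundled XSLT processor. The text, comment, and born attribute inside person never reach the output, because select="*" never selects them; only profession, which has a rule of its own, contributes anything. The input and class name are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ElementWildcardDemo {
    // The * rule from the text plus a hypothetical rule for profession.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='*'><xsl:apply-templates select='*'/></xsl:template>"
      + "<xsl:template match='profession'>[<xsl:value-of select='text()'/>]</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical input mixing text, a comment, an attribute, and child elements.
    static final String XML = "<person born='1912'>Alan Turing<!-- note -->"
      + "<name><first_name>Alan</first_name></name>"
      + "<profession>computer scientist</profession>"
      + "<profession>mathematician</profession>"
      + "</person>";

    static String run() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // person, name, and first_name hit the * rule and emit nothing themselves.
        System.out.println(run()); // [computer scientist][mathematician]
    }
}
```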

You can put a namespace prefix in front of the asterisk. In this case, only elements in the namespace that the prefix is mapped to are matched. For example, svg:* matches all elements whose namespace URI is the one the svg prefix is mapped to. As usual, it's the URI that matters, not the prefix. The prefix can be different in the stylesheet and the source document as long as the namespace URI is the same.
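To check that only the namespace URI matters, this sketch binds the SVG namespace URI to an arbitrary prefix, pic, in the query, while the source document uses the svg prefix on one element and a default namespace declaration on another. The NamespaceContext implementation is the minimal one the JDK's XPath API needs; the document and the prefix pic are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceWildcardDemo {
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    // Hypothetical document: SVG elements under the "svg" prefix and under a default namespace.
    static final String XML = "<page xmlns:svg='" + SVG_NS + "'>"
        + "<svg:svg/>"
        + "<p>not SVG</p>"
        + "<g xmlns='" + SVG_NS + "'/>"
        + "</page>";

    // The query binds the same URI to a different, arbitrary prefix: "pic".
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "pic".equals(prefix) ? SVG_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return SVG_NS.equals(uri) ? "pic" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return SVG_NS.equals(uri)
                ? Collections.singletonList("pic").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    /** Local names of the children of page that the step pic:* selects. */
    static List<String> svgChildren() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // without this, XPath cannot see namespace URIs
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            NodeList hits = (NodeList) xpath.evaluate("/page/pic:*", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                names.add(hits.item(i).getLocalName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(svgChildren()); // [svg, g]: p is in no namespace, so it is skipped
    }
}
```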

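The node( ) wildcard matches not only all element types but also the root node, text nodes, processing-instruction nodes, namespace nodes, attribute nodes, and comment nodes. Bear in mind, though, that a bare node( ) step looks only at the children of the context node. Attribute, namespace, and root nodes are never children of another node, so to reach attributes with this test you write @node( ) instead.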
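A small sketch makes the difference concrete. The hypothetical person element below has two attributes and one child of each kind; counting with the JDK's XPath engine shows which nodes node( ), *, and the attribute form @node( ) each select from it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeTestDemo {
    // Hypothetical element: two attributes plus a text, comment, PI, and element child.
    static final String XML =
        "<person born='1912' died='1954'>Alan<!-- a comment --><?app data?><name/></person>";

    /** Counts the nodes expr selects with the person element as the context node. */
    static double count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node person = (Node) xpath.evaluate("/person", doc, XPathConstants.NODE);
            return (Double) xpath.evaluate("count(" + expr + ")", person, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("node()"));  // 4.0: text, comment, PI, and the name element
        System.out.println(count("*"));       // 1.0: the name element only
        System.out.println(count("@node()")); // 2.0: born and died, via the attribute axis
    }
}
```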

The @* wildcard matches all attribute nodes. For example, this XSLT template rule copies the values of all attributes of a person element in the document into the content of an attributes element in the output:

<xsl:template match="person">
  <attributes><xsl:apply-templates select="@*"/></attributes>
</xsl:template>
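With the JDK's XPath engine you can list exactly which nodes @* selects. The sketch below uses a hypothetical person element and collects the attributes into a map sorted by name, because XPath does not define an order for attributes.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AttributeWildcardDemo {
    // Hypothetical person element.
    static final String XML = "<person born='1912' died='1954'><name>Alan Turing</name></person>";

    /** Maps the name of each attribute node that @* selects to its value. */
    static Map<String, String> attributes() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node person = (Node) xpath.evaluate("/person", doc, XPathConstants.NODE);
            NodeList attrs = (NodeList) xpath.evaluate("@*", person, XPathConstants.NODESET);
            // XPath leaves attribute order unspecified, so sort by name for stable output.
            Map<String, String> result = new TreeMap<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                result.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attributes()); // {born=1912, died=1954}
    }
}
```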

As with elements, you can attach a namespace prefix to the wildcard to match attributes in a specific namespace. For instance, @xlink:* matches all XLink attributes provided that the prefix xlink is mapped to the http://www.w3.org/1999/xlink URI. Again, it's the URI that matters, not the actual prefix.
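The same NamespaceContext technique works for attributes. In this hypothetical sketch the source document binds the XLink URI to the prefix xl while the query uses xlink; the href value and the extra title attribute are made up for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XLinkAttributeDemo {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Hypothetical element; note the source uses the prefix "xl", not "xlink".
    static final String XML = "<homepage xmlns:xl='" + XLINK_NS + "' "
        + "xl:type='simple' xl:href='http://example.com/' title='home'/>";

    // The query maps the prefix "xlink" to the XLink namespace URI.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "xlink".equals(prefix) ? XLINK_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return XLINK_NS.equals(uri) ? "xlink" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return XLINK_NS.equals(uri)
                ? Collections.singletonList("xlink").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    /** Sorted local names of the attributes that @xlink:* selects on homepage. */
    static Set<String> xlinkAttributes() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            NodeList hits = (NodeList) xpath.evaluate("/homepage/@xlink:*", doc, XPathConstants.NODESET);
            Set<String> names = new TreeSet<>();
            for (int i = 0; i < hits.getLength(); i++) {
                names.add(hits.item(i).getLocalName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(xlinkAttributes()); // [href, type]: title is in no namespace
    }
}
```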

9.2.6 Multiple Matches with |

You often want to match more than one type of element or attribute but not all types. For example, you may want an XSLT template that applies to the profession and hobby elements but not to the name, person, or people elements. You can combine location paths and steps with the vertical bar (|) to indicate that you want to match any of the named elements. For instance, profession|hobby matches profession and hobby elements. first_name|middle_initial|last_name matches first_name, middle_initial, and last_name elements. @id|@xlink:type matches id and xlink:type attributes. *|@* matches elements and attributes but does not match text nodes, comment nodes, or processing-instruction nodes. For example, this XSLT template rule applies to all the nonempty leaf elements (elements that don't contain any other elements) of Example 9-1:

<xsl:template match="first_name|middle_initial|last_name|profession|hobby">
  <xsl:value-of select="text( )"/>
</xsl:template>
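Finally, a sketch that counts what a few of these unions select, using the JDK's XPath engine on a hypothetical version of Richard Feynman's person element (the hobby text is a placeholder). Because | builds a node-set, a node that both sides select is counted only once.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UnionDemo {
    // Hypothetical stand-in for part of Example 9-1; the hobby text is a placeholder.
    static final String XML = "<people><person born='1918' died='1988'>"
        + "<name><first_name>Richard</first_name><middle_initial>P</middle_initial>"
        + "<last_name>Feynman</last_name></name>"
        + "<profession>physicist</profession>"
        + "<hobby>(placeholder)</hobby>"
        + "</person></people>";

    /** Number of nodes expr selects with the node at contextPath as the context node. */
    static int count(String contextPath, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath, doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(expr, context, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/people/person/name", "first_name|middle_initial|last_name")); // 3
        System.out.println(count("/people/person", "profession|hobby"));                         // 2
        // name, profession, hobby, plus the born and died attributes.
        System.out.println(count("/people/person", "*|@*"));                                     // 5
        // The same node on both sides of | appears only once in the result.
        System.out.println(count("/people/person", "profession|profession"));                    // 1
    }
}
```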