Understanding XPath

To specify a node or set of nodes in XPath, you use a location path. A location path, in turn, consists of one or more location steps, separated by / or //. If you start the location path with /, the location path is called an absolute location path because you're specifying the path from the root node; otherwise, the location path is relative, starting with the current node, which is called the context node. Got all that? Good, because there's more.

A location step is made up of an axis, a node test, and zero or more predicates. For example, in the expression child::PLANET[position() = 5], child is the name of the axis, PLANET is the node test, and [position() = 5] is a predicate. You can create location paths with one or more location steps, as in /descendant::PLANET/child::NAME, which selects all the <NAME> elements that have a <PLANET> parent. The best way to understand all this is by example, and we'll see plenty of them in a few pages. In the meantime, I'll take a look at what kind of axes, node tests, and predicates XPath supports.
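To make the anatomy of a location step concrete, here's a rough Python sketch that pulls a step apart into its axis, node test, and predicates. The helper names split_step and split_path are my own inventions for illustration, and the regular expression handles only simple unabbreviated steps (no nested brackets, no slashes inside predicates, no // separator); a real XPath parser does far more.

```python
import re

# Hypothetical helpers for illustration only; not part of any XPath library.
# Handles simple unabbreviated steps: axis::nodetest[predicate][predicate]...
STEP = re.compile(
    r"^(?P<axis>[a-z-]+)::(?P<test>[^\[\]]+?)\s*(?P<preds>(?:\[[^\]]*\]\s*)*)$"
)

def split_step(step):
    """Split one location step into (axis, node test, [predicates])."""
    m = STEP.match(step.strip())
    if m is None:
        raise ValueError(f"not an unabbreviated location step: {step!r}")
    predicates = re.findall(r"\[([^\]]*)\]", m.group("preds"))
    return m.group("axis"), m.group("test"), [p.strip() for p in predicates]

def split_path(path):
    """Return (is_absolute, [steps]) for steps joined by single slashes."""
    absolute = path.startswith("/")
    return absolute, [split_step(s) for s in path.split("/") if s]

print(split_step("child::PLANET[position() = 5]"))
print(split_path("/descendant::PLANET/child::NAME"))
```

The leading / is what makes the second path absolute; without it, the same steps would be evaluated relative to the context node.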

XPath Axes

In the location path child::NAME , which refers to a <NAME> element that is a child of the current node, child is called the axis. XPath supports many different axes, and it's important to know what they are. Here's the list:

The ancestor axis holds the ancestors of the context node; the ancestors of the context node are the parent of the context node and the parent's parent and so forth, back to and including the root node.
The ancestor-or-self axis holds the context node and the ancestors of the context node.
The attribute axis holds the attributes of the context node.
The child axis holds the children of the context node.
The descendant axis holds the descendants of the context node. A descendant is a child or a child of a child, and so on.
The descendant-or-self axis contains the context node and the descendants of the context node.
The following axis holds all nodes in the same document as the context node that come after the context node in document order, excluding its descendants and any attribute or namespace nodes.
The following-sibling axis holds all the following siblings of the context node. A sibling is a node that has the same parent as the context node.
The namespace axis holds the namespace nodes of the context node.
The parent axis holds the parent of the context node.
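The preceding axis contains all nodes in the same document that come before the context node in document order, excluding its ancestors and any attribute or namespace nodes.
The preceding-sibling axis contains all the preceding siblings of the context node. A sibling is a node that has the same parent as the context node.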
The self axis contains the context node.
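If it helps to see a few of these axes as plain tree walks, here's a small Python sketch that emulates them with the standard xml.etree.ElementTree module. This is only an approximation under several assumptions: the sample document and its values are placeholders I made up, the function names are hypothetical, and ElementTree has no separate root node, so the ancestor walk stops at the document element instead of continuing to the XPath root.

```python
import xml.etree.ElementTree as ET

# Placeholder document; the element names follow the chapter, the values are made up.
DOC = """<PLANETS>
  <PLANET><NAME>Mercury</NAME><MASS>0.5</MASS><DAY>3</DAY></PLANET>
  <PLANET><NAME>Venus</NAME><MASS>1.5</MASS><DAY>2</DAY></PLANET>
  <PLANET><NAME>Earth</NAME><MASS>1.0</MASS><DAY>1</DAY></PLANET>
</PLANETS>"""

root = ET.fromstring(DOC)
# ElementTree elements don't know their parents, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """ancestor axis (elements only): parent, grandparent, and so on."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found

def descendants(node):
    """descendant axis: children, their children, and so on, in document order."""
    return list(node.iter())[1:]  # iter() yields the node itself first

def following_siblings(node):
    """following-sibling axis: later children of the same parent."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]

def preceding_siblings(node):
    """preceding-sibling axis: earlier children of the same parent, nearest first."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(node)][::-1]

venus = root[1]
mass = venus.find("MASS")
print([e.tag for e in ancestors(mass)])
print([e.tag for e in following_siblings(mass)])
print([e.tag for e in preceding_siblings(mass)])
```

The preceding-sibling helper returns the nearest sibling first because that axis runs backward from the context node, which is also how position() counts along it.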

You can use axes to specify a location step or path, as in this example. I'm using the child axis to indicate that I want to match to child nodes of the context node, which is a <PLANET> element (we'll see later that there's an abbreviated version that lets you omit the child:: part):

 <xsl:template match="PLANET">
     <HTML>
         <CENTER>
             <xsl:value-of select="child::NAME"/>
         </CENTER>
         <CENTER>
             <xsl:value-of select="child::MASS"/>
         </CENTER>
         <CENTER>
             <xsl:value-of select="child::DAY"/>
         </CENTER>
     </HTML>
 </xsl:template>

In these expressions, child is the axis, and the element names NAME, MASS, and DAY are node tests.

XPath Node Tests

You can use names of nodes as node tests, or you can use the wildcard * to select element nodes. For example, the expression child::*/child::NAME selects all <NAME> elements that are grandchildren of the context node. Besides node names and the wildcard character, you can use these node tests:

The comment() node test selects comment nodes.
The node() node test selects any type of node.
The processing-instruction() node test selects a processing instruction node. You can specify the name of the processing instruction to select in the parentheses.
The text() node test selects a text node.

XPath Predicates

The predicate part of an XPath step is perhaps its most intriguing part because it gives you the most power. You can work with all kinds of expressions in predicates; here are the possible types:

Node sets
Booleans
Numbers
Strings
Result tree fragments

I'll take a look at these various types in turn.

XPath Node Sets

As its name implies, a node set is simply a set of nodes. An expression such as child::PLANET returns a node set of all <PLANET> elements. The expression child::PLANET/child::NAME returns a node set of all <NAME> elements that are children of <PLANET> elements. To select a node or nodes from a node set, you can use various functions that work on node sets in predicates. Here are those functions:

last() Returns the number of nodes in the context node set (the context size).
position() Returns the position of the context node in the context node set (starting with 1).
count(node-set) Returns the number of nodes in the node set you pass it. The argument is required.
id(string ID) Returns a node set containing the element whose ID matches the string passed to the function, or an empty node set if no element has the specified ID. You can list multiple IDs separated by whitespace, and this function will return a node set of the elements with those IDs.
local-name(node-set) Returns the local name of the first node in the node set. Omitting node-set makes this function use the context node.
namespace-uri(node-set) Returns the URI of the namespace of the first node in the node set. Omitting node-set makes this function use the context node.
name(node-set) Returns the full, qualified name of the first node in the node set. Omitting node-set makes this function use the context node.

Here's an example. In this case, I'll number the elements in the output document using the position() function:

Listing ch13_14.xsl

 <?xml version="1.0"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:template match="PLANETS">
         <HTML>
             <HEAD>
                 <TITLE>
                     The Planets
                 </TITLE>
             </HEAD>
             <BODY>
                 <xsl:apply-templates select="PLANET"/>
             </BODY>
         </HTML>
     </xsl:template>
     <xsl:template match="PLANET">
         <P>
             <xsl:value-of select="position()"/>.
             <xsl:value-of select="NAME"/>
         </P>
     </xsl:template>
 </xsl:stylesheet>

Here's the result, where you can see that the planets are numbered:

 <HTML>
 <HEAD>
 <TITLE>                     The Planets                 </TITLE>
 </HEAD>
 <BODY>
 <P>1.             Mercury</P>
 <P>2.             Venus</P>
 <P>3.             Earth</P>
 </BODY>
 </HTML>

You can use functions that operate on node sets in predicates, as in child::PLANET[position() = last()], which selects the last <PLANET> child of the context node.
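For readers who want to experiment outside an XSLT processor, here's a hedged Python sketch of the same two ideas. Python's standard xml.etree.ElementTree module understands only a small subset of abbreviated XPath, but that subset includes [last()] and numeric positions. The sample document is a placeholder of my own, and enumerate() stands in for position() here; it is not an XPath function.

```python
import xml.etree.ElementTree as ET

# Placeholder document using the chapter's element names.
DOC = """<PLANETS>
  <PLANET><NAME>Mercury</NAME></PLANET>
  <PLANET><NAME>Venus</NAME></PLANET>
  <PLANET><NAME>Earth</NAME></PLANET>
</PLANETS>"""

root = ET.fromstring(DOC)  # the context node is the <PLANETS> element

# Number the planets the way position() does, starting at 1.
numbered = [
    f"{pos}. {planet.findtext('NAME')}"
    for pos, planet in enumerate(root.findall("PLANET"), start=1)
]

# PLANET[last()] is the abbreviated form of child::PLANET[position() = last()].
last_planet = root.findall("PLANET[last()]")
first_planet = root.findall("PLANET[1]")

print(numbered)
print(last_planet[0].findtext("NAME"))
```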

XPath Booleans

You can also use Boolean values in XPath expressions. Numbers are considered false if they're zero or NaN (not a number); otherwise, they're considered true. An empty string, "", is also considered false; all other strings are considered true. A node set is considered true if it contains at least one node.
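The conversion rules are easy to get wrong, so here's a small Python sketch of how XPath 1.0 turns other values into Booleans. The function name xpath_boolean is hypothetical, and I'm using a Python list as a stand-in for a node set; besides the zero and empty-string rules above, the sketch also applies the rules that NaN is false and that a node set is true only when it is non-empty.

```python
import math

def xpath_boolean(value):
    """Sketch of the XPath 1.0 boolean() conversion, using Python stand-ins."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Numbers: false for zero (including negative zero) and NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        # Strings: false only when empty.
        return len(value) > 0
    if isinstance(value, (list, tuple)):
        # Stand-in for a node set: true when it holds at least one node.
        return len(value) > 0
    raise TypeError(f"no XPath boolean conversion for {type(value).__name__}")

print(xpath_boolean("false"))       # a non-empty string, so this is true
print(xpath_boolean(float("nan")))
```

Note the first call: the string "false" is not empty, so it converts to true. That trips up a lot of people.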

You can use XPath logical operators to produce Boolean true/false results; here are the logical operators:

!= means "is not equal to."
< means "is less than." (Use &lt; in XML documents.)
<= means "is less than or equal to." (Use &lt;= in XML documents.)
= means "is equal to." (C, C++, Java, and JavaScript programmers take note: This operator is one = sign, not two.)
> means "is greater than."
>= means "is greater than or equal to."

Use &lt; and &gt;

Note in particular that you shouldn't use < directly in XML documents; you should use the entity reference &lt; instead. Some processors require you to use &gt; for > as well.

You can also use the keywords and and or to connect Boolean clauses with a logical And or Or operation, as we've seen when working with JavaScript and Java.

Here's an example using the logical operator >. This rule applies to all <PLANET> elements after position 5:

 <xsl:template match="PLANET[position() > 5]">
     <xsl:value-of select="."/>
 </xsl:template>

There is also a true() function that always returns a value of true, and a false() function that always returns a value of false.

You can also use the not() function to reverse the logical sense of an expression, as in this case, where I'm selecting all but the last <PLANET> element:

 <xsl:template match="PLANET[not(position() = last())]">
     <xsl:value-of select="."/>
 </xsl:template>

Finally, the lang() function returns true or false depending on whether the language of the context node (which is given by xml:lang attributes) is the same as the language you pass to this function.

XPath Numbers

In XPath, numbers are actually stored in double floating-point format (see Chapter 10, "Understanding Java," for more details on doubles; technically, all XPath numbers are stored in 64-bit IEEE 754 floating-point double format). All numbers are stored as doubles, even integers such as 5, as in the example we just saw:

 <xsl:template match="PLANET[position() > 5]">
     <xsl:value-of select="."/>
 </xsl:template>

You can use several operators on numbers:

Operator	Function
`+`	Addition.
`-`	Subtraction.
`*`	Multiplication.
`div`	Division. (The / character that stands for division in other languages is already used heavily in XML and XPath.)
`mod`	Returns the modulus of two numbers (the remainder after dividing the first by the second).

For example, the element <xsl:value-of select="180 + 420"/> inserts the string "600" into the output document. This example selects all planets whose day (measured in Earth days) divided by its mass (where the mass of the Earth equals 1) is greater than 100:

 <xsl:template match="PLANETS">
     <HTML>
         <BODY>
             <xsl:apply-templates select="PLANET[DAY div MASS > 100]"/>
         </BODY>
     </HTML>
 </xsl:template>
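Because every XPath number is an IEEE 754 double, div and mod behave a little differently from what you might expect coming from other languages. Here's a rough Python sketch of those semantics as I understand them from the XPath 1.0 Recommendation; the function names xpath_div and xpath_mod are hypothetical, and Python floats stand in for XPath numbers.

```python
import math

def xpath_div(a, b):
    """Sketch of XPath div: floating-point division that never raises."""
    a, b = float(a), float(b)
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        # Dividing a nonzero number by zero gives a signed infinity.
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b

def xpath_mod(a, b):
    """Sketch of XPath mod: truncating remainder, sign follows the first operand."""
    a, b = float(a), float(b)
    if b == 0 or math.isinf(a) or math.isnan(a) or math.isnan(b):
        return math.nan
    return math.fmod(a, b)

print(xpath_div(1, 0), xpath_div(0, 0))
print(xpath_mod(5, -2), xpath_mod(-5, 2))
print(-5 % 2)  # Python's own % operator follows the sign of the divisor instead
```

So a predicate such as PLANET[DAY div MASS > 100] never fails with a division-by-zero error; a zero mass simply produces Infinity or NaN, and the comparison is carried out on that value.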

XPath also supports these functions that operate on numbers:

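ceiling() Returns the smallest integer that is not less than the number you pass it
floor() Returns the largest integer that is not greater than the number you pass it
round() Rounds the number you pass it to the nearest integer; a value exactly halfway between two integers rounds toward positive infinity
sum() Returns the sum of the numeric values of the nodes in the node set you pass it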
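The rounding functions have a couple of edge cases worth checking by hand. Here's a hedged Python sketch: math.ceil and math.floor behave like ceiling() and floor() for ordinary values, but Python's built-in round() breaks ties toward the even neighbor, whereas XPath's round() sends ties toward positive infinity. The xpath_round function below is my own hypothetical stand-in, and it ignores the negative-zero result that XPath specifies for values between -0.5 and 0.

```python
import math

def xpath_round(x):
    """Sketch of XPath 1.0 round(): nearest integer, ties toward positive infinity."""
    x = float(x)
    if math.isnan(x) or math.isinf(x):
        return x
    lower = math.floor(x)
    return float(lower + 1 if x - lower >= 0.5 else lower)

print(xpath_round(2.5), xpath_round(-2.5))  # ties go up: 3.0 and -2.0
print(round(2.5))                           # Python's round() gives 2
print(math.ceil(5.0), math.floor(5.0))      # an integer is its own ceiling and floor
```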

For example, here's how you can find the average mass of the planets in ch13_01.xml:

 <xsl:template match="PLANETS">
     <HTML>
         <BODY>
             The average planetary mass is:
             <xsl:value-of select="sum(child::PLANET/child::MASS) div count(child::PLANET/child::MASS)"/>
         </BODY>
     </HTML>
 </xsl:template>
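To check an average like this outside a stylesheet, here's a short Python sketch using the standard xml.etree.ElementTree module. I'm assuming the structure used by the earlier templates, in which each <MASS> element is a child of a <PLANET> element, so from the <PLANETS> element the path has to pass through PLANET. The masses in the sample document are placeholders, not real data.

```python
import xml.etree.ElementTree as ET

# Placeholder document; the MASS values are made up for easy arithmetic.
DOC = """<PLANETS>
  <PLANET><NAME>Mercury</NAME><MASS>0.5</MASS></PLANET>
  <PLANET><NAME>Venus</NAME><MASS>1.5</MASS></PLANET>
  <PLANET><NAME>Earth</NAME><MASS>1.0</MASS></PLANET>
</PLANETS>"""

root = ET.fromstring(DOC)  # the context node is the <PLANETS> element

# PLANET/MASS is the abbreviated form of child::PLANET/child::MASS.
masses = [float(m.text) for m in root.findall("PLANET/MASS")]
average = sum(masses) / len(masses)

# <PLANETS> itself has no <MASS> children, so this node set is empty.
direct_masses = root.findall("MASS")

print(average)
print(len(direct_masses))
```

In XPath, taking sum() and count() of an empty node set gives 0 for both, and 0 div 0 is NaN, so an average computed over the wrong node set shows up as NaN in the output.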

XPath Strings

In XPath, strings are made up of Unicode characters. A number of functions are specially designed to work on strings:

starts-with(string string1, string string2) Returns true if the first string starts with the second string
contains(string string1, string string2) Returns true if the first string contains the second string
substring(string string1, number offset, number length) Returns length characters from the string, starting at offset (the first character is at position 1; if you omit length, the rest of the string is returned)
substring-before(string string1, string string2) Returns the part of string1 up to the first occurrence of string2
substring-after(string string1, string string2) Returns the part of string1 after the first occurrence of string2
string-length(string string1) Returns the number of characters in string1
normalize-space(string string1) Returns string1 after leading and trailing whitespace is stripped and multiple consecutive whitespace characters are replaced with a single space
translate(string string1, string string2, string string3) Returns string1 with all occurrences of the characters in string2 replaced by the matching characters in string3
concat(string string1, string string2, ...) Returns all strings concatenated (that is, joined) together
format-number(number number1, string string2, string string3) Returns a string holding the formatted string version of number1, using string2 as a formatting pattern (create formatting patterns as you would for Java's java.text.DecimalFormat class) and string3 as the optional name of a decimal format declared with an <xsl:decimal-format> element. Strictly speaking, this function is defined by XSLT rather than by XPath.
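A few of these functions have behavior that's easier to see than to describe, so here's a Python sketch that imitates them. The function names are my own, the substring sketch assumes whole-number arguments (XPath itself rounds fractional ones), and the sample calls are modeled on the examples I recall from the XPath 1.0 Recommendation.

```python
import re

def substring(s, start, length=None):
    """Sketch of substring(): positions are 1-based; keeps start <= p < start + length."""
    end = len(s) + 1 if length is None else start + length
    return "".join(c for p, c in enumerate(s, start=1) if start <= p < end)

def substring_before(s, sep):
    """Part of s before the first occurrence of sep, or "" if sep is absent."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""

def substring_after(s, sep):
    """Part of s after the first occurrence of sep, or "" if sep is absent."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""

def normalize_space(s):
    """Strip leading/trailing XML whitespace and collapse inner runs to one space."""
    return " ".join(re.findall(r"[^ \t\r\n]+", s))

def translate(s, src, dst):
    """Replace each character of src with the character at the same index in dst.

    Characters of src with no counterpart in dst are deleted; if a character
    appears more than once in src, its first occurrence wins.
    """
    table = {}
    for i, ch in enumerate(src):
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return s.translate(table)

print(substring("12345", 2, 3))
print(substring_before("1999/04/01", "/"), substring_after("1999/04/01", "/"))
print(translate("--aaa--", "abc-", "ABC"))
```

The translate() call shows the deletion rule: the hyphen is the fourth character of the second string, the third string has only three characters, so every hyphen is removed.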

XPath Result Tree Fragments

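A result tree fragment is a piece of result-tree markup that is held in a variable instead of being written to the output; the type is defined by XSLT 1.0 rather than by XPath itself. You create a result tree fragment when you give an <xsl:variable> or <xsl:param> element content instead of a select attribute. (The document() function, by contrast, returns an ordinary node set.)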

You really can't do much with result tree fragments in XPath. Actually, you can do only two things: use the string() or boolean() functions to turn them into strings or Booleans.

XPath Examples

We've seen a lot of XPath in theory; how about some examples? Here are a number of location path examples. Note that XPath allows you to use and or or in predicates to apply logical tests using multiple patterns:

child::PLANET Returns the <PLANET> element children of the context node.
child::* Returns all element children ( * matches only elements) of the context node.
child::text() Returns all text node children of the context node.
child::node() Returns all the children of the context node, no matter what their node type is.
attribute::UNITS Returns the UNITS attribute of the context node.
descendant::PLANET Returns the <PLANET> element descendants of the context node.
ancestor::PLANET Returns all <PLANET> ancestors of the context node.
ancestor-or-self::PLANET Returns the <PLANET> ancestors of the context node. If the context node is a <PLANET> as well, also returns the context node.
descendant-or-self::PLANET Returns the <PLANET> element descendants of the context node. If the context node is a <PLANET> as well, also returns the context node.
self::PLANET Returns the context node if it is a <PLANET> element.
child::NAME/descendant::PLANET Returns the <PLANET> element descendants of the child <NAME> elements of the context node.
child::*/child::PLANET Returns all <PLANET> grandchildren of the context node.
/ Returns the document root (that is, the parent of the document element).
/descendant::PLANET Returns all the <PLANET> elements in the document.
/descendant::PLANET/child::NAME Returns all the <NAME> elements that have a <PLANET> parent.
child::PLANET[position() = 3] Returns the third <PLANET> child of the context node.
child::PLANET[position() = last()] Returns the last <PLANET> child of the context node.
/descendant::PLANET[position() = 3] Returns the third <PLANET> element in the document.
child::PLANETS/child::PLANET[position() = 4]/child::NAME[position() = 3] Returns the third <NAME> element of the fourth <PLANET> element of the <PLANETS> element.
child::PLANET[position() > 3] Returns all the <PLANET> children of the context node after the first three.
preceding-sibling::NAME[position() = 2] Returns the second previous <NAME> sibling element of the context node.
child::PLANET[attribute::COLOR = "RED"] Returns all <PLANET> children of the context node that have a COLOR attribute with value of "RED".
child::PLANET[attribute::COLOR = "RED"][position() = 3] Returns the third <PLANET> child of the context node that has a COLOR attribute with value of "RED".
child::PLANET[position() = 3][attribute::COLOR="RED"] Returns the third <PLANET> child of the context node only if that child has a COLOR attribute with value of "RED".
child::PLANET[child::NAME = "VENUS"] Returns the <PLANET> children of the context node that have <NAME> children whose text is "VENUS".
child::PLANET[child::NAME] Returns the <PLANET> children of the context node that have <NAME> children.
child::*[self::NAME or self::MASS] Returns both the <NAME> and <MASS> children of the context node.
child::*[self::NAME or self::MASS][position() = 1] Returns the first <NAME> or <MASS> child of the context node. (XPath has no first() function; you test for position 1 instead.)

As you can see, some of this syntax is pretty involved and a little lengthy to type. However, there is an abbreviated form of XPath syntax.

XPath Abbreviated Syntax

You can take advantage of a number of abbreviations in XPath syntax. Here are the rules:

self::node() can be abbreviated as .
parent::node() can be abbreviated as ..
child::childname can be abbreviated as childname
attribute::attrname can be abbreviated as @attrname
/descendant-or-self::node()/ can be abbreviated as //

You can also abbreviate predicate expressions, as in [position() = 3] as [3]. Using the abbreviated syntax makes XPath expressions a lot easier to use. Here are some examples of location paths using abbreviated syntax. Note how well these fit the syntax we saw with the match attribute earlier in the chapter:

Abbreviated Syntax	Description
`PLANET`	Returns the `<PLANET>` element children of the context node.
`*`	Returns all element children of the context node.
`text()`	Returns all text node children of the context node.
`@UNITS`	Returns the `UNITS` attribute of the context node.
`@*`	Returns all the attributes of the context node.
`PLANET[3]`	Returns the third `<PLANET>` child of the context node.
`PLANET[1]`	Returns the first `<PLANET>` child of the context node.
`*/PLANET`	Returns all `<PLANET>` grandchildren of the context node.
`/PLANETS/PLANET[3]/NAME[2]`	Returns the second `<NAME>` element of the third `<PLANET>` element of the `<PLANETS>` element.
`//PLANET`	Returns all the `<PLANET>` descendants of the document root.
`PLANETS//PLANET`	Returns the `<PLANET>` element descendants of the `<PLANETS>` element children of the context node.
`//PLANET/NAME`	Returns all the `<NAME>` elements that have a `<PLANET>` parent.
`.`	Returns the context node itself.
`.//PLANET`	Returns the `<PLANET>` element descendants of the context node.
`..`	Returns the parent of the context node.
`../@UNITS`	Returns the `UNITS` attribute of the parent of the context node.
`PLANET[NAME]`	Returns the `<PLANET>` children of the context node that have `<NAME>` children.
`PLANET[NAME="Venus"]`	Returns the `<PLANET>` children of the context node that have `<NAME>` children with text equal to `"Venus"`.
`PLANET[@UNITS = "days"]`	Returns all `<PLANET>` children of the context node that have a `UNITS` attribute with the value `"days"`.
`PLANET[6][@UNITS = "days"]`	Returns the sixth `<PLANET>` child of the context node, only if that child has a `UNITS` attribute with the value `"days"`. Reversing the predicates changes the meaning: `PLANET[@UNITS = "days"][6]` returns the sixth of the `<PLANET>` children that have a `UNITS` attribute with the value `"days"`.
`PLANET[@COLOR and @UNITS]`	Returns all the `<PLANET>` children of the context node that have both a `COLOR` attribute and a `UNITS` attribute.
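Several of these abbreviated paths can be tried directly in Python, because the standard xml.etree.ElementTree module accepts a subset of the abbreviated syntax in its findall() method. Treat the following as a sketch under assumptions: the sample document and its COLOR values are placeholders of my own, and as far as I know ElementTree supports attribute tests only inside predicates, so a path such as @UNITS on its own is not available there (you would read the attribute with the element's get() method instead).

```python
import xml.etree.ElementTree as ET

# Placeholder document; the COLOR values are made up for illustration.
DOC = """<PLANETS>
  <PLANET COLOR="GRAY"><NAME>Mercury</NAME><MASS>0.5</MASS></PLANET>
  <PLANET><NAME>Venus</NAME><MASS>1.5</MASS></PLANET>
  <PLANET COLOR="BLUE"><NAME>Earth</NAME><MASS>1.0</MASS></PLANET>
</PLANETS>"""

root = ET.fromstring(DOC)  # the context node is the <PLANETS> element

def names(planets):
    """Collect the <NAME> text of each <PLANET> in a result list."""
    return [p.findtext("NAME") for p in planets]

print(names(root.findall("PLANET")))                  # all <PLANET> children
print(len(root.findall("*")))                         # all element children
print(names(root.findall("PLANET[3]")))               # third <PLANET> child
print(names(root.findall("PLANET[NAME='Venus']")))    # by child text
print(names(root.findall("PLANET[@COLOR]")))          # has a COLOR attribute
print(names(root.findall("PLANET[@COLOR='BLUE']")))   # attribute value test
print([n.text for n in root.findall(".//NAME")])      # <NAME> descendants
print(root.find("PLANET[1]").get("COLOR"))            # attribute via get()
```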

Here's an example in which I put the abbreviated syntax to work, moving up and down inside a <PLANET> element:

Listing ch13_15.xsl

 <?xml version="1.0"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:template match="PLANETS">
         <HTML>
             <xsl:apply-templates select="PLANET"/>
         </HTML>
     </xsl:template>
     <xsl:template match="PLANET">
         <xsl:apply-templates select="MASS"/>
     </xsl:template>
     <xsl:template match="MASS">
         <xsl:value-of select="../NAME"/>
         <xsl:value-of select="../DAY"/>
         <xsl:value-of select="."/>
     </xsl:template>
 </xsl:stylesheet>