XPath Basics | Using XML with Legacy Business Applications

XPath is a language for identifying nodes in an XML document tree. XPath, unlike most of the other work that the W3C has done on XML, is not itself an XML-based language. However, its compact syntax makes it easy to use for things such as Attribute values within languages that are XML based, such as XSLT. XSLT primarily uses XPath in specifying values for the match and select Attributes of various Elements.

One of the fundamental concepts of XPath is the expression. An expression generally evaluates to a specific node or group of nodes (the node-set mentioned earlier) in a document tree. However, it can evaluate to other things, as we'll see later in this chapter.

The form of XPath expression most commonly used in XSLT is known as a location path. A simplified syntax is shown in the following BNF productions, based on Section 2 of the XPath Recommendation.

Basic Syntax for a Location Path Expression

 LocationPath ::= RelativePath | AbsolutePath
 AbsolutePath ::= '/' RelativePath?
 RelativePath ::= Step | RelativePath '/' Step

As you can see, the main difference between an absolute path and a relative path is that the absolute path starts at the document root with the '/' forward slash character. A relative path is always stated in relation to a context node. In XSLT, the context node is usually the same as, or derived from, the current node.

There are full and abbreviated forms of path names. I'll mention the full names here and there, but for the most part I prefer the abbreviated names since they are usually more readable.

Location paths in their simplest form closely resemble UNIX directory paths. Take the following little XML document as an example:

Location Path Example 1

 <?xml version="1.0" encoding="UTF-8"?>
 <LocationPathExample>
   <Grandfather Name="Bob">
     <Father Name="Junior">
       <Son Name="Trey"/>
     </Father>
   </Grandfather>
 </LocationPathExample>

There are several points to note about this example.

The /LocationPathExample expression points to the document's root Element, LocationPathExample. This is like having a directory named LocationPathExample directly under the root of a UNIX file system.
The /LocationPathExample/Grandfather expression points to Bob.
The /LocationPathExample/Grandfather/Father expression points to Bob's son, Junior.
The /LocationPathExample/Grandfather/Father/Son expression points to Junior's son, Trey (Robert III in some circles, Bubba in others).

These are all akin to typing cd from a UNIX shell command line to set your present working directory to a specific directory. Relative paths work in a similar fashion.

If the context node is Grandfather, we use the relative path Father to point to Junior. This is like typing:

 cd Father

instead of:

 cd /LocationPathExample/Grandfather/Father

In a similar fashion, if the context node is Father, we can use a relative path of Son rather than /LocationPathExample/Grandfather/Father/Son.
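The cd analogy can be tried out with Python's standard xml.etree.ElementTree module, which implements only a limited subset of XPath. The following is a sketch under one important assumption: ElementTree evaluates every path relative to the Element it is called on and does not accept a leading '/', so the absolute path /LocationPathExample/Grandfather/Father is written here as Grandfather/Father relative to the root Element. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Location Path Example 1 from the text (XML declaration omitted for brevity).
DOC = """<LocationPathExample>
  <Grandfather Name="Bob">
    <Father Name="Junior">
      <Son Name="Trey"/>
    </Father>
  </Grandfather>
</LocationPathExample>"""

root = ET.fromstring(DOC)  # the root Element, LocationPathExample

# Absolute paths, rewritten relative to the root Element (ElementTree limitation).
grandfather = root.find("Grandfather")        # /LocationPathExample/Grandfather
father = root.find("Grandfather/Father")      # /LocationPathExample/Grandfather/Father
son = root.find("Grandfather/Father/Son")     # /LocationPathExample/Grandfather/Father/Son

# Relative paths: the Element that find() is called on acts as the context node.
father_from_grandfather = grandfather.find("Father")  # like "cd Father"
son_from_father = father.find("Son")                  # like "cd Son"

print(grandfather.get("Name"), father.get("Name"), son.get("Name"))
print(father_from_grandfather.get("Name"), son_from_father.get("Name"))
```

A processor with full XPath support would accept the absolute forms exactly as they appear in the text.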

The real key to understanding location paths is to understand the location step used in the third production. A location step has the following simplified form, based on the syntax from Section 2 of the XPath Recommendation.

Basic Syntax for a Location Step

 Step ::= AxisSpecifier? NodeTest Predicate*
 AxisSpecifier ::= AxisName '::' | '@'?
 AxisName ::= A set of thirteen names such as "child", "ancestor", "attribute"
 NodeTest ::= NameTest | NodeType '(' ')'
 NameTest ::= '*' | NCName ':' '*' | QName
 NodeType ::= 'comment' | 'text' | 'processing-instruction' | 'node'
 Predicate ::= '[' Expr ']'
 Expr ::= a logical expression, a variable reference, a literal, a number, a function call, or another expression

Let's look at the three major components of a step and then see what happens when they are combined.

An axis specifier basically tells you which direction to go in the tree relative to where you are now, that is, the context node. It is similar to an X-Y-Z coordinate system in geometry but applied to a two-dimensional tree. Giving an XSLT processor an axis specifier in a location step is kind of like an air traffic controller telling a pilot on landing approach to make a 90-degree turn and descend at a 10-degree glide slope. An axis specifier can have a long name like descendant-or-self or take part in an abbreviation like //. The axis specifier that we'll use the most by far is child. It is so commonly used that it is the default axis. The axis specifier can in fact be omitted, and that is why, unlike the EBNF in the XPath Recommendation, I show the axis specifier as optional in the production above. Even though child is the most frequently used axis, rest assured that there are specifiers that let you move in a single step from the context node to any adjacent node in the tree. Once you get there, you can go to any of its adjacent nodes, and so on.
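To make the relationship between abbreviated and full step names concrete, here is a small Python sketch that spells out the axis for each step of a simple location path. The function names expand_step and expand_path are made up for this illustration. The sketch assumes a path with no predicates and no // abbreviation, and it rejects anything else instead of guessing.

```python
def expand_step(step: str) -> str:
    """Return the unabbreviated form of one predicate-free location step."""
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if step.startswith("@"):
        return "attribute::" + step[1:]
    if "::" in step:
        return step              # the axis is already spelled out
    return "child::" + step      # child is the default axis


def expand_path(path: str) -> str:
    """Expand every step of a simple location path.

    Assumption for this sketch: no '//' and no predicates.
    """
    if "//" in path or "[" in path:
        raise ValueError("this sketch handles only simple paths")
    prefix = "/" if path.startswith("/") else ""
    steps = [s for s in path.split("/") if s]
    return prefix + "/".join(expand_step(s) for s in steps)


print(expand_path("/LocationPathExample/Grandfather"))
print(expand_path("../Son"))
print(expand_path("@Name"))
```

The abbreviated forms on the input side are the ones used throughout the rest of this chapter.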

A node test lets us test for a type of node (not very common in our applications but certainly possible) or a node with a particular name (very common). The name can be a wildcard, a namespace prefix followed by a wildcard, a name with a namespace prefix, or an unqualified local name. In our applications we'll mostly use just a simple, unqualified Element or Attribute name. You can certainly include a namespace prefix, but if people design their documents in a user-friendly way you won't need namespace prefixes on Elements or Attributes in instance documents. So, you won't need them in location path expressions that find things in those instance documents. According to the BNF, the node test is really the only part of a location step that is required. However, later in this chapter we will see some special cases where it isn't used.

Expressions, used as the predicate of a location step, are perhaps the most complex part of XPath. They include such things as positional references along the current axis, various types of Boolean tests, and a built-in function library. The good news is that we don't have to use predicates very much in our types of applications. I will introduce you to a few of the most useful. The clever and creative will no doubt find uses for several others.

Working together in a location step, the axis sets a direction and the node test tells the processor what to look for in that direction. If the processor might find more than one node, and so return a node-set with several members, we can use a predicate to filter nodes out of the node-set and narrow it down to just the node or nodes we're interested in. When we string location steps together in a relative or absolute path the results are cumulative. A single step evaluates to one node-set; it is used as a set of context nodes for the next step that yields another set of nodes, and so on.

Let's look at a few examples to help solidify these concepts. We'll add another Father, Billy, to the location path example we just discussed.

Location Path Example 2

 <?xml version="1.0" encoding="UTF-8"?>
 <LocationPathExample>
   <Grandfather Name="Bob">
     <Father Name="Junior">
       <Son Name="Trey"/>
     </Father>
     <Father Name="Billy">
       <Son Name="Willy"/>
       <Son Name="Billy Joe"/>
       <Son Name="Billy Bob"/>
     </Father>
   </Grandfather>
 </LocationPathExample>

The simplest type of location step has only the implied child axis specifier and a node test. The absolute location path expression /LocationPathExample has one step following the initial slash (/). This path specifies the child Element named LocationPathExample of the document root.
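The cumulative effect of stringing such simple steps together can be worked through by hand on Location Path Example 2. The sketch below is not how any particular XSLT processor is implemented. It only mimics the idea that each step's node-set becomes the set of context nodes for the next step. The helper child_step is a made-up name, and it covers only the child axis with a name test.

```python
import xml.etree.ElementTree as ET

# Location Path Example 2 from the text (XML declaration omitted for brevity).
DOC = """<LocationPathExample>
  <Grandfather Name="Bob">
    <Father Name="Junior">
      <Son Name="Trey"/>
    </Father>
    <Father Name="Billy">
      <Son Name="Willy"/>
      <Son Name="Billy Joe"/>
      <Son Name="Billy Bob"/>
    </Father>
  </Grandfather>
</LocationPathExample>"""


def child_step(context_nodes, name):
    """Apply one child-axis step with a name test to every context node."""
    result = []
    for node in context_nodes:
        for child in node:          # iterating an Element yields its child Elements
            if child.tag == name:
                result.append(child)
    return result


root = ET.fromstring(DOC)
node_set = [root]                   # start with LocationPathExample as the only context node
history = []
for name in ["Grandfather", "Father", "Son"]:
    node_set = child_step(node_set, name)
    history.append([n.get("Name") for n in node_set])
    print(name, history[-1])
# Grandfather ['Bob']
# Father ['Junior', 'Billy']
# Son ['Trey', 'Willy', 'Billy Joe', 'Billy Bob']
```

The second step yields two Father nodes, and the third step is applied to both of them, which is why all four Sons appear in the final node-set.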

For a slightly more complex example, let's use an axis specifier other than child. We'll assume that our current context node is the Son Element with a Name Attribute of Willy. Using the axis specifier of parent with the abbreviated form "..", we create a two-step location path of "../Son". The first step of ".." has only the axis specifier, taking us to the parent Element Father with the Name Attribute of Billy. The next step specifies the Element children named Son along the implied child axis. This yields a node-set of the brothers Willy, Billy Joe, and Billy Bob.
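The two steps of "../Son" can be followed in Python's standard xml.etree.ElementTree module, with one caveat that is an artifact of that library and not of XPath: its Elements keep no pointer to their parent, so ".." cannot climb above the Element a search starts from. The sketch therefore builds a child-to-parent map (parent_of, a made-up helper) to play the role of the parent axis, and then shows the same navigation as a single path evaluated from the root Element.

```python
import xml.etree.ElementTree as ET

# Location Path Example 2 from the text (XML declaration omitted for brevity).
DOC = """<LocationPathExample>
  <Grandfather Name="Bob">
    <Father Name="Junior">
      <Son Name="Trey"/>
    </Father>
    <Father Name="Billy">
      <Son Name="Willy"/>
      <Son Name="Billy Joe"/>
      <Son Name="Billy Bob"/>
    </Father>
  </Grandfather>
</LocationPathExample>"""

root = ET.fromstring(DOC)

# Child-to-parent lookup, standing in for the parent axis.
parent_of = {child: parent for parent in root.iter() for child in parent}

willy = root.find(".//Son[@Name='Willy']")   # the assumed context node

# Step 1, "..": move along the parent axis to the Father named Billy.
father = parent_of[willy]
# Step 2, "Son": select the Son children along the implied child axis.
brothers = father.findall("Son")

# The same navigation written as one ElementTree path from the root Element.
same_brothers = root.findall(".//Son[@Name='Willy']/../Son")

print(father.get("Name"))                    # Billy
print([s.get("Name") for s in brothers])     # ['Willy', 'Billy Joe', 'Billy Bob']
```

Note that Willy himself is part of the result, since he is also a Son child of his own parent.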

To show how a predicate is used to filter nodes, let's assume that we didn't necessarily know which of the brothers was our current context node but that we wanted to point to Billy Bob. We add a predicate that tests for a Name Attribute with a value of "Billy Bob". So, adding this to the second location step of the previous example, we have a relative location path of "../Son[@Name='Billy Bob']".
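The filtering effect of the predicate can be seen by comparing the node-set with and without it. This is again a sketch using Python's standard xml.etree.ElementTree module, which supports attribute-value predicates of this form. Because that module cannot climb above the Element a search starts from, the sketch starts from the Father named Billy for the simple case, and from the root Element for the full "../Son[...]" path. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Location Path Example 2 from the text (XML declaration omitted for brevity).
DOC = """<LocationPathExample>
  <Grandfather Name="Bob">
    <Father Name="Junior">
      <Son Name="Trey"/>
    </Father>
    <Father Name="Billy">
      <Son Name="Willy"/>
      <Son Name="Billy Joe"/>
      <Son Name="Billy Bob"/>
    </Father>
  </Grandfather>
</LocationPathExample>"""

root = ET.fromstring(DOC)
billy = root.find("Grandfather/Father[@Name='Billy']")

# Without a predicate, the step returns all three brothers.
all_sons = billy.findall("Son")

# With the predicate, the node-set is narrowed to the one node we want.
only_billy_bob = billy.findall("Son[@Name='Billy Bob']")

# The path from the text, with Willy selected inside the path as the starting point.
via_parent = root.findall(".//Son[@Name='Willy']/../Son[@Name='Billy Bob']")

print([s.get("Name") for s in all_sons])         # ['Willy', 'Billy Joe', 'Billy Bob']
print([s.get("Name") for s in only_billy_bob])   # ['Billy Bob']
print([s.get("Name") for s in via_parent])       # ['Billy Bob']
```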

So, those are the basics of XPath. There are, of course, exceptions to the generalities presented here. Explaining them is one of the reasons those other books are so long. The XPath Recommendation is also very specific about explaining exceptions, so use it when you can't quite figure out why something isn't working as you thought it would.

Now that we understand the basics of XPath, let's look at how XSLT uses it. XSLT allows use of the full XPath expression syntax in the select Attribute of several XSLT Elements. A restricted form of expression known as a pattern is used in the match Attribute of xsl:template. A pattern is the same as a location path expression except that in the location steps the only axes that are allowed are child, attribute, and the // abbreviation for descendant-or-self. Two or more of these restricted location paths may be logically ORed together with the pipe symbol (|) to form a more complex pattern. In effect, the set of patterns that might be used in a particular document is a subset of the set of location path expressions that might be used.
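The axis restriction on patterns can be expressed as a rough check in Python. This is only a sketch of the rule as stated above, not a pattern parser: the function name looks_like_pattern is made up, predicates are stripped before checking (nested predicates are not handled), string literals containing | or / would confuse it, and anything else a real XSLT processor allows or rejects in patterns is ignored.

```python
import re

# Axes permitted in the location steps of a pattern, per the rule above.
ALLOWED_AXES = {"child", "attribute"}


def looks_like_pattern(expr: str) -> bool:
    """Rough check that a location path uses only axes allowed in a pattern."""
    # Drop (non-nested) predicates so their contents are not inspected.
    stripped = re.sub(r"\[[^\]]*\]", "", expr)
    for alternative in stripped.split("|"):      # alternatives ORed with the pipe
        path = alternative.strip()
        if not path:
            return False
        for step in path.split("/"):             # '//' simply yields an empty step
            if step in (".", ".."):
                return False                     # abbreviated self and parent steps
            axis, sep, _ = step.partition("::")
            if sep and axis not in ALLOWED_AXES:
                return False                     # an explicit axis that is not allowed
    return True


print(looks_like_pattern("Father/Son"))                # True
print(looks_like_pattern("//Son | @Name"))             # True
print(looks_like_pattern("Son[@Name='Billy Bob']"))    # True
print(looks_like_pattern("../Son"))                    # False
print(looks_like_pattern("ancestor::Grandfather"))     # False
```

The last two expressions are perfectly good location paths for a select Attribute, but they step along the parent and ancestor axes, so they fall outside the subset usable in a match Attribute.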

You can't use XPath in a match or select Attribute without somehow applying a template. The various ways to organize and call templates can have an impact on how we construct our XPath expressions. We'll look at some of these next.