Locating Nodes: Path Expressions

In XQuery, path expressions are used to locate nodes in XML data. XQuery's path expressions are derived from XPath 1.0 and are identical to the path expressions of XPath 2.0. The functionality of path expressions is closely related to the underlying data model. We start with a few examples that convey the intuition behind path expressions, then define how they operate in terms of the data model.

The most commonly used operators in path expressions locate nodes by identifying their location in the hierarchy of the tree. A path expression consists of a series of one or more steps, separated by a slash, /, or double slash, //. Every step evaluates to a sequence of nodes. For instance, consider the following expression:

 doc("books.xml")/bib/book

This expression opens books.xml using the doc() function and returns its document node, uses /bib to select the bib element at the top of the document, and uses /book to select the book elements within the bib element. This path expression contains three steps. The same books could have been found by the following query, which uses the double slash, //, to select all of the book elements contained in the document, regardless of the level at which they are found:

 doc("books.xml")//book
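
The difference between / and // can also be tried outside an XQuery processor. The following is only a rough analogue in Python, using the standard library's xml.etree.ElementTree, which supports a small XPath subset rather than XQuery. The sample data is a made-up stand-in for books.xml (only its first book is taken from this tutorial), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for books.xml; the second book is purely hypothetical.
BOOKS_XML = """
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
  </book>
  <book year="2000">
    <title>Example Title</title>
    <author><last>Doe</last><first>J.</first></author>
    <author><last>Roe</last><first>R.</first></author>
  </book>
</bib>
"""

# fromstring() returns the top-level element (bib), not a document node,
# so the paths below are written relative to bib.
bib = ET.fromstring(BOOKS_XML)

# Rough analogue of doc("books.xml")/bib/book: book children of bib.
child_books = bib.findall("book")

# Rough analogue of doc("books.xml")//book: book elements at any depth.
all_books = bib.findall(".//book")

print([b.findtext("title") for b in child_books])
print(child_books == all_books)  # same elements, because every book is a child of bib
```

In this sample both paths find the same two elements; they would differ only if a book element appeared deeper in the tree.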

Predicates are Boolean conditions that select a subset of the nodes computed by a step expression. XQuery uses square brackets around predicates. For instance, the following query returns only authors for which last="Stevens" is true:

 doc("books.xml")/bib/book/author[last="Stevens"]

If a predicate contains a single numeric value, it is treated like a subscript. For instance, the following expression returns the first author of each book:

 doc("books.xml")/bib/book/author[1]

Note that the expression author[1] will be evaluated for each book. If you want the first author in the entire document, you can use parentheses to force the desired precedence:

 (doc("books.xml")/bib/book/author)[1]
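
The three predicate forms above can be compared side by side. This is a sketch in Python's xml.etree.ElementTree, whose limited XPath subset happens to cover these cases; it is not an XQuery engine, and the sample document is a made-up stand-in for books.xml.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for books.xml; the second book is purely hypothetical.
BOOKS_XML = """
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
  </book>
  <book year="2000">
    <title>Example Title</title>
    <author><last>Doe</last><first>J.</first></author>
    <author><last>Roe</last><first>R.</first></author>
  </book>
</bib>
"""
bib = ET.fromstring(BOOKS_XML)  # the bib element, used as the starting point

# Boolean predicate: authors whose last child has the text "Stevens".
stevens = bib.findall("book/author[last='Stevens']")

# Numeric predicate: author[1] is applied within each book.
first_per_book = bib.findall("book/author[1]")

# Parenthesized form: collect every author first, then take the first one.
first_overall = bib.findall("book/author")[0]

print([a.findtext("last") for a in stevens])
print([a.findtext("last") for a in first_per_book])
print(first_overall.findtext("last"))
```

The second query yields one author per book, while the third yields a single author for the whole document.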

Now let's explore how path expressions are evaluated in terms of the data model. The steps in a path expression are evaluated from left to right. The first step identifies a sequence of nodes using an input function, a variable that has been bound to a sequence of nodes, or a function that returns a sequence of nodes. Some XQuery implementations also allow a path expression to start with a / or //. Such paths start with the root node of a document, but how this node is identified is implementation-defined.

For each / in a path expression, XQuery evaluates the expression on the left-hand side and returns the resulting nodes in document order; if the result contains anything that is not a node, a type error is raised. After that, XQuery evaluates the expression on the right-hand side of the / once for each left-hand node, merging the results to produce a sequence of nodes in document order; if the result contains anything that is not a node, a type error is raised. When the right-hand expression is evaluated, the left-hand node for which it is being evaluated is known as the context node.
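
The evaluation rule for / can be sketched as a small function. The Python below is a simplified model, not how any XQuery processor is implemented: it handles element nodes only, the helper names slash and child are invented for illustration, and the sample document is a made-up stand-in for books.xml.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for books.xml; the second book is purely hypothetical.
BOOKS_XML = """
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
  </book>
  <book year="2000">
    <title>Example Title</title>
    <author><last>Doe</last><first>J.</first></author>
    <author><last>Roe</last><first>R.</first></author>
  </book>
</bib>
"""
bib = ET.fromstring(BOOKS_XML)

# Position of every element in document order, used to sort merged results.
DOC_ORDER = {id(node): pos for pos, node in enumerate(bib.iter())}


def slash(left_nodes, right_step):
    """Model of `left / right`: evaluate right_step once per context node,
    merge the results, drop duplicates, and sort into document order."""
    merged = {}
    for context_node in left_nodes:
        for node in right_step(context_node):
            if not isinstance(node, ET.Element):
                raise TypeError("a step must return only nodes")
            merged[id(node)] = node
    return sorted(merged.values(), key=lambda n: DOC_ORDER[id(n)])


def child(name):
    """Hypothetical NameTest step: children of the context node named `name`."""
    return lambda context_node: [c for c in context_node if c.tag == name]


books = slash([bib], child("book"))
# Feed the books in reverse to show that the result is re-sorted anyway.
authors = slash(reversed(books), child("author"))
print([a.findtext("last") for a in authors])
```

Even though the context nodes are supplied in reverse, the merged authors come back in document order.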

The step expressions that may occur on the right-hand side of a / are the following:

A NameTest, which selects element or attribute nodes based on their names. A simple string is interpreted as an element name; we have already seen the NameTest bib, which evaluates to the bib elements that are children of the context node. If the name is prefixed by the @ character (pronounced "at"), then the NameTest evaluates to the attributes of the context node that have the specified name. For instance, doc("books.xml")/bib/book/@year returns the year attribute of each book. NameTest supports both namespaces and wildcards, which are discussed later in this section.
A KindTest, which selects processing instructions, comments, text nodes, or any node based on the type of the node. The KindTest used to select a given kind of node looks like a function with the same name as the type of the node: processing-instruction(), comment(), text(), and node().
An expression that uses an explicit "axis" together with a NameTest or KindTest to choose nodes with a specific structural relationship to the context node. If the NameTest book selects book elements, then child::book selects book elements that are children of the context node; descendant::book selects book elements that are descendants of the context node; attribute::book selects book attributes of the context node; self::book selects the context node if it is a book element; descendant-or-self::book selects the context node or any of its descendants if they are book elements; and parent::book selects the parent of the context node if it is a book element. Explicit axes are not frequently used in XQuery.
A PrimaryExpression, which may be a literal, a function call, a variable name, or a parenthetical expression. These are discussed in the next section of this tutorial.
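
To make the axes listed above concrete, here is one way to model each as a function of the context node. This is an illustrative Python sketch over xml.etree.ElementTree, not XQuery; the function names are invented, the sample data is a made-up stand-in for books.xml, and because ElementTree has no attribute nodes the attribute axis returns attribute values as strings.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for books.xml; the second book is purely hypothetical.
BOOKS_XML = """
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
  </book>
  <book year="2000">
    <title>Example Title</title>
    <author><last>Doe</last><first>J.</first></author>
    <author><last>Roe</last><first>R.</first></author>
  </book>
</bib>
"""
bib = ET.fromstring(BOOKS_XML)

# ElementTree elements do not know their parent, so build a lookup table.
PARENT = {child: parent for parent in bib.iter() for child in parent}


def child_axis(node, name):                # child::name
    return [c for c in node if c.tag == name]


def descendant_axis(node, name):           # descendant::name
    return [d for d in node.iter(name) if d is not node]


def descendant_or_self_axis(node, name):   # descendant-or-self::name
    return list(node.iter(name))


def self_axis(node, name):                 # self::name
    return [node] if node.tag == name else []


def parent_axis(node, name):               # parent::name
    parent = PARENT.get(node)
    return [parent] if parent is not None and parent.tag == name else []


def attribute_axis(node, name):            # attribute::name (values only here)
    return [node.attrib[name]] if name in node.attrib else []


first_book = bib[0]
print(len(descendant_axis(bib, "author")))
print(attribute_axis(first_book, "year"))
```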

Now let's apply what we have learned to the following expression:

 doc("books.xml")/bib/book[1]

Working from left to right, XQuery first evaluates the input function, doc("books.xml"), returning the document node, which becomes the context node for evaluating the expression on the right side of the first slash. This right-hand expression is bib, a NameTest that returns all elements named bib that are children of the context node. There is only one bib element, and it becomes the context node for evaluating the expression book[1], which first selects all book elements that are children of the context node and then filters them to return only the first book element.

Up to now, we have not defined the // operator in terms of the data model. The formal definition of this operator is somewhat complex; intuitively, the // operator is used to give access to all attributes and all descendants of the nodes in the left-hand expression, in document order. The expression doc("books.xml")//bib matches the bib element at the root of our sample document, doc("books.xml")//book matches all the book elements in the document, and doc("books.xml")//@year matches all the year attributes in the document. The // is formally defined using full axis notation: // is equivalent to /descendant-or-self::node()/.

For each node from the left-hand expression, the // operator takes the node itself, each attribute node, and each descendant node as a context node, then evaluates the right-hand expression. For instance, consider the following expression:

 doc("books.xml")/bib//author[1]

The first step returns the document node, the second step returns the bib element, the third step, which is not visible in the original query, evaluates descendant-or-self::node() to return the bib element and all nodes descended from it, and the fourth step selects the first author element for each context node from the third step. Since only book elements contain author elements, this means that the first author of each book will be returned.
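
The hidden descendant-or-self step can be written out by hand and compared with the abbreviated form. The following Python sketch uses xml.etree.ElementTree as a stand-in for an XQuery processor, considers element nodes only, and runs against a made-up substitute for books.xml.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for books.xml; the second book is purely hypothetical.
BOOKS_XML = """
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
  </book>
  <book year="2000">
    <title>Example Title</title>
    <author><last>Doe</last><first>J.</first></author>
    <author><last>Roe</last><first>R.</first></author>
  </book>
</bib>
"""
bib = ET.fromstring(BOOKS_XML)

# Third step: descendant-or-self::node(), restricted here to elements.
# iter() visits bib itself and then every descendant in document order.
context_nodes = list(bib.iter())

# Fourth step: author[1], evaluated once for each context node.
expanded = [a for node in context_nodes for a in node.findall("author[1]")]

# The abbreviated form, relative to bib.
shorthand = bib.findall(".//author[1]")

print([a.findtext("last") for a in expanded])
print(expanded == shorthand)
```

Only the two book elements contribute a result in the fourth step, so both forms return the first author of each book.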

In the examples we have shown so far, NameTest uses simple strings to represent names. NameTest also supports namespaces, which distinguish names from different vocabularies. Suppose we modify our sample data so that it represents titles with the title element from the Dublin Core, a standard set of elements for bibliographical data [DC]. The namespace URI for the Dublin Core is http://purl.org/dc/elements/1.1/. Here is an XML document containing one simple book, in which the title element is taken from Dublin Core:

 <book year="1994" xmlns:dcx="http://purl.org/dc/elements/1.1/">
     <dcx:title>TCP/IP Illustrated</dcx:title>
     <author><last>Stevens</last><first>W.</first></author>
 </book>

In this data, xmlns:dcx="http://purl.org/dc/elements/1.1/" declares the prefix "dcx" as a synonym for the full namespace, and the element name dcx:title uses the prefix to indicate this is a title element as defined in the Dublin Core. The following query finds Dublin Core titles:

 declare namespace dc="http://purl.org/dc/elements/1.1/";
 doc("books.xml")//dc:title

The first line declares the prefix dc as a synonym for the Dublin Core namespace. Note that the prefix used in the document differs from the prefix used in the query. In XQuery, the name used for comparisons consists of the namespace URI and the "local part," which is title for this element. The prefix itself plays no part in the comparison.
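
The same prefix independence can be observed in other namespace-aware tools. As a hedged illustration, Python's xml.etree.ElementTree stores each element name as the namespace URI plus the local part, so a query that binds the prefix dc still matches a document that uses dcx. This is an analogue only, not XQuery.

```python
import xml.etree.ElementTree as ET

# The one-book document shown above, which uses the prefix dcx.
DC_XML = """
<book year="1994" xmlns:dcx="http://purl.org/dc/elements/1.1/">
  <dcx:title>TCP/IP Illustrated</dcx:title>
  <author><last>Stevens</last><first>W.</first></author>
</book>
"""
book = ET.fromstring(DC_XML)

DC = "http://purl.org/dc/elements/1.1/"

# The query binds its own prefix, dc, to the same namespace URI.
titles = book.findall(".//dc:title", namespaces={"dc": DC})

# A title with no namespace is a different name and matches nothing here.
plain_titles = book.findall(".//title")

print(titles[0].text)
print(titles[0].tag)  # namespace URI in braces, followed by the local part
print(len(plain_titles))
```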

Wildcards allow queries to select elements or attributes without specifying their entire names. For instance, a query might want to return all the elements of a given book, without specifying each possible element by name. In XQuery, this can be done with the following query:

 doc("books.xml")//book[1]/*

The * wildcard matches any element, whether or not it is in a namespace. To match any attribute, use @*. To match any name in the namespace associated with the dc prefix, use dc:*. To match any title element, regardless of namespace, use *:title.
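
For readers who want to experiment with wildcards without an XQuery processor, the sketch below shows approximate counterparts in Python's xml.etree.ElementTree. It assumes Python 3.8 or later, where the {*}name and {uri}* forms are supported; the syntax differs from XQuery's, and attributes are exposed as a dictionary rather than as nodes.

```python
import xml.etree.ElementTree as ET

# The one-book Dublin Core document from earlier in this section.
DC_XML = """
<book year="1994" xmlns:dcx="http://purl.org/dc/elements/1.1/">
  <dcx:title>TCP/IP Illustrated</dcx:title>
  <author><last>Stevens</last><first>W.</first></author>
</book>
"""
book = ET.fromstring(DC_XML)
DC = "http://purl.org/dc/elements/1.1/"

# Counterpart of book/*: every child element, in any namespace or none.
children = book.findall("*")

# Counterpart of @*: all attributes of the context node, as a dict.
attributes = dict(book.attrib)

# Counterpart of dc:*: any child whose name is in the Dublin Core namespace.
in_dc = book.findall("{" + DC + "}*")

# Counterpart of *:title: any child named title, whatever its namespace.
any_title = book.findall("{*}title")

print(len(children))
print(attributes)
print([e.text for e in in_dc])
print([e.text for e in any_title])
```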