What's New in XPath 2.0?

As of this writing, XPath 2.0 is still in Working Draft form, but it has now stabilized, giving us the chance to work with it. XPath 2.0 is described this way by W3C, just as you'd describe XPath 1.0, in fact:

"The primary purpose of XPath is to address parts of an XML document. XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document."

Although the primary purpose of XPath hasn't changed in this new version, much of the actual specification has. You'll still be able to use the familiar path steps, each made up of an axis (XPath 2.0 uses the same axes as XPath 1.0), followed by a node test, optionally followed by one or more predicates. However, much of the terminology has changed, along with some basic concepts. For example, XPath 2.0 supports sequences instead of node-sets. We're going to see how all this works in detail over the next few chapters.

XPath 2.0, XQuery 1.0, and XSLT 2.0 are all tied together, and XPath 2.0 is the common denominator. The W3C groups working on these standards have been working together closely. One way of looking at what's been going on is that XSLT 2.0 and XQuery 1.0 are designed to share as much as possible, and that what they share is in fact XPath 2.0.

So why XPath 2.0? What's it got that XPath 1.0 doesn't have? There are many answers, but one of the main ones is support for new data types. As you know, XPath 1.0 supports only these data types:

string
boolean
node-set
number

That was okay long ago, but things have changed. In particular, W3C has been moving toward XML schema for its data types. Basing its data types on XML schema means that XPath 2.0 supports all the simple primitive types built into XML schema. There are 19 such types in all, including many that XPath 1.0 doesn't support, such as types for dates, URIs, and so on.

The XPath 2.0 data model also supports data types that you can derive from these data types in your own XML schema. We're going to see how to work with these various types ourselves.

XPath 2.0 also gives you tremendously more power than XPath 1.0 did. There are dozens of new built-in functions that you can use now, and many more operators. These functions and operators are far more type-aware than what we've seen in XPath 1.0. We'll take a look in this chapter at how these new operators and functions can simplify tasks that were difficult in XPath 1.0.

XML SCHEMA

If you're not familiar with XML schema, you can get all the details at http://www.w3.org/TR/xmlschema-0/, http://www.w3.org/TR/xmlschema-1/, and http://www.w3.org/TR/xmlschema-2/. Another good resource is the book Sams Teach Yourself XML in 21 Days (ISBN: 0672325764).

Also new in XPath 2.0 are sequences, which replace the familiar node-sets from XPath 1.0. In fact, all XPath 2.0 expressions evaluate to sequences, as we're going to see. And XPath 2.0 lets you bind variables inside an expression itself, as you'll see with the for, some, and every expressions.

The current working draft for XPath 2.0 is at http://www.w3.org/TR/xpath20/. This document tells you about XPath 2.0 in some detail, but it doesn't provide the whole story. There are also separate documents covering the XPath 2.0 data model (which tells you how XPath 2.0 sees an XML document), the data types used in XPath 2.0, and the available functions and operators. Here's the list:

The XPath 2.0 specification is at http://www.w3.org/TR/xpath20/.
The XPath data model defines the information in an XML document that is available to an XPath processor. The data model is defined in the XQuery 1.0 and XPath 2.0 Data Model document at http://www.w3.org/TR/xpath-datamodel/.
The library of functions and operators supported by XPath 2.0 is defined in the XQuery 1.0 and XPath 2.0 Functions and Operators document, which is at http://www.w3.org/TR/xquery-operators/.
The type system used in XPath 2.0 is based on XML Schema, which you can read all about at www.w3.org/TR/xmlschema-0/, www.w3.org/TR/xmlschema-1/, and www.w3.org/TR/xmlschema-2/. The types defined in XML schema can be found in www.w3.org/TR/xmlschema-2/.
The formal semantics of XPath 2.0 are defined in the XQuery 1.0 and XPath 2.0 Formal Semantics document. This document is useful for programmers creating XPath processors, and you can find it at http://www.w3.org/TR/xquery-semantics/.

You still create location paths in XPath 2.0, of course, and build them from location steps. A location step, as in XPath 1.0, can contain an axis, a node test, and predicates. The allowable axes are the same as in XPath 1.0. However, there is already one difference: the namespace axis is deprecated in XPath 2.0, which means it's considered obsolete. It's included for backward compatibility, but is not available at all in XQuery 1.0.

Handling Nodes

Although the data types have changed, the node kinds are more or less the same in XPath 2.0 compared to XPath 1.0. As you recall, you can have these kinds of nodes in XPath 1.0: root nodes, element nodes, attribute nodes, processing instruction nodes, comment nodes, text nodes, and namespace nodes. There is one difference in XPath 2.0, however: root nodes are now called document nodes instead. This ends a long-standing confusion between the root node and the document's root element.

Handling Data Types

As mentioned earlier, one of the main motivations behind XPath 2.0 was to expand the data types available. XPath 1.0 supported Booleans, node-sets, strings, and numbers, but that was pretty basic. XPath 2.0 supports all the primitive simple types built into XML schema, as well as the types you can derive from them by restriction. This gives you a great deal more control over data typing. Here are the simple primitive types (the xs prefix is bound to the namespace "http://www.w3.org/2001/XMLSchema"):

xs:string
xs:boolean
xs:decimal
xs:float
xs:double
xs:duration
xs:dateTime
xs:time
xs:date
xs:gYearMonth
xs:gYear
xs:gMonthDay
xs:gDay
xs:gMonth
xs:hexBinary
xs:base64Binary
xs:anyURI
xs:QName
xs:NOTATION

Besides these types, you can also use types derived from primitive simple types by restriction, as we'll see when we discuss the data model in depth later in this chapter. Together, the simple primitive types and the types derived from them by restriction are called atomic types. XPath 2.0 sequences can contain both atomic values (values of these types) and nodes.

Working with Sequences

Every XPath 2.0 expression (that is, anything an XPath processor can evaluate, including expressions that return nodes from a document or string values and so on) evaluates to a sequence. Here's the XPath 2.0 definition of a sequence:

A sequence is an ordered collection of zero or more items.
An item is either an atomic value or a node.
An atomic value is a value in the value space of an XML Schema atomic type, as defined in the XML Schema specification. Atomic types are either the simple primitive types or types derived from them by restriction, as we'll discuss in this chapter.
A node is one of the seven node kinds described in the XQuery 1.0 and XPath 2.0 Data Model document.
A sequence containing exactly one item is called a singleton sequence. An item is identical to a singleton sequence containing that item.
A sequence containing zero items is called an empty sequence.

Sequences can contain nodes or atomic values. As we've seen, an atomic value is a value of one of the 19 built-in simple primitive data types defined in the XML schema specification, or a type derived from them by restriction.

Sequences are the successor to node-sets. Besides nodes, they also let you work with simple data items. The term "sequence" is really a catch-all way to refer to data you can work with in XPath 2.0: an atomic value, a node, or a collection of such items. Sequences can be made up of a single item or multiple items; it's all the same to XPath 2.0. Using one name, sequence, for both cases is an easy way to let you handle single or multiple items (even though the term "sequence" is not very apt for single items).

Sequences can be constructed with this kind of syntax: (1, 2, 3), which is a sequence of the atomic values 1, 2, and 3. In fact, the comma is an operator in XPath 2.0, called the sequence construction operator. You can also select an item from a sequence by position, using a numeric predicate in square brackets ([]). Here's an example:

 (4, 5, 6)[2]

This expression returns the value 5. You can also use the range operator, to, to create sequences of consecutive integers. This example creates a sequence holding the integers 1 through 1000:

 (1 to 1000)

Note that you cannot nest sequences. That is, if you have a sequence (1, 2) and try to nest it in another sequence as ((1, 2), 3), the result is simply the sequence (1, 2, 3).
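As a rough illustration of these rules, here is a small Python model in which tuples stand in for XPath sequences. It is a sketch under simplifying assumptions, not an XPath processor. The helper names seq, to, and at are hypothetical. seq mimics the comma operator, flattening any nested sequences. to mimics the range operator. at mimics a numeric predicate such as [2], counting positions from 1.

```python
# Illustrative Python model of XPath 2.0 sequence construction (not an XPath
# processor). Tuples stand in for sequences; helper names are hypothetical.

def seq(*items):
    """Model the comma operator: nested sequences are flattened, never nested."""
    result = []
    for item in items:
        if isinstance(item, tuple):   # a sub-sequence contributes its items
            result.extend(seq(*item))
        else:                         # a single item acts as a singleton sequence
            result.append(item)
    return tuple(result)

def to(start, end):
    """Model 'start to end': inclusive, and empty when start > end."""
    return tuple(range(start, end + 1))

def at(sequence, position):
    """Model a numeric predicate such as (4, 5, 6)[2]; positions start at 1."""
    if 1 <= position <= len(sequence):
        return (sequence[position - 1],)
    return ()                         # an out-of-range position selects nothing

print(at(seq(4, 5, 6), 2))   # (5,)
print(seq((1, 2), 3))        # (1, 2, 3)
print(len(to(1, 1000)))      # 1000
```

Returning a one-item tuple from at reflects the rule that an item and a singleton sequence containing it are the same thing.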

Sequences are also ordered, which is different from node-sets in XPath 1.0. For example, take a look at this sequence:

 (//planet/mass, //planet/name)

Here, we're creating a sequence in which <mass> elements from our planetary data XML document come before <name> elements, which is the opposite of the way these elements appear in document order. But the order we've specified for these elements is preserved in the sequence we're creating here.

Here's another way in which XPath 2.0 differs from 1.0: sequences, unlike node-sets, can have duplicate items. For example, take a look at this sequence:

 (//planet/mass, //planet/name, //planet/mass)

Here, we're creating a sequence of all <mass> elements, followed by all <name> elements, followed by all <mass> elements again. This is legal in sequences, but not in node-sets. (In fact, the very definition of XPath 1.0 node-sets precludes duplicate items.)

ORDERED VERSUS UNORDERED SEQUENCES

Here's something to know behind the scenes about sequences versus node-sets. W3C wanted to make life a little easier for people moving from XPath 1.0 to 2.0, so sequences are constructed in a way that is somewhat node-set friendly.

Although node-sets are unordered, they are usually constructed in document order. XSLT 2.0 is designed to work on sequences in sequence order, but for compatibility with XPath 1.0, path expressions always return their results in document order.

Also, duplicate nodes are removed from the results of path expressions. This means the sequence you get from a path expression is usually the same as the node-set you'd get.
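Here is a Python sketch that makes the contrast concrete. The Node class, its doc_order field, and the two-planet layout are hypothetical stand-ins for a real document. Because each node has a unique doc_order position, equality behaves like node identity. The comma-built sequence keeps the order we wrote and its duplicates. path_result models how a path expression hands back nodes: in document order, with duplicates removed.

```python
# Illustrative Python model (not an XPath processor). Each hypothetical node
# records its position in document order; since that position is unique,
# dataclass equality stands in for node identity.
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    doc_order: int   # position of the node in document order

# Hypothetical planets document: each <planet> has <name> before <mass>.
names = (Node("name", 2), Node("name", 5))
masses = (Node("mass", 3), Node("mass", 6))

# (//planet/mass, //planet/name, //planet/mass): written order, duplicates kept
constructed = masses + names + masses

def path_result(nodes):
    """Model path-expression output: document order, duplicates removed."""
    return tuple(sorted(set(nodes), key=lambda n: n.doc_order))

print([n.name for n in constructed])
print([n.name for n in path_result(constructed)])
```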

So that's what sequences are all about in general. Instead of supporting only one multiple-item construct, the node-set, XPath 2.0 supports sequences, which can contain multiple simple-typed data items as well as nodes.

The `for` Expression

Sequences are more than just a new concept; XPath 2.0 is really centered around them. There are whole new expressions designed to work with sequences, such as the for expression. This expression lets you handle sequences by looping, or iterating, over all items in a sequence.

We'll meet the for expression in Chapter 8, "XPath 2.0 Expressions and Operators," but here's a preview that also puts XPath 2.0 variables to work. Say that you wanted to find the average planetary mass in our planets example. Doing that with the for expression is easy. Here's what that might look like:

 for $variable in /planets/planet return $variable/mass

Notice what we're doing here: we're using the for expression to loop over all <planet> elements. We do that with a variable named $variable, something we haven't used yet in XPath. Variables in XPath 2.0 start with a $ preceding a normal XML-legal name, so you can use any legal XML name here, like $var, $numberProducts, $name, and so on.

We're using the path expression /planets/planet to return a sequence holding all <planet> elements in the document. How do we return the <mass> elements of these <planet> elements in a sequence? We can use the return keyword, as you see here. In this case, the expression we want to return each time through the loop is $variable/mass. Because $variable holds a new <planet> element each time through the loop, we'll get a sequence of all <mass> elements this way.

To get the average mass of the planets, you could use the avg function this way:

 avg(for $variable in /planets/planet return $variable/mass)

Note that you could also write our for expression as

 for $variable in /planets/planet/mass return $variable

This does the same thing as the expression /planets/planet/mass: it returns a sequence of <mass> elements. Here's another example, where we're multiplying the miles per gallon of a number of cars by their fuel capacity to get their total operating ranges:

 for $variable in /cars/car return $variable/milesPerGallon * $variable/gasCapacity

That's how the for expression works in general, like this:

 for $variable in sequence return expression
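The semantics of the for expression can be sketched in Python as follows. This is only a model. for_return is a hypothetical helper, and the planet and car dictionaries are made-up sample data standing in for the <planet> and car elements. The body is evaluated once per item, and whatever it returns is appended to one flat result sequence, since sequences never nest.

```python
# Illustrative model of "for $v in sequence return expression" (not an XPath
# engine). Helper name and sample data are hypothetical.
from statistics import mean

def for_return(sequence, body):
    """Evaluate body once per item and concatenate the results into one flat tuple."""
    result = []
    for item in sequence:                 # bind $v to each item in turn
        value = body(item)
        if isinstance(value, tuple):      # a sequence result is spliced in
            result.extend(value)
        else:                             # a single item is appended
            result.append(value)
    return tuple(result)

# Hypothetical sample data standing in for <planet> elements
planets = (
    {"name": "Mercury", "mass": 0.0553},
    {"name": "Venus", "mass": 0.815},
    {"name": "Earth", "mass": 1.0},
)

# for $variable in /planets/planet return $variable/mass
masses = for_return(planets, lambda planet: planet["mass"])
print(masses)
print(mean(masses))      # plays the role of avg(...)

# Hypothetical car data: range = milesPerGallon * gasCapacity for each car
cars = (
    {"milesPerGallon": 30, "gasCapacity": 12},
    {"milesPerGallon": 25, "gasCapacity": 16},
)
ranges = for_return(cars, lambda car: car["milesPerGallon"] * car["gasCapacity"])
print(ranges)            # (360, 400)
```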

The `if` Expression

Besides the for expression, you can now use the conditional if expression in XPath 2.0. Being able to use conditional expressions like if and loop expressions like for in XPath adds a lot of the programming power of true programming languages to XPath 2.0.

Here's an example of an if expression, which finds the minimum of two temperatures (which you can also do with the XPath 2.0 min function):

 if ($temperature1 < $temperature2) then $temperature1 else $temperature2

Here, we're comparing the value in $temperature1 to the value in $temperature2. If $temperature1 holds a value that is less than the value in $temperature2, this if expression returns the value in $temperature1; otherwise, it returns the value in $temperature2.

This has the feel of a true programming language, and there's a lot more power here than in XPath 1.0. Now you can branch to different expressions based on a test expression. More on the if expression in Chapter 8.
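Python's conditional expression is a close analogue of XPath 2.0's if expression, since both produce a value rather than running statements. In XPath 2.0, the parentheses around the test and the else branch are both required. The smaller function below is a hypothetical name used only for this sketch.

```python
def smaller(temperature1, temperature2):
    # Mirrors: if ($temperature1 < $temperature2) then $temperature1 else $temperature2
    return temperature1 if temperature1 < temperature2 else temperature2

print(smaller(68.5, 72.0))   # 68.5
print(min(68.5, 72.0))       # 68.5, the built-in counterpart of XPath's min()
```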

The `some` and `every` Expressions

You can use a rudimentary form of conditional test in XPath 1.0. For example, consider this expression, as used in a predicate:

 /planets/planet[1]/name = "Mars"

would be true if any <name> element in the first <planet> element had the text "Mars".

XPath 2.0 extends this kind of checking. You can either perform the same test in XPath 2.0 using the same syntax, or you can use the some expression. A some expression is true if at least one item in a sequence satisfies the test given in its satisfies clause, like this:

 some $variable in /planets/planet[1]/name satisfies $variable = "Mars"

In this case, this expression returns true if at least one <name> element in the first <planet> element has the text "Mars".

You can also perform other kinds of tests here, such as this expression, which is true if any <radius> element in the first <planet> element contains a value greater than 2000:

 some $variable in /planets/planet[1]/radius satisfies $variable > 2000

You can also insist that every <radius> element in the first <planet> element contains a value greater than 2000 if you use the every expression instead of some, like this:

 every $variable in /planets/planet[1]/radius satisfies $variable > 2000

More about some and every in Chapter 8.
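Quantified expressions map naturally onto Python's any and all. The sketch below uses made-up radius values standing in for /planets/planet[1]/radius, and some and every are hypothetical helper names. The model also reproduces one detail worth knowing: over an empty sequence, some is false and every is true.

```python
# Illustrative model of XPath 2.0 quantified expressions (not an XPath engine).

def some(sequence, test):
    """Model 'some $v in sequence satisfies test'."""
    return any(test(v) for v in sequence)

def every(sequence, test):
    """Model 'every $v in sequence satisfies test'."""
    return all(test(v) for v in sequence)

# Hypothetical values standing in for /planets/planet[1]/radius
radii = (1516, 2439)

print(some(radii, lambda r: r > 2000))    # True: 2439 > 2000
print(every(radii, lambda r: r > 2000))   # False: 1516 is not > 2000
print(some((), lambda r: r > 2000))       # False: nothing to satisfy the test
print(every((), lambda r: r > 2000))      # True: vacuously true
```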

Unions, Intersections, and Differences

In XPath 1.0, you could use the | operator to create the union (that is, the combination) of two node-sets. In this example, we use it to match attributes and nodes in an XSLT template:

 <?xml version="1.0" encoding="utf-8"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml"/>
   <xsl:template match="distance[preceding::*/name='Mercury']">
     <distance>This planet is farther than Mercury from the sun.</distance>
   </xsl:template>
   <xsl:template match="@*|node()">
     <xsl:copy>
       <xsl:apply-templates select="@*|node()"/>
     </xsl:copy>
   </xsl:template>
 </xsl:stylesheet>

In XPath 2.0, you can create not only unions like this, but also intersections, which contain all the nodes two sequences have in common, and differences, which contain the nodes in one sequence that are not in the other. (These three operators work on sequences of nodes, not atomic values.)

Let's take a look at how this works. For example, to get the same result as the previous XPath 1.0 example, we can use the union operator in XPath 2.0:

 <?xml version="1.0" encoding="utf-8"?>
 <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml"/>
   <xsl:template match="distance[preceding::*/name='Mercury']">
     <distance>This planet is farther than Mercury from the sun.</distance>
   </xsl:template>
   <!-- The match pattern uses |; the union keyword appears in the select expression -->
   <xsl:template match="@*|node()">
     <xsl:copy>
       <xsl:apply-templates select="@* union node()"/>
     </xsl:copy>
   </xsl:template>
 </xsl:stylesheet>

In addition, XPath 2.0 introduces the intersect operator, which returns the intersection of two sequences (that is, all the nodes they have in common). For example, if the variable $planets holds a sequence of <planet> elements, we could create a sequence of the <planet> elements that $planets has in common with the planets in our planetary data document, like this:

 $planets intersect /planets/planet

To find the difference between two sequences, you can use the except operator. For example, if you wanted to find all items in $planets that were not also in the sequence returned by /planets/planet , you could use except this way:

 $planets except /planets/planet
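Here is a Python sketch of the three operators. The nodes are hypothetical (name, position) tuples, where tuple equality stands in for node identity, and the two selections are made up. Each result comes back in document order with duplicates removed, which is how XPath 2.0 returns the results of union, intersect, and except. Note that except is not symmetric.

```python
# Illustrative model of union, intersect, and except on node sequences
# (not an XPath engine). Nodes are hypothetical (name, document position) pairs.

def in_doc_order(nodes):
    """Remove duplicates and sort by document position, as these operators do."""
    return tuple(sorted(set(nodes), key=lambda node: node[1]))

def union(a, b):
    return in_doc_order(set(a) | set(b))

def intersect(a, b):
    return in_doc_order(set(a) & set(b))

def except_(a, b):   # trailing underscore because "except" is a Python keyword
    return in_doc_order(set(a) - set(b))

# Two hypothetical selections of <planet> nodes from the same document
inner = (("Mercury", 1), ("Venus", 2), ("Earth", 3), ("Mars", 4))
with_moons = (("Earth", 3), ("Mars", 4), ("Jupiter", 5))

print(intersect(inner, with_moons))   # (('Earth', 3), ('Mars', 4))
print(except_(inner, with_moons))     # (('Mercury', 1), ('Venus', 2))
print(except_(with_moons, inner))     # (('Jupiter', 5),)
print(union(inner, with_moons))
```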

Here's something else that's new in XPath 2.0: you can now specify multiple node tests in location steps. Here's an example:

 planets/(mass|day)/text()

Here's what that would look like in XPath 1.0:

 planets/mass/text() | planets/day/text()

And, as already mentioned, XPath 2.0 brings many new functions, and we're going to see them in the final four chapters of this book. One of the specific tasks that W3C undertook in XPath 2.0 was to augment its string-processing capabilities. Accordingly, you'll find more string functions in XPath 2.0, including upper-case, lower-case, string-pad, matches, replace, and tokenize.

Note in particular the matches, replace, and tokenize functions. These functions use regular expressions, a powerful new addition to XPath. As we'll discuss in Chapter 10, "XPath 2.0 String Functions," regular expressions let you create patterns to use in matching text. Regular expression patterns use their own syntax. For example, the pattern \d{3}-\d{3}-\d{4} matches U.S. phone numbers, like 888-555-1111. Being able to use regular expressions like this is very powerful because you can match the text in a node against the patterns you're searching for.
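Python's re module can illustrate what these functions do. XPath 2.0's regular expressions are based on XML Schema's regex syntax and differ from Python's in some details (for example, replacement strings refer to captured groups as $1 rather than \1), so treat this as an analogy only. One useful point is that matches succeeds if the pattern matches anywhere in the string, which corresponds to re.search rather than re.fullmatch. The phone pattern uses only syntax the two dialects share, and the sample strings are made up.

```python
import re

# The phone-number pattern from the text; its syntax is common to XML Schema
# regexes and Python's re module.
PHONE = r"\d{3}-\d{3}-\d{4}"

# Like matches(): true if the pattern occurs anywhere in the string
print(bool(re.search(PHONE, "Call 888-555-1111 today")))   # True
print(bool(re.search(PHONE, "Call 888-5555 today")))       # False

# Rough analogues of replace() and tokenize()
print(re.sub("-", ".", "888-555-1111"))              # 888.555.1111
print(re.split(r",\s*", "Mercury, Venus,Earth"))     # ['Mercury', 'Venus', 'Earth']
```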

Comments

You can also create XPath 2.0 comments using the delimiters (: and :). Here's an example:

 (: Check whether a <name> of the first planet is Mars :)
 some $variable in /planets/planet[1]/name satisfies $variable = "Mars"

Comments may be nested.
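Because comments can nest, a processor has to count comment delimiters rather than stop at the first :). The Python function below sketches that idea. strip_comments is a hypothetical helper, and it deliberately ignores the case of (: appearing inside a string literal, which a real XPath tokenizer would have to handle.

```python
def strip_comments(expr):
    """Remove XPath 2.0 comments, honoring nesting by tracking comment depth.

    Simplification: '(:' inside string literals is not handled here.
    """
    out = []
    depth = 0
    i = 0
    while i < len(expr):
        pair = expr[i:i + 2]
        if pair == "(:":                   # opening delimiter: one level deeper
            depth += 1
            i += 2
        elif pair == ":)" and depth > 0:   # closing delimiter: one level out
            depth -= 1
            i += 2
        else:
            if depth == 0:                 # keep only text outside comments
                out.append(expr[i])
            i += 1
    if depth != 0:
        raise ValueError("unterminated comment")
    return "".join(out)

print(repr(strip_comments("(: outer (: inner :) still outer :) 1 + 2")))   # ' 1 + 2'
```

A non-nesting approach that stops at the first :) would leave " still outer :) 1 + 2" behind, which is not a valid expression.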

That completes our XPath 2.0 overview. Now you've gotten an idea of the kinds of things that are different in XPath 2.0. Besides what we've seen in these few examples, there are plenty of additional new expressions coming up, such as cast, treat, and instance of. We'll get all the details in the coming chapters. In the meantime, how about a few working examples?