Understanding the XPath 1.0 Data Types | XPath Kick Start: Navigating XML with XPath 1.0 and 2.0

To let you work with the contents of an XML document, XPath models the data in that document in a specific way, called the data model. The data model specifies how XPath sees a document, and that's essential to know if you want XPath to find the data you're looking for.

For example, if you have an XML document and want to pick out all the <friend> elements, you need to know how XPath sees that document to be able to instruct it to do what you want. In this chapter, we're going to take a look at how XPath 1.0 sees the contents of an XML document; that is, we'll be discussing the XPath 1.0 data model.

To handle the data in an XML document, XPath 1.0 lets you work with four different data types. For example, by defining a string data type, XPath lets you handle text strings in XML elements and work with them directly. For instance, if you use the XPath expression //planet[name="Venus"], XPath will return all <planet> elements in a document that have <name> children with text equal to "Venus". This works because XPath lets you work with text strings like "Venus". You can also work with numbers, like this: //planet[position()=3], which selects each <planet> element that is the third <planet> child of its parent.
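As a rough illustration, the two expressions above can be tried with Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The sample document and the variable names are assumptions invented for this sketch, and ElementTree writes the positional test as [3] rather than [position()=3].

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<planets>
  <planet><name>Mercury</name></planet>
  <planet><name>Venus</name></planet>
  <planet><name>Earth</name></planet>
</planets>"""

root = ET.fromstring(SAMPLE)

# String comparison: <planet> elements with a <name> child whose text is "Venus".
venus = root.findall(".//planet[name='Venus']")

# Number comparison: the <planet> that is third among its <planet> siblings.
third = root.findall(".//planet[3]")

venus_names = [p.findtext("name") for p in venus]
third_names = [p.findtext("name") for p in third]
print(venus_names, third_names)
```

Both calls return a list of elements, which is the closest ElementTree analog to an XPath node-set.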

There are other data types available besides strings and numbers that you can work with in XPath, and we'll take a look at all the allowable data types here. Then we'll be ready to use those data types with the XPath data model to create XPath expressions of the type XPath processors will be able to understand and use to extract data from XML documents.

Here are the data types in XPath 1.0 (XPath 2.0 adds many more data types, as we'll see in the second half of the book):

A number: stored as a floating-point number
A string: a sequence of characters
A Boolean: a true or false value
A node-set: an unordered collection of unique nodes

XPath expressions are the fundamental building blocks of XPath, and an XPath expression is anything XPath can evaluate to yield a result (which is not an error). For example, here's an XPath expression: //planet[position() > 3]. This expression returns a node-set containing all the <planet> elements in a document that come after the first three <planet> children of their parent.

All XPath 1.0 expressions must evaluate to a value that is one of the four data types: number, string, Boolean, or node-set. For example, not only is //planet[position() > 3] an XPath expression (this expression results in a node-set), but so is position() (which results in a number), and so is 3, all by itself, as well as position() > 3 (this expression yields a Boolean true/false value depending on whether the tested node's position is greater than three).
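The following is only an emulation in plain Python, not an XPath evaluation: ElementTree cannot evaluate position() > 3, so the sketch models the number, Boolean, and node-set results by hand. The five-planet document and all names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with five <planet> siblings.
root = ET.fromstring("<planets>" + "<planet/>" * 5 + "</planets>")
planets = root.findall("planet")

# Number: position() yields a 1-based position, modeled here as a float.
positions = [float(i) for i in range(1, len(planets) + 1)]

# Boolean: position() > 3 is true or false for each tested node.
is_after_third = [pos > 3 for pos in positions]

# Node-set: //planet[position() > 3] keeps only the nodes whose test is true.
after_third = [p for p, keep in zip(planets, is_after_third) if keep]

print(positions)
print(is_after_third)
print(len(after_third))
```

Each intermediate value corresponds to one of the expressions named above: a number, a Boolean, and finally a node-set.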

Let's take a look at all the allowed data types in more detail now.

Numbers

First of all, you can use numbers as XPath expressions. For example, in the XPath expression //planet[position()=7] (which you might use to match the seventh <planet> element in an XML document), the number 7 is a valid XPath expression, evaluating to itself.

The position() function also evaluates to a number: the position of the current node among its sibling nodes. And there are other functions that evaluate to numbers. For example, the XPath 1.0 floor function returns the largest integer that is not greater than the argument you pass to it. That means that floor(4.6) would return a value of 4, for instance, so floor(4.6) is an XPath expression that evaluates to a number.
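Python's math.floor rounds the same way, toward negative infinity, so it can serve as a quick stand-in for checking what floor should return. This is an analogy rather than an XPath call: Python returns an int, whereas XPath 1.0 returns a number.

```python
import math

# Inputs chosen for illustration; the first is the example from the text.
inputs = (4.6, 4.0, -4.6)

# floor gives the largest integer that is not greater than its argument.
floor_results = [math.floor(x) for x in inputs]

print(floor_results)
```

The second input shows that an integer argument comes back unchanged, and the third shows that negative values round away from zero.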

NUMBERS IN XPATH 1.0

In XPath 1.0, a number represents a floating-point number. A number can have any double-precision 64-bit format that conforms to IEEE 754. These include a special "Not-a-Number" (NaN) value, positive and negative infinity, and positive and negative zero.
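Python floats are also IEEE 754 double-precision values, so the special values listed above can be explored with them. This is a sketch by analogy, not an XPath evaluation, and the variable names are made up for illustration.

```python
import math

nan = float("nan")        # Not-a-Number
pos_inf = float("inf")    # positive infinity
neg_inf = float("-inf")   # negative infinity
neg_zero = -0.0           # negative zero

# NaN never compares equal to anything, including itself.
nan_equals_itself = (nan == nan)

# Positive and negative zero compare equal, but the sign is still stored.
zeros_compare_equal = (neg_zero == 0.0)
neg_zero_sign = math.copysign(1.0, neg_zero)

# Infinity is larger than any finite double.
inf_is_largest = pos_inf > 1.7e308

# Note: XPath 1.0 evaluates 1 div 0 to Infinity, whereas Python's 1 / 0
# raises ZeroDivisionError, so the infinities are built from strings here.
print(nan_equals_itself, zeros_compare_equal, neg_zero_sign, inf_is_largest)
```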

All of which is to say that numbers are a valid data type in XPath 1.0: you can use them directly, and expressions can be evaluated to yield a number.

Strings

XPath expressions can also be text strings (defined in XPath as "a sequence of zero or more characters," where the characters are Unicode characters by default). For example, in the XPath expression //planet[name="Mars"], which returns all <planet> elements in a document that have <name> children with text equal to "Mars", "Mars" is an XPath expression of data type string.

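Here's another example: if you have an XML element like this: <planet color = "RED">Mars</planet>, and that element is the context node, the XPath expression attribute::color selects the color attribute, whose string value is "RED". Strictly speaking, attribute::color evaluates to a node-set holding that attribute node; string(attribute::color) evaluates to the string "RED" itself.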
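A minimal sketch of reading that attribute's string value with Python's xml.etree.ElementTree follows. ElementTree exposes attributes through Element.get rather than through the attribute:: axis, and the "mass" attribute name is a made-up example of something that is not present.

```python
import xml.etree.ElementTree as ET

# The element from the text, parsed on its own.
planet = ET.fromstring('<planet color = "RED">Mars</planet>')

color = planet.get("color")   # string value of the color attribute
text = planet.text            # text content of the element
missing = planet.get("mass")  # hypothetical attribute that does not exist

print(color, text, missing)
```

One difference worth keeping in mind: Python returns None for the missing attribute, whereas in XPath 1.0 string() applied to an empty node-set yields an empty string.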

So as you can see, XPath expressions can also be of the string type.

Booleans

Besides numbers and strings, XPath expressions can also be Boolean true/false values. For example, take a look at the XPath expression position()=3. The position() function returns the position of a node among its siblings, and if position() returns 3, the XPath expression position()=3 is true. Otherwise, the expression position()=3 is false.

Here's another example: in the XPath expression //planet[attribute::color = "RED"], which returns all <planet> elements that have a color attribute with value of "RED", attribute::color = "RED" is an XPath expression that returns a Boolean value. In fact, in the expression //planet[attribute::color], the expression attribute::color is treated as a Boolean test. It evaluates to a node-set, which the predicate converts to true if the current <planet> element has a color attribute (the node-set is nonempty), and to false otherwise.
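Both predicates can be sketched with Python's xml.etree.ElementTree, which accepts the abbreviated @color form rather than attribute::color. The sample document and variable names are assumptions invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the last planet has no color attribute.
SAMPLE = """<planets>
  <planet color="RED"><name>Mars</name></planet>
  <planet color="BLUE"><name>Earth</name></planet>
  <planet><name>Venus</name></planet>
</planets>"""

root = ET.fromstring(SAMPLE)

# Predicate that compares the attribute's value with a string.
red = root.findall(".//planet[@color='RED']")

# Predicate that only tests whether the attribute exists.
has_color = root.findall(".//planet[@color]")

# The same existence test, spelled out as one Boolean per <planet>.
color_present = ["color" in p.attrib for p in root.findall(".//planet")]

red_names = [p.findtext("name") for p in red]
has_color_names = [p.findtext("name") for p in has_color]
print(red_names, has_color_names, color_present)
```

The per-element list makes the Boolean explicit: each <planet> is kept only when its test comes out true.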

Booleans, then, make up the third data type that XPath expressions can evaluate to, in addition to numbers and strings.

Node-Sets

The fourth data type, node-sets, is where all the excitement lies in XPath 1.0. A node-set holds zero or more nodes (note that a node-set might contain only a single node), and working with node-sets is what really lets you work with the data in an XML document.

For example, the XPath expression //planet[position() > 3] returns all the <planet> elements after the first three <planet> children of their parent. That means you get a node-set of <planet> elements when you evaluate this expression. Node-sets are the most interesting data type because a node-set holds actual nodes from the XML document. For example, you can filter a set of nodes that you want to work with into your node-set, ignoring all the rest of the data in the XML document. And treating a whole collection of nodes as one single data item (a node-set) is very handy.

Here's another example: the expression child::planet[attribute::color = "RED"] will return a node-set containing all <planet> children of the context node that have a color attribute with value of "RED".

Node-sets are data types that are unique to XPath. You may be familiar with strings, numbers, and Booleans already, but node-sets are where the real meat of XPath is. A node-set is really a collection, not just a single data item like a string or a number; a node-set can hold either a single node or multiple nodes, but either way it's still called a node-set.
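As a loose analogy, Python's xml.etree.ElementTree returns a list of elements from findall whether zero, one, or several nodes match. The sketch below uses a made-up document. A Python list is ordered and could hold duplicates, so it only approximates an XPath 1.0 node-set.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with three <planet> children.
root = ET.fromstring(
    "<planets>"
    "<planet color='RED'/><planet color='BLUE'/><planet color='RED'/>"
    "</planets>"
)

many = root.findall("planet[@color='RED']")     # several nodes
one = root.findall("planet[@color='BLUE']")     # a single node, still a list
none = root.findall("planet[@color='GREEN']")   # no nodes: an empty list

# The list holds the actual nodes from the tree, not copies, so a change
# made through the result is visible in the document itself.
one[0].set("color", "CYAN")

print(len(many), len(one), len(none), root[1].get("color"))
```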

DATA TYPES IN XPATH 2.0

The data types in XPath 1.0 are pretty primitive: just numbers, strings, Booleans, and node-sets. Augmenting these types was one of the big pushes behind XPath 2.0, which supports data types taken from XML schemas, as we're going to see in the second half of the book. Schemas support a great many data types, such as boolean, byte, date, dateTime, int, long, nonPositiveInteger, normalizedString, positiveInteger, short, unsignedByte, unsignedInt, unsignedLong, unsignedShort, and many more.

If an XPath expression returns a node-set containing multiple nodes, the XPath processor software will return all those nodes to you, as we've seen in the XPath Visualiser.

So what about the actual nodes in a node-set? What kinds of nodes can you have? That's where the data model comes in, and we're going to turn to that topic next.