Introducing XPointers | XPath Kick Start: Navigating XML with XPath 1.0 and 2.0

XPointers are designed to let you point to specific locations inside a document. Not much software supports XPointers yet, but some does: for example, the Adobe Scalable Vector Graphics (SVG) browser plug-in (http://www.adobe.com/svg/viewer/install/main.html), Amaya (http://www.w3.org/Amaya/), and an application named XLip by Fujitsu.

To make it easier to implement, the XPointer specification was split into parts. It is now divided into three recommendations and a working draft:

http://www.w3.org/TR/xptr-framework/ The XPointer framework, which gives a general overview and points you to the three schemes
http://www.w3.org/TR/xptr-element/ The element scheme
http://www.w3.org/TR/xptr-xmlns/ The namespace scheme
http://www.w3.org/TR/xptr-xpointer/ The general XPointer scheme

The XPointer framework specification introduces the idea of XPointers and indicates how you can use barenames (that is, simple names that match an element's ID) as XPointers. It also points to the other three parts of the specification that you can use in XPointers: the element scheme, the namespace scheme, and the general XPointer scheme. We'll take a look at all four of these ways of creating XPointers here, starting with barenames.

Using Barenames

The XPointer Framework specification (www.w3.org/TR/xptr-framework/) says that you can use barenames, that is, plain names with no scheme wrapped around them, as XPointers. A barename identifies the element whose ID matches the name. You can append an XPointer to the end of a URI in an XLink by preceding it with a #, as here, where we're pointing at the element with the ID data in http://www.XPathCorp.com/jamesbond.xml:

 <insurance xmlns:xlink = "http://www.w3.org/1999/xlink"
     xlink:type = "simple"
     xlink:show = "new"
     xlink:href="http://www.XPathCorp.com/jamesbond.xml#data">
     Health Insurance
 </insurance>
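To make the mechanics concrete, here is a minimal Python sketch of what a processor might do with such a link: split the href at the # and then look the barename up in the target document. The sample document is a hypothetical stand-in for jamesbond.xml, and the sketch assumes that IDs are carried by an attribute named id (in a real document, which attributes count as IDs is normally declared in the DTD or schema).

```python
from urllib.parse import urldefrag
import xml.etree.ElementTree as ET

# Hypothetical stand-in for jamesbond.xml; the real document is not shown in the text.
SAMPLE = """<agent>
  <name>James Bond</name>
  <data id="data">
    <policy>Health</policy>
  </data>
</agent>"""


def split_pointer(href):
    """Split an href into (document URI, XPointer fragment)."""
    url, fragment = urldefrag(href)
    return url, fragment


def resolve_barename(root, name):
    """Return the first element in document order whose id attribute equals name.

    Assumption: the attribute named "id" is the ID attribute.
    """
    for elem in root.iter():
        if elem.get("id") == name:
            return elem
    return None


doc_uri, pointer = split_pointer("http://www.XPathCorp.com/jamesbond.xml#data")
target = resolve_barename(ET.fromstring(SAMPLE), pointer)
print(doc_uri, pointer, target.tag)
```

The sketch parses an inline string instead of fetching the URI, so it runs without network access.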

Besides using barenames like this, you can also use the element, namespace, and general XPointer schemes. They're coming up next.

Using the Element Scheme

The element scheme (www.w3.org/TR/xptr-element/) was split out of the general XPointer scheme to make XPointer easier to implement. Here, you use element() to identify elements by ID, not by name. For example, to find the element with the ID data, you could use this expression:

 <insurance xmlns:xlink = "http://www.w3.org/1999/xlink"
     xlink:type = "simple"
     xlink:show = "new"
     xlink:href="http://www.XPathCorp.com/jamesbond.xml#element(data)">
     Health Insurance
 </insurance>

You can also specify child sequences by number. For example, to pick out the third child element of the element with the ID data, and then that element's first child element, you can use this XPath-like expression:

 <insurance xmlns:xlink = "http://www.w3.org/1999/xlink"
     xlink:type = "simple"
     xlink:show = "new"
     xlink:href="http://www.XPathCorp.com/jamesbond.xml#element(data/3/1)">
     Health Insurance
 </insurance>

In other words, the element scheme lets you specify an element by ID, and you can also add location steps, using numbers, to access child elements.
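The lookup that element() describes is simple enough to sketch in a few lines of Python. The function name resolve_element_pointer and the sample document are hypothetical, and the sketch again assumes that IDs live in an attribute named id. It takes the text inside element(...), finds the starting element by ID (or starts at the document element for a pure child sequence such as /1/3), and then walks the numbered steps, counting child elements from 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the shape matters for the illustration.
SAMPLE = """<agent>
  <data id="data">
    <a/>
    <b/>
    <c><first/><second/></c>
  </data>
</agent>"""


def resolve_element_pointer(root, pointer_data):
    """Resolve the text inside element(...), such as 'data/3/1' or '/1/1'."""
    head, *steps = pointer_data.split("/")
    if head:
        # Starts with a name: look the element up by its (assumed) id attribute.
        current = next((e for e in root.iter() if e.get("id") == head), None)
    else:
        # Pure child sequence: the first step can only select the document element.
        if not steps:
            return None
        first, *steps = steps
        current = root if first == "1" else None
    for step in steps:
        if current is None:
            return None
        children = list(current)  # child elements only, in document order
        index = int(step) - 1     # steps count from 1
        current = children[index] if 0 <= index < len(children) else None
    return current


root = ET.fromstring(SAMPLE)
print(resolve_element_pointer(root, "data/3/1").tag)
```

With this sample, data/3/1 lands on the first child of the third child of the element whose ID is data. A step that runs past the available children returns None rather than raising an error.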

Using the Namespace Scheme

You can use the namespace scheme (see www.w3.org/TR/xptr-xmlns/) to use namespaces when pointing to data. An xmlns() part doesn't point at anything by itself; it binds a prefix to a namespace URI so that the pointer part that follows can use that prefix. For example, if the <invoice> element you wanted to access was part of the job namespace, you could specify that element this way:

 <insurance xmlns:xlink = "http://www.w3.org/1999/xlink"
     xlink:type = "simple"
     xlink:show = "new"
     xlink:href="http://www.XPathCorp.com/adjunct.xml#xmlns(job=http://XPathCorp.com/job) xpointer(//job:invoice)">
     Health Insurance
 </insurance>

This XPointer accesses <job:invoice> in the document http://www.XPathCorp.com/adjunct.xml.
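As a rough illustration of how the prefix binding and the namespace-aware lookup fit together, the following Python sketch assumes the pointer has the form xmlns(job=http://XPathCorp.com/job) xpointer(//job:invoice). The helper names and the sample document are hypothetical. The parser ignores the ^ escaping rules of the framework, and the lookup handles only paths of the form //name because ElementTree implements a small subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for adjunct.xml.
SAMPLE = """<adjunct xmlns:job="http://XPathCorp.com/job">
  <job:invoice>1001</job:invoice>
  <invoice>not in the job namespace</invoice>
</adjunct>"""


def split_pointer_parts(pointer):
    """Split 'scheme(data) scheme(data)' into (scheme, data) pairs.

    Nested parentheses inside the data are kept balanced; ^ escapes are ignored.
    """
    parts, i, n = [], 0, len(pointer)
    while i < n:
        if pointer[i].isspace():
            i += 1
            continue
        open_at = pointer.index("(", i)
        scheme = pointer[i:open_at]
        depth, j = 1, open_at + 1
        while j < n and depth:
            if pointer[j] == "(":
                depth += 1
            elif pointer[j] == ")":
                depth -= 1
            j += 1
        if depth:
            raise ValueError("unbalanced parentheses in pointer")
        parts.append((scheme, pointer[open_at + 1:j - 1]))
        i = j
    return parts


def resolve(root, pointer):
    """Apply xmlns() bindings, then evaluate the first xpointer() part."""
    namespaces = {}
    for scheme, data in split_pointer_parts(pointer):
        if scheme == "xmlns":
            prefix, uri = data.split("=", 1)
            namespaces[prefix.strip()] = uri.strip()
        elif scheme == "xpointer":
            if not data.startswith("//"):
                raise NotImplementedError("sketch only handles //name paths")
            # ElementTree paths are relative to an element, so //x becomes .//x
            return root.findall("." + data, namespaces)
    return []


found = resolve(
    ET.fromstring(SAMPLE),
    "xmlns(job=http://XPathCorp.com/job) xpointer(//job:invoice)",
)
print([e.text for e in found])
```

Only the element in the job namespace is returned; the unprefixed <invoice> element is skipped because its name belongs to no namespace.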

Using the General XPointer Scheme

Besides using the element and namespace schemes, you can also use the general XPointer scheme. The element and namespace schemes were added to XPointer in an attempt to make XPointer easier to use, but you can still create general XPointers.

The original form of XPointers is still in working draft form as of this writing (see www.w3.org/TR/xptr-xpointer/). This is actually where the real core of XPointer lies, because you can use full XPath expressions to point to exactly what you want. (In fact, as we're going to see, the general XPointer scheme extends XPath.) Here's an example using a general XPointer. Note that you use the xpointer() function to contain the XPath expression:

 <insurance xmlns:xlink = "http://www.w3.org/1999/xlink"
     xlink:type = "simple"
     xlink:show = "new"
     xlink:href="http://www.XPathCorp.com/invoices.xml#xpointer(/child::*[5]/child::*[last()])">
     Health Insurance
 </insurance>

In this example, we're accessing the last child element of the fifth element in www.XPathCorp.com/invoices.xml. That's the way you use full XPath expressions with general XPointers: you pass them to the xpointer function.

You can use the same axes as you use in XPath 1.0 in XPointers, but there are two new node tests. Here are the node tests you can use with XPointers:

* Any element
node() Any node
text() A text node
comment() A comment node
processing-instruction() A processing instruction node
point() A point in a resource
range() A range in a resource

Note the point and range node tests. A point represents one specific location in a document, and a range is made up of everything between two points. To support points and ranges, the general XPointer scheme extends the concept of nodes to locations. A location is an XPath node, a point, or a range. Node sets become location sets in the XPointer specification. We'll take a look at working with points and ranges next.

Creating XPointer Points

You can create an XPointer point with two items: a node and an index, which can hold a positive integer or zero. The node sets an origin for the point, and the index specifies the distance between the point and that origin. What units are used for the index? There are two different ways of measuring the index: you can measure in terms of characters, or in terms of a number of nodes.

If the starting node can contain only text, not child nodes, the index is automatically measured in characters. The points you create this way are called character-points. Here, the index must be a positive integer or zero.

For example, you might treat <text> as a container node in this case:

 <text>Hello there!</text>

Here, the text has twelve characters, so there are thirteen character-points: one before every character, plus one after the last character. The character-point at index zero is right before the first character, "H"; the character-point at index 1 is just before the "e", and so on.

On the other hand, when the start node has child nodes (that is, when it's an element node or the root node), the index of a point is measured in child nodes. For example, an index of zero means the point is just before any child nodes. An index of 5 specifies a point immediately after the fifth child node.
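The index rule is the same in both cases: a point sits in a gap between items, and the index counts how many items come before it. The short Python sketch below models that idea only. The Point class and the describe function are hypothetical names, a string stands in for a text container, and a list of labels stands in for a node's children.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    container: object  # a string (character-point) or a list of children (node-point)
    index: int         # zero or a positive integer


def describe(point):
    """Return (item just before the point, item just after the point)."""
    items = point.container
    if not 0 <= point.index <= len(items):
        raise ValueError("index must lie between 0 and the number of items")
    before = items[point.index - 1] if point.index > 0 else None
    after = items[point.index] if point.index < len(items) else None
    return before, after


print(describe(Point("Hello there!", 0)))  # (None, 'H')
print(describe(Point("Hello there!", 1)))  # ('H', 'e')

children = ["c1", "c2", "c3", "c4", "c5", "c6"]
print(describe(Point(children, 5)))        # ('c5', 'c6')
```

Index 0 has nothing before it, and index 5 in the child list sits immediately after the fifth child, matching the description above.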

How do you actually create points? You can use the point() node test with an XPath 1.0 predicate like this: point()[position()=9]. For example, if you wanted to locate a point right before the "l" in the text "Goldfinger", where that text is in the <name> element of the first <review> element in the <reviews> element, you might do something like this:

 xpointer(/reviews/review[1]/name/text()/point()[position() = 2])

Creating XPointer Ranges

To create a range, all you need is two points, a start point and an end point. They have to be in the same document, and, as you might expect, the start point must be before or the same as the end point.

COLLAPSED RANGES

If the start point and the end point are the same point, the range you create is a collapsed range.
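These rules can be captured in a small data structure. The sketch below is a simplified Python model, not anything defined by the XPointer drafts: the Range class is a hypothetical name, and both points are reduced to character indexes within one piece of text, which takes care of the same-document requirement by construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    start: int  # index of the start point
    end: int    # index of the end point

    def __post_init__(self):
        # The start point must come before, or coincide with, the end point.
        if self.start < 0 or self.end < self.start:
            raise ValueError("start must be >= 0 and must not come after end")

    @property
    def collapsed(self):
        """True when the start point and the end point are the same point."""
        return self.start == self.end

    def extract(self, text):
        """Return everything between the two points."""
        return text[self.start:self.end]


greeting = "Hello there!"
print(Range(0, 5).extract(greeting), Range(3, 3).collapsed)
```

A collapsed range selects no characters at all, so it behaves much like a single point.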

There are a few functions that were added to XPointer to create ranges:

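range(location-set) takes the locations you pass to it and returns, for each one, a range that completely covers that location.
range-inside(location-set) returns a range or ranges covering the contents of each location in the location set; if you pass an element, the result is a range that encloses all that is inside the element.
range-to(location-set) returns a range for each location in the current context; each range runs from the start of that context location to the end of the location that the argument identifies.
string-range(location-set, string [, index [, length]]) returns a range for every match to a search string; the optional index and length adjust where each returned range starts, relative to the start of the match, and how many characters it spans.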

For example, here's how you would use the string-range function to return a location set of ranges for all matches to the word "Goldfinger" throughout a document:

 string-range(/*, "Goldfinger")
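To show what "a range for every match" amounts to, here is a simplified Python sketch that searches the string-value of the document element and reports each match as a (start, end) pair of character offsets. The string_ranges function and the sample document are hypothetical. A real XPointer processor would return ranges whose points sit inside the underlying text nodes, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
SAMPLE = """<reviews>
  <review><name>Goldfinger</name><text>Goldfinger is a classic.</text></review>
  <review><name>Dr. No</name><text>Made before Goldfinger.</text></review>
</reviews>"""


def string_ranges(root, needle):
    """Return (start, end) offsets of every match of needle in root's string-value."""
    if not needle:
        return []
    # The string-value of an element is the concatenation of all its text descendants.
    text = "".join(root.itertext())
    ranges = []
    pos = text.find(needle)
    while pos != -1:
        ranges.append((pos, pos + len(needle)))
        pos = text.find(needle, pos + 1)
    return ranges


root = ET.fromstring(SAMPLE)
matches = string_ranges(root, "Goldfinger")
print(matches)
```

Because the search runs over the concatenated string-value, the match in the <name> element and the match at the start of the following <text> element are reported as two separate, adjacent ranges.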

SHOP TALK: SPLITTING THE XPOINTER SPECIFICATION

Why was the XPointer specification split into four parts, one of which is still in working draft stage? As with some other specifications, I get the feeling that it was simply too complex to get much use. In a rather rare disclosure of the inside story on this, take a look at http://www.w3.org/XML/2002/10/LinkingImplementations.html. Here's a quote from that document:

"The XPointer specification entered CR status 2000-06-07, then had a second CR 2001-09-11. During the second CR phase, several implementations were identified. Few, however, implemented the whole XPointer specification. Points and Ranges, the principal extensions beyond XPath, were rarely implemented. In early January of 2002, when it became clear that the XPointer specification would not move to PR, the XML Linking Working Group revisited the specification and began to factor it into separate documents."

In my experience, many people who might have used XPointers did not know XPath 1.0 well enough to implement them. It appears that W3C made things easier for such people by allowing for barenames and easier syntax. The general form of XPointers, which allows for the use of XPath 1.0, is still in working draft form, and it's beginning to look like it won't get past that stage.