11.1 XPointers on URLs | XML in a Nutshell, Third Edition

A URL that identifies a document looks something like http://java.sun.com:80/products/jndi/index.html. In this example, the scheme, http, tells you what protocol the application should use to retrieve the document. The authority, java.sun.com:80 in this example, tells you from which host the application should retrieve the document. The authority may also contain the port to connect to on that host and the username and password to use. The path, /products/jndi/index.html in this example, tells you which file in which directory to ask the server for. This may not always be a real file in a real filesystem, but it should be a complete document that the server knows how to generate and return. You're already familiar with all of this, and XPointer doesn't change any of it.
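As a quick illustration of the three pieces named above, the following sketch splits the example URL with Python's standard urllib.parse module. The variable names are chosen only for this illustration.

```python
from urllib.parse import urlsplit

# Split the example URL from the text into its components.
parts = urlsplit("http://java.sun.com:80/products/jndi/index.html")

scheme = parts.scheme      # the protocol to use
authority = parts.netloc   # host plus optional port (and user info, if any)
host = parts.hostname      # the host alone
port = parts.port          # the port as an integer, or None if absent
path = parts.path          # what to ask the server for

print(scheme, authority, host, port, path)
```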

You probably also know that some URLs contain fragment identifiers that point to a particular named anchor inside the document the URL locates. The fragment identifier is separated from the path by the octothorpe, #. For example, if we added the fragment download to the previous URL, it would become http://java.sun.com:80/products/jndi/index.html#download. When a web browser follows a link to this URL, it looks for a named anchor in the document at http://java.sun.com:80/products/jndi/index.html with the name download, such as this one:

 <a name="download"></a>

It would then scroll the browser window to the position in the document where the anchor with that name is found. This is a simple and straightforward system, and it works well for HTML's simple needs. However, it has one major drawback: to link to a particular point of a particular document, you must be able to modify the document to which you're linking in order to insert a named anchor at the point to which you want to link. XPointer endeavors to eliminate this restriction by allowing authors to specify where they want to link to using full XPath expressions as fragment identifiers. Furthermore, XPointer expands on XPath by providing operations to select particular points in or ranges of an XML document that do not necessarily coincide with any one node or set of nodes. For instance, an XPointer can describe the range of text currently selected by the mouse.
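To make the role of the fragment identifier concrete, here is a minimal sketch that separates the fragment from the rest of a URL using urllib.parse.urldefrag. The second URL, on www.example.com, is a hypothetical address invented for this illustration; it simply shows that an XPointer sits in the same position as a named anchor would.

```python
from urllib.parse import urldefrag

# The named-anchor URL from the text.
base, fragment = urldefrag(
    "http://java.sun.com:80/products/jndi/index.html#download"
)

# A hypothetical URL whose fragment identifier is an XPointer
# rather than the name of an anchor.
xp_base, xp_fragment = urldefrag(
    "http://www.example.com/people.xml#xpointer(//first_name)"
)

print(base, fragment)
print(xp_base, xp_fragment)
```

In both cases the part before the # identifies the document to retrieve, and the part after it is interpreted only once the document has been loaded.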

The most basic form of XPointer is simply an XPath expression (often, although not necessarily, a location path) enclosed in the parentheses of xpointer(). For example, these are all acceptable XPointers:

 xpointer(/)
 xpointer(//first_name)
 xpointer(id('sec-intro'))
 xpointer(/people/person/name/first_name/text())
 xpointer(//middle_initial[position()=1]/../first_name)
 xpointer(//profession[.="physicist"])
 xpointer(/child::people/child::person[@index<4000])
 xpointer(/child::people/child::person/attribute::id)

Not all of these XPointers necessarily refer to a single element. Depending on which document the XPointer is evaluated relative to, an XPointer may identify zero, one, or more than one node. Most commonly the nodes identified are elements, but they can also be attribute nodes or text nodes, as well as points or ranges.
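The following sketch shows how the same kind of expression can match zero, one, or several nodes depending on the document. It uses Python's xml.etree.ElementTree, which supports only a limited subset of XPath and no XPointer at all, so the descendant search //name is written in its relative form .//name. The sample document is an assumption made up for this illustration.

```python
import xml.etree.ElementTree as ET

# A small hypothetical document to evaluate the expressions against.
doc = ET.fromstring("""
<people>
  <person id="p1">
    <name><first_name>Alan</first_name><last_name>Turing</last_name></name>
  </person>
  <person id="p2">
    <name>
      <first_name>Richard</first_name>
      <middle_initial>P</middle_initial>
      <last_name>Feynman</last_name>
    </name>
  </person>
</people>
""")

# Count how many elements each descendant search locates.
paths = (".//first_name", ".//middle_initial", ".//nickname")
counts = {path: len(doc.findall(path)) for path in paths}

for path, count in counts.items():
    print(path, "matched", count, "element(s)")
```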

If you're uncertain whether a given XPointer will locate something, you can back it up with an alternative XPointer. For example, this XPointer looks first for first_name elements. However, if it doesn't find any, it looks for last_name elements instead:

 xpointer(//first_name)xpointer(//last_name)

The last_name elements will be found only if there are no first_name elements. You can string as many of these XPointer parts together as you like. For example, this XPointer looks first for first_name elements. If it doesn't find any, it then seeks out last_name elements. If it doesn't find any of those, it looks for middle_initial elements. If it doesn't find any of those, it returns an empty node-set:

 xpointer(//first_name)xpointer(//last_name)xpointer(//middle_initial)

No special separator character or whitespace is required between the individual xpointer() parts, although whitespace is allowed. This XPointer means the same thing:

 xpointer(//first_name) xpointer(//last_name) xpointer(//middle_initial)
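The fallback behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an XPointer implementation: the helper names split_parts and resolve are hypothetical, only parts of the simple form //name are evaluated (through ElementTree's limited XPath support), and parentheses inside string literals or escaped parentheses are not handled.

```python
import xml.etree.ElementTree as ET

PREFIX = "xpointer("

def split_parts(pointer):
    """Split 'xpointer(a) xpointer(b)' into ['a', 'b'], tracking nested parentheses."""
    parts, i = [], 0
    while i < len(pointer):
        if pointer[i].isspace():          # whitespace between parts is optional
            i += 1
            continue
        if not pointer.startswith(PREFIX, i):
            raise ValueError(f"expected 'xpointer(' at offset {i}")
        i += len(PREFIX)
        start, depth = i, 1
        while i < len(pointer) and depth:
            if pointer[i] == "(":
                depth += 1
            elif pointer[i] == ")":
                depth -= 1
            i += 1
        if depth:
            raise ValueError("unbalanced parentheses")
        parts.append(pointer[start:i - 1])
    return parts

def resolve(root, pointer):
    """Return the matches of the first part that locates anything, else []."""
    for expr in split_parts(pointer):
        if not expr.startswith("//"):
            raise ValueError("this sketch only evaluates //name expressions")
        matches = root.findall("." + expr)   # '//name' becomes './/name'
        if matches:
            return matches
    return []

# Hypothetical document with no first_name elements, so the second part wins.
doc = ET.fromstring("<people><person><last_name>Turing</last_name></person></people>")
found = resolve(doc, "xpointer(//first_name) xpointer(//last_name)")
print([element.tag for element in found])
```

Later parts are consulted only when every earlier part comes back empty, which mirrors the left-to-right order described in the text.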