XPointer is the syntax you use for the most general addressing of parts of an XML object [XPointer]. When an HTTP URI references XML, any fragment specifier to select a portion of the XML is written in XPointer syntax. XPointer can also be called explicitly to extract a subset of data (see Chapter 19). Note that XPointer does not include any way to point into the DTD or XML declaration for a document. XPointer extends XPath so that you can use it in the following ways:
XPointer Encoding
When an XPointer contains characters that are significant to XPointer but should be treated as data, each such character is prefixed with a circumflex ("^"). One example is an unbalanced parenthesis, either "(" or ")", appearing in a literal string. Because of this use, the circumflex itself is a special character, and every literal circumflex must be doubled ("^^"). For example,
    xpointer( string-range ( /., "f)^" ) )
would be encoded as follows:
    xpointer( string-range ( /., "f^)^^") )
The preceding rule applies only to general XPointer encoding. An XPointer must also be URI encoded (see Section 7.1.4) if it is used in a URI and XML encoded if it appears in XML. So, for example,
    xpointer(string-range(//example,"André :-)"))
would always be XPointer encoded for the unbalanced parenthesis in the "smiley" as follows:
    xpointer(string-range(//example,"André :-^)"))
If it appeared as such in an XML attribute value delimited by double quotes and encoded in US-ASCII, you would escape the double quotes and the accented "é":
    xpointer(string-range(//example,&quot;Andr&#xE9; :-^)&quot;))
If it appeared generally as a URI reference fragment specifier, you would encode the space, double quotes, circumflex, and accented "é":
    xpointer(string-range(//example,%22Andr%C3%A9%20:-%5E)%22))
7.3.1 Forms of XPointer
XPointer has three possible forms:
Full XPointers
A full XPointer can be complicated, but you will rarely encounter the full form. It consists of a sequence of one or more parts that can optionally be separated by white space. Each part has the following format:
    scheme(string-with-balanced-parentheses)
The "xpointer" and "xmlns" schemes are defined below. The "string-with-balanced-parentheses" means any string in which all parentheses, both "(" and ")", are properly nested, except for those escaped with the circumflex ("^"), as described earlier. Each part of a full XPointer ends with the close parenthesis that matches the open parenthesis after the scheme name. If multiple parts are present, they are evaluated from left to right until one succeeds. If all parts fail, then the full XPointer fails. As yet undefined schemes are permitted for future expansion. Encountering a scheme you do not understand is equivalent to a failure of that part. (This scheme-based system allows data types other than general XML to define their own schemes.)
You use the "xmlns" scheme to set up the namespace context for XPointer expression evaluations occurring farther to the right. The parenthesized string immediately after this scheme must consist of a namespace prefix, an equal sign, and a namespace URI. Do not use quotation marks around the URI. For example:
    xmlns(foo=http://bar.example/x)
The "xmlns" scheme adds its namespace declaration to the namespace context and then "fails" so that evaluation proceeds to the next part of the full XPointer to its right. If two "xmlns" parts try to bind the same prefix, the one evaluated later (the one farther to the right) overrides the first. Any "xmlns" part that attempts to bind the "xml" prefix is ignored. Instead, that prefix is always bound to "http://www.w3.org/XML/1998/namespace"; this binding is always part of the namespace context for evaluating XPointer expressions.
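The part syntax and the "xmlns" rules above can be illustrated with a small sketch. It is not a conforming XPointer processor: the function names (escape_xpointer, split_parts, xpointer_parts_with_context) are hypothetical, the escaping helper conservatively escapes every parenthesis rather than only unbalanced ones, and no XPath evaluation is attempted. It only splits a full XPointer into (scheme, data) parts, undoes the circumflex escapes, and tracks the namespace context that each "xpointer" part would see.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"


def escape_xpointer(data: str) -> str:
    """Circumflex-escape literal data (assumption: escape every paren, not only unbalanced ones)."""
    return "".join("^" + ch if ch in "()^" else ch for ch in data)


def split_parts(pointer: str) -> list:
    """Split a full XPointer into (scheme, unescaped data) pairs."""
    parts = []
    i, n = 0, len(pointer)
    while i < n:
        if pointer[i].isspace():          # parts may be separated by white space
            i += 1
            continue
        open_at = pointer.find("(", i)
        if open_at == -1:
            raise ValueError("expected scheme(...)")
        scheme = pointer[i:open_at].strip()
        depth, j, data = 1, open_at + 1, []
        while j < n:
            ch = pointer[j]
            if ch == "^":                 # escaped character: take the next one literally
                if j + 1 >= n or pointer[j + 1] not in "()^":
                    raise ValueError("circumflex must precede '(', ')' or '^'")
                data.append(pointer[j + 1])
                j += 2
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:            # matching close parenthesis ends the part
                    j += 1
                    break
            data.append(ch)
            j += 1
        if depth != 0:
            raise ValueError("unbalanced parentheses")
        parts.append((scheme, "".join(data)))
        i = j
    return parts


def xpointer_parts_with_context(pointer: str) -> list:
    """Pair each 'xpointer' part with the namespace context built by 'xmlns' parts to its left."""
    context = {"xml": XML_NS}             # always bound; cannot be overridden
    result = []
    for scheme, data in split_parts(pointer):
        if scheme == "xmlns":
            prefix, _, uri = data.partition("=")
            if prefix.strip() != "xml":   # attempts to rebind "xml" are ignored
                context[prefix.strip()] = uri.strip()
        elif scheme == "xpointer":
            result.append((data, dict(context)))
        # any other scheme is treated as a part that fails and is skipped
    return result
```

A caller would try the returned expressions from left to right, using the context paired with each, and stop at the first one that yields a nonempty location-set.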
The "xpointer" scheme does the obvious thing: it interprets the parenthesized string immediately after it as an XPointer expression. If it is evaluated without error and yields a nonempty location-set, then that result is the value of the entire full XPointer. You can use the ability to provide a sequence of XPointer parts for various purposes. The following example shows general fail-over from one XPointer expression to another. It finds all "foo" elements that don't have a "foo" descendant or, if no such "foo" elements are present, all "bar" elements that do not have any child elements.
    xpointer(//foo[not(.//foo)])xpointer(//bar[not(./*)])
Bare Names
In comparison with a full XPointer, it is difficult to get much simpler than a bare name. A bare name is pretty much what it sounds like: just a token. It refers to the element with that token as an ID. In other words, the bare name fragment specifier
    #foo
is the same as the full XPointer fragment specifier
    #xpointer(id("foo"))
Child Sequences
A child sequence consists of a series of one or more decimal numbers, each preceded by a slash ("/"). No white space is permitted. The sequence may optionally be prefixed with a bare name. Two example fragment specifiers are shown here:
    #/2/7/18/2/8
    #pi/3/14
Such sequences can only locate elements. They do so by using each number to index into the child elements of the element found by the previous step. The starting point is the document root, if the child sequence starts with a slash, or the element that has the name as its ID, if the sequence starts with a name. The preceding examples are therefore equivalent to the following:
    #xpointer(/*[2]/*[7]/*[18]/*[2]/*[8])
    #xpointer(id("pi")/*[3]/*[14])
7.3.2 The XPath Extensions
XPath deals only with nodes. XPointer extends XPath, however, so that it can handle more general locations.
The locations permitted include a pointer into the middle of text as well as more general ranges, such as might result from a user clicking and dragging on a screen display of XML to include parts of two elements with different parents. In summary, the extensions to XPath have the following effects:
Location Extension: Point
XPointer adds the "point" type to XPath, defined as follows:
You need to be careful about thinking of a "point" as just a location in the external representation of XML. For example, consider "<a>xyz</a>". It is an element node with a child text node. The point using this element as container and index 1 is the point just after the text node. The point using the text node as a container and index 3 is the point just after the last character in the text. Although the two are different points, a poorly designed user interface might display them indistinguishably on a computer screen. A point location does not have an expanded name. It does have a null string value. The XPath set of node tests is extended to include "point( )" so that points can be selected from a location-set. The axes of a point are location-sets defined as follows:
Location Extension: Range
XPointer adds the "range" type to XPath. A range is defined simply as two points: the start point and the end point of the range. The start point must not follow the end point, and both must appear in the same XML document. The range represents the XML content and structure between its points. If the container node of one point of a range is an element, text, or root node, then the container node of the other point must also be one of these three types. If the container node of one point is any other type, then both the start and end points must reside within the same node. For example, you can have a range that appears within the string value of a processing instruction, where both points have the processing instruction as their container node. Alternatively, for a range from a processing instruction to (and including) an immediately following element, the points of the range might have as their container nodes the parents of the processing instruction and element. You could not, however, have a range from inside the text content of a processing instruction to inside the text content of a following element. A range with the same start and end point is called a collapsed range.
A range location does not have an expanded name. The string value of a range depends on the nature of its points. If both are character-points in the same container node, the string value is just what you would expect: the characters between the start and end points. Otherwise, the string value consists of the characters in text nodes that are found after the start point and before the end point.
For example, in
    <a>1#23<b attribute='value'>foo</b>xy#z</a>
the string value of a range from just after the first octothorpe ("#") to just before the second would simply be
    23fooxy
In the same example, the string value of the range from just before element "a" to just after element "a" is
    1#23fooxy#z
The XPath set of node tests is extended to include "range( )" so that ranges can be selected from a location-set. The axes of a range are the same as the axes of the start point of that range.
Covering Ranges
XPointer defines the concept of a covering range. A covering range that encompasses any type of location can be found as follows:
Document Order
XPointer extends the XPath concept of "document order" to include points and ranges. First, a "preceding node" is defined for all points as follows:
Using these definitions, you can find document orderings that XPath does not specify:
Initialization of Evaluation Context
The evaluation of XPointer expressions occurs in the same way as the evaluation of XPath expressions, albeit with a few changes:
7.3.3 XPointer Functions
The following functions have been added to the core XPath function library for the evaluation of XPointer expressions. In this section, the function name appears in boldface, preceded by the data type of the result in italics. Parameters are represented by their data type in italics. Parameters are followed by a question mark when they are optional.
location-set end-point (location-set)
The result is a point for each location in the input as specified by the following rules:
location-set here( )
This function fails if the XPointer in which it appears is not in XML. If it is in XML, then the function returns a location-set with a single member. If the XPointer expression being evaluated occurs in a text node, then the function returns the parent element. Otherwise, it returns the node containing the XPointer, presumably an attribute or processing instruction node. (When an XPointer occurs as element content, it isn't actually in that element but rather appears in a text child of that element.)
location-set origin( )
This function provides addressing relative to the origin of the link traversed to reach the document containing the XPointer. It returns a location-set with a single member: the element from which the traversal was initiated. An error occurs if you invoke this function where no such traversal has occurred or where the document from which traversal occurred is not XML. You cannot use this function in a URI reference fragment identifier where a URI is also provided, unless that URI identifies the same resource from which the traversal was initiated. See [Xlink] for more information on traversal.
location-set range (location-set)
This function returns the ranges covering all items in the input. A covering range is added to the output for each member of the input.
location-set range-inside (location-set)
This function returns the ranges covering the contents of all items in the input. For every input item that is a range or point, that range (or the collapsed range of the point) is added. For every other type of input item, a range is added with that item as the container node and a start point index of zero. The end point index is the number of children of that item or, if the input item is of a type that cannot have children, the length of the string value of the item.
location-set range-to (location-set)
Range-to is a special function in terms of the way in which it makes use of the context.
For each location in the context, it returns a range from the start point of the context location to the end point found by evaluating its parameter with that context location. A special-purpose extension to the XPath syntax permits the use of a range-to in place of an axis specifier and node test in a location path step. For example, to obtain a range from the element with the ID "label1" to the element with the ID "label2", you can write the following code:
    xpointer(id("label1")/range-to(id("label2")))
As another example, if portions of a document have been marked by EdStart and EdEnd elements, ranges covering all such pairs could be found with the following code:
    xpointer(//EdStart/range-to(following::EdEnd[1]))
location-set start-point (location-set)
This function returns a point for each location in the input as specified by the following rules:
location-set string-range (location-set, string, number?, number?)
For each item in the input location-set, the function searches the string value of that item for the second parameter. For each nonoverlapping occurrence found, it adds a range to the output location-set. If the optional numeric third and fourth parameters are absent, this range consists of two character-points encompassing the occurrence of the string. If the first numeric parameter is present, it gives the position of the first character of the resulting range relative to the beginning of the matched string. A value of 1 indicates no adjustment. If a second numeric parameter is present, it specifies the length of the resulting range in characters. The default, in the absence of a second numeric parameter, is that the resulting range extends to include the last matched character. If the numeric parameters are such that the resulting range would extend beyond either end of the string value, the XPointer part in which the function appears fails.
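The arithmetic of the two numeric parameters is easier to follow in code. The following is a minimal sketch that works on one plain string standing in for the string value of a single location. The function name string_range and its parameter names are hypothetical, ranges are reported as zero-based (start, end) character-point indexes rather than real XPointer range locations, and an empty search string is rejected as a simplification.

```python
def string_range(text, needle, offset=1, length=None):
    """Return (start, end) character-point index pairs for each match.

    Indexes are zero-based and text[start:end] is the string value of the
    range. `offset` is the 1-based position of the first character of the
    range relative to the match; `length` is the range length in characters.
    """
    if not needle:
        # Simplification in this sketch: empty search strings are not handled.
        raise ValueError("empty search string is not supported here")
    ranges = []
    pos = text.find(needle)
    while pos != -1:
        start = pos + offset - 1
        if length is None:
            end = pos + len(needle)       # default: through the last matched character
        else:
            end = start + length
        if start < 0 or end > len(text) or end < start:
            # Mirrors the rule that the XPointer part fails in this case.
            raise ValueError("range extends beyond the string value")
        ranges.append((start, end))
        pos = text.find(needle, pos + len(needle))   # nonoverlapping matches only
    return ranges


if __name__ == "__main__":
    sample = "Thomas Pyncheon"
    for start, end in string_range(sample, "Thomas", 8, 8):
        print(sample[start:end])
```

With an offset of 8 and a length of 8, the range starts seven characters past the beginning of the match "Thomas" and so selects the following word instead of the match itself.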