Location Paths


Although there are many different kinds of XPath expressions, the one that's of primary use in Java programs is the location path. A location path selects a set of nodes from an XML document. Each location path is composed of one or more location steps. Each location step has an axis, a node test and, optionally, one or more predicates. Furthermore, each location step is evaluated with respect to a particular context node. A double colon ( :: ) separates the axis from the node test, and each predicate is enclosed in square brackets.

Some examples will help explain these terms. Consider the simple XML-RPC request document in Example 16.3.

The Differences between the XPath and DOM Data Models

The XPath data model is similar to the DOM data model but not quite the same. The most important differences relate to the names and values of nodes. In XPath, only attributes, elements, processing instructions, and namespace nodes have names, which are divided into a local part and a namespace URI. XPath does not use pseudo-names like #document and #comment. The other big difference is that, in XPath, the value of an element or root node is the concatenation of the values of all its text node descendants, not null as it is in DOM. For example, the XPath value of <p>Hello</p> is the string Hello, and the XPath value of <p>Hello<em>Goodbye</em></p> is the string HelloGoodbye.

Other differences between the DOM and XPath data models include the following:

  • XPath does not have separate nodes for CDATA sections. CDATA sections are simply merged with their surrounding text.

  • XPath does not include any representation of the document type declaration.

  • Each XPath text node always contains the maximum contiguous run of text. No text node is adjacent to any other text node.

  • All entity references must be resolved before an XPath data model can be built. Once resolved, they are not reported separately from their contents.

  • In XPath, the element that contains an attribute is the parent of that attribute, but the attribute is not a child of the element.

  • Every namespace that has scope for an element or attribute is an XPath namespace node for that element or attribute. This does not refer to namespace declaration attributes, such as xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" , but rather to all elements for which a namespace mapping is defined. There are no nodes in an XPath model that directly represent namespace declaration attributes.
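The difference in element values is easy to check for yourself. The following is a minimal sketch, assuming the standard javax.xml.parsers and javax.xml.xpath packages (not necessarily the XPath API you'll end up using). The class name ElementValueDemo and its helper methods are invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ElementValueDemo {

    // Parses a small document and returns its root element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // The DOM node value of the root element. DOM defines this as null.
    static String domValue(String xml) {
        return parse(xml).getNodeValue();
    }

    // The XPath string-value of the same element: the concatenation
    // of all its text node descendants.
    static String xpathValue(String xml) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate("string(.)", parse(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p>Hello<em>Goodbye</em></p>";
        System.out.println(domValue(xml));   // prints null
        System.out.println(xpathValue(xml)); // prints HelloGoodbye
    }
}
```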

Example 16.3 An XML-RPC Request Document
 <?xml version="1.0"?>
 <methodCall>
   <methodName>calculateFibonacci</methodName>
   <params>
     <param>
       <value>
         <int>23</int>
       </value>
     </param>
   </params>
 </methodCall>

Exactly how the context node for a location step is determined depends on the environment in which the location step appears. When using XPath in Java code, you normally pass the context node as an argument to the method that evaluates the expression. In XSLT the context node is normally the currently matched node in the input document. In other environments, other means are provided to choose the context node. For now, let's just pick the root methodCall element as the context node. Then child::methodName is a location step that selects a node-set containing the single methodName element. It moves along the child axis with the node test methodName. That is, it selects all the children of the context node named methodName. child::params returns a node-set that contains the single params element.

Location paths are not guaranteed to return a node-set that contains exactly one node (and assuming they do is a very common mistake). child::* returns a node-set containing two element nodes, one for the methodName element and one for the params element. The asterisk is a wildcard node test that matches any element, regardless of name.
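To make the role of the context node concrete, here is a minimal sketch in Java. It assumes the standard javax.xml.xpath API, which may not be the API you end up using, and the class name LocationStepDemo and its count() helper are invented for this illustration. It parses Example 16.3, passes the root methodCall element as the context node, and counts the nodes each location step selects.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LocationStepDemo {

    // The XML-RPC request document from Example 16.3
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<methodCall>\n"
      + "  <methodName>calculateFibonacci</methodName>\n"
      + "  <params>\n"
      + "    <param>\n"
      + "      <value>\n"
      + "        <int>23</int>\n"
      + "      </value>\n"
      + "    </param>\n"
      + "  </params>\n"
      + "</methodCall>";

    // Evaluates a location step with the root methodCall element as the
    // context node and returns the number of nodes selected.
    static int count(String step) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Node context = doc.getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList result =
                (NodeList) xpath.evaluate(step, context, XPathConstants.NODESET);
            return result.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("child::methodName")); // prints 1
        System.out.println(count("child::params"));     // prints 1
        System.out.println(count("child::*"));          // prints 2
    }
}
```

Because the context node is just an argument, the same step can give a different answer from a different starting point. From the params element, for instance, child::methodName would select nothing.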

Axes

There are 13 axes along which a location step can move. Each selects a different subset of the nodes in the document, depending on the context node. The axes are as follows:

self

The node itself.

child

All child nodes of the context node. (Attributes and namespaces are not considered to be children of the node to which they belong.)

descendant

All nodes completely contained inside the context node (between the end of its start-tag and the beginning of its end-tag); that is, all child nodes, plus all children of the child nodes, and all children of the children's children, and so forth. This axis is empty if the context node is not an element node or a root node.

descendant-or-self

All descendants of the context node and the context node itself.

parent

The node that most immediately contains the context node. The root node has no parent. The parent of the root element, comments, and processing instructions in the document's prolog and epilog is the root node. The parent of every other node is an element node. The parent of a namespace or attribute node is the element node that contains it, even though namespaces and attributes aren't children of their parent elements.

ancestor

The root node and all element nodes that contain the context node.

ancestor-or-self

All ancestors of the context node and the context node itself.

preceding

All nonattribute, non-namespace nodes that come before the context node in document order and are not ancestors of the context node.

preceding-sibling

All nonattribute, non-namespace nodes that come before the context node in document order and have the same parent node.

following

All nonattribute, non-namespace nodes that follow the context node in document order and are not descendants of the context node.

following-sibling

All nonattribute, non-namespace nodes that follow the context node in document order and have the same parent node.

attribute

Attributes of the context node. This axis is empty if the context node is not an element node.

namespace

Namespaces in scope on the context node. This axis is empty if the context node is not an element node.

Consider the slightly more complex SOAP request document in Example 16.4. Let us pick the middle Quote element (the one whose symbol is AAPL) as the context node and move along each of the axes from there.

Example 16.4 A SOAP Request Document
 <?xml version="1.0"?>
 <!-- XPath axes example -->
 <SOAP-ENV:Envelope
   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
   <SOAP-ENV:Body>
     <Quote symbol="RHAT">
       <Price currency="USD">7.02</Price>
     </Quote>
     <Quote symbol="AAPL">
       <Price currency="USD">24.85</Price>
     </Quote>
     <Quote symbol="BAC">
       <Price currency="USD">68.59</Price>
     </Quote>
   </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>

  • The self axis contains one node: the middle Quote element that was chosen as the context node.

  • The child axis contains three nodes: a text node containing white space, an element node with the local name Price, and another text node containing only white space, in that order. [1]

    [1] All of the white space counts, although there are ways to get rid of it or ignore it if you want to, as you'll see later.

  • The descendant axis contains four nodes: a text node containing white space, an element node with the local name Price, a text node with the value "24.85," and another text node containing only white space, in that order.

  • The descendant-or-self axis contains five nodes: an element node with the local name Quote, a text node containing white space, an element node with the local name Price, a text node with the value "24.85," and another text node containing only white space, in that order.

  • The parent axis contains a single element node with the local name Body.

  • The ancestor axis contains three nodes: an element node with the local name Body, an element node with the local name Envelope, and the root node, in that order.

  • The ancestor-or-self axis contains four nodes: an element node with the local name Quote, an element node with the local name Body, an element node with the local name Envelope, and the root node, in that order.

  • The preceding axis contains nine nodes: a text node containing only white space, another text node containing only white space, a text node containing the string 7.02, an element node named Price, another text node containing only white space, an element node named Quote, two more text nodes containing only white space (one inside Body before the first Quote and one inside Envelope before Body), and a comment node, in that order. Note that ancestor elements and attribute and namespace nodes are not counted along the preceding axis.

  • The preceding-sibling axis contains three nodes: a text node containing white space, an element node with the name Quote and the symbol RHAT, and another text node containing only white space.

  • The following axis contains eight nodes: a text node containing only white space, a Quote element node, a text node containing only white space, a Price element node, a text node containing the string 68.59, and three text nodes containing only white space. Descendants are not included in the following axis.

  • The following-sibling axis contains three nodes: a text node containing white space, an element node with the name Quote and the symbol BAC, and another text node containing only white space.

  • The attribute axis contains one attribute node with the name symbol and the value AAPL.

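  • The namespace axis contains two namespace nodes that come from declarations in the document, one with the name SOAP-ENV and the value http://schemas.xmlsoap.org/soap/envelope/ and the other with an empty string name and the value http://namespaces.cafeconleche.org/xmljava/ch2/ . In addition, XPath 1.0 gives every element an implicit namespace node for the xml prefix, so a processor may report three.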

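Generally these sets would be narrowed further by a node test. For example, if the location step preceding::stk:Quote were applied to this context node, with the prefix stk bound to the http://namespaces.cafeconleche.org/xmljava/ch2/ namespace URI, then the resulting node-set would contain only a single node, the RHAT Quote element. (An unprefixed preceding::Quote would select nothing, because the Quote elements in this document are in a namespace.)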
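If you'd like to check these counts rather than take my word for them, the following sketch walks each axis from the AAPL Quote element and counts the nodes it finds. It assumes the standard javax.xml.xpath API, and the class name AxisCountDemo and its count() helper are invented for this illustration. The node test node() is used so that nothing on the axis is filtered out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AxisCountDemo {

    static final String STK = "http://namespaces.cafeconleche.org/xmljava/ch2/";

    // The SOAP request document from Example 16.4, white space included
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!-- XPath axes example -->\n"
      + "<SOAP-ENV:Envelope\n"
      + "  xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\"\n"
      + "  xmlns=\"" + STK + "\">\n"
      + "  <SOAP-ENV:Body>\n"
      + "    <Quote symbol=\"RHAT\">\n"
      + "      <Price currency=\"USD\">7.02</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"AAPL\">\n"
      + "      <Price currency=\"USD\">24.85</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"BAC\">\n"
      + "      <Price currency=\"USD\">68.59</Price>\n"
      + "    </Quote>\n"
      + "  </SOAP-ENV:Body>\n"
      + "</SOAP-ENV:Envelope>";

    // Counts every node on the named axis, using the middle (AAPL)
    // Quote element as the context node.
    static int count(String axis) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Node context = doc.getElementsByTagNameNS(STK, "Quote").item(1);
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate(
                "count(" + axis + "::node())", context, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] axes = {
            "self", "child", "descendant", "descendant-or-self", "parent",
            "ancestor", "ancestor-or-self", "preceding", "preceding-sibling",
            "following", "following-sibling", "attribute", "namespace"
        };
        for (String axis : axes) {
            System.out.println(axis + ": " + count(axis));
        }
    }
}
```

The namespace count is the one to treat with caution, since processors differ in whether they report the implicit xml namespace node.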

Node Tests

The axis chooses the direction in which to move from the context node. The node test determines what kinds of nodes will be selected along that axis. The node tests are as follows:

Name

Any element or attribute with the specified name. If the name is prefixed, then the local name and namespace URI are compared, not the qualified names. If the name is not prefixed, then the element must be in no namespace at all. An unprefixed name in an XPath expression never matches an element in a namespace, even in the default namespace. When using XPath to search for an unprefixed element like Quote that is in a namespace, you have to use a prefixed name instead, such as stk:Quote. Exactly how the prefix is mapped to the namespace depends on the environment in which the XPath expression is used.

*

Along the attribute axis, the asterisk matches all attribute nodes. Along the namespace axis, the asterisk matches all namespace nodes. Along all other axes, the asterisk matches all element nodes.

prefix:*

Any element or attribute in the namespace mapped to the prefix.

comment()

Any comment.

text()

Any text node.

node()

Any node.

processing-instruction()

Any processing instruction.

processing-instruction('target')

Any processing instruction with the specified target.

For example, once again considering the SOAP request document in Example 16.4 and choosing the AAPL Quote element as the context node, consider these location steps:

  • self::* selects one node, the middle Quote element that serves as the context node.

  • child::* selects one node, an element node with the name Price and the value 24.85.

  • child::Price selects no nodes because there are no Price elements in this document that are not in any namespace.

  • child::stk:Price selects one node, an element node with the name Price and the value 24.85, provided that the prefix stk is bound to the http://namespaces.cafeconleche.org/xmljava/ch2/ namespace URI in the local environment.

  • descendant::text() selects three nodes: a text node containing white space, a text node with the value "24.85," and another text node containing only white space.

  • descendant-or-self::* selects two nodes: an element node with the name Quote and an element node with the name Price.

  • parent::SOAP-ENV:Envelope selects an empty node-set, because the parent of the context node is not SOAP-ENV:Envelope.

  • ancestor::SOAP-ENV:Envelope selects one node, the document element, assuming that the local environment maps the prefix SOAP-ENV to the namespace URI http://schemas.xmlsoap.org/soap/envelope/ .

  • ancestor::SOAP-ENV:* selects two nodes: the SOAP-ENV:Body element and the SOAP-ENV:Envelope element, again assuming that the prefixes are properly mapped.

  • ancestor-or-self::* selects three nodes: an element node with the local name Quote, an element node with the local name Body, and an element node with the local name Envelope.

  • preceding::comment() selects the single comment in the prolog.

  • preceding-sibling::node() selects three nodes: a text node containing white space, an element node with the name Quote and the symbol RHAT, and another text node containing only white space, in that order.

  • following::* selects two nodes: a Quote element node and a Price element node.

  • following-sibling::processing-instruction() returns an empty node-set.

  • attribute::symbol selects the attribute node with the name symbol and the value AAPL.

  • namespace::SOAP-ENV returns a node-set containing a namespace node with name SOAP-ENV and the value http://schemas.xmlsoap.org/soap/envelope/ .

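  • namespace::* returns a node-set containing the two namespace nodes that come from declarations in the document: one with the name SOAP-ENV and the value http://schemas.xmlsoap.org/soap/envelope/ and the other with an empty string name and the value http://namespaces.cafeconleche.org/xmljava/ch2/ . A processor that follows XPath 1.0 strictly also includes the implicit namespace node for the xml prefix.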
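The namespace behavior of name tests is the part that trips people up, so here is a hedged sketch that shows it in Java. It assumes the standard javax.xml.xpath API, where prefixes are bound through a NamespaceContext. The class name NodeTestDemo, the BINDINGS constant, and the count() helper are invented for this illustration. The context node is again the AAPL Quote element of Example 16.4.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeTestDemo {

    static final String SOAP = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String STK = "http://namespaces.cafeconleche.org/xmljava/ch2/";

    // The SOAP request document from Example 16.4
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!-- XPath axes example -->\n"
      + "<SOAP-ENV:Envelope\n"
      + "  xmlns:SOAP-ENV=\"" + SOAP + "\"\n"
      + "  xmlns=\"" + STK + "\">\n"
      + "  <SOAP-ENV:Body>\n"
      + "    <Quote symbol=\"RHAT\">\n"
      + "      <Price currency=\"USD\">7.02</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"AAPL\">\n"
      + "      <Price currency=\"USD\">24.85</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"BAC\">\n"
      + "      <Price currency=\"USD\">68.59</Price>\n"
      + "    </Quote>\n"
      + "  </SOAP-ENV:Body>\n"
      + "</SOAP-ENV:Envelope>";

    // Prefix bindings for the XPath expressions. They belong to the Java
    // environment and are independent of the prefixes used in the document.
    static final NamespaceContext BINDINGS = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            if ("stk".equals(prefix)) return STK;
            if ("SOAP-ENV".equals(prefix)) return SOAP;
            return XMLConstants.NULL_NS_URI;
        }
        // Reverse lookups are not needed to evaluate these expressions.
        public String getPrefix(String uri) { return null; }
        public Iterator<String> getPrefixes(String uri) {
            return Collections.emptyIterator();
        }
    };

    // Counts the nodes a location step selects from the AAPL Quote element.
    static int count(String step) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Node context = doc.getElementsByTagNameNS(STK, "Quote").item(1);
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(BINDINGS);
            NodeList result =
                (NodeList) xpath.evaluate(step, context, XPathConstants.NODESET);
            return result.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("child::Price"));          // prints 0
        System.out.println(count("child::stk:Price"));      // prints 1
        System.out.println(count("descendant::text()"));    // prints 3
        System.out.println(count("ancestor::SOAP-ENV:*"));  // prints 2
    }
}
```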

Predicates

Each location step can have zero or more predicates that further filter the node-set. A predicate is an XPath expression in square brackets which is evaluated for each node selected by the location step. If the predicate is true, then the node is kept in the node-set. If the predicate is false, then the node is removed from the node-set. For example, given the same SOAP request document, suppose that the context node is now the SOAP-ENV:Body element and that the stk prefix is mapped to the http://namespaces.cafeconleche.org/xmljava/ch2/ namespace URI. This location step returns a node-set containing all of the Quote elements whose price is less than ten:

 child::stk:Quote[child::stk:Price < 10] 

If this XPath expression were embedded in an XML document, you might need to escape the less-than sign as &lt;. However, this is not necessary when using XPath expressions in Java programs.

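There can be more than one predicate. For example, the following location step checks both that the price is greater than ten and that the currency is U.S. dollars. Because the currency attribute belongs to the Price element rather than to the Quote element, the second predicate has to reach it through the Price child:

 child::stk:Quote[child::stk:Price > 10][child::stk:Price/attribute::currency = "USD"] 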

If the predicate returns a number, then the node is kept in the set only if the number is equal to the position of the context node in the context node list. For example, the following location step selects the third Quote child of the context node but not the first or second:

 child::stk:Quote[3] 

If the context node has fewer than three Quote children, then this returns an empty node-set.

If the predicate returns an empty string, then the context node is deleted from the set. If the string is not empty, then the context node is not deleted. For example, this location step selects Quote elements whose symbol attribute has a value:

 child::stk:Quote[string(attribute::symbol)] 

This is not quite the same as selecting the Quote elements that have a symbol attribute. The following Quote element would not be matched by the above location step:

 <Quote symbol="">    <Price currency="USD">17.32</Price> </Quote> 

If the predicate returns a node-set, then the source node is kept in the returned set only if the predicate node-set is nonempty. It is deleted otherwise. For example, the following location step finds Quote children of the context node that have at least one Price child:

 child::stk:Quote[child::stk:Price] 

This location step finds Quote children of the context node that have at least one Price child and at least one Quantity child:

 child::stk:Quote[child::stk:Price][child::stk:Quantity] 

When applied to the SOAP-ENV:Body element in Example 16.4, this step returns an empty node-set because none of its Quote children have a Quantity child.
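Here is a sketch of how the boolean, numeric, string, and node-set predicates above behave when evaluated from Java. It assumes the standard javax.xml.xpath API. The class name PredicateDemo, the BINDINGS constant, and the symbols() helper are invented for this illustration. The context node is the SOAP-ENV:Body element, and each result is reported as the comma-separated symbol attributes of the Quote elements selected.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateDemo {

    static final String SOAP = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String STK = "http://namespaces.cafeconleche.org/xmljava/ch2/";

    // The SOAP request document from Example 16.4
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!-- XPath axes example -->\n"
      + "<SOAP-ENV:Envelope\n"
      + "  xmlns:SOAP-ENV=\"" + SOAP + "\"\n"
      + "  xmlns=\"" + STK + "\">\n"
      + "  <SOAP-ENV:Body>\n"
      + "    <Quote symbol=\"RHAT\">\n"
      + "      <Price currency=\"USD\">7.02</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"AAPL\">\n"
      + "      <Price currency=\"USD\">24.85</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"BAC\">\n"
      + "      <Price currency=\"USD\">68.59</Price>\n"
      + "    </Quote>\n"
      + "  </SOAP-ENV:Body>\n"
      + "</SOAP-ENV:Envelope>";

    // Binds the stk prefix used in the expressions to the quote namespace.
    static final NamespaceContext BINDINGS = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "stk".equals(prefix) ? STK : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) { return null; }
        public Iterator<String> getPrefixes(String uri) {
            return Collections.emptyIterator();
        }
    };

    // Evaluates a step that selects Quote elements, using SOAP-ENV:Body as
    // the context node, and returns their symbol attributes joined by commas.
    static String symbols(String step) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Node body = doc.getElementsByTagNameNS(SOAP, "Body").item(0);
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(BINDINGS);
            NodeList quotes =
                (NodeList) xpath.evaluate(step, body, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < quotes.getLength(); i++) {
                result.add(((Element) quotes.item(i)).getAttribute("symbol"));
            }
            return String.join(",", result);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Boolean predicate: prints RHAT
        System.out.println(symbols("child::stk:Quote[child::stk:Price < 10]"));
        // Numeric predicate: prints BAC
        System.out.println(symbols("child::stk:Quote[3]"));
        // Node-set predicates, the second of which is always empty here:
        // prints an empty line
        System.out.println(
            symbols("child::stk:Quote[child::stk:Price][child::stk:Quantity]"));
    }
}
```

Note that the less-than sign appears unescaped in the Java string literal, as described earlier.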

Compound Location Paths

The forward slash ( / ) combines location steps into a location path. Each node selected by the first step serves in turn as the context node for the second step, and the nodes selected from all of those contexts are combined into a single node-set. That node-set supplies the context nodes for the third step in the same way, and so on.

Continuing with Example 16.4 and still using the second Quote element as the context node, consider the following location paths (here I assume that the environment for the XPath expressions maps the prefix stk to the namespace URI http://namespaces.cafeconleche.org/xmljava/ch2/ and the prefix SOAP-ENV to the namespace URI http://schemas.xmlsoap.org/soap/envelope/ ):

child::stk:Price/attribute::currency

This selects the currency attribute node currency="USD".

preceding-sibling::stk:Quote/descendant::*

This selects one node, the first Price element in the document.

parent::*/child::stk:Quote

This selects all three Quote element nodes in the document, including the context node itself.

parent::*/child::stk:Quote[child::stk:Price > 20]

This selects the AAPL and the BAC Quote element nodes, but not the RHAT Quote element node.

parent::*/descendant::stk:Price

This selects all three Price element nodes in the document.

parent::*/child::stk:Quote[attribute::symbol='BAC']/child::stk:Price

This selects the Price element node of the BAC Quote element.

parent::*/descendant::stk:Price/attribute::currency

This selects all three currency attribute nodes in the document.
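The compound paths above can be checked with a few lines of Java. This is a sketch only, assuming the standard javax.xml.xpath API. The class name CompoundPathDemo, the BINDINGS constant, and the count() helper are invented for this illustration. Each path is evaluated with the AAPL Quote element as the context node.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CompoundPathDemo {

    static final String SOAP = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String STK = "http://namespaces.cafeconleche.org/xmljava/ch2/";

    // The SOAP request document from Example 16.4
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!-- XPath axes example -->\n"
      + "<SOAP-ENV:Envelope\n"
      + "  xmlns:SOAP-ENV=\"" + SOAP + "\"\n"
      + "  xmlns=\"" + STK + "\">\n"
      + "  <SOAP-ENV:Body>\n"
      + "    <Quote symbol=\"RHAT\">\n"
      + "      <Price currency=\"USD\">7.02</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"AAPL\">\n"
      + "      <Price currency=\"USD\">24.85</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"BAC\">\n"
      + "      <Price currency=\"USD\">68.59</Price>\n"
      + "    </Quote>\n"
      + "  </SOAP-ENV:Body>\n"
      + "</SOAP-ENV:Envelope>";

    // Binds the prefixes used in the location paths.
    static final NamespaceContext BINDINGS = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            if ("stk".equals(prefix)) return STK;
            if ("SOAP-ENV".equals(prefix)) return SOAP;
            return XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) { return null; }
        public Iterator<String> getPrefixes(String uri) {
            return Collections.emptyIterator();
        }
    };

    // Counts the nodes a location path selects from the AAPL Quote element.
    static int count(String path) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Node context = doc.getElementsByTagNameNS(STK, "Quote").item(1);
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(BINDINGS);
            NodeList result =
                (NodeList) xpath.evaluate(path, context, XPathConstants.NODESET);
            return result.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("child::stk:Price/attribute::currency"));       // prints 1
        System.out.println(count("preceding-sibling::stk:Quote/descendant::*")); // prints 1
        System.out.println(count("parent::*/child::stk:Quote"));                 // prints 3
        System.out.println(
            count("parent::*/child::stk:Quote[child::stk:Price > 20]"));         // prints 2
    }
}
```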

Absolute Location Paths

So far all of the location paths have been relative to a specified context node, and I've just identified that context node in prose. When we begin discussing XPath APIs, you'll see that most methods for evaluating an XPath expression have a context node argument. But not all location paths require context nodes. In particular, a location path that begins with a forward slash ( / ) is an absolute path that starts at the root node of the document (not the root element but the root node).

Continuing with the same Example 16.4 and once again assuming that the environment binds the prefix stk to the namespace URI http://namespaces.cafeconleche.org/xmljava/ch2/ and the prefix SOAP-ENV to the namespace URI http://schemas.xmlsoap.org/soap/envelope/ , consider these location paths:

/child::SOAP-ENV:Envelope/child::SOAP-ENV:Body/child::stk:Quote/child::stk:Price

This selects all three Price element nodes.

/child::SOAP-ENV:Envelope/child::SOAP-ENV:Body

This selects the single SOAP-ENV:Body element node.

/descendant::stk:Price

This selects all three Price element nodes in the document.

/descendant::stk:Quote[child::stk:Price > 20]

This selects the Quote element nodes whose Price is greater than 20; that is, it selects the AAPL and the BAC Quote element nodes, but not the RHAT Quote element node.

/child::SOAP-ENV:Body

This returns an empty node-set, because the root element of the document is SOAP-ENV:Envelope, not SOAP-ENV:Body.

/descendant::*/attribute::*

This returns a node-set that contains all attribute nodes in the document.

/descendant-or-self::node()

This returns a node-set that contains all nonattribute, non-namespace nodes in the document.

/

This selects the root node of the document.
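One way to see what "absolute" means in practice is to evaluate the same path from two different context nodes. The following sketch assumes the standard javax.xml.xpath API, which still asks for a context node even when the path is absolute. The class name AbsolutePathDemo, the BINDINGS constant, and the count() helper are invented for this illustration. The second argument of count() names the element to use as the context node (the first element in the document with that local name).

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AbsolutePathDemo {

    static final String SOAP = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String STK = "http://namespaces.cafeconleche.org/xmljava/ch2/";

    // The SOAP request document from Example 16.4
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!-- XPath axes example -->\n"
      + "<SOAP-ENV:Envelope\n"
      + "  xmlns:SOAP-ENV=\"" + SOAP + "\"\n"
      + "  xmlns=\"" + STK + "\">\n"
      + "  <SOAP-ENV:Body>\n"
      + "    <Quote symbol=\"RHAT\">\n"
      + "      <Price currency=\"USD\">7.02</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"AAPL\">\n"
      + "      <Price currency=\"USD\">24.85</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"BAC\">\n"
      + "      <Price currency=\"USD\">68.59</Price>\n"
      + "    </Quote>\n"
      + "  </SOAP-ENV:Body>\n"
      + "</SOAP-ENV:Envelope>";

    // Binds the prefixes used in the location paths.
    static final NamespaceContext BINDINGS = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            if ("stk".equals(prefix)) return STK;
            if ("SOAP-ENV".equals(prefix)) return SOAP;
            return XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) { return null; }
        public Iterator<String> getPrefixes(String uri) {
            return Collections.emptyIterator();
        }
    };

    // Counts the nodes a path selects when the context node is the first
    // element in the document with the given local name.
    static int count(String path, String contextLocalName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Node context =
                doc.getElementsByTagNameNS("*", contextLocalName).item(0);
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(BINDINGS);
            NodeList result =
                (NodeList) xpath.evaluate(path, context, XPathConstants.NODESET);
            return result.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A relative path depends on where you start.
        System.out.println(count("child::stk:Price", "Quote"));          // prints 1
        System.out.println(count("child::stk:Price", "Envelope"));       // prints 0
        // An absolute path does not.
        System.out.println(count("/descendant::stk:Price", "Quote"));    // prints 3
        System.out.println(count("/descendant::stk:Price", "Envelope")); // prints 3
    }
}
```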

Abbreviated Location Paths

XPath location paths can use the abbreviations listed in Table 16.2. The semantics are the same, but the syntax is a little easier to type.

Table 16.2. Abbreviated Syntax for XPath

Abbreviation    Expanded Form
Name            child::Name
@Name           attribute::Name
//              /descendant-or-self::node()/
.               self::node()
..              parent::node()

Using the abbreviated forms, the previous batch of relative XPaths from Example 16.4 that used the second Quote element as the context node can be rewritten as follows:

stk:Price/@currency

This selects the currency attribute node currency="USD".

preceding-sibling::stk:Quote//*

This isn't an exact abbreviation for preceding-sibling::stk:Quote/descendant::* (// expands to /descendant-or-self::node()/, not /descendant::), but the node-set selected is the same: the first Price element in the document.

../stk:Quote

This selects all three Quote element nodes in the document, including the context node itself.

..//stk:Price

This also isn't an exact abbreviation for the original expression, but again it selects the same node-set, which in this case contains all three Price element nodes in the document.

../stk:Quote[stk:Price > 20]

This selects the AAPL and the BAC Quote element nodes, but not the RHAT Quote element node.

../stk:Quote[@symbol='BAC']/stk:Price

This selects the Price child element node of the BAC Quote element.

..//stk:Price/@currency

This too isn't an exact abbreviation for the original expression, but once again it selects the same node-set that contains all three currency attribute nodes in the document.

Absolute location paths can also be abbreviated. In this case, // is especially convenient because at the start of a location path it produces a node-set containing every nonattribute, non-namespace node in the document. It is important to note, however, that it is quite inefficient in most XPath processors. If it's possible to rewrite an expression not to use // (or the unabbreviated descendant or descendant-or-self axes), then you probably should.

Following are some examples of abbreviated absolute location paths that apply to Example 16.4.

/SOAP-ENV:Envelope/SOAP-ENV:Body/stk:Quote

This selects all three Quote element nodes.

/SOAP-ENV:Envelope/SOAP-ENV:Body

This selects the single SOAP-ENV:Body element node.

//stk:Price

This selects all three Price element nodes in the document.

//stk:Quote[stk:Price > 20]

This selects the Quote element nodes whose Price is greater than 20; that is, it selects the AAPL and the BAC Quote element nodes, but not the RHAT Quote element node.

/stk:Price

This returns an empty node-set, because the root element of the document is SOAP-ENV:Envelope, not Price.

//@*

This returns a node-set that contains all attribute nodes in the document.

//.

This returns a node-set that contains all nonattribute, non-namespace nodes in the document.
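Since several of these abbreviations are not exact expansions of the originals, it can be reassuring to confirm that they select the same nodes. The sketch below does that with a small XPath trick: two node-sets are identical exactly when their union is no larger than either one. It assumes the standard javax.xml.xpath API. The class name AbbreviationDemo, the BINDINGS constant, and the sameNodes() helper are invented for this illustration. Relative paths are evaluated from the AAPL Quote element of Example 16.4.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AbbreviationDemo {

    static final String SOAP = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String STK = "http://namespaces.cafeconleche.org/xmljava/ch2/";

    // The SOAP request document from Example 16.4
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!-- XPath axes example -->\n"
      + "<SOAP-ENV:Envelope\n"
      + "  xmlns:SOAP-ENV=\"" + SOAP + "\"\n"
      + "  xmlns=\"" + STK + "\">\n"
      + "  <SOAP-ENV:Body>\n"
      + "    <Quote symbol=\"RHAT\">\n"
      + "      <Price currency=\"USD\">7.02</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"AAPL\">\n"
      + "      <Price currency=\"USD\">24.85</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"BAC\">\n"
      + "      <Price currency=\"USD\">68.59</Price>\n"
      + "    </Quote>\n"
      + "  </SOAP-ENV:Body>\n"
      + "</SOAP-ENV:Envelope>";

    // Binds the prefixes used in the location paths.
    static final NamespaceContext BINDINGS = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            if ("stk".equals(prefix)) return STK;
            if ("SOAP-ENV".equals(prefix)) return SOAP;
            return XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) { return null; }
        public Iterator<String> getPrefixes(String uri) {
            return Collections.emptyIterator();
        }
    };

    // Returns true if paths a and b select exactly the same nodes from the
    // AAPL Quote element. The union a | b removes duplicates, so the sets
    // are equal when count(a | b), count(a), and count(b) all agree.
    static boolean sameNodes(String a, String b) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Node context = doc.getElementsByTagNameNS(STK, "Quote").item(1);
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(BINDINGS);
            String check = "count(" + a + " | " + b + ") = count(" + a + ")"
                + " and count(" + a + ") = count(" + b + ")";
            return (Boolean) xpath.evaluate(check, context, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameNodes("stk:Price/@currency",
            "child::stk:Price/attribute::currency"));        // prints true
        System.out.println(sameNodes("..//stk:Price",
            "parent::*/descendant::stk:Price"));             // prints true
        System.out.println(sameNodes("stk:Price",
            "..//stk:Price"));                               // prints false
    }
}
```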

Combining Location Paths

Occasionally it's useful to select a node-set that's built from multiple, more or less unrelated parts of an XML document. For example, you might want to select all of the Price elements and all of the Quote elements in a document. //stk:Price selects all of the prices. //stk:Quote selects all of the quotes. You can use the vertical bar, |, to combine these two node-sets into one, as in the following examples.

  • //stk:Price | //stk:Quote selects all of the Price element nodes and all of the Quote element nodes in the document.

  • //@currency | //stk:Price selects all of the currency attribute nodes and all of the Price element nodes.

  • //stk:Quote/stk:Price | //stk:Quote/stk:Quantity selects all of the Price and Quantity child elements of all Quote elements.
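As a quick check on how the vertical bar behaves, the following sketch counts the nodes in a few unions over Example 16.4. It assumes the standard javax.xml.xpath API. The class name UnionDemo, the BINDINGS constant, and the count() helper are invented for this illustration. The last expression in main() shows that a node selected by both operands appears in the result only once.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UnionDemo {

    static final String STK = "http://namespaces.cafeconleche.org/xmljava/ch2/";

    // The SOAP request document from Example 16.4
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!-- XPath axes example -->\n"
      + "<SOAP-ENV:Envelope\n"
      + "  xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\"\n"
      + "  xmlns=\"" + STK + "\">\n"
      + "  <SOAP-ENV:Body>\n"
      + "    <Quote symbol=\"RHAT\">\n"
      + "      <Price currency=\"USD\">7.02</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"AAPL\">\n"
      + "      <Price currency=\"USD\">24.85</Price>\n"
      + "    </Quote>\n"
      + "    <Quote symbol=\"BAC\">\n"
      + "      <Price currency=\"USD\">68.59</Price>\n"
      + "    </Quote>\n"
      + "  </SOAP-ENV:Body>\n"
      + "</SOAP-ENV:Envelope>";

    // Binds the stk prefix used in the expressions to the quote namespace.
    static final NamespaceContext BINDINGS = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "stk".equals(prefix) ? STK : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) { return null; }
        public Iterator<String> getPrefixes(String uri) {
            return Collections.emptyIterator();
        }
    };

    // Counts the nodes an expression selects, with the document itself
    // as the context node.
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(BINDINGS);
            NodeList result =
                (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return result.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//stk:Price | //stk:Quote"));           // prints 6
        System.out.println(count("//@currency | //stk:Price"));           // prints 6
        System.out.println(count("//stk:Price | //stk:Quote/stk:Price")); // prints 3
    }
}
```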



Processing XML with Java. A Guide to SAX, DOM, JDOM, JAXP, and TrAX
ISBN: 0201771861
Year: 2001
Pages: 191
