Although we already know, for example, that you can assign "." to a select attribute to refer to the current node, "." is not a valid match pattern; it's an XPath abbreviation for self::node(). Match patterns are restricted to only two axes, child and attribute, but XPath has thirteen axes, including self. You'll see all of those axes in this chapter, along with an example of each at work.
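To make the link between abbreviated and unabbreviated syntax concrete, here is a minimal Python sketch (Python is used only for illustration; it is not part of XSLT). It applies the XPath 1.0 abbreviation rules: "." stands for self::node(), ".." for parent::node(), "@" for attribute::, "//" for /descendant-or-self::node()/, and a bare name for child::. The names AXES, expand_step, and expand_path are hypothetical, and the sketch assumes simple location paths with no predicates or string literals.

```python
# The thirteen XPath 1.0 axes, as listed in the AxisName production.
AXES = (
    "ancestor", "ancestor-or-self", "attribute", "child", "descendant",
    "descendant-or-self", "following", "following-sibling", "namespace",
    "parent", "preceding", "preceding-sibling", "self",
)


def expand_step(step: str) -> str:
    """Rewrite one abbreviated location step in unabbreviated form."""
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if step.startswith("@"):
        return "attribute::" + step[1:]
    if "::" in step:
        return step  # the step already names its axis explicitly
    return "child::" + step  # child is the default axis


def expand_path(path: str) -> str:
    """Expand a simple location path (assumed: no predicates or literals)."""
    # '//' is short for '/descendant-or-self::node()/'
    path = path.replace("//", "/descendant-or-self::node()/")
    # An empty piece comes from a leading '/' (the root) and is kept as-is.
    return "/".join(expand_step(s) if s else s for s in path.split("/"))
```

For example, expand_path(".") yields "self::node()", which uses the self axis, an axis that match patterns do not allow, while expand_path("/PLANETS/PLANET") yields "/child::PLANETS/child::PLANET".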
Formally speaking, XPath enables you to refer to specific sections of XML documents; it's a language for addressing the various parts of such documents. XPath is what you use to indicate what part of a document you want to work with. W3C says of XPath:
The primary purpose of XPath is to address parts of an XML document. In support of this primary purpose, it also provides basic facilities for manipulation of strings, numbers and Booleans. XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax. XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document.
This quotation comes from the XPath 1.0 specification. Note that although the primary purpose of XPath is to address parts of XML documents, it also supports syntax to work with strings, numbers, and Boolean true/false values; that support is also very useful by itself, as you'll see.
Currently, XPath version 1.0 is the standard, but the requirements for XPath 2.0 have been released. There are no drafts of XPath 2.0 yet, just a list of what W3C plans to put into it. An overview at the end of this chapter looks at that list. You can find the primary XPath resources in two places:
The XPath 1.0 specification. You use XPath to locate and point to specific sections and elements in XML documents so that you can work with them. www.w3.org/TR/xpath
The XPath 2.0 requirements. XPath is being updated to offer more support for XSLT 2.0, primarily support for XML schemas. www.w3.org/TR/xpath20req
For more on XPath, see Inside XML. You might also want to take a look at these XPath tutorials:
www.zvon.org/xxl/XPathTutorial/General/examples.html
www.pro-solutions.com/tutorials/xpath/
The match patterns you've seen so far have returned node sets that you can loop over or match, but XPath is more general than that. In addition to node sets, XPath expressions can also return numbers, Boolean (true/false) values, and strings. Understanding XPath means understanding XPath expressions, and only one kind of XPath expression (although a very important kind) returns node sets that locate sections of a document. Other XPath expressions return other kinds of data, as you'll see.
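As a rough illustration of the four result types, here is a minimal Python sketch. Python's standard xml.etree.ElementTree module supports only a limited subset of XPath (no axis names and no count(), string(), or boolean() functions), so the sketch fetches a node set with ElementTree and then computes in Python the number, string, and Boolean results that the XPath expressions named in the comments would return. The PLANETS document below is hypothetical sample data invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOC = """<PLANETS>
  <PLANET><NAME>Mercury</NAME></PLANET>
  <PLANET><NAME>Venus</NAME></PLANET>
  <PLANET><NAME>Earth</NAME></PLANET>
</PLANETS>"""

root = ET.fromstring(DOC)  # root is the PLANETS element

# Node set: roughly what the location path /PLANETS/PLANET selects.
# ElementTree paths are relative to the element you call findall() on.
planets = root.findall("PLANET")

# Number: like count(/PLANETS/PLANET)
planet_count = len(planets)

# String: like string(/PLANETS/PLANET/NAME), which is the string-value of
# the first node in document order (all of its descendant text, joined).
names = root.findall("PLANET/NAME")
first_name = "".join(names[0].itertext()) if names else ""

# Boolean: like boolean(/PLANETS/PLANET[NAME='Pluto']); a node set
# converts to true if and only if it is non-empty.
has_pluto = bool(root.findall("PLANET[NAME='Pluto']"))
```

Here planet_count is 3, first_name is "Mercury", and has_pluto is False. In XSLT you would get the same results directly from the XPath expressions in the comments.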
The full syntax of XPath expressions is given in the XPath specification, and I include it here for reference. As it does for match patterns, W3C uses Extended Backus-Naur Form (EBNF) notation to give the formal definition of XPath expressions. (You can find an explanation of this grammar in www.w3.org/TR/REC-xml, section 6.) The following list includes the EBNF notations you need:
::= means is defined as
+ means one or more
* means zero or more
| means or
- means not
? means optional
Also, note that when an item is quoted with single quotation marks, as in 'ancestor' or '::', that item is meant to appear in an expression literally (like ancestor::PLANET). Items named Literal, by contrast, stand for quoted string literals in an expression, such as "Earth". Here's the formal definition of an XPath expression (named Expr in this definition) in full:
Expr ::= OrExpr
OrExpr ::= AndExpr | OrExpr 'or' AndExpr
AndExpr ::= EqualityExpr | AndExpr 'and' EqualityExpr
EqualityExpr ::= RelationalExpr | EqualityExpr '=' RelationalExpr | EqualityExpr '!=' RelationalExpr
RelationalExpr ::= AdditiveExpr | RelationalExpr '<' AdditiveExpr | RelationalExpr '>' AdditiveExpr | RelationalExpr '<=' AdditiveExpr | RelationalExpr '>=' AdditiveExpr
AdditiveExpr ::= MultiplicativeExpr | AdditiveExpr '+' MultiplicativeExpr | AdditiveExpr '-' MultiplicativeExpr
MultiplicativeExpr ::= UnaryExpr | MultiplicativeExpr MultiplyOperator UnaryExpr | MultiplicativeExpr 'div' UnaryExpr | MultiplicativeExpr 'mod' UnaryExpr
UnaryExpr ::= UnionExpr | '-' UnaryExpr
MultiplyOperator ::= '*'
UnionExpr ::= PathExpr | UnionExpr '|' PathExpr
PathExpr ::= LocationPath | FilterExpr | FilterExpr '/' RelativeLocationPath | FilterExpr '//' RelativeLocationPath
LocationPath ::= RelativeLocationPath | AbsoluteLocationPath
AbsoluteLocationPath ::= '/' RelativeLocationPath? | AbbreviatedAbsoluteLocationPath
RelativeLocationPath ::= Step | RelativeLocationPath '/' Step | AbbreviatedRelativeLocationPath
AbbreviatedAbsoluteLocationPath ::= '//' RelativeLocationPath
AbbreviatedRelativeLocationPath ::= RelativeLocationPath '//' Step
Step ::= AxisSpecifier NodeTest Predicate* | AbbreviatedStep
AxisSpecifier ::= AxisName '::' | AbbreviatedAxisSpecifier
AxisName ::= 'ancestor' | 'ancestor-or-self' | 'attribute' | 'child' | 'descendant' | 'descendant-or-self' | 'following' | 'following-sibling' | 'namespace' | 'parent' | 'preceding' | 'preceding-sibling' | 'self'
AbbreviatedAxisSpecifier ::= '@'?
NodeTest ::= NameTest | NodeType '(' ')' | 'processing-instruction' '(' Literal ')'
NameTest ::= '*' | NCName ':' '*' | QName
NodeType ::= 'comment' | 'text' | 'processing-instruction' | 'node'
Predicate ::= '[' PredicateExpr ']'
PredicateExpr ::= Expr
FilterExpr ::= PrimaryExpr | FilterExpr Predicate
PrimaryExpr ::= VariableReference | '(' Expr ')' | Literal | Number | FunctionCall
VariableReference ::= '$' QName
Number ::= Digits ('.' Digits?)? | '.' Digits
Digits ::= [0-9]+
FunctionCall ::= FunctionName '(' ( Argument ( ',' Argument )* )? ')'
FunctionName ::= QName - NodeType
Argument ::= Expr
AbbreviatedStep ::= '.' | '..'
As you can see, there's a lot to this specification, including calls to XPath functions (which you'll see in the next chapter). The best way to understand XPath expressions is to organize them by the data types they can return.
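The chain of rules from OrExpr down to UnaryExpr is how the grammar encodes operator precedence: each level binds more tightly than the one above it. To show that, here is a minimal, table-driven recursive-descent evaluator in Python for only the numeric and Boolean part of the grammar. It is an illustration, not an XPath engine: PrimaryExpr is limited to Number and '(' Expr ')' (no paths, literals, variables, or function calls), and the names TOKEN, LEVELS, Parser, and evaluate are hypothetical. Left-recursive rules such as AdditiveExpr ::= AdditiveExpr '+' MultiplicativeExpr become while loops, which keeps operators left-associative. The arithmetic follows XPath 1.0: div is IEEE 754 division (1 div 0 is Infinity), mod is the remainder of a truncating division, and = converts both sides to Booleans when either side is a Boolean.

```python
import math
import operator
import re

# Numbers follow the Number rule; multi-character operators are tried first.
TOKEN = re.compile(r"\s*(\d+(?:\.\d*)?|\.\d+|!=|<=|>=|[=<>+\-*()]|and|or|div|mod)")
COMPARE = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def to_number(v):  # XPath number(): true -> 1, false -> 0
    return float(v) if isinstance(v, bool) else v

def to_boolean(v):  # XPath boolean(): a number is true unless it is 0 or NaN
    return v if isinstance(v, bool) else (v != 0 and not math.isnan(v))

def xpath_div(a, b):  # IEEE 754: x div 0 is +/-Infinity, 0 div 0 is NaN
    if b != 0:
        return a / b
    if a == 0 or math.isnan(a):
        return math.nan
    return math.copysign(math.inf, a) * math.copysign(1.0, b)

def xpath_mod(a, b):  # remainder of a truncating division; sign follows the dividend
    return math.nan if b == 0 or not math.isfinite(a) else math.fmod(a, b)

ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "div": xpath_div, "mod": xpath_mod}

def arithmetic(op, a, b):  # arithmetic operators convert both operands to numbers
    return ARITH[op](to_number(a), to_number(b))

def equality(op, a, b):  # if either side is a Boolean, both are compared as Booleans
    if isinstance(a, bool) or isinstance(b, bool):
        a, b = to_boolean(a), to_boolean(b)
    return a == b if op == "=" else a != b

# One entry per grammar level, loosest first: OrExpr, AndExpr, EqualityExpr,
# RelationalExpr, AdditiveExpr, MultiplicativeExpr.
LEVELS = [
    (("or",), lambda op, a, b: to_boolean(a) or to_boolean(b)),
    (("and",), lambda op, a, b: to_boolean(a) and to_boolean(b)),
    (("=", "!="), equality),
    (tuple(COMPARE), lambda op, a, b: COMPARE[op](to_number(a), to_number(b))),
    (("+", "-"), arithmetic),
    (("*", "div", "mod"), arithmetic),
]

class Parser:
    def __init__(self, text):
        self.tokens, self.i, pos, text = [], 0, 0, text.rstrip()
        while pos < len(text):
            match = TOKEN.match(text, pos)
            if not match:
                raise SyntaxError(f"unexpected input: {text[pos:]!r}")
            self.tokens.append(match.group(1))
            pos = match.end()

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.i += 1
        return token

    def level(self, n):
        # Rule ::= Lower | Rule op Lower, with the left recursion turned into a loop
        if n == len(LEVELS):
            return self.unary_expr()
        ops, combine = LEVELS[n]
        left = self.level(n + 1)
        while self.peek() in ops:
            op, right = self.take(), self.level(n + 1)
            left = combine(op, left, right)
        return left

    def unary_expr(self):  # UnaryExpr ::= UnionExpr | '-' UnaryExpr
        if self.peek() == "-":
            self.take()
            return -to_number(self.unary_expr())
        return self.primary_expr()

    def primary_expr(self):  # only Number and '(' Expr ')' in this sketch
        token = self.take()
        if token == "(":
            value = self.level(0)
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and (token[0].isdigit() or token[0] == "."):
            return float(token)
        raise SyntaxError(f"unexpected token {token!r}")

def evaluate(text):
    parser = Parser(text)
    value = parser.level(0)  # Expr ::= OrExpr
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

For instance, evaluate("1 + 2 * 3") returns 7.0 because MultiplicativeExpr sits below AdditiveExpr in the grammar, evaluate("3 - 2 - 1") returns 0.0 because the loop makes subtraction left-associative, and evaluate("1 < 2 and 3 > 4") returns False.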