Exploring XPath | Visual C# .NET 2003 Unleashed

XML Path Language (XPath) is a W3C standard whose primary purpose is to identify parts of an XML document. In other words, XPath lets you locate one or more nodes in an XML document. XPath also supports numerical calculations, string manipulation, Boolean tests, and more. Several other W3C specifications build on XPath, including XSLT, XQuery, XPointer, and XML Schema. For this reason, XPath is one of the most important technologies for every XML developer to know.

The Microsoft .NET Framework implements the W3C XPath 1.0 Recommendation. Classes in the System.Xml and System.Xml.XPath namespaces let you execute XPath queries and work with their results.

The XPath specification broadly covers the following topics:

Data model This section describes the general concepts and terms used in XPath. XPath defines seven types of nodes: root nodes, element nodes, attribute nodes, text nodes, namespace nodes, processing instruction nodes, and comment nodes.
Location paths This section details the constructs and syntax used to address parts of an XML document. A location path selects a set of nodes in a document. Its syntax resembles other hierarchical notations used in computing, such as URIs and file/folder paths. A location path can be absolute, starting with / at the root node (for example, /companies/company/cell-phones/cell-phone[position()=2]), or relative to the current context node (for example, company/cell-phones).
Expressions This section describes the most basic XPath construct: the expression. Location paths are a special case of XPath expressions. Expressions are built from operands and operators, and evaluating an expression yields one of four types of result: a node set (an unordered collection of nodes without duplicates), a Boolean value, a floating-point number, or a string.
Functions This section defines 27 functions, divided into four categories: node set functions, string functions, Boolean functions, and number functions. Each function takes zero or more arguments and returns a single result; for example, concat('A', 'B') returns the string 'AB'.
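To see the four result types side by side, here is a minimal sketch that evaluates XPath 1.0 expressions with javax.xml.xpath, the XPath engine that ships with the Java standard library. It stands in for the .NET classes and is not the chapter's own code; the sample document is a hypothetical substitute for Listing 8.2, and the class and helper names (XPathResultTypes, eval, nodeCount, and so on) are illustrative only.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathResultTypes {
    // Hypothetical sample document standing in for Listing 8.2 (not reproduced here).
    static final String XML = "<companies>"
            + "<company name='Motorola'><cell-phones>"
            + "<cell-phone name='MC60' price='150'>Some Description</cell-phone>"
            + "<cell-phone name='V60' price='90'>Another Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Nokia'><cell-phones>"
            + "<cell-phone name='3310' price='40'>Some Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Startup'/>"
            + "</companies>";

    static final Document DOC = parse(XML);
    static final XPath XP = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample XML", e);
        }
    }

    // Evaluates an XPath 1.0 expression from the root node and converts
    // the result to the requested type (NODESET, NUMBER, BOOLEAN, STRING).
    static Object eval(String expr, QName type) {
        try {
            return XP.evaluate(expr, DOC, type);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad expression: " + expr, e);
        }
    }

    static int nodeCount(String expr) {
        return ((NodeList) eval(expr, XPathConstants.NODESET)).getLength();
    }

    static double number(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    static boolean bool(String expr) {
        return (Boolean) eval(expr, XPathConstants.BOOLEAN);
    }

    static String string(String expr) {
        return (String) eval(expr, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        System.out.println(nodeCount("//cell-phone"));           // node set: 3 nodes
        System.out.println(number("count(//cell-phone)"));       // number: 3.0
        System.out.println(bool("//cell-phone/@name = 'MC60'")); // Boolean: true
        System.out.println(string("concat('A', 'B')"));         // string: AB
        System.out.println(string("//cell-phone"));              // first node's text: Some Description
    }
}
```

The last line illustrates a conversion rule from the XPath 1.0 Recommendation: a node set converted to a string yields the string value of its first node in document order. In .NET, XPathNavigator.Evaluate likewise returns a result of one of these four types.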

The core XPath classes in .NET, such as XPathDocument, XPathNavigator, XPathExpression, and XPathNodeIterator, are located in the System.Xml.XPath namespace. They provide the XPath parser and evaluation engine described in the W3C XML Path Language Version 1.0 Recommendation.

Learning the Syntax

This section discusses the most commonly used XPath syntax constructs. More detailed information on XPath syntax is available online (for example, at www.w3schools.com/xpath/xpath_syntax.asp).

The central concept in XPath is the path to a node in the XML tree. A simple example of a path is companies/company/cell-phones/cell-phone (this example is based on the XML in Listing 8.2).

An XPath query returns all the nodes that match a particular path, and the path can contain wildcards. For example, the query companies/*/cell-phones returns every cell-phones element whose parent is any child element of companies, whatever that intermediate element is called. Conditions (filters) are written in square brackets. For example, the query companies/company[cell-phones] returns every company element that has at least one cell-phones child element.
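The wildcard and bracket queries described in the preceding paragraph can be checked with a small sketch built on Java's javax.xml.xpath, used here in place of the .NET XPath classes. The XML string is a hypothetical stand-in for Listing 8.2, with one extra company that has no cell-phones child so that the filter visibly removes something; select and label are illustrative helpers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PathQueries {
    // Hypothetical sample document standing in for Listing 8.2 (not reproduced here).
    static final String XML = "<companies>"
            + "<company name='Motorola'><cell-phones>"
            + "<cell-phone name='MC60' price='150'>Some Description</cell-phone>"
            + "<cell-phone name='V60' price='90'>Another Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Nokia'><cell-phones>"
            + "<cell-phone name='3310' price='40'>Some Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Startup'/>"
            + "</companies>";

    static final Document DOC = parse(XML);
    static final XPath XP = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample XML", e);
        }
    }

    // Evaluates an XPath 1.0 query from the root node and labels each hit,
    // e.g. "company(Nokia)" for an element that has a name attribute.
    static List<String> select(String query) {
        NodeList hits;
        try {
            hits = (NodeList) XP.evaluate(query, DOC, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad query: " + query, e);
        }
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            labels.add(label(hits.item(i)));
        }
        return labels;
    }

    static String label(Node node) {
        if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
            return "@" + node.getNodeName() + "=" + node.getNodeValue();
        }
        Element e = (Element) node;
        return e.hasAttribute("name") ? e.getTagName() + "(" + e.getAttribute("name") + ")" : e.getTagName();
    }

    public static void main(String[] args) {
        // All three company elements, including the one without phones
        System.out.println(select("companies/company"));
        // [cell-phones, cell-phones]: * matches any intermediate element name
        System.out.println(select("companies/*/cell-phones"));
        // [company(Motorola), company(Nokia)]: the filter drops company(Startup)
        System.out.println(select("companies/company[cell-phones]"));
    }
}
```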

In addition, you can use the equals operator in a query; for example, companies/company/cell-phones[cell-phone='Some Description'] returns every cell-phones element that contains a cell-phone child whose text is equal to 'Some Description'.
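The sketch below, again using Java's javax.xml.xpath rather than .NET and a hypothetical sample document, shows that an equality filter compares the text of the child element it names. Because the name inside the brackets must be a child of the filtered element, a filter such as cell-phones[cell-phones='...'] matches nothing in this sample. The helper names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EqualityQueries {
    // Hypothetical sample document standing in for Listing 8.2 (not reproduced here).
    static final String XML = "<companies>"
            + "<company name='Motorola'><cell-phones>"
            + "<cell-phone name='MC60' price='150'>Some Description</cell-phone>"
            + "<cell-phone name='V60' price='90'>Another Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Nokia'><cell-phones>"
            + "<cell-phone name='3310' price='40'>Some Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Startup'/>"
            + "</companies>";

    static final Document DOC = parse(XML);
    static final XPath XP = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample XML", e);
        }
    }

    // Evaluates an XPath 1.0 query from the root node and labels each hit.
    static List<String> select(String query) {
        NodeList hits;
        try {
            hits = (NodeList) XP.evaluate(query, DOC, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad query: " + query, e);
        }
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            labels.add(label(hits.item(i)));
        }
        return labels;
    }

    static String label(Node node) {
        if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
            return "@" + node.getNodeName() + "=" + node.getNodeValue();
        }
        Element e = (Element) node;
        return e.hasAttribute("name") ? e.getTagName() + "(" + e.getAttribute("name") + ")" : e.getTagName();
    }

    public static void main(String[] args) {
        // [cell-phones]: only Motorola's list contains 'Another Description'
        System.out.println(select("companies/company/cell-phones[cell-phone='Another Description']"));
        // []: a cell-phones element has no cell-phones child to compare
        System.out.println(select("companies/company/cell-phones[cell-phones='Some Description']"));
        // [company(Motorola)]: the filtered path may go several levels deep
        System.out.println(select("companies/company[cell-phones/cell-phone='Another Description']"));
    }
}
```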

NOTE

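When you select nodes, comparison operators belong inside square brackets, where they filter the nodes of a step. A comparison written outside brackets, such as count(//cell-phone) > 2, is still a valid XPath expression, but it evaluates to a Boolean value rather than to a node set.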

In a query, an attribute name is prefixed with the @ character. The following example returns all cell phones named MC60: companies/company/cell-phones/cell-phone[@name='MC60'].
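A minimal Java sketch of attribute queries, using javax.xml.xpath instead of .NET and a hypothetical sample document in which every company and cell-phone carries a name attribute and each cell-phone also has an invented price attribute. The illustrative label helper prints attribute hits as @name=value.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeQueries {
    // Hypothetical sample document standing in for Listing 8.2 (not reproduced here).
    static final String XML = "<companies>"
            + "<company name='Motorola'><cell-phones>"
            + "<cell-phone name='MC60' price='150'>Some Description</cell-phone>"
            + "<cell-phone name='V60' price='90'>Another Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Nokia'><cell-phones>"
            + "<cell-phone name='3310' price='40'>Some Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Startup'/>"
            + "</companies>";

    static final Document DOC = parse(XML);
    static final XPath XP = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample XML", e);
        }
    }

    // Evaluates an XPath 1.0 query from the root node and labels each hit.
    static List<String> select(String query) {
        NodeList hits;
        try {
            hits = (NodeList) XP.evaluate(query, DOC, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad query: " + query, e);
        }
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            labels.add(label(hits.item(i)));
        }
        return labels;
    }

    static String label(Node node) {
        if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
            return "@" + node.getNodeName() + "=" + node.getNodeValue();
        }
        Element e = (Element) node;
        return e.hasAttribute("name") ? e.getTagName() + "(" + e.getAttribute("name") + ")" : e.getTagName();
    }

    public static void main(String[] args) {
        // [cell-phone(MC60)]: filter on an attribute value
        System.out.println(select("companies/company/cell-phones/cell-phone[@name='MC60']"));
        // [@name=Motorola, @name=Nokia, @name=Startup]: select the attributes themselves
        System.out.println(select("companies/company/@name"));
        // Both attributes of MC60 (name and price); their relative order is not guaranteed
        System.out.println(select("companies/company/cell-phones/cell-phone[@name='MC60']/@*"));
    }
}
```

The XPath 1.0 Recommendation leaves the relative order of attributes on one element to the implementation, so the sketch relies only on how many attributes @* returns.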

The following list contains the most common operators and wildcards predefined in XPath:

/ Selects the child nodes of the nodes selected by the expression on its left. At the beginning of a query, it starts the search from the root node.
// Recursive search: selects matching nodes at any depth below the nodes on its left. At the beginning of a query, the recursive search starts from the root node and so covers the whole document.
. The current context node.
* Wildcard that selects all elements, regardless of their names.
@ Attribute (prefix of an attribute's name). Using the wildcard in place of a name (@*) returns all attributes.
: Namespace separator. It separates a namespace prefix from the local name of an element or attribute.
( ) Groups operations to set the order of evaluation explicitly.
[ ] Applies a filter. A number in brackets acts as an index into the collection; indexes start at 1.
+ Addition.
- Subtraction.
div Division (according to IEEE 754).
* Multiplication.
mod Returns the remainder of a division.
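The path operators and arithmetic operators from the list can be exercised together in a short Java sketch. It uses javax.xml.xpath instead of .NET and a hypothetical sample document with invented price attributes; number is an illustrative helper that asks the engine for a numeric result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class OperatorDemo {
    // Hypothetical sample document standing in for Listing 8.2 (not reproduced here).
    static final String XML = "<companies>"
            + "<company name='Motorola'><cell-phones>"
            + "<cell-phone name='MC60' price='150'>Some Description</cell-phone>"
            + "<cell-phone name='V60' price='90'>Another Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Nokia'><cell-phones>"
            + "<cell-phone name='3310' price='40'>Some Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Startup'/>"
            + "</companies>";

    static final Document DOC = parse(XML);
    static final XPath XP = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample XML", e);
        }
    }

    // Evaluates an XPath 1.0 expression from the root node as a number.
    static double number(String expr) {
        try {
            return (Double) XP.evaluate(expr, DOC, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad expression: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number("count(/companies/company)"));     // 3.0: absolute path with /
        System.out.println(number("count(//cell-phone)"));           // 3.0: // searches at any depth
        System.out.println(number("count(/companies/*/cell-phones)")); // 2.0: * ignores element names
        System.out.println(number("count(//cell-phone/@*)"));        // 6.0: name and price on 3 phones
        System.out.println(number("7 div 2"));                        // 3.5
        System.out.println(number("7 mod 2"));                        // 1.0
        System.out.println(number("(1 + 2) * 3"));                    // 9.0
        System.out.println(number("sum(//cell-phone/@price)"));       // 280.0
        System.out.println(number("1 div 0"));                        // Infinity
    }
}
```

Because div follows IEEE 754 floating-point rules, 7 div 2 is 3.5 rather than 3, and 1 div 0 is positive infinity instead of an error.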

Filtering Nodes

As previously described, square brackets are used for filtering nodes: [pattern]. The pattern inside the brackets is very similar to an SQL WHERE clause. The filter is applied to every node in the collection, and only the nodes that satisfy it are returned. You can apply several filters to the same step of a query, but empty filters are not allowed. A filter is always evaluated relative to the current context node. So, the following example returns all companies that have a cell-phones child node: /companies/company[cell-phones]. You can also use . to refer to the context node itself: /companies/company/cell-phones/cell-phone[.='Some Value'].
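The filtering rules in the preceding paragraph can be sketched in Java with javax.xml.xpath (not the .NET API) over a hypothetical sample document. Two of the queries stack more than one filter on a single step, and one uses . for the context node; stacked filters are applied left to right, each keeping only the nodes that passed the previous one. The helper names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FilterDemo {
    // Hypothetical sample document standing in for Listing 8.2 (not reproduced here).
    static final String XML = "<companies>"
            + "<company name='Motorola'><cell-phones>"
            + "<cell-phone name='MC60' price='150'>Some Description</cell-phone>"
            + "<cell-phone name='V60' price='90'>Another Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Nokia'><cell-phones>"
            + "<cell-phone name='3310' price='40'>Some Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Startup'/>"
            + "</companies>";

    static final Document DOC = parse(XML);
    static final XPath XP = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample XML", e);
        }
    }

    // Evaluates an XPath 1.0 query from the root node and labels each hit.
    static List<String> select(String query) {
        NodeList hits;
        try {
            hits = (NodeList) XP.evaluate(query, DOC, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad query: " + query, e);
        }
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            labels.add(label(hits.item(i)));
        }
        return labels;
    }

    static String label(Node node) {
        if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
            return "@" + node.getNodeName() + "=" + node.getNodeValue();
        }
        Element e = (Element) node;
        return e.hasAttribute("name") ? e.getTagName() + "(" + e.getAttribute("name") + ")" : e.getTagName();
    }

    public static void main(String[] args) {
        // [company(Nokia)]: has a cell-phones child AND is named Nokia
        System.out.println(select("/companies/company[cell-phones][@name='Nokia']"));
        // [cell-phone(MC60)]: matching text, then price above 100
        System.out.println(select("//cell-phone[.='Some Description'][@price > 100]"));
        // [cell-phone(V60)]: . is the cell-phone being tested
        System.out.println(select("/companies/company/cell-phones/cell-phone[.='Another Description']"));
    }
}
```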

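XPath 1.0 has no any or all keywords, and none are needed: a filter returns every node that satisfies it, not only the first one. To keep only the first match in the whole document, wrap the path in parentheses and add a positional filter, as in (/companies/company/cell-phones/cell-phone[.='Some Value'])[1]. Comparisons that involve a node set already behave like "any": /companies/company[cell-phones/cell-phone/@name='MC60'] selects each company that has at least one cell phone named MC60.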
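The following Java sketch (javax.xml.xpath in place of .NET, a hypothetical sample document, illustrative helper names) shows that a filter keeps every match, that a parenthesized path followed by [1] keeps only the first match in document order, and that a comparison against a node set acts like "any", while not() around such a comparison gives "none".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AllMatchesDemo {
    // Hypothetical sample document standing in for Listing 8.2 (not reproduced here).
    static final String XML = "<companies>"
            + "<company name='Motorola'><cell-phones>"
            + "<cell-phone name='MC60' price='150'>Some Description</cell-phone>"
            + "<cell-phone name='V60' price='90'>Another Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Nokia'><cell-phones>"
            + "<cell-phone name='3310' price='40'>Some Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Startup'/>"
            + "</companies>";

    static final Document DOC = parse(XML);
    static final XPath XP = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample XML", e);
        }
    }

    // Evaluates an XPath 1.0 query from the root node and labels each hit.
    static List<String> select(String query) {
        NodeList hits;
        try {
            hits = (NodeList) XP.evaluate(query, DOC, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad query: " + query, e);
        }
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            labels.add(label(hits.item(i)));
        }
        return labels;
    }

    static String label(Node node) {
        if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
            return "@" + node.getNodeName() + "=" + node.getNodeValue();
        }
        Element e = (Element) node;
        return e.hasAttribute("name") ? e.getTagName() + "(" + e.getAttribute("name") + ")" : e.getTagName();
    }

    public static void main(String[] args) {
        // [cell-phone(MC60), cell-phone(3310)]: every match, not just the first
        System.out.println(select("//cell-phone[.='Some Description']"));
        // [cell-phone(MC60)]: first match in document order
        System.out.println(select("(//cell-phone[.='Some Description'])[1]"));
        // [company(Motorola)]: true if ANY phone of the company is named MC60
        System.out.println(select("/companies/company[cell-phones/cell-phone/@name='MC60']"));
        // [company(Motorola), company(Nokia)]: != is also "any", so Motorola passes via V60
        System.out.println(select("/companies/company[cell-phones/cell-phone/@name!='MC60']"));
        // [company(Nokia), company(Startup)]: no phone named MC60
        System.out.println(select("/companies/company[not(cell-phones/cell-phone/@name='MC60')]"));
    }
}
```

Note the != query: a company passes if any of its phone names differs from 'MC60', so Motorola is selected even though it sells an MC60. To express "no phone named MC60", negate the equality with not() instead.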

Filters can contain Boolean expressions, comparison expressions, and union expressions. The following list contains the operators that can be used in filters:

and Logical AND
or Logical OR
not() Negation (strictly a function rather than an operator)
= Equality
!= Inequality
< Less than
<= Less than or equal to
> Greater than
>= Greater than or equal to
| Union; returns the union of two sets of nodes
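The filter operators can be tried out with a Java sketch on javax.xml.xpath (standing in for .NET) and a hypothetical sample document; the price attribute is invented for illustration so that the relational operators have numbers to compare, and the helper names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FilterOperatorsDemo {
    // Hypothetical sample document standing in for Listing 8.2 (not reproduced here).
    static final String XML = "<companies>"
            + "<company name='Motorola'><cell-phones>"
            + "<cell-phone name='MC60' price='150'>Some Description</cell-phone>"
            + "<cell-phone name='V60' price='90'>Another Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Nokia'><cell-phones>"
            + "<cell-phone name='3310' price='40'>Some Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Startup'/>"
            + "</companies>";

    static final Document DOC = parse(XML);
    static final XPath XP = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample XML", e);
        }
    }

    // Evaluates an XPath 1.0 query from the root node and labels each hit.
    static List<String> select(String query) {
        NodeList hits;
        try {
            hits = (NodeList) XP.evaluate(query, DOC, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad query: " + query, e);
        }
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            labels.add(label(hits.item(i)));
        }
        return labels;
    }

    static String label(Node node) {
        if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
            return "@" + node.getNodeName() + "=" + node.getNodeValue();
        }
        Element e = (Element) node;
        return e.hasAttribute("name") ? e.getTagName() + "(" + e.getAttribute("name") + ")" : e.getTagName();
    }

    public static void main(String[] args) {
        System.out.println(select("//cell-phone[@price > 50 and @price < 100]"));  // [cell-phone(V60)]
        System.out.println(select("//cell-phone[@name='MC60' or @name='3310']"));  // [cell-phone(MC60), cell-phone(3310)]
        System.out.println(select("//cell-phone[@name != 'MC60']"));               // [cell-phone(V60), cell-phone(3310)]
        System.out.println(select("//cell-phone[@price >= 90]"));                  // [cell-phone(MC60), cell-phone(V60)]
        System.out.println(select("//company[not(cell-phones)]"));                 // [company(Startup)]
        // Union of two node sets, returned in document order
        System.out.println(select("//company[@name='Startup'] | //cell-phone[@name='V60']"));
    }
}
```

The union result comes back in document order, which is why the V60 phone precedes the Startup company even though it is written second in the expression.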

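The following list gives the order of priority of the operators, from highest (evaluated first) to lowest:

() Grouping (function calls such as not() are likewise evaluated as a unit)
[] Filtering
/ and // Path
| Union
*, div, mod Multiplication, division, remainder
+, - Addition, subtraction
<, <=, >, >= Comparison
=, != Equality comparison
and Boolean AND
or Boolean OR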
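The precedence rules can be confirmed with a small Java sketch that evaluates bare XPath 1.0 expressions through javax.xml.xpath, standing in for .NET. The number and bool helpers are illustrative, and the sample document is hypothetical; it is only needed for the union example.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PrecedenceDemo {
    // Hypothetical sample document standing in for Listing 8.2 (not reproduced here).
    static final String XML = "<companies>"
            + "<company name='Motorola'><cell-phones>"
            + "<cell-phone name='MC60' price='150'>Some Description</cell-phone>"
            + "<cell-phone name='V60' price='90'>Another Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Nokia'><cell-phones>"
            + "<cell-phone name='3310' price='40'>Some Description</cell-phone>"
            + "</cell-phones></company>"
            + "<company name='Startup'/>"
            + "</companies>";

    static final Document DOC = parse(XML);
    static final XPath XP = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample XML", e);
        }
    }

    // Evaluates an XPath 1.0 expression from the root node as the given type.
    static Object eval(String expr, QName type) {
        try {
            return XP.evaluate(expr, DOC, type);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad expression: " + expr, e);
        }
    }

    static double number(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    static boolean bool(String expr) {
        return (Boolean) eval(expr, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        System.out.println(number("1 + 2 * 3"));                         // 7.0: * before +
        System.out.println(number("(1 + 2) * 3"));                       // 9.0: grouping first
        System.out.println(bool("true() or false() and false()"));       // true: and before or
        System.out.println(bool("(true() or false()) and false()"));     // false
        System.out.println(bool("count(//company | //cell-phone) > 5")); // true: union (6 nodes) before >
    }
}
```

Because and binds more tightly than or, the third expression is read as true() or (false() and false()); adding parentheses, as in the fourth, changes the result.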

NOTE

All operators are case sensitive: and, or, div, and mod must be written in lowercase. Apart from that, each operator has much the same meaning as its counterpart in SQL.