We’ve seen how easy it is to represent business data by using an XML document. The next step is to understand how an application processes that data. Of course, to process the data in an XML document, an application needs a way to navigate the document to retrieve the values from the document’s elements and attributes. This is what XPath is designed for. In this section I’ll cover some of the fundamentals of XPath. For a complete reference, view the specification at http://www.w3c.org/TR/xpath.
XPath provides a syntax for addressing the data in an XML document by treating the document as a tree of nodes. Each element, attribute, or value in the document is represented as a node in the tree, and XPath expressions are used to identify the node or nodes you want to process. To understand how this works, let’s take a simple XML document as an example:
<?xml version="1.0"?>
<Order OrderNo="1234">
  <OrderDate>2001-01-01</OrderDate>
  <Customer>Graeme Malcolm</Customer>
  <Item>
    <Product ProductID="1" UnitPrice="18">Chai</Product>
    <Quantity>2</Quantity>
  </Item>
  <Item>
    <Product ProductID="2" UnitPrice="19">Chang</Product>
    <Quantity>1</Quantity>
  </Item>
</Order>
Figure A1.1 shows a node tree that could represent this document.
Figure A1.1 - XML node tree
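To make the node tree concrete, here is a minimal Python sketch that parses the order document and lists its element, attribute, and text nodes. It uses the standard-library xml.etree.ElementTree module, which is not part of the original discussion. The describe helper is a hypothetical name introduced for illustration, and the ProductID values 1 and 2 are assumed from the way later examples refer to that attribute.

```python
import xml.etree.ElementTree as ET

# The order document from the text (ProductID values are assumed).
ORDER_XML = """<?xml version="1.0"?>
<Order OrderNo="1234">
  <OrderDate>2001-01-01</OrderDate>
  <Customer>Graeme Malcolm</Customer>
  <Item>
    <Product ProductID="1" UnitPrice="18">Chai</Product>
    <Quantity>2</Quantity>
  </Item>
  <Item>
    <Product ProductID="2" UnitPrice="19">Chang</Product>
    <Quantity>1</Quantity>
  </Item>
</Order>"""


def describe(element, depth=0):
    """Return one line per element, attribute, and text node, indented by depth."""
    pad = "  " * depth
    lines = [f"{pad}element: {element.tag}"]
    for name, value in element.attrib.items():
        lines.append(f"{pad}  attribute: {name}={value}")
    text = (element.text or "").strip()
    if text:  # skip whitespace-only text between child elements
        lines.append(f"{pad}  text: {text}")
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(ORDER_XML)  # root is the Order element
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Each printed line corresponds to one node of the kind that XPath expressions address.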
You can use XPath expressions to define location paths to nodes in the tree or to return a node or set of nodes that meet specified criteria. The expressions can be absolute paths or they can be relative to the currently selected node (known as the context node).
You can express XPath location paths by using either unabbreviated or abbreviated syntax. Both syntaxes represent the root of the document with a forward slash (/) and allow forward and backward navigation through the nodes in the tree.
Let’s examine some absolute location paths in the node tree produced by the order document described earlier. To select the Order element node by using unabbreviated syntax, we would use the following XPath expression:
/child::Order
Translated into abbreviated syntax, this expression becomes:
/Order
To drill further down into the document, we could retrieve the Customer node by using the following unabbreviated XPath expression:
/child::Order/child::Customer
Here’s the abbreviated equivalent:
/Order/Customer
If you want to retrieve an attribute node, you must indicate this by using the attribute keyword in unabbreviated syntax or the @ character in abbreviated syntax. To retrieve the OrderNo attribute of the Order element, use the following unabbreviated syntax:
/child::Order/attribute::OrderNo
The abbreviated syntax for the OrderNo attribute is
/Order/@OrderNo
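The absolute paths above can be tried out with a short Python sketch. It assumes the standard-library xml.etree.ElementTree module, which supports only a subset of the abbreviated syntax: paths are evaluated relative to an element rather than from the document root, and attribute values are read with get() instead of an @ step. The ProductID values in the inline document are assumed.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<?xml version="1.0"?>
<Order OrderNo="1234">
  <OrderDate>2001-01-01</OrderDate>
  <Customer>Graeme Malcolm</Customer>
  <Item>
    <Product ProductID="1" UnitPrice="18">Chai</Product>
    <Quantity>2</Quantity>
  </Item>
  <Item>
    <Product ProductID="2" UnitPrice="19">Chang</Product>
    <Quantity>1</Quantity>
  </Item>
</Order>"""

root = ET.fromstring(ORDER_XML)  # fromstring returns the Order element

# /Order : the document element itself
order_name = root.tag

# /Order/Customer : written as the relative path "Customer" from Order
customer = root.find("Customer").text

# /Order/@OrderNo : ElementTree has no attribute step, so use get()
order_no = root.get("OrderNo")

print(order_name, customer, order_no)
```

A full XPath 1.0 processor would accept the expressions exactly as written in the text, including the unabbreviated forms.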
To retrieve descendant nodes (that is, nodes anywhere farther down the hierarchy), you can use the descendant keyword in unabbreviated syntax or a double slash (//) in abbreviated syntax. For example, to retrieve all the Product nodes in the order document, you could specify the following unabbreviated location path:
/child::Order/descendant::Product
The abbreviated equivalent is
/Order//Product
You can use wildcards to select nodes whose names aren’t relevant. For example, the asterisk (*) wildcard matches an element with any name. The following unabbreviated location path selects all the child elements of Order:
/child::Order/child::*
The equivalent abbreviated syntax is
/Order/*
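As a rough illustration of the descendant and wildcard paths, the following Python sketch runs the abbreviated forms through the standard-library xml.etree.ElementTree module. This is an assumed tool, not one the text prescribes. Because ElementTree evaluates paths relative to an element, /Order//Product is written as .//Product and /Order/* as *, both evaluated from the Order element. The ProductID values in the inline document are assumed.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<?xml version="1.0"?>
<Order OrderNo="1234">
  <OrderDate>2001-01-01</OrderDate>
  <Customer>Graeme Malcolm</Customer>
  <Item>
    <Product ProductID="1" UnitPrice="18">Chai</Product>
    <Quantity>2</Quantity>
  </Item>
  <Item>
    <Product ProductID="2" UnitPrice="19">Chang</Product>
    <Quantity>1</Quantity>
  </Item>
</Order>"""

root = ET.fromstring(ORDER_XML)  # the Order element

# /Order//Product : every Product element at any depth below Order
product_names = [p.text for p in root.findall(".//Product")]

# /Order/* : every child element of Order, whatever its name
child_tags = [c.tag for c in root.findall("*")]

print(product_names)
print(child_tags)
```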
XPath location paths are often relative to a context node, in which case the path describes how to retrieve a node or set of nodes relative to the current one. For example, if the first Item element in the order document is the context node, the relative location path to retrieve the Quantity child element is
child::Quantity
In abbreviated syntax, the relative location path is
Quantity
Similarly, to retrieve the ProductID attribute of the Product child element, the location path is
child::Product/attribute::ProductID
This path translates to
Product/@ProductID
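The idea of a context node maps naturally onto an element object in code. In this Python sketch, which assumes the standard-library xml.etree.ElementTree module, the first Item element plays the role of the context node and the relative paths are evaluated from it. ElementTree has no attribute step, so the ProductID value is read with get(). The ProductID values in the inline document are assumed.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<?xml version="1.0"?>
<Order OrderNo="1234">
  <OrderDate>2001-01-01</OrderDate>
  <Customer>Graeme Malcolm</Customer>
  <Item>
    <Product ProductID="1" UnitPrice="18">Chai</Product>
    <Quantity>2</Quantity>
  </Item>
  <Item>
    <Product ProductID="2" UnitPrice="19">Chang</Product>
    <Quantity>1</Quantity>
  </Item>
</Order>"""

root = ET.fromstring(ORDER_XML)

# find() returns the first match, so this is the first Item element.
context_node = root.find("Item")

# Quantity : child element of the context node
quantity = context_node.find("Quantity").text

# Product/@ProductID : locate the child element, then read its attribute
product_id = context_node.find("Product").get("ProductID")

print(quantity, product_id)
```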
To navigate back up the tree, use the parent keyword. The abbreviated equivalent for this keyword is a double period (..). For example, if the context node is the OrderDate element, the OrderNo attribute can be retrieved from the Order element by using the following location path:
parent::Order/attribute::OrderNo
Note that this syntax will return a value only if the parent node is called Order. To retrieve the OrderNo attribute from the parent regardless of its name, you have to use the following unabbreviated syntax:
parent::*/attribute::OrderNo
The abbreviated version is simpler because you don’t need to provide a specific identifier for the parent. The parent of the context node is simply referred to by using the double period, as shown here:
../@OrderNo
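Parent navigation can be sketched in Python as well, with one caveat. The standard-library xml.etree.ElementTree module, assumed here, does not store parent pointers on elements, so a .. step can only climb within the subtree of the element on which the search starts. The sketch therefore starts at Order, steps down to OrderDate, and steps back up. The ProductID values in the inline document are assumed.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<?xml version="1.0"?>
<Order OrderNo="1234">
  <OrderDate>2001-01-01</OrderDate>
  <Customer>Graeme Malcolm</Customer>
  <Item>
    <Product ProductID="1" UnitPrice="18">Chai</Product>
    <Quantity>2</Quantity>
  </Item>
  <Item>
    <Product ProductID="2" UnitPrice="19">Chang</Product>
    <Quantity>1</Quantity>
  </Item>
</Order>"""

root = ET.fromstring(ORDER_XML)

# Equivalent of ../@OrderNo with OrderDate as the context node:
# go down to OrderDate, back up with "..", then read the attribute.
parent = root.find("./OrderDate/..")
order_no = parent.get("OrderNo")

# Starting the search at OrderDate itself cannot reach its ancestors.
order_date = root.find("OrderDate")
unreachable = order_date.find("..")

print(order_no, unreachable)
```

A full XPath processor keeps track of each node's parent, so it can evaluate ../@OrderNo directly from the OrderDate context node.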
In addition, you can reference the context node itself by using either the self keyword or a single period (.). This is useful when an expression needs to test or return the currently selected node rather than one of its relatives.
You can limit the nodes returned by an XPath expression by including search criteria in the location path. The criteria for the node or nodes to be returned are appended to the location path in square brackets.
For example, to retrieve all the Product elements with a UnitPrice attribute greater than 18, you can use the following XPath expression:
/child::Order/child::Item/child::Product[attribute::UnitPrice>18]
In abbreviated syntax, you can use the following expression:
/Order/Item/Product[@UnitPrice>18]
The criteria are evaluated relative to the node being tested, so they can contain location paths to other nodes in the hierarchy. The following example retrieves the Item nodes where the Product child element has a ProductID attribute of 1:
/child::Order/child::Item[child::Product/attribute::ProductID=1]
The abbreviated syntax for this expression is shown here:
/Order/Item[Product/@ProductID=1]
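The two filters above can be approximated in Python with the standard-library xml.etree.ElementTree module, again as an assumption rather than the text's own tooling. ElementTree predicates support equality tests on attributes but not numeric comparisons or nested paths, so the price filter is applied in Python and the Item lookup is rewritten with a .. step. The ProductID values in the inline document are assumed.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<?xml version="1.0"?>
<Order OrderNo="1234">
  <OrderDate>2001-01-01</OrderDate>
  <Customer>Graeme Malcolm</Customer>
  <Item>
    <Product ProductID="1" UnitPrice="18">Chai</Product>
    <Quantity>2</Quantity>
  </Item>
  <Item>
    <Product ProductID="2" UnitPrice="19">Chang</Product>
    <Quantity>1</Quantity>
  </Item>
</Order>"""

root = ET.fromstring(ORDER_XML)

# /Order/Item/Product[@UnitPrice>18] : select the products with a path,
# then apply the numeric comparison in Python.
expensive = [
    p.text
    for p in root.findall("Item/Product")
    if float(p.get("UnitPrice")) > 18
]

# /Order/Item[Product/@ProductID=1] : match the Product by attribute
# value, then step up to its parent Item with "..".
items = root.findall("Item/Product[@ProductID='1']/..")
quantities = [item.find("Quantity").text for item in items]

print(expensive, quantities)
```

Only Chang passes the price filter because the comparison is strictly greater than 18, and Chai is priced at exactly 18.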
Now that you understand how to use XPath expressions to locate data in an XML document, you’re ready to examine one of the most commonly used XML-related technologies, Extensible Stylesheet Language (XSL).