Nodes | SOA for the Business Developer: Concepts, BPEL, and SCA (Business Developers series)

The XPath processor reads the XML source and stores the information in a series of data structures called nodes, which hold only the information necessary for data access. An XPath node doesn't record, for example, whether an attribute value was enclosed in single or double quotation marks.

Seven different kinds of nodes are related to one another in a tree structure that is specific to XPath. The purpose of four of the seven nodes is straightforward. Each element, attribute, comment, and processing-instruction node has information that was derived from a corresponding aspect of the XML source. Each text node has information on the text value of an XML element. Each namespace node has information on the namespaces that are in scope for a given element. Last, the single root node (what XPath 2.0 calls the document node) has information on the entire XML document.

In the tree structure built from the following example, the children of the root node are, in order, a comment node, an element node, and another comment node.

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <!-- here is an insured -->
 <Insured></Insured>
 <!-- end of file -->

The root node is not the same as the root element, which is the ancestor of all other elements in the XML source. The root node is more inclusive; it is the ancestor of the element node that was derived from the root element.
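The distinction can be sketched with Python's standard `xml.dom.minidom` module. This is an illustration under an assumption: the DOM document object stands in for the XPath root node, although the DOM and the XPath data model are not identical.

```python
from xml.dom import minidom

# The example document from the text. Bytes are used so that the parser
# honors the encoding named in the XML declaration.
SOURCE = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    b"<!-- here is an insured -->\n"
    b"<Insured></Insured>\n"
    b"<!-- end of file -->\n"
)

# The DOM document object plays the role of the XPath root node here.
document = minidom.parseString(SOURCE)

# Children of the document, in source order: comment, element, comment.
child_kinds = [node.nodeName for node in document.childNodes]
print(child_kinds)  # ['#comment', 'Insured', '#comment']

# The root element is one child of the document, not the document itself.
root_element = document.documentElement
print(root_element.tagName)                 # Insured
print(root_element.parentNode is document)  # True
```

The XML declaration does not appear among the children, which matches the statement that nodes carry no details on it.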

The XPath nodes have no details on the XML declaration. They also lack details on a DOCTYPE declaration, which is present when a validation mechanism called a Document Type Definition (DTD) is in use.

You can access specific data by referencing the nodes in a sequence that leads from the root node to the nodes of interest. Every node has a string value, so you gain access to a unit of business data as soon as you reference (in particular) an element or attribute node. Consider, for example, the XML document shown in Listing 6.1.

Listing 6.1: Sample XML document

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <!-- CarPolicy applicant -->
 <Insured Customer>
    <CarPolicy PolicyType="Auto">
       <Vehicle Category="Sedan">
          <Make>Honda</Make>
          <Model>Accord</Model>
       </Vehicle>
       <Vehicle Category="Sport" Domestic="True">
          <Make>Ford</Make>
          <Model>Mustang</Model>
       </Vehicle>
    </CarPolicy>
    <CarPolicy PolicyType="Antique">
       <Vehicle Category="Sport">
          <Make>Triumph</Make>
          <Model>Spitfire</Model>
       </Vehicle>
       <Vehicle Category="Coupe" Domestic="True">
          <Make>Buick</Make>
          <Model>Skylark</Model>
       </Vehicle>
       <Vehicle Category="Sport">
          <Make>Porsche</Make>
          <Model>Speedster</Model>
       </Vehicle>
    </CarPolicy>
 </Insured>

Here's a kind of XPath expression (called a location path) for accessing the make of the vehicles whose Category value is Coupe.

 /Insured/CarPolicy/Vehicle[@Category='Coupe']/Make

As we describe this expression in the following paragraphs, we refer to nodes by a type name, as when we say Vehicle node.

The initial virgule (/) in the expression indicates that the search for data starts at the root node. The set of characters between one virgule and the next represents a location step. Each location step selects nodes based on criteria that you specify.

The first step (Insured) brings the search to the node that is subordinate to the root node and has details on the root element. The Insured step selects a single node, but the general rule is important: a location step provides access to a node set, which is either a group of nodes that (with exceptions) are arranged in XML-source order (an order that reflects the sequence of content in the XML source) or an empty set. An empty set is the outcome when no node conforms to the selection criteria.
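Both outcomes can be sketched with Python's standard `xml.etree.ElementTree` module, which supports only a subset of XPath and evaluates paths relative to an element rather than from the root node. The XML is embedded with the root start tag written as plain `<Insured>` so that the block parses on its own, and `HomePolicy` is a hypothetical element name chosen because it matches nothing.

```python
import xml.etree.ElementTree as ET

XML = """<Insured>
  <CarPolicy PolicyType="Auto">
    <Vehicle Category="Sedan"><Make>Honda</Make><Model>Accord</Model></Vehicle>
    <Vehicle Category="Sport" Domestic="True"><Make>Ford</Make><Model>Mustang</Model></Vehicle>
  </CarPolicy>
  <CarPolicy PolicyType="Antique">
    <Vehicle Category="Sport"><Make>Triumph</Make><Model>Spitfire</Model></Vehicle>
    <Vehicle Category="Coupe" Domestic="True"><Make>Buick</Make><Model>Skylark</Model></Vehicle>
    <Vehicle Category="Sport"><Make>Porsche</Make><Model>Speedster</Model></Vehicle>
  </CarPolicy>
</Insured>"""

# fromstring returns the root element (Insured), not a root node.
insured = ET.fromstring(XML)

# A node set with several members, returned in XML-source order.
makes = [make.text for make in insured.findall("./CarPolicy/Vehicle/Make")]
print(makes)  # ['Honda', 'Ford', 'Triumph', 'Buick', 'Porsche']

# No node conforms to the criteria, so the result is an empty set.
no_match = insured.findall("./HomePolicy")
print(no_match)  # []
```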

We will have more to say about ordering in due time.

In general terms, a location path is an XPath expression that resolves to a node set. An absolute location path is one that starts at the root node, and a relative location path is one that starts in the middle of a node tree.
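The difference between the two kinds of path can be approximated in Python. `xml.etree.ElementTree` has no root node, so this sketch attaches the parsed document to a hypothetical wrapper element named `root-node-stand-in`; that wrapper, and the plain `<Insured>` start tag in the embedded XML, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<Insured>
  <CarPolicy PolicyType="Auto">
    <Vehicle Category="Sedan"><Make>Honda</Make><Model>Accord</Model></Vehicle>
    <Vehicle Category="Sport" Domestic="True"><Make>Ford</Make><Model>Mustang</Model></Vehicle>
  </CarPolicy>
  <CarPolicy PolicyType="Antique">
    <Vehicle Category="Sport"><Make>Triumph</Make><Model>Spitfire</Model></Vehicle>
    <Vehicle Category="Coupe" Domestic="True"><Make>Buick</Make><Model>Skylark</Model></Vehicle>
    <Vehicle Category="Sport"><Make>Porsche</Make><Model>Speedster</Model></Vehicle>
  </CarPolicy>
</Insured>"""

# Hypothetical stand-in for the XPath root node: a wrapper element whose
# only child is the root element of the document.
root_node = ET.Element("root-node-stand-in")
root_node.append(ET.fromstring(XML))

# Counterpart of the absolute location path /Insured/CarPolicy:
# the search starts at the stand-in for the root node.
policies = root_node.findall("./Insured/CarPolicy")
policy_types = [policy.get("PolicyType") for policy in policies]
print(policy_types)  # ['Auto', 'Antique']

# Relative location path Vehicle/Make: the search starts in the middle
# of the tree, at the second CarPolicy node.
antique_policy = policies[1]
antique_makes = [make.text for make in antique_policy.findall("Vehicle/Make")]
print(antique_makes)  # ['Triumph', 'Buick', 'Porsche']
```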

The second step in our sample expression (CarPolicy) brings us to a node set that has multiple members: specifically, the set of all CarPolicy nodes that are children of the Insured node.

The third step (Vehicle[@Category='Coupe']) continues the path, referencing all Vehicle nodes that are children of any CarPolicy node that is itself a child of the Insured node. The brackets ([]) and the syntax internal to them form a predicate, which contains a Boolean expression or (as shown later) an abbreviation that is expanded to a Boolean expression. The XPath processor selects only the nodes for which the expression evaluates to true.

In this case, the location step means "access all the Vehicle nodes, with the further restriction that the string value of the Category attribute node is Coupe." When you refer to an attribute node in a predicate, you precede the name with the "at sign" (@), as shown.
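One way to see the predicate as a true-or-false test is to apply the same test by hand and compare the results. The sketch below uses Python's standard `xml.etree.ElementTree` module, which accepts this attribute predicate but evaluates paths from the Insured element rather than from the root node; the embedded XML writes the root start tag as plain `<Insured>` so that it parses.

```python
import xml.etree.ElementTree as ET

XML = """<Insured>
  <CarPolicy PolicyType="Auto">
    <Vehicle Category="Sedan"><Make>Honda</Make><Model>Accord</Model></Vehicle>
    <Vehicle Category="Sport" Domestic="True"><Make>Ford</Make><Model>Mustang</Model></Vehicle>
  </CarPolicy>
  <CarPolicy PolicyType="Antique">
    <Vehicle Category="Sport"><Make>Triumph</Make><Model>Spitfire</Model></Vehicle>
    <Vehicle Category="Coupe" Domestic="True"><Make>Buick</Make><Model>Skylark</Model></Vehicle>
    <Vehicle Category="Sport"><Make>Porsche</Make><Model>Speedster</Model></Vehicle>
  </CarPolicy>
</Insured>"""

insured = ET.fromstring(XML)

# Without a predicate, the step selects every Vehicle node.
all_vehicles = insured.findall("./CarPolicy/Vehicle")
print(len(all_vehicles))  # 5

# The predicate applied by hand: keep a node only if the test is true.
by_hand = [v for v in all_vehicles if v.get("Category") == "Coupe"]

# The same restriction written as a predicate in the location step.
by_predicate = insured.findall("./CarPolicy/Vehicle[@Category='Coupe']")

print(by_hand == by_predicate)  # True
coupe_models = [v.find("Model").text for v in by_predicate]
print(coupe_models)  # ['Skylark']
```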

Here's another predicate, outside our example.

 [@Exterior='white' and @Interior='red']

You might read this as, "… with the further restriction that the exterior is white and the interior is red."
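A compound test of this kind can be tried against Listing 6.1 by substituting attributes that the listing actually contains (Category and Domestic), since Exterior and Interior do not appear there. The supported-syntax list for Python's `xml.etree.ElementTree` does not include `and`, so this sketch, offered as an approximation, chains two predicates instead; for attribute tests like these, chaining selects the same nodes as a single predicate joined by `and`. A hand-written filter with Python's own `and` is shown for comparison.

```python
import xml.etree.ElementTree as ET

XML = """<Insured>
  <CarPolicy PolicyType="Auto">
    <Vehicle Category="Sedan"><Make>Honda</Make><Model>Accord</Model></Vehicle>
    <Vehicle Category="Sport" Domestic="True"><Make>Ford</Make><Model>Mustang</Model></Vehicle>
  </CarPolicy>
  <CarPolicy PolicyType="Antique">
    <Vehicle Category="Sport"><Make>Triumph</Make><Model>Spitfire</Model></Vehicle>
    <Vehicle Category="Coupe" Domestic="True"><Make>Buick</Make><Model>Skylark</Model></Vehicle>
    <Vehicle Category="Sport"><Make>Porsche</Make><Model>Speedster</Model></Vehicle>
  </CarPolicy>
</Insured>"""

insured = ET.fromstring(XML)

# Stand-in for [@Category='Sport' and @Domestic='True']:
# two chained predicates, each of which must hold.
chained = insured.findall(
    "./CarPolicy/Vehicle[@Category='Sport'][@Domestic='True']"
)

# The same restriction spelled out with Python's "and".
by_hand = [
    v for v in insured.findall("./CarPolicy/Vehicle")
    if v.get("Category") == "Sport" and v.get("Domestic") == "True"
]

print(chained == by_hand)  # True
models = [v.find("Model").text for v in chained]
print(models)  # ['Mustang']
```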

Continuing with our main example, the fourth step (Make) completes the path, referencing the Make node of each Vehicle node whose Category attribute value is Coupe. In this case, the overall expression resolves to a node set that holds a single element node, whose string value is Buick.
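The walk through the four steps can be retraced in Python by splitting the location path at each virgule and applying one step at a time. This is a sketch rather than an XPath processor: the split is naive (it would fail if a predicate contained a virgule), the wrapper element `root-node-stand-in` is a hypothetical substitute for the root node, and the embedded XML writes the root start tag as plain `<Insured>` so that it parses.

```python
import xml.etree.ElementTree as ET

XML = """<Insured>
  <CarPolicy PolicyType="Auto">
    <Vehicle Category="Sedan"><Make>Honda</Make><Model>Accord</Model></Vehicle>
    <Vehicle Category="Sport" Domestic="True"><Make>Ford</Make><Model>Mustang</Model></Vehicle>
  </CarPolicy>
  <CarPolicy PolicyType="Antique">
    <Vehicle Category="Sport"><Make>Triumph</Make><Model>Spitfire</Model></Vehicle>
    <Vehicle Category="Coupe" Domestic="True"><Make>Buick</Make><Model>Skylark</Model></Vehicle>
    <Vehicle Category="Sport"><Make>Porsche</Make><Model>Speedster</Model></Vehicle>
  </CarPolicy>
</Insured>"""

PATH = "/Insured/CarPolicy/Vehicle[@Category='Coupe']/Make"

# Naive split into location steps; adequate for this path only.
steps = PATH.lstrip("/").split("/")
print(steps)  # ['Insured', 'CarPolicy', "Vehicle[@Category='Coupe']", 'Make']

# Hypothetical stand-in for the root node, where an absolute path starts.
root_node = ET.Element("root-node-stand-in")
root_node.append(ET.fromstring(XML))

node_set = [root_node]
sizes = []
for step in steps:
    # Each step selects matching children of every node in the current set.
    node_set = [child for node in node_set for child in node.findall(step)]
    sizes.append(len(node_set))

print(sizes)  # [1, 2, 1, 1]

# The string value of an element node is the concatenation of the text
# of its descendants.
string_values = ["".join(node.itertext()) for node in node_set]
print(string_values)  # ['Buick']
```

The sizes show the node set growing to two CarPolicy nodes at the second step and narrowing to a single Vehicle node once the predicate is applied.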

XPath cannot create nodes or add detail to an XML source. You can use XPath in the context of XSLT, however, to create output that is based on, first, an XML source and, second, a set of directions (including XPath expressions) that are supplied in an XSLT stylesheet. If you use the instructions in Appendix B to try out XPath expressions, you'll be creating an output with XSLT.