Location Paths | Integrating PHP and XML 2004

A location path is an XPath expression that tells the XPath processor how to navigate an XML document. It is a sequence of location steps separated by slashes (/). The XPath processor evaluates a location path from left to right, starting with an initial context node. Each node that results from evaluating one location step becomes a context node for the next location step. XPath then combines the nodes that the final location step selects for each of its context nodes and returns them as a single nodeset.

There are two types of location paths, absolute and relative. An absolute location path starts with a /, and a relative location path does not. In an absolute location path, the initial context node is the root node of the document. In a relative location path, evaluation starts at the current context node. In both cases, the location steps are separated by a /, and each step is evaluated relative to the nodes that the previous step selects. When a step uses the default child axis, the / therefore expresses a direct parent-child relationship between the nodes that the adjacent steps select.

Each location step selects a nodeset with respect to its context node. The nodes in that nodeset become the context nodes for the next location step. For example, the location path child::customer/child::item selects the item element children of the customer element children of the context node. The child axis is the default axis, so the same path can be written as customer/item.
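The left-to-right evaluation described above can be sketched in Python. This is only an illustration under stated assumptions: it uses the standard-library xml.etree.ElementTree module, which accepts a small subset of the abbreviated XPath syntax (so child::customer/child::item is written customer/item), and the orders document and the child_step helper are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this sketch.
doc = ET.fromstring(
    "<orders>"
    "<customer id='c1'><item>pen</item><item>ink</item></customer>"
    "<customer id='c2'><item>paper</item></customer>"
    "<supplier><item>staples</item></supplier>"
    "</orders>"
)

def child_step(context_nodes, name):
    """Apply one child::name location step to every context node."""
    result = []
    for node in context_nodes:
        # Iterating over an Element yields its child elements in document order.
        result.extend(child for child in node if child.tag == name)
    return result

# Evaluate child::customer/child::item one step at a time,
# starting from <orders> as the initial context node.
customers = child_step([doc], "customer")   # step 1
items = child_step(customers, "item")       # step 2: step 1's nodes are the context

# ElementTree evaluates the abbreviated form of the same path in one call.
same_items = doc.findall("customer/item")

print([item.text for item in items])
print(items == same_items)
```

The item under supplier is not selected, because the first step only passes customer elements on to the second step.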

Creating Location Steps

A location step determines the nodes through which an XML document should be traversed to arrive at a final location. The syntax for a location step is:

 axis_name::nodetest[predicate]

For example, you navigate through an XML source document, companydetails.xml, using XPath.

Listing 5-4 shows the content of the XML document, companydetails.xml:

Listing 5-4: The companydetails.xml Document

 <company name="Blue Moon Systems">
   <ManagingDirector>Ron Floyd</ManagingDirector>
   <Department name="Administration">
     <Name>Tom</Name>
     <Age>35</Age>
     <Address> 17, landmark plaza, New York</Address>
     <EmployeeId>a01</EmployeeId>
   </Department>
   <Department name="Sales">
     <Name>John</Name>
     <Age>43</Age>
     <Address> 34, landmark plaza, New York</Address>
     <EmployeeId>s02</EmployeeId>
   </Department>
 </company>

The above listing shows the content of the companydetails.xml document, which contains information about the Blue Moon Systems company. The root element, company, has a name attribute and three child element nodes: one element for the managing director and two Department elements. Each Department element node has an attribute node, name, and four child element nodes: Name, Age, Address, and EmployeeId.

For example, the location path child::company/child::Department[attribute::name="Sales"] consists of two location steps:

child::company
- Axis: child
- Node test: company
- Predicate: none
child::Department[attribute::name="Sales"]
- Axis: child
- Node test: Department
- Predicate: [attribute::name="Sales"]

The first location step does not contain a predicate but the second location step includes a predicate. The predicate in the second location step selects the node representing a Department element only if it contains a name attribute with the value, Sales.
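The three parts of a location step can be made explicit in a short Python sketch. It is an illustration, not an XPath processor: it uses the standard-library xml.dom.minidom module, the step helper is hypothetical, and it assumes a copy of Listing 5-4 in which the managing director element is named ManagingDirector, because an XML element name cannot contain a space.

```python
from xml.dom import minidom

# Assumed copy of Listing 5-4 (element name ManagingDirector is an assumption).
XML = """<company name="Blue Moon Systems">
  <ManagingDirector>Ron Floyd</ManagingDirector>
  <Department name="Administration">
    <Name>Tom</Name><Age>35</Age>
    <Address>17, landmark plaza, New York</Address>
    <EmployeeId>a01</EmployeeId>
  </Department>
  <Department name="Sales">
    <Name>John</Name><Age>43</Age>
    <Address>34, landmark plaza, New York</Address>
    <EmployeeId>s02</EmployeeId>
  </Department>
</company>"""

def step(context_nodes, name, predicate=None):
    """Hypothetical helper: child::name[predicate] for each context node."""
    selected = []
    for context in context_nodes:
        for node in context.childNodes:               # axis: child
            if node.nodeType != node.ELEMENT_NODE:    # principal node type
                continue
            if node.tagName != name:                  # node test
                continue
            if predicate is None or predicate(node):  # optional predicate
                selected.append(node)
    return selected

root_node = minidom.parseString(XML)   # the document (root) node

# child::company/child::Department[attribute::name="Sales"]
companies = step([root_node], "company")
sales = step(companies, "Department",
             predicate=lambda n: n.getAttribute("name") == "Sales")

for dept in sales:
    print(dept.getElementsByTagName("Name")[0].firstChild.data)
```

Calling step(companies, "Department") without a predicate returns both Department elements, which shows that the predicate only filters what the axis and node test have already selected.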

Identifying Axes

An axis within an XPath location step determines the direction in which the XPath processor should navigate an XML document. There are two types of XPath axes, forward and backward. A forward axis consists of either the context node or nodes that appear after the context node in an XML document. A backward axis consists of the context node together with nodes that appear before it in an XML document. The backward axis is also known as the reverse axis.

The position() function reports the position of a node within the nodeset that a location step selects, and the direction of the axis determines how the nodes are numbered. If the axis is a forward axis, the nodes are numbered in document order. For example, child::*[position()=1] selects the first child element in document order. If the axis is a reverse axis, the nodes are numbered in reverse document order. For example, ancestor::*[position()=1] selects the parent element of the context node, which is the nearest ancestor and therefore the last ancestor in document order.

XPath 1.0 defines 13 axes. They are:

Child : Contains child nodes of the context node. The attribute and namespace nodes are not the child nodes of the element node to which they belong. A child axis is the default axis and never returns any attribute and namespace nodes in the return nodeset.
Parent : Contains the node that represents the parent of the context node. If the root node of a document is the context node, the parent axis is empty because the root node does not have a parent.
Ancestor : Contains the ancestors of a context node. The ancestors of a context node include the parent of the context node, the parent of that parent, and so on up the hierarchy. The ancestor axis always includes the root node except when the root node itself is the context node, because the root node does not have any ancestors.
Descendant : Contains the descendants of a context node. The descendants of a context node include the child nodes of the context node, child nodes of child nodes, and other child nodes in the hierarchy. As a result, the child axis is a part of the descendant axis. It does not contain the attribute and namespace nodes.
Descendant-or-self : Contains descendant nodes together with the context node.
Ancestor-or-self : Contains ancestor nodes together with the context node.
Following-sibling : Contains all nodes that are siblings of the context node and appear after the context node in document order. If the context node is either an attribute node or a namespace node, the following-sibling axis is empty. For example, take the <company> element of the companydetails.xml file as the context node. The XPath expression that first selects the <Department> element with the name attribute value, Administration, and then uses the following-sibling axis to retrieve the <Department> element node with the name attribute value, Sales, is:
```
 child::Department[@name="Administration"]/following-sibling::Department[@name="Sales"]
```

Preceding-sibling : Contains all nodes that are siblings of the context node and precede the context node in document order. If the context node is either an attribute node or a namespace node, the preceding-sibling axis is empty. For example, take the <company> element of the companydetails.xml file as the context node. The XPath expression that first selects the <Department> element with the name attribute value, Sales, and then uses the preceding-sibling axis to return the <Department> element with the name attribute value, Administration, is:
```
 child::Department[@name="Sales"]/preceding-sibling::Department[@name="Administration"]
```

Following : Contains all nodes that appear after the context node in an XML document. It excludes the descendant nodes, attribute nodes, and namespace nodes.
Preceding : Contains all nodes that appear before the context node in an XML document, except the ancestors of the context node, attribute nodes, and namespace nodes.
Attribute : Contains the attributes of a context node. The axis contains nodes only if the context node is an element node. In all other cases, the attribute axis is empty.
Namespace : Contains the namespace nodes of the context node. The axis contains nodes only if the context node is an element node. If the context node is not an element node, the namespace axis is empty.
Self : Contains only the context node.
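The two sibling axes can be walked by hand to see what they contain. The following Python sketch uses the standard-library xml.dom.minidom module on a trimmed, assumed copy of companydetails.xml (the managing director element is spelled ManagingDirector and the Department elements are left empty); the helper names are hypothetical.

```python
from xml.dom import minidom

# Trimmed, assumed copy of companydetails.xml, written without whitespace
# so that the only siblings are element nodes.
doc = minidom.parseString(
    '<company name="Blue Moon Systems">'
    '<ManagingDirector>Ron Floyd</ManagingDirector>'
    '<Department name="Administration"/>'
    '<Department name="Sales"/>'
    '</company>'
)

def following_sibling(node):
    """Nodes on the following-sibling axis, in document order."""
    result, sibling = [], node.nextSibling
    while sibling is not None:
        result.append(sibling)
        sibling = sibling.nextSibling
    return result

def preceding_sibling(node):
    """Nodes on the preceding-sibling axis, nearest first (reverse axis)."""
    result, sibling = [], node.previousSibling
    while sibling is not None:
        result.append(sibling)
        sibling = sibling.previousSibling
    return result

def departments(nodes, value):
    """Node test Department plus the predicate [@name="value"]."""
    return [n for n in nodes
            if n.nodeType == n.ELEMENT_NODE
            and n.tagName == "Department"
            and n.getAttribute("name") == value]

company = doc.documentElement
admin = departments(company.childNodes, "Administration")[0]
sales = departments(company.childNodes, "Sales")[0]

after_admin = departments(following_sibling(admin), "Sales")
before_sales = departments(preceding_sibling(sales), "Administration")

print(after_admin[0] is sales, before_sales[0] is admin)
print([n.tagName for n in preceding_sibling(sales)])
```

The last line lists the nearest sibling first, which is the numbering that position() uses on a reverse axis.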

Figure 5-2 shows the data model for an XML document:

This figure shows a data model that represents the elements in an XML document as nodes.

Figure 5-2: Data Model

You can determine various nodes corresponding to a specific axis by choosing a context node.

Table 5-2 lists the nodes corresponding to various axes, choosing node F as the context node:

Table 5-2: Nodes Corresponding to an Axis
Axis Name	Nodes
Self	F
Child	I, J
Parent	B
Ancestor	B, A, root node
Descendant	I, J, L
Descendant-or-self	F, I, J, L
Ancestor-or-self	F, B, A, root node
Following-sibling	G
Preceding-sibling	E
Following	G, K, C, D, H
Preceding	E

The child, parent, descendant, descendant-or-self, following-sibling, following, attribute, namespace, and self axes are forward axes. The ancestor, ancestor-or-self, preceding-sibling, and preceding axes are reverse axes.
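The axis definitions can be checked against Table 5-2 with a small Python sketch. The tree shape used here is an assumption: it is one arrangement of the nodes A to L that is consistent with the table, not a copy of the figure. The helper names are hypothetical, and the code relies only on the standard-library xml.dom.minidom module.

```python
from xml.dom import minidom

# Hypothetical tree shape, chosen only so that it agrees with Table 5-2.
doc = minidom.parseString(
    "<A><B><E/><F><I/><J><L/></J></F><G><K/></G></B><C/><D><H/></D></A>"
)

def label(node):
    return "root node" if node.nodeType == node.DOCUMENT_NODE else node.tagName

def ancestors(node):
    """ancestor axis: parent first, root node last (reverse axis)."""
    result = []
    while node.parentNode is not None:
        node = node.parentNode
        result.append(node)
    return result

def descendants(node):
    """descendant axis, in document order."""
    result = []
    for child in node.childNodes:
        result.append(child)
        result.extend(descendants(child))
    return result

def siblings_after(node):
    result, sibling = [], node.nextSibling
    while sibling is not None:
        result.append(sibling)
        sibling = sibling.nextSibling
    return result

def siblings_before(node):
    result, sibling = [], node.previousSibling
    while sibling is not None:
        result.append(sibling)
        sibling = sibling.previousSibling
    return result

def following(node):
    """following axis: later nodes in document order, minus descendants."""
    result = []
    for start in [node] + ancestors(node):
        for sibling in siblings_after(start):
            result.append(sibling)
            result.extend(descendants(sibling))
    return result

def preceding(node):
    """preceding axis: earlier nodes minus ancestors, nearest first."""
    before = []
    for candidate in descendants(node.ownerDocument):
        if candidate is node:
            break
        before.append(candidate)
    skip = ancestors(node)
    return [n for n in reversed(before) if n not in skip]

f = doc.getElementsByTagName("F")[0]   # context node F
axes = {
    "self": [f],
    "child": list(f.childNodes),
    "parent": [f.parentNode],
    "ancestor": ancestors(f),
    "descendant": descendants(f),
    "descendant-or-self": [f] + descendants(f),
    "ancestor-or-self": [f] + ancestors(f),
    "following-sibling": siblings_after(f),
    "preceding-sibling": siblings_before(f),
    "following": following(f),
    "preceding": preceding(f),
}
axes = {name: [label(n) for n in nodes] for name, nodes in axes.items()}
for name, labels in axes.items():
    print(f"{name:<20}{', '.join(labels)}")
```

On the reverse axes the nearest node comes first, so the first entry of the ancestor list is the parent, B.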

Performing the Node Test

A node test is a condition that an XPath processor applies to the nodes it finds along the axis of a location step. The nodes that pass this test, and any predicate that follows it, form the nodeset for the next location step. A node test identifies nodes on an axis by name or by type. The node tests that an XPath processor can perform are:

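QName : Selects the nodes of the principal node type for the axis whose name matches the QName. For example, child::Department selects the Department element children of the context node.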
node() : Selects all nodes regardless of their name and type.
text() : Selects all text nodes.
comment() : Selects all comment nodes.
processing-instruction() : Selects all processing instruction nodes.
prefix:* : Selects principal nodes that belong to the namespace defined by a prefix.

Note	The * node test selects all principal nodes for an axis. For a child axis, the principal node type is an element node. If the context node includes child nodes other than element nodes, the resultant nodeset does not include those nodes.
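The effect of each node test on the child axis can be sketched in Python by filtering on DOM node types. The note fragment below is hypothetical and contains one child of each common node type; the select helper is likewise invented, and the DOM node types from the standard-library xml.dom package only approximate the XPath data model.

```python
from xml.dom import Node, minidom

# Hypothetical fragment: a comment, a processing instruction,
# a text node, and an element, all children of <note>.
doc = minidom.parseString(
    "<note><!--draft--><?render fast?>Hello <b>world</b></note>"
)
context = doc.documentElement
children = list(context.childNodes)   # nodes on the child axis

tests = {
    "node()": lambda n: True,
    "text()": lambda n: n.nodeType == Node.TEXT_NODE,
    "comment()": lambda n: n.nodeType == Node.COMMENT_NODE,
    "processing-instruction()":
        lambda n: n.nodeType == Node.PROCESSING_INSTRUCTION_NODE,
    # Elements are the principal node type of the child axis.
    "*": lambda n: n.nodeType == Node.ELEMENT_NODE,
    "b": lambda n: n.nodeType == Node.ELEMENT_NODE and n.tagName == "b",
}

def select(test):
    """Names of the child nodes that pass the given node test."""
    return [n.nodeName for n in children if tests[test](n)]

for test in tests:
    print(f"child::{test}", select(test))
```

Only node() returns all four children; the * test returns the b element alone, which matches the note above about principal node types.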

Listing 5-5 shows the content of the XML document, course.xml:

Listing 5-5: The course.xml Document

 <?xml version="1.0" ?>
 <!--An example XML representation of the courses in a college-->
 <courses>
   <college>St. Peters</college>
   <branches>
     <branch name="EC">
       <semester number="1">
         <subject name="Digital Circuits"> First subject in semester 1 of EC branch. </subject>
         <subject name="Analog Circuits"> </subject>
         <subject name="Electromagnetics"> </subject>
       </semester>
       <semester number="2">
         <subject name="Radar Theory"> </subject>
         <subject name="Satellite Communication theory"> </subject>
         <subject name="Active Networks"> </subject>
       </semester>
     </branch>
     <branch name="CS">
       <semester number="1">
         <subject name="Microprocessor"> </subject>
         <subject name="Data Structure"> </subject>
         <subject name="Computer System Organization"> </subject>
       </semester>
       <semester number="2">
         <subject name="Operating System"> </subject>
         <subject name="Java"> </subject>
       </semester>
     </branch>
   </branches>
   <CommonSubjects>
     <Common name="PDC"> </Common>
     <Common name="NA"> </Common>
     <Common name="CP"> </Common>
   </CommonSubjects>
 </courses>

In the above listing, the course.xml document contains information about the semester-wise courses for two disciplines, EC and CS. The course.xml document also lists the subjects common for these two disciplines.

The following location step expressions demonstrate how to implement a node test:

child::subject : Selects the <subject> child nodes of the context node. If the context node is the <semester> element with the number attribute value, 2, in the EC branch, the returned nodeset contains three <subject> child nodes. However, if the <courses> element is the context node, the location step expression returns an empty nodeset. This is because the <subject> elements are descendants of the <courses> element, not its children.
child::* : Selects all element nodes that are child nodes of the context node. If the <courses> element is the context node, the expression returns a nodeset that contains the <college>, <branches>, and <CommonSubjects> element nodes.
child::text() : Selects text nodes that are child nodes of the context node. For example, if the context node is the <subject> element node with the name attribute value, Digital Circuits, in semester 1 of the EC branch, the XPath expression returns a nodeset that consists of one text node with the string value, First subject in semester 1 of EC branch.
child::branch/descendant::subject : Selects <subject> element nodes that are descendant nodes of the <branch> child nodes of the context node. If you select the <branches> element node as the context node, the expression returns a nodeset that contains 11 <subject> elements. This is because every <subject> element node is a descendant of one of the two <branch> children of the <branches> element node.

In the above examples, the XPath expressions use relative location paths. Alternatively, you can use an absolute location path to retrieve specific data from an XML document. For example, the location path that selects all element nodes that are child nodes of the root node is:

 /child::*

The root node has only one element child, the document element. As a result, the above location path returns a nodeset that consists of only the <courses> element node.
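The counts quoted in the node test examples can be reproduced with a Python sketch. It assumes the standard-library xml.etree.ElementTree module, which accepts only a subset of the abbreviated XPath syntax and does not accept explicit axis names such as child::, so each expression is written in its abbreviated form and the unabbreviated form appears in a comment. The string COURSE_XML is a copy of course.xml.

```python
import xml.etree.ElementTree as ET

COURSE_XML = """<?xml version="1.0" ?>
<!--An example XML representation of the courses in a college-->
<courses>
 <college>St. Peters</college>
 <branches>
  <branch name="EC">
   <semester number="1">
    <subject name="Digital Circuits"> First subject in semester 1 of EC branch. </subject>
    <subject name="Analog Circuits"> </subject>
    <subject name="Electromagnetics"> </subject>
   </semester>
   <semester number="2">
    <subject name="Radar Theory"> </subject>
    <subject name="Satellite Communication theory"> </subject>
    <subject name="Active Networks"> </subject>
   </semester>
  </branch>
  <branch name="CS">
   <semester number="1">
    <subject name="Microprocessor"> </subject>
    <subject name="Data Structure"> </subject>
    <subject name="Computer System Organization"> </subject>
   </semester>
   <semester number="2">
    <subject name="Operating System"> </subject>
    <subject name="Java"> </subject>
   </semester>
  </branch>
 </branches>
 <CommonSubjects>
  <Common name="PDC"> </Common>
  <Common name="NA"> </Common>
  <Common name="CP"> </Common>
 </CommonSubjects>
</courses>"""

courses = ET.fromstring(COURSE_XML)      # the <courses> document element
branches = courses.find("branches")
ec_sem2 = courses.find("branches/branch[@name='EC']/semester[@number='2']")
digital = courses.find(".//subject[@name='Digital Circuits']")

# child::subject, context node = semester 2 of the EC branch
in_ec_sem2 = ec_sem2.findall("subject")
# child::subject, context node = <courses>
under_courses = courses.findall("subject")
# child::*, context node = <courses>
element_children = [e.tag for e in courses.findall("*")]
# child::text(), context node = the Digital Circuits <subject>
text_value = digital.text.strip()
# child::branch/descendant::subject, context node = <branches>
all_subjects = branches.findall("branch//subject")

print(len(in_ec_sem2), len(under_courses), len(all_subjects))
print(element_children)
print(text_value)
```

ElementTree has no separate root node object, so /child::* is not evaluated here; the document element that it would return is the courses object itself.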

Setting Predicates

A location step can include one or more predicates that describe the conditions that a node must satisfy to be selected. A predicate acts as a filter for the nodeset that the axis and node test return. The resultant nodeset of a location step includes only the nodes that satisfy the predicate. You must write a predicate within square brackets. The comparison operators that a predicate can use to compare a node property with a specific value are =, !=, >, <, >=, and <=. A node property can be, for example, an attribute value or the position of the node in the nodeset, which the position() XPath function returns.

The following location step expressions use Listing 5-5 to demonstrate how to implement predicates:

child::subject [position()=2] : Selects the second <subject> child node of the context node. If the context node is the first <semester> element node in the EC branch, the expression returns the second <subject> element node with the name attribute value, Analog Circuits.
preceding::branch [attribute::name] : Selects the <branch> element nodes that possess a name attribute and precede the context node. If the context node is the <subject> element node with the name attribute value, Microprocessor, the expression returns the <branch> element node with the name attribute value, EC. The CS <branch> element is not returned because it is an ancestor of the context node, and the preceding axis excludes ancestors.

Each of the above expressions returns a nodeset that consists of a single node. A predicate can also select more than one node. For example, the following location step selects all <subject> element nodes that are child nodes of the context node in Listing 5-5 and appear after position 1:

 child::subject [position()>1]

If the context node is the <semester> element node with the number attribute value, 1, in the EC branch, the expression returns the second and third <subject> element nodes, Analog Circuits and Electromagnetics, in the nodeset.
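The three predicate examples can be sketched in Python as follows. The sketch assumes the standard-library xml.etree.ElementTree module and a trimmed copy of course.xml that keeps only the elements the examples touch. ElementTree accepts subject[2] but, as far as its documented XPath subset goes, not position()>1 or the preceding axis, so those two are filtered in plain Python; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of course.xml: only the parts used by the predicate examples.
XML = """<branches>
 <branch name="EC">
  <semester number="1">
   <subject name="Digital Circuits"/>
   <subject name="Analog Circuits"/>
   <subject name="Electromagnetics"/>
  </semester>
 </branch>
 <branch name="CS">
  <semester number="1">
   <subject name="Microprocessor"/>
  </semester>
 </branch>
</branches>"""

root = ET.fromstring(XML)
ec_sem1 = root.find("branch[@name='EC']/semester[@number='1']")

# child::subject[position()=2], written subject[2] for ElementTree
second = ec_sem1.findall("subject[2]")

# child::subject[position()>1]: filter on the 1-based position by hand
subjects = ec_sem1.findall("subject")
after_first = [s for pos, s in enumerate(subjects, start=1) if pos > 1]

# preceding::branch[attribute::name], context = the Microprocessor <subject>
context = root.find(".//subject[@name='Microprocessor']")
parent_of = {child: parent for parent in root.iter() for child in parent}
ancestors, node = [], context
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node)

preceding = []
for candidate in root.iter():           # iter() walks in document order
    if candidate is context:
        break
    if candidate not in ancestors:      # the preceding axis skips ancestors
        preceding.append(candidate)
branches_before = [e for e in preceding
                   if e.tag == "branch" and "name" in e.attrib]

print([s.get("name") for s in second])
print([s.get("name") for s in after_first])
print([b.get("name") for b in branches_before])
```

The CS branch is missing from the last result because it is an ancestor of the context node, even though its start tag appears earlier in the document.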