Finding the Intended Content | The Official XMLSPY Handbook

One thing you may have noticed is that it isn’t always easy to find the intended content within an XML document by using XPath and regular expressions. Often in XSLT, a query is executed and the result either contains extra bits or is missing bits that you expected. And when you adjust the query to get the missing bits included, other bits magically appear. XSLT selection can be a mind-boggling problem that generates strange side effects. The simplest way to understand an XPath expression is to think of it as a change of location in a directory structure.

When you type the following command, you expect to move up a level in the directory structure:

cd ..

When you type this next command, you expect to change directories to the subdirectory documents:

cd documents

XPath works in a similar way to these directory identifiers except that the path manipulations are much richer, more flexible, and do not require the cd command.
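The directory analogy can be tried outside XMLSPY. The following is a minimal sketch that assumes Python and its standard xml.etree.ElementTree module, which supports a small subset of XPath; the home, documents, and report names are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree standing in for a small directory structure.
home = ET.fromstring("<home><documents><report/></documents></home>")

# Like "cd documents": step from the context node down to a named child.
documents = home.find("documents")

# Like "cd ..": step down to report, then back up to its parent.
back_up = home.find("documents/report/..")

print(documents.tag)         # documents
print(back_up is documents)  # True
```

Unlike cd, a single path expression can chain several such moves, which is why no separate command is needed.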

Using the XMLSPY Evaluate XPath functionality

To edit an XPath expression, a simple approach is to type the XPath in the XSLT document and run the XSLT document. If the XPath is correct, XML content is generated. But if you want an experiment that gives immediate gratification, the XMLSPY XPath Evaluator is better. To access the XPath Evaluator, select the XML document that you want to test and then choose XML → Evaluate XPath. This command sequence results in the dialog box shown in Figure 6-3.

Figure 6-3: The XMLSPY IDE with the Evaluate XPath dialog box active.

The Evaluate XPath dialog box dynamically executes an XPath on the loaded XML document and shows the results in the dialog box text area. As a simple test, type the following into the XPath text box in the dialog box:

//elements

The results are similar to those shown in Figure 6-4.

Figure 6-4: The XPath Evaluation output.

XPath basics

The examples shown thus far are all relatively simple and straightforward. The following XPath example is a bit more sophisticated:

child::elements[attribute::embedded != "true"]

What this XPath example says is “Find all the child elements XML nodes in which the embedded XML attribute is not equal to the value of true.” In this more sophisticated example, the concepts of axis, predicates, and so on have been used.
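One detail of this predicate is easy to miss: an elements node with no embedded attribute at all is not selected, because the comparison needs an attribute to compare. The sketch below emulates the predicate by hand in Python (the sample document and its A, B, C content are hypothetical); it is an illustration of the rule, not the XMLSPY evaluator itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node with three elements children.
context = ET.fromstring(
    "<data>"
    "<elements embedded='true'>A</elements>"
    "<elements embedded='false'>B</elements>"
    "<elements>C</elements>"
    "</data>"
)

# Emulates: child::elements[attribute::embedded != "true"]
# A node passes only if it HAS an embedded attribute whose value differs
# from "true"; a node without the attribute is filtered out as well.
selected = [
    e.text
    for e in context.findall("elements")
    if e.get("embedded") is not None and e.get("embedded") != "true"
]

print(selected)  # ['B']
```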

Changing Position in XML

When you reference XML nodes in an XPath expression, a position change occurs. Usually in XPath, the way to reference a sub-element is to use one of the following notations:

/item
/*[1]
//item[1]

In all these examples, the sub-element is specified using special characters, which are defined as follows:

*: Match all sub-elements specified at the current context (current location).
/: At the start of an expression, select the document root as the current context (lack of / indicates selection starts at the current context). Between steps, / separates one location step from the next.
//: Search the current context and all of its descendants, at any depth (consider this as an XML document where all the nodes have the chance to be a root node).
[ ]: Specify a predicate by which the nodes selected thus far are filtered according to the rules indicated inside the brackets (in the samples shown, the predicates find the first element of the selection).
.: Select the current context (usually is not referenced).
..: Select the parent element instead of changing location to a sub-element.
@: Select a specific attribute based on the current context (when used with the wildcard *, will select all attributes).
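Several of these special characters can be tried in a few lines of Python. This is a sketch under stated assumptions: it uses the standard xml.etree.ElementTree module, which implements only part of the abbreviated syntax and requires relative paths (hence the leading dot in .//item), and the list, group, and item document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the id attributes only label the nodes.
root = ET.fromstring(
    "<list>"
    "<item id='a'/>"
    "<group><item id='b'/><item id='c'/></group>"
    "</list>"
)

def ids(nodes):
    """Return each node's id attribute, or its tag when it has no id."""
    return [n.get("id", n.tag) for n in nodes]

print(ids(root.findall("*")))                  # ['a', 'group']  all children
print(ids(root.findall(".//item")))            # ['a', 'b', 'c'] any depth
print(ids(root.findall(".//item[1]")))         # ['a', 'b']      first item of EACH parent
print(ids(root.findall("group/item[2]")))      # ['c']           second item in group
print(ids(root.findall("group/item/..")))      # ['group']       back up to the parent
print(ids(root.findall(".//item[@id='c']")))   # ['c']           filter on an attribute
```

Note the third line: a positional predicate after // is applied per parent, so it can return more than one node.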

The XPath examples shown previously are called abbreviated syntax XPath because they use the special characters shown in the preceding list. As the following examples demonstrate, however, you can express the previous examples by using standard XPath with axis specifiers:

child::*
child::item
child::*[1]
descendant::item[1]

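The axis specifier is the text before the double colon. Following the double colon is the same text as in the abbreviated XPath syntax, and the wildcards and predicates are still applicable. One caveat: //item[1] is shorthand for /descendant-or-self::node()/child::item[1], which selects every item that is the first item child of its parent, whereas descendant::item[1] selects only the first item descendant. The axis specifiers are defined as follows (note that all axis identifiers are relative to the current context):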

child: References all the children of the current context
descendant: References all the descendants of the current context: its children, their children, and so on
parent: References the parent of the current context
ancestor: References all the ancestors of the current context: its parent, the parent of the parent, and so on up to the root
following-sibling: References the XML nodes that follow the current context and share its parent
preceding-sibling: References the XML nodes that precede the current context and share its parent
following: References all XML nodes after the current context in document order, excluding its descendants
preceding: References all the XML nodes before the current context in document order, excluding its ancestors
attribute: References the attributes of the current context
namespace: Contains the namespace nodes of the current context
self: References the current context
descendant-or-self: References the current context and all descendants
ancestor-or-self: References the current context and all ancestors

Demonstrating Axis Specifiers

The best way to illustrate what an axis specifier does is through examples.

Tip

The axis specifier examples do not show that I used the Evaluate XPath dialog box; however, I want to reiterate that I did. You may be under the impression that I used the XSLT query expression. You should also use this dialog box when you’re experimenting with XPath.

The XPath expressions that I execute in the following examples are all based on this XML document:

<data>
    <child>
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>

In the following examples, an XPath expression is defined and executed on the XML document, and the resulting selection is shown together with the XML document.

Starting with a simple child selection, consider the following XPath:

child::*

The selection is the data XML node, together with everything inside it:

<data>
    <child>
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>

Notice that the selection is the data XML node, the top-level element of the document. This may seem a bit puzzling because you would expect the child nodes to be the child and elements XML nodes. The selection stays the same even if you modify the XPath to the following:

/child::*

The reason for this is the XPath data model. The root node is not the data XML node, but the XML document itself, which has the data XML node as its child. It is a small item to remember, but you need to keep it in mind because it can be confusing at times.
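The distinction between the document and its top-level element is visible in most XML APIs. As a small sketch, assuming Python's standard xml.dom.minidom module and a shortened version of the sample document, the document node has exactly one child, and that child is data:

```python
from xml.dom import minidom

# Shortened version of the sample document (no whitespace between tags).
doc = minidom.parseString("<data><child/><elements/></data>")

# The root of the tree is the document itself, not the data element.
is_document = doc.nodeType == doc.DOCUMENT_NODE

# child::* from the root: the document's only child is data.
root_children = [n.nodeName for n in doc.childNodes]

# child::* from data: the nodes you might have expected at first.
data_children = [n.nodeName for n in doc.documentElement.childNodes]

print(is_document)    # True
print(root_children)  # ['data']
print(data_children)  # ['child', 'elements']
```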

Now consider the following XPath that selects a specific XML node:

/child::*/child::child

The selection is the child XML node, together with the elements XML node inside it:

<data>
    <child>
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>

This selection is fairly obvious because it is an XML node drill down. To make this selection a bit more interesting, use the descendant axis specifier, as in the following expression:

/child::*/descendant::elements

The selection is all three elements XML nodes (Hello, World, and Embedded):

<data>
    <child>
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>

The descendant axis specifier makes it possible to select a current context and then select every matching XML node beneath it, at any depth. This is useful when you want to select every XML node or a specific set of XML nodes beneath a specific context.
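The difference between the child and descendant axes can be checked against the sample document. This sketch assumes Python's standard xml.etree.ElementTree module, where the path .//elements plays the role of descendant::elements relative to the data node and a bare elements step plays the role of child::elements.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<data>
    <child>
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>"""

data = ET.fromstring(SAMPLE)

# Counterpart of /child::*/descendant::elements (any depth below data).
found = [e.text for e in data.findall(".//elements")]

# Counterpart of /child::*/child::elements (direct children of data only).
direct = [e.text for e in data.findall("elements")]

print(found)   # ['Hello', 'World', 'Embedded']
print(direct)  # ['World']
```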

If, however, you want to find XML nodes after a found condition, use the axis specifier named following, as shown by this example:

/child::*/child::child/following::elements

The selection is the two elements XML nodes that come after the child XML node (World and Embedded):

<data>
    <child>
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>

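The following axis specifier is like a descendant axis specifier in that matching nodes are selected at any depth, including the embedded elements XML node. The difference is the search area: following looks at everything after the current context in the document, excluding the descendants of the current context itself, which is why the Hello elements XML node is not selected. If only the first level of the following XML nodes are to be selected, the following axis specifier needs to be replaced with following-sibling, as in this example: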

/child::*/child::child/following-sibling::elements

The selection is only the World elements XML node, which is the one sibling named elements after the child XML node:

<data>
    <child>
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>
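Python's standard xml.etree.ElementTree module does not implement the following or following-sibling axes, so the sketch below emulates them by hand to make the rules explicit. The helper names following and following_sibling are made up for this illustration and only handle element nodes.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    "<data><child><elements>Hello</elements></child>"
    "<elements>World<sub><elements>Embedded</elements></sub></elements></data>"
)
data = ET.fromstring(SAMPLE)
context = data.find("child")  # counterpart of /child::*/child::child

def following(root, node):
    """Elements after node in document order, minus node's own descendants."""
    order = list(root.iter())          # all elements in document order
    own = set(node.iter())             # node itself plus its descendants
    start = order.index(node) + 1
    return [e for e in order[start:] if e not in own]

def following_sibling(root, node):
    """Elements after node that share node's parent."""
    parent = {c: p for p in root.iter() for c in p}[node]
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]

after = [e.text for e in following(data, context) if e.tag == "elements"]
after_siblings = [
    e.text for e in following_sibling(data, context) if e.tag == "elements"
]

print(after)           # ['World', 'Embedded']
print(after_siblings)  # ['World']
```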

The preceding and preceding-sibling axis specifiers function identically to the following and following-sibling axis specifiers, except that the preceding and preceding-sibling axis specifiers search the XML nodes before the current context. Consider the following use of the preceding axis specifier:

/child::*/child::child/preceding::*

The selection is empty:

<data>
    <child>
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>

Notice that nothing is selected, even though the data XML node is before the child XML node. But the data XML node is a level higher: it is an ancestor of the child XML node, and the preceding axis excludes ancestors, so it is not part of the selection. To make this selection work, the ancestor axis specifier is used as shown in the following example:

/child::*/child::child/ancestor::*

The selection is the data XML node:

<data>
    <child>
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>

The selected data XML node is a correct selection because it is the parent of the child XML node. Note that if there were a parent to the data XML node, it would be selected as well. A tweak to this selection is to include the current context in the XPath, as the following example shows:

/child::*/child::child/elements/ancestor-or-self::*

The selection is the Hello elements XML node, the child XML node, and the data XML node:

<data>
    <child>
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>

Notice in the XPath, the elements XML node is referenced directly without an axis specifier. This is legal and implies the child axis specifier. The node set is correct and includes the current context.
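The preceding, ancestor, and ancestor-or-self selections can be reproduced with a hand-written emulation. This is a sketch only: Python's standard xml.etree.ElementTree module has no support for these axes, so the ancestors and preceding helpers below are hypothetical names written for this illustration, and the ancestor chain is returned nearest-first rather than in document order.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    "<data><child><elements>Hello</elements></child>"
    "<elements>World<sub><elements>Embedded</elements></sub></elements></data>"
)
data = ET.fromstring(SAMPLE)
PARENT = {c: p for p in data.iter() for c in p}  # child -> parent lookup

def ancestors(node):
    """ancestor::* : the parent, its parent, and so on (nearest first)."""
    chain = []
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain

def preceding(node):
    """preceding::* : earlier elements in document order, minus ancestors."""
    order = list(data.iter())
    up = ancestors(node)
    return [e for e in order[:order.index(node)] if e not in up]

child = data.find("child")
hello = data.find("child/elements")

before_child = [e.tag for e in preceding(child)]
above_child = [e.tag for e in ancestors(child)]
hello_or_above = [e.tag for e in [hello] + ancestors(hello)]

print(before_child)    # []
print(above_child)     # ['data']
print(hello_or_above)  # ['elements', 'child', 'data']
```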

Tip

When using the self keyword within an axis specifier, be careful about looping because the current context is in the selection. This could lead to an infinite loop.

If only the parent is to be selected, you use the parent axis specifier, as shown in the following example:

/child::*/child::child/elements/parent::*

The selection is the child XML node:

<data>
    <child>
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>

The result is as expected, but only because the parent axis specifier has a wildcard parent selection. If the wildcard were to be exchanged with a specific XML node identifier, such as elements, the XPath would return an empty parent node set. This type of selection is very useful when testing if the parent is a specific XML node type.
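The parent test described above can be sketched in Python. The standard xml.etree.ElementTree module supports the abbreviated .. step but not a named parent test, so the name check is done by hand here; that filtering step is an assumption of this sketch, not part of the XPath itself.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    "<data><child><elements>Hello</elements></child>"
    "<elements>World<sub><elements>Embedded</elements></sub></elements></data>"
)
data = ET.fromstring(SAMPLE)

# Counterpart of /child::*/child::child/elements/parent::*
any_parent = data.findall("child/elements/..")

# Counterpart of .../parent::elements : keep the parent only if it is
# itself named elements (emulated with a manual tag comparison).
named_parent = [p for p in any_parent if p.tag == "elements"]

print([p.tag for p in any_parent])  # ['child']
print(named_parent)                 # []
```

An empty result from the named test is exactly the "is my parent of this type?" check: the list is non-empty only when the parent has the requested name.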

The attribute axis specifier has nothing to select in the sample XML document because the document contains no attributes. A modified XML document that includes an attribute looks like the following:

<data>
    <child haselements="true">
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>

Execute the following XPath expression:

/child::*/child::child/attribute::*

The selection is the haselements attribute of the child XML node:

<data>
    <child haselements="true">
        <elements>Hello</elements>
    </child>
    <elements>World<sub>
        <elements>Embedded</elements></sub>
    </elements>
</data>

The node set can include any number of attributes; in this XML document, there is only one. It is important to realize that when attributes are selected, axis specifiers such as following and preceding still apply.
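For comparison, here is how the same attribute looks from Python. This is a sketch assuming the standard xml.etree.ElementTree module, which exposes attributes as a dictionary on each element rather than as nodes returned from a path expression.

```python
import xml.etree.ElementTree as ET

MODIFIED = (
    "<data><child haselements='true'><elements>Hello</elements></child>"
    "<elements>World<sub><elements>Embedded</elements></sub></elements></data>"
)
data = ET.fromstring(MODIFIED)

# Counterpart of /child::*/child::child/attribute::*
child_attributes = dict(data.find("child").attrib)

# The elements node inside child has no attributes, so the same step
# taken from there would select nothing.
hello_attributes = dict(data.find("child/elements").attrib)

print(child_attributes)  # {'haselements': 'true'}
print(hello_attributes)  # {}
```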

Determining when to use axis specifiers and abbreviated syntax

After you have gone through the axis specifier syntax and the abbreviated syntax, it could be tempting to rewrite the preceding attribute axis specifier example in the following abbreviated syntax:

/child::*/child::child[@*]

But when the XPath is evaluated, the node set is not the same as in the attribute axis specifier example. The square brackets are a predicate that keeps only the child XML nodes that have at least one attribute, so the result is the child XML node itself, not its attributes. The correct XPath that returns the same result set as if you used the attribute axis specifier is as follows:

/child::*/child::child/@*

Now the XPath returns the same selection of attributes. This leads us to an interesting finding: XPaths require some experience to fully understand how the selection occurs. In the simplest case, XPaths are used in XSLT to set a current context, and most likely the abbreviated syntax is used. But there are situations where fine-tuning or a more specific type of selection is required. In that scenario, the axis specifiers are very useful because they allow exact definition of a node set. Axis specifiers are more verbose than the abbreviated syntax, however, so this does not mean that you should always use the axis specifier syntax. The abbreviated syntax was created to allow a developer to quickly write a generic XPath expression without having to be too verbose.
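The difference between filtering with a predicate and stepping onto the attributes can be made concrete. The sketch below emulates both expressions by hand in Python, because the standard xml.etree.ElementTree module cannot return attribute nodes from a path; treat it as an illustration of the two result shapes rather than as an XPath engine.

```python
import xml.etree.ElementTree as ET

MODIFIED = (
    "<data><child haselements='true'><elements>Hello</elements></child>"
    "<elements>World<sub><elements>Embedded</elements></sub></elements></data>"
)
data = ET.fromstring(MODIFIED)

# Emulates /child::*/child::child[@*]
# The predicate FILTERS: the result is the child element itself.
filtered = [e for e in data.findall("child") if e.attrib]

# Emulates /child::*/child::child/@*
# The extra step MOVES: the result is the attributes of child.
attributes = [
    (name, value)
    for e in data.findall("child")
    for name, value in e.attrib.items()
]

print([e.tag for e in filtered])  # ['child']
print(attributes)                 # [('haselements', 'true')]
```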

Using XPath Expressions

XPath also lets you use functions to retrieve information about a given context. The axis specifiers and the abbreviated syntax allow a developer to select a specific context or node set. But that node set is useless when trying to transform content because the node set will be output without the control of the XSLT document. To get specific control of the output, XPath expressions are used. But XPath expressions can also be used to manage multiple conditions. For example, the XMLSPY built-in templates debugged at the start of this chapter had the following rule:

<xsl:template match="*|/" mode="?">
    <xsl:apply-templates mode="?"/>
</xsl:template>

The xsl:template match uses the vertical bar (|) to denote one condition or another: the template applies to a node that matches either * or /. If you’re a C, C++, C#, or Java programmer, this bar will remind you of the or operator used in a conditional statement; in XPath it is the union operator, which combines two node sets into one. The operators can combine with expressions, which were used in the xsl:value-of XSLT XML nodes.
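The effect of the vertical bar on two node sets can be sketched in Python. The standard xml.etree.ElementTree module does not implement the | operator, so the union is computed by hand below; the two paths being combined are chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    "<data><child><elements>Hello</elements></child>"
    "<elements>World<sub><elements>Embedded</elements></sub></elements></data>"
)
data = ET.fromstring(SAMPLE)

left = data.findall(".//elements")  # every elements node below data
right = data.findall("*")           # the direct children of data

# Emulates left | right: each node that is in either set appears once,
# in document order. The World elements node is in both sets.
union = [e for e in data.iter() if e in left or e in right]

print(len(left), len(right))   # 3 2
print([e.tag for e in union])  # ['child', 'elements', 'elements', 'elements']
```

The union holds four nodes, not five, because a node set never contains the same node twice.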