Understanding XPath


XPath is a simple query language for XML that mimics standard directory access syntax. For example, if I had a company directory and wanted to access the third company node, I could use syntax like:

 /companies/company[3] 

XPath syntax has the advantage of being both more understandable and more portable than a platform-specific notation, such as the ColdFusion variable syntax described in the last chapter. In addition, because XPath was created for searching documents, it tends to be more flexible than a platform-specific method.

XPath is to XML as URLs are to the Internet. Where a URL is a pattern-based way to locate resources online, XPath is a language used to locate nodes within an XML document.

Example: A CD Collection

Listing 15.1 is an XML document describing a few of the CDs in my collection. This CD collection will be the input XML document for all of the examples in this chapter.

Listing 15.1. CDCollection.xml: The XML document used for this chapter

<?xml version="1.0" ?>
<cdcollection>
  <artist name="Air">
    <genre>Electronic</genre>
    <cd name="10,000 Hz Legend" rating="3">
      <recommend cd="2" />
    </cd>
    <cd name="Talkie Walkie" rating="4">
      <recommend cd="1" />
      <recommend cd="6" />
    </cd>
  </artist>
  <artist name="Kylie Minogue">
    <genre>Dance</genre>
    <cd name="Fever" rating="3">
      <recommend cd="4" />
    </cd>
    <cd name="Body Language" rating="4">
      <recommend cd="5" />
    </cd>
    <recommend artist="3" />
  </artist>
  <artist name="Dannii Minogue">
    <genre>Dance</genre>
    <genre>Electronic</genre>
    <cd name="Neon Nights" rating="5">
      <recommend cd="4" />
    </cd>
    <cd name="You Won't Forget About Me EP" rating="5">
      <recommend cd="5" />
    </cd>
    <recommend artist="2" />
  </artist>
  <artist name="Brooklyn Funk Essentials">
    <genre>Funk</genre>
    <genre>Dance</genre>
    <genre>Spoken Word</genre>
    <cd name="Cool &amp; Steady &amp; Easy" rating="5">
      <recommend cd="4" />
      <recommend cd="5" />
    </cd>
    <recommend artist="1" />
    <recommend artist="5" />
  </artist>
  <artist name="Felix Da Housecat">
    <genre>Electronica</genre>
    <genre>Dance</genre>
    <genre>Retro</genre>
    <cd name="A Bugged Out Mix" rating="3">
      <recommend cd="4" />
      <recommend cd="6" />
      <recommend cd="9" />
    </cd>
    <cd name="Kittenz and Thee Glitz" rating="5" />
    <cd name="Devin Dazzle &amp; The Neon Fever" rating="5">
      <recommend cd="9" />
    </cd>
    <recommend artist="2" />
    <recommend artist="4" />
  </artist>
</cdcollection>

The collection is broken down by artist, each of whom has one or more genres, one or more CDs, and zero or more recommendations for other artists. Each CD can have zero or more recommendations for other CDs. By the end of the chapter we will have turned this XML structure into an HTML listing that displays all this information in a user-friendly format.

XPath syntax

The syntax of an XPath search expression is based on the same syntax as file paths in UNIX or Windows, with the addition of features that allow the developer to restrict the returned node set based on some criteria. For example, to find all cd nodes with a rating higher than 3, I would use syntax like:

 /cdcollection/artist/cd[@rating > 3] 

In essence, I am using typical directory hierarchy syntax to select nodes, then using typical array/structure syntax to restrict nodes. When I run the XPath search against my document, I will get back an array containing all the nodes that match the expression.

Selections using / and //

Much like you select columns from a table in an SQL statement, you select nodes from an XML document in an XPath expression. For instance, in this syntax:

 /cdcollection 

we're selecting the cdcollection element directly underneath the root node of the document.

If we wanted to select children of cdcollection, we'd extend the previous selection with another one:

 /cdcollection/artist 

And so forth, until we've drilled down to the level of the XML hierarchy we're looking for.

It is possible to shorten certain XPath expressions. For instance, this expression:

 /cdcollection/artist/cd 

could also be written as:

 //cd 

For this document, the two expressions return the same results; however, they do not mean the same thing. The first expression retrieves only the cd nodes at that exact position in the hierarchy, whereas the second retrieves all cd nodes anywhere in the hierarchy. In effect, // means "search the entire document."

While // is certainly more convenient in some cases, its use can present some problems. First, because it is not doing any kind of restriction by document structure, the XPath engine must visit every node in the document to find all the possible matches. This makes searches using // much slower in most cases.

Second, consider this expression:

 /cdcollection/artist/recommend 

Shortening that expression to:

 //recommend 

returns not only the recommend nodes underneath an artist node, but also the recommend nodes found under any cd node. In order to return the same data as the original expression, we'd have to use syntax like:

 //artist/recommend 

But again, using the // instead of giving a fully qualified path will make the search take longer, especially on large documents.
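To make the difference concrete, here is a minimal sketch in Python rather than ColdFusion, using the standard-library xml.etree.ElementTree module. Two assumptions to note: the sample is a copy of only the first two artists from Listing 15.1, and ElementTree evaluates paths relative to an element, so "artist/recommend" run from the cdcollection root stands in for /cdcollection/artist/recommend, and ".//recommend" stands in for //recommend.

```python
import xml.etree.ElementTree as ET

# Assumption: only the first two artists of Listing 15.1, embedded for self-containment.
SAMPLE = """
<cdcollection>
  <artist name="Air">
    <genre>Electronic</genre>
    <cd name="10,000 Hz Legend" rating="3"><recommend cd="2" /></cd>
    <cd name="Talkie Walkie" rating="4"><recommend cd="1" /><recommend cd="6" /></cd>
  </artist>
  <artist name="Kylie Minogue">
    <genre>Dance</genre>
    <cd name="Fever" rating="3"><recommend cd="4" /></cd>
    <cd name="Body Language" rating="4"><recommend cd="5" /></cd>
    <recommend artist="3" />
  </artist>
</cdcollection>
"""

root = ET.fromstring(SAMPLE)  # root is the <cdcollection> element itself

# Stand-in for /cdcollection/artist/recommend: direct children of an artist only.
exact = root.findall("artist/recommend")

# Stand-in for //recommend: recommend elements at any depth, including those under cd.
anywhere = root.findall(".//recommend")

# Stand-in for //artist/recommend: any artist, but only its direct recommend children.
scoped = root.findall(".//artist/recommend")

print(len(exact), len(anywhere), len(scoped))
```

On this trimmed sample the fully qualified path and the //artist/recommend form both find the single artist-level recommendation, while the bare //recommend form also picks up the five recommendations nested inside cd elements.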

Restrictions using []

Now, let's say that we have an XPath expression like:

 /cdcollection/artist/cd 

That would select all the cd nodes at that position in the hierarchy (regardless of which artist element contains them). But what if I only wanted to select the cd nodes underneath the first artist node? I would use a restriction like:

 /cdcollection/artist[1]/cd 

much as I would if I were accessing an array element in ColdFusion. Let's walk through the expression as it stands right now.

1. XPath selects the cdcollection node underneath the root.

2. XPath then selects all of the artist nodes underneath the node that's been found so far.

3. Then XPath applies the restriction [1] to the currently selected set, meaning to take only the first artist node it finds.

4. Finally, XPath selects all the cd child nodes of the currently selected artist node.
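The four steps can be traced by hand. The following is a rough sketch in Python (not ColdFusion) using the standard-library xml.etree.ElementTree module; the sample is an assumed trimmed copy of the first two artists from Listing 15.1, and because ElementTree paths are relative to an element, "artist[1]/cd" run from the cdcollection root stands in for /cdcollection/artist[1]/cd.

```python
import xml.etree.ElementTree as ET

# Assumption: trimmed copy of the first two artists of Listing 15.1 (recommend nodes omitted).
SAMPLE = """
<cdcollection>
  <artist name="Air">
    <cd name="10,000 Hz Legend" rating="3" />
    <cd name="Talkie Walkie" rating="4" />
  </artist>
  <artist name="Kylie Minogue">
    <cd name="Fever" rating="3" />
    <cd name="Body Language" rating="4" />
  </artist>
</cdcollection>
"""

# Step 1: start at the cdcollection element underneath the document root.
root = ET.fromstring(SAMPLE)

# Step 2: select all artist children of that element.
artists = root.findall("artist")

# Step 3: apply [1]. XPath positions are 1-based, so [1] is Python index 0.
first_artist = artists[0]

# Step 4: select the cd children of the remaining artist.
step_by_step = [cd.get("name") for cd in first_artist.findall("cd")]

# The same selection written as a single path expression.
one_shot = [cd.get("name") for cd in root.findall("artist[1]/cd")]

print(step_by_step)
print(one_shot)
```

Both approaches yield the two Air CDs, which is what the step-by-step reading of the expression predicts.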

There are other restriction formats. For instance, if I wanted to find only those cd nodes with a rating higher than 3, I would use syntax like:

 /cdcollection/artist/cd[@rating > 3] 

The @ symbol is XPath shorthand for "attribute," so whenever you are referencing an attribute name in XPath, just remember to prefix it with @.

Restrictions can contain further selections, as in this expression that retrieves all artist nodes containing at least one CD with a rating higher than 3:

 /cdcollection/artist[cd/@rating > 3] 

Notice that the selection inside the square brackets does not start with / or //. This means that the selection is relative to the context node, that is, the node to which the restriction is being applied (here, each artist node in turn).
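The phrase "at least one" matters: a comparison such as cd/@rating > 3 is true as soon as any one of the selected rating attributes passes the test. The sketch below spells out that reading in Python (not ColdFusion). It is an emulation under stated assumptions, not an XPath engine: the standard-library ElementTree module supports only a subset of XPath without numeric comparisons in restrictions, so the comparison is written as ordinary Python, and the sample is a trimmed copy of two artists from Listing 15.1.

```python
import xml.etree.ElementTree as ET

# Assumption: trimmed copy of two artists from Listing 15.1 (recommend nodes omitted).
SAMPLE = """
<cdcollection>
  <artist name="Air">
    <cd name="10,000 Hz Legend" rating="3" />
    <cd name="Talkie Walkie" rating="4" />
  </artist>
  <artist name="Felix Da Housecat">
    <cd name="A Bugged Out Mix" rating="3" />
    <cd name="Kittenz and Thee Glitz" rating="5" />
    <cd name="Devin Dazzle &amp; The Neon Fever" rating="5" />
  </artist>
</cdcollection>
"""

root = ET.fromstring(SAMPLE)


def rating(cd):
    """Read the rating attribute as a number, as XPath does for a > comparison."""
    return float(cd.get("rating"))


# Emulates /cdcollection/artist/cd[@rating > 3]: the test applies to each cd itself.
high_rated_cds = [cd.get("name") for cd in root.findall("artist/cd") if rating(cd) > 3]

# Emulates /cdcollection/artist[cd/@rating > 3]: the test is evaluated relative to
# each artist, and one qualifying cd is enough for the artist to be kept.
artists_with_hit = [
    artist.get("name")
    for artist in root.findall("artist")
    if any(rating(cd) > 3 for cd in artist.findall("cd"))
]

print(high_rated_cds)
print(artists_with_hit)
```

Both artists are kept by the second query even though each also has a CD rated exactly 3, because a single CD above the threshold satisfies the restriction.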

What if you were searching for artists in a particular genre? (Remember that genre is an element rather than an attribute like rating.) Restrictions by element value look similar to restrictions by attribute value:

 /cdcollection/artist[genre = 'Electronica'] 

That expression would return all artist elements containing at least one genre element with a value of "Electronica."
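As a quick check of that behavior, here is a small Python sketch (an assumption of convenience; the chapter itself uses ColdFusion) built on the standard-library xml.etree.ElementTree module, which does support this form of restriction. The sample is a trimmed copy of two artists from Listing 15.1, and the path is written relative to the cdcollection root, so "artist[genre='Electronica']" stands in for the absolute expression above.

```python
import xml.etree.ElementTree as ET

# Assumption: trimmed copy of two artists from Listing 15.1 (cd and recommend nodes omitted).
SAMPLE = """
<cdcollection>
  <artist name="Air">
    <genre>Electronic</genre>
  </artist>
  <artist name="Felix Da Housecat">
    <genre>Electronica</genre>
    <genre>Dance</genre>
    <genre>Retro</genre>
  </artist>
</cdcollection>
"""

root = ET.fromstring(SAMPLE)


def artists_in(genre):
    """Names of artists with at least one genre child whose text equals `genre`."""
    return [a.get("name") for a in root.findall(f"artist[genre='{genre}']")]


electronica = artists_in("Electronica")  # exact string match
electronic = artists_in("Electronic")    # a different string, so a different result
dance = artists_in("Dance")              # matches on the second genre element

print(electronica, electronic, dance)
```

Two details are worth noticing: the comparison is an exact string match, so "Electronic" and "Electronica" are different genres as far as XPath is concerned, and an artist matches if any one of its genre elements has the requested value.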

It is also possible to combine expressions using Boolean operators. If I wanted to find all artist nodes in the Electronica genre who also have at least one cd with a rating higher than 3, I could use syntax like:

 /cdcollection/artist[genre = 'Electronica' and cd/@rating > 3] 

If I wanted to find artists who were either in the Electronica genre or had a CD rated higher than 3, I could use an or search like:

 /cdcollection/artist[genre = 'Electronica' or cd/@rating > 3] 

There is also a negation function, not(). If I wanted to find artists who are in the Electronica genre but do not have any CDs rated higher than 3, I could use syntax like:

 /cdcollection/artist[genre = 'Electronica' and not(cd/@rating > 3)] 
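The following Python sketch emulates the and, or, and not() restrictions by hand, to show how they combine. It is an illustration under assumptions rather than a real XPath evaluation: the standard-library ElementTree module does not implement these operators, the helper names in_genre and any_cd are hypothetical, and the sample is a trimmed copy of two artists from Listing 15.1. It also contrasts not(cd/@rating > 3) with cd/@rating <= 3, which look similar but ask different questions.

```python
import xml.etree.ElementTree as ET

# Assumption: trimmed copy of two artists from Listing 15.1 (recommend nodes omitted).
SAMPLE = """
<cdcollection>
  <artist name="Air">
    <genre>Electronic</genre>
    <cd name="10,000 Hz Legend" rating="3" />
    <cd name="Talkie Walkie" rating="4" />
  </artist>
  <artist name="Felix Da Housecat">
    <genre>Electronica</genre>
    <genre>Dance</genre>
    <cd name="A Bugged Out Mix" rating="3" />
    <cd name="Kittenz and Thee Glitz" rating="5" />
  </artist>
</cdcollection>
"""

root = ET.fromstring(SAMPLE)
artists = root.findall("artist")


def in_genre(artist, name):
    """Hypothetical helper for genre = 'name': any genre child with that exact text."""
    return any(g.text == name for g in artist.findall("genre"))


def any_cd(artist, test):
    """Hypothetical helper for cd/@rating comparisons: true if any one cd rating passes."""
    return any(test(float(cd.get("rating"))) for cd in artist.findall("cd"))


def names(matches):
    return [a.get("name") for a in matches]


# artist[genre = 'Electronica' and cd/@rating > 3]
both = names(a for a in artists if in_genre(a, "Electronica") and any_cd(a, lambda r: r > 3))

# artist[genre = 'Electronica' or cd/@rating > 3]
either = names(a for a in artists if in_genre(a, "Electronica") or any_cd(a, lambda r: r > 3))

# artist[genre = 'Electronica' and not(cd/@rating > 3)]: no cd at all is rated above 3
negated = names(a for a in artists if in_genre(a, "Electronica") and not any_cd(a, lambda r: r > 3))

# artist[genre = 'Electronica' and cd/@rating <= 3]: at least one cd is rated 3 or lower
at_most_3 = names(a for a in artists if in_genre(a, "Electronica") and any_cd(a, lambda r: r <= 3))

print(both, either, negated, at_most_3)
```

On this sample the not() query returns nothing, because the only Electronica artist has a CD rated 5, while the <= 3 query still returns that artist, because one of its CDs is rated 3.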

Using XmlSearch() to retrieve an array of nodes

Given the CD collection shown in Listing 15.1, how would I find all of the artist recommendation nodes? I could use ColdFusion to loop through the artist nodes, and then find each artist's recommendation nodes; or I could use XmlSearch() to run an XPath search, as in Listing 15.2.

Listing 15.2. FindArtistRecommendations.cfm: Using XPath to search the CD Collection

<cffile action="READ"
  file="#ExpandPath('CDCollection.xml')#"
  variable="xmlDocument">
<cfset xmlObject = XmlParse(xmlDocument)>
<cfset results = XmlSearch(xmlObject, "/cdcollection/artist/recommend")>
<cfdump var="#results#">

The ColdFusion function XmlSearch() takes the XPath string (in this case /cdcollection/artist/recommend) and returns an array of XML elements, as shown in Figure 15.1.

Figure 15.1. The result of an XPath search using XmlSearch().


Each element in the array represents one of the elements found by the XPath search.



Advanced Macromedia ColdFusion MX 7 Application Development
ISBN: 0321292693
Year: 2006
Pages: 240
Authors: Ben Forta, et al
