Navigating Through XML Data with XPath and XQuery


Processing an XML document by iterating over every element works in some cases. In a large XML document where you're interested in only specific records, however, visiting every element is unwieldy. To narrow the focus and extract only the records you need, the XML Beans API allows using XPath and XQuery expressions.

Overview of XPath

XPath uses expressions with a syntax that's similar to file system navigation to identify content in an XML document. For example, when working with the sample Casino document (refer to Listing 10.4), the following line would select all the games at all the tables in the casino:

 /casino/table/game

Locating Elements in a Document

The leading slash in the preceding expression means that you're defining the absolute path to the elements of interest. In this case, the statement returns only the instances of game that are subelements of table.

If the path expression starts with a double slash (//), all elements that meet the search criteria are returned, regardless of their location in the XML document hierarchy. For example, the following line also returns all instances of game in the casino:

 //game

A path expression can also replace specific "directories" in the path with wildcards (*). For example, the following line also returns all instances of game in the casino; you can think of it as returning all instances of game that are grandchildren of the casino element:

 //casino/*/game
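The three path styles can be compared with a small sketch. It uses the JDK's javax.xml.xpath package rather than XML Beans, and the sample document, the class name CasinoPaths, and the method countMatches are hypothetical stand-ins (the real Casino document in Listing 10.4 may differ). The sample deliberately nests some game elements at other depths so that the three expressions give different counts:

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CasinoPaths {
    // Assumed sample data, not the book's Listing 10.4.
    static final String CASINO_XML =
        "<casino>"
        + "<table number='1'><game>blackjack</game></table>"
        + "<table number='2'><game>slots</game></table>"
        + "<lounge><game>keno</game></lounge>"
        + "<annex><table number='9'><game>poker</game></table></annex>"
        + "</casino>";

    // Evaluates an XPath expression against the sample and returns the number of matching nodes.
    static int countMatches(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(CASINO_XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {"/casino/table/game", "//game", "//casino/*/game"};
        for (String expression : expressions) {
            System.out.println(expression + " -> " + countMatches(expression));
        }
    }
}
```

On this sample the absolute path matches only games directly under a top-level table, the double slash matches every game at any depth, and the wildcard matches games that are grandchildren of casino whatever their parent is called.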

Filtering on the Elements

To help filter the data, XPath provides a library of standard operators to work on the contents of elements to which path expressions are pointing. Expressions using these operators are specified by including them in square brackets ([]) at the end of an XPath expression.

The typical equality (=, !=), relational (<, <=, >, >=), and Boolean (or, and) operators are available and can be used to check against element and attribute values. Attributes are specified by the @ prefix.

For example, the following line returns all tables where Blackjack is being played:

 /casino/table[game="blackjack"]

This second example returns the data for tables 1 through 5 in the casino:

 /casino/table[@number<=5]
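As a rough illustration of these predicates, the sketch below evaluates both filters, plus a combined one using the and operator, with the JDK's javax.xml.xpath package instead of XML Beans. The sample data, the class name CasinoFilters, and the helper tableNumbers are assumptions made for this example only:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CasinoFilters {
    // Assumed sample data, not the book's Listing 10.4.
    static final String CASINO_XML =
        "<casino>"
        + "<table number='1'><game>blackjack</game></table>"
        + "<table number='3'><game>slots</game></table>"
        + "<table number='5'><game>blackjack</game></table>"
        + "<table number='6'><game>blackjack</game></table>"
        + "<table number='8'><game>poker</game></table>"
        + "</casino>";

    // Returns the number attribute of every table element selected by the expression.
    static List<String> tableNumbers(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(CASINO_XML)), XPathConstants.NODESET);
            List<String> numbers = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                numbers.add(((Element) nodes.item(i)).getAttribute("number"));
            }
            return numbers;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Filter on a child element value.
        System.out.println(tableNumbers("/casino/table[game='blackjack']"));
        // Filter on an attribute value.
        System.out.println(tableNumbers("/casino/table[@number<=5]"));
        // Combine both tests with the Boolean operator "and".
        System.out.println(tableNumbers("/casino/table[game='blackjack' and @number<=5]"));
    }
}
```

Single quotes are used inside the XPath strings here only to avoid escaping double quotes in Java; XPath accepts either.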

Processing the Element Data

XPath also supplies a large library of standard functions to process the data after it's retrieved from path and relational expressions. Discussion on the complete set of these functions is beyond the scope of this book, but a couple of brief examples follow.

This first example provides a count of all slot machines:

 count(/casino/table[game="slots"])

This second example returns the record for the first slot machine in the casino:

 /casino/table[game="slots"][position()=1]
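The two functions can be tried out with a short sketch, again using the JDK's javax.xml.xpath package rather than XML Beans. The sample document and the names CasinoFunctions, countSlotTables, and firstSlotTableNumber are hypothetical. Note that count() yields a number rather than a node set, so it is requested with XPathConstants.NUMBER:

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class CasinoFunctions {
    // Assumed sample data, not the book's Listing 10.4.
    static final String CASINO_XML =
        "<casino>"
        + "<table number='1'><game>blackjack</game></table>"
        + "<table number='2'><game>slots</game></table>"
        + "<table number='3'><game>slots</game></table>"
        + "<table number='4'><game>poker</game></table>"
        + "</casino>";

    // count() returns an XPath number, which the JDK maps to a Double.
    static int countSlotTables() {
        try {
            Double count = (Double) XPathFactory.newInstance().newXPath().evaluate(
                "count(/casino/table[game='slots'])",
                new InputSource(new StringReader(CASINO_XML)), XPathConstants.NUMBER);
            return count.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // position()=1 is applied after the game filter, so it picks the first slots table.
    static String firstSlotTableNumber() {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(
                "/casino/table[game='slots'][position()=1]/@number",
                new InputSource(new StringReader(CASINO_XML)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Slot tables: " + countSlotTables());
        System.out.println("First slot table: " + firstSlotTableNumber());
    }
}
```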

Where to Get More Data

To learn more about XPath and its features, visit the XPath page on the World Wide Web Consortium (W3C) Web site at http://www.w3.org/TR/xpath.

Using XPath Expressions with XML Beans

To use XPath expressions, the XML Beans API provides the selectPath method, which returns results related to the current XML document. There are two different ways to call this method: through an XmlObject (or any subclass) or through an XmlCursor. In both cases, the method expects a single string argument containing the following:

A declaration of the namespace for the elements, if used in the document
The XPath expression, prepended with $this to indicate that the expression is being evaluated from the current element, which in many cases can be the root element of the document

There is an XmlObject for every element of an XML Beans representation of a document, so the selectPath method can be called from the root element or any of its descendants. When called on an instance of XmlObject, the method returns an array of objects that match the criteria. Listing 10.7 shows the code necessary to call one of the previous examples, returning all tables where Blackjack is being played.

ABOUT THE `XmlCursor` OBJECT

The XmlCursor object enables you to navigate through a document with a token model, similar to DOM parsers, and is ideal for handling XML documents that aren't based on Schema.

Listing 10.7. Calling an XPath Expression with the XML Beans API

 public XmlObject[] getBlackjackTables(String casinoString) {
     try
     {
         // Parse the XML text into the schema-generated document type.
         CasinoDocument casinoDoc = CasinoDocument.Factory.parse(casinoString);
         // Quotes inside the XPath expression must be escaped in the Java string.
         String xPathString = "$this/casino/table[game=\"blackjack\"]";
         XmlObject[] resultArray = casinoDoc.selectPath(xPathString);
         return resultArray;
     }
     catch (com.bea.xml.XmlException xmle)
     {
         // The input could not be parsed as a Casino document.
         return null;
     }
 }

When called on an instance of XmlCursor, the selectPath method stores in the XmlCursor object a list of locations, or selections, in the XML document that match the criteria. Navigation through the selections is accomplished by calling the iterator-type methods summarized in Table 10.3.

Table 10.3. Iterator-Type Methods of the `XmlCursor` Class

METHOD	PURPOSE
`getSelectionCount()`	Returns the number of matching rows
`toNextSelection()`	Moves the cursor to the next item on the list
`toSelection(int)`	Moves the cursor to the specified item
`hasNextSelection()`	Checks to see whether the cursor has any more matching items
`clearSelections()`	Clears list of items from cursor, not from document

Overview of XQuery

Although XPath is a powerful data navigation language, it lacks the capability to represent complex loop operations and user-defined functions. XQuery builds on XPath and adds these missing features. This section is short because, except for the most obscure forms of XPath statements, XQuery is a superset of XPath.

Iterating over Data

XQuery provides a means for creating FLWR (for, let, where, and return) constructs that provide query language features similar to SQL for relational databases:

The for clause enables you to loop over a sequence of elements and assign the value of each iteration to a variable.
The let clause enables you to assign the results of a single-valued XPath expression to a variable.
The where clause further qualifies the for clause, restricting loop iteration to elements that meet the specified criteria.
The return clause determines what data is passed back to the calling method.

The following code, although not overly exciting, shows the FLWR representation of the code needed to return all tables where Blackjack is being played:

 for $tb in $this/casino/table where $tb/game="blackjack" return $tb

The power of the FLWR format is illustrated by two important features. First, XQuery permits for loops to be nested, providing the XML equivalent of joining data from two tables. Second, the return clause can include embedded FLWR constructs, allowing result data to be wrapped by formatting text. This latter feature is often used to produce HTML documents from XQuery expressions.

To demonstrate these two features, service data (the table ID and corresponding status, such as ok or out of money) is added as a separate element tree within the Casino document, as shown in Listing 10.8.

Listing 10.8. Representing Service Data in the Casino XML Document

 <casino xmlns="http://openuri.org/bea/samples/workshop/chap10/casinoLayout">
     <table number="1"> ...
     </table>
     <serviceRequest id="1">
         <tableNum>4</tableNum>
         <status>out of money</status>
     </serviceRequest>
     <serviceRequest id="2">
         <tableNum>5</tableNum>
         <status>out of money</status>
     </serviceRequest>
 </casino>

The code in Listing 10.9 can then be used to create a work order for the coin-filling attendant. This work order lists the slot machines that need to be filled and the affiliated coin denomination.

Listing 10.9. Nested Loops and Embedded FLWR Constructs in XQuery

 <html>
 <h1>Work Order</h1>
 {
     for $tb in $this/casino/table
     for $sr in $this/casino/serviceRequest
     where $tb/@number=$sr/tableNum
         and $tb/game="slots"
         and $sr/status="out of money"
     return <h2>Table{$sr/tableNum} needs ${$tb/minimumBet} coins</h2>
 }
 </html>
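To make the join semantics concrete, the nested for loops and the where clause can be mimicked in plain Java. This is only a sketch of what the query computes, not how XML Beans executes it: it uses the JDK's DOM and javax.xml.xpath classes, omits the namespace for brevity, and the sample data plus the names WorkOrderSketch, buildWorkOrder, and childText are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WorkOrderSketch {
    // Assumed sample data: tables plus service requests, without the namespace.
    static final String CASINO_XML =
        "<casino>"
        + "<table number='4'><game>slots</game><minimumBet>0.25</minimumBet></table>"
        + "<table number='5'><game>blackjack</game><minimumBet>10</minimumBet></table>"
        + "<table number='6'><game>slots</game><minimumBet>1</minimumBet></table>"
        + "<serviceRequest id='1'><tableNum>4</tableNum><status>out of money</status></serviceRequest>"
        + "<serviceRequest id='2'><tableNum>5</tableNum><status>out of money</status></serviceRequest>"
        + "<serviceRequest id='3'><tableNum>6</tableNum><status>ok</status></serviceRequest>"
        + "</casino>";

    // Text of the first descendant element with the given name.
    static String childText(Element parent, String name) {
        return parent.getElementsByTagName(name).item(0).getTextContent();
    }

    static String buildWorkOrder() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(CASINO_XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList tables = (NodeList) xpath.evaluate("/casino/table", doc, XPathConstants.NODESET);
            NodeList requests = (NodeList) xpath.evaluate("/casino/serviceRequest", doc, XPathConstants.NODESET);

            StringBuilder html = new StringBuilder("<html><h1>Work Order</h1>");
            for (int i = 0; i < tables.getLength(); i++) {          // for $tb in .../table
                Element tb = (Element) tables.item(i);
                for (int j = 0; j < requests.getLength(); j++) {    // for $sr in .../serviceRequest
                    Element sr = (Element) requests.item(j);
                    String tableNum = childText(sr, "tableNum");
                    boolean matches = tb.getAttribute("number").equals(tableNum)   // where clause
                        && "slots".equals(childText(tb, "game"))
                        && "out of money".equals(childText(sr, "status"));
                    if (matches) {                                  // return clause
                        html.append("<h2>Table ").append(tableNum).append(" needs $")
                            .append(childText(tb, "minimumBet")).append(" coins</h2>");
                    }
                }
            }
            return html.append("</html>").toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build work order", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildWorkOrder());
    }
}
```

In this sample only table 4 appears in the work order: table 5 has a matching request but is not a slot machine, and table 6 is a slot machine whose request status is ok.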

Defining Functions

XQuery, as a full-featured programming language, also enables you to create user-defined functions. These functions have varied uses, but two possible applications are

To simplify program text when there are repeated operations (for example, like macros)
To allow for recursion

The code in Listing 10.10 shows how to streamline XQuery code listings by encapsulating the code to print the number of tables where each game is being played in a function.

Listing 10.10. User-Defined Functions in XQuery

 define function formatTotalLine($game) {
     let $countGame := count($this/cl:table[game=$game])
     return
         concat($game, " is being played at ", string($countGame), " tables.")
 }
 for $game in ("slots","blackjack","poker","roulette")
     return formatTotalLine($game)
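For comparison, the same per-game summary can be approximated in Java, with an ordinary method playing the role of the user-defined function and an XPath variable standing in for the function parameter. This sketch uses the JDK's javax.xml.xpath package rather than XQuery or XML Beans; the sample document and the class name GameTotals are assumptions, and the Java method merely borrows the name formatTotalLine from the listing:

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class GameTotals {
    // Assumed sample data, not the book's Listing 10.4.
    static final String CASINO_XML =
        "<casino>"
        + "<table number='1'><game>blackjack</game></table>"
        + "<table number='2'><game>slots</game></table>"
        + "<table number='3'><game>slots</game></table>"
        + "</casino>";

    // Java counterpart of the XQuery function: counts tables for one game and formats a line.
    static String formatTotalLine(String game) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the XPath variable $game to the method argument.
            xpath.setXPathVariableResolver(
                name -> "game".equals(name.getLocalPart()) ? game : null);
            Double count = (Double) xpath.evaluate(
                "count(/casino/table[game=$game])",
                new InputSource(new StringReader(CASINO_XML)), XPathConstants.NUMBER);
            return game + " is being played at " + count.intValue() + " tables.";
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed for " + game, e);
        }
    }

    public static void main(String[] args) {
        for (String game : new String[] {"slots", "blackjack", "poker", "roulette"}) {
            System.out.println(formatTotalLine(game));
        }
    }
}
```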

Where to Get More Data

To learn more about XQuery and its features, visit the XQuery page on the W3C Web site at http://www.w3.org/TR/xquery.

Using XQuery Expressions with XML Beans

To use XQuery expressions, the XML Beans API provides the execQuery method. As with the selectPath method, there are two different ways to call this method: through an XmlObject (or any subclass) or through an XmlCursor. In both cases, the method expects a single string argument containing the following:

A declaration of the namespace for the elements, if used in the document
The XQuery expression, prepended with $this to indicate that the expression is being evaluated from the current element, which in many cases can be the root element of the document

Note that unlike the selectPath method, the execQuery method returns results as a new XML document. Calling the method on an XmlObject returns the new document as an array of the type XmlObject; calling the method on an XmlCursor returns a new instance of XmlCursor.

Listing 10.11 shows the code for calling one of the previous examples, returning the work order for the coin-filling attendant as an HTML document.

Listing 10.11. Invoking an XQuery Expression with the XML Beans API

 public XmlObject generateWorkOrder(XmlObject casinoDoc) {
     XmlCursor cursor = casinoDoc.newCursor();
     cursor.toNextSibling();
     XmlObject resultXML = null;
     // The query text matches Listing 10.9; inner quotes are escaped for Java.
     String xQueryString =
         "<html><h1>Work Order</h1>" +
         "{ " +
         "for $tb in $this/casino/table " +
         "for $sr in $this/casino/serviceRequest " +
         "where $tb/@number=$sr/tableNum and $tb/game=\"slots\" " +
         "and $sr/status=\"out of money\" " +
         "return <h2>Table{$sr/tableNum} needs ${$tb/minimumBet}" +
         " coins</h2>" +
         "} " +
         "</html>";
     try
     {
         // execQuery on a cursor returns a new cursor over the result document.
         XmlCursor resultCursor = cursor.execQuery(xQueryString);
         resultXML = resultCursor.getObject();
         return resultXML;
     }
     catch(Exception e)
     {
         return null;
     }
 }
