Section 21.4. Querying XML with XPath | JavaScript: The Definitive Guide

21.4. Querying XML with XPath

XPath is a simple language that refers to elements, attributes, and text within an XML document. An XPath expression can refer to an XML element by its position in the document hierarchy or can select an element based on the value (or the mere presence) of an attribute. A full discussion of XPath is beyond the scope of this chapter, but Section 21.4.1 presents a simple XPath tutorial that explains common XPath expressions by example.

The W3C has drafted an API for selecting nodes in a DOM document tree using an XPath expression. Firefox and related browsers implement this W3C API using the evaluate( ) method of the Document object (for both HTML and XML documents). Mozilla-based browsers also implement Document.createExpression( ), which compiles an XPath expression so that it can be efficiently evaluated multiple times.

IE provides XPath expression evaluation with the selectSingleNode( ) and selectNodes( ) methods of XML (but not HTML) Document and Element objects. Later in this section, you'll find example code that uses both the W3C and IE APIs.

If you wish to use XPath with other browsers, consider the open-source AJAXSLT project at http://goog-ajaxslt.sourceforge.net.
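The choice between these APIs can be made by feature detection rather than by browser sniffing. The following is a minimal sketch, not a definitive test: the function name detectXPathApi is hypothetical, and it assumes that the object passed in is the document you intend to query (in IE, an XML document, since HTML documents do not have selectNodes( )).

```javascript
// Hypothetical helper: report which XPath API a document object exposes.
// Returns "w3c", "ie", or "none".
function detectXPathApi(doc) {
    if (!doc) return "none";
    // W3C API: the evaluate( ) method of the Document object
    if (typeof doc.evaluate != "undefined") return "w3c";
    // IE API: the selectNodes( ) method of XML Document objects
    if (typeof doc.selectNodes != "undefined") return "ie";
    // Neither API is present; a script library would be needed
    return "none";
}
```

A result of "none" is the case in which a library such as AJAXSLT would be needed.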

21.4.1. XPath Examples

If you understand the tree structure of a DOM document, it is easy to learn simple XPath expressions by example. In order to understand these examples, though, you must know that an XPath expression is evaluated in relation to some context node within the document. The simplest XPath expressions simply refer to children of the context node:

 contact             // The set of all <contact> tags beneath the context node
 contact[1]          // The first <contact> tag beneath the context
 contact[last( )]    // The last <contact> child of the context node
 contact[last( )-1]  // The penultimate <contact> child of the context node

Note that XPath array syntax uses 1-based arrays instead of JavaScript-style 0-based arrays.
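To make the indexing difference concrete, the following sketch mimics the positional predicates above on an ordinary JavaScript array. The function selectByPosition and the sample data are hypothetical, invented for illustration; they are not part of any XPath API.

```javascript
// Hypothetical helper: pick an item by 1-based, XPath-style position.
// Position 1 is the first item. Positions outside the array select
// nothing, so null is returned.
function selectByPosition(items, position) {
    if (position < 1 || position > items.length) return null;
    return items[position - 1];   // Convert 1-based position to 0-based index
}

var contacts = ["alice", "bob", "carol"];   // Stand-ins for <contact> children

var first = selectByPosition(contacts, 1);                          // contact[1]
var last = selectByPosition(contacts, contacts.length);             // contact[last()]
var penultimate = selectByPosition(contacts, contacts.length - 1);  // contact[last()-1]
```

Here last( ) corresponds to the length of the array, not to length - 1 as it would with JavaScript indexes.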

The "path" in the name XPath refers to the fact that the language treats levels in the XML element hierarchy like directories in a filesystem and uses the "/" character to separate levels of the hierarchy. Thus:

 contact/email      // All <email> children of <contact> children of context
 /contacts          // The <contacts> child of the document root (leading /)
 contact[1]/email   // The <email> children of the first <contact> child
 contact/email[2]   // The 2nd <email> child of any <contact> child of context

Note that contact/email[2] evaluates to the set of <email> elements that are the second <email> child of any <contact> child of the context node. This is not the same as contact[2]/email or (contact/email)[2].
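The distinction among these three expressions can be sketched with plain JavaScript arrays. This is only an analogy, not an XPath implementation: the nested-array data and the three function names are hypothetical, chosen for illustration.

```javascript
// Stand-in data: each inner array holds the <email> children of one <contact>
var contacts = [
    ["a1@example.com"],                     // first <contact>
    ["b1@example.com", "b2@example.com"],   // second <contact>
    ["c1@example.com", "c2@example.com"]    // third <contact>
];

// contact/email[2]: the second <email> of each <contact> that has one
function secondEmailOfEach(contacts) {
    var result = [];
    for (var i = 0; i < contacts.length; i++) {
        if (contacts[i].length >= 2) result.push(contacts[i][1]);
    }
    return result;
}

// (contact/email)[2]: gather every <email> first, then take the second overall
function secondEmailOverall(contacts) {
    var all = [];
    for (var i = 0; i < contacts.length; i++) {
        all = all.concat(contacts[i]);
    }
    return all.length >= 2 ? [all[1]] : [];
}

// contact[2]/email: every <email> of the second <contact>
function emailsOfSecondContact(contacts) {
    return contacts.length >= 2 ? contacts[1].slice() : [];
}
```

With this sample data, the first function selects b2 and c2, the second selects only b1, and the third selects b1 and b2.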

A dot (.) in an XPath expression refers to the context element. And a double-slash (//) elides levels of the hierarchy, referring to any descendant instead of an immediate child. For example:

 .//email         // All <email> descendants of the context
 //email          // All <email> tags in the document (note leading slash)

XPath expressions can refer to XML attributes as well as elements. The @ character is used as a prefix to identify an attribute name:

 @id            // The value of the id attribute of the context node
 contact/@name  // The values of the name attributes of <contact> children

The value of an XML attribute can filter the set of elements returned by an XPath expression. For example:

 contact[@personal="true"]  // All <contact> tags with attribute personal="true"

To select the textual content of XML elements, use the text( ) node test:

 contact/email/text( )  // The text nodes within <email> tags
 //text( )              // All text nodes in the document

XPath is namespace-aware, and you can include namespace prefixes in your expressions:

 //xsl:template     // Select all <xsl:template> elements

When you evaluate an XPath expression that uses namespaces, you must, of course, provide a mapping of namespace prefixes to namespace URLs.
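In the W3C API, that mapping can be supplied as a function that takes a prefix and returns a URL. The following is a minimal sketch of such a function; the name makeNamespaceResolver is hypothetical, and returning null for an unmapped prefix is a choice made here for illustration.

```javascript
// Hypothetical helper: turn a prefix-to-URL object into a resolver function
function makeNamespaceResolver(namespaces) {
    return function(prefix) {
        // Only the object's own properties count as declared prefixes
        if (namespaces &&
            Object.prototype.hasOwnProperty.call(namespaces, prefix)) {
            return namespaces[prefix];
        }
        return null;   // No URL is known for this prefix
    };
}

var resolver = makeNamespaceResolver({
    xsl: "http://www.w3.org/1999/XSL/Transform"
});

// In a browser that supports the W3C API, the resolver would be passed
// as the third argument of evaluate( ), for example:
//   doc.evaluate("//xsl:template", doc, resolver,
//                XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
```

The prefix used in the expression need not match the prefix used in the XML document itself; only the URLs must agree.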

These examples are just a survey of common XPath usage patterns. XPath has other syntax and features not described here. One example is the count( ) function, which returns the number of nodes in a set rather than returning the set itself:

 count(//email)    // The number of <email> elements in the document

21.4.2. Evaluating XPath Expressions

Example 21-10 shows an XML.XPathExpression class that works in IE and in standards-compliant browsers such as Firefox.

Example 21-10. Evaluating XPath expressions

 /**
  * XML.XPathExpression is a class that encapsulates an XPath query and its
  * associated namespace prefix-to-URL mapping. Once an XML.XPathExpression
  * object has been created, it can be evaluated one or more times (in one
  * or more contexts) using the getNode( ) or getNodes( ) methods.
  *
  * The first argument to this constructor is the text of the XPath expression.
  *
  * If the expression includes any XML namespaces, the second argument must
  * be a JavaScript object that maps namespace prefixes to the URLs that define
  * those namespaces. The properties of this object are the prefixes, and
  * the values of those properties are the URLs.
  */
 XML.XPathExpression = function(xpathText, namespaces) {
     this.xpathText = xpathText;    // Save the text of the expression
     this.namespaces = namespaces;  // And the namespace mapping
     if (document.createExpression) {
         // If we're in a W3C-compliant browser, use the W3C API
         // to compile the text of the XPath query
         this.xpathExpr =
             document.createExpression(xpathText,
                                       // This function is passed a
                                       // namespace prefix and returns the URL.
                                       function(prefix) {
                                           return namespaces[prefix];
                                       });
     }
     else {
         // Otherwise, we assume for now that we're in IE and convert the
         // namespaces object into the textual form that IE requires.
         this.namespaceString = "";
         if (namespaces != null) {
             for(var prefix in namespaces) {
                 // Add a space if there is already something there
                 if (this.namespaceString) this.namespaceString += ' ';
                 // And add the namespace
                 this.namespaceString += 'xmlns:' + prefix + '="' +
                     namespaces[prefix] + '"';
             }
         }
     }
 };
 /**
  * This is the getNodes( ) method of XML.XPathExpression. It evaluates the
  * XPath expression in the specified context. The context argument should
  * be a Document or Element object. The return value is an array
  * or array-like object containing the nodes that match the expression.
  */
 XML.XPathExpression.prototype.getNodes = function(context) {
     if (this.xpathExpr) {
         // If we are in a W3C-compliant browser, we compiled the
         // expression in the constructor. We now evaluate that compiled
         // expression in the specified context.
         var result =
             this.xpathExpr.evaluate(context,
                                     // This is the result type we want
                                     XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
                                     null);
         // Copy the results we get into an array.
         var a = new Array(result.snapshotLength);
         for(var i = 0; i < result.snapshotLength; i++) {
             a[i] = result.snapshotItem(i);
         }
         return a;
     }
     else {
         // If we are not in a W3C-compliant browser, attempt to evaluate
         // the expression using the IE API.
         try {
             // We need the Document object to specify namespaces
             var doc = context.ownerDocument;
             // If the context doesn't have ownerDocument, it is the Document
             if (doc == null) doc = context;
             // This is IE-specific magic to specify prefix-to-URL mapping
             doc.setProperty("SelectionLanguage", "XPath");
             doc.setProperty("SelectionNamespaces", this.namespaceString);
             // In IE, the context must be an Element not a Document,
             // so if context is a document, use documentElement instead
             if (context == doc) context = doc.documentElement;
             // Now use the IE method selectNodes( ) to evaluate the expression
             return context.selectNodes(this.xpathText);
         }
         catch(e) {
             // If the IE API doesn't work, we just give up
             throw "XPath not supported by this browser.";
         }
     }
 };
 /**
  * This is the getNode( ) method of XML.XPathExpression. It evaluates the
  * XPath expression in the specified context and returns a single matching
  * node (or null if no node matches). If more than one node matches,
  * this method returns the first one in the document.
  * The implementation differs from getNodes( ) only in the return type.
  */
 XML.XPathExpression.prototype.getNode = function(context) {
     if (this.xpathExpr) {
         var result =
             this.xpathExpr.evaluate(context,
                                     // We just want the first match
                                     XPathResult.FIRST_ORDERED_NODE_TYPE,
                                     null);
         return result.singleNodeValue;
     }
     else {
         try {
             var doc = context.ownerDocument;
             if (doc == null) doc = context;
             doc.setProperty("SelectionLanguage", "XPath");
             doc.setProperty("SelectionNamespaces", this.namespaceString);
             if (context == doc) context = doc.documentElement;
             // In IE call selectSingleNode instead of selectNodes
             return context.selectSingleNode(this.xpathText);
         }
         catch(e) {
             throw "XPath not supported by this browser.";
         }
     }
 };
 // A utility to create an XML.XPathExpression and call getNodes( ) on it
 XML.getNodes = function(context, xpathExpr, namespaces) {
     return (new XML.XPathExpression(xpathExpr, namespaces)).getNodes(context);
 };
 // A utility to create an XML.XPathExpression and call getNode( ) on it
 XML.getNode  = function(context, xpathExpr, namespaces) {
     return (new XML.XPathExpression(xpathExpr, namespaces)).getNode(context);
 };
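The textual namespace form that the IE branch of the constructor builds is easier to see in isolation. The following sketch extracts that loop into a standalone function; the name buildSelectionNamespaces is hypothetical and is not part of Example 21-10, and the second namespace URL is a made-up placeholder.

```javascript
// Hypothetical helper: convert a prefix-to-URL object into the
// space-separated string of xmlns declarations that the IE branch
// of Example 21-10 passes as the "SelectionNamespaces" property.
function buildSelectionNamespaces(namespaces) {
    var s = "";
    if (namespaces != null) {
        for (var prefix in namespaces) {
            if (s) s += " ";   // Separate declarations with a space
            s += 'xmlns:' + prefix + '="' + namespaces[prefix] + '"';
        }
    }
    return s;
}

var declarations = buildSelectionNamespaces({
    xsl: "http://www.w3.org/1999/XSL/Transform",
    x: "http://example.com/x"    // Placeholder URL for illustration
});
```

A null or omitted mapping yields the empty string, which is what the constructor stores when an expression uses no namespaces.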

21.4.3. More on the W3C XPath API

Because of the limitations in the IE XPath API, the code in Example 21-10 handles only queries that evaluate to a document node or set of nodes. It is not possible in IE to evaluate an XPath expression that returns a string of text or a number. This is possible with the W3C standard API, however, using code that looks like this:

 // How many <p> tags in the document?
 var n = document.evaluate("count(//p)", document, null,
                           XPathResult.NUMBER_TYPE, null).numberValue;
 // What is the text of the 2nd paragraph? The parentheses matter: //p[2]
 // would match every <p> that is the second <p> child of its parent.
 var text = document.evaluate("(//p)[2]/text( )", document, null,
                              XPathResult.STRING_TYPE, null).stringValue;

There are two things to note about these simple examples. First, they use the document.evaluate( ) method to evaluate an XPath expression directly without compiling it first. The code in Example 21-10 instead used document.createExpression( ) to compile an XPath expression into a form that could be reused. Second, notice that these examples are working with HTML <p> tags in the document object. In Firefox, XPath queries can be used on HTML documents as well as XML documents.
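When the type of an expression is not known in advance, the W3C API also allows XPathResult.ANY_TYPE to be requested, in which case the resultType property of the returned object reports what was produced. The following is a minimal sketch under that assumption. The helper names primitiveValue and evaluatePrimitive are hypothetical, and the constants are written out numerically (with the values the W3C API assigns them) so that the helpers do not depend on a global XPathResult object.

```javascript
// Numeric values of the XPathResult type constants used below
var ANY_TYPE = 0, NUMBER_TYPE = 1, STRING_TYPE = 2, BOOLEAN_TYPE = 3;

// Hypothetical helper: return the primitive value held by an
// XPathResult-like object, based on its resultType property.
function primitiveValue(result) {
    switch (result.resultType) {
    case NUMBER_TYPE:  return result.numberValue;
    case STRING_TYPE:  return result.stringValue;
    case BOOLEAN_TYPE: return result.booleanValue;
    default:
        throw new Error("Result is not a number, string, or boolean");
    }
}

// Hypothetical helper: evaluate an expression against a document and
// let the XPath engine decide the type of the result.
function evaluatePrimitive(doc, xpathText) {
    return primitiveValue(doc.evaluate(xpathText, doc, null, ANY_TYPE, null));
}

// In a browser that supports the W3C API:
//   var n = evaluatePrimitive(document, "count(//p)");
```

An expression that evaluates to a set of nodes makes primitiveValue( ) throw, since node results must be read through the iterator or snapshot interfaces instead.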

See Document, XPathExpression, and XPathResult in Part IV for complete details on the W3C XPath API.