21.4. Querying XML with XPath

XPath is a simple language that refers to elements, attributes, and text within an XML document. An XPath expression can refer to an XML element by its position in the document hierarchy or can select an element based on the value of (or simple presence of) an attribute. A full discussion of XPath is beyond the scope of this chapter, but Section 21.4.1 presents a simple XPath tutorial that explains common XPath expressions by example.

The W3C has drafted an API for selecting nodes in a DOM document tree using an XPath expression. Firefox and related browsers implement this W3C API using the evaluate( ) method of the Document object (for both HTML and XML documents). Mozilla-based browsers also implement Document.createExpression( ), which compiles an XPath expression so that it can be efficiently evaluated multiple times.

IE provides XPath expression evaluation with the selectSingleNode( ) and selectNodes( ) methods of XML (but not HTML) Document and Element objects. Later in this section, you'll find example code that uses both the W3C and IE APIs.

If you wish to use XPath with other browsers, consider the open-source AJAXSLT project at http://goog-ajaxslt.sourceforge.net.

21.4.1. XPath Examples

If you understand the tree structure of a DOM document, it is easy to learn simple XPath expressions by example. In order to understand these examples, though, you must know that an XPath expression is evaluated in relation to some context node within the document. The simplest XPath expressions simply refer to children of the context node:

    contact            // The set of all <contact> tags beneath the context node
    contact[1]         // The first <contact> tag beneath the context
    contact[last( )]   // The last <contact> child of the context node
    contact[last( )-1] // The penultimate <contact> child of the context node

Note that XPath array syntax uses 1-based arrays instead of JavaScript-style 0-based arrays.
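The difference between 1-based XPath positions and 0-based JavaScript indexes is easy to get wrong once query results are copied into an array. The following is a minimal sketch, not part of any XPath API: itemAtXPathPosition( ) is a hypothetical helper, and the strings in the array are stand-ins for <contact> elements.

```javascript
// Hypothetical helper: return the item that an XPath position predicate
// such as contact[2] would select from an ordered JavaScript array.
function itemAtXPathPosition(items, position) {
    // XPath positions run from 1 to items.length; anything else selects nothing.
    if (position < 1 || position > items.length) return null;
    // Subtract 1 to convert the XPath position to a JavaScript array index.
    return items[position - 1];
}

// Stand-ins for three <contact> children of the context node (assumed data).
var contacts = ["alice", "bob", "carol"];

var first = itemAtXPathPosition(contacts, 1);                         // like contact[1]
var last = itemAtXPathPosition(contacts, contacts.length);            // like contact[last( )]
var penultimate = itemAtXPathPosition(contacts, contacts.length - 1); // like contact[last( )-1]
```

Returning null for an out-of-range position mirrors the way an XPath predicate such as contact[0] simply selects nothing rather than raising an error.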
The "path" in the name XPath refers to the fact that the language treats levels in the XML element hierarchy like directories in a filesystem and uses the "/" character to separate levels of the hierarchy. Thus:

    contact/email      // All <email> children of <contact> children of context
    /contacts          // The <contacts> child of the document root (leading /)
    contact[1]/email   // The <email> children of the first <contact> child
    contact/email[2]   // The 2nd <email> child of any <contact> child of context

Note that contact/email[2] evaluates to the set of <email> elements that are the second <email> child of any <contact> child of the context node. This is not the same as contact[2]/email or (contact/email)[2].

A dot (.) in an XPath expression refers to the context element. And a double-slash (//) elides levels of the hierarchy, referring to any descendant instead of an immediate child. For example:

    .//email           // All <email> descendants of the context
    //email            // All <email> tags in the document (note leading slash)

XPath expressions can refer to XML attributes as well as elements. The @ character is used as a prefix to identify an attribute name:

    @id                // The value of the id attribute of the context node
    contact/@name      // The values of the name attributes of <contact> children

The value of an XML attribute can filter the set of elements returned by an XPath expression. For example:

    contact[@personal="true"]  // All <contact> tags with attribute personal="true"

To select the textual content of XML elements, use the text( ) node test:

    contact/email/text( )  // The text nodes within <email> tags
    //text( )              // All text nodes in the document

XPath is namespace-aware, and you can include namespace prefixes in your expressions:

    //xsl:template     // Select all <xsl:template> elements

When you evaluate an XPath expression that uses namespaces, you must, of course, provide a mapping of namespace prefixes to namespace URIs.

These examples are just a survey of common XPath usage patterns.
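With the W3C API, the prefix-to-URI mapping mentioned above is passed as the third argument to evaluate( ), and browsers that implement the API generally accept a plain function there. The following is a minimal sketch: makeNamespaceResolver( ) is a hypothetical helper, and the browser-only portion is guarded so that the rest runs anywhere.

```javascript
// Hypothetical helper: build a resolver function from a prefix-to-URI map.
function makeNamespaceResolver(namespaces) {
    return function(prefix) {
        // Return the URI for a known prefix, or null for an unknown one.
        // hasOwnProperty keeps inherited names like "toString" from matching.
        return Object.prototype.hasOwnProperty.call(namespaces, prefix)
            ? namespaces[prefix]
            : null;
    };
}

var resolver = makeNamespaceResolver({
    xsl: "http://www.w3.org/1999/XSL/Transform"
});

var xslUri = resolver("xsl");      // the XSLT namespace URI
var unknownUri = resolver("foo");  // null: no such prefix in the map

// Browser-only usage, skipped where there is no document.evaluate( ).
if (typeof document !== "undefined" && typeof document.evaluate === "function") {
    var templates = document.evaluate("//xsl:template", document, resolver,
                                      XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
}
```

The prefixes used in the expression need not match those used in the document itself; only the URIs they map to are compared.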
XPath has other syntax and features not described here. One example is the count( ) function, which returns the number of nodes in a set rather than returning the set itself:

    count(//email)     // The number of <email> elements in the document

21.4.2. Evaluating XPath Expressions

Example 21-10 shows an XML.XPathExpression class that works in IE and in standards-compliant browsers such as Firefox.

Example 21-10. Evaluating XPath expressions
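The following is not the XML.XPathExpression class of Example 21-10. It is only a minimal sketch of the feature detection such a class needs, under some assumptions: selectNodesCompat( ) is a hypothetical name, no namespace mapping is supplied, and the IE branch assumes an MSXML document whose SelectionLanguage property must be set to "XPath" before selectNodes( ) is called.

```javascript
// Hypothetical helper: return an array of the nodes that xpathText selects,
// evaluated relative to the context node.
function selectNodesCompat(context, xpathText) {
    // A Document node has no ownerDocument, so it serves as its own document.
    var doc = context.ownerDocument || context;
    var nodes = [];
    var i;

    if (typeof doc.createExpression === "function") {
        // W3C API: compile the expression, then evaluate it to a snapshot.
        var ORDERED_NODE_SNAPSHOT_TYPE = 7; // XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
        var expr = doc.createExpression(xpathText, null);
        var result = expr.evaluate(context, ORDERED_NODE_SNAPSHOT_TYPE, null);
        for (i = 0; i < result.snapshotLength; i++) {
            nodes.push(result.snapshotItem(i));
        }
        return nodes;
    }

    if (typeof context.selectNodes !== "undefined") {
        // IE API: ask MSXML for XPath syntax, then copy the node list.
        if (typeof doc.setProperty !== "undefined") {
            doc.setProperty("SelectionLanguage", "XPath");
        }
        var list = context.selectNodes(xpathText);
        for (i = 0; i < list.length; i++) {
            nodes.push(list.item(i));
        }
        return nodes;
    }

    throw new Error("XPath is not supported in this environment");
}
```

Copying both kinds of result into a plain array gives callers one return type to deal with, at the cost of losing the reusable compiled expression that createExpression( ) provides.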
21.4.3. More on the W3C XPath API

Because of the limitations in the IE XPath API, the code in Example 21-10 handles only queries that evaluate to a document node or set of nodes. It is not possible in IE to evaluate an XPath expression that returns a string of text or a number. This is possible with the W3C standard API, however, using code that looks like this:

    // How many <p> tags in the document?
    var n = document.evaluate("count(//p)", document, null,
                              XPathResult.NUMBER_TYPE, null).numberValue;

    // What is the text of the 2nd paragraph? (//p[2] matches each <p> that is the
    // second <p> child of its parent; the string result comes from the first match.)
    var text = document.evaluate("//p[2]/text( )", document, null,
                                 XPathResult.STRING_TYPE, null).stringValue;

There are two things to note about these simple examples. First, they use the document.evaluate( ) method to evaluate an XPath expression directly without compiling it first. The code in Example 21-10 instead used document.createExpression( ) to compile an XPath expression into a form that could be reused. Second, notice that these examples are working with HTML <p> tags in the document object. In Firefox, XPath queries can be used on HTML documents as well as XML documents.

See Document, XPathExpression, and XPathResult in Part IV for complete details on the W3C XPath API.
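The two evaluate( ) calls above differ only in the result type they request and the property they read, so they can be wrapped in small helpers. The following is a minimal sketch: xpathNumber( ) and xpathString( ) are hypothetical names, the document is passed in as a parameter, and the result types are written as their numeric values so that the helpers do not depend on a global XPathResult object.

```javascript
var NUMBER_TYPE = 1; // same value as XPathResult.NUMBER_TYPE
var STRING_TYPE = 2; // same value as XPathResult.STRING_TYPE

// Hypothetical helper: evaluate xpathText and return the result as a number.
// The context node defaults to the document itself.
function xpathNumber(doc, xpathText, context) {
    return doc.evaluate(xpathText, context || doc, null,
                        NUMBER_TYPE, null).numberValue;
}

// Hypothetical helper: evaluate xpathText and return the result as a string.
function xpathString(doc, xpathText, context) {
    return doc.evaluate(xpathText, context || doc, null,
                        STRING_TYPE, null).stringValue;
}

// Browser-only usage, skipped where there is no document.evaluate( ).
if (typeof document !== "undefined" && typeof document.evaluate === "function") {
    var paragraphCount = xpathNumber(document, "count(//p)");
    // (//p)[1] is the first <p> in the whole document; requesting a string
    // result yields the text content of that element.
    var firstParagraphText = xpathString(document, "(//p)[1]");
}
```

Because these helpers call evaluate( ) directly, each call parses the expression again; for an expression that is evaluated many times, compiling it once with createExpression( ) is likely the better choice.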