Evaluating an XPath Expression

Problem

You want to extract information from a parsed XML document by evaluating an XPath expression.

Solution

Use the Xalan library. First, parse the XML document to obtain a pointer to a xalanc::XalanDocument. You can do this by using instances of XalanSourceTreeInit, XalanSourceTreeDOMSupport, and XalanSourceTreeParserLiaison, each defined in the namespace xalanc, like so:

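#include <xercesc/framework/LocalFileInputSource.hpp>
#include <xalanc/XalanSourceTree/XalanSourceTreeDOMSupport.hpp>
#include <xalanc/XalanSourceTree/XalanSourceTreeInit.hpp>
#include <xalanc/XalanSourceTree/XalanSourceTreeParserLiaison.hpp>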
...
int main( )
{
 ...
 // Initialize the XalanSourceTree subsystem
 XalanSourceTreeInit init;
 XalanSourceTreeDOMSupport support; 

 // Interface to the parser
 XalanSourceTreeParserLiaison liaison(support);

 // Hook DOMSupport to ParserLiaison
 support.setParserLiaison(&liaison); 
 LocalFileInputSource src(document-location);
 XalanDocument* doc = liaison.parseXMLStream(src);
 ...
}

Alternatively, you can use the Xerces DOM parser to obtain a pointer to a DOMDocument, as in Example 14-14, and then use instances of XercesDOMSupport, XercesParserLiaison, and XercesDOMWrapperParsedSource, each defined in namespace xalanc, to obtain a pointer to a XalanDocument corresponding to the DOMDocument:

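#include <xercesc/dom/DOM.hpp>
#include <xalanc/XalanTransformer/XercesDOMWrapperParsedSource.hpp>
#include <xalanc/XercesParserLiaison/XercesDOMSupport.hpp>
#include <xalanc/XercesParserLiaison/XercesParserLiaison.hpp>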
...
int main( ) {
 ...
 DOMDocument* doc = ... ;
 XercesDOMSupport support;
 XercesParserLiaison liaison(support);
 XercesDOMWrapperParsedSource src(doc, liaison, support);
 XalanDocument* xalanDoc = src.getDocument( );
 ...
}

Next, obtain a pointer to the node that serves as the context node when evaluating the XPath expression. You can do this by using XalanDocument's DOM interface. Construct an XPathEvaluator to evaluate the XPath expression and a XalanDocumentPrefixResolver to resolve namespace prefixes in the XML document. Finally, call the XPathEvaluator's evaluate( ) method, passing the DOMSupport, the context node, the XPath expression, and the PrefixResolver as arguments. The result of evaluating the expression is returned as an object of type XObjectPtr; the operations you can perform on this object depend on its XPath data type, which you can query using the getType( ) method.

For example, suppose you want to extract a list of animals' names from the document animals.xml from Example 14-1. You can do this by parsing the document and evaluating the XPath expression animalList/animal/name/child::text( ) with the document root as context node. This is illustrated in Example 14-23.

Example 14-23. Evaluating an XPath expression using Xalan

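#include <cstddef> // size_t
#include <cstdlib> // EXIT_FAILURE
#include <exception>
#include <iostream> // cout
#include <stdexcept> // runtime_error
#include <xercesc/dom/DOM.hpp>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/sax/SAXParseException.hpp>
#include <xercesc/sax2/DefaultHandler.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xalanc/DOMSupport/XalanDocumentPrefixResolver.hpp>
#include <xalanc/XalanTransformer/XercesDOMWrapperParsedSource.hpp>
#include <xalanc/XercesParserLiaison/XercesDOMSupport.hpp>
#include <xalanc/XercesParserLiaison/XercesParserLiaison.hpp>
#include <xalanc/XPath/NodeRefListBase.hpp>
#include <xalanc/XPath/XObject.hpp>
#include <xalanc/XPath/XPathEvaluator.hpp>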
#include "animal.hpp"
#include "xerces_strings.hpp"

using namespace std;
using namespace xercesc;
using namespace xalanc;

// RAII utility that initializes the parser and the XPath engine
// and frees resources when it goes out of scope
class XPathInitializer {
public:
 XPathInitializer( ) 
 { 
 XMLPlatformUtils::Initialize( );
 XPathEvaluator::initialize( );
 }
 ~XPathInitializer( ) 
 { 
 XPathEvaluator::terminate( );
 XMLPlatformUtils::Terminate( );
 }
private:
 // Prohibit copying and assignment
 XPathInitializer(const XPathInitializer&);
 XPathInitializer& operator=(const XPathInitializer&);
};

// Receives Error notifications
class CircusErrorHandler : public DefaultHandler {
public:
 void error(const SAXParseException& e)
 {
 throw runtime_error(toNative(e.getMessage( )));
 }
 void fatalError(const SAXParseException& e) { error(e); }
};

int main( )
{
 try {
 // Initialize Xerces and XPath and construct a DOM parser.
 XPathInitializer init;
 XercesDOMParser parser;
 
 // Register error handler
 CircusErrorHandler error;
 parser.setErrorHandler(&error);

 // Parse animals.xml.
 parser.parse(fromNative("animals.xml").c_str( ));
 DOMDocument* doc = parser.getDocument( );
 DOMElement* animalList = doc->getDocumentElement( );

 // Create a XalanDocument based on doc.
 XercesDOMSupport support;
 XercesParserLiaison liaison(support);
 XercesDOMWrapperParsedSource src(doc, liaison, support);
 XalanDocument* xalanDoc = src.getDocument( );

 // Evaluate an XPath expression to obtain a list 
 // of text nodes containing animals' names
 XPathEvaluator evaluator;
 XalanDocumentPrefixResolver resolver(xalanDoc);
 XercesString xpath = 
 fromNative("animalList/animal/name/child::text( )");
 XObjectPtr result =
 evaluator.evaluate( 
 support, // DOMSupport
 xalanDoc, // context node
 xpath.c_str( ), // XPath expr
 resolver ); // Namespace resolver
 const NodeRefListBase& nodeset = result->nodeset( );


 // Iterate through the node list, printing the animals' names
 for ( size_t i = 0,
 len = nodeset.getLength( );
 i < len;
 ++i )
 {
 const XMLCh* name = 
 nodeset.item(i)->getNodeValue( ).c_str( );
 std::cout << toNative(name) << "\n";
 }
 } catch (const DOMException& e) {
 cout << "xml error: " << toNative(e.getMessage( )) << "\n";
 return EXIT_FAILURE;
 } catch (const exception& e) {
 cout << e.what( ) << "\n";
 return EXIT_FAILURE;
 }
}
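
The XPathInitializer class in Example 14-23 depends on a general C++ guarantee: the destructor of a local object runs when its scope exits, including when an exception propagates out of the scope. The following is a minimal, library-free sketch of that behavior. ScopedInit and runWithFailure are hypothetical names invented for this illustration, and the strings pushed onto the log merely stand in for the real initialize and terminate calls.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for XPathInitializer: records when the
// "subsystem" is initialized and when it is terminated.
class ScopedInit {
public:
    explicit ScopedInit(std::vector<std::string>& log) : log_(log)
    {
        log_.push_back("initialize");
    }
    ~ScopedInit()
    {
        log_.push_back("terminate");
    }
private:
    // Prohibit copying and assignment, as in Example 14-23
    ScopedInit(const ScopedInit&);
    ScopedInit& operator=(const ScopedInit&);

    std::vector<std::string>& log_;
};

// Simulates a main( ) whose work fails after initialization and
// returns the sequence of events that took place.
std::vector<std::string> runWithFailure()
{
    std::vector<std::string> log;
    try {
        ScopedInit init(log);
        log.push_back("evaluate");
        throw std::runtime_error("parse failed");
    } catch (const std::exception& e) {
        // The destructor of init has already run by the time
        // control reaches this handler.
        log.push_back(std::string("caught: ") + e.what());
    }
    return log;
}
```

Calling runWithFailure( ) yields the events initialize, evaluate, terminate, and caught: parse failed, in that order. This is why Example 14-23 declares its initializer inside the try block: cleanup happens before any handler reports the error.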

Discussion

XPath is a pattern-matching language designed to extract information from XML documents. XPath's main construct, the path expression, provides a hierarchical syntax for referring to elements, attributes, and text nodes based on their names, attributes, textual content, inheritance relations, and other properties. In addition to operating on sets of nodes, or node sets, the XPath language can handle strings, numbers, and boolean values. XPath Version 2.0, which is not currently supported by the Xalan library, provides an even richer data model, based on the XML Schema recommendation. (See Recipe 14.5.)

XPath expressions are evaluated in the context of a node in an XML document, called the context node, which is used to interpret relative constructs such as parent, child, and descendant. In Example 14-23, I specified the root of the XML document as the context node; this is the node that is the parent of the XML document's root element and of any top-level processing instructions and comments. When evaluated with the root node as the context node, the path expression animalList/animal/name/child::text( ) matches all text node children of name elements whose parent element is an animal element and whose grandparent is an animalList element.
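
To see why the choice of context node matters, it can help to model the evaluation by hand. The following sketch is a toy, not Xalan: Node, splitSteps, selectChildren, makeAnimal, and makeSampleRoot are hypothetical names, and the evaluator handles only relative paths made of child-axis name tests such as animalList/animal/name. The sample data is a placeholder loosely shaped like animals.xml.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy tree node: a name, optional text content, and child nodes.
// A hypothetical stand-in, not a Xalan or Xerces type.
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;
};

// Split "a/b/c" into its location steps {"a", "b", "c"}.
std::vector<std::string> splitSteps(const std::string& path)
{
    std::vector<std::string> steps;
    std::istringstream in(path);
    std::string step;
    while (std::getline(in, step, '/'))
        steps.push_back(step);
    return steps;
}

// Evaluate a relative path of child-axis name tests, starting at
// context. Each step replaces the current node set with the
// children of its members whose names match the step.
std::vector<const Node*> selectChildren(const Node& context,
                                        const std::string& path)
{
    std::vector<const Node*> current(1, &context);
    const std::vector<std::string> steps = splitSteps(path);
    for (std::size_t s = 0; s < steps.size(); ++s) {
        std::vector<const Node*> next;
        for (std::size_t i = 0; i < current.size(); ++i) {
            const std::vector<Node>& kids = current[i]->children;
            for (std::size_t k = 0; k < kids.size(); ++k)
                if (kids[k].name == steps[s])
                    next.push_back(&kids[k]);
        }
        current.swap(next);
    }
    return current;
}

// Build <animal><name>...</name></animal>.
Node makeAnimal(const std::string& animalName)
{
    Node nameElem;
    nameElem.name = "name";
    nameElem.text = animalName;
    Node animal;
    animal.name = "animal";
    animal.children.push_back(nameElem);
    return animal;
}

// Build a document root whose only child is an animalList element
// holding two animals (placeholder data for this sketch).
Node makeSampleRoot()
{
    Node list;
    list.name = "animalList";
    list.children.push_back(makeAnimal("Herby"));
    list.children.push_back(makeAnimal("Sheldon"));
    Node root;  // models the document root, parent of animalList
    root.name = "#document";
    root.children.push_back(list);
    return root;
}
```

With the root returned by makeSampleRoot( ) as context, selectChildren(root, "animalList/animal/name") finds both name elements. With the animalList element itself as context, the same path finds nothing, because animalList has no child named animalList; the shorter path animal/name is needed instead.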

The evaluate( ) method of XPathEvaluator returns an XObjectPtr representing the result of evaluating the XPath expression. The data type of an XObjectPtr can be queried by dereferencing it to obtain an XObject and calling the method getType( ); the underlying data can then be accessed by calling num( ), boolean( ), str( ), or nodeset( ). Since the XPath expression in Example 14-23 represents a node set, I used the nodeset( ) method to obtain a reference to a NodeRefListBase, which provides access to the nodes in a node set through its getLength( ) and item( ) methods. The method item( ) returns a pointer to a XalanNode, whose getNodeValue( ) method returns a string with an interface similar to std::basic_string.
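
The pattern of checking a result's type before reading its value can be sketched without Xalan. In the toy model that follows, Result, its Type enumerators, the make functions, and describe are all hypothetical; they mirror the four XPath 1.0 data types but are not Xalan's XObject interface or its enumerator names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an XPath 1.0 result: a type tag plus
// one field per data type. Only the field named by type is valid.
struct Result {
    enum Type { Boolean, Number, String, NodeSet };
    Type type;
    bool b;
    double n;
    std::string s;
    std::vector<std::string> nodes;  // string-values of the nodes
};

Result makeBoolean(bool v)
{
    Result r = Result();
    r.type = Result::Boolean;
    r.b = v;
    return r;
}

Result makeNumber(double v)
{
    Result r = Result();
    r.type = Result::Number;
    r.n = v;
    return r;
}

Result makeString(const std::string& v)
{
    Result r = Result();
    r.type = Result::String;
    r.s = v;
    return r;
}

Result makeNodeSet(const std::vector<std::string>& v)
{
    Result r = Result();
    r.type = Result::NodeSet;
    r.nodes = v;
    return r;
}

// Render a result as text, choosing the accessor by type first.
std::string describe(const Result& r)
{
    std::ostringstream out;
    switch (r.type) {
    case Result::Boolean:
        out << (r.b ? "true" : "false");
        break;
    case Result::Number:
        out << r.n;
        break;
    case Result::String:
        out << r.s;
        break;
    case Result::NodeSet:
        for (std::size_t i = 0; i < r.nodes.size(); ++i) {
            if (i != 0)
                out << ", ";
            out << r.nodes[i];
        }
        break;
    }
    return out.str();
}
```

A caller that does not know in advance what kind of value an expression produces would branch in the same way on the value returned by getType( ) before calling num( ), boolean( ), str( ), or nodeset( ).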

Since XPath provides an easy way to locate nodes in an XML document, it's natural to wonder whether you can use Xalan XPath expressions to obtain instances of xercesc::DOMNode from a xercesc::DOMDocument. Indeed it is possible, but it is slightly awkward; what's more, by default, the xercesc::DOMNodes obtained in this way are part of a read-only view of the XML document tree, which limits the usefulness of XPath as a tool for DOM manipulation. There are ways to work around this restriction, but they are complex and potentially dangerous.

Fortunately, the Pathan library provides an implementation of XPath that is compatible with Xerces and which allows easy manipulation of the Xerces DOM. Example 14-24 shows how to use Pathan to locate and remove the node corresponding to Herby the elephant in the XML document from Example 14-1, by evaluating the XPath expression animalList/animal[child::name='Herby']. Comparing this example with Example 14-10 makes it clear how powerful the XPath language is.

Example 14-24. Locating a node and removing it using Pathan

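#include <exception>
#include <iostream> // cout
#include <cstdlib> // EXIT_FAILURE
#include <memory> // auto_ptr
#include <xercesc/dom/DOM.hpp>
#include <xercesc/framework/LocalFileFormatTarget.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <pathan/XPathEvaluator.hpp>
#include <pathan/XPathException.hpp>
#include <pathan/XPathExpression.hpp>
#include <pathan/XPathNSResolver.hpp>
#include <pathan/XPathResult.hpp>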
#include "xerces_strings.hpp" // Example 14-4

using namespace std;
using namespace xercesc;

/*
 * Define XercesInitializer as in Example 14-8, and
 * CircusErrorHandler and DOMPtr as in Example 14-10
 */

int main( )
{
 try {
 // Initialize Xerces and retrieve a DOMImplementation.
 XercesInitializer init;
 DOMImplementation* impl = 
 DOMImplementationRegistry::getDOMImplementation(
 fromNative("LS").c_str( )
 );
 if (impl == 0) {
 cout << "couldn't create DOM implementation\n";
 return EXIT_FAILURE;
 }

 // Construct a DOMBuilder to parse animals.xml.
 DOMPtr<DOMBuilder> parser =
 static_cast<DOMImplementationLS*>(impl)->
 createDOMBuilder(
 DOMImplementationLS::MODE_SYNCHRONOUS, 0
 );
 CircusErrorHandler err;
 parser->setErrorHandler(&err);

 // Parse animals.xml.
 DOMDocument* doc = 
 parser->parseURI("animals.xml");
 DOMElement* animalList = doc->getDocumentElement( );

 // Create XPath expression.
 auto_ptr<XPathEvaluator>
 evaluator(XPathEvaluator::createEvaluator( ));
 auto_ptr<XPathNSResolver>
 resolver(evaluator->createNSResolver(animalList));
 auto_ptr<XPathExpression>
 xpath(
 evaluator->createExpression(
 fromNative(
 "animalList/animal[child::name='Herby']"
 ).c_str( ),
 resolver.get( )
 )
 );

 // Evaluate the expression.
 XPathResult* result = 
 xpath->evaluate(
 doc, 
 XPathResult::ORDERED_NODE_ITERATOR_TYPE, 
 0
 );

 DOMNode* herby = result->iterateNext( );
 if (herby != 0) {
 animalList->removeChild(herby);
 herby->release( ); // optional.
 }

 // Construct a DOMWriter to save animals.xml.
 DOMPtr<DOMWriter> writer =
 static_cast<DOMImplementationLS*>(impl)->createDOMWriter( );
 writer->setErrorHandler(&err);

 // Save the modified document as circus.xml.
 LocalFileFormatTarget file("circus.xml");
 writer->writeNode(&file, *animalList);
 } catch (const DOMException& e) {
 cout << toNative(e.getMessage( )) << "\n";
 return EXIT_FAILURE;
 } catch (const XPathException &e) {
 cout << e.getString( ) << "\n";
 return EXIT_FAILURE;
 } catch (const exception& e) {
 cout << e.what( ) << "\n";
 return EXIT_FAILURE;
 }
}
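
To make the predicate in Example 14-24 concrete, here is roughly what animal[child::name='Herby'] asks of each candidate element, written out by hand against a toy tree. This is a sketch under stated assumptions rather than anything Pathan does internally: Element, hasChildWithText, removeFirstMatch, and makeAnimalElement are hypothetical names, and an element's text is stored directly in a string instead of in a child text node.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy element: a name, text content, and child elements.
// A hypothetical stand-in, not a Xerces DOMElement.
struct Element {
    std::string name;
    std::string text;
    std::vector<Element> children;
};

// Models the predicate [child::childName='value']: true if elem
// has at least one child with that name and exactly that text.
bool hasChildWithText(const Element& elem,
                      const std::string& childName,
                      const std::string& value)
{
    for (std::size_t i = 0; i < elem.children.size(); ++i)
        if (elem.children[i].name == childName &&
            elem.children[i].text == value)
            return true;
    return false;
}

// Remove the first child of parent that is named elemName and
// satisfies the predicate. Returns true if a child was removed.
bool removeFirstMatch(Element& parent,
                      const std::string& elemName,
                      const std::string& childName,
                      const std::string& value)
{
    for (std::vector<Element>::iterator it = parent.children.begin();
         it != parent.children.end(); ++it)
    {
        if (it->name == elemName &&
            hasChildWithText(*it, childName, value))
        {
            parent.children.erase(it);
            return true;
        }
    }
    return false;
}

// Build <animal><name>...</name></animal> for experiments.
Element makeAnimalElement(const std::string& animalName)
{
    Element nameElem;
    nameElem.name = "name";
    nameElem.text = animalName;
    Element animal;
    animal.name = "animal";
    animal.children.push_back(nameElem);
    return animal;
}
```

Like Example 14-24, which calls iterateNext( ) once, removeFirstMatch removes only the first matching element. The hand-written loops do the work that the single XPath expression expresses declaratively.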

Example 14-24 uses Pathan 1, which implements the XPath 1.0 recommendation, the same version currently supported by Xalan. Pathan 2, currently available in a beta version, provides a preliminary implementation of the XPath 2.0 recommendation. Pathan 2 represents a more faithful implementation of the XPath standard; I recommend using Pathan 2 instead of Pathan 1 as soon as a non-beta version becomes available.