Using XPath


XPath provides a way to specify selection criteria for elements, such as “all the items that cost more than $30” or “the invoices that are more than 30 days old.” XPath expressions are normally used in XSL style sheets but can also be used within code to select elements from a Document Object Model (DOM) tree.

The XPathNavigator Class

Before you can try using XPath, let me introduce the XPathNavigator class, which is part of the System::Xml::XPath namespace. In Chapter 20, you encountered two ways of parsing XML. The first, XmlTextReader, provided a simple, forward-only mechanism, where you used the Read method to read the elements in sequence. The second, XmlDocument, read the entire document into memory, but you had to walk through the tree manually. One of the main differences between these two classes is that XmlTextReader always has the idea of a current node, but XmlDocument does not. XPathNavigator is a class that sits on top of an XmlDocument and navigates through the document for you. Like XmlTextReader, it has the notion of a current position, but unlike XmlTextReader, you aren’t restricted to moving forward through the document.

The following tables list the most commonly used properties and methods of the XPathNavigator class. You’ll notice that there is a certain amount of overlap with the XmlDocument class.

Property           Description
HasAttributes      Set to true if the current element has attributes
HasChildren        Set to true if the current node has children
IsEmptyElement     Set to true if the current element has no content
Name, LocalName    The name of the current node, with or without a namespace prefix
NodeType           The node type; will be one of the XPathNodeType values, such as Root, Element, Attribute, Text, or Comment
Prefix             The current namespace prefix, if any
Value              The value of the current node
XmlLang            The value of the xml:lang attribute

Method                                      Description
Clone                                       Creates a new XPathNavigator positioned at the same point.
ComparePosition                             Compares the position of this navigator to that of another navigator.
Compile                                     Compiles an XPath expression into an XPathExpression object.
Evaluate                                    Evaluates an XPath expression.
GetAttribute                                Gets the value of a named attribute.
GetNamespace                                Gets the value of a namespace node corresponding to a local name.
IsDescendant                                Returns true if an XPathNavigator is a descendant of the current navigator. One navigator is a descendant of another if it is positioned on a descendant node.
IsSamePosition                              Returns true if two navigators are positioned on the same node.
Matches                                     Returns true if the current node matches an XPath expression.
MoveTo                                      Moves the navigator to the same position as another navigator. Returns false if the move fails.
MoveToAttribute                             Positions the navigator on a given attribute. Returns false if the attribute cannot be found.
MoveToFirst, MoveToNext, MoveToPrevious     Moves between nodes at the same level in the tree (sibling nodes). Returns false if there is not a valid node to move to.
MoveToFirstAttribute, MoveToNextAttribute   Moves to the first and subsequent attributes of an element. Returns false if there is not an attribute to move to.
MoveToFirstChild                            Moves to the first child node. Returns false if there are no children.
MoveToId                                    Moves to a node with the specified ID attribute. Returns false if a node with the given ID cannot be found.
MoveToNamespace, MoveToNextNamespace        Moves to namespace nodes. Returns false if the namespace cannot be found or if the navigator is not positioned on an element node.
MoveToParent                                Moves up one level in the tree. Returns false if the current node does not have a parent.
MoveToRoot                                  Moves to the root of the tree.
Select                                      Selects zero or more nodes based on an XPath expression.
SelectAncestors                             Selects ancestors of the current node based on selection criteria.
SelectChildren                              Selects children of the current node based on selection criteria.
SelectDescendants                           Selects descendants of the current node based on selection criteria.

Using XPathNavigator

This exercise will show you how to create an XPathNavigator and use it to move around a document. It uses the same volcanoes.xml and geology.dtd files used in the exercises in Chapter 20.

  1. Create a new Visual C++ Console Application (.NET) project named CppNavigator.

  2. Add the following three lines to the top of CppNavigator.cpp:

    #using <System.xml.dll>
    using namespace System::Xml;
    using namespace System::Xml::XPath;

    The code for the XML classes lives in System.xml.dll, so it needs to be included by means of a #using directive. It’s also going to be easier to use the classes if you include using directives for the System::Xml and System::Xml::XPath namespaces, as shown in the preceding code.

  3. You’re going to supply the name of the XML document when you run the program from the command line, so change the declaration of the _tmain function to include the command-line argument parameters, like this:

    int _tmain(int argc, char* argv[])
  4. Add this code to the start of the _tmain function to check the number of arguments and save the path:

    // Check for required arguments
    if (argc < 2)
    {
        Console::WriteLine(S"Usage: CppNavigator path");
        return -1;
    }

    String* path = new String(argv[1]);
  5. Now that you’ve got the path, create an XmlDocument to parse the file and load it into a DOM tree.

    try
    {
        // Create the XmlDocument to parse the file
        XmlDocument* doc = new XmlDocument();

        // Load the file
        doc->Load(path);
        Console::WriteLine(S"Document loaded");
    }
    catch(Exception* pe)
    {
        Console::WriteLine(pe->Message);
    }

    As I explained in the XmlDocument example in Chapter 20, it’s a good idea to be prepared to catch exceptions when using XmlDocument because it throws an exception if it has problems opening the file or if it finds any parsing errors.

  6. Create an XPathNavigator that uses the XmlDocument. Add the following code and any code presented further in this exercise to the end of the code inside the try block:

    // Create the navigator
    XPathNavigator* nav = doc->CreateNavigator();

    The navigator will let you navigate over the tree created by the XmlDocument.

    Note

    It’s also possible to create an XPathNavigator that starts at a particular node. CreateNavigator is available on the nodes of the tree as well as on the document itself, and the navigator it returns is positioned on the node you call it on.

  7. The following code shows how you use the navigator to walk through the document:

    // Move to the top of the tree and print details
    nav->MoveToRoot();
    Console::WriteLine(S"top: name={0}, type={1}, value={2}",
        nav->Name, __box(nav->NodeType)->ToString(), nav->Value);

    // Move to the first child, which is a comment
    nav->MoveToFirstChild();
    Console::WriteLine(S"first child: name={0}, type={1}",
        nav->Name, __box(nav->NodeType)->ToString());

    // Move to the next sibling, which is the root element
    nav->MoveToNext();
    Console::WriteLine(S"next child: name={0}, type={1}",
        nav->Name, __box(nav->NodeType)->ToString());

    // Move to the root element's first child, which will be
    // the first volcano
    nav->MoveToFirstChild();
    Console::WriteLine(S"next child: name={0}, type={1}",
        nav->Name, __box(nav->NodeType)->ToString());

    if (nav->HasAttributes)
    {
        nav->MoveToFirstAttribute();
        Console::WriteLine(S"  attribute: name={0}, value={1}",
            nav->Name, nav->Value);
        nav->MoveToParent();
    }

    A navigator created from an XmlDocument starts out positioned at the top of the tree, so the call to MoveToRoot simply makes that starting point explicit; you will need the same call whenever you want to return there. As with XmlDocument, this isn’t the root element of the XML but rather the top of the DOM tree.

    Move around the tree by calling the various Move methods. You need to be careful to distinguish between sibling and child nodes. MoveToNext and MoveToPrevious move between sibling nodes at the same level in the tree, whereas MoveToFirstChild moves down a level to the first child node, from which MoveToNext steps through the remaining children. You can use MoveToParent to move back up a level when you’ve finished processing child nodes.

    In this example, MoveToRoot positions the XPathNavigator object at the root of the DOM tree. If you look at the output from this code, you’ll see that the root doesn’t have a name. Its type is Root, and its value is a long string of text, which is the concatenated text of every node below it. This isn’t very useful, but it is logical, because the value of an element consists of its own text plus the values of all its descendants.

    This code uses the Name, NodeType, and Value properties. Whether the node has a Name and a Value will depend on the NodeType.

    You’re navigating down to the first volcano element, and because it has a name attribute, the code prints out the details of the first attribute. Note the call to MoveToParent after the attribute details have been printed. Moving to an attribute repositions the navigator on the attribute node, whose parent is the element that owns it, so when you’ve finished processing the attributes, you have to move one level up to point the navigator back at that element.

  8. Experiment with adding more code to the program to print out selected elements and attributes, and make sure that you’re getting the results you expect.
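
The moves used in this exercise can also be modeled outside .NET. The following is only a minimal sketch in portable standard C++ (not Managed Extensions, and not the real XPathNavigator class): Node, TreeCursor, and stringValue are hypothetical names introduced for illustration. It assumes a simplified tree in which each node stores its own text, and it tries to show the two behaviors described in step 7: every move reports success or failure, and the value of a node is the concatenated text of everything beneath it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory node for illustration; not a .NET type.
struct Node {
    std::string name;                            // node name (empty for the root)
    std::string text;                            // text held directly by this node
    Node* parent = nullptr;                      // null only for the root
    std::vector<std::unique_ptr<Node>> children; // children in document order

    // Append a child and record its parent so a cursor can move back up.
    Node* add(std::string childName, std::string childText = "") {
        auto child = std::make_unique<Node>();
        child->name = std::move(childName);
        child->text = std::move(childText);
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Value of a node: its own text followed by the values of all its children.
std::string stringValue(const Node& node) {
    std::string value = node.text;
    for (const auto& child : node.children) value += stringValue(*child);
    return value;
}

// A cursor with a current position. Each move returns false, and leaves the
// position unchanged, when there is no node to move to.
class TreeCursor {
public:
    explicit TreeCursor(Node* root) : root_(root), current_(root) {
        assert(root != nullptr);
    }

    void moveToRoot() { current_ = root_; }

    bool moveToFirstChild() {
        if (current_->children.empty()) return false;
        current_ = current_->children.front().get();
        return true;
    }

    bool moveToNext() { return moveToSibling(true); }
    bool moveToPrevious() { return moveToSibling(false); }

    bool moveToParent() {
        if (current_->parent == nullptr) return false;
        current_ = current_->parent;
        return true;
    }

    const Node& current() const { return *current_; }

private:
    // Find the current node among its parent's children, then step one place.
    bool moveToSibling(bool forward) {
        Node* parent = current_->parent;
        if (parent == nullptr) return false;
        const auto& siblings = parent->children;
        for (std::size_t i = 0; i < siblings.size(); ++i) {
            if (siblings[i].get() != current_) continue;
            if (forward && i + 1 < siblings.size()) {
                current_ = siblings[i + 1].get();
                return true;
            }
            if (!forward && i > 0) {
                current_ = siblings[i - 1].get();
                return true;
            }
            return false;
        }
        return false;
    }

    Node* root_;
    Node* current_;
};
```

With a root that contains a geology node holding two volcano nodes, calling moveToFirstChild twice lands on the first volcano, moveToNext then reaches the second, and a further moveToNext returns false and leaves the cursor where it was.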

Using XPath with XPathNavigator

Now that you know how to create and use an XPathNavigator, let’s move on to XPath itself. The XPath expression language is very complex and capable of defining extremely precise matches. This chapter isn’t the place for anything like a full explanation of XPath expressions, but you’ll find an introduction to the topic in the following sidebar, “Basic XPath Syntax.” For more details about XPath, consult the XML SDK documentation provided by the Microsoft Developer Network (MSDN) at http://msdn.microsoft.com/library.

start sidebar
Basic XPath Syntax

In case you haven’t encountered XPath before, here is an introduction to the very simplest XPath syntax.

XPath uses pattern matching to create expressions that match one or more elements within a document. You create basic expressions using element names, with child relationships denoted by forward slash marks (/). The syntax is very similar to specifying file and directory paths. For example, foo/bar specifies bar as a child of foo. When passed to an XPath processor and evaluated, it will match all bar elements that are children of foo elements. A leading slash mark means that the search should begin at the root, an asterisk matches any element, and two slash marks (//) will match any number of levels in the tree. Here are a few more examples:

books//price     -- match price elements any number of levels below books
books/*/author   -- match author elements that are grandchildren of books
/company         -- match company elements that occur at the root

Simple conditionals can be represented with square brackets ([]), so order[subtotal] will match order elements that have a child subtotal element. To match attributes, use an at sign (@), which is short for attribute. So, volcano[@name] will match all volcano elements that have a name attribute.

When an XPath engine evaluates an expression, it returns a list of the nodes that match, and this list might contain zero, one, or more nodes. It’s important to note that these nodes are now completely out of context, and you can’t tell where in the document they occur or what relationship they have to one another.
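
To make the matching rules concrete, here is a rough sketch in portable standard C++ of how a very small subset of this syntax could be evaluated. It is not how a real XPath engine works and it does not use the .NET classes: Node, Step, parsePath, hasChild, and selectNodes are hypothetical names. The sketch assumes absolute paths only and handles child steps, the asterisk wildcard, and a single [child] conditional; the double slash, attributes, and relative paths are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element node for illustration; not a .NET type.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;

    Node* add(std::string childName) {
        auto child = std::make_unique<Node>();
        child->name = std::move(childName);
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// One step of a path: an element name (or "*") plus an optional [child] test.
struct Step {
    std::string name;
    std::string requiredChild;  // empty when the step has no conditional
};

// Split an absolute path such as "/geology/volcano[comment]" into steps.
std::vector<Step> parsePath(const std::string& path) {
    assert(!path.empty() && path[0] == '/');  // only absolute paths are handled
    std::vector<Step> steps;
    std::size_t pos = 1;
    while (pos < path.size()) {
        std::size_t end = path.find('/', pos);
        if (end == std::string::npos) end = path.size();
        const std::string token = path.substr(pos, end - pos);
        Step step;
        const std::size_t open = token.find('[');
        if (open == std::string::npos) {
            step.name = token;
        } else {
            step.name = token.substr(0, open);
            const std::size_t close = token.find(']', open);
            step.requiredChild = token.substr(open + 1, close - open - 1);
        }
        steps.push_back(step);
        pos = end + 1;
    }
    return steps;
}

// True if the node has at least one child element with the given name.
bool hasChild(const Node& node, const std::string& name) {
    for (const auto& child : node.children)
        if (child->name == name) return true;
    return false;
}

// Apply each step to the children of the nodes matched by the previous step.
std::vector<const Node*> selectNodes(const Node& root, const std::string& path) {
    std::vector<const Node*> matched{&root};
    for (const Step& step : parsePath(path)) {
        std::vector<const Node*> next;
        for (const Node* node : matched) {
            for (const auto& child : node->children) {
                const bool nameOk = step.name == "*" || child->name == step.name;
                const bool testOk = step.requiredChild.empty() ||
                                    hasChild(*child, step.requiredChild);
                if (nameOk && testOk) next.push_back(child.get());
            }
        }
        matched = std::move(next);
    }
    return matched;
}
```

Each step narrows the previous result set, which is why the returned list can hold zero, one, or many nodes: a name that never occurs simply leaves the list empty rather than raising an error.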

end sidebar

You can use XPath to select a set of nodes using the Select function of XPathNavigator, as demonstrated in the following brief exercise.

  1. Continue with the project that you used for the previous exercise. Add the following code, which will set the XPathNavigator back to the top of the document tree:

    // Move back to the root
    Console::WriteLine(S"XPath test...");
    nav->MoveToRoot();
  2. The following code will select all the volcano elements that are children of geology:

    // Select all 'volcano' elements that are children of 'geology',
    // starting at the root
    XPathNodeIterator* xpi = nav->Select(S"/geology/volcano");
    Console::WriteLine(S"Select returned {0} nodes", __box(xpi->Count));

    The Select function takes a string representing an XPath expression and passes it to the XPath engine. The function returns an XPathNodeIterator* that you can use to iterate over the set of nodes retrieved by the XPath engine. You can find out how many nodes were retrieved by using the Count property on the XPathNodeIterator.

    XPathNodeIterator is basically an enumerator, so it supports the MoveNext method and the Current property. The following code will print details of all the elements in the node list:

    while (xpi->MoveNext())
    {
        XPathNavigator* xpn = xpi->Current;
        Console::Write(S"node: name={0}, type={1}",
            xpn->Name, __box(xpn->NodeType)->ToString());
        xpn->MoveToFirstAttribute();
        Console::WriteLine(S", name={0}", xpn->Value);
    }

    As usual, MoveNext moves from item to item in the collection. You might expect Current to return you a pointer to a node, but it actually returns a pointer to another XPathNavigator, which you use to investigate the tree of elements under the current item. The Write statement writes the node name and type, and then you retrieve the first attribute, which holds the name of the volcano.

  3. Build and run the program. You should get the following output:

    XPath test...
    Select returned 3 nodes
    node: name=volcano, type=Element, name=Erebus
    node: name=volcano, type=Element, name=Hekla
    node: name=volcano, type=Element, name=Mauna Loa
  4. Modify the expression to return only those volcano elements that have a comment child element, as shown here:

    XPathNodeIterator* xpi = nav->Select(S"/geology/volcano[comment]");

    You should now get only one node returned because only Hekla has a comment child node.
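
The loop in step 2 depends on the iterator starting before the first item, so that MoveNext has to succeed once before Current means anything. The following sketch in portable standard C++ models just that convention. It is not the .NET XPathNodeIterator: ResultIterator and collect are hypothetical names, and plain strings stand in for the navigators that the real Current property returns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an enumerator over a selection result.
class ResultIterator {
public:
    explicit ResultIterator(std::vector<std::string> items)
        : items_(std::move(items)) {}

    // Number of selected items; does not depend on the current position.
    std::size_t count() const { return items_.size(); }

    // Advance to the next item. Returns false once the items are used up.
    bool moveNext() {
        if (consumed_ >= items_.size()) return false;
        ++consumed_;
        return true;
    }

    // The item most recently reached by moveNext.
    const std::string& current() const {
        assert(consumed_ > 0);  // moveNext must have succeeded at least once
        return items_[consumed_ - 1];
    }

private:
    std::vector<std::string> items_;
    std::size_t consumed_ = 0;  // how many items moveNext has stepped onto
};

// Drain an iterator with the same while-loop shape used in step 2.
std::vector<std::string> collect(ResultIterator it) {
    std::vector<std::string> seen;
    while (it.moveNext()) seen.push_back(it.current());
    return seen;
}
```

Because the position starts before the first item, an empty result needs no special case: the first moveNext returns false and the loop body never runs.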

Note

If you’re going to use the same XPath expression several times, you can compile it with the Compile method to produce an XPathExpression. This object can be passed to Select in place of a string, and it cuts out the text-parsing step on each call.




Microsoft Visual C++  .NET(c) Step by Step
ISBN: 735615675
EAN: N/A
Year: 2003
Pages: 208
