XPath

XPath, which is short for XML Path Language, is a language for addressing parts of an XML document. Its name includes the word path because of the similarities between XML paths and file system paths. In a file system, for example, \Book\Chap13 identifies the Chap13 subdirectory of the root directory's Book subdirectory. In an XML document, /Guitars/Guitar identifies all elements named Guitar that are children of the root element Guitars. /Guitars/Guitar is an XPath expression. XPath expressions are fully described in the XPath specification found at http://www.w3.org/TR/xpath.

XPath can be put to work in a variety of ways. Later in this chapter, you'll learn about XSL Transformations (XSLT), which is a language for converting XML documents from one format to another. XSLT uses XPath expressions to identify nodes and node sets. Another common use for XPath is extracting data from XML documents. Used this way, XPath becomes a query language of sorts: the XML equivalent of SQL, if you will. The W3C is working on an official XML query language called XQuery (http://www.w3.org/TR/xquery), but for the moment, an XPath processor is the best way to extract information from XML documents without having to manually traverse DOM trees. The FCL comes with an XPath engine named System.Xml.XPath.XPathNavigator. Before we discuss it, let's briefly review XPath.

XPath Basics

Expressions are the building blocks of XPath. The most common type of expression is the location path. The following location path evaluates to all Guitar elements that are children of a root element named Guitars:

/Guitars/Guitar

This one evaluates to all attributes (not elements) named Image that belong to Guitar elements that in turn are children of the root element Guitars:

/Guitars/Guitar/@Image

The next expression evaluates to all Guitar elements anywhere in the document:

//Guitar

The // prefix is extremely useful for locating elements in a document regardless of where they're positioned.

XPath also supports wildcards. This expression selects all elements that are children of a root element named Guitars:

/Guitars/*

The next example selects all attributes belonging to Guitar elements anywhere in the document:

//Guitar/@*
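
To see these location paths in action, here is a minimal sketch that counts the nodes each one selects. The inline XML is an assumed stand-in for a Guitars document (it is not the chapter's Guitars.xml), and the LocationPathDemo class and CountNodes helper are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class LocationPathDemo
{
    // Assumed sample data: only the first guitar has an Image attribute
    const string Xml =
        "<Guitars>" +
        "<Guitar Image='MyStrat.jpeg'><Make>Fender</Make><Model>Stratocaster</Model></Guitar>" +
        "<Guitar><Make>Gibson</Make><Model>SG</Model></Guitar>" +
        "</Guitars>";

    // Runs a location path against the sample and returns the number of nodes selected
    public static int CountNodes (string path)
    {
        XPathDocument doc = new XPathDocument (new StringReader (Xml));
        return doc.CreateNavigator ().Select (path).Count;
    }

    static void Main ()
    {
        string[] paths = {
            "/Guitars/Guitar", "/Guitars/Guitar/@Image", "//Guitar", "/Guitars/*", "//Guitar/@*"
        };
        foreach (string path in paths)
            Console.WriteLine ("{0,-24} {1} node(s)", path, CountNodes (path));
    }
}
```

With this sample, the element paths each select two nodes, while the two attribute paths select one, because only one guitar carries an attribute.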

Location paths can be absolute or relative. Paths that begin with / or // are absolute because they specify a location relative to the root. Paths that don't begin with / or // are relative paths. They specify a location relative to the current node, or context node, in the XML document being queried.

The components of a location path are called location steps. The following location path has two location steps:

/Guitars/Guitar

A location step consists of three parts: an axis, a node test, and zero or more predicates. The general format for a location step is as follows:

axis::node-test[predicate1][predicate2][...]

The axis describes a relationship between nodes. Supported values include child, descendant, descendant-or-self, parent, ancestor, and ancestor-or-self, among others. If you don't specify an axis, the default is child. Therefore, the expression

/Guitars/Guitar

could also be written

/child::Guitars/child::Guitar

Other axes can be used to qualify location paths in different ways. For example, this expression evaluates to all elements named Guitar that are descendants of the root element:

/descendant::Guitar

The next expression evaluates to all Guitar elements that are descendants of the root element or are themselves root elements:

/descendant-or-self::Guitar

In fact, // is shorthand for /descendant-or-self::node()/. Thus, the expression

//Guitar

is equivalent to the one above: it selects the same node set. Similarly, @ is shorthand for the attribute axis (attribute::). The expression

//Guitar/@*

can also be written

//Guitar/attribute::*

Most developers prefer the abbreviated syntax, but both syntaxes are supported by XPath 1.0-compliant expression engines.
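
One way to convince yourself that the abbreviated and unabbreviated forms agree is to run both and compare the results. The sketch below does that by node count, using an assumed inline sample document; the AxisDemo class and CountNodes helper are hypothetical names. The second pair uses the full XPath 1.0 expansion of //, which is /descendant-or-self::node()/.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class AxisDemo
{
    // Assumed sample data standing in for a Guitars document
    const string Xml =
        "<Guitars>" +
        "<Guitar Image='MyStrat.jpeg'><Make>Fender</Make><Model>Stratocaster</Model></Guitar>" +
        "<Guitar Image='MySG.jpeg'><Make>Gibson</Make><Model>SG</Model></Guitar>" +
        "</Guitars>";

    // Returns the number of nodes selected by a location path
    public static int CountNodes (string path)
    {
        XPathDocument doc = new XPathDocument (new StringReader (Xml));
        return doc.CreateNavigator ().Select (path).Count;
    }

    static void Main ()
    {
        // Each row pairs an abbreviated path with its unabbreviated equivalent
        string[,] pairs = {
            { "/Guitars/Guitar", "/child::Guitars/child::Guitar" },
            { "//Guitar",        "/descendant-or-self::node()/child::Guitar" },
            { "//Guitar/@*",     "//Guitar/attribute::*" }
        };

        for (int i = 0; i < pairs.GetLength (0); i++)
            Console.WriteLine ("{0} selects {1}; {2} selects {3}",
                pairs[i, 0], CountNodes (pairs[i, 0]),
                pairs[i, 1], CountNodes (pairs[i, 1]));
    }
}
```

Matching counts are only a spot check, not a proof of equivalence, but they make it easy to experiment with other axes such as descendant or parent.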

The predicate is the portion of the location path, if any, that appears in square brackets. Predicates are nothing more than filters. For example, the following expression evaluates to all Guitar elements in the document:

//Guitar

But this one uses a predicate to narrow down the selection to Guitar elements having attributes named Image:

//Guitar[@Image]

The next one evaluates to all Guitar elements that have attributes named Image whose value is "MyStrat.jpeg":

//Guitar[@Image = "MyStrat.jpeg"]

Predicates can include the following comparison operators: <, >, =, !=, <=, and >=. The following expression targets Guitar elements whose Year elements designate a year after 1980:

//Guitar[Year > 1980]

Predicates can also include and and or operators. This expression selects guitars manufactured after 1980 by Fender:

//Guitar[Year > 1980][Make = "Fender"]

The next expression does the same, but combines two predicates into one using the and operator:

//Guitar[Year > 1980 and Make = "Fender"]

Changing and to or identifies guitars that were manufactured by Fender or built after 1980:

//Guitar[Year > 1980 or Make = "Fender"]
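
The effect of each predicate is easiest to see side by side. The following sketch runs the predicate expressions from this section against an assumed four-guitar sample (the makes, models, and years are invented for illustration) and prints the Model of every guitar selected. PredicateDemo and Models are hypothetical names. Inside C# string literals, the XPath string values are written with single quotes, which XPath accepts in place of double quotes.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

class PredicateDemo
{
    // Assumed sample data: two Fenders and two Gibsons, two built after 1980
    const string Xml =
        "<Guitars>" +
        "<Guitar Image='MyStrat.jpeg'><Make>Fender</Make><Model>Stratocaster</Model><Year>1990</Year></Guitar>" +
        "<Guitar><Make>Fender</Make><Model>Telecaster</Model><Year>1972</Year></Guitar>" +
        "<Guitar Image='MySG.jpeg'><Make>Gibson</Make><Model>SG</Model><Year>1977</Year></Guitar>" +
        "<Guitar><Make>Gibson</Make><Model>Les Paul</Model><Year>1995</Year></Guitar>" +
        "</Guitars>";

    // Returns the Model values of the selected Guitar elements, comma separated
    public static string Models (string path)
    {
        XPathNavigator nav = new XPathDocument (new StringReader (Xml)).CreateNavigator ();
        XPathNodeIterator guitars = nav.Select (path);
        StringBuilder sb = new StringBuilder ();
        while (guitars.MoveNext ()) {
            // Relative path: evaluated with the current Guitar as the context node
            XPathNodeIterator model = guitars.Current.Select ("Model");
            if (model.MoveNext ()) {
                if (sb.Length > 0)
                    sb.Append (", ");
                sb.Append (model.Current.Value);
            }
        }
        return sb.ToString ();
    }

    static void Main ()
    {
        string[] paths = {
            "//Guitar[@Image]",
            "//Guitar[@Image = 'MyStrat.jpeg']",
            "//Guitar[Year > 1980]",
            "//Guitar[Year > 1980][Make = 'Fender']",
            "//Guitar[Year > 1980 and Make = 'Fender']",
            "//Guitar[Year > 1980 or Make = 'Fender']"
        };
        foreach (string path in paths)
            Console.WriteLine ("{0}\n    {1}", path, Models (path));
    }
}
```

With this data, the chained predicates and the and form both select only the Stratocaster, while the or form also picks up the Telecaster and the Les Paul.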

XPath also supports a set of intrinsic functions that are often (but not always) used in predicates. The following expression evaluates to all Guitar elements having Make elements whose text begins with the letter G. The key is the starts-with function invoked in the predicate:

//Guitar[starts-with (Make, "G")]

The next expression uses the text function to return all text nodes associated with Make elements that are subelements of Guitar elements. Like DOM, XPath treats the text associated with an element as a separate node:

//Guitar/Make/text ()

The starts-with and text functions are but two of many that XPath supports. For a complete list, refer to the XPath specification.
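
As an illustration of these two functions, the sketch below prints the node type and value of each node they select from an assumed two-guitar sample. FunctionDemo and Describe are hypothetical names. The first query appends /Model to the starts-with example so that the output is a single short value.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

class FunctionDemo
{
    // Assumed sample data standing in for a Guitars document
    const string Xml =
        "<Guitars>" +
        "<Guitar><Make>Fender</Make><Model>Stratocaster</Model></Guitar>" +
        "<Guitar><Make>Gibson</Make><Model>SG</Model></Guitar>" +
        "</Guitars>";

    // Returns "NodeType=Value" for every selected node, separated by "; "
    public static string Describe (string path)
    {
        XPathNavigator nav = new XPathDocument (new StringReader (Xml)).CreateNavigator ();
        XPathNodeIterator it = nav.Select (path);
        StringBuilder sb = new StringBuilder ();
        while (it.MoveNext ()) {
            if (sb.Length > 0)
                sb.Append ("; ");
            sb.AppendFormat ("{0}={1}", it.Current.NodeType, it.Current.Value);
        }
        return sb.ToString ();
    }

    static void Main ()
    {
        // Model elements of guitars whose Make begins with G
        Console.WriteLine (Describe ("//Guitar[starts-with(Make, 'G')]/Model"));

        // Text nodes (not elements) that hold each Make element's text
        Console.WriteLine (Describe ("//Guitar/Make/text()"));
    }
}
```

The second query reports nodes of type Text rather than Element, which reflects the point that an element's text is a separate node.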

When executed by an XPath processor, a location path returns a node set. XPath, like DOM, uses tree-structured node sets to represent XML content. Suppose you're given the XML document in Figure 13-3 and you execute the following location path against it:

//Guitar

The resulting node set contains two nodes, each representing a Guitar element. Each Guitar element is the root of a node tree containing Make, Model, Year, Color, and Neck subelement nodes (Figure 13-9). Each subelement node is the parent of a text node that holds the element's text. XPath node types are defined separately from DOM node types, although the two share many similarities. XPath defines fewer node types than DOM, which makes XPath node types a functional subset of DOM node types.

Figure 13-9

Node set resulting from an XPath expression.

XPathNavigator and Friends

The .NET Framework class library's System.Xml.XPath namespace contains classes for putting XPath to work in managed applications. Chief among those classes are XPathDocument, which represents XML documents that you want to query with XPath; XPathNavigator, which provides a mechanism for performing XPath queries; and XPathNodeIterator, which represents node sets generated by XPath queries and lets you iterate over them.

The first step in performing XPath queries on XML documents is to create an XPathDocument wrapping the XML document itself. XPathDocument features a variety of constructors capable of initializing an XPathDocument from a stream, a URL, a file, a TextReader, or an XmlReader. The following statement creates an XPathDocument object and initializes it with the content found in Guitars.xml:

XPathDocument doc = new XPathDocument ("Guitars.xml");

Step two is to create an XPathNavigator from the XPathDocument. XPathDocument features a method named CreateNavigator for just that purpose. The following statement creates an XPathNavigator object from the XPathDocument created in the previous step:

XPathNavigator nav = doc.CreateNavigator ();

The final step is actually executing the query. XPathNavigator features five methods for executing XPath queries. The two most important are Evaluate and Select. Evaluate executes any XPath expression. It returns a generic Object that can be a string, a double, a bool, or an XPathNodeIterator, depending on the expression and the type of data that it returns. Select works exclusively with expressions that return node sets and is therefore an ideal vehicle for evaluating location paths. It always returns an XPathNodeIterator representing an XPath node set. The following statement uses Select to create a node set representing all nodes that match the expression "//Guitar":

XPathNodeIterator iterator = nav.Select ("//Guitar");
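
Evaluate, by contrast, hands back a different type depending on the expression. The following sketch, which uses an assumed inline sample document and the hypothetical names EvaluateDemo, Evaluate, and Describe, shows one expression for each of the four possible result types.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class EvaluateDemo
{
    // Assumed sample data standing in for a Guitars document
    const string Xml =
        "<Guitars>" +
        "<Guitar><Make>Fender</Make><Model>Stratocaster</Model><Year>1990</Year></Guitar>" +
        "<Guitar><Make>Gibson</Make><Model>SG</Model><Year>1977</Year></Guitar>" +
        "</Guitars>";

    // Evaluates any XPath expression against the sample document
    public static object Evaluate (string expression)
    {
        XPathNavigator nav = new XPathDocument (new StringReader (Xml)).CreateNavigator ();
        return nav.Evaluate (expression);
    }

    // Describes the run-time type of a result returned by Evaluate
    public static string Describe (object result)
    {
        if (result is XPathNodeIterator)
            return "node set with " + ((XPathNodeIterator) result).Count + " node(s)";
        if (result is double)
            return "number " + result;
        if (result is bool)
            return "boolean " + result;
        return "string " + result;
    }

    static void Main ()
    {
        string[] expressions = {
            "count(//Guitar)",                  // number
            "string(/Guitars/Guitar[1]/Make)",  // string
            "boolean(//Guitar[Year > 1980])",   // boolean
            "//Guitar"                          // node set
        };
        foreach (string expression in expressions)
            Console.WriteLine ("{0,-34} {1}", expression, Describe (Evaluate (expression)));
    }
}
```

Because the return type is Object, callers normally test or cast the result as Describe does here. When an expression is known to be a location path, Select avoids the cast.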

XPathNodeIterator is a simple class that lets you iterate over the nodes returned in a node set. Its Count property tells you how many nodes were returned:

Console.WriteLine ("Select returned {0} nodes", iterator.Count);

XPathNodeIterator's MoveNext method lets you iterate over the node set a node at a time. As you iterate, XPathNodeIterator's Current property exposes an XPathNavigator object that represents the current node. The following code iterates over the node set, displaying the type, name, and value of each node:

while (iterator.MoveNext ()) {
    Console.WriteLine ("Type={0}, Name={1}, Value={2}",
        iterator.Current.NodeType, iterator.Current.Name,
        iterator.Current.Value);
}

The string returned by the XPathNavigator's Value property depends on the node's type and content. For example, if Current represents an attribute node or an element node that contains simple text (as opposed to other elements), then Value returns the attribute's value or the text value of the element. If, however, Current represents an element node that contains other elements, Value returns the text of the subelements concatenated together into one long string.
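
The three cases can be compared directly with a small sketch like the one below. It assumes a one-guitar inline document with no whitespace between elements, and ValueDemo and ValueOf are hypothetical names chosen for the example.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class ValueDemo
{
    // Assumed sample data; no whitespace between elements, so Value contains text only
    const string Xml =
        "<Guitars>" +
        "<Guitar Image='MyStrat.jpeg'><Make>Fender</Make><Model>Stratocaster</Model><Year>1990</Year></Guitar>" +
        "</Guitars>";

    // Returns the Value of the first node selected by path, or null if nothing matches
    public static string ValueOf (string path)
    {
        XPathNavigator nav = new XPathDocument (new StringReader (Xml)).CreateNavigator ();
        XPathNodeIterator it = nav.Select (path);
        return it.MoveNext () ? it.Current.Value : null;
    }

    static void Main ()
    {
        // Attribute node: the attribute's value
        Console.WriteLine (ValueOf ("/Guitars/Guitar/@Image"));

        // Element containing simple text: that text
        Console.WriteLine (ValueOf ("/Guitars/Guitar/Make"));

        // Element containing other elements: the subelements' text run together
        Console.WriteLine (ValueOf ("/Guitars/Guitar"));
    }
}
```

For this sample the three lines are MyStrat.jpeg, Fender, and FenderStratocaster1990, which is why code that needs individual fields usually selects the child elements rather than reading Value on the parent.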

Each node in the node set that Select returns can be a single node or the root of a tree of nodes. Traversing a tree of nodes encapsulated in an XPathNavigator is slightly different from traversing a tree of nodes in an XmlDocument. Here's how to perform a depth-first traversal of the node trees returned by XPathNavigator.Select:

while (iterator.MoveNext ())
    OutputNode (iterator.Current);
  .
  .
  .
void OutputNode (XPathNavigator nav)
{
    Console.WriteLine ("Type={0}, Name={1}, Value={2}",
        nav.NodeType, nav.Name, nav.Value);

    if (nav.HasAttributes) {
        nav.MoveToFirstAttribute ();
        do {
            OutputNode (nav);
        } while (nav.MoveToNextAttribute ());
        nav.MoveToParent ();
    }

    if (nav.HasChildren) {
        nav.MoveToFirstChild ();
        do {
            OutputNode (nav);
        } while (nav.MoveToNext ());
        nav.MoveToParent ();
    }
}

XPathNavigator features a family of Move methods that you can call to move in any direction (up, down, or sideways) in a tree of nodes. This sample uses five of them: MoveToFirstAttribute, MoveToNextAttribute, MoveToParent, MoveToFirstChild, and MoveToNext. Observe also that the XPathNavigator itself exposes the properties of the nodes that you iterate over, in much the same manner as XmlTextReader.

So how might you put this knowledge to work in a real application? Look again at Figure 13-5. The application listed there uses XmlDocument to extract content from an XML document. Content can also be extracted, often with less code, with XPath. To demonstrate, the application in Figure 13-10 is the functional equivalent of the one in Figure 13-5. Besides demonstrating the basic semantics of XPathNavigator usage, it shows that you can perform subqueries on node sets returned by XPath queries by calling Select on the XPathNavigator exposed through an iterator's Current property. XPathDemo first calls Select to create a node set representing all Guitar elements that are children of Guitars. Then it iterates through the node set, calling Select on each Guitar node to select the node's Make and Model child elements.

XPathDemo.cs

using System;
using System.Xml.XPath;

class MyApp
{
    static void Main ()
    {
        XPathDocument doc = new XPathDocument ("Guitars.xml");
        XPathNavigator nav = doc.CreateNavigator ();
        XPathNodeIterator iterator = nav.Select ("/Guitars/Guitar");

        while (iterator.MoveNext ()) {
            XPathNodeIterator it = iterator.Current.Select ("Make");
            it.MoveNext ();
            string make = it.Current.Value;

            it = iterator.Current.Select ("Model");
            it.MoveNext ();
            string model = it.Current.Value;

            Console.WriteLine ("{0} {1}", make, model);
        }
    }
}

Figure 13-10

Utility that uses XPath to extract XML content.

A Do-It-Yourself XPath Expression Evaluator

To help you get acquainted with XPath, the application pictured in Figure 13-11 is a working XPath expression analyzer that evaluates XPath expressions against XML documents and displays the results. Like Microsoft SQL Server's query analyzer, which lets you test SQL commands, the XPath expression analyzer (Expressalyzer for short) lets you experiment with XPath queries. To try it out, type a file name or URL into the Document box and click Load to point Expressalyzer to an XML document. Then type a location path into the Expression box and click the Execute button. The results appear in the tree view control in the lower half of the window.

Figure 13-11

Windows Forms XPath expression analyzer.

Expressalyzer's source code appears in Figure 13-12. Expressalyzer is a Windows Forms application whose main form is an instance of AnalyzerForm. Clicking the Load button activates the form's OnLoadDocument method, which wraps an XPathDocument around the data source and creates an XPathNavigator from it. Clicking the Execute button activates the OnExecuteExpression method, which executes the expression by calling Select on that XPathNavigator. If you need more real estate, resize the Expressalyzer window and the controls inside it will resize too. That little piece of magic results from the AnchorStyles assigned to the controls' Anchor properties. For a review of Windows Forms anchoring, refer to Chapter 4.

Expressalyzer.cs

using System;
using System.Drawing;
using System.Windows.Forms;
using System.Xml.XPath;

class AnalyzerForm : Form
{
    GroupBox DocumentGB;
    TextBox Source;
    Button LoadButton;
    GroupBox ExpressionGB;
    TextBox Expression;
    Button ExecuteButton;
    ImageList NodeImages;
    TreeView XmlView;
    XPathNavigator Navigator;

    public AnalyzerForm ()
    {
        // Initialize the form's properties
        Text = "XPath Expression Analyzer";
        ClientSize = new System.Drawing.Size (488, 422);

        // Instantiate the form's controls
        DocumentGB = new GroupBox ();
        Source = new TextBox ();
        LoadButton = new Button ();
        ExpressionGB = new GroupBox ();
        Expression = new TextBox ();
        ExecuteButton = new Button ();
        XmlView = new TreeView ();

        // Initialize the controls
        Source.Anchor =
            AnchorStyles.Top | AnchorStyles.Left | AnchorStyles.Right;
        Source.Location = new System.Drawing.Point (16, 24);
        Source.Size = new System.Drawing.Size (336, 24);
        Source.TabIndex = 0;
        Source.Name = "Source";

        LoadButton.Anchor = AnchorStyles.Top | AnchorStyles.Right;
        LoadButton.Location = new System.Drawing.Point (368, 24);
        LoadButton.Size = new System.Drawing.Size (72, 24);
        LoadButton.TabIndex = 1;
        LoadButton.Text = "Load";
        LoadButton.Click += new System.EventHandler (OnLoadDocument);

        DocumentGB.Anchor =
            AnchorStyles.Top | AnchorStyles.Left | AnchorStyles.Right;
        DocumentGB.Location = new Point (16, 16);
        DocumentGB.Size = new Size (456, 64);
        DocumentGB.Text = "Document";
        DocumentGB.Controls.Add (Source);
        DocumentGB.Controls.Add (LoadButton);

        Expression.Anchor =
            AnchorStyles.Top | AnchorStyles.Left | AnchorStyles.Right;
        Expression.Location = new System.Drawing.Point (16, 24);
        Expression.Size = new System.Drawing.Size (336, 24);
        Expression.TabIndex = 2;
        Expression.Name = "Expression";

        ExecuteButton.Anchor = AnchorStyles.Top | AnchorStyles.Right;
        ExecuteButton.Location = new System.Drawing.Point (368, 24);
        ExecuteButton.Size = new System.Drawing.Size (72, 24);
        ExecuteButton.TabIndex = 3;
        ExecuteButton.Text = "Execute";
        ExecuteButton.Enabled = false;
        ExecuteButton.Click +=
            new System.EventHandler (OnExecuteExpression);

        ExpressionGB.Anchor =
            AnchorStyles.Top | AnchorStyles.Left | AnchorStyles.Right;
        ExpressionGB.Location = new System.Drawing.Point (16, 96);
        ExpressionGB.Name = "ExpressionGB";
        ExpressionGB.Size = new System.Drawing.Size (456, 64);
        ExpressionGB.Text = "Expression";
        ExpressionGB.Controls.Add (Expression);
        ExpressionGB.Controls.Add (ExecuteButton);

        NodeImages = new ImageList ();
        NodeImages.ImageSize = new Size (12, 12);
        NodeImages.Images.AddStrip (new Bitmap (GetType (), "Buttons"));
        NodeImages.TransparentColor = Color.White;

        XmlView.Anchor = AnchorStyles.Top | AnchorStyles.Bottom |
            AnchorStyles.Left | AnchorStyles.Right;
        XmlView.Location = new System.Drawing.Point (16, 176);
        XmlView.Size = new System.Drawing.Size (456, 232);
        XmlView.ImageList = NodeImages;
        XmlView.TabIndex = 4;
        XmlView.Name = "XmlView";

        // Add the controls to the form
        Controls.Add (DocumentGB);
        Controls.Add (ExpressionGB);
        Controls.Add (XmlView);
    }

    void OnLoadDocument (object sender, EventArgs e)
    {
        try {
            XPathDocument doc = new XPathDocument (Source.Text);
            Navigator = doc.CreateNavigator ();
            ExecuteButton.Enabled = true;
        }
        catch (Exception ex) {
            MessageBox.Show (ex.Message);
        }
    }

    void OnExecuteExpression (object sender, EventArgs e)
    {
        try {
            XPathNodeIterator iterator =
                Navigator.Select (Expression.Text);
            XmlView.Nodes.Clear ();
            while (iterator.MoveNext ())
                AddNodeAndChildren (iterator.Current, null);
        }
        catch (Exception ex) {
            MessageBox.Show (ex.Message);
        }
    }

    void AddNodeAndChildren (XPathNavigator nav, TreeNode tnode)
    {
        TreeNode child = AddNode (nav, tnode);

        if (nav.HasAttributes) {
            nav.MoveToFirstAttribute ();
            do {
                AddAttribute (nav, child);
            } while (nav.MoveToNextAttribute ());
            nav.MoveToParent ();
        }

        if (nav.HasChildren) {
            nav.MoveToFirstChild ();
            do {
                AddNodeAndChildren (nav, child);
            } while (nav.MoveToNext ());
            nav.MoveToParent ();
        }
    }

    TreeNode AddNode (XPathNavigator nav, TreeNode tnode)
    {
        string text = null;
        TreeNode child = null;

        TreeNodeCollection tnodes = (tnode == null) ?
            XmlView.Nodes : tnode.Nodes;

        switch (nav.NodeType) {

        case XPathNodeType.Root:
        case XPathNodeType.Element:
            tnodes.Add (child = new TreeNode (nav.Name, 0, 0));
            break;

        case XPathNodeType.Attribute:
            text = String.Format ("{0}={1}", nav.Name, nav.Value);
            tnodes.Add (child = new TreeNode (text, 1, 1));
            break;

        case XPathNodeType.Text:
            text = nav.Value;
            if (text.Length > 128)
                text = text.Substring (0, 128) + "...";
            tnodes.Add (child = new TreeNode (text, 2, 2));
            break;

        case XPathNodeType.Comment:
            text = String.Format ("<!--{0}-->", nav.Value);
            tnodes.Add (child = new TreeNode (text, 4, 4));
            break;

        case XPathNodeType.ProcessingInstruction:
            text = String.Format ("<?{0} {1}?>", nav.Name, nav.Value);
            tnodes.Add (child = new TreeNode (text, 5, 5));
            break;
        }
        return child;
    }

    void AddAttribute (XPathNavigator nav, TreeNode tnode)
    {
        string text = String.Format ("{0}={1}", nav.Name, nav.Value);
        tnode.Nodes.Add (new TreeNode (text, 1, 1));
    }

    static void Main ()
    {
        Application.Run (new AnalyzerForm ());
    }
}

Figure 13-12

Source code for an XPath expression analyzer.



Programming Microsoft  .NET
Year: 2002