Tools for XPath


Using the right tool for the job is just as important for programmers as it is for construction workers. The right tool will help you write XPath expressions quickly and avoid errors. A number of tools let you experiment with XPath by writing an expression and quickly seeing what the result will be. This type of tool is particularly appropriate for XPath because it has these characteristics:

  • XPath expressions are often short. You can type them quickly. Sometimes a text field is all you need to enter an expression.

  • XPath expressions are not destructive. All you can do with XPath is extract information from a document, so even your worst expression won't end up reformatting your hard drive.

The tools you learn about in this section let you quickly see what an XPath expression returns when evaluated on a certain document. You can determine which tools suit your environment best.

Online XPath Sandbox

The XPath sandbox is an online tool that doesn't require any installation. All you need is a web browser. You can access it by going to the following URL: http://www.orbeon.com/ops/goto-example/xpath

Figure 9-1 shows the relevant part of the page you will see in your browser.

Figure 9-1

You can use this page as follows:

  • Input area: Use this area to modify the document on which your XPath expression is evaluated.

  • XPath area: Type the XPath expression here.

  • Output area: The result of evaluating your expression on the document is shown here.

The service performs the XPath evaluation as you type, so you don't need to worry about having to click a submit button. If your expression is invalid, an error will be displayed at the top of the page. This service uses the XPath engine in Saxon, which is known as one of the most compliant implementations of XPath.
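The same input, expression, output workflow can be sketched locally in a few lines of Python. This is only an illustration, not how the sandbox itself works: it uses the standard library's xml.etree.ElementTree, which supports a limited subset of XPath rather than the full language that Saxon implements, and the document, the expression, and the evaluate helper are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document (plays the role of the "Input area").
DOCUMENT = """
<library>
  <book year="2004"><title>First Book</title></book>
  <book year="1999"><title>Second Book</title></book>
</library>
"""

def evaluate(document, expression):
    """Evaluate an expression on a document and return each match as XML text."""
    root = ET.fromstring(document)
    return [
        ET.tostring(node, encoding="unicode").strip()
        for node in root.findall(expression)
    ]

# Hypothetical expression (the "XPath area"): titles of books dated 2004.
results = evaluate(DOCUMENT, "./book[@year='2004']/title")

# The "Output area": one line per matched node.
for line in results:
    print(line)
```

Because evaluation only reads the document, you can change the expression and rerun it as often as you like.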

XPath in Your Browser

XPath is often used to extract information from web pages. More often than not, HTML is not well-formed XML. However, you can transform HTML into XML automatically with tools like HTML Tidy (http://www.tidy.sourceforge.net). This tool has a derived C library called TidyLib; bindings for PHP, Perl, Python, and other languages; and ports, such as JTidy (http://www.sourceforge.net/projects/jtidy), which is written in Java.

Writing an XPath expression that extracts the piece of information you are interested in from a web page is often an iterative process. For example, say you want to extract the current value of the Dow from the Google Finance page shown in Figure 9-2.

Figure 9-2

You can use the XPath Checker add-on for Firefox to extract what you want from this page. Just install the add-on from https://www.addons.mozilla.org/firefox/1095/, restart Firefox, go to the Google Finance page, and choose View XPath from the contextual menu. After looking at the source of the page, you might notice that the values for the Dow are located in the row of a table with id mkt0. Enter the expression //tr[@id='mkt0'] in the XPath Checker window, and make sure that it extracts the expected row, as shown in Figure 9-3.

Figure 9-3

The current value for the Dow is in the second column, which you can extract by entering the expression //tr[@id='mkt0']/td[2].
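The two refinement steps can also be reproduced outside the browser. The sketch below is a minimal illustration using Python's xml.etree.ElementTree, which handles this kind of attribute and position predicate. It assumes the page has already been converted to well-formed XML; the PAGE fragment and the figures in it are invented for the example and are not the real Google Finance markup. ElementTree paths are relative to the element you search from, so the expressions start with .// instead of //.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already well-formed fragment standing in for the tidied page.
# The row ids mirror the text; the numbers are made up.
PAGE = """
<html><body><table>
  <tr id="mkt0"><td>Dow</td><td>12,345.67</td><td>+12.34</td></tr>
  <tr id="mkt1"><td>Nasdaq</td><td>2,345.67</td><td>-1.23</td></tr>
</table></body></html>
"""

root = ET.fromstring(PAGE)

# Step 1: check that the expression selects the expected row.
rows = root.findall(".//tr[@id='mkt0']")
row_cells = [td.text for td in rows[0]]
print(row_cells)

# Step 2: narrow the expression to the second column of that row.
value = root.find(".//tr[@id='mkt0']/td[2]").text
print(value)
```

Checking the whole row first, then adding /td[2], mirrors the iterative way you would build the expression in XPath Checker.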

XML Editors

Most XML editors like XML Spy (http://www.altova.com/products/xmlspy/xml_editor.html) and Stylus Studio (http://www.stylusstudio.com/) provide a way for you to evaluate an XPath expression on a document you have opened in the editor. Figure 9-4 shows the interface provided by XML Spy.

Figure 9-4

Eclipse and IntelliJ

If you already have a Java IDE such as Eclipse (http://www.eclipse.org/) or IntelliJ (http://www.jetbrains.com/idea/), you might want to use the XML editor provided by that IDE instead of installing a specialized XML IDE. Eclipse and IntelliJ don't provide a tool to evaluate XPath expressions out of the box, but you can download a third-party plug-in as follows:

  • You can install the XPathView plug-in for IntelliJ from File > Settings > Plugins. XPathView can evaluate an expression on a document as shown in Figure 9-5, or you can use it to search for information across a number of files.

    Figure 9-5

  • The Eclipse-XPath-plugin is a similar plug-in for Eclipse. You can find it at http://www.sourceforge.net/projects/eclipse-xpath.
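
If you want to search a number of files with one expression without an IDE plug-in, the idea can be approximated in a short script. The following is a rough sketch only, not how XPathView works: the search_files helper, the file names, and the documents are hypothetical, and ElementTree supports only a subset of XPath.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def search_files(directory, expression):
    """Return (file name, matched text) pairs for every match in every .xml file."""
    hits = []
    for path in sorted(Path(directory).glob("*.xml")):
        root = ET.parse(path).getroot()
        for node in root.findall(expression):
            hits.append((path.name, node.text))
    return hits

with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical configuration files created just for this example.
    Path(tmp, "a.xml").write_text("<config><server><port>8080</port></server></config>")
    Path(tmp, "b.xml").write_text("<config><server><port>9090</port></server></config>")
    hits = search_files(tmp, "./server/port")

print(hits)
```

Each hit records the file it came from, which is what makes a multi-file search useful when the same element appears in many documents.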




Professional XML
Professional XML (Programmer to Programmer)
ISBN: 0471777773
EAN: 2147483647
Year: 2004
Pages: 215
