Navigating a Document with XPath

Problem

You want to find or address sections of an XML document in a standard, programming-language-independent way.

Solution

The XPath language defines a way of referring to almost any element or set of elements in an XML document, and the REXML library comes with a complete XPath implementation. REXML::XPath provides three class methods for locating Element objects within parsed documents: first, each, and match.

Take as an example the following XML description of an aquarium. The aquarium contains some fish and a gaudy castle decoration full of algae. Due to an aquarium stocking mishap, some of the smaller fish have been eaten by larger fish, just like in those cartoon food chain diagrams. (Figure 11-1 shows the aquarium.)

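	xml = %{
	<aquarium>
	 <fish color="blue" size="small" />
	 <fish color="orange" size="large">
	  <fish color="green" size="small">
	   <fish color="red" size="tiny" />
	  </fish>
	 </fish>

	 <decoration type="castle" style="gaudy">
	  <algae color="green" />
	 </decoration>
	</aquarium>}

	require 'rexml/document'
	doc = REXML::Document.new xml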

Figure 11-1. The aquarium


We can use REXML::XPath.first to get the Element object corresponding to the first <fish> tag in the document:

	REXML::XPath.first(doc, '//fish')
	# => <fish color='blue' size='small'/>

We can use match to get an array containing all the elements that are green:

	REXML::XPath.match(doc, '//*[@color="green"]')
	# => [<fish color='green' size='small'> … </>, <algae color='green'/>]

We can use each with a code block to iterate over all the fish that are inside other fish:

	def describe(fish)
	 "#{fish.attribute('size')} #{fish.attribute('color')} fish"
	end
	REXML::XPath.each(doc, '//fish/fish') do |fish|
	 puts "The #{describe(fish.parent)} has eaten the #{describe(fish)}."
	end
	# The large orange fish has eaten the small green fish.
	# The small green fish has eaten the tiny red fish.
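The examples above only show expressions that find something. A short sketch of what the three methods do when nothing matches: first returns nil, match returns an empty array, and each never calls its block. It rebuilds the aquarium document on one line, since the whitespace between tags does not affect these queries; the shark tag is a hypothetical name chosen because it does not occur in the document.

```ruby
require 'rexml/document'

# The aquarium document, compacted onto one line.
xml = '<aquarium><fish color="blue" size="small"/>' \
      '<fish color="orange" size="large"><fish color="green" size="small">' \
      '<fish color="red" size="tiny"/></fish></fish>' \
      '<decoration type="castle" style="gaudy"><algae color="green"/></decoration>' \
      '</aquarium>'
doc = REXML::Document.new(xml)

# "shark" is a hypothetical tag name that never occurs in the document.
missing_first = REXML::XPath.first(doc, '//shark')   # => nil
missing_all   = REXML::XPath.match(doc, '//shark')   # => []

visited = 0
REXML::XPath.each(doc, '//shark') { visited += 1 }   # the block never runs

# For comparison, the same methods on expressions that do match.
tiny = REXML::XPath.first(doc, '//fish[@size="tiny"]')
puts tiny.attributes['color']                 # red
puts REXML::XPath.match(doc, '//fish').size   # 4
```

Checking for nil after first is the usual guard before calling methods on the result.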

Discussion

Every element in a Document has an xpath method that returns the canonical XPath path to that element. This path can be considered the element's "address" within the document. In this example, a complex bit of Ruby code is replaced by a simple XPath expression:

	red_fish = doc.children[0].children[3].children[1].children[1]
	# => <fish color='red' size='tiny'/>

	red_fish.xpath
	# => "/aquarium/fish[2]/fish/fish"

	REXML::XPath.first(doc, red_fish.xpath)
	# => <fish color='red' size='tiny'/>
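Because the canonical path is an address, looking it up should give back the element you started from. This sketch checks that for every element in the aquarium. It relies on equal? to test object identity, and assumes that XPath.first returns the same Element objects that live in the tree, not copies. The document is compacted onto one line here, which does not change any canonical path because only element siblings are counted.

```ruby
require 'rexml/document'

# The aquarium document, compacted onto one line.
xml = '<aquarium><fish color="blue" size="small"/>' \
      '<fish color="orange" size="large"><fish color="green" size="small">' \
      '<fish color="red" size="tiny"/></fish></fish>' \
      '<decoration type="castle" style="gaudy"><algae color="green"/></decoration>' \
      '</aquarium>'
doc = REXML::Document.new(xml)

# For each element, resolve its canonical path and compare object identity.
round_trips = REXML::XPath.match(doc, '//*').all? do |element|
  REXML::XPath.first(doc, element.xpath).equal?(element)
end
puts round_trips      # true

red_fish = REXML::XPath.first(doc, '//fish[@color="red"]')
puts red_fish.xpath   # /aquarium/fish[2]/fish/fish
```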

Even a brief overview of XPath is beyond the scope of this recipe, but here are some more examples to give you ideas:

	# Find the second green element.
	REXML::XPath.match(doc, '//*[@color="green"]')[1]
	# => <algae color='green'/>

	# Find the color attributes of all small fish.
	REXML::XPath.match(doc, '//fish[@size="small"]/@color')
	# => [color='blue', color='green']

	# Count how many fish are inside the first large fish.
	REXML::XPath.first(doc, "count(//fish[@size='large'][1]//fish)")
	# => 2
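A few more expressions in the same spirit, sketched against the aquarium document. They use a path inside a predicate (fish[fish] means "a fish that contains a fish"), the not() function, the .. parent step, and and to combine attribute tests. All of these are standard XPath 1.0 constructs. The sketch assumes REXML evaluates them as the specification describes and returns matches in document order. The colors helper is hypothetical and exists only to make the results easy to read.

```ruby
require 'rexml/document'

# The aquarium document, compacted onto one line.
xml = '<aquarium><fish color="blue" size="small"/>' \
      '<fish color="orange" size="large"><fish color="green" size="small">' \
      '<fish color="red" size="tiny"/></fish></fish>' \
      '<decoration type="castle" style="gaudy"><algae color="green"/></decoration>' \
      '</aquarium>'
doc = REXML::Document.new(xml)

# Hypothetical helper: the color attribute of each element, in order.
def colors(elements)
  elements.map { |e| e.attributes['color'] }
end

# Fish that have eaten another fish (they have a fish child).
predators = colors(REXML::XPath.match(doc, '//fish[fish]'))
# Fish with nothing inside them.
empty_fish = colors(REXML::XPath.match(doc, '//fish[not(fish)]'))
# The element that contains the algae, reached through the parent step.
algae_home = REXML::XPath.first(doc, '//algae/..').name
# Two attribute tests combined with "and".
small_green = colors(REXML::XPath.match(doc, '//fish[@color="green" and @size="small"]'))

p predators     # ["orange", "green"]
p empty_fish    # ["blue", "red"]
p algae_home    # "decoration"
p small_green   # ["green"]
```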

The Elements class acts kind of like an array that supports XPath addressing. You can make your code more concise by passing an XPath expression to Elements#each, or using it as an array index.

	doc.elements.each('//fish') { |f| puts f.attribute('color') }
	# blue
	# orange
	# green
	# red

	doc.elements['//fish']
	# => <fish color='blue' size='small'/>
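The Elements class has a few other entry points that take XPath expressions. This sketch shows three of them:

- Elements#to_a returns the matches as a plain Ruby Array.
- A string index on an element's elements is evaluated relative to that element, not the document.
- Element#get_elements is a shorthand that also returns an Array.

The relative path fish[2]/fish is just an illustrative choice.

```ruby
require 'rexml/document'

# The aquarium document, compacted onto one line.
xml = '<aquarium><fish color="blue" size="small"/>' \
      '<fish color="orange" size="large"><fish color="green" size="small">' \
      '<fish color="red" size="tiny"/></fish></fish>' \
      '<decoration type="castle" style="gaudy"><algae color="green"/></decoration>' \
      '</aquarium>'
doc = REXML::Document.new(xml)

# Elements#to_a with an XPath returns an ordinary Array of the matches.
all_fish = doc.elements.to_a('//fish')
puts all_fish.size                 # 4

# A relative path, evaluated from <aquarium>: the fish inside its second fish.
green = doc.root.elements['fish[2]/fish']
puts green.attributes['color']     # green

# Element#get_elements returns an Array for a path relative to that element.
inner = green.get_elements('fish')
p inner.map { |f| f.attributes['color'] }   # ["red"]
```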

Within an XPath expression, the first element in a list has an index of 1, not 0. The XPath expression //fish[@size='large'][1] matches the first large fish, not the second large fish (which is what large_fish[1] would give you in Ruby code). Pass a number as an array index to an Elements object, and you get the same behavior as XPath:

	doc.elements[1]
	# => <aquarium> … 
	doc.children[0]
	# => <aquarium> … 
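The difference between XPath positions and Ruby array indices is easiest to see side by side. This sketch collects the small fish into a Ruby array and compares index 1 there with position 1 in an XPath predicate. It also shows that an Integer index into Elements counts element children starting from 1.

```ruby
require 'rexml/document'

# The aquarium document, compacted onto one line.
xml = '<aquarium><fish color="blue" size="small"/>' \
      '<fish color="orange" size="large"><fish color="green" size="small">' \
      '<fish color="red" size="tiny"/></fish></fish>' \
      '<decoration type="castle" style="gaudy"><algae color="green"/></decoration>' \
      '</aquarium>'
doc = REXML::Document.new(xml)

# A plain Ruby array of the small fish, in document order.
small_fish = REXML::XPath.match(doc, '//fish[@size="small"]')

# Ruby arrays count from 0...
puts small_fish[0].attributes['color']   # blue
puts small_fish[1].attributes['color']   # green

# ...while XPath positions count from 1.
first_small = REXML::XPath.first(doc, '//fish[@size="small"][1]')
puts first_small.attributes['color']     # blue

# An Integer index into Elements also counts from 1, and counts only elements.
puts doc.root.elements[1].attributes['color']   # blue
puts doc.root.elements[2].attributes['color']   # orange
```

Standard XPath 1.0 has one subtlety here. A position predicate on a // step is applied separately within each parent. So //fish[@size="small"][1] selects every small fish that is the first small fish child of its parent, which in this document means both the blue and the green fish. XPath.first hides this by returning only the earliest match in document order.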

See Also

  • The XPath standard, at http://www.w3.org/TR/xpath, has more XPath examples
  • XPath and XPointer by John E. Simpson (O'Reilly)


Ruby Cookbook (O'Reilly)
ISBN: 0596523696