Navigating a Document with XPath

Problem

You want to find or address sections of an XML document in a standard, programming-language-independent way.

Solution

The XPath language defines a way of referring to almost any element or set of elements in an XML document, and the REXML library comes with a complete XPath implementation. REXML::XPath provides three class methods for locating Element objects within parsed documents: first, each, and match.

Take as an example the following XML description of an aquarium. The aquarium contains some fish and a gaudy castle decoration full of algae. Due to an aquarium stocking mishap, some of the smaller fish have been eaten by larger fish, just like in those cartoon food chain diagrams. (Figure 11-1 shows the aquarium.)

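	xml = %{
	<aquarium>
	 <fish color="blue" size="small" />
	 <fish color="orange" size="large">
	  <fish color="green" size="small">
	   <fish color="red" size="tiny" />
	  </fish>
	 </fish>

	 <decoration type="castle" style="gaudy">
	  <algae color="green" />
	 </decoration>
	</aquarium>}

	require 'rexml/document'
	doc = REXML::Document.new xml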

Figure 11-1. The aquarium

We can use REXML::XPath.first to get the Element object corresponding to the first fish tag in the document:

	REXML::XPath.first(doc, '//fish')
	# => <fish color='blue' size='small'/>

We can use match to get an array containing all the elements that are green:

	REXML::XPath.match(doc, '//*[@color="green"]')
	# => [<fish color='green' size='small'> ... </>, <algae color='green'/>]

We can use each with a code block to iterate over all the fish that are inside other fish:

	def describe(fish)
	 "#{fish.attribute('size')} #{fish.attribute('color')} fish"
	end
	REXML::XPath.each(doc, '//fish/fish') do |fish|
	 puts "The #{describe(fish.parent)} has eaten the #{describe(fish)}."
	end
	# The large orange fish has eaten the small green fish.
	# The small green fish has eaten the tiny red fish.
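The three methods differ only in how they hand back the nodes an expression selects. The following sketch re-creates a copy of the aquarium document so it can run on its own, then checks that first returns the same node as the head of match's array, and that each yields exactly the nodes match returns. The //shark query is a made-up example of an expression that matches nothing.

```ruby
require 'rexml/document'

# A copy of the aquarium document from the Solution.
xml = %{<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
doc = REXML::Document.new(xml)

path = '//fish'

# match: every selected node, collected into an Array.
all_fish = REXML::XPath.match(doc, path)

# first: just the first selected node.
first_fish = REXML::XPath.first(doc, path)

# each: hands the selected nodes to a block one at a time.
yielded = []
REXML::XPath.each(doc, path) { |fish| yielded << fish }

puts all_fish.map { |f| f.attributes['color'] }.join(', ')
puts first_fish.equal?(all_fish.first)  # same Element object, not a copy
puts yielded == all_fish

# When nothing matches, first returns nil and match returns an empty Array.
p REXML::XPath.first(doc, '//shark')
p REXML::XPath.match(doc, '//shark')
```

In practice, use first when you expect a single result, match when you need the whole set at once, and each when you only want to process the results in turn.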

Discussion

Every element in a Document has an xpath method that returns the canonical XPath path to that element. This path can be considered the element's "address" within the document. In this example, a complex bit of Ruby code is replaced by a simple XPath expression:

	red_fish = doc.children[0].children[3].children[1].children[1]
	# => <fish color='red' size='tiny'/>

	red_fish.xpath
	# => "/aquarium/fish[2]/fish/fish"

	REXML::XPath.first(doc, red_fish.xpath)
	# => <fish color='red' size='tiny'/>
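Because xpath produces an absolute path, feeding it back into REXML::XPath.first should lead to the element you started from. This sketch, again working on a copy of the aquarium document, checks that round trip for every element; the expression //* selects all elements in the document.

```ruby
require 'rexml/document'

# A copy of the aquarium document from the Solution.
xml = %{<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
doc = REXML::Document.new(xml)

round_trips = REXML::XPath.match(doc, '//*').map do |element|
  path = element.xpath
  # Look the element up again by its canonical path.
  found = REXML::XPath.first(doc, path)
  puts "#{path} -> <#{found.name}>"
  [element, found]
end

# Every lookup should return the original Element object itself.
puts round_trips.all? { |element, found| found.equal?(element) }
```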

Even a brief overview of XPath is beyond the scope of this recipe, but here are some more examples to give you ideas:

	# Find the second green element.
	REXML::XPath.match(doc, '//*[@color="green"]')[1]
	# => <algae color='green'/>

	# Find the color attributes of all small fish.
	REXML::XPath.match(doc, '//fish[@size="small"]/@color')
	# => [color='blue', color='green']

	# Count how many fish are inside the first large fish.
	REXML::XPath.first(doc, "count(//fish[@size='large'][1]//fish)")
	# => 2
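Predicates can also test whether an element has a particular child, be negated with not(), and axes such as ancestor:: let you walk up the tree instead of down. A few more queries in that spirit, sketched against a copy of the aquarium document (the colors lambda is just a display helper for this example):

```ruby
require 'rexml/document'

# A copy of the aquarium document from the Solution.
xml = %{<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
doc = REXML::Document.new(xml)

# Display helper: the color attribute of each element.
colors = ->(elements) { elements.map { |e| e.attributes['color'] } }

# Fish that have at least one fish inside them: the predators.
predators = REXML::XPath.match(doc, '//fish[fish]')

# Fish with no fish inside them.
empty_fish = REXML::XPath.match(doc, '//fish[not(fish)]')

# How many fish enclose the red fish? Walk up with the ancestor axis.
depth = REXML::XPath.first(doc, "count(//fish[@color='red']/ancestor::fish)")

puts "Predators: #{colors.call(predators).join(', ')}"
puts "Empty fish: #{colors.call(empty_fish).join(', ')}"
puts "The red fish is inside #{depth} fish."
```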

The Elements class acts somewhat like an array that supports XPath addressing. You can make your code more concise by passing an XPath expression to Elements#each, or by using it as an array index.

	doc.elements.each('//fish') { |f| puts f.attribute('color') }
	# blue
	# orange
	# green
	# red

	doc.elements['//fish']
	# => <fish color='blue' size='small'/>
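The examples above use //fish, which searches the whole document. On an individual element's elements, a relative XPath is evaluated from that element, which makes it easy to restrict a search to one part of the tree. A sketch using Elements#to_a, which also accepts an XPath expression, on a copy of the aquarium document:

```ruby
require 'rexml/document'

# A copy of the aquarium document from the Solution.
xml = %{<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
doc = REXML::Document.new(xml)
aquarium = doc.root

# 'fish' is relative to <aquarium>: only its direct fish children.
direct = aquarium.elements.to_a('fish')

# './/fish' reaches fish at any depth below <aquarium>.
nested = aquarium.elements.to_a('.//fish')

# A String index returns the first matching element.
algae = aquarium.elements['decoration/algae']

puts direct.size
puts nested.size
puts algae.attributes['color']
```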

Within an XPath expression, the first element in a list has an index of 1, not 0. The XPath expression //fish[@size='large'][1] matches the first large fish, not the second large fish the way large_fish[1] would in Ruby code. Pass a number as an array index to an Elements object, and you get the same behavior as XPath:

	doc.elements[1]
	# => <aquarium> … 
	doc.children[0]
	# => <aquarium> …
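To make the off-by-one difference concrete, this sketch fetches the same fish (the orange one) three ways from a copy of the aquarium document: with a Ruby array index, which counts from 0; with an integer Elements index, which counts child elements from 1; and with an XPath positional predicate, which also counts from 1.

```ruby
require 'rexml/document'

# A copy of the aquarium document from the Solution.
xml = %{<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
doc = REXML::Document.new(xml)
aquarium = doc.root

# A plain Ruby Array of <aquarium>'s fish children: indexes start at 0.
fish = aquarium.elements.to_a('fish')
second_by_array = fish[1]

# Elements#[] with an Integer counts child elements from 1.
second_by_index = aquarium.elements[2]

# An XPath positional predicate also counts from 1.
second_by_xpath = aquarium.elements['fish[2]']

[second_by_array, second_by_index, second_by_xpath].each do |f|
  puts f.attributes['color']
end
```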
