You want to find or address sections of an XML document in a standard, programming-language-independent way.
The XPath language defines a way of referring to almost any element or set of elements in an XML document, and the REXML library comes with a complete XPath implementation. REXML::XPath provides three class methods for locating Element objects within parsed documents: first, each, and match.
Take as an example the following XML description of an aquarium. The aquarium contains some fish and a gaudy castle decoration full of algae. Due to an aquarium stocking mishap, some of the smaller fish have been eaten by larger fish, just like in those cartoon food chain diagrams. (Figure 11-1 shows the aquarium.)
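    # Assumed markup, consistent with the description above:
    # nested fish plus a castle decoration holding the algae.
    xml = %{
    <aquarium>
     <fish color="blue" size="small" />
     <fish color="orange" size="large">
      <fish color="green" size="small">
       <fish color="red" size="tiny" />
      </fish>
     </fish>
     <decoration type="castle" style="gaudy">
      <algae color="green" />
     </decoration>
    </aquarium>}

    require 'rexml/document'
    doc = REXML::Document.new xml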
We can use REXML::XPath.first to get the Element object corresponding to the first fish in the document:

    REXML::XPath.first(doc, '//fish')
    # => <fish color='blue' size='small'/>
We can use match to get an array containing all the elements that are green:
    REXML::XPath.match(doc, '//*[@color="green"]')
    # => [<fish color='green' size='small'> ... </>, <algae color='green'/>]
We can use each with a code block to iterate over all the fish that are inside other fish:
    def describe(fish)
      "#{fish.attribute('size')} #{fish.attribute('color')} fish"
    end

    REXML::XPath.each(doc, '//fish/fish') do |fish|
      puts "The #{describe(fish.parent)} has eaten the #{describe(fish)}."
    end
    # The large orange fish has eaten the small green fish.
    # The small green fish has eaten the tiny red fish.
Every element in a Document has an xpath method that returns the canonical XPath path to that element. This path can be considered the element's "address" within the document. In this example, a complex bit of Ruby code is replaced by a simple XPath expression:
    red_fish = doc.children[0].children[3].children[1].children[1]
    # => <fish color='red' size='tiny'/>

    red_fish.xpath
    # => "/aquarium/fish[2]/fish/fish"

    REXML::XPath.first(doc, red_fish.xpath)
    # => <fish color='red' size='tiny'/>
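The "address" idea can be checked for every element, not just the red fish. The following is a minimal sketch, not part of the original recipe: the XML is a stand-in aquarium assumed for illustration, and `round_trips?` is a hypothetical helper name.

```ruby
require 'rexml/document'

# Stand-in aquarium document, assumed for illustration.
AQUARIUM_XML = <<~XML
  <aquarium>
    <fish color="blue" size="small" />
    <fish color="orange" size="large">
      <fish color="green" size="small">
        <fish color="red" size="tiny" />
      </fish>
    </fish>
    <decoration type="castle" style="gaudy">
      <algae color="green" />
    </decoration>
  </aquarium>
XML

# Hypothetical helper: true if looking up the element's own xpath
# returns that very same element object.
def round_trips?(doc, element)
  REXML::XPath.first(doc, element.xpath).equal?(element)
end

doc = REXML::Document.new(AQUARIUM_XML)

# Print each element's address next to the result of the round trip.
REXML::XPath.each(doc, '//*') do |element|
  puts "#{element.xpath} -> #{round_trips?(doc, element)}"
end
```

Where siblings share a name, the path carries a positional index (as in fish[2]), which is what lets the lookup land on one specific element.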
Even a brief overview of XPath is beyond the scope of this recipe, but here are some more examples to give you ideas:
    # Find the second green element.
    REXML::XPath.match(doc, '//*[@color="green"]')[1]
    # => <algae color='green'/>

    # Find the color attributes of all small fish.
    REXML::XPath.match(doc, '//fish[@size="small"]/@color')
    # => [color='blue', color='green']

    # Count how many fish are inside the first large fish.
    REXML::XPath.first(doc, "count(//fish[@size='large'][1]//fish)")
    # => 2
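An attribute query like the one above returns Attribute objects rather than plain strings. Here is a small sketch of extracting the string values with Attribute#value. The XML is a stand-in aquarium assumed for illustration, and `small_fish_colors` is a hypothetical helper name.

```ruby
require 'rexml/document'

# Stand-in aquarium document, assumed for illustration.
AQUARIUM_XML = <<~XML
  <aquarium>
    <fish color="blue" size="small" />
    <fish color="orange" size="large">
      <fish color="green" size="small">
        <fish color="red" size="tiny" />
      </fish>
    </fish>
    <decoration type="castle" style="gaudy">
      <algae color="green" />
    </decoration>
  </aquarium>
XML

# Hypothetical helper: the colors of all small fish, as plain strings.
# XPath.match yields REXML::Attribute objects for an "/@color" query;
# Attribute#value converts each one to its string value.
def small_fish_colors(doc)
  REXML::XPath.match(doc, '//fish[@size="small"]/@color').map(&:value)
end

doc = REXML::Document.new(AQUARIUM_XML)
puts small_fish_colors(doc).join(', ')
```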
The Elements class acts kind of like an array that supports XPath addressing. You can make your code more concise by passing an XPath expression to Elements#each, or using it as an array index.
    doc.elements.each('//fish') { |f| puts f.attribute('color') }
    # blue
    # orange
    # green
    # red

    doc.elements['//fish']
    # => <fish color='blue' size='small'/>
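The path handed to Elements#each is evaluated relative to the element that owns the Elements object, so a relative path such as fish behaves differently from //fish. The sketch below illustrates the difference. The XML is a stand-in aquarium assumed for illustration, and `fish_colors` is a hypothetical helper name.

```ruby
require 'rexml/document'

# Stand-in aquarium document, assumed for illustration.
AQUARIUM_XML = <<~XML
  <aquarium>
    <fish color="blue" size="small" />
    <fish color="orange" size="large">
      <fish color="green" size="small">
        <fish color="red" size="tiny" />
      </fish>
    </fish>
    <decoration type="castle" style="gaudy">
      <algae color="green" />
    </decoration>
  </aquarium>
XML

# Hypothetical helper: collect the color of every element that the
# given Elements object finds for the given XPath expression.
def fish_colors(elements, path)
  colors = []
  elements.each(path) { |fish| colors << fish.attributes['color'] }
  colors
end

doc = REXML::Document.new(AQUARIUM_XML)

# '//fish' searches the whole document.
puts fish_colors(doc.elements, '//fish').inspect

# 'fish' only looks at direct children of the root element.
puts fish_colors(doc.root.elements, 'fish').inspect
```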
Within an XPath expression, the first element in a list has an index of 1, not 0. The XPath expression //fish[@size='large'][1] matches the first large fish, not the second large fish, the way large_fish[1] would in Ruby code. Pass a number as an array index to an Elements object, and you get the same behavior as XPath:
    doc.elements[1]
    # => <aquarium> ... </>
    doc.children[0]
    # => <aquarium> ... </>
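To make the off-by-one difference concrete, this sketch fetches the first child of the root element three ways: by Elements index, by XPath position, and by ordinary Ruby array index. The XML is a stand-in aquarium assumed for illustration.

```ruby
require 'rexml/document'

# Stand-in aquarium document, assumed for illustration.
AQUARIUM_XML = <<~XML
  <aquarium>
    <fish color="blue" size="small" />
    <fish color="orange" size="large">
      <fish color="green" size="small">
        <fish color="red" size="tiny" />
      </fish>
    </fish>
    <decoration type="castle" style="gaudy">
      <algae color="green" />
    </decoration>
  </aquarium>
XML

doc = REXML::Document.new(AQUARIUM_XML)
tank = doc.root

# Elements#[] with an Integer counts from 1, like an XPath position.
first_by_index = tank.elements[1]
first_by_xpath = REXML::XPath.first(tank, 'fish[1]')

# A plain Ruby array built from the same child elements counts from 0.
first_by_array = tank.elements.to_a[0]

puts first_by_index.equal?(first_by_xpath)
puts first_by_index.equal?(first_by_array)
```

Elements indexing counts only element children, so the whitespace text nodes that sit between tags (and that do appear in children) never shift the numbering.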