Credit: Rod Gaither
You want to parse an XML file into a Ruby data structure, to traverse it or extract data from it.
Pass an XML document into the REXML::Document constructor to load and parse the XML. A Document object contains a tree of subobjects (of class Element and Text) representing the tree structure of the underlying document. The methods of Document and Element give you access to the XML tree data. The most useful of these methods is #each_element.
Here's some sample XML and the load process. The document describes a set of orders, each of which contains a set of items. This particular document contains a single order for two items.
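orders_xml = %{
<orders>
  <order>
    <number>105</number>
    <date>02/10/2006</date>
    <customer>Corner Store</customer>
    <items>
      <item desc="Red Roses" />
      <item desc="Candy Hearts" />
    </items>
  </order>
</orders>}

require 'rexml/document'
orders = REXML::Document.new(orders_xml)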
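To process each order in this document, we can use Document#root to get the document's root element, then call Element#each_element to iterate over the root's children (the order elements). The code below calls each_element repeatedly to move down the document tree and print the details of each order: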
orders.root.each_element do |order|    # each <order> under the root element
  order.each_element do |node|         # <number>, <date>, <customer>, etc. in the <order>
    if node.has_elements?
      node.each_element do |child|     # each <item> inside a container element
        puts "#{child.name}: #{child.attributes['desc']}"
      end
    else
      # the contents of <number>, <date>, etc.
      puts "#{node.name}: #{node.text}"
    end
  end
end
# number: 105
# date: 02/10/2006
# customer: Corner Store
# item: Red Roses
# item: Candy Hearts
Parsing an XML file into a Document gives you a tree-like data structure that you can treat kind of like an array of arrays. Starting at the document root, you can move down the tree until you find the data that interests you. In the example above, note how the structure of the Ruby code mirrors the structure of the original document. Every call to each_element moves the focus of the code down a level: from the root element to each order, from each order to its children, and from there to each item.
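When the depth of the tree isn't known in advance, the same descent can be written recursively instead of as nested blocks. The following is a minimal sketch, not part of the original recipe: the `walk` helper and the abbreviated sample document are hypothetical, and only `REXML::Document.new`, `#root`, `#each_element`, and `#name` come from REXML itself.

```ruby
require 'rexml/document'

# Hypothetical sample data, shaped like the orders document in this recipe.
xml = %{<orders><order><number>105</number><items><item desc="Red Roses"/></items></order></orders>}

# Hypothetical helper: visit an element, then each of its child elements,
# passing along the current depth in the tree.
def walk(element, depth = 0, &block)
  block.call(element, depth)
  element.each_element { |child| walk(child, depth + 1, &block) }
end

doc = REXML::Document.new(xml)
lines = []
walk(doc.root) { |el, depth| lines << "#{'  ' * depth}#{el.name}" }
puts lines
# orders
#   order
#     number
#     items
#       item
```

Each recursive call plays the role of one nested each_element block in the example above, so the indentation of the output mirrors the nesting of the document.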
There are many other methods of Element you can use to navigate the tree structure of an XML document. Not only can you iterate over the child elements, you can reference a specific child by indexing the parent's elements collection as though it were an array (note that REXML numbers child elements starting from 1, not 0). You can navigate through siblings with Element#next_element and Element#previous_element. You can move up the document tree with Element#parent:
my_order = orders.root.elements[1]
first_node = my_order.elements[1]
first_node.name                  # => "number"
first_node.next_element.name     # => "date"
first_node.parent.name           # => "order"
This only scratches the surface; there are many other ways to interact with the data loaded from an XML source. For example, explore the convenience methods Element#each_element_with_attribute and Element#each_element_with_text, which let you select child elements based on features of the elements themselves.
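As a rough illustration of those two convenience methods, here is a small sketch. The sample document is hypothetical (including the attribute-less third item, added only to show the filtering), and both methods look only at the direct children of the element they are called on.

```ruby
require 'rexml/document'

# Hypothetical sample data; the last <item> deliberately has no desc attribute.
xml = %{<order>
  <number>105</number>
  <customer>Corner Store</customer>
  <items>
    <item desc="Red Roses" />
    <item desc="Candy Hearts" />
    <item />
  </items>
</order>}

order = REXML::Document.new(xml).root
items = order.elements['items']

# Select only the child elements that carry a desc attribute.
described = []
items.each_element_with_attribute('desc') do |item|
  described << item.attributes['desc']
end
described   # => ["Red Roses", "Candy Hearts"]

# Select only the child elements whose text is exactly "Corner Store".
matches = []
order.each_element_with_text('Corner Store') { |e| matches << e.name }
matches     # => ["customer"]
```

Both methods also accept optional arguments to match a specific attribute value or to cap the number of matches, which this sketch leaves at their defaults.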