Ruby is an interpreted scripting language, designed with object-orientation and simplicity in mind. It was written by Yukihiro "Matz" Matsumoto in 1995. He wanted to create a language that used the best parts of his favorite languages (Perl, Smalltalk, Eiffel, Ada, and Lisp). At the same time, he wanted to create a language that was expressive. That is, one that was simple to use and understand, but with a great deal of power. The name is a slight tribute to Perl, as Matz decided that keeping with the name of a precious gem would be appropriate.
Ruby has many of the features of Perl and Python, as well as features of more academic languages. It shares the text-processing capabilities and the dynamic nature of Perl and Python. From the more academic languages, Ruby obtained lambda expressions (powerful inline functions) as well as strict object-orientation. It is this last feature that truly distinguishes Ruby from the previous two languages. While both Perl and Python have some aspects of object-orientation, they are more or less recent additions to those languages. Ruby, on the other hand, was designed around the concepts of object-oriented programming. Recently, Ruby has grown in popularity, partly because of its completeness, but also due to the increasing popularity of a Web framework written in it: Ruby on Rails.
Ruby supports both reading and writing XML via the REXML library, which is part of the base Ruby class library. REXML was originally modeled on Electric XML, a Java library written by The Mind Electric. However, the API is designed to fit the Ruby way of doing things, summed up as "Keep the common case simple, and the uncommon, possible." As such, the API does not completely follow standards such as the W3C DOM. Instead, it offers a close model that is easier to develop against.
Ruby has two main methods for reading XML content:
- A tree-based method that is similar, but not identical, to working with the DOM. While it maintains the XML file in a tree-based memory structure, it does not provide all the methods required of a W3C DOM implementation. Although this makes code that uses the W3C model more difficult to port to Ruby, it means that the resulting API is closer to the natural way of working with Ruby.
- A stream-based method based on SAX2. This method more closely follows the SAX model, as described in Chapter 13. The core parser handles three events: one that occurs at the beginning of each element, one at the end, and one for text nodes. You provide the parser with a file or block of XML to process, and it begins executing the methods in order. Although SAX processing is much faster than DOM processing, it is inherently a forward-only pass through the XML. In Ruby, you create a SAX-style listener by mixing a listener module into your own class and overriding the appropriate methods (see below).
While some differences between the Ruby tree-based model and the W3C DOM are minor, they can trip you up. The following table lists some of the most commonly encountered differences between the two.
W3C DOM method | Ruby equivalent | Description |
---|---|---|
documentElement | root or root_node | Returns the root element of the XML file |
appendChild | << or add | Adds a new node to the document tree. |
childNodes | get_elements | Returns the collection of elements below the current element. |
attributes | attributes | Returns the collection of attributes for the selected element. Attributes can be identified by numerical index (starting with 1) or the name of the attribute. |
firstChild | elements[1] | The elements method returns the collection of child elements of the selected element. Children can be identified by numerical index (starting with 1), string name, or via an XPath statement. |
nextSibling | next_element | Returns the next sibling when iterating through a set of elements. |
getElementsByTagName | get_elements | Both return the collection of elements that match the argument. getElementsByTagName takes a tag name, whereas get_elements takes an XPath statement. |
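The correspondences in the table can be tried on a small in-memory document. The following is a minimal sketch, assuming the rexml library is available; the sample XML and the helper name sample_doc are invented for illustration and are not part of the chapter's customers.xml file.

```ruby
require "rexml/document"

# Hypothetical sample data, invented for this sketch.
def sample_doc
  REXML::Document.new(<<~DOC)
    <customers>
      <customer id="A1"><company>First Co</company></customer>
      <customer id="B2"><company>Second Co</company></customer>
    </customers>
  DOC
end

doc    = sample_doc
root   = doc.root                       # DOM: documentElement
first  = root.elements[1]               # first child element (index starts at 1)
second = first.next_element             # next sibling that is an element
found  = root.get_elements("customer")  # XPath, evaluated relative to root

# DOM: appendChild. Both forms add a child element to root.
root << REXML::Element.new("customer")
root.add_element("customer", { "id" => "D4" })

puts root.name
puts first.attributes["id"]
puts second.attributes["id"]
puts found.size
puts root.elements.size
```

Note that elements[1] and next_element work on elements only, so the whitespace text nodes between the customer elements are skipped. A W3C DOM firstChild or nextSibling call would typically return those text nodes.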
Listing 17-17 shows some of the methods of REXML in use on the customers.xml file shown earlier (Listing 17-2).
Listing 17-17: Reading XML with Ruby
    require "rexml/document"
    include REXML

    # Open the file read-only and build the document tree from it
    doc = Document.new(File.new('customers.xml', 'r'))

    puts ">>Print the full first element"
    puts doc.root.elements[1]
    puts
    puts ">>Print the id of the first customer"
    puts doc.root.elements['customer'].attributes['id']
    puts
    puts ">>Select an element via an XPath and display it"
    puts doc.elements["//customer[@id='HUNGC']"]
    puts
    puts ">>Iterate over child elements"
    el = doc.elements["//customer[@id='ANTON']"]
    puts el.elements["company"].text
    el.elements["contact"].each_element { |e| puts e.name + ": " + e.text }
    puts
    puts ">>Select elements via XPath and display child elements"
    doc.each_element("//customer/address[country='Canada']/city") { |e| puts e.text }
Listing 17-17 begins with a require statement that loads the REXML library's document handling. Next, the REXML module is included, so that references to objects in that namespace don't need the REXML prefix (that is, Document.new, not REXML::Document.new). The file is then loaded into a Document object. It is at this point that the tree structure is built up in memory. The first puts call that prints an element (doc.root.elements[1]) shows one difference between the REXML structure and the W3C DOM: the root element in Ruby is root, rather than documentElement as in the DOM.
Elements can be extracted from the document via the elements method. Notice that the first element is numbered 1, rather than 0. Alternatively, the name of the element or attribute can be used to identify the item in the collection. The lookup of customer HUNGC in the preceding code shows a third option: each of the collection methods (elements and attributes), as well as the each_ methods such as each_element, accepts an XPath statement to restrict the selection.
The last two samples in the previous listing show the each_element method, which iterates over child elements. This is a shorthand method for doc.elements.each. As described earlier, each_element also accepts an XPath statement to restrict the returned children. The output of the code in Listing 17-17 should look similar to Listing 17-18.
Listing 17-18: Output of Ruby tree-based processing
    >>Print the full first element
    <customer id='ALFKI'>
      <company>Alfreds Futterkiste</company>
      <address>
        <street>Obere Str. 57</street>
        <city>Berlin</city>
        <zip>12209</zip>
        <country>Germany</country>
      </address>
      <contact>
        <name>Maria Anders</name>
        <title>Sales Representative</title>
        <phone>030-0074321</phone>
        <fax>030-0076545</fax>
      </contact>
    </customer>

    >>Print the id of the first customer
    ALFKI

    >>Select an element via an XPath and display it
    <customer id='HUNGC'>
      <company>Hungry Coyote Import Store</company>
      <address>
        <street>City Center Plaza 516 Main St.</street>
        <city>Elgin</city>
        <region>OR</region>
        <zip>97827</zip>
        <country>USA</country>
      </address>
      <contact>
        <name>Yoshi Latimer</name>
        <title>Sales Representative</title>
        <phone>(503) 555-6874</phone>
        <fax>(503) 555-2376</fax>
      </contact>
    </customer>

    >>Iterate over child elements
    Antonio Moreno Taquería
    name: Antonio Moreno
    title: Owner
    phone: (5) 555-3932

    >>Select elements via XPath and display child elements
    Montreal
    Tsawassen
    Vancouver
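The three ways of selecting elements described above (by 1-based index, by name, and by XPath) can also be tried without the customers.xml file. The following is a minimal sketch on invented sample data; the helper names sample_doc and companies_in are hypothetical and exist only for this illustration.

```ruby
require "rexml/document"

# Hypothetical sample data, invented for this sketch.
def sample_doc
  REXML::Document.new(<<~DOC)
    <customers>
      <customer id="A1"><company>First Co</company><city>Oslo</city></customer>
      <customer id="B2"><company>Second Co</company><city>Lima</city></customer>
      <customer id="C3"><company>Third Co</company><city>Oslo</city></customer>
    </customers>
  DOC
end

# Collects the company names of all customers located in the given city,
# using each_element with an XPath relative to the root element.
def companies_in(doc, city)
  names = []
  doc.root.each_element("customer[city='#{city}']/company") { |e| names << e.text }
  names
end

doc = sample_doc
puts doc.root.elements[1].attributes["id"]               # by 1-based index
puts doc.root.elements["customer"].attributes["id"]      # by name: first match
puts doc.elements["//customer[@id='B2']/company"].text   # by XPath
puts companies_in(doc, "Oslo").inspect
```

Interpolating the city name into the XPath string is acceptable for a sketch with fixed input; values that come from users would need quoting before being placed into an expression.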
In addition to the methods listed previously, the Ruby XML implementation includes a number of methods that are designed to get information on the structure of the document. These methods use common Ruby idioms, making the code feel more Ruby-like. The following table outlines some of these methods.
Method | Description |
---|---|
each_element | Iterates over the child elements of the selected element. Can be passed an XPath statement to restrict the elements iterated over. |
has_elements? | Returns true if the current node has child elements. |
has_attributes? | Returns true if the current node has attributes. |
has_text? | Returns true if the current node has a child text node. |
text= | Assigns a value as the inner text for an element. |
Listing 17-19 shows some of these methods being used to query the structure of the XML.
Listing 17-19: Getting structure information with Ruby
    require "rexml/document"
    include REXML

    doc = Document.new(File.new('customers.xml', 'r'))

    # get information about element
    el = doc.elements["//customer[@id='RICAR']"]
    if el.has_attributes?
      puts el.name + " has attributes"
      el.attributes.each { |name, value| puts name + ": " + value }
    end

    # recursively print each element, descending into children first
    def dump_element(e)
      if e.has_elements?
        puts
        puts e.name + " has children"
        e.each_element { |el| dump_element(el) }
      elsif e.has_text?
        puts e.name + ": " + e.text
      end
    end

    if el.has_elements?
      dump_element(el)
    end
Listing 17-20 shows the output of the preceding code.
Listing 17-20: XML Structure Information
    customer has attributes
    id: RICAR

    customer has children
    company: Ricardo Adocicados

    address has children
    street: Av. Copacabana, 267
    city: Rio de Janeiro
    region: RJ
    zip: 02389-890
    country: Brazil

    contact has children
    name: Janete Limeira
    title: Assistant Sales Agent
    phone: (21) 555-3412
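The structure methods can be combined with text=, which appears in the table above but is not exercised in Listing 17-19. The following is a minimal sketch on invented data; sample_customer and leaf_lines are hypothetical helper names, and the recursion mirrors the dump_element approach while returning the lines instead of printing them.

```ruby
require "rexml/document"

# Hypothetical sample data, invented for this sketch.
def sample_customer
  xml = "<customer id='X9'><company>Old Name</company>" \
        "<contact><name>Ann Lee</name><fax/></contact></customer>"
  REXML::Document.new(xml).root
end

# Returns "name: text" lines for every leaf element below e.
# Elements with neither children nor text (such as <fax/>) are skipped.
def leaf_lines(e, lines = [])
  if e.has_elements?
    e.each_element { |child| leaf_lines(child, lines) }
  elsif e.has_text?
    lines << "#{e.name}: #{e.text}"
  end
  lines
end

customer = sample_customer
customer.elements["company"].text = "New Name"   # text= replaces the inner text
puts customer.has_attributes?
puts leaf_lines(customer)
```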
In addition to the tree-based document parsing, REXML supports a stream-based parser. With this technique, you create one or more listener classes with methods that are called as the document is processed. You create a listener class by including the StreamListener module in your own class (a mixin, in Ruby terms). You then override the methods of StreamListener to provide the implementation you desire. The following table describes the most common methods you should override.
Method | Description |
---|---|
tag_start | Called when the parser first encounters a new tag. The name of the new element will be passed to the method, as well as the attributes for the element. The attributes are provided in an array of name-value pairs. This method is usually used to prepare for the processing by identifying the elements you are interested in. |
tag_end | Called when the parser encounters the end of an element. This is usually used to undo whatever settings were enabled when the corresponding tag_start method was called, such as turning off flags or decrementing counters. |
text | Called when the parser encounters a text node. This is often where the bulk of the processing occurs when using stream-based parsers. |
Listing 17-21 shows some of these methods processing the customers XML file.
Listing 17-21: Reading XML using streams with Ruby
    require 'rexml/document'
    require 'rexml/streamlistener'
    include REXML
    include Parsers

    class Listener
      include StreamListener

      def initialize
        @cities = Hash.new(0)   # cities not seen yet read as a count of 0
        @flag = false
      end

      # Called at each start tag; flags city elements
      def tag_start(name, attributes)
        if name == 'city'
          @flag = true
        end
      end

      # Called at each end tag; prints the totals once the root element closes
      def tag_end(name)
        if name == 'customers'
          puts
          dump_list
        end
      end

      # Called for each text node; counts it if it belongs to a city element
      def text(text)
        if @flag
          puts "Adding " + text
          @cities[text] = @cities[text] + 1
          @flag = false
        end
      end

      def dump_list()
        puts ">> Count of each city"
        @cities.each { |key, value| puts key + ": " + value.to_s }
        puts "==="
      end
    end

    listener = Listener.new
    parser = StreamParser.new(File.new("customers.xml"), listener)
    parser.parse
The Listener class includes the StreamListener mixin and contains five methods, three of which are used by the streaming parser (tag_start, tag_end, and text). The tag_start method is called as each new tag is reached by the parser, whereas tag_end is called at the end of each element.
The tag_start method receives two parameters: the name of the element and an array containing the keys and values of the attributes for that element. As the code is identifying cities, it simply sets a flag if the parser has reached a city element.
The counting is done within the text method. Because this method is called many times throughout the life of the application, it uses the @flag variable to determine whether it is within a city element. If this is the case, the entry for the city in the hash table @cities is incremented. Because the hash was created with Hash.new(0), a city that does not yet have an entry reads as 0, so the assignment creates the entry at this point with a value of 1. Finally, the flag is turned off. This could also have been done in the tag_end method.
Once the end of the document has been reached (identified by the tag_end method being called on the customers end element), the contents of the hash table are printed to the console. This method uses a Ruby block to print each entry in the @cities hash table.
Listing 17-22: Output of the Ruby stream-based processor
    Adding Lyon
    Adding Reims
    Adding Stuttgart
    Adding Oulu
    Adding Resende
    Adding Seattle
    Adding Helsinki
    Adding Warszawa

    >> Count of each city
    Stuttgart: 1
    Butte: 1
    Kobenhavn: 1
    Tsawassen: 1
    London: 6
    Brandenburg: 1
    Cunewalde: 1
    Marseille: 1
    Berlin: 1
    Sao Paulo: 4
    Portland: 2
    Lyon: 1
    Albuquerque: 1
    Warszawa: 1
    Lille: 1
    Frankfurt a.M.: 1
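As noted earlier, the flag could be turned off in tag_end rather than in text. The following is a minimal sketch of that variant, not the chapter's listing: the class name CityCounter, the helper count_cities, and the sample XML are invented for illustration. It also buffers the text until the end tag, on the assumption that a parser may deliver the text of one element in more than one call.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Counts <city> values; the flag is cleared in tag_end instead of text.
class CityCounter
  include REXML::StreamListener

  attr_reader :cities

  def initialize
    @cities = Hash.new(0)   # missing keys read as 0
    @in_city = false
    @buffer = +""
  end

  def tag_start(name, _attributes)
    return unless name == "city"
    @in_city = true
    @buffer = +""
  end

  # Accumulate text only while inside a <city> element.
  def text(value)
    @buffer << value if @in_city
  end

  def tag_end(name)
    return unless name == "city"
    @cities[@buffer.strip] += 1
    @in_city = false
  end
end

# Parses the XML string as a stream and returns the city counts.
def count_cities(xml)
  listener = CityCounter.new
  REXML::Document.parse_stream(xml, listener)
  listener.cities
end

xml = "<customers><customer><city>London</city></customer>" \
      "<customer><city>Berlin</city></customer>" \
      "<customer><city>London</city></customer></customers>"
count_cities(xml).each { |city, n| puts "#{city}: #{n}" }
```

Counting in tag_end keeps the text method free of bookkeeping, at the cost of holding each city name in a buffer until its element closes.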
Writing XML with the REXML library is quite simple. The Document class is used to create the new document, whereas Element and Attribute classes add elements and attributes, respectively.
Writing XML using Ruby is significantly different from using the W3C DOM. Instead of sticking with the API used with the DOM, the authors of the Ruby library chose to follow common Ruby idioms. The following table outlines some of these methods.
Method | Signature |
---|---|
Document.new | new(source = nil, context = {}) |
XMLDecl.new | new(version, encoding, standalone) |
Element.new | new(arg = UNDEFINED, parent = nil, context = nil) |
Element.add_element | add_element(arg = nil, arg2 = nil). Example: book3 = folder.add_element("bookmark", {"href"=>"http://www.wrox.com"}) |
Element.add_attribute | add_attribute(key, value = nil) |
Element.<< | << item |
Listing 17-23 shows the creation of a simple XBEL (XML Bookmarks Exchange Language) document using REXML. Note that this sample uses a number of different techniques on purpose; it shows the choices available for creating new elements and attributes.
Listing 17-23: Writing XML with Ruby
    require "rexml/document"
    include REXML

    doc = Document.new
    doc << XMLDecl.new
    doc << Element.new("xbel")
    doc.root.attributes["version"] = "1.0"

    folder = Element.new("folder")
    folder << Element.new("title").add_text("Some useful bookmarks")

    book1 = folder.add_element("bookmark")
    book1.add_attribute("href", "http://www.geekswithblogs.net/evjen")
    book1.add_element("title").add_text("Bill Evjen's Weblog")

    book2 = Element.new("bookmark")
    book2.add_attribute("href", "http://www.acmebinary.com/blogs/kent")
    book2 << Element.new("title") << Text.new("Kent Sharkey's Weblog")
    folder << book2

    book3 = folder.add_element("bookmark", {"href"=>"http://www.wrox.com"})
    book3 << Element.new("title") << Text.new("Wrox Home Page")
    book3 << Element.new("desc") << Text.new("Home of great, red books")

    doc.root.add_element(folder)
    doc.write(File.new("output.xml", "w"), 2)
First, the REXML library is loaded with the require statement, and its namespace is brought into scope with the include REXML statement. This removes the need to write REXML:: at every use of the library. The XML document is created, and the standard XML declaration is added. Next, the root element is added, along with an attribute.
Listing 17-24 shows the resulting XML document.
Listing 17-24: Output from Ruby
    <?xml version='1
    <xbel version='1
      <folder>
        <title>Some useful bookmarks</title>
        <bookmark href='http://www
          <title>Bill Evjen's Weblog</title>
        </bookmark>
        <bookmark href='http://www
          <title>Kent Sharkey's Weblog</title>
        </bookmark>
        <bookmark href='http://www
          <title>Wrox Home Page</title>
          <desc>Home of great, red books</desc>
        </bookmark>
      </folder>
    </xbel>
New elements can be created either standalone (with Element.new), or as part of the existing structure (with add_element). Similarly, attributes can be added via add_attribute, or using Attribute.new. Finally, the append method (<<) is overridden to permit adding either elements or attributes. This choice of methods lets you select the one that works best for you or for the situation at hand.
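The two styles of element creation can be compared side by side. The following is a minimal sketch that builds the same bookmark element both ways; the helper names bookmark_standalone and bookmark_in_place are hypothetical and introduced only for this illustration.

```ruby
require "rexml/document"

# Standalone style: create the pieces first, then attach them.
def bookmark_standalone(url, title)
  el = REXML::Element.new("bookmark")
  el.add_attribute(REXML::Attribute.new("href", url))
  el << REXML::Element.new("title").add_text(title)
  el
end

# In-place style: create each piece directly under its parent.
def bookmark_in_place(parent, url, title)
  el = parent.add_element("bookmark", { "href" => url })
  el.add_element("title").add_text(title)
  el
end

folder = REXML::Element.new("folder")
folder << bookmark_standalone("http://www.wrox.com", "Wrox Home Page")
bookmark_in_place(folder, "http://www.wrox.com", "Wrox Home Page")

# Both children carry the same attribute and title.
folder.each_element do |b|
  puts b.attributes["href"] + " / " + b.elements["title"].text
end
```

The standalone style suits elements that are assembled before their parent is known, whereas the in-place style keeps the code short when the parent already exists.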
Notice from the output in Listing 17-24 that the text is automatically encoded, as in the case of the single quote characters. In addition, other characters not appropriate in XML files (such as & or ") are encoded. This behavior can be overridden by adding the :raw value to the context (for Element.new). You might use this option when writing entries to a CDATA block or another location where the characters are actually valid.
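The default encoding behavior can be observed on a single element. The following is a minimal sketch; the helper name title_element and the sample text are invented for illustration, and the :raw option is deliberately not exercised here.

```ruby
require "rexml/document"

# Builds a <title> element holding the given text.
def title_element(text)
  REXML::Element.new("title").add_text(text)
end

el = title_element("Fish & Chips")
puts el.to_s   # serialized form: the ampersand is written as an entity
puts el.text   # value seen by code: the original, unencoded string
```

The encoding is applied only when the element is serialized, so code that reads the text back through the API sees the string it originally supplied.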
The base libraries for Ruby also include support for creating and accessing Web services using either SOAP or XML-RPC. In addition, they provide support for working with RSS and W3C XML Schemas. External libraries are generally installed as Ruby gems, a packaging format built into Ruby. As of this writing, there are approximately 1200 Ruby gems, many providing support for various XML formats. Some of the more notable Ruby gems include:
- Amrita2: An XHTML templating engine that provides for the transformation of XML documents into XHTML. It is similar in concept to XSLT, but does not follow that standard.
- FeedTools: A powerful library for working with RSS, Atom, and CDF (Channel Definition Format) files.
- XMPP4R: A library for communicating with the XML format used by the Jabber instant messenger protocol.