Converting an XML Document into a Hash

Problem

When you parse an XML document with REXML's Document.new, you get a representation of the document as a complex data structure. You'd like to represent an XML document using simple, built-in Ruby data structures.

Solution

Use the XmlSimple library, found in the xml-simple gem. It parses an XML document into a hash.

Consider an XML document like this one:

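	xml = %{
	<freezer temp="-12" scale="celcius">
	 <food>Phyllo dough</food>
	 <food>Ice cream</food>
	 <icecubetray>
	  <cube1 />
	  <cube2 />
	 </icecubetray>
	</freezer>}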

Here's how you parse it with XmlSimple:

	require 'rubygems'
	require 'xmlsimple'

	doc = XmlSimple.xml_in xml

And here's what it looks like:

	require 'pp'
	pp doc
	# {"icecubetray"=>[{"cube2"=>[{}], "cube1"=>[{}]}],
	# "food"=>["Phyllo dough", "Ice cream"],
	# "scale"=>"celcius",
	# "temp"=>"-12"}

Discussion

XmlSimple is a lightweight alternative to the Document class. Instead of exposing a tree of Element objects, it exposes a nested structure of Ruby hashes and arrays. There's no performance savings (XmlSimple actually builds a Document object behind the scenes and iterates over it, so it's about half as fast as Document), but the resulting object is easy to use. XmlSimple also provides several tricks that can make a document more concise and navigable.
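To get a feel for what "iterating over a Document" might involve, here is a minimal sketch that folds an REXML tree into hashes, arrays, and strings. The element_to_hash helper is hypothetical and is not XmlSimple's actual code; it only assumes the standard REXML API, and it ignores text that is mixed in with child elements.

```ruby
require 'rexml/document'

# Hypothetical helper (not XmlSimple's implementation): fold an REXML
# element into plain hashes, arrays, and strings.
def element_to_hash(element)
  result = {}
  # Attributes become plain string values.
  element.attributes.each { |name, value| result[name] = value }
  # Child elements are collected into arrays keyed by tag name. If an
  # attribute already uses that name, its value joins the same array.
  element.elements.each do |child|
    result[child.name] = Array(result[child.name]) << element_to_hash(child)
  end
  text = element.text.to_s.strip
  # An element holding nothing but text collapses to that text.
  return text if result.empty? && !text.empty?
  result
end

doc = REXML::Document.new(
  '<freezer temp="-12"><food>Ice cream</food><food>Phyllo dough</food></freezer>')
freezer = element_to_hash(doc.root)
pp freezer
```

The root element's name is not stored anywhere in the result, which mirrors the way XmlSimple drops it by default.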

The most useful trick is the KeyAttr option. Suppose you had a better-organized freezer than the one above, a freezer in which everything had its own name attribute:[2]

[2] Okay, it's not really better organized. In fact, it's exactly the same. But it sure looks cooler!

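	xml = %{
	<freezer temp="-12" scale="celcius">
	 <item name="Phyllo dough" type="food" />
	 <item name="Ice cream" type="food" />
	 <item name="Ice cube tray" type="container">
	  <item name="Ice cube" type="food" />
	  <item name="Ice cube" type="food" />
	 </item>
	</freezer>}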

You could parse this data with just a call to XmlSimple.xml_in, but you get a more concise representation by specifying the name attribute as a KeyAttr argument. Compare:

	parsed1 = XmlSimple.xml_in xml
	pp parsed1
	# {"scale"=>"celcius",
	# "item"=>
	# [{"name"=>"Phyllo dough", "type"=>"food"},
	# {"name"=>"Ice cream", "type"=>"food"},
	# {"name"=>"Ice cube tray",
	# "type"=>"container",
	# "item"=>
	# [{"name"=>"Ice cube", "type"=>"food"},
	# {"name"=>"Ice cube", "type"=>"food"}]}],
	# "temp"=>"-12"}

	parsed2 = XmlSimple.xml_in(xml, 'KeyAttr' => 'name')
	pp parsed2
	# {"scale"=>"celcius",
	# "item"=>
	# {"Phyllo dough"=>{"type"=>"food"},
	# "Ice cube tray"=>
	# {"type"=>"container",
	# "item"=>{"Ice cube"=>{"type"=>"food"}}},
	# "Ice cream"=>{"type"=>"food"}},
	# "temp"=>"-12"}

The second parsing is also easier to navigate:

	parsed1["item"].detect { |i| i['name'] == 'Phyllo dough' }['type']
	# => "food"
	parsed2["item"]["Phyllo dough"]["type"]
	# => "food"

But notice that the second parsing represents the ice cube tray as containing only one ice cube. This is because both ice cubes have the same name. When two tags at the same level have the same KeyAttr, one overwrites the other in the hash.
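The overwriting is easy to reproduce with plain Ruby data. This sketch uses a hypothetical fold_by_key helper that roughly imitates what KeyAttr does to a list of items; it is an illustration, not XmlSimple's code. It also shows one way you might check for duplicate keys before deciding to use KeyAttr.

```ruby
# Hypothetical helper (not part of XmlSimple): fold an array of hashes
# into one hash keyed by the given attribute.
def fold_by_key(items, key)
  items.each_with_object({}) do |item, folded|
    rest = item.reject { |k, _| k == key }
    # A later item with the same key silently replaces an earlier one.
    folded[item[key]] = rest
  end
end

cubes = [
  { "name" => "Ice cube", "type" => "food" },
  { "name" => "Ice cube", "type" => "food" }
]

folded = fold_by_key(cubes, "name")
pp folded

# Names that appear more than once would lose data when folded.
duplicates = cubes.group_by { |c| c["name"] }
                  .select { |_, group| group.size > 1 }
                  .keys
pp duplicates
```

If the duplicates list is not empty, the array-based parsing is the safer one to work with.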

You can modify the data structure with normal Ruby hash and array methods, then write it back out to XML with XmlSimple.xml_out:

	parsed1["item"] << {"name"=>"Curry leaves", "type"=>"spice"}
	parsed1["item"].delete_if { |i| i["name"] == "Ice cube tray" }

	puts XmlSimple.xml_out(parsed1, "RootName"=>"freezer")
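	# <freezer scale="celcius" temp="-12">
	#   <item name="Phyllo dough" type="food" />
	#   <item name="Ice cream" type="food" />
	#   <item name="Curry leaves" type="spice" />
	# </freezer>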

Be sure to specify a RootName argument when you call xml_out. When it parses a file, XmlSimple removes one level of indirection by throwing away the name of your document's root element. You can prevent this by using the KeepRoot argument in your original call to xml_in. You'll need an extra hash lookup to navigate the resulting data structure, but you'll retain the name of your root element.

	parsed3 = XmlSimple.xml_in(xml, 'KeepRoot'=>true)
	# Now there's no need to add an extra root element when writing back to XML.
	XmlSimple.xml_out(parsed3, 'RootName'=>nil)

One disadvantage of XmlSimple is that, since it puts elements into a hash, it replaces the order of the original document with the random-looking order of a Ruby hash. This is fine for a document listing the contents of a freezer, where order doesn't matter, but it would give interesting results if you tried to use it on a web page.
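The loss of order is not only a matter of how a hash happens to iterate. Even on Ruby versions whose hashes keep insertion order, grouping same-named elements under one key throws away how they were interleaved. The sketch below uses made-up data for the children of a hypothetical web page body, with no XML library involved.

```ruby
# Children of a hypothetical <body> element, in document order.
children = [
  ["h1", "Title"],
  ["p",  "First paragraph"],
  ["h1", "Second heading"],
  ["p",  "Second paragraph"]
]

# Group the text by tag name, the way a hash-of-arrays representation does.
grouped = children.each_with_object({}) do |(tag, text), hash|
  (hash[tag] ||= []) << text
end
pp grouped

# Flattening the hash again puts every h1 ahead of every p.
flattened = grouped.flat_map { |tag, texts| texts.map { |text| [tag, text] } }
pp flattened
puts "Order preserved? #{flattened == children}"
```

Nothing in the grouped hash records that a paragraph sat between the two headings, so the original page cannot be rebuilt from it.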

Another disadvantage is that, since an element's attributes and children are put into the same hash, you have no reliable way of telling one from the other. Indeed, attributes and subelements may even end up in a list together, as in this example:

	pp XmlSimple.xml_in(%{
	<freezer temp="-12" scale="celcius">
	 <temp>Body of temporary worker who knew too much</temp>
	</freezer>})
	# {"scale"=>"celcius",
	# "temp"=>["-12", "Body of temporary worker who knew too much"]}
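When you do need to tell an attribute from a subelement of the same name, one option is to fall back on the Document class for that lookup. This sketch assumes only the standard REXML API, in which attributes and child elements live in separate collections.

```ruby
require 'rexml/document'

doc = REXML::Document.new(%{
<freezer temp="-12" scale="celcius">
 <temp>Body of temporary worker who knew too much</temp>
</freezer>})

root = doc.root
# Attributes and child elements are looked up separately,
# so the two "temp" values never get mixed together.
temp_attribute = root.attributes['temp']
temp_element   = root.elements['temp'].text

puts "attribute: #{temp_attribute}"
puts "element:   #{temp_element}"
```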

See Also

  • The XmlSimple home page at http://www.maik-schmidt.de/xml-simple.html has much more information about the options you can pass to XmlSimple.xml_in.


Ruby Cookbook (Cookbooks (O'Reilly))
ISBN: 0596523696