Compressing Whitespace in an XML Document

Problem

When REXML parses a document, it respects the original whitespace of the document's text nodes. You want to make the document smaller by compressing extra whitespace.

Solution

Parse the document by creating a REXML::Document out of it. Within the Document constructor, tell the parser to compress all runs of whitespace characters:

	require 'rexml/document'

	text = %{<doc><a>Some      whitespace</a>    <b>Some   more</b></doc>}

	REXML::Document.new(text, { :compress_whitespace => :all }).to_s
	# => "<doc><a>Some whitespace</a> <b>Some more</b></doc>"

Discussion

Sometimes whitespace within a document is significant, but usually (as with HTML) it can be compressed without changing the meaning of the document. The resulting document takes up less space on disk and requires less bandwidth to transmit.

Whitespace compression doesn't have to be all-or-nothing. REXML gives you two ways to configure it. Instead of passing :all as the value for :compress_whitespace, you can pass in a list of tag names. Whitespace will be compressed only within those tags:

	REXML::Document.new(text, { :compress_whitespace => %w{a} }).to_s
	# => "<doc><a>Some whitespace</a>    <b>Some   more</b></doc>"

You can also switch it around: pass in :respect_whitespace and a list of tag names whose whitespace you don't want compressed. This is useful if you know that whitespace is significant within certain parts of your document.

	REXML::Document.new(text, { :respect_whitespace => %w{a} }).to_s
	# => "<doc><a>Some      whitespace</a> <b>Some more</b></doc>"

What about text nodes containing only whitespace? These are often inserted by XML pretty-printers, and they can usually be discarded without altering the meaning of a document. If you add :ignore_whitespace_nodes => :all to the parser configuration, REXML will simply decline to create text nodes that contain nothing but whitespace characters. Here's a comparison of :compress_whitespace alone, and in conjunction with :ignore_whitespace_nodes:

	text = %{<doc><a>Some   text</a>
  <b>Some   more</b>

</doc>}
	REXML::Document.new(text, { :compress_whitespace => :all }).to_s
	# => "<doc><a>Some text</a>
 <b>Some more</b>
</doc>"
	REXML::Document.new(text, { :compress_whitespace => :all,
	                            :ignore_whitespace_nodes => :all }).to_s
	# => "<doc><a>Some text</a><b>Some more</b></doc>"
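To see what :ignore_whitespace_nodes does to the parsed tree rather than to the serialized string, you can inspect the children of the root element. This is a rough sketch under stated assumptions: the sample document and the helper child_classes are made up for illustration, and only the standard REXML calls Document.new, root, and children are used:

```ruby
require 'rexml/document'

# Hypothetical pretty-printed input: every element sits on its own line.
pretty = %{<doc>\n  <a>Some text</a>\n  <b>Some more</b>\n</doc>}

with_nodes    = REXML::Document.new(pretty)
without_nodes = REXML::Document.new(pretty, { :ignore_whitespace_nodes => :all })

# Illustrative helper: list the class of each direct child of the root.
def child_classes(doc)
  doc.root.children.map { |node| node.class }
end

# Whitespace-only REXML::Text nodes appear between the elements here...
p child_classes(with_nodes)
# ...but only the REXML::Element children remain here.
p child_classes(without_nodes)
```

Code that walks children by position is the main thing affected by this option: with the whitespace nodes gone, every child of the root is an element.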

By itself, :compress_whitespace shouldn't make a document less human-readable, but :ignore_whitespace_nodes almost certainly will.
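If you later need a readable copy of a document whose whitespace nodes were discarded, one option is to re-indent it on output. The sketch below assumes REXML's bundled REXML::Formatters::Pretty formatter is acceptable for that purpose; the sample string and variable names are hypothetical, and the exact layout it prints may vary between REXML versions:

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Hypothetical input: a document with no whitespace between elements.
compact_xml = %{<doc><a>Some text</a><b>Some more</b></doc>}
doc = REXML::Document.new(compact_xml)

# Indent nested elements by two spaces; compact mode keeps short
# text-only elements on a single line.
formatter = REXML::Formatters::Pretty.new(2)
formatter.compact = true

# The formatter appends to any object that responds to <<.
readable = String.new
formatter.write(doc, readable)
puts readable
```

Note that the pretty-printer adds whitespace of its own, so this is suitable only where whitespace is not significant.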

See Also

  • Recipe 1.11, "Managing Whitespace"


Ruby Cookbook (Cookbooks (O'Reilly))
ISBN: 0596523696
Pages: 399

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net