Converting from One Encoding to Another

Credit: Mauro Cicio

Problem

You want to convert a document to a given charset encoding (probably UTF-8).

Solution

If you don't know the document's current encoding, you can guess at it using the Charguess library described in the previous recipe. Once you know the current encoding, you can convert the document to another encoding using Ruby's standard iconv library.

Here's an XML document written in Italian, with no explicit encoding:

	doc = %{
	 
	 spaghetti al ragù
	 frappè
	 }

Let's figure out its encoding and convert it to UTF-8:

	require 'iconv'
	require 'charguess' # not necessary if input encoding is known

	input_encoding = CharGuess::guess doc # => "windows-1252"
	output_encoding = 'utf-8'

	converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc)

	CharGuess::guess(converted_doc) 	 # => "UTF-8"

Discussion

The heart of the iconv library is the Iconv class, a wrapper for the Unix 95 iconv() family of functions. These functions translate strings between various encoding systems. Since iconv is part of the Ruby standard library, it should already be available on your system.
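Iconv was removed from the standard library in Ruby 2.0, so on a current interpreter the recipe's require line will fail unless the separate iconv gem is installed. As a rough sketch of the same conversion on Ruby 1.9 or later, the built-in String#encode can stand in for Iconv; this is a swapped-in technique, not the recipe's own method. The sample bytes and the variable names below are made up for illustration, and the input encoding is assumed to be known already:

```ruby
# The bytes of "spaghetti al ragù" as they would appear in a
# Windows-1252 file: ù is the single byte 0xF9.
raw = "spaghetti al rag\xF9".b

# String#encode takes (destination, source), the same argument order
# as Iconv.new(output_encoding, input_encoding).
converted = raw.encode('UTF-8', 'Windows-1252')

converted.encoding        # => #<Encoding:UTF-8>
converted.valid_encoding? # => true
converted.bytesize        # => 18 (raw was 17 bytes; ù takes two bytes in UTF-8)
```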

Iconv works well in conjunction with Charguess: even if Charguess guesses the encoding a little bit wrong (such as guessing Windows-1252 for an ISO-8859-1 document), it always makes a good enough guess that iconv can convert the document to another encoding.
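One way to see why a Windows-1252 guess is harmless for an ISO-8859-1 document: the two encodings assign the same characters to the bytes used by ordinary accented Western European text, and differ only in the range 0x80 to 0x9F. The sketch below uses the built-in String#encode (Ruby 1.9 or later) rather than Iconv, and the sample bytes are chosen purely for illustration:

```ruby
# The accented letters from the sample document as single bytes:
# 0xF9 is ù and 0xE8 is è in both ISO-8859-1 and Windows-1252.
raw = "rag\xF9 frapp\xE8".b

as_latin1 = raw.encode('UTF-8', 'ISO-8859-1')
as_cp1252 = raw.encode('UTF-8', 'Windows-1252')

as_latin1 == as_cp1252 # => true

# The guess only matters for bytes in the 0x80-0x9F range. 0x93 is a
# curly quotation mark in Windows-1252 but a control character in
# ISO-8859-1.
quote = "\x93".b
quote.encode('UTF-8', 'Windows-1252') # => "\u201C"
quote.encode('UTF-8', 'ISO-8859-1')   # => "\u0093"
```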

Like Charguess, the Iconv library is not XML- or HTML-specific. You can use libcharguess and iconv together to convert an arbitrary string to a given encoding.
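When the target is something other than UTF-8, the string may contain characters the target encoding cannot represent. The following is a minimal sketch of what that looks like, again using the built-in String#encode (Ruby 1.9 or later) instead of Iconv; the sample string and the choice of "?" as a replacement are assumptions made for the example:

```ruby
text = "frapp\u00E8" # a UTF-8 string containing è

# ISO-8859-1 can represent è, so this succeeds (è becomes the byte 0xE8).
latin1 = text.encode('ISO-8859-1')
latin1.bytes.last # => 232

# US-ASCII cannot, so a strict conversion raises an error...
error_class = nil
begin
  text.encode('US-ASCII')
rescue Encoding::UndefinedConversionError => e
  error_class = e.class
end

# ...unless you ask for a replacement character instead.
ascii = text.encode('US-ASCII', undef: :replace, replace: '?')
ascii # => "frapp?"
```

Whether to fail loudly or substitute a placeholder depends on the document: silently replacing characters loses information, so the strict form is usually the safer default.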

See Also

  • Recipe 11.11, "Guessing a Document's Encoding"
  • The iconv library is documented at http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/classes/Iconv.html; you can find pointers to The Open Group Unix library specifications there


Ruby Cookbook
ISBN: 0596523696