Credit: Mauro Cicio
You want to convert a document to a given charset encoding (probably UTF-8).
If you don know the documents current encoding, you can guess at it using the Charguess library described in the previous recipe. Once you know the current encoding, you can convert the document to another encoding using Rubys standard iconv library.
Heres an XML document written in Italian, with no explicit encoding:
doc = %{ }
Lets figure out its encoding and convert it to UTF-8:
require iconv require charguess # not necessary if input encoding is known input_encoding = CharGuess::guess doc # => "windows-1252" output_encoding = utf-8 converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc) CharGuess::guess(converted_doc) # => "UTF-8"
The heart of the iconv library is the Iconv class, a wrapper for the Unix 95 iconv( ) family of functions. These functions translate strings between various encoding systems. Since iconv is part of the Ruby standard library, it should be already available on your system.
Iconv works well in conjunction with Charguess: even if Charguess guesses the encoding a little bit wrong (such as guessing Windows-1252 for an ISO-8859-1 document), it always makes a good enough guess that iconv can convert the document to another encoding.
Like Charguess, the Iconv library is not XML-or HTML-specific. You can use libcharguess and iconv together to convert an arbitrary string to a given encoding.
Strings
Numbers
Date and Time
Arrays
Hashes
Files and Directories
Code Blocks and Iteration
Objects and Classes8
Modules and Namespaces
Reflection and Metaprogramming
XML and HTML
Graphics and Other File Formats
Databases and Persistence
Internet Services
Web Development Ruby on Rails
Web Services and Distributed Programming
Testing, Debugging, Optimizing, and Documenting
Packaging and Distributing Software
Automating Tasks with Rake
Multitasking and Multithreading
User Interface
Extending Ruby with Other Languages
System Administration