Problem
You need to handle strings that contain non-ASCII characters: probably Unicode characters encoded in UTF-8.
Solution
To use Unicode in Ruby, add the following to the beginning of your code:
$KCODE='u'
require 'jcode'
You can also invoke the Ruby interpreter with arguments that do the same thing:
$ ruby -Ku -rjcode
If you use a Unix environment, you can add the arguments to the shebang line of your Ruby application:
#!/usr/bin/ruby -Ku -rjcode
The jcode library overrides most of the methods of String and makes them capable of handling multibyte text. The exceptions are String#length, String#count, and String#size, which are not overridden. Instead, jcode defines three new methods: String#jlength, String#jcount, and String#jsize.
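To make the byte-versus-character distinction concrete, here is a minimal sketch of the kind of counting String#jlength performs, written without jcode so it does not depend on that library. It is not jcode's actual implementation, and the helper name `utf8_length` is hypothetical. It assumes well-formed UTF-8 and relies on one property of the encoding: every character has exactly one lead byte, and every other byte is a continuation byte in the range 0x80 to 0xBF.

```ruby
# Hypothetical helper (not part of jcode): counts the characters in a
# UTF-8 string by counting every byte that is NOT a continuation byte.
# Assumes well-formed UTF-8 input.
def utf8_length(str)
  continuation = 0x80..0xBF
  # unpack("C*") returns the raw bytes as integers.
  str.unpack("C*").reject { |byte| continuation.include?(byte) }.size
end

fullwidth_ab = "\xef\xbc\xa1\xef\xbc\xa2"  # fullwidth A and B: 6 bytes
puts fullwidth_ab.unpack("C*").size  # 6 (bytes)
puts utf8_length(fullwidth_ab)       # 2 (characters)
puts utf8_length("abc")              # 3 (ASCII: one byte per character)
```

For ASCII text the two counts always agree, which is why the difference only shows up once multibyte characters appear.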
Discussion
Consider a UTF-8 string that encodes six Unicode characters, the fullwidth capital letters A through F (U+FF21 through U+FF26). Their UTF-8 encodings are efbca1 (A), efbca2 (B), and so on up to efbca6 (F):
string = "\xef\xbc\xa1" + "\xef\xbc\xa2" + "\xef\xbc\xa3" + "\xef\xbc\xa4" + "\xef\xbc\xa5" + "\xef\xbc\xa6"
The string contains 18 bytes that encode 6 characters:
string.size # => 18
string.jsize # => 6
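As a cross-check on these counts, the sketch below builds the same string from its code points and then decodes it again. It uses only Array#pack and String#unpack with the "U" (UTF-8 character) and "C" (unsigned byte) directives, so it does not need jcode; the variable names are illustrative.

```ruby
# The recipe's string: six 3-byte UTF-8 sequences.
string = "\xef\xbc\xa1" + "\xef\xbc\xa2" + "\xef\xbc\xa3" +
         "\xef\xbc\xa4" + "\xef\xbc\xa5" + "\xef\xbc\xa6"

# pack("U*") encodes a list of code points as UTF-8.
# U+FF21..U+FF26 are FULLWIDTH LATIN CAPITAL LETTER A..F.
built = (0xFF21..0xFF26).to_a.pack("U*")

# Compare raw bytes so the check does not depend on string encodings.
puts built.unpack("C*") == string.unpack("C*")  # true

bytes      = string.unpack("C*")  # raw bytes
codepoints = string.unpack("U*")  # decoded characters
puts bytes.size       # 18
puts codepoints.size  # 6
puts codepoints.map { |cp| "U+%04X" % cp }.join(" ")
# U+FF21 U+FF22 U+FF23 U+FF24 U+FF25 U+FF26
```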
String#count is a method that takes a string of bytes and counts how many times those bytes occur in the string. String#jcount takes a string of characters and counts how many times those characters occur in the string:
string.count "\xef\xbc\xa2" # => 13
string.jcount "\xef\xbc\xa2" # => 1
String#count treats "\xef\xbc\xa2" as three separate bytes and counts the number of times each of those bytes shows up in the string: 0xef and 0xbc each appear six times and 0xa2 appears once, for a total of 13. String#jcount treats the same string as a single character and looks for that character in the string, finding it only once. String#length and String#jlength differ in the same way:
"\xef\xbc\xa2".length # => 3
"\xef\xbc\xa2".jlength # => 1
Apart from these differences, Ruby handles most Unicode behind the scenes. Once you have your data in UTF-8 format, you really don't have to worry. Given that Ruby's creator Yukihiro Matsumoto is Japanese, it is no wonder that Ruby handles Unicode so elegantly.
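To see why String#count and String#jcount disagree on the same argument, here is a minimal sketch of both behaviors written without jcode. The helpers `byte_set_count` and `char_count` are hypothetical, not jcode's API: the first treats its argument as a set of bytes, as described above for String#count, and the second decodes both strings into code points with unpack("U*") and counts matching characters, as described for String#jcount. Unlike the real methods, they ignore the ranges (such as `a-z`) and `^` negation that String#count accepts in its argument.

```ruby
# Hypothetical helper: treats `chars` as a SET of bytes and counts every
# byte of `str` that belongs to that set (byte-oriented counting).
def byte_set_count(str, chars)
  set = chars.unpack("C*")
  str.unpack("C*").select { |b| set.include?(b) }.size
end

# Hypothetical helper: decodes both strings as UTF-8 code points and counts
# the characters of `str` that appear in `chars` (character-oriented counting).
def char_count(str, chars)
  set = chars.unpack("U*")
  str.unpack("U*").select { |cp| set.include?(cp) }.size
end

string = "\xef\xbc\xa1\xef\xbc\xa2\xef\xbc\xa3\xef\xbc\xa4\xef\xbc\xa5\xef\xbc\xa6"
puts byte_set_count(string, "\xef\xbc\xa2")  # 13
puts char_count(string, "\xef\xbc\xa2")      # 1
```

Because the lead byte 0xef and the second byte 0xbc are shared by all six characters, the byte-set version matches them everywhere, which is exactly where the 13 comes from.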