Chapter 4. Internationalization in Ruby
Earlier we said that character data was arguably the most important data type. But what do we mean by character data? Whose characters, whose alphabet, whose language and culture?
In the past, computing has had an Anglocentric bias, perhaps going back as far as Charles Babbage. This is not necessarily a bad thing. We had to start somewhere, and it might as well be with an alphabet of 26 letters and no diacritic marks (accents and other marks added to a base character).
But computing is a global phenomenon now. Probably every country in the world has at least some computers and some Net access. Naturally, people everywhere would prefer to work with web pages, email, and other data not just in English but in their own language.
Human written languages are amazingly diverse. Some are nearly phonetic; others are hardly phonetic at all. Some have true alphabets, whereas others are mostly large collections of thousands of symbols evolved from pictograms. Some languages have more than one alphabet. Some are intended to be written vertically; some are written horizontally, as most of us are used to, but from right to left, as most of us are not used to. Some alphabets are fairly plain; some have letters adorned with a bewildering array of dots, accents, circles, lines, and ticks. Some languages have letters that can be combined with their neighboring letters in certain circumstances; sometimes this is mandatory, sometimes optional. Some languages have the concept of upper- and lowercase letters; most do not.
We've come a long way in 25 years or so. We've managed to create a little order out of the chaos of characters and languages.
If you deal much with programming applications that are meant to be used in linguistically diverse environments, you know the term internationalization. This could be defined as the enabling of software to handle more than one written language.
Related terms are multilingualization and localization. All of these are traditionally abbreviated by the curious practice of deleting the middle letters and replacing them with the number of letters deleted:
def shorten(str)
  (str[0..0] + str[1..-2].length.to_s + str[-1..-1]).upcase
end

shorten("internationalization")   # I18N
shorten("multilingualization")    # M17N
shorten("localization")           # L10N
The terms I18N and M17N are largely synonymous; globalization has also been used, but this has other meanings in other contexts. The term L10N refers to something a little broader: the complete support for local conventions and culture, such as currency symbols, ways of formatting dates and times, using a comma as a decimal separator, and much more.
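To make the idea of local conventions a little more concrete, here is a minimal sketch of two of them: the decimal separator and the date format. The CONVENTIONS table, the locale keys, and the format_number and format_date methods are all hypothetical names invented for this illustration; a real application would probably draw such data from a locale database or library rather than hard-coding it.

```ruby
require "date"

# Hypothetical convention table, for illustration only. The two entries
# reflect common U.S. English and German usage.
CONVENTIONS = {
  en_US: { decimal: ".", thousands: ",", date: "%m/%d/%Y" },
  de_DE: { decimal: ",", thousands: ".", date: "%d.%m.%Y" }
}.freeze

# Format a number with two decimal places using the given locale's
# decimal and thousands separators.
def format_number(value, locale)
  conv = CONVENTIONS.fetch(locale)
  whole, fraction = format("%.2f", value).split(".")
  # Put a separator after any digit that is followed by one or more
  # complete groups of three digits.
  whole = whole.gsub(/(\d)(?=(\d{3})+\z)/) { "#{$1}#{conv[:thousands]}" }
  "#{whole}#{conv[:decimal]}#{fraction}"
end

# Format a date using the given locale's usual field order.
def format_date(date, locale)
  date.strftime(CONVENTIONS.fetch(locale)[:date])
end

amount = 1234567.891
day    = Date.new(2006, 10, 25)

puts format_number(amount, :en_US)   # 1,234,567.89
puts format_number(amount, :de_DE)   # 1.234.567,89
puts format_date(day, :en_US)        # 10/25/2006
puts format_date(day, :de_DE)        # 25.10.2006
```

Even this toy example hints at why L10N is the broader term: the same number and the same date come out differently depending on where the reader lives, quite apart from any question of which characters are used to write them.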
Let's start by examining a little terminology, since this area of computing is fairly prone to jargon. This will also include a little history, since the current state of affairs makes sense only as we view its slow evolution. The history lessons will be kept to a minimum.