4.3. Using Message Catalogs

Lojban is culturally fully neutral. Its vocabulary was built algorithmically using today's six most widely spoken languages: Chinese, Hindi, English, Russian, Spanish, and Arabic.

What is Lojban?, Nick Nicholas and John Cowan

A message catalog is a collection of messages in a specific language. This is integral to the idea of localization (L10N). The idea is to "isolate" the language-specific strings from the rest of the application so that we can simply "plug in" a different catalog to get messages and strings in a different language.
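The "plug in a different catalog" idea can be pictured in plain Ruby before we bring in any library. What follows is only a toy sketch under stated assumptions: the catalogs and translate methods are hypothetical names invented for this illustration, and a real catalog lives in a file rather than in a hash.

```ruby
# Toy message catalogs: one hash per language, keyed by the original
# (English) message. Hypothetical data, for illustration only.
def catalogs
  {
    "ja" => { "Information" => "Jouhou" }
  }
end

# Look up a message in the catalog for lang. Fall back to the original
# string when the language or the message is missing.
def translate(lang, message)
  catalogs.fetch(lang, {}).fetch(message, message)
end

puts translate("ja", "Information")   # Jouhou
puts translate("de", "Information")   # Information (no German catalog)
puts translate("ja", "Goodbye")       # Goodbye (message not in catalog)
```

The application code calls translate everywhere and never mentions a specific language; supporting a new language means adding a catalog, not editing the program.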

The "best" way to do this in Ruby is to use the library formally called Ruby-GetText-Package. I'll simply refer to this as gettext, after the filename of the library; this is not to be confused with the gettext utility. This excellent library is the work of Masao Mutoh, who helped extensively with this section.

This library is a Ruby implementation (not a wrapper) modeled after the GNU gettext utilities (the most famous set of utilities in this area). The official site is at http://gettext.rubyforge.org/, and the official GNU gettext utilities site is at http://www.gnu.org/software/gettext/.

4.3.1. Background and Terminology

The gettext library is really more than one library, as we'll see. The basic functionality is accessed with a require 'gettext', and certain useful utilities are accessed through require 'gettext/utils' (for maintaining message catalogs).

The primary reason to use message catalogs, of course, is to translate messages between languages. We also handle cases where singular and plural forms need to be distinguished (one file, two files); these rules vary widely from one language to another, of course.

Typically each library or application will have its own message catalog. This means that sets of translated catalogs can be included as part of a distributed package.

Environment variables such as LANG and GETTEXT_PATH are honored. We'll explain these later.

There are two basic maintenance operations you might perform on a message catalog (outside your Ruby code). One is to extract messages from your Ruby source to create an initial catalog; the other is to update (merge) new messages from Ruby source into an existing catalog. We'll look at the extract and merge operations in section 4.3.3, "Localizing a Simple Application."

4.3.2. Getting Started with Message Catalogs

You may already have this library installed. If not, gem install gettext is the easiest way to get it.

For development purposes, you will need the related GNU utilities. If you're on a UNIX-like system, you probably already have them. If you're on Win32, one way to get them is to install Glade/GTK+ for Windows. Either way, the utilities are needed only for development, not at runtime.

If you don't have rake, install the gem. It's convenient to have in these situations.

Assuming that your environment is all set up and everything is installed, you can begin to work with catalogs. Let's look at some terminology:

  • A po-file is a portable object file. These are the text forms (or human-readable forms) of the message catalogs; each of these files has a counterpart under each different locale supported. A pot-file is a template file.

  • A mo-file is a portable binary message catalog file. It is created from a po-file. Our Ruby library reads mo-files, not po-files.

  • A text domain is, in effect, just the basename of a mo-file. This text domain is associated with an application (or bound to it).

4.3.3. Localizing a Simple Application

The following example defines a Person class and manipulates it. The show method shows the localized messages.

require 'gettext'

class Person
  include GetText

  def initialize(name, age, children_num)
    @name, @age, @children_num = name, age, children_num
    bindtextdomain("myapp")
  end

  def show
    puts _("Information")
    puts _("Name: %{name}, Age: %{age}") % {:name => @name, :age => @age}
    puts n_("%{name} has a child.", "%{name} has %{num} children.",
            @children_num) % {:name => @name, :num => @children_num}
  end
end

john = Person.new("John", 25, 1)
john.show

linda = Person.new("Linda", 30, 3)
linda.show


Assume that we save this code to myapp/person.rb. The directory hierarchy is significant, as we'll see later. The call to bindtextdomain binds the text domain "myapp" to the Person object at runtime.

In the show method, there are three gettext calls. The basic lookup method is named _ (a single underscore) to be unobtrusive; the third call uses its plural-aware companion, n_.

The first call just displays the localized message corresponding to the "Information" string. The second demonstrates a localized message with two arguments. The hash specifies a list of values to be substituted into the string; we can't interpolate directly into the string because that would interfere with our whole purpose of storing a small number of messages in a catalog.

Also remember that the parameters are separated so that they can appear in different orders if necessary. Sometimes the data will get rearranged during translation because languages may have differing word order.

You can do the same method call in this shorter way:

puts _("Name: %s, Age: %d") % [@name, @age]


However, the longer style is recommended. It is more descriptive and gives more information to the translator.
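The word-order argument can be tried with ordinary Ruby strings, since current versions of Ruby support named references in String#% without any help from gettext. This is a small sketch: the fill helper and the reordered "translation" are assumptions made up for the demonstration.

```ruby
# Ruby's built-in String#% accepts named references with a hash
# (core Ruby 1.9 and later; no gettext needed for this part).
# fill is a hypothetical helper for illustration.
def fill(template, values)
  template % values
end

values    = { name: "John", age: 25 }
original  = "Name: %{name}, Age: %{age}"
# A made-up translation that happens to put the age first.
reordered = "Age: %{age}, Name: %{name}"

puts fill(original, values)    # Name: John, Age: 25
puts fill(reordered, values)   # Age: 25, Name: John

# With positional directives the same reordering goes wrong: values are
# consumed in array order, whatever the translator intended.
puts "Age: %s, Name: %s" % ["John", 25]   # Age: John, Name: 25
```

The named form lets the translator move the placeholders freely; the positional form silently swaps the data.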

The n_ method handles singular and plural cases. The @children_num value is used to compute an index telling us which of the specified strings to use. (The Plural-Forms entry, which I will explain soon, specifies how to calculate the index.)

Note that these default messages need to be in English (even if you as a programmer are not a native English speaker). Like it or not, English is the nearest thing to a universal language from the viewpoint of most translators.

I said we would find rake to be useful. Let's create a Rakefile (under myapp) to maintain message catalogs. We'll give it the two common operations: updating po-files and making mo-files.

require 'gettext/utils'

desc "Update pot/po files."
task :updatepo do
  GetText.update_pofiles("myapp", ["person.rb"], "myapp 1.0.0")
end

desc "Create mo-files"
task :makemo do
  GetText.create_mofiles
end


This code uses the gettext/utils library, which contains various functions to help in maintaining message catalogs. The update_pofiles method creates the initial myapp/po/myapp.pot file from the person.rb source. When it is invoked the second time (or more), this function performs an update or merge of the myapp/po/myapp.pot file and all of the myapp/po/#{lang}/myapp.po files.

The second parameter is an array of target files. Usually you will specify something like the following:

GetText.update_pofiles("myapp", Dir.glob("{lib,bin}/**/*.{rb,rhtml}"),
                       "myapp 1.0.0")
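To get a feel for what an update or merge means for a translator's file, here is a deliberately simplified sketch. It is not the library's algorithm: merge_catalog is a hypothetical helper, and I am assuming only the basic idea that existing translations are kept while newly extracted messages start out untranslated. The real tools do more than this.

```ruby
# Toy version of the merge step. template_ids is the list of msgids
# freshly extracted from the source; existing maps msgid => msgstr as
# found in a translator's current po-file.
# Hypothetical helper, not library code.
def merge_catalog(template_ids, existing)
  template_ids.each_with_object({}) do |msgid, merged|
    # Keep a translation we already have; new messages start out empty.
    merged[msgid] = existing.fetch(msgid, "")
  end
end

merged = merge_catalog(["Information", "Goodbye"],
                       { "Information" => "Jouhou", "Obsolete" => "(old)" })
merged.each { |msgid, msgstr| puts "#{msgid.inspect} => #{msgstr.inspect}" }
```

In this toy version a message that no longer appears in the source simply drops out of the result, and a new message appears with an empty translation waiting for the translator.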


The GetText.create_mofiles call creates data/locale/ subdirectories as needed and generates mo-files from po-files.

So if we issue the command rake updatepo, we create the myapp/po directory and create myapp.pot under it.

Now edit the header part of the po/myapp.pot. This is basically a description of your application (name, author, email, license, and so on).

# My sample application.                      (Some descriptive title)
# Copyright (C) 2006  Foo Bar                 (Author of this app)
# This file is distributed under XXX license. (License info)
#
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.         (Translator's info)
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: myapp 1.0.0\n"           (Project ID and version)
#...


You may wonder what the fuzzy marker is. This is simply marking something that has not been translated or has a doubtful translation. When messages are generated automatically, they will be marked fuzzy so that a human can check them and change them.

You will then send the myapp.pot file to the translators. (Of course, you may be translating it yourself.)

Now suppose that you're a Japanese translator. The locale is ja_JP.UTF-8, meaning "Japanese (ja) as spoken in Japan (JP), with encoding UTF-8."
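The three parts of a locale name can be pulled apart with a regular expression. This is a minimal sketch: parse_locale is a hypothetical helper written for this illustration (the gettext library has its own locale handling), and the pattern covers only the common language_COUNTRY.charset shape.

```ruby
# Split a POSIX-style locale name such as "ja_JP.UTF-8" into its parts.
# parse_locale is a hypothetical helper, not part of the gettext library.
def parse_locale(name)
  m = /\A([a-z]{2,3})(?:_([A-Z]{2}))?(?:\.([\w-]+))?\z/.match(name)
  return nil unless m   # not in the language_COUNTRY.charset shape

  { language: m[1], country: m[2], charset: m[3] }
end

p parse_locale("ja_JP.UTF-8")   # language "ja", country "JP", charset "UTF-8"
p parse_locale("fr")            # language only; country and charset are nil
```

The country and the charset are optional in this sketch, so a bare language code such as "fr" is accepted as well.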

Start by copying myapp.pot to myapp.po. If you have the GNU gettext utilities, it is better to use msginit instead of a simple cp command; this utility will honor the environment variables and set certain header variables correctly. Invoke it this way (on UNIX):

LANG=ja_JP.UTF-8 msginit -i myapp.pot -o myapp.po


Then edit myapp.po as shown in Listing 4.3. Note that you need to edit this file in the same charset as the one declared in the Content-Type line.

Listing 4.3. The Completed myapp.po File

# My sample application.
# Copyright (C) 2006  Foo Bar
# This file is distributed under XXX license.
#
# Your name <yourname@foo.com>, 2006.        (All translator's info)
#                                            (Remove the 'fuzzy' line)
msgid ""
msgstr ""
"Project-Id-Version: myapp 1.0.0\n"
"POT-Creation-Date: 2006-05-22 23:27+0900\n"
"PO-Revision-Date: 2006-05-23 14:39+0900\n"
"Last-Translator: Your Name <foo@bar.com>\n" (Current translator's info)
"Language-Team: Japanese\n"                  (Your language)
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"  (Encoding of this file)
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n" (Pluralization form)

#: person.rb:12
msgid "Information"
msgstr "Jouhou"

#: person.rb:13
msgid "Name: %{name}, Age: %{age}"
msgstr "Namae: %{name}, Nenrei: %{age}"

#: person.rb:14
msgid "%{name} has a child."
msgid_plural "%{name} has %{num} children."
msgstr[0] "%{name} ha hitori kodomo ga imasu."
msgstr[1] "%{name} ha %{num} nin no kodomo ga imasu."

The msgid tag marks the original message, and msgstr marks the translated message. If you find msgid_plural, you need to supply separate msgstr[i] values that follow the Plural-Forms rule. The index i is the number calculated from the Plural-Forms expression. In this case, when n != 1, we will use msgstr[1] (plural messages).

The behavior of Plural-Forms betrays its origins in the C language. The usage we see here depends on the fact that Boolean expressions in C return 0 or 1.
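The same rule can be written out in Ruby to show how the index picks a form. This is a sketch only: plural_index and select_form are hypothetical helpers, not methods of the gettext library, and the strings are the two msgstr entries from Listing 4.3.

```ruby
# Ruby equivalent of the C expression plural=(n != 1).
# In C the comparison itself yields 0 or 1; in Ruby it yields false or
# true, so we convert explicitly.
def plural_index(n)
  n != 1 ? 1 : 0
end

# Choose among the translated forms by index, much as n_ must do for
# this rule. Hypothetical helper for illustration.
def select_form(forms, n)
  forms[plural_index(n)]
end

forms = [
  "%{name} ha hitori kodomo ga imasu.",
  "%{name} ha %{num} nin no kodomo ga imasu."
]

[1, 3].each do |n|
  puts select_form(forms, n) % { name: "John", num: n }
end
```

Note that under this rule zero counts as plural, because 0 != 1 is true.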

Be aware that singular and plural forms vary widely from one language to another. In fact, many languages have more than one plural form. In Polish, the word plik (file) is singular; for numbers greater than one, there are two plural forms. The form pliki is for numbers ending in 2, 3, or 4 (except those ending in 12, 13, or 14), and plików is for all other numbers.

So in Polish, our Plural-Forms would look something like this:

Plural-Forms: nplurals=3; \
              plural=n==1 ? 0 : \
              n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
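The rule for Polish as given in the GNU gettext manual can be checked by translating it into Ruby. This is a sketch for experimentation; polish_plural_index is a hypothetical name, and the three word forms are the ones mentioned above.

```ruby
# The Polish rule as given in the GNU gettext manual:
#   n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
# polish_plural_index is a hypothetical helper for illustration.
def polish_plural_index(n)
  if n == 1
    0   # singular: plik
  elsif n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)
    1   # ends in 2, 3, or 4, but not 12, 13, or 14: pliki
  else
    2   # everything else: plików
  end
end

forms = ["plik", "pliki", "plików"]
[1, 3, 5, 12, 22].each do |n|
  puts "#{n} #{forms[polish_plural_index(n)]}"
end
```

The test on n%100 is what sends 12, 13, and 14 to the third form even though they end in 2, 3, and 4.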


Obviously the header is important. In particular, Content-Type and Plural-Forms are indispensable. If you can use msginit, they are inserted automatically; otherwise, you need to add them manually.

At this point, the translator sends back the localized files to the developer. (So you can put on your "developer's hat" again.)

The myapp.po files from the translators go under their respective language directories (under myapp/po). So, for example, the French version would be stored in myapp/po/fr/myapp.po, the German version in myapp/po/de/myapp.po, and so on.

Then issue the command rake makemo. This will convert the po-files to mo-files. These generated mo-files go under myapp/data/locale/ (which has a subdirectory for each language).

So our entire directory structure looks like this:

myapp/
    Rakefile
    person.rb
    po/
        myapp.pot
        de/myapp.po
        fr/myapp.po
        ja/myapp.po
        :
    data/
        locale/
            de/LC_MESSAGES/myapp.mo
            fr/LC_MESSAGES/myapp.mo
            ja/LC_MESSAGES/myapp.mo
            :
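The location of each mo-file follows a regular pattern built from the base directory, the language, and the text domain. A minimal sketch of that pattern follows; mo_path is a hypothetical helper, and the library's own search logic may consider more locations than this one.

```ruby
# Build the path of a mo-file following the layout shown above:
#   <base>/<lang>/LC_MESSAGES/<textdomain>.mo
# mo_path is a hypothetical helper, not part of the gettext library.
def mo_path(base, lang, textdomain)
  File.join(base, lang, "LC_MESSAGES", "#{textdomain}.mo")
end

puts mo_path("data/locale", "ja", "myapp")
# data/locale/ja/LC_MESSAGES/myapp.mo
```

This is why the text domain is described as the basename of a mo-file: it is the only part of the path that names the application.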


All our translation tasks are finished. Now let's test this example. But before we do, you need to specify where the mo-files are located and which locale you are testing. Set the GETTEXT_PATH and LANG environment variables, run the program, and observe the output.

export GETTEXT_PATH="data/locale"
export LANG="ja_JP.UTF-8"
ruby person.rb


The application will output localized messages depending on the value of the LANG variable.

4.3.4. Other Notes

If you include message catalogs along with your application, it's best to package everything using RubyGems or the setup.rb library. Refer to section 17.2, "Installation and Packaging," for more information.

With RubyGems, the message catalogs are installed to a directory of this form:

(gem-packages-installed-dir)/myapp-x.x.x/data/locale/


This is included as the search path of the gettext library. Your application will be localized without using GETTEXT_PATH.

Using setup.rb, they are installed to the (system-dir)/share/locale/ directory. Again the application will be localized without GETTEXT_PATH.

Remember that this library is not a wrapper of the GNU gettext utilities. However, the message files are compatible, so you can use the GNU maintenance tools if you want. Of course, these utilities are not required at runtime (that is, the end user does not need them).




The Ruby Way, Second Edition: Solutions and Techniques in Ruby Programming (2nd Edition)
ISBN: 0672328844
EAN: 2147483647
Year: 2004
Pages: 269
Authors: Hal Fulton
