3.10. Internationalization (I18N)[18] and Localization

[18] Refer to to learn more about the Linux I18N Project.

How much I18N affects the porting effort depends greatly on the application. If the application needs only simple message catalog conversions, date and time displays, or simple text searches using regular expressions, porting that functionality from UNIX platforms to Linux can be easy. However, applications that perform more complicated text analysis, such as editors, can be harder to port, for the reason given in the next paragraph.

Linux conforms to the ISO naming standard for locale names, [locale]_[territory].[codeset], where locale is the two-character language code and territory is the two-character territory code. Examples are en_US.iso885915 and zh_CN.gb18030. However, the set of locales available on each system, and the contents of each locale, differ. Porting an application that makes complicated use of locales may therefore require learning language-specific details and translation, as well as modifying existing locales on Linux, to get the behavior expected from the source application on its original platform.

Table 3-4 shows some of the supported GNU libc internationalization functions.

Table 3-4. GNU libc I18N Support Table

Category: GNU libc functions

Wide character string functions: wcscat, wcsncat, wcscmp, wcsncmp, wcscpy, wcsncpy, wcslen, wcschr, wcsrchr, wcsstr, wcspbrk, wcsspn, wcscspn, wcswcs, wcstok, wcscoll, wcwidth, wcswidth, wcsxfrm

Dimension functions: mblen, mbrlen, wcslen, strlen, wcswidth, wcwidth

Conversion functions (multibyte and wide character): mbsinit, mbrtowc, wcrtomb, mbsrtowcs, wcsrtombs, btowc, wctob, mbtowc, mbstowcs, wctomb, wcstombs, wcsnrtombs, mbsnrtowcs

Translation functions: iconv, iconv_open, iconv_close

Message catalog functions: ngettext, dngettext, dcngettext, setlocale, textdomain, bindtextdomain, bind_textdomain_codeset, gettext, dgettext, dcgettext, catgets, gencat, catclose, catopen

Input/output functions: getc, getc_unlocked, getchar, getchar_unlocked, fgetc, getw, putc, putchar, fputc, putw, ungetwc, getwc, getwchar, fgetwc, fgets, ungetc, fputwc, fputws, putwc, putwchar, fputs, fgetws

String formatting functions: fwprintf, wprintf, swprintf, vfwprintf, vwprintf, vswprintf, strftime, strfmon, printf, fprintf, sprintf, wcsftime, vprintf, vfprintf, vsprintf, vsnprintf

Character classifications: iswalpha, iswupper, iswlower, iswdigit, iswxdigit, iswalnum, iswspace, iswpunct, iswprint, iswgraph, iswcntrl, wctype, iswctype, isalpha, isupper, islower, isdigit, isxdigit, isalnum, isspace, ispunct, isprint, isgraph, iscntrl, isascii, isblank

Collation functions: strcoll, strxfrm, wcscoll, wcsxfrm

Regular expressions: regcomp, regexec, regerror, regfree

String copying and filling functions: memset, wmemchr, wmemcmp, wmemcpy, wmemmove, wmemset, wcpcpy, wcpncpy, wcscpy, wcsncpy, wcsspn

Conversion functions (character case and mapping): towlower, towupper, towctrans, wctrans, tolower, toupper

On Linux, you can use the command locale -a to list the locales installed on the system. The file that maps locale alias names to full locale names can be found in /usr/share/locale/locale.alias.

3.10.1. Iconv Support

Linux provides the iconv(1) utility to convert text from one encoding to another, for example from a traditional encoding to Unicode. To get a list of the encodings currently implemented on Linux, run iconv --list at the Linux command prompt.

Sometimes using the iconv utility to translate encodings is not sufficient. Some applications, such as mailers and Web interfaces, need to convert between two different text encodings at run time. GNU libc provides functions to programmatically convert strings between an internal string representation (Unicode) and an external string representation (a traditional encoding).

Example 3-4 shows how to use the iconv APIs.

Example 3-4. Listing of iconv_samp.c

#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
  int Input = STDIN_FILENO;
  iconv_t cd;                  /* conversion descriptor */
  ssize_t bytesread;           /* num bytes read into input buffer */
  char inbuf[BUFSIZ];          /* input buffer */
  char *inchar;                /* ptr to input character */
  size_t inbytesleft;          /* num bytes left in input buffer */
  char outbuf[BUFSIZ];         /* output buffer */
  char *outchar;               /* ptr to output character */
  size_t outbytesleft;         /* num bytes left in output buffer */
  size_t ret_val;              /* iconv() result; (size_t)-1 on error */
  int conv_errno;              /* errno saved right after iconv() */

  if (argc < 3)
  {
    printf("usage   : iconv_samp <tocode> <fromcode>\n");
    printf("Example : iconv_samp WINDOWS-1256 ISO_8859-16\n");
    printf("Run the command \"iconv --list\" for list of known "
           "character sets\n");
    exit(1);
  }

  /* Initiate conversion -- get conversion descriptor */
  if ((cd = iconv_open(argv[1], argv[2])) == (iconv_t)-1) {
    printf("iconv_open failed : %d \n", errno);
    exit(1);
  }

  inbytesleft = 0;       /* no unconverted bytes are pending yet */

  /* translate the characters */
  for ( ;; ) {
    /*
     * if any bytes are leftover, they will be in the
     * beginning of the buffer on the next read().
     */
    inchar = inbuf;         /* points to input buffer */
    outchar = outbuf;       /* points to output buffer */
    outbytesleft = BUFSIZ;  /* space available in output buffer */

    if ((bytesread = read(Input, inbuf + inbytesleft,
           (size_t)BUFSIZ - inbytesleft)) < 0) {
      perror("prog");
      return 1;
    }
    if (!(inbytesleft += bytesread)) {
      break;      /* end of conversions */
    }

    ret_val = iconv(cd, &inchar, &inbytesleft, &outchar, &outbytesleft);
    conv_errno = errno;     /* save before write() can change errno */

    if (write(1, outbuf, (size_t)BUFSIZ - outbytesleft) < 0) {
      perror("prog");
      return 1;
    }

    /* iconv() returns the number of characters converted in a
     * nonreversible way, or (size_t)-1 on error. If the entire
     * string in the input buffer is converted, the value pointed
     * to by inbytesleft will be zero. If the conversion stopped
     * for any reason, the value pointed to by inbytesleft will be
     * nonzero and errno is set to indicate the condition.
     */
    if (ret_val == (size_t)-1) {
      switch (conv_errno)
      {
        case EINVAL:
          /* Conversion stopped due to an incomplete
           * multibyte sequence encountered in the
           * input buffer.
           */
          if (bytesread == 0) {
            /* End of input: no more bytes will arrive. */
            printf("incomplete sequence at end of input\n");
            exit(1);
          }
          /* Copy data left, to the start of buffer */
          memmove(inbuf, inchar, inbytesleft);
          break;
        case EILSEQ:
          /* Conversion stopped due to an invalid
           * multibyte sequence encountered in the
           * input buffer.
           */
          printf("EILSEQ encountered program exiting\n");
          exit(1);
          break;
        case E2BIG:
          /* Conversion stopped due to lack of space
           * in the output buffer.
           */
          memmove(inbuf, inchar, inbytesleft);
          break;
        default:
          printf("iconv failed : %d \n", conv_errno);
          exit(1);
          break;
      }
    }
    /* Go back and read from the input file. */
  }

  /* end conversion & get rid of the conversion table */
  if (iconv_close(cd) == -1) {
    printf("iconv_close failed : %d \n", errno);
    exit(1);
  }
  return 0;
}

Compile the code:

$ gcc iconv_samp.c -o iconv_samp 

Use the executable to convert an input file:

$ ./iconv_samp WINDOWS-1256 ISO_8859-16 < input_file 

3.10.2. How to Create a Message Catalog[19]

[19] Refer to Linux Programming by Example by Robbins (Prentice Hall, 2004) for more detailed Linux I18N examples.

A message catalog is a file that supplies translations of an application's language-dependent output into the language selected by the system locale. Example 3-5 shows how to create a message catalog on Linux using the GNU xgettext and msgfmt utilities.

Example 3-5. Sample Listing of message_cat.c

#include <locale.h>
#include <stdio.h>
#include <libintl.h>

#define PACKAGE   "my_message"
#define LOCALEDIR "locale"

int main()
{
  setlocale (LC_MESSAGES, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  printf("%s\n", gettext("Linux Porting Guide"));
  printf("%s\n", gettext("Author: Mendoza, Skawratananond,Walker"));
  return(0);
}

In this example, we want to output the title and authors of the book in Spanish. To do this, we first run xgettext against the sample program (saved here as hello.c):

$ xgettext hello.c 

This produces a file named messages.po. The listing shows the file after the msgstr entries have been translated:

$ cat messages.po
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2005-03-17 12:22-0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=ISO-8859-2\n"
"Content-Transfer-Encoding: 8bit\n"

#: hello.c:13
msgid "Linux Porting Guide"
msgstr "Guia de Migracion a Linux"

#: hello.c:14
msgid "Author: Mendoza, Skawratananond,Walker"
msgstr "Autor: Mendoza, Skawratananond,Walker"

Edit the file messages.po, and change the msgstr for each of the messages to be translated. Also set the charset if it was not set before running xgettext. Then run the msgfmt command against the messages.po file. The output file is named after the text domain passed to bindtextdomain, with the extension .mo; in Example 3-5 the domain is my_message, so the compiled catalog is my_message.mo.

$ msgfmt -v -o messages.po 

For this example, we create a directory for Spanish (Costa Rica) under our local locale directory, the one named by LOCALEDIR in Example 3-5 (not the default directory /usr/share/locale, because we do not have root access):

$ mkdir -p locale/es_CR/LC_MESSAGES 

Then move the compiled .mo catalog to the correct directory:

$ mv locale/es_CR/LC_MESSAGES 

Export the correct LC_MESSAGES value and run the sample program:

$ export LC_MESSAGES=es_CR
$ ./hello
Guia de Migracion a Linux
Autor: Mendoza, Skawratananond,Walker

We have just successfully created a message catalog file on Linux.

UNIX to Linux Porting: A Comprehensive Reference
ISBN: 0131871099
Year: 2004
Pages: 175