7.2 I18N and L10N

   

Internationalization, often abbreviated as I18N, is the process of implementing applications that support multiple locales. With JSTL, internationalization is performed with the actions introduced in "Overview" on page 250.

Localization, often abbreviated as L10N, is the process of adapting an internationalized application to support a specific locale. For the most part, with JSTL, that means creating a set of resource bundles for specific locales.

To develop internationalized Web applications that you can subsequently localize, you must have a basic understanding of locales, resource bundles, Unicode, and charsets, all of which are discussed in the following sections.

Locales

The most basic localization task is identifying geographical, political, or cultural regions, known as locales. Locale constants for countries and languages are defined by the International Organization for Standardization (ISO): country codes in ISO 3166 and language codes in ISO 639. Table 7.1 lists some examples of locale constants for selected countries.

Table 7.1. Examples of ISO Country Locale Constants

Country          Code
Canada           CA
China            CN
Germany          DE
Iceland          IS
Italy            IT
Mexico           MX
United States    US

For a complete list of country locale constants, see the following URL:

http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html

Table 7.2 lists some examples of language locale constants.

Table 7.2. Examples of Language Locale Constants

Language     Code
French       fr
Chinese      zh
German       de
Icelandic    is
Italian      it
Spanish      es
English      en

For a complete list of language locale constants, see the following URL:

http://www-old.ics.uci.edu/pub/ietf/http/related/iso639.txt
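If you need to check codes like those in Tables 7.1 and 7.2 from Java code, java.util.Locale already carries the ISO lists. The sketch below is illustrative only: the class IsoCodes and its method names are hypothetical, and it relies on the convention that country codes are uppercase and language codes lowercase.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: looks up ISO codes like those in Tables 7.1 and 7.2.
// Country codes (ISO 3166) are uppercase; language codes (ISO 639) are lowercase.
final class IsoCodes {

    static String countryName(String countryCode) {
        // An empty language plus a country code yields a country-only Locale.
        return new Locale("", countryCode).getDisplayCountry(Locale.ENGLISH);
    }

    static String languageName(String languageCode) {
        return new Locale(languageCode).getDisplayLanguage(Locale.ENGLISH);
    }

    static boolean isIsoCountry(String code) {
        return Arrays.asList(Locale.getISOCountries()).contains(code);
    }

    static boolean isIsoLanguage(String code) {
        return Arrays.asList(Locale.getISOLanguages()).contains(code);
    }

    public static void main(String[] args) {
        System.out.println(countryName("DE"));   // Germany
        System.out.println(languageName("de"));  // German
        System.out.println(isIsoCountry("IS"));  // true
        System.out.println(isIsoLanguage("is")); // true
    }
}
```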

When you specify a locale with <fmt:setLocale> or by specifying the FMT_LOCALE configuration setting directly, [3] you can specify a language, such as fr for French, or a language-country combination, such as fr-CA for Canadian French. If you specify a language-country combination, you can use either a hyphen or an underscore to separate the language and country, so you can write Canadian French as fr-CA or fr_CA. The language code must always precede the country code.

[3] See "Configuration Settings" on page 230 for more information about specifying configuration settings directly with servlets and life-cycle listeners.
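JSTL implementations perform this parsing internally. The following is not their code but a minimal sketch, assuming a hypothetical helper named LocaleStrings.parse, of how a string such as fr-CA or fr_CA maps onto a java.util.Locale.

```java
import java.util.Locale;

// Hypothetical helper (not part of the JSTL API): turns a locale string in the
// form accepted by <fmt:setLocale>, such as "fr", "fr-CA" or "fr_CA", into a Locale.
final class LocaleStrings {

    static Locale parse(String value) {
        // Split on the first hyphen or underscore; the language always comes first.
        String[] parts = value.trim().split("[-_]", 2);
        String language = parts[0];
        String country = parts.length > 1 ? parts[1] : "";
        // The Locale constructor normalizes case: language lowercase, country uppercase.
        return new Locale(language, country);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"fr", "fr-CA", "fr_CA"}) {
            Locale locale = parse(s);
            System.out.println(s + " -> language=" + locale.getLanguage()
                    + ", country=" + locale.getCountry());
        }
    }
}
```

Because both separators produce the same Locale, fr-CA and fr_CA select the same resource bundles.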

You can also specify a variant for a locale. Variants are vendor- and browser-specific. For example, you can specify the variants WIN for Microsoft Windows or MAC for Macintosh. You can specify a locale variant with the <fmt:setLocale> action's variant attribute; for example, you could specify the Macintosh variant for French as spoken in France like this: <fmt:setLocale value='fr-FR' variant='MAC'/>. In practice, locale variants are rarely used.
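In Java terms, a variant is simply the third part of a java.util.Locale. The sketch below builds the locale that, we assume, <fmt:setLocale value='fr-FR' variant='MAC'/> corresponds to; the class name LocaleVariant is hypothetical.

```java
import java.util.Locale;

// Hypothetical example: a three-part Locale (language, country, variant),
// assumed to match <fmt:setLocale value='fr-FR' variant='MAC'/>.
final class LocaleVariant {

    static Locale frenchMac() {
        return new Locale("fr", "FR", "MAC");
    }

    public static void main(String[] args) {
        Locale locale = frenchMac();
        System.out.println(locale);              // fr_FR_MAC
        System.out.println(locale.getVariant()); // MAC
    }
}
```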

Resource Bundles

As we discussed in "Overview" on page 250, a resource bundle is a collection of key/value pairs. You can specify resource bundles with a properties file or with a Java class.

Properties files are by far the most popular way to specify resource bundles, even though a resource bundle specified as a Java class is more flexible than one specified as a properties file. The reasons for that popularity are simple: properties files are easier to create than Java classes, and they do not need to be compiled.

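Resource bundles, whether they are specified with a properties file or implemented as a Java class, are loaded from the Web application's classpath, so they typically reside in the WEB-INF/classes directory or a subdirectory of WEB-INF/classes (they can also be packaged in a JAR file in WEB-INF/lib).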

Resource Bundles as Properties Files

Listing 7.7 lists a simple properties file that specifies localized messages in English for a login page.

Listing 7.7 A Properties File That Represents a Resource Bundle
# Application Properties -- English Version
login.window-title=Localized Error Messages
login.first-name=First Name
login.last-name=Last Name
login.email-address=Email Address

In a properties file, lines beginning with the # character are comments. Key/value pairs are specified with this syntax: key=value. In a properties file, both keys and values are always strings.
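To see how those rules play out, the following sketch loads the contents of Listing 7.7 with java.util.PropertyResourceBundle. The properties text is inlined as a string so the example runs without a file, and the class name PropertiesBundleDemo is hypothetical; in a Web application the bundle would be found on the classpath instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo: parses the text of Listing 7.7 as a resource bundle.
final class PropertiesBundleDemo {

    static final String APP_EN =
          "# Application Properties -- English Version\n"
        + "login.window-title=Localized Error Messages\n"
        + "login.first-name=First Name\n"
        + "login.last-name=Last Name\n"
        + "login.email-address=Email Address\n";

    static ResourceBundle load() {
        try {
            return new PropertyResourceBundle(new StringReader(APP_EN));
        } catch (IOException e) {
            // A StringReader does no real I/O, so this is not expected to happen.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load();
        // The comment line is skipped, leaving four keys with String values.
        for (String key : bundle.keySet()) {
            System.out.println(key + " = " + bundle.getString(key));
        }
    }
}
```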

Resource Bundles as Java Classes

Listing 7.8 lists a Java class that specifies a resource bundle equivalent to the one represented by the properties file listed in Listing 7.7.

The Java class in Listing 7.8 extends java.util.ListResourceBundle, which defines one abstract method: getContents. That method returns a two-dimensional array of objects containing key/value pairs.

Specifying a resource bundle with a Java class is more flexible than using a properties file because values are not limited to strings, as is the case for properties files. Also, if you specify a resource bundle with a Java class, you can calculate values, which you cannot do in a properties file. In practice, those features are rarely used, and as a result properties files are the preferred method of specifying resource bundles.
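Both of those capabilities are easy to show. The bundle below is a hypothetical example, not part of Listing 7.8: one value is an Integer, and another is computed from the current year when ListResourceBundle first loads the contents. Non-string values have to be read with getObject, because getString would fail with a ClassCastException.

```java
import java.time.Year;
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical bundle showing two things a properties file cannot express:
// a non-String value and a value computed in code.
final class ComputedBundleDemo {

    static final class AppExtras extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                // An Integer value; read it with getObject, not getString.
                { "login.max-attempts", Integer.valueOf(3) },
                // Computed when ListResourceBundle first loads the contents.
                { "footer.copyright", "\u00A9 " + Year.now().getValue() },
            };
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = new AppExtras();
        int maxAttempts = (Integer) bundle.getObject("login.max-attempts");
        System.out.println("max attempts: " + maxAttempts);
        System.out.println(bundle.getString("footer.copyright"));
    }
}
```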

Listing 7.8 A Java Class That Represents a Resource Bundle
import java.util.ListResourceBundle;

// Application Properties -- English Version
public class app_en extends ListResourceBundle {
   private static final Object[][] contents = {
      { "login.window-title",  "Localized Error Messages" },
      { "login.first-name",    "First Name" },
      { "login.last-name",     "Last Name" },
      { "login.email-address", "Email Address" },
   };

   public Object[][] getContents() {
      return contents;
   }
}

Unicode and Charsets

Internally, the Java programming language uses Unicode to store characters. Unicode is a character coding system that assigns a unique number to every character in every major written language in the world.

Java's use of Unicode means that JSP pages can store and display strings containing all characters found in all of the commonly used written languages. It also means that you can use Unicode escape sequences to represent characters that you may not find on your keyboard; Table 7.3 lists some of those characters.

Table 7.3. Unicode Escape Sequence Examples

Unicode Escape   Symbol   Description
\u00C0           À        Capital A, accent grave
\u00C9           É        Capital E, accent acute
\u00C2           Â        Capital A, accent circumflex
\u00A9           ©        Copyright symbol
\u2122           ™        Trademark
\u00B6           ¶        Paragraph sign
\u2020           †        Dagger symbol
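Java source code accepts these escapes directly, and Character.getName (available since Java 7) reports the official Unicode name of each character. The sketch below, with a hypothetical class name, prints each escape from Table 7.3 next to its Unicode name; it uses escapes rather than literal symbols so the source file stays pure ASCII.

```java
// Hypothetical demo: prints each escape from Table 7.3 with its Unicode name.
final class UnicodeEscapes {

    static final char[] SAMPLES = {
        '\u00C0', '\u00C9', '\u00C2', '\u00A9', '\u2122', '\u00B6', '\u2020'
    };

    // Formats a character as its escape sequence followed by its Unicode name.
    static String describe(char c) {
        return String.format("\\u%04X %s", (int) c, Character.getName(c));
    }

    public static void main(String[] args) {
        for (char c : SAMPLES) {
            System.out.println(describe(c));
        }
    }
}
```

For example, describe('\u00A9') returns the text \u00A9 COPYRIGHT SIGN.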

At their most fundamental level, browsers map bytes to characters, or glyphs; for example, a browser reading ISO-8859-1 text maps the byte 0xA9 to the copyright symbol (Unicode \u00A9). Those mappings are defined by a charset, which is a method of converting a sequence of bytes into a sequence of characters. [4] The default charset for JSP pages is ISO-8859-1, which maps bytes to characters for Latin-based languages. Table 7.4 lists charsets for a few languages.

[4] That definition comes from RFC 2278; see http://www.faqs.org/rfcs/rfc2278.html.
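The following sketch makes the definition concrete by decoding the single byte 0xA9 with two charsets. ISO-8859-1 turns it into the copyright symbol, while UTF-8 treats a lone 0xA9 as malformed input and substitutes the replacement character U+FFFD. The class name CharsetDemo is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same byte decodes to different characters
// depending on the charset.
final class CharsetDemo {

    static String decode(byte[] bytes, Charset charset) {
        // String(byte[], Charset) replaces undecodable input with the
        // charset's replacement string instead of throwing.
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0xA9 };
        String latin1 = decode(bytes, StandardCharsets.ISO_8859_1);
        String utf8 = decode(bytes, StandardCharsets.UTF_8);
        System.out.printf("ISO-8859-1: U+%04X%n", (int) latin1.charAt(0)); // U+00A9
        System.out.printf("UTF-8:      U+%04X%n", (int) utf8.charAt(0));   // U+FFFD
    }
}
```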

Table 7.4. Charset Examples

Language                        Language Code   Charset(s)
Chinese (Simplified/Mainland)   zh              GB2312
Chinese (Traditional/Taiwan)    zh              Big5
English                         en              ISO-8859-1
French                          fr              ISO-8859-15
German                          de              ISO-8859-15
Icelandic                       is              ISO-8859-1
Italian                         it              ISO-8859-15
Japanese                        ja              Shift_JIS, ISO-2022-JP, EUC-JP
Korean                          ko              EUC-KR
Russian                         ru              ISO-8859-5, KOI8-R
Spanish                         es              ISO-8859-15
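Charset choice matters when a page contains characters outside a charset's repertoire. One well-known difference between ISO-8859-1 and ISO-8859-15 is that only the latter includes the euro sign (U+20AC). The sketch below, with a hypothetical class name, tests that with CharsetEncoder.canEncode; it assumes the JVM provides ISO-8859-15, which standard JDKs do although the Java platform specification does not require it.

```java
import java.nio.charset.Charset;

// Hypothetical demo: can a given charset represent a given text?
final class CharsetCoverage {

    static boolean canEncode(String charsetName, String text) {
        // Assumes the named charset is available in this JVM.
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        System.out.println(canEncode("ISO-8859-1", euro));  // false
        System.out.println(canEncode("ISO-8859-15", euro)); // true
        System.out.println(canEncode("UTF-8", euro));       // true
    }
}
```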

There is also a single charset that can be used for all languages: the Universal Character Set (UCS), defined in 1993 by the ISO and the IEC.

The UCS can encode all of the characters and symbols for all of the written languages of the world, with room to spare. With 31 bits to represent each character or symbol, the UCS has room for a whopping 2 billion of them. That's the bad news, though, because most applications can only handle 16-bit encodings. The good news for the UCS is that it's Unicode compatible.
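In concrete numbers, 31 bits give the UCS 32,768 times as many code positions as a 16-bit encoding offers:

```latex
2^{31} = 2\,147\,483\,648 \approx 2.1 \times 10^{9},
\qquad
2^{16} = 65\,536,
\qquad
\frac{2^{31}}{2^{16}} = 2^{15} = 32\,768
```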

The majority of applications can't handle the UCS encoding, but because of its usefulness and compatibility with Unicode, a few transformation formats were developed, the most popular of which is UTF-8 (UCS Transformation Format 8). UTF-8 encodes each character as a variable-length sequence of bytes: one to three bytes cover every character in Unicode's 16-bit Basic Multilingual Plane, and four bytes cover the supplementary characters beyond it. Because it preserves the US-ASCII range, UTF-8 transmits US-ASCII text as single bytes, which is much more efficient than the UCS. Those properties make UTF-8 the most widely used format for displaying multiple languages. In addition, many of the newer specifications from the W3C and IETF use UTF-8 as their default character set. Internally, the Java programming language uses a modified form of UTF-8 in .class files and for Java serialization.
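The byte counts are easy to check from Java. The sketch below (hypothetical class name) measures standard UTF-8 with String.getBytes and, for comparison, the modified UTF-8 written by DataOutputStream.writeUTF. The two differ for NUL, which modified UTF-8 writes as two bytes, and for characters beyond the 16-bit range, which it writes as two three-byte surrogate sequences instead of one four-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: compares standard UTF-8 with the modified UTF-8
// produced by DataOutputStream.writeUTF.
final class Utf8Lengths {

    // Bytes needed by standard UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed by modified UTF-8, excluding writeUTF's 2-byte length prefix.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size() - 2;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                     // US-ASCII letter
            "\u00A9",                                // copyright sign
            "\u2122",                                // trademark sign
            new String(Character.toChars(0x1F600)),  // beyond the 16-bit range
            String.valueOf((char) 0)                 // NUL
        };
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8 %d byte(s), modified UTF-8 %d byte(s)%n",
                    s.codePointAt(0), utf8Length(s), modifiedUtf8Length(s));
        }
    }
}
```

For these samples, standard UTF-8 needs 1, 2, 3, 4, and 1 bytes; modified UTF-8 needs 1, 2, 3, 6, and 2.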

See Listing 7.9 on page 280 for an example of a Web application that uses the UTF-8 charset.

   


Core JSTL[c] Mastering the JSP Standard Tag Library
ISBN: 131001531
Year: 2005
Pages: 124