To create software that is locale-aware, you'll need to first understand what is meant by the term "locale," as well as the vital role locale variables play in the development process. As you'll see, the manner in which locales are handled and interpreted differs depending on whether you're working in a traditional Win32 programming environment or within the .NET Framework. The following sections will help you understand the locale settings you'll encounter among various systems and environments.
Merriam-Webster's Collegiate Dictionary defines "locale" as "a place or locality." In Windows, this notion has been expanded to define some of the system components that deal with international support. The terminology related to locale variables has been misused in some cases, and sometimes terms differ according to whether you're dealing with Windows 2000, Windows XP, or the .NET Framework. Because you will work with several of the following Windows locale variables when creating locale-aware applications, you need to understand clearly, starting from an end-user point of view, what each variable does and how it affects the behavior of the system and of applications. (For more information on Windows locale variables, see the table at http://www.microsoft.com/globaldev/reference/localetable.asp.)
The different languages and scripts that an operating system can support, once the user installs them, are known as "language groups" in Windows 2000. Generally speaking, a language group contains code-page information, keyboard layouts, and fonts. Some language groups also have a scripting engine, which enables the user to edit supported scripts within the operating system.
With Windows 2000 the user can add support for many languages and scripts (including, among others, Western European, Central European, Arabic, Indic, and Turkic languages). Since Windows 2000 and Windows XP are both single, world-ready binaries, functionality for all supported scripts is available on all language versions of these operating systems. If additional languages are needed, the user only needs to install these separately either during or after setup.
In Windows XP, Microsoft has simplified the process of installing these groups by gathering languages with similar properties into a language collection. On English Windows XP, support for all European languages (including Baltic languages, Cyrillic languages, Greek, and Turkic languages), known as the "Basic Language Collection," is installed by default. Additional language collections (such as East Asian for Japanese, Chinese, and Korean, as well as complex scripts for Arabic, Hebrew, the Indic family of languages, and Thai) can also be installed. (After installing a language group or collection, the user will need to restart the computer.)
A Spanish user of English Windows XP is a translator of Telugu (from the Indic family of languages). In order to type Telugu in Notepad, for example, the user needs to install support for complex scripts. Support for additional languages cannot be installed programmatically. The user must either install it manually, or unattended setup scripts must be used.
Figure 4-1 shows a system in which support for complex scripts, for right-to-left languages, and for East Asian languages has been installed.
Figure 4-1 - Regional And Language Options property sheet of Windows XP, where East Asian, complex script, and right-to-left language support have been installed.
The system locale determines which code page, such as Arabic 1256 or Korean 949, is used on the system by default. As explained in Chapter 3, "Unicode," operating systems that use Unicode as their native encoding, such as Windows 2000 and Windows XP, use the system locale's code page to convert text data between Unicode and that code page whenever they deal with legacy non-Unicode applications. In fact, in Windows XP the system locale is called "Language for non-Unicode programs." Only applications that do not use Unicode as their default character-encoding mechanism are affected by this setting; therefore, applications that are already Unicode-encoded can safely ignore the value and functionality of this setting. (See Figure 4-2.) Support for new scripts in Windows 2000 and Windows XP (such as for the Indic family, Armenian, and Georgian languages) has been provided through Unicode encoding and, therefore, these scripts are also free of any system-locale limitations.
Figure 4-2 - System locale can be set from the Advanced tab of the Regional And Language Options property sheet in Windows XP.
In this example, the system locale has been set to Romanian (Windows code page 1250 and Original Equipment Manufacturer [OEM] code page 852), a Central European language. This means that all Central European-language applications that are based on code pages can run safely in this configuration, since they are part of the same language collection.
So, for example, a Polish application (another Central European language) does not need a system-locale change. Also, since English is part of the invariant American Standard Code for Information Interchange (ASCII) range of all code pages, English legacy applications will always run properly.
Sometimes there is no noticeable difference between two system locales. For example, the German (Standard) and German (Austria) system locales are identical, since they share the same OEM and Windows code pages; therefore, the behavior of non-Unicode-based applications will be identical in both scenarios. In general, the system locales of a language group are very similar and might differ only in the OEM code page.
As its name suggests, the system locale is a unique setting for each system. Only an administrator has the right to change the system locale, and the computer will need to be restarted in order for changes to take effect. The administrator can select a system locale only if the appropriate language group, along with its associated script support, is installed.
The system locale is a system variable that cannot be changed programmatically. The only way to change it is for an administrator to do so manually.
The user locale determines which default settings a user wants for formatting dates, times, currency, and large numbers. Although it's presented as a language (in some cases combined with a country), it's not a language setting. That is, choosing the Hebrew user locale means that the user wants to adhere to the standards of Israel, not those of the Hebrew language as such. To avoid any confusion with this naming, the .NET Framework calls the user locale "culture information."
As its name implies, the user locale (known in Windows XP as "Standards and formats") is a variable that each individual user can set. This can be done on the fly by selecting changes from the Regional Options tab in the Regional And Language Options property sheet. (See Figure 4-3.) Locale-aware applications should use this value to display formatted data.
Figure 4-3 - User locale set to Konkani.
When changes are made, all locale-aware applications should monitor the window message WM_SETTINGCHANGE and should be able to update their displayed data accordingly. Numbers, currency, date, and time are some of the variables that are affected by the user-locale setting. The user locale is a user variable that cannot be changed programmatically. The only way to change it is for the user to do so manually.
An American user living in Monaco and running an English version of Windows XP sets the user locale to French (Monaco) in order to use the local standards for date, time, and currency.
Known as "input locale" in Windows 2000 and "input language" in Windows XP, this variable describes a language a user wants to enter into an application (not necessarily type) and the method of input. There can be multiple input locales installed and the user can switch between them. The default input locale is the locale that is active when a new application is started (or in some applications, when a new window is opened). Switching to a different input locale is done on a per-thread basis; that is, you can have two different input locales in two different applications.
Figure 4-4 - Each user can add and remove input languages from the Languages tab of the Regional And Language Options property sheet.
Figure 4-5 - List of installed input locales. The user can toggle between them by using right Alt+Shift or by selecting one of them from the taskbar.
This variable is available in Windows XP (and was also available in Microsoft Windows Millennium Edition [Windows Me] though not in Windows 2000) to define the country or location where the user lives. Each user can change this variable on the fly by selecting changes from the Regional Options tab of the Regional And Language Options property sheet. Any changes made are also applied on the fly. By selecting a particular location, the user has set a variable that a Web service (such as one that deals with weather) can check, thus allowing the Web service to deliver information and services specific to the region or country the user has selected.
A user traveling to Malaysia wants to get the weather forecast for that locale and sets the location to that country. (See Figure 4-6 on the following page.) The location is a user variable that cannot be changed programmatically. The only way to change it is for the user to do so manually.
Figure 4-6 - Regional Options with the location variable set to Malaysia.
The thread locale defaults to the currently selected user locale and determines the formatting of dates, times, currency, and large numbers for the thread. It can be changed programmatically using the API SetThreadLocale, but in most cases the thread locale should not be overwritten.
On Microsoft Windows NT 4, many applications used the thread locale to define which language resources should be retrieved and displayed. However, this practice is a misuse of the thread locale. (In Windows 2000 and Windows XP, the system's resource loader does not default to the thread locale variable.) As you'll see, resource languages should always be driven by and follow the user interface (UI) language variable.
A financial stock-trading application for the New York Stock Exchange used in a bank worldwide has to display the time, date, and stock prices in U.S. formats. This application uses SetThreadLocale to set the thread locale to English (United States) and can use the NLS APIs to format dates, times, and stock prices.
This variable allows each user to select the language of the UI for such things as dialog boxes, menus, and Help files. This option is only available on the MUI Pack of Windows XP Professional and on the MultiLanguage version of Windows 2000 Professional. (For more information on MUI, see Chapter 6, "Multilingual User Interface [MUI].")
It's important to distinguish between the system UI language and the user UI language. Though it is true that the user UI language is sometimes the same as the system UI, in other instances it is not. The system language is the language of the localized version that was used to set up Windows 2000 Professional or Windows XP Professional. All menus, dialog boxes, error messages, and Help files are in this language, except on multilanguage versions (such as on the MUI Pack of Windows XP Professional and the MultiLanguage version of Windows 2000 Professional), where the user can select a different language.
The user UI language on a non-MUI machine would be the same as the system UI language. With MUI, however, the user can change the language by clicking the Languages tab within the Regional And Language Options property sheet. (See Figure 4-7.) To see the effect of this change, the user will have to log off and then log back on.
Figure 4-7 - The UI language option is only available on MUI systems (and on the MultiLanguage version of Windows 2000 Professional). In this case the UI language is being set to Hebrew (though changes have not yet been applied).
MUI is only available on Microsoft Windows 2000 Professional, Microsoft Windows 2000 Server Family, Microsoft Windows XP Professional, and Microsoft Windows .NET Server Family. The UI language is a user variable that cannot be changed programmatically. The only way to change it is for the user to do so manually.
Microsoft Internet Explorer versions 4 and later share the same functionality across all Windows platforms. Since versions of Windows prior to Windows 2000 did not offer the same flexibility in terms of locale selection (no distinction between user and system locale, for example), Internet Explorer allows the user to select the browser language setting. Web sites can use this setting to offer their content in the user's selected language and to format and display data using the selected locale standards, as in the case of http://www.msn.com. To try out the browser language setting, from Internet Options, click the Languages button (see Figure 4-8), set your browser language to French (France), for example, and go to the MSN Hotmail site at http://www.hotmail.com. MSN Hotmail reads the selected browser setting and redirects you to the French Hotmail version.
All the locale variables described previously function independently of one another, and changing one of them does not affect the setting of the other variables. To summarize this section, here's an example involving an English version of Windows XP.
Figure 4-8 - The browser language setting can be accessed through Internet Options on the Tools menu. Click the Languages button.
Michelle from Chile is living in the United States. Her location is set to the United States because she wants to use a national Internet Service Provider (ISP) and also wants to get the weather forecast for the United States. Her user locale is set to Spanish-Chile because she wants to see formatted data within her native country's standards.
Michelle is also fluent in Korean. She recently purchased a Korean word processor (a non-Unicode application) that she wants to run on her English machine. She sets her system locale to Korean. To start using her application, in addition to using the English keyboard, she installs a Korean IME as a second input locale. Her husband, sharing the same computer and not being comfortable with English, can set the UI language to Spanish for his own account (if running an MUI version)!
Now that you have seen the basic functionality of locale variables, and their impact on the system's and an application's behavior, you will be better equipped to apply the technical concepts discussed in the remainder of the chapter. The section that follows will help you understand how to use locale variables when dealing with Win32 applications. It will also discuss working with NLS APIs, language IDs, and sort IDs.
To provide you with a general comprehension of the settings you'll encounter among various systems and environments, the previous section discussed all locale variables. However, among all the locale variables described, only the user locale, the location, the browser language setting, and the thread locale deal with locale awareness in applications. These particular variables will be the focus of the remainder of this chapter.
As mentioned previously, NLS APIs are used in the traditional Win32 programming paradigm in order to provide culture-sensitive or locale-sensitive information. NLS is an unfortunate misnaming, since the settings it deals with are not always "national," but instead can be cultural or even custom-defined. NLS APIs provide the right tools to eliminate any locale-specific assumptions from your application, allowing you to represent data in the formatting that corresponds to the currently selected user locale. For example, suppose that you want to represent the system time; the English-centric way of doing it would be something like:
    SYSTEMTIME SystemTime;
    GetSystemTime(&SystemTime);  // current UTC date and time
    // Hard-codes the month/day/year order regardless of the user's locale.
    _stprintf(g_szTemp, TEXT("today is: %d/%d/%d"),
              SystemTime.wMonth, SystemTime.wDay, SystemTime.wYear);
Without taking into account the type of date and calendar format that the user is expecting, the English (United States) Gregorian date format of mm/dd/yyyy is imposed here. To overcome this problem, NLS APIs provide three sets of APIs that allow you to:
The modular approach would be to query the selected user locale, get the default calendar format for that retrieved locale, and format the system time using the default calendar format. A common argument to use for almost all of the NLS APIs is a locale ID, or LCID. An LCID is a combination of a language ID (LANGID) and a sort ID. Here is how it works.
A language ID is a WORD value composed of a primary language combined with a sublanguage ID that defines the country, region, or even a writing system where this language is used. For example, English is a primary language and Australia is a sublanguage, yielding the English (Australia) language ID. (For the list of system-defined LANGIDs, see the Microsoft Windows Platform SDK at http://msdn.microsoft.com, and Appendix D, "Table of Language Identifiers.")
For most locales, the LANGID is sufficient to clearly identify the cultural characteristics of a given language. However, a few locales have an additional parameter that distinguishes one variation from another; that parameter is the sort order, or the way characters (or ideographs) are sorted. For example, the German (Germany) language has two types of sorting, Dictionary and Phone Book (DIN), that treat umlaut characters differently. To identify these two versions of sort order for German (Germany), the notion of LANGID has been expanded with a sort ID to create a DWORD value called the "LCID." (See Figure 4-9.)
Figure 4-9 - LANGIDs and Locale IDs.
As mentioned, for most of the locales a sort ID is unnecessary. Therefore, the SORT_DEFAULT value is set in their respective LCIDs. (For a list of Windows-supported sort methods, see Appendix F, "Sorting Methods for Select Languages.")
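The way a primary language, a sublanguage, and a sort ID pack into a LANGID and then an LCID can be sketched in portable C. This is only an illustration: the function names below are hypothetical stand-ins written for this example (in real Win32 code you would use the MAKELANGID and MAKELCID macros from the Windows headers), and the numeric constants are the values the Windows SDK headers assign to the corresponding LANG_, SUBLANG_, and SORT_ names.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the Windows SDK headers, repeated here so the
   sketch compiles without <windows.h>. */
enum {
    EX_LANG_ENGLISH           = 0x09,
    EX_LANG_GERMAN            = 0x07,
    EX_SUBLANG_ENGLISH_AUS    = 0x03,
    EX_SUBLANG_GERMAN         = 0x01,
    EX_SORT_DEFAULT           = 0x0,
    EX_SORT_GERMAN_PHONE_BOOK = 0x1
};

/* LANGID (16 bits): primary language in bits 0-9, sublanguage in bits 10-15. */
uint16_t make_langid(unsigned primary, unsigned sub)
{
    assert(primary <= 0x3FF && sub <= 0x3F);
    return (uint16_t)((sub << 10) | primary);
}

/* LCID (32 bits): LANGID in bits 0-15, sort ID in bits 16-19. */
uint32_t make_lcid(uint16_t langid, unsigned sort_id)
{
    assert(sort_id <= 0xF);
    return ((uint32_t)sort_id << 16) | langid;
}

/* Accessors that take an LCID or LANGID apart again. */
uint16_t langid_from_lcid(uint32_t lcid)  { return (uint16_t)(lcid & 0xFFFF); }
unsigned sort_id_from_lcid(uint32_t lcid) { return (unsigned)((lcid >> 16) & 0xF); }
unsigned primary_lang(uint16_t langid)    { return langid & 0x3FFu; }
unsigned sub_lang(uint16_t langid)        { return (unsigned)(langid >> 10); }
```

With these helpers, English (Australia) comes out as LANGID 0x0C09, German (Germany) with the default sort as LCID 0x00000407, and German (Germany) with Phone Book sorting as LCID 0x00010407, so the two German sort orders differ only in the sort ID field.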
The following section will discuss how to work with locale-specific information within the .NET Framework. As you'll see, culture codes, subculture codes, and the CultureInfo class all play an important role within this framework.
The .NET Framework took a new approach to the definition of locale identification and corrected the misnaming introduced by NLS APIs with its introduction of culture codes (associated with a language) and subculture codes (associated with a country or region). The CultureInfo class from the System.Globalization namespace contains culture-specific information, such as the language, country or region, calendar, and other cultural conventions. This class also provides the information required for performing culture-specific operations, such as casing, formatting dates and numbers, and comparing strings.
The CultureInfo class specifies a unique name for each culture based on the Request for Comments (RFC) 1766 from the Internet Engineering Task Force (IETF). (For more information, go to http://www.ietf.org.) The name is a combination of a two-letter lowercase culture code associated with a language (known as a "primary language ID" in traditional NLS) and, if required, a two-letter uppercase subculture code (known as a "sublanguage" in traditional NLS) associated with a country, region, or even a writing system. The subculture code follows the culture code, separated by a dash (-). Examples include "ja-JP" for Japanese in Japan, "en-US" for U.S. English, or "de-DE" for German in Germany (as opposed to an alternate such as "de-AT" for German in Austria). In this context, a neutral culture is specified by only the two-letter lowercase culture code. For example, "fr" specifies the neutral culture for French, and "de" specifies the neutral culture for German. A specific culture is identified by the culture code followed by the two-letter uppercase subculture code. For example, "fr-FR" specifies French in France and "fr-CA" specifies French in Canada.
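The naming rule just described (a two-letter lowercase culture code, optionally followed by a dash and a two-letter uppercase subculture code) can be checked mechanically. The following is a minimal sketch in portable C, not the CultureInfo implementation: the culture_name type and both functions are hypothetical names invented for this illustration, and the parser accepts only the plain "xx" and "xx-YY" forms.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical holder for the two parts of a culture name. */
typedef struct {
    char culture[3];     /* two-letter lowercase culture code, e.g. "fr" */
    char subculture[3];  /* two-letter uppercase subculture code, or ""  */
} culture_name;

/* Parses "xx" (neutral culture) or "xx-YY" (specific culture).
   Returns 1 and fills *out on success; returns 0 and leaves *out
   untouched if the name does not follow either form. */
int parse_culture_name(const char *name, culture_name *out)
{
    assert(name != NULL && out != NULL);

    size_t len = strlen(name);
    if (len != 2 && len != 5)
        return 0;
    if (!islower((unsigned char)name[0]) || !islower((unsigned char)name[1]))
        return 0;
    if (len == 5 && (name[2] != '-' ||
                     !isupper((unsigned char)name[3]) ||
                     !isupper((unsigned char)name[4])))
        return 0;

    memcpy(out->culture, name, 2);
    out->culture[2] = '\0';
    if (len == 5) {
        memcpy(out->subculture, name + 3, 2);
        out->subculture[2] = '\0';
    } else {
        out->subculture[0] = '\0';
    }
    return 1;
}

/* A neutral culture has a culture code but no subculture code. */
int is_neutral_culture(const culture_name *c)
{
    assert(c != NULL);
    return c->subculture[0] == '\0';
}
```

Parsing "fr-CA" yields the culture code "fr" and the subculture code "CA", while "fr" alone parses as a neutral culture. Names that carry an extra suffix are deliberately rejected by this sketch, so a fuller parser would need an additional branch for them.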
Some culture names have suffixes that specify the sort order or the script; for example, "-Cyrl" specifies the Cyrillic script and "-Latn" specifies the Latin script. Thus Azeri (Cyrillic)-Azerbaijan would be indicated by "az-AZ-Cyrl". The culture identifier is the hexadecimal value associated with each culture name and is equivalent to the NLS LCID. (For a complete list of culture names and identifiers, see Appendix K, "CultureInfo Names and Identifiers in the Microsoft .NET Framework Library.")
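Because a culture identifier is equivalent to an NLS LCID, the relationship between the names above and their hexadecimal values can be shown with a small lookup table. This is a sketch in portable C under stated assumptions: the table holds only a handful of well-known name and identifier pairs rather than the full list in the appendix, the culture_identifier function is a hypothetical helper written for this example, and returning 0 for an unknown name is a convention chosen here, not framework behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;  /* culture name as used by CultureInfo */
    uint32_t    id;    /* culture identifier, equal to the NLS LCID */
} culture_entry;

/* A few well-known pairs; illustrative, not exhaustive. */
static const culture_entry k_cultures[] = {
    { "en-US",      0x0409 },
    { "de-DE",      0x0407 },
    { "de-AT",      0x0C07 },
    { "fr-FR",      0x040C },
    { "fr-CA",      0x0C0C },
    { "ja-JP",      0x0411 },
    { "az-AZ-Cyrl", 0x082C }
};

/* Returns the identifier for a culture name in the table above,
   or 0 if the name is not listed (a convention of this sketch). */
uint32_t culture_identifier(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof k_cultures / sizeof k_cultures[0]; ++i) {
        if (strcmp(k_cultures[i].name, name) == 0)
            return k_cultures[i].id;
    }
    return 0;
}
```

The low 10 bits of each identifier carry the primary language, which is why "fr-FR" (0x040C) and "fr-CA" (0x0C0C) both end in 0x0C, and "de-DE" (0x0407) and "de-AT" (0x0C07) both end in 0x07; only the sublanguage bits differ.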
The RegionInfo class, also from the System.Globalization namespace, contains information about a given country or region. In contrast with CultureInfo, RegionInfo does not represent user preferences and does not depend on the user's language or culture. A good example of this category of information is the RegionInfo.IsMetric property, which returns a Boolean value indicating whether the country or region uses the metric system for measurements. Another example is the RegionInfo.CurrencySymbol property, which returns the default currency symbol of a given region or country.
The RegionInfo.Name property is one of the two-letter codes defined in the International Organization for Standardization (ISO) 3166 for representing a particular country or region. (For a complete list of region names and identifiers, see Appendix L, "RegionInfo Names and Identifiers in the Microsoft .NET Framework Library.")
The sections that follow will show you the specific tasks involved in writing an application that is locale-aware. All code samples in the rest of this chapter are given in the following programming languages: