Casing | Developing International Software

Glossary

Case-folding: Taking a string of text and converting everything into either lowercase or uppercase.
Lowercase: Denotes letters that are not capitalized. For instance, the word "nationality" is all lowercase. The notion of lowercase does not apply to East Asian and Middle Eastern scripts.
Uppercase: Denotes letters that are capitalized. For instance, acronyms (such as "HTML") typically consist of all uppercase letters. The notion of uppercase does not apply to East Asian and Middle Eastern scripts.

Another thing you'll need to consider when creating a locale-aware application involves linguistic nuances. These nuances might seem trivial, but they can have a large impact on application design and functionality. For example, Windows allows you to convert characters into their uppercase or lowercase equivalents. Some applications use this feature to automatically capitalize the first letter of every sentence, or to assume that certain types of words should always be capitalized. In Russian, however, the names of the days of the week are never capitalized: capitalizing the word for "Wednesday" changes its meaning to "environment," and capitalizing the word for "Sunday" changes its meaning to "resurrection."

In the past, as localized products were developed, language-sensitive issues such as casing were sometimes handled with what were thought of as well-designed, intelligent algorithms. For example, an uppercasing macro that relies on the code-point numbers of ASCII characters and the linear relationship between uppercase characters (A = 0x41) and lowercase characters (a = 0x61) can be written as:

 /* English-centric: assumes every character above 'Z' is a lowercase ASCII letter */
 #define ToUpper(ch)  ((ch)<='Z' ? (ch) : (ch)+'A' - 'a')

You can see the problems this English-centric approach presents when it is applied to non-Latin scripts, or to languages with accented characters, where the character mapping doesn't follow the assumed relationship between lowercase and uppercase characters. There are several other reasons why algorithmic solutions for case-folding do not cover all cases.
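To make the failure concrete, the following C sketch applies the macro exactly as written to single-byte strings. It assumes single-byte text with bytes treated as unsigned, and the helper name naive_upper_bytes is hypothetical. ASCII punctuation above 'Z' is shifted as though it were a lowercase letter ("hello [world]" becomes "HELLO ;WORLD="), and Latin-1 bytes fare no better: 0xFF (ÿ) becomes 0xDF (ß), and 0xDF (ß) becomes 0xBF (¿).

```c
#include <assert.h>
#include <stddef.h>

/* The English-centric macro from the text: anything above 'Z' is
   assumed to be a lowercase ASCII letter and shifted down by 0x20. */
#define ToUpper(ch)  ((ch)<='Z' ? (ch) : (ch)+'A' - 'a')

/* Hypothetical helper: applies ToUpper to every byte of a
   NUL-terminated buffer, in place. Bytes are handled as unsigned so
   that values above 0x7F (for example, Latin-1 accented letters)
   reach the macro as large positive numbers. */
void naive_upper_bytes(unsigned char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        *s = (unsigned char)ToUpper(*s);
}
```

Note that a char-based version would behave differently again on platforms where char is signed, because bytes above 0x7F would then compare as negative and pass through unchanged.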

First, some languages do not have a one-to-one mapping between their uppercase and lowercase characters. For instance, the uppercase equivalent of the German "ß" is "SS." Second, some characters have different mappings depending on the language in which they are used. For example, the lowercase "i" in English maps to a dotless uppercase letter: "I." However, in Turkish the lowercase "i" maps to a dotted uppercase letter: "İ." Finally, most non-Latin scripts do not use the concept of lowercase and uppercase at all, as is the case with Chinese, Japanese, and Korean; Arabic, Farsi, and Hebrew; and Thai. For example, because Farsi has no notion of uppercasing, running the macro above on a Farsi string produces random and unsupported glyphs. (See Figure 4-20.)
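The two irregular mappings just described can be sketched in C with a tiny hand-written table. This is a hypothetical illustration, assuming UTF-8 input: demo_upper_utf8 and its turkish flag are not a real API, and production code should rely on the operating system's casing functions rather than a table like this. The sketch shows that one input character can become two output characters, and that the result for the same input depends on the language.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo, NOT a real casing API: uppercases a UTF-8 string
   using only a tiny hand-written table. It covers just enough to show
   the problems described in the text:
     - German U+00DF (sharp s, bytes C3 9F) becomes the two letters "SS";
     - with turkish != 0, ASCII 'i' becomes U+0130 (capital I with dot
       above, bytes C4 B0) instead of 'I'.
   All other non-ASCII bytes are copied through unchanged.
   dst must hold at least 2 * strlen(src) + 1 bytes. */
size_t demo_upper_utf8(const char *src, int turkish, char *dst, size_t cap)
{
    size_t n = strlen(src);
    size_t out = 0;

    assert(cap >= 2 * n + 1);
    for (size_t i = 0; i < n; ++i) {
        unsigned char c = (unsigned char)src[i];
        if (c == 0xC3 && i + 1 < n && (unsigned char)src[i + 1] == 0x9F) {
            dst[out++] = 'S';            /* one character in, two out */
            dst[out++] = 'S';
            ++i;                         /* skip second byte of U+00DF */
        } else if (c == 'i' && turkish) {
            dst[out++] = (char)0xC4;     /* U+0130 in UTF-8 */
            dst[out++] = (char)0xB0;
        } else if (c >= 'a' && c <= 'z') {
            dst[out++] = (char)(c - 'a' + 'A');
        } else {
            dst[out++] = (char)c;
        }
    }
    dst[out] = '\0';
    return out;                          /* bytes written, excluding NUL */
}
```

Because of mappings like ß to "SS," a caller cannot assume that an uppercased string has the same number of characters as the original; the sketch sizes the destination buffer conservatively for that reason.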


Figure 4-20 - The English-centric uppercasing macro used on an English string and on a Farsi string, where the notion of casing does not exist.

Casing in Win32

The CharUpper and CharUpperBuff functions convert the lowercase characters of a string or a buffer, respectively, to uppercase characters. This uppercasing is done with regard to the currently selected user-locale value and the linguistic uppercasing rules associated with that locale. The CharLower and CharLowerBuff functions convert the uppercase characters of a string or a buffer, respectively, to lowercase characters. This lowercasing is done with regard to the currently selected user-locale value and the linguistic lowercasing rules associated with that locale.

If you want to perform the casing operation based on locale standards other than the currently selected user-locale rules (something that the functions just mentioned do not allow), you can use LCMapString as shown here:

 int LCMapString(
     LCID Locale,        // locale identifier whose rules will be used
                         //    to perform the casing
     DWORD dwMapFlags,   // mapping transformation (LCMAP_LOWERCASE or
                         //    LCMAP_UPPERCASE)
     LPCTSTR lpSrcStr,   // source string
     int cchSrc,         // number of characters in source string
     LPTSTR lpDestStr,   // destination buffer
     int cchDest         // size of destination buffer
 );

Casing in Web Pages

Scripts running in a browser might need to manipulate character casing. VBScript and Microsoft JScript both provide case-conversion functions that operate on multilingual input. In VBScript, use UCase to convert a string to uppercase and LCase to convert a string to lowercase; in JScript, use String.toUpperCase and String.toLowerCase, respectively.

The following example uses the UCase function to return an uppercase version of a string:

 Dim MyWord
 MyWord = UCase("Hello World")   ' Returns "HELLO WORLD".

The scripting languages do not offer the same flexibility for manipulating case that the NLS APIs provide in Win32 programming. However, the available functions let you change the case of any string, although the conversion is not locale-sensitive.

Casing in the .NET Framework

The String class provides a set of methods you can use to perform culture-sensitive string manipulation once you set CurrentCulture to the desired culture. The String.ToUpper and String.ToLower methods convert a string to uppercase or lowercase using the casing rules of the current culture; overloads that take a CultureInfo argument let you apply the rules of a different culture.

Another major area that pertains to locale awareness is sorting in a way that matches the particular locale. The sections that follow examine the best way to accommodate the multiple sort orders that exist in various countries and regions. They will also illustrate the most efficient ways to perform string comparison for Win32 applications and in the .NET Framework.