Sort Orders


Sorting (also called collation) differs in many ways from culture to culture. It is one of the many areas that the .NET Framework handles correctly with very little intervention on the part of the developer, other than to specify which culture should be used for sorting. Table 6.6 provides a number of examples of characters and character combinations that sort differently from one language to another.

Table 6.6. Examples of Characters with Different Sort Behaviors in Different Languages

In some languages, letters with diacritics are treated as wholly separate characters:

In Swedish, Ä (U+00C4, Latin Capital Letter A With Diaeresis) is a separate character after Z

In German (phone book sort), Ä (U+00C4, Latin Capital Letter A With Diaeresis) sorts like ae

In Czech, č is a separate character between c and d

In Czech, š is a separate character between s and t

In Czech, ž is a separate character after z

In some languages, two characters sort as a single character:

In Czech, ch sorts as a single character between h and i

In Traditional Spanish, ch sorts as a single character between c and d

In Traditional Spanish, ll sorts as a single character between l and m

In Danish, Æ sorts as a single character after Z

In some languages, letters have the same sort as other letters:

In Lithuanian, y is sorted as i

In Swedish, w is sorted as v


Fortunately, developers do not need to remember or even know these differences, only that there are differences. Take, for example, the Array.Sort method. This method accepts an IComparer interface to sort elements of an array. If the IComparer is null, Array.Sort uses each element's IComparable interface to determine the order of a sort. IComparable has a single method, CompareTo. The String class supports the IComparable interface and includes the CompareTo method. The String.CompareTo method uses the CultureInfo.CurrentCulture.CompareInfo.Compare method to perform a culture-sensitive comparison between two strings. In the following code snippet, the en-US culture is used to sort two strings:

    string[] strings = new string[] {"eé", "ée"};
    Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
    Array.Sort(strings);
    foreach (string s in strings)
    {
        listBox1.Items.Add(s);
    }


The output is "eé" and then "ée". If we change the culture to French in France (fr-FR), in which diacritics (e.g., the acute accent above the e) are evaluated from right to left instead of left to right, the order is reversed. The point is that the sort order will respect the local culture's sort behavior without you having to know what that behavior is.
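The same comparison can be made without changing the thread's culture. The following is a minimal sketch, not code from the book: the class and method names (CultureSortSketch, SortWithCulture) are hypothetical, and it relies on StringComparer.Create, which is available from the .NET Framework 2.0 onward.

```csharp
using System;
using System.Globalization;

public static class CultureSortSketch
{
    // Hypothetical helper: returns a sorted copy of the input using the
    // named culture's comparison rules. The thread culture is not changed.
    public static string[] SortWithCulture(string[] values, string cultureName)
    {
        string[] copy = (string[])values.Clone();
        CultureInfo culture = new CultureInfo(cultureName);
        // StringComparer.Create builds a culture-sensitive comparer;
        // false means that case is not ignored.
        Array.Sort(copy, StringComparer.Create(culture, false));
        return copy;
    }
}
```

Calling SortWithCulture once with "en-US" and once with "fr-FR" on the two strings above lets you compare the two orders side by side. The reversal described in the text is the .NET Framework's behavior; other runtimes may use different collation data, so treat the result as something to verify rather than assume.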

This returns us to the String.CompareTo, String.Compare, and String.CompareOrdinal methods that we saw earlier in the "String Comparisons" section. The single difference between String.CompareTo and String.Compare is that the former is an instance method and the latter is a static method. As we have seen, the String.CompareTo method exists so that the String class can support the IComparable interface. In nearly all cases, String.CompareTo and String.Compare call CompareInfo.Compare to perform string comparisons. The various overloads either accept an explicit culture or, like most globalization methods, default the culture to CultureInfo.CurrentCulture. The return value is an integer indicating the relative order of the two strings:

Negative if the first string is sorted before the second

0 if the first string is equal to the second

Positive if the first string is sorted after the second

In most cases, the negative value will be -1 and the positive value will be +1. The exception is the String.Compare overload in the .NET Framework 2.0, which accepts a StringComparison enumeration, when the value is Ordinal or OrdinalIgnoreCase. In this scenario, the "magnitude" of the difference is expressed in the same way as for String.CompareOrdinal.
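Because the magnitude varies between overloads, calling code is safest when it inspects only the sign of the result. A minimal sketch of that habit follows; the class and method names are hypothetical and are not part of the .NET Framework.

```csharp
using System;

public static class CompareResultSketch
{
    // Hypothetical helper: turns a comparison result into a description.
    // Only the sign is inspected, never the magnitude.
    public static string Describe(int compareResult)
    {
        if (compareResult < 0) return "before";
        if (compareResult > 0) return "after";
        return "equal";
    }

    // Compares two strings ordinally and describes where the first
    // string sorts relative to the second.
    public static string DescribeOrdinal(string first, string second)
    {
        return Describe(String.Compare(first, second, StringComparison.Ordinal));
    }
}
```

With this approach, a result of -1 and a result of -128 are treated identically, so the code behaves the same whichever comparison overload produced the value.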

String.CompareOrdinal is similar to String.Compare, but it performs the comparison based upon the Unicode code points of each character in the string and returns the "magnitude" of the difference. For example, String.CompareOrdinal("a", "á") returns -128. The Unicode code point of the letter "a" is U+0061 (97), and the Unicode code point of the letter "á" is U+00E1 (225). The result is 97 - 225 (i.e., -128). There are several benefits to using String.CompareOrdinal: It is culture-insensitive (because it uses Unicode code points), and it is faster than other comparison methods. It should also be noted that String.CompareOrdinal compares all characters in a string, whereas other comparison methods are dependent upon characters being defined in the .NET Framework's sorting tables. This means that if a comparison is performed using String.Compare and is passed the invariant culture, characters that are not in the .NET Framework's sorting tables will simply be ignored.
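The arithmetic behind the -128 can be reproduced directly from the code points. This is an illustrative sketch with a hypothetical helper name; it shows the subtraction for a single pair of characters and is not a description of how String.CompareOrdinal is implemented internally.

```csharp
using System;

public static class OrdinalDifferenceSketch
{
    // Hypothetical helper: subtracts the UTF-16 code unit of the second
    // character from that of the first.
    public static int CodePointDifference(char first, char second)
    {
        return (int)first - (int)second;
    }

    // Returns only the sign (-1, 0, or 1) of an ordinal comparison.
    public static int OrdinalSign(string first, string second)
    {
        return Math.Sign(String.CompareOrdinal(first, second));
    }
}
```

CodePointDifference('a', '\u00E1') evaluates to 97 - 225, which is -128, and OrdinalSign("a", "\u00E1") is -1. Code that depends on the exact magnitude is fragile, so checking the sign is the safer habit.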

For these reasons, Microsoft recommends using an ordinal comparison for culture-insensitive comparisons. Additionally, it should be noted that although the actual sort itself is unlikely to yield results that are culturally significant for any particular culture, it can still be useful for maintaining ordered lists that require fast searching.
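As an illustration of an ordered list kept purely for fast searching, the following sketch sorts keys ordinally and then looks them up with a binary search. The class and method names are hypothetical. The one requirement is that the same comparer is used for both the sort and the search.

```csharp
using System;

public static class OrdinalLookupSketch
{
    // Returns a copy of the keys sorted by ordinal (code point) order.
    public static string[] BuildIndex(string[] keys)
    {
        string[] index = (string[])keys.Clone();
        Array.Sort(index, StringComparer.Ordinal);
        return index;
    }

    // Binary search is only valid when the array was sorted with the
    // same comparer that is used for the search.
    public static bool Contains(string[] index, string key)
    {
        return Array.BinarySearch(index, key, StringComparer.Ordinal) >= 0;
    }
}
```

The resulting order is not meaningful to a user (for example, all uppercase ASCII letters precede all lowercase ones), but it is stable across cultures, which is what a lookup structure needs.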

Alternate Sort Orders

The discussions on sort orders so far have assumed that each culture has a single method of sorting. However, a few cultures have more than one way of sorting the same data. All existing .NET Framework cultures have a default sort order, and a few have a single alternate sort order in addition to their default. Spanish, for example, has two sort orders: Modern/International (the default sort order that is typically used in Spain and the U.S.) and the Traditional alternate sort order (used less frequently in some situations in Spain). Each CultureInfo object has a single CompareInfo object that it uses for sorting. Using a different sort order requires creating a different culture. To use the Traditional sort order for Spanish, you must create a new CultureInfo object or create a CompareInfo object using CompareInfo.GetCompareInfo. The CultureInfo object for the alternate sort order is identical in every way to the CultureInfo object for the default sort order, with the exception of its CompareInfo object. This means that the culture's name is also the same. This presents a problem, then, in creating the CultureInfo object:

 CultureInfo cultureInfo = new CultureInfo("es-ES"); 


This code is ambiguous to the reader because there are two cultures that have the name "es-ES". When you use a string in the format <language>-<region> to identify a culture, you get the culture with the default sort order. Both the .NET Framework 1.1 and 2.0 enable you to create a culture for an alternate sort order using a locale ID (LCID):

 CultureInfo cultureInfo = new CultureInfo(0x0000040A); 


In addition, the .NET Framework 2.0 supports the creation of cultures for an alternate sort order using a language and region suffixed with the alternate sort order:

 CultureInfo cultureInfo = new CultureInfo("es-ES_tradnl"); 


The same name can be used with the new CultureInfo.GetCultureInfo method to get a cached read-only CultureInfo. The following two calls to CultureInfo.GetCultureInfo result in the same CultureInfo object:

    CultureInfo cultureInfo1 =
        CultureInfo.GetCultureInfo("es-ES", "es-ES_tradnl");
    CultureInfo cultureInfo2 =
        CultureInfo.GetCultureInfo("es-ES_tradnl");


This capability to specify a culture including an alternate sort order by name is an important enhancement to the .NET Framework because it means that all cultures can now be represented as strings. In contrast, to be able to represent all cultures (including cultures with alternate sort orders), code written for the .NET Framework 1.1 must support representing cultures using both strings (e.g., "es-ES") and also integers (e.g. 0x0000040A).

Table 6.7 is a list of all of the alternate sort orders recognized by the .NET Framework 1.1 and 2.0.

Table 6.7. Cultures with Alternate Sort Orders

Culture Name  Culture English Name     Default Sort Name  Alternate Sort Name      Alternate Sort LCID  CompareInfo Name
------------  -----------------------  -----------------  -----------------------  -------------------  ----------------
es-ES         Spanish (Spain)          International      Traditional              0x0000040A           es-ES_tradnl
zh-TW         Chinese (Taiwan)         Stroke Count       Bopomofo                 0x00030404           zh-TW_pronun
zh-CN         Chinese (China)          Pronunciation      Stroke Count             0x00020804           zh-CN_stroke
zh-HK         Chinese (Hong Kong SAR)  Stroke Count       Stroke Count             0x00020c04           zh-HK_stroke
zh-SG         Chinese (Singapore)      Pronunciation      Stroke Count             0x00021004           zh-SG_stroke
zh-MO         Chinese (Macau SAR)      Pronunciation      Stroke Count             0x00021404           zh-MO_stroke
ja-JP         Japanese (Japan)         Default            Unicode                  0x00010411           ja-JP_unicod
ko-KR         Korean (Korea)           Default            Korean Xwansung Unicode  0x00010412           ko-KR_unicod
de-DE         German (Germany)         Dictionary         Phone Book Sort DIN      0x00010407           de-DE_phoneb
hu-HU         Hungarian (Hungary)      Default            Technical Sort           0x0001040e           hu-HU_technl
ka-GE         Georgian (Georgia)       Traditional        Modern Sort              0x00010437           ka-GE_modern


Although the Japanese and Korean alternate Unicode sorts exist, they are unlikely to be used for any real purpose for sorting Japanese or Korean data. They are almost identical to the default sort, with the exception of one or two code points (e.g., Korean Won, Japanese Yen). They exist for compatibility.


The CompareInfo class has a Name property in the .NET Framework 2.0 (but not 1.1), which is the same as the CultureInfo.Name (e.g. "es-ES") for all default sort orders. This name can be used to specify cultures for alternate sort orders in the .NET Framework 2.0.

It is worth noting that, regardless of the data type (i.e., string or integer) used to create a CultureInfo object, the resulting CultureInfo.Name is the same as the name of the default sort order. The following example outputs "es-ES", "es-ES", and "es-ES":

    CultureInfo cultureInfo1 = new CultureInfo("es-ES");
    CultureInfo cultureInfo2 = new CultureInfo(0x0000040A);
    CultureInfo cultureInfo3 = new CultureInfo("es-ES_tradnl");
    listBox1.Items.Add(cultureInfo1.Name);
    listBox1.Items.Add(cultureInfo2.Name);
    listBox1.Items.Add(cultureInfo3.Name);


To distinguish between the different cultures, you should use either the LCID (in the .NET Framework 1.1 and 2.0) or, preferably, the CompareInfo.Name (in the .NET Framework 2.0).
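A small diagnostic helper makes the distinction visible. The sketch below is an assumption-laden illustration: the names are hypothetical, and it treats two cultures as sharing a sort order when their CompareInfo.Name values match, which follows the advice above but is not an API guarantee.

```csharp
using System;
using System.Globalization;

public static class SortOrderIdentitySketch
{
    // Hypothetical helper: lists the three identifiers discussed above
    // so that cultures sharing a Name can still be told apart.
    public static string Describe(CultureInfo culture)
    {
        return String.Format(CultureInfo.InvariantCulture,
            "Name={0}, LCID=0x{1:X8}, CompareInfo.Name={2}",
            culture.Name, culture.LCID, culture.CompareInfo.Name);
    }

    // Assumption: matching CompareInfo names imply the same sort order.
    public static bool HaveSameSortOrder(CultureInfo first, CultureInfo second)
    {
        return String.Equals(first.CompareInfo.Name, second.CompareInfo.Name,
            StringComparison.OrdinalIgnoreCase);
    }
}
```

Passing the three Spanish cultures from the previous example to Describe shows which values differ on your runtime. The exact strings depend on the framework version and operating system, so no expected output is given here.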

Unfortunately, the .NET Framework does not support any facility for programmatically discovering alternate sort orders. However, the Win32 EnumSystemLocales function accepts a parameter of LCID_ALTERNATE_SORTS, which does provide this functionality and enables you to offer a choice of sort orders to a user. The following class is a wrapper around this function, and the GetAlternateSortOrders method returns an array of LCIDs of alternate sort orders:

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Runtime.InteropServices;

    public class AlternateSortOrders
    {
        public static int[] GetAlternateSortOrders()
        {
            const uint LCID_ALTERNATE_SORTS = 4;
            alternateSortOrders = new List<int>();
            // The callback is invoked once for each alternate sort order
            EnumSystemLocales(new LocaleEnumProc(AlternateSortsCallback),
                LCID_ALTERNATE_SORTS);
            int[] alternateSortOrdersArray =
                new int[alternateSortOrders.Count];
            alternateSortOrders.CopyTo(alternateSortOrdersArray);
            return alternateSortOrdersArray;
        }

        protected delegate bool LocaleEnumProc(string lcidString);

        [DllImport("kernel32.dll")]
        protected static extern bool EnumSystemLocales(
            LocaleEnumProc lpLocaleEnumProc, uint dwFlags);

        protected static List<int> alternateSortOrders;

        protected static bool AlternateSortsCallback(string lcidString)
        {
            int LCID;
            // Each locale arrives as a hexadecimal string
            if (Int32.TryParse(lcidString,
                NumberStyles.AllowHexSpecifier, null, out LCID))
                alternateSortOrders.Add(LCID);
            // Returning true continues the enumeration
            return true;
        }
    }


The GetAlternateSortOrders method calls EnumSystemLocales and passes a method (AlternateSortsCallback) to call back for each alternate sort order and a flag (LCID_ALTERNATE_SORTS) specifying that only the alternate sorts should be enumerated. The AlternateSortsCallback method simply converts the LCID string to an integer and adds it to an internal list. When the EnumSystemLocales function has completed enumerating locales, the GetAlternateSortOrders method converts the list of integers to an array of integers and returns the array.
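To offer these sort orders to a user, the LCIDs need to be turned into CultureInfo objects. The following sketch shows one way to do that; the class and method names are hypothetical, and it assumes that an LCID the runtime does not recognize should simply be skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class AlternateSortCulturesSketch
{
    // Hypothetical helper: converts LCIDs (for example, the array returned
    // by GetAlternateSortOrders) into CultureInfo objects.
    public static List<CultureInfo> ToCultures(IEnumerable<int> lcids)
    {
        List<CultureInfo> cultures = new List<CultureInfo>();
        foreach (int lcid in lcids)
        {
            try
            {
                cultures.Add(new CultureInfo(lcid));
            }
            catch (ArgumentException)
            {
                // The runtime does not recognize this LCID; skip it.
            }
        }
        return cultures;
    }
}
```

The resulting list could then be bound to a list box, using each culture's LCID or CompareInfo.Name to tell the entries apart, because the culture names alone are not unique.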

As an alternative, the user can specify the preferred sort order in the Regional and Language Options dialog by clicking on the Customize button and selecting the Sort tab (see Figure 6.3).

Figure 6.3. Using Customize Regional Options to Specify a Sort Order


Although this applies to all .NET Framework applications, it is unlikely to be useful in ASP.NET applications because the culture setting is more likely to arrive from the user's language preferences on their own machine. Figure 6.4 shows that the Spanish (International Sort) and Spanish (Traditional Sort) language preferences appear to be distinct.

Figure 6.4. Internet Explorer Language Preferences Dialog Showing Different Sort Orders


Unfortunately, this is just smoke and mirrors, as you can see from the "[es]" language code next to the description. If you close this dialog and reopen it (see Figure 6.5), you will see that even Internet Explorer cannot tell the difference between "es" and "es". There are at least two workarounds, and both involve defining a User-Defined language in Internet Explorer. The simplest is to specify a culture name that includes the sort order (e.g., "es-ES_tradnl"). The second, more complex, workaround is to specify a culture using the LCID as the name (see Figure 6.6).

Figure 6.5. Internet Explorer Language Preferences Dialog Showing Same Sort Orders


Figure 6.6. Internet Explorer Language Preferences Dialog with User-Defined LCID


Unfortunately, the Culture="auto" and UICulture="auto" tags used in ASP.NET 2.0 localized forms do not recognize LCIDs as valid culture identifiers, so you have to read the Request.UserLanguages[0] value and set the CurrentCulture and CurrentUICulture in code. In ASP.NET 2.0, you can override the Page.InitializeCulture method to initialize the culture from the LCID:

    protected override void InitializeCulture()
    {
        // UserLanguages is null when the browser sends no language list
        if (Request.UserLanguages != null &&
            Request.UserLanguages.GetLength(0) > 0)
        {
            string userLanguage = Request.UserLanguages[0];
            if ((userLanguage.StartsWith("0x") ||
                userLanguage.StartsWith("0X")) &&
                userLanguage.Length > 2)
            {
                // Int32.TryParse requires that hex numbers do not
                // start with "0x" or "0X"
                string LCIDString = userLanguage.Substring(2);
                int LCID;
                if (Int32.TryParse(LCIDString,
                    NumberStyles.AllowHexSpecifier, null, out LCID))
                {
                    try
                    {
                        Thread.CurrentThread.CurrentCulture =
                            new CultureInfo(LCID);
                        Thread.CurrentThread.CurrentUICulture =
                            Thread.CurrentThread.CurrentCulture;
                    }
                    catch (ArgumentException)
                    {
                        // the LCID was not a valid LCID
                    }
                }
            }
            else
            {
                try
                {
                    int LCID = Convert.ToInt32(userLanguage);
                    Thread.CurrentThread.CurrentCulture =
                        new CultureInfo(LCID);
                    Thread.CurrentThread.CurrentUICulture =
                        Thread.CurrentThread.CurrentCulture;
                }
                catch (ArgumentException)
                {
                    // the LCID was not a valid LCID
                }
                catch (FormatException)
                {
                    // the LCID was not an integer
                }
                catch (OverflowException)
                {
                    // the number was too large to be an LCID
                }
            }
        }
    }


This method accepts LCIDs either as hex values (prefixed with "0x") or as integers. Notice that the method deliberately ignores exceptions that result from an invalid user language, in keeping with ASP.NET's default behavior.
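The parsing step can also be separated from the page so that it can be tested on its own. The following is a sketch under that assumption: TryParseLcid is a hypothetical helper, not an ASP.NET API, and it reports failure through its return value instead of throwing.

```csharp
using System;
using System.Globalization;

public static class LcidParsingSketch
{
    // Hypothetical helper: accepts "0x0000040A" style hex strings or plain
    // decimal digits, and returns false for anything else (e.g., "es-ES").
    public static bool TryParseLcid(string text, out int lcid)
    {
        lcid = 0;
        if (String.IsNullOrEmpty(text))
            return false;

        if (text.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
        {
            // AllowHexSpecifier rejects the "0x" prefix, so strip it first.
            return Int32.TryParse(text.Substring(2),
                NumberStyles.AllowHexSpecifier,
                CultureInfo.InvariantCulture, out lcid);
        }

        // NumberStyles.None accepts decimal digits only.
        return Int32.TryParse(text, NumberStyles.None,
            CultureInfo.InvariantCulture, out lcid);
    }
}
```

A successful parse only means that the string is a number. Constructing the CultureInfo from it can still fail, so the try/catch around new CultureInfo(LCID) remains necessary.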

Persisting Culture Identifiers

There will be occasions when it will be necessary to persist a culture identifier. It may be to store a culture in a config file for a user preference, or in a database to maintain a list of selected cultures, or in an XML document for consumption by another process. The method of persistence of the culture identifier requires a moment's thought. We saw in the previous section that simply using a culture's name is insufficient to distinguish between a culture with a default sort order and a culture with an alternate sort order (because both cultures have the same name). The following method is suitable for persisting culture identifiers in the .NET Framework 2.0:

    public static string GetPersistentCultureName(
        CultureInfo cultureInfo)
    {
        if ((CultureTypes.UserCustomCulture & cultureInfo.CultureTypes)
            != (CultureTypes)0)
            return cultureInfo.Name;
        else
            return cultureInfo.CompareInfo.Name;
    }


The if statement determines whether the culture is a custom culture (see Chapter 11). If the culture is a custom culture, the culture's name uniquely identifies the culture in all cases. If the culture is not a custom culture, the culture's CompareInfo name uniquely identifies the culture. The CompareInfo name is used instead of the culture's name because this value respects the culture's sort order. We cannot use the CompareInfo name for custom cultures because custom cultures use CompareInfo objects from existing cultures, and such names do not uniquely identify the custom culture.

The following method is suitable for persisting culture identifiers in the .NET Framework 1.1:

    public static string GetPersistentCultureName(
        CultureInfo cultureInfo)
    {
        if (cultureInfo.LCID == 0x0000040A ||
            cultureInfo.LCID == 0x00030404 ||
            cultureInfo.LCID == 0x00020804 ||
            cultureInfo.LCID == 0x00020c04 ||
            cultureInfo.LCID == 0x00021004 ||
            cultureInfo.LCID == 0x00021404 ||
            cultureInfo.LCID == 0x00010411 ||
            cultureInfo.LCID == 0x00010412 ||
            cultureInfo.LCID == 0x00010407 ||
            cultureInfo.LCID == 0x0001040e ||
            cultureInfo.LCID == 0x00010437)
            return cultureInfo.LCID.ToString();
        else
            return cultureInfo.Name;
    }


The .NET Framework 1.1 does not support custom cultures, so there is no need to write code for them. However, the .NET Framework 1.1's CompareInfo class doesn't have a Name property, and its CultureInfo constructors do not accept CompareInfo names to create cultures with alternate sort orders. The result is that cultures with alternate sort orders must be persisted using their locale IDs instead of their names. Any code that subsequently constructs a CultureInfo object from the resulting string must first check whether the string contains a name or a number. If it contains a number, the string must first be converted to an integer.
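The name-or-number check might look like the following sketch. The class and method names are hypothetical, and the digit test is written by hand so that the code does not depend on Int32.TryParse, which the examples in this section use only in .NET Framework 2.0 code. It assumes the string was produced by the .NET Framework 1.1 version of GetPersistentCultureName, so a number is always a decimal LCID.

```csharp
using System;
using System.Globalization;

public static class PersistedCultureSketch
{
    // Returns true only when the text is one or more decimal digits.
    private static bool IsAllDigits(string text)
    {
        if (text == null || text.Length == 0)
            return false;
        foreach (char c in text)
        {
            if (c < '0' || c > '9')
                return false;
        }
        return true;
    }

    // Hypothetical helper: rebuilds a CultureInfo from a persisted string
    // that holds either a decimal LCID or a culture name.
    public static CultureInfo FromPersistentCultureName(string persisted)
    {
        if (IsAllDigits(persisted))
        {
            int lcid = Int32.Parse(persisted, CultureInfo.InvariantCulture);
            return new CultureInfo(lcid);
        }
        return new CultureInfo(persisted);
    }
}
```

The helper lets exceptions from Int32.Parse and the CultureInfo constructors propagate, on the assumption that a persisted value that cannot be restored is an error the caller needs to see.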




.NET Internationalization: The Developers Guide to Building Global Windows and Web Applications
ISBN: 0321341384
EAN: 2147483647
Year: 2006
Pages: 213
