International Features | Developing International Software

MLang implements several COM objects that offer support for character encoding and text display. The object at the highest level is the MultiLanguage object, which is exposed by the IMultiLanguage, IMultiLanguage2, and IMultiLanguage3 interfaces. IMultiLanguage2, which should be used instead of the older IMultiLanguage, provides access to all of MLang's other interfaces and objects. These include objects for code-page enumeration, locale enumeration, character-set conversion, code-page detection, and font linking, all of which are discussed in the sections that follow.

Code-Page and Locale Enumeration

There are numerous situations in which a program might need to enumerate the code pages and locales that the system supports, for example, to let the user choose one as a preference. However, the code pages and locales that Windows supports can vary from one installation to another.

To help you determine which code pages and locales are supported on a given installation, MLang provides objects for code-page and locale enumeration. These objects retrieve the code pages and locales that the system recognizes from the MIME database. Enumerating code pages and locales involves the following three steps:

1. Obtain a pointer to the IMultiLanguage2 interface.
2. Call either the IMultiLanguage2::EnumCodePages method or the IMultiLanguage::EnumRfc1766 method to obtain an interface to the appropriate enumeration object.
3. Call the Next method of the enumeration interface that is obtained.

When enumerating code pages, pass the appropriate MIMECONTF constant as the grfFlags parameter of the EnumCodePages method call. Each MIMECONTF constant selects a subset of code pages (such as the code pages that mail, news, or browser clients use), which lets you tailor the list of code pages returned.
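The three steps can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function name enumerateBrowserCodePages is hypothetical, the choice of MIMECONTF_BROWSER and of GetUserDefaultUILanguage for the language parameter is an assumption made for the example, and error handling is reduced to the bare minimum. The MLang calls are compiled only on Windows, where the program is assumed to link against the OLE and UUID libraries.

```cpp
#include <cassert>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <mlang.h>
#endif

// Hypothetical helper: collects the IDs of the code pages that MLang reports
// for browser clients. Returns an empty list if MLang cannot be reached
// (or when the code is built on a platform other than Windows).
std::vector<unsigned> enumerateBrowserCodePages() {
    std::vector<unsigned> ids;
#ifdef _WIN32
    if (FAILED(CoInitialize(nullptr))) return ids;

    // Step 1: obtain a pointer to the IMultiLanguage2 interface.
    IMultiLanguage2* multiLang = nullptr;
    HRESULT hr = CoCreateInstance(CLSID_CMultiLanguage, nullptr,
                                  CLSCTX_INPROC_SERVER, IID_IMultiLanguage2,
                                  reinterpret_cast<void**>(&multiLang));
    if (SUCCEEDED(hr)) {
        // Step 2: request an enumerator, restricted by a MIMECONTF flag.
        IEnumCodePage* enumerator = nullptr;
        hr = multiLang->EnumCodePages(MIMECONTF_BROWSER,
                                      GetUserDefaultUILanguage(), &enumerator);
        if (SUCCEEDED(hr)) {
            // Step 3: call Next until it stops returning items.
            MIMECPINFO info;
            ULONG fetched = 0;
            while (enumerator->Next(1, &info, &fetched) == S_OK && fetched == 1) {
                ids.push_back(info.uiCodePage);
            }
            enumerator->Release();
        }
        multiLang->Release();
    }
    CoUninitialize();
#endif
    return ids;
}
```

Passing a different MIMECONTF constant in step 2 would change which code pages the loop in step 3 sees; the rest of the sketch stays the same.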

Another international feature of MLang is its support for character-set conversion. The following section describes the interfaces and objects that you can use.

Character-Set Conversion

MLang provides a number of different ways to convert strings from one code page to another, the best of which is to use the MLang Conversion object. This object supports the IMLangConvertCharset interface and is dedicated to converting strings between a source code page and a destination code page, both of which the caller must specify.

The IMultiLanguage2 interface also supports a number of conversion methods. These methods function in the same manner as the IMLangConvertCharset methods, but they take slightly different parameters. The IMultiLanguage2 methods are less efficient than IMLangConvertCharset methods when doing multiple conversions between the same combination of source and destination code pages.

When using MLang's conversion functionality, a caller should always verify whether the parameters that specify the size of the source string and destination buffer are measured in bytes or in characters. In general, all APIs that are specifically dedicated to converting to or from Unicode take a character count for the Unicode string. All other strings are measured in a byte count.
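The difference between the two units is easy to get wrong, so a small portable sketch may help. It does not call MLang; the BufferSizes type and both helper functions are hypothetical names introduced for illustration, and the Unicode string is assumed to be UTF-16, so that "character count" means the number of 16-bit code units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sizes to report for one conversion, following the rule in the text.
struct BufferSizes {
    std::size_t multibyteBytes;  // non-Unicode string: measured in bytes
    std::size_t unicodeChars;    // Unicode string: measured in UTF-16 code units
};

// Hypothetical helper: pairs each string with the unit it is measured in.
BufferSizes sizesFor(const std::string& multibyte, const std::u16string& unicode) {
    return BufferSizes{multibyte.size(), unicode.size()};
}

// Storage, in bytes, needed to hold a Unicode buffer of `chars` characters.
std::size_t unicodeBufferBytes(std::size_t chars) {
    return chars * sizeof(char16_t);
}
```

For the Shift-JIS bytes 0x82 0xA0 and the corresponding single Unicode character U+3042, sizesFor reports a source size of 2 (bytes) and a Unicode size of 1 (character), while the Unicode buffer itself occupies 2 bytes of storage. Passing the storage size where a character count is expected is the typical mistake this rule guards against.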

MLang provides several conversion methods, which include methods associated with IMLangConvertCharset and IMultiLanguage2.

IMLangConvertCharset methods:

IMLangConvertCharset::DoConversion
IMLangConvertCharset::DoConversionFromUnicode
IMLangConvertCharset::DoConversionToUnicode

IMultiLanguage2 methods:

IMultiLanguage::ConvertString
IMultiLanguage::ConvertStringFromUnicode
IMultiLanguage::ConvertStringToUnicode
IMultiLanguage2::ConvertStringFromUnicodeEx
IMultiLanguage2::ConvertStringToUnicodeEx
IMultiLanguage2::ConvertStringInIStream
IMultiLanguage::IsConvertible

Another issue that sometimes arises in Internet-related applications is the inability to tell which code page a file uses. The following section discusses how MLang can be used to detect code pages.

Code-Page Detection

Authors often neglect to mark the code page or language of a Web page or other file on the Internet. This can cause problems for programs that need to convert text from the input code page or language into something else. MLang provides two code-page detection methods for determining the possible code pages and languages of the text data that is given by the caller:

IMultiLanguage2::DetectInputCodepage
IMultiLanguage2::DetectCodepageInIStream

These methods return the results in an array of DetectEncodingInfo structures. In addition to containing a detected language and code page, these structures include two members that indicate the percentage of the data that is in the detected language, as well as the relative confidence that the structure contains the correct language and code page. To help increase the accuracy of the detected language and code page, the MLDETECTCP enumerated type is provided. The values associated with MLDETECTCP specify the type of data that is being given to the detection method.
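Because detection can return several candidates, the caller has to decide which one to trust. The following portable sketch shows one possible policy. It does not call MLang: the DetectedEncoding struct is a stand-in whose fields mirror the DetectEncodingInfo members described above, the function name bestCandidate is hypothetical, and ranking by confidence first and document percentage second is an assumption made for illustration, not a rule that MLang prescribes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for MLang's DetectEncodingInfo, so the sketch builds anywhere.
struct DetectedEncoding {
    unsigned langId;    // detected language
    unsigned codePage;  // detected code page
    int docPercent;     // percentage of the data in the detected language
    int confidence;     // relative confidence in this candidate
};

// Returns the candidate with the highest confidence, breaking ties by
// document percentage, or nullptr when no candidates were returned.
const DetectedEncoding* bestCandidate(const std::vector<DetectedEncoding>& candidates) {
    auto it = std::max_element(
        candidates.begin(), candidates.end(),
        [](const DetectedEncoding& a, const DetectedEncoding& b) {
            if (a.confidence != b.confidence) return a.confidence < b.confidence;
            return a.docPercent < b.docPercent;
        });
    return it == candidates.end() ? nullptr : &*it;
}
```

A caller might also reject the best candidate when its confidence falls below some application-chosen threshold and fall back to a default code page instead.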

The numerous fonts that exist in multilingual content present special challenges. Individual character sets have their own corresponding fonts. Once again, MLang provides an effective solution, in this case via font linking.

Font Linking

Through font linking, MLang allows you to output text containing characters from a number of different languages and character sets that the specified font might not support. (For more information on fonts and on font linking, see Chapter 5, "Text Input, Output, and Display.") This functionality is especially useful when dealing with Unicode strings, which might contain characters from many character sets at once. Font linking involves the creation of a custom font inherited from a specified source font. The methods used to perform font linking are contained in the IMLangFontLink interface.

Determining whether a custom font is needed, and creating one if so, is a five-step process:

1. Use the IMLangFontLink::GetFontCodePages method to return the set of code pages that the specified font supports.
2. Use the IMLangCodePages::GetStrCodePages method (or IMLangCodePages::GetCharCodePages for a single character) to return the code pages necessary for the given string.
3. Do a bitwise comparison of the results of steps 1 and 2.
4. Use the IMLangFontLink::MapFont method to create a font if the results of the comparison show that it is needed.
5. Call IMLangFontLink::ReleaseFont to delete the font from the font cache when it is no longer needed.

These steps are illustrated in the following code sample:

    DWORD dwFontCodePages;
    DWORD dwCharCodePages;

    // Get the code page (or pages) supported by the font.
    pMLangFontLink->GetFontCodePages(hDC, hFont, &dwFontCodePages);

    // Get the code page (or pages) that support the character.
    pMLangCodePages->GetCharCodePages(ch, &dwCharCodePages);

    // Check to see if the character is supported by the font.
    if (dwFontCodePages & dwCharCodePages) {
        // The character ch can be output by hFont on hDC,
        // no work needs to be done!
    } else {
        HFONT hMappedFont;
        pMLangFontLink->MapFont(hDC, dwCharCodePages, hFont, &hMappedFont);
        // The character ch can be output by hMappedFont on hDC.
    }
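The sample above tests a single character. For a whole string, the same bitwise test can be applied character by character to split the text into runs that the base font can draw and runs that need a mapped font. The portable sketch below illustrates only that splitting logic. It does not call MLang: charCodePages is a hypothetical stand-in for a per-character code-page lookup, and the two bit values are invented for the example rather than taken from any Windows header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative code-page bits (hypothetical values, not Windows constants).
constexpr std::uint32_t kLatinBit = 0x1;
constexpr std::uint32_t kJapaneseBit = 0x2;

// Hypothetical stand-in for a per-character code-page lookup.
std::uint32_t charCodePages(char16_t ch) {
    if (ch < 0x80) return kLatinBit | kJapaneseBit;          // ASCII: both
    if (ch <= 0xFF) return kLatinBit;                        // Latin-1 range
    if (ch >= 0x3040 && ch <= 0x30FF) return kJapaneseBit;   // kana
    return 0;                                                // unknown
}

struct TextRun {
    std::size_t start;     // index of the first character in the run
    std::size_t length;    // number of characters in the run
    bool needsMappedFont;  // true if the base font cannot draw the run
};

// Splits text into maximal runs that share the same answer to the
// bitwise test (font code pages & character code pages).
std::vector<TextRun> splitRuns(const std::u16string& text, std::uint32_t fontCodePages) {
    std::vector<TextRun> runs;
    for (std::size_t i = 0; i < text.size(); ++i) {
        bool needs = (charCodePages(text[i]) & fontCodePages) == 0;
        if (!runs.empty() && runs.back().needsMappedFont == needs) {
            ++runs.back().length;          // extend the current run
        } else {
            runs.push_back({i, 1, needs}); // start a new run
        }
    }
    return runs;
}
```

With a base font that supports only the Latin bit, the string "ab" followed by two hiragana characters and "c" splits into three runs: two characters for the base font, two that need a mapped font, and one more for the base font. In a real program each run that needs a mapped font would go through MapFont once, rather than once per character.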