Fonts started out as fixed-width raster fonts: collections of bitmaps limited to a given script. They later evolved into Unicode-based OpenType fonts containing extensive tables on glyph positioning and substitution. This advancement in font technology has been one of the most important contributions to multilingual computing. In fact, fonts are the final form in which multilingual data is displayed to the user.
In the sections that follow, you will first be introduced to some of the new technologies used in Windows 2000 and Windows XP in order to accommodate font support and multilingual text display. Then you will learn about different programming techniques that you should use to take advantage of the system's support for fonts both in Win32 and Web content.
One of the biggest challenges in enabling the operating system for international character sets is the ability to select and display the right glyph. When editing a multilingual document, the user should not be expected to select a different font for each one of the scripts he or she wants to view because:
Therefore, in addition to the font substitution (also known as "font association") technique used since the early versions of Windows, three new features (OpenType fonts, font fallback, and font linking) were introduced in Windows 2000 to solve these types of font-selection problems.
The Unicode-based OpenType font format has been developed jointly by Microsoft and Adobe; it extends the TrueType font file format originally designed by Apple. OpenType fonts allow rich mapping between characters and glyphs, thus enabling support for ligatures, positional forms, alternates, and other substitutions. OpenType fonts can also include information that supports two-dimensional glyph positioning and glyph attachment, and can contain either TrueType or PostScript outlines. Layout features within OpenType fonts are organized by scripts and languages, allowing a single font to support multiple writing systems, even within the same script.
Core OpenType fonts (such as Tahoma or Arial) contain glyphs for Western and Central European, Hebrew, Arabic, Greek, Turkish, Baltic, Cyrillic, and Vietnamese scripts. Although these core fonts do not contain East Asian script characters, they link to fonts that do. The main reason for excluding these scripts is the massive performance overhead that East Asian glyphs would introduce in terms of font loading and mapping in GDI. In addition, these scripts would make the font file several times bigger: instead of having instructions on how to create glyphs for several hundred characters, the font would need instructions for some 6,000 or 7,000 characters. (For more information on OpenType font technology, see Chapter 20, "OpenType Fonts.")
One benefit of Unicode is the ability to represent many languages and scripts in a single string. This is also a problem, since very few fonts support more than a couple of scripts. Indeed, it's very difficult to do a good job of making fonts with glyphs for different scripts such that all conform to one set of vertical metrics. To overcome this limitation, and in order to accommodate complex scripts, Uniscribe can detect if the currently selected font doesn't support a particular script and can automatically switch (or "fall back") to a predefined system font that has appropriate glyphs for the desired script. All these operations are transparent to the user.
Here is an example to better understand this mechanism. A user running Windows XP selects the Tahoma font to enter some text first in English, next in Hebrew, and then in Telugu. Since Tahoma is an OpenType font, it provides support for Latin and Hebrew scripts, but does not contain any Telugu glyphs. Uniscribe detects this lack of font support and automatically renders the Telugu script by using its fallback font, which is Gautami.
Although font fallback can accommodate Indic scripts such as Telugu in Windows XP, no such mechanism existed in Windows 2000. For most of the scripts, the fallback font is set to Microsoft Sans Serif (an OpenType font). For the Indic family of languages, the fallback is set to another appropriate font. Font fallback is internal to Uniscribe, and applications cannot add new fallback fonts or modify existing ones.
Unlike font fallback, in which the selected font is internally replaced by a predefined font, in font linking it is possible to link one or more fonts (called "linked fonts") to another font (called the "base font"). Once you link fonts, you can use the base font to display code points that do not exist in the base font, but that do exist in one of the linked fonts. For example, linking a hangul font and a Japanese font to a Tahoma font allows you to display both Korean and Japanese characters in Tahoma font.
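The lookup order that font linking implies can be sketched in portable C. This is only an illustration under stated assumptions, not how GDI implements linking: the types `FontFace` and `FontLinkChain`, the function `resolve_face`, and all coverage ranges are hypothetical, and the coverage data is drastically simplified (the real Tahoma covers far more than Basic Latin). The Korean face name Gulim is an assumed example of a hangul font.

```c
#include <assert.h>
#include <stddef.h>

/* A contiguous range of Unicode code points that a font has glyphs for. */
typedef struct {
    unsigned long first;
    unsigned long last;
} CodeRange;

/* A font face with simplified, illustrative coverage data. */
typedef struct {
    const char      *faceName;
    const CodeRange *ranges;
    size_t           rangeCount;
} FontFace;

/* A base font plus the ordered list of fonts linked to it. */
typedef struct {
    const FontFace        *base;
    const FontFace *const *linked;
    size_t                 linkedCount;
} FontLinkChain;

/* Return 1 if the font covers the code point, 0 otherwise. */
static int font_has_glyph(const FontFace *font, unsigned long cp)
{
    size_t i;
    for (i = 0; i < font->rangeCount; ++i) {
        if (cp >= font->ranges[i].first && cp <= font->ranges[i].last)
            return 1;
    }
    return 0;
}

/* Try the base font first, then each linked font in order.
   Returns the face name that would render cp, or NULL if no font
   in the chain covers it (the default glyph would be shown). */
const char *resolve_face(const FontLinkChain *chain, unsigned long cp)
{
    size_t i;
    assert(chain != NULL && chain->base != NULL);
    if (font_has_glyph(chain->base, cp))
        return chain->base->faceName;
    for (i = 0; i < chain->linkedCount; ++i) {
        if (font_has_glyph(chain->linked[i], cp))
            return chain->linked[i]->faceName;
    }
    return NULL;
}

/* Hypothetical sample data: a Japanese and a hangul font linked to Tahoma. */
static const CodeRange kLatinRanges[]  = { { 0x0020, 0x007E } }; /* Basic Latin         */
static const CodeRange kKanaRanges[]   = { { 0x3040, 0x30FF } }; /* Hiragana, Katakana  */
static const CodeRange kHangulRanges[] = { { 0xAC00, 0xD7A3 } }; /* Hangul syllables    */

static const FontFace kTahoma   = { "Tahoma",       kLatinRanges,  1 };
static const FontFace kJapanese = { "MS UI Gothic", kKanaRanges,   1 };
static const FontFace kKorean   = { "Gulim",        kHangulRanges, 1 };

static const FontFace *const kTahomaLinks[] = { &kJapanese, &kKorean };
const FontLinkChain kTahomaChain = { &kTahoma, kTahomaLinks, 2 };
```

With this data, `resolve_face(&kTahomaChain, 0x3042)` (a hiragana letter) resolves to the linked Japanese face even though the caller only ever selected Tahoma, while a Telugu code point such as U+0C05 resolves to `NULL` because no font in the chain covers it.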
Note
If font linking is enabled on your device, you can examine the registry by enumerating the subkeys of the registry key at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink to determine the mappings of linked fonts to base fonts. You can add links by using Regedit to create additional subkeys. Once you have located this registry key, from the Edit menu, click New and then click String Value. Replace <New Value #1> with <base font face name>. Highlight the new name and then, from the Edit menu, click Modify. In the Edit String dialog box, in the Value data field, enter <face name of the font to link to>.
Important
With font fallback and font linking, the font size of the newly selected font will be the same as that of the original font. For example, if an 8-point Tahoma font was selected to type English and now the user enters some Japanese text, an 8-point MS UI Gothic font will be automatically selected. The 8-point font size might not be the best choice for some scripts, since it can make them hard to read.
Both font fallback and font linking contain logic to estimate an appropriate font size, but both mechanisms have to use metrics exposed by the font that might or might not actually match the way the font appears. Consider the difference in the height of English letters among 8-point Microsoft Sans Serif, 8-point Traditional Arabic, and 8-point Angsana New. Even though all of these are supposedly 8-point fonts, the actual size of the English letters varies widely. Font fallback and font linking are no substitutes for choosing the right font in the first place. Rather, these mechanisms simply spare the user from having to select a font manually; additionally, they prevent UI text from being displayed as a default glyph.
The following sections contain technical recommendations for font selection in Win32 applications. More specifically, you'll see best practices for dealing with dialog resource files and for selecting fonts at run time.
Fonts that have glyphs for all supported scripts and for all Unicode characters are very rare. Arial Unicode MS is one of the most complete fonts for Windows, and yet it does not contain all the glyphs associated with all Unicode code points. In order for your application to take advantage of the system font support described previously, you should, as a general rule, adhere to the following guidelines:
Along the same lines, it's strongly suggested that you do not hard-code the font size that you use, and that you make this variable customizable according to the script to be displayed; since some scripts are more complicated than others, they need more pixels to be displayed properly. For example, most English characters can be displayed on a 5x7 grid, but Japanese characters need a grid of at least 16x16 to be seen clearly. Chinese characters, on the other hand, need a 24x24 grid. Thai characters only need 8 pixels for width, but they need at least 22 pixels for height. Thus it is easy to understand why some characters in a small font size might not be legible. (See Figure 5-22.)
 
 
Figure 5-22 - Comparison of English and Japanese characters in varying font sizes.
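As a rough illustration of why a fixed point size does not suit every script, the following C sketch converts a point size to pixels (1 point = 1/72 inch) and compares the result with the minimum grid heights quoted above. It is a simplification under stated assumptions: the `Script` enumeration and the functions `points_to_pixels`, `is_legible`, and `min_point_size` are hypothetical names, and the sketch treats the font's pixel height as if it were the height of the glyph grid, which real font metrics only approximate.

```c
#include <assert.h>

typedef enum {
    SCRIPT_LATIN,
    SCRIPT_JAPANESE,
    SCRIPT_CHINESE,
    SCRIPT_THAI,
    SCRIPT_COUNT
} Script;

/* Minimum grid height in pixels per script, taken from the figures
   quoted in the text (5x7, 16x16, 24x24, and 8x22 respectively). */
static const int kMinPixelHeight[SCRIPT_COUNT] = { 7, 16, 24, 22 };

/* Convert a point size to pixels at the given resolution,
   rounding to the nearest pixel. */
int points_to_pixels(int points, int dpi)
{
    assert(points > 0 && dpi > 0);
    return (points * dpi + 36) / 72;
}

/* Nonzero if the point size yields at least the minimum height. */
int is_legible(Script script, int points, int dpi)
{
    return points_to_pixels(points, dpi) >= kMinPixelHeight[script];
}

/* Smallest whole point size that reaches the minimum height. */
int min_point_size(Script script, int dpi)
{
    int points = 1;
    while (!is_legible(script, points, dpi))
        ++points;
    return points;
}
```

Under these assumptions, an 8-point font at 96 DPI is about 11 pixels high: enough for English text, but short of the 16 pixels suggested for Japanese. This is one reason to keep the font size configurable per script rather than hard-coded.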
System fonts are different from one language version to another (even within the same version of the operating system). Those in charge of translating resource files often do not have enough information or technical background on how to change the font face name for the different languages into which they are translating text, whether this change involves replacing the entire name or only modifying it slightly, such as when adding charset information. In the following example, MS Sans Serif, a bitmap font that only contains glyphs for Western European languages, is being used in the dialog resources. If the application is localized into Turkish or Japanese, for example, without changing the font face name, the UI text will be displayed in the default glyph as empty squares. (See Figure 5-23.) This type of display occurs because MS Sans Serif is not an OpenType font that can accommodate Turkish script, and it is not font linked in the system.

DLG_NLS DIALOG DISCARDABLE 0, 0, 344, 260
STYLE DS_MODALFRAME | WS_POPUP | WS_CAPTION | WS_SYSMENU
CAPTION "NLS APIs"
FONT 8, "MS Sans Serif"
 
 
Figure 5-23 - A property sheet translated into Korean on Windows 2000, where the font is set to MS Sans Serif. The system fails to find appropriate glyphs for this font and ends up displaying the default glyph.
Because the desired behavior is to have the UI font of your application follow the desktop (Shell) UI font, and because the default Shell font is different from one localized language of the operating system to another (for example, Microsoft Sans Serif for English, Tahoma for Arabic, and so on), the best practice is to always use the higher-level font face name known as "MS Shell Dlg." MS Shell Dlg is actually not a font. Rather, it is a font face name that gets mapped to the right font depending on the font-substitution settings of the operating system. By setting your default resource font as MS Shell Dlg, you are assured of providing the appropriate font solution, not only on Windows 2000 and Windows XP, but also on all versions of Windows since Windows 95!
Hard-coding the font name in your code can have the same result as hard-coding it in dialog resource files. Either action can break your UI. Here again, your font selection should be flexible and context-dependent. Instead of a direct call to the CreateFont or CreateFontIndirect APIs, where the font attributes are hard-coded in a LOGFONT structure, you should use EnumFontFamiliesEx. EnumFontFamiliesEx enumerates all fonts in the system that match the font characteristics specified by the LOGFONT structure (in this case, by character set instead of by font face name).
Possible character sets for a LOGFONT structure are shown in Table 5-6.
Table 5-6 Possible character sets for a LOGFONT structure.
| Charset Name | Charset Value (Hex) | Charset Value (Decimal) | Code Page ID |
| --- | --- | --- | --- |
| ANSI_CHARSET | 0x00 | 0 | 1252 | 
| DEFAULT_CHARSET | 0x01 | 1 | CP_ACP | 
| SYMBOL_CHARSET | 0x02 | 2 | CP_OEMCP | 
| SHIFTJIS_CHARSET | 0x80 | 128 | 932 | 
| HANGUL_CHARSET | 0x81 | 129 | 949 | 
| GB2312_CHARSET | 0x86 | 134 | 936 | 
| CHINESEBIG5_CHARSET | 0x88 | 136 | 950 | 
| GREEK_CHARSET | 0xA1 | 161 | 1253 | 
| TURKISH_CHARSET | 0xA2 | 162 | 1254 | 
| HEBREW_CHARSET | 0xB1 | 177 | 1255 | 
| ARABIC_CHARSET | 0xB2 | 178 | 1256 | 
| BALTIC_CHARSET | 0xBA | 186 | 1257 | 
| RUSSIAN_CHARSET | 0xCC | 204 | 1251 | 
| THAI_CHARSET | 0xDE | 222 | 874 | 
| EASTEUROPE_CHARSET | 0xEE | 238 | 1250 |
| OEM_CHARSET | 0xFF | 255 | n/a | 
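
The numeric rows of Table 5-6 can be captured as a small lookup table. The following portable C sketch is only a stand-in for illustration: on Windows, the system's TranslateCharsetInfo function performs this kind of translation. The names `CharsetEntry`, `charset_from_code_page`, and `code_page_from_charset` are hypothetical, and only the rows of the table with a fixed numeric code page are included.

```c
#include <assert.h>
#include <stddef.h>

/* One row of Table 5-6: a LOGFONT charset value and its code page. */
typedef struct {
    int          charset;
    unsigned int codePage;
} CharsetEntry;

static const CharsetEntry kCharsetTable[] = {
    {   0, 1252 },  /* ANSI (Western European) */
    { 128,  932 },  /* Shift-JIS (Japanese)    */
    { 129,  949 },  /* Hangul (Korean)         */
    { 134,  936 },  /* GB2312 (Simplified Chinese) */
    { 136,  950 },  /* Big5 (Traditional Chinese)  */
    { 161, 1253 },  /* Greek                   */
    { 162, 1254 },  /* Turkish                 */
    { 177, 1255 },  /* Hebrew                  */
    { 178, 1256 },  /* Arabic                  */
    { 186, 1257 },  /* Baltic                  */
    { 204, 1251 },  /* Russian (Cyrillic)      */
    { 222,  874 },  /* Thai                    */
    { 238, 1250 }   /* Central and Eastern European */
};

#define CHARSET_COUNT (sizeof(kCharsetTable) / sizeof(kCharsetTable[0]))

/* Return the charset value for a code page, or -1 if the table
   has no entry for it. */
int charset_from_code_page(unsigned int codePage)
{
    size_t i;
    for (i = 0; i < CHARSET_COUNT; ++i) {
        if (kCharsetTable[i].codePage == codePage)
            return kCharsetTable[i].charset;
    }
    return -1;
}

/* Return the code page for a charset value, or 0 if the charset has
   no fixed code page in the table (for example, DEFAULT_CHARSET). */
unsigned int code_page_from_charset(int charset)
{
    size_t i;
    assert(charset >= 0 && charset <= 255);  /* lfCharSet is a single byte */
    for (i = 0; i < CHARSET_COUNT; ++i) {
        if (kCharsetTable[i].charset == charset)
            return kCharsetTable[i].codePage;
    }
    return 0;
}
```

For example, `charset_from_code_page(932)` yields 128 (SHIFTJIS_CHARSET), while a code page that is absent from the table, such as 65001, yields -1.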
Notice that DEFAULT_CHARSET is not a real charset; in reality on Windows 2000 and Windows XP it does two things:
DEFAULT_CHARSET should be used when displaying a string of characters encoded with Unicode. In the example that follows, the code identifies the charset corresponding to the currently selected input language and also enumerates a set of compatible fonts. hWnd is the window handle where the font will be used, and hDlg is a dialog box to display the font list.
DWORD       dwCodePage;
HKL         hkl = (HKL) lParam;
LOGFONT     lf;
HDC         hDc;
CHARSETINFO cs;
TCHAR       szLocaleData [BUFFER_SIZE];

// Initialize the LOGFONT to be used. Zero every field first:
//    EnumFontFamiliesEx requires lfPitchAndFamily to be 0, and
//    an empty lfFaceName means "enumerate all face names."
ZeroMemory (&lf, sizeof(lf));
lf.lfCharSet = DEFAULT_CHARSET;

// This is a workaround for Hindi and Tamil, since they
//    don't have charsets. Mangal and Latha are the
//    fonts for Hindi and Tamil shipping with Windows NT,
//    2000, and XP. A better workaround would be to put
//    these strings in data files that can be updated with new
//    font face names. You would then call
//    EnumFontFamiliesEx once per face name.
if (LOWORD(hkl) == MAKELANGID(LANG_HINDI, SUBLANG_DEFAULT))
   _tcscpy (lf.lfFaceName, TEXT("Mangal"));
else if (LOWORD(hkl) == MAKELANGID(LANG_TAMIL, SUBLANG_DEFAULT))
   _tcscpy (lf.lfFaceName, TEXT("Latha"));
else
{
   // Find out what charset the new keyboard wants.
   GetLocaleInfo (LOWORD(hkl), LOCALE_IDEFAULTANSICODEPAGE,
      szLocaleData, 6);
   dwCodePage = _ttol (szLocaleData);

   // With TCI_SRCCODEPAGE, the first argument carries the
   //    code page value itself, cast to a pointer.
   if (TranslateCharsetInfo ((DWORD*)(DWORD_PTR) dwCodePage, &cs,
      TCI_SRCCODEPAGE))
   {
      lf.lfCharSet = (BYTE) cs.ciCharset;
   }
}

// Get list of fonts that support this charset.
// hDc is needed by EnumFontFamiliesEx.
hDc = GetDC (hWnd);

// Callback uses hDlg.
EnumFontFamiliesEx (hDc, &lf, (FONTENUMPROC) EnumFontProc,
   (LPARAM) hDlg, (DWORD) 0);
ReleaseDC (hWnd, hDc);

In this example, the callback function passed to EnumFontFamiliesEx is as follows:
int CALLBACK EnumFontProc (ENUMLOGFONTEX* lpelfe,
   NEWTEXTMETRICEX* lpntme, int iFontType, LPARAM lParam)
{
   // Size computed from format used below and buffer limits:
   //    full name + " (" + script name + ")" + terminating null.
   TCHAR szFaceName [4+LF_FULLFACESIZE+LF_FACESIZE];

   _stprintf (szFaceName, TEXT("%s (%s)"), lpelfe->elfFullName,
      lpelfe->elfScript);

   // Add string to list box to describe this font.
   SendDlgItemMessage ((HWND) lParam, IDC_FONTLIST, LB_ADDSTRING,
      (WPARAM) 0, (LPARAM) szFaceName);

   // A nonzero return value continues the enumeration.
   return TRUE;
}

Each time the callback function is called, it builds a string containing the font's full name and its script name, and adds that string to the list box (IDC_FONTLIST, a control that is part of the dialog template).
As you can see from the previous code sample, Indic scripts must be handled separately because they have no charset values. Since there is no default ANSI or Windows Code Page (ACP) value for Indic scripts, none of the Win32 ANSI entry points (the "A" routines) will work with Indic, Georgian, or Armenian text, or for any new scripts for which system support is provided exclusively through Unicode encoding. (See Chapter 3, "Unicode.")
The font resource for many East Asian languages has two names: an English name and a localized name. For Windows 95, Windows 98, and Windows NT 4, the localized name only works on a system locale that matches the language, while the English name works on all other system locales. This can be a problem when calling CreateFont or CreateFontIndirect. The best method is to try one name and, if that fails, try the other. EnumFonts, EnumFontFamilies, and EnumFontFamiliesEx return the English font face name if the system locale does not match the language of the font. On Windows 2000 and Windows XP, this is no longer a problem because both names will be valid for any locale.
Note
Another approach to run-time font selection is to display a font selection common dialog box, from which the user can select the desired font. (See Figure 5-24.) With the ChooseFont API, you can control the list of fonts that are returned to the user, and you can limit the fonts to a given character set.
 
 
Figure 5-24 - A simplified font selection dialog box.
The following code example initializes a font for the IDC_EDITWIN edit control. Then upon the user's selection, the code adjusts the font used in the edit control:
static CHOOSEFONT cf;
static LOGFONT    lf;

// Fill out our CHOOSEFONT and LOGFONT structures
//    with default and predefined values.
InitializeFont (hDlg, &cf, &lf);

// Create this font.
hEditFont = CreateFontIndirect (&lf);

// Set the font in our edit control.
SendDlgItemMessage (hDlg, IDC_EDITWIN, WM_SETFONT,
   (WPARAM) hEditFont, MAKELPARAM(TRUE, 0));

// Upon user's request, create a font selection common
//    dialog box and use the new font.
if (ChooseFont (&cf))
{
   // ChooseFont has updated lf through cf.lpLogFont.
   HFONT hOldFont = hEditFont;

   hEditFont = CreateFontIndirect (&lf);
   SendDlgItemMessage (hDlg, IDC_EDITWIN, WM_SETFONT,
      (WPARAM) hEditFont, MAKELPARAM(TRUE, 0));

   // Release the previous font now that the control
   //    no longer uses it.
   if (hOldFont)
      DeleteObject (hOldFont);
}

Where the InitializeFont function looks like the following:
void InitializeFont (HWND hWnd, LPCHOOSEFONT lpCf, LPLOGFONT lpLf)
{
   // Settings for the font selection common dialog box.
   lpCf->lStructSize      = sizeof(CHOOSEFONT);
   lpCf->hwndOwner        = hWnd;
   lpCf->hDC              = NULL;
   lpCf->lpLogFont        = lpLf;
   lpCf->iPointSize       = 10;
   lpCf->Flags            = CF_SCREENFONTS | CF_INITTOLOGFONTSTRUCT
                               | CF_NOSIZESEL;
   lpCf->rgbColors        = RGB(0,0,0);
   lpCf->lCustData        = 0;
   lpCf->lpfnHook         = NULL;
   lpCf->lpTemplateName   = NULL;
   lpCf->hInstance        = g_hInst;
   lpCf->lpszStyle        = NULL;
   lpCf->nFontType        = SIMULATED_FONTTYPE;
   lpCf->nSizeMin         = 0;
   lpCf->nSizeMax         = 0;

   // Default font: MS Shell Dlg, any charset.
   lpLf->lfHeight         = 24;
   lpLf->lfWidth          = 0;
   lpLf->lfEscapement     = 0;
   lpLf->lfOrientation    = 0;
   lpLf->lfWeight         = FW_DONTCARE;
   lpLf->lfItalic         = FALSE;
   lpLf->lfUnderline      = FALSE;
   lpLf->lfStrikeOut      = FALSE;
   lpLf->lfCharSet        = DEFAULT_CHARSET;
   lpLf->lfOutPrecision   = OUT_DEFAULT_PRECIS;
   lpLf->lfClipPrecision  = CLIP_DEFAULT_PRECIS;
   lpLf->lfQuality        = DEFAULT_QUALITY;
   lpLf->lfPitchAndFamily = DEFAULT_PITCH | FF_DONTCARE;
   _tcscpy (lpLf->lfFaceName, TEXT("MS Shell Dlg"));
}

By adhering to the guidelines of the following section, you can have better control over font customization in Web pages. You'll also save valuable time.
When creating Web pages, avoid placing font attribute values into inline styles, as shown below:
<span style = "font-size: 10pt; font-family: Arial;"> Hello </span>
This approach makes font customization per language or script a difficult task, since a technical localizer would need to scan the entire Web content for all instances of the font definition one language at a time. If the font didn't have glyphs to handle the new language, changes would have to be made on a per-language basis.
A better way to handle font attributes is to use cascading style sheets (CSS) in which corresponding font attributes and styles are defined. In the following example, the CSS file creates a style class called "myStyle," which contains the font family and font size. You can allow these attributes to change depending on the language into which you are rendering your content. For the HTML file, all you need to do for the Web page is "span" whatever text you want formatted with the myStyle class.
<style>
   .myStyle {font-size: 10pt; font-family: Arial;}
</style>
<span class = myStyle> Hello </span>

Now adapting the font to be used per language or per script becomes a much easier job, since it requires a single change in one specific file. You can extend this notion and define a specific style for all the scripts that you want to render in your multilingual Web site. In the following example, the CSS file defines an appropriate font style for each script that will be used thereafter in the inline text.
The CSS file would look like this:
.clsDescriptor {COLOR: #bdbddd; FONT: 0.7em/1em Verdana;}
.clsEnglish {FONT: 1.1em/1.3em "Palatino Linotype";}
.clsTitle {COLOR: darkred; FONT: 1.4em/1.6em "Palatino Linotype";}
.clsArabic {FONT: 1.1em/1.3em "Arabic Transparent";}
.clsArmenian {FONT: 1.3em/1.3em Sylfaen;}
.clsHindi {FONT: 1.1em/1.3em Mangal;}

And the HTML file would look like this:
<HTML>
<HEAD>
<TITLE>LANGUAGE SUPPORT IN WINDOWS 2000</TITLE>
<META content="text/html;charset=UTF-8" http-equiv=Content-Type>
<LINK href="css.css" rel=stylesheet type=text/css>
</HEAD>
<BODY bgColor=#ffffff lang=EN-US leftMargin=5 topMargin=30>
<SPAN > Each version of Windows 2000 and Windows XP</SPAN><p>
<SPAN >supports hundreds of languages.</SPAN>
<SPAN >[English ]</SPAN><p>
<SPAN DIR=rtl>
</SPAN>
<SPAN >[Arabic ]</SPAN><p>
<SPAN >
</SPAN>
<SPAN >[Armenian ]</SPAN><p>
<SPAN >
</SPAN>
<SPAN >[Hindi ]</SPAN><p>
</BODY>
</HTML>
 
 
Figure 5-25 - Output of code in which an appropriate font style has been defined for each script.
