So far, we have output numbers with the echo or print functions and let those functions choose how to format the values. However, this default output can appear raw or unprocessed. It takes more than a glance to understand how big the number 286196812 is (the population of the United States as of the 2000 Census). Similarly, seeing the current date and time written as 1106175862 (the output of the time function in PHP) is a little distressing. It would be much better to format these numbers the way we would expect to see them in a book or newspaper: the American population as 286,196,812 and the current date as 2005-Jan-19 15:01:58.

A complication is that users in France expect to see large numbers written as 1 234 567,89, while users in Italy expect to see them as 1.234.567,89. We would like a system that handles all of these possibilities and gives us the chance to change them in our code.

This chapter mostly concerns itself with globalization, the art of making our application able to handle input and users from different countries and cultures. The art of making our application run in multiple languages is called localization. Before we can look at the available functions for formatting numeric data and write our own, we must spend some time examining the concept of locales and learning how our operating system understands them.

Locales and Their Properties

One of the primary concepts we need to understand for this formatting discussion is the locale. All computers and operating systems operate with a basic notion of their location that helps them determine how to display information to the user. Once they are told that the user wishes to see things as they are seen in Italy, the computer knows to show numbers as 1.234,56; times as 20:52:16; dates as dd/mm/yy; and monetary values as €1.234,56.
On the other hand, American users will want to see 1,234.56; 8:52:16 PM; mm/dd/yy; and $1,234.56, respectively.

One problem we have when writing web applications is that the server can run in a different locale than our client. If our server runs an American English version of the operating system, it defaults to processing all information in that locale. However, users browsing from Hong Kong hope to see information presented in their way. It should be noted that many web applications do not bother to deal with these issues; therefore, international users are probably used to seeing information presented in American English. However, in the interest of writing the highest-quality application, we will do our best to be fully globalized.

Therefore, we must solve two problems when writing our web applications: which locale the user is visiting from, and how the locale of our application is set.

Learning the User's Locale

The locale settings with which a user is browsing your web application cannot always be determined with certainty. However, there are a couple of key clues we can use to help us make an informed decision, most notably the Accept-Language: header that the browser sends with its request.
The content of the Accept-Language: header is often in this format:

    en-us,en;q=0.5

This says that the browser prefers U.S. English output (en-us) overall and will otherwise accept other versions of English (en). The q=0.5 is a quality factor; the value 0.5 indicates that the user is only half as keen on generic English (en) as on en-us. A language without a quality factor (such as the preceding en-us) is assumed to have a value of 1. Language entries are separated by commas, and a quality factor is always separated from its language entry by a semicolon.

To parse this, we first need to split apart the languages and then extract the quality factors. The first step is done with a call to the explode function (we can use explode, which is not multi-byte character set safe, instead of split because HTTP headers are transmitted in the ISO-8859-1 character set):

    $langs = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);

To extract the quality factors, we need to split the strings around any semicolon boundaries. We will write a function to create an array of arrays, each of which contains a language code and a quality factor:

    <?php
    function generate_languages()
    {
        // no header sent by the browser: nothing to parse
        if (!isset($_SERVER['HTTP_ACCEPT_LANGUAGE']))
            return array();

        // split apart language entries
        $rawlangs = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);

        // initialize output array
        $langs = array();

        // for each entry, see if there's a q-factor
        foreach ($rawlangs as $rawlang)
        {
            $parts = explode(';', $rawlang);
            if (count($parts) == 1)
                $qual = 1;                      // no q-factor
            else
            {
                $qual = explode('=', $parts[1]);
                if (count($qual) == 2)
                    $qual = (float)$qual[1];    // q-factor
                else
                    $qual = 1;                  // ill-formed q-factor
            }

            // create an array for this entry
            $langs[] = array('lang' => trim($parts[0]), 'q' => $qual);
        }

        // sort the entries
        usort($langs, 'compare_quality');
        return $langs;
    }

    // this function sorts by q-factors, putting highest first.
    function compare_quality($in_a, $in_b)
    {
        // quality is at key 'q'
        if ($in_a['q'] > $in_b['q'])
            return -1;
        else if ($in_a['q'] < $in_b['q'])
            return 1;
        else
            return 0;
    }
    ?>

We can parse the preceding Accept-Language: header value with the generate_languages function and obtain the following output:

    Array
    (
        [0] => Array
            (
                [lang] => en-us
                [q] => 1
            )

        [1] => Array
            (
                [lang] => en
                [q] => 0.5
            )

    )

Unfortunately, learning the user's locale is only half the battle. The second half is telling PHP to use a given locale, which depends on the operating system on which PHP is running.

Setting the Locale of the Current Page (Unix)

Unix systems vary in how they support locales, but a common scheme seen in some flavors of Linux and FreeBSD is to store locale information in files under /usr/share/locale, where each language has its own subdirectory. However, there is still some variation. Many Linux versions (including SuSE and Red Hat) place the language information in directories of the form en_US/ or de/, while FreeBSD places it in directories such as en_US.ISO8859-1/ or fr_FR.ISO8859-15/.

You can change the locale for the current page with the setlocale function. This function takes two parameters. The first specifies which features the locale is to be set for, while the second indicates the locale to be used (and is both operating-system specific and case-sensitive). The first parameter will be one of the following values: LC_ALL, LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC, LC_TIME, or (where the system supports it) LC_MESSAGES.
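As a small illustration of the two parameters, the following sketch sets only the numeric category and then asks PHP which locale is in effect for it. It assumes the "C" locale, chosen here only because nearly every system provides it; a real application would pass a name such as en_US or de_DE that is actually installed on the server.

```php
<?php
// Assumption: "C" is used because it is available almost everywhere.
// Substitute a locale name that exists on your own server.

// Passing the string '0' as the locale does not change anything;
// it only reports the current setting for the given category.
$previous = setlocale(LC_NUMERIC, '0');

// Change the locale for number formatting only. Other categories
// (dates, currency, collation) are left as they were.
$result = setlocale(LC_NUMERIC, 'C');

// Query again to confirm what is now in effect.
$current = setlocale(LC_NUMERIC, '0');

echo "LC_NUMERIC was: $previous\n";
echo "LC_NUMERIC is now: $current\n";
```

Using a narrow category such as LC_NUMERIC instead of LC_ALL is one way to limit the side effects of a locale change to the kind of output we actually want to alter.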
The setlocale function returns the name of the locale set on success, or FALSE if it fails. Our best bet for using this function is to try a series of attempts and default to leaving the locale unchanged when we cannot find a given locale:

    <?php
    // Linux version
    function set_to_user_locale()
    {
        $langs = generate_languages();

        foreach ($langs as $lang)
        {
            $name = $lang['lang'];

            // browsers send major-sublang (en-us); Linux expects
            // major_SUBLANG (en_US), so sublang must be uppercase
            if (strlen($name) > 2)
            {
                $name = substr($name, 0, 2) . '_'
                        . strtoupper(substr($name, 3, 2));
            }

            // try to set the locale.
            if (setlocale(LC_ALL, $name) !== FALSE)
                break;  // it worked!
        }
    }
    ?>

Unfortunately, web application authors on FreeBSD have to do some extra work to make sure the character set associated with the locale name works properly.

Setting the Locale of the Current Page (Windows)

Applications that run on Microsoft Windows operating systems also use the setlocale function to set the locale of the current page; however, they have the added complication that Windows does not use the same locale names that the browser sends. On these systems, we have no choice but to map the browser's language codes to the strings that Windows uses. Windows language strings are largely based on the English name of the language, though most have three-letter short forms that can also be used. Table 21-1 shows the more common values that you will encounter. You can see a list of these values by going to http://www.msdn.com and searching for "Language Strings."
For our web applications, we have to map between the language codes that browsers send and the language strings available in Windows:

    <?php
    // Windows version
    function set_to_user_locale()
    {
        static $langmappings = array(
            array('codes' => array('en', 'en-us', 'en_us'),
                  'locale' => 'english'),
            array('codes' => array('en-gb', 'en_gb'),
                  'locale' => 'english-uk'),
            array('codes' => array('fr', 'fr-fr', 'fr_fr'),
                  'locale' => 'french'),
            array('codes' => array('fr_ca', 'fr-ca'),
                  'locale' => 'french-canadian'),
            array('codes' => array('de', 'de-de', 'de_de'),
                  'locale' => 'german'),
            // the ISO 639-1 code for Japanese is 'ja', not 'jp'
            array('codes' => array('ja', 'ja-jp', 'ja_jp'),
                  'locale' => 'japanese'),
            array('codes' => array('es', 'es-es', 'es_es'),
                  'locale' => 'spanish')
            // etc. -- we have skipped many for space.
        );

        // get the languages the browser wants.
        $user_langs = generate_languages();

        // start with the most likely first
        foreach ($user_langs as $user_lang)
        {
            // look through our array of mappings ...
            foreach ($langmappings as $mapping)
            {
                // ... for a code that matches what the user wants
                foreach ($mapping['codes'] as $code)
                {
                    if ($code == strtolower($user_lang['lang']))
                    {
                        setlocale(LC_ALL, $mapping['locale']);
                        return;
                    }
                }
            }
        }

        // didn't find compatible locale. just leave it
    }
    ?>

Unfortunately, these functions are inefficient. We should only use them when necessary.

Learning About the Current Locale

When you wish to do numeric formatting manually or to learn more about the locale in which your page is operating, you can use the localeconv function in PHP to retrieve an array of information pertinent to the formatting of numbers for the current locale. The array returned will contain the keys shown in Table 21-2.

The trick to using localeconv is to first call setlocale with an appropriate locale name. The reason is that these functions reside in separate libraries in the operating system and do not get initialized until we begin to use them.
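The call order described above can be sketched as follows: set a locale first, then ask localeconv for its formatting rules, and feed those rules to number_format. This is a minimal sketch under stated assumptions: the "C" locale is used only because nearly every system provides it, and format_for_locale is a hypothetical helper written for this illustration, not a PHP built-in.

```php
<?php
// Assumption: "C" is used because it is available almost everywhere.
// Replace it with a locale installed on your server (for example it_IT).
setlocale(LC_ALL, 'C');

// Fetch the formatting rules for whatever locale is now active.
$info = localeconv();

// Hypothetical helper: formats a number using the separators that
// localeconv reported for the current locale.
function format_for_locale($number, $decimals, array $info)
{
    // Guard against an empty decimal point by falling back to '.'.
    $point = ($info['decimal_point'] !== '') ? $info['decimal_point'] : '.';

    // thousands_sep may legitimately be an empty string.
    return number_format($number, $decimals, $point, $info['thousands_sep']);
}

$formatted = format_for_locale(1234567.891, 2, $info);
echo $formatted, "\n";
```

Under the C locale there is no thousands separator, so the digits are not grouped; with a locale such as Italian or French installed and selected, the same helper would pick up that locale's separators instead.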