Better G11N Practices


We'll wrap up this chapter by providing a set of "better" practices for developing G11N applications in ColdFusion MX 7. "Better" means better than "good" but not quite ready to be the "best." Why not "best practices"? My own modesty aside, mainly because ColdFusion MX 7 is so new and there are so many ways to pitch globalized woo.

As shown throughout this chapter, the introduction of CFCs and Unicode in ColdFusion, together with easier access to Java I18N libraries, has certainly improved our ability to develop G11N applications. Yet G11N concepts are still relatively new to ColdFusion application developers, and I expect there is still quite a bit more to evolve as the community's "heavy thinkers" turn their brains to this field. What follows are what I think are better G11N practices now that ColdFusion MX 7 has come to town.

What Not to Do

Before beginning on what you ought to be doing, let me consolidate some of the points made in the previous sections about what not to do.

Don't use a database that doesn't support Unicode. In this day and age, it must be some kind of dinosaur anyway, so why bother?

Don't make your database monocultural. Here are some things to do to make your application more culturally sensitive:

  • Require almost nothing cultural in your forms or database.

  • Non-English terms, names, and so forth are sometimes longer than their English equivalent or consist of more than one word. Keep this in mind when designing databases, input forms, and reports.

  • Mailing address forms and database designs should be flexible in order to handle the varied address styles in use around the world (see http://www.upu.int/post_code/en/addressing.html for examples). Postal codes (ZIP codes in the U.S.) should not assume numeric-only data. The Street part of your address design should allow for more than one street part. Your application logic should not assume that address identifiers are always house numbers/street addresses. Also, it should not assume any patterns (left side odd, right side even) or regular sequences (house #3 might not come after house #1).

  • Don't assume global measurement units are the same ones printed on your cereal box. Your application should separate measurement units from measurement values (that is, as separate database columns or form fields). And for developers in metric countries, the whole world isn't yet metric. For applications dealing with apparel (clothing, shoes, and so on) be aware that sizes vary wildly from locale to locale.

  • Always store date/time data as UTC (Coordinated Universal Time, for practical purposes the same as Greenwich Mean Time). This matters most when your servers or users are physically located in many time zones.
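
As a minimal sketch of the UTC advice above, ColdFusion's dateConvert() function can shift a server-local date/time to UTC before it is stored, and back again for display. The variable names are illustrative only, and the sketch assumes the server's own time zone is configured correctly.

```cfml
<cfscript>
// Server-local "now", shifted to UTC before it is stored.
localNow = now();
utcNow = dateConvert("local2Utc", localNow);

// A stored UTC value shifted back to server-local time for display.
displayTime = dateConvert("utc2Local", utcNow);

// The server's offset from UTC, in seconds (handy when debugging).
tzInfo = getTimeZoneInfo();
</cfscript>

<cfoutput>
Stored (UTC): #dateFormat(utcNow, "yyyy-mm-dd")# #timeFormat(utcNow, "HH:mm:ss")#<br>
Shown (server local): #dateFormat(displayTime, "yyyy-mm-dd")# #timeFormat(displayTime, "HH:mm:ss")#<br>
Server offset from UTC: #tzInfo.utcTotalOffset# seconds
</cfoutput>
```

Note that dateConvert() only knows about the server's time zone; converting to an individual user's time zone takes more than this sketch shows.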

Don't presume a Gregorian calendar system for your date/time data. There are at least five other major calendar systems in popular use today (Buddhist, Chinese, Hebrew, Islamic, and Japanese, as discussed in this chapter). While it's sometimes acceptable to use the Gregorian calendar with localized date formats, this isn't true in all locales; it's not A.D. 2005 to everyone. As noted in the "Calendars" section, this is especially critical for date-based calculations; one person's 30 days might not be another's month.

Don't ignore CSS. Although CSS use can be complicated owing to browser quirks and the lack of a CSS police force, its use can greatly simplify G11N page layouts across locales, especially for applications that need to support BIDI text.

Don't assume text or graphic directionality. For instance, people in Arabic/Hebrew cultures look at a page of text and graphics differently from people in Canada.

Don't mix text/text presentation and application code. This is, by far, the dreariest part of making an existing ColdFusion application I18N. You must search out each wayward bit of hard-coded text and replace it with ColdFusion variables of one sort or another.

Don't fail to use UTF-8 encoding. Just use Unicode. UTF-8 is the default encoding for ColdFusion and so offers the path of least resistance to ColdFusion globalization.

TIP

If you want to save yourself the trouble of showering your ColdFusion pages with <cfprocessingdirective> tags, it's wise to use an editor such as Dreamweaver, which supports a BOM (byte order mark) so that the ColdFusion server will automatically understand your page's encoding.


Let's next examine a design issue: monolingual versus multilingual Web sites.

Monolingual or Multilingual Web Sites

Some folks in the G11N field break down Web application design into a choice between so-called monolingual and multilingual designs. In its simplest form, a monolingual design opts for a distinct URL for each language served by the application. For example, a Web site originally in English, say www.iWantYourMoney.com, might through some mechanism redirect its French users to fr.iWantYourMoney.com, its Thai users to th.iWantYourMoney.com, and so on. Such a design is obviously not well suited to static HTML text (translation and HTML file management would quickly become a nightmare). It would, however, work quite nicely as a ColdFusion-driven Web site. The downside to this approach appears when an application has to serve locales rather than single languages; for example, French Canadian versus French in France. The design would then either have to keep adding URLs to the mix it must maintain, or it would have to develop special mechanisms to handle locale within a monolingual site (in effect, a multilingual design embedded in a monolingual site).

A multilingual design would serve all supported languages and/or locales from a single URL, www.iWantYourMoney.com. And this usually makes heavy use of resource bundles to deliver locale-specific text. Given the choice, this is my personal preference for the following reasons:

  • Much more sensitive to locale. You can just as easily serve fr as you can fr_FR or fr_CA.

  • Simplifies the application by removing the need to redirect to another URL, which also makes it more palatable to users.

  • Helps simplify load-balancing schemes.

  • Reduces potential issues with site structure varying across locales. (This is a pet peeve of mine. I find it particularly annoying to see a Web site's structure in one language be a thin shadow of itself in another; government Web sites are often culprits).

As you might imagine, there are shades of these two design colors, and variations within each. There's really no right or wrong way to handle this, as long as you follow the globalization practices outlined in this chapter.

Locale Stickiness

As noted earlier in the chapter, it's very important for your ColdFusion application to remember a user's locale preference. The monolingual design is one sure way not to forget; users are basically "stuck" in a domain that's devoted to their language/locale. Stickiness in the multilingual design can be maintained via

  • URL variables such as index.cfm?locale=fr_CA, which requires some mechanism to rewrite the URL variable string to append the locale on each page request

  • Cookies, but these are subject to users turning them off, proxy servers trashing them, and so on

  • SESSION variables

A frequent choice for maintaining stickiness is the SESSION variable because it is usually the easiest to understand and maintain in an application's code. The user's locale choice is simply "there," and nothing special need be done in the application to maintain it across page requests. The downside to using these variables is the added complexity in load balancing and handling expired sessions. Of course, another option to maintain stickiness is to use SESSION variables with URL variables as a fallback mechanism.
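
The SESSION-with-URL-fallback option might look something like the following sketch, placed early in each request. It assumes session management is already enabled for the application; the list of supported locales, the default locale, and the variable names other than SESSION.userLocale are hypothetical.

```cfml
<cfscript>
// Hypothetical list of locales this application supports.
supportedLocales = "en_US,fr_FR,fr_CA,th_TH";
defaultLocale = "en_US";

// Position of the URL's locale in the supported list (0 if absent or unsupported).
localePos = 0;
if (structKeyExists(URL, "locale")) {
  localePos = listFindNoCase(supportedLocales, URL.locale);
}

if (localePos GT 0) {
  // A supported choice on the URL wins and is remembered in the session.
  SESSION.userLocale = listGetAt(supportedLocales, localePos);
} else if (NOT structKeyExists(SESSION, "userLocale")) {
  // New or expired session with no usable URL hint: fall back to the default.
  SESSION.userLocale = defaultLocale;
}
</cfscript>
```

If links carry the locale along (for example, index.cfm?locale=fr_CA), a user whose session has expired lands back in the right locale rather than the default.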

HTML

ColdFusion always overrides HTML-based character encoding META tags (specifically CONTENT-TYPE). However, it's usually a good idea to include these, properly identified, in your ColdFusion pages. Why? The main reasons are search engines and speech synthesizers (without some kind of hint, most text-to-speech software cannot tell one language from another). The three most important HTML tags from this perspective are <HTML> and the CONTENT-LANGUAGE META and CONTENT-TYPE META tags.

As noted earlier in the Display section, the <HTML> tag has two important G11N attributes:

  • The lang attribute, which specifies the base language of an element's attribute values and text content (note that you can apply the lang attribute to many HTML elements). Acceptable values for lang follow ISO 639 and ISO 3166, using a format such as primary-language-code-subcode. The primary code is a two-letter language code; the subcode is normally a two-letter country code but might also name a language version (for instance, "en-cockney", the Cockney version of English).

  • The dir attribute, which specifies the base direction of directionally neutral text. Like the lang attribute, dir can be applied to many HTML elements such as tables, inputs, and so forth. Acceptable values for the dir attribute are LTR (left-to-right) and RTL (right-to-left).

The CONTENT-LANGUAGE META tag specifies the primary human language(s) for a page. For example, the following tag indicates that the page contains English as used in the United States, Thai as used in Thailand, and French:

 <META HTTP-EQUIV="CONTENT-LANGUAGE" CONTENT="en-US,th-TH,fr"> 

The CONTENT-TYPE META tag specifies the content type, such as text/html, and the character set used on this page. This is one tag that I habitually include. The following META tag indicates that this page is plain text/HTML and uses a UTF-8 character set:

 <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=UTF-8"> 

See Listing 23.9 for an example of these HTML tags. Even though ColdFusion will ignore them, it is recommended to include these three HTML tags with appropriate attribute values.
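
To show how the three tags sit together, here is a minimal page skeleton of my own (it is not Listing 23.9). The pageLang and pageDir variables are hypothetical; a real application would derive them from the user's locale.

```cfml
<!--- Hypothetical per-request values, normally derived from the user's locale. --->
<cfset pageLang = "ar-EG">
<cfset pageDir = "RTL">

<cfoutput>
<HTML LANG="#pageLang#" DIR="#pageDir#">
<HEAD>
  <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=UTF-8">
  <META HTTP-EQUIV="CONTENT-LANGUAGE" CONTENT="#pageLang#">
  <TITLE>G11N page skeleton</TITLE>
</HEAD>
<BODY>
  <!--- Individual elements can override the base language and direction. --->
  <P LANG="en-US" DIR="LTR">This paragraph is English, left-to-right.</P>
</BODY>
</HTML>
</cfoutput>
```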

CFML

Besides the HTML tags listed above, your application should include the following in its Application.cfm:

 <!--- url and form encoding to UTF-8. --->
 <cfset setEncoding("URL", "UTF-8")>
 <cfset setEncoding("Form", "UTF-8")>
 <!--- output encoding to UTF-8 --->
 <cfcontent type="text/html; charset=UTF-8">

Your applications should also include:

 <cfprocessingdirective pageEncoding="utf-8"> 

at or near the top of each and every one of your applications' templates, unless you can be 100 percent sure that each of them is properly encoded as UTF-8, including the byte order mark (BOM). Since the strict definition of UTF-8 encoding doesn't actually mention using a BOM, you might be better off including the <cfprocessingdirective> tag.

Resource Bundles

To completely separate text and text presentation from application code, your ColdFusion MX 7 application should make use of resource bundles (discussed earlier in the chapter). The resource bundle "flavor" you use depends entirely on your application's logic and how and where it can be deployed, although it is strongly recommended to use the Java-style resource bundle owing to the excellent set of free management tools available.

Your application should separate resource bundle files into logical groupings (for instance, menu.properties, deskTop.properties, loginForm.properties) and then again into locale-specific files within those groupings (menu_en_US.properties, menu_th_TH.properties, and so forth). It is recommended that resource bundles be loaded into shared-scope structure variables with locale as the key; for instance, APPLICATION.menu.en_us or, if you prefer, APPLICATION.menu["en_US"].

There are generally two approaches used to initialize these resource bundles: "on demand" and "fire and forget."

With "on demand" initialization, the application loads each locale's resource bundles as needed (that is, when a user from that locale visits the Web site). This approach might be appropriate in situations where the developer knows or suspects the application will have an uneven distribution of locale users, say 1 million Americans, 500,000 Italians, 10,000 Brazilians, and 3 guys in New Jersey who speak Zulu. (Of course, there are plenty of Zulu or IsiZulu speakers outside New Jersey, but for the purpose of this discussion let's assume there are only three and they live in Hoboken.) Server resources are used only as and when needed: if the three Zulu speakers in New Jersey never visit the Web site, the application never loads that locale's resource bundle. Listing 23.13 shows an example of this arrangement.

Listing 23.13. onDemandRB.cfm: "On Demand" Resource Bundle Loading

 <cfscript>
 // uses rbJava CFC to handle java style resource bundles
 if (NOT structKeyExists(APPLICATION.commonBundle,"#SESSION.userLocale#")) {
   APPLICATION.commonBundle[SESSION.userLocale]=rB.getResourceBundle("common.thisAppCommon",SESSION.userLocale,markDebug);
   APPLICATION.adminBundle[SESSION.userLocale]=rB.getResourceBundle("admin.thisAppAdmin",SESSION.userLocale,markDebug);
   APPLICATION.appBundle[SESSION.userLocale]=rB.getResourceBundle("applications.thisAppApplications",SESSION.userLocale,markDebug);
   APPLICATION.globalBundle[SESSION.userLocale]=rB.getResourceBundle("global.thisAppGlobal",SESSION.userLocale,markDebug);
   APPLICATION.groupBundle[SESSION.userLocale]=rB.getResourceBundle("groups.thisAppGroups",SESSION.userLocale,markDebug);
   APPLICATION.toolsBundle[SESSION.userLocale]=rB.getResourceBundle("tools.thisAppTools",SESSION.userLocale,markDebug);
 }
 </cfscript>

This code assumes the use of the rbJava.CFC discussed previously, and also assumes that the relevant ColdFusion structures have been created to hold the resource bundles. The rbJava.CFC returns a structure holding the resource bundle's key/value pairs. The code first checks to see if one of the resource bundle structures has a key for this user's locale. If not, it then proceeds to load the various resource bundles into the appropriate structures using this user's locale as a key.

The dot-syntax naming scheme (common.thisAppCommon) for the resource bundle properties file is required by the getBundle method, which uses a directory.fileName notation to find the correct resource bundle property file.

The markDebug option for that CFC's getResourceBundle function indicates whether or not to mark the returned resource bundles with asterisks (*) to aid in debugging. (It helps pick out text supplied from the resource bundle versus static application text left over from the I18N process; that is, bugs.)
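
Listing 23.13 assumes that the bundle structures, the rB object, and markDebug already exist. One way to set them up, for example in Application.cfm, is sketched below. It assumes the application has been named with <cfapplication> so that the APPLICATION scope is available, and that the CFC can be found under the name "rbJava"; that path, the bundleNames list, and the loop variables are my own illustration rather than part of the CFC.

```cfml
<!--- Create the empty bundle structures once. The exclusive lock avoids a
      race between simultaneous first requests. --->
<cflock scope="APPLICATION" type="exclusive" timeout="10">
  <cfscript>
  bundleNames = "commonBundle,adminBundle,appBundle,globalBundle,groupBundle,toolsBundle";
  for (i=1; i LTE listLen(bundleNames); i=i+1) {
    thisBundle = listGetAt(bundleNames, i);
    if (NOT structKeyExists(APPLICATION, thisBundle)) {
      APPLICATION[thisBundle] = structNew();
    }
  }
  </cfscript>
</cflock>

<cfscript>
// Assumption: rbJava.cfc is reachable from this template as "rbJava".
rB = createObject("component", "rbJava");
// true marks bundle text with asterisks while debugging; false for production.
markDebug = false;
</cfscript>
```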

"Fire and forget" simply loads all supported-locale resource bundles when the application is initialized, rather than waiting for a user to initialize a new resource bundle for a particular locale. This method of initialization might best be applied when the developer knows that locale usage is evenly distributed or doesn't want to bother with any dynamic loading of resource bundles.

Listing 23.14 provides a simple example of this technique. The array of application-supported locales (APPLICATION.supportLocales) might be supplied as an application parameter or dynamically determined by parsing the locales available for any given resource bundle.

Listing 23.14. ffRB.cfm: "Fire and Forget" Resource Bundle Loading

 <cfscript>
 // uses rbJava CFC to handle java style resource bundles
 for (i=1; i LTE arrayLen(APPLICATION.supportLocales); i=i+1) {
   //verbose for readability
   thisLocale=APPLICATION.supportLocales[i];
   APPLICATION.commonBundle[thisLocale]=rB.getResourceBundle("common.thisAppCommon",thisLocale,markDebug);
   APPLICATION.adminBundle[thisLocale]=rB.getResourceBundle("admin.thisAppAdmin",thisLocale,markDebug);
   APPLICATION.appBundle[thisLocale]=rB.getResourceBundle("applications.thisAppApplications",thisLocale,markDebug);
   APPLICATION.globalBundle[thisLocale]=rB.getResourceBundle("global.thisAppGlobal",thisLocale,markDebug);
   APPLICATION.groupBundle[thisLocale]=rB.getResourceBundle("groups.thisAppGroups",thisLocale,markDebug);
   APPLICATION.toolsBundle[thisLocale]=rB.getResourceBundle("tools.thisAppTools",thisLocale,markDebug);
 }
 </cfscript>
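
For the "dynamically determined" option, a rough sketch is to list one bundle family's properties files and peel the locale out of each file name. The directory location and the local variable names below are hypothetical, and the sketch assumes the baseName_locale.properties naming convention described earlier.

```cfml
<!--- Hypothetical location and base name of one resource bundle family. --->
<cfset bundleDir = expandPath("./common")>
<cfset baseName = "thisAppCommon">

<!--- Find files such as thisAppCommon_en_US.properties. --->
<cfdirectory action="list" directory="#bundleDir#"
  filter="#baseName#_*.properties" name="bundleFiles">

<cfset supportLocales = arrayNew(1)>
<cfloop query="bundleFiles">
  <!--- Strip the "thisAppCommon_" prefix and ".properties" suffix, leaving the locale. --->
  <cfset thisLocale = replaceNoCase(bundleFiles.name, baseName & "_", "")>
  <cfset thisLocale = replaceNoCase(thisLocale, ".properties", "")>
  <cfset arrayAppend(supportLocales, thisLocale)>
</cfloop>

<cfset APPLICATION.supportLocales = supportLocales>
```

The filter deliberately skips the default thisAppCommon.properties file, since it carries no locale in its name.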

Just Use Unicode

If your application needs to support more than a few languages, especially any of the CJK languages, it needs to use Unicode as its character encoding. This isn't rocket science and, as emphasized earlier, there are no real adverse side effects to using Unicode. The only serious issue arises over legacy data using codepage encodings, and in the long run you'll be better off biting the bullet and doing the conversion early on in the application's life cycle. Just use Unicode.

When ColdFusion MX 7 Isn't Enough

Out of the preceding sections, you'll want to recall one of the guiding principles for developing G11N applications: These applications should be generic. A G11N application must be distilled down to its essence (internationalized) before it becomes specialized (localized). That old saw "Build once, run anywhere" really does apply to G11N applications.

Making generic applications in the face of all this complexity is difficult. It becomes even more difficult if you are faced with the situation where you need to support locales that ColdFusion MX 7 doesn't (or where core Java provides poor support). Figure 23.16 shows a graph of locales supported by various versions of ColdFusion, with the final value in the graph showing locales supported by ColdFusion backed by the ICU4J library. If you're at this G11N business long enough, you will eventually run into that 100+ locale difference. It becomes a question then of handling the unsupported locales as special cases, and maintaining the simplicity and speed offered by the native ColdFusion MX 7 G11N tags and functions for the rest of your locales. Or you can switch to the complete use of ICU4J's G11N functions and deal with the added code complexity, but maintain a truly generic application.

Figure 23.16. ColdFusion locale support, by version.


There is no right or wrong answer for this question of generic applicability. It depends a great deal on the individual developer and/or the policies of the development shop, and how long both have been in the G11N business. My own shop has more or less slid into the ICU4J-only style. We've developed enough applications in locales that ColdFusion previously didn't support (such as Thai and Arabic) that we have a rather large set of essential ICU4J-based tools that we can't live without. Developers new to G11N applications have a tough decision to make and, quite frankly, I can't offer any critical advice other than to measure out your locale support and see if that warrants one style or the other.
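
One rough way to "measure out" locale support on your own server is simply to count what each layer reports. The sketch below is only an illustration: the variable names are mine, and the ICU4J count assumes the ICU4J jar has been added to ColdFusion's classpath (the try/catch leaves it as "n/a" otherwise).

```cfml
<cfscript>
// Locales ColdFusion itself supports (a comma-delimited list).
cfLocaleCount = listLen(SERVER.ColdFusion.SupportedLocales);

// Locales known to core Java on this server's JVM.
javaLocaleCount = arrayLen(createObject("java", "java.util.Locale").getAvailableLocales());

// Locales known to ICU4J, if it is installed.
icuLocaleCount = "n/a";
try {
  icuLocaleCount = arrayLen(createObject("java", "com.ibm.icu.util.ULocale").getAvailableLocales());
}
catch (Any e) {
  // ICU4J is not on the classpath, so leave the count as "n/a".
}
</cfscript>

<cfoutput>
ColdFusion locales: #cfLocaleCount#<br>
Core Java locales: #javaLocaleCount#<br>
ICU4J locales: #icuLocaleCount#
</cfoutput>
```

Comparing these counts against the locales your users actually need is what tells you whether the special-case style or the ICU4J-only style makes more sense.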

You're probably thinking to yourself, "Well, that was a mouthful." Perhaps it was, but G11N concepts are not trivial, even though the code for them very often is. This chapter covered the major G11N issues, including locales, character encoding, databases, and text/code separation via resource bundles. It also provided information to help you over the rough spots and code download resources to help you easily deal with the more common G11N needs. Finally, it set out some "better" G11N practices that you should follow. All in all, this chapter has provided a solid base for developing ColdFusion G11N applications.

"So long and thanks for all the fish."



Advanced Macromedia ColdFusion MX 7 Application Development
ISBN: 0321292693
EAN: 2147483647
Year: 2006
Pages: 240
Authors: Ben Forta, et al
