LDAP and Internationalization

Directory services, by their very nature, span language boundaries. Multinational companies might have offices in dozens of countries, each with a distinct language. To address this growing need, LDAPv3 was designed to support multiple languages easily. LDAPv3 uses UTF-8 (the 8-bit Unicode Transformation Format) for all textual attribute values and distinguished names. UTF-8 is a standard character encoding that can represent text in virtually all written languages in use today. It is defined by the Unicode Consortium, an industry group, as part of the Unicode Standard.

There are two important points to understand about UTF-8. First, UTF-8 encodes each of the 128 ASCII characters as the same single byte that ASCII uses, so ASCII data is also valid UTF-8 data. This makes UTF-8 highly compatible with existing English-language directory data: no work needs to be done to convert that data into valid UTF-8. Second, when you use UTF-8, you no longer need to declare which character set an attribute value is in. In other systems, values must be tagged with their character set (e.g., Latin-1, Shift-JIS) so that the data can be interpreted correctly. Because Unicode assigns codes to the characters of virtually all languages, such tagging is unnecessary. It's even possible to use multiple languages within a single attribute value.

Because LDAPv3 servers can store text in multiple languages, it is useful to have some way to store and access attributes by language. For example, an international corporation with offices in the United States and Japan may want to store several representations of a Japanese employee's name in the directory, including a version in Japanese and a version in English. The LDAP Extensions Working Group in the IETF has proposed a method for accomplishing this through the use of language codes. A language code is an option on an LDAP attribute name.
It is separated from the base attribute name by a semicolon and identifies, in a standard format, the language of the attribute's values. For example, the attribute type cn;lang-fr refers to a common name in French, and the attribute type sn;lang-ja refers to a surname in Japanese. All language names are represented by a two-letter code defined in ISO Standard 639, "Code for the representation of names of languages."

The language code scheme also allows names to be represented in a particular regional dialect or usage of a language. For example, there are some minor differences in how English is written in the United States and in the United Kingdom. The language code lang-en-US identifies an attribute in the U.S. dialect, whereas the language code lang-en-GB indicates the British dialect. The country codes used to specify the region are defined in ISO Standard 3166, "Codes for the representation of names of countries."

An LDAP client may use language codes in search filters and in attribute lists. In other words, a client may limit its search to attributes in the specific language it is interested in, and it may ask for only specific languages to be returned by including language codes in the list of attributes to be returned. For example, a client could search the French common name attribute with the filter (cn;lang-fr=Jules) and request only the French common name and description by listing cn;lang-fr and description;lang-fr as the attributes to be returned.

Note that there is no way to retrieve all dialects of a particular language code. For example, the attribute type cn;lang-en is not the same as the attribute type cn;lang-en-US, so requesting cn;lang-en does not return cn;lang-en-US values. Each dialect must be requested specifically, so in general, avoid the use of dialects unless necessary. Attributes with language codes are, however, treated as subtypes of the same attribute without a language code.
So, for example, the attribute cn;lang-en is a subtype of the attribute cn. Requesting the cn attribute retrieves all language-code variants of the cn attribute (cn;lang-en, cn;lang-en-US, cn;lang-ja, and so on).

Language codes are a relatively new development, and not all servers support them at this time. Check with your software vendor to see whether language codes are supported.
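The selection rules described in this section can be modeled in a few lines of Python. This is a minimal, hypothetical sketch, not an LDAP client library: a directory entry is stood in for by a plain dictionary keyed by attribute name, and the helpers parse_description, matches, select_attributes, and equality_filter are invented for illustration. It assumes exactly the behavior described above (a base type such as cn selects all of its language-coded subtypes, while a requested language code must match exactly) and ignores attribute options other than lang-. The filter helper also escapes the characters that are special in LDAP string filters (backslash, asterisk, parentheses, and NUL), as RFC 4515 requires. Real servers may differ in the details, so treat this as an illustration of the rules rather than a description of any particular product.

```python
from typing import Dict, List, NamedTuple, Optional


class AttrDesc(NamedTuple):
    """A parsed attribute description such as 'cn;lang-en-US'."""
    base: str            # attribute type, e.g. "cn" (compared case-insensitively)
    lang: Optional[str]  # tag after "lang-", e.g. "en-us"; None if absent


def parse_description(desc: str) -> AttrDesc:
    """Split 'type;option;...' and keep only a lang- option (simplification)."""
    base, *options = desc.split(";")
    lang = None
    for opt in options:
        if opt.lower().startswith("lang-"):
            lang = opt[len("lang-"):].lower()
    return AttrDesc(base.lower(), lang)


def matches(requested: AttrDesc, stored: AttrDesc) -> bool:
    """Does a requested attribute name select a stored attribute?"""
    if requested.base != stored.base:
        return False
    if requested.lang is None:
        # No language code requested: the base type also selects every
        # language-coded subtype (cn selects cn;lang-ja, cn;lang-en-US, ...).
        return True
    # A language code must match exactly: lang-en does NOT select lang-en-US.
    return requested.lang == stored.lang


def select_attributes(entry: Dict[str, List[str]],
                      requested: List[str]) -> Dict[str, List[str]]:
    """Return the attributes of a (hypothetical) entry selected by `requested`."""
    wanted = [parse_description(r) for r in requested]
    return {name: values for name, values in entry.items()
            if any(matches(w, parse_description(name)) for w in wanted)}


def equality_filter(desc: str, value: str) -> str:
    """Build '(desc=value)', escaping filter metacharacters \\ * ( ) NUL."""
    # Backslash is escaped first so later escapes are not escaped again.
    for ch, esc in (("\\", r"\5c"), ("*", r"\2a"), ("(", r"\28"),
                    (")", r"\29"), ("\x00", r"\00")):
        value = value.replace(ch, esc)
    return f"({desc}={value})"


# Hypothetical entry for a Japanese employee of a U.S./Japan company.
entry = {
    "cn": ["Taro Yamada"],
    "cn;lang-ja": ["山田太郎"],
    "cn;lang-en": ["Taro Yamada"],
    "cn;lang-en-US": ["Taro Yamada"],
    "description;lang-fr": ["Ingénieur"],
}

print(equality_filter("cn;lang-fr", "Jules"))            # (cn;lang-fr=Jules)
print(sorted(select_attributes(entry, ["cn;lang-en"])))  # ['cn;lang-en']
print(sorted(select_attributes(entry, ["cn"])))
# ['cn', 'cn;lang-en', 'cn;lang-en-US', 'cn;lang-ja']
```

Keeping the matching rule in one small function makes the two cases from the text easy to compare: a request without a language code behaves like a request for a supertype, while a request with a language code is an exact lookup, which is why cn;lang-en never returns the cn;lang-en-US value.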
2002, O'Reilly & Associates, Inc.