LDAP and Internationalization

   

Directory services, by their very nature, span language boundaries. Multinational companies might have offices in dozens of countries, each with a distinct language. Electronic commerce sites might have customers in many different countries. To address this need, LDAPv3 was designed to support multiple languages.

LDAPv3 uses the UTF-8 (UCS Transformation Format 8) encoding for all textual attribute values and distinguished names. UTF-8 is a standard encoding of the Unicode character set, which can represent text in virtually all written languages in use today. Unicode is defined and developed by the Unicode Consortium, an industry group.

There are two important points to understand about UTF-8. First, because of the way UTF-8 is designed, ASCII data is also valid UTF-8 data, giving it the advantage of being highly compatible with existing English-language directory data; no work needs to be done to transform the data into valid UTF-8.
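The ASCII compatibility described above can be checked directly. The following is a minimal Python sketch; the name used is a made-up directory value, not one taken from any real directory.

```python
# A hypothetical, ASCII-only directory value.
name = "Barbara Jensen"

# Encoding the same text as ASCII and as UTF-8 yields identical bytes.
ascii_bytes = name.encode("ascii")
utf8_bytes = name.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# Bytes written by an ASCII-only application decode as UTF-8 unchanged.
round_trip = ascii_bytes.decode("utf-8")

print(same_bytes, round_trip == name)
```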

The second point is that when you use UTF-8, it becomes unnecessary to declare an attribute value to be in a particular character set. In other systems, values must be tagged with their character set (for example, Latin-1 or Shift-JIS) so that the data can be interpreted correctly. Because UTF-8 can encode the characters of virtually all languages, such tagging is unnecessary. It's even possible to use multiple languages within a single attribute value.
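As an illustration of mixing languages in one value, the sketch below encodes and decodes a single string that contains English, French, and Japanese text. The value itself is invented for the example.

```python
# A hypothetical description value mixing English, French, and Japanese.
value = "Sales / Ventes à l'étranger / 営業"

# One encoding covers the whole value; no per-value character set label
# is needed to decode it again.
encoded = value.encode("utf-8")
decoded = encoded.decode("utf-8")

# ASCII characters occupy one byte each; other characters occupy more.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in "Sé営"}

print(decoded == value, byte_lengths)
```

The varying byte lengths are why a UTF-8 value's length in bytes can exceed its length in characters, which may matter when sizing buffers or enforcing length limits.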

Because LDAPv3 servers can store text in multiple languages, it is useful to have some way to store and access attributes by language type. For example, in an international corporation with offices in the United States and Japan, it may be desirable to store several representations of a Japanese employee's name in the directory, including a version in Japanese and a version in English. The LDAP Extension (LDAPEXT) Working Group in the IETF has proposed a method for accomplishing this through the use of language codes.

A language code is an option on an LDAP attribute name. Separated from the base attribute name with a semicolon, it gives the particular language for the attribute in a standard format. For example, the attribute type cn;lang-fr refers to a common name in the French language, and the attribute type sn;lang-ja refers to a surname in the Japanese language. All language names are represented by two-character codes defined in ISO Standard 639, Code for the Representation of Names of Languages.
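The semicolon-separated form can be taken apart with ordinary string handling. The helper names below (split_attribute, language_of) are hypothetical and exist only for this sketch; they are not part of any LDAP library.

```python
def split_attribute(description):
    """Split 'cn;lang-fr' into the base name and a list of options."""
    base, *options = description.split(";")
    return base, options


def language_of(description):
    """Return the language part (e.g. 'fr'), or None if there is no lang- option."""
    _, options = split_attribute(description)
    for option in options:
        if option.lower().startswith("lang-"):
            return option[len("lang-"):]
    return None


print(split_attribute("cn;lang-fr"))
print(language_of("sn;lang-ja"))
```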

The LDAP language code standard also allows names to be represented in a particular regional dialect or usage of a particular language. For example, there are some minor differences in how the English language is written in the United States and the United Kingdom. Whereas the language code lang-en-US identifies an attribute in the U.S. dialect, the language code lang-en-GB indicates the British dialect. The country codes used to specify the region are defined in ISO Standard 3166, Codes for the Representation of Names of Countries.
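A language code with a region can be separated into its language and country parts as follows. This is a simplified sketch: the function name split_language_code is hypothetical, and the code assumes the lang-xx or lang-xx-YY shapes shown above rather than validating against the ISO code lists.

```python
def split_language_code(code):
    """Split 'lang-en-US' into ('en', 'US'); the region is None if absent."""
    prefix, _, tag = code.partition("-")
    if prefix.lower() != "lang" or not tag:
        raise ValueError(f"not a language code: {code!r}")
    language, _, region = tag.partition("-")
    return language, (region or None)


print(split_language_code("lang-en-US"))
print(split_language_code("lang-en-GB"))
print(split_language_code("lang-fr"))
```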

An LDAP client may use language codes in search filters and attribute lists. By using a language code in the filter, a client limits its search to attributes in the language it is interested in; by using language codes in the list of attributes to be returned, it requests that only values in specific languages be returned. For example, a client could search the French common name attribute with the filter (cn;lang-fr=Jules) and, by listing only cn;lang-fr and description;lang-fr as the attributes to be returned, specify that only the French common name and description be returned.
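The French search described above can be assembled as plain data before it is handed to whatever client library is in use. The sketch below is library-independent: build_search, tag, and the base DN dc=example,dc=com are assumptions made for illustration, and the filter value is not escaped, which a real client would need to handle.

```python
def tag(name, language):
    """Attach a language code to a base attribute name."""
    return f"{name};lang-{language}"


def build_search(base_dn, attribute, value, language, return_attrs):
    """Build the filter and attribute list for a language-specific search."""
    return {
        "base": base_dn,
        # Note: value is inserted as-is; special filter characters are not escaped.
        "filter": f"({tag(attribute, language)}={value})",
        "attributes": [tag(name, language) for name in return_attrs],
    }


request = build_search(
    "dc=example,dc=com",  # hypothetical search base
    "cn", "Jules", "fr",
    ["cn", "description"],
)
print(request["filter"])
print(request["attributes"])
```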

Note that there is no way to retrieve all dialects of a particular language by requesting the language code alone. For example, the attribute type cn;lang-en is not the same as the attribute type cn;lang-en-US. Each dialect must be specifically requested, so in general, avoid the use of dialects unless necessary. Attributes with language codes are, however, treated as subtypes of attributes without language codes. So, for example, the attribute cn;lang-en is a subtype of the attribute cn. Requesting the cn attribute retrieves all language code variations of the cn attribute.
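The matching rules in this paragraph can be modeled in a few lines. This is only a sketch of the behavior as described here, using a hypothetical matches_request helper and an invented list of attribute names; actual servers may implement matching differently.

```python
def matches_request(requested, stored):
    """True if a stored attribute would be returned for a requested one."""
    req_base, *req_opts = requested.lower().split(";")
    sto_base, *sto_opts = stored.lower().split(";")
    if req_base != sto_base:
        return False
    # Every requested option must appear verbatim on the stored attribute,
    # so lang-en does not match lang-en-us.
    return set(req_opts) <= set(sto_opts)


# Hypothetical attribute names present on one entry.
entry = ["cn", "cn;lang-en", "cn;lang-en-US", "sn;lang-ja"]

all_cn = [a for a in entry if matches_request("cn", a)]
only_en = [a for a in entry if matches_request("cn;lang-en", a)]
print(all_cn)
print(only_en)
```

Requesting the base attribute picks up every language variation, while requesting cn;lang-en leaves out the cn;lang-en-US dialect.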

Language codes are a relatively new development, and not all servers support them at this time. Check with your software vendor to see whether language codes are supported.

   


Understanding and Deploying LDAP Directory Services (2nd Edition)
ISBN: 0672323168
Year: 2002
Pages: 242
