Looking Ahead | Understanding and Deploying LDAP Directory Services (2nd Edition)



LDAP and Internationalization

Directory services, by their very nature, span language boundaries. Multinational companies might have offices in dozens of countries, each with a distinct language. To address this growing need, LDAPv3 was designed to support multiple languages easily.

LDAPv3 uses the UTF-8 (Unicode Transformation Format-8) character set for all textual attribute values and distinguished names. UTF-8 is a standard character encoding, a transformation format of Unicode, that can represent text in virtually all written languages in use today. Unicode is defined and developed by the Unicode Consortium, an industry group.

There are two important points to understand about UTF-8. First, because of the way UTF-8 is designed, ASCII data is also valid UTF-8 data. This has the benefit of being highly compatible with existing English-language directory data; no work needs to be done to transform the data into valid UTF-8.
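The compatibility described above is easy to check. The following is a minimal sketch in Python; the sample name is made up for illustration.

```python
# Sample value is hypothetical; any pure-ASCII string behaves the same way.
ascii_value = "Barbara Jensen"

ascii_bytes = ascii_value.encode("ascii")
utf8_bytes = ascii_value.encode("utf-8")

# For characters in the ASCII range, both encodings yield identical bytes,
# so existing ASCII directory data is already valid UTF-8.
same_bytes = ascii_bytes == utf8_bytes

# Decoding the ASCII bytes as UTF-8 round-trips without any conversion step.
round_trip = ascii_bytes.decode("utf-8")
```

Both checks succeed for any string made up only of ASCII characters, which is why existing English-language data needs no transformation.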

The second point is that when you use UTF-8, it becomes unnecessary to declare an attribute value to be in a particular character set. In other systems, values must be tagged with their character set (e.g., Latin-1, Shift-JIS) so that the data may be correctly interpreted. Because the UTF-8 character set contains codes for the characters of virtually all languages, this tagging is unnecessary. It's even possible to use multiple languages within a single attribute value.
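As a small illustration of mixing languages in one value, the sketch below encodes and decodes a single string that contains both English and Japanese text. The value itself is invented for the example.

```python
# Hypothetical attribute value mixing English and Japanese text.
mixed_value = "Taro Yamada (山田 太郎)"

encoded = mixed_value.encode("utf-8")  # the bytes a client would send
decoded = encoded.decode("utf-8")      # no character-set tag is needed to decode

# ASCII characters occupy one byte each and the Japanese characters take
# several, so the byte length exceeds the character count.
char_count = len(mixed_value)
byte_count = len(encoded)
```

The whole value is decoded with a single call and no per-language tagging, because every character in it has a UTF-8 representation.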

Because LDAPv3 servers can store text in multiple languages, it is useful to have some way to store and access attributes by language type. For example, in an international corporation with offices in the United States and Japan, it may be desirable to store several representations of a Japanese employee's name in the directory, including a version in Japanese and a version in English. The LDAP Extensions Working Group in the IETF has proposed a method for accomplishing this through the use of language codes.

A language code is an option on an LDAP attribute name. Separated from the base attribute name with a semicolon, it gives the particular language for the attribute in a standard format. For example, the attribute type cn;lang-fr refers to a common name in the French language, and the attribute type sn;lang-ja refers to a surname in the Japanese language. All language names are represented by a two-character code defined in ISO Standard 639, "Code for the representation of names of languages."
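The syntax above can be sketched as a small parser. The helper name `split_language_option` is hypothetical, and treating the `lang-` prefix case-insensitively is an assumption made for this example.

```python
def split_language_option(attribute_description):
    """Split 'cn;lang-fr' into ('cn', 'fr'); return (name, None) if untagged.

    Hypothetical helper for illustration; it only looks for the lang- option.
    """
    base, *options = attribute_description.split(";")
    for option in options:
        # Assumption: the lang- prefix is matched case-insensitively.
        if option.lower().startswith("lang-"):
            return base, option[len("lang-"):]
    return base, None
```

For example, `split_language_option("sn;lang-ja")` yields the base name `sn` and the language `ja`, while a plain `cn` yields `cn` and no language.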

The LDAP language code standard also allows for names to be represented in a particular regional dialect or usage of a particular language. For example, there are some minor differences in how the English language is written in the United States and the United Kingdom. The language code lang-en-US identifies an attribute in the U.S. dialect, whereas the language code lang-en-GB indicates the British dialect. The country codes used to specify the region are defined in ISO Standard 3166, "Codes for the representation of names of countries."
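To make the structure of a regional language code concrete, here is a rough sketch that separates the language from the country. The function name `split_language_code` is hypothetical, and the code does not validate values against ISO 639 or ISO 3166.

```python
def split_language_code(language_code):
    """Split 'lang-en-US' into ('en', 'US'); country is None when absent.

    Hypothetical helper: it assumes the form lang-<language>[-<country>]
    and performs no validation of the codes themselves.
    """
    prefix, _, tag = language_code.partition("-")
    if prefix.lower() != "lang" or not tag:
        raise ValueError(f"not a language code: {language_code!r}")
    language, _, country = tag.partition("-")
    return language, (country or None)
```

With this helper, `lang-en-US` and `lang-en-GB` share the language `en` but differ in country, and `lang-fr` has no country component at all.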

An LDAP client may use language codes in search filters and attribute lists. In other words, an LDAP client may limit its search to only those attributes in the specific language it is interested in, and it may request that only specific languages be returned by specifying language codes in the list of attributes to be returned. For example, a client could search the French common name attribute with the filter (cn;lang-fr=Jules) and specify that the French common name and description attributes be returned by including only cn;lang-fr and description;lang-fr in the list of attributes to be returned.
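The French example above can be assembled programmatically. This sketch only builds the filter string and attribute list; it does not contact a server, the helper names are hypothetical, and it assumes the match value contains no characters that need escaping in a filter.

```python
def with_language(attribute, language):
    """Append a language code to a base attribute name, e.g. cn -> cn;lang-fr."""
    return f"{attribute};lang-{language}"


def build_search(language, match_attribute, match_value, return_attributes):
    """Return (filter, attribute list) restricted to a single language.

    Hypothetical helper; match_value is assumed to need no filter escaping.
    """
    search_filter = f"({with_language(match_attribute, language)}={match_value})"
    attributes = [with_language(name, language) for name in return_attributes]
    return search_filter, attributes


# Reproduces the example from the text: French common name and description.
search_filter, attributes = build_search("fr", "cn", "Jules", ["cn", "description"])
```

The resulting filter and attribute list would then be passed to whatever search call the client's LDAP library provides.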

Note that there is no way to retrieve all dialects of a language by requesting a single language code. For example, the attribute type cn;lang-en is not the same as the attribute type cn;lang-en-US. Each dialect must be specifically requested, so in general, avoid the use of dialects unless necessary. Attributes with language codes are, however, treated as subtypes of attributes without language codes. So, for example, the attribute cn;lang-en is a subtype of the attribute cn. Requesting the cn attribute will retrieve all language code variations of the cn attribute.
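These matching rules can be modeled with a few lines of code. The sketch below is a simplified in-memory model of the behavior described above, not a server implementation; the function name and the sample entry are hypothetical.

```python
def select_attributes(entry, requested):
    """Return the attributes of entry that a request for `requested` matches.

    Simplified model: a name without a language code matches itself and every
    language-coded variant; a name with a language code matches only that
    exact code.
    """
    wanted = requested.lower()
    selected = {}
    for name, values in entry.items():
        lowered = name.lower()
        if lowered == wanted:
            selected[name] = values
        elif ";" not in wanted and lowered.startswith(wanted + ";"):
            selected[name] = values
    return selected


# Hypothetical entry with three variants of the common name.
entry = {
    "cn": ["Barbara Jensen"],
    "cn;lang-en": ["Barbara Jensen"],
    "cn;lang-en-US": ["Babs Jensen"],
}
```

In this model, requesting cn selects all three attributes, whereas requesting cn;lang-en selects only cn;lang-en and leaves cn;lang-en-US out.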

Language codes are a relatively new development, and not all servers support them at this time. Check with your software vendor to see if language codes are supported.

Understanding and Deploying LDAP Directory Services, 2002 New Riders Publishing



2002, O'Reilly & Associates, Inc.