16.4 Language Tags and HTTP

Language tags are short, standardized strings that name spoken languages.

We need standardized names, or some people will tag French documents as "French," others will use "Français," others still might use "France," and lazy people might just use "Fra" or "F." Standardized language tags avoid this confusion.

There are language tags for English (en), German (de), Korean (ko), and many other languages. Language tags can describe regional variants and dialects of languages, such as Brazilian Portuguese (pt-BR), U.S. English (en-US), and Hunan Chinese (zh-xiang). There is even a standard language tag for Klingon (i-klingon)!

16.4.1 The Content-Language Header

The Content-Language entity header field describes the target audience languages for the entity. If the content is intended primarily for a French audience, the Content-Language header field would contain:

 Content-Language: fr 

The Content-Language header isn't limited to text documents. Audio clips, movies, and applications might all be intended for a particular language audience. Any media type that is targeted to particular language audiences can have a Content-Language header. In Figure 16-8, the audio file is tagged for a Navajo audience.

Figure 16-8. Content-Language header marks a "Rain Song" audio clip for Navajo speakers


If the content is intended for multiple audiences, you can list multiple languages. As suggested in the HTTP specification, a rendition of the "Treaty of Waitangi," presented simultaneously in the original Maori and English versions, would call for:

 Content-Language: mi, en 

However, just because multiple languages are present within an entity does not mean that it is intended for multiple linguistic audiences. A beginner's language primer, such as "A First Lesson in Latin," which clearly is intended to be used by an English-literate audience, would properly include only "en".

16.4.2 The Accept-Language Header

Most of us know at least one language. HTTP lets us pass our language restrictions and preferences along to web servers. If the web server has multiple versions of a resource, in different languages, it can give us content in our preferred language. [14]

[14] Servers also can use the Accept-Language header to generate dynamic content in the language of the user or to select images or target language-appropriate merchandising promotions.

Here, a client requests Spanish content:

 Accept-Language: es 

You can place multiple language tags in the Accept-Language header to enumerate all supported languages and the order of preference (left to right). Here, the client prefers English but will accept Swiss German (de-CH) or other variants of German (de):

 Accept-Language: en, de-CH, de 

Clients use Accept-Language and Accept-Charset to request content they can understand. We'll see how this works in more detail in Chapter 17.
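As a rough illustration of how a server might act on this header, the sketch below walks the client's tags from left to right and returns the first available version that matches. The function names and the list of available versions are hypothetical, and the sketch assumes that a general tag such as "de" also accepts variants such as "de-AT". Real Accept-Language values may also carry quality values (";q=..."), which this sketch simply drops.

```python
def parse_accept_language(header_value):
    """Split an Accept-Language value into lowercase tags, keeping left-to-right order."""
    tags = []
    for item in header_value.split(","):
        # Drop any ";q=..." parameter; quality values are ignored in this sketch.
        tag = item.split(";", 1)[0].strip().lower()
        if tag:
            tags.append(tag)
    return tags


def choose_language(header_value, available):
    """Return the first available tag that satisfies the client's preferences, or None."""
    for wanted in parse_accept_language(header_value):
        for candidate in available:
            have = candidate.lower()
            # "de" accepts "de" itself and any variant such as "de-AT".
            if have == wanted or have.startswith(wanted + "-"):
                return candidate
    return None


# The server has no English version, so Swiss German is the best match.
print(choose_language("en, de-CH, de", ["fr", "de-AT", "de-CH"]))  # de-CH
```

Because the client's order is honored first, "de-CH" wins over "de-AT" here even though "de-AT" appears earlier in the server's list.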

16.4.3 Types of Language Tags

Language tags have a standardized syntax, documented in RFC 3066, "Tags for the Identification of Languages." Language tags can be used to represent:

         General language classes (as in "es" for Spanish)

         Country-specific languages (as in "en-GB" for English in Great Britain)

         Dialects of languages (as in "no-bok" for Norwegian "Book Language")

         Regional languages (as in "sgn-US-MA" for Martha's Vineyard sign language)

         Standardized nonvariant languages (e.g., "i-navajo")

         Nonstandard languages (e.g., "x-snowboarder-slang" [15])

[15] Describes the unique dialect spoken by "shredders."

16.4.4 Subtags

Language tags have one or more parts, separated by hyphens, called subtags:

         The first subtag is called the primary subtag. Its values are standardized.

         The second subtag is optional and follows its own naming standard.

         Any trailing subtags are unregistered.

The primary subtag contains only letters (A-Z). Subsequent subtags can contain letters or numbers, up to eight characters in length. An example is shown in Figure 16-9.

Figure 16-9. Language tags are separated into subtags
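The syntax rules for subtags can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the check covers syntax and not registration, and the eight-character limit on the primary subtag is an assumption taken from the RFC 3066 grammar, not from the text above.

```python
import string

LETTERS = set(string.ascii_letters)
ALNUM = LETTERS | set(string.digits)


def split_subtags(tag):
    """Split a language tag on hyphens into its subtags."""
    return tag.split("-")


def is_well_formed(tag):
    """Check subtag syntax only; whether the values are registered is not checked."""
    subtags = split_subtags(tag)
    primary, rest = subtags[0], subtags[1:]
    # Primary subtag: letters only (length limit assumed from RFC 3066).
    if not primary or len(primary) > 8 or not set(primary) <= LETTERS:
        return False
    # Later subtags: letters or digits, one to eight characters each.
    for subtag in rest:
        if not 1 <= len(subtag) <= 8 or not set(subtag) <= ALNUM:
            return False
    return True


print(split_subtags("sgn-US-MA"))   # ['sgn', 'US', 'MA']
print(is_well_formed("sgn-US-MA"))  # True
print(is_well_formed("e1-US"))      # False
```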


16.4.5 Capitalization

All tags are case-insensitive: the tags "en" and "eN" are equivalent. However, lowercasing conventionally is used to represent general languages, while uppercasing is used to signify particular countries. For example, "fr" means all languages classified as French, while "FR" signifies the country France. [16]

[16] This convention is recommended by ISO standard 3166.
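A minimal sketch of both points follows: tags are compared without regard to case, and a helper rewrites a tag in the conventional capitalization. The helper names are hypothetical, and the sketch assumes that only a two-character second subtag is a country code. Later subtags are left as written, because the convention above does not cover them.

```python
def same_tag(a, b):
    """Language tags are case-insensitive, so compare them in lowercase."""
    return a.lower() == b.lower()


def conventional_case(tag):
    """Lowercase the language subtag and uppercase a two-character country subtag."""
    subtags = tag.split("-")
    result = [subtags[0].lower()]
    for position, subtag in enumerate(subtags[1:], start=2):
        if position == 2 and len(subtag) == 2:
            # Assumed to be a country code, which is written in uppercase.
            result.append(subtag.upper())
        else:
            # No convention is stated for other subtags; keep them unchanged.
            result.append(subtag)
    return "-".join(result)


print(same_tag("en", "eN"))        # True
print(conventional_case("EN-gb"))  # en-GB
```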

16.4.6 IANA Language Tag Registrations

The values of the first and second language subtags are defined by various standards documents and their maintaining organizations. The IANA [17] administers the list of standard language tags, using the rules outlined in RFC 3066.

[17] See http://www.iana.org and RFC 2860.

If a language tag is composed of standard country and language values, the tag doesn't have to be specially registered. Only those language tags that can't be composed out of the standard country and language values need to be registered specially with the IANA. [18] The following sections outline the RFC 3066 standards for the first and second subtags.

[18] At the time of writing, only 21 language tags have been explicitly registered with the IANA, including Cantonese ("zh-yue"), New Norwegian ("no-nyn"), Luxembourgish ("i-lux"), and Klingon ("i-klingon"). The hundreds of remaining spoken languages in use on the Internet have been composed from standard components.

16.4.7 First Subtag: Namespace

The first subtag usually is a standardized language token, chosen from the ISO 639 set of language standards. But it also can be the letter "i" to identify IANA-registered names, or "x" for private, extension names. Here are the rules:

If the first subtag has:

         Two characters, it is a language code from the ISO 639 [19] and 639-1 standards

[19] See ISO standard 639, "Codes for the representation of names of languages."

         Three characters, it is a language code listed in the ISO 639-2 [20] standard and extensions

[20] See ISO 639-2, "Codes for the representation of names of languages, Part 2: Alpha-3 code."

         The letter "i," the language tag is explicitly IANA-registered

         The letter "x," the language tag is a private, nonstandard, extension subtag

The ISO 639 and 639-2 names are summarized in Appendix G. A few examples are shown here in Table 16-5.

Table 16-5. Sample ISO 639 and 639-2 language codes

Language          ISO 639    ISO 639-2
Arabic            ar         ara
Chinese           zh         chi/zho
Dutch             nl         dut/nla
English           en         eng
French            fr         fra/fre
German            de         deu/ger
Greek (Modern)    el         ell/gre
Hebrew            he         heb
Italian           it         ita
Japanese          ja         jpn
Korean            ko         kor
Norwegian         no         nor
Russian           ru         rus
Spanish           es         esl/spa
Swedish           sv         sve/swe
Turkish           tr         tur

16.4.8 Second Subtag: Namespace

The second subtag usually is a standardized country token, chosen from the ISO 3166 set of country code and region standards. But it may also be another string, which you may register with the IANA. Here are the rules:

If the second subtag has:

         Two characters, it's a country/region defined by ISO 3166 [21]

[21] The country codes AA, QM-QZ, XA-XZ and ZZ are reserved by ISO 3166 as user-assigned codes. These must not be used to form language tags.

         Three to eight characters, it may be registered with the IANA

         One character, it is illegal

Some of the ISO 3166 country codes are shown in Table 16-6. The complete list of country codes can be found in Appendix G.

Table 16-6. Sample ISO 3166 country codes

Country                          Code
Brazil                           BR
Canada                           CA
China                            CN
France                           FR
Germany                          DE
Holy See (Vatican City State)    VA
Hong Kong                        HK
India                            IN
Italy                            IT
Japan                            JP
Lebanon                          LB
Mexico                           MX
Pakistan                         PK
Russian Federation               RU
United Kingdom                   GB
United States                    US

16.4.9 Remaining Subtags: Namespace

There are no rules for the third and following subtags, except that each may be up to eight characters long (letters and digits).
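The namespace rules for the first, second, and remaining subtags can be pulled together in one sketch. The function names and label strings here are hypothetical. The code looks only at the shape of each subtag (its length and characters) and does not look anything up in the ISO 639, ISO 3166, or IANA lists. Treating a two-character second subtag that contains a digit as invalid is an assumption of this sketch.

```python
def classify_first(subtag):
    """Name the namespace of the first subtag from its length."""
    s = subtag.lower()
    if s == "i":
        return "IANA-registered"
    if s == "x":
        return "private extension"
    if s.isascii() and s.isalpha():
        if len(s) == 2:
            return "ISO 639 language code"
        if len(s) == 3:
            return "ISO 639-2 language code"
    return "invalid"


def is_user_assigned(code):
    """True for the reserved ISO 3166 codes AA, QM-QZ, XA-XZ, and ZZ."""
    c = code.upper()
    return c in ("AA", "ZZ") or "QM" <= c <= "QZ" or "XA" <= c <= "XZ"


def classify_second(subtag):
    """Name the namespace of the second subtag from its length."""
    if not (subtag.isascii() and subtag.isalnum()) or len(subtag) > 8:
        return "invalid"
    if len(subtag) == 1:
        return "illegal"
    if len(subtag) == 2:
        if not subtag.isalpha():
            return "invalid"  # assumption: country codes are letters only
        if is_user_assigned(subtag):
            return "reserved ISO 3166 user-assigned code"
        return "ISO 3166 country code"
    return "may be IANA-registered"


def describe(tag):
    """Return one label per subtag of a language tag."""
    subtags = tag.split("-")
    labels = [classify_first(subtags[0])]
    if len(subtags) > 1:
        labels.append(classify_second(subtags[1]))
    for extra in subtags[2:]:
        ok = extra.isascii() and extra.isalnum() and len(extra) <= 8
        labels.append("unregistered" if ok else "invalid")
    return labels


print(describe("en-GB"))
# ['ISO 639 language code', 'ISO 3166 country code']
print(describe("sgn-US-MA"))
# ['ISO 639-2 language code', 'ISO 3166 country code', 'unregistered']
```

A real implementation would go one step further and check each value against the reference tables in Appendix G.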

16.4.10 Configuring Language Preferences

You can configure language preferences in your browser profile.

Netscape Navigator lets you set language preferences through Edit → Preferences... → Languages..., and Microsoft Internet Explorer lets you set languages through Tools → Internet Options... → Languages.

16.4.11 Language Tag Reference Tables

Appendix G contains convenient reference tables for language tags:

         IANA-registered language tags are shown in Table G-1.

         ISO 639 language codes are shown in Table G-2.

         ISO 3166 country codes are shown in Table G-3.

 



HTTP: The Definitive Guide
ISBN: 1565925092
Year: 2001
Pages: 294
