Appendix F. Charsets

The following table lists the suggested charset(s) for a number of languages. Charsets are used by servlets that generate multilingual output; they determine which character encoding a servlet's PrintWriter is to use. By default, the PrintWriter uses the ISO-8859-1 (Latin-1) charset, appropriate for most Western European languages. To specify an alternate charset, the charset value must be passed to the setContentType( ) method before the servlet retrieves its PrintWriter, for example:

res.setContentType("text/html; charset=Shift_JIS");  // A Japanese charset
PrintWriter out = res.getWriter();                   // Writes Shift_JIS Japanese
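To see what the charset choice means for the bytes actually sent, the following standalone sketch may help. It does not use the Servlet API: a ByteArrayOutputStream stands in for the response body, and the class name CharsetWriterDemo and its encode( ) helper are hypothetical names chosen for illustration. It also assumes the running JVM ships the Shift_JIS charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the Servlet API.
class CharsetWriterDemo {

    // Writes text through a PrintWriter bound to the named charset and
    // returns the bytes that the writer produced.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();  // stand-in for the response body
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(sink, Charset.forName(charsetName)));
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E";  // three kanji, written as Unicode escapes

        // Shift_JIS uses two bytes for each of these characters.
        System.out.println(encode(text, "Shift_JIS").length);

        // ISO-8859-1 has no mapping for them, so the writer substitutes '?'.
        System.out.println(
                new String(encode(text, "ISO-8859-1"), StandardCharsets.ISO_8859_1));
    }
}
```

The same text yields usable output only when the writer's charset can represent it, which is why the charset has to be chosen before the PrintWriter is retrieved.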

The charset can also be set implicitly using the setLocale( ) method, for example:

res.setContentType("text/html");
res.setLocale(new Locale("ja", ""));  // Sets charset to Shift_JIS
PrintWriter out = res.getWriter();    // Writes Shift_JIS Japanese

The setLocale( ) method assigns a charset to the response according to the table listed here. Where multiple charsets are possible, the first listed charset is chosen.
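The "first listed charset wins" rule can be sketched as a simple lookup. This is only an illustration of the rule described above, not the code a servlet container actually runs: the class LocaleCharsets, its charsetFor( ) method, the "zh_TW" key format, and the fallback to ISO-8859-1 for unlisted languages are all assumptions, and only a few rows of the table are included.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup class illustrating the table-driven rule.
class LocaleCharsets {

    // A few rows of the table; each list keeps the table's left-to-right order.
    private static final Map<String, List<String>> TABLE = new HashMap<>();
    static {
        TABLE.put("en", Arrays.asList("ISO-8859-1"));
        TABLE.put("ja", Arrays.asList("Shift_JIS", "ISO-2022-JP", "EUC-JP"));
        TABLE.put("ko", Arrays.asList("EUC-KR"));
        TABLE.put("ru", Arrays.asList("ISO-8859-5", "KOI8-R"));
        TABLE.put("zh", Arrays.asList("GB2312"));
        TABLE.put("zh_TW", Arrays.asList("Big5"));  // language plus country (assumed key format)
    }

    // Returns the first listed charset for the locale.
    static String charsetFor(Locale locale) {
        String language = locale.getLanguage();
        String country = locale.getCountry();

        List<String> candidates = null;
        if (!country.isEmpty()) {
            candidates = TABLE.get(language + "_" + country);  // try the country-specific row first
        }
        if (candidates == null) {
            candidates = TABLE.get(language);
        }
        if (candidates == null) {
            return "ISO-8859-1";  // assumption: fall back to the PrintWriter default
        }
        return candidates.get(0);  // first listed charset is chosen
    }

    public static void main(String[] args) {
        System.out.println(charsetFor(Locale.JAPANESE));  // language "ja"
        System.out.println(charsetFor(Locale.TAIWAN));    // language "zh", country "TW"
        System.out.println(charsetFor(Locale.CHINESE));   // language "zh", no country
    }
}
```

Keeping the candidates in a list preserves the table's order, so the alternates (for example ISO-2022-JP for Japanese) remain available if an application wants to pick one explicitly with setContentType( ).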

Note that not all web browsers support all charsets or have the fonts available to represent all characters, although at minimum all clients support ISO-8859-1. Further note that the UTF-8 charset can represent all Unicode characters and may be assumed a viable alternative for all languages.
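One way to act on the UTF-8 observation is to check whether a suggested charset can represent the text at hand and fall back to UTF-8 when it cannot. The sketch below uses the standard java.nio.charset API (which postdates the JDK versions discussed in the footnotes); the class CharsetFallback and its method names are hypothetical, and whether a given client can display UTF-8 is a separate question this code does not answer.

```java
import java.nio.charset.Charset;

// Hypothetical helper for choosing between a suggested charset and UTF-8.
class CharsetFallback {

    // True if every character of text has a mapping in the named charset.
    static boolean canRepresent(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    // Keeps the suggested charset when it suffices, otherwise falls back to UTF-8.
    static String choose(String text, String suggestedCharset) {
        return canRepresent(text, suggestedCharset) ? suggestedCharset : "UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(choose("caf\u00E9", "ISO-8859-1"));     // accented Latin text
        System.out.println(choose("\u65E5\u672C", "ISO-8859-1"));  // two kanji
    }
}
```

The value returned by choose( ) would then be used in the charset parameter passed to setContentType( ).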

Language                        Language Code     Suggested Charsets
------------------------------  ----------------  ------------------------------
Albanian                        sq                ISO-8859-2
Arabic                          ar                ISO-8859-6
Bulgarian                       bg                ISO-8859-5
Byelorussian                    be                ISO-8859-5
Catalan (Spanish)               ca                ISO-8859-1
Chinese (Simplified/Mainland)   zh                GB2312
Chinese (Traditional/Taiwan)    zh (country TW)   Big5
Croatian                        hr                ISO-8859-2
Czech                           cs                ISO-8859-2
Danish                          da                ISO-8859-1
Dutch                           nl                ISO-8859-1
English                         en                ISO-8859-1
Estonian                        et                ISO-8859-1
Finnish                         fi                ISO-8859-1
French                          fr                ISO-8859-1
German                          de                ISO-8859-1
Greek                           el                ISO-8859-7
Hebrew                          he (formerly iw)  ISO-8859-8
Hungarian                       hu                ISO-8859-2
Icelandic                       is                ISO-8859-1
Italian                         it                ISO-8859-1
Japanese                        ja                Shift_JIS, ISO-2022-JP, EUC-JP[A]
Korean                          ko                EUC-KR[B]
Latvian, Lettish                lv                ISO-8859-2
Lithuanian                      lt                ISO-8859-2
Macedonian                      mk                ISO-8859-5
Norwegian                       no                ISO-8859-1
Polish                          pl                ISO-8859-2
Portuguese                      pt                ISO-8859-1
Romanian                        ro                ISO-8859-2
Russian                         ru                ISO-8859-5, KOI8-R
Serbian                         sr                ISO-8859-5, KOI8-R
Serbo-Croatian                  sh                ISO-8859-5, ISO-8859-2, KOI8-R
Slovak                          sk                ISO-8859-2
Slovenian                       sl                ISO-8859-2
Spanish                         es                ISO-8859-1
Swedish                         sv                ISO-8859-1
Turkish                         tr                ISO-8859-9
Ukrainian                       uk                ISO-8859-5, KOI8-R

[A] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-JP character set by the name EUCJIS, so for portability you can set the character set to EUC-JP and manually construct an EUCJIS PrintWriter.

[B] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-KR character set by the name KSC_5601, so for portability you can set the character set to EUC-KR and manually construct a KSC_5601 PrintWriter.
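The manual construction mentioned in footnotes [A] and [B] might look like the sketch below: try the standard name first, then the older JDK name, and wrap the output stream in a PrintWriter that uses whichever the JVM recognizes. The class LegacyCharsetWriter and its methods are hypothetical, and the name probe relies on java.nio.charset.Charset, which did not exist in the early JDKs the footnotes describe, so treat this as a modern restatement of the idea rather than period-accurate code.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical helper for building a PrintWriter from alternative charset names.
class LegacyCharsetWriter {

    // Returns the first name this JVM recognizes, or null if none is supported.
    static String firstSupported(String... names) {
        for (String name : names) {
            if (Charset.isSupported(name)) {
                return name;
            }
        }
        return null;
    }

    // Builds a PrintWriter over out using the first supported charset name.
    static PrintWriter writerFor(OutputStream out, String... names) {
        String name = firstSupported(names);
        if (name == null) {
            throw new IllegalArgumentException("none of the charset names is supported");
        }
        return new PrintWriter(new OutputStreamWriter(out, Charset.forName(name)));
    }

    public static void main(String[] args) {
        // A byte array stands in for the servlet's output stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = writerFor(sink, "EUC-JP", "EUCJIS");
        out.print("\u65E5\u672C\u8A9E");
        out.flush();
        System.out.println(sink.size() + " bytes written");
    }
}
```

In a servlet, the content type would still be set to the standard name (for example "text/html; charset=EUC-JP"), and the stream passed to writerFor( ) would be the one returned by res.getOutputStream( ) instead of the writer from res.getWriter( ).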


Last updated on 3/20/2003
Java Servlet Programming, 2nd Edition, © 2001 O'Reilly
