F. Charsets | Java Servlet Programming (Java Series)

Java Servlet Programming, 2nd Edition > F. Charsets

Appendix F. Charsets

The following table lists the suggested charset(s) for a number of languages. Charsets are used by servlets that generate multilingual output; they determine which character encoding a servlet's PrintWriter is to use. By default, the PrintWriter uses the ISO-8859-1 (Latin-1) charset, appropriate for most Western European languages. To specify an alternate charset, the charset value must be passed to the setContentType( ) method before the servlet retrieves its PrintWriter, for example:

res.setContentType("text/html; charset=Shift_JIS");  // A Japanese charset
PrintWriter out = res.getWriter();  // Writes Shift_JIS Japanese
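To see why the charset has to be in place before the writer is obtained, it can help to look at what a PrintWriter does when it is bound to a charset that cannot represent the text. The following is a minimal standalone sketch using only the JDK, not servlet code; the class name CharsetWriterDemo and the helper write( ) are hypothetical, and a ByteArrayOutputStream stands in for the response stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical demo class: shows how the writer's charset decides the bytes sent.
class CharsetWriterDemo {

    // Sends text through a PrintWriter bound to the named charset
    // and returns the raw bytes that reached the underlying stream.
    static byte[] write(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(sink, Charset.forName(charsetName)));
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String kanji = "\u65e5";  // A single Japanese character
        // Latin-1 has no mapping for it, so the writer substitutes '?'
        System.out.println(write(kanji, "ISO-8859-1").length);  // 1
        // UTF-8 encodes the same character as three bytes
        System.out.println(write(kanji, "UTF-8").length);       // 3
    }
}
```

Once a writer has been created with the default Latin-1 encoding, characters outside that charset are silently replaced, which is why the charset must be chosen first.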

The charset can also be set implicitly using the setLocale( ) method, for example:

res.setContentType("text/html");
res.setLocale(new Locale("ja", ""));  // Sets charset to Shift_JIS
PrintWriter out = res.getWriter();    // Writes Shift_JIS Japanese

The setLocale( ) method assigns a charset to the response according to the table listed here. Where multiple charsets are possible, the first listed charset is chosen.
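The lookup described above can be pictured as a simple map from language code to the first listed charset. The sketch below is an illustration only, not the container's actual implementation: the class LocaleCharsets and the method charsetFor( ) are hypothetical names, only a handful of rows from the table are included, and falling back to ISO-8859-1 for unlisted languages is an assumption based on the default mentioned earlier.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a locale-to-charset lookup built from a few table rows.
class LocaleCharsets {

    // Language code (optionally with "_COUNTRY") mapped to the first listed charset
    private static final Map<String, String> FIRST_CHARSET = new HashMap<>();
    static {
        FIRST_CHARSET.put("ja", "Shift_JIS");   // First of Shift_JIS, ISO-2022-JP, EUC-JP
        FIRST_CHARSET.put("ko", "EUC-KR");
        FIRST_CHARSET.put("ru", "ISO-8859-5");  // First of ISO-8859-5, KOI8-R
        FIRST_CHARSET.put("el", "ISO-8859-7");
        FIRST_CHARSET.put("tr", "ISO-8859-9");
        FIRST_CHARSET.put("zh", "GB2312");
        FIRST_CHARSET.put("zh_TW", "Big5");     // Country-specific entry
    }

    // Returns the charset for a locale, preferring a country-specific entry.
    static String charsetFor(Locale locale) {
        String language = locale.getLanguage();
        String byCountry = FIRST_CHARSET.get(language + "_" + locale.getCountry());
        if (byCountry != null) {
            return byCountry;
        }
        String byLanguage = FIRST_CHARSET.get(language);
        // Assumption: unlisted languages fall back to the Latin-1 default
        return byLanguage != null ? byLanguage : "ISO-8859-1";
    }
}
```

With this sketch, charsetFor(Locale.JAPANESE) yields Shift_JIS and charsetFor(Locale.TAIWAN) yields Big5, matching the first-listed rule.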

Note that not all web browsers support all charsets or have the fonts available to represent all characters, although at minimum all clients support ISO-8859-1. Further note that the UTF-8 charset can represent all Unicode characters and can therefore be considered a viable alternative for all languages.
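One way to act on this is to try the charsets suggested for a language and fall back to UTF-8 when none of them can encode the text at hand. The following is a minimal sketch under that assumption; CharsetChooser and choose( ) are hypothetical names, and the check relies on the JDK's CharsetEncoder.canEncode( ) method rather than on anything in the Servlet API.

```java
import java.nio.charset.Charset;

// Hypothetical helper: picks the first preferred charset able to encode the text.
class CharsetChooser {

    // Returns the first name in 'preferred' that this JVM supports and that can
    // encode every character of 'text'; otherwise falls back to UTF-8.
    static String choose(String text, String... preferred) {
        for (String name : preferred) {
            if (!Charset.isSupported(name)) {
                continue;  // This JVM does not provide the charset
            }
            Charset charset = Charset.forName(name);
            if (charset.canEncode() && charset.newEncoder().canEncode(text)) {
                return name;
            }
        }
        return "UTF-8";  // Can represent all Unicode characters
    }

    public static void main(String[] args) {
        System.out.println(choose("caf\u00e9", "ISO-8859-1"));          // ISO-8859-1
        System.out.println(choose("\u65e5\u672c\u8a9e", "ISO-8859-1")); // UTF-8
    }
}
```

The returned name could then be appended to the content type as the charset parameter, keeping in mind that the client must also support the chosen charset.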

Language	Language Code	Suggested Charsets
Albanian	sq	ISO-8859-2
Arabic	ar	ISO-8859-6
Bulgarian	bg	ISO-8859-5
Byelorussian	be	ISO-8859-5
Catalan (Spanish)	ca	ISO-8859-1
Chinese (Simplified/Mainland)	zh	GB2312
Chinese (Traditional/Taiwan)	zh (country TW)	Big5
Croatian	hr	ISO-8859-2
Czech	cs	ISO-8859-2
Danish	da	ISO-8859-1
Dutch	nl	ISO-8859-1
English	en	ISO-8859-1
Estonian	et	ISO-8859-1
Finnish	fi	ISO-8859-1
French	fr	ISO-8859-1
German	de	ISO-8859-1
Greek	el	ISO-8859-7
Hebrew	he (formerly iw)	ISO-8859-8
Hungarian	hu	ISO-8859-2
Icelandic	is	ISO-8859-1
Italian	it	ISO-8859-1
Japanese	ja	Shift_JIS, ISO-2022-JP, EUC-JP^[A]
Korean	ko	EUC-KR^[B]
Latvian, Lettish	lv	ISO-8859-2
Lithuanian	lt	ISO-8859-2
Macedonian	mk	ISO-8859-5
Norwegian	no	ISO-8859-1
Polish	pl	ISO-8859-2
Portuguese	pt	ISO-8859-1
Romanian	ro	ISO-8859-2
Russian	ru	ISO-8859-5, KOI8-R
Serbian	sr	ISO-8859-5, KOI8-R
Serbo-Croatian	sh	ISO-8859-5, ISO-8859-2, KOI8-R
Slovak	sk	ISO-8859-2
Slovenian	sl	ISO-8859-2
Spanish	es	ISO-8859-1
Swedish	sv	ISO-8859-1
Turkish	tr	ISO-8859-9
Ukrainian	uk	ISO-8859-5, KOI8-R

^[A] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-JP character set by the name EUCJIS, so for portability you can set the character set to EUC-JP and manually construct an EUCJIS PrintWriter.

^[B] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-KR character set by the name KSC_5601, so for portability you can set the character set to EUC-KR and manually construct a KSC_5601 PrintWriter.
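The manual construction mentioned in these footnotes might look roughly like the sketch below, which tries the standard encoding name first and then the legacy one. It is an illustration under stated assumptions: LegacyNameWriter and open( ) are hypothetical names, and in a servlet the OutputStream argument would presumably be the response's output stream, obtained instead of calling getWriter( ).

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

// Hypothetical helper: builds a PrintWriter using the first encoding name the JDK knows.
class LegacyNameWriter {

    // Tries each encoding name in order, e.g. open(out, "EUC-JP", "EUCJIS"),
    // and returns an auto-flushing PrintWriter for the first one that works.
    static PrintWriter open(OutputStream out, String... names) {
        for (String name : names) {
            try {
                return new PrintWriter(new OutputStreamWriter(out, name), true);
            } catch (UnsupportedEncodingException e) {
                // This JDK does not know the name; try the next alias
            }
        }
        throw new IllegalArgumentException("No supported encoding among the given names");
    }
}
```

The content type sent to the client would still carry the standard name (EUC-JP or EUC-KR); only the name handed to the JDK changes.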
