13.4 Multiple Languages


Now it's time to push the envelope a little and attempt something that has only recently become possible. Let's write a servlet that includes several languages on the same page. In a sense, we have already written such a servlet. Our last example, HelloJapan, included both English and Japanese text. It should be observed, however, that this is a special case. Adding English text to a page is almost always possible, due to the convenient fact that nearly all charsets include the 128 US-ASCII characters. In the more general case, when the text on a page contains a mix of languages and none of the previously mentioned charsets contains all the necessary characters, we require an alternate technique.
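The coverage problem described above can be checked directly. The following is a minimal sketch, not part of the original example set: the class name CharsetCoverage and the helper covers() are hypothetical, and it assumes a modern JDK (Java 7 or later) for java.nio.charset.StandardCharsets. It asks each charset whether it can represent a given string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCoverage {

  // Hypothetical helper: true if every character in text can be
  // represented in the given charset.
  static boolean covers(Charset charset, String text) {
    return charset.newEncoder().canEncode(text);
  }

  public static void main(String[] args) {
    String english = "Hello World!";
    String spanish = "\u00a1Hola Mundo!";
    // Spanish and Chinese text on the same "page"
    String mixed = spanish + " \u4f60\u597d\u4e16\u754c";

    Charset[] charsets = {
      StandardCharsets.US_ASCII,
      StandardCharsets.ISO_8859_1,
      StandardCharsets.UTF_8
    };

    for (Charset cs : charsets) {
      System.out.println(cs.name()
          + " english=" + covers(cs, english)
          + " spanish=" + covers(cs, spanish)
          + " mixed=" + covers(cs, mixed));
    }
  }
}
```

Under these assumptions, the English string fits in all three charsets, the Spanish string needs at least ISO-8859-1, and only UTF-8 covers the mixed string.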

13.4.1 UCS-2 and UTF-8

The best way to generate a page containing multiple languages is to output 16-bit Unicode characters to the client. There are two common ways to do this: UCS-2 and UTF-8. UCS-2 (Universal Character Set, 2-byte form) sends Unicode characters in what could be called their natural format, 2 bytes per character. All characters, including US-ASCII characters, require 2 bytes. UTF-8 (UCS Transformation Format, 8-bit form) is a variable-length encoding. With UTF-8, a 16-bit Unicode character is transformed into a 1-, 2-, or 3-byte representation. In general, UTF-8 tends to be more efficient than UCS-2 because it can encode a character from the US-ASCII charset using just 1 byte. For this reason, the use of UTF-8 on the Web far exceeds that of UCS-2. For more information on UTF-8, see RFC 2279 at http://www.ietf.org/rfc/rfc2279.txt (since superseded by RFC 3629).
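To make the size difference concrete, here is a small sketch that counts the bytes each encoding needs for a single character. The class name EncodingSizes and its two helper methods are hypothetical. The sketch assumes a modern JDK (Java 7 or later), and it uses Java's UTF-16BE charset as a stand-in for UCS-2, since the two agree for 16-bit characters.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {

  // Number of bytes needed to encode s as UTF-8.
  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  // Number of bytes needed in the 2-byte-per-character form.
  // Assumption: UTF-16BE (no byte order mark) stands in for UCS-2.
  static int ucs2Length(String s) {
    return s.getBytes(StandardCharsets.UTF_16BE).length;
  }

  public static void main(String[] args) {
    // A US-ASCII letter, a Latin-1 letter, and a CJK ideograph
    String[] samples = { "A", "\u00f1", "\u4e16" };

    for (String s : samples) {
      System.out.println("U+" + Integer.toHexString(s.charAt(0))
          + ": UTF-8 = " + utf8Length(s) + " byte(s)"
          + ", UCS-2 = " + ucs2Length(s) + " byte(s)");
    }
  }
}
```

The US-ASCII letter takes 1 byte in UTF-8, the accented letter 2, and the ideograph 3, while the 2-byte form always takes 2. UTF-8 therefore wins on pages dominated by US-ASCII text and loses on pages dominated by CJK text.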

Before we proceed, you should know that support for UTF-8 is not yet guaranteed. Netscape first added support for the UTF-8 encoding in Netscape Navigator 4, and Microsoft first added support in Internet Explorer 4.

13.4.2 Writing UTF-8

Example 13-7 shows a servlet that uses the UTF-8 encoding to say "Hello World!" and tell the current time (in the local time zone) in English, Spanish, Japanese, Chinese, Korean, and Russian.

Example 13-7. A Servlet Version of the Rosetta Stone
import java.io.*;
import java.text.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

import com.oreilly.servlet.ServletUtils;

public class HelloRosetta extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    Locale locale;
    DateFormat full;

    try {
      res.setContentType("text/plain; charset=UTF-8");
      PrintWriter out = res.getWriter();

      locale = new Locale("en", "US");
      full = DateFormat.getDateTimeInstance(DateFormat.LONG,
                                            DateFormat.LONG,
                                            locale);
      out.println("In English appropriate for the US:");
      out.println("Hello World!");
      out.println(full.format(new Date()));
      out.println();

      locale = new Locale("es", "");
      full = DateFormat.getDateTimeInstance(DateFormat.LONG,
                                            DateFormat.LONG,
                                            locale);
      out.println("En Espa\u00f1ol:");
      out.println("\u00a1Hola Mundo!");
      out.println(full.format(new Date()));
      out.println();

      locale = new Locale("ja", "");
      full = DateFormat.getDateTimeInstance(DateFormat.LONG,
                                            DateFormat.LONG,
                                            locale);
      out.println("In Japanese:");
      out.println("\u4eca\u65e5\u306f\u4e16\u754c");
      out.println(full.format(new Date()));
      out.println();

      locale = new Locale("zh", "");
      full = DateFormat.getDateTimeInstance(DateFormat.LONG,
                                            DateFormat.LONG,
                                            locale);
      out.println("In Chinese:");
      out.println("\u4f60\u597d\u4e16\u754c");
      out.println(full.format(new Date()));
      out.println();

      locale = new Locale("ko", "");
      full = DateFormat.getDateTimeInstance(DateFormat.LONG,
                                            DateFormat.LONG,
                                            locale);
      out.println("In Korean:");
      out.println("\uc548\ub155\ud558\uc138\uc694\uc138\uacc4");
      out.println(full.format(new Date()));
      out.println();

      locale = new Locale("ru", "");
      full = DateFormat.getDateTimeInstance(DateFormat.LONG,
                                            DateFormat.LONG,
                                            locale);
      out.println("In Russian (Cyrillic):");
      out.print("\u0417\u0434\u0440\u0430\u0432\u0441\u0442");
      out.println("\u0432\u0443\u0439, \u041c\u0438\u0440");
      out.println(full.format(new Date()));
      out.println();
    }
    catch (Exception e) {
      log(ServletUtils.getStackTraceAsString(e));
    }
  }
}

Figure 13-5 shows a screen shot of the servlet's output.

Figure 13-5. A true Hello World

For this servlet to work as written, your server must support JDK 1.1.6 or later. Earlier versions of Java throw an UnsupportedEncodingException when trying to get the PrintWriter, and the page is left blank. The problem is a missing charset alias. Java has had support for the UTF-8 encoding since JDK 1.1 was first introduced. Unfortunately, the JDK used the name UTF8 for the encoding, while browsers expect the name UTF-8. So, who's right? It wasn't clear until early 1998, when the IANA (Internet Assigned Numbers Authority) declared UTF-8 to be the preferred name. (See http://www.isi.edu/in-notes/iana/assignments/character-sets.) Shortly thereafter, JDK 1.1.6 added UTF-8 as an alternate alias for the UTF8 encoding. For maximum portability across Java versions, you can use the UTF8 name directly with the following code:

res.setContentType("text/html; charset=UTF-8");
PrintWriter out = new PrintWriter(
  new OutputStreamWriter(res.getOutputStream(), "UTF8"), true);
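The alias behavior can be tried outside a servlet container. The sketch below is an illustration under stated assumptions rather than code from the book: the class name Utf8AliasDemo and the helper encode() are hypothetical, a ByteArrayOutputStream stands in for res.getOutputStream(), and the java.nio.charset.Charset lookup requires JDK 1.4 or later. It writes the same text through a PrintWriter under both encoding names and compares the resulting bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

public class Utf8AliasDemo {

  // Hypothetical helper: encodes text through a PrintWriter that wraps an
  // OutputStreamWriter created with the given encoding name. The byte
  // array stream stands in for the servlet response's output stream.
  static byte[] encode(String text, String encodingName) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try {
      PrintWriter out = new PrintWriter(
        new OutputStreamWriter(sink, encodingName), true);
      out.print(text);
      out.flush();  // print() does not auto-flush, so flush explicitly
    }
    catch (UnsupportedEncodingException e) {
      // This is the failure an old JDK reports for an unknown alias
      throw new IllegalStateException("Unknown encoding: " + encodingName, e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    String text = "\u00a1Hola Mundo!";

    byte[] legacyName = encode(text, "UTF8");
    byte[] ianaName = encode(text, "UTF-8");

    System.out.println("Same bytes under both names: "
        + Arrays.equals(legacyName, ianaName));
    System.out.println("Same charset under both names: "
        + Charset.forName("UTF8").equals(Charset.forName("UTF-8")));
  }
}
```

On a current JDK both names resolve to the same charset, so the two byte arrays match. The 12-character greeting occupies 13 bytes because the inverted exclamation mark takes 2 bytes in UTF-8.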

Also, your client must support the UTF-8 encoding and have access to all the necessary fonts. Otherwise, some of your output is likely to appear garbled.


Last updated on 3/20/2003
Java Servlet Programming, 2nd Edition, © 2001 O'Reilly



