As you know, the Java programming language itself is fully Unicode based. However, operating systems typically have their own character encoding, such as ISO-8859-1 (an 8-bit code sometimes called the "ANSI" code) in the United States, or Big5 in Taiwan. When you save data to a text file, you should respect the local character encoding so that the users of your program can open the text file with their other applications. Specify the character encoding when you construct the writer. FileWriter has no constructor that takes an encoding name, so wrap a FileOutputStream in an OutputStreamWriter:

    out = new OutputStreamWriter(new FileOutputStream(filename), "ISO-8859-1");

You can find a complete list of the supported encodings in Volume 1, Chapter 12.

Unfortunately, there is currently no connection between locales and character encodings. For example, if your user has selected the Taiwanese locale zh_TW, no method in the Java programming language tells you that the Big5 character encoding would be the most appropriate.

Character Encoding of Source Files

It is worth keeping in mind that you, the programmer, will need to communicate with the Java compiler, and you do that with tools on your local system. For example, you can use the Chinese version of Notepad to write your Java source code files. The resulting source code files are not portable because they use the local character encoding (GB or Big5, depending on which Chinese operating system you use). Only the compiled class files are portable: they automatically use the "modified UTF-8" encoding for identifiers and strings. That means that even when a program is compiling and running, three character encodings are involved:

    Source files: local encoding
    Class files: modified UTF-8
    Virtual machine: UTF-16
(See Volume 1, Chapter 12 for a definition of the modified UTF-8 and UTF-16 formats.)

TIP
To make your source files portable, restrict yourself to using the plain ASCII encoding. That is, you should change all non-ASCII characters to their equivalent Unicode escapes. For example, rather than using the string "Häuser", use "H\u00E4user". The JDK contains a utility, native2ascii, that you can use to convert the native character encoding to plain ASCII. This utility simply replaces every non-ASCII character in the input with a \u followed by the four hex digits of the Unicode value. To use the native2ascii program, provide the input and output file names.

    native2ascii Myfile.java Myfile.temp

You can convert the other way with the -reverse option:

    native2ascii -reverse Myfile.temp Myfile.java

You can specify another encoding with the -encoding option. The encoding name must be one of those listed in the encodings table in Volume 1, Chapter 12.

    native2ascii -encoding Big5 Myfile.java Myfile.temp
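To make the two ideas of this section concrete, here is a minimal sketch in Java. It writes a string to a file in an explicitly named encoding, and it replaces non-ASCII characters with Unicode escapes in roughly the way the tip describes. The class name EncodingSketch and the methods escapeNonAscii and writeText are hypothetical names made up for this illustration. The escaping only approximates, for a string already in memory, what native2ascii does; it is not that tool's implementation.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of the JDK.
public class EncodingSketch {

    // Replaces every char above 0x7F with a backslash, the letter u,
    // and the four hex digits of its UTF-16 code unit.
    public static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                sb.append("\\u").append(String.format("%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Writes text to a file using the named character encoding.
    public static void writeText(String filename, String text, String encoding)
            throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(filename), encoding)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // The source file itself stays plain ASCII, as the tip recommends.
        String word = "H\u00e4user";

        // Prints the word with its second character written as an escape.
        System.out.println(escapeNonAscii(word));

        Path tmp = Files.createTempFile("encoding-sketch", ".txt");

        // One byte per character in ISO-8859-1: 6 bytes.
        writeText(tmp.toString(), word, "ISO-8859-1");
        System.out.println(Files.size(tmp));

        // The a-umlaut takes two bytes in UTF-8: 7 bytes.
        writeText(tmp.toString(), word, "UTF-8");
        System.out.println(Files.size(tmp));

        Files.delete(tmp);
    }
}
```

The same six-character string occupies a different number of bytes depending on the encoding passed to the writer, which is why a file saved in one encoding may look garbled in an application that expects another.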