
26.6. (Optional) Character Encoding

Java programs use Unicode. When you read a character using text I/O, the character's Unicode code is returned. The character's encoding in the file may differ from its Unicode encoding, in which case Java automatically converts it to Unicode. When you write a character using text I/O, Java automatically converts the character's Unicode code to the encoding specified for the file. This is pictured in Figure 26.11.

Figure 26.11. The encoding of the file may be different from the encoding used in the program.
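To make the conversion concrete, the following sketch encodes one character from the program into several file encodings and prints the resulting bytes in hexadecimal. It is an illustration only: the class name EncodingBytes and the helper toHex are hypothetical names introduced here, and the sketch assumes the GB18030 charset is installed in your Java runtime.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration; not part of the book's listings.
class EncodingBytes {
  /** Encodes text in the named encoding and returns the bytes as hex, e.g. "6B 22". */
  static String toHex(String text, String encodingName) {
    // getBytes(Charset) converts the program's Unicode characters to the target encoding
    byte[] bytes = text.getBytes(Charset.forName(encodingName));
    StringBuilder hex = new StringBuilder();
    for (byte b : bytes) {
      hex.append(String.format("%02X ", b & 0xFF)); // mask to treat the byte as unsigned
    }
    return hex.toString().trim();
  }

  public static void main(String[] args) {
    String text = "\u6B22"; // a single character inside the program
    System.out.println("UTF-16BE: " + toHex(text, "UTF-16BE"));
    System.out.println("UTF-8:    " + toHex(text, "UTF-8"));
    System.out.println("GB18030:  " + toHex(text, "GB18030"));
  }
}
```

The same character produces a different byte sequence in each encoding, which is why the encoding used to read a file must match the one used to write it.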

You can specify an encoding scheme using a constructor of Scanner or PrintWriter for text I/O, as follows:

  public Scanner(File file, String encodingName)
  public PrintWriter(File file, String encodingName)

For a list of encoding schemes supported in Java, see http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html and http://mindprod.com/jgloss/encoding.html. For example, you may use the encoding name GB18030 for simplified Chinese characters, Big5 for traditional Chinese characters, Cp939 for Japanese characters, Cp933 for Korean characters, and Cp838 for Thai characters.
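You can also ask the Java runtime itself which encodings it supports. The sketch below uses java.nio.charset.Charset for this; the class name SupportedEncodings and the helper isSupported are hypothetical names chosen for illustration. Which encodings are reported as available depends on your Java installation, so no particular output is assumed.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration; not part of the book's listings.
class SupportedEncodings {
  /** Returns true if this Java runtime supports the named encoding. */
  static boolean isSupported(String encodingName) {
    try {
      return Charset.isSupported(encodingName);
    } catch (IllegalArgumentException ex) {
      // Thrown when the name is null or contains characters not allowed in an encoding name
      return false;
    }
  }

  public static void main(String[] args) {
    String[] names = {"GB18030", "Big5", "Cp939", "Cp933", "Cp838"};
    for (String name : names) {
      System.out.println(name + " supported: " + isSupported(name));
    }
    // availableCharsets() maps canonical encoding names to Charset objects
    System.out.println("Encodings available: " + Charset.availableCharsets().size());
  }
}
```

Checking a name before passing it to Scanner or PrintWriter avoids an exception at the point where the file is opened.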



The code in Listing 26.8 creates a file using the GB18030 encoding (line 8). You have to read the text using the same encoding (line 12). The output is shown in Figure 26.12(a).

Figure 26.12. You can specify an encoding scheme for a text file.

Listing 26.8. EncodingDemo.java
 1  import java.util.*;
 2  import java.io.*;
 3  import javax.swing.*;
 4
 5  public class EncodingDemo {
 6    public static void main(String[] args)
 7        throws IOException, FileNotFoundException {
 8      PrintWriter output = new PrintWriter("temp.txt", "GB18030");
 9      output.print("\u6B22\u8FCE Welcome \u03b1\u03b2\u03b3");
10      output.close();
11
12      Scanner input = new Scanner(new File("temp.txt"), "GB18030");
13      JOptionPane.showMessageDialog(null, input.nextLine());
14    }
15  }

If you don't specify an encoding in lines 8 and 12, the system's default encoding scheme is used. The default on a US system is typically ASCII or an extension of it. ASCII is a 7-bit code, with each character usually stored in an 8-bit byte, whereas Java represents characters as 16-bit Unicode values. If a Unicode character has no ASCII equivalent, the character '?' is written to the file in its place. Thus, when you write \u6B22 to an ASCII file, the ? character is written to the file. When you read it back, you will see the ? character, as shown in Figure 26.12(b).
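The substitution can be reproduced without relying on the system default by naming the US-ASCII encoding explicitly. The following is a minimal sketch modeled on Listing 26.8: the class name AsciiRoundTrip and the helper writeThenRead are hypothetical, and a temporary file is used in place of temp.txt so that the example cleans up after itself.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.util.Scanner;

// Hypothetical helper class for illustration; not part of the book's listings.
class AsciiRoundTrip {
  /** Writes text to a temporary file in the named encoding, then reads it back. */
  static String writeThenRead(String text, String encodingName) {
    try {
      File file = File.createTempFile("encoding", ".txt");
      try {
        // Characters the encoding cannot represent are replaced as they are written
        try (PrintWriter output = new PrintWriter(file, encodingName)) {
          output.print(text);
        }
        // Read the text back using the same encoding
        try (Scanner input = new Scanner(file, encodingName)) {
          return input.nextLine();
        }
      } finally {
        file.delete();
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    String text = "\u6B22\u8FCE Welcome";
    // The two Chinese characters have no ASCII equivalent, so each comes back as ?
    System.out.println(writeThenRead(text, "US-ASCII")); // prints "?? Welcome"
    // GB18030 can represent every character in the text, so it survives the round trip
    System.out.println(writeThenRead(text, "GB18030").equals(text)); // prints "true"
  }
}
```

Note that the information is lost when the file is written, not when it is read: once ? has been stored in the file, no choice of encoding on the reading side can recover the original characters.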

To find out the default encoding on your system, use

  System.out.println(System.getProperty("file.encoding"));

On US-English Windows, the default encoding name is typically Cp1252, an 8-bit extension of ASCII.
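As an alternative to reading the system property, the default encoding can be obtained as a Charset object. The sketch below assumes nothing beyond the standard java.nio.charset.Charset class; the class name DefaultEncoding and the helper defaultEncodingName are hypothetical. The name reported this way is the charset's canonical name, which may be spelled differently from the value of file.encoding (for example, windows-1252 rather than Cp1252).

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration; not part of the book's listings.
class DefaultEncoding {
  /** Returns the canonical name of the default encoding of this Java runtime. */
  static String defaultEncodingName() {
    return Charset.defaultCharset().name();
  }

  public static void main(String[] args) {
    // The output depends on the platform and on how the Java runtime was started
    System.out.println("file.encoding property: " + System.getProperty("file.encoding"));
    System.out.println("Default charset:        " + defaultEncodingName());
  }
}
```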

 


Introduction to Java Programming-Comprehensive Version
Introduction to Java Programming-Comprehensive Version (6th Edition)
ISBN: B000ONFLUM
EAN: N/A
Year: 2004
Pages: 503
