Character Data Encoding

     

All Java Strings are stored as sequences of 16-bit Unicode characters (the UTF-16 encoding). Unicode is a standard specifically created for computer processing of character data. Its purpose is to provide a consistent way to encode character data, so that users throughout the world, writing in multiple languages, can share a single system.

The problem that Unicode solves is the problem introduced by ASCII character encoding, which represents our Latin alphabet beautifully, but nothing else. This is no longer an acceptable mode for character data exchange in the Internet age. ASCII characters need only 7 bits each (usually stored in an 8-bit byte), but have a very limited range; Unicode represents the characters from every major written language in the world. It also includes many mathematical symbols, ideographs, technical marks, and other items. Doing so takes more room: a Java char is two bytes, and a few less common characters need a pair of chars. One of the features of Java that propelled it to popularity in the mid-1990s was its enforcing of the use of Unicode throughout the language in order to embrace the global economy.
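To see what "16-bit" means in practice, here is a minimal sketch. The class and method names (CodeUnitsDemo, codeUnits, codePoints) are made up for illustration; length() and codePointCount() are standard String methods. It assumes a character outside the 16-bit range, the musical G clef at U+1D11E, which Java stores as two chars.

```java
public class CodeUnitsDemo {
    // Number of 16-bit char values the string occupies.
    public static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) the string contains.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String euro = "\u20AC";        // euro sign, fits in a single char
        String clef = "\uD834\uDD1E";  // G clef (U+1D11E), stored as a pair of chars

        System.out.println(codeUnits(euro) + " " + codePoints(euro)); // 1 1
        System.out.println(codeUnits(clef) + " " + codePoints(clef)); // 2 1
    }
}
```

For everyday text the two counts agree, which is why you can usually treat one char as one character.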


FRIDGE

Go to www.unicode.org and click on the link labeled Code Charts. This will let you select from all of the graphematic symbols that Unicode supports, including Greek, bi-directional Hebrew, currency and safety symbols, and more. You can read the symbol charts to find out what code you need to use to print certain characters.


Most of the time you don't have to worry about this. But if you need to present data written in Arabic, or present special symbols, you can write a Unicode character directly by specifying its code.

Every Unicode character maps to a single number, usually written in hexadecimal. You can specify a Unicode literal using the escape backslash followed by a u and four hex digits. For example, \u0021 prints a ! (bang). Here is an example using some currency symbols.

 

    String s = new String("Euro: " + "\u20AC");
    s = s.concat("\nDollar: " + "\u0024");
    s = s.concat("\nYen: " + "\u00A5");
    System.out.println(s);

This outputs the following:

    Euro: €
    Dollar: $
    Yen: ¥

Note that many Unicode characters may not be printable on your particular system. Note, too, that any given symbol is actually implemented by a font vendor, so its appearance may vary a great deal from one system to another.
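A Unicode escape is just another way of spelling a char, so it can be compared with the ordinary literal. The sketch below shows that, plus a small helper that prints a char in the U+XXXX style used by the code charts. The names EscapeDemo and toCodeChartLabel are invented for this example.

```java
public class EscapeDemo {
    // Format a char the way the Unicode code charts label it, e.g. U+20AC.
    public static String toCodeChartLabel(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        char bang = '\u0021';  // the escape form of '!'

        System.out.println(bang == '!');                 // true
        System.out.println(toCodeChartLabel('!'));       // U+0021
        System.out.println(toCodeChartLabel('\u20AC'));  // U+20AC
    }
}
```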

You can read more about character encoding and its use in I/O operations in Chapter 22, "File Input/Output."

Converting a Byte Array to a String

Java provides a few different constructors in the String class to accommodate creating String objects from byte arrays. This is particularly useful when interacting with legacy systems or native code that provide you only with byte arrays for character data.

The following code snippet converts text between Unicode and UTF-8 using a byte array:

 

    try {
        //go from Unicode to UTF-8
        String string = "abc\u5639\u563b";
        byte[] utf8 = string.getBytes("UTF-8");

        //go from UTF-8 to Unicode
        string = new String(utf8, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        //handle. we'll talk about exceptions later
    }
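If you are on Java 7 or later, the same round trip can be written without the try/catch by passing a Charset constant instead of a name. This is an alternative sketch, not the book's listing; the class and method names (Utf8RoundTrip, encode, decode) are made up here, while StandardCharsets.UTF_8 and the Charset overloads of getBytes() and the String constructor are part of the standard library.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Encode a String as UTF-8 bytes. No checked exception to handle.
    public static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String.
    public static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "abc\u5639\u563b";
        byte[] utf8 = encode(original);

        // Three ASCII letters take one byte each; the two CJK characters take three each.
        System.out.println(utf8.length);                     // 9
        System.out.println(original.equals(decode(utf8)));   // true
    }
}
```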

Comparing Strings

The following code, which demos how to compare strings in different ways, is pretty straightforward. There is one weird thing, however. The compareTo() method of the String class returns an int representing the result of the operation: it returns 0 if the strings compared are equal, a value less than 0 if the first string is lexicographically less than the string argument, and a value greater than 0 if the first string is lexicographically greater than the string argument.

CompareStrings.java
 

    package net.javagarage.demo.String;

    public class CompareStrings {
      public static void main(String[] args) {
        String littleName = "elvis";
        String bigName = "Elvis";

        //is the character content of the strings
        //identical? this returns false, because of
        //case-sensitivity
        boolean isIdentical = littleName.equals(bigName);

        //check without case-sensitivity:
        //returns true!
        boolean b = littleName.equalsIgnoreCase(bigName);

        //check order of two strings
        //lowercase FOLLOWS uppercase
        int i = littleName.compareTo(bigName);
        if (i < 0) {
          //little comes before big
        } else if (i > 0) {
          //big comes before little
          //(this is the branch taken here)
        } else {
          //they are the same
        }
      }
    }
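Only the sign of the compareTo() result matters, so it can help to squash it down to -1, 0, or 1 when you want to look at it. Here is a small sketch of that idea; CompareToSigns and order are hypothetical names, while Integer.signum() and compareToIgnoreCase() are standard.

```java
public class CompareToSigns {
    // Reduce a compareTo() result to -1, 0, or 1 so that only the sign is left.
    public static int order(String first, String second) {
        return Integer.signum(first.compareTo(second));
    }

    public static void main(String[] args) {
        System.out.println(order("Elvis", "elvis"));  // -1: uppercase sorts first
        System.out.println(order("elvis", "Elvis"));  // 1: lowercase follows uppercase
        System.out.println(order("elvis", "elvis"));  // 0: same characters

        // Ignoring case, the two names sort as equal.
        System.out.println("elvis".compareToIgnoreCase("Elvis"));  // 0
    }
}
```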

This is the big thing to beware of when comparing Strings: don't use the == operator unless you really mean it! That operator compares the object references of two Strings, not whether they have the same characters in them. Sort of.

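Remember that there is a String pool: String literals with the same characters share one pooled object, so == can look like it works on them. A String created with new, or built at runtime, is a separate object even if its characters match.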

The == operator is used to compare primitive values. You can't use the equals method on primitives because you can't use any method on primitives. When used with reference variables (objects, like Strings), the result of x == y is a Boolean value that is the result of the following test: do these two reference variables refer to the same object, that is, the same space in memory? Said a different way, are the bit patterns of x and y identical?
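Here is a short sketch of how == and equals() can disagree for Strings. StringPoolDemo and sameObject are names invented for the example; intern() is the standard String method that returns the pooled copy.

```java
public class StringPoolDemo {
    // True only when both references point at the very same object.
    public static boolean sameObject(Object x, Object y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "blues";
        String b = "blues";              // same literal, so the same pooled object
        String c = new String("blues");  // explicitly asks for a brand-new object

        System.out.println(sameObject(a, b));           // true
        System.out.println(sameObject(a, c));           // false
        System.out.println(a.equals(c));                // true: same characters
        System.out.println(sameObject(a, c.intern()));  // true: intern() returns the pooled copy
    }
}
```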

You use the equals() method to compare two objects to determine if their meaning is equivalent. With Strings, it has been decided that their meanings are equivalent if the characters they store are the same, that is, "J.B. Lenoir" is meaningfully equivalent to "J.B. Lenoir". It has been decided because the String class is final. If you have a way you think is an improvement over this method of comparing String meanings, you're out of luck in Javaland. However, you can go nuts and enjoy overriding the equals() method for your own objects. This is a good idea, by the way.
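As a rough sketch of what overriding equals() for your own objects might look like, here is a hypothetical Musician class (the class and its fields are invented for this example). It treats two musicians as equal when their name and instrument match, and overrides hashCode() to agree with that, which the equals() contract expects.

```java
import java.util.Objects;

public final class Musician {
    private final String name;
    private final String instrument;

    public Musician(String name, String instrument) {
        this.name = name;
        this.instrument = instrument;
    }

    // Two musicians mean the same thing if both fields hold the same characters.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Musician)) {
            return false;  // also covers other == null
        }
        Musician that = (Musician) other;
        return Objects.equals(name, that.name)
            && Objects.equals(instrument, that.instrument);
    }

    // Equal objects must produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(name, instrument);
    }

    public static void main(String[] args) {
        Musician first = new Musician("J.B. Lenoir", "guitar");
        Musician second = new Musician("J.B. Lenoir", "guitar");

        System.out.println(first == second);       // false: two separate objects
        System.out.println(first.equals(second));  // true: same meaning
    }
}
```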

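To determine if two Strings do not hold the same characters, use (!s1.equals(s2)). Writing s1 != s2 only tells you that the two references point to different objects, which can be true even when the characters match.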

Here is something that will come in handy. Remember that you will get a runtime exception if you try to call a method on a null reference:

 

    String s1 = null;
    if (s1.equals("")) ... //no!

This code will blow up because you can't call a method on a null reference: the runtime will throw a NullPointerException. The way to fix it is to get in the habit of checking the other way around: pass the possibly null value into the equals() method as its parameter. Like this:

 

    String s1 = null;
    if ("".equals(s1)) ... //okay

That's a good way to test if a String is empty.
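Keep in mind that "".equals(s1) is false when s1 is null, so it answers "is this an empty String?" and not "is there nothing here?". The sketch below puts the two questions side by side. NullSafeCompare, isEmpty, and isNullOrEmpty are made-up names; Objects.equals() is a standard helper (Java 7 and later) for comparing two values when either one might be null.

```java
import java.util.Objects;

public class NullSafeCompare {
    // True only for an actual empty String; null gives false, with no exception.
    public static boolean isEmpty(String s) {
        return "".equals(s);
    }

    // True when there is no text at all: either null or zero characters.
    public static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    public static void main(String[] args) {
        String s1 = null;

        System.out.println(isEmpty(s1));        // false
        System.out.println(isEmpty(""));        // true
        System.out.println(isNullOrEmpty(s1));  // true

        // When both sides might be null, Objects.equals() handles the checks.
        System.out.println(Objects.equals(s1, "Elvis"));  // false
        System.out.println(Objects.equals(s1, null));     // true
    }
}
```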



Java Garage
ISBN: 0321246233
Year: 2006
Pages: 228
Authors: Eben Hewitt
