
2.9. Character Data Type and Operations

The character data type, char, is used to represent a single character. A character literal is enclosed in single quotation marks. Consider the following code:

    char letter = 'A';
    char numChar = '4';

The first statement assigns character A to the char variable letter. The second statement assigns the digit character 4 to the char variable numChar.

Caution

A string literal must be enclosed in double quotation marks. A character literal is a single character enclosed in single quotation marks. So "A" is a string, and 'A' is a character.
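The difference between the two literals can be seen in a small program. This is a minimal sketch; the class name CharVersusString and the helper sameCharacter are hypothetical, while String.length and String.charAt are standard library methods.

```java
// Sketch: a char literal versus a one-character String literal.
// CharVersusString and sameCharacter are hypothetical names.
public class CharVersusString {
    // True if s consists of exactly the single character c.
    static boolean sameCharacter(char c, String s) {
        return s.length() == 1 && s.charAt(0) == c;
    }

    public static void main(String[] args) {
        char letter = 'A';   // single quotes: a char value
        String text = "A";   // double quotes: a String object
        // char bad = "A";   // would not compile: a String is not a char

        System.out.println(text.length());               // 1
        System.out.println(text.charAt(0) == letter);    // true
        System.out.println(sameCharacter(letter, text)); // true
    }
}
```

To compare a one-character string with a char, extract the character with charAt first; the two literal forms are different types.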




2.9.1. Unicode and ASCII Code

Computers use binary numbers internally. A character is stored as a sequence of 0s and 1s in a computer. The process of converting a character to its binary representation is called encoding. There are different ways to encode a character. How characters are encoded is defined by an encoding scheme.

Java supports Unicode, an encoding scheme established by the Unicode Consortium to support the interchange, processing, and display of written texts in the world's diverse languages. Unicode was originally designed as a 16-bit character encoding. The primitive data type char was intended to take advantage of this design by providing a simple data type that could hold any character. However, it turned out that the 65,536 characters possible in a 16-bit encoding are not sufficient to represent all the characters in the world. The Unicode standard has therefore been extended to allow up to 1,112,064 characters. Characters that go beyond the original 16-bit limit are called supplementary characters. JDK 1.5 supports supplementary characters. Processing and representing supplementary characters are beyond the scope of this book. For simplicity, this book considers only the original 16-bit Unicode characters, which can be stored in a char variable.
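The 16-bit limit of char can be checked directly. This is a minimal sketch under the assumption of JDK 1.5 or later; CharRange, maxCharValue, and charsNeeded are hypothetical names, while Character.MAX_VALUE and Character.charCount are standard library members. It only shows that a supplementary character does not fit in one char; it does not attempt to process such characters.

```java
// Sketch of the 16-bit limit of the char type.
// CharRange, maxCharValue, and charsNeeded are hypothetical names.
public class CharRange {
    // Largest value a char can hold: 0xFFFF, that is, 65,535.
    static int maxCharValue() {
        return Character.MAX_VALUE; // char widened implicitly to int
    }

    // Number of char values needed to store one Unicode code point.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint); // 1 for 16-bit characters, 2 for supplementary ones
    }

    public static void main(String[] args) {
        System.out.println(maxCharValue());       // 65535
        System.out.println(charsNeeded(0x0041));  // 1: 'A' fits in one char
        System.out.println(charsNeeded(0x1F600)); // 2: a supplementary character needs two
    }
}
```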

A 16-bit Unicode character takes two bytes and is written as \u followed by four hexadecimal digits, ranging from '\u0000' to '\uFFFF'. For example, the word "welcome" is translated into Chinese using two characters, 欢迎. The Unicodes of these two characters are "\u6B22\u8FCE". Listing 2.5 gives a program that displays two Chinese characters and three Greek letters.

Listing 2.5. DisplayUnicode.java

If no Chinese font is installed on your system, you will not be able to see the Chinese characters. The Unicodes for the Greek letters α, β, and γ are \u03b1, \u03b2, and \u03b3.
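As a console-based stand-in (this is not the book's Listing 2.5), the following sketch builds the same characters from their Unicode escapes and prints them with System.out.println. The class name UnicodeEscapesDemo and the method greeting are hypothetical. Whether the glyphs actually appear depends on the console's font and character encoding.

```java
// Console sketch that prints two Chinese characters and three Greek letters
// from their Unicode escapes. UnicodeEscapesDemo and greeting are hypothetical names.
public class UnicodeEscapesDemo {
    // Chinese "welcome" (two characters), then alpha, beta, gamma separated by spaces.
    static String greeting() {
        return "\u6B22\u8FCE \u03b1 \u03b2 \u03b3";
    }

    public static void main(String[] args) {
        // Output rendering depends on the installed fonts and the console encoding.
        System.out.println(greeting());
    }
}
```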

Most computers use ASCII (American Standard Code for Information Interchange), a 7-bit encoding scheme for representing all uppercase and lowercase letters, digits, punctuation marks, and control characters. Unicode includes ASCII code, with '\u0000' to '\u007F' corresponding to the 128 ASCII characters. (See Appendix B, "The ASCII Character Set," for a list of ASCII characters and their decimal and hexadecimal codes.) You can use ASCII characters such as 'X', '1', and '$' in a Java program as well as Unicodes. For example, the following statements are equivalent:

    char letter = 'A';
    char letter = '\u0041'; // Character A's Unicode is 0041

Both statements assign character A to the char variable letter.
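A short sketch can confirm that the literal and the escape denote the same value, and that ASCII occupies the range 0x0000 to 0x007F of Unicode. AsciiInUnicode and isAscii are hypothetical names introduced for illustration.

```java
// Sketch: ASCII characters are the first 128 Unicode values.
// AsciiInUnicode and isAscii are hypothetical names.
public class AsciiInUnicode {
    // True if c lies in 0x0000..0x007F, the range shared by ASCII and Unicode.
    static boolean isAscii(char c) {
        return c <= 0x007F; // char is unsigned, so no lower-bound check is needed
    }

    public static void main(String[] args) {
        char viaLiteral = 'A';
        char viaEscape = '\u0041';
        System.out.println(viaLiteral == viaEscape); // true
        System.out.println(isAscii('$'));            // true
        System.out.println(isAscii('\u03b1'));       // false: Greek alpha is outside ASCII
    }
}
```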

Note

The increment and decrement operators can also be used on char variables to get the next or preceding Unicode character. For example, the following statements display character b.

    char ch = 'a';
    System.out.println(++ch);
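Because ++ and -- step through consecutive Unicode values, a char can also serve as a loop variable. This is a minimal sketch; CharIncrement and lettersBetween are hypothetical names, and the loop assumes the upper bound is below 0xFFFF so that ch++ cannot wrap around.

```java
// Sketch: stepping through characters with ++ and --.
// CharIncrement and lettersBetween are hypothetical names.
public class CharIncrement {
    // Returns the characters from 'from' to 'to' inclusive, built with ch++.
    // Assumes to < 0xFFFF; otherwise ch++ would wrap to 0 and loop forever.
    static String lettersBetween(char from, char to) {
        StringBuilder sb = new StringBuilder();
        for (char ch = from; ch <= to; ch++) {
            sb.append(ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char ch = 'a';
        System.out.println(++ch);                     // b
        System.out.println(--ch);                     // a
        System.out.println(lettersBetween('a', 'e')); // abcde
    }
}
```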




2.9.2. Escape Sequences for Special Characters

Java allows you to use escape sequences to represent special characters, as shown in Table 2.5. An escape sequence begins with the backslash character (\) followed by a character that has a special meaning to the compiler.

Table 2.5. Java Escape Sequences

Character Escape Sequence   Name              Unicode Code
\b                          Backspace         \u0008
\t                          Tab               \u0009
\n                          Linefeed          \u000A
\f                          Formfeed          \u000C
\r                          Carriage Return   \u000D
\\                          Backslash         \u005C
\'                          Single Quote      \u0027
\"                          Double Quote      \u0022

Suppose you want to print the quoted message shown below:

    He said "Java is fun"

The statement to print it should be:

    System.out.println("He said \"Java is fun\"");
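A few more escape sequences from Table 2.5 in use are sketched below. EscapeDemo and quotedMessage are hypothetical names. Note that the escapes are written as \" or \\ in the code rather than as Unicode escapes such as \u0022, because Java translates Unicode escapes before parsing a literal, so a quote or backslash produced that way would break the literal.

```java
// Sketch of common escape sequences. EscapeDemo and quotedMessage are hypothetical names.
public class EscapeDemo {
    // The quoted message from the text, built with backslash-quote escapes.
    static String quotedMessage() {
        return "He said \"Java is fun\"";
    }

    public static void main(String[] args) {
        System.out.println(quotedMessage());       // He said "Java is fun"
        System.out.println("C:\\temp\\notes.txt"); // C:\temp\notes.txt
        System.out.println("Name\tScore");         // Name and Score separated by a tab
        System.out.println((int) '\n');            // 10, the code of the linefeed character
    }
}
```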

2.9.3. Casting Between char and Numeric Types

A char can be cast into any numeric type, and vice versa. When an integer is cast into a char, only its lower 16 bits of data are used; the other part is ignored.

For example, see the following code:

    char c = (char) 0xAB0041; // The lower 16 bits, hex code 0041,
                              // are assigned to c
    System.out.println(c);    // c is character A
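The cast behaves like masking the integer with 0xFFFF. A minimal sketch, with IntToCharCast and lowSixteenBits as hypothetical names:

```java
// Sketch: an int-to-char cast keeps only the low 16 bits.
// IntToCharCast and lowSixteenBits are hypothetical names.
public class IntToCharCast {
    static char lowSixteenBits(int value) {
        return (char) value; // upper 16 bits are discarded
    }

    public static void main(String[] args) {
        char c = lowSixteenBits(0xAB0041);
        System.out.println(c);                                // A
        System.out.println((int) c == (0xAB0041 & 0xFFFF));   // true: same as masking
        System.out.println((int) lowSixteenBits(0x10042));    // 66, the code of 'B'
    }
}
```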

When a floating-point value is cast into a char, the integral part of the floating-point value is cast into a char.

    char c = (char) 65.25;  // Decimal 65 is assigned to c
    System.out.println(c);  // c is character A
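Taking the integral part means the fraction is truncated, not rounded. A minimal sketch, with DoubleToCharCast and toChar as hypothetical names:

```java
// Sketch: casting a double to char truncates the fractional part.
// DoubleToCharCast and toChar are hypothetical names.
public class DoubleToCharCast {
    static char toChar(double value) {
        return (char) value; // fractional part dropped, then the integer becomes a char
    }

    public static void main(String[] args) {
        System.out.println(toChar(65.25)); // A
        System.out.println(toChar(65.99)); // A: truncated, not rounded up to 66
        System.out.println(toChar(66.0));  // B
    }
}
```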

When a char is cast into a numeric type, the character's Unicode is cast into the specified numeric type.

    int i = (int) 'A';      // The Unicode of character A is assigned to i
    System.out.println(i);  // i is 65
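Casting a char to a wider numeric type such as int, long, or double also works without writing the cast. A minimal sketch; CharToNumberCast and codeOf are hypothetical names:

```java
// Sketch: converting a char to numeric types.
// CharToNumberCast and codeOf are hypothetical names.
public class CharToNumberCast {
    // Widening char to int needs no cast; the result is the character's Unicode value.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        int i = (int) 'A';  // explicit cast, as in the text: 65
        long l = 'a';       // implicit widening: 97
        double d = 'z';     // implicit widening: 122.0
        System.out.println(i);
        System.out.println(l);
        System.out.println(d);
        System.out.println(codeOf('0')); // 48
    }
}
```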

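Implicit casting can be used if the value fits into the target variable; otherwise, explicit casting must be used. For example, since the Unicode of 'a' is 97, which is within the range of a byte, these implicit castings are fine:

    byte b = 'a';
    int i = 'a';

Converting a char to an int is a widening conversion and is always implicit. Converting a char to a byte is a narrowing conversion, which happens implicitly only for a constant such as the literal 'a'; a char variable must be cast to byte explicitly.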

But the following casting is incorrect, because the Unicode \uFFF4 cannot fit into a byte:

    byte b = '\uFFF4';



To force assignment, use explicit casting, as follows:

    byte b = (byte) '\uFFF4';

Any integer constant between 0 and FFFF in hexadecimal can be assigned to a char implicitly. Any number outside this range must be cast into a char explicitly.
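The following sketch collects these rules and shows what the explicit casts produce: (byte) '\uFFF4' keeps the low 8 bits, 0xF4, which as a signed byte is -12, and (char) 0x10000 keeps the low 16 bits, which are all zero. NarrowingCharCasts and toByte are hypothetical names.

```java
// Sketch of implicit versus explicit narrowing with char.
// NarrowingCharCasts and toByte are hypothetical names.
public class NarrowingCharCasts {
    static byte toByte(char c) {
        return (byte) c; // keeps only the low 8 bits, read as a signed byte
    }

    public static void main(String[] args) {
        byte small = 'a';              // OK: constant 97 fits in a byte
        char max = 0xFFFF;             // OK: constant within 0..0xFFFF
        char wrapped = (char) 0x10000; // explicit cast required; low 16 bits are 0
        char letter = 'a';
        // byte fromVariable = letter; // would not compile: letter is not a constant
        byte fromVariable = (byte) letter;

        System.out.println(small);              // 97
        System.out.println((int) max);          // 65535
        System.out.println((int) wrapped);      // 0
        System.out.println(fromVariable);       // 97
        System.out.println(toByte('\uFFF4'));   // -12
    }
}
```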

Note

All numeric operators can be applied to char operands. A char operand is automatically cast into a number if the other operand is a number or a character. If the other operand is a string, the character is concatenated with the string. For example, the following statements

    int i = '2' + '3'; // (int)'2' is 50 and (int)'3' is 51
    System.out.println("i is " + i);
    int j = 2 + 'a';   // (int)'a' is 97
    System.out.println("j is " + j);
    System.out.println(j + " is the Unicode for character " + (char)j);
    System.out.println("Chapter " + '2');

display

    i is 101
    j is 99
    99 is the Unicode for character c
    Chapter 2
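Two common uses of char arithmetic are sketched below: subtracting '0' to get a digit's numeric value, and casting the int result of char + int back to char. CharArithmetic, digitValue, and shift are hypothetical names.

```java
// Sketch of arithmetic on char operands.
// CharArithmetic, digitValue, and shift are hypothetical names.
public class CharArithmetic {
    // Converts a digit character such as '7' to its numeric value.
    static int digitValue(char digit) {
        return digit - '0'; // both operands are promoted to int
    }

    // Moves a character forward by n positions; char + int is an int, so a cast is needed.
    static char shift(char c, int n) {
        return (char) (c + n);
    }

    public static void main(String[] args) {
        System.out.println('2' + '3');       // 101: numeric addition
        System.out.println("" + '2' + '3');  // 23: string concatenation
        System.out.println(digitValue('7')); // 7
        System.out.println(shift('a', 2));   // c
    }
}
```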


Note

The Unicodes for lowercase letters are consecutive integers starting from the Unicode for 'a', then for 'b', 'c', ..., and 'z'. The same is true for the uppercase letters. Furthermore, the Unicode for 'a' is greater than the Unicode for 'A'. Because both ranges are consecutive, 'a' - 'A' is the same as 'b' - 'B' (both are 32). For a lowercase letter ch, its corresponding uppercase letter is (char)('A' + (ch - 'a')).
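The formula can be wrapped in a small method. This is a minimal sketch; CaseConversion and toUpper are hypothetical names, and the method applies the formula only to 'a' through 'z', leaving every other character unchanged.

```java
// Sketch of the uppercase formula (char)('A' + (ch - 'a')).
// CaseConversion and toUpper are hypothetical names.
public class CaseConversion {
    static char toUpper(char ch) {
        if (ch >= 'a' && ch <= 'z') {
            return (char) ('A' + (ch - 'a')); // same offset within the uppercase range
        }
        return ch; // not a lowercase ASCII letter: leave it unchanged
    }

    public static void main(String[] args) {
        System.out.println('a' - 'A');    // 32
        System.out.println('b' - 'B');    // 32
        System.out.println(toUpper('q')); // Q
        System.out.println(toUpper('7')); // 7
    }
}
```

The standard library's Character.toUpperCase covers letters beyond 'a' through 'z' as well; the sketch only illustrates the arithmetic described above.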


 


Introduction to Java Programming-Comprehensive Version (6th Edition)
ISBN: B000ONFLUM
EAN: N/A
Year: 2004
Pages: 503
