Section 5.7. Character Data and Operators



Another primitive data type in Java is the character type, char. A character in Java is represented by a 16-bit unsigned integer. This means that a total of 2^16, or 65,536, different Unicode characters can be represented, corresponding to the integer values 0 to 65535. The Unicode character set is an international standard that was developed to enable computer languages to represent characters in a wide variety of languages, not just English. Detailed information about this encoding can be obtained at

http://www.unicode.org/
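The range described above can be checked directly. The following is a minimal sketch that uses the standard constants Character.SIZE, Character.MIN_VALUE, and Character.MAX_VALUE; the class name CharRangeDemo and the helper charValueCount are hypothetical names chosen for illustration.

```java
// Illustrative sketch; CharRangeDemo and charValueCount are hypothetical names.
class CharRangeDemo {
    // Returns the number of distinct values a char can hold.
    // The char operands are promoted to int before the arithmetic is done.
    static int charValueCount() {
        return Character.MAX_VALUE - Character.MIN_VALUE + 1;
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);             // Number of bits in a char: 16
        System.out.println((int) Character.MIN_VALUE);  // Smallest char value: 0
        System.out.println((int) Character.MAX_VALUE);  // Largest char value: 65535
        System.out.println(charValueCount());           // 65536
    }
}
```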




It is customary in programming languages to use unsigned integers to represent characters. This means that all the digits (0, ..., 9), alphabetic letters (a, ..., z, A, ..., Z), punctuation symbols (such as . ; , " ! -), and nonprinting control characters (LINE_FEED, ESCAPE, CARRIAGE_RETURN, ...) that make up the computer's character set are represented in the computer's memory by integers. A more traditional set of characters is the ASCII (American Standard Code for Information Interchange) character set. ASCII is based on a 7-bit code and, therefore, defines 2^7, or 128, different characters, corresponding to the integer values 0 to 127. In order to make Unicode backward compatible with ASCII systems, the first 128 Unicode characters are identical to the ASCII characters. Thus, in both the ASCII and Unicode encodings, the printable characters have the integer values shown in Table 5.13.

Table 5.13. ASCII codes for selected characters

Code  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
Char  SP   !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /

Code  48  49  50  51  52  53  54  55  56  57
Char   0   1   2   3   4   5   6   7   8   9

Code  58  59  60  61  62  63  64
Char   :   ;   <   =   >   ?   @

Code  65  66  67  68  69  70  71  72  73  74  75  76  77
Char   A   B   C   D   E   F   G   H   I   J   K   L   M

Code  78  79  80  81  82  83  84  85  86  87  88  89  90
Char   N   O   P   Q   R   S   T   U   V   W   X   Y   Z

Code  91  92  93  94  95  96
Char   [   \   ]   ^   _   `

Code  97  98  99 100 101 102 103 104 105 106 107 108 109
Char   a   b   c   d   e   f   g   h   i   j   k   l   m

Code 110 111 112 113 114 115 116 117 118 119 120 121 122
Char   n   o   p   q   r   s   t   u   v   w   x   y   z

Code 123 124 125 126
Char   {   |   }   ~
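The code-to-character pairs in the table can be regenerated with a short loop. This is only a sketch: the class name AsciiTableDemo and the helper describe are hypothetical, and the (char) cast it relies on is the conversion explained in Section 5.7.1.

```java
// Illustrative sketch; AsciiTableDemo and describe are hypothetical names.
class AsciiTableDemo {
    // Builds one "code character" line, writing the space character as SP.
    static String describe(int code) {
        String shown = (code == 32) ? "SP" : String.valueOf((char) code);
        return code + " " + shown;
    }

    public static void main(String[] args) {
        // Codes 32 through 126 are the printable ASCII characters.
        for (int code = 32; code <= 126; code++) {
            System.out.println(describe(code));
        }
    }
}
```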





5.7.1. Character to Integer Conversions

Is 'A' a character or an integer? The fact that character data are stored as integers in the computer's memory can cause some confusion about whether a given piece of data is a character or an integer. In other words, when is a character, for example 'A', treated as the integer (65) instead of as the character 'A'? The rule in Java is that a character literal ('a' or 'A' or '0' or '?') is always treated as a character, unless we explicitly tell Java to treat it as an integer. So if we display a literal's value,


System.out.println('a'); 


the letter 'a' will be displayed. Similarly, if we assign 'a' to a char variable and then display the variable's value,

char ch = 'a';
System.out.println(ch);        // Displays a


the letter 'a' will be shown. If, on the other hand, we wish to output a character's integer value, we must use an explicit cast operator as follows:

System.out.println((int)'a');    // Displays 97


A cast operation, such as (int), converts one type of data ('a') into another (97). This is known as a type conversion. Similarly, if we wish to store a character's integer value in a variable, we can cast the char into an int as follows:

int k = (int)'a';       // Converts 'a' to 97
System.out.println(k);  // Displays 97


As these examples show, a cast is a type conversion operator. Java allows a wide variety of both explicit and implicit type conversions. Certain conversions (for example, promotions) take place when methods are invoked, when assignment statements are executed, when expressions are evaluated, and so on.
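As a rough illustration of the three contexts just mentioned, the sketch below lets a char be promoted to an int during a method invocation, an assignment, and the evaluation of an expression. The class name PromotionDemo and the method codeOf are hypothetical.

```java
// Illustrative sketch; PromotionDemo and codeOf are hypothetical names.
class PromotionDemo {
    // The parameter is an int, so a char argument is promoted when the method is invoked.
    static int codeOf(int value) {
        return value;
    }

    public static void main(String[] args) {
        char ch = 'a';
        int viaCall = codeOf(ch);   // Promotion during method invocation
        int viaAssign = ch;         // Promotion during assignment
        int viaExpr = ch + 1;       // Promotion during expression evaluation

        System.out.println(viaCall);    // Displays 97
        System.out.println(viaAssign);  // Displays 97
        System.out.println(viaExpr);    // Displays 98
    }
}
```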



Type conversion in Java is governed by several rules and exceptions. In some cases Java allows a conversion to be made implicitly. For example, in the following assignment a char is converted to an int even though no explicit cast operator is used:

char ch = 'a';
int k;
k = ch;   // Implicitly converts a char into an int


Java permits this conversion because no information will be lost. A character char is represented in 16 bits, whereas an int is represented in 32 bits. This is like trying to put a small object into a large box. Space will be left over, but the object will fit inside without being damaged. Similarly, storing a 16-bit char in a 32-bit int will leave the extra 16 bits unused. This widening primitive conversion changes one primitive type (char) into a wider one (int), where a type's width is the number of bits used in its representation.
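To make the "small object in a large box" picture concrete, the following sketch widens the largest possible char value and prints the result in hexadecimal, which shows that the upper 16 bits of the int are left as zeros. The class name WideningDemo and the method widen are hypothetical.

```java
// Illustrative sketch; WideningDemo and widen are hypothetical names.
class WideningDemo {
    // No cast is needed: a char is implicitly widened to an int on return.
    static int widen(char ch) {
        return ch;
    }

    public static void main(String[] args) {
        int k = widen('\uFFFF');                       // Largest char value
        System.out.println(k);                         // Displays 65535
        System.out.println(Integer.toHexString(k));    // Displays ffff (upper 16 bits are zero)
        System.out.println(widen('a'));                // Displays 97
    }
}
```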



On the other hand, trying to assign an int value to a char variable leads to a compile-time error:

int k = 97;
char ch;
ch = k;   // Compile-time error: can't assign int to char


Trying to assign a 32-bit int to a 16-bit char is like trying to fit a big object into an undersized box. The object won't fit unless we shrink it in some way. Java will allow us to assign an int value to a char variable, but only if we perform an explicit cast on it:

ch = (char)k; // Explicit cast of int k into char ch 


The (char) cast operation performs a "shrinking" of the int by discarding its high-order 16 bits and keeping the low-order 16 bits. This can be done without loss of information provided that k's value is in the range 0 to 65535, that is, in the range of values that fit into a char variable. This narrowing primitive conversion changes a wider type (32-bit int) to a narrower type (16-bit char). Because of the potential here for information loss, it is up to the programmer to determine whether the cast can be performed safely.
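The possible loss of information can be seen in a small experiment. In this sketch (the class name NarrowingDemo and the method narrow are hypothetical), a value inside the char range survives the cast, while values outside the range keep only their low-order 16 bits.

```java
// Illustrative sketch; NarrowingDemo and narrow are hypothetical names.
class NarrowingDemo {
    // Explicit narrowing conversion from a 32-bit int to a 16-bit char.
    static char narrow(int k) {
        return (char) k;
    }

    public static void main(String[] args) {
        System.out.println(narrow(97));           // In range: displays a
        System.out.println(narrow(65536 + 97));   // Out of range: also displays a
        System.out.println((int) narrow(-1));     // Negative value: displays 65535
    }
}
```

The second call shows why the programmer must check the range: 65633 and 97 differ only in the bits that the cast discards, so the two values become indistinguishable once they are stored in a char.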



Java Language Rule: Type Conversion

Java permits implicit type conversions from a narrower type to a wider type. A cast operator must be used when converting a wider type into a narrower type.


The cast operator can be used with any primitive type. It applies to the variable or expression that immediately follows it. Thus, parentheses must be used to cast the expression m + n into a char:

char ch = (char)(m + n); 


The following statement would cause a compile-time error because the cast operator would be applied only to m:

char ch = (char)m + n; // Error: right side is an int 


In the expression on the right-hand side, the character produced by (char)m will be promoted to an int because it is part of an integer operation whose result will still be an int. Therefore, it cannot be assigned to a char without an explicit cast.
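The difference between the two forms can be sketched as follows, assuming the sample values m = 60 and n = 5 (chosen only for illustration). The class name CastScopeDemo and the method sumAsChar are hypothetical.

```java
// Illustrative sketch; CastScopeDemo and sumAsChar are hypothetical names.
class CastScopeDemo {
    // The parentheses make the cast apply to the whole sum.
    static char sumAsChar(int m, int n) {
        return (char) (m + n);
    }

    public static void main(String[] args) {
        int m = 60;
        int n = 5;

        char ch = sumAsChar(m, n);
        System.out.println(ch);       // Displays A (code 65)

        // Without the parentheses the cast applies to m only, and the
        // resulting char is promoted back to int for the addition.
        int total = (char) m + n;
        System.out.println(total);    // Displays 65
    }
}
```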

Self-Study Exercise

Exercise 5.22

Suppose that m and n are integer variables of type int and that ch1 and ch2 are character variables of type char. Determine in each of the cases that follow whether the assignment statements are valid. If not, modify the statement to make it valid.

  1. m = n;

  2. m = ch1;

  3. ch2 = n;

  4. ch1 = ch2;

  5. ch1 = m - n;



5.7.2. Lexical Ordering

The order in which the characters of a character set are arranged, their lexical order, is an important feature of the character set. It especially comes into play for such tasks as arranging strings in alphabetical order.

Although the actual integer values assigned to the individual characters by the ASCII and Unicode encodings seem somewhat arbitrary, the characters are, in fact, arranged in a systematic order. For example, note that the sequences of digits, '0' ... '9', and letters, 'a' ... 'z' and 'A' ... 'Z', are represented by consecutive sequences of integers (Table 5.13). This makes it possible to represent the lexical order of the characters in terms of the less-than relationship among integers. The fact that 'a' comes before 'f' in alphabetical order is represented by the fact that 97 (the integer code for 'a') is less than 102 (the integer code for 'f'). Similarly, the digit '5' comes before the digit '9' in an alphabetical sequence because 53 (the integer code for '5') is less than 57 (the integer code for '9').

This ordering relationship extends throughout the character set. Thus, it is also the case that 'A' comes before 'a' in the lexical ordering because 65 (the integer code for 'A') is less than 97 (the integer code for 'a'). Similarly, the character '[' comes before '}' because its integer code (91) is less than 125, the integer code for '}'.
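One way to observe this ordering is to sort a few characters by their integer codes. The sketch below uses the standard Arrays.sort method on a char array; the class name LexicalOrderDemo and the method sortChars are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch; LexicalOrderDemo and sortChars are hypothetical names.
class LexicalOrderDemo {
    // Returns the characters of s arranged in lexical (integer code) order.
    static String sortChars(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        // Codes: 'a' = 97, 'Z' = 90, '0' = 48, '[' = 91, '}' = 125, 'A' = 65
        System.out.println(sortChars("aZ0[}A"));   // Displays 0AZ[a}
    }
}
```

Note that every uppercase letter sorts before every lowercase letter, so this ordering is not the same as dictionary order for mixed-case text.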

5.7.3. Relational Operators

Given the lexical ordering of the char type, the following relational operators can be defined: <, >, <=, >=, ==, !=. Given any two characters, ch1 and ch2, the expression ch1 < ch2 is true if and only if the integer value of ch1 is less than the integer value of ch2. In this case we say that ch1 precedes ch2 in lexical order. Similarly, the expression ch1 > ch2 is true if and only if the integer value of ch1 is greater than the integer value of ch2. In this case we say that ch1 follows ch2. And so on for the other relational operators. This means that we can perform comparison operations on any two character operands (Table 5.14).

Table 5.14. Relational operations on characters

Operation           Operator  Java        True Expression
Precedes            <         ch1 < ch2   'a' < 'b'
Follows             >         ch1 > ch2   'c' > 'a'
Precedes or equals  <=        ch1 <= ch2  'a' <= 'a'
Follows or equals   >=        ch1 >= ch2  'a' >= 'a'
Equal to            ==        ch1 == ch2  'a' == 'a'
Not equal to        !=        ch1 != ch2  'a' != 'b'
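A common use of these operators is a range test that relies on the letters and digits occupying consecutive codes. The following is a minimal sketch restricted to ASCII characters; the class name CharCompareDemo and its two helper methods are hypothetical.

```java
// Illustrative sketch; CharCompareDemo and its methods are hypothetical names.
class CharCompareDemo {
    // True if ch lies between 'a' (97) and 'z' (122), inclusive.
    static boolean isAsciiLowercase(char ch) {
        return ch >= 'a' && ch <= 'z';
    }

    // True if ch lies between '0' (48) and '9' (57), inclusive.
    static boolean isAsciiDigit(char ch) {
        return ch >= '0' && ch <= '9';
    }

    public static void main(String[] args) {
        System.out.println('a' < 'b');               // Displays true
        System.out.println('A' < 'a');               // Displays true (65 < 97)
        System.out.println(isAsciiLowercase('q'));   // Displays true
        System.out.println(isAsciiLowercase('Q'));   // Displays false
        System.out.println(isAsciiDigit('7'));       // Displays true
    }
}
```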







Java, Java, Java, Object-Oriented Problem Solving (3rd Edition)
ISBN: 0131474340
EAN: 2147483647
Year: 2005
Pages: 275