Data Types

   


Java is a strongly typed language. This means that every variable must have a declared type. There are eight primitive types in Java. Four of them are integer types; two are floating-point number types; one is the character type char, used for code units in the Unicode encoding scheme (see the section on the char type); and one is a boolean type for truth values.

NOTE

Java has an arbitrary precision arithmetic package. However, "big numbers," as they are called, are Java objects and not a new Java type. You see how to use them later in this chapter.


Integers

The integer types are for numbers without fractional parts. Negative values are allowed. Java provides the four integer types shown in Table 3-1.

Table 3-1. Java Integer Types

Type

Storage Requirement

Range (Inclusive)

Int

4 bytes

2,147,483,648 to 2,147,483, 647 (just over 2 billion)

Short

2 bytes

32,768 to 32,767

Long

8 bytes

9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Byte

1 byte

128 to 127


In most situations, the int type is the most practical. If you want to represent the number of inhabitants of our planet, you'll need to resort to a long. The byte and short types are mainly intended for specialized applications, such as low-level file handling, or for large arrays when storage space is at a premium.

Under Java, the ranges of the integer types do not depend on the machine on which you will be running the Java code. This alleviates a major pain for the programmer who wants to move software from one platform to another, or even between operating systems on the same platform. In contrast, C and C++ programs use the most efficient integer type for each processor. As a result, a C program that runs well on a 32-bit processor may exhibit integer overflow on a 16-bit system. Because Java programs must run with the same results on all machines, the ranges for the various types are fixed.

Long integer numbers have a suffix L (for example, 4000000000L). Hexadecimal numbers have a prefix 0x (for example, 0xCAFE). Octal numbers have a prefix 0. For example, 010 is 8. Naturally, this can be confusing, and we recommend against the use of octal constants.

C++ NOTE

In C and C++, int denotes the integer type that depends on the target machine. On a 16-bit processor, like the 8086, integers are 2 bytes. On a 32-bit processor like the Sun SPARC, they are 4-byte quantities. On an Intel Pentium, the integer type of C and C++ depends on the operating system: for DOS and Windows 3.1, integers are 2 bytes. When 32-bit mode is used for Windows programs, integers are 4 bytes. In Java, the sizes of all numeric types are platform independent.

Note that Java does not have any unsigned types.


Floating-Point Types

The floating-point types denote numbers with fractional parts. The two floating-point types are shown in Table 3-2.

Table 3-2. Floating-Point Types

Type

Storage Requirement

Range

float

4 bytes

approximately ±3.40282347E+38F (6 7 significant decimal digits)

double

8 bytes

approximately ±1.79769313486231570E+308 (15 significant decimal digits)


The name double refers to the fact that these numbers have twice the precision of the float type. (Some people call these double-precision numbers.) Here, the type to choose in most applications is double. The limited precision of float is simply not sufficient for many situations. Seven significant (decimal) digits may be enough to precisely express your annual salary in dollars and cents, but it won't be enough for your company president's salary. The only reasons to use float are in the rare situations in which the slightly faster processing of single-precision numbers is important or when you need to store a large number of them.

Numbers of type float have a suffix F (for example, 3.402F). Floating-point numbers without an F suffix (such as 3.402) are always considered to be of type double. You can optionally supply the D suffix (for example, 3.402D).

As of JDK 5.0, you can specify floating-point numbers in hexadecimal. For example, 0.125 is the same as 0x1.0p-3. In hexadecimal notation, you use a p, not an e, to denote the exponent.

All floating-point computations follow the IEEE 754 specification. In particular, there are three special floating-point values:

  • positive infinity

  • negative infinity

  • NaN (not a number)

to denote overflows and errors. For example, the result of dividing a positive number by 0 is positive infinity. Computing 0/0 or the square root of a negative number yields NaN.

NOTE

The constants Double.POSITIVE_INFINITY, Double.NEGATIVE_ INFINITY, and Double.NaN (as well as corresponding Float constants) represent these special values, but they are rarely used in practice. In particular, you cannot test

 if (x == Double.NaN) // is never true 

to check whether a particular result equals Double.NaN. All "not a number" values are considered distinct. However, you can use the Double.isNaN method:

 if (Double.isNaN(x)) // check whether x is "not a number" 


CAUTION

Floating-point numbers are not suitable for financial calculation in which roundoff errors cannot be tolerated. For example, the command System.out.println(2.0 - 1.1) prints 0.8999999999999999, not 0.9 as you would expect. Such roundoff errors are caused by the fact that floating-point numbers are represented in the binary number system. There is no precise binary representation of the fraction 1/10, just as there is no accurate representation of the fraction 1/3 in the decimal system. If you need precise numerical computations without roundoff errors, use the BigDecimal class, which is introduced later in this chapter.


The char Type

To understand the char type, you have to know about the Unicode encoding scheme. Unicode was invented to overcome the limitations of traditional character encoding schemes. Before Unicode, there were many different standards: ASCII in the United States, ISO 8859-1 for Western European languages, KOI-8 for Russian, GB18030 and BIG-5 for Chinese, and so on. This causes two problems. A particular code value corresponds to different letters in the various encoding schemes. Moreover, the encodings for languages with large character sets have variable length: some common characters are encoded as single bytes, others require two or more bytes.

Unicode was designed to solve these problems. When the unification effort started in the 1980s, a fixed 2-byte width code was more than sufficient to encode all characters used in all languages in the world, with room to spare for future expansion or so everyone thought at the time. In 1991, Unicode 1.0 was released, using slightly less than half of the available 65,536 code values. Java was designed from the ground up to use 16-bit Unicode characters, which was a major advance over other programming languages that used 8-bit characters.

Unfortunately, over time, the inevitable happened. Unicode grew beyond 65,536 characters, primarily due to the addition of a very large set of ideographs used for Chinese, Japanese, and Korean. Now, the 16-bit char type is insufficient to describe all Unicode characters.

We need a bit of terminology to explain how this problem is resolved in Java, beginning with JDK 5.0. A code point is a code value that is associated with a character in an encoding scheme. In the Unicode standard, code points are written in hexadecimal and prefixed with U+, such as U+0041 for the code point of the letter A. Unicode has code points that are grouped into 17 code planes. The first code plane, called the basic multilingual plane, consists of the "classic" Unicode characters with code points U+0000 to U+FFFF. Sixteen additional planes, with code points U+10000 to U+10FFFF, hold the supplementary characters.

The UTF-16 encoding is a method of representing all Unicode code points in a variable length code. The characters in the basic multilingual plane are represented as 16-bit values, called code units. The supplementary characters are encoded as consecutive pairs of code units. Each of the values in such an encoding pair falls into an unused 2048-byte range of the basic multilingual plane, called the surrogates area (U+D800 to U+DBFF for the first code unit, U+DC00 to U+DFFF for the second code unit).This is rather clever, because you can immediately tell whether a code unit encodes a single character or whether it is the first or second part of a supplementary character. For example, the mathematical symbol for the set of integers has code point U+1D56B and is encoded by the two code units U+D835 and U+DD6B. (See http://en.wikipedia.org/wiki/UTF-16 for a description of the encoding algorithm.)

In Java, the char type describes a code unit in the UTF-16 encoding.

Our strong recommendation is not to use the char type in your programs unless you are actually manipulating UTF-16 code units. You are almost always better off treating strings (which we will discuss starting on page 51) as abstract data types.

Having said that, there will be some cases when you will encounter char values. Most commonly, these will be character constants. For example, 'A' is a character constant with value 65. It is different from "A", a string containing a single character. Unicode code units can be expressed as hexadecimal values that run from \u0000 to \uFFFF. For example, \u2122 is the trademark symbol (™) and \u03C0 is the Greek letter pi (p).

Besides the \u escape sequences that indicate the encoding of Unicode code units, there are several escape sequences for special characters, as shown in Table 3-3. You can use these escape sequences inside quoted character constants and strings, such as '\u2122' or "Hello\n". The \u escape sequence (but none of the other escape sequences) can even be used outside quoted character constants and strings. For example,

 public static void main(String\5B\5D args) 

Table 3-3. Escape Sequences for Special Characters

Escape Sequence

Name

Unicode Value

\b

Backspace

\u0008

\t

Tab

\u0009

\n

Linefeed

\u000a

\r

Carriage return

\u000d

\"

Double quote

\u0022

\'

Single quote

\u0027

\\

Backslash

\u005c


is perfectly legal \u005B and \u005D are the UTF-16 encodings of the Unicode code points for [ and ].

NOTE

Although you can use any Unicode character in a Java application or applet, whether you can actually see it displayed depends on your browser (for applets) and (ultimately) on your operating system for both.


The boolean Type

The boolean type has two values, false and TRue. It is used for evaluating logical conditions. You cannot convert between integers and boolean values.

C++ NOTE

In C++, numbers and even pointers can be used in place of boolean values. The value 0 is equivalent to the bool value false, and a non-zero value is equivalent to true. This is not the case in Java. Thus, Java programmers are shielded from accidents such as

 if (x = 0) // oops...meant x == 0 

In C++, this test compiles and runs, always evaluating to false. In Java, the test does not compile because the integer expression x = 0 cannot be converted to a boolean value.



       
    top



    Core Java 2 Volume I - Fundamentals
    Core Java(TM) 2, Volume I--Fundamentals (7th Edition) (Core Series) (Core Series)
    ISBN: 0131482025
    EAN: 2147483647
    Year: 2003
    Pages: 132

    flylib.com © 2008-2017.
    If you may any questions please contact us: flylib@qtcs.net