Data Types | Ground-Up Java

Imagine what would happen if SimCom accidentally treated bytes of data as if they were instructions, or instructions as if they were data. In the first case, the virtual machine would execute a random series of opcodes, producing nothing of value. In the second case, instructions would likely be modified (added to or subtracted from one another), again producing nothing of value.

The point is that SimCom uses memory for two different purposes, instructions and data, and each use must be treated appropriately. There are no facilities built into SimCom to guarantee appropriate treatment. You just have to be a careful programmer.

This distinction between memory uses is also found in Java and all other high-level languages. Fortunately, Java makes it impossible to execute data or do arithmetic on opcodes.

SimCom has no facilities for dealing with fractions, characters, or very large numbers, and its handling of negative numbers is mysterious. Java supports all of these kinds of data by providing different data types. For now, you can think of a data type as a way of using memory to represent data. SimCom uses an eight-bit base-2 representation. Java provides eight representations, all built on base 2: four for integers, two for numbers that might contain fractions, one for characters, and one for logical (true/false) values.

Processing a Java data type as if it were a different type would produce worthless results. Java protects you from this kind of problem by requiring you to declare all data types; the compiler enforces the integrity of your declarations. Of course, this will make much more sense later in this chapter, after we discuss declarations. Right now, let's look at Java's data types. Later on, you'll see how they're used.

Integer Data Types

In the terminology of programming, an integer is a data type that represents non-fractional numbers. In Java, all integer types are signed, meaning that both positive and negative values are supported (as is zero). Java's four integer types are shown in Table 2.1.

Table 2.1: Java's Integer Data Types
Name	Size	Minimum Value	Maximum Value
byte	8 bits	-128	127
short	16 bits	-32768	32767
int	32 bits	-2147483648	2147483647
long	64 bits	-9223372036854775808	9223372036854775807

Each data type shown in Table 2.1 has a finite range. Wider ranges are accommodated by data types that require more memory. No type is unlimited – each has a minimum and a maximum value – but it is difficult to imagine exhausting the capacity of the long type, which ranges from minus nine quintillion to plus nine quintillion.
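The limits in Table 2.1 do not have to be memorized, because Java's standard wrapper classes expose them as constants. The following is a minimal sketch; the class name IntegerRanges and the helper describe() are illustrative names chosen here, not part of the book's sample code.

```java
// Illustrative sketch: IntegerRanges and describe() are hypothetical names.
class IntegerRanges {
    // Formats a range as "min to max". A long parameter is wide enough
    // to hold the limits of all four integer types.
    static String describe(long min, long max) {
        return min + " to " + max;
    }

    public static void main(String[] args) {
        // MIN_VALUE and MAX_VALUE are constants defined by the standard
        // Byte, Short, Integer, and Long classes.
        System.out.println("byte:  " + describe(Byte.MIN_VALUE, Byte.MAX_VALUE));
        System.out.println("short: " + describe(Short.MIN_VALUE, Short.MAX_VALUE));
        System.out.println("int:   " + describe(Integer.MIN_VALUE, Integer.MAX_VALUE));
        System.out.println("long:  " + describe(Long.MIN_VALUE, Long.MAX_VALUE));
    }
}
```

Running the program should print the same minimum and maximum values that appear in Table 2.1.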

Java uses a format known as two's complement to represent negative numbers. In nearly all cases, the details of this representation are hidden from programmers, so you can go for a long time without having to know about it. However, there are times when a program will produce baffling results if you don't know about two's complement. Also, there are some arithmetic operators (discussed in Chapter 3, "Operations") that only make sense if you know how negative numbers are represented.

Two's complement is an evolution of the classical base-2 notation that we all learned in elementary school. If you need a review, you can run the Simple Base 2 animated illustration on the CD-ROM. First run the Java setup script you created in Appendix A (assuming you haven't run it already), and type java twoscomp.SimpleBase2Lab. You see a ten-bit number. You can click on individual bits to change their values. When you're ready, click the Run button to see which number is represented. Figure 2.4 shows SimpleBase2Lab in action.

Figure 2.4: SimpleBase2Lab

Straightforward base-2 notation, as shown in the Simple Base 2 animation, is not exactly what computers use to represent numbers. Two's complement is more sophisticated than regular base-2, because of the way negative numbers are represented.

Imagine a car with an odometer that uses base-2 rather than base-10. There are a lot more digits than usual, and they roll over more frequently, but otherwise this odometer is like an ordinary odometer. Every time you drive another mile, the displayed number increases by 1. When the display is showing all 1s, and you drive one more mile, the odometer rolls over and shows all 0s. With a little imagination, you could say that a display of all 1s represents -1 mile, because adding one more mile gives you zero miles.

What about a display that consists of all 1s except for the rightmost digit, which is zero? (This would be 11111110 on an 8-bit odometer.) You could make a case that this reading represents -2 miles, because when you drive two more miles you get zero miles.

Here is another way to make the same case: if you were willing to break the law, you could open the odometer and roll it back manually. If it initially showed one mile and you rolled it back once, it would show zero miles. If you then rolled it back once more, it would show 11111111.

Figure 2.5 shows a base-2 odometer.

Figure 2.5: A base-2 odometer

Two's complement works like an odometer. A value of all 1s represents -1. Other values are assigned to ensure consistency. For example, with an 8-bit byte, a value of 11111110 represents -2. This makes sense, because adding 1 produces the "all 1s" representation for -1.
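The odometer analogy can be tried out with Java's 8-bit byte type. The sketch below is only an illustration: the class name OdometerDemo and the method driveOneMile() are made-up names, and the 0b binary literals assume Java 7 or later.

```java
// Illustrative sketch: OdometerDemo and driveOneMile() are hypothetical names.
class OdometerDemo {
    // Advances an 8-bit "odometer" by one mile. The addition is done as an int;
    // the cast back to byte keeps only the low 8 bits, which is the rollover.
    static byte driveOneMile(byte reading) {
        return (byte) (reading + 1);
    }

    public static void main(String[] args) {
        byte allOnes = (byte) 0b11111111;        // bit pattern 11111111
        byte almostAllOnes = (byte) 0b11111110;  // bit pattern 11111110

        System.out.println(allOnes);                      // -1
        System.out.println(almostAllOnes);                // -2
        System.out.println(driveOneMile(almostAllOnes));  // -1
        System.out.println(driveOneMile(allOnes));        // 0
    }
}
```

Driving one mile from 11111110 gives 11111111, and one more mile rolls the reading over to all 0s, just as the odometer does.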

The general rules for two's complement are as follows:

A value of all 0s represents zero.
If the leftmost bit is 0, the number is positive. The remaining bits represent the value in base-2.
If the leftmost bit is 1, the number is negative. To compute the magnitude of the value, invert all bits (changing 0s to 1s and 1s to 0s) and then add 1.

Figure 2.6 shows how to compute the value of the 16-bit short 1111111110011001.

Figure 2.6: An example of two's complement

Figure 2.6 demonstrates that after you invert all the bits and add 1, the magnitude is 103. Thus, the original value of 1111111110011001 must represent -103.
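The rules for two's complement can also be written out as a short program. This is a sketch under stated assumptions rather than the way Java itself does the work: the class TwosComplementDemo and the method valueOf() are illustrative names, and the input is assumed to be a string of 8, 16, or 32 characters, each a 0 or a 1.

```java
// Illustrative sketch: TwosComplementDemo and valueOf() are hypothetical names.
class TwosComplementDemo {
    // Interprets a string of '0' and '1' characters as a two's complement value.
    // Assumes the string is 8, 16, or 32 characters long.
    static long valueOf(String bits) {
        if (bits.charAt(0) == '0') {
            // Leftmost bit is 0: the bits are ordinary base-2 (all 0s gives zero).
            return Long.parseLong(bits, 2);
        }
        // Leftmost bit is 1: invert every bit...
        StringBuilder inverted = new StringBuilder();
        for (char c : bits.toCharArray()) {
            inverted.append(c == '0' ? '1' : '0');
        }
        // ...then add 1 to get the magnitude, and negate it.
        long magnitude = Long.parseLong(inverted.toString(), 2) + 1;
        return -magnitude;
    }

    public static void main(String[] args) {
        System.out.println(valueOf("1111111110011001"));  // -103
        // Cross-check: let Java reinterpret the same 16 bits as a short.
        System.out.println((short) Integer.parseInt("1111111110011001", 2));  // -103
    }
}
```

Both lines print -103, which agrees with the value worked out in Figure 2.6.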

Thinking in two's complement is not intuitive, but fortunately you rarely have to do it. However, it is important to get familiar with this format. There is an animated illustration on the CD-ROM to make this process more enjoyable. To run it, type java twoscomp.TwosCompLab. Figure 2.7 shows the program.

Figure 2.7: Two's complement lab

You can select 8-, 16-, or 32-bit data, corresponding to Java's byte, short, and int data types. (The 64-bit long type does not fit on a screen.) Buttons allow you to set the data to all 0s or all 1s. You can click on an individual bit to change its value. When you are ready, click on the Go button. The program will animate the steps involved in computing the value represented by the bit pattern.

Compute the value represented by the "all 1s" pattern for the byte, short, and int types.

Floating-Point Data Types

Integer data types cannot represent fractions. If you try to use an integer type to store a number with a fractional part, the fractional part will just be discarded. For example, if you divide 29 by 10 (as you'll do in the next chapter) and store the result in a short, you will find that the short contains 2, not 2.9.
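Here is a small sketch of the 29-divided-by-10 example. The class name IntegerDivisionDemo and the method divideAsShort() are illustrative names; division itself is covered properly in the next chapter.

```java
// Illustrative sketch: IntegerDivisionDemo and divideAsShort() are hypothetical names.
class IntegerDivisionDemo {
    // Divides two integers and stores the result in a short.
    // Integer division discards the fractional part.
    static short divideAsShort(int dividend, int divisor) {
        return (short) (dividend / divisor);
    }

    public static void main(String[] args) {
        System.out.println(divideAsShort(29, 10));  // 2
        // Writing 10.0 makes this a floating-point division instead.
        System.out.println(29 / 10.0);              // 2.9
    }
}
```

Note that the fractional part is discarded, not rounded: the result is 2 even though 2.9 is closer to 3.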

Floating-point data types can represent numbers with fractional parts. Java provides two floating-point data types, called float and double, as shown in Table 2.2.

Table 2.2: Java's Floating-Point Data Types
Name	Size	Minimum Value	Maximum Value	Smallest-Magnitude Positive Value
float	32 bits	-3.4 x 10^38	3.4 x 10^38	1.4 x 10^-45
double	64 bits	-1.8 x 10^308	1.8 x 10^308	4.9 x 10^-324

The maximum value for a float is approximately 34 followed by a string of 37 zeros: 340 undecillion. With such magnitudes, the common ways of naming numbers become impractical. We use scientific notation, as shown in the value columns of Table 2.2. With scientific notation, a number that would ordinarily have a huge string of zeros is represented by a value between 1 and 10 (always strictly less than 10), multiplied by 10 raised to the appropriate power.

The rightmost column of Table 2.2 shows the smallest positive numbers that the data types can represent. These values contain long strings of zeros, not because they are very large, but because they are very small. 1.4 x 10^-45 is another way of saying 0.0000000000000000000000000000000000000000000014.

The original computer output devices (terminals and teletypes) had only one font size, so superscripted exponents could not be displayed. An abbreviated form of scientific notation, often called E notation, was developed for them. In E notation, the letter E (which is short for exponent) is shorthand for "times-ten-to-the." For example, 3.45 x 10^-67 would be written as 3.45E-67. If you write code that prints out large numbers or very small fractions, you are likely to see E notation.
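You can see the E form for yourself by printing the limits of the floating-point types. The sketch below uses only constants from the standard Float and Double classes; the class name ENotationDemo is an illustrative name.

```java
// Illustrative sketch: ENotationDemo is a hypothetical name.
class ENotationDemo {
    public static void main(String[] args) {
        // Very large and very small values are printed with the letter E.
        System.out.println(Float.MAX_VALUE);   // 3.4028235E38
        System.out.println(Float.MIN_VALUE);   // 1.4E-45
        System.out.println(Double.MAX_VALUE);  // 1.7976931348623157E308
        System.out.println(Double.MIN_VALUE);  // 4.9E-324

        // The same form can be typed directly into source code.
        double tiny = 3.45e-67;
        System.out.println(tiny);
    }
}
```

The printed limits are more precise versions of the rounded figures in Table 2.2. Note that MIN_VALUE here is the smallest-magnitude positive value (the rightmost column of the table), not the most negative value.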

In the discussion of integer data types, you learned that you must understand two's complement notation to really understand certain Java operations. Fortunately, you don't need to know how floating-point numbers are represented internally to understand any Java operations. However, if you are interested in how this is done, you can run the Floating-Point Lab animated illustration by typing java floating.FloatFrame. The program lets you vary the bits of a 32-bit float number and observe the effect this has on the value. As you might expect, the fractional and exponent parts of the data are in base 2, and the exponent is a power of 2 rather than 10. You might discover certain bit combinations that result in "special values".
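If you do not have the lab program at hand, a rough sketch like the following lets you inspect the pieces of a float yourself. It relies on the fact that Java's float follows the IEEE 754 32-bit layout (1 sign bit, 8 exponent bits, 23 fraction bits, with the exponent stored plus 127). The class and method names are illustrative, not part of the book's code.

```java
// Illustrative sketch: FloatFieldsDemo and its methods are hypothetical names.
class FloatFieldsDemo {
    // The leftmost bit: 0 for positive values, 1 for negative values.
    static int signBit(float value) {
        return Float.floatToIntBits(value) >>> 31;
    }

    // The next 8 bits: the power of 2, stored with 127 added to it.
    static int exponentField(float value) {
        return (Float.floatToIntBits(value) >>> 23) & 0xff;
    }

    // The remaining 23 bits: the base-2 fraction.
    static int fractionField(float value) {
        return Float.floatToIntBits(value) & 0x7fffff;
    }

    public static void main(String[] args) {
        // 1.0 is 1.0 x 2^0, so the stored exponent is 0 + 127.
        System.out.println(signBit(1.0f));        // 0
        System.out.println(exponentField(1.0f));  // 127
        System.out.println(fractionField(1.0f));  // 0

        // Two of the "special values": an all-1s exponent field.
        System.out.println(Float.intBitsToFloat(0x7f800000));               // Infinity
        System.out.println(Float.isNaN(Float.intBitsToFloat(0x7fc00000)));  // true
    }
}
```

Float.floatToIntBits() and Float.intBitsToFloat() are standard library methods that convert between a float and the int with the same 32 bits.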

Doubles use 64 bits, twice as many as floats. The extra bits are used to give the data type both more range and more precision. The name double originates from double precision.

Representing Characters

Java uses a 16-bit data type called char to represent text characters. The data type can accommodate 2¹⁶ or 65,536 bit combinations, so 65,536 characters can be represented. This is more than enough to encode all European-based languages, but not enough for Chinese, Japanese, Korean, and certain others. The correspondence between characters and bit combinations is defined by the Unicode standard, which is beautifully described at www.unicode.org.
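Because a char is really a 16-bit number that stands for a character, you can look at the number behind a character. The following is a small sketch with illustrative names (CharDemo, codeOf(), next()).

```java
// Illustrative sketch: CharDemo, codeOf(), and next() are hypothetical names.
class CharDemo {
    // Returns the Unicode value stored in a char.
    static int codeOf(char c) {
        return (int) c;
    }

    // Returns the character whose Unicode value is one higher.
    static char next(char c) {
        return (char) (c + 1);
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));                  // 65
        System.out.println(next('A'));                    // B
        // The largest value a char can hold is 2^16 - 1.
        System.out.println(codeOf(Character.MAX_VALUE));  // 65535
    }
}
```

The values run from 0 to 65,535, which accounts for the 65,536 bit combinations.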

Representing Logical Values

The integer and floating-point formats represent numerical values. The char data type represents text characters. Java has one last data type, boolean, which represents logical values. Different JVMs may use different numbers of bytes to store booleans. Often 4 bytes are used, although this is not always the case.

The numerical and char types can represent many different values, from 256 possible values for byte all the way up to 18446744073709551616 for long. The boolean type can represent only two possible values: true and false. This data type is useful for controlling conditional execution. For example, a block of code might need to execute only if it's midnight and a certain database query returns more than 100 records but fewer than 500. Or a block of code might need to execute if the user has entered a special request and a password. Java uses logical values to express conditions like these that might be true and might be false. As you will see in Chapter 3, there are special boolean operations that operate on these values.
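As a preview, the midnight-and-record-count condition might be expressed along these lines. This is only a sketch: the class ConditionDemo, the method shouldRun(), and its parameters are invented for illustration, and the && operator it uses is one of the boolean operations explained in Chapter 3.

```java
// Illustrative sketch: ConditionDemo, shouldRun(), and the parameter
// names are hypothetical.
class ConditionDemo {
    // True only if it is midnight and the query returned
    // more than 100 but fewer than 500 records.
    static boolean shouldRun(boolean isMidnight, int recordCount) {
        return isMidnight && recordCount > 100 && recordCount < 500;
    }

    public static void main(String[] args) {
        System.out.println(shouldRun(true, 250));   // true
        System.out.println(shouldRun(false, 250));  // false
        System.out.println(shouldRun(true, 500));   // false
    }
}
```

However complicated the condition becomes, the result is always one of just two values, true or false.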

Logical values and the operations that act upon them were first studied by George Boole, a 19th-century British mathematician. He is one of the very few people whose name has been immortalized as a computer-language concept.

Recap of Java's Data Types

So far this chapter has introduced Java's eight basic data types. These types are summarized in Table 2.3.

Table 2.3: Java's Primitive Data Types
Data Type	# of Bits	Used For	Internal Format
byte	8	Very small integers	2's complement
short	16	Small integers	2's complement
int	32	Integers	2's complement
long	64	Large integers	2's complement
float	32	Fractions, very large numbers	Floating-point
double	64	Fractions, huge numbers	Floating-point
char	16	Characters	Unicode
boolean	JVM-dependent	Logical values	JVM-dependent

These data types are collectively called primitives to distinguish them from object-oriented types. (We will begin our study of objects in Chapter 7.)

We now turn to the question of what you can do with all this data.