4.3. Types

The type system of a programming language describes how its data elements (variables and constants) are associated with storage in memory and how they are related to one another. In a statically typed language, such as C or C++, the type of a data element is a simple, unchanging attribute that often corresponds directly to some underlying hardware phenomenon, such as a register or a pointer value. In a more dynamic language such as Smalltalk or Lisp, variables can be assigned arbitrary elements and can effectively change their type throughout their lifetime. A considerable amount of overhead goes into validating what happens in these languages at runtime. Scripting languages such as Perl achieve ease of use by providing drastically simplified type systems in which only certain data elements can be stored in variables, and values are unified into a common representation, such as strings.

Java combines the best features of both statically and dynamically typed languages. As in a statically typed language, every variable and programming element in Java has a type that is known at compile time, so the runtime system doesn't normally have to check the validity of assignments between types while the code is executing. Unlike traditional C or C++, Java also maintains runtime information about objects and uses this to allow truly dynamic behavior. Java code may load new types at runtime and use them in fully object-oriented ways, allowing casting and full polymorphism (extending of types).

Java data types fall into two categories. Primitive types represent simple values that have built-in functionality in the language; they are fixed elements, such as literal constants and numbers. Reference types (or class types) include objects and arrays; they are called reference types because they "refer to" a large data type which is passed "by reference," as we'll explain shortly. In Java 5.0, generic types were introduced to the language, but they are really an extension of classes and are, therefore, actually reference types.

4.3.1. Primitive Types

Numbers, characters, and Boolean values are fundamental elements in Java. Unlike some other (perhaps more pure) object-oriented languages, they are not objects. For those situations where it's desirable to treat a primitive value as an object, Java provides "wrapper" classes. The major advantage of treating primitive values as special is that the Java compiler and runtime can more readily optimize their implementation. Primitive values and computations can still be mapped down to hardware as they always have been in lower-level languages. As of Java 5.0, the compiler can automatically convert between primitive values and their object wrappers as needed to partially mask the difference between the two. We'll explain what that means in more detail in the next chapter when we discuss boxing and unboxing of primitive values.
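As a brief preview of the automatic conversion mentioned above, the following minimal sketch stores int values in a collection of Integer wrapper objects and reads them back. The class name BoxingSketch and its sum method are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingSketch {
    // Adds up a list of Integer wrapper objects.
    static int sum(List<Integer> values) {
        int total = 0;
        for (Integer v : values) {
            total += v;   // unboxing: the Integer is converted to an int
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<Integer>();
        values.add(1);    // autoboxing: the int is wrapped in an Integer
        values.add(2);
        values.add(3);
        System.out.println(sum(values));
    }
}
```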

An important portability feature of Java is that primitive types are precisely defined. For example, you never have to worry about the size of an int on a particular platform; it's always a 32-bit, signed, two's complement number. Table 4-2 summarizes Java's primitive types.

Table 4-2. Java primitive data types
Type	Definition
`boolean`	`true` or `false`
`char`	16-bit, Unicode character
`byte`	8-bit, signed, two's complement integer
`short`	16-bit, signed, two's complement integer
`int`	32-bit, signed, two's complement integer
`long`	64-bit, signed, two's complement integer
`float`	32-bit, IEEE 754, floating-point value
`double`	64-bit, IEEE 754, floating-point value
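One way to confirm these fixed sizes for yourself is to read the SIZE constants that the wrapper classes expose. This is only a sketch; the class name PrimitiveSizes and the method bitWidths are made up for the example.

```java
class PrimitiveSizes {
    // Bit widths of char, byte, short, int, long, float, and double,
    // taken from the SIZE constants of the corresponding wrapper classes.
    static int[] bitWidths() {
        return new int[] {
            Character.SIZE, Byte.SIZE, Short.SIZE, Integer.SIZE,
            Long.SIZE, Float.SIZE, Double.SIZE
        };
    }

    public static void main(String[] args) {
        System.out.println("int bits:  " + Integer.SIZE);
        System.out.println("int range: " + Integer.MIN_VALUE
                + " to " + Integer.MAX_VALUE);
    }
}
```

These values are the same on every platform, which is the portability guarantee described above.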

Those of you with a C background may notice that the primitive types look like an idealization of C scalar types on a 32-bit machine, and you're absolutely right. That's how they're supposed to look. The 16-bit characters were forced by Unicode, and ad hoc pointers were deleted for other reasons. But overall, the syntax and semantics of Java primitive types are meant to fit a C programmer's mental habits.

4.3.1.1 Floating-point precision

Floating-point operations in Java follow the IEEE 754 international specification, which means that the result of floating-point calculations is normally the same on different Java platforms. However, since Version 1.2, Java has allowed for extended precision on platforms that support it. This can introduce extremely small-valued and arcane differences in the results of high-precision operations. Most applications would never notice this, but if you want to ensure that your application produces exactly the same results on different platforms, you can use the special keyword strictfp as a class modifier on the class containing the floating-point manipulation (we cover classes in the next chapter). The compiler then prohibits platform-specific optimizations.

4.3.1.2 Variable declaration and initialization

Variables are declared inside of methods or classes in C style, with a type followed by one or more comma-separated variable names. For example:

     int foo;
     double d1, d2;
     boolean isFun;

Variables can optionally be initialized with an appropriate expression when they are declared:

     int foo = 42;
     double d1 = 3.14, d2 = 2 * 3.14;
     boolean isFun = true;

Variables that are declared as members of a class are set to default values if they aren't initialized (see Chapter 5). In this case, numeric types default to the appropriate flavor of zero, characters are set to the null character (\0), and Boolean variables have the value false. Local variables, which are declared inside a method and live only for the duration of a method call, on the other hand, must be explicitly initialized before they can be used. As we'll see, the compiler enforces this rule so there is no danger of forgetting.
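The difference between members and local variables can be sketched as follows. The class DefaultsSketch and its field names are hypothetical, chosen only for this illustration.

```java
class DefaultsSketch {
    // Members of a class receive default values when left uninitialized.
    int count;        // defaults to 0
    double ratio;     // defaults to 0.0
    char letter;      // defaults to the null character
    boolean flag;     // defaults to false

    static int localExample() {
        int local;        // a local variable gets no default value
        // Reading local at this point would be a compile-time error.
        local = 7;        // it must be assigned before it is used
        return local;
    }

    public static void main(String[] args) {
        DefaultsSketch d = new DefaultsSketch();
        System.out.println(d.count + " " + d.ratio + " "
                + (int) d.letter + " " + d.flag);
        System.out.println(localExample());
    }
}
```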

4.3.1.3 Integer literals

Integer literals can be specified in octal (base 8), decimal (base 10), or hexadecimal (base 16). A decimal integer is specified by a sequence of digits beginning with one of the characters 1-9:

     int i = 1230;

Octal numbers are distinguished from decimal numbers by a leading zero:

     int i = 01230;             // i = 664 decimal

A hexadecimal number is denoted by the leading characters 0x or 0X (zero "x"), followed by a combination of digits and the characters a-f or A-F, which represent the decimal values 10-15:

     int i = 0xFFFF;            // i = 65535 decimal

Integer literals are of type int unless they are suffixed with an L, denoting that they are to be produced as a long value:

     long l = 13L;
     long l = 13;       // equivalent: 13 is converted from type int

(The lowercase letter l is also acceptable but should be avoided because it often looks like the number 1.)
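To tie the notations together, this small sketch writes one value in all three bases and shows a case where the L suffix matters. LiteralSketch and its members are illustrative names, not part of any library.

```java
class LiteralSketch {
    // The same value written in decimal, octal, and hexadecimal.
    static final int DECIMAL = 255;
    static final int OCTAL = 0377;     // 3*64 + 7*8 + 7
    static final int HEX = 0xFF;       // 15*16 + 15

    // Both operands are ints, so the addition wraps around
    // before the result is converted to long.
    static long wrapped() {
        int max = 2147483647;
        return max + 1;
    }

    // The L suffix makes the literal a long, so the addition is
    // carried out in 64-bit arithmetic.
    static long wide() {
        return 2147483647L + 1;
    }

    public static void main(String[] args) {
        System.out.println(DECIMAL + " " + OCTAL + " " + HEX);
        System.out.println(wrapped() + " versus " + wide());
    }
}
```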

When a numeric type is used in an assignment or an expression involving a "larger" type with a greater range, it can be promoted to the bigger type. In the second line of the previous example, the number 13 has the default type of int, but it's promoted to type long for assignment to the long variable. Certain other numeric and comparison operations also cause this kind of arithmetic promotion, as do mathematical expressions involving more than one type. For example, when multiplying a byte value by an int value, the compiler promotes the byte to an int first:

     byte b = 42;
     int i = 43;
     int result = b * i;  // b is promoted to int before multiplication

A numeric value can never go the other way and be assigned to a type with a smaller range without an explicit cast, however:

     int i = 13;
     byte b = i;          // Compile-time error, explicit cast needed
     byte b = (byte) i;   // OK

Conversions from floating-point to integer types always require an explicit cast because of the potential loss of precision.
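The following sketch gathers these rules into one place and shows what an explicit cast actually does to a value that doesn't fit. The class ConversionSketch and its method names are hypothetical.

```java
class ConversionSketch {
    // The byte is promoted to int before the multiplication.
    static int multiply(byte b, int i) {
        return b * i;
    }

    // An explicit cast keeps only the low 8 bits of the int.
    static byte narrow(int i) {
        return (byte) i;
    }

    // Casting a floating-point value to int discards the fraction.
    static int truncate(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(multiply((byte) 42, 43));
        System.out.println(narrow(13) + " " + narrow(300));
        System.out.println(truncate(3.99));
    }
}
```

A value that fits in the smaller type survives the cast unchanged, whereas 300 does not fit in a byte and comes out as 44. This is why the compiler insists that you ask for the narrowing explicitly.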

4.3.1.4 Floating-point literals

Floating-point values can be specified in decimal or scientific notation. Floating-point literals are of type double unless they are suffixed with an f or F denoting that they are to be produced as a float value:

     double d = 8.31;
     double e = 3.00e+8;
     float f = 8.31F;
     float g = 3.00e+8F;
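The suffix is more than a formality, because a float carries less precision than a double. In this rough sketch (FloatLiteralSketch is a made-up name), the literal 8.31 has no exact binary representation, so its float and double forms differ slightly, whereas 3.00e+8 happens to be exactly representable in both.

```java
class FloatLiteralSketch {
    // True if the float and double forms of 8.31 hold the same value.
    static boolean sameInexact() {
        float f = 8.31F;
        double d = 8.31;
        return f == d;     // f is promoted to double for the comparison
    }

    // True if the float and double forms of 3.00e+8 hold the same value.
    static boolean sameExact() {
        float g = 3.00e+8F;
        double e = 3.00e+8;
        return g == e;
    }

    public static void main(String[] args) {
        System.out.println(sameInexact());
        System.out.println(sameExact());
    }
}
```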

4.3.1.5 Character literals

A literal character value can be specified either as a single-quoted character or as an escaped ASCII or Unicode sequence:

     char a = 'a';
     char newline = '\n';
     char smiley = '\u263a';
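Because a char is a 16-bit numeric value underneath, the different notations are interchangeable. The sketch below, with the hypothetical class name CharSketch, writes the same letter two ways and exposes the numeric codes behind the literals.

```java
class CharSketch {
    static final char LETTER = 'a';
    static final char ESCAPED = '\u0061';   // the same letter as a Unicode escape
    static final char NEWLINE = '\n';
    static final char SMILEY = '\u263a';

    public static void main(String[] args) {
        System.out.println(LETTER == ESCAPED);
        // Casting to int reveals the numeric code of each character.
        System.out.println((int) LETTER + " " + (int) NEWLINE
                + " " + (int) SMILEY);
    }
}
```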

4.3.2. Reference Types

In an object-oriented language like Java, you create new, complex data types from simple primitives by creating a class. Each class then serves as a new type in the language. For example, if we create a new class called Foo in Java, we are also implicitly creating a new type called Foo. The type of an item governs how it's used and where it can be assigned. As with primitives, an item of type Foo can, in general, be assigned to a variable of type Foo or passed as an argument to a method that accepts a Foo value.

A type is not just a simple attribute. Classes can have relationships with other classes and so do the types that they represent. All classes exist in a parent-child hierarchy, where a child class or subclass is a specialized kind of its parent class. The corresponding types have the same relationship, where the type of the child class is considered a subtype of the parent class. Because child classes inherit all of the functionality of their parent classes, an object of the child's type is in some sense equivalent to or an extension of the parent type. An object of the child type can be used in place of an object of the parent's type. For example, if you create a new class, Cat, that extends Animal, the new type, Cat, is considered a subtype of Animal. Objects of type Cat can then be used anywhere an object of type Animal can be used; an object of type Cat is said to be assignable to a variable of type Animal. This is called subtype polymorphism and is one of the primary features of an object-oriented language. We'll look more closely at classes and objects in Chapter 5.
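Here is a minimal sketch of the Cat and Animal relationship just described. The describe and introduce methods, and the SubtypeSketch class, are invented for the illustration.

```java
class Animal {
    String describe() {
        return "some animal";
    }
}

// Cat is a subclass, and therefore a subtype, of Animal.
class Cat extends Animal {
    @Override
    String describe() {
        return "a cat";
    }
}

class SubtypeSketch {
    // Accepts any Animal, including objects of its subtypes.
    static String introduce(Animal a) {
        return "This is " + a.describe();
    }

    public static void main(String[] args) {
        Animal pet = new Cat();               // a Cat is assignable to an Animal
        System.out.println(introduce(pet));
    }
}
```

Although introduce is written in terms of Animal, the Cat object passed to it still behaves like a Cat.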

Primitive types in Java are used and passed "by value." In other words, when a primitive value like an int is assigned to a variable or passed as an argument to a method, it's simply copied. Reference types, on the other hand, are always accessed "by reference." A reference is simply a handle or a name for an object. What a variable of a reference type holds is a "pointer" to an object of its type (or of a subtype, as described earlier). When the reference is assigned or passed to a method, only the reference is copied, not the object it's pointing to. A reference is like a pointer in C or C++, except that its type is so strictly enforced that you can't mess with the reference itself; it's an atomic entity. The reference value itself can't be created or changed. A variable gets assigned a reference value only through assignment to an appropriate object.

Let's run through an example. We declare a variable of type Foo, called myFoo, and assign it an appropriate object:^[*]

     Foo myFoo = new Foo();
     Foo anotherFoo = myFoo;

^[*] The comparable code in C++ would be:

     Foo& myFoo = *(new Foo());
     Foo& anotherFoo = myFoo;

myFoo is a reference-type variable that holds a reference to the newly constructed Foo object. (For now, don't worry about the details of creating an object; we'll cover that in Chapter 5.) We declare a second Foo type variable, anotherFoo, and assign it to the same object. There are now two identical references: myFoo and anotherFoo, but only one actual Foo object instance. If we change things in the state of the Foo object itself, we see the same effect by looking at it with either reference.
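To see the shared object in action, consider this sketch. It assumes, purely for illustration, that Foo has an int field named value; the AliasSketch class is likewise hypothetical.

```java
class Foo {
    int value;   // hypothetical state, used only for this illustration
}

class AliasSketch {
    static int demo() {
        Foo myFoo = new Foo();
        Foo anotherFoo = myFoo;    // copies the reference, not the object

        anotherFoo.value = 42;     // change the object through one reference
        return myFoo.value;        // and observe the change through the other
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```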

Object references are passed to methods in the same way. In this case, either myFoo or anotherFoo would serve as equivalent arguments:

     myMethod( myFoo );

An important, but sometimes confusing, distinction to make at this point is that the reference itself is a value and that value is copied when it is assigned to a variable or passed in a method call. Given our previous example, the argument passed to a method (a local variable from the method's point of view) is actually a third copy of the reference, in addition to myFoo and anotherFoo. The method can alter the state of the Foo object itself through that reference (calling its methods or altering its variables), but it can't change the caller's notion of the reference to myFoo. That is, the method can't change the caller's myFoo to point to a different Foo object; it can change only its own reference. This will be more obvious when we talk about methods later. Java differs from C++ in this respect. If you need to change a caller's reference to an object in Java, you need an additional level of indirection. The caller would have to wrap the reference in another object so that both could share the reference to it.
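The following sketch illustrates both halves of that rule, along with the extra level of indirection. It assumes a Foo class with a name field, and the FooHolder wrapper and PassingSketch methods are hypothetical names.

```java
class Foo {
    String name;   // hypothetical state for the illustration

    Foo(String name) {
        this.name = name;
    }
}

// A wrapper object that provides the extra level of indirection.
class FooHolder {
    Foo foo;
}

class PassingSketch {
    // Alters the object itself; the caller sees this change.
    static void rename(Foo f) {
        f.name = "renamed";
    }

    // Changes only the method's own copy of the reference.
    static void reassign(Foo f) {
        f = new Foo("replacement");
    }

    // Changes the reference stored inside the shared holder.
    static void replace(FooHolder holder) {
        holder.foo = new Foo("replacement");
    }

    public static void main(String[] args) {
        Foo myFoo = new Foo("original");
        reassign(myFoo);
        System.out.println(myFoo.name);   // the caller's reference is untouched
        rename(myFoo);
        System.out.println(myFoo.name);   // the object's state has changed
    }
}
```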

Reference types always point to objects, and objects are always defined by classes. However, two special kinds of reference types, arrays and interfaces, specify the type of object they point to in a slightly different way.

Arrays in Java have a special place in the type system. They are a special kind of object automatically created to hold a collection of some other type of object, known as the base type. Declaring an array type reference implicitly creates the new class type, as you'll see in the next chapter.
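A quick way to convince yourself that arrays are objects with their own class types is to ask them at runtime. In this sketch, ArraySketch and its two helper methods are made-up names; the Class methods they call are part of the standard API.

```java
class ArraySketch {
    // An array can be handled through a plain Object reference.
    static boolean isArray(Object o) {
        return o.getClass().isArray();
    }

    // Reports the base type of an array.
    static Class<?> baseType(Object array) {
        return array.getClass().getComponentType();
    }

    public static void main(String[] args) {
        int[] counts = new int[3];
        String[] names = new String[2];
        System.out.println(isArray(counts));
        System.out.println(baseType(counts));
        System.out.println(baseType(names));
    }
}
```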

Interfaces are a bit sneakier. An interface defines a set of methods and gives it a corresponding type. Any object whose class declares that it implements the interface, and supplies all of the interface's methods, can be treated as an object of that type. Variables and method arguments can be declared to be of interface types, just like class types, and any object that implements the interface can be assigned to them. This allows Java to cross the lines of the class hierarchy and make objects that effectively have many types. We'll cover interfaces in the next chapter as well.
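As a rough preview, the sketch below shows two otherwise unrelated classes sharing an interface type. Speaker, Dog, Robot, and InterfaceSketch are all hypothetical names invented for the example.

```java
interface Speaker {
    String speak();
}

// Two classes with no common parent other than Object.
class Dog implements Speaker {
    public String speak() {
        return "Woof";
    }
}

class Robot implements Speaker {
    public String speak() {
        return "Beep";
    }
}

class InterfaceSketch {
    // Works with any objects of type Speaker, whatever their classes.
    static String chorus(Speaker[] speakers) {
        StringBuilder sb = new StringBuilder();
        for (Speaker s : speakers) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(s.speak());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Speaker[] speakers = { new Dog(), new Robot() };
        System.out.println(chorus(speakers));
    }
}
```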

Finally, we should mention again that Java 5.0 made a major new addition to the language. Generic types or parameterized types, as they are called, are an extension of the Java class syntax that allows for additional abstraction in the way classes work with other Java types. Generics allow for specialization of classes by the user without changing any of the original class's code. We cover generics in detail in Chapter 8.

4.3.3. A Word About Strings

Strings in Java are objects; they are therefore a reference type. String objects do, however, have some special help from the Java compiler that makes them look more like primitive types. Literal string values in Java source code are turned into String objects by the compiler. They can be used directly, passed as arguments to methods, or assigned to String type variables:

     System.out.println( "Hello, World..." );
     String s = "I am the walrus...";
     String t = "John said: \"I am the walrus...\"";

The + symbol in Java is overloaded to provide string concatenation as well as numeric addition. Along with its sister +=, this is the only overloaded operator in Java:

     String quote = "Four score and " + "seven years ago,";
     String more = quote + " our" + " fathers" + " brought...";

Java builds a single String object from the concatenated strings and provides it as the result of the expression. We discuss the String class and all things text-related in great detail in Chapter 10.
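Because + serves two purposes, it's worth seeing how it behaves when strings and numbers are mixed. This sketch uses the hypothetical class StringSketch; the expressions are evaluated from left to right, so parentheses decide whether the numbers are added or concatenated.

```java
class StringSketch {
    // Builds one string using + and its sister +=.
    static String build() {
        String quote = "Four score and " + "seven years ago,";
        quote += " our fathers";
        return quote;
    }

    // Left to right: the string absorbs 1, and then 2.
    static String mixed() {
        return "Total: " + 1 + 2;
    }

    // The parentheses force numeric addition first.
    static String grouped() {
        return "Total: " + (1 + 2);
    }

    public static void main(String[] args) {
        System.out.println(build());
        System.out.println(mixed());
        System.out.println(grouped());
    }
}
```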