4.3 Identifiers and Variables

Many programmers confuse identifiers and variables. This is probably because many programming languages, particularly as they are used in introductory programming classes, make no practical distinction between the two. Even introductory textbooks that do distinguish identifiers from variables do not draw the distinction clearly enough for it to be applied in practice to objects in Java.

A first step to understanding Java, then, is to understand what identifiers and variables are. An identifier is the name, with its associated data type, that identifies a variable in a program source file. Except in interpreted languages, an identifier generally exists only while the program is being compiled. A variable is the actual instantiation of the data at runtime, that is, the memory that stores the data values being manipulated by the program. These are two very different things, and failing to understand the distinction hinders a programmer's ability to learn Java, particularly the object model as it is used in Java.

The purpose of this chapter is to clarify the difference between a variable and an identifier. The discussion begins by defining data type, which determines the attributes (data values) and behavior (operations) of both identifiers and variables:

Data type: A set of data values and operations on those data values.

An integer is a data type because it has a set of values (integer values, stored in two's complement representation) and operations such as "+" and "-". An object's class is also a data type because it has a set of data values, represented by the values of the object's instance variables, and a set of operations, which are the object's instance methods.

Identifiers and variables are both realizations of a data type. The difference is that an identifier is the reference to the data type that appears in the source code and is maintained in a symbol table by the compiler, whereas the variable is the realization of that data type in memory when the program runs. The following definitions capture the distinction:

Identifier: The name of the variable that is in the source code for the program.

Variable: The actual memory that is allocated at runtime.

Many programmers conflate these two very different concepts and call both of them variables. Again, this confusion arises because many languages treat variables and identifiers as the same concept. For example, consider Program4.1 (Exhibit 1).^[1] In this program, the identifier nt and the variable created for it at runtime have the same data type, int. When the compiler generates the object code to be executed, it knows the data type of the identifier and generates machine code that performs integer addition, correctly incrementing the variable. The data type of the variable is known only at compile time, when the code that manipulates it is generated; at runtime the variable is simply memory. But because the generated code works correctly for an int, treating the identifier and the variable as having the same data type appears to make no difference. Indeed, for simple programs such as this one, the difference between the identifier and the variable is irrelevant.

Exhibit 1: Program4.1: Simple C Program That Treats the Data Types of a Variable and Its Identifier As If They Were the Same

    #include <stdio.h>

    int main(void) {
        int nt;
        nt = 1;
        nt = nt + 1;
        printf("nt = %d\n", nt);
        return 0;
    }

The distinction between an identifier and a variable, however, becomes important quickly, even in simple C programs such as Program4.2 (Exhibit 2). Here, the union lets the program refer to the same variable (the same memory) through two identifiers of different data types, intVar and floatVar. The compiler generates integer machine code to store 1 in the variable through the identifier intVar, and then generates floating-point machine code to increment the variable through the identifier floatVar. Because the bit pattern of the integer 1 means something entirely different when it is interpreted as a float, the increment operates on the wrong value and the program produces a wrong answer. This problem is caused by the data type of the variable (int) not matching the data type of the identifier used to manipulate it (float), and it shows why it is important to always make sure the data types match.

Exhibit 2: Program4.2: Operation on an Incorrect Data Type in C

    #include <stdio.h>

    int main(void) {
        typedef union {
            int intVar;
            float floatVar;
        } NumType;
        NumType nt;
        nt.intVar = 1;
        nt.floatVar = nt.floatVar + 1;
        printf("nt.floatVar = %f\n", nt.floatVar);
        return 0;
    }

The problem of an identifier having a different type than the variable it represents has been carried forward to instances of structures and classes (objects) in C++. In C++, the data type of an object is, for the most part, maintained only at compile time. At runtime, the object is simply a block of memory, often in the heap, and most of its identifying information is lost. Because the only type information available is what the compiler knew, the compiler must generate all of the code that manipulates the object. Thus, the data type that the programmer declared for the object's identifier is what determines how the object variable is manipulated at runtime. Apart from the limited run-time type information that C++ keeps for polymorphic classes (used by dynamic_cast and typeid), nothing in the language or its execution model checks that the compile-time and runtime data types match.

Because C and C++ allow many other unsafe referencing mechanisms, such as pointers, and also allow the programmer to attach a compile-time identifier of nearly any data type to any object, it is the programmer's responsibility to ensure that the data type used to manipulate a variable is correct. The language gives the programmer no way to detect a type mismatch between an identifier and a variable, so many C/C++ programmers simply lose the distinction and refer to both identifiers and variables as "variables." This makes referencing memory directly unsafe, and these languages offer no good way to fix the problem.

The problem of invalid memory accesses is one of the main reasons why constructs such as unions and pointers are not allowed in Java. Java recognizes that the compile-time and runtime data types can diverge, and it implements two mechanisms to ensure that the types of compile-time identifiers and runtime variables match. The first mechanism ensures that primitives are always of the correct type at runtime and is explained in Section 4.4. The second uses a different model for storing runtime objects, one in which the data type of an object is maintained throughout the object's life, at both compile time and runtime, so that the correct type is always used (see Section 4.5).

^[1] Program4.1 and Program4.2 are written in C.
