|
|
< Day Day Up > |
|
Java deals with the possibility of the identifiers and variables having different types by using two different mechanisms, one for primitive data types and one for objects. Because these two mechanisms are very different, they are addressed in separate sections.
Primitive data types are built directly into the Java virtual machine (JVM). For example, an int is a primitive, but an Integer is an object that wraps an int primitive. You can always tell the primitives in the Java language because they start with a
Because the Java language has no union constructs or pointers,[2] the data type of the variable referenced by the identifiers for primitives cannot be changed in Java. When a variable of a different data type is used in an assignment statement for a primitive, it always involves copying the variable and converting the copy to the correct type. For example, when setting an int variable to a float variable, the float variable is
[2] This statement is often misinterpreted to mean that Java has no pointers. Pointers are always used when running a program. The point here is that the Java language does not allow a programmer to access a pointer directly. This ensures that a programmer cannot make a reference to an invalid type when accessing a variable.
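The copy-and-convert behavior described above can be seen in a few lines. The following is a minimal sketch, not taken from the original text; the class name PrimitiveConversion and the method toInt are hypothetical names chosen for illustration. It assumes only standard Java conversion rules: a narrowing conversion (float to int) needs an explicit cast, a widening conversion (int to double) does not, and in both cases the source variable is left unchanged.

```java
class PrimitiveConversion {
    // Narrowing conversion: the caller's float is copied, and the copy is
    // converted to an int by discarding the fractional part.
    static int toInt(float value) {
        return (int) value; // explicit cast required by the compiler
    }

    public static void main(String[] args) {
        float f = 3.7f;
        int i = toInt(f);   // i receives a converted copy of f
        double d = i;       // widening conversion: no cast needed

        System.out.println(f); // the original float is unchanged
        System.out.println(i);
        System.out.println(d);
    }
}
```

Because the conversion always works on a copy, the identifier f can never end up referring to data of a different type.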
The mechanism for ensuring that objects are safe in Java recognizes the difference between the identifier in the program source file and the actual runtime variable it represents; therefore, to represent the data types properly, information about the data type for both the identifier (compile time) and the variable (runtime) must be
A number of example programs are now presented to show how Java uses the compile time and runtime types to make sure a program is correct. The first is Program4.3 (Exhibit 3), which illustrates how the data type of the compiler identifier can be different than the data type of the runtime variable. In this example, the program creates an identifier of type Object, named o1, that references a variable of type Person. In the program, a call is made to the method o1.getName. Even though the runtime object has a type of Person, the compiler only
Exhibit 3: Program4.3: Compiler Error in Referencing an Object of Data Type Object with a Variable Data Type of Person
|
|
class Person {
    String name;

    public Person(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

public class ReferenceError {
    public static void main(String args[]) {
        Object o1 = new Person("Chuck");
        o1.getName(); // compile-time error: Object declares no getName()
    }
}
|
|
Program4.4 (Exhibit 4) illustrates that Java has a compile time data type and a runtime data type, and that information can be used to ensure that the type assumed at compile time is the type that actually exists at runtime. In this example, two identifiers, o1 and o2, with an identifier data type of Object are created. The first references a Person variable, and the second a Car variable. The compiler only knows that the identifiers are of data type Object, but the runtime maintains their types. When the program is run, it attempts to cast both
Exhibit 4: Program4.4: Runtime Error Casting a Variable to a Non-Matching Data Type
|
|
class Person { }

class Car { }

public class RuntimeError {
    public static void main(String args[]) {
        Object o1 = new Person();
        Object o2 = new Car();
        Person p1 = (Person) o1; // succeeds: o1 refers to a Person
        Person p2 = (Person) o2; // throws ClassCastException at runtime
    }
}
|
|
The existence of a runtime data tag is an important feature in Java. While the use of runtime data tags does not remove the possibility of data type mismatch between the identifiers and the variables (as was achieved with primitives), it does make any mismatch explicit, which is useful to the programmer, compiler, and the JVM. Having the runtime data type tags offers at least four big advantages over systems that do not use them:
Being able to verify that the runtime data types match the data types expected when the program was compiled produces safer systems.
Knowing the runtime data types allows the JVM to manage the memory, which is less error prone and easier to implement than having the programmer do it.
Knowing the runtime data types allows the objects to be effectively written to other systems or persistent storage (such as a disk or database).
Standard classes, called collection classes, can be developed to safely store and retrieve objects.
The first three of these points are covered in the
One of the main advantages of using runtime data type tags is that doing so makes programs safer for two reasons: First, the data type of the variable can be checked when the variable is being used to be sure that it matches the type that the compiler expected. If the type does not match, the JVM will immediately throw an error at the point where the mismatch occurs, as shown in Exhibit 4 (Program4.4). Languages that do not have this runtime check often allow a program to access data in memory that is outside of the variable currently being operated on. When this happens, often the program does not immediately fail, as the reference itself is not invalid, but what the reference does to memory that it does not own is invalid. Most often this results in corrupting memory used by some other variable or method in the program. The program continues to execute until a statement is reached that is affected by the corrupted memory, and the program fails, often with a
This problem is actually
The second reason why runtime data type tags improve the safety of a program is that they allow the programmer to control the behavior of the program when a mismatched data type is encountered. This can be done in two ways: First, a programmer can control for casting errors by catching them if they occur and acting on them in a manner consistent with the error-handling strategy in the program. Exception handling in Java is the topic of Chapter 6, but a simple try/catch block that catches ClassCastException is shown in Exhibit 5 (Program4.5). This program shows that when an invalid cast is made the programmer can control how the program reacts to the problem. Even if recovery from the error is not possible, the program can gracefully handle any necessary cleanup of data, as well as generate
Exhibit 5: Program4.5: Catching a ClassCastException
|
|
class Person { }

class Car { }

public class CatchCast {
    public static void main(String args[]) {
        try {
            Object o1 = new Person();
            Car c1 = (Car) o1;
        } catch (ClassCastException cce) {
            System.out.println("The program produced a class cast exception.");
            System.out.println("Please call your software representative.");
        }
    }
}
|
|
Exhibit 6: Program4.6: Using the instanceof Operator to Check the Runtime Data Type before Casting
|
|
class Person { }

class Car { }

public class CorrectCast {
    public static void main(String args[]) {
        Object object;

        // Randomly choose to create a Car or Person object.
        int flag = (int) (2.0 * Math.random());
        if (flag == 0)
            object = new Car();
        else
            object = new Person();

        // Cast to the correct object.
        if (object instanceof Person) {
            Person person = (Person) object;
        } else if (object instanceof Car) {
            Car car = (Car) object;
        }
    }
}
|
|
Because the use of runtime data type tags forces a Java programmer to cast to a correct type, the Java compiler also helps ensure that the use of objects is correct by forcing the programmer to consciously define the expected result when a potentially unsafe cast is to be made. To understand what type of cast is
Exhibit 7: Program4.7: Implicit and Explicit Casting in Java
|
|
class Person {
    String Name;
}

class Car {
    int EngineSize;
}

public class CastError1 {
    public static void main(String args[]) {
        Person person = new Person();
        Object object = person;   // implicit cast: always safe
        Car car = (Car) object;   // explicit cast: compiles, but fails at runtime
    }
}
|
|
Finally, the Java compiler will check to make sure that a cast is correct, if it can. For example, Java will not compile a cast that cannot be correct, as shown in Program4.8 (Exhibit 8). Here, there is no way a valid cast can be made from a Car object to a Person object, and so the compiler simply does not allow the cast to be written.
Exhibit 8: Program4.8: Casting Error Example
|
|
class Person {
    String Name;
}

class Car {
    int EngineSize;
}

public class CastError2 {
    public static void main(String args[]) {
        Person person = new Person();
        Car car = (Car) person; // compile-time error: a Person can never be a Car
    }
}
|
|
Programmers coming from other languages often complain about the amount of casting that must be done in a Java program; however, a Java program that compiles will generally run or produce an understandable error. Often, in other languages, casts are made that the programmer has not
A second advantage of using runtime data type tags is that memory can be managed by the JVM and is not the responsibility of the programmer. Because Java keeps the runtime data type tags and class definitions during runtime, the JVM knows how the objects are
A simple linked-list example is used here to
First, consider a memory leak. A memory leak occurs when memory is allocated in a program and not returned to the system after it is no longer being used. This memory is no longer used but is inaccessible, so the amount of available memory slowly leaks away. This situation is
Exhibit 9: Memory Leak with a Linked List
|
|
|
|
The second problem that occurs when programmers must handle memory management is dangling pointers. Dangling pointers occur when a variable is deallocated while it is still referenced. To see how this can happen, consider the case of an object that is stored in two separate linked lists, as in Exhibit 10(a). This could happen if
So what should programmers do? If they do not free the memory for the employees when they delete the department, they have a memory leak; however, if they do free the memory, the second list is invalid because one employee pointer is a dangling reference. This problem arises because it is not possible in most languages to easily figure out if an object is still being used, so it is a complex task for a programmer to handle this situation correctly. Some scheme such as reference counting must be implemented, but this, too, is fraught with difficulties. It is hoped that this discussion has convinced the reader that correctly allocating and deallocating memory is not an easy task even for a programmer who is properly motivated to do it carefully.
This type of problem cannot happen in Java. In Java, information about the data types of all objects is maintained at runtime. This information can be used to reconstruct all the variables that are part of the objects. For example, the JVM knows if a data value stored in an object is a primitive (e.g., an int) or a reference to another object. Knowing this, Java can build a list of all objects that are currently active. Having this information, Java can safely deallocate objects that are no longer used, a process called garbage collection, which provides the memory management for the programmer.
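The two-list situation can be sketched in Java to show why no dangling reference arises. This is an illustrative sketch only; the Employee class, the list names, and the method nameAfterDroppingDepartment are hypothetical and do not come from the exhibits. It assumes nothing beyond the standard java.util collections. Note that System.gc() is merely a request to the JVM, so the sketch does not depend on a collection actually taking place.

```java
import java.util.LinkedList;
import java.util.List;

class Employee {
    final String name;

    Employee(String name) {
        this.name = name;
    }
}

class SharedReferenceDemo {
    // Stores one Employee in two lists, discards the first list, and then
    // reads the Employee back through the second list.
    static String nameAfterDroppingDepartment() {
        Employee cindy = new Employee("Cindy");
        List<Employee> department = new LinkedList<>();
        List<Employee> project = new LinkedList<>();
        department.add(cindy);
        project.add(cindy);

        // "Delete" the department. The programmer frees nothing explicitly.
        department.clear();
        department = null;
        cindy = null;
        System.gc(); // a hint only; the JVM decides when to collect

        // The Employee is still reachable through project, so it cannot
        // have been reclaimed, and this access is always valid.
        return project.get(0).name;
    }

    public static void main(String[] args) {
        System.out.println(nameAfterDroppingDepartment());
    }
}
```

Once the project list is also dropped, the Employee becomes unreachable and is eligible for collection, so neither a leak nor a dangling reference can result.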
Exhibit 10: Dangling Reference with a Linked List
|
|
|
|
Exhibit 11 provides a brief example of how a JVM can handle garbage collection. When an object goes out of scope in a program, the JVM does not immediately attempt to
Exhibit 11: Proper Memory Collection with a Doubly Linked List
|
|
|
|
Maintaining information about the data type at runtime has another advantage. Because the JVM has access to both the data for the object and its definition, it can use that information to construct a representation of the object that can then be exported and used outside of the current program. This representation can easily be written to a file or database or even sent across the network to another computer. Because the objects have been written to a permanent, or persistent, data store, they are called persistent objects.
In order to use these external data formats, the data in the program must first be serializable. Serializable objects can be made persistent in two ways. The first is to use a Java-specific internal format called an object stream; the second is by using a
A serializable object in Java is an object that implements the java.io.Serializable interface. This interface has no methods, so one might ask what the purpose of such an interface is. The Serializable interface is simply a flag (called a tagged interface, more commonly known as a marker interface) for the class that the JVM can check to see if the class can be used with an object stream or XML object. This flag is needed because not all objects have a representation that makes sense outside of the context of the current program. For example, a FileOutputStream object represents a file that is currently open in a program. If this object is written to a persistent store and used later in another program, the reference to the file would not be
Another example is a Graphics object. A Graphics object is an object created by a Frame that allows a programmer to draw on the current window. If the Frame that created the Graphics object no longer exists, that Graphics object is not valid, so the Graphics object cannot be made persistent. FileOutputStream and Graphics are two examples of objects that are not serializable. Most objects that a programmer creates in Java contain only data comprised of serializable objects or primitives; therefore, most objects that are created can implement the Serializable interface.
The JVM checks the objects being written to see if they implement the Serializable interface. If an attempt is ever made to write a non-serializable object to an output that expects serialized objects, such as an object stream or an XMLEncoder, a runtime error is generated.
Another option for creating serializable objects arises when not all of the data fields have valid external representations. Programmers can choose to write their own serialization for such an object by making the object implement java.io.Externalizable, or the fields that are not serializable can be marked with the Java keyword transient, which indicates that the data field should not be written to a stream.
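The rules for object streams described above can be illustrated with a short sketch. The classes Account, Handle, and SerializationDemo are hypothetical and are not part of the original exhibits. The sketch assumes an in-memory byte array as the destination rather than a file, and it covers only object streams (java.io.ObjectOutputStream and java.io.ObjectInputStream), not the XML encoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical serializable class with one transient field.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;
    String owner;
    transient String sessionToken; // not written to the stream

    Account(String owner, String sessionToken) {
        this.owner = owner;
        this.sessionToken = sessionToken;
    }
}

// Hypothetical class that does not implement Serializable.
class Handle { }

class SerializationDemo {
    // Serializes an object into a byte array using an object stream.
    static byte[] write(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Writes an object and immediately reads a copy of it back.
    static Object roundTrip(Object o) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(write(o)))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the object stream refuses to write the object.
    static boolean isRejected(Object o) {
        try {
            write(o);
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = (Account) roundTrip(new Account("Chuck", "secret"));
        System.out.println(copy.owner);               // Chuck
        System.out.println(copy.sessionToken);        // null
        System.out.println(isRejected(new Handle())); // true
    }
}
```

The transient field comes back as null because it was never written, and the attempt to write a Handle fails at runtime rather than at compile time, since writeObject accepts any Object.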
Once it is known that the data in the object has an external representation, the JVM must be able to transform that data into a format that allows it to be written to a file or other persistent store. To understand how this is done, you must first know how the data is stored in memory when the program is running. Consider Exhibit 12, which represents the heap memory used by a Vector object containing two Person objects, as shown in Exhibit 13 (Program4.9). Note that the heap memory to store this element is not organized but
Exhibit 12: Vector Object for Exhibit 13 (Program4.9) as Stored in Memory
|
|
|
|
Exhibit 13: Program4.9: Program to Create a Vector Object
|
|
import java.util.Vector;

class Person {
    String name;

    public Person(String name) {
        this.name = name;
    }
}

public class VectorMemory {
    public static void main(String args[]) {
        Vector People = new Vector();
        Person Chuck = new Person("Chuck");
        People.addElement(Chuck);
        People.addElement(new Person("Cindy"));
        People.addElement(Chuck); // second reference to the same object
    }
}
|
|
To write this data to an external source, the data represented in this vector object must be "flattened" or serialized into a series of bytes. All references to objects must be made in this flattened representation. To do this, the object doing the serialization (e.g., an object stream) must have knowledge of all the objects stored as part of the aggregate stored object. Fortunately, in Java this knowledge is obtained through the runtime data type tag. Each data value making up the object can be matched to its definition and then each referenced object to its definition until only primitive data remains.
While serialization in Java is achievable, it is not trivial, as shown in Exhibit 12, which has two references to a single object that has the name "Chuck" stored. If this object is simply written to the external stream twice, when the vector is reconstructed later it would not be the same vector, as it would now reference two different objects. Serialization must take this into account. It is also possible for objects to reference each other, in effect creating a "loop" in the object graph. To see how serialization does indeed take this into account, an example of the serialized XML format of the Vector object in Exhibit 13 (Program4.9) is written in Exhibit 14 (Program4.10). The XML representation is shown in Exhibit 15. Note that the XML representation makes it clear that only one "Chuck" object is referenced twice.
Exhibit 14: Program4.10: Program to Write XML Definition of Exhibit 15
|
|
import java.util.Vector;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.beans.XMLEncoder;
import java.io.Serializable;

public class Person implements Serializable {
    private String name;

    public Person() {
    }

    public Person(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public static void main(String args[]) {
        try {
            Vector People = new Vector();
            Person chuck = new Person("Chuck");
            People.addElement(chuck);
            People.addElement(new Person("Sam"));
            People.addElement(chuck);
            XMLEncoder e = new XMLEncoder(
                new BufferedOutputStream(
                    new FileOutputStream("out.xml")));
            e.writeObject(People);
            e.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
|
|
Exhibit 15: XML Output for Vector in Exhibit 13 (Program4.9)
|
|
<?xml version="1.0" encoding="UTF-8"?>
<java version="1.4.0" class="java.beans.XMLDecoder">
  <object class="java.util.Vector">
    <void method="add">
      <object id="Person0" class="Person">
        <void property="name">
          <string>Chuck</string>
        </void>
      </object>
    </void>
    <void method="add">
      <object class="Person">
        <void property="name">
          <string>Sam</string>
        </void>
      </object>
    </void>
    <void method="add">
      <object idref="Person0"/>
    </void>
  </object>
</java>
|
|
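The same preservation of shared references can be checked with an object stream instead of XML. The sketch below is an assumption-laden illustration rather than one of the book's programs: the Contact class and the SharedGraphDemo class are hypothetical, and the data is written to an in-memory byte array. It writes a Vector that holds the same object twice, reads the Vector back, and tests whether the first and third elements are still one and the same object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Vector;

// Hypothetical serializable element type.
class Contact implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;

    Contact(String name) {
        this.name = name;
    }
}

class SharedGraphDemo {
    // Serializes the vector to bytes and reconstructs a copy from them.
    @SuppressWarnings("unchecked")
    static Vector<Contact> roundTrip(Vector<Contact> people) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(people);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Vector<Contact>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the reconstructed vector still shares one "Chuck".
    static boolean sharingPreserved() {
        Contact chuck = new Contact("Chuck");
        Vector<Contact> people = new Vector<>();
        people.add(chuck);
        people.add(new Contact("Sam"));
        people.add(chuck); // second reference to the same object

        Vector<Contact> copy = roundTrip(people);
        return copy.get(0) == copy.get(2)   // same object, not two copies
            && copy.get(0) != copy.get(1);  // distinct from "Sam"
    }

    public static void main(String[] args) {
        System.out.println(sharingPreserved()); // true
    }
}
```

The reconstructed objects are new copies, distinct from the originals, but the structure of references among them matches the structure that was written.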
Once the data is serialized, it can be treated as any other stream of data and written externally from the program. For example, Exhibit 14 (Program4.10)
Serializable data is not intended only to be written to persistent output such as files and databases. Because the data has a representation external to the program, it can be used on computers other than the one on which it was created. Thus, serializable data is the basis for passing data in a number of distributed programming schemes, such as RMI (Chapter 13) or other protocols, such as the simple object access protocol (SOAP), or even as simple objects using sockets.
If runtime data type tags are so useful, why are they not used in languages such as C/C++? One reason is that they were not used when such languages were defined, and the languages that did not include them cannot be easily retrofitted to take advantage of them. Also, one can argue that runtime data type tags do not come without a price and that the overhead involved in using them is too costly in terms of memory and computing time. However, two points must be considered when discussing these tags. The first is whether the amount of program safety achieved with the runtime data type tags is worth the cost to implement them. The answer for most managers would be yes. One successful hack attack using a simple buffer overflow from the Web is likely to be far more costly than the extra hardware required for
The second point is whether or not using runtime data type tags is actually more expensive than not using them. This question is
[3] Note that instanceof is an operator, like "+," "-," etc. It is not a method; hence, the somewhat strange syntax.