4.6 Objects


The mechanism for ensuring that objects are safe in Java recognizes the difference between the identifier in the program source file and the actual runtime variable it represents; therefore, to represent the data types properly, information about the data type for both the identifier (compile time) and the variable (runtime) must be maintained. To accomplish this, in Java every object carries with it a runtime data type tag. This is simply a data item, hidden from the programmer, that contains information about the actual data type of the variable. This runtime data type tag keeps information about not only the current type of the object but also any interface or extends clause that applies to this variable.

A number of example programs are now presented to show how Java uses the compile time and runtime types to make sure a program is correct. The first is Program4.3 (Exhibit 3), which illustrates how the data type of the compiler identifier can be different than the data type of the runtime variable. In this example, the program creates an identifier of type Object, named o1, that references a variable of type Person. In the program, a call is made to the method o1.getName. Even though the runtime object has a type of Person, the compiler only knows it as an object; hence, the call to the method fails to compile.

Exhibit 3: Program4.3: Compiler Error in Referencing an Object of Data Type Object with a Variable Data Type of Person

class Person {
  String name;
  public Person(String name) {
    this.name = name;
  }
  public String getName() {
    return name;
  }
}

public class ReferenceError {
  public static void main(String args[]) {
    Object o1 = new Person("Chuck");
    o1.getName();   // compile error: the type Object has no getName() method
  }
}

Program4.4 (Exhibit 4) illustrates that Java has a compile time data type and a runtime data type, and that information can be used to ensure that the type assumed at compile time is the type that actually exists at runtime. In this example, two identifiers, o1 and o2, with an identifier data type of Object are created. The first references a Person variable, and the second a Car variable. The compiler only knows that the identifiers are of data type Object, but the runtime maintains their types. When the program is run, it attempts to cast both variables to a Person. This works in the first case, because the variable object is actually a Person. It fails in the second case, because the variable is not a Person, but a Car. When the generated machine code to do the cast to a Person is executed, it realizes that the type is incorrect, and the program throws a runtime ClassCastException. Note that the error even prints out the true data type of the variable, in this case a Car. This shows that a data type is maintained at both runtime and compile time.

Exhibit 4: Program4.4: Runtime Error Casting a Variable to a Non-Matching Data Type

class Person { }
class Car { }

public class RuntimeError {
  public static void main(String args[]) {
    Object o1 = new Person();
    Object o2 = new Car();
    Person p1 = (Person) o1;   // succeeds: o1 refers to a Person
    Person p2 = (Person) o2;   // throws ClassCastException: o2 refers to a Car
  }
}
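The runtime data type tag is hidden from the programmer, but its effect can be observed from source code. The following is a minimal sketch, not part of the original exhibits: it reuses the empty Person and Car classes of Exhibit 4 and adds a hypothetical class, RuntimeTagDemo, with two illustrative helper methods. It uses the standard Object.getClass() method to ask an object for its runtime class, even though every identifier involved has the compile time type Object.

```java
// RuntimeTagDemo and its helper methods are hypothetical names used for illustration.
class RuntimeTagDemo {
    // Returns the simple name of the runtime class behind an identifier of type Object.
    static String runtimeTypeOf(Object o) {
        return o.getClass().getSimpleName();
    }

    // Attempts the cast from Exhibit 4 and reports whether the JVM accepted it.
    static boolean castsToPerson(Object o) {
        try {
            Person p = (Person) o;   // checked against the runtime type when executed
            return true;
        } catch (ClassCastException cce) {
            return false;
        }
    }

    public static void main(String[] args) {
        Object o1 = new Person();
        Object o2 = new Car();
        System.out.println(runtimeTypeOf(o1));   // prints Person
        System.out.println(runtimeTypeOf(o2));   // prints Car
        System.out.println(castsToPerson(o1));   // prints true
        System.out.println(castsToPerson(o2));   // prints false
    }
}

class Person { }
class Car { }
```

The compiler sees only Object in both helper methods; the different answers come entirely from information kept with the objects at runtime.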

The existence of a runtime data tag is an important feature in Java. While the use of runtime data tags does not remove the possibility of data type mismatch between the identifiers and the variables (as was achieved with primitives), it does make any mismatch explicit, which is useful to the programmer, compiler, and the JVM. Having the runtime data type tags offers at least four big advantages over systems that do not use them:

1. Being able to verify that the runtime data types match the data types expected when the program was compiled produces safer systems.
2. Knowing the runtime data types allows the VM to manage the memory, which is less error prone and easier than having the programmer do it.
3. Knowing the runtime data types allows the objects to be effectively written to other systems or persistent storage (such as a disk or database).
4. Standard classes, called collection classes, can be developed to safely store and retrieve objects.

The first three of these points are covered in the next three sections; point 4 is the subject of Section 4.7.

4.6.1 Using Runtime Data Type Tags Results in Safer Programs

One of the main advantages of using runtime data type tags is that doing so makes programs safer for two reasons: First, the data type of the variable can be checked when the variable is being used to be sure that it matches the type that the compiler expected. If the type does not match, the JVM will immediately throw an error at the point where the mismatch occurs, as shown in Exhibit 4 (Program4.4). Languages that do not have this runtime check often allow a program to access data in memory that is outside of the variable currently being operated on. When this happens, often the program does not immediately fail, as the reference itself is not invalid, but what the reference does to memory that it does not own is invalid. Most often this results in corrupting memory used by some other variable or method in the program. The program continues to execute until a statement is reached that is affected by the corrupted memory, and the program fails, often with a strange and cryptic message. The statement that fails is normally perfectly valid, and it is only the existence of the corrupted memory that causes it to fail. In fact, the statement that fails normally has no logical connection, from the point of view of the source program, to the statement where the program memory was corrupted.

This problem is actually worse than a program simply failing. If the memory that was corrupted is persistent (i.e., stored in a file or database), then the failure of the program could occur days or weeks after the actual execution of the statement that caused it. This makes these types of bugs exceedingly difficult to find and fix and can result in very far-reaching effects if the errors are allowed to exist over time. Having the program fail when an invalid casting error is encountered does not guarantee that the error will not happen, but it does protect against the knock-on effects of allowing a corrupted program to continue to run. It also makes the bug easier to find and fix because the reason for the error and the exact location of the error are known.

The second reason why runtime data type tags improve the safety of a program is that they allow the programmer to control the behavior of the program when a mismatched data type is encountered. This can be done in two ways: First, a programmer can control for casting errors by catching them if they occur and acting on them in a manner consistent with the error-handling strategy in the program. Exception handling in Java is the topic of Chapter 6, but a simple try/catch block that catches ClassCastException is shown in Exhibit 5 (Program4.5). This program shows that when an invalid cast is made the programmer can control how the program reacts to the problem. Even if recovery from the error is not possible, the program can gracefully handle any necessary cleanup of data, as well as generate user-friendly messages before exiting (not the generally useless "bus error — core dumped" or "segmentation fault" errors produced when C/C++ has corrupted memory). The second way a programmer can handle casting errors is to make sure they do not happen by using the instanceof ^[3] operator. The instanceof operator checks the runtime data type tag of the object to make sure that it matches the type specified in the second operand. It returns true if they match, false otherwise. Note that the instanceof not only checks the class of the object but also can be used to check any superclass or interface that the class for this object extends or implements. Exhibit 6 (Program4.6) shows how the instanceof operator can be used to make sure that a cast is correct. If the object is an instanceof a Person, it is cast to a Person; if it is an instanceof a Car, it is cast to a Car. The cast is guaranteed to be correct as the runtime data type tag is checked before the cast is performed.

Exhibit 5: Program4.5: Catching a ClassCastException

class Person { }
class Car { }

public class CatchCast {
  public static void main(String args[]) {
    try {
      Object o1 = new Person();
      Car c1 = (Car)o1;   // throws ClassCastException: o1 refers to a Person
    } catch (ClassCastException cce) {
      System.out.println("The program produced a class cast exception.");
      System.out.println("Please call your software representative.");
    }
  }
}

Exhibit 6: Program4.6: Using the Instanceof Operator to Check the Runtime Data Type before Casting

class Person { }
class Car { }

public class CorrectCast {
  public static void main(String args[]) {
    Object object;

    //Randomly choose to create a Car or
    //Person object.
    int flag = (int)(2.0 * Math.random());
    if (flag == 0)
      object = new Car();
    else
      object = new Person();

    //Cast to the correct object
    if (object instanceof Person) {
      Person person = (Person)object;
    }
    else if (object instanceof Car) {
      Car car = (Car)object;
    }
  }
}
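Exhibit 6 tests only against the exact class of each object. The text also notes that instanceof can be used to check a superclass or an interface. The sketch below illustrates that point under stated assumptions: the Student subclass and the InstanceofDemo class with its helper methods are hypothetical names introduced only for this example, while Comparable is the standard interface implemented by String.

```java
// InstanceofDemo, Student, and the helper methods are hypothetical names for illustration.
class InstanceofDemo {
    // True if the runtime type of o is Person or any subclass of Person.
    static boolean isPerson(Object o) {
        return o instanceof Person;
    }

    // True if the runtime class of o implements the Comparable interface.
    static boolean isComparable(Object o) {
        return o instanceof Comparable;
    }

    public static void main(String[] args) {
        Object student = new Student();
        System.out.println(isPerson(student));      // prints true: Student extends Person
        System.out.println(isComparable("text"));   // prints true: String implements Comparable
        System.out.println(isPerson("text"));       // prints false: a String is not a Person
        System.out.println(isPerson(null));         // prints false: null is never an instance
    }
}

class Person { }
class Student extends Person { }
```

Because a null reference is never an instance of anything, a cast guarded by instanceof also never has to deal with null.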

Because the use of runtime data type tags forces a Java programmer to cast to a correct type, the Java compiler also helps ensure that the use of objects is correct by forcing the programmer to consciously define the expected result when a potentially unsafe cast is to be made. To understand what type of cast is potentially unsafe, consider the casting done in Program4.7 (Exhibit 7). In this program, an object of class Person is first cast to an object of class Object. Because all Person objects are also instances of Object, this cast is guaranteed to be 100% safe, so the compiler makes an implicit cast to Object. However, a cast made from an object of type Object to a Person or a Car is not always safe. In Exhibit 7 (Program4.7), it is obvious that objects can be instances of classes other than Person as they can also be instances of class Car. The compiler forces the programmer to acknowledge this by writing an explicit cast to the intended class; in Exhibit 7, the cast is to Car. Because the compiler cannot ensure the safety of this cast, it puts the onus of making sure that the cast is legal on the programmer, which the programmer acknowledges by writing the explicit cast. In Exhibit 7 the programmer gets it wrong: the program compiles, but because the object is actually a Person, the cast to Car throws a ClassCastException at runtime.

Exhibit 7: Program4.7: Implicit and Explicit Casting in Java

class Person {
  String Name;
}

class Car {
  int EngineSize;
}

public class CastError1 {
  public static void main(String args[]) {
    Person person = new Person();
    Object object = person;     // implicit cast: always safe
    Car car = (Car)object;      // explicit cast: compiles, but fails at runtime
  }
}

Finally, the Java compiler will check to make sure that a cast is correct, if it can. For example, Java will not compile a cast that cannot be correct, as shown in Program4.8 (Exhibit 8). Here, there is no way a valid cast can be made from a Car object to a Person object, and so the compiler simply does not allow the cast to be written.

Exhibit 8: Program4.8: Casting Error Example

class Person {
  String Name;
}

class Car {
  int EngineSize;
}

public class CastError2 {
  public static void main(String args[]) {
    Person person = new Person();
    Car car = (Car)person;   // compile error: a Person can never be a Car
  }
}

Programmers coming from other languages often complain about the amount of casting that must be done in a Java program; however, a Java program that compiles will generally run or produce an understandable error. Often, in other languages, casts are made that the programmer has not considered carefully, and sometimes the language will make an implicit cast that is not even understood by the programmer. Considering the number of major problems in software caused by improper casting in these languages, it is difficult to understand why these programmers feel so strongly about not enforcing rules that ensure safe casting.

4.6.2 Memory Allocation is Simpler and Safer

A second advantage of using runtime data type tags is that memory can be managed by the JVM and is not the responsibility of the programmer. Because Java keeps the runtime data type tags and class definitions during runtime, the JVM knows how the objects are constructed. Using a relatively straightforward garbage collection algorithm, the JVM can therefore allocate and deallocate memory for objects. In languages that do not have runtime data type tags, the information about the data that makes up the object is lost after the program is compiled. Therefore, the methods to create and recover memory used by an object must be supplied by the programmer. Giving the programmer control over the allocation and deallocation of memory can lead to the problems of dangling pointers and memory leaks. Having the VM do this job removes these two possible problems.

A simple linked-list example is used here to illustrate how a programmer implementing memory management can create both memory leaks and dangling pointers. This same linked-list example is then used to show how Java can simply and safely deal with allocating and deallocating memory. This example also demonstrates that when a programmer implements memory management it can be a daunting task to get it correct.

First, consider a memory leak. A memory leak occurs when memory is allocated in a program and not returned to the system after it is no longer being used. This memory is no longer used but is inaccessible, so the amount of available memory slowly leaks away. This situation is illustrated in Exhibit 9. In Exhibit 9(a), a linked list has been allocated with a number of nodes. The only access to each subsequent node is from a pointer from the previous node. If at some point the head of the list is no longer available (for example, the variable referring to the head of the list goes out of program scope), then all the nodes remain allocated but none can be accessed, as shown in Exhibit 9(b). To solve this problem, the programmer must create a method, often called a destructor, that is called when the variable is destroyed. The purpose of the destructor is to go through the list of nodes in Exhibit 9(b) and call a method that deallocates each node. This scheme poses a number of problems. First, the programmer must properly code the destructor, which is a non-trivial task, as shown later in this section. Second, it is often the responsibility of the programmer who uses the linked list to explicitly call the destructor, and this is sometimes forgotten. Even if good managerial controls are in place to ensure that programmers follow acceptable standards, they are not as effective as rules that are automatically enforced by the compiler and JVM.

Exhibit 9: Memory Leak with a Linked List


The second problem that occurs when programmers must handle memory management is dangling pointers. Dangling pointers occur when a variable is deallocated while it is still referenced. To see how this can happen, consider the case of an object that is stored in two separate linked lists, as in Exhibit 10(a). This could happen if employees for each department in a company are stored in a linked list. If the entire department is sold to another company, the entire linked list for that department would be deallocated. However, suppose an employee is assigned to two departments, one that leaves and one that stays. An employee staying with the company in the second department would now be referenced in the linked list for the second department, but the record of this employee would have been deleted when the first list was deleted to prevent a memory leak. The employee record is deallocated, but it is still referenced, so the reference is called a dangling reference because it points to an object that no longer exists.

So what should programmers do? If they do not free the memory for the employees when they delete the department, they have a memory leak; however, if they do free the memory, the second list is invalid because one employee pointer is a dangling reference. This problem arises because it is not possible in most languages to easily figure out if an object is still being used, so it is a complex task for a programmer to handle this situation correctly. Some scheme such as reference counting must be implemented, but this, too, is fraught with difficulties. It is hoped that this discussion has convinced the reader that correctly allocating and deallocating memory is not an easy task even for a programmer who is properly motivated to do it carefully.

This type of problem cannot happen in Java. In Java, information about the data types of all objects is maintained at runtime. This information can be used to reconstruct all the variables that are part of the objects. For example, the JVM knows if a data value stored in an object is a primitive (e.g., an int) or a reference to another object. Knowing this, Java can build a list of all objects that are currently active. Having this information, Java can safely deallocate objects no longer used, called garbage collection, which provides the memory management for the programmer.
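The two-department scenario can be sketched in Java to show why the dilemma disappears. This is an illustrative example only: the Employee class, the SharedEmployeeDemo class, and the department names are assumptions made for the sketch. One employee is stored in two linked lists, the first list is discarded, and the employee remains valid through the second list because nothing was deallocated by hand.

```java
import java.util.LinkedList;
import java.util.List;

// SharedEmployeeDemo and Employee are hypothetical names used for illustration.
class SharedEmployeeDemo {
    // Stores one employee in two department lists, drops the first list,
    // and then reads the employee back through the second list.
    static String nameAfterDroppingFirstList() {
        Employee shared = new Employee("Pat");
        List<Employee> sales = new LinkedList<>();
        List<Employee> support = new LinkedList<>();
        sales.add(shared);
        support.add(shared);

        shared = null;   // drop the local reference to the employee
        sales = null;    // the sales department is sold: its list becomes unreachable

        // No node or employee was freed explicitly, so the reference held by
        // the support list still points at a live object.
        return support.get(0).name;
    }

    public static void main(String[] args) {
        System.out.println(nameAfterDroppingFirstList());   // prints Pat
    }
}

class Employee {
    final String name;

    Employee(String name) {
        this.name = name;
    }
}
```

The programmer never decides when the employee's memory is released; the object stays allocated for as long as any list still refers to it.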

Exhibit 10: Dangling Reference with a Linked List


Exhibit 11 provides a brief example of how a JVM can handle garbage collection. When an object goes out of scope in a program, the JVM does not immediately attempt to reclaim the space; instead, it allows that space to remain allocated, as shown in Exhibit 11(a). When Java runs low on memory to allocate for new objects, the JVM begins garbage collection. It first marks all objects as candidates to be collected as garbage, as shown in Exhibit 11(b). Next, all objects currently in scope are unmarked, as they are currently referenced. Then, because the JVM can tell which fields of an unmarked object are references to other objects (because it has kept the data type tag and object definition), it can continue by unmarking every object that an unmarked object references. It does this until all objects that are currently referenced are unmarked. Any objects left marked are not currently referenced, so they can be collected as garbage and their memory returned to the system, as shown in Exhibit 11(c). This mechanism ensures that memory leaks and dangling pointers cannot occur.
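The marking scheme described above can be modeled in a few lines of ordinary Java. The following is a toy simulation under stated assumptions, not the algorithm of any actual JVM: HeapObject, MarkSweepSketch, and every method name are hypothetical, and the "heap" is simply a list of objects that record which other objects they reference. Every object starts out marked as a garbage candidate, objects reachable from the in-scope roots are unmarked, and whatever remains marked is garbage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A toy model only: all names here are hypothetical and chosen for illustration.
class MarkSweepSketch {
    // Returns the objects in the heap that cannot be reached from the roots.
    static Set<HeapObject> findGarbage(List<HeapObject> heap, List<HeapObject> roots) {
        // Step 1: tentatively mark every allocated object as a garbage candidate.
        Set<HeapObject> marked = new HashSet<>(heap);

        // Step 2: unmark the in-scope objects, then everything they reference.
        Set<HeapObject> visited = new HashSet<>();
        Deque<HeapObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            HeapObject current = pending.pop();
            if (visited.add(current)) {               // first visit only, so cycles terminate
                marked.remove(current);
                pending.addAll(current.references);
            }
        }

        // Step 3: anything still marked is unreferenced and can be reclaimed.
        return marked;
    }

    // Links two objects in both directions, as in a doubly linked list.
    static void link(HeapObject x, HeapObject y) {
        x.references.add(y);
        y.references.add(x);
    }

    // Builds a reachable list a-b-c and an unreachable pair d-e, then collects.
    static List<String> demoGarbageLabels() {
        HeapObject a = new HeapObject("a");
        HeapObject b = new HeapObject("b");
        HeapObject c = new HeapObject("c");
        HeapObject d = new HeapObject("d");
        HeapObject e = new HeapObject("e");
        link(a, b);
        link(b, c);
        link(d, e);   // d and e reference each other but nothing in scope refers to them

        List<HeapObject> heap = Arrays.asList(a, b, c, d, e);
        List<HeapObject> roots = Collections.singletonList(a);

        List<String> labels = new ArrayList<>();
        for (HeapObject garbage : findGarbage(heap, roots)) {
            labels.add(garbage.label);
        }
        Collections.sort(labels);
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(demoGarbageLabels());   // prints [d, e]
    }
}

class HeapObject {
    final String label;
    final List<HeapObject> references = new ArrayList<>();

    HeapObject(String label) {
        this.label = label;
    }
}
```

Note that d and e refer to each other, so a simple reference count would never reach zero for either of them; tracing from the in-scope objects identifies them as garbage anyway.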

Exhibit 11: Proper Memory Collection with a Doubly Linked List


4.6.3 Serialization

Maintaining information about the data type at runtime has another advantage. Because the JVM has access to both the data for the object and the definition, it can use that information to construct a representation of the object that can then be exported from the program and used outside of the current program. This object then can easily be written to a file or database or even sent across the network to another computer. Because the objects have been written to a permanent, or persistent, data store, these objects are called persistent objects.

In order to use these external data formats, the data in the program must first be serializable. Serializable objects can be made persistent in two ways. The first is to use a Java-specific internal format called an object stream; the second is by using a human-readable, standards-based format called eXtensible Markup Language (XML). The discussion of serialization first covers what is meant by a serializable object and then addresses programs that output an object stream and an XML object.

4.6.3.1 The Serializable Interface

A serializable object in Java is an object that implements the java.io.Serializable interface. This interface has no methods, so one might ask what the purpose of such an interface is. The Serializable interface is simply a flag (called a tagged interface) for the class that the JVM can check to see if the class can be used with an object stream or XML object. This flag is needed because not all objects have a representation that makes sense outside of the context of the current program. For example, a FileOutputStream object represents a file that is currently open in a program. If this object is written to a persistent store and used later in another program, the reference to the file would not be open in that program, so it would not be valid.

Another example is a Graphics object. A Graphics object is an object created by a Frame that allows a programmer to draw on the current window. If the Frame that created the Graphics object no longer exists, that Graphics object is not valid, so the Graphics object cannot be made persistent. FileOutputStream and Graphics are two examples of objects that are not serializable. Most objects that a programmer creates in Java contain only data comprised of serializable objects or primitives; therefore, most objects that are created can implement the Serializable interface.

The JVM checks the objects being written to see if they implement the Serializable interface. If an attempt is ever made to write a non-serializable object to an output that expects serialized objects, such as an object stream or an XMLEncoder, a runtime error is generated.
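For object streams, this check can be seen directly. The sketch below is illustrative only: the Tagged and Untagged classes and the SerializableCheckDemo helper are hypothetical names. It writes one object of each class to a java.io.ObjectOutputStream backed by an in-memory buffer; the object whose class does not implement Serializable is rejected with a java.io.NotSerializableException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// SerializableCheckDemo, Tagged, and Untagged are hypothetical names for illustration.
class SerializableCheckDemo {
    // Returns true if the object stream accepted the object, false if it was rejected
    // because its class does not implement Serializable.
    static boolean canWrite(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            // Any other I/O failure is unexpected for an in-memory buffer.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canWrite(new Tagged()));     // prints true
        System.out.println(canWrite(new Untagged()));   // prints false
    }
}

// Carries the Serializable flag, so it may be written to an object stream.
class Tagged implements Serializable {
    int value = 1;
}

// Identical data, but without the flag.
class Untagged {
    int value = 1;
}
```

The two classes hold identical data; the only difference is the presence of the interface, which is what the stream checks at runtime.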

Another option for creating serializable objects arises when not all of the data fields have valid external representations. Programmers can choose to write their own serialization for such an object by making the object implement java.io.Externalizable, or the fields that are not serializable can be marked with the Java keyword transient, which indicates that the data field should not be written to a stream.
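The effect of the transient keyword can be shown with a round trip through an object stream. This is a minimal sketch under stated assumptions: the Account class, its sessionToken field, and the TransientDemo helper are hypothetical and exist only for this example. The object is written to an in-memory buffer and read back; the ordinary field survives, while the transient field is skipped on output and comes back with its default value of null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// TransientDemo and Account are hypothetical names used for illustration.
class TransientDemo {
    // Serializes the account to a byte array and reconstructs a copy from those bytes.
    static Account roundTrip(Account original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Account copy = roundTrip(new Account("Chuck", "secret-token"));
        System.out.println(copy.owner);          // prints Chuck
        System.out.println(copy.sessionToken);   // prints null
    }
}

class Account implements Serializable {
    String owner;                    // written to the stream
    transient String sessionToken;   // has no meaning outside this program run, so not written

    Account(String owner, String sessionToken) {
        this.owner = owner;
        this.sessionToken = sessionToken;
    }
}
```

A program that reads such an object back must be prepared to rebuild or do without any transient fields.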

4.6.3.2 Serializing Data

Once it is known that the data in the object has an external representation, the JVM must be able to transform that data into a format that allows it to be written to a file or other persistent store. To understand how this is done, you must first know how the data is stored in memory when the program is running. Consider Exhibit 12, which represents the heap memory used by a Vector object containing two Person objects, as shown in Exhibit 13 (Program4.9). Note that the heap memory to store this element is not organized but follows pointers to each of the objects that are stored. This memory cannot simply be copied, as the program memory used when this object is created almost certainly will not be free to take these values when the object is read in.

Exhibit 12: Vector Object for Exhibit 13 (Program4.9) as Stored in Memory


Exhibit 13: Program4.9: Program to Create a Vector Object

import java.util.Vector;

class Person {
  String name;
  public Person(String name) {
    this.name = name;
  }
}

public class VectorMemory {
  public static void main(String args[]) {
    Vector People = new Vector();
    Person Chuck = new Person("Chuck");
    People.addElement(Chuck);
    People.addElement(new Person("Cindy"));
    People.addElement(Chuck);   // second reference to the same Person object
  }
}

To write this data to an external source, the data represented in this vector object must be "flattened" or serialized into a series of bytes. All references to objects must be made in this flattened representation. To do this, the object doing the serialization (e.g., an object stream) must have knowledge of all the objects stored as part of the aggregate stored object. Fortunately, in Java this knowledge is obtained through the runtime data type tag. Each data value making up the object can be matched to its definition and then each referenced object to its definition until only primitive data remains.

While serialization in Java is achievable, it is not trivial, as shown in Exhibit 12, which has two references to a single object that has the name "Chuck" stored. If this object is simply written to the external stream twice, when the vector is reconstructed later it would not be the same vector, as it would now reference two different objects. Serialization must take this into account. It is also possible for objects to reference each other, in effect creating a "loop" in the object graph. To see how serialization does indeed take this into account, an example of the serialized XML format of the Vector object in Exhibit 13 (Program4.9) is written in Exhibit 14 (Program4.10). The XML representation is shown in Exhibit 15. Note that the XML representation makes it clear that only one "Chuck" object is referenced twice.

Exhibit 14: Program4.10: Program to Write XML Definition of Exhibit 15

import java.util.Vector;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.beans.XMLEncoder;
import java.io.Serializable;

public class Person implements Serializable {
  private String name;

  public Person() {
  }

  public Person(String name) {
    this.name = name;
  }

  public String getName() {
    return name;
  }

  public void setName(String name) {
    this.name = name;
  }

  public static void main(String args[]) {
    try {
      Vector People = new Vector();
      Person chuck = new Person("Chuck");
      People.addElement(chuck);
      People.addElement(new Person("Sam"));
      People.addElement(chuck);
      XMLEncoder e = new XMLEncoder(
          new BufferedOutputStream(
              new FileOutputStream("out.xml")));
      e.writeObject(People);
      e.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Exhibit 15: XML Output for Vector in Exhibit 14 (Program4.10)

<?xml version="1.0" encoding="UTF-8"?>
<java version="1.4.0" class="java.beans.XMLDecoder">
  <object class="java.util.Vector">
    <void method="add">
      <object id="Person0" class="Person">
        <void property="name">
          <string>Chuck</string>
        </void>
      </object>
    </void>
    <void method="add">
      <object class="Person">
        <void property="name">
          <string>Sam</string>
        </void>
      </object>
    </void>
    <void method="add">
      <object idref="Person0"/>
    </void>
  </object>
</java>

4.6.3.3 Writing External Data

Once the data is serialized, it can be treated as any other stream of data and written externally from the program. For example, Exhibit 14 (Program4.10) writes a Vector object to a simple file output stream. Note that in this program the entire Vector object is serialized so that it can be written to and read from a file. Because a number of different types of objects are written to the file (in this case, the Vector object and the Person objects), the JVM must maintain the data type tag with the objects in the files so that the objects can be properly reconstructed. This is done using metadata tags in the case of XML and with an internal, binary format with object streams that is more compact than XML but not easily readable by people.
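The binary object stream format handles the doubly referenced "Chuck" object in the same way the XML format does. The sketch below is an illustration under stated assumptions rather than one of the book's programs: SharedReferenceDemo and its helper method are hypothetical names, and an in-memory byte array stands in for a file. It writes the three-element Vector with java.io.ObjectOutputStream, reads it back with java.io.ObjectInputStream, and checks that the first and third elements of the reconstructed Vector are still one and the same object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Vector;

// SharedReferenceDemo is a hypothetical name; Person mirrors the class in Exhibit 14.
class SharedReferenceDemo {
    // Writes a Vector holding the same Person twice, reads it back, and reports
    // whether the copy still holds one shared object rather than two duplicates.
    static boolean sharedReferenceSurvives() {
        Vector<Person> people = new Vector<>();
        Person chuck = new Person("Chuck");
        people.addElement(chuck);
        people.addElement(new Person("Sam"));
        people.addElement(chuck);

        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(people);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                Vector<?> copy = (Vector<?>) in.readObject();
                // Compare identities, not contents: elements 0 and 2 must be the same object.
                return copy.size() == 3
                        && copy.get(0) == copy.get(2)
                        && copy.get(0) != copy.get(1);
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sharedReferenceSurvives());   // prints true
    }
}

class Person implements Serializable {
    private final String name;

    Person(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}
```

The reconstructed objects are new copies, distinct from the originals, but the shape of the object graph, including the shared reference, is the same.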

Serializable data is not intended only to be written to persistent output such as files and databases. Because the data has a representation external to the program, it can be used on computers other than the one on which it was created. Thus, serializable data is the basis for passing data in a number of distributed programming schemes, such as RMI (Chapter 13) or other protocols, such as the simple object access protocol (SOAP), or even as simple objects using sockets.

4.6.4 Performance Considerations

If runtime data type tags are so useful, why are they not used in languages such as C/C++? One reason is that they were not used when such languages were defined, and the languages that did not include them cannot be easily retrofitted to take advantage of them. Also, one can argue that runtime data type tags do not come without a price and that the overhead involved in using them is too costly in terms of memory and computing time. However, two points must be considered when discussing these tags. The first is whether the amount of program safety achieved with the runtime data type tags is worth the cost to implement them. The answer for most managers would be yes. One successful hack attack using a simple buffer overflow from the Web is likely to be far more costly than the extra hardware required for efficiently executing a Java program, and debugging one instance of corrupted memory from a miscast object is much more expensive than the computing power required to implement runtime checking as in Java. These types of problems, which are all too common, are impossible to guard against without the use of runtime data type checking built into the language.

The second point is whether or not using runtime data type tags is actually more expensive than not using them. This question is problematic because, while the tags do incur overhead, the presence of the information they provide allows the JVM to make optimizations of the programs that would otherwise be impossible. Whether or not these optimizations can actually make up for, or even exceed, the overhead necessary to implement and use the tags is still an open question. It may turn out that runtime data tags are indeed free (or nearly so). Regardless, with the current generation of JIT compilers, the extra cost is small, and the benefits normally far outweigh the costs.

^[3]Note that instanceof is an operator, like "+," "-," etc. It is not a method; hence, the somewhat strange syntax.
