Object Streams

	Core Java™ 2: Volume I - Fundamentals By Cay S. Horstmann, Gary Cornell

	Chapter 12. Streams and Files

Using a fixed-length record format is a good choice if you need to store data of the same type. However, objects that you create in an object-oriented program are rarely all of the same type. For example, you may have an array called staff that is nominally an array of Employee records but contains objects that are actually instances of a child class such as Manager.

If we want to save files that contain this kind of information, we must first save the type of each object and then the data that defines the current state of the object. When we read this information back from a file, we must:

Read the object type;
Create a blank object of that type;
Fill it with the data that we stored in the file.
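Here is a minimal sketch of the three steps done by hand. It assumes hypothetical, stripped-down Employee and Manager classes (a name and salary, plus a bonus for managers) and uses the class name, written with writeUTF, as the type tag. It illustrates the idea only and is not the format of the book's first edition.

```java
import java.io.*;

// Hypothetical sketch of the by-hand approach: a type tag, then the fields
class ManualTagDemo
{
   // Stripped-down stand-ins for the Employee and Manager classes
   static class Employee { String name; double salary; }
   static class Manager extends Employee { double bonus; }

   static void write(DataOutputStream out, Employee e) throws IOException
   {
      // save the type of the object first ...
      out.writeUTF(e instanceof Manager ? "Manager" : "Employee");
      // ... then the data that defines its current state
      out.writeUTF(e.name);
      out.writeDouble(e.salary);
      if (e instanceof Manager) out.writeDouble(((Manager) e).bonus);
   }

   static Employee read(DataInputStream in) throws IOException
   {
      // 1. read the object type
      String type = in.readUTF();
      // 2. create a blank object of that type
      Employee e;
      if (type.equals("Manager")) e = new Manager();
      else if (type.equals("Employee")) e = new Employee();
      else throw new IOException("unknown type tag: " + type);
      // 3. fill it with the data stored in the file
      e.name = in.readUTF();
      e.salary = in.readDouble();
      if (e instanceof Manager) ((Manager) e).bonus = in.readDouble();
      return e;
   }

   /** Writes the array to memory and reads it back with the methods above. */
   static Employee[] roundTrip(Employee[] staff) throws IOException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(staff.length);               // number of records
      for (Employee e : staff) write(out, e);
      out.close();

      DataInputStream in = new DataInputStream(
         new ByteArrayInputStream(bytes.toByteArray()));
      Employee[] result = new Employee[in.readInt()];
      for (int i = 0; i < result.length; i++) result[i] = read(in);
      return result;
   }
}
```

Note that both write and read must be edited whenever a new subclass appears. That bookkeeping is exactly what object serialization takes over.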

It is entirely possible (if very tedious) to do this by hand, and in the first edition of this book we did exactly this. However, Sun Microsystems developed a powerful mechanism that allows this to be done with much less effort. As you will soon see, this mechanism, called object serialization, almost completely automates what was previously a very tedious process. (You will see later in this chapter where the term "serialization" comes from.)

Storing Objects of Variable Type

To save object data, you first need to open an ObjectOutputStream object:

 ObjectOutputStream out = new ObjectOutputStream(new
    FileOutputStream("employee.dat"));

Now, to save an object, you simply use the writeObject method of the ObjectOutputStream class as in the following fragment:

 Employee harry = new Employee("Harry Hacker", 50000,
    1989, 10, 1);
 Manager boss = new Manager("Carl Cracker", 80000,
    1987, 12, 15);
 out.writeObject(harry);
 out.writeObject(boss);

To read the objects back in, first get an ObjectInputStream object:

 ObjectInputStream in = new ObjectInputStream(new
    FileInputStream("employee.dat"));

Then, retrieve the objects in the same order in which they were written, using the readObject method.

 Employee e1 = (Employee)in.readObject();
 Employee e2 = (Employee)in.readObject();

When reading back objects, you must carefully keep track of the number of objects that were saved, their order, and their types. Each call to readObject returns the next object as a value of type Object, so you need to cast it to its correct type.

If you don't need the exact type or you don't remember it, then you can cast it to any superclass or even leave it as type Object. For example, e2 is an Employee object variable even though it actually refers to a Manager object. If you need to dynamically query the type of the object, you can use the getClass method that we described in Chapter 5.

The writeObject/readObject methods write and read only objects, not numbers. To write and read numbers, you use methods such as writeInt/readInt or writeDouble/readDouble. (The object stream classes implement the DataInput/DataOutput interfaces.) Of course, numbers inside objects (such as the salary field of an Employee object) are saved and restored automatically. Recall that, in Java, strings and arrays are objects and can, therefore, be saved and restored with the writeObject/readObject methods.
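The following sketch mixes both kinds of calls in one object stream. It writes to an in-memory ByteArrayOutputStream instead of a file, and the class name MixedStreamDemo and the sample values are illustrative only. The values must be read back in the same order, each with the matching method.

```java
import java.io.*;
import java.util.Arrays;

class MixedStreamDemo
{
   /** Writes numbers and objects to one object stream, then reads them back. */
   static Object[] roundTrip() throws IOException, ClassNotFoundException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream out = new ObjectOutputStream(bytes);
      out.writeInt(3);                              // numbers: DataOutput methods
      out.writeDouble(50000.0);
      out.writeObject("Harry Hacker");              // a string is an object
      out.writeObject(new int[] { 1989, 10, 1 });   // so is an array
      out.close();

      ObjectInputStream in = new ObjectInputStream(
         new ByteArrayInputStream(bytes.toByteArray()));
      // same order, matching methods
      int count = in.readInt();
      double salary = in.readDouble();
      String name = (String) in.readObject();
      int[] date = (int[]) in.readObject();
      in.close();
      return new Object[] { count, salary, name, date };
   }

   public static void main(String[] args) throws Exception
   {
      Object[] values = roundTrip();
      System.out.println(values[0] + " " + values[1] + " " + values[2]
         + " " + Arrays.toString((int[]) values[3]));
      // prints 3 50000.0 Harry Hacker [1989, 10, 1]
   }
}
```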

There is, however, one change you need to make to any class that you want to save and restore in an object stream. The class must implement the Serializable interface:

 class Employee implements Serializable { . . . }

The Serializable interface has no methods, so you don't need to change your classes in any way. In this regard, it is similar to the Cloneable interface that we also discussed in Chapter 5. However, to make a class cloneable, you still had to override the clone method of the Object class. To make a class serializable, you do not need to do anything else. Why aren't all classes serializable by default? We will discuss this in the section "Security."
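A quick way to see the effect of the marker interface is to try writing an instance of a class that lacks it. In this sketch, the classes Tagged and Untagged are hypothetical and differ only in the implements clause; writeObject throws a NotSerializableException for the untagged object.

```java
import java.io.*;

class SerializableCheckDemo
{
   // Hypothetical classes: identical except for the marker interface
   static class Tagged implements Serializable { String name = "Harry"; }
   static class Untagged { String name = "Harry"; }

   /** Returns true if obj can be written to an object stream. */
   static boolean canWrite(Object obj) throws IOException
   {
      ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
      try
      {
         out.writeObject(obj);
         return true;
      }
      catch (NotSerializableException e)
      {
         // the exception message is the name of the offending class
         return false;
      }
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(canWrite(new Tagged()));    // true
      System.out.println(canWrite(new Untagged()));  // false
   }
}
```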

Example 12-4 is a test program that writes an array containing two employees and one manager to disk and then restores it. Writing an array is done with a single operation:

 Employee[] staff = new Employee[3];
 . . .
 out.writeObject(staff);

Similarly, reading in the result is done with a single operation. However, we must apply a cast to the return value of the readObject method:

 Employee[] newStaff = (Employee[])in.readObject();

Once the information is restored, we print each record. Each line starts with the class name of the object, and the Manager record also lists the bonus because the Manager class overrides toString. This should convince you that we did restore the correct types.

Example 12-4 ObjectFileTest.java

   import java.io.*;
   import java.util.*;

   class ObjectFileTest
   {
      public static void main(String[] args)
      {
         Manager boss = new Manager("Carl Cracker", 80000,
            1987, 12, 15);
         boss.setBonus(5000);

         Employee[] staff = new Employee[3];

         staff[0] = boss;
         staff[1] = new Employee("Harry Hacker", 50000,
            1989, 10, 1);
         staff[2] = new Employee("Tony Tester", 40000,
            1990, 3, 15);

         try
         {
            // save all employee records to the file employee.dat
            ObjectOutputStream out = new ObjectOutputStream(new
               FileOutputStream("employee.dat"));
            out.writeObject(staff);
            out.close();

            // retrieve all records into a new array
            ObjectInputStream in = new ObjectInputStream(new
               FileInputStream("employee.dat"));
            Employee[] newStaff = (Employee[])in.readObject();
            in.close();

            // print the newly read employee records
            for (int i = 0; i < newStaff.length; i++)
               System.out.println(newStaff[i]);
         }
         catch (Exception e)
         {
            e.printStackTrace();
         }
      }
   }

   class Employee implements Serializable
   {
      public Employee() {}

      public Employee(String n, double s,
         int year, int month, int day)
      {
         name = n;
         salary = s;
         GregorianCalendar calendar
            = new GregorianCalendar(year, month - 1, day);
            // GregorianCalendar uses 0 = January
         hireDay = calendar.getTime();
      }

      public String getName()
      {
         return name;
      }

      public double getSalary()
      {
         return salary;
      }

      public Date getHireDay()
      {
         return hireDay;
      }

      public void raiseSalary(double byPercent)
      {
         double raise = salary * byPercent / 100;
         salary += raise;
      }

      public String toString()
      {
         return getClass().getName()
            + "[name=" + name
            + ",salary=" + salary
            + ",hireDay=" + hireDay
            + "]";
      }

      private String name;
      private double salary;
      private Date hireDay;
   }

   class Manager extends Employee
   {
      /**
         @param n the employee's name
         @param s the salary
         @param year the hire year
         @param month the hire month
         @param day the hire day
      */
      public Manager(String n, double s,
         int year, int month, int day)
      {
         super(n, s, year, month, day);
         bonus = 0;
      }

      public double getSalary()
      {
         double baseSalary = super.getSalary();
         return baseSalary + bonus;
      }

      public void setBonus(double b)
      {
         bonus = b;
      }

      public String toString()
      {
         return super.toString()
            + "[bonus=" + bonus
            + "]";
      }

      private double bonus;
   }

`java.io.ObjectOutputStream` 1.1

ObjectOutputStream(OutputStream out)
creates an ObjectOutputStream so that you can write objects to the specified OutputStream.
void writeObject(Object obj)
writes the specified object to the ObjectOutputStream. This method saves the class of the object, the signature of the class, and the values of any non-static, non-transient field of the class and its superclasses.

`java.io.ObjectInputStream` 1.1

ObjectInputStream(InputStream is)
creates an ObjectInputStream to read back object information from the specified InputStream.
Object readObject()
reads an object from the ObjectInputStream. In particular, this reads back the class of the object, the signature of the class, and the values of the nontransient and nonstatic fields of the class and all of its superclasses. Shared references are restored, so an object that is referenced several times in the stream is reconstructed only once.

Object Serialization File Format

Object serialization saves object data in a particular file format. Of course, you can use the writeObject/readObject methods without having to know the exact sequence of bytes that represents objects in a file. Nonetheless, we found studying the data format to be extremely helpful for gaining insight into the object streaming process. We did this by looking at hex dumps of various saved object files. However, the details are somewhat technical, so feel free to skip this section if you are not interested in the implementation.

Every file begins with the 2-byte "magic number"

 AC ED

followed by the version number of the object serialization format, which is currently

 00 05

(We will be using hexadecimal numbers throughout this section to denote bytes.) Then, it contains a sequence of objects, in the order that they were saved.
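You can check the header yourself. This sketch writes nothing but the header into a byte array and prints it in hexadecimal; the ObjectOutputStream constructor writes the header as soon as the stream is created. The class StreamHeaderDemo and its hex helper are hypothetical conveniences.

```java
import java.io.*;

class StreamHeaderDemo
{
   /** Returns the bytes an ObjectOutputStream writes before any object. */
   static byte[] headerBytes() throws IOException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream out = new ObjectOutputStream(bytes); // writes magic number and version
      out.flush();
      return bytes.toByteArray();
   }

   /** Formats bytes as space-separated, two-digit uppercase hex. */
   static String hex(byte[] data)
   {
      StringBuilder sb = new StringBuilder();
      for (byte b : data)
      {
         if (sb.length() > 0) sb.append(' ');
         sb.append(String.format("%02X", b & 0xFF));
      }
      return sb.toString();
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(hex(headerBytes()));   // AC ED 00 05
   }
}
```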

String objects are saved as

`74`	2-byte length	Characters

For example, the string "Harry" is saved as

 74 00 05 Harry

The Unicode characters of the string are saved in the modified UTF-8 format that the writeUTF method uses.
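The same technique shows the string record. This sketch writes a single string and drops the 4-byte stream header; the class and helper names are illustrative. The output shows the type code 74, the 2-byte length, and the character codes of H, a, r, r, y.

```java
import java.io.*;
import java.util.Arrays;

class StringRecordDemo
{
   /** Returns the bytes that follow the stream header when one string is written. */
   static byte[] stringRecord(String s) throws IOException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream out = new ObjectOutputStream(bytes);
      out.writeObject(s);
      out.close();
      byte[] all = bytes.toByteArray();
      return Arrays.copyOfRange(all, 4, all.length);   // skip AC ED 00 05
   }

   static String hex(byte[] data)
   {
      StringBuilder sb = new StringBuilder();
      for (byte b : data)
      {
         if (sb.length() > 0) sb.append(' ');
         sb.append(String.format("%02X", b & 0xFF));
      }
      return sb.toString();
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(hex(stringRecord("Harry")));  // 74 00 05 48 61 72 72 79
   }
}
```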

When an object is saved, the class of that object must be saved as well. The class description contains

The name of the class;
The serial version unique ID, which is a fingerprint of the data field types and method signatures;
A set of flags describing the serialization method;
A description of the data fields.

Java gets the fingerprint by:

Ordering descriptions of the class, superclass, interfaces, field types, and method signatures in a canonical way;
Then applying the so-called Secure Hash Algorithm (SHA) to that data.

SHA is a very fast algorithm that gives a "fingerprint" to a larger block of information. This fingerprint is always a 20-byte data packet, regardless of the size of the original data. It is created by a clever sequence of bit operations on the data that makes it essentially 100 percent certain that the fingerprint will change if the information is altered in any way. SHA is a U.S. standard, recommended by the National Institute of Standards and Technology (NIST). (For more details on SHA, see, for example, Cryptography and Network Security: Principles and Practice, by William Stallings [Prentice Hall].) However, Java uses only the first 8 bytes of the SHA code as a class fingerprint. It is still very likely that the class fingerprint will change if the data fields or methods change in any way.
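You don't have to compute the fingerprint by hand. ObjectStreamClass.lookup returns the stream descriptor of a serializable class (or null for a class that is not serializable), and getSerialVersionUID returns the 8-byte value that the stream records. The nested classes below are hypothetical. The computed value depends on the exact members of the compiled class, so the number printed for Employee will vary; a class can also declare the value itself in a private static final long serialVersionUID field, and that value is then used instead. The JDK's serialver tool reports the same number from the command line.

```java
import java.io.*;

class FingerprintDemo
{
   // Hypothetical class without an explicit serialVersionUID:
   // the stream uses the computed 8-byte fingerprint
   static class Employee implements Serializable
   {
      private String name;
      private double salary;
   }

   // Hypothetical class that declares its fingerprint explicitly
   static class Pinned implements Serializable
   {
      private static final long serialVersionUID = 42L;
      private String name;
   }

   /** Returns the serial version UID the stream writes for a serializable class. */
   static long fingerprint(Class<?> cl)
   {
      ObjectStreamClass desc = ObjectStreamClass.lookup(cl); // null if not serializable
      if (desc == null) throw new IllegalArgumentException(cl + " is not serializable");
      return desc.getSerialVersionUID();
   }

   public static void main(String[] args)
   {
      System.out.printf("Employee: %016X%n", fingerprint(Employee.class));
      System.out.println("Pinned: " + fingerprint(Pinned.class));  // Pinned: 42
   }
}
```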

Java can then check the class fingerprint to protect us from the following scenario: An object is saved to a disk file. Later, the designer of the class makes a change, for example, by removing a data field. Then, the old disk file is read in again. Now the data layout on the disk no longer matches the data layout in memory. If the data were read back in its old form, it could corrupt memory. Java takes great care to make such memory corruption close to impossible. Hence, it checks, using the fingerprint, that the class definition has not changed when restoring an object. It does this by comparing the fingerprint on disk with the fingerprint of the current class.

Technically, as long as the data layout of a class has not changed, it ought to be safe to read objects back in. But Java is conservative and checks that the methods have not changed either. (After all, the methods describe the meaning of the stored data.) Of course, in practice, classes do evolve, and it may be necessary for a program to read in older versions of objects. We will discuss this in the section entitled "Versioning."

Here is how a class identifier is stored:

72
2-byte length of class name
class name
8-byte fingerprint
1-byte flag
2-byte count of data field descriptors
data field descriptors
78 (end marker)
superclass type (70 if none)

The flag byte is composed of 3 bit masks, defined in java.io.ObjectStreamConstants:

 static final byte SC_WRITE_METHOD = 1;
    // class has writeObject method that writes additional data
 static final byte SC_SERIALIZABLE = 2;
    // class implements Serializable interface
 static final byte SC_EXTERNALIZABLE = 4;
    // class implements Externalizable interface

We will discuss the Externalizable interface later in this chapter. Externalizable classes supply custom read and write methods that take over the output of their instance fields. The classes that we write implement the Serializable interface and will have a flag value of 02. However, the java.util.Date class supplies its own writeObject method and therefore has a flag of 03 (SC_SERIALIZABLE | SC_WRITE_METHOD).
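A flag byte can be taken apart by testing each mask in turn. This sketch (FlagDecoder is a hypothetical name) uses the constants from java.io.ObjectStreamConstants. Later JDK releases define further bits, such as SC_BLOCK_DATA and SC_ENUM, which this sketch ignores.

```java
import java.io.ObjectStreamConstants;
import java.util.*;

class FlagDecoder
{
   /** Lists the names of the flag bits set in a class descriptor's flag byte. */
   static List<String> decode(byte flags)
   {
      List<String> names = new ArrayList<String>();
      if ((flags & ObjectStreamConstants.SC_WRITE_METHOD) != 0) names.add("SC_WRITE_METHOD");
      if ((flags & ObjectStreamConstants.SC_SERIALIZABLE) != 0) names.add("SC_SERIALIZABLE");
      if ((flags & ObjectStreamConstants.SC_EXTERNALIZABLE) != 0) names.add("SC_EXTERNALIZABLE");
      return names;
   }

   public static void main(String[] args)
   {
      System.out.println(decode((byte) 0x02));  // [SC_SERIALIZABLE]
      System.out.println(decode((byte) 0x03));  // [SC_WRITE_METHOD, SC_SERIALIZABLE]
   }
}
```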

Each data field descriptor has the format:

1-byte type code
2-byte length of field name
field name
class name (if field is an object)

The type code is one of the following:

`B`	`byte`
`C`	`char`
`D`	`double`
`F`	`float`
`I`	`int`
`J`	`long`
`L`	object
`S`	`short`
`Z`	`boolean`
`[`	array

When the type code is L, the field name is followed by the field type. Class and field name strings do not start with the string code 74, but field types do. Field types use a slightly different encoding of their names, namely, the format used by native methods. (See Volume 2 for native methods.)

For example, the salary field of the Employee class is encoded as:

 D 00 06 salary

Here is the complete class descriptor of the Employee class:

`72 00 08 Employee`
	`E6 D2 86 7D AE AC 18 1B 02`	Fingerprint and flags
	`00 03`	Number of instance fields
	`D 00 06 salary`	Instance field type and name
	`L 00 07 hireDay`	Instance field type and name
	`74 00 10 Ljava/util/Date;`	Instance field class name `String`
	`L 00 04 name`	Instance field type and name
	`74 00 12 Ljava/lang/String;`	Instance field class name `String`
	`78`	End marker
	`70`	No superclass
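The field descriptors in this table can also be inspected at run time through ObjectStreamClass.getFields, which returns one ObjectStreamField per serializable field. The nested Employee class below is a hypothetical stand-in with the same three fields. Primitive fields are listed first and then object fields, each group sorted by name, which is why salary precedes hireDay and name.

```java
import java.io.*;
import java.util.*;

class FieldDescriptorDemo
{
   // Hypothetical stand-in with the same fields as the Employee class
   static class Employee implements Serializable
   {
      private String name;
      private double salary;
      private Date hireDay;
   }

   /** Describes each serializable field: type code, name, and (for objects) type string. */
   static List<String> describe(Class<?> cl)
   {
      List<String> result = new ArrayList<String>();
      for (ObjectStreamField f : ObjectStreamClass.lookup(cl).getFields())
      {
         String line = f.getTypeCode() + " " + f.getName();
         if (!f.isPrimitive()) line += " " + f.getTypeString();  // e.g. Ljava/util/Date;
         result.add(line);
      }
      return result;
   }

   public static void main(String[] args)
   {
      for (String line : describe(Employee.class))
         System.out.println(line);
      // D salary
      // L hireDay Ljava/util/Date;
      // L name Ljava/lang/String;
   }
}
```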

These descriptors are fairly long. If the same class descriptor is needed again in the file, then an abbreviated form is used:

`71`	4-byte serial number

The serial number refers to the previous explicit class descriptor. We will discuss the numbering scheme later.

An object is stored as

`73`	class descriptor	object data

For example, here is how an Employee object is stored:

`40 E8 6A 00 00 00 00 00`		`salary` field value `double`
`73`		`hireDay` field value new object
	`71 00 7E 00 08`	Existing class `java.util.Date`
	`77 08 00 00 00 91 1B 4E B1 80 78`	External storage details later
`74 00 0C Harry Hacker`		`name` field value `String`

As you can see, the data file contains enough information to restore the Employee object.

Arrays are saved in the following format:

`75`	class descriptor	4-byte number of entries	entries

The array class name in the class descriptor uses the same format as native methods, which differs slightly from the plain class names in other class descriptors. In this format, an array class name starts with a [, and element class names start with an L and end with a semicolon.

For example, an array of three Employee objects starts out like this:

`75`			Array
	`72 00 0B [LEmployee;`		New class, string length, class name `Employee[]`
		`FC BF 36 11 C5 91 11 C7 02`	Fingerprint and flags
		`00 00`	Number of instance fields
		`78`	End marker
		`70`	No superclass
		`00 00 00 03`	Number of array entries

Note that the fingerprint for an array of Employee objects is different from a fingerprint of the Employee class itself.
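Class.getName returns array class names in this same form, so it is easy to see how they look. The sketch below (ArrayNameDemo and streamName are hypothetical names) asks the stream descriptor for a few array names and prints the fingerprints of String and String[] for comparison. Class names in descriptors separate package components with dots; only field type strings use slashes.

```java
import java.io.*;

class ArrayNameDemo
{
   /** Returns the class name that an object stream records for cl. */
   static String streamName(Class<?> cl)
   {
      return ObjectStreamClass.lookup(cl).getName();
   }

   public static void main(String[] args)
   {
      System.out.println(streamName(String[].class));    // [Ljava.lang.String;
      System.out.println(streamName(int[].class));       // [I
      System.out.println(streamName(double[][].class));  // [[D

      // an array class has its own fingerprint, distinct from its element class
      System.out.println(ObjectStreamClass.lookup(String.class).getSerialVersionUID());
      System.out.println(ObjectStreamClass.lookup(String[].class).getSerialVersionUID());
   }
}
```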

Of course, studying these codes can be about as exciting as reading the average phone book. But it is still instructive to know that the object stream contains a detailed description of all the objects that it contains, with sufficient detail to allow reconstruction of both objects and arrays of objects.

The Problem of Saving Object References

We now know how to save objects that contain numbers, strings, or other simple objects. However, there is one important situation that we still need to consider. What happens when one object is shared by several objects as part of its state?

To illustrate the problem, let us make a slight modification to the Manager class. Let's assume that each manager has a secretary, implemented as an instance variable secretary of type Employee. (It would make sense to derive a class Secretary from Employee for this purpose, but we will not do that here.)

 class Manager extends Employee
 {
    . . .
    private Employee secretary;
 }

Having done this, you must keep in mind that the Manager object now contains a reference to the Employee object that describes the secretary, not a separate copy of the object.

In particular, two managers can share the same secretary, as is the case in Figure 12-5 and the following code:

 Employee harry = new Employee("Harry Hacker", . . .);
 Manager carl = new Manager("Carl Cracker", . . .);
 carl.setSecretary(harry);
 Manager tony = new Manager("Tony Tester", . . .);
 tony.setSecretary(harry);

Figure 12-5. Two managers can share a mutual employee

Now, suppose we write the employee data to disk. What we don't want is for the Manager to save its information according to the following logic:

Save employee data;
Save secretary data.

Then, the data for harry would be saved three times. When reloaded, the objects would have the configuration shown in Figure 12-6.

Figure 12-6. Here, Harry is saved three times

This is not what we want. Suppose the secretary gets a raise. We would not want to hunt for all other copies of that object and apply the raise as well. We want to save and restore only one copy of the secretary. To do this, we must save and restore the original references to the objects. In other words, we want the object layout on disk to be exactly like the object layout in memory. This is called persistence in object-oriented circles.

Of course, we cannot save and restore the memory addresses for the secretary objects. When an object is reloaded, it will likely occupy a completely different memory address than it originally did.

Instead, Java uses serial numbers, hence the name object serialization for this mechanism. Here is the algorithm:

All objects that are saved to disk are given a serial number (1, 2, 3, and so on, as shown in Figure 12-7).
When saving an object to disk, find out if the same object has already been stored.
If it has been stored previously, just write "same as previously saved object with serial number x." If not, store all its data.

Figure 12-7. An example of object serialization

When reading back the objects, simply reverse the procedure. For each object that you load, note its sequence number and remember where you put it in memory. When you encounter the tag "same as previously saved object with serial number x," you look up where you put the object with serial number x and set the object reference to that memory address.
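The bookkeeping on both sides can be sketched in a few lines. The classes SerialNumberSketch and SerialNumberReader below are hypothetical and model only the numbering, not the byte format or the recursion into fields. On the writing side, an IdentityHashMap is a natural choice, because two objects that are equal but distinct must still get different serial numbers; on the reading side, a list indexed by serial number is enough.

```java
import java.util.*;

/** Hypothetical model of the writer's side: assigns serial numbers 1, 2, 3, ... */
class SerialNumberSketch
{
   // IdentityHashMap compares keys with ==, not equals
   private final Map<Object, Integer> numbers = new IdentityHashMap<Object, Integer>();
   private final List<String> log = new ArrayList<String>();

   /** Decides how obj would be saved and returns its serial number. */
   int save(Object obj)
   {
      Integer n = numbers.get(obj);
      if (n != null)
      {
         log.add("same as previously saved object with serial number " + n);
         return n;
      }
      int serial = numbers.size() + 1;
      // number the object before storing its data, so references
      // to it from its own fields could already be resolved
      numbers.put(obj, serial);
      log.add("object " + serial + ": store all data");
      return serial;
   }

   List<String> log() { return log; }
}

/** Hypothetical model of the reader's side: remembers where each object went. */
class SerialNumberReader
{
   private final List<Object> loaded = new ArrayList<Object>();

   /** Records a newly loaded object and returns the serial number it received. */
   int register(Object obj)
   {
      loaded.add(obj);
      return loaded.size();
   }

   /** Resolves "same as previously saved object with serial number x". */
   Object lookup(int serial)
   {
      return loaded.get(serial - 1);
   }
}
```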

Note that the objects need not be saved in any particular order. Figure 12-8 shows what happens when a manager occurs first in the staff array.

Figure 12-8. Objects saved in random order

All of this sounds confusing, and it is. Fortunately, when object streams are used, the process is also completely automatic. Object streams assign the serial numbers and keep track of duplicate objects. The exact numbering scheme is slightly different from that used in the figures; see the next section.
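You can confirm that shared references survive a round trip. This sketch uses hypothetical, minimal Employee and Manager classes modeled on the secretary example and serializes to memory rather than to a file. After reading, both managers refer to the same secretary object, which is also the object stored in the array, but it is a new object, not the original harry.

```java
import java.io.*;

class SharedReferenceDemo
{
   // Hypothetical minimal classes modeled on the secretary example
   static class Employee implements Serializable
   {
      String name;
      double salary;
      Employee(String name, double salary) { this.name = name; this.salary = salary; }
   }

   static class Manager extends Employee
   {
      Employee secretary;
      Manager(String name, double salary, Employee secretary)
      {
         super(name, salary);
         this.secretary = secretary;
      }
   }

   static Employee[] roundTrip(Employee[] staff)
      throws IOException, ClassNotFoundException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream out = new ObjectOutputStream(bytes);
      out.writeObject(staff);
      out.close();
      ObjectInputStream in = new ObjectInputStream(
         new ByteArrayInputStream(bytes.toByteArray()));
      Employee[] result = (Employee[]) in.readObject();
      in.close();
      return result;
   }

   public static void main(String[] args) throws Exception
   {
      Employee harry = new Employee("Harry Hacker", 50000);
      Manager carl = new Manager("Carl Cracker", 80000, harry);
      Manager tony = new Manager("Tony Tester", 40000, harry);
      Employee[] copy = roundTrip(new Employee[] { carl, tony, harry });
      Manager c = (Manager) copy[0];
      Manager t = (Manager) copy[1];
      System.out.println(c.secretary == t.secretary);  // true: one shared secretary
      System.out.println(c.secretary == harry);        // false: a restored copy
   }
}
```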

In this chapter, we use serialization to save a collection of objects to a disk file and retrieve it exactly as we stored it. Another very important application is the transmittal of a collection of objects across a network connection to another computer. Just as raw memory addresses are meaningless in a file, they are also meaningless when communicating with a different processor. Since serialization replaces memory addresses with serial numbers, it permits the transport of object collections from one machine to another. We will study that use of serialization when discussing remote method invocation in Volume 2.

Example 12-5 is a program that saves and reloads a network of employee and manager objects (the manager's secretary is also one of the employees in the array). Note that the secretary object is unique after reloading: when newStaff[1] gets a raise, the raise is also reflected in the secretary field of the manager.

Example 12-5 ObjectRefTest.java

   import java.io.*;
   import java.util.*;

   class ObjectRefTest
   {
      public static void main(String[] args)
      {
         Employee harry = new Employee("Harry Hacker", 50000,
            1989, 10, 1);
         Manager boss = new Manager("Carl Cracker", 80000,
            1987, 12, 15);
         boss.setSecretary(harry);

         Employee[] staff = new Employee[3];

         staff[0] = boss;
         staff[1] = harry;
         staff[2] = new Employee("Tony Tester", 40000,
            1990, 3, 15);

         try
         {
            // save all employee records to the file employee.dat
            ObjectOutputStream out = new ObjectOutputStream(new
               FileOutputStream("employee.dat"));
            out.writeObject(staff);
            out.close();

            // retrieve all records into a new array
            ObjectInputStream in = new ObjectInputStream(new
               FileInputStream("employee.dat"));
            Employee[] newStaff = (Employee[])in.readObject();
            in.close();

            // raise secretary's salary
            newStaff[1].raiseSalary(10);

            // print the newly read employee records
            for (int i = 0; i < newStaff.length; i++)
               System.out.println(newStaff[i]);
         }
         catch (Exception e)
         {
            e.printStackTrace();
         }
      }
   }

   class Employee implements Serializable
   {
      public Employee() {}

      public Employee(String n, double s,
         int year, int month, int day)
      {
         name = n;
         salary = s;
         GregorianCalendar calendar
            = new GregorianCalendar(year, month - 1, day);
            // GregorianCalendar uses 0 = January
         hireDay = calendar.getTime();
      }

      public String getName()
      {
         return name;
      }

      public double getSalary()
      {
         return salary;
      }

      public Date getHireDay()
      {
         return hireDay;
      }

      public void raiseSalary(double byPercent)
      {
         double raise = salary * byPercent / 100;
         salary += raise;
      }

      public String toString()
      {
         return getClass().getName()
            + "[name=" + name
            + ",salary=" + salary
            + ",hireDay=" + hireDay
            + "]";
      }

      private String name;
      private double salary;
      private Date hireDay;
   }

   class Manager extends Employee
   {
      /**
         Constructs a Manager without a secretary
         @param n the employee's name
         @param s the salary
         @param year the hire year
         @param month the hire month
         @param day the hire day
      */
      public Manager(String n, double s,
         int year, int month, int day)
      {
         super(n, s, year, month, day);
         secretary = null;
      }

      /**
         Assigns a secretary to the manager.
         @param s the secretary
      */
      public void setSecretary(Employee s)
      {
         secretary = s;
      }

      public String toString()
      {
         return super.toString()
            + "[secretary=" + secretary
            + "]";
      }

      private Employee secretary;
   }

Output Format for Object References

This section continues the discussion of the output format of object streams. If you skipped the previous discussion, you should skip this section as well.

All objects (including arrays and strings) and all class descriptors are given serial numbers as they are saved in the output file. This process is referred to as serialization because every saved object is assigned a serial number. (The count starts at 00 7E 00 00.)

We already saw that a full class descriptor for any given class occurs only once. Subsequent descriptors refer to it. For example, in the commented listing at the end of this section, the second reference to the Employee class descriptor is coded as

 71 00 7E 00 04

The same mechanism is used for objects. If a reference to a previously saved object is written, it is saved in exactly the same way, that is, 71 followed by the serial number. It is always clear from the context whether the particular serial reference denotes a class descriptor or an object.

Finally, a null reference is stored as

 70
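Both rules show up in a tiny stream. This sketch (BackReferenceDemo is a hypothetical name) writes the same string twice and then a null reference, and prints everything after the 4-byte header. The first string receives the first serial number, 00 7E 00 00, so the second write becomes 71 followed by that number, and the null reference becomes the single byte 70.

```java
import java.io.*;
import java.util.Arrays;

class BackReferenceDemo
{
   static byte[] body() throws IOException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream out = new ObjectOutputStream(bytes);
      String s = "A";
      out.writeObject(s);     // 74 00 01 41: new string, serial number 00 7E 00 00
      out.writeObject(s);     // 71 00 7E 00 00: reference to that serial number
      out.writeObject(null);  // 70: null reference
      out.close();
      byte[] all = bytes.toByteArray();
      return Arrays.copyOfRange(all, 4, all.length);   // drop AC ED 00 05
   }

   static String hex(byte[] data)
   {
      StringBuilder sb = new StringBuilder();
      for (byte b : data)
      {
         if (sb.length() > 0) sb.append(' ');
         sb.append(String.format("%02X", b & 0xFF));
      }
      return sb.toString();
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(hex(body()));   // 74 00 01 41 71 00 7E 00 00 70
   }
}
```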

Here is the commented output of the ObjectRefTest program of the preceding section. If you like, run the program, look at a hex dump of its data file employee.dat, and compare it with the commented listing. The important lines toward the end of the output show the reference to a previously saved object.

`AC ED 00 05`					File header
`75`					Array `staff` (serial #1)
	`72 00 0B [LEmployee;`				New class, string length, class name `Employee[]` (serial #0)
		`FC BF 36 11 C5 91 11 C7 02`			Fingerprint and flags
		`00 00`			Number of instance fields
		`78`			End marker
		`70`			No superclass
		`00 00 00 03`			Number of array entries
	`73`				`staff[0]` new object (serial #7)
		`72 00 07 Manager`			New class, string length, class name (serial #2)
			`36 06 AE 13 63 8F 59 B7 02`		Fingerprint and flags
			`00 01`		Number of data fields
			`L 00 09 secretary`		Instance field type and name
			`74 00 0A LEmployee;`		Instance field class name `String` (serial #3)
			`78`		End marker
			`72 00 08 Employee`		Superclass new class, string length, class name (serial #4)
				`E6 D2 86 7D AE AC 18 1B 02`	Fingerprint and flags
				`00 03`	Number of instance fields
				`D 00 06 salary`	Instance field type and name
				`L 00 07 hireDay`	Instance field type and name
				`74 00 10 Ljava/util/Date;`	Instance field class name `String` (serial #5)
				`L 00 04 name`	Instance field type and name
				`74 00 12 Ljava/lang/String;`	Instance field class name `String` (serial #6)
				`78`	End marker
				`70`	No superclass
		`40 F3 88 00 00 00 00 00`			`salary` field value `double`
		`73`			`hireDay` field value new object (serial #9)
			`72 00 0E java.util.Date`		New class, string length, class name (serial #8)
				`68 6A 81 01 4B 59 74 19 03`	Fingerprint and flags
				`00 00`	No instance variables
				`78`	End marker
				`70`	No superclass
			`77 08`		External storage, number of bytes
			`00 00 00 83 E9 39 E0 00`		Date
			`78`		End marker
		`74 00 0C Carl Cracker`			`name` field value `String` (serial #10)
		`73`			`secretary` field value new object (serial #11)
			`71 00 7E 00 04`		existing class (use serial #4)
			`40 E8 6A 00 00 00 00 00`		`salary` field value `double`
			`73`		`hireDay` field value new object (serial #12)
				`71 00 7E 00 08`	Existing class (use serial #8)
				`77 08`	External storage, number of bytes
				`00 00 00 91 1B 4E B1 80`	Date
				`78`	End marker
			`74 00 0C Harry Hacker`		`name` field value `String` (serial #13)
	`71 00 7E 00 0B`				`staff[1]` existing object (use serial #11)
	`73`				`staff[2]` new object (serial #14)
		`71 00 7E 00 04`			Existing class (use serial #4)
		`40 E3 88 00 00 00 00 00`			`salary` field value `double`
		`73`			`hireDay` field value new object (serial #15)
			`71 00 7E 00 08`		Existing class (use serial #8)
			`77 08`		External storage, number of bytes
			`00 00 00 94 6D 3E EC 00`		Date
			`78`		End marker
		`74 00 0B Tony Tester`			`name` field value `String` (serial #16)

It is usually not important to know the exact file format (unless you are trying to create an evil effect by modifying the data; see the next section). What you should remember is this:

The object stream output contains the types and data fields of all objects.
Each object is assigned a serial number.
Repeated occurrences of the same object are stored as references to that serial number.

Modifying the Default Serialization Mechanism

Certain data fields should never be serialized, for example, integer values that store file handles or handles of windows that are only meaningful to native methods. Such information is guaranteed to be useless when you reload an object at a later time or transport it to a different machine. In fact, improper values for such fields can actually cause native methods to crash. Java has an easy mechanism to prevent such fields from ever being serialized. Mark them with the keyword transient. You also need to tag fields as transient if they belong to nonserializable classes. Transient fields are always skipped when objects are serialized.
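Here is a minimal sketch of what transient does, assuming a hypothetical NativeResource class whose handle field stands in for a native handle and whose lock field holds an object of a nonserializable class (a plain Object). Writing the object does not fail even though Object is not serializable, and after a round trip the transient fields come back with their default values, 0 and null.

```java
import java.io.*;

class TransientDemo
{
   // Hypothetical class with process-specific state
   static class NativeResource implements Serializable
   {
      String label;
      transient long handle;   // meaningless outside this process: never written
      transient Object lock;   // Object is not serializable, so it must be transient

      NativeResource(String label, long handle)
      {
         this.label = label;
         this.handle = handle;
         this.lock = new Object();
      }
   }

   static NativeResource roundTrip(NativeResource r)
      throws IOException, ClassNotFoundException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream out = new ObjectOutputStream(bytes);
      out.writeObject(r);
      out.close();
      ObjectInputStream in = new ObjectInputStream(
         new ByteArrayInputStream(bytes.toByteArray()));
      NativeResource copy = (NativeResource) in.readObject();
      in.close();
      return copy;
   }

   public static void main(String[] args) throws Exception
   {
      NativeResource copy = roundTrip(new NativeResource("main window", 0x1234L));
      // transient fields have their default values
      System.out.println(copy.label + " " + copy.handle + " " + copy.lock);
      // prints main window 0 null
   }
}
```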

The serialization mechanism provides a way for individual classes to add validation or any other desired action to the default read and write behavior. A serializable class can define methods with the signature

 private void readObject(ObjectInputStream in)
    throws IOException, ClassNotFoundException;
 private void writeObject(ObjectOutputStream out)
    throws IOException;

Then, the data fields are no longer automatically serialized, and these methods are called instead.

Here is a typical example. A number of classes in the java.awt.geom package, such as Point2D.Double, are not serializable in the JDK release this chapter covers (Point2D.Double implements Serializable as of Java SE 6). Now suppose you want to serialize a class LabeledPoint that stores a String and a Point2D.Double. First, you need to mark the Point2D.Double field as transient to avoid a NotSerializableException.

 public class LabeledPoint implements Serializable
 {
    . . .
    private String label;
    private transient Point2D.Double point;
 }

In the writeObject method, we first write the object descriptor and the String field, label, by calling the defaultWriteObject method. This is a special method of the ObjectOutputStream class that can only be called from within a writeObject method of a serializable class. Then we write the point coordinates, using the standard DataOutput calls.

 private void writeObject(ObjectOutputStream out)
    throws IOException
 {
    out.defaultWriteObject();
    out.writeDouble(point.getX());
    out.writeDouble(point.getY());
 }

In the readObject method, we reverse the process:

 private void readObject(ObjectInputStream in)
    throws IOException, ClassNotFoundException
 {
    in.defaultReadObject();
    double x = in.readDouble();
    double y = in.readDouble();
    point = new Point2D.Double(x, y);
 }
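Put together, the class might look like the following sketch. The constructor, the accessor methods, and the LabeledPointDemo helper that serializes to memory are assumptions added so that the example runs; the two serialization methods follow the fragments above. The technique works the same way whether or not your JDK's Point2D.Double is itself serializable, because the field is transient.

```java
import java.awt.geom.Point2D;
import java.io.*;

class LabeledPoint implements Serializable
{
   // constructor and accessors are assumptions added for this sketch
   public LabeledPoint(String label, double x, double y)
   {
      this.label = label;
      this.point = new Point2D.Double(x, y);
   }

   public String getLabel() { return label; }
   public double getX() { return point.getX(); }
   public double getY() { return point.getY(); }

   private void writeObject(ObjectOutputStream out)
      throws IOException
   {
      out.defaultWriteObject();          // writes the nontransient field label
      out.writeDouble(point.getX());     // then the transient point, by hand
      out.writeDouble(point.getY());
   }

   private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException
   {
      in.defaultReadObject();            // restores label
      double x = in.readDouble();
      double y = in.readDouble();
      point = new Point2D.Double(x, y);  // rebuilds the transient field
   }

   private String label;
   private transient Point2D.Double point;
}

class LabeledPointDemo
{
   static LabeledPoint roundTrip(LabeledPoint p)
      throws IOException, ClassNotFoundException
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream out = new ObjectOutputStream(bytes);
      out.writeObject(p);
      out.close();
      ObjectInputStream in = new ObjectInputStream(
         new ByteArrayInputStream(bytes.toByteArray()));
      LabeledPoint copy = (LabeledPoint) in.readObject();
      in.close();
      return copy;
   }

   public static void main(String[] args) throws Exception
   {
      LabeledPoint p = roundTrip(new LabeledPoint("center", 1.5, -2.0));
      System.out.println(p.getLabel() + " " + p.getX() + " " + p.getY());
      // prints center 1.5 -2.0
   }
}
```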

Another example is the java.util.Date class that supplies its own readObject and writeObject methods. These methods write the date as a number of milliseconds from the epoch (January 1, 1970, midnight UTC). The Date class has a complex internal representation that stores both a Calendar object and a millisecond count, to optimize lookups. The state of the Calendar is redundant and does not have to be saved.

The readObject and writeObject methods only need to save and load their data fields. They should not concern themselves with superclass data or any other class information.

Rather than letting the serialization mechanism save and restore object data, a class can define its own mechanism. To do this, a class must implement the Externalizable interface. This in turn requires it to define two methods:

 public void readExternal(ObjectInput in)
    throws IOException, ClassNotFoundException;
 public void writeExternal(ObjectOutput out)
    throws IOException;

Unlike the readObject and writeObject methods that were described in the preceding section, these methods are fully responsible for saving and restoring the entire object, including the superclass data. The serialization mechanism merely records the class of the object in the stream. When reading an externalizable object, the object stream creates an object with the public no-argument (default) constructor and then calls the readExternal method. Here is how you can implement these methods for the Employee class:

 public void readExternal(ObjectInput s)
    throws IOException
 {
    name = s.readUTF();
    salary = s.readDouble();
    hireDay = new Date(s.readLong());
 }

 public void writeExternal(ObjectOutput s)
    throws IOException
 {
    s.writeUTF(name);
    s.writeDouble(salary);
    s.writeLong(hireDay.getTime());
 }
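A runnable sketch of the complete class follows. The public no-argument constructor matters: without it, deserialization fails (typically with an InvalidClassException) because the object stream has no way to create the blank object. The three-argument constructor, the accessors, and the ExternalizableDemo driver are our own additions; the driver writes an Employee to a byte array and reads it back.

```java
import java.io.*;
import java.util.Date;

// Hypothetical driver: round-trips an externalizable Employee.
class ExternalizableDemo
{
   public static void main(String[] args)
   {
      Employee harry = new Employee("Harry Hacker", 50000, new Date());
      System.out.println(roundTrip(harry));
   }

   static Employee roundTrip(Employee e)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bout))
         {
            out.writeObject(e); // records the class, then calls writeExternal
         }
         try (ObjectInputStream in = new ObjectInputStream(
               new ByteArrayInputStream(bout.toByteArray())))
         {
            return (Employee) in.readObject(); // no-arg ctor, then readExternal
         }
      }
      catch (IOException | ClassNotFoundException ex)
      {
         throw new IllegalStateException(ex);
      }
   }
}

class Employee implements Externalizable
{
   // Required: the object stream calls this before readExternal.
   public Employee() {}

   public Employee(String name, double salary, Date hireDay)
   {
      this.name = name;
      this.salary = salary;
      this.hireDay = hireDay;
   }

   public String getName() { return name; }
   public double getSalary() { return salary; }
   public Date getHireDay() { return hireDay; }

   public void writeExternal(ObjectOutput s) throws IOException
   {
      s.writeUTF(name);
      s.writeDouble(salary);
      s.writeLong(hireDay.getTime());
   }

   public void readExternal(ObjectInput s) throws IOException
   {
      name = s.readUTF();
      salary = s.readDouble();
      hireDay = new Date(s.readLong());
   }

   public String toString()
   {
      return getClass().getName() + "[name=" + name
         + ",salary=" + salary + ",hireDay=" + hireDay + "]";
   }

   private String name;
   private double salary;
   private Date hireDay;
}
```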

Serialization is somewhat slow because the virtual machine must discover the structure of each object. If you are very concerned about performance and if you read and write a large number of objects of a particular class, you should investigate the use of the Externalizable interface. The tech tip http://developer.java.sun.com/developer/TechTips/txtarchive/Apr00_Stu.txt demonstrates that in the case of an employee class, using external reading and writing was about 35-40% faster than the default serialization.

Unlike the readObject and writeObject methods, which are private and can only be called by the serialization mechanism, the readExternal and writeExternal methods are public. In particular, readExternal potentially permits modification of the state of an existing object.
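To see why public access matters, the sketch below calls writeExternal and readExternal directly, bypassing writeObject and readObject, and overwrites the fields of an Employee that already exists. This Employee is a pared-down hypothetical version with only a name and a salary, and the overwrite helper is our own.

```java
import java.io.*;

// Hypothetical driver showing that readExternal can mutate a live object.
class ReadExternalDemo
{
   public static void main(String[] args)
   {
      Employee harry = new Employee("Harry Hacker", 50000);
      overwrite(harry, new Employee("Carl Cracker", 80000));
      System.out.println(harry.getName()); // Carl Cracker
   }

   // Copies the externalized state of source into the existing target.
   static void overwrite(Employee target, Employee source)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bout))
         {
            source.writeExternal(out); // any caller may do this...
         }
         try (ObjectInputStream in = new ObjectInputStream(
               new ByteArrayInputStream(bout.toByteArray())))
         {
            target.readExternal(in);   // ...and this, on an existing object
         }
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }
}

class Employee implements Externalizable
{
   public Employee() {}

   public Employee(String name, double salary)
   {
      this.name = name;
      this.salary = salary;
   }

   public String getName() { return name; }
   public double getSalary() { return salary; }

   public void writeExternal(ObjectOutput s) throws IOException
   {
      s.writeUTF(name);
      s.writeDouble(salary);
   }

   public void readExternal(ObjectInput s) throws IOException
   {
      name = s.readUTF();
      salary = s.readDouble();
   }

   private String name;
   private double salary;
}
```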

For even more exotic variations of serialization, see http://www.absolutejava.com/serialization.

Serializing Typesafe Enumerations

You have to pay particular attention when serializing and deserializing objects that are assumed to be unique. This commonly happens when implementing typesafe enumerations.

An enumerated type is a data type with a finite number of values. The Java programming language has no built-in mechanism for enumerated types. They are often simulated with sets of numbers or strings, but such a simulation is not typesafe. Consider for example the JSlider class. You can construct a slider by specifying an orientation, minimum and maximum values, and the current value. Here is an example:

 JSlider slider = new JSlider(SwingConstants.HORIZONTAL,
    0, 100, 50);

The SwingConstants interface defines the constant HORIZONTAL as an integer with value 0.

Now suppose a harried programmer doesn't remember the order of the parameters and writes

 JSlider slider = new JSlider(0, 100, 50,
    SwingConstants.HORIZONTAL); // wrong order of parameters

This call compiles with no error since the compiler just looks for four values of type int.

The problem could be solved if the first parameter had a separate type, say, Orientation. Then the compiler can report a type error if an int is passed instead of a value of type Orientation.

In the Java programming language, all types need to be implemented as classes. A class representing an enumerated type is special: we want to make sure that only a finite number of objects can be created. This is achieved in the following way:

 public class Orientation
 {
    public static final Orientation HORIZONTAL
       = new Orientation(1);
    public static final Orientation VERTICAL
       = new Orientation(2);

    private Orientation(int v) { value = v; }

    private int value;
 }

Note that the constructor is private. Thus, no objects can be created beyond Orientation.HORIZONTAL and Orientation.VERTICAL. In particular, you can use the == operator to test for object equality:

 if (orientation == Orientation.HORIZONTAL) . . .

This programming idiom is called a typesafe enumeration.

There is an important twist that you need to remember when a typesafe enumeration implements the Serializable interface. The default serialization mechanism is not appropriate. Suppose we write a value of type Orientation and read it in again:

 Orientation original = Orientation.HORIZONTAL;
 ObjectOutputStream out = . . .;
 out.writeObject(original);
 out.close();
 ObjectInputStream in = . . .;
 Orientation saved = (Orientation) in.readObject();

Now the test

 if (saved == Orientation.HORIZONTAL) . . .

will fail. In fact, the saved value is a completely new object of the Orientation type and not equal to any of the predefined constants. Even though the constructor is private, the serialization mechanism can create new objects!

To solve this problem, you need to define another special serialization method, called readResolve. If the readResolve method is defined, it is called after the object is deserialized. It must return an object that then becomes the return value of the readObject method. In our case, the readResolve method will inspect the value field and return the appropriate enumerated constant:

 protected Object readResolve() throws ObjectStreamException
 {
    if (value == 1) return Orientation.HORIZONTAL;
    if (value == 2) return Orientation.VERTICAL;
    return null; // this shouldn't happen
 }

Remember to add a readResolve method to all typesafe enumerations. Also note that the enumeration class must store a value from which the constant can be recovered.
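The following sketch lets you confirm the difference. It serializes constants to a byte array and reads them back: Orientation has a readResolve method, while the hypothetical PlainOrientation class, identical apart from that, does not. Instead of returning null for an unexpected value, this version throws an InvalidObjectException; that is one reasonable choice, not the only one.

```java
import java.io.*;

// Hypothetical driver comparing enumerations with and without readResolve.
class ReadResolveDemo
{
   public static void main(String[] args)
   {
      System.out.println(roundTrip(Orientation.HORIZONTAL)
         == Orientation.HORIZONTAL);        // true
      System.out.println(roundTrip(PlainOrientation.HORIZONTAL)
         == PlainOrientation.HORIZONTAL);   // false
   }

   static Object roundTrip(Object obj)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bout))
         {
            out.writeObject(obj);
         }
         try (ObjectInputStream in = new ObjectInputStream(
               new ByteArrayInputStream(bout.toByteArray())))
         {
            return in.readObject();
         }
      }
      catch (IOException | ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }
}

// Typesafe enumeration that maps deserialized copies back to the constants.
class Orientation implements Serializable
{
   public static final Orientation HORIZONTAL = new Orientation(1);
   public static final Orientation VERTICAL = new Orientation(2);

   private Orientation(int v) { value = v; }

   protected Object readResolve() throws ObjectStreamException
   {
      if (value == 1) return Orientation.HORIZONTAL;
      if (value == 2) return Orientation.VERTICAL;
      throw new InvalidObjectException("unknown Orientation value " + value);
   }

   private int value;
}

// Same idiom without readResolve: deserialization yields a fresh object.
class PlainOrientation implements Serializable
{
   public static final PlainOrientation HORIZONTAL = new PlainOrientation(1);

   private PlainOrientation(int v) { value = v; }

   private int value;
}
```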

Versioning

In the preceding sections, we showed you how to save relatively small collections of objects via an object stream. But those were just demonstration programs. With object streams, it helps to think big. Suppose you write a program that lets the user produce a document. This document contains paragraphs of text, tables, graphs, and so on. You can stream out the entire document object with a single call to writeObject:

 out.writeObject(doc);

The paragraph, table, and graph objects are automatically streamed out as well. One user of your program can then give the output file to another user who also has a copy of your program, and that program loads the entire document with a single call to readObject:

 doc = (Document)in.readObject();
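Here is a small sketch of that idea, with hypothetical Document and Paragraph classes standing in for a real document model. Only the Document is passed to writeObject; the list and the paragraphs it refers to travel along automatically because every class in the object graph is serializable (ArrayList and String already are).

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver: one writeObject call saves the whole graph.
class DocumentDemo
{
   public static void main(String[] args)
   {
      Document doc = new Document();
      doc.add(new Paragraph("Object Streams"));
      doc.add(new Paragraph("Versioning"));
      Document copy = roundTrip(doc);
      System.out.println(copy.getParagraphs().size()); // 2
   }

   static Document roundTrip(Document doc)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bout))
         {
            out.writeObject(doc);
         }
         try (ObjectInputStream in = new ObjectInputStream(
               new ByteArrayInputStream(bout.toByteArray())))
         {
            return (Document) in.readObject();
         }
      }
      catch (IOException | ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }
}

// Hypothetical document model.
class Document implements Serializable
{
   public void add(Paragraph p) { paragraphs.add(p); }
   public List<Paragraph> getParagraphs() { return paragraphs; }

   private ArrayList<Paragraph> paragraphs = new ArrayList<>();
}

class Paragraph implements Serializable
{
   public Paragraph(String text) { this.text = text; }
   public String getText() { return text; }

   private String text;
}
```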

This is very useful, but your program will inevitably change, and you will release a version 1.1. Can version 1.1 read the old files? Can the users who still use 1.0 read the files that the new version is now producing? Clearly, it would be desirable if object files could cope with the evolution of classes.

At first glance it seems that this would not be possible. When a class definition changes in any way, then its SHA fingerprint also changes, and you know that object streams will refuse to read in objects with different fingerprints. However, a class can indicate that it is compatible with an earlier version of itself. To do this, you must first obtain the fingerprint of the earlier version of the class. You use the stand-alone serialver program that is part of the SDK to obtain this number. For example, running

 serialver Employee

prints out

 Employee:    static final long serialVersionUID = -1814239825517340645L;

If you start the serialver program with the -show option, then the program brings up a graphical dialog box (see Figure 12-9).

Figure 12-9. The graphical version of the `serialver` program

All later versions of the class must define the serialVersionUID constant with the same value as the fingerprint of the original.

 class Employee implements Serializable // version 1.1
 {
    . . .
    public static final long serialVersionUID
       = -1814239825517340645L;
 }

When a class has a static final long data member named serialVersionUID, the serialization mechanism does not compute the fingerprint itself but uses that value instead.

Once that static data member has been placed inside a class, the serialization system is now willing to read in different versions of objects of that class.
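You can also query the version number from inside a program. The ObjectStreamClass.lookup method returns the serialization descriptor for a class, and its getSerialVersionUID method reports the declared serialVersionUID if there is one, or the computed fingerprint otherwise; as far as we know, this is the same value that serialver prints. The pared-down Employee class below is hypothetical and reuses the number from the serialver output above.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical driver: reads the version number the serialization system uses.
class SerialVersionDemo
{
   static long uidOf(Class<?> cl)
   {
      // Declared serialVersionUID if present, else the computed fingerprint.
      return ObjectStreamClass.lookup(cl).getSerialVersionUID();
   }

   public static void main(String[] args)
   {
      System.out.println(uidOf(Employee.class)); // -1814239825517340645
   }
}

class Employee implements Serializable // version 1.1
{
   public static final long serialVersionUID
      = -1814239825517340645L;

   private String name;
   private double salary;
}
```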

If only the methods of the class change, then there is no problem with reading the new object data. However, if data fields change, then you may have problems. For example, the old file object may have more or fewer data fields than the one in the program, or the types of the data fields may be different. In that case, the object stream makes an effort to convert the stream object to the current version of the class.

The object stream compares the data fields of the current version of the class with the data fields of the version in the stream. Of course, the object stream considers only the nontransient and nonstatic data fields. If two fields have matching names but different types, then the object stream makes no effort to convert one type to the other; the objects are incompatible. If the object in the stream has data fields that are not present in the current version, then the object stream ignores the additional data. If the current version has data fields that are not present in the streamed object, the added fields are set to their default (null for objects, zero for numbers, and false for boolean values).

Here is an example. Suppose we have saved a number of employee records on disk, using the original version (1.0) of the class. Now we change the Employee class to version 2.0 by adding a data field called department. Figure 12-10 shows what happens when a 1.0 object is read into a program that uses 2.0 objects. The department field is set to null. Figure 12-11 shows the opposite scenario: a program using 1.0 objects reads a 2.0 object. The additional department field is ignored.

Figure 12-10. Reading an object with fewer data fields

Figure 12-11. Reading an object with more data fields

Is this process safe? It depends. Dropping a data field seems harmless: the recipient still has all the data that it knew how to manipulate. Setting a data field to null may not be so safe. Many classes work hard to initialize all data fields in all constructors to non-null values, so that the methods don't have to be prepared to handle null data. It is up to the class designer to implement additional code in the readObject method to fix version incompatibilities or to make sure the methods are robust enough to handle null data.
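As a sketch of such a fix, suppose version 2.0 of Employee adds a department field and its methods expect it to be non-null. The readObject method below calls defaultReadObject and then replaces a missing department with an empty string. Producing a genuine 1.0 stream takes two versions of the class, so the driver simulates one by serializing an object whose department is null; the class, the placeholder serialVersionUID, and the empty-string default are all assumptions for illustration.

```java
import java.io.*;

// Hypothetical driver for the version-fixing readObject.
class VersionFixDemo
{
   public static void main(String[] args)
   {
      // Simulates an object read from a 1.0 stream: department is null.
      Employee old = roundTrip(new Employee("Harry Hacker", null));
      System.out.println("[" + old.getDepartment() + "]"); // []
   }

   static Employee roundTrip(Employee e)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bout))
         {
            out.writeObject(e);
         }
         try (ObjectInputStream in = new ObjectInputStream(
               new ByteArrayInputStream(bout.toByteArray())))
         {
            return (Employee) in.readObject();
         }
      }
      catch (IOException | ClassNotFoundException ex)
      {
         throw new IllegalStateException(ex);
      }
   }
}

// Hypothetical version 2.0 of the class.
class Employee implements Serializable
{
   public static final long serialVersionUID = 1L; // placeholder value

   public Employee(String name, String department)
   {
      this.name = name;
      this.department = department;
   }

   public String getName() { return name; }
   public String getDepartment() { return department; }

   private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException
   {
      in.defaultReadObject();
      // A 1.0 stream has no department field, so the object stream
      // leaves it null; substitute a non-null default.
      if (department == null) department = "";
   }

   private String name;
   private String department; // added in version 2.0
}
```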

Using Serialization for Cloning

There is an amusing (and, occasionally, very useful) use for the serialization mechanism: it gives you an easy way to clone an object provided the class is serializable. (Recall from Chapter 6 that you need to do a bit of work to allow an object to be cloned.)

To clone a serializable object, simply serialize it to an output stream, and then read it back in. The result is a new object that is a deep copy of the existing object. You don't have to write the object to a file; you can use a ByteArrayOutputStream to save the data into a byte array.

As Example 12-6 shows, to get a clone method for free, simply extend the SerialCloneable class, and you are done.

You should be aware that this method, although clever, will usually be much slower than a clone method that explicitly constructs a new object and copies or clones the data fields (as you saw in Chapter 6).

Example 12-6 SerialCloneTest.java

 import java.io.*;
 import java.util.*;

 public class SerialCloneTest
 {
    public static void main(String[] args)
    {
       Employee harry = new Employee("Harry Hacker", 35000,
          1989, 10, 1);
       // clone harry
       Employee harry2 = (Employee)harry.clone();

       // mutate harry
       harry.raiseSalary(10);

       // now harry and the clone are different
       System.out.println(harry);
       System.out.println(harry2);
    }
 }

 /**
    A class whose clone method uses serialization.
 */
 class SerialCloneable implements Cloneable, Serializable
 {
    public Object clone()
    {
       try
       {
          // save the object to a byte array
          ByteArrayOutputStream bout = new
             ByteArrayOutputStream();
          ObjectOutputStream out
             = new ObjectOutputStream(bout);
          out.writeObject(this);
          out.close();

          // read a clone of the object from the byte array
          ByteArrayInputStream bin = new
             ByteArrayInputStream(bout.toByteArray());
          ObjectInputStream in = new ObjectInputStream(bin);
          Object ret = in.readObject();
          in.close();

          return ret;
       }
       catch (Exception e)
       {
          return null;
       }
    }
 }

 /**
    The familiar Employee class, redefined to extend the
    SerialCloneable class.
 */
 class Employee extends SerialCloneable
 {
    public Employee(String n, double s,
       int year, int month, int day)
    {
       name = n;
       salary = s;
       GregorianCalendar calendar
          = new GregorianCalendar(year, month - 1, day);
          // GregorianCalendar uses 0 = January
       hireDay = calendar.getTime();
    }

    public String getName()
    {
       return name;
    }

    public double getSalary()
    {
       return salary;
    }

    public Date getHireDay()
    {
       return hireDay;
    }

    public void raiseSalary(double byPercent)
    {
       double raise = salary * byPercent / 100;
       salary += raise;
    }

    public String toString()
    {
       return getClass().getName()
          + "[name=" + name
          + ",salary=" + salary
          + ",hireDay=" + hireDay
          + "]";
    }

    private String name;
    private double salary;
    private Date hireDay;
 }
