Certification Objective Serialization (Exam Objective 3.3) | SCJP Sun Certified Programmer for Java 5 Study Guide (Exam 310-055) (Certification Press)

Certification Objective: Serialization (Exam Objective 3.3)

3.3 Develop code that serializes and/or de-serializes objects using the following APIs from java.io: DataInputStream, DataOutputStream, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream, and Serializable.

Imagine you want to save the state of one or more objects. If Java didn't have serialization (as the earliest version did not), you'd have to use one of the I/O classes to write out the state of the instance variables of all the objects you want to save. The worst part would be trying to reconstruct new objects that were virtually identical to the objects you were trying to save. You'd need your own protocol for the way in which you wrote and restored the state of each object, or you could end up setting variables with the wrong values. For example, imagine you stored an object that has instance variables for height and weight. At the time you save the state of the object, you could write out the height and weight as two ints in a file, but the order in which you write them is crucial. It would be all too easy to re-create the object but mix up the height and weight values—using the saved height as the value for the new object's weight and vice versa.
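To make the ordering problem concrete, here is a minimal sketch of saving state by hand with DataOutputStream and DataInputStream. The class name ManualSaveDemo, its helper methods, the sample values, and the use of an in-memory byte array instead of a file are all assumptions made for illustration.

```java
import java.io.*;

class ManualSaveDemo {
  // Write height first, then weight, as two raw ints.
  static byte[] save(int height, int weight) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(height);
      out.writeInt(weight);
      out.close();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  // Read the two ints back in the order they appear in the stream.
  static int[] readInts(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      int first = in.readInt();
      int second = in.readInt();
      in.close();
      return new int[] { first, second };
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    int[] saved = readInts(save(180, 75));
    // Correct: the reader knows the first int is the height.
    System.out.println("height=" + saved[0] + " weight=" + saved[1]);
    // Bug: the reader assumes the opposite order, and nothing in the stream objects.
    System.out.println("height=" + saved[1] + " weight=" + saved[0]);
  }
}
```

The stream holds nothing but two ints, so the second println compiles and runs without complaint while quietly swapping the values. That is exactly the kind of home-grown protocol mistake serialization saves you from.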

Serialization lets you simply say "save this object and all of its instance variables." Actually it's a little more interesting than that, because you can add, "… unless I've explicitly marked a variable as transient, which means, don't include the transient variable's value as part of the object's serialized state."

Working with ObjectOutputStream and ObjectInputStream

The magic of basic serialization happens with just two methods: one to serialize objects and write them to a stream, and a second to read the stream and deserialize objects.

 ObjectOutputStream.writeObject()   // serialize and write
 ObjectInputStream.readObject()     // read and deserialize

The java.io.ObjectOutputStream and java.io.ObjectInputStream classes are considered to be higher-level classes in the java.io package, and as we learned earlier, that means that you'll wrap them around lower-level classes, such as java.io.FileOutputStream and java.io.FileInputStream. Here's a small program that creates a Cat object, serializes it, and then deserializes it:

 import java.io.*;
 class Cat implements Serializable { }   // 1
 public class SerializeCat {
   public static void main(String[] args) {
     Cat c = new Cat();  // 2
     try {
       FileOutputStream fs = new FileOutputStream("testSer.ser");
       ObjectOutputStream os = new ObjectOutputStream(fs);
       os.writeObject(c);   // 3
       os.close();
     } catch (Exception e) { e.printStackTrace(); }
     try {
       FileInputStream fis = new FileInputStream("testSer.ser");
       ObjectInputStream ois = new ObjectInputStream(fis);
       c = (Cat) ois.readObject();  // 4
       ois.close();
     } catch (Exception e) { e.printStackTrace(); }
   }
 }

Let's take a look at the key points in this example:

1. We declare that the Cat class implements the Serializable interface. Serializable is a marker interface; it has no methods to implement. (In the next several sections, we'll cover various rules about when you need to declare classes Serializable.)
2. We make a new Cat object, which as we know is Serializable.
3. We serialize the Cat object c by invoking the writeObject() method. It took a fair amount of preparation before we could actually serialize our Cat. First, we had to put all of our I/O-related code in a try/catch block. Next we had to create a FileOutputStream to write the object to. Then we wrapped the FileOutputStream in an ObjectOutputStream, which is the class that has the magic serialization method that we need. Remember that the invocation of writeObject() performs two tasks: it serializes the object, and then it writes the serialized object to a file.
4. We deserialize the Cat object by invoking the readObject() method. The readObject() method returns an Object, so we have to cast the deserialized object back to a Cat. Again, we had to go through the typical I/O hoops to set this up.

This is a bare-bones example of serialization in action. Over the next set of pages we'll look at some of the more complex issues that are associated with serialization.

Object Graphs

What does it really mean to save an object? If the instance variables are all primitive types, it's pretty straightforward. But what if the instance variables are themselves references to objects? What gets saved? Clearly in Java it wouldn't make any sense to save the actual value of a reference variable, because the value of a Java reference has meaning only within the context of a single instance of a JVM. In other words, if you tried to restore the object in another instance of the JVM, even running on the same computer on which the object was originally serialized, the reference would be useless.

But what about the object that the reference refers to? Look at this class:

 class Dog {
   private Collar theCollar;
   private int dogSize;
   public Dog(Collar collar, int size) {
     theCollar = collar;
     dogSize = size;
   }
   public Collar getCollar() { return theCollar; }
 }
 class Collar {
   private int collarSize;
   public Collar(int size) { collarSize = size; }
   public int getCollarSize() { return collarSize; }
 }

Now make a dog… First, you make a Collar for the Dog:

 Collar c = new Collar(3);

Then make a new Dog, passing it the Collar:

 Dog d = new Dog(c, 8);

Now what happens if you save the Dog? If the goal is to save and then restore a Dog, and the restored Dog is an exact duplicate of the Dog that was saved, then the Dog needs a Collar that is an exact duplicate of the Dog's Collar at the time the Dog was saved. That means both the Dog and the Collar should be saved.

And what if the Collar itself had references to other objects—like perhaps a Color object? This gets quite complicated very quickly. If it were up to the programmer to know the internal structure of each object the Dog referred to, so that the programmer could be sure to save all the state of all those objects…whew. That would be a nightmare with even the simplest of objects.

Fortunately, the Java serialization mechanism takes care of all of this. When you serialize an object, Java serialization takes care of saving that object's entire "object graph." That means a deep copy of everything the saved object needs to be restored. For example, if you serialize a Dog object, the Collar will be serialized automatically. And if the Collar class contained a reference to another object, THAT object would also be serialized, and so on. And the only object you have to worry about saving and restoring is the Dog. The other objects required to fully reconstruct that Dog are saved (and restored) automatically through serialization.

Remember, you do have to make a conscious choice to create objects that are serializable, by implementing the Serializable interface. If we want to save Dog objects, for example, we'll have to modify the Dog class as follows:

 class Dog implements Serializable {
   // the rest of the code as before
   // Serializable has no methods to implement
 }

And now we can save the Dog with the following code:

 import java.io.*;
 public class SerializeDog {
   public static void main(String[] args) {
     Collar c = new Collar(3);
     Dog d = new Dog(c, 8);
     try {
       FileOutputStream fs = new FileOutputStream("testSer.ser");
       ObjectOutputStream os = new ObjectOutputStream(fs);
       os.writeObject(d);
       os.close();
     } catch (Exception e) { e.printStackTrace(); }
   }
 }

But when we run this code we get a runtime exception something like this:

 java.io.NotSerializableException: Collar

What did we forget? The Collar class must ALSO be Serializable. If we modify the Collar class and make it serializable, then there's no problem:

 class Collar implements Serializable {
   // same
 }

Here's the complete listing:

 import java.io.*;
 public class SerializeDog {
   public static void main(String[] args) {
     Collar c = new Collar(3);
     Dog d = new Dog(c, 5);
     System.out.println("before: collar size is "
                        + d.getCollar().getCollarSize());
     try {
       FileOutputStream fs = new FileOutputStream("testSer.ser");
       ObjectOutputStream os = new ObjectOutputStream(fs);
       os.writeObject(d);
       os.close();
     } catch (Exception e) { e.printStackTrace(); }
     try {
       FileInputStream fis = new FileInputStream("testSer.ser");
       ObjectInputStream ois = new ObjectInputStream(fis);
       d = (Dog) ois.readObject();
       ois.close();
     } catch (Exception e) { e.printStackTrace(); }
     System.out.println("after: collar size is "
                        + d.getCollar().getCollarSize());
   }
 }
 class Dog implements Serializable {
   private Collar theCollar;
   private int dogSize;
   public Dog(Collar collar, int size) {
     theCollar = collar;
     dogSize = size;
   }
   public Collar getCollar() { return theCollar; }
 }
 class Collar implements Serializable {
   private int collarSize;
   public Collar(int size) { collarSize = size; }
   public int getCollarSize() { return collarSize; }
 }

This produces the output:

 before: collar size is 3
 after: collar size is 3
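The listing above saves one Dog with one Collar. As a further sketch of what "object graph" means, the following variation serializes an array of two Dogs that share a single Collar. The class SharedCollarDemo, the trimmed-down Dog and Collar, and the in-memory byte array used in place of testSer.ser are assumptions for illustration only.

```java
import java.io.*;

class Collar implements Serializable {
  private int collarSize;
  Collar(int size) { collarSize = size; }
  int getCollarSize() { return collarSize; }
}

class Dog implements Serializable {
  private Collar theCollar;
  Dog(Collar collar) { theCollar = collar; }
  Collar getCollar() { return theCollar; }
}

class SharedCollarDemo {
  // Serialize an array of two Dogs that share one Collar, then read it back.
  static Dog[] roundTrip() {
    Collar shared = new Collar(3);
    Dog[] dogs = { new Dog(shared), new Dog(shared) };
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream os = new ObjectOutputStream(bytes);
      os.writeObject(dogs);   // one call saves the array, both Dogs, and the Collar
      os.close();
      ObjectInputStream ois =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      Dog[] restored = (Dog[]) ois.readObject();
      ois.close();
      return restored;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Dog[] restored = roundTrip();
    System.out.println("collar size: " + restored[0].getCollar().getCollarSize());
    System.out.println("same collar: "
        + (restored[0].getCollar() == restored[1].getCollar()));
  }
}
```

Running it prints a collar size of 3 and "same collar: true": within a single writeObject() call the shared Collar is written once, so both restored Dogs end up pointing at one restored Collar rather than two copies.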

But what would happen if we didn't have access to the Collar class source code? In other words, what if making the Collar class serializable was not an option? Are we stuck with a non-serializable Dog?

Obviously we could subclass the Collar class, mark the subclass as Serializable, and then use the Collar subclass instead of the Collar class. But that's not always an option either for several potential reasons:

The Collar class might be final, preventing subclassing.

OR
The Collar class might itself refer to other non-serializable objects, and without knowing the internal structure of Collar, you aren't able to make all these fixes (assuming you even wanted to TRY to go down that road).

OR
Subclassing is not an option for other reasons related to your design.

So…THEN what do you do if you want to save a Dog?

That's where the transient modifier comes in. If you mark the Dog's Collar instance variable with transient, then serialization will simply skip the Collar during serialization:

 class Dog implements Serializable {
   private transient Collar theCollar;  // add transient
   // the rest of the class as before
 }
 class Collar {             // no longer Serializable
   // same code
 }

Now we have a Serializable Dog, with a non-serializable Collar, but the Dog has marked the Collar transient; the output is

 before: collar size is 3
 Exception in thread "main" java.lang.NullPointerException

So NOW what can we do?

Using writeObject and readObject

Consider the problem: we have a Dog object we want to save. The Dog has a Collar, and the Collar has state that should also be saved as part of the Dog's state. But…the Collar is not Serializable, so we must mark it transient. That means when the Dog is deserialized, it comes back with a null Collar. What can we do to somehow make sure that when the Dog is deserialized, it gets a new Collar that matches the one the Dog had when the Dog was saved?

Java serialization has a special mechanism just for this: a set of private methods you can implement in your class that, if present, will be invoked automatically during serialization and deserialization. It's almost as if the methods were defined in the Serializable interface, except they aren't. They are part of a special callback contract the serialization system offers you that basically says, "If you (the programmer) have a pair of methods matching this exact signature (you'll see them in a moment), these methods will be called during the serialization/deserialization process."

These methods let you step into the middle of serialization and deserialization. So they're perfect for letting you solve the Dog/Collar problem: when a Dog is being saved, you can step into the middle of serialization and say, "By the way, I'd like to add the state of the Collar's variable (an int) to the stream when the Dog is serialized." You've manually added the state of the Collar to the Dog's serialized representation, even though the Collar itself is not saved.

Of course, you'll need to restore the Collar during deserialization by stepping into the middle and saying, "I'll read that extra int I saved to the Dog stream, and use it to create a new Collar, and then assign that new Collar to the Dog that's being deserialized." The two special methods you define must have signatures that look EXACTLY like this:

 private void writeObject(ObjectOutputStream os) {
   // your code for saving the Collar variables
 }
 private void readObject(ObjectInputStream is) {
   // your code to read the Collar state, create a new Collar,
   // and assign it to the Dog
 }

Yes, we're going to write methods that have the same name as the ones we've been calling! Where do these methods go? Let's change the Dog class:

 class Dog implements Serializable {
   transient private Collar theCollar; // we can't serialize this
   private int dogSize;
   public Dog(Collar collar, int size) {
     theCollar = collar;
     dogSize = size;
   }
   public Collar getCollar() { return theCollar; }
   private void writeObject(ObjectOutputStream os) {
     // throws IOException {                            // 1
     try {
       os.defaultWriteObject();                         // 2
       os.writeInt(theCollar.getCollarSize());          // 3
     } catch (Exception e) { e.printStackTrace(); }
   }
   private void readObject(ObjectInputStream is) {
     // throws IOException, ClassNotFoundException {    // 4
     try {
       is.defaultReadObject();                          // 5
       theCollar = new Collar(is.readInt());            // 6
     } catch (Exception e) { e.printStackTrace(); }
   }
 }

Let's take a look at the preceding code.

In our scenario we've agreed that, for whatever real-world reason, we can't serialize a Collar object, but we want to serialize a Dog. To do this we're going to implement writeObject() and readObject(). By implementing these two methods you're saying to the serialization mechanism: "If anyone invokes writeObject() or readObject() concerning a Dog object, use this code as part of the read and write."

1. Like most I/O-related methods, writeObject() can throw exceptions. You can declare them or handle them, but we recommend handling them.
2. When you invoke defaultWriteObject() from within writeObject(), you're telling the JVM to do the normal serialization process for this object. When implementing writeObject(), you will typically request the normal serialization process, and do some custom writing and reading too.
3. In this case we decided to write an extra int (the collar size) to the stream that's creating the serialized Dog. You can write extra stuff before and/or after you invoke defaultWriteObject(). BUT…when you read it back in, you have to read the extra stuff in the same order you wrote it.
4. Again, we chose to handle rather than declare the exceptions.
5. When it's time to deserialize, defaultReadObject() handles the normal deserialization you'd get if you didn't implement a readObject() method.
6. Finally, we build a new Collar object for the Dog using the collar size that we manually serialized. (We had to invoke readInt() after we invoked defaultReadObject() or the streamed data would be out of sync!)

Remember, the most common reason to implement writeObject() and readObject() is when you have to save some part of an object's state manually. If you choose, you can write and read ALL of the state yourself, but that's very rare. So, when you want to do only a part of the serialization/deserialization yourself, you MUST invoke the defaultReadObject() and defaultWriteObject() methods to do the rest.
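For comparison, here is a self-contained sketch of the same Dog/Collar technique with the exceptions declared rather than handled, which is the other option mentioned for these methods. CustomSerializeDemo, the getDogSize() accessor, and the in-memory byte array are assumptions added for the example; they are not part of the original listing.

```java
import java.io.*;

class Collar {                          // not Serializable
  private int collarSize;
  Collar(int size) { collarSize = size; }
  int getCollarSize() { return collarSize; }
}

class Dog implements Serializable {
  private transient Collar theCollar;   // skipped by default serialization
  private int dogSize;
  Dog(Collar collar, int size) { theCollar = collar; dogSize = size; }
  Collar getCollar() { return theCollar; }
  int getDogSize() { return dogSize; }

  private void writeObject(ObjectOutputStream os) throws IOException {
    os.defaultWriteObject();                  // writes dogSize
    os.writeInt(theCollar.getCollarSize());   // extra int for the Collar
  }

  private void readObject(ObjectInputStream is)
      throws IOException, ClassNotFoundException {
    is.defaultReadObject();                   // restores dogSize
    theCollar = new Collar(is.readInt());     // read in the same order as written
  }
}

class CustomSerializeDemo {
  // Serialize to a byte array and read the Dog straight back.
  static Dog roundTrip(Dog d) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream os = new ObjectOutputStream(bytes);
      os.writeObject(d);
      os.close();
      ObjectInputStream ois =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      Dog restored = (Dog) ois.readObject();
      ois.close();
      return restored;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Dog restored = roundTrip(new Dog(new Collar(3), 8));
    System.out.println("after: collar size is "
        + restored.getCollar().getCollarSize());
  }
}
```

This prints "after: collar size is 3". With the exceptions declared, a failed read aborts deserialization instead of quietly returning a Dog with a null Collar; whether that is preferable depends on how your calling code wants to learn about the failure.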

Which brings up another question: why wouldn't all Java classes be serializable? Why isn't class Object serializable? There are some things in Java that simply cannot be serialized because they are runtime specific. Things like streams, threads, the runtime, and even some GUI classes (which are connected to the underlying OS) cannot be serialized. What is and is not serializable in the Java API is NOT part of the exam, but you'll need to keep these restrictions in mind if you're serializing complex objects.

How Inheritance Affects Serialization

Serialization is very cool, but in order to apply it effectively you're going to have to understand how your class's superclasses affect serialization.

Exam Watch

If a superclass is Serializable, then according to normal Java interface rules, all subclasses of that class automatically implement Serializable implicitly. In other words, a subclass of a class marked Serializable passes the IS-A test for Serializable, and thus can be saved without having to explicitly mark the subclass as Serializable. You simply cannot tell whether a class is or is not Serializable UNLESS you can see the class inheritance tree to see if any other superclasses implement Serializable. If the class does not explicitly extend any other class, and does not implement Serializable, then you know for CERTAIN that the class is not Serializable, because class Object does NOT implement Serializable.
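A quick way to convince yourself of the IS-A rule is an instanceof check. In this sketch the classes Animal, Dog, Rock, and IsSerializableDemo are hypothetical and exist only to show the three cases.

```java
import java.io.Serializable;

class Animal implements Serializable { }
class Dog extends Animal { }          // no "implements Serializable" here
class Rock { }                        // extends Object only

class IsSerializableDemo {
  // True if the object's class, or any class above it, implements Serializable.
  static boolean isSerializable(Object o) {
    return o instanceof Serializable;
  }

  public static void main(String[] args) {
    System.out.println("Dog:    " + isSerializable(new Dog()));
    System.out.println("Rock:   " + isSerializable(new Rock()));
    System.out.println("Object: " + isSerializable(new Object()));
  }
}
```

Dog reports true because it inherits the interface from Animal; Rock and Object report false.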

That brings up another key issue with serialization…what happens if a superclass is not marked Serializable, but the subclass is? Can the subclass still be serialized even if its superclass does not implement Serializable? Imagine this:

 class Animal { }
 class Dog extends Animal implements Serializable {
   // the rest of the Dog code
 }

Now you have a Serializable Dog class, with a non-Serializable superclass. This works! But there are potentially serious implications. To fully understand those implications, let's step back and look at the difference between an object that comes from deserialization vs. an object created using new. Remember, when an object is constructed using new (as opposed to being deserialized), the following things happen (in this order):

1. All instance variables are assigned default values.
2. The constructor is invoked, which immediately invokes the superclass constructor (or another overloaded constructor, until one of the overloaded constructors invokes the superclass constructor).
3. All superclass constructors complete.
4. Instance variables that are initialized as part of their declaration are assigned their initial value (as opposed to the default values they're given prior to the superclass constructors completing).
5. The constructor completes.

But these things do NOT happen when an object is deserialized. When an instance of a Serializable class is deserialized, the constructor does not run, and instance variables are NOT given their initially assigned values! Think about it: if the constructor were invoked, and/or instance variables were assigned the values given in their declarations, the object you're trying to restore would revert back to its original state, rather than coming back reflecting the changes in its state that happened sometime after it was created. For example, imagine you have a class that declares an instance variable and assigns it the int value 3, and includes a method that changes the instance variable value to 10:

 class Foo implements Serializable {
   int num = 3;
   void changeNum() { num = 10; }
 }

Obviously if you serialize a Foo instance after the changeNum() method runs, the value of the num variable should be 10. When the Foo instance is deserialized, you want the num variable to still be 10! You obviously don't want the initialization (in this case, the assignment of the value 3 to the variable num) to happen. Think of constructors and instance variable assignments together as part of one complete object initialization process (and in fact, they DO become one initialization method in the bytecode). The point is, when an object is deserialized we do NOT want any of the normal initialization to happen. We don't want the constructor to run, and we don't want the explicitly declared values to be assigned. We want only the values saved as part of the serialized state of the object to be reassigned.
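Here is a small sketch that checks both claims for Foo: the restored num is 10, and the Foo constructor is not invoked again. The explicit constructor, the static constructorCalls counter, the FooDemo class, and the in-memory byte array are assumptions added purely to make the behavior observable.

```java
import java.io.*;

class Foo implements Serializable {
  // Static counter, so it is not part of any instance's serialized state.
  static int constructorCalls = 0;
  int num = 3;
  Foo() { constructorCalls++; }
  void changeNum() { num = 10; }
}

class FooDemo {
  // Serialize to a byte array and read the object straight back.
  static Foo roundTrip(Foo f) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream os = new ObjectOutputStream(bytes);
      os.writeObject(f);
      os.close();
      ObjectInputStream ois =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      Foo restored = (Foo) ois.readObject();
      ois.close();
      return restored;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Foo f = new Foo();
    f.changeNum();
    Foo restored = roundTrip(f);
    System.out.println("num after deserialization: " + restored.num);
    System.out.println("constructor calls: " + Foo.constructorCalls);
  }
}
```

The program reports num as 10 and a constructor count of 1: the only constructor call is the original new Foo(), even though two Foo objects now exist.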

Of course if you have variables marked transient, they will not be restored to their original state (unless you implement readObject()), but will instead be given the default value for that data type. In other words, even if you say

 class Bar implements Serializable {
   transient int x = 42;
 }

when the Bar instance is deserialized, the variable x will be set to a value of 0. Object references marked transient will always be reset to null, regardless of whether they were initialized at the time of declaration in the class.
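The following sketch extends Bar to show all three cases side by side. The extra fields label and y, the TransientDemo class, and the in-memory byte array are assumptions added for illustration.

```java
import java.io.*;

class Bar implements Serializable {
  transient int x = 42;            // transient primitive
  transient String label = "bar";  // transient object reference
  int y = 7;                       // ordinary instance variable
}

class TransientDemo {
  // Serialize to a byte array and read the object straight back.
  static Bar roundTrip(Bar b) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream os = new ObjectOutputStream(bytes);
      os.writeObject(b);
      os.close();
      ObjectInputStream ois =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      Bar restored = (Bar) ois.readObject();
      ois.close();
      return restored;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Bar b = new Bar();
    b.y = 99;                      // change state after construction
    Bar restored = roundTrip(b);
    System.out.println("x = " + restored.x);
    System.out.println("label = " + restored.label);
    System.out.println("y = " + restored.y);
  }
}
```

The restored object has x = 0 and label = null (the declared initializers 42 and "bar" are not reapplied), while y comes back as 99, the value it held when the object was saved.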

So, that's what happens when the object is deserialized, and the class of the serialized object directly extends Object, or has ONLY serializable classes in its inheritance tree. It gets a little trickier when the serializable class has one or more non-serializable superclasses. Getting back to our non-serializable Animal class with a serializable Dog subclass example:

 class Animal {
   public String name;
 }
 class Dog extends Animal implements Serializable {
   // the rest of the Dog code
 }

Because Animal is NOT serializable, any state maintained in the Animal class, even though the state variable is inherited by the Dog, isn't going to be restored with the Dog when it's deserialized! The reason is, the (unserialized) Animal part of the Dog is going to be reinitialized just as it would be if you were making a new Dog (as opposed to deserializing one). That means all the things that happen to an object during construction, will happen—but only to the Animal parts of a Dog. In other words, the instance variables from the Dog's class will be serialized and deserialized correctly, but the inherited variables from the non-serializable Animal superclass will come back with their default/initially assigned values rather than the values they had at the time of serialization.

If you are a serializable class, but your superclass is NOT serializable, then any instance variables you INHERIT from that superclass will be reset to the values they were given during the original construction of the object. This is because the non-serializable class constructor WILL run!

In fact, every constructor ABOVE the first non-serializable class constructor will also run, no matter what, because once the first super constructor is invoked, it of course invokes its super constructor and so on up the inheritance tree.

For the exam, you'll need to be able to recognize which variables will and will not be restored with the appropriate values when an object is deserialized, so be sure to study the following code example and the output:

 import java.io.*;
 class SuperNotSerial {
   public static void main(String[] args) {
     Dog d = new Dog(35, "Fido");
     System.out.println("before: " + d.name + " "
                        + d.weight);
     try {
       FileOutputStream fs = new FileOutputStream("testSer.ser");
       ObjectOutputStream os = new ObjectOutputStream(fs);
       os.writeObject(d);
       os.close();
     } catch (Exception e) { e.printStackTrace(); }
     try {
       FileInputStream fis = new FileInputStream("testSer.ser");
       ObjectInputStream ois = new ObjectInputStream(fis);
       d = (Dog) ois.readObject();
       ois.close();
     } catch (Exception e) { e.printStackTrace(); }
     System.out.println("after:  " + d.name + " "
                        + d.weight);
   }
 }
 class Dog extends Animal implements Serializable {
   String name;
   Dog(int w, String n) {
     weight = w;          // inherited
     name = n;            // not inherited
   }
 }
 class Animal {           // not serializable !
   int weight = 42;
 }

which produces the output:

 before: Fido 35
 after:  Fido 42

The key here is that because Animal is not serializable, when the Dog was deserialized, the Animal constructor ran and reset the Dog's inherited weight variable.
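To see directly which constructors run, you can have each one leave a trace. In this sketch the static log field, the explicit Animal() constructor, and the ConstructorTraceDemo class are assumptions added for illustration; the rest mirrors the Dog/Animal pair above, with an in-memory byte array standing in for the file.

```java
import java.io.*;

class Animal {                         // not Serializable
  static StringBuilder log = new StringBuilder();   // constructor trace
  int weight = 42;
  Animal() { log.append("Animal() "); }
}

class Dog extends Animal implements Serializable {
  String name;
  Dog(int w, String n) {
    log.append("Dog() ");
    weight = w;
    name = n;
  }
}

class ConstructorTraceDemo {
  // Serialize to a byte array and read the Dog straight back.
  static Dog roundTrip(Dog d) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream os = new ObjectOutputStream(bytes);
      os.writeObject(d);
      os.close();
      ObjectInputStream ois =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      Dog restored = (Dog) ois.readObject();
      ois.close();
      return restored;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Dog d = new Dog(35, "Fido");
    System.out.println("new:         " + Animal.log);
    Animal.log.setLength(0);           // clear the trace
    Dog restored = roundTrip(d);
    System.out.println("deserialize: " + Animal.log);
    System.out.println(restored.name + " " + restored.weight);
  }
}
```

Construction with new logs both Animal() and Dog(); deserialization logs only Animal(), and the restored Dog reports "Fido 42". Note that this relies on Animal having a no-argument constructor the serialization runtime can call; without one, deserializing the Dog fails.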

Exam Watch

If you serialize a collection or an array, every element must be serializable! A single non-serializable element will cause serialization to fail. Note also that while the collection interfaces are not serializable, the concrete collection classes in the Java API are.
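As a rough illustration, this sketch tries to serialize two ArrayLists, one of which contains a single non-serializable element. The Collar class here is a stand-in, and CollectionDemo and canSerialize() are hypothetical names used only for the example.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

class Collar { }   // not Serializable

class CollectionDemo {
  // Returns true if the list serializes, false on NotSerializableException.
  static boolean canSerialize(List<Object> items) {
    try {
      ObjectOutputStream os = new ObjectOutputStream(new ByteArrayOutputStream());
      os.writeObject(items);
      os.close();
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    List<Object> good = new ArrayList<Object>();
    good.add("Fido");
    good.add(Integer.valueOf(35));
    List<Object> bad = new ArrayList<Object>(good);
    bad.add(new Collar());             // one non-serializable element
    System.out.println("good list: " + canSerialize(good));
    System.out.println("bad list:  " + canSerialize(bad));
  }
}
```

The first list serializes; the second fails with a NotSerializableException as soon as the Collar element is reached, even though every other element is fine.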

Serialization Is Not for Statics

Finally, you might notice that we've talked ONLY about instance variables, not static variables. Should static variables be saved as part of the object's state? Isn't the state of a static variable at the time an object was serialized important? Yes and no. It might be important, but it isn't part of the instance's state at all. Remember, you should think of static variables purely as CLASS variables. They have nothing to do with individual instances. But serialization applies only to OBJECTS. And what happens if you deserialize three different Dog instances, all of which were serialized at different times, and all of which were saved when the value of a static variable in class Dog was different? Which instance would "win"? Which instance's static value would be used to replace the one currently in the one and only Dog class that's currently loaded? See the problem?

Static variables are NEVER saved as part of the object's state…because they do not belong to the object!
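A short sketch makes this visible: save a Dog, change a static variable, and restore the Dog. The dogCount field, the StaticDemo class, and its save() and restore() helpers are hypothetical names chosen for the example, and a byte array stands in for the file.

```java
import java.io.*;

class Dog implements Serializable {
  static int dogCount = 0;   // class variable: never written to the stream
  int size;                  // instance variable: saved and restored
  Dog(int size) { this.size = size; }
}

class StaticDemo {
  static byte[] save(Dog d) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream os = new ObjectOutputStream(bytes);
      os.writeObject(d);
      os.close();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  static Dog restore(byte[] data) {
    try {
      ObjectInputStream ois =
          new ObjectInputStream(new ByteArrayInputStream(data));
      Dog d = (Dog) ois.readObject();
      ois.close();
      return d;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Dog.dogCount = 1;
    byte[] saved = save(new Dog(8));
    Dog.dogCount = 99;                 // changed after the Dog was saved
    Dog restored = restore(saved);
    System.out.println("size: " + restored.size);
    System.out.println("dogCount: " + Dog.dogCount);
  }
}
```

The restored Dog has size 8, but dogCount is still 99: deserialization never touches the class variable, so it keeps whatever value the currently loaded class has.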

Exam Watch

What about DataInputStream and DataOutputStream? They're in the objectives! It turns out that while the exam was being created, it was decided that those two classes wouldn't be on the exam after all, but someone forgot to remove them from the objectives! So you get a break. That's one less thing you'll have to worry about.

On the Job

As simple as serialization code is to write, versioning problems can occur in the real world. If you save a Dog object using one version of the class, but attempt to deserialize it using a newer, different version of the class, deserialization might fail. See the Java API for details about versioning issues and solutions.
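One tool the Java API documents for this is an explicit serialVersionUID field: the serialization runtime compares the ID stored in the stream with the ID of the loaded class and throws an InvalidClassException when they differ. The sketch below only shows how to declare the field and read it back; the value 1L, the VersionDemo class, and the versionOf() helper are arbitrary choices for illustration, and this is not a complete versioning strategy.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class Dog implements Serializable {
  // Explicit version stamp; 1L is an arbitrary value chosen for this sketch.
  private static final long serialVersionUID = 1L;
  private int dogSize;
  Dog(int size) { dogSize = size; }
  int getDogSize() { return dogSize; }
}

class VersionDemo {
  // Ask the serialization runtime which version ID it associates with a class.
  static long versionOf(Class<?> c) {
    return ObjectStreamClass.lookup(c).getSerialVersionUID();
  }

  public static void main(String[] args) {
    System.out.println("Dog serialVersionUID: " + versionOf(Dog.class));
  }
}
```

If you leave the field out, the runtime computes an ID from the details of the class, so even a small change to the class can produce a different ID and make previously saved Dogs unreadable.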