2.2 JDK Binary Serialization | Core Java Data Objects

Serialization is the basic persistence support that is built into the Java language. It is called serialization because objects are written or read sequentially as a series of bytes. An object is serializable if its class implements the Serializable interface and all its non-transient members are serializable as well. When an object is serialized, the default algorithm traverses the closure of the object graph and writes all reachable objects to a stream. If a member of a class is tagged transient, the default algorithm skips this field, so its referenced object or content is not serialized.

When the stream is read back, an equivalent object graph is rebuilt, with transient members initialized to their default values (null for references, 0 or false for primitives). The Java compiler, however, does not check whether fields that are not tagged transient are actually serializable. For example, an ArrayList itself is serializable because it implements the Serializable interface, but because an instance of ArrayList might refer to any instance of java.lang.Object, the graph might still not be serializable at runtime. In that case, writing the object fails with a NotSerializableException.
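Both effects can be observed in a few lines. The following is a minimal sketch, not code from the original text: the Session class and the helper names toBytes, fromBytes, and failsToSerialize are hypothetical and exist only for illustration.

```java
import java.io.*;
import java.util.*;

class TransientDemo {
    // Hypothetical class for illustration only.
    static class Session implements Serializable {
        String user;                 // written by the default algorithm
        transient String password;   // skipped by the default algorithm
        transient int attempts;      // skipped as well

        Session(String user, String password, int attempts) {
            this.user = user;
            this.password = password;
            this.attempts = attempts;
        }
    }

    // Serializes an object graph into a byte array.
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bo = new ByteArrayOutputStream();
        try (ObjectOutputStream oo = new ObjectOutputStream(bo)) {
            oo.writeObject(o);
        }
        return bo.toByteArray();
    }

    // Rebuilds an object graph from a byte array.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream oi = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return oi.readObject();
        }
    }

    // Returns true if the graph turns out not to be serializable at runtime.
    static boolean failsToSerialize(Object o) throws IOException {
        try {
            toBytes(o);
            return false;
        } catch (NotSerializableException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        Session copy = (Session) fromBytes(toBytes(new Session("alice", "secret", 3)));
        System.out.println(copy.user);      // alice
        System.out.println(copy.password);  // null
        System.out.println(copy.attempts);  // 0

        // The compiler accepts this list, but it is not serializable at runtime.
        List<Object> list = new ArrayList<>();
        list.add(new Object());
        System.out.println(failsToSerialize(list)); // true
    }
}
```

The transient fields come back with their default values because the constructor of a serializable class is not run when the object is rebuilt from the stream.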

2.2.1 The serialization API

The Serializable interface has no declared methods. It is merely a tagging interface, although two methods are frequently used in conjunction with Serializable: readObject and writeObject. The following code copies a SerializableBook into a byte array and back into another SerializableBook:

    import java.io.*;

    class SerializableBook implements Serializable {
        String name;

        SerializableBook(String n) {
            name = n;
        }
    }

    public class SerialTest {
        public static void main(String[] args) throws Exception {
            SerializableBook object = new SerializableBook("foo");
            // create a buffer to write the object into:
            ByteArrayOutputStream bo = new ByteArrayOutputStream();
            // create a stream for serial object data:
            ObjectOutputStream oo = new ObjectOutputStream(bo);
            // write the object (and all its members recursively)
            oo.writeObject(object);
            // don't forget to close the stream:
            oo.close();
            // now let's get the object back from the stream:
            // get a new stream that reads the buffer of the
            // previous stream:
            ByteArrayInputStream bi = new ByteArrayInputStream(bo.toByteArray());
            // get a stream that can read objects:
            ObjectInputStream oi = new ObjectInputStream(bi);
            // cast the read object to our type:
            SerializableBook back = (SerializableBook) oi.readObject();
            // never ever forget to close a stream:
            oi.close();
        }
    }

Three important things can be seen here:

ObjectOutputStream and ObjectInputStream are the main workhorses of serialization.
A String is itself serializable and is written to the stream by the default algorithm. There is no code to read or write that member in the SerializableBook class.
The returned object has to be cast to the destination type. Although the ObjectInputStream "knows" which class to instantiate from the stream, readObject is a generic method that is declared to return a plain Object.

In the example code, ByteArrayOutputStream and ByteArrayInputStream may be replaced by any other Java stream implementation. This is what makes serialization a powerful component of the Java platform: any serializable object may be streamed through a pipe, a network connection, a file, or into a buffer. Other components, such as Remote Method Invocation (RMI), make heavy use of serialization. Serialization also works across platforms and is independent of the CPU byte order or other operating system dependencies.
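As a sketch of that interchangeability, the byte array streams can be swapped for file streams without touching the serialization calls. The Book class, the save and load helpers, and the temporary file name below are assumptions made for this illustration.

```java
import java.io.*;
import java.nio.file.*;

class FileStreamDemo {
    // Hypothetical class for illustration only.
    static class Book implements Serializable {
        String name;

        Book(String name) {
            this.name = name;
        }
    }

    // Writes an object graph to a file instead of an in-memory buffer.
    static void save(Object o, Path file) throws IOException {
        try (ObjectOutputStream oo = new ObjectOutputStream(Files.newOutputStream(file))) {
            oo.writeObject(o);
        }
    }

    // Reads the object graph back from the file.
    static Object load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream oi = new ObjectInputStream(Files.newInputStream(file))) {
            return oi.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("book", ".ser");
        try {
            save(new Book("foo"), file);
            Book back = (Book) load(file);
            System.out.println(back.name); // foo
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Only the underlying stream changes; the same pattern would apply to a socket stream.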

2.2.2 Versioning and serialization

One of the obstacles with serialization is version handling. When anything changes in the class layout (the class name, package name, inheritance, fields, or access modifiers), the new version of the class must be recognized by the serialization runtime library. This is done by a so-called serial version unique identifier (serial version UID), which, unless the class declares its own serialVersionUID field, is computed as a kind of hash code over the properties of the class. The default algorithm throws a StreamCorruptedException or an InvalidClassException if the serialized object data is incompatible with the class in memory. That is one reason why developers need to implement their own readObject and writeObject methods to handle different versions of a class. Here is an example of how to handle the versioning problem:

    import java.io.*;

    class Author implements Serializable {
        // Pinned so that later versions of the class can still read this
        // stream; a computed UID would change along with the fields.
        private static final long serialVersionUID = 1L;

        String name;

        Author(String n) {
            name = n;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            int version = 1;
            out.writeInt(version);
            out.writeUTF(name);
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            int version = in.readInt();
            if (version == 1) {
                name = in.readUTF();
            } else {
                throw new IOException("Invalid version: " + version);
            }
        }
    }

Now that a version is provided with every object in the stream, the readObject method can easily distinguish old and new objects. The second version of the Author class gets its name attribute split into first name and last name:

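    import java.io.*;

    class Author implements Serializable {
        // Same value as in version 1, so that old streams are still accepted.
        private static final long serialVersionUID = 1L;

        String firstName;
        String lastName;

        Author(String first, String last) {
            firstName = first;
            lastName = last;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            int version = 2;
            out.writeInt(version);
            out.writeUTF(firstName);
            out.writeUTF(lastName);
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            int version = in.readInt();
            if (version == 2) {
                firstName = in.readUTF();
                lastName = in.readUTF();
            } else if (version == 1) {
                // fallback:
                String name = in.readUTF();
                firstName = guessFirstName(name);
                lastName = guessLastName(name);
            } else {
                throw new IOException("Invalid version: " + version);
            }
        }

        // Placeholder heuristics (an assumption, not part of the original
        // example): split the old name at its last space.
        private static String guessFirstName(String name) {
            int i = name.lastIndexOf(' ');
            return i < 0 ? "" : name.substring(0, i);
        }

        private static String guessLastName(String name) {
            int i = name.lastIndexOf(' ');
            return i < 0 ? name : name.substring(i + 1);
        }
    }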
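The serial version UID that the runtime associates with a class can be inspected through ObjectStreamClass. The following minimal sketch uses two hypothetical classes, Pinned and Computed, to contrast an explicitly declared serialVersionUID with the computed default; the helper name uidOf is also an assumption.

```java
import java.io.*;

class SerialUidDemo {
    // Hypothetical class that declares its serial version UID explicitly.
    static class Pinned implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
    }

    // Hypothetical class that relies on the computed default UID.
    static class Computed implements Serializable {
        String name;
    }

    // Returns the serial version UID the runtime uses for a serializable class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Pinned.class));   // 1
        // A hash computed from the class layout; it changes when the layout does.
        System.out.println(uidOf(Computed.class));
    }
}
```

Because a handwritten readObject only runs once the UID check has passed, a version-aware readObject is usually combined with an explicitly declared serialVersionUID.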

2.2.3 When to use object serialization

An application should use serialization when simple data has to be saved persistently. Configuration data, such as values entered through dialog boxes or window positions in GUI applications, is a good candidate for serialization. When the application version changes, such settings may simply be ignored, so complex version handling may not be required. Another use case is RMI: an RMI method may take any object as a parameter if the object and its referenced attributes are serializable. The same applies to method return values. Complex version handling is required only for applications that must support incompatible client and server application versions.
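The "simply ignore it" strategy for configuration data might look like the following sketch. The WindowSettings class, its default values, and the loadOrDefault helper are hypothetical; the point is only that unreadable or incompatible data falls back to defaults instead of being migrated.

```java
import java.io.*;

class SettingsDemo {
    // Hypothetical settings class with illustrative defaults.
    static class WindowSettings implements Serializable {
        private static final long serialVersionUID = 1L;
        int x = 0;
        int y = 0;
        int width = 800;
        int height = 600;
    }

    // Serializes the settings into a byte array (a file would work the same way).
    static byte[] save(WindowSettings s) throws IOException {
        ByteArrayOutputStream bo = new ByteArrayOutputStream();
        try (ObjectOutputStream oo = new ObjectOutputStream(bo)) {
            oo.writeObject(s);
        }
        return bo.toByteArray();
    }

    // Returns the stored settings, or defaults if the data cannot be read
    // (corrupted stream, incompatible class version, unexpected type).
    static WindowSettings loadOrDefault(byte[] data) {
        try (ObjectInputStream oi = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (WindowSettings) oi.readObject();
        } catch (IOException | ClassNotFoundException | ClassCastException e) {
            return new WindowSettings();
        }
    }

    public static void main(String[] args) throws Exception {
        WindowSettings s = new WindowSettings();
        s.x = 120;
        s.width = 1024;
        WindowSettings restored = loadOrDefault(save(s));
        System.out.println(restored.x + " " + restored.width); // 120 1024

        // Unreadable data is ignored and replaced by defaults.
        WindowSettings fallback = loadOrDefault(new byte[] {1, 2, 3});
        System.out.println(fallback.width); // 800
    }
}
```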

2.2.4 When not to use object serialization

Serialization does, however, have serious limitations that exclude it as a candidate technology for storing even simple domain objects (e.g., clients, bills, issues, and so on) of real-world applications. Most notably, JDK serialization lacks the following:

Query facility: After objects are serialized into a stream, there is no way to ask the stream to return all objects that have fields of certain values. The readObject() method simply returns the next object in the stream, with all objects reachable from it.
Partial read or update: In other persistence solutions, partial reads, partial updates, and caching are "only" features that improve performance and memory usage, but their absence makes serialization useless for large quantities of data.
Lifecycle management: The deliberately simple serialization API (as seen above) has no notion of an object's "state," such as an object being "dirty." Objects exist in memory and are explicitly written to and read from dumb streams, and that is all. As mentioned, this is perfectly suitable for purposes such as RMI or persisting very simple configuration information, but not for domain data.
Concurrency and transactions: Serialization completely lacks the notion of ACID ^[1] transactions and is not suitable for applications with concurrent data access. It is, for example, impossible to have two threads write to or read from one and the same stream at the same time.

^[1] ACID is an acronym for Atomicity, Consistency, Isolation, Durability.
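The missing query facility can be made concrete with a short sketch. To find all clients in a given city, the application has to read every object from the stream and filter in memory. The Client class, the helper names, and the convention of writing an object count first are assumptions made for this illustration.

```java
import java.io.*;
import java.util.*;

class ScanDemo {
    // Hypothetical domain class for illustration only.
    static class Client implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String city;

        Client(String name, String city) {
            this.name = name;
            this.city = city;
        }
    }

    // Writes a count followed by all clients, one after the other.
    static byte[] writeAll(List<Client> clients) throws IOException {
        ByteArrayOutputStream bo = new ByteArrayOutputStream();
        try (ObjectOutputStream oo = new ObjectOutputStream(bo)) {
            oo.writeInt(clients.size());
            for (Client c : clients) {
                oo.writeObject(c);
            }
        }
        return bo.toByteArray();
    }

    // The only way to "query": deserialize every object and filter afterwards.
    static List<Client> findByCity(byte[] data, String city)
            throws IOException, ClassNotFoundException {
        List<Client> result = new ArrayList<>();
        try (ObjectInputStream oi = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = oi.readInt();
            for (int i = 0; i < count; i++) {
                Client c = (Client) oi.readObject();
                if (c.city.equals(city)) {
                    result.add(c);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = writeAll(Arrays.asList(
                new Client("Ann", "Berlin"),
                new Client("Bob", "Paris"),
                new Client("Cy", "Berlin")));
        for (Client c : findByCity(data, "Berlin")) {
            System.out.println(c.name); // Ann, then Cy
        }
    }
}
```

Every lookup costs a full pass over the stream and materializes every object, which is why this approach does not scale to real domain data.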

The next section examines how relational databases can be used with an object persistence framework as a data storage technology.