Customizing the Serialization Format

The default serialization procedure does not always produce the results you want. Most often, a nonserializable field like a Socket or a FileOutputStream needs to be excluded from serialization. Sometimes a class holds information in a nonserializable field, such as a Socket, that you nonetheless want to save: for example, the host the socket is connected to. Or perhaps a singleton object wants to verify that no other instance of itself exists in the virtual machine before it's reconstructed. Or perhaps an incompatible change to a class, such as replacing a Font field with three separate fields storing the font's name, style, and size, can be made compatible with a little programmer-supplied logic. Or perhaps you want to compress a large array of image data before writing it to disk. For these and many other reasons, you can customize the serialization process.

The simplest way to customize serialization is to declare certain fields transient. The values of transient fields are not written onto the underlying output stream when an instance of the class is serialized. However, this only excludes certain information from serialization. It doesn't help you change the format used to store the data, take action on deserialization, or ensure that no more than one instance of a singleton class is created.
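To see the effect, here's a minimal sketch that round-trips an object through an in-memory stream. Account and TransientDemo are hypothetical names, not classes from this chapter. The transient password field isn't written, so the copy comes back with that field's default value, null. The ordinary user field survives.

```java
import java.io.*;

// Minimal sketch: Account and TransientDemo are hypothetical names.
public class TransientDemo {

    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String user;
        // transient: skipped on output, so it reads back as null
        private transient String password;

        Account(String user, String password) {
            this.user = user;
            this.password = password;
        }

        String getUser() { return user; }
        String getPassword() { return password; }
    }

    // Writes an object to an in-memory stream and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account copy = (Account) roundTrip(new Account("alice", "s3cret"));
        System.out.println(copy.getUser());     // alice
        System.out.println(copy.getPassword()); // null
    }
}
```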

For more control over the details of your class's serialization, you can provide custom readObject( ) and writeObject( ) methods. These are private methods that the virtual machine uses to read and write the data for your class. This gives you complete control over how data in your class is written onto the underlying stream but still uses standard serialization techniques for all fields of the object's superclasses.

If you need even more control over the superclasses and everything else, you can implement the java.io.Externalizable interface, a subinterface of java.io.Serializable. When serializing an externalizable object, the virtual machine does almost nothing except identify the class. The class itself is completely responsible for reading and writing its state and its superclass's state in whatever format it chooses.

13.7.1. The readObject( ) and writeObject( ) Methods

By default, serialization takes place as previously described. When an object is passed to an ObjectOutput's writeObject( ) method, the ObjectOutput reads the data in the object and writes it onto the underlying output stream in a specified format. Data is written starting with the highest serializable superclass of the object and continuing down through the hierarchy. However, before the data of each class is written, the virtual machine checks to see if the class in question has methods with these two signatures:

private void writeObject(ObjectOutputStream out) throws IOException
private void readObject(ObjectInputStream in)
 throws IOException, ClassNotFoundException

(Actually, an ObjectOutput only checks to see if the object has a writeObject( ) method, and an ObjectInput only checks for a readObject( ) method, but it's rare to implement one of these methods without implementing the other.) If the appropriate method is present, it is invoked to serialize the fields of this class rather than writing them directly. The object stream still handles serialization for any superclass or subclass fields.
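The following sketch illustrates that division of labor using hypothetical Base and Derived classes. Base has no hooks, so the stream writes its fields in the default way. Derived's private writeObject( ) and readObject( ) handle only Derived's own data, here a label stored in uppercase. The restoredByHook flag exists purely to show that the hook ran.

```java
import java.io.*;

// Sketch with hypothetical classes: Base relies on default serialization,
// while Derived supplies its own private writeObject()/readObject() pair.
public class HookDemo {

    static class Base implements Serializable {
        private static final long serialVersionUID = 1L;
        int baseValue = 10;   // written and restored by the stream itself
    }

    static class Derived extends Base {
        private static final long serialVersionUID = 1L;
        transient String label = "derived";
        transient boolean restoredByHook;  // set only when readObject() runs

        // Invoked for Derived's portion of the object only;
        // Base's fields have already been written by this point.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.writeUTF(label.toUpperCase());  // a custom format for this class
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            label = in.readUTF();
            restoredByHook = true;
        }
    }

    // Writes an object to an in-memory stream and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Derived d = new Derived();
        d.baseValue = 42;
        d.label = "hello";
        Derived copy = (Derived) roundTrip(d);
        System.out.println(copy.baseValue);      // 42
        System.out.println(copy.label);          // HELLO
        System.out.println(copy.restoredByHook); // true
    }
}
```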

For example, let's return to the issue of making a SerializableZipFile. Previously it wasn't possible because the superclass, ZipFile, didn't have a no-argument constructor. In fact, because of this problem, no subclass of this class can be serializable. However, it is possible to use composition rather than inheritance to make our zip file serializable. Example 13-3 shows a SerializableZipFile class that does not extend java.util.zip.ZipFile. Instead, it stores a ZipFile object in a transient field in the class called zf. The zf field is initialized either in the constructor or in the readObject( ) method. Invocations of the normal ZipFile methods, like entries( ) or getInputStream( ), are merely passed along to the ZipFile field zf.

Example 13-3. SerializableZipFile

import java.io.*;
import java.util.*;
import java.util.zip.*;

public class SerializableZipFile implements Serializable {

  // transient: the ZipFile itself can't be serialized; only its name is saved
  private transient ZipFile zf;

  public SerializableZipFile(String filename) throws IOException {
    this.zf = new ZipFile(filename);
  }

  public SerializableZipFile(File file) throws IOException {
    this.zf = new ZipFile(file);
  }

  private void writeObject(ObjectOutputStream out) throws IOException {
    out.writeObject(zf.getName( ));
  }

  private void readObject(ObjectInputStream in)
   throws IOException, ClassNotFoundException {
    String filename = (String) in.readObject( );
    zf = new ZipFile(filename);
  }

  public ZipEntry getEntry(String name) {
    return zf.getEntry(name);
  }

  public InputStream getInputStream(ZipEntry entry) throws IOException {
    return zf.getInputStream(entry);
  }

  public String getName( ) {
    return zf.getName( );
  }

  public Enumeration<? extends ZipEntry> entries( ) {
    return zf.entries( );
  }

  public int size( ) {
    return zf.size( );
  }

  public void close( ) throws IOException {
    zf.close( );
  }

  public static void main(String[] args) {
    try {
      SerializableZipFile szf = new SerializableZipFile(args[0]);
      ByteArrayOutputStream bout = new ByteArrayOutputStream( );
      ObjectOutputStream oout = new ObjectOutputStream(bout);
      oout.writeObject(szf);
      oout.close( );
      System.out.println("Wrote object!");
      ByteArrayInputStream bin = new ByteArrayInputStream(bout.toByteArray( ));
      ObjectInputStream oin = new ObjectInputStream(bin);
      Object o = oin.readObject( );
      System.out.println("Read object!");
    }
    catch (Exception ex) {ex.printStackTrace( );}
  }
}

Let's look closer at the serialization parts of this program. What does it mean to serialize a ZipFile? Internally, a ZipFile object is a filename and a long integer that serves as a native file descriptor to interface with the native zlib library. File descriptors have no state that would make sense across multiple runs of the same program or from one machine to the next. This is why ZipFile is not itself declared serializable.

However, if you know the filename, you can create a new ZipFile object that is the same for all practical purposes.

This is the approach Example 13-3 takes. To serialize an object, the writeObject( ) method writes the filename onto the output stream. The readObject( ) method reads this name back in and recreates the object. When readObject( ) is invoked, the virtual machine creates a new SerializableZipFile object out of thin air. None of SerializableZipFile's constructors is invoked, so the zf field starts out null. Next, the private readObject( ) method of this object is called. The value of filename is read from the stream. Finally, a new ZipFile object is created from the filename and assigned to zf.

This scheme isn't perfect. In particular, the whole thing may come crashing down if the file that's referred to isn't present when the object is deserialized. This might happen if, for example, the file was deleted between the time the object was written and the time it was read. However, this will only result in an IOException, which the client programmer should be ready for in any case.

The main( ) method tests this scheme by creating a serializable zip file with a name passed in from the command line. Then the serializable zip file is serialized. Next the SerializableZipFile object is deserialized from the same byte array it was previously written into. Here's the result:

D:\JAVA>java SerializableZipFile test.zip
Wrote object!
Read object!

13.7.2. The defaultWriteObject( ) and defaultReadObject( ) Methods

Sometimes rather than changing the format of an object that's serialized, all you want to do is add some additional information, perhaps something that isn't normally serialized, like a static field. In this case, you can use ObjectOutputStream's defaultWriteObject( ) method to write the state of the object and then use ObjectInputStream's defaultReadObject( ) method to read the state of the object. After this is done, you can perform any custom work you need to do on serialization or deserialization.

public final void defaultReadObject( )
 throws IOException, ClassNotFoundException, NotActiveException
public final void defaultWriteObject( ) throws IOException
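Here's a minimal sketch that assumes a hypothetical Counter class tracking the number of instances created in a static field. Its writeObject( ) method calls defaultWriteObject( ) for the instance fields and then appends the static value. Its readObject( ) method reads both back in the same order.

```java
import java.io.*;

// Sketch: Counter and StaticFieldDemo are hypothetical. The static field
// 'created' is ignored by default serialization, so it is appended by hand.
public class StaticFieldDemo {

    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        static int created = 0;   // static: not part of the default format
        private String name;      // instance field: handled by defaultWriteObject()

        Counter(String name) {
            this.name = name;
            created++;
        }

        String getName() { return name; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();  // writes 'name'
            out.writeInt(created);     // extra data after the default fields
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();    // restores 'name'
            created = in.readInt();    // restores the static field as well
        }
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(obj);
        }
        return bout.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        new Counter("a");
        new Counter("b");
        byte[] bytes = serialize(new Counter("c"));  // created == 3 when written
        Counter.created = 0;                          // pretend this is a fresh VM
        Counter copy = (Counter) deserialize(bytes);
        System.out.println(copy.getName() + " " + Counter.created); // c 3
    }
}
```

Whether it makes sense to overwrite a static field every time an instance is read depends on the application. The sketch only shows the pattern: default data first, then extra data.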

For example, let's suppose a class that would otherwise be serializable contains a Socket field. Besides this field, assume it contains quite a few other complex fields, so that serializing it by hand, while possible, would be onerous. It might look something like this:

public class NetworkWindow extends Frame implements Serializable {
 private Socket theSocket;
 // several dozen other fields and methods
}

You could make this class fully serializable by merely declaring theSocket transient:

private transient Socket theSocket;

Let's assume you actually do want to restore the state of the socket when the object is deserialized. In this case, you can use private readObject( ) and writeObject( ) methods as in the last section. Inside them, call defaultReadObject( ) and defaultWriteObject( ) to handle all the normal, nontransient fields, and then handle the socket specifically. For example:

private void writeObject(ObjectOutputStream out) throws IOException {
  out.defaultWriteObject( );
  out.writeObject(theSocket.getInetAddress( ));
  out.writeInt(theSocket.getPort( ));
}

private void readObject(ObjectInputStream in)
 throws IOException, ClassNotFoundException {
  in.defaultReadObject( );
  InetAddress ia = (InetAddress) in.readObject( );
  int thePort = in.readInt( );
  theSocket = new Socket(ia, thePort);
}

It isn't even necessary to know what the other fields are to make this work. The only extra work that has to be done is for the transient fields. This technique applies far beyond this one example. It can be used any time you're happy with the default behavior and merely want to do additional things on serialization or deserialization. For instance, it can be used to set the values of static fields or to execute additional code when deserialization is complete. For example, let's suppose you have a Die class that must have a value between 1 and 6, as shown in Example 13-4.

Example 13-4. A six-sided die

import java.util.*;
import java.io.*;

public class Die implements Serializable {

  private int face = 1;
  Random shooter = new Random( );

  public Die(int face) {
    if (face < 1 || face > 6) throw new IllegalArgumentException( );
    this.face = face;
  }

  public final int getFace( ) {
    return this.face;
  }

  public void setFace(int face) {
    if (face < 1 || face > 6) throw new IllegalArgumentException( );
    this.face = face;
  }

  public int roll( ) {
    // nextInt(6) returns 0 through 5. (Math.abs(nextInt( )) % 6 can go
    // negative, because Math.abs(Integer.MIN_VALUE) is still negative.)
    this.face = shooter.nextInt(6) + 1;
    return this.face;
  }
}

Obviously, this class, simple as it is, goes to a lot of trouble to ensure that the die always has a value between 1 and 6. Every method that can possibly set the value of the private field face carefully checks to make sure the value is between 1 and 6. However, serialization provides a back door through which the value of face can be changed. Default serialization uses neither constructors nor setter methods; it accesses the private field directly. Thus it's possible for someone to manually edit the bytes of a serialized Die object so that the value of the face field is greater than 6 or less than 1. To plug the hole, you can provide a readObject( ) method that performs the necessary check:

private void readObject(ObjectInputStream in)
 throws IOException, ClassNotFoundException {
  in.defaultReadObject( );
  if (face < 1 || face > 6) {
    throw new InvalidObjectException("Illegal die value: " + this.face);
  }
}

In this example, the normal serialization format is perfectly acceptable, so that's completely handled by defaultReadObject( ). It's just that a little more work is required than merely restoring the fields of the object. If the deserialized object has an illegal value for face, an exception is thrown and the readObject( ) method in ObjectInputStream rethrows this exception instead of returning the object.

It's important to distinguish between this readObject( ) method, which is a private method in the Die class, and the public readObject( ) method in the ObjectInputStream class. The latter invokes the former.
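You can watch the check work without editing bytes by hand. The sketch below assumes a trimmed-down Die with no rolling. It also uses a hypothetical forge( ) helper that sets an illegal face value by reflection before serializing, which produces the same stream a tampered file would. Reading that stream back fails with an InvalidObjectException, while a legitimate die round-trips normally.

```java
import java.io.*;
import java.lang.reflect.Field;

// Sketch: a trimmed-down Die plus a hypothetical forge() helper that
// uses reflection in place of hand-editing the serialized bytes.
public class DieValidationDemo {

    static class Die implements Serializable {
        private static final long serialVersionUID = 1L;
        private int face = 1;

        Die(int face) { setFace(face); }

        int getFace() { return face; }

        void setFace(int face) {
            if (face < 1 || face > 6) throw new IllegalArgumentException();
            this.face = face;
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (face < 1 || face > 6) {
                throw new InvalidObjectException("Illegal die value: " + face);
            }
        }
    }

    // Bypasses setFace() the same way a tampered stream would.
    static Die forge(int badFace) throws ReflectiveOperationException {
        Die die = new Die(1);
        Field f = Die.class.getDeclaredField("face");
        f.setAccessible(true);
        f.setInt(die, badFace);
        return die;
    }

    // Writes an object to an in-memory stream and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(((Die) roundTrip(new Die(4))).getFace()); // 4
        try {
            roundTrip(forge(9));
        } catch (InvalidObjectException ex) {
            System.out.println(ex.getMessage()); // Illegal die value: 9
        }
    }
}
```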

13.7.3. The writeReplace( ) Method

Sometimes rather than customizing its serialization format, a class simply wants to replace an instance of itself with a different object. For example, if you were distributing a serialized object for a class you didn't expect all recipients to have, you might replace it with a more common superclass. For instance, you might want to replace a quicktime.io.QTFile object with a java.io.File object because Windows systems usually don't have QuickTime for Java installed. The writeReplace( ) method enables this. The signature is normally like this:

private Object writeReplace( ) throws ObjectStreamException;

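The access modifier may be public, protected, private, or absent. Any of them is allowed, but the modifier does affect inheritance: a private writeReplace( ) applies only to instances of the class that declares it, not to its subclasses. If a method with this signature is present, the stream writes the object this method returns in place of this object whenever this object is serialized. That holds whether the object is passed directly to writeObject( ) or reached through a reference from another object being serialized. When the stream is read back, the replacement must still be assignable to every field that refers to the original object, or deserialization fails with a ClassCastException. If the replacement's type isn't compatible, its class can define a readResolve( ) method that converts it back into something that is.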
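Here's a sketch of the QuickTime scenario with the QuickTime classes left out. FancyFile is a hypothetical subclass of java.io.File standing in for QTFile. Its writeReplace( ) method substitutes a plain File with the same path, so what comes back from the stream is a java.io.File, not a FancyFile.

```java
import java.io.*;

// Sketch: FancyFile is a hypothetical stand-in for a class like QTFile
// that recipients may not have. writeReplace() swaps in a plain File.
public class WriteReplaceDemo {

    static class FancyFile extends File {
        private static final long serialVersionUID = 1L;

        FancyFile(String path) { super(path); }

        // Called by ObjectOutputStream just before this object is written.
        private Object writeReplace() throws ObjectStreamException {
            return new File(getPath());
        }
    }

    // Writes an object to an in-memory stream and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object copy = roundTrip(new FancyFile("notes.txt"));
        System.out.println(copy.getClass().getName()); // java.io.File
    }
}
```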

13.7.4. The readResolve( ) Method

The readResolve( ) method allows you to read one object from a stream but replace it with a different object. The signature of the method is:

private Object readResolve( ) throws ObjectStreamException

As with writeReplace( ), whether the access modifier is public, protected, or private doesn't matter. You can return any type you like from this method, but it has to be able to substitute for the type read from the stream in the appropriate place in the object graph.

The classic use case for readResolve( ) is maintaining the uniqueness of singleton or typesafe enum objects. For instance, consider a serializable singleton such as Example 13-5.

Example 13-5. A Serializable Singleton class

import java.io.Serializable;
public class SerializableSingleton implements Serializable {
 public final static SerializableSingleton INSTANCE
 = new SerializableSingleton( );
 private SerializableSingleton( ) {}
}

By serializing the instance of this class and then deserializing it, one can create a new instance despite the private constructor because serialization doesn't rely on constructors. To fix this, you have to make sure that whenever the class is deserialized, the new object is replaced by the genuine single instance. This is easy to accomplish by adding this readResolve( ) method:

 private Object readResolve( ){
 return INSTANCE;
 }
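The following sketch checks this. It pairs a hypothetical PlainSingleton that lacks readResolve( ) with a copy of SerializableSingleton that has it. Comparing each deserialized object against INSTANCE with == shows which one preserves the singleton's identity.

```java
import java.io.*;

// Sketch contrasting a singleton without readResolve() (PlainSingleton,
// hypothetical) with one that has it (SerializableSingleton).
public class SingletonDemo {

    static class PlainSingleton implements Serializable {
        private static final long serialVersionUID = 1L;
        static final PlainSingleton INSTANCE = new PlainSingleton();
        private PlainSingleton() {}
    }

    static class SerializableSingleton implements Serializable {
        private static final long serialVersionUID = 1L;
        static final SerializableSingleton INSTANCE = new SerializableSingleton();
        private SerializableSingleton() {}

        // Replaces the freshly deserialized copy with the one true instance.
        private Object readResolve() {
            return INSTANCE;
        }
    }

    // Writes an object to an in-memory stream and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(PlainSingleton.INSTANCE) == PlainSingleton.INSTANCE);               // false
        System.out.println(roundTrip(SerializableSingleton.INSTANCE) == SerializableSingleton.INSTANCE); // true
    }
}
```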

13.7.5. serialPersistentFields

You can explicitly specify which fields should and should not be serialized by listing them in an array stored in a private static final field named serialPersistentFields. If such a field is present, only fields included in the array are serialized. All others are treated as if they were transient. In other words, transient marks fields not to serialize, while serialPersistentFields marks fields to serialize.

The components of the serialPersistentFields array are ObjectStreamField objects which are constructed using the name and the type of each field to serialize. For example, suppose you wanted the x-coordinate of a TwoDPoint to be serialized but not the y-coordinate. You could mark the y component transient like this:

public class TwoDPoint implements Serializable {
  private double x;
  private transient double y;
  // ...
}

or you could place the x field and not the y field in the serialPersistentFields array like this:

private static final ObjectStreamField[] serialPersistentFields
 = {new ObjectStreamField("x", double.class)};

The first argument to the ObjectStreamField constructor is the name of the field. The second is the type of the field given as a Class object. This is normally a class literal such as BigDecimal.class, Frame.class, int.class, double.class, or double[].class.
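A complete version of the serialPersistentFields variant might look like the following sketch. The constructor, getters, and PersistentFieldsDemo wrapper are assumptions added so it can run. Because only x is listed, y is never written and reads back as 0.0, even though it isn't declared transient.

```java
import java.io.*;

// Sketch: constructor, getters, and the demo wrapper are hypothetical.
public class PersistentFieldsDemo {

    static class TwoDPoint implements Serializable {
        private static final long serialVersionUID = 1L;
        // Only x is listed, so y is treated as if it were transient.
        private static final ObjectStreamField[] serialPersistentFields
            = {new ObjectStreamField("x", double.class)};

        private double x;
        private double y;

        TwoDPoint(double x, double y) {
            this.x = x;
            this.y = y;
        }

        double getX() { return x; }
        double getY() { return y; }
    }

    // Writes an object to an in-memory stream and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TwoDPoint copy = (TwoDPoint) roundTrip(new TwoDPoint(3.0, 4.0));
        System.out.println(copy.getX() + ", " + copy.getY()); // 3.0, 0.0
    }
}
```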

The next trick is to use serialPersistentFields to declare fields that don't actually exist in the class. The writeObject( ) method then writes these phantom fields, and the readObject( ) method reads them back in. Typically this is done to maintain backward compatibility with old serialized versions after the implementation has changed. It's also important when different clients may have different versions of the library.

For example, suppose the TwoDPoint class was modified to use polar coordinates instead of Cartesian coordinates. That is, it might look like this:

public class TwoDPoint implements Serializable {
  private double radius;
  private double angle;
  // ...
}

The serialPersistentFields array could still declare the x and y fields, even though they're no longer present in the class:

private static final ObjectStreamField[] serialPersistentFields = {
 new ObjectStreamField("x", double.class),
 new ObjectStreamField("y", double.class),
};

The writeObject( ) method converts the polar coordinates back to Cartesian coordinates and writes those fields. This is accomplished with the ObjectOutputStream's PutField object. (PutField is a nested class in ObjectOutputStream.) You get such an object by invoking the putFields( ) method on the ObjectOutputStream. (Confusingly, this method gets the PutField object. It does not put anything.) You add fields to the PutField object by passing the names and values to the put( ) method. Finally, you invoke the ObjectOutputStream's writeFields( ) method to write the fields onto the output stream. For example, this writeObject( ) method converts polar coordinates into Cartesian coordinates and writes them out as the values of the x and y pseudo-fields:

private void writeObject(ObjectOutputStream out) throws IOException {
 // Convert to Cartesian coordinates
 ObjectOutputStream.PutField fields = out.putFields( );
 fields.put("x", radius * Math.cos(angle));
 fields.put("y", radius * Math.sin(angle));
 out.writeFields( );
}

The readObject( ) method reverses the procedure using an ObjectInputStream's GetField object. (GetField is a nested class in ObjectInputStream.) You retrieve the GetField object by invoking the readFields( ) method on the ObjectInputStream. You then read the fields by passing the names and default values to the get( ) method. (If the field is missing from the input stream, get( ) returns the default value instead.) Finally, you store the values of the pseudo-fields you read from the stream into the object's real fields after performing any necessary conversions. For example, this readObject( ) method reads Cartesian coordinates as the values of the x and y pseudo-fields and converts them into polar coordinates that it stores in the radius and angle fields:

private void readObject(ObjectInputStream in)
 throws ClassNotFoundException, IOException {
 ObjectInputStream.GetField fields = in.readFields( );
 double x = fields.get("x", 0.0);
 double y = fields.get("y", 0.0);
 // Convert to polar coordinates
 radius = Math.sqrt(x*x + y*y);
 angle = Math.atan2(y, x);
}
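Putting the pieces together, here's a sketch of the whole polar TwoDPoint class. The constructor, getters, and PolarPointDemo wrapper are assumptions. The class descriptor lists only the phantom x and y fields, never radius, yet the radius and angle survive the round trip up to floating-point rounding.

```java
import java.io.*;

// Sketch: constructor, getters, and the demo wrapper are hypothetical.
public class PolarPointDemo {

    static class TwoDPoint implements Serializable {
        private static final long serialVersionUID = 1L;
        // Phantom fields: the stream format still uses Cartesian x and y.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("x", double.class),
            new ObjectStreamField("y", double.class),
        };

        private double radius;
        private double angle;

        TwoDPoint(double radius, double angle) {
            this.radius = radius;
            this.angle = angle;
        }

        double getRadius() { return radius; }
        double getAngle() { return angle; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            // Convert to Cartesian coordinates
            ObjectOutputStream.PutField fields = out.putFields();
            fields.put("x", radius * Math.cos(angle));
            fields.put("y", radius * Math.sin(angle));
            out.writeFields();
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            ObjectInputStream.GetField fields = in.readFields();
            double x = fields.get("x", 0.0);  // double literal: see below
            double y = fields.get("y", 0.0);
            // Convert to polar coordinates
            radius = Math.sqrt(x * x + y * y);
            angle = Math.atan2(y, x);
        }
    }

    // Writes an object to an in-memory stream and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TwoDPoint copy = (TwoDPoint) roundTrip(new TwoDPoint(2.0, Math.PI / 6));
        System.out.println(copy.getRadius() + " " + copy.getAngle());
        // The stream descriptor knows x and y, but not radius or angle
        ObjectStreamClass desc = ObjectStreamClass.lookup(TwoDPoint.class);
        System.out.println(desc.getField("x") != null);      // true
        System.out.println(desc.getField("radius") != null); // false
    }
}
```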

The advantage to using serialPersistentFields instead of merely customizing the readObject( ) and writeObject( ) methods is versioning. A class can be both forward and backward compatible as long as the SUIDs are the same, even if the old version did not have custom readObject( ) and writeObject( ) methods. If the old class had an explicit serialVersionUID field, just copy that into the new class. Otherwise, use the serialver tool on the old version of the class to determine its default SUID and then copy that value into the serialVersionUID field in the new version of the class.

The PutField.put( ) and GetField.get( ) methods are heavily overloaded to support all the Java primitive data types as well as objects. For instance, the get( ) method has these nine variants:

public abstract boolean get(String name, boolean value)
 throws IOException
public abstract byte get(String name, byte value)
 throws IOException
public abstract char get(String name, char value)
 throws IOException
public abstract short get(String name, short value)
 throws IOException
public abstract int get(String name, int value)
 throws IOException
public abstract long get(String name, long value)
 throws IOException
public abstract float get(String name, float value)
 throws IOException
public abstract double get(String name, double value)
 throws IOException
public abstract Object get(String name, Object value)
 throws IOException

The put( ) method is equally overloaded.

The object stream uses the type of the value argument to determine the type of the field. For instance, if the type of value is double, put( ) puts a double in the stream and get( ) looks for a double when reading the stream. The trouble starts when the type of the argument doesn't match the type of the field. For instance, I initially wrote my readObject( ) method like this:

 double x = fields.get("x", 0);
 double y = fields.get("y", 0);

I then proceeded to bang my head against the wall trying to figure out why Java kept throwing an IllegalArgumentException with the message "no such field". The problem was that the second argument to this method is an int, not a double. Therefore Java was trying to read a field named x (which I had) with a value of type int (which I didn't). Changing these lines to use a double literal fixed the problem:

 double x = fields.get("x", 0.0);
 double y = fields.get("y", 0.0);

About 99% of the time it's safe to use an int literal where a double is intended. This is one of the 1% of cases where it's not.
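To reproduce the problem, here's a sketch with a hypothetical Point class whose single double field is written by default serialization. A static flag, which exists only for this demonstration, switches readObject( ) between two calls. The int-default call selects get(String, int) and fails; the double-default call works. The exact exception message varies between Java versions, so the sketch prints it without predicting it.

```java
import java.io.*;

// Sketch reproducing the int-versus-double pitfall with a hypothetical
// Point class. The static flag exists only to switch between the buggy
// and the fixed call to GetField.get() for this demonstration.
public class GetFieldPitfallDemo {

    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        static boolean useIntDefault = false;
        private double x = 1.5;   // written by default serialization as a double

        double getX() { return x; }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            ObjectInputStream.GetField fields = in.readFields();
            if (useIntDefault) {
                // Selects get(String, int): asks for an int field named x,
                // which doesn't exist, so IllegalArgumentException is thrown
                x = fields.get("x", 0);
            } else {
                // Selects get(String, double): matches the double field
                x = fields.get("x", 0.0);
            }
        }
    }

    // Writes an object to an in-memory stream and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point.useIntDefault = false;
        System.out.println(((Point) roundTrip(new Point())).getX()); // 1.5
        Point.useIntDefault = true;
        try {
            roundTrip(new Point());
        } catch (IllegalArgumentException ex) {
            System.out.println("IllegalArgumentException: " + ex.getMessage());
        }
    }
}
```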

13.7.6. Preventing Serialization

On occasion, you need to prevent a normally serializable subclass from being serialized. You can prevent an object from being serialized, even though it or one of its superclasses implements Serializable, by throwing a NotSerializableException from writeObject( ). NotSerializableException is a subclass of java.io.ObjectStreamException, which is itself a kind of IOException:

public class NotSerializableException extends ObjectStreamException

For example:

private void writeObject(ObjectOutputStream out) throws IOException {
 throw new NotSerializableException( );
}
private void readObject(ObjectInputStream in) throws IOException {
 throw new NotSerializableException( );
}
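Here's a minimal sketch using hypothetical Parent and Unsendable classes. The exception thrown by the private writeObject( ) propagates out of ObjectOutputStream's writeObject( ), so a subclass can opt out of serialization even though it inherits Serializable.

```java
import java.io.*;

// Sketch: Parent and Unsendable are hypothetical. Unsendable inherits
// Serializable from Parent but refuses to be written or read.
public class PreventSerializationDemo {

    static class Parent implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 1;
    }

    static class Unsendable extends Parent {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream out) throws IOException {
            throw new NotSerializableException(getClass().getName());
        }

        private void readObject(ObjectInputStream in) throws IOException {
            throw new NotSerializableException(getClass().getName());
        }
    }

    // Returns false if the object stream rejects the object.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException ex) {
            return false;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Parent()));     // true
        System.out.println(canSerialize(new Unsendable())); // false
    }
}
```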

13.7.7. Externalizable

Sometimes customization requires you to manipulate the values stored for the superclass of an object as well as for the object's class. In these cases, you should implement the java.io.Externalizable interface instead of Serializable. Externalizable is a subinterface of Serializable:

public interface Externalizable extends Serializable

This interface declares two methods, readExternal( ) and writeExternal( ):

public void writeExternal(ObjectOutput out) throws IOException
public void readExternal(ObjectInput in)
 throws IOException, ClassNotFoundException

The implementation of these methods is completely responsible for saving the object's state, including the state stored in its superclasses. This is the primary difference between implementing Externalizable and providing private readObject( ) and writeObject( ) methods. Since some of the superclass's state may be stored in private or package-accessible fields that are not visible to the Externalizable object, saving and restoring can be a tricky proposition. Furthermore, externalizable objects are responsible for tracking their own versions; the virtual machine assumes that whatever version of the externalizable class is available when the object is deserialized is the correct one. It does not check the serialVersionUID field as it does for merely serializable objects. If you want to check for different versions of the class, you must write your own code to do the checks.
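Here's a minimal sketch of these responsibilities. It assumes a hypothetical Shape superclass that isn't serializable and keeps its color in a private field, and a Circle subclass that implements Externalizable. Circle can reach its superclass's state only through getColor( ) and setColor( ). It must also declare a public no-argument constructor, because deserialization creates the object with that constructor before calling readExternal( ).

```java
import java.io.*;

// Sketch: Shape and Circle are hypothetical. Shape isn't serializable and
// keeps its state private, but exposes a getter and setter that Circle's
// writeExternal()/readExternal() use to save and restore that state.
public class ExternalizableDemo {

    public static class Shape {
        private String color = "black";
        public String getColor() { return color; }
        public void setColor(String color) { this.color = color; }
    }

    public static class Circle extends Shape implements Externalizable {
        private double radius;

        // Required: deserialization calls the public no-argument
        // constructor first and then readExternal().
        public Circle() {}

        public Circle(String color, double radius) {
            setColor(color);
            this.radius = radius;
        }

        public double getRadius() { return radius; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(getColor());  // superclass state, via its getter
            out.writeDouble(radius);   // this class's own state
        }

        @Override
        public void readExternal(ObjectInput in)
                throws IOException, ClassNotFoundException {
            setColor(in.readUTF());    // same order as writeExternal()
            radius = in.readDouble();
        }
    }

    // Writes an object to an in-memory stream and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bout.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Circle copy = (Circle) roundTrip(new Circle("red", 2.5));
        System.out.println(copy.getColor() + " " + copy.getRadius()); // red 2.5
    }
}
```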

For example, suppose you want a list that can be serialized no matter what it contains, one that will never throw a NotSerializableException even if it contains objects that aren't serializable. You can do this by creating a subclass of ArrayList that implements Externalizable, as in Example 13-6. The writeExternal( ) method uses instanceof to test whether each element is or is not serializable before writing it onto the output. If the element does not implement Serializable, writeExternal( ) writes null in its place.

The key criterion for being able to use Externalizable is that there are enough getter and setter methods to read and write all necessary fields in the superclasses. If this isn't the case, often your only recourse is to use the Decorator pattern: wrap the original class inside a class you do have complete control over. This was the tack taken in Example 13-3 for SerializableZipFile.

Example 13-6. SerializableList

import java.util.*;
import java.io.*;
import java.net.*;

public class SerializableList extends ArrayList<Object>
 implements Externalizable {

  public void writeExternal(ObjectOutput out) throws IOException {
    out.writeInt(size( ));
    for (int i = 0; i < size( ); i++) {
      if (get(i) instanceof Serializable) {
        out.writeObject(get(i));
      }
      else {
        out.writeObject(null);
      }
    }
  }

  public void readExternal(ObjectInput in)
   throws IOException, ClassNotFoundException {
    int elementCount = in.readInt( );
    this.ensureCapacity(elementCount);
    for (int i = 0; i < elementCount; i++) {
      this.add(in.readObject( ));
    }
  }

  public static void main(String[] args) throws Exception {
    SerializableList list = new SerializableList( );
    list.add("Element 1");
    list.add(Integer.valueOf(9));
    list.add(new URL("http://www.oreilly.com/"));
    // not Serializable
    list.add(new Socket("www.oreilly.com", 80));
    list.add("Element 5");
    list.add(Integer.valueOf(9));
    list.add(new URL("http://www.oreilly.com/"));
    ByteArrayOutputStream bout = new ByteArrayOutputStream( );
    ObjectOutputStream temp = new ObjectOutputStream(bout);
    temp.writeObject(list);
    temp.close( );
    ByteArrayInputStream bin = new ByteArrayInputStream(bout.toByteArray( ));
    ObjectInputStream oin = new ObjectInputStream(bin);
    List<?> out = (List<?>) oin.readObject( );
    Iterator<?> iterator = out.iterator( );
    while (iterator.hasNext( )) {
      System.out.println(iterator.next( ));
    }
  }
}

One might quibble about the name; ExternalizableList may seem more accurate. However, from the perspective of a programmer using a class, it doesn't matter whether a class is serializable or externalizable. In either case, instances of the class are passed to the writeObject( ) method of an object output stream or read by the readObject( ) method of an object input stream. The difference between Serializable and Externalizable is hidden from the client.

The writeExternal( ) method first writes the number of elements onto the stream using writeInt( ). It then loops through all the elements in the list, testing each one with instanceof to see whether or not it's serializable. If the element is serializable, it's written with writeObject( ); otherwise, null is written instead. The readExternal( ) method reads the data back in. First, it reads the element count and ensures the list has at least that much capacity. It then adds each deserialized object (or null) to the list.

The main( ) method tests the program by serializing and deserializing a SerializableList that contains assorted serializable and nonserializable elements. Its output is:

D:\JAVA> java SerializableList
Element 1
9
http://www.oreilly.com/
null
Element 5
9
http://www.oreilly.com/

This isn't a perfect solution. The list may contain an object that implements Serializable but isn't serializable, for example, a hash table that contains a socket. However, this is probably the best you can do without more detailed knowledge of the classes of objects that will be written.
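One possible workaround, not part of the original example, is to replace the instanceof test with a trial run. Serialize each element into a throwaway in-memory buffer and treat any failure as a sign that the element can't be written. The sketch below assumes a hypothetical reallySerializable( ) helper, which writeExternal( ) would call instead of testing instanceof Serializable.

```java
import java.io.*;
import java.util.*;

// Sketch of a stricter check than instanceof Serializable: actually try
// to serialize the object into a throwaway buffer. TrialSerialization
// and reallySerializable() are hypothetical names.
public class TrialSerialization {

    static boolean reallySerializable(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException ex) {
            // ByteArrayOutputStream never throws IOException itself, so an
            // exception here (typically NotSerializableException) means some
            // object in the graph can't be serialized.
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        map.put("ok", "a string");
        map.put("bad", new Object());  // java.lang.Object isn't Serializable

        System.out.println(map instanceof Serializable); // true
        System.out.println(reallySerializable(map));     // false
        System.out.println(reallySerializable("text"));  // true
    }
}
```

The cost is that every element is serialized twice, once to test and once for real, which may matter for large lists.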
