Serialization in J2EE applications




Figure 6-1 conceptually describes the architecture of J2EE applications. For more information on J2EE refer to the J2EE Web page.

Figure 6-1: J2EE environment

In one common implementation of the J2EE architecture, the client is a browser such as Microsoft Internet Explorer. The browser invokes an HttpServlet on the middle-tier server, which in turn invokes methods on an Enterprise JavaBean (EJB) in the EJB container. WebSphere Application Server provides the Web and EJB containers, which in turn provide basic services to application components such as servlets and EJBs. WebSphere uses RMI over IIOP for remote method invocations on EJBs.

Remote method invocations require passing parameters to the remote method. These parameters can be primitive data types or complex object types of any size. In RMI, remote method calls pass parameters (including nonprimitive parameters such as arrays and objects) by value to the remote Java virtual machine (JVM). The parameter is converted into a byte stream representing a copy of the object in its current state, which is transmitted over a network connection as an argument for the remote method. The process of preparing a parameter for wire transfer is called marshalling. On the server side, the byte stream is unmarshalled into an object representation in memory.

Java object serialization is the process of converting and encoding objects, and the objects reachable from them, into a linear stream of bytes; deserialization is the complementary reconstruction of the objects from the stream. Objects to be saved in a stream may support either the Serializable or the Externalizable interface. The serialized form of Java objects must be able to identify and verify the Java class from which the contents of the object were saved and to restore the contents to a new instance. For Serializable objects, the stream includes sufficient information to restore the fields in the stream to a compatible version of the class. For Externalizable objects, the class is solely responsible for the external format of its contents.

Uses of serialization

Serialization can be used for lightweight persistence to the file system. Serialization is very useful in distributed applications for:

Object distribution: A distributed object is one that is instantiated in one JVM and accessible from other virtual machines. If the object is not copied but is accessed remotely through RMI, the arguments and results of the remote invocations must be marshalled. The java.rmi.MarshalledObject class, which implements the Serializable interface, holds an object in serialized form using the same semantics that apply to RMI arguments and results.

Exception handling: The java.lang.Throwable class implements the Serializable interface, and hence exceptions can be thrown from one JVM to another.

Sending messages: Java Message Service (JMS) allows the use of serialized objects in the javax.jms.ObjectMessage if the consumer of the message is another Java application.

State management: When high availability and failover are important for Web applications, serialized objects may be stored in persistent HttpSessions to manage the state of a user interaction.
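Two of the uses listed above can be tried locally without a remote JVM. The following is a minimal sketch, not WebSphere code: DistributionUsesDemo and its helper methods are hypothetical names, and an in-memory byte array stands in for the network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.rmi.MarshalledObject;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class DistributionUsesDemo {

    // Wraps a value in a MarshalledObject and unwraps it again. The result
    // is a deserialized copy, much like the copy a remote method receives.
    static <T> T marshalCopy(T value) {
        try {
            return new MarshalledObject<T>(value).get();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("marshalling failed", e);
        }
    }

    // Serializes a Throwable to bytes and reads it back, as would happen
    // when an exception crosses from one JVM to another.
    static Throwable copyThrowable(Throwable t) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(t);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Throwable) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> original = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> copy = marshalCopy(original);
        copy.add("c"); // changes the copy only; the original is untouched
        System.out.println(original.size() + " vs " + copy.size());

        Throwable t = copyThrowable(new IllegalArgumentException("bad input"));
        System.out.println(t.getClass().getName() + ": " + t.getMessage());
    }
}
```

Because the value is copied, changes made to it on the receiving side are not visible to the sender, which is the practical meaning of passing parameters by value.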

Using Java serialization

To use the serialization capability, a programmer only needs to declare that the classes whose objects must be serializable implement the java.io.Serializable interface. No new methods have to be written, because the Serializable interface does not declare any. A serializable class does not have to implement its own serialization behavior; the default implementation is provided by ObjectOutputStream.defaultWriteObject() and ObjectInputStream.defaultReadObject(). In principle, then, objects of any such class can be serialized without extra code. This is possible because the serialization mechanism automatically converts class hierarchies to metadata so that object instances can be serialized.

Serialization is implemented in the java.io package by two main types: ObjectOutput (implemented by ObjectOutputStream) and ObjectInput (implemented by ObjectInputStream). Ultimately, serialization and deserialization are performed by two methods:

void ObjectOutput.writeObject(Object obj): Serializes the object into bytes

Object ObjectInput.readObject(): Deserializes the bytes into a new object of the same class
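The two methods can be exercised with an in-memory round trip. This is a minimal sketch: RoundTripDemo and Account are hypothetical names, and a byte array stands in for a file or network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative only.
class RoundTripDemo {

    // Serializable is a marker interface, so no methods are required.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        final String owner;
        final int balance;

        Account(String owner, int balance) {
            this.owner = owner;
            this.balance = balance;
        }
    }

    // writeObject() encodes the object and everything reachable from it.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // readObject() rebuilds a new instance from the bytes.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account original = new Account("alice", 100);
        Account copy = (Account) deserialize(serialize(original));
        System.out.println(copy.owner + " " + copy.balance
                + " sameInstance=" + (copy == original));
    }
}
```

The deserialized Account carries the same field values as the original but is a distinct instance.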

The Serializable interface has a specialized subtype called Externalizable. The difference between serialization and externalization lies in who controls the serialized form. With Serializable, the runtime writes the fields; an Externalizable class must implement its own external encoding and is then solely responsible for the external format. It customizes serialization by implementing writeExternal(ObjectOutput out) and readExternal(ObjectInput in), and it must provide a public no-argument constructor, which is called during deserialization.
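A minimal sketch of an Externalizable class follows. ExternalizableDemo and Point are hypothetical names, and the two-integer format is just one possible encoding chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative only.
class ExternalizableDemo {

    public static class Point implements Externalizable {
        int x;
        int y;

        // Deserialization calls this public no-argument constructor
        // before it calls readExternal().
        public Point() {
        }

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // The class decides exactly which bytes represent its state.
        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        // Fields must be read back in the order they were written.
        @Override
        public void readExternal(ObjectInput in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }

    static Point roundTrip(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Point) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y);
    }
}
```

The caller still uses writeObject() and readObject(); only the encoding of the object's state moves into the class itself.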

Serialization algorithm

The serialization mechanism provided by the Java runtime environment stores class related metadata in the object byte stream. The metadata includes:

  • The serialVersionUID of the class

  • A Boolean indicating if writeObject and readObject have been implemented

  • Number of serializable fields

  • Descriptions of each field (name and type)

  • Data produced by ObjectOutputStream's annotateClass() method

  • Description of superclass if it is serializable.
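Part of this metadata can be inspected through java.io.ObjectStreamClass, the class descriptor used by the serialization runtime. The sketch below is illustrative only: DescriptorDemo, Customer, and describe() are hypothetical names, and the transient field is included to show that it does not appear among the serializable fields.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;

// Hypothetical demo class; the names are illustrative only.
class DescriptorDemo {

    static class Customer implements Serializable {
        private static final long serialVersionUID = 42L;
        String name;
        int age;
        transient String sessionToken; // excluded from the serialized form
    }

    // Builds a one-line summary of the class descriptor: the class name,
    // its serialVersionUID, and the name and type of each serializable field.
    static String describe(Class<?> type) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(type);
        if (desc == null) {
            return type.getName() + " is not serializable";
        }
        StringBuilder sb = new StringBuilder();
        sb.append(desc.getName()).append(" uid=").append(desc.getSerialVersionUID());
        for (ObjectStreamField field : desc.getFields()) {
            sb.append(' ').append(field.getName())
              .append(':').append(field.getType().getSimpleName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(Customer.class));
        System.out.println(describe(Object.class));
    }
}
```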

During a write operation, the serialization mechanism first writes the class metadata to the stream and then writes the instance data. ObjectOutputStream handles the manipulation of the stream. Its useProtocolVersion(int version) method selects the stream protocol used to write the data, either PROTOCOL_VERSION_1 or PROTOCOL_VERSION_2.

While writing objects to the stream with the writeObject() method, ObjectOutputStream maintains handles for the classes and instances it has already written. This avoids writing the same information multiple times and also allows circular references to be written without looping endlessly.
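The effect of these handles can be observed by writing the same instance twice and by serializing a cyclic object graph. This is a minimal sketch: HandleDemo, Node, writeAll(), and readAll() are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative only.
class HandleDemo {

    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String label;
        Node next;

        Node(String label) {
            this.label = label;
        }
    }

    // Writes every argument to one shared stream and returns the bytes.
    static byte[] writeAll(Object... objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object o : objects) {
                out.writeObject(o);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads 'count' objects back from one stream.
    static Object[] readAll(byte[] data, int count) {
        Object[] result = new Object[count];
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                result[i] = in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // The second write of the same instance is stored as a back-reference.
        Node shared = new Node("shared");
        Object[] twice = readAll(writeAll(shared, shared), 2);
        System.out.println("same instance after read: " + (twice[0] == twice[1]));
        int once = writeAll(shared).length;
        int both = writeAll(shared, shared).length;
        System.out.println("extra bytes for second write: " + (both - once));

        // A cycle (a -> b -> a) is written once and restored as a cycle.
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;
        Node copy = (Node) readAll(writeAll(a), 1)[0];
        System.out.println("cycle preserved: " + (copy.next.next == copy));
    }
}
```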

The readObject() method of ObjectInputStream in the remote JVM is complicated by the fact that this JVM may have a different version of the serialized class. ObjectInputStream reads (a) the descriptions of the classes in the byte stream and (b) the serialized data of the instance. It uses the class description both to interpret the data and to compare against its local class description. If the class description is compatible with the local description, it creates the instance; otherwise it throws an exception (java.io.InvalidClassException).

Serialization issues

Serialization is a generic marshalling algorithm with some customization capability. While relatively easy to use, serialization affects performance. To write all the information about a class into a stream, the defaultWriteObject() method uses reflection to discover the field values. It also uses reflection to determine whether writeObject() is implemented.

Serialization uses a verbose data format, necessary to reconstruct the object in a different JVM across the network. Every invocation results in the same class information being streamed to bytes. This becomes repetitive work with no value, and starts to make a difference as volumes increase.
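The overhead can be made visible by measuring stream sizes. The sketch below is illustrative only: StreamSizeDemo, Pair, and sizeInOneStream() are hypothetical names, and the exact byte counts depend on the class and field names, so none are quoted here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative only.
class StreamSizeDemo {

    // Carries only 8 bytes of actual data (two ints).
    static class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        final int a;
        final int b;

        Pair(int a, int b) {
            this.a = a;
            this.b = b;
        }
    }

    // Serializes all arguments into ONE stream and returns its size in bytes.
    static int sizeInOneStream(Object... objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object o : objects) {
                out.writeObject(o);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        // One object per stream: the class description is written every time.
        int single = sizeInOneStream(new Pair(1, 2));
        int separate = single + sizeInOneStream(new Pair(3, 4));
        // Two objects in one stream: the class description is written once.
        int shared = sizeInOneStream(new Pair(1, 2), new Pair(3, 4));

        System.out.println("raw data per object: 8 bytes");
        System.out.println("one object, one stream: " + single + " bytes");
        System.out.println("two objects, two streams: " + separate + " bytes");
        System.out.println("two objects, one stream: " + shared + " bytes");
    }
}
```

When each remote invocation uses a fresh stream, as in the two-stream case, the class description is paid for on every call.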

Serialization also poses a version control problem. If a class is instantiated, its object serialized to the file system, and the class later modified, the serialized object may no longer be compatible or usable. This problem is remedied by declaring a serialVersionUID in the class definition. Certain modifications, such as adding new fields to the class, are then tolerated. The flip side of this feature is that we could be serializing more data than the remote JVM can decipher.

The introductory explanation above implies that all objects are naturally serializable and that the default implementations are typically sufficient. This is not always the case, and the problem becomes apparent only at run time because classes whose objects do not serialize successfully still compile without errors. Typically the compiler flags an error when a class declares that it implements an interface but lacks the necessary code; Serializable declares no methods, so there is nothing for the compiler to check. The problem therefore shows up late in the development cycle, when the distributed mechanisms are exercised. Class loading and class visibility problems are also quite common when a remote JVM tries to deserialize nested objects within a serialized object's graph.
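The following sketch shows a class that compiles cleanly but fails when it is serialized. All names (RuntimeFailureDemo, Connection, Session, SafeSession) are hypothetical, and marking the field transient is shown as one common remedy, not the only one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative only.
class RuntimeFailureDemo {

    // Stands in for a resource that does NOT implement Serializable.
    static class Connection {
    }

    // Compiles without errors, but one of its fields cannot be serialized.
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        String user = "alice";
        Connection connection = new Connection();
    }

    // Same shape, but the problematic field is skipped by serialization.
    static class SafeSession implements Serializable {
        private static final long serialVersionUID = 1L;
        String user = "alice";
        transient Connection connection = new Connection();
    }

    // Attempts to serialize the object and reports what happened.
    static String trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return "ok";
        } catch (NotSerializableException e) {
            // The message names the class that could not be serialized.
            return "NotSerializableException: " + e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trySerialize(new Session()));
        System.out.println(trySerialize(new SafeSession()));
    }
}
```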

One way to improve serialization performance is to implement the Externalizable interface and take responsibility for the external data format. A class that implements Externalizable is also responsible for any and all fields defined in its superclasses. This is, however, a big investment, and maintaining the new methods is expensive whenever the data format changes.

Related research on improving serialization performance

One approach, adopted by Campadello et al. [1], is to use gzip to compress data streams. Another approach is to recognize class names by a common prefix; the prefix is inserted into the stream only once. These approaches add computation steps but reduce the amount of data sent over the network. Our experience with WebSphere applications is that they are usually CPU bound, and therefore the network is not the bottleneck.
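The general idea of compressing the object stream can be sketched with the standard java.util.zip classes. This is not the implementation described in [1]; GzipSerializationDemo and its methods are hypothetical names, and the sample data is invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; the names are illustrative only.
class GzipSerializationDemo {

    // Invented sample payload: many similar strings, which compress well.
    static ArrayList<String> sampleData(int count) {
        ArrayList<String> items = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            items.add("order-line-item-" + i);
        }
        return items;
    }

    // Serializes the object, optionally routing the bytes through gzip.
    static byte[] serialize(Object obj, boolean compress) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            OutputStream sink = compress ? new GZIPOutputStream(bytes) : bytes;
            // Closing the object stream also finishes the gzip trailer.
            try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
                out.writeObject(obj);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decompresses and deserializes bytes produced by serialize(obj, true).
    static Object deserializeCompressed(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> data = sampleData(1000);
        byte[] raw = serialize(data, false);
        byte[] packed = serialize(data, true);
        System.out.println("uncompressed: " + raw.length + " bytes");
        System.out.println("gzip: " + packed.length + " bytes");
        System.out.println("round trip equal: " + data.equals(deserializeCompressed(packed)));
    }
}
```

The smaller payload is bought with extra CPU time on both sides, which is the trade-off weighed in the paragraph above.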

Fabian Berg and D. Gannon [2] developed an object serialization framework that allows serialization to be implemented on a per-class basis; the rationale is that this reduces the use of Java's reflection capabilities.

Yet another approach, called Xstream object serialization and used by Philippsen and Haumacher [3], exploits the fact that object streams do not need to contain all the data that the verbose default serialization process writes. It also uses caching mechanisms and buffering schemes in object serialization.

The above mechanisms call for a significant investment in writing middleware, which may or may not give equivalent or expected returns for individual IT organizations.






High-Volume Web Sites Team - More about High-Volume Web Sites
Year: 2003
Pages: 117
