Serializing and Deserializing Data | Microsoft Visual J# .NET (Core Reference) (Pro-Developer)


The serialization architecture of the .NET Framework is highly customizable. The basic mechanism is simple, but at several points you can override the default behavior and extend it with your own code. In essence, an object is converted into a series of bytes, which is sent down a stream. The stream can be directed toward persistent storage (a file stream), sent over a network, or dispatched to some other destination. Whatever the ultimate target, eventually this stream of bytes must be reconstructed into a copy of the original object.

Formatting Data

The format of the byte stream emitted by the serialization process is governed by a formatter object. The .NET Framework Class Library supplies two formatters that you can use off the shelf: the BinaryFormatter class (located in the System.Runtime.Serialization.Formatters.Binary namespace) and the SoapFormatter class (located in the System.Runtime.Serialization.Formatters.Soap namespace). You can also create your own custom formatter objects by implementing the System.Runtime.Serialization.IFormatter interface. The BinaryFormatter and SoapFormatter classes both implement the IFormatter interface.

Implementing the IFormatter Interface

If you want to implement the IFormatter interface yourself, the preferred technique is to extend the System.Runtime.Serialization.Formatter class and override its methods and properties. The Formatter class is abstract, but it provides some helper methods that you can use to interact with the .NET Framework during the serialization and deserialization processes. You should provide your own implementation of the Serialize and Deserialize methods. The Serialize method is passed an output stream and an object; the method should write a serialized version of the object (which can be complex, containing subobjects) to the stream. Conversely, the Deserialize method is passed an input stream, and the method should extract the object data from this stream and use it to reconstitute the object. (Again, this can be a complex object.)

If you implement IFormatter in this way, you must also supply implementations of the following abstract methods inherited from the Formatter class, which are not actually part of the IFormatter interface: WriteArray , WriteBoolean , WriteByte , WriteChar , WriteDateTime , WriteDecimal , WriteDouble , WriteInt16 , WriteInt32 , WriteInt64 , WriteObjectRef , WriteSByte , WriteSingle , WriteTimeSpan , WriteUInt16 , WriteUInt32 , WriteUInt64 , and WriteValueType . This looks like a lot of work but is not as bad as it appears. All of these methods are similar, and their task is to write data of the appointed type (passed in as a parameter, together with the name of the data) in a serialized form to the output stream. You can use these methods when you implement Serialize .
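The per-type Write methods all follow the same pattern: record the member's name, then write its value in serialized form. The Java sketch below shows that pattern in miniature. NameValueWriter, NameValueReader, and the one-byte type tags are hypothetical and invented for illustration. They are not part of the .NET Formatter class or of the JDK, and a real formatter would also have to handle object graphs, references, and arrays.

```java
import java.io.*;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not a .NET or JDK API): each member is written as a
// (name, type tag, value) triple, loosely mirroring the per-type WriteXxx
// methods that a Formatter subclass must supply.
class NameValueWriter {
    private final DataOutputStream out;

    NameValueWriter(OutputStream stream) {
        out = new DataOutputStream(stream);
    }

    // Similar in spirit to WriteInt16: member name, tag 'S', 16-bit value.
    void writeInt16(String name, short value) throws IOException {
        out.writeUTF(name);
        out.writeByte('S');
        out.writeShort(value);
    }

    // A string-valued member, tagged 'T' for text.
    void writeString(String name, String value) throws IOException {
        out.writeUTF(name);
        out.writeByte('T');
        out.writeUTF(value);
    }

    void flush() throws IOException {
        out.flush();
    }
}

// The matching reader rebuilds the members in the order they were written.
class NameValueReader {
    private final DataInputStream in;

    NameValueReader(InputStream stream) {
        in = new DataInputStream(stream);
    }

    Map<String, Object> readAll() throws IOException {
        Map<String, Object> members = new LinkedHashMap<>();
        while (true) {
            String name;
            try {
                name = in.readUTF();
            } catch (EOFException end) {
                return members; // no more members in the stream
            }
            int tag = in.readByte();
            switch (tag) {
                case 'S': members.put(name, in.readShort()); break;
                case 'T': members.put(name, in.readUTF()); break;
                default: throw new StreamCorruptedException("unknown type tag " + tag);
            }
        }
    }
}

class NameValueDemo {
    // Writes two Cake-like members to a byte buffer and reads them back.
    static Map<String, Object> roundTrip() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            NameValueWriter writer = new NameValueWriter(buffer);
            writer.writeInt16("size", (short) 12);
            writer.writeString("message", "Happy Birthday");
            writer.flush();
            return new NameValueReader(
                    new ByteArrayInputStream(buffer.toByteArray())).readAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // {size=12, message=Happy Birthday}
    }
}
```

Writing the name alongside each value is what later allows a reader to match data to members by name instead of by position.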

You must also implement the Binder , Context , and SurrogateSelector properties (which are part of the IFormatter interface). Other helper classes are available in the System.Runtime.Serialization namespace. You'll learn more about binding, streaming contexts, and serialization surrogates later in this chapter.

The SoapFormatter class generates an XML stream that can contain simple and complex objects. This class is often used for describing parameters and return values, and forms a fundamental part of the infrastructure needed to support Web services. We'll examine it further in Part V of this book. For the time being, we'll concentrate on the BinaryFormatter class.

The formatter will check that the object to be serialized actually supports serialization. An object can be serialized if its class is marked with the SerializableAttribute (which is somewhat similar to implementing the java.io.Serializable tag interface in the JDK). A class can also control the serialization process by implementing the System.Runtime.Serialization.ISerializable interface, although it must still be marked with the SerializableAttribute. If the object implements the ISerializable interface, the formatter will call the object's GetObjectData method at the appropriate juncture (GetObjectData is the only method defined in the ISerializable interface). This method should add the data to be serialized to the SerializationInfo object passed to it, and the formatter then writes that data to the stream. If the object is simply marked with the SerializableAttribute, the formatter will use its own default mechanism for converting the object into a stream of bytes. Either way, the resulting stream can be sent to a file, over the network, or wherever!
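For readers coming from the JDK, the formatter's check has a direct counterpart. ObjectOutputStream rejects any object whose class does not implement java.io.Serializable and throws java.io.NotSerializableException. The sketch below demonstrates this in plain Java, because the .NET classes aren't available there. Cake and Candle are illustrative classes, not the book's samples.

```java
import java.io.*;

// JDK counterpart of the formatter's check: ObjectOutputStream refuses objects
// whose class does not implement the java.io.Serializable tag interface.
class MarkerCheckDemo {
    // Marked as serializable (the JDK analog of SerializableAttribute).
    static class Cake implements Serializable {
        private static final long serialVersionUID = 1L;
        short size = 12;
        String message = "Happy Birthday";
    }

    // Not marked: writing an instance fails.
    static class Candle {
        int height = 3;
    }

    // Returns true if the object could be written, false if it was rejected.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Cake()));   // true
        System.out.println(canSerialize(new Candle())); // false
    }
}
```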

The Serializer.jsl sample file (in the NETSerialization project) shows how to use a BinaryFormatter to store a Cake object (see the sample Cake.jsl in the Cake project, which has been added to the NETSerialization solution) to a file. A Cake object has four properties: filling, shape, size, and message. The filling, shape, and size properties are short values, and the message property is a String. The Cake class is marked with the SerializableAttribute:

 /** @attribute SerializableAttribute() */
 public class Cake
 {
     // ...
 }

The main method of the Serializer class creates a test Cake object and sets its properties:

 Cake cake = new Cake();
 cake.set_Filling(Cake.Fruit);
 cake.set_Shape(Cake.Round);
 cake.set_Size((short)12);
 cake.set_Message("Happy Birthday");

The main method then creates a BinaryFormatter object and a Stream for writing data to the file CakeInfo.bin:

 BinaryFormatter formatter = new BinaryFormatter();
 Stream stream = new FileStream("CakeInfo.bin", FileMode.Create,
     FileAccess.Write, FileShare.None);

The Serialize method of the IFormatter interface (which BinaryFormatter implements) serializes its second argument (the Cake object), sending the result down the stream specified by its first argument (the file stream):

 formatter.Serialize(stream, cake);
 stream.Close(); // flush the data and release the file

If you want to send a serialized Cake object over a network, the principle is exactly the same. The only difference is that you should use a network stream rather than a file stream. The System.Net.Sockets.TcpClient class provides the GetStream method, returning a NetworkStream object:

 TcpClient client = new TcpClient(...);
 formatter.Serialize(client.GetStream(), cake);

If you're using raw sockets or the UdpClient class, which send arrays of bytes, you can employ a System.IO.MemoryStream to serialize an object into a ubyte array:

 Socket sender = new Socket(...);
 // The array must be big enough to hold the serialized data.
 ubyte[] data = ...;
 MemoryStream memStream = new MemoryStream(data);
 // Serialize the object to the MemoryStream.
 // The result will be stored in the data array.
 formatter.Serialize(memStream, cake);
 memStream.Close();
 // Transmit the data
 sender.Send(data);
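In the JDK, the equivalent of the MemoryStream technique is a java.io.ByteArrayOutputStream wrapped in an ObjectOutputStream. ByteArrayOutputStream grows as data is written, so you don't have to size the array in advance. The sketch below assumes a simplified Cake class with only the size and message fields. The resulting array is what you would pass to a socket or a DatagramPacket.

```java
import java.io.*;

// JDK counterpart of serializing into a byte array with a MemoryStream.
class ToBytesDemo {
    // Simplified, illustrative Cake with two of the book's four properties.
    static class Cake implements Serializable {
        private static final long serialVersionUID = 1L;
        short size = 12;
        String message = "Happy Birthday";
    }

    // Serializes obj into a byte array suitable for sending over the network.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // taken after close, so everything is flushed
    }

    public static void main(String[] args) {
        byte[] data = toBytes(new Cake());
        // Every JDK serialization stream starts with the magic number 0xACED.
        System.out.printf("%d bytes, header %02X%02X%n", data.length, data[0], data[1]);
    }
}
```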

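Note that the automatic serialization mechanism requires that the class being serialized not only be designated as serializable but that all ancestor classes be serializable as well. (Native Java serialization is more lenient: a serializable class can extend a nonserializable one, provided that the superclass has an accessible no-argument constructor, but the superclass's fields are not saved.) This is not a problem for types such as the Cake class shown above because it descends from Object, which is also serializable. But be aware of this issue if you're implementing your own class hierarchy. The SerializableAttribute is not inherited, so each class in the hierarchy must be marked individually. If your class can be subclassed, mark it with the SerializableAttribute unless you want to prevent it and its subclasses from being serialized.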
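The more lenient Java rule can be seen in the following sketch. Baked is a hypothetical nonserializable base class. Its field is not written to the stream, and during deserialization the JDK runs its no-argument constructor, so the field reverts to the value that constructor assigns.

```java
import java.io.*;

// A Serializable class extending a nonserializable one: the superclass's
// state is not saved, and its no-arg constructor re-initializes it on read.
class SuperclassDemo {
    // Hypothetical nonserializable base class with an accessible no-arg constructor.
    static class Baked {
        int ovenTemp;

        Baked() {
            ovenTemp = 180;
        }
    }

    static class Cake extends Baked implements Serializable {
        private static final long serialVersionUID = 1L;
        String message = "Happy Birthday";
    }

    // Writes the cake to a byte array and reads a copy back.
    static Cake roundTrip(Cake cake) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(cake);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Cake) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Cake cake = new Cake();
        cake.ovenTemp = 220;              // superclass state: not serialized
        cake.message = "Congratulations"; // subclass state: serialized
        Cake copy = roundTrip(cake);
        System.out.println(copy.message);  // Congratulations
        System.out.println(copy.ovenTemp); // 180, reset by Baked()
    }
}
```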

Note

If you examine the CakeInfo.bin file generated by the Serializer class, you should notice that the static final member variables (the constants that define the cake fillings and shapes) are not serialized. This is because these variables belong to the class rather than to any individual object. Their values are specified in the class definition itself (stored with the metadata of the class), and they cannot be assigned to in code. When the Cake object is deserialized, these values are supplied by the class metadata and not by the serialization stream.

Deserialization

Deserialization is a matter of reading a binary stream and using it to reconstitute an object. When an object is serialized using a binary formatter, the resulting stream contains the name of the class, the identity of the assembly, and the name and value of every member of the object. When a stream is deserialized into an object, the binary formatter (which handles deserialization as well as serialization) must have access to the assembly defining the class specified in the stream so it can build an instance of the required object. The formatter can then populate the members of the object using the data in the stream.
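The JDK's default format records similar information. In this sketch, the raw bytes produced by ObjectOutputStream are decoded as ISO-8859-1 text purely so that the embedded ASCII names can be searched. The class name, the field names, and the string value all appear in the stream. The Cake class is an illustrative stand-in.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Shows that Java's default serialization stream names the class and each
// serialized field, much as the BinaryFormatter stream does.
class StreamContentsDemo {
    static class Cake implements Serializable {
        private static final long serialVersionUID = 1L;
        short size = 12;
        String message = "Happy Birthday";
    }

    // Serializes obj and returns the bytes as one char per byte.
    static String rawText(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // ISO-8859-1 maps each byte to one char, so ASCII names survive intact.
        return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String raw = rawText(new Cake());
        System.out.println(raw.contains("StreamContentsDemo$Cake")); // true
        System.out.println(raw.contains("message"));                 // true
        System.out.println(raw.contains("Happy Birthday"));          // true
    }
}
```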

It's worth noting that the formatter accesses the member variables directly and does not execute constructors or use object properties when it assigns the data values; this is for reasons of speed. The Deserializer class in the Deserializer.jsl sample file (in the NETDeserialization project) rebuilds a Cake object from the CakeInfo.bin file and displays its values. The method in the formatter that does the work is Deserialize. This method expects a Stream as its parameter, which it reads and uses to construct the object. The result of the Deserialize method is actually an Object, which you must cast appropriately:

 BinaryFormatter formatter = new BinaryFormatter();
 Stream stream = new FileStream("CakeInfo.bin", FileMode.Open,
     FileAccess.Read, FileShare.Read);
 Cake cake = (Cake)formatter.Deserialize(stream);
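The JDK treats constructors the same way: when a Serializable object is read back, its own constructor is not executed. The sketch below uses a hypothetical static counter to show that only the original object was constructed. It round-trips through a byte array rather than a file so that it stays self-contained.

```java
import java.io.*;

// Demonstrates that deserializing a Serializable object does not run its constructor.
class NoConstructorDemo {
    static int constructorCalls = 0; // hypothetical counter for illustration

    static class Cake implements Serializable {
        private static final long serialVersionUID = 1L;
        String message;

        Cake(String message) {
            constructorCalls++; // counts how often this constructor runs
            this.message = message;
        }
    }

    // Serializes the cake to a byte array and reads back a copy.
    static Cake copyOf(Cake cake) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(cake);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Cake) in.readObject(); // cast the returned Object
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Cake copy = copyOf(new Cake("Happy Birthday"));
        System.out.println(copy.message);     // Happy Birthday
        System.out.println(constructorCalls); // 1: only the original was constructed
    }
}
```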

As with the Serialize method, you can deserialize data that appears on almost any stream. For example, to read Cake data arriving on a TcpClient , you can use the following:

 TcpListener server = new TcpListener(...);
 server.Start();
 TcpClient client = server.AcceptTcpClient();
 Cake cake = (Cake)formatter.Deserialize(client.GetStream());

Alternatively, if you're using the low-level Sockets API or the UdpClient class, where the data arrives as an array of bytes, you can wrap the data inside a MemoryStream object and deserialize it:

 Socket receiver = new Socket(...);
 ubyte[] data = ...;
 receiver.Receive(data);
 MemoryStream memStream = new MemoryStream(data);
 Cake receivedCake = (Cake)formatter.Deserialize(memStream);

The principal advantages of using the BinaryFormatter class over the XML alternatives are speed and compactness. As mentioned earlier, the BinaryFormatter accesses the internals of objects directly and doesn't bother with niceties such as whether that data is public, private, or protected. This guarantees that all member variables in an object (except those marked with the NonSerializedAttribute, discussed later in this chapter) will be saved to the serialization stream and populated correctly when the object is deserialized. This sounds like an obvious requirement for serialization, but it has serious security implications: if you understand the format used by the serialization process, you can forge your own objects! This is even easier if you're using XML serialization to transmit data in XML format over the Web. For this and other reasons, which will be described in due course, the default XML serialization mechanisms supplied with .NET will read and populate only public member variables and members that are reachable through publicly accessible properties. Totally private data (members not directly accessible and not exposed through properties) will not be serialized.

As far as compactness is concerned, you've seen that the BinaryFormatter class does not store much information about the structure of the data itself; it just records the identity of the class and assembly in the serialized output stream. In contrast, the XML formatter builds an XML representation of the object being serialized, which is more verbose but more portable: any application that can consume XML can read and process this representation. Speed and compactness become more important as you serialize and deserialize larger and larger objects. You might not notice much difference in performance between the formatters when you serialize a single Cake object, but if you're serializing a collection of 10,000 of them, the distinction will become a lot more obvious.

Note

The format used by native Java serialization is even more compact than that used by the BinaryFormatter class, but it is less able to cope with versioning issues.

Versioning

Deserializing a stream into an object using a binary formatter relies on having the definition of the class available. The binary formatter uses the assembly and class name found in the serialized stream to locate the assembly required to instantiate an empty object of the appropriate type. The mechanism used for locating the assembly is the same as that used by the loader (as described in Chapter 2). The class can be contained in a local private assembly or in an assembly in the Global Assembly Cache (GAC). The binding policy used to redirect requests for one version of an assembly to another, and any codebases specified in the application configuration file (if it has one), will be applied. This means that it is possible to serialize an object using one version of its implementation and then attempt to deserialize it using another. This might or might not work, depending on the nature of the changes between the two versions.

Once an empty object has been created, the deserialization process will attempt to fill its members using the information specified in the serialization stream. This information comprises the name of each member, along with its type and value. The binary deserialization process is therefore not affected by the following modifications to the class:

A change in the order of the member variables of a class. (This is also true of native Java serialization performed using the JDK.)
A change in the type of any member variable, as long as it is possible to cast from the old type to the new (not true of Java serialization).
Changes to the names of any methods (not true of Java serialization).
The addition or removal of methods (not true of Java serialization).
Changes to the signatures of any methods and constructors (not true of Java serialization).

Note

A class can implement the System.IConvertible interface to define custom conversions to primitive common language runtime base types. Deserialization will exploit these conversions if it needs to.

Versioning in the JDK

For the JDK purist, you can indicate that a Java class is serialization-compatible with an older version. This allows you to add, remove, and change the signatures of methods in much the same way that you can with .NET serialization.

The serialization format used by native Java serialization identifies the class with a serial version unique ID (the serialVersionUID). By default, this is a 64-bit hash computed from the class name, its modifiers, the interfaces it implements, and its fields, constructors, and methods. If the structure of a class changes, it will almost certainly hash to a different value. When an object is deserialized, the JVM computes the hash for the local version of the class and compares it to the value in the serialized stream. If they're different, the JVM throws an exception (java.io.InvalidClassException).

However, you can override the generated serial version unique ID for a class with your own value by adding a static final long variable called serialVersionUID to the class:

 public class Cake implements java.io.Serializable
 {
     static final long serialVersionUID = 4832732748870872134L;
 }

The value should be the same as that of the version of the class you want to be compatible with. You can obtain the serial version of a class using the serialver utility supplied with the JDK:

 C:\>serialver Cake
 Cake: static final long serialVersionUID = 4832732748870872134L;
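If you'd rather query the value from code than run serialver, java.io.ObjectStreamClass exposes it. The sketch below reuses the ID shown above for an illustrative Cake class. It contrasts that class with Candle, a hypothetical class that declares no ID, for which the JDK computes the hash itself.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Programmatic counterpart of the serialver tool.
class SerialVersionDemo {
    static class Cake implements Serializable {
        // Explicit ID, copied from the serialver output shown above.
        static final long serialVersionUID = 4832732748870872134L;
        short size;
        String message;
    }

    static class Candle implements Serializable {
        int height; // no explicit ID: the JDK computes one from the class structure
    }

    // Returns the serialVersionUID the JDK will use for the given class.
    static long suidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(suidOf(Cake.class));   // 4832732748870872134
        System.out.println(suidOf(Candle.class)); // a computed hash
    }
}
```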

Note that if you force the serial version unique ID in this way, you must ensure that the collection of member variables in the class has not changed. Otherwise, you might have problems when deserializing: if the type of a variable has changed and the JDK cannot cast the old data to the new type, the class will be deemed incompatible and the JDK will throw an exception. Any new variables added to a class will be set to their default values (null, zero, or false, depending on their type), and the data for variables that are no longer present will be discarded.

Binary deserialization in .NET will permit the addition of new member variables, although they'll be left in an unassigned state after deserialization has occurred. (There will be no corresponding values in the serialized stream.) However, if you change the name of any member variable, deserialization will fail and throw an exception. Similarly, if you remove a member variable, deserialization will also fail, although if you know that a member has been removed since an object was serialized, you can take certain steps to recover data from the original stream. This involves performing selective deserialization by defining your own custom serialization/deserialization mechanism and handling the differences between the stream and the expected data manually. We'll look at custom serialization shortly.

Note

Strictly speaking, if you add a member variable to an existing serializable class, you should mark that variable with the System.NonSerializedAttribute . This will result in the new member being omitted from any future serialization and will maintain compatibility with code that might use serialized instances of older versions of this class. Of course, hardly anyone does this because they don't want to have to cope with the resulting lost state! Just be aware that once you've created and published a class, you can ensure absolute compatibility with existing code only by never adding or removing anything. We'll look at the NonSerializedAttribute later in this chapter.

By default, the serialization stream contains the version number, strong name, and culture of the assembly containing the serialized object, and this information is used to ensure that the correct version of the assembly is used when the stream is deserialized back into an object (notwithstanding any binding policy specified by the application configuration file). If you examine the CakeInfo.bin file generated by the Serializer class shown earlier, you'll see something similar to this embedded in the binary data. (Your version number and public key token might differ from those shown here.)

 ......Cake, Version=1.0.849.23238, Culture=neutral, PublicKeyToken=5cebc9b2f5e65f60......CakeInfo.Cake.....

Note

If the assembly defining the serialized object class is not signed, the PublicKeyToken will be null.

If you're absolutely certain that member variables will never change their names and member variables will never be added or removed, you can save some time (and space) and turn off this version checking before serializing or deserializing data. (But be careful!) The BinaryFormatter class has an AssemblyFormat property that you can set to the value FormatterAssemblyStyle.Simple. The only other option is FormatterAssemblyStyle.Full, which happens to be the default. The FormatterAssemblyStyle enumeration is defined in the System.Runtime.Serialization.Formatters namespace:

 BinaryFormatter formatter = ...;
 formatter.set_AssemblyFormat(FormatterAssemblyStyle.Simple);

When an object is serialized, the binary formatter will not emit any assembly version, strong name, or culture information, just the namespace and class name:

 CakeInfo.Cake...

When the stream is deserialized, the BinaryFormatter will use whatever version of the specified assembly happens to be available. This feature allows you to deploy the latest version of an assembly on a computer without worrying about which versions were used when any data was serialized.

The Serialization Binder

During deserialization, the formatter actually uses a serialization binder object to determine which assembly to load and which class to instantiate. A serialization binder is a class that extends the System.Runtime.Serialization.SerializationBinder abstract class. The default binding behavior is the one described in the previous section, but you can change the binding mechanism by implementing your own binder. For example, you might want finer control over which version of an assembly is used, or you might want to load a different class altogether. You can do this by extending the SerializationBinder class and overriding its BindToType method. This method takes two strings that the formatter retrieves from the serialization stream: the identity of the assembly (which includes the name, version, culture, and public key token, for example, "Cake, Version=1.0.849.23238, Culture=neutral, PublicKeyToken=5cebc9b2f5e65f60") and the name of the class (including its namespace). The BindToType method can examine and parse these parameters and return the type of object that the formatter should create (as a System.Type):

 public class CustomBinder extends SerializationBinder
 {
     public Type BindToType(String assemblyName, String typeName)
     {
         // Parse and examine the assembly name and type
         // If a different type should be used then...
         if (...)
         {
             // ...deserialize into a MyNameSpace.MyClass object
             // from the MyAssembly.dll assembly
             return Type.GetType("MyNameSpace.MyClass, MyAssembly");
         }
         else
         {
             // otherwise use the default type passed in
             return Type.GetType(typeName + ", " + assemblyName);
         }
     }
 }

To use a custom serialization binding, you must instantiate a binder object and attach it to the formatter being used through its Binder property:

 BinaryFormatter formatter = new BinaryFormatter();
 SerializationBinder binder = new CustomBinder();
 formatter.set_Binder(binder);
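The JDK has a rough counterpart to BindToType. ObjectInputStream calls its protected resolveClass method for each class descriptor it reads, and a subclass can examine the class name and decide what to return. This hook is more constrained than a .NET binder, because the JDK still checks the class it gets back against the descriptor in the stream (including its serialVersionUID). The sketch below uses the hook for a hypothetical allow-list policy, which also addresses the forged-object concern raised earlier. AllowListInputStream, BinderDemo, Cake, and Candle are illustrative names.

```java
import java.io.*;
import java.util.Collections;
import java.util.Set;

// Hypothetical policy: only classes named on an allow-list may be deserialized.
class AllowListInputStream extends ObjectInputStream {
    private final Set<String> allowed;

    AllowListInputStream(InputStream in, Set<String> allowed) throws IOException {
        super(in);
        this.allowed = allowed;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        // desc.getName() is the class name recorded in the stream.
        if (!allowed.contains(desc.getName())) {
            throw new InvalidClassException(desc.getName(), "not on the allow-list");
        }
        return super.resolveClass(desc); // default lookup for permitted classes
    }
}

class BinderDemo {
    static class Cake implements Serializable {
        private static final long serialVersionUID = 1L;
        short size = 12;
    }

    static class Candle implements Serializable {
        private static final long serialVersionUID = 1L;
        int height = 3;
    }

    static byte[] toBytes(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Returns the deserialized object, or null if its class was rejected.
    static Object readAllowed(byte[] data, Set<String> allowed) {
        try (ObjectInputStream in =
                 new AllowListInputStream(new ByteArrayInputStream(data), allowed)) {
            return in.readObject();
        } catch (InvalidClassException rejected) {
            return null;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> allowed = Collections.singleton(Cake.class.getName());
        System.out.println(readAllowed(toBytes(new Cake()), allowed) != null);   // true
        System.out.println(readAllowed(toBytes(new Candle()), allowed) != null); // false
    }
}
```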

Being Selective

You can perform selective serialization of member variables in a class. This is most useful for member variables that are used purely for calculations or that do not maintain meaningful state over time. An example could be a variable that refers to a file stream. A file stream is really just a reference to an internal structure managed by the operating system; it might be valid when an object is serialized, but it is likely to be invalid when the object is deserialized at some unspecified point in the future. (In general, this behavior is true of most objects that are accessed through handles.) Selective serialization is also useful if a member variable is of a type that is not itself serializable; this member can be omitted.

The simplest way to perform selective serialization is to use the NonSerializedAttribute class to tag members that you do not want to serialize, as shown below. This is somewhat analogous to marking a field as transient when you use the JDK.

 /** @attribute SerializableAttribute() */
 public class Widget
 {
     private int size;       // Serialized
     /** @attribute NonSerializedAttribute() */
     private FileStream fs;  // Not serialized
 }

You must prefix each member variable that you do not want to be serialized with this attribute. After deserialization, such a member variable will hold its default value (null or zero) rather than meaningful data, so it should be assigned before use.
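In Java, the transient modifier plays the role of NonSerializedAttribute. The sketch below marks a stand-in handle field as transient. Because java.lang.Object is not serializable, writing the Widget would fail without the modifier. After the round trip the field is null, because neither the constructor nor the field initializer runs during deserialization.

```java
import java.io.*;

// JDK analog of NonSerializedAttribute: transient fields are skipped.
class TransientDemo {
    static class Widget implements Serializable {
        private static final long serialVersionUID = 1L;
        int size = 10; // serialized
        // Stand-in for a file or socket handle. Object is not Serializable,
        // so without 'transient' writing a Widget would fail.
        transient Object handle = new Object();
    }

    // Writes the widget to a byte array and reads back a copy.
    static Widget copyOf(Widget widget) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(widget);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Widget) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Widget widget = new Widget();
        widget.size = 42;
        Widget copy = copyOf(widget);
        System.out.println(copy.size);           // 42
        System.out.println(copy.handle == null); // true: must be reassigned before use
    }
}
```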
