Item 36. Serialize XML with XML

XML is itself a fairly efficient serialization format. There's no need to use expensive binary object serialization on XML documents. Even custom-designed binary formats are generally larger, slower, and less robust than XML's plain text formats. (See Item 50.) Generic binary formats like Java's object serialization are much worse. It is much simpler, faster, and more effective to just write the text of the XML document onto a stream or into a string.
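As a minimal sketch of what "just write the text" can look like, the following uses the JAXP identity transform that ships with the JDK to turn a DOM tree back into a string. The class name XmlTextWriter and the sample document are assumptions made up for this illustration; other object models such as JDOM have their own outputter classes that play the same role.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
public class XmlTextWriter {

    // Parse XML text into an in-memory DOM tree.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    // Serialize the DOM tree as plain XML text using an identity transform.
    public static String toXml(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<greeting lang=\"en\">Hello</greeting>");
        System.out.println(toXml(doc));
    }
}
```

To write to a file or socket instead of a string, the same StreamResult can wrap an OutputStream.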

For example, I wrote a simple program to load the XML 1.0 specification into a JDOM document object and write it out again as both XML and a serialized object. The binary format was 537,480 bytes long. The original XML document was only 201,918 bytes. Including the full DTD (which is not necessary) adds another 57,135 bytes to the XML document. The total, 259,053 bytes, is still less than half the size of the serialized object.
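The shape of that experiment can be sketched with nothing but the standard library. This is not the original test program: the Node class below is a hypothetical stand-in for a document object model such as JDOM, and the generated tree is invented sample data, so the byte counts it prints will not match the figures quoted above and will vary with the input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SizeComparison {

    // Hypothetical stand-in for a document object model node (not JDOM).
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String name, String text) {
            this.name = name;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        // Write this node as XML text. Escaping is omitted for brevity,
        // so names and text must not contain markup characters.
        void writeXml(StringBuilder out) {
            out.append('<').append(name).append('>').append(text);
            for (Node child : children) {
                child.writeXml(out);
            }
            out.append("</").append(name).append('>');
        }
    }

    // Size in bytes of the tree written as UTF-8 XML text.
    static int xmlSize(Node root) {
        StringBuilder sb = new StringBuilder();
        root.writeXml(sb);
        return sb.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    // Size in bytes of the same tree written with Java object serialization.
    static int serializedSize(Node root) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Node root = new Node("spec", "");
        for (int i = 0; i < 100; i++) {
            root.add(new Node("p", "paragraph " + i));
        }
        System.out.println("XML text:          " + xmlSize(root) + " bytes");
        System.out.println("Serialized object: " + serializedSize(root) + " bytes");
    }
}
```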

If that's not good enough for you, you can always compress the serialized XML (Item 50), which makes XML even more space efficient. The gzipped binary object in this case is 117,443 bytes long. However, the gzipped XML is only 54,007 bytes long. Any way you slice it, text is more efficient. The best way to store or transmit XML data is as XML, or perhaps as compressed XML if size is very important. Object serialization buys you nothing.
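Compressing the XML text takes only a few lines with the java.util.zip classes. The sketch below is one possible way to do it; the GzipXml class name and the repeated sample paragraph are assumptions for illustration, and the compression ratio you see depends entirely on the document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, named for this example only.
public class GzipXml {

    // Encode the XML text as UTF-8 and gzip it.
    static byte[] compress(String xml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Reverse of compress(): gunzip and decode as UTF-8.
    static String decompress(byte[] data) throws IOException {
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(data))) {
            // readAllBytes() requires Java 9 or later.
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder xml = new StringBuilder("<doc>");
        for (int i = 0; i < 200; i++) {
            xml.append("<p>Some repetitive sample text.</p>");
        }
        xml.append("</doc>");

        byte[] packed = compress(xml.toString());
        System.out.println("Plain XML:   " + xml.length() + " characters");
        System.out.println("Gzipped XML: " + packed.length + " bytes");
        System.out.println("Round trip OK: "
            + decompress(packed).equals(xml.toString()));
    }
}
```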

The reverse is not necessarily true. In some circumstances, it may indeed make sense to transmit binary data using text-based XML formats instead of the native format. A number of developers who have written replacements for Java's object serialization using XML report significant savings of both time and space. These include the following:

  • Mark Wutka's JOX (http://www.wutka.com/jox.html)

  • Bill LaForge's Quick (http://qare.sourceforge.net/web/2001-12/products/quick/index.html)

  • Brendan Macmillan's JSX (http://www.csse.monash.edu.au/~bren/JSX/)

  • KOML, The Koala Object Markup Language (http://koala.ilog.fr/XML/serialization/)

  • CastorXML (http://castor.exolab.org/)

  • Sun's long-term persistence for JavaBeans (java.beans.XMLEncoder and java.beans.XMLDecoder)

And this is just Java! I won't even attempt to list the similar tools you'll find for Python, Perl, C#, C++, and other languages. .NET has an XML-based object serialization in its core class library.
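Of the Java tools listed above, java.beans.XMLEncoder and java.beans.XMLDecoder are part of the JDK, so they are the easiest to try. Here is a minimal round-trip sketch. The Person bean and the BeanXmlDemo class are hypothetical names invented for this example; the only requirement assumed is the usual JavaBeans one of a public class with a public no-argument constructor and getter/setter pairs.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class BeanXmlDemo {

    // Hypothetical bean used only for this illustration.
    public static class Person {
        private String name;
        private int age;

        public Person() {}

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Write the bean as an XML document and return it as text.
    static String encode(Object bean) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(bytes)) {
            encoder.writeObject(bean);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Rebuild the bean from the XML text produced by encode().
    static Object decode(String xml) {
        byte[] data = xml.getBytes(StandardCharsets.UTF_8);
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(data))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Person p = new Person();
        p.setName("Ada");
        p.setAge(36);

        String xml = encode(p);
        System.out.println(xml);

        Person copy = (Person) decode(xml);
        System.out.println(copy.getName() + ", " + copy.getAge());
    }
}
```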

Of course, binary object serialization is more than just a storage format. It's also a critical component of network computing systems like CORBA and Java's Remote Method Invocation (RMI). These systems depend on complex, serialized binary objects. However, it's no coincidence that in the real world, both of them are getting creamed by the success of much simpler XML-based protocols such as XML-RPC and SOAP, as well as more RESTful systems like RSS. This may well be an example of the triumph of "worse is better." Both XML-RPC and SOAP are missing numerous features of the more complex protocols:

  • Transactions

  • Remote garbage collection

  • Callbacks

  • Stubs and skeletons

  • Remote class loading

However, I have to say I'm not at all convinced these are required for most useful work. What most developers need is a straightforward way to move bundles of data from point A to point B on the network, with some confirmation that the data got there. This much both XML-RPC and SOAP provide. RESTful systems like RSS do an even better job. More than this may not be necessary most of the time.
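To give a sense of how little machinery is involved, here is a rough sketch that builds the body of an XML-RPC request as plain text. It handles string parameters only, and the XmlRpcRequest class and the inventory.lookup method name are hypothetical; a real client would POST this text over HTTP and would more likely use an existing XML-RPC library.

```java
// Hypothetical helper class, named for this example only.
public class XmlRpcRequest {

    // Escape the characters that are significant in XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build an XML-RPC methodCall document with string parameters only.
    static String methodCall(String methodName, String... stringParams) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n");
        xml.append("<methodCall>\n");
        xml.append("  <methodName>").append(escape(methodName))
           .append("</methodName>\n");
        xml.append("  <params>\n");
        for (String p : stringParams) {
            xml.append("    <param><value><string>").append(escape(p))
               .append("</string></value></param>\n");
        }
        xml.append("  </params>\n");
        xml.append("</methodCall>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // "inventory.lookup" is a made-up remote method name.
        System.out.println(methodCall("inventory.lookup", "Nuts & Bolts"));
    }
}
```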

Bottom line: If you need to transmit or store XML, transmit or store XML. Do not transmit or store some other variation of the XML document. XML is syntax, and only Unicode in angle brackets is real XML. All other formats are just poor copies that often don't perform as well as XML itself.



Effective XML: 50 Specific Ways to Improve Your XML
ISBN: 0321150406
EAN: 2147483647
Year: 2002
Pages: 144
