Serializing Beans to XML and Back Again | Applied Software Engineering Using Apache Jakarta Commons (Charles River Media Computer Engineering)

Writing applications using SQL commands is a very good strategy if you are using current technologies. For example, many companies will have an Oracle database or a Microsoft SQL Server database. Using SQL has proven to be an effective way of storing data. The problem with SQL lies in how you serialize a data structure to it. This process can often be very complicated and requires some processing power. Building applications with these technologies is typically called "building an n-tier application."

In this section of the chapter, XML is used to store and retrieve data. The power of SQL is its ability to sort and find data when the data is organized in a concise manner. The problem with using SQL is that in a longer-term application, database tables created earlier may not be compatible with ones designed more recently. Incompatibility occurs because you cannot upgrade SQL data partially. Either a column is added to the table or a column is removed. The only other option, if you want to partially upgrade a table, would be to create a new table. If you used this solution, you would have to copy and maintain the data in two locations. Regardless of how you define SQL tables, they are very hard to upgrade. Consider the XML in Listing 5.11.

Listing 5.11

 <data> <today>I defined something</today> </data>

Let's say that the XML in Listing 5.11 was created at a point in time called X . We would then create the XML in Listing 5.12 at a point in time called X + N , where N is some later time.

Listing 5.12

 <data> <today>I defined something</today> <tomorrow>Some more data</tomorrow> </data>

The interesting part of XML is that the data added in Listing 5.12 does not affect the original data. Therefore, any application that reads the original data will still be able to function. In addition, any application that modifies the new data can cope with the old data. This can occur in XML because XML is a structured document that allows two entirely different pieces of data to be embedded in the same document.
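
To make this concrete, the following is a minimal sketch of a reader written at time X that knows only about the today element. It uses the DOM parser that ships with the JDK rather than betwixt, and the class name TodayReader and its method are hypothetical names chosen for illustration. Given either Listing 5.11 or Listing 5.12, it should return the same value, because the tomorrow element is never looked at.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical reader written at time X: it knows only about <today>.
class TodayReader {
    // Returns the text of the first <today> element, or null if there is none.
    static String readToday(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("today");
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            // Parsing problems are rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String atTimeX = "<data> <today>I defined something</today> </data>";
        String atTimeXPlusN = "<data> <today>I defined something</today>"
                + " <tomorrow>Some more data</tomorrow> </data>";
        // The old reader copes with both documents.
        System.out.println(readToday(atTimeX));
        System.out.println(readToday(atTimeXPlusN));
    }
}
```

A reader written at time X + N would simply look up the tomorrow element as well and treat its absence as "not yet defined."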

In the Internet world, searching and finding data has become a well-defined science. Take, for example, a search engine, which crawls and attempts to index the Internet. While it crawls the Internet, documents are archived and stored for later reference purposes. Therefore, when you ask a search engine to find a document or some information, you can pinpoint what you need. Put a search engine together with an XML document and the result is a modern database system.

Technical Details for the betwixt Package

This section of the chapter will discuss the betwixt package. Using this library, you can serialize Java objects to XML and back again. Tables 5.3 and 5.4 provide an abbreviated description of the betwixt package.

Table 5.3: Repository details for the betwixt package.
Item	Details
CVS repository	Jakarta-commons
Directory within repository	Betwixt
Main packages used	org.apache.commons.betwixt.io.*, org.apache.commons.betwixt.*, org.apache.commons.betwixt.strategy.*, java.io.* (for the serialization and deserialization)

Table 5.4: Package and class details (legend: [betwixt] = org.apache.commons.betwixt).
Class/Interface	Details
[betwixt].io.BeanWriter	A class used to write XML from a Java Bean class.
[betwixt].io.BeanReader	A class used to read XML files that are serialized beans.
[betwixt].XMLIntrospector	A class used to introspect the specific Java Bean classes that are serialized and deserialized.
[betwixt].strategy.*	All of the classes and interfaces in this package are relatively important because they allow a developer to customize the overall serialization and deserialization process.

Writing a Java Bean

The main objective of the betwixt package is to serialize a Java class to and from an XML file. It is important to remember that serialization is not the same as saving data to a SQL database. Serialization is the process of saving the state of an object to another medium, whereas saving data to a SQL database is saving the state of a running business process. The Java serialization file format is specific to Java. In contrast, practically every programming language that exists can parse the XML format. What the betwixt package does is offer an alternative to the default serialization mechanism that adds support for XML. To serialize XML content, the betwixt package needs a Java Bean to work with. Listing 5.13 shows a sample Java Bean.

Listing 5.13

    package com.devspace.jseng.serialization;
    public class BeanToWrite implements java.io.Serializable {
        private int _iValue;
        private String _strValue;
        public BeanToWrite(int ival, String sval) {
            _iValue = ival;
            _strValue = sval;
        }
        public int getIntegerValue() { return _iValue; }
        public void setIntegerValue(int val) { _iValue = val; }
        public String getStringValue() { return _strValue; }
        public void setStringValue(String val) { _strValue = val; }
    }

For a Java Bean to be suitable for XML serialization, the Java Bean must be a public class, as shown in Listing 5.13. If the class is not a public class, then an access error will occur. The problem is not that an exception is thrown but that the exception is caught internally, so the failure goes unreported and the result is an empty data set. If the Java class exposes public data members, as in Listing 5.14, those data members will not be serialized.

Listing 5.14

 public class TestClass {  public int SomePublicDataMember; }
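
One way to see which members count as Java Bean properties is to ask the JDK's own java.beans.Introspector. The sketch below is an illustration under that assumption only; betwixt performs its own introspection, and the class names PropertyLister and Sample are hypothetical. The Sample class combines the public data member from Listing 5.14 with one getter and setter pair in the style of Listing 5.13.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PropertyLister {
    // Hypothetical bean: one public field and one getter/setter pair.
    public static class Sample {
        public int SomePublicDataMember;   // public field: not a bean property
        private int value;
        public int getIntegerValue() { return value; }
        public void setIntegerValue(int v) { value = v; }
    }

    // Returns the sorted bean property names of a class.
    // Passing Object.class as the stop class leaves out Object's own "class" property.
    static List<String> propertyNames(Class<?> type) {
        try {
            BeanInfo info = Introspector.getBeanInfo(type, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the getter/setter pair is reported; the public field is ignored.
        System.out.println(propertyNames(Sample.class));
    }
}
```

The property name is derived from the accessor methods, which is why it comes out as integerValue with a lowercase first letter.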

Once the Java Bean has been defined, which in our examples is by the class BeanToWrite , it can be used by the betwixt package and serialized to XML. The simplest example of XML serialization is shown in Listing 5.15.

Listing 5.15

    import org.apache.commons.betwixt.io.BeanWriter;
    BeanToWrite somebean = new BeanToWrite(1234, "hello world");
    // With no constructor arguments, BeanWriter writes to standard output.
    BeanWriter writer = new BeanWriter();
    writer.enablePrettyPrint();
    writer.write(somebean);

The main class in Listing 5.15 is the class BeanWriter . This class is responsible for serializing the Java Bean to XML. In Listing 5.15, the method write has one parameter, which is the Java Bean to serialize. The class BeanWriter exposes several helper methods (and in Listing 5.15, enablePrettyPrint is one of the methods):

enablePrettyPrint: When the XML output is generated, this enables the pretty printing of the XML content, which means humans can read it easily.
writeXmlDeclaration: Before serialization, this writes an XML declaration to the serialized XML. An example is <?xml version='1.0' ?>.
getIndent, setIndent: When you pretty print the XML output, these retrieve or set the string used to specify an indent in the output string.
getLog, setLog: These retrieve or set the logging handler used to capture log events.
getWriteIDs, setWriteIDs: These retrieve or set a Boolean flag used to determine whether or not the Java Bean IDs are written.
getWriteEmptyElements, setWriteEmptyElements: These retrieve or set a Boolean flag used to determine whether or not to write an empty element. An empty element is one where the value of the element does not reference any meaningful data. For example, in Listing 5.13, the getter for the property stringValue could have returned a null value. In that case, if the WriteEmptyElements flag were set to false, then the property stringValue would not be serialized in the XML data stream. If the flag value were set to true, then the property stringValue would be written, but as an empty XML element.

Listing 5.16 shows the XML data stream that's generated when Listing 5.15 is executed.

Listing 5.16

    <BeanToWrite>
        <integerValue>1234</integerValue>
        <stringValue>hello world</stringValue>
    </BeanToWrite>

Please note that Listing 5.16 is the default example of a Java Bean being written into an XML stream. We mention this because the XML tags in the data stream (BeanToWrite, integerValue, and stringValue) are based on the default betwixt naming strategy. The root XML element is BeanToWrite, which maps one-to-one to the name of the Java Bean being serialized. Notice, though, that the name of the XML element does not include the package name of the Java Bean. This is an important consideration because if two identically named Java Beans from different packages were serialized, they would use the same XML tag. In Listing 5.16, notice the XML tag element identifiers integerValue and stringValue. These names come from the Java Bean patterns, which use them to identify the various properties.

Consistent XML Naming Conventions and Name Mappings

The names of the XML tags in Listing 5.16 are identical to the names of the Java Bean properties in Listing 5.13. This is not a problem and follows the Java Bean encoding rules. What is a problem is that in Listing 5.16, some XML tags start with a capital letter and some start with a lowercase letter. This is not consistent and not a good naming practice. Although we said that Listing 5.16 is the default XML serialization, the reality is that the default naming standards will never be used. What will be used, however, is all lowercase, all uppercase, all hyphenated, or some combination. The power of the betwixt package is not its ability to serialize a Java Bean to XML but its ability to let a developer control the serialization. To show you what we mean, we'll convert the XML output in Listing 5.16 so that every XML tag starts with a lowercase letter. Listing 5.17 shows the code that does this.

Listing 5.17

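    import org.apache.commons.betwixt.XMLIntrospector;
    import org.apache.commons.betwixt.io.BeanWriter;
    BeanToWrite somebean = new BeanToWrite(1234, "hello world");
    BeanWriter writer = new BeanWriter();
    XMLIntrospector introspector = new XMLIntrospector();
    // Replace the default element naming with first-letter-lowercase naming.
    introspector.setElementNameMapper(
        new org.apache.commons.betwixt.strategy.DecapitalizeNameMapper());
    writer.setXMLIntrospector(introspector);
    writer.enablePrettyPrint();
    writer.write(somebean);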

To generate a different case using the betwixt package, the class XMLIntrospector is used in Listing 5.17. The purpose of the class XMLIntrospector is to allow a developer to override specific defaults so that he can custom-tailor the XML serialization. In Listing 5.17, the method setElementNameMapper is used to replace the default XML tag naming with an instance of the class DecapitalizeNameMapper. In the betwixt package, XML tag naming is called "element name mapping." Running Listing 5.17 yields the output shown in Listing 5.18.

Listing 5.18

    <beanToWrite>
        <integerValue>1234</integerValue>
        <stringValue>hello world</stringValue>
    </beanToWrite>

Now, Listing 5.18 contains consistent XML tag names because all of the first letters of the individual XML tags are lowercase. The following name mappers are available in the org.apache.commons.betwixt.strategy package:

DefaultNameMapper: This does nothing; it's just a placeholder for the XMLIntrospector class.
DecapitalizeNameMapper: This changes the first letter of all XML elements to be lowercase.
CapitalizeNameMapper: This changes the first letter of all XML elements to be uppercase.
HyphenatedNameMapper: This is a flexible name mapper that splits multiword identifiers into words and joins them with a separator. For example, the XML tag beanToWrite in Listing 5.18 would become bean-to-write, using hyphens instead of capitalization to mark the word sections. The use of hyphens is a popular form in the XML world. The algorithm that detects a word break is based on Camel Humped Naming, in which each word of a multiword identifier starts with a capital letter and the remaining letters are lowercase. When a word break is found, a separator is inserted. The default separator is a hyphen character; you can define a different one by using the class method HyphenatedNameMapper.setSeparator. In the default case, the individual words are set to lowercase letters. Calling the class method HyphenatedNameMapper.setUpperCase(true) converts the name to uppercase letters instead, so that with an underscore separator BeanToWrite would become BEAN_TO_WRITE.
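
The word-break rule described for the hyphenated mapper can be sketched in a few lines of plain Java. This is our own approximation of the idea and not the source code of HyphenatedNameMapper; the class name HyphenateSketch and its method are hypothetical.

```java
class HyphenateSketch {
    // Inserts the separator before every uppercase letter except the first
    // character, and converts all letters to lowercase.
    static String hyphenate(String name, String separator) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            if (Character.isUpperCase(ch) && i > 0) {
                out.append(separator);   // a capital letter marks a word break
            }
            out.append(Character.toLowerCase(ch));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(hyphenate("BeanToWrite", "-"));
        System.out.println(hyphenate("integerValue", "-"));
    }
}
```

Note that this naive rule treats every capital letter as a word break, so an identifier containing an acronym, such as XMLFile, would be split letter by letter. Whether the real mapper behaves the same way is something to verify against the betwixt version you use.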

Writing Your Own Name Mapper

The name mappers available in the betwixt package cover most of the bases. However, sometimes a developer may want to create his own name mapping. The developer may want his own name mapping for a business process reason or to use a specific XML Schema. The reason is not important. What is important is how a custom name mapper can be implemented. It is not a difficult task and is as simple as the code in Listing 5.19.

Listing 5.19

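    import org.apache.commons.betwixt.strategy.NameMapper;
    class ItsANameMapper implements NameMapper {
        public String mapTypeToElementName(String typeName) {
            // No leading space in the prefix: XML names cannot contain spaces.
            return "its_a_" + typeName;
        }
    }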

In Listing 5.19, the class ItsANameMapper implements the interface NameMapper, which declares the single method mapTypeToElementName. The purpose of the method mapTypeToElementName is to convert the input parameter typeName to an acceptable identifier and then return the modified value. In the case of Listing 5.19, the text "its_a_" is prefixed to all elements. The generated output shown in Listing 5.20 is a bit silly, but it shows the result of the sample name mapper.

Listing 5.20

    <its_a_BeanToWrite>
        <its_a_integerValue>1234</its_a_integerValue>
        <its_a_stringValue>hello world</its_a_stringValue>
    </its_a_BeanToWrite>

If you read the XML result in Listing 5.20, you will notice a grammatical error. The identifier integerValue starts with a vowel; hence, the text should have read its_an_integerValue. We could have added that logic to Listing 5.19, but its implementation is beyond the scope of this book. The point is that you can define a custom name mapper without affecting other parts of the XML generation process.

As a last note, the class ItsANameMapper is used in the same manner as the class DecapitalizeNameMapper is in Listing 5.17. Listing 5.21 is a shortened listing that uses the class ItsANameMapper .

Listing 5.21

 XMLIntrospector introspector = new XMLIntrospector(); introspector.setElementNameMapper(new ItsANameMapper());

Serializing Java Beans That Reference Java Beans

The Java Bean in Listing 5.13 is a simple, single Java Bean that does not reference other Java Beans, so its serialization is straightforward. While the betwixt package does automatic serialization for most Java Bean features, special circumstances sometimes arise. For example, if a Java Bean references a collection of other Java Beans, you might wonder how those objects would be serialized. The various Java Bean structures have predictable XML serialization structures, but it is important to know the various situations.

After a single Java Bean serialization, the next level of complexity is when a Java Bean references another Java Bean. In Listing 5.22, a new bean will be created and it will reference the Java Bean in Listing 5.13.

Listing 5.22

    public class ParentBean {
        private String _dataMember;
        private BeanToWrite _bean;
        public ParentBean(String val) {
            _bean = new BeanToWrite(1234, val);
            _dataMember = val;
        }
        public String getDataMember() { return _dataMember; }
        public void setDataMember(String val) { _dataMember = val; }
        public BeanToWrite getMyReferenceToAnotherBean() { return _bean; }
    }

In Listing 5.22, the class ParentBean references the class BeanToWrite and exposes the property myReferenceToAnotherBean . The additional property dataMember is added in the XML serialization to illustrate how a child object would be serialized. The output of serializing the class ParentBean is Listing 5.23.

Listing 5.23

    <ParentBean>
        <dataMember>hello world</dataMember>
        <myReferenceToAnotherBean>
            <integerValue>1234</integerValue>
            <stringValue>hello world</stringValue>
        </myReferenceToAnotherBean>
    </ParentBean>

Listing 5.23 used default XML serialization to generate the output. What is important to notice is how the bean reference _bean in Listing 5.22 is serialized. When the class BeanToWrite is serialized, there is no BeanToWrite XML tag. Compare that to the generated output of Listing 5.18, where there is a BeanToWrite XML tag. The name of the XML tag, which should have been the XML tag BeanToWrite , is myReferenceToAnotherBean , which is the same name as the Java Bean property of the parent Java Bean defined in Listing 5.22. The problem with this serialization is that you can serialize the same structure in multiple ways. Another approach to serializing the same Java Beans is shown in Listing 5.24.

Listing 5.24

    <ParentBean>
        <dataMember>hello world</dataMember>
        <BeanToWrite>
            <integerValue>1234</integerValue>
            <stringValue>hello world</stringValue>
        </BeanToWrite>
    </ParentBean>

In Listing 5.24, the XML tag myReferenceToAnotherBean is replaced with BeanToWrite. This solution is what many XML structures expect, because the classes ParentBean and BeanToWrite have nothing in common with each other (other than that one class references the other). XML's power is that it can embed another document and structure without ruining the higher-level structure. Therefore, it would appear that we should have implemented the approach in Listing 5.24.

Using the approach from Listing 5.24 does have its issues as well. Imagine building a mortgage application. In the first release of the mortgage application, there can be only one signer of the mortgage. Imagine that another person wants to sign for the mortgage. There would be two co-signers. The problem then is how to store the two co-signers as XML elements in the document. From an XML point of view, the solution is simple; just store both of them because the XML document does not have a problem with that. Where the problem becomes difficult is when the betwixt package needs to read the two co-signers of the mortgage. Which one will be read and how will both be stored? In the original approach, each co-signer could be uniquely identified as a unique XML tag. A solution would be to use an array, but even that has problems; a mailing address and domicile address, while being the same type, have very different purposes logically.

Another approach to the serialization of the Java Bean is shown in Listing 5.25.

Listing 5.25

    <ParentBean>
        <dataMember>hello world</dataMember>
        <myReferenceToAnotherBean>
            <BeanToWrite>
                <integerValue>1234</integerValue>
                <stringValue>hello world</stringValue>
            </BeanToWrite>
        </myReferenceToAnotherBean>
    </ParentBean>

In Listing 5.25, both the XML elements myReferenceToAnotherBean and BeanToWrite are embedded. In this approach, you don't have any problem figuring out which part of the document belongs to which Java Bean. However, this approach is not an ideal solution, since it indicates that two entirely different documents are embedded. In the case of a mortgage application, this is not the case. This approach would be legitimate when you are writing SOAP packets. SOAP is a Web Services XML specification and explicitly defines that there is an outer package and inner package. Think of it like a letter that's ready to mail; it's composed of the paper the letter is written on and the envelope that encloses the letter.

We can see from these examples that there is no ideal way of serialization and that the default mechanism shown in Listing 5.23 is good enough. Later in this chapter, we will explain a more sophisticated technique on how to control the serialization process.

Serializing Collections

When a Java Bean references another Java Bean, that is a one-to-one relationship. Using collections, you can have a one-to-many relationship. A collection contains and references many other objects, which could contain another one-to-many relationship. To keep things simple, let's consider a simple collection using the class Vector , shown in Listing 5.26.

Listing 5.26

    public class CollectionBeanToWrite {
        private java.util.Vector _items = new java.util.Vector();
        public void addItem(BeanToWrite bean) { _items.add(bean); }
        public java.util.Iterator getItems() { return _items.iterator(); }
    }

Listing 5.26 defines what could be called a betwixt bean: a class that follows the betwixt conventions for collections rather than the plain Java Bean conventions. The methods addItem and getItems use a naming convention that the betwixt package programmers have defined. Together they act as the getter and setter for a collection. Unlike an ordinary getter and setter pair, the collection getter retrieves the whole collection, whereas the adder sets one object at a time. From a programming point of view, this makes sense because when serializing to XML, the betwixt package wants an interface instance of type Iterator to iterate through the individual objects. However, when reading from XML, the betwixt package wants to be able to instantiate an object, populate it, and then add the object to the collection.

The betwixt getter and adder notation follows the pattern shown in Listing 5.27.

Listing 5.27

    public void add[PROPERTY NAME]([OBJECT TYPE] bean) { }
    public Iterator get[PROPERTY NAME]s() { }

In contrast to Java Beans, which require a get and set, the betwixt collection bean notation, shown in Listing 5.27, requires an add and get. However, the get requires that you append an s after the property name. The concept of adding an s to the method get is called pluralization . Other plurals that are supported are Array , List , and Iterator . The object type is specified only by the method add . In Listing 5.27, the method get returns an interface instance of type Iterator . Also supported are get methods that return an object array, collection, enumeration, or map.
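
The add and get pairing can be illustrated with plain Java reflection. The sketch below is a simplification we wrote for illustration, not the lookup code that betwixt uses: it handles only the append-an-s plural, and the names AdderPairing and Shelf are hypothetical. Shelf stands in for Listing 5.26 and stores strings so that the example is self-contained.

```java
import java.lang.reflect.Method;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.Vector;

class AdderPairing {
    // Hypothetical stand-in for Listing 5.26.
    public static class Shelf {
        private final Vector<String> items = new Vector<>();
        public void addItem(String item) { items.add(item); }
        public Iterator<String> getItems() { return items.iterator(); }
    }

    // Maps each one-argument add[Name] method to a get[Name]s method, if one exists.
    static Map<String, String> pairs(Class<?> type) {
        Map<String, String> result = new TreeMap<>();
        for (Method adder : type.getMethods()) {
            String name = adder.getName();
            if (!name.startsWith("add") || name.length() == 3
                    || adder.getParameterCount() != 1) {
                continue;
            }
            // Naive pluralization: append an s to the property name.
            String getter = "get" + name.substring(3) + "s";
            try {
                type.getMethod(getter);
                result.put(name, getter);
            } catch (NoSuchMethodException e) {
                // No plural getter: not a collection property under this rule.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs(Shelf.class));
    }
}
```

The object type of the collection elements comes from the parameter of the add method, which is why the getter can return an untyped Iterator.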

The code defined in Listing 5.26 is executed using the same consumer source code as shown in Listing 5.15. The generated result is shown in Listing 5.28.

Listing 5.28

    <CollectionBeanToWrite>
        <items>
            <item>
                <integerValue>1</integerValue>
                <stringValue>first item</stringValue>
            </item>
            <item>
                <integerValue>2</integerValue>
                <stringValue>second item</stringValue>
            </item>
            <item>
                <integerValue>3</integerValue>
                <stringValue>third item</stringValue>
            </item>
        </items>
    </CollectionBeanToWrite>

In Listing 5.28, the naming convention is very similar to all of the previously defined identifiers. The collection is encoded within the XML tag items. The identifier items is based on the property items from the class CollectionBeanToWrite. Within the XML element items are a number of XML child elements named item. The identifier item is not the singular form of the identifier items; rather, it's a default identifier defined by the betwixt package. Then, within each XML child element item are the serialized contents of the class BeanToWrite.

Serializing Maps

When you serialize a Map -based class, the output is slightly different than a simple collection or array. A Map is a key value pair type of collection. Unlike with the XML output, the changes we need to make in the listings are relatively minor. Listing 5.29 shows the methods that need to be added to allow betwixt to serialize a Map .

Listing 5.29

    public class MappedBeanToWrite {
        private java.util.HashMap _items = new java.util.HashMap();
        public MappedBeanToWrite() { }
        public void addItem(String key, BeanToWrite bean) { _items.put(key, bean); }
        public java.util.Map getItems() { return _items; }
    }

In Listing 5.29, the class MappedBeanToWrite uses a HashMap to store the key value pairs. When serializing a Map, the betwixt serialization routines search for the proper getter and adder methods, just as they do when serializing arrays or lists. However, since a Map is associated with a key value pair, the adder method has two parameters: a string (which represents the key) and an object (which represents the value). When the class MappedBeanToWrite is serialized, the output in Listing 5.30 is generated.

Listing 5.30

    <MappedBeanToWrite>
        <items>
            <entry>
                <key>third item</key>
                <value>
                    <integerValue>3</integerValue>
                    <stringValue>third item</stringValue>
                </value>
            </entry>
            <entry>
                <key>first item</key>
                <value>
                    <integerValue>1</integerValue>
                    <stringValue>first item</stringValue>
                </value>
            </entry>
            <entry>
                <key>second item</key>
                <value>
                    <integerValue>2</integerValue>
                    <stringValue>second item</stringValue>
                </value>
            </entry>
        </items>
    </MappedBeanToWrite>

The output that is generated is very similar to the output generated in Listing 5.28. The difference is that instead of the XML child elements item , the XML child element entry is generated. Within the XML child element entry are two XML child elements, key and value . The XML child element key contains the key that was given in the method add . The XML child element value contains the serialization of the class BeanToWrite . The XML elements entry , key , and value are defaults provided by the betwixt package.
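
The entry, key, and value layout is easy to reproduce by hand, which can help when you need to predict what a reader of the XML will see. The following sketch is our own illustration and does not use betwixt; the class name MapLayoutSketch is hypothetical, values are plain strings, and no XML escaping is performed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapLayoutSketch {
    // Renders a map in the entry/key/value layout, wrapped in the given element.
    static String render(String wrapper, Map<String, String> map) {
        StringBuilder out = new StringBuilder();
        out.append('<').append(wrapper).append('>');
        for (Map.Entry<String, String> e : map.entrySet()) {
            out.append("<entry><key>").append(e.getKey()).append("</key>")
               .append("<value>").append(e.getValue()).append("</value></entry>");
        }
        out.append("</").append(wrapper).append('>');
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order; a plain HashMap makes no such promise.
        Map<String, String> items = new LinkedHashMap<>();
        items.put("first item", "1");
        items.put("second item", "2");
        System.out.println(render("items", items));
    }
}
```

The entries in Listing 5.30 do not appear in the order first, second, third. That is consistent with the HashMap used in Listing 5.29, which does not guarantee any iteration order; if the order of entries matters to you, a map implementation that preserves insertion order is worth considering.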

Generating and Renaming XML Attributes

We introduced the class XMLIntrospector when we talked about generating custom name mappings for XML elements. However, this class can do other things that allow the developer to influence how the XML content is serialized. In Listing 5.17, the XMLIntrospector class was used to define a new naming strategy for individual XML elements. You can use the same approach to rename individual attributes, as shown in Listing 5.31.

Listing 5.31

    ParentBean bean = new ParentBean("hello world");
    BeanWriter writer = new BeanWriter();
    XMLIntrospector introspector = new XMLIntrospector();
    introspector.setElementNameMapper(
        new org.apache.commons.betwixt.strategy.HyphenatedNameMapper());
    introspector.setAttributeNameMapper(
        new org.apache.commons.betwixt.strategy.HyphenatedNameMapper());
    // Write primitive properties as XML attributes instead of child elements.
    introspector.setAttributesForPrimitives(true);
    writer.setXMLIntrospector(introspector);
    writer.enablePrettyPrint();
    writer.write(bean);

Listing 5.31 uses the same ParentBean class used in the previous serialization examples, except now more betwixt classes are used. The class ParentBean is the multiple-reference Java Bean that will be serialized. The XMLIntrospector is allocated pretty much the same way as it was previously. The method setElementNameMapper modifies the individual XML elements. The method setAttributeNameMapper is a new method used to modify the naming of the individual XML attributes. This method, while useful, does not make much sense since all of the XML generated thus far has not had any XML attributes. A way of generating attributes is to call the method setAttributesForPrimitives with a parameter of true . This converts the class properties into XML attributes instead of XML child elements. When the class instance of XMLIntrospector is assigned to the class instance of BeanWriter, there can be only one assignment. It is not possible to assign multiple class instances of XMLIntrospector. If Listing 5.31 is executed, the output in Listing 5.32 is generated.

Listing 5.32

    <parent-bean data-member="hello world">
        <my-reference-to-another-bean integer-value="1234" string-value="hello world"/>
    </parent-bean>

In Listing 5.32, XML elements are generated only for the class instances themselves. The ParentBean instance and the BeanToWrite instance it references each produce one element, so there are only two XML elements. The properties on each of the class instances are serialized to XML attributes. This strategy of serializing XML is OK to use; however, this approach tends to resemble a record set approach and is not that XML friendly. A bit later in this section of the chapter, we will discuss an XML serialization strategy.

Fine-Tuning Plural Descriptors

When betwixt generates collections, the default mechanism of finding both the add and get is to associate the singular form with the plural form. For example, in Listing 5.26, the singular is the identifier item and the plural form is items. The singular and plural default forms work only for the English language. Therefore, it would be beneficial if the singular and plural forms of other languages could be recognized as well. For example, in German the singular form of "book" is "buch," but the plural is "bücher." The German plural form of the word "book" is rather complicated, because a new letter, ü, is substituted and an "er" is added to the end of the word. (In Java identifiers, the ü is commonly written as "ue," as in the listings that follow.) We need a more complicated pluralization rule. The way to implement this sort of rule is similar to how you implement a custom name mapping. However, you support a different interface. In Listing 5.33, a collection class that uses German singular and plural forms of the word "book" is defined.

Listing 5.33

    public class GermanCollection {
        private java.util.Vector _items = new java.util.Vector();
        public void addBuch(BeanToWrite bean) { _items.add(bean); }
        public java.util.Iterator getBuecher() { return _items.iterator(); }
    }

Making the betwixt package aware of the German collection requires an implementation class that implements the interface PluralStemmer . And, like in the name-mapping example in Listing 5.21, the implementation class needs to be associated with the class XMLIntrospector . The implementation class is defined in Listing 5.34.

Listing 5.34

    import java.util.Map;
    import org.apache.commons.betwixt.ElementDescriptor;
    import org.apache.commons.betwixt.strategy.PluralStemmer;
    class GermanPluralMapper implements PluralStemmer {
        public ElementDescriptor findPluralDescriptor(String propertyName, Map map) {
            // Pair the singular "buch" (from addBuch) with the plural "buecher" (from getBuecher).
            if (propertyName.equals("buch")) {
                // Returns null when the map holds no "buecher" entry.
                return (ElementDescriptor) map.get("buecher");
            }
            return null;
        }
    }

The class GermanPluralMapper in Listing 5.34 implements the interface PluralStemmer. The interface PluralStemmer requires that only the method findPluralDescriptor be implemented. In the implementation of findPluralDescriptor, the objective is to cross-reference the property name with a list of properties available. To find paired collection properties, betwixt first retrieves all of the methods that start with add. These methods have the add removed from the front of the identifier. Then, for each of those methods, the interface method findPluralDescriptor is called to find the matching get method. Going back to Listing 5.33, the parameter propertyName will contain the value buch. The parameter map will contain a list of possible get methods, which in the case of Listing 5.33 will be getBuecher. However, the names of the identifiers contained within the parameter map have the get trimmed, so that only buecher is left over. Within the implementation of findPluralDescriptor, we need to cross-reference the identifier buch with some identifier in the map. If a match is found, the associated instance of the class ElementDescriptor is returned. If a match is not found, a null value is returned.
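
The cross-referencing step can be tried out without betwixt by working on the names alone. The sketch below is our own simplification: it returns the matching plural name instead of an ElementDescriptor, and the class name PluralLookup, its table of irregular plurals, and the fallback to an appended s are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class PluralLookup {
    // Hypothetical table of irregular plurals, keyed by the singular form.
    private static final Map<String, String> IRREGULAR = new HashMap<>();
    static {
        IRREGULAR.put("buch", "buecher");
    }

    // Returns the candidate name that is the plural of the singular, or null.
    static String findPlural(String singular, Set<String> candidates) {
        String irregular = IRREGULAR.get(singular);
        if (irregular != null && candidates.contains(irregular)) {
            return irregular;
        }
        // Fall back to the English-style rule of appending an s.
        String regular = singular + "s";
        return candidates.contains(regular) ? regular : null;
    }

    public static void main(String[] args) {
        Set<String> getters = Set.of("buecher", "items");
        System.out.println(findPlural("buch", getters));
        System.out.println(findPlural("item", getters));
    }
}
```

Keeping the irregular forms in a table means that supporting another word is a one-line change rather than another if statement.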

If in Listing 5.33 there were no addBuch method and only a getBuecher method, then the PluralStemmer interface instance would not be called. As a result, you might think that the collection will not be serialized; however, the collection is serialized because the default rule of serializing an Object of type Iterator is applied.

Using betwixt Configuration Files

None of the techniques discussed thus far allows you to fine-tune how the data is serialized. The techniques outlined show you how to tune the overall process, which is useful as well. The point of fine-tuning is that it allows a developer to read and write data exactly as the XML should be structured. For example, when we renamed the XML tags, all XML tags were renamed. It would have been difficult to rename an XML type based on a specific context, since the contexts used were simple. For example, in the renaming example, the context was the name of the property, which could be identical for two entirely different Java Bean classes.

The fine-tuning process works at the class-type level and allows a developer to specify how everything will be serialized. In the betwixt package, that means using betwixt configuration files. In theory, to use betwixt, it is not necessary to write code like that shown in all of the previous examples. The betwixt configuration files could do everything that is necessary. The only code that you would need to write is the code to consume the betwixt package. Following are general XML rules on how to generate data; for example, when to use XML tags and when to use XML attributes.

General XML Object Structure Rule

When an XML file is created, you place XML elements in a certain structure. Typically, an XML element and its child elements are primary entities, and the XML attributes describe properties. For example, a car is a primary entity and so are the wheels, engine, and seats. However, the serial number of the model could be a property that describes the car. An attribute is typically a property that provides a mechanism to quickly distinguish one XML element from another XML element, as shown by the following example.

 <car serial="1234d32"><wheels>2</wheels></car>

Some people may argue that the serial number is a primary entity and should be an XML element. The response to this would be that if the serial number were attached to an invoice, then it would be a primary entity. An invoice does not care if the car has four, five, or ten wheels. The invoice cares only if the car can be uniquely identified from another car. The simplest way to do that is to use the serial number. In the case of an invoice, the number of wheels might be a descriptor, as shown by the following example.

 <invoice><item> 
 <car wheels="2"><serial>1234d32</serial></car> 
 </item></invoice>

However, this is not a heavy-handed rule that you must follow every time. Instead, apply it as necessary. Be sure, though, to avoid XML structures that consist entirely of XML elements or entirely of XML attributes.

The betwixt configuration file allows you to alter the metadata stored about the object. Remember back to Listing 5.15, where the class method BeanWriter.write was called. What happened in that step was that the bean was being introspected. The different fields, properties, and methods were inspected for potential serialization. However, doing introspection every time a bean is serialized would result in a massive performance hit. And that would not be very useful.

So that there is no major performance hit, the betwixt package uses the class XMLIntrospector to cache betwixt configuration files. We have used this class in various examples to control how XML content is serialized. The betwixt configuration files are cached in the XMLIntrospector class, so it is an extremely important reference class that should be stored someplace as a singleton. For example, we could use the lang factory package singleton.

In the serialization process, when a Java class is introspected, the class XMLIntrospector manages a number of settings that control XML serialization. An important one is the XML bean registry. The serialization description of every class that is serialized is stored in the XML bean registry. This is the cache that ensures an individual class does not have to be introspected twice. The class XMLBeanInfoRegistry represents the XML bean registry. It is considered a cache, and you can influence that cache by setting values using method calls or by creating a betwixt file.
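The caching idea behind the registry can be illustrated with the standard java.beans.Introspector. This is a minimal sketch and not the XMLBeanInfoRegistry implementation: the class IntrospectionCacheSketch, its methods, and the bean Sample are hypothetical names introduced for illustration. The point is only that a class is introspected once and the result is reused on later requests.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of an introspection cache keyed by class; hypothetical, not betwixt code. */
final class IntrospectionCacheSketch {
    private final Map<Class<?>, List<String>> registry = new ConcurrentHashMap<>();
    private final AtomicInteger introspections = new AtomicInteger();

    /** Returns the cached property names, introspecting only on the first request. */
    List<String> propertiesOf(Class<?> type) {
        return registry.computeIfAbsent(type, this::introspect);
    }

    /** Number of times a class actually had to be introspected. */
    int introspectionCount() {
        return introspections.get();
    }

    private List<String> introspect(Class<?> type) {
        introspections.incrementAndGet();
        try {
            // Stop at Object so that the "class" property is not reported.
            BeanInfo info = Introspector.getBeanInfo(type, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor descriptor : info.getPropertyDescriptors()) {
                names.add(descriptor.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + type, e);
        }
    }

    /** Hypothetical bean used only for the demonstration. */
    public static class Sample {
        private int integerValue;
        public int getIntegerValue() { return integerValue; }
        public void setIntegerValue(int value) { integerValue = value; }
    }

    public static void main(String[] args) {
        IntrospectionCacheSketch cache = new IntrospectionCacheSketch();
        cache.propertiesOf(Sample.class);
        cache.propertiesOf(Sample.class); // served from the cache
        System.out.println(cache.propertiesOf(Sample.class));
        System.out.println("introspections: " + cache.introspectionCount());
    }
}
```

Keeping one shared instance of such a cache is the reason the text recommends holding the XMLIntrospector as a singleton.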

Listing 5.35 is the basic Java class definition used to define a betwixt configuration file.

Listing 5.35

 public class BetwixtBean {
     private int _iValue;
     private String _strValue;
     public BetwixtBean(int ival, String sval) {
         _iValue = ival;
         _strValue = sval;
     }
     public int getIntegerValue() {
         return _iValue;
     }
     public void setIntegerValue(int val) {
         _iValue = val;
     }
     public String getStringValue() {
         return _strValue;
     }
     public void setStringValue(String val) {
         _strValue = val;
     }
 }

Listing 5.36 shows the associated betwixt configuration file, which is called BetwixtBean.betwixt .

Listing 5.36

 <info primitiveTypes="element">
   <element name='better-name'>
     <addDefaults/>
   </element>
 </info>

If Listing 5.15 were executed to serialize the class BetwixtBean defined in Listing 5.35, the result would appear identical to Listing 5.37.

Listing 5.37

 <better-name>
   <integerValue>1234</integerValue>
   <stringValue>hello world</stringValue>
 </better-name>

It is worth taking a few moments to consider Listings 5.35, 5.36, and 5.37 because there are several points you should notice:

The name of the betwixt file is identical to the name of the class, and the file resides in the same location as the class file. Therefore, if the class filename is hello.java, then the betwixt configuration filename is hello.betwixt.
The betwixt file is XML based and determines the serialization characteristics of a single class. Each class has its own betwixt file.
The XML element info is the root element of all betwixt files.
The XML child element element represents the outermost serialization XML element. In other words, in this serialization, the XML element element associates itself with the class identifier BetwixtBean. If there are multiple XML elements element directly under info, then the last one will be used for serialization.
You can control settings of the XMLIntrospector (like the XML attribute primitiveTypes and the XML element addDefaults) by adding them to the betwixt file. The XML attribute primitiveTypes defines how the properties of the class BetwixtBean will be generated. In Listing 5.36, the properties are generated as XML elements. To generate properties as XML attributes, replace the value element of primitiveTypes with attribute. The XML element addDefaults indicates that the XMLIntrospector should be populated with default values, that is, with the properties found by introspecting the class.
All applications should define betwixt files because they are the only safe way of writing or reading the XML content properly.
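The naming rule from the first point can be expressed in a few lines. This is a sketch under the assumption that the configuration file is looked up as a classpath resource next to the class; the class BetwixtFileNameSketch and its methods are hypothetical helpers and are not part of the betwixt API.

```java
import java.net.URL;

/** Sketch of the "ClassName.betwixt beside the class" convention; hypothetical helper. */
final class BetwixtFileNameSketch {

    /** Derives the configuration file name from the unqualified class name. */
    static String betwixtFileName(Class<?> beanClass) {
        String name = beanClass.getName();
        // Drop the package prefix, if any, and append the extension.
        return name.substring(name.lastIndexOf('.') + 1) + ".betwixt";
    }

    /** Looks for the file in the same package as the class; null if it is absent. */
    static URL locate(Class<?> beanClass) {
        return beanClass.getResource(betwixtFileName(beanClass));
    }

    public static void main(String[] args) {
        System.out.println(betwixtFileName(java.util.ArrayList.class)); // ArrayList.betwixt
        System.out.println(locate(BetwixtFileNameSketch.class) != null
                ? "configuration file found"
                : "no configuration file, defaults would apply");
    }
}
```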

Adding Static Content

In the XML serialization process, it might be useful to add static data. The static data could be version information. Alternatively, it could be data related to something that another process wants, or the static data could serve administrative purposes. For example, an application could be written that generates old data. To modernize the application, we could generate some static data that acts as a placeholder. This solution would be cheaper than having to rewrite the application to add the extra data. Listing 5.38 shows a sample betwixt configuration that adds static content.

Listing 5.38

 <info primitiveTypes="element">
   <element name='documentation'>
     <attribute name='version' value='1.0'/>
     <element name='author'>
       <element name="location" value="Zurich">
         <attribute name='version' value='1.0'/>
       </element>
       <attribute name='version' value='1.0'/>
     </element>
     <addDefaults/>
   </element>
 </info>

In Listing 5.38, the XML elements attribute and all child XML elements element inside the first XML element element (the one whose name attribute is documentation) are static elements. The XML content that is generated is shown in Listing 5.39.

Listing 5.39

 <documentation version="1.0">
   <author version="1.0">
     <location version="1.0">Zurich</location>
   </author>
 </documentation>

If you compare Listings 5.38 and 5.39, it would seem that all XML elements other than the first XML element element are static content elements. This is true in the simplest of situations, but only because a specific attribute is missing. When you define various XML elements element and attributes with associated value attributes, you create static content. For example, for the first XML element attribute, there are two XML attributes: name and value. These two attributes define an XML attribute that is created in the generated content.

The generated XML attribute is associated with the parent XML element in the betwixt configuration file. Practically speaking, this means that when the betwixt configuration file in Listing 5.38 is processed, the generated XML element documentation has an associated XML attribute version with a value of 1.0. When an XML element element is a child element, the XML attributes name and value in the betwixt configuration file define an XML tag and its value in the generated XML file. For example, in the betwixt configuration file in Listing 5.38, the XML element element with the XML attributes name and value generates the XML element location, with a contained value of Zurich. We saw the result of this in Listing 5.39.

Mapping a Property to a Different Element

When an object is serialized, the individual properties are iterated and serialized. It is possible to generate additional tags or remap the identifiers to something else. Remember back to Listings 5.22, 5.23, and 5.24, where the problem was how to generate Java Beans that reference other Java Beans. We can easily solve this problem by defining a mapping within a betwixt file, as shown in Listing 5.40.

Listing 5.40

 <info primitiveTypes="element">
   <element name='documentation'>
     <element name="embedded">
       <element name="ex-bean-to-write" property="exBeanToWrite"/>
     </element>
     <addDefaults/>
   </element>
 </info>

In Listing 5.40, the XML element element whose name attribute is embedded is a static declaration. However, the child XML element element whose name attribute is ex-bean-to-write is mapped to the property exBeanToWrite. During serialization, an XML element named embedded is generated, and nested within that tag is the XML element ex-bean-to-write, which represents the class property exBeanToWrite. The output will resemble Listing 5.41.

Listing 5.41

 <documentation>
   <embedded>
     <ex-bean-to-write>
       <integerValue>1234</integerValue>
       <stringValue>another bean</stringValue>
     </ex-bean-to-write>
   </embedded>
 </documentation>

Listing 5.41 shows how easily and neatly you can solve the problem of having one Java Bean reference another Java Bean.

Mapping a Property to an Attribute

In the earlier examples, the individual properties were serialized with all Java properties being either XML child elements or XML attributes. There was no happy medium where some properties were serialized as XML elements and others as XML attributes. Using a betwixt file, you can choose for each Java property whether it is serialized as an XML element or as an XML attribute. For example, let's say you have a Java property that is an integer and needs to be serialized as an attribute. The sample betwixt file is shown in Listing 5.42.

Listing 5.42

 <info primitiveTypes="element">
   <element name='documentation'>
     <attribute name="version" property="version" />
     <addDefaults/>
   </element>
 </info>

When this is executed, you get the results shown in Listing 5.43.

Listing 5.43

 <documentation version="1">
   <!-- Some other XML elements -->
 </documentation>

You define a mapping of a property by using the XML element attribute and then adding an attribute property, as shown in Listing 5.42. This notation is very similar to that used with a static content declaration, except that instead of having an XML attribute value, there is an XML attribute property. You place the attribute according to the same rule as with static content.

Mapping a Property to Text

As a final mapping example, you can map a property to XML text. Listing 5.44 shows how to do it.

Listing 5.44

 <info primitiveTypes="element">
   <element name="documentation">
     <text property="version" />
   </element>
 </info>

The XML element text in Listing 5.44 writes the value of the property version as the text content of the enclosing XML element, which generates the output shown in Listing 5.45.

Listing 5.45

 <documentation>1</documentation>
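To compare the three placements of a property side by side (attribute, child element, and text), the following hand-written sketch builds each shape with the standard DOM API. It does not use betwixt, and the class PropertyPlacementSketch and its methods are hypothetical names chosen for illustration; it only shows the document shapes that Listings 5.43 and 5.45 and the default element mapping correspond to.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Builds the three output shapes by hand with DOM; hypothetical, not betwixt code. */
final class PropertyPlacementSketch {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    /** Shape of Listing 5.43: the property becomes an attribute of the root element. */
    static Element asAttribute(Document doc, String property, String value) {
        Element root = doc.createElement("documentation");
        root.setAttribute(property, value);
        return root;
    }

    /** Default element mapping: the property becomes a child element. */
    static Element asChildElement(Document doc, String property, String value) {
        Element root = doc.createElement("documentation");
        Element child = doc.createElement(property);
        child.setTextContent(value);
        root.appendChild(child);
        return root;
    }

    /** Shape of Listing 5.45: the property becomes the text of the root element. */
    static Element asText(Document doc, String value) {
        Element root = doc.createElement("documentation");
        root.setTextContent(value);
        return root;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        System.out.println(asAttribute(doc, "version", "1").getAttribute("version"));
        System.out.println(asChildElement(doc, "version", "1").getFirstChild().getNodeName());
        System.out.println(asText(doc, "1").getTextContent());
    }
}
```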

Hiding a Property

In all of the examples illustrated thus far, the introspection process has added every property to the list of properties to be serialized. In the betwixt file, you can indicate that a property should not be serialized. The way to do this is to use the XML element hide, as shown in Listing 5.46.

Listing 5.46

 <info primitiveTypes="element">
   <element name="documentation">
     <hide property="version" />
   </element>
 </info>

In Listing 5.46, the property to be hidden is version.

Reading an XML File

It is desirable to be able to read every bean that is written. Reading a bean is as simple as writing it. Things can get complicated, however, if you don't serialize the bean using a betwixt file. The betwixt file is a central part of the betwixt package and hence has been supported and tested extensively. Listing 5.47 is an example in which the class BetwixtBean is written to a string buffer, which is then read again using the class BeanReader.

Listing 5.47

 // Write the bean to an in-memory buffer
 StringWriter stringWriter = new StringWriter();
 BetwixtBean somebean = new BetwixtBean(1234, "hello world");
 BeanWriter writer = new BeanWriter(stringWriter);
 writer.enablePrettyPrint();
 writer.write(somebean);
 stringWriter.flush();
 // Prepend an XML prolog so that the buffer is a complete XML document
 String xml = "<?xml version='1.0'?>\n" + stringWriter.toString();
 // Read the bean back from the buffer
 BeanReader reader = new BeanReader();
 reader.registerBeanClass(BetwixtBean.class);
 BetwixtBean readBean = (BetwixtBean)reader.parse(new StringReader(xml));

In Listing 5.47, the class StringWriter is a string buffer that will be written to. We could have used the class FileWriter or any other type of writer in its place. The instance of StringWriter is passed to the class BeanWriter as a constructor parameter. This association means that anything serialized is automatically written to the writer. Combining an XML prolog with the contents of the string buffer referenced by the variable stringWriter creates a complete XML buffer.

Once a complete XML buffer is available, the buffer can be parsed using the class BeanReader. The class BeanReader is the opposite of the class BeanWriter, but there are a couple of differences. First, to function correctly, the class BeanReader needs to know which classes can be instantiated. This is the purpose of the method registerBeanClass, which accepts as a parameter the class information of the classes that can be instantiated.

If the class information references other classes, then those other classes are added to the pool of classes available for instantiation. It is essential, though, that every class in the pool can be instantiated without constructor parameters, that is, that each has a no-argument constructor. The classes that were serialized thus far, including BetwixtBean in Listing 5.35, did not have an empty constructor and therefore would generate an instantiation exception.
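The constructor requirement can be checked with plain reflection. The following sketch is an illustration only: the classes NoArgConstructorSketch, WithoutDefault, and WithDefault are hypothetical, and the check shown is an assumption about what any reflective reader has to do, not the BeanReader implementation. WithoutDefault follows the pattern of Listing 5.35 and cannot be created reflectively without arguments, whereas WithDefault adds an empty constructor and can.

```java
/** Shows why a reader needs a no-argument constructor; hypothetical sketch. */
final class NoArgConstructorSketch {

    /** Same pattern as Listing 5.35: only a parameterized constructor. */
    public static class WithoutDefault {
        private int integerValue;
        public WithoutDefault(int value) { integerValue = value; }
        public int getIntegerValue() { return integerValue; }
        public void setIntegerValue(int value) { integerValue = value; }
    }

    /** The fix: keep the convenience constructor and add an empty one. */
    public static class WithDefault {
        private int integerValue;
        public WithDefault() { }
        public WithDefault(int value) { integerValue = value; }
        public int getIntegerValue() { return integerValue; }
        public void setIntegerValue(int value) { integerValue = value; }
    }

    /** Returns true when the class can be created reflectively without arguments. */
    static boolean canInstantiate(Class<?> type) {
        try {
            type.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException lands here when no empty constructor exists.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(WithoutDefault.class)); // false
        System.out.println(canInstantiate(WithDefault.class));    // true
    }
}
```

Adding an empty constructor to BetwixtBean in the same way would let the read step of Listing 5.47 create the instance and then populate it through the set methods.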

To reconstruct the BetwixtBean instance, the method parse is called in Listing 5.47. The method parse accepts a character source as a parameter, in this case an instance of the class StringReader.