Deciding When to Use XML

XML is a core J2EE technology. Due to its use for the standard J2EE deployment descriptors (application.xml, web.xml and ejb-jar.xml) and the proprietary deployment descriptors required by most servers, it's impossible for J2EE developers to avoid using XML. XML technologies can also be used by choice to provide a valuable complement to Java-based J2EE technologies. One of the important enhancements in J2EE 1.3 is that it guarantees that the JAXP 1.1 API is available to J2EE applications (JAXP makes a further move into the core Java libraries in J2SE 1.4). JAXP provides standard support for XML parsing and XSL transformations in J2EE applications.

J2EE applications may need to generate and parse XML documents to achieve interoperability with non-J2EE platforms. (However, web services style interoperability can largely conceal the use of XML under the hood; XML may also be used to support client devices and user agents.)

Using XSLT in J2EE Applications

Another strong reason to use XML technologies, which applies even within J2EE applications, is to enable the use of XSLT to transform XML data into a wide variety of output formats. This is most valuable in web applications when generating HTML, XHTML or WML, which means that XSLT may be an alternative to web tier view technologies such as JSP.

XSLT guru Michael Kay (author of the excellent XSLT Programmer's Reference, Wrox Press) writes that "When I first saw the XSL transformation language, XSLT, I realized that this was going to be the SQL of the web, the high-level data manipulation language that would turn XML from being merely a storage and transmission format for data into an active information source that could be queried and manipulated in a flexible, declarative way".

If data is held in the form of XML documents, or can easily be converted to an XML representation, XSLT provides powerful functionality that is superior to Java-based technologies such as JSP in the following areas:

Transforming tree-structured data. XSLT provides powerful functionality for navigating and selecting tree nodes, using XPath expressions.
Sorting and filtering data elements.
XML and XSLT skills are not Java specific. For example, many Microsoft technology projects use XSLT for the presentation tier and use XSLT specialists. This may mean that we can call on valuable domain-specific, rather than J2EE-specific, skills.
The XML/XSLT paradigm cleanly separates data model (XML document) from presentation (XSLT stylesheet). We can achieve this using JSP, but there's a greater temptation to ignore the need for such separation.
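
As a rough illustration of the selection and sorting strengths listed above, the following sketch runs a small stylesheet through the standard JAXP transformation API. The class name XsltSortDemo, the people/person document and the stylesheet are hypothetical, invented for this example only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSortDemo {

    // Hypothetical input document: people with an age attribute, in no particular order.
    static final String XML =
          "<people>"
        + "<person age='40'>Rod</person>"
        + "<person age='12'>Tristan</person>"
        + "<person age='35'>Kerry</person>"
        + "</people>";

    // The XPath predicate filters out anyone under 18; xsl:sort orders the rest by name.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='people/person[@age &gt;= 18]'>"
        + "<xsl:sort select='.'/>"
        + "<xsl:value-of select='.'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the document and returns the result as a string.
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL)); // prints "Kerry;Rod;"
    }
}
```

Doing the same filtering and sorting in a JSP would normally mean preparing a sorted, filtered collection in Java code before the view is rendered.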

However, XSLT is weaker than Java-based technologies such as JSP in the following areas:

XSLT is integrated with J2EE at API, rather than specification, level. Using XML and XSLT requires some custom coding, and may require the use of third party libraries or frameworks.
Performance. XSLT transforms are usually significantly slower than rendering output using JSP, especially if the data first has to be converted from Java objects to XML documents. However, this performance overhead may not matter in many applications and may be outweighed by other positives of using XML and XSLT.
String manipulation. XSLT isn't a true programming language, and string manipulation using it is awkward and unintuitive.
Tool support. XSLT is very powerful but harder to edit without tools than JSP. XSLT tools are still relatively disappointing. When strong XSLT skills are available, this may not matter. However, a JSP solution may be simpler for many organizations to implement.

"Deep" Use of XML

To use XSLT, we need XML documents to transform. These don't need to exist as strings or on the file system: they can exist as W3C org.w3c.dom.Node objects before transformation.
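
A minimal sketch of this point, assuming nothing beyond JAXP: the document below is built in memory as DOM nodes and handed to a Transformer through a DOMSource, without ever existing as a string or file. The class name DomSourceDemo and the person/name elements are hypothetical, and an identity transform stands in for a real stylesheet.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomSourceDemo {

    // Builds a small document purely from DOM API calls.
    static Document buildDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element person = doc.createElement("person");
            Element name = doc.createElement("name");
            name.appendChild(doc.createTextNode("Kerry"));
            person.appendChild(name);
            doc.appendChild(person);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    // Uses an identity transform; a stylesheet-based Transformer accepts a DOMSource the same way.
    static String serialize(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(buildDocument()));
    }
}
```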

There is no problem when data naturally exists as XML: for example, if it comes from an XML database, XML content management system or external system that the J2EE application communicates with using XML. While XML databases are not commonly used, the last two of these scenarios are often seen in practice.

However, if the data doesn't already exist as XML, but we wish to use XSLT for presentation, we need to decide at what point in our J2EE architecture we will convert data to XML form. We must choose between "deep" use of XML (where data is passed around the application in the form of XML rather than Java objects) and superficial use of XML (in which case XML documents are created only at the boundary of the system; for example immediately before performing an XSLT transform).

J2EE and XML Development from Manning (ISBN 1-930110-30-8) advocates "deep" use of XML. In this approach, XML documents replace Java value objects within the J2EE system, making it easy to use XSLT at the boundaries.

Often this data will come from an RDBMS, so we can try to convert query results to XML form. Ronald Bourret maintains an excellent site on this issue at http://www.rpbourret.com/xml/XMLDBLinks.htm. Bourret has also published an article on the subject, at http://www.xml.com/pub/a/2001/05/09/dtdtodbs.html?page=1, which is a useful starting point.

There's a choice between doing the mapping work in Java code, inside the J2EE server, and performing it in the database. RDBMS vendors are rushing to provide XML support, and some relational databases, such as Oracle 9i, even allow us to store XML directly in the database. We may also be able to obtain query results in XML. Such approaches aren't portable, as no relevant standards exist yet.

Unfortunately, wherever we do the conversion, it will be a non-trivial task. In the next few chapters we'll discuss the "impedance mismatch" between RDBMS schema and Java object model. There is a comparable impedance mismatch between RDBMS schema and XML document, and a similar degree of complexity in trying to bridge it. The trivial examples of RDBMS result set to XML mappings shown in books such as J2EE and XML Development are insufficient for real applications: queries can become extremely complex (and expensive to run) if we try to build a deep XML hierarchy from one or more relational queries.

The relational model for data is not hierarchical, and it's difficult to build hierarchies using it. We can run multiple RDBMS queries to produce separate shallow XML documents, hoping to use XSLT to "join" the data on the client side (XSLT allows nodes to be looked up, in the same or in another document, by ID in an approach analogous to an RDBMS JOIN). However, this is relatively complex to implement in XSLT, and ignores the capabilities of the RDBMS itself. It's only appropriate where a small amount of reference data is concerned. For example, it might be a valid approach where a table of countries is concerned, but inappropriate where a table of invoices is concerned.
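
To make the XSLT "join" concrete, here is one possible sketch using xsl:key and the key() function to look up country names for customers held in the same document. The class name XsltJoinDemo, the data, and the element and key names are all hypothetical; a lookup into a second document would go through the document() function instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltJoinDemo {

    // Two shallow "tables" in one document, as separate queries might produce them.
    static final String XML =
          "<data>"
        + "<countries>"
        + "<country code='AU'>Australia</country>"
        + "<country code='GB'>United Kingdom</country>"
        + "</countries>"
        + "<customers>"
        + "<customer country='GB'>Smith</customer>"
        + "<customer country='AU'>Jones</customer>"
        + "</customers>"
        + "</data>";

    // xsl:key indexes countries by code; key() then plays the role of the JOIN condition.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:key name='countryByCode' match='country' use='@code'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='data/customers/customer'>"
        + "<xsl:value-of select='.'/>"
        + "<xsl:text>=</xsl:text>"
        + "<xsl:value-of select=\"key('countryByCode', @country)\"/>"
        + "<xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL)); // prints "Smith=United Kingdom;Jones=Australia;"
    }
}
```

This works well for a handful of countries, but the whole lookup table must be loaded into the transformation, which is why the approach does not scale to something like invoices.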

The difficulty in extracting relational data directly into XML documents is just one of several major problems with "deep" use of XML, which I believe should preclude its use in general:

We shouldn't modify overall application architecture to support a particular view strategy. What if we need to present data in a format that XSLT won't help us to generate? For example, XSLT is poor at generating binary formats.
If the application is distributed and uses EJBs with remote interfaces, passing XML documents from the EJB tier to EJB clients may be slower than passing Java objects, and we may run into serialization difficulties. W3C Node objects are not necessarily serializable; this depends on the implementation.
Java components inside the application will find it harder to work with XML documents than with Java objects: working with XML APIs is far more cumbersome. The one exception is tree-structured data, for which the XPath API may be used effectively.
XML does not support object-oriented principles such as encapsulation, polymorphism and inheritance. We lose even Java's strong typing – wholly, in the case of XML with DTDs, and partially, if we use the more complex XML Schema.
Working with XML is likely to prove much slower than working with Java objects. Applications that make "deep" use of XML are often slow and may waste server resources in XML and string processing.
Applications are likely to be harder to test. It's easier to test Java objects returned by an application than XML documents.
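
The exception noted in the list above, XPath over tree-structured data, might look like the following sketch. It assumes the javax.xml.xpath API, which was added to the Java platform after J2SE 1.4 and so was not available on the platform this book targets (at the time, a library such as Xalan would have been needed). The class name XPathDemo and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathDemo {

    // Parses the XML text and evaluates the XPath expression as a string value.
    static String select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<person><hobbies><item>skiing</item><item>cycling</item></hobbies></person>";
        // One expression replaces a hand-written walk over child nodes.
        System.out.println(select(xml, "/person/hobbies/item[2]")); // prints "cycling"
    }
}
```

Note that the result is an untyped string: the loss of strong typing described in the list applies even when XPath makes navigation convenient.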

Important

I don't advocate "deep" use of XML within J2EE applications. While superficially attractive (especially if we wish to use XSLT for presentation), applications built on internal communication using XML are likely to be slower, and harder to understand, maintain, and test, than applications based on Java objects and sound OO design principles.

Converting Between JavaBeans and XML

When we need to convert data to XML, it's usually a better idea to do so closer to the boundary of the system. If we do so in the web tier itself, immediately before the XSL transform, we can achieve interchangeability of view technology, allowing us to choose between using XSLT and JSP and other solutions, without modifying the entire architecture (we'll talk about how to achieve this important design goal in Chapters 12 and 13).

In J2EE applications, data normally exists as JavaBeans before rendering to a client such as a web browser. If we use Java-based technologies such as JSP, we can work with JavaBeans directly. If we need to expose data using XML (for example, to transform it using XSLT) we need to convert the JavaBeans to an XML representation. Fortunately, there is a fairly natural mapping from JavaBeans to XML documents, making this a far less complex problem than converting relational data to XML.

There are several approaches to generating XML from a graph of Java objects:

Code a toElement() method in each class that we'll want to represent as XML
This obvious but naïve approach has severe disadvantages. It adds complexity, and the need to understand XML, to every class in an object graph. It hard-codes the XML document structure and element and attribute names into each class. What if we need to generate a slightly different type of document? What if we're really interested in generating XML from an interface which has multiple implementations that don't share a superclass?
Use the GoF Visitor design pattern to facilitate XML generation
In this approach, each class in an object graph implements a Visitable interface. An XML visitor is responsible for traversing the graph and generating an XML document. This is a much superior approach to hard-coding element generation in application objects. The XML knowledge is concentrated in the visitor implementation. It's easy to generate different types of XML document. It works well with interfaces. Making each class in the graph implement the Visitable interface may prove useful for other tasks; the Visitor design pattern is very powerful.
Write a custom XML generator that knows about a particular object graph and can generate XML for it
This is similar to the Visitor approach, but less general. It also has the advantage of localizing XML generation code in one or more specialized classes, rather than littering it through the application's object model.
Generate XML nodes from Java objects using reflection, without requiring application-specific XML generation code
This approach uses generic XML generation infrastructure to generate XML from application objects, which need not know anything about XML. XML generation may either occur as the result of a complete traversal of the object graph, or objects may be converted to nodes on the fly, when required by XPath expressions. This approach will require a third-party library. It will be slightly slower than using custom XML generation, but the performance overhead will usually be less than that of transforming the generated XML. I'll refer to this approach as domification: the generation of XML DOM nodes from Java objects.
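
The Visitor approach in the list above could be sketched as follows. The Order and LineItem classes, the element names, and the VisitorXmlDemo wrapper are hypothetical; the point is only that all knowledge of XML lives in XmlVisitor, while the domain classes merely accept a visitor.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class VisitorXmlDemo {

    interface Visitor {
        void visit(Order order);
        void visit(LineItem item);
    }

    interface Visitable {
        void accept(Visitor visitor);
    }

    // Domain classes know about Visitor, but nothing about XML.
    static class LineItem implements Visitable {
        final String product;
        final int quantity;
        LineItem(String product, int quantity) {
            this.product = product;
            this.quantity = quantity;
        }
        public void accept(Visitor visitor) {
            visitor.visit(this);
        }
    }

    static class Order implements Visitable {
        final String id;
        final List<LineItem> items = new ArrayList<LineItem>();
        Order(String id) {
            this.id = id;
        }
        public void accept(Visitor visitor) {
            visitor.visit(this);
            for (LineItem item : items) {
                item.accept(visitor);
            }
        }
    }

    // All element and attribute names are concentrated here.
    static class XmlVisitor implements Visitor {
        private final Document doc;
        private Element orderElement;

        XmlVisitor() {
            try {
                doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
        }
        public void visit(Order order) {
            orderElement = doc.createElement("order");
            orderElement.setAttribute("id", order.id);
            doc.appendChild(orderElement);
        }
        public void visit(LineItem item) {
            Element e = doc.createElement("item");
            e.setAttribute("quantity", String.valueOf(item.quantity));
            e.appendChild(doc.createTextNode(item.product));
            orderElement.appendChild(e);
        }
        Document getDocument() {
            return doc;
        }
    }

    // Builds a small graph and generates its XML document.
    static Document demo() {
        Order order = new Order("42");
        order.items.add(new LineItem("ticket", 2));
        order.items.add(new LineItem("programme", 1));
        XmlVisitor visitor = new XmlVisitor();
        order.accept(visitor);
        return visitor.getDocument();
    }
}
```

A second Visitor implementation could emit a different document structure from the same graph without touching Order or LineItem.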

The second and fourth approaches are usually best. I've used the last three approaches successfully in several projects. The fourth approach – "domification" using reflection – is most attractive, as it minimizes the amount and complexity of code written by application developers. While the use of reflection will incur a small performance overhead, it will probably be smaller than the overhead of transforming the XML.

One major benefit of the use of reflection is that it will ensure that XML-generation code is always up to date; there will be no need to modify bloated XML-generation classes when objects change. We won't need to write verbose XML-generation code either; the domification library will hide the details of XML document creation. If domification is done on the fly, only the object properties actually needed by a stylesheet will be invoked at runtime, meaning that some of the overhead can be avoided.
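
To show why reflection keeps XML generation up to date, here is a deliberately simplified, eager domification sketch built on java.beans.Introspector. It is not how Domify or any other library is implemented; the class name ReflectiveDomifier and the nested Person bean are hypothetical, Map handling is omitted, and there is no protection against cyclic references.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ReflectiveDomifier {

    // Hypothetical bean; it contains no XML-related code at all.
    public static class Person {
        public String getName() { return "Kerry"; }
        public int getAge() { return 35; }
        public List<String> getHobbies() { return Arrays.asList("skiing", "cycling"); }
    }

    // Converts a value to an element: simple values become text, collections become
    // <item> children, and anything else is treated as a bean and introspected.
    static Element toElement(Document doc, String name, Object value) throws Exception {
        Element element = doc.createElement(name);
        if (value == null) {
            return element;
        }
        if (value instanceof String || value instanceof Number
                || value instanceof Boolean || value instanceof Character) {
            element.appendChild(doc.createTextNode(value.toString()));
        } else if (value instanceof Collection) {
            for (Object item : (Collection<?>) value) {
                element.appendChild(toElement(doc, "item", item));
            }
        } else {
            PropertyDescriptor[] properties =
                Introspector.getBeanInfo(value.getClass(), Object.class).getPropertyDescriptors();
            for (PropertyDescriptor pd : properties) {
                Method getter = pd.getReadMethod();
                if (getter != null) {
                    element.appendChild(toElement(doc, pd.getName(), getter.invoke(value)));
                }
            }
        }
        return element;
    }

    // Domifies a sample Person; failures surface as an unchecked exception.
    static Element demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = toElement(doc, "person", new Person());
            doc.appendChild(root);
            return root;
        } catch (Exception e) {
            throw new IllegalStateException("Domification failed", e);
        }
    }
}
```

Adding a new getter to Person makes a new child element appear with no change to the conversion code, which is the maintenance benefit described above.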

There are a few things we need to consider before relying on domification:

Although different libraries have different capabilities, the objects exposed will probably need to be JavaBeans. Bean property getters don't require arguments; ordinary methods often do, and therefore can't be invoked automatically. Although JSP and template languages such as WebMacro (http://www.webmacro.org) provide means of calling methods as well as getting bean property values, most presentation technologies work best when exposing JavaBeans. As it's good design practice (encouraged in the JSP specification) to make models JavaBeans, this shouldn't be a major problem.
We may need to customize XML generation for some types, although we can rely on the domification library to handle primitive types. A sophisticated domification library could allow pluggable handling for individual object types.
Cyclic references may pose a problem. These are legal in Java object graphs, but don't make sense in XML trees.
Unexpected errors may result if any bean property getters throw exceptions. It's unlikely that a domification library will be able to handle such errors usefully. As all data retrieval should be complete before models are passed to views, this shouldn't be a problem.
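
Where cyclic references are a concern, one defensive option is to check a bean graph before handing it to a domification library. The sketch below is an assumption about how such a check might be written, not a feature of any particular library: the class name CycleCheck and the Employee bean are hypothetical, and only bean properties and Iterable contents are followed.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class CycleCheck {

    // Hypothetical bean whose manager property can point back into the graph.
    public static class Employee {
        private Employee manager;
        public Employee getManager() { return manager; }
        public void setManager(Employee manager) { this.manager = manager; }
    }

    // Returns true if following bean properties from root ever revisits an
    // object that is already on the current path.
    static boolean hasCycle(Object root) {
        Set<Object> path = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        try {
            return visit(root, path);
        } catch (Exception e) {
            throw new IllegalStateException("Could not inspect bean graph", e);
        }
    }

    private static boolean visit(Object o, Set<Object> path) throws Exception {
        if (o == null || o instanceof String || o instanceof Number
                || o instanceof Boolean || o instanceof Character) {
            return false; // simple values are leaves
        }
        if (!path.add(o)) {
            return true; // same instance already on the path: a cycle
        }
        if (o instanceof Iterable) {
            for (Object item : (Iterable<?>) o) {
                if (visit(item, path)) {
                    return true;
                }
            }
        } else {
            PropertyDescriptor[] properties =
                Introspector.getBeanInfo(o.getClass(), Object.class).getPropertyDescriptors();
            for (PropertyDescriptor pd : properties) {
                Method getter = pd.getReadMethod();
                if (getter != null && visit(getter.invoke(o), path)) {
                    return true;
                }
            }
        }
        path.remove(o); // leaving this branch; shared (non-cyclic) references are fine
        return false;
    }
}
```

The identity-based set matters: two distinct beans that happen to be equal() must not be mistaken for a cycle.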

There are several published libraries for converting Java objects to XML. My favorite is the Domify open source project (http://domify.sourceforge.net/). Domify was originally part of the Maverick MVC web framework but was split into a separate project in December 2001. It's a tiny library (only eight classes) and is very easy to use. Other, more sophisticated, but more complex, products include Castor (http://castor.exolab.org/xml-framework.html), which "can marshal almost any ‘bean-like’ Java Object to and from XML". See http://www.rpbourret.com/xml/XMLDataBinding.htm for a directory of several XML-Java conversion products.

Domify uses the on-the-fly approach to XML node creation, with lazy loading. Nodes in the logical DOM tree that are never accessed by XSLT or other user code are never created. Once created, nodes are cached, so subsequent accesses will be faster.

To use Domify, it's first necessary to create an object of class org.infohazard.domify.DOMAdapter. Once a DOMAdapter has been created, its adapt(Object, String) method can be used to domify objects. A DOMAdapter object is thread-safe, hence can be used repeatedly (however, DOMAdapter objects are cheap to instantiate so there's no problem in creating many objects). The entire process looks like this:

    DOMAdapter domAdapter = new DOMAdapter();
    Node node = domAdapter.adapt(javaBean, "nameOfRootElement");

The adapt() method throws an unchecked exception if the transformation fails: most likely, because an invoked getter method threw an exception. As this is not recoverable, and should not happen when traversing a web tier model, this is a reasonable approach (as discussed in Chapter 4).

Domify doesn't check for cyclic references, so we must ensure that beans to be domified don't have any (again, this is not usually a problem with Java bean models in web applications).

To illustrate how Domify works, let's take a simple bean and look at the XML document Domify generates from it. The following simple bean has four properties (name, age, hobbies and family, of String, int, Collection, and Map type respectively), along with methods allowing data to be added:

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Map;

    public class Person {
        private int age;
        private String name;
        private List hobbies = new LinkedList();
        private Map family = new HashMap();

        public Person(String name, int age) {
            this.age = age;
            this.name = name;
        }

        public void addHobby(String name) {
            hobbies.add(name);
        }

        public void addFamilyMember(String relation, String name) {
            family.put(relation, name);
        }

        // The four bean properties that Domify will expose
        public int getAge() {
            return age;
        }

        public String getName() {
            return name;
        }

        public Collection getHobbies() {
            return hobbies;
        }

        public Map getFamily() {
            return family;
        }
    }

I've omitted the no-arg constructor and property setters for brevity. Let's construct a simple bean:

    Person p = new Person("Kerry", 35);
    p.addHobby("skiing");
    p.addHobby("cycling");
    p.addFamilyMember("husband", "Rod");
    p.addFamilyMember("son", "Tristan");

Domify will automatically expose the four bean properties. Each node is created only on demand, and cached thereafter in case it is required again in the same XML operation. The following illustrates the complete XML document. Note the treatment of the Collection and Map properties:

    <?xml version="1.0" encoding="UTF-8"?>
    <person>
      <name>Kerry</name>
      <age>35</age>
      <hobbies>
        <item type="java.lang.String">skiing</item>
        <item type="java.lang.String">cycling</item>
      </hobbies>
      <family>
        <item key="son" type="java.lang.String">Tristan</item>
        <item key="husband" type="java.lang.String">Rod</item>
      </family>
    </person>

It's easy to write an XSLT stylesheet to format this data as desired. This approach is very simple, yet powerful. The Maverick MVC web application framework demonstrates its effectiveness.
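
As an illustration of such a stylesheet, the sketch below turns a person document of the shape described above into a small markup fragment. To stay self-contained it parses the XML from a string with plain JAXP rather than obtaining nodes from Domify; the class name PersonViewDemo and the stylesheet itself are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PersonViewDemo {

    // Same structure as the generated person document, written inline for the example.
    static final String XML =
          "<person>"
        + "<name>Kerry</name>"
        + "<age>35</age>"
        + "<hobbies>"
        + "<item type='java.lang.String'>skiing</item>"
        + "<item type='java.lang.String'>cycling</item>"
        + "</hobbies>"
        + "<family>"
        + "<item key='son' type='java.lang.String'>Tristan</item>"
        + "<item key='husband' type='java.lang.String'>Rod</item>"
        + "</family>"
        + "</person>";

    // Collection entries become list items; Map entries are rendered using their key attribute.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/person'>"
        + "<div>"
        + "<h1><xsl:value-of select='name'/></h1>"
        + "<ul><xsl:for-each select='hobbies/item'>"
        + "<li><xsl:value-of select='.'/></li>"
        + "</xsl:for-each></ul>"
        + "<xsl:for-each select='family/item'>"
        + "<p><xsl:value-of select='@key'/>: <xsl:value-of select='.'/></p>"
        + "</xsl:for-each>"
        + "</div>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String render() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

With Domify in place, the node returned by adapt() would be wrapped in a javax.xml.transform.dom.DOMSource and passed to the same Transformer instead of the string source used here.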

Converting in the opposite direction – from XML representation to JavaBean – is also relatively straightforward, and is a widely used approach for getting application configuration out of Java code.

Many applications and frameworks use XML documents to provide long-term persistence for JavaBeans, such as JBoss's jboss.jcml XML configuration file, which uses XML to configure JMX MBeans. Each bean property is usually represented as an XML element, with an attribute value holding the property name. The generic framework we'll describe in this book and use for our sample application also uses this approach. Most application objects will be JavaBeans, their properties and relationships held outside Java code in XML documents. This is discussed in detail in Chapter 11.

J2SE 1.4 introduces "Long Term JavaBeans Persistence" to standardize such functionality, although it is too early to tell how widely this standardization will be accepted. In Java 1.4, the java.beans API is extended to read and write a bean as an XML representation of its property values. As this book is concerned with the J2EE 1.3 platform, which is based on J2SE 1.3, it's assumed that this new API isn't available. However, its introduction indicates the importance of Java beans to XML mapping.
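
For readers who do have J2SE 1.4 or later available, the extended java.beans API referred to here is exposed through the XMLEncoder and XMLDecoder classes. The following round-trip sketch uses a hypothetical Settings bean and wrapper class BeanPersistenceDemo; it is only meant to show the shape of the API.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class BeanPersistenceDemo {

    // Hypothetical bean: a public class with a public no-arg constructor,
    // getters and setters, as the persistence mechanism expects.
    public static class Settings {
        private String host = "localhost";
        private int port = 80;

        public Settings() { }
        public String getHost() { return host; }
        public void setHost(String host) { this.host = host; }
        public int getPort() { return port; }
        public void setPort(int port) { this.port = port; }
    }

    // Writes the bean as an XML representation of its property values.
    static byte[] write(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XMLEncoder encoder = new XMLEncoder(out);
        encoder.writeObject(bean);
        encoder.close();
        return out.toByteArray();
    }

    // Reads a bean back from its XML representation.
    static Object read(byte[] xml) {
        XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml));
        try {
            return decoder.readObject();
        } finally {
            decoder.close();
        }
    }

    public static void main(String[] args) {
        Settings settings = new Settings();
        settings.setPort(8080);
        byte[] xml = write(settings);
        System.out.println(new String(xml));
        Settings copy = (Settings) read(xml);
        System.out.println(copy.getHost() + ":" + copy.getPort());
    }
}
```

The encoder records only properties whose values differ from those of a freshly constructed bean, so the generated XML stays small when most properties keep their defaults.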

J2EE and XML in the Future

J2EE will become still more closely integrated with XML – even besides web services support. Important forthcoming enhancements include Java Architecture for XML Binding (JAXB), which automates the mapping between Java objects and XML documents by providing tools to generate Java classes from XML DTDs or schemas. The generated classes can efficiently handle XML parsing and formatting, simplifying code that uses them and offering the potential of much better performance than is available with traditional XML parsing or Java-to-XML converters such as Domify. The JAXB homepage is at http://java.sun.com/xml/jaxb/index.html. The JAXB specification is presently in draft, and should be complete by the end of 2002.

Important

XML is a core J2EE technology. XSLT is a viable alternative to JSP as a view technology in J2EE applications if it simplifies presentation logic, and when data already exists as XML or can efficiently be converted to XML form.

I don't favor "deep" use of XML within a J2EE architecture. XML documents are loosely typed and cumbersome to access and manipulate in J2EE applications.

XML in the Sample Application

There's no reason to use XML deep within the sample application's architecture. However, it is reasonable to consider the option of using XSLT as a view technology. A key business requirement is the ability to "rebrand" the application by modifying the presentation without revising the workflow. XML and XSLT provide an excellent way of doing this, although we can also achieve it with JSP and other view technologies, so long as we use an MVC approach in the web tier.

We may want to retain the option of using XML and XSLT to generate web content, but have no reason to tie ourselves to using XML. This suggests that we should create our data models as JavaBeans, and use a package such as Domify to convert them to XML documents if needed. The volume of user-specific data to be converted and styled (in reservation objects, for example) is modest, so the performance overhead of this approach should not be a problem. Where reference data (such as performance dates) is concerned, we may want to cache converted documents, as more data is involved and more pages are concerned.

It would be difficult to justify using XML and XSLT in Phase 1 except as a strategic choice, or unless practical considerations suggested it (such as the availability of strong XSLT skills but no JSP skills). JSP will prove simpler and quicker for generating the screens described in the initial requirements. However, XSLT might well come into its own in the future (for example, it would be well suited to sorting reference data in the web tier). In Chapter 13 we will discuss XSLT as a view technology in detail, along with a demonstration of how it could be used in the sample application without changing the application's overall architecture.