The XML Solution | Professional JMS

Using XML as a basis for message formats can eliminate most of these problems, and thus promote loose coupling between systems. There are a number of important reasons for this, which we will expand on in the following sections. These are:

XML is language- and platform-independent
XML is standards-based
XML is extensible
XML is flexible
XML has rich structures
XML uses Unicode
XML has a powerful validation model
XML is human readable, and machine understandable

On its own, no single point is compelling enough to change how messaging is done. Taken as a whole, though, it is hard to argue against using XML for messaging.

Language and Platform Independence

XML is not tied to any language, operating system, networking protocol, hardware, or, possibly most important of all, vendor. From the beginning, the architects of XML recognized that to be successful, XML would have to be the common means of describing data between vastly different systems.

Of course, vendors will always attempt to differentiate their products by adding features. Furthermore, since the specifications are evolving very rapidly, it is easy to become locked into enhancements or limitations of a particular product. However, the neutrality of the basic recommendations is extremely important.

Standards-Based

The W3C is responsible for the standardization of XML. The word XML actually describes a diverse and evolving family of technologies and standards into which new features and refinements are continuously added. But the core technology is actually very stable and mature. Out of this has come a rich assortment of tooling from vendors and the open source movement. This means that developers do not have to write parsers – production-quality parsers are available in almost every language. Most of these come at no charge and with unrestrictive distribution licenses. Developers do not have to build specialized message editors that are aware of the nuances of their vocabulary because there are many of these already in existence.

With a standard already defined and tools widely available that support it, developers can concentrate on building good APIs and flexible messaging vocabularies to solve the problem at hand, rather than building infrastructure and arguing over message structuring rules. This focuses messaging projects on the content of the transactions, which is ultimately where the challenges are.

Note

The similarities between XML and HTML are not coincidental. Both derive from a common ancestor, the Standard Generalized Markup Language (SGML). Like XML, SGML allowed users to add structure to documents independent of presentation. Stylistic transforms could then be applied to add presentation details. SGML saw use in government and technical publishing; however, it is very complicated and expensive to support.

Extensibility

XML places few restrictions on how element tags and attributes are named. Developers are free to give tags meaningful names, so that messages can become self-documenting. It provides much the same freedom that we have in choosing descriptive object and variable names in high-level languages. Parsers are really only interested in structure, so need no modifications to accommodate changes in document organization.

Flexibility

XML applications tend to be very forgiving when changes are made to documents. For example, returning to our profile example, suppose a new element, City, was introduced, but the application that processes the document was not changed:

     <Profile>
        <Name>John Smith</Name>
        <Address>123 Main St</Address>
        <City>Any Town</City>
        <Phone>+1 (555) 555-1212</Phone>
        <CreditCardList>
           <Card>
              <Number>1234567890</Number>
              <Expiry>12/02</Expiry>
           </Card>
           <Card>
              <Number>9876543210</Number>
              <Expiry>06/03</Expiry>
           </Card>
        </CreditCardList>
     </Profile>

The document remains well formed, so the parser will not report any exceptions. Depending on how they are constructed, most XML applications will simply use the parser to mine the information they need. In general, a new field will be ignored. This may not be the desired behavior; however, it does demonstrate that the system is more resilient to change. Consider, for example, a Publish/Subscribe application using XML messages. New elements could be added to the messages without breaking subscribers that have not been updated to specifically make use of the new elements.

The message validation process may flag the change, but we can choose to ignore validation errors due to added elements that we intend to ignore.
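As a minimal sketch of this behavior, the following Java handler uses the DOM parser bundled with the JDK (JAXP) to pull out only the fields it knows about. The class name ProfileReader and the helper firstText are hypothetical, chosen for illustration; the message is a shortened, single-string version of the Profile document shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical handler that reads only the elements it was written for.
final class ProfileReader {

    // Returns the text of the first element with the given tag name,
    // or null if the document contains no such element.
    static String firstText(String xml, String tagName) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(tagName);
            return matches.getLength() == 0
                ? null
                : matches.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse message", e);
        }
    }

    public static void main(String[] args) {
        String message =
            "<Profile>"
            + "<Name>John Smith</Name>"
            + "<Address>123 Main St</Address>"
            + "<City>Any Town</City>" // new element, unknown to this handler
            + "<Phone>+1 (555) 555-1212</Phone>"
            + "</Profile>";
        System.out.println(firstText(message, "Name"));
        System.out.println(firstText(message, "Phone"));
    }
}
```

Because the handler asks for elements by name rather than by position, the added City element does not disturb it. A handler that walked child nodes by index would be less tolerant, which is the "depending on how they are constructed" caveat in practice.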

Rich Structures

XML supports rich structures with deep, hierarchical nesting and a model for containing repeated elements. One of the few demands that XML makes is that documents must be well formed. This simple demand greatly simplifies document processing and improves general legibility: one can infer meaning from inspection of a well-designed XML document. HTML, in contrast, does not have to be well formed, and thus is often difficult to parse, to edit, and, too often, to follow.

The XML strategy of defining hierarchies that contain data maps elegantly onto the data structures of most languages, making XML a good choice as a serialization format for languages like Java or C++.
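To illustrate the mapping, here is a small sketch that reads the repeated Card elements from the earlier Profile example into a Java list. The classes CardListMapper and Card are hypothetical names, and the code assumes every Card contains exactly one Number and one Expiry child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical mapper from <Card> elements to plain Java objects.
final class CardListMapper {

    // Value class mirroring the structure of one <Card> element.
    static final class Card {
        final String number;
        final String expiry;

        Card(String number, String expiry) {
            this.number = number;
            this.expiry = expiry;
        }
    }

    // Each repeated <Card> element becomes one entry in the returned list.
    static List<Card> readCards(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList cardNodes = doc.getElementsByTagName("Card");
            List<Card> cards = new ArrayList<>();
            for (int i = 0; i < cardNodes.getLength(); i++) {
                Element card = (Element) cardNodes.item(i);
                cards.add(new Card(childText(card, "Number"),
                                   childText(card, "Expiry")));
            }
            return cards;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse message", e);
        }
    }

    // Assumes the named child exists (see the note above the listing).
    private static String childText(Element parent, String tagName) {
        return parent.getElementsByTagName(tagName).item(0).getTextContent();
    }
}
```

The nesting in the document (a list containing cards, each containing two fields) carries over directly to a List of objects with two fields, which is why XML works well as a serialization format for object graphs.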

Unicode Support

One of XML's most visible benefits is that the documents are plain text. They can be read by humans, and edited with a simple text editor. This has been extremely important for interoperability between systems, as virtually all computers support some kind of text encoding. However, even something as basic as defining the bindings between binary values and characters has been a contentious issue in the computing community. ASCII encoding is popular because of the tremendous success of PCs and UNIX systems; however, alternative encodings such as IBM's EBCDIC are very common in mainframe and midrange computer environments.

Internationalization has always been an important requirement for XML. Encoding schemes like ASCII are really only appropriate for supporting languages that use the Latin alphabet. Other character sets exist, which support other languages; however, a truly global encoding solution needs to be able to potentially support all characters from a single encoding scheme. XML addresses these issues by adopting Unicode as the base character encoding scheme for text.
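The practical effect can be sketched in Java: the same logical message, serialized as UTF-8 or as UTF-16 bytes, parses to the same Unicode string. The class EncodingDemo and its method are hypothetical, and the sketch assumes the name contains no markup characters such as < or &.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo: one message, two byte encodings, one result.
final class EncodingDemo {

    // Serializes a <Name> message in the given charset, then parses the
    // raw bytes. The parser works out the encoding from the byte stream
    // and the encoding declaration; the caller never decodes by hand.
    static String roundTripName(String name, Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>"
            + "<Name>" + name + "</Name>";
        byte[] bytes = xml.getBytes(charset);
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse message", e);
        }
    }

    public static void main(String[] args) {
        String name = "\u5c71\u7530 \u592a\u90ce"; // a Japanese personal name
        String fromUtf8 = roundTripName(name, StandardCharsets.UTF_8);
        String fromUtf16 = roundTripName(name, StandardCharsets.UTF_16);
        System.out.println(fromUtf8.equals(fromUtf16));
    }
}
```

Conforming XML processors are required to accept both UTF-8 and UTF-16, so these two encodings are a safe common ground between systems; support for other encodings depends on the parser.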

Validation Model

XML can solve some difficult problems in validating message structure and content. In particular, it decouples validation from the parser: changes to the document structure or content do not force changes on the parser, only on an external grammar that defines the rules a document must follow to be valid. Recompiling and relinking parser modules to accommodate message schema changes is not necessary; instead, XML allows very late binding between message handlers and the validation model. Naturally, this lowers code maintenance costs enormously, but it also provides great flexibility in an environment where message formats can evolve continuously.

DTDs provide a means of describing the structure of an XML document. Most XML parsers can optionally be set to validate a document against a DTD, which may be integrated within the document or stored separately. Actual content validation is more complex. XML inherently is weakly typed; everything in an XML document is a string. Weak typing is not necessarily always a bad thing: virtually every developer has a favorite scripting language they use to get things done quickly. Scripting languages are inherently weakly typed, but they can be used to great advantage in rapid application development. Nevertheless, when moving to enterprise class applications with large teams of developers, weak typing in a language can become a serious liability.

DTDs do little to remedy XML's weak typing, describing no more than structural organization and string content models. XML Schema, however, adds support for simple types, enumerations, ordered lists, and inheritance to messages undergoing validation. This is useful as a contractual basis for type bindings between XML messages and strongly typed, high-level languages like Java and C++. There are already some commercial and open-source tools that automatically generate utility classes for marshaling and unmarshaling between XML messages and the aforementioned object-oriented languages. This will be examined in detail in the section about Java/XML data binding, later in the chapter.
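As a rough sketch of typed validation, the following Java code checks a Card message against a small XML Schema using the javax.xml.validation API found in current JDKs. The schema itself is an assumption made for illustration (Number as xs:long, Expiry as a two-digit/two-digit pattern); it is not taken from any published vocabulary, and the class name SchemaCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical checker for <Card> messages.
final class SchemaCheck {

    // Illustrative schema: Number must be an integer, Expiry must look like MM/YY.
    static final String CARD_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Card'><xs:complexType><xs:sequence>"
        + "<xs:element name='Number' type='xs:long'/>"
        + "<xs:element name='Expiry'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:pattern value='[0-9]{2}/[0-9]{2}'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    private static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(CARD_XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema itself is invalid", e);
        }
    }

    // Returns true if the message conforms to CARD_XSD, false otherwise.
    static boolean isValid(String xml) {
        Validator validator = loadSchema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed, or a structure or type violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A DTD could only say that Number contains character data. Here a value such as 12x4 is rejected before any application code tries to convert it to a number, which is the kind of contract a data-binding tool can build on.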

Schema Distribution Model

If you do have a schema language that effectively decouples validation from parsing, how do you distribute the schema? There are three options:

Embedded schemas
Locally replicated schemas
Schema repositories

XML supports all three models. Using DTDs as an example of a schema description, the XML document itself could embed the schema:

     <?xml version="1.0" ?>
     <!DOCTYPE Profile [
     <!ELEMENT Profile (Name, Address, Telephone)>
     <!ELEMENT Name (#PCDATA)>
     <!ELEMENT Address (#PCDATA)>
     <!ELEMENT Telephone (#PCDATA)>
     ]>
     <Profile>
        <Name>John Smith</Name>
        <Address>123 Main St.</Address>
        <Telephone>+1 (555) 555-1212</Telephone>
     </Profile>

The square brackets define what is called the internal DTD subset. This method of distributing schemas has an obvious advantage: it solves the potential problem of having a handler process a message with an out-of-date schema. It comes with a high overhead, however, because every message must carry its complete schema.
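A document with an internal DTD subset can be validated with nothing but the parser's validation switch turned on. The sketch below uses the JAXP DocumentBuilderFactory from the JDK; the class name DtdValidator is hypothetical, and collecting the messages in a list is just one possible way of reporting problems.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper that validates a document against its own DTD.
final class DtdValidator {

    // Returns the validity errors found; an empty list means the
    // document is valid. Malformed input raises an exception instead.
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // warnings are ignored in this sketch
                }

                @Override
                public void error(SAXParseException e) {
                    problems.add(e.getMessage()); // validity error: record, keep going
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // not well formed: give up
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse message", e);
        }
        return problems;
    }
}
```

Passing the Profile document above yields an empty list. Adding an undeclared element such as City yields one or more validity errors, which the handler is free to log, reject, or ignore; the parser code itself does not change when the DTD does.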

A document can also reference an external file containing the schema description. This is done through a URL in the document type declaration:

     <?xml version="1.0" ?>
     <!DOCTYPE Profile SYSTEM "http://someServer/profile.dtd">
     <Profile>
        <Name>John Smith</Name>
        <Address>123 Main St.</Address>
        <Telephone>+1 (555) 555-1212</Telephone>
     </Profile>

This is similar to the Java convention for referencing tag library descriptors. The URL can point either to a local file or to a centralized repository. A local copy suffers, like all replication schemes, from the danger of reading stale data, while a repository may become inaccessible because of firewalls, network outages, and so on.
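One way to combine the two approaches is to leave the repository URL in the message but have the handler substitute a locally replicated copy. The sketch below does this with a SAX EntityResolver; the class name LocalDtdResolver is hypothetical, and the DTD is held in a string constant only to keep the example self-contained (a real handler would more likely read it from a file or the classpath).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler that validates against a local copy of the DTD.
final class LocalDtdResolver {

    static final String PROFILE_DTD_URL = "http://someServer/profile.dtd";

    // Local replica of the schema named by PROFILE_DTD_URL.
    static final String LOCAL_DTD =
        "<!ELEMENT Profile (Name, Address, Telephone)>"
        + "<!ELEMENT Name (#PCDATA)>"
        + "<!ELEMENT Address (#PCDATA)>"
        + "<!ELEMENT Telephone (#PCDATA)>";

    // Validates the document and returns any validity errors found.
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Intercept the DTD lookup so that no network access is needed.
            builder.setEntityResolver((publicId, systemId) ->
                PROFILE_DTD_URL.equals(systemId)
                    ? new InputSource(new StringReader(LOCAL_DTD))
                    : null); // null means: resolve the usual way
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) { }

                @Override
                public void error(SAXParseException e) {
                    problems.add(e.getMessage());
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse message", e);
        }
        return problems;
    }
}
```

This removes the dependency on the network at parse time, but it reintroduces the stale-data risk described above: the local copy is only as current as the last time it was refreshed.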

Human Readable, Machine Understandable

One of XML's greatest values is its most obvious: it is human readable. This helps immeasurably when debugging application problems. Dialogs between systems can easily be tracked, captured, played back for testing, and even modified. For example, a test suite could be based on a series of core XML messages. Technologies such as XSLT, a language for transforming XML from one form to another, could be used to generate all the possible permutations of these messages for exhaustive server testing.
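The test-generation idea can be sketched with the XSLT engine bundled with the JDK (javax.xml.transform). The stylesheet copies a core message unchanged except for the Telephone element, whose content is replaced by a parameter. The class name MessageVariants, the parameter name phone, and the choice of Telephone as the field to vary are all assumptions made for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical generator of test messages from one core message.
final class MessageVariants {

    // Identity transform plus one override for <Telephone>.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output omit-xml-declaration='yes'/>"
        + "<xsl:param name='phone'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='Telephone'>"
        + "<Telephone><xsl:value-of select='$phone'/></Telephone>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Produces one variant of the core message per supplied phone value.
    static List<String> generate(String coreMessage, List<String> phones) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            List<String> variants = new ArrayList<>();
            for (String phone : phones) {
                transformer.setParameter("phone", phone);
                StringWriter out = new StringWriter();
                transformer.transform(
                    new StreamSource(new StringReader(coreMessage)),
                    new StreamResult(out));
                variants.add(out.toString());
            }
            return variants;
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }
}
```

Feeding the generator a list of boundary values (very long strings, unexpected characters, and so on) gives a set of messages that can be replayed against a server. Varying several fields at once would need a more elaborate stylesheet or one pass per field.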