What XML Brings to Messaging | Professional JMS

We have come to recognize the value of XML as a means to manipulate documents, but how can it help in the messaging world and, in particular, how can it be applied to a JMS application? Simply put, a message is a document. The advantages of using XML in a simple file-based application apply equally to any messaging application. But there is one even more important quality that is unique to messaging:

Important

XML reduces system coupling.

One of the primary reasons for adopting JMS and the messaging paradigm is to build loosely coupled systems that can cope with unreliable networks, the intermittent availability of other systems, widely varying loads, and so on. XML takes this one step further by reducing coupling in the API itself.

Coupling Through APIs

The strength of the messaging paradigm is that it has always been highly effective under constraints such as limited system resources or network bandwidth. Messaging, by its nature, promotes loose coupling between systems, where coupling, in this sense, refers to a number of important characteristics. Messaging in and of itself does not guarantee loose coupling; however, it has a number of qualities, particularly when used in conjunction with XML, that encourage it.

Systems couple through APIs (here I consider a message and its handler to constitute an API). The message is a contract between communicating systems; it defines parameter types and placement, communications overhead, etc. How the handler abstracts the message away affects the degree of coupling. Handlers that are tolerant of minor message changes, such as modifications to versioning information, or the addition of fields to support parametric polymorphism etc., promote loose coupling. Changes made to a server, for example, may not necessitate simultaneous changes on all clients. Loosely coupled APIs typically exploit late binding of message content to the local type system.
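To make the idea of a tolerant handler concrete, here is a minimal Java sketch using the JDK's built-in DOM parser (javax.xml.parsers). The class ProfileHandler, its methods, and the extra Email element are hypothetical; the point is that the handler looks up only the elements it needs, by name, so a sender can add fields without breaking it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical handler for profile messages. It looks up only the elements it
// needs, by name, and ignores everything else in the message.
class ProfileHandler {

    // A newer sender has added an <Email> element this handler has never seen.
    static final String MESSAGE = "<Profile><Name>John Smith</Name>"
            + "<Email>js@example.com</Email>"
            + "<Phone>+1 (555) 555-1212</Phone></Profile>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unparseable message", e);
        }
    }

    // Text of the first element with the given tag, or the fallback when the
    // element is absent (for example, a message from an older sender).
    static String field(Document doc, String tag, String fallback) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : fallback;
    }

    public static void main(String[] args) {
        Document doc = parse(MESSAGE);
        System.out.println(field(doc, "Name", "?"));      // John Smith
        System.out.println(field(doc, "Address", "n/a")); // n/a
    }
}
```

The price of this tolerance is that a missing or misspelled element is discovered only at run time, which is why field takes an explicit fallback.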

Note

Late binding of messages refers to when message contents are converted to native types in a language like Java. For example, when you compile a program in Java or C++, the compiler checks that the argument types in each call are consistent with at least one of the method signatures. This ensures that simple syntactic typing errors are caught early, but it comes at the expense of a certain amount of flexibility and adaptability to change. A system that exploits late binding is one in which the decision about which content maps to which type is made at run time. Late binding is a double-edged sword: on one hand, you have the flexibility to adapt to changing interfaces on the fly; on the other, you must ensure that your program can handle the ambiguous cases.
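The sketch below illustrates late binding with a hypothetical LateBinder class: the mapping from element name to Java type is a table consulted at run time rather than a compile-time method signature. It assumes Expiry values use the MM/yy form seen in the profile data later in this chapter and binds them to java.time.YearMonth; unknown element names stay as strings, and text that cannot be converted comes back as an empty Optional, which is one way of forcing the caller to deal with the ambiguous cases.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical run-time binder: which Java type an element becomes is decided
// by a lookup table while the program runs, not by compile-time signatures.
class LateBinder {
    private final Map<String, Function<String, Object>> converters = new HashMap<>();

    LateBinder() {
        // Assumption: "12/02" means December 2002 (month/two-digit year).
        DateTimeFormatter mmyy = DateTimeFormatter.ofPattern("MM/yy");
        converters.put("Expiry", text -> YearMonth.parse(text, mmyy));
    }

    // Unknown elements fall back to String; text that does not fit the
    // expected type yields Optional.empty() and the caller must decide.
    Optional<Object> bind(String element, String text) {
        Function<String, Object> convert = converters.getOrDefault(element, s -> s);
        try {
            return Optional.of(convert.apply(text.trim()));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        LateBinder binder = new LateBinder();
        System.out.println(binder.bind("Expiry", "12/02"));     // Optional[2002-12]
        System.out.println(binder.bind("Name", "John Smith"));  // Optional[John Smith]
        System.out.println(binder.bind("Expiry", "Dec 2002"));  // Optional.empty
    }
}
```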

In contrast, tightly coupled APIs are intolerant of changes to the contract. They are typically statically type-checked at compile time, which promotes type safety but offers little flexibility or potential for adaptation. Critics charge that such APIs are "brittle": any change made to the message or to any handler's behavior requires that all participants be changed at the same time. This can be very expensive if a large number of clients use the interface.

Coupling also refers to the relative independence of the two systems. In a loosely coupled environment, the remote system's hardware, OS, network protocol, or even the language the application is written in should not be a concern for developers; indeed, they should not even need to know these details.

Coupling also refers to communications style, which is often dictated by the capabilities of the transport layer. Systems that exhibit tight coupling at the communications layer demand that servers be available when a client sends a message to them; if a server is unavailable, the client must immediately take corrective action. Furthermore, such clients typically expect servers to deliver reply messages synchronously. In contrast, loosely coupled systems can easily tolerate an unavailable server: messages may be queued for later delivery, when the server becomes available. Clients, if they expect a reply at all, expect asynchronous delivery.

Loose coupling clearly had benefits in a networked world of questionable reliability. Historically, it promoted a much more coarsely grained model of distributed computing and parallelism than we see now; programs tended to be standalone, the communications peer-to-peer rather than client-server. Development of each application-peer was handled independently, and often by different groups.

As networks matured, development tools and programming techniques became more sophisticated, and tight coupling became more prevalent. Developers shifted their focus to client/server systems, where the attention was on distributed processing within network-spanning applications. The entire interaction, client to server and back again, became the singular focus, with end-to-end responsibility often falling to one group.

What is interesting is that the very factors that gave rise to messaging in the first place are what fuel its current resurgence. Businesses want to interact with other businesses across the Internet. Each organization wants to be responsible for administering only its own systems. It wants the flexibility to accommodate continuously shifting requirements. It does not necessarily trust its trading partners to be reliable. And it wants to do all of this over the public Internet, which, notwithstanding the phenomenal success of the Web, is notoriously unreliable.

XML Refresher

This chapter assumes prior knowledge of basic XML concepts. However, in the following section we briefly review some important details of XML that are needed to understand the later examples, including DTDs, schemas, and namespaces.

We will use the following simple profile data as a basis for examining these:

Name	John Smith
Address	123 Main St
Phone	+1 (555) 555-1212
Credit Card #1 Number	1234567890
Credit Card #1 Expiry	12/02
Credit Card #2 Number	9876543210
Credit Card #2 Expiry	06/03

The Document

Our profile data, rendered as a simple XML document, would look like the following:

     <?xml version="1.0" encoding="UTF-8"?>
     <Profile>
       <Name>John Smith</Name>
       <Address>123 Main St</Address>
       <Phone>+1 (555) 555-1212</Phone>
       <CreditCardList>
         <Card>
           <Number>1234567890</Number>
           <Expiry>12/02</Expiry>
         </Card>
         <Card>
           <Number>9876543210</Number>
           <Expiry>06/03</Expiry>
         </Card>
       </CreditCardList>
     </Profile>

Here, we have several levels of structure, including a credit card list with two distinct items. This is a well-formed document, meaning that every opening tag has a corresponding closing tag and no tags overlap. This version of the document has no associated schema (validation is reviewed in the next sections), so none of the values are typed: all data must be treated as strings. The document is encoded in UTF-8, one of the encodings of Unicode (http://www.unicode.org), the global character encoding standard. This makes it a particularly good match for applications written in Java, whose char and String types use the 16-bit UTF-16 encoding of Unicode (other languages, such as Visual Basic and C++, can of course manipulate Unicode text as well).
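Because the document declares UTF-8, a Java parser decodes its bytes straight into Java strings. The minimal sketch below, using the JDK's DOM parser, encodes a small profile to UTF-8 bytes, as it might travel in a message body, and parses it back. The class name and the accented name "José Smith" are hypothetical, chosen to show non-ASCII text surviving the round trip; the card number also shows that, without a schema, turning text into a number is the application's job.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class EncodingDemo {

    // Encodes a tiny profile as UTF-8 bytes and parses the bytes back.
    // Values are inserted unescaped, so this only suits simple sample values.
    static Document roundTrip(String name, String number) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Profile><Name>" + name + "</Name>"
                + "<Number>" + number + "</Number></Profile>";
        byte[] wire = xml.getBytes(StandardCharsets.UTF_8);
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(wire));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String name = "Jos\u00e9 Smith"; // hypothetical non-ASCII value
        Document doc = roundTrip(name, "1234567890");
        // The decoded text is identical to the original Java string...
        System.out.println(text(doc, "Name").equals(name)); // true
        // ...even though the accented e took two bytes on the wire.
        System.out.println(name.length() + " chars, "
                + name.getBytes(StandardCharsets.UTF_8).length + " bytes"); // 10 chars, 11 bytes
        // Without a schema, the card number is just a String until we convert it.
        long number = Long.parseLong(text(doc, "Number"));
        System.out.println(number + 1); // 1234567891
    }
}
```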

The DTD

A DTD internal subset (that is, a syntactic description embedded in the document) for the profile document looks like the following:

     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE Profile [
         <!ELEMENT Profile (Name, Address, Phone, CreditCardList)>
         <!ELEMENT Name (#PCDATA)>
         <!ELEMENT Address (#PCDATA)>
         <!ELEMENT Phone (#PCDATA)>
         <!ELEMENT CreditCardList (Card+)>
         <!ELEMENT Card (Number, Expiry)>
         <!ELEMENT Number (#PCDATA)>
         <!ELEMENT Expiry (#PCDATA)>
     ]>
     <Profile>
         <Name>John Smith</Name>
         <Address>123 Main St</Address>
         <Phone>+1 (555) 555-1212</Phone>
         <CreditCardList>
             <Card>
                 <Number>1234567890</Number>
                 <Expiry>12/02</Expiry>
             </Card>
             <Card>
                 <Number>9876543210</Number>
                 <Expiry>06/03</Expiry>
             </Card>
         </CreditCardList>
     </Profile>

Profile is a pretty simple document, so most of the DTD can be figured out from context. The ELEMENT declaration defines the name of each element and its content. The terminal elements, such as Name and Address, have a content type of parsed character data (#PCDATA), which is basically arbitrary text. Using DTDs, elements can be empty; they can contain other elements; they can contain PCDATA; or they can have mixed content, meaning that they are composed of other elements and character data. One of the fundamental problems with DTDs is that they do not really add strong typing to XML; everything pretty much remains a string. XML Schema addresses this problem, as we will see briefly in the next section.
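Validation against a DTD like this one can also be done from Java. In this minimal sketch using the JDK's DOM parser, setValidating(true) turns on DTD validation. Validity errors are reported to the ErrorHandler rather than thrown, so the sketch installs a handler that rethrows them. The class DtdCheck and its shortened one-card documents are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdCheck {
    static final String DTD = "<!DOCTYPE Profile ["
            + "<!ELEMENT Profile (Name, Address, Phone, CreditCardList)>"
            + "<!ELEMENT Name (#PCDATA)><!ELEMENT Address (#PCDATA)>"
            + "<!ELEMENT Phone (#PCDATA)><!ELEMENT CreditCardList (Card+)>"
            + "<!ELEMENT Card (Number, Expiry)>"
            + "<!ELEMENT Number (#PCDATA)><!ELEMENT Expiry (#PCDATA)>]>";

    static final String VALID = DTD
            + "<Profile><Name>John Smith</Name><Address>123 Main St</Address>"
            + "<Phone>+1 (555) 555-1212</Phone><CreditCardList><Card>"
            + "<Number>1234567890</Number><Expiry>12/02</Expiry>"
            + "</Card></CreditCardList></Profile>";

    // Still well formed, but Phone is missing, which the content model forbids.
    static final String INVALID = VALID.replace("<Phone>+1 (555) 555-1212</Phone>", "");

    // Turns validity errors into exceptions instead of merely reporting them.
    private static final ErrorHandler STRICT = new ErrorHandler() {
        @Override public void warning(SAXParseException e) { }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    // True if the document is well formed and valid against its DOCTYPE.
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(STRICT);
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(VALID));   // true
        System.out.println(isValid(INVALID)); // false
    }
}
```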

Here is an example document demonstrating both entities and attributes in DTDs:

     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE Document [
         <!ELEMENT Document (#PCDATA)>
         <!ATTLIST Document docVersion CDATA #REQUIRED>
         <!ENTITY insertName "John Smith" >
     ]>
     <Document docVersion="1.23">
         Dear &insertName;,
         It has come to our attention...
     </Document>

You can load this document into a web browser, which will parse the file, check that it is well formed, validate it against the DTD, and display it using a default style sheet. The result will look similar to the following (note that Internet Explorer renders XML nicely, in colour, using a default style sheet):

[Figure: the Document example as rendered in Internet Explorer with its default style sheet]
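The same entity expansion and attribute handling happen when a Java program parses the document. In this minimal sketch (the EntityDemo class is hypothetical), the JDK's DOM parser replaces &insertName; with its declared value, since DocumentBuilderFactory expands entity references by default, and the docVersion attribute is read back as a plain string.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EntityDemo {
    static final String LETTER = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE Document ["
            + "<!ELEMENT Document (#PCDATA)>"
            + "<!ATTLIST Document docVersion CDATA #REQUIRED>"
            + "<!ENTITY insertName \"John Smith\">]>"
            + "<Document docVersion=\"1.23\">Dear &insertName;, "
            + "It has come to our attention...</Document>";

    // Parses the letter and returns its root element, with entities expanded.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Element letter = root(LETTER);
        System.out.println(letter.getAttribute("docVersion")); // 1.23
        System.out.println(letter.getTextContent()); // Dear John Smith, It has come to our attention...
    }
}
```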

The Schema

We will not delve into the XML Schema specification in detail. To get a sense of what it offers, here is an example of an external schema document for the profile document:

     <?xml version="1.0" encoding="UTF-8" ?>
     <!--
     W3C Schema generated by XML Spy v3.5 NT (http://www.xmlspy.com)
     -->
     <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
             elementFormDefault="qualified">
       <xsd:element name="Address" type="xsd:string" />
       <xsd:element name="Card">
         <xsd:complexType>
           <xsd:sequence>
             <xsd:element ref="Number" />
             <xsd:element ref="Expiry" />
           </xsd:sequence>
         </xsd:complexType>
       </xsd:element>
       <xsd:element name="CreditCardList">
         <xsd:complexType>
           <xsd:sequence>
             <xsd:element ref="Card" maxOccurs="unbounded" />
           </xsd:sequence>
         </xsd:complexType>
       </xsd:element>
       <xsd:element name="Expiry" type="xsd:string" />
       <xsd:element name="Name" type="xsd:string" />
       <xsd:element name="Number" type="xsd:string" />
       <xsd:element name="Phone" type="xsd:string" />
       <xsd:element name="Profile">
         <xsd:complexType>
           <xsd:sequence>
             <xsd:element ref="Name" />
             <xsd:element ref="Address" />
             <xsd:element ref="Phone" />
             <xsd:element ref="CreditCardList" />
           </xsd:sequence>
         </xsd:complexType>
       </xsd:element>
     </xsd:schema>

This schema was generated from an example profile document using XMLSpy (http://www.xmlspy.com), a very popular commercial XML editor. Most of it is self-evident; this clarity is one of the strong points of XML Schema. Notice the xsd: prefix on all of the elements. Namespaces exist to group and disambiguate elements, though sometimes, as here, they hurt readability. In this example, the xsd prefix marks every element as belonging to the XML Schema namespace (more on namespaces below).
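In Java, schema validation goes through javax.xml.validation. One caveat: current JDKs only recognize the final XML Schema namespace, http://www.w3.org/2001/XMLSchema, not the 2000/10 draft namespace in the generated schema above. The minimal sketch below therefore uses a condensed variant in the 2001 namespace, rewritten with nested local declarations, that accepts the same Profile documents. The class SchemaCheck and its sample documents are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Condensed profile schema using local declarations (2001 namespace).
    static final String XSD = String.join("\n",
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>",
            "  <xsd:element name='Profile'><xsd:complexType><xsd:sequence>",
            "    <xsd:element name='Name' type='xsd:string'/>",
            "    <xsd:element name='Address' type='xsd:string'/>",
            "    <xsd:element name='Phone' type='xsd:string'/>",
            "    <xsd:element name='CreditCardList'><xsd:complexType><xsd:sequence>",
            "      <xsd:element name='Card' maxOccurs='unbounded'><xsd:complexType><xsd:sequence>",
            "        <xsd:element name='Number' type='xsd:string'/>",
            "        <xsd:element name='Expiry' type='xsd:string'/>",
            "      </xsd:sequence></xsd:complexType></xsd:element>",
            "    </xsd:sequence></xsd:complexType></xsd:element>",
            "  </xsd:sequence></xsd:complexType></xsd:element>",
            "</xsd:schema>");

    static final String VALID = "<Profile><Name>John Smith</Name>"
            + "<Address>123 Main St</Address><Phone>+1 (555) 555-1212</Phone>"
            + "<CreditCardList><Card><Number>1234567890</Number>"
            + "<Expiry>12/02</Expiry></Card></CreditCardList></Profile>";

    // Well formed, but the Card lacks its required Expiry element.
    static final String NO_EXPIRY = VALID.replace("<Expiry>12/02</Expiry>", "");

    // Compiled once; declared after XSD so XSD is initialized first.
    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema does not compile", e);
        }
    }

    // With no ErrorHandler set, the Validator throws on the first error.
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or violates the schema
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(VALID));     // true
        System.out.println(isValid(NO_EXPIRY)); // false
    }
}
```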

Namespaces

Namespaces were introduced to XML to resolve the ambiguities that naturally occur when you give everyone the flexibility to choose their own tag names: inevitably, people choose very similar names, which can cause confusion. Consider our own profile:

     <Profile>
        <Name>John Smith</Name>
        <Address>123 Main St</Address>
        <Phone>+1 (555) 555-1212</Phone>
        <CreditCardList>
           <Card>
              <Name>Visa</Name>
              <Number>1234567890</Number>
              <Expiry>12/02</Expiry>
           </Card>
           <Card>
              <Name>MasterCard</Name>
              <Number>9876543210</Number>
              <Expiry>06/03</Expiry>
           </Card>
        </CreditCardList>
     </Profile>

Here, the credit card name has been added in a Name element, a child of Card. Perhaps this happened because we tried to reuse a previously defined structure. Unfortunately, we have already used the Name tag for the person's name. How can we distinguish between the two when searching for Name elements? (This problem will become more obvious as we look at some simple parsing examples further on.)
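A quick way to see the ambiguity is to ask a DOM parser for every Name element. In this minimal sketch (the NameClash class and its trimmed copy of the profile are hypothetical), getElementsByTagName matches on the tag name alone, so the person's name and both card names come back together.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NameClash {
    // Trimmed copy of the profile above: Address, Phone and Expiry omitted.
    static final String PROFILE = "<Profile><Name>John Smith</Name><CreditCardList>"
            + "<Card><Name>Visa</Name><Number>1234567890</Number></Card>"
            + "<Card><Name>MasterCard</Name><Number>9876543210</Number></Card>"
            + "</CreditCardList></Profile>";

    // Text of every element whose tag is exactly "Name", in document order.
    static List<String> names(String xml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("Name");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names(PROFILE)); // [John Smith, Visa, MasterCard]
    }
}
```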

Namespaces allow us to resolve this ambiguity by associating tags and attributes with a unique identifier:

     <Profile xmlns:personal="someURI"
              xmlns:cc="someOtherURI">
        <personal:Name>John Smith</personal:Name>
        <personal:Address>123 Main St</personal:Address>
        <personal:Phone>+1 (555) 555-1212</personal:Phone>
        <personal:CreditCardList>
           <cc:Card>
              <cc:Name>Visa</cc:Name>
              <cc:Number>1234567890</cc:Number>
              <cc:Expiry>12/02</cc:Expiry>
           </cc:Card>
           <cc:Card>
              <cc:Name>MasterCard</cc:Name>
              <cc:Number>9876543210</cc:Number>
              <cc:Expiry>06/03</cc:Expiry>
           </cc:Card>
        </personal:CreditCardList>
     </Profile>

Here, we have used the reserved name xmlns to declare two distinct namespaces. They are bound to the prefixes (aliases) personal and cc, which can be used to qualify tags and attributes. The namespaces are distinct because they are bound to different URIs; someURI and someOtherURI are placeholders. Usually, such URIs take the form of a URL indicating a unique resource on the Internet.
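With the namespaces declared, a namespace-aware parser can separate the two kinds of Name. The minimal sketch below uses the JDK's DOM parser; the class NamespaceLookup and its trimmed copy of the document are hypothetical. Namespace awareness is off by default in DocumentBuilderFactory, so it must be switched on before getElementsByTagNameNS can match on namespace URI plus local name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NamespaceLookup {
    // Trimmed copy of the prefixed profile above.
    static final String PROFILE = "<Profile xmlns:personal='someURI' xmlns:cc='someOtherURI'>"
            + "<personal:Name>John Smith</personal:Name><personal:CreditCardList>"
            + "<cc:Card><cc:Name>Visa</cc:Name></cc:Card>"
            + "<cc:Card><cc:Name>MasterCard</cc:Name></cc:Card>"
            + "</personal:CreditCardList></Profile>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Text of each Name element in the given namespace only.
    static List<String> names(Document doc, String namespaceUri) {
        NodeList nodes = doc.getElementsByTagNameNS(namespaceUri, "Name");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse(PROFILE);
        System.out.println(names(doc, "someURI"));      // [John Smith]
        System.out.println(names(doc, "someOtherURI")); // [Visa, MasterCard]
    }
}
```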

Note

A Uniform Resource Identifier (URI) is a unique name for a resource, where a resource could be an HTML page, a program, a video clip, etc. A Uniform Resource Locator (URL) is a URI that explicitly identifies the protocol used to access the resource and its location on the Internet. The W3C defines a Uniform Resource Name (URN) as a URI with "an institutional commitment to persistence, availability, etc.", meaning that while the physical location of a resource may change, it can still be located through the institution. For more information, see the W3C overview of naming and addressing at http://www.w3.org/Addressing.

Needless to say, being forced to qualify every element can become pretty tiresome. Fortunately, XML provides the concept of a default namespace, which applies to every unprefixed element name within its scope. An explicit prefix overrides the default. For example:

     <Profile xmlns="someURI"
              xmlns:cc="someOtherURI">
        <Name>John Smith</Name>
        <Address>123 Main St</Address>
        <Phone>+1 (555) 555-1212</Phone>
        <CreditCardList>
           <cc:Card>
              <cc:Name>Visa</cc:Name>
              <cc:Number>1234567890</cc:Number>
              <cc:Expiry>12/02</cc:Expiry>
           </cc:Card>
           <cc:Card>
              <cc:Name>MasterCard</cc:Name>
              <cc:Number>9876543210</cc:Number>
              <cc:Expiry>06/03</cc:Expiry>
           </cc:Card>
        </CreditCardList>
     </Profile>

The first namespace, previously bound to the personal prefix, now has no prefix. This makes it the default namespace for every unprefixed element, including Profile itself; only the elements qualified with the cc prefix belong to someOtherURI.
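The effect of the default namespace can be checked by printing each element's expanded name, that is, its namespace URI plus its local name. In this minimal sketch (the DefaultNamespace class and its trimmed document are hypothetical), the JDK's namespace-aware DOM parser resolves the unprefixed elements, Profile included, to someURI and the cc elements to someOtherURI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DefaultNamespace {
    // Trimmed copy of the default-namespace profile above.
    static final String PROFILE = "<Profile xmlns='someURI' xmlns:cc='someOtherURI'>"
            + "<Name>John Smith</Name><CreditCardList>"
            + "<cc:Card><cc:Name>Visa</cc:Name></cc:Card>"
            + "</CreditCardList></Profile>";

    // Every element as {namespaceURI}localName, in document order.
    static List<String> expandedNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            NodeList all = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagNameNS("*", "*");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                Node n = all.item(i);
                result.add("{" + n.getNamespaceURI() + "}" + n.getLocalName());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        expandedNames(PROFILE).forEach(System.out::println);
        // {someURI}Profile
        // {someURI}Name
        // {someURI}CreditCardList
        // {someOtherURI}Card
        // {someOtherURI}Name
    }
}
```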

Namespaces were introduced after the XML 1.0 specification. There is no support for them in DTDs, but as we have observed, XML Schema is namespace-aware. Some parsers do not acknowledge namespaces, notably those that only support DOM Level 1 or SAX 1. In general, namespaces will not cause these parsers to fail, but you will have to dissect the prefixes on your own.
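Dissecting prefixes by hand looks roughly like the following sketch. It uses the JDK's DOM parser with namespace awareness left off (the factory default), which mimics a namespace-unaware parser: tag names keep their prefixes, getLocalName returns null, and xmlns declarations appear as ordinary attributes. The class and helper names are hypothetical, and the sketch ignores details such as the predefined xml prefix.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class PrefixResolver {
    static final String SAMPLE = "<Profile xmlns='someURI' xmlns:cc='someOtherURI'>"
            + "<Name>John Smith</Name>"
            + "<cc:Card><cc:Name>Visa</cc:Name></cc:Card></Profile>";

    // Namespace awareness stays off, so prefixes are left in the tag names.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // "cc:Name" becomes "Name"; unprefixed names are returned unchanged.
    static String localPart(Element element) {
        String tag = element.getTagName();
        return tag.substring(tag.indexOf(':') + 1);
    }

    // Walks up to the nearest xmlns:prefix (or plain xmlns) declaration.
    // Returns "" when no declaration is in scope, meaning no namespace.
    static String namespaceOf(Element element) {
        String tag = element.getTagName();
        int colon = tag.indexOf(':');
        String declaration = colon < 0 ? "xmlns" : "xmlns:" + tag.substring(0, colon);
        for (Node n = element; n instanceof Element; n = n.getParentNode()) {
            Element candidate = (Element) n;
            if (candidate.hasAttribute(declaration)) {
                return candidate.getAttribute(declaration);
            }
        }
        return "";
    }

    public static void main(String[] args) {
        NodeList all = parse(SAMPLE).getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            System.out.println(e.getTagName() + ": " + localPart(e) + " in " + namespaceOf(e));
        }
        // Profile: Profile in someURI
        // Name: Name in someURI
        // cc:Card: Card in someOtherURI
        // cc:Name: Name in someOtherURI
    }
}
```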

XHTML

One of the problems with HTML is that while it looks like XML, it actually isn't. A number of HTML constructs are valid according to the HTML DTD but result in documents that are not well formed, and thus violate the basic tenet of XML. As a result, much of the emerging technology being built to process XML, including editors, cannot be used to produce and validate HTML. Often, vendors and standards groups solve this by treating HTML as a special case; the DOM is a good example of this.

Consider, for example, the HTML IMG tag:

     <IMG src="car.jpg" WIDTH="128" HEIGHT="128">

This is perfectly valid HTML; however, because the IMG element is never closed, the document is not well formed, and any conforming XML parser will reject it.

The W3C has recognized this problem and has reformulated HTML 4 as an XML application. The result, called XHTML, enforces the XML well-formedness constraint. For more information, see http://www.w3.org/MarkUp.
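A non-validating XML parser makes the difference concrete. This minimal sketch (the WellFormedCheck class is hypothetical) uses the JDK's DOM parser to test the HTML-style IMG tag and an XHTML-style equivalent, which uses lowercase names, as XHTML requires, and an empty-element tag.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    static final String HTML_IMG = "<IMG src=\"car.jpg\" WIDTH=\"128\" HEIGHT=\"128\">";
    static final String XHTML_IMG = "<img src=\"car.jpg\" width=\"128\" height=\"128\" />";

    // True if a non-validating XML parser accepts the text. Well-formedness
    // violations are fatal errors, so they surface as a SAXException.
    static boolean isWellFormed(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors; it replaces the JDK's default
            // handler, which also prints them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed(HTML_IMG));  // false: IMG is never closed
        System.out.println(isWellFormed(XHTML_IMG)); // true
    }
}
```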