The Message as an API | Professional JMS

Message formats are a contract, an API, between communicating systems. Formats and business processes associated with them are agreed on through a process of negotiation between stakeholders on each side of the transaction. For example, a stock exchange system might be developed to accept orders sent as messages in the following simple format:

     Operation   Stock   Number
     BuyStock    Wrox    1000
     SellStock   Wrox    500

Here we are naming a service (in this case BuyStock or SellStock) and including two parameters required by that service. The result might be a status message:

    Status
    OK
    Failed

What we have done here is define a simple API for a stock exchange system. However, as developers we have become accustomed to thinking of an API in terms of objects or procedures. We think about classes, methods, overloading, typing, and so on. Therein lies part of the problem: in Java, or other high-level languages, the syntax has been well defined. We do not have to worry about how we are going to delimit the parts of the data; we simply use commas in a method call and package our parameter list in parentheses. Syntax and typing are an implicit part of the language, and compilers are great at catching syntax and type errors. In traditional messaging, however, it is up to the developers to define much of this.
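To make the contrast concrete, the sketch below places a typed function call next to a hand-written parser for the same order. It is illustrative only: the comma delimiter, the Order type, and the names buy_stock, parse_order, and status_reply are assumptions made for this example, not part of any real exchange API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    operation: str  # "BuyStock" or "SellStock"
    stock: str
    number: int


def buy_stock(stock: str, number: int) -> Order:
    """Language-level API: the language defines the syntax and the types."""
    return Order("BuyStock", stock, number)


def parse_order(message: str) -> Order:
    """Message-level API: the developer must define and enforce everything."""
    # Assumption: fields are comma-delimited and may carry padding spaces.
    fields = [field.strip() for field in message.split(",")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {message!r}")
    operation, stock, number = fields
    if operation not in ("BuyStock", "SellStock"):
        raise ValueError(f"unknown operation: {operation!r}")
    # int() raises ValueError if the count is not numeric.
    return Order(operation, stock, int(number))


def status_reply(ok: bool) -> str:
    """Render the status message described above."""
    return "OK" if ok else "Failed"
```

Everything the language gives us for free in buy_stock (delimiting, arity, typing) has to be written out by hand in parse_order.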

The problems with this style of messaging fall into four broad categories:

Structure
Parsing and Validation
Typing and Encoding
Bandwidth and Resource Conservation

Each of these will be examined in the following sections.

Structure

Traditional messaging systems suffered from a lack of standardization defining how structures should be rendered, and even how they should be documented. This often led to groups developing completely proprietary, idiosyncratic message delimiting schemes. For example, suppose that in our simple stock exchange API we use commas to delimit parameters. How would we extend our API to accept a variable-length list of repeated stocks and numbers? We could simply append them:

     Operation, Stock, Number, Stock, Number, Stock, Number, ..., Stock, Number

Or alternatively use additional special characters to create a list-like structure:

     Operation, {[Stock, Number], [Stock, Number], ..., [Stock, Number]}

Neither of these is particularly elegant. Furthermore, neither lends itself well to applying the principles of object-oriented design. We are overloading the operation interface, but we aren't really encapsulating data effectively.
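As a rough illustration of what each scheme costs the receiver, here is one possible parser for each of the two layouts. The function names and the exact grammar (numeric counts, no commas or brackets inside stock names) are assumptions made for this sketch.

```python
import re
from typing import List, Tuple

Pairs = List[Tuple[str, int]]


def parse_flat(message: str) -> Tuple[str, Pairs]:
    """Parse 'Operation, Stock, Number, Stock, Number, ...'."""
    operation, *rest = [field.strip() for field in message.split(",")]
    if not rest or len(rest) % 2 != 0:
        raise ValueError("stock/number fields must come in pairs")
    pairs = [(rest[i], int(rest[i + 1])) for i in range(0, len(rest), 2)]
    return operation, pairs


# Matches one '[Stock, Number]' group inside the braces.
_PAIR = re.compile(r"\[\s*([^,\[\]]+?)\s*,\s*(\d+)\s*\]")


def parse_bracketed(message: str) -> Tuple[str, Pairs]:
    """Parse 'Operation, {[Stock, Number], [Stock, Number], ...}'."""
    operation, _, body = message.partition(",")
    body = body.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("list must be wrapped in braces")
    pairs = [(stock, int(number)) for stock, number in _PAIR.findall(body)]
    if not pairs:
        raise ValueError("no [Stock, Number] pairs found")
    return operation.strip(), pairs
```

Both parsers recover the same data, but each embodies its own private grammar. The regular-expression version also silently skips any group that does not match, which is the kind of idiosyncratic behavior two independently written parsers are unlikely to share.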

Sometimes the message structure maps directly onto a structure from a high-level language. For example, in an exchange between two systems both running C compilers on identical machines (same OS, same word length), the structure may simply be a serialized struct. It isn't human readable, but it is recoverable on the other machine.
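The dependence on identical machines can be sketched with Python's struct module, which packs values into C-style binary layouts. The layout here (an 8-byte stock field followed by a 32-bit integer) is a hypothetical struct chosen for illustration, and only byte order is varied; real C structs also differ in padding and field sizes.

```python
import struct

# Hypothetical C layout: char stock[8]; int32_t number;
LITTLE_ENDIAN = "<8si"
BIG_ENDIAN = ">8si"


def pack_order(stock: str, number: int, layout: str) -> bytes:
    """Serialize the fields; short stock names are padded with NUL bytes."""
    return struct.pack(layout, stock.encode("ascii"), number)


def unpack_order(data: bytes, layout: str):
    """Recover (stock, number), stripping the NUL padding from the name."""
    raw_stock, number = struct.unpack(layout, data)
    return raw_stock.rstrip(b"\x00").decode("ascii"), number
```

When both sides use the same layout, the round trip succeeds. If the receiver assumes the other byte order, the bytes still unpack without error but the number comes out wrong, and nothing in the message itself reveals the mistake.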

With no standard means of defining structure in messages, negotiation between groups defining an API will often focus on issues of syntax, rather than the content, which is ultimately quite a bit more important.

Parsing and Validation

If there is no standardization of how messages are structured, then parsers are going to be incompatible, and everyone will generally have to write and maintain their own. This can create a huge maintenance and deployment issue. Consider the case where two businesses are exchanging messages using a predefined schema. Generally, these businesses will maintain their own applications; this implies that both have their own parsers that must functionally track each other. As messages evolve, how are they versioned? How do parsers handle versions of messages that they have never encountered?

The code to do message validation is often deeply embedded in parsing code. For example:

    BuyStock,   Wrox,   X
    BuyStock,   Wrox   X

In the first example, X should be numeric according to the rules we have defined. In the second example, the comma delimiter is missing, so the program may assume the stock name is "Wrox X". Unfortunately, a large number of programs validate implicitly as they parse, rather than validating against some kind of template that can easily be modified to accommodate changes to the message. Such interfaces are often characterized as brittle because of how susceptible they are to failure if a message is modified.
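One way to picture validation against a template is to keep the rules in a data structure separate from the parsing code. The sketch below is a minimal version of that idea; ORDER_TEMPLATE, the individual rules, and the validate function are all hypothetical.

```python
# Hypothetical template: one (field name, validator) entry per position.
# Changing the message format means editing this table, not the parser.
ORDER_TEMPLATE = [
    ("operation", lambda value: value in ("BuyStock", "SellStock")),
    ("stock", lambda value: value.isalpha()),
    ("number", lambda value: value.isdigit()),
]


def validate(message: str, template=ORDER_TEMPLATE) -> list:
    """Return a list of readable errors; an empty list means valid."""
    fields = [field.strip() for field in message.split(",")]
    if len(fields) != len(template):
        return [f"expected {len(template)} fields, got {len(fields)}"]
    return [
        f"field '{name}' has invalid value {value!r}"
        for (name, is_valid), value in zip(template, fields)
        if not is_valid(value)
    ]
```

Applied to the two malformed messages above, the first is reported as an invalid number field and the second as a field-count mismatch, without any change to the parsing logic.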

If we do have such a template, how should it be distributed among parties that need it? Copies could be replicated among all systems processing the message; a central repository could be set up; it could even be embedded in messages. Although each of these is possible, there has been no standard way of implementing any of these strategies.

Typing and Encoding

Character encoding has always been an issue with traditional messaging systems. Many text-based message systems did not originally support Unicode, which made it very difficult to transmit messages containing localized language strings. There have been significant character set issues among systems in the past. Translations from ASCII to IBM's EBCDIC are commonplace in many large messaging environments.
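A translation step of this kind might look like the following sketch. It assumes code page 037, one of several EBCDIC variants, which Python exposes as the "cp037" codec; a real deployment would need to agree on the exact code page in use.

```python
EBCDIC_CODEC = "cp037"  # assumption: EBCDIC code page 037 (US/Canada)


def ascii_to_ebcdic(data: bytes) -> bytes:
    """Re-encode an ASCII message for an EBCDIC system."""
    return data.decode("ascii").encode(EBCDIC_CODEC)


def ebcdic_to_ascii(data: bytes) -> bytes:
    """Re-encode an EBCDIC message for an ASCII system."""
    return data.decode(EBCDIC_CODEC).encode("ascii")
```

The same text produces entirely different bytes in the two encodings, so a receiver that skips the translation sees garbage or fails outright when it tries to decode the message.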

Type binding between languages can be the source of numerous problems. When two systems want to exchange a floating-point value via a message, they must agree on how to render the value in the message so it can be bound into their local type systems. Standards do exist for some simple types. ISO, for example, has defined standard representations for dates and times (ISO 8601). However, in general, if you needed a comprehensive type-binding scheme, you had to look at something like CORBA IDL.
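The choices involved can be sketched as follows: a floating-point value rendered either as text or as an IEEE 754 binary double, and a timestamp rendered in ISO 8601 form. The function names are hypothetical, and each rendering is only one of several that two parties could agree on.

```python
import struct
from datetime import datetime, timezone


def render_float_text(value: float) -> str:
    """Text rendering: Python's repr() round-trips finite floats exactly."""
    return repr(value)


def render_float_binary(value: float) -> bytes:
    """Binary rendering: 8-byte IEEE 754 double, big-endian byte order."""
    return struct.pack(">d", value)


def render_time(moment: datetime) -> str:
    """ISO 8601 rendering in UTC; assumes a timezone-aware datetime."""
    return moment.astimezone(timezone.utc).isoformat()
```

Whichever rendering is chosen, both sides have to apply the same one; a receiver expecting the text form cannot make sense of the eight binary bytes, and vice versa.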

Bandwidth and Resource Conservation

Traditional messaging has always encouraged terseness in messages to accommodate low-bandwidth communications links. Although the explosive growth of the Internet has allowed us to become less restrictive in our use of bandwidth, there is still a tendency for message designers to optimize messages heavily, packing in as much data as possible to maximize information content for the lowest possible bandwidth. This has led to a number of messaging protocols that are machine understandable at the expense of human readability. Some applications do demand such packaging to sustain near-real-time performance; multi-player game environments come to mind. Ultimately, however, this makes it difficult to track problems by inspecting the message flow. Anyone who has tried to track a CORBA message dialog can readily attest to this.
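The trade-off can be seen by encoding the same order twice, once as readable text and once densely packed. The packed layout (a one-byte operation code, a four-byte stock symbol, and a four-byte count) and the operation codes are invented for this sketch.

```python
import struct

# Hypothetical numeric codes standing in for the operation names.
OPERATION_CODES = {"BuyStock": 1, "SellStock": 2}


def encode_text(operation: str, stock: str, number: int) -> bytes:
    """Readable rendering: comma-delimited ASCII text."""
    return f"{operation}, {stock}, {number}".encode("ascii")


def encode_packed(operation: str, stock: str, number: int) -> bytes:
    """Dense rendering: 1-byte op code, 4-byte symbol, 4-byte unsigned count."""
    return struct.pack(
        ">B4sI", OPERATION_CODES[operation], stock.encode("ascii"), number
    )
```

For the order BuyStock, Wrox, 1000 the text form occupies 20 bytes and the packed form 9, but only the text form can be read directly in a message trace.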