Transformation | Next Generation Application Integration: From Simple Information to Web Services

The transformation layer is the "Rosetta Stone" of the system. It understands the format of all information being transmitted among the applications and translates that information on the fly, restructuring data from one message so that it makes sense to the receiving application or applications. It provides a common dictionary that contains information on how each application communicates outside itself (application externalization), as well as which bits of information have meaning to which applications.

Transformation layers generally contain parsing and pattern-matching methods that describe the structure of any message format. Message formats are then constructed from pieces that represent each field encapsulated within a message. Once the message has been broken down into its component parts, the fields may be recombined to create a new message.
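The parse-and-recombine cycle described above can be sketched in a few lines. This is a minimal illustration, not how any particular integration server implements it; the record layout, the field names, and the helpers `parse_fixed` and `build_delimited` are all hypothetical.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    start: int   # zero-based offset within the fixed-width record
    length: int


# Hypothetical source layout: a fixed-width customer record.
SOURCE_LAYOUT = [
    Field("customer_id", 0, 6),
    Field("last_name", 6, 10),
    Field("first_name", 16, 10),
]

# Hypothetical field order expected by the receiving application.
TARGET_ORDER = ["first_name", "last_name", "customer_id"]


def parse_fixed(record: str, layout: list[Field]) -> dict[str, str]:
    """Break a fixed-width record into named, whitespace-trimmed fields."""
    return {f.name: record[f.start:f.start + f.length].strip() for f in layout}


def build_delimited(fields: dict[str, str], order: list[str], sep: str = "|") -> str:
    """Recombine parsed fields into a delimited message in the target order."""
    return sep.join(fields[name] for name in order)


record = "000042" + "Smith".ljust(10) + "Jane".ljust(10)
fields = parse_fixed(record, SOURCE_LAYOUT)
print(build_delimited(fields, TARGET_ORDER))  # Jane|Smith|000042
```

Keeping the layout as data rather than code means a new message format can be described without writing a new parser.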

Most integration servers can handle most types of information, including fixed-length, delimited, and variable-length formats. Information is reformatted using an interface that the integration server provides to the user, which may be as primitive as an API or as easy to use as a GUI.

There are a few aspects to the notion of transformation:

Support for Differences in Application Semantics

Accounting for the differences in application semantics is the process of changing the structure of a message, remapping its schema and data types so that the message is acceptable to the target system. Although this is not difficult, application integration architects need to understand that the process must occur dynamically within the integration server.

This process can be defined within the rules-processing layer of the integration server by creating a rule that translates data dynamically, depending on its content and schema. Moving information from one system to another demands that the schema or format of the message be altered in transit.
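As a rough sketch of such a rule, the example below selects a transformation based on the schema a message declares. The schema names `order.v1` and `order.v2`, the field names, and the `transform` function are assumptions made for illustration; real integration servers expose rules through their own tooling.

```python
from typing import Any, Callable

Message = dict[str, Any]

# Each rule pairs a predicate on the incoming message with a transformation.
Rule = tuple[Callable[[Message], bool], Callable[[Message], Message]]


def order_v1_to_target(msg: Message) -> Message:
    # Hypothetical schema "order.v1": the amount arrives as a string of cents.
    return {"order_id": msg["id"], "amount": int(msg["amt_cents"]) / 100}


def order_v2_to_target(msg: Message) -> Message:
    # Hypothetical schema "order.v2": the amount is already a decimal number.
    return {"order_id": msg["order_no"], "amount": float(msg["amount"])}


RULES: list[Rule] = [
    (lambda m: m.get("schema") == "order.v1", order_v1_to_target),
    (lambda m: m.get("schema") == "order.v2", order_v2_to_target),
]


def transform(msg: Message) -> Message:
    """Apply the first rule whose predicate matches the message."""
    for matches, convert in RULES:
        if matches(msg):
            return convert(msg)
    raise ValueError(f"no transformation rule for message: {msg!r}")
```

Both source schemas end up in the same target shape, so the receiving application never sees which version the sender used.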

Although most integration servers can map any schema to any other schema, it is prudent to try to anticipate extraordinary circumstances. For example, when converting information extracted from an object-oriented database and placing it in a relational database, the integration server must convert the object schema into a relational representation before it can convert the data within the message. The same holds true when moving information from a relational database to an object-oriented database. Most integration servers break each incoming message down into a common format and then translate it into the appropriate message format for the target system.
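To make the object-to-relational case concrete, the sketch below flattens one nested order object into rows for two tables. The table names `orders` and `order_items`, the column order, and the `to_relational` helper are hypothetical; an actual mapping depends on both schemas.

```python
def to_relational(order: dict) -> dict[str, list[tuple]]:
    """Flatten one nested order object into rows for two hypothetical tables."""
    # Parent row: scalar attributes of the object, with the nested customer
    # object reduced to a single column.
    orders_row = (order["id"], order["customer"]["name"])
    # Child rows: one per element of the nested collection, each carrying the
    # parent key so the relationship survives the flattening.
    item_rows = [
        (order["id"], line_no, item["sku"], item["qty"])
        for line_no, item in enumerate(order["items"], start=1)
    ]
    return {"orders": [orders_row], "order_items": item_rows}


order = {
    "id": 17,
    "customer": {"name": "Jane Smith"},
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
}
rows = to_relational(order)
```

The containment that the object model expresses by nesting is expressed in the relational form by repeating the order key in every item row.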

Support for Differences in Content

Closely related to accounting for differences in application semantics is accounting for differences in content, another important aspect of transformation. In short, it is the reformatting of information so that it appears native when sent to a target system. The information needs to appear native, requiring that changes be made to source or target systems.

Although many formats exist within most application integration problem domains, we will confine our attention, for the purposes of this manifesto, to the following:

Alphanumeric
Binary integers
Floating point values
Bit fields
IBM mainframe floating points
COBOL and PL/I picture data
BLOBs

In addition to these formats, there are a number of formatting issues to address, including the ability to convert logical operators (bits) between systems and the ability to handle data types that are not supported in the target system. These issues often require significant customization in order to facilitate successful communication between systems.
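As one example of the format conversions listed above, the sketch below decodes a single-precision IBM mainframe floating point value. It assumes the 32-bit hexadecimal floating point layout (one sign bit, a 7-bit excess-64 base-16 exponent, and a 24-bit fraction) in big-endian byte order; the function name `ibm32_to_float` is hypothetical.

```python
import struct


def ibm32_to_float(raw: bytes) -> float:
    """Decode a 4-byte big-endian IBM hexadecimal float into a Python float."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    (word,) = struct.unpack(">I", raw)
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F                   # excess-64, base-16 exponent
    fraction = (word & 0x00FFFFFF) / float(1 << 24)  # 24-bit fraction in [0, 1)
    return sign * fraction * 16.0 ** (exponent - 64)


print(ibm32_to_float(bytes.fromhex("42640000")))  # 100.0
print(ibm32_to_float(bytes.fromhex("C276A000")))  # -118.625
```

Because the exponent is a power of 16 rather than 2, the bit pattern cannot simply be reinterpreted as an IEEE 754 value; it has to be decoded field by field.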

In data conversion, values are managed in two ways: carrying over the value from the source to the target system without change, or modifying the data value dynamically. Either an algorithm or a look-up table can be used to modify the data value. An algorithm may take one or more of the source application's attributes as input and use them to change the data or create new data.

Algorithms of this type are nothing more than the type of data conversions we have done for years when populating data warehouses and data marts. Now, in addition to using these simple algorithms, it is possible to aggregate, combine, and summarize the data to meet the specific requirements of the target application.
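The sketch below shows the three cases side by side: a value carried over unchanged, a value derived by a simple algorithm from two source attributes, and values summarized from many source values. The record shape and the `convert_customer` function are hypothetical.

```python
def convert_customer(source: dict) -> dict:
    """Build a target record from a hypothetical source customer record."""
    return {
        # Carried over from source to target without change.
        "customer_id": source["customer_id"],
        # Algorithmic conversion: a new value derived from two source attributes.
        "full_name": f'{source["first_name"]} {source["last_name"]}',
        # Summarization: many source values reduced to single target values.
        "total_spent": sum(source["order_amounts"]),
        "order_count": len(source["order_amounts"]),
    }


source = {
    "customer_id": "000042",
    "first_name": "Jane",
    "last_name": "Smith",
    "order_amounts": [120, 80, 50],
}
target = convert_customer(source)
```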

When using the look-up table scenario, it might be necessary to convert to an arbitrary value. "ARA" in the source system might refer to a value in the accounts receivable system. However this value may be determined, it must be checked against the look-up table. Integration servers may convert dollars to yen using a currency conversion table, which may be embedded in a simple procedure or, more likely, in a database connected to the integration server. The integration server may also invoke a remote application server function to convert the amount.
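Both kinds of look-up can be sketched with plain dictionaries. The code values, the target value `AR-100`, and the exchange rate below are invented for illustration only; as the text notes, a real table would more likely live in a database connected to the integration server.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical code table mapping source-system codes to target-system values.
ACCOUNT_CODES = {"ARA": "AR-100", "APA": "AP-200"}

# Hypothetical, illustrative rate; not a real exchange rate.
RATES = {("USD", "JPY"): Decimal("150.00")}


def lookup_code(code: str) -> str:
    """Translate a source code, rejecting values missing from the table."""
    try:
        return ACCOUNT_CODES[code]
    except KeyError:
        raise ValueError(f"unknown source code: {code!r}") from None


def convert_currency(amount: Decimal, src: str, dst: str) -> Decimal:
    """Convert an amount using the rate table, rounded to whole target units."""
    rate = RATES[(src, dst)]
    return (amount * rate).quantize(Decimal("1"), rounding=ROUND_HALF_UP)


print(lookup_code("ARA"))                                # AR-100
print(convert_currency(Decimal("10.50"), "USD", "JPY"))  # 1575
```

Failing loudly on an unknown code is a design choice in this sketch: a value that silently passes through untranslated is harder to trace once it reaches the target system.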

The application integration architect or developer may encounter special circumstances that have to be finessed. The length of a message attribute may be unknown, or the value may be in an unknown order. In such situations, it is necessary to use the rules-processing capability of the integration server to convert the problem values into the proper representation for the target system.
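One way to handle attributes of unknown length and order is to key them by name and then emit them in the layout the target expects. The sketch below assumes a hypothetical `name=value;name=value` source format and a hypothetical fixed-width target layout; the `normalize` function is illustrative only.

```python
# Hypothetical target layout: (field name, fixed width), in the required order.
TARGET_FIELDS = [("id", 6), ("name", 10), ("city", 8)]


def normalize(message: str) -> str:
    """Reorder tagged fields and pad or truncate each to the target width."""
    # Index the attributes by name so their order in the source is irrelevant.
    pairs = dict(part.split("=", 1) for part in message.split(";") if part)
    out = []
    for name, width in TARGET_FIELDS:
        value = pairs.get(name, "")               # a missing field becomes blank
        out.append(value[:width].ljust(width))    # truncate, then pad with spaces
    return "".join(out)


result = normalize("city=Springfield;id=42;name=Ann")
```

Truncating an over-long value is only one possible policy; a rule could equally reject the message or route it for manual handling.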

Support for Abstract Data Types

Transformation mechanisms also need to support abstract data types (ADTs), allowing different representations of data and behavior to meet the requirements of the application integration scenario.

ADTs provide a clear separation between the interface of a data type and its implementation. The implementation covers both the representation of the data (that is, the choice of data structure) and the operations on the data.

The interface to the abstract data type is created through its associated operations. What's more, the data structures that store the representation of an abstract data type are invisible to the integration view. The ADT also includes any operations, or algorithms, contained within the ADT.

The internal representation and execution of these operations is changeable at any time and won't affect the interface to the ADTs. Thus, a completely different representation is possible for sets storing information in the ADT.

Having said all that, ADTs consist of:

An interface, or a set of operations that can be performed
The allowable behaviors, or the way we expect instances of the ADT to respond to operations

The implementation of an ADT consists of:

An internal representation, or the data stored inside the source or target system's variables
A set of methods implementing the interface
A set of representation invariants, true initially and preserved by all methods
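The pieces listed above can be seen together in a small example. The sketch below defines a hypothetical `IntSet` interface and two interchangeable implementations with different internal representations; the class names are invented for illustration, and the invariant check is there only to make the idea visible.

```python
import bisect
from abc import ABC, abstractmethod


class IntSet(ABC):
    """Interface: the operations clients may perform on the ADT."""

    @abstractmethod
    def add(self, value: int) -> None: ...

    @abstractmethod
    def contains(self, value: int) -> bool: ...

    @abstractmethod
    def size(self) -> int: ...


class SortedListIntSet(IntSet):
    """Implementation whose representation is a sorted list without duplicates."""

    def __init__(self) -> None:
        self._items: list[int] = []

    def _check_invariant(self) -> None:
        # Representation invariant: strictly increasing, hence no duplicates.
        assert all(a < b for a, b in zip(self._items, self._items[1:]))

    def add(self, value: int) -> None:
        index = bisect.bisect_left(self._items, value)
        if index == len(self._items) or self._items[index] != value:
            self._items.insert(index, value)
        self._check_invariant()

    def contains(self, value: int) -> bool:
        index = bisect.bisect_left(self._items, value)
        return index < len(self._items) and self._items[index] == value

    def size(self) -> int:
        return len(self._items)


class HashIntSet(IntSet):
    """Implementation with a different representation: a built-in hash set."""

    def __init__(self) -> None:
        self._items: set[int] = set()

    def add(self, value: int) -> None:
        self._items.add(value)

    def contains(self, value: int) -> bool:
        return value in self._items

    def size(self) -> int:
        return len(self._items)
```

Code written against `IntSet` behaves the same with either class, which is the sense in which the representation can change without affecting the interface.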