Strategies for Coping with Integration | Semantics in Business Systems: The Savvy Managers Guide (The Savvy Managers Guides)

There are five major strategies for achieving integration. We'll touch on each briefly and then cover the semantic implications that each brings to the party. These are patterns we have seen in integration projects. They are not mutually exclusive, in that many projects use some elements from each, but each requires a predominant mindset that determines much of the direction and economics of the project. The five most common approaches to systems integration are as follows:

Manual
Big database
Direct connect
Point to point
Message based

Manual

Most integration is still manual. In many cases people don't even know they are doing integration; they are just taking a report from one system and keying it into another. This has the disadvantage of being operationally expensive, and it doesn't scale well. It is also semantically imprecise (remember Chapter 3?), but these disadvantages have a counterbalance: The semantic mismatches that grow up between systems can be finessed by the person who is doing the integrating.

Big Database

An approach to integration that has a lot of intellectual appeal is the big database approach. This approach is based on the premise that if we can just get the two functions to run off the same database, they will be as integrated as possible. If they share a customer table and one of them updates it, the change is immediately available to all the other functions.

The semantic problem with this approach is that the different applications may still treat the same data slightly differently. The cost of these differences shows up in the "systems" or "integration" test, which is typically the last and most expensive phase of a development project. The system is exercised with every combination of transactions from the various applications that might affect the system, in the most varied sequences, and the semantic discrepancies show up.

The more important argument against the big database approach is that it doesn't scale well. The scale problems come in two varieties. The first is that the database schema plays essentially the same role that "global data" played in early programming. In the early days of programming, programmers used to define all the variables they might access in one area ("global storage"), which any subroutine could access. As programs became more complicated, people discovered that the variety of subroutines that might access the global data (and therefore create the potential side effect of changing something) became unwieldy. The database schema is essentially the same, and as applications scale up, the side effects become more complex.
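The side-effect problem can be sketched in a few lines of Python. This is an illustration only: the shared dictionary stands in for a shared customer table, and the two functions, their names, and the status values are hypothetical, not taken from any real system.

```python
# Hypothetical shared "schema": one customer table visible to every application.
customers = {"C001": {"name": "Acme", "status": "active"}}


def billing_close_account(customer_id):
    # Billing's meaning of "closed": stop sending invoices to this customer.
    customers[customer_id]["status"] = "closed"


def shipping_can_ship(customer_id):
    # Shipping's assumption: only "active" customers may receive goods.
    return customers[customer_id]["status"] == "active"


before = shipping_can_ship("C001")   # shipping sees an active customer
billing_close_account("C001")        # billing updates the shared row
after = shipping_can_ship("C001")    # shipping's answer changes as a side effect
print(before, after)
```

Neither function calls the other, yet a change made for billing's purposes silently alters shipping's behavior. Each application added to the shared table adds more such hidden interactions.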

The other issue is that as applications become larger (and that's really what we're talking about when we put multiple applications on the same database; they essentially become one larger application), they become less and less productive to build and maintain, and the chances of a successful project drop precipitously. The Standish Group has done a comprehensive study of more than 23,000 software projects.^[88] To summarize the findings: Projects that cost less than $750,000 had a 55% chance of successful completion. Each doubling of the project cost cut the chance of success roughly in half, until at $10 million there were statistically no successful projects (presumably fewer than 100 for the sample size).
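To get a feel for how quickly the odds fall, the rule of thumb above can be worked through numerically. The sketch below is a literal reading of "halve the success rate for each doubling of cost," starting from the 55% figure at $750,000. The function name and the stepwise treatment are assumptions for illustration; this is not the study's actual data or model.

```python
def approx_success_rate(cost_usd, base_cost=750_000, base_rate=0.55):
    """Halve the base success rate once for each full doubling of cost.

    Illustrative only: a literal reading of the rule of thumb in the text.
    """
    rate, cost = base_rate, base_cost
    while cost * 2 <= cost_usd:
        cost *= 2
        rate /= 2
    return rate


for cost_usd in (750_000, 1_500_000, 3_000_000, 6_000_000, 12_000_000):
    print(f"${cost_usd:>12,}: {approx_success_rate(cost_usd):.1%}")
```

Four doublings take the project from $750,000 to $12 million and the estimated success rate from 55% to about 3.4%, which is consistent with the text's point that success becomes statistically negligible at around the $10 million mark.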

Direct Connect

By direct connect we are referring to techniques that allow one application to directly access the programs of another application. The technique could be a remote procedure call mechanism or it could be through the sharing of components, such as COM or DCOM. In either case the applications soon become highly interconnected and complex.

There are semantic problems with this approach. The calling program has to accept the semantics of the called routines whether they are made explicit or not. The main reason this approach runs out of steam, though, is not semantic. It is that there is no decoupling mechanism, and a set of highly interconnected systems will be brittle and hard to adapt. By directly connecting applications we turn several small applications into one large one, with all the attendant antiscale issues.

Point to Point/Store and Forward

Perhaps the most popular and viable strategy of integration currently is what we call "point to point/store and forward." The term point to point refers to the fact that one system is sending data to another. There is one sender and one receiver. The term store and forward refers to the fact that the sender need not be directly connected to the receiver. The most common form of this uses the file system, in which one system writes transactions to a file, which is picked up later by the receiving system. This strategy includes most forms of electronic data interchange (EDI).
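A minimal sketch of the file-based form of store and forward follows. The file name, the JSON-lines record layout, and the function names are assumptions chosen for illustration; real exchanges (EDI in particular) use their own formats and transport conventions.

```python
import json
import tempfile
from pathlib import Path


def send(transactions, drop_file):
    """Sender: append each transaction to the drop file, one JSON object per line."""
    with drop_file.open("a", encoding="utf-8") as f:
        for transaction in transactions:
            f.write(json.dumps(transaction) + "\n")


def receive(drop_file):
    """Receiver: pick up whatever has accumulated, then remove the file."""
    if not drop_file.exists():
        return []
    lines = drop_file.read_text(encoding="utf-8").splitlines()
    drop_file.unlink()
    return [json.loads(line) for line in lines if line]


orders = [{"order": 1, "sku": "A-100", "qty": 3},
          {"order": 2, "sku": "B-200", "qty": 1}]

with tempfile.TemporaryDirectory() as tmp:
    drop = Path(tmp) / "orders.jsonl"   # hypothetical agreed-upon location
    send(orders, drop)                  # the sender runs now...
    received = receive(drop)            # ...the receiver runs whenever it likes
    second_pickup = receive(drop)       # nothing new has arrived since

print(received, second_pickup)
```

The two systems never run at the same moment or call each other; the file is the only thing they share. What the sketch does not capture is the meaning of each field, which still has to be agreed on separately.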

The semantic shortcoming of this approach is that the sender and receiver must agree on the semantics of the record formats ahead of time. Because the semantics are not fully expressed in the structures that define these transactions, this agreement often gets worked out in a trial-and-error process. This is generally done through some sort of specification (either an independent standards body, in the case of most EDI, or an internally written specification). The other problem is the point-to-point topology, which limits reuse and makes applications dependent on one another.
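The gap between structure and meaning can be shown with a small fixed-width record. In this hypothetical sketch the layout (an 8-character customer ID followed by a 10-digit amount) is agreed on, but the unit of the amount is not; all names and the layout itself are invented for illustration.

```python
def format_record(customer_id, amount_cents):
    """Sender: 8-character customer ID, then a 10-digit amount in cents."""
    return f"{customer_id:<8}{amount_cents:010d}"


def parse_record_as_dollars(record):
    """Receiver that wrongly assumes the amount field holds whole dollars."""
    return record[:8].strip(), float(record[8:18])


def parse_record_as_cents(record):
    """Receiver that matches the sender's intent: the amount field is in cents."""
    return record[:8].strip(), int(record[8:18]) / 100


record = format_record("C001", 12500)          # sender means $125.00
wrong = parse_record_as_dollars(record)        # structurally valid, wrong meaning
right = parse_record_as_cents(record)
print(repr(record), wrong, right)
```

Both receivers parse the record without error, because nothing in the structure says what the number means. The mismatch surfaces only when someone notices an invoice that is off by a factor of 100, which is the trial-and-error process described above.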

Message Based

This category includes most of the modern EAI approaches, including the use of integration brokers, message-oriented middleware (MOM), message buses, and self-describing messages (e.g., XML).

Message Broker, Integration Broker, Message-Oriented Middleware

These technologies address the topology of application integration by placing a software component (the broker) as a hub that all the applications talk to, instead of talking directly to each other. This topology can greatly reduce the number of interconnections that must be made.
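The reduction can be quantified under a simple worst-case assumption: every application needs to exchange data with every other one. In that case direct connections grow with the number of pairs of applications, while a hub needs only one connection per application. The function names below are illustrative.

```python
def point_to_point_links(n_apps):
    """Connections needed if every pair of applications is linked directly."""
    return n_apps * (n_apps - 1) // 2


def hub_links(n_apps):
    """Connections needed if every application talks only to the broker."""
    return n_apps


for n in (3, 10, 50):
    print(n, point_to_point_links(n), hub_links(n))
```

For 10 applications this is 45 direct connections versus 10 through a hub, and for 50 applications it is 1,225 versus 50. Real portfolios are rarely fully connected, so the actual savings depend on how many pairs need to communicate.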

The message-based approaches have the potential to overcome the semantic issues by expressing the messages in a format that could potentially be evaluated by a system for semantic consistency. They address the point-to-point issues by "anonymizing" the sender and receiver (each knows about the bus, but not each other) and by encouraging multicast of messages through publish and subscribe mechanisms (one application can publish a change message, which may be subscribed to by many receiving applications).
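The publish and subscribe mechanism can be sketched as a toy in-memory broker. The class, topic name, and message fields here are hypothetical and are not modeled on any particular product; a real integration broker adds persistence, delivery guarantees, and transformation.

```python
from collections import defaultdict


class Broker:
    """Toy hub: applications know the broker and topic names, not each other."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a callable to receive every message published on the topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to all current subscribers; return how many got it.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
billing_seen, shipping_seen = [], []

# Two receiving applications subscribe independently of one another.
broker.subscribe("customer.changed", billing_seen.append)
broker.subscribe("customer.changed", shipping_seen.append)

# The publishing application sends one change message to the broker.
delivered = broker.publish("customer.changed", {"id": "C001", "status": "closed"})
print(delivered, billing_seen, shipping_seen)
```

The publisher sends one message and never names its receivers, so a third subscriber could be added without touching the publishing application. Note that the sketch only solves the topology problem: the subscribers still have to agree on what "status" and "closed" mean.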

The message-based approaches have the potential to provide a semantically rich interface while leaving the sending and receiving systems loosely coupled, but this is only potential. The rest of this chapter discusses issues that must be resolved before that potential can become a reality.

^[88]Jim Johnson, "Turning Chaos into Success," http://SoftwareMag.com, Dec 1999. Available at http://www.softwaremag.com/archive/1999dec/Success.html.