Integration Is Harder than It Looks | Semantics in Business Systems: The Savvy Managers Guide (The Savvy Managers Guides)

Integration is hard for two reasons: technical and semantic. By technical I mean the issues involved with the implementation technologies, especially binding to languages and communication protocols.

Technical Issues with Integration

The nonsemantic issues with integration can pose some difficulty, but they tend to be about as complex as they look. Typical nonsemantic issues include the following:

Language mismatch—Often the first issue that comes up is incompatible languages. Are we interfacing a system written in COBOL with one written in FORTRAN or Java? The language issue sometimes restricts the design options.
Platform boundaries—Are the two systems to be integrated on different platforms? A different platform might be a different database. It might also be a different computer. The issue is more acute if it is a different type of computer. Are we interfacing an AS/400 with a DEC VAX? The platform boundary eliminates some of the options, and it requires extra effort to overcome the rest.
Character sets and byte order—Some interfaces have to convert between EBCDIC (the mainframe character set) and ASCII (used on minicomputers and PCs). Different computers store the bytes of a number in different orders, and some domain names have their order reversed. These are issues that must be dealt with somewhere in the interface.
Binding issues—Some sets of components or applications have constraints on how they bind to other systems. For example, most integration with Microsoft technology must bind to some variety of Component Object Model (COM) object.
Temporal issues—Many integration projects operate on the assumption that the second system will be available and have adequate response time when we need information. But this is not always the case, and sometimes we don't find this out until long after the integration has been implemented.
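
The character set and byte order item above can be made concrete with a small sketch. It assumes a made-up record layout (a five-character EBCDIC name followed by a four-byte big-endian count); the layout, the sample bytes, and the function names are illustrative assumptions, not anything taken from a real system. It decodes the same bytes twice: once with the sender's conventions and once with the conventions a PC-based receiver might assume by default.

```python
import struct
from typing import Tuple

# Hypothetical record as it might arrive from a mainframe: a 5-character
# EBCDIC name followed by a 4-byte big-endian unsigned integer.
RAW_RECORD = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6]) + bytes([0x00, 0x00, 0x01, 0x02])


def decode_record(raw: bytes) -> Tuple[str, int]:
    """Decode text as EBCDIC (code page 037) and the number as big-endian."""
    name = raw[:5].decode("cp037")
    (quantity,) = struct.unpack(">I", raw[5:9])
    return name, quantity


def decode_record_naively(raw: bytes) -> Tuple[str, int]:
    """Decode the same bytes as Latin-1 text and a little-endian number."""
    name = raw[:5].decode("latin-1")
    (quantity,) = struct.unpack("<I", raw[5:9])
    return name, quantity


if __name__ == "__main__":
    print(decode_record(RAW_RECORD))
    print(decode_record_naively(RAW_RECORD))
```

Neither call raises an error, which is the point: the naive version quietly produces a wrong name and a wrong number, so the conversion has to be an explicit decision made somewhere in the interface.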

These are the straightforward issues; most integrators spot them immediately and plan accordingly. The level of surprise is rarely great with these issues.

Semantic Issues with Integration

The root cause of the semantic issues with integration seems to be human gullibility. There is a tendency to believe that information is more reliable than it actually is, and we have to learn to be skeptical and question our information sources. It's hard to be cynical enough to be good at this.

Let's go over the sources of information and why they aren't reliable:

Metadata—Most of the time, systems integration starts with metadata, some sort of description of the data in the systems to be integrated. The metadata might be as simple as the table and column layouts of the database from which data will be retrieved. The integrator will form opinions about what the data fields are going to be, how valid they will be, and so on, based primarily on what can be deduced from the names of the fields. As we know, names can be deceiving.
User interviews—Often the integration process is driven from discussions with the users. The users have been doing an interface manually for some time and they would now like it automated. They describe a process, and maybe they tell you what the source data is like. But they don't see all the source data, and when you run the interface, spurious results appear. Or there are many subtle differences in the data that they have been finessing and had forgotten about.
Program specs or program code—Occasionally integrators will resort to reviewing code, but this is tedious work, and it isn't done all that often. The downside, if there is one, is that sometimes the programs you review are not the only ones that create data or events that you need to know about. Also, as we discussed in Chapter 3, the less semantic rigor there is in a given application, the more it is subject to procedures and training, and more variability can show up that will not be detected in a code review.
Review the existing data—In many ways this is the most reliable method. It has two possible drawbacks: The sample size may be too small (if the review is manual) or something may change in the future so that the system is not used in the same way that it was before, creating a different profile of data in the future from what was observed in the past.
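
As a rough illustration of reviewing the existing data, the sketch below profiles a small extract for things the field names alone would not reveal: blank or repeated order numbers and negative amounts. The column names (`order_no`, `amount`), the sample rows, and the function name are hypothetical. A real review would run against the full table rather than a hand-built sample, which is exactly the sample-size caveat noted above.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical extract of a source table; the column names are made up.
SAMPLE_ORDERS: List[Dict[str, Any]] = [
    {"order_no": "1001", "amount": 250.00},
    {"order_no": "1002", "amount": -40.00},
    {"order_no": "1002", "amount": 75.50},
    {"order_no": "", "amount": 10.00},
]


def profile_orders(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize properties that the metadata alone would not reveal."""
    # Count each non-blank order number so repeats can be reported.
    counts = Counter(row["order_no"] for row in rows if row["order_no"])
    return {
        "rows": len(rows),
        "blank_order_no": sum(1 for row in rows if not row["order_no"]),
        "duplicate_order_nos": sorted(k for k, n in counts.items() if n > 1),
        "negative_amounts": sum(1 for row in rows if row["amount"] < 0),
    }


if __name__ == "__main__":
    print(profile_orders(SAMPLE_ORDERS))
```

A profile like this describes only the data as it exists today. It says nothing about how the system will be used next year, so it complements the other sources rather than replacing them.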

So why are these types of issues called semantic issues, and what can be done to address them? There are two reasons for the name. One is that they are not technical issues; typically the integrators have resolved the binding, language, and platform issues, and the interfaces still aren't working. The other, a bit more subtle, is that each of these approaches to integrating the two systems is built on an impoverished semantic model.

We know the model is impoverished after it fails. We run the interface, and somebody notices that some of the sales orders didn't come across. On further investigation we notice that some of the sales orders have duplicate numbers and the posting program rejected them. Or some of them had negative numbers for the amount and therefore were rejected or treated as credits. Three things are going on here:

The two systems had different implementation models. That is, they describe and implement the same part of the business in different ways. If they had identical implementation models (including the enforced definition of the terms), there wouldn't have been a problem. But that isn't the case.
The integrators created, usually in their heads and occasionally explicitly in a spec, their own version of a model that would bridge the two discrepant models. They also had mental models of the semantics of each system and of how each would resolve to the intermediate model. (Analysts don't often articulate this; they more often think in terms of "mapping" from source to destination, but if probed they generally reveal an abstract notion that the data passes through. For a trivial example, if they have Julian dates [dddyyy] in one system and American Gregorian dates [mm/dd/yyyy] in another, in the back of their minds they know they are translating one of them into an abstract representation of time and then reexpressing it in the other format.)
Evidence shows up, in a delayed fashion, that indicates that something is wrong with the model. Changes are made, the system is retested, and they proceed until they get to the next violation of the model.
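
The date example in the second point can be sketched in a few lines. The sketch assumes the day-of-year format is three digits of day followed by a four-digit year (written here as `dddyyyy`); that layout is an assumption made for illustration and may not match the format a given system actually uses. The function names are likewise hypothetical. The point is the shape of the translation: the source string is parsed into an abstract date value, and only then is that value reexpressed in the destination format.

```python
from datetime import date, timedelta


def parse_ordinal_date(text: str) -> date:
    """Parse an assumed 'dddyyyy' string into an abstract date value."""
    if len(text) != 7 or not text.isdigit():
        raise ValueError(f"expected 7 digits, got {text!r}")
    day_of_year, year = int(text[:3]), int(text[3:])
    if day_of_year < 1:
        raise ValueError(f"day of year must be at least 1, got {day_of_year}")
    result = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    # Reject day numbers that spill into the next year (e.g. 366 in a non-leap year).
    if result.year != year:
        raise ValueError(f"day {day_of_year} does not exist in {year}")
    return result


def format_us_date(value: date) -> str:
    """Reexpress the abstract date in mm/dd/yyyy form."""
    return value.strftime("%m/%d/%Y")


def ordinal_to_us(text: str) -> str:
    """Source format -> abstract date -> destination format."""
    return format_us_date(parse_ordinal_date(text))


if __name__ == "__main__":
    print(ordinal_to_us("0322004"))
```

Making the intermediate value explicit gives the rejected cases somewhere to surface. A day 366 in a non-leap year fails at the parsing step, before it can arrive in the destination system as a plausible-looking but wrong date.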

This process is harder than it looks, and it doesn't always go as planned, but the payoff is substantial. Hundreds of billions of dollars have been invested in integration, most of it cost-justified by return-on-investment analyses. This suggests that even poorly done integration is better than no integration at all. Let's take a look at how companies actually attack this problem.