Information Producers and Consumers: Common Patterns | Next Generation Application Integration: From Simple Information to Web Services

In the world of IOAI the source and target systems are always entities that produce and consume information. The types of systems that produce and consume information vary greatly. For our purposes, we can place them into one of the following patterns:

Database
Application
User interface
Embedded device

Within IOAI, it's important that you understand each pattern, how each pattern behaves, and the advantages and limitations of each pattern type.

Database

The most popular information producer and consumer by far, databases are natural points of integration because they were designed to produce and consume data, and thus provide the best interface into source and target applications exchanging information. We interact with databases using whatever database interface makes the most sense, typically native Structured Query Language (SQL), or perhaps through a Call-Level Interface (CLI) such as Java Database Connectivity (JDBC).

When requesting information from the source database, we typically send the request using a language that the database can understand, such as SQL. The database then responds with a result set containing just the information requested. This could be one or many records, and the information returns to the integration server or other calling program. The source database either accepts the request and produces the data, or works through an exception-processing scenario.

When sending information to a target database we simply request that we update the database, once again using a language the database can understand, and send the information in the proper format. The target database either accepts or rejects the updates (see Figure 2.4).
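The request and update exchange described above can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database as a stand-in for a real source or target system; the `customer` table and the `request` and `update` helpers are hypothetical names, not part of any product discussed here.

```python
import sqlite3

# Hypothetical in-memory database standing in for a source/target system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_no TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer VALUES ('C1', 'John Smith')")
conn.commit()


def request(conn, cust_no):
    """Ask the source database for records; it answers with a result set."""
    cursor = conn.execute(
        "SELECT cust_no, name FROM customer WHERE cust_no = ?", (cust_no,)
    )
    return cursor.fetchall()


def update(conn, cust_no, name):
    """Send a record to the target database and report whether it was accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO customer VALUES (?, ?)", (cust_no, name))
        return True
    except sqlite3.IntegrityError:
        return False  # the database rejected the update


rows = request(conn, "C1")                      # one matching record
accepted = update(conn, "C2", "Jane Doe")       # new key, accepted
rejected = update(conn, "C1", "Duplicate Key")  # existing key, rejected
```

In a real integration the rejection branch is where the exception-processing scenario would begin, for example by routing the failed record to an error queue.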

Figure 2.4. Databases produce and consume information through interfaces they provide.


While all databases function in basically the same way, they differ in the models they employ (relational, object, XML, file-oriented, and hierarchical), in the languages they use, and in the formats they produce. There are many books on database technology, so it does not make sense for us to go into detail here. In many application integration domains, adapters may account for the differences in database technologies and in how requests and updates execute; otherwise, you may have to account for the differences using custom code.

The upside of using the database as a point of integration is that the interfaces are almost always well defined and tested, and there are many different types of result sets you can request. That's what they do. The downside is that the information produced is typically not bound to business entities. Thus, while you may get all of the information to create a purchase order, you have to figure out how the information in the result set pertains to the purchase order (and even calculate some fields). In contrast, application interfaces usually produce data bound to business entities, such as invoices, sales orders, or purchase orders.
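To make the downside concrete, here is a minimal sketch of the extra work a database-level integration has to do. It assumes a hypothetical two-table schema (`po_header` and `po_line`) in an in-memory SQLite database; the result set is just rows, and the calling program must assemble the purchase order and calculate the amounts itself.

```python
import sqlite3

# Hypothetical schema; a real source database will look different.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE po_header (po_no TEXT, supplier TEXT);
    CREATE TABLE po_line (po_no TEXT, item TEXT, qty INTEGER, unit_price REAL);
    INSERT INTO po_header VALUES ('PO-1', 'Acme Brick');
    INSERT INTO po_line VALUES ('PO-1', 'Red Bricks', 1000, 0.50);
    INSERT INTO po_line VALUES ('PO-1', 'Mortar', 10, 12.00);
""")

# The database returns flat rows, not a purchase order.
rows = conn.execute(
    """
    SELECT h.po_no, h.supplier, l.item, l.qty, l.unit_price
    FROM po_header h JOIN po_line l ON l.po_no = h.po_no
    WHERE h.po_no = ?
    ORDER BY l.item
    """,
    ("PO-1",),
).fetchall()


def build_purchase_order(rows):
    """Bind a flat result set to a business entity, calculating derived fields."""
    po_no, supplier = rows[0][0], rows[0][1]
    lines = [
        {"item": item, "qty": qty, "amount": qty * unit_price}
        for _, _, item, qty, unit_price in rows
    ]
    return {
        "po_no": po_no,
        "supplier": supplier,
        "lines": lines,
        "total": sum(line["amount"] for line in lines),  # not stored anywhere
    }


purchase_order = build_purchase_order(rows)
```

Note that neither the line amounts nor the total exists in the database; the integration code has to know the business rule that produces them.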

Application

Application interfaces are much more complex than databases due to the simple fact that applications all take a different approach to how they consume and produce information (if they do it at all). Therefore, the interfaces that exist from application to application don't share common patterns or standards; thus, you need to address each packaged application individually, perhaps programmatically or through adapters provided by application integration vendors.

It's also worth noting up front that while application interfaces provide access to encapsulated information, a producer of information if you will, they also provide access to encapsulated application services. In this book we talk about application interfaces in the context of IOAI, but application interfaces are clearly a part of application service-based access as well.

Application interfaces are interfaces that developers expose from packaged or custom applications to gain access to various levels or services of those applications. Some interfaces are limited in scope, while others are "feature rich." Some interfaces allow access to business processes only; some allow access directly to the data. Some allow access to both.

Packaged application vendors (e.g., SAP) and developers (e.g., the guy who wrote your accounting system) expose these interfaces to provide access to the business processes and data encapsulated within their applications without requiring other developers to invoke the user interface or to go directly to the database. The use of such interfaces creates a benefit for application integration by allowing external applications to access the information found in these applications without making any changes to the packages or to applications themselves (see Figure 2.5). Exposing these interfaces also provides a mechanism to allow encapsulated information to be shared. For example, if SAP data is required from Excel, SAP exposes the interfaces that allow you to invoke a business process and/or gather common data.

Figure 2.5. Application interfaces provide access to information as well as encapsulated processes.


Along with the potential complexity of application interfaces and their dynamic nature, this difference in approach distinguishes application interface-level integration from other types of application integration. The range in the number and quality of features that different application interfaces provide makes it nearly impossible to know what to anticipate when invoking a particular application interface. This is certainly true of packaged applications, which are "all over the board."

However, recently we've moved quickly toward standard application interfaces such as J2EE Connector Architecture (JCA), and even Web services. While this is a step in the right direction, there is much work to be done before all applications have exposed their internals using a standards-based API. What's more, many may find that the standards themselves are limiting, and thus they will have to mature more to be effective. For example, the first generation of the JCA specification only defined information moving in one direction. We'll talk much more about JCA and Web services later in this book.

Interface by Example

Let us say we are attempting to integrate an ERP system that was just configured and installed at the site of our supplier and our long-running, custom COBOL system. Both systems exist on their own processor at their respective sites, connected by the Internet.

In our example, we are fortunate that the ERP vendor understood and anticipated the need to integrate business processes and data with the outside world. The vendor provided an API that works within C++, C, and Java environments, with the libraries that drive the API being downloadable from the company's Web site.

For example, from a C application with the appropriate existing API libraries, the function GetInvoiceInformation("12345") would produce

 <BOM>
 John Smith
 222 Main Street
 Smalltown, VA 88888
 Invoice Number: 12345
 001       Red Bricks     1000     .50     500.00
 <EOM>

The information returned from the API call is generated by invoking an API and passing in an invoice number as an argument. For processing, this information would have to be placed within the application program in an array, or in another memory location. From this point, the information may be placed within a middleware layer such as an integration broker or within XML, for transmission to other systems. Note that the database itself was never accessed. Further, the data is already bound to a business entity, namely an invoice.
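As a rough sketch of "placing the information in memory," the following parses the sample reply into a structure. It makes several assumptions that the example does not confirm: that the reply arrives as one field per line between the <BOM> and <EOM> markers, that the first three lines are name, street, and city, and that line-item columns are separated by two or more spaces. The `parse_invoice` function is a hypothetical helper, not part of the vendor's API.

```python
import re

# Sample reply from the example, assumed to be newline separated.
RAW_REPLY = """<BOM>
John Smith
222 Main Street
Smalltown, VA 88888
Invoice Number: 12345
001       Red Bricks     1000     .50     500.00
<EOM>"""


def parse_invoice(raw):
    """Turn the raw API reply into a dictionary bound to the invoice entity."""
    lines = [line.strip() for line in raw.strip().splitlines()]
    if lines[0] != "<BOM>" or lines[-1] != "<EOM>":
        raise ValueError("reply is missing its <BOM>/<EOM> markers")

    name, street, city, number_line, *item_lines = lines[1:-1]

    items = []
    for line in item_lines:
        # Assumed layout: columns separated by runs of two or more spaces.
        line_no, description, qty, price, amount = re.split(r"\s{2,}", line)
        items.append({
            "line_no": line_no,
            "description": description,
            "quantity": int(qty),
            "unit_price": float(price),
            "amount": float(amount),
        })

    return {
        "customer": name,
        "street": street,
        "city": city,
        "invoice_number": number_line.split(":", 1)[1].strip(),
        "items": items,
    }


invoice = parse_invoice(RAW_REPLY)
```

Once the reply is in a structure like this, handing it to an integration broker or serializing it as XML is a mechanical step.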

Using this same interface, we can also get at customer information:

 GetCustomerInformation("cust_no")

Or inventory information:

 QuantityAvailable("product_no")

Our example might not be so straightforward in COBOL. Here, the developer and application architect who built the application failed to build in an API to access encapsulated business processes. Therefore, the application must be redesigned and rebuilt to expose an API so that the processes of the application may be bound with the processes of the remote ERP application.

The generally high cost of development and testing makes building interfaces into existing applications an unattractive option. However, let's assume in our current example that the application interface is the best solution. Once we've gone ahead and built an interface into the COBOL application, our next move is simple: select the right middleware to bind to the ERP API within the supplier and the custom API locally. The middleware then allows the application integration subsystem to extract business information (e.g., credit information) from one and place it in another. Middleware that will work in this scenario might include integration servers, message queuing (MQ) middleware, and application servers.

Packaged applications (most often present in a typical application integration problem domain) are only now beginning to open their interfaces to grant outside access and, consequently, integration. While each application determines exactly what these interfaces should be and what services they will provide, there is an evolving "consensus" to provide access at the business model, data, and object levels.

As we have come to appreciate in the world of custom applications, anything is possible. Access to the source code allows us to define a particular interface, or to open the application with standard interfaces such as CORBA, Component Object Model (COM), JCA, or Web services. For example, rather than accessing the user interface (scraping screens) to get to an existing COBOL application residing on a mainframe, we can build an API for that application that exposes its services directly. In most cases, this will require mapping the business processes once accessible only through screens and menus directly to the API.

If the world were a perfect place, all the features and functions provided by packaged applications would also be accessible through their "well-defined" APIs. Sadly, the world is not a perfect place, and the reality is a bit more sobering. Nearly every packaged application provides some interfaces but, as we have emphasized above, they are uneven in their scope and quality. While some provide open interfaces based on open interface standards such as Java APIs (e.g., J2EE Connector Architecture), many others provide more proprietary APIs that are useful only in a limited set of programming languages (e.g., COBOL, Java, C, and C++). Most disturbing of all is the harsh reality that too many packaged applications fail to offer so much as a solitary interface. When confronted with these applications, there is no opportunity for an application or middleware layer to access services cleanly. As a result, the business processes and data contained within the application remain "off limits." In these situations, half the anticipated expense of moving forward must be dedicated to more traditional mechanisms, such as screen scraping or database-level application integration.

Packaged applications are natural stovepipes. This reality not only places them squarely within the problem domain of most application integration projects, but it also makes them the most challenging applications to integrate. You may find yourself needing to access the information in these stovepipes and needing to share the business logic locked up within them with others in your organization or trading community.

SAP, PeopleSoft, Oracle, and Siebel have come to dominate the many packaged applications on the market today because they have recognized, and responded to, the need to share information. This is a tremendous advantage over their competitors. However, before availing yourself of this advantage, you must remember that, over the years, hundreds of packaged applications have likely entered your enterprise. The sad reality is that many of these packaged applications no longer enjoy the support of their vendors or, perhaps more likely, their vendors have gone out of business. Because the number of these older, packaged applications in your application integration problem domain could easily be in the hundreds, you will be confronted with special challenges for application integration. Most of these applications will provide some points of integration, while others will not.

Packaged applications come in all shapes and sizes. Most large packaged applications that exist within the enterprise are "business critical." SAP, for example, provides modules for accounting, inventory, human resources, manufacturing, and other vital functions. PeopleSoft and Baan provide many of the same types of services and modules.

Vendors such as Lawson Software, J.D. Edwards, and others, some with fewer than a dozen installations, offer packaged applications. For example, Scopus, a call-center management application, is limited to highly selected and specialized applications. Siebel, a sales-force automation package, is designed to allow sales organizations to function more effectively.

User Interface

Leveraging the user interface as a point of information integration is a process known as "screen scraping," or accessing screen information through a programmatic mechanism. Middleware drives a user interface (e.g., 3270 user interface) in order to access information. Simply put, many application integration projects will have no other choice but to leverage user interfaces to access application data and processes. Sometimes access to underlying databases and application interfaces does not exist.

There really isn't much to user interface access. It is just one of many techniques and technologies that can be used to access, or place, information in an application. The technology has been around for a number of years. As a consequence, there is very little risk involved in using it. There are, however, problems that need to be overcome. A user interface was never designed to serve up data, but it is now being used for precisely that purpose. It should go without saying that the data-gathering performance of user screens leaves a lot to be desired. In addition, this type of solution can't scale, so it is unable to handle more than a few screen interfaces at any given time. Finally, if the application integration architect and developer do not set up these mechanisms carefully, they may prove unstable. Controller and server bouncing are common problems.

There are, of course, a number of tricks for sidestepping these limitations, but they bring additional risk to the project. Still, with so many closed and proprietary applications out there, the application integration architect has little choice. The name of the game is accessing information by any means possible. Ultimately, going directly to the user interface to get at the data is like many other types of technology: not necessarily pretty, but it gets the job done.

For the reasons noted previously, user interface-level application integration should be the "last-ditch effort" for accessing information that resides in source systems. It should be turned to only when there is no well-defined application interface, such as those provided by many ERP applications, or when it doesn't make sense to leverage the database. Having said that, we can also state that user interface-level application integration needn't be avoided. In most cases, it will prove to be successful in extracting information from existing applications and as a mechanism to invoke application logic.

Using the User Interface

An existing mainframe application created using DB2 and COBOL needs to share processes and data with a custom distributed object system running on Linux and with PeopleSoft running on NT. The mainframe application is older and does not have an application interface, nor do the skills exist within the company to create one.

Instead of creating an application interface, or moving information between databases, the application integration architect opts to leverage the user interface. Using this approach, the application integration architect is able to extract both application data and business information from the COBOL/DB2 system exposed by the user interface. The application integration architect may leverage this approach, perhaps to save money and lower the risk of the application integration project, or due to the simple fact that this may be the only solution when considering the state of the technology (the most popular reason).

The process of extracting information using the user interface is really a matter of defining how to get to the appropriate screens, locating the correct information on the screens, reading the information from the screens, and, finally, processing the information. You're creating an automated program that simulates an actual user, navigating through screens, emulating keystrokes, and reading screens into memory. Once in memory, the information is parsed, reformatted, and transported to any number of middleware layers, where it's ultimately sent to the target system, for example, the PeopleSoft system. You also need to check for errors and be able to handle and recover from the inevitable problems such as system and network failures.
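The steps above (navigate, locate, read, process) can be sketched as follows. Everything here is an assumption made for illustration: `FakeTerminal` is a hypothetical stand-in for a terminal emulation session, the screen layouts are invented, and the 40-column width is shortened from the 80 columns typical of a 3270 screen. A real project would drive an actual emulator and add retry and recovery logic.

```python
WIDTH = 40  # shortened for readability; 3270 screens are typically 80 columns


def render(lines):
    """Pad each line to the fixed screen width."""
    return [line.ljust(WIDTH) for line in lines]


class FakeTerminal:
    """Hypothetical stand-in for a terminal emulation session."""

    def __init__(self):
        self.screen = render(["MAIN MENU", "1 ORDER INQUIRY"])

    def send_keys(self, keys):
        if self.screen[0].startswith("MAIN MENU") and keys == "1":
            self.screen = render(["ORDER INQUIRY", "ENTER ORDER NO:"])
        elif self.screen[0].startswith("ORDER INQUIRY"):
            self.screen = render([
                "ORDER DETAIL",
                f"ORDER NO: {keys:<8}CUSTOMER: JOHN SMITH",
                "TOTAL:     500.00",
            ])

    def read_screen(self):
        return list(self.screen)


# Where each field lives on the detail screen: (row, column, length).
FIELDS = {
    "order_no": (1, 10, 8),
    "customer": (1, 28, 12),
    "total": (2, 6, 11),
}


def scrape_order(terminal, order_no):
    terminal.send_keys("1")          # navigate to the inquiry screen
    terminal.send_keys(order_no)     # emulate the user's keystrokes
    screen = terminal.read_screen()  # read the screen into memory

    # Check for errors: make sure we landed on the screen we expected.
    if not screen[0].startswith("ORDER DETAIL"):
        raise RuntimeError("unexpected screen: " + screen[0].strip())

    # Locate and parse each field by its screen position.
    record = {
        name: screen[row][col:col + length].strip()
        for name, (row, col, length) in FIELDS.items()
    }
    record["total"] = float(record["total"])  # reformat for the target system
    return record


order = scrape_order(FakeTerminal(), "12345")
```

The field map is the fragile part of this approach: if the application moves a field by even one column, the scraper silently reads the wrong data, which is why the screen check and careful testing matter.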

You need to consider the business case. In this instance it may not make good business sense to create a new interface into the older mainframe-based system, and it may not make sense to integrate the applications at the database level (e.g., if we need to extract calculated information versus just raw data). By creating a new interface, the existing mainframe system will typically have to undergo a small to medium rearchitecture effort, redevelopment to add the application interface, and redeployment and testing of the application before it's placed into production. This is a long and drawn-out process, and perhaps it's not needed, considering what we can do these days with user-level application integration. Is it the best solution? It's another tradeoff.

As in other levels of application integration, the architect or the developer has the responsibility to understand the application, the application architecture, and the database information in great detail. At the user-level interface, this may prove difficult. Remember, the decision to leverage user interface-level application integration was made specifically to bypass the restrictions of closed proprietary systems, or because it just makes good business sense not to augment an existing system to create an interface, or because other application integration levels are not applicable.

In spite of the difficulty, this information is necessary for two reasons: one, the need to understand all the information that an application is producing, and two, to understand how all the data and methods residing within an application relate to all the data and methods existing within the enterprise. In short, it is necessary to know how the application data and methods fit within the enterprise metadata model and the enterprise object models, and map the source system into this logical and physical layer. Using the user interface as a point of integration does not free us from this requirement.

In order to implement user interface-level application integration, it is necessary to understand the application. This requires understanding the underlying data storage schema, much of the application logic, and, most important, how the information is presented to the user interface. In order to understand how the data elements are represented on the screen, it is necessary to understand how they exist within the application. Unlike other interface levels, information presented to a user interface may not map back to a database. Most of the data elements on the screen, such as calculated fields (e.g., an order total), are created by the application logic and do not come directly from the database.

This being the case, it is evident that an understanding of the application logic is also desirable. This requires reading the documentation so that the mechanisms of the application as well as the logic can be understood. Unfortunately, as in other contexts, the applications are not always well documented. If this proves to be the case, the source code itself will have to be read. Ultimately, regardless of how it is achieved, the goal is to be able to trace all data that appears on the screen to the application logic and database schema information.

Reaching this goal will often require "breaking the data out" from its representation on the screen, a task accomplished with simple mathematical formulas. There are times when it will be necessary to break out six separate data elements from a single number presented on the user interface. This, in and of itself, is a strong statement that the application integration architect or developer needs to understand the application logic in detail. There are many cases of user interface application integration projects where a failure to understand the application has led to a misreading of the user interfaces and, consequently, erroneous data being fed into target applications.
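As a small illustration of "breaking the data out" with simple arithmetic, suppose a screen shows a single 12-digit number that packs six two-digit elements. The layout below (year, month, day, region, warehouse, status) is entirely hypothetical; the real layout has to come from the application logic or its documentation.

```python
# Hypothetical layout: six two-digit elements packed left to right.
FIELDS = ("year", "month", "day", "region", "warehouse", "status")


def break_out(packed):
    """Split one packed number into its six elements using integer arithmetic."""
    values = []
    for _ in FIELDS:
        packed, part = divmod(packed, 100)  # peel off the two rightmost digits
        values.append(part)
    # Digits were peeled from the right, so reverse to match the field order.
    return dict(zip(FIELDS, reversed(values)))


elements = break_out(240315071203)
```

Getting the layout wrong here produces plausible-looking but incorrect values, which is exactly the kind of misreading that feeds erroneous data into target applications.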

Embedded Device

Finally, you may find that, in some cases, information comes from embedded devices, such as temperature sensors, call-counting machines, or perhaps wireless devices. Dealing with embedded devices is similar to dealing with applications; the interfaces are typically API based and proprietary, although some standards are beginning to emerge.

A common pattern when dealing with embedded devices as source or target systems is that information must flow freely from the device, because most devices can't store information (as true applications can). Thus, when you request information you're typically obtaining the information at that time, not information that's in a queue (see Figure 2.6).
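A minimal sketch of this behavior, assuming a hypothetical `TemperatureSensor` class that holds only its current reading: any history has to be kept on the integration side, and readings that are not requested in time are simply gone.

```python
from collections import deque


class TemperatureSensor:
    """Hypothetical device that holds only its current value, with no queue."""

    def __init__(self, values):
        self._values = iter(values)
        self._current = next(self._values)

    def tick(self):
        """Time passes and the device overwrites its single stored value."""
        self._current = next(self._values)

    def read(self):
        """Return whatever the device is measuring right now."""
        return self._current


sensor = TemperatureSensor([20.5, 20.7, 21.0, 21.4])
history = deque(maxlen=10)  # the integration layer does the buffering

history.append(sensor.read())  # captures 20.5
sensor.tick()                  # device now holds 20.7
sensor.tick()                  # device now holds 21.0; 20.7 was never read
history.append(sensor.read())  # captures 21.0
```

How often to poll, and how much to buffer on the integration side, depends on how quickly the device's value changes and how much loss the target system can tolerate.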

Figure 2.6. When using embedded devices, you may not be able to queue up information. Use it or lose it.
