Data Integration | XML: A Manager's Guide (2nd Edition) (Addison-Wesley Information Technology Series)

Business Challenge

Unfortunately, the information constituting logically coherent business concepts such as vendor, customer, product, order, and invoice often resides in several different data sources. This information diffusion causes two problems: (1) enterprise employees find it difficult to execute job functions that require them to assemble a complete representation of such business concepts, and (2) each application must incur the performance cost of assembling the component data into complete representations of such business concepts.

XML Benefit

As with application integration, XML helps data integration solutions avoid incompatibility with one another. An obvious approach to data integration is to centralize the process of synthesizing data from different sources. Unfortunately, whatever service performs this synthesis must deliver the resulting information in some format, and that format might itself turn out to be incompatible with the formats other systems use. XML addresses this issue. Because Web browsers have become a standard component of the enterprise desktop, there is a strong argument for using XML as the universal data format for human clients. Also, because many of the recent B2B and EAI solutions use XML documents, there is a strong argument for using XML as the universal data format for software clients. The widespread availability of XML-capable clients, together with XML's flexibility in structuring information, makes it possible for XML to become the de facto data interchange format. Using XML also eases the adoption of related technologies such as SVG, as discussed in the trading partner coordination section: once all business data is available as XML, using SVG for business data visualization follows naturally.

Architecture

The data servers described in Chapter 5 provide the foundation for data integration applications. As Figure 7-5 shows, the architecture for data integration includes multiple data sources, a data server, and multiple clients. The data sources could be files in filesystems, DBMSs, and applications: anything with remotely accessible information.

Figure 7-5. Data Integration Architecture

The data integration server takes data records from these sources, composes them into XML documents, and makes them available to clients. It also accepts updated documents from clients, disassembles them into their component information elements, and instructs the data sources to change the corresponding component data records. The data integration server does not have to use an XML-based mechanism to access and update the information sources. It could also use native database APIs, JDBC, or ODBC.
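
The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the `crm` database, the `billing` dictionary, the `customer` document layout, and the function names `compose_customer` and `apply_update` are all hypothetical. The first source is reached through a native database API and the second through plain function-level access, which shows that the back-end access path need not be XML-based.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source 1: a relational CRM database (in-memory SQLite here).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT)")
crm.execute("INSERT INTO customer VALUES ('C1', 'Acme Corp')")

# Hypothetical source 2: a billing application, stood in for by a dict.
billing = {"C1": {"balance": "120.50"}}


def compose_customer(customer_id: str) -> str:
    """Assemble records from both sources into one XML document."""
    row = crm.execute(
        "SELECT name FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        raise KeyError(customer_id)
    root = ET.Element("customer", id=customer_id)
    ET.SubElement(root, "name").text = row[0]
    ET.SubElement(root, "balance").text = billing[customer_id]["balance"]
    return ET.tostring(root, encoding="unicode")


def apply_update(document: str) -> None:
    """Disassemble an updated document and write each part to its source."""
    root = ET.fromstring(document)
    customer_id = root.get("id")
    # The name element belongs to the CRM database.
    crm.execute(
        "UPDATE customer SET name = ? WHERE id = ?",
        (root.findtext("name"), customer_id),
    )
    # The balance element belongs to the billing application.
    billing[customer_id]["balance"] = root.findtext("balance")
```

A client would call `compose_customer("C1")` to read the consolidated document and pass an edited copy to `apply_update` to push changes back. A real data server would also need to coordinate the two writes as a single transaction, which this sketch leaves out.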

A client could be a Web server that then passes the documents on to end users. It could also be a B2B application that processes data from multiple sources to generate business messages. It could even be another data source that needs its own set of consolidated data, such as a data warehouse.

Key Features

What distinguishes data integration from other applications is its true middleman nature: It only assembles information; it does not use that information as part of a business process. The fact that it does not necessarily use XML documents as the means of exchange with back-end information sources is also an important characteristic.

Development Process

In nearly all cases, you will purchase a data server from a vendor. Chapter 5 discussed some of these vendors using a native XML store as a data server. After you select a platform, implementing a synthesis application involves two important steps: (1) figuring out what types of business information human and software users need and (2) locating the component data.

The first step is a needs assessment problem. You identify all the different groups that want synthesized information and explore the needs of each. Then you design a set of schemas that meet these needs. Once you know what data you require, you move on to the second step, which is a data access problem. Find out which data sources contain the required data and gain access to those sources.

With schemas in hand and access to the component data, you then map the elements and attributes specified in the schemas to data fields in information sources. The vendor should supply a tool for browsing the structure of information sources and creating the necessary mappings. However, you may have to specify a number of important parameters such as concurrency policy, transaction policy, and access control policy. Finally, you test the system to make sure that the mapping works and that the parameters specified produce the expected behavior.
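
To make the mapping step concrete, here is a minimal sketch of a declarative mapping between schema elements and source fields. Vendor tools typically capture this through a graphical browser, so the table format, the source names `crm` and `billing`, the field names, and the helper functions are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: schema element -> (source name, field name).
MAPPING = {
    "name": ("crm", "cust_name"),
    "city": ("crm", "cust_city"),
    "openBalance": ("billing", "bal_open"),
}

# Stand-ins for records already fetched from each information source.
records = {
    "crm": {"cust_name": "Acme Corp", "cust_city": "Boston"},
    "billing": {"bal_open": "120.50"},
}


def build_document(root_tag, mapping, source_records):
    """Create one child element per mapping entry, filled from its source."""
    root = ET.Element(root_tag)
    for element, (source, field) in mapping.items():
        ET.SubElement(root, element).text = source_records[source][field]
    return root


def unmapped_fields(mapping, source_records):
    """Return mapped (source, field) pairs that the sources do not provide."""
    return [
        (source, field)
        for source, field in mapping.values()
        if field not in source_records.get(source, {})
    ]
```

Running a check such as `unmapped_fields` before going live is one simple way to confirm that every element in the schema has a backing data field, which is part of the testing the final step calls for.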

Schema Source

The origin of schemas depends on why you are doing data integration. If the goal is to synthesize customized information to support different job tasks, you create a schema for each target audience segment. If you need specific formats to support custom applications, you create a schema for each type of data request these applications need to make. If you want to support B2B applications so they can exchange information with other organizations using standard formats, the schema comes from the standards body.

Document Life Cycle

The data integration server creates XML documents based on the requirements of clients. In most cases, it creates these documents dynamically for each request. The document is destroyed, as far as the data integration server is concerned, as soon as it sends the document to a client, although the client might retain one or more copies. As part of the middle tier, the data integration server may also choose to cache documents, but no part of the system ever assumes that a particular document is in the cache.
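
The caching rule can be illustrated with a small sketch in which the cache is purely an optimization: every read goes through a single method that rebuilds the document on a miss, so callers never depend on a cached copy being present. The class name `DocumentService`, its methods, and the `builds` counter are hypothetical and exist only for this example.

```python
from typing import Callable, Dict


class DocumentService:
    """Serves documents on request; the cache is optional and may be empty."""

    def __init__(self, build: Callable[[str], str]) -> None:
        self._build = build              # creates a document dynamically
        self._cache: Dict[str, str] = {}
        self.builds = 0                  # counts dynamic creations, for illustration

    def get(self, key: str) -> str:
        doc = self._cache.get(key)
        if doc is None:
            # Cache miss: build the document from the sources again.
            doc = self._build(key)
            self.builds += 1
            self._cache[key] = doc
        return doc

    def evict(self, key: str) -> None:
        # Eviction is always safe because get() can rebuild the document.
        self._cache.pop(key, None)
```

Because `get` is the only access path, the server can evict entries at any time, for example when a data source reports a change, without breaking any client.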