13.2 Data Adapters


Data adapters are required when the data and code cannot be migrated simultaneously. These adapters synchronize data between the modernized and legacy databases during incremental data migration. Data adapters provide the mapping between a legacy database schema and a modernized database schema, as shown in Figure 13-1.

Figure 13-1. Generic data adapter

Data Replication

Data replication is the process of copying and maintaining database tables in multiple databases. Replication provides users with fast, local access to shared data and greater availability to applications because alternative options exist for data access. Even if one site becomes unavailable, users can continue to query, or even update, data at other locations. Changes applied at one site are captured and stored locally before being forwarded and applied to the master repository at another site. [1]

[1] We realize that data replication may exist without one repository acting as the master, or primary, copy. The challenge with styles of replication that do not use a master copy is often determining which repository contains the latest updates.

Data replication can also enable decentralized access to legacy data stored in mainframes. Local instances are replicated from portions of the centralized legacy database and are stored in a modern database. The local copy of the data wraps and buffers the original data source. Applications using the data obtain the benefits of local access to a modern database instead of remote access to an obsolete data repository.

Data replication, however, can have problems with data coupling, especially if local data sets overlap. Data elements may need to be updated by both legacy and modernized systems, for example, when the code accessing a particular data element cannot be migrated in a single increment. The following sections describe mechanisms for supporting data replication.

Scripts

Scripts provide a simple mechanism for synchronizing replicated data. Scripts transform data from one system to another according to a predefined mapping between data elements. Scripts are usually run in batch mode, periodically, or on demand. Scripts do not typically require significant changes to the application.

Scripts also have disadvantages. For example, it can be difficult to determine the most recent changes to the data when synchronizing databases updated by both systems. One solution is to timestamp the data so that the most recent update can be used. This solution may not work when data is updated using values that have already been replaced in the other system. Ozsu and Valduriez provide additional information on problems related to data replication [Ozsu 99].
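The timestamp-based approach can be sketched in Python using SQLite stand-ins for the two databases; the customer table, its columns, and the integer timestamps below are illustrative, not drawn from the text.

```python
import sqlite3

# Stand-ins for the legacy and modernized databases (schema is illustrative).
legacy = sqlite3.connect(":memory:")
modern = sqlite3.connect(":memory:")
for db in (legacy, modern):
    db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")

legacy.execute("INSERT INTO customer VALUES (1, 'Acme Ltd', 100)")
modern.execute("INSERT INTO customer VALUES (1, 'Acme Limited', 200)")  # newer update
legacy.execute("INSERT INTO customer VALUES (2, 'Widget Co', 150)")

def sync(src, dst):
    """Copy each source row whose timestamp is newer than the target's copy."""
    for row_id, name, ts in src.execute("SELECT id, name, updated_at FROM customer"):
        existing = dst.execute(
            "SELECT updated_at FROM customer WHERE id = ?", (row_id,)).fetchone()
        if existing is None:
            dst.execute("INSERT INTO customer VALUES (?, ?, ?)", (row_id, name, ts))
        elif ts > existing[0]:
            dst.execute("UPDATE customer SET name = ?, updated_at = ? WHERE id = ?",
                        (name, ts, row_id))

# Run in both directions so the most recent update wins on each side.
sync(legacy, modern)
sync(modern, legacy)
```

Note that this last-write-wins rule exhibits exactly the weakness described above: it silently discards the older of two conflicting updates.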

Extraction, Transformation, and Loading (ETL)

ETL tools extract data from a source database; transform the data into a suitable format, using rules or lookup tables or by creating combinations with other data; and then load the data into a target database that may or may not have previously existed. ETL tools may be stand-alone tools or act as middleware residing between a client and a database. Even though ETL tools are used mostly for data warehousing and data marts, they can also support data replication between legacy and modernized systems.

ETL has broad tool support and can be used in batch or on-the-fly mode. Some ETL tools also support bidirectional transfer, or data that moves from the legacy system to the modernized system and vice versa. ETL, however, can lead to inefficiencies because it is an additional component running on the system. In addition, ETL tools may not be available for all legacy platforms and may take considerable effort to master.
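The extract-transform-load cycle can be sketched in miniature; the legacy orders table, the status lookup table used as the transformation rule, and the SQLite stand-ins for both databases are all assumptions for illustration.

```python
import sqlite3

# Illustrative source with legacy codes, and a target in a different shape.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders_legacy (ord_no TEXT, cust TEXT, status CHAR(1), amt_cents INTEGER)")
source.executemany("INSERT INTO orders_legacy VALUES (?, ?, ?, ?)",
                   [("A1", "C42", "S", 1999), ("A2", "C42", "P", 500)])
target.execute("CREATE TABLE orders (order_id TEXT, customer_id TEXT, status TEXT, amount REAL)")

STATUS_LOOKUP = {"S": "SHIPPED", "P": "PENDING"}  # transformation rule as a lookup table

def etl():
    rows = source.execute("SELECT ord_no, cust, status, amt_cents FROM orders_legacy")  # extract
    transformed = [(ord_no, cust, STATUS_LOOKUP[status], amt_cents / 100.0)             # transform
                   for ord_no, cust, status, amt_cents in rows]
    target.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", transformed)           # load

etl()
```

Running `etl()` periodically would correspond to batch mode; invoking it from a change notification would correspond to on-the-fly operation.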

Database Triggers

Database triggers are fragments of logic that execute within a database when specified conditions are established or events occur. Database triggers can be used to synchronize data between a legacy database and a modernized database. When used for synchronization, they are programmed as POST-UPDATE or POST-INSERT triggers associated with each replicated table. The logic inside the trigger propagates changes, synchronously or asynchronously, to the corresponding set of tables in the other system. These triggers can be programmed on the legacy database, on the modernized database, or on both databases.
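The POST-INSERT/POST-UPDATE pattern can be sketched with SQLite, which supports AFTER INSERT and AFTER UPDATE triggers. For simplicity, the "legacy" and "modernized" tables live in one database here, whereas a real deployment would propagate changes across databases; the table names are illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE account_legacy (id INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE account_modern (id INTEGER PRIMARY KEY, balance INTEGER);

-- Propagate inserts to the modernized copy.
CREATE TRIGGER account_post_insert AFTER INSERT ON account_legacy
BEGIN
    INSERT INTO account_modern VALUES (NEW.id, NEW.balance);
END;

-- Propagate updates to the modernized copy.
CREATE TRIGGER account_post_update AFTER UPDATE ON account_legacy
BEGIN
    UPDATE account_modern SET balance = NEW.balance WHERE id = NEW.id;
END;
""")

# Application code touches only the legacy table; the triggers do the rest.
db.execute("INSERT INTO account_legacy VALUES (1, 100)")
db.execute("UPDATE account_legacy SET balance = 250 WHERE id = 1")
```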

An advantage of using database triggers for data synchronization is that data changes can be propagated on-the-fly, depending on the communication mechanism and infrastructure. Additionally, if it is programmed on the modernized database to update the legacy database, a database trigger can simply be disabled once the legacy database is migrated.

The disadvantage of database triggers is that they increase the workload of the DBMS. Also, mapping between the two databases might be complicated, especially if they have different structures; for example, if the legacy database is a network database and the modern database is relational.

The use of database triggers as data adapters is most effective when the legacy and modernized databases rely on the same database management system. In this case, the mapping rules are maintained in the database and not in an additional tool or layer, thereby improving efficiency. If the databases do not rely on the same DBMS, the triggers can be programmed to invoke either scripts or an ETL tool.

Data-Access Layer

A data-access layer maps between data elements so that they appear in a prescribed format to the client application. For example, a data-access layer may make a network database appear as a relational database to a client program.
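A minimal sketch of the idea, assuming a toy hierarchical record set standing in for a legacy network database; the class, data, and column names are illustrative.

```python
# A hierarchical (legacy-style) record set: each customer owns its orders,
# much as a network or hierarchical database would store them.
LEGACY_DATA = {
    "C1": {"name": "Acme", "orders": [{"id": "O1", "total": 50}, {"id": "O2", "total": 75}]},
    "C2": {"name": "Widget", "orders": [{"id": "O3", "total": 20}]},
}

class OrdersView:
    """Read-only data-access layer presenting the hierarchy as a flat table."""
    COLUMNS = ("customer_id", "customer_name", "order_id", "total")

    def rows(self):
        # Flatten the parent/child hierarchy into relational-style tuples.
        for cust_id, cust in LEGACY_DATA.items():
            for order in cust["orders"]:
                yield (cust_id, cust["name"], order["id"], order["total"])

    def select(self, predicate=lambda row: True):
        return [row for row in self.rows() if predicate(row)]

view = OrdersView()
acme_orders = view.select(lambda r: r[0] == "C1")
```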

An advantage of a data-access layer is that the data remaining on the legacy system does not need to be replicated; all data-access operations are performed through the relational view of the legacy database. Although most data-access-layer tools provide read-only access, some can write to the legacy database.

A disadvantage is that a data-access layer provides two points of access to a single data element. As a result, it is necessary to serialize data access to guarantee data integrity.

A data-access layer should be used only as an interim solution during incremental modernization. Mapping from a hierarchical database to a relational database produces a design that is nonoptimal because of the underlying differences between the two database types (see Section 7.3).

Database Gateway

A database gateway is a specific type of software gateway that translates between two or more data-access protocols [Altman 99]. Many vendor-specific protocols are used to access databases. The de facto industry standards include ODBC, JDBC and ODMG; see Chapter 7 for additional information on these standards. A database gateway typically translates a vendor-specific access protocol into one of these de facto standards. Using a database gateway to access legacy data improves connectivity, enables remote access, and supports integrating legacy data with modern systems.

Given that there are multiple standards, the protocols supported by the legacy system database gateway and the protocols supported by the modernized system may not match. Figure 13-2, for example, shows a legacy system that uses an ODBC gateway, whereas the modern system requires a JDBC interface. One solution is a special gateway, called a bridge, that translates one standard protocol into another; in this case, a JDBC-ODBC bridge, also called a Type 1 JDBC driver; refer to Section 7.4 for a description of the various JDBC driver types.

Figure 13-2. Gateways and bridges
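The bridge pattern can be sketched with toy stand-ins for the two protocols; the real ODBC and JDBC APIs are far richer, and the method names below only echo their flavor rather than reproduce either standard.

```python
class OdbcLikeConnection:
    """Pretend vendor gateway exposing an ODBC-flavored call style."""
    def __init__(self):
        self._tables = {"parts": [("P1", "bolt"), ("P2", "nut")]}

    def SQLExecDirect(self, statement):
        # Supports only "SELECT * FROM <table>" for this sketch.
        table = statement.split()[-1]
        return list(self._tables[table])

class JdbcOdbcBridge:
    """Translates a JDBC-flavored interface into ODBC-flavored calls."""
    def __init__(self, odbc_conn):
        self._odbc = odbc_conn

    def executeQuery(self, sql):
        # The bridge's only job is protocol translation, not data transformation.
        return self._odbc.SQLExecDirect(sql)

bridge = JdbcOdbcBridge(OdbcLikeConnection())
result = bridge.executeQuery("SELECT * FROM parts")
```

The client sees only the `executeQuery` side of the bridge and remains unaware of the vendor protocol underneath, which is what makes the legacy data source swappable later.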

Hybrids

Hybrid data adapters combine two or more of the data adaptation techniques described in this chapter to solve a data synchronization problem. One example of a hybrid solution is a data-access layer for read access and scripts for batch data synchronization. Another example is using database triggers to call an ETL tool to perform on-the-fly data synchronization.

Comparison

Table 13-1 summarizes the strengths and weaknesses of the data adaptation techniques presented in this section.

Table 13-1. Comparison of Data Adaptation Techniques

Script
    Result:     Provides data synchronization for replicated data
    Strengths:  Simple to implement; does not require extensive application changes
    Weaknesses: Difficult to maintain data cohesion in nontrivial cases

ETL
    Result:     Provides data synchronization for replicated data
    Strengths:  Broad tool support
    Weaknesses: May lead to inefficiencies; tools may not be available for all platforms; difficult to master

Database trigger
    Result:     Provides data synchronization for replicated data
    Strengths:  Not an additional component in the system; eases legacy system turn-off
    Weaknesses: Loads the DBMS with additional work; difficult to implement

Data-access layer
    Result:     Makes a legacy data source appear in a prescribed format
    Strengths:  Data does not need to be replicated
    Weaknesses: Inefficient; dual access points

Database gateway
    Result:     Translates a proprietary access protocol into a standard access protocol
    Strengths:  Low cost; broad tool support
    Weaknesses: Limited impact on maintainability



Modernizing Legacy Systems: Software Technologies, Engineering Processes, and Business Practices. ISBN 0321118847, 2003.