13.4 Architecture Transformation Strategy


Now that we have examined both logic and data adapters, we can consider an architectural transformation strategy. One method for creating this strategy is to form it around a set of answers to key issues in the modernization project. Here, we develop a strategy based on the answers to the following questions.

  • How will the code be migrated?

  • When will the data be migrated relative to the code migration?

  • Do we need to support parallel operations?

Table 13-3 lists potential answers to each of these questions. Selecting an answer to each question forms the basis for a componentization strategy. Of course, the answers to these questions must be considered collectively because certain groups of answers have more cohesion than others. The following sections describe these questions in more detail and examine the characteristics of each potential answer. We discuss the combined effects in Section 13.5.

Table 13-3. Incremental Deployment Options

Issues            Options
Code migration    A1  Based on user transactions
                  A2  Based on related functionality
Data migration    B1  Before code migration
                  B2  During code migration
                  B3  After code migration
Deployment        C1  Deploy each increment in parallel with the legacy system
                  C2  Deploy each increment as the operational system

Code Migration

Code can be split and migrated in many ways. One approach is to extract a functional thread from a program element or elements. This can be difficult to implement in practice because it requires a white-box approach. Program elements must be "opened up" and significantly modified. If the existing system is being modernized because it is fragile and difficult to maintain, as is the case with RSS, this approach may not be viable.

The remaining possibility is to migrate sets of "whole" program elements, using a black-box approach. Successfully migrating legacy code based on program element sets requires linking remaining legacy program elements to modernized logic functionality without compromising the overall functionality of the system. In the RSS example, program elements can be split across business objects, and business objects can be deployed while still incomplete, as long as the overall functionality of the system remains intact. This approach supports architectural transformation but may require some rework of business objects as the system evolves.

Figure 13-5 illustrates code migration by program element sets. A legacy program element (121) is scheduled for modernization. On the right, the functionality performed by 121 is reimplemented as part of the modernized architecture. However, 121 is still invoked by program element 345 and itself invokes program element 129, neither of which has been modernized. In this case, it is necessary to develop a shell and an adapter for the 121 program element. The shell maintains the external interfaces of the 121 program element. The adapter accepts requests from the 121 shell and invokes methods in the modernized components to implement this functionality. Results can then be returned to the 121 shell, which uses this data to satisfy its external requirements before calling program element 129.

Figure 13-5. Code migration by program element sets
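To make the shell-and-adapter arrangement in Figure 13-5 concrete, the following sketch shows how it might look in code. All names (LegacyRecord, ModernOrderComponent, and the placeholder business logic) are hypothetical; the point is that Shell121 preserves the interface that elements 345 and 129 depend on while the adapter routes the actual work to modernized components.

```java
// A minimal sketch of the shell-and-adapter pattern; names are hypothetical.

/** Data in the format that legacy elements 345 and 129 expect. */
record LegacyRecord(String customerId, double amount) {}

/** A modernized component that reimplements 121's functionality. */
class ModernOrderComponent {
    double computeTotal(String customerId, double amount) {
        return amount * 1.05; // placeholder for the reimplemented logic
    }
}

/** Adapter: translates requests from the shell into modern method calls. */
class Adapter121 {
    private final ModernOrderComponent component = new ModernOrderComponent();

    LegacyRecord process(LegacyRecord in) {
        double total = component.computeTotal(in.customerId(), in.amount());
        return new LegacyRecord(in.customerId(), total);
    }
}

/** Shell: preserves 121's external interface so that 345 can still call it. */
class Shell121 {
    private final Adapter121 adapter = new Adapter121();

    LegacyRecord execute(LegacyRecord in) {
        LegacyRecord out = adapter.process(in);
        // ...then invoke legacy element 129 with 'out', as 121 always did.
        return out;
    }
}
```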

In the remainder of this section, we consider two approaches for selecting program elements for migration in a given increment. The first approach is to select program elements based on user transactions; the second is to select program elements implementing related functionality.

User Transactions

User transactions (user or external system requests that result in the execution of a series of program elements) can be used to identify functionality to be migrated to the modern system. As each user transaction is modernized, it is turned off in the legacy system and redirected to the modern system. A top-level GUI/routing program determines whether user transactions should be invoked in the modern system or the legacy system.
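The routing program can be as simple as a dispatch keyed by transaction code. The following sketch (hypothetical names and transaction codes) illustrates the idea: as each transaction is migrated, its code is added to the migrated set.

```java
import java.util.Set;

class TransactionRouter {
    // Transactions already migrated and switched off in the legacy system.
    private static final Set<String> MIGRATED = Set.of("T010", "T055");

    void route(String transactionCode, String payload) {
        if (MIGRATED.contains(transactionCode)) {
            invokeModernSystem(transactionCode, payload);  // modernized path
        } else {
            invokeLegacySystem(transactionCode, payload);  // legacy path
        }
    }

    void invokeModernSystem(String code, String payload) { /* ... */ }
    void invokeLegacySystem(String code, String payload) { /* ... */ }
}
```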

User transactions can also be used as input to both static and dynamic analysis techniques. Trace programs can identify program elements invoked during the execution of a given user transaction. However, tracing program execution does not guarantee that all the program elements that may be invoked during execution of the user transaction have been identified, particularly with respect to exception handling. Not only must these program elements be identified, but any reachable program elements must also be migrated. Reachable program elements are those that can be called, either directly or indirectly, by a program element in the user transaction.
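Computing the reachable set is a transitive-closure problem over the call graph. The following sketch assumes a call graph has already been extracted, for example by static analysis, and is represented as an adjacency map; the representation and names are hypothetical.

```java
import java.util.*;

class ReachabilityAnalysis {
    // Returns every program element reachable, directly or indirectly,
    // from the elements that participate in a user transaction.
    static Set<String> reachable(Map<String, List<String>> callGraph,
                                 Collection<String> transactionElements) {
        Set<String> visited = new HashSet<>(transactionElements);
        Deque<String> work = new ArrayDeque<>(transactionElements);
        while (!work.isEmpty()) {
            String element = work.pop();
            for (String callee : callGraph.getOrDefault(element, List.of())) {
                if (visited.add(callee)) {
                    work.push(callee); // follow indirect calls as well
                }
            }
        }
        return visited;
    }
}
```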

Migrating program elements based on user transaction sets has the advantage that complete use cases can be migrated and executed on the modern platform. The disadvantage is that it can force the migration of large amounts of functionality in one increment. If suitably sized increments cannot be found, it will be impossible to build the system within the incremental development and deployment paradigm. Also, migrating large amounts of functionality in a single increment, particularly an early increment, reduces the opportunity to refine the target architecture based on lessons learned.

To successfully migrate the code based on user transactions, most of the transactions must be fully executable within a small set of program elements. If the majority of transactions require a large number of program elements, it will be impossible to find suitably sized increments.

Related Functionality

Program elements that implement related functionality can also be selected for migration in the same increment. In theory, these program elements demonstrate greater cohesion than those that execute as part of user transactions. Groups of functionally related program elements should correspond more directly with business objects in the target architecture, making it easier to complete business objects in each increment. Also, working in a related functional area makes it easier for developers to understand the requirements for that area and to develop appropriate designs. The major problem with this approach is identifying program elements that implement related functionality. Some methods for accomplishing this are described in Chapter 15.

Data Migration

The second question to consider when selecting a componentization strategy is when to modernize the database and migrate the data. Like other aspects of the legacy system, the database schema has evolved over time and not necessarily in an optimal fashion. One of the goals of the componentization effort is to improve the representation of data in the database. This, in turn, will eliminate redundancy, improve performance, reduce storage requirements, and reduce the potential for database anomalies.

In general, there are no guarantees about the mapping between the legacy and modernized databases. Some existing database tables may be split up; others may be grouped. New database tables will be created and existing tables eliminated. This may result in a complex relationship between database fields in the legacy and modernized systems.

There are three options for data migration: before, during, and after the code migration. Regardless of which strategy is adopted, when migrating from a network or hierarchical model to a relational model, the database will most likely pass through a series of states, as shown in Figure 13-6. In RSS, for example, the data is initially stored in a network database (DMS). The first step is to migrate this data to an equivalent relational form. This translation requires modifying the structure of the data to compensate for the differences between the relational and network database models. The next step is to replace the database schema that reflects the structure of the legacy tables with a modernized database schema. Eventually, the entire database will be migrated to the modernized structure, as shown at the right in Figure 13-6.

Figure 13-6. Database migration
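As an illustration of the second step, the following JDBC sketch reshapes data from a table that mirrors the legacy record layout into modernized tables. The schema (LEGACY_CUST, CUSTOMER, ADDRESS) is hypothetical, not taken from RSS.

```java
import java.sql.*;

class SchemaMigration {
    // Copies rows from the legacy-equivalent table into the modernized
    // schema, splitting one legacy record across two normalized tables.
    static void migrateCustomers(Connection db) throws SQLException {
        try (Statement stmt = db.createStatement()) {
            stmt.executeUpdate(
                "INSERT INTO CUSTOMER (ID, NAME) " +
                "SELECT CUST_NO, CUST_NAME FROM LEGACY_CUST");
            stmt.executeUpdate(
                "INSERT INTO ADDRESS (CUSTOMER_ID, LINE1, CITY) " +
                "SELECT CUST_NO, ADDR_LINE, ADDR_CITY FROM LEGACY_CUST");
        }
    }
}
```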

Data Migration before Code Migration

Migrating the data before the code has several advantages. It certainly simplifies the code migration. Modernized code can be developed to the target data architecture and does not need to be mapped to legacy data elements. Migrating the data first is also a focused effort with a single goal, so it is easier to accomplish. Finally, migrating the data first reduces the risk of retaining the legacy architecture, because the code migration will be based on the new database schema.

Unfortunately, this approach also has disadvantages. Migrating the data requires restructuring the legacy system to accommodate the modified tables or providing a reverse data adapter from the legacy to the modern database tables. This is a major concern if the legacy system is being modernized because of its lack of maintainability. In this case, restructuring the legacy code to support the new database schema can be extremely risky.

Another disadvantage is that migrating the data and restructuring the legacy code will consume considerable time and resources. As a result, this approach should be attempted only if the target database schema is well understood and architecturally sound. Achieving that understanding is extremely difficult when dealing with a large, complex system that is being incrementally developed and deployed. Most developers involved in such efforts accept as a truism that an optimal database schema cannot be designed without understanding the business logic it models.

Because a large up-front investment must be made to migrate the data, there is not much latitude for further refinement of the database schema. This often means that the project must choose between living with the initial assessment or overrunning budgets and schedules. Changing the database schema downstream requires changing the legacy system again and restructuring the modernized code. In general, this high-risk approach depends largely on "getting it right the first time." If you do not have a high degree of confidence in your understanding of the data requirements for the modernized system, this may not be the best approach.

Data Migration during Code Migration

Perhaps the most direct and obvious approach is to migrate the data and code at the same time. Theoretically, this is the least expensive approach because it requires minimal rework. Unfortunately, migrating both the code and the data simultaneously expands the focus of each increment and increases the complexity of the effort. It is particularly difficult when data elements or logic cannot be easily untangled from the legacy system. This approach can quickly degrade into a big-bang deployment.

There are several techniques for combining data migration and code migration. One technique is to identify several database tables to be migrated. By starting with a small number of isolated tables, it may be possible to identify and migrate the program elements accessing these tables in a single increment while minimizing the amount of code that must be migrated. However, it is likely that these program elements will continue to reference the remaining legacy database tables, requiring the development of data adapters.

A second technique is to create new database tables in the modernized system and to use data adapters to maintain the data in a consistent state. To fully synchronize these databases, data adapters must be maintained in both directions. These data adapters are often difficult to develop and maintain. In addition to simply maintaining data consistency, the order in which data elements are updated may be critical to the proper operation of the system. It may also require significant knowledge of the business logic simply to get the data adapters to function properly.
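The following sketch shows the forward direction of such a data adapter; full synchronization would require a symmetric adapter propagating modern-system updates back to the legacy tables. The names and schema are hypothetical.

```java
import java.sql.*;

class CustomerDataAdapter {
    private final Connection modernDb;

    CustomerDataAdapter(Connection modernDb) {
        this.modernDb = modernDb;
    }

    // Invoked whenever the legacy system updates a customer record,
    // keeping the modernized table in a consistent state.
    void onLegacyUpdate(String custNo, String name) throws SQLException {
        try (PreparedStatement ps = modernDb.prepareStatement(
                "UPDATE CUSTOMER SET NAME = ? WHERE ID = ?")) {
            ps.setString(1, name);
            ps.setString(2, custNo);
            ps.executeUpdate(); // update ordering can matter; see text
        }
    }
}
```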

Although both of these techniques are feasible, they introduce significant complexity, making this a high-risk approach. The major problem with data migration during code migration is data replication and synchronization, because transactional integrity and recovery become issues. If both databases are updated in a distributed transaction, there may be a requirement to ensure that they stay strictly in step. Synchronization may require a two-phase commit, achieved through compensating transactions or supported by other means, as explained in Chapter 8. In any case, it is a problem that must be addressed, especially if the modernized and legacy systems are connected using MQSeries, because two-phase commit cannot then be used.

Data Migration after Code Migration

Migrating the data after the code has some interesting advantages. For example, it provides additional time to refine the database schema. This approach requires constructing modernized components against the legacy database schema, which is possible using a persistence layer to map component state data to the persistent store. The modernized logic uses only component/object interfaces to access data elements, which is good software engineering practice in any case. Reports that directly access the database structure, implemented using the report pattern described in Section 12.4, are a special case: they can be migrated after the database or go through a mapping layer as well.

Isolating dependencies on the legacy database to the persistence layer can simplify migrating the data after the code. However, code in the persistence layer will still require modification. This effort will involve replacing fairly complex code that maps state data to fields in one or more legacy database tables with calls that map component state data directly to modern database tables. The mapping between component state data and the database schema can be straightforward because the database schema can be designed to mirror the state data.
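One way to isolate these dependencies is to define the persistence layer as an interface, so that only its implementation changes when the data is migrated. The sketch below uses hypothetical names; the method bodies are stubs standing in for the actual mapping code.

```java
interface CustomerPersistence {
    CustomerState load(String id);
    void store(CustomerState state);
}

record CustomerState(String id, String name) {}

/** Maps component state onto the legacy tables (often several per object). */
class LegacyCustomerPersistence implements CustomerPersistence {
    public CustomerState load(String id) {
        // ...complex joins across legacy tables go here...
        throw new UnsupportedOperationException("sketch only");
    }
    public void store(CustomerState state) {
        // ...state is split across one or more legacy tables...
    }
}

/** Drop-in replacement once the modernized schema is deployed. */
class ModernCustomerPersistence implements CustomerPersistence {
    public CustomerState load(String id) {
        // ...the schema mirrors the state data, so the mapping is direct...
        throw new UnsupportedOperationException("sketch only");
    }
    public void store(CustomerState state) {
        // ...direct insert/update against the modernized tables...
    }
}
```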

The persistence code that maps to the legacy database structure may be slow because it must emulate the modern data structure using the legacy system data structure. The good news is that performance will improve when the data migration is completed and the mapping layer removed.

Deployment Strategy

Every time new functionality is deployed to the field, there is an operational risk that the system, including both modernized and legacy components, will not function properly. Deploying each increment in parallel with the modified operational legacy system can mitigate these risks. Alternatively, these risks may be judged acceptable when weighed against the additional costs and development risks of parallel operations; in that case, each release is deployed directly to the field as the operational system. These options are analyzed in the following sections.

Parallel Operations

Operational risk can be reduced by running the previous version of the system in parallel with the current release, as shown in Figure 13-7. In this approach, the modernized system is put into operation, but the legacy system is maintained as a "hot" backup. If the new system fails to function properly, control can be switched over to the legacy system. This solution provides a fallback capability that allows on-line verification and testing of the new increment.

Figure 13-7. Parallel operations
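A minimal sketch of the switchover logic, with hypothetical system calls and a deliberately simple health check, might look like this:

```java
class OperationsController {
    private volatile boolean modernHealthy = true;

    void submit(String transactionCode, String payload) {
        if (modernHealthy) {
            try {
                modernSystem(transactionCode, payload);
                return;
            } catch (RuntimeException failure) {
                modernHealthy = false; // switch over to the hot backup
            }
        }
        legacySystem(transactionCode, payload); // legacy system as fallback
    }

    void modernSystem(String code, String payload) { /* ... */ }
    void legacySystem(String code, String payload) { /* ... */ }
}
```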

Parallel operations provide additional benefits. Users of the system are able to perform a side-by-side comparison of the user interfaces of both the modern and legacy systems. This may help the user learn the new interface and identify places where it is deficient. Parallel operations can also aid in system verification by allowing users to enter similar or identical operations in both the modern and legacy systems to make sure that the results are the same or, in a defensible way, different.

For this to be feasible, the legacy system must have access to the latest data. Providing this access can be problematic because the format and structure of the database tables may have changed between incremental deployments, depending on the data migration strategy. This situation requires synchronization of the database, using one of the data replication techniques described earlier in this chapter.

Deploying in parallel can reduce operational risks, but care must be taken not to corrupt the legacy system while wiring the two systems together. Introducing complex trigger mechanisms, for example, could easily corrupt the legacy system. In general, changes to the legacy system should be minimal and nonpervasive. Another concern is that invoking procedures to synchronize multiple database tables after each update can affect performance. After the modernized system has been deployed, used, and validated, the legacy system and modernized system can be decoupled and the modernized system can run independently.

Although parallel operation can reduce operational risk, it can also increase development risk, degrade performance, and significantly increase maintenance costs. Difficulties may arise in data synchronization and locking between the modern and legacy systems. This can further increase development costs and affect the schedule.

When deploying in parallel, each incremental system release is deployed alongside the legacy system. Once the final release has been verified, the backup system can be stood down. This has several implications for the overall life cycle of the system. First, it will be necessary to maintain two separate databases from the first incremental deployment until the backup system is stood down, increasing maintenance and support costs over the life of the project. Also, the code and database changes introduced to support parallel operations will need to be removed from the completed system. Parallel operations make sense when system availability is critical and the risks associated with this approach are negligible or easily mitigated.

Non-Parallel Operation

Another strategy is to deploy each increment as part of the operational system. In this approach, the deployed system consists of both modernized and legacy components. Nonparallel deployment typically reduces cost and development time and requires users to adopt the new system immediately, potentially increasing acceptance, without injecting the additional technical and software development risks of parallel operations.

The major disadvantage to this strategy is that there is no fallback mechanism in the event of a system failure. Therefore, you must have complete confidence in the system before deploying it as the operational system.


