Implementing the Migration to the Solaris Operating System


The next phase of a migration project involves implementation tasks. In this section, we examine the tasks involved in migrating an HP/UX solution to the Solaris OS.

Applying Migration Techniques

Migration involves the movement of business logic (for example, code), business data, configuration data, and metadata from the source environment to the target environment.

In this case study, the various objects to be migrated were categorized and a strategy for moving them was developed. Because we were certain that the source system worked satisfactorily, the initial strategy was to copy all the objects from the source environment to the target environment. However, during prototyping, we discovered that the database volume map needed to be recreated and that it would be difficult to port certain configurations and metadata. Because it was going to be more difficult to adopt a "copy everything" strategy than we'd expected, we decided to use the installation scripts provided by GEAC and Sybase to create a container in which the remaining data and executables were installed.

The decision to implement a new disk map was made for the following reasons:

  • The target system had a different disk architecture from that of the source system.

  • The source system disk map had only two virtual disks, which made implementing database recovery difficult.

Note

Database recovery requires that database files, the write-ahead log, and the offline images of these objects be held on different disks. Databases with split journals and before-image logs (such as Oracle) might require an additional disk. Database recovery is designed to protect work against a lost disk, so the write-ahead log must be on a different disk from the database or it will be unavailable if the database disks become unavailable. Offline copies should be held on other disks so that they are available if the write-ahead log becomes unavailable.
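
For illustration only (the device names, paths, and sizes below are invented, not the project's layout), a Sybase server with the data device, write-ahead log device, and dump area on separate volumes might be created as follows:

isql -Usa -P${SA_PASS} -S${TARGET_SERVER} <<'EOF'
-- Data and write-ahead log on separate physical volumes (illustrative names)
disk init name = "data_dev1",
    physname = "/dev/rdsk/c1t0d0s6", vdevno = 2, size = 512000
go
disk init name = "log_dev1",
    physname = "/dev/rdsk/c2t0d0s6", vdevno = 3, size = 128000
go
create database appdb
    on data_dev1 = 1000
    log on log_dev1 = 250
go
-- Offline copies (dumps) would be written to a third volume, for example /dumps
EOF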


The target system had significantly more disk volumes than were available on the source system. The disk map redesign also enabled a simplification of the database's internal object placement design.

Both Sun and the customer agreed that the rehosting strategy would be applied to GEAC, Sybase, and the Sybase Client reporting tool used. The data migration process would be a logical copy. It was not possible to undertake a physical copy of the source system; Sybase implements certain features of its system differently on the Solaris OS and HP/UX. The methods used to extract the data for the procurement due diligence were based on logical copy technology for this reason.

Note

Print-report logic and the output formats were defined in another third-party product, originally sold by Sybase, which remained available. The customer obtained a Solaris license for this product. This product and an ISV portability guarantee permitted the interface between the Print Job Definitions and Print Jobs commands to be defined as a third-party protocol and obviated the need to rewrite the report programs.


Solutions for the individual data item types needed to be developed.

The team undertook a volatility analysis of the data objects. The database schema definitions were very stable. The majority of the databases were part of the ISV package (SmartStream); therefore, the schemas were very stable, typically only changing when software updates were applied. At the other end of the spectrum, the online transaction processing (OLTP) data changed minute by minute. Between both ends of this spectrum were the customer proprietary schemas and user identity data.

Sybase data servers organize their catalog tables either in a configuration database called master or in the application databases. master holds server-wide data, whereas each application database holds database-specific configuration data as subsets of the data. Each database has its own local catalog tables, including private lists of user objects such as tables, indexes, views, and users. Each database has its own write-ahead log and is therefore a unit of recovery. Identifying Sybase RDBMS metadata is relatively simple. However, GEAC originally architected its distributed-systems solution to hold significant amounts of metadata either in specific application databases that were solely responsible for holding this data or in tables within application databases that had business-functional purposes.
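
For example, a database's local catalog tables can be queried to enumerate its user objects. The following sketch, with invented server, login, and database names, lists the user tables, views, procedures, and triggers held in one application database:

isql -Usa -P${SA_PASS} -S${SOURCE_SERVER} <<'EOF'
use appdb
go
-- U = user table, V = view, P = procedure, TR = trigger
select name, type, crdate
from   sysobjects
where  type in ('U', 'V', 'P', 'TR')
order  by type, name
go
EOF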

An audit of the available technology to copy data from the source to the target was undertaken. The procurement due diligence resulted in a suite of programs to extract critical business data and metadata from the customer's systems. On the general principle of trying to copy everything, we initially decided to extend the copy and upload scripts from the initial subset to all data objects. We planned to use Sybase's bulk copy program ( bcp ) to transfer the data (and certain other objects) and to use Sybase's defncopy utility program to migrate the triggers, views, and procedures. We decided to maximize the advantage of the Sybase and GEAC capability of running on both the source and target systems. A copy strategy also minimized how much the migration team needed to understand of the GEAC schemas, because the schemas were the same on both systems.

However, Sybase's instance-to-instance copy facilities were limited and needed to be augmented. Sybase had logical copy tools, such as bcp and defncopy, that could be used for tables and procedural objects. The database was the only larger object that could be made the argument of a copy program: the dump and load commands could be used on it. However, dump and load require certain configuration parameters to be common to the source and target servers. The default collating sequence for a Sybase server differs between HP/UX and the Solaris OS, and the team chose to implement the default on the target (Solaris) system for maintainability reasons, so the dump and load commands could not be used. An object-by-object copy policy needed to be defined.
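
For reference, the character set and sort order that each server uses can be confirmed with sp_helpsort, which is how such a mismatch shows up in practice. A minimal check, with invented server names and login details:

for server in HP_PROD SUN_PROD
do
    echo "=== ${server} ==="
    isql -Usa -P${SA_PASS} -S${server} <<'EOF'
exec sp_helpsort
go
EOF
done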

The customer had access to two schema extractors. These extractors were part of two computer-aided software engineering (CASE) tools and could act as reverse engineering tools. Although we relied on the rehosting strategy, weaknesses in the environment's capabilities meant that it had to be supplemented with reverse engineering and source code porting techniques.

We chose a schema extractor primarily due to the availability of certain skills and because of current licensing commitments (the customer had two CASE tools licensed and preferred one over the other).

The transfer strategies are summarized in the following table.

Table 11-1. Application of Migration Techniques to Objects

Object Type                           Transfer Tool                                       Transfer Technique
------------------------------------  --------------------------------------------------  ----------------------------------------------------
Executables                           Reinstall                                           Rehost
Table definition                      Schema extractor                                    Reverse engineer
Table data                            Sybase bcp utility                                  Rehost
Index definition                      Schema extractor                                    Reverse engineer
syslogins table data                  Sybase bcp utility                                  Rehost
sysusers table data                   Schema extractor                                    Reverse engineer
User datatypes, rules, and defaults   Schema extractor, supplemented with bespoke DDL     Reverse engineer, supplemented with source code port
Views                                 Sybase defncopy utility                             Rehost
Procedures                            Sybase defncopy utility, GEAC source files, or      Rehost, supplemented with source code port
                                      bespoke DDL files
Permissions                           Schema extractor                                    Reverse engineer

Although a table's definition and its contents are two separate objects in the component map, an index is fully described by its definition: a piece of SQL held as text in a catalog table. In the case of Sybase, an index is either clustered or nonclustered. The definition can be run at any time, but running it builds the index contents, which, in the case of a clustered index, involves sorting the table. For this reason, a create index statement can have significant runtime implications, but it obviates the need to copy index contents.
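
In other words, only the definition needs to travel; the contents are rebuilt by rerunning it on the target. A sketch with hypothetical table, column, and segment names:

isql -Usa -P${SA_PASS} -S${TARGET_SERVER} <<'EOF'
use appdb
go
-- Rebuilding a clustered index sorts the table, so schedule it accordingly
create clustered index ledger_pk
    on ledger (account_no, period)
    with fillfactor = 80
    on data_seg
go
EOF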

The disk map redesign made obvious the need to recreate the sysdevices table, which maps the RDBMS's name for a disk volume to the OS name. It also revealed the need to port any named segments. The role of a segment is to provide a location name to which a table or index can be bound. This allows DBAs to manage the location of objects, permitting, for example, a B-Tree index or a log object to be located on specific disks different from those of their bound table or database. The creation of the sysdevices table and the minimum necessary segments was undertaken when the RDBMS instances and their component databases were installed.
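
A sketch of the corresponding commands is shown below; the device, segment, database, and object names are illustrative rather than those used at the site:

isql -Usa -P${SA_PASS} -S${TARGET_SERVER} <<'EOF'
-- List the device-to-volume mappings recorded in sysdevices
exec sp_helpdevice
go
use appdb
go
-- Create a named segment on a device so that objects can be bound to it
exec sp_addsegment 'index_seg', 'appdb', 'index_dev1'
go
-- Bind an existing index to the segment
exec sp_placeobject 'index_seg', 'ledger.ledger_pk'
go
EOF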

The stored procedures were copied by one of two methods. Procedures that were also implemented as text objects in the catalog tables were copied with defncopy . In cases in which there were unresolved external references, the original source code files for the data definition language (DDL) were inspected and rerun unchanged or were massaged. The original author was either the COTS vendor or the customer.

The print management solution was the outstanding piece of code that had yet to be ported. The print queues were reallocated to the file servers, but the print management component had been implemented in UNIX shell script. There was a very limited amount of this code, and rather than install Sun's standard tools and test harnesses, we ported the code manually by inspection and iterative testing. This means that the replacement strategy was applied to the queue management function, and that a source code porting technique was applied to the runtime management functions.

TABLE 11-1 shows how the basic strategy of utilizing the COTS vendor's guarantee of platform independence and a consistent API across platforms was supplemented. The initial strategy of copying everything was amended for the following reasons:

  • A manifest quality improvement was achieved by redesigning the disk map.

  • The cost of isolating the physical changes that were required and applying separate strategies was too high.

  • The time required to transition with this strategy demonstrated a need for a more incremental approach.

Let us consider further the ease of creating a component/technique map. In the real circumstances of the case study, the axiomatic properties of an RDBMS ease the mapping of object instances to object classes. The descoping of all the client logic also eased the migration task significantly. In many cases, allocating an object to a class of data is not easy. This problem is eased by the facts that a technique can be reused and that multiple techniques can be applied to an object type, as illustrated in TABLE 11-1. With RDBMS systems, it can be very difficult to define which objects sit in which category. It should nevertheless be easier for database objects than where a significant amount of 3GL code exists, because the types of objects available within 3GLs are far more restricted, and active dictionaries in which metadata is held are less frequently implemented. In the RDBMS case above, the metadata and many runtime objects are held in the data dictionary, alternatively called the database catalog; 3GL systems more frequently build and integrate their metadata within the executables. In this case study, one class of executable database object was the Sybase stored procedure. These were easy to isolate, although some required significant massage to port because they had unresolved external references. If a stored procedure uses a temporary object, that is, one created and destroyed by the procedure itself, the CREATE PROCEDURE command will fail unless the object exists at the time the command is issued, even though the creation commands exist within the procedure.
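
One work-around, sketched below with invented names, is to create (and afterward drop) the temporary object before issuing CREATE PROCEDURE:

isql -Usa -P${SA_PASS} -S${TARGET_SERVER} <<'EOF'
use appdb
go
-- #work_set must exist when CREATE PROCEDURE is parsed, even though the
-- procedure creates its own copy at run time
create table #work_set (account_no int, balance money)
go
create procedure roll_balances as
    create table #work_set (account_no int, balance money)
    insert #work_set select account_no, balance from ledger
    /* ... further processing ... */
go
drop table #work_set
go
EOF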

An example of tightly integrated business logic and execution logic follows. Cobol 88-level definitions are business logic objects, yet they are embedded in the executable, or at least in the source code lines.

77  BOOL-TEST-1                 PIC 9  VALUE 0.
    88  TEST-1-TRUE             VALUE 1.
    88  TEST-1-FALSE            VALUE 0.

    IF <complex condition true>
        MOVE 1 TO BOOL-TEST-1.
    IF <complex condition false>
        MOVE 0 TO BOOL-TEST-1.
    IF TEST-1-TRUE
        PERFORM TRUE-CONDITIONS
    ELSE
        PERFORM FALSE-CONDITIONS.

In this example, the data division entries state that a binary test condition exists (that is, it can be true or false) and that the variable acts as a flag. The next two procedural statements evaluate the condition and encapsulate a business rule; the final statement executes the business transaction logic. To simplify the example, we've used the PERFORM statement to invoke a section that undertakes this work. The business rules, business logic, and business data types are all distributed throughout the source code lines. Extracting these three object classes as individual items from a source code file is hard: there are no clear rules for distinguishing among the object classes, and identifying the elements is equally difficult. Fortunately, the richer data types available in an RDBMS solution allow application designers to isolate business rules and transaction logic from implementation detail.

This case study shows how iteration and prototyping were applied to various objects, object types, and classes. For example, an index can contain both business logic (a uniqueness constraint) and implementation factors such as a fill factor. The value of or the necessity for iteration and prototyping might depend on physical design and implementation details of the RDBMS. Other examples of implementation details that are encapsulated in the index include the location where the index was built and the sort order of a clustered index.

In the case study documented here, the RDBMS was implemented with a cost-based query optimizer and predated the implementation of query hints. For this reason, the index is used as an example, but where rule-based optimizers are implemented, performance-critical transactions need to be tested to ensure that the query plan resolution remains optimal. Query plans are usually calculated at runtime, so prototyping and preimplementation testing might be required because rule-based optimizers can require code changes to permit the optimizer to choose the optimal query plan. This is particularly important when RDBMS version upgrades are undertaken.
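
Sybase exposes the chosen query plan through session settings, which makes such preimplementation checks straightforward. A minimal sketch with a hypothetical query and illustrative login details:

isql -U${APP_USER} -P${APP_PASS} -S${TARGET_SERVER} <<'EOF'
set showplan on
go
set noexec on
go
-- Confirm that the expected index is still chosen on the target server
select account_no, sum(amount)
from   ledger
where  period = '2003Q1'
group  by account_no
go
EOF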

TABLE 11-1 shows the application of migration strategies and techniques to specific database and execution objects. These strategies and techniques were developed during the architecture stage and were refined during the implementation phase while the software to implement the migration was developed.

With an RDBMS, business logic can be held in database objects (such as views, indexes, constraints, triggers, and procedures) or in client-side objects. The definition of scope is critical in determining the techniques used to migrate business logic. As described above, business-object logic can be in scope or out of scope, and its scope status can depend on several factors. The business logic can be redundant, as is the case when the retirement/replacement technique is used, or it can be encapsulated in a part of the environment that exists in both the source and target environments. In this case study, the client-side PowerBuilder procedures were executed in both the source and target environments; the business and presentation logic encapsulated in these procedures and objects therefore remained the same in both environments and was defined as out of scope.

The Print Job component was defined as being within the scope of the project because print queue management had been undertaken by the HP/UX jobs. Actual queue management was moved into the LAN, and the organization's LAN servers were configured to hold the print queues. If the job were undertaken today, it is likely that the printers would manage their own queues, depending on output management requirements such as restarting and reprinting. Scheduling print jobs against the databases was managed by a series of shell scripts that ran the third-party report generators. The report logic was held in ASCII files containing SQL script definitions that were invoked by the shell scripts. The following example illustrates shell syntax that allows a script to run on either an HP/UX system or a Solaris system, and thus supports backward compatibility. Furthermore, it has the advantage of indirectly referencing the UNIX utility in the example (in this case, cpio ). In this case study, indirection was implemented but backward compatibility was not.

#!/bin/ksh
OS=`uname | ${cutpath}/cut -f1 -d' '`
case $OS in
HP-UX)   OS_PATH_LIST=${HPUX_PATH_LIST};;
SunOS)   OS_PATH_LIST=${SOLARIS_PATH_LIST};;
*)       exit 1;;
esac
# Original line:
# PATH=${HPUX_PATH_LIST}
PATH=${OS_PATH_LIST}
CPIO=`whence cpio`
.
.
$CPIO ${CPIO_FLAGS}

Metadata is data that describes data. In the case study, this was absolutely critical because the key migration object was an RDBMS that possesses an active data dictionary. Three strategies were applied to the metadata:

  • Utilizing the installation processes provided by the COTS and DBMS vendors

  • Copying metadata objects from the source environment to the target environment with tools based on the appropriate migration technique

  • Applying reverse engineering techniques

Financial considerations were the primary influence over the decision of which strategy to apply. There are several factors in calculating strategy costs. These include the following:

  • The cost of identifying objects. In the case study, a number of objects could not be identified and the installation processes were utilized.

  • The cost of applying the strategies. SQL-BackTrack was rejected because of cost.

  • The runtime cost implications of the strategy.

Not all metadata is held in obvious metadata objects. In the case of the COTS product under consideration, metadata was held in the RDBMS catalog tables, in user tables defined by the COTS vendor, and in index definitions. One additional piece of metadata was the representation of the system namespace within objects that are available to the application. GEAC SmartStream used multiple databases within a database server and used Sybase remote procedure calls to implement inter-database transactions. This permitted the deployment of a SmartStream implementation across any number of server instances. One of the advantages of this implementation feature is that different application components can be deployed in separate servers on separate hosts. The development of blade technology gives this architecture a new lease on life. This technology requires each server and stored procedure to know the location of each database within a server. Sybase implements a name service based on flat files, mapping a server name to a TCP/IP address/port location. In addition, when a remote stored procedure semantic is implemented, this name service must be placed within a security model implemented in the catalog tables. In the case study, most of the COTS metadata was applied to the target when the installation scripts were run.

The examination of the name service in the context of its security model brought us to the security data itself. Within Sybase, both authentication and privilege management functionality is implemented. Privilege management is part of the SQL standard. The Permissions row in TABLE 11-1 represents the implementation of each object's execution, read, insert, update, delete, create, and destroy privileges. The mapping of a user's identity to a privilege set is a business issue based on roles within the business. At this customer site, the authentication data was treated as data, not as metadata; therefore, it was copied across. This is represented by the syslogins row in TABLE 11-1. Sybase also implements an alias for each login within each database, which at the time presented the team with a referential integrity issue between the database user alias and the login. The aliases were migrated with reverse engineering techniques, with manual inspection and adjustment as the remediation techniques when reverse engineering failed.
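
A sketch of how the two halves might be handled is shown below; server names, logins, and flags are illustrative, and loading rows into master..syslogins also requires the 'allow updates to system tables' option on the target:

# Copy the authentication rows with bcp ...
bcp master..syslogins out syslogins.dat -c -Usa -P${SA_PASS} -S${SOURCE_SERVER}
bcp master..syslogins in  syslogins.dat -c -Usa -P${SA_PASS} -S${TARGET_SERVER}

# ... then re-create the per-database aliases recovered by reverse engineering
isql -Usa -P${SA_PASS} -S${TARGET_SERVER} <<'EOF'
use appdb
go
exec sp_addalias 'jsmith', 'dbo'
go
EOF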

Namespace Migration

In the case study, there were three namespace problems:

  • Database object namespace. Objects within the data servers (except databases). This includes the need to migrate or transition the server names and address maps.

  • Applications component namespace.

  • System component namespace.

Different techniques were used to migrate the namespace implementations to the target environment.

We utilized the application's installation procedures to preserve the database server's internal namespace. This meant that the proprietary extensions to GEAC SmartStream deployed by the customer also needed to be ported and the object namespace preserved. The mechanism used to preserve the object namespace is documented below. It utilized the file system by writing the object definitions to files with the object name in the file system name.

The target data servers were given new names. This was required because the servers had separate TCP/IP addresses and both needed to be on the network at the same time. This policy conformed to the strategies adopted and aided transition because the customer had a good policy for managing and distributing server names. The server name and address file, ${SYBASE}/interfaces , was held on a file server and read by each of the user client systems. This arrangement also allowed the default data server to be configured by the LAN administrator.
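
For illustration, a generic interfaces entry has the following shape (server names, host names, and ports are invented; on the Solaris OS, Sybase typically writes these entries in TLI format with encoded addresses rather than the readable form shown here):

SUN_PROD
        master tcp ether sunhost01 4100
        query tcp ether sunhost01 4100

SUN_DEV
        master tcp ether sunhost02 4100
        query tcp ether sunhost02 4100

Client systems then typically select their default data server through the DSQUERY environment variable or the equivalent client configuration, which is how the LAN administrator could set the default centrally.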

The application's component namespace was managed as defined by GEAC, and existing documentation explained how to transition the namespace from HP/UX to the Solaris OS. This transition involved manually updating rows in three tables. The customer had previously moved systems and had scripts we could use to update the system names. Only one of the rows involved specifying the target OS. The system namespace was implemented in bind.

Data Migration

The use of the supplementary techniques was mainly confined to nondata objects. In the case study, data was defined as only the content of database tables that contained business data. Previous sections discussed the techniques used to identify metadata, configuration data, and security data; what's left is the business data. We had two choices for copying the data:

  • Logical copies

  • Physical copies

The need to apply data transformation to the source data is one of the primary influences on the decision of what technique to use for copying data. In this case study, copying data from the legacy mainframe required the application of transformation techniques. With one exception, moving the business data from the source environment to the target environment did not require transformational work. This means that a logical copy was simple and that a physical copy was possible. At this site, a physical copy was not possible because of implementation differences in the RDBMS on HP/UX and Solaris systems. Therefore, a logical copy was the only option. In the case of Sybase, this suggests the bcp program; in the case of Oracle, it would imply the use of the export and import commands.

In all cases, object namespace preservation and mapping is required. This means that because we were using different techniques in the case of Sybase to copy the table definitions and table contents, the planners needed to map the target DDL file, table name, and table contents file. (This would not be the case with Oracle's import/export, but would be if SQL/ODL were used.) This issue was resolved by use of the UNIX file system to preserve the table namespace between systems, as shown here.

mkdir ${database_name}; cd ${database_name}
for table_name in ${table_name_list}
do
    mkdir ${table_name}; cd ${table_name}
    extract_ddl ${table_name} > ${table_name}.ddl
    bcp ${database_name}..${table_name} out \
        ${table_name}.data ${bcp_flags}
    cd ..
done

In this case, extract_ddl is a script or function that performs the table DDL extraction so that ${table_name}.ddl contains the table DDL code. The queried object might be the database, or it might be a flat file that contains the complete RDBMS-instance DDL, prepared by the selected schema extractor. The following example code can also be used to preserve objects transferred by defncopy .

mkdir ${database_name}; cd ${database_name}
mkdir views; cd views
for view_name in ${view_name_list}
do
    defncopy ${defncopy_flags} out ${view_name}.ddl \
        ${database_name} ${view_name}
done

In both cases, input scripts can be driven by parsing the directories for *.ddl files.

In the case study site, migration harnesses were built to parse the database catalogs to extract the ddl and data files, and the input scripts parsed the UNIX file system to drive the database inputs. The input scripts also used symmetrization techniques to leverage the power of the SMP platforms proposed for the target implementation. Each job stream uploaded a quarter of the database bound to a single CPU, and the jobs ran concurrently.
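
A sketch of one such upload stream is shown below; the paths, logins, and the modulo-based split are invented for illustration, and the real harness was also driven by lists prepared by the team. Four streams were started concurrently and could be bound to processors:

#!/bin/ksh
# load_stream.ksh <stream-number 1..4>: loads every fourth table found
# under the extract directory tree created by the export scripts above
stream=$1
cd ${EXTRACT_DIR}/${database_name}
for ddl_file in $(ls */*.ddl | awk "NR % 4 == ${stream} - 1")
do
    table_name=$(basename ${ddl_file} .ddl)
    isql -U${DB_USER} -P${DB_PASS} -S${TARGET_SERVER} -i ${ddl_file}
    bcp ${database_name}..${table_name} in \
        ${table_name}/${table_name}.data -c \
        -U${DB_USER} -P${DB_PASS} -S${TARGET_SERVER}
done

# Example invocation, binding each stream to a processor (IDs illustrative):
#   for s in 1 2 3 4; do ./load_stream.ksh $s & pbind -b $s $!; done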

Specify the Implementation Platform

The procurement due diligence exercise led Sun and the customer to specify the hardware platforms. It was proposed that a system with three domains would support the production, QA, and MIS environments, and that a second system would support development and training and act as a business continuity system if the production machine became unavailable. This meant that the customer wanted both physical consolidation and workload-sharing consolidation benefits. These decisions allowed the customer to recover significant floor space by consolidating three environments onto a single system, and the shared solution delivered further floor-space savings.

One of the aims of this project was to reduce the number of system hosts at the customer site. The existing estate consisted of five HP/UX systems, and the goal was to reduce this number to two Sun systems. However, because there were five management environments, separate instances of the OS were required to allow differing and separate management policies to be implemented and enforced. At the time, an instance of the OS could have only one security model, and the business necessity of ring-fencing nonoperational users from production systems was, and still is, almost universal. The target platform design established during the customer's due diligence phase consisted of two Sun servers, only one of which was capable of hosting multiple OS instances. Both systems were SMP systems, and the smaller was designated to become the development and training system host. This involved implementing two application instances within a single instance of the Solaris OS, using an aggregation design pattern. The remaining instances of the application (production, QA, and MIS) were planned to be hosted within the domains of the multidomain system.

Specify the OE Tune State

We initiated a requirements-capture exercise. This exercise primarily involved collecting the constraints that the superstructure products such as the RDBMS placed on the /etc/system file tunables. The following were the two key tunables for the RDBMS:

  • SHMMAX . The maximum size of a contiguous shared memory segment. With the versions of Sybase proposed, 2 gigabytes was the maximum that could be used. More recent and current versions support Very Large Memory addressing, so it is now more appropriate to set SHMMAX to a high value. (An illustrative /etc/system fragment follows this list.)

    If an explicit value is set for SHMMAX , the system will require rebooting if the database administrator decides to increase the database buffer cache beyond the SHMMAX limit. Restarting the database server process causes a service outage for its users, and in a shared infrastructure solution, rebooting a system is undesirable because other customers might suffer a service outage for no benefit.

  • ISM . The Solaris default is intimate shared memory on, which is the advantageous performance configuration. This configuration option had implications for defining the swap partition size.
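
The following /etc/system fragment illustrates the shared memory settings discussed above; the values are purely illustrative and would be sized to the planned buffer cache:

* Shared memory tunables for the Sybase data server (illustrative values)
set shmsys:shminfo_shmmax=2147483648
set shmsys:shminfo_shmseg=32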

Prototyping during the test loads of the development and training instances was undertaken to see if the available processor-management tools were necessary or desirable to manage service level provisioning for the multiple communities proposed to use the shared second system. These management tools allowed the system administrator to provide rules to the dispatcher. It was discovered that the Solaris affinity algorithms did not need the help of the process management tools, and the final production configuration for this system did not use them.

Build a Migration Harness

The copy programs were encapsulated into a harness so that the migration team could undertake relevant jobs of work. These included "extract an instance," "load an index," and "rebuild indexes." They were supplemented by jobs to copy the various objects that were planned to be precopied. These latter programs could take an instance, database, or object as an argument so that objects could be copied incrementally. They were all driven by lists that were created by the developer team or developed by browsing the database catalogs. By creating programs to undertake this work, not only was human productivity enhanced, but the programs could be tested and trusted. This minimized the requirement for testing each process run; if a job reported success, the prior testing of the programs enhanced confidence that the job had been performed accurately. It made the process testable.
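
As an illustration of the shape of such a job, a hypothetical list-driven driver for "extract an instance" might look like the following (script and file names are invented):

#!/bin/ksh
# extract_instance.ksh <instance>: run the extract job for every database
# named in the instance's list file and record success or failure
instance=$1
while read database_name
do
    if ./extract_database.ksh ${instance} ${database_name} \
        > logs/${instance}.${database_name}.log 2>&1
    then
        echo "OK   ${database_name}"
    else
        echo "FAIL ${database_name}"
    fi
done < lists/${instance}.databases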

The transition process was principally tested by migrating the training and QA instances before the production instance of the application. This provided real timings for both the data extraction and the target index builds. It also meant that user training began on the target Solaris system several weeks before the migrated solution was to be placed into production. This allowed the training team, as well as the trainees, to comprehensively test the migrated application. This was advantageous because it ensured that trainees were introduced to every aspect of the system, and it had the added benefit of thoroughly testing the client-server interfaces.

Utilize Management Environments to Enhance Testing

The transition plan for this project included plans for testing outputs and for regression testing. The migration process was pretested, and checkpoint tests were inserted. In addition, checkpoints were designed into the plan to use backup solutions.

The migration team utilized nonproduction environments as part of the enterprise transition plan. The training and QA environments were ported in advance of the production instance, which improved the confidence the team had in the transition harness and the application of basic strategies. The migration of the training department allowed enhanced, comprehensive testing of the client APIs. The migration of the QA instance delivered confidence that the production performance tests would be achieved.

The development and MIS environments were ported after the production transition. The development environment was created by copying the training environment and then applying the developers' subsequent changes to the new development environment. This is a process that the customer had frequently undertaken and was satisfied with.

The MIS environment was created with the production mechanism, which was to use Sybase's block-level online dump and load. This gave us the advantage of testing that this process worked in the new environment. (The technology had been tested before the production transition.)
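
For illustration, the refresh amounts to dumping the production database and loading it into the MIS copy; the names and paths below are invented, and the target database must already exist and be at least as large as the source:

isql -Usa -P${SA_PASS} -S${TARGET_SERVER} <<'EOF'
dump database proddb to "/dumps/proddb.dmp"
go
load database misdb from "/dumps/proddb.dmp"
go
online database misdb
go
EOF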


