Migrating Data | Migrating to the Solaris Operating System: The Discipline of UNIX-to-UNIX Migrations

< Day Day Up >

With the applications taken care of, it is now time to turn your attention to handling the most delicate part of the migration implementation: data transfer or conversion. While most organizations don't realize it, data migration is one of the most important parts of the migration exercise. After all, what good are most applications on any platform if they don't have data to operate on? Depending on the application that you are using and your outage window, this activity can range in difficulty and effort from trivial to extremely laborious. On the trivial end of the scale, you might be simply copying files over the network or restoring them from magnetic tape. However, at the other end of the scale, you might use exotic networking technologies and complicated utilities requiring several intermediate steps before getting data onto the target platform in the correct format for your application. Whichever end of the spectrum your migration is located on, it is good to understand the range of tools and techniques that are available for moving data.

Transferring Data

Let's start by looking at the shallow end of the pool: data transfers. The goal here is to get data from the legacy server onto the new target environment. While the physical format of the data might change, the logical (application) format of the data will stay the same. The difficulty of transferring data is determined by three factors: size , transfer window, and rate of change. Looking at all these factors, you should be able to develop a strategy for getting data to your desired location.

The size of data is the most obvious factor in data transfers. Size, in itself, is not a problem, but size and a small transfer window can create one. For example, transferring 10 gigabytes worth of files overnight is not problematic for most networks. However, if you needed to transfer 10 gigabytes of data within 10 minutes, there certainly might be a problem. Problems caused by size are not limited to bandwidth. Staging and backing up data in a timely fashion can also be problematic.

The other variable involved in determining the difficulty of a transfer involves the data's rate of change. Static data can make your transfer much easier by allowing you to segment activities. For example, configuration files that do not frequently change can probably be transferred to a new system at any time by a simple copy. This is also the case for read-only application data. However, data that constantly change, such as transactional databases or real-time data acquisitions, might be very difficult to transfer.

Regardless of the level of difficulty involved in your data transfer process, there are a number of decisions you need to make. The most important of these is the general strategy for transferring the data. In the following sections, we describe two methods for transferring data: network data transfers and media data transfers. Following these sections, we detail the steps involved in this process.

Network Data Transfers

Network transfers are the preferred technique for data transfers. Because almost every UNIX has TCP/IP networking and Ethernet connectivity, there are few cases where this method won't be the fastest and easiest way to transfer your data. Adding to this convenience, most network transfer applications will automatically encode data transformations, as needed.

Two primary tools transfer large amounts of data over the network: File Transfer Protocol (FTP) and Network File System (NFS). Both tools allow you to move files over the network quickly. Either tool is acceptable, depending on where the data currently reside. Both tools will be limited by the speed of the network or the disks. However, remember to pull the data off the legacy server when using NFS. NFS provides more efficient reading than writing.

FTP and NFS can be automated with Expect (http://expect.nist.gov/), Perl, or shell scripts. This automation can be difficult, but as you can see by the process described in the preceding paragraphs, you will be doing this transfer many times, so the investment should be worthwhile.

Remember to include at least some rudimentary integrity testing with your scripts. For file-based transfers, using some form of the UNIX chksum command should be sufficient.

Media Data Transfers

If you cannot use the network, perhaps in wide area network (WAN) migrations where the link is too slow, you will be forced to use some kind of media transfer. Traditionally, this has meant restoring streaming tape backups , but today you might also be able to use optical media like CD-ROMs or DVDs. Either way, this transfer requires a temporary storage space where data can be restored on the media, then reassembled before being copied to its final place.

A new type of media transfer is the use of storage area network (SAN) volumes. In this case, a temporary volume is created on the SAN-attached legacy system, unmounted, and remounted in the new environment. While this method can be quite efficient for large data sets, it requires that both systems be connected to the SAN and share a common file system type (UFS, EXT3, and the like). Because this is rarely the case in most migrations, SAN volumes are not commonly used in migration projects.

Regardless of the method you choose for transferring data, there is a basic methodology you should follow. In general, this process includes the following tasks :

Plan the transfer process
Perform functional testing
Conduct performance testing
Implement the transfer process

Following this process should ensure a successful data transfer. In the following sections, we explore each of these tasks in detail.

To Plan the Transfer Process

Before you can start transferring data, you need to make a number of decisions. The most important of these is the general strategy for transferring the data. A number of methods for this are described in the following sections. When planning the transfer process, address the following tasks:

Schedule downtime.

Most large transfers require two to three tests before the actual production transfer, so make sure that your downtime windows have been requested .
Create a staging area.
Plan for the target server.
Develop backout strategies.
Create, automate, and test the process.

Using a subset of data in the prototype environment, create and unit test the transfer process. When you get something you like, start to automate it. Because you will be doing quite a few transfer tests (and perhaps quite a few transfers), take the time to script the transfer with enough robustness that you can restart failed transfers.

To Test Functionality

After you establish a plan for effectively transferring data, you need to test the plan to ensure that it functions as you expect it to and that all of the transferred data run on the production environment.

Schedule downtime for functional testing of the process.
Back up the data.

Take the first full backup of the data you are going to transfer and use it to conduct functional testing of the transfer process. This is often called a dry run . This dry run should use the entire data set on the production equipment.
Set up monitoring.

Start gathering performance data that you can use to tune the process later.
Evaluate the results.

Once you have results from the functional test, evaluate them, and tune your process in preparation for performance testing.

To Test Performance

When you are confident that the transfer process is functioning as it should, performance testing needs to be conducted . Performance testing will ensure that the transfer process will perform within the allotted downtime window.

Schedule downtime for the performance test.
Back up the data again.
Test the performance of the transfer, monitoring the results.

This should result in an accurate estimate of the total transfer time. If the amount of time required to perform the test doesn't fit the downtime window, repeat this step until you achieve the desired results.

To Implement the Transfer Process

When you reach a point at which the transfer process functions the way you expect it to, within the allotted amount of time, you are ready to transfer the data.

Schedule the production transfer.

Ensure that you allot an appropriate amount of downtime, as determined from the performance tests.
Back up the data again.

Make sure the test completely restores the original from this backup.
Begin the transfer.

This should be the smoothest part of the process because it has been practiced and timed several times now.
Perform data testing on the transferred data.

Conduct tests on the data as described in the master test plan.
Go live.

Following this process should ensure a successful data transfer. In the following sections, we explore specific tools and techniques to perform the data transfer.

Transforming Data

Sometimes you have to contend with even more than just the physical transportation of the data from one platform to another. Data may need to be converted or transformed in some way to work with the new platform, application, or both. Because data transformations deal with both the physical and logical conversion of data, they are much more difficult and expensive.

Data transformations will follow much the same process as the data transfers above: however, extra steps will be needed to process the data. These extra steps, usually called staging, can range from simple mapping to complete rekeying. While they can be automated with commercial tools and scripting languages, the cost of tools or time invested for scripting tends to be expensive.

Data transformation tend to fall into three categories: encoded data transformations, application transformations, and database transformations.

Transform Encoded Data

Encoded data transformations occur because data has been stored in different or incompatible file formats. While most UNIX files are stored in ASCII encoding, many other file formats are in use on other platforms. For example, mainframes use the older EBCDIC standard, and non-English systems use double-byte character sets or Unicode. Another example would be the difference in text file formatting (CRLF/CR) between DOS and UNIX text files.

Encoded data will require specialized transformation applications. However, most UNIX systems (Solaris included) have a general-purpose utility, called dd , that can do some basic transformations from EBCDIC to ASCII. The following example shows its use to convert between EBCDIC and ASCII while forcing all the resulting text to lower case:

 # dd if=test.ebcdic of=test.ascii conv=ascii,lcase

More difficult encoding transformation, such as the Unicode or double-byte character set described above, will require more specialized tools.

Transform Application Data

Application data transformations are more demanding than encoding data transformations. Because they are specific to each application, most applications will come with some utility to convert standard data interchange formats like comma-separated value (CSV) or tab-delimited text files into their format.

A specialized, but common subset of application transformations are databases. In fact, these transformations are so common that a class of applications, called extract, transform, and load (ETL), have been created to address them. ETL utilities take a wide array of formats (both standard interchange and proprietary application formats) to convert them into Structured Query Language (SQL) for relational database management system (RDBMS). Most RDBMSs come with a basic set of these utilities to convert SQL or standard interchange formats into their data storage format.

The most basic ETL utilities will be provided by the RDBMS vendors as their logical copy utilities. Examples of these include Oracle's export and import commands, MySQL's dump command, and Sybase's bulkcopy program ( bcp command). These commands use the APIs of the RDBMSs to take the proprietary storage format and dump the output in standard (or close to standard) SQL text. For example, the following session uses the MySQL mysqldump command for that purpose:

 #  mysqldump -uroot -p e107 > e107.sql  Enter password: #  more e107.sql  -- MySQL dump 8.22 -- -- Host: localhost    Database: e107 --------------------------------------------------------- -- Server version       3.23.56 -- -- Table structure for table 'e107_core' -- CREATE TABLE e107_core (   e107_name varchar(20) NOT NULL default '',   e107_value text NOT NULL,   PRIMARY KEY  (e107_name) ) TYPE=MyISAM; -- -- Dumping data for table 'e107_core' -- INSERT INTO e107_core VALUES ('e107','a:5:{s:11:\"e107_author\";s:22:\"Steve Dun stan (jalist)\";s:8:\"e107_url\";s:15:\"http://e107.org\";s:12:\"e107_version\"; s:5:\"0.555\";s:10:\"e107_build\";s:4:\"beta\";s:14:\"e107_datestamp\";i:1055552 502;}');

As you can see from the example, the mysqldump utility includes the database schema as well as the actual application data. While this is fine if you are moving from MySQL to MySQL (for instance, from Linux to Solaris), it might not correctly import into other RDBMSs. In that case, you will need to find a slightly more sophisticated tool that understands the differences between RDBMS implementations. One such tool is Oracle Migration Workbench, available for free from Oracle's TechNet Web site (http://technet.oracle.com). This tool allows you to extract data from MySQL, Sybase, DB2, and other RDBMSes, manipulate the data, and import it into Oracle 8 or 9. Most RDBMS vendors will provide tools that allow the migration of database objects and data to their own RDBMS implementations . Additionally, third parties have built tools with heterogeneous capability.

However, if you are considering taking non-SQL or interchange formats into an RDBMS, you will need to look for a commercial utility or write your own. Non-SQL data types commonly include hierarchical database outputs and XML files. Commercial utilities commonly employ mapping technologies that map database fields into the new format (such as objects).

< Day Day Up >