9.2 Data checking technique


Data movement is the first thing any migration should focus on. Without having all your tables and data properly moved over, all other migration testing is in vain. The test process should detect whether all rows were imported into the target database, verify that data type conversions were successful, and check random data byte-by-byte. The data checking process should be automated with appropriate scripts. When testing data migration you should:

  • Check IMPORT/LOAD messages for errors and warnings.

  • Count the number of rows in source and target databases and compare them.

  • Prepare scripts that perform data checks.

  • Involve data administration staff familiar with the application and its data to perform random checks.

9.2.1 IMPORT/LOAD messages

You should always check the messages generated by the IMPORT or LOAD commands. Example 9-1 presents the messages generated by a sample import command. You should not only read the summary at the end of the listing, but also pay attention to the warning messages.

Example 9-1: Sample IMPORT messages

start example
db2 import from table01.unl of del replace into table01

SQL3109N  The utility is beginning to load data from file "table01.unl".
SQL3148W  A row from the input file was not inserted into the table.  SQLCODE "-545" was returned.
SQL0545N  The requested operation is not allowed because a row does not satisfy the check constraint "ARTURW.TABLE01.SQL030812222227680".  SQLSTATE=23513
SQL3185W  The previous error occurred while processing data from row "2" of the input file.
SQL3117W  The field value in row "3" and column "1" cannot be converted to a SMALLINT value.  A null was loaded.
SQL3125W  The character data in row "4" and column "2" was truncated because the data is longer than the target database column.
SQL3110N  The utility has completed processing.  "4" rows were read from the input file.
SQL3221W  ...Begin COMMIT WORK. Input Record Count = "4".
SQL3222W  ...COMMIT of any database changes was successful.
SQL3149N  "4" rows were processed from the input file.  "3" rows were successfully inserted into the table.  "1" rows were rejected.

Number of rows read         = 4
Number of rows skipped      = 0
Number of rows inserted     = 3
Number of rows updated      = 0
Number of rows rejected     = 1
Number of rows committed    = 4
end example

As shown in the summary, during the import process one record from the input file was rejected and three were inserted into the database. To understand the nature of the warnings, you should look into the source data file and the table definition (which can be extracted with the db2look command). For Example 9-1, the table definition is presented in Figure 9-2 and the data file in Figure 9-3.

start figure

CREATE TABLE TABLE01 (
  C1 SMALLINT,
  C2 CHAR(3),
  C3 SMALLINT CHECK( C3 IN (1,2,3)))

end figure

Figure 9-2: Table "table01" definition for Example 9-1

start figure

 1,"abc",1 2,"abc",4 32768,"abc",2 4,"abcd",3 

end figure

Figure 9-3: Data file "table01.unl" for Example 9-1
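
The table definition shown in Figure 9-2 can be extracted from the target database with the db2look utility; a minimal sketch, assuming the table resides in a database named SAMPLE and the DDL is written to table01.ddl (both names are placeholders):

    db2look -d SAMPLE -t table01 -e -o table01.ddl

The generated table01.ddl file contains the CREATE TABLE statement, including the check constraint, and can be compared against the source definition.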

The first row from the input file (Figure 9-3) was inserted without any warnings. The second row was rejected because it violated the check constraint (messages SQL3148W, SQL0545N, SQL3185W). The value 32768 in the third row was changed to null because it was outside the SMALLINT data type range (warning SQL3117W), and the string "abcd" in the last row was truncated to "abc" because it was longer than the corresponding column definition (warning SQL3125W).

The LOAD utility generates messages in a similar format, but because it is designed for speed, it bypasses the SQL engine and inserts data directly into table spaces without constraint checking. Loading the same table01.unl file (Figure 9-3) into table01 (Figure 9-2) with the LOAD utility generates messages without the SQL3148W, SQL0545N, and SQL3185W warnings, as shown in Example 9-2.

Example 9-2: LOAD messages

start example
db2 load from table01.unl of del replace into table01
[..]
SQL3117W  The field value in row "3" and column "1" cannot be converted to a SMALLINT value.  A null was loaded.
SQL3125W  The character data in row "4" and column "2" was truncated because the data is longer than the target database column.
[..]
Number of rows read         = 4
Number of rows skipped      = 0
Number of rows loaded       = 4
Number of rows rejected     = 0
Number of rows deleted      = 0
Number of rows committed    = 4
end example

A table that has been created with constraints is left by the LOAD command in check pending state. Accessing such a table with SQL queries generates warning SQL0668N "Operation not allowed for reason code "1" on table "<TABLE_NAME>". SQLSTATE=57016". The SET INTEGRITY SQL statement should be used to move the loaded table into a usable state. Example 9-3 shows a way to validate the constraints. All rows that violate constraints are moved to the exception table table01_e.

Example 9-3: Turning integrity checking back on.

start example
db2 create table table01_e like table01
db2 set integrity for table01 immediate checked for exception in table01 use table01_e

SQL3602W  Check data processing found constraint violations and moved them to exception tables.  SQLSTATE=01603
end example

The SET INTEGRITY statement has many options, such as turning integrity checking on only for new data, turning integrity checking off, or specifying exception tables with additional diagnostic information (see the sketch after the following list). To read more about the SET INTEGRITY statement refer to:

  • DB2 UDB SQL Reference Volume 2

  • DB2 UDB SQL Reference Volume 1 for exception tables.
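
As a sketch of two of those options (verify the exact clauses against the SQL Reference for your DB2 UDB version), the following statements turn integrity checking off for a table and later validate only the newly appended data:

    db2 set integrity for table01 off
    db2 set integrity for table01 immediate checked incremental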

9.2.2 Data checking scripts

Scripts that perform logical data integrity checks automate the data verification process and save administrator effort. For small tables (with less than 50,000 rows) you can write a program that compares data byte-by-byte. The program (preferably an ODBC or JDBC program, or an SQL script) can extract sorted rows from Oracle and DB2 into files with the same ASCII format. The files can then be compared (on UNIX, use the diff command) to verify that they are identical; a sketch of this approach is shown below. For larger tables, comparing all rows byte by byte can be very inefficient, so the data migration should be evaluated by comparing aggregate values, such as the number of rows. To do this you can create a special table for storing the row counts from the source Oracle database. Table CK_ROW_COUNT, presented in Example 9-4, can be used for that purpose.
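
A minimal sketch of the extract-and-compare approach for small tables follows. The column names c1, c2, c3 and the file names are placeholders, and both extracts must use the same delimiter, quoting, and NULL representation before a byte-by-byte comparison is meaningful:

    # Oracle side (SQL*Plus): spool a sorted, delimited extract to table01_ora.unl, for example:
    #   select c1 || ',' || c2 || ',' || c3 from table01 order by c1, c2, c3;
    # DB2 side: export the same columns, sorted the same way, to a delimited ASCII file
    db2 "export to table01_db2.unl of del select c1, c2, c3 from table01 order by c1, c2, c3"
    # Compare the two extracts (UNIX); no output from diff means the files are identical
    diff table01_ora.unl table01_db2.unl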

Example 9-4: Table for storing number of rows (Oracle)

start example
CREATE TABLE CK_ROW_COUNT (
   TAB_NAME  VARCHAR2(30), -- table name
   ROW_COUNT INT,          -- number of rows
   SYS_NAME  CHAR(3),      -- code to distinguish the system: ORA or DB2
   TIME_INS  DATE )        -- time when the count was performed
end example

For each table you should count the number of rows and store the information in the CK_ROW_COUNT table. The following insert statement can be used for that purpose:

    insert into ck_row_count
    select 'TAB_NAME', count(*), 'ORA', sysdate from TAB_NAME

The table CK_ROW_COUNT and its data can be manually migrated to the target DB2 database. Example 9-5 presents the DB2 version of the table.

Example 9-5: Table for storing number of rows (DB2)

start example
CREATE TABLE CK_ROW_COUNT (
   TAB_NAME  VARCHAR(30),
   ROW_COUNT INT,
   SYS_NAME  CHAR(3),
   TIME_INS  TIMESTAMP )
end example

On the DB2 system, you should repeat the counting process with the equivalent insert statement:

    insert into ck_row_count
    select 'TAB_NAME', count(*), 'DB2', current timestamp from TAB_NAME

After performing the described steps, the DB2 table CK_ROW_COUNT should contain information about the number of rows counted on both the Oracle and DB2 databases. The records in the table should look like Example 9-6.

Example 9-6: Sample table CK_ROW_COUNT contents

start example
select TAB_NAME, ROW_COUNT, SYS_NAME, TIME_INS from CK_ROW_COUNT

[...]
TABLE_A   39001   ORA   2003-08-13-10.13.39
TABLE_A   39001   DB2   2003-08-13-10.32.13
TABLE_B   60003   ORA   2003-08-13-10.15.29
TABLE_B   60002   DB2   2003-08-13-10.33.49
[...]
end example

Having the row count information in an SQL table is very convenient, because with a single query you can get the names of the tables that contain a different number of rows in the source and target databases:

    select tab_name
    from (select distinct tab_name, row_count from CK_ROW_COUNT) as t_temp
    group by t_temp.tab_name
    having count(*) > 1
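
A variant of this check (a sketch, not part of the original scripts) reports the mismatched counts side by side, which makes the size of each discrepancy visible at a glance:

    select o.tab_name, o.row_count as ora_rows, d.row_count as db2_rows
    from ck_row_count o join ck_row_count d on o.tab_name = d.tab_name
    where o.sys_name = 'ORA' and d.sys_name = 'DB2' and o.row_count <> d.row_count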

To manually migrate the data from the Oracle ck_row_count table, you can save the results of the following query to a file (ck_row_count.unl):

    select tab_name || ',' || row_count || ',' || sys_name || ',' ||
           to_char(time_ins, 'yyyy-mm-dd-hh24.mi.ss".000000"')
    from ck_row_count

and import the file into DB2 using the DB2 IMPORT command:

    db2 import from ck_row_count.unl of del insert into ck_row_count 

The process of creating the statements that count the number of rows can be automated with the PL/SQL script presented in Example 9-7. The script retrieves information about the user tables from the Oracle data dictionary and generates two files, count_rows_ora.sql (Oracle version) and count_rows_db2.sql (DB2 version), with insert statements that can be used to count the rows in all user tables.

Example 9-7: PL/SQL program that generates scripts for counting rows

start example
set heading off;
set echo off;
set feedback off;
set serveroutput on size 100000

---- Generating script for Oracle database
spool count_rows_ora.sql;
BEGIN
  dbms_output.put_line( 'CREATE TABLE CK_ROW_COUNT ( TAB_NAME VARCHAR2(30), ROW_COUNT INT, SYS_NAME CHAR(3), TIME_INS DATE);');
  FOR tab_rec IN (SELECT TABLE_NAME from TABS ) LOOP
    dbms_output.put_line( 'INSERT INTO CK_ROW_COUNT SELECT '''||tab_rec.table_name||''', COUNT(*), ''ORA'', SYSDATE FROM '|| tab_rec.table_name ||';');
  END LOOP;
  dbms_output.put_line( 'COMMIT; ');
END;
/
spool off;

---- Generating script for DB2 database
spool count_rows_db2.sql;
BEGIN
  dbms_output.put_line( 'CREATE TABLE CK_ROW_COUNT ( TAB_NAME VARCHAR(30), ROW_COUNT INT, SYS_NAME CHAR(3), TIME_INS TIMESTAMP);');
  FOR tab_rec IN (SELECT TABLE_NAME from TABS ) LOOP
    dbms_output.put_line( 'INSERT INTO CK_ROW_COUNT SELECT '''||tab_rec.table_name||''', COUNT(*), ''DB2'', CURRENT TIMESTAMP FROM '|| tab_rec.table_name ||';');
  END LOOP;
  dbms_output.put_line( 'COMMIT; ');
END;
/
spool off;
rollback;
end example
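
One way to run the generated scripts (a sketch; the file name gen_count_rows.sql, the database name SAMPLE, and the user credentials are assumptions, not part of the original example):

    # Generate both counting scripts from the Oracle data dictionary
    sqlplus user/password @gen_count_rows.sql
    # Count the rows on the Oracle side
    sqlplus user/password @count_rows_ora.sql
    # After migrating CK_ROW_COUNT and its data, count the rows on the DB2 side
    # (-t treats ";" as the statement terminator, -v echoes each statement)
    db2 connect to SAMPLE
    db2 -tvf count_rows_db2.sql

If CK_ROW_COUNT was already migrated together with its data, the CREATE TABLE statement in the generated DB2 script fails with a "name already exists" error and can simply be removed or ignored.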

The presented approach for comparing the number of rows can be extended with additional checks, such as comparing the sums of numeric columns. Here are the steps that summarize the technique:

  1. Define check sum tables on the source database and define the scope of the computation.

  2. Perform the computation and store the results in the appropriate check sum tables. Use Oracle data dictionary (TABS and COLS) to generate the aggregate queries, or write SPL procedures.

  3. Migrate the check sum tables in the same way as the other user tables.

  4. Perform equivalent computations on the target system, and store the information in the migrated check sum tables.

  5. Compare the computed values.

Table 9-1 provides the aggregate computations for selected data types. The argument of the DB2 sum() function is converted to the DECIMAL type because, in most cases, the sum() function returns the same data type as its argument, which can cause an arithmetic overflow. For example, when calculating the sum of an integer column, if the result exceeds the integer data type range, error SQL0802N is generated: "Arithmetic overflow or other arithmetic exception occurred." Converting the argument to DECIMAL eliminates the error.
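
As a sketch of how the technique extends to column sums (the table TABLE_A, its numeric column AMOUNT with scale 2, and the CK_COL_SUM check sum table are assumptions introduced for illustration, not part of the original scripts):

    -- Oracle side: define a check sum table and store the column sum
    CREATE TABLE CK_COL_SUM (
       TAB_NAME VARCHAR2(30),
       COL_NAME VARCHAR2(30),
       COL_SUM  NUMBER,
       SYS_NAME CHAR(3),
       TIME_INS DATE );

    INSERT INTO CK_COL_SUM
    SELECT 'TABLE_A', 'AMOUNT', SUM(AMOUNT), 'ORA', SYSDATE FROM TABLE_A;

    -- DB2 side (after migrating CK_COL_SUM like any other user table):
    -- cast to DECIMAL as in Table 9-1 to avoid the SQL0802N arithmetic overflow
    INSERT INTO CK_COL_SUM
    SELECT 'TABLE_A', 'AMOUNT', SUM(CAST(AMOUNT AS DECIMAL(31,2))), 'DB2', CURRENT TIMESTAMP FROM TABLE_A;

Comparing the ORA and DB2 rows of CK_COL_SUM then works exactly like the row count comparison shown earlier.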

Table 9-1: Aggregations for data migration verification

Data type                      Oracle operation                                         DB2 operation
-----------------------------  -------------------------------------------------------  ------------------------------------------------
numeric(<precision>,<scale>)   sum( val)                                                sum( cast(val as decimal(31,<scale>)))
date                           sum( trunc( val - to_date('0001/01/02', 'yyyy/mm/dd')))  sum( cast(days(val) as decimal(31,1)))
variable length character      sum( length(val))                                        sum( cast( length(val) as decimal(31,0)))
fixed length character         sum( length(rtrim(val)))                                 sum( cast( length(rtrim(val)) as decimal(31,0)))


