3.9 Data Loads


Databases frequently have multiple information providers as shown in Figure 3-9.

Figure 3-9. Database Information Providers.

graphics/03fig09.gif

OLTP providers usually perform single-row inserts or updates ”these transactions are usually just a few rows. Data can also be loaded in from other systems. These are batch data loads and occur when there is less OLTP activity.

Data loads typically fall into two categories. One type is a schema initialization load. This process brings data into the database for the first time and coincides with the application development. The load may be from a legacy system that is being converted to Oracle. These loads require data "scrubbing" (e.g., the legacy may require string conversions to load time/day fields into an Oracle DATE type). Constraints and indexes can be built after the data is verified and loaded.

Other batch loads occur on a periodic base. A load can initiate from a user , an operating system-scheduled job, or possibly a PL/SQL procedure scheduled through the Oracle DBMS_JOB queue.

SQL*Loader is an Oracle utility for loading fixed-format or delimited fields from an ASCII file into a database. It has a conventional and direct load option. The default option is conventional.

The direct load method disables constraints before the data load and enables them afterward. This incurs some overhead. For large amounts of data, the direct method is much faster than the conventional method. Postload processing includes not just the enabling of constraints, but the rebuilding of indexes for primary key and unique constraints.

If a direct load contains duplicates, the post process of enabling constraints and rebuilding of indexes fails. For a duplicate primary key or unique constraint, the failed state leaves the index in a "direct load" state.

Log messages in the SQL*Loader file will highlight this type of failure with the following:

 
 The following index(es) on table STUDENTS were processed: Index PK_STUDENTS was left in Direct Load State due to ORA-0145 cannot CREATE UNIQUE INDEX; duplicate keys found 

Following a direct load, you should check the SQL*Loader log file but also check the status of your indexes. A simple query for troubleshooting is the following:

 
 SELECT index_name, table_name FROM USER_INDEXES WHERE STATUS = 'DIRECT LOAD'; 

If you have bogus data following a direct load, you need to remove all duplicates before you can enable constraints and rebuild the indexes.

For a conventional SQL*Loader path , duplicate records are written to the SQL*Loader "bad" file with corresponding messages in the SQL*Loader "log" file. If no errors occur, then there is no "bad" file. SQL*Loader is a callable program and can be invoked in a client/server environment where the end user takes an action to load a file that is stored on the server. The mere existence of a bad file, following the load, will indicate errors during the load.

You can use SQL*Loader as a callable program to implement daily loads using a conventional path. You can use this utility to load large files with millions of rows into a database with excellent results.

Each SQL*Loader option (conventional and direct load) provides a mechanism to trap and resolve records that conflict with your primary key or any other constraint; however, direct load scenarios can be more time consuming.

Alternatives to SQL*Loader are SQL*Plus scripts and PL/SQL. You can load the data with constraints on and capture failed records through exception handling. Bad records can be written to a file using the UTL_FILE package. Bad records can also be written to a temporary table that has no constraints.

You also have the option to disable constraints, load data into a table, and then enable the constraint. If the data is bad you cannot enable the constraint. To resolve bad records, start with an EXCEPTIONS table. The exceptions table can have any name , but must have the following columns .

 
 CREATE TABLE EXCEPTIONS  (row_id     ROWID,   owner      VARCHR2(30),   table_name VARCHAR2(30),   constraint VARCHAR2(30)); 

The SQL for this exceptions table is found in the ORACLE_HOME/RDBMS/ADMIN directory in the file utlecpt.sql. The RDBMS/ADMIN directory, under ORACLE_HOME, is the standard repository for many scripts including the SQL scripts to build the data dictionary catalog, scripts to compile the SYS packages, and scripts like the exceptions table.

We use the exceptions table to capture rows that violate a constraint. This capturing is done as we attempt to enable our constraint. The following TEMP table is created with a primary key.

 
 CREATE TABLE TEMP (id   VARCHAR2(5) CONSTRAINT PK_TEMP PRIMARY KEY,  no NUMBER); 

Insert some good data:

 
 INSERT INTO temp VALUES ('AAA', 1); INSERT INTO temp VALUES ('BBB', 2); 

The following disables the constraint. This is done here prior to inserting new data.

 
 ALTER TABLE temp DISABLE CONSTRAINT PK_TEMP; 

Now we insert some data; in this example, this is one row that we know to be duplicate row.

 
 INSERT INTO temp VALUES ('AAA', 3); 

The following shows the error when we enable the constraint with SQL*Plus.

 
  SQL>  ALTER TABLE temp ENABLE CONSTRAINT pk_temp;  ORA-00001 cannot enable constraint. Unique constraint pk_temp violated.   SQL>  

What if we had started with a million rows in TEMP and loaded another million. The task of identifying the offending rows can be tedious . Use an exceptions table when enabling constraints. The exceptions table captures the ROW ID of all offending rows.

 
 ALTER TABLE temp ENABLE CONSTRAINT pk_temp EXCEPTIONS INTO exceptions; 

The constraints are still off. All records are in TEMP, but you can identify the bad records.

 
 SELECT id, no FROM   temp, exceptions WHERE  exceptions.constraint='PK_TEMP' AND    temp.rowid=exceptions.row_id; 

This works for all types of constraint violations. You may not be able to enable constraints after a load, but you can capture the ROWID and constraint type through an exceptions table.



Programming Oracle Triggers and Stored Procedures
Programming Oracle Triggers and Stored Procedures (3rd Edition) (Prentice Hall PTR Oracle Series)
ISBN: 0130850330
EAN: 2147483647
Year: 2003
Pages: 111
Authors: Kevin Owens

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net