Using DML INSERTs gets data into the DB2 tables, but it is not a feasible option for large amounts of data. The LOAD utility is needed for populating tables with large amounts of data.
For removing data from a DB2 table, we can use DML DELETEs, but doing so is not optimal for large amounts of data. For unloading data, better options are DSNTIAUL, UNLOAD, and REORG DISCARD or REORG UNLOAD EXTERNAL.
The LOAD utility is used to load one or more tables of a table space. This utility loads records into the tables and builds or extends any indexes defined on them. If the table space already contains data, you can either add the new data to the existing data or replace the existing data. The loaded data is processed by any edit or validation routine associated with the table and any field procedure associated with any column of the table. The LOAD utility will load data coming from a sequential data set into one or more tables in the same table space. Because data is coming in from a non-DB2 source, all integrity checking must be performed: entity integrity, referential integrity, and check integrity.
The output from LOAD DATA consists of a loaded table space or partition, a discard file of rejected records, and a summary report of errors encountered during processing. (This report is generated only if you specify ENFORCE CONSTRAINTS or if the LOAD involves unique indexes.)
The LOAD utility consists of several phases. In the first phase, UTILINIT, the LOAD performs initialization and start-up.
In the second phase, RELOAD loads the record types and writes temporary file records for indexes and foreign keys. Check constraints are checked for each row. One pass through the sequential input data set is made. Internal commits are taken to provide commit points at which to restart in case operation should halt in this phase. If you specified the COPYDDN or RECOVERYDDN keywords, inline copies are created. If SORTKEYS is used, a subtask is started at the beginning of the RELOAD phase to sort the keys. This subtask initializes and waits for the main RELOAD phase to pass its keys to SORT. The RELOAD phase loads the data, extracts the keys, and passes them in memory for sorting. At the end of the RELOAD phase, the last key is passed to SORT, and record sorting completes. PREFORMAT for table spaces occurs at the end of the RELOAD phase. Use SORT for sorting temporary file records before creating indexes or validating referential constraints, if indexes or foreign keys exist.
The SORT phase is skipped if all the following conditions apply for the data processed during the RELOAD phase.
In the third phase, BUILD creates indexes from temporary file records for all indexes defined on the loaded tables, detects any duplicate keys, and preformats index pages. If you specify a parallel index build, all activities that normally occur in both the SORT and BUILD phases occur in the SORTBLD phase instead.
Next, INDXVAL corrects unique-index violations from the information in SYSERR, if any exist. ENFORCE checks referential constraints, performs corrections of violations, and reports them to SYSERR. DISCARD copies the records causing errors from the input data set to the discard data set.
REPORT generates a summary report if you specified ENFORCE CONSTRAINT or if load index validation is performed. Finally, UTILTERM performs any necessary cleanup.
In order to load data into a DB2 table, we must supply an input data set and a target table. The input data set must match the target table. The following example loads data from the data set specified by the DB2LDS DD statement into the TEST_TAKEN table:
LOAD DATA INDDN DB2LDS INTO TABLE DB2USER1.TEST_TAKEN
Rules can be selectively added to the LOAD criteria to LOAD only rows that meet a specified criterion. An example follows:
LOAD DATA INDDN DB2LDS INTO TABLE DB2USER1.TEST_TAKEN WHEN (1:3)='300'
In order to map the position in the file to the columns in the table:
LOAD DATA RESUME YES INTO TABLE DB2USER1.TEST WHEN (1:1)='P' (NUMBER POSITION (1:6) CHAR, NAME POSITION (7:57) CHAR, TYPE POSITION (58:59) CHAR, CUTSCORE POSITION (60:68) DECIMAL EXTERNAL, LENGTH POSITION (69:74) SMALLINT, TOTALTKN POSITION (75:80) SMALLINT, TOTALPAS POSITION (81:86) SMALLINT)
This example shows how to identify the table that is to be loaded, the fields within the input record, and the format of the input record. The fields need to be mapped to the columns of the table that is being loaded; otherwise, errors will result.
The LOAD utility's REPLACE option can be used to replace data in a single-table table space or in a multiple-table table space. All the data can be replaced in the table space by using this option. If you want to preserve the records that are already in the table, use the LOAD RESUME option.
If an object is in REORG-pending status, you can perform a LOAD REPLACE of the entire table space, which resets REORG-pending status. In this situation, no other LOAD operations are allowed.
LOAD REPLACE or LOAD PART REPLACE with LOG YES logs only the reset and not each deleted row. If you need to see what rows are being deleted, use the SQL DELETE statement.
The following example shows how to replace data in one table in a single-table table space:
LOAD DATA REPLACE INTO TABLE DSN861.DEPT (DEPTNO POSITION (1)CHAR(3), DEPTNAME POSITION (5)VARCHAR, ) ENFORCE NO
When using LOAD REPLACE on a multiple-table table space, care must be taken because LOAD works on an entire table space at a time. In order to replace all rows in a LOAD multiple-table table space, you have to work with one table at a time, using the RESUME YES option on all but the first table. For example, if you have two tables in a table space, you first need to use LOAD REPLACE on the first table to empty out the table space and replace the data for the first table. Next, use LOAD with RESUME YES on the second table to add the records for the second table without destroying the data in the first table.
If only one table in a multiple-table table space needs to be replaced, all rows need to be deleted from the table. Use LOAD with RESUME YES.
If you need to add data to a table rather than replace, use LOAD with RESUME YES. The RESUME keyword specifies whether it is to be loaded into an empty or a nonempty table space. RESUME NO loads records into an empty table space. RESUME YES loads records into a nonempty table space.
If RESUME NO is specified and the target table space is not empty, no data is loaded. If RESUME YES is specified and the target table space is empty, data is loaded.
LOAD always adds rows to the end of the existing rows, but index entries are placed in key sequence.
You can run a LOAD REPLACE on a table space that has pages on the LPL (Logical Page List), and it will reset the status; however, you cannot run a LOAD RESUME on the object.
An efficient way of clearing a table space is to delete all the data in it, specifying LOAD REPLACE without loading any records. In order to do this, specify the input data set in the JCL as DD DUMMY.
The LOAD REPLACE method is efficient for the following reasons.
Loading Ordered Rows
The LOAD utility loads records into a table space in the order in which they exist in the input stream. LOAD does not sort the input stream and does not insert records in sequence with existing records, even if there is a clustering index. To achieve clustering when loading an empty table or replacing data, sort the input stream.
When adding data to a clustered table, consider reorganizing the table afterward.
By using the PART clause of the INTO TABLE option, you load only the specified partitions of a partitioned table. If the PART keyword is omitted, the entire table will be loaded. The REPLACE and RESUME options can be specified separately by partition. The following example loads data into the first and second partitions of the employee table. Records with '0' in column 1 replace the contents of partition 1; records with '1' in column 1 are added to partition 2; all other records are ignored. (The example, simplified to illustrate the point, does not list field specifications for all columns of the table.)
LOAD DATA INTO TABLE DB2USER1.CANDIDATE PART 1 REPLACE WHEN (1)= '0' (CID POSITION (1:9) CHAR(9), LNAME POSITION (10:40) VARCHAR(30), ... ) INTO TABLE DB2USER1.CANDIDATE PART 2 RESUME YES WHEN (1)='1' (CID POSITION (1:9) CHAR(9), LNAME POSITION (10:40) VARCHAR(30), ... )
If you are not loading columns in the same order as in the CREATE TABLE statement, you must code field specifications for each INTO TABLE statement.
The next example assumes that the data is in separate input data sets. The data is already sorted by partition, so the WHEN clause of INTO TABLE is not necessary. The RESUME YES option placed before the PART option will not allow concurrent partition processing during the execution of the utility:
LOAD DATA INDDN CERTLD1 CONTINUEIF(72:72)='X' RESUME YES INTO TABLE DB2USER1.CANDIDATE REPLACE PART 1 LOAD DATA INDDN CERTLD2 CONTINUEIF(72:72)='X' RESUME YES INTO TABLE DB2USER1.CANDIDATE REPLACE PART 2
The following example allows partitioning independence when loading more than one partition concurrently:
LOAD DATA INDDN SYSREC LOG NO INTO TABLE DB2USER1.CANDIDATE PART 2 REPLACE
LOAD INTO PART x is not allowed if an identity column is part of the partitioning index.
If another table needs to access data during a LOAD utility, you can specify SHRLEVEL CHANGE on a LOAD RESUME with the LOG YES option. This option effectively combines the speed and performance of the LOAD utility with the availability and access offered by INSERT processing. This option operates similarly to an SQL INSERT program and uses claims instead of draining for best concurrent access.
The option is LOG YES only and will not require a COPY afterward. Locking problems are avoided through internal monitoring of the commit scope. You can also run it in parallel for partitioned table spaces.
The LOAD utility converts data between compatible data types. Table 7-1 identifies the allowable data conversions and the defaults used when the input data type is not specified in a field specification of the INTO TABLE statement.
The LOAD utility will not load a table with an incomplete definition. If the table has a primary key, the unique index on that key must exist. If any table named to be loaded has an incomplete definition, the LOAD job will terminate.
By default, LOAD enforces referential constraints (ENFORCE CONSTRAINTS). During this process, several errors could occur.
A primary index must be a unique index and must exist if the table definition is complete. Therefore, when a parent table is loaded, there must be at least a primary index.
An error data set, and probably also a map data set and a discard data set, will be needed for the LOAD utility for RI errors.
A dependent table has the constraint that the values of its foreign keys must be values of the primary keys of corresponding parent tables. By default, LOAD enforces that constraint in much the same way that it enforces the uniqueness of key values in a unique index. First, it loads all records to the table; subsequently, it checks their validity with respect to the constraints, identifies any invalid record by an error message, and deletes the record. The record can optionally be copied to a discard data set.
If a record fails to load because it violates a referential constraint, any of its dependent records in the same job also fail. For example, suppose that the sample project table and project activity tables belong to the same table space, that you load them both in the same job, and that an input record for the project table has an invalid department number. Then, that record fails to be loaded and does not appear in the loaded table; the summary report identifies it as causing a primary error.
But the project table has a primary key: the project number. In this case, the record rejected by LOAD defines a project number; in the project activity table, any record that refers to the rejected number is also rejected. The summary report identifies those as causing secondary errors. If you use a discard data set, both types of error records are copied to it.
The deletion of invalid records does not cascade to other dependent tables already in place. Suppose now that the project and project activity tables exist in separate table spaces and that they are both currently populated and possess referential integrity. Further, suppose that the data in the project table is now to be replaced, using LOAD REPLACE, and that the replacement data for a department was inadvertently not supplied in the input data. Records referencing that department number might already exist in the project activity table. LOAD therefore automatically places the table space containing the project activity table, and all table spaces containing dependent tables of any table being replaced, into CHECK-pending status.
The CHECK-pending status indicates that the referential integrity of the table space is in doubt; it might contain records that violate a referential constraint. The use of a table space in CHECK-pending status has severe restrictions; typically, you run the CHECK DATA utility to reset this status.
If the ENFORCE NO option is used, the LOAD utility will not enforce referential constraints. But the result is that the loaded table space might violate the constraints. The following is an example of running without enforcing constraints:
LOAD DATA INDDN DB2LDS ENFORCE NO INTO TABLE DB2USER1.TEST_TAKEN
LOAD places the loaded table space in CHECK-pending (CHKP) status. If you use REPLACE, all table spaces containing any dependent tables of the tables that were loaded are also placed in CHECK-pending status.
The referential integrity checking in LOAD can only delete incorrect dependent rows that were input to LOAD. But deletion is not always the best strategy for correcting referential integrity violations.
For example, the violations may occur because parent rows do not exist. In this case, it is better to correct the parent table, not to delete the dependent rows. Therefore, and in this case, ENFORCE NO would be more appropriate than ENFORCE CONSTRAINTS. After the parent table is corrected, CHECK DATA can be used to reset the CHECK-pending status.
LOAD ENFORCE CONSTRAINTS is not equivalent to CHECK DATA. LOAD ENFORCE CONSTRAINTS deletes any rows causing referential constraint violations. CHECK DATA detects violations and optionally deletes such rows. CHECK DATA checks a complete referential structure, although LOAD checks only the rows being loaded. The CHECK DATA utility is discussed later in this chapter.
When loading referential structures with ENFORCE CONSTRAINTS, you should load parent tables before dependent tables.
Be aware that running the LOAD utility on a table space does not activate triggers defined on tables in the table space.
Loading ROWID Columns
Columns defined as ROWID can be designated as input fields, using the LOAD field specification syntax diagram. LOAD PART is not allowed if the ROWID column is part of the partitioning key. Columns defined as ROWID can be designated as GENERATED BY DEFAULT or GENERATED ALWAYS. With GENERATED ALWAYS, DB2 always generates a row ID.
Columns defined as ROWID GENERATED BY DEFAULT can be set by the LOAD utility from input data. The input field must be specified as ROWID. No conversions are allowed. The input data for a ROWID column must be a unique, valid value for a row ID. If the value of the row is not unique, a duplicate key violation will occur. If such an error occurs, the load will fail. In this case, the duplicate values will need to be discarded, and the load will need to be retried with a new unique value; alternatively, allow DB2 to generate the value of the row ID.
The DEFAULTIF attribute can be used with the ROWID keyword. If the condition is met, the column will be loaded with a value generated by DB2. The NULLIF attribute cannot be used with the ROWID keyword, because row ID columns cannot be null.
A ROWID column defined as GENERATED ALWAYS cannot be included in the field specification list, because DB2 generates the ROWID value automatically.
Loading a LOB column
The LOAD utility treats LOB columns as varying-length data. The length value for a LOB column must be 4 bytes. When the input record is greater than 32KB, you might have to load the LOB data separately.
When loading into a nonsegmented table space, LOAD leaves one free page after reaching the FREEPAGE limit, regardless of whether the records loaded belong to the same or different tables. When loading into a segmented table space, LOAD leaves free pages, and free space on each page, in accordance with the current values of the FREEPAGE and PCTFREE parameters. Those values can be set by the CREATE TABLESPACE, ALTER TABLESPACE, CREATE INDEX, or ALTER INDEX statements. LOAD leaves one free page after reaching the FREEPAGE limit for each table in the table space.
The LOAD utility can also perform an inline copy, which is, for the most part, equivalent to a full image copy taken with SHRLEVEL REFERENCE. The only difference between an inline copy and a regular full image copy is that data pages and space map pages may be out of sequence or repeated; if a compression dictionary was built during the LOAD, the pages will be duplicated. These differences, however, should be negligible in terms of the amount of space required for the copy data set, and the copy is still valid for recovery.
The inline copy increases the availability of your data because, after the data has been loaded and the inline copy taken, the table space is not left in a copy-pending (COPY) status, even if you specify LOG NO, and the data is ready to be accessed. You can take multiple-image copies with this feature as well, with a maximum of two primary- and two secondary-image copies allowed.
Statistics can be collected during the LOAD utility, eliminating the need for executing the RUNSTATS utility after the LOAD. This is done by using the STATISTICS keyword to gather catalog statistics for the table space:
LOAD STATISTICS INTO TABLE DB2USER1.TEST_TAKEN
More information on RUNSTATS is given later in this chapter.
Changing the key ranges for the partitions can be done by using the REBALANCE keyword. DB2 automatically rebalances the keys over the specified partitions. The following example rebalances partitions 3 and 4:
REORG TABLESPACE DB2CERT.TSCERT PART(3:4) REBALANCE
Using the SORTKEYS option for the LOAD utility allows for elimination of the SYSUT1 and SORTOUT temporary data sets for sorting the key/RID pairs for foreign keys or for indexes, provided that they have been given a nonzero estimate for the key/RID pairs to be sorted. In this case, DFSORT exits are used to pass the unsorted key/RID pairs for the foreign keys or indexes to DFSORT and to accept the sorted key/RID pairs again rather than making the unsorted key/RID pairs available via the SYSUT1 data set and receiving the sorted key/RID pairs via the SORTOUT data set.
With the SORTKEYS option, the sorting of the key/RID pairs is done in parallel with the loading of the rows, the building of the indexes, andif inline statistics are establishedthe collection of statistics. The parallelism is achieved by groups of z/OS subtasks. A task group can contain two or three tasks, depending on whether inline statistics are to be established. These subtasks are referred to as the SORT task, BUILD task, and STATISTICS task, respectively.
DB2 can allocate multiple task groups of two or three subtasks each to sort the key/RID pairs and to establish the indexes in parallel with one another. The degree of parallelism used depends on the specifications and the DD statements provided for sort-work data sets.
If the sort-work data sets are dynamically allocated by DFSORTSORTDEVT has been specifiedthe degree of parallelism is determined by them. If manual allocation of the sort-work data sets and sort-message data sets is chosen, the degree of parallelism is determined by the data sets that have been allocated.
When DB2's preformatting delays impact the performance or execution time consistency of high-INSERT applications and the table size can be predicted for a business-processing cycle, LOAD PREFORMAT or REORG PREFORMAT might be a technique to consider. This technique will be of value only if DB2's preformatting causes a measurable delay with the INSERT processing or causes inconsistent elapsed times for INSERT applications. It is recommended that, to quantify its value in your environment, a performance assessment be conducted before and after LOAD or REORG PREFORMAT is used.
PREFORMAT is a technique used to eliminate DB2's having to preformat new pages in a table space during execution time. This might eliminate execution-time delays but adds the preformatting cost as set up prior to the application's execution. LOAD or REORG PREFORMAT primes a new table space and prepares it for INSERT processing. When the preformatted space is used and DB2 has to extend the table space, normal data set extending and preformatting occur.
Preformatting for INSERT processing may be desirable for high-INSERT tables that will receive a predictable amount of data, allowing all the required space to be preallocated prior to the application's execution. This would be the case for a table that acts as a repository for work items coming into a system subsequently used to feed a back-end task that processes the work items.
Preformatting of a table space containing a table used for query processing may cause a table space scan to read additional empty pages, extending the elapsed time for these queries. LOAD or REORG PREFORMAT is not recommended for tables that have a high ratio of reads to inserts if the reads result in table space scans.
The PREFORMAT option can also be used on the REORG utility.
Parallel Index Builds
LOAD builds all the indexes defined for any table being loaded. At the same time, LOAD checks for duplicate values of any unique index key. If any duplicate values exist, one of the corresponding rows is loaded. Error messages identify the input records that produce duplicates; optionally, the records are copied to a discard data set. At the end of the job, a summary report lists all errors found. For unique indexes, any two null values are taken to be equal, unless the index was created with the UNIQUE WHERE NOT NULL clause. In that case, a single-key column can contain any number of null values, although its other values must be unique.
Neither the loaded table nor its indexes contain any of the records that might have produced an error. Using the error messages, you can identify faulty input records, correct them, and load them again. If you use a discard data set, you can correct the records there and add them to the table with LOAD RESUME.
Use parallel index builds to reduce the elapsed time for a LOAD job by sorting the index keys and rebuilding multiple indexes in parallel rather than sequentially. Optimally, a pair of subtasks process each index; one subtask sorts extracted keys, and the other subtask builds the index. LOAD begins building each index as soon as the corresponding sort emits its first sorted record. LOAD uses a parallel index build if all the following conditions are true: When more than one index is to be built, the LOAD utility specifies the SORTKEYS keyword, along with a nonzero estimate of the number of keys, in the utility statement. The utility can be allowed to either dynamically allocate the data sets needed by SORT or provide the necessary data sets in the job.
LOAD Parallelism for Partitioned Table Spaces
LOAD parallelism helps when dealing with short windows in which to load a lot of data. This allows us to load multiple partitions of a partitioned table space in parallel in the same job. Large loads have been a problem in the past when nonpartitioned indexes were involved, often causing the DBA to have to drop and recreate them in order to get large data loads done. Multiple tasks in a single job can be used to load the partitions in parallel, and a single job can be submitted with several input files to be loaded in parallel. The performance is fast, and the contention on the NPI is eliminated. The number of parallel load tasks will be determined by the number of CPUs and virtual storage available, as well as the number of threads available. In the following example of the necessary syntax, two partitions are being loaded in parallel, and part 1 is also being preformatted:
LOAD INTO TABLE tab1 PART 1 INDDN infile1 PREFORMAT INTO TABLE tab1 PART 2 INDDN infile2
If the data set definitions have been deferredthe DDL was run with DEFINE NOa LOAD with an empty data file should be done on the first partition, and then the other partitions can be loaded in parallel in order to get LOAD parallelism on the initial load.
The LOAD utility allows a cursor to be defined for the input data. This can be done with the EXEC SQL and INCUSOR parameters in the utility to invoke the family cross-loader function whereby data can be loaded from any DRDA remote server.
Some restrictions are that you cannot load into same table as a defined cursor, use SHRLEVEL CHANGE, or use field specifications or discard processing.
Following is an example of using a cursor in a LOAD utility:
EXEC SQL DECLARE C1 CURSOR FOR SELECT * FROM DB2CERT ENDEXEC LOAD DATA INCURSOR(C1) REPLACE INTO DB2CERT2
You can use the DB2 sample unload program DSNTIAUL to unload data and to create control statements for use with the DB2 LOAD utility. The source code for this program can be found in the DSNSAMP folder.
The UNLOAD utility unloads data from one or more source objects to one or more BSAM (Basic Sequential Access Method) sequential data sets in external formats. The source of the data can be
This utility allows an unload of rows from an entire table space, specific partitions, or individual tables. The utility can also unload specific columns of a table by using a field-specification list. If a table space is partitioned, all the selected partitions can be unloaded into a single data set, or you can unload each partition in parallel into physically distinct data sets. Multiple tables from the same table space can be unloaded in the same job; however, the unload cannot perform joins on tables. If the tables for the unload are not specified, all the tables in the table space are unloaded. The UNLOAD utility can also change data types of selected columns, change the order of the columns, use sampling to get a cross-section of data, specify conditions for row selection, and limit the number of rows to be unloaded.
The output records written by the UNLOAD utility are compatible as input to the LOAD utility. Thus, the original table or different tables can be reloaded with the data from UNLOAD.
Output from UNLOAD consists of an unloaded table space or partition and/or a discard file of rejected records.
The UNLOAD utility has three phases:
One pass through the input data set is made during the UNLOAD phase. If UNLOAD is processing a table space or a partition, DB2 takes internal commits to provide commit points at which to restart in case operation should halt in this phase.
Following is an example of using the UNLOAD utility:
UNLOAD TABLESPACE DB2CERT.CERTTS FROM TABLE DB2USER1.TEST_TAKEN WHEN (CID =300 AND SCORE > 90)
Until version 8, the LOAD utility on DB2 for z/OS required input in positional format. This often required data from other platforms to be converted or to use INSERTs. With version 8, LOAD or UNLOAD now produces and accepts delimited files. A delimited file has row and column delimiters.
The FORMAT DELIMITED syntax on LOAD supports COLDEL, CHARDEL, and DECPT options to specify the column delimiter, character delimiter, and decimal-point character on the input file. The DELIMITED syntax on UNLOAD supports the same optionsCOLDEL, CHARDEL, and DECPTto specify the column delimiter, character delimiter, and decimal-point character on the output file. The following example uses LOAD DATA with delimited input:
LOAD DATA RESUME YES FORMAT DELIMITED COLDEL ';'
Using Reorg to Remove Data
The REORG utility has an option for unloading or discarding data during the REORG utility execution. The UNLOAD EXTERNAL option unloads selected data and places it into a data set that can then be loaded into a table. The DISCARD option permits selected removal of rows during the REORG utility execution.