The LOAD Utility

The LOAD utility is used to accomplish bulk inserts to DB2 tables. It can add rows to a table, retaining the current data, or it can replace existing rows with the new data.

Table Loading Philosophies

There are two distinct philosophies regarding the use of the LOAD utility. The first and generally recommended philosophy takes more time to implement but is easier to support. It requires the reservation of sufficient DASD to catalog the LOAD work data sets in case the LOAD job abends. The work data sets for the LOAD job are allocated for the DDNAMEs SORTOUT, SYSUT1, SYSERR, and SYSMAP with DISP=(MOD,DELETE,CATLG). This enables the data sets to be allocated as new for the initial running of the LOAD job. If the job abends, it catalogs the data sets in case they can be used in a restart. After the step completes successfully, the data sets are deleted. The space for these data sets must be planned and available before the LOAD job runs.

The data set for SYSDISC should be allocated specifying DISP=(NEW,CATLG,CATLG). If there are discards, the LOAD utility returns RC=4; it does not abend. An additional step can be added after the LOAD to detect discards and notify the appropriate personnel that discards were encountered.

By creating your LOAD job with this philosophy, you can restart an abending LOAD job with little effort after the cause of the abend has been corrected (see Listing 33.1). You simply specify one of the RESTART options in the UTPROC parameter for DSNUTILB.

Listing 33.1.
LOAD JCL (Restartable)

//DB2JOBU  JOB (UTILITY),'DB2 LOAD',MSGCLASS=X,CLASS=X,
//         NOTIFY=USER
//*
//****************************************************************
//*
//*        DB2 LOAD UTILITY (RESTARTABLE)
//*
//****************************************************************
//*
//UTIL EXEC DSNUPROC,SYSTEM=DSN,UID='LOADDATA',UTPROC=''
//*
//*        UTILITY WORK DATASETS
//*
//DSNUPROC.SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(2,1))
//DSNUPROC.SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(2,1))
//DSNUPROC.SORTOUT DD DSN=CAT.SORTOUT,DISP=(MOD,DELETE,CATLG),
//        UNIT=SYSDA,SPACE=(CYL,(2,1))
//DSNUPROC.SYSMAP DD DSN=CAT.SYSMAP,DISP=(MOD,DELETE,CATLG),
//        UNIT=SYSDA,SPACE=(CYL,(2,1)),DCB=BUFNO=20
//DSNUPROC.SYSUT1 DD DSN=CAT.SYSUT1,DISP=(MOD,DELETE,CATLG),
//        UNIT=SYSDA,SPACE=(CYL,(2,1)),DCB=BUFNO=20
//DSNUPROC.SYSDISC DD DSN=CAT.SYSDISC,DISP=(NEW,CATLG,CATLG),
//        UNIT=SYSDA,SPACE=(CYL,(1,1))
//DSNUPROC.SYSERR DD DSN=CAT.SYSERR,DISP=(MOD,DELETE,CATLG),
//        UNIT=SYSDA,SPACE=(CYL,(1,1))
//DSNUPROC.SYSREC00 DD DSN=CAT.LOAD.INPUT.DATASETA,DISP=SHR,DCB=BUFNO=20
//DSNUPROC.UTPRINT DD SYSOUT=X
//*
//*        UTILITY INPUT CONTROL STATEMENTS
//*        The LOAD statement reloads the DSN8810.ACT table
//*
//DSNUPROC.SYSIN    DD *
    LOAD DATA REPLACE INDDN SYSREC00 LOG NO
       INTO TABLE DSN8810.ACT
       (ACTNO    POSITION (  1  )  SMALLINT,
        ACTKWD   POSITION (  3  )  CHAR (  6  ),
        ACTDESC  POSITION (  9  )  VARCHAR
       )
/*
//

NOTE
The sort work data sets need to be assigned in the JCL only if sort work data sets are not dynamically allocated. Additionally, you should consider explicitly defining sort work data sets when loading very large tables.

The second philosophy is easier to implement but more difficult to support. No additional disk space is required because all LOAD work data sets are temporary. Therefore, all interim work data sets are lost when the job abends. See Listing 33.2 for sample JCL.

Listing 33.2.
LOAD JCL (Nonrestartable)

//DB2JOBU  JOB (UTILITY),'DB2 LOAD',MSGCLASS=X,CLASS=X,
//         NOTIFY=USER,REGION=3M
//*
//****************************************************************
//*
//*        DB2 LOAD UTILITY (NON-RESTARTABLE)
//*
//****************************************************************
//*
//UTIL EXEC DSNUPROC,SYSTEM=DSN,UID='LOADDATA',UTPROC=''
//*
//*        UTILITY WORK DATASETS
//*
//DSNUPROC.SORTWK01 DD DSN=&&SORTWK01,
//        UNIT=SYSDA,SPACE=(CYL,(2,1))
//DSNUPROC.SORTWK02 DD DSN=&&SORTWK02,
//        UNIT=SYSDA,SPACE=(CYL,(2,1))
//DSNUPROC.SORTOUT DD DSN=&&SORTOUT,
//        UNIT=SYSDA,SPACE=(CYL,(2,1))
//DSNUPROC.SYSMAP DD DSN=CAT.SYSMAP,DISP=(MOD,CATLG,CATLG),
//        UNIT=SYSDA,SPACE=(CYL,(2,1))
//DSNUPROC.SYSUT1 DD DSN=&&SYSUT1,DCB=BUFNO=10,
//        UNIT=SYSDA,SPACE=(CYL,(2,1))
//DSNUPROC.SYSDISC DD DSN=CAT.SYSDISC,DISP=(MOD,CATLG,CATLG),
//        UNIT=SYSDA,SPACE=(CYL,(1,1))
//DSNUPROC.SYSERR DD DSN=&&SYSERR,
//        UNIT=SYSDA,SPACE=(CYL,(1,1))
//DSNUPROC.SYSREC00 DD DSN=CAT.LOAD.INPUT.DATASETD,DISP=SHR,DCB=BUFNO=10
//DSNUPROC.UTPRINT DD SYSOUT=X
//*
//*        UTILITY INPUT CONTROL STATEMENTS
//*        The LOAD statement adds the data in SYSREC00 to
//*        the DSN8810.DEPT table.
//*
//DSNUPROC.SYSIN    DD *
    LOAD DATA RESUME(YES) ENFORCE CONSTRAINTS
       LOG NO INDDN SYSREC00
       INTO TABLE DSN8810.DEPT
       (DEPTNO    POSITION(  1) CHAR(  3),
        DEPTNAME  POSITION(  4) VARCHAR,
        MGRNO     POSITION( 42) CHAR(  6) NULLIF( 48)='?',
        ADMRDEPT  POSITION( 49) CHAR(  3),
        LOCATION  POSITION( 52) CHAR( 16) NULLIF( 68)='?'
       )
/*
//

To restart this LOAD job, you must determine in which phase the job abended. If the job abends in any phase of a LOAD REPLACE, you can simply terminate the utility and rerun the job. Rerunning from the beginning, however, can incur significant overhead because data is reprocessed needlessly. If the first philosophy is used, reprocessing is usually avoided. For a LOAD RESUME(YES), however, if the job abends in any phase other than UTILINIT, you must restore the table space for the table being loaded to a previous point in time.
This can be accomplished by running the RECOVER TOCOPY utility or by running a full RECOVER if the LOG NO option of the LOAD utility was specified. After restoring the table space (and possibly its associated indexes), you must correct the cause of the abend, terminate the utility, and then rerun the job. As you can see, this method is significantly more difficult to restart than the first method.

Try to use the first philosophy rather than the second. This makes recovery from error situations as smooth and painless as possible.

Estimating LOAD Work Data Set Sizes

The LOAD utility requires work data sets to load data into DB2 tables. The following formulas can help you calculate estimated sizes for these work data sets. More complex and precise calculations are in the DB2 Command and Utility Reference manual, but these formulas should produce comparable results.

SORTOUT = (size of the largest index key or foreign key + 14) x (total number of rows in the table to be loaded) x (total number of indexes defined for the table) x (total number of foreign keys) x 1.2

NOTE
If any number in the SORTOUT calculation is 0, substitute 1. The multiplier 1.2 is factored into the calculation to provide a "fudge factor." If you are absolutely sure of your numbers, the calculation can be made more precise by eliminating the additional multiplication of 1.2.

SYSUT1 = (size of the largest index key or foreign key + 14) x (total number of rows to be loaded to the table) x (total number of indexes defined for the table) x (total number of foreign keys) x 1.2

NOTE
If any number in the SYSUT1 calculation is 0, substitute 1. The multiplier 1.2 is factored into the calculation to provide a "fudge factor." If you are absolutely sure of your numbers, the calculation can be made more precise by eliminating the additional multiplication of 1.2.

SORTWKxx = (size of SYSUT1) x 2

SYSERR = ((number of estimated unique index errors) + (number of estimated data conversion errors) + (number of estimated referential constraint violations)) x 100

NOTE
Always allocate the SYSERR data set to be at least 1 cylinder.

SYSMAP = (total number of rows to be loaded to the table) x 21

NOTE
The SYSMAP data set is required if either of the following is true: discard processing is requested, or the table space is segmented or partitioned.
SYSDISC = Allocate the SYSDISC data set to be the same size as the data set containing the rows to be loaded.

NOTE
The space requirements for SYSDISC may be prohibitive if disk space is at a premium at your shop. Instead of allocating the SYSDISC data set as large as the data being loaded, consider using a small primary quantity and a larger secondary quantity; for example:

SPACE=(CYL,(0,50),RLSE)

NOTE
Although the SYSDISC data set is optional, specifying it is highly recommended to trap records that cannot be loaded.

After calculating the estimated size in bytes for each work data set, convert the number into cylinders, rounding up to the next whole cylinder. Allocating work data sets in cylinder increments enhances the utility's performance.

LOAD Phases

There are nine possible phases of the LOAD utility: UTILINIT (utility initialization), RELOAD (reading the input and loading the data), SORT (sorting index keys and foreign keys), BUILD (building indexes), INDEXVAL (correcting unique index violations), ENFORCE (checking referential constraints), DISCARD (writing discarded records), REPORT (reporting errors), and UTILTERM (utility cleanup and termination).
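The work data set estimates above reduce to simple arithmetic. The following Python sketch (illustrative only; the function names and the 3390 cylinder size are assumptions, and the formulas are the approximations given above, including the 1.2 fudge factor and the substitute-1-for-0 rule) shows how the numbers might be computed and rounded up to cylinders:

```python
import math

# Approximate usable bytes per 3390 cylinder (15 tracks x 56,664 bytes) -- an assumption.
CYL_BYTES = 849960

def nz(n):
    """Per the NOTEs above: if any number in the calculation is 0, substitute 1."""
    return n if n else 1

def load_work_sizes(largest_key, rows, indexes, foreign_keys, est_errors=0):
    """Estimate LOAD work data set sizes in bytes using the approximations above."""
    sysut1 = (largest_key + 14) * nz(rows) * nz(indexes) * nz(foreign_keys) * 1.2
    return {
        "SORTOUT": sysut1,                           # same formula as SYSUT1
        "SYSUT1": sysut1,
        "SORTWKxx": sysut1 * 2,
        "SYSERR": max(est_errors * 100, CYL_BYTES),  # always at least 1 cylinder
        "SYSMAP": rows * 21,
    }

def to_cylinders(nbytes):
    """Round a byte estimate up to whole cylinders, as recommended above."""
    return math.ceil(nbytes / CYL_BYTES)
```

A 1,000,000-row table with a 20-byte largest key and two indexes, for example, yields a SYSUT1 estimate of about 82MB, or roughly 97 cylinders.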
NOTE
The SORT phase will be skipped if the RELOAD phase analyzes the data and determines that the SORT phase is not needed.

Creating an Inline Copy During the LOAD

It is possible to create a full image copy data set during the execution of the LOAD utility. This is referred to as an inline COPY. The image copy will be a SHRLEVEL REFERENCE copy. There are two major benefits of taking an inline copy. The first is that a second pass of the data is not required to create a DB2 image copy. The second is that the table space into which the data is being loaded will not be placed into a copy pending state when an inline copy is taken, even if the LOG NO option is specified. To create an inline copy, use the COPYDDN and RECOVERYDDN keywords. You can specify up to two primary and two secondary copies.

Gathering Inline Statistics During the LOAD

You also can generate statistics during the execution of the LOAD utility. This is referred to as inline RUNSTATS. Up-to-date statistics will be generated during the LOAD instead of requiring an additional RUNSTATS step. To generate inline RUNSTATS, use the STATISTICS keyword. You can gather table space statistics, index statistics, or both.

Discards and Inline RUNSTATS

If you specify both the DISCARDDN and STATISTICS options, the inline statistics collected during the LOAD may be inaccurate. When a row is found with check constraint errors or conversion errors, the row is not loaded into the table, so DB2 does not collect statistics for it. So far, so good. However, the LOAD utility collects inline statistics before discarding rows that violate unique constraints and referential constraints. Therefore, when the number of rows that violate RI and unique constraints is high, the statistics could be quite imprecise. If a significant number of rows are discarded, consider executing the RUNSTATS utility on the table after the discarded data has been verified as wrong or corrected.

Loading Delimited Input Data Sets
CAUTION
When loading delimited data, you cannot specify CONTINUEIF, INCURSOR, WHEN, or multiple INTO TABLE statements.

The default column delimiter character is a comma, but it can be changed using the COLDEL parameter. The delimiter character can be specified as either a regular character or a hexadecimal character.

CAUTION
When you specify the delimiter character, be sure to verify that the character is specified in the code page of the source data. Furthermore, if the utility control statement is coded in a different character type than the input file, specify COLDEL in hex or the result can be unpredictable. For example, specify the delimiter as a hex constant if the utility control statement is coded in EBCDIC and the input data is ASCII or Unicode.

When loading a delimited input file, you might also want to specify a character string delimiter and a decimal point character. The default character string delimiter is the double quote character (") but it can be set to another character using the CHARDEL parameter. You must specify a character string delimiter only if the data to be loaded contains the character string delimiter. Though not required, you can put character string delimiters around other character strings. To load a string containing the delimiter character, you must double the character. For example, to LOAD a string containing

He told me "You look well" and I liked it.

code the following:

"He told me ""You look well"" and I liked it."

Finally, you can change the decimal point character, too. This is done using the DECPT parameter. The default is a period, but some countries use a comma, so you can change the decimal point character to another character of your choice.

LOAD Rerun/Restart Procedures

The LOAD utility can be restarted. The restart or rerun procedure is determined by the abending phase of the LOAD step. There are two ways to determine the phase in which the abend occurred.
The first method is to issue the DISPLAY UTILITY command to determine which utilities are currently active, stopped, or terminating in the DB2 system. The format of the command is

-DISPLAY UTILITY(*)

The second method to determine the abending phase is to view the SYSPRINT DD statement of the LOAD step. This method is not as desirable as the first, but it is the only method you can use when the DB2 system is down. At the completion of each phase, DB2 prints a line stating that the phase has completed. You can assume that the phase immediately following the last phase reported complete in the SYSPRINT DD statement is the phase that was executing when the abend occurred.

After determining the phase of the LOAD utility at the time of the abend, follow the steps outlined here to restart or rerun the load. In the following procedures, it is assumed that your LOAD utility processing is generally restartable.

If the abend occurred in the UTILINIT phase
If the abend occurred in the RELOAD phase
CAUTION
Although LOAD can be restarted normally within the RELOAD phase if SORTKEYS is not used, it will restart from the beginning of the RELOAD phase if SORTKEYS is used.

If the abend occurred in the SORT phase
If the abend occurred in the BUILD phase
NOTE
When the SORTKEYS parameter is used and the LOAD utility terminates during the RELOAD, SORT, or BUILD phases, both RESTART and RESTART(PHASE) restart from the beginning of the RELOAD phase.

If the abend occurred in the INDEXVAL phase
If the abend occurred in the ENFORCE phase
If the abend occurred in the DISCARD phase
If the abend occurred in the REPORT phase
If the abend occurred in the UTILTERM phase
LOAD Locking and Concurrency

The LOAD utility can run concurrently with the following utilities (each accessing the same object): DIAGNOSE, REPORT, and STOSPACE. The LOAD utility drains all claim classes for the table space or partition being loaded and any associated indexes, index partitions, and logical index partitions. Furthermore, if the ENFORCE option is specified, LOAD drains the write claim class for the primary key index. Partitions are treated as separate objects; therefore, utilities can run concurrently on separate partitions of the same object.
An additional benefit of parallel loading is that the input data need not be broken out into separate data sets per partition; the same is true of the error data set and the mapping data set.
NOTE
Be aware that the INDDN and DISCARDDN options can be specified only if the PART keyword is also specified. They cannot be used with segmented or simple table spaces.

The actual number of parallel RELOAD tasks to be run is determined by the number of CPUs, the availability of virtual storage, and the number of available DB2 threads.

When the LOAD utility builds indexes in parallel rather than sequentially, the overall elapsed time for the LOAD job can be reduced. For LOAD to build indexes in parallel, the first condition, of course, is that more than one index be defined for the table being loaded. If that is the case, the SORTKEYS clause must be specified with an estimate for the number of keys, and sort work data sets must be allocated to the LOAD job (either explicitly or dynamically).

Online Loading
Previously, running the LOAD utility made the table data unavailable. Of course, you could always code a program to insert the data, but that is not very efficient or simple. Online loading works using normal SQL INSERTs. The LOAD will perform normal claim processing and take no drains, but all normal SQL processing activities will occur. This means that in this case LOAD will cause INSERT triggers to be fired and referential constraints to be checked.

LOAD Guidelines

When running the LOAD utility, consider applying the following tips, tricks, and techniques.

Consider Using SORTKEYS

When index keys are not already in sorted order and indexes exist on the table into which data is being loaded, consider using the SORTKEYS keyword. When SORTKEYS is specified, index keys are sorted in memory rather than written to work files. This can improve performance by reducing the I/O to the work data sets and by shortening the elapsed time between the RELOAD and BUILD phases.
An estimate of the number of keys to be sorted can be supplied. This is optional but recommended; if no estimate is provided, the extracted keys will be written to a work data set, minimizing the efficiency gains of using the SORTKEYS parameter. To estimate the number of keys to sort, use the following calculation:

Number of Keys = (Total number of rows to be loaded)
   x [(number of indexes on the table)
      + (number of foreign keys {unless an index exists for the FK})
      + ((number of foreign keys participating in multiple relationships)
         x (number of relationships - 1))]

NOTE
If more than one table is being loaded, the preceding calculation must be repeated for each table; the sum of the results is used.
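The SORTKEYS estimate above can be sketched as a small Python calculation (illustrative only; the function name and the dictionary keys describing each table are assumptions, not LOAD syntax):

```python
def sortkeys_estimate(tables):
    """Estimate a SORTKEYS value using the calculation above.

    Each table is described by: 'rows' to be loaded, 'indexes' on the table,
    'unindexed_fks' (foreign keys with no supporting index), and
    'multi_rel_fks', a list holding, for each foreign key that participates
    in multiple relationships, its number of relationships.
    When more than one table is loaded, the per-table results are summed."""
    total = 0
    for t in tables:
        keys_per_row = (t["indexes"]
                        + t["unindexed_fks"]
                        + sum(rels - 1 for rels in t["multi_rel_fks"]))
        total += t["rows"] * keys_per_row
    return total
```

For a 1,000-row table with two indexes, one unindexed foreign key, and one foreign key in two relationships, the estimate works out to 4,000 keys.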
Avoid the LOAD Utility for Tables with Triggers

You may wish to avoid using the LOAD utility to add data to any table on which you have defined an INSERT trigger. Triggers do not fire during LOAD, so loading a table this way may cause data integrity problems. Instead, code a program to insert the data as needed, because INSERT will cause the trigger to fire appropriately.

NOTE
Of course, this caveat does not apply to an online LOAD resume, because the LOAD utility will perform normal SQL INSERTs, thereby firing triggers as desired.

Consider Serializing Loads for Tables in the Same Database

The LOAD utility is sensitive to concurrent processing. If concurrent loading of tables in the same database takes too long, consider serializing the LOAD jobs for those tables. Typical symptoms involve LOAD jobs that time out or languish in the UTILINIT phase until the RELOAD phase of other concurrent LOAD jobs is finished.

NOTE
Consider assigning tables that need to be loaded concurrently to different databases to avoid this problem. Another approach is to assign only one table per database.

Use LOAD to Append or Replace Rows

You can use LOAD to replace data in a table by specifying the REPLACE option. LOAD also can append new data to a table, leaving current data intact, by specifying the RESUME(YES) option. Choose the appropriate option based on your data loading needs.

Use LOAD to Perform Mass Deletes

Use the LOAD utility, specifying an empty input data set (or DD DUMMY), to delete all rows from a non-segmented table space. This is called a mass delete. LOAD is usually more efficient than DELETE SQL without a WHERE clause. Specifying the LOG NO option to avoid logging data changes will further enhance the performance of the mass delete. Note, however, the following considerations:
Use Fixed Blocked Input

To enhance the performance of the LOAD utility, use a fixed blocked input data set rather than a variable blocked data set.

Buffer the Work Data Sets Appropriately

For large loads, set the BUFNO parameter in the JCL for the SYSUT1 DD statement to a number greater than 20. A BUFNO of approximately 20 is recommended for medium-sized indexes, and a BUFNO between 50 and 100 is recommended for larger tables. The BUFNO parameter creates read and write buffers in main storage for the data set, thereby enhancing the performance of the LOAD utility. The default for BUFNO is 8 for DB2 V3 and 20 for DB2 V4. Ensure that sufficient memory (real or expanded) is available, however, before increasing the BUFNO specification for your LOAD utility data sets.

Enforce RI During Table Loading When Possible

Favor using the ENFORCE option of the LOAD utility to enforce referential constraints instead of running CHECK DATA after the LOAD completes. It is usually more efficient to process the loaded data once, as it is loaded, than to process the data twice: once to load it and once to check it. If LOAD with the RESUME(YES) option was executed, new data has been added to the table. However, if ENFORCE was not specified and a subsequent CHECK DATA is run, CHECK DATA will check the entire table, not just the new data.

Ensure That LOAD Input Data Sets Are in Key Sequence

Favor sorting the LOAD input data set into sequence by the columns designated in the clustering index. Be sure to sort the data in the appropriate sequence, either ascending or descending, depending on how the index was defined. Otherwise, the LOAD utility does not load data in clustering order, and the table space and indexes will be inefficiently organized.

NOTE
When the index key is null, it should be treated as "high values" for sorting purposes.

If you use DFSORT to sort the input data before loading, consider invoking the SORT DEDUPE option.
Doing so not only can decrease the size of the file passed to the LOAD step, but it can minimize or eliminate LOAD discard processing. Removing duplicates in DFSORT can improve performance because of the different ways that DFSORT and the LOAD utility handle duplicates. When DFSORT encounters duplicates, it sends one of the values to the output file and discards the remaining duplicates. When the LOAD utility encounters duplicates in the input file, it sends all of the duplicates to the discard file. Consider the following code for DFSORT:

//SYSIN    DD *
  SORT FIELDS=(1,4,BI,A,5,4,BI,A,9,12,CH,A)
  SUM FIELDS=NONE
/*

This code indicates that DFSORT is to sort on three fields. The first starts in position 1 for 4 bytes, the second starts in position 5 for 4 bytes, and the third starts in position 9 for 12 bytes. The first two fields are unsigned binary, the third is character, and the sort is ascending for each field. Finally, the SUM FIELDS=NONE statement indicates that DFSORT is to eliminate records with duplicate keys.

Additionally, you can improve performance by removing unneeded records during the DFSORT step instead of using a WHEN clause on the LOAD utility. Doing so can decrease the size of the file passed to the LOAD step, and the DFSORT INCLUDE is more efficient than LOAD WHEN. Consider the following sample DFSORT code:

//SYSIN    DD *
  INCLUDE COND=(9,4,CH,EQ,C'CDBD')
/*

This code indicates that DFSORT is to keep a record only when the four bytes starting in position 9 equal 'CDBD'; all other records are dropped. More details on how to use DFSORT can be found in the IBM manual number SC33-4035, DFSORT Application Programming Guide.

REORG After Loading When the Input Is Not Sorted

If data is not loaded in clustering sequence, consider following the LOAD with a table space reorganization. This can be performed all the time, which is not recommended, or based on the value of CLUSTER RATIO stored in the DB2 Catalog for the table space and its clustering index.
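The behavioral difference described above can be illustrated with a small Python analogy (not a DFSORT replacement; the function name is a placeholder): sort the records, keep the first record per key, and note that the remaining duplicates are simply dropped rather than written to a discard file as LOAD would do.

```python
def sort_dedupe(records, key):
    """Sort records and keep one record per key, mimicking DFSORT with
    SUM FIELDS=NONE; surplus duplicates are silently discarded."""
    seen = set()
    out = []
    for rec in sorted(records, key=key):
        k = key(rec)
        if k not in seen:          # first record with this key wins
            seen.add(k)
            out.append(rec)
    return out

# Records "A1" and "A2" share the key 'A'; only the first survives.
print(sort_dedupe(["B1", "A1", "A2"], key=lambda r: r[0]))
# ['A1', 'B1']
```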
If the CLUSTER RATIO is not 100% for a newly loaded table, the REORG utility should be used to cluster and organize the application data.

NOTE
If LOAD is run specifying RESUME(YES), then even if the input is in clustering sequence, the result can be a CLUSTER RATIO of less than 100%. It is best to avoid sorting the input in this case. Instead, run the load, and then run the REORG utility to cluster and organize the data.

Favor the Use of LOG NO

Use the LOG NO option unless the table to be loaded is very small. Doing so avoids the overhead of logging the loaded data and speeds load processing. If data is loaded without being logged, however, follow the LOAD utility with a full image copy.

Specify KEEPDICTIONARY for Performance

The LOAD utility will rebuild the compression dictionary for table spaces defined with the COMPRESS YES parameter. Specifying the KEEPDICTIONARY parameter causes the LOAD utility to bypass dictionary rebuilding. (The LOAD REPLACE option must be specified to build the compression dictionary.) Bypassing the rebuild improves the overall performance of the LOAD utility because the CPU cycles used to build the dictionary are avoided. However, this option should be used only when you are sure that the same basic type of data is being loaded into the table. If the type of data differs substantially, allowing the LOAD utility to rebuild the compression dictionary will provide more optimal data compression.

NOTE
Keeping the compression dictionary can increase work space requirements for the REORG utility. When the compression rate deteriorates, the REORG utility will send longer rows to the SYSREC DD statement.

Avoid Nullable Columns for Frequently Loaded Tables

Loading tables with nullable columns can degrade the LOAD utility's performance. If a table will be loaded frequently (daily, for example), consider reducing or eliminating the number of nullable columns defined to the table to increase the performance of the LOAD utility.
This is not always practical or desirable, because many program changes may be required to change columns from nullable to NOT NULL or to NOT NULL WITH DEFAULT. Additionally, nullable columns might make more sense than default values given the specification of the application.

Avoid Decimal Columns for Frequently Loaded Tables

Avoid DECIMAL columns for tables that are loaded frequently. Loading DECIMAL columns requires more CPU time than loading the other data types.

Avoid Data Conversion

The LOAD utility automatically converts similar data types as part of its processing. However, try to avoid data conversion, because the LOAD utility requires additional CPU time to process these conversions. The following data conversions are performed automatically by the LOAD utility:
Reduce CPU Usage by Explicitly Coding All LOAD Parameters

Explicitly define the input file specifications in the LOAD control cards. Do this even when the data set to be loaded conforms to all the default lengths specified in Table 33.1. This reduces the LOAD utility's CPU use.

Table 33.1. Default LOAD Lengths
If the input file specifications are not explicitly identified, the LOAD utility assumes that the input data set is formatted with the defaults specified in Table 33.1.

NOTE
You can use the DSNTIAUL sample program, the UNLOAD utility, or REORG UNLOAD EXTERNAL to build LOAD control cards with explicit definitions. The PUNCHDDN keyword is used to specify a data set for the control cards.

For BLOB, CLOB, and DBCLOB data, you must specify the length of the input field in bytes. This length value is placed in a four-byte binary field at the beginning of the LOB value. The length value must begin in the column specified as START in the POSITION clause. The END specification is not used for LOBs.

Consider Using LOAD Parameters to Edit Data Before Loading

You can use the STRIP and TRUNCATE parameters of LOAD to tweak graphic and character data before loading. These parameters can be used in conjunction with CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC columns.

The STRIP parameter indicates that LOAD must remove specified characters from the beginning, the end, or both ends of the data prior to loading it. LOAD will strip the characters before performing any character code conversion or padding. If a specific character is not coded, the default is to strip blanks. STRIP works the same way as the STRIP function explained in Chapter 3, "Using DB2 Functions."

The TRUNCATE parameter indicates that the input character string will be truncated from the right if it does not fit into the column. LOAD will truncate the data after any required CCSID translation.

Create All Indexes Before Loading

It is usually more efficient to define all indexes before using the LOAD utility. The LOAD utility uses an efficient algorithm to build DB2 indexes. If indexes must be created after the data has been loaded, create the indexes with the DEFER YES option and build them later using the REBUILD INDEX utility.
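The four-byte binary length prefix required for LOB input fields, described earlier, can be sketched in Python (illustrative only; the function name is a placeholder, and positioning the field within the actual input record is governed by the POSITION clause):

```python
import struct

def lob_field(value: bytes) -> bytes:
    """Build a LOB input field: a four-byte big-endian binary length
    followed by the LOB value itself, as LOAD expects for
    BLOB/CLOB/DBCLOB input data."""
    return struct.pack(">I", len(value)) + value

field = lob_field(b"some document text")
# The first four bytes hold the length (18 here); the LOB value follows.
```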
Favor LOAD over INSERT

To insert initial data into a DB2 table, favor the use of the LOAD utility with the REPLACE option over an application program coded to process INSERTs. LOAD should be favored even if the application normally processes INSERTs as part of its design. The initial loading of DB2 table data usually involves the insertion of many more rows than typical application processing does. For the initial population of table data, the LOAD utility is generally more efficient and less error-prone than a corresponding application program, and it also maintains free space.

Consider using the LOAD utility with the RESUME(YES) option to process a large volume of table insertions. LOAD is usually more efficient and less error-prone than a corresponding application program that issues a large number of INSERTs.

Do Not Load Tables in a Multi-Table Simple Table Space

Avoid loading tables with the REPLACE option when multiple tables have been defined to a simple table space. The LOAD utility with the REPLACE option deletes all rows in all tables in the simple table space, which is not usually the desired result.

Gather Statistics When Loading Data

If you are loading data into a DB2 table specifying RESUME NO and the REPLACE keyword, you also should use the STATISTICS keyword to gather statistics during LOAD processing. These keywords specify that you are loading a table from scratch and that any previous data will be lost. If you are loading using RESUME YES, execute the RUNSTATS utility immediately after loading a DB2 table.

Accurate statistics are necessary to maintain current information about your table data for access path determination. Of course, access paths for static SQL will not change unless all packages and plans accessing the table are rebound. Any dynamic SQL statements will immediately take advantage of the new statistics.
Consider Loading by Partition

Concurrent loading of multiple partitions of a single table space can be achieved using partition independence. This technique is useful for reducing the overall elapsed time of loading a table in a partitioned table space.

Use Data Contingency Options As Required

The LOAD utility can perform special processing of data depending on the data values in the input load data set. Data contingency processing parameters indicate a field defined in the LOAD parameters or a beginning and ending location of data items to be checked. The data contingency processing parameters follow:
CAUTION
NULLIF cannot be used with ROWID columns because a ROWID column cannot be null.

Separate Work Data Sets

Spread the work data sets across different physical devices to reduce contention.

Use Caution When Loading ROWID Data

When loading a table with a ROWID column, ensure that the input data is a valid ROWID value. The appropriate way to do this is to load ROWID values that were previously generated by DB2. If the ROWID is defined with the GENERATED ALWAYS keyword, you cannot load data into that column. Instead, the ROWID value must be generated by DB2.

Handle Floating Point Data

Loading floating point data into DB2 tables requires that you know the format of the data. Two options are available for loading floating point data:
NOTE
If a conversion error occurs while converting from binary floating point format to hexadecimal floating point format, DB2 will place the record in the discard file.

Optimize Sort Utility Processing

Be sure to optimize the operation of the sort utility in use at your shop. For example, you can assign additional resources to DFSORT using the following DD statement:

//DFSPARM DD *
  HIPRMAX=0,EXCPVR=NONE,SIZE=32768K

Additionally, consider using the SORTNUM clause to increase the number of dynamically allocated files and the SORTDEVT clause to assign the disk pool.

Be Aware of the Impact of Multilevel Security on LOAD

If you use multilevel security to control authorization, the user or process must have the write-down privilege to run a LOAD REPLACE on a table space containing a table that has multilevel security with row-level granularity. This privilege allows you to specify values for the security label columns. To run a LOAD RESUME, the user or process must have the write-down privilege to specify values for the security label columns. If you run a LOAD RESUME job without the write-down privilege, DB2 will assign your security label as the value for the security label column for the loaded rows.
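Returning to floating point data: the conversion mentioned in the NOTE above, from IEEE 754 binary floating point to System/390 hexadecimal floating point (HFP), can be sketched in Python. This is purely illustrative of the format (sign bit, 7-bit excess-64 base-16 exponent, 56-bit fraction in the long form); in practice the conversion is performed by DB2 and z/OS services, not by user code, and the function name is a placeholder.

```python
def ieee_to_ibm_hfp(x: float) -> bytes:
    """Convert an IEEE 754 double to long-format S/390 hexadecimal
    floating point: 1 sign bit, 7-bit excess-64 base-16 exponent,
    56-bit fraction.  A format sketch only; edge cases such as
    rounding at the fraction boundary are not handled."""
    if x == 0.0:
        return b"\x00" * 8
    sign = 0x80 if x < 0 else 0x00
    m = abs(x)
    e = 0
    # Normalize so the fraction satisfies 1/16 <= m < 1 (base-16 mantissa).
    while m >= 1.0:
        m /= 16.0
        e += 1
    while m < 1.0 / 16.0:
        m *= 16.0
        e -= 1
    frac = int(m * (1 << 56))
    return bytes([sign | (e + 64)]) + frac.to_bytes(7, "big")

# 1.0 is 0.0625 x 16^1: exponent byte X'41', fraction X'10000000000000'.
print(ieee_to_ibm_hfp(1.0).hex())
# 4110000000000000
```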