0369-0372

Previous Table of Contents Next

Page 369


Figure 16.6.
Physical versus logical records.

Concatenated Records

SQL*Loader can concatenate records either by specifying the number of physical records that form one logical record or by specifying conditional evaluations based on character occurrences within the data. If the number of physical records that comprise a logical record varies, then you must base the concatenation on conditional evaluations. The CONCATENATE and CONTINUEIF clauses in the control file facilitate physical record concatenation.

SQL*Loader Paths

SQL*Loader provides two paths for loading data:

  • The conventional path
  • The direct path

The Conventional Path

The conventional path generates a SQL INSERT command with the array processing interface to load the data into the tables. Because of this interface, SQL*Loader competes with all other Oracle processes for buffer cache resources; Oracle looks for and tries to fill partially filled blocks on each insert. (See Figure 16.7.) If you are loading small amounts of data, this method is usually acceptable; with large volumes of data, however, this technique becomes too time-consuming and resource- intensive .

The following conditions exist when you load data with the conventional path:

  • Other users and Oracle processes can access the table concurrently with SQL*Loader.
  • Indexes are updated as the rows are inserted into the table.
  • All referential and integrity constraints are enforced as the data is loaded into the table.

Page 370

  • Database insert triggers fire as the rows are inserted into the table.
  • You can apply SQL functions to the input data as it is loaded.
  • Data can be loaded into a clustered table.
  • Data can be loaded with SQL*Net.

Figure 16.7.
Conventional loader path.

The Direct Path

In contrast to the conventional path, the direct path is optimized for bulk data loads. Instead of using the buffer cache to obtain and fill the database blocks, the direct path option uses the Extent Manager to get new extents and adjust the high water mark. Direct path creates formatted database blocks and writes them directly to the database. (See Figure 16.8.)


Figure 16.8.
Direct loader path.

Page 371


The direct path has several significant advantages:
  • You can load and index large amounts of data into empty or non-empty tables in a relatively short period of time.
  • If loading data into empty tables, you can load presorted data and eliminate the sort and merge phases of the index build, thereby significantly increasing performance.
  • You can load data in parallel, which enables multiple SQL*Loader sessions to perform concurrent direct path loads into the same table.
  • You can specify that a direct path load be done in an UNRECOVERABLE mode, which bypasses Oracle's redo logging activity and significantly increases performance.

Although the direct path significantly increases performance, it does have some limitations:

  • The tables and indexes into which you are loading data are exclusively locked at the start of the load and not released until the load is finished; the tables cannot have any active transactions on them and are not available to other users or processes until the load is completed.
  • Indexes are put into a direct load state at the start of the load and need to be rebuilt, either automatically or manually, after the load is completed. If the SQL*Loader session does not complete successfully, the indexes are left in the direct load state and need to be rebuilt manually. Any PRIMARY KEY or UNIQUE constraints are not validated until after the load is complete and the index rebuild occurs; you might have duplicate keys and need to correct them through the use of the exceptions table before you can rebuild the index.
  • The NOT NULL constraint is the only constraint checked at insertion time. All other integrity and referential constraints are re-enabled and enforced after the load is complete. If any violations exist, they are placed into the exceptions table, which you should specify when you create the constraint. The exceptions table must be created before the load session.
  • Database insert triggers do not fire. Any application functionality that relies on them must be accomplished through some other method.
  • You cannot apply SQL functions to the input data as it is loaded.
  • Data cannot be loaded into a clustered table.
  • Only in the case where both computer systems belong to the same family, and both are using the same character set, can data be loaded through SQL*Net. You should not use SQL*Net for direct path loads because the direct path should be used only for large amounts of data. Network overhead offsets any performance gains associated with the direct path.
  • DEFAULT column specifications are not available with the direct path.
  • Synonyms that exist for the tables being loaded must point directly to the table; they cannot point to another synonym or view.

Page 372

Direct path loads of single partitions have the following limitations:

  • The table being loaded cannot have any global indexes defined on it.
  • Referential and check constraints on the object being loaded must be disabled.
  • Triggers on the object being loaded must be disabled.
NOTE
The main advantage to the direct path method is performance. No hard and fast rules exist to determine when to use the conventional method or the direct path method because the definition of "large amounts of data" varies from application to application. Other external factors vary also, such as database availability, the cleanliness of the data being loaded, and system resources. I have seen loads go from days to hours using the parallel, direct path method instead of the conventional path method with clean data on a multiprocessor system. I have also seen cases where significant amounts of time were spent cleaning the table data so that an index could rebuild after a direct path load. You need to analyze your situation and, based on all the factors, decide which method is better for your application.

Parallel Data Loading

Oracle version 7.1 introduced the functionality of performing direct path loads in parallel. This feature gives SQL*Loader nearly linear performance gains on multiprocessor systems. By using the parallel option, you can load multiple input files concurrently into tablespace files, with each file belonging to the tablespace in which the table resides. Maximum throughput is achieved by striping the tablespace across different devices and placing the input files on separate devices and preferably separate controllers. (See Figure 16.9.)

When using the parallel option, SQL*Loader creates temporary segments, sized according to the NEXT storage parameter specified for the table, in the tablespace file specified in the OPTIONS clause of the control file.

CAUTION
The specified file must be in the tablespace in which the table resides, or you get a fatal error.

Upon completion of the SQL*Loader session, the temporary segments are merged (with the last extent trimmed of any free space) and added to the existing table in the database above the high water mark for the table.

Previous Table of Contents Next


Oracle Unleashed
Oracle Development Unleashed (3rd Edition)
ISBN: 0672315750
EAN: 2147483647
Year: 1997
Pages: 391

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net