Logged and Non-Logged Operations

There are two modes in which bulk-copy operations can occur: logged and non-logged (also known as slow and fast BCP, respectively). The ideal situation is to operate in non-logged mode because this arrangement dramatically decreases the load time and the consumption of other system resources such as memory, processor use, and disk access. However, the default runs the load in logged mode, which causes the log to grow rapidly for large volumes of data.

To achieve a non-logged operation, the target table must not be replicated (the replication log reader needs the log records to relay the changes made). The database holding the target table must also have its SELECT INTO/BULK COPY option set, and finally, the TABLOCK hint must be specified.

NOTE

Remember that setting the SELECT INTO/BULK COPY option disables the capability to back up the transaction log until a full database backup has been performed. Transaction log dumps are disabled because if the database had to be restored, the transaction log would not contain a record of the new data.

Although you can still perform fast loads against tables that have indexes, it is advisable to drop and re-create the indexes after the data transfer operation is complete. In other words, the total load time includes the loading of the data and the index creation time. If there is existing data in the table, the operation will be logged; you achieve a non-logged operation only if the table is initially empty. Generally, you get at least a 50 percent drop in transfer speed if the table has an index. The more indexes, the greater the performance degradation. This is due to the logging factor; more log records are being generated, and index pages are being loaded into the cache and modified. This can also cause the log to grow, possibly filling it (depending on your log file settings).

NOTE

Even the so-called non-logged operation logs some things. In the case of indexes, index page changes and allocations are logged, but the main area of logging is of extent allocations every time the table is extended for additional storage space for the new rows.

Batches

By default, BCP puts all the rows that are inserted into the target table into a single transaction (as seen in Figure 20.4). BCP calls this a "batch." This arrangement reduces the amount of work that the log must deal with; however, it locks down the transaction log by keeping a large part of it active, which can make truncating or backing up the transaction log impossible or unproductive.

Figure 20.4. Default transaction behavior of BCP.

graphics/20fig04.gif

By using the BCP batch ( -b ) switch, you can control the number of rows in each batch (or effectively, each transaction). This switch controls the frequency of commits; although it can increase the activity in the log, it enables you to trim the size of the transaction log. You should tune the batch size in relation to the size of the data rows, transaction log size, and total number of rows to be loaded. The value you use for one load might not necessarily be the right value for all other loads.

As you can see in Figure 20.5, a batch switch of 500 is specified for the customer table load. This creates committed milestones during the load at the end of each 500 rows inserted.

Figure 20.5. Changing the Transaction behavior of BCP with the `“b` switch.

graphics/20fig05.gif

Note that if the third batch fails, the first and second batches are committed, and those rows are now part of the table. However, any rows copied up to the point of failure in the third batch are rolled back.

Parallel Loading

A great enhancement of BCP is that of being able to do parallel loads of tables. To take advantage of this feature, the following must be true:

The bulk-copy operation must be non-logged; all requirements specified in the previous discussion on non-logged operations must be met.
There must be no indexes on the target table.

Only applications using the ODBC or SQL OLE DB “based APIs can perform parallel data loads into a single table.

The procedure is straightforward. After you have ascertained that the target table has no indexes (which could involve dropping primary or unique constraints) and is not being replicated, you must set the database option SELECT INTO/BULK COPY to true . The requirement to drop all indexes comes from the locking that has to occur to load the data. Although the table itself can have a shared lock, the index pages are an area of contention that prevents parallel access.

Now all that is required is to set up the parallel BCP loads to load the data into the table. You can use the -F and -L switches to specify the range of the data you want each parallel BCP to load into the table if you are using the same data file. These switches remove the need to manually break up the file. Here is an example of the command switches involved for a parallel load with BCP for the customers table:

 bcp "northwind..customers"in customers.dat -T S servername -c -F 1         -L 10000 -h "TABLOCK" bcp "northwind..customers"in customers.dat -T S servername -c -F 10001        -L 20000 -h "TABLOCK"

The TABLOCK hint ( -h switch) provides better performance by removing contention from other users while the load takes place. If you do not use the hint, the load will take place using row-level locks, which will be considerably slower.

SQL Server 2000 allows parallel loads without impacting performance by making each BCP connection create extents in non-overlapping ranges. The ranges are then linked into the table's page chain.

After the table is loaded, it is also possible to create multiple nonclustered indexes in parallel. If there is a clustered index, do this one first, followed by the parallel nonclustered index created.

Supplying Hints to BCP

The SQL Server 2000 version of BCP comes with the capability to further control the speed of data loading and to invoke constraints and have insert triggers fired during loads by using something called hint switches ( -h ). One or more hints can be specified at a time:

 -h "  hint  [,  hint  ]"

This option cannot be used when bulk-copying data into versions of SQL Server before version 7.0 because, starting with SQL Server 7.0, BCP works in conjunction with the query processor. The query processor optimizes data loads and unloads for OLE database rowsets that the latest versions of BCP and BULK INSERT can generate.

The `ROWS_PER_BATCH` Hint

The ROWS_PER_BATCH hint is used to tell SQL Server the total number of rows in the data file. This hint helps SQL Server to optimize the entire load operation. This hint and the -b switch heavily influence the logging operations that occur with the data inserts . If you specify both this hint and the -b switch, they must have the same values or you will get an error message.

When you use the ROWS_PER_BATCH hint, you copy the entire resultset as a single transaction. SQL Server automagically optimizes the load operation using the batch size you specify. The value you specify does not have to be accurate, but you should be aware of what the practical limit will be, based on the database's transaction log.

TIP

Do not be confused by the name of the hint. You are specifying the total file size and not the batch size (as is the case with the -b switch).

The `CHECK_CONSTRAINTS` Hint

The CHECK_CONSTRAINTS hint controls whether check constraints are executed as part of the BCP operation. With BCP, the default is that check constraints are not executed. This hint option allows you to turn the feature on (to have check constraint be executed for each insert). If you do not use this option, you either should be very sure of your data or should rerun the same logic as the check constraints you deferred after the data has been loaded.

The `FIRE_TRIGGER` Hint

The FIRE_TRIGGER hint controls whether the insert trigger on the target table will be executed as part of the BCP operation. With BCP, the default is that no triggers are executed. This hint option allows you to turn the feature on (to have insert triggers be executed for each insert). As you can imagine, when this option is used, it slows down the BCP load operation. However, the business reasons to have the insert trigger fired might outweigh the slower load.

The `ORDER` Hint

If the data you want to load is already in the same sequence as the clustered index on the receiving table, you can use the ORDER hint. The syntax for this hint is shown here:

  ORDER({    column    [ASC   DESC] [,...    n    ]})

There must be a clustered index on the same columns , in the same key sequence as you specify in the ORDER hint. Using a sorted data file (in the same order as the clustering index) helps SQL Server place the data into the table with minimal overhead.

The `KILOBYTES_PER_BATCH` Hint

The KILOBYTES_PER_BATCH hint gives the size, in kilobytes, of the data in each batch. This is an estimate and is used internally by SQL Server to optimize the data load and logging areas of the BCP operation.

The `TABLOCK` Hint

The TABLOCK hint is used to place a table-level lock for the BCP load duration. This hint gives you increased performance at a loss of concurrency as described in "Parallel Loading," earlier in this chapter.

NOTE

NOTE

Batches

Figure 20.4. Default transaction behavior of BCP.

Figure 20.5. Changing the Transaction behavior of BCP with the “b switch.

Parallel Loading

Supplying Hints to BCP

The ROWS_PER_BATCH Hint

TIP

The CHECK_CONSTRAINTS Hint

The FIRE_TRIGGER Hint

The ORDER Hint

The KILOBYTES_PER_BATCH Hint

The TABLOCK Hint

Figure 20.5. Changing the Transaction behavior of BCP with the `“b` switch.

The `ROWS_PER_BATCH` Hint

The `CHECK_CONSTRAINTS` Hint

The `FIRE_TRIGGER` Hint

The `ORDER` Hint

The `KILOBYTES_PER_BATCH` Hint

The `TABLOCK` Hint