Bulk Insert Task | Wireless Java : Developing with Java 2, Micro Edition

The Bulk Insert task allows you to insert data from a text file (also called a flat file) into a SQL Server in the same manner as a BULK INSERT statement or the bcp.exe command-line tool. This task is in the controller flow only, and it does not generate a data flow. A disadvantage to the task is that it does not allow any transformations to occur to the data in flight. In exchange for this, you have the fastest way to load a lot of data into a SQL Server since it's done in a bulk fashion.

When you create a Bulk Insert task and go to configure it, the Bulk Insert Task Editor will open (shown in Figure 3-2). As in most tasks, the General page allows you to name and describe the task. Make sure you name it something that describes its unit of work, like "Prepare Staging." This will help you later when you deploy the package and troubleshoot problems.

image from book
Figure 3-2

The most important page, which is shown in Figure 3-2, is the Connection page. This page lets you specify the source and destination for the data. Select the destination from the Connection drop-down box in the Destination Connection group. Next, specify a destination table from the next drop-down box below the destination connection. While you're specifying connections, also specify the source connection's file name in the File drop-down box. Both the source and destination connections use the Connection Manager. If you haven't already created the shared connections, you'll need to create a new one from the source or destination connection drop-down box by selecting <New Connection…>.

After you specify the connections, you can set what type of file you're importing and how it's delimited. You can select Use File or Specify from the format drop-down box. If you select Use File, you'll have to select a Bulk Insert format file, which tells the task how the file is formatted (text and row delimiters). Generally, you're going to want to specify a file format by selecting Specify from the Format drop-down box. The Specify option gives you the control to specify what the delimiters are. For the most part, the default delimiter options are perfect for specifying a file format. One that you may have to change on a regular basis is the column delimiter, which tells the task what character separates each column. The default here is tab-delimited, but another common one you'll see is comma-delimited.

Note

The format file must be on the SQL Server, since the operation occurs there when a bulk insert task is used. You may also wish to use a UNC path (\\MachineName\ShareName\FileName.csv) if the source or format file is not going to be on the executing server.

In the Options page of the Bulk Insert Task Editor, you'll be able to use some lesser-known options. Here you can specify the code page for the source file. You will rarely want to change the code page from RAW, which is the default. The DataFileType option can specify what type of file the source file is. Options here include char, native, widechar, and widenative. Generally, files you receive will be the default char option, but in some cases you may see a file with native format. You'll see this if the file was created from an instance of SQL Server by using the bcp.exe program with the –n (native) switch.

In the Options page, you can also specify the first and last row to copy if you'd like only a sampling of the rows or if you want to skip a header row. The BatchSize option shows how many records will be written to SQL Server before committing the batch. If you have a BatchSize of 0 (the default), this means that all the records will be written to SQL Server in a single batch. If you have more than 100,000 records, then you may want to adjust this setting to 50,000 or adjust based on your need. The adjustment may vary based on the width of your file.

The options drop-down box contains five options that you can turn off and on.

Check Constraints: The option that checks table and column constraints before committing the record. This option is the only one turned on by default.
Keep Nulls: By selecting this option, the Bulk Insert task will replace any empty columns in the source file with NULLs in SQL Server.
Enable Identity Insert: Enable this option if your destination table has an identity column that you're inserting into. Otherwise, you will receive an error.
Table Lock: This option creates a SQL Server lock on the target table from insert and updates other than the records you're inserting. This option will speed up your process but may cause a production outage since others will be blocked from modifying the table. If you check this option, SSIS will not have to compete for locks to insert massive amounts of data into the target table. Set this option only if you're certain that no other process will be competing with your task for table access.
Fire Triggers: By default, the Bulk Insert task will ignore triggers for maximum speed. By checking this option, the task will no longer ignore triggers and will fire the insert triggers for the table you're inserting into.

There are a few other options you can set in the Options page. The SortedData option specifies what column you wish to sort by while inserting the data. This option defaults to nothing, which means false. You'll almost never have a need to set this option, but if you do, type the column name that you wish to sort by. The MaxErrors option specifies how many errors are acceptable before the task is stopped with an error. Each row that does not insert is considered an error; by default, if a single row has a problem, the entire task fails.

Note

The Bulk Insert task does not log error-causing rows. If you want your bad records to be written to an error file or table, it's better to use the Data Flow task.