Chapter 15: Working with Operational Data


All good things come in threes. You have the three amigos, the three stooges, the three musketeers, and so on. SQL has its famous trio, too: INSERT, DELETE, and UPDATE. I often remark to my clients that you cannot expect a database management system to work for you if you do not insert data into the system. Without these three members of SQL’s data manipulation language (aka data modification language), DML, there is no way to get data into your database, to modify records, or to delete them.

The SELECT statement I discussed in the previous chapter is also considered a member of SQL's data "DMLition" team, but we were able to chew on SELECT, and its new collateral features, because we already have data to work with in the old pubs, Northwind, or AdventureWorks demo databases, which you can download from Microsoft or upgrade from SQL Server 2000. However, the three statements I will explore in this chapter often need a fourth member of the posse, in the form of SELECT, to take on some heavyweight data-crunching problems, and I will discuss this as well.

Understanding the DML three is essential before you tackle extract, transform, and load (ETL) projects, online analytical processing (OLAP), or any online transaction processing (OLTP). In this chapter, I will go over all three statements in detail, discussing the basics of their usage in T-SQL and some new and advanced features for accomplished DBAs and developers.

INSERT

The INSERT statement is straightforward. In its most basic form, it looks like this:

 INSERT Orders VALUES (value)

The target table follows the INSERT keyword, and in the preceding example that table is Orders. After the table, you need to add the VALUES keyword (case doesn't matter), which is then followed by a list of values. In the preceding example, there is only one value, but if the table you were to insert into had several columns, you would need to provide a comma-separated list of values, one for each column you had a value for. If you didn't have values for every column, the unspecified columns would have to accept default values or allow NULL values. Consider the following example, and imagine only two columns to keep it simple for now:

 INSERT Orders VALUES (5, 'Cat Food')

The VALUES list, which is enclosed in the parentheses (required), represents each column in the table starting with the first. Here, the value 5 of type integer is inserted into column 1 (or zero if you are looking at it from a zero-based result set), while the second value, of a character type, is placed into the second column.

Now go back to your imagination, add a few more columns, and suppose you want to install values into the "Amount" and "Description" columns, both of which sit somewhere in the middle of a multicolumn table. Your INSERT code will now look like this:

 INSERT Orders (Column2, Column4) VALUES (5, 'Cat Food')

Here, Column2 and Column4 are placeholder names for the columns in those positions: the integer value of 5 is inserted into the second column, while the character (description) value of "Cat Food" is inserted into the fourth column. In a real table, you identify the columns by their actual names, like this:

 INSERT Orders (Amount, Description) VALUES (5, 'Cat Food')

Supplying the column names is essential whenever the values you provide are not in the table's column order or do not cover every column. The following code demonstrates switching the order of the columns in the INSERT statement:

 INSERT Orders (Description, Amount) VALUES ('Cat Food', 5)

Is this easy enough for you? Sit tight, we are still at T minus 10 and counting. Before going further, let's look over the cryptic syntax comprising the INSERT statement (the hint section is optional, and you will probably not use hints for 99.9 percent of your insert operations; after all, hints are only for very accomplished developers):

 [ WITH <common_table_expression> [ ,...n ] ]
 INSERT
     [ TOP ( expression ) [ PERCENT ] ]
     [ INTO ]
     { <object> | rowset_function_limited
       [ WITH ( <Table_Hint_Limited> [ ...n ] ) ]
     }
 {
     [ ( column_list ) ]
     [ <OUTPUT Clause> ]
     { VALUES ( { DEFAULT | NULL | expression } [ ,...n ] )
     | derived_table
     | execute_statement
     }
 }
 | DEFAULT VALUES
 [ ; ]

 <object> ::=
 {
     [ server_name . database_name . schema_name .
       | database_name . [ schema_name ] .
       | schema_name .
     ]
         table_or_view_name
 }

So where did the INTO keyword spring from? I delayed mentioning it because the INTO is optional and belongs to the ANSI SQL specification. If you want to keep your code as standard as possible, you can include the INTO keyword, but it makes no difference to the SQL parser in SQL Server.
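
For example, the following two statements are interchangeable (assuming the simple two-column Orders table from the earlier examples):

 INSERT Orders (Amount, Description) VALUES (5, 'Cat Food')
 INSERT INTO Orders (Amount, Description) VALUES (5, 'Cat Food')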

I have already discussed the table name, but as the syntax notes, you can add the optional hint directive to coerce SQL Server to optimize in a user-defined way.

The T-SQL syntax shows that you can also insert into views, into the table data type, and through the new common table expression (CTE) introduced with the WITH directive. You can just as easily insert into temporary tables, designated as temporary by the # (pound or hash) sign. For example, the following code:

 INSERT #Orders1 (Amount, Description) VALUES (5, 'Cat Food')

inserts the data into a temporary table created earlier. Or your recipe might need a temporary table to be created on the fly, using DDL and DML in the middle of a stored procedure perhaps, to cause a result set to persist for the duration of the connection:

 CREATE TABLE #Orders1
 (Amount int DEFAULT 0,
  Description varchar(50),
  Notes varchar(max) NULL)

 INSERT #Orders1 (Amount, Description, Notes)
 VALUES (5, 'Cat Food', NULL)

I have touched a little here on the idea that you will often need to work with default and NULL values. You will also find yourself working with identity column values and columns in which SQL Server automatically adds the value, such as columns of the timestamp data type. We will deal with default, auto, and NULL values in a moment.

The view_name argument implies that you can insert data into a view, and you would insert rows into a view just as you would insert rows into a table or a temporary table. The new table data type, however, presents an interesting new addition to the row receptacles now supported by T-SQL.
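
As a quick sketch of the table data type as a row receptacle (the variable name and columns here are purely for illustration):

 DECLARE @OrderItems table (Amount int, Description varchar(50))

 INSERT @OrderItems (Amount, Description) VALUES (5, 'Cat Food')

 SELECT * FROM @OrderItems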

The following list shows the various forms in which INSERT can be used:

  • INSERT with NULL, default, and system-supplied values

  • INSERT with @@IDENTITY

  • Using INSTEAD OF triggers on INSERT actions

  • Using OPENROWSET and BULK to bulk-load data

  • Inserting data that is not in the same order as the table columns

  • Inserting data with fewer values than columns

  • Inserting data into a table with an identity column

  • Inserting data into a uniqueidentifier column by using NEWID()

  • Inserting data into a table through a view

  • Inserting data using the SELECT and EXECUTE options

  • Inserting data by using the TOP clause

  • Using OUTPUT with an INSERT statement

  • Using WITH common table expression with an INSERT statement

  • Using OUTPUT with identity and computed columns

INSERT with NULL, Default, and System-Supplied Values

When you work with tables that have more columns than you are providing values for in your INSERT statement, you need to either provide default values for the unspecified columns or allow them to accept NULL.

If your solution requires you to keep NULL out of the tables, you will have to use default values. These values must correspond to the data type accepted by the column, be implicitly convertible by SQL Server, or be explicitly converted using CAST or CONVERT (see Chapter 16). You can also create default objects that have a value property and supply the object and owner name as the default's parameter.
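
As a minimal sketch, assuming a hypothetical Orders2 table: the first insert lets the Amount column pick up the default defined by its constraint, and the second explicitly converts a character value before it reaches the integer column:

 CREATE TABLE Orders2 (Amount int DEFAULT 0, Description varchar(50))

 INSERT Orders2 (Description) VALUES ('Cat Food')      -- Amount falls back to its default of 0
 INSERT Orders2 (Amount, Description) VALUES (CAST('5' AS int), 'Cat Food')  -- explicit conversion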

If you do not specify the default, the insert may still succeed as long as the column accepts NULL. If the column is defined to refuse NULL, the insert will fail. In this example, which inserts into the temporary table we created earlier, the DEFAULT keyword installs the default value of zero for the number of cans of cat food:

 INSERT #Orders1 (Amount, Description, Notes) VALUES (DEFAULT, 'Cat Food', NULL)

You must remember that INSERT means that you are inserting a new row into the table (actually a new row is appended) and providing data for the column values incorporated by your new row. So constraints and triggers will fire accordingly for the table, and any relationships between the target for the insert and other tables (primary and candidate keys) need to be taken into account (more about that shortly).

There are two additional circumstances in which SQL Server will automatically install values into columns. These circumstances are as follows:

  • The table includes an identity column. SQL Server will automatically install the appropriate value.

  • The column takes a timestamp value. SQL Server installs the timestamp automatically.

Interestingly, however, you can override the identity column value and explicitly provide a value for the identity column as long as you first invoke the IDENTITY_INSERT option. This is done using a SET statement, as follows:

 SET IDENTITY_INSERT #Orders1 ON

 INSERT #Orders1 (ItemNumber, Amount, Description, Notes)
 VALUES (1234, DEFAULT, 'Cat Food', NULL)

 SET IDENTITY_INSERT #Orders1 OFF

Don't forget to set IDENTITY_INSERT back to OFF after the inserts complete, or the next time you run the same statement, SQL Server will pop back an error telling you that IDENTITY_INSERT is already ON.

Caution 

If you explicitly force a user-defined value into the identity column, you need to be sure that you are not going to violate integrity.

INSERT and @@IDENTITY

A useful function is @@IDENTITY, which you can call after an INSERT, SELECT INTO, or BULK INSERT (discussed shortly), or after a bulk copy operation. This function returns the last identity value generated by an insert operation, as described earlier.

An identity value is not generated if the statement fails and no tables are affected, in which case @@IDENTITY returns NULL. The value returned is always the value generated for the most recent insertion, and therefore, when you insert multiple rows, @@IDENTITY returns the last identity value generated.

The @@IDENTITY value is generated by the system for your operation (as the identity provided to the last insert). Even if your statement causes one or more triggers to perform insert operations that generate identity values, calling @@IDENTITY immediately after the statement will return the last identity value generated by the last trigger. You can then use the value as needed by the application. In the following example, I simply select the identity value and use it as an item number:

 INSERT ITEMS (Item)
 VALUES ('Calamari')
 SELECT @@IDENTITY AS 'New Menu Item'

SCOPE_IDENTITY() returns the value only within the current scope, while @@IDENTITY is not limited to a specific scope. IDENT_CURRENT() is not connected to any scope or session; it returns the identity value of a table name that you must pass as an argument of the function. Consider the following table:

Identity Function       What Identity It Returns
@@IDENTITY              The last identity value generated on your connection
SCOPE_IDENTITY()        The last identity value generated in the current scope
IDENT_CURRENT('T')      The last identity value generated for table T

Tip 

You can quickly tell the built-in functions, such as SCOPE_IDENTITY(), apart from system variables such as @@IDENTITY, because the functions are not prefixed with the double at sign (@@).

To see this working, knock up the following code and run it in a query window against a demo or lab database:

 CREATE TABLE T1 (ItemID int IDENTITY)
 CREATE TABLE T2 (ItemID int IDENTITY)
 GO
 CREATE TRIGGER Trig1 ON T1 FOR INSERT
 AS
 BEGIN
     INSERT T2 DEFAULT VALUES
 END
 GO
 INSERT T1 DEFAULT VALUES
 SELECT @@IDENTITY
 SELECT SCOPE_IDENTITY()
 SELECT IDENT_CURRENT('T1')
 SELECT IDENT_CURRENT('T2')

Using NEWID()

If you need to provide a row with a unique identifier, NEWID() will do the trick. Simply call the function and assign the returned GUID to a variable of type uniqueidentifier, as follows:

 DECLARE @MYID uniqueidentifier
 SET @MYID = NEWID()
 SELECT @MYID
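
You can also install the GUID directly into a uniqueidentifier column, or let the column default to NEWID(); here is a short sketch using a hypothetical Agents table:

 CREATE TABLE Agents (AgentID uniqueidentifier DEFAULT NEWID(), AgentName varchar(50))

 INSERT Agents (AgentName) VALUES ('Smith')                   -- AgentID defaults to a new GUID
 INSERT Agents (AgentID, AgentName) VALUES (NEWID(), 'Jones') -- or supply the GUID explicitly

 SELECT * FROM Agents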

INSERT with SELECT

Did I not tell you that SELECT will meet up with us here? SELECT can be used in the INSERT statement for a number of useful functions. The primary reason you would toss a SELECT into the INSERT statement is to gather data from one table and insert it into another. That can be easily achieved with the following statement (I have added cut lines to emphasize the inclusion of SELECT):

 CREATE TABLE #Orders1
 (OrderItem int, CustID varchar(20))
 --8<----add the insert/select in here------------------------
 INSERT #Orders1
 SELECT OrderItem, CustID FROM Orders
 --8<----------------------------------------------------------
 SELECT * FROM #Orders1

The INSERT...SELECT code between the cut lines tells SQL Server to insert into the temporary table #Orders1 the result set from the ensuing SELECT statement. The SELECT statement can be as complex as you need it to be to return the desired result set for insertion into your table or table variable. It can even be the result of a JOIN from hell.

Note 

INSERT...SELECT is very similar to the SELECT INTO statement discussed in the preceding chapter.

You can also follow your INSERT with a SELECT to return the result set of the last insertion, or a specific selection of rows. For example, on an order entry system it is useful to insert a new order into the Orders table and then pull a result set back to the client with a SELECT in the same connection.

The second SELECT in the preceding code does exactly that. The result set of the first SELECT is not returned to the client, because it gets inserted into the #Orders1 table. The second SELECT returns a result set to the client. You can obviously include a sophisticated search condition to return an updated result set that was narrowed down to the rows of interest to the client. The preceding code works for the temporary table because it persists for the connection. If you ended your session from SQL Server, you would not be able to SELECT from the same temporary table because SQL Server would have purged it from tempdb.

It goes without saying that you can also include the optional TOP clause in the SELECT statement

 CREATE TABLE #Orders1
 (OrderItem int, CustID varchar(20))
 GO
 INSERT #Orders1
 SELECT TOP 10 OrderItem, CustID FROM Orders
 GO
 SELECT * FROM #Orders1

but you might need the extra garnish like ORDER BY to achieve the desired result set.
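
For instance, to make the TOP 10 deterministic rather than an arbitrary ten rows, order the source rows before TOP is applied. This sketch assumes the Orders table carries an OrderDate column:

 INSERT #Orders1 (OrderItem, CustID)
 SELECT TOP 10 OrderItem, CustID
 FROM Orders
 ORDER BY OrderDate DESC   -- the ten most recent orders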

INSERT with EXECUTE

INSERT...EXECUTE works like INSERT...SELECT. EXECUTE fires a stored procedure, function, or SQL statement that returns a result set for insertion into the target table or table variable. In many respects, very little is different between the two INSERT extensions: the result set that comes back for insertion is ultimately derived from a SELECT, whether it is buried inside a stored procedure, an extended stored procedure, a function, or the primary code.

The syntax is also straightforward, as follows:

 INSERT [ INTO ]
 { table_name WITH ( <table_hint_limited> [ ...n ] )
   | view_name
   | rowset_function_limited
 }
     execute_statement

Your EXECUTE statement can either call the procedure or provide an inline SQL statement. For example, the little statement

 INSERT NewItems (Items) EXEC ('SELECT * FROM OldItems') 

copies all of the rows in OldItems and inserts them into NewItems. If you use a complete SQL statement as demonstrated earlier, remember to enclose the statement between single quotes (many of us forget that). You do not need the quotes when executing a proc or function, as follows:

 INSERT CurrentAgents (Agent) EXEC sp_getagents

Note 

The system stored procedure sp_executesql may be used instead of EXECUTE (and temporary stored procedures) to make code easier to read and improve query performance (See “Parameterized Queries” later in this chapter).
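
A minimal sketch of the parameterized flavor, using the hypothetical NewItems and OldItems tables from above (and assuming OldItems carries a Category column):

 INSERT NewItems (Items)
 EXEC sp_executesql
     N'SELECT Items FROM OldItems WHERE Category = @Cat',
     N'@Cat varchar(20)',
     @Cat = 'Seafood'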

INSERT with DEFAULT VALUES

Inserting with default values is the easiest of the INSERT statements because you do not need to specify columns or values, relying on SQL Server to install defaults as defined for the column or as set up in the default objects. Here’s the syntax:

 INSERT [ INTO ]
 { table_name WITH ( <table_hint_limited> [ ...n ] )
   | view_name
   | rowset_function_limited
 }
 DEFAULT VALUES

Naturally, running this query against a table that does not yet have defaults established will either cause it to fail or install NULLs if the columns are set to accept them. You probably also noticed that no column list appears here; the INSERT will fail with an error if you supply one, even if it names every column in the target table.
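
A quick sketch in the style of the earlier temporary tables; every column either has a default or accepts NULL, so DEFAULT VALUES succeeds:

 CREATE TABLE #Orders2 (Amount int DEFAULT 0, Description varchar(50) NULL)

 INSERT #Orders2 DEFAULT VALUES

 SELECT * FROM #Orders2   -- returns one row: 0, NULL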

Keeping Tabs on Errors

The ability to report errors in the middle of your INSERT statements is useful, especially when you run a batch of INSERT statements. The following code checks for an error and switches out of the INSERT batch to an error handler, VB style:

 INSERT BoundToFail (PK_Column) VALUES ('DuplicateValue')
 IF (@@ERROR <> 0) GOTO MAKEITBETTER

 MAKEITBETTER:
     EXEC DoSomethingPlease

As a matter of interest, several INSERTs in a batch are isolated from each other with respect to errors. When an INSERT in a batch fails, the remaining INSERTs will continue unless you trap the error and stop the remainder of the batch from executing.

This error isolation between INSERT statements is useful for inserting bulk data into a table where data is drawn from several sources in which duplicate records might exist. Inserting addresses for a mailing list company or managing e-mail addresses from a Web site are good examples of where duplicate records are bound to exist. You can thus build a loop that contains an INSERT that installs data into a table without first checking for duplicate rows. A unique key constraint, even a primary key on your new table, causes each duplicate insertion to error out and the offending row to be discarded, but the loop continues inserting until all source data is exhausted. I have done stuff like this often (even with string lists in Delphi, VB, or Java), using exceptions to achieve a similar result.
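
A rough sketch of the idea, using hypothetical StagingEmails and MailingList tables; the unique constraint on MailingList rejects each duplicate while the loop simply moves on to the next row:

 DECLARE @Email varchar(100)
 DECLARE EmailCursor CURSOR FOR SELECT Email FROM StagingEmails

 OPEN EmailCursor
 FETCH NEXT FROM EmailCursor INTO @Email
 WHILE @@FETCH_STATUS = 0
 BEGIN
     INSERT MailingList (Email) VALUES (@Email)   -- MailingList.Email has a unique constraint
     IF @@ERROR <> 0
         PRINT 'Duplicate skipped: ' + @Email     -- error trapped; the loop carries on
     FETCH NEXT FROM EmailCursor INTO @Email
 END
 CLOSE EmailCursor
 DEALLOCATE EmailCursor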

Tip 

The TRY...CATCH construct, new in SQL Server 2005, is recommended for error handling in new code.
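
A minimal sketch of the same check recast with TRY...CATCH, reusing the hypothetical BoundToFail table:

 BEGIN TRY
     INSERT BoundToFail (PK_Column) VALUES ('DuplicateValue')
 END TRY
 BEGIN CATCH
     SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage
 END CATCH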

BULK INSERT

I can guarantee that, as a SQL Server DBA or SQL Server developer, you will have your chance to insert what seems to be a bazillion records into a table. Typical scenarios I hinted at earlier include bulk-inserting mailing lists and e-mail addresses culled from Web sites, moving operational data into analytical data stores, and so on (the latter is becoming less of an issue thanks to SQL Server's new support for Internet applications). I also predict that bulk insert and bulk copy operations will catch fire as DBAs rush to get data out of their inferior DBMS products and into SQL Server 2005. The BULK INSERT statement is a T-SQL front end to the command-line bulk copy program, bcp.exe.

Note 

SQL Server Integration Services (SSIS) can be used to import and export data. SSIS is discussed briefly in Chapter 7.

But there will be many other occasions when you will need to use BULK INSERT rather than a command-line utility. My data center work in large Fortune 500 companies would often include receiving banking transactions from the several banks the companies used. The data represented, for example, direct deposit information, and records listing which checks had cleared the bank accounts.

I would typically download each day humongous text files containing tens of thousands of lines of information from as many as ten major banks. This information comes as delimited text files, and the data has to be inserted into the database so that the financial analysts can keep tabs on the company's financial health ... thanks to that persistent bug that plagues our existence on this earthbound plane, affectionately known as cash flow.

The BULK INSERT is ideal for such data loading. You might run your script in a query window in Management Studio, build a GUI front end, or create a service, as I did to automatically download and insert the data every day. The syntax of your T-SQL looks like this:

 BULK INSERT
    [ database_name . [ schema_name ] . | schema_name . ]
    [ table_name | view_name ]
    FROM 'data_file'
    [ WITH
     (
    [ [ , ] BATCHSIZE = batch_size ]
    [ [ , ] CHECK_CONSTRAINTS ]
    [ [ , ] CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } ]
    [ [ , ] DATAFILETYPE =
       { 'char' | 'native' | 'widechar' | 'widenative' } ]
    [ [ , ] FIELDTERMINATOR = 'field_terminator' ]
    [ [ , ] FIRSTROW = first_row ]
    [ [ , ] FIRE_TRIGGERS ]
    [ [ , ] FORMATFILE = 'format_file_path' ]
    [ [ , ] KEEPIDENTITY ]
    [ [ , ] KEEPNULLS ]
    [ [ , ] KILOBYTES_PER_BATCH = kilobytes_per_batch ]
    [ [ , ] LASTROW = last_row ]
    [ [ , ] MAXERRORS = max_errors ]
    [ [ , ] ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) ]
    [ [ , ] ROWS_PER_BATCH = rows_per_batch ]
    [ [ , ] ROWTERMINATOR = 'row_terminator' ]
    [ [ , ] TABLOCK ]
    [ [ , ] ERRORFILE = 'file_name' ]
     ) ]

The arguments required are all documented in Books Online, but the text is as sterile as moon rock. You will find further hands-on information here on several important arguments.

Target Databases and Tables

The first arguments, before the FROM, specify the database name and owner in the dot notation qualifier style of database.owner.object, also known as a fully qualified table name (FQTN). Then you need to specify the target table for the data. If you leave out the database details, the current database is assumed. You need to make sure the current database is going to accommodate the insert, or it will fail. If you do not specify the owner, and the login executing the script is not the owner, SQL Server will cancel the operation.

Source of Data

The FROM clause specifies the source data file. This is the text file containing the data you wish to load into the database. This file can reside anywhere on your network. Obviously if it is local to the computer on which your script will run, performance will be much better because data transfer across the bus is faster by an order of magnitude over a 100/1000Mbit network being shared by everyone in the company. If the files you are going to work with are huge, you might not have any choice but to source them on the local drive. Some of my bulk inserts take place across the network, however, and they are usually small files that transfer a few thousand rows every day. This should thus be a daily chore so that data does not pile up on the network and become impossible to work with.

The location of the database server is also important. Obviously you are back to square one if you locate the data on the same server as the script and the server is located out on the network. Better to run everything on one machine, or at least from separate drives (see Chapter 6). If you absolutely have no choice but to process the raw data from a network location, then the source file needs to be specified using the UNC path or a drive mapping.

BATCHSIZE

The BATCHSIZE parameter is an important variable. This number specifies the number of rows in each batch of the BULK INSERT operation; each batch is copied to the server and committed as a single transaction. Well, let me tell you that the batch size you choose depends on a number of factors. For starters, the size of the data file needs to be taken into consideration. The network latency and both network and local host bandwidth discussed earlier also need to be taken into account. These factors consume resources, and if you do not work out specifically how many resources are consumed by the processing of the rows, you are likely to run out of resources and cause the bulk insert to fail, especially if server resources are already borderline.

The bad news is that when you fail in the middle of a bulk insert, the entire batch is lost because SQL Server sees the disconnection in the middle and rolls back the entire batch. So it would be crazy to try to load a bazillion rows in one iteration of the BULK INSERT statement unless you are sure your data is highly refined and you have gobs of RAM and fast hard disks. If you need to insert half a million rows, specify the batch size to be a small number that’s chosen in relation to the resources and the environment. Memory is released after each batch is processed, so if you loop through the BULK INSERT script at a hundred or so rows at a time, you will be sure that, if the process crashes or fails for some reason, you have only lost the current batch. Believe me, there is nothing worse than reaching row 999,999 in the Godzilla of bulk inserts only to blow the whole caboodle away on the last record.

After each batch of rows is installed, you can test for errors and rows inserted and so forth. You should also code a routine that checks the last row that was committed to the database, and you’ll need some means of identifying where in the raw data file the bulk insert should begin. If you have 900,000 rows processed and the next 1,000 rows fail, it does not make sense to reprocess the whole file all over again, and force SQL Server to manage duplicates again, and so on.

The BATCHSIZE argument is directly related to the optional ROWS_PER_BATCH argument, which applies only if you do not specify BATCHSIZE. I prefer to specify BATCHSIZE, because ROWS_PER_BATCH causes SQL Server to try to eat the entire data file in one sitting (and available resources come into play again). If you specify neither, SQL Server makes its own optimization choices, but it still has no control over disasters like a network crash, which would still cause the loss of the batch being processed.
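
For example, to commit the file in 1,000-row batches rather than as one giant transaction, the statement might look like the following sketch (AccPay is a placeholder table name; the file path follows the book's other examples):

 BULK INSERT Reconciliation.dbo.AccPay   -- AccPay is a hypothetical target table
   FROM '\\SERV04\FTP\ACCPAY.TXT'
   WITH
   (
     FIELDTERMINATOR = ';',
     ROWTERMINATOR = ';\n',
     BATCHSIZE = 1000
   )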

FIELDTERMINATOR

The FIELDTERMINATOR argument is essential to SQL Server so that it knows how to find and separate out the fields or columns in your data. If you do not specify the terminator, SQL Server will look for tabs, which are the default. As you know, there are several characters you can use to separate out the fields. The most common are commas, quotes, semicolons, and tabs. What you specify here depends on the source data file. If you have control over the process that created the source file (such as SSIS or another system that outputs the data for you), then you could simply specify tabs and leave the argument to the SQL Server default. Often you have no choice, but it helps to ask the gurus in control of the output (with me it was the batch processing people at the banks) to give you the terminator of your choice, not that it makes any difference to SQL Server what you throw at it.

Caution 

Do not forget to enclose the terminator values between single quote marks.

 BULK INSERT Reconciliation.dbo
   FROM '\\SERV04\FTP\ACCPAY.TXT'
   WITH
   (
     FIELDTERMINATOR = ';',
     ROWTERMINATOR = ';\n'
   )

FIRSTROW/LASTROW

The FIRSTROW parameter is useful but only if you are keeping track of the contents of your file. You can easily create a counter in a front-end program or in T-SQL to keep track of the row number last processed by SQL Server, at the end of each successful batch, in the source data file. Then if you have to stop the process and restart it at a later time, you start the batch from the last row installed to the database. The LASTROW argument tells SQL Server the row number to stop at. After the row number specified, the processing terminates. The default is 0 if you omit the parameter, in which case SQL Server will run until there are no more rows left in the batch.

 BULK INSERT Reconciliation.dbo
   FROM '\\SERV04\FTP\ACCPAY.TXT'
   WITH
   (
     FIRSTROW = 9875,
     FIELDTERMINATOR = ';',
     ROWTERMINATOR = ';\n',
     LASTROW = 20000
   )

MAXERRORS

The MAXERRORS parameter is important too, especially if you use the bulk insert with unique constraints or with a primary key index on the table. If you omit the parameter, SQL Server assumes the default, which is 10 errors. After the MAXERRORS value has been reached, SQL Server terminates the processing. So if you are importing two million e-mail names and addresses, 100,000 of them are duplicates, and MAXERRORS is set high enough, SQL Server will keep going until it has seen all 100,000 errors. On the flip side of the coin, you may not be prepared to wait around for the processing to finish if you have errors and the maximum error limit is set too high. Set it at 1 if you want the batch processing to end after the first hiccup.

 BULK INSERT Reconciliation.dbo
   FROM '\\SERV04\FTP\ACCPAY.TXT'
   WITH
   (
     FIRSTROW = 9875,
     FIELDTERMINATOR = ';',
     ROWTERMINATOR = ';\n',
     LASTROW = 20000,
     MAXERRORS = 100
   )

ORDER

Another important argument is ORDER because BULK INSERT goes easier if your data, as it is sucked out of the source file, is explicitly ordered before the inserting begins. To use it, you must have a clustered index on the target table. SQL Server ignores the order if there is no clustered index and assumes that the data in the source files is unordered on any specific column (no matter what you specify). The insert performance is greatly improved if you use a clustered index on the columns you provide in the column_list. The clustered index must exist in the same order as the columns listed in column_list. If you intend to set multiple bulk insert processes against the table, it is better to drop the indexes and use the TABLOCK argument discussed next.

TABLOCK

Including the TABLOCK argument causes a special bulk insert table-level lock to be acquired by several clients on a target table, with each client holding the lock for the duration of its respective bulk insert operation; the operations happen concurrently. This lock obviates lock contention, and it also greatly improves insert performance. This argument is useful only if you have no indexes installed on the table, or you would have to remove them. It is a good idea to use it if you do not have a clustered index to take advantage of the ORDER clause.

As demonstrated in Chapter 3, SQL Server is able to allocate separate threads and fibers to parallel processes, which makes it one of the few products that can facilitate multiple clients without having to downgrade or share resources among the connections. This means that you can launch multiple bulk insert operations, with each BULK INSERT process performing at the same level as its peer, as close to true parallelism as you will get on one CPU. But if you omit the argument, the insert degrades to a point where you might as well run single BULK INSERT statements sequentially.
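
Each client session would then run a statement along these lines concurrently against the same unindexed target (again, AccPay is a placeholder table name):

 BULK INSERT Reconciliation.dbo.AccPay   -- AccPay is a hypothetical target table
   FROM '\\SERV04\FTP\ACCPAY.TXT'
   WITH
   (
     FIELDTERMINATOR = ';',
     ROWTERMINATOR = ';\n',
     TABLOCK
   )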

FORMATFILE

The FORMATFILE argument specifies the name and path of the format file generated with the bcp utility. A format file is used in situations in which the number of columns in the source file is different from that of the target table, the columns are in a different order, the column delimiters vary, or some combination of these factors applies. The bcp utility can also record additional information in the format file.
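
A sketch of how the argument is used, assuming a format file generated beforehand with the bcp utility (the table and file names are placeholders):

 -- Format file generated beforehand from the command line, for example:
 --   bcp Reconciliation.dbo.AccPay format nul -c -t ";" -f AccPay.fmt -T
 BULK INSERT Reconciliation.dbo.AccPay
   FROM '\\SERV04\FTP\ACCPAY.TXT'
   WITH (FORMATFILE = '\\SERV04\FTP\AccPay.fmt')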

BULK INSERT and Transactions

You can encase BULK INSERT operations inside transactions, as discussed later in this chapter. However, using BULK INSERT in a transaction with the BATCHSIZE argument will cause the rollback of all batches caught in the transaction if it fails to commit.

BULK INSERT, Triggers, and Constraints

The foundation of BULK INSERT and BCP, the bulk copy API, can be commanded to force SQL Server to either fire or suppress triggers on your target table. By default, bulk copy or bulk insert operations ignore triggers. Pass the FIRE_TRIGGERS argument if your solution calls for you to fire the triggers.

If you fire triggers, the insert operations will be fully logged. The triggers also fire once for each batch in the operation rather than once for each inserted row (in effect, the trigger fires once per batch of the BULK INSERT statement that references the table).

 BULK INSERT Reconciliation.dbo
   FROM '\\SERV04\FTP\ACCPAY.TXT'
   WITH
   (
     FIRSTROW = 9875,
     FIELDTERMINATOR = ';',
     ROWTERMINATOR = ';\n',
     LASTROW = 20000,
     FIRE_TRIGGERS
   )

Remember to change the recovery model of the database to Bulk-Logged Recovery. This recovery model (discussed in depth in Chapter 8) offers protection against media failure with the best performance and minimal log space usage for bulk insert or bulk copy operations. The actual inserts are minimally logged.



