Moving a Mountain of Data


There are several ways of moving large amounts of data, either within a single server or between servers. Importing and exporting tasks vary in SQL Server, and each task has an appropriate implementation. The first technique we will discuss is a more traditional mechanism for moving data, BCP. A closely related technique involves the use of BULK INSERT, a relatively new feature that's really just an extension of the BCP functionality.

Backup/restore is also a common method used for moving data. In this approach, data can be backed up, placed onto CD, and easily transported to another location for restore. This technique is also common when a test or training copy of the data is desired within the same server where the "live" data resides.

Another technique, which actually represents several different techniques, involves the use of replication. Replication of data is a feature of SQL Server specifically used to create copies of data. Completed copies, partial copies, and warm backups all use data replication as the primary method of implementation.

Data Transformation Services (DTS) is probably the most full-featured and flexible technique for moving data. It is usually implemented in instances in which data is to be massaged during the movement of the information. Whether it be minor processing of the information or something more lucrative, the DTS tools are quite diverse and very capable of handling any such task.

The previously mentioned techniques are the most common, but there are also some custom techniques involving a little coding and scheduling of jobs that can be used. Such is the case when loading data cubes into data warehouses in preparation for data analysis. With so many different techniques available, it is important to know the advantages and disadvantages of each method. So in the next several sections we will look into each means to see where each is most appropriately put into service.

Mass Movements of Data via BCP

The Bulk Copy Program (BCP) is most often used when the database can be taken out of production temporarily to import large amounts of data. Of course, BCP also allows for the export of data, in which case the database can remain online. Using BCP for importing data is extremely fast. As the name indicates, BCP is a process used to copy large amounts of data.

BCP is not a specific implementation of SQL Server. The BCP command-line tool is often used from batch files. The command-prompt window is often forgotten as an execution environment. In the current world of graphics, colors, pictures, and automation, it is easy to become intimidated with a black empty window that contains only a blinking cursor. In fact, many end users never see this perspective on the operating system as shown in Figure 5.1. In fact, however, this window is pretty easy to use if you need quick access to the operating system to execute applications.

Figure 5.1. Command-line entry window with BCP assistance.


BCP's biggest asset is its speed. It moves information into and out of a database using data files. BCP is normally used to do one of the following:

  • Bulk copy from a table, view, or the resultset into a native-mode data file in the same format as the table or view.

  • Bulk copy from a table, view, or the resultset into a data file in a format other than the one that the table or view is in. In this instance, a format file is created that defines the characteristics of each column. If all columns are converted to character format, the resulting file is called a character-mode data file.

  • Bulk copy from a data file into a table or view. A format file can be used to determine the layout of the data file.

  • Load data into program variables and then import the data into a table or view a row at a time.

As with most command-line operations, a listing of options can be provided easily using the /? command switch. Knowing the listing of options, or switches as they are more appropriately known, doesn't always make it easier to use the command. BCP has several unique standard implementations, each using a separate set of switches.

As stated, BCP is best suited to loading data into a database quickly. BCP does not create tables. You must have a table set up and waiting for BCP before you run BCP. This is the basic syntax of the BCP operation:

 bcp <table> <in or out> <file> <security information> <format information> 

To avoid unnecessary confusion, we will avoid a lot of the specifics of each implementation and variation of the coding. Mastering all the options of the BCP command will require a lot of practice and additional research beyond the scope of this book. The following represents a simple export of data:

 bcp "Northwind.dbo.Products" out "Products.txt" -c -q -U"sa" -P"password" 

Assuming a very simple password for the sa account of password, the command will create a text file from the Products table of the Northwind database.

Although BCP is a long-standing tool available to use with SQL Server, you are not likely to see it on the exam other than by definition of how it is used. You are not likely to be asked for the specifics of its command coding structure. Focus on knowing where to implement BCP and knowing the available BULK INSERT options.


For large files with more than a couple thousand rows, perhaps, you should turn on the SELECT INTO/BULKCOPY option for the database, or set the database recovery mode to SIMPLE. This option disables all transaction log backups while they are turned on, and you must do a full backup to get transaction log backups to work afterward. The option affects certain operations, namely those involving SELECT INTO and BULK COPY, by changing how transaction logging works. If you do set the recovery mode to SIMPLE, it is recommended that you turn it back to FULL when you have completed your process.

BCP is a pretty powerful tool, but it's kind of difficult to use when you need to just read a file from within SQL Server as part of a script or scheduled job. It sure would be nice if there were a T-SQL equivalent.

The BULK INSERT statement implements part of BCP inside SQL Server, so it has all the speed of BCP with an easier-to-use interface. BULK INSERT will allow for the copying of a data file into a database table or view with a user-specified format. However, the BULK INSERT statement cannot copy data from a SQL Server out to a data file. The BULK INSERT statement allows you to copy data one way only. BULK INSERT copies data into SQL Server using the functionality of BCP from within SQL Server via a T-SQL statement. Although it uses most of the same options, it doesn't need to know which server to use or what security to use because you use it from within T-SQL, so it runs on that server with the security context with which you logged in. Both of the BCP variations require a lot of command knowledge. BCP can be rather complex and it is worth taking a look at how to import and export data using the graphical tools.

Using Backup/Restore for More Than Recovery

Although the most common use of the BACKUP and RESTORE operations (DUMP and LOAD were the previous versions) is for safeguarding data, the two commands are also useful in moving large amounts of data. In particular, when a set of data is needed for testing or training purposes, the backing up of one database can be restored into another. This is a quick and thorough technique for gaining a copy of the original data for testing.

Use Backup/Restore to move copies of data over long distances, particularly when there is a large amount of data to be moved. A CD/DVD can be couriered faster than electronic transmission over a slow WAN link.


A complete backup can be restored into the same database for recovery, but it can also be restored to a different database or server to create a duplicate set of data. In this event the data will need to be refreshed on a periodic basis. The data refresh of a test or training database can be scheduled. The restore operation of data into the test database needs to be scheduled to occur after the backup operation of the live database.

A restore operation can be scheduled through the use of the T-SQL Restore command, the basics of which are easily implemented. If you are using the restore operation to overwrite a different database, you will need to use a few more options as given in the following code:

 Restore FileListOnly From Backups Restore Database Back From Backups  With Move 'LiveData.mdf' To 'Back.mdf',  Move 'LiveData.ldf' To 'Back.ldf',  Replace 

The initial restore operation obtains the list of files on a backup device. This is needed if you're restoring from a device not originally associated with the database being restored into. The second restore will perform the actual operation, moving the files that were originally backed up, and will replace the database, overwriting the previous contents.

Data Movement with Manipulation

Basic imports and exports of data can be performed using the Data Transformation Services Import/Export Wizard. This tool uses SQL Server DTS to copy data into and out of SQL Server using nice, easy-to-understand graphical tools. In addition to working with SQL Server, DTS can copy from any data source that is ODBC compliant to any other data source. This functionality means that data can be combined from all sources regardless of the program used to process and store the data originally.

The only really tricky part of the entire wizard is the transformations. DTS enables you to write transformations in VBScript that can make simple changes to data, such as formatting or localizing. By clicking on the Transform window, you can go into the transformation and change the VBScript so that it changes the data format.

After you have completed the definition via the wizard, you can save the package for future useand this is only the beginning. The DTS editing environment allows you to turn a simple input/output operation into a complex business process that can be repeated and scheduled to occur on a regular basis.

There is little purpose behind the storage of data if you can't get the data out of the system in a meaningful manner. The SELECT statement is the basis for most of the activity performed in data retrieval. It is the first statement for a SQL developer to master, because its use is varied and can involve many options.



    EXAM CRAM 2 Designing and Implementing Databases with SQL Server 2000 Enterprise
    MCAD/MCSE/MCDBA 70-229 Exam Cram 2: Designing & Implementing Databases w/SQL Server 2000 Enterprise Edition
    ISBN: 0789731061
    EAN: 2147483647
    Year: 2005
    Pages: 154

    flylib.com © 2008-2017.
    If you may any questions please contact us: flylib@qtcs.net