Chapter 5. Retrieving and Modifying DataTerms you'll need to understand:
Techniques you'll need to master:
|
Introduction
We now
|
Moving a Mountain of Data
There are several ways of moving large amounts of data, either within a single server or between servers. Importing and exporting
Backup/restore is also a common method used for moving data. In this approach, data can be
Another technique, which actually represents several different techniques, involves the use of replication. Replication of data is a feature of SQL Server
Data Transformation Services (DTS) is probably the most
The previously mentioned techniques are the most common, but there are also some custom techniques involving a little coding and scheduling of jobs that can be used. Such is the case when loading data cubes into data warehouses in preparation for data analysis. With so many different techniques available, it is important to know the advantages and disadvantages of each method. So in the
Mass Movements of Data via BCP
The Bulk Copy Program (BCP) is most often used when the database can be taken out of production temporarily to import large amounts of data. Of course, BCP also allows for the export of data, in which case the database can
BCP is not a specific implementation of SQL Server. The BCP command-line tool is often used from batch files. The
Figure 5.1. Command-line entry window with BCP assistance.
BCP's biggest asset is its speed. It moves information into and out of a database using data files. BCP is normally used to do one of the following:
As with most command-line operations, a listing of options can be provided easily using the
/?
command switch. Knowing the listing of options, or switches as they are more appropriately known, doesn't always make it easier to use the command. BCP has several unique standard
As stated, BCP is best suited to loading data into a database quickly. BCP does not create tables. You must have a table set up and waiting for BCP before you run BCP. This is the basic syntax of the BCP operation: bcp <table> <in or out> <file> <security information> <format information> To avoid unnecessary confusion, we will avoid a lot of the specifics of each implementation and variation of the coding. Mastering all the options of the BCP command will require a lot of practice and additional research beyond the scope of this book. The following represents a simple export of data: bcp "Northwind.dbo.Products" out "Products.txt" -c -q -U"sa" -P"password" Assuming a very simple password for the sa account of password , the command will create a text file from the Products table of the Northwind database.
For large files with more than a couple thousand rows, perhaps, you should
BCP is a pretty powerful tool, but it's kind of difficult to use when you need to just read a file from within SQL Server as part of a script or scheduled job. It sure would be nice if there were a T-SQL equivalent.
The
BULK INSERT
statement implements part of BCP inside SQL Server, so it has all the speed of BCP with an easier-to-use interface.
BULK INSERT
will allow for the copying of a data file into a database table or view with a
Using Backup/Restore for More Than RecoveryAlthough the most common use of the BACKUP and RESTORE operations ( DUMP and LOAD were the previous versions) is for safeguarding data, the two commands are also useful in moving large amounts of data. In particular, when a set of data is needed for testing or training purposes, the backing up of one database can be restored into another. This is a quick and thorough technique for gaining a copy of the original data for testing.
A complete backup can be restored into the same database for recovery, but it can also be restored to a different database or server to create a duplicate set of data. In this event the data will need to be refreshed on a periodic basis. The data refresh of a test or training database can be scheduled. The restore operation of data into the test database needs to be scheduled to occur after the backup operation of the live database. A restore operation can be scheduled through the use of the T-SQL Restore command, the basics of which are easily implemented. If you are using the restore operation to overwrite a different database, you will need to use a few more options as given in the following code: Restore FileListOnly From Backups Restore Database Back From Backups With Move 'LiveData.mdf' To 'Back.mdf', Move 'LiveData.ldf' To 'Back.ldf', Replace The initial restore operation obtains the list of files on a backup device. This is needed if you're restoring from a device not originally associated with the database being restored into. The second restore will perform the actual operation, moving the files that were originally backed up, and will replace the database, overwriting the previous contents. Data Movement with ManipulationBasic imports and exports of data can be performed using the Data Transformation Services Import/Export Wizard. This tool uses SQL Server DTS to copy data into and out of SQL Server using nice, easy-to-understand graphical tools. In addition to working with SQL Server, DTS can copy from any data source that is ODBC compliant to any other data source. This functionality means that data can be combined from all sources regardless of the program used to process and store the data originally. The only really tricky part of the entire wizard is the transformations. DTS enables you to write transformations in VBScript that can make simple changes to data, such as formatting or localizing. By clicking on the Transform window, you can go into the transformation and change the VBScript so that it changes the data format. After you have completed the definition via the wizard, you can save the package for future useand this is only the beginning. The DTS editing environment allows you to turn a simple input/output operation into a complex business process that can be repeated and scheduled to occur on a regular basis. There is little purpose behind the storage of data if you can't get the data out of the system in a meaningful manner. The SELECT statement is the basis for most of the activity performed in data retrieval. It is the first statement for a SQL developer to master, because its use is varied and can involve many options. |