3 4
Importing data from an external source into an instance of SQL Server is likely to be the first step that you will perform once you set up your database. After data has been imported into your database, you can start working with that data by using Transact-SQL statements and other tools to view and modify the data. You might also find that you need to export that data out of the database. Importing data is the process of retrieving data from sources external to SQL Server (for example, an ASCII text file) and inserting it into SQL Server tables. Exporting data is the process of extracting data from an instance of SQL Server into some user-specified format (for example, copying the contents of a SQL Server table to a Microsoft Access database). In this lesson, you will learn how to use the bcp command prompt utility to copy data to and from a SQL Server database and will learn how to use the BULK INSERT statement to import data from a data file to a database. With Data Transformation Services (DTS), you will also be able to extract, transform, and consolidate data from disparate sources.
The bcp command prompt utility copies SQL Server data to or from a data file. You will use this utility most frequently to transfer large volumes of data into a SQL Server table from another program often from another database management system (DBMS). When the bcp utility is used, the data is first exported from the source program to a data file and is then imported from the data file into a SQL Server table. Alternatively, bcp can be used to transfer data from a SQL Server table to a data file for use in other programs such as Microsoft Excel.
Data can also be transferred into a SQL Server table from a data file by using the BULK INSERT statement. The BULK INSERT statement cannot bulk copy data from an instance of SQL Server to a data file, however. With the BULK INSERT statement, you can bulk copy data to an instance of SQL Server by using the functionality of the bcp utility in a Transact-SQL statement (rather than from the command prompt).
In order for the bcp utility and the BULK INSERT statement to insert data, the data file must be in row and column format. SQL Server can accept data in any ASCII or binary format as long as the terminators (characters used to separate columns and rows) can be described. The structure of the data file does not need to be identical to the structure of the SQL Server table, because bcp and BULK INSERT enable columns to be skipped or reordered during the bulk copy process.
Data that is bulk copied into an instance of SQL Server is appended to any existing contents in a table. Data that is bulk copied from an instance of SQL Server to a data file overwrites the previous contents of the data file.
You should keep in mind the following guidelines when you bulk copy data:
The following bcp command copies data from the Publishers table in the Pubs database and into the Publishers.txt file:
bcp pubs..publishers out publishers.txt -c -T
The command first identifies the database (Pubs) and the table (Publishers) from which the data will be extracted. The out keyword specifies that the data will be exported from the table and into the Publishers.txt file; the -c switch specifies that a character (char) data format should be used; and the -T switch specifies that bcp should use a trusted connection to connect to SQL Server. The contents of the Publishers.txt file will include all of the data from the Publishers table. You can use a text editor, such as Notepad, to view the data.
You can also use the bcp command prompt utility to bulk copy data from the Publishers.txt file into the Publishers2 table in the Pubs database:
bcp pubs..publishers2 in publishers.txt -c -T
Notice that this command uses the in keyword rather than the out keyword. By running this command, you insert all data from the Publishers.txt file into the Publishers2 table.
You can also use the BULK INSERT statement from a query tool, such as Query Analyzer, to bulk copy the data into the Publishers2 table:
Use Pubs BULK INSERT Publishers2 FROM 'c:\publishers.txt' WITH (DATAFILETYPE = 'CHAR')
NOTE
The bcp utility can create or read data files in the default data formats by specifying a switch at the command prompt. The following table includes a description of the four default data formats:
Data Format | bcp Switch | BULK INSERT Clause | Description |
---|---|---|---|
Native | -n | DATAFILETYPE = 'native' | Uses native (database) data types. Storing information in native format is useful when information must be copied from one instance of SQL Server to another. Using native format saves time and space, preventing unnecessary conversion of data types to and from character format. A data file in native format cannot be read by any program other than bcp, however. |
Character | -c | DATAFILETYPE = 'char' | Uses the character (char) data format for all columns, providing tabs between fields and a new-line character at the end of each row as default terminators. Storing information in character format is useful when the data is used with another program, such as a spreadsheet, or when the data needs to be copied into an instance of SQL Server from another database. Character format tends to be used when copying data from other programs that have the functionality to export and import data in plain-text format. |
Unicode character | -w | DATAFILETYPE = 'widechar' | The -w switch (or widechar value for the DATAFILETYPE clause of the BULK INSERT statement) uses the Unicode character data format for all columns, providing (as default termi-nators) tabs between fields and a new-line character at the end of each row. This format allows data to be copied from a server (that is using a code page different from the code page used by the client running bcp) to another server that uses the same or different code page as the original server. This format prevents the loss of any character data, if the source and destination are Unicode data types. In addition, only a minimum number of extended characters are lost if the source and destination are not Unicode data types. |
Unicode native | -N | DATAFILETYPE = 'widenative' | Uses native (database) data types for all non-character data and uses Unicode character data format for all character (char, nchar, varchar, nvarchar, text, and ntext) data. |
By default, the bcp utility operates in interactive mode and queries SQL Server and the user for information required to specify the data format. When using the -n, -c, -w, or -N switches, however, bcp does not query for information about the SQL Server table on a column-by-column basis. Instead, SQL Server reads or writes the data by using the default format specified.
By default, the BULK INSERT statement operates in character mode (char). Interactive mode does not apply.
When using interactive mode to bulk copy data, the bcp utility prompts you for information regarding the storage type, prefix length, field length, and field and row terminators. The file used to store the format information for each field in the data file is called the format file. This format file provides the default information that is used either to bulk copy the data in the data file back into an instance of SQL Server or to bulk copy data from the table another time (without needing to respecify the format).
SQL Server DTS is a set of graphical tools and programmable objects that enable you to extract, transform, and consolidate data from disparate sources into single or multiple destinations. By using DTS tools, you can create DTS packages that enable you to create custom data-movement solutions tailored to the specialized business needs of your organization. The capability to import and export data is among the tasks that you can perform using DTS tools and packages.
Data Transformation Services includes a set of tools that enable you to create, schedule, and execute DTS packages. The following table describes DTS tools:
Tool | Description |
---|---|
DTS Import/Export wizard | Wizard used to copy data to and from an instance of SQL Server and to map transformations on the data. Of all the DTS tools, the DTS Import/Export Wizard provides the simplest method of copying data between OLE DB data sources. |
DTS Designer | Graphical tool used to build complex packages with workflows and event-driven logic. DTS Designer can also be used to edit and customize packages created with the DTS Import/Export Wizard. |
DTS and Enterprise Manager | Options available for manipulating packages and accessing package information from Enterprise Manager. |
DTS Package Execution Utilities | Package execution utilities include the following:
|
DTS Query Designer | A graphical tool used to build queries in DTS. |
A DTS package is an organized collection of connections, DTS tasks, DTS transformations, and workflow constraints that are assembled either with a DTS tool or programmatically, and then saved to SQL Server, SQL Server 2000 Meta Data Services, a structured storage file, or a Microsoft 0Visual Basic file.
Each package contains one or more steps that are executed sequentially or in parallel when the package is run. When executed, the package connects to the correct data sources, copies data and database objects, transforms data, and notifies other users or processes of events. Packages can be edited, password protected, scheduled for execution, and retrieved by version.
A DTS task is a discrete set of functionalities executed as a single step in a package. Each task defines a work item to be performed as part of the data movement and data transformation process or as a job to be executed.
Data Transformation Services supplies a number of tasks that are part of the DTS object model and that can be accessed graphically through DTS Designer or accessed programmatically. These tasks, which can be configured individually, cover a wide variety of data copying, data transformation, and notification situations. For example, the following types of tasks represent some actions that you can perform by using DTS:
Because DTS is based on an extensible Component Object Model (COM), you can create your own custom tasks. You can integrate custom tasks into the user interface of DTS Designer and save them as part of the DTS object model.
A DTS transformation is one or more functions or operations applied against a piece of data before the data arrives at the destination. The source data is not changed. For example, you can extract a substring from a column of source data and copy it to a destination table. The particular substring function is the transformation mapped onto the source column. You also can search for rows that have certain characteristics (for example, specific data values in columns) and apply functions against only the data in those rows. Transformations make it easy to implement complex data validation, data scrubbing, and conversions during the import and export process.
Data transformations enable you to perform the following operations against column data:
Data Transformation Services is based on an OLE DB architecture that enables you to copy and transform data from a variety of data sources:
DTS functionality might be limited by the capabilities of specific databases, ODBC drivers, or OLE DB providers.
Data Transformation Services steps and precedence constraints order work items in a DTS package. You can design the DTS package workflow graphically through DTS Designer or design the workflow programmatically. You also can use an ActiveX script to customize step execution.
Steps control the order in which tasks are executed in a DTS package. Steps represent the execution units in the DTS object model, and they define which tasks execute in what sequence when the package is run.
You can define the sequence of step execution in a package with precedence constraints that link two tasks based on whether the first task executes, whether it executes successfully, or whether it executes unsuccessfully. You can use precedence constraints to build conditional branches in a workflow. Steps without constraints are executed immediately, and several steps can execute in parallel. You can also use ActiveX scripts to modify workflow.
In this exercise, you will import data from text files into tables in the BookShopDB database. You will first import data into lookup tables, such as BookCondition and Position, and you will then populate other tables in the database. When you have completed importing data, you will export data from the BookShopDB database into text files that are created on your hard drive. To complete this exercise, you must copy the text files in the \Chapter07\Exercise1 folder of the Training Kit Supplemental CD-ROM to the root directory of the C: drive on your Windows 2000 computer. In addition, you should be logged into your Windows 2000 Server computer as Administrator.
NOTE
The Run dialog box appears.
A command prompt window appears.
The bcp command identifies the database (BookShopDB) and table (BookCondition) that will receive the imported data. The data source (BookCondition.txt) is also identified. The in keyword is used when importing data into a table. The -c switch specifies that a character (char) data type should be used, and the -T switch specifies that a trusted connection should be used.
After you press Enter, the bcp utility copies the data from the BookCondition.txt file into the BookCondition table. Once the data has been copied, a message appears and provides the number of rows copied, the network packet size, and the clock time.
USE BookShopDB SELECT * FROM BookCondition
In this statement, you are using a SELECT statement to retrieve data from the BookCondition table in the BookShopDB database.
The result set appears in the Grids tab of the Results pane. Notice that each row in the Description column contains a value of N/A. This value can be changed at any time to include a more detailed description of what each book condition means.
USE BookShopDB BULK INSERT Positions FROM 'c:\positions.txt' WITH (DATAFILETYPE = 'CHAR')
In this BULK INSERT statement, you are identifying the table that will receive the data (Positions) and the data source (Positions.txt). The statement uses a CHAR data type.
A message appears in the Messages tab of the Results pane, providing the number of rows affected.
What statements should you use?
Both tables should now be populated with the appropriate data.
What statements should you use?
NOTE
What bcp command should you use?
Notice that author 102 appears in the table twice. If more than one author wrote a book, the TitleID value for that book would appear in more than one row.
Every table in the BookShopDB database (except for the Orders table and the BookOrders table) should now be populated with data. Data is not added to the Orders table or to the BookOrders table until new orders are generated.
The bcp command identifies the database (BookShopDB) and table (Books) from which data will be exported. A text file (Books2.txt) will be created, and the data from the Books table will be copied into that file. The out keyword is used when exporting data from a table.
After you press Enter, the bcp utility copies the data from the Books table into the Books2.txt file. Once the data has been copied, a message appears providing the number of rows copied, the network packet size, and the clock time.
Notice that each row in the table is on a separate line and that a tab separates each column value within a row.
Importing data is the process of retrieving data from sources external to SQL Server and inserting the data into SQL Server tables. Exporting data is the process of extracting data from an instance of SQL Server into some user-specified format. The bcp command prompt utility copies SQL Server data to or from a data file. Data can also be transferred into a SQL Server table from a data file by using the BULK INSERT statement. The BULK INSERT statement cannot bulk copy data from an instance of SQL Server to a data file, however. Data that is bulk copied into an instance of SQL Server is appended to any existing contents in a table. Data that is bulk copied from an instance of SQL Server to a data file overwrites the previous contents of the data file. DTS is a set of graphical tools and programmable objects that enable you to extract, transform, and consolidate data from disparate sources into single or multiple destinations. By using DTS tools, you can create DTS packages with custom data-movement solutions tailored to your organization's business needs. A DTS package is an organized collection of connections, DTS tasks, DTS transformations, and workflow constraints that are assembled either with a DTS tool or programmatically and saved to SQL Server, SQL Server 2000 Meta Data Services, a structured storage file, or a Microsoft Visual Basic file.