Lesson 1: Transferring and Transforming Data

3 4

After you have created your user database, you need to populate it with data. Frequently, this consists of importing and transforming existing data from one or more external data sources (such as another database system) to SQL Server 2000. In this lesson, you learn the issues you will face when you choose to import data from another data source. This lesson will also introduce you to the primary tools used to import and transform data, including a brief discussion of the capabilities of each tool. These tools include DTS, Bcp, and the BULK INSERT Transact-SQL statement.


After this lesson, you will be able to

  • Describe the process of evaluating the quality and consistency of data in an external data source prior to data import
  • Understand the types of data transformations that may be necessary when importing data from existing data sources
  • Describe the tools provided by SQL Server 2000 for importing data

Estimated lesson time: 15 minutes


Importing Data

Populating your SQL Server 2000 user database frequently consists of the process of importing existing data from external data sources to a newly created destination database. These external data sources include Microsoft or third-party databases, spreadsheets, or text files. Before you can import this data, you must perform the following preliminary tasks to evaluate the external data and determine the steps that will be involved in the import process. These preliminary steps will also help you select the appropriate tool to use.

  • Determine the consistency of the existing data within the external data source. The external data source may not have enforced consistency when data was initially input (for example, if the names of states were entered as two-letter abbreviations in some instances and were fully spelled out in other instances).
  • Determine whether additional columns must be added. The existing data may be internally consistent but not include necessary columns because these values were assumed (such as values for country or an international telephone area code).
  • Determine whether the existing data format should be modified. The existing data may be internally consistent but not be represented in the manner you want to use in the destination database (for example, requiring a change in the date format or the conversion of numerical values to more readable string values, such as 1, 2, and 3 being converted to poor, average, and excellent).
  • Determine whether existing data columns should be modified. The existing data may be internally consistent but contain columns that need to be aggregated or separated (such as separating first and last names into separate columns or summarizing daily data into weekly or monthly data).
  • Determine whether the import of data will be a one-time task or a task that must be performed periodically. The existing data may need to be migrated from an existing legacy system that will be retired or may need to be extracted from an existing system from which you will extract data on a weekly or monthly basis.
  • Determine how to access the existing data. Is the external data source directly accessible, and do you have permission to access the data directly? (If not, the data will have to be exported into a format that SQL Server 2000 can work with, such as a delimited text file.)

DTS Data Transformations

After you have evaluated the data in each external data source, you need to determine how to proceed. Occasionally, changes to data can be made in the external data source, but usually these changes cannot be made in the external data source without either breaking existing applications (for example, adding columns or changing data formats) or consuming too much time (for example, manually enforcing data consistency where none existed previously). These changes can either be made after the data has been imported into SQL Server, using temporary tables and using Transact-SQL statements to scrub and cleanse the data, or can be made during the import process itself. Changes to the data made during the import and export process are referred to as DTS transformations. A DTS transformation occurs when one or more operations or functions are applied against data before that data arrives at the destination. The data at the source is not changed. Transformations make it easy to implement data scrubbing, conversions, and complex data validations during the import and export process.

The type and extent of modifications that must be made will help determine the SQL Server 2000 data transfer tool that you will use, and when you will perform the modifications. Also, whether the data import is a one-time task or a periodic task will frequently determine the tool you will use and how much you automate the necessary transformations. Planning and testing (using data subsets) is essential, particularly for large data sets.

Introducing the Data Transfer Tools

SQL Server 2000 provides a number of tools for importing and exporting data. These tools have differing capabilities to extract subsets of data from an existing data source and to transform data. Table 7.1 briefly describes each of these primary data transfer tools and their capabilities.

Table 7.1 Data Transfer Tools and Their Functions

Tool

Description

DTS

DTS is a graphical tool used to import, export, and transform data. DTS can work directly with a wide variety of data sources. DTS creates packages that can be scheduled. DTS can also import and export database objects schema (meta data) between SQL Server instances.

Bcp

Bcp is a command-prompt utility used to copy data from a text file to a SQL Server 2000 table or view (or from a SQL Server 2000 table or view to a text file) using ODBC. The transformation capabilities of Bcp are limited and require cryptic format files. Working with Microsoft or third-party databases is a two-step process.

BULK INSERT Transact-SQL statement

BULK INSERT is a Transact-SQL command used to copy data from an ASCII text file to a SQL Server 2000 table or view (but not from SQL Server 2000 to a text file) using OLE DB. The BULK INSERT statement provides the same functionality of Bcp (and has the same limitations) in a Transact-SQL statement and can be embedded in a DTS package.

Note


There are a number of other methods of moving data between data sources, but they are not normally used to initially populate a database. These additional methods include backup and restore (see Chapters 8 and 9), replication (see Chapter 15), ActiveX scripts, and the INSERT and SELECT INTO Transact-SQL statements using distributed queries (see Chapter 12).

Lesson Summary

When you are populating a database from existing data stored in an external data source, you must evaluate that data to determine what transformations, if any, must be performed on that data prior to or during importation. You must determine whether the import is a one-time task or will be a periodic task. You must also determine how you will access this data, either directly or through an exported text file. These factors will determine the data transfer tool you use, and when you transform the data.



MCSE Microsoft SQL Server 2000 System Administration(c) Training Kit (c) Exam 70-228
MCSE Microsoft SQL Server 2000 System Administration(c) Training Kit (c) Exam 70-228
ISBN: N/A
EAN: N/A
Year: 2001
Pages: 126

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net