|< Day Day Up >|| |
This example demonstrates how to use the External File wizard to extract information from a flat file.
The External File source designer is a wizard that guides you through the steps that are required to create and execute a SAS ETL Studio job. The job extracts information from an external file and writes it to a SAS table. Typically, the SAS table is used as a source table in another SAS ETL Studio job.
The External File source designer enables you to do the following tasks:
extract information from flat files in fixed or delimited format. Supported file types are TXT, DAT, and CSV.
import column-aligned data or data that is not column-aligned. Data that is not column-aligned can be imported with single or multiple delimiters separating the values.
import variable length records and fixed-length records.
import character, numeric and nonstandard numeric data (such as currency data or signed numbers).
specify how missing values should be treated.
read data in which one record is spanned over multiple lines, as well as data in which multiple records are included in a single data line.
remove columns in the imported data; arrange the order of the columns, change attributes of any column, add new columns.
For column-aligned data, the External File source designer uses a sample of data from the external file, together with metadata that you enter, to estimate the length and data type of the columns. You can specify the rows used in sampling of data by specifying the start record and how many records should be included in the sample.
For the current example, assume that the following statements are true:
A data warehouse project plan specified a report that requires information from an external file. The external file is a flat file that is called employeeFlatFile.dat.
Information will be extracted from employeeFlatFile.dat into a SAS table called EmployeeSAS.
EmployeeSAS will be stored in a SAS library called Efiout. Assume that metadata for Efiout has been added to the main metadata repository for the example data warehouse. For details about libraries, see "Enter Metadata for Libraries" on page 44.
The main metadata repository is under change-management control. For details about change management, see "Working with Change Management" on page 64.
You have selected a default SAS application server for SAS ETL Studio, as described in "Select a Default SAS Application Server" on page 59.
Perform the following steps to begin work in SAS ETL Studio:
Start SAS ETL Studio as described in "Start SAS ETL Studio" on page 56.
Open the appropriate metadata profile as described in "Open a Metadata Profile" on page 58. For this example, the appropriate metadata profile would specify the project repository that will enable you to access metadata for the Efiout library.
You do not need to check out a library in order to add metadata about source tables or target tables in that library. Accordingly, the next task is to display the External File source designer.
To display the External File source designer, from the menu bar on the SAS ETL Studio desktop, select Tools Source Designer.
The Source Designer selection window is displayed, as shown in the following display.
Display 7.6: Source Designer Selection Window
From this window, take the following actions:
Click the External File icon.
The wizard attempts to open a connection to the default SAS application server. If there is a valid connection to this server, you might be prompted for a user name and a password. After you have provide that information, the External File Selection window is displayed.
Perform the following steps to specify how the external file will be accessed:
In the External File Selection window, select the SAS application server that will be used to access the external file, then specify a physical path to the external file.
The external file is probably remote from the SAS application server, so you might have to enter a remote path in the Path field, such as \\d9585\sources_external\employeeFlatFile.dat. The following display shows an External File Selection window with values that are appropriate for the current example.
Display 7.7: External File Selection Window
When the appropriate server and path and have been specified, click . The wizard reads the source file and tries to determine whether the source contains fixed-width data or delimited data. The Import Parameters window is displayed with some estimated parameters.
Perform the following steps to specify how information should be imported from the external file:
Review the estimated parameters and sample data that are displayed in the Import Parameters window. Update as needed. The following display shows an Import Parameters window with values that are appropriate for the current example.
Display 7.8: Import Parameters Window
When the import parameters are correct, click . The wizard reads the source file and derives default metadata for columns in the target (the SAS table), based on columns in the source (the external file).
For the current example, assume that the data in employeeFlatFile.dat is arranged in columns, and the Set Column Definitions window is displayed. The next task is described in "Specify Column Variables for the Target" on page 82.
However, if the data in the external file is not arranged in columns, the Fixed Width Column Definition window is displayed. In this scenario, the next task is described in "Specify the Width of Columns in the Target" on page 81.
If the data in the external file is not arranged in columns, use the Fixed Width Column Definition window to view the data in the source (external file) and specify the width of the columns in the target (SAS table).
To specify the width of a column, study the example data, decide where the columns should be, then click the location where the column should be. An arrow is added at each column location, as shown in the following display.
Display 7.9: Fixed Width Column Definition Window
The values and columns in the previous display do not match the data in the employeeFlatFile.dat file. They are taken from a different external file, one that does not arrange its data in columns.
After the appropriate columns have been specified, click . A temporary SAS data set is created with the column widths that you have specified on the Fixed Width Column Definition window. The Set Column Definition window is displayed, showing the effect of any changes that you made on the Fixed Width Column Definition window.
In the Set Column Definition window, you can accept the default column variable names in the Columns in the target group box, or you can update them.
Scroll to the right to view or update the Description, Length, Type, Format, and Informat fields. The following display shows a Set Column Definition window with values that are appropriate for the current example.
Display 7.10: Set Column Definition Window
The button in this window launches the Expression Builder window. In the context of the External File source designer, the Expression Builder enables you to build a WHERE clause to subset the data that is being imported from an external file. (To see an example of how the Expression Builder can be used to build a WHERE clause, see "Configure the SQL Join Transformation" on page 137.)
When the column metadata is correct, click . The SAS Destination window is displayed.
When the SAS Destination window is displayed, a number of fields have default values that must be updated. The following display shows the SAS Destination window before you have specified the desired library, member name (table name), and file format for the target (SAS data set or SAS data view).
Display 7.11: SAS Destination Window
In the SAS Destination window, select the library where the target will be stored (Efiout), a member name for the target (EmployeeSAS), and the file format of the target (SAS data set). The name for the target must follow the rules for SAS names.
When the physical storage information is correct, Click . The General Properties window for the target is displayed.
Perform the following steps to specify a descriptive name for the target:
In the General Properties window, specify a descriptive name for the target, and perhaps a brief narrative description.
The default descriptive name for the target is the member name that was entered in the SAS Destination window. A descriptive name does not have the same restrictions as a member name, so it can be changed to something that is easier to understand. For the current example, assume that the SAS data set name that you entered in the previous window (EmployeeSAS) is acceptable as a descriptive name.
When the general properties are correct, click . The Import Data Step Validation window is displayed.
In the Import Data Step Validation window, click to generate a SAS DATA step from the metadata that you have entered. If the DATA step has no errors, the Wizard Finish window displays. If the DATA step has errors, a window displays that enables you to view the SAS log and take other corrective action.
For this example, assume that the DATA step is valid, and the Wizard Finish window is displayed.
In the Wizard Finish window, review the metadata that you have entered. When you are satisfied that the metadata is correct, click . The following actions occur:
Metadata for the source (the external file) is added to a current metadata repository.
Metadata for the target (the SAS table) is added to a current metadata repository.
Metadata for the job that extracts information from the source and writes it to the target is added to a current metadata repository.
The job is submitted for execution.
If the job is successful, the target is created on the file system.
Under change management, new metadata objects are added to the Project tree on the SAS ETL Studio desktop, as shown in the following display.
Display 7.12: Project Tree with Output from the External File Source Designer
In the previous display, Create EmployeeSAS is the metadata for the job. Jobs that are created by wizards have names in the format Create target_name, where target_name is the name of the target.
employeeFlatFile.dat is the metadata for the external file. It will have the same name as the external file.
EmployeeSAS is the metadata for the SAS table into which information was extracted from the external file. The target has the descriptive name that you specified in the External File wizard.
You must check in the new metadata in order to save it to the change-managed repository.
In the Project tree, select the repository icon (Project: etlUser1).
From the menu bar on the SAS ETL Studio desktop, select Project Check In Repository.
All metadata objects in the project repository will be checked in to the change-managed repository. The new objects will be visible in the Inventory tree.
|< Day Day Up >|| |