Now you can practice what you've learned in this chapter and pull together some of the transforms and connections to create a small ETL process. This process will pull transactional data from the AdventureWorks database and then massage the data by aggregating, sorting, and calculating new columns. This extract may be used by another vendor or an internal organization.
Create a new package and rename it AdventureWorksExtract.dtsx. Start by dragging a Data Flow task onto the control flow. Double-click on the task to go to the Data Flow tab.
In the Data Flow tab, drag an OLE DB Source onto the design pane. Right-click on the source and rename it TransactionHistory. Double-click on it to open the editor. The connection to the AdventureWorks database may already be in the Data Connections list on the left. If it is, select it, and click OK. If it's not there yet, click New to add a new connection to the AdventureWorks database on any server.
When you click OK, you'll be taken back to the OLE DB Source Editor. Ensure that the Data Access Mode option is set to "Table or View." Select the [Production].[TransactionHistoryArchive] table from the Name of the Table drop-down box as shown in Figure 4-32.
Go to the Columns page (shown in Figure 4-33) and uncheck every column except for ProductID, Quantity, and ActualCost. Click OK to exit the editor.
Drag a Derived Column transform onto the data flow, right-click on it, and select Rename. Rename the transform "Calculate Total Cost." Click the TransactionHistory OLE DB source and drag the green arrow (the data path) onto the Derived Column transform.
Double-click on the Derived Column transform to open the editor (shown in Figure 4-34). In the Expression column, type the following code or drag and drop the column names from the upper-left box: [Quantity] * [ActualCost]. Leave the Derived Column option set to <add as new column>, and type TotalCost for the Derived Column Name. Click OK to exit the editor.
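Outside the designer, the Derived Column transform is just a per-row calculation. A minimal Python sketch of the same logic (using hypothetical sample rows, not the actual AdventureWorks data or the SSIS expression engine) looks like this:

```python
# Hypothetical rows as they might arrive from the TransactionHistoryArchive source.
rows = [
    {"ProductID": 711, "Quantity": 2, "ActualCost": 20.00},
    {"ProductID": 711, "Quantity": 1, "ActualCost": 20.00},
    {"ProductID": 712, "Quantity": 3, "ActualCost": 5.00},
]

# Derived Column equivalent: add TotalCost = Quantity * ActualCost to each row.
for row in rows:
    row["TotalCost"] = row["Quantity"] * row["ActualCost"]

print(rows[0]["TotalCost"])  # 40.0
```

Because <add as new column> is selected, the original Quantity and ActualCost columns flow through unchanged alongside the new TotalCost column.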
Drag an Aggregate transform onto the data flow and rename it "Aggregate Data." Drag the green arrow from the Derived Column transform onto this transform. Double-click the Aggregate transform to open the editor (shown in Figure 4-35). Check the ProductID column and note that it is copied into the grid at the bottom of the editor; its Operation column should be set to Group By. Next, check the Quantity and TotalCost columns and set the Operation for both of these columns to Sum. Click OK to exit the editor.
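The Aggregate transform behaves like a GROUP BY: one output row per ProductID, with Quantity and TotalCost summed across that product's detail rows. A sketch of that accumulation in Python (again with hypothetical data):

```python
from collections import defaultdict

# Hypothetical detail rows after the Derived Column step.
rows = [
    {"ProductID": 711, "Quantity": 2, "TotalCost": 40.00},
    {"ProductID": 711, "Quantity": 1, "TotalCost": 20.00},
    {"ProductID": 712, "Quantity": 3, "TotalCost": 15.00},
]

# Aggregate equivalent: Group By ProductID, Sum Quantity and TotalCost.
totals = defaultdict(lambda: {"Quantity": 0, "TotalCost": 0.0})
for row in rows:
    bucket = totals[row["ProductID"]]
    bucket["Quantity"] += row["Quantity"]
    bucket["TotalCost"] += row["TotalCost"]

print(totals[711])  # {'Quantity': 3, 'TotalCost': 60.0}
```

Note that any column not checked in the editor (and therefore not grouped or summed) does not survive the transform, which is why only ProductID, Quantity, and TotalCost continue downstream.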
Drag a Sort transform onto the data flow and rename it "Sort by ProductID." Connect the Aggregate transform to this transform with the green arrow, as in the last step. Double-click on the Sort transform to configure it in the editor. To list the most popular products first, check the Quantity column and select Descending in the Sort Type drop-down box. Click OK to exit the editor.
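The Sort transform's descending sort on Quantity is equivalent to a single sort call. A Python sketch with hypothetical aggregated rows:

```python
# Hypothetical aggregated rows, one per ProductID.
aggregated = [
    {"ProductID": 712, "Quantity": 3, "TotalCost": 15.00},
    {"ProductID": 711, "Quantity": 9, "TotalCost": 180.00},
    {"ProductID": 713, "Quantity": 5, "TotalCost": 25.00},
]

# Sort equivalent: Quantity descending, so the most popular products come first.
by_popularity = sorted(aggregated, key=lambda r: r["Quantity"], reverse=True)

print([r["ProductID"] for r in by_popularity])  # [711, 713, 712]
```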
You've now done enough massaging of the data and are ready to export it to a flat file that another vendor can consume. Drag a Flat File Destination onto the data flow. Connect it to the Sort transform by using the green arrow, as you saw in the last few steps. Rename the Flat File Destination "Vendor Extract."
Double-click on the destination to open the Flat File Destination Editor. You're going to output the data to a new Connection Manager, so click New. When prompted for the flat file format, select Delimited. Name the Connection Manager "Vendor Extract" as well, and type whatever description you'd like. Point the File Name option to C:\SSISDemos\VendorExtract.csv (create the C:\SSISDemos directory first if it doesn't already exist). Check the Column Names in the First Data Row option. Your final screen should look like Figure 4-36. Click OK to go back to the Flat File Destination Editor.
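The destination's output is a plain delimited file whose first row holds the column names. A Python sketch of that file format (writing to an in-memory buffer standing in for the .csv file, with hypothetical rows):

```python
import csv
import io

# Hypothetical sorted rows ready for export.
rows = [
    {"ProductID": 711, "Quantity": 9, "TotalCost": 180.00},
    {"ProductID": 713, "Quantity": 5, "TotalCost": 25.00},
]

# Flat File Destination equivalent: delimited output with column names
# in the first data row. An in-memory buffer stands in for the
# VendorExtract.csv file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["ProductID", "Quantity", "TotalCost"])
writer.writeheader()
writer.writerows(rows)

print(buffer.getvalue().splitlines()[0])  # ProductID,Quantity,TotalCost
```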
Go to the Mappings page and make sure that each input column is mapped to a destination column, as shown in Figure 4-37. Click OK to exit the editor and return to the data flow.
Now, your first larger ETL package is complete! This package is typical of the day-to-day work you'll do inside SSIS, and you'll see it expanded on greatly in Chapter 5. Execute the package, and you should see rows flow through the data flow as shown in Figure 4-38. Note that as data moves from transform to transform, you can see how many records passed through each one.
Figure 4-32
Figure 4-33
Figure 4-34
Figure 4-35
Figure 4-36
Figure 4-37
Figure 4-38