Pivot Transform


Do you ever get the feeling that pivot tables are the modern day Rosetta Stone for translating data to your business owners? You store it relationally, but they ask for it in a format that you have to write a complex case statement to generate. Well, not anymore. Now you can use a SSIS transformation to generate the results. A pivot table is a result of cross-tabulated columns generated by summarizing data from a row format. Prior to SQL Server 2005, a pivot table could be generated only by using a SELECT...CASE statement to build summary columns based on one field in the row.

Typically a Pivot is generated using the following input columns:

  • Pivot Column: A Pivot column is the element of input data to "pivot." The word "pivot" is another way of saying "to create a column for each unique instance of." However, this data must be under control. Think about creating columns in a table. You wouldn't create 1000 uniquely named columns in a table. So for best results when choosing a data element to pivot, pick an element that can be run through a GROUP BY statement that will generate 15 or fewer columns. If you are dealing with dates, use something like a DATENAME function to convert to the month or day of the year.

  • Row Columns: Row columns are elements of input data that act as row (not column) identifiers. Just like any GROUP BY statement, some of the data are needed to define the group (row), whereas other data are just along for the ride.

  • Value Columns: These columns are aggregations for data that provide the results in the matrix between the row columns and the pivot columns.

The Pivot Transform task can accept an input stream, use your definitions of the columns above, and generate a pivot table output. It helps if you are familiar with your input needs and format your data prior to this transform. Aggregate the data using GROUP BY statements. Pay special attention to sorting by row columns — this can significantly alter your results.

The Pivot Transform task uses the Advanced Editor to set up pivot rules. To set your expectations properly, you are going to have to define each of your literal pivot columns. A common misconception, and source of confusion, is approaching the Pivot transform with the idea that you can simply set the pivot column to pivot by the month of the purchase date column and the transformation should automatically build 12 pivot columns with the month of the year for you. It will not. It is your task to create an output column for each month of the year. If you are using colors as your pivot column, you'll need to add an output column for every possible color. What happens if columns are set up for Blue, Green, and Yellow and the color Red appears in the input source? The Pivot transform task will fail. So plan ahead and know the possible pivots that can result from your choice of a pivot column or provide for an error output for data that doesn't match your expected pivot values.

Use some of the AdventureWorks product and transactional history to generate a quick pivot table to show product quantities sold by month. This is a typical upper-management request and you can cover all the options with this example. Adventure Works Management wants a listing of each product with the total quantity of transactions by month for the year 2003.

First identify the Pivot Column. The month of the year looks like the data that is driving the creation of the pivot columns. The row data columns will be the product name and the product number. The value field will be the total number of transactions for the product in a matrix by month. Now you are ready to set up the Pivot transformation.

  1. Create a new SSIS project named "Pivot Example." Add a Data Flow task to the Control Flow design surface.

  2. Add an OLE DB Source transform to the Data Flow design surface. Configure the connection to the AdventureWorks database. Set the Data Access Mode to SQL Command. Add the following SQL statement into the SQL Command text box:

     SELECT p.[Name] as ProductName, p.ProductNumber,        datename(mm, t.TransactionDate) as TransMonth,        sum(t.quantity) as TotQuantity FROM production.product p INNER JOIN production.transactionhistory t ON t.productid = p.productid WHERE t.transactiondate between '01/01/03' and '12/31/03' GROUP BY p.[name], p.productnumber, datename(mm,t.transactiondate) ORDER BY productname, datename(mm, t.transactiondate) 

  3. Add the Pivot transform and connect the output of the OLE DB Source to the input of the transform. Open the Advanced Editor and navigate to the Input Columns tab. In many of the transforms, you have the option of passing through some values from the input to the output. In the Pivot transform, you have to select all the columns that will be included in the output of the Pivot. All nonselected input columns will be ignored. Select all the input columns for this example.

  4. Move to the Input and Output Properties tab. There is a collection of input, output, and error outputs. Remember that the input columns are the raw data coming in. The output columns will be the pivot data coming out. Figure 6-29 shows the input columns expanded and a view of the properties for the ProductName column. There are two important properties in that property editor:

    • The LineageID property can't be changed, but you will need to know it in order to map output columns to an input column.

    • The PivotUsage has to be set using the following codes:

      0:

      The column is just passed through as a row attribute.

      1:

      The column is the row column identifier (BOL calls this the Set Key).

      2:

      The pivot column.

      3:

      The value column.

  5. Set the PivotUsage properties for each of the Input Columns to match these codes:

    • ProductName: 0 — A row attribute

    • ProductNumber: 1 — A row identifier

    • TransMonth: 2 — The pivot column

    • TotQuantity: 3 — The value column

  6. Expand the Output Column Nodes. Click the Add Column button to add a column to the output column collection. Set the name of the new output column to ProductName. Set the SourceColumn value to the LineageID of the same-named input column. Do the same thing for the ProductNumber Column. Figure 6-30 shows an example of the properties that appear for the output column. There are some new properties here:

    • Comparison Flags: Allows ignoring of case, kana type, nonspacing characters, character width, and symbols when sorting the field. The defaults use each of these setting when sorting.

    • SortKeyPosition: Provides for custom sorting by position. Each field has a number that indicates the order by which it is sorted. A zero (0) indicates that it is nonsorted. A one (1) indicates that it is sorted.

    • PivotKeyValue: This property is important only for the output columns that you define for the Pivot Column. In this column, you'll place the exact text or an expression that will resolve to the groupings that you want to appear as your pivot columns. When pivoting on colors, this value would be Blue, Green, or Red.

    • SourceColumn: This property requires the LineageID (not the ID) of the source column. This is a "poor man's" way of connecting the input columns to the output columns.

    Note

    The output columns will be generated in exactly the same order that they appear on the output columns collection. You can't move them once they are added either, so pay attention to this as you add output columns.

  7. Add an output column named "January." Now for the big secret to making the whole thing work: Set the source column value to the LineageID of the TotQuantity column — not the TransMonth column. Remember that you are building a two-dimensional grid. The TransMonth field dictates one of the dimensions. The value in the column should be the total quantity at that dimension. Set the PivotKeyValue to "January" (without quotes). The pivot key is the literal value that will be examined in the data to determine when to put a value in a column. It is important that the incoming data sorts on this column to get consistent results. Repeat this process of creating an output column for each month of the year.

    Note

    Do not use the LineageID values that you see in any of these figures. LineageIDs are specific to your own examples.

  8. To finish the example, add an OLE DB Destination. Configure to the AdventureWorks connection. Connect the Pivot Default Output to the input of the OLE DB Destination. Click on the New Button to alter the CREATE TABLE statement to build a table named PivotTable.

  9. Add a Data Viewer in the pipe between the PIVOT and OLE DB destination and run the package. You'll see the data in a pivot table in the Data Viewer as in Figure 6-31.

image from book
Figure 6-29

image from book
Figure 6-30

image from book
Figure 6-31



Professional SQL Server 2005 Integration Services
Wireless Java : Developing with Java 2, Micro Edition
ISBN: 189311550X
EAN: 2147483647
Year: 2006
Pages: 182

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net