Example: Creating a Job That Joins Two Tables and Generates a Report

 < Day Day Up > 



This example demonstrates one way to use the New Job wizard, the Process Library, and the Process Designer window to enter metadata for a job. The example describes one way to create a report that is needed for the example data warehouse, as described in "Example: Creating a SAS Code Transformation Template" on page 120.

Preparation

Assume the following about the job in the current example:

  • A data warehouse project plan identified the need for a report that ranks salespeople by total sales revenue. The report will be produced by a SAS ETL Studio job. The job will combine sales information with human resources information. A total revenue number will be summarized from individual sales. A new target table will be created, and that table will be used as the source for the creation of an HTML report.

  • Metadata for both source tables in the job (ORGANIZATION_DIM and ORDER_FACT) is available in a current metadata repository.

  • Metadata for the main target table in the job (Total_Sales_By_Employee) is available in a current metadata repository. "Example: Using the Target Table Designer to Enter Metadata for a SAS Table" on page 89 describes how the metadata for this table could be specified. As described in that section, the columns in Total_Sales_By_Employee were modeled after the columns in one source table, ORGANIZATION_DIM. However, when the job is fully configured in this example, the Total_Sales_By_Employee table will contain selected columns from both source tables: ORGANIZATION_DIM and ORDER_FACT.

  • All sources and targets are in SAS format and are stored in a SAS library called Ordetail.

  • Metadata for the Ordetail library has been added to the main metadata repository for the example data warehouse. For details about libraries, see "Enter Metadata for Libraries" on page 44.

  • The main metadata repository is under change-management control. For details about change management, see "Working with Change Management" on page 64.

  • You have selected a default SAS application server for SAS ETL Studio, as described in "Select a Default SAS Application Server" on page 59.

Start SAS ETL Studio and Open the Appropriate Metadata Profile

Perform the following steps to begin work in SAS ETL Studio:

  1. Start SAS ETL Studio as described in "Start SAS ETL Studio" on page 56.

  2. Open the appropriate metadata profile as described in "Open a Metadata Profile" on page 58. For this example, the appropriate metadata profile would specify the project repository that will enable you to access metadata for the required sources and targets: ORGANIZATION_DIM, ORDER_FACT, and Total_Sales_By_Employee.

Check Out Any Metadata That Is Needed

To add a source or a target to a job, the metadata for the source or target must be defined and available in the Project tree. In the current example, assume that the metadata for the relevant sources and targets must be checked out. The following steps would be required:

  1. On the SAS ETL Studio desktop, select the Inventory tree.

  2. In the Inventory tree, open the Tables folder.

  3. Select all source tables and target tables that you want to add to the new job: ORGANIZATION_DIM, ORDER_FACT, and Total_Sales_By_Employee.

  4. Select Project Check Out from the menu bar. The metadata for these tables will be checked out and will appear in the Project tree.

The next task is to create and populate the job.

Create and Populate the New Job

With the relevant sources and targets checked out in the Project tree, follow these steps to create and populate a new job. To populate a job means to create a complete process flow diagram, from sources, through transformations, to targets.

  1. From the SAS ETL Studio desktop, select Tools Process Designer from the menu bar. The New Job wizard is displayed.

  2. Enter a name and description for the job. Type the name Total_Sales_By_Employee, press the TAB key, the enter the description Generates a report that ranks salespeople by total sales revenue.

  3. Click . An empty job will open in the Process Designer window. The job has now been created and is ready to be populated with two sources, a target, a SQL Join transformation, and a Publish to Archive transformation.

  4. From the SAS ETL Studio desktop, click the Process tab to display the Process Library.

  5. In the Process Library, open the Data Transforms folder.

  6. Click, hold, and drag the SQL Join transformation into the empty Process Designer window. Release the mouse button to display the SQL Join transformation template in the Process Designer window for the new job. The SQL Join transformation template is displayed with drop zones for two sources and one target, as shown in the following display.

    click to expand
    Display 10.1: The New SQL Join Transformation in the New Job

  7. From the SAS ETL Studio desktop, click the Project tab to display the Project tree. You will see the new job and the three tables that you checked out.

  8. In the Project tree, click and drag the ORGANIZATION_DIM table into one of the two input drop zones in the Process Designer window, then release the mouse button. The ORGANIZATION_DIM table appears as a source in the new job.

  9. Repeat the preceding step to identify the ORDER_FACT table as the second of the two sources in the new job.

  10. Click and drag the table Total_Sales_By_Employee into the output drop zone in the Process Designer window. The target replaces the drop zone and a Loader transformation appears between the target and the SQL Join transformation template, as shown in the following display.

    click to expand
    Display 10.2: Sources and Targets in the Example Job

  11. From the SAS ETL Studio desktop, click the Process tab to display the Process Library.

  12. In the Process Library, open the Publish folder. Click and drag the Publish to Archive transformation into any location in the Process Designer and release the mouse button. As shown in the following display, an icon and an input drop zone appear in the Process Designer.

    click to expand
    Display 10.3: Example Job with Publish to Archive

  13. In the Process Designer window, click and drag the target Total_Sales_By_Employee over the input drop zone for the Publish to Archive transformation. Release the mouse button to identify the target as the source for Publish to Archive, as shown in the following display.

    click to expand
    Display 10.4: Target Table Used as the Source for Publish to Archive

The job now contains a complete process flow diagram, from sources, through transformations, to targets. The next task is to update the default metadata for the transformations and the target.

Configure the SQL Join Transformation

The example job now contains the necessary sources, target, and transformations. Follow these steps to configure the SQL Join transformation.

  1. In the Process Designer window, select the SQL Join transformation object, then select File Properties from the menu bar. A properties window is displayed.

  2. Click the SQL tab. Note that all columns from both source tables are included in the join operation by default.

  3. Click the Mapping tab. Click the Employee_Country column and press the DELETE key. The Employee_Country column and mapping are removed. This column is not needed in the report.

  4. In the target table on the right of the Mapping tab (a temporary work table), retain the Company column. Select the Department column. Press the DELETE key twice to delete the Department column and the Section column.

  5. In the target table on the right of the Mapping tab, retain the Org_Group column, the Job_Title column, and the Employee_Name column. Select the Employee_Gender column. Delete the next 20 columns. Retain the Total_Retail_Price column, which will be summarized to create the total revenue number for each salesperson.

  6. In the target table on the right of the Mapping tab, select the CostPrice_Per_Unit column. Delete the last two columns. The temporary target now contains only the columns that are needed in the report. Eliminating extraneous columns at this early stage maximizes the job's run-time performance.

  7. In the target table on the right of the Mapping tab, click twice in the Expression column for Total_Retail_Price. Then click again in the icon that appears at the right side of the field. This action displays the Expression Builder, which will be used to enter the expression that will summarize individual sales into a total revenue number for each salesperson.

  8. In the Expression Builder, enter the following expression and click OK, as shown in the following display. The Expression Builder window closes and the expression appears in the Expression column of the Mapping tab.

       SUM(ORDER_FACT.Total_Retail_Price) 

    click to expand
    Display 10.5: SUM Statement in the Expression Builder

    The following display shows the configuration of the Mapping tab.

    click to expand
    Display 10.6: Mapping Source Columns in the SQL Join Transformation

  9. To see how the SQL code is updated based on the contents of the Mapping tab (and other tabs in the SQL Join transformation), click the SQL tab. In the SQL code that is shown in the following display, note that the number of target columns has been reduced to six, and a SUM expression has been added for the Total_Retail_Price column.

    click to expand
    Display 10.7: SQL Code Configured Automatically

  10. The SQL Join transformation is now ready. Click OK to save input and close the properties window.

Configure the Columns in the Target and the Loader

In our example job, the SQL Join transformation is now ready to run. Follow these steps to configure the target table Total_Sales_By_Employee.

  1. In the Process Designer window, select the target Total_Sales_By_Employee, then select File Properties from the menu bar. A properties window is displayed.

  2. In the properties window, click the Columns tab.

  3. Click the Total_Retail_Price column. Change the name to Total_Revenue. The new name is a better representation of the newly summarized data.

  4. In the Total_Revenue column, scroll right to display the Format column. Enter the format DOLLAR13.2 to specify the appearance of this column in the HTML output file.

  5. In the Total_Revenue column, click twice in the Sort column to display a pull-down icon. Click the icon and select the DSCFORMATTED option. This option sorts the rows in descending order, based on the formatted value of the Total_Revenue column.

  6. Reorder the columns by selecting the rows in the list view and clicking the up or down arrows. The end result is a column order that formats the report for easy reading, as shown in the following display. When the column order is ready, the target is ready. Click OK to save input and close the properties window.

    click to expand
    Display 10.8: Configured Target Columns

  7. In the Process Designer window, select the Loader transformation, then select File Properties from the menu bar. A properties window is displayed.

  8. In the properties window, click the Mapping tab to confirm that the columns of the SQL Join transformation are correctly mapped to the columns of the target, as shown in the following display.

    click to expand
    Display 10.9: Column Mapping in the Loader

  9. In the properties window, click the Load Technique tab. Select the Drop Target radio button to replace the physical table each time the job is run.

The Loader is now configured and is ready to run.

Configure the Publish to Archive Transformation

The example job is now fully configured through the SQL Join and Loader transformations, and through the target table. Follow these steps to configure HTML output using the Publish to Archive transformation. The Publish to Archive transformation generates a SAS package file and an optional HTML report. The package file can be published by SAS programs that use the publishing functions in SAS Integration Technologies software.

  1. In the Process Designer window, select the Publish to Archive transformation, then select File Properties from the menu bar. A properties window is displayed.

  2. In the properties window, click the Options tab. Type in values for the fields that are shown in the following display.

    click to expand
    Display 10.10: Options in the Publish to Archive Transformation

  3. Click OK to save input and close the properties window. The Publish to Archive transformation, and the entire job, are now ready to run.

Run and Troubleshoot the Job

After the metadata for a job is complete, you must submit the job for execution in order to create targets on the file system.

  1. With the job displayed in the Process Designer window, select Process Submit from the menu bar. SAS ETL Studio generates code for the job and submits the code to a SAS application server. The server executes the code. A pop-up window is displayed to indicate that the job is running.

  2. If a pop-up error message appears, or if you simply want to look at the log for the completed job, click the Log tab in the Process Designer window.

  3. In the Log tab, scroll through the SAS log information that was generated during the execution of the job, as shown in the following display.

    click to expand
    Display 10.11: Log Tab with Text from the Example Job

    The code that was executed for the job is available in the Source Code tab of the Process Designer window.

  4. If you find errors in the source code for a step, select the corresponding transformation in the process flow diagram, then select File Properties from the menu bar. A properties window displays.

  5. Correct the metadata and resubmit the job until there are no more errors.

  6. After the job runs without error, save the job. Select File Save from the menu bar.

The next task is to verify that the job created the correct output.

Verify the Job's Outputs

After the job runs without error and has been saved, you should confirm that the target(s) contain the data you need, in the format that best communicates the purpose of the target(s). In the current example, the main target table is the Total_Sales_By_Employee table. The example job also creates an HTML report.

  1. To view the data in the Total_Sales_By_Employee target, select the target, then select View View Data from the menu bar. The data in the target is displayed in a View Data window, as shown in the following display.

    click to expand
    Display 10.12: Viewing the Target

  2. Confirm that the target contains the data you need, in the format that best communicates the purpose of the target.

  3. To display the HTML report, open the output file. In this case, the example generated a file in the public directory on the SAS application server. The file specification is as follows:

       \\D9585\public\salesRank.html 

    The following display shows how the example file appears in a Web browser.

    click to expand
    Display 10.13: HTML Report Generated by the Example Job

If a target needs to be improved, change the properties of that target or the transformations that feed data to that target. If the outputs are correct, you can check in the job.

Check In the Job

To check in a job in the Project tree:

  1. In the Project tree, select the repository icon.

  2. On the SAS ETL Studio desktop, select Project Check In Repository from the menu bar. All of the objects in the project repository are checked in to the change-managed repository.



 < Day Day Up > 



SAS Institute - SAS 9.1.3 ETL Studio. User's Guide
SAS 9.1.3 ETL Studio: Users Guide
ISBN: 1590476352
EAN: 2147483647
Year: 2004
Pages: 127
Authors: SAS Institute

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net