Integration Services is great for creating packages that accomplish administrative tasks on Analysis Services. Even more convenient, these automated tasks can be scheduled to run on a periodic basis. Integration Services provides several tasks and transforms, specifically designed for integration with Analysis Services, that help you build packages in the same Business Intelligence Development Studio (BIDS) IDE you have been working with in Analysis Services 2005.
The Execute DDL task is used for sending a DDL script to Analysis Services. This task is typically used to accomplish administrative operations such as backup, restore, or synchronization that need to be performed on a periodic basis. You can also use this task to send process statements to Analysis Services for processing Analysis Services objects, or include this task along with other Integration Services tasks to create a more generalized package; for example, dynamically creating a new partition in a cube by analyzing the fact data, and then processing that partition. To create a package that will back up your Analysis Services database on a periodic basis, do the following:
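The statements sent by the Execute DDL task are XMLA commands. As a rough sketch only (the database ID is assumed to match the tutorial project), a process statement that fully processes a database might look like this:

```xml
<!-- Hypothetical sketch: fully process the tutorial database -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>AnalysisServices2005Tutorial</DatabaseID>
  </Object>
  <Type>ProcessFull</Type>
</Process>
```

Any XMLA command of this shape can be fed to the Execute DDL task, which is what makes the task useful for general administrative scripting.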
Open BIDS and load the AnalysisServices2005Tutorial project found under the Chapter 16 Samples folder of the book's sample material, which can be downloaded from the Wrox Web site. Once you have loaded it, go ahead and deploy the project. This project contains some additions to the ones you created earlier, so make sure the full project is deployed.
While still in BIDS, create a new Integration Services project named IntegrationServicesTutorial by selecting File > New > Project and choosing the Integration Services Project template. You will now be in the Integration Services project shown in Figure 16-1.
The Solution Explorer window shows four folders: Data Sources, Data Source Views, SSIS Packages, and Miscellaneous, as shown in Figure 16-1. As with Analysis Services, you can create connections to data sources and add them to a data source object. The main purpose of data sources is to share connections among multiple SSIS packages within a single project. Again, as with an Analysis Services project, DSVs help you create a subset of tables, views, named queries, and named calculations that can be used by SSIS packages rather than creating equivalent views on your relational backend. If you are designing a single SSIS package against a database too small to merit a view, you might not use data source views at all. The Properties window helps you define the properties of various objects. The primary objective of SSIS is to retrieve data from a source, perform some operations to transform the data, and store the data at a destination; therefore we expect almost all SSIS projects to have some form of connection object. Familiarize yourself with the various windows in the SSIS design surface. All the connections used in your SSIS package are shown in the Connection Managers window, providing a consolidated view; if you right-click in this window you can see the various types of connections that SSIS supports. The main window, a graphical designer, is used to create SSIS packages. The SSIS design window has four views: Control Flow, Data Flow, Event Handlers, and Package Explorer. You explore the use of these views in this chapter.
All the tasks and transforms provided by SSIS are represented within the Toolbox window. To see the Toolbox window, click View > Toolbox in the menu. The Toolbox window can be docked by clicking the pin icon as shown in Figure 16-2. Drag and drop the SQL Server Analysis Services Execute DDL task to the Control Flow tab. You will now see the task in the window as shown in Figure 16-2. The SSIS designer in BIDS performs validation on every task and transform in your package. Each task has a certain set of required properties and some optional properties. If the required properties are not specified, you will see a red "x" mark within the task. If the optional properties are not defined, you will see a yellow "x" mark within the task. If any of the tasks or transforms within your package have a red "x" mark, there is an error in your package and you will not be able to run the package without resolving it. The Execute DDL task needs the connection details to Analysis Services and the DDL to execute. Because these properties have not been defined when you drag and drop the task in your editor, you will see a red "x" mark as shown in Figure 16-2.
One of the properties of the Execute DDL task specifies the connection details to an Analysis Services database. To create a connection to the Analysis Services 2005 Tutorial database, right-click the Connection Managers window. You can see all the various types of connections SSIS supports as shown in Figure 16-3. Select "New Analysis Services Connection…"
The Add Analysis Services Connection Manager dialog is launched as shown in Figure 16-4. You have the option of establishing a connection to an Analysis Services database or to an Analysis Services project within your solution. BIDS supports having multiple projects within the same solution, which means that you can have Analysis Services and Integration Services projects side by side. If you are building an SSIS package for an Analysis Services project within the same solution, you choose the second option. Select the first option and click Edit.
In the Connection Manager dialog you can specify the connection details to your Analysis Services database, such as the server name and database name, as shown in Figure 16-5. After you have specified the connection details, click OK to complete both dialogs.
To specify properties needed by the Execute DDL task double-click the Execute DDL task object within the designer. You will see the Execute DDL task editor as shown in Figure 16-6.
From the Connection drop-down list you can either create a new connection or select an existing one. Select the Analysis Services connection you established in step 7. There are three ways of specifying the DDL to be executed:
Enter the DDL in a text box. Whenever you know that your DDL is static and will not change you can use this option.
Specify a connection to the file. This option is used whenever you have a file containing the DDL. This file can be static or dynamic in the sense that the DDL contents can be changed by another operation, such as a different SSIS task or an external program.
Specify the DDL using a variable in the SSIS package where the value of the variable is the actual DDL. This option is used whenever the results from one task need to be used as an input to another task. The variable can be initialized during package execution, or the task setting the value for the variable needs to be executed before the Execute DDL task in the control flow.
Select the source type as File Connection. Select the drop-down list under Source and select New Connection. In the File Connection manager dialog, select the DDL file provided under the Chapter 16 directory on the Web site. The contents of the DDL are shown below. This DDL will take a backup of the Analysis Services 2005 Tutorial database.
<Backup xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>AnalysisServices2005Tutorial</DatabaseID>
  </Object>
  <File>AnalysisServices2005Tutorial.abf</File>
</Backup>
Once you have specified all the properties for the Execute DDL Task editor as shown in Figure 16-6, click OK.
If you run the SSIS package you have created, a backup of the Analysis Services 2005 Tutorial database will be created in the Program Files\Microsoft SQL Server\MSSQL.2\OLAP\Backup folder. Backup is usually scheduled for a time when the load on Analysis Services is minimal. Many companies do backup operations on a nightly basis, but if you are a multinational company or have customers using the database across the globe, you will have to factor in your customers' needs and take the backup at an appropriate time.
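The companion restore operation can likewise be scripted through the Execute DDL task. A minimal sketch follows; the file name matches the backup above, but the overwrite setting is an assumption for illustration:

```xml
<!-- Hypothetical sketch: restore the database from the backup file -->
<Restore xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <File>AnalysisServices2005Tutorial.abf</File>
  <DatabaseName>AnalysisServices2005Tutorial</DatabaseName>
  <AllowOverwrite>true</AllowOverwrite>
</Restore>
```

Pointing a second Execute DDL task at a file containing such a command gives you a scheduled restore to pair with the scheduled backup.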
Regardless of when the package is run, you want to know whether the operation succeeded or failed. Obviously you can check the logs on Analysis Services or the logs of the SSIS package, but as an administrator one of the easiest ways is to send an e-mail about the results of the operation. To facilitate this operation, SSIS provides a task called the Send Mail Task. By specifying appropriate parameters to this task you can send an e-mail upon completion of a specific task or an entire SSIS package.
To add the Send Mail task to your SSIS package, drag and drop two instances of Send Mail Task to your designer. You will use one task to send an e-mail when the Execute DDL task succeeds and the other to send mail when the Execute DDL task fails. Now that you have two Send Mail tasks in the Control Flow pane, it is time to connect the Execute DDL task to them. You can connect tasks in a control flow by clicking the originating object (a downward-facing green arrow will appear) and dragging the arrow end to the target object. Do that for the first Send Mail task. For the second Send Mail task, just click the Execute DDL task again and you will see another green arrow appear as an output of the item. Drag the green arrow and connect it to the second Send Mail task. Your package should look like the one shown in Figure 16-7. Each connecting line represents a precedence constraint; in fact, if you double-click the green line, the Precedence Constraint Editor will appear. The green lines indicate success: whenever the Execute DDL task completes successfully, execution continues with the task connected on the success line. To configure the second Send Mail task to send e-mail on failure, double-click its connecting line.
You should see the Precedence Constraint Editor as shown in Figure 16-8. The connecting line between the two tasks has several properties that are evaluated after the completion of the source task. You can specify an expression and/or constraint that is evaluated after the completion of the task. The Value property of the connecting line lets you choose the constraint that determines whether control is transferred to the next task. The three options for the Value property are Success, Failure, and Completion. Change the value from Success to Failure. You can also configure the precedence constraint by right-clicking the connecting line between two tasks and selecting Success, Failure, or Completion. Click OK after you have completed specifying the constraint.
Double-click the Send Mail task to configure its properties. You need to specify the mail server and the details of the mail content in the properties of the Send Mail task. The Send Mail Task Editor is shown in Figure 16-9. Specify the details of the mail server by clicking the drop-down list of the SmtpConnection property. Your company should have an SMTP server.
Contact your IT administrator to get the name of your SMTP server. Specify the e-mail address from which you want this mail to be sent, the people who need to receive the status of this package execution, and the content of the mail, as shown in Figure 16-9. Based on the Send Mail task you have chosen, provide the appropriate subject and message source. Figure 16-9 shows the contents of the Send Mail task that will be executed on successful completion of the Execute DDL task created in a previous step. Make sure the other Send Mail task's properties are entered appropriately. Rename the Send Mail tasks Send Mail Success and Send Mail Failure by changing the name of each task as shown in Figure 16-10. Appropriate naming makes the SSIS package easily readable and immediately understandable to another person working on it.
The SSIS package is now ready for execution. You can select Debug > Start Debugging, hit the F5 key, or right-click the package name and select Execute Package. BIDS now starts execution of the SSIS package, operating in a debugging environment similar to debugging a program. You will first see the SQL Server Analysis Services Execute DDL task highlighted in yellow as shown in Figure 16-11, which indicates that the task is currently executing. If a task completes successfully, its status is shown in green; if it fails, its status is shown in red. You also have the ability to insert breakpoints and analyze variables used within tasks or transforms in the debug environment.
Once the entire package is completed, status on each of the tasks is shown; that is, the two tasks are shown in green having completed successfully.
In the debug environment you can see detailed information for each task, including the time each task took to complete, in the Progress window. You can switch to the Progress window while the package is being executed; it is updated as control moves from one task to another. Figure 16-12 shows the progress report of the execution of the package. You can see that the Execute DDL task, which took a backup of an Analysis Services database, took 47 seconds to complete.
SSIS provides a task for processing Analysis Services objects. You can process an entire Analysis Services database, or choose a specific dimension, cube, or even individual partitions for processing, using the SQL Server Analysis Services Processing task. The Analysis Services Processing task is useful whenever you have changes in your relational data that need to be propagated to the cube. Often retail companies have new products added to their catalog every day, and the products table gets updated with the new products or changes to existing products as a daily batch process. Similarly, the daily sales data gets updated in the relational database as a nightly batch process. In order to propagate these changes to the cube for analysis, the dimensions and cubes need to be processed, unless you have set the storage mode of your dimensions and cubes to ROLAP. There are several considerations involved in determining the frequency of processing. Should cubes be processed on a daily, weekly, or monthly basis? The decision is typically based upon the size of the dimensions and cubes, how often data changes in the relational database, and the frequency with which business analysts analyze the cube data. In most cases there are additions to the products table rather than updates, and hence an incremental process of the products dimension might be sufficient. If your fact table gets updated with daily transactional data in the same table, you have the option of creating new partitions in the cube on a daily or weekly basis, or doing a full process of the cube. The Microsoft operations guide for Analysis Services suggests you create a new partition for every 20 million records, or when the partition file reaches 5GB, to achieve optimal performance. How you partition your data ultimately depends on your business needs.
To create an Integration Services package that processes an Analysis Services Sales partition, do the following:
Right-click the SSIS Packages folder and select New SSIS Package. Name the package PartitionProcessing.
Similar to what you did in the backup package earlier in this chapter, create a connection to the AnalysisServices2005Tutorial database in Connection Managers.
Drag and drop the SQL Server Analysis Services Processing task and two Send Mail tasks into the SSIS designer. Configure one of the Send Mail tasks for success and another one for failure.
Double-click the SQL Server Analysis Services Processing task. This launches the Analysis Services Processing Task editor as shown in Figure 16-13. This dialog is similar to the Processing dialog of Analysis Services, which you learned about in previous chapters.
If you click the Change Settings button, the Change Settings dialog pops up as shown in Figure 16-14. The Change Settings dialog allows you to process the selected Analysis Services objects sequentially or in parallel. You can use the "Dimension key errors" tab to configure the processing options so that appropriate actions are taken when errors are encountered during the processing operation. The selected option will apply to all the Analysis Services objects chosen in the Analysis Services processing task editor. To add Analysis Services objects for processing, click the Add button on the Analysis Services Processing Task Editor (Figure 16-13).
The Add SQL Server Analysis Services Object dialog allows you to choose the objects you want to process. Select the Fact Internet Sales Partition as shown in Figure 16-15 and click OK. Click OK again to dismiss the Task Editor dialog.
Press the F5 key to execute the SSIS processing package and make sure it completes correctly. If everything has been specified correctly you will see successful completion of the SQL Server Analysis Services Processing task and the Send Mail Success task; these tasks will be highlighted in green, indicating successful execution. Of course, you do need appropriate privileges on the Analysis Services instance to perform these operations.
Typically, the data from the transactional source database (production system) is extracted, staged to some intermediate storage, and then undergoes transformations before being stored in a data warehouse (flat file or relational format). This data then needs to be propagated to the cube for analysis. Analysis Services has the ability to read data from various data sources, from flat files to various relational databases. One way to add new fact data to the cube is to create new partitions that read data from these data sources. You can use the SSIS Script task to generate new DDL, execute the DDL using the Execute DDL task, and then process the partition using the Analysis Services Processing task, creating a package that integrates all these steps. Even though you can utilize these tasks to load fact data, it is not easy to load new dimension data into an existing dimension this way. Therefore, SSIS provides an easier way to load new fact and dimension data into your cube using SSIS transforms. The two transforms that help in loading such data are the Partition Processing transform and the Dimension Processing transform.
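A dynamically generated partition-creation statement of the kind described above is again XMLA. The following is a rough sketch only; the cube ID, measure group ID, partition name, data source, and query are hypothetical placeholders, not names taken from the tutorial project:

```xml
<!-- Hypothetical sketch: create a new partition bound to a relational query -->
<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ParentObject>
    <DatabaseID>AnalysisServices2005Tutorial</DatabaseID>
    <CubeID>Adventure Works DW</CubeID>
    <MeasureGroupID>Fact Internet Sales</MeasureGroupID>
  </ParentObject>
  <ObjectDefinition>
    <Partition>
      <ID>Fact Internet Sales Partition4</ID>
      <Name>Fact Internet Sales Partition4</Name>
      <Source xsi:type="QueryBinding">
        <DataSourceID>Adventure Works DW</DataSourceID>
        <QueryDefinition>
          SELECT * FROM FactInternetSales WHERE OrderDateKey &gt;= 20050101
        </QueryDefinition>
      </Source>
      <StorageMode>Molap</StorageMode>
    </Partition>
  </ObjectDefinition>
</Create>
```

A Script task could assemble the query predicate from the current date, write the command to a file or variable, and hand it to the Execute DDL task for execution.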
Many large retail stores still use flat files to extract data from their transactional systems; your company probably does the same. Often the columns in the flat files do not contain clean data. During the staging process you clean the data and format it with the appropriate IDs that match your cube in order to load the data into the cube. SSIS provides transformations to do lookups, get the correct IDs, clean the data on the fly, and then load the data into your cube. In the following example you will be working with clean data that needs to be loaded from a flat file into one of the partitions of the AnalysisServices2005Tutorial cube:
Create a new SSIS package under the SSIS packages folder and name it PipelineDataLoad.
The SSIS task that helps you to read data, perform transforms, and then push the data into a destination is called the Data Flow task. Drag and drop the Data Flow task into your SSIS editor as shown in Figure 16-16 and name it Data Flow Partition Load.
Double-click the Data Flow task. You will now be in the Data Flow view. The Toolbox window shows the SSIS transforms available for use in the Data Flow view, categorized into three main areas: data flow sources, data flow transformations, and data flow destinations. The data to be loaded into the partition of the Analysis Services 2005 Tutorial cube is provided as a flat file in the Chapter 16 Samples folder named Adventure Works Partition 3 Data.txt. To retrieve this data you need the Flat File Source transform. This data needs to be pushed to a partition in the AnalysisServices2005Tutorial cube, so you also need a Partition Processing transform. Drag and drop the Flat File Source and Partition Processing transforms from the Toolbox to the Data Flow editor and join them through the connector as shown in Figure 16-17.
Double-click the Flat File Source transform to specify the connection to the flat file. You will now be in the Flat File Source Editor as shown in Figure 16-18. You need to specify the flat file using the Flat file connection manager. Click the New button.
The Flat File Connection Manager dialog shown in Figure 16-19 now pops up. Click the Browse button and select the flat file AdventureWorksPartition3Data.txt, which is available for download from the book's Web site. The dialog now parses the data in the flat file. You need to specify the type of delimiter used in the flat file to separate the columns; click Columns and choose the comma delimiter, as shown in Figure 16-20. You also have the option to skip rows from the flat file if the first row or the first few rows contain column headers. Select the check box "Column names in the first data row." To verify that the dialog is able to read the data from the flat file correctly based on the delimiter, click the Columns property.
You will now see the data from the flat file organized as rows and columns as shown in Figure 16-20. After you have confirmed that the dialog is able to read the data in the flat file, click OK.
In the Flat File Source Editor, click Columns as shown in Figure 16-21. By default, SSIS uses the flat file column names as the output column names. You can change the output column names in this dialog by editing the appropriate row. Leave the default names suggested by SSIS and click OK.
After configuring the Flat File Source, you need to specify the partition into which this data needs to be loaded. Double-click the Partition Processing transform in the SSIS Data Flow view. You can now see the Partition Processing Destination Editor as shown in Figure 16-22. As in the first two SSIS packages, you need to specify the connection to the database. Click the New button to specify the connection to the AnalysisServices2005Tutorial database. You will now see the cubes and partitions within the database.
Select the partition into which the data needs to be loaded and specify the processing method to apply. If the data is new, you typically use the Add (Incremental) option, which processes the partition incrementally. Processing the partition incrementally means the new data is added to the cube while the current data remains available for querying; once the new data has been processed, it is committed and becomes available for querying as well. The incremental processing method's primary benefit is that it can serve customers' queries even while new data is being added to the partition. Analysis Services accomplishes this by cloning the existing version of the partition and adding data to the clone. Once all the data has been processed in the new version of the partition, and the original partition is free of query locks, Analysis Services switches the versions, and the new version containing the entire data set becomes available for querying. Select Fact Internet Sales Partition2 as the partition to add the data to, and Add (incremental) as the processing method, as shown in Figure 16-22.
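For reference, the processing operation the transform performs corresponds to an XMLA ProcessAdd command. This is a hedged sketch: the cube and measure group IDs are assumptions, since only the partition name appears in the tutorial, and in practice the server pairs such a command with a binding that identifies the source of the new rows:

```xml
<!-- Hypothetical sketch: incrementally process one partition -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>AnalysisServices2005Tutorial</DatabaseID>
    <CubeID>Adventure Works DW</CubeID>
    <MeasureGroupID>Fact Internet Sales</MeasureGroupID>
    <PartitionID>Fact Internet Sales Partition2</PartitionID>
  </Object>
  <Type>ProcessAdd</Type>
</Process>
```

Seeing the command form makes it clear why Add (Incremental) leaves existing partition data untouched: only the new rows are read and merged in.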
A more typical package using the Partition Processing transform will contain several Lookup transforms to map the incoming dimension columns to the correct IDs in the dimensions of the OLAP database. Once the correct ID for each dimension is obtained, the retrieved dimension ID columns are mapped to the partition processing columns to load the data.
Click Mappings as shown in Figure 16-23 to specify the mappings from the columns in the flat file to the columns in the partition. The columns in the flat file have been given the same names as the ones in the cube, so you will find it easy to map each column directly, as shown in Figure 16-23. Make sure you map all the columns correctly on this page. You can ignore the Dim Geography.Dim Geography destination column, because Dim Geography is a reference dimension and hence its key does not exist in the fact table. Click OK after completing all the mappings in the Partition Processing Destination Editor.
You have completed all the necessary settings in the SSIS package to load data into a partition. Hit the F5 key to test the execution of your SSIS package. You will see the background colors of the two data flow transforms, Flat File Source and Partition Processing, highlighted in yellow, indicating that the SSIS package is executing. Along the connector line between the two transforms you can see the number of rows being processed. Figure 16-24 shows a snapshot of the SSIS package execution. After all the data has been loaded without errors, the background color of the two transforms turns green, indicating successful completion of the package. During this SSIS operation, which does incremental processing, Analysis Services creates a new temporary partition, loads the data from the flat file into it, and then merges the partition into Internet Sales Partition 2.
In the above data load example there was a one-to-one mapping between the columns in the flat file and the measures and granularity attributes of the partition, except for the reference dimension granularity attribute DimGeography. This was possible because all the measures in the partition mapped directly to columns in the flat file. Assume you have a fact data column that is used twice in a measure group as two measures: one with Sum as the aggregation function and another with Count or Distinct Count. Such a scenario is pretty common. In this case you will not be able to map the corresponding column from the flat file to both measures, because the Partition Processing transform disallows mapping a single source column (in this example, from the flat file) to multiple destination columns in the partition. If you leave even a single destination column of the partition unmapped, your data load will fail. You need either additional column(s) in the source that you can map to the corresponding partition columns, or the SSIS Copy Column transform to duplicate existing column(s) to serve as input to the Partition Processing transform. We recommend you modify the AnalysisServices2005Tutorial database to have a distinct count measure in the Internet Sales partition and then create an SSIS package with a Copy Column transform between the Flat File Source and the Partition Processing transform to map the copied column to the distinct count measure.
You have successfully learned to create SSIS packages for performing administrative tasks on Analysis Services such as backup and processing. Other administrative tasks, such as synchronization and restore, can be performed using the same tasks and transforms provided by SSIS. In addition to providing tasks and transforms for OLAP features, SSIS also provides tasks and transforms for data mining, both for administrative tasks and for querying, which you learn about in the next section.
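Synchronization, like backup and restore, is driven by an XMLA command sent through the Execute DDL task. A rough sketch follows; the source server name and the security and compression options shown are illustrative assumptions:

```xml
<!-- Hypothetical sketch: synchronize the database from a source server -->
<Synchronize xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Source>
    <ConnectionString>Provider=MSOLAP;Data Source=SourceServer;Integrated Security=SSPI</ConnectionString>
    <Object>
      <DatabaseID>AnalysisServices2005Tutorial</DatabaseID>
    </Object>
  </Source>
  <SynchronizeSecurity>SkipMembership</SynchronizeSecurity>
  <ApplyCompression>true</ApplyCompression>
</Synchronize>
```

Scheduling such a command in a package is a common way to push a freshly processed database from a processing server to one or more query servers.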
SSIS 2005 provides tasks and transforms specifically targeted at data mining objects in Analysis Services 2005. With the Data Mining Query task you can query mining models and store the results in a destination such as a relational database. One common use of this task is to predict a list of customers for a targeted marketing campaign. If a company wants to offer promotional discounts to targeted customers every month, it would predict whether a new customer is valuable based on the customer's attributes, calculate an appropriate discount, and mail that customer a coupon. The Data Mining Query transform is used when you want to manipulate the source data feeding the mining model, or to reshape the mining model's output into a more useful format. For examples and illustrations of Integration Services tasks for data mining, please refer to Professional SQL Server 2005 Integration Services by Brian Knight, et al. (Wiley, 2005).