SSAS Integration Basics Within SSIS | Expert SQL Server 2005 Integration Services (Programmer to Programmer)

To make an informed decision on how you will handle your SSAS processing, it is important to understand the uses, benefits, and drawbacks of the built-in SSAS components in SSIS. This section is followed by examples of the various choices reviewed, which, in some cases, use these features.

The two built-in control flow components already mentioned that interact with SSAS are the Analysis Services Processing Task and the Analysis Services Execute DDL Task. The two data flow components that interact with SSAS are the Dimension Processing destination and the Partition Processing destination. All of these components leverage an Analysis Services connection that must be defined in the Connection Managers window pane.

To create a new connection, right-click in the Connection Managers window pane and select New Analysis Services Connection, which brings up the Add Analysis Services Connection Manager window with a connection string. When you select Edit next to the connection string, it opens up the standard Connection Manager editor, as shown in Figure 6-4.

image from book
Figure 6-4: Standard Connection Manager editor

The Server or file name entry is the name of the SSAS server, and the Log on to the server selection defines how the connection should pass its authentication to the SSAS server. SSAS only supports Windows authentication (local machine or Active Directory security), so the two choices for security both relate to Windows security.

The Use Windows NT Integrated Security option uses the existing security account for the user who is logged in during either design or execution. When a package is executed through the command line, it is the account that runs DTExec, which may be a service account or a proxy account when using SQL Agent or Windows scheduler, or a third-party scheduling tool.
The Use a specific user name and password option allows a Windows account and password to be entered, so that you can define a different Windows account that the connection uses. Because a password must be entered, even if you select the Allow saving password check box, you must still handle the password storage or un-encryption (through an SSIS configuration entry, or command line switch or property expression).

Chapter 9 discusses package security and package roles in more detail.

Control Flow Tasks for SSAS

As mentioned, the two control flow tasks are the Analysis Services Processing Task and the Analysis Services Execute DDL Task.

Analysis Services Processing Task

The Analysis Services Processing Task is designed to process objects selected in a list with the ability to set processing types, handle errors, order and relate objects, and manage transactions. Figure 6-5 shows the Analysis Services Processing Task Editor.

image from book
Figure 6-5: Analysis Services Processing Task Editor

Adding objects to the Object list is very straightforward. Selecting Add below the Object list will enable an entire SSAS database to be selected, or specific cubes, dimensions, or mining models. When selecting a cube or mining model, you are merely instructing the task to process the underlying measure groups and partitions or mining structures. These can also be selected individually, as Figure 6-6 shows.

image from book
Figure 6-6: Adding objects to the Object list

When the objects are selected, the task will return them to the Object list on the main screen. Here is where the processing types can be selected. For each item in the list, choose the processing option from the drop-down list. Figure 6-7 shows some of the choices for the Product dimension.

image from book
Figure 6-7: Processing options for the Product dimension

If you have selected a parent container with the intention of processing the underlying objects, the list will only show the processing options that are in common with all the objects. So, you may need to select the different objects individually. Also, if you have selected the Process Incremental option for a partition, you will be prompted to configure some advanced settings. This is because when adding data to a partition incrementally, you must define which data should be added. The configuration screen (not shown) allows for a table, view, or query to be specified as the source for the rows being added to the partition.

The second aspect of this task involves setting how the processing batch will be handled. The lower half of the task editor window shown earlier in Figure 6-5 has settings related to the batch. Selecting Change Settings opens a window to modify the advanced properties, with a Processing options tab and a Dimension key errors tab. The first tab allows some general settings to be updated, such as whether to run the processing sequentially or in parallel, and how to handle SSAS transactions initiated by the task. Also, a writeback table selection exists, along with the option to process affected objects.

Figure 6-8 shows the Dimension key errors tab, which controls the action taken when a partition is being processed, and a dimension key is referenced in the underlying fact, but the associated key doesn’t exist in the SSAS dimension itself.

image from book
Figure 6-8: The Dimension key errors tab

Similar settings can be defined within a given measure group and the Use default error configuration option will then default to those settings. The Use custom error configuration option enables you to override these settings. For example, you are able to set a threshold number (property Number of errors) so that if processing errors reach the threshold, either the task can fail, or it can stop reporting the errors.

Several key errors will also cause issues in processing, including missing keys, duplicates, and NULL values. These settings can ignore, report, and continue processing, or report and stop the processing. In this case, you are able to define what types of errors you allow to happen, and which indicate a bigger issue that may need to be addressed (and, therefore, stop the processing).

The following are some drawbacks to the Analysis Services Processing Task:

Although some of the general properties are configurable through property expressions or SSIS Configurations, what is not changeable is anything related to the list of objects and the batch setting. In essence, when you use this task, you are hard-coding the list of objects and how the objects should be processed. In many cases, this will not meet your requirements, especially if partitions are involved. The good news is that even if you have multiple partitions, typically your dimension list doesn’t change, nor the processing type. This makes the processing task a good choice for defining dimension processes.
A second drawback is that the SSAS Processing Task either requires every object to be processed one at a time (in separate transactions), or the entire process set to run in parallel in the same transaction. The drawback when choosing to run the processing in parallel is that the processing is an all-or-nothing proposition. Therefore, you need to consider other choices that allow for better granular control over which objects get run in which order, and how things are batched for transactions for restartability.

Analysis Services Execute DDL Task

The Analysis Services Execute DDL Task is relatively simple in form, but provides excellent functionality. Figure 6-9 shows the task editor with the DDL property page selected.

image from book
Figure 6-9: Analysis Services Execute DDL Task Editor

There are only three properties. The Connection property, as you would guess, defines which SSAS package connection should be used. The SourceType property allows three different methods to input the XMLA statement:

Choosing Direct Input changes the third property to SourceDirect, where the XMLA can be hard-coded into the task.
Selecting File Connection changes the third property to be Source, so that a file connection manager can be selected, which identifies the file system file that contains the XMLA to run.
Choosing Variable in the SourceType means that the XMLA will be embedded in a variable of type string. The Source property becomes the package variable that the task will use.

Because these properties are exposed, you are able to modify them through expressions and configurations, which will be the pivotal use of the Analysis Services Execute DDL Task as we look at some dynamic examples.

Data Flow Destinations for SSAS Objects

Besides the data mining components discussed later, the final consideration for the built-in features involves the data flow destinations: Dimension Processing and Partition Processing.

As briefly mentioned earlier, these two data flow destinations are the only SSIS methods where the ETL process is actually pushing the data into SSAS. The other choices listed earlier instruct SSAS to begin its processing, which causes the SSAS engine to pull data in from defined sources for the various objects being processed. As discussed later, these destinations are a very powerful mechanism to load data into SSAS from non-standard sources and for near real-time processing.

Dimension Processing Data Flow Destination

The Dimension Processing destination is connected directly to the data flow pipeline, which allows any type of transformation to influence the data before it is pushed into the SSAS dimension structures. Figure 6-10 shows a data flow where source rows from a flat file are being loaded into the Geography dimension.

image from book
Figure 6-10: Data flow with source rows from a flat file

The Dimension Processing destination provides property pages for specifying the dimension (and processing option), column mappings from the source data into the dimension attributes, and dimension key error handling. Figure 6-11 shows the Connection Manager property page in the Dimension Processing Destination Editor, which identifies the SSAS connection manager, the dimension (in this case, Geography), and the type of processing (described earlier).

image from book
Figure 6-11: Connection Manager property page in Dimension Processing destination

Mapping the source columns from the pipeline to the dimension attributes is handled in much the same way as any other source-to-destination mapping within the pipeline. Figure 6-12 shows the Mappings page of the Geography Dimension Processing destination.

image from book
Figure 6-12: Mappings page of the Geography Dimension Processing destination

Attribute names and keys are required for the processing to be successful. The keys are defined within SSAS, which may be a collection of more than one column. The Copy Column transformation shown in Figure 6-10 is creating duplicates of the columns that are used for attribute keys so that they are available in the destination. Not shown, the Advanced tab allows key error handling, identical to the details shown earlier in Figure 6-8.

Partition Processing Data Flow Destination

The Partition Processing destination is very similar to the Dimension Processing destination. As with any destination, the data flow can pull from any source and perform a variety of transformation logic before the Partition Processing destination.

Figure 6-13 shows the Connection Manager tab of the destination, which is selected on the sole partition of the currency rate measure group. This page includes processing options to handle additions, full process (which includes aggregations), or data loading only.

image from book
Figure 6-13: Connection Manager tab of the destination

The Mappings tab and the Advanced tab are identical to the Dimension Processing destination, where the columns are mapped and key error handling can be defined.