Advanced Processing and Partition Management Examples


When it comes to an enterprise BI solution, rarely is the status quo applicable for processing objects. Usually, there are unique precedent requirements or complicated partition management and processing needs to get the performance and agility essential for successful BI applications. Earlier, this chapter discussed the different ways that SSAS objects could be processed and managed within SSIS. The prior section looked at the basic out-of-the box functionality. So, now it’s time to consider leveraging some of the other approaches and learning how to take advantage of the Analysis Services Execute DDL Task by making the XMLA dynamic.

Dimension Processing

Dimension processing is generally straightforward, except for some potential unique precedence handling or transaction management. Because of this, even though many of the prescribed approaches would work for dimension processing, the keep-it-simple corollary dictates that some of the approaches would be overkill.

Leveraging the Analysis Services Processing Task is often sufficient, and has already been described. The other straightforward approaches involve creating an XMLA script for processing the dimensions. Rarely do dimension structures change, and, in most scenarios, a process update is performed on the dimensions. In most cases, the XMLA script doesn’t need to be dynamic. Therefore, the XMLA can be generated and then just executed either by using the ASCMD executable or the Analysis Services Execute DDL Task. Here’s an example of a small XMLA script that performs a parallel process update on the Product, Employee, and Time dimensions:

  <Batch     xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">   <Parallel MaxParallel = "2">     <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema">       <Object>         <DatabaseID>Adventure Works DW</DatabaseID>         <DimensionID>Dim Employee</DimensionID>       </Object>       <Type>ProcessUpdate</Type>       <WriteBackTableCreation>UseExisting</WriteBackTableCreation>     </Process>     <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema">       <Object>         <DatabaseID>Adventure Works DW</DatabaseID>         <DimensionID>Dim Product</DimensionID>       </Object>       <Type>ProcessUpdate</Type>       <WriteBackTableCreation>UseExisting</WriteBackTableCreation>     </Process>     <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema">       <Object>         <DatabaseID>Adventure Works DW</DatabaseID>         <DimensionID>Dim Time</DimensionID>       </Object>       <Type>ProcessUpdate</Type>       <WriteBackTableCreation>UseExisting</WriteBackTableCreation>     </Process>   </Parallel> </Batch> 

This code was generated in SSMS by connecting to SSAS. The Processing dialog box (right-click any dimension and choose Process) contains the ability to script out the processing settings.

To run this XMLA script with the Analysis Services Execute DDL Task, the XMLA can just be entered directly into the task as described earlier, or captured into a package variable identified within the task. (A later discussion of partition processing will involve dynamic XMLA with the Analysis Services Execute DDL Task.)

A second approach to executing this XMLA script is to leverage the ASCMD.exe executable that was first shipped with SQL Server 2005 SP1 and with an updated version in SP2. As mentioned earlier, the ASCMD contains the ability to run XMLA scripts, MDX queries, and DMX queries. Refer to the following URL for an article that describes the full feature set of ASCMD.exe:

 http://msdn2.microsoft.com/en-us/library/ms365187.aspx 

In its basic use, the ASCMD can be used in SSIS through an Execute Process Task. For example, the prior XMLA script can be saved to a local directory file, and the Execute Process Task can be used in the control flow to run ASCMD.exe, identifying the file containing the XMLA script. Figure 6-14 shows an example package designed in this manner executing in the designer with the command window open.

image from book
Figure 6-14: Package executing in the designer with the command window open

When looking at the properties of the Execute Process Task, the task is configured to run the ascmd.exe executable and pass in three primary arguments. Figure 6-15 shows the editor being used to execute the Execute Process Task.

image from book
Figure 6-15: Execute Process showing ASCMD editor settings

The first argument, -S localhost, specifies the server to run the script. The second argument, -d "Adventure Works DW", identifies which SSAS database the script should be run against. The third argument, -i Dim_Process.xmla, specifies the name of the file containing the XMLA script to run.

ASCMD.exe also supports the ability to dynamically update the contents of the XMLA, MDX, or DMX code with variables passed into the command line through the arguments. This will be discussed later during an examination of processing partitions with ASCMD.exe.

Partition Creation and Processing

If you have an SSAS solution that does not contain more than one partition per measure group, or your design has a static set of partitions and you do not have the need to dynamically add or process them, then you can use one of the prior solutions demonstrated with the following:

  • The Analysis Services Processing Task

  • The direct input use of the Analysis Services Execute DDL Task

  • The ASCMD.exe utility with a static XMLA file

If you have the need to dynamically process partitions, the following are your choices:

  • A parameter-driven ASCMD.exe (or a dynamic build of the XMLA file)

  • Dynamic XMLA code in conjunction with the Analysis Services Execute DDL Task

  • AMO code embedded in a Script Task or Script Component

Parameter-Driven ASCMD

Among other benefits (such as more comprehensive logging), the ASCMD.exe utility also contains the capability to accept parameters as arguments, and then apply those parameters to the XMLA code being executed. To demonstrate, a new XMLA file has been generated with the following XMLA code. Included in the code are parameters identified by the reference $(parameter_name).

  <Batch    xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">   <Parallel MaxParallel="$(parallel_count)">     <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema">       <Object>         <DatabaseID>Adventure Works DW</DatabaseID>         <CubeID>Adventure Works DW</CubeID>         <MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>         <PartitionID>$(partition_1)</PartitionID>       </Object>       <Type>ProcessFull</Type>       <WriteBackTableCreation>UseExisting</WriteBackTableCreation>     </Process>     <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema">       <Object>         <DatabaseID>Adventure Works DW</DatabaseID>         <CubeID>Adventure Works DW</CubeID>         <MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>         <PartitionID>$(partition_2)</PartitionID>       </Object>       <Type>ProcessFull</Type>       <WriteBackTableCreation>UseExisting</WriteBackTableCreation>     </Process>   </Parallel> </Batch> 

Three parameters have been defined: $(parallel_count), $(partition_1), and $(partition_2). This file has been saved as is with the name Part_Processing.xmla.

The host package will be calling this XMLA script, which contains four matching package variables, and an Execute SQL Task, which updates the variables just before the Execute Process Task, as shown in Figure 6-16.

image from book
Figure 6-16: Host package

Since the package variables must update the argument switches for the executable, a property expression will be used to handle this within the Execute Process Task. The property expression is defined in the Execute Process Task on the Expressions property page, as shown in Figure 6-17. Since the goal is to modify the command line switches, the Arguments property is updated with an expression.

image from book
Figure 6-17: Execute Process Task on the Expressions property page

The following SSIS expression integrates the package variables with the ASCMD switches needed to execute the XMLA file. Remember that package variable references are case-sensitive.

    "-S " + @[User::SSAS_Server] + " -d \"Adventure Works DW\" -i Part_Process.xmla -v parallel_count=\"" + @[User::Parallel_count] + "\" -v partition_1=\"" +  @[User::Partition_1] + "\" -v partition_2=\"" +  @[User::Partition_2] + "\"" 

When the Execute Process Task runs, the Arguments property is updated with the evaluated value for the expression.

  -S localhost -d "Adventure Works DW" -i Part_Process.xmla -v parallel_count="2" -v partition_1="Internet_Sales_2003" -v partition_2="Internet_Sales_2004" 

In conclusion, this approach works well in the case where a fixed number of partitions must execute, and the XMLA can be parameterized to handle the fixed number, which would map to pre-defined package variables. The drawback is that the parameterization requires a fixed number of partitions. To circumvent this, the file could be updated with the correct number of partitions before the execution, if needed, or the Analysis Services Execute DDL Task could be used, as described next.

SSAS Execute DDL Task for Dynamic Processing

The next choice works well for cases where the partition count is dynamic. Using the Analysis Services Execute DDL Task, the XMLA can be modified before the DDL task is run. Figure 6-9 earlier showed how the Execute DDL Task works, by reading the XMLA (or MDX or DMX) either through the direct input, a package variable, or from a file.

Since the SSIS expression language does not support iterative functions, using a property expression with the direct input option will not allow the partition count to be dynamic. Instead, the Script Task is a good alternative to dynamically generate the XMLA and either outputting the script to a file, or the contents into a variable. If you are proficient with TSQL, the XMLA could be generated within a stored procedure and returned with the Execute SQL Task. But be mindful of output parameter length limitations.

Taking the approach of embedding the XMLA inside a variable (which is updated by a script) and using the variable in the Execute DDL Task, your control flow would look similar to Figure 6-18.

image from book
Figure 6-18: Control flow of embedding the XMLA inside a variable

The Execute SQL Task here is returning a comma-separated list of partitions that must be processed into a package string variable called Partition_List. With the following code, the Script Task takes the list, plus the other inputs, and builds the XMLA, returning it to a string variable named XMLA_Code.

  Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime Imports System.Text Public Class ScriptMain   Public Sub Main()     Dim sPartitionList As String = Str(Dts.Variables("Partition_List").Value)     Dim sParallel_count As String = CStr(Dts.Variables("Parallel_count").Value)     Dim sb As StringBuilder = New StringBuilder()     sb.Append("<Batch xmlns=""http://schemas.microsoft.com")     sb.Append("/analysisservices/2003/engine"">")     sb.Append("<Parallel MaxParallel=""")     sb.Append(sParallel_count)     sb.Append(""">")     For Each sPartition As String In sPartitionList.Split(","c)       sb.Append("<Process xmlns:xsd=""http://www.w3.org/2001/XMLSchema""")       sb.Append(" xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance""")       sb.Append(" xmlns:ddl2=""http://schemas.microsoft.com")       sb.Append("/analysisservices/2003/engine/2""")       sb.Append(" xmlns:ddl2_2=""http://schemas.microsoft.com")       sb.Append("/analysisservices/2003/engine/2/2"">")       sb.Append("<Object><DatabaseID>Adventure Works DW</DatabaseID>")       sb.Append("<CubeID>Adventure Works DW</CubeID>")       sb.Append("<MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>")       sb.Append("<PartitionID>")       sb.Append(sPartition)       sb.Append("</PartitionID></Object>")       sb.Append("<Type>ProcessFull</Type>")       sb.Append("<WriteBackTableCreation>UseExisting</WriteBackTableCreation>")       sb.Append("</Process>")     Next     sb.Append(" </Parallel></Batch>")     Dts.Variables("XMLA_Code").Value = sb.ToString()     Dts.TaskResult = Dts.Results.Success     End Sub End Class 

When you are using the Script Task to perform the XMLA and integrating package variables, be sure to add the package variables to either the ReadOnlyVariables or ReadWriteVariables list as appropriate in the Script property page of the task. In this case, the XMLA_Code variable was added to the ReadWriteVariables list, and the Partition_List and Partition_count were added to the ReadOnlyVariables list.

At this point, the XMLA now in the variable can be used by the SSAS Execute DDL Task to process the partitions. Figure 6-9 earlier showed the editor of the DDL Task. In this package, the SourceType is set to Variable and the Source property is set to User::XMLA_Code.

A Preview of the Script Task with AMO

Processing SSAS objects can also be handled programmatically by using the Analysis Services object model API, called AMO (see the complete object model reference at http://msdn2.microsoft.com/en-us/library/ms345088.aspx). AMO contains all the methods, properties, and objects needed to perform many tasks such as creating SSAS partitions or other objects (which is discussed briefly in the next section), modifying settings at all levels, performing administrative operations such as backups and restores, and processing objects.

As a preview to using AMO, the following code in the Script Task loops through all the SSAS dimensions in the AdventureWorks DW cube database and performs a process update on each one:

  Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime Imports Microsoft.AnalysisServices Class ScriptMain   Public Sub Main()     ' Get Server and Database name from SSIS connection managers     Dim oConnection As ConnectionManager     oConnection = Dts.Connections("SSAS")     Dim sServer As String = _         CStr(oConnection.Properties("ServerName").GetValue(oConnection))     Dim sDatabase As String = _     CStr(oConnection.Properties("InitialCatalog").GetValue(oConnection))     ' Connect to the requested server     Dim oServer As New Microsoft.AnalysisServices.Server     oServer.Connect(sServer)     ' Connect to the database     Dim oDB As Database = oServer.Databases.FindByName(sDatabase)     Dim oDimension As New Microsoft.AnalysisServices.Dimension     'Process update each dimension     For Each oDimension In oDB.Dimensions       oDimension.Process(ProcessType.ProcessUpdate)     Next     Dts.TaskResult = Dts.Results.Success     End Sub End Class 

Note that the Microsoft.AnalysisServices namespace reference has been added to the script to allow interaction with AMO references. With the release of SQL Server 2005 SP2, this reference is available with the .NET Framework 2.0. However, if you are running a prior release, you will need to copy the Microsoft.AnalysisServices.dll assembly (found in %Program Files%\Microsoft SQL Server\90\SDK\Assemblies) to the .NET Framework build folder (%windows%\Microsoft.NET\ Framework\v2.0.[build#]) in order to interact with the references during design time.

The script also references the SSAS connection (named SSAS as the script references) within the SSIS package, which prevents the need to hard-code the server or database. The full AMO reference can be found in Books OnLine or online at MSDN.

Figure 6-19 shows the control flow with the single Script Task used in this example, but the code can also be used in a Script Component in the data flow, with the connection information included in the PreExecute subroutine.

image from book
Figure 6-19: Control flow with the single Script Task

Partition Management

When it comes to managing SSAS partitions (creating, modifying, and deleting), here are your choices in SSIS:

  • Use an Analysis Services Execute DDL Task to run a dynamically built XMLA script, which uses the Alter and ObjectDefinition commands of XMLA, and dynamically modifies the partition list that must be created.

  • Leverage the ASCMD.exe executable to run the same XMLA DDL statement, and parameterize the XMLA file with the partition and measure group names.

  • Use AMO in a Script Task or Script Component to create or modify the partition.

With the first two choices, the following XMLA example would be the basis for creating a partition. In this example, a new partition is generated in the Fact Internet Sales measure group with the name Internet_Sales_2005. This XMLA was generated in SSMS by scripting a create partition statement dialog to the query editor window.

  <Alter AllowCreate="true" ObjectExpansion="ObjectProperties"       xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">   <Object>     <DatabaseID>Adventure Works DW</DatabaseID>     <CubeID>Adventure Works DW</CubeID>     <MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>     <PartitionID>Internet_Sales_2005</PartitionID>   </Object>   <ObjectDefinition>     <Partition xmlns:xsd=http://www.w3.org/2001/XMLSchema             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">       <ID>Internet_Sales_2005</ID>       <Name>Internet_Sales_2005</Name>       <Annotations>         <Annotation>           <Name>AggregationPercent</Name>           <Value>20</Value>         </Annotation>       </Annotations>       <Source xsi:type="QueryBinding">         <DataSourceID>Adventure Works DW</DataSourceID>         <QueryDefinition>         SELECT *         FROM [dbo].[FactInternetSales]         WHERE OrderDateKey &gt;= '915'         AND OrderDateKey &lt;= '1280'         </QueryDefinition>       </Source>       <StorageMode>Molap</StorageMode>       <ProcessingMode>Regular</ProcessingMode>       <AggregationDesignID>AggregationDesign</AggregationDesignID>     </Partition>   </ObjectDefinition> </Alter> 

As with the other examples provided earlier in this chapter, the XMLA can be generated in a Script Task, or used in a file with ASCMD parameters to update the properties.

The third choice for creating or modifying partitions is to use AMO. One advantage of AMO is the ability to clone an existing partition to a new partition. In other words, you can have a template partition in each of your measure groups that is used as the basis for any new partition that must be created; then the AMO code is a simple template-copy operation. The following AMO code embedded in a Script Task demonstrates cloning an existing partition named Internet_Sales_2001 to new partitions contained in the Partitions_List package variable (in this case, as a comma-delimited string of names). The script first checks for the existence of the partition before cloning the identified template partition.

  Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime Imports Microsoft.AnalysisServices Imports System.Text Class ScriptMain   Public Sub Main()     Dim sPartitionList As String = _         CStr(Dts.Variables("Partition_List").Value)     Dim sCube As String = "Adventure Works"     Dim sMeasureGroup As String = "Internet Sales"     Dim sPartitionTemplate As String = "Internet_Sales_2001"     Dim sb As StringBuilder = New StringBuilder()     ' Get Server and Database name from DTS connection object     Dim oConnection As ConnectionManager     oConnection = Dts.Connections("SSAS")     Dim sServer As String = _     CStr(oConnection.Properties("ServerName").GetValue(oConnection))     Dim sDatabase As String = _     CStr(oConnection.Properties("InitialCatalog").GetValue(oConnection))     ' Connect to the requested server     Dim oServer As New Microsoft.AnalysisServices.Server     oServer.Connect(sServer)     ' Connect to the database, cube, measuregroup, and partition     Dim oDB As Database = oServer.Databases.FindByName(sDatabase)     Dim oCube As Cube = oDB.Cubes.FindByName(sCube)     Dim oMeasureGroup As MeasureGroup = _         oCube.MeasureGroups.FindByName(sMeasureGroup)     Dim oPartitionTemplate As Partition = _         oMeasureGroup.Partitions.FindByName(sPartitionTemplate)     For Each sPartition As String In sPartitionList.Split(","c)       ' Check for the partition existence, clone as needed       If Not oMeasureGroup.Partitions.Contains(sPartition) Then         Dim oNewPartition As Partition         oNewPartition = oPartitionTemplate.Clone()         oNewPartition.Name = sPartition         oNewPartition.ID = sPartition         oMeasureGroup.Partitions.Add(oNewPartition)         oNewPartition.Update()         oNewPartition = Nothing       End If     Next         Dts.TaskResult = Dts.Results.Success     End Sub End Class 

To be sure, this AMO example needs error handling for object existence and does not completely update every property necessary for the new partition (such as the source filter or slicer), but can be used as a starting point for more complex partition management needs.

Putting it all together, you will end up with several steps that comprise your SSAS processing requirements. Figure 6-20 shows a control flow containing four tasks to acquire partitions, process dimensions, manage partitions, and then process your partitions.

image from book
Figure 6-20: Control flow containing four tasks

In Step 1, SQL has tracked the partitions that need processing, and this step returns a comma-separated partition list to the variable XMLA_Code. Step 2 runs the ASCMD executable, which calls an XMLA file configured to perform a process update on the dimensions. Step 3 handles both the creation of the partitions, as well as composes the XMLA code to process the partitions, which is captured in the XMLA_Code package variable. The final step runs the generated XMLA to process the affected partitions.

Processing SSAS Cubes from Non-SQL Server Sources

Oftentimes, you may have an SSAS cube that is sourced from a data warehouse or mart that is stored in a non-SQL Server RDBMS (such as Teradata, Oracle, or DB2), and you are tasked with coming up with the right processing architecture to load the cubes and dimensions as quickly as possible.

The problem you run into is that the data access providers perform very slowly when taking the standard approach to processing the objects, or worse, SSAS does not support the source for SSAS objects.

The solution to this challenge is to leverage the SSAS data flow destinations. In other words, you may have your SSAS database objects built on top of an Oracle schema living on a remote database, but instead of loading your partitions directly from Oracle (which can be painfully slow), instead you can dump the data to a flat file, copy the file over the SSAS server, and then use the Partition Processing destination to load your partitions (potentially a much faster approach). The Dimension Processing and Partition Processing destinations were reviewed earlier (see Figures 6-10, 6-11, and 6-12). Figure 6-21 shows the Mappings tab of the Partition destination.

image from book
Figure 6-21: Mappings tab of the Partition destination

Since dimensions are comparatively small, it may make sense to continue processing dimensions directly from the source system and only leverage this strategy for fact data. That design choice will depend on the complexity, dimension data volume, and performance of your solution.

If you have a data warehouse built on a non-supported source for SSAS (such as IBM DB2), then you can base your cubes on an empty SQL Server schema in a template-like fashion, but still load the IBM DB2 data in the SSAS objects through these SSIS destinations.

Implications of Type 1, Type 2, and Inferred Members on Attribute Relationships

Since we talked about Type 0 fixed, Type 1 changing, Type 2 historical, and inferred member when looking at dimension ETL in Chapter 4, it is worth mentioning the processing implications with SSAS attributes. Figure 6-22 shows the dimension designer in SSAS with the Product Line attribute relationship highlighted for the Model Name attribute of the Product dimension. Also highlighted on the right is the RelationshipType property, which contains two selection choices, Flexible and Rigid.

image from book
Figure 6-22: Dimension designer in SSAS

An attribute relationship associates one attribute with another for several reasons, one of them being aggregations. Choosing Rigid means that the relationships between the attributes never change over time. In other words, the value of one attribute in a member compared with the value of the related attribute doesn’t change. In this example, if you have a Model Name member of StreamLine 2000 and a Product Line name of Road Bikes, and you define the relationship type as Rigid, then StreamLine 2000 should always reference Road Bikes (and will never get moved under another Product Line such as Mountain Bikes). This doesn’t work the other way, since Road Bikes will have many Model Names underneath its hierarchy node.

When it comes to dimension ETL changes and the impact on SSAS, here are some things you need to consider:

  • For Type 0 fixed attributes, all of the attributes can be marked as Rigid for the attribute relationship, because they will not change.

  • Any dimension attribute that is processed as a Type 1 changing attribute in the dimension table must be marked as Flexible for all the attribute relationships it participates in, because the value can be changed or overwritten.

  • Dimension attributes that are Type 2 historical can be marked as Rigid for attribute relationships that are associated to the key attribute.

  • If you have any Type 2 historical attributes that have attribute relationships assigned to non-key attributes, then a Type 2 historical change may break the relationship when processing. Why? Attribute relationships are many-to-one or one-to-one when going from the attribute to the attribute relationship. A new Type 2 historical record may create a many-to-many relationship between values at the higher levels and, therefore, that attribute may need to stay as an attribute relationship only at the key attribute level.

  • If you have a dimension that has inferred members, then you must mark every attribute relationship as Flexible, because each one can change when the inferred member is updated.

The exception to all of this is when you can perform a full process of your entire SSAS every time you need to process data. All the dimension attribute relationships can be defined as rigid if the dimension structures can be rebuilt. But rebuilding the dimension structures drops all related partition data. Therefore, the entire cube would need processing. For more on SSAS Attribute Relationships and the RelationshipType property, see the following MSDN links:

 http://msdn2.microsoft.com/en-us/library/ms166553.aspx http://msdn2.microsoft.com/en-us/library/ms176124.aspx 



Expert SQL Server 2005 Integration Services
Expert SQL Server 2005 Integration Services (Programmer to Programmer)
ISBN: 0470134119
EAN: 2147483647
Year: 2004
Pages: 111

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net