Analysis Services 2005 Enhancements

The enhancements to Analysis Services are extreme, enough to warrant multiple books of their own. This chapter focuses on the enhancements that are relevant to Analysis Services administrators and are mostly addressed at a very high level. Details are presented in specific cases where they provide critical insight to the new paradigms introduced in this new, dramatically improved version of the product. Specific changes to front-end development, such as the many language enhancements to Multidimensional Expressions (MDX), are not addressed in this chapter.

Integrated Tools and the Enhanced User Interface Experience

Business Intelligence Development Studio (BIDS) and SQL Server Management Studio (SSMS) are the two biggest tools introduced as part of the SQL Server 2005 toolset. BIDS provides an integrated environment for developing and deploying business intelligence solutions, and SSMS provides an administration and management interface for maintaining already deployed business intelligence solutions. SSMS also provides the ability to author and execute Multidimensional Expressions (MDX), Data Mining Extensions (DMX), and XML for Analysis (XMLA) queries. BIDS enables you to create SSIS packages, build Analysis Services objects such as cubes and data mining models, and author Reporting Services reports.

BIDS provides a lot of templates to accelerate the development of cubes, dimensions, and so on. SSMS also provides a bunch of templates to give you a head start in authoring MDX and DMX queries. Each management dialog in SSMS (such as the Analysis Server Properties dialog) contains a Script button that you can click to generate XMLA script for the actions performed by that dialog. XMLA is discussed later in this chapter.

SSMS and BIDS are discussed in great detail in Chapter 5, "SQL Server 2005 Tools and Utilities."

You can use the SQL Server Configuration Manager tool to start and stop the Analysis Services instance and to view or edit service properties, such as the account under which the service runs.

Profiler Support

The SQL Profiler tool can now be used to trace Analysis Services events. You can use it to view both administrative commands and query requests. The DDL commands are displayed as XMLA scripts, and the queries are displayed as either MDX or DMX. Using Profiler is a great way to learn about MDX, DMX, and XMLA queries. You can perform various operations, such as browse a cube, in SSMS or BIDS and use Profiler to see the commands executed by these tools. Profiler also exposes events to show whether data was read from cache or disk and which aggregations were used to satisfy the query. Figure 13.1 shows SQL Profiler in action, capturing the queries and commands submitted to an Analysis Services instance.

Figure 13.1. Profiler in SQL Server 2005 can be used to trace commands and queries submitted to Analysis Services 2005.

Multiple Instance Support

Analysis Services 2000 did not support the installation of multiple instances. SQL Server 2005 fixes that problem by enabling up to 50 instances of the Analysis Services engine from SQL Server 2005 Enterprise Edition or 16 instances of the Analysis Services engine from other editions to be installed on the same machine.

By default, Analysis Services listens on port 2383. Additional instances can be assigned specific ports or can use a dynamic port configuration. In the latter scenario, Analysis Services attempts to use port 2383. If that port is not available, it looks for the first available port in a range of port numbers. The Analysis Services redirector, a part of the SQL Browser Windows service, is responsible for redirecting connections to the appropriate ports by using instance names in the connection string. Therefore, the Analysis Services redirector port, 2382, must be open to any external traffic that does not explicitly specify port information.

Failover Clustering Support

Analysis Services 2000 did not support failover clustering. Analysis Services 2005 adds this high-availability support and allows Analysis Services to be installed in an 8-node failover cluster on 32-bit systems and in a 4-node failover cluster on 64-bit platforms. The SQL Server 2005 setup is cluster aware and can seamlessly install Analysis Services on cluster nodes.

Unified Dimension Model (UDM) and Proactive Caching

The unified dimensional model (UDM) is arguably the most significant feature in Analysis Services 2005. The UDM provides a layer of abstraction between the source database and the data model that is presented to the querying user. The ultimate goal is to provide the best of OLAP and relational reporting.

Table 13.1 shows that the high-level benefits of relational reporting are not available in OLAP and vice versa. The UDM introduces a revolutionary platform for providing all these reporting features and more.

Table 13.1. Relational Reporting Versus OLAP Reporting
Feature	Relational Reporting	OLAP Reporting
Real-time data access	Yes	No
Detailed reporting	Yes	No
High performance	No	Yes
Flexible analysis	No	Yes

The UDM combines the best aspects of traditional OLAP-based analysis and relational reporting into one dimensional model. In its most simple form, the UDM could be used to provide user-friendly names to a querying client. Many OLTP applications have very cryptic column names that would not make sense to a business user, or even a developer, who tries to create queries on the information. You could simply define the UDM to provide a mapping to these underlying columns that renames them to more understandable names, such as changing CustID to Customer Identifier or CustFN to Customer First Name.

As mentioned previously, this is a highly simplified view of what the UDM provides; a UDM is far more powerful than that. The power of the UDM can best be demonstrated by walking through an implementation.

The UDM begins with the definition of one or more data sources. These are defined by using BIDS. These data sources can be any data store that is accessible via an OLE DB connection. This means that the UDM can actually be a combination of tables from various sources, such as SQL Server 2000, SQL Server 2005, Oracle, and DB2.

A data source view is then created to define a subset of information to be gathered from each data source. This data source view can then be manipulated to create relationships, rename tables or columns, add calculated columns, and more. The result is a logical data store that can be queried just as if it were physically stored in a single database. And, in fact, it can be physically stored. This is where the concept of proactive cache comes into play.

Proactive caching manages the local MOLAP data cache and can be configured to provide real-time, high-performance access to the information in the UDM at any point in time. This latency definition is literally represented as a slider bar in Analysis Services, as Figure 13.2 shows.

Figure 13.2. The Aggregation Design Wizard allows you to specify the storage mode and the caching options.

On one end of the slider bar, there is no latency. When changes have been made to the underlying data store, queries against the affected data revert to ROLAP until a new copy of the cache is rebuilt. At the other end of the slider bar is MOLAP. This means that the information is stored in Analysis Services and is updated within a period that is explicitly managed. Intermediate settings determine how often the MOLAP cache is rebuilt and whether the MOLAP cache can continue to be queried (instead of reverting to ROLAP) while the new cache is being built.

The beauty of proactive caching and the UDM is that it is fully configurable by the Analysis Services administrator. An experienced administrator knows that nothing comes without a price. It is rare that a business user does not say that he or she wants to receive information in real-time. It is up to administrators and developers to ascertain whether that is truly a requirement. The performance impact of real-time data should be taken into consideration. This impact does not change entirely with the UDM. Using MOLAP, which is on the other end of the spectrum, results in prestored, pre-calculated information that provides much better performance. MOLAP, however, does not give you the auto-processing functionality or the up-to-date information that you get with real-time or near-real-time analysis.

As you can see, the UDM does not remove all the real-world considerations of the past. However, it does provide a very easy mechanism of combining and presenting information for one or more source systems at a point in time that is completely configurable. Previously, such a solution would have required an extensive effort in architecting and programming.

Cube Enhancements

Analysis Services 2005 introduces several enhancements and new features related to designing and managing cubes. The following sections present some of the improvements introduced in Analysis Services 2005.

Multiple Fact Tables

You can use multiple fact tables in a single cube to consolidate related information. Measure groups are loosely equivalent to the concept of cubes in the previous versions of Analysis Services. These multiple-fact table cubes more closely resemble virtual cubes from Analysis Services 2000.

Intellicube

In Analysis Services 2000, an intimate knowledge of the underlying source database was required prior to the setup of dimensions and cubes. Each dimension had to be set up separately and added, explicitly, to a new cube. The Intellicube feature is invoked automatically when you create a new cube and check the Auto Build box in the Cube Wizard. It examines the tables in the designated data source view and determines what appear to be dimension and fact tables. These can be accepted or changed during the cube-building process. This functionality is very accurate, particularly if the source tables follow traditional multidimensional design techniques (using the star schema). This saves considerable time and effort in developing a new cube.

Key Performance Indicators (KPIs)

The term key performance indicator (KPI) refers to an indicator that measures how a business is doing in a given area. A simple example could be sales per day per retail square foot. For instance, for a given store, a KPI could be defined to include the following definition:

$10,000	Top performer
$5,000 and < $10,000	Average performer
< $5,000	Poor performer

You could define this KPI as a part of a cube and easily have any application reference it. A KPI consists of a value, a goal, a status, and the trend. Several user-friendly graphics give a visual representation of the values, such as a gauge, traffic light, road sign, or thermometer. Figure 13.3 shows the KPIs tab inside BIDS.

Figure 13.3. The Cube Designer in BIDS allows you to define KPIs and associate visual representational graphics such as traffic lights, gauges, and so on with them.

Translations

Translations are a globalization feature that provides the ability to display metadata (that is, dimension and measure labels) and data in alternate languages. This is all controlled at the cube definition level, again removing complexity from the underlying applications. Figure 13.4 shows some sample label strings and their respective translation strings in Spanish and French.

Figure 13.4. The Translations tab allows you to define dimension and measure labels in multiple languages.

Perspectives

Perspectives provide a means of presenting alternate views of a cube to users to remove some of the underlying complexity. This is similar to creating a view in SQL Server that includes only a subset of the columns from the underlying tables.

Dimension Enhancements

At a high level, the concept of dimensions has changed in Analysis Services 2005. In Analysis Services 2000, dimensions were highly hierarchical structures. Dimension attributes, called member properties, could be used to create a separate dimension, called a virtual dimension. This greatly limited the navigational capabilities of business users.

Dimensions in Analysis Services 2005 are attribute based and are visible by default. They can be used for filtering and analysis, just like formal dimension structures were in Analysis Services 2000. This greatly expands the analytical capabilities of data analysts. Attributes correspond to the columns in the tables of a dimension. In addition to this extremely powerful aspect of dimension analysis, the following dimension enhancements are available in Analysis Services 2005.

No More 64,000 Limit

Analysis Services 2000 required that no member could have more than 64,000 children. This could be overcome by using member groups or introducing an explicit intermediate level that was invisible to the end user. This limitation is no longer present in Analysis Services 2005, and, therefore, no workaround is needed.

Role-Playing Dimensions

In Analysis Services 2000, any sourced dimension table could be used only once for a cube. This issue came up regularly if there were two date dimensions on the fact table that referenced a single date dimension. An example is when the fact table has an OrderDate and a ShipmentDate. Analysis Services 2000 required that a separate date dimension table be created to source the second dimension. In Analysis Services 2005, the same date dimension can be used for multiple foreign key columns in a fact table.

Reference Dimensions

A reference dimension table is one that is indirectly related to a fact table. Reference dimensions are readily apparent in snowflake designs, but Analysis Services 2005 provides the capability to separate out reference dimension with or without an underlying snowflake design. These dimensions can be used as part of a dimension hierarchy or created as a separate dimension in their own right. An example might be a case where a Product dimension is related directly to a Sales fact table. A Vendor table may then be related to the Product table but not directly to the Sales fact table. In this case, Vendor could be a level above Product in the same dimension, and/or it could be segregated into its own explicit dimension, even though it is not directly related to the Sales fact table.

Fact Dimension Relationships

A fact dimension is a dimension whose attributes are drawn from a fact table. This provides a means of implementing degenerate dimensions in Analysis Services without explicitly creating and populating the dimension in the star schema. Due to their large size, degenerate dimensions were difficult and expensive to load and manage.

Many-to-Many Dimensions

Analysis Services 2005 provides a solution to the age-old problem of many-to-many dimensions. An example could be a product that exists in multiple categories. The relational model would typically have a ProductCategory bridge table between the Product dimension and the Sales fact table. Analysis Services allows you to specify this type of relationship when defining the relationships between dimension tables and fact tables. Analysis Services handles the rest, making sure each view of the data shows the correct totals.

Multiple Hierarchy Dimensions

The concept of multiple hierarchies existed in Analysis Services 2000 in name only. In essence, dimensions with multiple hierarchies were actually separate dimensions that shared a high-level name. In fact, these dimensions are migrated as separate dimensions when you use the Migration Wizard because that's exactly what they are. In Analysis Services 2005, dimensions don't have this formal hierarchical structure. Many hierarchies are supported through the use of attribute dimensions, which is implicit within the product.

Data Mining Enhancements

Data mining can be used to provide tremendous insight into data without having to manually perform in-depth manual analysis. It can be used to determine patterns between two or more pieces of information (for instance, buying patterns based on age, gender, and years of education). Without the proper tools, the task of mining data can be very tedious and requires expertise in statistical analysis. Analysis Services provides data mining algorithms to circumvent this requirement and allow for easy definition of mining models in order to accomplish this task.

Analysis Services 2000 provided two data mining algorithms: clustering and decision trees. Data mining gets a significant boost in Analysis Services 2005, with the introduction of five new mining models. Here is a brief explanation of each of these new data mining algorithms:

Naïve Bayes This is a relatively simple algorithm that works well for predictive behavior analysis. Because it requires less computational activity, it is faster than most other models.
Association As its name implies, the Association algorithm associates items that are likely to appear together in a single unit, such as a transaction. This makes it ideal for market-basket analysis for the purposes of cross-selling.
Sequence clustering This algorithm can be used to analyze the sequence of events in a unit. It makes it an excellent tool for click-stream analysis.
Time series This is a forecasting algorithm that can be used to predict future sales trends based on historical data.
Neural network This algorithm is similar to decision trees (available in Analysis Services 2000) but can define a three-dimensional node structure to analyze larger amounts of data.

In addition to these new algorithms, the decision trees algorithm has been enhanced to allow an additional, continuous attribute as a predictable column.

The small number of data mining algorithms in Analysis Services 2000 is only partially the reason that data mining was a largely unused part of the product. Another was the difficulty in setup, maintenance, and presentation of the resulting findings of data modeling. Several new enhancements in Analysis Services 2005 have made data mining more accessible.

The Data Mining Wizard guides both novice and experienced users through the process of creating a data mining structures and models. SSIS has been enhanced with new tasks that create and process mining models and can subsequently run queries against them. Reporting Services also includes the ability to build reports on top of mining models.

XMLA Support

As previously mentioned, Analysis Services communication is now based exclusively on XMLA. XMLA is a standard protocol for communicating with multidimensional data stores. It is the result of a consortium between Microsoft and two other leading OLAP vendors, Hyperion and SAS. The specification can be found at www.xmla.org.

According to xmla.org, XMLA provides an "open industry-standard web service interface designed specifically for online analytical processing (OLAP) and data-mining functions." XMLA has actually been around since 2001 and was available for use in Analysis Services 2000, but it required the installation of the XML for Analysis Services Development Kit and the subsequent setup of a virtual directory in IIS. XMLA is the native language protocol in Analysis Services 2005 and is installed with the product. It can be integrated with SOAP to make Analysis Services a universally accessible web service.

Microsoft has further extended XMLA to include a specification for managing an instance of Analysis Services and to manage Analysis Services objects such as cubes, dimensions, data sources, and mining models. These extensions are comprehensively referred to as the Analysis Services Scripting Language (ASSL). These objects can be scripted in either BIDS or SSMS. These objects can then be checked into a source code management system to provide a database versioning system. XMLA code can also be executed in SSMS.

Figure 13.5 shows an example of the XMLA script to create a new data source.

Figure 13.5. XMLA scripts can be authored and executed in SSMS. This script is for creating a data source in a database named `Adventure Works DW`.

XMLA consists of two methods: Discover and Execute. The Discover method provides a means of examining metadata, such as enumerating through databases, data sources, cubes, and data models on an Analysis Services instance. Properties can be examined and used to construct a subsequent query request in the form of MDX or DMX. These are sent to the Analysis Services instance by using the Execute method.

As mentioned earlier, all Analysis Services objects are scripted and saved in the Data directories. XMLA code can be scripted or viewed in various ways. One way is to find the .xml scripts throughout the data directory. Another is to select the View Code option in BIDS when viewing the object to be scripted. Probably the most intuitive method is to script the objects directly within SSMS. You can script virtually any object at any point in the Analysis Services hierarchy by right-clicking the object in Object Explorer and selecting the appropriate scripting menu option. These scripts can even be scripted to an XMLA query window, where text can be globally replaced and run to create a replica of the source object. This approach is useful for copying cubes and for creating additional partitions on a measure group.

In summary, XMLA support in SQL Server 2000 was provided as an add-on, whereas XMLA is natively built into Analysis Services 2005. XMLA with Analysis Services 2000 required clients to send HTTP/SOAP requests, which were processed by IIS, the XMLA SDK, and the Pivot Table Service on the middle tier. The middle-tier components then communicated with Analysis Services 2000 to execute the XMLA commands and obtain the results. With Analysis Services 2005, the client can submit the XMLA commands to Analysis Services 2005 directly over TCP/IP or over HTTP by using IIS.

Tip

There are four ways to learn XMLA: (1) by using SQL Profiler and tracing the server activity to see XMLA queries submitted to the server; (2) by right-clicking Analysis Services objects in Object Explorer in SSMS and selecting the appropriate scripting option; (3) by right-clicking Analysis Services objects in Solution Explorer in BIDS and selecting View Code; and (4) by clicking the Script button on the management dialogs in SSMS.

For instance, you can right-click an Analysis Server instance in Object Explorer, select Properties, and click the Script button to generate XMLA script to alter the server properties. Because AMO (discussed next) is based on XMLA, you can write an AMO script, execute it while Profiler is running, and see the XMLA that it submits to the server to perform the action. For example, if you write the AMO code to end, or kill, a longrunning or a ghosted connection, you see an XMLA command similar to the following in Profiler:

<Cancel xmlns=  "http://schemas.microsoft.com/analysisservices/2003/engine">   <SPID>1453</SPID> </Cancel>

If XMLA seems complicated to you, you can use a .NET-based API called Analysis Management Objects (AMO) from .NET code or scripting languages such as VBScript to manage Analysis Services 2005 objects. AMO is discussed in the following section.

Analysis Management Objects (AMO)

AMO provides a programmer-friendly layer on top of XMLA. It is exposed as a .NET object model and can be coded using any .NET programming language, such as Visual Basic .NET or C#. It replaces its predecessor, Decision Support Objects (DSO). DSO is still available in Analysis Services 2005 for backward compatibility, but it has not been enhanced to support any of the new functionality. Therefore, cubes that have been changed to use any of the new features in Analysis Services 2005 are no longer manageable by DSO.

AMO provides a more logical view of the Analysis Services environment than DSO. DSO inevitably required the use of the universal MDStores interface in order to reference databases, cubes, partitions, and aggregations. This was certainly confusing to anyone new to the object model, and it resulted in code that was sometimes difficult to interpret. MDStores had a list of possible properties that could be interpreted differently and were optionally available, depending on the object being referenced.

AMO has its own complexities, but they are related to the new Analysis Services architecture. When you understand the new architecture, the AMO object model becomes very intuitive. UDM, measure groups, and perspectives are just some of the new functionality that changes the way you traverse the object hierarchy to implement code. For instance, in Analysis Services 2000, the natural hierarchy from server to partition looked like this:

Server Database Cube Partition

The DSO code to traverse this hierarchy looked something like this:

Set dsoServer = New DSO.Server dsoServer.Connect ("localhost") Set dsoDB = dsoServer.MDStores("Foodmart 2000") Set dsoCube = dsoDB.MDStores("Sales") Set dsoPartition = dsoCube.MDStores(1)

In Analysis Services 2005, the natural hierarchies have changed somewhat. In the preceding example, the roughly equivalent Analysis Services 2005 hierarchy is as follows:

Server Database Cube Measure Group Partition

A code segment to navigate this hierarchy would look something like this:

Server.Connect("LocalHost") Database = Server.Databases("Foodmart 2000") Cube = Database.Cubes("Sales") MeasureGroup = Cube.MeasureGroups("Sales") Partition = MeasureGroup.Partition(0)

AMO can be used to automate numerous tasks in Analysis Services, such as backing up all databases on a server, creating new partitions during incremental processing, and running reports to list metadata information and check for best practices. Virtually anything that can be done manually in Analysis Services can be coded in a .NET program, VBScript, or a script task in SSIS. This chapter focuses on SSIS because that is often the method of choice by system administrators. The SSIS script task is especially a good choice when using AMO to perform tasks such as adding partitions, performing backups, and so on because these tasks are often part of the end-to-end process of loading the data warehouse, which is usually performed in an ETL tool such as SSIS.

Using AMO Inside an SSIS Script Task

SSIS includes a new task that can be used to script managed code by using a .NET language. This has many advantages over VBScript, not the least of which is usability. The scripting environment offers many of the browse and help features of the Visual Studio .NET environment. You can use Visual Studio .NET to examine the objects, methods, and properties throughout the AMO model.

The first hurdle to implementing AMO, however, is referencing the object libraries. A limitation in Visual Basic for Applications (VBA), which is used in SSIS, requires that the AMO-related assemblies (Microsoft.AnalysisServices.DLL and Microsoft. DataWarehouse.Interfaces.DLL) be copied from the %ProgramFiles%\Microsoft SQL Server\90\SDK\Assemblies folder to the .NET folder in the Windows directory (for example, %windir%\Microsoft.NET\Framework\v2.0.50215). After you do this, the assemblies are accessible from the scripting task in SSIS. You can also right-click the script folder under Class View in the Microsoft Visual Studio for Applications script editor, select Add Reference, and then select Analysis Management Objects and Microsoft.DataWarehouse.Interfaces from the list. Figure 13.6 highlights the assembly references that are required for any basic AMO programming.

Figure 13.6. .NET assemblies are required to use AMO API.

To see AMO in action from an SSIS script task, you can launch BIDS, select File | Open | Project/Solution or press Ctrl+Shift+O, and open the Backup Sample Project.sln solution file. Then you can double-click the BackupDatabase.dtsx SSIS package, double-click the BackupDatabase script task, and on the Script page click the Design Script button to view the script.

Here is how the Visual Basic .NET script that uses AMO to back up an Analysis Services database looks:

Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime Imports Microsoft.DataWarehouse.Interfaces Imports Microsoft.AnalysisServices Public Class ScriptMain    Public Sub Main()       Dim server As New Server       Dim database As Database       Dim databaseName As String       Dim serverName As String       Dim backupFolderName As String       Dim backupInfo As New BackupInfo       serverName = Dts.Variables("ServerName").Value.ToString()       databaseName = Dts.Variables("DatabaseName").Value.ToString()       backupFolderName = Dts.Variables("BackupDirectory").Value.ToString()       server.Connect(serverName)       database = server.Databases(databaseName)       backupInfo.AllowOverwrite = True       backupInfo.ApplyCompression = True       backupInfo.File = backupFolderName & databaseName & ".abf"       database.Backup(backupInfo)       server.Disconnect()       Dts.TaskResult = Dts.Results.Success    End Sub End Class

Three SSIS variables are created to pass the name of the Analysis Services server, a database name, and a backup directory. You should right-click the Workflow designer area and select Variables. Then you should provide the values for these three variables, save the package, and execute it. If the package execution fails, you need to make sure that you have copied the Microsoft.AnalysisServices.DLL and Microsoft.DataWarehouse.Interfaces.DLL AMO assemblies from the %ProgramFiles%\Microsoft SQL Server\90\SDK\Assemblies folder into the .NET folder as mentioned previously; you also need to make sure you have updated the variable values and have the correct server name, database name, and backup folder path, as well as ensure that the specified backup folder already exists on the server.

AMO also allows you to create and maintain data mining objects, including model security, processing, and backup and restore.

ADOMD.NET

With SQL Server 2000, using the ADO MD and XMLA add-ons are the two ways to run queries against Analysis Server from a client application. ADO MD (MD for multidimensional) is a COM-based API created around OLE DB for OLAP. Much as all COM-based APIs are being replaced with .NET-based object libraries in SQL Server 2005, the ADO MD is being replaced with ADOMD.NET.

ADOMD.NET is a .NET-based object model for querying the Analysis Services server. Like AMO, ADOMD.NET is also based on XMLA. When you use the ADOMD.NET object model, behind the scene it generates XMLA that is submitted over TCP/IP or HTTP to the Analysis Server to retrieve data and metadata information from the server.