Analysis Services Concepts | Microsoft SQL Server 2005 Reporting Services

It is no surprise that during the course of business every company accumulates data. It is common to divide databases into two categories: transaction processing and data warehouses.

Typical transaction processing (Online Transaction Processing or OLTP) databases are used for current processing, such as online sales, customer management, employee management, production, and inventory management. Transaction processing databases are small in size , usually less than 100GB, and tend to have only a small amount of historical data that is directly relevant to ongoing transaction processing. OLTP database is tuned for fast processing of transactions, incurs many changes (volatile) throughout a day, and is normally not used for reporting. An OLTP database is typically normalized to optimize data storage and retrieval for transactions.

A data warehouse is a database that is used for reporting and data analysis. It is possible to use an OLTP database for reporting, but it often negatively impacts transactional performance and is certain to make online users unhappy . If an online purchasing transaction is slow, a user is likely going to use another site that offers better performance.

A data warehouse (DW) usually has the following attributes:

It is large in size (it is not uncommon to have multiterabyte data warehouses) and might contain years of historical data. Part of the data might not be currently useful, but provides a view into the company's past and can be analyzed to determine why, for example, online sales were successful in a certain time frame, or how well a marketing campaign performed.
It stores data in the matter understandable to business users as opposed to applications. In contrast, OLTP databases are designed to accommodate applications. A DW is designed to answer business questions, such as sales numbers for a particular region and time frame.
It is updated on predetermined intervals (once daily, weekly, or monthly) from transactional databases and ideally does not change previously stored historical data.

A subset of a data warehouse is called a data mart. A data warehouse usually contains data from a variety of heterogeneous data sources. Data marts are designed to minimize the amount of data used in processing and can contain a subset of data, based, for example, on time or geography.

Unified Data Model (UDM) is a new future that is available in SQL Server 2005. UDM greatly simplifies access to data and combines the best of relational and analytical models.

UDM allows Reporting Services to get data from Analysis Services in a similar fashion as from any relational data source. One way to think about UDM is as a view on data, which allows "combining" data from various data sources (SQL Server relational and OLAP databases, Oracle, Teradata, DB2, and so on), "defining" relationships between that data, "defining" calculated fields, and mapping between original column names and newly defined names that might be more understandable to users of UDM. Several words in the previous sentence are included in double quotes (""). UDM does not really combine data, but rather creates a metadata (data describing) view. This does not affect source data itself, but allows creating a metadata view that, for example, may have a "relationship" between SQL Server and an Oracle table. Then UDM can be queried, like a database, and UDM, in turn , will access original data sources to retrieve needed data. UDM blurs the usual differentiation between OLTP and DW data. In the past, DW stored denormalized data for quick retrieval. This is standard in the industry, but with UDM, users no longer have to denormalize their data warehouses.

Analysis Services consists of two components : Online Analytical Processing (OLAP) and Data Mining (DM). OLAP is designed to summarize data and DM is designed to look for patterns and trends in data.

Let's look at the example in which a manager wants to analyze sales by country. It is certainly possible to use aggregate functions such as SUM() in a query or in SSRS to calculate summaries by country, but for the large amounts of data, it is not very efficient.

Depending on the amount of data, summarization could be slow, which would be unsatisfactory to online users.

Reporting Services provides caching mechanisms that allow you to prepare a summary report and then display it to a user in real time without waiting for data retrieval. However, OLAP provides a better choice when a user is looking for summarized (or aggregated) data. This is because OLAP is specially tuned to perform aggregations. One of the most useful OLAP modes is Multidimensional OLAP (MOLAP). MOLAP stores aggregated data in an Analysis Services' multidimensional structure, called cube, which is highly optimized to maximize query performance.

Note

This chapter discusses default OLAP aggregation mode: MOLAP. Analysis Services provides two modes in addition to MOLAP: Relational OLAP (ROLAPdoes not store summaries and queries relational data for each MDX query) and Hybrid OLAP (HOLAPis a combination of MOLAP and ROLAP). Details of various modes are outside of this book's scope.

A T-SQL query against the AdventureWorksDW database to get a summary by country would look like the following:

 SELECT DimSalesTerritory.SalesTerritoryCountry AS [Country-Region],      SUM(FactInternetSales.SalesAmount) AS [Internet Sales-Sales Amount] FROM DimSalesTerritory INNER JOIN FactInternetSales ON      DimSalesTerritory.SalesTerritoryKey = FactInternetSales.SalesTerritoryKey GROUP BY DimSalesTerritory.SalesTerritoryCountry

A comparative multidimensional (or MDXyou can find more about MDX later in this chapter) query to retrieve the same result would look like the following:

 SELECT NON EMPTY     { [Measures].[Internet Sales Amount]} ON COLUMNS,         NON EMPTY { [Customer].[Customer Geography].[Country]} DIMENSION PROPERTIES     MEMBER_CAPTION,      MEMBER_UNIQUE_NAME ON ROWS FROM [Adventure Works]

As you can see, for a simple aggregation the complexity of either query is fairly comparable. The key difference is in the underlying structures that each query accesses .

At this point, you should not be concerned if you are not familiar with MDX. SQL Server 2005 has an extremely capable visual designer that makes creation of an MDX query a fairly easy endeavor.

The result of both queries is the same and is shown in Table 15.1.

Table 15.1. Query Results


Country-Region	Internet Sales-Sales Amount
Australia	9061000.5844
Canada	1977844.8621
France	2644017.7143
Germany	2894312.3382
United Kingdom	3391712.2109
United States	9389789.5108

The duration of either query is not significantly different if the amount of data is small. For large amounts of data, the MDX query is going to be significantly faster, because a cube stores aggregate data. Aggregation of data can be done on multiple levels, such as country, state, and city, and can be subsequently stored in the cube; thus, MOLAP does not have to query a large DW to generate needed summaries. All that needs to be queried is the cube. MOLAP summarizes and stores data when the cube is processed .