The relational database is a great place to store and manage the data in your DW/BI system. But the relational database doesn't, by itself, have enough intelligence. To build a great DW/BI system, you need an analytic server to deliver excellent query performance and provide analytic capabilities beyond those provided by the SQL language.
The single most important thing you can do to improve the query performance of any DW/BI system is to define aggregations. Aggregations are pre-computed and pre-stored summarizations of the detailed data in the fact table. They're nothing mysterious: they're simply summary tables at different grains, for example monthly, or by geographic region, or both. Defining a good set of aggregations is more valuable to query performance than indexing and cheaper than upgrading your hardware.
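For concreteness, an aggregation really is just a summary table built with a GROUP BY. The sketch below uses SQLite via Python with hypothetical table and column names (not schemas from this book): a daily-grain fact table is summarized to a monthly/region grain, so queries at that grain or above can scan the much smaller table.

```python
import sqlite3

# Build a small daily-grain fact table, then a pre-computed aggregation
# (summary table) at monthly/region grain. Table and column names are
# illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE sales_fact (sale_date TEXT, region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [
        ("2024-01-05", "East", 100.0),
        ("2024-01-20", "East", 150.0),
        ("2024-02-03", "East", 200.0),
        ("2024-01-10", "West", 300.0),
    ],
)

# The aggregation: nothing mysterious, just a summary table.
cur.execute(
    """
    CREATE TABLE sales_month_region AS
    SELECT substr(sale_date, 1, 7) AS sale_month,
           region,
           SUM(amount) AS amount
    FROM sales_fact
    GROUP BY substr(sale_date, 1, 7), region
    """
)

# Monthly queries now read the small summary table, not the detailed facts.
rows = cur.execute(
    "SELECT sale_month, region, amount FROM sales_month_region "
    "ORDER BY sale_month, region"
).fetchall()
print(rows)
```

The same idea scales up: each additional grain (quarterly, by product category, and so on) is another summary table, which is exactly the management burden the following paragraphs discuss.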
The first challenge in aggregation management is figuring out which aggregations are most useful. The best way to do this is to monitor how business users are querying the system. You need to consider which aggregations will benefit you the most: Aggregating daily data to monthly reduces data volume by thirty-fold, whereas aggregating monthly to quarterly provides only a three-fold improvement. The problem gets much more complex when you think about aggregating multiple attributes across multiple dimensions. This is a task that software can help you with, and Analysis Services does.
The next challenge is to populate and maintain the aggregations. If you've read Chapters 5 and 6, you'll have a good idea about how to maintain summary tables in the relational data warehouse database. Incremental data is pretty easy to handle, but any deletions or updates, including updates to Type 1 dimension attributes that are used in aggregations, are very tricky to handle. It's not rocket science, but again it's a task that software should be able to handle for you, as indeed Analysis Services does.
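Why are Type 1 updates tricky? A minimal sketch with made-up data (not Analysis Services code): the aggregate below summarizes sales by a customer's region. New fact rows fold into the summary easily, but overwriting the region attribute silently invalidates history already rolled into the aggregate.

```python
from collections import defaultdict

# Hypothetical data: a Type 1 customer attribute and a tiny fact table.
customer_region = {"C1": "East", "C2": "West"}           # Type 1 attribute
fact_rows = [("C1", 100.0), ("C2", 50.0), ("C1", 25.0)]  # (customer, amount)

def build_aggregate():
    """Recompute sales-by-region from scratch."""
    agg = defaultdict(float)
    for customer, amount in fact_rows:
        agg[customer_region[customer]] += amount
    return dict(agg)

agg = build_aggregate()  # {"East": 125.0, "West": 50.0}

# Incremental loads are easy: fold the new fact row into the summary.
fact_rows.append(("C2", 10.0))
agg["West"] += 10.0      # the aggregate is still correct

# A Type 1 overwrite of an attribute used in the aggregation invalidates
# history: agg still credits East with 125.0, but East should now be empty.
customer_region["C1"] = "West"

# The safe response is to recompute the affected aggregate rows.
agg = build_aggregate()
print(agg)
```

Detecting which aggregate rows are affected and recomputing only those is exactly the bookkeeping you want server software to do for you.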
Once you build and maintain your aggregations, you need to use them appropriately. To do so, you need a logical layer. The simplest approach is to depend on the business user: train users which summary table to use, and when. This approach has obvious flaws.
You could build the aggregation navigation into the reporting front end. This common approach works adequately for many applications, but it means you have to go through that reporting client in order to use the aggregations. It's a much better design to put this logic into a server that all clients, however simple or complex, can seamlessly use.
Again, Analysis Services provides this service. It uses pre-stored aggregations where it can, accessing the smallest possible set of stored or cached data necessary to answer a query. Some relational database engines build aggregation navigation into the relational database server, as indexed or materialized views. This can be an effective approach, but even if you assume that aggregation navigation using indexed views works as well as it does in an OLAP server like Analysis Services, there are many other reasons to use Analysis Services.
The SQL Server relational database has an indexed views feature which performs some aggregation navigation. However, we've not found it to be particularly useful for dimensional schemas. Analysis Services works so well that almost no one uses SQL Server indexed views for aggregation navigation on dimensional schemas. By contrast, DW/BI systems built using the Oracle RDBMS make heavy use of Oracle's similar materialized views and query rewrite features.
Most of the facts or measures in your dimensional model are additive: Total Sales this year is the sum of all sales for all products for all days for all regions. Non-additive and semi-additive measures require more attention.
The best example of a semi-additive measure is any kind of period-ending balance, like inventory level. If you track the quantity of products by day in your warehouse, you can't sum each of those daily inventory levels to get the monthly level. The definition of monthly inventory levels is driven by business requirements, but it's typically either the month-end balance or an average of the daily levels. In the relational database, how do you ensure business users don't attempt to add these facts across time?
Even worse, business users may use period-end levels for queries up to monthly, but use averages for quarterly and annual queries. The relational database has no structure in which to store this default summarization logic. When you combine this complexity with the goal of reading data from the smallest possible aggregation table, you can no longer hope that business users will get it right without software to help them out.
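A small illustration of the semi-additive problem, with made-up daily inventory levels: summing the daily balances produces a meaningless number, while the two legitimate monthly summarizations (period-end and average) give quite different answers.

```python
# Hypothetical daily inventory balances for the last days of a month.
daily_levels = {
    "2024-01-29": 120,
    "2024-01-30": 100,
    "2024-01-31": 90,
}

# Order the balances by date.
levels = [daily_levels[d] for d in sorted(daily_levels)]

wrong_sum = sum(levels)                # 310: meaningless as a monthly "level"
month_end = levels[-1]                 # 90: period-ending balance
month_avg = sum(levels) / len(levels)  # average daily balance

print(wrong_sum, month_end, round(month_avg, 2))
```

A server that knows the measure is semi-additive can apply the right rule (period-end across time, SUM across other dimensions) automatically; plain SQL has no place to record that rule.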
You can create predefined reports that present the data accurately, but that's not a very flexible solution. Plus, this approach places a huge burden on the report builders who, like the business users, are only human.
Semi-additive facts are the simplest example of a wider problem of complex calculations. One of the most compelling reasons for using Analysis Services is that the effort it takes to get SQL to perform analytics ranges from frustrating to impossible, depending on the level of complexity. The Q in SQL stands for query, not analysis.
Even if a client-side query and reporting application can handle the computation logic, the best system design folds this functionality into a server. With a server, you define the logic once and use it many times. A server has an engine that can perform complex calculations on a large volume of data with a good chance of acceptable performance. With an Analysis Services server you also have a language that can handle these calculations: MDX.
Query performance is the first reason most people are attracted to Analysis Services OLAP. In general, Analysis Services offers excellent dimensional query performance, in a way that's cheaper and easier to manage than is possible from the relational database alone. You can get good query performance from a pure relational DW/BI system, but it requires a lot of work that Analysis Services does for you.
Analysis Services query performance comes from four primary areas:
Smart aggregations, used intelligently.
Effective data compression reduces I/O. You might think that the cost of compressing and decompressing the data would outweigh the benefits of reduced I/O. Empirically, this is not the case.
Analysis Services storage structures, indexes, query optimizer, and language are optimized for analytic use.
Effective data caching keeps data in server memory longer, again reducing disk I/O.
In addition to storing summarization logic, you can define business calculations like [Profit], [Sales year to date], and [Sales same period last year]. You can define sets like [Top 10 customers] and a host of other calculations. Once these calculations are defined, all business users, no matter what tool they use to access Analysis Services, will use the same formula. Complex calculations can be challenging to define correctly, but the work is done once by the development team and shared by all.
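The "define once, use everywhere" idea can be sketched as a single server-side registry of named calculations that every client evaluates by name. The registry, names, and data below are illustrative only, not Analysis Services APIs (in the real product these would be MDX calculated members).

```python
# Made-up monthly sales and cost data keyed by (year, month).
monthly_sales = {("2023", m): 100.0 * m for m in range(1, 13)}
monthly_sales.update({("2024", m): 110.0 * m for m in range(1, 7)})
monthly_cost = {k: v / 2 for k, v in monthly_sales.items()}

# One shared registry of business calculations, defined once.
CALCULATIONS = {
    "Profit": lambda y, m: monthly_sales[(y, m)] - monthly_cost[(y, m)],
    "Sales year to date": lambda y, m: sum(
        monthly_sales[(y, i)] for i in range(1, m + 1)
    ),
    "Sales same period last year": lambda y, m: monthly_sales[(str(int(y) - 1), m)],
}

# Every client evaluates the same formula by name, so results agree.
print(CALCULATIONS["Profit"]("2024", 3))                       # 165.0
print(CALCULATIONS["Sales year to date"]("2024", 3))           # 660.0
print(CALCULATIONS["Sales same period last year"]("2024", 3))  # 300.0
```

If each report instead re-derived [Profit] in its own definition, two reports could easily disagree, which is precisely the inconsistency the next paragraphs describe.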
Many business intelligence tools define business calculations in the front end, like in report definitions. This approach leads to several difficulties:
Manageability and consistency: By embedding logic in client tools and report definitions, you're setting the stage for different reports to calculate a measure differently. This is one of the problems you're building the DW/BI system to solve. Store calculation definitions on the server, and queries and reports will be consistent.
Query performance: SQL is a limited language. To perform calculations that cannot be expressed in SQL, you need a query engine. Some calculations need to access a lot of detailed data. You really don't want this work done by your reporting server or, worse yet, by your client tool. The Analysis Services calculation engine is designed for complex analytics on large data volumes.
There are other reasons for using Analysis Services as the primary query server for your business intelligence application:
Analysis Services efficiently calculates and loads predefined aggregations. We've already discussed that Analysis Services simplifies aggregation management. It also calculates and stores those aggregations very efficiently.
Security, particularly complex row-level security, is awkward and difficult to define in the relational database. Analysis Services easily supports very sophisticated security models, as we describe in Chapter 12.
It's relatively simple to manage an Analysis Services system with minimal downtime. The relational part of the ETL process is the most time-consuming part of the load cycle. Incrementally loading the OLAP database and calculating aggregations is usually fast. Analysis Services can cache a shadow copy of data being updated so the system is available for querying while it's being updated. You certainly don't need Analysis Services to provide this functionality, but it's a lot easier.
Analysis Services clients reveal metadata to business users. The dimensional structure is defined by the Analysis Services metadata, and is easily extended to include user-oriented descriptive information. This metadata, describing what is available and what it means, is extremely valuable to business users.
For readers familiar with Analysis Services 2000, here's a list of what we consider the most important changes. Most of these changes are discussed in greater detail in this chapter.
In Analysis Services 2000, dimensions were strongly tied to hierarchies like Year, Quarter, Month, Day. Non-hierarchical attributes like DayName or IsWeekend were second-class citizens and difficult to work with. In Analysis Services 2005, dimensions are attribute-based, and behave like relational dimensions. We talk more about this change later in this chapter, when we discuss how to design the dimensions.
Analysis Services 2005 no longer requires that dimensions be cached in memory. You'll be able to build cubes with extremely large dimensions, possibly beyond 100 million members. For those who don't have extreme dimensions, the main implication of this important change is that although the server will still benefit from lots of physical memory, it should no longer fail if memory is tight.
Proactive caching is used for BI applications with low latency requirements. We discuss proactive caching in Chapter 17.
The metadata repository is gone, replaced by XML data definition files.
You can use the standard management tools to process partitions in parallel. Parallel processing occurs by default.
Calculations are computed on the server, rather than distributed between client and server. This has the effect of reducing the performance of simple queries, probably by an amount that's not noticeable to users. On the upside, complex queries will resolve much faster because the server, rather than the client, is always operating on the data.
You can capture the text of users' queries, which makes it much easier to understand what users are doing and to debug problematic queries.
The Key Performance Indicator (KPI) framework lets you define corporate measures on the server. A KPI consists of expressions for value, goal, current status, and trend, which are displayed using simple graphics like gauges and stoplights.
Analysis Services 2005 lets you present one database in multiple languages, including both metadata (the names of dimensions and levels) and data (dimension member contents and measure calculation and display).
The administrative security model is richer than in Analysis Services 2000.
MDX Scripts are the new mechanism for defining Calculated Members, Named Sets, and Cell Calculations. MDX Script syntax is simplified and improved, and can actually be debugged.
You can write stored procedures for Analysis Services, using popular languages like VB.NET or C++. Stored procedures will perform far better than the old user-defined functions because stored procedures execute on the server.
Data writeback, which was a flawed feature in Analysis Services 2000, is much better. Writeback performance has improved dramatically.
There are new features for developers, like web services and the XMLA protocol. Analysis Services 2005 continues support for the old protocols and object models: OLE DB for OLAP, ADOMD, and ADOMD.NET. A new management object model, Analysis Management Objects (AMO), replaces DSO. AMO's most important new feature is the ability to script an object's creation or modification.
There are new features to improve system management, like server trace events for monitoring and auditing system usage and performance. The Flight Recorder feature makes it much easier to debug problems.
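The value/goal/status/trend quartet of the KPI framework mentioned above can be sketched in a few lines. This is a hypothetical illustration of the shape of a KPI, not the Analysis Services KPI API; the thresholds and data are invented.

```python
# Status: map value-vs-goal to a stoplight (1 good, 0 warning, -1 bad).
# The 90% warning threshold is an assumed example, not a product default.
def kpi_status(value, goal):
    ratio = value / goal
    if ratio >= 1.0:
        return 1
    if ratio >= 0.9:
        return 0
    return -1

# Trend: direction arrow versus the prior period (1 up, 0 flat, -1 down).
def kpi_trend(value, prior_value):
    if value > prior_value:
        return 1
    if value < prior_value:
        return -1
    return 0

# A hypothetical revenue KPI: below goal but within 90%, and rising.
revenue_kpi = {"value": 950_000.0, "goal": 1_000_000.0}
revenue_kpi["status"] = kpi_status(revenue_kpi["value"], revenue_kpi["goal"])
revenue_kpi["trend"] = kpi_trend(revenue_kpi["value"], 900_000.0)
print(revenue_kpi["status"], revenue_kpi["trend"])
```

In the real product each of the four parts is an MDX expression stored on the server, so every dashboard renders the same gauge from the same logic.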
We recommend that most DW/BI systems use Analysis Services as the primary query server. The relational version of the dimensional schema serves as the permanent store of the cleaned and conformed data, and feeds data to the OLAP database.
Some systems will close the relational data warehouse database to most business users. This is an appealing architecture, for in this case the relational database can be lightly indexed. The Analysis Services database, including data, indexes, and a reasonable set of pre-computed aggregations, is smaller than the relational indexes it replaces. Standard reports, KPIs, analytic reports, and ad hoc analyses can all be sourced from Analysis Services, using Reporting Services, Excel, SharePoint Portal Server, and possibly a non-Microsoft ad hoc query tool.
Other systems will permit user access to both Analysis Services and the corresponding relational database. And others don't use Analysis Services at all. What are some of the common reasons to commit less than fully to Analysis Services?
The Analysis Services development market is immature. There is more than a 20-year history of tools, books, articles, consultants, and classes to help implement a relational DW/BI system. There are fewer tools, experts, and informational material about how to work with Analysis Services.
The query and reporting tool market is confusing and immature. Reporting Services is the first report publishing server that can source from Analysis Services, and Reporting Services is itself a young product. On the ad hoc query and analysis side, the tools are maturing but it's still a confusing market.
Companies have a large investment in existing client tools and developer and user skills.
Some kinds of analysis are intrinsically difficult in OLAP. This is particularly true for analyses that align data not by dimension attributes but by an event that's buried in the facts.
Most ad hoc analyses are easier to construct with MDX and OLAP than with SQL. A counter-example from a clickstream business process is to analyze how users behave after they first visit a certain page. First you need to go into the facts to find who visited the page (and when). Then you may want to align all the users page clicks by time to see patterns in behavior. This is moderately difficult in SQL but extremely challenging in OLAP and MDX. In truth, this kind of analysis is best handled by data mining. But sometimes we just want to poke around, and for this kind of problem we prefer SQL.
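To make the SQL side of that clickstream example concrete, here is a sketch using SQLite via Python with made-up data: find each user's first visit to a target page, then align that user's subsequent clicks by their offset from the event. The table and page names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE clicks (user_id TEXT, ts INTEGER, page TEXT)")
cur.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [
        ("u1", 1, "home"), ("u1", 2, "promo"), ("u1", 3, "cart"),
        ("u2", 5, "home"), ("u2", 6, "search"), ("u2", 7, "promo"),
        ("u2", 8, "checkout"),
    ],
)

# Step 1: dig into the facts to find the anchoring event per user (the
# first visit to the 'promo' page). Step 2: join back to the fact table
# and align each later click by its offset from that event.
rows = cur.execute(
    """
    SELECT c.user_id, c.page, c.ts - f.first_ts AS offset
    FROM clicks c
    JOIN (SELECT user_id, MIN(ts) AS first_ts
          FROM clicks
          WHERE page = 'promo'
          GROUP BY user_id) f
      ON c.user_id = f.user_id AND c.ts >= f.first_ts
    ORDER BY c.user_id, offset
    """
).fetchall()
print(rows)
```

Moderately difficult in SQL, as the text says; expressing the same event-relative alignment in MDX against a cube is far harder, because the anchoring event lives in the facts rather than in a dimension attribute.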