Analysis Services is Microsoft's OLAP engine. It stores multidimensional data in a way that facilitates the performance and flexibility of OLAP. Analysis Services 2000 was already the market leader in OLAP, and Analysis Services 2005 promises to even further increase the appeal of the product in the OLAP market space. Two large differentiating factors between Analysis Services and other products on the market are the way it deals with data explosion and data sparsity.
Data Explosion and Data Sparsity Handling
To explain data explosion, we first need to explore the power of OLAP engines in general. Most OLAP engines provide high-performance, flexible analytical environments by prestoring aggregations across several dimension levels. To better understand this, let's look at a very simplistic example of a sales cube with two measuressales amount and sales quantityand two dimensionsproduct (category and SKU) and date (year, month, and day). Queries can request this data in a variety of different ways. Here are some sample requests that can be answered from the sample cube:
OLAP engines provide quick answers by pre-calculating the answers to these and other questions. Given our simplistic example, there would be a total of 12 possible aggregations, as shown here:
The number of actual pre-calculations depends on the number of members at the given level of a category. For instance, if we are storing 5 years and we have a total of 40,000 products, there would be 200,000 pre-calculations for the "Year/Product" aggregation.
Data explosion occurs when additional dimensions or levels are added, especially those with a large number of members, such as customers. If each combination of dimension/level is precalculated, the number of aggregations can easily reach into the billions or higher. An efficient indexing strategy would be needed in order to parse through such a large set of data to find the relevant calculations!
Analysis Services solves this problem by selectively aggregating data. Taking the example above, Analysis Services might choose to store the "Month/Product" aggregation but not the "Year/Product" aggregation. A request for the total number of widgets sold last year could be easily resolved by adding the 12 "Month/Product" pre-calculations to obtain the total for the year. This is still far better than adding each product sale for the year.
One problem with this random, selective aggregation in Analysis Services is that all dimensions/levels are considered as equal candidates for aggregation. Analysis Services does not know to favor commonly queried dimension levels unless you tell it to. You can tell the Aggregation Design Wizard to eliminate some dimension levels from consideration for pre-aggregation. This takes the form of a property on the measure group to assign the levels that should be removed from consideration. Another way to influence the aggregation design is to feed the wizard a query load that has occurred on the cube or measure group. This is referred to as usage-based optimization. These methods do not concretely direct the Aggregation Design Wizard to create one aggregation over another. They are simply means of influencing the wizard to do so more intelligently. When using these methods, you can actually aggregate at a higher level (that is, create more aggregations) because there are fewer available possibilities.
Another prevalent issue in OLAP is data sparsity. For any given cube, a number of dimension intersections will not be populated. In the preceding example, let's say the store didn't start selling a particular product until this year. This means that all the intersection points for this product in previous years would return empty cells. Cubes, by their very nature, tend to be highly sparse. A dense cube would mean nearly every combination of dimension levels would be active. This is not so hard to believe in our simplistic example, but in a more realistic cube, where we have Date, Product, Location, Customer, Discount Type, Promotion Type and more, you can see where sparsity comes into play. In many other OLAP products, a placeholder is stored even for empty cells. Analysis Services compacts its cubes by not reserving space when there are no values for the intersection. Space saving can be in the neighborhood of 50% to 90%. This results, also, in a more efficient cube because there is less to manage and examine when queries are issued.
These two inherent features of Analysis Services have been integral to propelling it to its current position in the market. With this introduction to OLAP concepts and Analysis Services capabilities, let's now see what's new in Analysis Services 2005.