So, you're ready to go real-time with your cubes? Or perhaps you would be if only you knew what real-time cubes were? We define real-time cubes as cubes that are configured for automatic data updates on a time scale that makes them appear to be working in real time. This can be profoundly useful for certain types of analytical applications. First, consider an application for which real-time cubes would not be useful: an application designed to create profit projections and economic analyses of harvesting an old growth forest (a renewable resource) would require updates only every five years or so to reflect macroeconomic trends. There are much more exciting applications that would exploit real-time techniques. Consider a case in which your cube is built directly against your transactional data, which receives several transactions per second that need to be reflected in your UDM so that users can query the data in real time. Real-time streaming stock quotes for your company, combined with the results of a business analysis, could be fed into a digital dashboard for viewing results. OK, it is an unlikely example, but you get the idea. Such a dashboard might house multiple Key Performance Indicators (KPIs), clearly indicating the performance of target metrics with changes in color or graphic displays based on the data. Attaching the real-time stock quote stream to the constantly changing cube and/or dimensions for analysis can be done through a .NET stored assembly. All this is possible in Analysis Services 2005 due to the flexibility of the Unified Dimensional Model (UDM).
What does real-time mean to you or your business? Does it mean the ability to query the cube at any time? Does it mean you have the most up-to-date data in your cube? If you think of "most up-to-date data," what does that mean to you? Perhaps it means something like the previous quarter's data or the previous month's data, or perhaps it is weekly or daily data. There are cases where even seconds count, as with the stock-related example. How soon the data needs to be available in the UDM is the question you need to think about when you are designing a real-time cube. The daily transactional data in most retail companies arrives at the data warehouse on a nightly or weekly basis. Typically these companies have a nightly job that loads the new data into the cube through an incremental process.
If your company is multi-national, the concept of a nightly job (which is typically considered a batch process) is not nightly at all, due to the many time zones involved. Assume your company had offices in the USA and you were loading new data during the night. If your company expanded to include data-generating offices in Asia and those employees needed to access the cubes, you would need to make sure the UDM was available for querying throughout the day and night while consistently serving correct data. Some companies can find the right sweet spot, a window of time when users are not accessing the cube, and do the data load then. But what if your transactional data arrives at regular intervals during the day and the end users of the cube need access to the data instantaneously? Then you would have to design a special way to meet the needs of your users. Analysis Services 2005 allows you to address these very sorts of challenges. You simply need to choose the right method based on your requirements.
By now, you are very familiar with the MOLAP, HOLAP, and ROLAP storage modes; they can be combined with varying methods of data update for both fact and dimension data through a technique called proactive caching. With proactive caching you can count on getting real-time data with MOLAP performance through the use of cache technology. In addition, proactive caching provides you the ability to manage how changes that occur to the source transactional data are propagated to the end user through the UDM (that is where the real-time part comes in). It is important to understand that proactive caching itself does not provide real-time capability; it is a feature that helps you manage your real-time business needs. This chapter provides you with some thoughts on which approach to take in which case and why. We have divided this chapter into three general scenarios to explain proactive caching and how it is useful for designing real-time cubes: a long latency scenario for those times when quick updates are not required, an average latency scenario for periodic, non-time-critical updates, and finally a low latency scenario for the most demanding of users.
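To make the latency trade-off concrete, proactive caching is configured on a partition (or dimension) through a handful of interval settings in the Analysis Services DDL. The fragment below is a rough sketch of a low-latency style configuration; the element names follow the ASSL schema for Analysis Services 2005, but the ISO 8601 duration values shown are purely illustrative, not recommendations:

```xml
<!-- Hypothetical proactive caching settings on a partition (values illustrative) -->
<ProactiveCaching>
  <!-- Wait for 10 seconds of quiet after a source change before rebuilding the MOLAP cache -->
  <SilenceInterval>PT10S</SilenceInterval>
  <!-- If changes keep arriving, force the rebuild to start after at most 10 minutes -->
  <SilenceOverrideInterval>PT10M</SilenceOverrideInterval>
  <!-- Serve the stale MOLAP cache for at most 30 minutes; beyond that, queries fall back to ROLAP -->
  <Latency>PT30M</Latency>
  <!-- Inherit the data source binding from the partition's own source definition -->
  <Source xsi:type="ProactiveCachingInheritedBinding" />
</ProactiveCaching>
```

Shorter silence and latency intervals push the behavior toward the low latency scenario (fresher data, more frequent cache rebuilds and ROLAP fallback), while longer intervals approximate the long latency scenario with steady MOLAP performance.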