Average Latency Scenario | Professional SQL Server Analysis Services 2005 with MDX (Programmer to Programmer)

For the average latency scenario, assume you are running a large retail business intelligence implementation in which several hundred product-related data changes, in the form of stocking and pricing changes, are added every night. Actual sales information arrives in your data warehouse periodically, and your users want to see that data in near real time. For this case, assume updates are available every two hours or so and your cube typically takes about an hour to process. However, your users are willing to see old data for up to four hours. Assume the data partition itself is not large (say, less than 5 GB) for this scenario.

Proactive Caching with MOLAP Storage Option

Let's say you have built the cube, and dimensions are updated nightly using incremental processing. Incremental processing is a good choice whenever customers must keep using the current dimensions, because it takes place in the background and does not prevent customers from querying the data.

Proactive caching with the MOLAP storage option makes sense when you need to load sales information (or other information) into the measure groups periodically so that users see near real-time data without any performance degradation. In this case, the data arrives in your data warehouse as a bulk load from your relational transactional database. Further, let's say that incremental processing of your cube takes less time than a bulk load to your data warehouse. You can set up proactive caching for the average latency scenario as medium latency MOLAP, as shown in Figure 18-13, so that as soon as a notification arrives, Analysis Services automatically starts building the new MOLAP cache.

Since your users are willing to see old data for up to four hours, the proactive caching property called Latency is set to 4 hours. If the new MOLAP cache is not built within 4 hours of the last data change, Analysis Services switches to ROLAP mode and retrieves data from the relational data source. As soon as the new MOLAP cache is complete, Analysis Services serves users from it. Typically in this scenario you would set the latency interval much higher than the incremental processing time for the partitions. If incremental processing takes much longer than the latency, you might experience occasional performance degradation because the existing MOLAP cache is outdated and Analysis Services needs to fetch the results from the relational data source.
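The latency behavior described above can be sketched as a small decision function. This is a simplified model for illustration only, not Analysis Services code: the function name `serving_mode`, its parameters, and the sample times are all assumptions, and latency is assumed to be measured from the data change that made the cache stale.

```python
from datetime import datetime, timedelta
from typing import Optional


def serving_mode(now: datetime,
                 change_time: datetime,
                 latency: timedelta,
                 rebuild_done: Optional[datetime]) -> str:
    """Return which store answers a query issued at `now` (simplified model)."""
    # A finished rebuild always wins: queries go to the new MOLAP cache.
    if rebuild_done is not None and now >= rebuild_done:
        return "new MOLAP cache"
    # Within the latency window the outdated cache is still served.
    if now - change_time < latency:
        return "old MOLAP cache"
    # Latency expired and no new cache yet: fall back to the relational source.
    return "ROLAP"


change = datetime(2005, 1, 1, 8, 0)          # hypothetical time of the data change
latency = timedelta(hours=4)                 # the four hours users will tolerate
slow_rebuild = change + timedelta(hours=5)   # hypothetical rebuild that overruns the latency

for hours in (1, 4.5, 5):
    t = change + timedelta(hours=hours)
    print(hours, serving_mode(t, change, latency, slow_rebuild))
```

With a rebuild that overruns the latency by an hour, the three sample times fall in the old-cache, ROLAP, and new-cache phases respectively, which is the occasional degradation the text warns about.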

Latency simply refers to the amount of time you want the system to wait before unceremoniously dumping the existing MOLAP cache that is used to serve users. SilenceInterval specifies the minimum period that must elapse after a data change before Analysis Services starts rebuilding the MOLAP cache. SilenceOverrideInterval is a little trickier to get your head around, but by no means daunting. If SilenceInterval is reset time and again due to frequent data changes, the new MOLAP cache never gets fully rebuilt, and the existing cache is dumped once the latency expires. There is some limit to our patience, since users will see performance degradation from the time Analysis Services switches to fetching data from the relational data source after the specified latency. To overcome this issue, the SilenceOverrideInterval property ensures that Analysis Services stops resetting the silence interval for future data changes until the MOLAP cache is fully rebuilt.

Figure 18-13

Normally you know how frequently updates occur in your relational data source, and you can specify the SilenceInterval based on that information. On certain occasions, frequent data changes keep resetting the SilenceInterval timer, which can prevent the MOLAP cache from being rebuilt. That's when SilenceOverrideInterval comes in handy. Think of SilenceOverrideInterval as simply your way of saying, "I don't care if the update notifications keep coming, I want an update at least every sixty seconds." So, even though SilenceInterval keeps on ticking away the seconds, SilenceOverrideInterval will override it if SilenceInterval overstays its welcome, and that is just what happens in Figure 18-14. You can see how SO (SilenceOverrideInterval) times out and a rebuild of the MOLAP cache is kicked off. Typically, if the SilenceInterval is specified in seconds, your SilenceOverrideInterval would be specified in minutes so that your MOLAP cache is not outdated for too long. Figure 18-14 shows a timeline of the events that occur when proactive caching is enabled, but it uses smaller time intervals for SilenceInterval and SilenceOverrideInterval than typical values. Once the cache is rebuilt, Analysis Services returns to the normal proactive caching process, using the SilenceInterval for future notifications.

Figure 18-14
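The interplay between the two timers can be illustrated with a small simulation. This is only a sketch of the behavior described above, under the assumption that the first notification starts both timers and that times are plain seconds; the function `rebuild_start` and the sample notification times are hypothetical and do not come from Analysis Services.

```python
from typing import Sequence


def rebuild_start(notifications: Sequence[float],
                  silence: float,
                  silence_override: float) -> float:
    """Return the time (seconds) at which the cache rebuild would begin.

    Simplified model: `notifications` are change-notification times in
    ascending order; the first one starts both timers.
    """
    if not notifications:
        raise ValueError("at least one notification is required")
    first = notifications[0]
    quiet_deadline = first + silence               # reset by every new notification
    override_deadline = first + silence_override   # never reset
    for t in notifications[1:]:
        if t >= min(quiet_deadline, override_deadline):
            break  # the rebuild has already started; later changes belong to the next cycle
        quiet_deadline = t + silence
    return min(quiet_deadline, override_deadline)


# Two changes, then quiet: the rebuild starts 10 s after the last change.
print(rebuild_start([0, 5], silence=10, silence_override=60))                   # 15
# A change every 6 s keeps resetting the silence timer, so the override fires at 60 s.
print(rebuild_start(list(range(0, 121, 6)), silence=10, silence_override=60))   # 60
```

The second call mirrors the situation in the timeline: without the override, a steady stream of notifications would postpone the rebuild indefinitely.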

For the average latency scenario example explained here, we recommend you customize the medium latency MOLAP default settings so that SilenceInterval, SilenceOverrideInterval, and Latency are set as shown in Figure 18-15. Latency is straightforward: based on the requirement, you set it to 4 hours. SilenceInterval is set to 10 seconds so that rebuilding of the MOLAP cache starts 10 seconds after a data change. The processing of the partition takes approximately 2 hours. If there are multiple data updates within the first two hours, you want to make sure the SilenceOverrideInterval kicks in and stops frequent cache updates, so that by the 4-hour limit you have a new MOLAP cache ready for users to query. There might be times when you have frequent data updates on the relational database and the MOLAP rebuild has not completed but the latency has expired. During that time all requests are served by retrieving results from the data source. You need to keep this period as short as possible so that users perceive the data as real-time while getting very good performance (due to the MOLAP cache).

Figure 18-15
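As a rough way to reason about these three settings together, the sketch below records them in a small Python structure and checks the rules of thumb from the text. The class `ProactiveCachingSettings` and its `review` method are hypothetical helpers, not an Analysis Services API, and the 10-minute SilenceOverrideInterval is an assumed placeholder because the text does not state the value shown in the figure.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class ProactiveCachingSettings:
    silence_interval: timedelta
    silence_override_interval: timedelta
    latency: timedelta

    def review(self, processing_time: timedelta) -> List[str]:
        """Return warnings based on the rules of thumb in the text."""
        warnings = []
        # The override only helps if it is longer than the silence interval.
        if self.silence_override_interval <= self.silence_interval:
            warnings.append("SilenceOverrideInterval should exceed SilenceInterval")
        # If processing outlasts the latency, queries fall back to the relational source.
        if self.latency <= processing_time:
            warnings.append("Latency should exceed the processing time")
        return warnings


settings = ProactiveCachingSettings(
    silence_interval=timedelta(seconds=10),           # from the text
    silence_override_interval=timedelta(minutes=10),  # assumed placeholder, not from the text
    latency=timedelta(hours=4),                       # from the text
)
print(settings.review(processing_time=timedelta(hours=2)))  # []
```

With a 2-hour processing time and a 4-hour latency, neither check fires; shrinking the latency below the processing time would produce a warning.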

Analysis Services serves data to users with the help of a cache. If the data in the relational data warehouse changes, the cache needs to be updated (that is, rebuilt), and rebuilding takes some time. Latency is the proactive caching property that lets you choose between serving your customers from an old MOLAP cache for a certain period of time and serving them the latest data immediately. If your users are concerned about getting the most up-to-date data, you would set the Latency property to zero. This informs Analysis Services that users are interested in the latest data and that the existing MOLAP cache needs to be cleared. Because the new MOLAP cache might take some time to rebuild, you want to keep the results coming to the users. While the MOLAP cache is being rebuilt, Analysis Services fetches the data from the relational data warehouse. Even though you do get the most up-to-date data, you might see slight performance degradation because Analysis Services needs to retrieve the data from the relational data warehouse.

As soon as the MOLAP cache is rebuilt, Analysis Services starts serving the customers from the new MOLAP cache and you see your original query response times again. If you want users to continue using the existing cache while a new cache is generated from the new data, you can specify the time it takes to rebuild the MOLAP cache as the latency. For example, if it takes 15 minutes to rebuild your MOLAP cache, you can specify the latency as 15 minutes. With this setting, current users receive slightly old data for 15 minutes, but at MOLAP performance. As soon as the MOLAP cache is rebuilt, Analysis Services starts serving all the customers from the new MOLAP cache and they immediately see the new data. The trade-off here is how current the data is versus query performance. This is a key configuration and one we expect many will apply if they want to see near real-time data but with MOLAP performance. In this scenario, customers need to be willing to wait for a certain period of time for data to be propagated through the UDM.
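The trade-off between data currency and query performance comes down to simple arithmetic, sketched below. The function `latency_tradeoff` is a hypothetical helper, and it assumes the rebuild starts the moment the data changes (that is, it ignores the silence interval).

```python
from datetime import timedelta
from typing import Tuple


def latency_tradeoff(latency: timedelta, rebuild_time: timedelta) -> Tuple[timedelta, timedelta]:
    """Return (max_staleness, rolap_window) for one data change.

    Assumes the rebuild starts the moment the data changes.
    """
    # Old data is served until the latency expires or the new cache is ready.
    max_staleness = min(latency, rebuild_time)
    # Queries hit the relational source from latency expiry until the rebuild ends.
    rolap_window = max(timedelta(0), rebuild_time - latency)
    return max_staleness, rolap_window


rebuild = timedelta(minutes=15)  # the 15-minute rebuild from the example above
for latency in (timedelta(0), timedelta(minutes=15)):
    stale, rolap = latency_tradeoff(latency, rebuild)
    print(f"latency={latency}: stale for up to {stale}, ROLAP for {rolap}")
```

A latency of zero gives no stale data but 15 minutes of relational queries; a latency equal to the rebuild time gives up to 15 minutes of stale data and no relational queries at all.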

We do not recommend this solution for dimensions (changes to existing dimension members) because occasionally you might end up in a state where you have to query the data from the relational data source. Querying the relational source is not a problem in itself, but when the dimension storage mode switches from MOLAP to ROLAP, Analysis Services considers it a structural change, which means that the partitions have to be rebuilt. This can have a significant performance impact, and clients might have to reconnect to query the UDM. However, if your business needs demand this and your users always establish a new connection to send queries, you can still use these settings for dimensions.