We will use Analysis Services proactive caching to enable real-enough time updates of our cubes. Proactive caching allows us to configure notifications of changes to the underlying relational table of a cube, and a maximum acceptable latency before the changes appear in the cube. We will isolate historical data and recent data from current data in separate partitions to reduce the amount of data that needs to be processed by the proactive cache. This will minimize the time between the arrival of new data and its availability to the user.
The database consists of a large amount of historical data as well as very current data. New data needs to be available in an OLAP database within 30 minutes of its arrival at the hospital. The OLAP database must be accessible to users continuously, even during updating. The medical staff needs to be able to track the current symptoms and medical requirements of new and existing patients and be able to anticipate the resources required by those patients over the next 24 hours.
A multidimensional database is required because of the large volume of data and multiple ways of analyzing the data. Our goal is to make new data visible as soon as possible after it arrives. One of the barriers to this is that with the MOLAP storage model, the data is only visible after the cube has been processed. We favor the MOLAP model because it is extremely an efficient way to store and retrieve cubes. It turns out we can trade some overhead for lower latency. The overhead arises from more frequent processing of the updated cube partitions, since processing is how data in the relational tables becomes available through the cube. By being smarter about what we reprocess, we hope to pay out as little as possible in overhead and in return receive a large reduction in latency.
For the Analysis Services database, we will use the Proactive Caching feature of Analysis Services to reduce the latency between the arrival of new data and the availability of the data to the user. For optimal access to the existing data, less-recent and historical data will be maintained as usual in MOLAP storage partitions. The new data will eventually make its way to one of these partitions or a newly created partition.
We will use the partitioning techniques discussed in the chapter on very large databases to limit the cube partitions that need to be processed. Ideally, only one very current partition needs to be processed.
As shown in Figure 12-1, SQL Server Integration Services (SSIS) will receive data from external sources by periodic polling of the sources, apply the necessary transforms to make the data conform to our data model, and store the data in data warehouse tables. SQL Server will notify Analysis Services of the arrival of new data in the data warehouse. Analysis Services will process the new data when a lull in the arrival of new data occurs or when the latency constraints dictate that the data must be processed. You have total control of the definition of a "lull" and the maximum latency.
Figure 12-1. High-level architecture
The hospital expects to reduce its operating costs by having the right number of staff with the right skills in attendance each shift. In addition, they expect to reduce waste of expensive supplies that have a short shelf life, kept on hand in quantities larger than usually necessary. They will now be able to analyze the current overall patient profile, bed utilization, and care-type requirements.
A day is a long time in the context of epidemics. With near real-time analysis, local health authorities will have much faster notice of epidemics, which will reduce the number of victims, reduce hospital visits, and thereby free space for the critically ill and reduce overall health-care spending.