Adding and changing data or changing the structure of the data warehouse affects the integrity and accuracy of cubes. You must update OLAP data whenever data in the data warehouse changes. It is important to minimize processing time when maintaining Microsoft OLAP Services data in a data warehouse.
After this lesson, you will be able to:
- Update data in an online analytical processing (OLAP) database
Estimated lesson time: 90 minutes
Changes to the structure of the data warehouse or the data can affect the integrity and accuracy of cubes that you create from the data warehouse.
Data is usually added periodically to the data warehouse to include recent information about the organization s business activities. Adding data to the data warehouse can affect the cube information that is available online to clients because of the interaction between the data and cube partitions. You can manage the effects of adding data to the data warehouse by defining partition filters.
Changes to data already in the data warehouse are less frequent and usually are made so that you can
For example, reorganizing products into different categories can require significant changes to data in the data warehouse. In some cases, such changes require that you completely redesign the cubes. In other cases, redesigning dimensions and processing the cubes that use those dimensions are all that is required.
At some point, data in the data warehouse may become older than the history keeping goals of the data warehouse require. For example, if you keep five years of history in your data warehouse, then records that are older than five years are not required. Removing this old data from the warehouse is called roll-off. Before you roll-off data, you should archive it and confirm that your database will remain consistent after the data has been removed.
Some of the issues you must address in roll-off procedures are as follows:
Changes to the structure of the data warehouse should be infrequent if the data warehouse is well designed. Structural changes to the data warehouse design are typically the least common type of change. When the structure changes, you may need to modify the design of affected cubes and dimensions, redefine partitions and aggregations, and completely process (rebuild) the modified cubes and dimensions.
When you add data, change data, or change its structure in a data warehouse, you must update OLAP data before the changes are available to users.
Many changes you make within the OLAP Manager and all changes to the source data of a cube require you to process the cube before the data reflects these changes. Incorporating additional data into cubes requires less time than rebuilding cubes when data changes.
The method of processing an OLAP object depends on the object and the type of change made to the data warehouse, such as data addition, data change, or structural change. Incorporating additional data into cubes requires less time than rebuilding cubes when data changes. The following table describes the methods that you can use to process cube data.
Update Method | Description | Use for |
---|---|---|
Incremental update | Adds new data to a partition in the cube and updates aggregations. Creates a temporary partition from the new data and merges it into an existing partition. Does not process changes to the structure of a cube (measures, dimensions, and so on) or changes to its existing source data. | Adding new members to a dimension that does not affect the structure of the dimension. |
Refresh data | Clears and reloads data in the cube and recalculates its aggregations. Reuses the aggregation design. | Changing the source data of a cube when its structure does not change. |
Process (rebuild) data | Completely restructures a cube and its dimensions based on its current definitions and then recalculates its data. Redefines partitions and aggregations. Fully processes (rebuilds) the modified cubes and dimensions. | Changing the structure of cube data by adding, deleting, or altering tables or relationships between tables. |
When you process a cube that is currently online by using full process or refresh options, you must have available disk space that is approximately twice the size of your cube. (The new copy of the cube is built. It then replaces the online cube, which is deleted after being successfully replaced.)
When you process a cube that is currently online by using any one of the three processing methods, the cube remains online until the processing is complete, at which time the new cube replaces the online cube.
Incremental or Refresh Update Method
When you process a cube by using the incremental update or the refresh data method, online clients continue to query the cube.
When the refresh completes, clients can access the updated data without disconnecting and reconnecting. The new version of the cube is visible immediately, with no break in service.
Process (Rebuild) Update Method
When you process a cube by using the full process (rebuild) method, online clients lose their connections to the cube when you make the switch to the new cube. The clients must individually reconnect in order to access the new cube.
OLAP Services does not verify the referential integrity of the data warehouse. For example, if a fact table in a cube contains foreign key values that are not present in the primary key column of a joined dimension table, the rows containing those values are not processed. In this example, processing does not produce an error message, but the cube does contain incomplete and therefore inaccurate data.
Microsoft SQL Server OLAP Services provides an incremental update processing method that efficiently incorporates data additions into OLAP cubes.
Using the Incremental Update Wizard
You can use the Incremental Update wizard in OLAP Manager to incorporate data additions into OLAP data incrementally. The Incremental Update wizard helps you add new data to a cube partition. The wizard creates a temporary partition for the new data and merges it into an existing partition. The Incremental Update wizard is launched when you are processing a cube and you select the Incremental update method.
Use the Incremental Update wizard when you add new data to the data warehouse (but existing data does not change) and you want to add the new data to your cube. The wizard adds the new data and associated aggregations but does not change the structure of the cube.
Defining Partition Filters
Cube aggregation data is stored in one or more partitions. You can explicitly define a filter in the Incremental Update wizard that specifies the data that applies to the partition. Explicit partition filters minimize the effects on OLAP data of changes to data in the data warehouse.
The filter that you create is added to the WHERE clause of all the SELECT statements that are used to build the data in the cube. You can specify any of the columns in the fact table in the filter.
You cannot directly specify the name of a column from a dimension table because the dimension tables are not used in all the SELECT statements that are used to build the cube. It is possible to specify the names of columns from the dimension tables by using them in nested SELECT statements, if the data provider for the source data supports nested SELECT statements.
For example, in a filter for updating the Sales cube in the Northwind_DSS database, this filter is invalid because some of the queries used to build the Sales cube do not include the dimension table, Time_Dim:
Time_Dim.Time > 387 |
This filter is valid because all queries used to build the Sales include the fact table, Sales_Fact:
Sales_Fact.Time > 387 |
One method of using filters is to include a batch number on all rows in the fact table and set partition filters to include only the current batches. If new batches are added, they will not affect the cube until the cube is processed and the filter is updated to include the new batches.
CAUTION
The Incremental Update wizard updates a partition. Incorrect use of partitions can result in inaccurate cube data.
OLAP Services provides the refresh data method that efficiently incorporates changes in existing source data into an OLAP cube.
When using the refresh data option, consider the following facts and guidelines:
To understand why aggregations must be recalculated, consider the following example. A transaction is moved from one dimension member to another, such as when a sale posted to the wrong customer is moved to the correct customer. If the aggregations are not recalculated, the original sale is zeroed out, but it will be included in the count of sales records and affect the calculation of a sales average.
OLAP Services provides the process method that completely rebuilds a cube when you change the structure of a cube or when you edit its dimensions or measures.
Processing a cube performs a complete load of data into the cube. This process involves
Processing a cube can take a substantial amount of time when you have a large fact table containing many dimensions with numerous levels and many items in each level.
Process a cube after you complete any of the following activities:
Changes to dimensions require processing. You can only process cubes and shared dimensions. You cannot directly process private dimensions; they are processed when you process the cube to which they belong. The processing of shared dimensions can affect all cubes that incorporate the dimension in the design.
WARNING
If you update and save, but do not process, the structure of a shared dimension, it is automatically processed later, when you process any cube that incorporates the dimension.
The following table describes the methods that you can use to process shared dimensions.
Update Method | Description | Use |
---|---|---|
Incremental update | This option appends new dimension members to your shared dimension. A cube that incorporates a shared dimension remains available to users while the dimension is incrementally updated, and the added dimension members are available to users automatically after the update is complete. The cells associated with the new members remain empty until new data that relates to the members is added to the fact table. | When the structure and relationship of the dimension levels and members have not changed, but new members (that is, rows) have been added to the dimension table. |
Rebuild the dimension structure | Select this option to completely rebuild the dimension structure. Take care when using this method, because all cubes that incorporate the shared dimension immediately become unavailable to users and must be processed before they can be used again. | After making changes to the structure or relationship of the dimension levels or members. For example, use this option after you add, delete, or move a level or after you move a member from one parent member to another. |
NOTE
Empty cells introduced by new dimension members can affect the results of some aggregate function computations in which empty cells are counted.
Changes to data in the data warehouse dimension tables can affect dimension hierarchies, even though the table schema remains the same, because the dimension hierarchy is based on relationships between members in a dimension table. For example, if you rearrange accounts into different regions, the schema stays the same, but the accounts will all be children of different regions.
If the dimension hierarchy is changed in this way, you must rebuild the dimension structure by using the rebuild the dimension structure method.
OLAP Services provides continuous online access to cubes. You must approach changes to the underlying data warehouse with a clear understanding of the effects of synchronizing data in a data warehouse with data in cubes.
Valid cubes are online and available to clients whenever the OLAP Server is running. When you design the data warehouse, you must design a synchronization strategy to enable the addition of data without causing cubes to provide incorrect answers to queries in cubes available to online clients.
If you divide data in a cube among multiple partitions, you can use one of the partitions to accumulate new data batches and process that partition only. The other partitions in the cube must have filters that exclude new data so that you add data only to the partition that accumulates data.
Designing a strategy that provides both currency of data and accuracy of updates requires careful planning. You can manage the effects of adding data to the data warehouse by defining partition filters and by designing a strategy to synchronize OLAP and data warehouse data.
One strategy for managing additions to data warehouse and OLAP data is to design a batch update system. In this strategy, the data in the data warehouse fact table includes a batch number in each record. When you design a cube, add an expression to the filter for each of the partitions of the cube to specify the largest batch number applicable for example, " AND DWBatch <= 33...." When you make additions to the fact table, include a new, higher batch number in the records. These added records do not affect cubes, because the cube partitions read data from previous batches only. When you are ready to process the cube, you must change the partition filter to include the new data.
Microsoft provides a DTS task named the OLAP Services Processing Task. This task lets you create and execute DTS packages which perform OLAP Services processing. The following are suggested uses for the task:
You need to install the OLAP Services Processing Task after installing SQL Server. To download the latest version of the task from the Microsoft SQL Server OLAP Services Website, start the OLAP Manager, and click "OLAP Services on the Web" on the Getting Started tab of the HTML (that is, the right) pane. You need to have a valid connection to the Internet before you do this. A copy of the DTSTasks.EXE file appears in the \Software\Addins folder on the companion CD-ROM if you are not able to download it from OLAP Services on the Web.
After downloading the DTSTasks.EXE file, execute it to install the task. The OLAP Services Processing Task will now be available in the task palette in the DTS Package Designer.
In this exercise, you will incrementally update the data in the Northwind_DSS database by using the Incremental Update wizard.
Scenario
First you will update the Northwind_Mart SQL Server database. The following tables need to be updated:
After this information is added to the Northwind_Mart database, it will be updated in the Northwind_DSS database in Microsoft OLAP Services by processing the Sales cube so it reflects this new information.
The Dimension Browser Time dialog box appears.
Does the Time dimension contain members for the week of May 7, 1998, through May 12, 1998?
What is the original source for the members of the Time dimension?
Does the Time_Dim table contain rows for the time period of May 7, 1998, through May 12, 1998?
This will add new time entries to the Time_Dim table in the Northwind_Mart database in SQL Server.
This will add new product entries to the Product_Dim table in the Northwind_Mart database in SQL Server.
This will add the new sales for the new time period to the Sales_Fact table in the Northwind_Mart database in SQL Server.
The Process dialog box appears, displaying the progress of the incremental update of the Time dimension.
The Dimension Browser Time dialog box appears.
Does the Time Dimension contain members for the week of May 7, 1998, through May 13, 1998?
The Process dialog box appears, displaying the progress of the incremental update of the Product dimension.
The Incremental Update wizard appears.
Sales_Fact.TimeKey > 387 |
This is a WHERE clause expression that filters the new data being updated to the new sales only.
The Process dialog box appears, displaying the progress of the incremental update of the Sales cube.
The Cube Browser Sales dialog box appears.
Updating data in the OLAP Services cubes is an important ongoing task of the data warehouse administrator. The data must always be up to date, and updates must be performed, with minimal impact on users. The incremental update method makes it possible to perform regular updates with minimal impact on users when data needs to be added to the cube. The refresh method makes it possible to perform updates with minimal impact on users when data in the cube needs to be synchronized with changed source data. When structural changes are necessary, you should schedule a full rebuild of the cubes and use the process method to rebuild the cubes.