Lesson 2: Maintaining OLAP Services Data

Adding and changing data or changing the structure of the data warehouse affects the integrity and accuracy of cubes. You must update OLAP data whenever data in the data warehouse changes. It is important to minimize processing time when maintaining Microsoft OLAP Services data in a data warehouse.

After this lesson, you will be able to:

  • Update data in an online analytical processing (OLAP) database

Estimated lesson time: 90 minutes

Maintaining OLAP Data

Changes to the structure of the data warehouse or the data can affect the integrity and accuracy of cubes that you create from the data warehouse.

Adding Data

Data is usually added periodically to the data warehouse to include recent information about the organization s business activities. Adding data to the data warehouse can affect the cube information that is available online to clients because of the interaction between the data and cube partitions. You can manage the effects of adding data to the data warehouse by defining partition filters.

Changing Data

Changes to data already in the data warehouse are less frequent and usually are made so that you can

  • Incorporate corrections to errors that you discover in the source from which you extract the data
  • Restructure data because of changes in the structure of an organization or products
  • For example, reorganizing products into different categories can require significant changes to data in the data warehouse. In some cases, such changes require that you completely redesign the cubes. In other cases, redesigning dimensions and processing the cubes that use those dimensions are all that is required.

Roll-Off Procedures

At some point, data in the data warehouse may become older than the history keeping goals of the data warehouse require. For example, if you keep five years of history in your data warehouse, then records that are older than five years are not required. Removing this old data from the warehouse is called roll-off. Before you roll-off data, you should archive it and confirm that your database will remain consistent after the data has been removed.

Some of the issues you must address in roll-off procedures are as follows:

  • What is the historical scope of the data warehouse?
  • By which criteria is data selected for removal from the warehouse?
  • What happens to the data when it is removed?
    • Removed to archive media
    • Summarized to an aggregate schema
    • Deleted altogether
    • Removed to alternate online storage

  • If OLAP cubes contain data affected by the roll-off procedure, how and when will those cubes be rebuilt?

Changing the Structure of the Data Warehouse

Changes to the structure of the data warehouse should be infrequent if the data warehouse is well designed. Structural changes to the data warehouse design are typically the least common type of change. When the structure changes, you may need to modify the design of affected cubes and dimensions, redefine partitions and aggregations, and completely process (rebuild) the modified cubes and dimensions.

Processing Cubes

When you add data, change data, or change its structure in a data warehouse, you must update OLAP data before the changes are available to users.

Many changes you make within the OLAP Manager and all changes to the source data of a cube require you to process the cube before the data reflects these changes. Incorporating additional data into cubes requires less time than rebuilding cubes when data changes.

Methods of Processing Cubes

The method of processing an OLAP object depends on the object and the type of change made to the data warehouse, such as data addition, data change, or structural change. Incorporating additional data into cubes requires less time than rebuilding cubes when data changes. The following table describes the methods that you can use to process cube data.

Update MethodDescriptionUse for
Incremental updateAdds new data to a partition in the cube and updates aggregations. Creates a temporary partition from the new data and merges it into an existing partition.

Does not process changes to the structure of a cube (measures, dimensions, and so on) or changes to its existing source data.

Adding new members to a dimension that does not affect the structure of the dimension.
Refresh dataClears and reloads data in the cube and recalculates its aggregations.

Reuses the aggregation design.

Changing the source data of a cube when its structure does not change.
Process (rebuild) dataCompletely restructures a cube and its dimensions based on its current definitions and then recalculates its data.

Redefines partitions and aggregations.

Fully processes (rebuilds) the modified cubes and dimensions.

Changing the structure of cube data by adding, deleting, or altering tables or relationships between tables.

Disk Use during Cube Processing

When you process a cube that is currently online by using full process or refresh options, you must have available disk space that is approximately twice the size of your cube. (The new copy of the cube is built. It then replaces the online cube, which is deleted after being successfully replaced.)

Cube Visibility during Processing

When you process a cube that is currently online by using any one of the three processing methods, the cube remains online until the processing is complete, at which time the new cube replaces the online cube.

Incremental or Refresh Update Method

When you process a cube by using the incremental update or the refresh data method, online clients continue to query the cube.

When the refresh completes, clients can access the updated data without disconnecting and reconnecting. The new version of the cube is visible immediately, with no break in service.

Process (Rebuild) Update Method

When you process a cube by using the full process (rebuild) method, online clients lose their connections to the cube when you make the switch to the new cube. The clients must individually reconnect in order to access the new cube.

Referential Integrity during Cube Processing

OLAP Services does not verify the referential integrity of the data warehouse. For example, if a fact table in a cube contains foreign key values that are not present in the primary key column of a joined dimension table, the rows containing those values are not processed. In this example, processing does not produce an error message, but the cube does contain incomplete and therefore inaccurate data.

Using the Incremental Update Method

Microsoft SQL Server OLAP Services provides an incremental update processing method that efficiently incorporates data additions into OLAP cubes.

Using the Incremental Update Wizard

You can use the Incremental Update wizard in OLAP Manager to incorporate data additions into OLAP data incrementally. The Incremental Update wizard helps you add new data to a cube partition. The wizard creates a temporary partition for the new data and merges it into an existing partition. The Incremental Update wizard is launched when you are processing a cube and you select the Incremental update method.

Use the Incremental Update wizard when you add new data to the data warehouse (but existing data does not change) and you want to add the new data to your cube. The wizard adds the new data and associated aggregations but does not change the structure of the cube.

Defining Partition Filters

Cube aggregation data is stored in one or more partitions. You can explicitly define a filter in the Incremental Update wizard that specifies the data that applies to the partition. Explicit partition filters minimize the effects on OLAP data of changes to data in the data warehouse.

The filter that you create is added to the WHERE clause of all the SELECT statements that are used to build the data in the cube. You can specify any of the columns in the fact table in the filter.

You cannot directly specify the name of a column from a dimension table because the dimension tables are not used in all the SELECT statements that are used to build the cube. It is possible to specify the names of columns from the dimension tables by using them in nested SELECT statements, if the data provider for the source data supports nested SELECT statements.

For example, in a filter for updating the Sales cube in the Northwind_DSS database, this filter is invalid because some of the queries used to build the Sales cube do not include the dimension table, Time_Dim:

Time_Dim.Time > 387

This filter is valid because all queries used to build the Sales include the fact table, Sales_Fact:

Sales_Fact.Time > 387

One method of using filters is to include a batch number on all rows in the fact table and set partition filters to include only the current batches. If new batches are added, they will not affect the cube until the cube is processed and the filter is updated to include the new batches.

CAUTION
The Incremental Update wizard updates a partition. Incorrect use of partitions can result in inaccurate cube data.

Using the Refresh Data Method

OLAP Services provides the refresh data method that efficiently incorporates changes in existing source data into an OLAP cube.

When using the refresh data option, consider the following facts and guidelines:

  • This option clears and reloads the data in a cube and recalculates the aggregations.
  • You do not need to process the cube fully.
  • This option is faster than processing the cube, because you do not have to redesign the aggregation tables.
  • Use this option when the underlying data in the data warehouse changes, but the structure of the cube remains the same.

To understand why aggregations must be recalculated, consider the following example. A transaction is moved from one dimension member to another, such as when a sale posted to the wrong customer is moved to the correct customer. If the aggregations are not recalculated, the original sale is zeroed out, but it will be included in the count of sales records and affect the calculation of a sales average.

Using the Process Method

OLAP Services provides the process method that completely rebuilds a cube when you change the structure of a cube or when you edit its dimensions or measures.

Processing a cube performs a complete load of data into the cube. This process involves

  • Reading the dimension tables to populate the levels with members from the source data
  • Reading the fact table
  • Calculating all aggregations
  • Storing the results in the cube

Processing a cube can take a substantial amount of time when you have a large fact table containing many dimensions with numerous levels and many items in each level.

Process a cube after you complete any of the following activities:

  • Build the cube and design its storage options and aggregations.
  • Change the cube s metadata (measures, dimensions, and so on) and decide to incorporate the changes in the cube.
  • Change the structure of a shared dimension used in the cube and process the shared dimension.

Updating Dimensions

Changes to dimensions require processing. You can only process cubes and shared dimensions. You cannot directly process private dimensions; they are processed when you process the cube to which they belong. The processing of shared dimensions can affect all cubes that incorporate the dimension in the design.

WARNING
If you update and save, but do not process, the structure of a shared dimension, it is automatically processed later, when you process any cube that incorporates the dimension.

The following table describes the methods that you can use to process shared dimensions.

Update MethodDescriptionUse
Incremental updateThis option appends new dimension members to your shared dimension.

A cube that incorporates a shared dimension remains available to users while the dimension is incrementally updated, and the added dimension members are available to users automatically after the update is complete. The cells associated with the new members remain empty until new data that relates to the members is added to the fact table.

When the structure and relationship of the dimension levels and members have not changed, but new members (that is, rows) have been added to the dimension table.
Rebuild the dimension structureSelect this option to completely rebuild the dimension structure.

Take care when using this method, because all cubes that incorporate the shared dimension immediately become unavailable to users and must be processed before they can be used again.

After making changes to the structure or relationship of the dimension levels or members. For example, use this option after you add, delete, or move a level or after you move a member from one parent member to another.

NOTE
Empty cells introduced by new dimension members can affect the results of some aggregate function computations in which empty cells are counted.

How Updating Dimensions Affects Dimension Hierarchies

Changes to data in the data warehouse dimension tables can affect dimension hierarchies, even though the table schema remains the same, because the dimension hierarchy is based on relationships between members in a dimension table. For example, if you rearrange accounts into different regions, the schema stays the same, but the accounts will all be children of different regions.

If the dimension hierarchy is changed in this way, you must rebuild the dimension structure by using the rebuild the dimension structure method.

Planning Data Updates

OLAP Services provides continuous online access to cubes. You must approach changes to the underlying data warehouse with a clear understanding of the effects of synchronizing data in a data warehouse with data in cubes.

Valid cubes are online and available to clients whenever the OLAP Server is running. When you design the data warehouse, you must design a synchronization strategy to enable the addition of data without causing cubes to provide incorrect answers to queries in cubes available to online clients.

If you divide data in a cube among multiple partitions, you can use one of the partitions to accumulate new data batches and process that partition only. The other partitions in the cube must have filters that exclude new data so that you add data only to the partition that accumulates data.

A Data Synchronization Strategy

Designing a strategy that provides both currency of data and accuracy of updates requires careful planning. You can manage the effects of adding data to the data warehouse by defining partition filters and by designing a strategy to synchronize OLAP and data warehouse data.

One strategy for managing additions to data warehouse and OLAP data is to design a batch update system. In this strategy, the data in the data warehouse fact table includes a batch number in each record. When you design a cube, add an expression to the filter for each of the partitions of the cube to specify the largest batch number applicable for example, " AND DWBatch <= 33...." When you make additions to the fact table, include a new, higher batch number in the records. These added records do not affect cubes, because the cube partitions read data from previous batches only. When you are ready to process the cube, you must change the partition filter to include the new data.

The DTS OLAP Services Processing Task

Microsoft provides a DTS task named the OLAP Services Processing Task. This task lets you create and execute DTS packages which perform OLAP Services processing. The following are suggested uses for the task:

  • You can schedule DTS packages so you can use the task in a package that you schedule for regular execution. Scheduling a package makes it possible to automatically refresh or update your OLAP Services databases regularly.
  • You can perform the processing as part of a package that also uses the msmdarch command to archive the OLAP Services database.

Obtaining the OLAP Services Processing Task

You need to install the OLAP Services Processing Task after installing SQL Server. To download the latest version of the task from the Microsoft SQL Server OLAP Services Website, start the OLAP Manager, and click "OLAP Services on the Web" on the Getting Started tab of the HTML (that is, the right) pane. You need to have a valid connection to the Internet before you do this. A copy of the DTSTasks.EXE file appears in the \Software\Addins folder on the companion CD-ROM if you are not able to download it from OLAP Services on the Web.

After downloading the DTSTasks.EXE file, execute it to install the task. The OLAP Services Processing Task will now be available in the task palette in the DTS Package Designer.

Exercise: Incrementally Updating Cube Data

In this exercise, you will incrementally update the data in the Northwind_DSS database by using the Incremental Update wizard.

Scenario

First you will update the Northwind_Mart SQL Server database. The following tables need to be updated:

  • The Northwind_Mart.dbo.Product_Dim table with five new products that have been added to the product line of Northwind Traders.
  • The Northwind_Mart.dbo.Sales_Fact table with sales data for the week of May 7, 1998, through May 12, 1998. This sales data also reflects sales of five new products.
  • The Northwind_Mart.dbo.Time_Dim table with time data for the week of May 7, 1998, through May 12, 1998. Typically, you populate your time dimension table to the point at which the time span matches the specified history to be maintained in the data mart.

After this information is added to the Northwind_Mart database, it will be updated in the Northwind_DSS database in Microsoft OLAP Services by processing the Sales cube so it reflects this new information.

  • To review the time dimension in Northwind_DSS using OLAP Manager
    1. In OLAP Manager, expand OLAP Servers, expand your server, expand Northwind_DSS, expand Library, and expand Shared Dimensions.
    2. Right-click Time, and then click Browse Dimension Data.
    3. The Dimension Browser Time dialog box appears.

    4. Expand All Time, expand 1998, expand Quarter 2, and then expand May.
    5. Does the Time dimension contain members for the week of May 7, 1998, through May 12, 1998?

      What is the original source for the members of the Time dimension?

    6. Click Close.

  • To review the Time_Dim table in Northwind_Mart
    1. Open SQL Server Enterprise Manager.
    2. Expand SQL Server Group, expand your server, expand Databases, expand Northwind_Mart, and then click Tables.
    3. In the details pane, right-click Time_Dim, point to Open Table, and then click Return all rows.
    4. Review the results.
    5. Does the Time_Dim table contain rows for the time period of May 7, 1998, through May 12, 1998?

    6. Close the Data in Table Time_Dim dialog box.

  • To update the Time_Dim table in Northwind_Mart
    1. Open SQL Server Query Analyzer.
    2. Open, review, and execute the script in C:\SQLDW\Exercise\Ch13\AdditionalTimeMembers.SQL.
    3. This will add new time entries to the Time_Dim table in the Northwind_Mart database in SQL Server.

    4. In the results pane, verify that the new dates have been added.

  • To update the Product_Dim table in Northwind_Mart
    1. Open, review, and execute the script in C:\SQLDW\Exercise\Ch13\NewProducts.SQL.
    2. This will add new product entries to the Product_Dim table in the Northwind_Mart database in SQL Server.

    3. In the results pane, verify that the new products have been added.

  • To update the Sales_Fact table in Northwind_Mart
    1. Open, review, and execute the script in C:\SQLDW\Exercise\Ch13\NewSales.SQL.
    2. This will add the new sales for the new time period to the Sales_Fact table in the Northwind_Mart database in SQL Server.

    3. In the results pane, verify that the new sales have been added.

  • To update the Time dimension
    1. Switch to OLAP Manager.
    2. Expand your server, expand Northwind_DSS, expand Library, and then expand Shared Dimensions.
    3. Under Shared Dimensions, right-click Time, and then click Process.
    4. Click Incremental update, and then click OK.
    5. The Process dialog box appears, displaying the progress of the incremental update of the Time dimension.

    6. When the processing is complete, click Close.

  • To verify the update of the Time dimension
    1. Under Shared Dimensions, right-click Time, and then click Browse Dimension Data.
    2. The Dimension Browser Time dialog box appears.

    3. Expand All Time, expand 1998, expand Quarter 2, and then expand May.
    4. Does the Time Dimension contain members for the week of May 7, 1998, through May 13, 1998?

    5. Click Close.

  • To update the Product dimension
    1. Under Shared Dimensions, right-click Product, and then click Process.
    2. Click Incremental update, and then click OK.
    3. The Process dialog box appears, displaying the progress of the incremental update of the Product dimension.

    4. When the processing is complete, click Close.

  • To incrementally update cube data
    1. In the Northwind_DSS OLAP database, expand Cubes.
    2. Right-click Sales, and then click Process.
    3. Click Incremental update, and then click OK.
    4. The Incremental Update wizard appears.

    5. Click Next, verify that the default fact table is Sales_Fact, and then click Next.
    6. In the Create a filter expression box, type the following Transact-SQL filter expression to specify the data that you want to update incrementally:
    7. Sales_Fact.TimeKey > 387

      This is a WHERE clause expression that filters the new data being updated to the new sales only.

    8. Click Next, and then click Finish.
    9. The Process dialog box appears, displaying the progress of the incremental update of the Sales cube.

    10. Review the steps of the incremental update in the Process dialog box, and then click Close.

  • To browse the updated Sales cube
    1. Under Cubes, right-click Sales, and then click Browse Data.
    2. The Cube Browser Sales dialog box appears.

    3. Drag the Product dimension to the left vertical axis. Drop the Product dimension on the header of the vertical axis so that the Product dimension replaces the Customer dimension.
    4. Filter the Time dimension so that it only displays those sales that occurred on May 7, 1998. To do this, open the drop-down list for the Time dimension. Expand All Time, expand 1997, expand Quarter 2, expand May, and click 7.
    5. Expand the Condiments member and note the sales information for the new product, maple syrup.
    6. Review the sales that occurred on this day.
    7. Change to time filter to review the remaining days of new sales.
    8. Click Close.

    Lesson Summary

    Updating data in the OLAP Services cubes is an important ongoing task of the data warehouse administrator. The data must always be up to date, and updates must be performed, with minimal impact on users. The incremental update method makes it possible to perform regular updates with minimal impact on users when data needs to be added to the cube. The refresh method makes it possible to perform updates with minimal impact on users when data in the cube needs to be synchronized with changed source data. When structural changes are necessary, you should schedule a full rebuild of the cubes and use the process method to rebuild the cubes.



    Microsoft Corporation - Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
    Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
    ISBN: 0735606706
    EAN: 2147483647
    Year: 1999
    Pages: 114

    flylib.com © 2008-2017.
    If you may any questions please contact us: flylib@qtcs.net