Lesson 4: Managing Cubes Using Storage Types and Partitions | Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit

Management of cubes encompasses selecting storage type, using partitions to tune performance, and setting up security. Cubes can be stored in three ways. Care must be taken to decide how to store data, because there are advantages and disadvantages to each method. User-defined partitions allow you to spread large cubes across multiple locations. User-defined partitions and the Partition wizard are available only if you are using Microsoft SQL Server OLAP Services, Enterprise Edition. OLAP Services provides role-based security for managing access to cubes.

After this lesson, you will be able to:

Select the appropriate storage type for a cube based on resources and performance expectations

Understand how partitions can be used to fine-tune performance

Set up Windows NT groups to support access to cubes

Set up OLAP Services roles to provide access levels to cubes

Estimated lesson time: 60 minutes

Storing in a MOLAP Structure

Multidimensional online analytical processing (MOLAP) stores the base data and aggregations in multidimensional structures called cubes. These structures are stored outside of the data mart or data warehouse database, as shown in Figure 9.9. If you do not specify a storage structure, the cube defaults to a MOLAP structure.

click to view at full size

Figure 9.9 Multidimensional online analytical processing

MOLAP Data Structure

Data stored in MOLAP have the following format:

All measures are in the same record, which can speed retrieval time.

No empty cells are stored no sparsity problem.

64-kilobyte (KB) row segments.

Specialized bitmap indexing improves performance.

Reasons to Store in MOLAP

Storing cubes in a MOLAP structure is best for frequently queried aggregated data that requires rapid query response for example, total product sales for all regions by quarter.

MOLAP offers the best query performance for data analysis because of the following features:

The aggregations and a copy of the base data are stored in a multidimensional structure.

Join operations, which are one of the most expensive relational tasks, are not required.

MOLAP data storage is much smaller for the same aggregate data due to data compression algorithms that are used. This results in less input/output (I/O).

MOLAP uses bitmap indexes, which provide greater performance.

MOLAP retrieves data in the cube quickly by using its high-speed query processor and data cache. Information is retrieved from the cube, and the base OLAP tables are only accessed for detailed information.

MOLAP does not use locks because the data is read-only.

MOLAP can be pre-loaded into cache memory

Data can be easily copied to the client for offline analysis.

Storing in a ROLAP Structure

Relational online analytical processing (ROLAP) stores the base data and aggregations in relational tables, as shown in Figure 9.10. These tables are stored in the same database as the data mart or data warehouse tables.

click to view at full size

Figure 9.10 Relational online analytical processing

ROLAP Data Structure

ROLAP stores the base data and aggregations in relational tables in the data warehouse database. Because the base data is stored in the source data mart or data warehouse, ROLAP cubes require no extra storage space to store the base data.

You can use standard Transact-SQL statements to query the aggregation tables, although the names of the tables and columns are system generated, making them difficult to distinguish. You will normally use OLAP Services to access the data and aggregations.

The ROLAP data structure consists of tables and indexes. Each dimension level is stored in a table, and each measure is a column. Each level of the dimension is indexed.

You cannot use ROLAP storage if the data source for your cube is the OLE DB Provider for OLAP Services because ROLAP requires that the data source be a relational database. ROLAP creates the aggregation tables in the source database.

Reasons to Store in ROLAP

Storing cubes in a ROLAP structure is best for infrequently queried data. For example, if 80 percent of user queries only go back one year, historical data older than one year could be moved to a ROLAP structure to reduce overall disk space usage. In addition to eliminating data duplication, storing data in a ROLAP structure provides the following benefits:

ROLAP enables the Cube Builder to automatically create indexes.

ROLAP maps existing data mart or data warehouse aggregations. The OLAP Manager is able to use existing aggregations so that the aggregations do not have to be recalculated for each query.

ROLAP leverages the existing relational database management system (RDBMS) which enables system administrators to efficiently maintain it.

ROLAP supports Microsoft SQL Server 7.0, Oracle, Access, and Open Database Connectivity (ODBC), which can leverage existing DBA knowledge.

Storing in a HOLAP Structure

Hybrid online analytical processing (HOLAP) is a combination of MOLAP and ROLAP, as shown in Figure 9.11.

click to view at full size

Figure 9.11 Hybrid online analytical processing

HOLAP Data Structure

HOLAP stores the base data in relational tables in the same database as the data mart or data warehouse tables. It stores the aggregations in multidimensional structures outside of the data mart or data warehouse database.

Reasons to Store in HOLAP

Storing cubes in a HOLAP structure is best for frequently queried aggregated data based on a large amount of base data. For example, you could store quarterly and yearly sales in MOLAP and store monthly, weekly, and daily sales in ROLAP.

The benefits of storing in a HOLAP structure is that it

Retrieves data in the cube quickly by using the MOLAP high-speed query processor

Consumes less storage space than MOLAP

Avoids data duplication

Comparing Storage Structures

The following table is a summary comparison of the three OLAP Services storage options.

	MOLAP	ROLAP	HOLAP
Base data storage	Cube	Relational table	Relational table
Aggregation storage	Cube	Relational table	Cube
Query performance	Fastest	Slowest	Fast
Storage consumption	High	Low	Medium
Maintenance	High	Low	Medium

There is an appropriate time to use each of the three storage formats. Each of the OLAP data models involves certain compromises. You must decide which model is most appropriate for your particular situation.

MOLAP vs. ROLAP

Although there are no fixed rules, here are some typical scenarios. You should weigh the considerations for your individual situation using the criteria described above.

Larger-Scale Data Warehousing Environments: ROLAP

ROLAP provides the most leverage of existing DBMS skills. ROLAP allows you to keep all the data in a format with which you are familiar, but it is the slowest method.

Smaller- and Medium-Scale Data Marts: MOLAP

MOLAP provides you with speed at any cost. By storing the data and aggregations in cubes, you achieve maximum speed. However, MOLAP requires additional disk storage and a multidimensional cube-browsing tool to view the data.

Certain Performance Scenarios: HOLAP

HOLAP allows you to leverage existing data storage with fast query response time. HOLAP gives you the ability to speed up queries while still keeping the base data in relational tables.

Exercise: Using MOLAP Storage for an OLAP Cube

In this exercise, you will use the Storage Design Wizard again to change the storage type of the Sales cube from ROLAP (selected in a prior exercise) to MOLAP. Then you will look at the physical file created to store the aggregations created by OLAP Services.

To use the Storage Design wizard to enable MOLAP storage

In the OLAP Manager console tree, under the Northwind_DSS database, expand the Cubes folder.

Right-click the Sales cube, and click Design Storage.

On the Storage Design Wizard Welcome page, click Next.

You want to store all aggregations for the cube in a MOLAP mode. Click Replace the existing aggregations, and then click Next.

Select MOLAP as your data storage type, and then click Next.

Under Aggregations options, click Performance gain reaches. In the text box, type 80.

Click Start.

When the process of designing aggregations is complete, click Next.

Click Process Now, and then click Finish.

A window appears in which you can see your cube being processed. When processing is complete, a message appears, stating that processing completed successfully.

Click Close to return to the OLAP Manager.

The aggregations for a cube processed with MOLAP as the data storage type are physically stored on the hard disk in a file using the file extension *.data.

To see the file created for the Sales cube, use the Microsoft Windows Explorer to navigate to the C:\Program Files\OLAP Services\Data\ Northwind_DSS\Sales folder. If the directory path is incorrect, the installation has been altered from default selections, and you can use the Find applet provided by the operating system to locate the file.

Processing, Updating, and Refreshing Cubes

After you create a cube and define the aggregations, you must populate the cube with the aggregations (and base data, if appropriate). Populating the cube is called processing; processing entails a complete load of the data. When you process a cube, OLAP Services retrieves and stores the base data and calculates the aggregations based on the defined dimensions. The aggregations are then stored in either a cube structure or a relational database. The processing period can be very time-consuming based on the amount of base data, the number of dimensions and aggregations calculated, and the hardware you are using.

When to Process

You should process a cube

When the cube is new

When the structure of a cube has changed for example, if you modify the dimensions or aggregations

Incremental Updating

If you choose to update incrementally, this

Can be done on a cube if new data has been added to the warehouse and existing data did not change

Adds aggregations to the cube for only the new data

Can be done while users continue to use the cube

Is often far less processor- and disk-intensive than processing

The existing cube does not have to be reprocessed.

Refreshing Data

When you refresh data,

Existing data is deleted

Aggregations are recalculated and stored

It is appropriate if the underlying base data has changed

The cube can be used while it is being refreshed

Refreshing Definitions

Refreshing is used for nonstructural and nondata changes to the cube. For example, if access roles change, the definition of the cube must be refreshed.

Virtual Cubes

Virtual cubes are simply logical cubes based on physical cubes.

Virtual cubes are logical combinations of multiple physical cubes.

A virtual cube is similar to a view between relational tables.

No storage space is needed other than the definition of the virtual cube itself.

Create virtual cubes with the Virtual Cube wizard.

A virtual cube is a superset of selected cubes in a database. A virtual cube allows you to create a broader view of your multidimensional data, without utilizing additional physical storage space on your drive. Using a virtual cube allows users querying the database to see the data from different cubes in the same place, providing more powerful analysis capabilities.

Exercise: Creating a Virtual Cube

In this exercise, you will use the Cube Wizard to build a second cube that will be used for analyzing the shipping process. Using the Cube Wizard, you will combine the shared dimensions defined earlier with a new dimension that you will create that contains a few measures (the quantitative data derived from the columns in your tables). You will reuse the fact table that is also used for the Sales cube to gain insight about the shipments made by the Northwind Traders Company. Then you will create a virtual cube by joining the Sales and Shipping cubes.

To build your cube

In the OLAP Manager, expand the Northwind_DSS database, right-click Cubes, point to New Cube, and then click Wizard.

On the Welcome page of the Cube Wizard, click Next.

On the Select a fact table for your cube page, expand the Northwind_Mart data source, and then select the Sales_Fact table.

Click Next.

To define the measures for your cube, under Fact table numeric columns, double-click Line Item Freight and Line Item Quantity, in that order.

Click Next.

You need to add a few of the defined dimensions to your cube. Double-click Shipper, Customer, Product, and Time to add the dimensions.

Click Next.

Name your cube Shipping, and then click Finish.

To edit your cube in the Cube Editor

In the Schema pane of the Cube Editor, you can see the fact table (with a yellow title bar) and the joined dimension tables (with blue title bars). In the Cube Editor console tree, you can preview the structure of your cube. You can edit the cube properties by clicking the Properties button.

At this time, you decide you need another dimension that gives you data about the supplier. The supplier information has been rolled up into the Product dimension. The Product dimension is already being used in this cube, but the Supplier information was not added as a level in the product dimension. You need to create a new dimension with this information.

To define the new dimension, double-click the SupplierName column in the Product_Dim table.

In the Map The Column dialog box, click Dimension, and then click OK.

Right-click the Supplier Name dimension in the console tree and select Rename on the pop-up menu.

Type Supplier, and then press Enter.

To design storage and process the cube

In the Cube Editor, on the File menu, click Exit. If you are prompted to save the cube, click Yes.

The Design Storage dialog box will ask you if you want to design storage now. Click Yes.

On the Storage Design Wizard Welcome page, click Next.

Select MOLAP as your data storage type, and then click Next.

Under Aggregations options, click Performance gain reaches. In the text box, type 80.

Click Start.

When the process of designing aggregations is complete, click Next.

Click Process Now, and then click Finish.

A window appears in which you can see your cube being processed. When processing is complete, a message appears, stating that processing completed successfully.

Click Close to return to the OLAP Manager.

To build the virtual cube

In the OLAP Manager console tree, expand the Northwind_DSS database.

Right-click the Virtual Cubes folder, and then click New Virtual Cube.

On the Welcome page of the Virtual Cube Wizard, click Next.

Click the >> button to move the Sales and Shipping cubes from the Available cubes box to the Virtual cube includes box. Click Next.

On the Select the measures for the virtual cube page, double-click the Line Item Freight measure from the Shipping cube in the Available measures box. It will move to the Selected measures box.

On the Select the measures for the virtual cube page, double-click the Line Item Total and Line Item Quantity measures from the Sales cube in the Available measures box. They will move to the Selected measures box.

Click Next.

On the Select the dimensions for virtual cube page, double-click the Customer, Time, Product, and Shipper dimensions in the Available dimensions box. Click Next.

On the Finish the Virtual Cube Wizard page, in the Virtual cube name box, type Sales and Shipping,

Process the cube now by clicking Process Now, and then clicking Finish.

Click Close to return to the OLAP Manager when virtual cube has been processed.

To browse the data of the virtual cube

In the OLAP Manager console tree, expand the Northwind_DSS database, expand Virtual Cubes, right-click the Sales and Shipping virtual cube, and then click Browse Data.

The Cube Browser appears, displaying a grid made up of one dimension and the measures of your cube. The additional dimensions appear in the top of the browser.

Click Close to return to the OLAP Manager console tree.

Partitions

Partitions are storage areas for cubes and relate to our OLAP environment in the following manner:

Every cube has at least one partition, which is created when the cube is created.

You can create additional partitions only if you are using Microsoft SQL Server OLAP Services, Enterprise Edition.

Cubes are stored on partitions.

A cube can be stored across many partitions.

Partitions have no effect on the cube from the users perspective, but they can affect performance.

The power of partitions can be seen in segmenting parts of a cube onto separate partitions. For example, the data of each month may be stored on separate partitions. Monthly partitions can then be merged at year end to create one partition with annual data. Each partition can use a different storage mechanism, enabling tuning of performance.

Types of Partitions

There are three choices available when creating a partition:

Data slice breaks up the data across dimensions.

For example, if you have a multiyear dimension, you may put individual years on separate partitions. This is a slice of the time dimension. This choice is useful when most queries only hit a certain subset of a dimension, such as the current year.

Different fact table for a partition.

A partition can use a fact table different from that defined for the main partition of the cube. To use a different fact table, the structure must be the same as the main fact table of the cube. For example, you may store different products in different fact tables, but all the tables have the same structure. You could then place these different fact tables on different partitions.

Same fact table for multiple partitions.

It is possible to break up a single cube over multiple partitions. This process is similar to data slicing, except the division criteria may not be along dimension lines. Extra care must be taken to ensure that there is no duplicate data on the partitions.

Considerations in using partitions include the following:

When splitting data across partitions, care must be taken to avoid duplicating any data; data duplication will destroy the consistency of the data in the cube.

If data is duplicated across partitions, it will be counted multiple times if the data is summarized across partitions.

Illustration of Partitions

Figure 9.12 shows an example of a cube using multiple partitions. Each partition can have its own optimization set to help control the size of each cube. The more frequently a cube is accessed, the more aggregations we will generally create.

click to view at full size

Figure 9.12 Example of a cube using multiple partitions

In Figure 9.12, the Sales cube is based on three partitions: Europe, America, and Asia.

The America data is the most frequently accessed, so the partition for that part of the data is more highly optimized than the other partitions.

The Asia data is rarely accessed, so it has not been optimized at all in order to save disk space.

The Usefulness of Partitions

Effective use of partitions can give you a number of benefits:

Scalability clusters of SQL Server and OLAP Servers.

Flexibility.

Different storage modes.

Different set of aggregations.

Performance.

Queries are processed in parallel over partitions.

Predefined data slices to limit query bounds.

Incremental data updates.

Merging Partitions

You can merge multiple partitions back into one partition, using a merge process. You must be careful to avoid data duplication during the merge process. The partitions that you want to merge must all

Have the same structure

Be saved in the same format (MOLAP, ROLAP, or HOLAP)

Contain identical aggregation design

Failure to abide by any one of these restrictions disqualifies the partitions from being merge candidates.

Fact Tables

If two partitions use different fact tables, the merged partition will use only the fact table of the target partition; you must manually merge the fact table of the source partition into the target partition. After merging partitions with different fact tables, it is important to process the cube. Failure to do so can show the user accurate aggregations but incorrect base data.

Sliced Data

If you merge partitions created by slicing, the data is fine until the cube is processed. At that point, the data is refreshed using the top level of that dimension. For example, if you sliced your data by regions (Region 1, Region 2, and Region 3), all of which were part of a territory (Territory 1), merging Region 1 and Region 2 would return accurate results until the next refresh. When the data was refreshed in the merged partition, it would default to Territory 1, which would include Region 3. To avoid this, you would have to specify a WHERE clause that retrieved only the data that you needed.

Security

OLAP Services integrates with the security model of Windows NT. This feature allows you to restrict access to cubes, virtual cubes, and the underlying database.

Authentication

Authentication of users occurs at the operating system level. Windows NT validates users using its normal authentication schemes. Access to OLAP Services management features requires the user to be a member of the same domain as the user account under which the server was installed or to be a member of a trusted domain. A user is denied access if the user s account cannot be authenticated against a trusted domain.

Access Control

Access control is handled through access control rights.

Access control rights are only available for cubes stored on the NTFS file system (NTFS).

Access control rights are stored in access control lists (ACLs).

The three levels of access control are

Read Provides read-only access to the data. Supported functionality includes browsing of data and data structures (metadata). This level does not allow modifying data and processing of data. This level of access control is supported explicitly via the OLAP Manager user interface.

Read/Write Provides write access to a write-enabled cube. Supported functionality includes all read access functionality and modifying data in cubes designated and enabled for write-back. This level of access control is supported explicitly via the OLAP Manager user interface.

Admin Provides access to the OLAP Manager user interface and allows processing of data. Upon installation, OLAP Services creates a Windows NT group named OLAP Administrators and adds the logged-in user to this group. Only members of this group can start the OLAP Manager user interface and use administrative functions, including security management. Use the Windows NT User Manager to manage this group.

Roles

Roles are a tool provided by Microsoft that allows you to collect users into a single unit, against which you can assign permissions. This way, instead of having to add a certain set of permissions to a number of users, you assign that set of permissions to a role and simply place users in that role.

Roles map Windows NT user accounts and groups to security categories on the server.

Roles are created at the database level and can be assigned to specific cubes.

Lesson Summary

Management of cubes involves

Selecting storage type: MOLAP, ROLAP, or HOLAP

Securing access to cubes by creating roles and assigning NT users and groups to those roles

Optionally using partitions to optimize storage types and aggregations

OLAP Services provides a management interface and wizards that facilitate these management objectives.