Management of cubes encompasses selecting storage type, using partitions to tune performance, and setting up security. Cubes can be stored in three ways. Care must be taken to decide how to store data, because there are advantages and disadvantages to each method. User-defined partitions allow you to spread large cubes across multiple locations. User-defined partitions and the Partition wizard are available only if you are using Microsoft SQL Server OLAP Services, Enterprise Edition. OLAP Services provides role-based security for managing access to cubes.
After this lesson, you will be able to:
- Select the appropriate storage type for a cube based on resources and performance expectations
- Understand how partitions can be used to fine-tune performance
- Set up Windows NT groups to support access to cubes
- Set up OLAP Services roles to provide access levels to cubes
Estimated lesson time: 60 minutes
Multidimensional online analytical processing (MOLAP) stores the base data and aggregations in multidimensional structures called cubes. These structures are stored outside of the data mart or data warehouse database, as shown in Figure 9.9. If you do not specify a storage structure, the cube defaults to a MOLAP structure.
Figure 9.9 Multidimensional online analytical processing
Data stored in MOLAP have the following format:
Storing cubes in a MOLAP structure is best for frequently queried aggregated data that requires rapid query response for example, total product sales for all regions by quarter.
MOLAP offers the best query performance for data analysis because of the following features:
Relational online analytical processing (ROLAP) stores the base data and aggregations in relational tables, as shown in Figure 9.10. These tables are stored in the same database as the data mart or data warehouse tables.
Figure 9.10 Relational online analytical processing
ROLAP stores the base data and aggregations in relational tables in the data warehouse database. Because the base data is stored in the source data mart or data warehouse, ROLAP cubes require no extra storage space to store the base data.
You can use standard Transact-SQL statements to query the aggregation tables, although the names of the tables and columns are system generated, making them difficult to distinguish. You will normally use OLAP Services to access the data and aggregations.
The ROLAP data structure consists of tables and indexes. Each dimension level is stored in a table, and each measure is a column. Each level of the dimension is indexed.
You cannot use ROLAP storage if the data source for your cube is the OLE DB Provider for OLAP Services because ROLAP requires that the data source be a relational database. ROLAP creates the aggregation tables in the source database.
Storing cubes in a ROLAP structure is best for infrequently queried data. For example, if 80 percent of user queries only go back one year, historical data older than one year could be moved to a ROLAP structure to reduce overall disk space usage. In addition to eliminating data duplication, storing data in a ROLAP structure provides the following benefits:
Hybrid online analytical processing (HOLAP) is a combination of MOLAP and ROLAP, as shown in Figure 9.11.
Figure 9.11 Hybrid online analytical processing
HOLAP stores the base data in relational tables in the same database as the data mart or data warehouse tables. It stores the aggregations in multidimensional structures outside of the data mart or data warehouse database.
Storing cubes in a HOLAP structure is best for frequently queried aggregated data based on a large amount of base data. For example, you could store quarterly and yearly sales in MOLAP and store monthly, weekly, and daily sales in ROLAP.
The benefits of storing in a HOLAP structure is that it
The following table is a summary comparison of the three OLAP Services storage options.
MOLAP | ROLAP | HOLAP | |
---|---|---|---|
Base data storage | Cube | Relational table | Relational table |
Aggregation storage | Cube | Relational table | Cube |
Query performance | Fastest | Slowest | Fast |
Storage consumption | High | Low | Medium |
Maintenance | High | Low | Medium |
There is an appropriate time to use each of the three storage formats. Each of the OLAP data models involves certain compromises. You must decide which model is most appropriate for your particular situation.
Although there are no fixed rules, here are some typical scenarios. You should weigh the considerations for your individual situation using the criteria described above.
ROLAP provides the most leverage of existing DBMS skills. ROLAP allows you to keep all the data in a format with which you are familiar, but it is the slowest method.
MOLAP provides you with speed at any cost. By storing the data and aggregations in cubes, you achieve maximum speed. However, MOLAP requires additional disk storage and a multidimensional cube-browsing tool to view the data.
HOLAP allows you to leverage existing data storage with fast query response time. HOLAP gives you the ability to speed up queries while still keeping the base data in relational tables.
In this exercise, you will use the Storage Design Wizard again to change the storage type of the Sales cube from ROLAP (selected in a prior exercise) to MOLAP. Then you will look at the physical file created to store the aggregations created by OLAP Services.
A window appears in which you can see your cube being processed. When processing is complete, a message appears, stating that processing completed successfully.
To see the file created for the Sales cube, use the Microsoft Windows Explorer to navigate to the C:\Program Files\OLAP Services\Data\ Northwind_DSS\Sales folder. If the directory path is incorrect, the installation has been altered from default selections, and you can use the Find applet provided by the operating system to locate the file.
After you create a cube and define the aggregations, you must populate the cube with the aggregations (and base data, if appropriate). Populating the cube is called processing; processing entails a complete load of the data. When you process a cube, OLAP Services retrieves and stores the base data and calculates the aggregations based on the defined dimensions. The aggregations are then stored in either a cube structure or a relational database. The processing period can be very time-consuming based on the amount of base data, the number of dimensions and aggregations calculated, and the hardware you are using.
You should process a cube
If you choose to update incrementally, this
The existing cube does not have to be reprocessed.
When you refresh data,
Refreshing is used for nonstructural and nondata changes to the cube. For example, if access roles change, the definition of the cube must be refreshed.
Virtual cubes are simply logical cubes based on physical cubes.
A virtual cube is a superset of selected cubes in a database. A virtual cube allows you to create a broader view of your multidimensional data, without utilizing additional physical storage space on your drive. Using a virtual cube allows users querying the database to see the data from different cubes in the same place, providing more powerful analysis capabilities.
In this exercise, you will use the Cube Wizard to build a second cube that will be used for analyzing the shipping process. Using the Cube Wizard, you will combine the shared dimensions defined earlier with a new dimension that you will create that contains a few measures (the quantitative data derived from the columns in your tables). You will reuse the fact table that is also used for the Sales cube to gain insight about the shipments made by the Northwind Traders Company. Then you will create a virtual cube by joining the Sales and Shipping cubes.
In the Schema pane of the Cube Editor, you can see the fact table (with a yellow title bar) and the joined dimension tables (with blue title bars). In the Cube Editor console tree, you can preview the structure of your cube. You can edit the cube properties by clicking the Properties button.
At this time, you decide you need another dimension that gives you data about the supplier. The supplier information has been rolled up into the Product dimension. The Product dimension is already being used in this cube, but the Supplier information was not added as a level in the product dimension. You need to create a new dimension with this information.
A window appears in which you can see your cube being processed. When processing is complete, a message appears, stating that processing completed successfully.
The Cube Browser appears, displaying a grid made up of one dimension and the measures of your cube. The additional dimensions appear in the top of the browser.
Partitions are storage areas for cubes and relate to our OLAP environment in the following manner:
The power of partitions can be seen in segmenting parts of a cube onto separate partitions. For example, the data of each month may be stored on separate partitions. Monthly partitions can then be merged at year end to create one partition with annual data. Each partition can use a different storage mechanism, enabling tuning of performance.
There are three choices available when creating a partition:
For example, if you have a multiyear dimension, you may put individual years on separate partitions. This is a slice of the time dimension. This choice is useful when most queries only hit a certain subset of a dimension, such as the current year.
A partition can use a fact table different from that defined for the main partition of the cube. To use a different fact table, the structure must be the same as the main fact table of the cube. For example, you may store different products in different fact tables, but all the tables have the same structure. You could then place these different fact tables on different partitions.
It is possible to break up a single cube over multiple partitions. This process is similar to data slicing, except the division criteria may not be along dimension lines. Extra care must be taken to ensure that there is no duplicate data on the partitions.
Considerations in using partitions include the following:
Figure 9.12 shows an example of a cube using multiple partitions. Each partition can have its own optimization set to help control the size of each cube. The more frequently a cube is accessed, the more aggregations we will generally create.
Figure 9.12 Example of a cube using multiple partitions
In Figure 9.12, the Sales cube is based on three partitions: Europe, America, and Asia.
Effective use of partitions can give you a number of benefits:
You can merge multiple partitions back into one partition, using a merge process. You must be careful to avoid data duplication during the merge process. The partitions that you want to merge must all
Failure to abide by any one of these restrictions disqualifies the partitions from being merge candidates.
If two partitions use different fact tables, the merged partition will use only the fact table of the target partition; you must manually merge the fact table of the source partition into the target partition. After merging partitions with different fact tables, it is important to process the cube. Failure to do so can show the user accurate aggregations but incorrect base data.
If you merge partitions created by slicing, the data is fine until the cube is processed. At that point, the data is refreshed using the top level of that dimension. For example, if you sliced your data by regions (Region 1, Region 2, and Region 3), all of which were part of a territory (Territory 1), merging Region 1 and Region 2 would return accurate results until the next refresh. When the data was refreshed in the merged partition, it would default to Territory 1, which would include Region 3. To avoid this, you would have to specify a WHERE clause that retrieved only the data that you needed.
OLAP Services integrates with the security model of Windows NT. This feature allows you to restrict access to cubes, virtual cubes, and the underlying database.
Authentication of users occurs at the operating system level. Windows NT validates users using its normal authentication schemes. Access to OLAP Services management features requires the user to be a member of the same domain as the user account under which the server was installed or to be a member of a trusted domain. A user is denied access if the user s account cannot be authenticated against a trusted domain.
Access control is handled through access control rights.
The three levels of access control are
Roles are a tool provided by Microsoft that allows you to collect users into a single unit, against which you can assign permissions. This way, instead of having to add a certain set of permissions to a number of users, you assign that set of permissions to a role and simply place users in that role.
Management of cubes involves
OLAP Services provides a management interface and wizards that facilitate these management objectives.