Managing performance is an ongoing task that includes optimizing the design of your data warehouse as well as monitoring and analysis. In this lesson, you will learn how to design and optimize your data warehouse databases. You will learn more about using multiple partitions in an OLAP Services database when you are running SQL Server OLAP Services Enterprise Edition. You will learn how to use the Usage-Based Optimization wizard and the Usage Analysis wizard.
After this lesson, you will be able to:
- Describe optimization strategies for a data warehouse
- Describe optimization strategies for cubes
- Create cube partitions
- Optimize performance by determining appropriate levels of aggregations, indexing, and storage methods
Estimated lesson time: 120 minutes
You can apply several important tuning steps to your data warehouse to optimize performance.
Data in your data warehouse or data mart should be indexed properly to minimize the time required to extract the data from the relational system while building cubes. You should
TIP
Create a clustered index on each dimension table on a column or group of columns that are searched frequently in ranges or are accessed in sorted order. For a large fact table, you should not create a clustered index because clustered indexes are to expensive to maintain. The organization of rows is maintained when new data is added to the fact table because typically the existing data is not updated in a warehouse
In your data warehouse or data mart, you should denormalize the schema to reduce the number of joins required when the data is queried. Typically, you will use a star schema to accomplish this reduction.
Optimizing cube performance starts with solid cube design. To ensure that your cubes are designed most efficiently, consider the following factors.
You should strive to reduce unnecessary complexity in your cubes by analyzing exactly what information is needed. You should use the minimum number of dimensions, levels, or members necessary to accomplish the desired task.
Removing unneeded dimensions, levels, and members from cubes can greatly increase the performance of processing and querying. Additionally, you can optimize the cube schema by removing unnecessary joins in the data warehouse tables.
When optimizing your cube, you can use the Optimize Schema option on the Tools menu in the Cube Editor to help identify potential deficiencies in your cube design. In general, if you are using a single data warehouse as a source, you should ensure that it is optimized first. If you are using more than one data source, you may need to optimize the schema in the cube.
Using partitions can significantly improve query performance, particularly on multiprocessor systems or when you have partitioned a cube across two or more servers. When you use multiple partitions on multiprocessor systems, the query parser is able to divide the workload more efficiently among the processors to improve performance.
Each partition uses two operating system threads: one to read data from the data warehouse and one to load data into the cube and create aggregations and indexes. By using multiple partitions, OLAP Services can perform processing simultaneously on different parts of the cube, provided there are sufficient hardware resources available and the OLAP Server has been configured properly.
When you create a cube, at least one partition is automatically created to store cube data. You can use the Partition wizard to create partitions and store a cube across many partitions. From the users perspective, partitions have no effect on the cube. Figure 14.7 shows how you can partition the Sales cube based on yearly data.
Figure 14.7 Partitioning the Sales cube based on yearly data
Reasons to Create Partitions
Creating additional partitions allows you to
TIP
To create partitions that will be candidates for later merging, you can select the storage mode and copy aggregations from another partition when you create the partition.
Scenario Example
An organization determines that the best way to store its data is in one cube, partitioned by age, because most queries are performed using the time dimension. In this scenario, the last partition is incrementally updated nightly, and as long as the history partition doesn t change, the cube does not have to be reprocessed.
You should consider additional configuration settings when optimizing your configurations. In this section we take a closer look at aligning fact tables and partitions for optimal performance and maintainability. Different Fact Table for a Partition
As you see in the top of Figure 14.8, you can define a fact table for a partition that is different from that defined for the primary partition of the cube. This type of partition is useful when you want to store different types of data, such as different product lines, on separate partitions.
Figure 14.8 Allocating data to partitions
When creating this type of partition, you must ensure that the fact table structure is the same as the primary fact table structure. For example, you may store different products in different fact tables, but all the tables must have the same structure. You could place these different fact tables on different partitions.
Filter Data from One Fact Table into Multiple Partitions
In the second example of Figure 14.8, you see that it is possible to break up a single cube over multiple partitions. This is similar to data slicing, but it has greater flexibility because the division criteria do not have to be along dimension lines. Extra care must be taken to ensure that there is no duplicate data in the partitions.
Slice Data from One Fact Table into Multiple Partitions
In the final example of Figure 14.8, it shows that you can also slice across dimensions and store each dimensional data slice in a separate partition. This type of partition is useful for queries that access only a certain subset of a dimension, such as the current year. For example, if you have a multiple-year dimension, you may put individual years on separate partitions. This is a slice of the Time dimension.
By determining the cost of adding cubes compared with the query performance, you should carefully analyze how the data is sliced or whether to slice data at all.
NOTE
Only SQL Server OLAP Services Enterprise Edition running on Windows NT Enterprise Edition supports the partitioning feature of OLAP Services.
Although it is not typically desirable to merge partitions, Microsoft has provided the functionality necessary to perform this task. It is important to plan your partitions carefully to get optimal performance and to avoid unnecessary merges. As you will see in this section, merging is a complex process that can cause errors in your data if it is not performed correctly.
Determining Which Partitions to Merge
When determining which partitions to merge, you must ensure that all partitions
OLAP Services will not allow you to merge partitions that do not meet these requirements.
Merging Sliced Data
If you merge partitions created by slicing, the merged partition can contain incorrect data after it is processed.
For example, your data is sliced by regions (Region 1, Region 2, and Region 3), all of which are part of a territory (Territory 1). Each region is stored in a different partition. If you merge the Region 2 and Region 3 partitions, when you process the data in the merged partition, the content of the merged partition will default to the parent of the regions, which is Territory 1. After processing the cube, the second partition will now include the data from Region 1, Region 2, and Region 3. Region 3 would therefore be duplicated. To avoid duplication, you need to define a filter on the merged partition that limits its contents to Region 2 and Region 3.
Merging Partitions That Use Different Fact Tables
If two partitions use different fact tables, the merged partition will use only the fact table of the target partition; you must manually merge the fact table of the source partition into the fact table of the target partition. After you merge partitions with different fact tables, it is important to process the cube. Failure to do so can show the user accurate aggregations but incorrect base data.
Whether you create or merge partitions, you must manage these activities to ensure that the partitions of a cube always contain the correct data.
Ensuring Data Consistency
Data consistency in the cube is the primary concern when creating and merging partitions. Inconsistent data produces incorrect summaries and presents erroneous data to users.
For example, if a sales transaction for Product n is duplicated in the fact tables for two partitions, summaries of Product n sales can include a double accounting of the duplicated transaction.
To ensure data consistency, it is important to create appropriate filters for your partitions.
Defining Filters
When you partition cube data, you can explicitly define a filter in the definition of a partition to minimize unexpected effects on data that changes to the underlying data warehouse data can cause. Explicitly defined partition filters have different effects depending on the specific storage structure of the partition. These differences are shown in the following table.
Storage | Filter Effect |
---|---|
MOLAP | Specifies which data is read from the data warehouse to create aggregations when the cube is processed. New data is not available until the cube is reprocessed. |
ROLAP and HOLAP | Specifies which data from the data warehouse can be accessed by queries. New data is available to queries immediately, which can cause data inconsistencies. For this reason, it is often important to use a filter to explicitly exclude new data. |
In this exercise, you will use OLAP Manager to create a new database in OLAP Services. You will then create multiple partitions for the Sales cube.
You must be using OLAP Services Enterprise Edition in order to complete this exercise.
In this procedure, you will use OLAP Manager to create a new database called Northwind_Partition and add a data source for the database.
Option | Value |
---|---|
Provider | Microsoft OLE DB Provider for SQL Server |
Server name | Your server name |
Log on to the server | Use Windows NT Integrated Security |
Database | Northwind_Mart |
In this procedure, you will build the Time dimension.
Option | Value |
---|---|
Create a new dimension from | A single-dimension table |
Dimension table | Time_Dim |
Dimension type | Time dimension |
Date_column | TheDate |
Time levels | Year, Quarter, Month, Day |
Dimension name | Time |
In this procedure, you will build the Product dimension.
Option | Value |
---|---|
Create a new dimension from | A single-dimension table |
Dimension table | Product_Dim |
Dimension levels | CategoryName and ProductName columns, in that order |
Dimension name | Product |
In this procedure, you will build the Customer dimension.
Option | Value |
---|---|
Create a new dimension from | A single-dimension table |
Dimension table | Customer_Dim |
Dimension levels | Country, Region, City, and CompanyName columns, in that order |
Dimension name | Customer |
NOTE
When defining some levels, you may receive warning messages that the number of children is less than the number of parents. In the cube, this message is expected when defining the Region level in the Customer dimension. The Region column stores the state or province for each country. Some countries do not have Region attributes. Click Yes when this message occurs.
In this procedure, you will build the Shipper dimension.
Option | Value |
---|---|
Create a new dimension from | A single-dimension table |
Dimension table | Shipper_Dim |
Dimension levels | ShipperName |
Dimension name | Shipper |
In this procedure, you will use the Cube wizard to define the Sales cube.
Option | Value |
---|---|
Select a fact table | Sales_Fact |
Select numeric columns that define your measures | Line Item Total, Line Item Quantity, and Line Item Discount columns, in that order |
Select the dimensions for your cube | Time, Product, Customer, and Shipper |
Cube name | Sales |
In this procedure, you will use the Partition wizard to partition the Sales cube by using the data slice method.
Option | Value |
---|---|
Data source | Northwind_Mart |
Fact table | Sales_Fact |
Select the data slice | Time dimension, member 1996 |
Partition name | Year 1996 |
Design aggregations for your partition now | Selected |
Option | Value |
---|---|
Data storage type | MOLAP |
Aggregation options | Performance gain reaches 50% |
Create aggregations | Click Start Notice the Performance vs. Size graph and the number of aggregations designed for you. |
Save, but don't process now | Selected |
In this procedure, you will process the Sales cube.
To ensure that you have processed the most efficient and frequently used aggregations, you should audit the queries that have been submitted by users. SQL Server OLAP Services provides the Usage Analysis wizard and the Usage-Based Optimization wizard to help you to optimize your cubes based on usage.
TIP
SQL Server provides extensive monitoring capability that is integrated with the Windows NT Performance Monitor. You should perform standard performance monitoring and analysis of SQL Server in addition to using the OLAP Services analysis tools covered in this lesson.
Aggregations improve query speed but come with the cost of increased disk storage space and cube processing time. By reviewing the queries that users submit, you can determine which aggregations can answer the majority of queries and can eliminate unnecessary ones.
Usage-based aggregations can be used to set aggregation levels automatically based on actual query history. This can be particularly useful when the same types of queries are used repeatedly. Run the Usage-Based Optimization wizard to create aggregations based on actual query history.
NOTE
As we discussed in Lesson 4 of Chapter 9, "Managing Cubes Using Storage Types and Partitions," the storage method that you choose also affects the performance of OLAP Services. Review the table in Chapter 9 that compares MOLAP, ROLAP, and HOLAP.
In this exercise, you will use the Usage-Based Optimization wizard to process the Sales cube in the Northwind_DSS database based on the queries that you have run on the Sales cube.
You can run the Usage Analysis wizard to see reports that show information about the queries that have been submitted. Reports are in either a tabular or a histogram graph format. The following reports are available:
A table that shows the length of time it takes each query to run. The queries are ordered from the longest to the shortest runtime.
A table that shows how frequently each query is run. The queries are ordered from the most frequently to the least frequently run query.
A table that shows the number of queries that have been run by each user. The table is ordered from the most active to the least active user.
A histogram that graphs query response times in ranges from fewer than 5 seconds to more than 60 seconds.
A histogram that graphs the number of queries that run during each hour of the day.
A histogram that graphs the number of queries that run grouped by date.
You can filter the reports on the following criteria:
Review Query Log Sample Frequency
The Usage-Based Optimization wizard and Usage Analysis wizard rely on query history logging. By default, every tenth query received by the OLAP Server is logged. You can set the logging frequency on the Query Log tab in the Server Properties dialog box.
In this exercise, you will view Usage Analysis wizard reports for the Northwind_DSS database.
Effective use of indexes and optimal design of your SQL Server databases and OLAP Services cubes are essential for gaining the best performance from your data warehouse. Using multiple partitions provides scalability, increased manageability, and increased performance if you are running SQL Server OLAP Services Enterprise Edition. The Usage-Based Optimization wizard and the Usage Analysis wizard are useful tools for tuning your OLAP Services cubes based on actual user query patterns.