When building business intelligence solutions at the enterprise level it is common to work with terabytes of source (also known as fact or detail) data. You can bet a company like Wal-Mart has many terabytes of detailed data feeding into its business intelligence solutions. Even if you are working with just a few hundred gigabytes, you will find the use of partitions to be critical to your success.
By adding partitions to your overall cube design strategy, you can manage how and where cube data is physically stored, how a cube is processed as well as the time required for processing, and how efficiently Analysis Services 2005 can retrieve data in response to user queries. One key benefit of partitioning is the distribution of data over and across one or more hard disk drives. And in the case of remote partitions, the data can be spread over various machines. Partitions can even be processed in parallel on the remote machines. In this section, you first learn how to set up a local partition. Then, in the section that follows, you learn how to set up a remote partition configuration — which, by the way, is not the simplest procedure.
In order to work with partitions, you first need administrator privileges on both the local and remote instances of Analysis Services you intend to use. Administrator privileges are granted to member groups or users assigned to the Analysis Services 2005 server role. Being a member of the server role is analogous to being a member of the OLAP Administrator's group in Analysis Services 2000. To join the Server role first open SSMS and connect to each Analysis Services instance you plan to use. For each instance, you need to perform the following steps:
Right-click the instance name and select Properties. In the Analysis Services Properties dialog, shown in Figure 9-57, click the Security tab in the top-left pane. Then, click the Add button.
Next you need to enter your fully articulated user name in the Enter the Object Names to Select box. In Figure 9-58, you can see how redmond\stephenq was entered. To validate your entry, just click the Check Names button. Finally, click OK to close all dialogs. You now have serverwide administrator rights. Be sure to repeat these steps on the local and the remote servers.
An important thing to know about partitions is that one partition per measure group in a cube is created behind the scenes to accommodate the storage of data and metadata of your cube — so without any action on your part, beyond the creation of the measure groups, you already have partitions on your computer. When you explicitly create a local partition, you add it to the existing partition for a measure group. So, why should you take extra steps to add partitions? Well, by using partitions, you can spread data across multiple hard disk drives on a single computer. Because very large partitions slow down cube-related activities, dividing one large partition into multiple smaller partitions can improve processing and query times.
In the following exercise, your goal is to replace the single Internet Sales partition for 2004 into two partitions of equal size. This will require you to change the parameters on the existing partition to make room for the new partition; otherwise, Internet sales for 2004 would be double-counted because both partitions would contain the same data. Double-counting, by the way, is something you must be very alert for because it will result in incorrect results.
To create a local partition, follow these steps:
Using BIDS, open the Adventure Works DW sample project located at C:\Program Files\ Microsoft SQL Server\90\Tools\Samples\AdventureWorks Analysis Services Project\ Enterprise. Double-click the Adventure Works cube name in Solution Explorer to open the cube designer.
Click the Partitions tab. Your screen should look similar to Figure 9-59.
If necessary, expand the Internet Sales section by clicking the arrows to the left of Internet Sales. Click the Source box for Internet_Sales_2004, and then click the button with two dots appearing in the box to open the Partition Source dialog.
Because you need to use a very similar filter query when you create the second Internet Sales 2004 partition, highlight the query and copy it to the clipboard or Notepad. You will modify the query to change the partition such that it contains only data with OrderDateKeys between 915 and 1098 (inclusive) as shown in Figure 9-60, and click OK.
Next, click the Partition Name "Internet_Sales_2004" and change it to "Internet_Sales_2004a."
Click the New Partition link to launch the Partition Wizard.
In the Partition Wizard, under Available Tables select the check box next to dbo.FactInternetSales and then click Next. You should see the wizard page as shown in Figure 9-61.
Click the Specify a Query to Restrict Rows check box. Delete the default query that shows up in the query window and paste in the query you previously saved. Edit the WHERE clause to limit partition data to rows with OrderDateKey >= ‘1099’ AND OrderDateKey <= ‘1280’.
Click Finish (you're storing your partition to the default location). In the Name box, type Internet_Sales_2004b, click the Design Aggregations Later radio button, and then click Finish.
Naturally, you would want to design aggregations for your new partition, and deploy and process it. In this case, you have learned how to use the partition wizard for creating a new partition without duplicating data in the process. Again, data duplication must be guarded against because it leads directly to wrong results. The next section takes on the formidable and powerful remote partition.
The basic architecture of the remote partition keeps data definitions (metadata) in the cube on the master (or parent) machine and off-loads the measure or detail data to the remote partitions on subordinate machines. In terms of administration tasks related to remote partitions, the host machine containing the cube metadata acts as the point of control for all related remote partitions.
To implement a remote partition in the most meaningful way, you need two computers. Earlier in this chapter, you followed steps to make sure the permissions were set to work with the local and remote computers with appropriate credentials. In addition, you must have the firewall settings on the master computer (the host box) and subordinate box configured to accept outside connections for Analysis Services. The computer storing the remote partition on it is called the Subordinate (Target) computer. In the tutorial that follows, we are using two instances on one machine (localhost and localhost\I2) to demonstrate remote partitions; in this case, firewall settings are not necessary. If you are going for a two machine configuration, of course, you will need to set the firewalls appropriately. Before working with the subordinate computer, you set the stage for successful inter-server interactions by starting your work on the Master computer.
In BIDS, open the Adventure Works DW sample project located at C:\Program Files\Microsoft SQL Server\90\Tools\Samples\AdventureWorks Analysis Services Project\Enterprise.
Right-click the project name, Adventure Works DW, in Solution Explorer and click Properties to access the Property Pages, as shown in Figure 9-62. Select the third item down below Configuration Properties in the pane on the left (Deployment). Make your settings consistent with those shown in the figure; be sure the correct name is listed for the Master Server which, if you're not using a named instance, should be localhost. Click OK.
In Solution Explorer, right-click the Adventure Works DW database icon and select Deploy to deploy and process the project.
Now it's time to work on the target, or subordinate, machine (or instance). Open BIDS, and then create a new Analysis Services project called Target.
Right-click the Data Sources folder in Solution Explorer and select New Data Source. In the Data Source Wizard dialog, click the New button to open the Connection Manager dialog.
First, change the Provider to Native OLE DB\Microsoft OLE DB Provider for Analysis Services 9.0. In the Connection Manager dialog, there is a Server Or File Name text box, into which you type the name of your master Analysis Services instance. If you are using an unnamed instance (the default instance) on the Master machine, type the machine name (or "localhost"). Set the authentication properties in Log on to the Server consistent with your own configuration. Finally, click the down arrow for Initial Catalog and select Adventure Works DW (the dialog should look like Figure 9-63) and click OK.
In the Data Source Wizard, click Next. Set the Impersonation Information at this time to verify correct security settings. If you are unsure what to use, try Default. Click Next.
You need to provide a new name on this page; we suggest "ASDB_AdvWorksDW" and then click Finish to dismiss the dialog.
Just to be clear, you're still on the subordinate machine or instance. Right-click the project name in Solution Explorer and choose Edit Database. Now, change the MasterDataSourceID from empty to "ASDB_AdvWorksDW" as shown in Figure 9-64.
Right-click the database name in Solution Explorer and choose Properties. Click Configuration Properties and then click Deployment. Finally, click Server and set the name to the subordinate instance of Analysis Services as shown in Figure 9-65. Click OK to close the dialog.
Deploy the project to move the metadata to the Analysis Services instance (localhost\I2).
For both localhost and localhost\I2 do the following in SSMS:
Connect to the service.
Right-click the instance name, and select Properties.
Change the Value (first column) of both Feature\LinkToOtherInstanceEnabled and Feature\LinkFromOtherInstanceEnabled to True.
Click OK to close the Analysis Servier Properties dialog. To make these changes stick, you need to restart the instance of Analysis Services; just right-click the instance name in Object Explorer and click Restart and click Yes when asked for verification.
In BIDS, create a second data source in the Adventure Works DW project to connect to the subordinate (target) instance of Analysis Services, and specifically at the target project (named Target here). Use the Native OLE DB\ Microsoft OLE DB Provider for Analysis Services 9.0 (see Figure 9-66).
Click OK to close the Connection Manager dialog. Select the created data source for the Target database; then click Next. On the Impersonation Information page, select Default and click Next. Accept the default name provided by the Data Source Wizard and click Finish.
It is nearly time to create the remote partition; but first you need to make room for one by deleting an existing partition. Let's sacrifice one of the Fact Internet Sales partitions by opening the Adventure Works cube and clicking the Partitions tab; then click the double down arrows next to Fact Internet Sales to open the section for the Internet Sales measure group. To delete the Internet_Sales_2001 partition, right-click on the partition and click delete as shown in Figure 9-67.
To build the remote partition, click the New Partition link. On the first page, click the checkbox next to dbo.FactInternetSales under Available Tables, and then click Next.
On the Restrict Rows page of the wizard; enable "Specify a query to restrict rows."
The partition wizard creates a relational select query up to the WHERE clause. At the end of the WHERE clause, type in "OrderDateKey<=184" as shown in Figure 9-68 and click Next.
In the Processing and Storage Locations page, select the option "Remote Analysis Services Data Source," select the data source Target as shown in Figure 9-69 and click Next.
Select the "Design aggregations later" radio button, name the partition "Internet_Sales_2001" as shown in Figure 9-70 and click Finish.
Open the Internet Sales Order Details dimension and change the storage mode from ROLAP to MOLAP in the Properties window.
Important Security Information for Remote Partitions: We have chosen the Impersonation mode to be "Default" for the Data Sources within the master and subordinate databases (instances). By choosing default, at the time of processing, Analysis Services instances use the service's start-up account while connecting to Analysis Services 9.0 databases. If your server start-up account for both instances is a Windows domain (user) account, then you would have appropriate permissions to access the databases (so long as that start-up account has admin permissions). However, if both instances were installed with server start-up accounts as "Local System" then you might encounter an access permissions error when the master database is trying to connect to the subordinate. This is due to the fact that connections to named instances of Analysis Services are routed via SQL Browser and by default, the server start-up account for SQL Browser is "Network Service." If your installation has the SQL Browser server start-up account as "Network Service" and the Analysis Services instances server start-up account is "Local System," please change the server start-up account of SQL Browser and the two Analysis Services instances to a windows domain account using SQL Server Configuration Manager. The Configuration Manager can be launched from Start All programs Microsoft SQL Server 2005 Configuration Tools SQL Server Configuration Manager. For more information on changing a server start-up account, please refer to Chapter 12.
Finally, to populate the partition on the remote machine, right-click the Adventure Works cube name in the Solution Explorer, and choose Process. When asked whether to deploy the project first, select Yes. After processing is complete you will be able to query data from the remote partitions.
If you were successful, you can see the remote partitions quite clearly after processing as shown in Figure 9-71. And if it doesn't work for you the first time, don't worry, there are a lot of steps and therefore lots of opportunities to get things messed up. Most likely though, are security errors; be sure to verify you have all the correct Firewall settings and Impersonation settings. Impersonation settings are changed in a secondary tab in the Data Source wizard.
Where you store partition data is just one part of the storage picture; the mode in which you store it is the other. The storage modes used with Analysis Services solutions include MOLAP (Multi-dimensional OLAP), ROLAP (Relational OLAP), and HOLAP (Hybrid OLAP). These storage types were discussed in some detail way back in Chapter 1, so this section contains only a brief review. The main difference between these storage modes concerns where the data and (or) aggregated fact data is stored. MOLAP is the traditional storage mode for OLAP Servers and involves keeping both data and aggregations on the server. This results in fast query response times, but it is not as scalable as other solutions. ROLAP is the storage mode in which the data is left in the relational database. Aggregated or summary data is also stored in the relational database. The key advantage here is that the ROLAP will scale as well as your relational hardware/software will supportIt can sometimes result in slower queries, though. The HOLAP storage mode combines the best features of MOLAP and ROLAP. The data in the relational database is not touched while the aggregated or summary data is stored on the OLAP Server in a proprietary format; queries that can be resolved in the OLAP Server are, and those that cannot be are redirected to the relational backend.
MOLAP is generally the preferred mode due to the performance gains and efficiency in its use of storage space. If you need to analyze real-time data, ROLAP is probably more appropriate. ROLAP is also a better option when you have a very large data warehouse and you do not want to duplicate the data. Each partition can have its own storage mode, which you specify on the Partitions tab of the cube designer in BIDS. Just click the Storage Settings link as shown in Figure 9-72.
When you click Storage Settings, the first page in the wizard is for Proactive Caching configuration (see Figure 9-73). Proactive Caching is a mechanism that control over the latency associated with moving data to MOLAP and reprocessing data too. You learn more about Proactive Caching options suited for various real world scenarios in Chapter 18.
Aggregations are what make OLAP fast to use. They are pre-calculated summaries of the data. In Analysis Services, the storage and processing schemes are linked through the setting of caching options. Caching is a way to increase query response time by keeping the data used most often on a local disk.
To start the Aggregations Design wizard, click Design Aggregations in the Partitions tab of the cube designer. You use this wizard to create aggregations for each partition to improve overall query performance. You learn about aggregations in detail in Chapter 13. When you run the wizard, the performance benefits gained are graphically compared to the storage space required to store the aggregations, as shown in Figure 9-74 where you can see that the Performance Gain is set to 30%. A good starting target should be around 25% to 30%, but you will learn about the Usage Based Optimization Wizard in Chapter 13 for fine-tuning performance gains even further.
You can validate that the aggregations were defined by the wizard by looking at the partitions tab with the relevant measure group open and showing the partitions, as shown in Figure 9-75.
To validate the aggregations defined by the wizard are useful you need to trace the query execution using SQL Server Profiler which you learn in Chapter 13.