Next Steps

Here we provide some additional avenues for you to consider to improve your solution, after you have implemented the basic elements of this chapter.

Automating the Process

In this chapter, we described a scenario where the data is constantly arriving, or at least arriving very frequently in batches. Our purpose was to describe the ways you can efficiently process high volumes of newly arriving data and be able to query large databases efficiently. We want to leave you with some ideas on how to detect when new data has arrived. In Chapter 4, we discussed how you can browse a directory for files and pass a filename into a Data Flow task. Here are some other avenues you can explore:

  • Two Control Flow tasks that you can look at for detecting file arrival are the Message Queue task and the WMI Event task.

  • Chapter 12, "Real-Time Business Intelligence," on real-time OLAP describes methods for automatic background processing of cube partitions when new data arrives.

  • A complete object model exists for working all aspects of partitions, ETL packages, and cubes. You can quickly build an application to automate the processing of data from the arrival of the flat files through to incrementally processing the cubes and creating new partitions in any language supporting the Common Language Runtime (CLR).

Partitioned Views

As noted in the introduction, you can also use partitioned views (PVs) to partition your data. This allows you to create a unified view of a table distributed over multiple databases. You can now manage each partition of the data independently. If the databases are on other servers, you have a distributed partitioned view (DPV). The advantage of DPVs is that the load can be distributed over several servers, and the partitions are searched in parallel. Each table has a constraint similar to one boundary point in the partition function. The individual tables are combined in a view that unions the tables together. This makes the use of the partitioned view transparent to a user. Select statements issue parallel queries to the relevant partitions. Data inserted through the view will be inserted into the table on the correct server.

The databases do not need to contain all tables, only the tables you want to divide into partitions. By using all but one database for historical data, you can find some operational benefits. If your partitioning is by time, you can reduce the backup time by marking the historical databases read-only; then, you only need to back up a much smaller current database.

You should consider PVs and DPVs only when you have extremely large databases, and queries that usually require data from only a few partitions. PVs, which by definition do not involve the network, are less resource intensive than DPVs. DPVs allow you to scale out over several servers, but the cost of initiating a query to another server across the network is relatively high compared to a PV or local query, so you should measure the performance of a DPV to ensure you are receiving a benefit from this design.

Scaling Out Using Analysis Services Database Synchronization

You can deploy the same database on several Analysis Services servers using Database Synchronization. This increases the number of users that you can handle. In this scenario, the Analysis Services servers have identical but separate databases. The partitions exist on locally managed storage (which could be a storage area network [SAN] drive). Each server has its own IP address and service name, but also has a virtual IP and virtual service name provided by the load-balancing service. The load-balancing service is either Windows NLB or a hardware device such as an F5 switch. Figure 11-11 shows a typical configuration.

Figure 11-11. Analysis Services database synchronization

You can initiate synchronization from SQL Server Management Studio by right-clicking the Databases node of the target server (not the source server). You then specify a source Analysis Services server and database you want to synchronize with. Using the same wizard, you can create a script to enable you to schedule periodic synchronizations.

During synchronization, both the source and the target can continue to be queried by users. When synchronization finishes, Analysis Services switches users to the new data, and drops the old data from the destination database.

Practical Business Intelligence with SQL Server 2005
Practical Business Intelligence with SQL Server 2005
ISBN: 0321356985
EAN: 2147483647
Year: 2007
Pages: 132 © 2008-2017.
If you may any questions please contact us: