Here we provide some additional avenues to consider for improving your solution after you have implemented the basic elements of this chapter.
Automating the Process
In this chapter, we described a scenario where the data is constantly arriving, or at least arriving very frequently in batches. Our purpose was to describe ways you can efficiently process high volumes of newly arriving data and query very large databases. We want to leave you with some ideas on how to detect when new data has arrived. In Chapter 4, we discussed how you can browse a directory for files and pass a filename into a Data Flow task. Here are some other avenues you can explore:
As noted in the introduction, you can also use partitioned views (PVs) to partition your data. A partitioned view presents a unified view of a table whose rows are distributed over multiple databases, and it lets you manage each partition of the data independently. If the member databases are on other servers, you have a distributed partitioned view (DPV). The advantage of DPVs is that the load can be distributed over several servers, and the partitions are searched in parallel. Each member table has a CHECK constraint on the partitioning column that plays the same role as a boundary point in a partition function, and the individual tables are combined with UNION ALL into a single view. This makes the use of the partitioned view transparent to users: SELECT statements issue parallel queries to only the relevant partitions, and data inserted through the view is routed to the table on the correct server.
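The structure described above can be sketched in T-SQL as follows. The table, database, and linked-server names here are hypothetical, chosen only to illustrate the pattern; note that for the view to be updatable, the partitioning column must be part of the primary key and the CHECK constraints must define non-overlapping ranges:

```sql
-- On Server1, in database Sales2007 (hypothetical names):
CREATE TABLE dbo.Orders_2007 (
    OrderID   INT      NOT NULL,
    OrderDate DATETIME NOT NULL
        CHECK (OrderDate >= '20070101' AND OrderDate < '20080101'),
    Amount    MONEY    NOT NULL,
    CONSTRAINT PK_Orders_2007 PRIMARY KEY (OrderID, OrderDate)
);

-- On Server2, in database Sales2008, an identically structured
-- table dbo.Orders_2008 covers the next, non-overlapping range:
--   CHECK (OrderDate >= '20080101' AND OrderDate < '20090101')

-- The distributed partitioned view, defined on each participating
-- server, assuming linked servers named Server1 and Server2:
CREATE VIEW dbo.Orders AS
    SELECT OrderID, OrderDate, Amount
    FROM Server1.Sales2007.dbo.Orders_2007
    UNION ALL
    SELECT OrderID, OrderDate, Amount
    FROM Server2.Sales2008.dbo.Orders_2008;
```

A query such as `SELECT * FROM dbo.Orders WHERE OrderDate >= '20080301'` can then be satisfied entirely from the 2008 member table, because the CHECK constraints tell the optimizer which partitions cannot contain qualifying rows.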
The databases do not need to contain all tables, only the tables you want to divide into partitions. If you dedicate all but one database to historical data, you gain some operational benefits. When your partitioning is by time, you can reduce backup time by marking the historical databases read-only; then you only need to back up the much smaller current database.
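A minimal sketch of this backup strategy, using the hypothetical database names from the earlier example, might look like this:

```sql
-- Mark the historical database read-only; its data can no longer
-- change, so one final backup of it is sufficient.
ALTER DATABASE Sales2007 SET READ_ONLY;

-- Take that one last full backup of the historical database:
BACKUP DATABASE Sales2007
    TO DISK = N'E:\Backups\Sales2007_final.bak';

-- From now on, only the small current database needs to be
-- backed up on a regular schedule:
BACKUP DATABASE SalesCurrent
    TO DISK = N'E:\Backups\SalesCurrent.bak';
```

The file paths and schedule are illustrative; the point is that the read-only historical databases drop out of the routine backup window entirely.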
You should consider PVs and DPVs only when you have extremely large databases, and queries that usually require data from only a few partitions. PVs, which by definition do not involve the network, are less resource intensive than DPVs. DPVs allow you to scale out over several servers, but the cost of initiating a query to another server across the network is relatively high compared to a PV or local query, so you should measure the performance of a DPV to ensure you are receiving a benefit from this design.
Scaling Out Using Analysis Services Database Synchronization
You can deploy the same database on several Analysis Services servers using Database Synchronization. This increases the number of users that you can handle. In this scenario, the Analysis Services servers have identical but separate databases. The partitions exist on locally managed storage (which could be a storage area network [SAN] drive). Each server has its own IP address and service name, but also has a virtual IP and virtual service name provided by the load-balancing service. The load-balancing service is either Windows NLB or a hardware device such as an F5 switch. Figure 11-11 shows a typical configuration.
Figure 11-11. Analysis Services database synchronization
You can initiate synchronization from SQL Server Management Studio by right-clicking the Databases node of the target server (not the source server). You then specify a source Analysis Services server and database you want to synchronize with. Using the same wizard, you can create a script to enable you to schedule periodic synchronizations.
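The script the wizard produces is an XMLA Synchronize command, which you can schedule with a SQL Server Agent job (using an Analysis Services Command job step) for periodic synchronization. A sketch of such a script, with placeholder server and database names, looks like this:

```xml
<Synchronize xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Source>
    <!-- SourceServer and SalesDB are placeholders for your
         source Analysis Services instance and database -->
    <ConnectionString>Provider=MSOLAP;Data Source=SourceServer;Integrated Security=SSPI</ConnectionString>
    <Object>
      <DatabaseID>SalesDB</DatabaseID>
    </Object>
  </Source>
  <SynchronizeSecurity>SkipMembership</SynchronizeSecurity>
  <ApplyCompression>true</ApplyCompression>
</Synchronize>
```

The command is executed against the target server, which pulls changes from the source named in the connection string.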
During synchronization, both the source and the target can continue to be queried by users. When synchronization finishes, Analysis Services switches users to the new data, and drops the old data from the destination database.