Update Frequency and Persistence


The currency of available data in a data warehouse depends greatly on the frequency of updates from the source data systems. However, updating more often is not necessarily better in all cases. Experience shows that some operational data is in balance only at the close of certain reconciliation cycles. Importing new source records in the middle of a sales reporting cycle can be misleading because the partial results will skew analysis. Time and expenses, for example, are often reported at the end of a two-week payroll period, and slicing that data into more granular intervals would only produce erroneous results.

In some cases, it makes sense to import operational data frequently. Near-real-time systems use tight data extraction cycles that populate the data warehouse on an ongoing basis. One caveat in this environment is that users will see slight fluctuations as they view analytic dashboards and reports. Even in the time it takes to run two or three different reports, the data can change enough to create an out-of-balance situation between the reports. This phenomenon is often referred to as twinkling data, analogous to trying to count busy ants in an anthill. Although this situation can be difficult to manage, having up-to-the-hour analytical data can be an important business requirement in your environment. Just be prepared for quite a bit more design work and administrative overhead.
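One common design response to twinkling data is to stamp each load with a batch identifier and have all reports in a session filter on the same completed batch. The following T-SQL is a minimal sketch of that idea; the table and column names (etl_batch, FactSales, BatchId) are illustrative assumptions, not objects from this book's sample databases.

```sql
-- Pin every report in a session to the latest *completed* load batch,
-- so ongoing near-real-time loads cannot put two reports out of balance.
DECLARE @BatchId int;

SELECT @BatchId = MAX(BatchId)
FROM etl_batch
WHERE Status = 'Complete';        -- ignore loads still in flight

-- Each report reuses the same @BatchId for a consistent view of the data.
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM FactSales
WHERE BatchId <= @BatchId
GROUP BY ProductKey;
```

Reports run minutes apart will then agree with one another, at the cost of showing data only as of the last finished load.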

Finally, the opportunity to capture some data from operational sources might be limited because details are purged from the source system. You cannot get three years' worth of daily data if your operational system retains only a rolling 13 months of detail.

Historical Data

On the other side of the data currency equation is how to manage older data. At some point, most data should be retired or archived to prevent excessive or unnecessary data volumes in the data warehouse. In some industries, companies are required to keep records on hand for a period of time. For others, the business requires years of historical data to spot trends and learn from historical patterns. In many business environments, however, two-year-old data is of little value for most purposes and an unnecessary storage burden. Your bank or credit card company isn't likely to provide a two-year-old statement of account activity without digging it out of a vault somewhere (and then only with a court-ordered subpoena).

The challenges associated with keeping both very current data and very old data are not insurmountable. In fact, many improvements were made in the SQL Server 2005 suite to help with these requirements. Federating and partitioning data are the keys to maintaining optimal performance as data volumes change and grow. SQL Server 2005 Analysis Services, Enterprise Edition, allows a cube to consist of multiple partitions, with portions of the cube stored on separate physical storage devices. Separate physical disks or disk arrays can be managed independently and will improve data throughput. Separate cube partitions do not require separate fact tables in the relational data warehouse, but using partitioned fact tables can also improve manageability and performance under the right conditions.
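On the relational side, SQL Server 2005 introduced table partitioning, which supports exactly this kind of rolling-window management. The sketch below shows the basic pattern; the object names (pfSalesDate, psSalesDate, FactSales) and the boundary dates are illustrative assumptions.

```sql
-- Partition a fact table by year so aging data can be managed
-- (and eventually switched out) one partition at a time.
CREATE PARTITION FUNCTION pfSalesDate (datetime)
AS RANGE RIGHT FOR VALUES ('2004-01-01', '2005-01-01', '2006-01-01');

-- Map all partitions to one filegroup here; in practice each partition
-- could go to a separate filegroup on its own disk array.
CREATE PARTITION SCHEME psSalesDate
AS PARTITION pfSalesDate ALL TO ([PRIMARY]);

CREATE TABLE FactSales (
    OrderDate   datetime NOT NULL,
    ProductKey  int      NOT NULL,
    SalesAmount money    NOT NULL
) ON psSalesDate (OrderDate);
```

Because each year lives in its own partition, retiring the oldest year becomes a fast metadata operation rather than a mass DELETE.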

There are technical and practical considerations when deciding how much data to keep. The level of detail the business users require has direct implications for storage and query performance. The goal is to strike a balance between business requirements and technical constraints. You could, for example, design a solution that maintains summary data in Analysis Services for fast exploration while keeping detail-level data in a SQL Server 2005 relational database for targeted queries.




Microsoft SQL Server 2005 Integration Services Step by Step
ISBN: 0735624054
Year: 2007
Pages: 152
