Summary


It is almost a given that your source data will have errors. Your goal is to get as much useful data into the data warehouse without introducing incorrect data. If a fact isn't providing a business key for a dimension, the data is still useful in other dimensions, just not in that one missing dimension. You should make use of the error output of most of the data flow transforms to handle imperfect data. You can deal with imperfect data by rejecting it, creating an inferred member if you have a business key without a matching dimension member, or you can simply assign the fact to an unknown or miscellaneous category in the dimension. Fuzzy matching can help increase the number of facts successfully matched to known dimension members.

You need to track the quality of your data. Facts that you reject should be written to a reject table for later handling. Even if you can't fix them, you can gain an understanding of what needs to be fixed in the source systems. You can use a simple ratio of rows input from the source to rows output to the data warehouse to gauge the quality of your data. Place rejected and auditing data in a database that is separate from both the staging and data warehouse databases.



Practical Business Intelligence with SQL Server 2005
Practical Business Intelligence with SQL Server 2005
ISBN: 0321356985
EAN: 2147483647
Year: 2007
Pages: 132

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net