Next Steps

We have focused on dimension issues, which are a common cause of low-quality data. There are situations where the measures are of poor quality. Here are some tips that you might use to solve some of these problems.

Out of Range Values

Data quality might be affected by out-of-range values, but you need to have clear business rules regarding the ranges. You would use a Conditional Split transform to direct a fact row to the normal data flow or to a flow to handle the out-of-range condition. For an example of how to implement this, look at how the customer merge was handled, where we used a Conditional Split transform to decide whether a sufficient match existed between two customer records.

Unexpected Values

You might not necessarily know the true range for each measure, or the range might depend on some complex combinations of the values of some of the dimensions. Often the rules are far too complex to be explicitly stated. In these cases, using a data mining query to assess the probability of a value being correct based on known occurrences of good values and related dimension members is recommended. Because this only provides you with a probability, you will want to route low-probability rows through a manual process for review. The threshold of "low probability" should in most cases be fairly high to minimize false positives. Chapter 10, "Data Mining," provides an introduction to data mining techniques.

Null and Zero

A null measure is considered empty by Analysis Services. Depending on the settings of your cube browser, empty cells might not display. Zero values, on the other hand, are not empty and will display. You should consider whether to convert null measures to zero. You can use a Derived Column transform to check for and convert nulls to zeros.

Practical Business Intelligence with SQL Server 2005
Practical Business Intelligence with SQL Server 2005
ISBN: 0321356985
EAN: 2147483647
Year: 2007
Pages: 132 © 2008-2017.
If you may any questions please contact us: