12.4 Remedies for Value Rule Violations

Issues uncovered by value tests follow the same path as issues from other data profiling activities when data inaccuracies turn out to be the cause of the output deviations. Value tests also lend themselves to an additional remedy: adding the tests to the operational environment so that data stewards can monitor the data over time.

Such monitoring is useful for catching changes that degrade the quality of data, for capturing metrics on the impact of improvements made, and for catching one-time problems such as the loss of a batch of data or an attempt to push data to a summary store before all detail data has been collected.

Transaction Checkers

Most value tests do not apply to executing transactions because they deal with values over a group of data. However, it is possible to monitor transactions continuously by caching the last n transactions and then periodically, such as every minute or every 10 minutes, executing the value test against the cached set. The cache is designed to evict the oldest transaction each time a new one is entered; in other words, it is a circular cache.

Although this does little to validate a single transaction, it can catch hot spots where data accuracy is taking a rapid turn for the worse. Profiles of value distributions are particularly appropriate for this type of monitoring. Most other value tests are not suited to it.
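The circular cache and periodic distribution check described above can be sketched as follows. This is a minimal illustration, not a prescribed design: the class name `TransactionWindow`, the `expected_shares` mapping, and the `tolerance` threshold are all hypothetical, and a real checker would call `check()` from a scheduler at the chosen interval.

```python
from collections import Counter, deque


class TransactionWindow:
    """Hypothetical checker: keeps the last n values and tests their distribution."""

    def __init__(self, size, expected_shares, tolerance):
        # A deque with maxlen silently drops the oldest item on append,
        # which gives the circular-cache behavior.
        self.window = deque(maxlen=size)
        self.expected_shares = expected_shares  # assumed profile, e.g. {"A": 0.5, "B": 0.5}
        self.tolerance = tolerance              # allowed absolute drift per value

    def record(self, value):
        """Add the value taken from one new transaction."""
        self.window.append(value)

    def check(self):
        """Return {value: (expected_share, observed_share)} for drifting values."""
        if not self.window:
            return {}
        counts = Counter(self.window)
        total = len(self.window)
        drifted = {}
        # Look at values seen in the window and values the profile expects.
        for value in set(counts) | set(self.expected_shares):
            observed = counts.get(value, 0) / total
            expected = self.expected_shares.get(value, 0.0)
            if abs(observed - expected) > self.tolerance:
                drifted[value] = (expected, observed)
        return drifted
```

An empty result from `check()` means the cached window still matches the assumed profile; a non-empty result points the data steward at the values whose share is shifting.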

Periodic Checkers

Value tests are particularly suited for execution on a periodic basis: daily, weekly, monthly, or quarterly. They are also useful when performing extractions for moving data to decision support stores. In fact, every extraction should include some tests on the data to ensure that it is a reasonable set of data to push forward. It is much more difficult to back out data after the fact than to catch errors before they are loaded.
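One way to picture testing an extraction before it is pushed forward is a small gate that runs value tests over the extracted rows and loads them only if every result falls within its expected range. The sketch below is an assumption-laden illustration: the function names, the `(name, compute, low, high)` test tuple, and the `load` callback are hypothetical and do not refer to any particular tool.

```python
def run_value_tests(rows, tests):
    """Run each (name, compute, low, high) test; return the failures."""
    failures = []
    for name, compute, low, high in tests:
        result = compute(rows)
        # A test passes when the computed value lies in the expected range.
        if not (low <= result <= high):
            failures.append((name, result, low, high))
    return failures


def load_if_reasonable(rows, tests, load):
    """Call load(rows) only when every value test passes."""
    failures = run_value_tests(rows, tests)
    if failures:
        # Hold the batch back; nothing has to be backed out later.
        return failures
    load(rows)
    return []


# Hypothetical tests for an extract of order rows.
example_tests = [
    ("row_count", len, 2, 5),
    ("total_amount", lambda rows: sum(r["amount"] for r in rows), 0, 100),
]
```

The gate returns the list of failed tests rather than raising, so the caller can log the results and decide whether to hold or release the batch.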

Value tests are also very useful to execute on batches of data imported into your corporation from external sources. For example, you may be getting data feeds from marketing companies, from divisions of your own company, and so on. It makes sense to apply some basic tests to such data as part of the acceptance process.

Periodic checking implies that the steward of the data must also be able to revise the expectations periodically. Each value rule can have a data profiling repository entry that includes documentation for the test, expectations, and result sets. This facilitates comparing results from one period to another and tracking the changes to expectations.
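A repository entry of this kind might be modeled roughly as below. This is only a sketch under stated assumptions: the class and field names are hypothetical, expectations are simplified to a numeric low/high range with an effective date, and each result set is reduced to a single value per period.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Expectation:
    effective_from: date
    low: float
    high: float


@dataclass
class ValueRuleEntry:
    """Hypothetical repository entry for one value rule."""
    name: str
    documentation: str
    expectations: list = field(default_factory=list)  # history of Expectation
    results: list = field(default_factory=list)       # (period_end, value, passed)

    def set_expectation(self, effective_from, low, high):
        # Earlier expectations are kept so that changes can be tracked.
        self.expectations.append(Expectation(effective_from, low, high))
        self.expectations.sort(key=lambda e: e.effective_from)

    def expectation_for(self, day):
        """Return the latest expectation in effect on the given day, if any."""
        current = None
        for e in self.expectations:
            if e.effective_from <= day:
                current = e
        return current

    def record_result(self, period_end, value):
        """Store a period's result and whether it met the expectation then in effect."""
        exp = self.expectation_for(period_end)
        passed = exp is not None and exp.low <= value <= exp.high
        self.results.append((period_end, value, passed))
        return passed

    def change_from_previous(self):
        """Difference between the two most recently recorded results."""
        if len(self.results) < 2:
            return None
        return self.results[-1][1] - self.results[-2][1]
```

Keeping the expectation history alongside the results lets a steward see both how the measured value moved between periods and when the acceptable range itself was revised.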

start sidebar

One of the benefits a data quality assurance group can provide to the business side is to help them formulate a set of quick tests that can be applied to a collection of data to determine its relative health. A concentration on how inaccurate data can distort computations is helpful in building such a suite.

Such a suite is fairly easy to build, and executing it is generally nondisruptive. It can give the data quality assurance group valuable visibility and return value quite early in the process of reviewing older systems.

end sidebar



Data Quality: The Accuracy Dimension (The Morgan Kaufmann Series in Data Management Systems)
ISBN: 1558608915
Year: 2003
Pages: 133
Authors: Jack E. Olson
