7.6 Closing Remarks

Data profiling is described in this book as a generic technology. Any specific combination of software and process that supports it will be more or less complete for each step. For example, in value analysis you could endlessly invent new analytical techniques to micro-define what is acceptable. Similarly, you can invent rules for business objects almost without limit.
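To make the point concrete, a value-analysis rule is essentially a predicate over a column's values, and rules can be refined indefinitely. The sketch below is illustrative only; the column, rule names, and thresholds are hypothetical, not taken from the book.

```python
# Hypothetical value-analysis rules for one column of a table.
# Each rule is a predicate applied to every value; the names and
# thresholds here are illustrative assumptions, not from the book.

def profile_column(values, rules):
    """Count how many values violate each named rule."""
    violations = {}
    for name, predicate in rules.items():
        violations[name] = sum(1 for v in values if not predicate(v))
    return violations

# Rules can be "micro-defined" ever more finely:
age_rules = {
    "not_null": lambda v: v is not None,
    "numeric":  lambda v: isinstance(v, int),
    "in_range": lambda v: isinstance(v, int) and 0 <= v <= 120,
}

ages = [34, 17, None, 250, "n/a", 45]
print(profile_column(ages, age_rules))
# {'not_null': 1, 'numeric': 2, 'in_range': 3}
```

Each added rule catches a narrower class of problems, which is exactly why the process can continue endlessly and must be cut off at the point of diminishing returns.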

start sidebar

Data profiling is emerging as a unique and independent technology. However, analysts have performed some form of data profiling on projects for years. Every project has a phase of collecting information about data sources, mapping to targets, identifying issues, and crafting remedies.

The difference is that analysts lacked a set of analytical tools designed specifically for the task of data profiling. They used ad hoc queries to test data. As a result, they generally did not have the time or resources to perform rigorous data profiling, and they shortchanged the part about looking at the data. This meant that they tended to accept the gathered descriptions of data even though those descriptions could not be trusted. The result has been a high rate of project failure or significant overruns.

One very experienced analyst once told me that he had been doing projects for 20 years and at the beginning of each project he promised himself that he would do it right that time. Doing it right meant thoroughly looking at the data to verify all gathered facts about it, to uncover undocumented issues, and to discover the true state of data quality. In every case, he ended up stopping his investigation early in the project because of the enormous time and resources required to complete a comprehensive examination. And in every case the project suffered later because of information about the data that had been missed.

The emergence of a discrete methodology backed by software explicitly crafted for these tasks has greatly reduced the time and effort required to perform thorough data profiling. This is enabling data quality staff to use this approach effectively.

end sidebar

It is easy to fall into "analysis paralysis" when performing data profiling: trying to micro-define correctness to the ultimate level and then burning up machines for days trying to validate those definitions. At some point the process yields too little to be worth the effort. Practitioners need to find the right balance to get the most value from the work being performed.

Although overanalyzing data is a risk, it is rarely what happens in practice. The most common failing is not performing enough analysis. Too often the desire to get results quickly drives the team through the process with too few rules defined and too little thinking about the data.

Used effectively, data profiling can be a core competency technology that will significantly improve data quality assessment findings, shorten the implementation cycles of major projects by months, and improve the understanding of data for end users. It is not the only technology that can be used. However, it is probably the single most effective one for improving the accuracy of data in our corporate databases.



Data Quality: The Accuracy Dimension (The Morgan Kaufmann Series in Data Management Systems)
ISBN: 1558608915
Year: 2003
Pages: 133
Author: Jack E. Olson
