Chapter 7: Data Profiling Overview

Overview

This chapter begins the examination of the most important technology available to the data quality assurance team: data profiling.


Note to the reader: For consistency, this text uses the terms column and table throughout the data profiling chapters. Data profiling is applied to data from a wide variety of sources, and those sources use different terminology for the same constructs. Consider table the equivalent of file, entity, relation, or segment, and column the equivalent of data element, attribute, or field.

The text uses the term data profiling repository to mean a place to record all of the information used in and derived from the data profiling process. Much of this information is metadata; however, to avoid confusing the reader, I do not refer to it as a metadata repository. A user could store this information in an existing metadata repository, provided it is robust enough to hold all of the types of information involved. Otherwise, they could use the repository provided by a data profiling software vendor or build their own. It is not unreasonable to expect that much of this information would subsequently be moved to an enterprise metadata repository after data profiling is complete.

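The data profiling repository described in the note can be pictured as a simple keyed store of tables and columns. The following is a minimal, hypothetical sketch in Python, not a design taken from this book or from any vendor product. The class names ProfilingRepository, TableRecord, and ColumnRecord are invented for illustration, as is the split between "documented" information (used in profiling, such as existing metadata) and "discovered" information (derived from profiling). The export method serializes the contents to JSON, suggesting one way they might later be handed to an enterprise metadata repository.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ColumnRecord:
    """Hypothetical per-column entry in a data profiling repository."""
    name: str
    documented: dict = field(default_factory=dict)  # information used in profiling (e.g. existing metadata)
    discovered: dict = field(default_factory=dict)  # information derived from profiling


@dataclass
class TableRecord:
    """Hypothetical per-table entry; holds its columns keyed by name."""
    name: str
    columns: dict = field(default_factory=dict)  # column name -> ColumnRecord

    def column(self, name: str) -> ColumnRecord:
        # Return the existing entry, creating it on first use
        return self.columns.setdefault(name, ColumnRecord(name))


class ProfilingRepository:
    """Minimal in-memory stand-in for a data profiling repository (assumption)."""

    def __init__(self) -> None:
        self.tables: dict = {}  # table name -> TableRecord

    def table(self, name: str) -> TableRecord:
        # Return the existing entry, creating it on first use
        return self.tables.setdefault(name, TableRecord(name))

    def export(self) -> str:
        """Serialize all recorded information as JSON, e.g. for a later move
        into an enterprise metadata repository."""
        payload = {
            t.name: {
                c.name: {"documented": c.documented, "discovered": c.discovered}
                for c in t.columns.values()
            }
            for t in self.tables.values()
        }
        return json.dumps(payload, indent=2, sort_keys=True)
```

As a usage illustration with made-up names and values, `repo.table("CUSTOMER").column("BIRTH_DATE").documented["data_type"] = "DATE"` records what the existing metadata claims, while a matching entry in `discovered` would record what profiling actually found. Keeping the two side by side is what lets discrepancies surface later.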



Data Quality: The Accuracy Dimension (The Morgan Kaufmann Series in Data Management Systems)
ISBN: 1558608915
Year: 2003
Pages: 133
Authors: Jack E. Olson
