9.7 Closing Remarks

Structure analysis can uncover rules in the data that are crucial for correct extraction and mapping to other data structures. Overlooking this step in the data profiling process can lead to difficulties in moving data. If denormalized data is moved and remains denormalized in the target, it can lead to inaccurate results when aggregating the data for decision support reports.
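The aggregation problem can be illustrated with a minimal sketch. The table layout, the column names (order_id, item, order_total), and both function names are hypothetical and chosen only for illustration: an order-level total has been copied onto every line-item row, so a report that sums the column directly counts some orders more than once.

```python
# Hypothetical denormalized rows: order_total belongs to the order,
# but it has been repeated on every line item of that order.
rows = [
    {"order_id": 1, "item": "A", "order_total": 100},
    {"order_id": 1, "item": "B", "order_total": 100},
    {"order_id": 2, "item": "C", "order_total": 50},
]


def naive_total(rows):
    """Sum the column as if each row were an independent fact."""
    return sum(r["order_total"] for r in rows)


def normalized_total(rows):
    """Collapse to one total per order before summing."""
    totals = {}
    for r in rows:
        totals[r["order_id"]] = r["order_total"]
    return sum(totals.values())


print(naive_total(rows))       # order 1 is counted twice
print(normalized_total(rows))  # each order is counted once
```

The second function is correct only if the repeated values agree within each order, which is itself a structure rule that profiling would need to confirm.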

Structure rules can be violated in source systems without detection due to inaccurate or incomplete data. If they are not uncovered before using or moving the data, they can result in failures on loading data or in subsequent attempts to use the data.
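As a rough illustration of checking one such rule before moving data, the sketch below assumes the rule is a functional dependency: one column is expected to determine another. The function name fd_violations and the sample columns (product_id, product_name) are hypothetical, not taken from the text.

```python
from collections import defaultdict


def fd_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value.

    An empty result means the sample is consistent with the rule
    "determinant -> dependent"; it does not prove the rule holds in general.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical sample: product_id is expected to determine product_name.
sample = [
    {"product_id": 7, "product_name": "Widget"},
    {"product_id": 7, "product_name": "Widgit"},  # inaccurate value
    {"product_id": 8, "product_name": "Gadget"},
]

violations = fd_violations(sample, "product_id", "product_name")
print(violations)
```

A violation reported here may mean either that the rule is not real or that the data is inaccurate; deciding which is part of the analysis.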

It is important to complete the structure analysis and document the true structure of the data before proceeding to the next step of data profiling: business object data rule analysis. That step is described in the next chapter. Some of the data rules depend on a correct understanding of the structure as it relates to the association of columns to each other in defining a business object.

start sidebar

Relational systems often contain structural issues, even though the relational DBMS provides a great deal of support for structural enforcement. Denormalized tables are common in relational applications. In the 1980s, relational systems were often criticized for slow performance, and data designers were encouraged to denormalize tables so that JOINs could be avoided. Duplicating data, which is just another form of denormalization, was also commonly used to avoid performance problems. Most of the time, the denormalization was not documented. Relational directories have no mechanism for recording that denormalization exists within a table or which columns are involved in it.

Another common shortcoming of relational implementations is a conscious choice not to employ referential constraints on primary/foreign key pairs. This is often done to alleviate difficulties in loading data or to reduce processing requirements for high-speed transactions. You may have an entity-relationship diagram that shows the RI constraints without having them implemented in the DBMS. When the constraints are left unenforced, you can bet there are orphan rows in the database.

end sidebar
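The orphan rows mentioned in the sidebar can be looked for directly when the DBMS does not enforce the constraint. The following is a minimal sketch using Python's built-in sqlite3 module; the customers and orders tables, their columns, and the function name find_orphans are all hypothetical and stand in for any primary/foreign key pair.

```python
import sqlite3


def find_orphans(conn):
    """Return (order_id, customer_id) pairs whose customer does not exist."""
    return conn.execute(
        """
        SELECT o.order_id, o.customer_id
        FROM orders AS o
        LEFT JOIN customers AS c ON c.customer_id = o.customer_id
        WHERE c.customer_id IS NULL
          AND o.customer_id IS NOT NULL
        ORDER BY o.order_id
        """
    ).fetchall()


# Hypothetical schema with no FOREIGN KEY clause, so nothing is enforced.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3);
    """
)

orphans = find_orphans(conn)
print(orphans)  # order 12 refers to a customer that is not in the table
```

Rows with a null foreign key are excluded here on the assumption that a missing parent reference is allowed; if it is not, those rows are a separate violation worth reporting.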



Data Quality: The Accuracy Dimension (The Morgan Kaufmann Series in Data Management Systems)
ISBN: 1558608915
Year: 2003
Pages: 133
Authors: Jack E. Olson
