As information systems become more of the fabric of organizations, they also grow more complex. The quality of data within them has not improved over the
There are many reasons data quality is low and getting lower. This will not change until corporations adopt stringent data quality assurance initiatives. With proper attention, great returns can be realized through improvements in the quality of data.
The primary value to a corporation of getting its information systems into a state of high data quality, and keeping them there, is that it gains the ability to quickly and
Data quality assurance initiatives are becoming more popular as organizations are
Data accuracy is the foundation of data quality. You must get the values right first. The remainder of this book focuses on data accuracy: what it means, what is possible,
To begin the discussion of data accuracy, it is important to first establish where accuracy fits into the larger picture of data quality.
Data quality is defined as
A few examples help clarify the notion of data quality in the context of intended use. The sections that follow explore examples of the previously mentioned aspects of data quality.
Consider a database that contains
If this database is to be used by the state of Texas to notify physicians of a new law regarding assisted suicide, it would
If this database were to be used by a new surgical device manufacturer to find potential customers, it would be considered high quality. Any such firm would be
Consider a database containing sales information for a division of a company. This database contains three
If this database is to be used to compute sales bonuses that are due on the 15th of the following month, it is of poor data quality even though the data in it is always eventually accurate. The data is not timely enough for the intended use.
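The timeliness point can be made concrete with a small check. The sketch below assumes each sales record carries a hypothetical `entered_on` date (when the record reached the database) alongside its `sale_date`, and asks what share of a month's sales value had been recorded by a bonus cutoff on the 15th of the following month. The record layout, function names, and sample data are illustrative assumptions, not taken from any real system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SaleRecord:
    # Hypothetical record layout: when the sale happened, when it was entered.
    sale_date: date
    entered_on: date
    amount: float


def bonus_cutoff(year: int, month: int) -> date:
    """Return the 15th of the month following (year, month)."""
    if month == 12:
        return date(year + 1, 1, 15)
    return date(year, month + 1, 15)


def share_recorded_by_cutoff(records, year: int, month: int) -> float:
    """Fraction of one month's sales value present in the database by the cutoff."""
    cutoff = bonus_cutoff(year, month)
    in_month = [r for r in records
                if (r.sale_date.year, r.sale_date.month) == (year, month)]
    total = sum(r.amount for r in in_month)
    if total == 0:
        return 1.0  # nothing sold that month, so nothing can be late
    on_time = sum(r.amount for r in in_month if r.entered_on <= cutoff)
    return on_time / total


# Hypothetical March sales; every record eventually arrives, but one arrives late.
records = [
    SaleRecord(date(2024, 3, 5), date(2024, 3, 6), 1000.0),
    SaleRecord(date(2024, 3, 20), date(2024, 4, 10), 500.0),
    SaleRecord(date(2024, 3, 28), date(2024, 4, 30), 500.0),  # after April 15
    SaleRecord(date(2024, 4, 2), date(2024, 4, 2), 999.0),    # April sale, ignored
]

share = share_recorded_by_cutoff(records, 2024, 3)  # 1500 / 2000 = 0.75
```

Every record in the sample is eventually correct, yet only 75% of March's sales value is present on April 15, so bonuses computed on that date would be wrong. The same records would serve a trend analysis run months later without any problem.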
However, if this database is to be used for historical trend analysis and to make decisions on altering
Consider an inventory database that contains part numbers, warehouse locations, quantity on hand, and other information. It does not, however, contain source information (where the parts came from). If a part is supplied by multiple suppliers, once the parts are received and put on the shelf, there is no indication of which supplier the
If a supplier
A database contains information on repairs done to capital equipment. However, it is a known fact that sometimes the
This database is probably a good-quality database for assessing the general health of capital equipment. Equipment that required a great deal of expense to maintain can be identified from the data. Unless the missing data is disproportionately skewed, the records are usable for all ordinary decisions.
However, using it as the basis for evaluating information makes it a low-quality database. The missing transactions could easily tag an important piece of equipment as
Consider a database containing orders from customers. One practice for handling complaints and returns is to create an "adjustment" order that backs out the original order and then to write a new order with the corrected information if
For the accounting department, this is a
A new application is deployed to determine the amount and timing of parts orders for machinery, based on past history and on the time in service since the last replacement for the machines the parts are used in. The original version of the application had a programming error that ordered 10 times the amount actually required. The error went undetected until a large order was sent, and the incident drew a great deal of publicity. The programming error was fixed, and the problem has not recurred.
The database was never wrong; the application was. The large order was actually placed and the database reflected the order as such.
Fearing a repeat of the incident, the maintenance chief has
Unless his confidence in the original application is restored, the database is of poor quality, even though it is entirely accurate. It is not serving its intended use due to a lack of believability.