The costs of pursuing a data quality program start with the costs of creating and maintaining a central organization for data quality assurance. Whether this team is permanent or assembled temporarily for each initiative, it needs the usual resources to accomplish its mission: staff, training, and supporting software tools.
The quality team will carve out activities such as a data quality assessment for a specific application. In pursuing this, additional costs are incurred by people outside the data quality team. They will be asked to provide access to data and metadata, answer questions, and review findings.
The real costs come when the quality team has identified issues and remedies must be implemented.
Another cost to consider is the cost of disruption to operational systems. Fixing data quality problems often requires changing application programs or business processes. The testing and deployment of these changes can be very disruptive to critical transaction systems. Downtime of even one such system can be extremely expensive.
It is helpful to separate the front-end project of assessing the quality of the data from the back end of implementing and monitoring remedies. The front end takes you through the remedy design phase of issues management.
Figure 6.1 shows the shape of the business case when this is done. An assessment project looks at data, develops facts, converts them to issues, assesses impacts, and designs remedies. One or more back-end projects are spawned to implement the remedies. They can be short-term, narrowly scoped fixes or long-running renovation efforts.
Figure 6.1: General model of business case.
A data quality project can escalate into a cost of several million dollars. For example, an assessment project discovers many facts about inaccurate data. Finding them costs very little. When the team reviews these facts with business analysts, they discover that many decision-making systems that feed off that data are corrupted, creating a serious potential problem. They also discover that the bad data is currently costing them a bundle.
Moving to the remedy phase, they work with the IT department staff to find solutions. The problem is multifaceted, with some of the blame going to improper data architecture. The application is old, using an obsolete data management technology.
A decision is made to scrap the existing application and database implementation and reengineer the entire application. Instead of building a new system, they decide to purchase a packaged application. This project escalates into a major migration of data and logic.
In parallel, the team determines that reengineering the entire data capture process is also in order. Bringing the application into the modern age entails defining Internet access to data as well as an Internet data capture process. The jobs and responsibilities of many people change.
What started out as a small project mushroomed into a major initiative for the corporation. There were probably many other business drivers that contributed to these decisions; the data quality assessment merely set them in motion.
The assessment project is all cost and no value. This is because value is returned only if changes are made. The business case for doing the assessment project rests on the hard evidence of value to be recouped from problems caused by that source of data and on speculation about recoupable value not yet visible. Its business case is therefore largely speculative.
The implementation projects achieve the potential value. The business case for the implementation projects is determined by the assessment project. If done properly, the assessment will include a thorough analysis of the potential value to be recouped. It will also make it much easier to estimate the costs of the project.
The front-end assessment project is not only the basis for the business case for the back-end implementation projects but also an indicator of what can be accomplished through more assessment projects. As the company becomes more successful with these projects, it becomes easier and easier to support new assessment projects. The goal of a data quality assurance program should be a steady stream of such projects, each building on the credibility earned by the last.
The model of front-end versus back-end projects described previously is instructive for comparing the two project activity types: stand-alone assessments versus services to projects. The stand-alone assessment project starts by selecting a data source suspected of harboring enough data problems that large gains are likely from quality investigation and follow-through.
Companies have hundreds if not thousands of data sources. Those that are usually selected for assessment are the ones already known to be causing visible problems.
The quality assurance group should inventory all information systems and then rank them according to their contribution to the business, the potential for them to be the target of new business models, and their potential to create costly problems if they are not accurate. The goal should be to get the corporation's information systems to high levels of quality and maintain them there. The process should first select the databases that most need to be at a high level of data quality.
Figure 6.2 shows some of the factors that should be considered in selecting what to work on first. Each of the categories should be graded on a scale of 1 to 10, for example, and weights assigned for each factor. This would give the team a basis for relative prioritizing. The resulting chart could be presented to management to help justify the work to be done. This becomes the basis for the business case.
Figure 6.2: Project selection criteria.
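To make the mechanics concrete, the sketch below shows one way such a weighted scoring could be computed. The factor names, weights, and grades are illustrative assumptions, not values taken from the text.

```python
# Illustrative weighted scoring for ranking data sources.
# Factors, weights, and grades below are hypothetical examples.

FACTOR_WEIGHTS = {
    "business_contribution": 0.40,   # contribution to the business
    "business_change_target": 0.35,  # likelihood of being the target of a new business model
    "problem_potential": 0.25,       # potential to create costly problems if inaccurate
}

def priority_score(grades: dict[str, int]) -> float:
    """Combine 1-10 grades into a single weighted score."""
    return sum(FACTOR_WEIGHTS[f] * g for f, g in grades.items())

candidates = {
    "order_entry": {"business_contribution": 9, "business_change_target": 8, "problem_potential": 9},
    "hr_benefits": {"business_contribution": 5, "business_change_target": 3, "problem_potential": 4},
}

# Rank highest-need sources first for presentation to management.
for name, grades in sorted(candidates.items(),
                           key=lambda kv: priority_score(kv[1]), reverse=True):
    print(f"{name}: {priority_score(grades):.1f}")
```

A chart built from scores like these gives management a defensible, if approximate, ordering rather than a gut-feel list.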
Identified costs include any problems, such as returned orders or reprocessing work, that are already known and can be traced directly to defective data.
Hidden costs are the potential for costs that are not visible. Identified costs may be an indicator that more costs are present. For example, if you know that 5% of customer orders are returned, you can speculate that the number of mixed-up orders is higher and that some customers are just accepting the mistake and not going to the trouble of returning the merchandise. A chart that shows the potential for additional wrong orders and the impact they have on customer loyalty can easily suggest that the cost is much higher than the visible component.
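The arithmetic behind such a chart might look like the following sketch. The order volume, cost per bad order, and the multiplier for unreported errors are all hypothetical; the multiplier in particular is a judgment call that should be defended to management.

```python
# Hypothetical extrapolation from visible returns to total error cost.
orders_per_year = 100_000
return_rate = 0.05            # 5% of orders are returned (the visible part)
cost_per_bad_order = 150.00   # handling, shipping, rework (assumed)

# Assume that for every returned order, one more customer silently
# accepts the mistake. This multiplier is speculation, not a fact.
hidden_multiplier = 2.0

visible_cost = orders_per_year * return_rate * cost_per_bad_order
estimated_total = visible_cost * hidden_multiplier

print(f"Visible cost of returns: ${visible_cost:,.0f}")
print(f"Estimated total cost:    ${estimated_total:,.0f}")
```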
Data sources that serve the critical functions of the corporation should rank higher than those that serve peripheral activities.
The age of data source applications should be considered. Older applications are more likely to harbor inaccurate data because they were built before modern quality-promoting technology and practices were available.
Applications that were built using modern DBMS technology, with data architecture tools, or with business rules accommodated in the design should be prioritized lower than applications built on obsolete technology with little consideration given to quality factors. The characterization should consider whether quality-promoting features such as referential constraints were actually used, not just whether they were available.
The probability of a data source being the target of a business change is very important. Anticipating such changes and assessing the data before the change project begins positions the corporation to execute the change faster and with fewer surprises.
The stand-alone project is all cost with no value. The justification is usually the expectation that value-generating issues will be found. This is why hard evidence of costs already being incurred carries such large weight in these decisions. Because most of the costs are hidden at this stage, a source of lesser importance is often selected.
The cost of conducting the assessment should also be estimated. Assessment projects involve a great deal of analytical activity regarding data. The factors that will influence the cost are the volume of data, the difficulty in extracting the data for analysis, the breadth of the data (number of attributes and relations), and the quality of the known metadata. Some sources can be profiled with very little effort, and others can take weeks or months to gather the data and metadata to even start the process.
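One way to turn these factors into a rough estimate is a simple effort model like the sketch below. Every coefficient is invented for illustration; a real model would be calibrated against completed assessments.

```python
# A rough, assumption-laden effort model for sizing an assessment project.

def assessment_weeks(gigabytes: float, attributes: int,
                     extraction_difficulty: float, metadata_quality: float) -> float:
    """Estimate person-weeks of analytical effort.

    extraction_difficulty: 0.0 (trivial) to 1.0 (very hard to extract)
    metadata_quality:      0.0 (excellent) to 1.0 (poor or missing)
    """
    base = 2.0                            # setup, reporting, reviews
    volume_effort = 0.05 * gigabytes      # more data means more profiling runs
    breadth_effort = 0.02 * attributes    # each attribute and relation needs analysis
    drag = 1.0 + extraction_difficulty + metadata_quality
    return (base + volume_effort + breadth_effort) * drag

# The same source is far cheaper to assess when extraction is easy
# and the metadata is good:
print(f"{assessment_weeks(100, 200, 0.1, 0.1):.1f} weeks")   # ~13 weeks
print(f"{assessment_weeks(100, 200, 0.9, 0.9):.1f} weeks")   # ~31 weeks
```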
Generally, the cost of conducting the assessment is not a critical factor. The value potential determined by the first part of the business case will drive the approval of the project, not the cost of doing the project. The primary difficulty in getting approval of assessment-only projects is that they are all cost, without hard facts about the value that will be returned.
All active corporations execute several projects concurrently that are based on changing their business model or practices and that involve reengineering, reusing, or repurposing existing data. Examples of these types of projects are application migration from legacy to packaged applications, data warehousing construction, data mart construction, information portal construction, application consolidation due to mergers and acquisitions, application consolidation for eliminating disparate data sources, CRM projects, application integration projects joining Internet applications to legacy data sources, and on and on.
All of these projects have an existing data source or sources that are about to receive new life. These projects have the worst record of success of any class of corporate IT projects.
It is recognized that a major reason for the difficulties in executing these projects is the poor quality of the data and the metadata of the original data sources. It is also becoming increasingly recognized that discovering and correcting the metadata and discovering the data accuracy problems as the first step in these projects hugely improves their chances of success.
It only makes sense that working from accurate and complete metadata will lead to better results than working with inaccurate and incomplete metadata. It also makes sense that knowing the quality of the data will lead to corrective actions needed for the project to complete and to meet its objectives.
The analytical approach to accomplishing this first step includes the same activities that are performed in a stand-alone data quality assessment project. It requires the same skills and produces the same outputs: accurate metadata, facts about inaccurate data, and issues. If the data quality assurance group performs this first step for these projects or with the project team, the business case becomes a no-brainer.
The justification of the project is not just that quality improvements will lead to value. It adds the justification that the assessment will shorten the time to complete the project and reduce the cost of completion. The Standish Group has estimated that performing data profiling at the beginning of a project can reduce the total project cost by 35%. Even this conservative estimate is more than sufficient to offset all costs of the assessment activity.
This means that performing a quality assessment on data sources that are the subject of a project is free: it returns more dollars and time to the project than it takes away. It is very different from the business case for stand-alone assessment projects in that it does not need to wait for remedy implementation to recoup its expense.
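A back-of-the-envelope calculation shows why. The project budget and assessment cost below are invented for the example; only the 35% figure comes from the text.

```python
# Hypothetical numbers illustrating the offset argument.
project_budget = 2_000_000       # planned cost of the migration project
assessment_cost = 150_000        # cost of up-front profiling and assessment

savings = 0.35 * project_budget  # Standish Group's estimated reduction
net_benefit = savings - assessment_cost

print(f"Estimated savings: ${savings:,.0f}")       # $700,000
print(f"Net benefit:       ${net_benefit:,.0f}")   # positive: the assessment pays for itself
```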
The story does not end there. The quality assessment also makes decisions within the project better. The project team may decide that improving the quality of the data source is a requirement for proceeding, thus spinning off a renovation project that will enable the larger project to succeed. It may identify changes to the design of the project that are necessary for success: changes to target system database design or packaged application customization parameters. The project team can decide to cancel a project early if the data assessment reveals that the data source cannot support the new system's requirements, avoiding the far larger cost of a failed implementation.
Additional benefits include a reduction in the potential for mistakes in the processes that move data between the original source and the target. It also reduces the rework that surfaces late in a project when such mistakes go undetected.
It may also surface issues whose resolution eliminates data inaccuracies in the source systems that are not directly needed by the project but that were discovered along the way.
Figure 6.3 shows this business case. The business case is just the general model of Figure 6.1 with the assessment phase funded by, and repaid within, the host project.
Figure 6.3: The business case for project services.
Because of the unique business value of these activities, the data quality assurance department should make them a primary function. Instead of avoiding data that is the subject of a major project, it should join the projects and add value to them. The data sources involved in these projects are most likely the ones rated highest for importance to the corporation and for potential future problems.
Teach-and-preach activities involve educating developers, analysts, and executives on data quality concepts and continuously advocating attention to quality throughout the corporation.
This is no different than wanting employees educated in any other discipline that is important to the corporation's success.
The outcome of continuous data quality education for information specialists will be better system designs and fewer quality problems introduced into new systems.
Spending money on improving data and information quality is always a trade-off decision for corporations. Data quality initiatives do not generally come with hard, provable returns, which makes the funding decision difficult.
There are three fundamental approaches to deciding whether to invest in data quality improvements and to what intensity the activities ought to be pursued. These approaches are described in the sections that follow.
This type of decision requires defensible estimates of cost and value before funding is granted. Demanding this for data quality initiatives will almost always result in no funding. Everyone believes the problems exist; almost no one can prove their size before an assessment is done.
Many corporate decisions are based on hard facts. For example, consider buying a software package that will reduce disk requirements for large databases by 40%. You can trial the software and see if it gets a reduction of 40%, plus or minus 5%, on samples of your data. If it does, you can take the worst case of 35%, compute the reduction in disk drives required, and price the cost of the software against the savings in storage.
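The arithmetic in such a case is mechanical. A minimal sketch, with invented database size and prices:

```python
# Hypothetical hard-facts calculation for the disk-reduction purchase.
database_tb = 50.0
worst_case_reduction = 0.35   # measured 40% +/- 5%, so take the worst case
cost_per_tb_year = 400.0      # assumed annual cost of a TB of disk
software_cost = 5_000.0       # assumed annual license cost

annual_savings = database_tb * worst_case_reduction * cost_per_tb_year
print(f"Annual savings: ${annual_savings:,.0f} "
      f"vs software cost ${software_cost:,.0f}")   # $7,000 vs $5,000: buy it
```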
Data quality initiatives do not work this way. Too often management wants a hard-fact case presented to them for data quality spending. Sometimes this is possible because of egregious examples of waste due to bad data. However, it is not a method that selects projects with the highest potential value. Even though this may grossly understate the potential for value, it may provide sufficient justification for getting started.
The negative about this is that one project gets funded on its visible costs while projects with far larger hidden value go unaddressed.
Corporations often spend money on initiatives that promise returns that cannot be, or have not been, estimated. Opening outlets in new cities, changing the design of a popular product, and spending money on expensive advertising campaigns are all investments made on expected but unproven returns.
This is where data quality belongs. Those asking for funding need to exploit this approach as much as possible. Hard facts should be used where they exist, with credible speculation about hidden value layered on top.
There is a lot of room for well-reasoned speculation in making such a case.
Corporations also base decisions on intuition. Some of this is following fads. This is why some major movements in IT sweep through corporations before their value can be demonstrated.
Executives should be made aware of the growing groundswell of interest in this area. It is not happening without reason: corporations that adopt data quality initiatives become more successful than those that do not. Although this cannot be proven, it should be obvious.
Most large initiatives in IT are not created through hard facts or through credible estimates of probable value. They are created through intuition.
An example is the movement to relational database systems in the early 1980s. The systems were unreliable, incompatible with prior systems, and showed slow and unpredictable performance. They used much more disk space than other data storage subsystems. Whereas mature, previous-generation systems were benchmarking 400+ transactions per second, the relational systems could manage only a small fraction of that.
There were virtually no hard facts to justify relational databases. There were no credible estimates of return, either. Adoption was driven by intuition about the value of flexible access to data, and that intuition proved correct.
Data quality is not a glamorous or exciting topic. It is not central to what a company does. It deals in the unglamorous details of records, values, and rules.
The best argument you can use is the one that says that placing your company in a position to depend on highly accurate data will pay off across everything the company does, even though the return cannot be computed in advance.
Data quality advocates should avoid being put into the position of conducting only limited value-demonstration projects or projects that require hard facts. Far too much value is left on the table when this is done. They should try to get religion in the corporation and have a full-function mission established that encompasses stand-alone assessments, project services, and teach-and-preach activities alike.