The data warehouse is expected to be the authority for any data it provides. To gain and maintain this position of authority, a data warehouse should contain only data that is complete, correct, and consistent. If you are the manager of the data warehouse, you may not be responsible for the quality of the data sources, but you are responsible for ensuring that only quality data reaches the data warehouse.
However much care is taken with an OLTP database, data quality issues tend to surface when its data reaches the data warehouse. The problems are compounded when your data warehouse receives data from a multitude of independently managed sources. Users often consider the data they "manage" locally to be good, because they make ad hoc adjustments and allowances for its errors and so never see them as a problem. Even well-managed local data is unlikely to be directly compatible or consistent with other local data. Industry estimates are that close to half of data quality issues must be resolved at the source, and a much smaller number can be resolved only in the ETL process. This chapter covers the techniques available to you to detect and correct bad data, merge multiple data sources into a consistent model, and keep track of the successes and failures of the processing.