Reconciliation


One of the most common complaints about BI applications is that the data in the BI target databases does not match the data in the source systems. As a result, business people often do not trust the BI data. Ironically, most of the time the data in the BI target databases is more accurate than the data in the operational source files or source databases because the data has been reformatted, standardized, and cleansed. However, without proof this trust cannot be restored. Reconciliation totals provide that proof and must be available as meta data in the meta data repository.

ETL reconciliation totals are ETL process control totals, not operational reconciliation totals back to the organization's financial statement or general ledger. The purpose for ETL reconciliation totals is to ensure that all the data values going into the ETL process can be reconciled with all the data values coming out of the ETL process.

Storing ETL reconciliation totals, as well as data quality and load statistics, as meta data highlights the importance of providing a meta data repository. This meta data is crucial information for the business people who want to see which data was loaded, which data was rejected and for what reasons, and what the data reliability (cleanliness) factor is for the BI data after each load cycle. For example, source data from multiple source files and source databases may not have been properly synchronized (a timeliness error), which may have caused inconsistencies in analysis results. With load statistics available as meta data, the business analysts can quickly recognize and reconcile the problem.

Calculating Reconciliation Totals

Far too many BI projects do not employ the good IT practices commonly enforced in operational systems. Believing the notion that BI applications are "just" for decision support and are therefore less critical than operational systems is a grave mistake. BI applications are as critical as operational systems because the decisions that are made based on the data in the BI decision-support environment can affect an organization's strategic direction and vitality .

One time-proven discipline when manipulating data, whether changing the data in some way or copying it or moving it from one place to another, is to reconcile every new output to its old input. Every program or program module that reads data and then writes it to a new file, even if only into a temporary file, must produce reconciliation totals.

There are three types of reconciliation totals: record counts, domain counts, and amount counts.

Record Counts

One of the most fundamental reconciliation totals is a simple count of records read and records written. If records are rejected because they fail an edit check in the ETL process, the number of records rejected must also be counted. The total count of records written and records rejected must equal the count of records read (Figure 11.2).

Figure 11.2. Record Counts

graphics/11fig02.gif

Domain Counts

Domain counts involve counting the number of records for each unique domain (data value) of an input field and counting the number of records for the same unique domain on the output file. The complication with domain counts enters when an "overloaded" source data element must be split into multiple columns . An overloaded source data element is a data element used for multiple purposes, as many of the one-byte and two-byte code fields are in operational systems. The domain of an overloaded data element describes not one business object but many business objects and must therefore be separated into different columns for different tables. (Business objects are usually implemented as dimension tables.) In that case, the total of the multiple domain counts on the output side must equal the one domain count on the input side. If data values were rejected because they did not pass the data quality edits, there would be an additional count of all the rejected records. Thus, the total of the multiple output domain counts and the rejected data values count must equal the input domain count (Figure 11.3).

Figure 11.3. Domain Counts

graphics/11fig03.gif

Amount Counts

Even more important than domain counts are amount counts. They are the primary mechanism for reconciling source files and source databases to the BI target databases. Reconciling amounts happens in two ways. One is to summarize every amount field in every input file and every corresponding amount field in every output file. If records were rejected, there would be a third summary for the total amount figure rejected. The combined total of the output amount total and the rejected amount total must equal the input amount total.

A more complicated amount reconciliation algorithm must be developed if the incoming amount field is an overloaded source data element that has to be separated into several different amount columns for the BI target databases. In that case, the selection and edit criteria in the transformation specifications must be used to create multiple output amount totals. In addition, the same selection and edit criteria must be run against the input file to produce the same multiple amount totals for verification (Figure 11.4).

Figure 11.4. Amount Counts

graphics/11fig04.gif

Storing Reconciliation Statistics

Because of the extensive amount of transformation and cleansing that typically occurs in the ETL process, the business people should expect the data to be different on the BI target databases than in the original source files and source databases. Their desire or need to reconcile the data between source and target is valid. However, it must be accomplished through using reconciliation totals, not by comparing source data elements to target columns dollar for dollar.

Therefore, all reconciliation totals produced by every program module for each BI application load cycle must be stored as meta data in the meta data repository. If possible, additional categorization of rejected records should be considered with short descriptions for the rejection reason. These descriptions can be composed based on the ETL edit criteria.



Business Intelligence Roadmap
Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
EAN: 2147483647
Year: 2003
Pages: 202

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net