The technical rules and the business rules for the required source data transformations were accumulated and defined throughout the steps of project planning, project requirements definition, data analysis, application prototyping, and meta data repository analysis. During those steps, the rules were probably extracted from old manuals, old memos, e-mails, programs (operational and decision support), and computer-aided software engineering (CASE) tools, and provided by people who remember when and why a business rule was created. These rules are now reflected as data transformation activities in the ETL process.

Data Transformation Activities

BI projects present the best opportunity to eliminate dead and useless data because they allow the business people to see their information requirements in a different light. When properly implemented, the data transformation activities of cleansing, summarization, derivation, aggregation, and integration will produce data that is clean, condensed, new, complete, and standardized, respectively (Figure 11.1).

Figure 11.1. Data Transformation Activities
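The five transformation activities named above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the sample rows, the field names, the gender code mapping, and the tax rate are invented for illustration, and the reading of each activity (for example, aggregation as combining data about one customer from several sources, and integration as translating source codes into one standard) is an assumption based on the outcomes listed in the text, not a definitive implementation.

```python
from collections import defaultdict

# Hypothetical rows from two source systems; all names and values are illustrative.
orders = [
    {"cust_id": " c1 ", "gender": "M", "amount": "100.50"},
    {"cust_id": "C1", "gender": "1", "amount": "49.50"},
    {"cust_id": "C2", "gender": "f", "amount": "20.00"},
]
crm_names = {"C1": "Acme Corp", "C2": "Bolt Ltd"}

# Assumed standard: source-specific gender codes map to one representation.
GENDER_STANDARD = {"M": "M", "1": "M", "F": "F", "2": "F"}

def cleanse(row):
    """Cleansing (clean data): normalize the key and convert the amount to a number."""
    return {**row, "cust_id": row["cust_id"].strip().upper(),
            "amount": float(row["amount"])}

def integrate(row):
    """Integration (standardized data): translate source codes into the standard set."""
    code = row["gender"].strip().upper()
    return {**row, "gender": GENDER_STANDARD.get(code, "UNKNOWN")}

def derive(row, tax_rate=0.10):
    """Derivation (new data): create a new field from existing fields."""
    return {**row, "amount_with_tax": round(row["amount"] * (1 + tax_rate), 2)}

def summarize(rows):
    """Summarization (condensed data): one total per customer instead of detail rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["cust_id"]] += row["amount"]
    return dict(totals)

def aggregate(totals, names):
    """Aggregation (complete data): combine data about one customer from several sources."""
    return {cid: {"name": names.get(cid), "total_amount": total}
            for cid, total in totals.items()}

detail = [derive(integrate(cleanse(r))) for r in orders]
customers = aggregate(summarize(detail), crm_names)
```

In this sketch each activity is a separate small function, so that a rule supplied by the business representative can be changed in one place without touching the others.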
Underestimating Data Transformation Efforts

Source data transformation is similar to opening a Russian doll: you open one and there is another inside. It could be an endless process. That is why the time required for the ETL process is chronically underestimated. The original estimates are usually based on the number of technical data conversions required to transform data types and lengths, and they often do not take into account the overwhelming number of transformations required to enforce business data domain rules and business data integrity rules. The transformation specifications given to the ETL developer should never be limited to just technical data conversion rules.

For some large organizations with many old file structures, as much as 80 percent of a particular data transformation effort could go toward enforcing business data domain rules and business data integrity rules, and only 20 percent toward enforcing technical data conversion rules. Therefore, expect to multiply your original estimates for your ETL data transformation effort by four. Even if you think you have a very realistic timetable for the ETL process, do not be surprised if you still miss deadlines due to dirty data. If you do not miss deadlines, do not be surprised to discover that you have not cleansed the data sufficiently.

Insist on full-time involvement from the business representative, and insist on getting the right business representative: someone who is knowledgeable about the business and who has authority to make decisions about the business rules. These stipulations are essential for speeding up the ETL process. Furthermore, urge the business sponsor and the business representative to launch a data quality initiative in the organization, or at least in the departments under their control or influence. When business people drive a data quality initiative, they are more likely to assist with the ETL transformation process.
Remind them that while IT technicians may know the process semantics, the business people know the data contents and business semantics. They understand what the data really means.
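The distinction drawn in this section between technical data conversion rules and business data rules can be made concrete with a small Python sketch. The record layout, the field names, the allowed status values, and both function names are hypothetical, chosen only to show how a record can pass every technical conversion and still violate a business data domain rule and a business data integrity rule.

```python
from datetime import datetime

# Assumed domain of valid status values; in practice the business representative supplies it.
VALID_STATUS = {"ACTIVE", "LAPSED", "CANCELLED"}

def convert_technical(raw):
    """Technical conversion rules: change data types and lengths only."""
    return {
        "policy_id": raw["policy_id"].strip()[:10],                      # trim to target length
        "start": datetime.strptime(raw["start"], "%Y%m%d").date(),       # text to date
        "end": datetime.strptime(raw["end"], "%Y%m%d").date(),           # text to date
        "status": raw["status"].strip().upper(),
    }

def check_business_rules(rec):
    """Business rules: validate what the converted values mean."""
    violations = []
    if rec["status"] not in VALID_STATUS:
        violations.append("domain: status is not an allowed value")
    if rec["end"] < rec["start"]:
        violations.append("integrity: end date precedes start date")
    return violations

# This hypothetical record converts without error but is still dirty.
dirty = convert_technical(
    {"policy_id": "P-001 ", "start": "20240301", "end": "20240101", "status": "x9"}
)
clean = convert_technical(
    {"policy_id": "P-002", "start": "20240101", "end": "20241231", "status": "active"}
)
dirty_violations = check_business_rules(dirty)
clean_violations = check_business_rules(clean)
```

An estimate that counts only functions like `convert_technical` would miss all of the work in `check_business_rules`, which is the part that depends on knowledge held by the business people.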