Chapter Twenty. Task/Subtask Matrix



Development Step

Activities

Tasks/Subtasks

1. Business Case Assessment

1. Determine the business need

  • Identify the business need (business problem or business opportunity)

  • Determine current financial consequences of the business need (lost revenue, lost opportunity, competition, obsolescence, cost overrun)

 

2. Assess the current decision-support system (DSS) solutions

  • Assess current usage of the existing DSS

  • Determine the shortcomings of the existing DSS (different keys, data redundancy, inconsistencies, difficult to access)

  • Perform gap analysis, identifying which business questions are being answered by the existing DSS and which ones are not

 

3. Assess the operational sources and procedures

  • Assess the data quality of operational systems in terms of:

    - File structures and database structures

    - Content (domain) of source data elements

  • Assess the current data movement in terms of:

    - Data entry

    - Data extraction

    - Data manipulation

    - Data duplication

  • Assess current operational procedures in terms of:

    - Poor data entry practices

    - Lack of edit checks

    - Defective program code

    - Lack of training

 

4. Assess the competitors' BI decision-support initiatives

  • Determine the competitors' successes and failures with their BI decision-support initiatives, especially their data mining initiatives

  • Determine whether your competitors gained market advantages, such as more sales, new customers, or innovative products

 

5. Determine the BI application objectives

  • Identify the strategic business goals of the organization

  • Define the overall BI decision-support objectives

  • Define the project-specific BI application objectives

  • Match the overall BI decision-support objectives to the strategic business goals

  • Match the project-specific BI application objectives to the strategic business goals

 

6. Propose a BI solution

  • Review current DSS solutions

  • Review DSS gap analysis

  • Determine how the BI application will lessen the business pain

  • Create a high-level architecture for the proposed BI solution

  • Consolidate and prioritize unfulfilled requirements from previous BI projects with new requirements

  • Create a high-level (conceptual) logical data model

 

7. Perform a cost-benefit analysis

  • Determine costs

  • Determine benefits:

    - Identify tangible benefits

    - Identify intangible benefits

    - Identify short-term benefits to the organization

    - Identify long-term benefits to the organization

  • Calculate the projected return on investment (ROI)
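The ROI arithmetic in the last subtask can be sketched as follows; the benefit and cost figures are hypothetical, not from the text.

```python
def projected_roi(total_benefits: float, total_costs: float) -> float:
    """Return net benefit as a fraction of total cost."""
    return (total_benefits - total_costs) / total_costs

# Hypothetical figures: $1.2M in total benefits against $800K in costs.
roi = projected_roi(1_200_000, 800_000)
print(f"Projected ROI: {roi:.0%}")  # → Projected ROI: 50%
```

Tangible benefits feed the numerator directly; intangible benefits are usually listed alongside the ROI figure rather than folded into it.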

 

8. Perform a risk assessment

  • Create a risk assessment matrix, listing all possible BI project risks in terms of technology, complexity, integration, organization, project team, and financial investment

  • Assign weights to the risks

  • Rank the risks: low, medium, or high

  • Determine the risks (ramifications) of not addressing the business need and not implementing a BI solution
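One minimal way to sketch the risk assessment matrix described above; the risk entries, categories, weights, and the weight-times-rank scoring scheme are all hypothetical illustrations.

```python
# Each risk gets a weight (1-10) plus low/medium/high ranks, then risks
# are ordered by weighted score. All entries here are hypothetical.
RANK = {"low": 1, "medium": 2, "high": 3}

risks = [
    # (risk, category, weight 1-10, likelihood, impact)
    ("New OLAP tool unproven",      "technology",   8, "medium", "high"),
    ("Source data quality unknown", "complexity",   9, "high",   "high"),
    ("Part-time business sponsor",  "organization", 6, "medium", "medium"),
]

def score(weight: int, likelihood: str, impact: str) -> int:
    """Weighted score: weight times likelihood rank times impact rank."""
    return weight * RANK[likelihood] * RANK[impact]

for risk, cat, w, lik, imp in sorted(
        risks, key=lambda r: score(*r[2:]), reverse=True):
    print(f"{score(w, lik, imp):3d}  {risk} ({cat})")
```

The likelihood and impact rankings reappear in Step 3 (Project Planning), where the matrix is revised with triggers and mitigation plans.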

 

9. Write the assessment report

  • Write the assessment report to describe:

    - Business need (business problem or business opportunity)

    - Lost opportunities (ramifications of not addressing the business need)

    - Proposed BI solution (and alternative solutions)

    - Cost justification and expected ROI

    - Risk assessment

    - Recommendations (include operational business process improvements)


2. Enterprise Infrastructure Evaluation

   

Section A. Technical Infrastructure Evaluation

1. Assess the existing platform

  • Review hardware

  • Review operating systems

  • Review middleware, especially DBMS gateways

  • Review custom interfaces

  • Review network components and bandwidth

  • Review the DBMS

  • Review tools (CASE, ETL, OLAP, report writers, data mining tool)

  • Review the meta data repository (if one exists)

  • Perform gap analysis, identifying which platform components seem adequate and which ones seem inadequate

 

2. Evaluate and select new products

  • Identify the product categories you need to evaluate (for example, hardware, middleware, DBMS, tools)

  • List all products being considered for each category

  • Itemize your requirements for the products

  • Weigh each product requirement on a scale of 1 to 10

  • Rank each product against the weighted requirements on a scale of 0 to 10 (0 means the product cannot satisfy the requirement)

  • Determine the total score for each product by multiplying the rank by the requirement weight factor

  • List all vendors of all products

  • Itemize your requirements for the vendors

  • Weigh each vendor requirement on a scale of 1 to 10

  • Rank each vendor against the weighted requirements on a scale of 0 to 10 (0 means the vendor cannot satisfy the requirement)

  • Determine the total score for each vendor by multiplying the rank by the requirement weight factor

  • Evaluate the product scores and vendor scores

  • Create a short list of products and vendors in each category

  • Have the products demonstrated by the vendors

  • Choose the final product in each product category

  • Obtain business sponsor approval to license the products
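The weighted product-scoring subtasks above reduce to a small calculation; a sketch with hypothetical requirements and products (the same arithmetic applies to vendor scoring):

```python
# Each requirement carries a weight (1-10); each product is ranked
# against it (0-10, where 0 = cannot satisfy). Total score is the sum
# of rank * weight. Requirement and product names are hypothetical.
requirements = {"load performance": 9, "metadata exchange": 7, "cost": 5}

products = {
    "ETL Tool A": {"load performance": 8, "metadata exchange": 6, "cost": 4},
    "ETL Tool B": {"load performance": 6, "metadata exchange": 9, "cost": 7},
}

def total_score(ranks: dict, weights: dict) -> int:
    """Sum of (rank against requirement) * (requirement weight)."""
    return sum(ranks[req] * w for req, w in weights.items())

for name, ranks in sorted(products.items(),
                          key=lambda p: total_score(p[1], requirements),
                          reverse=True):
    print(name, total_score(ranks, requirements))
```

The scores support, but do not replace, the short list: vendor demonstrations and the business sponsor's approval still drive the final choice.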

 

3. Write the technical infrastructure assessment report

  • Fill in the following sections of the assessment report:

    - Executive summary

    - Findings about servers, operating systems, middleware, interfaces, network and bandwidth, DBMS, tools, and meta data repository

    - List of weighted requirements

    - Product scores

    - Vendor scores

    - Product costs

    - Products on the short list

    - Rationale for selecting or rejecting products

    - Final selection criteria

 

4. Expand the current platform

  • Order new products

  • Install new products

  • Test new products

  • Train technical staff on new products

Section B. Nontechnical Infrastructure Evaluation

1. Assess the effectiveness of existing nontechnical infrastructure components

  • Review standards for data naming, abbreviations, logical data modeling, testing, and reconciliation

  • Review the use of the development methodology

  • Review estimating guidelines

  • Review change-control procedures

  • Review issues management procedures

  • Review roles and responsibilities

  • Review security processes and guidelines

  • Review meta data capture and delivery processes for business meta data as well as technical meta data

  • Review meta data repository functionality

  • Review the process for merging logical data models into the enterprise data model

  • Review data quality measures and the cleansing triage process

  • Review the service-level agreement (SLA) process

  • Review the BI support function

  • Review the dispute resolution process

  • Review the communication process

  • Perform gap analysis, identifying which standards, guidelines, and procedures are adequate and which ones are inadequate

 

2. Write the nontechnical infrastructure assessment report

  • Fill in the following sections of the assessment report:

    - Executive summary

    - Findings about inadequate standards, guidelines, procedures, and processes

    - Recommendations for nontechnical infrastructure changes

    - Prioritized nontechnical infrastructure requirements to be implemented within the BI project

    - Prioritized nontechnical infrastructure enhancements to be implemented outside the BI project

 

3. Improve the nontechnical infrastructure

  • Create time estimates for creating or modifying new standards, guidelines, and procedures

  • Change the guidelines for using the development methodology before the BI project begins, if necessary

  • Modify the roles and responsibilities before the BI project begins, if necessary

  • Create new processes as needed, or modify any process that appears to be inadequate or not working

3. Project Planning

1. Determine the project requirements

  • Define data requirements

  • Define functional requirements (reports, queries, online help function)

  • Define infrastructure requirements (technical and nontechnical)

  • Expand or create the high-level logical data model

  • Validate the requirements with other business people

  • Obtain sponsor approval for the requirements

 

2. Determine the condition of the source files and databases

  • Review the content of each potential source file and source database (internal and external)

  • Assess source data violations to the:

    - Technical data conversion rules

    - Business data domain rules

    - Business data integrity rules

  • Determine which data elements are critical to the business, which are important (but not critical), and which are insignificant

  • Estimate how long it would take to cleanse the critical source data elements and extrapolate an estimate (make an educated guess) for cleansing the important source data elements

  • Review data-cleansing estimates with the business sponsor and have the business sponsor prioritize the cleansing effort

 

3. Determine or revise the cost estimates

  • Review the technical infrastructure assessment report

  • Review the nontechnical infrastructure assessment report

  • Review the project requirements

  • Review the project constraints (time, scope, budget, resources, quality)

  • Review the need for consulting, contracting, and training

  • Revise the original cost estimates, if necessary

 

4. Revise the risk assessment

  • Review and revise the original risk assessment matrix

  • For every risk, determine the likelihood of it materializing: low, medium, high

  • Determine the impact of every risk: low, medium, high

  • Define triggers (indications that a risk is about to materialize)

  • Define a risk mitigation plan, listing actions to prevent or circumvent each risk

  • Define a contingency plan, listing alternative actions to take when a risk materializes

  • Identify your assumptions because they may become risks

  • Include assumptions in the contingency plan (list alternatives)

  • Review the project constraints (time, scope, budget, resources, quality) as they relate to risk

 

5. Identify critical success factors

  • Define the success criteria for the BI project (what would be considered a "success")

  • Determine critical success factors (what must be in place in order for the project to meet the success criteria)

  • Review critical success factors with the business sponsor

  • Obtain agreement and cooperation on the critical success factors from the business sponsor (for example, provide one full-time business representative for the project)

 

6. Prepare the project charter

  • Write the project charter with information about the BI project collected up to this point:

    - Purpose and reason for the BI project

    - Costs and benefits

    - Infrastructure and business process improvements

    - High-level scope (data and functions)

    - Items not in the scope (originally requested but subsequently excluded from the scope)

    - Expectations from the business people in terms of availability, security, response time (performance), data cleanliness, ongoing support

    - Team structure, roles, and responsibilities

    - Risks, assumptions, and constraints

    - Critical success factors

 

7. Create a high-level project plan

  • Create a work breakdown structure (list of appropriate tasks)

  • Determine base estimates for all tasks (using estimates provided by the team members who will do the work)

  • Identify task dependencies

  • Revise the base estimates for assigned resources, based on:

    - Skill level

    - Subject matter expertise

    - Additional administrative activities

    - Non-work-related activities

  • Identify resource dependencies (resource leveling)

  • Create a critical path method (CPM) or PERT chart

  • Create a Gantt chart
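The CPM subtask above boils down to a longest-path calculation over the task dependencies; a sketch with hypothetical tasks and durations in days:

```python
from functools import lru_cache

# task: (duration in days, prerequisite tasks); all values hypothetical
tasks = {
    "requirements": (10, []),
    "data model":   (15, ["requirements"]),
    "db design":    (10, ["data model"]),
    "etl design":   (20, ["data model"]),
    "build":        (25, ["db design", "etl design"]),
}

@lru_cache(maxsize=None)
def earliest_finish(task: str) -> int:
    """Forward pass: a task finishes after its own duration plus the
    latest-finishing prerequisite."""
    duration, preds = tasks[task]
    return duration + max((earliest_finish(p) for p in preds), default=0)

project_duration = max(earliest_finish(t) for t in tasks)
print(f"Project duration: {project_duration} days")  # → 70 days
```

Tasks whose earliest finish determines the project duration form the critical path; slipping any of them slips the whole schedule, which is what the Gantt chart then visualizes.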

 

8. Kick off the project

  • Prepare an agenda for the kickoff meeting

  • Call a kickoff meeting (include the business sponsor and the business representative)

  • Assign roles and responsibilities to core team members

  • Identify extended team members and review their responsibilities

  • Discuss the project charter

  • Walk through the project plan

  • Discuss the concept of self-organizing teams (core team members monitoring and redistributing [sharing] their workload)

4. Project Requirements Definition

1. Define the requirements for technical infrastructure enhancements

  • Define the requirements for additional hardware

  • Define the requirements for additional middleware

  • Define the requirements for a new DBMS or upgrades to the existing DBMS

  • Define the requirements for the network or upgrades to it

  • Define the security requirements and decide whether to buy a security package

  • Define the requirements for development tools (CASE, ETL)

  • Define the requirements for data access and reporting tools (OLAP, report writers)

  • Define the requirements for a new data mining tool

  • Determine whether to license (buy) or custom build a meta data repository

  • If a meta data repository already exists, determine how to enhance it

 

2. Define the requirements for nontechnical infrastructure enhancements

  • Define the requirements for creating or changing standards, guidelines, and procedures for:

    - Governance standards and procedures for prioritizing requirements and deliverables (functions, data, meta data)

    - Use of the development methodology

    - Estimating guidelines

    - Scope management (change control) process

    - Issues management process

    - Roles and responsibilities

    - Security process

    - Meta data capture and delivery process

    - Logical data modeling

    - Data quality measures and triage process

    - Testing process

    - SLA process

    - Support functions

    - Dispute resolution process

    - Communication process

 

3. Define the reporting requirements

  • Collect or create sample report layouts

  • Collect or create sample queries

  • Define business rules for the reports

  • Define aggregation and summarization rules

  • Define reporting dimensions

  • Define query libraries

  • Identify stewards of the libraries

  • Get samples of the types of ad hoc queries the business analysts may want to write

  • Define access interfaces (GUI front end, Web display, portal)

 

4. Define the requirements for source data

  • Define all source data elements (for reporting dimensions, for report fields, for queries, for data mining)

  • Classify data elements as critical, important, insignificant

  • Define the data domains (allowable values)

  • Define the significant and obvious business rules for the data

  • Analyze the source files and source databases in more detail to determine data-cleansing requirements

  • Define the historical data requirements (how much history to load and from how many years back, or accumulate history from this point forward)

 

5. Review the project scope

  • Compare the detailed project requirements to the high-level scope in the project charter

  • Review the project constraints (time, scope, budget, resources, quality)

  • Determine whether the scope is still realistic under those constraints

  • Renegotiate the scope, if necessary

  • Create a change-control document for managing changes to the requirements

  • Create an issues log for tracking and resolving issues

 

6. Expand the logical data model

  • Add newly discovered entities and their relationships to the high-level (conceptual) logical data model

  • Refine the logical data model by resolving the many-to-many relationships

  • Add unique identifiers to each entity

  • Attribute the logical data model with critical data elements

 

7. Define preliminary service-level agreements

  • Identify or revise the expectations (outermost acceptable limits) of the business people in terms of:

    - Availability

    - Security

    - Response time

    - Data cleanliness

    - Ongoing support

 

8. Write the application requirements document

  • Fill in the following sections of the application requirements document:

    - Technical infrastructure requirements

    - Nontechnical infrastructure requirements

    - Reporting requirements

    - Ad hoc and canned query requirements

    - Requirements for source data, including history (include the high-level logical data model)

    - Data-cleansing requirements

    - Security requirements

    - Preliminary SLAs

5. Data Analysis

1. Analyze the external data sources

  • Identify the entities and relationships from each external data source

  • Merge the new entities and relationships from the external data sources into the logical data model

 

2. Refine the logical data model

  • Fully attribute the logical data model to include all required source data elements from internal as well as external data sources

  • Create new entities and relationships where needed to store the new attributes (data elements)

  • Analyze the layout of all identified source files and source databases, and normalize overused data elements, which are explicitly redefined; for example, a "redefines" clause or an "occurs" clause in a program

  • Analyze the content of all identified source data elements in the source files and source databases, and normalize overused data elements, which are implicitly redefined; for example, when a data element is used for multiple purposes

  • Create the data-specific business meta data components for all attributes (data name, definition, domain, length, type, and so on)

 

3. Analyze the source data quality

  • Apply business data domain rules and business data integrity rules to find data elements with invalid domains, such as:

    - Default values

    - Missing values

    - Cryptic values

    - Contradicting values (between two dependent data elements)

    - Values that violate the business rules

    - Missing primary keys

    - Duplicate primary keys

  • Determine the severity of the problem: how many data elements have invalid domains and how many records are affected; for example, 140 data elements out of 260 and 609,772 records out of 2,439,087

  • Determine the criticality of the problem (how the dirty data will affect the BI application if not corrected)
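The severity figures cited above (140 of 260 data elements, 609,772 of 2,439,087 records) are simple ratios, sketched here:

```python
def severity(affected: int, total: int) -> float:
    """Fraction of items affected by invalid domains."""
    return affected / total

elements = severity(140, 260)           # data elements with invalid domains
records = severity(609_772, 2_439_087)  # records affected
print(f"{elements:.0%} of data elements, {records:.0%} of records affected")
```

Severity (how widespread the problem is) is then weighed against criticality (how much the dirty data would distort the BI application if left uncorrected).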

 

4. Expand the enterprise logical data model

Note

This activity is typically performed by data administration behind the scenes of the BI project. However, if the core team member who plays the role of the project data administrator must also play the role of the enterprise data administrator, the BI project schedule may be affected.


  • Merge the project-specific logical data model into the enterprise logical data model

  • Identify data discrepancies and inconsistencies between the logical data models

 

5. Resolve data discrepancies

  • Discuss the discrepancies and inconsistencies with the BI project team, the data owners, and the business executives who use the data in question for their business decisions

  • Adjust either the project-specific logical data model or the enterprise logical data model, as appropriate

  • Notify other project teams that are affected by the changes

  • If changes cannot be implemented, document the discrepancies and inconsistencies as meta data, and schedule a time for the changes to be implemented

 

6. Write the data-cleansing specifications

  • Review the classification of data elements: critical, important, insignificant

  • Write data-cleansing specifications for all critical data elements

  • Write data-cleansing specifications for selected important data elements (let the business representative make the selection)

  • Unless there is sufficient time on the project and the business representative specifically requests it, do not write data-cleansing specifications for the insignificant data elements

6. Application Prototyping

Note

Prototyping is an iterative process, and activities in this step will be repeated multiple times.


1. Analyze the access requirements

  • Review the application requirements document with the subject matter expert and the business representative, and together analyze:

    - Report requirements; for example, how many reports look similar and can therefore be combined due to a pattern in the reports or use of the same facts and dimensions

    - Query requirements; for example, do the business people want parameterized queries and if so, what are the parameterized variables and who will maintain the query library

    - Ad hoc requirements; if possible, try to determine what type of ad hoc questions the business people may want to ask based on what types of questions they currently ask

    - Interface requirements; for example, a GUI front end or a Web portal

  • Communicate all your findings to the database administrator

  • Create a skill set matrix for each business person participating in the prototyping activities:

    - Indicate computer skill level as beginning, advanced, expert

    - Indicate application knowledge as beginning, advanced, expert

 

2. Determine the scope of the prototype

  • Determine the objective and the primary use of the prototype

  • Decide which type of prototype to build:

    - Show-and-tell

    - Mock-up

    - Proof-of-concept

    - Visual-design

    - Demo

    - Operational

  • Select a subset of functions (reports, queries, ETL, interface)

  • Select a subset of sample data from the source files and source databases

  • Create a change-control document for managing scope changes during the prototype

  • Create an issues log for tracking and resolving issues during the prototype

  • Determine the number of prototype iterations

  • Determine the number of prototype participants (IT and business people)

  • Determine the time limits for each prototype iteration (time-box)

  • Estimate the cost and benefit for each prototype iteration

  • Determine the point of diminishing returns for the prototyping effort

 

3. Select tools for the prototype

  • Review existing in-house tools and find out who uses them

  • Review the availability of new reporting and querying tools

  • Review existing or new graphical tools

  • Review existing or new report distribution tools

  • Review existing DBMS options for the prototype

  • Select the platform on which the prototype will be developed

  • Select one of the installed and tested DBMSs

  • Select one or more existing or new tools

  • Determine training needs for the new tools

  • Schedule training sessions as soon as possible

 

4. Prepare the prototype charter

  • Write the prototype charter with information about:

    - Why you are building the prototype (purpose)

    - What type of prototype you selected (show-and-tell, mock-up, proof-of-concept, visual-design, demo, operational)

    - Who will participate (IT and business people)

    - What the rules are (scope, time, iterations)

    - How you will measure your degree of success

 

5. Design the reports and queries

  • Design the reports based on existing reports or mock-up report layouts

  • Design the queries based on existing spreadsheets or reports or on mock-up samples

  • Design the interfaces: GUI or Web front end

  • Create a physical data model (database design) for the prototype database

  • Identify the data to be used for the prototype: either a representative sample of source data or new test data

  • Map sample source data or new test data into the prototype database

 

6. Build the prototype

  • Create the physical prototype database (tables, columns, indices)

  • Create sample test data (extract sample source data or create new data)

  • Load the prototype database with sample source data or sample test data

  • Write a selected subset of reports

  • Write a selected subset of queries

  • Write a selected subset of interfaces or other functions

  • Test reports, queries, interfaces, or other functions

  • Document any problems with the tool

  • Document any issues with the reports or queries

  • Document any issues with the interfaces or other functions

  • Document any issues with dirty source data (do not waste time resolving them unless you are prototyping an ETL function)

  • Using the prototype as a yardstick, validate the time and cost estimates for the BI application

 

7. Demonstrate the prototype

  • Review reports and queries with the business people

  • Review problems and issues with the business sponsor and the business representative on the core team

  • Review the project requirements with the subject matter expert and the business representative

  • Document requested changes in the change-control document

  • Analyze the impact of requested changes in terms of:

    - Time

    - Quality

    - Cost

    - Additional skills or resources required

  • Review the impact of requested changes with the business sponsor and the business representative

  • Revise the application requirements document to include approved changes

  • Review lessons learned with the entire project core team and in particular with the ETL step core team

  • Use prototype demonstrations to promote the BI application

  • Perform the next prototype iteration, if applicable

7. Meta Data Repository Analysis

1. Analyze the meta data repository requirements

  • Review the technical infrastructure assessment report for requirements to license (buy), build, or enhance a meta data repository

  • Perform a cost-benefit analysis for licensing versus building a meta data repository

  • Make the decision to license or build a meta data repository

  • Review the nontechnical infrastructure assessment report for new meta data requirements (business and technical)

  • Determine the scope of the meta data repository deliverables

  • Prioritize the meta data repository deliverables, indicating whether the meta data components are mandatory, important, or optional

  • Update the application requirements document to reflect any changes

 

2. Analyze the interface requirements for the meta data repository

  • Analyze the meta data sources from which the meta data components will be extracted:

    - Word processing files and spreadsheets

    - DBMS dictionaries

    - CASE, ETL, OLAP tools

    - Report writers and query tools

    - Data mining tool

  • Determine what import and export features are available in these tools as well as in the meta data repository product

 

3. Analyze the meta data repository access and reporting requirements

  • Review the original meta data repository access and reporting requirements

  • Review the meta data security requirements

  • Identify the access interface media for displaying meta data ad hoc query results (for example, PDF, HTML)

  • Analyze the feasibility of a context-sensitive help function

  • Determine what reports should be produced from the meta data repository

 

4. Create the logical meta model

  • Create business meta data entities

  • Create technical meta data entities

  • Determine the relationships between the meta data entities

  • Create attributes for business and technical meta data entities

  • Draw an entity-relationship diagram

 

5. Create the meta-meta data

  • Describe all meta data entities with:

    - Name

    - Definition

    - Relationships

    - Security

    - Physical location

    - Timeliness

    - Volume

  • Describe all meta data attributes with:

    - Name

    - Definition

    - Type and length

    - Domain (content)

    - Security

    - Ownership

  • Define the business rules for meta data entities, attributes, and relationships

8. Database Design

1. Review the data access requirements

  • Review the data-cleansing specifications

  • Review the prototyping results with the application lead developer

  • Review detailed access and analysis requirements with the application lead developer and the subject matter expert or business representative:

    - Reporting requirements

    - Querying requirements

    - Ad hoc querying requirements

  • Review data security requirements

  • Determine projected data volumes and growth factors

  • Determine the projected number of concurrent database users

  • Determine the location of business people

  • Determine the frequency of report and query executions

  • Determine the peak and seasonal reporting periods

  • Determine platform limitations

  • Determine tool limitations (ETL, OLAP, report writers)

 

2. Determine the aggregation and summarization requirements

  • Review the measures (facts) used by the prototype (how they were derived through aggregation and summarization)

  • Review the dimensions used by the prototype

  • Review the drill-down and roll-up functions of the prototype

  • Review common reporting patterns among existing reports and among business people from various departments

  • Determine the most frequently used reporting dimensions

  • Review the logical data model with the data administrator

  • Determine the level of detail (granularity) needed

  • Determine how the detailed data will be accessed (drill-down or ad hoc)

  • Determine how many business relationships (entity relationships) among detailed data will be needed, if any

 

3. Design the BI target databases

  • Determine the appropriate database design schemas:

    - Multidimensional star schema

    - Multidimensional snowflake schema

    - Entity-relationship-based relational schema

    - Hybrid design schema (mixture)

  • Create the physical data models (database design diagrams)

  • Create the technical meta data for the physical data models (for example, names and definitions for tables, columns, keys, indices)

  • Map the physical data models to the logical data model

 

4. Design the physical database structures

  • Determine how to cluster the tables

  • Determine the placement of datasets

  • Determine how to stripe disks

  • Determine how to partition the tables across multiple disks

  • Determine how much free space to choose

  • Determine how much buffer space to declare

  • Determine how large to set the blocksize

  • Determine the most appropriate indexing strategy

  • Determine whether referential integrity will be enforced by the DBMS or by the ETL programs

 

5. Build the BI target databases

  • Create the data definition language (DDL) defining:

    - Storage groups

    - Databases

    - Partitions

    - Tablespaces

    - Tables

    - Columns

    - Primary keys

    - Foreign keys

    - Indices

  • Create the data control language (DCL) to:

    - Define parameters for the security SYSTABLE

    - Set up group IDs

    - Grant CRUD (create, read, update, delete) authority to the group IDs

    - Assign developers, business analysts, and programs to the appropriate group IDs

  • Run the DDL to create the physical database structures

  • Run the DCL to grant CRUD authority to the physical database structures

  • Build the indices
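A minimal sketch of the DDL step, run here against SQLite purely for illustration; a warehouse DBMS would additionally take the storage-group, tablespace, partition, and DCL GRANT clauses listed above, which SQLite lacks. The table and column names are hypothetical.

```python
import sqlite3

# Hypothetical DDL for one BI target table: a fact table keyed by its
# dimension keys, with an index to support date-based queries.
ddl = """
CREATE TABLE sales_fact (
    date_key     INTEGER NOT NULL,
    product_key  INTEGER NOT NULL,
    sales_amount REAL,
    PRIMARY KEY (date_key, product_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)                 # run the DDL to create the structures
conn.execute("CREATE INDEX ix_sales_date ON sales_fact (date_key)")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # → ['sales_fact']
```

In the scenario above, the DCL that grants CRUD authority to group IDs would be run the same way, as a second script against the production DBMS.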

 

6. Develop database maintenance procedures

  • Define database maintenance activities for:

    - Database backups (full backups and incremental backups)

    - Disaster recovery procedures

    - Reorganization procedures for fragmented tables

  • Define the frequency of and procedure for performance monitoring activities

 

7. Prepare to monitor and tune the database designs

Note

This is an ongoing post-implementation activity.


  • Plan to monitor the performance of ETL loads, reports, and queries at runtime

  • Plan to use a performance-monitoring utility to diagnose performance degradation

  • Plan to refine the database design schemas

  • Plan to add additional indices, if necessary

 

8. Prepare to monitor and tune the query designs

Note

This is an ongoing post-implementation activity.


  • Plan to review and streamline all SQL calls in ETL programs and application programs

  • Plan to write pass-through queries for OLAP tools, if necessary

  • Plan to utilize parallel query execution

9. Extract/Transform/Load Design

1. Create the source-to-target mapping document

  • Review the record layouts for the source files

  • Review the data description blocks for the source databases

  • Review the data-cleansing specifications for source data elements with the data quality analyst, the subject matter expert, and the business representative

  • Create a matrix for all target tables and target columns

  • List all applicable source files and source databases for every target table

  • List all relevant source data elements for every target column

  • List data type and length for every target column

  • List data type and length for every source data element

  • Write transformation specifications for populating the columns:

    - Combine data content from multiple sources (if needed)

    - Split data content from one data element across multiple columns if source data was used for multiple purposes

    - Include aggregation and summarization algorithms

    - Include data-cleansing specifications for each column

    - Include logic for checking referential integrity (if not performed by the DBMS)

    - Include logic for error messages and record rejection counts

    - Include logic for reconciliation totals (record counts, domain counts, amount counts)
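
One row of such a source-to-target mapping matrix can be sketched as a small data structure, with the transformation specification expressed as a callable. All names (files, elements, columns) are hypothetical:

```python
# One hypothetical row of a source-to-target mapping matrix.
mapping = {
    "target_table": "customer_dim",
    "target_column": "full_name",
    "target_type": "VARCHAR(80)",
    "sources": [
        {"file": "CUSTMAST", "element": "FIRST-NAME", "type": "CHAR(30)"},
        {"file": "CUSTMAST", "element": "LAST-NAME",  "type": "CHAR(40)"},
    ],
    # Transformation spec: combine content from multiple source elements.
    "transform": lambda first, last: f"{first.strip()} {last.strip()}",
}

value = mapping["transform"]("JANE      ", "DOE       ")
```

Capturing the transformation as executable logic, rather than prose alone, makes the mapping document directly testable against sample source records.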

 

2. Test the ETL tool functions

  • Review the transformation specifications in the source-to-target mapping document

  • Determine whether the ETL tool functions can perform the required transformation logic

  • Determine what supplementary custom code must be written for transformations that cannot be handled by the ETL tool

 

3. Design the ETL process flow

  • Determine the most efficient sequence in which source data can be extracted from the source files and source databases

  • Determine the most efficient sequence in which the extracted source data can be transformed, cleansed, and loaded

  • Determine the sort and merge steps in the ETL process

  • Identify all temporary and permanent work files and tables

  • Determine what components of the ETL process can run in parallel

  • Determine what tables can be loaded in parallel

  • Draw the process flow diagram showing process sequence and process dependencies for:

    - Extracts from source files and source databases

    - Temporary and permanent work files and tables

    - Sorting and merging performed on the temporary and permanent work files and tables

    - Transformation programs (program modules)

    - Error rejection files and error reports

    - Load files and load utilities
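
The sequencing and parallelism decisions above amount to a dependency graph. A minimal sketch using the standard-library `graphlib` (Python 3.9+) groups a hypothetical ETL flow into "waves," where every step in a wave can run in parallel:

```python
from graphlib import TopologicalSorter

# Hypothetical ETL process flow: each step maps to the steps it depends on.
flow = {
    "extract_orders":  set(),
    "extract_custs":   set(),
    "sort_orders":     {"extract_orders"},
    "transform_custs": {"extract_custs"},
    "load_fact":       {"sort_orders", "transform_custs"},
}

# Group steps into waves; steps within one wave have no mutual dependencies.
ts = TopologicalSorter(flow)
ts.prepare()
waves = []
while ts.is_active():
    ready = list(ts.get_ready())
    waves.append(sorted(ready))
    ts.done(*ready)
```

The resulting waves correspond directly to the process flow diagram: both extracts can run concurrently, the fact-table load waits for everything upstream.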

 

4. Design the ETL programs

  • Design three sets of ETL programs for:

    - Initial load

    - Historical load

    - Incremental load

  • Modularize the ETL programs as much as possible

  • Translate the transformation specifications from the source-to-target mapping document into programming specifications for each ETL program module (or for each ETL tool module)
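
The modularization advice above can be sketched as follows: one shared transform module reused by both the initial (full) and incremental (delta) load programs, so the transformation logic is written and tested once. All names and fields are illustrative:

```python
# Sketch: modularized ETL with one shared transform function.

def transform(record: dict) -> dict:
    """Shared transformation logic, used by every load type."""
    return {**record, "name": record["name"].strip().upper()}

def initial_load(source: list[dict]) -> list[dict]:
    # Full refresh: transform and load every source record.
    return [transform(r) for r in source]

def incremental_load(source: list[dict], since: str) -> list[dict]:
    # Delta load: only records changed after the last load timestamp.
    return [transform(r) for r in source if r["updated"] > since]

rows = [{"name": " ada ", "updated": "2003-01-02"},
        {"name": " bob ", "updated": "2003-01-05"}]
full = initial_load(rows)
delta = incremental_load(rows, since="2003-01-03")
```

The historical load would follow the same pattern, differing only in its source selection, which is exactly why sharing the transform module pays off.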

 

5. Set up the ETL staging area

  • Determine whether the entire ETL process can run in one central staging area or whether it has to be distributed (some ETL programs running on the mainframe, some on the ETL server)

  • Set up the ETL server (if a dedicated server is used)

  • Allocate space for temporary and permanent work files and tables

  • Create program libraries

  • Establish program-versioning procedures

10. Meta Data Repository Design

1. Design the meta data repository database

  • Review the logical meta model for the meta data repository

  • Design the meta data repository database (entity-relationship or object-oriented)

  • Draw the physical meta model diagram (entity-relationship or object-oriented)

  • Map the physical meta model to the logical meta model

  • Create the DDL for the meta data repository database

  • Create the DCL for the meta data repository database

  • Design backup and recovery procedures

  • Design versioning and archival procedures

 

2. Install and test the meta data repository product

  • Compile a list of meta data repository products and vendors

  • Compare the meta data repository products to the meta data repository requirements in the revised application requirements document and in the logical meta model

  • Create a scorecard for each evaluated meta data repository product

  • Create a scorecard for each evaluated meta data repository vendor

  • Narrow the list of meta data repository products and vendors to a short list

  • Arrange for meta data repository product demos

  • Check the vendors' client references

  • License (buy) the meta data repository product

  • Install and test the meta data repository product

 

3. Design the meta data migration process

  • Analyze all sources for extracting business meta data (for example, CASE tool, word processing documents, spreadsheets)

  • Analyze all sources for extracting technical meta data (for example, ETL tool, OLAP tool, data mining tool)

  • Design the tool interface process

  • Design the transformations for the extracted meta data

  • Design the load programs for the meta data repository

  • Write the programming specifications for the meta data migration process:

    - Tool interface process

    - Transformation process

    - Load process

 

4. Design the meta data application

  • Design the meta data repository report programs

  • Design the context-sensitive online help function

  • Design the media for displaying meta data ad hoc query results (for example, PDF, HTML)

  • Design the access interface process (Web display, GUI front end, meta data repository directory)

  • Write the programming specifications for the meta data application:

    - Reports

    - Queries

    - Access interface process

    - Online help function

11. Extract/Transform/Load Development

1. Build and unit test the ETL process

  • Code the ETL programs by following the programming specifications in the ETL program design document

  • If using an ETL tool, write instructions (technical meta data) for the ETL tool modules

  • Capture the ETL technical meta data for the meta data repository

  • Write code in the ETL programs to produce reconciliation totals, data quality metrics, and load statistics

  • Unit test each individual program module

  • If using an ETL tool, unit test each ETL tool module

  • Write the scripts to execute the ETL programs and the sort, merge, and load utilities in the proper sequence
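
The reconciliation totals mentioned above can be sketched as a small routine run at the end of a load; the record structure and rejection rule here are hypothetical:

```python
# Sketch: producing reconciliation totals and load statistics during ETL.

def reconcile(records: list[dict]) -> dict:
    """Count records, count rejections, and total the amount field."""
    accepted = [r for r in records if r.get("amount") is not None]
    rejected = len(records) - len(accepted)
    return {
        "record_count": len(records),
        "rejected_count": rejected,
        "amount_total": round(sum(r["amount"] for r in accepted), 2),
    }

totals = reconcile([
    {"amount": 100.50},
    {"amount": 49.50},
    {"amount": None},   # rejected record
])
```

Comparing these totals against counts taken from the source systems is what lets the business representative certify that nothing was silently dropped in flight.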

 

2. Integration or regression test the ETL process

  • Create a test plan with test cases for the ETL process

  • Create test data (a representative subset of source data) for the ETL programs

  • Integration or regression test the entire ETL process from beginning to end using the test plan

  • Log the actual test results and document any test issues

  • Compare actual test results with expected test results

  • Revise the ETL programs (or the instructions for the ETL tool)

  • Retest the entire ETL process from beginning to end until it produces the expected results
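
The "compare actual with expected results" step can itself be mechanized. A minimal sketch (structures and values are hypothetical) diffs the two result sets and logs any mismatch as a test issue:

```python
# Sketch: comparing actual test results with expected results.
expected = {"row_count": 1000, "amount_total": 52750.00}
actual   = {"row_count": 998,  "amount_total": 52750.00}

# Every mismatch becomes a logged test issue to investigate before retest.
issues = [f"{k}: expected {expected[k]}, got {v}"
          for k, v in actual.items() if v != expected[k]]
```

An empty `issues` list is the exit criterion for the retest loop described above.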

 

3. Performance test the ETL process

  • Test individual ETL programs and ETL tool modules that read or write to high-volume tables

  • Test the parallel execution of ETL programs and ETL tool modules against high-volume tables

  • Test the ETL programs and ETL tool modules that perform complicated operations

  • Use full-volume data for performance testing

  • If using a stress test simulation tool, define the test components (programs, databases, tables, volumes, and so on) to the simulation tool and run a simulation test before testing with full-volume data

 

4. Quality assurance (QA) test the ETL process

  • Move all ETL programs into the QA environment

  • QA test the entire ETL process from beginning to end with the operations staff

  • Obtain approval from the operations staff to move the ETL process into production

 

5. Acceptance test the ETL process

  • Acceptance test the entire ETL process from beginning to end with the subject matter expert and the business representative:

    - Validate all cleansing transformations

    - Validate error-handling routines

    - Validate reconciliation totals

  • Obtain certification for the ETL process from the business representative

12. Application Development

1. Determine the final project requirements

  • Review the results of the prototype

  • Review the prototyping programs and scripts

  • Review the change-control document

  • Review the issues log

  • Review existing and mock-up report layouts

  • Review existing spreadsheets

  • Review the latest version of the application requirements document

  • Agree on the final project requirements; renegotiate the final scope if necessary

  • Update the application requirements document to reflect any changes

 

2. Design the application programs

  • Design the final reports

  • Design the final queries

  • Design the front-end interface (GUI, Web portal)

  • Design the online help function

  • Write the programming specifications for:

    - Reports

    - Queries

    - Front-end interface process

    - Online help function

  • Create a test plan with test cases and a test log

 

3. Build and unit test the application programs

  • Create sample test data

  • Load the development databases with sample test data

  • Rewrite or enhance prototyping programs and scripts

  • Code the final report programs

  • Code the final query scripts

  • Code the final front-end interface programs

  • Code the online help function programs

  • Unit test each individual program module

 

4. Test the application programs

  • Integration test (first release) or regression test (subsequent releases) all programs and scripts from beginning to end, using the test cases from the test plan:

    - Report programs

    - Query scripts

    - Front-end interface programs

    - Online help function programs

  • Log the actual test results and document any test issues

  • Compare actual test results with expected test results

  • Revise the application programs and scripts

  • Retest the application programs and scripts from beginning to end until they perform as expected

  • Performance test those programs that have many JOINs, require complicated calculations, and read high-volume tables

  • Use full-volume data for performance testing

  • If using a stress test simulation tool, define the test components (programs, databases, tables, volumes, and so on) to the simulation tool and run a simulation test before testing with full-volume data

  • Move databases, programs, and scripts into the QA environment

  • QA test the entire application from beginning to end with the operations staff

  • Obtain approval from the operations staff to move the application programs into production

  • Acceptance test the entire application from beginning to end with the subject matter expert and business representative

  • Obtain certification for the application from the business representative

 

5. Provide data access and analysis training

  • Identify help desk staff to be trained

  • Identify "power users" or other business liaison personnel to be trained

  • Identify business people to be trained

  • Create training materials:

    - Presentation slides and instructor notes

    - Student workbooks with exercises

    - Exercise solutions and other pertinent handouts

  • Schedule training sessions

  • Conduct training sessions

  • Measure training effectiveness

13. Data Mining

1. State the business problem

  • Define the business problem

  • Obtain commitment for a data mining solution

  • Set realistic expectations for the data mining tool

  • Identify preliminary algorithms relevant to the business problem

 

2. Collect the data

  • Identify available data sources (operational as well as BI)

  • Extract pertinent data from various internal data sources

  • Acquire (purchase) pertinent data from external data sources

 

3. Consolidate and cleanse the data

  • Merge data from various internal data sources

  • Match and merge internal data with external data

  • Review the structure of the merged data

  • Select a sample of data for each analytical data model

  • Select related meta data from the meta data repository

  • Review the data domains (content) and measure the quality and reasonability of the data values

  • Validate domain reasonability across active variables

 

4. Prepare the data

  • Review the frequency distribution of categorical variables

  • Review maximum, minimum, mean, mode, and median for quantitative variables

  • Use statistical distribution parameters to filter noise in the data

  • Eliminate variables with missing values or replace the missing values with "most likely" values

  • Convert data formats to suit the particular data mining algorithm used

  • Derive new variables from original input data, where appropriate

  • Consolidate customers by assigning a household number to related customers

  • Relate customers with products and services

  • Apply data reduction, where appropriate

  • Apply data mining transformation techniques, where appropriate:

    - "Discretization" technique to convert quantitative variables into categorical variables

    - "One-of-N" technique to convert a categorical variable to a numeric representation for input to a neural network

 

5. Build the analytical data model

  • Create the analytical (informational) data model

  • Select data mining operations with the appropriate algorithms

  • Test accuracy of the analytical data model using confusion matrices and input sensitivity analyses

  • Repeat prior steps (if necessary) to train and retrain the model (beware of overtraining the model)
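
The confusion-matrix check above can be sketched for a binary model; the actual and predicted labels here are made up for illustration:

```python
from collections import Counter

# Sketch: a 2x2 confusion matrix and accuracy for a binary classifier.

def confusion(actual: list[int], predicted: list[int]) -> Counter:
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))

actual    = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 0]
cm = confusion(actual, predicted)

# Accuracy = correctly classified pairs / all pairs.
accuracy = (cm[(1, 1)] + cm[(0, 0)]) / len(actual)
```

Off-diagonal counts such as `cm[(1, 0)]` (a 1 predicted as 0) are what flag where the model errs, and tracking them across retraining runs helps detect overtraining.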

 

6. Interpret the data mining results

Note

This is an ongoing activity.


  • Review the data mining results

  • Look for results that are interesting, valid, and actionable

  • Present the new findings in a convincing, business-oriented way using visualization technology

  • Formulate ways in which the new information can best be exploited

 

7. Perform external validation of the results

Note

This is an ongoing activity.


  • Compare data mining results to published industry statistics

  • Validate the selection of variables and time frame of your data against the variables and time frame of the industry statistics

  • Identify the variations between your analysis results and the industry statistics

  • Determine the reasons for the variations

 

8. Monitor the analytical data model over time

Note

This is an ongoing activity.


  • Keep validating your analytical data model against industry statistics at regular time intervals

  • When industry statistics change, change your analytical data model and retrain it

  • Research the data mining capabilities of your competitors

  • Monitor your competitors' market share and adjust your analytical data model accordingly

14. Meta Data Repository Development

1. Build the meta data repository database

  • Run the DDL to create the physical meta data repository database structures

  • Run the DCL to grant CRUD authority on the meta data repository database structures

  • If licensing a meta data repository product, set up CRUD authority on the meta data repository product

  • Test all meta data repository product components, especially the meta data repository database

 

2. Build and unit test the meta data migration process

  • Code the tool interface programs or use the export facility of the various tools (for example, ETL tool, CASE tool)

  • Code the meta data transformation programs

  • Code the meta data load programs or use the import facility of the meta data repository product or the DBMS load utility

  • Code the meta data programs that will run during the ETL process to capture:

    - Load statistics

    - Reconciliation totals (record counts, domain counts, amount counts)

    - Data-cleansing (reliability) metrics

    - Data rejection counts and reasons for rejections

  • Unit test the meta data migration programs (or meta data repository product modules):

    - Tool interface programs

    - Meta data transformation programs

    - Meta data load programs

  • Unit test the meta data programs that will run during the ETL process

 

3. Build and unit test the meta data application

  • Code the access interface programs (GUI or Web front end)

  • Code the meta data report programs

  • Code the meta data query scripts

  • Code the meta data repository online help function programs

  • Unit test the meta data application programs (or meta data repository product modules):

    - Access interface programs

    - Report programs

    - Query scripts

    - Online help function programs

 

4. Test the meta data repository programs or product functions

  • Create a test plan with test cases for:

    - Meta data migration process

    - Meta data repository application programs or product modules

    - Meta data programs that run during the ETL process

  • Create test data for meta data repository testing:

    - Meta data migration process

    - Meta data repository application or product modules

    - Meta data programs that run during the ETL process

  • Integration or regression test the meta data repository:

    - Meta data migration process

    - Meta data repository application or product modules

    - Meta data programs that run during the ETL process

  • Log the actual test results and document any test issues

  • Compare actual test results with expected test results

  • Revise the meta data repository programs

  • Retest the meta data repository programs from beginning to end until they perform as expected

  • Conduct QA testing with operations staff

  • Conduct acceptance testing with the subject matter expert and the business representative (QA and acceptance testing may be conducted at the same time)

 

5. Prepare the meta data repository for production

  • Install and test the server platform for the production meta data repository

  • Create DDL and DCL for the production meta data repository database

  • Write operating procedures for the operations staff with instructions for running meta data repository reports at predetermined dates and times

  • Write a reference guide for the help desk staff and the business people with instructions on how to use the meta data repository

  • Develop performance monitoring and tuning procedures for the meta data repository database

  • Develop meta data repository usage monitoring procedures

 

6. Provide meta data repository training

  • Identify help desk staff to be trained on the content and use of the meta data repository

  • Identify "power users" or other business liaison personnel to be trained

  • Identify business people to be trained

  • Create meta data repository training materials:

    - Presentation slides and instructor notes

    - Student workbooks with exercises

    - Exercise solutions and other pertinent handouts

  • Schedule meta data repository training sessions

  • Conduct meta data repository training sessions

  • Measure meta data repository training effectiveness

15. Implementation

1. Plan the implementation

  • Select an implementation strategy (gradual rollout or entire BI application at once to all business people)

  • Set the implementation date

  • Determine the number of business people who will be using the BI application (initially and eventually)

  • Schedule the necessary resources to participate in implementation activities

  • Schedule the functions to be rolled out

  • Prepare for organizational impact

 

2. Set up the production environment

  • Set up the production ETL program library

  • Set up the production application program library

  • Set up the production meta data repository program library

  • Create the production BI target databases (including data mining databases)

  • Create the production meta data repository database

  • Grant appropriate authority on the production BI target databases

  • Grant appropriate authority on the production meta data repository database

  • Grant appropriate authority to developers, operations staff, and business people to execute programs from production program libraries

  • Write operating procedures for operations staff with instructions for running ETL programs and application report programs at predetermined dates and times

  • Write reference guides for the help desk staff and the business people with instructions for how to use the BI application

  • Implement production security levels for all BI application components

 

3. Install all the BI application components

  • Move ETL programs into the production ETL program library:

    - Initial load

    - Historical load

    - Incremental load

  • Move application programs into the production application program library:

    - Reports

    - Queries

    - Front-end interface process

    - Online help function

  • Move meta data repository programs into the production meta data repository program library:

    - Meta data migration programs

    - Meta data application programs or product modules (including meta data repository online help function programs)

 

4. Set up the production schedule

  • Set up the ETL process on the job scheduler

  • Add to the job scheduler the meta data programs that run during the ETL process

  • Set up on the job scheduler the regularly scheduled application report programs

  • Set up on the job scheduler the regularly scheduled meta data repository programs:

    - Meta data migration process

    - Meta data repository application

 

5. Load the production databases

  • Run the initial load process

  • Run the historical load process

  • Run the meta data migration process

 

6. Prepare for ongoing support

  • Establish a schedule for on-call emergency support

  • Schedule database maintenance activities for the BI target databases and the meta data repository database:

    - Database backups

    - Disaster recovery testing

    - Database reorganizations

  • Schedule database monitoring activities for the BI target databases and the meta data repository database:

    - Performance

    - Growth

    - Usage

  • Schedule data quality monitoring activities for the BI target databases:

    - Meta data metrics

    - Quality spot checks

  • Develop or review capacity plans for the BI platform:

    - Processors

    - Disk storage

    - Network components (including bandwidth)

  • Start production processing (go live)

16. Release Evaluation

1. Prepare for the post-implementation review

  • Review budget expenditures

  • Review the original project plan and final schedule

  • Review the estimated and actual task completion times

  • Review the issues log (resolved and unresolved issues)

  • Review the change-control procedure and scope changes

  • Review unfulfilled requirements (dropped from scope)

  • Review the effectiveness of the development approach

  • Review the effectiveness of the team structure

  • Review the effectiveness of the organizational placement

  • Review the existing infrastructure (technical and nontechnical)

  • Identify missing infrastructure pieces (technical and nontechnical) that hindered progress on the BI project

  • Assess the performance of the BI application

  • Review the effectiveness of training

  • Review the implementation (rollout) strategy

  • Review the effectiveness of the release concept

 

2. Organize the post-implementation review meeting

  • Create the preliminary post-implementation review agenda:

    - List date, time, and place

    - List invited attendees

    - List topics for discussion

    - List and assign topics for research

    - List questions to be discussed and answered

  • Solicit additional topics and questions from attendees

  • Send out the preliminary agenda to attendees

  • Schedule the meeting at an off-site location

  • Arrange facilitation by a third party

  • Arrange for a third-party scribe to take notes during the meeting

  • Revise and send the final meeting agenda

  • Send out documentation to be discussed during the review

 

3. Conduct the post-implementation review meeting

  • Introduce the attendees

  • Explain the rules for the facilitated session

  • Discuss each item on the agenda

  • Document discussions, suggestions, resolutions

  • Document action items

  • Assign action items

  • Establish a completion or response date for each action item

 

4. Follow up on the post-implementation review

  • Document unfulfilled requirements (dropped from scope), which should be bundled with new requirements for the next (or a future) BI release

  • Write the meeting minutes

  • Publish the meeting minutes

  • Work on assigned action items

  • Monitor the work performed on action items, especially those that were assigned to people outside the BI project team

  • Document the action item results

  • Publish the action item results

  • Implement nontechnical infrastructure improvements to:

    - Development approach

    - Use of the development methodology

    - Processes and procedures

    - Guidelines

    - Standards



Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
Year: 2003
Pages: 202
