Chapter Twenty-One. Practical Guidelines Matrix


1. Business Case Assessment

  • Identify your organization's strategic business goals. Understand the impact of the external drivers. A common cause of failure for BI decision-support initiatives is that the objectives of these BI initiatives do not align with the strategic business goals of the organization.

  • Let the business representative work with a number of business managers and business executives to define the business value of the BI application. Help him or her differentiate among the needs of different business people. The level of detail, timeliness, accuracy, security, and external data needs will differ for senior managers, knowledge workers, business analysts, sales and marketing personnel, and external clients.

  • Concentrate your efforts on defining the business case with business people from the marketing arm of the organization. In many industries, marketing personnel serve at the forefront of BI decision-support initiatives. Use their business savvy to help identify business benefits, and call upon their influence to sell the value of the BI initiative throughout the organization.

  • Have a clear business driver. You cannot justify BI costs (which can run to $25 million or more) unless you have very specific business reasons to create your BI application.

  • Keep it simple. Start with one business need (business problem or business opportunity) that you would like the BI application to satisfy. With a flexible design, you can add more functionality later, once the initiative has proven itself profitable and once some comfort level has been attained.

  • Clearly state the financial consequences of the current business problems and how these problems could be resolved with a BI solution. Include this in the cost-benefit analysis.
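
As a back-of-the-envelope illustration of such a cost-benefit statement, the sketch below reduces the argument to a simple payback calculation. All figures and variable names are hypothetical assumptions, not recommendations:

```python
# Hypothetical cost-benefit sketch for a BI business case.
# All figures are illustrative assumptions, not guidance.

annual_cost_of_problem = 1_200_000   # e.g., revenue lost each year to the business problem
expected_recovery_rate = 0.40        # fraction the BI solution is expected to recover
bi_build_cost = 900_000              # one-time cost of the first release
bi_annual_run_cost = 150_000         # ongoing support and infrastructure

annual_benefit = annual_cost_of_problem * expected_recovery_rate
net_annual_benefit = annual_benefit - bi_annual_run_cost
payback_years = bi_build_cost / net_annual_benefit

print(f"Annual benefit:     ${annual_benefit:,.0f}")
print(f"Net annual benefit: ${net_annual_benefit:,.0f}")
print(f"Payback period:     {payback_years:.1f} years")
```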

Don'ts

 
  • Don't depend on any single business unit to completely fund the BI initiative. The true business potential for BI usually lies in delivering information that is used across the lines of business.

  • Don't get maneuvered into promising a specific dollar-value return on investment that the BI information is supposed to offer. For example, if a business benefit is the potential for cross-selling, the amount that could be generated from cross-selling is pure guesswork. One selling experience could generate a minute amount, while another could generate enough revenue to cover the entire cost of the first BI application release.

  • Never work alone. Realize that business managers would like to profit from any competitive advantage they can get in this increasingly competitive marketplace. Business managers are quite willing to work with IT on business justification for a BI decision-support initiative.

 

Tips and Rules of Thumb

 
  • Chances to succeed or fail:

    - If the BI decision-support initiative is driven by a business problem, it has a very good chance of success.

    - If the BI decision-support initiative is driven only by the latest trend in technology, it has a very good chance of failure.

  • Keep complexity and integration to a minimum on every BI decision-support project to avoid a higher risk of failure.

  • As early as possible, assess the risk for the six categories of technology, complexity, integration, organization, staffing, and financial investment in order to gain a better understanding of the BI initiative (a simple scoring sketch appears after these tips).

  • Usage: 20 percent of the business people will be using the BI decision-support environment 80 percent of the time. Initially address the needs of those people who form that 20 percent.
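
To make the risk assessment above concrete, here is a minimal scoring sketch. The six categories come from the tip; the 1-to-5 scale, the scores, and the equal weighting are assumptions for illustration only:

```python
# Minimal risk-assessment sketch for the six categories named above.
# Scores (1 = low risk, 5 = high risk) and weights are hypothetical.

risk_scores = {
    "technology":           2,
    "complexity":           4,
    "integration":          4,
    "organization":         3,
    "staffing":             2,
    "financial investment": 3,
}

weights = {category: 1.0 for category in risk_scores}  # equal weighting assumed

weighted_total = sum(risk_scores[c] * weights[c] for c in risk_scores)
max_total = sum(5 * weights[c] for c in risk_scores)
risk_index = weighted_total / max_total  # 0.0 (lowest) .. 1.0 (highest)

for category, score in sorted(risk_scores.items(), key=lambda kv: -kv[1]):
    flag = "  <- mitigate first" if score >= 4 else ""
    print(f"{category:22s} {score}{flag}")
print(f"\nOverall risk index: {risk_index:.2f}")
```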

2. Enterprise Infrastructure Evaluation

 

Section A. Technical Infrastructure Evaluation

  • Pay attention to scalability; avoid any components that scale poorly. Scalability is one of the most important factors to be considered.

  • Watch out for products that offer unnecessary technical elegance. The price you pay for that elegance is either more staff training or more resources.

  • Use tools whenever possible instead of writing your own custom code.

  • Understand the types of analyses the business people need to perform so that you can choose the appropriate tool set.

  • Keep current with technology.

 

Don'ts

 
  • Don't jump to implement a very large database (VLDB) unless the vendor's VLDB features are delivered and have proven themselves.

  • Don't expect to buy a BI decision-support turnkey solution. A BI decision-support environment evolves over time.

  • Don't select a tool without participation by the business people; otherwise they may not buy into it and may not use it.

  • Never assume that because a software product is labeled "OLAP" it will satisfy the analytical reporting needs of the business community.

  • Don't select an OLAP tool just because it is popular. First and foremost, it must have functionality that matches the analytical requirements of the business community, and it must be easy to use.

 

Tips and Rules of Thumb

 
  • Hardware: One of the main features required to support the BI decision-support environment is scalability. Therefore, monitor the rapid changes in the following (a monitoring sketch appears after these tips):

    - Data volume

    - Load frequencies

    - Data access patterns

    - Number of reports and queries

    - Number of tools

    - Number of people accessing the BI target databases

    - Number of operational systems feeding the BI target databases

  • DBMS functions: The necessary and important functions of the selected DBMS for the BI decision-support environment are:

    - Degree of parallelism in handling queries and data loads

    - Intelligence in handling dimensional database designs (optimizers)

    - Database scalability

    - Internet integration

    - Availability of advanced index schemes

    - Replication on heterogeneous platforms

    - Unattended operations

  • DBMS workload: The BI decision-support environment presents an unpredictable mix of workload demands. It is common for a BI application to support hundreds of knowledge workers, business analysts, and business managers performing data access requests that range from simple indexed retrievals to more complex comparative analysis queries. Such an environment requires that the DBMS not only provide efficient complex JOIN processing but also manage and balance the overall workload effectively.
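
As a companion to the hardware tip above, the sketch below shows one hypothetical way to record the listed metrics period over period so growth can be tracked. The field names, the sample numbers, and the use of queries per day as a proxy for data access patterns are all assumptions:

```python
from dataclasses import dataclass

# Hypothetical periodic snapshot of the scalability metrics listed above.
@dataclass
class ScalabilitySnapshot:
    data_volume_gb: float      # data volume
    loads_per_week: int        # load frequencies
    queries_per_day: int       # rough proxy for data access patterns
    reports_and_queries: int   # number of reports and queries
    tools: int                 # number of tools
    active_users: int          # people accessing the BI target databases
    feeding_systems: int       # operational systems feeding the BI target databases

def growth_pct(old: float, new: float) -> float:
    """Percentage growth of one metric between two snapshots."""
    return (new - old) / old * 100.0

january = ScalabilitySnapshot(450, 7, 1200, 85, 3, 60, 4)
june    = ScalabilitySnapshot(730, 12, 2100, 140, 4, 95, 6)

print(f"Data volume growth: {growth_pct(january.data_volume_gb, june.data_volume_gb):.0f}%")
print(f"Active user growth: {growth_pct(january.active_users, june.active_users):.0f}%")
```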

Section B. Nontechnical Infrastructure Evaluation

  • Identify the missing or ineffective nontechnical infrastructure components. These components provide the glue for cross-organizational integration.

  • Pay attention to meta data. Meta data is much more than documentation. It facilitates navigation through the BI decision-support environment and is an integral part of every BI project.

  • Provide standards for your formal project documents. For example, specify that each document must have a title, description, purpose, author, owner, creation date, latest update date, latest version number, revision history, page numbers, and sign-off space.

 

Don'ts

 
  • Don't attempt to build all components of the nontechnical infrastructure at once. Start with standards, then model the common business data architecture, and go from there. This is an iterative refinement process.

  • Don't forget to allocate time and resources on the project plan for working on nontechnical infrastructure activities.

  • Don't skip data standardization because you think it is too difficult or takes too much time. One of the main reasons for a BI decision-support initiative is to address data quality and data standardization.

 

Tips and Rules of Thumb

 
  • Use a meta data repository if you have one. If you do not have one, but you have a fairly sophisticated CASE tool, use that CASE tool in the interim until a more permanent repository solution is established. At a minimum, document the source-to-target data mapping and the transformation rules in some format that can be accessed by the business people and that can later be used to populate a meta data repository. (A mapping-record sketch appears after these tips.)

  • Determine what types of standards, guidelines, and procedures already exist in the organization: what is useful, what is not, and what is missing. The fastest way to assess this is to talk to the staff of the following groups: data administration, strategic architecture, quality assurance, security, audit, standards, training, and even operations.

  • Include at least one nontechnical infrastructure requirement with every BI project. Implement at least 80 percent of its functionality, and plan to revise it in future BI projects.

  • As much as possible, reuse the standards, guidelines, procedures, and so on that already exist in the organization. Try not to reinvent the wheel. Get input from as many people in IT and on the business side as you can to develop a practical and useful nontechnical infrastructure at your organization.
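
One lightweight way to satisfy the minimum documentation tip above is to record each source-to-target mapping and its transformation rule as a structured record that business people can read and that can later seed a meta data repository. A minimal sketch, with illustrative field, system, and file names:

```python
import json
from dataclasses import dataclass, asdict

# Illustrative record for one source-to-target data mapping.
@dataclass
class SourceToTargetMapping:
    source_system: str
    source_field: str
    target_table: str
    target_column: str
    transformation_rule: str

mappings = [
    SourceToTargetMapping(
        source_system="ORDER_ENTRY",                     # hypothetical source
        source_field="CUST_NM",
        target_table="CUSTOMER_DIM",                     # hypothetical target
        target_column="customer_name",
        transformation_rule="Trim whitespace; title-case; reject if empty.",
    ),
]

# Persist in a format business people can read and a repository can later load.
with open("source_to_target_mappings.json", "w") as f:
    json.dump([asdict(m) for m in mappings], f, indent=2)
```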

3. Project Planning

  • Use Business Intelligence Roadmap to create your work breakdown structure and project plan. If you have Microsoft Project, copy and modify the work breakdown structure on the CD included with this book.

  • Insist on having a full-time business representative matrixed into your project. Development work will go much faster because issues can be resolved on the spot. Also, knowledge transfer will occur because the business representative will be part of the project core team. This will also go a long way toward eliminating the "us versus them" syndrome between IT and the business.

  • Create a detailed project charter and use it as your baseline for change control.

  • Have stringent change-control procedures. Never change the scope without renegotiating the constraints of time, budget, resources, and quality. If your project plan was doable before the scope change, it will no longer be doable unless you renegotiate all constraints.

  • Plan to keep an issues log. Be aware that some issues may not get resolved during the project. This may result in a scope change, and some deliverables may have to be deferred to a future release.

  • Perform a detailed risk analysis. Be sure to weigh the risks and provide recommendations and a plan to mitigate them. Also, include a contingency plan in case the risks cannot be avoided.

  • Consider all your assumptions as potential risks, and manage them accordingly.

  • Assess the quality of source data early so that you can have confidence in your estimates for the project completion date.

 

Don'ts

 
  • Never build the BI application "by the seat of the pants" because you will miss tasks, and you will underestimate the effort.

  • Never make any assumptions about knowing intuitively what tasks have to be performed and how long they will take. Document your estimates in a detailed project plan.

  • Don't assemble a large team. Although BI projects are big and complicated, communication and coordination among the team members will slow down the project. Keep the project core team down to about four or five people (never more than seven). Keep each step core team down to two or three people. Remember that multiple roles can be assigned to one person, and that multiple people can share one role.

  • Don't plan to perform all tasks or all activities in Business Intelligence Roadmap. Select only the ones you need and add any additional tasks that may be unique to your BI decision-support project.

  • Don't forget to allocate time to be spent on other common project-related or organizational activities not listed as tasks on your project plan (for example, troubleshooting other systems, attending department meetings, and dealing with computer downtime).

  • Don't forget to allocate time for vacations, sick leave, jury duty, and so on.

  • Don't spend too much time on tuning your estimates, only enough to have confidence in them.

 

Tips and Rules of Thumb

 
  • Review the stages and determine at which stage your project needs to begin. For example, some projects will begin with justification, others will begin with planning, and some short enhancement releases may even begin with design. Then, in a similar vein, review the steps and determine which ones are appropriate for your project. Finally, review the activities as well as the tasks and subtasks, and extract only those you need to perform on your project. Most projects require 90 percent of the activities to be performed for the selected steps.

  • Be prepared to answer the four questions that are always asked about every project:

    - What will be delivered?

    - How much will it cost?

    - When will it be done?

    - Who will be doing it?

  • Project failures: A very high percentage of BI projects are aborted because of inadequate or nonexistent project planning and because of inadequate resources. Don't become one of the casualties.

  • One of the critical success factors is having a very strong business sponsor who understands the release concept of BI projects and who is agreeable to keeping the scope small and the quality high.

  • Skills: It takes one week to acquire a basic skill in a seminar and six months to master it. Be sure you understand the skill set of your BI project team members.

  • Estimates of time required for data cleansing are often missed by a factor of three or four. Estimate the time you think you will need and multiply it by three if your source data is stored in relational databases; multiply it by four if your source data is stored in old flat files. (This rule and the data quality triage below are both encoded in the sketch after these tips.)

  • Data quality: Find out in how many places the data is stored and how many variations of the data exist.

    - If the same customer record is stored in no more than two places and the data is consistent, you have a minor problem.

    - If the same customer record is stored in three to five places and the data is consistent, you have a medium-sized problem.

    - If the same customer record is stored in six or more places and the data is inconsistent, you have a serious problem!

    Work with several important business people who will be using the BI data and discuss your findings with them.
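
The two rules of thumb above (the cleansing-estimate multipliers and the redundant-data triage) can be restated compactly in code. The sketch below only encodes what the text says; the demo inputs are hypothetical, and redundancy combinations the text does not cover are flagged for case-by-case review:

```python
# Rule-of-thumb sketch: cleansing-estimate multipliers and redundancy triage.

def adjusted_cleansing_estimate(raw_weeks: float, source_type: str) -> float:
    """Multiply a raw estimate by 3 for relational sources, 4 for old flat files."""
    return raw_weeks * {"relational": 3, "flat_file": 4}[source_type]

def redundancy_severity(places_stored: int, consistent: bool) -> str:
    """Severity triage; combinations the text does not cover go to review."""
    if places_stored <= 2 and consistent:
        return "minor problem"
    if 3 <= places_stored <= 5 and consistent:
        return "medium-sized problem"
    if places_stored >= 6 and not consistent:
        return "serious problem"
    return "review case by case"

print(adjusted_cleansing_estimate(4, "relational"))  # 4 planned weeks -> 12
print(adjusted_cleansing_estimate(4, "flat_file"))   # 4 planned weeks -> 16
print(redundancy_severity(4, True))                  # medium-sized problem
print(redundancy_severity(7, False))                 # serious problem
```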

4. Project Requirements Definition

  • Prioritize the functional requirements into mandatory, important, and nice-to-have. Concentrate on the mandatory functions and add important functions as time allows. Nice-to-have requirements should be dropped from the project scope.

  • Prioritize data requirements into mandatory, important, and nice-to-have. Identify the exact business needs that will be met by each category. Concentrate on the mandatory and important data, and don't include the nice-to-have data. Remember to keep data scope to a minimum because that is where all the effort is.

  • Determine whether the required source data has ever been gathered and stored in an operational system. You cannot create new data with a BI decision-support application.

  • Try to determine the types of ad hoc queries the business people may want to write. Although "ad hoc" implies that those queries are not yet defined, most business people have a pattern of questions they usually ask on a regular basis. This information will have bearing on the design of the BI target databases.

  • Be prepared to negotiate and renegotiate the final deliverables throughout the project. Unexpected roadblocks can derail a project if the scope is not renegotiated with the business representative and the business sponsor. Keeping the business representative involved in all decisions increases your chances for success despite the obstacles.

 

Don'ts

 
  • Don't work without a business representative. Project requirements must come from business people, not from IT systems analysts.

  • Don't spend too much time in this step on source data analysis. Investigate suspected data quality problems only enough to get a better understanding of the cleansing complexity. Conversely, don't wait until design or testing to discover dirty source data. Preliminary source data analysis must be performed in this step, and rigorous source data analysis will be performed in Step 5, Data Analysis.

  • Don't spend too much time in this step on refining the logical data model, only enough to take it beyond the conceptual level and to add the significant attributes (data elements).

  • Never accept a wish list of data elements and a stack of mock reports as the final requirements. Requirements need to be negotiated.

  • This is not the time to commit to service-level agreements because it is too early to know if they can be met. However, you can document the outer limits of what will be acceptable and what will not be acceptable to the business sponsor and to the business people.

 

Tips and Rules of Thumb

 
  • Always review the scope to see if it is still doable within the constraints of time, budget, resources, and quality. Always balance the constraints and renegotiate them when necessary with the business sponsor and the business representative.

  • Defining requirements is a different activity than designing a solution. Don't let the technicians jump into designing the solution at this time, as many like to do.

  • Try to conduct as many group interviews as possible. The synergy created during these sessions often brings new requirements to the surface, uncovers definitional and quality problems, clears up misunderstandings, and occasionally even resolves disputes.

  • With regard to source data for the BI target databases, going after too much operational data "just in case they'll need it some day" leads to more complex data models and more time and money spent on data extraction, cleansing, and maintenance. Not all data is created equal. Keep in mind that in most cases, only 20 percent of the BI data is regularly used 80 percent of the time.

  • Ask the IT staff to help identify which source data is seldom or never used by the business people. This means that it most probably does not have to be included in the BI project scope. However, let the business representative make the final decision, not the technicians.

5. Data Analysis

  • Ensure that the logical data model reflects the cross-organizational understanding of data and is not an isolated view of one business representative or one department. For example, if the desire is to study customer profiles that cross all product lines in the organization, common customer and product definitions are critical components.

  • Understand the kinds of external data that will be used in the BI decision-support environment. Determine what the basis will be to synthesize the external information with the internal information.

  • Review all business meta data with the data owners and with the information consumers (business managers and business analysts in other departments who use that data) to obtain their concurrence on data definitions, data domains, business rules, and data usage.

  • Establish a dispute resolution procedure to resolve differences among business people from different departments regarding data definitions, data domains, business rules, and data usage. These procedures should suggest the involvement of a BI arbitration board as a last resort.

  • Urge the data owners to accept responsibility and accountability for the quality of their data. Ask them to clean up their operational source data, if feasible.

  • Identify the critical data elements that have real business value. These critical data elements have to be cleansed to meet the business expectations of data reliability.

 

Don'ts

 
  • Never develop a logical data model in isolation. Business people must drive the data requirements: what information they need, how they need it, when they need it, and how clean it has to be.

  • When reviewing the business meta data with business people from other departments, resist including their data requirements in your scope. Remind those other business people that you only want them to validate the correctness of your data definitions, domains, business rules, and so on.

  • Don't ask programmers or systems people to create the business meta data. (Technicians generally do not enjoy analysis, just as business analysts generally do not enjoy programming.) Find people who enjoy detailed business analysis, such as data administrators, business analysts, and subject matter experts.

  • Don't assume that the more current the data, the higher the accuracy. If quality processes are not enforced at an organization, new data can get corrupted within months.

  • Don't underestimate the time it will take to resolve the many battles among the business people to arrive at a consensus on valid data definitions, data domains, business rules, and interpretations of data.

  • Never suck and plunk! In other words, don't plan to move all the source data as is. One of the main reasons for a BI decision-support initiative is gaining access to reliable and clean data. Simply giving access to dirty data through a BI application will not help business executives make better decisions.

  • Don't attempt to cleanse every piece of data because the cleansing task can become overwhelming without providing real business value. Triage your data-cleansing activity.

 

Tips and Rules of Thumb

 
  • The most effective techniques for both data integration and source data analysis are normalization rules and logical data modeling. Involve a data administrator on your project to create the logical data model. Data administration should then compare and merge your project-specific logical data model with the enterprise logical data model to find and resolve data discrepancies.

  • With regard to business rules, use top-down logical data modeling to define the business rules and bottom-up source data analysis to discover where the source data violates the business rules.

  • While the data is being modeled, have the source data analyzed. Since there is never enough time to analyze each data element, prioritize the data elements, and concentrate on the most critical ones.

  • The data analysis effort on BI projects can be greatly reduced if logical data models and business meta data already exist for the operational source systems. Unfortunately, these models and meta data are rarely created for operational systems and, therefore, they rarely exist. The business representative and business sponsor should urge the owners of the operational systems (line-of-business managers) to consider creating logical data models and collecting business meta data for their operational systems in the future.

  • Business managers (and many IT managers) who have not been through data cleansing before and who are unfamiliar with the effort often underestimate the time required by a factor of four.

  • Source data analysis is primarily an intensive manual effort. Although tools can help with the effort, they cannot eliminate it.

  • Solicit help from the IT staff when needed. Systems analysts, developers, and database administrators often know the technical aspects of the data more intimately than the business representative or the data owners do. The IT staff knows how and where the data is stored, processed, and used. They often have in-depth knowledge of the accuracy, the relationships, and the history of the data.

  • Be sure that you have a mechanism for capturing the business meta data during this step. At a minimum, capture the following (a capture-record sketch appears after these tips):

    - Data names

    - Definitions

    - Relationships

    - Cardinality

    - Data domains

    - Business rules

  • Source data quality will only be as good as the enforcement of quality processes in the operational systems. Quality enforcement should include data entry rules, edit checks, and training. Improving the organization's quality of data is a holistic approach and cannot be achieved through data cleansing alone.
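
For the minimum business meta data capture listed above, a simple structured record is often enough to get started. A minimal sketch, with an illustrative example entry (all names, domains, and rules are hypothetical):

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative capture record for the minimum business meta data listed above.
@dataclass
class BusinessMetaData:
    data_name: str
    definition: str
    relationships: List[str] = field(default_factory=list)  # related data names
    cardinality: str = ""                                   # e.g., "one-to-many"
    data_domain: str = ""                                   # allowed values or ranges
    business_rules: List[str] = field(default_factory=list)

customer_id = BusinessMetaData(
    data_name="Customer Identifier",
    definition="Unique identifier assigned to a customer at account opening.",
    relationships=["Account", "Order"],
    cardinality="one-to-many (Customer to Order)",
    data_domain="9-digit numeric, no leading zero",
    business_rules=["Must be unique across all lines of business."],
)
print(customer_id)
```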

6. Application Prototyping

  • Decide on the type of application prototype to create:

    - Show-and-tell

    - Mock-up

    - Proof-of-concept

    - Visual-design

    - Demo

    - Operational

  • Get business people to participate in the prototype from the beginning, especially during needs assessment and graphical user interface construction. Remember, it is their definition of success and failure that counts, not yours.

  • Make sure that time limits are set and followed for each prototype iteration.

  • Limit each prototype to a specific subject area or a specific application function.

  • Review and renegotiate the project constraints whenever the scope of the prototype changes.

  • Communicate daily with the project core team members, especially the database administrator and the ETL lead developer.

  • Make sure that the business people realize that only a prototype is being built, not a full-scale, production-worthy BI application.

  • Learn about the reporting dimensions, the level of aggregation needed, and the variety of data accessed. Test the database design during the prototype.

  • Stop prototyping when you reach the point of diminishing returns.

 

Don'ts

 
  • Don't promise what you cannot deliver. Unfulfilled promises can damage your reputation.

  • Don't include all the project requirements in the scope of the prototype. Even when using the operational prototype as a development method for the access and analysis portion of the BI application, each prototype should deliver only a small piece of the final BI application.

  • Don't keep adding new functionality to the end of the prototype when the original objectives of the prototype have been reached. If you want to continue prototyping, set new objectives, and revise the project plan.

  • It is best not to use an arbitrary sample of source data. Source data is usually voluminous, and an arbitrary sample may not necessarily represent all the possible combinations that need to be tested in that particular prototype.

  • Don't contaminate the prototype with poor-quality data, and don't spend any time cleansing the data unless you are testing an ETL function.

  • Don't address too many data integration requirements. The more integration, the higher the complexity, and the longer the prototype will take. Data integration requirements should be kept to a minimum.

  • Don't ignore the front-end interface design. Be sure to have the business people involved.

  • Don't work only with the business people who are vocal or who are your favorites. You need to have some nonbelievers as well because they will tell you more readily when something is not working. You want to find out during prototyping what is working and what is not. Most importantly, include the business people who "sign the check" for the prototype since the funding is coming out of their budget and they form the core group for whom the prototype is being built.

  • Don't underestimate the time required to implement suggested changes.

  • Don't use the prototype to test technology performance. This is not a stress-testing environment.

 

Tips and Rules of Thumb

 
  • To test ease of use, let the business people have as much hands-on experience with the prototype as possible so that you can see whether the application will be easy to use the way you are planning to build it.

  • Use the prototype activity to build and maintain a coalition comprised of line-of-business managers, IT managers, and senior business executives.

  • Ensure that an appropriate number of business people participate in the prototype (more than one but no more than eight).

  • Avoid using a large project team to build the prototype. Don't add to the team size if deadlines are missed. Instead, shrink the team size! "Bloating" the team will increase the time required for staff communication and slow things down even more. Shrinking the team size will reduce required communication among team members and will enable the team to get things done faster.

  • Business managers often think they need only summary data, but sooner or later they end up asking for detailed data. Depending on your design and the tools you are using, providing detailed data may not be as trivial as it sounds. Be sure to test it in the prototype.

  • Consider building an operational prototype for the access and analysis portion of the BI application. Operational prototypes are robust enough, and access and analysis tools are flexible enough, that this type of prototype could naturally evolve into the final access and analysis application after several tightly controlled iterations. The activities of Step 12, Application Development, could be applied to the final iteration of the operational prototype.

  • Define ease of use. When is a system considered easy to use? Some measurements include the following:

    - Learning curve: One or two days is the maximum that a business person can usually set aside for learning a new application.

    - Speed of task accomplishment: By using the new BI application, knowledge workers and business analysts have to be able to finish their analysis tasks at least 25 percent faster.

    - Subjective satisfaction: The business people should be looking forward to using the new BI application and not avoiding it.

    - The help function: The business people should be using the help function at least once a day for the first month rather than avoiding it because it is too complicated or too confusing.

    - Difference in usage and needs: Not all business people are created equal. Understand the different needs of executive managers versus business analysts.

7. Meta Data Repository Analysis

  • Have the business representative participate in the meta data requirements definition process. Meta data is an important part of every BI project and needs to be given the same amount of attention as business data.

  • Establish data ownership and allow those data owners to control the meta data for the business data over which they have authority. Data ownership can be assumed by business managers individually or by a representative committee.

  • Work with the data administrators to develop or revise cross-organizational data standards and publish those standards.

  • Plan to consolidate all existing meta data from various tools into one enterprise meta data repository.

  • Capture and validate the meta data requirements through a logical meta model using the entity-relationship modeling technique, even if you later decide to license a meta data repository or to build one based on an object-oriented design.

  • Pay equal attention to business meta data and technical meta data.

 

Don'ts

 
  • Don't try to do everything at once, but don't forget the big picture either. The meta data repository will evolve over time just like the other components of the BI decision-support environment.

  • Don't forget to analyze the interface requirements for the meta data repository. There are two types of interfaces to consider: the access interface for business people and technicians, and the tool interface to the ETL, OLAP, CASE, and other tools.

  • Don't consider meta data to be just documentation. Meta data provides the context for business data and is used as a navigation tool in the BI decision-support environment. It is therefore an integral deliverable of every BI application.

 

Tips and Rules of Thumb

 
  • If your organization has no meta data solution and this is the first time you are addressing meta data, triple your time estimates for the development steps in the Meta Data Repository track.

  • Prioritize the meta data components into three categories:

    - Mandatory

    - Important

    - Optional

    All mandatory meta data components should be captured and stored. Also try to capture as many important components as you can, and postpone the optional ones if you run out of time.

  • When creating the logical meta model, draw the entity-relationship diagram for the most critical meta data components:

    - Entity

    - Attribute

    - Relationship rules (cardinality and optionality)

    - Table

    - Column

    - Keys (primary and foreign)

    - Domain

    - Data type and length

    - Definition

    - Transformation rules

    Add additional components to your logical meta model as time allows.

8. Database Design

  • Work with multiple business people to understand the types of analyses they need to perform and the ways they will access the data. Choose the appropriate database design schema based on those requirements and access patterns.

  • Use the entities on the logical data model and the business meta data as a starting point for designing the conformed dimensions and the normalized snowflake dimensions.

  • Review the lessons learned from the prototyping activities. See how many reports and queries can be run in parallel, what dimensions are needed, and what security is needed.

  • Identify similar reporting patterns among business analysts in order to minimize the number of star schemas needed.

  • Plan to monitor the performance of queries and reports on a regular basis.

  • Expect to continually refine the BI target database designs.

  • Make decisions about clustering tables, partitioning, data placement, and indexing with performance in mind.

  • Index those columns that are searched on frequently and that have a high distribution in values.

 

Don'ts

 
  • Never attempt to implement a fully normalized logical data model! The purpose of a normalized logical data model in any environment (operational or BI) is to facilitate business analysis. Even if an entity-relationship design schema is chosen, the logical data model must still be denormalized into a physical data model (database design).

  • Don't assume that one size fits all. Different business people have different access requirements; they need different levels of detail and summarization; and they have different requirements for timeliness, availability, and quality.

  • Don't assume that all BI target databases must be multidimensional. There are occasions where an entity-relationship design is appropriate. Be sure to let the requirements drive the database design decision, not the latest technology or design trends.

  • Don't count on being able to reload your BI target databases from scratch after a catastrophic database failure. It would take a very long time, if it could be done at all. Plan on taking frequent backups (full, partial, or incremental).

  • Don't blindly use a physical data model (database design) developed for a different organization. Often those models reflect compromises made for a given organization, which could easily lead to having to redesign your database later.

 

Tips and Rules of Thumb

 
  • Database administrators, not programmers, should design databases. Do not use database administrators as "data entry clerks" who simply type up and run DDL handed to them by the programmers. The job description of database administrators includes database design for the main reason that database administrators are or should be trained in the DBMS-specific optimizers as well as in multidimensional design techniques.

  • Understand the differences between operational databases and BI target databases. Key features in well-designed operational databases are normalization, utilization of DBMS referential integrity, and judicious use of indexing. Key features in BI target databases are just the opposite: denormalization, reliance on program-enforced referential integrity, and heavy indexing.

  • Many BI target databases will almost inevitably fall into the VLDB category.

    - Small databases are in the range of 10 to 100 GB

    - Medium databases are in the range of 100 to 300 GB

    - Large databases are in the range of 300 to 800 GB

    - VLDBs are in the range of 800 GB to many terabytes

    Since a BI target database is grow-only, today's small database is next year's medium (or large) database.

  • Clustering is a very useful technique for sequential access of large amounts of data. Since sequential access of data is the norm in BI applications, using this technique can dramatically improve performance.

  • Physically co-locate related tables on the disk drive.

  • When indexing the dimension tables, either of two extreme approaches may pay off.

    - Index everything in sight.

    - Leave the table as a heap (no physical index or ordering whatsoever).

    The decision depends on the makeup, usage, size, and distribution of data values in the dimension tables.

  • When should an index not be built?

    - When any single value accounts for more than 15 percent of the index entries, building the index does not pay off. For example, an index built on the column Gender_Code will contain about 50 percent entries for the value "female" and 50 percent entries for the value "male."

    - When the data is always searched sequentially, so the index is never used.

    - When performance still does not improve.

    - Or worse, when building the index degrades performance even further because of the overhead of adding, dropping, and maintaining the index.

    (A selectivity-check sketch appears after these tips.)

  • When should the databases be reorganized?

    - When 5 percent of the records are inserted or deleted.

    Removing fragmentation improves performance.
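
The 15 percent selectivity rule above can be checked mechanically before building an index. A minimal sketch, using hypothetical columns:

```python
from collections import Counter

# Sketch of the 15 percent rule above: if any single value accounts for more
# than 15 percent of a column's entries, an index on it is unlikely to pay off.

def worth_indexing(column_values, threshold: float = 0.15) -> bool:
    counts = Counter(column_values)
    most_common_share = counts.most_common(1)[0][1] / len(column_values)
    return most_common_share <= threshold

gender_code = ["F", "M"] * 1_000      # ~50 percent each: poor index candidate
print(worth_indexing(gender_code))    # False

customer_id = range(10_000)           # unique values: good index candidate
print(worth_indexing(customer_id))    # True
```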

9. Extract/Transform/Load Design

  • Remember that the more data elements you include in the scope, the more transformations will need to be coded, and the longer the ETL process will run.

  • Let your transformation requirements drive the selection of an ETL tool, not marketing hype from the vendors.

  • Create the load files for all BI target databases for the same load period at the same time from the same ETL process. Loading one BI target database only to turn around and read it again to extract, transform, and load another BI target database is too time-consuming and unnecessary.

  • Share one ETL process and staging area. Do not allow each data mart to have its own ETL process because that produces stovepipe systems. A staging area can be decentralized (running different ETL functions on different platforms), but the only valid reasons for decentralizing are different types and locations of source files and source databases, and the functions, capabilities, and licensing terms of the ETL tool. Nevertheless, the ETL staging area should be managed as one logical unit.

  • Include data quality metrics and reconciliation totals in the ETL process design for every program that moves or manipulates data (a combined sketch follows this list).
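
Putting the single-ETL-process and reconciliation-totals dos together, the sketch below reads a hypothetical source once, writes load files for two illustrative targets in the same pass, and accumulates totals that could be stored with the load cycle's meta data. All names and rows are assumptions:

```python
import csv

# One ETL pass: read the source once, write load files for two hypothetical
# targets, and record reconciliation totals for the load cycle.

source_rows = [
    {"customer_id": "101", "region": "EAST", "amount": "250.00"},
    {"customer_id": "102", "region": "WEST", "amount": "99.50"},
]

totals = {"records_read": 0, "amount_sum": 0.0}
with open("dw_load.csv", "w", newline="") as dw, \
     open("mart_load.csv", "w", newline="") as mart:
    dw_writer = csv.DictWriter(dw, fieldnames=["customer_id", "region", "amount"])
    mart_writer = csv.DictWriter(mart, fieldnames=["region", "amount"])
    dw_writer.writeheader()
    mart_writer.writeheader()
    for row in source_rows:                  # single read of the source
        totals["records_read"] += 1
        totals["amount_sum"] += float(row["amount"])
        dw_writer.writerow(row)              # full detail for the warehouse
        mart_writer.writerow({"region": row["region"], "amount": row["amount"]})

print(totals)  # reconciliation totals to store with the load cycle's meta data
```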

 

Don'ts

 
  • Don't develop the ETL process flow without the assistance and participation of the database administrator. Database administrators often know about tricks of the trade that can help streamline the ETL process flow.

  • Never limit the ETL process to the technical conversion rules. ETL is much more than converting the data type and length of your source data structures to your target data structures. It also includes transformation logic for business data domain rules and business data integrity rules.

  • Don't automatically delete rows from the BI target databases after discovering that an operational record was deleted from a source file or source database. Develop business rules for the ETL process, which specify when to propagate operational record deletions into the BI target databases and when not to propagate them.

  • Don't overlook aggregations and summarizations for the data marts. These need to be incorporated toward the end of the ETL process flow.

  • No matter how advanced the ETL tool is, don't rely on it to know exactly how to populate the BI target databases by following standard conversion rules. Only the business people and IT understand the business rules that are buried in the source data and in the operational programs.

 

Tips and Rules of Thumb

 
  • About 80 percent of ETL work is in the "T" (transform). It is the most complicated process of any BI decision-support application. Be sure to allocate enough time for designing the ETL processes. The success of your BI initiative may depend on it.

  • At a minimum, transformations should reconcile different expressions of the same data from different sources. Transformations should also enforce business data domain rules and business data integrity rules.

  • The biggest challenge in determining the correct transformation and cleansing specifications is finding people who understand the origin and history of the source data. Do not limit yourself to the business people. Look for programmers who have been around for many years and who know the history of some of the source files and source databases.

  • If loading past history is not required with the initial implementation of the BI application, try to postpone building the historical load process until the next release. That will reduce the scope of the project and speed up delivery of the BI application release.

  • Turn off referential integrity during the ETL load cycle, and turn it back on after the load has completed to allow the DBMS to find referential integrity violations. Look for tables in "check pending."

  • Drop all indices during the ETL load cycle, and recreate them after the load has completed.
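
The exact statements for the last two tips are DBMS-specific; "check pending," for example, is DB2 terminology. The sketch below therefore only illustrates the sequencing through a generic Python DB-API connection, and every table, index, and constraint name is a hypothetical placeholder:

```python
# Sequencing sketch only: drop indices and disable referential integrity before
# the load, then restore both afterward so the DBMS can surface violations.
# SQL syntax varies by DBMS; the statements below are placeholders.

def run_load_cycle(conn, load_fn):
    cur = conn.cursor()
    cur.execute("ALTER TABLE sales_fact DROP CONSTRAINT fk_sales_customer")  # hypothetical
    cur.execute("DROP INDEX ix_sales_fact_date")                             # hypothetical
    try:
        load_fn(conn)  # bulk load runs without RI checks or index maintenance
    finally:
        cur.execute("CREATE INDEX ix_sales_fact_date ON sales_fact (sale_date)")
        cur.execute(
            "ALTER TABLE sales_fact ADD CONSTRAINT fk_sales_customer "
            "FOREIGN KEY (customer_id) REFERENCES customer_dim (customer_id)"
        )  # re-enabling RI lets the DBMS flag violations (e.g., check pending)
        conn.commit()
```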

10. Meta Data Repository Design

  • Review and expand (if necessary) the meta data repository design with every BI project. Be sure your design can accommodate expansion.

  • Evaluate off-the-shelf meta data repository products, which have the capability to integrate business meta data with technical meta data. It may be easier and faster to install a product than to build custom software, especially when the schedule is tight.

  • Follow up with client references provided by vendors when licensing (buying) a meta data repository product. Ask the vendors' clients what the vendor products cannot do and what they don't like about the products.

  • Design a reusable interface between the meta data repository and other tools (CASE, ETL, OLAP, report writers, other access and analysis tools).

  • Provide a context-sensitive online help function for meta data. This feature is invaluable to business people, who will use it to help them navigate not only through the meta data repository but also through the BI decision-support environment as a whole.

 

Don'ts

 
  • Don't shortcut the meta data repository design by modeling only the most common meta data entities. When building a customized solution, design it to be fully functioning, even though you may not implement all of the features at once.

  • Don't fail to automate the integration process of linking the business meta data with the technical meta data. Linking (relating) those two types of meta data manually is a labor-intensive process.

  • Don't forget to include meta data components for capturing metrics, such as data reliability factors, reconciliation totals, and load statistics.

  • Don't even consider any meta data repository products that do not satisfy your mandatory meta data requirements.

  • Don't forget to evaluate the vendors in addition to their products. Vendor stability is important for sustaining a meta data repository solution.

 

Tips and Rules of Thumb

 
  • Since the meta data repository is a database, many tips and rules of thumb that are applicable to database design are also applicable to meta data repository design.

  • Start with a central meta data repository database because distributed meta data repositories and XML-enabled meta data solutions are more difficult to build and maintain.

  • Start with an entity-relationship design rather than an object-oriented design; it is easier to implement and easier to comprehend.

  • About 90 percent of direct access to the meta data repository will be from the business people and only about 10 percent from the technicians. Meta data helps business people understand the meaning of the data, the quality of the data content, where the data came from (source system), and how to use it. Therefore, spend time on designing an easy-to-use access interface to the meta data repository.

11. Extract/Transform/Load Development

  • Run as many ETL programs in parallel as possible to cut down on runtime. Since the ETL staging window at most large organizations is very short (often only a few hours per night), be prepared to run your ETL process over several nights.

  • Produce reconciliation totals for record counts, domain counts, and amount counts. These totals should be stored in the meta data repository for every load cycle.

  • Produce a detailed error accounting for source data that failed the edit rules and was rejected, as well as for source data that failed the edit rules but was accepted. Errors should be categorized, such as number of missing values, number of domain violations, number of business rule violations, and so on (an error-accounting sketch follows this list).

  • Perform rigorous testing with formal test plans, test cases, and expected test results.

  • Involve the business representative in writing the test cases and expected test results.

  • Use a stress test simulation tool to project estimated performance numbers before running an actual performance (stress) test with real data.

  • Use peer reviews or Extreme Programming (XP) techniques for quality control.
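
The error-accounting do above can be implemented as a small edit-check pass that counts failures by category. A minimal sketch with hypothetical edit rules and records:

```python
from collections import Counter

# Sketch of the error accounting described above: every source record that
# fails an edit rule is counted by category. Rules and records are hypothetical.

def edit_errors(record):
    errors = []
    if not record.get("customer_name"):
        errors.append("missing value")
    if record.get("state") not in {"NY", "CA", "TX"}:
        errors.append("domain violation")
    if float(record.get("amount", 0)) < 0:
        errors.append("business rule violation")
    return errors

records = [
    {"customer_name": "Acme", "state": "NY", "amount": "10.0"},
    {"customer_name": "",     "state": "ZZ", "amount": "-5.0"},
]

accounting = Counter()
for r in records:
    for e in edit_errors(r):
        accounting[e] += 1

print(dict(accounting))  # categorized totals to store with the load cycle
```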

 

Don'ts

 
  • Developers should not test their own code; however, they can test the code of other developers.

  • Don't limit the business representative and the subject matter expert to testing only the access and analysis portion of the BI application. Be sure they are also involved in testing the ETL process.

  • Don't consider integration or regression testing completed until all programs in the ETL process run as expected from beginning to end. In other words, don't just test individual modules until they are error free, but test and retest the entire job stream until it runs error free.

  • Never skip testing because you think you can fix problems in the next release. If the BI target databases are not loaded properly with correct data, the BI application is of no use to the business people.

  • Don't expect to cleanse the source data in the operational systems. The operational systems staff actually expect their dirty data to run successfully! In many cases, modifying an operational system is not cost-effective. On the other hand, old practices and bad habits that produce these data quality problems should be addressed. Ask your business sponsor to make the owners of the operational systems and the business executives aware of the cost and effort it takes to cleanse their bad data for the BI decision-support environment. Some general data quality standards should be implemented across the entire organization to avoid perpetuating data quality problems.

 

Tips and Rules of Thumb

 
  • Organizations devote close to 80 percent of the BI project time to back-end efforts, including labor-intensive data cleansing. Although tools can help with assessing the extent of data quality problems in the operational systems, they cannot magically turn bad data into good data.

  • Cleansing data is a time-intensive and expensive process. Analyze, prioritize, and then choose your battles since cleansing 20 percent of the enterprise data may solve 80 percent of the information needs.

  • About 80 percent of the data transformation effort is spent on enforcing business data domain rules and business data integrity rules, and only about 20 percent of the effort is spent on technical data conversion rules.

  • The most common symptoms of dirty source data are data inconsistencies and overuse of data elements, especially in old flat files, where one data element can explicitly be redefined half a dozen times or can implicitly have half a dozen different meanings.

  • Why use automated software tools for data transformation? Data-profiling tools can significantly shorten the time it takes to analyze data domains. ETL tools can perform data type and length conversions and code translations in minutes, rather than hours when done manually. However, note that writing data-cleansing algorithms is still a manual effort and must be performed before the ETL tool can be utilized.

  • The ETL process will run into fewer problems if extensive source data analysis is performed ahead of time. Source data rules are usually discovered proactively during requirements gathering, data analysis, and meta data repository analysis. They are discovered reactively during prototyping, application development, ETL development, and when loading the BI target databases during implementation.

12. Application Development

  • Let the business people define what types of analyses they will perform. Let their analysis needs drive the choice of tools to be deployed. And, let the requirements of the tools define the schema in which the data should be stored.

  • Keep abreast of the latest BI tools. Standards and tools are evolving constantly, and new tools and features leapfrog each other. The features that used to clearly delineate OLAP strengths and weaknesses (for example, indexing approaches, size barriers and scalability, API standardization) have become less important as tools mature and add features.

  • Determine and communicate the strengths and weaknesses of each access and analysis tool, whether it is a report writer, an OLAP tool, or another query tool.

  • Consider selecting a product suite that combines querying, reporting, and analysis capabilities for multidimensional analysis. Querying, reporting, and analysis are interrelated, interactive, and iterative and can be implemented with one tool. A business person should not have to switch between tools for different types of analyses, such as from the "what" analysis (query and reporting) to the "why" analysis (OLAP).

 

Don'ts

 
  • Don't start with every possible dimension to satisfy every conceivable multidimensional query. Keep in mind that the more dimensions you have, the bigger the database, the more granular the facts, the longer it will take to precalculate the facts, the longer the ETL process will run, and the longer it will take to run reports and queries (see the arithmetic sketch after this list).

  • Don't plan to store every possible computation and ratio (fact) just because you can.

  • Don't forget to stress test the access and analysis components of the BI application. OLAP tools have their limitations, and you must find those limitations early in order to adjust your design before moving the BI application into production.

  • Avoid having completely different production and development platforms. In many organizations, the development platform is very "rich" in resources, but the production machines are very "meager." As a result, performance is outstanding during development and terrible in production.
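
The multiplicative cost warned about in the first don't above is easy to demonstrate: the number of potential precalculated cells is the product of the dimension cardinalities, so each added dimension multiplies the total. The cardinalities below are hypothetical:

```python
from math import prod

# Illustration of why each added dimension multiplies the work: potential
# precalculated cells = product of dimension cardinalities (hypothetical values).

dimensions = {"time (months)": 36, "product": 500, "region": 12}
print(f"{prod(dimensions.values()):,} potential cells")    # 216,000

dimensions["sales channel"] = 8   # one more dimension...
print(f"{prod(dimensions.values()):,} potential cells")    # ...1,728,000
```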

 

Tips and Rules of Thumb

 
  • Find out how many dimensions the business people commonly use for slicing and dicing.

    - Two or three dimensions are easily grasped.

    - Four dimensions require training and practice for comprehension (the fourth dimension is usually the time dimension).

    - Five and six dimensions get difficult to comprehend and to use effectively.

    - Seven should be the maximum number of dimensions used.

  • In order to optimize performance, store and use precalculated facts for slicing-and-dicing analysis. A simple query should be able to run in less than 5 seconds. Business people are not interested in knowing about the delay in response time because of the sizes of the tables or because of inefficient indexing. They expect reasonable performance, even for complex queries.

  • The complexity of queries run against the BI target databases generally breaks down as follows:

    - About 80 percent of the queries are simple.

    - About 20 percent of the queries are complex.

  • The breakdown of the complexity of reports run against the BI target databases is generally the opposite of that for queries.

    - About 20 percent of the reports are simple.

    - About 80 percent of the reports are complex.

  • Knowledge workers and business analysts executing simple queries are like farmers: they harvest their crops on a regular basis. In order to get better performance, prebuild tables to satisfy these routine requests.

  • Knowledge workers and business analysts executing complex queries are like gold miners: they dig many mines before they strike gold, so they need a powerful platform to let them look for their data and manipulate it.

13. Data Mining

  • Plan to incorporate data mining into the BI decision-support environment. Although data mining can use data from operational files and databases, BI target databases usually provide the only place where data has been cleansed and consolidated across functional boundaries.

  • Work with sales or marketing groups to bring in data mining technology. These groups are the most likely groups to understand the business value of data mining and may champion the data mining effort.

  • Spend some time and effort researching data mining and knowledge discovery techniques and methods before selecting products or hiring consultants. The technology itself takes various approaches and employs high-level applied mathematics, advanced statistics, and artificial intelligence. Since few vendors offer a full range of data mining techniques and methods, most will be biased toward whatever is used in their products.

  • Hire consultants specializing in data mining to help set up your data mining environment and to help interpret the data mining results. Data mining is difficult; it requires a statistical background to interpret data mining results.

  • Start the data mining efforts with realistic expectations. Setting expectations too high may result in disappointment. It is commonly known that your satisfaction depends on your expectations.

 

Don'ts

 
  • Don't forget to work with other departments to find additional potential applications for data mining. Sales and marketing are not the only departments that can benefit; credit risk, manufacturing, acquisition, billing, and human resources can also utilize data mining.

  • Don't get fooled by software vendors who claim to have data mining capabilities in their products, when their tools are just query tools that require the business analyst to be the analytical engine.

  • Don't believe software vendors who say that data mining does not offer any real advantages.

  • Don't assume that you must have a BI decision-support environment in order to implement data mining. Also, don't assume that every BI target database will be a suitable source for data mining.

  • Don't run data mining directly against operational files and operational databases. The performance impact on the operational systems would be enormous.

 

Tips and Rules of Thumb

 
  • Beware of using operational data for data mining. Operational data is often full of duplicates, inconsistencies, and errors. Using this data could throw off your data mining results.

  • Before building an analytical data model, you have to prepare the data by classifying the variables. Variables can be discrete or continuous, qualitative or quantitative. If you find variables that have missing values, either eliminate the variables or replace the missing values with "most likely" values (a mode-imputation sketch appears after these tips).

  • Compare your data mining results with industry statistics on a regular basis. Industry statistics are established periodically by using very large samples of data.

  • When using data mining, one or more of the following marketing opportunities should be realized in order to cost-justify the activity:

    - Cross-selling should be enabled or enhanced as a result of data mining efforts.

    - A large percentage of customers should be retained. These customers would have left if a defection pattern had not been noticed through data mining.

    - Marketing costs should be reduced as a result of data mining efforts.

    - New customers should be acquired as a result of data mining efforts.

    However, it is challenging to know when to attribute these changes to the data mining efforts and when these changes would have occurred naturally without the data mining efforts.
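
For the variable-preparation tip above, replacing missing values with the "most likely" value is often implemented, for discrete variables, as mode imputation. A minimal sketch with hypothetical data:

```python
from collections import Counter

# Sketch of the "most likely value" replacement mentioned above: fill missing
# entries in a discrete variable with its mode. Data is hypothetical.

def impute_with_mode(values, missing=None):
    observed = [v for v in values if v is not missing]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is missing else v for v in values]

marital_status = ["married", "single", None, "married", None, "single", "married"]
print(impute_with_mode(marital_status))
# ['married', 'single', 'married', 'married', 'married', 'single', 'married']
```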

14. Meta Data Repository Development

  • Provide interactive and context-sensitive meta data query capabilities. Using the meta data repository should be easy and intuitive.

  • Keep the meta data repository in synch with the meta data contained in other tools and in the DBMS. Unfortunately, meta data repositories are still passive, which means that the synchronization has to be performed manually or through custom-written programs.

  • Actively maintain the meta data repository. If the content of the meta data repository becomes stale, it will affect the business people who are relying on the completeness and accuracy of the meta data. It will make the meta data repository questionable, and the business people may stop using it.

 

Don'ts

 
  • Don't underestimate the effort required to build and maintain a meta data repository, especially if you choose a decentralized or distributed approach. Building gateways and portals is not a trivial activity.

  • Don't attempt to develop a meta data repository with part-time people who will leave after the meta data repository is implemented. You need at least one dedicated person to maintain the meta data repository on an ongoing basis.

  • Don't overlook existing sources of meta data. Organizations are sometimes drowning in documentation. The challenge will be to separate the valuable, current information from outdated information.

 

Tips and Rules of Thumb

 
  • Developing a meta data repository solution is not an event; it is an evolution. It starts with the first BI decision-support initiative and continues to evolve on an ongoing basis. Whenever new business data is added to the BI target databases, new meta data is added to the meta data repository (for example, names, definitions, and domains of the new business data). Whenever new functionality is added to the BI applications, new meta data is added to the meta data repository (for example, calculations and reconciliation totals for the new functions).

  • Be prepared to develop two types of meta data repository interfaces:

    - Access interface for business people and technicians

    - Tool interface for sharing meta data between the meta data repository and the tools where meta data is originated

  • Be sure to understand the import/export features of the tools from which you will extract meta data (for example, CASE, ETL, and OLAP tools). Depending on how much or how little commonality there is among the tools, you may have to write several different tool interface programs (see the sketch after this list).

  • If built correctly and diligently, a meta data repository can reach an estimated 5 percent of the total size of the BI target databases.

  • Keeping the meta data repository up to date and synchronized with other tools, databases, and programs is a challenge. It will require collaboration from all stakeholders on all BI projects. The bigger challenge is to get the operational systems people to communicate any changes to their source data immediately. Changes to source data may also affect the ETL process and the design of the BI target databases.

  • Remember that for every day of coding, you will probably spend at least three days on testing.
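
As an illustration of a tool interface program, the following Python sketch reads a hypothetical CSV metadata export and loads it into a repository table. The export layout, column names, and table name are assumptions; each tool's actual export format will differ, which is exactly why several interface programs may be required.

```python
# Minimal sketch of a tool interface program: parse a (hypothetical)
# CSV metadata export and load it into a repository table.
import csv
import io
import sqlite3

etl_export = io.StringIO(
    "table_name,column_name,definition\n"
    "CUSTOMER,CUST_ID,Unique customer identifier\n"
    "CUSTOMER,REGION_CD,Two-letter sales region code\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta_repository "
    "(table_name TEXT, column_name TEXT, definition TEXT)"
)

for row in csv.DictReader(etl_export):
    conn.execute(
        "INSERT INTO meta_repository VALUES (?, ?, ?)",
        (row["table_name"], row["column_name"], row["definition"]),
    )

for record in conn.execute("SELECT * FROM meta_repository"):
    print(record)
```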

15. Implementation

  • Work closely with the operations staff in creating the production program libraries and moving all programs into the production environment.

  • Include regular database maintenance activities in the job schedule, such as database backups and reorganizations.

  • Use appropriate monitoring and alert utilities to detect and diagnose resource problems early.

  • Work with your organization's security officer to prepare a security gap analysis matrix, and make sure that only authorized persons can access the data that is intended for their use (a sketch of such a matrix follows this list).
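
A security gap analysis matrix can be as simple as a mapping of roles to the access they currently have versus the access they should have. The following Python sketch illustrates the idea; the role names and data areas are hypothetical.

```python
# Minimal sketch of a security gap analysis; roles and data areas
# are hypothetical examples.
required = {
    "business analyst": {"sales facts", "customer dimension"},
    "external client": {"sales facts"},
}
current = {
    "business analyst": {"sales facts"},
    "external client": {"sales facts", "customer dimension"},
}

for role in required:
    missing = required[role] - current.get(role, set())
    excess = current.get(role, set()) - required[role]
    if missing:
        print(f"{role}: grant access to {sorted(missing)}")
    if excess:
        print(f"{role}: revoke access to {sorted(excess)}")
```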

 

Don'ts

 
  • Don't try to roll out a BI application to the entire organization all at once. Use an incremental implementation approach, learn as you go, and make adjustments where needed (for example, provide training earlier in the process or change the log-on procedures).

  • Don't skimp on support staff. A BI decision-support environment is very complex, and the business people will need mentoring.

  • Don't offer open Internet access to the BI target databases without properly tested security measures. Improper security measures could allow competitors to view sensitive organizational data, allow unauthorized people to see customer data, or allow someone to misuse or abuse displayed information. In the worst case, you may also be liable for fines, legal consequences, and so on.

 

Tips and Rules of Thumb

 
  • The standard engineering maxim for sizing databases is to estimate the volume of your business data, then triple it to get a realistic size (if a lot of indices are created, quadruple it). The allocation of space is often distributed as follows (a sizing sketch follows this list):

    - 30 percent business data

    - 30 percent indices

    - 30 percent summaries, aggregates

    - 10 percent miscellaneous

  • Based on the experience of other organizations, you can assume that your BI decision-support environment will double in size every two years. Some believe this is a conservative estimate.

  • Many BI target databases are by their nature VLDBs (very large databases). Therefore, stop conserving disk space if the BI application is helping the organization to make a profit.

  • Because of the immense size of VLDBs, backing up entire tables all at once may not be possible. Consider incremental ("net change") backups, high-speed mainframe backups, or partial backups of partitions.

  • Monitor computer utilization, network utilization, and personnel utilization closely to avoid unexpected bottlenecks.
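
The sizing rules of thumb in this list lend themselves to a quick calculation. The following Python sketch applies the triple rule, the typical allocation percentages, and the two-year doubling estimate; the 500 GB starting figure is hypothetical.

```python
# Minimal sizing sketch; the raw-data estimate is hypothetical, while
# the multipliers and percentages come from the rules of thumb above.
raw_data_gb = 500                       # estimated volume of business data

total_gb = raw_data_gb * 3              # triple it for a realistic size
heavily_indexed_gb = raw_data_gb * 4    # quadruple it if many indices exist

allocation = {
    "business data": 0.30,
    "indices": 0.30,
    "summaries, aggregates": 0.30,
    "miscellaneous": 0.10,
}
for component, share in allocation.items():
    print(f"{component}: {total_gb * share:.0f} GB")

# Growth projection: the environment tends to double every two years.
for years in (2, 4, 6):
    print(f"after {years} years: {total_gb * 2 ** (years // 2)} GB")
```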

16. Release Evaluation

  • Conduct the post-implementation review meeting offsite to eliminate interruptions.

  • Invite all project team members (from the core team as well as the extended team). You may invite additional stakeholders as observers if they have a vested interest in the lessons learned from the BI project.

  • Stay focused on the agenda during the meeting. Schedule a second meeting if the topics cannot be covered adequately in the time allocated.

  • Send out a preliminary agenda early and invite attendees to add items to it.

  • End on a positive note and with assigned action items for future development improvements and possible business process improvements.

 

Don'ts

 
  • Don't pretend you are using the release concept if you don't follow the BI application release guidelines. To simply hurry an incomplete project to implementation with little thought, no structure, and poor quality is not in the spirit of the release concept.

  • Never accept the excuse that "nobody has time for a one-day post-implementation review." This indicates a lack of commitment from management to improve the quality and speed of the development process, which they claim they want.

  • Don't review voluminous documentation during the meeting; it is unproductive. However, if some documents will be referenced during the meeting, send them out ahead of time so that the attendees can read them before the meeting.

  • Don't let the meeting deteriorate into a "finger-pointing" session.

  • Don't postpone the meeting beyond two to three months after implementation because people will lose interest and forget the issues.

  • Don't use anyone from the core team to be the facilitator or the scribe. Core team members are major players and contributors during the post-implementation review. Their level of participation would be diminished if they had to lead the session or take notes. Have a trained third-party facilitator conduct the meeting, and have a third-party scribe take notes during the meeting.

 

Tips and Rules of Thumb

 
  • Implement your BI applications using the release concept. It is much better to deliver high-quality, partially functioning application releases over time than to deliver a low-quality, completed application fraught with defects and dirty data. If the first release is successful, new requirements will emerge as the business people get used to the iterative development process.

  • Post-implementation reviews should always be performed after each BI project. The purpose of these reviews is to document lessons learned and to improve the BI development approach. These lessons can also be shared with other project teams and business managers.


