Data Mining Activities


The activities for data mining do not need to be performed linearly. Figure 13.2 indicates which activities can be performed concurrently. The list below briefly describes the activities associated with Step 13, Data Mining.

  1. State the business problem.

    Set goals before starting the data mining efforts, and prioritize the goals (such as increasing profits, reducing costs, creating innovative product strategies, or expanding market share). Time and money have to be invested in order to reach any of these goals. Management must also commit to implementing a data mining solution at the organization.

  2. Collect the data.

    One of the most time-consuming activities of data mining is the collection of the appropriate types and quantities of data. To ensure that the data is representative, first identify all the data needed for analysis. This includes data stored in the operational databases, data from the BI target databases, and any external data that will have to be included. Once you have identified the source data, extract all pertinent data elements from these various internal and external data sources.

  3. Consolidate and cleanse the data.

    Redundantly stored data is the norm rather than the exception in most organizations. Therefore, the data from the various sources has to be consolidated and cleansed. If the internal data is to be supplemented by acquired external data, match the external data to the internal data, and determine which content is correct where the two disagree.

  4. Prepare the data.

    Before building an analytical data model, you need to prepare the data. Part of data preparation is the classification of variables. The variables could be discrete or continuous, qualitative or quantitative. Eliminate variables with missing values, or replace the missing values with the most likely values. It provides great insight to know the maximum, minimum, mean (average), median, and mode values for quantitative variables. In order to streamline the preparation process, consider applying data reduction transformations. The objective of data reduction is to combine several variables into one in order to keep the result set manageable for analysis. For example, combine education level, income, marital status, and ZIP code into one profile variable.

  5. Build the analytical data model.

    One of the most important activities of data mining is to build the analytical data model. An analytical data model represents a structure of consolidated, integrated, and time-dependent data that was selected and preprocessed from various internal and external data sources. Once implemented, this model must be able to continue "learning" while it is repeatedly used by the data mining tool and tuned by the data mining expert.

  6. Interpret the data mining results.

    Once the data mining operations are run and results are produced, the next major task is to interpret the results. Important things to consider during this interpretation are how easily the results can be acted upon and whether the results can be presented to business executives in a convincing, business-oriented way.

  7. Perform external validation of the results.

    Compare your results with published industry statistics. Identify the deviations from those statistics and determine the reasons for the deviations. Be sure you are using current industry statistics since they change from time to time. Compare the selection criteria of your data to those of the industry statistics, and compare the time frame during which your data was selected to the time frame covered by the industry statistics. The selection criteria and time frame of your model and of the industry statistics must be similar.

  8. Monitor the analytical data model over time.

    Industry statistics are usually established by using very large samples. It is important to validate your analytical data model against industry statistics at regular intervals. Industry statistics change over time, and some industries have seasonal changes. When the statistics change, adjust your internal analytical model accordingly.
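
As a rough illustration of activity 3 (consolidate and cleanse the data), the sketch below merges redundantly stored internal records and then matches acquired external records to them. The field names (customer_id, zip, income_band) and the rule for resolving conflicts (keep the first non-empty value, and let internal data take precedence over external data) are assumptions made for this example only; your organization's rules for determining the correct content will differ.

```python
def normalize_key(customer_id: str) -> str:
    """Normalize an identifier so the same customer matches across sources."""
    return customer_id.strip().upper()


def consolidate(internal: list[dict], external: list[dict]) -> dict[str, dict]:
    """Merge redundant internal records, then attach matching external attributes."""
    merged: dict[str, dict] = {}

    # Pass 1: collapse redundant internal records onto one entry per customer.
    for record in internal:
        key = normalize_key(record["customer_id"])
        target = merged.setdefault(key, {"customer_id": key})
        for field, value in record.items():
            # Assumed rule: keep the first non-empty value seen for each field.
            if field != "customer_id" and value not in (None, "") and field not in target:
                target[field] = value

    # Pass 2: match external records to internal ones by the normalized key.
    for record in external:
        key = normalize_key(record["customer_id"])
        if key in merged:  # external rows without an internal match are skipped
            for field, value in record.items():
                if field != "customer_id" and value not in (None, ""):
                    # Assumed rule: internal values win over external values.
                    merged[key].setdefault(field, value)
    return merged


# Hypothetical sample data: customer C001 is stored twice internally.
internal_rows = [
    {"customer_id": " c001", "name": "Ann Lee", "zip": ""},
    {"customer_id": "C001", "name": "Ann Lee", "zip": "94105"},
    {"customer_id": "C002", "name": "Raj Patel", "zip": "10001"},
]
external_rows = [
    {"customer_id": "c001", "income_band": "B"},
    {"customer_id": "C999", "income_band": "A"},
]
customers = consolidate(internal_rows, external_rows)
print(sorted(customers))
```

In this sketch, unmatched external records are simply dropped; in practice you may want to log them for review instead.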
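
The data preparation tasks described in activity 4 can be sketched with Python's standard statistics module. This is a minimal example under stated assumptions, not a prescribed method: the median stands in for the "most likely value" of a missing entry, and the income bands and the format of the combined profile variable are hypothetical.

```python
import statistics


def classify(values):
    """Label a variable as quantitative (all numeric) or qualitative."""
    known = [v for v in values if v is not None]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in known
    )
    return "quantitative" if numeric else "qualitative"


def impute_most_likely(values):
    """Replace missing entries (None) with the median of the known values.

    Using the median as the 'most likely value' is an assumption of this sketch.
    """
    known = [v for v in values if v is not None]
    fill = statistics.median(known)
    return [fill if v is None else v for v in values]


def summarize(values):
    """Return the descriptive statistics named in activity 4."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.multimode(values),  # list, since ties are possible
    }


def profile(education, income, marital_status, zip_code):
    """Data reduction: combine four variables into one profile variable.

    The income thresholds and the three-digit ZIP prefix are hypothetical.
    """
    if income >= 75000:
        income_band = "high"
    elif income >= 35000:
        income_band = "mid"
    else:
        income_band = "low"
    return f"{education}|{income_band}|{marital_status}|{zip_code[:3]}"


incomes = impute_most_likely([30000, None, 50000, 50000, 90000])
print(classify(incomes), summarize(incomes))
print(profile("BA", 82000, "married", "94105"))
```

Reducing four variables to one profile code keeps the result set smaller, at the cost of losing detail within each band.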
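
Activities 7 and 8 both come down to comparing your results with industry statistics, first once and then at regular intervals. The sketch below shows one possible way to flag deviations and to check that the time frames are similar. The metric names, the figures, the 10 percent tolerance, and the 31-day allowance are all illustrative assumptions, not published values or recommended thresholds.

```python
from datetime import date


def deviations(model_rates, industry_rates, tolerance=0.10):
    """Return metrics whose relative deviation from the industry figure
    exceeds the tolerance, mapped to that relative deviation."""
    flagged = {}
    for metric, benchmark in industry_rates.items():
        if metric not in model_rates or benchmark == 0:
            continue  # nothing to compare, or relative deviation undefined
        relative = (model_rates[metric] - benchmark) / benchmark
        if abs(relative) > tolerance:
            flagged[metric] = relative
    return flagged


def periods_match(model_period, industry_period, max_gap_days=31):
    """True when the start dates and the end dates of the two
    (start, end) periods each differ by at most max_gap_days."""
    return all(
        abs((m - i).days) <= max_gap_days
        for m, i in zip(model_period, industry_period)
    )


# Made-up figures for illustration only.
model = {"churn_rate": 0.12, "response_rate": 0.031}
industry = {"churn_rate": 0.10, "response_rate": 0.030}

model_period = (date(2002, 1, 1), date(2002, 12, 31))
industry_period = (date(2002, 1, 1), date(2002, 12, 31))

if periods_match(model_period, industry_period):
    for metric, relative in deviations(model, industry).items():
        print(f"{metric}: {relative:+.0%} versus industry")
```

Rerunning the same comparison whenever new industry statistics are published gives a simple trigger for deciding when the analytical data model needs adjusting.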

Figure 13.2. Data Mining Activities




Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
Year: 2003
Pages: 202
