Data Mining Operations


Data mining tools enable statisticians to build analytical models, which the tools then use during data mining operations. A predictive engine takes a list of input criteria and follows the steps and relationships in the analytical model to determine the most likely predictions. The results of data mining operations are tables and files loaded with analysis data that can be accessed with query and reporting tools. The four main data mining operations are described below.

Predictive and Classification Modeling

This data mining operation is used to forecast a particular event. It assumes that the analyst has a specific question to ask. The model provides the answer by assigning ranks that indicate the likelihood of certain classes. For example, a bank analyst who wants to predict which customers are likely to leave has to prepare for predictive modeling by feeding data about two types of customers into the data mining tool:

  1. Customer data that indicates which customers have already left. This data is called "bad" data.

  2. Customer data that indicates which customers stayed and are long-time customers. This data is called "good" data.

The tool then sifts through the data to uncover the variables that distinguish the profiles of typical customers who leave from the profiles of typical customers who stay. The analysis results might be, "A female customer over 40 years of age who has an income greater than $150,000 per year and owns her own home has a 35 percent chance of leaving the bank."
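The idea of scoring a profile from labeled "good" and "bad" data can be illustrated with a very small sketch. It is not how a commercial data mining tool works internally; it only counts, for each customer profile, the fraction of past customers who left. The records, the profile fields (age band, home ownership), and the function name churn_rates are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical labeled history: (age_band, owns_home, left_bank).
# Rows with left_bank=True play the role of "bad" data, the rest "good" data.
history = [
    ("over_40", True, True),
    ("over_40", True, False),
    ("over_40", True, False),
    ("over_40", True, True),
    ("under_40", False, False),
    ("under_40", False, True),
    ("under_40", False, False),
    ("under_40", False, False),
]


def churn_rates(records):
    """Return {profile: fraction of customers with that profile who left}."""
    totals = defaultdict(int)
    left = defaultdict(int)
    for age_band, owns_home, did_leave in records:
        profile = (age_band, owns_home)
        totals[profile] += 1
        if did_leave:
            left[profile] += 1
    return {profile: left[profile] / totals[profile] for profile in totals}


rates = churn_rates(history)
for profile, rate in sorted(rates.items()):
    print(profile, f"{rate:.0%} chance of leaving")
```

A real predictive model would weigh many variables at once and would be validated on data it has not seen; the sketch only shows where a statement such as "has a 35 percent chance of leaving" can come from.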

Typical probing questions for predictive data mining are those that look for associations, patterns, trends, and facts in order to make decisions. For example:

  • Which offers will prompt customers to buy more? (Trend)

  • Which customers should be targeted for a new product? (Association)

  • What are the signs of fraudulent activity? (Pattern)

  • Which customers are better credit risks? (Fact)

Link Analysis

The link analysis data mining operation is a collection of mathematical algorithms and visualization techniques that identify and visually present links between individual records in a database. It is related to the associations discovery and sequential pattern discovery data mining techniques. For example, link analysis can determine which items usually sell together (e.g., cereal and milk).
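As a rough illustration of the "items that sell together" example, the sketch below counts how often each pair of items appears in the same purchase. The baskets and the function name pair_counts are made up for this example, and simple pair counting stands in for the more elaborate algorithms and visualizations a link analysis tool would provide.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records, one set of items per transaction.
baskets = [
    {"cereal", "milk", "bread"},
    {"cereal", "milk"},
    {"bread", "butter"},
    {"cereal", "milk", "butter"},
]


def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in transactions:
        # Sorting gives each pair one canonical order, e.g. ("cereal", "milk").
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, "appear together in", top_count, "transactions")
```

With this sample data, cereal and milk form the most frequent pair, which is the kind of link the operation is meant to surface.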

Database Segmentation

This data mining operation is a set of algorithms that group similar records into homogeneous segments. It is related to the clustering data mining technique. This grouping is often the first step of data selection, before other data mining operations take place. For example, database segmentation may group airline passengers as either frequent flyer passengers or occasional passengers.
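The airline example can be sketched with a minimal one-dimensional clustering routine. The code below is a simplified two-center variant of k-means, assuming that the only attribute is the number of flights per year; the passenger IDs, flight counts, and the function name two_means are hypothetical. Actual segmentation tools work on many attributes and choose the number of segments more carefully.

```python
# Hypothetical flights per year for six passengers.
flights = {"P1": 2, "P2": 1, "P3": 30, "P4": 3, "P5": 42, "P6": 25}


def two_means(values, iterations=10):
    """Find two cluster centers for a list of numbers (1-D k-means, k=2)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to whichever center is closer.
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:
            break  # all values identical, nothing to split
        # Move each center to the mean of the values assigned to it.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


low, high = two_means(list(flights.values()))
segments = {
    passenger: "frequent" if abs(n - high) < abs(n - low) else "occasional"
    for passenger, n in flights.items()
}
print(segments)
```

Note that the segment labels are assigned by the analyst after the fact; the algorithm itself only finds two homogeneous groups.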

Deviation Detection

The deviation detection data mining operation is a set of algorithms that look for records that fall outside some expectation or norm and then suggest reasons for the anomalies. While deviation detection is mainly used for fraud detection, other uses include tracing the potential reasons for declines in customer numbers or sales. For example, "Customers who used to make frequent purchases but have not purchased anything in a long time were either transferred by their companies or have moved away from the area."
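One simple way to find records that fall outside a norm is a standard-score (z-score) check, sketched below. It assumes the only measure is the number of days since each customer's last purchase; the customer IDs, values, threshold of 2.0, and function name find_deviations are illustrative assumptions. The sketch only flags the anomalies. Suggesting reasons for them, as described above, requires further analysis.

```python
from statistics import mean, pstdev

# Hypothetical days since last purchase, per customer.
days_since_purchase = {
    "C1": 12, "C2": 9, "C3": 15, "C4": 11,
    "C5": 14, "C6": 10, "C7": 13, "C8": 240,
}


def find_deviations(measurements, threshold=2.0):
    """Return keys whose value lies more than `threshold` standard
    deviations away from the mean of all values."""
    values = list(measurements.values())
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # every record is identical, so nothing deviates
    return [
        key for key, value in measurements.items()
        if abs(value - center) / spread > threshold
    ]


flagged = find_deviations(days_since_purchase)
print("Customers outside the norm:", flagged)
```

In a small sample a single extreme value inflates the standard deviation, so production tools often prefer more robust measures; the z-score is used here only because it is easy to follow.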



Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
Year: 2003
Pages: 202