Data Mining Techniques


Data mining techniques are specific implementations of algorithms used in data mining operations. The five most common data mining techniques are described briefly below.

Associations Discovery

This data mining technique is used to identify the behavior of specific events or processes by linking occurrences within a single event. An example might be the discovery that men who purchase premium brands of coffee are three times more likely to buy imported cigars than men who buy standard brands of coffee. Associations discovery is based on rules that follow this general form: "If item A is part of an event, then X percent of the time (confidence factor), item B is part of the same event." For example:

  • If a customer buys snacks, there is an 85 percent probability that the customer will also buy soft drinks or beer.

  • If a person buys vacation airline tickets for an entire family, there is a 95 percent probability that he or she will rent a full-size car at the vacation location.

With the help of scanners, grocery retailers use this data mining technique to find buying patterns in their stores. Because of this grocery store context, associations discovery is sometimes called market basket analysis.
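To make the confidence factor concrete, here is a minimal Python sketch that computes it over a handful of made-up market baskets. The transactions, item names, the helper names `confidence` and `pairwise_rules`, and the 0.9 threshold are all illustrative assumptions. The brute-force enumeration of single-item rules stands in for the more efficient algorithms (such as Apriori) that production tools typically use.

```python
from itertools import permutations

# Hypothetical market baskets: each set is the items scanned in one checkout event.
TRANSACTIONS = [
    {"snacks", "soft drinks"},
    {"snacks", "beer"},
    {"snacks", "soft drinks", "bread"},
    {"bread", "milk"},
    {"snacks", "milk"},
]


def confidence(antecedent, consequent, transactions):
    """Share of events containing every antecedent item that also contain every consequent item."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    with_both = sum(1 for t in with_antecedent if consequent <= t)
    return with_both / len(with_antecedent)


def pairwise_rules(transactions, min_confidence):
    """Brute-force every 'if A then B' rule over single items and keep the confident ones."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        c = confidence({a}, {b}, transactions)
        if c >= min_confidence:
            rules.append((a, b, c))
    return rules


print(confidence({"snacks"}, {"soft drinks"}, TRANSACTIONS))  # 0.5
print(pairwise_rules(TRANSACTIONS, 0.9))  # [('beer', 'snacks', 1.0), ('soft drinks', 'snacks', 1.0)]
```

Note that confidence is directional: in this toy data every soft-drink basket also contains snacks (confidence 1.0), but only half of the snack baskets contain soft drinks (confidence 0.5).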

Sequential Pattern Discovery

This data mining technique is similar to associations discovery except that sequential pattern discovery links events over time and determines how items relate to each other over time. For example, sequential pattern discovery might predict that a person who buys a washing machine will also buy a clothes dryer within six months with a probability of 0.7. To raise the chances above the predicted 70 percent, the store may offer each buyer a 10 percent discount on a clothes dryer purchased within four months of buying a washing machine.
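The washing machine example can be sketched as a simple count over dated purchase histories: among customers who bought the first item, what fraction bought the second item within a time window afterward? The customer records, the function name `sequence_probability`, and the 182-day approximation of "six months" are assumptions for illustration only; real sequential pattern miners search many candidate sequences rather than testing one pair.

```python
from datetime import date, timedelta

# Hypothetical purchase histories: customer id -> list of (purchase date, item).
PURCHASES = {
    "c1": [(date(2003, 1, 10), "washing machine"), (date(2003, 3, 2), "clothes dryer")],
    "c2": [(date(2003, 2, 1), "washing machine"), (date(2003, 11, 20), "clothes dryer")],
    "c3": [(date(2003, 4, 5), "washing machine")],
    "c4": [(date(2003, 5, 1), "washing machine"), (date(2003, 6, 15), "clothes dryer")],
    "c5": [(date(2003, 6, 1), "clothes dryer")],
}


def sequence_probability(histories, first, then, window):
    """Estimate P(customer buys `then` within `window` after first buying `first`)."""
    eligible = followed = 0
    for events in histories.values():
        first_dates = [d for d, item in events if item == first]
        if not first_dates:
            continue  # never bought the first item, so not part of the base population
        eligible += 1
        start = min(first_dates)
        if any(item == then and start < d <= start + window for d, item in events):
            followed += 1
    return followed / eligible if eligible else 0.0


six_months = timedelta(days=182)  # rough approximation of "six months"
print(sequence_probability(PURCHASES, "washing machine", "clothes dryer", six_months))  # 0.5
```

Widening the window to a full year also counts customer c2, whose dryer purchase came about nine months later, and the estimate rises to 0.75.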

Classification

The classification technique is the most common use of data mining. Classification looks at the behavior and attributes of predetermined groups. The groups might include frequent flyers, high spenders, loyal customers, people who respond to direct mail campaigns, or people with frequent back problems (e.g., people who drive long distances every day). The data mining tool can assign classifications to new data by examining existing data that has already been classified and using those results to infer a set of rules. The set of rules is then applied to any new data to be classified. This technique often uses supervised induction, which employs a small training set of already classified records to determine additional classifications. An example of this use is to discover the characteristics of customers who are (or are not) likely to buy a certain type of product. This knowledge can reduce the costs of promotions and direct mailings by focusing them on the customers most likely to respond.
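One of the simplest forms of supervised induction is the 1R ("one rule") algorithm, used here as a stand-in for whatever induction method a given data mining tool applies. It infers a rule set from already classified records, then applies those rules to new records. The training records, attribute names (`age_band`, `income`, `prior_buyer`), class labels, and function names below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training set: customer attributes plus a known outcome label.
TRAINING = [
    ({"age_band": "young", "income": "high", "prior_buyer": "yes"}, "buy"),
    ({"age_band": "young", "income": "low", "prior_buyer": "yes"}, "buy"),
    ({"age_band": "old", "income": "high", "prior_buyer": "no"}, "no_buy"),
    ({"age_band": "old", "income": "low", "prior_buyer": "yes"}, "buy"),
    ({"age_band": "young", "income": "high", "prior_buyer": "no"}, "no_buy"),
    ({"age_band": "old", "income": "low", "prior_buyer": "no"}, "no_buy"),
]


def learn_one_rule(records):
    """1R induction: for each attribute, map each value to its majority class,
    then keep the attribute whose rules misclassify the fewest training records."""
    best = None
    for attr in records[0][0]:
        counts = defaultdict(Counter)
        for features, label in records:
            counts[features[attr]][label] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best[0], best[1]


def classify(model, features, default="no_buy"):
    """Apply the learned rule set to a new, unclassified record."""
    attr, rules = model
    return rules.get(features[attr], default)


model = learn_one_rule(TRAINING)
print(model)  # ('prior_buyer', {'yes': 'buy', 'no': 'no_buy'})
print(classify(model, {"age_band": "old", "income": "high", "prior_buyer": "yes"}))  # buy
```

In this toy data, `age_band` and `income` each misclassify two training records while `prior_buyer` misclassifies none, so 1R keeps the `prior_buyer` rules.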

Clustering

The clustering technique is used to discover different groupings within the data. Clustering is similar to classification except that no groups have been defined at the outset of running the data mining tool. The clustering technique often uses neural networks or statistical methods. Clustering divides items into groups based on the similarities the data mining tool finds. The members within a cluster are very similar to one another, but the clusters themselves are very dissimilar. Clustering is used for problems such as detecting manufacturing defects or finding affinity groups for credit cards.
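As one example of a statistical clustering method, the sketch below implements plain k-means. No groups are supplied in advance; the algorithm only receives the number of clusters to look for. The cardholder data, the naive "first k points" seeding, and the function name `kmeans` are assumptions chosen to keep the example short and reproducible. Practical tools use smarter seeding and may use other methods entirely.

```python
import math

# Hypothetical cardholders: (annual spend in $1,000s, purchases per month).
POINTS = [(1.0, 2.0), (9.0, 9.0), (1.5, 1.8), (8.5, 9.5), (1.2, 2.2), (9.2, 8.8)]


def kmeans(points, k, iterations=100):
    """Plain k-means: assign points to the nearest centroid, then move each centroid to its members' mean."""
    centroids = list(points[:k])  # naive seeding with the first k points, chosen for reproducibility
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(POINTS, 2)
print(clusters)  # [[(1.0, 2.0), (1.5, 1.8), (1.2, 2.2)], [(9.0, 9.0), (8.5, 9.5), (9.2, 8.8)]]
```

The two resulting groups (low spenders and high spenders) were never labeled in the input, which is the key difference from the classification sketch.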

Forecasting

The forecasting data mining technique comes in two flavors: regression analysis and time sequence discovery.

  • Regression analysis uses known values of data to predict future values or future events based on historical trends and statistics. For example, the sales volume of sports car accessories can be forecasted based on the number of sports cars sold last month.

  • Time sequence discovery differs from regression analysis in that it forecasts only time-dependent data values. For example, it can estimate the rate of accidents during a holiday season based on the number of accidents that occurred during the same holiday season in prior years. Relevant properties of time include:

    - Work week versus calendar week

    - Holidays

    - Seasons

    - Date ranges and date intervals
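Both forecasting flavors can be sketched in a few lines. Ordinary least squares linear regression is used as one simple form of regression analysis, and a seasonal average is used as a deliberately simplified stand-in for time sequence discovery. All figures, season labels, and function names here are hypothetical.

```python
# Hypothetical monthly figures: sports cars sold last month -> accessory sales this month.
CARS_SOLD = [100, 120, 140, 160]
ACCESSORY_SALES = [2100, 2500, 2900, 3300]

# Hypothetical accident counts keyed by (year, season).
HOLIDAY_ACCIDENTS = {
    (2000, "winter holidays"): 410,
    (2001, "winter holidays"): 430,
    (2002, "winter holidays"): 450,
    (2002, "summer holidays"): 300,
}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def seasonal_forecast(history, season):
    """Forecast a season as the average of the same season in prior years."""
    values = [count for (_, s), count in history.items() if s == season]
    return sum(values) / len(values)


intercept, slope = fit_line(CARS_SOLD, ACCESSORY_SALES)
print(intercept + slope * 150)  # 3100.0
print(seasonal_forecast(HOLIDAY_ACCIDENTS, "winter holidays"))  # 430.0
```

The regression uses a non-time variable (cars sold) as its predictor, while the seasonal forecast depends only on the time property (which holiday season), mirroring the distinction drawn above.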



Business Intelligence Roadmap
Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
Year: 2003
Pages: 202
