


	Internet-Enabled Business Intelligence By William A. Giovinazzo

	Chapter 14. Personalization

14.2 Data Mining

Data mining is at the heart of the recommendation engine. It is a process of discovery that detects patterns in seemingly random data. As we noted earlier, there are different types of data mining, including decision trees, genetic modeling, and neural networks. There are situations in which one method may be more appropriate than another. The different data mining methods are not mutually exclusive; rather, they work in concert with one another. In the following subsections, we look at how we can apply these different data mining methods to gain a better understanding of our customers.

14.2.1 THE WEB SITE THAT LEARNS

The first data mining method applicable to mining the clickstream is the neural network. Neural networks are powerful data mining tools when working with numerical data. As you will recall from Chapter 3, the human brain is a neural network. The brain is composed of neurons, each of which can be thought of as a separate processor. The output of one neuron acts as the input to the next. Figure 14.3 shows the structure of such a network. Each node in the network generates an output, which is a nonlinear function of the node's inputs. The inputs in the structure are weighted according to the pattern recognition formula. During the training process, the results generated by the network are compared with the known results, and the input weights are adjusted accordingly. Training teaches the network a set of patterns so that it can recognize these same patterns later.
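The training loop described above can be sketched for a single node. This is a minimal illustration, not the engine's actual implementation: the function names, the two input features, and the training data are all hypothetical, and the update rule shown (an error-proportional adjustment, which is the log-loss gradient for a sigmoid node) is just one common choice.

```python
import math
import random

def sigmoid(x):
    """Nonlinear activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, inputs):
    """Output of one node: a nonlinear function of its weighted inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_neuron(samples, epochs=2000, rate=0.5, seed=0):
    """Compare each output with the known result and adjust the weights."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for inputs, known in samples:
            error = known - predict(weights, bias, inputs)
            # Move each weight in the direction that shrinks the error.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Hypothetical data: (came_from_partner, viewed_kitchen) -> bought (1) or not (0)
samples = [
    ((1.0, 1.0), 1.0),
    ((1.0, 0.0), 0.0),
    ((0.0, 1.0), 0.0),
    ((0.0, 0.0), 0.0),
]
weights, bias = train_neuron(samples)
likely_buyer = predict(weights, bias, (1.0, 1.0))
unlikely_buyer = predict(weights, bias, (0.0, 1.0))
```

After training, the node scores the visitor who arrived from the partner site and viewed kitchen furniture higher than the visitor who only viewed kitchen furniture. A real network would chain many such nodes together, with the output of one feeding the inputs of the next.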

Figure 14.3. A neural network.


We could imagine a marketing strategist sitting at a terminal watching a user browse our site. Based on his or her experience, the strategist dynamically provides recommendations and influences the prospective customer's path through the store. The trouble with such a scenario is that on the Web we are not looking at one or two customers viewing a handful of products, but at hundreds and thousands of customers viewing hundreds of different products. A neural network can play the strategist's role at that scale.

Let's look at an example of how this might work. A first-time customer comes to our store from the site of a partner that sells appliances, Friendly Alf's World of Appliances. Within our store we have three departments: kitchen furniture, bed and bath, and formal dining rooms. Our neural network has learned that visitors coming to our site from Friendly Alf's World of Appliances typically come to purchase kitchen furniture. The recommendation engine therefore recommends a variety of kitchen sets that might be of interest to the customer. Instead, the customer visits the bed and bath department, browsing the discount Early American bedroom sets. The customer spends most of her time examining sets with a lighter-colored wood stain. The neural network recognizes that the customer may be interested in bedroom sets and recommends low-cost, high-margin bedroom sets. In addition, it changes the kitchen recommendation to include discount kitchen sets.

The customer responds as the recommendation engine predicted and goes to the kitchen area. While there, she proceeds to the higher-priced sets and browses several Mission-style dining room sets. The neural network again sees that for a customer who just came from a low-cost bedroom set, such a path will typically not result in a sale. The recommendation engine therefore recommends an inexpensive Mission-style kitchen set. This ultimately leads to a purchase by the customer.

This process involved an anonymous customer, someone who had never visited our site before. If this were a returning customer, we might have remembered that he or she had purchased high-priced furniture in the past. We could then have recommended the more expensive items. Perhaps we would have realized that the customer had purchased bedroom furniture from us in the past, and we could have recommended lamps that were popular with other customers who purchased that same bedroom set.

14.2.2 THE WEB SITE THAT DIFFERENTIATES

The data mining methods complement one another. As we saw in the previous section, neural networks are well suited to working with numerical data, determining the probability that a customer will or will not buy a particular product based on some path. Decision trees have a different strength. These data mining algorithms work well with demographic data, where each record has many fields and the data sets have large numbers of attributes. There are a number of types of decision trees.

Classification and Regression Trees (CART) are binary, meaning that there are two outputs from every node of the tree. At each node, a variable is tested to see if it is less than or greater than a split value to determine whether the left or right branch is to be taken. The key, obviously, is to determine the appropriate split value. The CART algorithm recursively searches the variables to define the most appropriate value for the split. After a tree is fully defined, the algorithm prunes the tree by removing any nodes that reduce the accuracy of the tree.
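The search for a split value can be illustrated for a single variable. The sketch below is only an approximation of one step of the algorithm: it assumes Gini impurity as the measure of a good split (a common choice that the text does not specify), it omits recursion and pruning, and the function names and income figures are invented for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels; 0.0 means the node is pure."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Return (split_value, weighted_impurity) for the best binary split."""
    best_value, best_score = None, float("inf")
    ordered = sorted(set(values))
    # Candidate split values are midpoints between adjacent distinct values.
    for low, high in zip(ordered, ordered[1:]):
        candidate = (low + high) / 2.0
        left = [y for x, y in zip(values, labels) if x < candidate]
        right = [y for x, y in zip(values, labels) if x >= candidate]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_value, best_score = candidate, score
    return best_value, best_score

# Made-up incomes (in thousands of dollars) and purchase outcomes (1 = bought)
incomes = [22, 31, 38, 45, 52, 61, 75, 90]
bought = [0, 0, 0, 0, 1, 1, 1, 1]
split_value, impurity = best_split(incomes, bought)
```

With this toy data the search settles on a split between 45 and 52, which separates buyers from non-buyers cleanly. A full CART implementation would repeat the search across every variable, recurse into each branch, and then prune.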

An alternative method, Chi-Square Automatic Interaction Detection (CHAID), is non-binary and is useful for dealing with categorization. In this case, we would see multiple split values that would encompass the domain of values for a particular variable. Age, for example, could be divided into 10-year increments running from 0 to 120. Figure 14.4 demonstrates the difference between a CART and a CHAID decision tree.

Figure 14.4. CART and CHAID decision trees.


In Figure 14.4 (a), we show a CART decision tree based on income. Customers whose annual income exceeds $50,000 have a 67.8 percent probability of purchasing our product, while those whose income is less than $50,000 have only a 25.2 percent probability. CHAID, on the other hand, branches into two or more nodes; see Figure 14.4 (b). Using CHAID, we break the income of our customers down into multiple categories, in this case, not just customers with incomes greater than $50,000, but customers whose income is between $75,000 and $100,000.
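A multi-way split of this kind can be sketched by sorting customers into income bands and counting buyers in each. As its name indicates, CHAID relies on chi-square tests, so the sketch also computes the Pearson chi-square statistic for the resulting table. This is an illustration only: the band edges, records, and function names are hypothetical, and the category-merging step of the full algorithm is left out.

```python
def band_of(income, edges):
    """Index of the band an income falls into.
    edges are ascending upper bounds; the last band is open-ended."""
    for i, upper in enumerate(edges):
        if income < upper:
            return i
    return len(edges)

def purchase_table(records, edges):
    """Count [buyers, non_buyers] for each income band."""
    table = [[0, 0] for _ in range(len(edges) + 1)]
    for income, did_buy in records:
        table[band_of(income, edges)][0 if did_buy else 1] += 1
    return table

def chi_square(table):
    """Pearson chi-square statistic for a bands-by-outcome table."""
    total = sum(sum(row) for row in table)
    col_totals = [sum(row[j] for row in table) for j in range(2)]
    stat = 0.0
    for row in table:
        for j in range(2):
            expected = sum(row) * col_totals[j] / total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat

# Made-up customer records: (annual income in dollars, bought?)
records = [
    (30000, False), (42000, False), (48000, True),
    (55000, False), (62000, True), (70000, True),
    (80000, True), (95000, True), (88000, False),
    (120000, True), (150000, True), (110000, True),
]
edges = [50000, 75000, 100000]  # four bands, the last one open-ended
table = purchase_table(records, edges)
stat = chi_square(table)
```

A larger statistic suggests that purchase behavior differs more strongly across the bands, which is the kind of evidence a CHAID-style procedure uses when deciding whether a split is worth keeping.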

Human wants and desires are complex and seldom driven by a single motive. The simple tree structures shown in these examples are not sufficient to fully describe the demographic data we analyze to predict whether someone is going to buy a product. We can, however, add levels to these structures to reflect the different demographic characteristics that come into play when predicting customer behavior. Figure 14.5 presents a more complex decision tree, taking multiple factors into consideration. We have added the variable gender to our analysis, giving a more complete description of visitors who are likely to purchase a particular product.

Figure 14.5. CHAID customer segmentation.


14.2.3 THE WEB SITE WITH GENES

The final data mining method we discuss is genetic modeling. The most promising use of this modeling method is in conjunction with other data mining methods to find the fittest model. Genetic modeling differs slightly from other data mining algorithms in this regard. While data mining is a process of discovery, genetic modeling optimizes the output of other data mining models. What it discovers is the best-optimized model. We often see genetic models used in conjunction with neural networks.

Genetic modeling finds the fittest model through a natural evolution of the models presented to it. The process starts with a genetic pool, a set of possible solutions. The model then reproduces, selecting from the genetic pool the fittest solutions. The genetic algorithm then takes the fittest solutions and exchanges information between them. This crossover process creates solutions with different genetic profiles than the previous generation. This new generation creates new and diverse offspring. Mutation, an essential element in evolution, provides a means of creating variation. Where crossover simply creates new genetic combinations, mutation creates new values to be included in these combinations.

Both crossover and mutation create new populations, solutions with new characteristics. The fitness of these new solutions, the subsequent generations, is compared with the previous generations. If the newer solutions are more fit, they replace their predecessors. If they are less fit, they die off. We use this methodology with the solutions derived from the neural network, applying crossover and mutation to the various solutions until we arrive at an optimized solution. The basis of a generation's fitness is how well the data set matches a particular cluster. Remember that genetic algorithms are most useful when clustering data. The fitness of a particular generation is how well the characteristics of that generation match the other data sets within the cluster.
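The cycle of selection, crossover, mutation, and replacement can be sketched in a few lines. Everything specific here is an assumption made for illustration: each solution is a short list of weights, the fitness function simply measures closeness to a hypothetical target, and the pool size, mutation rate, and function names are arbitrary.

```python
import random

def evolve(fitness, pool, generations=200, mutation_rate=0.2, seed=0):
    """Evolve a pool of solutions and return the fittest one found."""
    rng = random.Random(seed)
    pool = list(pool)
    size = len(pool)
    for _ in range(generations):
        # Selection: the fitter half of the pool become parents.
        pool.sort(key=fitness, reverse=True)
        parents = pool[: size // 2]
        children = []
        for _ in range(size):
            mother, father = rng.sample(parents, 2)
            # Crossover: exchange information by splicing at a random cut.
            cut = rng.randrange(1, len(mother))
            child = mother[:cut] + father[cut:]
            # Mutation: occasionally introduce a brand-new value.
            if rng.random() < mutation_rate:
                i = rng.randrange(len(child))
                child = child[:i] + [rng.uniform(-1.0, 1.0)] + child[i + 1:]
            children.append(child)
        # Replacement: fitter offspring displace predecessors; the rest die off.
        pool = sorted(pool + children, key=fitness, reverse=True)[:size]
    return pool[0]

# Hypothetical target weights standing in for "the best model"
target = [0.3, -0.7, 0.5]

def fitness(solution):
    """Higher is fitter: negative squared distance from the target."""
    return -sum((s - t) ** 2 for s, t in zip(solution, target))

pool_rng = random.Random(1)
initial_pool = [[pool_rng.uniform(-1.0, 1.0) for _ in target] for _ in range(20)]
best = evolve(fitness, initial_pool)
```

Because each new pool keeps the fittest members of both generations, the best solution never gets worse from one generation to the next. In practice the fitness function would score how well a candidate model, such as a set of neural network weights, fits the data.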

