Source Code Discussion


On the CD  

The ART1 source code includes the recommendation algorithm as discussed above. The complete source is available on the CD-ROM at ./software/ch3. The ART1 source can be compiled on Linux or on Windows using the Cygwin UNIX environment.

The source includes a sample data set within a structure at the head of the file. The data set is a set of feature vectors that represent the customer purchase records. Per the notation in the algorithm discussion, a one value represents a purchase of the item by the customer; a zero represents no purchase. The feature vectors (customer database) and other relevant structures are shown in Listing 3.1.

Listing 3.1: ART1 Data Structures for Personalization.
start example
 #define MAX_ITEMS                         (11) #define MAX_CUSTOMERS                     (10) #define TOTAL_PROTOTYPE_VECTORS           (5) const float beta = 1.0;           /* Small positive number */ const float vigilance = 0.9;      /* 0 <= vigilance < 1 */ int numPrototypeVectors = 0;      /* Total prototype vectors */ int prototypeVector[TOTAL_PROTOTYPE_VECTORS][MAX_ITEMS]; /* sumVector supports making recommendations. */ int sumVector[TOTAL_PROTOTYPE_VECTORS][MAX_ITEMS]; /* Number of occupants of the cluster */ int members[TOTAL_PROTOTYPE_VECTORS]; /* Identifies the cluster to which a particular customer belongs */ int membership[MAX_CUSTOMERS]; /*  * Feature vectors are contained within the database array. A one      in  * the field represents a product that the customer has purchased.    A  * zero represents a product not purchased by the customer.  */ /*       Hmr  Ppr  Snk  Scr  Pen  Kkt  Wrn  Pcl  Hth  Tpm  Bdr */ int database[MAX_CUSTOMERS][MAX_ITEMS] = {         { 0,   0,   0,   0,   0,   1,   0,   0,   1,   0,   0},         { 0,   1,   0,   0,   0,   0,   0,   1,   0,   0,   1},         { 0,   0,   0,   1,   0,   0,   1,   0,   0,   1,   0},         { 0,   0,   0,   0,   1,   0,   0,   1,   0,   0,   1},         { 1,   0,   0,   1,   0,   0,   0,   0,   0,   1,   0},         { 0,   0,   0,   0,   1,   0,   0,   0,   0,   0,   1},         { 1,   0,   0,   1,   0,   0,   0,   0,   0,   0,   0},         { 0,   0,   1,   0,   0,   0,   0,   0,   1,   0,   0},         { 0,   0,   0,   0,   1,   0,   0,   1,   0,   0,   0},         { 0,   0,   1,   0,   0,   1,   0,   0,   1,   0,   0} }; 
end example
 

The prototypevector represents the prototype vectors for each of the clusters. The sumVector is used solely for recommendation, and is not part of the standard ART1 algorithm. The members array specifies the number of members for a particular cluster and the membership array identifies to which particular cluster a customer feature vector belongs. Finally, the database structure defines the unique feature vectors for the customers.

The main routine performs the ART1 processing and then makes recommendations based upon the ART1 results. The main function is shown in Listing 3.2.

Listing 3.2: ART1 Personalization main Function.
start example
 int main() {   int customer;   srand( time( NULL ) );   initialize();   performART1();   displayCustomerDatabase();   for (customer = 0 ; customer < MAX_CUSTOMERS ; customer++) {     makeRecommendation( customer );   }   return 0; } 
end example
 
On the CD  

After setting the random seed using srand , the initialize function is called to clear and initialize the structures used by the ART1 and recommendation algorithm. The ART1 algorithm is represented by the performART1 function and recommendations are made using the makeRecommendation function. The function displayCustomerDatabase is used to show prototype vectors for each cluster created and the feature vector occupants of the clusters. This function can be viewed on the CD-ROM.

The initialize function in Listing 3.3 clears out the prototype vectors, sum vector as well as the members and membership arrays.

Listing 3.3: Initializing the Algorithm Structures.
start example
 void initialize( void ) {   int i, j;   /* Clear out prototype vectors */   for (i = 0 ; i < TOTAL_PROTOTYPE_VECTORS ; i++) {     for (j = 0 ; j < MAX_ITEMS ; j++) {       prototypeVector[i][j] = 0;       sumVector[i][j] = 0;     }     members[i] = 0;   }   /* Initialize example vectors to no membership to any cluster */   for (j = 0 ; j < MAX_CUSTOMERS ; j++) {     membership[j] = -1;   } } 
end example
 

Two support functions are shown in Listing 3.4. These are the vectorMagnitude and vectorBitwiseAnd functions.

Listing 3.4: Support Functions for the ART1 Algorithm.
start example
 int vectorMagnitude( int *vector ) {   int j, total = 0;   for (j = 0 ; j < MAX_ITEMS ; j++) {     if (vector[j] == 1)  total++;   }   return total; } void vectorBitwiseAnd( int *result, int *v, int *w ) {   int i;   for (i = 0 ; i < MAX_ITEMS ; i++) {     result[i] = (v[i] && w[i]);   }   return; } 
end example
 

The vectorMagnitude function simply counts the number of elements within the vector that are set (value of 1) and returns the total. The vectorBitwiseAnd performs a bitwise AND of two vectors, resulting in a new vector.

Two prototype vector routines are also provided to create a new cluster and update a cluster based upon changes that have occurred within it ( occupant moving in or out of the cluster). These functions are shown in Listing 3.5

Listing 3.5: Prototype Vector Manipulation Functions.
start example
 int createNewPrototypeVector( int *example ) {   int i, cluster;   for (cluster = 0 ; cluster < TOTAL_PROTOTYPE_VECTORS ;     cluster++) {     if (members[cluster] == 0) break;   }     if (cluster == TOTAL_PROTOTYPE_VECTORS) assert(0);     numPrototypeVectors++;     for (i = 0 ; i < MAX_ITEMS ; i++) {       prototypeVector[cluster][i] = example[i];     }     members[cluster] = 1;     return cluster;   }   void updatePrototypeVectors( int cluster )   {     int item, customer, first = 1;     assert( cluster >= 0);     for (item = 0 ; item < MAX_ITEMS ; item++) {       prototypeVector[cluster][item] = 0;       sumVector[cluster][item] = 0;     }     for (customer = 0 ; customer < MAX_CUSTOMERS ; customer++) {       if (membership[customer] == cluster) {         if (first) {           for (item = 0 ; item < MAX_ITEMS ; item++) {             prototypeVector[cluster][item] =                 database[customer][item];             sumVector[cluster][item] = database[customer][item];           }           first = 0;         } else {           for (item = 0 ; item < MAX_ITEMS ; item++) {             prototypeVector[cluster][item] =               prototypeVector[cluster][item] &&                   database[customer][item];            sumVector[cluster][item] += database[customer][item];           }         }       }   }   return; } 
end example
 

The first function, createNewPrototypeVector , takes an example feature vector and creates a new cluster with it. The example feature vector is simply copied to the prototype vector for the cluster. The number of members for the cluster is automatically initialized to one. Function updatePrototypeVectors recalculates the prototype vector based upon the occupants contained within it. Recall from Equation 3.4, that the prototype vector is nothing more than a bitwise AND of all of the feature vectors. The function in Listing 3.5 loads the first feature vector into the prototype vector and then performs the AND operation on the subsequent feature vectors within the cluster. The sumVector is also computed here, which is used solely for making recommendations and plays no part in the ART1 algorithm.

The ART1 algorithm is shown in Listing 3.6. The implementation includes debug output, though these portions have been removed from the listing. To enable the debug feature to watch the ART1 process, change the #undef to #define at the top of the source for DEBUG.

Listing 3.6: ART1 Algorithm.
start example
 int performART1( void ) {   int andresult[MAX_ITEMS];   int pvec, magPE, magP, magE;   float result, test;   int index, done = 0;   int count = 50;   while (!done) {     done = 1;     /* Walk through each of the customers */     for (index = 0 ; index < MAX_CUSTOMERS ; index++) {       /* Step 3 */       for (pvec = 0 ; pvec < TOTAL_PROTOTYPE_VECTORS ; pvec++) {       /* Does this vector have any members? */       if (members[pvec]) {         vectorBitwiseAnd( andresult,                            &database[index][0],                            &prototypeVector[pvec][0] );         magPE = vectorMagnitude( andresult );         magP  = vectorMagnitude( &prototypeVector[pvec][0] );         magE  = vectorMagnitude( &database[index][0] );         result = (float)magPE / (beta + (float)magP);         test = (float)magE / (beta + (float)MAX_ITEMS);         /* Equation 2 */         if (result > test) {           /* Test for vigilance acceptability (Equation 3)*/           if (((float)magPE/(float)magE) < vigilance) {             int old;             /* Ensure this is a different cluster */             if (membership[index] 1= pvec) {               /* Move the customer to the new cluster */               old = membership[index];               membership[index] = pvec;               if (old >= 0) {                 members[old]-;                 if (members[old] == 0) numPrototypeVectors-;               }               members[pvec]++;               /* Recalculate the prototype vectors for the old                * and new clusters.                */               if ((old >= 0) && (old < TOTAL_PROTOTYPE_VECTORS))               {                 updatePrototypeVectors( old );               }                 updatePrototypeVectors( pvec );                 done = 0;                 break;               } else {                 /* Already in this cluster */               }             } /* vigilance test */           }         }       } /* for vector loop */       /* Check to see if the current vector was processed */       if (membership[index] == -1) {         /* No prototype vector was found to be close to the example          * vector.  Create a new prototype vector for this example.          */         membership[index] =           createNewPrototypeVector( &database[index][0] );         done = 0;       }     } /* for customers loop */     if (!count-) break;   } /* !done */   return 0; } 
end example
 

The ART1 algorithm is a very simple nested loop where all customer feature vectors are compared against all cluster prototype vectors. When no further changes occur in the clusters (it's no longer resonating), the algorithm is complete and we return to the main function. The source is a simple implementation of Equations 3.1 through 3.4. The mag* variables are the magnitudes of the vectors and are calculated once for efficiency. The algorithm also includes a number of optimizations to ignore prototype vectors that have no members (empty clusters are not considered ). If the example feature vector is not classified during the cluster loop, a new cluster is automatically created. This handles Equation 3.1 in a clean fashion.

The final function, makeRecommendation , provides the recommendation service for a given customer (represented by their feature vector). See Listing 3.7.

Listing 3.7: Recommendation Algorithm.
start example
 void makeRecommendation ( int customer ) {   int bestItem = -1;   int val = 0;   int item;   for (item = 0 ; item < MAX_ITEMS ; item++) {     if ((database[customer][item] == 0) &&         (sumVector[membership[customer]][item] > val)) {       bestItem = item;       val = sumVector[membership[customer]][item];     }   }   printf("For Customer %d, ", customer);   if (bestItem >= 0) {     printf("The best recommendation is %d (%s)\n",             bestItem, itemName[bestItem]);     printf("Owned by %d out of %d members of this cluster\n",             sumVector[membership[customer]][bestItem],             members[membership[customer]]);   } else {     printf("No recommendation can be made.\n");   }   printf("Already owns: ");   for (item = 0 ; item < MAX_ITEMS ; item++) {     if (database[customer][item]) printf("%s ", itemName[item]);   }   printf("\n\n"); } 
end example
 

Per Figure 3.6, the algorithm looks for the intersection of the most purchased item in the cluster that was also not purchased by the current customer. The initial loop in makeRecommendation finds this intersection by keeping track of the most purchased item and recording the item number (column of the sumVector ). The remaining code simply emits textual information defining the recommendation (if one could be made). The function also emits all items that the customer has purchased as a check of the result.

Tweaking the Algorithm

Three important parameters can be tuned for the algorithm. The maximum number of clusters possible (defined as TOTAL_PROTOTYPE_VECTORS ), should be set high enough so that the algorithm can create a new cluster if it needs to. Since the ability to create new clusters is a major advantage for ART1, sufficient clusters should be available.

The beta and vigilance parameters are very important to the proper operation of ART1. The vigilance parameter is of considerable importance because it determines the quality of the recommendations that can be made. If the clusters are too large, recommendations might not be suitable because of the large variance of the feature vectors contained within the cluster. If the clusters are too small, the recommendations again might not be relevant because not enough feature vectors are present to find interesting similarities. The beta parameter, as defined in by Stephen Gallant [Gallant 1994] is a tiebreaker that favors prototypes that are more similar over those that have fewer similarities.




Visual Basic Developer
Visual Basic Developers Guide to ASP and IIS: Build Powerful Server-Side Web Applications with Visual Basic. (Visual Basic Developers Guides)
ISBN: 0782125573
EAN: 2147483647
Year: 1999
Pages: 175

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net