The ART1 algorithm works with objects called feature vectors. A feature vector is nothing more than a collection of binary values that represent some type of information. An example of a feature vector is the purchase history of a customer (see Figure 3.1). Each element in the feature vector identifies whether the customer has purchased an item (1 if the customer has purchased the item, 0 otherwise). The sample feature vector shown in Figure 3.1 illustrates a customer who has purchased a hammer and a wrench.
This feature vector describes the customer in terms of his purchasing habits by identifying the items he has purchased (for which we have knowledge). We collect these customer feature vectors and apply the ART1 algorithm to separate them into clusters. The idea is that a cluster of similar customers will reveal interesting information about the attributes that those customers have in common.
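As a quick illustration, a purchase-history feature vector can be built directly from a customer's purchases. The item catalog below is hypothetical (the names and ordering are ours, not the book's):

```python
# Hypothetical item catalog; names and ordering are illustrative only.
items = ["hammer", "wrench", "pliers", "drill", "saw"]
purchased = {"hammer", "wrench"}

# 1 if the customer bought the item, 0 otherwise
feature_vector = [1 if item in purchased else 0 for item in items]
```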
We begin with a set of feature vectors (we'll call these examples E1..Ek) and a set of prototype vectors (P1..PN). A prototype vector is the "center" of a cluster. The number of prototype vectors, N, is the maximum number of clusters that can be supported. The parameter d is the length of each vector. We initialize a vigilance parameter (ρ, or rho) to a value between 0 and 1.0, and a beta parameter to a small positive number. We'll discuss these in more detail shortly. See Figure 3.2 for a list of the working parameters.
∧ | Vector bitwise AND
||v|| | Magnitude of v (number of 1s set)
N | Number of prototype vectors
ρ | Vigilance parameter (0 < ρ <= 1)
P | Prototype vector
E | Example vector
d | Dimension of the vectors (length)
β | Beta parameter
Some of the operations shown in Figure 3.2 may be foreign to some readers. The vector bitwise AND is a simple binary AND of two vectors: an element of the result is 1 only if the corresponding element in both original vectors is 1. The magnitude of a vector is the count of its non-zero elements.
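Both operations are one-liners in most languages. A minimal Python sketch (the helper names are our own):

```python
def vec_and(a, b):
    """Vector bitwise AND: a result element is 1 only when both inputs are 1."""
    return [x & y for x, y in zip(a, b)]

def magnitude(v):
    """Number of non-zero (1) elements in the vector, written ||v|| above."""
    return sum(v)
```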
The ART1 algorithm flow is shown in Figure 3.3.
The equations used in this flow are shown as Equations 3.1 through 3.4.
Initially, no prototype vectors exist, so at the start of the algorithm an initial prototype vector is created with the first example vector (Equation 3.1). We then check all subsequent example feature vectors against each existing prototype vector for their proximity (how close the example feature vector is to the current prototype vector).
P1 = E1    (3.1)
The beta parameter used in the proximity equation (Equation 3.2) is a "tie-breaker" that favors prototypes with more 1's over those with fewer 1's when all of the 1's in the prototype vector are also in the example feature vector being tested [Gallant 1994].
||P ∧ E|| / (β + ||P||)  >  ||E|| / (β + d)    (3.2)
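The proximity test can be sketched as a small predicate; the function name and argument order are our own:

```python
def proximity_passes(p, e, beta, d):
    """Equation 3.2: ||P AND E|| / (beta + ||P||) > ||E|| / (beta + d)."""
    pe = sum(x & y for x, y in zip(p, e))   # magnitude of the bitwise AND
    return pe / (beta + sum(p)) > sum(e) / (beta + d)
```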
If the proximity test is satisfied, the next test checks the example feature vector and the prototype vector against the vigilance parameter (Equation 3.3). The purpose of the vigilance parameter is to control the class size. If vigilance is large, larger classes result (clusters with more members); decreasing the vigilance parameter results in clusters with fewer members. If the vigilance parameter is set low enough, few examples will be accepted into an existing cluster, and most will seed clusters of their own.
||P ∧ E|| / ||E||  <  ρ    (3.3)
Finally, if we pass the vigilance test, we incorporate the current example feature vector into the current prototype vector (Equation 3.4); this is simply a bitwise AND of the example vector and the prototype vector. If we fail the vigilance test (or the proximity test), we check the next prototype vector. If we exhaust all of the prototype vectors without clustering the example vector, we create a new prototype vector from it. This results in a new cluster, since the example passed the similarity tests for none of the existing clusters.
P' = P ∧ E    (3.4)
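The update step is likewise a one-liner; a sketch:

```python
def update_prototype(p, e):
    """Equation 3.4: the new prototype is the bitwise AND of P and E."""
    return [x & y for x, y in zip(p, e)]
```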
Once all examples have been processed, we walk through them again and compare each to the entire set of prototype vectors (per the flow diagram in Figure 3.3). Although every example has already been clustered, this second pass verifies that each example feature vector is in the correct cluster: subsequent checks may have created new clusters or modified existing prototypes, so previously clustered feature vectors may need to move to another cluster.
Once we've run through all of the example vectors without making any changes, we've converged on a clustering solution and the process is complete. To avoid a feature vector oscillating between two prototype vectors, we also cap the number of iterations to force convergence; the cap is set high enough to avoid converging prematurely.
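Putting the pieces together, the overall flow might be sketched as below. This is our own condensed reading of Figure 3.3, not the book's listing, with a few assumptions made explicit in the comments: an example that fails every test keeps its current cluster, a prototype is recomputed as the bitwise AND of all of its members, and an iteration cap forces convergence:

```python
from functools import reduce

def vec_and(a, b):
    return [x & y for x, y in zip(a, b)]

def art1(examples, d, rho=0.9, beta=1.0, max_iterations=50):
    """Sketch of the ART1 flow; function and variable names are our own."""
    membership = [-1] * len(examples)   # cluster index per example (-1 = none)
    clusters = []                       # member indices for each cluster
    prototypes = []                     # the "center" of each cluster

    def recompute(j):
        # Assumption: a prototype is the bitwise AND of all member examples,
        # which folds the Equation 3.4 update into one step.
        prototypes[j] = reduce(vec_and, (examples[i] for i in clusters[j]))

    for _ in range(max_iterations):     # cap iterations to force convergence
        changed = False
        for i, e in enumerate(examples):
            target = None
            for j, p in enumerate(prototypes):
                if j == membership[i] or not clusters[j]:
                    continue            # skip current and emptied clusters
                pe = sum(vec_and(p, e))
                if (pe / (beta + sum(p)) > sum(e) / (beta + d)  # Eq 3.2
                        and pe / sum(e) < rho):                 # Eq 3.3
                    target = j
                    break
            if target is not None:      # move example to the winning cluster
                old = membership[i]
                if old >= 0:
                    clusters[old].remove(i)
                clusters[target].append(i)
                membership[i] = target
                recompute(target)
                if old >= 0 and clusters[old]:
                    recompute(old)
                changed = True
            elif membership[i] == -1:   # no match: seed a new cluster (Eq 3.1)
                clusters.append([i])
                prototypes.append(list(e))
                membership[i] = len(clusters) - 1
                changed = True
        if not changed:                 # equilibrium reached
            break
    return membership, prototypes
```

Run on the Figure 3.4 data, this sketch leaves P unchanged and places the example vector in the cluster represented by P1, matching the walkthrough in Figure 3.5.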
Let's look at the algorithm at work with a simple example. Let's say that we have two clusters represented by prototype vectors P and P1 (as shown in Figure 3.4). Also shown are the example feature vector and the working parameters for the algorithm.
P = {1,0,0,1,1,0,1} | β = 1.0
P1 = {1,1,0,0,0,1,0} | ρ = 0.9
E = {1,1,1,0,0,1,0} | d = 7
Note | The beta and rho parameters in Figure 3.4 were chosen after some experimentation. For any given problem, it's advisable to try some number of combinations to find the parameter pair that yields the best results. |
Now let's run the tests of the ART1 algorithm to see where the example vector will be clustered (see Figure 3.5).
Proximity test P/E | (Eq 3.2) | 1/5 > 4/8 | (False)
Proximity test P1/E | (Eq 3.2) | 3/4 > 4/8 | (True)
Vigilance test P1/E | (Eq 3.3) | 3/4 < 0.9 | (True)
P1 ∧ E | (Eq 3.4) | {1,1,0,0,0,1,0} ∧ {1,1,1,0,0,1,0} = {1,1,0,0,0,1,0}
In the first test (Figure 3.5), we perform the proximity test for prototype vector P. Using Equation 3.2, we find that the test fails (0.2 is not greater than 0.5). Next, we check the example feature vector against P1. In this case, the test succeeds, so we try the vigilance test. The vigilance test also succeeds, so the example vector is associated with the cluster represented by P1. We then modify the prototype vector by performing a bitwise AND of P1 with the example vector (Equation 3.4). After updating a prototype vector, all of the feature vectors are checked against all of the available prototype vectors to ensure each is in the correct cluster. Once no new changes occur, clustering is complete.
As example feature vectors are tested against the prototype vectors, new clusters are created or existing clusters are modified by the inclusion of an example. This is known as "resonance" and represents the process of learning within the algorithm. When the algorithm reaches equilibrium (i.e., no further changes occur to the prototype vectors), learning is complete and the data set is classified.
ART1 is both conceptually simple and easy to implement. Earlier algorithms, such as MacQueen's k-means clustering algorithm, while simpler still, have some significant drawbacks. For example, k-means does not allow the creation of new clusters (the number of clusters is fixed at the start), and it has no parameter for adjusting the class size of the resulting clusters. A drawback shared by both algorithms (ART1 and k-means) is that the final set of clusters (and prototype vectors) can depend on the order in which the training examples are presented.
Many variations of the ART algorithm have been created, either as refinements of the original or to solve different classes of problems. While ART1 operates on discrete data, the ART2 algorithm was created for clustering continuous data (such as waveforms). ARTMAP is a supervised version of ART that can learn arbitrary mappings of binary patterns. Fuzzy ARTMAP is a synthesis of ARTMAP and fuzzy logic.
Other varieties of ART models exist. For more information, see the Resources section at the end of this chapter.