Given a transaction database DB, I={I 1 ,I 2 , ,I m }is a set of itemsets with m different itemsets in DB. Each transaction T in DB is a set of items (i.e., itemsets), so T I.
Itemset P is defined as A 1 ˆ A 2 ˆ ˆ A k , A i ˆˆ I(i=1,2, ,k), and P containing k items is called k-itemset .
The support of itemset P is defined as ƒ (P/DB)=the support account containing P in DB/the total transaction amount in DB=A/DB/DB.
If A and B are two itemsets, and A ˆ B= , then the confidence of association rule A B in DB is defined as ˆ (A B /DB)= ƒ (A ˆ B /DB)/ ƒ (A /DB).
Let the minimum support be ƒ min . Then the set of k frequent itemsets and the set of k non-frequent itemsets are defined separately as:
To mine efficacious association rules in DB, minimum support ƒ min and minimum confidence ˆ min must first be defined. Mining association rules find all of the association rules satisfying ƒ (A ˆ B /DB) ‰ ƒ min and ˆ (A B /DB) ‰ ˆ min in DB. Owing to the fact that the result of ˆ (A B /DB) can be gotten from the value of ƒ (A ˆ B /DB) and ƒ (A /DB), the key to mining association rule A B is to generate the set of k frequent itemsets. Therefore, the substantive study at present focuses on generating the set of k frequent itemsets (see Agrawal & Srikant, 1994; Feng et al., 1998; Zhang et al., 2000), which is the key to heightening the mining efficiency. We also focus on pattern match, which is the key to generating k frequent itemsets. The corresponding Apriori algorithm is as follows :
C 1 ={candidate 1-itemsets}
L 1 ={c ˆˆ C 1 c.count ‰ƒ min }
For (k=2; L k ˆ’ 1 ‰ ; k++)
C k =apriori-gen(L k ˆ’ 1 )
Count_support(C k )
L k ={c ˆˆ C1c.counte ‰ƒ min }
Resultset= ˆ L k
Next
Here, C k is candidate k-itemsets , L k is k-itemsets , Count_support(C k ) is to count the support count of candidate k-itemsets , C k , apriori-gen(L k ˆ’ 1 ) is to generate C k , which includes two steps. First, join L k ˆ’ 1 into k-itemsets . This is called the join step:
insert into C k
select P.A 1 , P.A 2 , , P.A k ˆ’ 1 ,Q. A k ˆ’ 1
from L k ˆ’ 1 P inner join L k ˆ’ 1 Q
where P.A 1 = Q.A 1 , P.A 2 = Q.A 2 , , P.A k ˆ’ 2 = Q.A k ˆ’ 2 , P.A k ˆ’ 1 < Q.A k ˆ’ 1
Then, delete any (k ˆ’ 1)-subitemsets of C k which not be included in L k ˆ’ 1 . This is called the prune step:
For all itemsets c C k For all k-1_subitemsets s of c If (s L k-1 ), then Delete c from C k and get the candidate k-itemsets C k .
During the mining of association rules, pattern match mainly occurs in Count_support(C k ), which is the account of the support count of candidate k-itemsets . The resulting account is a match between the k-itemsets constructed by all the k items, compounded by each transaction in transaction data set and the set of candidate k-itemsets C k (k=1,2, ). From the above, we know the pattern match of mining association rules is the match between any k-itemsets from each transaction of transaction data set whose item number is not less than k and any one itemset in the set of candidate k-itemsets .