ASSOCIATION RULES DESCRIPTION | (ed.) Intelligent Agents for Data Mining and Information Retrieval

Given a transaction database $DB$, $I = \{I_1, I_2, \ldots, I_m\}$ is the set of the $m$ different items in $DB$. Each transaction $T$ in $DB$ is a set of items (i.e., an itemset), so $T \subseteq I$.

Definition 1

An itemset $P$ is defined as $A_1 \wedge A_2 \wedge \cdots \wedge A_k$, where $A_i \in I$ $(i = 1, 2, \ldots, k)$; an itemset $P$ containing $k$ items is called a k-itemset.

Definition 2

The support of itemset $P$ is defined as the number of transactions in $DB$ that contain $P$ divided by the total number of transactions in $DB$: $\sigma(P/DB) = |\{T \in DB : P \subseteq T\}| \, / \, |DB|$.

Definition 3

If $A$ and $B$ are two itemsets and $A \cap B = \emptyset$, then the confidence of the association rule $A \Rightarrow B$ in $DB$ is defined as $\psi(A \Rightarrow B/DB) = \sigma(A \wedge B/DB) \, / \, \sigma(A/DB)$.
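The support and confidence definitions above can be illustrated with a small sketch. The function names `support` and `confidence`, and the four-transaction toy database, are assumptions made for illustration only; they do not come from the chapter.

```python
from typing import FrozenSet, Iterable, List

# Toy transaction database (hypothetical data, for illustration only).
db: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(a: Iterable[str], b: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Confidence of the rule A => B: support(A and B) / support(A)."""
    a_set, b_set = frozenset(a), frozenset(b)
    if a_set & b_set:
        raise ValueError("A and B must be disjoint")
    # Raises ZeroDivisionError if A never occurs in the database.
    return support(a_set | b_set, transactions) / support(a_set, transactions)


print(support({"bread"}, db))               # 3 of 4 transactions
print(support({"bread", "milk"}, db))       # 2 of 4 transactions
print(confidence({"bread"}, {"milk"}, db))  # (2/4) / (3/4)
```

In this toy database the rule bread => milk has support 0.5 and confidence 2/3, so it would pass a minimum support of 0.5 but fail a minimum confidence of 0.7.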

Definition 4

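Let the minimum support be $\sigma_{min}$. Then the set of frequent k-itemsets and the set of non-frequent k-itemsets are defined, respectively, as the k-itemsets $P$ with $\sigma(P/DB) \geq \sigma_{min}$ and the k-itemsets $P$ with $\sigma(P/DB) < \sigma_{min}$.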

To mine useful association rules in $DB$, the minimum support $\sigma_{min}$ and the minimum confidence $\psi_{min}$ must first be defined. Mining association rules means finding all association rules in $DB$ that satisfy $\sigma(A \wedge B/DB) \geq \sigma_{min}$ and $\psi(A \Rightarrow B/DB) \geq \psi_{min}$. Because $\psi(A \Rightarrow B/DB)$ can be computed from $\sigma(A \wedge B/DB)$ and $\sigma(A/DB)$, the key to mining the association rule $A \Rightarrow B$ is to generate the set of frequent k-itemsets. Most current research therefore focuses on generating the set of frequent k-itemsets (see Agrawal & Srikant, 1994; Feng et al., 1998; Zhang et al., 2000), which is the key to improving mining efficiency. We also focus on pattern matching, which is the key to generating frequent k-itemsets. The corresponding Apriori algorithm is as follows:

C_1 = {candidate 1-itemsets}
L_1 = {c ∈ C_1 | c.count ≥ σ_min}
For (k = 2; L_{k-1} ≠ ∅; k++)
    C_k = apriori-gen(L_{k-1})
    Count_support(C_k)
    L_k = {c ∈ C_k | c.count ≥ σ_min}
    Resultset = Resultset ∪ L_k
Next

Here, C_k is the set of candidate k-itemsets, L_k is the set of frequent k-itemsets, Count_support(C_k) counts the support of each candidate k-itemset in C_k, and apriori-gen(L_{k-1}) generates C_k in two steps. First, L_{k-1} is joined with itself to form k-itemsets. This is called the join step:

insert into C_k
select P.A_1, P.A_2, ..., P.A_{k-1}, Q.A_{k-1}
from L_{k-1} P inner join L_{k-1} Q
where P.A_1 = Q.A_1, P.A_2 = Q.A_2, ..., P.A_{k-2} = Q.A_{k-2}, P.A_{k-1} < Q.A_{k-1}

Then, delete every candidate in C_k that has a (k-1)-subitemset not included in L_{k-1}. This is called the prune step:

For all itemsets c ∈ C_k
    For all (k-1)-subitemsets s of c
        If (s ∉ L_{k-1}), then
            Delete c from C_k

The result is the set of candidate k-itemsets C_k.
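The join and prune steps can be sketched in Python as follows. This is a minimal illustration, not the chapter's implementation: it assumes each itemset is stored as a tuple of items in sorted order, and the function name `apriori_gen` and the sample input are chosen here for the example.

```python
from itertools import combinations
from typing import Set, Tuple

Itemset = Tuple[str, ...]


def apriori_gen(frequent_prev: Set[Itemset]) -> Set[Itemset]:
    """Build candidate k-itemsets from frequent (k-1)-itemsets.

    Assumes every itemset is a tuple whose items are in sorted order.
    """
    prev = set(frequent_prev)

    # Join step: combine two (k-1)-itemsets that agree on their first
    # k-2 items and whose last items are in increasing order.
    joined = {
        p + (q[-1],)
        for p in prev
        for q in prev
        if p[:-1] == q[:-1] and p[-1] < q[-1]
    }

    # Prune step: drop any candidate that has a (k-1)-subitemset
    # which is not itself frequent.
    return {
        c for c in joined
        if all(s in prev for s in combinations(c, len(c) - 1))
    }


# Hypothetical frequent 2-itemsets.
l2 = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")}
c3 = apriori_gen(l2)
print(c3)
```

With this input the join step produces `("a", "b", "c")` and `("b", "c", "d")`; the prune step then removes `("b", "c", "d")` because its subitemset `("c", "d")` is not frequent.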

During the mining of association rules, pattern matching mainly occurs in Count_support(C_k), which counts the support of the candidate k-itemsets. The count is obtained by matching the k-itemsets that can be formed from the items of each transaction in the transaction data set against the set of candidate k-itemsets C_k (k = 1, 2, ...). Hence, pattern matching in association rule mining is the match between any k-itemset drawn from a transaction that contains at least k items and any itemset in the set of candidate k-itemsets.
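The matching described above can be sketched as a straightforward enumeration: for each transaction with at least k items, form every k-itemset it contains and look it up among the candidates. The function name `count_support`, the sorted-tuple representation, and the sample data are assumptions for illustration; practical implementations usually avoid full enumeration by using structures such as hash trees.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Itemset = Tuple[str, ...]


def count_support(candidates: Set[Itemset],
                  transactions: Iterable[Set[str]]) -> Dict[Itemset, int]:
    """Count how many transactions contain each candidate k-itemset.

    Assumes all candidates have the same length k and are sorted tuples.
    """
    counts: Dict[Itemset, int] = {c: 0 for c in candidates}
    if not candidates:
        return counts
    k = len(next(iter(candidates)))
    for t in transactions:
        if len(t) < k:
            continue  # a transaction with fewer than k items cannot match
        # Enumerate every k-itemset of the transaction and match it
        # against the candidate set.
        for subset in combinations(sorted(t), k):
            if subset in counts:
                counts[subset] += 1
    return counts


# Hypothetical data for illustration.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c", "d"}, {"a"}]
candidates = {("a", "b"), ("b", "c"), ("c", "d")}
counts = count_support(candidates, transactions)
print(counts)
```

The transaction `{"a"}` is skipped because it has fewer than two items, which mirrors the condition that only transactions whose item number is not less than k take part in the match.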