ASSOCIATION RULES DESCRIPTION


Given a transaction database DB, I={I 1 ,I 2 , ,I m }is a set of itemsets with m different itemsets in DB. Each transaction T in DB is a set of items (i.e., itemsets), so T I.

Definition 1
start example

Itemset P is defined as A 1 ˆ A 2 ˆ ˆ A k , A i ˆˆ I(i=1,2, ,k), and P containing k items is called k-itemset .

end example
 
Definition 2
start example

The support of itemset P is defined as ƒ (P/DB)=the support account containing P in DB/the total transaction amount in DB=A/DB/DB.

end example
 
Definition 3
start example

If A and B are two itemsets, and A ˆ B= , then the confidence of association rule A   B in DB is defined as ˆ (A   B /DB)= ƒ (A ˆ B /DB)/ ƒ (A /DB).

end example
 
Definition 4
start example

Let the minimum support be ƒ min . Then the set of k frequent itemsets and the set of k non-frequent itemsets are defined separately as:

end example
 

To mine efficacious association rules in DB, minimum support ƒ min and minimum confidence ˆ min must first be defined. Mining association rules find all of the association rules satisfying ƒ (A ˆ B /DB) ƒ min and ˆ (A   B /DB) ˆ min in DB. Owing to the fact that the result of ˆ (A   B /DB) can be gotten from the value of ƒ (A ˆ B /DB) and ƒ (A /DB), the key to mining association rule A   B is to generate the set of k frequent itemsets. Therefore, the substantive study at present focuses on generating the set of k frequent itemsets (see Agrawal & Srikant, 1994; Feng et al., 1998; Zhang et al., 2000), which is the key to heightening the mining efficiency. We also focus on pattern match, which is the key to generating k frequent itemsets. The corresponding Apriori algorithm is as follows :

  1. C 1 ={candidate 1-itemsets}

  2. L 1 ={c ˆˆ C 1 c.count ‰ƒ min }

  3. For (k=2; L k ˆ’ 1 ‰  ; k++)

  4. C k =apriori-gen(L k ˆ’ 1 )

  5. Count_support(C k )

  6. L k ={c ˆˆ C1c.counte ‰ƒ min }

  7. Resultset= ˆ L k

  8. Next

Here, C k is candidate k-itemsets , L k is k-itemsets , Count_support(C k ) is to count the support count of candidate k-itemsets , C k , apriori-gen(L k ˆ’ 1 ) is to generate C k , which includes two steps. First, join L k ˆ’ 1 into k-itemsets . This is called the join step:

  • insert into C k
    select P.A 1 , P.A 2 , , P.A k ˆ’ 1 ,Q. A k ˆ’ 1
    from L k ˆ’ 1 P inner join L k ˆ’ 1 Q
    where P.A 1 = Q.A 1 , P.A 2 = Q.A 2 , , P.A k ˆ’ 2 = Q.A k ˆ’ 2 , P.A k ˆ’ 1 < Q.A k ˆ’ 1

Then, delete any (k ˆ’ 1)-subitemsets of C k which not be included in L k ˆ’ 1 . This is called the prune step:

 For all itemsets c   C  k  For all k-1_subitemsets s of c      If (s   L  k-1  ), then          Delete c from C  k  and get the candidate  k-itemsets  C  k  . 

During the mining of association rules, pattern match mainly occurs in Count_support(C k ), which is the account of the support count of candidate k-itemsets . The resulting account is a match between the k-itemsets constructed by all the k items, compounded by each transaction in transaction data set and the set of candidate k-itemsets C k (k=1,2, ). From the above, we know the pattern match of mining association rules is the match between any k-itemsets from each transaction of transaction data set whose item number is not less than k and any one itemset in the set of candidate k-itemsets .




(ed.) Intelligent Agents for Data Mining and Information Retrieval
(ed.) Intelligent Agents for Data Mining and Information Retrieval
ISBN: N/A
EAN: N/A
Year: 2004
Pages: 171

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net