# Association Rule Mining

1. Definition

• Association: Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrence of other items in the transaction.
• Support count ( ): frequency of occurrence of an itemset
• Support: fraction of transactions that contain an item set
• Frequent Itemset: An itemset whose support is greater or equal to a minsup threshold
2. Association rule
• Association rule
• • support (s)
• fraction of transactions that contains both X and Y
• confidence
• measures how often items in Y appear in transactions that contains X
3. Association Rule Mining Task

• Given a set of transactions T, the goal of association rule mining is to find all rules having
• support >= minsup threshold
• confidence >= minconf threshold
• Brute-force approach
• list all possible association rules
• compute the support and confidence for each rule
• prune rules that fail the minsup and minconf thresholds
• this is computationally prohibitive!
4. Two-step Approach
• Step 1: frequent itemset generation
• generate all itemset whose support >= minsup
• Step 2: rule generation
• generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
• Drawback: frequent itemset generation is still computationally expensive
5. Computational Complexity
• Given d unique items
• total number of item set is: • total number of possible association rules is:
• = 6. Reducing Number of Candidates
• Apriori principal
• If an itemset is frequent, then all of its subsets must also be frequent
• i.e., • Apriori algorithm
• Let k = 1
• generate frequent itemset of length 1
• repeat until no new frequent itemsets are identified
• generate length (k+1) candidate itemsets from length k frequent itemsets
• prune candidate itemsets containing subsets of length k that are infrequent
• count the support of each candidate by scanning the db
• eliminate candidates that are infrequent, leaving only those that are frequent
7. Closed Itermset

• An itemset is closed if none of its immediate supersets has the same support as the itemset.
8. Effect of Support Distribution
• How to set the appropriate minsup threshold?
• if minsup is set too high, we could miss items involving interesting rate items (e.g., expensive products)
• if minsup is set too low, it is computatinally expensive and the number of itemset is very large.
• Using a single minimum support threshold may not be effective.