**1. Definition**

**Association**: Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrence of other items in the transaction.**Support count ():**frequency of occurrence of an itemset**Support:**fraction of transactions that contain an item set**Frequent Itemset:**An itemset whose support is greater or equal to a__minsup__threshold

**2. Association rule**

**Association rule****support (s)**- fraction of transactions that contains both X and Y
**confidence**- measures how often items in Y appear in transactions that contains X

**
**

**3. Association Rule Mining Task**

**
**

- Given a set of transactions T, the goal of association rule mining is to find all rules having
- support >= minsup threshold
- confidence >= minconf threshold
- Brute-force approach
- list all possible association rules
- compute the support and confidence for each rule
- prune rules that fail the minsup and minconf thresholds
- this is computationally prohibitive!

**4. Two-step Approach**

- Step 1: frequent itemset generation
- generate all itemset whose support >= minsup
- Step 2: rule generation
- generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
- Drawback: frequent itemset generation is still computationally expensive

**5. Computational Complexity**

- Given d unique items
- total number of item set is:
- total number of possible association rules is:
- =

**6. Reducing Number of Candidates**

- Apriori principal
- If an itemset is frequent, then all of its subsets must also be frequent
- i.e.,
- Apriori algorithm
- Let k = 1
- generate frequent itemset of length 1
- repeat until no new frequent itemsets are identified
- generate length (k+1) candidate itemsets from length k frequent itemsets
- prune candidate itemsets containing subsets of length k that are infrequent
- count the support of each candidate by scanning the db
- eliminate candidates that are infrequent, leaving only those that are frequent

**7. Closed Itermset**

- An itemset is closed if none of its immediate supersets has the same support as the itemset.

**8. Effect of Support Distribution**

- How to set the appropriate minsup threshold?
- if minsup is set too high, we could miss items involving interesting rate items (e.g., expensive products)
- if minsup is set too low, it is computatinally expensive and the number of itemset is very large.
- Using a single minimum support threshold may not be effective.