
Association rule

Association rule learning is a popular and well researched method for discovering
interesting relations between variables in large databases. Piatetsky-Shapiro describes
analyzing and presenting strong rules discovered in databases using different measures of
interestingness. Based on the concept of strong rules, Agrawal et al. introduced association
rules for discovering regularities between products in large scale transaction data recorded by
point-of-sale (POS) systems in supermarkets. For example, the rule

{onions, potatoes} ⇒ {beef}

found in the sales data of a supermarket would indicate that if a customer buys onions and
potatoes together, he or she is likely to also buy beef. Such information can be used as the
basis for decisions about marketing activities.

Definition

Following the original definition by Agrawal et al., the problem of association rule mining is
defined as follows. Let I = {i1, i2, … , in} be a set of n binary attributes called items. Let
D = {t1, t2, … , tm} be a set of transactions called the database. Each transaction in D
has a unique transaction ID and contains a subset of the items in I. A rule is defined as an
implication of the form X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for
short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent
(right-hand side or RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain. The set of
items is I = {milk, bread, butter, beer} and a small database containing the items (1 codes
presence and 0 absence of an item in a transaction) is shown in the table to the right. An
example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if
milk and bread are bought, customers also buy butter.
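
The definitions above can be made concrete in a few lines of Python. This is a minimal sketch: the five transactions are illustrative data made up for this example (not necessarily the table the text refers to), and `support` and `confidence` are helper names introduced here, using the usual definitions of those two measures.

```python
# Illustrative database D over I = {milk, bread, butter, beer}.
# These transactions are assumptions made for this sketch.
D = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """support(X ∪ Y) / support(X) for the rule X ⇒ Y."""
    X, Y = set(X), set(Y)
    if X & Y:
        raise ValueError("antecedent and consequent must be disjoint")
    return support(X | Y, transactions) / support(X, transactions)

# Rule {milk, bread} ⇒ {butter}
rule_support = support({"milk", "bread", "butter"}, D)
rule_confidence = confidence({"milk", "bread"}, {"butter"}, D)
```

On this toy data the rule holds in one of the two transactions that contain both milk and bread, so its confidence is one half.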

History

The concept of association rules was popularised particularly by the 1993 article of Agrawal
et al.

Statistically sound associations

One limitation of the standard approach to discovering associations is that searching massive
numbers of possible associations for collections of items that appear to be associated carries
a large risk of finding many spurious associations. These are collections of items that co-occur with
unexpected frequency in the data, but only do so by chance. For example, suppose we are considering
a collection of 10,000 items and looking for rules containing two items in the left-hand side and one
item in the right-hand side. There are approximately 1,000,000,000,000 such rules. If we apply a
statistical test for independence with a significance level of 0.05, there is only a 5% chance of
accepting a rule if there is no association. If we assume there are no associations, we should
nonetheless expect to find 50,000,000,000 rules. Statistically sound association discovery controls
this risk, in most cases reducing the risk of finding any spurious associations to a user-specified
significance level.
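
The arithmetic behind this example can be checked with a short sketch. The variable names are introduced here for illustration. The text's figure of about 10^12 rules corresponds roughly to 10,000^3. Counting the two left-hand-side items as an unordered pair gives about half that, which leaves the order of magnitude unchanged.

```python
from math import comb

n_items = 10_000
alpha = 0.05  # significance level of the independence test

# Rough count used in the text: about n^3 candidate rules.
approx_rules = n_items ** 3

# Count when the left-hand side is an unordered pair of items and the
# right-hand side is any one of the remaining items.
exact_rules = comb(n_items, 2) * (n_items - 2)

# Expected number of rules accepted purely by chance if no real
# association exists: each rule passes the test with probability alpha.
expected_spurious_approx = alpha * approx_rules
expected_spurious_exact = alpha * exact_rules
```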

Algorithms

Apriori algorithm

Apriori is the best-known algorithm for mining association rules. It uses a breadth-first search
strategy to count the support of itemsets and a candidate generation function that
exploits the downward closure property of support.

Apriori is a classic algorithm for learning association rules. It is designed to operate on
databases containing transactions (for example, collections of items bought by customers, or
records of visits to a website).

The Apriori algorithm is an influential algorithm for mining frequent itemsets for Boolean
association rules.

Key Concepts:

• Frequent itemsets: the sets of items that have minimum support (Li denotes the set of
frequent i-itemsets).

• Apriori property: any subset of a frequent itemset must be frequent.

• Join operation: to find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with
itself.

▪ The algorithm attempts to find itemsets that are contained in at least a minimum
number C of the transactions (the cutoff, or support threshold).

▪ Apriori uses a "bottom up" approach, where frequent subsets are extended one item at
a time (a step known as candidate generation), and groups of candidates are tested
against the data.

▪ The algorithm terminates when no further successful extensions are found.

▪ Apriori uses breadth-first search and a hash tree structure to count candidate itemsets
efficiently.

Algorithm

Association rule mining finds the association rules in a given database that satisfy predefined
minimum support and confidence thresholds. The problem is usually decomposed into two
subproblems. The first is to find those itemsets whose occurrences exceed a predefined threshold
in the database; these itemsets are called frequent or large itemsets. The second is to
generate association rules from those large itemsets under the constraint of minimal
confidence. Suppose one of the large itemsets is Lk = {I1, I2, … , Ik}. Association rules
for this itemset are generated in the following way: the first rule is {I1, I2, … , Ik-1} ⇒
{Ik}, and checking its confidence determines whether the rule is interesting. Further
rules are generated by deleting the last item in the antecedent and inserting it into the
consequent, and the confidences of the new rules are checked in the same way. This process
iterates until the antecedent becomes empty. Since the second subproblem is quite
straightforward, most research focuses on the first subproblem.

The Apriori algorithm finds the frequent sets L in database D:

• Find the frequent set Lk − 1.
• Join step.
o Ck is generated by joining Lk − 1 with itself.
• Prune step.
o Any (k − 1)-itemset that is not frequent cannot be a subset of a frequent k-itemset,
hence any candidate in Ck that contains it should be removed.

where

• (Ck: Candidate itemset of size k)


• (Lk: frequent itemset of size k)

Apriori Pseudocode

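Apriori(D, min.support)
    L1 ← {large 1-itemsets}
    k ← 2
    while Lk − 1 ≠ ∅
        Ck ← Generate(Lk − 1)
        for transactions t ∈ D
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ min.support}
        k ← k + 1
    return ⋃k Lk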
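
The pseudocode can be turned into a small runnable sketch. The version below is an illustration under stated assumptions, not a reference implementation: it counts candidates with a plain scan over the transactions instead of a hash tree, `min_support` is an absolute transaction count, and the example transactions are made up.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Made-up transactions for illustration.
example_db = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]
frequent_itemsets = apriori(example_db, min_support=2)
```

With a threshold of two transactions, only {bread}, {milk}, {butter} and {milk, bread} survive. No 3-itemset candidate is generated, because there is only one frequent 2-itemset to join.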

How Apriori Works

1. Find all frequent itemsets:
   o Get frequent items:
     ▪ Items whose occurrence in the database is greater than or equal to the
       min.support threshold.
   o Get frequent itemsets:
     ▪ Generate candidates from frequent items.
     ▪ Prune the results to find the frequent itemsets.
2. Generate strong association rules from frequent itemsets:
   o Rules which satisfy the min.support and min.confidence thresholds.
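
Step 2 can be sketched as follows. This is a simplified illustration: it tries every split of a frequent itemset into antecedent and consequent, where the text describes moving items one at a time. The support counts in `frequent` are hypothetical values chosen for the example, and `generate_rules` is a name introduced here.

```python
from itertools import combinations

def generate_rules(frequent, min_confidence):
    """Return a list of (antecedent, consequent, confidence) tuples.

    `frequent` maps frozenset itemsets to support counts and must contain
    every non-empty subset of each itemset, which holds for Apriori output
    because of the downward closure property.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

# Hypothetical support counts for illustration.
frequent = {
    frozenset({"milk"}): 2,
    frozenset({"bread"}): 3,
    frozenset({"butter"}): 2,
    frozenset({"milk", "bread"}): 2,
}
strong_rules = generate_rules(frequent, min_confidence=0.8)
```

Here {milk} ⇒ {bread} has confidence 2/2 and is kept, while {bread} ⇒ {milk} has confidence 2/3 and is dropped.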

Apriori Advantages/Disadvantages

▪ Advantages
– Uses large itemset property
– Easily parallelized
– Easy to implement

▪ Disadvantages
– Assumes transaction database is memory resident.
– Requires many database scans.


Eclat algorithm

Eclat is a depth-first search algorithm using set intersection.
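
A minimal sketch of the idea, assuming the usual vertical layout in which each item is mapped to the set of transaction ids that contain it. The support of a larger itemset is then the size of the intersection of these id sets. The function name and example data are introduced here for illustration.

```python
def eclat(transactions, min_support):
    """Return {itemset: support count} via depth-first tid-set intersection."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # Each candidate is (item, tids), where tids is the tid-set of prefix ∪ {item}.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the remaining items' tid-sets to go one level deeper.
            deeper = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    deeper.append((other, common))
            if deeper:
                extend(itemset, deeper)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent

# Made-up transactions for illustration.
example_db = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]
eclat_result = eclat(example_db, min_support=2)
```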

FP-growth algorithm

FP-growth (frequent pattern growth) uses an extended prefix-tree (FP-tree) structure to store
the database in a compressed form. FP-growth adopts a divide-and-conquer approach to
decompose both the mining tasks and the databases. It uses a pattern fragment growth method
to avoid the costly process of candidate generation and testing used by Apriori.

The popular FP-growth Association Rule Mining (ARM) algorithm (Han et al. 2000) is
applied to a particular kind of set enumeration tree, the FP-tree, also developed by Han et
al. Both the FP-tree and the FP-growth algorithm are described in the following two sections.
A short critique is also provided in Section 3.

The essential difference between the original FP-growth algorithm and the LUCS-KDD
(Java) implementation is that a second tree structure, the T-tree (developed by Frans
Coenen, Paul Leng and Graham Goulbourne), is used to store the discovered frequent itemsets
and subsequently generate the desired ARs.

THE FP-TREE

A popular "preprocessing" tree structure is the FP-tree proposed by Han et al. (2000). The
FP-tree stores a single item (attribute) at each node, and includes additional links to facilitate
processing. These links start from a header table and link together all nodes in the FP-tree
which store the same "label", i.e. item.

To illustrate the method, consider the simple data set:

{1, 3, 4}
{2, 4, 5}
{2, 4, 6}
The construction process begins with an initial pass to count support for the single items.
Those that fail to meet the support threshold are eliminated, and the others ordered by
decreasing frequency. For our illustration we will assume that all 1-itemsets are adequately
supported, so the ordering will be {4,2,1,3,5,6}. We then pass through the dataset a second time
and produce an initial FP-tree. We commence by reading the first record in the dataset and
place this in the FP-tree (Figure 1(a) --- note the links from the header table). We then add the
second record; the first element of this is common with an existing node and so we add the
new node to the FP-tree structure as shown in Figure 1(b). We then add the last record
(Figure 1(c)) to complete the initial FP-tree.

Figure 1: Example FP tree
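
The two-pass construction described above can be sketched in Python. This is an illustration under assumptions: the class and function names are introduced here, ties in frequency are broken by item value so that the ordering matches the text, and the header table is a plain dictionary pointing at the first node of each link chain.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode
        self.link = None    # next node in the tree that stores the same item

def build_fp_tree(transactions, min_support):
    """Return (root, header, order) for the FP-tree of `transactions`."""
    # First pass: count single items, drop unsupported ones, and order
    # the rest by decreasing frequency (ties broken by item value).
    counts = Counter(item for t in transactions for item in set(t))
    order = [item for item, c in sorted(counts.items(), key=lambda p: (-p[1], p[0]))
             if c >= min_support]
    rank = {item: r for r, item in enumerate(order)}

    root = FPNode(None, None)
    header = {}  # item -> first node of its link chain
    tail = {}    # item -> last node of its link chain
    # Second pass: insert each record with its items in the global order.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                if item in tail:
                    tail[item].link = child
                else:
                    header[item] = child
                tail[item] = child
            child.count += 1
            node = child
    return root, header, order

dataset = [{1, 3, 4}, {2, 4, 5}, {2, 4, 6}]
root, header, order = build_fp_tree(dataset, min_support=1)
```

For the three records of the example, the root has a single child for item 4 with count 3. That child branches into the path 1, 3 and into a node for item 2 (count 2), which in turn has children 5 and 6.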

THE FP-GROWTH ALGORITHM

The algorithm, FP-growth, for mining the FP-tree structure is a recursive procedure during
which many sub FP-trees and header tables are created. The process commences by
examining each item in the header table, starting with the least frequent. For each entry the
support value for the item is produced by following the links connecting all occurrences of
the current item in the FP-tree. If the item is adequately supported, then for each leaf node a
set of ancestor labels is produced (stored in a prefix tree), each of which has a support
equivalent to the sum of the leaf node items from which it is generated. If the set of ancestor
labels is not null, a new tree is generated with the set of ancestor labels as the dataset, and the
process repeated. In our implementation all frequent itemsets thus discovered were placed in
a T-tree, thus providing fast access during the final stage of the ARM process, while at the
same time providing for the deletion of FP-subtrees and tables created "on route" as the FP-
growth algorithm progressed.

For example, in Figure 1(c) we would start with attribute 6. This has a support of 1 and,
assuming this is above the required support threshold, it would be identified as a frequent
set. There are two ancestor labels, so a new FP-tree is created (Figure 2(a)) using the ancestor
labels as the input set (note that the support values for the labels are equivalent to that of
the leaf node). FP-growth is applied again and the frequent set {2,6} is discovered. There are
still ancestor nodes, so another tree is produced (Figure 2(b)), and the process continues until
there are no more ancestor nodes (Figure 2(c)).

Figure 2: FP growth algorithm
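
The recursion can be illustrated with a simplified sketch. To stay short, it keeps each conditional dataset (the sets of ancestor labels) as a plain list of weighted records and does not compress it into a sub-FP-tree. It therefore shows the pattern-growth recursion but not the tree data structure. The function name is introduced here, and records are assumed to be tuples already sorted in the global frequency order.

```python
from collections import Counter

def pattern_growth(records, min_support, suffix=frozenset()):
    """Return {itemset: support} mined recursively.

    `records` is a list of (items, count) pairs, where `items` is a tuple
    in the global item order and `count` is the weight of that record.
    """
    # Support of each item within the current (conditional) dataset.
    counts = Counter()
    for items, count in records:
        for item in items:
            counts[item] += count

    frequent = {}
    for item, item_support in counts.items():
        if item_support < min_support:
            continue
        pattern = suffix | {item}
        frequent[pattern] = item_support
        # Ancestor labels: the items that precede `item` in each record,
        # carrying that record's count. They form the new input dataset.
        conditional = [
            (items[:items.index(item)], count)
            for items, count in records
            if item in items and items.index(item) > 0
        ]
        if conditional:
            frequent.update(pattern_growth(conditional, min_support, pattern))
    return frequent

# The example data set, with items ordered 4, 2, 1, 3, 5, 6.
records = [((4, 1, 3), 1), ((4, 2, 5), 1), ((4, 2, 6), 1)]
patterns = pattern_growth(records, min_support=1)
```

With a support threshold of 1, processing item 6 yields {6}, then {2,6} and {4,6} from its ancestor labels, and finally {4,2,6}. This follows the sequence of steps in the worked example.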

▪ Compress a large database into a Frequent-Pattern tree (FP-tree) structure
– highly condensed, but complete for frequent pattern mining
– avoids costly database scans during candidate generation
▪ Develop an FP-tree-based frequent pattern mining method
– a divide-and-conquer methodology: decompose mining tasks into smaller ones

Usually faster than Apriori

FP Growth Algorithm

▪ Algorithm 1: Constructing the FP tree
– Input:
  ▪ A transaction database DB
  ▪ A minimum support threshold
– Output:
  ▪ FP tree
▪ Algorithm 2: Mining the FP tree
– Input:
  ▪ A database DB, represented by the FP-tree constructed according to Algorithm 1
  ▪ A minimum support threshold
– Output:
  ▪ The complete set of frequent subgraphs (patterns)

Performance

▪ We can now work with a minimum support of 5%
▪ For 100 documents it takes around 6 minutes to generate all the
maximum_frequent_subgraphs
– Keyword: around 100
▪ For 500 documents it takes around half an hour.
– Keyword: around 200
One-attribute-rule

The one-attribute-rule, or OneR, is an algorithm for finding association rules. According to
Ross, very simple association rules, involving just one attribute in the condition part, often
work well in practice with real-world data. The idea of the OneR (one-attribute-rule)
algorithm is to find the one attribute to use to classify a novel datapoint that makes the fewest
prediction errors.

For example, to classify a car you haven't seen before, you might apply the following rule: If
Fast Then Sportscar, as opposed to a rule with multiple attributes in the condition: If Fast
And Softtop And Red Then Sportscar.

The algorithm is as follows:

For each attribute A:
    For each value V of that attribute, create a rule:
        1. count how often each class appears
        2. find the most frequent class, c
        3. make a rule "if A=V then C=c"
    Calculate the error rate of this rule
Pick the attribute whose rules produce the lowest error rate
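
The steps above can be sketched in Python as follows. The data layout (a list of dictionaries), the function name and the small car dataset are assumptions made for this illustration. Errors are counted as the number of training rows a rule set misclassifies, which ranks attributes in the same way as the error rate.

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, target):
    """Return (best_attribute, rules, errors); rules maps value -> predicted class."""
    best = None
    for attr in attributes:
        # Count how often each class appears for each value of the attribute.
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # For each value, predict its most frequent class.
        rules = {value: classes.most_common(1)[0][0]
                 for value, classes in by_value.items()}
        # Count the training rows that these rules get wrong.
        errors = sum(row[target] != rules[row[attr]] for row in rows)
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Made-up training data for illustration.
cars = [
    {"fast": True,  "red": True,  "kind": "sportscar"},
    {"fast": True,  "red": False, "kind": "sportscar"},
    {"fast": False, "red": True,  "kind": "family car"},
    {"fast": False, "red": False, "kind": "family car"},
    {"fast": True,  "red": True,  "kind": "sportscar"},
]
best_attr, best_rules, best_errors = one_r(cars, ["fast", "red"], "kind")
```

On this toy data the attribute `fast` classifies every row correctly, so the chosen rule is "if fast then sportscar, otherwise family car".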

OPUS search

OPUS is an efficient algorithm for rule discovery that, in contrast to most alternatives, does
not require either monotone or anti-monotone constraints such as minimum support. Initially
used to find rules for a fixed consequent, it has subsequently been extended to find rules with
any item as a consequent. OPUS search is the core technology in the popular Magnum Opus
association discovery system.

Zero-attribute-rule

The zero-attribute-rule, or ZeroR, does not involve any attribute in the condition part, and
always returns the most frequent class in the training set. This algorithm is frequently used to
measure the classification success of other algorithms.
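
A minimal sketch of ZeroR used as a baseline. The function name and the labels are made up for this illustration.

```python
from collections import Counter

def zero_r(train_labels):
    """Return the most frequent class in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]

# Made-up labels for illustration.
train_labels = ["family car", "sportscar", "family car", "family car"]
test_labels = ["family car", "sportscar"]

baseline_class = zero_r(train_labels)
# Accuracy of always predicting the majority class; another classifier
# should beat this figure to be considered useful.
baseline_accuracy = sum(y == baseline_class for y in test_labels) / len(test_labels)
```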
