
Week 9: Market Basket Analysis

Introduction to Market Basket Analysis


- Market Basket Analysis (MBA) analyses how items are transacted together.
- MBA also develops insights into consumer behaviour and supports the formulation of upselling and cross-selling strategies based on purchasing patterns.
> Cross-selling: when items are often bought together, they can be offered with a bundled discount.
> Upselling: when items are bought together, a more expensive model of the same type of product can be advertised near the other product, or features or warranties that relate to the product can be added (e.g. put a more expensive chocolate near pads, or run sales at different times to continually drive sales).
- Definition: The collection of undirected data mining methods for discovering customer purchasing patterns by finding associations between different items in customers' shopping carts; also known as association rule discovery or affinity analysis.
MBA Terminology
- Items: Objects that we attempt to identify associations between.
- Item set: Set of Items, e.g. {A,B,C}
- Transactions: Instances of groups of items co-occurring together; each transaction has an item set.
- Rules: Statements of the form {A,B} → {C}. The output of an MBA is generally a set of rules from which we can derive insights and develop strategies.
- Support: The fraction of transactions in the data set that contain a given item or item set. Rules with high support are preferable because they apply to a large number of transactions. (A time series analysis can also be done to plan for seasonal variations in support levels, and profiling can be applied to the support.)
> Support(A) = P(A) and Support(A → B) = P(AB) = P(BA) (symmetric)

- Confidence: The likelihood that a rule is true for a new transaction that contains the items on the LHS of the rule.
> Confidence(A → B) = P(B|A) = P(BA)/P(A)
> Implement rules with high confidence


- Lift: The ratio by which the confidence of a rule exceeds the expected confidence
> Lift > 1: A and B are positively correlated
> Lift = 1: A and B are independent
> Lift < 1: A and B are negatively correlated
> For all values of lift greater than 1, actual lift = lift value - 1
> % increase in those cases = (lift value - 1) × 100
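> Lift of the rule A → B:
\[
\mathrm{Lift}(A \rightarrow B)
= \frac{\mathrm{Confidence}(A \rightarrow B)}{\mathrm{Support}(B)}
= \frac{P(B \mid A)}{P(B)}
= \frac{\mathrm{Support}(A \rightarrow B)}{\mathrm{Support}(A)\,\mathrm{Support}(B)}
= \frac{P(AB)}{P(A)\,P(B)}
\]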
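
The three measures above can be checked by hand on a small data set. The following is a minimal sketch, not a reference implementation: the transactions are hypothetical toy data invented for illustration, and the function names `support`, `confidence` and `lift` are simply chosen to mirror the terminology in these notes.

```python
# Hypothetical toy transactions, invented for illustration only.
transactions = [
    frozenset({"A", "B"}),
    frozenset({"A", "B", "C"}),
    frozenset({"A"}),
    frozenset({"B", "C"}),
    frozenset({"C"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """P(RHS | LHS) = Support(LHS together with RHS) / Support(LHS)."""
    both = frozenset(lhs) | frozenset(rhs)
    return support(both, transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence of the rule divided by the support of the RHS."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


s_ab = support({"A", "B"}, transactions)   # 2 of 5 transactions
c_ab = confidence({"A"}, {"B"}, transactions)
l_ab = lift({"A"}, {"B"}, transactions)
pct_increase = (l_ab - 1) * 100            # only meaningful when lift > 1
print(f"support={s_ab:.2f} confidence={c_ab:.2f} lift={l_ab:.2f} (+{pct_increase:.1f}%)")
```

In this toy data A and B each appear in 3 of 5 transactions and together in 2 of 5, so the rule A → B has a lift slightly above 1, i.e. a weak positive association.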
Basket Data

ID     Item
001    Apple
001    Orange
002    Apple
003    Orange

- Consists of a collection of transaction IDs and the items bought in each transaction.
- Each item takes up one line, and the data must be sorted by ID.
- Final output of a rule: LHS → RHS [Confidence, Support], e.g. A → B [90%, 10%]
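
To make the one-item-per-line layout concrete, here is a small sketch that groups the four example rows above into one item set per transaction ID. The helper name `to_transactions` is hypothetical. It uses `itertools.groupby` from the Python standard library, which only merges adjacent rows with the same key; that is one way to see why sorted IDs matter, though whether the software used in the course has the same reason for requiring sorted input is an assumption.

```python
from itertools import groupby
from operator import itemgetter

# The (ID, Item) rows from the basket data example, already sorted by ID.
rows = [
    ("001", "Apple"),
    ("001", "Orange"),
    ("002", "Apple"),
    ("003", "Orange"),
]


def to_transactions(rows):
    """Group (ID, item) rows into one item set per transaction ID.

    groupby only merges adjacent rows, so `rows` must be sorted by ID.
    """
    return {
        tid: frozenset(item for _, item in group)
        for tid, group in groupby(rows, key=itemgetter(0))
    }


baskets = to_transactions(rows)
for tid, items in baskets.items():
    print(tid, sorted(items))
```

If the rows were not sorted, calling `sorted(rows)` first would restore the required order.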

Types of rules
- Actionable rules: DA GEM; contain high-quality actionable information that was previously not known or not common knowledge
- Trivial rules: Information already well known within the business
- Inexplicable rules: no explanation available and non-actionable
A priori Algorithm
- Used to reduce the number of item-set combinations that have to be evaluated
- Based on the A priori principle: if the support of an item set is large, then the support of all of its subsets must also be large; likewise, if the support of an item set is small, the support of all of its supersets must also be small.
- The support of an item set will never exceed the support of any of its subsets.
- By using the A priori algorithm, we progressively identify large item sets and eliminate low-support items early in the analysis, reducing the computation required to reach insightful, actionable rules (high support, high confidence).
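
The pruning idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not an optimised or authoritative implementation: the function name `apriori_frequent_itemsets`, the `min_support` threshold and the toy transactions are all hypothetical. It builds item sets level by level and discards any candidate that has an infrequent subset before counting its support.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {item set: support} for all item sets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: keep only single items that meet the threshold.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: a candidate with any infrequent (k-1)-subset cannot be
        # frequent, so it is dropped without scanning the transactions.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy transactions, invented for illustration only.
toy = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}, {"C"}]
result = apriori_frequent_itemsets(toy, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

With this toy data {A,C} falls below the threshold, so {A,B,C} is pruned without its support ever being counted, which is the saving the principle above describes.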
