
Cluster Analysis

March 4, 2007 Prepared by Prof C Y Nimkar 1


Usage

• Usually used for grouping customers into clusters that have similar behaviour / attitude
• Helps the marketer decide on the target audience. With this, the marketer can do:
– Product differentiation…
– Offer differentiation…



Product differentiation

Product: Mobile handset
– Model 1: Target audience 1
– Model 2: Target audience 2
– Model 3: Target audience 3
– Model 4: Target audience 4



Offer differentiation

Product: Chocolate
– Enjoyment: Children
– Celebration: Youngsters
– Quick lunch: Busy persons



Steps in Cluster analysis



1. Collect data
2. Specify method to compute distance between two respondents
3. Specify method to form clusters
4. Perform cluster analysis
5. Obtain clusters



Step 1 – Collect data



Collect data

• Collect data on any variable to be used for segmentation
• Usually they are:
– Customer needs
– Customer’s demographic data
– Customer’s opinion about product(s)…
• Data should preferably be on an interval scale



Sample data on demographics

Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550
3              45    6.5                        650
4              40    6.0                        780
5              50    15.0                       950



Customer space

[Figure: 3-D plot of the five customers from the sample data, with axes AGE, INCOME and AREA]

Each customer is a point in space

Cluster analysis groups customers in clusters based on distances between them



Step 2 – Specify distance method



Distance method

• The following distance methods are available:
– Squared Euclidean distance method
– Euclidean distance method
– City-block (Manhattan) distance method
– Chebychev distance method



Squared Euclidean/Euclidean distance
Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550

Squared Euclidean distance = (42 - 35)^2 + (4.5 - 3.0)^2 + (600 - 550)^2
                           = 2551.25 units
Euclidean distance = √[(42 - 35)^2 + (4.5 - 3.0)^2 + (600 - 550)^2]
                   = √2551.25 = 50.51 units
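
The two calculations for customers 1 and 2 can be reproduced with a short Python sketch. The function names are illustrative choices, not something the slides prescribe.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared differences across all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))

# (age, annual income in Rs. lacs, area of house in sq. ft)
customer_1 = (42, 4.5, 600)
customer_2 = (35, 3.0, 550)

print(squared_euclidean(customer_1, customer_2))    # 2551.25
print(round(euclidean(customer_1, customer_2), 2))  # 50.51
```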



City Block (Manhattan) distance
Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550

• It is the sum of absolute (positive) differences
• In our example this distance
= |42 - 35| + |4.5 - 3.0| + |600 - 550| = 58.5 units
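
As a minimal sketch, the city-block distance for the same two customers can be computed as follows. The function name `city_block` is an illustrative choice.

```python
def city_block(a, b):
    """Sum of absolute differences across all variables."""
    return sum(abs(x - y) for x, y in zip(a, b))

# (age, annual income in Rs. lacs, area of house in sq. ft)
customer_1 = (42, 4.5, 600)
customer_2 = (35, 3.0, 550)

print(city_block(customer_1, customer_2))  # 58.5
```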

Chebychev distance
Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550

• It is the maximum absolute difference over all the variables
• In our example this distance
= Max{ |42 - 35|, |4.5 - 3.0|, |600 - 550| } = 50 units
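
The same example can be sketched in Python. The function name `chebychev` is an illustrative choice.

```python
def chebychev(a, b):
    """Largest absolute difference over all variables."""
    return max(abs(x - y) for x, y in zip(a, b))

# (age, annual income in Rs. lacs, area of house in sq. ft)
customer_1 = (42, 4.5, 600)
customer_2 = (35, 3.0, 550)

print(chebychev(customer_1, customer_2))  # 50
```

In this example the result is simply the difference in area, because area is measured on the largest scale of the three variables.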

Step 3 – Specify method to form clusters


Methods to form clusters

• The following methods are available:
– Single linkage rule (nearest neighbour)
– Complete linkage rule (farthest neighbour)
– Between-groups linkage rule
– Within-groups linkage rule
– Centroid rule
– Ward’s method



Single linkage rule (nearest neighbours)

[Figure: customers 1 to 8 plotted as points]

• Distance between clusters is the minimum distance between a customer in one cluster and a customer in the other cluster
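
A minimal sketch of the single linkage rule follows. It assumes squared Euclidean distance and, purely for illustration, splits the five sample customers into clusters {1, 2, 3} and {4, 5}; the slide's own figure uses a different set of points.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def single_linkage(cluster_a, cluster_b):
    """Smallest distance over all pairs with one customer from each cluster."""
    return min(squared_euclidean(p, q) for p in cluster_a for q in cluster_b)

# Illustrative clusters (an assumption, not taken from the slide's figure)
cluster_a = [(42, 4.5, 600), (35, 3.0, 550), (45, 6.5, 650)]  # customers 1, 2, 3
cluster_b = [(40, 6.0, 780), (50, 15.0, 950)]                 # customers 4, 5

print(single_linkage(cluster_a, cluster_b))  # 16925.25, from customers 3 and 4
```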

Complete linkage rule (farthest neighbours)

[Figure: customers 1 to 8 plotted as points]

• Distance between clusters is the maximum distance between a customer in one cluster and a customer in the other cluster
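
The complete linkage rule can be sketched the same way. The clusters {1, 2, 3} and {4, 5} are again an illustrative assumption built from the sample customers, with squared Euclidean distance.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def complete_linkage(cluster_a, cluster_b):
    """Largest distance over all pairs with one customer from each cluster."""
    return max(squared_euclidean(p, q) for p in cluster_a for q in cluster_b)

# Illustrative clusters (an assumption, not taken from the slide's figure)
cluster_a = [(42, 4.5, 600), (35, 3.0, 550), (45, 6.5, 650)]  # customers 1, 2, 3
cluster_b = [(40, 6.0, 780), (50, 15.0, 950)]                 # customers 4, 5

print(complete_linkage(cluster_a, cluster_b))  # 160369.0, from customers 2 and 5
```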

Between-groups linkage

[Figure: customers 1 to 5 plotted as points]

• It is the average distance over all pairs of customers made up of one customer from each of the two clusters
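
A short sketch of between-groups linkage, under the same illustrative assumptions as before: squared Euclidean distance, and the sample customers split into {1, 2, 3} and {4, 5}.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def between_groups_linkage(cluster_a, cluster_b):
    """Average distance over all pairs with one customer from each cluster."""
    distances = [squared_euclidean(p, q) for p in cluster_a for q in cluster_b]
    return sum(distances) / len(distances)

# Illustrative clusters (an assumption, not taken from the slide's figure)
cluster_a = [(42, 4.5, 600), (35, 3.0, 550), (45, 6.5, 650)]  # customers 1, 2, 3
cluster_b = [(40, 6.0, 780), (50, 15.0, 950)]                 # customers 4, 5

print(round(between_groups_linkage(cluster_a, cluster_b), 2))  # 79234.33
```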



Within-groups linkage

[Figure: three clusters labelled I, II and III]

• The within-groups method considers the distances between pairs of customers after two clusters are combined.
– For example, suppose there are 3 clusters: I, II and III
– After combining clusters I and II, calculate the average distance between all pairs of customers in the combined cluster
– Do the same calculation after combining II and III, and after combining I and III
– Combine the two clusters for which this average distance is least
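
The comparison described above can be sketched as follows. The membership of clusters I, II and III is an assumption made for illustration (sample customers {1, 3}, {2} and {4, 5}), and squared Euclidean distance is assumed.

```python
from itertools import combinations

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_groups_average(cluster_a, cluster_b):
    """Average distance over all pairs of customers in the combined cluster."""
    pairs = list(combinations(cluster_a + cluster_b, 2))
    return sum(squared_euclidean(p, q) for p, q in pairs) / len(pairs)

# Hypothetical cluster membership, chosen only for illustration
clusters = {
    "I": [(42, 4.5, 600), (45, 6.5, 650)],     # customers 1 and 3
    "II": [(35, 3.0, 550)],                    # customer 2
    "III": [(40, 6.0, 780), (50, 15.0, 950)],  # customers 4 and 5
}

averages = {
    (a, b): within_groups_average(clusters[a], clusters[b])
    for a, b in combinations(clusters, 2)
}
best_pair = min(averages, key=averages.get)
print(best_pair, round(averages[best_pair], 2))  # ('I', 'II') 5058.83
```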



Centroid rule
Customer No.   Age                    Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42                     4.5                        600
2              35                     3.0                        550
3              45                     6.5                        650
Centroid       (42+35+45)/3 = 40.7    (4.5+3.0+6.5)/3 = 4.7      (600+550+650)/3 = 600

[Figure: clusters I, II and III, with the centroid (40.7, 4.7, 600)]

• The centroid of a cluster is a virtual customer, here with age 40.7, annual income Rs. 4.7 lacs and area of house 600 sq. ft
• Distance between two clusters is the distance between their centroids
• The two clusters whose centroids are closest are combined
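
A minimal sketch of the centroid calculation, followed by a centroid-to-centroid distance. The second cluster (customers 4 and 5) and the use of squared Euclidean distance are assumptions made for illustration.

```python
def centroid(points):
    """Mean of each variable over the customers in a cluster."""
    return tuple(sum(column) / len(points) for column in zip(*points))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

cluster = [(42, 4.5, 600), (35, 3.0, 550), (45, 6.5, 650)]  # customers 1, 2, 3
other_cluster = [(40, 6.0, 780), (50, 15.0, 950)]           # customers 4, 5 (illustrative)

print([round(v, 1) for v in centroid(cluster)])  # [40.7, 4.7, 600.0]

# Distance between the two clusters = distance between their centroids
d = squared_euclidean(centroid(cluster), centroid(other_cluster))
print(round(d, 3))  # 70277.806
```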

Ward’s method
• For each cluster, the squared Euclidean distance between each respondent and the cluster centroid is calculated, and these distances are added up
• The same calculation is done after combining two clusters
• The two clusters whose combination results in the smallest increase in this sum are joined

[Figure: customers 1 to 8 plotted as points]
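
The rule can be sketched as below. The cluster membership is a hypothetical example built from the sample customers ({1, 3}, {2} and {4, 5}), and the names `sum_of_squares` and `ward_increase` are illustrative.

```python
from itertools import combinations

def centroid(points):
    return tuple(sum(column) / len(points) for column in zip(*points))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sum_of_squares(points):
    """Sum of squared Euclidean distances from each customer to the centroid."""
    c = centroid(points)
    return sum(squared_euclidean(p, c) for p in points)

def ward_increase(cluster_a, cluster_b):
    """Increase in the sum of squares caused by combining the two clusters."""
    merged = cluster_a + cluster_b
    return sum_of_squares(merged) - sum_of_squares(cluster_a) - sum_of_squares(cluster_b)

# Hypothetical cluster membership, chosen only for illustration
clusters = {
    "A": [(42, 4.5, 600), (45, 6.5, 650)],   # customers 1 and 3
    "B": [(35, 3.0, 550)],                   # customer 2
    "C": [(40, 6.0, 780), (50, 15.0, 950)],  # customers 4 and 5
}

increases = {
    (a, b): ward_increase(clusters[a], clusters[b])
    for a, b in combinations(clusters, 2)
}
best_pair = min(increases, key=increases.get)
print(best_pair, round(increases[best_pair], 2))  # ('A', 'B') 3802.33
```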



Step 4 – Perform cluster analysis



Perform cluster analysis

– Perform hierarchical cluster analysis
– Perform K-means cluster analysis



Perform Hierarchical cluster analysis
• Hierarchical cluster analysis indicates the number of clusters that can be formed

Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550
3              45    6.5                        650
4              40    6.0                        780
5              50    15.0                       950

Distance matrix by squared Euclidean method:

Pair     Distance
(1, 2)   2551.25
(1, 3)   2513.00
(1, 4)   32406.25
(1, 5)   122674.25
(2, 3)   10112.25
(2, 4)   52934.00
(2, 5)   160369.00
(3, 4)   16925.25
(3, 5)   90097.25
(4, 5)   29081.00
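
The distance matrix can be reproduced from the customer table with a short sketch. The dictionary layout and names are illustrative choices.

```python
from itertools import combinations

# Customer No. -> (age, annual income in Rs. lacs, area of house in sq. ft)
customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

distance = {
    (i, j): squared_euclidean(customers[i], customers[j])
    for i, j in combinations(customers, 2)
}

for pair, d in distance.items():
    print(pair, f"{d:.2f}")
```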



Centroid method
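Stage 1: distances between all pairs of customers

Pair     Distance
(1, 2)   2551.25
(1, 3)   2513.00
(1, 4)   32406.25
(1, 5)   122674.25
(2, 3)   10112.25
(2, 4)   52934.00
(2, 5)   160369.00
(3, 4)   16925.25
(3, 5)   90097.25
(4, 5)   29081.00

The smallest distance is for (1, 3), so customers 1 and 3 are combined.

Stage 2: distances after combining 1 and 3

Pair          Distance
((1, 3), 2)   5703.50
((1, 3), 4)   24037.50
((1, 3), 5)   105757.50
(2, 4)        52934.00
(2, 5)        160369.00
(4, 5)        29081.00

The smallest distance is for ((1, 3), 2), so customer 2 joins cluster (1, 3).

Stage 3: distances after combining (1, 3) and 2

Pair             Distance
((1, 2, 3), 4)   32402.22
((1, 2, 3), 5)   122693.76
(4, 5)           29081.00

The smallest distance is for (4, 5), so customers 4 and 5 are combined.

Stage 4: the remaining pair

Pair                  Distance
((1, 2, 3), (4, 5))   70277.74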
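
The stage-by-stage merging can be sketched in Python as below. This is a minimal illustration of the centroid rule with squared Euclidean distance, not the routine used by any particular statistics package; the function name `centroid_agglomeration` is hypothetical.

```python
from itertools import combinations

customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(members):
    points = [customers[m] for m in members]
    return tuple(sum(column) / len(points) for column in zip(*points))

def centroid_distance(a, b):
    return squared_euclidean(centroid(a), centroid(b))

def centroid_agglomeration():
    """Repeatedly combine the two clusters whose centroids are closest."""
    clusters = [(number,) for number in customers]
    schedule = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: centroid_distance(*pair))
        schedule.append((a, b, round(centroid_distance(a, b), 3)))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return schedule

schedule = centroid_agglomeration()
for stage, (a, b, coefficient) in enumerate(schedule, start=1):
    print(stage, a, b, coefficient)
```

The sketch keeps centroids unrounded, so its last coefficient is 70277.806. The hand-calculated 122693.76 and 70277.74 in the tables are consistent with centroids rounded to two decimals (40.67, 4.67).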



Perform Hierarchical cluster analysis

• Obtain the Agglomeration schedule and Dendrogram from software



Centroid method: Agglomeration Schedule

        Cluster Combined                       Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2   Next Stage
1       1           3           2513.000       0           0           2
2       1           2           5703.500       1           0           4
3       4           5           29081.000      0           0           4
4       1           4           70277.806      2           3           0

• Jumps in the coefficients are seen between stages 2 and 3 as well as between stages 3 and 4
• 3 or 2 clusters are possible
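
The jumps can be checked with a few lines of Python. The sketch assumes the usual reading of an agglomeration schedule: stopping after stage k of a run on n customers leaves n - k clusters.

```python
# Coefficients from the agglomeration schedule, stages 1 to 4
coefficients = [2513.000, 5703.500, 29081.000, 70277.806]
n_customers = 5

# Jump in the coefficient between each stage and the next one
jumps = [later - earlier for earlier, later in zip(coefficients, coefficients[1:])]

for stage, jump in enumerate(jumps, start=1):
    clusters_left = n_customers - stage  # clusters remaining if merging stops after this stage
    print(f"stages {stage}-{stage + 1}: jump {jump:.3f}, stopping here leaves {clusters_left} clusters")
```

Stopping before the jump between stages 2 and 3 leaves 3 clusters, and stopping before the jump between stages 3 and 4 leaves 2 clusters, which matches the note above.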



Dendrogram
[Figure: dendrogram]



Product:
Men’s Readymade Formal Shirt



Shortlisting of shirt features

25 respondents were contacted for a formal discussion, which shortlisted these features:
1. Fabric
2. Brand image
3. Style
4. Colour
5. Fitting
6. Price



Questionnaire

Que.:
(a) Imagine you want to buy a formal shirt for yourself. Here are some features
of a shirt. Please rank them according to importance that you would attach to
them. The most important feature to you would get rank 1, the second most
important will get rank 2 and so on.

(b) Now allocate a total of 50 points to them. The points would be allocated
such that rank 1 will get the highest points, rank 2 will get second highest points
and so on. Please ensure that sum of points should be 50.

Feature       Rank   Points
Fabric        ____   ____
Brand image   ____   ____
Style         ____   ____
Colour        ____   ____
Fitting       ____   ____
Price         ____   ____
TOTAL                50
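
The rules of the questionnaire can be expressed as a small consistency check on each response. The sketch below is an assumption about how such a check might look: it treats "highest", "second highest" and so on as strictly decreasing points, and the sample response is invented for illustration.

```python
FEATURES = ["Fabric", "Brand image", "Style", "Colour", "Fitting", "Price"]

def is_valid_response(ranks, points, total=50):
    """Check ranks 1..6 are each used once, points sum to 50 and follow the ranks."""
    if sorted(ranks[f] for f in FEATURES) != list(range(1, len(FEATURES) + 1)):
        return False
    if sum(points[f] for f in FEATURES) != total:
        return False
    by_rank = sorted(FEATURES, key=lambda f: ranks[f])
    # Assumption: a better rank must receive strictly more points
    return all(points[a] > points[b] for a, b in zip(by_rank, by_rank[1:]))

# Hypothetical response, not taken from the survey data
ranks = {"Fabric": 1, "Fitting": 2, "Price": 3, "Style": 4, "Colour": 5, "Brand image": 6}
points = {"Fabric": 14, "Fitting": 11, "Price": 9, "Style": 7, "Colour": 5, "Brand image": 4}

print(is_valid_response(ranks, points))  # True
```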

Data


Hierarchical Cluster Analysis - Agglomeration schedule
(Between group linkage/squared Euclidean distance)

Number of clusters = 2

Perform K-means cluster analysis


K-Means Cluster Analysis
Iteration History (a)

            Change in Cluster Centers
Iteration   1       2
1           7.603   7.347
2           .329    .258
3           .110    .083
4           .078    .058
5           .081    .061
6           .000    .000

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 6. The minimum distance between initial centers is 15.634.
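
To show where an iteration history like this comes from, here is a minimal K-means sketch. The survey data is not reproduced in the slides, so the sketch runs on the five sample customers from the earlier slides. The choice of customers 2 and 5 as initial centers and the use of Euclidean distance moved as the "change" are assumptions, not the settings behind the output above.

```python
import math

def k_means(points, initial_centers, tol=0.0, max_iter=10):
    """Assign points to the nearest center, recompute centers, repeat until stable."""
    centers = list(initial_centers)
    history = []  # per iteration: how far each center moved
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        new_centers = [
            tuple(sum(column) / len(g) for column in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        changes = [math.dist(old, new) for old, new in zip(centers, new_centers)]
        history.append(changes)
        centers = new_centers
        if max(changes) <= tol:  # no or small change: converged
            break
    return centers, groups, history

points = [(42, 4.5, 600), (35, 3.0, 550), (45, 6.5, 650), (40, 6.0, 780), (50, 15.0, 950)]
initial_centers = [points[1], points[4]]  # assumption: start from customers 2 and 5

centers, groups, history = k_means(points, initial_centers)
for iteration, changes in enumerate(history, start=1):
    print(iteration, [round(c, 3) for c in changes])
print([len(g) for g in groups])  # [3, 2]
```

On this small example the centers stop moving at the second iteration, with customers 1, 2, 3 in one cluster and customers 4, 5 in the other.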



ANOVA

           Cluster              Error
           Mean Square   df     Mean Square   df    F         Sig.
FABRIC     2.976         1      2.492         98    1.194     .277
BRANDIMA   119.782       1      2.908         98    41.192    .000
STYLE      4.775         1      1.971         98    2.422     .123
COLOUR     18.688        1      2.972         98    6.289     .014
FITTING    11.184        1      1.673         98    6.685     .011
PRICE      363.735       1      2.621         98    138.781   .000

The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal.
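
Each F value in the table is the cluster mean square divided by the error mean square. A short sketch recomputes the ratios from the printed mean squares and ranks the variables; the variable names in the code are illustrative.

```python
# Variable -> (cluster mean square, error mean square), as printed in the ANOVA table
mean_squares = {
    "FABRIC": (2.976, 2.492),
    "BRANDIMA": (119.782, 2.908),
    "STYLE": (4.775, 1.971),
    "COLOUR": (18.688, 2.972),
    "FITTING": (11.184, 1.673),
    "PRICE": (363.735, 2.621),
}

f_ratios = {name: cluster / error for name, (cluster, error) in mean_squares.items()}

# Variables ordered from largest to smallest F
ranked = sorted(f_ratios, key=f_ratios.get, reverse=True)
for name in ranked:
    print(name, round(f_ratios[name], 2))
```

The recomputed ratios differ from the printed F values only in the last decimals, because the mean squares in the table are themselves rounded. PRICE and BRANDIMA have by far the largest F values.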

Final Cluster Centers

           Cluster
           1    2
FABRIC     9    9
BRANDIMA   6    8
STYLE      8    9
COLOUR     9    9
FITTING    9    10
PRICE      9    5

Clusters differ on:
– Price
– Brand image

Cluster Sizes

Number of Cases in each Cluster

Cluster   1   43.000
          2   57.000
Valid         100.000
Missing       .000

The sizes of the two clusters are fairly similar

Both segments are important to the marketer



Final Cluster Centers

Cluster
1 2
FABRIC 9 9
BRANDIMA 6 8
STYLE 8 9
COLOUR 9 9
FITTING 9 10
PRICE 9 5

Cluster 1: Price sensitive
Cluster 2: Brand image sensitive

Company can consider marketing shirts under two brands:
– One brand should be associated with best value for money
– Second brand should be associated with status symbol
