
Motivation

Market segmentation involves viewing a heterogeneous market as a number of smaller markets, in response to differing preferences, attributable to the desires of consumers for more precise satisfaction of their varying wants.
Steps Associated with a Cluster Analysis

1. Decide on the clustering variables
2. Decide on the clustering procedure (hierarchical methods or partitioning methods)
3. Select a measure of similarity or dissimilarity
4. Choose a clustering algorithm
5. Decide on the number of clusters
6. Validate and interpret the cluster solution

Decision Rules for Choosing Clustering Variables

1. There should be significant differences in the "dependent" variable(s) across the clusters.
2. Avoid using an abundance of clustering variables, as this increases the odds that the variables are no longer dissimilar. If the variables are highly correlated, specific aspects covered by these variables will be overrepresented in the clustering solution (see the sketch after this list).
3. Keep the sample size in mind (rule of thumb: the sample size should be at least 2^m, where m equals the number of clustering variables).
4. The data underlying the clustering variables should be of high quality.
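Rules 2 and 3 are easy to check in code. A minimal Python sketch, assuming the clustering variables sit in a pandas DataFrame; the variable names, the random data, and the 0.9 correlation cutoff are illustrative, not from the original:

```python
import numpy as np
import pandas as pd

# Hypothetical clustering variables for 100 respondents
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price_consciousness": rng.random(100),
    "brand_loyalty": rng.random(100),
})

# Rule 2: flag highly correlated variable pairs (0.9 is an illustrative cutoff)
corr = df.corr().abs()
pairs = [(a, b) for a in corr.columns for b in corr.columns
         if a < b and corr.loc[a, b] > 0.9]
print("Highly correlated pairs:", pairs)

# Rule 3: sample size should be at least 2^m (m = number of clustering variables)
m = df.shape[1]
print(f"n = {len(df)}, required >= {2 ** m}")
```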


Agglomerative Methods are the Most Common Type of Hierarchical Clustering Methods

Hierarchical methods:
• Agglomerative clustering:
  – Clusters are consecutively formed from objects
  – At the beginning, each object represents an individual cluster
  – The clusters are then sequentially merged according to their similarity (illustrated in the sketch below)
• Divisive clustering:
  – At the beginning, all objects are merged into a single cluster
  – This cluster is then gradually split up

[Figure: agglomerative clustering runs from Step 1 (five singleton clusters A, B, C, D, E) through {A, B} {C} {D} {E}, {A, B} {C, D} {E}, and {A, B} {C, D, E} to Step 5 (a single cluster A, B, C, D, E); divisive clustering traverses the same states in reverse order]

Partitioning methods:
• k-means, k-medoids, …
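To make the agglomerative side concrete, here is a minimal Python sketch with scikit-learn (whose hierarchical clustering is agglomerative); the five objects A–E and their coordinates are invented for this example:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Five illustrative objects A-E (coordinates invented)
X = np.array([[1.0, 1.0], [1.2, 1.1],   # A, B lie close together
              [4.0, 4.0], [4.2, 4.1],   # C, D lie close together
              [5.0, 5.0]])              # E

# Start from five singleton clusters and merge until two clusters remain
model = AgglomerativeClustering(n_clusters=2, linkage="average").fit(X)
print(dict(zip("ABCDE", model.labels_)))  # e.g. {'A': 1, 'B': 1, 'C': 0, 'D': 0, 'E': 0}
```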
For Ordinal and Metric Variables, Different Distance Measures can be Used (Hierarchical Methods)
Distance measures*:

• Euclidean distance:
  $d_{\text{Euclidean}}(B, C) = \sqrt{(x_B - x_C)^2 + (y_B - y_C)^2}$

• City-block distance / Manhattan metric:
  $d_{\text{Cityblock}}(B, C) = |x_B - x_C| + |y_B - y_C|$

• Chebychev distance:
  $d_{\text{Chebychev}}(B, C) = \max(|x_B - x_C|,\; |y_B - y_C|)$

[Figure: customers plotted by price consciousness (x) and brand loyalty (y); the Euclidean, city-block, and Chebychev distances between customers B and C are drawn in]

Variables measured on different scales or levels affect the results of the analysis → always standardize the data prior to the analysis.

* Distance between customer B and customer C
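The three measures can be verified directly in Python; a short sketch using SciPy, with invented (price consciousness, brand loyalty) coordinates for customers B and C:

```python
from scipy.spatial.distance import euclidean, cityblock, chebyshev

# Invented (price consciousness, brand loyalty) scores for customers B and C
B = [2.0, 5.0]
C = [4.0, 1.0]

print("Euclidean: ", euclidean(B, C))   # sqrt((2-4)^2 + (5-1)^2) ~ 4.47
print("City-block:", cityblock(B, C))   # |2-4| + |5-1| = 6
print("Chebychev: ", chebyshev(B, C))   # max(|2-4|, |5-1|) = 4
```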
Clustering Algorithms for Hierarchical Methods (I)

• Single linkage: the distance between two clusters corresponds to the shortest distance between any two members of the two clusters (= nearest neighbor)
• Complete linkage: the distance between two clusters is based on the longest distance between any two members of the two clusters (= furthest neighbor)
• Average linkage: the distance between two clusters is defined as the average distance between all pairs of the two clusters' members
Clustering Algorithms for Hierarchical Methods (II)

• Centroid: the geometric center (centroid) of each cluster is computed first; the distance between two clusters equals the distance between the two centroids
• Ward's method: the objects whose merger increases the overall within-cluster variance to the smallest possible degree are combined

Note that each algorithm has different properties, making it more or less suitable for specific data constellations (e.g. the presence of outliers).
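In SciPy's hierarchical clustering, these algorithms correspond to the method argument of linkage. A sketch comparing them on invented data (the three-cluster cut is arbitrary):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.random((20, 2))  # 20 invented objects, 2 clustering variables

# 'centroid' and 'ward' additionally require Euclidean distances
for method in ["single", "complete", "average", "centroid", "ward"]:
    Z = linkage(X, method=method, metric="euclidean")
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
    print(f"{method:>8}: {labels}")
```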
k-means Clustering Process (I)

[Figure: Steps 1 and 2 — objects A–G plotted by price consciousness (x) and brand loyalty (y), with cluster centers CC1 and CC2]

• Decide on the number of clusters (e.g. two)
• The algorithm randomly selects a center for each cluster (e.g. CC1 and CC2)
• Euclidean distances are computed from the centers to every single object
• Each object is then assigned to the cluster center with the shortest distance to it
k-means Clustering Process (II)

[Figure: Steps 3 and 4 — the cluster centers move to CC1' and CC2', and objects A–G are reassigned]

• Each cluster's geometric center is computed (= the mean values of the objects contained in the cluster regarding each of the clustering variables)
• The distances from each object to the newly located cluster centers are computed
• The objects are again assigned to a certain cluster
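The steps on these two slides amount to a short assign/recompute loop. A minimal NumPy sketch of that process (invented data; edge cases such as empty clusters are ignored for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 2))  # 50 invented objects, 2 clustering variables
k = 2

# Step 1: randomly pick k objects as initial cluster centers (CC1, CC2)
centers = X[rng.choice(len(X), size=k, replace=False)]

for _ in range(100):
    # Step 2: assign each object to the nearest center (Euclidean distance)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assignment = dists.argmin(axis=1)
    # Step 3: recompute each center as the mean of its assigned objects
    # (a cluster left empty would yield NaN; ignored in this sketch)
    new_centers = np.array([X[assignment == j].mean(axis=0) for j in range(k)])
    # Step 4: reassignment repeats until the centers stop moving
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

print("Final cluster centers:\n", centers)
```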
Generally, k-means is More Flexible Compared to Hierarchical Methods

k-means or hierarchical methods?

Advantages (+):
• k-means is less affected by outliers and the presence of irrelevant clustering variables
• k-means can be applied to very large datasets

Disadvantages (−):
• k-means should only be used on interval- or ratio-scaled data
• The researcher has to pre-specify the number of clusters to retain from the data (but the number of clusters can be determined by running a hierarchical procedure beforehand)
• The final clustering solution depends strongly on the initial, random placement of the cluster centers → run the algorithm several times to check whether the results are stable
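The stability check suggested above can be automated by rerunning k-means from different random starts and comparing the resulting within-cluster sum of squares. A sketch with scikit-learn on invented data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((200, 2))  # invented data

# Rerun k-means from different random starts; similar inertia values
# (within-cluster sum of squares) suggest a stable solution
for seed in range(5):
    km = KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}: inertia={km.inertia_:.3f}")
```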
Decide on the Number of Clusters

• Scree plot
• Dendrogram
• Variance ratio criterion (VRC):
  – Compute $\omega_k = (\mathrm{VRC}_{k+1} - \mathrm{VRC}_k) - (\mathrm{VRC}_k - \mathrm{VRC}_{k-1})$
  – Choose the solution which minimizes $\omega_k$ (a sketch follows this list)
• A priori knowledge
• Practical considerations:
  – Are the results interpretable and meaningful?
  – Are the segments manageable?
  – Does the solution warrant strategic attention?
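The VRC is also known as the Calinski–Harabasz criterion, which scikit-learn implements directly. A sketch computing VRC_k over a range of cluster numbers and then ω_k (invented data; k-means is used here only to generate candidate solutions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
X = rng.random((300, 2))  # invented data

# VRC_k (Calinski-Harabasz) for candidate solutions with k = 2..8
vrc = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    vrc[k] = calinski_harabasz_score(X, labels)

# omega_k = (VRC_{k+1} - VRC_k) - (VRC_k - VRC_{k-1}); choose k minimizing it
omega = {k: (vrc[k + 1] - vrc[k]) - (vrc[k] - vrc[k - 1]) for k in range(3, 8)}
print("choose k =", min(omega, key=omega.get))
```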
Example SPSS (I)

• Dataset thaltegos.sav
Example SPSS (II)

• Move all variables into the variables box
• Select Pearson
Example SPSS (III)

• Check the box Agglomeration schedule and continue
• Choose to display a dendrogram
• Specify the cluster method, the distance measure, and the type of standardization of values; here: Nearest neighbor, Euclidean distance, Range −1 to 1 (by variable)
• The Save option enables you to save cluster memberships for a single solution or a range of solutions
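For readers without SPSS, roughly the same configuration can be reproduced in Python. The sketch below uses invented data and approximates the Range −1 to 1 option by rescaling each variable to [−1, 1]; SPSS's exact standardization formula may differ:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.random((15, 4))  # invented stand-in for the thaltegos.sav variables

# Approximation of "Range -1 to 1 (by variable)": rescale each column to [-1, 1]
mins, maxs = X.min(axis=0), X.max(axis=0)
X_std = 2 * (X - mins) / (maxs - mins) - 1

# Nearest neighbor = single linkage, with Euclidean distances
Z = linkage(X_std, method="single", metric="euclidean")

# Counterpart of the Save option: memberships for a range of solutions
memberships = {k: fcluster(Z, t=k, criterion="maxclust") for k in (2, 3, 4)}
print(memberships)
```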
Example SPSS (IV)

Agglomeration Schedule

         Cluster Combined                    Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2   Next Stage
  1         5           6           .149           0           0            2
  2         5           7           .184           1           0            3
  3         4           5           .201           0           2            5
  4        14          15           .213           0           0            6
  5         3           4           .220           0           3            8
  6        13          14           .267           0           4           11
  7        11          12           .321           0           0            9
  8         2           3           .353           0           5           10
  9        10          11           .357           0           7           11
 10         1           2           .389           0           8           14
 11        10          13           .484           9           6           13
 12         8           9           .575           0           0           13
 13         8          10           .618          12          11           14
 14         1           8           .910          10          13            0

• The last column on the very right tells you in which stage of the algorithm this cluster will appear next
• In the first stage, objects 5 and 6 are merged at a distance of 0.149; the resulting cluster is labeled as indicated by the first object involved in this merger, which is object 5
• The icicle diagram provides information similar to the agglomeration schedule
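SciPy's linkage output carries essentially the same information as the agglomeration schedule, which makes the table easy to recompute outside SPSS. A sketch on invented data; the column mapping in the comments is the point of the example:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.random((15, 2))  # invented data with 15 objects

# Each row of Z is one stage: [cluster 1, cluster 2, coefficient, size].
# Indices below 15 denote original objects; index 15 + s refers to the
# cluster formed in stage s + 1, encoding the same information as the
# "Stage Cluster First Appears" columns in SPSS.
Z = linkage(X, method="single")
for stage, (c1, c2, coef, size) in enumerate(Z, start=1):
    print(f"Stage {stage:2d}: merge {int(c1):2d} and {int(c2):2d} "
          f"at coefficient {coef:.3f}")
```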
Example SPSS (V)

[Scree plot of the agglomeration coefficients against the number of clusters]

No clear elbow indicating a suitable number of clusters to retain.
Example SPSS (VI)

[Dendrogram]

• Distances are rescaled to a range of 0 to 25
• The marked junction indicates the rescaled distance at which Audi A6 2.4 (object #14) and BMW 525i (object #15) are merged
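A comparable dendrogram can be produced with SciPy; note that SciPy plots the raw merge distances, whereas the 0 to 25 rescaling is specific to SPSS. A sketch on invented data with made-up car labels:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.random((15, 2))  # invented data; the labels below are made up
Z = linkage(X, method="single")

dendrogram(Z, labels=[f"car {i + 1}" for i in range(15)])
plt.ylabel("merge distance")  # SPSS instead rescales this axis to 0-25
plt.tight_layout()
plt.show()
```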
