
Motivation

Market segmentation involves viewing a heterogeneous market as a number of smaller markets, in response to differing preferences, attributable to the desires of consumers for more precise satisfaction of their varying wants.
Steps Associated with a Cluster Analysis

1. Decide on the clustering variables
2. Decide on the clustering procedure (hierarchical methods or partitioning methods)
3. Select a measure of similarity or dissimilarity
4. Choose a clustering algorithm
5. Decide on the number of clusters
6. Validate and interpret the cluster solution

Decision Rules for Choosing Clustering Variables

1. There should be significant differences in the "dependent" variable(s) across the clusters.
2. Avoid using an abundance of clustering variables, as this increases the odds that the variables are no longer dissimilar. If the variables are highly correlated, specific aspects covered by these variables will be overrepresented in the clustering solution (see the sketch after this list).
3. Keep the sample size in mind (rule of thumb: the sample size should be at least 2^m, where m equals the number of clustering variables).
4. The data underlying the clustering variables should be of high quality.
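Rules 2 and 3 are easy to check in code. A minimal Python sketch, assuming the clustering variables sit in a pandas DataFrame; the variable names, the random data, and the 0.9 correlation cutoff are illustrative, not from the original:

```python
import numpy as np
import pandas as pd

# Hypothetical clustering variables for 100 respondents
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price_consciousness": rng.random(100),
    "brand_loyalty": rng.random(100),
})

# Rule 2: flag highly correlated variable pairs (0.9 is an illustrative cutoff)
corr = df.corr().abs()
pairs = [(a, b) for a in corr.columns for b in corr.columns
         if a < b and corr.loc[a, b] > 0.9]
print("Highly correlated pairs:", pairs)

# Rule 3: sample size should be at least 2^m (m = number of clustering variables)
m = df.shape[1]
print(f"n = {len(df)}, required >= {2 ** m}")
```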


Agglomerative Methods are the Most Common Type of Hierarchical Clustering Methods

Hierarchical methods:
• Agglomerative clustering:
  – Clusters are consecutively formed from objects
  – At the beginning, each object represents an individual cluster
  – The clusters are then sequentially merged according to their similarity (illustrated in the sketch below)
• Divisive clustering:
  – At the beginning, all objects are merged into a single cluster
  – This cluster is then gradually split up

[Figure: agglomerative clustering runs from Step 1 (five singleton clusters A, B, C, D, E) through {A, B} {C} {D} {E}, {A, B} {C, D} {E}, and {A, B} {C, D, E} to Step 5 (a single cluster A, B, C, D, E); divisive clustering traverses the same states in reverse order]

Partitioning methods:
• k-means, k-medoids, …
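To make the agglomerative side concrete, here is a minimal Python sketch with scikit-learn (whose hierarchical clustering is agglomerative); the five objects A–E and their coordinates are invented for this example:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Five illustrative objects A-E (coordinates invented)
X = np.array([[1.0, 1.0], [1.2, 1.1],   # A, B lie close together
              [4.0, 4.0], [4.2, 4.1],   # C, D lie close together
              [5.0, 5.0]])              # E

# Start from five singleton clusters and merge until two clusters remain
model = AgglomerativeClustering(n_clusters=2, linkage="average").fit(X)
print(dict(zip("ABCDE", model.labels_)))  # e.g. {'A': 1, 'B': 1, 'C': 0, 'D': 0, 'E': 0}
```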
For Ordinal and Metric Variables, Different Distance Measures can be Used (Hierarchical Methods)
Distance measures*:

• Euclidean distance:
  $d_{\text{Euclidean}}(B, C) = \sqrt{(x_B - x_C)^2 + (y_B - y_C)^2}$

• City-block distance / Manhattan metric:
  $d_{\text{Cityblock}}(B, C) = |x_B - x_C| + |y_B - y_C|$

• Chebychev distance:
  $d_{\text{Chebychev}}(B, C) = \max(|x_B - x_C|,\; |y_B - y_C|)$

[Figure: customers plotted by price consciousness (x) and brand loyalty (y); the Euclidean, city-block, and Chebychev distances between customers B and C are drawn in]

Variables measured on different scales or levels affect the results of the analysis → always standardize the data prior to the analysis.

* Distance between customer B and customer C
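The three measures can be verified directly in Python; a short sketch using SciPy, with invented (price consciousness, brand loyalty) coordinates for customers B and C:

```python
from scipy.spatial.distance import euclidean, cityblock, chebyshev

# Invented (price consciousness, brand loyalty) scores for customers B and C
B = [2.0, 5.0]
C = [4.0, 1.0]

print("Euclidean: ", euclidean(B, C))   # sqrt((2-4)^2 + (5-1)^2) ~ 4.47
print("City-block:", cityblock(B, C))   # |2-4| + |5-1| = 6
print("Chebychev: ", chebyshev(B, C))   # max(|2-4|, |5-1|) = 4
```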
Clustering Algorithms for Hierarchical Methods (I)

• Single linkage: the distance between two clusters corresponds to the shortest distance between any two members of the two clusters (= nearest neighbor)
• Complete linkage: the distance between two clusters is based on the longest distance between any two members of the two clusters (= furthest neighbor)
• Average linkage: the distance between two clusters is defined as the average distance between all pairs of the two clusters' members
Clustering Algorithms for Hierarchical Methods (II)

• Centroid: the geometric center (centroid) of each cluster is computed first; the distance between two clusters equals the distance between the two centroids
• Ward's method: the objects whose merger increases the overall within-cluster variance to the smallest possible degree are combined

Note that each algorithm has different properties, making it more or less suitable for specific data constellations (e.g. the presence of outliers).
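In SciPy's hierarchical clustering, these algorithms correspond to the method argument of linkage. A sketch comparing them on invented data (the three-cluster cut is arbitrary):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.random((20, 2))  # 20 invented objects, 2 clustering variables

# 'centroid' and 'ward' additionally require Euclidean distances
for method in ["single", "complete", "average", "centroid", "ward"]:
    Z = linkage(X, method=method, metric="euclidean")
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
    print(f"{method:>8}: {labels}")
```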
k-means Clustering Process (I)

[Figure: Steps 1 and 2 — objects A–G plotted by price consciousness (x) and brand loyalty (y), with cluster centers CC1 and CC2]

• Decide on the number of clusters (e.g. two)
• The algorithm randomly selects a center for each cluster (e.g. CC1 and CC2)
• Euclidean distances are computed from the centers to every single object
• Each object is then assigned to the cluster center with the shortest distance to it
k-means Clustering Process (II)

[Figure: Steps 3 and 4 — the cluster centers move to CC1' and CC2', and objects A–G are reassigned]

• Each cluster's geometric center is computed (= the mean values of the objects contained in the cluster regarding each of the clustering variables)
• The distances from each object to the newly located cluster centers are computed
• The objects are again assigned to a certain cluster
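The steps on these two slides amount to a short assign/recompute loop. A minimal NumPy sketch of that process (invented data; edge cases such as empty clusters are ignored for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 2))  # 50 invented objects, 2 clustering variables
k = 2

# Step 1: randomly pick k objects as initial cluster centers (CC1, CC2)
centers = X[rng.choice(len(X), size=k, replace=False)]

for _ in range(100):
    # Step 2: assign each object to the nearest center (Euclidean distance)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assignment = dists.argmin(axis=1)
    # Step 3: recompute each center as the mean of its assigned objects
    # (a cluster left empty would yield NaN; ignored in this sketch)
    new_centers = np.array([X[assignment == j].mean(axis=0) for j in range(k)])
    # Step 4: reassignment repeats until the centers stop moving
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

print("Final cluster centers:\n", centers)
```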
Generally, k-means is More Flexible Compared to Hierarchical Methods

k-means or hierarchical methods?

Advantages (+):
• k-means is less affected by outliers and the presence of irrelevant clustering variables
• k-means can be applied to very large datasets

Disadvantages (−):
• k-means should only be used on interval- or ratio-scaled data
• The researcher has to pre-specify the number of clusters to retain from the data (but the number of clusters can be determined by running a hierarchical procedure beforehand)
• The final clustering solution depends strongly on the initial, random placement of the cluster centers → run the algorithm several times to check whether the results are stable
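The stability check suggested above can be automated by rerunning k-means from different random starts and comparing the resulting within-cluster sum of squares. A sketch with scikit-learn on invented data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((200, 2))  # invented data

# Rerun k-means from different random starts; similar inertia values
# (within-cluster sum of squares) suggest a stable solution
for seed in range(5):
    km = KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}: inertia={km.inertia_:.3f}")
```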
Decide on the Number of Clusters

• Scree plot
• Dendrogram
• Variance ratio criterion (VRC):
  – Compute $\omega_k = (\mathrm{VRC}_{k+1} - \mathrm{VRC}_k) - (\mathrm{VRC}_k - \mathrm{VRC}_{k-1})$
  – Choose the solution which minimizes $\omega_k$ (a sketch follows this list)
• A priori knowledge
• Practical considerations:
  – Are the results interpretable and meaningful?
  – Are the segments manageable?
  – Does the solution warrant strategic attention?
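The VRC is also known as the Calinski–Harabasz criterion, which scikit-learn implements directly. A sketch computing VRC_k over a range of cluster numbers and then ω_k (invented data; k-means is used here only to generate candidate solutions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
X = rng.random((300, 2))  # invented data

# VRC_k (Calinski-Harabasz) for candidate solutions with k = 2..8
vrc = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    vrc[k] = calinski_harabasz_score(X, labels)

# omega_k = (VRC_{k+1} - VRC_k) - (VRC_k - VRC_{k-1}); choose k minimizing it
omega = {k: (vrc[k + 1] - vrc[k]) - (vrc[k] - vrc[k - 1]) for k in range(3, 8)}
print("choose k =", min(omega, key=omega.get))
```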
Example SPSS (I)

• Dataset thaltegos.sav
Example SPSS (II)

• Move all variables into the variables box
• Select Pearson
Example SPSS (III)

• Check the box Agglomeration schedule and continue
• Choose to display a dendrogram
• Specify the cluster method, the distance measure, and the type of standardization of values; here: Nearest neighbor, Euclidean distance, Range −1 to 1 (by variable)
• The Save option enables you to save cluster memberships for a single solution or a range of solutions
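For readers without SPSS, roughly the same configuration can be reproduced in Python. The sketch below uses invented data and approximates the Range −1 to 1 option by rescaling each variable to [−1, 1]; SPSS's exact standardization formula may differ:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.random((15, 4))  # invented stand-in for the thaltegos.sav variables

# Approximation of "Range -1 to 1 (by variable)": rescale each column to [-1, 1]
mins, maxs = X.min(axis=0), X.max(axis=0)
X_std = 2 * (X - mins) / (maxs - mins) - 1

# Nearest neighbor = single linkage, with Euclidean distances
Z = linkage(X_std, method="single", metric="euclidean")

# Counterpart of the Save option: memberships for a range of solutions
memberships = {k: fcluster(Z, t=k, criterion="maxclust") for k in (2, 3, 4)}
print(memberships)
```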
Example SPSS (IV)

Agglomeration Schedule

         Cluster Combined                    Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2   Next Stage
  1         5           6           .149           0           0            2
  2         5           7           .184           1           0            3
  3         4           5           .201           0           2            5
  4        14          15           .213           0           0            6
  5         3           4           .220           0           3            8
  6        13          14           .267           0           4           11
  7        11          12           .321           0           0            9
  8         2           3           .353           0           5           10
  9        10          11           .357           0           7           11
 10         1           2           .389           0           8           14
 11        10          13           .484           9           6           13
 12         8           9           .575           0           0           13
 13         8          10           .618          12          11           14
 14         1           8           .910          10          13            0

• The last column on the very right tells you in which stage of the algorithm this cluster will appear next
• In the first stage, objects 5 and 6 are merged at a distance of 0.149; the resulting cluster is labeled as indicated by the first object involved in this merger, which is object 5
• The icicle diagram provides information similar to the agglomeration schedule
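SciPy's linkage output carries essentially the same information as the agglomeration schedule, which makes the table easy to recompute outside SPSS. A sketch on invented data; the column mapping in the comments is the point of the example:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.random((15, 2))  # invented data with 15 objects

# Each row of Z is one stage: [cluster 1, cluster 2, coefficient, size].
# Indices below 15 denote original objects; index 15 + s refers to the
# cluster formed in stage s + 1, encoding the same information as the
# "Stage Cluster First Appears" columns in SPSS.
Z = linkage(X, method="single")
for stage, (c1, c2, coef, size) in enumerate(Z, start=1):
    print(f"Stage {stage:2d}: merge {int(c1):2d} and {int(c2):2d} "
          f"at coefficient {coef:.3f}")
```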
Example SPSS (V)

[Scree plot of the agglomeration coefficients against the number of clusters]

No clear elbow indicating a suitable number of clusters to retain.
Example SPSS (VI)

[Dendrogram]

• Distances are rescaled to a range of 0 to 25
• The marked junction indicates the rescaled distance at which Audi A6 2.4 (object #14) and BMW 525i (object #15) are merged
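A comparable dendrogram can be produced with SciPy; note that SciPy plots the raw merge distances, whereas the 0 to 25 rescaling is specific to SPSS. A sketch on invented data with made-up car labels:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.random((15, 2))  # invented data; the labels below are made up
Z = linkage(X, method="single")

dendrogram(Z, labels=[f"car {i + 1}" for i in range(15)])
plt.ylabel("merge distance")  # SPSS instead rescales this axis to 0-25
plt.tight_layout()
plt.show()
```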
