
Chapter 9

Cluster Analysis:
Overview and Applications

Cluster Analysis Overview
What is it?

Why use it?
Cluster Analysis

. . . groups objects (respondents, products,
firms, variables, etc.) so that each object is
similar to the other objects in the cluster and
different from objects in all the other clusters.
Two Variable Cluster Analysis
[Figure: scatter plot with three clusters, labeled 1, 2, and 3. Horizontal axis: Frequency of Going to Fast Food Restaurants (Low to High). Vertical axis: Frequency of Eating Out (Low to High).]

Cluster Analysis of Eating Out Statements
I eat out as often as I can.
I eat out at fast food restaurants at least once a week.
I prefer restaurants that have quick service.
Eating at home is better than eating out.
I prefer to eat at restaurants that have a nice atmosphere.
I prefer restaurants with the highest quality food.
Objective: Identify groups that maximize the ratio of
between-groups variance (large) to within-groups variance (small).

Statements are rated on a 7-point Agree/Disagree Scale (1 to 7).
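The objective can be made concrete with a small calculation. The Python sketch below computes between-groups and within-groups variance for a single variable and reports their ratio. The ratings and group assignments are invented for illustration, and the function name `variance_ratio` is hypothetical.

```python
def variance_ratio(groups):
    """Return (between, within, ratio) for a list of groups of scores.

    between = between-groups mean square, within = within-groups mean square.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Spread of each score around its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    between = ss_between / (k - 1)
    within = ss_within / (n - k)
    return between, within, between / within

# Invented 7-point ratings, split into two candidate groupings.
tight = [[1, 2, 1, 2], [6, 7, 6, 7]]   # compact, well-separated groups
loose = [[1, 4, 2, 6], [3, 7, 5, 7]]   # overlapping groups

print("tight ratio:", round(variance_ratio(tight)[2], 2))
print("loose ratio:", round(variance_ratio(loose)[2], 2))
```

The compact, well-separated grouping yields the much larger ratio, which is the pattern a good cluster solution aims for.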
Three Cluster Diagram Showing
Between-Cluster and Within-Cluster Variation
Between-Cluster Variation = Maximize
Within-Cluster Variation = Minimize
Scatter Diagram for Cluster Observations
[Figure, repeated across four slides: scatter plot of observations. Horizontal axis: Frequency of going to fast food restaurants (Low to High). Vertical axis: Frequency of eating out (Low to High).]

Comparison of Score Profiles for Factor
Analysis and Hierarchical Cluster Analysis

              Variables
Respondent    1    2    3
A             7    6    7
B             6    7    6
C             4    3    4
D             3    4    3

[Figure: line chart of the four score profiles. Vertical axis: Score (1 to 7). One line each for Respondent A, Respondent B, Respondent C, and Respondent D.]
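The four score profiles can be compared in two ways: by how far apart the scores are (distance) and by whether the profiles rise and fall together (correlation). The Python sketch below computes both for every pair of respondents in the table. The helper names are illustrative, and reading correlation as the pattern-based view and distance as the cluster-analysis view is an interpretation of the slide title rather than something the slide states.

```python
from itertools import combinations
from math import sqrt

# Scores on Variables 1-3, taken from the table above.
scores = {"A": [7, 6, 7], "B": [6, 7, 6], "C": [4, 3, 4], "D": [3, 4, 3]}

def squared_euclidean(x, y):
    """Sum of squared differences between two score profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def pearson(x, y):
    """Pearson correlation between two score profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

for p, q in combinations(scores, 2):
    d = squared_euclidean(scores[p], scores[q])
    r = pearson(scores[p], scores[q])
    print(f"{p}-{q}: squared distance = {d:2d}, correlation = {r:+.2f}")
```

By distance, A pairs with B and C pairs with D (squared distance 3 in each case). By correlation, A pairs with C and B pairs with D, because those profiles have the same shape at different levels.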

What Can We Do With Cluster Analysis?
1. Determine if statistically different clusters
exist.

2. Identify the meaning of the clusters.

3. Explain how the clusters can be used.
Research Design Considerations
in Using Cluster Analysis:
Outliers.
Similarity/Distance Measures.
Standardizing the Data.
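
Standardizing matters because distance measures are sensitive to the units of each variable. The Python sketch below uses three hypothetical respondents (the `rating` and `monthly_spend` values are invented, not taken from the surveys) to show how a variable measured in large units dominates squared Euclidean distance until the data are converted to z-scores.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Hypothetical data for three respondents (illustration only).
rating = [2, 6, 3]             # 7-point scale
monthly_spend = [40, 42, 90]   # currency units

raw = list(zip(rating, monthly_spend))
standardized = list(zip(z_scores(rating), z_scores(monthly_spend)))

# Raw units: the spend difference swamps the rating difference.
raw_01 = squared_euclidean(raw[0], raw[1])   # 4**2 + 2**2
raw_02 = squared_euclidean(raw[0], raw[2])   # 1**2 + 50**2

# Standardized units: both variables contribute on a comparable scale.
std_01 = squared_euclidean(standardized[0], standardized[1])
std_02 = squared_euclidean(standardized[0], standardized[2])

print("raw:         ", raw_01, raw_02)
print("standardized:", round(std_01, 2), round(std_02, 2))
```

In raw units the second distance is more than a hundred times the first, almost entirely because of spend. After standardizing, the two distances are of similar size.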
Cluster Analysis Assumptions:
Representative Sample.

Minimal Multicollinearity.
Three Basic Questions:
1. How to measure similarity?

2. How to form clusters?
(extraction method)

3. How many clusters?
Answers to First Two Basic Questions:
1. How to measure similarity?
Distance: squared Euclidean.

2. How to form clusters?
Hierarchical: Ward's method.
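
To show how these two choices work together, here is a naive Python sketch of Ward's method using squared Euclidean distance. At each step it merges the pair of clusters whose merger adds the least to the total within-cluster sum of squares. It is an illustration under simple assumptions (brute-force search, ties broken by order), not the SPSS implementation. The function names are hypothetical and the data are the four respondents from the score profile comparison.

```python
def centroid(points):
    """Mean of each variable across the points in a cluster."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * squared_euclidean(centroid(a), centroid(b))

def ward_clustering(points):
    """Return a list of (clusters_remaining, cumulative_error, clusters)."""
    clusters = [[p] for p in points]
    total_error = 0.0
    history = []
    while len(clusters) > 1:
        # Brute-force search for the cheapest pair to merge.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        total_error += cost
        history.append((len(clusters), total_error, [list(c) for c in clusters]))
    return history

# Respondents A, B, C, D on Variables 1-3.
respondents = [(7, 6, 7), (6, 7, 6), (4, 3, 4), (3, 4, 3)]
history = ward_clustering(respondents)
for n_clusters, error, _ in history:
    print(f"{n_clusters} cluster(s): cumulative error = {error:.1f}")
```

The cumulative error stays small while A joins B and C joins D, then jumps when the two remaining clusters are forced together. A jump of that kind is the usual signal for where to stop merging.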
Third Basic Question: How many clusters?
1. Run the cluster analysis; examine solutions for two,
three, four, etc. clusters.

2. Select number of clusters based on a priori
criteria, practical judgement, common sense,
theoretical foundations, and statistical
significance.

Steps in Cluster Analysis:
1. Identify the variables to be clustered.
2. Determine if clusters exist. To do so, verify the
clusters are statistically different and theoretically
meaningful (a logical name can be assigned).
3. Make an initial decision on how many clusters to use.
4. Where possible, validate clusters using an external
variable.
5. Describe the characteristics of the derived clusters
using demographics, psychographics, etc.
Description of Employee Survey Variables

Variable | Description | Variable Type

Work Environment Measures
X1 | I am paid fairly for the work I do. | Metric
X2 | I am doing the kind of work I want. | Metric
X3 | My supervisor gives credit and praise for work well done. | Metric
X4 | There is a lot of cooperation among the members of my work group. | Metric
X5 | My job allows me to learn new skills. | Metric
X6 | My supervisor recognizes my potential. | Metric
X7 | My work gives me a sense of accomplishment. | Metric
X8 | My immediate work group functions as a team. | Metric
X9 | My pay reflects the effort I put into doing my work. | Metric
X10 | My supervisor is friendly and helpful. | Metric
X11 | The members of my work group have the skills and/or training to do their job well. | Metric
X12 | The benefits I receive are reasonable. | Metric

Relationship Measures
X13 | Loyalty: I have a sense of loyalty to Samouel's restaurant. | Metric
X14 | Effort: I am willing to put in a great deal of effort beyond that expected to help Samouel's restaurant to be successful. | Metric
X15 | Proud: I am proud to tell others that I work for Samouel's restaurant. | Metric

Classification Variables
X16 | Intention to Search | Metric
X17 | Length of Time an Employee | Nonmetric
X18 | Work Type = Part-Time vs. Full-Time | Nonmetric
X19 | Gender | Nonmetric
X20 | Age | Nonmetric
X21 | Performance | Metric
Description of Customer Survey Variables

Variable | Description | Variable Type

Restaurant Perceptions
X1 | Excellent Food Quality | Metric
X2 | Attractive Interior | Metric
X3 | Generous Portions | Metric
X4 | Excellent Food Taste | Metric
X5 | Good Value for the Money | Metric
X6 | Friendly Employees | Metric
X7 | Appears Clean & Neat | Metric
X8 | Fun Place to Go | Metric
X9 | Wide Variety of Menu Items | Metric
X10 | Reasonable Prices | Metric
X11 | Courteous Employees | Metric
X12 | Competent Employees | Metric

Selection Factor Rankings
X13 | Food Quality | Nonmetric
X14 | Atmosphere | Nonmetric
X15 | Prices | Nonmetric
X16 | Employees | Nonmetric

Relationship Variables
X17 | Satisfaction | Metric
X18 | Likely to Return in Future | Metric
X19 | Recommend to Friend | Metric
X20 | Frequency of Patronage | Nonmetric
X21 | Length of Time a Customer | Nonmetric

Classification Variables
X22 | Gender | Nonmetric
X23 | Age | Nonmetric
X24 | Income | Nonmetric
X25 | Competitor | Nonmetric
X26 | Which Ad Viewed (#1, 2, or 3) | Nonmetric
X27 | Ad Rating | Metric
X28 | Respondents that Viewed Ads | Nonmetric
Using SPSS to Identify Clusters:
For this example we are looking for subgroups among all the
restaurant customers using the satisfaction variables. The SPSS click-through
sequence is: Analyze → Classify → Hierarchical Cluster. This takes you to a
dialog box where you select variables X17, X18, and X19 and move them into the
Variables box. Now look at the other options below. We will use all the
defaults shown in the dialog box, as well as the defaults for the Statistics and
Plots options. Next, click on the Method box and select Ward's method under
Cluster Method (it is the last one, so you must scroll down). Squared
Euclidean distance is the default under Measure, and we will use it. At this
point we do not need the Save option, so click OK to run the program.

When the program finishes look for a table called Agglomeration
Schedule. There are lots of numbers in it, but we only use the numbers in the
Coefficients column (middle of table). At the bottom of the agglomeration
schedule table find the numbers in the Coefficients column. The number at the
bottom will be the largest. As you move up the column the numbers (error
coefficients) get smaller. For example, the bottom number is 834.255 and the
one right above it is 282.850.
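
One way to read these numbers is to look at how much the error coefficient grows at each merge. The Python sketch below uses only the two coefficients quoted above. It assumes the bottom row of the schedule corresponds to the final merge into one cluster and the row above it to the two-cluster solution. The function name `percent_increase` is hypothetical.

```python
# Error coefficients quoted in the text, keyed by number of clusters
# (assumption: last row = one cluster, row above = two clusters).
coefficients = {1: 834.255, 2: 282.850}

def percent_increase(before, after):
    """Percentage growth in the error coefficient caused by one more merge."""
    return (after - before) / before * 100

change = percent_increase(coefficients[2], coefficients[1])
print(f"Going from 2 clusters to 1 raises the error by {change:.1f}%")
# prints: Going from 2 clusters to 1 raises the error by 194.9%
```

A large jump like this suggests that the final merge combines two quite different groups. The same calculation can be repeated for the rows higher up the column to compare the three-cluster and four-cluster solutions.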
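Dialog Boxes for SPSS Cluster
[Figure: SPSS dialog box screenshots.]
Error Coefficients for Cluster Solution
[Figure: SPSS output screenshot; callout: Error Coefficients.]
New Cluster Variables
[Figure: SPSS screenshot; callout: New two-group and three-group variables.]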
Cluster Analysis
Learning Checkpoint
1. Why might we use cluster analysis?
2. What are the three major steps in cluster
analysis?
3. How do you decide how many clusters
to extract?
4. Why do we validate clusters?
