Cluster Analysis
Chapter Outline
1) Overview
2) Basic Concept
3) Statistics Associated with Cluster Analysis
4) Conducting Cluster Analysis
5) Applications of Nonhierarchical Clustering
6) Clustering Variables
7) Summary
Cluster Analysis
Fig. 20.1 (axes: Variable 1 and Variable 2)
A Practical Clustering Situation
Fig. 20.2 (axes: Variable 1 and Variable 2)
Case No.   V1   V2   V3   V4   V5   V6
 1          6    4    7    3    2    3
 2          2    3    1    4    5    4
 3          7    2    6    4    1    3
 4          4    6    4    5    3    6
 5          1    3    2    2    6    4
 6          6    4    6    3    3    4
 7          5    3    6    3    3    4
 8          7    3    7    4    1    4
 9          2    4    3    3    6    3
10          3    5    3    6    4    6
11          1    3    2    3    5    3
12          5    4    5    4    2    4
13          2    2    1    5    4    4
14          4    6    4    6    4    7
15          6    5    4    2    1    4
16          3    5    4    6    4    7
17          4    4    7    2    2    5
18          3    7    2    6    4    3
19          4    6    3    7    2    7
20          2    3    2    4    7
A Classification of Clustering Procedures
Fig. 20.4

Clustering Procedures
    Hierarchical
        Agglomerative
            Linkage Methods
                Single Linkage
                Complete Linkage
                Average Linkage
            Variance Methods
                Ward's Method
            Centroid Methods
        Divisive
    Nonhierarchical
        Sequential Threshold
        Parallel Threshold
        Optimizing Partitioning
    Other
        Two-Step
Linkage Methods of Clustering
Fig. 20.5
Single Linkage: minimum distance between Cluster 1 and Cluster 2
Complete Linkage: maximum distance between Cluster 1 and Cluster 2
Average Linkage: average distance between Cluster 1 and Cluster 2
2007 Prentice Hall
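The three linkage rules in Fig. 20.5 differ only in how the pairwise distances between the members of two clusters are summarized. The following is a minimal sketch of that idea, assuming Euclidean distance; the function name `linkage_distances` and the two small clusters are hypothetical and are not taken from the chapter's data.

```python
from itertools import product
from math import dist


def linkage_distances(cluster_a, cluster_b):
    """Summarize all pairwise Euclidean distances between two clusters."""
    pairwise = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    return {
        "single": min(pairwise),                    # minimum distance
        "complete": max(pairwise),                  # maximum distance
        "average": sum(pairwise) / len(pairwise),   # average distance
    }


# Hypothetical two-variable observations, for illustration only.
cluster_1 = [(0.0, 0.0), (0.0, 1.0)]
cluster_2 = [(3.0, 0.0), (4.0, 0.0)]

result = linkage_distances(cluster_1, cluster_2)
for method, value in result.items():
    print(f"{method} linkage distance: {value:.3f}")
```

For any pair of clusters the single linkage distance is never larger than the average, and the average is never larger than the complete linkage distance.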
Centroid Method
Results of Hierarchical Clustering
Table 20.2
Agglomeration Schedule Using Ward's Procedure

         Clusters combined                      Stage cluster first appears
Stage    Cluster 1   Cluster 2   Coefficient    Cluster 1   Cluster 2   Next stage
 1       14          16            1.000000      0           0           6
 2        6           7            2.000000      0           0           7
 3        2          13            3.500000      0           0          15
 4        5          11            5.000000      0           0          11
 5        3           8            6.500000      0           0          16
 6       10          14            8.160000      0           1           9
 7        6          12           10.166667      2           0          10
 8        9          20           13.000000      0           0          11
 9        4          10           15.583000      0           6          12
10        1           6           18.500000      6           7          13
11        5           9           23.000000      4           8          15
12        4          19           27.750000      9           0          17
13        1          17           33.100000     10           0          14
14        1          15           41.333000     13           0          16
15        2           5           51.833000      3          11          18
16        1           3           64.500000     14           5          19
17        4          18           79.667000     12           0          18
18        2           4          172.662000     15          17          19
19        1           2          328.600000     16          18           0
Results of Hierarchical Clustering
Table 20.2, cont.

             Number of clusters
Label case   4    3    2
 1           1    1    1
 2           2    2    2
 3           1    1    1
 4           3    3    2
 5           2    2    2
 6           1    1    1
 7           1    1    1
 8           1    1    1
 9           2    2    2
10           3    3    2
11           2    2    2
12           1    1    1
13           2    2    2
14           3    3    2
15           1    1    1
16           3    3    2
17           1    1    1
18           4    3    2
19           3    3    2
20           2    2    2
Cluster Centroids
Table 20.3

              Means of Variables
Cluster No.   V1      V2      V3      V4      V5      V6
1             5.750   3.625   6.000   3.125   1.750   3.875
2             1.667   3.000   1.833   3.500   5.500   3.333
3             3.500   5.833   3.333   6.000   3.500   6.000
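Each centroid entry in Table 20.3 is the mean of one variable over the cases assigned to a cluster. As a minimal sketch, the V1 column can be recomputed from the V1 ratings in the data table and the three-cluster membership column of Table 20.2 (cont.); the helper name `cluster_means` is an assumption introduced here for illustration.

```python
from collections import defaultdict

# V1 ratings for cases 1-20, as read from the data table.
v1 = [6, 2, 7, 4, 1, 6, 5, 7, 2, 3, 1, 5, 2, 4, 6, 3, 4, 3, 4, 2]
# Three-cluster membership for cases 1-20, as read from Table 20.2 (cont.).
membership = [1, 2, 1, 3, 2, 1, 1, 1, 2, 3, 2, 1, 2, 3, 1, 3, 1, 3, 3, 2]


def cluster_means(values, labels):
    """Return the mean of `values` within each cluster label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: sum(vals) / len(vals) for label, vals in sorted(groups.items())}


v1_centroids = cluster_means(v1, membership)
for label, mean in v1_centroids.items():
    print(f"Cluster {label}: V1 mean = {mean:.3f}")
```

The three means agree with the V1 column of Table 20.3 (5.750, 1.667, 3.500); the same calculation applies to each of the other variables.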
Results of Nonhierarchical Clustering
Table 20.4

Initial Cluster Centers

        Cluster
        1    2    3
V1      4    2    7
V2      6    3    2
V3      3    2    6
V4      7    4    4
V5      2    7    1
V6      7    2    3

Iteration History (a)

            Change in Cluster Centers
Iteration   1       2       3
1           2.154   2.102   2.550
2           0.000   0.000   0.000

a. Convergence achieved due to no or small distance change. The maximum distance by which any center has changed is 0.000. The current iteration is 2. The minimum distance between initial centers is 7.746.
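The footnote to the iteration history reports a minimum distance of 7.746 between the initial centers. A minimal sketch of that check follows, assuming the distance is Euclidean and using the initial cluster centers as listed in Table 20.4; the variable names are illustrative only.

```python
from itertools import combinations
from math import dist

# Initial cluster centers (V1..V6) as listed in Table 20.4.
initial_centers = {
    1: (4, 6, 3, 7, 2, 7),
    2: (2, 3, 2, 4, 7, 2),
    3: (7, 2, 6, 4, 1, 3),
}

# Euclidean distance for every pair of initial centers.
pair_distances = {
    (a, b): dist(initial_centers[a], initial_centers[b])
    for a, b in combinations(initial_centers, 2)
}

closest_pair = min(pair_distances, key=pair_distances.get)
min_distance = pair_distances[closest_pair]
print(f"Closest initial centers: {closest_pair}, distance = {min_distance:.3f}")
```

Under this assumption the closest pair is centers 1 and 3, and the distance rounds to the 7.746 reported in the footnote.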
Results of Nonhierarchical Clustering
Table 20.4, cont.
Cluster Membership

Case Number   Cluster   Distance
 1            3         1.414
 2            2         1.323
 3            3         2.550
 4            1         1.404
 5            2         1.848
 6            3         1.225
 7            3         1.500
 8            3         2.121
 9            2         1.756
10            1         1.143
11            2         1.041
12            3         1.581
13            2         2.598
14            1         1.404
15            3         2.828
16            1         1.624
17            3         2.598
18            1         3.555
19            1         2.154
20            2         2.102
Results of Nonhierarchical Clustering
Table 20.4, cont.

Final Cluster Centers

        Cluster
        1    2    3
V1      4    2    6
V2      6    3    4
V3      3    2    6
V4      6    4    3
V5      4    6    2
V6      6    3    4

Distances between Final Cluster Centers

Cluster   1       2       3
1                 5.568   5.698
2         5.568           6.928
3         5.698   6.928
Results of Nonhierarchical Clustering
Table 20.4, cont.
ANOVA

      Cluster               Error
      Mean Square   df      Mean Square   df      F        Sig.
V1    29.108        2       0.608         17      47.888   0.000
V2    13.546        2       0.630         17      21.505   0.000
V3    31.392        2       0.833         17      37.670   0.000
V4    15.713        2       0.728         17      21.585   0.000
V5    22.537        2       0.816         17      27.614   0.000
V6    12.171        2       1.071         17      11.363   0.001

The F tests should be used only for descriptive purposes because the clusters have been
chosen to maximize the differences among cases in different clusters. The observed
significance levels are not corrected for this, and thus cannot be interpreted as tests of the
hypothesis that the cluster means are equal.
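Each F value in the ANOVA table is the ratio of the cluster mean square to the error mean square. The sketch below recomputes that ratio from the printed mean squares; because those are rounded to three decimals, the recomputed values match the reported F column only approximately. The dictionary names are illustrative.

```python
# Mean squares and reported F values as printed in the ANOVA table.
cluster_ms = {"V1": 29.108, "V2": 13.546, "V3": 31.392,
              "V4": 15.713, "V5": 22.537, "V6": 12.171}
error_ms = {"V1": 0.608, "V2": 0.630, "V3": 0.833,
            "V4": 0.728, "V5": 0.816, "V6": 1.071}
reported_f = {"V1": 47.888, "V2": 21.505, "V3": 37.670,
              "V4": 21.585, "V5": 27.614, "V6": 11.363}

# F = cluster mean square / error mean square, variable by variable.
f_ratio = {var: cluster_ms[var] / error_ms[var] for var in cluster_ms}

for var, value in f_ratio.items():
    print(f"{var}: recomputed F = {value:.3f}, reported F = {reported_f[var]:.3f}")
```

As the note above states, these ratios are descriptive only, since the clusters were formed to maximize the differences being tested.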

Number of cases
Cluster 1    6.000
Cluster 2    6.000
Cluster 3    8.000
Valid       20.000
Missing      0.000
Results of Two-Step Clustering
Table 20.5
Auto-Clustering

Number of   Akaike's Information   AIC          Ratio of AIC   Ratio of Distance
Clusters    Criterion (AIC)        Change(a)    Changes(b)     Measures(c)
 1          104.140
 2          101.171                -2.969        1.000          .847
 3           97.594                -3.577        1.205         1.583
 4          116.896                19.302       -6.502         2.115
 5          138.230                21.335       -7.187         1.222
 6          158.586                20.355       -6.857         1.021
 7          179.340                20.755       -6.991         1.224
 8          201.628                22.288       -7.508         1.006
 9          224.055                22.426       -7.555         1.111
10          246.522                22.467       -7.568         1.588
11          269.570                23.048       -7.764         1.001
12          292.718                23.148       -7.798         1.055
13          316.120                23.402       -7.883         1.002
14          339.223                23.103       -7.782         1.044
15          362.650                23.427       -7.892         1.004

a The changes are from the previous number of clusters in the table.
b The ratios of changes are relative to the change for the two cluster solution.
c The ratios of distance measures are based on the current number of clusters
against the previous number of clusters.
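Footnotes a and b describe simple arithmetic on the AIC column. As a minimal sketch, the first few AIC changes and their ratios can be recomputed from the AIC values printed for one to four clusters; small differences from the table in the last decimal are expected, since the printed values are rounded.

```python
# AIC values for the 1-, 2-, 3- and 4-cluster solutions, as printed in Table 20.5.
aic = [104.140, 101.171, 97.594, 116.896]

# Footnote a: each change is taken from the previous number of clusters.
changes = [round(curr - prev, 3) for prev, curr in zip(aic, aic[1:])]

# Footnote b: each ratio is relative to the change for the two-cluster solution.
ratios = [change / changes[0] for change in changes]

for n_clusters, (change, ratio) in enumerate(zip(changes, ratios), start=2):
    print(f"{n_clusters} clusters: AIC change = {change:.3f}, ratio = {ratio:.3f}")
```

In this table the AIC is lowest for the three-cluster solution, and the AIC change turns positive from four clusters onward.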
Cluster Distribution
Table 20.5, cont.

Cluster     N     % of Combined   % of Total
1            6    30.0%           30.0%
2            6    30.0%           30.0%
3            8    40.0%           40.0%
Combined    20    100.0%          100.0%
Total       20                    100.0%
Cluster Profiles
Table 20.5, cont.

            Fun                      V2                       Eating Out
Cluster     Mean   Std. Deviation    Mean   Std. Deviation    Mean   Std. Deviation
1           1.67    .516             3.00    .632             1.83    .753
2           3.50    .548             5.83    .753             3.33    .816
3           5.75   1.035             3.63    .916             6.00   1.069
Combined    3.85   1.899             4.10   1.410             3.95   2.012

            Best Buys                Don't Care               Compare Prices
Cluster     Mean   Std. Deviation    Mean   Std. Deviation    Mean   Std. Deviation
1           3.50   1.049             5.50   1.049             3.33    .816
2           6.00    .632             3.50    .837             6.00   1.549
3           3.13    .835             1.88    .835             3.88    .641
Combined    4.10   1.518             3.45   1.761             4.35   1.496
Clustering Variables
SPSS Windows
To select these procedures using SPSS for Windows, click:
Analyze>Classify>Hierarchical Cluster
Analyze>Classify>K-Means Cluster
Analyze>Classify>Two-Step Cluster