No rigorous definition
Subjective
Scale/Resolution dependent (e.g. hierarchy)
What do we want?
[Figure series: example datasets with increasing confidence that the number of clusters is 2]
Do we want?
An index that is
independent of cluster volume?
independent of cluster size?
independent of cluster shape?
sensitive to outliers?
etc
Domain Knowledge!
Part II:
The Gap Statistic
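Within-Cluster Sum of Squares
Sum of pairwise squared distances within cluster $C_r$:
$$D_r = \sum_{i \in C_r} \sum_{j \in C_r} \lVert x_i - x_j \rVert^2 = 2 n_r \sum_{i \in C_r} \lVert x_i - \bar{x}_r \rVert^2$$
where $n_r = |C_r|$ and $\bar{x}_r$ is the mean of cluster $C_r$.
Pooled within-cluster sum of squares for $k$ clusters:
$$W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} D_r$$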
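The two forms of $D_r$ can be checked numerically. The following is a minimal sketch in plain Python, not taken from the slides: the function names and the toy points are illustrative assumptions. It computes $W_k$ once from the pairwise definition and once from the centroid form.

```python
from itertools import product

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def pairwise_dispersion(cluster):
    """D_r: sum of squared distances over all ordered pairs (i, j) in one cluster."""
    return sum(sq_dist(a, b) for a, b in product(cluster, repeat=2))

def centroid(cluster):
    """Coordinate-wise mean of the points in one cluster."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]

def within_ss(clusters):
    """W_k = sum over clusters of D_r / (2 n_r)."""
    return sum(pairwise_dispersion(c) / (2 * len(c)) for c in clusters)

def within_ss_centroid(clusters):
    """The same quantity via the centroid form: sum_r sum_i ||x_i - mean_r||^2."""
    total = 0.0
    for c in clusters:
        m = centroid(c)
        total += sum(sq_dist(x, m) for x in c)
    return total

# Toy data (assumed for illustration): two clusters in the plane.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(5.0, 5.0), (5.0, 7.0), (8.0, 6.0)],
]
w_pairs = within_ss(clusters)
w_centroid = within_ss_centroid(clusters)
print(w_pairs, w_centroid)
```

The pairwise form costs on the order of $n_r^2$ distance evaluations per cluster, while the centroid form costs on the order of $n_r$, which is why the centroid form is the one usually computed in practice.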
$$\mathrm{Gap}(k) = \log\frac{\mathrm{MSE}_{X^*}(k)}{\mathrm{MSE}_{X^*}(1)} - \log\frac{\mathrm{MSE}_{X}(k)}{\mathrm{MSE}_{X}(1)}$$
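A rough sketch of how the Gap curve might be computed, in plain Python. Several choices here are assumptions not stated on the slides: the reference data $X^*$ is drawn uniformly over the bounding box of the observed data, the reference term is averaged over `n_ref` draws, clustering is done with a small hand-written Lloyd's k-means, and the MSE ratio is replaced by the ratio of within-cluster sums of squares (the two agree when both use the same number of points). All function names are hypothetical.

```python
import math
import random

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans_wss(points, k, rng, n_init=5, n_iter=50):
    """Smallest within-cluster sum of squares found by Lloyd's algorithm
    over n_init random starts (a local optimum, not a guaranteed global one)."""
    best = math.inf
    for _ in range(n_init):
        centers = rng.sample(points, k)
        for _ in range(n_iter):
            groups = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                groups[j].append(p)
            # Recompute each center as its group mean; keep the old center if the group is empty.
            new_centers = [
                tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                for i, g in enumerate(groups)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        wss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, wss)
    return best

def uniform_reference(points, rng):
    """Assumed null model: same number of points, uniform over the bounding box."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]

def gap_curve(points, k_max, n_ref=10, seed=0):
    """Gap(k) for k = 1..k_max, normalised by the k = 1 value as in the formula above."""
    rng = random.Random(seed)
    obs = [kmeans_wss(points, k, rng) for k in range(1, k_max + 1)]
    refs = []
    for _ in range(n_ref):
        ref = uniform_reference(points, rng)
        refs.append([kmeans_wss(ref, k, rng) for k in range(1, k_max + 1)])
    gaps = []
    for i in range(k_max):
        ref_term = sum(math.log(r[i] / r[0]) for r in refs) / n_ref
        gaps.append(ref_term - math.log(obs[i] / obs[0]))
    return gaps

# Synthetic data (assumed): two well-separated Gaussian blobs.
data_rng = random.Random(1)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(30)]
blob_b = [(data_rng.gauss(6, 0.5), data_rng.gauss(6, 0.5)) for _ in range(30)]
gaps = gap_curve(blob_a + blob_b, k_max=4)
print(gaps)
```

With this normalisation Gap(1) is zero by construction, so the curve is read by looking for the values of k at which it rises.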
Example: 6834 genes, 64 human tumours
The Gap curve rises at k = 2 and k = 6
Other Approaches
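Calinski and Harabasz 74:
$$CH(k) = \frac{B_k / (k-1)}{W_k / (n-k)}$$
Krzanowski and Lai 85:
$$KL(k) = \left| \frac{(k-1)^{2/p}\, W_{k-1} - k^{2/p}\, W_k}{k^{2/p}\, W_k - (k+1)^{2/p}\, W_{k+1}} \right|$$
Hartigan 75:
$$H(k) = \left( \frac{W_k}{W_{k+1}} - 1 \right) (n-k-1)$$
Here $B_k$ is the between-cluster sum of squares, $n$ the number of observations, and $p$ the number of dimensions.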
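The three indices are simple functions of the $W_k$ sequence. A minimal sketch in plain Python follows; the function names and the $W_k$ values are made up for illustration and do not come from any dataset on the slides. It assumes $B_k = W_1 - W_k$, since the total sum of squares equals $W_1$ and splits into between and within parts.

```python
def calinski_harabasz(B_k, W_k, n, k):
    """CH(k) = [B_k / (k - 1)] / [W_k / (n - k)]; defined for k >= 2."""
    return (B_k / (k - 1)) / (W_k / (n - k))

def krzanowski_lai(W, k, p):
    """KL(k) = |DIFF(k) / DIFF(k + 1)|, where W maps k to W_k.
    Needs W[k - 1], W[k] and W[k + 1]."""
    def diff(j):
        return (j - 1) ** (2 / p) * W[j - 1] - j ** (2 / p) * W[j]
    return abs(diff(k) / diff(k + 1))

def hartigan(W, k, n):
    """H(k) = (W_k / W_{k+1} - 1) * (n - k - 1)."""
    return (W[k] / W[k + 1] - 1) * (n - k - 1)

# Hypothetical W_k values for n = 50 points in p = 2 dimensions.
W = {1: 100.0, 2: 20.0, 3: 15.0, 4: 12.0}
n, p = 50, 2

ch2 = calinski_harabasz(W[1] - W[2], W[2], n, 2)
kl2 = krzanowski_lai(W, 2, p)
h2 = hartigan(W, 2, n)
print(ch2, kl2, h2)
```

Note that CH(k) is undefined at k = 1 and KL(k) needs $W_{k-1}$, so neither can indicate a single cluster, whereas the Gap curve is defined from k = 1.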