Introduction
2
2.1
1
u ki
x v
k
i
u
k 1
ki
J(X \mid U, V) = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ki} \, \| x_i - v_k \|^2 \to \min, \qquad (2)
x i yi .
2
(3)
1m
.
2
(6)
(7)
k, i
(1)
1, i 1..n.
j1
xi v j
c
1m
(8)
2.2
i 1
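J(X \mid U, V) = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ki}^{m} \, \| x_i - v_k \|^2 \to \min, \qquad (4)
v_k = \frac{\sum_{i=1}^{n} u_{ki}^{m} x_i}{\sum_{i=1}^{n} u_{ki}^{m}}. \qquad (5)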
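As a rough illustration of how the objective (4) and the center update (5) interact, the sketch below alternates a membership update with the center update. The membership rule is the standard FCM update and is assumed here rather than taken from this section; the function names, the default fuzzifier m = 2, and the iteration count are hypothetical choices made only for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def update_memberships(X, V, m=2.0):
    """Standard FCM membership update (an assumption of this sketch).

    Returns U with U[k][i] = membership of point i in cluster k.
    """
    c, n = len(V), len(X)
    U = [[0.0] * n for _ in range(c)]
    for i, x in enumerate(X):
        d = [sq_dist(x, v) for v in V]
        zeros = [k for k, dk in enumerate(d) if dk == 0.0]
        if zeros:
            # The point coincides with one or more centers: split membership.
            for k in zeros:
                U[k][i] = 1.0 / len(zeros)
            continue
        w = [dk ** (-1.0 / (m - 1.0)) for dk in d]
        total = sum(w)
        for k in range(c):
            U[k][i] = w[k] / total
    return U


def update_centers(X, U, m=2.0):
    """Center update in the form of (5): weighted mean with weights u_ki^m."""
    dim = len(X[0])
    V = []
    for uk in U:
        wts = [u ** m for u in uk]
        total = sum(wts)
        V.append([sum(w * x[j] for w, x in zip(wts, X)) / total
                  for j in range(dim)])
    return V


def objective(X, U, V, m=2.0):
    """Objective in the form of (4)."""
    return sum(U[k][i] ** m * sq_dist(X[i], V[k])
               for k in range(len(V)) for i in range(len(X)))


def fcm(X, V0, m=2.0, iters=50):
    """Alternate the two updates for a fixed number of iterations."""
    V = [list(v) for v in V0]
    for _ in range(iters):
        U = update_memberships(X, V, m)
        V = update_centers(X, U, m)
    return update_memberships(X, V, m), V
```

The result depends on the initial centers `V0`, which is the sensitivity that an initialization method is meant to address.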
M(x_i) = \sum_{j=1}^{n} e^{-\alpha \| x_i - x_j \|^2} \qquad (9)
M_t(x_j) = M_{t-1}(x_j) - M^{*} e^{-\beta \| x^{*} - x_j \|^2} \qquad (10)
where
(11)
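The mountain function (9) and its amendment (10) can be sketched as a short selection loop. This is a minimal illustration of traditional subtractive clustering under stated assumptions: `alpha` and `beta` stand for the radius-dependent constants in the exponents, their default values are arbitrary, and the function names are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mountain(X, alpha):
    """Mountain function value of every point, in the form of (9)."""
    return [sum(math.exp(-alpha * sq_dist(xi, xj)) for xj in X) for xi in X]


def select_centers(X, n_centers, alpha=1.0, beta=0.5):
    """Pick the densest point repeatedly, amending the mountain as in (10).

    Returns the indices of the selected points.
    """
    M = mountain(X, alpha)
    centers = []
    for _ in range(n_centers):
        star = max(range(len(X)), key=lambda j: M[j])
        m_star = M[star]
        centers.append(star)
        # Subtract the selected peak, scaled by closeness to the selected point.
        M = [M[j] - m_star * math.exp(-beta * sq_dist(X[star], X[j]))
             for j in range(len(X))]
    return centers
```

Both constants must be chosen in advance, so the outcome depends on how well they match the scale of the data.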
3
3.1
\sigma_k^2 = \sum_{i=1}^{n} P(x_i \mid v_k) \, \| x_i - v_k \|^2. \qquad (19)
(20)
k 1
3.1.2
Fuzzy mountain function amendment
Each time the densest data point is selected, the
mountain function of the other data points must be amended
to search for new cluster centers. In the traditional SC
method, the amendment uses the direct
relationships between the selected data point and its
neighborhood, i.e., a pre-specified mountain radius
(10). In our approach, it uses the fuzzy partition,
and no other parameters are required. First, the accumulated
density of all cluster centers is revised using
Acc^{t+1}(v_k) = Acc^{t}(v_k) - M_t^{*} \, P(v_k \mid x_t^{*}), \qquad (21)
where x_t^{*} is the data point selected as the new cluster center
and M_t^{*} is its mountain function value at time t. The
density at each data point is then re-estimated using (20)
and (21) as follows,
Acc(v_k) = \sum_{i=1}^{n} u_{ki}. \qquad (16)
dens(x_i) = \sum_{k=1}^{c} Acc(v_k) \, u_{ki}. \qquad (17)
u'_{ki} = \frac{e^{-\| x_i - v_k \|^2 / \sigma_k^2}}{\sum_{l=1}^{c} e^{-\| x_i - v_l \|^2 / \sigma_l^2}}. \qquad (18)
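The accumulated density (16), the point density (17), and the amendment (21) can be combined into a small selection loop driven only by a fuzzy partition. The sketch below is an illustration under two assumptions that are not confirmed by the text: P(v_k | x*) is approximated by the membership of the selected point in cluster k, and the mountain function value M* is taken to be the selected point's current density. All function names are hypothetical.

```python
def accumulated_density(U):
    """Acc(v_k) = sum over i of u_ki, in the form of (16)."""
    return [sum(uk) for uk in U]


def point_density(U, acc):
    """dens(x_i) = sum over k of Acc(v_k) * u_ki, in the form of (17)."""
    n = len(U[0])
    return [sum(acc[k] * U[k][i] for k in range(len(U))) for i in range(n)]


def amend(acc, U, star, m_star):
    """Amendment in the form of (21).

    ASSUMPTION: P(v_k | x*) is approximated by the membership U[k][star].
    """
    return [acc[k] - m_star * U[k][star] for k in range(len(U))]


def select_centers(U, n_centers):
    """Return indices of the points chosen as candidate cluster centers."""
    acc = accumulated_density(U)
    chosen = []
    for _ in range(n_centers):
        dens = point_density(U, acc)
        star = max(range(len(dens)), key=lambda i: dens[i])
        chosen.append(star)
        # ASSUMPTION: the mountain value M* is the selected point's density.
        acc = amend(acc, U, star, dens[star])
    return chosen
```

Unlike the radius-based amendment, this loop takes no parameter other than the partition matrix `U` and the number of centers requested.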
3.1.3
3.2
P( v k )
i 1
(25)
.
a
(x i | v l )
x i vk
2 2k
(26)
3.3
(23)
P( x i | v k ) max P a ( x i | v k ), ( 2)1 / n k e
i 1 k 1
n
c
(x i | v k )
P(x_i \mid U, V) = \sum_{k=1}^{c} P(v_k) \, P(x_i \mid v_k).
i 1
l 1 i 1
L ( | X ) L ( U, V | X )
Experimental results
4.1
4.2
                                                      Avg. Ratio
fzSC        1.00   1.00   1.00   1.00   1.00   1.00   1.00
k-means     0.97   0.87   1.00   1.00   1.00   0.75   0.93
k-medians   0.95   0.82   1.00   1.00   1.00   0.62   0.90
FCM         0.97   1.00   0.95   1.00   1.00   0.96   0.98
Algorithm
4.3
0.97
1.00
1.00
1.00
1.00
0.98
0.90
1.00
1.00
1.00
1.00
1.00
1.00
0.99
0.97
1.00
0.87
0.99
1.00
0.96
Dataset                   # data points   known #clusters   predicted #clusters   ratio
Iris                      150             3                                       1.00
Wine                      178             3                                       1.00
Glass                     214             6                 6                     0.95
                                                            5                     0.05
Breast Cancer Wisconsin   699             6                 6                     0.65
                                                            5                     0.35
4.3.2
Real datasets
The numbers of clusters in the Iris, Wine, Glass, and
Breast Cancer Wisconsin datasets are 3, 3, 6, and 6,
respectively. The data in the Wine and Glass datasets were
normalized by attribute before clustering. The Pearson
Correlation distance, which is reported to be the most
appropriate for real datasets [11], was used to measure the
similarity between data points.
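The preprocessing described here can be sketched as follows. The text does not say which normalization was applied or how the correlation is turned into a distance, so both choices below are assumptions: each attribute is z-score normalized, and the distance is taken as one minus the Pearson correlation coefficient. The function names are hypothetical.

```python
import math


def zscore_columns(rows):
    """Normalize each attribute (column) to zero mean and unit variance.

    ASSUMPTION: z-score normalization; the source only says "by attribute".
    """
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out_cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*out_cols)]


def pearson_distance(a, b):
    """Distance between two data points, taken as 1 - Pearson correlation.

    ASSUMPTION: the 1 - r form; the value ranges from 0 to 2.
    """
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return 1.0 - cov / math.sqrt(va * vb)
```

With this definition, two points whose attribute profiles rise and fall together are close even when their magnitudes differ.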
Conclusions
References