
International Journal of Innovative Computing, Information and Control
Volume 10, Number 6, December 2014
© ICIC International 2014    ISSN 1349-4198    pp. 1-13

A GENERAL FRAMEWORK FOR CONSTRUCTING POSSIBILISTIC MEMBERSHIP FUNCTIONS IN ALTERNATING CLUSTER ESTIMATION
Qina Wang¹, Jian Zhou¹* and Chih-Cheng Hung²,³

¹School of Management, Shanghai University, Shanghai 200444, P. R. China
²Anyang Normal University, Anyang 455000, P. R. China
³Southern Polytechnic State University, Georgia 30060, USA
zhou jian@shu.edu.cn

*Corresponding author. Tel.: +86-21-66134414-805. E-mail address: zhou jian@shu.edu.cn (J. Zhou).

Received December 2013; revised February 2014


Abstract. Clustering models based on objective functions are among the most popular approaches
to clustering. These models are often solved by alternating optimization algorithms derived from
the necessary conditions for local extrema. Alternating cluster estimation is a generalized scheme
of alternating optimization which dispenses with the objective function and allows users to select
membership and prototype functions directly within an alternating iteration architecture. A general
approach to alternating cluster estimation based on possibilistic membership functions was developed
by Zhou and Hung [1], where the memberships of feature points are estimated by predetermined
membership functions and the cluster centers are updated via a performance index with possibilistic
membership weights. The membership functions play a vital role in that approach and must be
chosen properly to obtain good clustering results. Nevertheless, a method for determining appropriate
functions was not given. To further investigate alternating cluster estimation for possibilistic
clustering, this paper introduces a general framework for constructing membership functions in this
model by combining clustering performance with fuzzy set theory. In addition, four specific
generalized possibilistic clustering algorithms are recommended for applications. Finally, a
comparative study based on real data experiments is presented to demonstrate the performance and
efficiency of the proposed algorithms.
Keywords: Fuzzy clustering, Possibilistic clustering, Alternating cluster estimation, Fuzzy set theory

1. Introduction. Cluster analysis is the process of partitioning a data set into subsets of
objects which have similar properties. Clustering methods have been used extensively in
computer vision and pattern recognition. Fuzzy clustering is an approach using the fuzzy
set theory as a tool for data grouping, which has advantages over traditional clustering
in many applications such as pattern recognition [2, 3], image segmentation [4, 5], etc.
In the field of fuzzy clustering analysis, the fuzzy c-means (FCM) clustering algorithm
presented by Bezdek [6] is the most well-known and widely used method, based on a
fuzzy objective function with probabilistic membership weights. Several recent papers
dealing with the FCM and its derivatives can be found in Chen et al. [7], Li et al. [8],
and Miyamoto et al. [9]. Since the memberships resulting from the FCM do not always
correspond to the intuitive concept of degrees of belongingness or compatibility, Krishnapuram and Keller [10, 11] developed the possibilistic clustering algorithms (PCAs), where
the memberships provide a good explanation of degrees of belongingness for the data.
Compared with the FCM, the PCAs are more robust to noise and outliers. Therefore,
theoretical and practical studies on the PCAs have received extensive attention in the
last two decades. For instance, Höppner and Klawonn [12] and Zhou et al. [13] presented


some rigorous proofs on the convergence property of the PCAs as a complement of the
experimental results. Krishnapuram et al. [14, 15] generalized the prototype-based fuzzy
and possibilistic clustering algorithms by introducing the shell clustering approach, and
illustrated their applications to boundary detection and surface approximation. Since the
clustering performance of the PCAs in [10, 11] heavily depends on the parameters used,
Yang and Wu [16] suggested a new PCA whose performance can be easily controlled.
Rhee et al. [17] applied the kernel approach to the possibilistic c-means algorithm by
incorporating a variance updating method for Gaussian kernels in each iteration. Other
recent contributions in this area were made by Anderson et al. [18], Lai et al. [19], Li et
al. [21], Treerattanapitak and Jaruskulchai [20], Wen et al. [22], and Xie et al. [23].
Essentially, both FCM and PCAs are alternating optimization algorithms driven by
necessary conditions for local extrema of their respective objective functions. The objective functions play an extremely important role in the approaches of fuzzy clustering
analysis, and hence have been studied extensively in the literature. In this area, more attention
has been focused on the improvement of membership functions of the FCM (see, e.g., Chen
and Wang [24], Miyamoto [25], Rodriguez et al. [26]). Recently, Barcelo-Rico et al. [27]
presented a new possibilistic method for discovering linear local behavior by suggesting
an objective function with hyper-Gaussian distributed memberships, whose parameters were
found using global optimization so as to avoid local optima. Instead of using alternating
optimization, Runkler and Bezdek [28] developed a generalized model using an alternating
iteration architecture, called alternating cluster estimation (ACE), in which the membership and prototype functions are selected directly by the users. They stated that ACE is a
new and effective tool for clustering and function approximation [29], and that virtually every
clustering model can be realized as an instance of ACE. However, they did not explain how
to select proper functions for different real data sets. Later, Zhou and Hung [1] proposed a similar procedure especially for possibilistic clustering, and discussed in detail the general
characteristics of the membership and prototype functions used in the approach.
The algorithms derived are called generalized possibilistic clustering algorithms (GPCAs).
Zhou et al. [30, 31] also proposed some other alternating clustering estimation algorithms
by extending the possibilistic membership weights to the normalized possibilistic and
credibilistic membership weights.
It is obvious that different membership functions may have diverse efficiency when
dealing with real problems. In order to define good clusters, it is important to provide
a suitable membership function for each cluster. To tackle this problem, we introduce
a general procedure for users to determine the membership functions in ACE for
possibilistic clustering by combining the clustering performance and the fuzzy set
theory, which leads to a definite GPCA once proper parameter values are provided.
Furthermore, four specific algorithms are recommended for various real data sets, and
their experimental results are illustrated with real data including the Iris data, the car evaluation
data, and the beer tastes data. The comparative results with the K-means algorithm and
existing PCAs show that the proposed algorithms are effective, with better performance,
when solving these practical application problems.
The rest of this paper is organized as follows. In Section 2, the approach of possibilistic clustering as well as the existing PCAs are briefly reviewed. Subsequently, a
method of determining the membership function for a cluster in the generalized approach
to possibilistic clustering algorithms is suggested, and four specific functions are then recommended for real applications in Section 3. In Section 4, four new GPCAs are obtained
by using the specific functions in the membership evaluation equations. Finally, some
experimental results on real data sets for different applications are presented in Section 5.
2. Possibilistic Clustering Algorithms. Since its introduction in 1965 by Zadeh [32],
the fuzzy set theory has been well developed and applied in a wide variety of real problems.


In clustering, a great deal of research on fuzzy clustering has been accomplished in the
literature, in which the fuzzy c-means algorithm developed by Bezdek [6] is the most
well known. Since the normalization condition in the FCM assigns noise points improper
memberships in each cluster according to the fuzzy set theory, the memberships resulting
from the FCM do not always correspond to the intuitive concept of degree of belongingness
or compatibility. In order to produce memberships with a good explanation of degree of
belongingness for the data, the PCAs were proposed by Krishnapuram and Keller [10, 11].
Given a data set X = {x_1, x_2, ..., x_n} in a p-dimensional Euclidean space, an ordinary
Euclidean norm ||·|| on ℝ^p, an integer c ≥ 2 representing the specified number of clusters,
and a parameter of fuzzifier m with m > 1, the approach of possibilistic clustering initiated
by Krishnapuram and Keller [10, 11] is to find the optimal membership matrix μ and the
cluster center matrix A which minimize the objective functions
$$J_{PCA93}(\mu, A) = \sum_{i=1}^{c} \sum_{j=1}^{n} (\mu_{ij})^m d_{ij}^2 + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} (1 - \mu_{ij})^m \tag{1}$$
and
$$J_{PCA96}(\mu, A) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} d_{ij}^2 + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} (\mu_{ij} \ln \mu_{ij} - \mu_{ij}), \tag{2}$$

respectively, subject to the membership constraint


$$U_X = \left\{ \mu \;\middle|\; 0 \le \mu_{ij} \le 1,\ \sum_{i=1}^{c} \mu_{ij} > 0,\ i = 1, 2, \ldots, c,\ j = 1, 2, \ldots, n \right\}, \tag{3}$$

where μ_ij represents the degree of compatibility or membership of feature point x_j belonging
to cluster i, and d_ij represents the Euclidean distance between the ith cluster center a_i and
the jth feature point x_j, i.e.,
$$d_{ij} = \|a_i - x_j\|, \quad i = 1, 2, \ldots, c,\ j = 1, 2, \ldots, n. \tag{4}$$
For convenience, we denote the index sets of clusters and feature points as I = {1, 2, ..., c}
and J = {1, 2, ..., n}, respectively. The distance parameters η_i, i ∈ I, are user specified,
which were recommended by Krishnapuram and Keller [10] as
$$\eta_i = K \, \frac{\sum_{j=1}^{n} (\mu_{ij})^m d_{ij}^2}{\sum_{j=1}^{n} (\mu_{ij})^m} \quad \text{or} \quad \eta_i = \frac{\sum_{j=1}^{n} (\mu_{ij})_{\alpha} \, d_{ij}^2}{\sum_{j=1}^{n} (\mu_{ij})_{\alpha}}, \tag{5}$$
where the parameter K > 0 is typically chosen to be one, and α is a predetermined
threshold with 0 < α < 1, which gives the crisp partition
$$(\mu_{ij})_{\alpha} = \begin{cases} 1, & \text{if } \mu_{ij} \ge \alpha \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$
The necessary conditions for a minimizer (μ, A) of J_{PCA93} and J_{PCA96} subject to the
constraint μ ∈ U_X deduce two related possibilistic clustering algorithms, denoted as
PCA93 and PCA96 respectively, which are both iterative algorithms with the following
update equations for memberships
$$\text{(PCA93)} \quad \mu_{ij} = \frac{1}{1 + \left( d_{ij}^2 / \eta_i \right)^{1/(m-1)}} \quad \text{for } (i, j) \in (I, J) \tag{7}$$
$$\text{(PCA96)} \quad \mu_{ij} = \exp\left\{ - \frac{d_{ij}^2}{\eta_i} \right\} \quad \text{for } (i, j) \in (I, J)$$
in PCA93 and PCA96, respectively, and the following update equation for cluster centers
$$a_i = \frac{\sum_{j=1}^{n} (\mu_{ij})^m x_j}{\sum_{j=1}^{n} (\mu_{ij})^m} \quad \text{for } i \in I \tag{8}$$


in both PCA93 and PCA96.


Since the performance of possibilistic clustering proposed in [10, 11] heavily depends on
the selection of the parameters η_i, a new PCA was developed by Yang and Wu [16], denoted
as PCA06, whose performance can be easily controlled. The objective function used in
PCA06 is
$$J_{PCA06}(\mu, A) = \sum_{i=1}^{c} \sum_{j=1}^{n} (\mu_{ij})^m d_{ij}^2 + \frac{\beta}{m^2 \sqrt{c}} \sum_{i=1}^{c} \sum_{j=1}^{n} \left[ (\mu_{ij})^m \ln (\mu_{ij})^m - (\mu_{ij})^m \right], \tag{9}$$
and accordingly the update equation for memberships is
$$\mu_{ij} = \exp\left\{ - \frac{m \sqrt{c} \, d_{ij}^2}{\beta} \right\} \quad \text{for } (i, j) \in (I, J), \tag{10}$$
where the parameter β is defined by
$$\beta = \frac{\sum_{j=1}^{n} \|x_j - \bar{x}\|^2}{n} \quad \text{with} \quad \bar{x} = \frac{\sum_{j=1}^{n} x_j}{n}. \tag{11}$$
The same update equation (8) for cluster centers in PCA93 and PCA96 is also used in
PCA06.
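Analogously, a minimal sketch of the PCA06 membership update (10), with β estimated once from the data via (11), is given below; the function and variable names are illustrative assumptions rather than code from [16].

import numpy as np

def pca06_membership(X, A, m=2.0):
    """PCA06 membership update (10), with beta estimated from the data via (11).

    X : (n, p) data matrix, A : (c, p) cluster centers, m : fuzzifier.
    """
    c = A.shape[0]
    beta = ((X - X.mean(axis=0)) ** 2).sum(axis=1).mean()              # eq. (11)
    d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)            # d_ij^2, shape (c, n)
    return np.exp(-m * np.sqrt(c) * d2 / beta)                         # eq. (10)

The centers are then updated exactly as in pca_update above, i.e., by (8).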
3. Membership Function in ACE for Possibilistic Clustering. It is easy to see
that the only difference among PCA93, PCA96 and PCA06 is the evaluation equation
of membership, which is a function of the distance d_ij. Based on the analogous principle of
alternating cluster estimation in [28], Zhou and Hung generalized the PCAs in [10, 11, 16]
to a family of iterative clustering algorithms, called the generalized possibilistic clustering
algorithms [1], in which the memberships are calculated by
$$\mu_{ij} = f_i(d_{ij}) \quad \text{for } (i, j) \in (I, J), \tag{12}$$
with the membership function f_i satisfying
$$\begin{cases} f_i \text{ is monotone decreasing on } [0, +\infty), \\ f_i(0) = 1, \\ f_i(+\infty) = 0 \end{cases} \tag{13}$$

for each i ∈ I, and the cluster centers are also updated by (8). The membership functions
f_i in (12) should be predetermined by the decision-maker, and various functions satisfying
(13) may lead to clustering algorithms with different performances. Thus, it is necessary
to discuss how to provide a proper function for each cluster in practice, which is the
objective of this section.
In this section, a methodology is suggested for the determination of a suitable membership function for each cluster in the generalized approach of possibilistic clustering
algorithms in [1], and then four specific membership functions are recommended for real
applications. All the discussions in this section are based on only one cluster.
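To make the generalized scheme concrete, the sketch below alternates an arbitrary user-supplied membership function f, assumed to satisfy (13), with the center update (8). The interface (a NumPy function ace_possibilistic and its arguments) is our own illustrative choice, not code from [1].

import numpy as np

def ace_possibilistic(X, A0, f, m=2.0, eps=1e-5, max_iter=100):
    """Alternating cluster estimation for possibilistic clustering with a user-chosen f.

    f should satisfy (13): monotone decreasing, f(0) = 1 and f(+inf) = 0; it is applied
    elementwise to the distance matrix d_ij of (4), and the centers follow (8).
    """
    A = A0.copy()
    U = None
    for _ in range(max_iter):
        d = np.sqrt(((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))   # d_ij, eq. (4)
        U = f(d)                                                          # memberships, eq. (12)
        W = U ** m
        A_new = (W @ X) / W.sum(axis=1, keepdims=True)                    # center update, eq. (8)
        if np.max(np.linalg.norm(A_new - A, axis=1)) < eps:
            return U, A_new
        A = A_new
    return U, A

# e.g. a PCA96-like choice with a fixed eta = 2.5 (purely illustrative):
# U, A = ace_possibilistic(X, A0, lambda d: np.exp(-d ** 2 / 2.5))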
3.1. Membership Function of a Fuzzy Set. In order to obtain clustering results using
the GPCA, a membership function f_i satisfying (13) should be given for each cluster i
(i ∈ I) in advance by the decision-maker, since each cluster is regarded as a fuzzy set in
fuzzy clustering. There are many functions satisfying (13), for example,
$$f_p(x) = \frac{1}{1 + k x^b}, \qquad f_e(x) = \exp\{-k x^b\}, \qquad k > 0,\ b > 0. \tag{14}$$
The following example illustrates a methodology to provide an appropriate membership
function for a fuzzy set.


Example 3.1. Suppose that V is the set of ages of people in a residential area. Then the
set "old people" in this residential area can be modeled as a fuzzy set on V, denoted as A.
Let the standard of A be x_0 = 80 years old, and the dissimilarity between the standard x_0
and an element x ∈ V be defined by
$$d(x, x_0) = \begin{cases} 0, & \text{if } x \ge x_0 \\ x_0 - x, & \text{if } 0 < x < x_0, \end{cases} \tag{15}$$
where "standard" represents the semantic centre of the fuzzy set, and "dissimilarity"
means the measurement of the distance or difference. In order to give a proper monotone
decreasing function f for evaluating the memberships of x ∈ V, additional preference
information on the fuzzy set "old people" must be used. For example, it is reasonable to
assume that the membership of 40 years old is 0.01, and the membership of 60 years old
is 0.5. After that, suppose that the monotone decreasing function f_p in (14) is employed.
Considering all the assumptions in this example, we then obtain the following equations,
$$\begin{cases} \dfrac{1}{1 + k(80 - 40)^b} = 0.01 \\[2mm] \dfrac{1}{1 + k(80 - 60)^b} = 0.5. \end{cases} \tag{16}$$
The solution of (16) is k = 20^{-6.6} and b = 6.6, and the deduced function is f_p(y) =
1/(1 + 20^{-6.6} y^{6.6}). Consequently, the membership function of A is
$$\mu_p(x) = f_p(d(x, x_0)) = \begin{cases} 1, & \text{if } x \ge 80 \\ \left( 1 + 20^{-6.6} (80 - x)^{6.6} \right)^{-1}, & \text{if } 0 < x < 80. \end{cases} \tag{17}$$
If the monotone function f_e in (14) is adopted, the corresponding membership function of A is
$$\mu_e(x) = \exp\{-2 \times 10^{-4} \, d(x, x_0)^{2.7}\} = \begin{cases} 1, & \text{if } x \ge 80 \\ \exp\{-2 \times 10^{-4} (80 - x)^{2.7}\}, & \text{if } 0 < x < 80. \end{cases} \tag{18}$$
The two membership functions μ_p in (17) and μ_e in (18) are shown in Figure 1.

Figure 1. Two Membership Functions of the Fuzzy Set "Old People" in a Residential Area


Remark 3.1. For a fuzzy set, many functions can be used to evaluate its memberships
properly according to the opinion of the decision-maker. In Example 3.1, all the parameters, including the standard x_0 = 80, the dissimilarity d(x, x_0) defined in (15), the two
reference elements 40 and 60 together with the corresponding memberships 0.01 and 0.5,
and the monotone decreasing function f_p or f_e used, are predetermined by the decision-maker, and obviously different selections of parameters will lead to diverse membership
functions.
From Example 3.1, a procedure to provide a proper membership function μ_A for a fuzzy
set A is summarized as follows.
Algorithm 1: (Determination of Membership Function of a Fuzzy Set)
Step 1: Predetermine a standard x_0 and a dissimilarity d(x, x_0) between an element
x of the fuzzy set A and the standard x_0.
Step 2: Decide a monotone decreasing function f satisfying f(0) = 1 and f(+∞) = 0.
Step 3: Give some reference elements and their corresponding memberships according
to the knowledge of experts or by estimation, in which the number of reference
elements equals the number of unknown parameters in f.
Step 4: Solve the equations that combine the information of the functions d and f and the
reference elements together with their memberships, and obtain the membership function μ_A(x) = f(d(x, x_0)) of A, as shown in (16)-(18).
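As a quick illustration of Algorithm 1, the following Python sketch reproduces the computation of Example 3.1 for f_p: the two reference conditions in (16) are solved in closed form, yielding b = ln 99/ln 2 ≈ 6.6 and k = 20^{-6.6}, and the resulting membership function (17) is evaluated. The variable and function names are ours.

import math

# Example 3.1 with f_p: standard x0 = 80, and reference memberships 0.01 at age 40
# and 0.5 at age 60, i.e., at dissimilarities d = 40 and d = 20 respectively.
d1, mu1 = 80 - 40, 0.01
d2, mu2 = 80 - 60, 0.5

# Solving 1/(1 + k d^b) = mu at the two references (system (16)) in closed form:
b = math.log((1 / mu1 - 1) / (1 / mu2 - 1)) / math.log(d1 / d2)   # = ln 99 / ln 2, about 6.6
k = (1 / mu2 - 1) / d2 ** b                                       # = 20 ** (-b)

def membership_old(x, x0=80.0):
    """Membership function (17) of the fuzzy set 'old people'."""
    d = max(x0 - x, 0.0)
    return 1.0 / (1.0 + k * d ** b)

print(round(b, 2), membership_old(40), membership_old(60), membership_old(85))
# prints approximately 6.63, 0.01, 0.5 and 1.0, matching the values assumed in the example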
3.2. Membership Function of a Cluster. The methodology of deciding a membership
function for a fuzzy set in Section 3.1 can be applied in fuzzy clustering to obtain a
membership function for each cluster in each iteration of the generalized approach of
possibilistic clustering algorithms [1]. The procedure of providing a proper membership
function for each cluster i (i ∈ I) is introduced in detail as follows.
Algorithm 2: (Determination of Membership Function of a Cluster)
Step 1: It is natural to select the cluster center a_i, i ∈ I, as the standard of fuzzy
cluster i and measure the dissimilarity between a feature point x_j ∈ X, j ∈ J,
and the standard a_i by the Euclidean distance d_ij in (4).
Step 2: Generally speaking, any monotone decreasing function f with f(0) = 1 and
f(+∞) = 0 can be used for evaluating the memberships in fuzzy clustering. In this
section, we only investigate the functions f_p and f_e defined in (14) for the sake of
simplicity.
Step 3: Decide a max-distance d_max and a mid-distance d_mid for each cluster i with
f(d_max) = ε_0 and f(d_mid) = ε_1, in which ε_0 ∈ (0, 1) is a small number and
ε_1 ∈ (0, 1) is a relatively large number; typically set ε_0 = 0.01 and ε_1 = 0.5. It follows
that the mid-distance d_mid is the distance at which the membership value of a feature
point in the cluster becomes 0.5, and the max-distance d_max is the distance at which
the membership value of a point becomes 0.01.
Step 4: Combine the information of all the parameters and functions to obtain the
membership functions μ_p and μ_e corresponding to f_p and f_e, respectively, as follows,
$$\begin{array}{ll} \mu_p(x_j) = \dfrac{1}{1 + (d_{ij}/d_{mid})^{b_p}} & \text{with } b_p = \ln 99 \,/\, \ln(d_{max}/d_{mid}) \\[2mm] \mu_e(x_j) = \exp\left\{ -\ln 2 \cdot (d_{ij}/d_{mid})^{b_e} \right\} & \text{with } b_e = \ln\!\left( \ln 100 / \ln 2 \right) / \ln(d_{max}/d_{mid}) \end{array} \tag{19}$$
for j ∈ J by setting ε_0 = 0.01 and ε_1 = 0.5.
Remark 3.2. In order to obtain a specific membership function for a cluster, the distances
d_mid and d_max must be given. We can see that the mid-distance d_mid actually determines
the possible main zone of influence of a cluster, and the max-distance d_max means the

maximum distance that a cluster can see. Thus, it is a good idea to use d_max = λ d_mid
by introducing a weighting parameter λ > 1. An appropriate vector (d_mid, λ) would
determine a corresponding membership function when the functions f_p and f_e are used,
respectively. In this paper, we set the weighting parameter λ = 10 for our purpose. Consequently, the setting of λ = 10 in (19) leads to b_p ≈ 2 and b_e ≈ 0.82, and we obtain the
following membership functions,
$$\begin{cases} \mu_p(x_j) = f_p(d_{ij}) = \dfrac{1}{1 + (d_{ij}/d_{mid})^{2}} \\[2mm] \mu_e(x_j) = f_e(d_{ij}) = \exp\left\{ -\ln 2 \cdot (d_{ij}/d_{mid})^{0.82} \right\} \end{cases} \tag{20}$$
for j ∈ J. The monotone decreasing functions f_p and f_e in (20) are sketched in Figure 2.

Figure 2. Two Monotone Functions Representing the Membership of Each Fuzzy Cluster
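A short numerical check of Remark 3.2: with d_max = λ d_mid and the typical levels ε_0 = 0.01, ε_1 = 0.5, the exponents of (19) reduce to the closed forms below, and λ = 10 indeed gives b_p ≈ 2 and b_e ≈ 0.82 as used in (20). The helper name exponents is a hypothetical one introduced here.

import math

def exponents(lam):
    """Exponents b_p and b_e of (19) when d_max = lam * d_mid (with eps0 = 0.01, eps1 = 0.5)."""
    b_p = math.log(99) / math.log(lam)
    b_e = math.log(math.log(100) / math.log(2)) / math.log(lam)
    return b_p, b_e

print(exponents(10))   # approximately (2.0, 0.82), the values used in (20)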
3.3. Four Specific Membership Functions. As shown in Section 3.2, a vector (d_mid, λ)
would determine two corresponding membership functions μ_p and μ_e of a cluster in (19),
where d_mid determines the main zone of influence of a cluster, and λ d_mid represents the
maximum distance that a cluster can see. In this section, by using λ = 10 and the
monotone functions f_p and f_e in (20), and setting some proper values of d_mid, four specific
membership functions are recommended for real clustering problems. Before that, the
concept of an α-level set in the fuzzy set theory is reviewed as follows.
Definition 3.1. Let A be a fuzzy set defined on a universal set U with membership
function μ_A. Then the set of elements that belong to the fuzzy set A at least to the degree
of membership α is called the α-level set of A, denoted by
$$A_{\alpha} = \{ x \in U \mid \mu_A(x) \ge \alpha \}.$$
Furthermore, two parameters η_i^f and η_i^α are defined for convenience with
$$\eta_i^f = \sqrt{ \frac{\sum_{j=1}^{n} (\mu_{ij})^m d_{ij}^2}{\sum_{j=1}^{n} (\mu_{ij})^m} }, \qquad \eta_i^{\alpha} = \sqrt{ \frac{\sum_{j=1}^{n} (\mu_{ij})_{\alpha} \, d_{ij}^2}{\sum_{j=1}^{n} (\mu_{ij})_{\alpha}} } \tag{21}$$
for each i ∈ I, where (μ_ij)_α is defined in (6). The parameter η_i^f represents the average
fuzzy intracluster distance of cluster i, and η_i^α implies the average intracluster distance
for all of the feature points in the α-level set of cluster i.


Now let us decide some appropriate distances d_mid. It is reasonable to assume that
the mid-distance d_mid is proportional to the distance η_i^f or η_i^α for i ∈ I, i.e., setting
d_mid = W η_i^f or d_mid = W η_i^α, where W is an appropriate constant required to be determined. Obviously, different constants would result in clustering algorithms with diverse
efficiency. In order to obtain some proper values for the parameter W, extensive numerical experiments based on randomly generated data sets were run instead of a theoretical
analysis, which seems difficult for this problem. According to the results of these numerical experiments, the following four membership functions perform well with better clustering
performance, and thus are recommended for real applications:
$$\begin{array}{ll} \mu_1(x_j) = \left[ 1 + (mc)^3 \, (d_{ij}/\eta_i^f)^2 \right]^{-1} & \text{by letting } d_{mid} = (mc)^{-3/2} \eta_i^f \text{ in } f_p \\[1mm] \mu_2(x_j) = \left[ 1 + (mc)^3 \, (d_{ij}/\eta_i^{\alpha})^2 \right]^{-1} & \text{by letting } d_{mid} = (mc)^{-3/2} \eta_i^{\alpha} \text{ in } f_p \\[1mm] \mu_3(x_j) = \exp\left\{ -\ln 2 \, (mc)^{0.41} (d_{ij}/\eta_i^f)^{0.82} \right\} & \text{by letting } d_{mid} = (mc)^{-1/2} \eta_i^f \text{ in } f_e \\[1mm] \mu_4(x_j) = \exp\left\{ -\ln 2 \, (mc)^{0.41} (d_{ij}/\eta_i^{\alpha})^{0.82} \right\} & \text{by letting } d_{mid} = (mc)^{-1/2} \eta_i^{\alpha} \text{ in } f_e \end{array} \tag{22}$$
where f_p and f_e are defined in (20), c is the number of clusters and m is the fuzzifier.
It is easy to deduce that the constant W of the distance d_mid used in the above four
membership functions is (mc)^{-3/2} or (mc)^{-1/2}.
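The following sketch shows how the four recommended functions of (22) could be evaluated for one cluster, with η_i^f and η_i^α computed as in (21) (taken with the square root, consistent with (25) below). The NumPy implementation, the function name and the assumption that at least one point lies in the α-level set are our own simplifications.

import numpy as np

def cluster_membership(d, mu_old, m=2.0, c=3, alpha=0.3, kind=1):
    """Evaluate one of the four membership functions of (22) for a single cluster.

    d      : (n,) distances d_ij from the cluster center to all feature points
    mu_old : (n,) current memberships mu_ij of the points in this cluster
    alpha  : level of the alpha-cut in (6); at least one point is assumed to satisfy mu_ij >= alpha
    """
    # eta_i^f and eta_i^alpha from (21)
    eta_f = np.sqrt((mu_old ** m * d ** 2).sum() / (mu_old ** m).sum())
    cut = (mu_old >= alpha).astype(float)
    eta_a = np.sqrt((cut * d ** 2).sum() / cut.sum())
    if kind == 1:
        return 1.0 / (1.0 + (m * c) ** 3 * (d / eta_f) ** 2)                 # mu_1
    if kind == 2:
        return 1.0 / (1.0 + (m * c) ** 3 * (d / eta_a) ** 2)                 # mu_2
    if kind == 3:
        return np.exp(-np.log(2) * (m * c) ** 0.41 * (d / eta_f) ** 0.82)    # mu_3
    return np.exp(-np.log(2) * (m * c) ** 0.41 * (d / eta_a) ** 0.82)        # mu_4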
4. Four Specific GPCAs. In this section, four GPCAs are presented by combining
the generalized approach of possibilistic clustering algorithms in [1] and the four specific
membership functions for clusters recommended in Section 3.3. These GPCAs are also
iterative algorithms based on ACE, in which the membership evaluation equations in each
iteration are
$$\mu_{ij} = \mu_k(x_j), \quad k = 1, 2, 3, 4, \tag{23}$$
and the update equation for cluster centers is
$$a_i = \frac{\sum_{j=1}^{n} (\mu_{ij})^m x_j}{\sum_{j=1}^{n} (\mu_{ij})^m} \quad \text{for } i \in I, \tag{24}$$
where μ_k are the functions given in (22). For convenience, the GPCAs with the update
equations (23) and (24) based on the membership functions μ_1, μ_2, μ_3, μ_4 are denoted as
GPCA1, GPCA2, GPCA3 and GPCA4, respectively. The algorithms of the four new GPCAs
are summarized as follows, taking μ_1 as an example.
Algorithm 3: (GPCA1 Based on μ_1)
Step 1: Initialize μ_ij^(0) ∈ [0, 1] and a_i^(0) ∈ ℝ^p for all (i, j) ∈ (I, J), set a small number
ε > 0 and the iteration counter t = 0, and set the number of clusters c and the fuzzifier m.
Step 2: Calculate (η_i^f)^(t) with
$$(\eta_i^f)^{(t)} = \sqrt{ \frac{\sum_{j=1}^{n} (\mu_{ij}^{(t)})^m (d_{ij}^{(t)})^2}{\sum_{j=1}^{n} (\mu_{ij}^{(t)})^m} } \tag{25}$$
for i ∈ I, and then estimate the distance d_mid by d_mid = (mc)^{-3/2} (η_i^f)^(t), where
$$d_{ij}^{(t)} = \|a_i^{(t)} - x_j\| \quad \text{for } (i, j) \in (I, J). \tag{26}$$
Step 3: Compute μ_ij^(t+1) for all (i, j) ∈ (I, J) with
$$\mu_{ij}^{(t+1)} = \left[ 1 + (mc)^3 \left( d_{ij}^{(t)} / (\eta_i^f)^{(t)} \right)^2 \right]^{-1}. \tag{27}$$



Step 4: Compute a_i^(t+1) for all i ∈ I with
$$a_i^{(t+1)} = \frac{\sum_{j=1}^{n} (\mu_{ij}^{(t+1)})^m x_j}{\sum_{j=1}^{n} (\mu_{ij}^{(t+1)})^m}. \tag{28}$$
Step 5: Increment t and return to Step 2 until max_i ||a_i^(t+1) − a_i^(t)|| < ε.
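Putting the pieces together, a minimal NumPy sketch of Algorithm 3 (GPCA1) might look as follows. Initializing all memberships to one in Step 1 and the fixed iteration cap are our own choices; the code is an illustration of the scheme under the reconstruction above, not the authors' implementation.

import numpy as np

def gpca1(X, A0, m=2.0, eps=1e-5, max_iter=100):
    """GPCA1 (Algorithm 3): alternate the membership function mu_1 of (22) with the
    center update (28), starting from given initial centers (e.g. produced by K-means)."""
    A = A0.copy()
    c = A.shape[0]
    U = np.ones((c, X.shape[0]))                                   # Step 1: initial memberships in [0, 1]
    for _ in range(max_iter):
        d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)    # squared distances (d_ij^(t))^2, cf. (26)
        eta_f = np.sqrt((U ** m * d2).sum(axis=1) / (U ** m).sum(axis=1))   # (eta_i^f)^(t), eq. (25)
        U = 1.0 / (1.0 + (m * c) ** 3 * d2 / eta_f[:, None] ** 2)  # Step 3, eq. (27)
        W = U ** m
        A_new = (W @ X) / W.sum(axis=1, keepdims=True)             # Step 4, eq. (28)
        if np.max(np.linalg.norm(A_new - A, axis=1)) < eps:        # Step 5 stopping rule
            return U, A_new
        A = A_new
    return U, A

GPCA2, GPCA3 and GPCA4 would differ only in the membership update line, which uses μ_2, μ_3 or μ_4 of (22) instead of μ_1.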

5. Computational Experiments. In this section, three real data sets, including the Iris
data, the car evaluation data and the beer tastes data, are used to illustrate the clustering
performance of the four new GPCAs recommended in Section 4; the real labels of the feature
points are known. We implement PCA93_K(1), PCA93_α(0.3), PCA96_K(1),
PCA96_α(0.3), PCA06, GPCA1, GPCA2, GPCA3, and GPCA4, where PCA93_K(1) and
PCA96_K(1) denote the algorithms using the parameter
$$\eta_i = K \, \frac{\sum_{j=1}^{n} (\mu_{ij})^m d_{ij}^2}{\sum_{j=1}^{n} (\mu_{ij})^m} \tag{29}$$
with K = 1, and PCA93_α(0.3) and PCA96_α(0.3) denote the algorithms using the parameter
$$\eta_i = \frac{\sum_{j=1}^{n} (\mu_{ij})_{\alpha} \, d_{ij}^2}{\sum_{j=1}^{n} (\mu_{ij})_{\alpha}} \tag{30}$$
with α = 0.3 in the membership update equation (7). Note that the parameters η_i are
not re-estimated in each iteration during the clustering process, according to the
suggestion presented in [10]. Besides, we set the parameter α to 0.3 in both GPCA2 and
GPCA4.
Since the results of the PCAs and the GPCAs heavily depend on the initialization,
a reasonably good initialization is required for all the algorithms. In this section, the
K-means algorithm is used to obtain the initial cluster centers. For each clustering algorithm, 1000 experiments with initial centers generated by the K-means algorithm are
carried out to assess the robustness of all the PCAs and GPCAs to the initial settings. The
fuzzifier m has different influences on these algorithms. In this paper, all the algorithms
are run with fuzzifier m = 2 for the 1000 experiments, since it is recommended
that this value generates relatively desirable results. In order to compare the results
obtained by running these clustering algorithms, the clustering results are evaluated by
their overall accuracies in this paper. A larger overall accuracy means better clustering
results, and a lower variance indicates greater robustness to the initialization. The maximal
mapping number criterion introduced in [1] is employed to obtain the clustering map of
each result. The interested reader may consult [1] for a detailed explanation.
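Since the clustering results are scored by overall accuracy after mapping clusters to the known labels, a simple stand-in for this evaluation is sketched below. It assigns each point to its largest membership and maps each cluster to its majority class, which approximates, but is not necessarily identical to, the maximal mapping number criterion of [1]; the function name and the assumption of integer labels 0..K-1 are ours.

import numpy as np

def overall_accuracy(U, labels):
    """Overall accuracy of a possibilistic partition against known integer labels 0..K-1.

    Each point is assigned to the cluster with its largest membership, each resulting cluster
    is mapped to the true class it contains most often (a simple stand-in for the maximal
    mapping number criterion of [1]), and the fraction of correctly mapped points is returned.
    """
    assign = U.argmax(axis=0)                      # hard assignment from the membership matrix
    correct = 0
    for i in np.unique(assign):
        members = labels[assign == i]
        correct += np.bincount(members).max()      # points that agree with the cluster's majority class
    return correct / labels.size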
5.1. Iris Data. The Iris flower data, or Fisher's Iris data, in [33, 34] is a multivariate data
set introduced by Ronald Fisher in 1936 as an example of discriminant analysis. In this
paper, we use the recommended new GPCAs to cluster the Iris data and compare them with
the existing PCAs. The data consists of 50 samples from each of three species of Iris
(Iris setosa, Iris virginica and Iris versicolor). Four features were measured in centimetres
from each sample, namely the length and the width of the sepals and petals. Among
the three clusters, two of the clusters overlap substantially.
The analysis of the overall accuracies of all the clustering results is given in Table 1,
where the columns "Highest", "Mean" and "Variance" below "Overall Accuracy" provide the
highest value, mean and variance of the overall accuracies of the 1000 experiments for each
algorithm, respectively. In order to show the accuracy distribution of the 1000 clustering
results, the numbers of experiments with overall accuracies in the regions [0.9, 1], [0.8, 0.9)
and [0, 0.8) are counted and given in the three columns below "Accuracy Distribution". The
last column "Iteration Number" shows the average iteration number of the 1000 experiments
for each algorithm.


Table 1. A Comparison of 1000 Experiments for the Iris Data

Clustering          Overall Accuracy (%)          Accuracy Distribution          Iteration
Algorithm           Highest  Mean   Variance      [0.9, 1]  [0.8, 0.9)  [0, 0.8)  Number
The Existing PCAs
PCA93_K(1)          74.00    69.45  1.87          0         0           1000      20.98
PCA93_α(0.3)        70.00    69.29  0.15          0         0           1000      21.96
PCA96_K(1)          75.33    68.88  5.61          0         0           1000      25.16
PCA96_α(0.3)        69.33    68.30  0.19          0         0           1000      26.87
PCA06               92.00    80.83  324.18        722       0           278       45.57
The New GPCAs
GPCA1               94.00    86.84  181.97        620       239         141       50.70
GPCA2               94.00    86.72  199.64        723       122         155       54.21
GPCA3               94.00    86.14  251.89        803       0           197       251.89
GPCA4               94.00    83.08  320.95        729       0           271       78.49

Table 2. Analysis of 1000 Experiments for the Iris Data by PCA93_K(0.02)

Clustering          Overall Accuracy (%)          Accuracy Distribution          Iteration
Algorithm           Highest  Mean   Variance      [0.9, 1]  [0.8, 0.9)  [0, 0.8)  Number
PCA93_K(0.02)       93.33    88.08  128.76        915       0           85        24.312

As shown in Table 1, all the algorithms except PCA93_K(1), PCA93_α(0.3), PCA96_K(1)
and PCA96_α(0.3) are efficient for clustering the Iris data and yield good results. More than
62% of the experiments of the GPCAs provide clustering results with overall accuracies
larger than 90%. Furthermore, although the clustering algorithms PCA06, GPCA1, GPCA2,
GPCA3, and GPCA4 can provide clustering results with higher overall accuracies than
PCA93_K(1), PCA93_α(0.3), PCA96_K(1) and PCA96_α(0.3) (the highest one even reaches
94.00% for the four new GPCAs), the variances of PCA06, GPCA1, GPCA2, GPCA3,
and GPCA4 are much larger than those of PCA93_K(1), PCA93_α(0.3), PCA96_K(1) and
PCA96_α(0.3). This means that PCA93_K(1), PCA93_α(0.3), PCA96_K(1) and PCA96_α(0.3)
are more robust to the initialization than the new GPCAs in the experiments on the Iris
data. In other words, GPCA1, GPCA2, GPCA3 and GPCA4 are better than PCA93_K(1),
PCA93_α(0.3), PCA96_K(1) and PCA96_α(0.3) in overall accuracy, but at the cost of robustness.
Remark 5.1. The parameter K in (29) plays an important role in PCA93 and PCA96.
It follows from Table 1 that the typical value K = 1 is not appropriate for PCA93 and
PCA96 when clustering the Iris data. In fact, we ran the algorithm PCA93_K(0.02) (i.e.,
with K = 0.02 in (29)) on the Iris data for 1000 experiments and obtained the results in
Table 2, which are much better than those of PCA93_K(1). Furthermore, the influence of K
on PCA93 and on PCA96 is different. In this paper, we do not discuss the selection of
the parameter K for PCA93 and PCA96.


Table 3. A Comparison of 1000 Experiments for the Car Evaluation Data

Clustering          Overall Accuracy (%)          Accuracy Distribution          Iteration
Algorithm           Highest  Mean   Variance      [0.8, 1]  [0.7, 0.8)  [0, 0.7)  Number
The Existing PCAs
PCA96_K(1)          49.50    49.27  0.07          0         0           1000      42.42
PCA06               89.00    81.25  137.35        696       0           304       55.31
The New GPCAs
GPCA1               93.50    84.44  189.17        720       0           280       80.32
GPCA2               93.50    81.90  217.48        638       0           362       75.30
GPCA3               93.00    89.07  92.20         872       0           128       61.45
GPCA4               93.00    89.09  90.02         875       0           125       60.30

Remark 5.2. Similarly, the parameters λ and α have a great influence on the GPCAs.
However, in this paper, we only consider the case λ = 10 and α = 0.3 in the four
new GPCAs.
5.2. Car Evaluation Data. The second data set comes from an evaluation of a particular
car by different users. The quality of the car is measured by two main criteria: price and
technical characteristics. 200 users were asked to rate the same car based on its price
and technology, and the car evaluation data set was generated in this way. This
data set contains 200 points in a 2-dimensional space, and is classified into 4 clusters,
which represent unacceptable, acceptable, good and very good. The clusters contain 46, 60,
45 and 49 data points, respectively, as shown in Figure 3.

Figure 3. The Car Evaluation Data Set


In this example, we run PCA96_K(1), PCA06, GPCA1, GPCA2, GPCA3, and
GPCA4 for 1000 experiments with fuzzifier m = 2, and the results are shown in Table 3.
It follows from Table 3 that the new GPCAs are superior to PCA96_K(1), since
PCA96_K(1) appears not to be effective on this data set. It can also be seen that GPCA1 and
GPCA2 have performance similar to PCA06 in terms of both accuracy and variance. In
addition, GPCA3 and GPCA4 are clearly superior to PCA06 with a higher average accuracy.
Furthermore, in terms of the variance, it follows from Table 3 that the results of GPCA3 and
GPCA4 are relatively more stable, which is a significant factor in practical applications.
From this example, we can see that the new GPCAs provide results with relatively better
average accuracy together with more robustness to initialization (the variance is relatively
lower).


Table 4. A Comparison of 1000 Experiments for the Beer Tastes Data

Clustering          Overall Accuracy (%)          Accuracy Distribution          Iteration
Algorithm           Highest  Mean   Variance      [0.8, 1]  [0.7, 0.8)  [0, 0.7)  Number
The Existing PCAs
PCA06               77.33    75.65  8.62          0         877         123       90.88
The New GPCAs
GPCA1               84.67    78.48  39.50         648       258         94        19.69
GPCA2               84.67    80.50  30.61         804       139         57        20.62
GPCA3               85.00    81.36  21.468        830       127         43        8.395
GPCA4               85.00    81.50  29.39         852       55          93        8.279
K-Means
K-Means             86.00    80.61  43.82         777       82          141       13.421

5.3. Beer Tastes Data. There are various brands of beer in the market, and various
kinds of beer have different tastes. In this subsection, we choose six top brands of beer,
and obtain a data set with two attributes, which represent the values of bitterness and
alcohol concentration. In this way, a beer tastes data set is generated with 300 points and
6 clusters in a 2-dimensional space, and each cluster has 91, 66, 41, 60, 13 and 29 points,
respectively, as shown in Figure 4.

Figure 4. The Beer Tastes Data Set


In this example, we run the K-means algorithm, PCA06, GPCA1, GPCA2, GPCA3, and GPCA4
for 1000 experiments with fuzzifier m = 2, and Table 4 presents all the clustering results.
It follows from Table 4 that more than 64% of the experiments of the GPCAs provide
clustering results with overall accuracies larger than 80%. Comparing PCA06 with
all the new GPCAs, we can see that although the generalized possibilistic clustering
algorithms can provide clustering results with higher overall accuracies than PCA06
(the highest one even reaches 85.00% for GPCA3 and GPCA4), the variances of all the
GPCAs, including GPCA1, GPCA2, GPCA3 and GPCA4, are much larger than that of
PCA06. This means that PCA06 is more robust to the initialization than the GPCAs in
the experiments on the beer tastes data. In addition, from the perspective of average
accuracy, it is obvious that the accuracy of the new GPCAs exceeds that of PCA06. When
comparing the new GPCAs with the K-means algorithm, it seems that all of them have
similar efficiency in terms of both the average accuracy and the variance, which are almost
at the same level.
5.4. Summary. Finally, we summarize the analysis of all the numerical experiments on the
possibilistic clustering algorithms and the generalized possibilistic clustering algorithms in
Tables 1-4 by emphasizing the following points.
a) PCA93 and PCA96 can provide good clustering results with an appropriate parameter
K. However, the typical selection K = 1 is not appropriate for all the data sets, and the
influence of K on PCA93 and on PCA96 is different. So in order to perform the clustering
efficiently, the selection of the parameter K must be discussed.
b) PCA06 is a good clustering algorithm among the existing PCAs because it is easy to
control for diverse real applications.
c) The four new algorithms GPCA1, GPCA2, GPCA3 and GPCA4 are also efficient for
clustering all the data sets in this paper when λ = 10 and α = 0.3.
d) Compared with the existing PCAs on the convergence speed (i.e., the iteration numbers
shown in the last columns of Tables 1-4), the four new GPCAs converge much faster with
better clustering results.
Remark 5.3. The conclusions drawn above are based on the clustering results of the
experiments on the three real data sets. In order to extend these conclusions, theoretical
analysis or experiments on more data sets are necessary.
6. Conclusion. Fuzzy clustering is an approach using the fuzzy set theory as a tool for
data grouping. In the literature, many fuzzy clustering algorithms based on objective
functions have been proposed, including PCA93, PCA96 and PCA06. However, it seems that
finding an appropriate objective function with good clustering performance is not easy, and
also not necessary from the practical application point of view. Hence, alternating cluster
estimation does not use an objective function and allows users to select membership
and prototype functions directly through an alternating iteration architecture. In [1],
an application of alternating cluster estimation to possibilistic clustering was initiated,
where the membership function of each cluster can be any monotone decreasing function
satisfying (13). The membership functions should be predetermined by the decision-maker
in practice, and different functions lead to clustering results with different accuracies.
In [1], the problem of how to provide the membership functions was not discussed. It
is therefore necessary to examine the membership functions further.
As a continuation of [1], this paper presents a method of determining an appropriate
membership function for each cluster in the clustering process. By combining the clustering
process with the fuzzy set theory, a methodology of determining a proper membership
function for each cluster is demonstrated, and four specific functions are then recommended
for real applications. By introducing these new functions into the membership evaluation
equations of the generalized possibilistic clustering algorithms in [1], four new GPCAs are
obtained, which have been illustrated by three real data experiments. The comparisons with
the existing PCAs show that the newly proposed algorithms are effective for clustering
these real data sets and easy to apply.
Stability of algorithms is necessary for real applications. From the results of the numerical
experiments in Section 5, we see that neither the PCAs nor the GPCAs are very stable
when clustering all the data sets. As future work in this area, it is necessary to explore


why the GPCAs are unstable and how to improve their stability through
theoretical analysis and more real data experiments.
Acknowledgments. This work was supported in part by grants from the Innovation
Program of Shanghai Municipal Education Commission (No. 13ZS065), the National
Social Science Foundation of China (No. 13CGL057), and the Ministry of Education
Funded Project for Humanities and Social Sciences Research (No. 12JDXF005).
REFERENCES
[1] J. Zhou and C. C. Hung, A generalized approach to possibilistic clustering algorithms, International
Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol.15, supp.2, pp.110-132, 2007.
[2] A. Baraldi and P. Blonda, A survey of fuzzy clustering algorithms for pattern recognition - part I,
IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, vol.29, no.6, pp.778-785, 1999.
[3] A. Baraldi and P. Blonda, A survey of fuzzy clustering algorithms for pattern recognition - part II,
IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, vol.29, no.6, pp.786-801, 1999.
[4] D. Li and C. Zhong, Segmentation of images with damaged blocks based on fuzzy clustering, ICIC
Express Letters, vol.6, no.10, pp.2679-2684, 2012.
[5] P. Wen, J. Zhou and L. Zheng, Hybrid methods of spatial credibilistic clustering and particle swarm
optimization in high noise image segmentation, International Journal of Fuzzy Systems, vol.10, no.3,
pp.174-184, 2008.
[6] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New
York, 1981.
[7] Z. Chen, W. Hong and C. Wang, Fuzzy clustering algorithm of kernel for gene expression data
analysis, ICIC Express Letters, vol.3, no.4, pp.1435-1440, 2009.
[8] D. Li, H. Gu and L. Zhang, A hybrid genetic algorithm-fuzzy c-means approach for incomplete data
clustering based on nearest-neighbor intervals, Soft Computing, vol.17, no.10, pp.1787-1796, 2013.
[9] S. Miyamoto, H. Ichihashi and K. Honda, Algorithms for Fuzzy Clustering, Springer-Verlag, Berlin,
2008.
[10] R. Krishnapuram and J. M. Keller, A possibilistic approach to clustering, IEEE Transactions on
Fuzzy Systems, vol.1, no.2, pp.98-110, 1993.
[11] R. Krishnapuram and J. M. Keller, The possibilistic c-means algorithm: Insights and recommendations, IEEE Transactions on Fuzzy Systems, vol.4, no.3, pp.385-393, 1996.
[12] F. Höppner and F. Klawonn, A contribution to convergence theory of fuzzy c-means and derivatives,
IEEE Transactions on Fuzzy Systems, vol.11, no.5, pp.682-694, 2003.
[13] J. Zhou, L. Cao and N. Yang, On the convergence of some possibilistic clustering algorithms, Fuzzy
Optimization and Decision Making, vol.12, no.4, pp.415-432, 2013.
[14] R. Krishnapuram, H. Frigui and O. Nasraoui, Fuzzy and possibilistic shell clustering algorithms and
their application to boundary detection and surface approximation - Part I, IEEE Transactions on
Fuzzy Systems, vol.3, no.1, pp.29-43, 1995.
[15] R. Krishnapuram, H. Frigui and O. Nasraoui, Fuzzy and possibilistic shell clustering algorithms and
their application to boundary detection and surface approximation - Part II, IEEE Transactions on
Fuzzy Systems, vol.3, no.1, pp.44-60, 1995.
[16] M. S. Yang and K. L. Wu, Unsupervised possibilistic clustering, Pattern Recognition, vol.39, no.1,
pp.5-21, 2006.
[17] F. C. H. Rhee, K. S. Choi and B. I. Choi, Kernel approach to possibilistic C-means clustering,
International Journal of Intelligent Systems, vol.24, no.3, pp.272-292, 2009.
[18] D. T. Anderson, J. C. Bezdek, M. Popescu and J. M. Keller, Comparing fuzzy, probabilistic, and
possibilistic partitions, IEEE Transactions on Fuzzy Systems, vol.18, no.5, pp.906-918, 2010.
[19] J. Z. C. Lai, E. Y. T. Juan and F. J. C. Lai, Rough clustering using generalized fuzzy clustering
algorithm, Pattern Recognition, vol.46, no.9, pp.2538-2547, 2013.
[20] K. Treerattanapitak and C. Jaruskulchai, Possibilistic exponential fuzzy clustering, Journal of Computer Science and Technology, vol.28, no.2, pp.311-321, 2013.
[21] X. Li, H. S. Wong and S. Wu, A fuzzy minimax clustering model and its applications, Information
Sciences, vol.186, no.1, pp.114-125, 2012.
[22] P. Wen, J. Zhou and L. Zheng, A modied hybrid method of spatial credibilistic clustering and
particle swarm optimization, Soft Computing, vol.15, no.5, pp.855-865, 2011.


[23] Z. Xie, S. Wang and F. L. Chung, An enhanced possibilistic C-Means clustering algorithm EPCM,
Soft Computing, vol.12, no.6, pp.593-611, 2008.
[24] M. S. Chen and S. W. Wang, Fuzzy clustering analysis for optimizing fuzzy membership functions,
Fuzzy Sets and Systems, vol.103, no.2, pp.239-254, 1999.
[25] S. Miyamoto, Different objective functions in fuzzy c-means algorithms and kernel-based clustering,
International Journal of Fuzzy Systems, vol.13, no.2, pp.89-97, 2011.
[26] A. Rodriguez, M. S. Tomas and J. Rubio-Martinez, A benchmark calculation for the fuzzy c-means clustering algorithm: Initial memberships, Journal of Mathematical Chemistry, vol.50, no.10,
pp.2703-2715, 2012.
[27] F. Barcelo-Rico, J. L. Díez and J. Bondia, New possibilistic method for discovering linear local behavior using hyper-Gaussian distributed membership function, Knowledge and Information Systems,
vol.30, pp.377-403, 2012.
[28] T. A. Runkler and J. C. Bezdek, Alternating cluster estimation: A new tool for clustering and
function approximation, IEEE Transactions on Fuzzy Systems, vol.7, no.4, pp.377-393, 1999.
[29] T. A. Runkler and J. C. Bezdek, Function approximation with polynomial membership functions
and alternating cluster estimation, Fuzzy Sets and Systems, vol.101, no.2, pp.207-218, 1999.
[30] J. Zhou, C. C. Hung, J. He and Y. Luo, Normalized possibilistic clustering algorithms, Proceedings
of the Sixth International Conference on Information and Management Sciences, Lhasa, China, July
1-6, 2007, pp.397-403.
[31] J. Zhou, C. C. Hung, X. Wang and S. Chen, Fuzzy clustering based on credibility measure, Proceedings of the Sixth International Conference on Information and Management Sciences, Lhasa, China,
July 1-6, 2007, pp.404-411.
[32] L. A. Zadeh, Fuzzy sets, Information and Control, vol.8, no.3, pp.338-353, 1965.
[33] E. Anderson, The irises of the Gaspé Peninsula, Bulletin of the American Iris Society, vol.59, pp.2-5,
1935.
[34] J. C. Bezdek, J. M. Keller, R. Krishnapuram, L. I. Kuncheva and N. R. Pal, Will the real iris data
please stand up?, IEEE Transactions on Fuzzy Systems, vol.7, no.3, pp.368-369, 1999.
