
FUZZY CLASSIFICATION

Classification by Equivalence Relations: Crisp Relations, Fuzzy
Relations, Cluster Analysis, Cluster Validity, c-Means Clustering: Hard
c-Means (HCM), Fuzzy c-Means (FCM), Classification Metric,
Hardening the Fuzzy c-Partition, Similarity Relations from Clustering
Hard partition
(i) The partition covers all data points
(ii) The partitions are mutually exclusive
Soft partition
Constrained soft partition
Clustering example
Compact, separated clusters:
any two points in a cluster are closer than the
distance between any two points in different clusters.
[Figure: three cases: compact and well separated; not compact but well separated; compact but not well separated]
Criterion for searching cluster centers
To find a minimal J, where
V is the vector of cluster
centers and
P is a partition of the data set:

$$ v_j = \frac{1}{|C_j|} \sum_{x_k \in C_j} x_k $$
Searching cluster center
Hard c-means algorithm (HCM):
(1) calculate the cost J of the current partition;
(2) modify the current cluster centers using the
gradient descent method to minimize J:

$$ v_j = \frac{1}{|C_j|} \sum_{x_k \in C_j} x_k $$
Criterion for fuzzy c-means (FCM)
A higher degree of membership has a higher influence:
the sum is weighted by the membership degrees.
Criterion for fuzzy c-means (FCM)
μ_{C_i}(x): membership of x in cluster c_i;
d_i = ||x − v_i||: distance from x to the center v_i.

[Figure: clusters C1, C2 with centers v1, v2; a point x at distances d1 and d2]

With the harmonic mean
$$ H(x, y) = \frac{2}{\frac{1}{x} + \frac{1}{y}} = \frac{2xy}{x + y}, $$
the membership is
$$ \mu_{C_1}(x) = \frac{H(d_1, d_2)}{2\,d_1}. $$
Since H(x, y) < 2x and H(x, y) < 2y,
$$ \frac{H(x, y)}{2x} < 1 \qquad\text{and}\qquad \frac{H(x, y)}{2y} < 1. $$

[Figure: clusters C1, C2 with centers v1, v2; a point x at distances d1 and d2]

The two memberships sum to one:
$$ \mu_{C_1}(x) + \mu_{C_2}(x)
   = \frac{H(d_1, d_2)}{2 d_1} + \frac{H(d_1, d_2)}{2 d_2}
   = \frac{H(d_1, d_2)}{2}\Big[\frac{1}{d_1} + \frac{1}{d_2}\Big]
   = \frac{H(d_1, d_2)}{H(d_1, d_2)} = 1 $$
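As a quick sanity check of this identity, here is a minimal Python sketch (the distance values are made up for illustration):

def H(x, y):
    return 2.0 / (1.0 / x + 1.0 / y)   # harmonic mean, 2xy/(x+y)

for d1, d2 in [(1.0, 4.0), (0.3, 0.3), (5.0, 0.1)]:
    mu1 = H(d1, d2) / (2 * d1)         # membership in C1
    mu2 = H(d1, d2) / (2 * d2)         # membership in C2
    print(d1, d2, mu1, mu2, mu1 + mu2) # last column is always 1.0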
Constrained soft partition
For a point x closer to v_1 (d_1 < d_2, as in the figure):
$$ 0 < \mu_{C_2}(x) = \frac{H(d_1, d_2)}{2 d_2} < \frac{d_1}{d_2}, $$
$$ \mu_{C_1}(x) = \frac{H(d_1, d_2)}{2 d_1} < 1. $$

[Figure: clusters C1, C2 with centers v1, v2; a point x near v1, at distances d1 and d2]
membership
fuzzy C-means algorithm
Example 1: initial v1 = (5,5), v2 = (10,10)
Example 2: initial v1 = (5,5), v2 = (10,10)
[Figures: FCM iterations for both examples]
Fuzzy Clustering - Theory
CLUSTER ANALYSIS
way to search for structure in a dataset X
a component of pattern recognition
clusters form a partition
Examples:
partition all credit card users into two groups: those
who are legally using their credit cards and those who are
illegally using stolen credit cards;
partition UCD students into two classes:
those who will go skiing over winter vacation and those
who will go to the beach.
Clustering is a mathematical tool that
attempts to discover structures or
certain patterns in a data set, where
the objects inside each cluster show
a certain degree of similarity.
Clustering
Hard clustering assigns each feature
vector to one and only one of the
clusters, with a degree of membership
equal to one and well-defined
boundaries between clusters.
Fuzzy clustering allows each feature
vector to belong to more than one
cluster with different membership
degrees (between 0 and 1) and
vague or fuzzy boundaries between
clusters.
Difficulties with Fuzzy Clustering
The optimal number of clusters K to be
created has to be determined (the
number of clusters cannot always be
defined a priori and a good cluster
validity criterion has to be found).
The character and location of cluster
prototypes (centers) are not necessarily
known a priori, and initial guesses
have to be made.
Difficulties with Fuzzy Clustering
Data characterized by large
variability in cluster shape, cluster
density, and the number of points
(feature vectors) in different clusters
have to be handled.
Objectives and Challenges
Create an algorithm for fuzzy clustering that
partitions the data set into an optimal number
of clusters.
This algorithm should account for variability
in cluster shapes, cluster densities, and the
number of data points in each of the subsets.
Cluster prototypes would be generated
through a process of unsupervised learning.
The Fuzzy k-Means Algorithm
N: the number of feature vectors
K: the number of clusters (partitions)
q: weighting exponent (fuzzifier; q > 1)
u_ik: the ith membership function on the kth vector ( u_ik : X → [0,1] ), with
$$ \sum_{i} u_{ik} = 1; \qquad 0 < \sum_{k} u_{ik} < N $$
V_i: the cluster prototype (the mean of all
feature vectors in cluster i, or the
center of cluster i)
J_q(U,V): the objective function

Partition a set of feature vectors X
into K clusters (subgroups) represented as
fuzzy sets F_1, F_2, …, F_K
by minimizing the objective function
$$ J_q(U, V) = \sum_{i=1}^{K} \sum_{k=1}^{N} (u_{ik})^q \, d^2(X_k, V_i); \qquad K \le N $$
Larger membership values indicate higher
confidence in the assignment of the pattern to
the cluster.
The Fuzzy k-Means Algorithm
Description of Fuzzy Partitioning
1) Choose primary cluster prototypes V_i
for the values of the memberships.
2) Compute the degree of membership of
all feature vectors in all clusters:
$$ u_{ij} = \frac{\big[\,1/d^2(X_j, V_i)\,\big]^{1/(q-1)}}{\sum_{k=1}^{K} \big[\,1/d^2(X_j, V_k)\,\big]^{1/(q-1)}} \qquad (1) $$
under the constraint: $\sum_i u_{ij} = 1$
Description of Fuzzy Partitioning
3) Compute new cluster prototypes V_i:
$$ V_i = \frac{\sum_j (u_{ij})^q X_j}{\sum_j (u_{ij})^q} \qquad (2) $$
4) Iterate back and forth between (1) and (2)
until the memberships or cluster centers
for successive iterations differ by no more than
some prescribed value ε (a termination
criterion).
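The loop in steps 1) to 4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the original authors' code; the function name, the random initialization, and the tolerance default are assumptions:

import numpy as np

def fuzzy_k_means(X, K, q=2.0, eps=1e-5, max_iter=100, seed=0):
    """Minimal fuzzy k-means sketch: X is (N, p), K clusters, fuzzifier q > 1."""
    rng = np.random.default_rng(seed)
    # Random constrained initialization: columns of U sum to 1 over clusters.
    U = rng.random((K, len(X)))
    U /= U.sum(axis=0)
    for _ in range(max_iter):
        # (2) prototypes: weighted means with weights u_ik^q
        W = U ** q
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # (1) memberships from inverse squared distances
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)             # avoid division by zero
        inv = (1.0 / d2) ** (1.0 / (q - 1.0))
        U_new = inv / inv.sum(axis=0)
        if np.abs(U_new - U).max() < eps:   # termination criterion
            return U_new, V
        U = U_new
    return U, V

Called as U, V = fuzzy_k_means(X, K=2) on an (N, p) feature matrix, it returns the (K, N) membership matrix and the cluster prototypes.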
The Fuzzy k-Means Algorithm
Computation of the degree of membership u_ij depends
on the definition of the distance measure d²(X_j, V_i):
$$ d^2(X_j, V_i) = (X_j - V_i)^T L^{-1} (X_j - V_i) $$
L = I: the distance is Euclidean; the shape of the
clusters is assumed to be hyperspherical.
L arbitrary: the shape of the clusters is assumed
to be arbitrary.
The Fuzzy k-Means Algorithm
For hyperellipsoidal clusters, an exponential
distance measure, d²_e(X_j, V_i), based on ML
estimation was defined:
$$ d_e^2(X_j, V_i) = \frac{[\det(F_i)]^{1/2}}{P_i}\,\exp\!\Big[\tfrac{1}{2}(X_j - V_i)^T F_i^{-1} (X_j - V_i)\Big] $$
F_i: the fuzzy covariance matrix of the ith cluster
P_i: the a priori probability of selecting the ith cluster
$$ h(i \mid X_j) = \frac{1/d_e^2(X_j, V_i)}{\sum_{k} 1/d_e^2(X_j, V_k)} $$
h(i|X_j): the posterior probability (the probability of
selecting the ith cluster given the jth vector)
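A sketch of how the exponential distance and the posterior above can be computed once the F_i, P_i, and V_i are available (all names here are illustrative, and the inputs are assumed to be already estimated):

import numpy as np

def exp_distance2(x, V, F, P):
    """d_e^2(x, V_i) for each cluster i, per the ML-based exponential measure."""
    d2 = np.empty(len(V))
    for i, (v, Fi, Pi) in enumerate(zip(V, F, P)):
        diff = x - v
        maha = diff @ np.linalg.inv(Fi) @ diff      # (x-v)^T F_i^{-1} (x-v)
        d2[i] = np.sqrt(np.linalg.det(Fi)) / Pi * np.exp(maha / 2.0)
    return d2

def posterior(x, V, F, P):
    """h(i|x): normalized inverse exponential distances."""
    inv = 1.0 / exp_distance2(x, V, F, P)
    return inv / inv.sum()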
The Fuzzy k-Means Algorithm
It is easy to see that for q = 2, h(i|X_j) = u_ij.
Thus, substituting u_ij with h(i|X_j) results in the fuzzy
modification of the ML estimation (FMLE).
Additional calculations for the FMLE:
The Major Advantage of FMLE
Obtaining good partition results starting from
good classification prototypes.
The first layer of the algorithm, unsupervised
tracking of initial centroids, is based on the fuzzy
K-means algorithm.
The next phase, the optimal fuzzy partition, is
being carried out with the FMLE algorithm.
Unsupervised Tracking of Cluster
Prototypes
Different choices of classification prototypes
may lead to different partitions.
Given a partition into k cluster prototypes, place
the next, (k+1)th, cluster center in a region where
data points have a low degree of membership in the
existing k clusters.
Unsupervised Tracking of Cluster
Prototypes
1) Compute average and standard deviation of the
whole data set.
2) Choose the first initial cluster prototype at the
average location of all feature vectors.
3) Choose an additional classification prototype
equally distant from all data points.
4) Calculate a new partition of the data set
according to steps 1) and 2) of the fuzzy
k-means algorithm.
5) If k, the number of clusters, is less than a given
maximum, go to step 3; otherwise stop.
Common Fuzzy Cluster Validity
Each data point has K memberships, so it is
desirable to summarize the information by a
single number, which indicates how well the
data point X_k is classified by the clustering.
$\sum_i (u_{ik})^2$: partition coefficient
$-\sum_i (u_{ik}) \log u_{ik}$: classification entropy
$\max_i u_{ik}$: proportional coefficient
The cluster validity is just the average of any
of those functions over the entire data set.
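The three measures average over the data set as stated; a minimal sketch for a (K, N) membership matrix U (the epsilon guard against log 0 is an implementation assumption):

import numpy as np

def validity(U):
    """Average validity measures for a membership matrix U of shape (K, N)."""
    pc = (U ** 2).sum(axis=0).mean()                 # partition coefficient
    eps = 1e-12                                      # guard against log(0)
    ce = -(U * np.log(U + eps)).sum(axis=0).mean()   # classification entropy
    prop = U.max(axis=0).mean()                      # proportional coefficient
    return pc, ce, prop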
Proposed Performance Measures
Good clusters are actually not very fuzzy.
The criteria for the definition of optimal
partition of the data into subgroups were
based on the following requirements:
1. Clear separation between the resulting
clusters
2. Minimal volume of the clusters
3. Maximal number of data points concentrated
in the vicinity of the cluster centroid
Proposed Performance Measures
Fuzzy hypervolume, F_HV, is defined by:
$$ F_{HV} = \sum_{i=1}^{K} [\det(F_i)]^{1/2} $$
where F_i, the fuzzy covariance matrix of the ith cluster, is given by:
$$ F_i = \frac{\sum_{j=1}^{N} u_{ij}\,(X_j - V_i)(X_j - V_i)^T}{\sum_{j=1}^{N} u_{ij}} $$
Proposed Performance Measures
Average partition density, D_PA, is calculated from:
$$ D_{PA} = \frac{1}{K} \sum_{i=1}^{K} \frac{S_i}{[\det(F_i)]^{1/2}} $$
where S_i, the sum of the central members, is the sum of the memberships
of the data points concentrated near the cluster prototype.
Proposed Performance Measures
The partition density, P_D, is calculated from:
$$ P_D = \frac{\sum_{i=1}^{K} S_i}{F_{HV}} $$
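Assuming the definitions above, these measures reduce to a few lines (S is taken here as a precomputed list of the S_i values):

import numpy as np

def fuzzy_hypervolume(F):
    # F_HV = sum_i sqrt(det F_i) over the fuzzy covariance matrices
    return sum(np.sqrt(np.linalg.det(Fi)) for Fi in F)

def partition_density(S, F):
    # P_D = (sum of central memberships S_i) / F_HV
    return sum(S) / fuzzy_hypervolume(F)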
Sample Runs
In order to test the performance of the
algorithm, N artificial m-dimensional
feature vectors from a multivariate normal
distribution having different parameters and
densities were generated.
Situations of large variability of cluster
shapes, densities, and number of data points
in each cluster were simulated.
FCM Clustering with Varying
Density
The higher density cluster attracts all other cluster prototypes
so that the prototype of the right cluster is slightly drawn away
from the original cluster center and the prototype of the left
cluster migrates completely into the dense cluster.
Fig. 3. Partition of 12 clusters generated from a five-
dimensional multivariate Gaussian distribution with
unequally variable features, variable densities, and
a variable number of data points in each cluster (only three
of the features are displayed).
(a) Data points before partitioning.
(b) Partition into 12 subgroups using the UFP-ONC algorithm.
All data points have been classified correctly.
Conclusions
The new algorithm, UFP-ONC
(unsupervised fuzzy partition-optimal number
of classes), which combines the most favorable
features of both the fuzzy K-means algorithm
and the FMLE, together with unsupervised
tracking of classification prototypes, was
created.
The algorithm performs extremely well in
situations of large variability of cluster shapes,
densities, and number of data points in each
cluster.
Fuzzy Clustering - Theory
REMARKS: (1) The dataset, in the case of students,
would include such things as age, school, income
of parents, number of years as a student, and marital
status.
(2) Classical cluster analysis would partition
the set of students (with respect to their
characteristics, that is, the items in the dataset) into
disjoint sets P_i so that we would have:
$$ \bigcup_{i=1}^{c} P_i = X \quad\text{and}\quad P_i \cap P_j = \emptyset \ \text{ for } i \ne j. $$
Fuzzy Clustering - Theory
Let's suppose that our dataset has:
Age = {17, 18, …, 35}
School = {Arts, Drama, …, Civil Engineering, Natural
Sciences, Mathematics, Computer Science}
Income = {$0 to $500,000}
Note: It is (or should be) intuitively clear that for this
problem the partitions intersect, since for many
students there is an equal preference between going
to the beach and going to ski for vacation, and the
preferences are not zero/one for most students.
Fuzzy Clustering - Theory
The idea of cluster analysis is to obtain centers (i = 1,…,c,
where c = 2 for the example of skiing versus going to the
beach) v_1,…,v_c that are exemplars, and radii that
define the partition. Now, the centers serve as
exemplars, and an advertising company could send
skiing brochures to the group defined by the
first center and beach-trip brochures to the other group of
students. The idea of fuzzy clustering (fuzzy c-
means clustering, where c is an a-priori chosen
number of clusters) is to allow overlapping clusters
with partial membership of individuals in clusters.
Fuzzy Clustering - Theory
Given: a dataset X = {x_1, …, x_N} and a fuzzy partition P = {A_1, …, A_c}
such that
$$ \sum_{i=1}^{c} \mu_{A_i}(x_j) = 1 \quad\text{for all } j $$
(each data point must be completely distributed), and
$$ 0 < \sum_{j=1}^{N} \mu_{A_i}(x_j) < N \quad\text{for all } i $$
(there are no empty partitions: a partition must contain some element(s) to some degree).

A_1 = {0.6/x_1, 1/x_2, 0.1/x_3}
A_2 = {0.4/x_1, 0/x_2, 0.9/x_3}
Fuzzy Clustering Example (from Klir & Yuan)
Property 1:
μ_A1(x_1) + μ_A2(x_1) = 1.0
μ_A1(x_2) + μ_A2(x_2) = 1.0
μ_A1(x_3) + μ_A2(x_3) = 1.0
Property 2:
0 < μ_A1(x_1) + μ_A1(x_2) + μ_A1(x_3) = 1.7 < 3
0 < μ_A2(x_1) + μ_A2(x_2) + μ_A2(x_3) = 1.3 < 3
Fuzzy Clustering
In general:
$$ x_j = (x_{j1}, \ldots, x_{jp})^T, $$
a p-dimensional vector. For our example, a particular
student, #17: x_17 = (20, mathematics, $60,000, …)^T
Fuzzy Clustering
Suppose all components of the vectors in the dataset
are numeric; then:
$$ v_i = \frac{\sum_{j=1}^{N} [\mu_{A_i}(x_j)]^m \, x_j}{\sum_{j=1}^{N} [\mu_{A_i}(x_j)]^m} \qquad (4.1) $$
m > 1 governs the effect of the membership grade.
Fuzzy Clustering
Given a way to compute the centers v_i, we need a way to
measure how good these centers are (one by one).
This is done by a performance measure or
objective function as follows:
$$ J_m(P) = \sum_{i=1}^{c} \sum_{j=1}^{N} [\mu_{A_i}(x_j)]^m \, \|x_j - v_i\|^2 \qquad (4.2) $$
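Equations (4.1) and (4.2) translate directly into array operations; a minimal sketch with an (N, p) data matrix X and a (c, N) membership matrix M (the names are assumptions):

import numpy as np

def centers_and_cost(X, M, m=2.0):
    """Centers v_i via (4.1) and the objective J_m(P) via (4.2)."""
    W = M ** m
    V = (W @ X) / W.sum(axis=1, keepdims=True)            # equation (4.1)
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    J = (W * d2).sum()                                    # equation (4.2)
    return V, J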
Fuzzy Clustering: Fuzzy c-means algorithm

Step 1: Set k = 0; select an initial partition P^(0).
Step 2: Calculate the centers v_i^(k) according to equation (4.1).
Step 3: Update the partition to P^(k+1) according to:
Fuzzy Clustering: Fuzzy c-means algorithm (step 3 continued)
For each x_j in the dataset X: if ||x_j − v_i^(k)||² > 0, then
$$ \mu_{A_i}^{(k+1)}(x_j) = \Bigg[\, \sum_{l=1}^{c} \Bigg( \frac{\|x_j - v_i^{(k)}\|^2}{\|x_j - v_l^{(k)}\|^2} \Bigg)^{\!\frac{1}{m-1}} \Bigg]^{-1}. $$
If ||x_j − v_i^(k)||² = 0 for some i ∈ I ⊆ {1, …, c}, then assign values
(arbitrarily) so that
$$ \sum_{i \in I} \mu_{A_i}^{(k+1)}(x_j) = 1 \quad\text{and}\quad \mu_{A_i}^{(k+1)}(x_j) = 0 \ \text{ for } i \notin I. $$
Step 4: Compare P^(k) to P^(k+1). If ||P^(k) − P^(k+1)|| < ε,
then stop. Otherwise set k := k+1 and go to step 2.
Remark: the computation of the updated membership
function is the condition for the minimization of the
objective function given by equation (4.2).
The example that follows uses c = 2, ε = 0.01, the
Euclidean norm, and initially
A_1 = {0.854/x_1, …, 0.854/x_15}
and
A_2 = {0.146/x_1, …, 0.146/x_15}.
For k = 6, A_1 and A_2 are given in the following slide,
where v_1^(6) = (0.88, 2)^T and v_2^(6) = (5.14, 2)^T.
Fuzzy Clustering: Fuzzy c-means algorithm
Fuzzy Clustering - Example
Cluster (www.m-w.com)
A number of similar individuals that
occur together as a: two or more
consecutive consonants or vowels in
a segment of speech b: a group of
houses (...) c: an aggregation of
stars or galaxies that appear close
together in the sky and are
gravitationally associated.
Cluster analysis (www.m-
w.com)
A statistical classification technique
for discovering whether the
individuals of a population fall into
different groups by making
quantitative comparisons of multiple
characteristics.
Vehicle Example
Vehicle | Top speed [km/h] | Colour | Air resistance | Weight [kg]
--------+------------------+--------+----------------+------------
V1      | 220              | red    | 0.30           | 1300
V2      | 230              | black  | 0.32           | 1400
V3      | 260              | red    | 0.29           | 1500
V4      | 140              | gray   | 0.35           | 800
V5      | 155              | blue   | 0.33           | 950
V6      | 130              | white  | 0.40           | 600
V7      | 100              | black  | 0.50           | 3000
V8      | 105              | red    | 0.60           | 2500
V9      | 110              | gray   | 0.55           | 3500
Vehicle Clusters
[Plot: Weight [kg] vs Top speed [km/h], showing three groups: sports cars, medium market cars, lorries]
Terminology
[Same plot annotated with terminology: each mark is an object (data point); top speed and weight are features; the plane is the feature space; each group (sports cars, medium market cars, lorries) is a cluster with a label]
Example: Classify cracked tiles
475 Hz | 557 Hz | Ok?
-------+--------+----
0.958  | 0.003  | Yes
1.043  | 0.001  | Yes
1.907  | 0.003  | Yes
0.780  | 0.002  | Yes
0.579  | 0.001  | Yes
0.003  | 0.105  | No
0.001  | 1.748  | No
0.014  | 1.839  | No
0.007  | 1.021  | No
0.004  | 0.214  | No

Table 1: frequency intensities for ten tiles.

Tiles are made from clay moulded into the right shape, brushed, glazed,
and baked. Unfortunately, the baking may produce invisible cracks.
Operators can detect the cracks by hitting the tiles with a hammer, and in
an automated system the response is recorded with a microphone, filtered,
Fourier transformed, and normalised. A small set of data is given in Table
1 (adapted from MIT, 1997).
Algorithm: hard c-means (HCM)
(also known as k-means)
Plot of tiles by frequencies (logarithms). The whole tiles (o) seem
well separated from the cracked tiles (*). The objective is to find
the two clusters.
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres; log(intensity) at 475 Hz vs log(intensity) at 557 Hz]
1. Place two cluster centres (x) at random.
2. Assign each data point (* and o) to the nearest cluster centre (x)
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
1. Compute the new centre of each class
2. Move the crosses (x)
Iteration 2
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
Iteration 3
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
Iteration 4 (then stop, because no visible change)
Each data point belongs to the cluster defined by the nearest centre
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
The membership matrix M:
1. The last five data points (rows) belong to the first cluster (column)
2. The first five data points (rows) belong to the second cluster (column)
M =
0.0000 1.0000
0.0000 1.0000
0.0000 1.0000
0.0000 1.0000
0.0000 1.0000
1.0000 0.0000
1.0000 0.0000
1.0000 0.0000
1.0000 0.0000
1.0000 0.0000
Membership matrix M
$$ m_{ik} = \begin{cases} 1 & \text{if } \|u_k - c_i\|^2 \le \|u_k - c_j\|^2 \ \text{for each cluster centre } c_j,\ j \ne i \\ 0 & \text{otherwise} \end{cases} $$
(u_k: data point k; c_i: cluster centre i; the comparison is of the
distance to centre i against the distances to all other centres j.)
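A compact NumPy sketch of the HCM iteration that produces such a membership matrix (function and variable names are assumptions; it also assumes no cluster becomes empty during the iterations):

import numpy as np

def hard_c_means(X, c, max_iter=100, seed=0):
    """Hard c-means (k-means): X is (K, p) data, c cluster centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=c, replace=False)]  # random initial centres
    for _ in range(max_iter):
        # assign each data point to the nearest centre
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # recompute each centre as the mean of its cluster
        new = np.array([X[labels == i].mean(axis=0) for i in range(c)])
        if np.allclose(new, centres):   # stop when the centres no longer move
            break
        centres = new
    M = np.eye(c)[labels]               # binary membership matrix, (K, c)
    return M, centres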
c-partition
$$ \bigcup_{i=1}^{c} C_i = U, \qquad C_i \cap C_j = \emptyset \ (i \ne j), \qquad \emptyset \subset C_i \subset U \ \text{for all } i, \qquad 2 \le c < K $$
All clusters C together fill the whole universe U.
Clusters do not overlap.
A cluster C is never empty and it is smaller than the whole universe U.
There must be at least 2 clusters in a c-partition
and at most as many as the number of data points K.
Objective function
$$ J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \Bigg( \sum_{k,\, u_k \in C_i} \|u_k - c_i\|^2 \Bigg) $$
Minimise the total sum of all distances.
Algorithm: fuzzy c-means (FCM)
Each data point belongs to two clusters to different degrees
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
1. Place two cluster centres
2. Assign a fuzzy membership to each data point depending on
distance
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
1. Compute the new centre of each class
2. Move the crosses (x)
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
Iteration 2
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
Iteration 5
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
Iteration 10
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
Iteration 13 (then stop, because no visible change)
Each data point belongs to the two clusters to a degree
[Plot of the tiles data: o = whole tiles, * = cracked tiles, x = centres]
The membership matrix M:
1. The last five data points (rows) belong mostly to the first cluster (column)
2. The first five data points (rows) belong mostly to the second cluster (column)
M =
0.0025 0.9975
0.0091 0.9909
0.0129 0.9871
0.0001 0.9999
0.0107 0.9893
0.9393 0.0607
0.9638 0.0362
0.9574 0.0426
0.9906 0.0094
0.9807 0.0193
Fuzzy membership matrix M
$$ m_{ik} = \frac{1}{\sum_{j=1}^{c} \big( d_{ik} / d_{jk} \big)^{2/(q-1)}}, \qquad d_{ik} = \|u_k - c_i\| $$
d_ik: distance from point k to the current cluster centre i
d_jk: distance from point k to the other cluster centres j
m_ik: point k's membership of cluster i
q: fuzziness exponent
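This membership formula vectorizes neatly; a minimal sketch given a (c, K) distance matrix D with entries d_ik:

import numpy as np

def fuzzy_memberships(D, q=2.0):
    """Membership matrix m_ik from a (c, K) matrix of distances d_ik."""
    ratio = (D[:, None, :] / D[None, :, :]) ** (2.0 / (q - 1.0))  # (d_ik/d_jk)^(2/(q-1))
    return 1.0 / ratio.sum(axis=1)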
Fuzzy membership matrix M
Expanding the sum makes the interpretation clear:
$$ m_{ik} = \frac{1}{\sum_{j=1}^{c} \big( d_{ik}/d_{jk} \big)^{2/(q-1)}}
          = \frac{1/d_{ik}^{\,2/(q-1)}}{1/d_{1k}^{\,2/(q-1)} + 1/d_{2k}^{\,2/(q-1)} + \cdots + 1/d_{ck}^{\,2/(q-1)}} $$
Gravitation to cluster i relative to total gravitation.
Electrical Analogy
Resistors R_1, R_2, …, R_c in parallel across a voltage U draw currents
i_1, i_2, …, i_c from the total current I. With
$$ U = RI, \qquad \frac{1}{R} = \frac{1}{R_1} + \frac{1}{R_2} + \cdots + \frac{1}{R_c}, $$
the current through resistor i is
$$ i_i = \frac{U}{R_i} = \frac{1/R_i}{\dfrac{1}{R_1} + \dfrac{1}{R_2} + \cdots + \dfrac{1}{R_c}}\, I. $$
Same form as m_ik.
Fuzzy Membership
[Plot: membership of a test point across data point positions, with cluster centres at 1 to 5; o is with q = 1.1, * is with q = 2]
Fuzzy c-partition
$$ \bigcup_{i=1}^{c} C_i = U, \qquad \emptyset \subset C_i \subset U \ \text{for all } i, \qquad 2 \le c < K $$
All clusters C together fill the whole universe U.
Remark: The sum of memberships for a data point is 1, and the total for all points is K.
Not valid here: C_i ∩ C_j = ∅; the clusters do overlap.
A cluster C is never empty and it is smaller than the whole universe U.
There must be at least 2 clusters in a c-partition
and at most as many as the number of data points K.
Example: Classify cancer
cells
Normal smear Severely dysplastic smear
Using a small brush, cotton stick, or wooden
stick, a specimen is taken from the uterine
cervix and smeared onto a thin, rectangular
glass plate, a slide. The purpose of the smear
screening is to diagnose pre-malignant cell
changes before they progress to cancer. The
smear is stained using the Papanicolaou
method, hence the name Pap smear.
Different characteristics have different
colours, easy to distinguish in a microscope.
A cyto-technician performs the screening in a
microscope. It is time consuming and prone
to error, as each slide may contain up to
300,000 cells.
Dysplastic cells have undergone precancerous
changes. They generally have longer and darker
nuclei, and they have a tendency to cling together in
large clusters. Mildly dysplastic cells have enlarged
and bright nuclei. Moderately dysplastic cells have
larger and darker nuclei. Severely dysplastic cells
have large, dark, and often oddly shaped nuclei. The
cytoplasm is dark, and it is relatively small.
Possible Features
Nucleus and cytoplasm area
Nucleus and cyto brightness
Nucleus shortest and longest
diameter
Cyto shortest and longest diameter
Nucleus and cyto perimeter
Nucleus and cyto number of maxima
(...)
Classes are nonseparable
Hard Classifier (HCM)
[Figure: cells classified as Ok / light / moderate / severe, each cell in exactly one colour]
A cell is either one or the other class,
defined by a colour.
Fuzzy Classifier (FCM)
[Figure: the same cells; one column may mix colours]
A cell can belong to several classes to a
degree, i.e., one column may have several colours.
Function approximation
[Plot: output vs input; a curve fitted through sample points]
Curve fitting in a multi-dimensional space is also called
function approximation. Learning is equivalent to finding a
function that best fits the training data.
Approximation by fuzzy
sets
[Plots: the target curve (top) and the fuzzy membership functions used to approximate it (bottom)]
Procedure to find a model
1. Acquire data
2. Select structure
3. Find clusters, generate model
4. Validate model
Conclusions
Compared to neural networks, fuzzy
models can be interpreted by human
beings
Applications: system identification,
adaptive systems
Links
J. Jantzen: Neurofuzzy Modelling. Technical University of
Denmark: Oersted-DTU, Tech report no 98-H-874 (nfmod), 1998.
URL http://fuzzy.iau.dtu.dk/download/nfmod.pdf
PapSmear tutorial. URL http://fuzzy.iau.dtu.dk/smear/
U. Kaymak: Data Driven Fuzzy Modelling. PowerPoint, URL
http://fuzzy.iau.dtu.dk/tutor/ddfm.htm
Exercise: fuzzy clustering
(Matlab)
Download and follow the instructions in this text
file: http://fuzzy.iau.dtu.dk/tutor/fcm/exerF5.txt
The exercise requires Matlab (no special
toolboxes are required)
Fuzzy Classification
Classification by Equivalence Relations
Define the set
$$ [x_i] = \{ x_j \mid (x_i, x_j) \in R \} $$
as the equivalence class of x_i on a universe of patterns.
Properties [Bezdek, 1974]:
1. x_i ∈ [x_i], since (x_i, x_i) ∈ R
2. [x_i] = [x_j] or [x_i] ∩ [x_j] = ∅
3. ∪_{x ∈ X} [x] = X
Hence, the equivalence relation R can divide the universe
X into mutually exclusive equivalence classes, i.e.,
X | R = {[x] | x ∈ X}
where X|R is the quotient set of X with respect to the relation R.
Fuzzy Classification
R =
     1  2  3  4  5  6  7  8  9  10
1    1  0  0  1  0  0  1  0  0  1
2    0  1  0  0  1  0  0  1  0  0
3    0  0  1  0  0  1  0  0  1  0
4    1  0  0  1  0  0  1  0  0  1
5    0  1  0  0  1  0  0  1  0  0
6    0  0  1  0  0  1  0  0  1  0
7    1  0  0  1  0  0  1  0  0  1
8    0  1  0  0  1  0  0  1  0  0
9    0  0  1  0  0  1  0  0  1  0
10   1  0  0  1  0  0  1  0  0  1

The relation is reflexive, symmetric, and transitive. Hence, the
matrix is an equivalence relation.
Fuzzy Classification
We can group the elements of the universe into classes as:
[1] = [4] = [7] = [10] = {1,4,7,10} with remainder = 1
[2] = [5] = [8] = {2,5,8} with remainder = 2
[3] = [6] = [9] = {3,6,9} with remainder = 0
With these classes, we can prove the three properties
discussed earlier. Hence, the quotient set is:
X | R ={(1,4,7,10),(2,5,8),(3,6,9)}
Not all relations are equivalence relations, but a tolerance relation can
be converted into an equivalence relation by max-min composition.
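A sketch of max-min composition and of the repeated composition that turns a tolerance relation into an equivalence relation (illustrative names):

import numpy as np

def max_min(R, S):
    """Max-min composition: (R o S)[i, j] = max_k min(R[i, k], S[k, j])."""
    return np.minimum(R[:, :, None], S[None, :, :]).max(axis=1)

def transitive_closure(R):
    """Compose until the relation stops changing."""
    while True:
        R2 = np.maximum(R, max_min(R, R))
        if np.array_equal(R2, R):
            return R
        R = R2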
Fuzzy Relations
A tolerance relation:
R =
1    0.8  0    0.1  0.2
0.8  1    0.4  0    0.9
0    0.4  1    0    0
0.1  0    0    1    0.5
0.2  0.9  0    0.5  1

Its transitive closure, an equivalence relation:
R_t =
1    0.8  0.4  0.5  0.8
0.8  1    0.4  0.5  0.9
0.4  0.4  1    0.4  0.4
0.5  0.5  0.4  1    0.5
0.8  0.9  0.4  0.5  1

By taking λ-cuts of the fuzzy equivalence relation R_t at values of
λ = 1, 0.9, 0.8, 0.5, 0.4, we get the following:

R_1 = I (only the diagonal)

R_0.9 =
1 0 0 0 0
0 1 0 0 1
0 0 1 0 0
0 0 0 1 0
0 1 0 0 1

R_0.8 =
1 1 0 0 1
1 1 0 0 1
0 0 1 0 0
0 0 0 1 0
1 1 0 0 1

R_0.5 =
1 1 0 1 1
1 1 0 1 1
0 0 1 0 0
1 1 0 1 1
1 1 0 1 1

R_0.4 = all ones (a single class containing every element)
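Each λ-cut is a simple thresholding of the closure; a one-line sketch:

import numpy as np

def lam_cut(R, lam):
    """Crisp relation from a fuzzy relation: 1 where R >= lambda."""
    return (R >= lam).astype(int)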
Fuzzy Relations
The classification can be described as follows:
Fuzzy Relations
Example:
3 families, 16 people related by blood. Each has a photo,
and we find the relation by viewing the photos. The similarity
matrix is as follows:
Fuzzy Relations
Convert it to an equivalence relation by composition.
Fuzzy Relations
Taking a λ-cut at λ = 0.6, we have:
Fuzzy Relations
Four distinct classes are identified:
{1,6,8,13,16}, {2,5,7,11,14}, {3}, {4,9,10,12,15}
From this clustering it seems that only photograph number
3 cannot be identified with any of the families. Perhaps a
lower value of λ might assign photograph 3 to one of the
other three classes.
The other three clusters are all correct.
Cluster Analysis
How many clusters?
C-means clustering
Sample set: X = {x_1, x_2, …, x_n};
n points, where each x_i = (x_i1, x_i2, …, x_im) is an m-dimensional vector.

[Figure: two clusters with centers V1 and V2]

Minimize the distance within each cluster.
Maximize the distance between clusters.
Cluster Analysis
Hard C-means (HCM)
Classify data in the crisp sense:
each data point belongs to one and only one cluster.
$$ \bigcup_{i=1}^{C} A_i = X, \qquad A_i \cap A_j = \emptyset \ (i \ne j), \qquad \emptyset \subset A_i \subset X, \qquad 2 \le C < n $$
Cluster Analysis
The objective function for the hard c-means algorithm is
known as a within-class sum of squared errors approach
using a Euclidian norm to characterize distance. It is given
by:
Where,
U: partition matrix V: vector of cluster centers
D
ik
: Euclidian distance in m-dimensional feature space
between the k
th
data sample and i
th
cluster center v
i
, given
by:

= =
=
n
k
C
i
ik ik
d v U J
1 1
2
, G

2 / 1
2

= = =
ij kj i k i k ik
v x v x v x d d
Fuzzy Pattern Recognition
Features
Feature Extraction
Partition of feature space
Fuzzy Pattern Recognition
Multi-feature pattern recognition: more features
Multi-dimensional pattern recognition
1. Nearest neighbor classifier.
2. Nearest center classifier.
3. Weighted approaching degree.
Fuzzy Pattern Recognition
Nearest neighbor approach:
Sample x_i has m features:
x_i = (x_i1, x_i2, …, x_im)
X = {x_1, x_2, …, x_n}
We can use c fuzzy partitions, then get c hard partitions:
$$ X = \bigcup_{i=1}^{c} A_i, \qquad A_i \cap A_j = \emptyset \ (i \ne j) $$
If we have a new singleton datum x, then x and x_i are in the same class if
$$ d(x, x_i) = \min_{1 \le k \le n} d(x, x_k). $$
Fuzzy Pattern Recognition
Nearest Center Classifier:
First get c clusters, with a center v_i for each cluster, and
V = {v_1, v_2, …, v_c}.
Then x is in cluster i if
$$ d(x, v_i) = \min_{1 \le k \le c} d(x, v_k). $$
Syntactic recognition
Examples include image recognition, fingerprint
recognition, chromosome analysis, character recognition,
scene analysis, etc.
Problem: how to deal with noise?
Solution: a few noteworthy approaches are [Fu, 1982]:
The use of approximation
The use of transformational grammars
The use of similarity and error-correcting parsing
The use of stochastic grammars
The use of fuzzy grammars
Syntactic recognition
Fuzzy Grammar and Application
Primitives are fuzzy, or productions are fuzzy, or both are
fuzzy.
A fuzzy grammar and its fuzzy language:
$$ F_G = (V_N, V_T, P, S, J, \mu), \qquad L = \{\, x/\mu_L(x) \mid x \in V_T^{*} \,\} $$
J: {r_i | i = 1, 2, …, n; n = cardinality of P}
μ(r_i): membership of the production rule r_i, with μ: J → [0,1]
Syntactic recognition
A string x L iff
M: # of derivations
l
k
: the length of the k
th
derivation chain
r: i
th
production used in the k
th
derivation chain

_ a
k
i
l k m k
L
L
r x
x
k


=
>
1 1
min max
0
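Given the production memberships along each derivation chain, the max-min formula above is a one-liner (the chains themselves are assumed to be supplied by a parser, which is outside this sketch):

def string_membership(chains):
    """chains: list of derivation chains, each a list of production memberships mu(r_i)."""
    return max(min(chain) for chain in chains) if chains else 0.0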

You might also like