
Partitioning Algorithms: Basic Concepts

- Partition n objects into k clusters
- Optimize the chosen partitioning criterion
  - Example: minimize the squared error
- Squared error of a cluster:

  $\mathrm{Error}(C_i) = \sum_{p \in C_i} d(p, m_i)^2$

  where $m_i$ is the mean (centroid) of $C_i$

- Squared error of a clustering:

  $\mathrm{Error} = \sum_{i=1}^{k} \mathrm{Error}(C_i) = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$

Example: Squared Error of a Cluster

- $C_i$ = {P1, P2, P3}, with P1 = (3, 7), P2 = (2, 3), P3 = (7, 5)
- Centroid: $m_i$ = (4, 5)
- $d(P1, m_i)^2 = (3-4)^2 + (7-5)^2 = 5$
- $d(P2, m_i)^2 = (2-4)^2 + (3-5)^2 = 8$
- $d(P3, m_i)^2 = (7-4)^2 + (5-5)^2 = 9$
- $\mathrm{Error}(C_i) = 5 + 8 + 9 = 22$

Example: Squared Error of a Cluster (continued)

- $C_j$ = {P4, P5, P6}, with P4 = (4, 6), P5 = (5, 5), P6 = (3, 4)
- Centroid: $m_j$ = (4, 5)
- $d(P4, m_j)^2 = (4-4)^2 + (6-5)^2 = 1$
- $d(P5, m_j)^2 = (5-4)^2 + (5-5)^2 = 1$
- $d(P6, m_j)^2 = (3-4)^2 + (4-5)^2 = 2$
- $\mathrm{Error}(C_j) = 1 + 1 + 2 = 4$, much smaller than $\mathrm{Error}(C_i) = 22$, i.e., $C_j$ is a much tighter cluster

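The two worked examples above can be reproduced with a few lines of Python. This is a minimal illustrative sketch, not code from the slides; the function names and the tuple encoding of points are my own choices.

```python
def squared_error(points, center):
    """Sum of squared Euclidean distances from each point to the given center."""
    return sum(sum((p - c) ** 2 for p, c in zip(pt, center)) for pt in points)

def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

ci = [(3, 7), (2, 3), (7, 5)]           # P1, P2, P3
cj = [(4, 6), (5, 5), (3, 4)]           # P4, P5, P6
print(squared_error(ci, centroid(ci)))  # 22.0
print(squared_error(cj, centroid(cj)))  # 4.0
```
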
Partitioning Algorithms: Basic Concepts

- Global optimum: examine all possible partitions
  - about k^n possible partitions; too expensive!
- Heuristic methods: k-means and k-medoids
  - k-means (MacQueen'67): each cluster is represented by the center (mean) of the cluster
  - k-medoids (Kaufman & Rousseeuw'87): each cluster is represented by one of the objects (the medoid) in the cluster

K-means

- Initialization
  - Arbitrarily choose k objects as the initial cluster centers (centroids)
- Iterate until no change:
  - For each object Oi
    - Calculate the distances between Oi and the k centroids
    - (Re)assign Oi to the cluster whose centroid is closest to Oi
  - Update the cluster centroids based on the current assignment

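The loop above can be sketched compactly in Python. This is an illustrative sketch, not code from the slides; the function name, the choice of squared Euclidean distance, and the max_iter safeguard are my own assumptions.

```python
import random

def kmeans(objects, k, max_iter=100):
    """Basic k-means on a list of numeric tuples. Returns (centroids, assignment)."""
    centroids = random.sample(objects, k)               # arbitrary initial centroids
    assignment = [None] * len(objects)
    for _ in range(max_iter):
        # (Re)assign each object to the cluster whose centroid is closest
        new_assignment = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(obj, centroids[c])))
            for obj in objects
        ]
        if new_assignment == assignment:                 # no change: stop
            break
        assignment = new_assignment
        # Update each centroid as the mean of the objects currently assigned to it
        for c in range(k):
            members = [obj for obj, a in zip(objects, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignment
```
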
The k-Means Clustering Method

[Figure: four 10x10 scatter plots showing one round of k-means: objects are assigned to the current cluster means, the means are then relocated, and the clusters are reassigned around the new means.]

Example

- For simplicity, one-dimensional objects and k = 2
- Objects: 1, 2, 5, 6, 7
- k-means run:
  - Randomly select 5 and 6 as the initial centroids
  - => two clusters {1, 2, 5} and {6, 7}; meanC1 = 8/3, meanC2 = 6.5
  - => {1, 2} and {5, 6, 7}; meanC1 = 1.5, meanC2 = 6
  - => no change
- Aggregate dissimilarity (squared error) = 0.5^2 + 0.5^2 + 1^2 + 0^2 + 1^2 = 2.5

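A quick numeric check of the final configuration above (illustrative only):

```python
# Final clusters from the trace above: {1, 2} and {5, 6, 7}
c1, c2 = [1, 2], [5, 6, 7]
m1, m2 = sum(c1) / len(c1), sum(c2) / len(c2)            # 1.5 and 6.0
error = sum((x - m1) ** 2 for x in c1) + sum((x - m2) ** 2 for x in c2)
print(m1, m2, error)                                      # 1.5 6.0 2.5
```
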
Variations of the k-Means Method

- Aspects in which k-means variants differ:
  - Selection of the initial k centroids
    - e.g., choose the k farthest points (see the sketch below)
  - Dissimilarity calculation
    - e.g., use Manhattan distance
  - Strategy for computing cluster means
    - e.g., update the means incrementally

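One way to read "choose k farthest points" is a farthest-first traversal; below is a sketch under that interpretation, together with a Manhattan-distance helper. The names are illustrative, not from the slides.

```python
def farthest_first_centroids(objects, k, dist):
    """Pick the first centroid arbitrarily, then repeatedly pick the object
    farthest (under dist) from the centroids chosen so far."""
    centroids = [objects[0]]
    while len(centroids) < k:
        centroids.append(max(objects, key=lambda o: min(dist(o, c) for c in centroids)))
    return centroids

def manhattan(a, b):
    """Manhattan (L1) distance, one of the dissimilarity variations mentioned above."""
    return sum(abs(x - y) for x, y in zip(a, b))
```
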
Strengths of the k-Means Method

- Relatively efficient for large data sets
  - O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t << n
- Often terminates at a local optimum
  - The global optimum may be found using techniques such as deterministic annealing and genetic algorithms

Weaknesses of the k-Means Method

- Applicable only when the mean is defined; what about categorical data?
  - the k-modes algorithm
- Unable to handle noisy data and outliers
  - the k-medoids algorithm
- Need to specify k, the number of clusters, in advance
  - hierarchical algorithms
  - density-based algorithms

The k-modes Algorithm

- Handling categorical data: k-modes (Huang'98)
  - Replaces the means of clusters with modes
  - Given n records in a cluster, the mode is the record made up of the most frequent attribute values
  - Uses new dissimilarity measures to deal with categorical objects

Example cluster:

| age    | income | student | credit_rating |
|--------|--------|---------|---------------|
| <=30   | high   | no      | fair          |
| <=30   | high   | no      | excellent     |
| 31…40  | high   | no      | fair          |
| >40    | medium | no      | fair          |
| >40    | low    | yes     | fair          |
| >40    | low    | yes     | excellent     |
| 31…40  | low    | yes     | excellent     |
| <=30   | medium | no      | fair          |
| <=30   | low    | yes     | fair          |
| >40    | medium | yes     | fair          |
| <=30   | medium | yes     | excellent     |
| 31…40  | medium | no      | excellent     |
| 31…40  | high   | yes     | fair          |

- In the example cluster, the mode = (<=30, medium, yes, fair)

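The mode of the example cluster can be computed mechanically; a small illustrative sketch follows (the tuple encoding of the records is my own, the values are taken from the table above):

```python
from collections import Counter

def cluster_mode(records):
    """Mode record: the most frequent value of each attribute, taken independently."""
    return tuple(Counter(values).most_common(1)[0][0] for values in zip(*records))

cluster = [
    ("<=30", "high", "no", "fair"),        ("<=30", "high", "no", "excellent"),
    ("31…40", "high", "no", "fair"),       (">40", "medium", "no", "fair"),
    (">40", "low", "yes", "fair"),         (">40", "low", "yes", "excellent"),
    ("31…40", "low", "yes", "excellent"),  ("<=30", "medium", "no", "fair"),
    ("<=30", "low", "yes", "fair"),        (">40", "medium", "yes", "fair"),
    ("<=30", "medium", "yes", "excellent"),("31…40", "medium", "no", "excellent"),
    ("31…40", "high", "yes", "fair"),
]
print(cluster_mode(cluster))   # ('<=30', 'medium', 'yes', 'fair')
```
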
A Problem of k-Means

- Sensitive to outliers
  - Outlier: an object with extremely large (or small) values
  - Outliers may substantially distort the distribution of the data

[Figure: two cluster means marked with '+'; a single distant outlier pulls one of the means away from the rest of its cluster.]

The k-Medoids Clustering Method

- k-medoids: find k representative objects, called medoids
  - PAM (Partitioning Around Medoids, 1987)
  - CLARA (Kaufmann & Rousseeuw, 1990)
  - CLARANS (Ng & Han, 1994): randomized sampling

[Figure: the same data set partitioned by k-means (clusters built around means) and by k-medoids (clusters built around actual objects).]

PAM (Partitioning Around Medoids) (1987)

- PAM (Kaufman and Rousseeuw, 1987)
  - Arbitrarily choose k objects as the initial medoids
  - Until no change, do
    - (Re)assign each object to the cluster with the nearest medoid
    - Improve the quality of the k medoids: randomly select a non-medoid object O_random and compute the total cost of swapping a medoid with O_random
- Works for small data sets (e.g., 100 objects in 5 clusters)
- Not efficient for medium and large data sets

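A minimal Python sketch of this loop (my own illustration; it uses the slides' squared-error cost and accepts any improving swap rather than searching for the single best one):

```python
import itertools

def total_cost(objects, medoids, dist):
    """Assign each object to its nearest medoid and sum the squared distances."""
    return sum(min(dist(o, m) ** 2 for m in medoids) for o in objects)

def pam(objects, k, dist):
    """PAM-style clustering: start from arbitrary medoids, then keep applying
    medoid/non-medoid swaps that lower the total cost, until no swap helps."""
    medoids = list(objects[:k])                        # arbitrary initial medoids
    best = total_cost(objects, medoids, dist)
    improved = True
    while improved:                                    # until no change
        improved = False
        for i, h in itertools.product(range(k), objects):
            if h in medoids:
                continue
            candidate = medoids[:i] + [h] + medoids[i + 1:]   # swap medoid i with h
            cost = total_cost(objects, candidate, dist)
            if cost < best:                            # negative swapping cost
                medoids, best, improved = candidate, cost, True
    return medoids, best
```
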
Swapping Cost

- For each pair of a medoid m and a non-medoid object h, measure whether h would be a better medoid than m
- Use the squared-error criterion

  $E = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$

- Compute $E_h - E_m$
  - Negative: the swap brings a benefit
- Choose the swap with the minimum swapping cost

Four Swapping Cases

- When a medoid m is to be swapped with a non-medoid object h, check every other non-medoid object j
- If j is in the cluster of m, reassign j:
  - Case 1: j is closer to some other medoid k than to h; after swapping m and h, j relocates to the cluster represented by k
  - Case 2: j is closer to h than to k; after swapping m and h, j is in the cluster represented by h
- If j is in the cluster of some other medoid k (not m), compare k with h:
  - Case 3: j is closer to k than to h; after swapping m and h, j remains in the cluster represented by k
  - Case 4: j is closer to h than to k; after swapping m and h, j is in the cluster represented by h

PAM Clustering: Total Swapping Cost

$TC_{mh} = \sum_j C_{jmh}$

[Figure: four 10x10 scatter plots, one per swapping case, showing object j, old medoid m, candidate medoid h, and another medoid k.]

- Case 1: $C_{jmh} = d(j, k) - d(j, m) \ge 0$
- Case 2: $C_{jmh} = d(j, h) - d(j, m)$, which may be positive or negative
- Case 3: $C_{jmh} = d(j, k) - d(j, k) = 0$
- Case 4: $C_{jmh} = d(j, h) - d(j, k) < 0$

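The four cases collapse into one rule: after the swap, j is served by whichever of h and its nearest remaining medoid is closer. A sketch under that reading (illustrative; distances are kept unsquared as on this slide, and at least two medoids are assumed):

```python
def c_jmh(j, m, h, medoids, dist):
    """Change in j's contribution when medoid m is swapped out for object h.
    Covers all four cases above: j either stays with its nearest remaining
    medoid k or moves to h, whichever is closer. Assumes len(medoids) >= 2."""
    d_before = min(dist(j, x) for x in medoids)                    # j's current nearest medoid (possibly m)
    remaining = [x for x in medoids if x != m]
    d_after = min(min(dist(j, x) for x in remaining), dist(j, h))  # nearest of the remaining medoids and h
    return d_after - d_before

def tc_mh(objects, m, h, medoids, dist):
    """Total swapping cost TC_mh = sum over non-medoid objects j of C_jmh."""
    return sum(c_jmh(j, m, h, medoids, dist)
               for j in objects if j not in medoids and j != h)
```
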
Complexity of PAM

- Arbitrarily choose k objects as the initial medoids: O(1)
- Until no change, do: O((n-k)^2 * k) per iteration
  - (Re)assign each object to the cluster with the nearest medoid: O((n-k) * k)
  - Improve the quality of the k medoids: O((n-k)^2 * k)
    - For each of the (n-k) * k pairs of a medoid m and a non-medoid object h, calculate the swapping cost TC_mh = sum_j C_jmh: O(n-k)

Strengths and Weaknesses of PAM

- PAM is more robust than k-means in the presence of outliers, because a medoid is less influenced by outliers or other extreme values than a mean
- PAM works efficiently for small data sets but does not scale to large data sets
  - O(k(n-k)^2) per iteration, where n is the number of data objects and k the number of clusters
- Can we find the medoids faster?

CLARA (Clustering Large Applications) (1990)

- CLARA (Kaufmann and Rousseeuw, 1990)
  - Built into statistical analysis packages, such as S+
  - Draws multiple samples of the data set, applies PAM on each sample, and returns the best clustering as the output
  - Handles larger data sets than PAM (e.g., 1,000 objects in 10 clusters)
  - Efficiency and effectiveness depend on the sampling

CLARA - Algorithm

- Set mincost to MAXIMUM;
- Repeat q times  // draw q samples
  - Create S by drawing s objects randomly from D;
  - Generate the set of medoids M from S by applying the PAM algorithm;
  - Compute cost(M, D);
  - If cost(M, D) < mincost
      mincost = cost(M, D);
      bestset = M;
    Endif;
- Endrepeat;
- Return bestset;

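The same pseudocode as a runnable Python sketch (illustrative; it reuses the pam() and total_cost() helpers sketched earlier, and the default sample size 40 + 2k follows Kaufman & Rousseeuw's suggestion):

```python
import random

def clara(objects, k, dist, q=5, s=None):
    """CLARA sketch: run PAM on q random samples, keep the medoid set that is
    cheapest on the whole data set D."""
    if s is None:
        s = 40 + 2 * k                                   # commonly suggested sample size
    mincost, bestset = float("inf"), None
    for _ in range(q):                                   # draw q samples
        sample = random.sample(objects, min(s, len(objects)))
        medoids, _ = pam(sample, k, dist)                # medoids M from sample S
        cost = total_cost(objects, medoids, dist)        # cost(M, D) on the full data set
        if cost < mincost:
            mincost, bestset = cost, medoids
    return bestset, mincost
```
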
Complexity of CLARA

- Set mincost to MAXIMUM: O(1)
- Repeat q times: O((s-k)^2 * k + (n-k) * k) per repetition
  - Create S by drawing s objects randomly from D: O(1)
  - Generate the set of medoids M from S by applying the PAM algorithm: O((s-k)^2 * k)
  - Compute cost(M, D): O((n-k) * k)
  - If cost(M, D) < mincost, update mincost and bestset: O(1)
- Return bestset

Strengths and Weaknesses of CLARA

- Strength:
  - Handles larger data sets than PAM (e.g., 1,000 objects in 10 clusters)
- Weaknesses:
  - Efficiency depends on the sample size
  - A good clustering based on samples will not necessarily represent a good clustering of the whole data set if the samples are biased

CLARANS (“Randomized” CLARA) (1994)

- CLARANS (A Clustering Algorithm based on Randomized Search) (Ng and Han'94)
- CLARANS draws samples in the solution space dynamically
  - A solution is a set of k medoids
  - The solution space contains $\binom{n}{k}$ solutions in total
  - The solution space can be represented by a graph where every node is a potential solution, i.e., a set of k medoids

Graph Abstraction

- Every node is a potential solution (a set of k medoids)
- Every node is associated with a squared error
- Two nodes are adjacent if they differ by exactly one medoid
- Every node has k(n-k) adjacent nodes

[Figure: the node {O1, O2, ..., Ok} and its k(n-k) neighbors, e.g. {Ok+1, O2, ..., Ok}, ..., {On, O2, ..., Ok}; each of the k medoids can be replaced by any of the n-k non-medoids.]

Graph Abstraction: CLARANS

- Start with a randomly selected node and randomly check at most m (maxneighbor) of its neighbors
- If a better adjacent node is found, move to it and continue; otherwise the current node is a local optimum, and the search restarts from another randomly selected node to find another local optimum
- When h (numlocal) local optima have been found, return the best result as the overall result

[Figure: CLARANS search. From the current node C, randomly chosen neighbors N are compared against C at most maxneighbor times; when a cheaper neighbor is found it becomes the new current node, otherwise C is declared a local minimum. This is repeated numlocal times, and the best local minimum found is returned as the best node.]

CLARANS - Algorithm

- Set mincost to MAXIMUM;
- For i = 1 to h do  // find h local optima
  - Randomly select a node as the current node C in the graph;
  - j = 1;  // counter of examined neighbors
  - Repeat
      Randomly select a neighbor N of C;
      If cost(N, D) < cost(C, D)
        Assign N as the current node C;
        j = 1;
      Else j++;
      Endif;
    Until j > m
  - Update mincost and bestnode with cost(C, D) if applicable;
- End For
- Return bestnode;

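A runnable Python sketch of this search (illustrative; a neighbor is generated by replacing one medoid with a random non-medoid, and total_cost() is the helper sketched earlier):

```python
import random

def clarans(objects, k, dist, numlocal=2, maxneighbor=100):
    """CLARANS sketch: numlocal random restarts; from each start, move to a
    randomly chosen better neighbor until maxneighbor consecutive neighbors
    fail to improve, then record the local optimum."""
    mincost, bestnode = float("inf"), None
    for _ in range(numlocal):                            # find numlocal local optima
        current = random.sample(objects, k)              # random node: a set of k medoids
        cur_cost = total_cost(objects, current, dist)
        j = 1
        while j <= maxneighbor:
            # random neighbor: replace one medoid with one non-medoid object
            i = random.randrange(k)
            h = random.choice([o for o in objects if o not in current])
            neighbor = current[:i] + [h] + current[i + 1:]
            n_cost = total_cost(objects, neighbor, dist)
            if n_cost < cur_cost:
                current, cur_cost, j = neighbor, n_cost, 1
            else:
                j += 1
        if cur_cost < mincost:                           # keep the best local optimum
            mincost, bestnode = cur_cost, current
    return bestnode, mincost
```
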
Graph Abstraction (k-means, k-modes, k-medoids)

- Each vertex is a set of k representative objects (means, modes, or medoids)
- Each iteration produces a new set of k representative objects with a lower overall dissimilarity
- Iterations correspond to a hill-descent process in a landscape (graph) of vertices

Comparison with PAM

- PAM searches for a minimum in the graph (landscape)
  - At each step, all adjacent vertices are examined; the one with the deepest descent is chosen as the next set of k medoids
  - The search continues until a minimum is reached
  - For large n and k (e.g., n = 1,000, k = 10), examining all k(n-k) adjacent vertices is time consuming; inefficient for large data sets
- CLARANS vs. PAM
  - For medium and large data sets, CLARANS is clearly much more efficient than PAM
  - Even for small data sets, CLARANS outperforms PAM significantly

When n = 80, CLARANS is 5 times faster than PAM, while the cluster quality is the same.

Comparison with CLARA

- CLARANS vs. CLARA
  - CLARANS is always able to find clusterings of better quality than those found by CLARA, although CLARANS may use much more time than CLARA
  - When the time used is the same, CLARANS is still better than CLARA

Hierarchies of Co-expressed Genes and Coherent Patterns

- The interpretation of co-expressed genes and coherent patterns depends mainly on the domain knowledge

A Subtle Situation

- To split or not to split? It's a question.

[Figure: a group A that could either be kept as a single cluster or split into two subgroups, A1 and A2.]
