
Partitioning Algorithms: Basic Concepts

- Partition n objects into k clusters
- Optimize the chosen partitioning criterion
  - Example: minimize the squared error
- Squared error of a cluster:

  $\mathrm{Error}(C_i) = \sum_{p \in C_i} d(p, m_i)^2$

  where $m_i$ is the mean (centroid) of $C_i$

- Squared error of a clustering:

  $\mathrm{Error} = \sum_{i=1}^{k} \mathrm{Error}(C_i) = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$

Example: Squared Error of a Cluster

- $C_i$ = {P1, P2, P3}, with P1 = (3, 7), P2 = (2, 3), P3 = (7, 5)
- Centroid: $m_i$ = (4, 5)
- $d(P1, m_i)^2 = (3-4)^2 + (7-5)^2 = 5$
- $d(P2, m_i)^2 = (2-4)^2 + (3-5)^2 = 8$
- $d(P3, m_i)^2 = (7-4)^2 + (5-5)^2 = 9$
- $\mathrm{Error}(C_i) = 5 + 8 + 9 = 22$

Example: Squared Error of a Cluster (continued)

- $C_j$ = {P4, P5, P6}, with P4 = (4, 6), P5 = (5, 5), P6 = (3, 4)
- Centroid: $m_j$ = (4, 5)
- $d(P4, m_j)^2 = (4-4)^2 + (6-5)^2 = 1$
- $d(P5, m_j)^2 = (5-4)^2 + (5-5)^2 = 1$
- $d(P6, m_j)^2 = (3-4)^2 + (4-5)^2 = 2$
- $\mathrm{Error}(C_j) = 1 + 1 + 2 = 4$, much smaller than $\mathrm{Error}(C_i) = 22$, i.e., $C_j$ is a much tighter cluster

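The two worked examples above can be reproduced with a few lines of Python. This is a minimal illustrative sketch, not code from the slides; the function names and the tuple encoding of points are my own choices.

```python
def squared_error(points, center):
    """Sum of squared Euclidean distances from each point to the given center."""
    return sum(sum((p - c) ** 2 for p, c in zip(pt, center)) for pt in points)

def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

ci = [(3, 7), (2, 3), (7, 5)]           # P1, P2, P3
cj = [(4, 6), (5, 5), (3, 4)]           # P4, P5, P6
print(squared_error(ci, centroid(ci)))  # 22.0
print(squared_error(cj, centroid(cj)))  # 4.0
```
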
Partitioning Algorithms: Basic Concepts

- Global optimum: examine all possible partitions
  - about k^n possible partitions; too expensive!
- Heuristic methods: k-means and k-medoids
  - k-means (MacQueen'67): each cluster is represented by the center (mean) of the cluster
  - k-medoids (Kaufman & Rousseeuw'87): each cluster is represented by one of the objects (the medoid) in the cluster

K-means

- Initialization
  - Arbitrarily choose k objects as the initial cluster centers (centroids)
- Iterate until no change:
  - For each object Oi
    - Calculate the distances between Oi and the k centroids
    - (Re)assign Oi to the cluster whose centroid is closest to Oi
  - Update the cluster centroids based on the current assignment

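The loop above can be sketched compactly in Python. This is an illustrative sketch, not code from the slides; the function name, the choice of squared Euclidean distance, and the max_iter safeguard are my own assumptions.

```python
import random

def kmeans(objects, k, max_iter=100):
    """Basic k-means on a list of numeric tuples. Returns (centroids, assignment)."""
    centroids = random.sample(objects, k)               # arbitrary initial centroids
    assignment = [None] * len(objects)
    for _ in range(max_iter):
        # (Re)assign each object to the cluster whose centroid is closest
        new_assignment = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(obj, centroids[c])))
            for obj in objects
        ]
        if new_assignment == assignment:                 # no change: stop
            break
        assignment = new_assignment
        # Update each centroid as the mean of the objects currently assigned to it
        for c in range(k):
            members = [obj for obj, a in zip(objects, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignment
```
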
The k-Means Clustering Method

[Figure: four 10x10 scatter plots showing one round of k-means: objects are assigned to the current cluster means, the means are then relocated, and the clusters are reassigned around the new means.]

Example

- For simplicity, one-dimensional objects and k = 2
- Objects: 1, 2, 5, 6, 7
- k-means run:
  - Randomly select 5 and 6 as the initial centroids
  - => two clusters {1, 2, 5} and {6, 7}; meanC1 = 8/3, meanC2 = 6.5
  - => {1, 2} and {5, 6, 7}; meanC1 = 1.5, meanC2 = 6
  - => no change
- Aggregate dissimilarity (squared error) = 0.5^2 + 0.5^2 + 1^2 + 0^2 + 1^2 = 2.5

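A quick numeric check of the final configuration above (illustrative only):

```python
# Final clusters from the trace above: {1, 2} and {5, 6, 7}
c1, c2 = [1, 2], [5, 6, 7]
m1, m2 = sum(c1) / len(c1), sum(c2) / len(c2)            # 1.5 and 6.0
error = sum((x - m1) ** 2 for x in c1) + sum((x - m2) ** 2 for x in c2)
print(m1, m2, error)                                      # 1.5 6.0 2.5
```
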
Variations of the k-Means Method

- Aspects in which k-means variants differ:
  - Selection of the initial k centroids
    - e.g., choose the k farthest points (see the sketch below)
  - Dissimilarity calculation
    - e.g., use Manhattan distance
  - Strategy for computing cluster means
    - e.g., update the means incrementally

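One way to read "choose k farthest points" is a farthest-first traversal; below is a sketch under that interpretation, together with a Manhattan-distance helper. The names are illustrative, not from the slides.

```python
def farthest_first_centroids(objects, k, dist):
    """Pick the first centroid arbitrarily, then repeatedly pick the object
    farthest (under dist) from the centroids chosen so far."""
    centroids = [objects[0]]
    while len(centroids) < k:
        centroids.append(max(objects, key=lambda o: min(dist(o, c) for c in centroids)))
    return centroids

def manhattan(a, b):
    """Manhattan (L1) distance, one of the dissimilarity variations mentioned above."""
    return sum(abs(x - y) for x, y in zip(a, b))
```
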
Strengths of the k-Means Method

- Relatively efficient for large data sets
  - O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t << n
- Often terminates at a local optimum
  - The global optimum may be found using techniques such as deterministic annealing and genetic algorithms

Weaknesses of the k-Means Method

- Applicable only when the mean is defined; what about categorical data?
  - the k-modes algorithm
- Unable to handle noisy data and outliers
  - the k-medoids algorithm
- Need to specify k, the number of clusters, in advance
  - hierarchical algorithms
  - density-based algorithms

The k-modes Algorithm

- Handling categorical data: k-modes (Huang'98)
  - Replaces the means of clusters with modes
  - Given n records in a cluster, the mode is the record made up of the most frequent attribute values
  - Uses new dissimilarity measures to deal with categorical objects

Example cluster:

| age    | income | student | credit_rating |
|--------|--------|---------|---------------|
| <=30   | high   | no      | fair          |
| <=30   | high   | no      | excellent     |
| 31…40  | high   | no      | fair          |
| >40    | medium | no      | fair          |
| >40    | low    | yes     | fair          |
| >40    | low    | yes     | excellent     |
| 31…40  | low    | yes     | excellent     |
| <=30   | medium | no      | fair          |
| <=30   | low    | yes     | fair          |
| >40    | medium | yes     | fair          |
| <=30   | medium | yes     | excellent     |
| 31…40  | medium | no      | excellent     |
| 31…40  | high   | yes     | fair          |

- In the example cluster, the mode = (<=30, medium, yes, fair)

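The mode of the example cluster can be computed mechanically; a small illustrative sketch follows (the tuple encoding of the records is my own, the values are taken from the table above):

```python
from collections import Counter

def cluster_mode(records):
    """Mode record: the most frequent value of each attribute, taken independently."""
    return tuple(Counter(values).most_common(1)[0][0] for values in zip(*records))

cluster = [
    ("<=30", "high", "no", "fair"),        ("<=30", "high", "no", "excellent"),
    ("31…40", "high", "no", "fair"),       (">40", "medium", "no", "fair"),
    (">40", "low", "yes", "fair"),         (">40", "low", "yes", "excellent"),
    ("31…40", "low", "yes", "excellent"),  ("<=30", "medium", "no", "fair"),
    ("<=30", "low", "yes", "fair"),        (">40", "medium", "yes", "fair"),
    ("<=30", "medium", "yes", "excellent"),("31…40", "medium", "no", "excellent"),
    ("31…40", "high", "yes", "fair"),
]
print(cluster_mode(cluster))   # ('<=30', 'medium', 'yes', 'fair')
```
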
A Problem of k-Means

- Sensitive to outliers
  - Outlier: an object with extremely large (or small) values
  - Outliers may substantially distort the distribution of the data

[Figure: two cluster means marked with '+'; a single distant outlier pulls one of the means away from the rest of its cluster.]

The k-Medoids Clustering Method

- k-medoids: find k representative objects, called medoids
  - PAM (Partitioning Around Medoids, 1987)
  - CLARA (Kaufmann & Rousseeuw, 1990)
  - CLARANS (Ng & Han, 1994): randomized sampling

[Figure: the same data set partitioned by k-means (clusters built around means) and by k-medoids (clusters built around actual objects).]

PAM (Partitioning Around Medoids) (1987)

- PAM (Kaufman and Rousseeuw, 1987)
  - Arbitrarily choose k objects as the initial medoids
  - Until no change, do
    - (Re)assign each object to the cluster with the nearest medoid
    - Improve the quality of the k medoids: randomly select a non-medoid object O_random and compute the total cost of swapping a medoid with O_random
- Works for small data sets (e.g., 100 objects in 5 clusters)
- Not efficient for medium and large data sets

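A minimal Python sketch of this loop (my own illustration; it uses the slides' squared-error cost and accepts any improving swap rather than searching for the single best one):

```python
import itertools

def total_cost(objects, medoids, dist):
    """Assign each object to its nearest medoid and sum the squared distances."""
    return sum(min(dist(o, m) ** 2 for m in medoids) for o in objects)

def pam(objects, k, dist):
    """PAM-style clustering: start from arbitrary medoids, then keep applying
    medoid/non-medoid swaps that lower the total cost, until no swap helps."""
    medoids = list(objects[:k])                        # arbitrary initial medoids
    best = total_cost(objects, medoids, dist)
    improved = True
    while improved:                                    # until no change
        improved = False
        for i, h in itertools.product(range(k), objects):
            if h in medoids:
                continue
            candidate = medoids[:i] + [h] + medoids[i + 1:]   # swap medoid i with h
            cost = total_cost(objects, candidate, dist)
            if cost < best:                            # negative swapping cost
                medoids, best, improved = candidate, cost, True
    return medoids, best
```
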
Swapping Cost

- For each pair of a medoid m and a non-medoid object h, measure whether h would be a better medoid than m
- Use the squared-error criterion

  $E = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$

- Compute $E_h - E_m$
  - Negative: the swap brings a benefit
- Choose the swap with the minimum swapping cost

Four Swapping Cases

- When a medoid m is to be swapped with a non-medoid object h, check every other non-medoid object j
- If j is in the cluster of m, reassign j:
  - Case 1: j is closer to some other medoid k than to h; after swapping m and h, j relocates to the cluster represented by k
  - Case 2: j is closer to h than to k; after swapping m and h, j is in the cluster represented by h
- If j is in the cluster of some other medoid k (not m), compare k with h:
  - Case 3: j is closer to k than to h; after swapping m and h, j remains in the cluster represented by k
  - Case 4: j is closer to h than to k; after swapping m and h, j is in the cluster represented by h

PAM Clustering: Total Swapping Cost

$TC_{mh} = \sum_j C_{jmh}$

[Figure: four 10x10 scatter plots, one per swapping case, showing object j, old medoid m, candidate medoid h, and another medoid k.]

- Case 1: $C_{jmh} = d(j, k) - d(j, m) \ge 0$
- Case 2: $C_{jmh} = d(j, h) - d(j, m)$, which may be positive or negative
- Case 3: $C_{jmh} = d(j, k) - d(j, k) = 0$
- Case 4: $C_{jmh} = d(j, h) - d(j, k) < 0$

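The four cases collapse into one rule: after the swap, j is served by whichever of h and its nearest remaining medoid is closer. A sketch under that reading (illustrative; distances are kept unsquared as on this slide, and at least two medoids are assumed):

```python
def c_jmh(j, m, h, medoids, dist):
    """Change in j's contribution when medoid m is swapped out for object h.
    Covers all four cases above: j either stays with its nearest remaining
    medoid k or moves to h, whichever is closer. Assumes len(medoids) >= 2."""
    d_before = min(dist(j, x) for x in medoids)                    # j's current nearest medoid (possibly m)
    remaining = [x for x in medoids if x != m]
    d_after = min(min(dist(j, x) for x in remaining), dist(j, h))  # nearest of the remaining medoids and h
    return d_after - d_before

def tc_mh(objects, m, h, medoids, dist):
    """Total swapping cost TC_mh = sum over non-medoid objects j of C_jmh."""
    return sum(c_jmh(j, m, h, medoids, dist)
               for j in objects if j not in medoids and j != h)
```
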
Complexity of PAM

- Arbitrarily choose k objects as the initial medoids: O(1)
- Until no change, do: O((n-k)^2 * k) per iteration
  - (Re)assign each object to the cluster with the nearest medoid: O((n-k) * k)
  - Improve the quality of the k medoids: O((n-k)^2 * k)
    - For each of the (n-k) * k pairs of a medoid m and a non-medoid object h, calculate the swapping cost TC_mh = sum_j C_jmh: O(n-k)

Strengths and Weaknesses of PAM

- PAM is more robust than k-means in the presence of outliers, because a medoid is less influenced by outliers or other extreme values than a mean
- PAM works efficiently for small data sets but does not scale to large data sets
  - O(k(n-k)^2) per iteration, where n is the number of data objects and k the number of clusters
- Can we find the medoids faster?

CLARA (Clustering Large Applications) (1990)

- CLARA (Kaufmann and Rousseeuw, 1990)
  - Built into statistical analysis packages, such as S+
  - Draws multiple samples of the data set, applies PAM on each sample, and returns the best clustering as the output
  - Handles larger data sets than PAM (e.g., 1,000 objects in 10 clusters)
  - Efficiency and effectiveness depend on the sampling

CLARA - Algorithm

- Set mincost to MAXIMUM;
- Repeat q times  // draw q samples
  - Create S by drawing s objects randomly from D;
  - Generate the set of medoids M from S by applying the PAM algorithm;
  - Compute cost(M, D);
  - If cost(M, D) < mincost
      mincost = cost(M, D);
      bestset = M;
    Endif;
- Endrepeat;
- Return bestset;

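The same pseudocode as a runnable Python sketch (illustrative; it reuses the pam() and total_cost() helpers sketched earlier, and the default sample size 40 + 2k follows Kaufman & Rousseeuw's suggestion):

```python
import random

def clara(objects, k, dist, q=5, s=None):
    """CLARA sketch: run PAM on q random samples, keep the medoid set that is
    cheapest on the whole data set D."""
    if s is None:
        s = 40 + 2 * k                                   # commonly suggested sample size
    mincost, bestset = float("inf"), None
    for _ in range(q):                                   # draw q samples
        sample = random.sample(objects, min(s, len(objects)))
        medoids, _ = pam(sample, k, dist)                # medoids M from sample S
        cost = total_cost(objects, medoids, dist)        # cost(M, D) on the full data set
        if cost < mincost:
            mincost, bestset = cost, medoids
    return bestset, mincost
```
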
Complexity of CLARA

- Set mincost to MAXIMUM: O(1)
- Repeat q times: O((s-k)^2 * k + (n-k) * k) per repetition
  - Create S by drawing s objects randomly from D: O(1)
  - Generate the set of medoids M from S by applying the PAM algorithm: O((s-k)^2 * k)
  - Compute cost(M, D): O((n-k) * k)
  - If cost(M, D) < mincost, update mincost and bestset: O(1)
- Return bestset

Strengths and Weaknesses of CLARA

- Strength:
  - Handles larger data sets than PAM (e.g., 1,000 objects in 10 clusters)
- Weaknesses:
  - Efficiency depends on the sample size
  - A good clustering based on samples will not necessarily represent a good clustering of the whole data set if the samples are biased

CLARANS (“Randomized” CLARA) (1994)

- CLARANS (A Clustering Algorithm based on Randomized Search) (Ng and Han'94)
- CLARANS draws samples in the solution space dynamically
  - A solution is a set of k medoids
  - The solution space contains $\binom{n}{k}$ solutions in total
  - The solution space can be represented by a graph where every node is a potential solution, i.e., a set of k medoids

Graph Abstraction

- Every node is a potential solution (a set of k medoids)
- Every node is associated with a squared error
- Two nodes are adjacent if they differ by exactly one medoid
- Every node has k(n-k) adjacent nodes

[Figure: the node {O1, O2, ..., Ok} and its k(n-k) neighbors, e.g. {Ok+1, O2, ..., Ok}, ..., {On, O2, ..., Ok}; each of the k medoids can be replaced by any of the n-k non-medoids.]

Graph Abstraction: CLARANS

- Start with a randomly selected node and randomly check at most m (maxneighbor) of its neighbors
- If a better adjacent node is found, move to it and continue; otherwise the current node is a local optimum, and the search restarts from another randomly selected node to find another local optimum
- When h (numlocal) local optima have been found, return the best result as the overall result

[Figure: CLARANS search. From the current node C, randomly chosen neighbors N are compared against C at most maxneighbor times; when a cheaper neighbor is found it becomes the new current node, otherwise C is declared a local minimum. This is repeated numlocal times, and the best local minimum found is returned as the best node.]

CLARANS - Algorithm

- Set mincost to MAXIMUM;
- For i = 1 to h do  // find h local optima
  - Randomly select a node as the current node C in the graph;
  - j = 1;  // counter of examined neighbors
  - Repeat
      Randomly select a neighbor N of C;
      If cost(N, D) < cost(C, D)
        Assign N as the current node C;
        j = 1;
      Else j++;
      Endif;
    Until j > m
  - Update mincost and bestnode with cost(C, D) if applicable;
- End For
- Return bestnode;

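A runnable Python sketch of this search (illustrative; a neighbor is generated by replacing one medoid with a random non-medoid, and total_cost() is the helper sketched earlier):

```python
import random

def clarans(objects, k, dist, numlocal=2, maxneighbor=100):
    """CLARANS sketch: numlocal random restarts; from each start, move to a
    randomly chosen better neighbor until maxneighbor consecutive neighbors
    fail to improve, then record the local optimum."""
    mincost, bestnode = float("inf"), None
    for _ in range(numlocal):                            # find numlocal local optima
        current = random.sample(objects, k)              # random node: a set of k medoids
        cur_cost = total_cost(objects, current, dist)
        j = 1
        while j <= maxneighbor:
            # random neighbor: replace one medoid with one non-medoid object
            i = random.randrange(k)
            h = random.choice([o for o in objects if o not in current])
            neighbor = current[:i] + [h] + current[i + 1:]
            n_cost = total_cost(objects, neighbor, dist)
            if n_cost < cur_cost:
                current, cur_cost, j = neighbor, n_cost, 1
            else:
                j += 1
        if cur_cost < mincost:                           # keep the best local optimum
            mincost, bestnode = cur_cost, current
    return bestnode, mincost
```
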
Graph Abstraction (k-means, k-modes, k-medoids)

- Each vertex is a set of k representative objects (means, modes, or medoids)
- Each iteration produces a new set of k representative objects with a lower overall dissimilarity
- Iterations correspond to a hill-descent process in a landscape (graph) of vertices

Comparison with PAM

- PAM searches for a minimum in the graph (landscape)
  - At each step, all adjacent vertices are examined; the one with the deepest descent is chosen as the next set of k medoids
  - The search continues until a minimum is reached
  - For large n and k (e.g., n = 1,000, k = 10), examining all k(n-k) adjacent vertices is time consuming; inefficient for large data sets
- CLARANS vs. PAM
  - For medium and large data sets, CLARANS is clearly much more efficient than PAM
  - Even for small data sets, CLARANS outperforms PAM significantly

When n = 80, CLARANS is 5 times faster than PAM, while the cluster quality is the same.

Comparison with CLARA

- CLARANS vs. CLARA
  - CLARANS is always able to find clusterings of better quality than those found by CLARA, although CLARANS may use much more time than CLARA
  - When the time used is the same, CLARANS is still better than CLARA

Hierarchies of Co-expressed Genes and Coherent Patterns

- The interpretation of co-expressed genes and coherent patterns depends mainly on the domain knowledge

A Subtle Situation

- To split or not to split? It's a question.

[Figure: a group A that could either be kept as a single cluster or split into two subgroups, A1 and A2.]
