Clustering
Anirban Mondal
anirban.mondal@snu.edu.in
Clustering
Clustering is about grouping similar objects/items together
Within a cluster, items are more similar to each other than to items outside the cluster
Clustering
Clustering is one of the most important
unsupervised learning processes
Clustering finds structures in a collection of
unlabeled data
A separate quality function measures how
good the clustering is
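The slides do not name a particular quality function. As an illustration only, the sketch below assumes one common choice, the within-cluster sum of squared distances (lower means tighter clusters). The function name `within_cluster_ss` and the sample points are hypothetical.

```python
def within_cluster_ss(clusters):
    """Sum of squared Euclidean distances from each point to its cluster mean.

    `clusters` is a list of clusters; each cluster is a non-empty list of
    equal-length numeric tuples. Lower values mean tighter clusters.
    """
    total = 0.0
    for points in clusters:
        dims = len(points[0])
        # Cluster mean, computed one coordinate at a time.
        mean = [sum(p[d] for p in points) / len(points) for d in range(dims)]
        for p in points:
            total += sum((p[d] - mean[d]) ** 2 for d in range(dims))
    return total


# Two ways of grouping the same four (made-up) points.
tight = [[(1.0, 1.0), (1.0, 2.0)], [(8.0, 8.0), (8.0, 9.0)]]
loose = [[(1.0, 1.0), (8.0, 8.0)], [(1.0, 2.0), (8.0, 9.0)]]
print(within_cluster_ss(tight))  # 1.0
print(within_cluster_ss(loose))  # 98.0
```

Under this measure the first grouping scores far better, which matches the intuition that nearby points belong together.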
Clustering
Input: a collection of n objects, each represented by a vector
Objective: to divide these n objects into k clusters so that similar objects are grouped together
In real-world settings, k is usually unknown
Example
Think scalability
In this example, you could do the clustering manually because the dataset was very small
What if you had to cluster 1 million people, or even 10,000 people, based on any one dimension such as age range, interests, etc.?
Clustering algorithms are needed to achieve this
Example with K = 2
K = 2: the points in red and blue are randomly selected as the two initial cluster centers
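The random selection of starting points can be sketched as follows. This is a minimal illustration, not the slides' own code: the helper name `pick_initial_centers`, the `seed` parameter, and the sample points are all assumptions made for the example.

```python
import random


def pick_initial_centers(points, k, seed=None):
    """Return k distinct points chosen uniformly at random as starting centers."""
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    return rng.sample(points, k)


# Hypothetical 2-D data; any list of distinct points would do.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0)]
centers = pick_initial_centers(points, k=2, seed=0)
print(centers)
```

Passing a seed is only a convenience for reproducing a run; without it, each run may start from a different pair of points.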
Example
Step 3: Recompute each cluster center as the average of the points in that cluster. In this diagram, each cluster center is indicated by an X.
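The recomputation in Step 3 can be sketched in a few lines. The function name `recompute_centers`, the `labels` representation (one cluster index per point), and the sample data are assumptions made for this illustration.

```python
def recompute_centers(points, labels, k):
    """Return the mean of the points assigned to each of the k clusters.

    `labels[i]` is the cluster index (0..k-1) of `points[i]`.
    A cluster with no points yields None; how to handle that case is left open.
    """
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for point, label in zip(points, labels):
        counts[label] += 1
        for d in range(dims):
            sums[label][d] += point[d]
    return [
        tuple(s / counts[c] for s in sums[c]) if counts[c] else None
        for c in range(k)
    ]


# Hypothetical data: the first two points are in cluster 0, the rest in cluster 1.
points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 6.0)]
labels = [0, 0, 1, 1]
print(recompute_centers(points, labels, k=2))  # [(1.5, 2.0), (9.0, 7.0)]
```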
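A minimal sketch of the whole loop, assuming the usual formulation: assign each point to its nearest center (squared Euclidean distance is an assumption here), recompute each center as the mean of its points, and repeat until no point changes cluster. The function name `k_means`, the `max_iter` cap, and the four sample points are hypothetical.

```python
def k_means(points, centers, max_iter=100):
    """Alternate assignment and center updates until labels stop changing.

    Assumes squared Euclidean distance and keeps a center unchanged if its
    cluster becomes empty (both are choices made for this sketch).
    """
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        # Assignment: index of the nearest center for every point.
        new_labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update: each center becomes the mean of its assigned points.
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centers


# Four corners of a wide rectangle, clustered from two different starts.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
labels_a, centers_a = k_means(points, [(0.0, 0.0), (4.0, 0.0)])
labels_b, centers_b = k_means(points, [(0.0, 0.0), (0.0, 1.0)])
print(labels_a, centers_a)  # [0, 0, 1, 1] [(0.0, 0.5), (4.0, 0.5)]
print(labels_b, centers_b)  # [0, 1, 0, 1] [(2.0, 0.0), (2.0, 1.0)]
```

Both runs converge, but to different groupings: the first splits the rectangle into left and right halves, the second into bottom and top. The only difference between the runs is the pair of starting centers.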
Cons
The number of clusters K must be provided as an input, so K needs to be decided in advance
When the dataset is relatively small, the initial clustering assignment has a significant influence on the final clustering results
The same dataset can produce different clusters, depending on the order of input
Each attribute is given the same weight, so we cannot tell how much each attribute contributes to the clustering process
Observe that the algorithm essentially uses the average (arithmetic mean)
The arithmetic mean does not work well with outliers. (The median can be used instead if outliers are a significant issue.)
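The effect of an outlier on the mean, compared with the median, is easy to check with Python's standard `statistics` module. The `ages` values below are made up for the illustration.

```python
from statistics import mean, median

# Four similar ages plus one deliberate outlier (hypothetical data).
ages = [21, 22, 23, 24, 95]

center_by_mean = mean(ages)      # 37: pulled far toward the outlier
center_by_median = median(ages)  # 23: stays with the bulk of the data

print(center_by_mean, center_by_median)
```

A center of 37 is not close to any actual member of the group, whereas 23 sits in the middle of the four typical values.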