This document provides information about the COL870: Clustering Algorithms course taught by Ragesh Jaiswal at IIT Delhi. The grading scheme and policies are outlined, including that cheating will result in an F grade. There are no required textbooks; reference materials will be provided online. The course website provides additional resources. Clustering involves grouping similar objects without predefined labels, unlike classification which uses supervised learning with labeled training data. The course will focus on understanding clustering techniques and algorithms, with optional programming assignments.

COL870: Clustering Algorithms

Ragesh Jaiswal, CSE, IIT Delhi

Administrative Information

Instructor
Ragesh Jaiswal
Office: 403, SIT Building
Email: rjaiswal@cse.iitd.ac.in

Please send email to set up a meeting with me.

Grading Scheme
1. Homework: 25 points (12.5 theory and 12.5 programming).
2. Minor 1 and 2: 15 points each.
3. Major: 25 points.
4. Project: 18 points.
5. Attendance: 2 points.
3

Policy on cheating:
Anyone found using unfair means in the course will receive an
F grade.

Textbook: There are no textbooks for this course. This is an
advanced level course. The reference material will include
book chapters, course notes, research papers, and other
material available online.

Course webpage:
http://www.cse.iitd.ac.in/~rjaiswal/2015/col870.
The site will contain course information, references, and
announcements. Please check this page regularly.

Introduction

What is data clustering?

The task of grouping a set of objects such that objects in the
same group (called a cluster) are more similar to each other than
to objects in different groups.
Given a representation of n objects, find k groups based on a
measure of similarity (dissimilarity) such that the similarities
between objects in the same group are high while similarities
between objects in different groups are low.
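
As a rough illustration of this definition, the sketch below scores a candidate grouping by comparing the average distance within groups against the average distance across groups. The function name `within_between`, the sample points, and the choice of Euclidean distance as the dissimilarity measure are assumptions made for this example; they are not taken from the course material.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def mean(xs):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(xs) / len(xs) if xs else float("nan")


def within_between(points, labels):
    """Return (mean within-group distance, mean between-group distance).

    Distance is used as the dissimilarity measure, so a good grouping
    has a small first value and a large second value.
    """
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        (within if labels[i] == labels[j] else between).append(d)
    return mean(within), mean(between)


# Hypothetical data: n = 6 points in the plane, k = 2 groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]  # nearby points share a group
bad = [0, 1, 0, 1, 0, 1]   # each group mixes the two regions

w_good, b_good = within_between(points, good)
w_bad, b_bad = within_between(points, bad)
```

For the `good` labelling the within-group average is much smaller than the between-group average, while the `bad` labelling has a far larger within-group average. This is only one of many possible ways to quantify "high similarity inside, low similarity across".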

Why study data clustering?
Clustering is usually the first step when trying to make sense of
Big Data.
It is a common tool across many different fields such as image
analysis, pattern recognition, bioinformatics, and information
retrieval.

Data clustering has been used for the following three main
purposes [Jain09]:
Underlying structure: to gain insight into data, generate
hypotheses, detect anomalies, and identify salient features.
Natural classification: to identify the degree of similarity
among forms or organisms.
Compression: as a method for organizing the data and
summarizing it through cluster prototypes.

How is clustering different from classification?

Classification
The task of assigning objects to predefined classes.

Clustering
The task of grouping objects without any knowledge of the classes.

The more general terms used are supervised learning versus
unsupervised learning.

Supervised Learning
- Classes/labels known
- Expert available
- Training data available

Unsupervised Learning
- Classes/labels unknown
- No experts
- No training data
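
To make the contrast concrete, here is a small sketch on assumed toy data. The supervised part uses labelled training values to assign a new value to a predefined class (a nearest-class-mean rule, chosen only for brevity). The unsupervised part receives no labels and simply splits one-dimensional values at their largest gap. Both helper functions and all data values are hypothetical.

```python
def classify(x, training):
    """Supervised: assign x to the predefined class with the nearest mean.

    `training` maps each known class label to its labelled examples.
    """
    means = {label: sum(xs) / len(xs) for label, xs in training.items()}
    return min(means, key=lambda label: abs(x - means[label]))


def split_at_largest_gap(values):
    """Unsupervised: form two groups by cutting sorted values at the widest gap.

    No labels are given; the groups are discovered from the data alone.
    """
    s = sorted(values)
    cut = max(range(1, len(s)), key=lambda i: s[i] - s[i - 1])
    return s[:cut], s[cut:]


# Classification: the classes "small" and "large" are known in advance.
training = {"small": [1.0, 2.0, 3.0], "large": [20.0, 21.0, 22.0]}
predicted = classify(4.5, training)

# Clustering: only unlabelled values are available.
groups = split_at_largest_gap([21.0, 2.0, 1.0, 22.0, 3.0, 20.0])
```

In the first case the output is one of the class names supplied by the training data. In the second case the output is just a pair of unnamed groups.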

Frequently Asked Questions

What are the pre-requisites for the course?

Answer: Data Structures and Algorithms. I will assume that you
are familiar with algorithm design and analysis techniques and
intractability concepts such as NP-completeness.

What flavour will this course have, algorithms or machine
learning?
Answer: The course will have the rigour of typical algorithms
courses. That is, we will spend time proving things. However,
techniques we learn in the course will have applications in many
different areas.

Does the course involve programming?

Answer: Depends on the interest of the class. Our focus will be to
understand clustering techniques. However, if the class is
interested, I can set up programming assignments.

Does the course involve a project?

Answer: Yes. You will be asked to read and present research
papers on clustering topics towards the later part of the course.

Do we become more marketable after doing all this?

Answer: Perhaps yes!

What is data clustering?

Suppose the given objects to be clustered can be represented
as points in two-dimensional space (i.e., R^2).
What is a reasonable notion of similarity between objects?

Distance between points.

The notion of similarity/dissimilarity has to be defined
carefully.
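
One way to see why the definition matters: two common distance functions on R^2 can disagree about which point is "most similar" to a query point. The sketch below compares Euclidean and Manhattan distance on hypothetical coordinates chosen for the illustration. The slides do not prescribe either measure at this point, so treat both as assumptions.

```python
from math import hypot


def euclidean(p, q):
    """Straight-line (L2) distance between two points in the plane."""
    return hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Sum of coordinate-wise absolute differences (L1 distance)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def nearest(query, candidates, distance):
    """Return the candidate with the smallest distance to `query`."""
    return min(candidates, key=lambda c: distance(query, c))


# Hypothetical points: a lies diagonally from the query, b along one axis.
query = (0.0, 0.0)
a = (3.0, 3.0)  # Euclidean about 4.24, Manhattan 6
b = (0.0, 5.0)  # Euclidean 5, Manhattan 5

nearest_l2 = nearest(query, [a, b], euclidean)  # a
nearest_l1 = nearest(query, [a, b], manhattan)  # b
```

Under Euclidean distance the query is closest to `a`, while under Manhattan distance it is closest to `b`, so a clustering built on one measure need not agree with a clustering built on the other.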

End
