
CS 660: Data Mining For Decision Making

Lecture 1 (Week 1)

Varun Dutt
School of Computing and Electrical Engineering
School of Humanities and Social Sciences
Indian Institute of Technology Mandi, India

Scaling the Heights

Course Instructor
Prof. Varun Dutt
School of Computing and Electrical Engineering
School of Humanities and Social Sciences
Indian Institute of Technology, Mandi
PWD Rest House 2nd Floor, Mandi - 175 001, H.P., India
Phone: +91-1905-267041
Email: varun@iitmandi.ac.in
Office Hours: Only with a prior appointment

A Little About Me! In the office


Qualifications
M.S. degrees in Software Engineering, Engineering and Public Policy, and Rational
Simulation (cognitive modeling) from Carnegie Mellon University
Ph.D. in Engineering and Public Policy from Carnegie Mellon University
Post-doctoral fellowship from Carnegie Mellon University
Since 2012 at Indian Institute of Technology, Mandi, India
Research interests
Artificial intelligence and cognitive modeling, Human-Computer Interaction,
Environmental decision making, Judgment and Decision Making
Professional Experience
Served as a Software Engineer at Tata Consultancy Services (TCS) and at
MothersonSumi Infotech and Designs Ltd.
Serves as Knowledge Editor of a financial daily, Financial Chronicle
Serves as Lead Author on Chapter 2 of the UN IPCC's AR5 (WG III) report

A Little About Me! At home


ABBA Fan

Married to Dr. Rajeshwari Dutt with a cute little daughter


Get no sleep!
Do a lot of writing and have a back problem
I have a TA to help!

Teaching Assistants
- Sanjay Rathee, Ph.D. student, SCEE, IIT Mandi. Email:
sanjay_rathee@students.iitmandi.ac.in (Has recently been working on
parallelizing the Apriori algorithm.)
- Akash Porwal, Ph.D. student, SCEE, IIT Mandi. Email:
porwalakash.1989@gmail.com (Has recently joined and is working on
electrical problems concerning solar photovoltaics.)

What about you folks?


Please introduce yourselves

Announcements
Syllabus
Your Grade:
30% Final exam
20% Surprise Quizzes
10% Class Participation
20% Class Assignments
20% Class Project

Course Logistics
- Please don't copy or plagiarize!
- Being an AI researcher, I know how to catch it
- If found, consequences will be catastrophic!
- If you do use someone else's work, please cite the source as
(author, year), e.g., (Dutt, 2012)

An Example (Witten, Frank, & Hall, 2011)

Data Mining: What is it? (Witten, Frank, & Hall, 2011)
Data mining is defined as the process of
discovering structural patterns in data.
The process must be automatic or (more
usually) semiautomatic.
The patterns discovered must be
meaningful in that they lead to some
advantage, usually an economic one.
The data is invariably present in
substantial quantities.

Example


Structural Description (Pattern) in Data


If tear production rate = reduced then
recommendation = none
Otherwise, if age = young and astigmatic =
no then recommendation = soft
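The two rules above are meant to be tried in order, so they can be read as a tiny program. The sketch below encodes only these two rules in Python; the function name `recommend_lens` is hypothetical, and returning `None` for patients that neither rule covers is an assumption of this sketch, since the full rule set is not shown here.

```python
from typing import Optional


def recommend_lens(tear_production_rate: str, age: str, astigmatic: str) -> Optional[str]:
    """Apply the two contact-lens rules in order; the first rule that fires wins."""
    # Rule 1: reduced tear production means no lenses are recommended.
    if tear_production_rate == "reduced":
        return "none"
    # Rule 2 (reached only if rule 1 did not fire): young, non-astigmatic -> soft lenses.
    if age == "young" and astigmatic == "no":
        return "soft"
    # Cases not covered by these two rules are left undecided in this sketch.
    return None


print(recommend_lens("reduced", "young", "no"))       # none
print(recommend_lens("normal", "young", "no"))        # soft
print(recommend_lens("normal", "presbyopic", "yes"))  # None
```

The word "Otherwise" on the slide is what the early `return` captures: rule 2 is only consulted when rule 1 does not apply.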


Weather Dataset

In this case there are four attributes: outlook, temperature, humidity, and windy. The outcome is whether to play or not.


Structural Description (Pattern) in Data (also called a Decision List)
A set of rules learned from this information
might look like this:
If outlook = sunny and humidity = high
then play = no
If outlook = rainy and windy = true then
play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes


Decision List
These rules are meant to be interpreted in order:
the first one; then, if it doesn't apply, the second;
and so on. A set of rules that are intended to be
interpreted in sequence is called a decision list.
Interpreted as a decision list, the rules correctly
classify all of the examples in the table, whereas
taken individually, out of context, some of the rules
are incorrect. For example, the rule "if humidity =
normal then play = yes" gets one of the examples
wrong (check which one).
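The claim that order matters can be checked mechanically. The sketch below assumes the 14-row nominal weather table from Witten, Frank, and Hall (2011), transcribed by hand here; `WEATHER` and `decision_list` are hypothetical names. It confirms that the rules, read in order, reproduce every label, and then lists the rows on which the humidity rule alone goes wrong, which is the exercise the slide poses.

```python
from typing import List, Tuple

# Nominal weather data as transcribed from Witten, Frank, and Hall (2011).
# Each row: (outlook, temperature, humidity, windy, play)
WEATHER: List[Tuple[str, str, str, bool, str]] = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]


def decision_list(outlook: str, humidity: str, windy: bool) -> str:
    """Try the rules in order and return the conclusion of the first one that fires."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # "If none of the above then play = yes"


# Read in order, the list reproduces the label of every row.
list_errors = [row for row in WEATHER if decision_list(row[0], row[2], row[3]) != row[4]]
print(len(list_errors))  # 0

# Read out of context, "if humidity = normal then play = yes" is wrong on any
# normal-humidity row labeled "no". Run this to see which row that is.
rule4_errors = [row for row in WEATHER if row[2] == "normal" and row[4] != "yes"]
print(rule4_errors)
```

Temperature is never tested by these rules, which is why `decision_list` takes only three of the four attributes.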

Weather Dataset: Two of the attributes, temperature and humidity, have numeric values


Structural Description (Classification Rules)

For this example, the rules must involve inequalities on these attributes
rather than the simple equality tests used in the former case.
This is called a numeric-attribute problem; in this case, it is a mixed-attribute
problem because not all attributes are numeric.

Now the first rule given earlier might take the form
If outlook = sunny and humidity > 83 then play = no
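The only change from the nominal version is that the humidity test becomes a threshold comparison. A minimal sketch of just this first rule follows; the function name and the sample humidity readings are hypothetical, and returning `None` for uncovered cases is an assumption, since the rest of the numeric rule set is not given here.

```python
from typing import Optional


def first_rule_numeric(outlook: str, humidity: float) -> Optional[str]:
    """Numeric version of the first rule: humidity is compared with a threshold."""
    # An inequality test (humidity > 83) replaces the equality test humidity == "high".
    if outlook == "sunny" and humidity > 83:
        return "no"
    # The remaining rules are not shown, so this sketch leaves other cases undecided.
    return None


# Hypothetical readings, for illustration only.
print(first_rule_numeric("sunny", 90))     # no
print(first_rule_numeric("sunny", 70))     # None
print(first_rule_numeric("overcast", 90))  # None
```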


Association Rules




Preparing Input Data for Data Mining


Data Cleaning (also called data cleansing or scrubbing) is the
process of amending or removing data in a database that is
incorrect, incomplete, improperly formatted, or duplicated. It is a time-consuming
activity, often done in a semi-automated manner.
Missing Values: Missing values are frequently indicated by out-of-range entries. Example: a negative number (e.g., -1) in a numeric
field that is normally only positive, or a 0 in a numeric field that can
never normally be 0. For nominal attributes, missing values may be
indicated by blanks or dashes.
Inaccurate Values: The same item may appear as "Pepsi" in one place and "Pepsi-Cola" in
another, and entries may contain typographical errors. Example: a supermarket cashier uses her
own loyalty card to give discounts to customers who forgot their cards, so those purchases are recorded against the wrong customer.
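Two of the cleaning steps above, recoding out-of-range entries as missing and merging spelling variants, can be sketched in a few lines. Everything here is illustrative: the synonym table, the field names `product` and `quantity`, the function names, and the sample records are hypothetical. The loyalty-card problem is not something a rule like this can detect.

```python
from typing import Dict, List, Optional, Union

# Hypothetical table mapping spelling variants to one canonical product name.
CANONICAL_NAMES: Dict[str, str] = {"pepsi": "Pepsi", "pepsi-cola": "Pepsi"}


def clean_positive_count(value: int) -> Optional[int]:
    """In a field that must be positive, treat out-of-range codes such as -1 or 0 as missing."""
    return value if value > 0 else None


def clean_product(value: str) -> Optional[str]:
    """Treat blanks and dashes as missing and map known spelling variants to one name."""
    text = value.strip()
    if text in {"", "-"}:
        return None
    return CANONICAL_NAMES.get(text.lower(), text)


# Hypothetical raw records.
raw: List[Dict[str, Union[str, int]]] = [
    {"product": "Pepsi", "quantity": 2},
    {"product": "Pepsi-Cola", "quantity": -1},
    {"product": "-", "quantity": 3},
]
cleaned = [
    {"product": clean_product(str(r["product"])), "quantity": clean_positive_count(int(r["quantity"]))}
    for r in raw
]
print(cleaned)
```

Using `None` for missing values keeps them distinguishable from real quantities, so later steps can decide whether to drop or impute them.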


Applications of Data Mining in Real World


Web mining: the prestige of a web page based on how
many pages link to it (PageRank)
Decisions involving judgment (banks use data mining
to decide whether to accept or reject loan applications)
Screening images (detecting oil slicks at sea in satellite
images)
Load forecasting in the electricity industry
Diagnosing faults in industrial machines
Marketing and sales (pharmaceutical-industry patient
journeys; market-basket analysis, e.g., Pepsi and diapers on
Thursdays; discount or loyalty cards to collect data)
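The PageRank idea in the first item is a simplification: a page's prestige depends not just on how many pages link to it but also on how prestigious those linking pages are. A minimal power-iteration sketch follows, assuming the commonly used damping factor of 0.85, a fixed iteration count, and a hypothetical three-page link graph; it is an illustration, not a production implementation.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; every link target must also be a key of `links`."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its current rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page web: A and B link to C, and C links back to A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B']
```

Page A has only one incoming link, yet it outranks B (which has none) by a wide margin because that single link comes from C, the most-linked page.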

Activities
Read Witten, Frank, and Hall, 2011: Chapter 1 (up
to page 15, before "CPU performance"; pp. 21-29, 51-52,
58-60):
http://www.cse.hcmut.edu.vn/~chauvtn/data_mining/Texts/[7]%20Data%20Mining%20%20Practical%20Machine%20Learning%20Tools%20and%20Techniques%20(3rd%20Ed).pdf
Read Singhal, 2011


Thank you!

Comments and Questions most welcome!

