
Machine Learning

Spring 2010

Rong Jin

CSE847 Machine Learning
• Instructor: Rong Jin
• Office hours:
  – Tuesday 4:00pm-5:00pm
  – Thursday 4:00pm-5:00pm
• Textbooks:
  – Machine Learning
  – The Elements of Statistical Learning
  – Pattern Recognition and Machine Learning
  – Many topics are drawn from papers
• Web site: http://www.cse.msu.edu/~cse847
Requirements
• ~10 homework assignments
• Course project
  – Topic: visual object recognition
  – Data: over one million images with extracted visual features
  – Objective: build a classifier that automatically identifies the class of objects in images
• Midterm exam & final exam
Goal
• Familiarize you with the state of the art in machine learning
  – Breadth: many different techniques
  – Depth: project
  – Hands-on experience
• Develop a machine-learning way of thinking
  – Learn how to model real-world problems with machine learning techniques
  – Learn how to deal with practical issues
Course Outline

Theoretical Aspects:
• Information Theory
• Optimization Theory
• Probability Theory
• Learning Theory

Practical Aspects:
• Supervised Learning Algorithms
• Unsupervised Learning Algorithms
• Important Practical Issues
• Applications
Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning
Why Machine Learning?
• Past: most computer programs were written entirely by hand
• Future: computers should be able to program themselves through interaction with their environment
Recent Trends
• Recent progress in algorithms and theory
• Growing flood of online data
• Availability of computational power
• Growing industry
Three Niches for Machine Learning
• Data mining: using historical data to improve decisions
  – Medical records → medical knowledge
• Software applications that are difficult to program by hand
  – Autonomous driving
  – Image classification
• User modeling
  – Automatic recommender systems
Typical Data Mining Task

Given:
• 9,147 patient records, each describing a pregnancy and birth
• Each record contains 215 features
Task:
• Classify future patients at high risk for Emergency Cesarean Section
Data Mining Results

One of 18 learned rules:

If   no previous vaginal delivery, and
     abnormal 2nd trimester ultrasound, and
     malpresentation at admission,
Then the probability of Emergency C-Section is 0.6
Credit Risk Analysis

Learned rules:

If   Other-Delinquent-Account > 2, and
     Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = no

If   Other-Delinquent-Account = 0, and
     (Income > $30K or Years-of-Credit > 3)
Then Profitable-Customer? = yes
Programs too Difficult to Program By Hand
• ALVINN drives 70 mph on highways
Programs too Difficult to Program By Hand
• Visual object recognition
[Figure: classifying bird images. A statistical model is trained on positive and negative example images, then tested on new images.]
Image Retrieval using Texts

Software that Models Users
History:
• "A homicide detective and a fire marshal must stop a pair of murderers who commit videotaped crimes to become media darlings." Rating: [stars shown in figure]
• "A biography of sports legend Muhammad Ali, from his early days to his days in the ring." Rating: [stars shown in figure]
• "Benjamin Martin is drawn into the American revolutionary war against his will when a brutal British commander kills his son." Rating: [stars shown in figure]

What to recommend?
• "A high-school boy is given the chance to write a story about an up-and-coming rock band as he accompanies it on their concert tour." Recommend? No
• "A young adventurer named Milo Thatch joins an intrepid group of explorers to find the mysterious lost continent of Atlantis." Recommend? Yes
Netflix Contest

Relevant Disciplines
• Artificial intelligence
• Statistics (particularly Bayesian statistics)
• Computational complexity theory
• Information theory
• Optimization theory
• Philosophy
• Psychology
• …
Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning
What is the Learning Problem?
• Learning = improving with experience at some task
  – Improve over task T
  – With respect to performance measure P
  – Based on experience E
• Example: learning to play backgammon
  – T: play backgammon
  – P: % of games won in the world tournament
  – E: opportunity to play against itself
Backgammon

• More than 10^20 states (boards)
• Even the best human players see only a small fraction of all boards during a lifetime
• Searching is hard because of the dice (branching factor > 100)
TD-Gammon by Tesauro (1995)

• Trained by playing against itself
• Now approximately equal to the best human player
Learn to Play Chess
• Task T: play chess
• Performance P: percent of games won in the world tournament
• Experience E:
  – What experience?
  – How shall it be represented?
  – What exactly should be learned?
  – What specific algorithm should learn it?
Choose a Target Function
• Goal:
  – Policy: π: b → m   (b = board, m = move)
• Choice of value function:
  – V: b, m → ℜ   (ℜ = real values)
  – V: b → ℜ
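
As a concrete illustration of how a policy can follow from a learned value function, here is a minimal Python sketch: the policy simply picks the move whose successor board the value function rates highest. The helpers legal_moves and apply_move are hypothetical stand-ins for a real backgammon move generator, not anything defined in these slides.

def choose_move(board, V, legal_moves, apply_move):
    # Policy pi(b): return the move m that maximizes V of the resulting board.
    # legal_moves(board) yields candidate moves; apply_move(board, m) returns
    # the successor board.  Both are hypothetical game-engine helpers.
    return max(legal_moves(board), key=lambda m: V(apply_move(board, m)))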
Value Function V(b): Example Definition

• If b is a final board that is won: V(b) = 1
• If b is a final board that is lost: V(b) = −1
• If b is not a final board: V(b) = E[V(b*)], where b* is the final board reached from b by playing optimally
Representation of Target Function V(b)

Options for representing V(b):
• The same value for each board → no learning
• A lookup table (one entry for each board) → no generalization
• Summarize experience into a compact model:
  – Polynomials
  – Neural networks
Example: Linear Feature Representation
• Features:
  – pb(b), pw(b) = number of black (white) pieces on board b
  – ub(b), uw(b) = number of unprotected black (white) pieces
  – tb(b), tw(b) = number of black (white) pieces threatened by the opponent
• Linear function:
  – V(b) = w0·pb(b) + w1·pw(b) + w2·ub(b) + w3·uw(b) + w4·tb(b) + w5·tw(b)
• Learning:
  – Estimation of the parameters w0, …, w5 (a sketch follows this list)
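
A minimal Python sketch of the linear value function above. The function features is an assumed extractor returning the six values [pb, pw, ub, uw, tb, tw] for a board; it is not defined in the slides.

def linear_value(board, weights, features):
    # V(b) = w0*pb(b) + w1*pw(b) + w2*ub(b) + w3*uw(b) + w4*tb(b) + w5*tw(b),
    # computed as a dot product of the weight vector and the feature vector.
    # features(board) is a hypothetical six-feature extractor.
    return sum(w * f for w, f in zip(weights, features(board)))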
Tuning Weights
• Given:
  – board b
  – predicted value V(b)
  – desired value V*(b)
• Calculate the squared error:
  – error(b) = (V*(b) − V(b))²
• For each board feature fi, update the weight:
  – wi ← wi + c × (V*(b) − V(b)) × fi
• This stochastically minimizes ∑b (V*(b) − V(b))² (gradient descent optimization; see the sketch below)
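
A minimal sketch of the update rule above, assuming the same hypothetical six-feature extractor. Note that the step uses the signed difference V*(b) − V(b); repeating this step over many boards stochastically minimizes the squared error.

def lms_update(weights, board, target, features, c=0.01):
    # One stochastic-gradient (LMS) step toward the desired value V*(b).
    # c is the learning rate; features(board) returns [pb, pw, ub, uw, tb, tw].
    f = features(board)
    prediction = sum(w * x for w, x in zip(weights, f))
    diff = target - prediction  # V*(b) - V(b)
    return [w + c * diff * x for w, x in zip(weights, f)]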
Obtain Boards

• Random boards
• Games played by beginners
• Games played by professionals
Obtain Target Values
• A person provides the value V(b)
• Play until termination; if the outcome is a
  – win: V(b) ← 1 for all boards in the game
  – loss: V(b) ← −1 for all boards in the game
  – draw: V(b) ← 0 for all boards in the game
• Play one move, b → b′: set V(b) ← V(b′)
• Play n moves, b → b′ → … → b(n): set V(b) ← V(b(n))
(both labeling schemes are sketched below)
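
A minimal sketch of the two labeling schemes above. Here boards is assumed to be the sequence of positions from one game, outcome is +1 (win), −1 (loss), or 0 (draw), and V is the current value function; none of these names come from the slides.

def targets_from_outcome(boards, outcome):
    # Final-outcome labels: every board in a finished game receives the
    # game's outcome (+1, -1, or 0) as its training target.
    return [(b, outcome) for b in boards]

def bootstrapped_targets(boards, V):
    # One-move-lookahead labels: each board is labeled with the current
    # value of its successor, i.e. V(b) <- V(b').
    return [(b, V(b_next)) for b, b_next in zip(boards, boards[1:])]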
A General Framework
Mathematical modeling → Statistics
Finding optimal parameters → Optimization

Statistics + Optimization = Machine Learning
Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning
Important Issues in Machine Learning
• Obtaining experience
  – How to obtain experience? (supervised learning vs. unsupervised learning)
  – How many examples are enough? (PAC learning theory)
• Learning algorithms
  – Which algorithms can approximate the target function well, and when?
  – How does the complexity of the learning algorithm affect learning accuracy?
  – Is the target function learnable?
• Representing inputs
  – How to represent the inputs?
  – How to remove irrelevant information from the input representation?
  – How to reduce the redundancy of the input representation?
