
Machine Learning

Spring 2010

Rong Jin

CSE847 Machine Learning
• Instructor: Rong Jin
• Office hours:
  – Tuesday 4:00pm-5:00pm
  – Thursday 4:00pm-5:00pm
• Textbooks:
  – Machine Learning
  – The Elements of Statistical Learning
  – Pattern Recognition and Machine Learning
  – Many topics are drawn from papers
• Web site: http://www.cse.msu.edu/~cse847
Requirements
• ~10 homework assignments
• Course project
  – Topic: visual object recognition
  – Data: over one million images with extracted visual features
  – Objective: build a classifier that automatically identifies the class of objects in images
• Midterm exam & final exam
Goal
• Familiarize you with the state of the art in machine learning
  – Breadth: many different techniques
  – Depth: project
  – Hands-on experience
• Develop a machine-learning way of thinking
  – Learn how to model real-world problems with machine learning techniques
  – Learn how to deal with practical issues
Course Outline

Theoretical Aspects:
• Information Theory
• Optimization Theory
• Probability Theory
• Learning Theory

Practical Aspects:
• Supervised Learning Algorithms
• Unsupervised Learning Algorithms
• Important Practical Issues
• Applications
Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning
Why Machine Learning?
• Past: most computer programs were written entirely by hand
• Future: computers should be able to program themselves through interaction with their environment
Recent Trends
• Recent progress in algorithms and theory
• Growing flood of online data
• Availability of computational power
• Growing industry
Three Niches for Machine Learning
• Data mining: using historical data to improve decisions
  – Medical records → medical knowledge
• Software applications that are difficult to program by hand
  – Autonomous driving
  – Image classification
• User modeling
  – Automatic recommender systems
Typical Data Mining Task

Given:
• 9,147 patient records, each describing a pregnancy and birth
• Each record contains 215 features
Task:
• Classify future patients at high risk for Emergency Cesarean Section
Data Mining Results

One of 18 learned rules:

If   no previous vaginal delivery, and
     abnormal 2nd trimester ultrasound, and
     malpresentation at admission,
Then the probability of Emergency C-Section is 0.6
Credit Risk Analysis

Learned rules:

If   Other-Delinquent-Account > 2, and
     Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = no

If   Other-Delinquent-Account = 0, and
     (Income > $30K or Years-of-Credit > 3)
Then Profitable-Customer? = yes
Programs too Difficult to Program By Hand
• ALVINN drives 70 mph on highways
Programs too Difficult to Program By Hand
• Visual object recognition
[Figure: classifying bird images. A statistical model is trained on positive and negative example images, then tested on new images.]
Image Retrieval using Texts

Software that Models Users
History:
• "A homicide detective and a fire marshal must stop a pair of murderers who commit videotaped crimes to become media darlings." Rating: [stars shown in figure]
• "A biography of sports legend Muhammad Ali, from his early days to his days in the ring." Rating: [stars shown in figure]
• "Benjamin Martin is drawn into the American revolutionary war against his will when a brutal British commander kills his son." Rating: [stars shown in figure]

What to recommend?
• "A high-school boy is given the chance to write a story about an up-and-coming rock band as he accompanies it on their concert tour." Recommend? No
• "A young adventurer named Milo Thatch joins an intrepid group of explorers to find the mysterious lost continent of Atlantis." Recommend? Yes
Netflix Contest

Relevant Disciplines
• Artificial intelligence
• Statistics (particularly Bayesian statistics)
• Computational complexity theory
• Information theory
• Optimization theory
• Philosophy
• Psychology
• …
Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning
What is the Learning Problem?
• Learning = improving with experience at some task
  – Improve over task T
  – With respect to performance measure P
  – Based on experience E
• Example: learning to play backgammon
  – T: play backgammon
  – P: % of games won in the world tournament
  – E: opportunity to play against itself
Backgammon

• More than 10^20 states (boards)
• Even the best human players see only a small fraction of all boards during a lifetime
• Searching is hard because of the dice (branching factor > 100)
TD-Gammon by Tesauro (1995)

• Trained by playing against itself
• Now approximately equal to the best human player
Learn to Play Chess
• Task T: play chess
• Performance P: percent of games won in the world tournament
• Experience E:
  – What experience?
  – How shall it be represented?
  – What exactly should be learned?
  – What specific algorithm should learn it?
Choose a Target Function
• Goal:
  – Policy: π: b → m   (b = board, m = move)
• Choice of value function:
  – V: b, m → ℜ   (ℜ = real values)
  – V: b → ℜ
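
As a concrete illustration of how a policy can follow from a learned value function, here is a minimal Python sketch: the policy simply picks the move whose successor board the value function rates highest. The helpers legal_moves and apply_move are hypothetical stand-ins for a real backgammon move generator, not anything defined in these slides.

def choose_move(board, V, legal_moves, apply_move):
    # Policy pi(b): return the move m that maximizes V of the resulting board.
    # legal_moves(board) yields candidate moves; apply_move(board, m) returns
    # the successor board.  Both are hypothetical game-engine helpers.
    return max(legal_moves(board), key=lambda m: V(apply_move(board, m)))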
Value Function V(b): Example Definition

• If b is a final board that is won: V(b) = 1
• If b is a final board that is lost: V(b) = −1
• If b is not a final board: V(b) = E[V(b*)], where b* is the final board reached from b by playing optimally
Representation of Target Function V(b)

Options for representing V(b):
• The same value for each board → no learning
• A lookup table (one entry for each board) → no generalization
• Summarize experience into a compact model:
  – Polynomials
  – Neural networks
Example: Linear Feature Representation
• Features:
  – pb(b), pw(b) = number of black (white) pieces on board b
  – ub(b), uw(b) = number of unprotected black (white) pieces
  – tb(b), tw(b) = number of black (white) pieces threatened by the opponent
• Linear function:
  – V(b) = w0·pb(b) + w1·pw(b) + w2·ub(b) + w3·uw(b) + w4·tb(b) + w5·tw(b)
• Learning:
  – Estimation of the parameters w0, …, w5 (a sketch follows this list)
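
A minimal Python sketch of the linear value function above. The function features is an assumed extractor returning the six values [pb, pw, ub, uw, tb, tw] for a board; it is not defined in the slides.

def linear_value(board, weights, features):
    # V(b) = w0*pb(b) + w1*pw(b) + w2*ub(b) + w3*uw(b) + w4*tb(b) + w5*tw(b),
    # computed as a dot product of the weight vector and the feature vector.
    # features(board) is a hypothetical six-feature extractor.
    return sum(w * f for w, f in zip(weights, features(board)))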
Tuning Weights
• Given:
  – board b
  – predicted value V(b)
  – desired value V*(b)
• Calculate the squared error:
  – error(b) = (V*(b) − V(b))²
• For each board feature fi, update the weight:
  – wi ← wi + c × (V*(b) − V(b)) × fi
• This stochastically minimizes ∑b (V*(b) − V(b))² (gradient descent optimization; see the sketch below)
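
A minimal sketch of the update rule above, assuming the same hypothetical six-feature extractor. Note that the step uses the signed difference V*(b) − V(b); repeating this step over many boards stochastically minimizes the squared error.

def lms_update(weights, board, target, features, c=0.01):
    # One stochastic-gradient (LMS) step toward the desired value V*(b).
    # c is the learning rate; features(board) returns [pb, pw, ub, uw, tb, tw].
    f = features(board)
    prediction = sum(w * x for w, x in zip(weights, f))
    diff = target - prediction  # V*(b) - V(b)
    return [w + c * diff * x for w, x in zip(weights, f)]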
Obtain Boards

• Random boards
• Games played by beginners
• Games played by professionals
Obtain Target Values
• A person provides the value V(b)
• Play until termination; if the outcome is a
  – win: V(b) ← 1 for all boards in the game
  – loss: V(b) ← −1 for all boards in the game
  – draw: V(b) ← 0 for all boards in the game
• Play one move, b → b′: set V(b) ← V(b′)
• Play n moves, b → b′ → … → b(n): set V(b) ← V(b(n))
(both labeling schemes are sketched below)
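
A minimal sketch of the two labeling schemes above. Here boards is assumed to be the sequence of positions from one game, outcome is +1 (win), −1 (loss), or 0 (draw), and V is the current value function; none of these names come from the slides.

def targets_from_outcome(boards, outcome):
    # Final-outcome labels: every board in a finished game receives the
    # game's outcome (+1, -1, or 0) as its training target.
    return [(b, outcome) for b in boards]

def bootstrapped_targets(boards, V):
    # One-move-lookahead labels: each board is labeled with the current
    # value of its successor, i.e. V(b) <- V(b').
    return [(b, V(b_next)) for b, b_next in zip(boards, boards[1:])]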
A General Framework
Mathematical modeling → Statistics
Finding optimal parameters → Optimization

Statistics + Optimization = Machine Learning
Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning
Important Issues in Machine Learning
• Obtaining experience
  – How to obtain experience? (supervised learning vs. unsupervised learning)
  – How many examples are enough? (PAC learning theory)
• Learning algorithms
  – Which algorithms can approximate the target function well, and when?
  – How does the complexity of the learning algorithm affect learning accuracy?
  – Is the target function learnable?
• Representing inputs
  – How to represent the inputs?
  – How to remove irrelevant information from the input representation?
  – How to reduce the redundancy of the input representation?
