Introduction
Sudeshna Sarkar
IIT Kharagpur
Learning involves:
Learning general models from data.
Data is cheap and abundant; knowledge is expensive and scarce.
Build a model that is a good and useful approximation to the data.
Applications
Speech and hand-writing recognition
Autonomous robot control
Data mining and bioinformatics: motifs, alignment, ...
Playing games
Fault detection
Clinical diagnosis
Spam email detection
Credit scoring, fraud detection
Associations
Reinforcement Learning
Concept learning
Acquire or infer the definition of a general concept, given a sample of positive and negative training examples of the concept.
Each concept can be thought of as a Boolean-valued function.
Approximate the function from samples.
Example: Sports vs. Entertainment.
Determine:
A hypothesis h ∈ H such that h(x) = f(x) for all x ∈ S?
A hypothesis h ∈ H such that h(x) = f(x) for all x ∈ X?
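The first question can be checked mechanically. Below is a minimal sketch (a hypothetical example, not from the slides) that tests whether a candidate hypothesis h agrees with the labels on the sample S; the names is_consistent and has_score, and the toy data, are assumptions of this illustration.

```python
def is_consistent(h, S):
    """True if h(x) matches the label for every labeled example (x, label) in S."""
    return all(h(x) == label for x, label in S)

# Toy concept "Sports document", decided here by a single keyword feature.
S = [({"has_score": True}, True), ({"has_score": False}, False)]
h = lambda x: x["has_score"]
print(is_consistent(h, S))  # True on S; agreement on all of X is the real question
```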
Hypothesis representation
[Diagram: learning as search. The hypothesis representation defines the hypothesis space; the learner searches it for the hypothesis that best fits the training examples.]
Data representation: features from spectral analysis of speech signals.
Task: classification of vowel sounds in words of the form h-?-d.
Problem features:
Highly variable data with the same classification.
Good feature selection is very important.
[Plot: number of words (f) against words by rank order (r), ranks 1, 2, 3, …, m.]
Fundamental questions:
What if the target concept is not contained in the hypothesis space?
What is the relationship between the size of the hypothesis space, the algorithm's ability to generalize to unobserved instances, and the number of training examples that must be observed?
Inductive bias
[Diagram: inductive vs. deductive systems.]
Inductive Learning Hypothesis
Any hypothesis found to approximate the target function well over the training examples will also approximate the target function well over the unobserved examples.
All support:
Probabilities: graded membership; comparability across categories.
Adaptive: over time; across individuals.
[Diagram: a tree over features f1 and f7 with leaf class probabilities P(class) = 0.9, 0.6, and 0.2.]
[Diagram: inputs x1, x2, x3, …, xn and the resulting support vectors.]
[Decision tree for PlayTennis: Outlook = Sunny branches to Humidity (High: No, Normal: Yes); Outlook = Overcast gives Yes; Outlook = Rain branches to Wind (Strong: No, Weak: Yes).]
The tree corresponds to the expression:
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
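As an illustration (a hypothetical encoding, not from the slides), the learned tree can be written directly as nested conditionals; the function name play_tennis is an assumption of this sketch.

```python
def play_tennis(outlook, humidity, wind):
    """Classify a day according to the decision tree above."""
    if outlook == "Sunny":
        return humidity == "Normal"   # Normal: Yes, High: No
    if outlook == "Overcast":
        return True                   # always Yes
    if outlook == "Rain":
        return wind == "Weak"         # Weak: Yes, Strong: No
    raise ValueError(f"unknown outlook: {outlook}")

print(play_tennis("Sunny", "Normal", "Strong"))  # True
```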
[Partially grown tree during ID3: Outlook, with branches Sunny → ?, Overcast → Yes, Rain → ?.]
Gain(Ssunny, Humidity) = 0.970 − (3/5)·0.0 − (2/5)·0.0 = 0.970
Gain(Ssunny, Temp.) = 0.970 − (2/5)·0.0 − (2/5)·1.0 − (1/5)·0.0 = 0.570
Gain(Ssunny, Wind) = 0.970 − (2/5)·1.0 − (3/5)·0.918 = 0.019
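These numbers can be reproduced in a few lines. Below is a minimal sketch (not the lecture's code) of entropy and information gain, applied to the five Outlook = Sunny days of the standard PlayTennis table; the helper names entropy and gain are assumptions of this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target="PlayTennis"):
    """Information gain of splitting `rows` (a list of dicts) on `attr`."""
    total = entropy([r[target] for r in rows])
    for v in {r[attr] for r in rows}:
        part = [r[target] for r in rows if r[attr] == v]
        total -= len(part) / len(rows) * entropy(part)
    return total

# The five Outlook = Sunny days from the standard PlayTennis table.
sunny = [
    {"Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
    {"Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
]
print(round(gain(sunny, "Humidity"), 3))  # 0.971 (the slide rounds to 0.970)
print(round(gain(sunny, "Wind"), 3))      # 0.02
```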
ID3 Algorithm
[Diagram: ID3's search through the space of decision trees, greedily growing the tree by choosing among candidate attribute tests A1, A2, A3, A4 over the positive (+) and negative (−) examples.]
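A minimal recursive ID3 sketch that performs exactly this greedy search (an illustrative implementation under the usual assumptions of discrete attributes and examples as dicts; not the lecture's code):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target):
    total = entropy([r[target] for r in rows])
    for v in {r[attr] for r in rows}:
        part = [r[target] for r in rows if r[attr] == v]
        total -= len(part) / len(rows) * entropy(part)
    return total

def id3(rows, attrs, target):
    """Grow a decision tree greedily by information gain."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:             # pure node: answer directly
        return labels[0]
    if not attrs:                         # nothing left to test: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}
```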
Hypothesis Space Search in ID3
The hypothesis space is complete!
The target function is surely in there.
Consider:
Medical diagnosis, where a blood test costs 1000 SEK.
[Diagram: candidate feature subsets over features D1, D2, D3, D4, each passed to a feature subset evaluation step.]
Wrapper Model
Evaluate the accuracy of the inducer for a given subset of features by means of n-fold cross-validation.
The training data is split into n folds, and the induction algorithm is run n times. The accuracy results are averaged to produce the estimated accuracy, as sketched below.
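A minimal sketch of that estimate (hypothetical helper names induce and accuracy; the interleaved split is an assumption, since the slides do not specify how folds are formed):

```python
from collections import Counter

def cross_validated_accuracy(data, induce, accuracy, n=10):
    """Split data into n folds; train on n-1 folds, test on the held-out
    fold; average the n accuracy results."""
    folds = [data[i::n] for i in range(n)]  # simple interleaved split
    scores = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(accuracy(induce(train), test))
    return sum(scores) / n

# Tiny demo with a majority-class inducer on (features, label) pairs.
induce = lambda rows: (lambda x: Counter(y for _, y in rows).most_common(1)[0][0])
accuracy = lambda model, rows: sum(model(x) == y for x, y in rows) / len(rows)
data = [((i,), i % 3 != 0) for i in range(30)]
print(cross_validated_accuracy(data, induce, accuracy, n=5))
```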
Forward selection:
Starts with the empty set of features and greedily adds the feature that improves the estimated accuracy the most (see the sketch after this list).
Backward elimination:
Starts with the set of all features and greedily removes the worst feature.
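An illustrative wrapper-style forward selection (a sketch, not the lecture's code; the use of scikit-learn's cross_val_score, a DecisionTreeClassifier as the inducer, and the iris data are all assumptions of this example):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def cv_accuracy(X, y, features, n_folds=5):
    """Estimated accuracy of the inducer restricted to `features`."""
    scores = cross_val_score(DecisionTreeClassifier(random_state=0),
                             X[:, features], y, cv=n_folds)
    return scores.mean()

def forward_selection(X, y):
    selected, best_acc = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Try adding each remaining feature; keep the best candidate.
        acc, f = max((cv_accuracy(X, y, selected + [f]), f) for f in remaining)
        if acc <= best_acc:        # no feature improves the estimate: stop
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc

X, y = load_iris(return_X_y=True)
print(forward_selection(X, y))
```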
[Diagram: bagging. T bootstrap training sets (Training set 1 … Training set T) yield classifiers C1, C2, …, CT; their yes/no votes on an instance are combined into the final classifier C*.]
Bagging
Bagging requires unstable classifiers, such as decision trees or neural networks.
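A minimal bagging sketch (illustrative; the names bagging and stump_learner and the toy data are assumptions of this example, not the lecture's code):

```python
import random
from collections import Counter

def bagging(train, base_learner, T=25, seed=0):
    """Train T classifiers on bootstrap replicates; classify by majority vote."""
    rng = random.Random(seed)
    models = [base_learner([rng.choice(train) for _ in train])  # with replacement
              for _ in range(T)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

# Toy base learner: a one-feature threshold stump (a stand-in for the
# unstable learners, e.g. full decision trees, mentioned above).
def stump_learner(sample):
    pos = [x for x, y in sample if y]
    neg = [x for x, y in sample if not y]
    t = (min(pos) + max(neg)) / 2 if pos and neg else 0.0
    return lambda x: x > t

data = [(i / 10, i >= 5) for i in range(10)]   # label: x >= 0.5
model = bagging(data, stump_learner)
print(model(0.7), model(0.2))                  # True False
```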