Jason Corso
SUNY at Buffalo
15 January 2013
Biometrics
FIGURE 1.1. The objects to be classified are first sensed by a transducer (camera), whose signals are preprocessed. Next the features are extracted and finally the classification is emitted, here either salmon or sea bass. Although the information flow is often chosen to be from the source to the classifier, some systems employ information flow in which earlier levels of processing can be altered based on the tentative or preliminary categorization.
Pattern Recognition By Example
A Note On Preprocessing
Clear that the populations of salmon and sea bass are indeed distinct.
The space of all fish is quite large. Each dimension is defined by some
property of the fish, most of which we cannot even measure with the
camera.
Pattern Recognition By Example
Models
[Figure: the salmon and sea bass populations in feature space, with the marginal distribution of a single feature.]
Suppose an expert at the fish packing plant tells us that length is the best feature.
We cautiously trust this expert and gather a few examples from our installation to analyze the length feature.
These examples are our training set. The histograms below are the marginal distributions.
FIGURE 1.2. Histograms for the length feature for the two categories. No single threshold value of the length will serve to unambiguously discriminate between the two categories; using length alone, we will have some errors. The value marked l* will lead to the smallest number of errors, on average. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
But this is a disappointing result. The sea bass length does exceed the salmon length on average, yet the two histograms overlap too much for length alone to discriminate reliably.
FIGURE 1.3. Histograms for the lightness feature for the two categories. No single threshold value x* (decision boundary) will serve to unambiguously discriminate between the two categories; using lightness alone, we will have some errors. The value x* marked will lead to the smallest number of errors, on average. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
This feature exhibits a much better separation between the two classes.
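The minimum-error threshold idea behind Figures 1.2 and 1.3 can be sketched in a few lines. This is a minimal illustration on synthetic data; the class means, spreads, and sample sizes are assumptions, not the data behind the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical lightness readings for each class (assumed synthetic data,
# loosely mimicking Figure 1.3: salmon tend to be darker than sea bass).
salmon = rng.normal(loc=4.0, scale=1.0, size=100)
sea_bass = rng.normal(loc=6.5, scale=1.0, size=100)

def best_threshold(a, b):
    """Scan candidate thresholds t; classify x < t as class A (salmon) and
    x >= t as class B (sea bass); return the t with fewest training errors."""
    candidates = np.sort(np.concatenate([a, b]))
    best_t, best_err = None, float("inf")
    for t in candidates:
        errors = int(np.sum(a >= t) + np.sum(b < t))
        if errors < best_err:
            best_t, best_err = t, errors
    return best_t, best_err

t_star, errs = best_threshold(salmon, sea_bass)
print(f"x* = {t_star:.2f}, training errors = {errs} / {len(salmon) + len(sea_bass)}")
```

Even this "best" single-feature threshold leaves errors wherever the two histograms overlap, which is exactly the point of the figures.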
Pattern Recognition By Example: Modeling for the Fish Example
Feature Combination
Seldom will one feature be enough in practice.
In the fish example, perhaps lightness, x1, and width, x2, will jointly do better than either alone.
This is an example of a 2D feature space:

x = (x1, x2)^T.  (1)

FIGURE 1.4. The two features of lightness and width for sea bass and salmon. The dark line could serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
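A straight decision line like the one in Figure 1.4 can be fit in many ways; one simple sketch uses a least-squares linear discriminant on synthetic (lightness, width) data. The class locations and spreads below are assumptions chosen to resemble the figure's axis ranges, not real measurements.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 2D training data (lightness, width), synthetic.
salmon = rng.normal([4.0, 16.0], [1.0, 1.0], size=(100, 2))
sea_bass = rng.normal([6.5, 19.0], [1.0, 1.0], size=(100, 2))

X = np.vstack([salmon, sea_bass])
y = np.concatenate([-np.ones(100), np.ones(100)])  # -1 = salmon, +1 = sea bass

# Least-squares linear discriminant: solve min_w ||Xb w - y||^2,
# where Xb appends a constant 1 column for the bias term.
Xb = np.hstack([X, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

pred = np.sign(Xb @ w)
errors = int(np.sum(pred != y))
print(f"training errors: {errors} / 200")
```

The two features together separate the classes far better than either marginal alone, mirroring the jump from Figure 1.3 to Figure 1.4.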
Key Ideas in Pattern Recognition
Curse Of Dimensionality
The two features obviously separate the classes much better than one
alone.
This suggests adding a third feature. And a fourth feature. And so
on.
Key questions
How many features are required?
Is there a point where we have too many features?
How do we know beforehand which features will work best?
What happens when there is feature redundancy/correlation?
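One concrete face of the curse of dimensionality can be shown in a few lines: as the number of dimensions grows, distances between random points concentrate, so the nearest and farthest neighbors of a query become nearly equally far away. This is a minimal sketch on uniform random data, not a claim about any particular feature set.

```python
import numpy as np

rng = np.random.default_rng(2)

# Ratio of nearest to farthest neighbor distance for a random query point.
# In high dimensions the ratio approaches 1: "near" and "far" lose meaning.
ratios = {}
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))   # 500 random points in the unit cube
    q = rng.uniform(size=d)          # one random query point
    dist = np.linalg.norm(X - q, axis=1)
    ratios[d] = float(dist.min() / dist.max())
    print(f"d={d:5d}  min/max distance ratio = {ratios[d]:.3f}")
```

This is one reason piling on features eventually hurts: with distances concentrating, discriminating nearby from faraway samples requires ever more training data.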
Decision Boundary
FIGURE 1.3. Histograms for the lightness feature for the two categories. No single threshold value x* (decision boundary) will serve to unambiguously discriminate between the two categories; using lightness alone, we will have some errors. The value x* marked will lead to the smallest number of errors, on average. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
FIGURE 1.4. The two features of lightness and width for sea bass and salmon. The dark line could serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
Bias-Variance Dilemma
Depending on the available features, the complexity of the problem, and the classifier, the decision boundaries will also vary in complexity.
FIGURE 1.4. The two features of lightness and width for sea bass and salmon. The dark line could serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
FIGURE 1.5. Overly complex models for the fish will lead to decision boundaries that are complicated. While such a decision may lead to perfect classification of our training samples, it would lead to poor performance on future patterns. The novel test point marked ? is evidently most likely a salmon, whereas the complex decision boundary shown leads it to be classified as a sea bass. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
FIGURE 1.6. The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of classifier, thereby giving the highest accuracy on new patterns. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
Simple decision boundaries (e.g., linear) seem to miss some obvious trends in the data: this is bias.
Complex decision boundaries seem to lock onto the idiosyncrasies of the training data set: this is variance.
A central issue in pattern recognition is to build classifiers that can
work properly on novel query data. Hence, generalization is key.
Can we predict how well our classifier will generalize to novel data?
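The bias-variance dilemma is easy to see numerically. The sketch below, on assumed synthetic 1D regression data (a sine curve plus noise, not the fish data), fits polynomials of increasing degree: training error keeps falling with model complexity, while held-out error reveals where the model starts fitting noise.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_data(n):
    # Noisy samples of an underlying smooth function (assumed toy problem).
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.2, n)
    return x, y

x_tr, y_tr = make_data(30)    # small training set
x_te, y_te = make_data(200)   # held-out "novel query" data

train_mse, test_mse = {}, {}
for deg in (1, 5, 15):
    coefs = np.polyfit(x_tr, y_tr, deg)
    train_mse[deg] = float(np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2))
    test_mse[deg] = float(np.mean((np.polyval(coefs, x_te) - y_te) ** 2))
    print(f"degree {deg:2d}: train MSE {train_mse[deg]:.3f}, test MSE {test_mse[deg]:.3f}")
```

The degree-1 fit underfits (high bias), while very high degrees chase the noise in the 30 training points (high variance); only the held-out error reflects generalization.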
Key Ideas in Pattern Recognition: Cost and Decision Theory
Decision Theory
Our underlying goal is to establish a decision boundary to minimize the overall cost; this is called decision theory.
If some errors are costlier than others, the minimum-cost threshold shifts; here it moves toward smaller values of lightness.
[Figure 1.3 repeated: histograms of the lightness feature with the minimum-error threshold x*.]
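The cost-sensitive threshold shift can be made concrete. This sketch assumes Gaussian class-conditional densities and equal priors (all assumptions, not the book's data): we decide "salmon" only where its cost-weighted likelihood wins, and raising the cost of selling sea bass as salmon pushes the boundary toward smaller lightness values.

```python
import numpy as np

def gauss(x, mu, sigma):
    # Gaussian probability density function.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(0, 10, 10001)
p_salmon = gauss(x, 4.0, 1.0)   # hypothetical class-conditional densities
p_bass = gauss(x, 6.5, 1.0)

def threshold(cost_bass_as_salmon):
    """Decide 'salmon' where its likelihood exceeds the cost-weighted sea-bass
    likelihood (equal priors; cost of the opposite error fixed at 1). Returns
    the upper edge of the salmon region on the grid."""
    decide_salmon = p_salmon > cost_bass_as_salmon * p_bass
    return float(x[decide_salmon].max())

t_sym = threshold(1.0)   # symmetric costs: boundary midway between the means
t_asym = threshold(5.0)  # selling sea bass as salmon is 5x worse
print(f"symmetric-cost threshold:  {t_sym:.3f}")
print(f"asymmetric-cost threshold: {t_asym:.3f}  (moved to smaller lightness)")
```

With equal costs the boundary sits at the midpoint of the two means; penalizing sea-bass-as-salmon errors more heavily shrinks the region labeled salmon, exactly the shift toward smaller lightness described above.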
Pattern Recognition
Statistical
Focus on statistics of the patterns.
The primary emphasis of our course.
Syntactic
Classifiers are defined using a set of logical rules.
Grammars can group rules.
Analysis By Synthesis
Classifier Ensembles
SO MANY QUESTIONS...
Schedule of Topics
1 Introduction to Pattern Recognition
2 Tree Classifiers: getting our feet wet with real classifiers
  1 Decision Trees: CART, C4.5, ID3
  2 Random Forests
3 Bayesian Decision Theory: grounding our inquiry
4 Linear Discriminants: discriminative classifiers and the decision boundary
  1 Separability
  2 Perceptrons
  3 Support Vector Machines
5 Parametric Techniques: generative methods grounded in Bayesian decision theory
  1 Maximum Likelihood Estimation
  2 Bayesian Parameter Estimation
  3 Sufficient Statistics
Schedule of Topics
6 Non-Parametric Techniques
1 Kernel Density Estimators
2 Parzen Window
3 Nearest Neighbor Methods
7 Unsupervised Methods: exploring the data for latent structure
1 Component Analysis and Dimension Reduction
1 The Curse of Dimensionality
2 Principal Component Analysis
3 Fisher Linear Discriminant
4 Locally Linear Embedding
2 Clustering
1 K-Means
2 Expectation Maximization
3 Mean Shift
8 Classifier Ensembles (Bagging and Boosting)
1 Bagging
2 Boosting / AdaBoost
Code / Environments
Python
Logistical Things
There will be homeworks posted after each topic. The homeworks are
to be done alone or in groups. Solutions will be posted. No
homeworks will be turned in or graded.
There will be a quiz once a week. Each quiz will have one rote
question and one longer question; ten minutes of class time will be
allotted to quizzes each week.
Fourteen quizzes will be given; the two lowest scores will be dropped.
Quizzes will be on Tuesday or Thursday; you will not know in advance.
Quizzes will be in-class, independent, closed-book.
Quizzes will not require a calculator.
Assessments of this type force you to study continuously throughout
the term.
See syllabus for more information.
455
Slightly advanced math for an undergrad CSE student; I felt
bombarded with math; This is a statistics class.
I would have liked to see more in-depth walkthroughs... cemented with real numbers.
This will rarely happen in the course. First, there is a lot of material to
cover. Second, you can work through these while you study; active study.
Third, there are recitations/hours with the TA to work through these.
I appreciated the balance between powerpoint and blackboard; there
was good reason to attend class.
The (hands-down) most interesting class I've taken to date; Very cool course. Really cool field.
555
The course requires a very strong foundation in probability theory... it would have been a lot easier if the professor [reviewed this material in the beginning of the semester].
Students are expected to be fluent in probability theory and have a fresh
review of the material. Take responsibility.
I need more detailed examples on the course material; More time
should be spent with examples.
High-level examples on plausible data sets are indeed shown throughout the
course. Source code is also given to allow self-experimentation.
This is the best course I have taken so far at UB; This course is great; This class stimulated me to go into the field of Machine Learning.