

Course Syllabus
Course Content
- Introduction to Pattern Recognition
- Statistical Pattern Recognition
  - Bayesian Decision Theory
  - Quadratic Classifier
  - Parameter and Density Estimation
  - Nearest Neighbors
  - Linear Discriminants
  - Validation
- Dimensionality Reduction
  - Principal Component Analysis
  - Feature Subset Selection

Course Syllabus
- Clustering
  - Mixture Models
  - Self-Organizing Map
- Neural Network Approaches
  - Single-Layer Perceptron
  - Multi-Layer Perceptron
  - Radial Basis Functions
- Advanced Topics
  - Decision Trees
  - Hidden Markov Models
  - Support Vector Machines

Course Syllabus
Course Prerequisites
- Basic probability theory, statistics, and basic linear algebra.
- Basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program.

Relation to Other Courses
- Machine Learning
- Data Mining

Course Syllabus
Reference Book
- Pattern Classification, Richard O. Duda et al., Second Edition, 2000.

Grading Policy
- Assignments and presentation.
- Project and paper submission.
- Final exam.

What is a pattern?
A pattern is an entity, vaguely defined, that could be given a name: a fingerprint image, a handwritten word, a human face, a speech signal, a DNA sequence, etc.

What is Recognition?
It is the ability to classify.

What is Pattern Recognition (PR)?
It is the study of how machines can:
- observe the environment
- learn to distinguish patterns of interest from their background
- make sound and reasonable decisions about the categories of the patterns

Other Patterns
- Insurance, credit card applications: applicants are characterized by # of accidents, make of car, year of model, income, # of dependents, credit worthiness, mortgage amount
- Web documents: keyword-based descriptions
- Housing market: location, size, year, school district, etc.

Pattern Class
A collection of similar (not necessarily identical) objects.
- Intra-class variability: the letter T in different typefaces
- Inter-class similarity: characters from different classes that look alike

Pattern Class Model
A description of each class/population, typically mathematical in form (e.g., a probability density such as a Gaussian).

Pattern Recognition
Key objectives:
- Process the sensed data to eliminate noise.
- Hypothesize models that describe each class population (e.g., recover the process that generated the patterns).
- Given a sensed pattern, choose the best-fitting model and assign the pattern to the class associated with that model.

Classification vs Clustering
- Classification (known categories; supervised classification, also called recognition): assign a new pattern to one of the known categories.
- Clustering (creation of new categories; unsupervised classification): discover groupings in unlabeled data.
(Figure: samples from Category A and Category B, separated by classification and grouped by clustering.)
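The contrast can be made concrete with a small sketch. The 2-means loop below (pure Python, made-up 1-D data) discovers the two groups without ever seeing labels; in the supervised setting, those group memberships would be given as training labels instead.

```python
# Minimal sketch: clustering (unsupervised) discovers groups without
# labels. A tiny 2-means loop on made-up 1-D data; in classification
# (supervised) the group memberships would be given instead.

def two_means(xs, iters=20):
    c1, c2 = min(xs), max(xs)  # crude initialization at the extremes
    for _ in range(iters):
        g1 = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in xs if abs(x - c1) > abs(x - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted([c1, c2])

data = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]  # two visually obvious groups
centers = two_means(data)
```

The two recovered centers sit near the means of the two groups, even though the algorithm was never told which point belongs where.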

Emerging PR Applications
- Speech recognition. Input: speech waveforms. Output: spoken words, speaker identity.
- Identification and counting of cells. Input: slides of blood samples, microsections of tissues. Output: type of cells.
- Detection and diagnosis of disease. Input: EKG, EEG waveforms. Output: types of cardiac conditions, classes of brain conditions.
- Fingerprint identification. Input: image from fingerprint sensors. Output: owner of the fingerprint, fingerprint classes.
- Web search. Input: key words specified by a user. Output: text relevant to the user.
- Character recognition (page readers, zip code, license plate). Input: optically scanned image. Output: alphanumeric characters.

Template Matching
Match the input scene against a stored template of each class.
(Figure: template and input scene.)
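A toy sketch of rigid template matching, on a 1-D signal rather than a 2-D image (the data are made up): slide the template across the input and keep the offset with the smallest sum of squared differences.

```python
# Toy sketch of rigid template matching on a 1-D signal: slide the
# template across the input and return the offset with the smallest
# sum of squared differences (SSD). Real systems do this on 2-D images.

def match_template(signal, template):
    best_off, best_ssd = 0, float("inf")
    for off in range(len(signal) - len(template) + 1):
        ssd = sum((signal[off + i] - template[i]) ** 2
                  for i in range(len(template)))
        if ssd < best_ssd:
            best_off, best_ssd = off, ssd
    return best_off

signal = [0, 0, 1, 3, 1, 0, 0, 0]  # the pattern appears at offset 2
template = [1, 3, 1]
offset = match_template(signal, template)
```

Rigid matching like this fails when the pattern deforms, which motivates the deformable-template approach on the next slide.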

Deformable Template: Corpus Callosum Segmentation
- Shape training set
- Prototype and variation learning
- Prototype registration to the low-level segmented image
- Prototype warping

Statistical Pattern Recognition
- Patterns are represented as points in a feature space.
- A statistical model describes pattern generation in the feature space.

Main Components of a PR System

Example of a PR System

Complexity of PR: An Example
Fish classification: sea bass vs. salmon.
Preprocessing involves:
(1) image enhancement
(2) separating touching or occluding fish
(3) finding the boundary of the fish

How to separate sea bass from salmon?
Possible features to be used:
- length
- lightness
- width
- number and shape of fins
- position of the mouth
- etc.

Decision Using Length


Choose the optimal threshold using a number of training examples.
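This threshold search can be sketched in a few lines. The lengths and labels below are made-up illustrative values (0 = salmon, 1 = sea bass), not real fish measurements; the rule classifies a fish as sea bass when its length exceeds the threshold.

```python
# Sketch of choosing the decision threshold from training examples by
# minimizing training error. Lengths and labels are made-up values,
# with 0 = salmon and 1 = sea bass; classify length > t as sea bass.

def best_threshold(lengths, labels):
    candidates = sorted(set(lengths))
    best_t, best_err = None, float("inf")
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2  # midpoint between adjacent training values
        err = sum(int(v > t) != y for v, y in zip(lengths, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

lengths = [4.1, 4.8, 5.0, 5.6, 7.2, 7.9, 8.4, 9.0]
labels  = [0, 0, 0, 0, 1, 1, 1, 1]
t, err = best_threshold(lengths, labels)
```

On this cleanly separated toy sample the training error reaches zero; with real, overlapping data, some training error is unavoidable for any single threshold.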


Decision Using Average Lightness
Choose the optimal threshold using a number of training examples.
The overlap between the class histograms is smaller for lightness than for the length feature.

Cost of Misclassification
There are two possible classification errors:
(1) deciding the fish is a sea bass when it is a salmon
(2) deciding the fish is a salmon when it is a sea bass

Which error is more important?
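If the answer is "missing a sea bass is worse", the threshold search can weight that error more heavily. The lightness values, labels, and costs below are illustrative assumptions, not values from the lecture.

```python
# Sketch: weight the two error types differently when picking the
# threshold. Labels: 1 = sea bass, 0 = salmon; classify x > t as bass.
# Costs and lightness values are illustrative assumptions.

def cost_sensitive_threshold(values, labels, cost_miss_bass, cost_miss_salmon):
    candidates = sorted(set(values))
    best_t, best_cost = None, float("inf")
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        cost = 0.0
        for v, y in zip(values, labels):
            if y == 1 and v <= t:      # sea bass called salmon
                cost += cost_miss_bass
            elif y == 0 and v > t:     # salmon called sea bass
                cost += cost_miss_salmon
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

lightness = [1.0, 2.0, 3.0, 3.5, 4.0, 5.0, 6.0]
labels    = [0,   0,   0,   1,   0,   1,   1]
# making a missed sea bass 5x costlier pulls the threshold down
t = cost_sensitive_threshold(lightness, labels, 5.0, 1.0)
```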


Decision Using Multiple Features
To improve recognition, we might have to use more than one feature at a time:
- A single feature might not yield the best performance.
- Combinations of features might yield better performance.
Feature vector: x = (x1, x2), where x1 = lightness and x2 = width.

Decision Boundary
Partition the feature space into two regions by finding the decision boundary that minimizes the error.
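One of the simplest such partitions comes from a minimum-distance-to-class-mean classifier on the two features (x1 = lightness, x2 = width): its decision boundary is the perpendicular bisector of the two class means, a straight line. All training points below are made-up illustrative values.

```python
# Sketch: minimum-distance-to-class-mean classifier on two features
# (x1 = lightness, x2 = width). Its decision boundary is the
# perpendicular bisector of the two class means, i.e. a straight line.
# All training points are made-up illustrative values.

def mean2d(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def classify(x, m_salmon, m_bass):
    return "salmon" if sq_dist(x, m_salmon) < sq_dist(x, m_bass) else "sea bass"

salmon = [(1.0, 4.0), (2.0, 5.0), (1.5, 4.5)]
bass   = [(5.0, 8.0), (6.0, 9.0), (5.5, 8.5)]
m_s, m_b = mean2d(salmon), mean2d(bass)
```

More flexible boundaries (quadratic, piecewise) can reduce training error further, at the risk of overfitting discussed below.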


How Many and Which Features to Use?
Issues with feature extraction:
- Correlated features do not improve performance.
- It might be difficult to extract certain features.
- It might be computationally expensive to extract many features.
- Curse of dimensionality.

Curse of Dimensionality
Adding too many features can worsen performance:
- computational complexity
- overfitting
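One way to see the curse: if each feature is quantized into b bins, covering a d-dimensional feature space takes b**d cells, so the amount of training data needed to populate the space grows exponentially with the number of features. The bin count below is an illustrative choice.

```python
# Sketch of one face of the curse of dimensionality: with b bins per
# feature, a d-dimensional feature space has b**d cells, so the data
# needed to populate them grows exponentially with d.

def cells_needed(bins_per_feature, n_features):
    return bins_per_feature ** n_features

# 10 bins per feature: 1 feature needs 10 cells, 6 features a million
growth = [cells_needed(10, d) for d in (1, 2, 3, 6)]
```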


Model Complexity
- We can get perfect classification performance on the training data by choosing a sufficiently complex model.
- Complex models are tuned to the particular training samples rather than to the characteristics of the true underlying model.
- This raises the issue of generalization.
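The extreme case makes the point: a "model" that simply memorizes the training set (a lookup table, with toy data below) achieves zero training error, yet it has learned nothing it can apply to a pattern it has not seen.

```python
# Sketch of the complexity/generalization tension: a model that simply
# memorizes the training set gets zero training error, but it cannot
# say anything about a novel pattern. Toy data.

def memorizer(train_pairs):
    table = dict(train_pairs)
    # returns None on novel inputs: the memorized model cannot generalize
    return lambda x: table.get(x)

train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
model = memorizer(train)
train_errors = sum(model(x) != y for x, y in train)
novel_prediction = model(2.5)  # a pattern not in the training set
```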

Generalization
The ability of the classifier to produce correct results on novel patterns.
How can we improve generalization performance?
- More training examples (i.e., better pdf estimates).
- Simpler models (i.e., simpler classification boundaries) usually yield better performance.

The Design Cycle


Issues to Consider:
- Noise / Segmentation
- Data Collection / Feature Extraction
- Pattern Representation / Invariance / Missing Features
- Model Selection / Overfitting
- Prior Knowledge / Context
- Classifier Combination
- Costs and Risks
- Computational Complexity

Noise
- Various types of noise exist (e.g., shadows, a shaking conveyor belt, etc.).
- Noise can reduce the reliability of the measured feature values.
- Knowledge of the noise process can help improve performance.

Segmentation
Individual patterns have to be segmented.
- Can we segment them without having categorized them first?
- Can we categorize them without having segmented them first?
- How do we "group" together the proper number of elements?

Data Collection
How do we know that we have collected an adequately large and representative set of examples for training and testing the system?

Feature Extraction
A domain-specific problem that influences the classifier's performance.
- Which features are most promising?
- Are there ways to automatically learn which features are best?
- How many should we use?
- Choose features that are robust to noise.
- Favor features that lead to simpler decision regions.

Pattern Representation
- Similar patterns should have similar representations.
- Patterns from different classes should have dissimilar representations.
- Pattern representations should be invariant to transformations such as translations, rotations, scaling, reflections, and non-rigid deformations.
- Aim for small intra-class variation and large inter-class variation.

Missing Features
Certain features might be missing (e.g., due to occlusion).
- How should the classifier make the best decision with missing features?
- How should we train the classifier with missing features?
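One simple option, among many, is mean imputation: fill a missing feature with its mean over the training set before classifying. The data and the choice of imputation method below are illustrative assumptions, not the lecture's prescription.

```python
# Sketch of mean imputation: fill a missing feature (None) with the
# training-set mean of that feature before classifying. Illustrative
# data; mean imputation is one simple choice, not the only one.

def impute_missing(sample, train_rows):
    filled = list(sample)
    for j, v in enumerate(filled):
        if v is None:  # feature j is missing (e.g., occluded)
            column = [row[j] for row in train_rows]
            filled[j] = sum(column) / len(column)
    return filled

train_rows = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
filled = impute_missing([2.5, None], train_rows)  # second feature missing
```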


Model Selection
- How do we know when to reject a class of models and try another one?
- Is model selection just a trial-and-error process?
- Can we automate this process?

Overfitting
Models more complex than necessary lead to overfitting (i.e., good performance on the training data but poor performance on novel data).
- How can we adjust the complexity of the model (neither too complex nor too simple)?
- Are there principled methods for finding the best complexity?

Domain Knowledge
When there is not sufficient training data, incorporate domain knowledge:
- Model how each pattern is generated (analysis by synthesis). This is difficult (e.g., recognizing all types of chairs).
- Incorporate some knowledge about the pattern-generation method (e.g., optical character recognition (OCR) assuming characters are sequences of strokes).

Context

How m ch info mation a e we mi sing

(Despite the missing letters, you can still read the sentence: context supplies the gaps.)

Classifier Combination
Performance can be improved by using a "pool" of classifiers.
How should we combine multiple classifiers?
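Majority voting is the simplest combination rule. In the sketch below the pool members are toy threshold rules on a single feature (my own stand-ins, not trained classifiers), but the voting logic is the same for any pool.

```python
# Sketch: majority voting over a pool of classifiers. The three pool
# members are toy threshold rules on one feature, standing in for
# real trained classifiers.

def majority_vote(pool, x):
    votes = [clf(x) for clf in pool]
    return max(set(votes), key=votes.count)  # most common vote wins

pool = [
    lambda x: int(x > 2.0),
    lambda x: int(x > 3.0),
    lambda x: int(x > 4.0),
]
```

More elaborate schemes weight each classifier's vote by its estimated reliability instead of counting votes equally.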


Costs and Risks
Each classification decision is associated with a cost or risk (e.g., classification error).
- How can we incorporate knowledge about such risks?
- Can we estimate the lowest possible risk of any classifier?

Computational Complexity
How does an algorithm scale with:
- the number of feature dimensions?
- the number of patterns?
- the number of categories?
Brute-force approaches might lead to perfect classification results but usually have impractical time and memory requirements.
What is the tradeoff between computational ease and performance?

General-Purpose PR Systems
- Ability to switch rapidly and seamlessly between different pattern recognition tasks.
- It is difficult to design a device capable of performing a variety of classification tasks:
  - Different decision tasks may require different features.
  - Different features might yield different solutions.
  - Different tradeoffs (e.g., in classification error) exist for different tasks.
