
2/21/2017

CS 8121 Pattern Recognition


BE(CSE)-VIII Semester
Dept. of Computer Sc. & Engg.
BIT, Jaipur

Syllabus
Mod-I : Pattern Recognition Overview

Mod-II : Statistical Pattern Recognition

Mod-III : Supervised Learning

Mod-IV : Linear Discriminant Functions and the Discrete and Binary Feature Cases

Mod-V & VI : Syntactic Pattern Recognition

Mod-VII : Neural Pattern Recognition


Books
Text Books:
1. Robert Schalkoff - Pattern Recognition: Statistical, Structural and
Neural Approaches, John Wiley, Indian Edition.

Reference Books :
1. R. O. Duda - Pattern Classification, John Wiley, Indian Edition, 2006.

2. Bishop, Christopher - Pattern Recognition & Machine Learning, CBS, Delhi

3. E. Gose, R.J., & S.J. Pattern Recognition and Image Analysis, PHI
Learning Pvt Ltd.

4. K. Fukunaga - Introduction to Statistical Pattern Recognition, New York:
Academic Press, 1990

Text Book Analysis


Pattern Recognition: Statistical, Structural and
Neural Approaches (Author: Robert Schalkoff)

This book is divided into four parts, demonstrating the
similarities and differences among the three approaches.

Part 1: Introduction to general pattern recognition concerns (Ch1)
Part 2: Statistical Pattern Recognition (StatPR) (Ch2 to Ch5)
Part 3: Syntactic Pattern Recognition (SyntPR) (Ch6 to Ch9)
Part 4: Neural Pattern Recognition (NeurPR) (Ch10 to Ch13)


Pattern Recognition and Living Being


The most basic and essential characteristic of living beings is
the ability to segment and recognize objects, e.g.

Recognizing each character of the alphabet effortlessly within a
fraction of a second
Easily recognizing faces
Distinguishing between male and female faces
Identifying the voice of a known person heard on the phone
Understanding spoken words
Distinguishing different styles of handwriting
Distinguishing fresh food by its smell

Pattern Recognition and Living Being


Manually inspecting objects on a production line and rejecting the
pieces identified as faulty or damaged

A doctor, with the knowledge acquired through a learning process
(e.g. years of education), can make a medical diagnosis based on clinical
findings and symptoms

To analyze the working of the heart, a doctor can recognize the ECG pattern
produced by the heartbeat and detect any irregularity in it

Similarly, a mechanic watches for a misfire in the pattern produced by engine
ignition

Similarly, an expert in seismology can recognize a type of
volcanic event based on the analysis of seismic signals and their
corresponding spectra


Pattern Recognition and Living Being


Certainly, all the higher animals depend on this ability for their very
survival.

Without it, they would be unable to function even in a static,
unchanging environment.

We human beings perceive information about the surrounding
environment through the senses.

We recognize objects using a series of general concepts or patterns that we
have learned about them, together with multi-sensory information and the
cognitive ability of recognition.

In general, all processes of recognition involve a classification or an
identification of objects, persons, events or situations.

Handwritten Characters Recognition


Regular shaped Object recognition

Shape Discrimination


Texture Discrimination

Face detection and Recognition


Pattern Recognition and Computer


Through the study of PR in computer science, we
want to emulate these capabilities of living beings in
machines, in order to automate repetitive tasks and make them reliable:

a) To distinguish different objects (patterns) from one another, or an object
from its background: Segmentation/Clustering

b) To categorize the distinguished objects (patterns):
Recognition/Classification

What is Pattern Recognition?


The study of how machines can observe the environment,
learn to distinguish patterns of interest from their background, and
make sound and reasonable decisions about the categories of the patterns.

The field of pattern recognition is concerned with the automatic
discovery of regularities in data through the use of computer
algorithms and with the use of these regularities to take actions such
as classifying data into different categories.

PR is the reverse problem, i.e. we have measurements of an object
and from them we have to decide the class of that object.

E.g. in computer vision we are given an object and have to
recognize what it is.

House/Building construction example.


Abstract Representation of PR mapping

PR problem (StatPR and SyntPR):

Given measurements mi, we look for a method to
identify and invert the mappings M and Gi for all i.
Unfortunately, these mappings are generally not functions, and even when
they are, they are not one-to-one and onto ==> they are not invertible.
Different patterns may have the same measurements
==> ambiguity.
M reflects our view of the world: good measurements
are more likely to produce good classification.
Patterns from the same class are close in the P space.
Measurements from the same class are (often) not close
in the F space. Example: red and blue cars are close in
P, while the colors red and blue are far apart in F.
100% correct classification may not be feasible.


Definitions from the literature


"The assignment of a physical object or event to one of several
pre-specified categories" (Duda and Hart)

"A problem of estimating density functions in a high-dimensional space
and dividing the space into the regions of categories or classes"
(Fukunaga)

"Given some examples of complex signals and the correct decisions for
them, make decisions automatically for a stream of future examples"
(Ripley)

"The science that concerns the description or classification
(recognition) of measurements" (Schalkoff)

"The process of giving names w (class) to observations x" (Schürmann)

"Pattern Recognition is concerned with answering the question 'What is
this?'" (Morse)

PR in Scientific and Engineering discipline:


In the scientific discipline, we develop PR techniques based on mathematics
and logic for the description or classification of objects
represented in terms of features with measurements.

Before 1960, PR was mostly the output of theoretical research in
statistics.

In the engineering discipline, we develop a PR system by understanding the
underlying models and techniques and their respective limitations, which
are fundamental to the design of a PR system.

PR system design forces the engineer to consider trade-offs between
exact and approximate solutions. In fact, we carry out the complete
study from feasibility to implementation, e.g. computational
issues related to practical or even real-time implementation.

"Pattern recognition has its origins in engineering, whereas machine
learning grew out of computer science." Bishop (2006)


Pattern Recognition Applications


S.No. | Data | Applications
1 | Graphical (image, video data), based on electromagnetic waves (can be reproduced) | Computer Vision = DIP + PR
2 | Acoustic (sound, voice, speech data), based on mechanical waves (can be reproduced) | Computer Listening = ADP + PR
3 | Smell/Taste/Touch (odor, bitter, temperature data) (cannot be reproduced) | Computer smelling, taste, touch = MDP + PR
4 | General textual (char/numeric) reading (business, scientific, engineering and other data) | General data analyst = GDP + PR


Application Areas of PR
Machine Vision System (Inspector): Image Analysis
Character Recognition System (OCR) : Image Analysis
Computer aided diagnosis
Speech and Audio recognition (NLP)
Data mining and Knowledge discovery (Information Retrieval)
Biometrics: Faces, Iris, fingerprints, handwriting etc
Bioinformatics : DNA
Seismic analysis: Volcanic eruptions, earthquakes, etc.
Radar signal classification and analysis
Medical domain: ECG, medical diagnosis
Remote sensing: Weather forecasting, estimation of glacier
melting, etc.
In general, extracting the hidden pattern and trend.


What is a Pattern?
A pattern is an abstract entity.

It is the opposite of chaos: an entity, vaguely defined, that could be given
a name. For example, a pattern could be
A fingerprint image
A handwritten cursive word
A human face
A speech signal

A pattern is a composite of traits or features characteristic of an individual,
e.g. a set of measurements, often in vector form (StatPR) or graph/
grammar form (SyntPR).

In classification tasks, a pattern is a pair of variables {x, w} where
- x is a collection of observations or features (feature vector)
- w is the concept behind the observation (label)

Examples of patterns


Hand written digit recognition

What is a Feature?

A feature is any distinctive aspect, quality or
characteristic.

Features may be symbolic (e.g. color),
numeric (e.g. size) or complex (primitives, i.e.
building blocks).


Feature Vector and space


The combination of d features is a d-dimensional column vector called a feature vector

The d-dimensional space defined by the feature vector is called the feature space:
- the whole of R^d if the features are unconstrained
- a subspace of R^d if the features are constrained

Objects are represented as points in feature space; the result is a scatter plot
Feature vectors are used in StatPR and NeurPR

Classification


Feature Extraction
Features are measurements extracted from data; extracting them may require
significant computational effort (e.g., extracting shape
properties of 3D objects)
Extracted features may be noisy, i.e. may contain errors
The quality of a feature vector is related to its ability to
discriminate examples from different classes

Examples from the same class should have similar feature values,
while examples from different classes should have different feature values

Feature Selection
Selection of features from the set of available
features, based on:
computational feasibility
good discriminative power
good descriptive power


Pattern Distortion
Measurements may be noisy: color varies with
lighting, shape varies with viewing angle, etc.

Features should be invariant to such changes.

In image processing PR applications, we often seek
recognition of objects that may appear at an
arbitrary position (translated), angular orientation
(rotated) and size (scaled). Thus, RST-invariant features
are desired.

RST Invariant Feature extraction

Figure: Example of 2D regions for RST feature extraction.


RST-invariant moments (the well-known 7 features
based on statistical central moments)

φi = invariant to RST transforms
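
The moment formulas themselves did not survive on these slides. As a hedged illustration only, the sketch below computes the first two of the seven invariants using the standard Hu definitions (normalized central moments eta_pq, phi1 = eta20 + eta02, phi2 = (eta20 - eta02)^2 + 4 eta11^2). The function name hu_first_two and the test region are assumptions made for this example, not taken from the slides.

```python
import numpy as np

def hu_first_two(img):
    """Return (phi1, phi2), the first two Hu invariants of a 2D array.

    img is treated as a non-negative intensity map; a binary region works too.
    """
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xbar = (xs * img).sum() / m00   # centroid gives translation invariance
    ybar = (ys * img).sum() / m00

    def eta(p, q):
        # Central moment mu_pq, normalized by m00 for scale invariance.
        mu = (((xs - xbar) ** p) * ((ys - ybar) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2.0)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# Illustrative region: a 10 x 20 rectangle of ones on a 40 x 40 grid.
region = np.zeros((40, 40))
region[5:15, 8:28] = 1.0

shifted = np.roll(region, (7, 3), axis=(0, 1))   # translated copy
rotated = np.rot90(region)                       # rotated by 90 degrees
scaled = np.kron(region, np.ones((2, 2)))        # each pixel doubled in size

for name, r in [("original", region), ("shifted", shifted),
                ("rotated", rotated), ("scaled", scaled)]:
    print(name, hu_first_two(r))
```

On a pixel grid the scale invariance is only approximate, so the scaled copy agrees with the original to within a small relative error rather than exactly.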


Some definitions
Recognition: The ability to classify. In PR problems, a "don't
know" class may be added as a dummy (c+1)-th class
Classification: Assigns input data to one or more of c pre-specified
classes based on the extraction of significant features or
attributes and the analysis of these attributes
Description: An alternative to classification, used where a structural
description of the input pattern is desired
Pattern class: A set of patterns known to originate from
the same source in C
Noise: Results from non-ideal circumstances, such as
Distortion in the input pattern (measurement errors)
Errors in preprocessing
Feature extraction errors
Training data errors

Some other definitions


Decision region:
A class-labeled partition of the feature space R^d made by the
classifier.
For a possible and unique class assignment, these regions
must cover R^d and be disjoint (non-overlapping).
The border between these regions is a decision boundary.

Fig: Sample decision regions


a) Linear (piecewise)
b) Quadratic (hyperbolic)
c) (relatively) General


Some other definitions


Discriminant function
A function used together with a decision rule in a classifier to assign a class
label to a pattern is called a discriminant function. It is denoted
gi(X), where i = 1, 2, ..., c (c = number of classes).

Classifier with decision rule:

A classifier is a set of discriminant
functions.
Using the decision rule, it assigns the
class wm to a pattern X.

Decision Rule:
Assign X to class wm if gm(X) > gi(X) for all i = 1, 2, ..., c
with i =/= m

Minimum distance classifier:

Discriminant functions
g1(X) = || X - X1 ||
g2(X) = || X - X2 ||
(X1 and X2 are the class prototypes; since gi is a distance, the smaller value wins.)

Corresponding partition
of R^2

Decision Rule:
X belongs to class R1 if g1(X) < g2(X)
X is on the decision boundary if g1(X) = g2(X)
X belongs to class R2 otherwise
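
The decision rule above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and two made-up prototype points; the function name min_distance_classify is hypothetical, and the sketch breaks ties on the decision boundary in favour of the first class.

```python
import numpy as np

def min_distance_classify(x, prototypes):
    """Return the 0-based index of the prototype nearest to x.

    g_i(x) = ||x - X_i|| is evaluated for every prototype and the smallest
    value wins. Ties (points on the decision boundary) go to the lowest index.
    """
    x = np.asarray(x, dtype=float)
    g = np.linalg.norm(np.asarray(prototypes, dtype=float) - x, axis=1)
    return int(np.argmin(g))

# Hypothetical prototypes X1 and X2 in R^2.
X1 = [0.0, 0.0]
X2 = [4.0, 0.0]

print(min_distance_classify([1.0, 2.0], [X1, X2]))  # nearer to X1
print(min_distance_classify([3.0, 1.0], [X1, X2]))  # nearer to X2
```

With two prototypes the boundary is the perpendicular bisector of the segment joining them, which is why the resulting decision regions are linear.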


Pattern Recognition System


The design of a pattern recognition system essentially involves
the following:

Data acquisition and sensing:
Measurements of physical variables
Important issues: bandwidth, resolution, sensitivity,
distortion, SNR, latency, etc.

Pre-processing (segmentation):
Removal of noise in data
Isolation of patterns of interest from the background

Feature extraction and selection:
Finding a new representation in terms of features

Pattern Recognition System


Model learning:
Learning a mapping between features and pattern groups
and categories

Classification:
Using features and learned models to assign a pattern to a
category

Post-processing:
Evaluation of confidence in decisions
Exploitation of context to improve performance
Combination of experts


Pattern Recognition System: Process diagram

Two Modes of a PR system

Classification mode:
test pattern -> Preprocessing -> Feature Measurement -> Classification

Training mode:
training pattern -> Preprocessing -> Feature Extraction/Selection -> Learning


Example: PR system for Fish Sorting


A fish processing plant wants to automate the process of sorting incoming fish according
to species (salmon and sea bass)

The automation system consists of
a conveyor belt for incoming products
two conveyor belts for sorted products
a pick-and-place robotic arm
a vision system with an overhead camera
a computer to analyze images and control the robot arm

Fish species


It is clear that the populations of salmon and sea bass are indeed distinct.
The space of all fish is quite large. Each dimension is defined by some property of
the fish, most of which we cannot even measure with the camera.

When we choose a set of possible features, we are projecting this
very high-dimensional space down into a lower-dimensional space.


We build a model of each phenomenon we want to classify, which is
an approximate representation given the features we've selected.

PR system for Fish Sorting

Once a feature selection or
classification procedure finds a
proper representation, a classifier
can be designed using a number of
possible approaches.

In practice, the choice of a classifier
is a difficult problem, and it is often
based on which classifier(s) happen
to be available or best known to the
user.


Machine Learning
Programming computers to use example data or
past experience.

Well-Posed Learning Problems

A computer program is said to learn from experience E
with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by
P, improves with experience E.


PR system for Fish Sorting


Sensor
The vision system captures an image as a new fish enters the sorting area

Preprocessing
Adjust for lighting conditions, position of the fish on the conveyor belt, camera noise,
etc., and apply segmentation to separate the fish from the background

Feature extraction and Selection

-What kind of information can distinguish one species from the other?
e.g. length, width, weight, number and shape of fins, tail shape, etc.

-According to a fisherman, a sea bass is on average longer
than a salmon.

-So, we can use length as a feature and sort the sea bass and salmon
according to a threshold on length.

-So, from the segmented image we estimate the length of each fish

PR system for Fish Sorting


Classification and setting the threshold length
Collect a set of examples from both species
Compute the distribution of lengths for both classes
Determine the decision boundary (threshold length l*) that minimizes the
classification error
We estimate the classifier's probability of error and obtain a discouraging
result of 40%

Fig: Histograms of the length feature for
the two types of fish in the training samples,
used to decide the threshold length (decision boundary)
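
The threshold search described above can be sketched as follows. This is a minimal illustration, assuming the rule "length > l* means sea bass" and a handful of made-up lengths; the function name best_threshold and the data are hypothetical and do not reproduce the 40% error quoted on the slide.

```python
import numpy as np

def best_threshold(salmon, sea_bass):
    """Pick l* minimizing training errors for the rule: length > l* -> sea bass.

    Candidate thresholds are the midpoints between neighbouring sorted samples.
    Returns (l_star, training_error_rate).
    """
    salmon = np.asarray(salmon, dtype=float)
    sea_bass = np.asarray(sea_bass, dtype=float)
    values = np.sort(np.concatenate([salmon, sea_bass]))
    candidates = (values[:-1] + values[1:]) / 2.0
    # Errors: salmon above the threshold plus sea bass at or below it.
    errors = [np.sum(salmon > t) + np.sum(sea_bass <= t) for t in candidates]
    k = int(np.argmin(errors))
    n = salmon.size + sea_bass.size
    return candidates[k], errors[k] / n

# Made-up lengths in cm, purely illustrative.
salmon = [40, 45, 50, 55, 62]
sea_bass = [48, 58, 60, 65, 70]

l_star, train_error = best_threshold(salmon, sea_bass)
print(l_star, train_error)
```

Because the two length distributions overlap, no threshold separates these samples perfectly, which mirrors the point the histogram makes.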


Improving the performance of our PR system


Even though sea bass are longer than salmon on average, there are many
examples of fish where this observation does not hold.

Determined to achieve a recognition rate of 95%, we try a number of features
(width, area, position of the eyes w.r.t. the mouth, etc.)
only to find out that these features contain no discriminatory information

Finally we find a good feature: the average intensity (lightness) of the scales

Fig: Histograms of the intensity feature for
the two types of fish in the training samples,
used to decide the threshold intensity

It looks easier to choose the threshold x*,
but we still cannot make a perfect decision.

Improving the performance of our PR system


Multiple features option:

Recall our observation that sea bass are typically longer than
salmon.

So, we combine length and average intensity of the scales to
improve separability.

We can use two features in our decision:
Intensity: x1
Length: x2

Each fish image is now represented as a point (feature vector) in a
two-dimensional feature space.

We compute a linear discriminant function to separate the two classes,
and obtain a classification rate of 95.7%
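
The slides do not say how the linear discriminant was computed. One common choice is Fisher's linear discriminant, sketched below under that assumption on synthetic data. The class means, spreads, sample sizes and the name fit_fisher are made up for illustration, so the resulting accuracy is not the 95.7% quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic feature vectors (x1 = intensity, x2 = length); values are invented.
salmon = rng.normal(loc=[2.0, 10.0], scale=1.0, size=(100, 2))
sea_bass = rng.normal(loc=[6.0, 14.0], scale=1.0, size=(100, 2))

def fit_fisher(a, b):
    """Fisher's linear discriminant: g(x) = w.x + w0, with g(x) > 0 -> class b."""
    ma, mb = a.mean(axis=0), b.mean(axis=0)
    # Within-class scatter matrix (sum of the two class scatter matrices).
    sw = (np.cov(a, rowvar=False) * (len(a) - 1)
          + np.cov(b, rowvar=False) * (len(b) - 1))
    w = np.linalg.solve(sw, mb - ma)
    w0 = -w @ (ma + mb) / 2.0   # boundary passes midway between the class means
    return w, w0

w, w0 = fit_fisher(salmon, sea_bass)

X = np.vstack([salmon, sea_bass])
truth = np.r_[np.zeros(len(salmon), dtype=bool), np.ones(len(sea_bass), dtype=bool)]
pred_bass = X @ w + w0 > 0
accuracy = np.mean(pred_bass == truth)
print(w, w0, accuracy)
```

The line w.x + w0 = 0 is the decision boundary in the two-dimensional feature space: points on one side are labelled sea bass, points on the other salmon.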


Improving the performance of our PR system


Scatter plot of lightness and width features for training samples.
We can draw a decision boundary to divide the feature space into
two regions.

Decision rule:
Classify the fish as a sea bass
if its feature vector falls above
the decision boundary shown,
and as salmon otherwise

Improving the performance of our PR system

Cost vs. classification rate


Our linear classifier was designed to minimize the
overall misclassification rate
Is the misclassification rate the best objective function
for our fish processing plant?
The cost of misclassifying salmon as sea bass is that the end
customer will occasionally find a tasty piece of salmon when
he purchases sea bass
The cost of misclassifying sea bass as salmon is an end
customer upset to find a piece of sea bass purchased
at the price of salmon


Improving the performance of our PR system


Intuitively, we could adjust the decision boundary to
minimize this cost function.
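
To make the idea of shifting the boundary concrete, here is a hedged one-feature sketch: each error type is weighted by a cost and the threshold with the lowest total cost is chosen. The function name min_cost_threshold, the feature values and the cost figures (1 and 5) are all hypothetical.

```python
import numpy as np

def min_cost_threshold(salmon, sea_bass, cost_salmon_as_bass, cost_bass_as_salmon):
    """Threshold t for the rule 'feature > t -> sea bass' with the lowest total cost."""
    salmon = np.asarray(salmon, dtype=float)
    sea_bass = np.asarray(sea_bass, dtype=float)
    values = np.sort(np.concatenate([salmon, sea_bass]))
    candidates = (values[:-1] + values[1:]) / 2.0
    costs = [cost_salmon_as_bass * np.sum(salmon > t)
             + cost_bass_as_salmon * np.sum(sea_bass <= t)
             for t in candidates]
    k = int(np.argmin(costs))
    return candidates[k], costs[k]

# Made-up feature values, purely illustrative.
salmon = [40, 45, 50, 55, 62]
sea_bass = [48, 58, 60, 65, 70]

# Equal costs reduce to minimizing the number of errors.
t_equal, _ = min_cost_threshold(salmon, sea_bass, 1.0, 1.0)
# Selling sea bass as salmon is assumed to be five times as costly.
t_weighted, _ = min_cost_threshold(salmon, sea_bass, 1.0, 5.0)
print(t_equal, t_weighted)
```

With the asymmetric costs the threshold moves down, so fewer sea bass are labelled salmon at the price of more salmon being labelled sea bass.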

Is any further improvement possible?

Improving the performance of our PR system


The recognition rate of our linear classifier (95.7%) met the design
specs, but we still think we can improve the performance of the
system.

We then design an ANN and obtain an impressive classification rate
of 99.99% with the following decision boundary.

More complex models result in
more complex boundaries.

Satisfied with our classifier, we
integrate the system and deploy
it to the fish processing plant.


Improving the performance of our PR system

After a few days, the plant manager
calls to complain that the system is
misclassifying an average of 25% of
the fish.

What went wrong?

Improving the performance of our PR system


Problem: Missing the issue of generalization
Simple decision boundaries (e.g. linear) seem to miss
some obvious trends in the data: Bias
Complex decision boundaries seem to lock onto the
idiosyncrasies of the training data set: Variance
A central issue in pattern recognition is to build classifiers
that can work properly on novel query data. Hence,
generalization is key.
We may distinguish training samples perfectly, but can we
predict how well our classifier will generalize to novel data,
i.e. to unknown samples?


Generalization
A good classifier should be able to generalize, i.e. perform well on unseen
data

The classifier should capture the underlying characteristics of the
categories

The classifier should NOT be tuned to the specific (accidental)
characteristics of the training data

Training data in practice contain some noise

As a consequence:
We are better off with a slightly poorer performance on the training
examples, if this means that our classifier will have better performance
on novel patterns.

Generalization

The decision boundary shown may represent the optimal tradeoff between
accuracy on the training set and on new patterns


Tradeoff between performance on training and novel
examples:
How can we determine automatically when the optimal tradeoff
has been reached?

Evaluation of the classifier on novel data is important to
avoid overfitting
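
A small experiment shows why the training set alone is a misleading yardstick. The sketch below uses a 1-nearest-neighbour classifier (chosen here only because it memorizes its training data, not because the slides use it) on two overlapping synthetic classes, and compares the error on the training samples with the error on freshly drawn samples. All names and distribution parameters are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Draw n points per class from two overlapping 2D Gaussians."""
    a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
    b = rng.normal(loc=[1.5, 1.5], scale=1.0, size=(n, 2))
    X = np.vstack([a, b])
    y = np.r_[np.zeros(n, dtype=int), np.ones(n, dtype=int)]
    return X, y

def nn_predict(X_train, y_train, X):
    """1-nearest-neighbour prediction: copy the label of the closest training point."""
    d = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]

X_train, y_train = sample(100)
X_test, y_test = sample(100)   # novel data, never seen during training

train_error = np.mean(nn_predict(X_train, y_train, X_train) != y_train)
test_error = np.mean(nn_predict(X_train, y_train, X_test) != y_test)
print(train_error, test_error)
```

The training error is zero because every training point is its own nearest neighbour, yet the classes overlap, so the error on the novel samples is clearly higher. Only the held-out figure says anything about generalization.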

Summarizing our example:

Classifier design is the task of recovering (approximating)
the model that generated the patterns (generally expressed
in terms of probability densities).

Given a new vector of feature values, the classifier needs to
determine the corresponding probability for each of the
possible categories.


Performance evaluation:
Classification error rate (Pe): The percentage of
misclassified test samples, i.e. new patterns that are
assigned to the wrong class, is taken as an estimate
of the error rate.

Risk: Total expected cost. Can we estimate the
lowest possible risk of any classifier, to see how
close ours comes to this ideal?

How should the available samples be split to form
training and test sets?

Bayesian Decision Theory: considers the ideal case in which the
probability structure underlying the classes is known perfectly. This is
rarely true in practice, but it allows us to determine the optimal
classifier, against which we can compare all other classifiers.

Class-conditional probability density functions: represent
the probability density of measuring a certain value x given that the
pattern is in a certain class
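
When the class-conditional densities and priors are known, Bayes' rule turns them into posterior probabilities, and picking the largest posterior gives the minimum-error decision. The sketch below assumes one-dimensional Gaussian class-conditional densities with made-up priors, means and standard deviations; the function names and numbers are illustrative only.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posteriors(x, classes):
    """Bayes' rule: P(class | x) = p(x | class) P(class) / p(x).

    classes maps a class name to (prior, mean, std).
    """
    joint = {name: prior * gaussian_pdf(x, mean, std)
             for name, (prior, mean, std) in classes.items()}
    evidence = sum(joint.values())   # p(x), the normalizing constant
    return {name: value / evidence for name, value in joint.items()}

# Hypothetical length model in cm: equal priors, equal spread.
classes = {"salmon": (0.5, 50.0, 5.0), "sea bass": (0.5, 60.0, 5.0)}

post = posteriors(52.0, classes)
decision = max(post, key=post.get)   # class with the largest posterior
print(post, decision)
```

With equal priors and equal spreads the decision boundary falls midway between the two means, so the posteriors are equal at x = 55.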


Procedure for PR system engineering

Pattern Recognition Approaches


The three best known approaches
statistical based
syntactic or structural matching based
neural networks based


Approaches of PR
Statistical
-Patterns are classified based on an underlying statistical model of the features
-The statistical model is defined by a family of class-conditional probability
density functions p(x|wi) (probability of feature vector x given class wi)
Neural
-Classification is based on the response of a network of processing units (neurons) to an
input stimulus (pattern)
-Knowledge is stored in the connectivity and strength of the synaptic weights
-Trainable, non-algorithmic, black-box strategy
-Very attractive since it requires minimal a priori knowledge
-With enough layers and neurons, ANNs can create any complex decision region
Syntactic
-Patterns are classified based on measures of structural similarity
-Knowledge is represented by means of formal grammars or relational descriptions
(graphs)
-Used not only for classification, but also for description
-Typically, syntactic approaches formulate hierarchical descriptions of complex patterns
built up from simpler sub-patterns

Neural, Stat and Structural approach for OCR


Comparisons of approaches

Other approach: Reasoning Driven


In this approach, the objective is to infer or derive a set of general
rules from the labeled training data to classify the objects.

We have a reduced-order version of this approach in the graphical
approach to SyntPR.

The AI-based reasoning approach to PR can be observed in the case of an
incomplete pattern showing only a small portion of an object.

E.g. a human observer (through a somewhat difficult to quantify
inference process) recognizes a partial pattern and also completes it.


Thanks
