
Pattern Classification and Dimensionality Reduction : An Overview
Dr. D S Guru
dsg@compsci.uni-mysore.ac.in
Department of Studies in Computer Science,
University of Mysore, Manasagangothri,
Mysore - 570 006, INDIA.
Classification
Given are
the descriptions of n classes of objects,
and an unknown object X,
(Learning : Cognition)
the task is to
identify the class label of X.
(Re-learning : Recognition)
Application : Male or Female
Classification
Male Female
Application : Character Recognition
Classification
Hello
In this case, there are 26 classes: A, B, …, Z.
Application : Medical diagnostics
Cancer Not Cancer
classification
Application : Speech
Speech input
Speaker recognition (Who) : Identification / Verification
Speech recognition (What, How)
Classification
Techniques to recognize or describe
What : Unknown Pattern / Instance
How : By means of measured properties called features.
Thus,
Classification = Data Acquisition + Data Analysis
A formal definition :
A mapping from the pattern space P to the set of class labels I, i.e., I_j = T(P_i) : each pattern P_i is mapped to its class label I_j.
Stages in Classification
Delineation
Feature Extraction
Descriptive features
Discriminating features
Representation (Knowledge base creation)
Labeling
Feature Extraction :
Feature : An extractable measurement.
Why ? : For Discrimination.
What Feature ? : Depends on purpose of classification.
How many ? : Depends on Qualities of the System.
When ? : 1. Cognition (Training)
2. Recognition (Classification)
How ? : ??!!!
Feature Extraction
A fundamental step in Classification
Influences the performance and simplicity of the
classifier
Refers to defining new features which are functions
of the original features
Depends on application domain and purpose of
classification (i.e., Label)
Representation Problem
Features :
Text based features, e.g., keywords
Visual features :
General, e.g., colour, texture
Domain specific
Shape, Spatial
Features
Qualitative
( Eg. Intelligent, smart, beautiful, liking).
Quantitative (Numeric)
Crisp : Single eg. 10 cm.
Fuzzy: eg. Around 10AM
Interval-Valued: [a..b]
Multivalued: [a1, a2, …, an]
Single Categorical Value eg. Town=Madurai
Multivalued With Weightage
Data with logical dependency
Fish Sorting
Classifier
Fish
Image
Fish Species
Sword
Fish
Golden Fish
Conveyer belt
Classifier Design
Fish Length as a Discriminating Factor
Length (cm)  :  35  40  45  50  55  60  100  200  350  400  425  450  475  500
Golden Fish  :  20  33  48  55  40  28    0    0    0    0    0    0    0    0
Sword Fish   :   0   0   0   0   0   0    0    0   10   25   40   54   32   15
Find the best length threshold.
Golden Fish vs. Sword Fish
Threshold Selection :
FishClass Label = Golden fish  if fish length <  Threshold
                = Sword fish   if fish length >= Threshold
Classifying a new sample
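A minimal Python sketch (not from the slides) of such a threshold classifier, using the histogram data above; treating the listed lengths as candidate thresholds is an illustrative assumption:

```python
# Minimal sketch (illustration only): pick the length threshold that
# minimises training error on the histogram data shown above.
lengths      = [35, 40, 45, 50, 55, 60, 100, 200, 350, 400, 425, 450, 475, 500]
golden_count = [20, 33, 48, 55, 40, 28,   0,   0,   0,   0,   0,   0,   0,   0]
sword_count  = [ 0,  0,  0,  0,  0,  0,   0,   0,  10,  25,  40,  54,  32,  15]

def training_error(threshold):
    # Golden fish predicted when length < threshold, Sword fish otherwise.
    errors = 0
    for length, g, s in zip(lengths, golden_count, sword_count):
        errors += g if length >= threshold else 0   # golden misread as sword
        errors += s if length < threshold else 0    # sword misread as golden
    return errors

best = min(lengths, key=training_error)
print("best threshold:", best, "cm, errors:", training_error(best))

def classify(length, threshold=best):
    return "Golden fish" if length < threshold else "Sword fish"

print(classify(48), classify(430))
```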
Classifier Design : Sea bass vs. Salmon
Use fish length as a feature.
Threshold Selection :
FishClass Label = Seabass fish  if fish length <  Threshold
                = Salmon fish   if fish length >= Threshold
Classifying a new sample
Classifiers
Nearest Neighbor Classifier schematic
For a test instance,
1) Calculate distances from training pts.
2) Find the first nearest neighbor
3) Assign class label of the first neighbor
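A minimal sketch (illustration only) of this 1-NN rule with numpy; the toy training points and labels are made up:

```python
# Minimal sketch (illustration only): nearest-neighbour classification with numpy.
import numpy as np

train_X = np.array([[1.0, 1.0], [2.0, 1.5], [6.0, 6.0], [7.0, 6.5]])  # toy training points
train_y = np.array(["golden", "golden", "sword", "sword"])            # their class labels

def nn_classify(x):
    dists = np.linalg.norm(train_X - x, axis=1)  # step 1: distances to all training points
    nearest = np.argmin(dists)                   # step 2: first nearest neighbour
    return train_y[nearest]                      # step 3: copy its class label

print(nn_classify(np.array([1.5, 1.2])))  # -> "golden"
```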
K-NN classifier schematic
For a test instance,
1) Calculate distances from training pts.
2) Find K-nearest neighbours (say, K = 3)
3) Assign class label based on majority
How good is it?
Susceptible to noisy values
Slow because of distance calculation
Alternate approaches:
Distances to representative points only
Partial distance
How to determine value of K?
Determine K experimentally. The K that gives minimum
error is selected.
K-NN classifier Issues
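A minimal sketch (illustration only) of a K-NN classifier and of determining K experimentally, here by leave-one-out error on a made-up toy data set:

```python
# Minimal sketch (illustration only): choose K for a K-NN classifier by
# picking the value with the lowest leave-one-out error on the training set.
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    dists = np.linalg.norm(train_X - x, axis=1)   # distances from the test instance
    nearest = np.argsort(dists)[:k]               # indices of the K nearest neighbours
    votes = Counter(train_y[i] for i in nearest)  # majority vote on their labels
    return votes.most_common(1)[0][0]

def loo_error(train_X, train_y, k):
    # leave-one-out: classify each training point using all the others
    errors = 0
    for i in range(len(train_X)):
        mask = np.arange(len(train_X)) != i
        pred = knn_predict(train_X[mask], train_y[mask], train_X[i], k)
        errors += (pred != train_y[i])
    return errors / len(train_X)

# toy 2-D data: two hypothetical classes
X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
best_k = min([1, 3, 5], key=lambda k: loo_error(X, y, k))
print("chosen K:", best_k, "prediction for (2,2):", knn_predict(X, y, np.array([2.0, 2.0]), best_k))
```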
Support Vector Machines
Support Vector Machine (SVM) Classification
Classification as a problem of finding
optimal (canonical) linear hyperplanes.
Optimal Linear Separating Hyperplanes:
In Input Space
In Kernel Space
Can be non-linear
Linear Separating Hyper-Planes
How many lines can separate these points? Not just one!
Which line should we use?
Calculating the Margin of a Classifier
P0: Any separating hyperplane
P1: Parallel to P0, passing through
closest point in one class
P2: Parallel to P0, passing through
point closest to the opposite class
Margin (M): distance measured along
a line perpendicular to P1 and P2
Different P0s have Different Margins
How Do SVMs Choose the Optimal Separating
Hyperplane (boundary)?
Find the hyperplane P0 that maximizes the margin!
Margin (M): distance measured along a line perpendicular to P1 and P2
margin (M) = 2 / ||w||
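An illustrative sketch (assuming scikit-learn is available; not part of the slides) that fits a linear SVM on toy data and recovers the margin M = 2 / ||w|| from the learned weight vector:

```python
# Illustrative sketch: linear SVM on toy data, margin read off the learned weights.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]], dtype=float)  # two toy classes
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # large C approximates the hard-margin case
clf.fit(X, y)

w = clf.coef_[0]                      # normal vector of the separating hyperplane
margin = 2.0 / np.linalg.norm(w)      # M = 2 / ||w||
print("margin:", margin)
print("support vectors:\n", clf.support_vectors_)
```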
Neural Network
Classifiers
Linear Classifier
Non Linear Classifier
Parametric Classifier
Non- Parametric Classifier
Hierarchical Classifier
Adaptive Classifier
Examples (1) :
Patterns :
A B C D E F
Features?
Line and Curve Segments
Knowledge Acquired
Object   0   45   90   145   Top semicircle   Bottom semicircle   Left semicircle   Right semicircle
A        1    1    0     1          0                  0                  0                  0
B        0    0    1     0          0                  0                  0                  2
C        0    0    0     0          0                  0                  1                  0
D        0    0    1     0          0                  0                  0                  1
E        3    0    1     0          0                  0                  0                  0
F        2    0    1     0          0                  0                  0                  0
Recognition
Given a test pattern : ^
Object   0   45   90   145   Top semicircle   Bottom semicircle   Left semicircle   Right semicircle
^        0    1    0     1          0                  0                  0                  0
Dist with A = 01    Dist with B = 07
Dist with C = 03    Dist with D = 04
Dist with E = 12    Dist with F = 07
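The distances shown are reproduced by the squared Euclidean distance between feature vectors; a minimal sketch (illustration only):

```python
# Minimal sketch (illustration only): squared Euclidean distance of the test
# pattern against the knowledge base rows from the table above.
import numpy as np

kb = {  # features: lines at 0, 45, 90, 145; top/bottom/left/right semicircles
    "A": [1, 1, 0, 1, 0, 0, 0, 0],
    "B": [0, 0, 1, 0, 0, 0, 0, 2],
    "C": [0, 0, 0, 0, 0, 0, 1, 0],
    "D": [0, 0, 1, 0, 0, 0, 0, 1],
    "E": [3, 0, 1, 0, 0, 0, 0, 0],
    "F": [2, 0, 1, 0, 0, 0, 0, 0],
}
test = np.array([0, 1, 0, 1, 0, 0, 0, 0])          # the test pattern ^

dists = {label: int(np.sum((np.array(v) - test) ** 2)) for label, v in kb.items()}
print(dists)                                        # A:1, B:7, C:3, D:4, E:12, F:7
print("classified as:", min(dists, key=dists.get))  # A (smallest distance)
```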
Example (2)
Patterns:
not a straight line
circle
straight line
Given points (x1, y1), (x2, y2), (x3, y3), …, (xn, yn) :
Variance(x)       = c11 = (1/n) Σᵢ (xᵢ − x̄)²
Variance(y)       = c22 = (1/n) Σᵢ (yᵢ − ȳ)²
Co-variance(x, y) = c12 = c21 = (1/n) Σᵢ (xᵢ − x̄)(yᵢ − ȳ)
where ȳ = (1/n) Σᵢ yᵢ
Construct a matrix
C = | c11  c12 |
    | c21  c22 |
Eigenvalues as Features
Given a set of points B = {pᵢ | pᵢ = (xᵢ, yᵢ) ∈ Z², i = 1, 2, 3, …, n},
where x̄ = (1/n) Σᵢ xᵢ,
Small eigenvalue : S = (1/2) [ (c11 + c22) − sqrt( (c11 − c22)² + 4 c12² ) ]
Large eigenvalue : L = (1/2) [ (c11 + c22) + sqrt( (c11 − c22)² + 4 c12² ) ]
Compute eigenvalues  : solve for λ in | C − λI | = 0
Compute eigenvectors : solve for the eigenvector V in CV = λV
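An illustrative numpy sketch (not from the slides) of this eigenvalue computation on synthetic points, anticipating the training results tabulated below:

```python
# Illustrative sketch: eigenvalues of the variance-covariance matrix C for
# points sampled from a straight line and from a circle.
import numpy as np

def eigen_features(points):
    pts = np.asarray(points, dtype=float)
    C = np.cov(pts.T, bias=True)          # 2x2 matrix [[c11, c12], [c21, c22]], 1/n normalisation
    small, large = np.sort(np.linalg.eigvalsh(C))
    return small, large

t = np.linspace(0.0, 10.0, 50)
line = np.c_[t, 2 * t + 1]                              # y = 2x + 1
theta = np.linspace(0.0, 2 * np.pi, 60, endpoint=False)
circle = np.c_[10 * np.cos(theta), 10 * np.sin(theta)]  # x^2 + y^2 = 10^2

print("line  :", eigen_features(line))    # small eigenvalue ~ 0
print("circle:", eigen_features(circle))  # both eigenvalues roughly equal
```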
Supervised Training : Line (y = mx + c)
Orientation (θ)   Length   Large eigenvalue   Small eigenvalue
  0                 50        50.54              0.0
 10                 60        60.34              0.0
 27                 70        70.45              0.0
 30                 55        55.62              0.0
Supervised Training : Circle (x² + y² = r²)
Radius (r)   Large eigenvalue   Small eigenvalue
 10              52.0727            52.0727
 20             201.98             201.98
 30             452.57             452.5749
 45            1016.90            1016.90
 90            4074.90            4074.90
Supervised Training : Angle (y = tan⁻¹(90 − θ/2) · |x|)
θ      Large eigenvalue   Small eigenvalue
 40      1.56 × 10³          148.8766
 60        851.3714          0.83 × 10³
 90        843.9310          605.5014
100        843.116           208.85
Knowledge Acquired
Straight Line :
* Small eigenvalue is zero and
* Large eigenvalue is proportional to the length of the line
Circle :
* Both eigenvalues are equal
Angle :
* Eigenvalues are different
* Small eigenvalue is relatively large.
KB : Created through Supervised Learning
Approaches to classification
Geometrical or Statistical Approach
Structural or Syntactic Approach
<STRUCTURAL DECOMPOSITION BASED METHODS>
Eg: A man is decomposed into primitives H, L, S, L, L, L ; an animal into primitives H, S, L, L, L, L.
Physical Image → Symbolic Image
Topology Based Methods
Shape Based
Pixel representation
Chain code representation
Polygonal approximation
Higher order Moments
Centroidal / Radial profile
Incremental Circle Transform
Axis of least inertia
etc.,
Shape based Methods:
ICT and EA : Integrated approach
Definition :
Let C(l) be a closed curve. The ICT vector of C is
ΔC(l) = (Δx(l), Δy(l)) such that
Δx²(l) + Δy²(l) = r²
and C(l + Δl) = C(l) + ΔC(l),
for some r and 0 ≤ l ≤ L.
Boundary representation scheme:
1) Compute ICT vector
2) Find first PCV
Invariant Properties:
Translation Invariant
Let C(l), 0 ≤ l ≤ L, be the boundary curve of an object and ΔC(l) = (Δx(l), Δy(l)) be the corresponding ICT vector computed with constant radius r. Let C_t(l) be the translated version of C(l) and ΔC_t(l) = (Δx_t(l), Δy_t(l)) be its corresponding ICT vector computed with the same radius r. Then, irrespective of the translation vector, the determinants of the variance-covariance matrices of ΔC(l) and ΔC_t(l) remain the same.
Rotation Invariant
Theorem:
Let C(l), 0 ≤ l ≤ L, be the boundary of an object and ΔC(l) = (Δx(l), Δy(l)) be the corresponding ICT vector computed with constant radius r. If C_r(l) is the rotated version of C(l) and ΔC_r(l) = (Δx_r(l), Δy_r(l)) is its corresponding ICT vector computed with the same radius r, then, irrespective of the rotation angle θ, the determinants of the variance-covariance matrices of ΔC(l) and ΔC_r(l) remain the same.
Corollary: The eigenvalues of the variance-covariance matrix of the ICT vector of the boundary of a given object are rotational invariants.
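An illustrative numerical check of the linear-algebra fact behind this corollary, using generic 2-D vectors rather than actual ICT vectors (assumption: numpy available):

```python
# Illustrative check: eigenvalues of a variance-covariance matrix are unchanged
# when the underlying 2-D vectors are rotated.
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])  # toy 2-D vectors

theta = np.deg2rad(37)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
V_rot = V @ R.T                                   # rotated version of the same vectors

eig  = np.sort(np.linalg.eigvalsh(np.cov(V.T, bias=True)))
eigR = np.sort(np.linalg.eigvalsh(np.cov(V_rot.T, bias=True)))
print(np.allclose(eig, eigR))  # True: eigenvalues (and the determinant) are rotation invariant
```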
Flipping Invariant
Lemma :
Let C(l) = (x(l), y(l)), 0 ≤ l ≤ L, be the boundary curve of an object and ΔC(l) = (Δx(l), Δy(l)) be the corresponding ICT vector computed with constant radius r. If C_f(l) is the flipped version of C(l) about the Y-axis and/or the X-axis and ΔC_f(l) is its corresponding ICT vector computed with the same radius r, then the eigenvalues of the variance-covariance matrix of ΔC_f(l) are the same as those of ΔC(l).
Theorem: Let C(l) = (x(l), y(l)) be the shape curve of an object and ΔC(l) = (Δx(l), Δy(l)) be its corresponding ICT vector computed with a constant radius r. If C_f(l) is the flipped version of C(l) about an arbitrary line and ΔC_f(l) is its corresponding ICT vector computed with the same radius r, then the eigenvalues of ΔC_f(l) and ΔC(l) are one and the same.
Proposed Methodology
Algorithm: Create_Knowledge_base.
Input : S, the set of images of objects to be learnt (say n in number).
Output : Knowledge base of eigenvalues.
Method : For each image I in S do
1. Extract the boundary curve, B, using a suitable boundary extractor.
2. Compute the ICT vector, V, for the boundary B.
3. Construct the variance-covariance matrix, M, of the ICT vector V.
4. Find the largest eigenvalue, E, of the matrix M.
5. Store the eigenvalue E to represent the image I in a knowledge base, KB.
For end.
Create_Knowledge_base ends.
Algorithm: Recognition.
Input : I, the image of an object O to be recognized.
Output : Index of I if it is one of the learnt images.
Method : 1. Extract the boundary curve, B, of I.
2. Compute the ICT vector, V, for B.
3. Construct the variance-covariance matrix, M, of V.
4. Find the largest eigenvalue, E, of M.
5. Employ a binary search technique to search for E in the knowledge base KB with some threshold value and return the index.
Recognition ends.
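A hedged Python sketch of these two algorithms; extract_boundary and compute_ict_vector are hypothetical placeholders for the boundary extractor and ICT computation referred to above, and the matching threshold is an illustrative assumption:

```python
# Hedged sketch of Create_Knowledge_base and Recognition.
import bisect
import numpy as np

def largest_eigenvalue(ict_vector):
    # ict_vector: array of shape (L, 2) holding (dx(l), dy(l)) samples
    C = np.cov(np.asarray(ict_vector, dtype=float).T, bias=True)
    return float(np.max(np.linalg.eigvalsh(C)))

def create_knowledge_base(images, extract_boundary, compute_ict_vector):
    kb = sorted((largest_eigenvalue(compute_ict_vector(extract_boundary(img))), idx)
                for idx, img in enumerate(images))
    return kb  # sorted list of (eigenvalue, image index) pairs

def recognize(image, kb, extract_boundary, compute_ict_vector, threshold=1e-3):
    e = largest_eigenvalue(compute_ict_vector(extract_boundary(image)))
    pos = bisect.bisect_left(kb, (e,))            # binary search on the stored eigenvalues
    for cand in kb[max(0, pos - 1): pos + 1]:     # check the two closest stored entries
        if abs(cand[0] - e) <= threshold:
            return cand[1]                        # index of the matching learnt image
    return None                                   # not one of the learnt images
```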
36 samples of each object are considered.
No flipped version of any object is considered.
Data set 3
A set of industrial objects
Object Type Determinant Span Large eigenvalue Span Small eigenvalue Span
Model 1 0.2184 to 0.2302 0.5697 to 0.5808 0.3818 to 0.3966
Model 2 0.2214 to 0.2323 0.4939 to 0.5068 0.4457 to 0.4596
Model 3 0.2284 to 0.2360 0.5461 to 0.5578 0.4160 to 0.4255
Model 4 0.1656 to 0.1729 0.7198 to 0.7361 0.2298 to 0.2356
Model 5 0.2012 to 0.2068 0.4485 to 0.4560 0.4481 to 0.4543
Model 6 0.2264 to 0.2335 0.5319 to 0.5398 0.4253 to 0.4333
(a)
Object Type Determinant Span Large eigenvalue Span Small eigenvalue Span
Key A 0.1824 to 0.1893 0.5929 to 0.6070 0.3029 to 0.3146
key B 0.1873 to 0.1935 0.5694 to 0.5793 0.3279 to 0.3358
key C 0.1988 to 0.2044 0.5547 to 0.5639 0.3567 to 0.3636
key D 0.1920 to 0.1983 0.5828 to 0.5913 0.3288 to 0.3358
(b)
Object Type Determinant Span Large eigenvalue Span Small eigenvalue Span
Industrial Obj 1 0.1175 to 0.1207 0.7402 to 0.7442 0.1588 to 0.1628
Industrial Obj 2 0.0926 to 0.0962 0.7958 to 0.8018 0.1162 to 0.1206
Industrial Obj 3 0.1528 to 0.1554 0.7355 to 0.7401 0.2076 to 0.2103
Industrial Obj 4 0.0664 to 0.0697 0.8106 to 0.8155 0.0818 to 0.0856
Industrial Obj 5 0.0815 to 0.0854 0.8156 to 0.8276 0.0993 to 0.1039
Industrial Obj 6 0.1434 to 0.1477 0.6695 to 0.6786 0.2142 to 0.2176
Industrial Obj 7 0.1476 to 0.1515 0.6499 to 0.6561 0.2261 to 0.2312
Industrial Obj 8 0.1304 to 0.1355 0.7352 to 0.7409 0.1765 to 0.1839
Industrial Obj 9 0.0566 to 0.0601 0.7834 to 0.7918 0.0717 to 0.0760
(c)
Table 10.1 (a-c). Spans of the determinant, large eigenvalue and small eigenvalue.
MACHINE LEARNING ?!
Gaining knowledge of ..., or skill in ..., by study, practice or being taught, or through experience.
Unsupervised / Supervised
A crucial stage in Machine Perception
A COW
A COW WITH THREE LEGS AND TWO TAILS
Machine Learning through Vision?
Re Learning
The process that allows the learner to cope with reality
Cognitive process
Dimensionality Reduction
Data matrix : samples 1, 2, 3, …, m described by features 1, 2, 3, …, n.
Dimensionality Reduction : reducing m to c ; c << m,
independent of sample sequence (Classification or Cluster Analysis).
Cluster Analysis : Unsupervised Learning
Classification : Supervised Learning
Dimensionality Reduction
Compressing n features down to d features, where d << n
Advantages of DR
Reduction in memory requirement
Data analysis becomes simpler
Cluster analysis, and hence classifier design, becomes easier
Visualization becomes feasible
Time-efficient classifiers
Dimensionality Reduction Methodologies
Feature Subsetting
Feature Transformation
Feature Subsetting :
Process of choosing d features from the collection of n features.
There are 2ⁿ possible subsets.
The problem lies in choosing the best subset : an O(2ⁿ), i.e., exponential, search.
Feature Transformation :
Original features → T → Transformed features
The question : what is T?
Feature Selection Methods
Filter method
Supervised
Learning Algorithm Independent
Feature Selection criterion is required
Linear time complexity
Wrapper method
Unsupervised
Learning Algorithm dependent
No feature selection criterion is required
Quadratic complexity
The Simplest Filter Method :
Repeat
    Merge those two features for which the correlation is the highest
Until the desired level of dimensionality reduction is achieved.
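A minimal sketch (illustration only) of this filter; "merging" two features by averaging them is an assumption, since the slide does not fix how the merge is done:

```python
# Minimal sketch: repeatedly merge the two most correlated features until d remain.
import numpy as np

def correlation_filter(X, d):
    X = np.asarray(X, dtype=float)        # shape (samples, features)
    features = [X[:, j] for j in range(X.shape[1])]
    while len(features) > d:
        F = np.column_stack(features)
        corr = np.abs(np.corrcoef(F.T))   # pairwise |correlation| between features
        np.fill_diagonal(corr, -1.0)      # ignore self-correlation
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        merged = (features[i] + features[j]) / 2.0    # one simple way to "merge"
        features = [f for k, f in enumerate(features) if k not in (i, j)] + [merged]
    return np.column_stack(features)

X = np.random.default_rng(1).normal(size=(100, 5))
X[:, 3] = X[:, 0] * 2 + 0.01 * X[:, 3]    # feature 3 is nearly a copy of feature 0
print(correlation_filter(X, 3).shape)      # (100, 3)
```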
Wrapper Methods
Sequential Forward Selection (SFS)
Sequential Backward Selection (SBS)
Sequential Floating Forward Selection (SFFS)
Sequential Floating Backward Selection (SFBS)
Sequential Forward Selection (SFS)
Method of inclusion
Starts with empty set
At each step it adds the best feature, i.e., the one whose inclusion maximizes the performance of the learning algorithm.
SFS - Example
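A minimal sketch of SFS (illustration only); score is a hypothetical callback that trains and evaluates a learning algorithm on a given feature subset, and the toy scores are made up:

```python
# Minimal SFS sketch: greedily include the feature that maximizes the learner's score.
def sequential_forward_selection(all_features, d, score):
    selected = []                                   # start with the empty set
    remaining = list(all_features)
    while remaining and len(selected) < d:
        # add the single feature that maximizes performance of the learner
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# toy usage: pretend the learner's accuracy per feature subset is already known
toy_scores = {("f1",): 0.6, ("f2",): 0.7, ("f2", "f1"): 0.8, ("f2", "f3"): 0.75,
              ("f2", "f1", "f3"): 0.82}
score = lambda s: toy_scores.get(tuple(s), 0.0)
print(sequential_forward_selection(["f1", "f2", "f3"], 2, score))  # ['f2', 'f1']
```

SBS works the same way in reverse: start from all features and greedily remove the one whose elimination best preserves the learner's performance.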
Sequential Backward Selection (SBS)
Method of elimination
Starts with the set of all features
At each step it eliminates the worst feature, i.e., the one whose removal maximizes the performance of the learning algorithm.
SBS - Example
Sequential Floating Forward Selection (SFFS)
Method of inclusion and elimination
Starts with empty set
Forward selection followed by backward elimination
SFS + SBS at each step
Sequential Floating Backward Selection (SFBS)
Method of elimination and inclusion
Starts with set of all features
Backward elimination followed by forward selection
SBS + SFS at each step
Feature Transformation Techniques
Principal Component Analysis
Independent Component Analysis
Latent Semantic Indexing
Manifold Learning
Fisher Linear Discriminant Analysis
Canonical Correlation Analysis
Partial Least Square
Principal Component Analysis <Hotelling Transformation>
PCA transforms the original features f1, f2, …, fn into principal components PC1, PC2, …, PCn.
Let F be a feature matrix,
M = Covariance(F) ;
λ = Eigenvalues[M]  (i.e., |M − λI| = 0) ;
MV = λV.
Figure: samples s1, …, s4 in the (x, y) plane with principal axes pc1 and pc2, and their distances d1, …, d4.
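A minimal PCA sketch (illustration only) following the recipe above; the toy feature matrix F is made up:

```python
# Minimal PCA sketch: covariance matrix M of the feature matrix F,
# then eigenvalues/eigenvectors of M and projection onto the top components.
import numpy as np

F = np.random.default_rng(2).normal(size=(100, 3))        # 100 samples, 3 features
F[:, 2] = 0.9 * F[:, 0] + 0.1 * F[:, 2]                    # introduce correlation

M = np.cov(F, rowvar=False)                 # M = Covariance(F)
eigvals, V = np.linalg.eigh(M)              # solves M V = lambda V for symmetric M
order = np.argsort(eigvals)[::-1]           # sort principal components by eigenvalue
eigvals, V = eigvals[order], V[:, order]

k = 2                                        # keep the top-k principal components
F_reduced = (F - F.mean(axis=0)) @ V[:, :k]  # project onto PC1, PC2
print(F_reduced.shape)                       # (100, 2)
```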
Stereographic Projection Model
Quadratic Solver
Figure: the eight triplets (1,15,6), (1,15,7), (1,20,6), (1,20,7), (8,15,6), (8,15,7), (8,20,6), (8,20,7) plotted as points (1) to (8) in (X, Y, Z) space.
Quadratic Solver Based Model
Quadratic coefficient triplets (a, b, c) and their corresponding root pairs:
(1, -5, 6)   → (3, 2)       (2, 8, 6)     → (-3, 3/2)
(1, 8, 12)   → (-6, -2)     (4, -14, 6)   → (3, 1/2)
(1, 7, 12)   → (-4, -3)     (2, -15, 13)  → (1, 13/2)
(1, -11, 24) → (3, 8)       (1, -8, 15)   → (5, 3)
(1, -7, 10)  → (5, 2)       (5, -7, 2)    → (1, 2/5)
Figure: plot of the eight quadratics in the reduced space, with points labelled 1 to 8.
Quadratic Solver : Dimensionality Reducer
Quadratic coefficient triplets (a, b, c) and their corresponding root pairs:
(1, -5, 6)   → (3, 2)       (2, 8, 6)     → (-3, 3/2)
(1, 8, 12)   → (-6, -2)     (4, -14, 6)   → (3, 1/2)
(1, 7, 12)   → (-4, -3)     (2, -15, 13)  → (1, 13/2)
(1, -11, 24) → (3, 8)       (1, -8, 15)   → (5, 3)
(1, -7, 10)  → (5, 2)       (5, -7, 2)    → (1, 2/5)
Pairwise distance matrix based on the triplets (items 1 to 5):
    0   172   172   229    20
  172     0   468   505   360
  172   468     0     1   180
  229   505     1     0   205
   20   360   180   205     0
Pairwise distance matrix based on the root pairs (items 1 to 5):
    0    40   106   125     2
   40     0   170   181     6
  106   170     0     5    74
  125   181     5     0    10
    2     6    74    10     0
Semantic Gap
Features
Low Level Features :
* Extracted directly from data
* Easy to extract and analyze
* Widely used
* Not realistic in nature and hence far away from human perception
* Statistical analysis can be carried out
* Conventional in nature
High Level Features :
* Inferred from Low Level Features
* Difficult to extract and analyze
* Rarely used
* Realistic in nature and hence similar to human perception
* Aggregation and abstraction are possible
* Unconventional in nature
Semantic Gap
Proximity (Conventional) :
Works on crisp type features
The proximity is crisp
It is symmetric
Similarity + Dissimilarity = Constant
But in reality,
- Feature is not necessarily crisp
- Proximity itself may not be crisp
- Proximity might not be symmetric
- Similarity might not be just another aspect of dissimilarity
Technology Provides vs. User Demands : the Semantic Gap
Existing classifiers : Parametric, Exclusive, Uncertain (Inconsistency), Non-adaptive
Demanding classifiers : Non-Parametric, Overlapping, Consistent, Adaptive
The semantic gap lies between existing and demanding classifiers.
Some Publications from my team for your
reference
1. D.S. Guru, K.S. Manjunatha, S. Manjunath., User Dependent Features in Online
Signature Verification. Proceedings of ICMCCA12, LNEE, 2012.
2. Harish B S, Guru D S and Manjunath S., Dissimilarity Based Feature Selection for
Text Classification: A Cluster Based Approach. proceedings of ACM International
Conference and Workshop on Emerging Trends and Technology, Feb 25 -26,
Mumbai, India, 2011.
3. Guru D S and Mallikarjuna P B. Fusion of Texture Features and Sequential Forward
selection method for Classification of Tobacco Leaves for Automatic Harvesting. In
proceedings of second International conference on Computational Vision and
Robotics, Bhubaneshwar, India, August 14-15, 2011, pp. 168-172.
4. B. S. Harish, D. S. Guru, S. Manjunath, Bapu B. Kiranagi., Symbolic Similarity and
symbolic Feature Selection for Text Classification. International workshop on
Emerging Applications on Computer Vision, 2011, pp. 21 28, Moscow (Russia), pp
141-146.
5. D. S. Guru, P. B. Mallikarjuna., Classification of Tobacco Leaves for Automatic Harvesting: An
Approach Based on Feature Level Fusion and SBS Method. International workshop on
Emerging Applications on Computer Vision, 2011, pp. 21 28, Moscow (Russia), pp 102-109.
6. D. S. Guru, M.G. Suraj, S. Manjunath., Fusion of covariance matrices of PCA and FLD. Pattern
Recognition Letters.,32, 2011, pp 432-440.
7. Harish B S, Guru D S, Manjunath S, Dinesh R., Cluster Based Symbolic Representation and
Feature Selection for Text Classification. Proceedings of Advanced Data Mining and
Applications, Vol. 2, pp. 158-166, 2010.
8. Punitha P and Guru D S., Symbolic image indexing and retrieval by spatial similarity: An
approach based on B-tree. Journal of Pattern Recognition, Elsevier Publishers, Vol. 41, 2008,
pp 2068 - 2085.
9. Suraj M G and Guru D S., Secondary diagonal FLD for fingerspelling recognition. Proceedings
of the International Conference on Computing: Theory and Applications (ICCTA 07), Kolkata,
India, March 5-7, 2007, pp. 693-697.
10. Kiranagi B B, Guru D S and Ichino M., Exploitation of multivalued type proximity for symbolic
feature selection. Proceedings of the International Conference on Computing: Theory and
Applications (ICCTA 07), Kolkata, India, March 5-7, 2007, pp. 320-324.
11. Nagabhushan P, Guru D S and Shekar B H., (2D)²FLD: An efficient approach for appearance
based object recognition. Journal of Neurocomputing, Elsevier Publishers, Vol. 69. No.7-9,
2006, pp 934-940.
12. Nagabhushan P, Guru D S and Shekar B H., Visual Learning and Recognition of 3D Objects
Using Two Dimensional Principal Component Analysis: A Robust and an Efficient Approach,
Journal of Pattern Recognition, Elsevier Publishers, Vol. 39. No.4, 2006, pp 721-725
In Summary
Classification
Stages
Applications
Designing a classification System
Dimensionality Reduction
Case Study Examples
Now
Questions?
are Welcome
There is always a distance between two living
things, as it is unlikely that any two living beings
are alike. It is true even with artificially made
objects, however visually alike they are.
D S Guru
R E S E A R C H ??
Reading a lot for
Establishing
Scientific and
Engineering
Aptitude to have a good personal
Rapport with a
Commitment to build up a
Healthy society for the development of Nation
-D.S. Guru
No(w) Questions!?
Dr. D S G