
Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 2 (Part 3)
Bayesian Decision Theory
(Sections 2.6, 2.9)

Discriminant Functions for the Normal Density
Bayes Decision Theory: Discrete Features
Discriminant Functions for the Normal Density

We saw that minimum error-rate classification can be achieved by using the discriminant functions

g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)

Case of the multivariate normal density:

g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)
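As a concrete illustration, here is a minimal NumPy sketch of this general Gaussian discriminant; the function and variable names are my own, not from the slides:

```python
import numpy as np

def gaussian_discriminant(x, mu, sigma, prior):
    """General multivariate-normal discriminant g_i(x)."""
    d = mu.shape[0]
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(sigma) @ diff  # -(1/2)(x-mu)^t Sigma^{-1} (x-mu)
            - 0.5 * d * np.log(2 * np.pi)              # -(d/2) ln 2pi
            - 0.5 * np.log(np.linalg.det(sigma))       # -(1/2) ln |Sigma|
            + np.log(prior))                           # + ln P(omega_i)
```

Classification then assigns x to the class i whose g_i(x) is largest.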


Case Σ_i = σ²·I (I stands for the identity matrix):

g_i(x) = w_i^t x + w_{i0} \qquad \text{(linear discriminant function)}

where:

w_i = \frac{\mu_i}{\sigma^2}, \qquad w_{i0} = -\frac{1}{2\sigma^2}\mu_i^t\mu_i + \ln P(\omega_i)

(w_{i0} is called the threshold for the i-th category!)
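A minimal sketch of this special case, reusing the NumPy import above (names again my own):

```python
def linear_discriminant(x, mu, sigma2, prior):
    """Sigma_i = sigma^2 I case: g_i(x) = w_i . x + w_i0."""
    w = mu / sigma2                                  # w_i = mu_i / sigma^2
    w0 = -(mu @ mu) / (2 * sigma2) + np.log(prior)   # threshold w_i0
    return w @ x + w0
```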
A classifier that uses linear discriminant functions is called a linear machine.

The decision surfaces for a linear machine are pieces of hyperplanes defined by:

g_i(x) = g_j(x)
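For illustration, a linear machine reduces to an arg-max over the per-class linear scores; a minimal sketch, assuming the (w_i, w_{i0}) pairs have already been computed:

```python
def linear_machine(x, ws, w0s):
    """Assign x to the class whose linear discriminant g_i(x) is largest."""
    scores = [w @ x + w0 for w, w0 in zip(ws, w0s)]
    return int(np.argmax(scores))
```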


The hyperplane separating R_i and R_j passes through the point

x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2}\,\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\mu_i - \mu_j)

and is always orthogonal to the line linking the means!

If P(\omega_i) = P(\omega_j), then x_0 = \frac{1}{2}(\mu_i + \mu_j), i.e., the hyperplane passes midway between the means.
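A small sketch of the point x_0 for this case (a hypothetical helper, written to match the formula above):

```python
def boundary_point_spherical(mu_i, mu_j, sigma2, prior_i, prior_j):
    """x_0 on the hyperplane separating R_i and R_j when Sigma_i = sigma^2 I."""
    diff = mu_i - mu_j
    # Prior-dependent shift along the line linking the means.
    shift = (sigma2 / (diff @ diff)) * np.log(prior_i / prior_j)
    return 0.5 * (mu_i + mu_j) - shift * diff
```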
Case Σ_i = Σ (the covariance matrices of all classes are identical but arbitrary!)

The hyperplane separating R_i and R_j passes through

x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\mu_i - \mu_j)^t\,\Sigma^{-1}\,(\mu_i - \mu_j)}\,(\mu_i - \mu_j)

(the hyperplane separating R_i and R_j is generally not orthogonal to the line between the means!)
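The corresponding sketch for the shared-covariance case, with the squared Euclidean distance replaced by the Mahalanobis form:

```python
def boundary_point_shared(mu_i, mu_j, sigma, prior_i, prior_j):
    """x_0 for the shared-covariance case Sigma_i = Sigma."""
    diff = mu_i - mu_j
    # Squared Mahalanobis distance between the means.
    mahal = diff @ np.linalg.inv(sigma) @ diff
    return 0.5 * (mu_i + mu_j) - (np.log(prior_i / prior_j) / mahal) * diff
```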

Case Σ_i = arbitrary

The covariance matrices are different for each category:

g_i(x) = x^t W_i x + w_i^t x + w_{i0}

where:

W_i = -\frac{1}{2}\Sigma_i^{-1}

w_i = \Sigma_i^{-1}\mu_i

w_{i0} = -\frac{1}{2}\mu_i^t\Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

The decision surfaces are hyperquadrics: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, and hyperhyperboloids.
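A minimal sketch of this quadratic discriminant, following the W_i, w_i, w_{i0} decomposition above (names are my own):

```python
def quadratic_discriminant(x, mu, sigma, prior):
    """General-covariance case: g_i(x) = x^t W_i x + w_i^t x + w_i0."""
    inv = np.linalg.inv(sigma)
    W = -0.5 * inv                                 # quadratic term W_i
    w = inv @ mu                                   # linear term w_i
    w0 = (-0.5 * (mu @ inv @ mu)                   # -(1/2) mu^t Sigma^{-1} mu
          - 0.5 * np.log(np.linalg.det(sigma))     # -(1/2) ln |Sigma|
          + np.log(prior))                         # + ln P(omega_i)
    return x @ W @ x + w @ x + w0
```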

Bayes Decision Theory: Discrete Features

The components of x are binary or integer valued; x can take on only one of m discrete values v_1, v_2, ..., v_m.

Case of independent binary features in the 2-category problem:

Let x = [x_1, x_2, ..., x_d]^t, where each x_i is either 0 or 1, with probabilities:

p_i = P(x_i = 1 | ω_1)
q_i = P(x_i = 1 | ω_2)
The discriminant function in this case is:

g(x) = \sum_{i=1}^{d} w_i x_i + w_0

where:

w_i = \ln\frac{p_i(1 - q_i)}{q_i(1 - p_i)}, \qquad i = 1, \ldots, d

and:

w_0 = \sum_{i=1}^{d}\ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}

Decide ω_1 if g(x) > 0 and ω_2 if g(x) ≤ 0.
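A minimal vectorized sketch of this discriminant (the array names p and q are my own, holding p_i and q_i; NumPy as imported earlier):

```python
def binary_feature_discriminant(x, p, q, prior1, prior2):
    """Independent binary features: decide omega_1 if g(x) > 0, else omega_2."""
    w = np.log(p * (1 - q) / (q * (1 - p)))                        # weights w_i
    w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(prior1 / prior2)  # bias w_0
    return w @ x + w0
```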
