
Pattern Classification (Duda, Hart, Stork)
Nearest Neighbor Pattern Classification (Cover and Hart)

Roberto Souza
DCA/FEEC/UNICAMP

March 16, 2012


Agenda

Introduction
Supervised Classification Backbone
1-NN
k-NN
Conclusions


Introduction

Developed in the 1960s;
Non-parametric, sub-optimal classifiers;
Often provide competitive results;
Simple to understand and implement.


Supervised Classification Problem Formulation


M classes;
N i.i.d. labeled samples Z = \{(X_1, \lambda(X_1)), (X_2, \lambda(X_2)), \ldots, (X_N, \lambda(X_N))\}, with \lambda(X_i) \in \{\omega_1, \omega_2, \ldots, \omega_M\};
Assign each new sample X to one of the M possible classes so as to minimize the misclassification error.
Error expression:

p(\mathrm{error}) = \int_{-\infty}^{+\infty} p(\mathrm{error}, X)\, dX = \int_{-\infty}^{+\infty} p(\mathrm{error} \mid X)\, p(X)\, dX.
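One step the slides leave implicit (added here for clarity): since p(\mathrm{error} \mid X) \geq 0 and p(X) \geq 0, the integral is minimized by making p(\mathrm{error} \mid X) as small as possible at every X, which is exactly what the Bayesian Decision Rule on the next slide does:

\min p(\mathrm{error}) = \int_{-\infty}^{+\infty} \left[ \min_{\text{decision}} p(\mathrm{error} \mid X) \right] p(X)\, dX.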


Bayesian Decision Rule


Finds the optimal solution to the classification problem;
p(\omega_i) and p(X \mid \omega_i) are known distributions;
From Bayes' Theorem, p(\omega_i \mid X) can be written as:

p(\omega_i \mid X) = \frac{p(X \mid \omega_i)\, p(\omega_i)}{p(X)}, \quad \text{where} \quad p(X) = \sum_{i=1}^{M} p(X \mid \omega_i)\, p(\omega_i).

Bayesian Decision Rule: assign X to the class \omega_i with the largest posterior p(\omega_i \mid X), so that

p(\mathrm{error} \mid X) = 1 - \max[\, p(\omega_1 \mid X), \ldots, p(\omega_M \mid X)\,].

The Bayes Error Rate (BER) is achieved by using the BDR.
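To make the rule concrete, here is a minimal Python sketch of the Bayesian Decision Rule for a hypothetical two-class problem with known one-dimensional Gaussian class-conditional densities; the priors, means, and standard deviations are made-up illustration values, not taken from the slides.

```python
import numpy as np

# Hypothetical known distributions for M = 2 classes (illustration values only).
priors = np.array([0.6, 0.4])                      # p(w_1), p(w_2)
means, stds = np.array([0.0, 2.0]), np.array([1.0, 1.0])

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def bayes_decision(x):
    """Bayesian Decision Rule: pick the class maximizing p(w_i | X = x)."""
    likelihoods = gaussian_pdf(x, means, stds)     # p(X | w_i)
    joint = likelihoods * priors                   # p(X | w_i) p(w_i)
    posteriors = joint / joint.sum()               # divide by p(X) = sum_i p(X | w_i) p(w_i)
    return int(np.argmax(posteriors)), 1.0 - posteriors.max()  # class index, p(error | X)

decision, conditional_error = bayes_decision(1.2)
print(decision, conditional_error)
```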


1-NN Overview

Figure: Illustration of 1-NN operation.



1-NN Mathematical Formulation


The 1-NN classifier can be formulated in terms of mathematical equations. The label \lambda(X) of each new sample X is given by:

\lambda(X) = \lambda(X_{NN}),   (1)

where X_{NN} is given by:

X_{NN} = \arg\min_{X_i \in Z} \{ d_X(X_i) \},   (2)

and d_X(X_i) is the distance between X and X_i in the chosen metric.
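A minimal Python sketch of equations (1) and (2) (illustrative only; the Euclidean metric and the toy labeled set are assumptions, not taken from the slides):

```python
import numpy as np

def one_nn_classify(X_new, samples, labels):
    """1-NN rule: label X_new with the label of its nearest sample in Z."""
    # d_X(X_i): Euclidean distance from X_new to every labeled sample (assumed metric).
    distances = np.linalg.norm(samples - X_new, axis=1)
    nearest = np.argmin(distances)       # X_NN = arg min over X_i in Z of d_X(X_i)
    return labels[nearest]               # lambda(X) = lambda(X_NN)

# Toy labeled set Z = {(X_i, lambda(X_i))} with M = 2 classes (made-up data).
samples = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
labels = np.array([1, 1, 2, 2])
print(one_nn_classify(np.array([0.9, 1.2]), samples, labels))  # -> 2
```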



1-NN Error Bounds

As the number of labeled samples N tends to infinity in an M-class classification problem, the 1-Nearest Neighbor Error Rate (1NNER) is bounded by the following expression:

\mathrm{BER} \leq \mathrm{1NNER} \leq \mathrm{BER}\left(2 - \frac{M}{M-1}\,\mathrm{BER}\right).   (3)
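For a quick sanity check (an added example, not from the original slides): with M = 2 classes and \mathrm{BER} = 0.1, bound (3) gives

0.1 \leq \mathrm{1NNER} \leq 0.1\left(2 - \frac{2}{2-1}\cdot 0.1\right) = 0.1 \times 1.8 = 0.18,

so the asymptotic 1-NN error is never more than twice the Bayes error.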


1-NN Weaknesses

1-NN is sensitive to noise and outliers;
The 1NNER bound only holds in the limit of an infinite number of labeled samples;
Its computational complexity increases with N.


k-NN Overview

k-NN is a natural extension of the 1-NN classifier: it classifies X by assigning it the label most frequent among its k nearest neighbors;
Because k-NN takes k neighbors into account, it is less sensitive to noise than 1-NN;
It can be shown that, for an infinite number of samples N, the k-NN Error Rate (kNNER) tends to the BER as k tends to infinity.
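A minimal Python sketch of the k-NN rule described above (illustrative only; the Euclidean distance and the toy data are assumptions):

```python
import numpy as np
from collections import Counter

def knn_classify(X_new, samples, labels, k=3):
    """k-NN rule: majority vote among the k nearest labeled samples."""
    distances = np.linalg.norm(samples - X_new, axis=1)
    nearest_k = np.argsort(distances)[:k]      # indices of the k nearest neighbors
    votes = Counter(labels[nearest_k])         # count the labels among them
    return votes.most_common(1)[0][0]          # most frequent label

# Toy labeled set with M = 2 classes (made-up data).
samples = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [1.0, 1.0], [1.1, 0.9]])
labels = np.array([1, 1, 1, 2, 2])
print(knn_classify(np.array([0.5, 0.4]), samples, labels, k=3))  # -> 1
```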


Anomalous 3-NN Case

Figure: Illustration of a 3-NN anomalous case.



Conclusions

Although k-NN, for k > 1, is theoretically a better classifier than 1-NN, this may not hold when the number of training samples is not large enough;
To avoid the anomalous behaviour of k-NN, a parameter d is introduced, so k-NN is no longer a purely non-parametric classifier.
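The slide does not specify how the parameter d enters the classifier; one plausible reading (an assumption here, not a definition from the slides) is a maximum-distance threshold, so that only neighbors within distance d of X are allowed to vote. A minimal sketch under that assumption:

```python
import numpy as np
from collections import Counter

def knn_with_distance_threshold(X_new, samples, labels, k=3, d=1.0):
    """k-NN variant where only neighbors closer than d may vote (assumed role of d)."""
    distances = np.linalg.norm(samples - X_new, axis=1)
    nearest_k = np.argsort(distances)[:k]
    voters = [i for i in nearest_k if distances[i] <= d]  # discard far-away neighbors
    if not voters:                                        # no neighbor within d:
        return None                                       # reject / leave undecided
    return Counter(labels[voters]).most_common(1)[0][0]
```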


Thanks for your attention!
