Lecture 6
Supervised Learning
Nearest Neighbours

Each example is a feature vector
X(j) = (x_1(j), x_2(j), ..., x_n(j))

Euclidean distance between examples i and j:
D(i, j) = \sqrt{ \sum_{k=1}^{n} \left( x_k(i) - x_k(j) \right)^2 }

(Figure: training points plotted in the (x_1, x_2) plane.)
Nearest Neighbour Algorithm

Given training data (X(1), D(1)), (X(2), D(2)), ..., (X(N), D(N)),
where X(i) is the feature vector and D(i) the desired output (label).

Euclidean distance:
D(i, j) = \sqrt{ \sum_{k=1}^{n} \left( x_k(i) - x_k(j) \right)^2 }

A new point is classified with the label of its nearest training example.
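As a concrete illustration (not from the slides), here is a minimal NumPy sketch of the nearest-neighbour rule with the Euclidean distance above; the array names and toy data are made up for the example.

```python
import numpy as np

def nearest_neighbour(X_train, y_train, x_query):
    """Return the label of the training example closest to x_query."""
    # D(i, q) = sqrt(sum_k (x_k(i) - x_k(q))^2) for every training example i
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    return y_train[np.argmin(dists)]

# Toy data (hypothetical): two features per example, binary labels
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 8.0]])
y_train = np.array([0, 0, 1])
print(nearest_neighbour(X_train, y_train, np.array([7.5, 8.5])))  # -> 1
```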
K-Nearest Neighbour Model

Given a test point X, find the K nearest training examples
and predict the class that is most common among them.
K-Nearest Neighbour Model

Classification
(Figure: query points marked "x" among the labelled training points.)
K-Nearest Neighbour Model

Example: classify whether a customer will respond to a
survey question using a 3-Nearest Neighbour classifier.

Customer  Age  Income  No. of credit cards  Response
John      35   35K     3                    No
Rachel    22   50K     2                    Yes
Hannah    63   200K    1                    No
Tom       59   170K    1                    No
Nellie    25   40K     4                    Yes
David     37   50K     2                    ?

(Income is used in units of 1K in the distance calculation.)
K-Nearest Neighbour Model

Example: 3-Nearest Neighbours

Distances from David to each training customer:

Customer  Distance
John      15.16
Rachel    15
Hannah    152.23
Tom       122
Nellie    15.74

The three nearest neighbours are Rachel (Yes), John (No), and
Nellie (Yes), so the majority vote predicts: Yes.
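A short NumPy check of this worked example; the feature vectors follow the table above, with income in thousands.

```python
import numpy as np
from collections import Counter

names = ["John", "Rachel", "Hannah", "Tom", "Nellie"]
X = np.array([[35, 35, 3], [22, 50, 2], [63, 200, 1], [59, 170, 1], [25, 40, 4]])
y = ["No", "Yes", "No", "No", "Yes"]
david = np.array([37, 50, 2])

# Distances to each customer (the slide shows truncated values)
dists = np.sqrt(((X - david) ** 2).sum(axis=1))
print(dict(zip(names, dists.round(2))))

nearest3 = np.argsort(dists)[:3]          # indices of the 3 smallest distances
vote = Counter(y[i] for i in nearest3)    # majority vote among the neighbours
print(vote.most_common(1)[0][0])          # -> 'Yes'
```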
K-Nearest Neighbour Model

Example: for the example we saw earlier, pick the best K
from the set {1, 2, 3} to build a K-NN classifier.
k-Nearest Neighbor Pseudocode

Training algorithm:
  Store all training examples <x, f(x)>
  Find the best value for K

Classification (testing) algorithm:
  Given a query instance x_q to be classified,
  let x_1, ..., x_k denote the k training examples nearest to x_q.
  Return the majority class:
  \hat{f}(x_q) = \arg\max_v \sum_{i=1}^{k} \delta(v, f(x_i))
  where \delta(a, b) = 1 if a = b and 0 otherwise.
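A runnable version of this pseudocode, sketched in plain NumPy; the class and method names are illustrative, not from the slides.

```python
import numpy as np
from collections import Counter

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Training: just store all examples <x, f(x)>
        self.X, self.y = np.asarray(X, float), list(y)
        return self

    def predict(self, x_query):
        # Distances from the query to every stored example
        dists = np.sqrt(((self.X - x_query) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[: self.k]   # k closest indices
        # Majority class among the k nearest labels
        return Counter(self.y[i] for i in nearest).most_common(1)[0][0]
```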
Nearest Neighbour Rule
k-Nearest Neighbor Examples
(discrete-valued target function)
KNN Flavors
Distance-Weighted Nearest Neighbor Algorithm

Distance-weighted prediction (for a discrete-valued target function):

\hat{f}(x_q) = \arg\max_v \sum_{i=1}^{k} w_i \, \delta(v, f(x_i)),
\quad \text{where } w_i = \frac{1}{d(x_q, x_i)^2}
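A sketch of the distance-weighted vote in NumPy, with the usual guard for an exact distance of zero, which the slide formula leaves implicit.

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X, y, x_query, k=3):
    dists = np.sqrt(((np.asarray(X, float) - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    if dists[nearest[0]] == 0.0:            # query coincides with a training point
        return y[nearest[0]]
    votes = defaultdict(float)
    for i in nearest:
        votes[y[i]] += 1.0 / dists[i] ** 2  # w_i = 1 / d(x_q, x_i)^2
    return max(votes, key=votes.get)        # class with the largest weighted vote
```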
How many neighbors, K?

K is a fixed constant that determines the number of elements
included in each neighborhood.
The neighborhood determines the classification.
Different values of K can, and often will, produce different
classifications.
K-Nearest Neighbour Model

Picking K: for each candidate K, and for each held-out example:
  Find its K nearest neighbours among the remaining examples
  Make a classification based on these K neighbours
  Calculate the classification error
Output the average error over all examples and pick the K with
the lowest average error (sketched below).
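One simple way to implement this selection loop is leave-one-out error; this self-contained sketch assumes a majority-vote classifier like the earlier pseudocode.

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, x_query, k):
    dists = np.sqrt(((X - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Note: an even K can tie; Counter breaks ties by first-seen order
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

def pick_k(X, y, candidates=(1, 2, 3)):
    """Leave-one-out error for each candidate K; return the best K."""
    X = np.asarray(X, float)
    errors = {}
    for k in candidates:
        wrong = 0
        for i in range(len(X)):
            keep = np.arange(len(X)) != i   # hold out example i
            y_keep = [y[j] for j in range(len(y)) if j != i]
            if knn_predict(X[keep], y_keep, X[i], k) != y[i]:
                wrong += 1
        errors[k] = wrong / len(X)
    return min(errors, key=errors.get), errors
```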
Nearest Neighbour Complexity

Expensive for high-dimensional data (d > 20 or so).
O(Nd) complexity for both storage and query time, where
N is the number of training examples and d is the dimension
of each sample.
Advantages/Disadvantages

Advantages:
  Training is very fast
  Can learn complex target functions
  No information is lost (all training data is retained)

Disadvantages:
  Slow at query time
  Easily fooled by irrelevant attributes
Nearest Neighbour Issues

Expensive:
  To determine the nearest neighbour of a query point q,
  we must compute the distance to all N training examples.
  Remedies: pre-sort training examples into fast data
  structures (kd-trees); remove redundant data (condensing).

Storage requirements:
  Must store all training data D_tr.
  Remove redundant data (condensing).
  Pre-sorting often increases the storage requirements.

High-dimensional data:
  Curse of dimensionality: the required amount of training
  data increases exponentially with the dimension, and the
  computational cost also increases dramatically.
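As an example of the kd-tree remedy, SciPy's cKDTree (the slides do not name a library, so this choice is an assumption) answers neighbour queries in sub-linear average time after a one-off build.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))           # N = 10,000 points, d = 3

tree = cKDTree(X_train)                     # pre-sort into a kd-tree once
dist, idx = tree.query(rng.random(3), k=5)  # 5 nearest neighbours of a query
print(idx, dist)
```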
Condensing

Keep only the training examples needed to reproduce the decision
boundary; examples whose removal does not change any classification
are redundant and can be discarded.
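As an illustration, here is a minimal sketch of Hart's condensed nearest neighbour algorithm, one standard condensing scheme; the slides do not specify which variant they use.

```python
import numpy as np

def condense(X, y):
    """Hart's CNN: grow a subset S such that 1-NN on S classifies
    every training example correctly."""
    X = np.asarray(X, float)
    S = [0]                                  # start with the first example
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            # 1-NN classification of X[i] using only the condensed set S
            d = np.sqrt(((X[S] - X[i]) ** 2).sum(axis=1))
            if y[S[np.argmin(d)]] != y[i]:
                S.append(i)                  # keep examples the subset gets wrong
                changed = True
    return S                                 # indices of the retained examples
```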
KNN in Collaborative Filtering (CF)

Ratings matrix (users x items; unrated items omitted):
User1: 4, 2, 1, 5
User2: 5, 5, 5, 1
User3: 4, 4, 4, 1
User4: 3, 3, 5
KNN in CF

         Item1  Item2  Item3  Item4  Item5  Item6
User1      4      x      1      5
User2      5      5      5      1
User3      4      4      4      1
User4      3      3      5

Predict User1's unknown rating x from the most similar users.
User-user similarity is measured with the Pearson correlation.
Pearson Correlation

sim(a, b) = \frac{ \sum_{i \in I} (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b) }
{ \sqrt{ \sum_{i \in I} (r_{a,i} - \bar{r}_a)^2 } \; \sqrt{ \sum_{i \in I} (r_{b,i} - \bar{r}_b)^2 } }

where I is the set of items rated by both users a and b, and
\bar{r}_a is user a's mean rating over I.
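A compact sketch of user-based KNN prediction with Pearson similarity. The slides do not show their exact prediction formula, so the mean-centred weighted average used here is one common formulation, and the rating matrix encoding (NaN = unrated) is an assumption.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation over the items both users have rated (NaN = unrated)."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < 2:
        return 0.0
    ca, cb = a[both] - a[both].mean(), b[both] - b[both].mean()
    denom = np.sqrt((ca ** 2).sum() * (cb ** 2).sum())
    return 0.0 if denom == 0 else float((ca * cb).sum() / denom)

def predict(R, user, item, k=2):
    """Mean-centred weighted average over the k most similar users who rated item."""
    raters = [u for u in range(len(R)) if u != user and not np.isnan(R[u, item])]
    sims = sorted(raters, key=lambda u: -pearson(R[user], R[u]))[:k]
    num = sum(pearson(R[user], R[u]) * (R[u, item] - np.nanmean(R[u])) for u in sims)
    den = sum(abs(pearson(R[user], R[u])) for u in sims)
    base = np.nanmean(R[user])               # fall back to the user's own mean
    return base if den == 0 else base + num / den
```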
Questions?