William Cohen
10-601 April 2008
But first….
[Figure: scatter plot of Age in Years (y-axis, 0–50) against Number of Publications (x-axis, 0–160), with a fitted univariate regression line ŷ.]
Onward: multivariate linear regression
Univariate: $\hat{y} = \hat{w} x$. Multivariate: $\hat{y} = \hat{w}_1 x_1 + \dots + \hat{w}_k x_k$

In matrix form, with each row of $X$ an example and each column a feature:

$\hat{\mathbf{w}} = (X^T X)^{-1} X^T \mathbf{y}$

which is the least-squares solution

$\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \sum_i [\hat{\epsilon}_i(\mathbf{w})]^2, \qquad \hat{\epsilon}_i(\mathbf{w}) = y_i - \mathbf{w}^T \mathbf{x}_i$
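As an illustration, the closed-form least-squares solution can be computed directly with NumPy; the data below is synthetic, made up for the sketch.

```python
import numpy as np

# Synthetic data, made up for the sketch: rows of X are examples,
# columns are features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

# Normal equations: solve (X^T X) w = X^T y rather than forming the inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

Solving the linear system is numerically preferable to computing $(X^T X)^{-1}$ explicitly.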
[Figure: the design matrix X (rows = examples, columns = features) and target vector Y.]
ACM Computing Surveys 2002
Review of K-NN methods (so far)
Kernel regression
• aka locally weighted regression, locally linear regression, LOESS, …
• What does making the kernel wider do to bias and variance?
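A minimal sketch of the bandwidth question above, using Nadaraya-Watson weighted averaging (one common form of locally weighted regression); the function, data, and helper name are made up for the example.

```python
import numpy as np

def kernel_regression(x_query, x_train, y_train, bandwidth):
    """Nadaraya-Watson estimate: a locally weighted average of the training
    y's, with Gaussian kernel weights centered on the query point."""
    w = np.exp(-0.5 * ((x_query - x_train) / bandwidth) ** 2)
    return np.sum(w * y_train) / np.sum(w)

x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x)

# Wider kernel: smoother fit (higher bias, lower variance); the estimate
# drifts toward the global mean of y.
wide = kernel_regression(0.25, x, y, bandwidth=100.0)
# Narrower kernel: wigglier fit (lower bias, higher variance); the estimate
# tracks the local y values, here close to sin(2*pi*0.25) = 1.
narrow = kernel_regression(0.25, x, y, bandwidth=0.01)
```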
BellCore’s MovieRecommender
• Participants sent email to videos@bellcore.com
• System replied with a list of 500 movies to rate on a 1-10 scale (250 random, 250 popular)
 – Only a subset needed to be rated
• New participant P sends in rated movies via email
• System compares ratings for P to ratings of (a random sample of) previous users
• Most similar users are used to predict scores for unrated movies (more later)
• System returns recommendations in an email message.
Suggested Videos for: John A. Jamus.
Your must-see list with predicted ratings:
•7.0 "Alien (1979)"
•6.5 "Blade Runner"
•6.2 "Close Encounters Of The Third Kind (1977)"
Your video categories with average ratings:
•6.7 "Action/Adventure"
•6.5 "Science Fiction/Fantasy"
•6.3 "Children/Family"
•6.0 "Mystery/Suspense"
•5.9 "Comedy"
•5.8 "Drama"
The viewing patterns of 243 viewers were consulted. Patterns of 7 viewers were found to be most similar.
Correlation with target viewer:
•0.59 viewer-130 (unlisted@merl.com)
•0.55 bullert,jane r (bullert@cc.bellcore.com)
•0.51 jan_arst (jan_arst@khdld.decnet.philips.nl)
•0.46 Ken Cross (moose@denali.EE.CORNELL.EDU)
•0.42 rskt (rskt@cc.bellcore.com)
•0.41 kkgg (kkgg@Athena.MIT.EDU)
•0.41 bnn (bnn@cc.bellcore.com)
By category, their joint ratings recommend:
•Action/Adventure:
•"Excalibur" 8.0, 4 viewers
•"Apocalypse Now" 7.2, 4 viewers
•"Platoon" 8.3, 3 viewers
•Science Fiction/Fantasy:
•"Total Recall" 7.2, 5 viewers
•Children/Family:
•"Wizard Of Oz, The" 8.5, 4 viewers
•"Mary Poppins" 7.7, 3 viewers
•Mystery/Suspense:
•"Silence Of The Lambs, The" 9.3, 3 viewers
•Comedy:
•"National Lampoon's Animal House" 7.5, 4 viewers
•"Driving Miss Daisy" 7.5, 4 viewers
•"Hannah and Her Sisters" 8.0, 3 viewers
•Drama:
•"It's A Wonderful Life" 8.0, 5 viewers
•"Dead Poets Society" 7.0, 5 viewers
•"Rain Man" 7.5, 4 viewers
Correlation of predicted ratings with your actual ratings is: 0.64. This number measures ability to evaluate movies accurately for you. 0.15 means low ability. 0.85 means very good ability. 0.50 means fair ability.
Algorithms for Collaborative Filtering 1:
Memory-Based Algorithms (Breese et al, UAI98)
• $v_{i,j}$ = vote of user i on item j
• $I_i$ = items for which user i has voted
• Mean vote for i is $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$
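The definitions above can be sketched on a toy vote matrix; the matrix values and the helper name `mean_vote` are made up for the example.

```python
import numpy as np

# Toy vote matrix, made up for the example: rows = users i, columns = items j,
# np.nan marks items the user has not voted on.
votes = np.array([
    [8.0, 3.0, np.nan, 5.0],
    [7.0, 4.0, 6.0, np.nan],
    [2.0, 9.0, 4.0, 6.0],
])

def mean_vote(i):
    """Mean of user i's votes over I_i, the items i has voted on."""
    rated = ~np.isnan(votes[i])  # the set I_i, as a boolean mask
    return votes[i, rated].mean()
```

For example, user 0 voted on items {0, 1, 3}, so the mean vote is (8 + 3 + 5) / 3.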
Basic k-nearest neighbor classification
• Training method:
 – Save the training examples
• At prediction time:
 – Find the k training examples (x1,y1),…,(xk,yk) that are closest to the test example x
 – Predict the most frequent class among those yi's.
• Example: http://cgm.cs.mcgill.ca/~soss/cs644/projects/simard/
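The recipe above can be sketched as follows; the toy data and the helper name `knn_predict` are made up for the illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(x, train_X, train_y, k=3):
    """Predict the most frequent class among the k training examples
    closest (in Euclidean distance) to x."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Toy data: two well-separated clusters of 'o' and '+' examples.
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
train_y = np.array(['o', 'o', 'o', '+', '+', '+'])
```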
What is the decision boundary?
Voronoi diagram
Convergence of 1-NN
P(Y|x’’) x2
P (knnError) P(Y|x)
1 Pr( y y1 ) x y2
1 Pr(Y y ' | x) 2 neighbor
y'
y
1 Pr( y* | x) 2
y ' y*
Pr(Y y ' | x ) 2
x1
... P(Y|x1)
2(1 Pr( y* | x)) y1
2(Bayes optimal error rate)
assume equal
• Improvements:
 – Weighting examples from the neighborhood
 – Measuring "closeness"
 – Finding "close" examples in a large training set quickly
K-NN and irrelevant features
K-NN and irrelevant features
[Figure: + and o training examples separated along one informative feature, with a query point ?.]
K-NN and irrelevant features
[Figure: the same + and o examples after adding an irrelevant feature; the examples scatter and the query's nearest neighbors change.]
Ways of rescaling for KNN
Normalized L1 distance:
Scale by IG (information gain):
Modified value difference metric:
Ways of rescaling for KNN
Dot product: $\langle \mathbf{x}, \mathbf{x}' \rangle = \sum_i x_i x'_i$
Cosine distance: $\frac{\langle \mathbf{x}, \mathbf{x}' \rangle}{\|\mathbf{x}\| \, \|\mathbf{x}'\|}$
TF-IDF weights for text: $x_i = \mathrm{tf}_{i,j} \cdot \log \frac{|D|}{\mathrm{df}_i}$, where $\mathrm{tf}_{i,j}$ = #occurrences of term i in doc j, $|D|$ = #docs in corpus, and $\mathrm{df}_i$ = #docs in corpus that contain term i.
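A minimal sketch of cosine similarity over TF-IDF-weighted bags of words; the corpus and helper names are made up for the example.

```python
import math
from collections import Counter

# Toy corpus, made up for the sketch: each document is a bag of words.
docs = [["knn", "text", "classification"],
        ["text", "mining", "text"],
        ["kernel", "regression"]]

N = len(docs)                                      # #docs in corpus
df = Counter(t for doc in docs for t in set(doc))  # #docs containing term i

def tfidf(doc):
    tf = Counter(doc)  # #occurrences of term i in this doc
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v))
```

A document has cosine similarity 1 with itself and 0 with a document sharing no terms.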
Combining distances to neighbors
Standard KNN: $\hat{y} = \arg\max_{y} C(y, \mathrm{Neighbors}(x))$, where $C(y, D') = |\{(x', y') \in D' : y' = y\}|$
Distance-weighted KNN: $\hat{y} = \arg\max_{y} C(y, \mathrm{Neighbors}(x))$, where $C(y, D') = \sum_{(x', y') \in D' : y' = y} \mathrm{SIM}(x, x')$
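The distance-weighted rule can be sketched as follows. The slide leaves SIM abstract; inverse distance is assumed here as the similarity, and the data and helper name are made up.

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(x, train_X, train_y, k=3, eps=1e-9):
    """Each of the k nearest neighbors votes for its class with weight
    SIM(x, x'); inverse distance is assumed here as the similarity."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = defaultdict(float)
    for i in nearest:
        scores[train_y[i]] += 1.0 / (dists[i] + eps)
    return max(scores, key=scores.get)

# Toy 1-D data: the lone '+' example sits far away at 10.0.
train_X = np.array([[0.0], [1.0], [10.0]])
train_y = np.array(['o', 'o', '+'])
```

Near x = 9.0 the single close '+' outweighs the two distant 'o' votes, whereas unweighted 3-NN would predict 'o'.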
Efficiently implementing KNN (for text)
IDF is nice computationally
Tricks with fast KNN
K-means using r-NN
1. Pick k points c1=x1,…,ck=xk as centers
2. For each center ci, find Di = Neighborhood(ci), the examples closest to ci
3. For each ci, let ci = mean(Di)
4. Go to step 2…
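The four steps can be sketched as plain k-means; this is a sketch on made-up data, not the fast r-NN-based implementation the slide alludes to.

```python
import numpy as np

def kmeans(X, k, n_iter=10, seed=0):
    """Plain k-means following the slide's loop (a sketch; a fast
    implementation would use r-NN search to build the neighborhoods)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # 1. pick k points
    for _ in range(n_iter):                                 # 4. repeat
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)                       # 2. neighborhoods
        for c in range(k):
            if np.any(assign == c):                         # skip empty clusters
                centers[c] = X[assign == c].mean(axis=0)    # 3. move to mean
    return centers, assign
```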
Efficiently implementing KNN
[Figure: index structure for fast KNN retrieval, showing documents dj3 and dj4.]
Train once and select 100 test cases to classify