k Nearest Neighbors
László Kozma
Lkozma@cis.hut.fi
20. 2. 2008
Supervised Learning
• Data set:
• Training (labeled) data: T = {(xi, yi)}
• xi ∈ R^p
• Test (unlabeled) data: x0 ∈ R^p
• Tasks:
• Classification: yi ∈ {1, ..., J}
• Regression: yi ∈ R
• Given new x0 predict y0
• Methods:
• Model-based
• Memory-based
Classification
• 1 NN
• Predict the same value/class as the nearest instance in the
training set
• k NN
• find the k closest training points (small ‖xi − x0‖ according
to some metric, e.g. Euclidean, Manhattan)
• predicted class: majority vote
• predicted value: average weighted by inverse distance
(a minimal sketch of both rules follows below)
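A minimal NumPy sketch of both prediction rules, assuming a small labeled set already in memory; the function name knn_predict and the epsilon guard are my own additions, not from the slides:

import numpy as np

def knn_predict(X_train, y_train, x0, k=3, classify=True):
    # distances from the test point x0 to every training point
    d = np.linalg.norm(X_train - x0, axis=1)
    idx = np.argsort(d)[:k]                 # the k nearest neighbors
    if classify:
        # classification: majority vote among the k nearest labels
        labels, counts = np.unique(y_train[idx], return_counts=True)
        return labels[np.argmax(counts)]
    # regression: average weighted by inverse distance
    w = 1.0 / (d[idx] + 1e-12)              # epsilon avoids division by zero
    return np.sum(w * y_train[idx]) / np.sum(w)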
1 NN
1 NN - Voronoi diagram
• the decision regions of 1 NN form the Voronoi diagram of the
training points
• Classification
• use majority voting
• Binary classification
• k preferably odd to avoid ties
• Regression
• y0 = Σ_{i=1..k} wi yi
• weights:
• wi = 1/k
• wi ∼ 1 − ‖xi − x0‖
• wi ∼ k − rank ‖xi − x0‖ (a sketch of these schemes follows below)
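A hedged sketch of the three weighting schemes, with my own function name knn_regress and a fallback to uniform weights when the linear scheme zeroes out (an edge case the slides do not address):

import numpy as np

def knn_regress(X_train, y_train, x0, k=3, scheme="rank"):
    d = np.linalg.norm(X_train - x0, axis=1)
    idx = np.argsort(d)[:k]                    # the k nearest neighbors
    dk = d[idx]
    if scheme == "uniform":                    # wi = 1/k
        w = np.full(k, 1.0 / k)
    elif scheme == "linear":                   # wi ~ 1 - ||xi - x0||
        w = np.maximum(1.0 - dk, 0.0)          # clipped to stay non-negative
    else:                                      # wi ~ k - rank of ||xi - x0||
        w = np.arange(k, 0, -1, dtype=float)   # nearest point gets weight k
    if w.sum() == 0.0:                         # all distances >= 1:
        w = np.full(k, 1.0 / k)                # fall back to uniform weights
    return np.sum(w * y_train[idx]) / w.sum()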
k NN Classification
• Choice of k
• smaller k ⇒ higher variance (less stable)
• larger k ⇒ higher bias (less precise)
• Proper choice of k depends on the data:
• Adaptive methods, heuristics
• Cross-validation (a selection sketch follows below)
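One way to run that cross-validation, sketched with scikit-learn (the slides do not prescribe a library; the iris data stands in for the training set T):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)     # stand-in labeled data set
ks = range(1, 21, 2)                  # odd values of k avoid voting ties
scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k),
                          X, y, cv=5).mean() for k in ks]
best_k = ks[int(np.argmax(scores))]
print(best_k, max(scores))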
Distance metric
• Distance used:
• Euclidean, Manhattan, etc.
• Issue: scaling of different dimensions (a sketch follows after this list)
• Selecting/scaling features: common problem for all
methods
• but affects k NN even more
• Improving results
• Preprocessing: smoothing the training data (remove
outliers, isolated points)
• Adapt metric to data
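A small illustration of the scaling issue, with made-up data: the large-scale feature dictates the neighbor ordering until each feature is standardized (z-scored):

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 50),      # feature 1, unit scale
                     rng.normal(0, 1000, 50)])  # feature 2, scale ~1000
x0 = X.mean(axis=0)

order_raw = np.argsort(np.linalg.norm(X - x0, axis=1))
Xs = (X - X.mean(axis=0)) / X.std(axis=0)       # z-score each feature
x0s = (x0 - X.mean(axis=0)) / X.std(axis=0)
order_std = np.argsort(np.linalg.norm(Xs - x0s, axis=1))

print(order_raw[:5])   # neighbors picked almost purely by feature 2
print(order_std[:5])   # ordering changes once features count equally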
Discriminant Adaptive Nearest
Neighbor Classification (DANN)
• Find w that maximizes J(w) = (m1 − m2)^2 / (s1^2 + s2^2)
(a closed-form sketch follows below)
(source: Alpaydin)
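For two classes the maximizer of J(w) has a closed form, w ∝ (S1 + S2)^(-1)(m1 − m2), with Si the within-class scatter matrices; a brief sketch (function name mine):

import numpy as np

def fisher_direction(X1, X2):
    # class means and within-class scatter matrices
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False) * (len(X1) - 1)
    S2 = np.cov(X2, rowvar=False) * (len(X2) - 1)
    w = np.linalg.solve(S1 + S2, m1 - m2)   # maximizes J(w)
    return w / np.linalg.norm(w)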
Linear Discriminant Analysis
• W - within-class, B - between-class scatter matrix
Σ = W^(-1) B W^(-1)    (2)
• x0 - test point
• di - distance of xi from x0 according to metric Σ
di = ‖Σ^(1/2) (xi − x0)‖    (4)
• wi - locality weight of training point xi (see article)
W = sum_{j=1..J} sum_{yi=j} wi (xi − x̄j)(xi − x̄j)^T / sum_{i=1..N} wi    (9)
• From Σ we obtain Σ′
• An iterative algorithm can be devised (see the article for a proof
of convergence and more details)
DANN Algorithm
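A rough sketch of computing the local DANN metric per eqs. (2), (4), (9) above, assuming tri-cube locality weights over an h-point neighborhood as in the original article; the function name, h, and the eps regularizer are my own choices, not the article's exact algorithm:

import numpy as np

def dann_metric(X, y, x0, h=50, eps=1e-3):
    # take the h nearest training points around x0 (Euclidean to start)
    d = np.linalg.norm(X - x0, axis=1)
    nb = np.argsort(d)[:h]
    Xn, yn = X[nb], y[nb]
    # tri-cube locality weights wi over the neighborhood
    dn = d[nb] / (d[nb].max() + 1e-12)
    w = (1.0 - dn**3)**3
    # weighted within-class (W) and between-class (B) scatter, eq. (9)
    p = X.shape[1]
    xbar = np.average(Xn, axis=0, weights=w)
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for j in np.unique(yn):
        m = yn == j
        xbar_j = np.average(Xn[m], axis=0, weights=w[m])
        diff = Xn[m] - xbar_j
        W += (w[m][:, None] * diff).T @ diff
        B += w[m].sum() * np.outer(xbar_j - xbar, xbar_j - xbar)
    W /= w.sum()
    B /= w.sum()
    # eq. (2)-style metric; eps*I keeps the inversion well conditioned
    Winv = np.linalg.inv(W + eps * np.eye(p))
    return Winv @ B @ Winv  # then di = ||Sigma^(1/2)(xi - x0)||, eq. (4)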