For a numerical attribute, the class-conditional distribution is modeled as a normal distribution:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

where

\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2

are the mean and the variance of the attribute values in the class, respectively.
For example,

\text{Likelihood of Yes} = \frac{2}{9} \times 0.0340 \times 0.0221 \times \frac{3}{9} \times \frac{9}{14} = 0.000036

\text{Likelihood of No} = \frac{3}{5} \times 0.0291 \times 0.038 \times \frac{3}{5} \times \frac{5}{14} = 0.000136
f(\text{temperature} = 66 \mid \text{Yes}) = \frac{1}{\sqrt{2\pi}\,(6.2)} \, e^{-\frac{(66-73)^2}{2(6.2)^2}} = 0.0340
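The computation above can be checked numerically. The sketch below is illustrative: the excerpt does not name the attributes behind the 2/9 and 3/9 factors, so here they are simply treated as conditional probabilities of categorical attributes, with 9/14 as the class prior.

```python
import math

def gaussian_density(x, mu, sigma):
    # Normal density f(x) = 1/(sqrt(2*pi)*sigma) * exp(-(x-mu)^2 / (2*sigma^2))
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# f(temperature = 66 | Yes): the Yes class has mean 73 and standard deviation 6.2
f_temp_yes = gaussian_density(66, 73, 6.2)   # ~0.0340

# Likelihood of Yes: two categorical conditional probabilities (2/9 and 3/9),
# two class-conditional densities (f_temp_yes and 0.0221), and the prior 9/14.
likelihood_yes = (2 / 9) * f_temp_yes * 0.0221 * (3 / 9) * (9 / 14)  # ~0.000036
```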
Instance-Based Learning
In instance-based learning, we take the k nearest training samples of a new instance (v_1, v_2, \ldots, v_m) and assign the new instance to the class that has the most instances among those k nearest training samples. Classifiers that adopt instance-based learning are commonly called KNN classifiers.
The basic version of the KNN classifier works only for data sets with numerical values. However, extensions have been proposed for handling data sets with categorical attributes.
If the number of training samples is sufficiently large, then it can be shown statistically that the KNN classifier approaches the accuracy achievable by learning from the training data set. However, if the number of training samples is not large enough, the KNN classifier may not work well.
If the data set is noiseless, then the 1NN classifier should work well. In general, the noisier the data set, the higher k should be set. However, the optimal k value should be determined through cross validation.
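The majority-vote rule described above can be sketched as follows. This is a minimal illustration, not the original slides' code; the Euclidean distance on numerical attributes is an assumption consistent with the "basic version" mentioned above.

```python
import math
from collections import Counter

def knn_predict(training_samples, labels, query, k):
    """Assign `query` to the class holding the majority among its k
    nearest training samples (Euclidean distance on numerical attributes)."""
    nearest = sorted(
        (math.dist(sample, query), label)
        for sample, label in zip(training_samples, labels)
    )
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Illustrative data: two clusters with classes "A" and "B"
X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
y = ["A", "A", "B", "B", "B"]
```

A query near the first cluster, e.g. `knn_predict(X, y, (1.1, 0.9), 3)`, is assigned class "A".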
The ranges of attribute values should be normalized before the KNN classifier is applied. There are two common normalization approaches:

z-score normalization: w = \frac{v - \mu}{\sigma}, where \mu and \sigma^2 are the mean and the variance of the attribute values, respectively;

min-max normalization: w = \frac{v - v_{\min}}{v_{\max} - v_{\min}}.
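The two normalization formulas translate directly into code; the sketch below applies each to one attribute's list of values.

```python
def z_score_normalize(values):
    """w = (v - mu) / sigma, using the mean and standard deviation
    of the attribute values."""
    n = len(values)
    mu = sum(values) / n
    sigma = (sum((v - mu) ** 2 for v in values) / n) ** 0.5
    return [(v - mu) / sigma for v in values]

def min_max_normalize(values):
    """w = (v - v_min) / (v_max - v_min), mapping the range onto [0, 1]."""
    v_min, v_max = min(values), max(values)
    return [(v - v_min) / (v_max - v_min) for v in values]
```

For instance, `min_max_normalize([10, 20, 30])` yields `[0.0, 0.5, 1.0]`.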
Cross Validation
Most data classification algorithms require some parameters to be set, e.g., k in the KNN classifier and the pruning threshold in the decision tree.
One way to find an appropriate parameter setting is through k-fold cross validation, normally with k = 10.
In k-fold cross validation, the training data set is divided into k subsets. Then k runs of the classification algorithm are conducted, with each subset serving as the test set once, while the remaining (k - 1) subsets are used as the training set. The parameter values that yield the maximum accuracy in cross validation are then adopted.
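The k-fold procedure can be sketched as follows. The `train_and_score` callback is a placeholder assumed for illustration: it trains on the given training fold and returns the accuracy on the test fold.

```python
def k_fold_cross_validation(samples, labels, k_folds, train_and_score):
    """Average accuracy over k folds: each fold serves as the test set once,
    while the remaining (k - 1) folds form the training set."""
    n = len(samples)
    fold_size = n // k_folds  # assumes n is divisible by k_folds for simplicity
    accuracies = []
    for i in range(k_folds):
        test_idx = range(i * fold_size, (i + 1) * fold_size)
        train_X = [samples[j] for j in range(n) if j not in test_idx]
        train_y = [labels[j] for j in range(n) if j not in test_idx]
        test_X = [samples[j] for j in test_idx]
        test_y = [labels[j] for j in test_idx]
        accuracies.append(train_and_score(train_X, train_y, test_X, test_y))
    return sum(accuracies) / k_folds
```

To tune a parameter such as k in KNN, one would run this procedure once per candidate value and keep the value with the highest average accuracy.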
Example of the KNN Classifiers
If a 1NN classifier is employed, then the prediction for A is X.
If a 3NN classifier is employed, then the prediction for A is O.
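The figure accompanying this example is not shown in this excerpt, but the flip between 1NN and 3NN can be reproduced with a small synthetic layout (the coordinates below are hypothetical, chosen only so that A's single nearest neighbor is an X while two of its three nearest neighbors are O's).

```python
import math
from collections import Counter

def knn_predict(samples, labels, query, k):
    # Majority vote among the k nearest samples (Euclidean distance).
    nearest = sorted(zip(samples, labels), key=lambda p: math.dist(p[0], query))
    return Counter(label for _, label in nearest[:k]).most_common(1)[0][0]

# Hypothetical layout: one X very close to A, two O's slightly farther away.
points = [(0.0, 0.1), (0.0, 0.5), (0.5, 0.0), (2.0, 2.0)]
classes = ["X", "O", "O", "X"]
A = (0.0, 0.0)
```

With this layout, `knn_predict(points, classes, A, 1)` returns "X" and `knn_predict(points, classes, A, 3)` returns "O", matching the 1NN/3NN flip described above.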