
Naive Bayes Classifier

The naive Bayes classifier assigns an instance s_k with attribute values (A_1 = v_1, A_2 = v_2, ..., A_m = v_m) to the class C_i with maximum Prob(C_i | (v_1, v_2, ..., v_m)) over all i.
The naive Bayes classifier exploits Bayes' rule and assumes independence of attributes.
Likelihood of s_k belonging to C_i:

$$\mathrm{Prob}(C_i \mid (v_1, v_2, \ldots, v_m)) = \frac{P((v_1, v_2, \ldots, v_m) \mid C_i)\, P(C_i)}{P((v_1, v_2, \ldots, v_m))}$$

Likelihood of s_k belonging to C_j:

$$\mathrm{Prob}(C_j \mid (v_1, v_2, \ldots, v_m)) = \frac{P((v_1, v_2, \ldots, v_m) \mid C_j)\, P(C_j)}{P((v_1, v_2, \ldots, v_m))}$$

Therefore, when comparing Prob(C_i | (v_1, v_2, ..., v_m)) and Prob(C_j | (v_1, v_2, ..., v_m)), we only need to compute P((v_1, v_2, ..., v_m) | C_i) P(C_i) and P((v_1, v_2, ..., v_m) | C_j) P(C_j), since the denominator P((v_1, v_2, ..., v_m)) is the same for both classes.
Under the assumption of independent attributes,

$$P((v_1, v_2, \ldots, v_m) \mid C_j) = P(A_1 = v_1 \mid C_j)\, P(A_2 = v_2 \mid C_j) \cdots P(A_m = v_m \mid C_j) = \prod_{h=1}^{m} P(A_h = v_h \mid C_j)$$

Furthermore, P(C_j) can be computed by

$$P(C_j) = \frac{\text{number of training samples belonging to } C_j}{\text{total number of training samples}}$$
An Example of the Naive Bayes Classifier
The weather data, with counts and probabilities:

outlook             temperature         humidity            windy               play
         yes   no            yes   no            yes   no           yes   no    yes    no
sunny     2     3   hot       2     2   high      3     4   false    6     2     9      5
overcast  4     0   mild      4     2   normal    6     1   true     3     3
rainy     3     2   cool      3     1
sunny    2/9   3/5  hot      2/9   2/5  high     3/9   4/5  false   6/9   2/5   9/14   5/14
overcast 4/9   0/5  mild     4/9   2/5  normal   6/9   1/5  true    3/9   3/5
rainy    3/9   2/5  cool     3/9   1/5
A new day:

outlook   temperature   humidity   windy   play
sunny     cool          high       true    ?
Likelihood of yes = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053

Likelihood of no = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Therefore, the prediction is No.
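The same calculation can be sketched in a few lines of Python, with the conditional probabilities read directly from the table above (the dictionary and function names are illustrative, not part of the slides):

```python
# Class-conditional probabilities from the weather table above.
prob_given_yes = {"sunny": 2/9, "cool": 3/9, "high": 3/9, "true": 3/9}
prob_given_no  = {"sunny": 3/5, "cool": 1/5, "high": 4/5, "true": 3/5}
prior_yes, prior_no = 9/14, 5/14

new_day = ["sunny", "cool", "high", "true"]   # outlook, temperature, humidity, windy

def likelihood(cond_probs, prior, values):
    """Multiply the class prior by the class-conditional probability of each value."""
    result = prior
    for v in values:
        result *= cond_probs[v]
    return result

like_yes = likelihood(prob_given_yes, prior_yes, new_day)   # ~0.0053
like_no  = likelihood(prob_given_no,  prior_no,  new_day)   # ~0.0206
print("prediction:", "yes" if like_yes > like_no else "no")  # -> no
```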
The Naive Bayes Classifier for
Data Sets with Numerical Attribute
Values
One common practice for handling numerical attribute values is to assume that they follow normal distributions.
The numeric weather data, with summary statistics:

outlook             temperature           humidity              windy               play
         yes   no             yes    no              yes    no          yes   no    yes    no
sunny     2     3              83    85                86    85   false   6     2     9      5
overcast  4     0              70    80                96    90   true    3     3
rainy     3     2              68    65                80    70
                               64    72                65    95
                               69    71                70    91
                               75                       80
                               75                       70
                               72                       90
                               81                       75
sunny    2/9   3/5  mean       73   74.6   mean       79.1  86.2  false  6/9   2/5   9/14   5/14
overcast 4/9   0/5  std dev   6.2    7.9   std dev    10.2   9.7  true   3/9   3/5
rainy    3/9   2/5
Let x_1, x_2, ..., x_n be the values of a numerical attribute in the training data set.

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

$$\sigma^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)^2$$

$$f(w) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(w - \mu)^2}{2\sigma^2}}$$
For example,

$$f(\text{temperature} = 66 \mid \text{Yes}) = \frac{1}{\sqrt{2\pi}\cdot 6.2}\, e^{-\frac{(66 - 73)^2}{2 \cdot 6.2^2}} = 0.0340$$

Likelihood of Yes = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036

Likelihood of No = 3/5 × 0.0291 × 0.038 × 3/5 × 5/14 = 0.000136
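The summary statistics and the Gaussian density used in this calculation can be sketched as follows; the temperature values for play = yes are taken from the numeric weather table above, while the function names are purely illustrative:

```python
import math

def mean_and_std(values):
    """Sample mean and standard deviation (divisor n-1), as in the formulas above."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / (n - 1))
    return mu, sigma

def gaussian_density(w, mu, sigma):
    """Normal density f(w) with mean mu and standard deviation sigma."""
    return math.exp(-(w - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Temperature values for play = yes, from the numeric weather table.
temp_yes = [83, 70, 68, 64, 69, 75, 75, 72, 81]
mu, sigma = mean_and_std(temp_yes)                  # ~73 and ~6.2
print(round(gaussian_density(66, mu, sigma), 4))    # ~0.034
```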
Instance-Based Learning

In instance-based learning, we take the k nearest training samples of a new instance (v_1, v_2, ..., v_m) and assign the new instance to the class that has the most instances among those k nearest training samples.

Classifiers that adopt instance-based learning are commonly called KNN classifiers.
The basic version of the KNN classifier works only for data sets with numerical values. However, extensions have been proposed for handling data sets with categorical attributes.
If the number of training samples is sufficiently large, then it can be proved statistically that the KNN classifier can deliver the accuracy achievable by learning from the training data set.
However, if the number of training
samples is not large enough, the KNN
classifier may not work well.
If the data set is noiseless, then the 1NN classifier should work well. In general, the noisier the data set, the higher k should be set. However, the optimal value of k should be determined through cross validation.
The ranges of attribute values should be normalized before the KNN classifier is applied. There are two common normalization approaches:

$$w = \frac{v - v_{\min}}{v_{\max} - v_{\min}}$$

$$w = \frac{v - \mu}{\sigma}$$

where μ and σ² are the mean and the variance of the attribute values, respectively.
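Both normalizations can be sketched in plain Python (the function names are illustrative; the second variant uses the sample standard deviation with divisor n-1, matching the definition above):

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] via (v - v_min) / (v_max - v_min)."""
    v_min, v_max = min(values), max(values)
    return [(v - v_min) / (v_max - v_min) for v in values]

def z_score_normalize(values):
    """Center on the mean and divide by the sample standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sigma = (sum((v - mu) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mu) / sigma for v in values]
```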
Cross Validation
Most data classification algorithms require some parameters to be set, e.g., k in the KNN classifier and the pruning threshold in the decision tree.
One way to find an appropriate parameter
setting is through k-fold cross validation,
normally k=10.
In k-fold cross validation, the training data set is divided into k subsets. Then k runs of the classification algorithm are conducted, with each subset serving as the test set once while the remaining (k-1) subsets are used as the training set.
The parameter values that yield maximum
accuracy in cross validation are then
adopted.
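A minimal sketch of the procedure, assuming a hypothetical train_and_score(train, test, param) callback that trains the classifier with the given parameter value and returns its accuracy on the test set:

```python
def k_fold_cross_validation(samples, k, param, train_and_score):
    """Split samples into k folds; each fold serves as the test set exactly once."""
    folds = [samples[i::k] for i in range(k)]   # simple interleaved split
    scores = []
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(train_and_score(train, test, param))
    return sum(scores) / k                       # mean accuracy over the k runs

# The parameter value (e.g. k in KNN) with the highest mean accuracy is adopted.
```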
Example of the KNN Classifiers




If a 1NN classifier is employed, then the prediction for A is X.
If a 3NN classifier is employed, then the prediction for A is O.
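A minimal KNN sketch in the same spirit, using Euclidean distance; the 2-D points and labels below are made up purely to illustrate how the 1NN and 3NN votes can differ:

```python
from collections import Counter

def knn_predict(training, new_point, k):
    """training is a list of (point, label); predict by majority vote of the k nearest."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    nearest = sorted(training, key=lambda item: dist(item[0], new_point))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical data: the single nearest neighbour of A is class "X",
# but two of the three nearest neighbours are class "O".
training = [((1.0, 1.0), "X"), ((1.2, 0.9), "O"), ((1.3, 1.1), "O"), ((3.0, 3.0), "X")]
A = (1.05, 1.0)
print(knn_predict(training, A, 1))   # -> "X"
print(knn_predict(training, A, 3))   # -> "O"
```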
