Increasing the Accuracy of Incremental Naive Bayes Classifier Using Instance Based Learning
Sotiris Kotsiantis, Educational Software Development Laboratory, Department of Mathematics, University of Patras, Rio 26504, Greece (e-mail: kotsiantis@upatras.gr)
DOI 10.1007/s12555-011-0099-1
1. INTRODUCTION
In many application domains, thousands of measurements are collected every day, so the amount of information stored in databases is massive and continuously growing. The knowledge extracted from these databases therefore needs to be continuously updated, otherwise it becomes outdated. The main problem with using traditional (non-incremental) learning algorithms to extract knowledge from such huge and continuously growing databases is the high computational effort required.
Incremental learning ability is vital for large datasets for two reasons. First, it is almost impossible to collect all useful training instances before the trained system is put into use. Second, modifying a trained system may be cheaper in time than building a new system from scratch.
The Naive Bayes (NB) classifier [5] is the simplest form of Bayesian network [15], since it encodes the assumption that every attribute is independent of all the other attributes, given the class feature. The NB algorithm is traditionally used in batch mode: rather than performing the majority of its computations after observing each training instance, it accumulates certain statistics over all of the training instances and then performs the final computations on the whole batch [5].
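Concretely, for an instance with feature values $f_1, \dots, f_n$, these computations implement the standard NB decision rule:

$$\hat{c} \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(f_i \mid c)$$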
However, nothing inherent in the algorithm stops one from using it to learn incrementally. As an example, consider how an incremental naive Bayes algorithm can work, assuming it makes one pass through the training set:
Training:
1. Initialize all the counts of feature values and the class totals to 0, then go through the training examples one at a time.
2. For each training example, given its feature vector x and its class value, go through the feature vector and increment the counts of the corresponding feature values for that class.
3. Convert these counts and totals to probabilities by dividing each count by the number of training examples in the same class c, e.g., P(f|c): the probability of each feature value f in each class c.
4. Compute the prior probabilities P(c) as the fraction of all training examples that are in class c.
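For illustration, here is a minimal Python sketch of this training pass, assuming categorical feature values; the class and method names are ours, not from the paper:

    from collections import defaultdict

    class IncrementalNB:
        """Incremental naive Bayes over categorical features (illustrative sketch)."""

        def __init__(self):
            self.class_counts = defaultdict(int)    # count of instances per class c
            self.feature_counts = defaultdict(int)  # count of (feature index, value, class)
            self.n_instances = 0

        def update(self, x, c):
            """Step 2: fold one training instance (feature vector x, class c) into the counts."""
            self.class_counts[c] += 1
            self.n_instances += 1
            for i, f in enumerate(x):
                self.feature_counts[(i, f, c)] += 1

        def prior(self, c):
            """Step 4: P(c), the fraction of all training examples in class c."""
            return self.class_counts[c] / self.n_instances

        def likelihood(self, i, f, c):
            """Step 3: P(f|c) as a relative frequency within class c (no smoothing,
            matching the description above; Laplace smoothing is a common refinement)."""
            return self.feature_counts[(i, f, c)] / self.class_counts[c]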
Classification:
1. Obtain the test instance.
2. Calculate the probability of the instance belonging to each class of the dataset, i.e., take P(c) times the calculated probability P(f|c) of each of the instance's feature values f in class c.
3. If the probability of the most probable class is at least two times the probability of the next most probable class, then the decision is that of the NB model; else:
a. Find the k (=3) nearest neighbors using the selected distance metric (Manhattan in our implementation).
b. Aggregate the decision of NB with the 3NN classifier by averaging the probabilities they assign to each class for the test instance.
c. The class with the highest averaged probability is the final decision.
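A matching sketch of the classification phase, again with illustrative names; it assumes numeric feature vectors for the Manhattan-distance kNN step and a dictionary of per-class NB probabilities computed from the counts above:

    import numpy as np

    def hybrid_predict(nb_probs, X_train, y_train, x, k=3, ratio=2.0):
        """Steps 2-3: decide with NB alone, or fall back to averaging with kNN.

        nb_probs: dict mapping each class label to its normalized NB probability
                  for instance x. X_train, y_train: stored training instances and
                  labels for the kNN fallback (labels appear as keys of nb_probs).
        """
        ranked = sorted(nb_probs, key=nb_probs.get, reverse=True)
        best, second = ranked[0], ranked[1]
        # Step 3: keep the NB decision when its top class clearly dominates.
        if nb_probs[best] >= ratio * nb_probs[second]:
            return best
        # Step 3a: k nearest neighbors under the Manhattan (L1) distance.
        dists = np.abs(np.asarray(X_train, float) - np.asarray(x, float)).sum(axis=1)
        neighbors = np.argsort(dists)[:k]
        knn_probs = {c: 0.0 for c in nb_probs}
        for j in neighbors:
            knn_probs[y_train[j]] += 1.0 / k  # each neighbor contributes one vote
        # Steps 3b-3c: average the two distributions and take the most probable class.
        avg = {c: 0.5 * (nb_probs[c] + knn_probs[c]) for c in nb_probs}
        return max(avg, key=avg.get)

Averaging the two distributions in step (b), rather than multiplying them, keeps either model from vetoing a class that the other still supports.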
Table 1. Accuracy (%) of the proposed algorithm, NB, 3NN, K* and NNge on the 34 benchmark datasets (dataset names were not preserved in this copy; rows are numbered instead). W/T/L: number of datasets on which the algorithm was significantly better than / not significantly different from / significantly worse than the proposed algorithm.

#      Proposed   NB        3NN       K*        NNge
1      73.07      72.64     67.97     80.32     73
2      69.46      57.41     67.23     72.01     74.26
3      100        99.66     100       90.27     100
4      90.1       90.53     86.74     88.72     80.46
5      73.22      72.7      73.13     73.73     67.8
6      96.19      96.07     96.6      95.35     96.18
7      80.57      78.7      80.95     75.71     79.02
8      75.15      75.16     72.21     70.17     69.24
9      82.41      77.86     84.96     79.1      82.83
10     75.72      75.75     73.86     70.19     72.84
11     66.31      49.45     70.02     75.31     67.98
12     73.31      75.06     69.77     70.27     66.8
13     83.74      83.34     81.82     75.18     77.78
14     84.15      83.95     82.33     77.83     79.6
15     84.93      83.59     79.11     76.44     77.3
16     83.11      83.81     80.85     80.17     81.88
17     91.14      82.17     86.02     84.64     90.6
18     95         95.53     95.2      94.67     96
19     93.37      93.57     92.83     92.03     86.23
20     84.77      83.13     81.74     85.08     77.14
21     80.37      73.38     78.97     80.27     86.73
22     56.87      56.83     54.74     58.35     53.98
23     93.45      93.45     86.72     86.22     89.08
24     47.2       49.71     44.98     38.02     39.09
25     80.04      67.71     83.76     85.11     71.12
26     85.26      85.7      82.29     80.85     81.01
27     78.31      77.85     78.9      77.56     70.69
28     66.17      44.68     70.21     70.22     62.26
29     90.74      90.02     93.08     93.22     95.1
30     98.13      97.46     95.85     98.72     95.93
31     96.25      94.97     92.61     96.03     94.09
32     80.12      80.09     21.79     32.02     73.06
33     74.58      74.67     19.05     31.27     67.96
34     95.31      96.00     75.35     51.00     94.30
W/T/L             0/27/7    2/23/9    4/15/15   1/21/12
5. CONCLUSION
Online learning is essential when the dataset is large enough that multiple passes through it would be time-consuming. In this study, we attempted to increase the prediction accuracy of an incremental version of the Naive Bayes model by integrating instance based learning. We performed a large-scale comparison with other state-of-the-art algorithms and ensembles on 34 standard benchmark datasets and obtained better accuracy in most cases. We believe that the simplicity of this algorithm and its strong performance make it an appealing tool for pattern classification. In spite of these results, however, no general method will always work best.
We have mostly used our online algorithm to learn static datasets, i.e., those that do not have any temporal ordering among the training instances. Much data mining research is concerned with finding methods applicable to the increasing variety of available data types: time series, multimedia, spatial data, worldwide web logs, etc. Applying the presented algorithm to these other types of data is an important area of future work.
Fig. 2. Representative graphs showing how the accuracy increases as more instances are added.
Fig. 3. Representative graphs showing how the accuracy increases as more instances are added.
Table 2. Accuracy (%) of the proposed algorithm, SMO, RIPPER, C4.5 and AODE on the same 34 datasets (same notation as Table 1).

#      Proposed   SMO       RIPPER    C4.5      AODE
1      73.07      80.77     73.1      77.26     72.73
2      69.46      71.34     73.62     81.77     74.76
3      100        100       100       100       100
4      90.1       87.57     80.3      77.82     69.96
5      73.22      69.52     71.45     74.28     73.05
6      96.19      96.75     95.61     95.01     97.05
7      80.57      82.66     85.1      85.16     82.45
8      75.15      75.09     72.21     71.25     75.83
9      82.41      84.88     85.16     85.57     86.67
10     75.72      76.8      75.18     74.49     75.70
11     66.31      57.36     66.78     67.63     74.53
12     73.31      73.4      72.72     71.05     71.57
13     83.74      83.86     79.95     76.94     82.87
14     84.15      82.74     79.57     80.22     84.33
15     84.93      83.89     78.7      78.15     82.70
16     83.11      85.77     78.13     79.22     85.36
17     91.14      88.07     89.16     89.74     91.09
18     95         96.27     93.93     94.73     93.07
19     93.37      92.97     83.7      78.6      88.43
20     84.77      86.48     76.31     75.84     86.86
21     80.37      79.58     83.87     80.61     82.32
22     56.87      58.7      56.21     57.75     59.62
23     93.45      93.45     84.8      92.95     93.21
24     47.2       47.09     38.74     41.39     49.77
25     80.04      76.6      73.4      73.61     77.05
26     85.26      86.72     86.44     86.75     86.08
27     78.31      77.6      78.01     78.55     78.21
28     66.17      74.08     68.32     72.28     70.32
29     90.74      95.77     95.75     96.57     94.28
30     98.13      98.76     93.14     93.2      98.31
31     96.25      96.05     86.62     92.61     94.66
32     80.12      81.62     80.78     81.57     76.06
33     74.58      73.98     76.38     74.64     69.96
34     95.31      95.35     74.45     77.90     95.35
W/T/L             3/27/4    2/24/8    5/20/9    4/25/5
REFERENCES
[1] P. Auer and M. Warmuth, "Tracking the best disjunction," Machine Learning, vol. 32, no. 2, pp. 127-150, 1998.
[2] F. Chu and C. Zaniolo, "Fast and light boosting for adaptive mining of data streams," Lecture Notes in Computer Science, vol. 3056, pp. 282-292, 2004.
[3] J. G. Cleary and L. E. Trigg, "K*: an instance-based learner using an entropic distance measure," Proc. of the 12th International Conference on Machine Learning, pp. 108-114, 1995.
[4] W. Cohen, "Fast effective rule induction," Proc. of the 12th International Conference on Machine Learning (ML-95), pp. 115-123, 1995.
[5] P. Domingos and M. Pazzani, "On the optimality of the simple Bayesian classifier under zero-one loss," Machine Learning, vol. 29, no. 2-3, pp. 103-130, 1997.
[6] W. Fan, S. Stolfo, and J. Zhang, "The application of AdaBoost for distributed, scalable and online learning," Proc. of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, pp. 362-366, 1999.
[7] F. Fdez-Riverola, E. L. Iglesias, F. Díaz, J. R. Méndez, and J. M. Corchado, "Applying lazy learning algorithms to tackle concept drift in spam filtering," Expert Systems with Applications, vol. 33, no. 1, pp. 36-48, 2007.
[8] A. Fern and R. Givan, "Online ensemble learning: an empirical study," Proc. of the 17th International Conference on Machine Learning, pp. 279-286, 2000.
[9] A. Frank and A. Asuncion, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml], Irvine, CA: University of California, 2010.
[10] Y. Freund and R. Schapire, "Large margin classification using the perceptron algorithm," Machine Learning, vol. 37, no. 3, pp. 277-296, 1999.
[11] A. Gangardiwala and R. Polikar, "Dynamically weighted majority voting for incremental learning and comparison of three boosting based approaches," Proc. of the International Joint Conference on Neural Networks, pp. 1131-1136, 2005.
[12] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: an update," SIGKDD Explorations, vol. 11, no. 1, pp. 10-18, 2009.
[13] L. I. Kuncheva, "Classifier ensembles for changing environments," Proc. of the 5th International Workshop on Multiple Classifier Systems, LNCS vol. 3077, pp. 1-15, 2004.
[14]-[30] (remaining reference entries not recoverable from this copy)