graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the positives (the true positive rate, TPR) against the fraction of false positives out of the negatives (the false positive rate, FPR) at various threshold settings. TPR is also known as sensitivity, and FPR is one minus the specificity, or true negative rate. ROC graphs are two-dimensional graphs in which the true positive rate is plotted on the Y axis and the false positive rate is plotted on the X axis. An ROC graph depicts the relative tradeoff between benefits (true positives) and costs (false positives).
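The construction described above can be sketched in a few lines of Python; the scores and labels below are illustrative values, not drawn from the thesis dataset.

```python
def roc_points(scores, labels, thresholds):
    """For each threshold t, classify score >= t as positive and
    return (FPR, TPR) pairs: TPR = TP/P, FPR = FP/N."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    return pts

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # classifier confidences (illustrative)
labels = [1, 1, 0, 1, 0, 0]               # 1 = positive, 0 = negative
points = roc_points(scores, labels, thresholds=[0.0, 0.5, 1.0])
```

Sweeping the threshold from 1 down to 0 moves the operating point from (0, 0) to (1, 1), tracing the ROC curve.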
According to John Peter et al. (2012), a confusion matrix displays the number of correct and incorrect predictions made by the model compared with the actual classifications in the test data. The matrix is n-by-n, where n is the number of classes, and from it the accuracy of each classification algorithm is calculated.
Table 5.1 A simple confusion matrix table

                    Predicted class
Actual class        C1                 C2
C1                  True positives     False negatives
C2                  False positives    True negatives
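As a minimal sketch, the cell counts of Table 5.1 can be computed directly from paired lists of actual and predicted labels; the label lists below are illustrative.

```python
def confusion_matrix(actual, predicted, positive="C1"):
    """Return (TP, FN, FP, TN) in the layout of Table 5.1:
    rows are the actual class, columns the predicted class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    return tp, fn, fp, tn

actual    = ["C1", "C1", "C2", "C2", "C1"]
predicted = ["C1", "C2", "C2", "C1", "C1"]
tp, fn, fp, tn = confusion_matrix(actual, predicted)
```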
The confusion matrix indicating the accuracy of the SVM classifier for the
given data set is shown in Table 5.2.
Table 5.2 The confusion matrix of the SVM classifier

                True low    True high    Class precision
pred. low       355         24           93.67%
pred. high      3           118          97.52%
class recall    99.16%      83.10%
From the results obtained, it can be seen that the classifier exhibits a very high classification accuracy, i.e., 94.60% overall. It also shows a very high precision for the positive class (97.52%), and the recall of the positive class is quite good (83.10%). In the case of the negative class, the classifier exhibits high precision (93.67%) as well as high recall (99.16%).
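The figures quoted above can be reproduced from the cell counts of Table 5.2; note that the pred. high / true low count of 3 used below is inferred from the reported precision and recall rather than read from the table.

```python
# (predicted, true) -> count, following Table 5.2;
# the ("high", "low") cell of 3 is inferred from the reported 97.52% precision.
matrix = {("low", "low"): 355, ("low", "high"): 24,
          ("high", "low"): 3, ("high", "high"): 118}

total = sum(matrix.values())
correct = matrix[("low", "low")] + matrix[("high", "high")]
accuracy = correct / total                       # 473 / 500 = 0.946

# precision over a predicted row, recall over a true column
precision_high = matrix[("high", "high")] / (matrix[("high", "low")] + matrix[("high", "high")])
recall_high = matrix[("high", "high")] / (matrix[("low", "high")] + matrix[("high", "high")])
```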
The possibility of diagnosing heart disease vulnerability in diabetic patients with reasonable accuracy has been shown. Classifiers of this kind can help in early detection of the vulnerability of a diabetic patient to heart disease, so that patients can be forewarned to change their lifestyles. This results in preventing diabetic patients from being affected by heart disease, in turn resulting in low mortality rates as well as reduced cost on health for the state. SVMs have proved to be a classification technique with excellent predictive performance and have also been investigated with the help of ROC curves for both training and testing data. Hence, this SVM model can be recommended for the classification of the diabetic dataset.
5.3 COMPARING SUPPORT VECTOR MACHINE AND DECISION TREE
EXPERIMENTAL RESULTS
In the third experiment, RapidMiner has been used as a tool due to its learning operators and operator framework, which allows forming nearly arbitrary processes. Apart from accuracy, the ROC curve, AUC and lift chart are determined for measuring the performance.
5.3.1 ROC / AUC and Performance for the SVM classifier
In data mining and association rule learning, lift is a measure of the performance of a targeting model at predicting or classifying cases as having an enhanced response, measured against a random-choice targeting model. Lift is simply the ratio of the target response divided by the average response. This operator creates a lift chart based on a plot of the discretized confidence values for the given example set and model.
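As a small illustration of the definition above; the response counts used here are hypothetical.

```python
def lift(targeted_responses, targeted_size, total_responses, total_size):
    """Lift = response rate in the targeted segment / average response rate."""
    target_rate = targeted_responses / targeted_size
    average_rate = total_responses / total_size
    return target_rate / average_rate

# Hypothetical top segment: 50 responders out of 100 targeted,
# against 200 responders in the full population of 1000.
value = lift(50, 100, 200, 1000)   # 0.5 / 0.2 = 2.5
```

A lift of 2.5 means the model's targeted segment responds 2.5 times more often than a random selection of the same size.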
The AUC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance, and it is also used for comparing classifiers. However, ROC and AUC here use a single training and testing pair, so the computed value is an estimate of this probability.
We have set up the value of C as 5 and gamma as 1 for the SVM to operate with the RBF (Radial Basis Function) kernel type, and used the type C-SVC, which is the standard regularized algorithm. A common method is to calculate the area under the ROC curve, abbreviated AUC. Since the AUC is a portion of the area of the unit square, its value will always be between 0 and 1. However, because random guessing produces the diagonal line between (0, 0) and (1, 1), which has an area of 0.5, no realistic classifier should have an AUC less than 0.5. The AUC measures the discriminating ability of a binary classification model: the larger the AUC, the higher the likelihood that an actual positive case will be assigned a higher probability of being positive than an actual negative case.
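The probabilistic interpretation of AUC stated above can be computed directly as a rank statistic over all positive-negative pairs; the score lists here are illustrative.

```python
def auc_rank(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive instance
    outranks a randomly chosen negative one; ties count half
    (the normalized Mann-Whitney U statistic)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

auc = auc_rank([0.9, 0.8, 0.4], [0.7, 0.3, 0.1])   # 8 of 9 pairs ranked correctly
```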
The result of the area under the curve for the SVM classifier used in the RapidMiner tool is shown in Figure 5.3. The red color indicates the ROC and the blue color indicates the ROC threshold.
The performance of the SVM classifier indicating the accuracy with two classes, high and low, for the given data set is shown in Table 5.3.

Table 5.3 Performance of SVM with an accuracy of 79.8 % (High and Low)

                True low    True high    Class precision
pred. low       755         209          78.32%
pred. high      0           36           100.00%
class recall    100.00%     14.69%
The performance of the decision tree indicating the accuracy with two classes, high and low, using information gain as the split parameter for the given data set is shown in Table 5.4.

Table 5.4 Performance of Decision tree with an accuracy of 89.2 % (High and Low) using Information gain as split parameter

                True low    True high    Class precision
pred. low       654         7            98.94%
pred. high      101         238          70.21%
class recall    86.62%      97.14%
As shown in Table 5.5, of the two models, the decision tree appears to be the more effective, as it has the higher percentage of correct predictions (89.2%) for patients with heart diseases, followed by the support vector machine.
Table 5.5 Accuracy of SVM and Decision Tree (High and Low)

Technique                 Accuracy in percentage
Decision tree             89.20
Support Vector Machine    79.80
The use of RapidMiner has simplified the effort of k-fold validation and the generation of AUC and ROC, which has helped in the proper evaluation of the performance of the learning models, so that the best classifier can be identified and further refined for better prediction.
5.4 COMPARING NAIVE BAYES, SUPPORT VECTOR MACHINE AND
DECISION TREE EXPERIMENTAL RESULTS
In the final experiment, RapidMiner has again been used as a tool for evaluating and comparing three classification techniques, using the three classes high, medium and low with the diabetic patient dataset, to determine possible ways to predict the risk of heart disease for diabetic patients.
In general, the Bayes theorem formula is P(h|D) = P(D|h) P(h) / P(D), where P(h) is the prior probability of hypothesis h, P(D) is the prior probability of the data D, P(D|h) is the probability of observing D given h, and P(h|D) is the posterior probability of h given D. The Naive Bayes algorithm uses the Bayes formula to calculate the probability of a patient record Y having the class label Cj. The label could be High, Medium or Low.
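As a worked example of the formula; the probabilities below are hypothetical, not estimated from the dataset.

```python
def posterior(p_d_given_h, p_h, p_d):
    """Bayes theorem: P(h|D) = P(D|h) * P(h) / P(D)."""
    return p_d_given_h * p_h / p_d

# Hypothetical values: prior risk P(h) = 0.3, likelihood of the observed
# record under that class P(D|h) = 0.5, evidence P(D) = 0.4.
p = posterior(0.5, 0.3, 0.4)   # 0.375
```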
We have taken age, sex, smoking, alcohol and cholesterol HDL as the prime attributes to evaluate Naive Bayes, with the plots shown respectively in Figures 5.5, 5.6, 5.7, 5.8 and 5.9. In Figure 5.5, the X axis denotes age and the Y axis denotes the density. At the age of 52 the risk is high, at the age of 54 the risk is medium, and at the age of 55 the risk is low.
Table 5.6 Naive Bayes distribution table for age attribute

Attribute    Parameter             Low      Medium    High
Age          Mean                  54.92    53.62     51.24
Age          Standard deviation    12.03    11.90     11.33
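Using the means and standard deviations in Table 5.6, the Gaussian density that Naive Bayes assigns to the age attribute can be evaluated per class. Equal class priors are assumed here for simplicity; the actual priors from the data would shift the comparison, which is why the reported risk at ages 54 and 55 differs from a pure density comparison.

```python
import math

def gaussian_density(x, mean, std):
    """Normal density used by Naive Bayes for continuous attributes."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# Means and standard deviations for the age attribute from Table 5.6
params = {"Low": (54.92, 12.03), "Medium": (53.62, 11.90), "High": (51.24, 11.33)}

# With equal priors, the predicted class at age 52 is the one with max density
densities = {c: gaussian_density(52, m, s) for c, (m, s) in params.items()}
best = max(densities, key=densities.get)   # "High", consistent with Figure 5.5
```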
Similarly, the distribution tables for the above five attributes are shown in Tables 5.6, 5.7, 5.8, 5.9 and 5.10. In Figure 5.6, the X axis denotes sex and the Y axis denotes the density.
Table 5.7 Naive Bayes distribution table for sex attribute

Attribute    Parameter    Low     Medium    High
Sex          value=M      0.61    0.55      0.49
Sex          value=F      0.40    0.45      0.50
Table 5.8 Naive Bayes distribution table for smoking attribute

Attribute    Parameter         Low      Medium    High
Smoking      value=yes         0.163    0.151     0.173
Smoking      value=- (no)      0.485    0.459     0.534
Smoking      value=unknown     0.350    0.389     0.293
Table 5.9 Naive Bayes distribution table for alcohol attribute

Attribute    Parameter         Low     Medium    High
Alcohol      value=yes         0.09    0.08      0.04
Alcohol      value=- (no)      0.56    0.48      0.62
Alcohol      value=unknown     0.35    0.43      0.35
Table 5.10 Naive Bayes distribution table for cholesterol HDL attribute

Attribute      Parameter             Low     Medium    High
Cholesterol    Mean                  3.92    4.38      4.89
Cholesterol    Standard deviation    0.78    0.86      0.97
Table 5.11 Performance of Naive Bayes with an accuracy of 81.58 % (High, Medium and Low)

                True low    True medium    True high    Class precision
pred. low       631         62             21           88.38%
pred. medium    39          98             26           60.12%
pred. high      11          25             86           70.49%
class recall    92.66%      52.97%         64.66%
Table 5.12 Performance of SVM with an accuracy of 61.26 % (High, Medium and Low)

                True low    True medium    True high    Class precision
pred. low       398         25             15           90.87%
pred. medium    283         155            59           31.19%
pred. high      0           5              59           92.19%
class recall    58.44%      83.78%         44.36%
The decision tree has been tried using various split methods, namely Gain ratio, Information gain and Gini index, which give different levels of accuracy as shown in Table 5.13.
Table 5.13 Accuracy of the decision tree for various split methods

Split method criteria    Accuracy in percentage    Classification error in percentage
Gain ratio               88.19                     11.81
Information gain         90.79                     9.21
Gini Index               87.69                     12.31
Table 5.14 Performance of Decision tree with an accuracy of 90.79 % (High, Medium and Low) using Information gain as split parameter

                   pred. low    pred. medium    pred. high    class recall
True low           657          18              6             96.48%
True medium        25           148             12            80.00%
True high          11           20              102           76.69%
class precision    94.81%       79.57%          85.00%
Niyati Gupta et al. (2013) have defined accuracy as the proportion of instances that are correctly classified. It is calculated as the total number of correctly predicted high risk (true positive) and correctly predicted low risk (true negative) instances over the total number of classifications:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

For a multiclass classification problem, TP, FP, TN and FN are defined per class, denoted TPi, FPi, TNi and FNi for class i. Certain parameters can then be calculated to evaluate the multiclass classification results accordingly: for example, the true positive rate (TPR), precision and F-measure value for each class, and the overall accuracy.
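For instance, the per-class precision and recall and the overall accuracy can be computed from a three-class confusion matrix; the counts below are those of the first three-class confusion matrix above (rows are predicted classes, columns are true classes, in the order low, medium, high).

```python
# 3-class confusion matrix: rows = predicted class, columns = true class
M = [[631, 62, 21],
     [39, 98, 26],
     [11, 25, 86]]

classes = range(3)
total = sum(sum(row) for row in M)
accuracy = sum(M[i][i] for i in classes) / total                     # diagonal / total

precision = [M[i][i] / sum(M[i]) for i in classes]                   # over each predicted row
recall = [M[i][i] / sum(M[r][i] for r in classes) for i in classes]  # over each true column
```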
Table 5.15 Accuracy of the three classification models (High, Medium and Low)

Technique                 Accuracy in percentage
Decision tree             90.79
Naive Bayes               81.58
Support Vector Machine    61.26
As shown in Table 5.15, of the three models, the decision tree appears to be the most effective, as it has the highest percentage of correct predictions (90.79%) for patients with heart diseases, followed by Naive Bayes and the support vector machine. The performance in terms of graphs for accuracy, precision, sensitivity, specificity and F-score is shown in Figures 5.10, 5.11, 5.12, 5.13 and 5.14 respectively.
When more than two classes are dealt with, accuracy alone might not be sufficient. So precision, sensitivity, specificity and F-score have been evaluated along with accuracy to determine the right classifier.
According to Sheik Abdullah et al. (2012) precision is the fraction of
retrieved instances that are relevant and recall is the fraction of relevant instances
that are retrieved.
The precision can be calculated as
Precision = TP / (TP + FP)
However, TP rate alone is not sufficient to fully measure performance of the
classifier in a single class. Therefore we compute Precision for class i as,
Precision i = TPi / (TPi + FPi)
The sensitivity is the proportion of positive instances that are correctly classified as positive (e.g. the proportion of sick people that are classified as sick). It is also called recall.
It can be calculated as
Sensitivity = TP / (TP + FN)
The specificity is the proportion of negative instances that are correctly
classified as negative (e.g. the proportion of healthy people that are classified as
healthy). It can be calculated as
Specificity = TN / (TN + FP)
F-score or F-measure is a measure of a test's accuracy and it is the harmonic
mean of precision and recall which can be calculated as
F-score = 2 * (Precision * Recall) / (Precision + Recall)
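The four definitions above can be combined in a short function; the counts in the example call are hypothetical.

```python
def binary_metrics(tp, tn, fp, fn):
    """Sensitivity (recall), specificity, precision and F-score
    from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, precision, f_score

# Hypothetical counts: 80 true positives, 90 true negatives,
# 10 false positives, 20 false negatives
sens, spec, prec, f = binary_metrics(80, 90, 10, 20)
```

As the harmonic mean, the F-score is pulled toward the smaller of precision and recall, penalizing classifiers that trade one off heavily against the other.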
Table 5.16 Performance of the classification models in terms of accuracy, precision, sensitivity, specificity and F-score

Technique                 Accuracy    Precision    Sensitivity    Specificity    F-Score
Decision tree             90.79       0.86         0.84           0.93           0.84
Naive Bayes               81.58       0.72         0.69           0.85           0.70
Support Vector Machine    61.26       0.71         0.61           0.80           0.65
[Table: class-wise precision and recall (sensitivity) for the Low, Med and High classes of the Decision tree, Naive Bayes and Support Vector Machine models]
5.5 SUMMARY
The main focus in this chapter is on the application of three different data mining algorithms, namely Naive Bayes, support vector machine and decision tree, to the diabetes dataset to predict the risk of heart disease based on their predictive accuracy. A comparison of the outcomes of the various classification techniques has been made, and a higher degree of accuracy is found for the decision tree. The performances are compared through accuracy, precision, sensitivity, specificity and F-score. For future research, stacking techniques can be used to increase the accuracy of decision trees and reduce the number of leaf nodes.