GREAT STEP – SAFETY DATA ANALYTICS ABSTRACT SUBMISSION

TEAM MATES:

SRI CHANDRA DUDDU – 14AG36001

SRICHANDRA CHILAPPAGARI – 13EC35014

Abstract Submission:

1. We reviewed the predictor variables and dropped ‘Id’ and ‘Phone number’, because
each is unique to a customer and so cannot help prediction. The importance plot
from the randomForest package in R confirms this.
2. The Random Forest importance plot also shows that ‘Area Code’ is the least
important variable, with less than 5 % importance.
3. After one-hot encoding the categorical variable ‘State’, we saw a decrease in
accuracy, so we dropped this variable.
4. We found no missing values in the data. We performed stratified sampling using
‘createDataPartition’ (caret package) and divided the dataset into a train set
and a test set in a 70:30 split.
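
The stratified 70:30 split in step 4 can be illustrated with a minimal Python sketch. This is not caret's implementation; it only mimics the idea of splitting each class in the same proportion. The function name `stratified_split`, the seed, and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx) so that each class is split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # random order within the class
        cut = round(len(indices) * train_fraction) # size of the train share for this class
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels (hypothetical, not the competition data): 80 False, 20 True
labels = [False] * 80 + [True] * 20
train_idx, test_idx = stratified_split(labels)
train_true = sum(labels[i] for i in train_idx)
print(len(train_idx), len(test_idx), train_true)
```

Because the split is done per class, the share of ‘True’ cases is the same in both sets, which matters when one class is much rarer than the other.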

RESULTS-

1. For Naive Bayes:

            Reference
Prediction  False  True
False        1225   116
True           62    96

Accuracy: 88.12 %
Precision: 91.35 %
Recall: 95.2 %
2. For Decision Tree:

            Reference
Prediction  False  True
False        1269    56
True           18   156

Accuracy: 95.06 %
Precision: 95.8 %
Recall: 98.6 %

3. For SVM – radial kernel:

            Reference
Prediction  False  True
False        1275   117
True           12    95

Accuracy: 91.39 %
Precision: 91.6 %
Recall: 99.1 %

4. For SVM – polynomial kernel:

            Reference
Prediction  False  True
False        1280   123
True            7    89

Accuracy: 91.32 %
Precision: 91.2 %
Recall: 99.5 %
5. For SVM – Linear kernel:

            Reference
Prediction  False  True
False        1287   212
True            0     0

Accuracy: 85.85 %
Precision: 85.9 %
Recall: 100 %

6. For SVM – sigmoid kernel:

            Reference
Prediction  False  True
False        1195   190
True           92    22

Accuracy: 81.18 %
Precision: 86.3 %
Recall: 92.9 %
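
The metrics above can be recomputed from the confusion matrices. The sketch below assumes that ‘False’ is treated as the positive class, which is what the reported recall values correspond to; the helper name `summarize` is hypothetical.

```python
def summarize(ff, ft, tf, tt):
    """Cells are (prediction, reference): ff = pred False/ref False, ft = pred False/ref True,
    tf = pred True/ref False, tt = pred True/ref True. 'False' is assumed to be the positive class."""
    total = ff + ft + tf + tt
    return {
        "accuracy": (ff + tt) / total,  # correct predictions over all cases
        "precision": ff / (ff + ft),    # of all predicted False, share that are really False
        "recall": ff / (ff + tf),       # of all real False, share predicted False
    }

# Confusion-matrix cells copied from the results above
models = {
    "Naive Bayes": (1225, 116, 62, 96),
    "Decision Tree": (1269, 56, 18, 156),
    "SVM radial": (1275, 117, 12, 95),
    "SVM polynomial": (1280, 123, 7, 89),
    "SVM linear": (1287, 212, 0, 0),
    "SVM sigmoid": (1195, 190, 92, 22),
}

results = {name: summarize(*cells) for name, cells in models.items()}
for name, m in results.items():
    print(f"{name}: accuracy={m['accuracy']:.2%} "
          f"precision={m['precision']:.2%} recall={m['recall']:.2%}")
```

Note that the linear-kernel model predicts ‘False’ for every case, so its accuracy equals the share of ‘False’ cases in the test set (1287 of 1499).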

We did not tune any parameters for rpart and Naive Bayes. To improve the accuracy of
the support vector classifier, we need to select the best parameters for the model.
We trained many models over different pairs of ϵ (epsilon) and cost values, and chose
the best one based on the root mean square error (RMSE).
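
The search over (epsilon, cost) pairs can be sketched as an exhaustive grid search that keeps the pair with the lowest score. This is a minimal Python illustration, not the tuning code used for the results: the grid values are assumed, and `toy_score` is a made-up stand-in for "train an SVM with these parameters and return its RMSE".

```python
import itertools
import math

def grid_search(param_grid, evaluate):
    """Evaluate every parameter combination; return (best_params, best_score), lowest score wins."""
    names = list(param_grid)
    best_params, best_score = None, math.inf
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Hypothetical error surface with an arbitrary minimum; a real run would
    # fit a model here and return its validation RMSE instead.
    return (params["epsilon"] - 0.3) ** 2 + (math.log2(params["cost"]) - 5) ** 2

# Assumed grid: epsilon from 0 to 1 in steps of 0.1, cost in powers of two
grid = {
    "epsilon": [i / 10 for i in range(11)],
    "cost": [2 ** k for k in range(2, 10)],
}

best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The number of models grows as the product of the grid sizes (here 11 x 8 = 88), which is why a coarse grid is usually tried first.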

For SVM:

Best accuracy obtained: 92.93 %

Gamma: 0.0556
Cost: 16
Epsilon: 0
Degree: 3
Kernel: polynomial
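
To show what gamma and degree control, here is a minimal Python sketch of the polynomial kernel in the form used by libsvm, (gamma * u·v + coef0)^degree. The value coef0 = 0 is an assumption (it is not reported above), and the two input vectors are hypothetical.

```python
def polynomial_kernel(u, v, gamma=0.0556, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * <u, v> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(u, v))
    return (gamma * dot + coef0) ** degree

# Hypothetical feature vectors, for illustration only
u = [1.0, 2.0, 0.5]
v = [0.5, 1.0, 2.0]
similarity = polynomial_kernel(u, v)
print(similarity)
```

Gamma scales the dot product before it is raised to the given degree, so larger gamma or degree values make the kernel more sensitive to differences between observations.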

In the figure below, dark blue regions represent the SVM models with lower RMSE.
The darker the region, the lower the RMSE of the model.
