
I think that before classifying any data set with any algorithm, we should first
find out the baseline accuracy for that data set. A data set with a nominal class
should first be classified with ZeroR. ZeroR ignores all other attributes and
classifies the data from the class frequency table: for example, if "Yes" is the
most frequent value of the class attribute, it assigns "Yes" to every instance.
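The baseline idea can be sketched in a few lines of Python. This is only an illustration of the majority-class rule, not Weka's ZeroR implementation; the function names and the label list are made up for the example.

```python
from collections import Counter


def zero_r(train_labels):
    """Return the most frequent class label seen in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]


def baseline_accuracy(train_labels, test_labels):
    """Fraction of test labels matched by always predicting the majority class."""
    majority = zero_r(train_labels)
    hits = sum(1 for label in test_labels if label == majority)
    return hits / len(test_labels)


# Hypothetical class column, for illustration only.
labels = ["Yes", "Yes", "No", "Yes", "No"]
majority_label = zero_r(labels)
accuracy = baseline_accuracy(labels, labels)
```

Any other classifier is worth using on a data set only if its accuracy is clearly above the value this baseline gives.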
When the mentioned classifiers are used to analyse the zoo data set, they work
very well, better than ZeroR. This is not guaranteed, however: the result depends
on what the data set consists of and how its attributes are linked, as the
supermarket data set shows. On the supermarket data set these algorithms are
worse than the baseline accuracy.

When the zoo data set is classified using the JRip classifier, the following
rules are generated:
if animal = frog then it is of type amphibian
if legs >= 6 and predator = false then it is of type insect
if backbone = false then it is of type fish
if feathers = true then it is of type bird
It has an accuracy of 88.253%, but it can still do better.

When J48 is used on this data set it outperforms the other classifiers, with an
accuracy of 94.117%. Its classifying rules are as follows. If feathers = true
then the animal is of type bird; otherwise it looks at milk, and if milk = true
then it is of type mammal. Otherwise it looks at backbone. If backbone is true
it checks another attribute called fins, and if fins is true then the animal is
definitely of type fish; otherwise it looks at tail, and if the animal has a
tail then it is an amphibian, else a reptile. When backbone is false it looks at
attributes such as airborne, predator and legs, and depending on their values
the animal is classified as invertebrate or insect.

In the case of the bolts data set, the linear regression classifier works by
calculating weights from the training data and then calculating the predicted
value for the first training instance. It is computed using the formula
described in the algorithm.
In this case, the LinearRegression classifier developed this model:
Linear Regression Model
TIME =   1.8997 * TOTAL
       + 1.1448 * T20BOLT
       - 40.5554
This model predicts TIME, so it can be used to work out how to get the shortest
time to count 20 bolts.
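As a small worked example, the model can be written as a Python function that plugs attribute values into the reported coefficients. The input values below are hypothetical and chosen only to show the arithmetic; they are not taken from the bolts data set.

```python
def predict_time_linear(total, t20bolt):
    """Predicted TIME from the linear regression model reported above."""
    return 1.8997 * total + 1.1448 * t20bolt - 40.5554


# Hypothetical inputs: 1.8997*20 + 1.1448*60 - 40.5554 = 66.1266
example_time = predict_time_linear(total=20, t20bolt=60.0)
```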
Correlation coefficient          0.9456
Mean absolute error             10.2265
Root mean squared error         12.9832
Relative absolute error         31.7993 %
Root relative squared error     31.5208 %
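For readers who want to see what these five figures measure, here is a rough Python sketch of the usual definitions. It assumes the relative errors are taken against a predictor that always outputs the mean of the actual values passed in; Weka's own evaluation may differ in detail (for example in which mean it uses), so treat the function as illustrative only.

```python
import math


def regression_metrics(actual, predicted):
    """Return correlation, MAE, RMSE, RAE (%) and RRSE (%) for two equal-length lists."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))

    # Errors of the assumed baseline that always predicts the mean of the actual values.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)

    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "correlation": cov / math.sqrt(base_sq * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sq_err / base_sq),
    }


# Made-up numbers: every prediction is 0.5 too high.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

A correlation close to 1 together with relative errors well below 100% means the model does much better than simply predicting the mean.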

Since the correlation coefficient for the LinearRegression classifier is high
(0.9456), LinearRegression did a pretty good job. But when the M5P classifier,
i.e. a non-linear regression classifier, is used on the same data, it does a
better job than linear regression.
M5 pruned model tree:
(using smoothed linear models)
T20BOLT <= 62.365 : LM1 (29/4.058%)
T20BOLT >  62.365 :
|   TOTAL <= 20 : LM2 (3/5.86%)
|   TOTAL >  20 : LM3 (8/0.008%)

LM num: 1
TIME =
1.1824 * TOTAL
+ 0.4414 * NUMBER2
+ 0.7813 * T20BOLT
- 21.3755
LM num: 2
TIME =
0.0561 * RUN
+ 2.4037 * TOTAL
+ 1.0813 * T20BOLT
- 52.9476
LM num: 3
TIME =
0.0439 * RUN
+ 2.1194 * TOTAL
+ 1.2106 * T20BOLT
- 48.5376
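The model tree can be read as a piecewise-linear function: the tree picks a leaf, and the leaf's linear model gives the prediction. The sketch below assumes the splits pair with the leaves as T20BOLT <= 62.365 giving LM1, and for T20BOLT > 62.365, TOTAL <= 20 giving LM2 and TOTAL > 20 giving LM3. It uses the printed coefficients directly and ignores any further details of Weka's implementation; the input values are hypothetical.

```python
def predict_time_m5(run, total, number2, t20bolt):
    """Predicted TIME: choose a leaf of the model tree, then apply its linear model."""
    if t20bolt <= 62.365:
        # LM1
        return 1.1824 * total + 0.4414 * number2 + 0.7813 * t20bolt - 21.3755
    if total <= 20:
        # LM2
        return 0.0561 * run + 2.4037 * total + 1.0813 * t20bolt - 52.9476
    # LM3
    return 0.0439 * run + 2.1194 * total + 1.2106 * t20bolt - 48.5376


# Hypothetical instances, one per leaf.
time_lm1 = predict_time_m5(run=1, total=10, number2=2, t20bolt=50.0)
time_lm2 = predict_time_m5(run=10, total=20, number2=0, t20bolt=70.0)
time_lm3 = predict_time_m5(run=10, total=30, number2=0, t20bolt=70.0)
```

Because each region of the attribute space gets its own linear model, the overall fit can follow non-linear behaviour that a single regression line cannot.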
These are the three linear models, one for each leaf of the tree, that are used
to predict TIME. M5P is better than LinearRegression, as the following results
show.
Correlation coefficient          0.9749
Mean absolute error              5.6659
Root mean squared error          8.9528
Relative absolute error         17.6182 %
Root relative squared error     21.7357 %

Similarly, when the bolts data set is classified using the KStar and
DecisionTable classifiers, the results are average. Hence both M5P and
LinearRegression clearly did a better job than KStar and DecisionTable.
