
Regression models

Course Project

Student: Jonatan Bording

Abstract
Using regression models and exploratory data analysis on a dataset containing specifications
for 32 different cars, we investigated whether an automatic or manual transmission is
better for fuel efficiency. Automatic models had on average significantly better fuel efficiency. Car
weight and 1/4 mile time proved to be significant predictors of fuel efficiency, with weight showing a
stronger negative correlation for automatic transmission than manual transmission and 1/4 mile
time showing a stronger positive correlation for automatic transmission than manual transmission.
We conclude that the expected fuel efficiency for a light and slow-accelerating car is better with
automatic transmission than manual, and vice versa for a heavy and fast-accelerating car.

1 Introduction

This report explores the relationship between a set of variables and miles per gallon (MPG) for 32 automobiles using the Motor Trend Car Road Tests dataset (description available at http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html). Specifically,
we will investigate whether an automatic or manual transmission is better for MPG and quantify any
differences using regression models and exploratory data analysis.

2 Exploratory Data Analysis

Of the 32 cars, 13 had automatic and 19 had manual transmissions. Plotting the
MPG for the manual vs the automatic transmission cars reveals a clear difference in the MPG, with
automatic models having a superior MPG of 24.39 ± 6.17 vs 17.15 ± 3.83 for the manual models (mean ± standard deviation; see
figure 1 in the appendix). A two-sample t-test of the null hypothesis of equal mean MPG gives a p-value of 0.001374,
so we reject the null hypothesis and conclude that the mean MPG differs between manual and automatic models.
In order to explore the relationship between MPG and horse power, weight of the car, displacement,
number of carburetors, 1/4 mile time and rear axle ratio, the MPG was plotted against each of these variables
and fitted with separate regression lines for automatic and manual models (figure 2 in the appendix). We see
that there seem to be negative relationships between MPG and horse power, weight, displacement and
number of carburetors, but positive relationships with 1/4 mile time and rear axle ratio.
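
The group comparison above can be reproduced approximately from the quoted summary statistics alone. The following is a minimal sketch, assuming the unequal-variance (Welch) form of the t-test, which the text does not specify; the function name `welch_t` is introduced here for illustration. It returns the test statistic and the Welch-Satterthwaite degrees of freedom, and leaves the p-value to a statistics package.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first group mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second group mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics as quoted in the text: mean, standard deviation, group size.
t_stat, dof = welch_t(24.39, 6.17, 13, 17.15, 3.83, 19)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Because the inputs are rounded to two decimals, the statistic will differ slightly from one computed on the raw data.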

3 Regression Model

3.1 Model selection

The strategy for model selection is based on the Akaike information criterion (AIC). The AIC value is
given by

$$\mathrm{AIC} = 2k - 2\ln L$$

where $k$ is the number of estimated parameters in the model and $L$ is the maximized value of the likelihood. Hence, the
AIC value is a measure of goodness of fit that penalizes increases in the number of model parameters. In order
to select the best model for the MPG we fit multiple linear regressions stepwise, adding variables one at a time and keeping the model with the minimum AIC value. Automatic transmission was set
as a binary variable and interactions between automatic transmission and the remaining variables were
accounted for. We get the following final model:
                   Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)         13.9692       5.7756      2.42     0.0226
factor(am)0:wt      -3.1759       0.6362     -4.99     0.0000
factor(am)1:wt      -6.0992       0.9685     -6.30     0.0000
factor(am)0:qsec     0.8338       0.2602      3.20     0.0035
factor(am)1:qsec     1.4464       0.2692      5.37     0.0000


with an $R^2 = 0.879$. From the p-values we see that all parameters are significant. The model can
be expressed as

$$Y = \beta_0 + \beta_1 x_{weight} + \mathbf{1}(am)(\beta_2 - \beta_1) x_{weight} + \beta_3 x_{qsec} + \mathbf{1}(am)(\beta_4 - \beta_3) x_{qsec} + e$$

where $x_{weight}$ is the weight of the car in US tons, $x_{qsec}$ is the 1/4 mile time in seconds, $e$ is an error term
and

$$\mathbf{1}(am) = \begin{cases} 1 & \text{if automatic transmission} \\ 0 & \text{else} \end{cases}$$

The error term is assumed to have constant variance
and to follow a normal distribution with zero mean.
The parameters correspond to the estimates in the above table as follows: $\beta_0$, (Intercept); $\beta_1$, factor(am)0:wt;
$\beta_2$, factor(am)1:wt; $\beta_3$, factor(am)0:qsec; $\beta_4$, factor(am)1:qsec.
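
The AIC comparison used for model selection can be sketched in a few lines. This is a minimal illustration, assuming normally distributed errors so that the maximized log-likelihood follows from the residual sum of squares, and using AIC = 2k - 2 ln L. The candidate formulas, parameter counts and residual sums of squares below are made-up placeholders, not values from the report, and the helper names are introduced here for illustration only.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood of a linear model with normal errors."""
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2) + 1)


def aic(k, log_likelihood):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def pick_lowest_aic(candidates, n):
    """Return (best_name, scores) for candidates given as name -> (k, rss)."""
    scores = {
        name: aic(k, gaussian_log_likelihood(rss, n))
        for name, (k, rss) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical candidates: (number of estimated parameters, residual sum of squares).
candidates = {
    "mpg ~ wt": (3, 280.0),
    "mpg ~ wt + qsec": (4, 200.0),
    "mpg ~ wt + qsec + carb": (5, 198.0),
}
best, scores = pick_lowest_aic(candidates, n=32)
for name, score in scores.items():
    print(f"{name}: AIC = {score:.2f}")
print("selected:", best)
```

In this made-up example the third variable lowers the residual sum of squares only slightly, so the penalty of 2 per extra parameter outweighs the gain in fit and the two-variable model is kept.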
3.2 Model diagnostics

An assumption about our model is that the residuals have constant variance and follow a normal
distribution with mean 0. The residuals for our model are plotted in figure 3 in the appendix. We
observe that the residuals seem randomly scattered around zero and that the quantiles of the residuals
fall reasonably close to the theoretical normal quantiles in the Q-Q plot. In order to test the residuals
for normality we perform a Shapiro-Wilk normality test and a one-sample Kolmogorov-Smirnov
normality test, which yield p-values of 0.3497 and 0.6584, respectively. Hence, we fail to reject the
hypothesis that the residuals are normally distributed.
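
To make the one-sample Kolmogorov-Smirnov test concrete, the sketch below computes its test statistic, the largest gap between the empirical distribution of the residuals and a reference normal distribution. It assumes standardized residuals compared against a standard normal; the residual values are hypothetical placeholders, not the residuals of the fitted model, and the p-value step is left to a statistics package.

```python
from statistics import NormalDist


def ks_statistic(sample, dist=NormalDist(0.0, 1.0)):
    """Largest distance between the empirical CDF of `sample` and `dist`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical standardized residuals, for illustration only.
residuals = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.4]
d_stat = ks_statistic(residuals)
print(f"D = {d_stat:.4f}")
```

A small value of D means the residuals track the normal distribution closely, which corresponds to a large p-value in the test.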
3.3 Model Interpretation

We can interpret our model as follows, in each case holding the other variable fixed:

- For every 1 US ton increase in the weight of the car we predict a decrease of 3.1759 in MPG ($\beta_1 = -3.1759$).
- If the car has automatic transmission we predict an additional decrease of 2.9233 in MPG ($\beta_2 - \beta_1 = -2.9233$) for every 1 US ton increase in weight.
- For every 1 second increase in 1/4 mile time we predict an increase of 0.8338 in MPG ($\beta_3 = 0.8338$).
- If the car has automatic transmission we predict an additional increase of 0.6126 in MPG ($\beta_4 - \beta_3 = 0.6126$) for every 1 second increase in 1/4 mile time.
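
As a check on the additional slopes quoted in this interpretation, the differences can be worked out directly from the estimates in the coefficient table of section 3.1, with the symbols mapped to table rows as defined there:

```latex
\begin{aligned}
\beta_2 - \beta_1 &= -6.0992 - (-3.1759) = -2.9233 \\
\beta_4 - \beta_3 &= 1.4464 - 0.8338 = 0.6126
\end{aligned}
```

The total weight slope for a car with automatic transmission is therefore $\beta_2 = -6.0992$ MPG per US ton, and the total 1/4 mile time slope is $\beta_4 = 1.4464$ MPG per second.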

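Let $Y_{auto}$ and $Y_{man}$ denote the MPG for a car with and without automatic transmission, respectively.
Then we have

$$Y_{auto} = \beta_0 + \beta_1 x_{weight} + (\beta_2 - \beta_1) x_{weight} + \beta_3 x_{qsec} + (\beta_4 - \beta_3) x_{qsec} + e$$

and

$$Y_{man} = \beta_0 + \beta_1 x_{weight} + \beta_3 x_{qsec} + e$$

Comparing $Y_{auto}$ vs $Y_{man}$ we obtain

$$E[Y_{auto}] > E[Y_{man}] \iff (\beta_2 - \beta_1) x_{weight} + (\beta_4 - \beta_3) x_{qsec} > 0 \iff -2.9233\, x_{weight} + 0.6126\, x_{qsec} > 0$$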

Testing $-2.9233\, x_{weight} + 0.6126\, x_{qsec} > 0$ on our dataset, the inequality fails for 21 of the 32 cars, so for those cars we expect better
MPG with manual transmission. We conclude that we cannot assume that automatic cars are generally better
for MPG than manual models, but for light cars with slow acceleration (high 1/4 mile time) we
predict better fuel economy for an automatic model than for a manual one.
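
The comparison rule can be evaluated for any combination of weight and 1/4 mile time. The sketch below hard-codes the estimates from the coefficient table in section 3.1 and follows the report's indicator convention; the function names and the two example cars are hypothetical and chosen only to show one case on each side of the rule.

```python
# Estimates copied from the coefficient table in section 3.1.
B0 = 13.9692                # (Intercept)
B1, B2 = -3.1759, -6.0992   # weight slopes: factor(am)0:wt, factor(am)1:wt
B3, B4 = 0.8338, 1.4464     # 1/4 mile time slopes: factor(am)0:qsec, factor(am)1:qsec


def expected_mpg(weight, qsec, automatic):
    """Expected MPG; `automatic` plays the role of the report's indicator 1(am)."""
    ind = 1 if automatic else 0
    return (B0 + B1 * weight + ind * (B2 - B1) * weight
            + B3 * qsec + ind * (B4 - B3) * qsec)


def automatic_advantage(weight, qsec):
    """E[Y_auto] - E[Y_man]; positive when automatic is predicted to be better."""
    return (B2 - B1) * weight + (B4 - B3) * qsec


# Hypothetical cars: (weight in the report's units, 1/4 mile time in seconds).
light_slow = automatic_advantage(2.0, 19.0)
heavy_fast = automatic_advantage(3.5, 16.0)
print(f"light, slow car: {light_slow:+.2f} MPG")
print(f"heavy, fast car: {heavy_fast:+.2f} MPG")
```

Equivalently, automatic transmission is predicted to be better when the 1/4 mile time divided by the weight exceeds 2.9233 / 0.6126, which is about 4.77.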

Appendix: Plots

Figure 1: Manual vs Automatic transmission. Miles per gallon (MPG) for the manual and automatic models, plotted by group and against car index.


Figure 2: Six panels with regression lines for automatic and manual models: MPG vs Horse Power, MPG vs Weight (US tons), MPG vs Displacement, MPG vs Number of carburetors, MPG vs 1/4 mile time, and MPG vs Rear axle ratio.


Figure 3: Standardized residuals (observed minus estimated) against index, and Normal Q-Q plot of sample quantiles against theoretical quantiles.
