Regression models
Course Project
Abstract
Using regression models and exploratory data analysis on a dataset containing specifications
for 32 different cars, we investigated whether an automatic or manual transmission is
better for fuel efficiency. Automatic models had on average significantly better fuel efficiency. Car
weight and 1/4 mile time proved to be significant predictors of fuel efficiency, with weight showing a
stronger negative correlation for automatic transmission than for manual transmission and 1/4 mile
time showing a stronger positive correlation for automatic transmission than for manual transmission.
We conclude that the expected fuel efficiency for a light, slowly accelerating car is better with
automatic transmission than with manual, and vice versa.
Introduction
This report explores the relationship between a set of variables and miles per gallon (MPG) for 32 automobiles using the Motor Trend Car Road Tests dataset (description available at http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html). Specifically,
we will investigate whether an automatic or manual transmission is better for MPG and quantify any
differences using regression models and exploratory data analysis.
Of the 32 cars, 13 had automatic and 19 had manual transmissions. Plotting the
MPG for the manual vs the automatic transmission cars reveals a clear difference in the MPG, with
automatic models having a superior MPG of 24.39 ± 6.17 vs 17.15 ± 3.83 for the manual models
(mean ± standard deviation; see figure 1 in the appendix). A t-test of the null hypothesis of equal
mean MPG gives a p-value of 0.001374, indicating that the mean MPG differs between manual and automatic models.
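The comparison above can be roughly reproduced from the reported summary statistics alone. The following is a minimal sketch, assuming the test was Welch's unequal-variance t-test (the report only says "t-test") and using the rounded means, standard deviations and group sizes quoted in the text. The function names are illustrative, and the p-value is obtained by simple numerical integration of the t density rather than by a statistics library.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (total * h / 3)  # probability mass between -|t| and |t|
    return 1 - central_area


# Summary statistics as quoted in the report (rounded values).
t_stat, dof = welch_t(24.39, 6.17, 13, 17.15, 3.83, 19)
print(t_stat, dof, two_sided_p(t_stat, dof))
```

Because the inputs are rounded, the result should land near, but not exactly on, the reported p-value of 0.001374.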
In order to explore the relationship between MPG and horse power, weight of the car, displacement,
number of carburetors, 1/4 mile time and rear axle ratio, the MPG was plotted against these variables
and fitted with regression lines for automatic and manual models in figure 2 in the appendix. We see
that there seem to be negative relationships between MPG and horse power, weight, displacement and
number of carburetors, but positive relationships with 1/4 mile time and rear axle ratio.
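The per-group regression lines described above can be sketched as follows. This is a minimal illustration of ordinary least squares for one predictor, fitted separately for each transmission group. The data rows are hypothetical placeholders, not values from the dataset, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_by_group(rows):
    """rows: iterable of (group, x, y); returns {group: (slope, intercept)}."""
    groups = {}
    for group, x, y in rows:
        xs, ys = groups.setdefault(group, ([], []))
        xs.append(x)
        ys.append(y)
    return {group: fit_line(xs, ys) for group, (xs, ys) in groups.items()}


# Hypothetical (transmission, weight, MPG) rows, for illustration only.
rows = [
    ("automatic", 2.0, 30.0), ("automatic", 2.5, 26.0), ("automatic", 3.0, 21.0),
    ("manual", 3.0, 20.0), ("manual", 3.5, 18.5), ("manual", 4.0, 16.0),
]
for group, (slope, intercept) in fit_by_group(rows).items():
    print(group, slope, intercept)
```

A negative slope corresponds to the negative relationships noted above, and a steeper slope in one group is what motivates including interaction terms in the regression model.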
3 Regression Model
3.1 Model selection
The strategy for model selection is based on the Akaike information criterion (AIC). The AIC value is
given by
$$\mathrm{AIC} = 2k - 2 \ln L$$
where $k$ is the number of estimated parameters in the model and $L$ is the maximized likelihood. Hence, the
AIC value measures goodness of fit while penalizing increases in the number of model parameters. In order
to select the best model for the MPG we perform multiple linear regressions, adding variables stepwise
and selecting the model with the minimum AIC value. Automatic transmission was coded
as a binary variable, and interactions between automatic transmission and the remaining variables were
accounted for. We get the following final model:
(Intercept)
factor(am)0:wt
factor(am)1:wt
factor(am)0:qsec
factor(am)1:qsec
with $R^2 = 0.879$. From the p-values we see that all parameters are significant. The model can
be expressed as
$$Y = \beta_0 + \beta_1 x_{weight} + \mathbf{1}(am)(\beta_2 - \beta_1) x_{weight} + \beta_3 x_{qsec} + \mathbf{1}(am)(\beta_4 - \beta_3) x_{qsec} + e$$
where $x_{weight}$ is the weight of the car in units of 1000 lb, $x_{qsec}$ is the 1/4 mile time in seconds, $e$ is an error term
and
$$\mathbf{1}(am) = \begin{cases} 1 & \text{if automatic transmission} \\ 0 & \text{else} \end{cases}$$
The error term is assumed to have constant variance and to follow a normal distribution with zero mean.
The parameters correspond to the estimates in the above table as follows: $\beta_0$, intercept; $\beta_1$, factor(am)0:wt;
$\beta_2$, factor(am)1:wt; $\beta_3$, factor(am)0:qsec; $\beta_4$, factor(am)1:qsec.
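The AIC criterion used for the model selection in this section can be illustrated with a short sketch. It assumes a linear model with normally distributed errors, for which the maximized log-likelihood depends only on the residual sum of squares (RSS) and the number of observations. The candidate models, their parameter counts and their RSS values are hypothetical and serve only to show how the minimum-AIC model is picked.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood of a linear model with normal errors."""
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aic(k, log_likelihood):
    """AIC = 2k - 2 ln L, with k the number of estimated parameters."""
    return 2 * k - 2 * log_likelihood


# Hypothetical candidates: (label, number of estimated parameters, RSS).
candidates = [
    ("wt only", 3, 280.0),
    ("wt + qsec", 4, 195.0),
    ("wt + qsec + hp", 5, 193.0),
]
n_obs = 32  # number of cars in the dataset

scores = {
    label: aic(k, gaussian_log_likelihood(rss, n_obs))
    for label, k, rss in candidates
}
best = min(scores, key=scores.get)
print(best, scores[best])
```

In this made-up comparison the third variable lowers the RSS only slightly, so the penalty of 2 per extra parameter outweighs the gain in fit and the smaller model is preferred.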
3.2 Model diagnostics
An assumption about our model is that the residuals have constant variance and follow a normal
distribution with mean 0. The residuals for our model are plotted in figure 3 in the appendix. We
observe that the residuals seem randomly scattered around zero and that the quantiles of the residuals
fall reasonably close to the theoretical normal quantiles in the Q-Q plot. In order to test the residuals
for normality we perform a Shapiro-Wilk normality test and a one-sample Kolmogorov-Smirnov
normality test, which yield p-values of 0.3497 and 0.6584, respectively. Hence, we fail to reject the
hypothesis that the residuals are normally distributed.
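To make the Kolmogorov-Smirnov check above more concrete, the following sketch computes the one-sample test statistic, that is, the largest gap between the empirical distribution of the residuals and a normal distribution. It assumes the residuals are compared against a normal distribution with given mean and standard deviation (standard normal by default). The residual values are hypothetical, and turning the statistic into a p-value is left out.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def ks_statistic(sample, mean=0.0, sd=1.0):
    """One-sample Kolmogorov-Smirnov statistic against a normal distribution."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mean, sd)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical standardized residuals, for illustration only.
residuals = [-1.3, -0.8, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9, 1.1, 1.6]
print(ks_statistic(residuals))
```

A small statistic means the empirical and theoretical distributions stay close, which is consistent with failing to reject normality.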
3.3 Model Interpretation
For every one-second increase in 1/4 mile time we predict a $\beta_3 = 0.8338$ increase in MPG.
If the car has automatic transmission we predict an additional $(\beta_4 - \beta_3) = 0.6126$ increase in MPG.
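Let $Y_{auto}$ and $Y_{man}$ denote the MPG for a car with and without automatic transmission, respectively.
Then we have
$$Y_{auto} = \beta_0 + \beta_1 x_{weight} + (\beta_2 - \beta_1) x_{weight} + \beta_3 x_{qsec} + (\beta_4 - \beta_3) x_{qsec} + e$$
and
$$Y_{man} = \beta_0 + \beta_1 x_{weight} + \beta_3 x_{qsec} + e$$
The expected difference between the two is $(\beta_2 - \beta_1) x_{weight} + (\beta_4 - \beta_3) x_{qsec}$, so automatic
transmission is predicted to give better MPG when $-2.9233\, x_{weight} + 0.6126\, x_{qsec} > 0$.
Testing this inequality on our dataset, we expect 21 of the 32 cars to have better
MPG with manual transmission. We conclude that we cannot assume that automatic cars are better
for MPG than manual models, but for light cars with slow acceleration (high 1/4 mile time) we
predict a better fuel economy for an automatic model than for a manual one.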
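The decision rule in this section can be written as a small function. This is a sketch that uses the two coefficient differences quoted in the text; the negative sign on the weight term is an assumption inferred from the conclusion that light cars favour automatic transmission. The transmission labels follow the report's own labelling, and the two example cars are hypothetical.

```python
# Coefficient differences quoted in the report.
WEIGHT_DIFF = -2.9233  # (beta2 - beta1); negative sign is an assumption
QSEC_DIFF = 0.6126     # (beta4 - beta3)


def predicted_gain_automatic(weight, qsec):
    """Expected MPG of the automatic model minus that of the manual model."""
    return WEIGHT_DIFF * weight + QSEC_DIFF * qsec


def better_transmission(weight, qsec):
    """Transmission with the higher predicted MPG for the given car."""
    return "automatic" if predicted_gain_automatic(weight, qsec) > 0 else "manual"


# Hypothetical cars: (weight in the model's units, 1/4 mile time in seconds).
light_slow = (2.0, 19.0)
heavy_fast = (4.0, 15.0)
print(better_transmission(*light_slow))
print(better_transmission(*heavy_fast))
```

Under these coefficients the break-even point is a ratio of 1/4 mile time to weight of about 4.77: cars above that ratio are predicted to do better with automatic transmission, cars below it with manual.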
Appendix: Plots
Figure 1: [plot not recovered: MPG against car index, with separate panels for manual and automatic cars]
Figure 2: [plot not recovered: panels of MPG against weight, horse power, displacement, number of carburetors, 1/4 mile time and rear axle ratio, each with regression lines for automatic and manual cars]
Figure 3: [plot not recovered: standardized residuals (observed minus estimated) against index, and a normal Q-Q plot of sample quantiles against theoretical quantiles]