Professional Documents
Culture Documents
QUANTITATIVE MODELING
Richard Waterman
Module 4: Regression models
WHARTON ONLINE
Module 4 content
WHARTON ONLINE 3
Example
WHARTON ONLINE 4
A linear regression model for the diamonds data
( | ) = 260 + 3721
WHARTON ONLINE 5
Correlation
WHARTON ONLINE 6
Questions that can be answered with a regression
WHARTON ONLINE 8
Fitting a model to data using least squares
WHARTON ONLINE 9
Residuals and fitted values
Key insight:
The regression line decomposes the observed data into two
components
1. The fitted values (the predictions)
2. The residuals (the vertical distance from point to line)
Both are useful:
The fitted values are the forecasts
The residuals allow us to assess the quality of the fit. If a point
has a large residual it is not well fit by the regression. If we can
explain why, we have learnt something new
WHARTON ONLINE 10
Example: fuel economy v. weight
WHARTON ONLINE 11
Interpretation of regression coefficients
WHARTON ONLINE 12
R2 and Root Mean Squared Error (RMSE)
WHARTON ONLINE 13
Using Root Mean Squared Error
WHARTON ONLINE 14
An approximate 95% prediction interval for a new
observation
Using the Normality assumption and the Empirical Rule, (within
the range of the observed data) an approximate 95% prediction
interval for a new observation is given by:
2
For the diamonds data the RMSE is approximately 32
Therefore under the Normality assumption the width of the
approximate 95% prediction interval is $64
An approximate 95% PI for the price of a diamond that weighs
0.25 carats is 260 + 3721 0.25 64 = (606, 734)
WHARTON ONLINE 15
Residual diagnostics checking the Normality
assumption
The histogram of residuals from the diamonds regression is
approximately Normally distributed, providing no strong evidence
against the Normality assumption
WHARTON ONLINE 16
Fitting curves to data
WHARTON ONLINE 17
On observing curvature, transform
WHARTON ONLINE 18
The regression equation for the log-log model
WHARTON ONLINE 19
Multiple regression
WHARTON ONLINE 20
Weight and horsepower as predictors of fuel economy
WHARTON ONLINE 22
Logistic regression
WHARTON ONLINE 23
Website compromise study
#Plugins 0 1 2 3 4 5 6 7 8 9 10
Compromised 16 23 20 26 40 55 60 74 80 83 88
Not compromised 84 77 80 74 60 45 40 26 20 17 12
Proportion compromised 0.16 0.23 0.20 0.26 0.40 0.55 0.60 0.74 0.80 0.83 0.88
WHARTON ONLINE 24
Linear fit
WHARTON ONLINE 25
Logistic regression fit
WHARTON ONLINE 26
Module summary