
Quantitative Business Analysis for Decision Making

Simple Linear Regression

Lecture Outline
Scatter Plots
Correlation Analysis
Simple Linear Regression Model
Estimation and Significance Testing
Coefficient of Determination
Confidence and Prediction Intervals
Analysis of Residuals

403.7

Regression Analysis?
Regression analysis is used for modeling the mean of a response variable Y as a function of predictor variables $X_1, X_2, \ldots, X_k$.
When $k = 1$, it is called simple regression analysis.


Random Sample
Y: Response Variable
X: Predictor Variable
For each unit in a random sample of size n, the pair (X, Y) is observed, resulting in a random sample:
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatter Plot
A scatter plot is a graphical display of the sample $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ as n points in two dimensions.
It suggests whether there is a relationship between X and Y.


A Scatter Plot Showing Linear Trend
[Figure: A Scatter Plot Showing Linear Trend of Peoples Ratings and Nielsen Ratings. Axis labels: "PeopleM" and "Nielsen".]


A Scatter Plot Showing No Linear Trend
[Figure: A Scatter Plot Showing No Linear Trend of Today's With Yesterday's DJIA. Axis labels: "Yesterda" and "Today".]


Modeling Linear Trend
A perfect linear relationship between Y and X exists if $Y = \alpha + \beta X$.
The coefficient $\beta$ of X is the slope, quantifying the amount of change in y corresponding to a one-unit change in x.
There are no perfect linear relationships in practice.


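Simple Linear Regression Model
Model: $Y = \alpha + \beta X + \varepsilon$
$\alpha + \beta X$ is the linear function (nonrandom).
$\varepsilon$ is the random error. It is assumed to be normally distributed with mean 0 and standard deviation $\sigma$. So $\mu_{y|x} = \alpha + \beta x$.
$\alpha$, $\beta$, and $\sigma$ are the parameters of the model.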
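To make the model concrete, the sketch below draws responses from a straight line plus normally distributed error. The function name `simulate`, the parameter values, and the x values are hypothetical choices for illustration only.

```python
import random

def simulate(alpha, beta, sigma, xs, seed=0):
    """Draw one response per x: the linear part alpha + beta*x plus a
    normal random error with mean 0 and standard deviation sigma."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]

xs = [1, 2, 3, 4, 5]                      # hypothetical predictor values
ys = simulate(2.0, 0.5, 1.0, xs)          # noisy responses
noiseless = simulate(2.0, 0.5, 0.0, xs)   # sigma = 0 gives the exact line
print(noiseless)
print(ys)
```

With `sigma = 0` every point falls exactly on the line, which is the "perfect linear relationship" case; any positive `sigma` scatters the points around it.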


Estimation
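Simple linear regression analysis estimates the mean of Y (linear trend)
$\mu_{y|x} = \alpha + \beta x$
by
$\hat{y} = a + bx$
where
$a = \bar{y} - b\bar{x}$
and
$b = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$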
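The least-squares estimates a and b can be computed directly from the sums of deviations. This is a minimal sketch in plain Python. The helper name `fit_line` and the five data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b): intercept and slope of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

xs = [1, 2, 3, 4, 5]         # hypothetical sample
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(round(a, 4), round(b, 4))
```

For this sample the sums are 6 (numerator) and 10 (denominator), so the slope is 0.6 and the intercept is 4 - 0.6 * 3 = 2.2.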


Standard Deviation
The standard deviation (s) of the n sample points in the scatter plot around the estimated regression line $\hat{y} = a + bx$ is:
$s = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$
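A short sketch of this computation follows: fit the line, sum the squared vertical distances from it, divide by n - 2, and take the square root. The function name `residual_sd` and the data are hypothetical.

```python
import math

def residual_sd(xs, ys):
    """Standard deviation s of the points around the fitted line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    # Sum of squared vertical distances between each y and its fitted value.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

xs = [1, 2, 3, 4, 5]   # hypothetical sample
ys = [2, 4, 5, 4, 5]
s = residual_sd(xs, ys)
print(round(s, 4))
```

The divisor is n - 2 rather than n - 1 because two quantities, a and b, are estimated from the data before the distances are measured.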


Testing the Slope of Linear Trend
For testing
$H_0: \beta = 0$ vs. $H_a: \beta \neq 0$
compute the t-statistic and its p-value:
$t = \dfrac{b - 0}{s_b}$
where $s_b$ is the standard error of b.
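The sketch below computes the t-statistic for the slope. It assumes the usual textbook expression for the standard error of b, namely s divided by the square root of the sum of squared x deviations. That expression is not spelled out on the slide. The p-value is not computed here, since it requires the t distribution with n - 2 degrees of freedom, which comes from a table or a statistics package. The function name and the data are hypothetical.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (b, s_b, t) for testing H0: slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    s_b = s / math.sqrt(sxx)   # assumed standard-error formula for b
    return b, s_b, (b - 0) / s_b

xs = [1, 2, 3, 4, 5]   # hypothetical sample
ys = [2, 4, 5, 4, 5]
b, s_b, t = slope_t_statistic(xs, ys)
print(round(b, 4), round(s_b, 4), round(t, 4))
```

A t-statistic that is large in absolute value relative to the t distribution with n - 2 degrees of freedom is evidence against a zero slope.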


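Coefficient of Determination: $R^2$
The coefficient of determination $R^2$ quantifies how well the estimated model $\hat{y} = a + bx$ fits: it is the proportion of the variation in y explained by the model.
Rule of thumb:
$R^2 > 85\%$ = the model is regarded as adequate
$R^2 < 85\%$ = the model is perceived as inadequate
A low $R^2$ suggests a need for additional predictors for modeling the mean of Y.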
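As a sketch, the coefficient of determination can be computed as one minus the ratio of unexplained variation (around the fitted line) to total variation (around the mean of y). This is the standard definition, stated here as an assumption because the slide gives no formula. The function name and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Proportion of the variation in y explained by the fitted line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total
    return 1.0 - sse / sst

xs = [1, 2, 3, 4, 5]   # hypothetical sample
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
print(round(r2, 4))
```

For this sample the result is 60%, which falls below the 85% threshold quoted above.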


Correlation Coefficient: r
The correlation coefficient r is a number between -1 and 1 whose magnitude is the square root of $R^2$.
The closer r is to -1 or 1, the stronger the linear trend.
Its sign is positive for an increasing trend (slope b is positive).
Its sign is negative for a decreasing trend (slope b is negative).
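The sketch below computes r from the sums of deviations (the standard sample correlation formula, assumed here because the slide gives none). It uses two made-up samples, one increasing and one decreasing, to illustrate the sign of r.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                     # hypothetical predictor values
r_up = correlation(xs, [2, 4, 5, 4, 5])    # increasing trend
r_down = correlation(xs, [5, 4, 5, 4, 2])  # decreasing trend
print(round(r_up, 4), round(r_down, 4))
```

The two values have equal magnitude and opposite signs. Squaring either one gives the same coefficient of determination, so the square root alone does not determine the sign.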

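Confidence and Prediction Intervals
To estimate $\mu_{y|x}$ by a confidence interval, or to predict the response Y corresponding to the predictor value $x = x_0$:
1. Compute: $\hat{y} = a + b x_0$
2. Compute: $\hat{y} \pm t \cdot \text{s.e.}(\hat{y})$, where t is the critical value of the t distribution with n - 2 degrees of freedom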

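What is s.e.$(\hat{y})$?
i.e., the standard error of $\hat{y}$
For estimating $\mu_{y|x}$,
$\text{s.e.}(\hat{y}) = s\sqrt{\dfrac{1}{n} + \dfrac{(\bar{x} - x_0)^2}{\sum (x - \bar{x})^2}}$
For predicting Y,
$\text{s.e.}(\hat{y}) = s\sqrt{1 + \dfrac{1}{n} + \dfrac{(\bar{x} - x_0)^2}{\sum (x - \bar{x})^2}}$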
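The two standard errors differ only by the extra 1 under the square root, which the sketch below makes explicit with a `predict` flag. The critical value `t_crit` is supplied by the caller. The value 3.182 used here is the tabulated 97.5th percentile of the t distribution with 3 degrees of freedom, giving 95% intervals for n = 5. The function name, the data, and x0 = 4 are hypothetical.

```python
import math

def interval(xs, ys, x0, t_crit, predict=False):
    """Return (lower, upper, se) at x0: a confidence interval for the
    mean of Y, or a prediction interval for Y when predict is True."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat = a + b * x0
    extra = 1.0 if predict else 0.0   # the additional 1 for prediction
    se = s * math.sqrt(extra + 1.0 / n + (x_bar - x0) ** 2 / sxx)
    return y_hat - t_crit * se, y_hat + t_crit * se, se

xs = [1, 2, 3, 4, 5]   # hypothetical sample
ys = [2, 4, 5, 4, 5]
ci = interval(xs, ys, x0=4, t_crit=3.182)                # mean of Y at x0
pi = interval(xs, ys, x0=4, t_crit=3.182, predict=True)  # a new Y at x0
print([round(v, 3) for v in ci])
print([round(v, 3) for v in pi])
```

Both intervals are centered at the same fitted value. The prediction interval is always wider because it also accounts for the random error of an individual response.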


Analysis of Residuals
Residuals are defined as:
$e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n$
Residual analysis is used to check the normality and homogeneity-of-variance assumptions on the random errors $\varepsilon$.
A histogram or box plot of the residuals helps to ascertain whether the errors are normally distributed.


Analysis of Residuals (cont.)
A plot of the residuals $e_i$ against the observed predictor values $x_i$ helps ascertain the homogeneity assumption.
Random appearance = the homogeneity-of-variance assumption is valid.
Non-random appearance = the homogeneity assumption is not valid and the variance depends on the predictor values.
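The residuals that feed these diagnostic plots can be computed as in the sketch below. It only produces the (x, residual) pairs. Drawing the histogram, box plot, or residual plot is left to whatever plotting tool is at hand. The function name and the data are hypothetical.

```python
def residuals(xs, ys):
    """Return the residuals: each observed y minus its fitted value."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]   # hypothetical sample
ys = [2, 4, 5, 4, 5]
es = residuals(xs, ys)
for x, e in zip(xs, es):
    print(x, round(e, 3))   # points to plot: residual against predictor
```

Least-squares residuals always sum to zero (up to rounding), so the check of interest is their pattern against x, not their average.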