
Statistics for Managers

using Microsoft Excel


3rd Edition
Chapter 11
Simple Linear Regression

© 2002 Prentice-Hall, Inc.

Chap 11-1

Chapter Topics

Types of regression models
Determining the simple linear regression equation
Measures of variation
Assumptions of regression and correlation
Residual analysis
Measuring autocorrelation
Inferences about the slope
Chap 11-2

Chapter Topics

(continued)

Correlation - measuring the strength of the association
Estimation of mean values and prediction of individual values
Pitfalls in regression and ethical issues

Chap 11-3

Purpose of Regression
Analysis

Regression analysis is used primarily to model causality and provide
prediction

Predicts the value of a dependent (response) variable based on the
value of at least one independent (explanatory) variable

Explains the effect of the independent variables on the dependent
variable
Chap 11-4

Types of Regression Models


Positive Linear Relationship

Negative Linear Relationship

Relationship NOT Linear

No Relationship

Chap 11-5

Simple Linear Regression Model

Relationship between variables is described by a linear function

The change of one variable causes the change in the other variable

A dependency of one variable on the other
Chap 11-6

Population Linear Regression


The population regression line is a straight line that describes the
dependence of the average value (conditional mean) of one variable on
the other.

Yi = β0 + β1 Xi + εi

  Yi   dependent (response) variable
  Xi   independent (explanatory) variable
  β0   population Y intercept
  β1   population slope coefficient
  εi   random error

μY|X = β0 + β1 Xi      (population regression line: the conditional mean)
Chap 11-7

Population Linear Regression

(continued)

Yi = β0 + β1 Xi + εi      (observed value of Y)

εi = random error

μY|X = β0 + β1 Xi         (conditional mean)
Chap 11-8

Sample Linear Regression


Sample regression line provides an
estimate of the population regression
line as well as a predicted value of Y
Yi = b0 + b1 Xi + ei

  b0   sample Y intercept
  b1   sample slope coefficient
  ei   residual

Sample regression line (fitted regression line, predicted value):

Ŷi = b0 + b1 Xi
Chap 11-9

Sample Linear Regression

(continued)

b0 and b1 are obtained by finding the values of b0 and b1 that
minimize the sum of the squared residuals:

  Σ (Yi − Ŷi)²  =  Σ ei²      (sums over i = 1, …, n)

b0 provides an estimate of β0
b1 provides an estimate of β1
Chap 11-10

Sample Linear Regression

(continued)

[Plot comparing the sample regression line Ŷi = b0 + b1 Xi (intercept b0,
slope b1, residual ei for an observed value Yi = b0 + b1 Xi + ei) with the
population regression line μY|X = β0 + β1 Xi, where Yi = β0 + β1 Xi + εi]
Chap 11-11

Interpretation of the
Slope and the Intercept

β0 = E(Y | X = 0) is the average value of Y when the value of X is zero.

β1 = ΔE(Y | X) / ΔX measures the change in the average value of Y as a
result of a one-unit change in X.

Chap 11-12

Interpretation of the
Slope and the Intercept

(continued)

b0, the estimate of E(Y | X = 0), is the estimated average value of Y
when the value of X is zero.

b1, the estimate of ΔE(Y | X) / ΔX, is the estimated change in the
average value of Y as a result of a one-unit change in X.
Chap 11-13

Simple Linear Regression:


Example
You want to examine the linear dependency of the annual sales of
produce stores on their size in square footage. Sample data for seven
stores were obtained. Find the equation of the sample regression line.

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Chap 11-14

Scatter Diagram: Example


[Excel scatter plot of Annual Sales ($000) vs. Square Feet for the
seven stores]
Chap 11-15

Equation for the Sample


Regression Line: Example
Ŷi = b0 + b1 Xi = 1636.415 + 1.487 Xi

From the Excel printout:

               Coefficients
Intercept      1636.414726
X Variable 1   1.486633657
Chap 11-16
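The same coefficients can be checked outside Excel. Below is a minimal Python/numpy sketch (not part of the original slides) that applies the least-squares formulas b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and b0 = Ȳ − b1 X̄ to the seven-store data; the variable names are illustrative.

```python
# Hedged sketch: reproduce the Excel coefficients with plain numpy.
import numpy as np

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)  # square feet
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)  # annual sales ($000)

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # sample slope
b0 = y_bar - b1 * x_bar                                            # sample intercept

print(b0, b1)  # approximately 1636.41 and 1.4866, matching the Excel printout
```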

Graph of the Sample Regression Line: Example

[Scatter plot of Annual Sales ($000) vs. Square Feet with the fitted line
Ŷi = 1636.415 + 1.487 Xi drawn through the seven data points]
Chap 11-17

Interpretation of Results:
Example
Ŷi = 1636.415 + 1.487 Xi
The slope of 1.487 means that for each increase of
one unit in X, we predict the average of Y to
increase by an estimated 1.487 units.
The model estimates that for each increase of one
square foot in the size of the store, the expected
annual sales are predicted to increase by $1487.
Chap 11-18

Simple Linear Regression in


PHStat

In Excel, use PHStat | regression | simple linear regression

Excel spreadsheet of the regression of sales on footage

Chap 11-19

Measure of Variation:
The Sum of Squares

SST = SSR + SSE

(Total sample variability = Explained variability + Unexplained variability)
Chap 11-20

Measure of Variation:
The Sum of Squares

(continued)

SST = total sum of squares
  Measures the variation of the Yi values around their mean Ȳ

SSR = regression sum of squares
  Explained variation attributable to the relationship between X and Y

SSE = error sum of squares
  Variation attributable to factors other than the relationship between
  X and Y
Chap 11-21

Measure of Variation:
The Sum of Squares

(continued)

SST = Σ (Yi − Ȳ)²
SSR = Σ (Ŷi − Ȳ)²
SSE = Σ (Yi − Ŷi)²
Chap 11-22
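As an illustration (a sketch, not the slides' Excel output), the three sums of squares for the produce-store data can be computed directly from these definitions; np.polyfit is used here only as a convenient way to get the fitted line.

```python
# Hedged sketch: compute SST, SSR, and SSE from their definitions.
import numpy as np

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)

b1, b0 = np.polyfit(x, y, 1)           # least-squares slope and intercept
y_hat = b0 + b1 * x                    # fitted values

sst = np.sum((y - y.mean()) ** 2)      # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sse = np.sum((y - y_hat) ** 2)         # unexplained variation

print(sst, ssr, sse)  # SST = SSR + SSE (about 32.25e6, 30.38e6, 1.87e6 here)
```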

Explanatory Power of
Regression
[Venn diagram of Sizes and Sales]

Variations in store sizes not used in explaining variation in sales

Variations in sales explained by the error term (SSE)

Variations in sales explained by sizes, or variations in sizes used in
explaining variation in sales (SSR)
Chap 11-23

The ANOVA Table in Excel


ANOVA
              df       SS     MS                  F          Significance F
Regression    p        SSR    MSR = SSR/p         MSR/MSE    P-value of the F test
Residuals     n−p−1    SSE    MSE = SSE/(n−p−1)
Total         n−1      SST
Chap 11-24

Measures of Variation
The Sum of Squares: Example
Excel output for produce stores:

ANOVA
              df    SS             MS             F          Significance F
Regression     1    30380456.12    30380456.12    81.17909   0.000281201
Residual       5    1871199.595    374239.92
Total          6    32251655.71

Regression (explained) df = 1, error (residual) df = 5, total df = 6
SSR = 30380456.12, SSE = 1871199.595, SST = 32251655.71
Chap 11-25

The Coefficient of
Determination

r² = SSR / SST = Regression Sum of Squares / Total Sum of Squares

Measures the proportion of variation in Y that is explained by the
independent variable X in the regression model
Chap 11-26

Explanatory Power of
Regression

[Venn diagram of Sales and Sizes]

r² = SSR / (SSR + SSE)
Chap 11-27

Coefficients of Determination (r²) and Correlation (r)

[Four scatter plots of Y vs. X with the fitted line Ŷi = b0 + b1 Xi:
 r² = 1, r = +1;  r² = 1, r = −1;  r² = .8, r = +0.9;  r² = 0, r = 0]
Chap 11-28

Standard Error of Estimate

SYX = √[ SSE / (n − 2) ] = √[ Σ (Yi − Ŷi)² / (n − 2) ]

The standard deviation of the variation of observations around the
regression line
Chap 11-29

Measures of Variation:
Produce Store Example
Excel output for produce stores:

Regression Statistics
Multiple R            0.9705572
R Square              0.94198129      (r² = .94)
Adjusted R Square     0.93037754
Standard Error        611.751517      (SYX)
Observations          7

94% of the variation in annual sales can be explained by the
variability in the size of the store as measured by square footage
Chap 11-30
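For reference, r² and the standard error of the estimate can be recomputed from the sums of squares. The numpy sketch below is illustrative, not the textbook's output.

```python
# Hedged sketch: verify R Square and Standard Error from the ANOVA quantities.
import numpy as np

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)

r_sq = 1 - sse / sst             # equivalently SSR / SST
s_yx = np.sqrt(sse / (n - 2))    # standard error of the estimate

print(r_sq, s_yx)  # about 0.942 and 611.75, matching the Excel output
```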

Linear Regression
Assumptions

1. Normality
   Y values are normally distributed for each X
   Probability distribution of error is normal

2. Homoscedasticity (Constant Variance)

3. Independence of Errors
Chap 11-31

Variation of Errors around


the Regression Line
[Normal distributions of Y values, with equal spread, centered on the
sample regression line at X1 and X2]

Y values are normally distributed around the regression line.
For each X value, the spread or variance around the regression line is
the same.
Chap 11-32

Residual Analysis

Purposes

Examine linearity
Evaluate violations of assumptions

Graphical Analysis of Residuals

Plot residuals vs. Xi , Yi and time

Chap 11-33

Residual Analysis for


Linearity
[Scatter plots of Y vs. X and of residuals (e) vs. X for two cases:
Not Linear and Linear]
Chap 11-34

Studentized Residual

SRi = ei / ( SYX √(1 − hi) )

where  hi = 1/n + (Xi − X̄)² / Σ (Xi − X̄)²

Residual divided by its standard error
Standardized residual adjusted for the distance from the average X value
Allows us to normalize the magnitude of the residuals in units reflecting
the variation around the regression line
Chap 11-35
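A small sketch (illustrative, not from the slides) applying the hi and SRi formulas above to the produce-store residuals:

```python
# Hedged sketch: leverage values and studentized residuals per the formulas above.
import numpy as np

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)                       # residuals
s_yx = np.sqrt(np.sum(e ** 2) / (n - 2))    # standard error of the estimate

h = 1 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)  # leverage h_i
sr = e / (s_yx * np.sqrt(1 - h))            # studentized residuals SR_i

print(np.round(sr, 2))
```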

Residual Analysis for


Homoscedasticity
[Plots of studentized residuals (SR) vs. X for two cases:
Heteroscedasticity and Homoscedasticity]
Chap 11-36

Residual Analysis:Excel
Output for Produce Stores
Example
Excel output:

Observation   Predicted Y     Residuals
1             4202.344417     -521.3444173
2             3928.803824     -533.8038245
3             5822.775103      830.2248971
4             9894.664688     -351.6646882
5             3557.14541      -239.1454103
6             4918.90184       644.0981603
7             3588.364717      171.6352829

[Residual plot: residuals vs. Square Feet]
Chap 11-37

Residual Analysis
for Independence

The Durbin-Watson Statistic

Used when data is collected over time to detect autocorrelation
(residuals in one time period are related to residuals in another
period)

Measures violation of the independence assumption:

D = Σ (ei − ei−1)² / Σ ei²     (numerator sum over i = 2, …, n;
                                denominator sum over i = 1, …, n)

Should be close to 2. If not, examine the model for autocorrelation.
Chap 11-38
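The store data are cross-sectional, so the Durbin-Watson test does not apply to them; the sketch below (illustrative only, with made-up residuals) simply shows how D would be computed from a time-ordered residual series.

```python
# Hedged sketch: Durbin-Watson statistic D for a time-ordered residual series.
import numpy as np

# hypothetical residuals, in time order (not from the produce-store example)
e = np.array([1.2, 0.8, -0.5, -1.1, 0.3, 0.9, -0.2, -0.7])

d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)  # D = sum(e_i - e_{i-1})^2 / sum(e_i^2)
print(d)  # values near 2 suggest no autocorrelation
```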

Durbin-Watson Statistic
in PHStat

PHStat | regression | simple linear


regression

Check the box for Durbin-Watson Statistic

Chap 11-39

Obtaining the Critical Values


of Durbin-Watson Statistic
Table 13.4  Finding critical values of the Durbin-Watson statistic (α = .05)

            p = 1              p = 2
  n       dL      dU        dL      dU
 15      1.08    1.36       .95    1.54
 16      1.10    1.37       .98    1.54
Chap 11-40

Using the
Durbin-Watson Statistic
H0: No autocorrelation (error terms are independent)
H1: There is autocorrelation (error terms are not independent)

  0 to dL:          Reject H0 (positive autocorrelation)
  dL to dU:         Inconclusive
  dU to 4−dU:       Accept H0 (no autocorrelation)
  4−dU to 4−dL:     Inconclusive
  4−dL to 4:        Reject H0 (negative autocorrelation)
Chap 11-41

Residual Analysis
for Independence
Graphical Approach
[Plots of residuals (e) vs. time: Not Independent (cyclical pattern)
and Independent (no particular pattern)]

Residual is plotted against time to detect any autocorrelation

Chap 11-42

Inference about the Slope:


t Test

t test for a population slope
Is there a linear dependency of Y on X?

Null and alternative hypotheses
  H0: β1 = 0 (no linear dependency)
  H1: β1 ≠ 0 (linear dependency)

Test statistic
  t = (b1 − β1) / Sb1      where  Sb1 = SYX / √[ Σ (Xi − X̄)² ]

  d.f. = n − 2
Chap 11-43
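A sketch (not from the slides) of the same t statistic for the produce-store data, using Sb1 = SYX / √Σ(Xi − X̄)²; the variable names are illustrative.

```python
# Hedged sketch: t test statistic for H0: beta1 = 0.
import numpy as np

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
s_yx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
s_b1 = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))   # standard error of the slope

t = (b1 - 0) / s_b1                                  # test H0: beta1 = 0
print(s_b1, t)  # about 0.165 and 9.01, matching the Excel printout (df = n - 2 = 5)
```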

Example: Produce Store


Data for Seven Stores:
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Estimated regression equation:
Ŷi = 1636.415 + 1.487 Xi

The slope of this model is 1.487.
Is square footage of the store affecting its annual sales?
Chap 11-44

Inferences about the Slope:


t Test Example
H0: β1 = 0       Test statistic (from the Excel printout): t = b1 / Sb1 = 9.0099
H1: β1 ≠ 0
α = .05
df = 7 − 2 = 5

            Coefficients   Standard Error   t Stat   P-value
Intercept   1636.4147      451.4953         3.6244   0.01515
Footage     1.4866         0.1650           9.0099   0.00028

Critical values: ±2.5706 (rejection regions of .025 in each tail)

Decision: Reject H0
Conclusion: There is evidence that square footage affects
annual sales.
Chap 11-45

Inferences about the Slope:


Confidence Interval Example
Confidence interval estimate of the slope:

  b1 ± tn−2 · Sb1

Excel printout for produce stores:

               Lower 95%     Upper 95%
Intercept      475.810926    2797.01853
X Variable 1   1.06249037    1.91077694

At the 95% level of confidence, the confidence interval for the slope
is (1.062, 1.911). Does not include 0.
Conclusion: There is a significant linear dependency of annual sales
on the size of the store.
Chap 11-46
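The same interval can be reproduced with a short sketch (illustrative only); scipy.stats.t supplies the critical value t5 = 2.5706.

```python
# Hedged sketch: 95% confidence interval for the slope, b1 +/- t_{n-2} * S_b1.
import numpy as np
from scipy.stats import t as t_dist

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
s_yx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
s_b1 = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))

t_crit = t_dist.ppf(0.975, df=n - 2)           # about 2.5706 for df = 5
print(b1 - t_crit * s_b1, b1 + t_crit * s_b1)  # roughly (1.062, 1.911)
```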

Inferences about the Slope:


F Test

F test for a population slope
Is there a linear dependency of Y on X?

Null and alternative hypotheses
  H0: β1 = 0 (no linear dependency)
  H1: β1 ≠ 0 (linear dependency)

Test statistic
  F = (SSR / 1) / (SSE / (n − 2))

Numerator d.f. = 1, denominator d.f. = n − 2
Chap 11-47
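For this simple regression the F statistic can be computed directly from SSR and SSE, and it should equal the square of the slope's t statistic. The sketch below is illustrative, not the slides' Excel output.

```python
# Hedged sketch: F = (SSR / 1) / (SSE / (n - 2)) for the produce-store data.
import numpy as np

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)

f = (ssr / 1) / (sse / (n - 2))
print(f)  # about 81.18, i.e. the slope t statistic (about 9.01) squared
```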

Relationship between
a t Test and an F Test

Null and alternative hypotheses
  H0: β1 = 0 (no linear dependency)
  H1: β1 ≠ 0 (linear dependency)

  ( tn−2 )² = F1,n−2
Chap 11-48

Inferences about the Slope:


F Test Example
H0: β1 = 0       Test statistic (from the Excel printout): F = 81.179
H1: β1 ≠ 0
α = .05
Numerator df = 1, denominator df = 7 − 2 = 5

ANOVA
              df   SS             MS             F        Significance F
Regression     1   30380456.12    30380456.12    81.179   0.000281
Residual       5   1871199.595    374239.919
Total          6   32251655.71

Critical value: F1,n−2 = F1,5 = 6.61 at α = .05; reject H0 if F > 6.61

Decision: Reject H0
Conclusion: There is evidence that square footage affects annual sales.
Chap 11-49

Purpose of Correlation
Analysis

Correlation analysis is used to measure


strength of association (linear
relationship) between two numerical
variables

Only concerned with strength of the


relationship
No causal effect is implied

Chap 11-50

Purpose of Correlation
Analysis

(continued)

The population correlation coefficient ρ (rho) is used to measure the
strength of the association between the variables.

The sample correlation coefficient r is an estimate of ρ and is used
to measure the strength of the linear relationship in the sample
observations.

Chap 11-51

Sample of Observations from


Various r Values
[Scatter plots of Y vs. X illustrating r = −1, r = −.6, r = 0, r = .6,
and r = 1]
Chap 11-52

Features of ρ and r

Unit free
Range between -1 and 1
The closer to -1, the stronger the
negative linear relationship
The closer to 1, the stronger the positive
linear relationship
The closer to 0, the weaker the linear
relationship
Chap 11-53

Test for a Linear Relationship

Hypotheses
  H0: ρ = 0 (no correlation)
  H1: ρ ≠ 0 (correlation)

Test statistic
  t = r / √[ (1 − r²) / (n − 2) ]

where
  r = Σ (Xi − X̄)(Yi − Ȳ) / √[ Σ (Xi − X̄)² · Σ (Yi − Ȳ)² ]
Chap 11-54
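A sketch (illustrative, not the slides' Excel output) of this test for the produce-store data; the resulting t should match the slope t statistic from the earlier slide.

```python
# Hedged sketch: sample correlation r and the t statistic for H0: rho = 0.
import numpy as np

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
n = len(x)

sxy = np.sum((x - x.mean()) * (y - y.mean()))
r = sxy / np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

t = r / np.sqrt((1 - r ** 2) / (n - 2))
print(r, t)  # about 0.9706 and 9.01, with df = n - 2 = 5
```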

Example: Produce Stores


Is there any evidence of a linear relationship between the annual
sales of a store and its square footage at the .05 level of
significance?

From the Excel printout:

Regression Statistics
Multiple R            0.9705572
R Square              0.94198129
Adjusted R Square     0.93037754
Standard Error        611.751517
Observations          7

H0: ρ = 0 (no association)
H1: ρ ≠ 0 (association)
α = .05
df = 7 − 2 = 5
Chap 11-55

Example:
Produce Stores Solution
t = r / √[ (1 − r²) / (n − 2) ] = .9706 / √[ (1 − .9420) / 5 ] = 9.0099

Critical values: ±2.5706 (rejection regions of .025 in each tail)

Decision: Reject H0
Conclusion: There is evidence of a linear relationship at the 5%
level of significance.

The value of the t statistic is exactly the same as the t statistic
value for the test on the slope coefficient.
Chap 11-56

Estimation of Mean Values


Confidence interval estimate for μY|X=Xi, the mean of Y given a
particular Xi:

  Ŷi ± tn−2 · SYX · √[ 1/n + (Xi − X̄)² / Σ (Xi − X̄)² ]

(tn−2 is the t value from the table with df = n − 2; SYX is the
standard error of the estimate. The size of the interval varies
according to the distance of Xi from the mean X̄.)
Chap 11-57

Prediction of Individual
Values
Prediction interval for an individual response Yi at a particular Xi:

  Ŷi ± tn−2 · SYX · √[ 1 + 1/n + (Xi − X̄)² / Σ (Xi − X̄)² ]

The addition of one under the square root increases the width of the
interval relative to that for the mean of Y.
Chap 11-58

Interval Estimates
for Different Values of X
[Plot of the fitted line Ŷi = b0 + b1 Xi showing the confidence
interval for the mean of Y and the wider prediction interval for an
individual Yi at a given X]
Chap 11-59

Example: Produce Stores


Data for seven stores:
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Regression model obtained:
Ŷi = 1636.415 + 1.487 Xi

Predict the annual sales for a store with 2,000 square feet.
Chap 11-60

Estimation of Mean Values:


Example
Confidence interval estimate for μY|X=Xi

Find the 95% confidence interval for the average annual sales for a
2,000 square-foot store.

Predicted sales:  Ŷi = 1636.415 + 1.487 Xi = 4610.45 ($000)

X̄ = 2350.29      SYX = 611.75      tn−2 = t5 = 2.5706

Ŷi ± tn−2 · SYX · √[ 1/n + (Xi − X̄)² / Σ (Xi − X̄)² ]  =  4610.45 ± 612.66
Chap 11-61

Prediction Interval for Y :


Example
Prediction interval for an individual Y

Find the 95% prediction interval for the annual sales of a 2,000
square-foot store.

Predicted sales:  Ŷi = 1636.415 + 1.487 Xi = 4610.45 ($000)

X̄ = 2350.29      SYX = 611.75      tn−2 = t5 = 2.5706

Ŷi ± tn−2 · SYX · √[ 1 + 1/n + (Xi − X̄)² / Σ (Xi − X̄)² ]  =  4610.45 ± 1687.68
Chap 11-62
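Both intervals can be recomputed with one short sketch (illustrative only); scipy.stats.t supplies t5 = 2.5706, and the half-widths should come out near the slides' 612.66 and 1687.68 up to rounding.

```python
# Hedged sketch: 95% confidence interval for the mean of Y and 95% prediction
# interval for an individual Y at X = 2000 square feet.
import numpy as np
from scipy.stats import t as t_dist

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
s_yx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
t_crit = t_dist.ppf(0.975, df=n - 2)

x_new = 2000.0
y_hat = b0 + b1 * x_new                          # point prediction, about 4610 ($000)
dist = 1 / n + (x_new - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

ci_half = t_crit * s_yx * np.sqrt(dist)          # half-width for the mean of Y, about 612.7
pi_half = t_crit * s_yx * np.sqrt(1 + dist)      # half-width for an individual Y, about 1687.7

print(y_hat, ci_half, pi_half)
```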

Estimation of Mean Values and


Prediction of Individual Values in
PHStat

In Excel, use PHStat | regression | simple linear regression

Check the confidence and prediction interval for X = box

Excel spreadsheet of the regression of sales on footage

Chap 11-63

Pitfalls of Regression
Analysis

Lacking an awareness of the assumptions


underlying least-squares regression
Not knowing how to evaluate the
assumptions
Not knowing the alternatives to least-squares regression if a
particular assumption is violated
Using a regression model without
knowledge of the subject matter
Chap 11-64

Strategies for Avoiding


the Pitfalls of Regression

Start with a scatter plot of X on Y to


observe possible relationship
Perform residual analysis to check the
assumptions
Use a histogram, stem-and-leaf
display, box-and-whisker plot, or
normal probability plot of the
residuals to uncover possible nonnormality
Chap 11-65

Strategies for Avoiding


the Pitfalls of Regression

(continued)

If there is violation of any assumption, use


alternative methods (e.g.: least absolute
deviation regression or least median of
squares regression) to least-squares
regression or alternative least-squares
models (e.g.: Curvilinear or multiple
regression)
If there is no evidence of assumption
violation, then test for the significance of
the regression coefficients and construct
confidence intervals and prediction intervals
Chap 11-66

Chapter Summary

Introduced types of regression models


Discussed determining the simple linear
regression equation
Described measures of variation
Addressed assumptions of regression
and correlation
Discussed residual analysis
Addressed measuring autocorrelation
Chap 11-67

Chapter Summary

(continued)

Described inference about the slope


Discussed correlation -- measuring the
strength of the association
Addressed estimation of mean values
and prediction of individual values
Discussed possible pitfalls in regression
and recommended a strategy to avoid
them
Chap 11-68
