
Simple Linear Regression



Simple Linear Regression Model
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
Simple Linear Regression
Managerial decisions often are based on the
relationship between two or more variables.
Regression analysis can be used to develop an
equation showing how the variables are related.
The variable being predicted is called the dependent
variable and is denoted by y.
The variables being used to predict the value of the
dependent variable are called the independent
variables and are denoted by x.
Simple Linear Regression
The relationship between the two variables is
approximated by a straight line.
Simple linear regression involves one independent
variable and one dependent variable.
Regression analysis involving two or more
independent variables is called multiple regression.
Simple Linear Regression Model
The equation that describes how y is related to x and
an error term is called the regression model.
The simple linear regression model is:
y = β₀ + β₁x + ε
where:
β₀ and β₁ are called parameters of the model,
ε is a random variable called the error term.
Simple Linear Regression Equation
The simple linear regression equation is:
E(y) = β₀ + β₁x
The graph of the regression equation is a straight line.
β₀ is the y-intercept of the regression line.
β₁ is the slope of the regression line.
E(y) is the expected value of y for a given x value.
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is:
ŷ = b₀ + b₁x
The graph is called the estimated regression line.
b₀ is the y-intercept of the line.
b₁ is the slope of the line.
ŷ is the estimated value of y for a given x value.
Estimation Process
Regression Model: y = β₀ + β₁x + ε
Regression Equation: E(y) = β₀ + β₁x
Unknown Parameters: β₀, β₁
Sample Data: (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)
The sample statistics b₀ and b₁ provide estimates
of the unknown parameters β₀ and β₁.
Estimated Regression Equation: ŷ = b₀ + b₁x
Least Squares Method
Least Squares Criterion:
min Σ(yᵢ − ŷᵢ)²
where:
yᵢ = observed value of the dependent variable
for the ith observation
ŷᵢ = estimated value of the dependent variable
for the ith observation
Slope for the Estimated Regression Equation:
b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
y-Intercept for the Estimated Regression Equation:
b₀ = ȳ − b₁x̄
where:
xᵢ = value of the independent variable for the ith observation
yᵢ = value of the dependent variable for the ith observation
x̄ = mean value of the independent variable
ȳ = mean value of the dependent variable
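The two least squares formulas translate directly into code. The sketch below is illustrative only; the data points are made up and chosen to lie exactly on the line y = 1 + 2x, so the fit recovers b₀ = 1 and b₁ = 2.

```python
# Least squares estimates from the formulas above:
#   b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
#   b0 = ybar - b1 * xbar
def least_squares(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
         / sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical data: points lying exactly on y = 1 + 2x
b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```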
Kataria Auto periodically has a special week-long
sale. As part of the advertising campaign, Kataria runs
one or more television commercials during the
weekend preceding the sale. Data from a sample of 5
previous sales are shown on the next slide.
Simple Linear Regression
Example: Kataria Auto Sales
Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27
Σx = 10                 Σy = 100
x̄ = 2                   ȳ = 20
Estimated Regression Equation
Slope for the Estimated Regression Equation:
b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = 20/4 = 5
y-Intercept for the Estimated Regression Equation:
b₀ = ȳ − b₁x̄ = 20 − 5(2) = 10
Estimated Regression Equation:
ŷ = 10 + 5x
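A minimal check of the slope and intercept above, computed directly from the five (ads, cars sold) pairs in the sample:

```python
# Kataria Auto sample: TV ads (x) vs. cars sold (y)
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
xbar = sum(x) / len(x)   # 2.0
ybar = sum(y) / len(y)   # 20.0
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)   # 20 / 4 = 5.0
b0 = ybar - b1 * xbar                      # 20 - 5(2) = 10.0
print(f"y-hat = {b0} + {b1}x")  # y-hat = 10.0 + 5.0x
```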
Coefficient of Determination
Relationship Among SST, SSR, SSE:
SST = SSR + SSE
Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σ(yᵢ − ŷᵢ)²
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Coefficient of Determination
The coefficient of determination is:
r² = SSR/SST
where:
SSR = sum of squares due to regression
SST = total sum of squares
Coefficient of Determination
r² = SSR/SST = 100/114 = .8772
The regression relationship is very strong; 87.72%
of the variability in the number of cars sold can be
explained by the linear relationship between the
number of TV ads and the number of cars sold.
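The sums of squares and r² above can be verified from the fitted values ŷ = 10 + 5x:

```python
# Sums of squares for the Kataria Auto fit, y-hat = 10 + 5x
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
ybar = sum(y) / len(y)                                 # 20.0
yhat = [10 + 5 * xi for xi in x]                       # fitted values
sst = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares
ssr = sum((yh - ybar) ** 2 for yh in yhat)             # due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # due to error
r2 = ssr / sst
print(sst, ssr, sse, round(r2, 4))  # 114.0 100.0 14.0 0.8772
```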
Sample Correlation Coefficient
r_xy = (sign of b₁)√r²
     = (sign of b₁)√(Coefficient of Determination)
where:
b₁ = the slope of the estimated regression equation ŷ = b₀ + b₁x
The sign of b₁ in the equation ŷ = 10 + 5x is +, so:
r_xy = +√.8772
r_xy = +.9366
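Taking the square root of r² and attaching the sign of the slope can be done in one line:

```python
import math

# Sample correlation coefficient from r^2 and the sign of the slope b1
r_squared = 100 / 114   # r^2 for the Kataria Auto fit
b1 = 5                  # slope of y-hat = 10 + 5x, so the sign is +
r_xy = math.copysign(math.sqrt(r_squared), b1)
print(round(r_xy, 4))  # 0.9366
```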
Assumptions About the Error Term ε
1. The error ε is a random variable with mean of zero.
2. The variance of ε, denoted by σ², is the same for
all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random
variable.
Testing for Significance
To test for a significant regression relationship, we
must conduct a hypothesis test to determine whether
the value of β₁ is zero.
Two tests are commonly used: the t test and the F test.
Both the t test and the F test require an estimate of σ²,
the variance of ε in the regression model.
Testing for Significance
An Estimate of σ²
The mean square error (MSE) provides the estimate
of σ², and the notation s² is also used.
s² = MSE = SSE/(n − 2)
where:
SSE = Σ(yᵢ − ŷᵢ)² = Σ(yᵢ − b₀ − b₁xᵢ)²
Testing for Significance
An Estimate of σ
To estimate σ we take the square root of s².
The resulting s is called the standard error of
the estimate.
s = √MSE = √(SSE/(n − 2))
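For the Kataria Auto fit (n = 5, SSE = 14, both computed earlier), the estimate works out as follows:

```python
import math

# Standard error of the estimate for the Kataria Auto fit
n, sse = 5, 14
mse = sse / (n - 2)   # s^2 = 14/3 ≈ 4.667
s = math.sqrt(mse)    # standard error of the estimate, ≈ 2.16
print(round(mse, 3), round(s, 2))  # 4.667 2.16
```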
Testing for Significance: t Test
Hypotheses:
H₀: β₁ = 0
Hₐ: β₁ ≠ 0
Test Statistic:
t = b₁ / s_b₁
where:
s_b₁ = s / √Σ(xᵢ − x̄)²
Testing for Significance: t Test
Rejection Rule:
Reject H₀ if p-value < α
or t < −t_α/2 or t > t_α/2
where:
t_α/2 is based on a t distribution
with n − 2 degrees of freedom
Testing for Significance: t Test
1. Determine the hypotheses: H₀: β₁ = 0, Hₐ: β₁ ≠ 0
2. Specify the level of significance: α = .05
3. Select the test statistic: t = b₁ / s_b₁
4. State the rejection rule: Reject H₀ if p-value < .05
or |t| > 3.182 (with 3 degrees of freedom)
Testing for Significance: t Test
5. Compute the value of the test statistic:
t = b₁ / s_b₁ = 5 / 1.08 = 4.63
6. Determine whether to reject H₀:
t = 4.541 provides an area of .01 in the upper
tail. Hence, the p-value is less than .02. (Also,
t = 4.63 > 3.182.) We can reject H₀.
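The test statistic for the Kataria Auto fit, computed from quantities derived earlier (SSE = 14, Σ(xᵢ − x̄)² = 4):

```python
import math

# t statistic for H0: beta1 = 0 on the Kataria Auto fit
n, sse = 5, 14
b1 = 5
sxx = 4                        # sum((xi - xbar)^2) for x = [1, 3, 2, 1, 3]
s = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1, ≈ 1.08
t = b1 / s_b1
print(round(s_b1, 2), round(t, 2))  # 1.08 4.63
```

Since 4.63 exceeds the critical value 3.182 (t_.025 with 3 degrees of freedom), H₀ is rejected.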
Confidence Interval for β₁
We can use a 95% confidence interval for β₁ to test
the hypotheses just used in the t test.
H₀ is rejected if the hypothesized value of β₁ is not
included in the confidence interval for β₁.
The form of a confidence interval for β₁ is:
b₁ ± t_α/2 · s_b₁
where:
b₁ is the point estimator and t_α/2 · s_b₁ is the
margin of error;
t_α/2 is the t value providing an area of α/2 in the
upper tail of a t distribution with n − 2 degrees
of freedom
Confidence Interval for β₁
Rejection Rule:
Reject H₀ if 0 is not included in
the confidence interval for β₁.
95% Confidence Interval for β₁:
b₁ ± t_α/2 · s_b₁ = 5 ± 3.182(1.08) = 5 ± 3.44,
or 1.56 to 8.44
Conclusion:
0 is not included in the confidence interval.
Reject H₀.
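The same interval, computed without intermediate rounding:

```python
import math

# 95% confidence interval for beta1 on the Kataria Auto fit
b1 = 5
s_b1 = math.sqrt(14 / 3) / math.sqrt(4)   # ≈ 1.08 (from the t test step)
t_crit = 3.182                            # t_.025 with 3 degrees of freedom
margin = t_crit * s_b1
low, high = b1 - margin, b1 + margin
print(round(low, 2), round(high, 2))  # 1.56 8.44
```

Because the interval (1.56, 8.44) does not contain 0, the conclusion matches the t test: reject H₀.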
Testing for Significance: F Test
Hypotheses:
H₀: β₁ = 0
Hₐ: β₁ ≠ 0
Test Statistic:
F = MSR/MSE
Testing for Significance: F Test
Rejection Rule:
Reject H₀ if p-value < α or F > F_α
where:
F_α is based on an F distribution with
1 degree of freedom in the numerator and
n − 2 degrees of freedom in the denominator
Testing for Significance: F Test
1. Determine the hypotheses: H₀: β₁ = 0, Hₐ: β₁ ≠ 0
2. Specify the level of significance: α = .05
3. Select the test statistic: F = MSR/MSE
4. State the rejection rule: Reject H₀ if p-value < .05
or F > 10.13 (with 1 d.f. in numerator and
3 d.f. in denominator)
Testing for Significance: F Test
5. Compute the value of the test statistic:
F = MSR/MSE = 100/4.667 = 21.43
6. Determine whether to reject H₀:
F = 17.44 provides an area of .025 in the upper
tail. Thus, the p-value corresponding to F = 21.43
is less than .025, which is less than α = .05.
Hence, we reject H₀.
The statistical evidence is sufficient to conclude
that we have a significant relationship between the
number of TV ads aired and the number of cars sold.
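With one independent variable, MSR = SSR/1, so the F statistic follows directly from the sums of squares computed earlier:

```python
# F statistic for the Kataria Auto fit: MSR = SSR/1, MSE = SSE/(n - 2)
n = 5
ssr, sse = 100, 14
msr = ssr / 1         # 1 regression degree of freedom
mse = sse / (n - 2)   # ≈ 4.667
f = msr / mse
print(round(f, 2))  # 21.43
```

Note that with one independent variable, F = t² (21.43 ≈ 4.63²), so the two tests are equivalent here.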
Some Cautions about the
Interpretation of Significance Tests
Just because we are able to reject H₀: β₁ = 0 and
demonstrate statistical significance does not enable
us to conclude that there is a linear relationship
between x and y.
Rejecting H₀: β₁ = 0 and concluding that the
relationship between x and y is significant does
not enable us to conclude that a cause-and-effect
relationship is present between x and y.
