Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 14-3
Scatter Plot Examples
(continued)
[Figure: scatter plots of y against x contrasting strong relationships with weak relationships]
Scatter Plot Examples
(continued)
[Figure: scatter plot showing no relationship]
Correlation Coefficient
(continued)
[Figure: scatter plots of y against x illustrating r = -1, r = -.6, r = 0, r = +.3, and r = +1]
Calculating the
Correlation Coefficient
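Sample correlation coefficient:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$

Algebraic equivalent, applied to the tree height and trunk diameter data:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$

$$ = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886 $$

r = 0.886 → relatively strong positive linear association between x and y

[Figure: scatter plot "Correlation between Tree Height and Trunk Diameter"; x-axis: Trunk Diameter, x; y-axis: Tree Height, y]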
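The arithmetic in the tree example can be reproduced with a short Python sketch. The function name `correlation_from_sums` and its parameter names are illustrative choices, not part of the slides; the six input sums are the ones shown in the worked calculation.

```python
import math


def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Sample correlation coefficient r, using the algebraic (sums) form."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Sums taken from the tree height / trunk diameter calculation.
r = correlation_from_sums(n=8, sum_x=73, sum_y=321, sum_xy=3142,
                          sum_x2=713, sum_y2=14111)
print(round(r, 3))  # 0.886
```

Working from sums rather than raw observations mirrors the hand calculation, which is why the raw data are not needed here.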
Test statistic

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

(with n - 2 degrees of freedom)
Example: Produce Stores
Is there evidence of a linear relationship
between tree height and trunk diameter at
the .05 level of significance?
H0: ρ = 0 (No correlation)
H1: ρ ≠ 0 (correlation exists)
α = .05, df = 8 - 2 = 6

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.886}{\sqrt{\dfrac{1 - .886^2}{8 - 2}}} = 4.68 $$
Example: Test Solution
$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.886}{\sqrt{\dfrac{1 - .886^2}{8 - 2}}} = 4.68 $$

Decision: Reject H0
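As a check on the test solution, the following sketch recomputes the t statistic from r = .886 and n = 8. The critical value 2.4469 is an assumption brought in from a standard two-tailed t table (α = .05, 6 degrees of freedom); it does not appear in the slide text, and the function name is illustrative.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


t_stat = correlation_t_statistic(r=0.886, n=8)

# Assumed table value: two-tailed, alpha = .05, df = 6 (not from the slides).
T_CRITICAL = 2.4469

reject_h0 = abs(t_stat) > T_CRITICAL
print(round(t_stat, 2), reject_h0)  # 4.68 True
```

Because 4.68 exceeds the assumed critical value, the sketch reaches the same decision as the slide: reject H0.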
y = β0 + β1x + ε

[Figure: population regression line y = β0 + β1x + ε, marking the intercept β0, the slope β1, and, for a given xi, the observed value of y, the predicted value of y, and the random error εi for this x value]
Estimated Regression Model
The sample regression line provides an estimate of
the population regression line
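$$ \hat{y}_i = b_0 + b_1 x $$

The least squares estimates b0 and b1 are the values that minimize the sum of squared residuals:

$$ \sum e^2 = \sum (y - \hat{y})^2 = \sum \left(y - (b_0 + b_1 x)\right)^2 $$

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$

algebraic equivalent for b1:

$$ b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} $$

and

$$ b_0 = \bar{y} - b_1 \bar{x} $$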
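A minimal sketch of both forms of the slope formula is given below, to show that they agree. The function names and the four data points are hypothetical; the points are chosen to lie exactly on y = 1 + 2x so the expected coefficients are obvious.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) using the deviation-from-mean form of b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def least_squares_fit_algebraic(x, y):
    """Return (b0, b1) using the algebraically equivalent sums form of b1."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

print(least_squares_fit(x, y))            # (1.0, 2.0)
print(least_squares_fit_algebraic(x, y))  # (1.0, 2.0)
```

The sums form is convenient for hand calculation; the deviation form tends to be less sensitive to rounding when the x values are large.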
ANOVA
              df    SS            MS            F          Significance F
Regression     1    18934.9348    18934.9348    11.0848    0.01039
Residual       8    13665.5652     1708.1957
Total          9    32600.5000
[Figure: scatter plot with fitted regression line; x-axis: Square Feet; Slope = 0.10977, Intercept = 98.248]
Coefficient of Determination, R2
The coefficient of determination is the portion
of the total variation in the dependent variable
that is explained by variation in the
independent variable
The coefficient of determination is also called
R-squared and is denoted as R2
$$ R^2 = \frac{SSR}{SST} \qquad \text{where } 0 \le R^2 \le 1 $$
Coefficient of Determination, R2
(continued)
Coefficient of determination

$$ R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares explained by regression}}{\text{total sum of squares}} $$

$$ R^2 = r^2 $$
where:
R2 = Coefficient of determination
r = Simple correlation coefficient
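Both definitions can be checked against the house price regression output reported in this deck (SSR = 18934.9348, SST = 32600.5000, Multiple R = 0.76211). The sketch below is illustrative only; the variable names are not from the slides.

```python
# Values from the Excel regression output for the house price example.
SSR = 18934.9348      # sum of squares explained by regression
SST = 32600.5000      # total sum of squares
MULTIPLE_R = 0.76211  # simple correlation coefficient r

r_squared = SSR / SST
print(round(r_squared, 5))  # 0.58082

# R^2 = r^2 holds up to the rounding of the reported r.
print(abs(MULTIPLE_R ** 2 - r_squared) < 1e-4)  # True
```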
Examples of Approximate
R2 Values
R2 = 1
[Figure: scatter plots of y against x, labeled R2 = +1]
0 < R2 < 1
[Figure: scatter plot of y against x]
Examples of Approximate
R2 Values
(continued)
R2 = 0
[Figure: scatter plot of y against x]
No linear relationship between x and y:
Test statistic

$$ F = \frac{SSR/1}{SSE/(n - 2)} $$

(with D1 = 1 and D2 = n - 2 degrees of freedom)
ANOVA
              df    SS            MS            F          Significance F
Regression     1    18934.9348    18934.9348    11.0848    0.01039
Residual       8    13665.5652     1708.1957
Total          9    32600.5000
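The F statistic in the ANOVA table can be recomputed from its sums of squares. This is a small sketch with an illustrative function name; n = 10 is the number of observations reported in the Excel output.

```python
def f_statistic(ssr, sse, n):
    """F = (SSR / 1) / (SSE / (n - 2)) for simple linear regression."""
    msr = ssr / 1          # mean square regression, D1 = 1
    mse = sse / (n - 2)    # mean square error, D2 = n - 2
    return msr / mse


f_stat = f_statistic(ssr=18934.9348, sse=13665.5652, n=10)
print(round(f_stat, 2))  # 11.08 (the table reports 11.0848)
```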
$$ s_\varepsilon = \sqrt{\frac{SSE}{n - 2}} $$

Where
SSE = Sum of squares error
n = Sample size
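Applied to the house price example (SSE = 13665.5652, n = 10, both taken from the ANOVA output in this deck), the formula gives the standard error that Excel reports. The function name below is an illustrative choice.

```python
import math


def standard_error_of_estimate(sse, n):
    """Standard error of the estimate: sqrt(SSE / (n - 2))."""
    return math.sqrt(sse / (n - 2))


s_e = standard_error_of_estimate(sse=13665.5652, n=10)
print(round(s_e, 3))  # 41.33 (Excel reports 41.33032)
```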
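$$ s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}} $$

where:
sb1 = Estimate of the standard error of the least squares slope
sε = Sample standard error of the estimate:

$$ s_\varepsilon = \sqrt{\frac{SSE}{n - 2}} $$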
Excel Output
Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

sε = 41.33032
sb1 = 0.03297

ANOVA
              df    SS            MS            F          Significance F
Regression     1    18934.9348    18934.9348    11.0848    0.01039
Residual       8    13665.5652     1708.1957
Total          9    32600.5000
d.f. = 10-2 = 8
α/2 = .025 in each tail
[Figure: t distribution with "Reject H0" regions below -tα/2 and above tα/2 and a "Do not reject H0" region between them]
Decision: Reject H0
Conclusion: There is sufficient evidence
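The decision can be reproduced from the slope and its standard error reported in this deck (b1 = 0.10977, sb1 = 0.03297). The sketch assumes the usual test statistic t = (b1 - 0) / sb1 for H0: β1 = 0, and it assumes a critical value of 2.3060 from a standard t table (α/2 = .025, 8 degrees of freedom); neither the statistic's value nor the critical value appears in the extracted slide text.

```python
B1 = 0.10977     # estimated slope (from the regression output)
S_B1 = 0.03297   # standard error of the slope (from the regression output)

# Assumed table value: alpha/2 = .025, d.f. = 10 - 2 = 8 (not from the slides).
T_CRITICAL = 2.3060

t_stat = (B1 - 0) / S_B1
reject_h0 = abs(t_stat) > T_CRITICAL
print(round(t_stat, 2), reject_h0)  # 3.33 True
```

As a consistency check, the square of this t statistic is about 11.08, which agrees with the F value in the ANOVA table, as expected for a regression with one independent variable.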
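Confidence interval for the mean of y, given xp:

$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} $$

Prediction interval for an individual y, given xp:

$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} $$

[Figure: fitted line ŷ = b0 + b1x with the confidence interval for the mean of y and the wider prediction interval for an individual y, given xp; x̄ and xp marked on the x-axis]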
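The two interval formulas differ only by the extra 1 under the square root, which the sketch below makes explicit. Everything here is illustrative: the function name, the five data points, and the critical value 3.1824 (a standard t table entry for α/2 = .025 and 3 degrees of freedom) are assumptions and do not come from the slides.

```python
import math


def interval_half_widths(x, y, x_p, t_crit):
    """Return (y_hat, ci_half_width, pi_half_width) at x_p."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))

    # Term shared by both intervals: 1/n + (x_p - x_bar)^2 / sum (x - x_bar)^2
    shared = 1 / n + (x_p - x_bar) ** 2 / sxx

    y_hat = b0 + b1 * x_p
    ci = t_crit * s_e * math.sqrt(shared)       # mean of y, given x_p
    pi = t_crit * s_e * math.sqrt(1 + shared)   # individual y, given x_p
    return y_hat, ci, pi


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.1824  # assumed table value: alpha/2 = .025, df = 5 - 2 = 3

y_hat, ci, pi = interval_half_widths(x, y, x_p=4, t_crit=T_CRIT)
print(ci < pi)  # True
```

The prediction interval is always wider because it also accounts for the scatter of an individual observation around the mean.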
Example: House Prices
predicted price = 98.25 + 0.1098(2000)
= 317.85
The predicted price for a house with 2000
square feet is 317.85($1,000s) = $317,850
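The same prediction, written as a small function. The name `predict_price` is hypothetical; the coefficients are the rounded intercept and slope used in the example, and the result is in $1,000s.

```python
def predict_price(square_feet, b0=98.25, b1=0.1098):
    """Predicted house price in $1,000s from the fitted regression line."""
    return b0 + b1 * square_feet


price_thousands = predict_price(2000)
print(round(price_thousands, 2))     # 317.85
print(round(price_thousands * 1000)) # 317850
```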
Estimation of Mean Values:
Example
Confidence Interval Estimate for E(y)|xp
Find the 95% confidence interval for the average
price of 2,000 square-foot houses
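Predicted Price ŷi = 317.85 ($1,000s)

$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = 317.85 \pm 37.12 $$

Prediction interval for an individual y, given xp:

$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = 317.85 \pm 102.28 $$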
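The two half-widths are linked: both share the term under the square root, so the prediction half-width can be recovered from the confidence half-width without the raw data. The sketch below does this using sε = 41.33032 and ± 37.12 from this deck, plus an assumed critical value of 2.3060 (standard t table, α/2 = .025, 8 degrees of freedom), which does not appear in the slide text.

```python
import math

S_E = 41.33032         # standard error of the estimate (Excel output)
CI_HALF_WIDTH = 37.12  # reported half-width for the mean price
T_CRITICAL = 2.3060    # assumed table value: alpha/2 = .025, df = 8

base = T_CRITICAL * S_E

# Back out the shared term 1/n + (x_p - x_bar)^2 / sum (x - x_bar)^2.
shared = (CI_HALF_WIDTH / base) ** 2

# The prediction interval adds 1 under the square root.
pi_half_width = base * math.sqrt(1 + shared)
print(round(pi_half_width, 2))  # 102.28
```

The result matches the reported ± 102.28, which suggests the two published intervals are mutually consistent under the assumed critical value.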
In Excel, use
PHStat | regression | simple linear regression …
Check the
“confidence and prediction interval for X=”
box and enter the x-value and confidence level
desired
Input values
levels of x
Evaluate normal distribution assumption
[Figure: plots of y against x with the corresponding plots of residuals against x, for a "Not Linear" case and a "Linear" case]
Residual Analysis for
Constant Variance
[Figure: plots of y against x with the corresponding plots of residuals against x]
RESIDUAL OUTPUT
Observation    Predicted House Price    Residuals
1              251.92316                -6.923162
2              273.87671                38.12329
3              284.85348                -5.853484
4              304.06284                3.937162
5              218.99284                -19.99284
6              268.38832                -49.38832
7              356.20251                48.79749
8              367.17929                -43.17929
9              254.6674                 64.33264

[Figure: House Price Model Residual Plot; y-axis: Residuals]
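Since a residual is the observed value minus the predicted value, the observed prices can be recovered from the rows listed in the residual output. The sketch below uses only the nine rows present in this text; the variable names are illustrative.

```python
# (predicted house price, residual) for the observations listed in the output.
rows = [
    (251.92316, -6.923162),
    (273.87671, 38.12329),
    (284.85348, -5.853484),
    (304.06284, 3.937162),
    (218.99284, -19.99284),
    (268.38832, -49.38832),
    (356.20251, 48.79749),
    (367.17929, -43.17929),
    (254.6674, 64.33264),
]

# residual = observed - predicted, so observed = predicted + residual.
observed = [round(pred + resid, 2) for pred, resid in rows]
print(observed)
# [245.0, 312.0, 279.0, 308.0, 199.0, 219.0, 405.0, 324.0, 319.0]

# Observation with the largest absolute residual (1-based index).
worst = max(range(len(rows)), key=lambda i: abs(rows[i][1])) + 1
print(worst)  # 9
```

Scanning for large or patterned residuals in this way is a numeric complement to reading the residual plot.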