This week
Simple linear regression:
- Idea;
- Estimating using LSE (& BLUE estimator & relation to MLE);
- Partition of the variability of the variable;
- Testing:
  i) slope;
  ii) intercept;
  iii) regression line;
  iv) correlation coefficient.
3103/3173
Basic idea
Suppose we observe data $y = [y_1, \ldots, y_n]^\top$;
Assume that $Y$ is affected by $X$ with $x = [x_1, \ldots, x_n]^\top$;
What can we say about the relationship between $X$ and $Y$?
To do so we fit
$$y = \beta_0 + \beta_1 x + \varepsilon$$
to the data $(x_i, y_i)$ for $i = 1, 2, \ldots, n$.
Basic idea
Regression, with $\mathbb{E}[\varepsilon_i] = 0$:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.$$
We determine $\beta_0$ and $\beta_1$ by minimizing
$$S(\beta_0, \beta_1) = \sum_{i=1}^n \varepsilon_i^2 = \sum_{i=1}^n \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2.$$
Setting the partial derivatives to zero gives the normal equations, e.g.
$$\frac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} = -2 \sum_{i=1}^n x_i \left(y_i - (\beta_0 + \beta_1 x_i)\right) = 0
\quad\Rightarrow\quad
\sum_{i=1}^n x_i y_i = b_0 \sum_{i=1}^n x_i + b_1 \sum_{i=1}^n x_i^2.$$
Next step: express $b_0$ and $b_1$ as functions of $\sum_{i=1}^n y_i$, $\sum_{i=1}^n x_i$, $\sum_{i=1}^n x_i y_i$, and $\sum_{i=1}^n x_i^2$:
$$b_1 = \frac{\sum_{i=1}^n x_i y_i - n\,\bar{x}\bar{y}}{\sum_{i=1}^n x_i^2 - n\,\bar{x}^2},
\qquad b_0 = \bar{y} - b_1 \bar{x}.$$
3106/3173
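The closed-form solution above can be checked numerically. A minimal sketch, with illustrative data (not from the lecture):

```python
# Closed-form least-squares fit from the derivation above:
# b1 = (sum x_i y_i - n xbar ybar) / (sum x_i^2 - n xbar^2), b0 = ybar - b1 xbar.
def ls_fit(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum(a * b for a, b in zip(x, y)) - n * xbar * ybar) / \
         (sum(a * a for a in x) - n * xbar ** 2)
    b0 = ybar - b1 * xbar
    return b0, b1

# Illustrative data with an exact linear relation y = 2 + 3x,
# so the fit recovers b0 = 2 and b1 = 3.
b0, b1 = ls_fit([1.0, 2.0, 3.0, 4.0, 5.0],
                [5.0, 8.0, 11.0, 14.0, 17.0])
print(b0, b1)  # -> 2.0 3.0 (up to floating point)
```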
Correlation Coefficient
Regression: find the dependency of $Y$ and $X$, i.e., they have a joint distribution.
$\beta_1$ gives the marginal effect of a change in $X$; $\rho_{XY}$ measures the strength of the dependence:
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\mathbb{E}\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sqrt{\mathbb{E}\left[(X - \mu_X)^2\right]\,\mathbb{E}\left[(Y - \mu_Y)^2\right]}}.$$
Correlation Coefficient
We know that the correlation coefficient has the following interpretations:
- A correlation of $-1$ indicates a perfect negative linear relationship;
- A correlation of $+1$ indicates a perfect positive linear relationship;
- A correlation of $0$ implies no linear relationship;
- The larger the correlation in absolute value, the stronger the (positive/negative) linear relationship.
3108/3173
Correlation Coefficient
Correlations are indications of linear relationships - it is possible that two variables have zero correlation, but are strongly dependent (non-linear).
The correlation coefficient is a population parameter that can be estimated from data.
Suppose we have $n$ pairs of observations denoted by:
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n).$$
Estimate the population correlation $\rho_{XY}$ using (week 3):
$$s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}
\qquad\text{and}\qquad
s_y = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2}.$$
3109/3173
Correlation Coefficient
Similarly, the sample covariance is given by:
$$s_{X,Y} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}).$$
The sample correlation coefficient is then:
$$r = \frac{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{s_x s_y}
= \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}}
= \frac{\sum_{i=1}^n x_i y_i - n\,\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^n x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^n y_i^2 - n\bar{y}^2\right)}}.$$
3110/3173
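The sample correlation can be sketched directly from these formulas; the data below are illustrative, not from the lecture:

```python
import math

# Sample correlation r = s_XY / (s_x * s_y), following the slide's formulas.
def sample_corr(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))
    return sxy / (sx * sy)

r = sample_corr([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r)  # -> 0.8 (up to floating point)
```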
Effect of correlation
[Figure (3111/3173): scatter plots illustrating the effect of the correlation, for ρ = 0, ρ = 0.8, and ρ = 0.3; x on the horizontal axis.]
Effect of variance
[Figure (3112/3173): scatter plots for (σx = 4, σy = 1), (σx = 1, σy = 1), and (σx = 1, σy = 4); x on the horizontal axis.]
Effect of mean
[Figure (3113/3173): scatter plots for (μx = 3, μy = 0), (μx = 0, μy = 0), and (μx = 0, μy = 3); x on the horizontal axis.]
[Figure (3115/3173): comparison of a linear and a quadratic relationship; x on the horizontal axis.]
The residuals are $\hat{\varepsilon}_i = y_i - (b_0 + b_1 x_i)$ for $i = 1, \ldots, n$, and the variance is estimated by:
$$s^2 = \frac{\sum_{i=1}^n \hat{\varepsilon}_i^2}{n-2}.$$
Proof: we use that $b_0$ and $b_1$ are unbiased and $\mathbb{E}[\varepsilon_i] = 0$:
$$y_i = b_0 + b_1 x_i + \hat{\varepsilon}_i
\quad\Rightarrow\quad
\mathbb{E}[\hat{y}_i] = \mathbb{E}[b_0] + \mathbb{E}[b_1]\, x_i.$$
3118/3173
[Figure (3119/3173): scatter plots for (σx = 1, σy = 1) and (σx = 4, σy = 4); x on the horizontal axis.]
[Figure (3120/3173): scatter plots for (σx = 1, σy = 1) and (σx = 4, σy = 4); x on the horizontal axis.]
[Figure (3121/3173): scatter plots for (μx = 1, μy = 0) and (μx = 3, μy = 0); x on the horizontal axis.]
Relation to MLE: under normal errors the likelihood is
$$L\left(y; \beta_0, \beta_1, \sigma^2\right) = \frac{1}{(2\pi)^{n/2} \sigma^n} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2\right),$$
with log-likelihood
$$\ell\left(y; \beta_0, \beta_1, \sigma^2\right) = -n \log\left(\sigma\sqrt{2\pi}\right) - \frac{1}{2\sigma^2} \sum_{i=1}^n \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2.$$
3122/3173
Maximizing over $\beta_0, \beta_1$ gives the least-squares estimates; the MLE of the variance is:
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n \left(y_i - (b_0 + b_1 x_i)\right)^2 = \frac{n-2}{n}\, s^2.$$
BLUE estimator
A point estimator $\hat{\theta}(\cdot)$ is called linear if $\hat{\theta}(\cdot) = Ax + b$.
A point estimator $\hat{\theta}(\cdot)$ is called a Best Linear Unbiased Estimator (BLUE) if it is:
- unbiased: $\mathbb{E}[\hat{\theta}(\cdot)] = \theta(\cdot)$;
- of minimum variance: $\mathrm{Var}(\hat{\theta}(\cdot)) \le \mathrm{Var}(\theta^*(\cdot))$ for any linear unbiased estimator $\theta^*$.
One can show that the LS-estimator $b_0 + X b_1$ is BLUE for $\mu = \beta_0 + X\beta_1$ under the weak assumptions (proof not required in this course);
One can show that the LS-estimator $b_0 + X b_1$ is UMVUE for $\mu = \beta_0 + X\beta_1$ under the strong assumptions (proof not required in this course).
3124/3173
We decompose:
$$\underbrace{(y_i - \bar{y})}_{\text{total deviation}} = \underbrace{(y_i - \hat{y}_i)}_{\text{unexplained deviation}} + \underbrace{(\hat{y}_i - \bar{y})}_{\text{explained deviation}}.$$
We then obtain:
$$\underbrace{\sum_{i=1}^n (y_i - \bar{y})^2}_{\text{SST}} = \underbrace{\sum_{i=1}^n (y_i - \hat{y}_i)^2}_{\text{SSE}} + \underbrace{\sum_{i=1}^n (\hat{y}_i - \bar{y})^2}_{\text{SSM}},$$
where
- SSE: sum of squares error (sometimes called residual);
- SSM: sum of squares model (sometimes called regression).
Sum of squares

| Source | Sum of squares | Degrees of freedom | Mean square | F |
|--------|----------------|--------------------|-------------|---|
| Model | SSM = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)² | DFM = 1 | MSM = SSM/DFM | MSM/MSE |
| Error | SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² | DFE = n − 2 | MSE = SSE/DFE | |
| Total | SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)² | DFT = n − 1 | MST = SST/DFT | |

3127/3173
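The partition SST = SSE + SSM holds for any least-squares fit; a small numeric check, with illustrative data:

```python
# Compute the three sums of squares for a least-squares fit and
# verify the partition SST = SSE + SSM.
def anova(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
         sum((a - xbar) ** 2 for a in x)
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * a for a in x]
    sst = sum((b - ybar) ** 2 for b in y)
    sse = sum((b - h) ** 2 for b, h in zip(y, yhat))
    ssm = sum((h - ybar) ** 2 for h in yhat)
    return sst, sse, ssm

sst, sse, ssm = anova([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(abs(sst - (sse + ssm)) < 1e-9)  # -> True
```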
Coefficient of Determination
Notice that the square of the correlation coefficient occurs in the denominator of the t statistic used to test hypotheses concerning the population correlation coefficient. The statistic $R^2$ is called the coefficient of determination and provides useful information.
Noting (proof: slide 3173, notation: slide 3163):
$$\text{SSE} = \underbrace{S_{yy}}_{=\text{SST}} - \underbrace{b_1 S_{xy}}_{=\text{SSM}},$$
so that $R^2 = \text{SSM}/\text{SST} = 1 - \text{SSE}/\text{SST}$.
Exercise
A car insurance company is interested in how large the adverse selection effect is in its sample, i.e., how large the difference in claim size relative to the premium is for different groups.
The insurance premium depends on the coverage (Gold Comprehensive Car Insurance, Standard Comprehensive Car Insurance and Third Party Property Car Insurance) and the price of the insured vehicle (five categories).
a. Explain why there might be differences in the claim sizes for the different groups.
Solution: high coverage → reckless behavior (example: airbags). Expensive car → more wealthy drivers → better drivers? Other explanations are also possible.
3129/3173
Exercise
Each of the 15 categories has a different premium and number of contracts for the insurance contract.
The insurance company has the total claim sizes in the groups.
b. Give the linear regression model.
Solution: Let:
- $y_i$ be the average claim size for group $i = 1, \ldots, 15$;
- $x_i$ be the average MVI premium for group $i = 1, \ldots, 15$.
Then the regression is:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.$$
3130/3173
Exercise
c. Are the weak assumptions and strong assumptions reasonable in this regression model?
Solution: Weak assumptions:
- Residual has a mean of zero: yes, the mean is captured in $\beta_0$ and $\beta_1$. Note: assumed linear relation!
- Variance independent of the explanatory variable: debatable (increasing?), have to check using the data.
- Residuals are independent: yes.
3131/3173
Exercise data
[Figure (3132/3173): average claim size (y) versus Premium (x) for the 15 groups; both axes range roughly from 100 to 1000.]
Exercise
The observed values for the 15 groups are:

| i | xi  | yi  | i  | xi  | yi  |
|---|-----|-----|----|-----|-----|
| 1 | 210 | 189 | 9  | 380 | 323 |
| 2 | 230 | 267 | 10 | 410 | 313 |
| 3 | 235 | 234 | 11 | 460 | 456 |
| 4 | 250 | 142 | 12 | 540 | 528 |
| 5 | 260 | 302 | 13 | 720 | 768 |
| 6 | 280 | 149 | 14 | 880 | 963 |
| 7 | 320 | 308 | 15 | 910 | 954 |
| 8 | 360 | 392 |    |     |     |

Summary statistics: $\sum_{i=1}^{15} x_i = 6445$, $\sum_{i=1}^{15} y_i = 6288$, $\sum_{i=1}^{15} x_i^2 = 3{,}529{,}325$, $\sum_{i=1}^{15} y_i^2 = 3{,}660{,}190$, and $\sum_{i=1}^{15} x_i y_i = 3{,}566{,}000$.
d. Find the LS estimates of the regression model.
3133/3173
Solution:
$$b_1 = \frac{\sum_{i=1}^{15} x_i y_i - n\,\bar{x}\bar{y}}{\sum_{i=1}^{15} x_i^2 - n\,\bar{x}^2}
= \frac{3{,}566{,}000 - \frac{6445 \cdot 6288}{15}}{3{,}529{,}325 - \frac{6445^2}{15}} = 1.137.$$
$$b_0 = \bar{y} - b_1 \bar{x} = \frac{6288}{15} - 1.137 \cdot \frac{6445}{15} = -69.329.$$
$$\hat{\sigma}^2 = \frac{1}{n-2}\left(\sum_{i=1}^{15} y_i^2 - n\bar{y}^2 - \frac{\left(\sum_{i=1}^{15} x_i y_i - n\bar{x}\bar{y}\right)^2}{\sum_{i=1}^{15} x_i^2 - n\bar{x}^2}\right)
= \frac{1}{13}\left(3{,}660{,}190 - \frac{6288^2}{15} - \frac{\left(3{,}566{,}000 - \frac{6445 \cdot 6288}{15}\right)^2}{3{,}529{,}325 - \frac{6445^2}{15}}\right) = 3200,$$
so $\hat{\sigma} = \sqrt{3200} = 56.57$.
3134/3173
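The estimates can be reproduced from the summary statistics on slide 3133:

```python
# Reproducing exercise d from the summary statistics on slide 3133.
n = 15
sum_x, sum_y = 6445, 6288
sum_x2, sum_y2 = 3_529_325, 3_660_190
sum_xy = 3_566_000

xbar, ybar = sum_x / n, sum_y / n
b1 = (sum_xy - n * xbar * ybar) / (sum_x2 - n * xbar ** 2)
b0 = ybar - b1 * xbar
s2 = (sum_y2 - n * ybar ** 2 - b1 ** 2 * (sum_x2 - n * xbar ** 2)) / (n - 2)
print(round(b1, 3), round(b0, 2), round(s2))  # -> 1.137 -69.33 3200
```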
Exercise
e. Find the correlation coefficient. Relate the sign of the correlation coefficient to the estimates.
Solution:
$$r = \frac{\sum_{i=1}^n x_i y_i - n\,\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^n x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^n y_i^2 - n\bar{y}^2\right)}}
= \frac{3{,}566{,}000 - \frac{6445 \cdot 6288}{15}}{\sqrt{\left(3{,}529{,}325 - \frac{6445^2}{15}\right)\left(3{,}660{,}190 - \frac{6288^2}{15}\right)}} = 0.9795.$$
Positive sample correlation ($r > 0$) $\Leftrightarrow$ $b_1$ is positive.
3135/3173
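The same summary statistics reproduce the sample correlation:

```python
import math

# Reproducing exercise e from the summary statistics on slide 3133.
n = 15
sum_x, sum_y = 6445, 6288
sum_x2, sum_y2 = 3_529_325, 3_660_190
sum_xy = 3_566_000

num = sum_xy - sum_x * sum_y / n
den = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = num / den
print(round(r, 4))  # -> 0.9795
```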
Exercise
f. Partition the variability.
Solution:
$$\text{SST} = \sum_{i=1}^n y_i^2 - n\bar{y}^2 = 3{,}660{,}190 - \frac{6288^2}{15} = 1{,}024{,}260$$
$$\text{SSE} = (n-2)\,\hat{\sigma}^2 = 13 \cdot 3200 \approx 41{,}606$$
$$\text{SSM} = \text{SST} - \text{SSE} = 1{,}024{,}260 - 41{,}606 = 982{,}654.$$
3136/3173
Residual
[Figure (3137/3173): residuals versus Premium (x); residuals range roughly from −120 to 20.]
Distribution of the variance estimator:
$$\frac{(n-2)\,s^2}{\sigma^2} = \sum_{i=1}^n \underbrace{(\hat{\varepsilon}_i/\sigma)^2}_{N(0,1)^2} = \frac{\sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2}{\sigma^2} \sim \chi^2(n-2).$$
3138/3173
Notation: $\tilde{x} = x - \bar{x}$, so $\tilde{x}^\top \tilde{x} = \sum_{i=1}^n (x_i - \bar{x})^2$, and
$\mathrm{Var}(b_1) = \sigma^2 \big/ \sum_{i=1}^n (x_i - \bar{x})^2$ (see slide 3168).
$\sigma$ is usually unknown, and estimated by $s$, so:
$$\frac{b_1 - \beta_1}{s / \sqrt{\tilde{x}^\top \tilde{x}}}
= \underbrace{\frac{b_1 - \beta_1}{\sigma / \sqrt{\tilde{x}^\top \tilde{x}}}}_{N(0,1)}
\Bigg/ \sqrt{\underbrace{\frac{(n-2)s^2}{\sigma^2}}_{\chi^2(n-2)} \Big/ (n-2)}
\;\sim\; t(n-2),$$
with standard error
$$\mathrm{se}\left(b_1\right) = \frac{s}{\sqrt{\tilde{x}^\top \tilde{x}}}.$$
To test $H_0: \beta_1 = \beta_1^0$ against $H_1: \beta_1 > \beta_1^0$, reject if $t(b_1) > t_{1-\alpha,n-2}$;
against $H_1: \beta_1 < \beta_1^0$, reject if $t(b_1) < -t_{1-\alpha,n-2}$.
3141/3173
Note (see slide 3169): $\mathrm{Var}(b_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}\right)$, so:
$$\frac{b_0 - \beta_0}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\tilde{x}^\top \tilde{x}}}}
= \underbrace{\frac{b_0 - \beta_0}{\sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\tilde{x}^\top \tilde{x}}}}}_{N(0,1)}
\Bigg/ \sqrt{\underbrace{\frac{(n-2)s^2}{\sigma^2}}_{\chi^2(n-2)} \Big/ (n-2)}
\;\sim\; t(n-2).$$
3143/3173
For $\hat{y}_0 = b_0 + b_1 x_0$ we have*:
$$\mathrm{Var}\left(\hat{y}_0\right) = \mathrm{Var}(b_0) + x_0^2\,\mathrm{Var}(b_1) + 2x_0\,\mathrm{Cov}(b_0, b_1)
= \frac{\sigma^2}{n} + \frac{\bar{x}^2 \sigma^2}{(n-1)s_x^2} + \frac{x_0^2 \sigma^2}{(n-1)s_x^2} - \frac{2x_0\bar{x}\sigma^2}{(n-1)s_x^2}$$
$$= \left(\frac{1}{n} + \frac{\bar{x}^2 - 2x_0\bar{x} + x_0^2}{(n-1)s_x^2}\right)\sigma^2
= \left(\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{(n-1)s_x^2}\right)\sigma^2.$$
* see slide 3168 for $\mathrm{Var}(b_1)$, slide 3169 for $\mathrm{Var}(b_0)$ and slide 3170 for $\mathrm{Cov}(b_0, b_1)$.
3145/3173
and we have:
$$\frac{\hat{y}_0 - (\beta_0 + \beta_1 x_0)}{s\sqrt{\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{(n-1)s_x^2}}} \;\sim\; t(n-2).$$
This pivot can therefore be used to construct the $100(1-\alpha)\%$ confidence interval for $y_0$:
$$b_0 + b_1 x_0 \;\pm\; t_{1-\alpha/2,n-2}\; s\sqrt{\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{(n-1)s_x^2}}.$$
3146/3173
Example
[Figure (3147/3173): example regression fits for (σx = 4, σy = 1), (σx = 1, σy = 1), and (σx = 4, σy = 4); x on the horizontal axis.]
Prediction Intervals
For a new observation at $x_i$ the regression line gives $\mathbb{E}[Y_i] = \beta_0 + \beta_1 x_i$, with $S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2$. It then follows that:
$$\left(Y_i - \hat{y}_i \mid X = x, X_i = x_i\right) \sim N\left(0,\; \sigma^2\left(1 + \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}\right)\right),$$
so that:
$$T = \frac{Y_i - \hat{y}_i}{s\sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}}} \;\sim\; t_{n-2}.$$
3151/3173
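The prediction-interval pivot can be sketched end to end; the data below are illustrative, and the quantile 3.182 (the 97.5% point of $t$ with 3 degrees of freedom) is hard-coded rather than looked up:

```python
import math

# Prediction interval at a new point x0 under the normal-error model:
# yhat(x0) +/- t * s * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx).
def predict_interval(x, y, x0, t_quant):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    s2 = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / (n - 2)
    se = math.sqrt(s2 * (1 + 1 / n + (x0 - xbar) ** 2 / sxx))
    yhat = b0 + b1 * x0
    return yhat - t_quant * se, yhat + t_quant * se

# Illustrative data; 3.182 is t_{0.975} with n - 2 = 3 degrees of freedom.
lo, hi = predict_interval([1, 2, 3, 4, 5], [1.2, 1.9, 3.1, 4.2, 4.9],
                          3.5, 3.182)
print(lo, hi)
```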
Example
[Figure (3152/3173): example prediction intervals for (σx = 4, σy = 1), (σx = 1, σy = 1), and (σx = 4, σy = 4); x on the horizontal axis.]
Testing the correlation coefficient:
$$T = \frac{R}{\sqrt{(1 - R^2)/(n-2)}} = \frac{R\sqrt{n-2}}{\sqrt{1 - R^2}},$$
where $R$ is the random variable denoting the correlation coefficient, but with the $x$ and $y$ replaced by $X$ and $Y$.
1. To test $H_0: \rho_{XY} = 0$ against $H_1: \rho_{XY} > 0$:
Reject $H_0$ if the observed $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} > t_{1-\alpha,n-2}$.
2. To test $H_0: \rho_{XY} = 0$ against $H_1: \rho_{XY} < 0$:
Reject $H_0$ if the observed $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} < -t_{1-\alpha,n-2}$.
3. To test $H_0: \rho_{XY} = 0$ against the alternative $H_1: \rho_{XY} \neq 0$:
Reject $H_0$ if the observed $|t| = \left|\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right| > t_{1-\alpha/2,n-2}$.
To test $H_0: \rho = \rho_0$ against $H_1: \rho \neq \rho_0$, use the Fisher $z$-transformation.
3156/3173
Exercise
Consider the previous exercise on slides 3129-3137 with the data on slide 3133.
i. Test whether the correlation coefficient is positive.
Solution: $r = 0.9795$ (see slide 3135).
Method 1: $T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.9795\sqrt{13}}{\sqrt{1-0.9795^2}} = 17.55$. Using F&T page 163: $t_{1-p}(13) = 17.55$ for $p \approx 0$, i.e., the p-value is almost 0, so reject the null hypothesis.
Method 2: $z = \frac{1}{2}\log\frac{1+r}{1-r} = \frac{1}{2}\log\frac{1.9795}{0.0205} = 2.2845$.
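Both methods can be reproduced in a few lines; the values agree with the slide up to rounding of $r$:

```python
import math

# Reproducing exercise i: t statistic and Fisher z-transform for r = 0.9795.
r, n = 0.9795, 15
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # Method 1
z = 0.5 * math.log((1 + r) / (1 - r))              # Method 2 (Fisher z)
print(round(t, 2), round(z, 4))
```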
Exercise
ii. Test whether the slope parameter is larger than one.
Solution: test $H_0: \beta_1 = 1$ against $H_1: \beta_1 > 1$.
Test statistic:
$$T = \frac{b_1 - 1}{s \Big/ \sqrt{\sum_{i=1}^n x_i^2 - n\bar{x}^2}}
= \frac{1.137 - 1}{56.57 \Big/ \sqrt{3{,}529{,}325 - \frac{6445^2}{15}}}
= \frac{0.137}{0.06488809} = 2.11.$$
* using $\hat{\sigma} = 56.57$ and $b_1 = 1.137$, see slide 3134.
3158/3173
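The test statistic follows directly from the summary statistics:

```python
import math

# Reproducing exercise ii: t statistic for H0: beta1 = 1.
b1, s, n = 1.137, 56.57, 15
sum_x, sum_x2 = 6445, 3_529_325
sxx = sum_x2 - sum_x ** 2 / n          # sum x_i^2 - n*xbar^2
t = (b1 - 1) / (s / math.sqrt(sxx))
print(round(t, 2))  # -> 2.11
```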
Exercise
iii. Test whether the intercept parameter is non-negative.
Solution: test $H_0: \beta_0 \geq 0$ against $H_1: \beta_0 < 0$.
Test statistic:
$$T = \frac{b_0 - 0}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n x_i^2 - n\bar{x}^2}}}
= \frac{-69.329}{56.57\sqrt{\frac{1}{15} + \frac{\left(\frac{6445}{15}\right)^2}{3{,}529{,}325 - \frac{6445^2}{15}}}}
= \frac{-69.329}{31.475} = -2.20.$$
* using $\hat{\sigma} = 56.57$ and $b_0 = -69.329$, see slide 3134.
3159/3173
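The standard error and test statistic can likewise be reproduced:

```python
import math

# Reproducing exercise iii: t statistic for the intercept.
b0, s, n = -69.329, 56.57, 15
sum_x, sum_x2 = 6445, 3_529_325
xbar = sum_x / n
sxx = sum_x2 - n * xbar ** 2
se_b0 = s * math.sqrt(1 / n + xbar ** 2 / sxx)
t = b0 / se_b0
print(round(se_b0, 3), round(t, 2))  # -> 31.474 -2.2
```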
Exercise
iv. Calculate the 95% confidence interval of Y given that X = 350.
Solution: We have:
$$\hat{y}\,\big|\,x_0 = 350 = -69.329 + 1.137 \cdot 350 = 328.619, \qquad s = 56.57,$$
and:
$$s\sqrt{1 + \frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^n x_i^2 - n\bar{x}^2}}
= 56.57\sqrt{1 + \frac{1}{15} + \frac{\left(\frac{6445}{15} - 350\right)^2}{3{,}529{,}325 - \frac{6445^2}{15}}}
= 60.449.$$
Parameter estimates I
From the normal equations:
$$b_0 = \frac{\sum_{i=1}^n y_i - b_1 \sum_{i=1}^n x_i}{n},$$
and substituting this into $\sum_{i=1}^n x_i y_i = b_0 \sum_{i=1}^n x_i + b_1 \sum_{i=1}^n x_i^2$ gives:
$$b_0 = \frac{\sum_{i=1}^n y_i \sum_{i=1}^n x_i^2 - \sum_{i=1}^n x_i y_i \sum_{i=1}^n x_i}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}.$$
3161/3173
Similarly*:
$$b_1 = \frac{n\sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}.$$
*: $(1 - a/b)\,c = d/b \;\Leftrightarrow\; \left((b - a)/b\right)c = d/b \;\Leftrightarrow\; c = d/(b-a)$.
3162/3173
Equivalent forms, with $S_x = \sum_{i=1}^n x_i$, $S_y = \sum_{i=1}^n y_i$, $S_{xx} = \sum_{i=1}^n x_i^2$, $S_{xy} = \sum_{i=1}^n x_i y_i$:
$$b_1 = \frac{n\sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}
= \frac{nS_{xy} - S_x S_y}{nS_{xx} - S_x^2}
= \frac{\sum_{i=1}^n x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^n x_i^2 - n\bar{x}^2}$$
$$= \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}
= \frac{s_{xy}}{s_{xx}}
= \frac{s_{xy}}{\sqrt{s_{xx} s_{yy}}} \sqrt{\frac{s_{yy}}{s_{xx}}}
= r\,\frac{s_y}{s_x}.$$
3164/3173
Parameter estimates II
Thus we have that $b_1$ is the sample correlation coefficient times the quotient of the sample standard deviations of $Y$ and $X$.
We refer to $b_1$ as the slope of the regression line.
3165/3173
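The equivalent forms of $b_1$ can be checked numerically; the data below are illustrative:

```python
import math

# Numeric check that three equivalent forms of b1 agree:
# raw-sums form, centered form, and r * (sy / sx).
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 5.0, 11.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

form1 = (n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)) / \
        (n * sum(a * a for a in x) - sum(x) ** 2)
form2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
        sum((a - xbar) ** 2 for a in x)
r = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / math.sqrt(
    sum((a - xbar) ** 2 for a in x) * sum((b - ybar) ** 2 for b in y))
sx = math.sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
sy = math.sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))
form3 = r * sy / sx
print(form1, form2, form3)  # all three agree
```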
Since $\sum_{i=1}^n (x_i - \bar{x})\,\bar{y} = 0$, we can write $b_1 = \sum_{i=1}^n (x_i - \bar{x})\,y_i \big/ \sum_{i=1}^n (x_i - \bar{x})^2$.
Then we have that:
$$\mathrm{Var}\left(b_1\right) = \mathrm{Var}\left(\frac{\sum_{i=1}^n (x_i - \bar{x})\,y_i}{\sum_{i=1}^n (x_i - \bar{x})^2}\right)
= \frac{\sum_{i=1}^n (x_i - \bar{x})^2\,\mathrm{Var}(y_i)}{\left(\sum_{i=1}^n (x_i - \bar{x})^2\right)^2}
= \frac{\sigma^2 \sum_{i=1}^n (x_i - \bar{x})^2}{\left(\sum_{i=1}^n (x_i - \bar{x})^2\right)^2}
= \sigma^2 \Big/ \sum_{i=1}^n (x_i - \bar{x})^2.$$
3168/3173
$$\mathrm{Var}\left(b_0\right) = \mathrm{Var}\left(\bar{y} - b_1 \bar{x}\right)
= \sum_{i=1}^n \mathrm{Var}(y_i)/n^2 + \bar{x}^2 \sigma^2 \Big/ \sum_{i=1}^n (x_i - \bar{x})^2$$
$$= \sigma^2\,\frac{n\left(\sum_{i=1}^n (x_i - \bar{x})^2 + n\bar{x}^2\right)}{n\left(n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2\right)}
= \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}.$$
3169/3173
Proof that SST = SSE + SSM (used on slide 3126):
$$\text{SST} = \sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n \left(y_i^2 + \bar{y}^2 - 2\bar{y}y_i\right)$$
$$\text{SSE} + \text{SSM} = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2
= \sum_{i=1}^n \left(y_i^2 + 2\hat{y}_i^2 - 2(\hat{y}_i + \hat{\varepsilon}_i)\hat{y}_i + \bar{y}^2 - 2\bar{y}(y_i - \hat{\varepsilon}_i)\right)$$
$$= \sum_{i=1}^n \left(y_i^2 - 2\hat{y}_i\hat{\varepsilon}_i + \bar{y}^2 - 2\bar{y}y_i + 2\bar{y}\hat{\varepsilon}_i\right)
\stackrel{**}{=} \sum_{i=1}^n \left(y_i^2 + \bar{y}^2 - 2\bar{y}y_i\right) = \text{SST}.$$
** using $\sum_{i=1}^n 2\bar{y}\hat{\varepsilon}_i = 2\bar{y}\sum_{i=1}^n \hat{\varepsilon}_i = 0$ and $\sum_{i=1}^n 2\hat{y}_i\hat{\varepsilon}_i = 2n\mathbb{E}[\hat{y}_i \varepsilon_i] = 2n\mathbb{E}[\hat{y}_i]\,\mathbb{E}[\varepsilon_i] = 0$, using independence.
3172/3173
$$\text{SSM} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2
= \sum_{i=1}^n \left(b_0 + b_1 x_i - \bar{y}\right)^2
= \sum_{i=1}^n \left((\bar{y} - b_1\bar{x}) + b_1 x_i - \bar{y}\right)^2
= b_1^2 \sum_{i=1}^n (x_i - \bar{x})^2
= b_1^2 s_{xx} = b_1 s_{xy},$$
using $b_1 = \frac{s_{xy}}{s_{xx}}$.
3173/3173