
ST102

Elementary Statistical Theory


Example workshop: Linear regression
Dr James Abdey
Department of Statistics
London School of Economics and Political Science


Question 1
Suppose that you are given observations $y_1$, $y_2$, $y_3$ and $y_4$ such that
$$y_1 = \alpha + \beta + \varepsilon_1, \quad y_2 = -\alpha + \beta + \varepsilon_2, \quad y_3 = \alpha - \beta + \varepsilon_3, \quad y_4 = -\alpha - \beta + \varepsilon_4.$$
The variables $\varepsilon_i$, $i = 1, 2, 3, 4$, are independent and normally distributed with mean 0 and variance $\sigma^2$. Find the least squares estimators for the parameters $\alpha$ and $\beta$, verify that they are unbiased and find the variance of the estimator for the parameter $\alpha$.

We start off with the sum of squares function, S:
$$S = \sum_{i=1}^{4} \varepsilon_i^2 = (y_1 - \alpha - \beta)^2 + (y_2 + \alpha - \beta)^2 + (y_3 - \alpha + \beta)^2 + (y_4 + \alpha + \beta)^2.$$

Now take partial derivatives:


$$\frac{\partial S}{\partial \alpha} = -2(y_1 - \alpha - \beta) + 2(y_2 + \alpha - \beta) - 2(y_3 - \alpha + \beta) + 2(y_4 + \alpha + \beta) = -2(y_1 - y_2 + y_3 - y_4) + 8\alpha,$$
$$\frac{\partial S}{\partial \beta} = -2(y_1 - \alpha - \beta) - 2(y_2 + \alpha - \beta) + 2(y_3 - \alpha + \beta) + 2(y_4 + \alpha + \beta) = -2(y_1 + y_2 - y_3 - y_4) + 8\beta.$$


The least squares estimators $\hat{\alpha}$ and $\hat{\beta}$ are the solutions to $\partial S/\partial \alpha = 0$ and $\partial S/\partial \beta = 0$. Hence
$$\hat{\alpha} = \frac{y_1 - y_2 + y_3 - y_4}{4} \quad \text{and} \quad \hat{\beta} = \frac{y_1 + y_2 - y_3 - y_4}{4}.$$


$\hat{\alpha}$ is unbiased since
$$\mathrm{E}(\hat{\alpha}) = \mathrm{E}\left(\frac{y_1 - y_2 + y_3 - y_4}{4}\right) = \frac{(\alpha + \beta) - (-\alpha + \beta) + (\alpha - \beta) - (-\alpha - \beta)}{4} = \frac{4\alpha}{4} = \alpha.$$
$\hat{\beta}$ is unbiased since
$$\mathrm{E}(\hat{\beta}) = \mathrm{E}\left(\frac{y_1 + y_2 - y_3 - y_4}{4}\right) = \frac{(\alpha + \beta) + (-\alpha + \beta) - (\alpha - \beta) - (-\alpha - \beta)}{4} = \frac{4\beta}{4} = \beta.$$
Finally,

$$\mathrm{Var}(\hat{\alpha}) = \mathrm{Var}\left(\frac{y_1 - y_2 + y_3 - y_4}{4}\right) = \frac{4\sigma^2}{16} = \frac{\sigma^2}{4}.$$
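As a quick numerical check (not part of the original slides), the sketch below simulates one sample from the Question 1 model with arbitrarily chosen values of $\alpha$, $\beta$ and $\sigma$, assuming numpy is available, and compares the closed-form estimators with a generic least squares fit based on the design matrix implied by the four equations.

```python
# Minimal numerical check of the Question 1 estimators (illustrative only;
# the parameter values below are made up).
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 2.0, -1.0, 0.5   # assumed values, not from the slides

# Design matrix implied by y1 = a+b, y2 = -a+b, y3 = a-b, y4 = -a-b
X = np.array([[ 1.0,  1.0],
              [-1.0,  1.0],
              [ 1.0, -1.0],
              [-1.0, -1.0]])
y = X @ np.array([alpha, beta]) + rng.normal(0.0, sigma, size=4)

# Closed-form least squares estimators derived above
alpha_hat = (y[0] - y[1] + y[2] - y[3]) / 4
beta_hat = (y[0] + y[1] - y[2] - y[3]) / 4

# Generic least squares solution for comparison
est, *_ = np.linalg.lstsq(X, y, rcond=None)
print(alpha_hat, beta_hat)  # agrees with est up to floating-point error
print(est)
```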


Question 2

The following linear regression model is proposed to represent the linear relationship between two variables, $y$ and $x$:
$$y_i = \beta x_i + \varepsilon_i$$
for $i = 1, \ldots, n$, where the $\varepsilon_i$ are independent and identically distributed random variables with mean 0 and variance $\sigma^2$, and $\beta$ is an unknown parameter to be estimated.

(a) A proposed estimator for $\beta$ is
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - \beta x_i)^2.$$
Explain why this estimator is sensible.

(b) Another proposed estimator for $\beta$ is
$$\tilde{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} |y_i - \beta x_i|.$$
Explain why $\hat{\beta}$ would be preferred to $\tilde{\beta}$.


(c) Express $\hat{\beta}$ explicitly as a function of the $y_i$ and $x_i$ only.

(d) Using the estimator $\hat{\beta}$:
i. What is the value of $\hat{\beta}$ if $y_i = x_i$ for all $i$? What if they are the exact opposites of each other, i.e. $y_i = -x_i$ for all $i$?
ii. Is it always the case that $-1 \leq \hat{\beta} \leq 1$?


(a) The estimator $\hat{\beta}$ is sensible because it is the least squares estimator, which provides the best fit to the data in terms of minimising the sum of squared residuals.

(b) The estimator $\hat{\beta}$ is preferred to $\tilde{\beta}$ because $\tilde{\beta}$ is the least absolute deviations estimator, which is also an option, but unlike $\hat{\beta}$ it cannot be computed explicitly via differentiation, as the function $f(x) = |x|$ is not differentiable at zero. Therefore, $\tilde{\beta}$ is harder to compute than $\hat{\beta}$.

(c) We need to minimise a convex quadratic, so we can do that by differentiating it and equating the derivative to zero. We obtain
$$-2 \sum_{i=1}^{n} (y_i - \hat{\beta} x_i) x_i = 0,$$
which yields
$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$$

(d)
i. If $y_i = x_i$ for all $i$, then $\hat{\beta} = 1$; if $y_i = -x_i$ for all $i$, then $\hat{\beta} = -1$.
ii. Not true. A counterexample is to take $n = 1$, $x_1 = 1$ and $y_1 = 2$, which gives $\hat{\beta} = 2$.
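As an illustration (not part of the original slides), the sketch below implements the estimator from (c) with numpy on made-up data and confirms the values in (d)i and the counterexample in (d)ii.

```python
# The no-intercept least squares estimator from part (c) (illustrative only).
import numpy as np

def beta_hat(x, y):
    """Least squares slope for the model y_i = beta * x_i + epsilon_i."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(x * y) / np.sum(x ** 2)

x = np.array([1.0, 2.0, 3.0])          # made-up data
print(beta_hat(x, x))                  # 1.0, as in (d)i
print(beta_hat(x, -x))                 # -1.0, as in (d)i
print(beta_hat([1.0], [2.0]))          # 2.0, the counterexample in (d)ii
```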
Question 3

Let $\{(x_i, y_i),\ 1 \leq i \leq n\}$ be observations from the linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.$$

(a) Suppose that the slope, $\beta_1$, is known. Find the LSE for the intercept, $\beta_0$.

(b) Suppose that the intercept, $\beta_0$, is known. Find the LSE for the slope, $\beta_1$.


(a) When $\beta_1$ is known, let $z_i = y_i - \beta_1 x_i$. The model then reduces to $z_i = \beta_0 + \varepsilon_i$. The LSE $\hat{\beta}_0$ minimises $\sum_{i=1}^{n} (z_i - \beta_0)^2$, hence
$$\hat{\beta}_0 = \bar{z} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \beta_1 x_i).$$

(b) When $\beta_0$ is known, we may write $z_i = y_i - \beta_0$. The model is reduced to $z_i = \beta_1 x_i + \varepsilon_i$. Note that
$$\sum_{i=1}^{n} (z_i - \beta_1 x_i)^2 = \sum_{i=1}^{n} \left( z_i - \hat{\beta}_1 x_i + (\hat{\beta}_1 - \beta_1) x_i \right)^2 = \sum_{i=1}^{n} (z_i - \hat{\beta}_1 x_i)^2 + (\hat{\beta}_1 - \beta_1)^2 \sum_{i=1}^{n} x_i^2 + 2D,$$
where $D = (\hat{\beta}_1 - \beta_1) \sum_{i=1}^{n} x_i (z_i - \hat{\beta}_1 x_i)$. If we choose $\hat{\beta}_1$ such that
$$\sum_{i=1}^{n} x_i (z_i - \hat{\beta}_1 x_i) = 0, \quad \text{i.e.} \quad \sum_{i=1}^{n} x_i z_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0,$$
then
$$\sum_{i=1}^{n} (z_i - \beta_1 x_i)^2 = \sum_{i=1}^{n} (z_i - \hat{\beta}_1 x_i)^2 + (\hat{\beta}_1 - \beta_1)^2 \sum_{i=1}^{n} x_i^2 \geq \sum_{i=1}^{n} (z_i - \hat{\beta}_1 x_i)^2,$$
thus $\hat{\beta}_1$ is the LSE. Note now
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i z_i}{\sum_{i=1}^{n} x_i^2} = \frac{\sum_{i=1}^{n} x_i (y_i - \beta_0)}{\sum_{i=1}^{n} x_i^2}.$$
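As a numerical check (not part of the original slides), the sketch below simulates data with assumed parameter values, using numpy, and verifies the two formulas: the slope is treated as known for (a) and the intercept as known for (b).

```python
# Numerical check of the Question 3 formulas (illustrative only; the data
# and "known" parameter values are made up).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=50)
beta0_true, beta1_true = 3.0, 1.5      # assumed known values
y = beta0_true + beta1_true * x + rng.normal(0.0, 1.0, size=50)

# (a) slope known: LSE of the intercept is the mean of y_i - beta1 * x_i
beta0_hat = np.mean(y - beta1_true * x)

# (b) intercept known: LSE of the slope is sum(x_i (y_i - beta0)) / sum(x_i^2)
beta1_hat = np.sum(x * (y - beta0_true)) / np.sum(x ** 2)

print(beta0_hat, beta1_hat)  # close to 3.0 and 1.5 respectively
```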


Question 4
A researcher wants to investigate whether there is a significant link
between GDP per capita and average life expectancy in major cities.
Data have been collected in 30 major cities, yielding average GDPs per capita $x_1, \ldots, x_{30}$ (in $000s) and average life expectancies $y_1, \ldots, y_{30}$ (in years). The following linear regression model has been proposed:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
where the $\varepsilon_i$ are independent and $N(0, \sigma^2)$. Some summary statistics are
$$\sum_{i=1}^{30} x_i = 620.35, \quad \sum_{i=1}^{30} y_i = 2123.00, \quad \sum_{i=1}^{30} x_i^2 = 13495.62, \quad \sum_{i=1}^{30} x_i y_i = 44585.1, \quad \sum_{i=1}^{30} y_i^2 = 151577.3.$$


(a) Find the least-squares estimates of $\beta_0$ and $\beta_1$ and write down the fitted regression model.

(b) Compute the 95% confidence interval for the slope coefficient $\beta_1$. What can be concluded?

(c) Compute $R^2$. What can be said about how good the model is?

(d) With $x = 30$, find a predictive interval which covers $y$ with probability 0.95. With 97.5% confidence, what minimum average life expectancy can a city expect once its GDP per capita reaches $30,000?


(a) We have
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} = 1.026$$
and
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 49.55.$$
Hence the fitted model is $\hat{y} = 49.55 + 1.026x$.
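The estimates can be reproduced directly from the summary statistics; the sketch below (not part of the original slides) is plain Python arithmetic.

```python
# Reproducing the Question 4(a) estimates from the summary statistics.
n = 30
sum_x, sum_y = 620.35, 2123.00
sum_x2, sum_xy = 13495.62, 44585.1

xbar, ybar = sum_x / n, sum_y / n
beta1_hat = (sum_xy - n * xbar * ybar) / (sum_x2 - n * xbar ** 2)
beta0_hat = ybar - beta1_hat * xbar
print(beta1_hat, beta0_hat)  # approximately 1.026 and 49.56 (49.55 on the slide)
```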

(b) We first need $\mathrm{S.E.}(\hat{\beta}_1)$, for which we need $\hat{\sigma}^2$. For that, we need the Residual SS (from the Total SS and the Regression SS). We compute
$$\text{Total SS} = \sum y_i^2 - n \bar{y}^2 = 1339.67,$$
$$\text{Regression SS} = \hat{\beta}_1^2 \left( \sum x_i^2 - n \bar{x}^2 \right) = 702.99,$$
$$\text{Residual SS} = \text{Total SS} - \text{Regression SS} = 636.68,$$
$$\hat{\sigma}^2 = 636.68/28 = 22.74,$$
$$\mathrm{S.E.}(\hat{\beta}_1) = \left( \frac{\hat{\sigma}^2}{\sum_i x_i^2 - n \bar{x}^2} \right)^{1/2} = 0.184.$$

Hence the 95% confidence interval for $\beta_1$ is
$$\left( \hat{\beta}_1 - t_{0.025,28} \, \mathrm{S.E.}(\hat{\beta}_1),\ \hat{\beta}_1 + t_{0.025,28} \, \mathrm{S.E.}(\hat{\beta}_1) \right) = 1.026 \pm 2.05 \times 0.184 = (0.65, 1.40).$$
The confidence interval does not contain zero, therefore we would reject the hypothesis of $\beta_1$ being zero at the 5% significance level. Therefore, there does appear to be a significant link.
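The interval can be reproduced from the summary statistics; the sketch below (not part of the original slides) assumes scipy is available for the t quantile.

```python
# Reproducing the Question 4(b) confidence interval for the slope.
from scipy.stats import t

n = 30
sum_x, sum_y = 620.35, 2123.00
sum_x2, sum_xy, sum_y2 = 13495.62, 44585.1, 151577.3
xbar, ybar = sum_x / n, sum_y / n

Sxx = sum_x2 - n * xbar ** 2
beta1_hat = (sum_xy - n * xbar * ybar) / Sxx

total_ss = sum_y2 - n * ybar ** 2
residual_ss = total_ss - beta1_hat ** 2 * Sxx
sigma2_hat = residual_ss / (n - 2)
se_beta1 = (sigma2_hat / Sxx) ** 0.5

tcrit = t.ppf(0.975, n - 2)   # approximately 2.05
print(beta1_hat - tcrit * se_beta1, beta1_hat + tcrit * se_beta1)  # about (0.65, 1.40)
```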


(c)
$$R^2 = \frac{\text{Regression SS}}{\text{Total SS}} = \frac{702.99}{1339.67} = 0.52.$$
The model can explain 52% of the variation in $y$. Whether or not the model is good is subjective. It is not necessarily bad, although we may be able to determine a better model with better explanatory power, possibly using multiple linear regression.


(d) The predictive interval has the form
$$\hat{\beta}_0 + \hat{\beta}_1 x \pm t_{0.025,n-2} \, \hat{\sigma} \left( 1 + \frac{\sum_i x_i^2 - 2x \sum_i x_i + n x^2}{n \left( \sum_i x_i^2 - n \bar{x}^2 \right)} \right)^{1/2} = (69.79, 90.87).$$
Therefore we can be 97.5% confident that the average life expectancy lies above 69.79 years once GDP per capita reaches $30,000.
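The predictive interval can be checked numerically; again a sketch (not part of the original slides), with scipy assumed for the t quantile.

```python
# Reproducing the Question 4(d) predictive interval at x = 30.
from scipy.stats import t

n, x_new = 30, 30.0
sum_x, sum_y = 620.35, 2123.00
sum_x2, sum_xy, sum_y2 = 13495.62, 44585.1, 151577.3
xbar, ybar = sum_x / n, sum_y / n

Sxx = sum_x2 - n * xbar ** 2
beta1_hat = (sum_xy - n * xbar * ybar) / Sxx
beta0_hat = ybar - beta1_hat * xbar

residual_ss = (sum_y2 - n * ybar ** 2) - beta1_hat ** 2 * Sxx
sigma_hat = (residual_ss / (n - 2)) ** 0.5

# 1 + 1/n + (x - xbar)^2 / Sxx, written as on the slide
factor = 1 + (sum_x2 - 2 * x_new * sum_x + n * x_new ** 2) / (n * Sxx)
half_width = t.ppf(0.975, n - 2) * sigma_hat * factor ** 0.5

centre = beta0_hat + beta1_hat * x_new
print(centre - half_width, centre + half_width)  # approximately (69.79, 90.87)
```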


Question 5
The following is partial Minitab regression output:
The regression equation is
y = 2.1071 + 1.1263x

Predictor    Coef     SE Coef
Constant     2.1071   0.2321
x            1.1263   0.0911

Analysis of Variance
SOURCE            DF    SS
Regression         1    2011.12
Residual Error    40    539.17

In addition, $\bar{x} = 1.56$.

(a) Find an estimate for the error variance, $\sigma^2$.

(b) Calculate the regression correlation coefficient $R$ and the adjusted correlation coefficient.

(c) Test at the 5% significance level whether the slope in the regression model is equal to 1 or not.

(d) For $x = 0.8$, find a 95% confidence interval for the expectation of $y$.

(a) Noting $n = 40 + 1 + 1 = 42$,
$$\hat{\sigma}^2 = \frac{\text{Residual SS}}{n - 2} = \frac{539.17}{40} = 13.479.$$

(b) Total SS = Regression SS + Residual SS = 2550.29.
$$R = \left( \frac{\text{Regression SS}}{\text{Total SS}} \right)^{1/2} = \left( \frac{2011.12}{2550.29} \right)^{1/2} = 0.888,$$
$$R_{\text{adj}} = \left( 1 - \frac{\text{(Residual SS)}/(n-2)}{\text{(Total SS)}/(n-1)} \right)^{1/2} = \left( 1 - \frac{539.17/40}{2550.29/41} \right)^{1/2} = 0.885.$$
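These quantities follow directly from the ANOVA table; below is a minimal sketch in plain Python (not part of the original slides).

```python
# Reproducing the Question 5(a)-(b) quantities from the Minitab output.
regression_ss, residual_ss = 2011.12, 539.17
regression_df, residual_df = 1, 40
n = residual_df + regression_df + 1            # 42

sigma2_hat = residual_ss / (n - 2)             # 13.479
total_ss = regression_ss + residual_ss         # 2550.29

R = (regression_ss / total_ss) ** 0.5                                 # 0.888
R_adj = (1 - (residual_ss / (n - 2)) / (total_ss / (n - 1))) ** 0.5   # 0.885
print(sigma2_hat, R, R_adj)
```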

(c) The test statistic is
$$T = \frac{\hat{\beta}_1 - 1}{\mathrm{S.E.}(\hat{\beta}_1)} \sim t_{n-2} = t_{40}$$
under $H_0: \beta_1 = 1$. We reject $H_0$ if $|t| > 2.021 = t_{0.025,40}$. As $t = 0.1263/0.0911 = 1.386$, we cannot reject the null hypothesis that $\beta_1 = 1$.
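A sketch of the test (not part of the original slides), with scipy assumed for the critical value.

```python
# Reproducing the Question 5(c) test of H0: beta1 = 1.
from scipy.stats import t

beta1_hat, se_beta1, n = 1.1263, 0.0911, 42
t_stat = (beta1_hat - 1) / se_beta1   # approximately 1.386
t_crit = t.ppf(0.975, n - 2)          # approximately 2.021
print(abs(t_stat) > t_crit)           # False: cannot reject H0
```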

(d) First,
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{\text{Regression SS}}{\hat{\beta}_1^2} = \frac{2011.12}{(1.1263)^2} = 1585.367.$$
Also,
$$\sum_{i=1}^{n} (x_i - x)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - x)^2 = 1585.367 + 42 \times (1.56 - 0.8)^2 = 1609.626.$$
Hence the 95% confidence interval for $\mathrm{E}(y)$ given $x = 0.8$ is
$$\hat{\beta}_0 + \hat{\beta}_1 x \pm t_{0.025,n-2} \left( \frac{\hat{\sigma}^2 \sum_{i=1}^{n} (x_i - x)^2}{n \sum_{j=1}^{n} (x_j - \bar{x})^2} \right)^{1/2} = 2.1071 + 1.1263 \times 0.8 \pm 2.021 \sqrt{\frac{13.479 \times 1609.626}{42 \times 1585.367}} = 3.0081 \pm 1.1536 = (1.854, 4.162).$$
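The interval can be reproduced as follows; a sketch (not part of the original slides), with scipy assumed for the t quantile.

```python
# Reproducing the Question 5(d) confidence interval for E(y) at x = 0.8.
from scipy.stats import t

beta0_hat, beta1_hat = 2.1071, 1.1263
regression_ss, residual_ss = 2011.12, 539.17
n, xbar, x_new = 42, 1.56, 0.8

sigma2_hat = residual_ss / (n - 2)              # 13.479
Sxx = regression_ss / beta1_hat ** 2            # 1585.367
sum_sq_about_x = Sxx + n * (xbar - x_new) ** 2  # 1609.626

centre = beta0_hat + beta1_hat * x_new
half_width = t.ppf(0.975, n - 2) * (sigma2_hat * sum_sq_about_x / (n * Sxx)) ** 0.5
print(centre - half_width, centre + half_width)  # approximately (1.854, 4.162)
```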
