
ST102

Elementary Statistical Theory


Example workshop: Linear regression
Dr James Abdey
Department of Statistics
London School of Economics and Political Science


Question 1
Suppose that you are given observations $y_1$, $y_2$, $y_3$ and $y_4$ such that
$$y_1 = \alpha + \beta + \varepsilon_1, \quad y_2 = -\alpha + \beta + \varepsilon_2, \quad y_3 = \alpha - \beta + \varepsilon_3, \quad y_4 = -\alpha - \beta + \varepsilon_4.$$
The variables $\varepsilon_i$, $i = 1, 2, 3, 4$, are independent and normally distributed with mean 0 and variance $\sigma^2$. Find the least squares estimators for the parameters $\alpha$ and $\beta$, verify that they are unbiased and find the variance of the estimator for the parameter $\alpha$.

We start off with the sum of squares function, S:
$$S = \sum_{i=1}^{4} \varepsilon_i^2 = (y_1 - \alpha - \beta)^2 + (y_2 + \alpha - \beta)^2 + (y_3 - \alpha + \beta)^2 + (y_4 + \alpha + \beta)^2.$$

Now take partial derivatives:


$$\frac{\partial S}{\partial \alpha} = -2(y_1 - \alpha - \beta) + 2(y_2 + \alpha - \beta) - 2(y_3 - \alpha + \beta) + 2(y_4 + \alpha + \beta) = -2(y_1 - y_2 + y_3 - y_4) + 8\alpha,$$
$$\frac{\partial S}{\partial \beta} = -2(y_1 - \alpha - \beta) - 2(y_2 + \alpha - \beta) + 2(y_3 - \alpha + \beta) + 2(y_4 + \alpha + \beta) = -2(y_1 + y_2 - y_3 - y_4) + 8\beta.$$


The least squares estimators $\hat{\alpha}$ and $\hat{\beta}$ are the solutions to $\partial S/\partial \alpha = 0$ and $\partial S/\partial \beta = 0$. Hence
$$\hat{\alpha} = \frac{y_1 - y_2 + y_3 - y_4}{4} \quad \text{and} \quad \hat{\beta} = \frac{y_1 + y_2 - y_3 - y_4}{4}.$$


$\hat{\alpha}$ is unbiased since
$$\mathrm{E}(\hat{\alpha}) = \mathrm{E}\left(\frac{y_1 - y_2 + y_3 - y_4}{4}\right) = \frac{(\alpha + \beta) - (-\alpha + \beta) + (\alpha - \beta) - (-\alpha - \beta)}{4} = \frac{4\alpha}{4} = \alpha.$$
$\hat{\beta}$ is unbiased since
$$\mathrm{E}(\hat{\beta}) = \mathrm{E}\left(\frac{y_1 + y_2 - y_3 - y_4}{4}\right) = \frac{(\alpha + \beta) + (-\alpha + \beta) - (\alpha - \beta) - (-\alpha - \beta)}{4} = \frac{4\beta}{4} = \beta.$$
Finally,

$$\mathrm{Var}(\hat{\alpha}) = \mathrm{Var}\left(\frac{y_1 - y_2 + y_3 - y_4}{4}\right) = \frac{4\sigma^2}{16} = \frac{\sigma^2}{4}.$$
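As a quick numerical check (not part of the original slides), the sketch below simulates one sample from the Question 1 model with arbitrarily chosen values of $\alpha$, $\beta$ and $\sigma$, assuming numpy is available, and compares the closed-form estimators with a generic least squares fit based on the design matrix implied by the four equations.

```python
# Minimal numerical check of the Question 1 estimators (illustrative only;
# the parameter values below are made up).
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 2.0, -1.0, 0.5   # assumed values, not from the slides

# Design matrix implied by y1 = a+b, y2 = -a+b, y3 = a-b, y4 = -a-b
X = np.array([[ 1.0,  1.0],
              [-1.0,  1.0],
              [ 1.0, -1.0],
              [-1.0, -1.0]])
y = X @ np.array([alpha, beta]) + rng.normal(0.0, sigma, size=4)

# Closed-form least squares estimators derived above
alpha_hat = (y[0] - y[1] + y[2] - y[3]) / 4
beta_hat = (y[0] + y[1] - y[2] - y[3]) / 4

# Generic least squares solution for comparison
est, *_ = np.linalg.lstsq(X, y, rcond=None)
print(alpha_hat, beta_hat)  # agrees with est up to floating-point error
print(est)
```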


Question 2

The following linear regression model is proposed to represent the linear relationship between two variables, $y$ and $x$:
$$y_i = \beta x_i + \varepsilon_i$$
for $i = 1, \ldots, n$, where the $\varepsilon_i$ are independent and identically distributed random variables with mean 0 and variance $\sigma^2$, and $\beta$ is an unknown parameter to be estimated.

(a) A proposed estimator for $\beta$ is
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - \beta x_i)^2.$$
Explain why this estimator is sensible.

(b) Another proposed estimator for $\beta$ is
$$\tilde{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} |y_i - \beta x_i|.$$
Explain why $\hat{\beta}$ would be preferred to $\tilde{\beta}$.


(c) Express $\hat{\beta}$ explicitly as a function of the $y_i$ and $x_i$ only.

(d) Using the estimator $\hat{\beta}$:
i. What is the value of $\hat{\beta}$ if $y_i = x_i$ for all $i$? What if they are the exact opposites of each other, i.e. $y_i = -x_i$ for all $i$?
ii. Is it always the case that $-1 \leq \hat{\beta} \leq 1$?


(a) The estimator $\hat{\beta}$ is sensible because it is the least squares estimator, which provides the best fit to the data in terms of minimising the sum of squared residuals.

(b) The estimator $\hat{\beta}$ is preferred to $\tilde{\beta}$ because $\tilde{\beta}$ is the least absolute deviations estimator, which is also an option, but unlike $\hat{\beta}$ it cannot be computed explicitly via differentiation, as the function $f(x) = |x|$ is not differentiable at zero. Therefore, $\tilde{\beta}$ is harder to compute than $\hat{\beta}$.

(c) We need to minimise a convex quadratic, so we can do that by differentiating it and equating the derivative to zero. We obtain
$$-2 \sum_{i=1}^{n} (y_i - \hat{\beta} x_i) x_i = 0,$$
which yields
$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$$

(d)
i. If $y_i = x_i$ for all $i$, then $\hat{\beta} = 1$; if $y_i = -x_i$ for all $i$, then $\hat{\beta} = -1$.
ii. Not true. A counterexample is to take $n = 1$, $x_1 = 1$ and $y_1 = 2$, which gives $\hat{\beta} = 2$.
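As an illustration (not part of the original slides), the sketch below implements the estimator from (c) with numpy on made-up data and confirms the values in (d)i and the counterexample in (d)ii.

```python
# The no-intercept least squares estimator from part (c) (illustrative only).
import numpy as np

def beta_hat(x, y):
    """Least squares slope for the model y_i = beta * x_i + epsilon_i."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(x * y) / np.sum(x ** 2)

x = np.array([1.0, 2.0, 3.0])          # made-up data
print(beta_hat(x, x))                  # 1.0, as in (d)i
print(beta_hat(x, -x))                 # -1.0, as in (d)i
print(beta_hat([1.0], [2.0]))          # 2.0, the counterexample in (d)ii
```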
Question 3

Let $\{(x_i, y_i),\ 1 \leq i \leq n\}$ be observations from the linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.$$

(a) Suppose that the slope, $\beta_1$, is known. Find the LSE for the intercept, $\beta_0$.

(b) Suppose that the intercept, $\beta_0$, is known. Find the LSE for the slope, $\beta_1$.


(a) When $\beta_1$ is known, let $z_i = y_i - \beta_1 x_i$. The model then reduces to $z_i = \beta_0 + \varepsilon_i$. The LSE $\hat{\beta}_0$ minimises $\sum_{i=1}^{n} (z_i - \beta_0)^2$, hence
$$\hat{\beta}_0 = \bar{z} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \beta_1 x_i).$$

(b) When $\beta_0$ is known, we may write $z_i = y_i - \beta_0$. The model is reduced to $z_i = \beta_1 x_i + \varepsilon_i$. Note that
$$\sum_{i=1}^{n} (z_i - \beta_1 x_i)^2 = \sum_{i=1}^{n} \left( z_i - \hat{\beta}_1 x_i + (\hat{\beta}_1 - \beta_1) x_i \right)^2 = \sum_{i=1}^{n} (z_i - \hat{\beta}_1 x_i)^2 + (\hat{\beta}_1 - \beta_1)^2 \sum_{i=1}^{n} x_i^2 + 2D,$$
where $D = (\hat{\beta}_1 - \beta_1) \sum_{i=1}^{n} x_i (z_i - \hat{\beta}_1 x_i)$. If we choose $\hat{\beta}_1$ such that
$$\sum_{i=1}^{n} x_i (z_i - \hat{\beta}_1 x_i) = 0, \quad \text{i.e.} \quad \sum_{i=1}^{n} x_i z_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0,$$
then
$$\sum_{i=1}^{n} (z_i - \beta_1 x_i)^2 = \sum_{i=1}^{n} (z_i - \hat{\beta}_1 x_i)^2 + (\hat{\beta}_1 - \beta_1)^2 \sum_{i=1}^{n} x_i^2 \geq \sum_{i=1}^{n} (z_i - \hat{\beta}_1 x_i)^2,$$
thus $\hat{\beta}_1$ is the LSE. Note now
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i z_i}{\sum_{i=1}^{n} x_i^2} = \frac{\sum_{i=1}^{n} x_i (y_i - \beta_0)}{\sum_{i=1}^{n} x_i^2}.$$
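As a numerical check (not part of the original slides), the sketch below simulates data with assumed parameter values, using numpy, and verifies the two formulas: the slope is treated as known for (a) and the intercept as known for (b).

```python
# Numerical check of the Question 3 formulas (illustrative only; the data
# and "known" parameter values are made up).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=50)
beta0_true, beta1_true = 3.0, 1.5      # assumed known values
y = beta0_true + beta1_true * x + rng.normal(0.0, 1.0, size=50)

# (a) slope known: LSE of the intercept is the mean of y_i - beta1 * x_i
beta0_hat = np.mean(y - beta1_true * x)

# (b) intercept known: LSE of the slope is sum(x_i (y_i - beta0)) / sum(x_i^2)
beta1_hat = np.sum(x * (y - beta0_true)) / np.sum(x ** 2)

print(beta0_hat, beta1_hat)  # close to 3.0 and 1.5 respectively
```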


Question 4
A researcher wants to investigate whether there is a significant link
between GDP per capita and average life expectancy in major cities.
Data have been collected in 30 major cities, yielding average GDPs per capita $x_1, \ldots, x_{30}$ (in $000s) and average life expectancies $y_1, \ldots, y_{30}$ (in years). The following linear regression model has been proposed:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
where the $\varepsilon_i$ are independent and $N(0, \sigma^2)$. Some summary statistics are
$$\sum_{i=1}^{30} x_i = 620.35, \quad \sum_{i=1}^{30} y_i = 2123.00, \quad \sum_{i=1}^{30} x_i^2 = 13495.62, \quad \sum_{i=1}^{30} x_i y_i = 44585.1, \quad \sum_{i=1}^{30} y_i^2 = 151577.3.$$


(a) Find the least-squares estimates of $\beta_0$ and $\beta_1$ and write down the fitted regression model.

(b) Compute the 95% confidence interval for the slope coefficient $\beta_1$. What can be concluded?

(c) Compute $R^2$. What can be said about how good the model is?

(d) With $x = 30$, find a predictive interval which covers $y$ with probability 0.95. With 97.5% confidence, what minimum average life expectancy can a city expect once its GDP per capita reaches $30,000?


(a) We have
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} = 1.026$$
and
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 49.55.$$
Hence the fitted model is $\hat{y} = 49.55 + 1.026x$.
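The estimates can be reproduced directly from the summary statistics; the sketch below (not part of the original slides) is plain Python arithmetic.

```python
# Reproducing the Question 4(a) estimates from the summary statistics.
n = 30
sum_x, sum_y = 620.35, 2123.00
sum_x2, sum_xy = 13495.62, 44585.1

xbar, ybar = sum_x / n, sum_y / n
beta1_hat = (sum_xy - n * xbar * ybar) / (sum_x2 - n * xbar ** 2)
beta0_hat = ybar - beta1_hat * xbar
print(beta1_hat, beta0_hat)  # approximately 1.026 and 49.56 (49.55 on the slide)
```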

(b) We first need $\mathrm{S.E.}(\hat{\beta}_1)$, for which we need $\hat{\sigma}^2$. For that, we need the Residual SS (from the Total SS and the Regression SS). We compute
$$\text{Total SS} = \sum y_i^2 - n \bar{y}^2 = 1339.67,$$
$$\text{Regression SS} = \hat{\beta}_1^2 \left( \sum x_i^2 - n \bar{x}^2 \right) = 702.99,$$
$$\text{Residual SS} = \text{Total SS} - \text{Regression SS} = 636.68,$$
$$\hat{\sigma}^2 = 636.68/28 = 22.74,$$
$$\mathrm{S.E.}(\hat{\beta}_1) = \left( \frac{\hat{\sigma}^2}{\sum_i x_i^2 - n \bar{x}^2} \right)^{1/2} = 0.184.$$

Hence the 95% confidence interval for $\beta_1$ is
$$\left( \hat{\beta}_1 - t_{0.025,28} \, \mathrm{S.E.}(\hat{\beta}_1),\ \hat{\beta}_1 + t_{0.025,28} \, \mathrm{S.E.}(\hat{\beta}_1) \right) = 1.026 \pm 2.05 \times 0.184 = (0.65, 1.40).$$
The confidence interval does not contain zero, therefore we would reject the hypothesis of $\beta_1$ being zero at the 5% significance level. Therefore, there does appear to be a significant link.
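The interval can be reproduced from the summary statistics; the sketch below (not part of the original slides) assumes scipy is available for the t quantile.

```python
# Reproducing the Question 4(b) confidence interval for the slope.
from scipy.stats import t

n = 30
sum_x, sum_y = 620.35, 2123.00
sum_x2, sum_xy, sum_y2 = 13495.62, 44585.1, 151577.3
xbar, ybar = sum_x / n, sum_y / n

Sxx = sum_x2 - n * xbar ** 2
beta1_hat = (sum_xy - n * xbar * ybar) / Sxx

total_ss = sum_y2 - n * ybar ** 2
residual_ss = total_ss - beta1_hat ** 2 * Sxx
sigma2_hat = residual_ss / (n - 2)
se_beta1 = (sigma2_hat / Sxx) ** 0.5

tcrit = t.ppf(0.975, n - 2)   # approximately 2.05
print(beta1_hat - tcrit * se_beta1, beta1_hat + tcrit * se_beta1)  # about (0.65, 1.40)
```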


(c)
$$R^2 = \frac{\text{Regression SS}}{\text{Total SS}} = \frac{702.99}{1339.67} = 0.52.$$
The model can explain 52% of the variation in $y$. Whether or not the model is good is subjective. It is not necessarily bad, although we may be able to determine a better model with better explanatory power, possibly using multiple linear regression.


(d) The predictive interval has the form
$$\hat{\beta}_0 + \hat{\beta}_1 x \pm t_{0.025,n-2} \, \hat{\sigma} \left( 1 + \frac{\sum_i x_i^2 - 2x \sum_i x_i + n x^2}{n \left( \sum_i x_i^2 - n \bar{x}^2 \right)} \right)^{1/2} = (69.79, 90.87).$$
Therefore we can be 97.5% confident that the average life expectancy lies above 69.79 years once GDP per capita reaches $30,000.
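The predictive interval can be checked numerically; again a sketch (not part of the original slides), with scipy assumed for the t quantile.

```python
# Reproducing the Question 4(d) predictive interval at x = 30.
from scipy.stats import t

n, x_new = 30, 30.0
sum_x, sum_y = 620.35, 2123.00
sum_x2, sum_xy, sum_y2 = 13495.62, 44585.1, 151577.3
xbar, ybar = sum_x / n, sum_y / n

Sxx = sum_x2 - n * xbar ** 2
beta1_hat = (sum_xy - n * xbar * ybar) / Sxx
beta0_hat = ybar - beta1_hat * xbar

residual_ss = (sum_y2 - n * ybar ** 2) - beta1_hat ** 2 * Sxx
sigma_hat = (residual_ss / (n - 2)) ** 0.5

# 1 + 1/n + (x - xbar)^2 / Sxx, written as on the slide
factor = 1 + (sum_x2 - 2 * x_new * sum_x + n * x_new ** 2) / (n * Sxx)
half_width = t.ppf(0.975, n - 2) * sigma_hat * factor ** 0.5

centre = beta0_hat + beta1_hat * x_new
print(centre - half_width, centre + half_width)  # approximately (69.79, 90.87)
```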


Question 5
The following is partial Minitab regression output:
The regression equation is
y = 2.1071 + 1.1263x

Predictor    Coef     SE Coef
Constant     2.1071   0.2321
x            1.1263   0.0911

Analysis of Variance
SOURCE            DF    SS
Regression         1    2011.12
Residual Error    40    539.17

In addition, $\bar{x} = 1.56$.

(a) Find an estimate for the error variance, $\sigma^2$.

(b) Calculate the regression correlation coefficient $R$ and the adjusted correlation coefficient.

(c) Test at the 5% significance level whether the slope in the regression model is equal to 1 or not.

(d) For $x = 0.8$, find a 95% confidence interval for the expectation of $y$.

(a) Noting $n = 40 + 1 + 1 = 42$,
$$\hat{\sigma}^2 = \frac{\text{Residual SS}}{n - 2} = \frac{539.17}{40} = 13.479.$$

(b) Total SS = Regression SS + Residual SS = 2550.29.
$$R = \left( \frac{\text{Regression SS}}{\text{Total SS}} \right)^{1/2} = \left( \frac{2011.12}{2550.29} \right)^{1/2} = 0.888,$$
$$R_{\text{adj}} = \left( 1 - \frac{\text{(Residual SS)}/(n-2)}{\text{(Total SS)}/(n-1)} \right)^{1/2} = \left( 1 - \frac{539.17/40}{2550.29/41} \right)^{1/2} = 0.885.$$
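These quantities follow directly from the ANOVA table; below is a minimal sketch in plain Python (not part of the original slides).

```python
# Reproducing the Question 5(a)-(b) quantities from the Minitab output.
regression_ss, residual_ss = 2011.12, 539.17
regression_df, residual_df = 1, 40
n = residual_df + regression_df + 1            # 42

sigma2_hat = residual_ss / (n - 2)             # 13.479
total_ss = regression_ss + residual_ss         # 2550.29

R = (regression_ss / total_ss) ** 0.5                                 # 0.888
R_adj = (1 - (residual_ss / (n - 2)) / (total_ss / (n - 1))) ** 0.5   # 0.885
print(sigma2_hat, R, R_adj)
```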

(c) The test statistic is
$$T = \frac{\hat{\beta}_1 - 1}{\mathrm{S.E.}(\hat{\beta}_1)} \sim t_{n-2} = t_{40}$$
under $H_0: \beta_1 = 1$. We reject $H_0$ if $|t| > 2.021 = t_{0.025,40}$. As $t = 0.1263/0.0911 = 1.386$, we cannot reject the null hypothesis that $\beta_1 = 1$.
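A sketch of the test (not part of the original slides), with scipy assumed for the critical value.

```python
# Reproducing the Question 5(c) test of H0: beta1 = 1.
from scipy.stats import t

beta1_hat, se_beta1, n = 1.1263, 0.0911, 42
t_stat = (beta1_hat - 1) / se_beta1   # approximately 1.386
t_crit = t.ppf(0.975, n - 2)          # approximately 2.021
print(abs(t_stat) > t_crit)           # False: cannot reject H0
```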

(d) First,
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{\text{Regression SS}}{\hat{\beta}_1^2} = \frac{2011.12}{(1.1263)^2} = 1585.367.$$
Also,
$$\sum_{i=1}^{n} (x_i - x)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - x)^2 = 1585.367 + 42 \times (1.56 - 0.8)^2 = 1609.626.$$
Hence the 95% confidence interval for $\mathrm{E}(y)$ given $x = 0.8$ is
$$\hat{\beta}_0 + \hat{\beta}_1 x \pm t_{0.025,n-2} \left( \frac{\hat{\sigma}^2 \sum_{i=1}^{n} (x_i - x)^2}{n \sum_{j=1}^{n} (x_j - \bar{x})^2} \right)^{1/2} = 2.1071 + 1.1263 \times 0.8 \pm 2.021 \sqrt{\frac{13.479 \times 1609.626}{42 \times 1585.367}} = 3.0081 \pm 1.1536 = (1.854, 4.162).$$
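The interval can be reproduced as follows; a sketch (not part of the original slides), with scipy assumed for the t quantile.

```python
# Reproducing the Question 5(d) confidence interval for E(y) at x = 0.8.
from scipy.stats import t

beta0_hat, beta1_hat = 2.1071, 1.1263
regression_ss, residual_ss = 2011.12, 539.17
n, xbar, x_new = 42, 1.56, 0.8

sigma2_hat = residual_ss / (n - 2)              # 13.479
Sxx = regression_ss / beta1_hat ** 2            # 1585.367
sum_sq_about_x = Sxx + n * (xbar - x_new) ** 2  # 1609.626

centre = beta0_hat + beta1_hat * x_new
half_width = t.ppf(0.975, n - 2) * (sigma2_hat * sum_sq_about_x / (n * Sxx)) ** 0.5
print(centre - half_width, centre + half_width)  # approximately (1.854, 4.162)
```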
