Regression: Introduction
AS202 Applied Statistical Models
Two kinds of relationship between variables: a numerical relationship and a causal relationship.

Functional relation (deterministic):
$Y = f(X) = a + bX$
$Y = a + bX + cZ$

Regression model (includes a random error term):
$Y = a + bX + \varepsilon$
$Y = a + b_1 X_1 + b_2 X_2 + \varepsilon$
Types of relationship:
1. Linear relationship ("linear" refers to the parameters, i.e. the slope, not to the regressor)
   Simple (one regressor): $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
   Multiple (k regressors): $y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon_i$
[Figure: scatter plot with a fitted straight line; vertical axis Price (RM1000).]
   Polynomial model (still linear in the parameters):
   $y_i = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon_i$
   Model with interaction terms:
   $y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1 x_2 x_3 + \varepsilon_i$
2. Nonlinear relationship (nonlinear in the parameters), for example
   $y_i = \beta_0 e^{\beta_1 x_i} + \varepsilon_i$ or $y_i = \dfrac{1}{1 + e^{\beta_0 + \beta_1 x_i}} + \varepsilon_i$
Some applications:
1. An economist wants to investigate the relationship between the petrol price and the inflation rate.
2. A sales manager wants to predict next year's total sales based on the number of staff and the floor space (in square feet) of the store.
3. A policy maker wants to identify the main factors (e.g. speed limit, road condition, weather) that contribute to the number of road accidents.
4. A scientist wants to know the level of sound pollution at which human health is affected.
5. A computer scientist wants to compress an image for minimum storage.
Uses of regression analysis: system control, data description, forecasting, data reduction, and parameter estimation.
Simple linear regression: the model
Example:
You want to know whether there is a relationship between the monthly personal income and the age of a worker, and then to forecast your monthly income when you are 50 years old. There are five workers in your study.

The setup:
1. Age of worker (years): independent variable (X)
2. Monthly personal income (RM): dependent variable (Y)
3. Sample size: n = 5 workers
9
Age
34
2950
45
4000
29
2430
32
3000
23
1790
10
[Figure: scatter plot of income (Y) against age (X) with the fitted regression line; the forecast income at age 50 is RM4565.87.]
The gap between each point and the line is the error of the model.
The simple linear regression (SLR) model is
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$,  i = 1, 2, …, n   (Eq 1.1)
where $y_i$ is the response, $x_i$ the regressor, $\beta_0$ and $\beta_1$ the unknown intercept and slope, and $\varepsilon_i$ the random error.
Assumptions:
1) The error term $\varepsilon_i$ is normally distributed with mean $E(\varepsilon_i) = 0$ and constant variance $\mathrm{Var}(\varepsilon_i) = \sigma^2$;
2) The errors are uncorrelated: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
In short, $\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$.
This implies that the dependent variable Y follows a normal distribution with
$E(y \mid x) = \beta_0 + \beta_1 x$ and $\mathrm{Var}(y \mid x) = \sigma^2$.
Some properties:
[Figure: regression line $E(y \mid x) = \beta_0 + \beta_1 x$ with normal curves drawn at x = 23 and x = 45.]
The distribution of y at x = 23 has mean $E(y) = \beta_0 + \beta_1(23)$ and standard deviation $\sigma$; the distribution of y at x = 45 has mean $E(y) = \beta_0 + \beta_1(45)$ and the same standard deviation $\sigma$.
Simple linear regression: least squares estimation

The least squares estimates minimize the sum of squared errors
$S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$,  i = 1, 2, …, n.
Setting the partial derivatives with respect to $\beta_0$ and $\beta_1$ to zero gives the normal equations, whose solution is
$\hat\beta_1 = \dfrac{S_{xy}}{S_{xx}}$,  $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$,
where $S_{xy} = \sum x_i y_i - (\sum x_i)(\sum y_i)/n$ and $S_{xx} = \sum x_i^2 - (\sum x_i)^2/n$.
Age (x)   Income (y)   xy        x^2
34        2950         100300    1156
45        4000         180000    2025
29        2430         70470     841
32        3000         96000     1024
23        1790         41170     529
Sum: 163  14170        487940    5575

That is, $\sum x_i = 163$, $\sum y_i = 14170$, $\bar{x} = 32.6$, $\bar{y} = 2834$, $\sum x_i y_i = 487940$, $\sum x_i^2 = 5575$, so
$S_{xx} = 5575 - 163^2/5 = 261.2$ and $S_{xy} = 487940 - (163)(14170)/5 = 25998$,
giving $\hat\beta_1 = 25998/261.2 = 99.53$ and $\hat\beta_0 = 2834 - 99.53(32.6) = -410.77$.
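The hand calculation above can be checked in a few lines of Python (a minimal sketch using the slide's data; the variable names are my own):

```python
# Worker data from the slides: age (x) and monthly income in RM (y).
x = [34, 45, 29, 32, 23]
y = [2950, 4000, 2430, 3000, 1790]
n = len(x)

# Corrected sums of squares and cross-products.
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n                  # 5575 - 163^2/5 = 261.2
Sxy = sum(xi*yi for xi, yi in zip(x, y)) - sum(x)*sum(y) / n  # 487940 - 163*14170/5 = 25998
b1 = Sxy / Sxx                    # slope estimate, about 99.53
b0 = sum(y)/n - b1 * sum(x)/n     # intercept estimate, about -410.77
print(round(b1, 2), round(b0, 2))
```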
Properties of LSE:
1. The LSEs are linear combinations of the observations $y_i$:
$\hat\beta_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) y_i = \sum_{i=1}^{n} c_i y_i$, where $c_i = (x_i - \bar{x})/S_{xx}$;
$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = \sum_{i=1}^{n} \left( \dfrac{1}{n} - \bar{x} c_i \right) y_i$.
2. The LSEs are unbiased: $E(\hat\beta_1) = \beta_1$ and $E(\hat\beta_0) = \beta_0$.
3. Their variances are
$\mathrm{Var}(\hat\beta_1) = \dfrac{\sigma^2}{S_{xx}}$,  $\mathrm{Var}(\hat\beta_0) = \mathrm{Var}(\bar{y} - \hat\beta_1 \bar{x}) = \sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} \right)$.
Summary
1. The SLR model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$,  i = 1, 2, …, n.
Simple linear regression: forecasting using SLR
Forecast at age 50: $\hat{y} = -410.77 + 99.53(50) = \text{RM}4565.87$. This is extrapolation, since x = 50 lies outside the observed range of ages (23 to 45).
Forecast at age 30: $\hat{y} = \text{RM}2575.21$. This is interpolation, since x = 30 lies inside the observed range.
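A small sketch of the forecasting step, recomputing the coefficients from the data and flagging extrapolation when the new age falls outside the observed range (the helper `forecast` is my own, not from the slides):

```python
# Fit the SLR model y-hat = b0 + b1*x from the worker data.
x = [34, 45, 29, 32, 23]
y = [2950, 4000, 2430, 3000, 1790]
n = len(x)
Sxx = sum(v**2 for v in x) - sum(x)**2 / n
Sxy = sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y) / n
b1 = Sxy / Sxx
b0 = sum(y)/n - b1 * sum(x)/n

def forecast(x0):
    """Point forecast, flagging extrapolation outside the observed x range."""
    kind = "interpolation" if min(x) <= x0 <= max(x) else "extrapolation"
    return b0 + b1 * x0, kind

print(forecast(50))   # about RM4565.87, extrapolation
print(forecast(30))   # about RM2575.21, interpolation
```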
The residuals are $e_i = y_i - \hat{y}_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$.
Properties of the residuals:
$\sum_{i=1}^{n} e_i = 0$;  $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i$;  $\sum_{i=1}^{n} x_i e_i = 0$;  $\sum_{i=1}^{n} \hat{y}_i e_i = 0$.
Simple linear regression: interval estimation
Estimation of the variance $\sigma^2$
Method 1: based on several observations (replication) on y for at least one value of x.
Method 2: when prior information concerning $\sigma^2$ is available.
Method 3: estimate based on the residual (error) sum of squares:
$SS_{Res} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$,  $MS_{Res} = \dfrac{SS_{Res}}{n-2}$.
For the income data the fitted line is $\hat{y} = -410.77 + 99.53x$, so
$SS_{Res} = \sum_{i=1}^{5} e_i^2 = 66063.02$ and $MS_{Res} = \dfrac{66063.02}{5-2} = 22021.01$.
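The residual sum of squares and its mean square can be reproduced directly from the fitted line (a sketch; the coefficients are recomputed rather than hard-coded):

```python
x = [34, 45, 29, 32, 23]
y = [2950, 4000, 2430, 3000, 1790]
n = len(x)
Sxx = sum(v**2 for v in x) - sum(x)**2 / n
Sxy = sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y) / n
b1 = Sxy / Sxx
b0 = sum(y)/n - b1 * sum(x)/n

# Residual sum of squares and its mean square (n - 2 degrees of freedom).
residuals = [yi - (b0 + b1*xi) for xi, yi in zip(x, y)]
SS_res = sum(e**2 for e in residuals)
MS_res = SS_res / (n - 2)
print(round(SS_res, 2), round(MS_res, 2))   # about 66063.02 and 22021.01
```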
Confidence intervals for the coefficients:
$\hat\beta_1 - t_{\alpha/2,\,n-2}\, se(\hat\beta_1) \le \beta_1 \le \hat\beta_1 + t_{\alpha/2,\,n-2}\, se(\hat\beta_1)$
$\hat\beta_0 - t_{\alpha/2,\,n-2}\, se(\hat\beta_0) \le \beta_0 \le \hat\beta_0 + t_{\alpha/2,\,n-2}\, se(\hat\beta_0)$
Confidence interval for $\sigma^2$:
$\dfrac{(n-2)\, MS_{Res}}{\chi^2_{\alpha/2,\,n-2}} \le \sigma^2 \le \dfrac{(n-2)\, MS_{Res}}{\chi^2_{1-\alpha/2,\,n-2}}$
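For the income data, a 95% confidence interval for the slope can be sketched as follows; the critical value $t_{0.025,3} = 3.1824$ is taken from a t table (`scipy.stats.t.ppf(0.975, 3)` would give the same value):

```python
import math

x = [34, 45, 29, 32, 23]
y = [2950, 4000, 2430, 3000, 1790]
n = len(x)
Sxx = sum(v**2 for v in x) - sum(x)**2 / n
b1 = (sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y)/n) / Sxx
b0 = sum(y)/n - b1 * sum(x)/n
MS_res = sum((yi - b0 - b1*xi)**2 for xi, yi in zip(x, y)) / (n - 2)

# 95% CI for the slope: b1 +/- t * se(b1).
t_crit = 3.1824                       # t_{0.025, 3} from a t table
se_b1 = math.sqrt(MS_res / Sxx)
ci = (b1 - t_crit*se_b1, b1 + t_crit*se_b1)
print(tuple(round(v, 2) for v in ci))   # roughly (70.31, 128.75)
```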
Confidence interval for the mean response at $x = x_0$, $E(y \mid x_0) = \mu_{y|x_0} = \beta_0 + \beta_1 x_0$:
The point estimate is $\hat\mu_{y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$, where
$\mathrm{Var}(\hat\mu_{y|x_0}) = \sigma^2 \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}} \right)$.
The $100(1-\alpha)\%$ confidence interval is
$\hat\mu_{y|x_0} - t_{\alpha/2,\,n-2} \sqrt{MS_{Res} \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}} \right)} \le E(y \mid x_0) \le \hat\mu_{y|x_0} + t_{\alpha/2,\,n-2} \sqrt{MS_{Res} \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}} \right)}$
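As an illustration (my own choice of $x_0 = 30$, with $t_{0.025,3} = 3.1824$ from a t table), the confidence interval for the mean income of workers aged 30 can be sketched as:

```python
import math

x = [34, 45, 29, 32, 23]
y = [2950, 4000, 2430, 3000, 1790]
n = len(x)
xbar = sum(x)/n
Sxx = sum(v**2 for v in x) - sum(x)**2/n
b1 = (sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y)/n) / Sxx
b0 = sum(y)/n - b1*xbar
MS_res = sum((yi - b0 - b1*xi)**2 for xi, yi in zip(x, y)) / (n - 2)

# 95% CI for the mean response at x0 = 30.
x0, t_crit = 30, 3.1824
fit = b0 + b1*x0
half = t_crit * math.sqrt(MS_res * (1/n + (x0 - xbar)**2 / Sxx))
print(round(fit - half, 2), round(fit + half, 2))
```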
Simple linear regression: prediction interval
Prediction of a new observation $y_0$ at $x = x_0$: the point prediction is $\hat{y}_0 = \hat\beta_0 + \hat\beta_1 x_0$.
Note that the random variable
$y_0 - \hat{y}_0 \sim N\!\left( 0,\ \sigma^2 \left( 1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}} \right) \right)$.
The $100(1-\alpha)\%$ prediction interval is
$\hat{y}_0 - t_{\alpha/2,\,n-2} \sqrt{MS_{Res}\left(1 + \dfrac{1}{n} + \dfrac{(x_0-\bar{x})^2}{S_{xx}}\right)} \le y_0 \le \hat{y}_0 + t_{\alpha/2,\,n-2} \sqrt{MS_{Res}\left(1 + \dfrac{1}{n} + \dfrac{(x_0-\bar{x})^2}{S_{xx}}\right)}$
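A sketch of the 95% prediction interval for a single new worker aged 50 (again with $t_{0.025,3} = 3.1824$ from a t table); note the extra "1 +" inside the square root compared with the mean-response interval:

```python
import math

x = [34, 45, 29, 32, 23]
y = [2950, 4000, 2430, 3000, 1790]
n = len(x)
xbar = sum(x)/n
Sxx = sum(v**2 for v in x) - sum(x)**2/n
b1 = (sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y)/n) / Sxx
b0 = sum(y)/n - b1*xbar
MS_res = sum((yi - b0 - b1*xi)**2 for xi, yi in zip(x, y)) / (n - 2)

# 95% prediction interval for one new observation at x0 = 50.
x0, t_crit = 50, 3.1824
y0_hat = b0 + b1*x0
half = t_crit * math.sqrt(MS_res * (1 + 1/n + (x0 - xbar)**2 / Sxx))
print(round(y0_hat - half, 2), round(y0_hat + half, 2))
```

The interval is much wider than the confidence interval for the mean response because it also carries the variance of the single new observation.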
Simple linear regression: hypothesis testing
Since $\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$, it follows that $y_i \sim \mathrm{NID}(\beta_0 + \beta_1 x_i,\ \sigma^2)$, and since $\hat\beta_1 = \sum_{i=1}^{n} c_i y_i$,
$\hat\beta_1 \sim N\!\left( \beta_1,\ \dfrac{\sigma^2}{S_{xx}} \right)$.
To test $H_0: \beta_1 = \beta_{10}$ against $H_1: \beta_1 \neq \beta_{10}$:
If $\sigma^2$ is known, use $Z_0 = \dfrac{\hat\beta_1 - \beta_{10}}{\sqrt{\sigma^2 / S_{xx}}} \sim N(0,1)$.
If $\sigma^2$ is unknown, use $t_0 = \dfrac{\hat\beta_1 - \beta_{10}}{\sqrt{MS_{Res}/S_{xx}}} \sim t_{n-2}$.
We reject the null hypothesis if $|t_0| > t_{\alpha/2,\,n-2}$.
Here $se(\hat\beta_1) = \sqrt{MS_{Res}/S_{xx}}$.
Similarly, to test $H_0: \beta_0 = \beta_{00}$ against $H_1: \beta_0 \neq \beta_{00}$, use
$t_0 = \dfrac{\hat\beta_0 - \beta_{00}}{se(\hat\beta_0)} = \dfrac{\hat\beta_0 - \beta_{00}}{\sqrt{MS_{Res}\left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} \right)}} \sim t_{n-2}$,
and reject $H_0$ if $|t_0| > t_{\alpha/2,\,n-2}$.
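For the income data, the test of $H_0: \beta_1 = 0$ can be sketched as below (critical value $t_{0.025,3} = 3.1824$ from a t table):

```python
import math

x = [34, 45, 29, 32, 23]
y = [2950, 4000, 2430, 3000, 1790]
n = len(x)
Sxx = sum(v**2 for v in x) - sum(x)**2/n
b1 = (sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y)/n) / Sxx
b0 = sum(y)/n - b1*sum(x)/n
MS_res = sum((yi - b0 - b1*xi)**2 for xi, yi in zip(x, y)) / (n - 2)

# Test H0: beta1 = 0 against H1: beta1 != 0 at alpha = 0.05.
se_b1 = math.sqrt(MS_res / Sxx)
t0 = (b1 - 0) / se_b1
t_crit = 3.1824                          # t_{0.025, 3}
print(round(t0, 2), abs(t0) > t_crit)    # t0 is about 10.84, so reject H0
```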
Measures of Variation
Total variation is made up of two parts: SST = SSR + SSE, where
SST = total sum of squares (total variation): $SST = \sum (Y_i - \bar{Y})^2$
SSR = regression sum of squares (explained variation): $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
SSE = error sum of squares (unexplained variation): $SSE = \sum (Y_i - \hat{Y}_i)^2$
[Figure: scatter plot decomposing each deviation $Y_i - \bar{Y}$ into an explained and an unexplained part.]
The F statistic for the overall regression is
$F_{STAT} = \dfrac{MSR}{MSE}$, where $MSR = \dfrac{SSR}{k}$ and $MSE = \dfrac{SSE}{n - k - 1}$,
with k the number of regressors (k = 1 for SLR).
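The ANOVA decomposition and the F statistic for the income data can be sketched as follows (the totals match the ANOVA table shown later in these slides):

```python
x = [34, 45, 29, 32, 23]
y = [2950, 4000, 2430, 3000, 1790]
n, k = len(x), 1                 # one regressor in SLR
xbar, ybar = sum(x)/n, sum(y)/n
Sxx = sum(v**2 for v in x) - sum(x)**2/n
b1 = (sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y)/n) / Sxx
b0 = ybar - b1*xbar
yhat = [b0 + b1*xi for xi in x]

# Decompose total variation: SST = SSR + SSE, then form F = MSR / MSE.
SST = sum((yi - ybar)**2 for yi in y)
SSR = sum((yh - ybar)**2 for yh in yhat)
SSE = sum((yi - yh)**2 for yi, yh in zip(y, yhat))
F = (SSR/k) / (SSE/(n - k - 1))
print(round(SST, 2), round(SSR, 2), round(SSE, 2), round(F, 2))
```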
Simple linear regression: linear association between X and Y

Coefficient of determination:
$R^2 = \dfrac{SS_R}{SS_T} = 1 - \dfrac{SS_{Res}}{SS_T}$,
where $SS_{Res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, $SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, and $SS_T = SS_R + SS_{Res}$.
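Computing $R^2$ for the income data reproduces the "R Square" entry in the regression output below (a sketch):

```python
x = [34, 45, 29, 32, 23]
y = [2950, 4000, 2430, 3000, 1790]
n = len(x)
xbar, ybar = sum(x)/n, sum(y)/n
Sxx = sum(v**2 for v in x) - sum(x)**2/n
b1 = (sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y)/n) / Sxx
b0 = ybar - b1*xbar

# R^2 = 1 - SS_res/SST: the share of income variation explained by age.
SS_res = sum((yi - b0 - b1*xi)**2 for xi, yi in zip(x, y))
SST = sum((yi - ybar)**2 for yi in y)
R2 = 1 - SS_res/SST
print(round(R2, 5))   # about 0.97511
```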
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.98747
R Square             0.97511
Adjusted R Square    0.96681
Standard Error       148.395
Observations         5

ANOVA
             df   SS           MS           F         Significance F
Regression   1    2587656.98   2587656.98   117.509   0.00167965
Residual     3    66063.02     22021.0056
Total        4    2653720

            Coefficients   Standard Error   t Stat      P-value   Lower 95%     Upper 95%
Intercept   -410.77        306.5981         -1.33978    0.2727    -1386.50525   564.959
Age (x)     99.53          9.1819           10.84014    0.0016    70.31206      128.754
Simple linear regression: some remarks (optional)
Maximum likelihood estimation. Under the normality assumption, the likelihood of the sample is
$L(y_i, x_i; \beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \dfrac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\dfrac{1}{2\sigma^2} (y_i - \beta_0 - \beta_1 x_i)^2 \right)$.
The maximum likelihood estimators solve
$\left.\dfrac{\partial \ln L}{\partial \beta_0}\right|_{\hat\beta_0, \hat\beta_1, \hat\sigma^2} = 0, \qquad \left.\dfrac{\partial \ln L}{\partial \beta_1}\right|_{\hat\beta_0, \hat\beta_1, \hat\sigma^2} = 0, \qquad \left.\dfrac{\partial \ln L}{\partial \sigma^2}\right|_{\hat\beta_0, \hat\beta_1, \hat\sigma^2} = 0.$
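Solving those score equations in closed form gives the same slope and intercept as least squares, while the ML estimate of $\sigma^2$ divides $SS_{Res}$ by n rather than n - 2. A sketch of this comparison on the income data:

```python
# ML estimates under normal errors: beta-hats coincide with least squares;
# the ML variance estimate is SS_res/n (biased), versus MS_res = SS_res/(n-2).
x = [34, 45, 29, 32, 23]
y = [2950, 4000, 2430, 3000, 1790]
n = len(x)
xbar, ybar = sum(x)/n, sum(y)/n
Sxx = sum(v**2 for v in x) - sum(x)**2/n
Sxy = sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y)/n
b1_ml = Sxy / Sxx                 # same as the LSE
b0_ml = ybar - b1_ml*xbar         # same as the LSE
SS_res = sum((yi - b0_ml - b1_ml*xi)**2 for xi, yi in zip(x, y))
sigma2_ml = SS_res / n            # ML estimate of sigma^2
print(round(sigma2_ml, 2), round(SS_res/(n - 2), 2))
```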