The Nature of Econometrics and the Modelling Process
1.1
What is Econometrics?
Historical Context
Econometrics as Art or Science?
1.3
Economic Decisions
To use information effectively:
economic theory
economic data
economic decisions
EXAMPLE
Degree Performance in Economics
Student Performance
STAGE 1- Statement of Theory /Hypothesis
Student Performance Function:
Student degree performance is determined by ability
1.10
Student Performance
STAGE 2 - Mathematical Model
Performance, P, is some function of ability, A:
$P = f(A)$    (1)
In linear form:
$Y = \beta_1 + \beta_2 X$    (2)
Student Performance
STAGE 3 - Econometric Model
$Y_i = \beta_1 + \beta_2 X_i + U_i$    (3)
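The role of the disturbance term in equation (3) can be illustrated with a small simulation. This is only a sketch: the function name, the parameter values and the assumption that the disturbance is normally distributed are all hypothetical and are not taken from the slides.

```python
import random


def simulate_performance(x_values, beta1, beta2, sigma, seed=0):
    """Generate (X_i, Y_i, U_i) with Y_i = beta1 + beta2 * X_i + U_i.

    U_i is drawn from N(0, sigma^2); normality is an assumption made
    here for illustration only.
    """
    rng = random.Random(seed)
    data = []
    for x in x_values:
        u = rng.gauss(0.0, sigma)  # disturbance term U_i
        data.append((x, beta1 + beta2 * x + u, u))
    return data


# Hypothetical year 2 averages and parameter values.
sample = simulate_performance([40, 50, 60, 70], beta1=30.0, beta2=0.5, sigma=4.0)
for x, y, u in sample:
    print(f"X={x:5.1f}  Y={y:6.2f}  U={u:6.2f}")
```

Setting sigma to zero removes the disturbance and reproduces the exact mathematical model of equation (2).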
1.12
Student Performance
STAGE 4a - Obtaining the data
Observed values of Y (year 3 average) and X (year 2 average)
STAGE 4b - Preliminary Data Analysis
Descriptive statistics and graphical charts (initial identification of possible errors: outliers, influential observations and lurking variables)
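A preliminary look at the data might be sketched as follows. The marks are hypothetical, and the rule of flagging points more than two sample standard deviations from the mean is just one simple screening convention assumed here, not a method given in the slides.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def flag_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > threshold * sd]


# Hypothetical year 2 averages; 12 stands in for a possible data-entry error.
year2 = [55, 58, 60, 62, 57, 61, 59, 12]
print(describe(year2))
print(flag_outliers(year2))
```

A flagged value is only a candidate for checking against the original records; it is not automatically an error.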
1.13
Student Performance
[Figure: Year 3 Average Against Year 2 Average (scatterplot; vertical axis: Year 3 Average, horizontal axis: Year 2 Average)]
1.14
Student Performance
Simple (Bivariate) Least Squares (Linear) Regression
Y = f(X)
Question
How shall we fit a line to the points in a scatterplot?
Answer
Use the method of least squares (commonly known as ordinary least squares, or OLS for short)
The least squares regression line is the line that minimises the sum of squared vertical deviations of the data points from the line
1.16
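Least squares criterion: minimise $\sum_{i=1}^{n} e_i^2$
Line to be fitted: $Y = \beta_1 + \beta_2 X$
Fitted values: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$
Residuals: $e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$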
1.18
Student Performance
[Figure: Year 3 Average Against Year 2 Average (scatterplot; vertical axis: Year 3 Average, horizontal axis: Year 2 Average)]
1.19
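[Figure: observations $Y_1, \ldots, Y_4$ at $X_1, \ldots, X_4$ plotted around the fitted line, with $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$; the residuals $e_1, \ldots, e_4$ are the vertical distances between each observation and the line]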
1.20
$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$
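Minimising this sum of squared residuals has a well-known closed-form solution in the bivariate case: the slope is the sum of cross-deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the point of means. These formulas are standard results rather than something shown on the slides, and the data and function names below are hypothetical.

```python
def ols_fit(x, y):
    """Return (b1, b2) minimising sum((y_i - b1 - b2 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx           # slope estimate
    b1 = y_bar - b2 * x_bar    # intercept estimate
    return b1, b2


def sum_squared_residuals(x, y, b1, b2):
    """Sum of squared residuals for the line b1 + b2 * x."""
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical year 2 (x) and year 3 (y) averages.
x = [40, 50, 60, 70]
y = [52, 55, 63, 66]
b1, b2 = ols_fit(x, y)
print(b1, b2, sum_squared_residuals(x, y, b1, b2))
```

Any other choice of intercept or slope gives a larger sum of squared residuals for the same data, which is what "least squares" means.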
1.21
Student Performance
Excel Regression Output: Bivariate Case
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.599443   (correlation coefficient)
R Square             0.359332   (coefficient of determination)
Adjusted R Square    0.356338
Standard Error       4.617821
Observations         216

ANOVA
             df    SS         MS         F          Significance F
Regression   1     2559.477   2559.477   120.0265   1.84E-22
Residual     214   4563.393   21.32427
Total        215   7122.87

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept    29.4029        2.799513         10.50286   4.45E-21   23.88475    34.92106    23.88475      34.92106
Y2AV         0.538088       0.049115         10.95566   1.84E-22   0.441277    0.6349      0.441277      0.6349

Callouts on the slide: t statistic, P-value
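Several entries in the Excel output follow arithmetically from others, which is a useful check when reading such a table. The sketch below recomputes them from the reported sums of squares, coefficients and standard errors; the variable names are my own, and the relationships used (mean square = SS / df, F = ratio of mean squares, t = coefficient / standard error) are the standard definitions.

```python
import math

n = 216    # observations
k = 2      # estimated coefficients (intercept and slope)
ss_regression = 2559.477
ss_residual = 4563.393

ms_regression = ss_regression / (k - 1)   # regression mean square
ms_residual = ss_residual / (n - k)       # residual mean square
f_stat = ms_regression / ms_residual
std_error = math.sqrt(ms_residual)        # "Standard Error" of the regression

t_intercept = 29.4029 / 2.799513          # coefficient / its standard error
t_slope = 0.538088 / 0.049115

print(round(ms_residual, 5), round(f_stat, 4), round(std_error, 6))
print(round(t_intercept, 3), round(t_slope, 3))
```

In the bivariate case the F statistic equals the square of the slope's t statistic, which is why "Significance F" and the slope's P-value coincide in the output.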
1.22
Student Performance
LimDep Regression Output: Bivariate Case
Regression diagnostics
Results of the regression equation
1.23
Student Performance
STAGE 5b - Interpreting the regression results
Check the coefficient sign against expectations (hypothesis)
(e.g. do we expect year 2 average to be positively related to year 3 average?)
Check the coefficient magnitude against expectations (if any)
(e.g. are there any prior expectations about the possible magnitude of the effect of year 2 average?)
1.24
Student Performance
STAGE 6 - Diagnostic Analysis
Is the model correctly specified?
- Correct functional form
- Omitted variables (or unnecessary ones)
Student Performance
STAGE 7 - Hypothesis Testing
Are the estimates statistically significant?
Do they conform with economic theory?
STAGE 8 - Forecasting/Prediction
For example, predicting the level of performance for a particular ability level.
Possible policy implications?
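As an illustration of the prediction stage, the coefficients reported in the Excel output (intercept 29.4029, slope 0.538088) can be plugged into the fitted line. The function name and the year 2 average of 60 are hypothetical choices for this sketch.

```python
def predict_year3(year2_average, intercept=29.4029, slope=0.538088):
    """Predicted year 3 average from the fitted bivariate regression."""
    return intercept + slope * year2_average


# Hypothetical student with a year 2 average of 60.
print(round(predict_year3(60), 2))
```

Such a point prediction is most trustworthy for year 2 averages within the range observed in the sample.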
1.26
Student Performance
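Statistical Significance of the Regression Coefficient
The significance test for individual coefficients in regression analysis is the t-test.
The t-test is based on the t-statistic:
$t_k = \dfrac{\hat{\beta}_k}{s_{\hat{\beta}_k}}, \quad k = 1, 2, \ldots, K$
where $\hat{\beta}_k$ is the coefficient estimate and $s_{\hat{\beta}_k}$ is its standard error
And where
$H_0: \beta_k = 0$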
1.27
Student Performance
What does the t-statistic tell us?
It provides a numerical measure of the significance of the estimated coefficient against some pre-determined hypothesis.
Two ways of evaluating this numerical value:
- Compare the test statistic to a critical value
- Calculate the P-value and compare it to the chosen significance level
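The first approach, comparison with a critical value, might be sketched as below. Note the assumption: with 214 residual degrees of freedom the t distribution is very close to the standard normal, so the critical value is taken from the normal distribution using only the Python standard library. An exact t critical value would be slightly larger. The function name is hypothetical.

```python
from statistics import NormalDist


def reject_by_critical_value(t_stat, alpha):
    """Two-sided test of H0: coefficient = 0 using a normal approximation.

    Returns (reject, critical_value). The normal critical value is an
    approximation to the t critical value, reasonable for large samples.
    """
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(t_stat) > critical, critical


# t statistic for the year 2 average taken from the Excel output.
reject, critical = reject_by_critical_value(10.95566, alpha=0.05)
print(reject, round(critical, 3))
```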
1.28
Student Performance
In the student performance example the regression results can be compactly written as:

$\widehat{Y3AV} = 29.403 + 0.538\,Y2AV$

             Intercept     Y2AV
s.e.         (2.7995)      (0.049)
t            (10.503)      (10.956)
p-value      (4.45E-21)    (1.84E-22)

Test procedure:
The hypothesis is $H_0: \beta_2 = 0$
The formula for testing this is $t = \dfrac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}}$
Choose a significance level $\alpha$: 0.1, 0.05 or 0.01
Calculate the p-value (see computer output)
Compare the p-value with the significance level:
- if $p < \alpha$, reject the null hypothesis
- if $p > \alpha$, do not reject the null hypothesis
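The p-value comparison can be written as a one-line decision rule. The sketch below applies it to the slope's p-value reported in the computer output (1.84E-22) at each of the three significance levels listed; the function name is hypothetical, and a large p-value is worded as "do not reject" because failing to reject a null hypothesis does not establish that it is true.

```python
def decide(p_value, alpha):
    """Decision for a test of H0 at significance level alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


p_slope = 1.84e-22  # p-value for the year 2 average, from the output
for alpha in (0.1, 0.05, 0.01):
    print(alpha, decide(p_slope, alpha))
```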
1.29
Explaining Variation in Yi
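$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$
Explained variation: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$
Unexplained variation: $e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$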
1.30
Explaining Variation in Yi
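[Figure: observations $Y_1, \ldots, Y_4$ at $X_1, \ldots, X_4$ around the fitted line, with $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$ and residuals $e_1, \ldots, e_4$; the mean $\bar{Y}$ is drawn as a horizontal line and the deviation of the fourth observation from $\bar{Y}$ is marked]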
1.31
Explaining Variation in Yi
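$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2$
Or
$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$
where $y_i = Y_i - \bar{Y}$ and $\hat{y}_i = \hat{Y}_i - \bar{Y}$ are deviations from the mean
Or
TSS = ESS + RSS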
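The decomposition can be checked numerically on a small data set. This is a sketch with hypothetical data; the line is fitted with the standard closed-form least squares formulas, and the identity holds because the fitted line includes an intercept.

```python
def variation_decomposition(x, y):
    """Fit Y on X by least squares and return (TSS, ESS, RSS)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b1 = y_bar - b2 * x_bar
    fitted = [b1 + b2 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ess = sum((fi - y_bar) ** 2 for fi in fitted)            # explained variation
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    return tss, ess, rss


# Hypothetical year 2 (x) and year 3 (y) averages.
tss, ess, rss = variation_decomposition([40, 50, 60, 70], [52, 55, 63, 66])
print(tss, ess, rss)
```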
1.32
Explaining Variation in Yi
The Coefficient of Determination, $R^2$:
the proportion of the total variation in the dependent variable Y that is explained by the variation in the independent variable X.
$R^2 = \dfrac{\sum \hat{y}_i^2}{\sum y_i^2} = \dfrac{ESS}{TSS}$
or
$R^2 = 1 - \dfrac{\sum e_i^2}{\sum y_i^2} = 1 - \dfrac{RSS}{TSS}$
$0 \le R^2 \le 1$
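Both expressions for the coefficient of determination can be evaluated with the sums of squares from the Excel ANOVA table (regression 2559.477, residual 4563.393, total 7122.87). The sketch also compares the result with the square of the reported Multiple R, since in a bivariate regression with an intercept the two coincide. Variable names are my own.

```python
ess = 2559.477   # explained (regression) sum of squares
rss = 4563.393   # residual sum of squares
tss = 7122.87    # total sum of squares

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss
multiple_r = 0.599443  # correlation coefficient from the output

print(round(r2_from_ess, 6), round(r2_from_rss, 6), round(multiple_r ** 2, 6))
```

Roughly 36% of the variation in year 3 averages is explained by year 2 averages in this sample.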
1.33
Explaining Variation in Yi
R2
1.34
Data Transformations
Linear transformations do not affect the shape of the distribution (e.g. Celsius into Fahrenheit; miles into kilometres; pounds into kilograms)
A linear transformation takes the form, e.g.
$X_i^* = a + b X_i$
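One way to see that the shape of the distribution is unchanged is to standardise the data before and after the transformation. The sketch below uses the Celsius to Fahrenheit conversion (a = 32, b = 1.8) on hypothetical temperatures. It assumes b > 0; a negative b would mirror the distribution.

```python
import statistics


def linear_transform(values, a, b):
    """Apply X* = a + b * X to every value."""
    return [a + b * v for v in values]


def standardise(values):
    """Convert values to z-scores using the sample mean and standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


celsius = [10.0, 15.0, 20.0, 30.0]  # hypothetical temperatures
fahrenheit = linear_transform(celsius, a=32.0, b=1.8)

print(fahrenheit)
print(standardise(celsius))
print(standardise(fahrenheit))
```

The mean and standard deviation change with the units, but every observation keeps the same relative position.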
1.35
Nonlinear Relations
[Figure: scatterplot of $Y_i$ (vertical axis) against $X_i$ (horizontal axis)]
1.37
$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{U_i}$
$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + U_i$
1.38
$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + U_i$
Log-linear (log-log) model, because both the dependent variable and the independent variables are in logarithmic form
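Taking logarithms turns the multiplicative model into one that is linear in the parameters, which can be confirmed numerically. The parameter and data values below are hypothetical, and the check requires a positive $\beta_1$ and positive regressors so that the logarithms exist.

```python
import math


def multiplicative_model(x2, x3, u, beta1, beta2, beta3):
    """Y = beta1 * X2^beta2 * X3^beta3 * exp(U)."""
    return beta1 * x2 ** beta2 * x3 ** beta3 * math.exp(u)


def log_linear_rhs(x2, x3, u, beta1, beta2, beta3):
    """ln(beta1) + beta2 * ln(X2) + beta3 * ln(X3) + U."""
    return math.log(beta1) + beta2 * math.log(x2) + beta3 * math.log(x3) + u


# Hypothetical values, chosen only for illustration.
params = dict(beta1=2.0, beta2=0.6, beta3=0.3)
y = multiplicative_model(x2=10.0, x3=5.0, u=0.1, **params)
print(math.log(y), log_linear_rhs(x2=10.0, x3=5.0, u=0.1, **params))
```

In the log-log form, each slope coefficient measures the percentage change in Y associated with a one percent change in the corresponding X, that is, an elasticity.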
1.39
Notation
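The general form of the multiple regression model:
Summation form
$y_i = \sum_{k=1}^{K} \beta_k x_{ik} + u_i$
Or
Matrix form
$y = X\beta + u$
where
$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} x_{11} & \cdots & x_{1K} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nK} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}$
and
$(x_{11}, \ldots, x_{n1})^T = (1, \ldots, 1)^T$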
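The summation and matrix forms describe the same calculation: each row of X is multiplied element by element with the coefficient vector and summed, and the disturbance is added. A minimal sketch in plain Python follows; the helper names and all numbers are hypothetical, and the first column of X is set to ones so that the first coefficient acts as the intercept.

```python
def design_matrix(regressor_rows):
    """Prepend a column of ones (the intercept column) to each row of regressors."""
    return [[1.0] + list(row) for row in regressor_rows]


def mat_vec(X, beta):
    """Compute X @ beta row by row: sum over k of beta_k * x_ik."""
    return [sum(x_ik * b_k for x_ik, b_k in zip(row, beta)) for row in X]


# Hypothetical data: n = 3 observations, K = 3 coefficients.
X = design_matrix([[40.0, 1.0], [50.0, 0.0], [60.0, 1.0]])
beta = [30.0, 0.5, 2.0]
u = [0.5, -1.0, 0.0]

y = [xb + u_i for xb, u_i in zip(mat_vec(X, beta), u)]
print(y)
```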
1.42
Student Performance
[Figure: Year 3 Average Against Year 2 Average (scatterplot; vertical axis: Year 3 Average, horizontal axis: Year 2 Average)]
$Y_i = \beta_1 + \beta_2 X_i + \beta_3 D_i + U_i$
$Y_i$ = Final year average
$X_i$ = Year 2 average
$D_i$ = 1 if the student did a placement,
$D_i$ = 0 otherwise.
Policy: Placement has a positive effect on final year performance
$H_0: \beta_3 = 0$
$H_1: \beta_3 > 0$
1.46
$Y_i = \beta_1 + \beta_2 X_i + \beta_3 D_i + U_i$
Placement: $Y_i = (\beta_1 + \beta_3) + \beta_2 X_i + U_i$
No Placement: $Y_i = \beta_1 + \beta_2 X_i + U_i$
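[Figure: final year average $Y_i$ (vertical axis) against year 2 average $X_i$ (horizontal axis), showing two parallel lines: "Placement", with intercept $\beta_1 + \beta_3$, above "No placement"]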
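The effect of the placement dummy is a parallel shift of the regression line: for any given year 2 average, the predicted final year average differs between the two groups by exactly the dummy's coefficient. The sketch below uses hypothetical coefficient values, not estimates from the slides.

```python
def predict_final(year2_average, placement, beta1, beta2, beta3):
    """Predicted final year average with a placement dummy (D = 1 or 0)."""
    d = 1 if placement else 0
    return beta1 + beta2 * year2_average + beta3 * d


# Hypothetical coefficients for illustration only.
coeffs = dict(beta1=30.0, beta2=0.5, beta3=2.0)

with_placement = predict_final(60, True, **coeffs)
without_placement = predict_final(60, False, **coeffs)
print(with_placement, without_placement, with_placement - without_placement)
```

The gap between the two predictions does not depend on the year 2 average, because the dummy changes only the intercept and not the slope.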
1.47
Student Performance
LimDep Regression Output: Multiple Case
Yes
No
Hypothesis testing
Policy: prediction and forecasting
1.49
Summary
In this session we have:
Outlined what econometrics is all about
Outlined the methodology of econometrics and illustrated the process using economic models
1.50
Next Week
1.51