The Nature of Econometrics and The Modelling Process: Session 1

Session 1: The Nature of Econometrics and the Modelling Process
1.1
Aims and Learning Objectives

By the end of this session students should be able to:
Appreciate that the study of econometrics is important
Understand the nature of the modelling process
Construct and interpret simple (bivariate) and multiple regression models
Understand the purpose and use of variable transformations and the incorporation of qualitative variables in the context of regression analysis
1.2
What is Econometrics?
Historical Context
Econometrics as Art or Science?
1.3
Economic Decisions
To use information effectively, economic theory and economic data are combined to inform economic decisions.
*Econometrics* helps us combine economic theory and economic data.
1.4
Types of Data: Cross-Sectional
Cross-sectional data is a random sample
Each observation is an individual, firm, etc., with information at a point in time
If the data is not a random sample, we have a sample-selection problem
1.5
Types of Data: Panel
Can pool random cross sections and treat them like a normal cross section; we just need to account for time differences.
Can follow the same randomly sampled individual observations over time, known as panel data or longitudinal data
1.6
Types of Data: Time Series
Time series data has a separate observation for each time period, e.g. stock prices
Since it is not a random sample, there are different problems to consider
1.7
The Modelling Process

8-STAGE PROCESS
(1) Statement of theory/hypothesis
(2) Specification of mathematical model
(3) Specification of the econometric model
(4) Obtaining the data and conducting preliminary data analysis
(5) Estimation of the econometric model and interpretation of regression results
(6) Diagnostic analysis
(7) Hypothesis testing
(8) Prediction/forecasting
1.8
EXAMPLE
Degree Performance in Economics
A number of factors determine performance:
Ability
Family background
Effort
Let us look at this from the perspective of ability only and analyse it using a simple bivariate (two-variable) model
1.9
Student Performance
STAGE 1 - Statement of Theory/Hypothesis
Student Performance Function:
Student degree performance is determined by ability
1.10
Student Performance
STAGE 2 - Mathematical Model
Performance, P, is some function of ability, A:
$P = f(A)$    (1)
In linear form:
$Y = \beta_1 + \beta_2 X$    (2)
where Y = performance and X = ability

1.11
Student Performance
STAGE 3 - Econometric Model
$Y_i = \beta_1 + \beta_2 X_i + U_i$    (3)
Y = Performance - the dependent variable
X = Ability - the explanatory variable
U = Disturbance (random error) term
For this particular example we will collect data on the year 2 average (as the measure of ability) and the final year average (as the measure of performance)
1.12
Student Performance
STAGE 4a - Obtaining the data
Observed values of Y (year 3 average) and X (year 2 average)
STAGE 4b - Preliminary Data Analysis
Descriptive statistics, graphical charts (initial identification of possible problems: outliers, influential observations and lurking variables)
1.13
Student Performance
[Figure: Year 3 Average Against Year 2 Average. Scatter plot with Year 3 Average on the vertical axis (0 to 80) and Year 2 Average on the horizontal axis (10 to 90).]
What can we say about the relationship between year 3 average and year 2 average? Subjective judgement
1.14
Student Performance
STAGE 5a - Estimation of the Parameters
Y and X are the variables - known
$\beta_1$ and $\beta_2$ are the parameters - unknown; the disturbance U is also unobserved
Estimators versus Estimates
1.15
Student Performance
Simple (Bivariate) Least Squares (Linear) Regression
Y = f(X)
Question
How shall we fit a line to the points in a scatterplot?
Answer
Use the method of least squares (commonly known as ordinary least squares, or OLS for short)
The least squares regression line is the line that minimises the sum of squared vertical deviations of the data points from the line
1.16
The Method of Ordinary Least Squares

Why use OLS?
(1) OLS is relatively easy to use
(2) The goal of minimising $\sum_{i=1}^{n} e_i^2$ is appropriate from a theoretical point of view
(3) OLS estimators have a number of useful properties
1.17
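Equation of a straight line:
$Y = \alpha + \beta X$
Line of best fit is:
$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$
The vertical deviation of the ith observation from the line is:
Deviation = Observed Y minus Predicted Y
$e_i = Y_i - \hat{Y}_i$
$e_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$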
1.18
Student Performance
[Figure: scatter plot of Year 3 Average (vertical axis) against Year 2 Average (horizontal axis)]
1.19
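[Figure: the relationship among Y, e and the fitted regression line. Four observations $Y_1, \dots, Y_4$ at $X_1, \dots, X_4$ are plotted around the line, and each residual $e_1, \dots, e_4$ is the vertical gap between the observed $Y_i$ and the fitted $\hat{Y}_i$, so that $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$.]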
1.20
The method of least squares therefore involves minimising:
$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$
The solution to this problem is... a homework question!
How?
Closed form solution (formulas from a textbook)
Regression software (Excel, LimDep, EViews)
Optimization software
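
As an illustration of the closed-form route, the sketch below computes the bivariate least squares estimates from the standard textbook formulas (slope = sum of cross-deviations divided by sum of squared deviations of X; intercept = mean of Y minus slope times mean of X). The function name `ols_bivariate` and the marks are made up for illustration; they are not the data set used in the lecture.

```python
def ols_bivariate(x, y):
    """Return (b1, b2) minimising sum((y_i - b1 - b2 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx            # slope estimate
    b1 = y_bar - b2 * x_bar   # intercept estimate
    return b1, b2


# Hypothetical marks, for illustration only
year2 = [45.0, 52.0, 58.0, 61.0, 67.0, 72.0]
year3 = [50.0, 55.0, 62.0, 60.0, 66.0, 70.0]

b1, b2 = ols_bivariate(year2, year3)
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(year2, year3)]
rss = sum(e ** 2 for e in residuals)
print(f"intercept = {b1:.3f}, slope = {b2:.3f}, RSS = {rss:.3f}")
```

Any other choice of intercept and slope gives a larger sum of squared residuals on the same data, which is what "least squares" means.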
1.21
Student Performance
Excel Regression Output: Bivariate Case
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.599443   (correlation coefficient)
R Square             0.359332   (coefficient of determination: 36% of the variation in year 3 is explained by variation in year 2)
Adjusted R Square    0.356338
Standard Error       4.617821
Observations         216

ANOVA
             df     SS          MS          F          Significance F
Regression   1      2559.477    2559.477    120.0265   1.84E-22
Residual     214    4563.393    21.32427
Total        215    7122.87

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    29.4029        2.799513         10.50286   4.45E-21   23.88475    34.92106
Y2AV         0.538088       0.049115         10.95566   1.84E-22   0.441277    0.6349

The t Stat and P-value columns report the t statistic and p-value for each coefficient.
1.22
Student Performance
LimDep Regression Output: Bivariate Case
[Figure: LimDep output, annotated to show the regression diagnostics and the results of the regression equation]
1.23
Student Performance
STAGE 5b - Interpreting the regression results
Check the coefficient sign against expectations (hypothesis)
(e.g. do we expect year 2 average to be positively related to year 3 average?)
Check the coefficient magnitude against expectations (if any)
(e.g. are there any prior expectations on the possible magnitude of the effect of year 2 average?)
1.24
Student Performance
STAGE 6 - Diagnostic Analysis
Is the model correctly specified?
- Correct functional form
- Omitted variables (or unnecessary ones)
Has the model got good diagnostic properties? (validity of the probability distribution of the disturbance term)
- Is the disturbance term uncorrelated with the regressors?
- Are the values of the disturbance term independently and normally distributed with mean zero and variance $\sigma^2$?
1.25
Student Performance
STAGE 7 - Hypothesis Testing
Are the estimates statistically significant?
Do they conform with economic theory?
STAGE 8 - Forecasting/Prediction
For example, predicting the level of
performance for a particular ability level.
Possible policy implications?
1.26
Student Performance
Statistical Significance of the Regression Coefficient
The significance test in regression analysis for individual coefficients is the t-test.
The t-test is based on the t-statistic:
$t_k = \hat{\beta}_k / s_{\hat{\beta}_k}$,  k = 1, 2, ..., K
where $\hat{\beta}_k$ is the coefficient estimate and $s_{\hat{\beta}_k}$ is the standard error of the estimate
$H_0: \beta_k = 0$
1.27
Student Performance
What does the t-statistic tell us?
Provides us with a numerical value for the significance of the estimated coefficient against some pre-determined hypothesis.
Two ways of evaluating this numerical value:
Compare the test statistic to a critical value
Calculate the p-value and compare it to the chosen significance level
1.28
Student Performance
In the student performance example the regression results can be compactly written as:
$\widehat{Y3AV} = 29.403 + 0.538\, Y2AV$
s.e.       (2.7995)     (0.049)
t          (10.503)     (10.956)
p-value    (4.45E-21)   (1.84E-22)
Test procedure:
The hypothesis is that $\beta_2 = 0$
The formula for testing this is $t = (\hat{\beta}_2 - 0) / s_{\hat{\beta}_2}$
Choose a significance level $\alpha$: 0.1, 0.05 or 0.01
Calculate the p-value (see computer output)
Compare the p-value with the significance level:
- if p < $\alpha$, reject the null hypothesis
- if p > $\alpha$, do not reject the null hypothesis
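
The test procedure can be sketched in a few lines. The block below recomputes the t-statistics from the coefficients and standard errors reported in the Excel output and applies the p-value decision rule; the p-values themselves are copied from that output rather than recomputed, and the helper names `t_statistic` and `decide` are illustrative only.

```python
def t_statistic(estimate, std_error, hypothesised=0.0):
    """t = (estimate - hypothesised value) / standard error."""
    return (estimate - hypothesised) / std_error


def decide(p_value, alpha):
    """Apply the p-value rule for a chosen significance level alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Figures taken from the Excel regression output for the bivariate case
intercept, intercept_se, intercept_p = 29.4029, 2.799513, 4.45e-21
slope, slope_se, slope_p = 0.538088, 0.049115, 1.84e-22

t1 = t_statistic(intercept, intercept_se)
t2 = t_statistic(slope, slope_se)
print(f"t (intercept) = {t1:.3f}, t (Y2AV) = {t2:.3f}")

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha}: {decide(slope_p, alpha)}")
```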
1.29
Explaining Variation in Yi
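$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$
Explained variation:
$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$
Unexplained variation:
$e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$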
1.30
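[Figure: the fitted regression line, $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$, with four observations $Y_1, \dots, Y_4$ at $X_1, \dots, X_4$, their residuals $e_1, \dots, e_4$, and a horizontal line at the sample mean $\bar{Y}$; a deviation from $\bar{Y}$ is marked for the fourth observation.]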
1.31
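$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2$
Or, writing lower-case letters for deviations from the sample mean,
$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$
Or
TSS = ESS + RSS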
1.32
The Coefficient of Determination, $R^2$
The proportion of the total variation in the dependent variable Y that is explained by the variation in the independent variable X.
$R^2 = \mathrm{ESS}/\mathrm{TSS} = \sum \hat{y}_i^2 / \sum y_i^2$
or
$R^2 = 1 - \mathrm{RSS}/\mathrm{TSS} = 1 - \sum e_i^2 / \sum y_i^2$
$0 \le R^2 \le 1$
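
As a quick arithmetic check, the two expressions for R-squared can be evaluated with the sums of squares from the ANOVA table of the Excel output (Regression, Residual and Total rows). The variable names are illustrative.

```python
# Sums of squares from the ANOVA table of the bivariate Excel output
ess = 2559.477   # explained (regression) sum of squares
rss = 4563.393   # residual sum of squares
tss = 7122.87    # total sum of squares

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss

print(f"ESS + RSS = {ess + rss:.3f} (TSS = {tss})")
print(f"R^2 = ESS/TSS     = {r2_from_ess:.6f}")
print(f"R^2 = 1 - RSS/TSS = {r2_from_rss:.6f}")
```

Both routes agree with the R Square figure of 0.359332 reported by Excel, up to rounding in the published sums of squares.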
1.33
R2
1.34
Data Transformations
Linear transformations do not affect the shape of the distribution (e.g. Celsius into Fahrenheit; miles into kilometres; pounds into kilograms)
The linear transformation, e.g.
$X_i^* = a + bX_i$
produces observations $X_1^*, X_2^*, X_3^*, \dots, X_n^*$
In the context of regression analysis this means that the slope and intercept coefficients (and the standard error of the slope) will change, but the coefficient of determination and the t-ratio on the slope will remain the same (up to sign if b is negative)
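
A small numerical check of which regression quantities change under a linear transformation of X. The sketch fits the same made-up Y on X measured in Celsius and then in Fahrenheit (a = 32, b = 1.8); the data and the helper `fit` are hypothetical and serve only to illustrate the point.

```python
import math


def fit(x, y):
    """Bivariate OLS with the slope's standard error, t-ratio and R-squared."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    se_b2 = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return {"b1": b1, "b2": b2, "se_b2": se_b2,
            "t_b2": b2 / se_b2, "r2": 1 - rss / tss}


# Hypothetical data, for illustration only
celsius = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
y = [12.0, 15.5, 21.0, 24.5, 31.0, 33.5]
fahrenheit = [32.0 + 1.8 * c for c in celsius]  # X* = a + b X

original = fit(celsius, y)
transformed = fit(fahrenheit, y)
for name in ("b1", "b2", "se_b2", "t_b2", "r2"):
    print(f"{name:6s} original = {original[name]:10.4f}   transformed = {transformed[name]:10.4f}")
```

In this sketch the slope and its standard error are both divided by b, so their ratio (the t-ratio) and the R-squared are unchanged, while the intercept shifts.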
1.35
To change the shape of a distribution we use non-linear transformations
For example, a power, square root or logarithm
The purpose of using non-linear transformations is that skewness can be reduced
In the context of regression analysis the logarithmic transformation is commonly used:
- Some relationships are nonlinear
- Interpretation
1.36
Nonlinear Relations
[Figure: scatter plot of $Y_i$ against $X_i$ showing a nonlinear relationship]
1.37
When the Dependent Variable and Explanatory Variables are in Logarithmic Form
Economic relationship as an exponential regression:
$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{U_i}$
Nonlinear in terms of parameters - cannot use OLS directly
Therefore, need to apply a suitable transformation:
$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + U_i$
1.38
When the Dependent Variable and Explanatory Variables are in Logarithmic Form
$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + U_i$
Log-linear (log-log) model because both the dependent variable and the independent variables are in logarithmic form
Coefficients measure the relative change in Y for a relative change in X (partial effects). Therefore, they can be interpreted as elasticities
Often applied to production functions (following Cobb and Douglas, 1928)
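
To see the elasticity interpretation at work, the sketch below simplifies to a single regressor and generates data exactly from $Y = 2X^{0.5}$ (no disturbance term, values chosen purely for illustration). Running OLS on the logged variables should then recover the exponent as the slope and $\ln 2$ as the intercept.

```python
import math

# Hypothetical data generated exactly from Y = 2 * X ** 0.5 (no disturbance)
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi ** 0.5 for xi in x]

# Log transformation makes the relationship linear in the parameters
ln_x = [math.log(xi) for xi in x]
ln_y = [math.log(yi) for yi in y]

n = len(x)
lx_bar, ly_bar = sum(ln_x) / n, sum(ln_y) / n
sxy = sum((a - lx_bar) * (b - ly_bar) for a, b in zip(ln_x, ln_y))
sxx = sum((a - lx_bar) ** 2 for a in ln_x)

elasticity = sxy / sxx                      # slope of ln Y on ln X
intercept = ly_bar - elasticity * lx_bar    # estimate of ln(beta_1)

print(f"estimated elasticity = {elasticity:.4f}")
print(f"implied beta_1 = exp(intercept) = {math.exp(intercept):.4f}")
```

An elasticity of 0.5 reads as: a 1% increase in X is associated with roughly a 0.5% increase in Y.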
1.39
Regression Analysis: Multiple Variables
There are a number of potential pitfalls with the simple (bivariate) regression model:
Outliers - points that lie far from the fitted line
Lurking variables - variables that have an important effect on the dependent variable but are not included in the simple regression model
The number of outliers can be reduced by taking a non-linear transformation. What about lurking variables?
1.40
Regression Analysis: Multiple Variables
Bivariate Regression: Y = f(X)
Multiple Regression: Y = f(X1, X2, ..., Xk)
Examples
House price = f(size, bedrooms, location)
Salary = f(education, experience, gender)
Lecture attendance = f(lecture notes on web, level of interest, time of day)
$Y_i = \alpha + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + e_i$

1.41
Notation
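The general form of the multiple regression model:
Summation form
$y_i = \sum_{k=1}^{K} \beta_k x_{ik} + u_i$
Or
Matrix form
$y = X\beta + u$
where
$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} x_{11} & \cdots & x_{1K} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nK} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}$
and
$(x_{11}, \dots, x_{n1})^T = (1, \dots, 1)^T$, i.e. the first column of X is a column of ones, so $\beta_1$ is the intercept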
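
The matrix form lends itself to a compact computation. The slides leave the derivation for later, so the sketch below simply assumes the standard least squares normal equations, $(X^T X)\hat{\beta} = X^T y$, and solves them with a small hand-written elimination routine. The helper names and the data (generated exactly from $y = 1 + 2x_2 - 0.5x_3$) are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Bring the largest remaining entry in this column onto the diagonal
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols_matrix(X, y):
    """Solve the normal equations (X'X) beta = X'y for beta."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical regressors; the first column of X is the column of ones
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0]
X = [[1.0, a, b] for a, b in zip(x2, x3)]
true_beta = [1.0, 2.0, -0.5]
y = [sum(bk * xk for bk, xk in zip(true_beta, row)) for row in X]

beta_hat = ols_matrix(X, y)
print([round(b, 6) for b in beta_hat])
```

In practice a regression package or a linear algebra library would do this step; the hand-written solver is only there to keep the example self-contained.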
1.42
Student Performance
[Figure: scatter plot of Year 3 Average (vertical axis) against Year 2 Average (horizontal axis)]
Is there a lurking variable in the scatter?

1.43
Incorporating Qualitative Explanatory Variable(s)

So far we have considered how quantitative variables can be incorporated into a regression equation.
Qualitative variables (principally, but not exclusively, nominal and ordinal data) can also be included in a regression model
Typical examples include: gender, race, religion, marital status.
Qualitative variables which take one of two values are known as dummy variables
1.44

Suppose that in our student performance example we have information on whether the student undertook an industrial placement in their 3rd year of study.
This represents qualitative information, since we only know whether the student did or did not do a placement.
The usual practice is to create a coded variable, with a code of 0 for the absence of the effect and 1 for the presence of the effect.
Consequently we could create a variable which equals 1 if the student undertook a placement and 0 if they did not.
1.45
$Y_i = \alpha + \beta_2 X_i + \beta_3 D_i + U_i$
$Y_i$ = Final year average
$X_i$ = Year 2 average
$D_i$ = 1 if the student did a placement, $D_i$ = 0 otherwise.
Policy: Placement has a positive effect on final year performance
$H_0: \beta_3 = 0$
$H_1: \beta_3 > 0$
1.46
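$Y_i = \alpha + \beta_2 X_i + \beta_3 D_i + U_i$
Placement: $Y_i = (\alpha + \beta_3) + \beta_2 X_i + U_i$
No placement: $Y_i = \alpha + \beta_2 X_i + U_i$
[Figure: final year average ($Y_i$) against year 2 average ($X_i$), showing two parallel lines with common slope $\beta_2$: the placement line with intercept $\alpha + \beta_3$ and the no-placement line with intercept $\alpha$]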
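
The intercept-shift interpretation can be checked numerically. With one quantitative regressor and one 0/1 dummy, the OLS slope equals the pooled within-group slope, and each group's intercept is its mean of Y minus the slope times its mean of X; the dummy coefficient is the gap between the two intercepts. The sketch below uses that shortcut on hypothetical data generated exactly from $Y = 20 + 0.6X + 3D$; the function name and the numbers are made up for illustration.

```python
def fit_dummy_model(x, y, d):
    """OLS estimates (alpha, beta2, beta3) for y = alpha + beta2*x + beta3*d + u,
    where d is a 0/1 dummy, computed from within-group deviations."""
    groups = {0: [], 1: []}
    for xi, yi, di in zip(x, y, d):
        groups[di].append((xi, yi))

    means = {}
    sxx = sxy = 0.0
    for g, obs in groups.items():
        x_bar = sum(xi for xi, _ in obs) / len(obs)
        y_bar = sum(yi for _, yi in obs) / len(obs)
        means[g] = (x_bar, y_bar)
        # Pool deviations from each group's own means
        sxx += sum((xi - x_bar) ** 2 for xi, _ in obs)
        sxy += sum((xi - x_bar) * (yi - y_bar) for xi, yi in obs)

    beta2 = sxy / sxx                                   # common slope
    alpha = means[0][1] - beta2 * means[0][0]           # no-placement intercept
    beta3 = (means[1][1] - beta2 * means[1][0]) - alpha  # intercept shift
    return alpha, beta2, beta3


# Hypothetical students: year 2 average and placement indicator
year2 = [50.0, 55.0, 60.0, 65.0, 52.0, 58.0, 63.0, 70.0]
placement = [0, 0, 0, 0, 1, 1, 1, 1]
final = [20.0 + 0.6 * xi + 3.0 * di for xi, di in zip(year2, placement)]

alpha, beta2, beta3 = fit_dummy_model(year2, final, placement)
print(f"alpha = {alpha:.3f}, beta2 = {beta2:.3f}, beta3 = {beta3:.3f}")
```

Here the estimated shift is the vertical distance between the two parallel lines, which is the quantity tested under $H_0: \beta_3 = 0$.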
1.47
Student Performance
LimDep Regression Output: Multiple Case
[Figure: LimDep output for the multiple regression, annotated to show the impact of the dummy variable]

1.48
The Practice of Econometrics

Economic theory
Econometric model
Data
Estimation
Specification testing and diagnostic testing
Is the model adequate? If no, revisit the earlier steps; if yes, continue to:
Hypothesis testing
Policy: prediction and forecasting
1.49
Summary
In this session we have:
Outlined what econometrics is all about
Outlined the methodology of econometrics and illustrated the process using economic models
1.50
Next Week
A much fuller discussion of OLS. In particular, we will look at the assumptions and properties of what is known as the classical regression model
...using matrix algebra!
1.51
