
Session 1

The Nature of Econometrics and the Modelling Process
1.1

Aims and Learning Objectives


By the end of this session students should be able to:
- Appreciate that the study of econometrics is important
- Understand the nature of the modelling process
- Construct and interpret simple (bivariate) and multiple regression models
- Understand the purpose and use of variable transformations and the incorporation of qualitative variables in the context of regression analysis
1.2

What is Econometrics?
Historical Context
Econometrics as Art or Science?

1.3

Economic Decisions
To use information effectively:
economic theory + economic data -> economic decisions

*Econometrics* helps us combine economic theory and economic data.
1.4

Types of Data: Cross-Sectional

- Cross-sectional data is a random sample
- Each observation is an individual, firm, etc., with information at a point in time
- If the data is not a random sample, we have a sample-selection problem

1.5

Types of Data: Panel

- We can pool random cross sections and treat the result much like a normal cross section; we just need to account for time differences.
- We can follow the same randomly sampled individuals over time; this is known as panel data or longitudinal data.

1.6

Types of Data: Time Series

- Time series data has a separate observation for each time period, e.g. stock prices
- Since it is not a random sample, there are different problems to consider

1.7

The Modelling Process


8 STAGE PROCESS

(1) Statement of theory/hypothesis


(2) Specification of mathematical model
(3) Specification of the econometric model
(4) Obtaining the data / conduct preliminary data
analysis
(5) Estimation of the econometric model and
interpretation of regression results
(6) Diagnostic Analysis
(7) Hypothesis testing
(8) Prediction/forecasting
1.8

EXAMPLE
Degree Performance in Economics

A number of factors determine performance:


Ability
Family background
Effort
Let us look at this from the perspective of
ability only and analyse this using a simple
bivariate (2 variable) model
1.9

Student Performance
STAGE 1 - Statement of Theory/Hypothesis
Student Performance Function:
Student degree performance is determined by
ability

1.10

Student Performance
STAGE 2 - Mathematical Model
Performance, P, is some function of ability, A :

$P = f(A)$    (1)

In linear form:

$Y = \beta_1 + \beta_2 X$    (2)

where Y = performance and X = ability


1.11

Student Performance
STAGE 3 - Econometric Model

$Y_i = \beta_1 + \beta_2 X_i + U_i$    (3)

Y = Performance - the dependent variable


X = Ability - explanatory variable
U = Disturbance (random error) term
For this particular example we will collect data on
year 2 average and final year average

1.12

Student Performance
STAGE 4a - Obtaining the data
Observed values of Y (yr 3 average)
and X (yr 2 average)
STAGE 4b - Preliminary Data Analysis
Descriptive Statistics, graphical charts
(initial identification of possible errors:
outliers, influential observations and lurking
variables)
1.13

Student Performance
[Figure: scatter plot "Year 3 Average Against Year 2 Average"; vertical axis: Year 3 Average (0 to 80), horizontal axis: Year 2 Average (10 to 90)]

What can we say about the relationship between year 3 average and year 2 average? Subjective judgement.

1.14

Student Performance

STAGE 5a - Estimation of the Parameters


Y and X are the variables - known
$\beta_1$, $\beta_2$ and U are the parameters - unknown
Estimators versus Estimates

1.15

Student Performance
Simple (Bivariate) Least Squares (Linear) Regression
Y = f(X)
Question
How shall we fit a line to the points in a scatterplot?
Answer
Use the method of least squares (commonly known
as ordinary least squares or OLS for short)
The least squares regression line is the line that
minimises the sum of squared vertical deviations of the
data points from the line

1.16

The Method of Ordinary Least Squares


Why use OLS?
(1) OLS is relatively easy to use
(2) The goal of minimising $\sum_{i=1}^{n} e_i^2$ is appropriate from a theoretical point of view
(3) OLS estimators have a number of useful properties
1.17

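Equation of a straight line: $Y = \alpha + \beta X$

Line of best fit is: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$

The vertical deviation of the ith observation from the line is:
Deviation = Observed Y minus Predicted Y

$e_i = Y_i - \hat{Y}_i$
$e_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$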

1.18

Student Performance
[Figure: scatter plot "Year 3 Average Against Year 2 Average"; vertical axis: Year 3 Average (0 to 80), horizontal axis: Year 2 Average (10 to 90)]

1.19

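[Figure: four observations (X1, Y1), ..., (X4, Y4) plotted around the fitted line, with the residuals e1, ..., e4 drawn as the vertical distances between each observed Y and its fitted value; $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$]

The relationship among Y, e and the fitted regression line.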

1.20

The method of least squares therefore involves minimising:

$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$

The solution to this problem is... a homework question!
How?
Closed form solution (formulas from a textbook)
Regression software (Excel, LimDep, EViews)
Optimization software
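
As a minimal sketch of the "closed form" route, the textbook formulas for the bivariate case can be coded directly. The marks below are made-up illustrative values, not the course data set, and the function name `ols_bivariate` is hypothetical.

```python
def ols_bivariate(x, y):
    """Return (b1, b2): the intercept and slope that minimise the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared deviations of X.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    b1 = y_bar - b2 * x_bar
    return b1, b2


def sum_sq_resid(x, y, b1, b2):
    """Sum of squared vertical deviations from the line b1 + b2 * x."""
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical year 2 and year 3 averages (illustration only).
year2 = [45.0, 52.0, 58.0, 61.0, 67.0, 72.0]
year3 = [50.0, 58.0, 59.0, 64.0, 63.0, 70.0]

b1, b2 = ols_bivariate(year2, year3)
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(year2, year3)]
rss = sum_sq_resid(year2, year3, b1, b2)
print(f"intercept = {b1:.4f}, slope = {b2:.4f}, RSS = {rss:.4f}")
```

Any other line, for example one with a slightly different slope, gives a larger sum of squared residuals on the same data.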

1.21

Student Performance
Excel Regression Output: Bivariate Case
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.599443    (correlation coefficient)
R Square             0.359332    (coefficient of determination)
Adjusted R Square    0.356338
Standard Error       4.617821
Observations         216

36% of the variation in year 3 is explained by variation in year 2.

ANOVA
             df     SS          MS          F           Significance F
Regression   1      2559.477    2559.477    120.0265    1.84E-22
Residual     214    4563.393    21.32427
Total        215    7122.87

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept    29.4029        2.799513         10.50286   4.45E-21   23.88475    34.92106    23.88475      34.92106
Y2AV         0.538088       0.049115         10.95566   1.84E-22   0.441277    0.6349      0.441277      0.6349

The Intercept row is the estimate of $\beta_1$ and the Y2AV row is the estimate of $\beta_2$; the "t Stat" column gives the t statistic and the "P-value" column its p-value.
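
The summary statistics in this output are linked by simple arithmetic, which can be checked from the ANOVA figures alone. The sketch below only re-derives numbers already printed in the output; the variable names are illustrative.

```python
import math

# Figures copied from the Excel output above.
ess, rss, tss = 2559.477, 4563.393, 7122.87   # regression, residual and total sums of squares
n, k = 216, 2                                  # observations; estimated coefficients (intercept + slope)
slope, se_slope = 0.538088, 0.049115

r2 = ess / tss                                  # R Square
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)       # Adjusted R Square
std_err_reg = math.sqrt(rss / (n - k))          # "Standard Error" of the regression
f_stat = (ess / (k - 1)) / (rss / (n - k))      # F = MS regression / MS residual
t_slope = slope / se_slope                      # t Stat for Y2AV

print(f"R2 = {r2:.6f}, adjusted R2 = {adj_r2:.6f}")
print(f"standard error = {std_err_reg:.6f}, F = {f_stat:.4f}, t = {t_slope:.4f}")
```

In the bivariate case the F statistic equals the square of the slope's t statistic, which is why the two share the same p-value (1.84E-22).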

1.22

Student Performance
LimDep Regression Output: Bivariate Case

[Figure: LimDep regression output, annotated to show the regression diagnostics and the results of the regression equation]

1.23

Student Performance
STAGE 5b - Interpreting the regression results
Check the coefficient sign against
expectations (hypothesis)
(e.g. do we expect year 2 average to be positively
related to year 3 average?)
Check coefficient magnitude against
expectations (if any)
(e.g. are there any prior expectations on the possible
magnitude of the effect of year 2 average?)
1.24

Student Performance
STAGE 6 - Diagnostic Analysis
Is the model correctly specified?
- Correct functional form
- Omitted variables (or unnecessary ones)

Has the model got good diagnostic properties?


(validity of the probability distribution of the disturbance
term)
- Is the disturbance term uncorrelated with the
regressors?
- Are the values of the disturbance term
independently and normally distributed with mean
zero and variance $\sigma^2$?
1.25

Student Performance
STAGE 7 - Hypothesis Testing
Are the estimates statistically significant?
Do they conform with economic theory?
STAGE 8 - Forecasting/Prediction
For example, predicting the level of
performance for a particular ability level.
Possible policy implications?
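
As a small illustration of stage 8, the fitted bivariate line from the Excel output (intercept 29.4029, slope 0.538088) can be used to predict a year 3 average. The year 2 average of 60 is an arbitrary illustrative value and `predict_year3` is a hypothetical helper name.

```python
# Coefficient estimates taken from the bivariate regression output.
B1_HAT = 29.4029    # intercept
B2_HAT = 0.538088   # slope on the year 2 average


def predict_year3(year2_avg):
    """Point prediction of the year 3 average from the fitted line."""
    return B1_HAT + B2_HAT * year2_avg


pred = predict_year3(60.0)  # 60 is an arbitrary example value
print(f"Predicted year 3 average for a year 2 average of 60: {pred:.2f}")
```

This is a point prediction only; it says nothing about the uncertainty around the predicted value.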
1.26

Student Performance
Statistical Significance of the Regression
Coefficient
The significance test in regression analysis for
individual coefficients is the t-test.
The t-test is based on the t-statistic:

$t_k = \hat{\beta}_k / s_{\hat{\beta}_k}$,    k = 1, 2, ..., K

where $\hat{\beta}_k$ is the coefficient estimate and $s_{\hat{\beta}_k}$ is the standard error of the estimate.

$H_0: \beta_k = 0$

1.27

Student Performance
What does the t-statistic tell us?
Provides us with a numerical value for the
significance of the estimated coefficient against
some pre-determined hypothesis.
Two ways of evaluating this numerical value:
Compare this test statistic to a critical value
Calculate the P-value and compare this to the
significance level chosen
1.28

Student Performance
In the student performance example the regression results can
be compactly written as:

$\widehat{Y3AV} = 29.403 + 0.538 \, Y2AV$

           Intercept     Y2AV
s.e.       (2.7995)      (0.049)
t          (10.503)      (10.956)
p-value    (4.45E-21)    (1.84E-22)

Test procedure:
Hypothesis is that $\beta_2 = 0$
The formula for testing this is $t = (\hat{\beta}_2 - 0) / s_{\hat{\beta}_2}$
Choose a significance level, $\alpha$ = 0.1, 0.05, 0.01
Calculate the p-value (see computer output)
Compare the p-value with the significance level:
- if p < $\alpha$, reject the null hypothesis
- if p > $\alpha$, do not reject the null hypothesis
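
The decision rule can be sketched in a few lines, using the slope's p-value from the computer output. The function name `decide` is hypothetical. As a cross-check on the "compare with a critical value" route, the sketch also backs out the 5% two-sided critical value implied by the reported 95% interval instead of looking it up in tables.

```python
def decide(p_value, alpha):
    """Apply the p-value rule: reject H0 when p is below the chosen significance level."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Slope results from the bivariate output.
b2_hat, se_b2, p_slope = 0.538088, 0.049115, 1.84e-22
lower_95 = 0.441277  # lower end of the reported 95% interval for the slope

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p_slope, alpha)}")

# The interval is b2_hat +/- t_crit * se, so the critical value can be recovered from it.
t_crit = (b2_hat - lower_95) / se_b2
t_stat = b2_hat / se_b2
print(f"t statistic = {t_stat:.3f}, implied 5% critical value = {t_crit:.3f}")
```

Both routes give the same answer here: the t statistic is far larger than the critical value, and the p-value is far below every conventional significance level.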

1.29

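Explaining Variation in Yi

$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$

Explained variation: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$
Unexplained variation: $e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$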
1.30

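Explaining Variation in Yi

[Figure: the four observations (X1, Y1), ..., (X4, Y4) around the fitted line $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$, with residuals e1, ..., e4, the sample mean $\bar{Y}$ drawn as a horizontal line, and the deviation from $\bar{Y}$ marked for observation 4]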

1.31

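Explaining Variation in Yi

$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2$

Or

$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$

(lower-case letters denote deviations from the sample mean)

Or
TSS = ESS + RSS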
1.32

Explaining Variation in Yi
The Coefficient of Determination, $R^2$:
the proportion of the total variation in the dependent
variable Y that is explained by the variation in the
independent variable X.

$R^2 = \sum \hat{y}_i^2 / \sum y_i^2 = ESS / TSS$

or

$R^2 = 1 - \sum e_i^2 / \sum y_i^2 = 1 - RSS / TSS$

$0 \le R^2 \le 1$
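
The decomposition of variation and the two equivalent formulas for the coefficient of determination can be checked numerically. This is a sketch on made-up data (not the course data set), with the bivariate least squares fit computed from the usual closed-form formulas.

```python
# Hypothetical data for illustration only.
x = [45.0, 52.0, 58.0, 61.0, 67.0, 72.0]
y = [50.0, 58.0, 59.0, 64.0, 63.0, 70.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Bivariate least squares fit.
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b1 = y_bar - b2 * x_bar
y_hat = [b1 + b2 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
ess = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained (residual) variation

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss
print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}")
print(f"R2 = {r2_from_ess:.4f} (ESS/TSS) = {r2_from_rss:.4f} (1 - RSS/TSS)")
```

The identity TSS = ESS + RSS relies on the line being the least squares fit with an intercept; for an arbitrary line the two sides need not match.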

1.33

Explaining Variation in Yi

R2

1.34

Data Transformations
Linear transformations do not affect the shape of
the distribution (e.g. Celsius into Fahrenheit; miles
into kilometres; pounds into kilograms)
The linear transformation, e.g.

$X_i^* = a + b X_i$

produces observations $X_1^*, X_2^*, X_3^*, \ldots, X_n^*$

In the context of regression analysis this means that
the slope and intercept coefficients and their standard
errors will change, but the coefficient of determination
and the t-ratio on the slope will remain the same
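
A small numerical check of what a linear transformation of X does to the regression output. The data are made up, the Celsius-to-Fahrenheit conversion (a = 32, b = 1.8) is just one example of $X^* = a + bX$, and `fit` is a hypothetical helper. The sketch shows that the slope and its standard error are both divided by b, so the slope's t-ratio and $R^2$ are unchanged, while the intercept moves.

```python
import math


def fit(x, y):
    """Bivariate OLS: intercept, slope, slope standard error, slope t-ratio and R^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = s_xy / s_xx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    se_b2 = math.sqrt(rss / (n - 2) / s_xx)  # estimated standard error of the slope
    return {"b1": b1, "b2": b2, "se_b2": se_b2, "t_b2": b2 / se_b2, "r2": 1 - rss / tss}


# Hypothetical data: X measured in Celsius, then re-expressed in Fahrenheit.
celsius = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
y = [12.0, 15.0, 21.0, 22.0, 30.0, 31.0]
a, b = 32.0, 1.8
fahrenheit = [a + b * c for c in celsius]

before = fit(celsius, y)
after = fit(fahrenheit, y)
for key in ("b1", "b2", "se_b2", "t_b2", "r2"):
    print(f"{key:6s} before = {before[key]:10.5f}   after = {after[key]:10.5f}")
```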

1.35

To change the shape of a distribution we use non-linear transformations
For example, a power, square root or logarithm
The purpose of using non-linear transformations
is that skewness can be reduced
In the context of regression analysis the logarithmic
transformation is commonly used:
- Some relationships are nonlinear
- Interpretation
1.36

Nonlinear Relations

[Figure: scatter plot of Yi against Xi showing a nonlinear relationship]
1.37

When the Dependent Variable and Explanatory Variables are in Logarithmic Form

Economic relationship as an exponential regression:

$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{U_i}$

Nonlinear in terms of the parameters - can't use OLS directly

Therefore, need to apply a suitable transformation:

$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + U_i$
1.38

When the Dependent Variable and Explanatory Variables are in Logarithmic Form

$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + U_i$

- Log-linear (log-log) model because both dependent variable and independent variables are in logarithmic form
- Coefficients measure the relative change in Y for a relative change in X (partial effects). Therefore, they can be interpreted as elasticities
- Often applied to production functions (following Cobb and Douglas, 1928)
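
To illustrate why the log transformation helps, the sketch below generates data from a simplified, single-regressor version of the exponential model, $Y = \beta_1 X^{\beta_2}$, with no disturbance term. Fitting a straight line to $\ln Y$ on $\ln X$ then recovers the elasticity. The parameter values (2.0 and 0.5) and the X values are made up for the illustration.

```python
import math

# Hypothetical "true" parameters: Y = beta1 * X ** beta2 (no disturbance, for clarity).
beta1, beta2 = 2.0, 0.5
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [beta1 * xi ** beta2 for xi in x]

# Take logs of both variables: ln Y = ln beta1 + beta2 * ln X is linear in the parameters.
ln_x = [math.log(v) for v in x]
ln_y = [math.log(v) for v in y]

# Bivariate least squares on the logged data.
n = len(ln_x)
lx_bar, ly_bar = sum(ln_x) / n, sum(ln_y) / n
slope = sum((a - lx_bar) * (b - ly_bar) for a, b in zip(ln_x, ln_y)) / sum((a - lx_bar) ** 2 for a in ln_x)
intercept = ly_bar - slope * lx_bar

print(f"estimated elasticity (slope) = {slope:.4f}")
print(f"estimated beta1 = exp(intercept) = {math.exp(intercept):.4f}")
```

With a disturbance term the recovery would be approximate, not exact, but the interpretation of the slope as an elasticity is the same.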

1.39

Regression Analysis: Multiple Variables

There are a number of potential pitfalls with the simple
(bivariate) regression model:
- Outliers - points that lie far from the fitted line
- Lurking Variables - a variable that has an important effect on the dependent variable but is not included in the simple regression model

The number of outliers can be reduced by taking
a non-linear transformation. What about lurking
variables?
1.40

Regression Analysis: Multiple Variables

Bivariate Regression: Y = f(X)
Multiple Regression: Y = f(X1, X2, ..., Xk)
Examples
- House price = f(size, bedrooms, location)
- Salary = f(education, experience, gender)
- Lecture Attendance = f(lecture notes on web, level of interest, time of day)

$Y_i = \alpha + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + e_i$


1.41

Notation
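The general form of the multiple regression model:

Summation form:
$y_i = \sum_{k=1}^{K} \beta_k x_{ik} + u_i$

Or

Matrix form:
$y = X\beta + u$

where

$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$, $X = \begin{pmatrix} x_{11} & \cdots & x_{1K} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nK} \end{pmatrix}$, $\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}$, $u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}$

Here $x_{ik}$ is the ith observation on the kth regressor, and $(x_{11}, \ldots, x_{n1})^T = (1, \ldots, 1)^T$, i.e. the first column of X is a column of ones for the intercept.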
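
As a rough sketch of the matrix form in practice, the data can be stacked into y and X (with a first column of ones) and the coefficients obtained by least squares. The numbers are made up and generated without a disturbance term so that the known coefficients are recovered exactly. The normal-equations expression $(X^T X)^{-1} X^T y$ is the standard OLS solution and goes beyond what this session derives. NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical design matrix: a column of ones, then two made-up regressors.
X = np.array([
    [1.0, 50.0, 2.0],
    [1.0, 60.0, 3.0],
    [1.0, 70.0, 3.0],
    [1.0, 80.0, 4.0],
    [1.0, 90.0, 5.0],
])
beta_true = np.array([10.0, 2.0, 5.0])  # illustrative coefficients
y = X @ beta_true                        # y = X beta (u set to zero for clarity)

# Least squares solution via NumPy's solver.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Same solution from the normal equations (X'X) beta = X'y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print("lstsq:           ", np.round(beta_hat, 6))
print("normal equations:", np.round(beta_normal, 6))
```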

1.42

Student Performance
[Figure: scatter plot "Year 3 Average Against Year 2 Average"; vertical axis: Year 3 Average (0 to 80), horizontal axis: Year 2 Average (10 to 90)]

Is there a lurking variable in the scatter?


1.43

Incorporating Qualitative Explanatory Variable(s)


So far we have considered how quantitative variables can
be incorporated into a regression equation.
Qualitative variables (principally, but not exclusively nominal
and ordinal data) can be included in a regression model

Typical examples include: gender, race, religion,


marital status.
Qualitative variables which take one of two values are
known as dummy variables
1.44

Incorporating Qualitative Explanatory Variable(s)


Suppose from our student performance example we
have information on whether the student undertook an
industrial placement in their 3rd year of study.
This represents qualitative info, since we only have info
on whether the student did or did not do a placement.
The usual practice is to create a coded variable, with a
code of 0 for the absence of the effect and 1 for the
presence of the effect.
Consequently we could create a variable which = 1 if the
student undertook a placement and 0 if they did not.
1.45

Incorporating Qualitative Explanatory Variable(s)

$Y_i = \alpha + \beta_2 X_i + \beta_3 D_i + U_i$
Yi = Final year average
Xi = Year 2 average
Di = 1 if did a placement,
Di = 0 otherwise.
Policy: Placement has a positive
effect on final year performance

$H_0: \beta_3 = 0$
$H_1: \beta_3 > 0$

1.46

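$Y_i = \alpha + \beta_2 X_i + \beta_3 D_i + U_i$

Placement: $Y_i = (\alpha + \beta_3) + \beta_2 X_i + U_i$
No Placement: $Y_i = \alpha + \beta_2 X_i + U_i$

[Figure: final year average (Yi) plotted against year 2 average (Xi) as two parallel lines with common slope $\beta_2$; the "Placement" line has intercept $\alpha + \beta_3$ and is drawn above the "No placement" line, which has intercept $\alpha$]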
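
The intercept-shift interpretation can be illustrated with a few lines of code. The coefficient values below are hypothetical (they are not the estimates from the course data) and `expected_final` is an illustrative helper. The point is only that the gap between the placement and no-placement lines equals $\beta_3$ at every year 2 average.

```python
# Hypothetical coefficient values for illustration only.
ALPHA, BETA2, BETA3 = 30.0, 0.5, 3.0


def expected_final(year2_avg, placement):
    """Systematic part of the model: alpha + beta2 * X + beta3 * D, with D = 1 for a placement."""
    d = 1 if placement else 0
    return ALPHA + BETA2 * year2_avg + BETA3 * d


# The vertical gap between the two lines is the same at any year 2 average.
gaps = [expected_final(x, True) - expected_final(x, False) for x in (40.0, 60.0, 80.0)]
print("gap between placement and no-placement lines:", gaps)
print("intercept with placement:", expected_final(0.0, True))
print("intercept without placement:", expected_final(0.0, False))
```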

1.47

Student Performance
LimDep Regression Output: Multiple Case

[Figure: LimDep regression output for the multiple regression, annotated to show the impact of the dummy variable]


1.48

The Practice of Econometrics


Economic theory
-> Econometric model
-> Data
-> Estimation
-> Specification testing and diagnostic testing
-> Is the model adequate?
   - No: return to an earlier stage and revise
   - Yes: continue
-> Hypothesis testing
-> Policy: prediction and forecasting

1.49

Summary
In this session we have:
Outlined what econometrics is all about
Outlined the methodology of econometrics and
illustrated the process using economic
models

1.50

Next Week

A much fuller discussion of OLS. In particular,
we will look at the assumptions and properties
of what is known as the classical regression
model...
...using matrix algebra!

1.51
