
Functional Form

Rómulo A. Chumacero


Motivation
What?: Extend OLS framework
Why?: Crucial in practice
How?: Using what we have learned

Outline

1. Scaling
2. Dummy variables / Time trends
3. Possible nonlinearities
4. Diagnostic tests
5. Measurement errors in variables
6. Omitting relevant variables
7. Including irrelevant variables
8. Multicollinearity
9. Influence analysis
10. Model selection
11. Specification searches

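Effects of Scaling
Data are not always conveniently scaled
Changing the scale of $x$ (using $x^* = cx$):
$$y = \beta_1 + \beta_2 x + u = \beta_1 + \left(\frac{\beta_2}{c}\right)(cx) + u$$
$$y = \beta_1 + \beta_2^* x^* + u, \qquad \beta_2^* = \frac{\beta_2}{c}$$
Changes the magnitude of the coefficient by the factor $1/c$
Changes the standard error of the coefficient by the same factor
$t$-statistics are not affected
Everything else remains unchanged

Changing the scale of $y$ (using $y^* = cy$):
$$y = \beta_1 + \beta_2 x + u$$
$$cy = c\beta_1 + c\beta_2 x + cu$$
$$y^* = \beta_1^* + \beta_2^* x + u^*, \qquad \beta_j^* = c\beta_j, \quad u^* = cu$$
Changes the magnitude of ALL coefficients by the factor $c$
Changes the standard errors of the coefficients by the same factor
$t$-statistics are not affected
Scales the residuals and changes the SER by the same factor
Everything else remains unchanged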
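As a quick numerical check of these scaling rules, here is a minimal sketch assuming NumPy and simulated data; the scale factor `c`, the seed and the coefficient values are arbitrary illustrative choices, and the helper `ols_t` (homoskedastic standard errors) is hypothetical, not from the text.

```python
import numpy as np

def ols_t(X, y):
    """OLS coefficients, standard errors, t-statistics and SER (homoskedastic formulas)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - k)
    se = np.sqrt(s2 * np.diag(XtX_inv))
    return b, se, b / se, np.sqrt(s2)

rng = np.random.default_rng(0)
n, c = 50, 1000.0                      # c: hypothetical change of units
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

b, se, t, ser = ols_t(X, y)                                             # original units
b_x, se_x, t_x, ser_x = ols_t(np.column_stack([np.ones(n), c * x]), y)  # x* = c x
b_y, se_y, t_y, ser_y = ols_t(X, c * y)                                 # y* = c y

print("slope:", b[1], "->", b_x[1], "(divided by c)")
print("t-stats unchanged:", np.allclose(t, t_x) and np.allclose(t, t_y))
```

Rescaling `x` divides the slope and its standard error by `c`; rescaling `y` multiplies every coefficient, standard error and the SER by `c`; the $t$-statistics are identical in all three fits.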

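Dummy Variables
Example: $E(y\mid\text{gender})$. Equivalent ways to model this:
Define a dummy variable
$$d_{1i} = \begin{cases} 1 & \text{female} \\ 0 & \text{male} \end{cases}$$
Thus, $y_i = \beta_0 + \beta_1 d_{1i} + u_i$, with $\beta_0 = E(y\mid\text{male})$ and $\beta_0 + \beta_1 = E(y\mid\text{female})$
Alternatively, define the variable
$$d_{2i} = \begin{cases} 0 & \text{female} \\ 1 & \text{male} \end{cases}$$
Thus, $y_i = \beta_0 + \beta_1 d_{2i} + u_i$, with $\beta_0 = E(y\mid\text{female})$ and $\beta_0 + \beta_1 = E(y\mid\text{male})$
Or $y_i = \beta_1 d_{1i} + \beta_2 d_{2i} + u_i$, with $\beta_1 = E(y\mid\text{female})$ and $\beta_2 = E(y\mid\text{male})$
Standard mistake: including an intercept, $d_1$ and $d_2$. They are perfectly collinear ($d_{1i} + d_{2i} = 1$)
If the equation of interest is $E(y\mid\text{gender}, \text{educ})$:
$$y_i = \beta_0 + \beta_1 d_{1i} + \beta_2\,\text{educ}_i + u_i$$
Intercept effect for gender, but the return on education is the same for both groups
A regression model allowing for slope differences (interactions) is
$$y_i = \beta_0 + \beta_1 d_{1i} + \beta_2\,\text{educ}_i + \beta_3\, d_{1i}\,\text{educ}_i + u_i$$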

Dummy Variables
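It is interesting to see how our estimators algebraically handle dummy variables
$$y = X_1\beta_1 + X_2\beta_2 + u$$
where $X_1 = d_1$ and $X_2 = d_2$ are the two dummy columns. By construction $X_1'X_2 = 0$. Thus, with $n_1 = \sum_{i=1}^n X_{1i}$,
$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'y = \frac{\sum_{i=1}^n X_{1i}\,y_i}{\sum_{i=1}^n X_{1i}} = \frac{1}{n_1}\sum_{i=1}^n X_{1i}\,y_i = \bar y_1$$
that is, the sample mean of $y$ in group 1 (similarly, $\hat\beta_2 = \bar y_2$), and
$$\hat V_{\hat\beta} = \hat\sigma^2(X'X)^{-1} = \begin{bmatrix} \hat\sigma^2 (X_1'X_1)^{-1} & 0 \\ 0 & \hat\sigma^2 (X_2'X_2)^{-1} \end{bmatrix} = \begin{bmatrix} \hat\sigma^2/n_1 & 0 \\ 0 & \hat\sigma^2/n_2 \end{bmatrix}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \hat u_i^2$$
Another candidate for the variance-covariance estimate, which allows the error variance to differ across the two groups, is
$$\tilde V_{\hat\beta} = \begin{bmatrix} \hat\sigma_1^2 (X_1'X_1)^{-1} & 0 \\ 0 & \hat\sigma_2^2 (X_2'X_2)^{-1} \end{bmatrix} = \begin{bmatrix} \hat\sigma_1^2/n_1 & 0 \\ 0 & \hat\sigma_2^2/n_2 \end{bmatrix}, \qquad \hat\sigma_j^2 = \frac{1}{n_j}\sum_{i=1}^n X_{ji}\,\hat u_i^2 \ \text{ for } j = 1, 2$$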
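The algebra above can be verified numerically. A minimal sketch, assuming NumPy and simulated data (group shares, means and error variances are hypothetical): OLS on the two dummies reproduces the group means, the pooled and group-specific variance estimates are the two diagonal matrices above, and adding an intercept triggers the dummy-variable trap.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Hypothetical data: d1 = 1 for group 1 ("female"), d2 = 1 - d1
d1 = (rng.random(n) < 0.4).astype(float)
d2 = 1.0 - d1
y = 10.0 + 2.0 * d1 + rng.normal(scale=1.0 + d1, size=n)   # group-specific error variance

# y = X1 b1 + X2 b2 + u with X = [d1, d2] and no intercept
X = np.column_stack([d1, d2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
n1, n2 = d1.sum(), d2.sum()
ybar1, ybar2 = y[d1 == 1].mean(), y[d2 == 1].mean()       # OLS reproduces these group means

u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / n                             # pooled: (1/n) sum u_i^2
V_pooled = np.diag([sigma2_hat / n1, sigma2_hat / n2])
sigma2_1 = np.mean(u_hat[d1 == 1] ** 2)                    # group-specific variances
sigma2_2 = np.mean(u_hat[d2 == 1] ** 2)
V_groups = np.diag([sigma2_1 / n1, sigma2_2 / n2])

# Dummy-variable trap: intercept + d1 + d2 has rank 2 only, because d1 + d2 = 1
rank_trap = np.linalg.matrix_rank(np.column_stack([np.ones(n), d1, d2]))
print(beta_hat, ybar1, ybar2, rank_trap)
```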

Time Trend
Many economic variables exhibit trends
Consider a series growing at a constant rate:
$$Y_t = Y_0(1+g)^t$$
where $g$ is the rate of growth per period
Taking logs (ln) of both sides:
$$\ln Y_t = \ln Y_0 + t\ln(1+g)$$
Adding a shock and changing notation:
$$y_t = \beta_1 + \beta_2 t + u_t$$
where $y_t = \ln Y_t$, $\beta_1 = \ln Y_0$ and $\beta_2 = \ln(1+g) \simeq g$
That $\beta_2$ measures the growth rate is easily checked by taking the first difference and ignoring the disturbance:
$$\Delta y_t = \beta_2$$
Important: the choice of unit for $t$ is irrelevant if it is used consistently
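A small simulation of the log-linear trend model, as a sketch assuming NumPy; the growth rate, sample length and noise level are hypothetical. It also illustrates that measuring $t$ in different units rescales $\hat\beta_2$ without changing the fit.

```python
import numpy as np

rng = np.random.default_rng(2)
T, g, Y0 = 120, 0.02, 100.0            # hypothetical: 120 periods, 2% growth per period
t = np.arange(T, dtype=float)
Y = Y0 * (1 + g) ** t * np.exp(rng.normal(scale=0.01, size=T))   # multiplicative shock
y = np.log(Y)                           # y_t = ln Y0 + t ln(1+g) + u_t

X = np.column_stack([np.ones(T), t])
(b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)
g_implied = np.exp(b2) - 1              # exact inversion of beta2 = ln(1+g)

# Same regression with t measured in "years" of 4 periods: the slope is 4 times larger,
# the intercept and fitted values are identical
X_years = np.column_stack([np.ones(T), t / 4])
(b1_y, b2_y), *_ = np.linalg.lstsq(X_years, y, rcond=None)
print(b2, np.log(1 + g), g_implied, b2_y)
```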

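Seasonality
Economic time series are often viewed as the sum of trend, seasonal, cyclical and irregular components:
$$y_t = T_t + S_t + C_t + I_t$$
With quarterly data, let $D_{jt} = 1$ in quarter $j$ and 0 otherwise (the fourth quarter is the base):
$$y_t = \beta_0 + \beta_1 D_{1t} + \beta_2 D_{2t} + \beta_3 D_{3t} + \beta_4 t + u_t \tag{1}$$
$E(y_t\mid\text{first quarter}) = \beta_0 + \beta_1 + \beta_4 t$; $E(y_t\mid\text{fourth quarter}) = \beta_0 + \beta_4 t$.
Seasonally adjusted series: remove the estimated seasonal component
$$\hat S_t = \hat\beta_1 D_{1t} + \hat\beta_2 D_{2t} + \hat\beta_3 D_{3t}, \qquad y_t^{sa} = y_t - \hat S_t$$
[Figure 1: Use of Quarterly Dummies. Series: GDP, Seasonally Adjusted GDP (Detrended), quarterly, 1987 to 1997]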
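A sketch of equation (1) and of seasonal adjustment with quarterly dummies, assuming NumPy and simulated data (the seasonal effects, trend and noise level are hypothetical, not the GDP series in Figure 1):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 44                                        # hypothetical: 11 years of quarterly data
t = np.arange(T, dtype=float)
quarter = np.arange(T) % 4 + 1                # 1, 2, 3, 4, 1, 2, ...
# D1, D2, D3 (the fourth quarter is the base category)
D = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3)])
true_seasonal = np.array([0.10, -0.05, 0.03])
y = 4.5 + 0.01 * t + D @ true_seasonal + rng.normal(scale=0.01, size=T)

# Equation (1): y_t = b0 + b1 D1 + b2 D2 + b3 D3 + b4 t + u_t
X = np.column_stack([np.ones(T), D, t])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

seasonal_hat = D @ b[1:4]                     # estimated seasonal component
y_sa = y - seasonal_hat                       # seasonally adjusted series

# Re-running (1) on the adjusted series leaves no seasonal pattern
b_sa, *_ = np.linalg.lstsq(X, y_sa, rcond=None)
print(b[1:4], b_sa[1:4])
```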

Nonlinearity in Regressors
We are interested in $E(y\mid x) = g(x)$, $x \in \mathbb{R}$, where the form of $g$ is unknown
Common approach: polynomial approximation
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p + u$$
Let $z = (1, x, \ldots, x^p)'$ and $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$; this is $y = z'\beta + u$, which is a linear regression model (LRM)
Typically, $p$ is kept small
If $x \in \mathbb{R}^2$, a simple quadratic approximation is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + u$$
As dimensionality increases, approximations become non-parsimonious
Most applications use quadratic terms; some add cubics without interactions (or use neural nets, Fourier series, splines, wavelets, etc.):
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \beta_6 x_1^3 + \beta_7 x_2^3 + u$$
Since these models are linear in parameters, they can be estimated by OLS, and inference is conventional

Nonlinearity in Regressors
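However, the model is nonlinear in the regressors, so interpretation must take this into account
For example, in the cubic model, the slope with respect to $x_1$ is
$$\frac{\partial E(y\mid x)}{\partial x_1} = \beta_1 + 2\beta_3 x_1 + \beta_5 x_2 + 3\beta_6 x_1^2$$
which is a function of $x_1$ and $x_2$, making reporting of the slope difficult
It is important to report slopes for different values of the regressors, chosen to illustrate the point of interest
In other applications, an average slope may be sufficient. Two obvious candidates:
Derivative evaluated at sample averages
$$\left.\frac{\partial E(y\mid x)}{\partial x_1}\right|_{x = \bar x} = \beta_1 + 2\beta_3 \bar x_1 + \beta_5 \bar x_2 + 3\beta_6 \bar x_1^2$$
and average derivative
$$\frac{1}{n}\sum_{i=1}^n \frac{\partial E(y_i\mid x_i)}{\partial x_1} = \beta_1 + 2\beta_3 \bar x_1 + \beta_5 \bar x_2 + 3\beta_6 \frac{1}{n}\sum_{i=1}^n x_{1i}^2$$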
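The two average-slope candidates can differ. A minimal sketch, assuming NumPy and a simulated cubic data-generating process (coefficients hypothetical), fits the cubic model by OLS and computes both; their difference equals $3\hat\beta_6$ times the (divide-by-$n$) sample variance of $x_1$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(1.0, 1.0, n)
x2 = rng.normal(0.0, 1.0, n)
# Hypothetical cubic DGP with an interaction term
y = (1 + 0.5*x1 - 0.3*x2 + 0.2*x1**2 + 0.1*x2**2 + 0.4*x1*x2
     + 0.05*x1**3 - 0.02*x2**3 + rng.normal(size=n))

# Columns ordered as in the cubic model: 1, x1, x2, x1^2, x2^2, x1*x2, x1^3, x2^3
Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1*x2, x1**3, x2**3])
b, *_ = np.linalg.lstsq(Z, y, rcond=None)

def slope_x1(b, x1, x2):
    """dE(y|x)/dx1 = b1 + 2 b3 x1 + b5 x2 + 3 b6 x1^2 (works for scalars or arrays)."""
    return b[1] + 2*b[3]*x1 + b[5]*x2 + 3*b[6]*x1**2

at_means = slope_x1(b, x1.mean(), x2.mean())        # derivative at sample averages
average_derivative = slope_x1(b, x1, x2).mean()     # average of individual derivatives
print(at_means, average_derivative)
```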

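Transformations
Even the simplest models usually consider nonlinearities
Example: Cobb-Douglas production function
$$Y_i = A K_i^{\beta_2} L_i^{\beta_3} e^{u_i}$$
Take logs to obtain
$$\ln Y_i = \beta_1 + \beta_2 \ln K_i + \beta_3 \ln L_i + u_i, \qquad \beta_1 = \ln A \tag{2}$$
Or: $y_i = x_i'\beta + u_i$, with $y_i = \ln Y_i$ and $x_i = (1, \ln K_i, \ln L_i)'$
Instances in which no transformation can be used: CES
$$Y_i = \left[\delta K_i^{\rho} + (1 - \delta) L_i^{\rho}\right]^{1/\rho} + u_i$$
OLS cannot be applied; the model must be estimated by nonlinear least squares (NLLS)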
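A minimal sketch of estimating the Cobb-Douglas function through its log form (2) by OLS, assuming NumPy and simulated inputs; the values of $A$, $\beta_2$, $\beta_3$ and the error scale are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
A, beta2, beta3 = 2.0, 0.3, 0.6               # hypothetical "true" parameters
K = rng.lognormal(mean=3.0, sigma=0.5, size=n)
L = rng.lognormal(mean=2.0, sigma=0.5, size=n)
Y = A * K**beta2 * L**beta3 * np.exp(rng.normal(scale=0.1, size=n))  # multiplicative error

# Equation (2): ln Y = beta1 + beta2 ln K + beta3 ln L + u, linear in parameters
X = np.column_stack([np.ones(n), np.log(K), np.log(L)])
b1_hat, b2_hat, b3_hat = np.linalg.lstsq(X, np.log(Y), rcond=None)[0]
A_hat = np.exp(b1_hat)
print(A_hat, b2_hat, b3_hat)
```

The trick works only because the error enters multiplicatively; with the additive error of the CES specification no such transformation exists.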

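Some Useful Functions
Choice of functional form affects interpretation of results
Requires good analytic skills and experience

| Name | Function | Slope $= dy/dx$ | Elasticity $= (dy/dx)(x/y)$ |
| --- | --- | --- | --- |
| Linear | $y = \beta_1 + \beta_2 x$ | $\beta_2$ | $\beta_2\, x/y$ |
| Quadratic | $y = \beta_1 + \beta_2 x^2$ | $2\beta_2 x$ | $2\beta_2\, x^2/y$ |
| Cubic | $y = \beta_1 + \beta_2 x^3$ | $3\beta_2 x^2$ | $3\beta_2\, x^3/y$ |
| Log-log | $\ln y = \beta_1 + \beta_2 \ln x$ | $\beta_2\, y/x$ | $\beta_2$ |
| Log-linear | $\ln y = \beta_1 + \beta_2 x$ | $\beta_2\, y$ | $\beta_2\, x$ |
| Linear-log | $y = \beta_1 + \beta_2 \ln x$ | $\beta_2/x$ | $\beta_2/y$ |

If ln is used, its argument must be positive ($y > 0$ or $x > 0$)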
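The slope and elasticity columns can be checked numerically with a central finite difference. A sketch in plain Python (standard library only); the coefficients `b1`, `b2` and the evaluation point `x0` are arbitrary.

```python
import math

b1, b2 = 1.5, 0.8      # arbitrary illustrative coefficients
x0 = 2.0               # point at which slopes and elasticities are evaluated
h = 1e-6               # step for the central difference

# name: (y as a function of x, slope from the table, elasticity from the table)
forms = {
    "linear":     (lambda x: b1 + b2 * x,                   lambda x, y: b2,            lambda x, y: b2 * x / y),
    "quadratic":  (lambda x: b1 + b2 * x**2,                lambda x, y: 2 * b2 * x,    lambda x, y: 2 * b2 * x**2 / y),
    "cubic":      (lambda x: b1 + b2 * x**3,                lambda x, y: 3 * b2 * x**2, lambda x, y: 3 * b2 * x**3 / y),
    "log-log":    (lambda x: math.exp(b1 + b2 * math.log(x)), lambda x, y: b2 * y / x,  lambda x, y: b2),
    "log-linear": (lambda x: math.exp(b1 + b2 * x),         lambda x, y: b2 * y,        lambda x, y: b2 * x),
    "linear-log": (lambda x: b1 + b2 * math.log(x),         lambda x, y: b2 / x,        lambda x, y: b2 / y),
}

results = {}
for name, (f, slope, elasticity) in forms.items():
    y0 = f(x0)
    numeric_slope = (f(x0 + h) - f(x0 - h)) / (2 * h)
    results[name] = (numeric_slope, slope(x0, y0),
                     numeric_slope * x0 / y0, elasticity(x0, y0))
    print(f"{name:<11} slope {numeric_slope:.6f} vs {slope(x0, y0):.6f}")
```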

$\ln(y)$ versus $y$ as Dependent Variable
The econometrician can estimate $y = x'\hat\beta + \hat u$ or $\ln(y) = x'\hat\beta + \hat u$ (or both). Which is preferable?
Plain truth: either is fine, in the sense that $E(y\mid x)$ and $E(\ln(y)\mid x)$ are well-defined (so long as $y > 0$)
Selecting one specification over the other requires the imposition of additional structure (such as a conditional expectation that is linear in $x$, and $u \sim N(0, \sigma^2)$)
Some good reasons for preferring the $\ln(y)$ regression over the $y$ regression:
$E(\ln(y)\mid x)$ may be roughly linear in $x$, while $E(y\mid x)$ may be nonlinear, and linear models are easier to report and interpret
The errors $e = \ln(y) - E(\ln(y)\mid x)$ may be less heteroskedastic than the errors from the linear specification (although the reverse may be true!)
As long as $y > 0$, the range of $\ln(y)$ is $\mathbb{R}$; this is not the case for $y$, for which $x'\hat\beta$ may produce $\hat y < 0$ for some values of $x$ (Tobit)
If the distribution of $y$ is skewed, $E(y\mid x)$ may not be a useful measure of central tendency, and estimates will be influenced by extreme observations (outliers); $\exp[E(\ln(y)\mid x)]$ may be a better measure of central tendency, and more interesting to estimate and report
Be careful when the ln specification is used if the interest is in $E(y\mid x)$; by Jensen's inequality,
$$\exp[E(\ln(y)\mid x)] \neq E[\exp(\ln(y))\mid x] = E(y\mid x)$$
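The Jensen gap is easy to see with log-normal data. A minimal sketch assuming NumPy and simulated $\ln y \sim N(\mu, \sigma^2)$ with hypothetical $\mu$ and $\sigma$: $\exp[E(\ln y)]$ recovers the median $e^{\mu}$, not the mean; recovering the mean requires the factor $e^{0.5\sigma^2}$, a correction that is valid under log-normality only.

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma = 1.0, 0.8                    # hypothetical parameters of ln y
ln_y = rng.normal(mu, sigma, size=1_000_000)
y = np.exp(ln_y)

naive = np.exp(ln_y.mean())             # exp[E(ln y)]: estimates the median e^mu
mean_y = y.mean()                       # E(y) = exp(mu + 0.5 sigma^2)
corrected = np.exp(ln_y.mean() + 0.5 * ln_y.var())   # log-normal correction
print(naive, mean_y, corrected)
```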

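Testing for Omitted Nonlinearity
Simple test: add nonlinear functions of $x$ and test their significance
Let $z_i = h(x_i)$ denote nonlinear functions of $x_i$. Fit $y_i = x_i'\tilde\beta + z_i'\tilde\gamma + \tilde u_i$ by OLS, and test $H_0: \gamma = 0$
Ramsey RESET test. The null model is
$$y_i = x_i'\beta + u_i$$
Let $\hat y_i = x_i'\hat\beta$ and $z_i = (\hat y_i^2, \ldots, \hat y_i^m)'$. Fit
$$y_i = x_i'\tilde\beta + z_i'\tilde\gamma + \tilde u_i \tag{3}$$
by OLS, and form the Wald statistic $W_n$ for $H_0: \gamma = 0$; under $H_0$, $W_n \to_d \chi^2_{m-1}$
Typically $m = 2$, 3 or 4 seems to work best.
Works well as a test of functional form against smooth alternatives. It is powerful at detecting single-index models of the form
$$y_i = G(x_i'\beta) + u_i$$
where $G(\cdot)$ is a smooth link function. To see why this is the case, note that (3) may be written as
$$y_i = x_i'\tilde\beta + (x_i'\hat\beta)^2\tilde\gamma_1 + (x_i'\hat\beta)^3\tilde\gamma_2 + \cdots + (x_i'\hat\beta)^m\tilde\gamma_{m-1} + \tilde u_i$$
which has essentially approximated $G(\cdot)$ by an $m$-th order polynomial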
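A sketch of the RESET test, assuming NumPy. The Wald statistic uses the homoskedastic form $n(\text{SSR}_0 - \text{SSR}_1)/\text{SSR}_1$, which is one possible choice (a heteroskedasticity-robust version is another); the helper names `ols` and `reset_test` and the simulated data are hypothetical.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and residuals."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

def reset_test(X, y, m=3):
    """Ramsey RESET: augment X with yhat^2, ..., yhat^m and return the homoskedastic
    Wald statistic n (SSR0 - SSR1) / SSR1, asymptotically chi-square(m - 1) under H0."""
    n = len(y)
    b, e0 = ols(X, y)
    yhat = X @ b
    Z = np.column_stack([yhat**p for p in range(2, m + 1)])
    _, e1 = ols(np.column_stack([X, Z]), y)
    ssr0, ssr1 = e0 @ e0, e1 @ e1
    return n * (ssr0 - ssr1) / ssr1

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y_lin = 1 + x + rng.normal(scale=0.5, size=n)                  # null model is true
y_quad = 1 + x + 0.5 * x**2 + rng.normal(scale=0.5, size=n)    # omitted nonlinearity
W_lin = reset_test(X, y_lin)
W_quad = reset_test(X, y_quad)
print(W_lin, W_quad)   # 5% critical value of chi-square(2) is about 5.99
```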

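Are Errors Normally Distributed?
Normality of $u$ is not crucial for the desired properties of the OLS estimator (including asymptotic inference)
However, it allows the use of exact distributions in small samples, helps in forecasting, etc.
Jarque-Bera test: the most popular test for normality
$$JB = \frac{n}{6}\left[S^2 + \frac{(K - 3)^2}{4}\right]$$
which is asymptotically $\chi^2_2$ under the null of normality
$S$ (skewness) is a measure of asymmetry of the distribution around the mean
$$S = \frac{\mu_3}{\sigma^3} = \frac{1}{n}\sum_{i=1}^n \left(\frac{\hat u_i}{\hat\sigma}\right)^3$$
$S = 0$ symmetric, $S > 0$ long right tail, $S < 0$ long left tail
$K$ (kurtosis) measures the peakedness or flatness of the distribution
$$K = \frac{\mu_4}{\sigma^4} = \frac{1}{n}\sum_{i=1}^n \left(\frac{\hat u_i}{\hat\sigma}\right)^4$$
$K = 3$ (normal), $K > 3$ peaked (leptokurtic), $K < 3$ flat (platykurtic) relative to the normal
Important: if $\ln y \sim N(\mu, \sigma^2)$, $y$ is log-normal, with
$$E(y) = e^{\mu + 0.5\sigma^2}, \qquad \text{Median}(y) = e^{\mu}, \qquad V(y) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$$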
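A minimal implementation of the Jarque-Bera statistic as defined above, assuming NumPy; residuals are demeaned and scaled by the divide-by-$n$ standard deviation. The toy residuals and the exponential sample are illustrative only.

```python
import numpy as np

def jarque_bera(u):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), plus skewness S and kurtosis K of residuals u."""
    u = np.asarray(u, dtype=float)
    n = u.size
    e = u - u.mean()                  # OLS residuals already have mean zero with an intercept
    sigma = np.sqrt(np.mean(e**2))    # divide-by-n standard deviation
    S = np.mean((e / sigma) ** 3)     # skewness
    K = np.mean((e / sigma) ** 4)     # kurtosis
    return n / 6.0 * (S**2 + (K - 3.0) ** 2 / 4.0), S, K

# Symmetric toy residuals: S = 0 and K = 6.8 / 2^2 = 1.7 (platykurtic)
jb_sym, S_sym, K_sym = jarque_bera([-2, -1, 0, 1, 2])

rng = np.random.default_rng(8)
jb_exp, S_exp, K_exp = jarque_bera(rng.exponential(size=1000))   # long right tail
print(jb_sym, jb_exp)   # 5% critical value of chi-square(2) is about 5.99
```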

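Measurement Errors
$$y^* = \beta x^* + u$$
$y^*$ or $x^*$ not observed: $y = y^* + v$, $x = x^* + w$; $v \sim (0, \sigma_v^2)$, $w \sim (0, \sigma_w^2)$
Consider first that $y^*$ is observed with error. Then
$$y = \beta x^* + u + v = \beta x^* + \varepsilon, \qquad \varepsilon = u + v$$
The model satisfies the classical assumptions, so $\hat\beta$ is unbiased and efficient (though not as efficient as when $y^*$ is observed)
Now consider that $x^*$ is measured with error:
$$y^* = \beta(x - w) + u = \beta x + \eta$$
where $\eta = u - \beta w$. Since $x = x^* + w$, the regressor is correlated with the disturbance, given that
$$\text{Cov}(x, \eta) = \text{Cov}(x^* + w, u - \beta w) = -\beta\sigma_w^2$$
This violates the assumption of no correlation between $x$ and the error term, so $\hat\beta$ is biased and inconsistent.
The assumption that measurement errors are unsystematic is naive
If they are systematic, $\hat\beta$ may be biased and inconsistent even in the first case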
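A simulation sketch of both cases, assuming NumPy and independent normal $x^*$, $u$, $v$, $w$ (all variances hypothetical). With error in $y^*$ the slope stays close to $\beta$; with error in $x^*$ it converges to the attenuated value $\beta\sigma_{x^*}^2/(\sigma_{x^*}^2 + \sigma_w^2)$, which follows from the covariance above under these assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000
beta = 2.0
sigma2_x, sigma2_w = 1.0, 1.0                   # var of true regressor and of its measurement error
x_star = rng.normal(scale=np.sqrt(sigma2_x), size=n)
y_star = beta * x_star + rng.normal(size=n)

v = rng.normal(size=n)                          # error in the dependent variable
w = rng.normal(scale=np.sqrt(sigma2_w), size=n) # error in the regressor

def slope(x, y):
    """OLS slope without intercept (model y = beta x + error)."""
    return (x @ y) / (x @ x)

b_true = slope(x_star, y_star)                  # both variables observed correctly
b_y_err = slope(x_star, y_star + v)             # y* measured with error: still consistent
b_x_err = slope(x_star + w, y_star)             # x* measured with error: attenuated
plim_attenuated = beta * sigma2_x / (sigma2_x + sigma2_w)   # = 1.0 here
print(b_true, b_y_err, b_x_err, plim_attenuated)
```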

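Omitted Variables
Correct model:
$$y = X_1\beta_1 + X_2\beta_2 + u$$
Estimated model: $y = X_1\beta_1 + u$
$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'y = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'u$$
$$E(\hat\beta_1\mid X) = \beta_1 + \underbrace{(X_1'X_1)^{-1}X_1'X_2}_{P}\,\beta_2$$
$\hat\beta_1$ will generally be biased
Each column of $P$ is the vector of slopes from the regression of the corresponding column of $X_2$ on $X_1$
Unbiased if either $P = 0$, which states that $X_1$ and $X_2$ are orthogonal, or $\beta_2 = 0$
The direction of the bias is difficult to assess in the general case; consider $x_1$ and $x_2$ scalars:
$$E(\hat\beta_1) = \beta_1 + \frac{\text{Cov}(x_1, x_2)}{V(x_1)}\beta_2$$
If $\text{sgn}(\text{Cov}(x_1, x_2)\beta_2) > 0$, $E(\hat\beta_1) > \beta_1$ and the estimator will overestimate the effect of $x_1$ on $y$ (Friedman: Permanent Income)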
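A sketch, assuming NumPy and simulated data with hypothetical coefficients, of the scalar bias formula. It also checks an in-sample identity that holds exactly in any sample: the short-regression slope equals the long-regression slope on $x_1$ plus $\hat P$ times the long-regression slope on $x_2$.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100_000
beta1, beta2 = 1.0, 0.5                          # hypothetical true coefficients
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)    # Cov(x1, x2) = 0.6, V(x1) = 1
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

b_short, *_ = np.linalg.lstsq(x1[:, None], y, rcond=None)                # x2 omitted
b_long, *_ = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)   # correct model

P = (x1 @ x2) / (x1 @ x1)                # slope of the regression of x2 on x1
predicted_bias = 0.6 / 1.0 * beta2       # Cov(x1, x2) / V(x1) * beta2 = 0.3
print(b_short[0], beta1 + predicted_bias)
```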

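Omitted Variables
$$V(\hat\beta_1\mid X_1) = \sigma^2(X_1'X_1)^{-1}$$
If we had estimated the correct model, $E(\hat\beta_{1.2}) = \beta_1$, and $V(\hat\beta_{1.2}\mid X)$ would be the upper left block of $\sigma^2(X'X)^{-1}$, with $X = [X_1\ X_2]$:
$$V(\hat\beta_{1.2}\mid X) = \sigma^2(X_1'M_2X_1)^{-1}, \qquad M_2 = I - X_2(X_2'X_2)^{-1}X_2'$$
To compare both expressions, analyze their inverses:
$$\left[V(\hat\beta_1\mid X_1)\right]^{-1} - \left[V(\hat\beta_{1.2}\mid X)\right]^{-1} = \sigma^{-2}X_1'X_2(X_2'X_2)^{-1}X_2'X_1 \tag{4}$$
which is positive semidefinite! Hence $V(\hat\beta_{1.2}\mid X) - V(\hat\beta_1\mid X_1)$ is also positive semidefinite
We may be inclined to conclude that although $\hat\beta_1$ is biased, it has a smaller variance than $\hat\beta_{1.2}$
Nevertheless, $\sigma^2$ is not known and needs to be estimated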

Omitted Variables
Proceeding as usual (thinking that the estimated model is correct) we would obtain
$$\tilde\sigma^2 = \frac{\hat u'\hat u}{n - k_1}$$
but $\hat u = M_1y = M_1(X_1\beta_1 + X_2\beta_2 + u) = M_1X_2\beta_2 + M_1u$. Then
$$E(\hat u'\hat u) = \beta_2'X_2'M_1X_2\beta_2 + \sigma^2\,\text{tr}(M_1) = \beta_2'X_2'M_1X_2\beta_2 + \sigma^2(n - k_1)$$
First term: population counterpart to the increase in the SSR due to dropping $X_2$. As this term is positive, $\tilde\sigma^2$ will be biased upward (the true variance is smaller). Unfortunately, to take this bias into account we would need to know $\beta_2$.
In conclusion
If we omit a relevant variable, $\hat\beta_1$ and $\tilde\sigma^2$ are biased
Even when $\hat\beta_1$ may be more precise than $\hat\beta_{1.2}$, $\tilde\sigma^2$ does not estimate $\sigma^2$ consistently
The only case in which $\hat\beta_1$ would be unbiased is if $X_1$ and $X_2$ were orthogonal

Irrelevant Variables
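Correct model:
$$y = X_1\beta_1 + u$$
Estimated model: $y = X_1\beta_1 + X_2\beta_2 + u$
$$\hat\beta_{1.2} = (X_1'M_2X_1)^{-1}X_1'M_2y = \beta_1 + (X_1'M_2X_1)^{-1}X_1'M_2u$$
$$E\begin{bmatrix} \hat\beta_{1.2} \\ \hat\beta_2 \end{bmatrix} = \begin{bmatrix} \beta_1 \\ 0 \end{bmatrix}; \qquad E(\tilde\sigma^2) = E\left[\frac{\hat u'\hat u}{n - k_1 - k_2}\right] = \sigma^2$$
What is the problem? Overfitting! Cost: reduction in precision. Recall
$$\hat\beta_{1.2} = \beta_1 + (X_1'M_2X_1)^{-1}X_1'M_2u, \qquad V(\hat\beta_{1.2}\mid X) = \sigma^2(X_1'M_2X_1)^{-1}$$
But the variance of $\hat\beta_{1.2}$ is larger than if the correct model were estimated:
$$V(\hat\beta_1\mid X_1) = \sigma^2(X_1'X_1)^{-1}$$
(Asymptotically) as efficient if $X_1$ and $X_2$ are orthogonal
If $X_1$ and $X_2$ are highly correlated, including $X_2$ greatly inflates the variance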

Multicollinearity
Arises when measured variables are too highly intercorrelated to allow for precise analysis of the individual effects of each one
We will discuss:
nature
ways to detect it
effects
remedies.

Perfect Collinearity
If $\text{rank}(X'X) < k$, $\hat\beta$ is not defined. This happens iff the columns of $X$ are linearly dependent.
Most commonly, it arises when sets of regressors are identically related
Example: let $X$ include $\ln(x_1)$, $\ln(x_2)$ and $\ln(x_1x_2)$
When this happens, the error is quickly discovered: software will be unable to construct $(X'X)^{-1}$
Since the error is quickly discovered, this is rarely a problem in applied econometric practice
Thus, the problem with perfect multicollinearity is not with the data, but with bad specification.

Near Multicollinearity
In contrast to perfect collinearity, near multicollinearity is a statistical problem
The problem is not identification but precision
The higher the correlation between regressors, the less precise the estimates will be
What is troubling about this definition of the problem is that our complaint is with the sample that was given to us!
The usual symptoms of the problem are:
Small changes in the data produce wide swings in the estimates
$t$-statistics are not significant, yet $R^2$ is high (excuse?)
Coefficients have the wrong sign or implausible magnitudes
The problem arises when $X'X$ is near singular and the columns of $X$ are close to linear dependence (loose definition)
One implication of near singularity is that the numerical reliability of calculations is reduced (it is more likely that reported calculations will be in error due to floating-point calculation difficulties)

Near Multicollinearity
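The problem is with $(X'X)^{-1}$, whose $j$-th diagonal element is ($j = 1$ for convenience, with $X = [x_1\ X_2]$):
$$\left[(X'X)^{-1}\right]_{11} = \left(x_1'x_1 - x_1'X_2(X_2'X_2)^{-1}X_2'x_1\right)^{-1} = \left[x_1'x_1\left(1 - \frac{x_1'X_2(X_2'X_2)^{-1}X_2'x_1}{x_1'x_1}\right)\right]^{-1} = \frac{1}{x_1'x_1\left(1 - R_1^2\right)}$$
$R_1^2$ is the (uncentered) $R^2$ of the regression of $x_1$ on the other regressors. Thus,
$$V(\hat\beta_1) = \frac{\sigma^2}{x_1'x_1\left(1 - R_1^2\right)}$$
If a set of regressors is highly correlated with $x_1$, $R_1^2 \to 1$ and $V(\hat\beta_1) \to \infty$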
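The variance formula can be checked against the direct computation of $\sigma^2[(X'X)^{-1}]_{11}$. A sketch assuming NumPy and simulated, deliberately collinear regressors (no intercept, so the uncentered $R_1^2$ applies); $1/(1 - R_1^2)$ is what is usually called the variance inflation factor.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.9 * x2 + 0.3 * x3 + 0.2 * rng.normal(size=n)   # x1 highly correlated with x2
X = np.column_stack([x1, x2, x3])                        # no intercept: uncentered R^2
sigma2 = 1.0                                             # hypothetical error variance

# Direct: first diagonal element of sigma^2 (X'X)^{-1}
v_direct = sigma2 * np.linalg.inv(X.T @ X)[0, 0]

# Via the auxiliary regression of x1 on the other regressors
X_rest = X[:, 1:]
g, *_ = np.linalg.lstsq(X_rest, x1, rcond=None)
resid = x1 - X_rest @ g
R2_1 = 1.0 - (resid @ resid) / (x1 @ x1)                 # uncentered R^2
v_formula = sigma2 / ((x1 @ x1) * (1.0 - R2_1))
vif = 1.0 / (1.0 - R2_1)
print(v_direct, v_formula, vif)
```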

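Detection
Rule of thumb: be concerned when the overall $R^2$ is smaller than any $R_j^2$ (the $R^2$ of the regression of $x_j$ on the other regressors)
Alternative measure (Belsley) based on the condition number ($\kappa$)
$$\kappa = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}$$
where the $\lambda$s are the eigenvalues of $SX'XS$, with
$$S = \text{diag}\left((x_1'x_1)^{-1/2}, (x_2'x_2)^{-1/2}, \ldots, (x_k'x_k)^{-1/2}\right)$$
If the regressors are orthogonal ($R_j^2 = 0\ \forall j$), $\kappa = 1$. The higher the intercorrelation, the higher the condition number. If there is perfect collinearity, $\lambda_{\min} = 0$ and $\kappa = \infty$
Belsley suggests that $\kappa > 20$ indicates potential problems
Approaches used to deal with the problem:
Reduce the dimension of $X$ (drop variables). Obvious problem: omitted relevant variables, so $\hat\beta$ is biased
Principal components, ridge regression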
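A sketch of Belsley's condition number as defined above, assuming NumPy; the helper `condition_number` and the simulated designs (orthonormal columns versus a near-collinear pair) are hypothetical.

```python
import numpy as np

def condition_number(X):
    """kappa = sqrt(lambda_max / lambda_min) of S X'X S, with S scaling columns to unit length."""
    XtX = X.T @ X
    S = np.diag(1.0 / np.sqrt(np.diag(XtX)))
    lam = np.linalg.eigvalsh(S @ XtX @ S)    # symmetric matrix: real eigenvalues, ascending
    return np.sqrt(lam[-1] / lam[0])

rng = np.random.default_rng(12)
n = 100
# Orthonormal columns (from a QR factorization) give kappa = 1
Q, _ = np.linalg.qr(rng.normal(size=(n, 3)))
kappa_orth = condition_number(Q)

x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)         # near-collinear pair
kappa_near = condition_number(np.column_stack([np.ones(n), x1, x2]))
print(kappa_orth, kappa_near)
```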

Bottom Line
There is no pair of words that is more misused than "multicollinearity problem"
That explanatory variables are highly collinear is a fact of life
It is clear that there are realizations of $X'X$ which would be much preferred to the actual data
To complain about the apparent malevolence of nature is not constructive
Ad hoc cures for a bad sample can be disastrously inappropriate
Better to rightly accept the fact that non-experimental data are sometimes not very informative about parameters of interest

Bottom Line
Example to clarify what we are really talking about
Consider $y = \beta_1 x_1 + \beta_2 x_2 + u$
A regression of $x_2$ on $x_1$ yields $x_2 = \hat\delta x_1 + \hat v$, where $\hat v$ is (by construction) orthogonal to $x_1$
Substitute this auxiliary relationship into the original one to obtain the model
$$y = \beta_1 x_1 + \beta_2(\hat\delta x_1 + \hat v) + u = (\beta_1 + \beta_2\hat\delta)x_1 + \beta_2\hat v + u = \gamma_1 z_1 + \gamma_2 z_2 + u$$
where $\gamma_1 = \beta_1 + \beta_2\hat\delta$, $\gamma_2 = \beta_2$, $z_1 = x_1$ and $z_2 = x_2 - \hat\delta x_1$
A researcher who used $x_1$ and $x_2$ and the parameters $\beta_1$ and $\beta_2$ reports that $\beta_2$ is estimated inaccurately because of the collinearity problem
A researcher who happened to stumble on the model with variables $z_1$ and $z_2$ and parameters $\gamma_1$ and $\gamma_2$ would report that there is no collinearity problem, because $z_1$ and $z_2$ are orthogonal (recall that $x_1$ and $\hat v$ are orthogonal by construction). This researcher would nonetheless report that $\gamma_2\ (= \beta_2)$ is estimated inaccurately, not because of collinearity, but because $z_2$ does not vary adequately
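A numerical version of this example, as a sketch assuming NumPy and simulated data (coefficients and collinearity level hypothetical; the helper `ols_with_t` uses homoskedastic standard errors): in the reparameterized model $\hat\gamma_2 = \hat\beta_2$ and their $t$-statistics coincide, although $z_1$ and $z_2$ are exactly orthogonal.

```python
import numpy as np

def ols_with_t(X, y):
    """OLS estimates and conventional t-statistics (homoskedastic s^2 (X'X)^{-1})."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - k)
    return b, b / np.sqrt(s2 * np.diag(XtX_inv))

rng = np.random.default_rng(13)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)           # highly collinear with x1
y = x1 + 0.5 * x2 + rng.normal(size=n)

# Auxiliary regression x2 = delta x1 + v; v is orthogonal to x1 by construction
delta = (x1 @ x2) / (x1 @ x1)
v = x2 - delta * x1

b, t_b = ols_with_t(np.column_stack([x1, x2]), y)   # beta1, beta2
g, t_g = ols_with_t(np.column_stack([x1, v]), y)    # gamma1 = beta1 + beta2*delta, gamma2 = beta2
print(b[1], g[1], t_b[1], t_g[1])
```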

Bottom Line
This example illustrates that collinearity as a cause of weak evidence is indistinguishable from inadequate variability as a cause of weak evidence
In light of that fact, it is surprising that all econometrics texts have sections dealing with the collinearity problem but none has a section on the inadequate variability problem
In summary
Collinearity is bound to be present in applied econometric practice
There is no simple solution to this problem
Fortunately, multicollinearity does not lead to errors in inference
The asymptotic distribution is still valid. Estimates are asymptotically normal, and estimated standard errors are consistent
Confidence intervals are not misleading. They are large, correctly indicating the inherent uncertainty about the true parameter values

Influence Analysis
OLS seeks to prevent a few large residuals at the expense of incurring many relatively small residuals
A few observations can be extremely influential, in the sense that dropping them from the sample changes elements of $\hat\beta$ substantially
A systematic way to find those influential observations is: let $\hat\beta_{(i)}$ be the OLS estimate of $\beta$ that would be obtained if the $i$-th observation were omitted
The key equation is
$$\hat\beta_{(i)} - \hat\beta = -\left(\frac{1}{1 - h_i}\right)(X'X)^{-1}x_i\hat u_i \tag{5}$$
where
$$h_i = x_i'(X'X)^{-1}x_i$$
which is the $i$-th diagonal element of $P = X(X'X)^{-1}X'$. It is easy to show that
$$0 \le h_i \le 1 \quad\text{and}\quad \sum_{i=1}^n h_i = k \tag{6}$$
so $h_i$ equals $k/n$ on average.
What should be done with influential observations? Keep or drop?

[Figure 2: Monetary Policy Rate, Growth, and ... Panels: Growth vs. Policy Rate, with the observation 1998:09 labeled]
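Equation (5) avoids re-estimating the model $n$ times. A sketch assuming NumPy, with one hypothetical high-leverage outlier planted in simulated data, compares (5) with brute-force deletion and checks (6).

```python
import numpy as np

rng = np.random.default_rng(14)
n, k = 40, 2
x = rng.normal(size=n)
x[0] = 6.0                                   # hypothetical high-leverage observation...
X = np.column_stack([np.ones(n), x])
y = 1 + 0.5 * x + rng.normal(size=n)
y[0] = -3.0                                  # ...that also lies far from the line

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
u = y - X @ b
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_i = x_i'(X'X)^{-1} x_i

# Equation (5) for every i at once: row i is b_(i) - b
delta_formula = -(XtX_inv @ X.T * (u / (1 - h))).T

# Brute force: re-estimate without observation i
delta_brute = np.array([
    np.linalg.lstsq(np.delete(X, i, axis=0), np.delete(y, i), rcond=None)[0] - b
    for i in range(n)
])
most_influential = int(np.argmax(np.abs(delta_formula[:, 1])))   # largest change in the slope
print(most_influential, h[0], h.sum())
```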

Model Selection
We discussed the costs and benefits of including or excluding variables
How should a specification be selected when theory does not provide complete guidance?
This is the question of model selection
The question "What is the right model for $y$?" is not well posed: it does not make clear the conditioning set
The question "Which subset of $(x_1, \ldots, x_k)$ enters $E(y\mid x_1, \ldots, x_k)$?" is well posed.
In many cases, model selection reduces to comparing two nested models
$$y = X_1\beta_1 + X_2\beta_2 + u$$
$X_1$ is $n \times k_1$ and $X_2$ is $n \times k_2$. Compare
$$M_1: y = X_1\beta_1 + u$$
$$M_2: y = X_1\beta_1 + X_2\beta_2 + u$$

Model Selection
$$M_1: y = X_1\beta_1 + u$$
$$M_2: y = X_1\beta_1 + X_2\beta_2 + u$$
Note that $M_1 \subset M_2$
We say that $M_2$ is true if $\beta_2 \neq 0$
$M_1$ and $M_2$ are estimated by OLS, with residuals $\hat u_1$ and $\hat u_2$, estimated variances $\hat\sigma_1^2$ and $\hat\sigma_2^2$, etc., respectively
A model selection procedure is a data-dependent rule which selects one of the models ($\hat M$)
Desirable property for a model selection procedure: consistency
$$\Pr\left[\hat M = M_1 \mid M_1\right] \to 1$$
$$\Pr\left[\hat M = M_2 \mid M_2\right] \to 1$$

Selection Based on Fit


Natural measures of the fit of a regression are
the sum of squared residuals $\hat u'\hat u$
$R^2 = 1 - \dfrac{\hat u'\hat u}{n\hat\sigma_y^2}$, where $\hat\sigma_y^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar y)^2$
the Gaussian log-likelihood $\ell = -\frac{n}{2}\ln\hat\sigma^2 + c$ ($c$ is a constant)
It might be thought attractive to base model selection on one of these measures of fit
Problem: these measures are monotonic between nested models: $\hat u_1'\hat u_1 \ge \hat u_2'\hat u_2$, $R_1^2 \le R_2^2$ and $\ell_1 \le \ell_2$, so $M_2$ would always be selected, regardless of the actual data and probability structure
Clearly an inappropriate decision rule!

Selection Based on Testing


A common approach to model selection is to base the decision on a statistical test such as the Wald test
$$W_n = \frac{n\left(\hat\sigma_1^2 - \hat\sigma_2^2\right)}{\hat\sigma_2^2}$$
The model selection rule is: for a critical level $\alpha$, let $c_\alpha$ satisfy $\Pr\left(\chi^2_{k_2} > c_\alpha\right) = \alpha$. Select $M_1$ if $W_n \le c_\alpha$, else select $M_2$.
A major problem with this approach is that the critical level $\alpha$ is indeterminate
The reasoning which helps guide the choice of $\alpha$ in hypothesis testing (controlling Type I error) is not relevant for model selection. If $\alpha$ is set to be a small number, then $\Pr\left[\hat M = M_1 \mid M_1\right] \approx 1 - \alpha$, but $\Pr\left[\hat M = M_2 \mid M_2\right]$ could vary dramatically, depending on the sample size, etc.
Another problem is that if $\alpha$ is held fixed, the model selection procedure is inconsistent, as
$$\Pr\left[\hat M = M_1 \mid M_1\right] \to 1 - \alpha < 1$$

Selection Based on Adjusted R-squared


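As $R^2$ is not a useful model selection rule, since it prefers the larger model, Theil proposed an adjusted coefficient of determination
$$\bar R^2 = 1 - \frac{\hat u'\hat u/(n - k)}{\sum_{i=1}^n (y_i - \bar y)^2/(n - 1)} = 1 - \frac{\tilde\sigma^2}{\tilde\sigma_y^2}$$
At one time, it was popular to pick between models based on $\bar R^2$
The rule is to select $M_1$ if $\bar R_1^2 > \bar R_2^2$, else select $M_2$
Since $\bar R^2$ is monotonically decreasing in $\tilde\sigma^2$, the rule is the same as selecting the model with the smaller $\tilde\sigma^2$, or equivalently, the smaller $\ln\tilde\sigma^2$
It is helpful to observe that
$$\ln\tilde\sigma^2 = \ln\left(\hat\sigma^2\frac{n}{n - k}\right) = \ln\hat\sigma^2 + \ln\left(1 + \frac{k}{n - k}\right) \simeq \ln\hat\sigma^2 + \frac{k}{n - k} \simeq \ln\hat\sigma^2 + \frac{k}{n}$$
(the first approximation is $\ln(1 + x) \simeq x$ for small $x$).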
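A sketch, assuming NumPy and simulated data, comparing the exact expression $\ln\tilde\sigma^2 = \ln\hat\sigma^2 + \ln(1 + k/(n-k))$ with the approximation $\ln\hat\sigma^2 + k/n$ (the sample size and coefficients are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(15)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
k = X.shape[1]
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
sigma2_hat = e @ e / n                 # ML estimate
sigma2_tilde = e @ e / (n - k)         # degrees-of-freedom adjusted
tss = np.sum((y - y.mean()) ** 2)
R2 = 1 - (e @ e) / tss
R2_bar = 1 - (e @ e / (n - k)) / (tss / (n - 1))

exact = np.log(sigma2_hat) + np.log(1 + k / (n - k))   # equals ln sigma2_tilde
approx = np.log(sigma2_hat) + k / n                     # penalized-likelihood form
print(R2, R2_bar, exact, approx)
```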

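Selection Based on Adjusted R-squared
Selecting based on $\bar R^2$ is the same as selecting based on $\ln\hat\sigma^2 + \frac{k}{n}$, which is a particular choice of penalized likelihood criterion
It turns out that model selection based on any criterion of the form
$$\ln\hat\sigma^2 + c\,\frac{k}{n}, \qquad c > 0 \text{ fixed} \tag{7}$$
is inconsistent, as the rule tends to overfit. Indeed, since under $M_1$,
$$n\left(\ln\hat\sigma_1^2 - \ln\hat\sigma_2^2\right) \simeq W_n \to_d \chi^2_{k_2} \tag{8}$$
we have
$$\begin{aligned}
\Pr\left[\hat M = M_1 \mid M_1\right] &= \Pr\left[\bar R_1^2 > \bar R_2^2 \mid M_1\right] \\
&= \Pr\left[\ln\tilde\sigma_1^2 < \ln\tilde\sigma_2^2 \mid M_1\right] \\
&\simeq \Pr\left[\ln\hat\sigma_1^2 + \frac{k_1}{n} < \ln\hat\sigma_2^2 + \frac{k_1 + k_2}{n} \,\Big|\, M_1\right] \\
&\simeq \Pr\left[W_n < k_2 \mid M_1\right] \\
&\to \Pr\left(\chi^2_{k_2} < k_2\right) < 1
\end{aligned}$$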

Selection Based on Information Criteria


Akaike Information Criterion
Akaike proposed an information criterion which takes the form
$$AIC = -\frac{2\ell}{n} + \frac{2k}{n}$$
which with a Gaussian log-likelihood can be approximated by (7) with $c = 2$:
$$AIC \simeq \ln\hat\sigma^2 + \frac{2k}{n}$$
Imposes a larger penalty on overparameterization than does $\bar R^2$
Rule: select $M_1$ if $AIC_1 < AIC_2$, else select $M_2$
Since $AIC$ takes the form (7), it is an inconsistent model selection criterion, and tends to overfit

Selection Based on Information Criteria


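Schwarz Criterion
Modification of $AIC$: Schwarz (based on Bayesian arguments)
$$SIC = -\frac{2\ell}{n} + \frac{k\ln(n)}{n}$$
which with a Gaussian log-likelihood can be approximated by
$$SIC \simeq \ln\hat\sigma^2 + \frac{k\ln(n)}{n}$$
Since $\ln(n) > 2$ (if $n \ge 8$), $SIC$ places a larger penalty than $AIC$ on the number of estimated parameters (it is more parsimonious)
$SIC$ is consistent. Indeed, since (8) holds under $M_1$,
$$\frac{W_n}{\ln(n)} \to_p 0$$
so
$$\Pr\left[\hat M = M_1 \mid M_1\right] = \Pr\left[SIC_1 < SIC_2 \mid M_1\right] \simeq \Pr\left[W_n < k_2\ln(n) \mid M_1\right] = \Pr\left[\frac{W_n}{\ln(n)} < k_2 \,\Big|\, M_1\right] \to \Pr\left(0 < k_2 \mid M_1\right) = 1$$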

Selection Based on Information Criteria


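Schwarz Criterion
Also under $M_2$, one can show that
$$\frac{W_n}{\ln(n)} \to_p \infty$$
so
$$\Pr\left[\hat M = M_2 \mid M_2\right] = \Pr\left[SIC_2 < SIC_1 \mid M_2\right] \simeq \Pr\left[\frac{W_n}{\ln(n)} > k_2 \,\Big|\, M_2\right] \to 1$$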

Selection Based on Information Criteria


Hannan-Quinn Criterion
Another popular model selection criterion is:
$$HQ = -\frac{2\ell}{n} + \frac{2k\ln(\ln(n))}{n}$$
which with a Gaussian log-likelihood can be approximated by
$$HQ \simeq \ln\hat\sigma^2 + \frac{2k\ln(\ln(n))}{n}$$
Since $\ln(\ln(n)) > 1$ (if $n > 15$), $HQ$ places a larger penalty than $AIC$ on the number of estimated parameters and is more parsimonious
As $2\ln(\ln(n)) < \ln(n)$ (for all $n > 1$), $SIC$ places a larger penalty than $HQ$ and selects more parsimonious models
$HQ$ is consistent
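A sketch of the Gaussian approximations to $AIC$, $SIC$ and $HQ$ applied to an ordered sequence of nested models, assuming NumPy and simulated data in which only the first two of six candidate regressors matter (all values hypothetical; the helper `info_criteria` is not from the text). Since the penalties are ordered $\ln(n) > 2\ln(\ln(n)) > 2$ for this $n$, the selected model sizes always satisfy $SIC \le HQ \le AIC$.

```python
import numpy as np

def info_criteria(X, y):
    """Gaussian approximations with s2 = u'u/n (ML variance estimate):
    AIC ~ ln s2 + 2k/n, SIC ~ ln s2 + k ln(n)/n, HQ ~ ln s2 + 2k ln(ln n)/n."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    ls2 = np.log(e @ e / n)
    return {"AIC": ls2 + 2 * k / n,
            "SIC": ls2 + k * np.log(n) / n,
            "HQ": ls2 + 2 * k * np.log(np.log(n)) / n}

rng = np.random.default_rng(16)
n, p_max = 400, 6
x = rng.normal(size=(n, p_max))
y = 1 + 0.8 * x[:, 0] + 0.4 * x[:, 1] + rng.normal(size=n)   # only x1, x2 matter

# Ordered (nested) case: model j = intercept plus the first j regressors
crit = {"AIC": [], "SIC": [], "HQ": []}
for j in range(p_max + 1):
    Xj = np.column_stack([np.ones(n), x[:, :j]])
    for name, value in info_criteria(Xj, y).items():
        crit[name].append(value)
selected = {name: int(np.argmin(values)) for name, values in crit.items()}
print(selected)   # number of slope regressors chosen by each criterion
```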

Selection Based on Information Criteria
A Final Word of Caution
These results were obtained in an OLS context with Gaussian innovations
To compare different models, the dependent variable and the sample size need to be the same
Which model selection criterion is best? This is an open question and an active field of research
While consistency is desirable, there may be cases in which more parsimonious models run the risk of excluding relevant variables; that is why some researchers prefer $HQ$, which is consistent and not as parsimonious as $SIC$
From a practical standpoint, it is important to look at all three criteria. Who knows, they may all choose the same model!

Selection Among Multiple Regressors


$$y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$$
Which regressors enter the regression?
Ordered (nested) case:
$$M_1: \beta_1 \neq 0,\ \beta_2 = \beta_3 = \cdots = \beta_k = 0$$
$$M_2: \beta_1 \neq 0,\ \beta_2 \neq 0,\ \beta_3 = \cdots = \beta_k = 0$$
$$\vdots$$
$$M_k: \beta_1 \neq 0,\ \beta_2 \neq 0,\ \ldots,\ \beta_k \neq 0$$
which are nested. Select the model that minimizes the criterion
Unordered case: $2^k$ models. For example, $2^{10} = 1\,024$ and $2^{20} = 1\,048\,576$. Computationally demanding

Specification Searches
Theory is often vague about the relationship between variables
As a result, many relations are established from empirical regularities
If not accounted for, this practice can generate serious biases in inference
Names: data mining, data snooping, data grubbing, data fishing
Examples:
Because of space limitations, only the best of a variety of alternative models are presented here.
The precise variables included in the regression were determined on the basis of extensive experimentation (on the same body of data).
Since there is no firmly validated theory, we avoided a priori specification of the functions we wished to fit.
We let the data specify the model.
Newsletter scam
Conventional hypothesis testing is valid when a priori considerations, rather than exploratory data mining, determine the set of variables included
When a miner uncovers $t$-statistics that appear significant at the 0.05 level by running a large number of alternative regressions on the same body of data, the probability of a Type I error is much greater than the claimed 5%
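A Monte Carlo sketch of the last point, assuming NumPy: $y$ is regressed, one at a time, on 20 candidate regressors that are all unrelated to it, and a "discovery" is recorded whenever any $|t| > 1.96$ (a normal approximation to the 5% critical value). The design is hypothetical; with independent tests the chance of at least one false rejection would be $1 - 0.95^{20} \approx 0.64$.

```python
import numpy as np

rng = np.random.default_rng(17)
reps, n, m = 1000, 100, 20        # replications, sample size, candidate regressors tried
crit = 1.96                       # normal approximation to the 5% two-sided critical value

hits = 0
for _ in range(reps):
    y = rng.normal(size=n)                      # y is unrelated to every candidate
    X = rng.normal(size=(n, m))
    xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sxx = np.sum(xc**2, axis=0)
    b = xc.T @ yc / sxx                         # slope of y on each x_j (with intercept)
    resid = yc[:, None] - xc * b                # residuals of each simple regression
    s2 = np.sum(resid**2, axis=0) / (n - 2)
    t = b / np.sqrt(s2 / sxx)
    hits += np.any(np.abs(t) > crit)            # at least one "significant" regressor

empirical = hits / reps
nominal_if_independent = 1 - 0.95**m            # about 0.64 for m = 20
print(empirical, nominal_if_independent)
```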
