Topic II (Functional Form)

Functional Form
Rómulo A. Chumacero
Motivation
What?: Extend OLS framework
Why?: Crucial in practice
How?: Using what we have learned
Outline
1. Scaling
2. Dummy variables / Time trends
3. Possible nonlinearities
4. Diagnostic tests
5. Measurement errors in variables
6. Omitting relevant variables
7. Including irrelevant variables
8. Multicollinearity
9. Influential analysis
10. Model selection
11. Specification searches
Effects of Scaling
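Data are not always conveniently scaled
Changing the scale of $x$ (let $x^* = cx$ for a constant $c \neq 0$):
$y = \beta_1 + \beta_2 x + \varepsilon = \beta_1 + (\beta_2/c)(cx) + \varepsilon$
$y = \beta_1 + \beta_2^* x^* + \varepsilon$, with $\beta_2^* = \beta_2/c$
Changes the magnitude of the coefficient by $1/c$
Changes the standard error of the coefficient by the same factor
$t$-statistics are not affected
Everything else remains unchanged
Changing the scale of $y$ (let $y^* = cy$):
$y = \beta_1 + \beta_2 x + \varepsilon$
$cy = c\beta_1 + c\beta_2 x + c\varepsilon$
$y^* = \beta_1^* + \beta_2^* x + \varepsilon^*$, with $\beta_j^* = c\beta_j$ and $\varepsilon^* = c\varepsilon$
Changes the magnitude of ALL coefficients by $c$
Changes the standard errors of the coefficients by the same factor
$t$-statistics are not affected
Scales the residuals and changes the SER by the same factor
Everything else remains unchanged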
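The invariance results above can be checked numerically. The following is a minimal NumPy sketch on simulated data. The factor `c = 1000` and the helper `ols` are illustrative assumptions and are not part of the original notes.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, conventional standard errors, t-statistics and SER."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (T - k)                  # squared SER (degrees-of-freedom corrected)
    se = np.sqrt(s2 * np.diag(XtX_inv))
    return b, se, b / se, np.sqrt(s2)

rng = np.random.default_rng(0)
T, c = 200, 1000.0                        # c: hypothetical rescaling factor
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])

b, se, t, ser = ols(X, y)
# Rescale x: the slope and its standard error shrink by 1/c; t and SER are unchanged
bx, sex, tx, serx = ols(np.column_stack([np.ones(T), c * x]), y)
# Rescale y: all coefficients, standard errors and the SER grow by c; t is unchanged
by, sey, ty, sery = ols(X, c * y)
print(bx[1] / b[1], ty / t, sery / ser)   # approximately 1/c, [1, 1], c
```

Changing units (for example, from dollars to thousands of dollars) changes how the coefficients read. It does not change what the data say.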
Dummy Variables
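Example: $E(y \mid \text{gender})$. Equivalent ways to model this:
Define a dummy variable
$D_1 = 1$ if female, $D_1 = 0$ if male
Thus, $y = \beta_0 + \beta_1 D_1 + \varepsilon$; $\beta_0 = E(y \mid \text{male})$ and $\beta_0 + \beta_1 = E(y \mid \text{female})$
Alternatively, define the variable
$D_2 = 0$ if female, $D_2 = 1$ if male
Thus, $y = \beta_0 + \beta_1 D_2 + \varepsilon$; $\beta_0 = E(y \mid \text{female})$ and $\beta_0 + \beta_1 = E(y \mid \text{male})$
Or $y = \beta_1 D_1 + \beta_2 D_2 + \varepsilon$; $\beta_1 = E(y \mid \text{female})$ and $\beta_2 = E(y \mid \text{male})$
Standard mistake: include an intercept, $D_1$ and $D_2$. They are perfectly collinear ($D_1 + D_2 = 1$)
If the equation of interest is $E(y \mid \text{gender}, \text{education})$:
$$y = \beta_0 + \beta_1 D_1 + \beta_2\,\text{educ} + \varepsilon$$
Intercept effect for gender, but the return on education is the same
A regression model allowing for slope differences (interactions) is
$$y = \beta_0 + \beta_1 D_1 + \beta_2\,\text{educ} + \beta_3 D_1 \cdot \text{educ} + \varepsilon$$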
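The equivalence of these parameterizations and the dummy-variable trap can be seen in a few lines of NumPy. The following is a sketch on simulated data. The variable names `female` and `wage` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100
female = rng.integers(0, 2, size=T)           # hypothetical D1: 1 female, 0 male
wage = 10 + 2 * female + rng.normal(size=T)   # simulated data, not from the source
D1, D2 = female.astype(float), 1.0 - female

# Intercept + D1: b0 = mean for males, b0 + b1 = mean for females
X_a = np.column_stack([np.ones(T), D1])
b_a = np.linalg.lstsq(X_a, wage, rcond=None)[0]

# No intercept, both dummies: the coefficients are the two group means
X_b = np.column_stack([D1, D2])
b_b = np.linalg.lstsq(X_b, wage, rcond=None)[0]

# Dummy-variable trap: intercept, D1 and D2 are perfectly collinear (D1 + D2 = 1)
X_trap = np.column_stack([np.ones(T), D1, D2])
print(np.linalg.matrix_rank(X_trap))          # 2, not 3: X'X is singular
```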
Dummy Variables
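It is interesting to see how our estimators algebraically handle dummy variables
$y = D_1\beta_1 + D_2\beta_2 + \varepsilon$
By construction $D_1'D_2 = 0$. Thus,
$$\hat\beta_1 = (D_1'D_1)^{-1}D_1'y = \frac{\sum_{t=1}^T D_{1t}y_t}{\sum_{t=1}^T D_{1t}} = \frac{1}{T_1}\sum_{t=1}^T D_{1t}y_t, \qquad T_1 = \sum_{t=1}^T D_{1t}$$
that is, the sample mean of $y$ in group 1. With $X = [D_1 \; D_2]$,
$$\hat V(\hat\beta) = \hat\sigma^2(X'X)^{-1} = \begin{bmatrix} \hat\sigma^2(D_1'D_1)^{-1} & 0 \\ 0 & \hat\sigma^2(D_2'D_2)^{-1} \end{bmatrix} = \begin{bmatrix} \hat\sigma^2/T_1 & 0 \\ 0 & \hat\sigma^2/T_2 \end{bmatrix}, \qquad \hat\sigma^2 = \frac{1}{T}\sum_{t=1}^T \hat\varepsilon_t^2$$
Another candidate for the variance-covariance estimate is
$$\tilde V(\hat\beta) = \begin{bmatrix} \hat\sigma_1^2(D_1'D_1)^{-1} & 0 \\ 0 & \hat\sigma_2^2(D_2'D_2)^{-1} \end{bmatrix} = \begin{bmatrix} \hat\sigma_1^2/T_1 & 0 \\ 0 & \hat\sigma_2^2/T_2 \end{bmatrix}, \qquad \hat\sigma_j^2 = \frac{1}{T_j}\sum_{t=1}^T D_{jt}\hat\varepsilon_t^2 \text{ for } j = 1, 2$$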
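The algebra above can be verified directly. The following sketch uses NumPy on simulated groups with deliberately unequal variances. It computes both the pooled and the group-specific variance estimates. The group sizes and distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 60
D1 = (np.arange(T) < 20).astype(float)        # T1 = 20 observations in group 1
D2 = 1.0 - D1                                 # T2 = 40 observations in group 2
y = np.where(D1 == 1, rng.normal(5, 1, T), rng.normal(8, 3, T))  # unequal variances

X = np.column_stack([D1, D2])
b = np.linalg.solve(X.T @ X, X.T @ y)         # equals the two group means
e = y - X @ b
T1, T2 = D1.sum(), D2.sum()

sigma2 = e @ e / T                            # pooled: (1/T) * sum of squared residuals
V_pooled = sigma2 * np.diag([1 / T1, 1 / T2])

# Group-specific variances: (1/T_j) * sum of D_jt * e_t^2
sigma2_j = np.array([(D1 * e**2).sum() / T1, (D2 * e**2).sum() / T2])
V_groups = np.diag(sigma2_j / np.array([T1, T2]))
print(b, np.diag(V_pooled), np.diag(V_groups))
```

When the group variances differ, as they do here, the second estimator respects that difference. The pooled one averages it away.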
Time Trend
Many economic variables exhibit trends
Consider a series growing at a constant rate:
$Y_t = Y_0(1 + g)^t$
where $g$ is the rate of growth per period
Taking logs (ln) of both sides:
$\ln Y_t = \ln Y_0 + t\ln(1 + g)$
Adding a shock and changing notation:
$y_t = \beta_1 + \beta_2 t + \varepsilon_t$
where $y_t = \ln Y_t$, $\beta_1 = \ln Y_0$, $\beta_2 = \ln(1 + g) \simeq g$
Easily checked by taking the first difference and ignoring the disturbance:
$\Delta y_t = \beta_2$
Important: the choice of unit for $t$ is irrelevant if it is used consistently
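The following is a small sketch of recovering a growth rate from a log-linear trend regression. It uses NumPy on a simulated series. The 2% growth rate, $Y_0 = 50$ and the shock size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
T, g, Y0 = 120, 0.02, 50.0                     # hypothetical: 2% growth per period
t = np.arange(T)
Y = Y0 * (1 + g) ** t * np.exp(rng.normal(0, 0.01, T))  # multiplicative shock

X = np.column_stack([np.ones(T), t])
b1, b2 = np.linalg.lstsq(X, np.log(Y), rcond=None)[0]
g_hat = np.exp(b2) - 1                         # invert b2 = ln(1 + g) exactly
print(b2, g_hat)                               # b2 estimates ln(1 + g), slightly below g
```

Using $\beta_2$ itself as the growth rate relies on the approximation $\ln(1+g) \simeq g$. Inverting with $e^{\beta_2} - 1$ avoids that approximation.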
Seasonality
Economic time series:
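$y_t = T_t + S_t + C_t + I_t$ (trend, seasonal, cyclical and irregular components)
$y_t = \beta_0 + \beta_1 D_{1t} + \beta_2 D_{2t} + \beta_3 D_{3t} + \beta_4 t + \varepsilon_t$, where $D_{jt} = 1$ if observation $t$ falls in quarter $j$
$E(y_t \mid \text{first quarter}) = \beta_0 + \beta_1 + \beta_4 t$; $E(y_t \mid \text{fourth quarter}) = \beta_0 + \beta_4 t$
Seasonally adjusted series:
$y_t^{sa} = y_t - \hat\beta_1 D_{1t} - \hat\beta_2 D_{2t} - \hat\beta_3 D_{3t}$
[Figure 1: Use of Quarterly Dummies. Series shown: GDP, Seasonally Adjusted GDP, Detrended; horizontal axis 87 to 97.]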
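The quarterly-dummy regression and the resulting seasonal adjustment can be sketched as follows. The code uses NumPy on a simulated quarterly series, with the fourth quarter as the base. The seasonal pattern and trend are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 44                                         # 11 years of simulated quarterly data
t = np.arange(T)
quarter = t % 4                                # 0 = Q1, ..., 3 = Q4 (base quarter)
season = np.array([0.10, -0.05, 0.03, 0.0])[quarter]
y = 4.5 + 0.005 * t + season + rng.normal(0, 0.01, T)

# Dummies for Q1-Q3; Q4 is absorbed by the intercept
D = np.column_stack([(quarter == q).astype(float) for q in range(3)])
X = np.column_stack([np.ones(T), D, t])
beta = np.linalg.lstsq(X, y, rcond=None)[0]    # (b0, b1, b2, b3, b4)

y_sa = y - D @ beta[1:4]                       # seasonally adjusted series
detrended = y_sa - beta[0] - beta[4] * t       # also remove the fitted linear trend
print(beta)
```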
Nonlinearity in Regressors
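We are interested in $E(y \mid x) = g(x)$, $x \in \mathbb{R}$, and the form of $g$ is unknown
Common approach: polynomial approximation:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p + \varepsilon$$
Let $z = (1 \;\; x \;\; \cdots \;\; x^p)'$ and $\beta = (\beta_0 \;\; \beta_1 \;\; \cdots \;\; \beta_p)'$; this is $y = z'\beta + \varepsilon$, which is an LRM
Typically, $p$ is kept small
If $x \in \mathbb{R}^2$, a simple quadratic approximation is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$$
As dimensionality increases, approximations become non-parsimonious
Most applications use quadratic terms; some add cubics without interactions (or neural nets, Fourier series, splines, wavelets, etc.):
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \beta_6 x_1^3 + \beta_7 x_2^3 + \varepsilon$$
Since these models, although nonlinear in $x$, are linear in the parameters, they can be estimated by OLS, and inference is conventional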
Nonlinearity in Regressors
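However, the model is nonlinear in the regressors, so interpretation must take this into account
For example, in the cubic model, the slope with respect to $x_1$ is
$$\frac{\partial E(y \mid x)}{\partial x_1} = \beta_1 + 2\beta_3 x_1 + \beta_5 x_2 + 3\beta_6 x_1^2$$
which is a function of $x_1$ and $x_2$, making reporting of the slope difficult
It is important to report slopes for different values of the regressors, chosen to illustrate the point of interest
In other applications, the average slope may be sufficient. Two obvious candidates:
Derivative evaluated at sample averages
$$\left.\frac{\partial E(y \mid x)}{\partial x_1}\right|_{x = \bar x} = \beta_1 + 2\beta_3 \bar x_1 + \beta_5 \bar x_2 + 3\beta_6 \bar x_1^2$$
and average derivative
$$\frac{1}{T}\sum_{t=1}^T \frac{\partial E(y_t \mid x_t)}{\partial x_1} = \beta_1 + 2\beta_3 \bar x_1 + \beta_5 \bar x_2 + 3\beta_6 \frac{1}{T}\sum_{t=1}^T x_{1t}^2$$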
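The two average-slope candidates can be computed side by side. The following sketch fits the cubic model from the text to simulated data using NumPy. The data-generating coefficients are hypothetical. The two candidates differ by exactly $3\hat\beta_6$ times the sample variance of $x_1$ (computed with divisor $T$).

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
x1, x2 = rng.normal(1, 1, T), rng.normal(0, 1, T)
y = (1 + 0.5 * x1 - 0.3 * x2 + 0.2 * x1**2 + 0.1 * x2**2 + 0.4 * x1 * x2
     + 0.05 * x1**3 - 0.02 * x2**3 + rng.normal(0, 0.5, T))

# Cubic model without cubic interactions, columns ordered as b0, ..., b7 in the text
X = np.column_stack([np.ones(T), x1, x2, x1**2, x2**2, x1 * x2, x1**3, x2**3])
b = np.linalg.lstsq(X, y, rcond=None)[0]

def slope_x1(x1v, x2v):
    """dE(y|x)/dx1 = b1 + 2 b3 x1 + b5 x2 + 3 b6 x1^2."""
    return b[1] + 2 * b[3] * x1v + b[5] * x2v + 3 * b[6] * x1v**2

at_means = slope_x1(x1.mean(), x2.mean())      # derivative at sample averages
average = slope_x1(x1, x2).mean()              # average derivative
print(at_means, average)
```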
Transformations
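Even the simplest models usually consider nonlinearities
Example: Cobb-Douglas production function
$$Y = AK^{\alpha}L^{\beta}e^{\varepsilon} \tag{2}$$
Take logs of (2):
$$\ln Y = \ln A + \alpha\ln K + \beta\ln L + \varepsilon$$
Or: $y = x'\theta + \varepsilon$, with $y = \ln Y$, $x = (1, \ln K, \ln L)'$ and $\theta = (\ln A, \alpha, \beta)'$
Instances in which no transformation can be used: CES:
$$Y = \left[\delta K^{\rho} + (1 - \delta)L^{\rho}\right]^{1/\rho} + \varepsilon$$
OLS cannot be applied; the model must be estimated by nonlinear least squares (NLLS)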
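After taking logs, the Cobb-Douglas model is an ordinary linear regression. The following is a sketch on simulated inputs using NumPy. The parameter values are hypothetical, and the error is assumed multiplicative as in (2). No analogous one-line transformation exists for the CES case, which needs an iterative NLLS routine instead.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 300
K, L = rng.lognormal(3, 0.5, T), rng.lognormal(4, 0.5, T)
A, alpha, beta = 2.0, 0.3, 0.6                 # hypothetical true parameters
Y = A * K**alpha * L**beta * np.exp(rng.normal(0, 0.1, T))

# Linear in (ln A, alpha, beta) after taking logs, so OLS applies
X = np.column_stack([np.ones(T), np.log(K), np.log(L)])
lnA_hat, alpha_hat, beta_hat = np.linalg.lstsq(X, np.log(Y), rcond=None)[0]
print(np.exp(lnA_hat), alpha_hat, beta_hat, alpha_hat + beta_hat)  # last: returns to scale
```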
Some Useful Functions

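Choice of functional form affects the interpretation of results
Calls for good analytic skills and experience

| Name | Function | Slope $= dy/dx$ | Elasticity $= (dy/dx)(x/y)$ |
|---|---|---|---|
| Linear | $y = \beta_1 + \beta_2 x$ | $\beta_2$ | $\beta_2 x/y$ |
| Quadratic | $y = \beta_1 + \beta_2 x^2$ | $2\beta_2 x$ | $2\beta_2 x^2/y$ |
| Cubic | $y = \beta_1 + \beta_2 x^3$ | $3\beta_2 x^2$ | $3\beta_2 x^3/y$ |
| Log-log | $\ln y = \beta_1 + \beta_2 \ln x$ | $\beta_2 y/x$ | $\beta_2$ |
| Log-linear | $\ln y = \beta_1 + \beta_2 x$ | $\beta_2 y$ | $\beta_2 x$ |
| Linear-log | $y = \beta_1 + \beta_2 \ln x$ | $\beta_2/x$ | $\beta_2/y$ |

If $\ln$ is used, the variable must be strictly positive ($> 0$)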
$\ln(y)$ versus $y$ as Dependent Variable
The econometrician can estimate $y = x'\hat\beta + \hat\varepsilon$ or $\ln(y) = x'\hat\beta + \hat\varepsilon$ (or both). Which is preferable?
Plain truth: either is fine, in the sense that $E(y \mid x)$ and $E(\ln(y) \mid x)$ are well defined (so long as $y > 0$)
To select one specification over the other requires the imposition of additional structure (such as that the conditional expectation is linear in $x$, and $\varepsilon \sim N(0, \sigma^2)$)
Some good reasons for preferring the $\ln(y)$ regression over the $y$ regression:
$E(\ln(y) \mid x)$ may be roughly linear in $x$, while $E(y \mid x)$ may be nonlinear, and linear models are easier to report and interpret
The errors $\varepsilon = \ln(y) - E(\ln(y) \mid x)$ may be less heteroskedastic than the errors from the linear specification (although the reverse may be true!)
As long as $y > 0$, $\ln(y)$ ranges over all of $\mathbb{R}$, so fitted values of the log specification always imply a positive $y$; this is not the case for $\hat y = x'\hat\beta$, which may produce $\hat y < 0$ for some values of $x$ (Tobit)
If the distribution of $y$ is skewed, $E(y \mid x)$ may not be a useful measure of central tendency, and estimates will be influenced by extreme observations (outliers); $E(\ln(y) \mid x)$ may be a better measure of central tendency, and more interesting to estimate and report
Be careful when the $\ln$ specification is used if you are interested in obtaining $E(y \mid x)$. Jensen's inequality:
$$\exp\left[E(\ln(y) \mid x)\right] \neq E\left[\exp(\ln(y)) \mid x\right] = E(y \mid x)$$
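The Jensen gap can be seen in a quick simulation. The sketch below assumes that $\ln y$ is exactly normal. Under that assumption, the lognormal formula $E(y) = e^{\mu + \sigma^2/2}$ gives the correction. The values of $\mu$ and $\sigma$ are hypothetical. If normality of the log errors fails, this correction is no longer exact.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 1.0, 0.8                           # hypothetical parameters of ln y
y = np.exp(rng.normal(mu, sigma, 200_000))     # ln y ~ N(mu, sigma^2)

ln_y = np.log(y)
naive = np.exp(ln_y.mean())                    # exp[E(ln y)]: estimates the median e^mu
corrected = np.exp(ln_y.mean() + 0.5 * ln_y.var())  # lognormal mean e^(mu + sigma^2/2)
print(naive, corrected, y.mean())              # the naive retransformation understates E(y)
```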
Testing for Omitted Nonlinearity

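Simple test: add nonlinear functions of $x$, and test their significance
Let $z_t = h(x_t)$ denote nonlinear functions of $x_t$. Fit $y_t = x_t'\tilde\beta + z_t'\tilde\gamma + \tilde\varepsilon_t$ by OLS, and test $H_0: \gamma = 0$
Ramsey RESET test. The null model is
$y_t = x_t'\beta + \varepsilon_t$
which is estimated by OLS, yielding fitted values $\hat y_t = x_t'\hat\beta$. Let
$z_t = \left(\hat y_t^2, \ldots, \hat y_t^m\right)'$
Fit
$$y_t = x_t'\tilde\beta + z_t'\tilde\gamma + \tilde\varepsilon_t \tag{3}$$
by OLS, and form the Wald statistic $W_T \xrightarrow{d} \chi^2_{m-1}$ for $H_0: \gamma = 0$
Typically $m = 2$, 3, or 4 seems to work best.
Works well as a test of functional form against smooth alternatives. Powerful at detecting single-index models of the form
$y_t = G(x_t'\beta) + \varepsilon_t$
where $G(\cdot)$ is a smooth link function. To see why this is the case, note that (3) may be written as
$$y_t = x_t'\tilde\beta + (x_t'\hat\beta)^2\tilde\gamma_1 + (x_t'\hat\beta)^3\tilde\gamma_2 + \cdots + (x_t'\hat\beta)^m\tilde\gamma_{m-1} + \tilde\varepsilon_t$$
which has essentially approximated $G(\cdot)$ by an $m$-th order polynomial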
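The following is a minimal NumPy sketch of the RESET procedure just described. It assumes homoskedastic errors when forming the Wald statistic, and `X` must already contain the constant. For $m = 3$, the statistic is compared with the 5% critical value of $\chi^2_2$, which is 5.99. The function name `reset_test` and the simulated data are illustrative.

```python
import numpy as np

def reset_test(X, y, m=3):
    """RESET Wald statistic for H0: gamma = 0 (asymptotically chi^2 with m-1 df).

    Assumes homoskedastic errors; X must include the constant.
    """
    T, k = X.shape
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]           # fitted values of the null model
    Z = np.column_stack([yhat**p for p in range(2, m + 1)])   # yhat^2, ..., yhat^m
    W = np.column_stack([X, Z])
    coef = np.linalg.lstsq(W, y, rcond=None)[0]
    e = y - W @ coef
    V = (e @ e / (T - W.shape[1])) * np.linalg.inv(W.T @ W)
    g, Vg = coef[k:], V[k:, k:]
    return float(g @ np.linalg.solve(Vg, g))

rng = np.random.default_rng(8)
T = 400
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y_lin = 1 + x + rng.normal(size=T)                    # linear null is true
y_exp = np.exp(1 + 0.5 * x) + rng.normal(size=T)      # single index with G = exp
print(reset_test(X, y_lin), reset_test(X, y_exp))    # compare with 5.99
```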

Are Errors Normally Distributed?

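Normality of $\varepsilon$ is not crucial for the desired properties of the OLS estimator (including asymptotic inference)
However, it allows the use of exact distributions in small samples, helps in forecasting, etc.
Jarque-Bera test: most popular test for normality
$$JB = \frac{T}{6}\left[S^2 + \frac{(K - 3)^2}{4}\right] \xrightarrow{d} \chi^2_2$$
$S$ (Skewness) is a measure of asymmetry of the distribution around the mean
$$S = \frac{\frac{1}{T}\sum_{t=1}^T \hat\varepsilon_t^3}{\hat\sigma^3} = \frac{1}{T}\sum_{t=1}^T \left(\frac{\hat\varepsilon_t}{\hat\sigma}\right)^3$$
$S = 0$ symmetric, $S > 0$ long right tail, $S < 0$ long left tail
$K$ (Kurtosis) measures the peakedness or flatness of the distribution
$$K = \frac{\frac{1}{T}\sum_{t=1}^T \hat\varepsilon_t^4}{\hat\sigma^4} = \frac{1}{T}\sum_{t=1}^T \left(\frac{\hat\varepsilon_t}{\hat\sigma}\right)^4$$
$K = 3$ (normal), $K > 3$ peaked (leptokurtic), $K < 3$ flat (platykurtic) relative to the normal
Important: if $\ln y \sim N(\mu, \sigma^2)$, then $y$ is log-normal, with
$$E(y) = e^{\mu + 0.5\sigma^2}, \qquad \operatorname{Median}(y) = e^{\mu}, \qquad V(y) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$$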
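The Jarque-Bera statistic follows directly from $S$ and $K$. This NumPy sketch centers the input first. OLS residuals from a regression with a constant already have mean zero, so centering changes nothing for them. The value 5.99 is the 5% critical value of $\chi^2_2$. The simulated samples are illustrative.

```python
import numpy as np

def jarque_bera(e):
    """Return (JB, S, K) with JB = (T/6) [S^2 + (K - 3)^2 / 4]."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()                          # no-op for OLS residuals with a constant
    sigma = np.sqrt(np.mean(e**2))
    S = np.mean((e / sigma) ** 3)             # skewness
    K = np.mean((e / sigma) ** 4)             # kurtosis (3 for the normal)
    T = len(e)
    return T / 6 * (S**2 + (K - 3) ** 2 / 4), S, K

rng = np.random.default_rng(9)
jb_norm, _, _ = jarque_bera(rng.normal(size=1000))
jb_skew, S_skew, _ = jarque_bera(rng.exponential(size=1000))  # long right tail
print(jb_norm, jb_skew, S_skew)               # compare JB with 5.99
```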
Measurement Errors
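$y^* = \beta x^* + \varepsilon$
$y^*$ or $x^*$ not observed: $y = y^* + v$, $x = x^* + u$; $v \sim (0, \sigma_v^2)$, $u \sim (0, \sigma_u^2)$
Consider first that $y^*$ is observed with error. Then
$y = \beta x^* + \varepsilon + v = \beta x^* + \eta$; $\eta = \varepsilon + v$
The model satisfies the assumptions of the LRM; $\hat\beta$ is unbiased and efficient (but not as efficient as when $y^*$ is observed)
Now consider that $x^*$ is measured with error:
$y = \beta(x - u) + \varepsilon = \beta x + w$
where $w = \varepsilon - \beta u$. Since $x = x^* + u$, the regressor is correlated with the disturbance, given that (with $u$ uncorrelated with $x^*$ and $\varepsilon$)
$Cov(x, w) = Cov(x^* + u, \varepsilon - \beta u) = -\beta\sigma_u^2$
This violates the assumption of no correlation between $x$ and the error term: $\hat\beta$ is biased and inconsistent
The assumption that measurement errors are unsystematic is naive
If they are systematic, $\hat\beta$ may be biased and inconsistent even in the first case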
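Both cases can be contrasted in a short simulation. The NumPy sketch below assumes that the errors $u$ and $v$ are independent of $x^*$ and $\varepsilon$. Under that assumption, the slope estimated with a mismeasured regressor converges to $\beta\sigma_{x^*}^2/(\sigma_{x^*}^2 + \sigma_u^2)$, which is the classical attenuation bias. All variances are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(10)
T, beta = 100_000, 1.5
var_xstar, var_u = 1.0, 0.5                    # hypothetical variances
x_star = rng.normal(0, np.sqrt(var_xstar), T)
y = beta * x_star + rng.normal(0, 1, T)

# Error in y only: still unbiased, just noisier
y_obs = y + rng.normal(0, 1, T)
b_y_err = (x_star @ y_obs) / (x_star @ x_star)

# Error in x: the regressor is correlated with w = eps - beta * u, so the slope is attenuated
x_obs = x_star + rng.normal(0, np.sqrt(var_u), T)
b_x_err = (x_obs @ y) / (x_obs @ x_obs)

plim_x_err = beta * var_xstar / (var_xstar + var_u)   # = 1.0 here
print(b_y_err, b_x_err, plim_x_err)
```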
Omitted Variables
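Correct model:
$y = X_1\beta_1 + X_2\beta_2 + \varepsilon$
Estimated model: $y = X_1\beta_1 + \varepsilon$
$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'y = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon$$
$$E(\hat\beta_1) = \beta_1 + \underbrace{(X_1'X_1)^{-1}X_1'X_2}_{P}\beta_2$$
$\hat\beta_1$ will generally be biased
Each column of $P$ is the column of slopes of the regression of the corresponding column of $X_2$ on $X_1$
Unbiased if either $P = 0$, which states that $X_1$ and $X_2$ are orthogonal, or $\beta_2 = 0$
The direction of bias is difficult to assess in the general case; consider the case in which $x_1$ and $x_2$ are scalars:
$$E(\hat\beta_1) = \beta_1 + \frac{Cov(x_1, x_2)}{V(x_1)}\beta_2$$
If $\operatorname{sgn}\left(Cov(x_1, x_2)\beta_2\right) > 0$, then $E(\hat\beta_1) > \beta_1$ and the estimator will overestimate the effect of $x_1$ on $y$
(Friedman: Permanent Income)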
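The scalar bias formula can be checked by simulation, together with the in-sample version of the same algebra. That in-sample version holds exactly: the short-regression slope equals the long-regression slope plus $P$ times the estimated $\beta_2$. This is a NumPy sketch on simulated data with hypothetical coefficients.

```python
import numpy as np

rng = np.random.default_rng(11)
T = 100_000
beta1, beta2 = 1.0, 2.0
x1 = rng.normal(size=T)
x2 = 0.5 * x1 + rng.normal(size=T)             # Cov(x1, x2) = 0.5, V(x1) = 1
y = beta1 * x1 + beta2 * x2 + rng.normal(size=T)

X_short = np.column_stack([np.ones(T), x1])
X_long = np.column_stack([np.ones(T), x1, x2])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0][1]
c_long = np.linalg.lstsq(X_long, y, rcond=None)[0]   # (const, b1, b2) of the correct model
P = np.linalg.lstsq(X_short, x2, rcond=None)[0][1]   # slope of x2 on (1, x1)

predicted = beta1 + 0.5 / 1.0 * beta2          # beta1 + Cov/V * beta2 = 2.0
print(b_short, c_long[1], predicted)
```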
Omitted Variables
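$$V(\hat\beta_1 \mid X_1) = \sigma^2(X_1'X_1)^{-1}$$
If we had estimated the correct model, $E(\hat\beta) = \beta$, and $V(\hat\beta_1 \mid X)$ would be the upper-left block of $\sigma^2(X'X)^{-1}$, with $X = [X_1 \; X_2]$:
$$V(\hat\beta_1 \mid X) = \sigma^2(X_1'M_2X_1)^{-1}, \qquad M_2 = I - X_2(X_2'X_2)^{-1}X_2'$$
To compare both expressions, analyze their inverses
$$\left[V(\hat\beta_1 \mid X_1)\right]^{-1} - \left[V(\hat\beta_1 \mid X)\right]^{-1} = \frac{1}{\sigma^2}X_1'X_2(X_2'X_2)^{-1}X_2'X_1 \tag{4}$$
which is positive semidefinite!
We may be inclined to conclude that although $\hat\beta_1$ from the short regression is biased, it has a smaller variance than $\hat\beta_1$ from the correct model
Nevertheless, $\sigma^2$ is not known and needs to be estimated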
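The difference-of-inverses identity can be verified numerically for a fixed design. The following NumPy sketch uses simulated regressors and takes $\sigma^2 = 1$ as an assumption. It also confirms that the correct-model variance minus the short-regression variance is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(12)
T, sigma2 = 50, 1.0                            # sigma^2 assumed known for the comparison
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = 0.8 * X1[:, [1]] + rng.normal(size=(T, 1))      # correlated with X1
X = np.hstack([X1, X2])
k1 = X1.shape[1]

V_short = sigma2 * np.linalg.inv(X1.T @ X1)          # V(b1 | X1), X2 omitted
V_long = (sigma2 * np.linalg.inv(X.T @ X))[:k1, :k1] # upper-left block, correct model

P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)           # projection onto X2
diff_inv = np.linalg.inv(V_short) - np.linalg.inv(V_long)
print(np.linalg.eigvalsh(V_long - V_short))          # all eigenvalues >= 0 (up to rounding)
```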
Omitted Variables
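Proceeding as usual (thinking that the estimated model is correct) we would obtain
$$\tilde\sigma^2 = \frac{\hat\varepsilon'\hat\varepsilon}{T - k_1}$$
but $\hat\varepsilon = M_1y = M_1(X_1\beta_1 + X_2\beta_2 + \varepsilon) = M_1X_2\beta_2 + M_1\varepsilon$, with $M_1 = I - X_1(X_1'X_1)^{-1}X_1'$. Then
$$E(\hat\varepsilon'\hat\varepsilon) = \beta_2'X_2'M_1X_2\beta_2 + \sigma^2\operatorname{tr}(M_1) = \beta_2'X_2'M_1X_2\beta_2 + \sigma^2(T - k_1)$$
First term: population counterpart to the increase in the sum of squared residuals due to dropping $X_2$. As this term is nonnegative (and positive whenever $M_1X_2\beta_2 \neq 0$), $\tilde\sigma^2$ will be biased upward (the true variance is smaller). Unfortunately, to take this bias into account we would need to know $\beta_2$.
In conclusion
If we omit a relevant variable, $\hat\beta_1$ and $\tilde\sigma^2$ are biased
Even when $\hat\beta_1$ may be more precise than the estimator from the correct model, $\tilde\sigma^2$ cannot estimate $\sigma^2$ consistently
The only case in which $\hat\beta_1$ would be unbiased is if $X_1$ and $X_2$ were orthogonal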
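The expectation of $\hat\varepsilon'\hat\varepsilon$ derived above can be checked by Monte Carlo with the regressors held fixed. The following NumPy sketch draws new disturbances in each replication. The coefficients, $\sigma$ and the number of replications are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(13)
T, sigma = 40, 1.0
x1 = rng.normal(size=T)
X1 = np.column_stack([np.ones(T), x1])
X2 = (0.5 * x1 + rng.normal(size=T))[:, None]
beta1, beta2 = np.array([1.0, 1.0]), np.array([0.8])
k1 = X1.shape[1]

M1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)   # residual maker for X1
bias_term = beta2 @ X2.T @ M1 @ X2 @ beta2                # beta2' X2' M1 X2 beta2 >= 0
expected_ssr = bias_term + sigma**2 * (T - k1)

# Monte Carlo over new disturbances, holding X1 and X2 fixed
reps = 20_000
eps = rng.normal(0, sigma, size=(reps, T))
Y = (X1 @ beta1 + X2 @ beta2)[None, :] + eps
resid = Y @ M1                                            # row r is (M1 y_r)' since M1 is symmetric
ssr = (resid**2).sum(axis=1)
print(ssr.mean(), expected_ssr, ssr.mean() / (T - k1))   # last value exceeds sigma^2 = 1
```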
Irrelevant Variables
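Correct model:
$y = X_1\beta_1 + \varepsilon$
Estimated model: $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$
$$\hat\beta_1 = (X_1'M_2X_1)^{-1}X_1'M_2y = \beta_1 + (X_1'M_2X_1)^{-1}X_1'M_2\varepsilon$$
$$E(\hat\beta) = E\begin{bmatrix}\hat\beta_1 \\ \hat\beta_2\end{bmatrix} = \begin{bmatrix}\beta_1 \\ 0\end{bmatrix}; \qquad E(\tilde\sigma^2) = E\left(\frac{\hat\varepsilon'\hat\varepsilon}{T - k_1 - k_2}\right) = \sigma^2$$
What is the problem? Overfit! Cost: reduction in precision. Recall
$$\hat\beta_1 = \beta_1 + (X_1'M_2X_1)^{-1}X_1'M_2\varepsilon$$
$$V(\hat\beta_1 \mid X) = \sigma^2(X_1'M_2X_1)^{-1}$$
But the variance of $\hat\beta_1$ is larger than if the correct model were estimated:
$$V(\hat\beta_1 \mid X_1) = \sigma^2(X_1'X_1)^{-1}$$
Asymptotically as efficient if $X_1$ and $X_2$ are orthogonal
If $X_1$ and $X_2$ are highly correlated, including $X_2$ greatly inflates the variance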
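The cost of including an irrelevant regressor can be measured as a variance ratio. The following NumPy sketch compares $\sigma^2(X_1'M_2X_1)^{-1}$ with $\sigma^2(X_1'X_1)^{-1}$ for the slope on $x_1$. It uses simulated irrelevant regressors with increasing correlation `rho`. The correlation values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(14)
T, sigma2 = 100, 1.0
x1 = rng.normal(size=T)
X1 = np.column_stack([np.ones(T), x1])
var_correct = sigma2 * np.linalg.inv(X1.T @ X1)[1, 1]    # slope on x1, correct model

ratios = {}
for rho in (0.0, 0.9, 0.99):
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=T)   # irrelevant regressor
    X = np.column_stack([X1, x2])
    var_with_x2 = sigma2 * np.linalg.inv(X.T @ X)[1, 1]  # sigma^2 (X1' M2 X1)^{-1}, slope entry
    ratios[rho] = var_with_x2 / var_correct
print(ratios)                                            # >= 1, growing with rho
```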

Multicollinearity
Arises when measured variables are too highly intercorrelated to allow for precise analysis of the individual effects of each one
We will discuss:
nature
ways to detect it
effects
remedies
Perfect Collinearity
If $\operatorname{rank}(X'X) < k$, $\hat\beta$ is not defined. This happens iff the columns of $X$ are linearly dependent
Most commonly, it arises when sets of regressors are identically related
Example: let $X$ include $\ln(x_1)$, $\ln(x_2)$ and $\ln(x_1x_2)$, since $\ln(x_1x_2) = \ln(x_1) + \ln(x_2)$
When this happens, the error is quickly discovered: software will be unable to construct $(X'X)^{-1}$
Since the error is quickly discovered, this is rarely a problem in applied econometric practice
Thus, the problem with perfect multicollinearity is not with the data, but with bad specification.
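The log example can be reproduced in NumPy with a rank check. The following is a sketch on simulated positive regressors. In floating point, `np.linalg.inv` applied to an exactly singular $X'X$ may return huge but finite numbers instead of raising an error. Checking the rank of $X$ is therefore the more reliable diagnostic.

```python
import numpy as np

rng = np.random.default_rng(15)
T = 30
x1, x2 = rng.uniform(1, 5, T), rng.uniform(1, 5, T)
X = np.column_stack([np.ones(T), np.log(x1), np.log(x2), np.log(x1 * x2)])
print(np.linalg.matrix_rank(X), X.shape[1])   # 3 < 4: ln(x1 x2) = ln x1 + ln x2
```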
Near Multicollinearity
In contrast to perfect collinearity, near multicollinearity is a statistical problem
The problem is not identification but precision
The higher the correlation between regressors, the less precise the estimates will be
What is troubling about this definition of the problem is that our complaint is with the sample that was given to us!
The usual symptoms of the problem are:
Small changes in the data produce wide swings in the estimates
$t$ statistics are not significant, but $R^2$ is high (excuse?)
Coefficients have the wrong sign or implausible magnitudes
The problem arises when $X'X$ is near singular and the columns of $X$ are close to linear dependence (loose)
One implication of near singularity is that the numerical reliability of calculations is reduced (it is more likely that reported calculations will be in error due to floating-point calculation difficulties)
Near Multicollinearity
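The problem is with $(X'X)^{-1}$. Its $j$-th diagonal element is (take $j = 1$ for convenience, and let $X_2$ collect the other regressors):
$$\begin{aligned}
\left[(X'X)^{-1}\right]_{11} = (x_1'M_2x_1)^{-1} &= \left(x_1'x_1 - x_1'X_2(X_2'X_2)^{-1}X_2'x_1\right)^{-1} \\
&= \left[x_1'x_1\left(1 - \frac{x_1'X_2(X_2'X_2)^{-1}X_2'x_1}{x_1'x_1}\right)\right]^{-1} \\
&= \frac{1}{x_1'x_1\left(1 - R_1^2\right)}
\end{aligned}$$
$R_1^2$ is the (uncentered) $R^2$ of the regression of $x_1$ on the other regressors. Thus,
$$V(\hat\beta_1) = \frac{\sigma^2}{x_1'x_1\left(1 - R_1^2\right)}$$
If a set of regressors is highly correlated with $x_1$, then $R_1^2 \to 1$ and $V(\hat\beta_1) \to \infty$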
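The factor $1/(1 - R_j^2)$ is commonly called the variance inflation factor (VIF). The following NumPy sketch computes it with the uncentered $R^2$ used in the derivation. It is checked against the diagonal of $(X'X)^{-1}$. The helper name `vif` and the data are illustrative.

```python
import numpy as np

def vif(X):
    """1 / (1 - R_j^2) for each column j of X, using the uncentered R^2."""
    T, k = X.shape
    out = np.empty(k)
    for j in range(k):
        xj = X[:, j]
        others = np.delete(X, j, axis=1)
        fitted = others @ np.linalg.lstsq(others, xj, rcond=None)[0]
        r2 = (fitted @ fitted) / (xj @ xj)     # uncentered R^2 of x_j on the others
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(16)
T = 200
x1 = rng.normal(size=T)
X = np.column_stack([np.ones(T), x1, x1 + 0.1 * rng.normal(size=T)])  # nearly collinear
print(vif(X))
```

The identity $[(X'X)^{-1}]_{jj} = \text{VIF}_j / (x_j'x_j)$ is exactly the formula above.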
Detection
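Rule of thumb: be concerned when the overall $R^2$ is smaller than any $R_j^2$
Alternative measure (Belsley) based on the condition number ($\gamma$)
$$\gamma = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}$$
where the $\lambda$'s are the eigenvalues of $S(X'X)S$, with
$$S = \operatorname{diag}\left(\frac{1}{\sqrt{x_1'x_1}}, \frac{1}{\sqrt{x_2'x_2}}, \ldots, \frac{1}{\sqrt{x_k'x_k}}\right)$$
If the regressors are orthogonal ($R_j^2 = 0$ for all $j$), $\gamma = 1$. The higher the intercorrelation, the higher the condition number. If there is perfect collinearity, $\lambda_{\min} = 0$ and $\gamma = \infty$
Belsley suggests that $\gamma > 20$ indicates potential problems
Approaches used to deal with the problem:
Reduce the dimension of $X$ (drop variables). Obvious problem: omitted relevant variables, $\hat\beta$ biased
Principal components, Ridge Regression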
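Belsley's measure can be computed in a few lines of NumPy. Because the eigenvalues of $S(X'X)S$ are the squared singular values of $XS$, the result should match `np.linalg.cond(X @ S)`. If $X$ is exactly singular, rounding can make $\lambda_{\min}$ slightly negative, and the square root then returns `nan`. The function name and data are illustrative.

```python
import numpy as np

def belsley_condition_number(X):
    """sqrt(lambda_max / lambda_min) of S X'X S, with S = diag(1 / sqrt(x_j' x_j))."""
    XtX = X.T @ X
    S = np.diag(1.0 / np.sqrt(np.diag(XtX)))
    lam = np.linalg.eigvalsh(S @ XtX @ S)      # symmetric matrix: real eigenvalues
    return float(np.sqrt(lam.max() / lam.min()))

X_orth = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])   # orthogonal columns
rng = np.random.default_rng(17)
x1 = rng.normal(size=100)
X_near = np.column_stack([np.ones(100), x1, x1 + 0.01 * rng.normal(size=100)])
print(belsley_condition_number(X_orth), belsley_condition_number(X_near))  # 1 vs. far above 20
```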
Bottom Line
There is no pair of words that is more misused than "multicollinearity problem"
That explanatory variables are highly collinear is a fact of life
It is clear that there are realizations of $X'X$ which would be much preferred to the actual data
To complain about the apparent malevolence of nature is not constructive
Ad-hoc cures for a bad sample can be disastrously inappropriate
It is better to accept the fact that non-experimental data are sometimes not very informative about the parameters of interest
Bottom Line
Example to clarify what we are really talking about
Consider $y = x_1\beta_1 + x_2\beta_2 + \varepsilon$
A regression of $x_2$ on $x_1$ yields $x_2 = \hat c x_1 + \hat u$, where $\hat u$ is (by construction) orthogonal to $x_1$
Substitute this auxiliary relationship into the original one to obtain the model
$$\begin{aligned}
y &= x_1\beta_1 + \beta_2\left(\hat c x_1 + \hat u\right) + \varepsilon \\
&= x_1\left(\beta_1 + \beta_2\hat c\right) + \beta_2\hat u + \varepsilon \\
&= z_1\alpha_1 + z_2\alpha_2 + \varepsilon
\end{aligned}$$
where $\alpha_1 = \beta_1 + \beta_2\hat c$, $\alpha_2 = \beta_2$, $z_1 = x_1$ and $z_2 = \hat u = x_2 - \hat c x_1$
A researcher who used $x_1$ and $x_2$ and the parameters $\beta_1$ and $\beta_2$ reports that $\beta_2$ is estimated inaccurately because of the collinearity problem
A researcher who happened to stumble on the model with variables $z_1$ and $z_2$ and parameters $\alpha_1$ and $\alpha_2$ would report that there is no collinearity problem, because $z_1$ and $z_2$ are orthogonal (recall that $x_1$ and $\hat u$ are orthogonal by construction). This researcher would nonetheless report that $\alpha_2 (= \beta_2)$ is estimated inaccurately, not because of collinearity, but because $z_2$ does not vary adequately
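The two researchers' results can be reproduced exactly. The following NumPy sketch uses simulated, highly collinear regressors. It shows that the collinear and the orthogonalized parameterizations report the same estimate and the same standard error for $\beta_2 = \alpha_2$. The data and helper name are illustrative.

```python
import numpy as np

def ols_se(X, y):
    """OLS coefficients and conventional standard errors."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    return b, np.sqrt(e @ e / (T - k) * np.diag(XtX_inv))

rng = np.random.default_rng(18)
T = 200
x1 = rng.normal(size=T)
x2 = x1 + 0.05 * rng.normal(size=T)            # highly collinear with x1
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=T)

c_hat = (x1 @ x2) / (x1 @ x1)                  # regression of x2 on x1 (no constant)
z1, z2 = x1, x2 - c_hat * x1                   # z2 = u_hat, orthogonal to x1

b, se = ols_se(np.column_stack([x1, x2]), y)
a, se_a = ols_se(np.column_stack([z1, z2]), y)
print(np.corrcoef(x1, x2)[0, 1], z1 @ z2)      # near 1 versus numerically 0
print(b[1], se[1], a[1], se_a[1])              # same estimate and standard error
```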
Bottom Line
Example illustrates that collinearity as a cause of weak evidence is indistinguishable from
inadequate variability as a cause of weak evidence
In light of that fact, it is surprising that all econometrics texts have sections dealing with the collinearity problem but none has a section on the inadequate variability problem
In summary
Collinearity is bound to be present in applied econometric practice
There is no simple solution to this problem
Fortunately, multicollinearity does not lead to errors in inference
Asymptotic distribution is still valid. Estimates are asymptotically normal, and estimated standard errors are consistent
Confidence intervals are not misleading. They are large, correctly indicating inherent
uncertainty about the true parameter values
Influential Analysis
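OLS seeks to prevent a few large residuals at the expense of incurring many relatively small residuals
A few observations can be extremely influential, in the sense that dropping them from the sample changes the elements of $\hat\beta$ substantially
A systematic way to find those influential observations is as follows: let $\hat\beta^{(t)}$ be the OLS estimate of $\beta$ that would be obtained if the $t$-th observation were omitted
The key equation is
$$\hat\beta^{(t)} - \hat\beta = -\left(\frac{1}{1 - h_t}\right)(X'X)^{-1}x_t\hat\varepsilon_t \tag{5}$$
where
$$h_t = x_t'(X'X)^{-1}x_t \tag{6}$$
which is the $t$-th diagonal element of $P = X(X'X)^{-1}X'$. It is easy to show that
$$0 \le h_t \le 1 \quad \text{and} \quad \sum_{t=1}^T h_t = k$$
so $h_t$ equals $k/T$ on average.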
What should be done with influential observations? Keep or drop?
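The leave-one-out formula lets every $\hat\beta^{(t)}$ be computed from a single fit. The following NumPy sketch plants one high-leverage outlier in simulated data. It checks the formula against brute-force re-estimation without that observation. The planted values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(19)
T = 30
X = np.column_stack([np.ones(T), rng.normal(size=T)])
X[0, 1] = 6.0                                   # hypothetical high-leverage point...
y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)
y[0] += 5.0                                     # ...that is also an outlier

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # h_t = x_t' (X'X)^{-1} x_t

# Row t: b^(t) - b = -(X'X)^{-1} x_t e_t / (1 - h_t), for all t at once
delta = -(XtX_inv @ (X * (e / (1 - h))[:, None]).T).T

# Brute-force check for the first observation
b_drop0 = np.linalg.lstsq(np.delete(X, 0, axis=0), np.delete(y, 0), rcond=None)[0]
print(h[0], delta[0], b_drop0 - b)
```

Ranking observations by the size of `delta` (or by $h_t$) is one way to flag candidates for closer inspection. Whether to keep or drop them remains a judgment call.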
[Figure 2: Monetary Policy Rate, Growth, and ... (three scatter plots against Policy Rate; Growth on the vertical axis of the first two; observation 1998:09 labeled in the first and third panels)]
Model Selection
We discussed the costs and benefits of inclusion/exclusion of variables
How do we select a specification when theory does not provide complete guidance?
This is the question of model selection
The question "What is the right model for $y$?" is not well posed: it does not make clear the conditioning set
The question "Which subset of $(x_1, \ldots, x_K)$ enters $E(y \mid x_1, \ldots, x_K)$?" is well posed
In many cases, model selection reduces to comparing two nested models
$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
$X_1$ is $T \times k_1$ and $X_2$ is $T \times k_2$. Compare
$\mathcal{M}_1: y = X_1\beta_1 + \varepsilon$
$\mathcal{M}_2: y = X_1\beta_1 + X_2\beta_2 + \varepsilon$
Model Selection
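$\mathcal{M}_1: y = X_1\beta_1 + \varepsilon$
$\mathcal{M}_2: y = X_1\beta_1 + X_2\beta_2 + \varepsilon$
Note that $\mathcal{M}_1 \subset \mathcal{M}_2$
We say that $\mathcal{M}_2$ is true if $\beta_2 \neq 0$
$\mathcal{M}_1$ and $\mathcal{M}_2$ are estimated by OLS, with residuals $\hat\varepsilon_1$ and $\hat\varepsilon_2$, estimated variances $\hat\sigma_1^2$ and $\hat\sigma_2^2$, etc., respectively
A model selection procedure is a data-dependent rule which selects one of the models ($\hat{\mathcal{M}}$)
Desirable property for a model selection procedure: consistency
$$\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_1 \mid \mathcal{M}_1\right] \to 1$$
$$\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_2 \mid \mathcal{M}_2\right] \to 1$$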
Selection Based on Fit

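Natural measures of the fit of a regression are
the sum of squared residuals $\hat\varepsilon'\hat\varepsilon$
$R^2 = 1 - \dfrac{\hat\varepsilon'\hat\varepsilon}{\sum_{t=1}^T (y_t - \bar y)^2}$
the Gaussian log-likelihood $\hat\ell = -\dfrac{T}{2}\ln\hat\sigma^2 + c$ ($c$ is a constant)
It might be thought attractive to base model selection on one of these measures of fit
Problem: these measures are monotonic between nested models: $\hat\varepsilon_1'\hat\varepsilon_1 \ge \hat\varepsilon_2'\hat\varepsilon_2$, $R_1^2 \le R_2^2$ and $\hat\ell_1 \le \hat\ell_2$, so $\mathcal{M}_2$ would always be selected, regardless of the actual data and probability structure
Clearly an inappropriate decision rule!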
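The monotonicity problem shows up even when the extra regressors are pure noise. The following NumPy sketch simulates data in which $\mathcal{M}_1$ is true. The helper `fit_stats` and the simulated data are illustrative.

```python
import numpy as np

def fit_stats(X, y):
    """Return (SSR, R^2, sigma_hat^2 = SSR/T) for an OLS fit; X includes a constant."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    ssr = e @ e
    r2 = 1 - ssr / np.sum((y - y.mean()) ** 2)
    return ssr, r2, ssr / len(y)

rng = np.random.default_rng(20)
T = 100
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X1 @ np.array([1.0, 1.0]) + rng.normal(size=T)     # M1 is true
X2 = np.column_stack([X1, rng.normal(size=(T, 3))])    # M2 adds pure-noise regressors

ssr1, r2_1, s2_1 = fit_stats(X1, y)
ssr2, r2_2, s2_2 = fit_stats(X2, y)
print(r2_2 >= r2_1, s2_2 <= s2_1)   # prints: True True (holds for any data)
```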
Selection Based on Testing

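Common approach to model selection: base the decision on a statistical test, such as the Wald test
$$W_T = T\,\frac{\hat\sigma_1^2 - \hat\sigma_2^2}{\hat\sigma_2^2}$$
Model selection rule: for a critical level $\alpha$, let $c_\alpha$ satisfy $\Pr\left(\chi^2_{k_2} > c_\alpha\right) = \alpha$. Select $\mathcal{M}_1$ if $W_T \le c_\alpha$, else select $\mathcal{M}_2$
Major problem with this approach: the critical level $\alpha$ is indeterminate
The reasoning which helps guide the choice of $\alpha$ in hypothesis testing (controlling Type I error) is not relevant for model selection. If $\alpha$ is set to be a small number, then $\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_1 \mid \mathcal{M}_1\right] \simeq 1 - \alpha$, but $\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_2 \mid \mathcal{M}_2\right]$ could vary dramatically, depending on the sample size, etc.
Another problem is that if $\alpha$ is held fixed, the model selection procedure is inconsistent, as
$$\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_1 \mid \mathcal{M}_1\right] \to 1 - \alpha < 1$$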
Selection Based on Adjusted R-squared

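As $R^2$ is not a useful model selection rule (it always prefers the larger model), Theil proposed an adjusted coefficient of determination
$$\bar R^2 = 1 - \frac{\hat\varepsilon'\hat\varepsilon/(T - k)}{\sum_{t=1}^T (y_t - \bar y)^2/(T - 1)} = 1 - \frac{\tilde\sigma^2}{\hat\sigma_y^2}$$
At one time, it was popular to pick between models based on $\bar R^2$
The rule is to select $\mathcal{M}_1$ if $\bar R_1^2 > \bar R_2^2$, else select $\mathcal{M}_2$
Since $\bar R^2$ is monotonically decreasing in $\tilde\sigma^2$, the rule is the same as selecting the model with the smaller $\tilde\sigma^2$, or equivalently, the smaller $\ln\tilde\sigma^2$
It is helpful to observe that
$$\ln\tilde\sigma^2 = \ln\left(\hat\sigma^2\frac{T}{T - k}\right) = \ln\hat\sigma^2 + \ln\left(1 + \frac{k}{T - k}\right) \simeq \ln\hat\sigma^2 + \frac{k}{T - k} \simeq \ln\hat\sigma^2 + \frac{k}{T}$$
(the first approximation is $\ln(1 + x) \simeq x$ for small $x$).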
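Both claims can be checked numerically: the equivalence of the $\bar R^2$ rule and the $\ln\tilde\sigma^2$ rule, and the quality of the approximation $\ln\hat\sigma^2 + k/T$. The following NumPy sketch uses simulated data. The helper `selection_stats` is illustrative.

```python
import numpy as np

def selection_stats(X, y):
    """Adjusted R^2, ln(sigma_tilde^2), ln(sigma_hat^2) and its approximation ln(sigma_hat^2) + k/T."""
    T, k = X.shape
    ssr = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    r2_bar = 1 - (ssr / (T - k)) / (np.sum((y - y.mean()) ** 2) / (T - 1))
    return r2_bar, np.log(ssr / (T - k)), np.log(ssr / T), np.log(ssr / T) + k / T

rng = np.random.default_rng(21)
T = 80
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X1 @ np.array([1.0, 1.0]) + rng.normal(size=T)
X2 = np.column_stack([X1, rng.normal(size=T)])

r2b_1, lt_1, lh_1, ap_1 = selection_stats(X1, y)
r2b_2, lt_2, lh_2, ap_2 = selection_stats(X2, y)
print((r2b_1 > r2b_2) == (lt_1 < lt_2))        # the two rules always agree
```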
Selection Based on Adjusted R-squared

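Selecting based on $\bar R^2$ is the same as selecting based on $\ln\hat\sigma^2 + k/T$, which is a particular choice of penalized likelihood criterion
It turns out that model selection based on any criterion of the form
$$\ln\hat\sigma^2 + c\,\frac{k}{T}, \qquad c > 0 \tag{7}$$
is inconsistent, as the rule tends to overfit. Indeed, since under $\mathcal{M}_1$,
$$T\left(\ln\hat\sigma_1^2 - \ln\hat\sigma_2^2\right) \simeq W_T \xrightarrow{d} \chi^2_{k_2} \tag{8}$$
we have
$$\begin{aligned}
\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_1 \mid \mathcal{M}_1\right] &= \Pr\left[\bar R_1^2 > \bar R_2^2 \mid \mathcal{M}_1\right] \\
&= \Pr\left[\ln\tilde\sigma_1^2 < \ln\tilde\sigma_2^2 \mid \mathcal{M}_1\right] \\
&\simeq \Pr\left[\ln\hat\sigma_1^2 + \frac{k_1}{T} < \ln\hat\sigma_2^2 + \frac{k_1 + k_2}{T} \;\Big|\; \mathcal{M}_1\right] \\
&\simeq \Pr\left[W_T < k_2 \mid \mathcal{M}_1\right] \\
&\to \Pr\left[\chi^2_{k_2} < k_2\right] < 1
\end{aligned}$$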
Selection Based on Information Criteria

Akaike Information Criterion
Akaike proposed an information criterion which takes the form
$$AIC = -\frac{2\hat\ell}{T} + \frac{2k}{T}$$
which with a Gaussian log-likelihood can be approximated by (7) with $c = 2$:
$$AIC \simeq \ln\hat\sigma^2 + \frac{2k}{T}$$
Imposes a larger penalty on overparameterization than does $\bar R^2$
Rule: select $\mathcal{M}_1$ if $AIC_1 < AIC_2$, else select $\mathcal{M}_2$
Since $AIC$ takes the form (7), it is an inconsistent model selection criterion, and it tends to overfit

Schwarz Criterion
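Modification of $AIC$: Schwarz (based on Bayesian arguments)
$$BIC = -\frac{2\hat\ell}{T} + \frac{k\ln(T)}{T}$$
which with a Gaussian log-likelihood can be approximated by
$$BIC \simeq \ln\hat\sigma^2 + \frac{k\ln(T)}{T}$$
Since $\ln(T) > 2$ (if $T \ge 8$), $BIC$ places a larger penalty than $AIC$ on the number of estimated parameters (it is more parsimonious)
$BIC$ is consistent. Indeed, since (8) holds under $\mathcal{M}_1$,
$$\frac{W_T}{\ln(T)} \xrightarrow{p} 0$$
so
$$\begin{aligned}
\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_1 \mid \mathcal{M}_1\right] &= \Pr\left[BIC_1 < BIC_2 \mid \mathcal{M}_1\right] \\
&\simeq \Pr\left[W_T < k_2\ln(T) \mid \mathcal{M}_1\right] \\
&= \Pr\left[\frac{W_T}{\ln(T)} < k_2 \;\Big|\; \mathcal{M}_1\right] \\
&\to \Pr\left(0 < k_2 \mid \mathcal{M}_1\right) = 1
\end{aligned}$$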

Schwarz Criterion
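Also, under $\mathcal{M}_2$ one can show that $W_T/\ln(T) \xrightarrow{p} \infty$, so
$$\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_2 \mid \mathcal{M}_2\right] = \Pr\left[BIC_2 < BIC_1 \mid \mathcal{M}_2\right] \simeq \Pr\left[\frac{W_T}{\ln(T)} > k_2 \;\Big|\; \mathcal{M}_2\right] \to 1$$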

Hannan-Quinn Criterion
Another popular model selection criterion is:
$$HQ = -\frac{2\hat\ell}{T} + \frac{2k\ln(\ln(T))}{T}$$
which with a Gaussian log-likelihood can be approximated by
$$HQ \simeq \ln\hat\sigma^2 + \frac{2k\ln(\ln(T))}{T}$$
Since $\ln(\ln(T)) > 1$ (if $T > 15$), $HQ$ places a larger penalty than $AIC$ on the number of estimated parameters and is more parsimonious
As $2\ln(\ln(T)) < \ln(T)$ for all $T > 1$, $BIC$ places a larger penalty than $HQ$ and selects more parsimonious models
$HQ$ is consistent
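The three criteria can be compared in a Monte Carlo experiment under $\mathcal{M}_1$, with one irrelevant regressor ($k_2 = 1$). The following NumPy sketch uses the Gaussian approximations above, with constants dropped. For $\chi^2_1$, the limiting frequency for $AIC$ is $\Pr(\chi^2_1 < 2)$, roughly 0.84. $BIC$ should come closest to 1. The sample size and number of replications are illustrative choices.

```python
import numpy as np

def criteria(X, y):
    """Gaussian approximations: ln(sigma_hat^2) + penalty * k / T."""
    T, k = X.shape
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    ls2 = np.log(e @ e / T)
    return {"AIC": ls2 + 2 * k / T,
            "BIC": ls2 + k * np.log(T) / T,
            "HQ": ls2 + 2 * k * np.log(np.log(T)) / T}

rng = np.random.default_rng(22)
T, reps = 200, 2000
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([X1, rng.normal(size=T)])   # k2 = 1 irrelevant regressor
picks_m1 = {"AIC": 0, "BIC": 0, "HQ": 0}
for _ in range(reps):
    y = X1 @ np.array([1.0, 1.0]) + rng.normal(size=T)   # M1 is true
    c1, c2 = criteria(X1, y), criteria(X2, y)
    for name in picks_m1:
        picks_m1[name] += int(c1[name] < c2[name])
freq = {name: n / reps for name, n in picks_m1.items()}
print(freq)   # share of replications selecting the true model M1
```

Because the three penalties are ordered, any replication in which $AIC$ picks $\mathcal{M}_1$ is also one in which $HQ$ and $BIC$ pick it.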

A Final Word of Caution
Results were obtained in the OLS context with Gaussian innovations
To compare different models, the dependent variable and the sample size need to be the same
Which model selection criterion is best? An open question and an active field of research
While consistency is desirable, there may be cases in which more parsimonious models run the risk of excluding relevant variables; that is why some researchers prefer $HQ$, which is consistent but not as parsimonious as $BIC$
From a practical standpoint, it is important to look at all three criteria. Who knows, they may all choose the same model!
Selection Among Multiple Regressors

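$y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K + \varepsilon$
Which regressors enter the regression?
Ordered case (nested):
$\mathcal{M}_1: \beta_1 \neq 0,\ \beta_2 = \beta_3 = \cdots = \beta_K = 0$
$\mathcal{M}_2: \beta_1 \neq 0,\ \beta_2 \neq 0,\ \beta_3 = \cdots = \beta_K = 0$
...
$\mathcal{M}_K: \beta_1 \neq 0,\ \beta_2 \neq 0,\ \beta_3 \neq 0, \ldots, \beta_K \neq 0$
These models are nested. Select the model that minimizes the criterion
Unordered case: $2^K$ models. For example, $2^{10} = 1{,}024$ and $2^{20} = 1{,}048{,}576$. Computationally demanding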
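The ordered case reduces to fitting $K$ nested models and taking the minimizer. The following NumPy sketch uses the penalized criterion (7) with penalty `2.0` ($AIC$) or `ln(T)` ($BIC$). In the simulated data, only the first three columns matter. The helper names are illustrative.

```python
import numpy as np

def penalized(X, y, penalty):
    """ln(sigma_hat^2) + penalty * k / T, as in (7)."""
    T, k = X.shape
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.log(e @ e / T) + penalty * k / T

def select_ordered(X, y, penalty):
    """Fit M_j with the first j columns (j = 1..K); return the j minimizing the criterion."""
    K = X.shape[1]
    values = [penalized(X[:, :j], y, penalty) for j in range(1, K + 1)]
    return int(np.argmin(values)) + 1

rng = np.random.default_rng(23)
T, K = 500, 8
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
beta = np.array([1.0, 1.0, 1.0, 0, 0, 0, 0, 0])   # only the first 3 columns matter
y = X @ beta + rng.normal(size=T)

p_aic = select_ordered(X, y, penalty=2.0)
p_bic = select_ordered(X, y, penalty=np.log(T))
print(p_aic, p_bic, 2**K)                          # 2**K candidate models if unordered
```

Since the $BIC$ penalty is larger, the $AIC$ choice can never be smaller than the $BIC$ choice in the ordered case.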
Specification Searches
Theory is often vague about the relationship between variables
As a result, many relations are established from empirical regularities
If not accounted for, this practice can generate serious biases in inference
Names: data mining, data snooping, data grubbing, data fishing
Examples:
"Because of space limitations, only the best of a variety of alternative models are presented here."
"The precise variables included in the regression were determined on the basis of extensive experimentation (on the same body of data)."
"Since there is no firmly validated theory, we avoided a priori specification of the functions we wished to fit."
"We let the data specify the model."
Newsletter scam
Conventional hypothesis testing is valid when a priori considerations, rather than exploratory data mining, determine the set of variables included
When the miner uncovers $t$-statistics that appear significant at the 0.05 level by running a large number of alternative regressions on the same body of data, the probability of a Type I error is much greater than the claimed 5%
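The inflated Type I error can be illustrated by simulation. In the following NumPy sketch, $y$ is pure noise and a hypothetical miner tries $K = 20$ unrelated candidate regressors one at a time, reporting a "finding" whenever any $|t| > 1.96$. With independent candidates, the share of such findings is close to $1 - 0.95^{20}$, about 0.64, rather than 0.05. It comes out slightly higher because 1.96 is the normal rather than the $t_{T-2}$ critical value.

```python
import numpy as np

rng = np.random.default_rng(24)
T, K, reps = 100, 20, 2000
hits = 0
for _ in range(reps):
    y = rng.normal(size=T)                       # y unrelated to every candidate
    Z = rng.normal(size=(T, K))                  # K candidate regressors
    yc, Zc = y - y.mean(), Z - Z.mean(axis=0)
    r = (Zc.T @ yc) / np.sqrt((Zc**2).sum(axis=0) * (yc @ yc))
    t = r * np.sqrt(T - 2) / np.sqrt(1 - r**2)   # slope t-stat in y = a + b z_j + e
    hits += int(np.any(np.abs(t) > 1.96))
print(hits / reps, 1 - 0.95**K)                  # actual vs. independent-tests benchmark
```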