
Gujarati (2003): Chapter 10

M. Balcılar, EMU, ECON 503


The nature of multicollinearity

In general:
Multicollinearity arises when there are exact linear relationships among the independent variables, that is, ∑ λiXi = 0,

or λ1X1 + λ2X2 + λ3X3 + … + λkXk = 0, where not all λi are zero.

For example, 1·X1 + 2·X2 = 0 ⇒ X1 = −2X2.

If multicollinearity is perfect, the regression coefficients of the Xi variables, the βi's, are indeterminate and their standard errors, se(β̂i), are infinite.
Example (3-variable case):

Ŷ = β̂1 + β̂2X2 + β̂3X3

In deviation form,

β̂2 = [(Σyx2)(Σx3²) − (Σyx3)(Σx2x3)] / [(Σx2²)(Σx3²) − (Σx2x3)²]

If x3 = λx2,

β̂2 = [(Σyx2)(λ²Σx2²) − (λΣyx2)(λΣx2²)] / [(Σx2²)(λ²Σx2²) − λ²(Σx2²)²] = 0/0

which is indeterminate.

Similarly,

β̂3 = [(Σyx3)(Σx2²) − (Σyx2)(Σx2x3)] / [(Σx2²)(Σx3²) − (Σx2x3)²]

and if x3 = λx2,

β̂3 = [(λΣyx2)(Σx2²) − (Σyx2)(λΣx2²)] / [(Σx2²)(λ²Σx2²) − λ²(Σx2²)²] = 0/0

which is again indeterminate.
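The 0/0 collapse can be verified numerically. A minimal sketch with made-up deviation-form data (the variables y, x2, and the multiplier λ are arbitrary choices for illustration):

```python
# Numerical check of the 0/0 collapse under perfect collinearity.
# Data are made up for illustration and already in deviation form (sums are 0).
y  = [2.0, -1.0, 4.0, 0.0, -5.0]
x2 = [1.0, -2.0, 3.0, -1.0, -1.0]
lam = 2.0
x3 = [lam * v for v in x2]          # perfect collinearity: x3 = λ·x2

Syx2 = sum(a * b for a, b in zip(y, x2))
Syx3 = sum(a * b for a, b in zip(y, x3))
S22  = sum(a * a for a in x2)
S33  = sum(a * a for a in x3)
S23  = sum(a * b for a, b in zip(x2, x3))

num   = Syx2 * S33 - Syx3 * S23     # numerator of β̂2
denom = S22 * S33 - S23 ** 2        # denominator of β̂2 (and β̂3)
print(num, denom)                   # 0.0 0.0 -> β̂2 = 0/0, indeterminate
```

Both sums vanish for any choice of data, because x3 carries no information beyond x2.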
If multicollinearity is imperfect,

x3 = λ2 x2+ ν where ν is a stochastic error


(or x3 = λ1+ λ2 x2+ ν )

Then the regression coefficients, although determinate,


possess large standard errors, which means the
coefficients can be estimated but with less accuracy.

β̂2 = [(Σyx2)(λ2²Σx2² + Σν²) − (λ2Σyx2 + Σyν)(λ2Σx2² + Σx2ν)] / [(Σx2²)(λ2²Σx2² + Σν²) − (λ2Σx2² + Σx2ν)²] ≠ 0/0

(Why?)
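A quick numerical sketch of the imperfect case, with made-up data: the denominator of β̂2 is now nonzero, so the estimate is determinate, but it is small, which is what inflates the variance.

```python
import random

# Sketch: x3 = λ2·x2 + ν with a small stochastic error ν (illustrative data).
random.seed(0)
x2 = [1.0, -2.0, 3.0, -1.0, -1.0]            # deviation form (sums to 0)
lam2 = 2.0
nu = [random.gauss(0.0, 0.1) for _ in x2]    # small error ν
x3 = [lam2 * a + e for a, e in zip(x2, nu)]
m3 = sum(x3) / len(x3)
x3 = [v - m3 for v in x3]                    # re-center x3 to deviation form

S22 = sum(a * a for a in x2)
S33 = sum(a * a for a in x3)
S23 = sum(a * b for a, b in zip(x2, x3))

denom = S22 * S33 - S23 ** 2   # > 0 by Cauchy-Schwarz, since x3 ≠ λ·x2 exactly
print(denom)
```

Shrinking the standard deviation of ν toward zero drives the denominator toward the 0/0 case of perfect collinearity.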
Perfect vs. Less Than Perfect
Multicollinearity
• Perfect multicollinearity

• Less than perfect multicollinearity


Sources of Multicollinearity
1. The data collection method employed, for example, sampling
over a limited range of the values taken by the regressors in the
population.
2. Constraints on the model or in the population being sampled.
For example, in the regression of electricity consumption on income
(X2) and house size (X3) there is a physical constraint in the
population in that families with higher incomes generally have larger
homes than families with lower incomes.
3. Model specification, for example, adding polynomial terms to a
regression model, especially when the range of the X variable is
small.
4. An overdetermined model. This happens when the model has
more explanatory variables than the number of observations. This
could happen in medical research where there may be a small
number of patients about whom information is collected on a large
number of variables.
Multicollinearity: Property of OLS
Estimators
It can be shown that even if multicollinearity is very
high, as in the case of near multicollinearity, the
OLS estimators still retain the property of BLUE.
Multicollinearity violates no regression
assumptions. Unbiased, consistent estimates
will occur, and their standard errors will be
correctly estimated. The only effect of
multicollinearity is to make it hard to get
coefficient estimates with small standard errors.
MULTICOLLINEARITY: MUCH
ADO ABOUT NOTHING?
• First, it is true that even in the case of near
multicollinearity the OLS estimators are unbiased. But
unbiasedness is a multisample or repeated-sampling
property; it says nothing about the properties of
estimators in any given sample.
• Second, it is also true that collinearity does not destroy
the property of minimum variance. But this does not
mean that the variance of an OLS estimator will
necessarily be small.
• Third, multicollinearity is essentially a sample
(regression) phenomenon in the sense that even if the X
variables are not linearly related in the population, they
may be so related in the particular sample at hand: In
short, our sample may not be “rich” enough to
accommodate all X variables in the analysis.

Consequences of imperfect multicollinearity

1. Although the estimated coefficients are BLUE, the OLS
estimators have large variances and covariances, making
estimation less precise.
2. The estimated confidence intervals tend to be much wider,
leading one to accept the "zero null hypothesis" more readily.
3. The t statistics of the coefficients tend to be statistically
insignificant.
4. The R² can be very high.
5. The OLS estimators and their standard errors can be
sensitive to small changes in the data.
(Consequences 3–5 can be detected directly from the regression results.)
Large variance and covariance of OLS estimators

var(β̂2) = σu² / [Σx2²(1 − r23²)] = (σu² / Σx2²) · VIF

Variance-inflating factor: VIF = 1 / (1 − r23²)

Higher pair-wise correlation ⇒ higher VIF ⇒ larger variance

where r23² is the R² from the OLS regression X2 = α1 + α2X3 + v
(equivalently r32² from X3 = α1′ + α2′X2 + v′).

In general,

var(β̂j) = σu² / [Σxj²(1 − Rj²)] = (σu² / Σxj²) · VIFj

where Rj² comes from the regression of Xj on the other regressors.
Large variance and covariance of OLS estimators

As a rule of thumb, if the VIF of a variable exceeds 10,
which will happen if Rj² exceeds 0.90, that variable is
said to be highly collinear.

The inverse of the VIF is called tolerance (TOL): TOLj = 1/VIFj.

VIF > 10 ⇔ TOL < 0.1
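In the two-regressor case the VIF and TOL follow directly from the pairwise correlation r23. A minimal sketch with made-up samples for X2 and X3 (the data are assumptions, chosen so the two regressors are nearly proportional):

```python
import math

# VIF and TOL from the pairwise correlation r23 (two-regressor case).
X2 = [1.0, 2.0, 3.0, 4.0, 5.0]
X3 = [1.0, 2.0, 3.0, 4.0, 6.0]   # nearly proportional to X2

def corr(x, y):
    """Sample (Pearson) correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r23 = corr(X2, X3)
vif = 1.0 / (1.0 - r23 ** 2)   # variance-inflating factor
tol = 1.0 / vif                # tolerance
print(round(vif, 2), round(tol, 4))  # 37.0 0.027
```

Here VIF = 37 > 10 and TOL < 0.1, so the rule of thumb flags X2 and X3 as highly collinear.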


Detecting Multicollinearity
1. High R2 but few significant t ratios.

Detecting Multicollinearity
2. High pair-wise correlations among regressors.

Suppose X4 = λ2X2 + λ3X3, where λ2 and λ3 are constants, not both
zero. Obviously, X4 is an exact linear combination of X2 and X3,
giving R²4.23 = 1, the coefficient of determination in the regression
of X4 on X2 and X3.

Since R²4.23 = 1, we get

1 = (r²42 + r²43 − 2 r42 r43 r23) / (1 − r²23)

This is satisfied by r42 = 0.5, r43 = 0.5, and r23 = −0.5, which are
not very high values. Hence high pairwise correlations are a
sufficient but not a necessary condition for multicollinearity.
Detecting Multicollinearity
3. Examination of partial correlations.
In the regression of Y on X2, X3, and X4, a finding that
R²1.234 is very high but r²12.34, r²13.24, and r²14.23 are
comparatively low may suggest that the variables X2,
X3, and X4 are highly intercorrelated and that at least
one of these variables is superfluous.

Although a study of the partial correlations may be


useful, there is no guarantee that they will provide an
infallible guide to multicollinearity, for it may happen
that both R2 and all the partial correlations are
sufficiently high.
Detecting Multicollinearity
4. Auxiliary regressions.
Regress each Xi on the remaining X variables and compute the
corresponding R², which we designate R²i; each of these
regressions is called an auxiliary regression.

Model: Yi = β1 + β2X2i + β3X3i + β4X4i + ui

Auxiliary regressions:
X2i = α1 + α3X3i + α4X4i + u2i ⇒ R²2
X3i = γ1 + γ2X2i + γ4X4i + u3i ⇒ R²3
X4i = φ1 + φ2X2i + φ3X3i + u4i ⇒ R²4

Check the overall F statistic, Fi, i = 2, 3, 4, in each of these
auxiliary regressions. If the computed F exceeds the critical Fi at
the chosen level of significance, it is taken to mean that the
particular Xi is collinear with the other X's.

Klein's rule of thumb: multicollinearity may be a troublesome problem only if
the R² obtained from an auxiliary regression is greater than the overall R².
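One auxiliary regression can be sketched by reusing the two-regressor deviation-form OLS formulas from the 3-variable case earlier. The data below are made up; X2 is constructed as roughly X3 + X4 plus noise, so its auxiliary R²2 comes out close to 1:

```python
# Sketch of the auxiliary regression of X2 on X3 and X4 (deviation form).
X2 = [3.1, 2.9, 7.2, 6.8, 11.0]   # roughly X3 + X4 plus noise (made up)
X3 = [1.0, 2.0, 3.0, 4.0, 5.0]
X4 = [2.0, 1.0, 4.0, 3.0, 6.0]

def dev(x):
    m = sum(x) / len(x)
    return [v - m for v in x]

y, a, b = dev(X2), dev(X3), dev(X4)
Saa = sum(u * u for u in a); Sbb = sum(u * u for u in b)
Sab = sum(u * v for u, v in zip(a, b))
Sya = sum(u * v for u, v in zip(y, a)); Syb = sum(u * v for u, v in zip(y, b))

den = Saa * Sbb - Sab ** 2
a3 = (Sya * Sbb - Syb * Sab) / den      # slope on X3
a4 = (Syb * Saa - Sya * Sab) / den      # slope on X4
ess = a3 * Sya + a4 * Syb               # explained sum of squares
r2_aux = ess / sum(u * u for u in y)    # R²2 of the auxiliary regression
vif2 = 1.0 / (1.0 - r2_aux)
print(round(r2_aux, 4), round(vif2, 1))
```

With this construction R²2 exceeds 0.99, so X2 would be flagged as collinear with X3 and X4.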
Detecting Multicollinearity
5. Eigenvalues and condition index.
Ξ = X′X, an (m × m) symmetric positive (semi)definite matrix.
The eigenvalues of Ξ are the solutions to |Ξ − λI| = 0, which
gives m eigenvalues λ1, λ2, …, λm.

Example:
Ξ = [20 5; 5 10],  Ξ − λI = [20−λ 5; 5 10−λ]
|Ξ − λI| = (20 − λ)(10 − λ) − (5)(5) = λ² − 30λ + 175 = 0
⇒ λ1 = 15 + 5√2, λ2 = 15 − 5√2
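The example's eigenvalues can be checked from the characteristic polynomial, whose coefficients are the trace and determinant of Ξ:

```python
import math

# Eigenvalues of Ξ = [[20, 5], [5, 10]] from the characteristic polynomial
# λ² − (trace)λ + (det) = λ² − 30λ + 175 = 0, via the quadratic formula.
trace, det = 20 + 10, 20 * 10 - 5 * 5
disc = math.sqrt(trace ** 2 - 4 * det)   # √(900 − 700) = 10√2
lam1 = (trace + disc) / 2                # 15 + 5√2 ≈ 22.07
lam2 = (trace - disc) / 2                # 15 − 5√2 ≈ 7.93
print(lam1, lam2)
```

Both roots are positive, as they must be for a positive definite X′X.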

Detecting Multicollinearity
5. Eigenvalues and condition index.
The condition number k is defined as the ratio of the largest to the
smallest eigenvalue of X′X, k = λmax/λmin, and the condition index
(CI) as CI = √(λmax/λmin) = √k.

Rule of thumb:
1. 100 < k < 1000 (10 < CI < 30) ⇒ moderate to strong
multicollinearity
2. k > 1000 (CI > 30) ⇒ severe multicollinearity
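Applying these definitions to the eigenvalues of the example matrix Ξ above:

```python
import math

# Condition number and condition index for the example Ξ = [[20, 5], [5, 10]],
# whose eigenvalues are λ1 = 15 + 5√2 and λ2 = 15 − 5√2.
lam_max = 15 + 5 * math.sqrt(2)
lam_min = 15 - 5 * math.sqrt(2)
k  = lam_max / lam_min        # condition number
ci = math.sqrt(k)             # condition index
print(round(k, 2), round(ci, 2))  # 2.78 1.67
```

Since k is far below 100 (CI far below 10), the rule of thumb signals no multicollinearity problem for this matrix.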
Detecting Multicollinearity
6. Tolerance and variance inflation factor
Rule of thumb:
If the VIF of a variable exceeds 10, which will happen if R²j
exceeds 0.90, that variable is said to be highly collinear.

Equivalently, TOLj = 1/VIFj = 1 − R²j.

The closer TOLj is to zero, the greater the degree of
collinearity of that variable with the other regressors. On the
other hand, the closer TOLj is to 1, the greater the evidence
that Xj is not collinear with the other regressors.

Detecting Multicollinearity
6. Tolerance and variance inflation factor
A high VIF is neither necessary nor sufficient for high
variances and high standard errors, since the variance also
depends on σu² and Σxj². Therefore, high multicollinearity, as
measured by a high VIF, may not necessarily cause high
standard errors. In all this discussion, the terms high and low
are used in a relative sense.



Remedial Measures

1. Utilise a priori information. Given β3 = 0.1β2:
   Y = β1 + β2X2 + β3X3 + u
     = β1 + β2X2 + 0.1β2X3 + u
     = β1 + β2(X2 + 0.1X3) + u
     = β1 + β2Z + u,  where Z = X2 + 0.1X3
2. Combining cross-sectional and time-series data.
3. Dropping a variable(s) and re-specifying the regression.
4. Transformation of variables:
   (i) First-difference form: ∆Y = β1 + β2∆X2 + β3∆X3 + u′
   (ii) Ratio transformation: Y/X3 = β1(1/X3) + β2(X2/X3) + β3 + u′
5. Additional or new data.
6. Reducing collinearity in polynomial regressions, e.g.
   Y = β1 + β2X2² + β3X3 + u′  or  Y = β1 + β2X2 + β3X3² + u′
7. Do nothing (if the objective is only prediction).
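Remedy 1 can be sketched numerically. The data below are made up, and Y is generated exactly as Y = 1 + 2X2 + 0.2X3, so that the true coefficients satisfy the restriction β3 = 0.1β2:

```python
# Sketch of remedy 1: impose the a priori restriction β3 = 0.1·β2 by
# regressing Y on the single combined regressor Z = X2 + 0.1·X3.
X2 = [1.0, 2.0, 3.0, 4.0, 5.0]
X3 = [10.0, 12.0, 11.0, 15.0, 14.0]
Y = [1 + 2 * a + 0.2 * b for a, b in zip(X2, X3)]  # true β2 = 2, β3 = 0.2

Z = [a + 0.1 * b for a, b in zip(X2, X3)]

# Simple OLS of Y on Z.
mz, my = sum(Z) / len(Z), sum(Y) / len(Y)
b2 = (sum((z - mz) * (y - my) for z, y in zip(Z, Y))
      / sum((z - mz) ** 2 for z in Z))
b1 = my - b2 * mz
b3 = 0.1 * b2                     # recovered via the restriction
print(b1, b2, b3)                 # ≈ 1.0, 2.0, 0.2
```

The restriction reduces the two collinear regressors to one, so the collinearity between X2 and X3 no longer enters the estimation.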
Remedial Measures

7. Other remedies

Factor analysis and principal components


Collinear variables are reduced into a single variable (a
combination of all collinear variables)

Ridge regression
A solution even in the case of perfect multicollinearity: we accept a
(small) bias in the estimates in exchange for a reduction in their variance.

[Regression output omitted: one of the regressors is significant and the
other insignificant, since GDP and GNP are highly related.]

Other examples of highly collinear pairs: CPI ↔ WPI; CD rate ↔ TB rate; M2 ↔ M3.
Example: Longley Data
Time series for the years 1947–1962:

Y = number of people employed, in thousands;
X1 = GNP implicit price deflator;
X2 = GNP, millions of dollars;
X3 = number of people unemployed, in thousands;
X4 = number of people in the armed forces;
X5 = noninstitutionalized population over 14 years of age;
X6 = year, equal to 1 in 1947, 2 in 1948, …, and 16 in 1962.

Example: Longley Data
Remedial Actions:

1. Express GNP not in nominal terms but in real terms, i.e.,
deflated by the implicit price deflator (X2/X1).
2. The noninstitutional population over 14 years of age (X5)
grows over time because of natural population growth and will
be highly correlated with time, the variable X6 in our model.
Therefore, instead of keeping both of these variables, we keep
X5 and drop X6.
3. There is no compelling reason to include X3, the number of
people unemployed.
