
MULTICOLLINEARITY

Truong Dang Thuy truong@dangthuy.net


Outline
What is it?
Sources of Multicollinearity
Detection
Remedy
Multicollinearity
One of the assumptions of the classical linear regression model (CLRM) is that there is no exact linear relationship among the regressors.
If there are one or more such relationships among the regressors, we call it multicollinearity, or collinearity for short.
Perfect collinearity: a perfect linear relationship exists between two (or more) regressors.
Imperfect collinearity: the regressors are highly (but not perfectly) collinear.
[Figure: two panels contrasting no relationship among the regressors with a relationship among the regressors]
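The difference between perfect and imperfect collinearity can be checked numerically. The sketch below uses made-up data (the values of `x2` and `x3` are illustrative assumptions, not taken from the bank dataset). When `x3` is an exact multiple of `x2`, the matrix X'X has a zero determinant, so the OLS normal equations have no unique solution. Perturbing a single observation makes the determinant nonzero, which is the imperfect case.

```python
# Hypothetical data: x3 is an exact linear function of x2 (perfect collinearity).
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x3 = [2.0 * v for v in x2]
ones = [1.0] * len(x2)


def gram(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Perfect collinearity: X'X is singular.
det_perfect = det3(gram([ones, x2, x3]))

# Imperfect collinearity: break the exact relationship in one observation.
x3_noisy = x3[:]
x3_noisy[0] += 0.5
det_imperfect = det3(gram([ones, x2, x3_noisy]))

print(det_perfect)    # 0.0
print(det_imperfect)  # 5.0
```

A determinant of zero means the coefficients cannot be estimated at all. A small but nonzero determinant means they can be estimated, but imprecisely.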


Sources of Multicollinearity
Data collection method used
Example: sampling over a limited range of values
Constraints on the population being sampled
Example: people with higher incomes are wealthier
There may not be enough observations on wealthy low-income individuals or high-income low-wealth individuals
Sources of Multicollinearity
Model specification
Example: adding polynomial terms to a model,
especially if the range of the variable is small.
Economic Function
Example: = (, )
Variables sharing a common time trend
Example: growth rate depends on interest rate and price level.
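The point about polynomial terms and a small range can be made concrete. The sketch below is an illustration on hypothetical data: the two grids of 50 equally spaced points are assumptions chosen only to contrast a wide range with a narrow one. It computes the correlation between a regressor and its square in each case.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical regressor observed on 50 equally spaced points.
x_wide = [100.0 * i / 49 for i in range(50)]   # range 0 to 100
x_narrow = [10.0 + i / 49 for i in range(50)]  # range 10 to 11

r_wide = pearson(x_wide, [v ** 2 for v in x_wide])
r_narrow = pearson(x_narrow, [v ** 2 for v in x_narrow])

print(f"wide range:   corr(x, x^2) = {r_wide:.5f}")
print(f"narrow range: corr(x, x^2) = {r_narrow:.5f}")
```

Over the narrow range the squared term is almost a linear function of the variable itself, so the two regressors are nearly collinear.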
Consequences
If collinearity is not perfect, but high, several consequences ensue:
The OLS estimators are still BLUE, but one or more regression coefficients have large standard errors relative to the values of the coefficients, thereby making the $t$ ratios small.
Therefore, one may conclude (misleadingly) that the true values of these coefficients are not different from zero.
Even though some regression coefficients are statistically insignificant, the $R^2$ value may be very high.
Also, the regression coefficients may be very sensitive to small changes in the data, especially if the sample is relatively small.
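These consequences can be reproduced in a small simulation. The sketch below is illustrative only: the data-generating process (unit coefficients, standard normal errors, 50 observations, seed 1) and the helper names `ols_two_regressors` and `simulate` are assumptions, not part of the bank example. It fits the same two-regressor model once with uncorrelated regressors and once with regressors whose population correlation is 0.99.

```python
import math
import random


def ols_two_regressors(y, x2, x3):
    """OLS of y on a constant, x2 and x3. Returns (b2, se_b2, r_squared)."""
    n = len(y)
    my, m2, m3 = sum(y) / n, sum(x2) / n, sum(x3) / n
    dy = [v - my for v in y]
    d2 = [v - m2 for v in x2]
    d3 = [v - m3 for v in x3]
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    s2y = sum(a * b for a, b in zip(d2, dy))
    s3y = sum(a * b for a, b in zip(d3, dy))
    den = s22 * s33 - s23 ** 2
    b2 = (s2y * s33 - s3y * s23) / den
    b3 = (s3y * s22 - s2y * s23) / den
    rss = sum((a - b2 * b - b3 * c) ** 2 for a, b, c in zip(dy, d2, d3))
    tss = sum(a * a for a in dy)
    sigma2_hat = rss / (n - 3)
    # var(b2) = sigma^2 / (s22 * (1 - r23^2)) = sigma^2 * s33 / den
    se_b2 = math.sqrt(sigma2_hat * s33 / den)
    return b2, se_b2, 1.0 - rss / tss


def simulate(rho, n=50, seed=1):
    """Draw x2, x3 with population correlation rho and fit y = 1 + x2 + x3 + u."""
    rng = random.Random(seed)
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    x3 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x2]
    y = [1 + a + b + rng.gauss(0, 1) for a, b in zip(x2, x3)]
    return ols_two_regressors(y, x2, x3)


for rho in (0.0, 0.99):
    b2, se_b2, r2 = simulate(rho)
    print(f"rho={rho:4.2f}  b2={b2:7.3f}  se(b2)={se_b2:6.3f}  "
          f"t={b2 / se_b2:6.2f}  R^2={r2:5.3f}")
```

With highly correlated regressors the standard error of the slope is several times larger and the $t$ ratio correspondingly smaller, while the overall fit remains good.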
Example: Non-performing loan
Source: Bankscope database, 30 VN banks, 2008 - 2012

[DEP VAR] npls: nonperforming loans as a share of total loans (%)
lllp: log of loan loss provision
creditgr: credit growth rate (%)
roa: profit / total assets (%)
roe: profit / total equity (%)
state: dummy, 1 = state-owned bank
foreign: dummy, 1 = foreign bank
power: bank's loans / total loans provided by all banks (%)
size: bank's assets / total assets of all banks (%)
Summary statistics
. sum

    Variable |  Obs        Mean    Std. Dev.         Min         Max
-------------+-------------------------------------------------------
        name |    0
        code |  150        15.5    8.684438           1          30
        year |  150        2010    1.418951        2008        2012
        npls |  116    2.366724    1.779102         .02        11.4
         llp |  139    18639.68    72802.38       .1523      464380
-------------+-------------------------------------------------------
    creditgr |  127    34.84189     44.5581      -29.86      344.17
         roe |  139    11.68265    9.431224     -56.326      31.526
         roa |  139     1.34736    1.227684      -5.993       7.936
       state |  150    .1333333    .3410734           0           1
     foreign |  150          .2      .40134           0           1
-------------+-------------------------------------------------------
       power |  150    3.332559    12.52116           0    66.76098
        size |  150    3.332914     12.2858           0    56.80165
        lllp |  139    5.649386    2.763179   -1.881903    13.04846
Non-performing loan: OLS regression

. reg npls lllp creditgr roa roe state foreign power size

      Source |       SS       df       MS              Number of obs =     109
-------------+------------------------------           F(  8,   100) =    1.71
       Model |  39.9361248     8  4.99201561           Prob > F      =  0.1051
    Residual |  291.838075   100  2.91838075           R-squared     =  0.1204
-------------+------------------------------           Adj R-squared =  0.0500
       Total |    331.7742   108  3.07198333           Root MSE      =  1.7083

------------------------------------------------------------------------------
        npls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lllp |   .3316394   .1235236     2.68   0.008     .0865721    .5767066
    creditgr |  -.0091769   .0055566    -1.65   0.102     -.020201    .0018472
         roa |   .4650457   .3305745     1.41   0.163    -.1908048    1.120896
         roe |  -.0702516   .0348194    -2.02   0.046    -.1393324   -.0011708
       state |  -.7321817   .5759936    -1.27   0.207    -1.874937    .4105732
     foreign |   .1708128   .6345434     0.27   0.788    -1.088103    1.429729
       power |   .0025636   .0468837     0.05   0.957    -.0904523    .0955795
        size |  -.0548576   .0538168    -1.02   0.311    -.1616286    .0519134
       _cons |   1.237104   .7009874     1.76   0.081     -.153635    2.627843
------------------------------------------------------------------------------
Variance Inflation Factor
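For the following regression model:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
It can be shown that:
$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum x_{2i}^2 \left(1 - r_{23}^2\right)} = \frac{\sigma^2}{\sum x_{2i}^2}\,\mathit{VIF}$$
and
$$\operatorname{var}(b_3) = \frac{\sigma^2}{\sum x_{3i}^2 \left(1 - r_{23}^2\right)} = \frac{\sigma^2}{\sum x_{3i}^2}\,\mathit{VIF}$$
where $\sigma^2$ is the variance of the error term $u_i$, $r_{23}$ is the coefficient of correlation between $X_2$ and $X_3$, and the lowercase $x_{2i}$, $x_{3i}$ denote deviations of $X_{2i}$, $X_{3i}$ from their sample means.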
Variance Inflation Factor
$$\mathit{VIF} = \frac{1}{1 - r_{23}^2}$$
is the variance-inflating factor.
VIF is a measure of the degree to which the variance of the OLS estimator is inflated because of collinearity.
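To get a feel for how quickly the factor grows, the two-regressor formula can be evaluated for a few correlations. The sketch below is a plain arithmetic illustration; the function name `vif_two_regressors` is hypothetical. The last value, 0.9686, is the correlation between power and size reported in the deck's correlation matrix. It is used here only as an input to the two-regressor formula, which is an approximation because the bank model has eight regressors.

```python
def vif_two_regressors(r23):
    """VIF = 1 / (1 - r23^2), valid for a model with exactly two regressors."""
    return 1.0 / (1.0 - r23 ** 2)


for r in (0.0, 0.5, 0.8, 0.9, 0.9686):
    print(f"r23 = {r:6.4f}  ->  VIF = {vif_two_regressors(r):6.2f}")
```

At r23 = 0.9686 the formula gives a VIF of about 16.2. Stata reports 16.40 for power and 19.38 for size, where each VIF is based on the regression of that variable on all the other regressors rather than on a single pairwise correlation.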
Detection of Multicollinearity
High $R^2$ but few significant $t$ ratios
High pair-wise correlations among explanatory variables or regressors
Significant $F$ test for auxiliary regressions (regressions of each regressor on the remaining regressors), or an $R^2$ of an auxiliary regression that is higher than the $R^2$ of the regression of $Y$ on the $X$'s
Unexpected coefficient signs despite a high $R^2$
High variance inflation factor ($\mathit{VIF} > 10$)
Estimates change substantially when one more independent variable is added
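The auxiliary-regression and VIF checks in this list can be sketched in a few lines. The code below is a minimal standard-library illustration on synthetic data, not the routine Stata uses: the helper names `solve`, `r_squared` and `vifs`, the seed, and the three made-up regressors are all assumptions. Each regressor is regressed on the others and $1/(1 - R_j^2)$ is reported.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, regressors):
    """R^2 from an OLS regression of y on a constant plus the given columns."""
    n = len(y)
    cols = [[1.0] * n] + regressors
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - rss / tss


def vifs(regressors):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the rest."""
    out = []
    for j, col in enumerate(regressors):
        others = regressors[:j] + regressors[j + 1:]
        out.append(1.0 / (1.0 - r_squared(col, others)))
    return out


# Synthetic regressors: c is almost a copy of a, b is unrelated to both.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(200)]
b = [rng.gauss(0, 1) for _ in range(200)]
c = [ai + 0.1 * rng.gauss(0, 1) for ai in a]

result = vifs([a, b, c])
for name, v in zip(("a", "b", "c"), result):
    print(f"{name}: VIF = {v:8.2f}  {'(> 10)' if v > 10 else ''}")
```

The two nearly identical regressors are flagged by the VIF > 10 rule, while the unrelated one has a VIF close to 1.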
Correlation matrix
. cor npls lllp creditgr roa roe state foreign power size
(obs=109)

             |     npls     lllp creditgr      roa      roe    state  foreign    power     size
-------------+---------------------------------------------------------------------------------
        npls |   1.0000
        lllp |   0.0926   1.0000
    creditgr |  -0.2097  -0.0655   1.0000
         roa |  -0.1116  -0.1105   0.3881   1.0000
         roe |  -0.1519   0.1771   0.3280   0.7633   1.0000
       state |  -0.0255   0.3182  -0.1173  -0.0612   0.2321   1.0000
     foreign |   0.0177  -0.3569  -0.1728   0.1352  -0.1193  -0.1390   1.0000
       power |  -0.0092   0.6785  -0.0362  -0.0833  -0.0771  -0.1091  -0.1005   1.0000
        size |  -0.0046   0.7004  -0.0363  -0.0928  -0.0995  -0.1166  -0.1023   0.9686   1.0000

High pairwise correlation coefficients (as a rule of thumb, above 0.8) suggest high multicollinearity.
Low correlation coefficients do not imply the absence of multicollinearity, as multicollinearity may involve more than two variables.
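The second point can be checked with a constructed example. The sketch below uses a made-up design (an assumption, unrelated to the bank data): `x1`, `x2`, `x3` are the columns of a 2x2x2 factorial, so they are mutually uncorrelated, and `x4` is their sum plus a small term that is uncorrelated with each of them. No pairwise correlation exceeds 0.8, yet `x4` is almost an exact linear combination of the other three.

```python
import itertools
import math


def pearson(a, b):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical design: all 8 combinations of -1/+1 for three regressors.
design = list(itertools.product([-1.0, 1.0], repeat=3))
x1 = [d[0] for d in design]
x2 = [d[1] for d in design]
x3 = [d[2] for d in design]
# x4 is nearly x1 + x2 + x3; the extra term is uncorrelated with x1, x2, x3.
x4 = [p + q + s + 0.1 * p * q * s for p, q, s in design]

cols = {"x1": x1, "x2": x2, "x3": x3, "x4": x4}
pairwise = {(i, j): pearson(cols[i], cols[j])
            for i, j in itertools.combinations(cols, 2)}
max_abs_r = max(abs(r) for r in pairwise.values())

# Because x1, x2, x3 are mutually uncorrelated in this design, the R^2 of the
# auxiliary regression of x4 on them equals the sum of the squared simple
# correlations between x4 and each of them.
r2_aux = sum(pearson(x4, x) ** 2 for x in (x1, x2, x3))
vif_x4 = 1.0 / (1.0 - r2_aux)

print(f"largest |pairwise correlation| = {max_abs_r:.4f}")
print(f"auxiliary R^2 for x4           = {r2_aux:.4f}")
print(f"VIF for x4                     = {vif_x4:.1f}")
```

The largest pairwise correlation is about 0.58, but the VIF for `x4` is about 301, so a correlation matrix alone would miss this collinearity.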
Non-performing loan: Auxiliary regression

. reg lllp creditgr roa roe state foreign power size

      Source |       SS       df       MS              Number of obs =     126
-------------+------------------------------           F(  7,   118) =   48.32
       Model |  655.192407     7  93.5989152           Prob > F      =  0.0000
    Residual |  228.561324   118  1.93696038           R-squared     =  0.7414
-------------+------------------------------           Adj R-squared =  0.7260
       Total |  883.753731   125  7.07002985           Root MSE      =  1.3917

------------------------------------------------------------------------------
        lllp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    creditgr |   .0007621   .0034872     0.22   0.827    -.0061435    .0076678
         roa |  -.7772079   .2155654    -3.61   0.000    -1.204086   -.3503298
         roe |   .1228698   .0203874     6.03   0.000     .0824971    .1632424
       state |   1.853699   .3731038     4.97   0.000     1.114852    2.592546
     foreign |  -.8523212   .3820269    -2.23   0.028    -1.608839   -.0958039
       power |  -.0229738   .0380647    -0.60   0.547    -.0983521    .0524046
        size |   .1736726   .0402785     4.31   0.000     .0939101     .253435
       _cons |   4.647297   .2325704    19.98   0.000     4.186744    5.107849
------------------------------------------------------------------------------
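The overall $F$ statistic of this auxiliary regression can be recovered from its $R^2$, and the same $R^2$ can be compared with that of the main regression. The sketch below uses only numbers printed in the Stata output; the function name `f_from_r2` is hypothetical. Note that the comparison of the two $R^2$ values is rough, since the auxiliary regression uses 126 observations and the main regression 109.

```python
def f_from_r2(r2, n_slopes, df_resid):
    """Overall F statistic: (R^2 / k) / ((1 - R^2) / residual df)."""
    return (r2 / n_slopes) / ((1.0 - r2) / df_resid)


aux_r2 = 0.7414   # R-squared of lllp on the other seven regressors
main_r2 = 0.1204  # R-squared of the npls regression

f_aux = f_from_r2(aux_r2, n_slopes=7, df_resid=118)
aux_exceeds_main = aux_r2 > main_r2

print(f"F(7, 118) from R^2 = {f_aux:.2f} (Stata reports 48.32)")
print(f"auxiliary R^2 exceeds main R^2: {aux_exceeds_main}")
```

The small gap between the recomputed value and 48.32 comes from using the rounded $R^2$. Both checks point the same way: lllp is well explained by the other regressors.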
Variance inflating factor
. vif

    Variable |       VIF       1/VIF
-------------+----------------------
        size |     19.38    0.051599
       power |     16.40    0.060968
        lllp |      4.19    0.238577
         roe |      4.09    0.244621
         roa |      3.75    0.266922
       state |      1.55    0.644362
     foreign |      1.36    0.732870
    creditgr |      1.30    0.771305
-------------+----------------------
    Mean VIF |      6.50
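Each VIF corresponds to the $R^2$ of an auxiliary regression through $R_j^2 = 1 - 1/\mathit{VIF}_j$, so the table can be read back into auxiliary $R^2$ values. The sketch below does this with the numbers printed above and applies the VIF > 10 rule of thumb; the variable names in the code are illustrative.

```python
# VIF and 1/VIF as printed in the Stata table.
vif_table = {
    "size": (19.38, 0.051599),
    "power": (16.40, 0.060968),
    "lllp": (4.19, 0.238577),
    "roe": (4.09, 0.244621),
    "roa": (3.75, 0.266922),
    "state": (1.55, 0.644362),
    "foreign": (1.36, 0.732870),
    "creditgr": (1.30, 0.771305),
}

# Auxiliary R^2 implied by each tolerance: R_j^2 = 1 - 1/VIF_j.
implied_r2 = {name: 1.0 - tol for name, (_, tol) in vif_table.items()}
flagged = [name for name, (vif, _) in vif_table.items() if vif > 10]
mean_vif = sum(vif for vif, _ in vif_table.values()) / len(vif_table)

for name, r2 in implied_r2.items():
    print(f"{name:9s} implied auxiliary R^2 = {r2:.4f}")
print("VIF > 10:", flagged)
print(f"mean VIF = {mean_vif:.2f}")
```

Only size and power exceed the threshold. The implied $R^2$ for lllp (about 0.76) differs slightly from the 0.7414 in the auxiliary regression shown earlier, presumably because that regression uses 126 observations while the VIFs are computed on the 109 observations of the main regression.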


Solutions
General Rules of Thumb
Don't worry if the $|t|$ ratios for each independent variable are higher than 2,
because hypothesis tests remain valid and the coefficient estimates are still BLUE
Don't worry if the coefficients have the expected signs
Solutions
Restructuring of the model
There may be alternative specifications or alternative
functional forms
Example: production function
= (, , )
Solution:

= ( , , )

Solutions
Dropping one independent variable
How

Example
Non-performing loan
reg npls lllp creditgr roa roe state foreign power size
est store Model1
reg npls lllp creditgr roa roe state foreign power
est store Model2
reg npls lllp creditgr roa roe state foreign size
est store Model3
. est tab Model1 Model2 Model3

    Variable |   Model1        Model2        Model3
-------------+----------------------------------------
        lllp |  .33163937     .28240042     .33123143
    creditgr | -.00917688    -.00944659    -.00918439
         roa |  .46504569      .4080824     .46376619
         roe | -.07025157     -.0614077    -.07004183
       state | -.73218171    -.63319235    -.73211074
     foreign |   .1708128     .12496422     .17082746
       power |  .00256364    -.04105714
        size | -.05485761                  -.05217162
       _cons |  1.2371041     1.4567478     1.2387091
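One way to read this comparison is to compute how much each coefficient moves relative to Model1 when size (Model2) or power (Model3) is dropped. The sketch below only rearranges the estimates printed in the table; the helper `pct_change` and the dictionary layout are illustrative choices.

```python
# Coefficients as reported by "est tab": (Model1, Model2, Model3).
# None marks a variable that is not in that model.
coefs = {
    "lllp": (0.33163937, 0.28240042, 0.33123143),
    "creditgr": (-0.00917688, -0.00944659, -0.00918439),
    "roa": (0.46504569, 0.4080824, 0.46376619),
    "roe": (-0.07025157, -0.0614077, -0.07004183),
    "state": (-0.73218171, -0.63319235, -0.73211074),
    "foreign": (0.1708128, 0.12496422, 0.17082746),
    "power": (0.00256364, -0.04105714, None),
    "size": (-0.05485761, None, -0.05217162),
}


def pct_change(new, old):
    """Percentage change of a coefficient relative to its Model1 value."""
    return 100.0 * (new - old) / abs(old)


changes = {}
for name, (m1, m2, m3) in coefs.items():
    c2 = pct_change(m2, m1) if m2 is not None else None
    c3 = pct_change(m3, m1) if m3 is not None else None
    changes[name] = (c2, c3)
    show = lambda c: "     n/a" if c is None else f"{c:8.1f}"
    print(f"{name:9s} Model2: {show(c2)}%   Model3: {show(c3)}%")
```

In these estimates, dropping power (Model3) leaves the coefficients of the six other regressors within about 0.3 percent of their Model1 values and changes the size coefficient by about 5 percent. Dropping size (Model2) moves several coefficients by more than 10 percent and flips the sign of power.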
