
Detecting Collinearity:

You must always be cautious when using multiple linear regression commands within a statistical environment
because, although the program will produce numbers, these numbers may be misleading when other factors are not
considered. R does not perform interpretations unless you specifically program it to do so. One thing that you may
encounter when performing multiple linear regression is the presence of collinear variables, that is, predictors that
are related to each other. When predictor variables are highly correlated, coefficient estimates may become inaccurate,
and their effect on the model may be over- or underestimated. The predictive power of the model as a whole is not
necessarily reduced, but the individual coefficients become harder to interpret.
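A small simulation can make this concrete. The sketch below is not part of the blood pressure example: every variable in it (x1, x2_indep, x2_coll and the two responses) is made up for illustration. It fits the same kind of model twice, once with two unrelated predictors and once with two nearly identical ones, and compares the standard errors of the coefficients.

```r
# Hypothetical simulated data; none of these variables come from bp.csv.
set.seed(42)
n <- 200
x1       <- rnorm(n)
x2_indep <- rnorm(n)                 # unrelated to x1
x2_coll  <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1
noise    <- rnorm(n)

# Same true coefficients in both settings
y_indep <- 1 + 2 * x1 + 3 * x2_indep + noise
y_coll  <- 1 + 2 * x1 + 3 * x2_coll  + noise

# Helper: pull the coefficient standard errors out of a fitted model
se <- function(fit) summary(fit)$coefficients[, "Std. Error"]

se_indep <- se(lm(y_indep ~ x1 + x2_indep))
se_coll  <- se(lm(y_coll  ~ x1 + x2_coll))

print(round(se_indep, 3))
print(round(se_coll, 3))
```

In the collinear fit the standard errors for x1 and x2_coll should come out several times larger than in the independent fit, even though the noise is identical. That is the sense in which the estimates become unreliable.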
It is prudent to test for collinearity before you run any important statistical tests involving regression. There
are a few ways to do this in R. The two that I will demonstrate here are visual representations of the predictor
variables and Variance Inflation Factors.
To determine whether two predictor variables are highly correlated, you can plot them against each other. If there is a
strong relationship, it may be apparent from the spread of the paired data points. For this example we will use a blood
pressure data set.

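# Read the blood pressure data (adjust the path to wherever bp.csv is saved)
bp <- read.csv("C:/Users/Student/Desktop/statintern/bp.csv", header = TRUE)
head(bp)    # inspect the first few rows
attach(bp)  # optional here: the later commands pass data = bp explicitly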

Use the pairs(data) command to plot all the variables against each other simultaneously.

pairs(bp)

R produces a scatterplot matrix, with one panel for each pair of variables.

The individual values of each factor are plotted against the individual values of all other factors. Look among the plots
for any strong correlations, or in this case, linear relationships between variables. Weight and BSA appear to be
strongly correlated, and Age and Weight may be as well.
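The visual impression can be backed up with numbers by computing the pairwise correlations. The sketch below uses base R's cor() on a simulated stand-in for the data, because bp.csv is not reproduced here. The column names follow the text, but every value is made up, and BSA is deliberately constructed to track Weight.

```r
# Simulated stand-in for the blood pressure data; all values are made up.
set.seed(1)
n <- 50
Weight <- rnorm(n, mean = 90, sd = 5)
BSA    <- 0.02 * Weight + rnorm(n, sd = 0.03)  # built to follow Weight closely
Age    <- rnorm(n, mean = 50, sd = 3)
sim <- data.frame(Age, Weight, BSA)

# Pairwise Pearson correlations, rounded for readability
cor_mat <- round(cor(sim), 2)
print(cor_mat)
```

With the real data, the equivalent call would be cor(bp), assuming every column is numeric. Entries close to 1 or -1 correspond to the tight linear bands seen in the pairs() plot.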

VIF: Variance Inflation Factor

Another commonly used method to test for multicollinearity involves Variance Inflation Factors (VIFs). A VIF
measures how much the variance of an estimated coefficient is increased because of collinearity. To calculate a VIF
in R, one must first install the package car from a CRAN mirror, for example with install.packages("car"), and then load it.

library(car)

Once the car package has been loaded, you must create a multiple linear regression model in R that includes all the
factors that you wish to test. Using the . on the right-hand side of the model formula tells R to use all the other
variables in the data set as predictors. A VIF can only be calculated for factors that are included in the model.

# Regress BP on every other column of bp
MLRmod <- lm(BP ~ ., data = bp)
# One VIF per predictor
vif(MLRmod)
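To see what a VIF is measuring, it can also be computed by hand. For a model with only numeric predictors, the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the other predictors. The sketch below does this in base R. The helper manual_vif is a hypothetical name introduced here, not a function from car, and the data are simulated rather than taken from bp.csv.

```r
# Simulated stand-in for the blood pressure data; all values are made up.
set.seed(1)
n <- 50
Weight <- rnorm(n, mean = 90, sd = 5)
BSA    <- 0.02 * Weight + rnorm(n, sd = 0.03)  # built to follow Weight closely
Age    <- rnorm(n, mean = 50, sd = 3)
BP     <- 0.5 * Weight + 0.3 * Age + rnorm(n, sd = 2)
sim <- data.frame(BP, Age, Weight, BSA)

# Hypothetical helper: VIF of each predictor via 1 / (1 - R^2)
manual_vif <- function(data, response) {
  predictors <- setdiff(names(data), response)
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Regress predictor p on the remaining predictors
    fit <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}

vifs <- manual_vif(sim, "BP")
print(round(vifs, 2))
```

If car is installed, vif(lm(BP ~ ., data = sim)) should give the same values for a model like this one. A predictor that the others cannot explain at all has R^2 = 0 and therefore a VIF of 1, the smallest possible value.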

As a general rule, if the VIF is larger than 5, then multicollinearity is assumed to be high. In this example there are two
terms, Weight and BSA, that are highly collinear. To remedy this, one may exclude one of the collinear factors or a
combination of factors, or use some form of stepwise regression to determine the best model, as will be discussed in the
next section.
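As a rough illustration of the first remedy, the sketch below drops one of the two collinear predictors and checks the VIF again. It uses simulated data with the same column names as the example. The values are made up, and choosing to drop BSA rather than Weight is an arbitrary assumption for the demonstration, not a recommendation for the real data.

```r
# Simulated stand-in for the blood pressure data; all values are made up.
set.seed(1)
n <- 50
Weight <- rnorm(n, mean = 90, sd = 5)
BSA    <- 0.02 * Weight + rnorm(n, sd = 0.03)  # built to follow Weight closely
Age    <- rnorm(n, mean = 50, sd = 3)
BP     <- 0.5 * Weight + 0.3 * Age + rnorm(n, sd = 2)
sim <- data.frame(BP, Age, Weight, BSA)

# Full model, then the same model with BSA removed
full    <- lm(BP ~ ., data = sim)
reduced <- update(full, . ~ . - BSA)
print(formula(reduced))

# With only two predictors left, both share the VIF 1 / (1 - r^2),
# where r is their correlation
r <- cor(sim$Age, sim$Weight)
vif_reduced <- 1 / (1 - r^2)
print(round(vif_reduced, 2))
```

With the car package loaded, vif(reduced) would report the same quantity for each remaining predictor. Once the VIFs fall well below 5, the remaining coefficients can be interpreted with more confidence.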
2012

12/07