
Multivariate Introduction

Correlation Analysis

The correlation coefficient, or Pearson's correlation coefficient,
was introduced by Karl Pearson in the 1890s. The correlation
coefficient is a measure of the strength of the linear relationship
between two continuous random variables. It is denoted by
$\rho_{XY}$ for the population and by $r_{XY}$ for a sample.

The correlation coefficient takes values in the interval [-1, 1].
If its value is 1 or -1, there is a perfect linear relationship
between the variables. A positive sign shows a positive (direct)
relationship, while a negative sign shows a negative (inverse,
opposite) relationship between the variables. A value of zero
implies the absence of a linear relationship. It does not by itself
show that the variables are independent: some other sort of
relationship, such as a systematic nonlinear or circular relation,
may still exist between the variables of interest.
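
As a small illustration of these boundary cases, the sketch below computes the sample coefficient with the usual product-moment formula for four made-up data sets: an increasing line, a decreasing line, a parabola, and points on a circle. The helper name `pearson_r` and all data values are assumptions chosen for the example, not part of the original notes.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]

# Perfect positive and perfect negative linear relationships.
r_up = pearson_r(xs, [2 * x + 1 for x in xs])
r_down = pearson_r(xs, [-2 * x + 1 for x in xs])

# Y is fully determined by X, yet the linear correlation vanishes.
r_parabola = pearson_r(xs, [x ** 2 for x in xs])

# Eight equally spaced points on the unit circle (a circular relation).
angles = [2 * math.pi * k / 8 for k in range(8)]
r_circle = pearson_r([math.cos(a) for a in angles],
                     [math.sin(a) for a in angles])

print(r_up, r_down, r_parabola, r_circle)
```

The last two results are zero (up to rounding) even though the variables are clearly related, which is why a zero coefficient rules out only a linear relationship.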

Mathematically, if two random variables X and Y follow an unknown
joint distribution, then the simple linear correlation coefficient
is equal to the covariance between X and Y divided by the product
of their standard deviations, i.e.

$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

where Cov(X, Y) is the covariance between X and Y,


1 Visit: http://itfeature.com

$\sigma_X$ and $\sigma_Y$ are the respective standard deviations of
the random variables.

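For a sample of size n, $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$,
from the joint distribution, the quantity given below is an estimate
of $\rho$, called the sample correlation coefficient and denoted by r:

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{\mathrm{Cov}(X, Y)}{S_X S_Y}$$

where, in the last expression, Cov(X, Y) is the sample covariance
and $S_X$, $S_Y$ are the sample standard deviations.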
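
The sample coefficient r can be computed directly from its definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The sketch below does this for a small made-up data set and checks that the same value results from dividing the sample covariance by the product of the sample standard deviations. The function names and the data are assumptions for illustration only.

```python
import math


def pearson_r(xs, ys):
    """r from sums of deviations about the sample means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_r_from_cov(xs, ys):
    """r as sample covariance over the product of sample std deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # The same divisor (n - 1) is used throughout, so it cancels in the ratio.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)


# Hypothetical sample of n = 5 pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_alt = pearson_r_from_cov(xs, ys)
print(round(r, 4), round(r_alt, 4))  # both are about 0.7746
```

For these data the sum of cross-products is 6 and the sums of squares are 10 and 6, so r = 6 / sqrt(60).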

Note that
The existence of a statistical correlation does not mean that
there exists a cause-and-effect relation between the variables.
Cause and effect means that a change in one variable causes a
change in the other variable.

The changes in the variables may be due to a common cause or to
random variation.

There are many kinds of correlation coefficient. The choice of
which to use for a particular set of data depends on different
factors, such as:
• Type of scale (measurement scale) used to express the variables
• Nature of the underlying distribution (continuous or discrete)
• Characteristics of the distribution of the scores (linear or
  nonlinear)

Correlation is perfectly linear if a constant change in X is
accompanied by a constant change in Y. In this case all the points
in the scatter diagram lie on a straight line.

A high correlation coefficient does not necessarily imply a direct
dependence between the variables. For example, there may be a high
correlation between the number of crimes and shoe prices. Such
correlations are referred to as nonsense or spurious correlations.
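
The common-cause point in the notes above can be illustrated with a small simulation. In the sketch below, a hidden variable z drives both x and y, which otherwise have nothing to do with each other. The variable names, noise level, sample size and seed are all assumptions chosen for the example.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# z is the common cause; x and y each equal z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.3) for zi in z]
y = [zi + rng.gauss(0, 0.3) for zi in z]

# x and y are strongly correlated although neither influences the other.
r_xy = pearson_r(x, y)

# Once the common cause is subtracted, only unrelated noise remains.
r_noise = pearson_r([xi - zi for xi, zi in zip(x, z)],
                    [yi - zi for yi, zi in zip(y, z)])

print(round(r_xy, 3), round(r_noise, 3))
```

The first coefficient is high and the second is close to zero, so the association between x and y comes entirely from z.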
Properties of the Correlation Coefficient

The correlation coefficient is symmetric with respect to X and Y,
i.e. $r_{XY} = r_{YX}$.
The correlation coefficient is a pure number and does not depend
on the units in which the variables are measured.

The correlation coefficient is the geometric mean of the two
regression coefficients. Thus if the two regression lines of Y on X
and X on Y are written as Y = a + bX and X = c + dY respectively,
then $bd = r^2$, i.e. $r = \pm\sqrt{bd}$, where r takes the common
sign of b and d.

The correlation coefficient is independent of the choice of origin
and scale of measurement of the variables, i.e. r remains unchanged
if constants are added to or subtracted from the variables and if
the variables are multiplied or divided by positive constants (for
example, the class interval size).

The correlation coefficient lies between -1 and +1, symbolically
$-1 \le r \le 1$.
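
The properties above can be checked numerically on a small made-up data set. The sketch below verifies symmetry, invariance under a change of origin and (positive) scale, the relation bd = r^2 between the two least-squares regression slopes, and the bounds on r. The helper names, the data and the transformation constants are assumptions for illustration.

```python
import math


def deviation_sums(xs, ys):
    """Return (Sxy, Sxx, Syy): sums of products of deviations from the means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    sxy, sxx, syy = deviation_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

# Symmetry: swapping the roles of X and Y leaves r unchanged.
r_swapped = pearson_r(ys, xs)

# Change of origin and positive scale: r is unchanged.
xs_t = [(x - 10) / 2.5 for x in xs]
ys_t = [3 * y + 7 for y in ys]
r_transformed = pearson_r(xs_t, ys_t)

# Least-squares slopes of Y on X (b) and of X on Y (d).
sxy, sxx, syy = deviation_sums(xs, ys)
b = sxy / sxx
d = sxy / syy

print(r, r_swapped, r_transformed, b * d, r ** 2)
```

Since b = Sxy / Sxx and d = Sxy / Syy, their product is Sxy^2 / (Sxx Syy), which is exactly r^2.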

