
CORRELATION AND REGRESSION

CORRELATION
CORRELATION COEFFICIENT
It is a mathematical index that describes the direction & magnitude
of a relationship.
Positive
High scores on X associated with High scores on Y
Negative
High scores on X associated with Low scores on Y
No Correlation
High or Low scores are not associated
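
A minimal sketch of how such an index can be computed, assuming the Pearson product-moment coefficient is the one intended (the slides do not name a formula). The function name pearson_r and all scores below are hypothetical, invented only to show a positive, a negative, and a zero correlation.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores, made up for illustration
x = [1, 2, 3, 4, 5]
y_positive = [2, 4, 5, 4, 5]   # high X tends to go with high Y
y_negative = [5, 4, 5, 4, 2]   # high X tends to go with low Y
y_none = [3, 1, 3, 1, 3]       # no linear association with X

for label, y in [("positive", y_positive),
                 ("negative", y_negative),
                 ("none", y_none)]:
    print(label, round(pearson_r(x, y), 3))
```

The sign of the result gives the direction of the relationship and its absolute size (between 0 and 1) gives the magnitude.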

SCATTERPLOTS

REGRESSION
Using correlation to assess the magnitude and direction of a
relationship
Used to make predictions about scores on one variable from the
knowledge of scores on another variable
Term first used in 1885 by Sir Francis Galton
Galton observed "regression toward mediocrity", now called regression
toward the mean: extreme scores tend to move closer to the mean on
repeated occasions
Karl Pearson developed the first statistical models of correlation and
regression (late 19th century)

REGRESSION SLOPE


The slope describes how much change is expected in Y
each time X increases by one unit.
E.g. how much change is expected in Aggression Level (Y) each
time one more violent movie (X) is watched
The point where the regression line crosses the Y axis is called
the INTERCEPT; it is the predicted value of Y when X is ZERO
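
A small sketch of how slope and intercept can be obtained, assuming the usual least-squares formulas (slope = sum of cross-products / sum of squares of X; intercept = mean of Y minus slope times mean of X). The function name fit_line and the movie and aggression scores are hypothetical.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line for Y on X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx                      # expected change in Y per one-unit increase in X
    intercept = mean_y - slope * mean_x    # predicted Y when X is 0
    return slope, intercept

# Hypothetical data: violent movies watched (X) and aggression level (Y)
movies = [0, 1, 2, 3, 4]
aggression = [2, 3, 5, 6, 9]

slope, intercept = fit_line(movies, aggression)
print("slope:", round(slope, 2))
print("intercept:", round(intercept, 2))

# Predicted aggression for a person who watches 5 movies
predicted = intercept + slope * 5
print("predicted Y at X = 5:", round(predicted, 2))
```

With these made-up scores, each additional movie goes with an expected rise of about 1.7 points in aggression, and the line crosses the Y axis at about 1.6.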

REGRESSION LINE


Defined as the best-fitting straight line through a set of
points in a scatter diagram
Is formed by the principle of least squares: it minimizes
the sum of squared deviations around the regression line
The MEAN is the point of least squares for any single variable:
squared deviations are smallest when taken around the mean.
Residual - The difference between the Observed scores
and Predicted scores
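
The claim that the mean is the point of least squares can be checked with a short sketch. The helper sum_sq_dev and the scores are hypothetical; the idea is simply to compare the sum of squared deviations around the mean with the same sum around a few other candidate values.

```python
def sum_sq_dev(scores, center):
    """Sum of squared deviations of the scores around a chosen center."""
    return sum((s - center) ** 2 for s in scores)

# Hypothetical scores, made up for illustration
scores = [2, 3, 5, 6, 9]
mean = sum(scores) / len(scores)

# Compare the mean with a few other candidate centers
candidates = [3.0, 4.0, mean, 6.0, 7.0]
for c in candidates:
    print(c, sum_sq_dev(scores, c))

# The candidate with the smallest sum of squares
best = min(candidates, key=lambda c: sum_sq_dev(scores, c))
print("smallest sum of squares at:", best)
```

The regression line applies the same principle in two dimensions: instead of one central value, it picks the slope and intercept that make the squared deviations as small as possible.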

OTHER CONCEPTS
Residual
Standard Error of Estimate (SEE)
Coefficient of Determination
Coefficient of Alienation
Shrinkage
Cross Validation
The Correlation-Causation Problem
Third Variable Explanation
Restricted Range
Multivariate Analysis

RESIDUAL
Definition: the difference between the observed and the
predicted values
Symbolically defined as Y - Ŷ, where Ŷ is the predicted score
The SUM of the residuals always equals 0
The SUM of the squared residuals is the smallest possible
value, according to the principle of least squares.
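
Both properties of residuals can be illustrated with a sketch, assuming a least-squares line fitted to hypothetical data. The names fit_line and sse and the scores are invented for this example; the "nudged" lines are arbitrary alternatives used only for comparison.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line for Y on X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sse(x, y, slope, intercept):
    """Sum of squared residuals for a given line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Hypothetical scores
x = [0, 1, 2, 3, 4]
y = [2, 3, 5, 6, 9]

slope, intercept = fit_line(x, y)

# Residual = observed score minus predicted score
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
print("sum of residuals:", round(sum(residuals), 10))

# Any other line gives a larger sum of squared residuals
sse_fit = sse(x, y, slope, intercept)
nudged = [(slope + 0.1, intercept),
          (slope, intercept + 0.5),
          (slope - 0.2, intercept + 0.3)]
print("least-squares line:", round(sse_fit, 3))
for s, i in nudged:
    print("nudged line:", round(sse(x, y, s, i), 3))
```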

STANDARD ERROR OF ESTIMATE (SEE)


SD of the Residuals
It is a measure of the accuracy of prediction.
Prediction is most accurate when the SEE is small.
The larger the SEE, the less accurate the prediction.
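
A sketch of the calculation, assuming the common textbook form in which the sum of squared residuals is divided by N - 2 before taking the square root (some sources divide by N instead). The function name standard_error_of_estimate and the data are hypothetical.

```python
import math

def standard_error_of_estimate(x, y):
    """SD of the residuals around the least-squares line, using N - 2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Squared differences between observed and predicted scores
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Assumption: N - 2 in the denominator (two estimated parameters)
    return math.sqrt(ss_residual / (n - 2))

# Hypothetical scores
x = [0, 1, 2, 3, 4]
y_tight = [2, 3, 5, 6, 9]    # points close to a straight line
y_loose = [6, 1, 9, 2, 7]    # points scattered widely

see_tight = standard_error_of_estimate(x, y_tight)
see_loose = standard_error_of_estimate(x, y_loose)
print("SEE, tight scatter:", round(see_tight, 3))
print("SEE, loose scatter:", round(see_loose, 3))
```

The tighter the points sit around the line, the smaller the SEE and the more accurate the predictions.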

COEFFICIENT OF DETERMINATION


Defined as the Squared correlation coefficient
It is usually expressed as a percentage.
The higher it is, the more of the variation in Y can be accounted for by X
It tells us the proportion of the total variation in Y that is associated
with variation in X; by itself it does not show that X causes Y.
E.g. r = .40 (IQ and Length of Reviewing)
r² = .40² = .16; converted to a percentage, 16%
We can say that 16% of the variation in IQ is accounted for by variation in
length of reviewing.
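
The arithmetic in the example, plus a check on hypothetical data that the squared correlation matches the proportion of variation in Y accounted for by the regression line (1 minus residual sum of squares over total sum of squares). The paired scores are invented for illustration.

```python
# The slide's example: r = .40 between IQ and length of reviewing
r = 0.40
r_squared = r ** 2
percent_explained = round(r_squared * 100)
print("r squared:", round(r_squared, 2), "=", percent_explained, "%")

# Hypothetical paired scores to show what r squared measures
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 6, 9]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)    # total variation in Y

# Squared correlation coefficient
r_xy_squared = sxy ** 2 / (sxx * syy)

# Variation left over after predicting Y from X
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
proportion_explained = 1 - ss_residual / syy

print(round(r_xy_squared, 4), round(proportion_explained, 4))
```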

COEFFICIENT OF ALIENATION


Complement of the Coefficient of Determination
Measure of non-association, computed as the square root of 1 - r²
Reflects the variation in scores that is not explained by the relationship
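
A brief sketch, assuming the standard definition of the coefficient of alienation as the square root of 1 - r². It reuses the r = .40 value from the IQ and length-of-reviewing example; the function name is hypothetical.

```python
import math

def coefficient_of_alienation(r):
    """Square root of the proportion of variation NOT explained (1 - r^2)."""
    return math.sqrt(1 - r ** 2)

r = 0.40                                   # correlation from the earlier example
determination = r ** 2                     # proportion explained
alienation = coefficient_of_alienation(r)  # index of non-association

print("coefficient of determination:", round(determination, 2))
print("coefficient of alienation:", round(alienation, 3))

# The explained and unexplained proportions add up to 1
total = determination + alienation ** 2
print("explained + unexplained:", round(total, 10))
```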

SHRINKAGE
Defined as the amount of decrease observed when a
regression equation is created for one population and
then applied to another
The consequence of using a prediction created from
group A and then applied to group B
Reflects a tendency to overestimate the relationship between
variables, particularly when the sample is small

CROSS VALIDATION


A way to ensure that proper inferences are being made
Using the regression equation to predict performance in
a group of subjects other than the ones from which the
equation was derived
With this, one can also check the Residual.
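
Cross validation and shrinkage can be sketched together. The example derives an equation from a hypothetical group A, applies it unchanged to a hypothetical group B, and compares the squared correlation between predicted and observed scores in each group. Treating the drop in that squared correlation as the shrinkage is an assumption of this sketch; the function names and all scores are invented.

```python
import math

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line for Y on X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Group A: the sample used to derive the equation (hypothetical)
x_a = [1, 2, 3, 4, 5]
y_a = [2, 3, 5, 6, 9]
# Group B: a new sample used only for validation (hypothetical)
x_b = [1, 2, 3, 4, 5]
y_b = [3, 2, 6, 5, 7]

slope, intercept = fit_line(x_a, y_a)

# Apply group A's equation, unchanged, to both groups
pred_a = [intercept + slope * v for v in x_a]
pred_b = [intercept + slope * v for v in x_b]

r2_a = pearson_r(pred_a, y_a) ** 2   # fit in the derivation group
r2_b = pearson_r(pred_b, y_b) ** 2   # fit in the validation group
shrinkage = r2_a - r2_b

# Residuals in the new group can also be inspected
residuals_b = [obs - p for obs, p in zip(y_b, pred_b)]

print("r squared, group A:", round(r2_a, 3))
print("r squared, group B:", round(r2_b, 3))
print("shrinkage:", round(shrinkage, 3))
print("sum of residuals in group B:", round(sum(residuals_b), 3))
```

In the new group the residuals no longer have to sum to zero, which is one sign that the equation was tuned to the original sample.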

THE CORRELATION-CAUSATION PROBLEM


Just because two variables are correlated does not
necessarily mean that one has caused the other.
Correlation does not prove causality.

THIRD VARIABLE EXPLANATION


The relationship between two variables may be
explained by another variable not included in the
analysis.
An external influence
E.g. Viewing violent movies and aggression may both be a
result of some variable not included in the study, like
domestic violence
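
One way to examine a third-variable explanation numerically is the first-order partial correlation, a technique the slides do not mention and which is brought in here only as an illustration. The three correlations below are hypothetical values chosen for the movie, aggression, and domestic violence example.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with a third variable Z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical correlations, invented for illustration
r_movies_aggression = 0.50     # X with Y
r_movies_violence = 0.70       # X with Z (domestic violence)
r_aggression_violence = 0.70   # Y with Z

partial = partial_r(r_movies_aggression,
                    r_movies_violence,
                    r_aggression_violence)
print("correlation ignoring Z:", r_movies_aggression)
print("correlation with Z held constant:", round(partial, 3))
```

With these made-up values, the link between movies and aggression nearly vanishes once the third variable is held constant, which is the pattern a third-variable explanation would predict.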

RESTRICTED RANGE
Sometimes it is extremely difficult to demonstrate the
relationship between two things even though a true
underlying relationship may exist.
Correlations require variability. If variability is restricted,
significant correlations are difficult to find.
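
The effect of a restricted range can be seen in a sketch that computes the correlation once on a full set of hypothetical scores and once on only the middle of the X range. The function name pearson_r, the scores, and the cut-offs of 4 and 6 are all invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores covering the full range of X
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

r_full = pearson_r(x, y)

# Keep only cases in a narrow band of X (restricted range)
pairs = [(a, b) for a, b in zip(x, y) if 4 <= a <= 6]
x_restricted = [a for a, _ in pairs]
y_restricted = [b for _, b in pairs]

r_restricted = pearson_r(x_restricted, y_restricted)

print("full range r:", round(r_full, 3))
print("restricted range r:", round(r_restricted, 3))
```

The underlying relationship is the same in both cases; only the variability available to detect it has changed.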

MULTIVARIATE ANALYSIS
Considers relationships among combinations of three or
more variables
Multiple Regression (interval data)
Discriminant Analysis (nominal/categorical data)
Factor Analysis (study of interrelationships among a set of
variables without reference to a criterion; involves
reducing a large number of variables to a small number)

CORRELATIONAL STATISTICS
Describing the relationship between two variables

INFERENTIAL STATISTICS
Parametric and non-parametric tests
