
UECM2623/UCCM2623 Numerical Methods and Statistics/UECM1693 Mathematics for Physics II

Chapter 5: Regression and Correlation


5.1 Introduction

The main objective of this chapter is to analyze a collection of paired sample data (or bivariate data) and
determine whether there appears to be a relationship between the two variables.
For a set of bivariate data, {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, the sum of products of X and Y is given by
$S_{XY} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$.
Similarly, the sum of squares of X is
$S_{XX} = \sum x^2 - \frac{(\sum x)^2}{n}$
and the sum of squares of Y is
$S_{YY} = \sum y^2 - \frac{(\sum y)^2}{n}$.
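The three sums can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names and the small data set are hypothetical, chosen only to show that the deviation form and the computational form of S_XY agree.

```python
def sum_of_products(xs, ys):
    """S_XY via the computational form: sum(xy) - sum(x) * sum(y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def sum_of_products_by_deviations(xs, ys):
    """S_XY via the definition: sum of (x - x_bar) * (y - y_bar)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))


# Hypothetical data, used only to illustrate the formulas.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

s_xy = sum_of_products(xs, ys)
s_xy_dev = sum_of_products_by_deviations(xs, ys)
s_xx = sum_of_products(xs, xs)  # S_XX is S_XY with y replaced by x
s_yy = sum_of_products(ys, ys)  # S_YY likewise
print(s_xy, s_xy_dev, s_xx, s_yy)
```

Passing the same list twice reuses one function for S_XX and S_YY, since a sum of squares is a sum of products of a variable with itself.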

5.2 Correlation

A correlation exists between two variables when one of them is related to the other in some way.
A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a
horizontal x-axis and a vertical y-axis. Each individual (x, y) pair is plotted as a single point.

Example 5.1. Suppose we take a sample of seven households and collect information on their incomes
and food expenditures for the past month. The information obtained (in hundreds of RM) is given below.
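Income (hundreds)              35   49   21   39   15   28   25
Food expenditure (hundreds)     9   15    7   11    5    8    9

Solution.
The scatter diagram for this set of data is
[Figure: scatter diagram of food expenditure (vertical axis, ticks from 4 to 16) against income (horizontal axis, ticks from 10 to 50), both in hundreds of RM.]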


5.2.1 Linear Correlation Coefficient


The linear correlation coefficient, r (also called the Pearson product moment correlation coefficient),
measures the strength of the linear relationship between the paired quantitative x- and y-values in a
sample.
$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}}$
Properties of the linear correlation coefficient, r:
1. The value of r is always between -1 and 1 inclusive. That is, $-1 \le r \le 1$.
2. r measures the strength of a linear relationship. It is not designed to measure the strength of a
relationship that is not linear.
3. If r = 0, then there is no linear relationship between the two variables.
4. $0 \le r^2 \le 1$.

Degree of correlation    Positive correlation    Negative correlation
Perfect                  r = +1                  r = -1
Strong                   0.8 <= r < 1.0          -1.0 < r <= -0.8
Moderate                 0.4 <= r < 0.8          -0.8 < r <= -0.4
Weak                     0 < r < 0.4             -0.4 < r < 0
Absent                   r = 0                   r = 0

Example 5.2. Using the data given below, find the value of the linear correlation coefficient r. Interpret
the result.
X    1    1    3    5
Y    2    8    6    4

Solution.
x     y     $xy$    $x^2$    $y^2$
1     2     2       1        4
1     8     8       1        64
3     6     18      9        36
5     4     20      25       16
$\sum x = 10$    $\sum y = 20$    $\sum xy = 48$    $\sum x^2 = 36$    $\sum y^2 = 120$
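The notes leave the final computation of r to the reader. As a hedged sketch, the arithmetic can be carried out from the column totals using the computational formulas of Section 5.1. The variable names are illustrative, not from the notes.

```python
import math

# Data of Example 5.2
x = [1, 1, 3, 5]
y = [2, 8, 6, 4]
n = len(x)

# Computational forms of the sums of products and squares
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # 48 - 10*20/4 = -2
s_xx = sum(a * a for a in x) - sum(x) ** 2 / n                 # 36 - 100/4 = 11
s_yy = sum(b * b for b in y) - sum(y) ** 2 / n                 # 120 - 400/4 = 20

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 4))  # -0.1348
```

Since r lies between -0.4 and 0, the degree-of-correlation table above classifies this as a weak negative linear correlation between x and y.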


5.3 Regression

The simple regression equation (model) expresses a relationship between x (called the independent
variable, predictor variable or explanatory variable) and y (called the dependent variable or response
variable).
A (simple) regression model that gives a straight-line relationship between two variables is called a linear
regression model,
$y = A + Bx$.
Given a collection of paired sample data, the regression equation
$\hat{y} = a + bx$
describes the relationship between the two variables algebraically. The graph of the regression equation
is called the regression line.
The error sum of squares, denoted by SSE, is
$\mathrm{SSE} = \sum (y - \hat{y})^2$.
The values of a and b which give the minimum SSE are called the least squares estimates of A and B, and
the regression line obtained with these estimates is called the least squares line.
For the least squares regression line $\hat{y} = a + bx$,
$b = \frac{S_{XY}}{S_{XX}}$ and $a = \bar{y} - b\bar{x}$.
The least squares regression line $\hat{y} = a + bx$ is also called the regression of y on x.

Example 5.3. Find the regression equation for the data in Example 5.2.
Solution.
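The solution is left blank in the notes. One way to check a hand calculation is the sketch below, which applies b = S_XY / S_XX and a = y_bar - b * x_bar to the data of Example 5.2. The helper name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b * x."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = s_xy / s_xx                      # slope
    a = sum(ys) / n - b * sum(xs) / n    # intercept: y_bar - b * x_bar
    return a, b


# Data of Example 5.2
a, b = least_squares_line([1, 1, 3, 5], [2, 8, 6, 4])
print(f"y_hat = {a:.4f} + ({b:.4f}) x")  # y_hat = 5.4545 + (-0.1818) x
```

Here S_XY = -2 and S_XX = 11, so b = -2/11 and a = 5 + (2/11)(2.5) = 60/11.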

5.3.1 Interpretation of a and b


Interpretation of a
a is the y-intercept of the regression line, that is, the predicted value of y when x = 0.
Interpretation of b
b is the slope of the regression line. The value of b gives the change in the predicted value of y due to
an increase of one unit in x.


Note
b is positive: there is a positive linear relationship between x and y; if x increases then y increases, and if x decreases then y decreases.
b is negative: there is a negative linear relationship between x and y; if x increases then y decreases, and if x decreases then y increases.

5.3.2 Using regression equation for predictions


If there is a linear correlation between x and y, the best predicted y-value is found by substituting the x-value into the regression equation.

Example 5.4. Find the least squares regression line for the data in Example 5.1, using income as the
independent variable and food expenditure as the dependent variable.
a) What is the predicted food expenditure for a household with an income of RM3000?
b) Give a brief interpretation of the values of a and b calculated in part (a) in this context.
Solution.
Income, x    Food expenditure, y    $xy$    $x^2$    $y^2$
35           9                      315     1225     81
49           15                     735     2401     225
21           7                      147     441      49
39           11                     429     1521     121
15           5                      75      225      25
28           8                      224     784      64
25           9                      225     625      81
$\sum x = 212$    $\sum y = 64$    $\sum xy = 2150$    $\sum x^2 = 7222$    $\sum y^2 = 646$
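The remaining steps of this example are not written out in the notes. The following sketch, with illustrative variable names, computes the least squares estimates from the seven households and then evaluates the line at x = 30, since an income of RM3000 is 30 in hundreds of RM.

```python
# Data of Example 5.1, both in hundreds of RM
income = [35, 49, 21, 39, 15, 28, 25]  # x
food = [9, 15, 7, 11, 5, 8, 9]         # y
n = len(income)

s_xy = sum(x * y for x, y in zip(income, food)) - sum(income) * sum(food) / n
s_xx = sum(x * x for x in income) - sum(income) ** 2 / n

b = s_xy / s_xx                          # slope
a = sum(food) / n - b * sum(income) / n  # intercept

x_new = 30                               # RM3000 expressed in hundreds
y_pred = a + b * x_new
print(f"a = {a:.4f}, b = {b:.4f}, predicted y = {y_pred:.4f}")
```

At full precision this gives b of about 0.2642 and a of about 1.1422. The value a = 1.1414 quoted in Example 5.5 results from rounding b to 0.2642 before computing a. The prediction is about 9.07 hundreds, or roughly RM907. Under the interpretations in Section 5.3.1, b suggests that each additional RM100 of income is associated with about RM26 more food expenditure, while a is the predicted expenditure at zero income, which lies outside the range of the sample incomes and should be read with caution.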


5.4 Coefficient of determination

The total sum of squares, denoted by SST, is given by
$\mathrm{SST} = \sum (y - \bar{y})^2 = S_{YY} = \sum y^2 - \frac{(\sum y)^2}{n}$

Example 5.5. For the regression line in Example 5.4, find the value of its SSE and SST.
Solution.
x     y     $\hat{y} = 1.1414 + 0.2642x$    $y - \hat{y}$    $(y - \hat{y})^2$
35    9     10.3884                         -1.3884          1.9277
49    15    14.0872                         0.9128           0.8332
21    7     6.6896                          0.3104           0.0963
39    11    11.4452                         -0.4452          0.1982
15    5     5.1044                          -0.1044          0.0109
28    8     8.5390                          -0.5390          0.2905
25    9     7.7464                          1.2536           1.5715
$\sum (y - \hat{y})^2 = 4.9283$
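As a cross-check on the table, the sketch below recomputes SSE and SST directly. It assumes the rounded coefficients a = 1.1414 and b = 0.2642 quoted in this example, and the variable names are illustrative.

```python
# Data of Example 5.1, both in hundreds of RM
income = [35, 49, 21, 39, 15, 28, 25]  # x
food = [9, 15, 7, 11, 5, 8, 9]         # y

# Rounded coefficients of the regression line quoted in Example 5.5
a, b = 1.1414, 0.2642

# SSE: squared deviations of y from the fitted values y_hat
y_hat = [a + b * x for x in income]
sse = sum((y - yh) ** 2 for y, yh in zip(food, y_hat))

# SST: squared deviations of y from the sample mean y_bar
y_bar = sum(food) / len(food)
sst = sum((y - y_bar) ** 2 for y in food)

print(f"SSE = {sse:.4f}, SST = {sst:.4f}")  # SSE = 4.9283, SST = 60.8571
```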

These values indicate that the sum of squared errors decreased from 60.8571 to 4.9283 when we used
$\hat{y}$ in place of $\bar{y}$ to predict food expenditure.

This reduction in squared errors is called the regression sum of squares and is denoted by SSR. Thus
$\mathrm{SSR} = \mathrm{SST} - \mathrm{SSE}$
The coefficient of determination, denoted by $r^2$, represents the proportion of SST that is explained by the
use of the linear regression model.
$r^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}$
The computational formula for $r^2$ is
$r^2 = \frac{S_{XY}^2}{S_{XX} S_{YY}}$ and $0 \le r^2 \le 1$.

Example 5.6. For the data in Example 5.4, calculate the coefficient of determination and interpret the
result.
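No solution is given in the notes. A minimal sketch of the calculation follows, using illustrative variable names. It evaluates the coefficient of determination both by the computational formula and as SSR / SST, which agree when a and b are the unrounded least squares estimates.

```python
# Data of Example 5.1, both in hundreds of RM
income = [35, 49, 21, 39, 15, 28, 25]  # x
food = [9, 15, 7, 11, 5, 8, 9]         # y
n = len(income)

s_xy = sum(x * y for x, y in zip(income, food)) - sum(income) * sum(food) / n
s_xx = sum(x * x for x in income) - sum(income) ** 2 / n
s_yy = sum(y * y for y in food) - sum(food) ** 2 / n

# Computational formula
r2_formula = s_xy ** 2 / (s_xx * s_yy)

# Definition: proportion of SST explained by the regression
b = s_xy / s_xx
a = sum(food) / n - b * sum(income) / n
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(income, food))
sst = s_yy
r2_ratio = (sst - sse) / sst

print(round(r2_formula, 4), round(r2_ratio, 4))
```

Both values come out at roughly 0.919, which would mean that about 92% of the total variation in food expenditure in this sample is explained by income through the linear regression model.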
