
CHAPTER 4

Correlation and Regression Analysis

OBJECTIVES

• To discuss correlation and regression analysis.
• To gain knowledge of the formulas used in finding the coefficient of
correlation.
• To discuss how to make a scatter diagram and how to interpret it.
• To know how to compute the fitting line, or LSRL (Least Square
Regression Line).
• To provide exercises that will help the student grasp and fully
understand this module.

INTRODUCTION

Correlation and regression analysis are among the most important legacies of Sir Francis Galton.
Correlation analysis is concerned with the relationship between two variables, whereas
regression analysis describes that relationship precisely by means of an equation.

Correlation and regression refer to the relationship that exists between two variables, X and Y,
in the case where each particular value of Xi is paired with one particular value of Yi. For
example: the measures of height for individual human subjects, paired with their corresponding
measures of weight; the number of hours that individual students in a statistics course spend
studying prior to an exam, paired with their corresponding measures of performance on the
exam; the amount of class time that individual students in a statistics course spend snoozing and
daydreaming prior to an exam, paired with their corresponding measures of performance on the
exam; and so on.

Fundamentally, it is a variation on the theme of quantitative functional relationship. The more
you have of this variable, the more you have of that one. Or conversely, the more you have of
this variable, the less you have of that one. Thus: the more you have of height, the more you will
tend to have of weight; the more that students study prior to a statistics exam, the more they
will tend to do well on the exam. Or conversely, the greater the amount of class time prior to the
exam that students spend snoozing and daydreaming, the less they will tend to do well on the
exam. In the first kind of case (the more of this, the more of that), you are speaking of a positive
correlation between the two variables: if one variable increases, the other also increases. In the
second kind (the more of this, the less of that), you are speaking of a negative correlation
between the two variables: if one variable increases, the other decreases.

Correlation and regression are two sides of the same coin. In the underlying logic, you can begin
with either one and end up with the other. We will begin with correlation, since that is the part of
the correlation-regression story with which you are probably already somewhat familiar.

Correlation Analysis

Correlation is a statistical measure of the strength and direction of the relationship between two variables.

Basic Properties of r

1. The value of r measures the strength of the linear relationship between x and y
and will always lie between -1 and +1.
2. The closer r is to either -1 or +1, the stronger the linear relationship between x
and y.
3. If r is zero, then x and y are not linearly related.
4. The value of r does not change when the units of measurement are changed.

Interpretation of r

0 to ±0.20 Negligible relationship
±0.21 to ±0.40 Slight relationship
±0.41 to ±0.70 Moderate relationship
±0.71 to ±0.90 Marked or high relationship
±0.91 to ±1.00 Very high to perfect relationship

Note:
The coefficient of correlation measures the similarity of the changes in the
values of x and y.
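The interpretation bands above can be turned into a small lookup. The sketch below is only an illustration: the function name `interpret_r` is hypothetical, and it assumes that a value falling between two printed bands (for example 0.205) belongs to the higher band.

```python
def interpret_r(r):
    """Return the chapter's verbal label for a correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # the sign gives the direction only, not the strength
    # Assumption: values between two printed bands go to the higher band.
    if size <= 0.20:
        return "Negligible relationship"
    if size <= 0.40:
        return "Slight relationship"
    if size <= 0.70:
        return "Moderate relationship"
    if size <= 0.90:
        return "Marked or high relationship"
    return "Very high to perfect relationship"


print(interpret_r(0.8357))
print(interpret_r(-0.15))
```

The label depends only on the absolute value of r; a negative sign means the variables move in opposite directions.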

Formulas for getting the coefficient of correlation

1. Using raw score

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

Where ∑y – is the sum of all values of Y
∑x – is the sum of all values of X
n – is the total number of pairs of X and Y
∑y² – is the sum of the squares of each value of Y
∑x² – is the sum of the squares of each value of X
∑xy – is the sum of the products of each pair of X and Y
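As a rough sketch of how the raw score formula translates into code, the function below accumulates the five sums and combines them. The name `pearson_raw` is an assumption made for this illustration; the data are the death anxiety and religiosity scores used in the chapter's worked example.

```python
from math import sqrt


def pearson_raw(xs, ys):
    """Pearson r computed with the raw score formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be paired and hold at least two values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Death anxiety (x) and religiosity (y) scores from the worked example.
anxiety = [20, 25, 10, 15, 30, 24, 28, 35, 12, 16, 32, 45]
religiosity = [4, 2, 3, 5, 8, 7, 7, 9, 3, 5, 8, 10]
r = pearson_raw(anxiety, religiosity)
print(round(r, 4))  # 0.8357
```

The function raises a division error if all x values (or all y values) are identical, because the denominator is then zero and r is undefined.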

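2. Product moment

$$ r = \frac{\sum XY}{\sqrt{(\sum X^2)(\sum Y^2)}} $$

Here X = x − x̄ and Y = y − ȳ are the deviations from the means.

Where ∑XY – is the sum of the products of the (x − x̄) column and the (y − ȳ) column
∑X² – is the sum of the (x − x̄)² column
∑Y² – is the sum of the (y − ȳ)² column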


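3. Standard score

$$ S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \qquad S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} $$

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,S_x S_y} $$

Where ∑(x − x̄)(y − ȳ) – is the sum of the products of the (x − x̄) column and the (y − ȳ) column
∑(x − x̄)² – is the sum of the (x − x̄)² column
∑(y − ȳ)² – is the sum of the (y − ȳ)² column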
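The product moment and standard score formulas both start from the deviation columns, so they can share one computation. The sketch below (the function name `pearson_from_deviations` is hypothetical) builds the deviations once and evaluates both formulas on the test score and sales data from the chapter's examples; the two results should agree.

```python
from math import sqrt


def pearson_from_deviations(xs, ys):
    """Return r by the product moment formula and by the standard score formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]  # the (x - x̄) column
    dev_y = [y - mean_y for y in ys]  # the (y - ȳ) column
    sum_dxdy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_dx2 = sum(dx * dx for dx in dev_x)
    sum_dy2 = sum(dy * dy for dy in dev_y)
    r_product_moment = sum_dxdy / sqrt(sum_dx2 * sum_dy2)
    s_x = sqrt(sum_dx2 / n)  # divisor n, as in the chapter's formula
    s_y = sqrt(sum_dy2 / n)
    r_standard_score = sum_dxdy / (n * s_x * s_y)
    return r_product_moment, r_standard_score


# Test scores (x) and first-month sales (y) from the worked example.
scores = [80, 65, 48, 67, 91, 91, 52, 75, 86, 71, 72, 51, 67, 52, 74, 55, 94, 69]
sales = [71, 68, 57, 81, 79, 87, 50, 77, 83, 78, 67, 67, 61, 66, 65, 63, 85, 64]
r_pm, r_ss = pearson_from_deviations(scores, sales)
print(round(r_pm, 4), round(r_ss, 4))  # 0.7971 0.7971
```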

Proof that the raw score formula is equal to the product moment formula:

Write X = x − x̄ and Y = y − ȳ, where x̄ = ∑x/n and ȳ = ∑y/n, so that ∑x = n x̄ and ∑y = n ȳ. Then

$$
\begin{aligned}
\frac{\sum XY}{\sqrt{(\sum X^2)(\sum Y^2)}}
&= \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\left[\sum (x-\bar{x})^2\right]\left[\sum (y-\bar{y})^2\right]}} \\
&= \frac{\sum (xy - x\bar{y} - y\bar{x} + \bar{x}\bar{y})}{\sqrt{\left[\sum (x^2 - 2x\bar{x} + \bar{x}^2)\right]\left[\sum (y^2 - 2y\bar{y} + \bar{y}^2)\right]}} \\
&= \frac{\sum xy - \bar{y}\sum x - \bar{x}\sum y + n\bar{x}\bar{y}}{\sqrt{\left[\sum x^2 - 2\bar{x}\sum x + n\bar{x}^2\right]\left[\sum y^2 - 2\bar{y}\sum y + n\bar{y}^2\right]}} \\
&= \frac{\sum xy - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}} \\
&= \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left[\frac{n\sum x^2 - (\sum x)^2}{n}\right]\left[\frac{n\sum y^2 - (\sum y)^2}{n}\right]}} \\
&= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
\end{aligned}
$$

which is the raw score formula.

4. Spearman’s Rank Correlation

Data which are arranged in numerical order, usually from largest to smallest, and
numbered 1, 2, 3, ... are said to be in ranks, or ranked data. Ranks prove useful at
certain times; when two or more values of one variable are the same, each of them is given
the average of the ranks they would otherwise occupy, as in the worked tables of the examples.
The coefficient of correlation for such data is given by the Spearman rank difference
correlation coefficient and is denoted by ρ.

$$ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

where ρ = the coefficient of correlation
d = the difference between rank 1 and rank 2 of each pair
n = the number of pairs
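A minimal sketch of the rank difference computation follows. The helper names `average_ranks` and `spearman_rho` are assumptions for this illustration. Ranks are assigned from the smallest value (rank 1) upward, which matches the worked tables in the examples; ranking from largest to smallest gives the same ρ as long as both variables are ranked in the same direction. Tied values share the average of their positions.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest (n); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank difference coefficient: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Test scores (x) and first-month sales (y) from the worked example.
scores = [80, 65, 48, 67, 91, 91, 52, 75, 86, 71, 72, 51, 67, 52, 74, 55, 94, 69]
sales = [71, 68, 57, 81, 79, 87, 50, 77, 83, 78, 67, 67, 61, 66, 65, 63, 85, 64]
rho = spearman_rho(scores, sales)
print(round(rho, 4))  # 0.7492
```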

Examples:

1. Find the correlation coefficient of the data given below.

2. Researchers interested in determining if there is a relationship between death anxiety and
religiosity conducted the following study. Subjects completed a death anxiety scale (high
score = high anxiety) and also completed a checklist designed to measure an individual's
degree of religiosity (belief in a particular religion, regular attendance at religious
services, number of times per week they regularly pray, etc.) (high score = greater
religiosity). A data sample is provided below:

x(death anxiety) y(religiosity)


20 4
25 2
10 3
15 5
30 8
24 7
28 7
35 9
12 3
16 5
32 8
45 10

x y x² y² xy (x − x̄) (y − ȳ) (x − x̄)² (y − ȳ)² (x − x̄)(y − ȳ)
20 4 400 16 80 -4.3333 -1.9167 18.7778 3.6736 8.3056
25 2 625 4 50 0.6667 -3.9167 0.4444 15.3403 -2.6111
10 3 100 9 30 -14.3333 -2.9167 205.4444 8.5069 41.8056
15 5 225 25 75 -9.3333 -0.9167 87.1111 0.8403 8.5556
30 8 900 64 240 5.6667 2.0833 32.1111 4.3403 11.8056
24 7 576 49 168 -0.3333 1.0833 0.1111 1.1736 -0.3611
28 7 784 49 196 3.6667 1.0833 13.4444 1.1736 3.9722
35 9 1225 81 315 10.6667 3.0833 113.7778 9.5069 32.8889
12 3 144 9 36 -12.3333 -2.9167 152.1111 8.5069 35.9722
16 5 256 25 80 -8.3333 -0.9167 69.4444 0.8403 7.6389
32 8 1024 64 256 7.6667 2.0833 58.7778 4.3403 15.9722
45 10 2025 100 450 20.6667 4.0833 427.1111 16.6736 84.3889
292 71 8284 495 1976 0 0 1178.6667 74.9167 248.3333

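Solving r using:

1. Raw Score:

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

$$ r = \frac{12(1976) - (292)(71)}{\sqrt{\left[12(8284) - (292)^2\right]\left[12(495) - (71)^2\right]}} = \frac{2980}{\sqrt{(14144)(899)}} $$

r = 0.8357 Marked or high relationship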

2. Standard Score

$$ S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \qquad S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} $$

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,S_x S_y} $$

$$ S_x = \sqrt{\frac{1178.6667}{12}} = 9.9107 \qquad S_y = \sqrt{\frac{74.9167}{12}} = 2.4986 $$

$$ r = \frac{248.3333}{12(9.9107)(2.4986)} $$

r = 0.8357 Marked or high relationship

3. Product Moment

$$ r = \frac{\sum XY}{\sqrt{(\sum X^2)(\sum Y^2)}} $$

$$ r = \frac{248.3333}{\sqrt{(1178.6667)(74.9167)}} $$

r = 0.8357 Marked or high relationship

4. Spearman Rho

Ranks run from 1 for the smallest value to 12 for the largest; tied values share the average of their ranks.

x y Rx Ry d d²
20 4 5 4 1 1
25 2 7 1 6 36
10 3 1 2.5 -1.5 2.25
15 5 3 5.5 -2.5 6.25
30 8 9 9.5 -0.5 0.25
24 7 6 7.5 -1.5 2.25
28 7 8 7.5 0.5 0.25
35 9 11 11 0 0
12 3 2 2.5 -0.5 0.25
16 5 4 5.5 -1.5 2.25
32 8 10 9.5 0.5 0.25
45 10 12 12 0 0
292 71 0 51

$$ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

$$ \rho = 1 - \frac{6(51)}{12(12^2 - 1)} $$

ρ = 0.8217 Marked or high relationship

3. The following table summarizes the results of an aptitude test given to 18 clerks to
determine the correlation between test scores (x) and sales in the first month (y), in
hundreds of dollars.

x 80 65 48 67 91 91 52 75 86 71 72 51 67 52 74 55 94 69
y 71 68 57 81 79 87 50 77 83 78 67 67 61 66 65 63 85 64

x y x² y² xy (x − x̄) (y − ȳ) (x − x̄)² (y − ȳ)² (x − x̄)(y − ȳ)
80 71 6400 5041 5680 10 0.5 100 0.25 5
65 68 4225 4624 4420 -5 -2.5 25 6.25 12.5
48 57 2304 3249 2736 -22 -13.5 484 182.25 297
67 81 4489 6561 5427 -3 10.5 9 110.25 -31.5
91 79 8281 6241 7189 21 8.5 441 72.25 178.5
91 87 8281 7569 7917 21 16.5 441 272.25 346.5
52 50 2704 2500 2600 -18 -20.5 324 420.25 369
75 77 5625 5929 5775 5 6.5 25 42.25 32.5
86 83 7396 6889 7138 16 12.5 256 156.25 200
71 78 5041 6084 5538 1 7.5 1 56.25 7.5
72 67 5184 4489 4824 2 -3.5 4 12.25 -7
51 67 2601 4489 3417 -19 -3.5 361 12.25 66.5
67 61 4489 3721 4087 -3 -9.5 9 90.25 28.5
52 66 2704 4356 3432 -18 -4.5 324 20.25 81
74 65 5476 4225 4810 4 -5.5 16 30.25 -22
55 63 3025 3969 3465 -15 -7.5 225 56.25 112.5
94 85 8836 7225 7990 24 14.5 576 210.25 348
69 64 4761 4096 4416 -1 -6.5 1 42.25 6.5
1260 1269 91822 91257 90861 0 0 3622 1792.5 2031

1. Find the correlation coefficient and interpret the results.

Solutions:

Using Raw Score:

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

$$ r = \frac{18(90861) - (1260)(1269)}{\sqrt{\left[18(91822) - (1260)^2\right]\left[18(91257) - (1269)^2\right]}} $$

r = 0.7971 Marked or high relationship

Using Product Moment:

$$ r = \frac{\sum XY}{\sqrt{(\sum X^2)(\sum Y^2)}} $$

$$ r = \frac{2031}{\sqrt{(3622)(1792.5)}} $$

r = 0.7971 Marked or high relationship

Using Standard Score:

$$ S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \qquad S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} $$

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,S_x S_y} $$

$$ S_x = \sqrt{\frac{3622}{18}} = 14.1853 $$

$$ S_y = \sqrt{\frac{1792.5}{18}} = 9.9791 $$

$$ r = \frac{2031}{18(14.1853)(9.9791)} $$

r = 0.7971 Marked or high relationship

x y Rx Ry d d²
80 71 14 11 3 9
65 68 6 10 -4 16
48 57 1 2 -1 1
67 81 7.5 15 -7.5 56.25
91 79 16.5 14 2.5 6.25
91 87 16.5 18 -1.5 2.25
52 50 3.5 1 2.5 6.25
75 77 13 12 1 1
86 83 15 16 -1 1
71 78 10 13 -3 9
72 67 11 8.5 2.5 6.25
51 67 2 8.5 -6.5 42.25
67 61 7.5 3 4.5 20.25
52 66 3.5 7 -3.5 12.25
74 65 12 6 6 36
55 63 5 4 1 1
94 85 18 17 1 1
69 64 9 5 4 16
1260 1269 0 243

Using Spearman's Rank Correlation Coefficient

$$ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

$$ \rho = 1 - \frac{6(243)}{18(18^2 - 1)} $$

ρ = 0.7492 Marked or high relationship

4. With the growth of internet service providers, a researcher decides to examine whether there
is a correlation between cost of internet service per month (rounded to the nearest dollar) and
degree of customer satisfaction (on a scale of 1 - 25 with a 1 being not at all satisfied and a 25
being extremely satisfied). The researcher only includes programs with comparable types of
services. A sample of the data is provided below.

x (customer satisfaction) y (cost of internet)
20 30
20 38
22 40
22.5 25
23 20
23.5 10
24 13
24.5 15
25 9
25.5 12
230 212

x y x² y² xy (x − x̄) (y − ȳ) (x − x̄)² (y − ȳ)² (x − x̄)(y − ȳ)
20 30 400 900 600 -3 8.8 9 77.44 -26.4
20 38 400 1444 760 -3 16.8 9 282.24 -50.4
22 40 484 1600 880 -1 18.8 1 353.44 -18.8
22.5 25 506.25 625 562.5 -0.5 3.8 0.25 14.44 -1.9
23 20 529 400 460 0 -1.2 0 1.44 0
23.5 10 552.25 100 235 0.5 -11.2 0.25 125.44 -5.6
24 13 576 169 312 1 -8.2 1 67.24 -8.2
24.5 15 600.25 225 367.5 1.5 -6.2 2.25 38.44 -9.3
25 9 625 81 225 2 -12.2 4 148.84 -24.4
25.5 12 650.25 144 306 2.5 -9.2 6.25 84.64 -23
230 212 5323 5688 4708 0 0 33 1193.6 -168
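Solutions:

Using raw score

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

$$ r = \frac{10(4708) - (230)(212)}{\sqrt{\left[10(5323) - (230)^2\right]\left[10(5688) - (212)^2\right]}} = \frac{47080 - 48760}{\sqrt{[53230 - 52900][56880 - 44944]}} $$

r = -0.8465 Marked or high relationship

Using standard score

$$ S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \qquad S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} $$

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,S_x S_y} $$

$$ S_x = \sqrt{\frac{33}{10}} = 1.8166 \qquad S_y = \sqrt{\frac{1193.6}{10}} = 10.9252 $$

$$ r = \frac{-168}{10(1.8166)(10.9252)} $$

r = -0.8465 Marked or high relationship

Using product moment

$$ r = \frac{\sum XY}{\sqrt{(\sum X^2)(\sum Y^2)}} $$

$$ r = \frac{-168}{\sqrt{(33)(1193.6)}} $$

r = -0.8465 Marked or high relationship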

x y Rx Ry d d²
20 30 1.5 8 -6.5 42.25
20 38 1.5 9 -7.5 56.25
22 40 3 10 -7 49
22.5 25 4 7 -3 9
23 20 5 6 -1 1
23.5 10 6 2 4 16
24 13 7 4 3 9
24.5 15 8 5 3 9
25 9 9 1 8 64
25.5 12 10 3 7 49
230 212 0 304.5
Using Spearman rank coefficient

$$ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

$$ \rho = 1 - \frac{6(304.5)}{10(10^2 - 1)} $$

ρ = -0.8455 Marked or high relationship

Regression Analysis

Regression analysis is the analysis of several variables in which the focus is on the
relationship between a dependent variable and one or more independent variables.

Scatter Diagram

A scatter plot or scatter diagram is a type of mathematical diagram using Cartesian
coordinates to display values for two variables for a set of data.

This is an example of a scatter diagram.

Researchers interested in determining if there is a relationship between death anxiety and
religiosity conducted the following study. Subjects completed a death anxiety scale (high score =
high anxiety) and also completed a checklist designed to measure an individual's degree of
religiosity (belief in a particular religion, regular attendance at religious services, number of
times per week they regularly pray, etc.) (high score = greater religiosity). A data sample is
provided below:

X(death anxiety) Y(religiosity)


20 4
25 2
10 3
15 5
30 8
24 7
28 7
35 9
12 3
16 5
32 8
45 10


















[Figure: scatter diagram of the death anxiety (x) and religiosity (y) data]
LEAST SQUARE REGRESSION LINE

The Least Square Regression Line, or the Method of Least Squares, is the statistical procedure
for finding the best-fitting straight line for a set of points in a given problem.

Formulas:

LEAST SQUARE REGRESSION LINE of Y on X

y = a0 + a1x

Where:

$$ a_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} \qquad a_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$

LEAST SQUARE REGRESSION LINE of X on Y

x = b0 + b1y

Where:

$$ b_0 = \frac{(\sum x)(\sum y^2) - (\sum y)(\sum xy)}{n\sum y^2 - (\sum y)^2} \qquad b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} $$

Where:
∑y – is the sum of all values of Y
∑x – is the sum of all values of X
n – is the total number of pairs of X and Y
∑y² – is the sum of the squares of each value of Y
∑x² – is the sum of the squares of each value of X
∑xy – is the sum of the products of each pair of X and Y
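The two regression lines use the same pair of formulas with the roles of x and y exchanged, so one function can serve both. The sketch below assumes a hypothetical helper named `lsrl`; calling it with the arguments swapped gives the line of X on Y. The data are the test score and sales pairs from the chapter's examples.

```python
def lsrl(xs, ys):
    """Return (intercept, slope) of the least square line that predicts ys from xs."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all predictor values are identical; no line can be fitted")
    intercept = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    slope = (n * sum_xy - sum_x * sum_y) / denominator
    return intercept, slope


# Test scores (x) and first-month sales (y) from the worked example.
scores = [80, 65, 48, 67, 91, 91, 52, 75, 86, 71, 72, 51, 67, 52, 74, 55, 94, 69]
sales = [71, 68, 57, 81, 79, 87, 50, 77, 83, 78, 67, 67, 61, 66, 65, 63, 85, 64]

a0, a1 = lsrl(scores, sales)  # Y on X: y = a0 + a1*x
b0, b1 = lsrl(sales, scores)  # X on Y: x = b0 + b1*y (arguments swapped)
print(f"y = {a0:.4f} + {a1:.4f}x")
print(f"x = {b0:.4f} + {b1:.4f}y")
```

Note that the line of X on Y is not obtained by solving the line of Y on X for x; the two lines coincide only when the correlation is perfect.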

Example:

1. Find the LSRL’s of the data given below.

Researchers interested in determining if there is a relationship between death
anxiety and religiosity conducted the following study. Subjects completed a death anxiety
scale (high score = high anxiety) and also completed a checklist designed to measure an
individual's degree of religiosity (belief in a particular religion, regular attendance at
religious services, number of times per week they regularly pray, etc.) (high score =
greater religiosity). A data sample is provided below:

X(death anxiety) Y(religiosity)


20 4
25 2
10 3
15 5
30 8
24 7
28 7
35 9
12 3
16 5
32 8
45 10

x y x2 y2 xy
20 4 400 16 80
25 2 625 4 50
10 3 100 9 30
15 5 225 25 75
30 8 900 64 240
24 7 576 49 168
28 7 784 49 196
35 9 1225 81 315
12 3 144 9 36
16 5 256 25 80
32 8 1024 64 256
45 10 2025 100 450
292 71 8284 495 1976

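LSRL of Y on X

y = a0 + a1x

$$ a_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} \qquad a_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$

$$ a_0 = \frac{(71)(8284) - (292)(1976)}{12(8284) - (292)^2} = \frac{11172}{14144} \qquad a_1 = \frac{12(1976) - (292)(71)}{12(8284) - (292)^2} = \frac{2980}{14144} $$

a0 = 0.7899 a1 = 0.2107

y = 0.7899 + 0.2107x

LSRL of X on Y

x = b0 + b1y

$$ b_0 = \frac{(\sum x)(\sum y^2) - (\sum y)(\sum xy)}{n\sum y^2 - (\sum y)^2} \qquad b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} $$

$$ b_0 = \frac{(292)(495) - (71)(1976)}{12(495) - (71)^2} = \frac{4244}{899} \qquad b_1 = \frac{12(1976) - (292)(71)}{12(495) - (71)^2} = \frac{2980}{899} $$

b0 = 4.7208 b1 = 3.3148

x = 4.7208 + 3.3148y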


















[Figure: scatter diagram of the death anxiety (x) and religiosity (y) data]
2. The following table summarizes the results of an aptitude test given to 18 clerks to
determine the correlation between test scores (x) and sales in the first month (y), in
hundreds of dollars.

X 80 65 48 67 91 91 52 75 86 71 72 51 67 52 74 55 94 69
Y 71 68 57 81 79 87 50 77 83 78 67 67 61 66 65 63 85 64

x y x² y² xy
80 71 6400 5041 5680
65 68 4225 4624 4420
48 57 2304 3249 2736
67 81 4489 6561 5427
91 79 8281 6241 7189
91 87 8281 7569 7917
52 50 2704 2500 2600
75 77 5625 5929 5775
86 83 7396 6889 7138
71 78 5041 6084 5538
72 67 5184 4489 4824
51 67 2601 4489 3417
67 61 4489 3721 4087
52 66 2704 4356 3432
74 65 5476 4225 4810
55 63 3025 3969 3465
94 85 8836 7225 7990
69 64 4761 4096 4416
1260 1269 91822 91257 90861

LSRL of Y on X

$$ a_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} $$

$$ a_0 = \frac{(1269)(91822) - (1260)(90861)}{18(91822) - (1260)^2} $$

a0 = 31.2482

$$ a_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$

$$ a_1 = \frac{18(90861) - (1260)(1269)}{18(91822) - (1260)^2} $$

a1 = 0.5607

y = 31.2482 + 0.5607x

LSRL of X on Y

$$ b_0 = \frac{(\sum x)(\sum y^2) - (\sum y)(\sum xy)}{n\sum y^2 - (\sum y)^2} $$

$$ b_0 = \frac{(1260)(91257) - (1269)(90861)}{18(91257) - (1269)^2} $$

b0 = -9.8803

$$ b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} $$

$$ b_1 = \frac{18(90861) - (1260)(1269)}{18(91257) - (1269)^2} $$

b1 = 1.1331

x = -9.8803 + 1.1331y










[Figure: scatter diagram of the test score (x) and sales (y) data]


3. With the growth of internet service providers, a researcher decides to examine whether there
is a correlation between cost of internet service per month (rounded to the nearest dollar) and
degree of customer satisfaction (on a scale of 1 - 25 with a 1 being not at all satisfied and a 25
being extremely satisfied). The researcher only includes programs with comparable types of
services. A sample of the data is provided below.

x (customer satisfaction) y (cost of internet)
20 30
20 38
22 40
22.5 25
23 20
23.5 10
24 13
24.5 15
25 9
25.5 12
230 212

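LSRL of y on x

y = a0 + a1x

$$ a_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} \qquad a_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$

There are n = 10 pairs; the last row of the data table holds the column totals and is not a data pair.

$$ a_0 = \frac{(212)(5323) - (230)(4708)}{10(5323) - (230)^2} = \frac{45636}{330} \qquad a_1 = \frac{10(4708) - (230)(212)}{10(5323) - (230)^2} = \frac{-1680}{330} $$

a0 = 138.2909 a1 = -5.0909

y = 138.2909 - 5.0909x

LSRL of x on y

x = b0 + b1y

$$ b_0 = \frac{(\sum x)(\sum y^2) - (\sum y)(\sum xy)}{n\sum y^2 - (\sum y)^2} \qquad b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} $$

$$ b_0 = \frac{(230)(5688) - (212)(4708)}{10(5688) - (212)^2} = \frac{310144}{11936} \qquad b_1 = \frac{10(4708) - (230)(212)}{10(5688) - (212)^2} = \frac{-1680}{11936} $$

b0 = 25.9839 b1 = -0.1408

x = 25.9839 - 0.1408y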
[Figure: scatter diagram of the customer satisfaction (x) and cost of internet (y) data]

x y x² y² xy (x − x̄) (y − ȳ) (x − x̄)² (y − ȳ)² (x − x̄)(y − ȳ)
20 30 400 900 600 -3 8.8 9 77.44 -26.4
20 38 400 1444 760 -3 16.8 9 282.24 -50.4
22 40 484 1600 880 -1 18.8 1 353.44 -18.8
22.5 25 506.25 625 562.5 -0.5 3.8 0.25 14.44 -1.9
23 20 529 400 460 0 -1.2 0 1.44 0
23.5 10 552.25 100 235 0.5 -11.2 0.25 125.44 -5.6
24 13 576 169 312 1 -8.2 1 67.24 -8.2
24.5 15 600.25 225 367.5 1.5 -6.2 2.25 38.44 -9.3
25 9 625 81 225 2 -12.2 4 148.84 -24.4
25.5 12 650.25 144 306 2.5 -9.2 6.25 84.64 -23
230 212 5323 5688 4708 0 0 33 1193.6 -168
LEAST SQUARE REGRESSION PARABOLA

The method of the Least Square Regression Parabola assumes that the best-fit
curve of a given type is the curve that has the minimal sum of squared
deviations (least square error) from a given set of data.

Formulas:

y = a0 + a1x + a2x²

The coefficients a0, a1 and a2 are found by solving the following three normal equations simultaneously:

$$ \sum y = a_0 n + a_1 \sum x + a_2 \sum x^2 $$

$$ \sum xy = a_0 \sum x + a_1 \sum x^2 + a_2 \sum x^3 $$

$$ \sum x^2 y = a_0 \sum x^2 + a_1 \sum x^3 + a_2 \sum x^4 $$

Where:
∑y – is the sum of all values of Y
∑x – is the sum of all values of X
n – is the total number of pairs of X and Y
∑x² – is the sum of the squares of each value of X
∑x³ – is the sum of the cubes of each value of X
∑x⁴ – is the sum of the fourth powers of each value of X
∑xy – is the sum of the products of each pair of X and Y
∑x²y – is the sum of the products of the square of each X and its paired Y
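One way to solve the three normal equations is Cramer's rule on the 3 × 3 system. The sketch below is only an illustration under stated assumptions: the helper names `det3` and `fit_parabola` are hypothetical, and the five data points are made up for the demonstration (they lie exactly on y = 1 + 2x + 3x²), not taken from the chapter.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Return (a0, a1, a2) of y = a0 + a1*x + a2*x^2 from the normal equations."""
    n = len(xs)
    sx = sum(xs)
    sx2 = sum(x ** 2 for x in xs)
    sx3 = sum(x ** 3 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x ** 2 * y for x, y in zip(xs, ys))
    # Coefficient matrix and right-hand side of the three normal equations.
    matrix = [[n, sx, sx2],
              [sx, sx2, sx3],
              [sx2, sx3, sx4]]
    rhs = [sy, sxy, sx2y]
    d = det3(matrix)
    if d == 0:
        raise ValueError("need at least three distinct x values to fit a parabola")
    coefficients = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side.
        replaced = [row[:] for row in matrix]
        for row_index in range(3):
            replaced[row_index][col] = rhs[row_index]
        coefficients.append(det3(replaced) / d)
    return tuple(coefficients)


# Made-up illustrative points lying exactly on y = 1 + 2x + 3x^2.
xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]
coeffs = fit_parabola(xs, ys)
print(coeffs)
```

For larger systems or noisy data, a library linear solver would normally replace the hand-written determinant; Cramer's rule is used here only to keep the sketch self-contained.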

Chapter Exercise

Direction: Consider the following pairs of measurements.

1. x 2 5 7 1 4 3 0 2
y 10 4 2 8 5 3 5 8

2. x 2 7 5 4 9 3 3 4 5 6
y 20 35 48 51 71 39 45 25 60 70

X - Achievement
Y - GPA

It is assumed that achievement test scores should be correlated with a student's
classroom performance. One would expect that students who consistently perform
well in the classroom (tests, quizzes, etc.) would also perform well on a standardized
achievement test (0 - 100 with 100 indicating high achievement). A teacher decides to
examine this hypothesis. At the end of the academic year, she computes a correlation
between the students' achievement test scores (she purposefully did not look at these
data until after she submitted the students' grades) and the overall GPA for each student
computed over the entire year. The data for her class are provided above.

a. Construct a scatter diagram for each.
b. Find the LSRL of Y on X.
c. Find the LSRL of X on Y.
d. Draw the LSRL on the scatter diagram.
e. Find ρ or r (correlation coefficient) in two ways.
