
LINEAR REGRESSION

INTRODUCTION/MOTIVATION

• VERY STRONG CORRELATION (with a coefficient of 0.80) between AVERAGE STUDY HOURS per week and STATISTICS GRADE
• QUESTION: IS IT POSSIBLE TO PREDICT YOUR STATISTICS GRADE GIVEN YOUR AVERAGE STUDY HOURS PER WEEK?
INTRODUCTION/MOTIVATION

• Given the weight of a pregnant mother, can we predict the weight of her infant?
• Is it possible to estimate the glucose level of a person, given her/his age?
• Can we forecast the number of deaths caused by lung cancer, if we have data on cigarette consumption?
Francis Galton
• Self-taught naturalist, anthropologist, astronomer, and statistician
• A real-life “Indiana Jones character”
• Studied data on relative sizes of parents and their offspring
• “Regression towards mediocrity” (regression to the mean)
Regression Analysis

• It is a statistical method that determines the nature of the relationship between variables, that is, whether the relationship is positive or negative, linear or nonlinear.
• It gives the regression equation that enables us to predict the value of the dependent variable given the value of the independent variable.
Correlation vs. Regression

• Correlation describes the strength of a linear relationship between two variables
• Regression tells us how to draw the straight line described by the correlation
Regression Line
Also called the line of best fit or the least-squares regression line.
It is the line drawn through a scatter plot which can be used to find the direction of the association between the two variables.
It is the line that minimizes the sum of the squared vertical distances from the points to the line, so the points are balanced above and below it (their vertical deviations from the line sum to zero).
Example:
[Figure: plot of SBP (mmHg) on the vertical axis against Wt (kg) on the horizontal axis]
Regression coefficient

• The regression coefficient is the slope of the regression line and tells you what the nature of the relationship between the variables is.
• It tells how much change in the dependent variable is associated with a one-unit change in the independent variable.
• The larger the absolute value of the regression coefficient, the greater the change.
Equation of the Regression Line:

ŷ = a + bx

where:
ŷ = predicted or fitted value of y
x = the value of any particular observation of the independent variable
y = the value of any particular observation of the dependent variable
a = y-intercept of the regression line
b = slope of the regression line
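The slides do not state how a and b are obtained. Assuming the usual least-squares criterion (the line that minimizes the sum of squared vertical deviations), the standard computational formulas are sketched below, where n is the number of observations and the sums run over all (x, y) pairs:

```latex
\[
\text{choose } a, b \text{ to minimize } \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2
\]
\[
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
a = \frac{\sum y - b\sum x}{n} = \bar{y} - b\,\bar{x}
\]
```

These formulas use only the column totals Σx, Σy, Σxy, and Σx², which is why a computation table with those columns is a convenient way to organize the work by hand.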
Examples:
1. A teacher wishes to determine if the number of absences a student incurs correlates with his or her general weighted average (GWA). He chose a sample of 7 students and then surveyed their GWA and number of absences.

Number of Absences (x)   1     2     3     5     7     9     10
GWA (y)                  97.8  90.2  86.4  87.3  85.4  84.5  78.2

Find the equation of the regression line and predict the GWA of a student who incurred 6 absences.
Solution:
[Figure: plot of y (GWA) against x (number of absences)]
Solution:

Student   x    y       xy       x²    y²
1         1    97.8    97.8     1     9564.84
2         2    90.2    180.4    4     8136.04
3         3    86.4    259.2    9     7464.96
4         5    87.3    436.5    25    7621.29
5         7    85.4    597.8    49    7293.16
6         9    84.5    760.5    81    7140.25
7         10   78.2    782      100   6115.24
Total     37   609.8   3114.2   269   53335.78
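The step from the column totals to the coefficients is not shown on the slides. A sketch of that arithmetic, assuming the standard least-squares computational formulas with n = 7, is:

```latex
\[
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}}
  = \frac{7(3114.2) - (37)(609.8)}{7(269) - (37)^{2}}
  = \frac{21799.4 - 22562.6}{1883 - 1369}
  = \frac{-763.2}{514} \approx -1.48482
\]
\[
a = \frac{\sum y - b\sum x}{n}
  = \frac{609.8 - (-1.48482)(37)}{7}
  = \frac{664.73834}{7} \approx 94.963
\]
```

Rounded to three decimal places, b ≈ -1.485 and a ≈ 94.963. The negative slope says that GWA tends to fall as absences increase.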
Solution:

Regression equation: ŷ = 94.963 – 1.485x
For x = 6 absences:
ŷ = 94.963 – 1.485(6)
ŷ ≈ 86.05
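The result for Example 1 can be checked with a few lines of Python. This is a minimal sketch that mirrors the columns of the computation table and assumes the standard least-squares formulas; the variable names are illustrative only.

```python
# Data from Example 1: number of absences (x) and GWA (y) for 7 students.
absences = [1, 2, 3, 5, 7, 9, 10]
gwa = [97.8, 90.2, 86.4, 87.3, 85.4, 84.5, 78.2]

# Column totals, as in the computation table.
n = len(absences)
sum_x = sum(absences)
sum_y = sum(gwa)
sum_xy = sum(x * y for x, y in zip(absences, gwa))
sum_x2 = sum(x * x for x in absences)

# Standard least-squares formulas (an assumption; the slides do not show them).
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

# Predicted GWA for a student with 6 absences.
predicted_gwa = intercept + slope * 6

print(f"y-hat = {intercept:.3f} + ({slope:.3f})x")
print(f"Predicted GWA for 6 absences: {predicted_gwa:.3f}")
```

Because the code keeps the unrounded coefficients, its prediction can differ in the third decimal place from a hand calculation that substitutes the rounded values 94.963 and 1.485.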
Examples:
2. A sample of 6 persons was selected. Their ages (x variable) and weights (y variable) are shown in the following table. Find the regression equation. What is the predicted weight when the age is 8.5 years?

Serial no. Age (x) Weight (y)


1 7 12
2 6 8
3 8 12
4 5 10
5 6 11
6 9 13
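One way to work Example 2 is the deviation-from-the-mean form of the least-squares formulas, b = Sxy / Sxx and a = ȳ - b·x̄. The sketch below assumes that form (the slides do not prescribe a method), and the variable names are illustrative.

```python
from statistics import fmean

# Data from Example 2: age (x) and weight (y) for 6 persons.
ages = [7, 6, 8, 5, 6, 9]
weights = [12, 8, 12, 10, 11, 13]

mean_x = fmean(ages)
mean_y = fmean(weights)

# Sums of cross-products and squares of deviations from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, weights))
s_xx = sum((x - mean_x) ** 2 for x in ages)

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Predicted weight at age 8.5 years.
predicted_weight = intercept + slope * 8.5

print(f"y-hat = {intercept:.3f} + {slope:.3f}x")
print(f"Predicted weight at age 8.5: {predicted_weight:.2f}")
```

This form gives the same line as the computational formula based on Σx, Σy, Σxy, and Σx²; it is simply arranged around the means.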
Examples:
3. The following are the age (in years) and systolic
blood pressure of 20 apparently healthy adults.
Age (x) B.P (y) Age (x) B.P (y)

20 120 46 128
43 128 53 136
63 141 60 146
26 126 20 124
53 134 63 143
31 128 43 130
58 136 26 124
46 132 19 121
58 140 31 126
70 144 23 123

Find the regression equation. What is the predicted blood pressure for a 25-year-old man?
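With 20 observations the hand computation becomes tedious, so a short script helps. The following is a sketch under the assumption that the ordinary least-squares line is wanted; `fit_line` is a hypothetical helper name, not something from the slides.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope

# Data from Example 3: left-hand column pairs first, then right-hand pairs.
age = [20, 43, 63, 26, 53, 31, 58, 46, 58, 70,
       46, 53, 60, 20, 63, 43, 26, 19, 31, 23]
bp = [120, 128, 141, 126, 134, 128, 136, 132, 140, 144,
      128, 136, 146, 124, 143, 130, 124, 121, 126, 123]

intercept, slope = fit_line(age, bp)

# Predicted systolic blood pressure at age 25.
predicted_bp = intercept + slope * 25

print(f"y-hat = {intercept:.2f} + {slope:.4f}x")
print(f"Predicted B.P. at age 25: {predicted_bp:.2f}")
```

The order in which the 20 pairs are listed does not affect the fitted line, as long as each age stays paired with its own blood pressure reading.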
https://www.easycalculation.com/statistics/regression.php
Think-Pair-Share Activity
Your turn!
1. You will be divided into 3 groups of 15 members each. Using a measuring tool, each group will collect the following data (dependent and independent variables) from its members.
• Group 1: Arm span (x) and Height (y)
• Group 2: Neck circumference (x) and Waistline (y)
• Group 3: Forearm (x) and Foot (y)
2. From the data gathered, the following shall be obtained:
a. Regression equation/model
b. Scatter plot
c. Interpretation of the slope and y-intercept
3. Each group will present the output and demonstrate the use of the
regression equation in predicting their dependent variable.
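For item 2c, the interpretation of the slope and y-intercept follows a fixed pattern that can be sketched in code. The helper below is hypothetical (the name `interpret` and its wording are assumptions), and it is demonstrated with the coefficients from Example 1 rather than with invented measurements.

```python
def interpret(slope, intercept, x_name, y_name):
    """Return plain-language readings of the slope and the y-intercept."""
    if slope > 0:
        change = f"increases by {abs(slope):g}"
    elif slope < 0:
        change = f"decreases by {abs(slope):g}"
    else:
        change = "does not change"
    slope_text = (f"For every one-unit increase in {x_name}, "
                  f"the predicted {y_name} {change}.")
    intercept_text = (f"When {x_name} is 0, "
                      f"the predicted {y_name} is {intercept:g}.")
    return slope_text, intercept_text

# Coefficients from Example 1: y-hat = 94.963 - 1.485x.
slope_text, intercept_text = interpret(-1.485, 94.963,
                                       "number of absences", "GWA")
print(slope_text)
print(intercept_text)
```

For body measurements such as arm span or forearm length, x = 0 lies far outside the observed data, so the y-intercept is better read as a constant that positions the line than as a realistic prediction.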
MS Excel Application
The owner of a chain of fruit shake stores would like to study the correlation between atmospheric temperature and sales during the summer season. A random sample of 12 days is selected, with the results given as follows:

Day                   1    2    3    4    5    6    7    8    9    10   11   12
Temp (°F)             79   76   78   84   90   83   93   94   97   85   88   82
Total Sales (Units)   147  143  147  168  206  155  192  211  209  187  200  150
Step 1: Encode the data set in cells A2:A13 (temperature) and B2:B13 (total sales).
Step 2: Type Slope in cell A15 and Intercept in cell A16. Put the cursor in cell B15 and click on fx, then select Statistical → Slope. Click OK.
Step 3: Enter the cell ranges of the data: the total sales (B2:B13) as the known y's and the temperatures (A2:A13) as the known x's.
Step 4: Click OK. The value of the slope will appear in the target cell B15.
Step 5: Put the cursor in cell B16 and click on fx, then select Statistical → Intercept. Click OK.
Step 6: Enter the same cell ranges as in Step 3.
Step 7: Click OK. The value of the intercept will appear in the target cell B16.
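The values Excel returns can be cross-checked outside the spreadsheet. The sketch below assumes temperature is the independent variable (x) and total sales the dependent variable (y); the function name `slope_and_intercept` is hypothetical, and its argument order (y values first) is chosen only to mirror Excel's SLOPE(known_y's, known_x's) and INTERCEPT(known_y's, known_x's).

```python
from statistics import fmean

def slope_and_intercept(known_ys, known_xs):
    """Least-squares slope and intercept, with y values given first."""
    mean_x = fmean(known_xs)
    mean_y = fmean(known_ys)
    s_xy = sum((x - mean_x) * (y - mean_y)
               for x, y in zip(known_xs, known_ys))
    s_xx = sum((x - mean_x) ** 2 for x in known_xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Fruit shake data: temperature (F) and total sales (units) for 12 days.
temperature = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
total_sales = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

slope, intercept = slope_and_intercept(total_sales, temperature)

print(f"Slope: {slope:.4f}")
print(f"Intercept: {intercept:.2f}")
```

With the layout assumed above, the equivalent cell formulas would be =SLOPE(B2:B13, A2:A13) in B15 and =INTERCEPT(B2:B13, A2:A13) in B16.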
SPSS Software Application
The owner of a chain of fruit shake stores would like to study the correlation between atmospheric temperature and sales during the summer season. A random sample of 12 days is selected, with the results given as follows:

Day                   1    2    3    4    5    6    7    8    9    10   11   12
Temp (°F)             79   76   78   84   90   83   93   94   97   85   88   82
Total Sales (Units)   147  143  147  168  206  155  192  211  209  187  200  150
Steps:
1. Encode the temperature data set in the VAR00001 column and the total sales data set in the VAR00002 column.
2. Go to Variable View.
3. Type Temperature and Total_Sales in the Name column.
4. Return to Data View. Select Analyze → Regression → Linear…
5. Move Total_Sales to Dependent and Temperature to Independent(s). Select Enter in Method, then click OK.
6. The result will appear in a separate output window.

Sirug, Winston S., Statistics and Probability for Senior High School Core Subject: A Comprehensive Approach © 2017. Mindshapers Co, Inc.
1. What do you think are the challenges students face in learning Correlation and Regression?
2. As teachers, how can we ease their struggle?
