Association between
Quantitative Variables
Chapter 6
6.1 Scatterplots
Is household natural gas consumption
associated with climate?
Annual household natural gas consumption
measured in thousands of cubic feet (MCF)
Climate as measured by the National Weather
Service using heating degree days (HDD)
Copyright 2011 Pearson Education, Inc.
6.1 Scatterplots
Association between Numerical Variables
A graph displaying pairs of values as points on a
two-dimensional grid
The explanatory variable is placed on the x-axis
The response variable is placed on the y-axis
6.1 Scatterplots
Scatterplot of Natural Gas Consumption (y)
versus Heating Degree-Days (x)
6.2 Association in Scatterplots
Visual Test for Association
Compare the original scatterplot to others
that randomly re-pair the x- and y-coordinates
If you can pick the original out as having a
pattern, then there is an association
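The visual test can be sketched in code by building decoy data sets in which the y-values are randomly re-matched to the x-values. This is a minimal sketch, not the textbook's procedure: the function name `shuffled_pairings` and the data are made up for illustration. Plot the original alongside the decoys; if the original cannot be told apart, there is little evidence of association.

```python
import random

def shuffled_pairings(x, y, copies=3, seed=0):
    """Return `copies` lists of (x, y) pairs with the y-values randomly re-matched to x."""
    rng = random.Random(seed)  # fixed seed so the decoys are reproducible
    pairings = []
    for _ in range(copies):
        y_shuffled = list(y)       # copy so the original order is untouched
        rng.shuffle(y_shuffled)    # break any link between x and y
        pairings.append(list(zip(x, y_shuffled)))
    return pairings

# Made-up illustrative data (not from the slides): y rises with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1]

original = list(zip(x, y))
decoys = shuffled_pairings(x, y)   # three scatterplots with no built-in pattern
```

Each decoy keeps the same x-values and the same y-values as the original, so only the pairing differs.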
6.2 Association in Scatterplots
Describing Association
1. Direction. Does it trend up or down?
2. Curvature. Is the pattern linear or curved?
3. Variation. Are the points tightly clustered
around the trend?
4. Outliers. Is there something unexpected?
6.2 Association in Scatterplots
Gas Consumption vs. Heating Degree Days
1. Direction: Positive.
2. Curvature: Linear.
3. Variation: Considerable scatter.
4. Outliers: None apparent.
6.3 Measuring Association
Covariance
A measure that quantifies the linear
association
Depends on units of measurement and is
therefore difficult to interpret
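To see why the units matter, the sketch below computes the usual sample covariance (sum of products of deviations divided by n - 1) for the same consumption recorded in MCF and in cubic feet. The data values are invented for illustration and are not the chapter's data.

```python
def covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Made-up data: heating degree days and gas use in MCF (thousands of cubic feet).
hdd = [2000, 4000, 6000, 8000]
gas_mcf = [70, 90, 120, 140]
gas_cf = [1000 * g for g in gas_mcf]   # the same consumption in cubic feet

cov_mcf = covariance(hdd, gas_mcf)     # units: HDD x MCF
cov_cf = covariance(hdd, gas_cf)       # units: HDD x cubic feet
```

The association is identical in both cases, yet `cov_cf` is 1,000 times `cov_mcf`, which is why the covariance alone is hard to interpret.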
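\[ \operatorname{cov}(x, y) = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_n - \bar{x})(y_n - \bar{y})}{n - 1} \]
\[ \hat{z}_y = r\, z_x \]
\[ \hat{y} = a + b\,x \]
\[ a = \bar{y} - b\,\bar{x} \quad \text{and} \quad b = r\, s_y / s_x \]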
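As a rough illustration of the correlation line, the sketch below computes the slope b = r s_y / s_x and the intercept a = ȳ - b x̄ from raw data. The helper names (`mean`, `sd`, `correlation`, `correlation_line`) and the data are assumptions made for this example only.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(v)
    return sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))

def correlation(x, y):
    """Correlation r = cov(x, y) / (s_x * s_y)."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (sd(x) * sd(y))

def correlation_line(x, y):
    """Return (a, b) for the line y-hat = a + b x."""
    r = correlation(x, y)
    b = r * sd(y) / sd(x)          # slope
    a = mean(y) - b * mean(x)      # intercept: the line passes through (x-bar, y-bar)
    return a, b

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = correlation_line(x, y)      # a is about 2.2 and b is about 0.6 here
```

Converting a prediction from this line back to z-scores reproduces the standardized form: the predicted z-score of y equals r times the z-score of x.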
6.4 Summarizing Association with a Line
Line Relating Gas Consumption (y) to
Heating Degree Days (x)
\[ \hat{y} = 42.6 + 0.0126\,x \]
6.4 Summarizing Association with a Line
Lines and Prediction
Use the correlation line to customize an ad for
estimated savings from insulation based on
climate.
For a home in a cold climate (HDD = 8,800), the
predicted gas consumption is 154 MCF.
At $10 / MCF, the predicted cost is $1,540.
Assuming that insulation saves 30% on the gas bill,
the estimated savings are $462.
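The arithmetic on this slide can be retraced in a few lines. This sketch assumes the fitted line has intercept 42.6 and slope 0.0126, as read from the preceding slide; with those rounded coefficients the prediction comes out near 153.5 MCF, so the slide's figure of 154 MCF presumably reflects unrounded coefficients. The function and variable names are made up for illustration.

```python
def predicted_gas_mcf(hdd, intercept=42.6, slope=0.0126):
    """Predicted annual gas use (MCF) from heating degree days, using assumed rounded coefficients."""
    return intercept + slope * hdd

estimate = predicted_gas_mcf(8800)     # about 153.5 MCF with the rounded coefficients

slide_prediction_mcf = 154             # value reported on the slide
price_per_mcf = 10                     # dollars per MCF, from the slide
savings_rate = 0.30                    # assumed share of the bill saved by insulation

annual_cost = slide_prediction_mcf * price_per_mcf     # 1540 dollars
estimated_savings = savings_rate * annual_cost         # about 462 dollars
```

Changing the `hdd` argument gives the estimate for a different climate, which is how the ad could be customized by location.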
6.5 Spurious Correlation
Lurking Variables
Scatterplots and correlation reveal
association, not causation
Spurious correlations result from underlying
lurking variables
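A small simulation can show how a lurking variable produces a spurious correlation. In this sketch, which uses simulated data and hypothetical variable names, x and y never influence each other; both simply depend on a third variable z.

```python
import random
from math import sqrt

def correlation(x, y):
    """Correlation computed from sums of squared and cross deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)                           # fixed seed for reproducibility
z = [rng.gauss(0, 1) for _ in range(500)]        # the lurking variable
x = [zi + rng.gauss(0, 0.5) for zi in z]         # x depends only on z plus noise
y = [zi + rng.gauss(0, 0.5) for zi in z]         # y depends only on z plus noise

r_xy = correlation(x, y)                         # large, although x does not cause y

# Once the contribution of z is removed, what is left is unrelated noise.
r_resid = correlation([xi - zi for xi, zi in zip(x, z)],
                      [yi - zi for yi, zi in zip(y, z)])
```

Here `r_xy` is large while `r_resid` is close to zero: the association between x and y is produced entirely by z.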
6.5 Spurious Correlation
Checklist: Covariance and Correlation
Numerical variables
No obvious lurking variables
Linear
Outliers
4M Example 6.1:
LOCATING A NEW STORE
Motivation
Is it better to locate a new retail outlet far
from competing stores?
4M Example 6.1:
LOCATING A NEW STORE
Method
Is there an association between sales at the retail
outlets and distance to nearest competitor? For
55 stores in the chain, data are gathered for total
sales in the prior year and distance in miles from
the nearest competitor.
4M Example 6.1:
LOCATING A NEW STORE
Mechanics
The correlation between sales and distance is
r = 0.741.
4M Example 6.1:
LOCATING A NEW STORE
Message
The data show a strong, positive linear association
between distance to the nearest competitor and
sales. It is better to locate a new store far from its
competitors.
Best Practices
To understand the relationship between two
numerical variables, start with a scatterplot.
Look at the plot, look at the plot, look at the plot.
Use clear labels for the scatterplot.
Best Practices (Continued)
Describe a relationship completely.
Consider the possibility of lurking variables.
Use a correlation to quantify the association
between two numerical variables that are linearly
related.
Pitfalls
Don't use the correlation if data are categorical.
Don't treat association and correlation as
causation.
Don't assume that a correlation of zero means
that the variables are not associated.
Don't assume that a correlation near -1 or +1
means near perfect association.
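The pitfall about a correlation of zero can be checked with a small made-up example: below, y is completely determined by x (y = x squared), yet the correlation is zero because the pattern is curved and symmetric rather than linear. The `correlation` helper is written out here only for illustration.

```python
from math import sqrt

def correlation(x, y):
    """Correlation computed from sums of squared and cross deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Made-up data: a perfect but nonlinear relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = correlation(x, y)   # zero, even though y is an exact function of x
```

A scatterplot of these points shows the U-shaped pattern immediately, which is one more reason to look at the plot before relying on the correlation.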