
Activities: Notes for the Instructor

Activity 1.1 Head Sizes: Understanding Variability


In this activity, students should be able to see the difference between variability due to
measurement error and person-to-person variability. Person-to-person variability will
likely be larger than the variability introduced by having different people take the
measurements.
In Step 9, students should speculate that the proposed scheme would result in more
variability, since the measurements will reflect both person-to-person variability and
variability introduced by having different people doing the measuring.
Activity 1.2 Estimating Sizes
Actual sizes for the ten shapes are
Shape         1   2   3   4   5   6   7   8   9  10
Actual size  44  20  16  40  41  47  66  14  48  33
Don't give the students very much time to estimate the sizes, and be sure to remind them
not to draw on the figure. This activity introduces the idea of deviations (here estimated
- actual) and leads students to consider the sum of squared deviations as a measure of
overall error.
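A minimal sketch of the deviation idea, using the actual sizes listed above. The student estimates are made up for illustration, and the function names are hypothetical.

```python
# Actual sizes of the ten shapes, as listed in the notes for this activity.
ACTUAL = [44, 20, 16, 40, 41, 47, 66, 14, 48, 33]


def deviations(estimates, actual=ACTUAL):
    """Return the deviations (estimated - actual), one per shape."""
    if len(estimates) != len(actual):
        raise ValueError("need one estimate per shape")
    return [e - a for e, a in zip(estimates, actual)]


def sum_squared_deviations(estimates, actual=ACTUAL):
    """Overall error measured as the sum of the squared deviations."""
    return sum(d * d for d in deviations(estimates, actual))


# Hypothetical estimates from one student (invented for this sketch).
student = [40, 25, 16, 35, 45, 50, 60, 10, 50, 30]
print(deviations(student))
print(sum(deviations(student)))
print(sum_squared_deviations(student))
```

For these made-up estimates the raw deviations partly cancel (they sum to -8), while the squared deviations do not (they sum to 156), which is the motivation for squaring.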
Activity 1.4 Egg Variability
This activity asks students to consider the issue of measurement error and to think about
variability.
Activity 1.5 Big Feet, Little Feet
This activity examines variability and asks students to informally compare two groups on
the basis of variability and center.
Activity 2.1 Designing a Sampling Plan
It is often very difficult or impractical to select a simple random sample from a population.
In particular, selecting a simple random sample of students from a large school could
prove difficult because it is unlikely that a student could get access to a reasonable
sampling frame. This activity asks students to consider how they might select a sample
that, while not necessarily a simple random sample, might still be considered
representative of the students at the school. Encourage students to consider the need to
vary location, day of the week, time of day, etc., and to think about how they will decide
which students in a proposed location will actually be asked to participate.
Activity 2.2 An Experiment to Test for the Stroop Effect
Students generally find this to be an interesting experiment. Two reasonable designs are
1. Assign volunteer subjects at random to one of the two experimental conditions
(text or colored rectangles).
2. Have all subjects process both the text and the rectangle lists. For this design, it is
important that the order of the two experimental conditions be determined at
random for each subject.
Have the class talk about the difference between these two designs and note that although
it takes different forms in the two designs, randomization is critical in both designs.
This activity also provides a good forum for talking about extraneous variables and how
the proposed design(s) address them.
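The two forms of randomization described above can be sketched as follows. This is only an illustration: the subject labels, the equal group sizes, and the function names are assumptions, not part of the activity.

```python
import random


def assign_between_subjects(subjects, rng):
    """Design 1: randomly split the subjects between the two conditions."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"text": shuffled[:half], "rectangles": shuffled[half:]}


def assign_order_within_subjects(subjects, rng):
    """Design 2: every subject does both lists, in a random order per subject."""
    orders = {}
    for subject in subjects:
        order = ["text", "rectangles"]
        rng.shuffle(order)
        orders[subject] = tuple(order)
    return orders


rng = random.Random(2024)  # fixed seed only so the sketch is repeatable
volunteers = [f"S{i}" for i in range(1, 13)]  # hypothetical subject labels
groups = assign_between_subjects(volunteers, rng)
orders = assign_order_within_subjects(volunteers, rng)
print(groups)
print(orders)
```

In the first design chance decides which condition a subject sees; in the second, chance decides only the order, since every subject sees both conditions.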
Activity 2.5 Cluster Sampling
This activity is fairly straightforward and introduces students to cluster sampling. The
last step explores the idea of sampling variability, an important concept in Chapter 8. It
is worth spending some time with the class discussing the responses to this step.
Activity 2.6 Speed Sorting
In this activity, students design and carry out an experiment and then informally look for
differences among the three treatments/experimental conditions. It also asks students to
articulate why random assignment of subjects to treatments is important.
Activity 3.1 Locating States
This activity bridges Chapters 2 and 3, asking students to develop a sampling plan and
then implement it to collect data. The graphical displays of Chapter 3 are then used to
summarize the resulting data.
Depending on where you are located, you may need to bring in a US map to help decide
whether you are closer to Nebraska or Vermont in Step 8!
If the class comes up with a reasonable sampling plan, they should feel comfortable
generalizing the results to the population of students at your school in Step 10. This
would be an appropriate place to talk about why it would not be a good idea to generalize
beyond your school (to the district, county, state, etc.) or to other age groups.
Activity 3.2 Bean Counters!
This activity has students collect data that is paired in nature. Graphical displays are used
to summarize the resulting data and students are led to consider looking at differences.
Hopefully, students see that looking at the distribution of differences provides
information that is not apparent in the two individual distributions. This is an important
idea in Chapter 11, when the distinction between independent and paired samples
determines the appropriate method of analysis.
You can come back to this data in Chapter 11 if you need a good data set to illustrate the
paired t test.
Activity 3.3 M&M Marginal Plots
A marginal plot is a scatter plot with added univariate plots for the two variables used to
construct the scatter plot. MINITAB can construct marginal plots. For example, below
are two marginal plots: one that uses a histogram as the univariate display and one that
uses a dot plot.
[Figure: two marginal plots, each titled "Marginal Plot of Head.L vs Head.W", with Head.L
(10 to 18) on the vertical axis and Head.W (4 to 10) on the horizontal axis.]
You will need a fairly sensitive scale for this activity. Biology, Chemistry and Physics
faculty are likely to have a good scale that you might be able to borrow.
In Step 8, have students think about which type of marginal plot would be best if the
sample size is very large (histogram marginal plot).
Activity 3.4 Stretchability of Rubber Bands
This activity asks students to collect data, summarize it using graphical displays, and to
think critically about graphical displays. Before attempting this activity, make sure that
you have the appropriate supplies: THIN rubber bands and BIG, HEAVY nuts for
weights. If the nuts are not heavy enough, the stretch will not be very noticeable and it
will be hard to measure. Try this one out before attempting it in class!
Activity 4.1 Collecting and Summarizing Numerical Data
This activity has students implement the sampling plan developed in Activity 2.1. If you
didn't do Activity 2.1, you will need to integrate the sampling design into this activity.
Activity 4.3 Boxplot Shapes
Solutions:
Boxplot   Five-Number Summary   Reasoning
A         III                   This is the only five-number summary consistent with an
                                outlier on the high side.
B         I                     Median closer to the upper quartile, short lower whisker.
C         II                    Median closer to the lower quartile, but nearer the
                                center of the box than IV.
D         IV                    Median closer to the lower quartile, and upper whisker
                                longer than lower whisker.
Activity 4.4 Understanding Variability and Numerical Measures of Variability
A favorite activity: it gets students thinking about variability visually and then
numerically.
Activity 4.5 Comparing Brands of Chocolate Chip Cookies
This activity has students compare two different brands of chocolate chip cookies and
then use post-it notes to create a graphical display that is equivalent to a back-to-back
stem-and-leaf plot. In Step 7, students should describe how the two distributions are
similar or different with respect to center, spread and shape.
In Step 11, students may want to consider other factors (price, nutritional information,
appearance, etc.) in addition to the number of chocolate chips in making their
recommendations.
Activity 5.1 Exploring Correlation and Regression
This activity has students use two applets (on the text CD) to explore correlation (Steps 1
& 2) and how sum of squared error is used in fitting the least squares line (Step 3).
Activity 5.2 Age and Flexibility
In this activity, students will collect data on age and a measure of flexibility and then
summarize the data using a scatter plot and a least squares line. It is important to have a
wide range of ages represented in the data set.
You may want to spend a bit of time talking about the proposed measure of flexibility.
Sometimes students worry at first that it will depend on height, but usually they are
convinced that the measure is reasonable after discussion. If students are concerned,
have them also record height, and they can then use the data to see if there does appear to
be a relationship between height and the measure of flexibility.
Before Step 3, it may be useful to have the class discuss which of the two variables (age
and flexibility) is the response variable.
Activity 5.3 Paper Towels: A Nonlinear Relationship
Prior to carrying out this activity, ask students to bring in a partially used roll of paper
towels. You will want to only use rolls that have standard size paper towels (some rolls
now on the market have much smaller towel sheets). If you have a large class, you can
just have a subset of the class bring in rolls; 10 to 20 in total would be enough. If you
have storage space, you can save the towels once they have been taken off the rolls and
counted. Some of the later activities use paper towel sheets.
The scatterplot in Step 4 should exhibit a nonlinear pattern. One practical situation where
a model like this might be useful (see Step 7) is for estimating the remaining length of
fabric on a bolt of fabric.
Activity 5.4 Exponential Decay
This activity has students generate bivariate data that illustrates a nonlinear relationship.
Students then recommend a transformation of the data. You can extend this activity by
asking students to actually fit a nonlinear model to the data.
Activity 6.1 Kisses
This is a simple (and sweet) activity that uses simulation to estimate a probability.
Activity 6.2 A Crisis for European Sports Fans??
This activity uses simulation to approximate a probability distribution. The distribution
obtained is actually the sampling distribution of a sample proportion, and Step 4
introduces the idea of using the information provided by the sampling distribution to
reach a conclusion.
If you have a small class, you may need to have each student conduct more than one trial
in order to have a sufficient number of trials.
You may want to revisit this activity in Chapter 8 when sampling distributions are
formally introduced.
Activity 6.3 The "Hot Hand" in Basketball
In this activity, simulation is used to approximate the distribution of the longest run in a
sequence of trials. Students are often surprised to find that long runs are not as
uncommon as they expected. You might want to ask students to predict the length of the
longest run of heads that would be observed prior to actually performing the simulation.
If you have a small class, you may need to have each student conduct more than one trial
in order to have a sufficient number of trials.
There is no right or wrong answer for the question posed in Step 7, but it can be an
interesting discussion!
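If a computer version of the longest-run simulation is useful, a minimal sketch is shown here. The number of flips per sequence (100), the number of sequences (200), and the use of a fair coin are assumptions made for illustration; the activity itself may specify different values.

```python
import random


def longest_run(outcomes, target=True):
    """Length of the longest unbroken run of `target` in a sequence."""
    best = current = 0
    for outcome in outcomes:
        current = current + 1 if outcome == target else 0
        best = max(best, current)
    return best


def simulate_longest_runs(n_trials, n_flips, rng):
    """One longest-run-of-heads value per simulated sequence of fair flips."""
    return [
        longest_run([rng.random() < 0.5 for _ in range(n_flips)])
        for _ in range(n_trials)
    ]


rng = random.Random(63)  # fixed seed only so the sketch is repeatable
runs = simulate_longest_runs(n_trials=200, n_flips=100, rng=rng)
# Tabulate the simulated distribution of the longest run.
for length in sorted(set(runs)):
    print(length, runs.count(length))
```

Having students write down their prediction before looking at the tabulated counts keeps the surprise intact.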
Activity 6.4 The Monty Hall Problem
This is the now famous (but sometimes unintuitive) Monty Hall problem. The activity is
designed to help students understand the reasoning that leads to the correct
answer/strategy.
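For instructors who want a quick computer check of the two strategies, here is a small simulation sketch. It assumes the standard three-door setup in which the host always opens a non-prize door that the contestant did not pick; the function names and the number of games are illustrative.

```python
import random


def play(switch, rng):
    """Play one game; return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch, n_games, rng):
    """Proportion of simulated games won under the given strategy."""
    return sum(play(switch, rng) for _ in range(n_games)) / n_games


rng = random.Random(64)  # fixed seed only so the sketch is repeatable
print("stay:  ", win_rate(False, 10_000, rng))
print("switch:", win_rate(True, 10_000, rng))
```

The simulated win rates should settle near 1/3 for staying and 2/3 for switching.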
Activity 6.5 Efron's Dice: An Unintuitive Example
An unintuitive but interesting example. You may need to remind students what
"transitive" means!
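The non-transitive behavior can be verified by exact enumeration. The sketch below uses one commonly cited set of Efron's dice; this is an assumption, since the faces used in the activity are not listed in these notes and may differ.

```python
from fractions import Fraction
from itertools import product

# A commonly cited set of Efron's dice (assumed; check against the activity).
DICE = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}


def p_beats(first, second):
    """Exact probability that a roll of `first` exceeds a roll of `second`."""
    wins = sum(1 for a, b in product(first, second) if a > b)
    return Fraction(wins, len(first) * len(second))


# Each die is compared with the next one around the cycle A, B, C, D, A.
for a, b in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    print(f"P({a} beats {b}) = {p_beats(DICE[a], DICE[b])}")
```

For this set each die beats the next with probability 2/3, and D in turn beats A, so "beats" is not a transitive relation.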
Activity 7.1 Rotten Eggs??
You have to chuckle at the newspaper clip that is the basis for this activity!
In Step 2, the probability calculations shown depend on the assumption of independence.
This may not be reasonable if eggs in the same carton come from the same farm, same
chicken, were processed in the same batch, etc.
Step 3 has students carry out a simulation of the strategy proposed by the restaurant
manager. If you have a small class, you may need to have each pair of students perform
more than 10 trials to obtain a sufficient number of trials.
Activity 7.2 The Sound of the Normal Distribution
Another one of my favorite activities! You really can "hear" the normal distribution.
The variable of interest is
x = time for a kernel to pop
and the distribution is formed by the x values for the collection of kernels in the bag.
Activity 7.3 Pass the Message
This activity is a variation on the old children's pass the message game, where someone
whispers a message to a second person, who then passes it to a third and so on. In that
game, the message that emerges at the end can be quite different from the original
message.
The simulation is straightforward. The key thing to recognize is that the end result will
be correct any time there is an even number of transmission errors.
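The "even number of errors" observation can be checked with a short sketch. It assumes a two-valued message (so that each error flips it), independent transmissions, and a common error probability; the number of transmissions and the error probability used here are hypothetical, not taken from the activity.

```python
import random


def p_correct_exact(n_links, p_error):
    """P(even number of errors in n independent links) = (1 + (1 - 2p)^n) / 2."""
    return (1 + (1 - 2 * p_error) ** n_links) / 2


def p_correct_simulated(n_links, p_error, n_trials, rng):
    """Estimate the same probability by counting errors in simulated chains."""
    correct = 0
    for _ in range(n_trials):
        errors = sum(rng.random() < p_error for _ in range(n_links))
        if errors % 2 == 0:  # an even number of flips restores the message
            correct += 1
    return correct / n_trials


rng = random.Random(73)  # fixed seed only so the sketch is repeatable
print(p_correct_exact(6, 0.1))
print(p_correct_simulated(6, 0.1, 10_000, rng))
```

As a hand check, with 2 transmissions and error probability 0.1 the message is correct with probability 0.81 + 0.01 = 0.82 (no errors, or two errors that cancel).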
Activity 8.1 Do Students Who Take the SAT Multiple Times Have an Advantage in College Admissions?
This activity lets students simulate sampling distributions and then use them to make a
policy recommendation regarding whether admissions at selective universities should
consider the highest, mean or most recent score for a student who takes the exam multiple
times.
Sample results from Parts 1 and 2 are shown below (of course results will vary somewhat
from student to student).
[Figures: six histograms titled "Histogram of max2", "Histogram of mean2", "Histogram of
recent2", "Histogram of max5", "Histogram of mean5" and "Histogram of recent5". Each
has Frequency (0 to 140) on the vertical axis and score (1110 to 1290) on the
horizontal axis.]
Things to note: There is a clear advantage to students taking the test multiple times when
the highest score is used, and the advantage is greater when the test is taken 5 times than
when it is taken 2 times. This is not surprising, but the graphs allow students to get a
sense of the magnitude of the advantage. The distribution for most recent score looks
similar no matter how many times the test is taken, and remains centered at about 1200,
the student's true ability.
The distributions of mean2 and mean5 are centered at about 1200, but are less spread out
than the distributions of recent2 and recent5. Students should be able to relate what they
see in the mean histograms to what they know about the sampling distribution of the
sample mean.
If your students don't have access to MINITAB, this activity can easily be adapted to a
demonstration, where you show the computer output (or provide it in handouts).
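If MINITAB is not available, the same comparison can be sketched in Python. The true ability of 1200 comes from the notes above; the normal model for individual scores, the standard deviation of 30, and the 2000 simulated students are assumptions made only for this illustration.

```python
import random
import statistics

TRUE_ABILITY = 1200  # center described in the notes
SCORE_SD = 30        # assumption: the notes do not state the spread


def simulate(n_attempts, n_students, rng):
    """Return the highest, mean and most recent score for each simulated student."""
    highest, mean, recent = [], [], []
    for _ in range(n_students):
        scores = [rng.gauss(TRUE_ABILITY, SCORE_SD) for _ in range(n_attempts)]
        highest.append(max(scores))
        mean.append(statistics.mean(scores))
        recent.append(scores[-1])
    return {"max": highest, "mean": mean, "recent": recent}


rng = random.Random(81)  # fixed seed only so the sketch is repeatable
results = {k: simulate(k, 2000, rng) for k in (2, 5)}
for k, summary in results.items():
    for name, values in summary.items():
        print(f"{name}{k}: center {statistics.mean(values):7.1f}, "
              f"spread {statistics.stdev(values):5.1f}")
```

The printed centers and spreads play the role of the histograms: the max columns sit above 1200, while the mean columns stay near 1200 with a smaller spread than the recent columns.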
Activity 8.2 Defective M&Ms
This activity has students use what they know about the sampling distribution of a sample
proportion to decide if the number of defective M&Ms in a sample of 100 is consistent
with a claim of 10% defective.
This activity can be revisited when P-values are introduced in Chapter 10.
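A sketch of the calculation students are being led toward, using the normal approximation to the sampling distribution of the sample proportion with n = 100 and a claimed proportion of 0.10. The count of 16 defective M&Ms is invented for illustration, and looking at the upper tail is an assumption about which direction is of interest.

```python
import math
from statistics import NormalDist


def defective_check(n_defective, n=100, p_claimed=0.10):
    """Compare an observed count with the claimed proportion via a z-score."""
    p_hat = n_defective / n
    # Standard deviation of the sample proportion if the claim is true.
    sd_p_hat = math.sqrt(p_claimed * (1 - p_claimed) / n)
    z = (p_hat - p_claimed) / sd_p_hat
    upper_tail = 1 - NormalDist().cdf(z)  # chance of a proportion this large or larger
    return p_hat, sd_p_hat, z, upper_tail


p_hat, sd_p_hat, z, upper_tail = defective_check(16)  # hypothetical count
print(f"p-hat = {p_hat:.2f}, sd = {sd_p_hat:.3f}, z = {z:.2f}, tail = {upper_tail:.4f}")
```

With n = 100 and p = 0.10, both np = 10 and n(1 - p) = 90 are at least 10, which is the usual check that the normal approximation is reasonable.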
Activity 9.1 Getting a Feel for Confidence Level
Part 1 of this activity has students use an applet to explore the meaning of confidence
level. The main point is that the confidence level specifies a long-run success rate for the
method: it is the proportion of intervals constructed in repeated sampling that include the
true value of the population characteristic being estimated. Point out that the value of the
population characteristic is fixed and does not change. What changes from sample to
sample is the interval itself.
Part 2 of this activity looks at why the t distribution is used to construct confidence
intervals when the population standard deviation is not known. For small sample sizes, if
the z interval is used when the population standard deviation is unknown, the long-run
proportion of the time that the resulting interval will include the population mean is too
small (less than the stated confidence level).
As the sample size increases, the proportion containing the population value gets much
closer to the desired confidence level. This is why some texts say it is OK to use a z
interval even if the population standard deviation is unknown, as long as the sample size
is large.
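The coverage comparison in Part 2 can also be sketched without the applet. The population mean and standard deviation, the sample size of 5, and the number of simulated intervals are assumptions chosen for illustration; 1.960 and 2.776 are the standard 95% critical values for z and for t with 4 degrees of freedom.

```python
import math
import random
import statistics

Z_CRIT = 1.960      # standard normal critical value for 95% confidence
T_CRIT_DF4 = 2.776  # t critical value for 95% confidence with df = 4 (n = 5)


def coverage(n, crit, n_intervals, rng, mu=100.0, sigma=15.0):
    """Proportion of intervals xbar +/- crit * s / sqrt(n) that contain mu."""
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        half_width = crit * statistics.stdev(sample) / math.sqrt(n)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / n_intervals


rng = random.Random(91)  # fixed seed only so the sketch is repeatable
z_cov = coverage(5, Z_CRIT, 10_000, rng)
t_cov = coverage(5, T_CRIT_DF4, 10_000, rng)
print(f"z interval with s: {z_cov:.3f}   t interval: {t_cov:.3f}")
```

Both intervals use the sample standard deviation s; only the critical value differs, which isolates the effect described in Part 2. The z version should capture the mean noticeably less often than 95% of the time.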
Activity 9.2 An Alternative Confidence Interval for a Population Proportion
Section 9.2 mentions that the "usual" large sample confidence interval for a population
proportion is often not very good in the sense that the actual long-run proportion of
intervals that include the value of the population proportion may differ substantially from
the stated confidence level. This activity has students compare the "capture rate" for the
usual large-sample confidence interval with that of the alternative interval proposed in the
text.
Activity 9.5 Water Stains
This activity has students collect data and construct confidence intervals. Data is
collected on the width of the water stain (at the widest part of the stain) when 1/4 tsp. of
water is spilled onto a paper towel and when 1/2 tsp. of water is spilled onto a paper towel.
(Here is a place you can use those leftover paper towels from Activity 5.3.)
Many students are surprised at the fact that the mean width of the stain for 1/2 tsp. of water
does not appear to be twice the mean width for 1/4 tsp. of water. What is happening is that
the AREA of the stain is about twice as large, but this does not mean that the width of the
stain doubles. (Think of a rectangle: if both the width and length of the rectangle
double, the area increases by a factor of 4.)
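The rectangle argument can be written as a one-line calculation. This is a sketch that assumes the stain keeps roughly the same shape, so that its area A is proportional to the square of its width w; the constant c and the subscripts 1 and 2 (for the smaller and larger amounts of water) are introduced here only for illustration.

$$ A = c\,w^2 \quad\Longrightarrow\quad \frac{w_2}{w_1} = \sqrt{\frac{A_2}{A_1}} = \sqrt{2} \approx 1.41 $$

Under that assumption, doubling the area widens the stain by roughly 41%, not 100%.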
Activity 10.1 Comparing the t and z Distributions
This activity uses simulation to show why the statistic
$$ \frac{\bar{x} - \mu}{s / \sqrt{n}} $$
is not well described by a standard normal distribution when n is small and the population
standard deviation is unknown, even if the population distribution is normal. Students
generate 200 samples of size 5 from a normal distribution and then compute both the z
statistic (using the known population standard deviation) and the t statistic for each
sample. Students see that there is more variability in the t values than in the z values and
that while the histogram of the z values looks like the standard normal, the histogram of
the t values is more spread out.
Typical graphs (simulation results will vary from student to student) are shown
below.
[Figures: "Histogram of z" (z values) and "Histogram of t" (t values). Each has Frequency
(0 to 50) on the vertical axis and a horizontal axis running from -3 to 4.]
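The simulation itself can be sketched in a few lines of Python. The 200 samples of size 5 follow the description in these notes; the choice of a standard normal population (mean 0, standard deviation 1) and the function names are assumptions for illustration.

```python
import math
import random
import statistics


def z_and_t(sample, mu, sigma):
    """Return the z statistic (known sigma) and the t statistic (uses s)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    z = (xbar - mu) / (sigma / math.sqrt(n))
    t = (xbar - mu) / (s / math.sqrt(n))
    return z, t


def simulate(n_samples, n, rng, mu=0.0, sigma=1.0):
    """Generate n_samples normal samples of size n; collect the z and t values."""
    zs, ts = [], []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z, t = z_and_t(sample, mu, sigma)
        zs.append(z)
        ts.append(t)
    return zs, ts


rng = random.Random(101)  # fixed seed only so the sketch is repeatable
zs, ts = simulate(200, 5, rng)
print("sd of z values:", round(statistics.stdev(zs), 2))
print("sd of t values:", round(statistics.stdev(ts), 2))
print("largest |z|:", round(max(abs(v) for v in zs), 2))
print("largest |t|:", round(max(abs(v) for v in ts), 2))
```

Comparing the two standard deviations (and the most extreme values) gives a numerical counterpart to the visual comparison of the two histograms.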
Activity 10.3 Lapsed Time
This activity has students collect data and use it to perform a one-sample hypothesis test.
If you have a small class size, you may need to recruit some additional subjects to ensure
an adequate sample size. In my experience, the distribution of times appears skewed
rather than normal, so it is best to have a sample size of 30 or more.
Activity 11.1 Helium Filled Footballs??
This activity uses data from the Data and Story Library (DASL). Before looking at the
data, have your students answer the question posed in Step 1. This can lead to an
interesting discussion!
If you prefer to provide the data rather than have your students go to the Internet to
retrieve it, the data follows. If you haven't seen the DASL web site, you might want to
spend a bit of time exploring it. It is a nice source of data and examples.
Trial Air Helium
1 25 25
2 23 16
3 18 25
4 16 14
5 35 23
6 15 29
7 26 25
8 24 26
9 24 22
10 28 26
11 25 12
12 19 28
13 27 28
14 25 31
15 34 22
16 26 29
17 20 23
18 22 26
19 33 35
20 29 24
21 31 31
22 27 34
23 22 39
24 29 32
25 28 14
26 29 28
27 22 30
28 31 27
29 25 33
30 20 11
31 27 26
32 26 32
33 28 30
34 32 29
35 28 30
36 25 29
37 31 29
38 28 30
39 28 26
Activity 11.2 Thinking About Data Collection
This activity focuses on the difference between independent samples and paired samples,
and also provides students with a chance to review some of the material on experimental
design. The activity can consist of just Steps 1 - 3, which focus on data collection issues,
or you can include Step 4, which asks students to actually collect the data and perform an
appropriate hypothesis test.
Activity 11.4 More on Defective M&Ms
In this activity, students inspect M&Ms for defects, comparing the proportion
defective for plain and peanut M&Ms. Students should also classify defective M&Ms by
type of defect so that this data can be used again in Activity 12.3.
Activity 11.5 Which Weighs More: Coke or Diet Coke?
This is a fun activity; you will be surprised at the student answers to the question in Step
1. Students seem to be evenly split among the three possible responses, and so the
discussion gets the students interested in collecting data to answer the question. (It turns
out that Coke weighs more, due to the sugar that is dissolved in the solution.)
To carry out this activity you will need a fairly sensitive scale. Science departments
usually have these, so you might ask a science faculty member if you can borrow one for
this activity.
Activity 12.1 Pick a Number, Any Number...
People are notoriously poor random number generators. This activity gives students
practice with the Chi-square goodness-of-fit test and also reminds them that they should
use random number tables or a random number generator when they need random
numbers.
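As a reminder of the arithmetic behind the goodness-of-fit statistic, here is a minimal sketch. The counts (40 people each picking a number from 1 to 4) are invented for illustration and are not data from the activity; 7.815 is the standard upper 5% critical value of the chi-square distribution with 3 degrees of freedom.

```python
def chi_square_gof(observed, expected_props=None):
    """Chi-square goodness-of-fit statistic and its degrees of freedom.

    If no expected proportions are given, all categories are taken to be
    equally likely, which is what "truly random" picks would produce.
    """
    n = sum(observed)
    k = len(observed)
    if expected_props is None:
        expected_props = [1 / k] * k
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1


CRITICAL_05_DF3 = 7.815  # upper 5% point of chi-square with 3 df

# Hypothetical counts of how many people picked 1, 2, 3 and 4.
counts = [5, 8, 20, 7]
stat, df = chi_square_gof(counts)
print(f"X^2 = {stat:.2f} on {df} df; exceeds 7.815: {stat > CRITICAL_05_DF3}")
```

For these made-up counts each expected count is 10, so the statistic is (25 + 4 + 100 + 9) / 10 = 13.8.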
Activity 12.2 Color and Perceived Taste
This activity has students design an experiment to see if the color of a food item changes
the way people perceive its taste. Collecting the data can be a bit time consuming, so if
you choose not to have the students actually collect the data, an alternate approach would
be to complete Step 1 (the design part) and then provide students with hypothetical data
that can be used to complete the table in Step 3 and the test in Step 4.
Activity 12.3 Peanut and Plain M&M Defects
This activity uses data collected in Activity 11.3 to perform a Chi-square test. If you
didn't do Activity 11.3, you will need to do the data collection part of that activity at this
time.
This activity also provides an opportunity to talk about the distinction between the
Chi-square test of homogeneity and the Chi-square test of independence. This is a test for
homogeneity.
Activity 13.1 Are Tall Women from "Big" Families?
This activity primarily focuses on the model utility test in simple linear regression.
MINITAB output for the data of this example follows.
[Figure: "Scatterplot of Height vs Siblings", with Height (61 to 67) on the vertical axis and
Siblings (0 to 6) on the horizontal axis.]
Pearson correlation of Height and Siblings = 0.396
P-Value = 0.257
The regression equation is
Height = 64.3 + 0.366 Siblings
Predictor     Coef  SE Coef      T      P
Constant   64.2543   0.7532  85.31  0.000
Siblings    0.3662   0.3001   1.22  0.257
S = 1.55637   R-Sq = 15.7%   R-Sq(adj) = 5.2%
Activity 13.2 Golden Rectangles
This activity introduces the idea of fitting a linear model that goes through the origin (i.e.
has intercept = 0). MINITAB will fit such a model, but if you don't have access to
MINITAB or a software program that will fit such a model, you can have students fit the
model by hand.
The model is
$$ y = \beta x + e $$
and the least squares estimate of $\beta$ is
$$ b = \frac{\sum xy}{\sum x^2} $$
The estimated variance of the residuals is
$$ s_e^2 = \frac{\sum y^2 - \dfrac{\left(\sum xy\right)^2}{\sum x^2}}{n - 1} $$
and the test statistic for testing $H_0\colon \beta = 0.618$ is
$$ t = \frac{b - 0.618}{s_e / \sqrt{\sum x^2}} $$
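For classes without software that fits a no-intercept model, the by-hand formulas in this section (slope through the origin, residual variance with n - 1 degrees of freedom, and the t statistic) can be sketched as below. The three (x, y) pairs are invented for illustration, the function name is hypothetical, and the hypothesized slope of 0.618 is the value read from these notes.

```python
import math


def fit_through_origin(x, y, beta_0=0.618):
    """Least squares fit of y = beta * x (no intercept) and the t statistic."""
    n = len(x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    b = sum_xy / sum_xx                           # slope estimate
    s2_e = (sum_yy - sum_xy ** 2 / sum_xx) / (n - 1)  # residual variance
    if s2_e <= 0:
        raise ValueError("perfect fit: the t statistic is undefined")
    t = (b - beta_0) / (math.sqrt(s2_e) / math.sqrt(sum_xx))
    return b, s2_e, t


# Hypothetical measurements, made up for this sketch.
x = [1, 2, 3]
y = [1, 1, 2]
b, s2_e, t = fit_through_origin(x, y)
print(f"b = {b:.4f}, s_e^2 = {s2_e:.4f}, t = {t:.3f} on {len(x) - 1} df")
```

Only one parameter is estimated, which is why the t statistic is referred to n - 1 degrees of freedom rather than the n - 2 used when an intercept is also fitted.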

Activity 13.3 Name Lengths
This activity is straightforward. Students use data on name lengths to carry out a test to
determine if there is a positive correlation between first name length and last name
length.
Activity 14.1 Exploring the Relationship Between Number of Predictors and Sample Size
This activity shows that if enough variables are included in a multiple regression model,
it is possible to achieve an $R^2$ of 100% and an $s_e$ of 0 even when there is no relationship
between y and any of the predictors. MINITAB output for models with one through four
predictors appears below.
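Separately from the MINITAB output, the same phenomenon can be reproduced with a short, self-contained sketch. It assumes 5 observations and up to 4 predictors generated independently of y (so there is truly no relationship); the random data, the seed, and the helper names are all illustrative.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(predictors, y):
    """R^2 of the least squares fit of y on an intercept plus the predictors."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    # Normal equations: (X'X) coefs = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


rng = random.Random(141)  # fixed seed only so the sketch is repeatable
n = 5
y = [rng.gauss(25, 1) for _ in range(n)]
x = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]  # unrelated to y
r2_by_count = [r_squared([row[:k] for row in x], y) for k in range(1, 5)]
for k, r2 in enumerate(r2_by_count, start=1):
    print(f"{k} predictor(s): R^2 = {100 * r2:.1f}%")
```

With 5 observations, an intercept plus 4 predictors uses up all 5 degrees of freedom, so the fitted values reproduce y exactly and R^2 reaches 100% (up to rounding) regardless of what the predictors are.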
Regression Analysis: y versus x1
The regression equation is
y = 24.6 - 0.195 x1
Predictor     Coef  SE Coef      T      P
Constant    24.622    2.925   8.42  0.004
x1         -0.1949   0.1411  -1.38  0.261
S = 0.612314   R-Sq = 38.9%   R-Sq(adj) = 18.5%
Regression Analysis: y versus x1, x2
The regression equation is
y = 29.9 - 0.224 x1 - 0.232 x2
Predictor     Coef  SE Coef      T      P
Constant    29.939    4.336   6.90  0.020
x1         -0.2240   0.1205  -1.86  0.204
x2         -0.2321   0.1557  -1.49  0.275
S = 0.516166   R-Sq = 71.0%   R-Sq(adj) = 42.1%
Regression Analysis: y versus x1, x2, x3
The regression equation is
y = 27.7 - 0.294 x1 - 0.203 x2 + 0.150 x3
Predictor      Coef  SE Coef      T      P
Constant     27.728    3.116   8.90  0.071
x1         -0.29430  0.08850  -3.33  0.186
x2          -0.2025   0.1047  -1.93  0.304
x3          0.14981  0.07984   1.88  0.312
S = 0.343323   R-Sq = 93.6%   R-Sq(adj) = 74.4%
Regression Analysis: y versus x1, x2, x3, x4
The regression equation is
y = 35.5 - 0.0278 x1 - 0.325 x2 - 0.0535 x3 - 0.350 x4
Predictor        Coef  SE Coef  T  P
Constant      35.5481        *  *  *
x1         -0.0278208        *  *  *
x2          -0.324598        *  *  *
x3         -0.0534616        *  *  *
x4          -0.350013        *  *  *
S = *
Activity 15.1 Exploring Single-Factor ANOVA
Step 1:
Case 1: Looks like the variance may not be the same for all three populations
Case 2: Consistent with assumptions
Case 3: Consistent with assumptions
Case 4: May worry about assumption of normality for population 3
Step 2:
Case A: Probably not all the same
Case B: Unsure
Case C: Could be the same
Step 3: MINITAB output follows
Case A
One-way ANOVA: Sample 1, Sample 2, Sample 3
Source  DF       SS       MS       F      P
Factor   2  3313.69  1656.84  259.60  0.000
Error   21   134.03     6.38
Total   23  3447.72
Case B
One-way ANOVA: Sample 1, Sample 2, Sample 3
Source  DF    SS    MS      F      P
Factor   2  2918  1459  12.55  0.000
Error   21  2442   116
Total   23  5359
Case C
One-way ANOVA: Sample 1, Sample 2, Sample 3
Source  DF      SS     MS     F      P
Factor   2   237.0  118.5  1.55  0.235
Error   21  1604.0   76.4
Total   23  1841.0
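To connect the columns of an ANOVA table to the data, here is a minimal sketch that builds the table entries from raw samples. The three small samples are invented for illustration (they are not the activity's data), and the function name is hypothetical.

```python
def one_way_anova(*samples):
    """Return the entries of a single-factor ANOVA table as a dictionary."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n_total
    # Between-sample (Factor) and within-sample (Error) sums of squares.
    ss_factor = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_error = sum((v - m) ** 2 for s, m in zip(samples, means) for v in s)
    df_factor, df_error = k - 1, n_total - k
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "df": (df_factor, df_error, n_total - 1),
        "SS": (ss_factor, ss_error, ss_factor + ss_error),
        "MS": (ms_factor, ms_error),
        "F": ms_factor / ms_error,
    }


# Hypothetical samples, made up for this sketch.
table = one_way_anova([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(table)
```

The F statistic is the Factor mean square divided by the Error mean square; in the Case C output, for example, 118.5 / 76.4 is about 1.55.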
