Preface
INTRODUCTION
S. MOORE
Purdue University
The essays in this book illustrate learning from data in many settings
nonresponders are different from the responders? If so, don't take the
poll results too seriously.
Some sample surveys are more trustworthy. Government surveys,
such as the monthly Current Population Survey (CPS) that produces
the unemployment rate and much other information, have much
higher rates of response. Only about 6% or 7% of the households
chosen at random for the CPS sample don't respond. The Bureau of
Labor Statistics, unlike pollsters, makes its response rates public.
Knowing the details increases our confidence in the findings.
Before you trust the results of a statistical study, ask about details
of how the study was conducted.
Yogi Berra's famous saying is a motto for learning from data. A few
carefully chosen graphs are often more instructive than great piles of
numbers. Consider the outcome of the 2000 presidential election in
Florida.
Elections don't come much closer: after much recounting, state
officials declared that George Bush had carried Florida by 537 votes
out of almost 6 million votes cast. Florida's vote decided the election
and made George Bush rather than Al Gore president. Lawsuits followed,
and the Supreme Court upheld the result. Legal and political
issues aside, Figure 1 displays a graph that plots votes for the
third-party candidate Pat Buchanan against votes for the Democratic
candidate Al Gore in Florida's 67 counties.
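A graph like Figure 1 invites a simple computation: fit a straight line to the counties that follow the common pattern, then measure how far an unusual county falls from that line. The text does not say how its expected vote count was obtained, so the sketch below assumes an ordinary least-squares line. The names `fit_line` and `residual` are hypothetical, and no county data is included.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for the line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return intercept, slope


def residual(intercept: float, slope: float, x: float, y: float) -> float:
    """Observed minus predicted: how far a point sits above (+) or below (-) the line."""
    return y - (intercept + slope * x)
```

In use, x would be each county's Gore vote and y its Buchanan vote. Fitting on every county except the one in question and then calling `residual` for that county shows how far above the typical pattern it lies.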
What happened in Palm Beach County? The question leaps out from
the graph. In this large and heavily Democratic county, a conservative
third-party candidate did far better relative to the Democratic Party
candidate than in any other county. The points for the other 66 counties
show votes for both candidates increasing together in a roughly
straight-line pattern. Both counts go up as county population goes up.
Based on this pattern, we would expect Buchanan to receive around
800 votes in Palm Beach County. He actually received more than
3400 votes. That difference determined the election result in Florida
and in the nation.

FIGURE 1 Votes for Pat Buchanan versus votes for Al Gore in Florida's 67 counties

All this from a simple graph. Once you have data in
hand, the first rule of data analysis is:
such as smoking, good habits such as regular exercise, and so on. The
variation among women will overwhelm the effect of taking hormones
unless we can find a way to see through the variation.
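Comparing a hormone group with a dummy-pill group sees through the variation among women only if chance, not the researchers or the women themselves, decides who goes into which group. The sketch below assumes a simple design in which subjects are shuffled and split into two equal halves; the helper names and the "health score" data are hypothetical. It also shows the "bad luck" the text goes on to mention: even with fair random assignment, one group can start out a little healthier than the other.

```python
import random
from statistics import mean


def randomly_assign(subjects, seed=None):
    """Shuffle subjects and split them into two groups whose sizes differ by at most one.

    Hypothetical helper: the first group might get the hormone, the second the dummy pill.
    """
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    shuffled = list(subjects)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def baseline_gap(health_scores, seed=None):
    """Difference in mean baseline health between the two randomly assigned groups.

    Needs at least two scores. A nonzero gap is chance imbalance, not a treatment effect.
    """
    group_a, group_b = randomly_assign(health_scores, seed)
    return mean(group_a) - mean(group_b)
```

Running `baseline_gap` with many different seeds gives gaps that scatter around zero: randomization does not remove chance differences between groups, but it makes them unbiased and small on average.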
has few benefits and some risks trump observational studies that show
benefits, but we can't be absolutely sure that the experimental findings
are right. There remains some risk that by bad luck the dummy-pill
group received healthier women than the hormone group. So, statistical
findings are always uncertain. The laws of probability again come
to our rescue: we can attach to our findings a statement of just
how uncertain they are, and we can design our studies to make the
remaining uncertainty as small as we may wish. Opinion polls, for
example, give not only the percentage of the sample who support the
president but also a "margin of error" that describes the uncertainty in
applying the sample result to the wider universe of all adults. Saying
how much variation remains belongs with strategies for reducing variation
in the statistician's toolkit for dealing with variation.
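The text does not give a formula for a poll's margin of error. For a simple random sample, a widely used approximation at 95% confidence is 1.96 * sqrt(p * (1 - p) / n), where p is the sample proportion and n the sample size. Real polls adjust for weighting and design effects, so treat this as a sketch under those simplifying assumptions; the function names are hypothetical. The second function shows what "design our studies to make the remaining uncertainty as small as we may wish" means in practice: choose n large enough for a target margin.

```python
import math

Z_95 = 1.96  # standard normal critical value for 95% confidence (assumed level)


def margin_of_error(p_hat: float, n: int, z: float = Z_95) -> float:
    """Approximate margin of error for a sample proportion from a simple random sample."""
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def sample_size_for_margin(margin: float, p_guess: float = 0.5, z: float = Z_95) -> int:
    """Smallest n whose approximate margin of error is at most `margin`.

    p_guess = 0.5 is the conservative choice because it makes p * (1 - p) as large as possible.
    """
    if not 0.0 < margin < 1.0:
        raise ValueError("margin must be between 0 and 1")
    return math.ceil((z / margin) ** 2 * p_guess * (1.0 - p_guess))
```

For example, `margin_of_error(0.5, 1000)` is about 0.031, the familiar "plus or minus 3 percentage points." Because the margin shrinks like 1/sqrt(n), quadrupling the sample size halves it.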
PUBLIC POLICY AND SOCIAL SCIENCE