
book excerpt

Damned Lies
and Statistics

illustrations by bill mayer

Remember the old saw that figures don't lie, but liars do figure? Joel Best, the author of Damned Lies and Statistics: Untangling Numbers from the Media, Politicians and Activists (University of California Press), would probably not be so kind to the figures. Indeed, Best, who teaches sociology and criminal justice at the University of Delaware, plainly spends a good portion of his time at the office teasing elusive truths from the witches' brew of numbers that pervade debates over public policy.

This witty, well-written excerpt – from a chapter of the book titled "mutant statistics" – is apt to leave the reader more depressed than angry. For much of the difficulty with numbers discussed therein is a product of ignorance rather than of intent. Read it and weep. Or, at the very least, learn to view statistics and the nice folks who peddle them as proof of this or that with deepest skepticism.

by joel best

© University of California Press, reprinted with permission.


Not all statistics start out bad,
but any statistic can be made worse. Numbers – even good numbers – can
be misunderstood or misinterpreted. Their meanings can be stretched,
twisted, distorted or mangled. These alterations create what we can call
“mutant statistics” – distorted versions of the original figures.

Many mutant statistics have their roots in innumeracy – difficulty grasping the meaning of numbers and calculations – which is widespread. Not only is much of the general public innumerate, but also the advocates promoting social and economic problems are often not any better. They may become confused about a number's precise meaning. They may misunderstand how a problem has been defined, how it has been measured, or what sort of sampling method has been used. At the same time, their commitment to their cause ("After all, it's a big problem!") may lead them to improve a statistic, to make the numbers seem more dramatic, more compelling. Some mutant statistics may be products of advocates' cynicism, of their deliberate attempts to distort information in order to make their claims more convincing. This seems particularly likely when mutation occurs at the hands of large institutions that twist information into the form most favorable to their interests. But mutation can also be a product of sincere, albeit muddled, interpretations by innumerate advocates.

Once someone utters a mutant statistic, there's a good chance that those who hear it will accept it and repeat it. Innumerate advocates influence their audiences: the media repeat mutant statistics and the public accepts – or does not challenge – whatever numbers the media present. A respected commentator may hear a statistic and repeat it, making the number seem even more credible.

As statistics gain wide circulation, number laundering occurs. The figures become harder to challenge because everyone has heard them and everyone assumes they must be correct. This is especially true when numbers reinforce our beliefs or interests. ("Of course that's true!")

Consider one widely circulated statistic about the dangers of anorexia nervosa. Anorexia usually occurs in young women, and some feminists argue that it is a response to societal pressures for women to be beautiful and to cultural standards that equate slenderness with beauty. Advocates who were seeking to draw attention to the problem estimated that 150,000 American women were anorexic, and noted that anorexia could be fatal.

At some point, some feminists began reporting that each year 150,000 women died from anorexia. (In fact, only about 70 deaths a year are attributed to anorexia.) This simple transformation – turning an estimate for the total number of anorexic women into the annual number of fatalities – produced a dramatic, memorable statistic. Advocates repeated the erroneous figure in books, in newspaper columns, on talk shows and so on. There were soon numerous sources for the mistaken number. A student searching for material for a term paper on anorexia, for instance, had a good chance of encountering – and repeating – this wildly inaccurate statistic, and each repetition helped ensure that the mutant statistic would live on.

Yet it should have been obvious that something was wrong with this figure. Anorexia typically affects young women. In the United States, roughly 8,500 females 15 to 24 years old die from all causes each year; another 47,000 women aged 25 to 44 also die annually. What were the chances, then, that there could be 150,000 deaths from anorexia each year?

But, of course, most of us have no idea how many young women die each year. ("It must be a lot.") When we hear that anorexia kills 150,000 young women a year, we assume that whoever cites the number must know that it is true.
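
The plausibility check Best describes amounts to a couple of lines of arithmetic. The sketch below is not from the book; it simply restates the approximate mortality figures quoted above to show how far the claim overshoots.

```python
# Rough plausibility check for the "150,000 anorexia deaths a year" claim,
# using the approximate mortality figures quoted in the excerpt.
claimed_anorexia_deaths = 150_000      # the mutant statistic
deaths_females_15_24 = 8_500           # all causes, per year (approximate)
deaths_females_25_44 = 47_000          # all causes, per year (approximate)

all_cause_deaths = deaths_females_15_24 + deaths_females_25_44
print(f"All-cause deaths, women 15-44: about {all_cause_deaths:,} per year")
print(f"Claimed anorexia deaths:       {claimed_anorexia_deaths:,} per year")
print(f"The claim is {claimed_anorexia_deaths / all_cause_deaths:.1f}x larger "
      "than ALL deaths of young women combined -- impossible.")
# The excerpt notes that only about 70 deaths a year are actually
# attributed to anorexia.
```
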

let me count the ways


How and why does mutation occur? Here, I explore four common ways of creating mutant numbers, beginning with the most basic error – making inappropriate generalizations from a statistic. I then turn to transformations – taking a number that means one thing and interpreting it to mean something completely different. Next, I explore confusion – transformations that involve misunderstanding the meaning of more complicated statistics. Finally, I consider compound errors – the ways in which bad statistics can be linked to form chains of error. In these four ways, bad statistics not only take on lives of their own, but they do increasing damage as they persist.

Generalization: Elementary Forms of Error

Generalization is an essential step in statistical reasoning. We rarely are able to count all the cases of some social problem. Instead, we collect some evidence from a sample and generalize from it. Generalization involves some basic processes: the problem must be defined, and a means of measurement and a sample must be chosen. These are elementary steps in social research. But even the most basic principles can be violated and, surprisingly often, no one notices when this happens. Mutant statistics – based on flawed definitions, poor measurements or bad samples – emerge, and often receive a surprising amount of attention.

Questionable Definitions. Consider the flurry of media coverage about the so-called epidemic of fires in African-American churches in the South in 1996. Various groups charged that the fires were the work of a racist conspiracy. Their claims recalled the history of racial terrorism in the South: black churches had often been targets of arson or bombing.
Perhaps because 1996 was an election year, politicians – both Democrats (including President Clinton and Vice President Al Gore) and Republicans – denounced the fires, as did both the liberal National Council of Churches and the conservative Christian Coalition.

Activists (such as the anti-racist Center for Democratic Renewal) tried to document the increased number of fires, producing lists of church arsons and statistics about the number of suspicious fires as evidence that the problem was serious. However, investigations, first by journalists and later by a federal task force, called those claims into question.

While there were certainly some instances in which whites burned black churches out of racist motives, there was no evidence that a conspiracy linked the fires. Moreover, the definition of what was a racially motivated church fire proved to be unclear; the activists' lists included fires at churches with mostly white congregations, fires known to have been set by blacks, teenage vandals or mentally disturbed people, and fires set to collect insurance.

When journalists checked the records of the insurance industry, they discovered not only that the number of fires in 1996 was not unusually high, but also that church arsons had been generally declining since at least 1980. The federal task force ultimately failed to find any evidence of either an epidemic of fires or a conspiracy.

In short, statistics attempting to demonstrate the existence of an epidemic of church arsons lacked a clear definition of what ought to count as a racially motivated church fire. By the same token, those who claimed there was an epidemic did not define the number of fires it would take to constitute one. Indeed, the absence of any clear definitions made it difficult to assess the evidence. The advocates who offered lists of fires (and asserted that each was evidence of a racist conspiracy) may have been convinced, but those who tried to identify cases using some sort of clear definition failed to find any evidence of an epidemic.

Inadequate Measurement. Clear, precise definitions are not enough. Whatever is defined must also be measured, and meaningless measurements will produce meaningless statistics. For instance, consider recent federal efforts to count hate crimes – crimes motivated by racial, religious or other prejudice.

In response to growing concern, the Federal Bureau of Investigation invited local law enforcement agencies to submit annual reports on hate crimes within their jurisdictions and, beginning in 1991, the bureau began issuing national hate-crime statistics. Although the FBI had collected data on the incidence of crime from local agencies for decades, counting hate crimes posed special problems. When police record a reported crime – say, a robbery – it is a relatively straightforward process. Usually the victim comes forward and tells of being forced to surrender money to the robber. But identifying a hate crime requires something more: an assessment of the criminal's motive. A robbery might be a hate crime if prejudice motivates the robber, but the crimes committed by robbers with other motives are plainly not hate crimes, even if the robber and the victim are of different races.

There are real disagreements about how to define and measure hate crimes. Not surprisingly, some activists favor broad, inclusive standards that will avoid false negatives. Some feminists, for example, argue that rapes should automatically be considered hate crimes on the grounds that all rape is motivated by gender prejudice.
But local officials, who may be reluctant to publicize tensions within their communities, may favor much narrower standards: a cross-burning on an African-American family's lawn may be classified as a teenage prank rather than as a hate crime.

Because there is much variation in how – and even whether – agencies measure hate crimes, hate-crime statistics have been incomplete and uneven. In 1991, the FBI collected hate-crime data from only 32 states; less than a quarter of all law enforcement agencies supplied reports. By 1996, 49 states and the District of Columbia reported some data, but many agencies still did not participate.

More important, many of the agencies that did file reports indicated that they had recorded no hate crimes during 1996. Twelve states reported fewer than 10 hate crimes apiece and Alabama's law enforcement agencies reported none. So long as many agencies refused to submit hate-crime statistics – and others used wildly different standards to classify hate crimes – the data collected and published would have little value. We might even suspect that the jurisdictions that report the most hate crimes will be those with the most liberal governments, because they are more likely to press law enforcement agencies to take such reporting seriously. This suggests that hate-crime statistics may be a better measure of local officials' political beliefs than of the incidence of hate crimes.

While the record-keeping may improve over time, the hate-crime statistics reported during the program's early years were nearly worthless. Practices for recording hate crimes obviously varied widely among jurisdictions, making meaningful comparisons impossible. Moreover, it should be noted that, as reporting does improve, the numbers of reported hate crimes will almost certainly increase. That is, incidents that previously would not have been counted as hate crimes will be counted and successive annual reports will show the incidence of hate crime rising. Measurement is always important, but this example illustrates why new statistical measures should be handled with special caution.

Bad Samples. Earlier, I emphasized the importance of generalizing from representative samples. This is a basic principle, but one that is easily ignored.

Consider a study subtitled A Survey of 917,410 Images, Descriptions, Short Stories, and Animations Downloaded 8.5 Million Times by Consumers in Over 2,000 Cities in 40 Countries, Provinces, and Territories. An undergraduate published this research in 1995 in a law review, reporting that 83.5 percent of the downloaded images were pornographic.

In 1995, the Internet was still a novel phenomenon; people worried that children were frequent users, and that parents did not understand the Internet well enough to protect their children from questionable content.

Claims that an extensive research project revealed that a substantial majority of Internet traffic involved pornography generated considerable concern. The huge scope of the study – 917,410 images downloaded 8.5 million times – implied that it must have been exhaustive.

But, of course, a large sample is not necessarily a good sample. In this case, the researcher did not collect a representative sample of Internet traffic. Rather, he examined postings to only 17 of some 32 Usenet groups that carried image files. In fact, his findings showed that pornographic images accounted for about 3 percent of Usenet traffic, while Usenet accounted for only about an eighth of the traffic on the Internet.

The sample was drawn from precisely that portion of the Internet where pornographic images were concentrated and was thus anything but representative. An alternative way to summarize the study's findings was that only one-half of 1 percent of Internet traffic involved pornographic images – a markedly lower figure than 83.5 percent.
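
A rough back-of-the-envelope restatement of the study's own numbers, assuming only the approximate shares quoted above (about 3 percent of Usenet traffic, with Usenet roughly an eighth of Internet traffic), might look like this:

```python
# Re-expressing the study's own numbers against the right denominator.
porn_share_of_usenet = 0.03        # about 3 percent of Usenet traffic carried pornographic images
usenet_share_of_internet = 1 / 8   # Usenet was only about an eighth of Internet traffic

porn_share_of_internet = porn_share_of_usenet * usenet_share_of_internet
# Roughly 0.4 percent -- on the order of the article's "one-half of 1 percent,"
# and nowhere near the 83.5 percent figure the press repeated.
print(f"Pornography as a share of all Internet traffic: about {porn_share_of_internet:.1%}")
```
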
All three cases discussed above received extensive coverage from the media; all three attracted the attention of political leaders; all three involved mutant statistics. It is also true that, in all three cases, the statistics eventually drew criticism. However, critics are not always successful in influencing the public. Many people probably remain convinced that most Internet traffic is pornographic, that members of a racist conspiracy set many church fires, and so on. The impact of mutant statistics is often long-lived.

Transformation: Changing the Meaning of Statistics

Another common statistical mutation involves transforming a number's meaning. Usually, this takes place when someone tries to repeat a number, but manages to say something different; recall that 150,000 people with anorexia became 150,000 annual deaths from anorexia.

Of course, not all transformations are as obvious as equating having a disease with dying from it. Often transformations involve more subtle misunderstandings or logical leaps.

Consider the evolution of one social commentator's estimate that 6 percent of America's 52,000 Roman Catholic priests "are at some point in their adult lives sexually preoccupied with minors." This estimate originated with a psychologist and former priest who treated disturbed clergymen and derived the figure from his observations. It was, in short, an educated guess. Still, his claim was often repeated and, in the process, transformed in at least four important ways.

First, some of those who repeated the figure forgot that it was an estimate, and referred to the number as though it were a well-established fact – presumably a finding from a survey of priests. Second, while the psychologist's estimate was based on a sample of priests who had sought psychological treatment (and therefore might well be especially likely to have experienced inappropriate attractions to young people), he generalized to all priests. Third, although the original estimate referred to sexual attraction rather than actual behavior, those who repeated the number often suggested that 6 percent of all priests had had sexual contacts with young people. Fourth, those young people became redefined as children; commentators charged that 6 percent of priests were pedophiles. Although the original estimate in fact suggested that twice as many priests were attracted to adolescents as to younger children, this subtlety was lost.

Thus, an estimate that perhaps 6 percent of priests in treatment were at some point sexually attracted to young people was transformed into the so-called fact that 6 percent of all priests had had sex with children. Not everyone who repeated the statistic made all four transformations, but the number's original meaning soon became lost in a chorus of claims linking "pedophile priests" to the 6 percent figure.

This example suggests that it is impossible to predict all the ways a number might be misunderstood and given an entirely new meaning. While it may be especially easy to transform estimates and guesses because the language of guessing is often vague, more precisely defined statistics can also undergo transformation.

Homicide statistics offer an example. In addition to gathering reports of homicides to calculate crime rates, the FBI also tries to collect more detailed information for its supplementary homicide reports. The FBI encourages law enforcement agencies to complete a brief form for each homicide that asks, for example, about the victim's age, gender and ethnicity, the relationship between victim and offender, and the circumstances of the homicide – whether the death occurred during the course of a robbery, during an argument, and so on.

These reports are inevitably incomplete. When the police find the body of a homicide victim, they ordinarily can identify the victim's age and gender, and they often – but not always – can specify the circumstances of the homicide. But unless they identify the offender, they usually cannot know the nature of the victim-offender relationship. In such cases, the relationship is coded "unknown." Roughly 15 to 20 percent of the reports to the FBI list the circumstances as unknown; nearly 40 percent indicate that the victim-offender relationship is unknown.

Completing the paperwork for the FBI is a byproduct of police work and may receive a relatively low priority. Agencies are supposed to submit these reports within five days of the end of the month in which the homicide becomes known.
While the FBI asks for updated reports when additional information becomes available, many agencies don't bother to report changes. Thus, a homicide initially reported as involving unknown circumstances or an unknown victim-offender relationship may later be solved, but the police do not necessarily report what more they have learned to the FBI.

A classification of "unknown" means just that – at the time the report was completed, the police didn't have the information. However, people sometimes make assumptions about the nature of the unknown circumstances or unknown victim-offender relationships reported to the FBI. In the early 1980s, the FBI drew attention to the problem of serial murderers. There had been several prominent serial murder cases in the news, and the press argued that this was, if not a new crime problem, at least one that was more common than ever before. The FBI estimated that there might be as many as 35 serial murderers active at any one time, and the media claimed that serial murderers might account for as many as 4,000 or 5,000 deaths per year. Some commentators mangled these numbers further, reporting that there were 4,000 to 5,000 active serial killers.

It should have been apparent that there was something wrong with these statistics. They implied that each killer murdered more than 100 victims per year – an improbably high average. How did the analysts arrive at the figure of 4,000 victims? Simple: they assumed that all – or at least a large share – of the reported homicides involving unknown circumstances or an unknown victim-offender relationship were serial murders. Serial murderers often kill victims unknown to them; therefore, those inclined to sensationalize the numbers assumed that cases in which the victim-offender relationship was unknown were probably serial murders.

Recent claims blaming most homicides on strangers use similar logic. Reports to the FBI classify about 15 percent of victims and offenders as strangers, but nearly 40 percent of victim-offender relationships as unknown. Some interpretations assumed that any unknown relationship must involve strangers. So they added 15 percent and 40 percent and concluded that strangers commit most (55 percent) of all murders.

Both the serial murder and the stranger-homicide claims transformed the meaning of "unknown" by assuming that, if the police can't classify the victim-offender relationship, then the homicide must be the work of a stranger – or even a serial killer. This is an unwarranted logical leap. Researchers who have conducted more careful studies (for example, examining officials' final classifications for homicides) have concluded that strangers account for only 20 to 25 percent of all homicides, not more than half, and that serial murderers kill perhaps 400 victims a year, not 4,000.
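
Both claims fail the same kind of simple arithmetic test. The sketch below is not from the book; it just combines the figures quoted above to show how the claims undercut themselves.

```python
# Sanity checks on the serial-murder and stranger-homicide claims,
# using only the figures quoted in the excerpt.

# Claim 1: about 35 active serial murderers, yet 4,000-5,000 victims a year.
active_serial_killers = 35
claimed_victims_per_year = 4_000
victims_per_killer = claimed_victims_per_year / active_serial_killers
print(f"Implied victims per killer per year: {victims_per_killer:.0f}")  # over 100 -- implausible

# Claim 2: strangers commit "most" murders, reached by adding the 15 percent of
# cases known to involve strangers to the 40 percent coded "unknown."
known_stranger_share = 0.15
unknown_relationship_share = 0.40
claimed_stranger_share = known_stranger_share + unknown_relationship_share
print(f"Claimed stranger share: {claimed_stranger_share:.0%}")
# The 55 percent figure only follows if every "unknown" case involves a stranger.
# The careful studies cited in the excerpt put the true share at roughly 20 to 25 percent.
```
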
The lesson from the misinterpretation of these statistics, then, concerns transformations created by careless inferences about the meaning of official statistics. In these cases, observers assumed that they knew what had actually happened in cases that the police labeled "unknown." They produced dramatic, frightening figures that exaggerated the deaths caused by strangers or serial murderers, and those transformed figures were widely circulated.

Transformations involve shifts in meaning; advocates for some cause convert a statistic about X into a statistic about Y. This is an obvious error. Sometimes transformations are inadvertent, reflecting nothing more than sloppy language. In such cases, people try to repeat a statistic, but they accidentally reword a claim in a way that creates a whole new meaning. Of course, other transformations may be deliberate efforts to mislead in order to advance the advocates' cause.

Certainly, transformations often increase a claim's impact by making it more dramatic: the number of anorexics becomes a body count; priests attracted to adolescents become priests having sex with children; homicides of unknown circumstances become serial killings. Such statistics get repeated precisely because they are dramatic, compelling numbers. A transformation that makes a statistic seem less dramatic is likely to be forgotten, but a more dramatic number stands a good chance of being repeated. It is a statistical version of Gresham's Law: bad statistics drive out good ones.

There is another lesson here: transformation errors often reflect innumeracy. It should have been obvious that anorexia could not kill 150,000 women per year. Similarly, the people who asserted that there were 35 active serial murderers killing 4,000 victims each year should have realized that those two figures made no sense when combined, that both could not be correct.

Advocates are not the only ones to blame. The reporters who wrote stories about all those deaths from anorexia or serial murder should have asked themselves whether those numbers were plausible; they might even have investigated the claims before repeating them. Yet, in each case, these numbers received wide circulation for years – and continued to be repeated even after they were called into question. After all, the mutant statistics were now readily available; people could easily find them on the Internet or printed in many sources.

Transformation errors are thus easy to make, but difficult to set right. Transformation only requires that one person with influence over the media misunderstand a statistic and repeat the number in a way that gives it a new meaning. Even if someone recognizes the mistake and calls attention to it, the error is likely to live on.

Confusion: Garbling Complex Statistics

The examples discussed so far involve misunderstanding simple, straightforward statistics. But some statistics get mangled because they are difficult to grasp, and are therefore easily confused.

Consider Work Force 2000, a 1987 report commissioned by the United States Department of Labor that projected changes in the American work force.
Work-force demographics are gradually changing for several reasons; most important, a growing proportion of women work, so females account for a growing percentage of workers. In addition, the percentage of workers who are nonwhite is growing, a fact that reflects several developments, including immigration patterns and ethnic differences in birth rates. The combined effect of these changes is gradually reducing the proportion of white males in the work force. In 1988, when Work Force 2000 appeared, white males accounted for 47.9 percent of all workers, and the report projected that, by 2000, this percentage would fall to 44.8 percent.

However, rather than describing the change in such easily understood terms, the report's authors chose to speak of "net additions to the work force." The report predicted the populations of workers that would enter and leave the work force (because of death, retirement and so on) between 1988 and 2000. For example, the authors estimated that 13.5 million white males would join the work force and 11.3 million would leave. The difference – 2.2 million – would be white males' "net addition to the work force." Because the numbers of female and nonwhite workers are growing faster than those of white males, white males made up a relatively small share – less than 15 percent – of the anticipated total net addition to the work force.

Rather than describing the gradual decline in white males' proportion of the work force in terms of a straightforward percentage (47.9 percent in 1988, falling to 44.8 percent in 2000), the authors of Work Force 2000 chose to use a more obscure measure (net additions to the work force). That was an unfortunate choice, because it invited confusion. In fact, it even confused the people who prepared the report. Work Force 2000 came with an executive summary for those too busy to read the entire document. The summary mangled the report's findings by claiming that "only 15 percent of the new entrants to the labor force over the next 13 years will be native white males, compared to 47 percent in that category today." That sentence was wrong for two reasons. First, it confused net additions to the labor force (expected to be roughly 15 percent white males) with all new entrants to the labor force (white males were expected to be about 32 percent of all those entering the labor force); and, second, it made a meaningless comparison between the percentage of white males among net work force entrants and white males' percentage in the existing labor force (roughly 47 percent). The statistical comparison seemed dramatic, but it was pointless.
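
The distinction the executive summary garbled is easier to see as a worked example. The white-male figures below are the ones quoted above; the totals for all workers are hypothetical placeholders, chosen only so that the resulting shares land near the roughly 15 percent and 32 percent the article reports.

```python
# "Net additions" vs. "new entrants": the two measures the executive summary confused.
# White-male figures are from the excerpt; the totals for all workers are
# hypothetical placeholders used only for illustration.

white_males_entering = 13.5   # millions projected to join the work force, 1988-2000
white_males_leaving = 11.3    # millions projected to leave (death, retirement, etc.)
white_male_net_addition = white_males_entering - white_males_leaving   # 2.2 million

total_net_additions = 15.0    # hypothetical: net growth of the whole work force, millions
total_new_entrants = 42.0     # hypothetical: everyone entering the work force, millions

print(f"White males' net addition: {white_male_net_addition:.1f} million")
print(f"Share of NET additions:    {white_male_net_addition / total_net_additions:.0%}")  # about 15%
print(f"Share of NEW entrants:     {white_males_entering / total_new_entrants:.0%}")      # about 32%
# Neither number says white males were vanishing: their share of the whole work
# force was projected to slip only from 47.9 percent to 44.8 percent.
```
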
Unfortunately, the dramatic number captured people's attention. The press fixed on the decline in white male workers as the report's major finding, and began to repeat the error. Officials at the Department of Labor tried to clarify its meaning, but the mutant statistic took on a life of its own. Politicians, labor and business leaders, and activists all warned that the workplace was about to undergo a sudden change – that white males were an endangered species.

The mangled statistic was itself remangled; for example, one official testified before Congress that by "the year 2000, nearly 65 percent of the total work force will be women," yet no one asked how or why that might occur.

It is easy to see why people repeated these claims. The notion that white males would soon become a small proportion of all workers offered support for a variety of political ideologies. Liberals saw the coming change as proof that more needed to be done to help women and minorities – who, after all, would be the workers of the future. Liberal proposals based on Work Force 2000 called for expanded job training for nontraditional (that is, nonwhite or female) workers, additional programs to educate management and workers about the need for diversity in the workplace, and so on.

In contrast, conservatives viewed the changing work force as further evidence that immigration, feminism and other developments threatened traditional social arrangements. In response to claims that white male workers were disappearing, a wide range of people found it easier to agree ("We knew it! We told you so!") than to ask critical questions about the statistical claims.

The reaction to Work Force 2000 teaches a disturbing lesson: complex statistics are prime candidates for mutation. Not that the statistics in Work Force 2000 were all that complex. But the meaning of "net additions to the work force" was not obvious, and when people tried to put it in simpler language – such as "new workers" – they mangled the concept.

The report's authors ought to have realized that most people would not grasp this relatively complicated idea. And, of course, the people who interpreted the report (beginning with the authors of the executive summary) unintentionally mangled it to produce figures with new, wildly distorted meanings. Thus, a correct but difficult-to-understand statistic became an easy-to-understand but completely wrong number.

Similar confusion characterized press coverage of a medical study supposedly showing that doctors referred blacks and women for cardiac catheterization less often than whites and men. In the study, researchers gave doctors information (e.g., descriptions of chest pain and results of stress tests) about fictional patients who were described as either black or white, female or male. The doctors were asked how they would treat these patients, and the researchers examined which kinds of patients were referred for cardiac catheterization. Interestingly, white women, black men, and white men were equally likely to receive referrals for the procedure: catheterization was recommended for 90.6 percent of the patients in each group. In contrast, only 78.8 percent of black women were referred for catheterization.

This study attracted considerable press coverage when the media summarized the results as showing that blacks and women were 40 percent less likely to receive cardiac testing. How could the press produce this mangled statistic from these data?

The answer lies in the researchers' decision to report their results in terms of odds ratios. Producing this statistic involved a two-stage calculation. First, the researchers calculated the odds of people in different groups being referred for catheterization. Remember that 90.6 percent of white women received referrals; this means that, among 1,000 white women, 906 would get referrals, and 94 would not; therefore the odds of a white woman being referred were 9.6 to 1 (906 referrals/94 non-referrals = 9.6).
That is, for every white woman not referred, 9.6 would be referred. Black men and white men had exactly the same percentages of being referred, and therefore the same odds of referral. But the odds of black women being referred were only 3.7 to 1. Among 1,000 black women, there would be 788 referrals and 212 non-referrals. Thus for every black woman not referred, 3.7 would be referred.

Notice that when white men and white women are grouped together, both sexes had the same rates of referral and the same aggregate odds – 9.6 to 1. However, when black men (who had the same rate of referral as whites) were combined with black women (who had a lower rate of referral), the overall odds for all blacks were lower – 5.5 to 1. Similarly, the aggregate odds for all men (black and white) were higher than the aggregate odds for all women.

So far, we have been talking about the odds of being referred for cardiac catheterization. But the researchers reported the odds ratios. This slightly more complicated statistic involves a second stage of calculation. For example, the ratio of the odds of men being referred (9.6) to the odds of women being referred (5.5) is 1 to 0.6 (5.5/9.6 = 0.6). Similarly, the ratio of the odds of whites being referred (9.6) to the odds of blacks being referred (5.5) is 1 to 0.6. Odds ratios, like net additions to the work force, are statistics that lack intuitive meaning. Most people don't think in terms of odds ratios, nor do they understand what the term means.

Certainly, this was true for the reporters who announced that blacks and women were only 60 percent as likely to receive heart testing as whites and men were. They misunderstood the odds ratio (0.6) to mean the relative likelihood of receiving the procedure. The correct comparison would have involved calculating the risk ratios – that is, figuring the relative chance, or risk, of being referred for testing.

If 90.6 percent of whites and 84.7 percent of blacks are referred, then blacks are 93 percent as likely (84.7/90.6 = .93) to get referrals. That is, blacks were 93 percent as likely to be referred as whites, and women 93 percent as likely as men. Blacks and women, then, were not 40 percent less likely to receive referrals; they were 7 percent less likely to be referred.
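
The whole muddle can be reproduced in a few lines. The sketch below uses only the referral rates quoted above, and assumes equal-sized patient groups (which is what the article's own 84.7 percent figure implies); it shows how the same data yield an odds ratio of roughly 0.6 but a risk ratio of about 0.93.

```python
# Odds ratios vs. risk ratios for the cardiac-catheterization study,
# using the referral rates quoted in the excerpt (and assuming equal-sized groups).

def odds(p):
    """Odds of referral: referred patients per one patient not referred."""
    return p / (1 - p)

rate_white_women = rate_white_men = rate_black_men = 0.906
rate_black_women = 0.788

rate_whites = (rate_white_men + rate_white_women) / 2    # 0.906
rate_blacks = (rate_black_men + rate_black_women) / 2    # 0.847

odds_ratio = odds(rate_blacks) / odds(rate_whites)   # about 0.57, reported as "0.6"
risk_ratio = rate_blacks / rate_whites               # about 0.93

print(f"Odds of referral, whites: {odds(rate_whites):.1f} to 1")   # about 9.6 to 1
print(f"Odds of referral, blacks: {odds(rate_blacks):.1f} to 1")   # about 5.5 to 1
print(f"Odds ratio: {odds_ratio:.2f} -- what the press misread as '40 percent less likely'")
print(f"Risk ratio: {risk_ratio:.2f} -- blacks were actually 93 percent as likely to be referred")
```
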
As in the case of Work Force 2000, the misinterpretation began by mangling a poorly understood statistic. Reporters tried to translate the unfamiliar notion of odds ratios into more familiar statements of probability, and their resulting claims were simply wrong.

Two other aspects of this case deserve mention. First, the researchers' decision to compare grouped data for men and women (and blacks with whites) distorted their findings. Remember that white men, black men, and white women had exactly the same rates of referrals. The use of aggregate comparisons obscured the real pattern (that only black women were referred at lower rates). Rather than suggesting that all women or all blacks were less likely to receive referrals, the researchers should have emphasized that black women received different recommendations from all other patients.

Second, we might wonder about the significance of receiving those referrals. The press reports simply assumed that referrals for cardiac catheterization (an invasive medical procedure that carries its own risks) were always appropriate, in effect implying that every patient should have received a referral. But this may be wrong. Perhaps the study showed that physicians were too quick to refer men and white women for a risky procedure.
But the press reports never considered this possibility. The press also tended to forget that the doctors in this study were examining fictitious files, not treating real patients.

The ease with which somewhat complex statistics can produce confusion is troubling because we live in a world in which complex numbers are becoming more common. Simple statistical ideas – fractions, percentages, rates – are reasonably well understood by many people. But many social problems involve complex chains of cause and effect that can be understood only through complicated models. Thus, current understandings for why some people develop heart disease or cancer assume that heredity plays a part, that various behaviors (diet, exercise, smoking and so on) play roles, and that the environment has an influence. Sorting out the interconnected causes of these problems requires relatively complicated statistical ideas – net additions, odds ratios and the like. If we have an imperfect understanding of these ideas, and if the reporters and other people who relay the statistics to us share our confusion, the chances are good that we'll soon be hearing – and perhaps making decisions on the basis of – mutated statistics.

Compound Errors: Creating Chains of Bad Statistics

Bad statistics often take on a life of their own. Rarely criticized, they gain widespread acceptance, and they are repeated over and over. Each repetition makes the number seem more credible. And, of course, bad statistics can become worse through mutation. But that's not the end of the process. Bad statistics can have additional ramifications when they become the basis for calculating still more statistics.

We can think about this process as compounding errors into a chain of bad statistics; one questionable number becomes the basis for a second statistic that is, in turn, flawed. The process can continue as the second bad number leads to a third, and so on.

Consider some of the uses to which the Kinsey Reports have been put. During the 1930s and 1940s, the biologist Alfred Kinsey and his colleagues conducted lengthy interviews with several thousand people about their sexual experiences.
These interviews became the basis for two books, Sexual Behavior in the Human Male (1948) and Sexual Behavior in the Human Female (1953), that are popularly known as the Kinsey Reports.

The books challenged the polite fiction that most sex was confined to marriage; they revealed that many people had experience with a wide range of sexual behaviors, such as masturbation and premarital sex. However, the Kinsey data could not provide accurate estimates for the incidence of different sexual behaviors. While the interviews constituted a large sample, that sample was not representative, let alone random. It contained far more college-educated people than there were in the general population and, in an effort to explore a broad range of sexual experiences, Kinsey deliberately arranged interviews with a substantial number of active homosexuals, as well as many people who had been imprisoned.

Nonetheless, commentators sometimes treat the Kinsey findings as though they offer an authoritative, representative portrait of the American population. For example, gay and lesbian activists sometimes argue that one-tenth of the population is homosexual, and they refer to the Kinsey Reports to support that claim.

Rather than define heterosexual and homosexual as a simple dichotomy, the Kinsey Reports described a continuum that ranged from individuals who had never had a homosexual experience to those who had some incidental homosexual experiences to those whose sexual experiences had been exclusively homosexual. Still, the male report estimated that "10 percent of the males are more or less exclusively homosexual... for at least three years between the ages of 16 and 55." Later surveys, based on more-representative samples, have concluded that 3 to 6 percent of males (and a lower percentage of females) have had significant homosexual experience, and that the incidence of homosexuality among adults is lower – between 1 and 3 percent. However, gay and lesbian activists often dispute these lower estimates; they prefer the one-in-ten figure because it suggests that homosexuals are a substantial minority – roughly equal in number to African-Americans in the United States – who form a group too large to be ignored. Thus, the 10 percent figure lives on, and it is often used in calculating other new statistics about gays and lesbians.
Consider, for example, claims that one-third of teenage suicides – or roughly 1,500 deaths a year – involve gay or lesbian adolescents. Gay activists invoke this statistic to portray the hardships gay and lesbian youth confront. It suggests that stigma and social isolation are severe enough to drive many adolescents to kill themselves.

But how could anyone hope to measure gay teenage suicides accurately? Many gays and lesbians try to conceal their sexual orientation, and some teenagers might have been driven to suicide because keeping that secret was becoming a burden. But, given this secrecy, how could anyone know which teenagers who commit suicide are gay or lesbian?

So how did advocates arrive at the statistic? They constructed a chain of bad statistics. They began with the familiar, Kinsey-based claim that one-tenth of the population – including, presumably, one-tenth of teenagers – is homosexual. Roughly 4,500 teenage deaths are attributed to suicide each year; on average, then, 10 percent of those – 450 suicides – should involve gay or lesbian teens. (Note that we have already incorporated our first dubious statistic: that 10 percent of the population is gay or lesbian.)

Next, advocates drew upon various studies that suggested that homosexuals attempted suicide at two to three times the rate of heterosexuals. Note that this figure presumes knowledge about the rates of an often-secretive behavior in two populations – one itself often hidden. Multiplying 10 percent by three led to an estimate that gays and lesbians accounted for 30 percent of suicides – and this figure was in turn rounded up to one-third. Thus, one-third of 4,500 teenage suicides – 1,500 deaths – involve gay or lesbian youths.

Notice how the final figure depends on the proponents' assumptions. If the proportion of homosexuals among all teenagers is estimated at 3 percent, or 6 percent, the number of gay teenage suicides falls. If the rate at which homosexual teens commit suicides is only twice that of heterosexuals, the number falls.
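
The chain of assumptions, and how sensitive the result is to each link, can be laid out explicitly. The sketch below is not from the book; it strings together the figures quoted above and then varies the two dubious inputs.

```python
# The compound-error chain behind "1,500 gay teenage suicides a year,"
# built from the figures quoted in the excerpt.

def estimated_gay_teen_suicides(total_teen_suicides, gay_share, relative_suicide_rate):
    """Chain the assumptions together the way the advocates did."""
    return total_teen_suicides * gay_share * relative_suicide_rate

teen_suicides_per_year = 4_500

# The advocates' version: Kinsey's 10 percent, times a suicide rate assumed
# to be three times that of heterosexuals.
advocates = estimated_gay_teen_suicides(teen_suicides_per_year, 0.10, 3)
print(f"Advocates' chain: about {advocates:,.0f} deaths a year")
# 1,350; rounding the 30 percent share up to one-third yields the 1,500 figure.

# Change either dubious link and the result shrinks sharply.
for gay_share in (0.10, 0.06, 0.03):
    for relative_rate in (3, 2):
        n = estimated_gay_teen_suicides(teen_suicides_per_year, gay_share, relative_rate)
        print(f"  share={gay_share:.0%}, relative rate={relative_rate}x -> about {n:,.0f}")
```
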
This example offers two important lessons. The first is a reminder that bad statistics can live on. Most social scientists consider Kinsey's 10 percent estimate for homosexuality too high. Yet some gay and lesbian activists continue to cite it precisely because it is the largest available number. In turn, 10 percent often figures into other calculations – not just about gay teenage suicides, but estimates of the number of gay voters, the size of the gay population at risk of AIDS, and so on.

The second lesson is perhaps harder to learn. Any claim about the number of gay teenage suicides should set off alarm bells. Given the difficulties in learning which deaths are suicides and which teenagers are gay, it obviously must be hard to learn the number of gay teenage suicides. It is not unreasonable to ask how the advocates arrived at that number, and which assumptions lay behind their calculations. But, of course, such examinations are the exception, not the rule.

Compound errors can begin with any of the standard sorts of bad statistics – a guess, a poor sample, an inadvertent transformation, perhaps confusion over the meaning of a complex statistic. People inevitably want to put statistics to use, to explore their implications. An estimate for the number of homeless people can help us predict the costs of social services for the homeless, just as an estimate of the proportion of the population that is homosexual lets us predict the number of gay and lesbian teenagers who may attempt suicide. But when the original numbers are bad, compound errors can result. Assessing such statistics requires another level of critical thinking; one must ask both how advocates produced the statistic at hand (1,500 gay teenage suicides), and whether they based their calculations on earlier numbers that are themselves questionable (e.g., 10 percent of the population is homosexual).

the roots of mutant statistics

Detecting mutant statistics often requires tracing the history of a number, and learning how its meaning or use changes over time. Mutant statistics don't always start out bad. Although a bad statistic often provides an excellent basis for mutation, even good statistics can be mangled into bad mutations.

Mutant statistics are not necessarily evidence of dishonesty. Many advocates are perfectly sincere, yet innumerate. Their conviction that the problem is serious and that they need to make that seriousness clear to others, coupled with their misunderstanding of what the numbers actually mean, provides the foundation for many of the sorts of errors detailed here.

However, there is also deliberate manipulation – conscious attempts to turn statistical information to particular uses. Data can be presented in ways that convey different impressions, and it is not uncommon for advocates to selectively choose which numbers they report, and to pick the words they use to describe the figures with care. That is, the numbers are selected because they promise to persuade, to support the advocates' positions.

Whether they are sincere or cynical, advocates prefer dramatic statistics, numbers that make the problem seem as serious, the need as urgent, as possible. If, through transformation or confusion or compound error, they devise a mutant statistic that happens to be more dramatic than some original figure, there is a good chance that the mutation will spread. Drama ensures repetition, while innumeracy discourages critical thinking.
