
Student's Guide

Volume 2: Quantitative Analysis and Statistics Classes

Prepared by

S. Clayton Palmer
Instructor of Economics
Weber State University

J. Lon Carlson
Associate Professor
Department of Economics
Illinois State University

www.HarperAcademic.com

Preface
Introduction
Welcome to the Student's Guide to SuperFreakonomics! SuperFreakonomics drives home the point
that there is no argument without a statistical analysis of the data. The conclusions in the book
have been drawn from the data using the statistical methods you are being introduced to in your
classes.
Many students view statistical methods and statistical inference as difficult, if not impossible, to
master. However, this perception is often based on the notion that statistics is not applicable to
anything that real people deal with. As an old college song says, "We're working our way through
college, to get a lot of knowledge that we'll probably never, ever, use again."1 SuperFreakonomics
dispels this myth. What we learn from Levitt and Dubner is that when you understand the data, you
understand the world!
As you read SuperFreakonomics, you will see the real-world applications of statistical concepts in
specific situations. But there are no explanations of the statistical methods used. The purpose of
this guide is to help you understand the analyses presented in SuperFreakonomics by providing a
bridge between the material covered in an introductory course in statistics and the engaging topics covered in what we consider to be one of the most fascinating books in economics.
There is one other task we attempt to accomplish in this guide. The authors of SuperFreakonomics
show you what it's like to think like a statistician. The material we present here is intended to make
the job even easier for you.

Organization of the Student's Guide


We organized the material in this guide to help you identify the key points in each chapter and to
ensure that you have a firm grasp of the key concepts presented in the book. The first section of
each chapter in this guide consists of an overview that highlights the major topics and points presented in the book. The overview is designed to alert you to the major topics and is not intended to
serve, in any way, as a substitute for the material in the text.
The second section of each chapter highlights key statistical concepts and methods that are
addressed in the corresponding chapter of SuperFreakonomics. The purpose of this section is to
alert you to the major factors that affect the relationships being illustrated. These concepts and
methods are not organized the same way your statistics text is organized. Simple concepts and
more complex methods show up in the same chapters of SuperFreakonomics. You will learn about
simple statistical techniques and more complex methods in all of the chapters. We've labeled
them according to the subjects in your statistics textbook so that you can slide more easily from
SuperFreakonomics to your textbook and vice versa.

1. "We're Working Our Way Through College." Music and lyrics by Johnny Mercer and Richard Whiting. From the movie Varsity Show (Warner Bros., 1937).
The third section of each chapter consists of a list of what we have termed "core competencies."
How well you are able to respond to each of the questions listed in this section will be a strong indicator of the extent to which you understand the material presented in the book.

Using the Student's Guide


When using the student guide, remember that the overview and discussions of terms are not substitutes for reading SuperFreakonomics. Instead they are designed to flag key topics and alert you to
specific items you may have missed. When completing the core competency questions, we recommend that you avoid using your book or notes to answer the questions on your initial run through.
Instead, use the questions as a way to flag topics you have not yet mastered. When you cannot
answer a specific question, or you answer it incorrectly, you should take that as a clue: go back and
devote more time to the topic.

One More Helpful Thought


This guide was prepared for students taking a class in quantitative analysis or statistics. Another
guide has been prepared for students enrolled in an economics class. It contains descriptions of
how the topics of SuperFreakonomics relate to standard economic concepts and provides a bridge
between this book and a traditionally organized economics text. Depending on the class you are
taking and the way your instructor has organized the curriculum, you may find the economics-oriented student guide helpful.

Chapter 1
How Is a Street Prostitute Like a Department-Store Santa?
Summary
Levitt and Dubner begin Chapter 1 by describing some of the ways women have been abused and
discriminated against over time. Then, they point out that although conditions in certain countries
have improved dramatically over the last few decades, women still suffer the effects of discrimination. This is especially true in the labor market, where a significant wage differential between men
and women continues to persist despite increased educational opportunities and legislative initiatives such as Title IX. This leads to consideration of the labor market in which women clearly dominate the supply side: the market for prostitution.

We know that the topic of prostitution may make some readers uncomfortable. The discomfort
factor notwithstanding, however, this topic demonstrates how economic analysis can provide an
objective assessment of at least some of the benefits and costs of a profession that provokes a vast
range of responses from outside observers. Remember to focus on the data!
The authors begin their analysis of the market for prostitution by describing what the market was
like at the turn of the century in Chicago, and how the subsequent criminalization of this activity
affected wages and working conditions over time. Information on relative wages in absolute and
current dollars suggests why so many women would consider working as a prostitute. Focusing
on the upscale Everleigh Club, the authors also show how manipulation of supply and demand
through product differentiation (supplying a service that entails greater marginal costs and targeting
that segment of consumers with a greater willingness to pay) could result in a price much higher
than the broader market equilibrium. They then provide useful insights regarding how society's
objectives might be best met when it comes to enforcement of certain laws, i.e., it is more effective
to focus on demand than supply if certain activities are going to be effectively curtailed.
The next part of the chapter focuses on information gleaned from data collection efforts by Sudhir
Venkatesh that shed considerable light on the current conditions in the market for prostitution in
Chicago, ranging from wages and prices for various services rendered to the effects of relying on
marketers (e.g., pimps) to sell one's products or services. At this point, the authors also demonstrate
that what may be true in one market (pimps tend to improve market outcomes for prostitutes)
may not hold elsewhere, e.g., procuring the services of a realtor does not necessarily leave the seller
of a house better off. This portion of the chapter concludes with an examination of the role the
police play in controlling prostitution and how the solutions they devise can be at odds with the
objectives of policymakers, i.e., the principal-agent problem.
Attention then turns to consideration of how the increase in the range of opportunities for educated women to work outside of teaching has simultaneously increased their average pay and disadvantaged school children. The latter is simply a result of a decrease in the average level of ability
of people entering the labor market for teachers. The authors then consider several possible explanations for the wage gap between men and women that continues to exist even after the increase
in the availability of higher paying jobs for women. The final section tells the story of a woman who
decided to become a prostitute on her own terms, and how she used sound marketing strategies
and economic principles to achieve a considerable level of financial success.
Basic Statistical Concepts
1. Descriptive Statistics. This chapter provides numerous descriptors of data sets. Some examples
are: the wages earned by street prostitutes; the wages earned among Americans generally,
accounting for differences in gender; and the prices charged for various sexual acts performed by
street prostitutes. You should use this chapter as a guide to the various data types, how surveys are designed, how data are gathered, and how the data are analyzed and presented.
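To see what these descriptors look like in practice, here is a minimal Python sketch. The hourly wage figures are invented for illustration; they are not the book's data:

```python
import statistics

# Hypothetical hourly wages (in dollars) for a small sample of street
# prostitutes. These numbers are invented for illustration only.
wages = [25, 27, 30, 22, 35, 28, 27, 31, 24, 27]

mean_wage = statistics.mean(wages)      # arithmetic average
median_wage = statistics.median(wages)  # middle value of the sorted data
mode_wage = statistics.mode(wages)      # most frequently observed value
stdev_wage = statistics.stdev(wages)    # sample standard deviation
wage_range = max(wages) - min(wages)    # spread from lowest to highest

print(f"mean={mean_wage}, median={median_wage}, mode={mode_wage}")
print(f"sample stdev={stdev_wage:.2f}, range={wage_range}")
```

Your statistics text covers each of these measures in its chapter on descriptive statistics; the point here is simply that a handful of summary numbers can characterize an entire data set.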

Types of Data. Most of the examples cited above are cross-sectional data. This is in contrast to
time-series data. The description of how the mean wage of a prostitute has changed over time is an
example of time-series data. Make certain to note the difference between these two types of data.
In addition, pay attention to what information is qualitative. Among the examples of qualitative
data, you will find nominal data: the street names of prostitutes and the locations where sexual
services are rendered are examples.
Data Gathering Techniques. How were the data on prostitution gathered? Sudhir Venkatesh collected data from prostitutes regarding sexual services rendered as well as data on a number of
variables associated with the sex trade in South Chicago. Recognizing that traditional methods of
gathering data regarding prostitution have not produced reliable results, he opted for a non-conventional method. He used trackers. These trackers were people who stood on street corners or sat
in brothels with the prostitutes themselves. The trackers collected data from the prostitutes as soon
as the customer was gone. The prostitutes were paid to provide the data to the tracker. Most of the
trackers were themselves former prostitutes.
Note that the collection of these data began with designing a survey: a careful consideration of
what questions would be asked and how they would be asked. The next task was to identify the
desired sample size and how the participants of the survey would be chosen. Among other considerations, Venkatesh chose two main areas. These two areas differed by one variable: the use of
pimps. This way, even though Venkatesh was engaged in an observational study, he had the rough
equivalent of a natural experiment in which one data set differed from the other on the basis of
this one variable.
Data Gathering Complications. Gathering data related to a sensitive and morally provocative market
like the sex trade can be difficult. It's a tricky business, and the data gathered won't always be reliable. Notice that Venkatesh decided that, in order to elicit cooperation and honest responses from
the prostitutes surveyed, a properly credentialed tracker would be a former prostitute. Moreover, it
was often necessary to pay for the prostitutes' cooperation in answering the survey questions. The
data-gathering difficulties encountered by Venkatesh indicate that analyses of an illicit trade should
be viewed with suspicion. The gathering of credible and reliable data will be difficult, expensive, and
time consuming.
Experimental vs. Observational Data. The data set on the South Chicago sex trade collected by
Venkatesh is an example of observational data gathering. An experimental approach to gathering
data involves carefully controlling the variables of interest. In observational data gathering, there
is no attempt to control the variables. A survey is an oft-used method of gathering observational or
non-experimental data.
Sampling Techniques: Simple Random Sampling vs. Convenience Sampling. The data on the South
Chicago sex trade collected by Venkatesh came from 160 prostitutes in three South Chicago locations over a two-year period. He collected data on 13 variables from 2,200 sexual transactions.

Techniques regarding simple random samples described in statistics textbooks state that, within
any population of N elements, any possible sample of size n has an equal probability of being
selected. There are a variety of ways in which to ensure a random sample. One is to assign a number
to each member of a population and then generate a set of n random numbers and include only
those who are selected by the random number generator in the sample. Given the illicit nature of
prostitution, it would be extremely difficult, if not impossible, to determine the number of prostitutes in South Chicago, i.e., the population size, let alone locate and survey only those prostitutes
who are selected by a random number generator.
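The textbook procedure just described (assign each member of the population a number, then draw n numbers at random) can be sketched in a few lines of Python; the population size and sample size below are arbitrary:

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

# Suppose we could enumerate a population of N = 500 subjects
# and assign each one a number from 1 to 500.
population = list(range(1, 501))

# random.sample draws n elements without replacement, and every
# possible sample of size n is equally likely: a simple random sample.
n = 25
sample = random.sample(population, n)

print(sorted(sample))
```

The practical obstacle described above lies in the very first step: for street prostitution there is no way to enumerate the population, so the procedure cannot even get started.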
You might say that the data gathered are an example of convenience sampling. Your statistics text
describes the different varieties of sampling. Convenience sampling is a nonprobability sampling
technique. It is often said that, with convenience sampling, it is impossible to evaluate the goodness of the sample in terms of representing a population. Therefore, the calculation of sample
statistics, e.g., the sample mean and standard deviation, is a useless exercise. The other side of this
argument is that information from a convenience sample can be used to represent population
parameters so long as those who are not sampled do not differ in any systematic way from those
who are the subject of the sample. Therefore, the relevant question becomes: Are these 160 prostitutes, and the services they provide, systematically different from the population of prostitutes in
South Chicago? If not, the information gathered can be used just as the information gathered from
a simple random sample or a census of prostitutes in two main areas of South Chicago.
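A small simulation makes the contrast concrete. In the hypothetical population below, a hard-to-reach subgroup earns systematically more, so a convenience sample that never reaches that subgroup misses the population mean, while a simple random sample of the same size lands close to it. All of the numbers are invented:

```python
import random
import statistics

random.seed(0)

# Hypothetical population of 5,000 hourly wages: most workers earn
# around $25, but a hard-to-reach subgroup earns around $45.
easy_to_reach = [random.gauss(25, 4) for _ in range(4000)]
hard_to_reach = [random.gauss(45, 4) for _ in range(1000)]
population = easy_to_reach + hard_to_reach
pop_mean = statistics.mean(population)

# Simple random sample: every member has the same chance of selection.
srs = random.sample(population, 200)

# Convenience sample: only the easy-to-reach subgroup is ever surveyed,
# so the sample differs from the population in a systematic way.
convenience = random.sample(easy_to_reach, 200)

print(f"population mean:      {pop_mean:.1f}")
print(f"simple random sample: {statistics.mean(srs):.1f}")
print(f"convenience sample:   {statistics.mean(convenience):.1f}")
```

If the two subgroups had the same wage distribution, the convenience sample would do just as well; that is exactly the "no systematic difference" condition described above.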
2. Normative vs. Positive Analysis. Positive analysis relates to what is. "Prostitutes who ply their
trade under the management of a pimp typically earn higher hourly wages" is an example of
a positive statement and can be the target of positive statistical analysis. Normative analysis
relates to what ought to be. It is not statistical analysis.
If your statistics text includes any information at all on the difference between positive and normative analysis, it's probably in the first chapter. Yet, the temptation, even for a statistician, to engage
in moralizing is strong. Statistical analysis is, at its heart, heartless. It does not account for or consider the moral aspects of the subject matter or consider how life should be. It merely analyzes how
life is. This chapter, about prostitution, draws conclusions from the data without judgment. This is
how statisticians must practice their craft.
3. Hypothesis Testing. In this chapter, Levitt and Dubner compare the average wage of prostitutes
who use a pimp to those who market and manage themselves. In addition to a comparison of
wages, the analysis includes a comparison of the number of incidents of assault by a client and
the number of times a prostitute was forced to give a freebie to a gang member.
These are cases in which a statistician gathers data on important variables and compares the mean
of one sample against the mean of another. He or she will attempt to infer whether the differences in the point estimates for these samples can be used to infer a difference in the respective populations. In other words, is the difference in mean hourly wage noted from the sample data
evidence of the existence of "pimpact," or is it merely a coincidence associated with random sampling? The statistician will make this decision based on statistical significance. This is accomplished
through hypothesis testing, which begins by stating a null and alternative hypothesis. In the prostitution case, these hypothesis statements are as follows:
H0: μ1 = μ2
Ha: μ1 ≠ μ2

where:

μ1 is the population mean hourly wage of prostitutes without pimp managers

μ2 is the population mean hourly wage of prostitutes with pimp managers

The next steps are in your textbook in the chapter on hypothesis testing: 1) specify the level of significance, 2) compute the value of the test statistic, 3) calculate the rejection rule, and 4) use the
value of the test statistic and the rejection rule to determine whether to reject H0.
Hypothesis testing requires some practice. There are simple examples in every statistics text.
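As a worked example, the sketch below runs a two-sample test on invented wage data (the figures are ours, chosen only to keep the arithmetic visible; they are not Venkatesh's numbers). It computes Welch's t statistic, which does not assume the two populations have equal variances:

```python
import math
import statistics

# Hypothetical hourly wages for two independent samples.
with_pimp = [32, 35, 30, 38, 33, 36, 31, 34, 37, 35]
without_pimp = [26, 29, 24, 31, 27, 25, 28, 30, 26, 27]

def welch_t(sample1, sample2):
    """Two-sample t statistic allowing unequal variances (Welch)."""
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    n1, n2 = len(sample1), len(sample2)
    standard_error = math.sqrt(v1 / n1 + v2 / n2)
    return (m1 - m2) / standard_error

t = welch_t(with_pimp, without_pimp)
# At the 5% significance level with roughly 18 degrees of freedom,
# the two-tailed critical value is about 2.10.
print(f"t = {t:.2f}, reject H0: {abs(t) > 2.10}")
```

Here the t statistic far exceeds the critical value, so H0 (equal population mean wages) would be rejected. Your textbook's chapter on hypothesis testing walks through each of the four steps in detail.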
More Hypothesis Testing. In addition to the question of prostitutes and pimps, the value of real
estate agents is examined. Two samples are examined. The first sample consists of houses made
available for sale by the homeowner using FSBOMadison.com. The second sample consists of
homes sold using a real estate agent. The mean sales prices of the houses in the two samples are not
statistically different. Testing the null hypothesis, the conclusion can be reached that the mean sales
price of a house in one sample doesn't differ from the mean sales price in the other sample. For the
sake of practice, pull out a piece of paper and write the null and alternative hypotheses for yourself.
4. Sampling: Mendacity among Respondents. Another problem related to conducting surveys is the
issue of the truthfulness of the answers provided by survey participants. It is now clearly understood that, in some cases, survey respondents will exaggerate claims, distort information, and
outright lie to surveyors. This is particularly the case for surveys regarding sensitive subjects
such as drug abuse and sex. If there is some reason to believe that individuals giving responses
to a survey are not entirely truthful, the survey results fall under a cloud of suspicion and provide less-than-reliable results. This chapter of SuperFreakonomics describes a Mexican welfare
program entitled Oportunidades. Here, economists studying the clients of this program found
that the value of personal assets was routinely underreported.
Core Competencies
If you have read and carefully studied this chapter, you should be able to complete the following
tasks. Note: these competencies do not begin with simpler statistical concepts and work toward more
difficult ones. Instead, they follow the organization of the book.
1. Describe how men and women have compared over time with respect to such factors as life
expectancy, discrimination, and abuse. How has the comparison between the two changed over
the past 100 years?

2. Referring to the study by Bertrand, Goldin, and Katz, list and explain the three main variables
they identify as the primary causes of the observed wage gap between men and women.
3. How would you use multivariate regression analysis to determine whether the observed wage gap
between men and women is a result of sexual discrimination and not of other factors that may
differentiate men and women in the labor market?
4. What are the steps in completing a hypothesis test, testing whether there is a statistically significant difference between two sample means, such as the mean wage of women as compared to men?
5. Compare the average wages earned by women in legal professions, e.g., seamstress, cleaning
woman, to those earned by prostitutes in Chicago at the turn of the century.
6. Describe how the revenues earned by the proprietors and the employees of the Everleigh Club
compared to the average for prostitutes in Chicago. What factors accounted for this difference?
7. A recognized problem with surveys and questionnaires that gather data about sensitive or morally ambiguous subjects is that the responses by those being questioned are often unreliable.
Explain why the method Sudhir Venkatesh used to collect data on prostitution in Chicago is
superior to more traditional survey methods, particularly with respect to reliability.
8. Using the example of the data collected on prostitution in Chicago, explain the difference
between time-series and cross-sectional data.
9. Explain the difference between normative and positive analysis. How is this distinction useful in
studying the sex trade?
10. How does the illicit and illegal nature of prostitution make the collection of data for statistical
analysis difficult, expensive and potentially unreliable?
11. Explain the use of trackers by Sudhir Venkatesh in gathering data about the sex trade in
Chicago. What do you think would be useful qualifications for the job of tracker?
12. Using the table presented in SuperFreakonomics on the menu of sex acts and the mean price of
each, what graphical device would be useful to visually illustrate this information? Could you
create one that would array the information in an informative and visually compelling way?
13. Using the data collected on the sex trade in Chicago, explain the difference between an experimental study and an observational study. Can you think of a way to gather basic data on the sex
trade using an experimental approach?
14. Explain the difference between simple random sampling and convenience sampling. Which was
used by Sudhir Venkatesh to collect data on the sex trade?

15. How are data from convenience sampling to be used? Under what conditions can standard methods of statistical inference be used with convenience sampling?
16. Explain what Levitt and Dubner mean by "pimpact." In addition, describe a method for statistically
measuring pimpact.
17. Explain what Levitt and Dubner mean by "rimpact." Describe a method for statistically measuring
rimpact.
18. How might one use multiple regression analysis to explain what variables determine the price
of a house? What independent variables do you think should be included in the regression
analysis?
19. Sudhir Venkatesh concludes that the mean hourly wage for a street prostitute who uses a pimp
as a manager is higher than that of one who doesn't. What statistical method can be used to determine
if this difference is statistically meaningful?
20. What information is needed to complete a hypothesis test of two sample means, such as the
mean hourly wage of prostitutes with a pimp and those without?
21. Answer the question posed by the chapter's title, i.e., how is a street prostitute like a department-store Santa?

Chapter 2
Why Should Suicide Bombers Buy Life Insurance?
Summary
This chapter begins with a description of how certain outcomes a person experiences, e.g., susceptibility to certain illnesses and diseases, or the propensity for success in academics and the sports world, can
be tied to characteristics of one's parents, e.g., their religion, name, or success on the field of play, as
well as the individual's date of conception. The authors then use the fact that what makes a particular
boy 800 times more likely to play in the major leagues than a randomly selected one (the boy's
father played in the majors) to segue to the question: Who produces terrorists?
The next part of the chapter focuses on various characteristics of a terrorist, e.g., the family background of the typical terrorist; the distinction between a terrorist and a revolutionary; how the
act of terrorism works to achieve the terrorists goals, including the public goods aspect of terrorism; the direct and indirect costs of terrorism; and the unintended benefits a terrorist threat can
yield, e.g., a reduction in other types of crime due to increased policing activities and reduced illness because people don't fly as much. This discussion once again highlights how the conventional
wisdom (terrorists are poor and uneducated) can be at odds with reality (terrorists tend to
come from well-educated, higher-income families. The discussion of terrorism in turn serves as a
springboard to examine the importance of information. The authors focus on how the September
11 attacks in particular, and terrorist acts in general, could easily overwhelm most emergency rooms
around the country. The reasons include the design of most emergency rooms and information
constraints.
Levitt and Dubner's discussion of Craig Feied's efforts to transform emergency care at a Washington-area hospital provides an interesting snapshot of the evolution of hospital emergency care over
the past 50 years and an excellent illustration of the value of information. Adequate information
about a patient's background is critical to successfully treating that patient in an emergency situation. The problem for Feied and his colleague was how to make that information readily available.
The answer was object-oriented programming, and the result was Azyxxi, which Microsoft subsequently purchased from its creators and renamed Amalga. The adoption of Amalga in turn has
resulted in substantially improved medical care for patients. It also has produced a massive amount
of data which can be used to, among other things, evaluate the relative effectiveness of doctors.
In the next part of the chapter, the authors emphasize the many pitfalls that can be encountered
in attempting to answer a question such as: Who are the best and worst doctors in the E.R.? One
of the most pressing problems is the lack of correlation between the quality of health care and the
death rate of a particular doctor's patients. After identifying the type of data that are most likely to
reflect a doctor's relative skill, the authors note that, in fact, there is relatively little variation among
E.R. doctors with respect to skill. That being said, they also point out that some doctors, e.g., females
from top-rated medical schools, appear to perform marginally better than their peers. Factors that
are actually more important from the patient's perspective include his or her specific ailment, gender, income level, and ability to avoid going to the hospital if at all possible. This leads to a brief
discussion of some of the factors that are correlated with longevity, including professional success,
religious behavior, and inheritance taxes. This is followed by an examination of the relative ineffectiveness of chemotherapy as a treatment for many forms of cancer and a brief exploration of
possible explanations for why chemotherapy nonetheless continues to be used so heavily.
The final section of the chapter explains the chapter's title by examining those factors that tend to
distinguish a terrorist from the broader population. For example, the typical terrorist owns a mobile
phone, is a student, and rents his home. Equally important, the typical terrorist does not have a
savings account, does not withdraw money from an ATM on Friday afternoon, and does not buy
life insurance. Based on these characteristics, it would be in the terrorist's interest to purchase life
insurance to throw the authorities off of his track. While these characteristics substantially reduce
the pool of potential terrorists, there are an unacceptably large number of people who, based on
this information, would be falsely accused of being a terrorist. The authors then explain that the
relative intensity of an unspecified banking activity (which cannot be described for security reasons) substantially reduces the pool of suspected terrorists and may greatly enhance our ability to
thwart future attacks.


Basic Statistical Concepts


1. Descriptive Statistics. As with the previous chapter, this chapter is replete with examples of
descriptive statistics. Honestly, it's a statistical feast. You can use this chapter to see illustrations
of data presentations, data analysis, and data-gathering techniques. You can see from the data
on terrorists and from the data regarding patients and ER doctors that there is significant information in producing simple averages, ranges, relative frequencies, and in creating data tables.
Survey Design. The preparation of a survey, the choice of participants, and the development of a
survey instrument are to be approached with caution. As a student of statistics, the example of
Craig Feied and the Washington Hospital Center should serve as a case study in the problems
encountered in designing a survey. While the WHC possessed a significant data set, the ER suffered
from "datapenia" (a word invented by Feied). Doctors were spending significant amounts of time
managing information.
The survey questions, given to doctors by medical assistants, facilitated the development of a medical information system that allowed doctors to spend more time treating patients. The information
system, named Azyxxi, began with an astutely prepared survey.
2. Correlation versus Causation. Many principles texts like to address certain fallacies that must be
avoided when conducting scientific inquiry. One is the problem of confusing correlation with
causation. For example, the number of people who drown and ice cream sales are positively
correlated: both tend to increase in the summer. However, we should not then conclude that the
increase in ice cream sales causes an increase in drownings (or vice versa).
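A quick simulation shows how a hidden third variable manufactures this kind of correlation. In the sketch below (all numbers invented), daily temperature drives both ice cream sales and drownings; the two series end up strongly correlated even though neither causes the other:

```python
import math
import random
import statistics

random.seed(1)

# Hidden variable: daily summer temperature (degrees Celsius).
temps = [random.uniform(10, 35) for _ in range(200)]

# Both series depend on temperature plus independent noise;
# neither has any causal effect on the other.
ice_cream = [3.0 * t + random.gauss(0, 5) for t in temps]
drownings = [0.2 * t + random.gauss(0, 1) for t in temps]

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r(ice_cream, drownings)
print(f"correlation between ice cream sales and drownings: r = {r:.2f}")
```

The correlation comes out strongly positive, yet the causal arrow runs from temperature to each series separately; controlling for the hidden variable would make the apparent relationship vanish.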
The chapter's opening discussion of the correlation between health problems and the timing of
certain individuals' conception and birth highlights the importance of digging deep enough to find
the actual cause of such correlations: the hidden variable that is the causal variable, e.g., parents'
religious practices. The same is true of success in sports. It is not the fact that someone is born
earlier in the year that makes them better at sports; rather, boys and girls born earlier in the recruiting year for such things as Little League baseball tend to be more physically mature during a key
time period.
Given the statistic that a ballplayer is 50% more likely to be selected by a major league team if his
birthday is in August instead of July, if one didn't separate correlation from causation, one would be
tempted to believe that one's astrological sign conferred baseball talent.
More Correlation vs. Causation. This same distinction is also critically important when evaluating
the relative effectiveness of doctors. Focusing on patient outcomes could be extremely misleading
depending on how patients are assigned to specific doctors. As such, the observation that a particular doctor's patients have a higher mortality rate than another doctor's does not allow us to conclude
that the first doctor is not as good as the second doctor. As described in the next paragraph, this
could be a problem of the dreaded selection bias.


3. Selection Bias. Clearly, selection bias is an ill to be avoided and is a subject covered by every basic
statistics text. Yet, in practice, it's often difficult to avoid. College instructors and researchers
often recruit students to participate in studies of various things. A note is placed in the student
newspaper or on a bulletin board asking for volunteers. Medical clinics will often advertise on the
radio or in the newspaper for people having a certain condition in order to test a new drug or therapy.
In hospitals and universities alike, the reliance on volunteers introduces a selection bias.
As described in this chapter, measuring a doctor's skill also points out that selection bias can be
hard to avoid. This problem occurs because patients aren't randomly assigned to doctors. Better
doctors may even have a higher death rate because the sicker patients may seek out the doctors
with the best reputations. This example, attempting to measure a doctor's skill, is a good case
study on encountering and avoiding sample bias.
It's important for the student to know that most of the statistical methods that he or she learns
from the statistics textbook are based on the premise that the data gathered from a sample are
representative of the population.
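The doctor example lends itself to a small simulation. In the hypothetical setup below, doctor A is genuinely more skilled (each patient's risk of death is halved), but the sickest patients seek out doctor A, so the raw death rates point in exactly the wrong direction. All parameters are invented for illustration:

```python
import random

random.seed(7)

# Two hypothetical doctors: A is more skilled (halving each patient's
# risk of death), but the sickest patients seek out doctor A.
def simulate(n_patients=10000):
    deaths = {"A": 0, "B": 0}
    counts = {"A": 0, "B": 0}
    for _ in range(n_patients):
        severity = random.random()               # baseline death risk, 0..1
        doctor = "A" if severity > 0.5 else "B"  # sicker patients pick A
        risk = severity * (0.5 if doctor == "A" else 1.0)
        counts[doctor] += 1
        deaths[doctor] += random.random() < risk
    return {d: deaths[d] / counts[d] for d in deaths}

rates = simulate()
print(f"death rate, doctor A (more skilled): {rates['A']:.2f}")
print(f"death rate, doctor B (less skilled): {rates['B']:.2f}")
```

Comparing raw death rates ranks the better doctor as the worse one. A fair comparison would randomize patients across doctors, or control for the severity of each patient's condition.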
4. Multivariate Regression Analysis. What makes a good terrorist? In the example of Ian Horsley,
the characteristics of people who commit bank fraud and those of terrorists are revealed. This
example can be used to describe how such characteristics can be revealed through regression
analysis. Think for a minute about how a regression equation would be developed that could be
used as a model to predict a terrorist or a person who is planning on committing bank fraud, or
at least a model that would be evidence enough to cast suspicion on a person. What would the
dependent variable be? It might be a dichotomous variable: terrorist or not a terrorist. Some of
the independent variables could be: Did the person open his bank account with cash? Is the person's
address a P.O. box? Did the person open a savings account as well? Other independent variables
would be numeric or indices: the frequency of large withdrawals or the number of overseas
transactions.
So, once you'd gathered historical data on the banking habits of terrorists and non-terrorists,
you would calculate the regression coefficients and test the regression equation for statistical
significance.
Model Specification. Starting with a regression equation and testing the equation and each variable
for statistical significance begins the process of model specification. You may decide some variables
are not statistically significant and drop them out of your equation. You could use a process called
stepwise regression to specify your model. Once you have a regression equation that passes muster,
you now use this equation as a predictive tool, a model, to assist in predicting who might be a
terrorist or bank criminal.
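The mechanics described above can be sketched with a linear probability model: OLS run on a dichotomous (0/1) dependent variable with a dummy regressor. The data and variable names below are invented for illustration; they are not drawn from the book or from real banking records.

```python
# Minimal sketch of a linear probability model: a 0/1 dependent variable
# ("suspicious") regressed on a 0/1 dummy ("opened_with_cash").
# All records are invented for illustration.
records = [(1, 1), (1, 1), (1, 1), (1, 0),
           (0, 1), (0, 0), (0, 0), (0, 0)]  # (opened_with_cash, suspicious)

n = len(records)
x_bar = sum(x for x, _ in records) / n
y_bar = sum(y for _, y in records) / n

# OLS formulas: b1 = cov(x, y) / var(x); b0 = y_bar - b1 * x_bar
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in records) / n
var_x = sum((x - x_bar) ** 2 for x, _ in records) / n
b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar

# With a single dummy regressor, the fitted values are simply the group means:
# b0 is the share of suspicious accounts among the "no cash" group, and
# b0 + b1 is that share among the "opened with cash" group.
print(f"intercept b0 = {b0:.2f}")  # 0.25
print(f"slope b1     = {b1:.2f}")  # 0.50
```

In practice a dichotomous dependent variable is usually handled with logistic regression, but the linear probability model makes the dummy-variable interpretation easy to see with introductory tools.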
5. Lagged Variables. A statistical technique used in developing a regression model is a lagged variable. It is a variable that is lagged one or more time periods before the dependent variable. The
theory is that some variables take time to produce results. The analyses done by Almond and

Mazumder described at the beginning of this chapter regarding the accident of birth require
that one or more of the independent variables be cast as lagged variables. Another example
is the case of the Spanish flu. One way to reveal that the cause of a person's relatively lower
lifetime income is the 1918 flu pandemic is to correlate a person's current income with a lagged
time variable. The same could be done for the oddities of horse racing in 2004. The predecessor to
this book, Freakonomics, included a rather notable example of a lagged variable: the reduction
in the U.S. crime rate was revealed to have likely been caused by an increase in abortions 18 to 25
years earlier.
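The idea of a lagged variable can be demonstrated in a few lines. Below is a sketch with synthetic data: the series y responds to x two periods later, so the contemporaneous correlation is weak while the correlation at the correct lag is perfect. Only the technique mirrors the chapter; all numbers are invented.

```python
# Sketch: comparing contemporaneous vs. lagged correlation on synthetic data.

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)

# x is a "cause" series; y responds two periods later: y[t] = 2*x[t-2] + 1.
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
lag = 2
y = [1, 1] + [2 * v + 1 for v in x[:-lag]]  # first two values are placeholders

print(f"contemporaneous correlation: {pearson(x, y):.2f}")               # weak
print(f"correlation at lag {lag}:       {pearson(x[:-lag], y[lag:]):.2f}")  # 1.00
```

Aligning the independent variable with the dependent variable at the right lag, as in the abortion-and-crime example, is exactly the shift performed in the last line.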
6. Proper Choice of Dependent Variable. At times, when collecting and developing data to populate
a data set in order to estimate the coefficients of a regression equation, it is not clear what the
dependent variable should be. This chapter considers the risks that individuals face when
choosing between driving and flying to a destination. What is it that people are really trying to
avoid? Is it death? That is unlikely because, though many people avoid thinking about it, death
is inevitable. Levitt and Dubner describe how people are more interested in avoiding risks that
they themselves don't control. For example, when one is flying in a plane, one's safety is determined
by someone else. In a car, one's safety is largely controlled by oneself, or so we like to think.
Core Competencies
Once you have read and carefully studied this chapter you should be able to complete the following
tasks. Note: these competencies do not begin with simpler statistical concepts and work toward more
difficult ones. Instead, they follow the organization of the book.
1. Describe how the month in which someone is born can influence his success in sports in his
adolescent and early teen years. Use this to explain the difference between correlation and
causation.
2. What visual techniques would you use to illustrate the relative age effect? Could you create
this visual with the information given? Remember, the purpose of creating a data visualization
is to make it compelling and informative.
3. Explain how you would use multivariate regression analysis to determine how likely it is for a
person to play major-league baseball. What independent variables would you use?
4. What is a dummy variable? How would a dummy variable be used in a regression analysis to
determine whether there is a relative-age effect in professional sports?
5. What is a lagged variable? How would a lagged variable be used to determine the lifetime
earnings of someone whose mother suffered from the Spanish flu during her pregnancy?
6. Explain the statistical idea that correlation is not causation as it relates to the relative age
effect. Wouldn't certain astrological signs be correlated with great baseball players?

7. How could multivariate regression analysis be used to determine the characteristics of the
typical terrorist? Could this technique be used by law enforcement agencies to profile likely
terrorists?
8. What is risk analysis? How is it illustrated in describing the likelihood of death by terrorist?
9. In comparing the risks of driving a car versus the risks of flying in an airplane, what risk factor
should be used (i.e., if using a regression equation, what is the dependent variable)? Should we
be measuring the risk of serious injury? The risk of death? The risk of death per mile? Or per trip?
10. Discuss the importance of information in the provision of treatment in a hospital emergency
room. In particular, focus on how the amount of information readily available to doctors can
affect their ability to successfully treat patients.
11. List the four variables Craig Feied identified as critical for an information system to possess in
order for the system to be truly effective. What are the methods a statistician would use for
gathering data on these variables?
12. What is datapenia? Describe how Craig Feied designed a survey that would collect the critical
information needed to develop an information system for hospital staff that was truly effective.
13. Summarize the available data on the relative effectiveness of chemotherapy as a treatment for
cancer. Suppose you were to gather data to confirm or refute the results described. How would
you gather these data? What method(s) would you use to analyze the data you have gathered?
14. According to Levitt and Dubner, the fact that the age-adjusted mortality rate for cancer is essentially unchanged over the past half century hides some good news. Describe this good news and
explain what variable is added to the age-adjusted cancer mortality rate to be able to discern
this good news.
15. The mortality rate among active-duty military was almost the same during the 1980s, a time of
peace, as during the years in which the military was fighting wars in Iraq and Afghanistan.
On what variables could information be gathered to help explain this phenomenon?
16. If you were interested in determining whether there is a difference between the active-duty military
mortality rate during a time of peace and the rate during a time of war, how
would you state the null hypothesis? What are the steps involved in testing the null hypothesis
you have described?
17. Explain the concept of selection bias. How might this be a relevant factor in determining the skill
of doctors?
18. How could multivariate regression analysis be used to determine the variables correlated with
the development of lung cancer? Describe what independent variables you would include in the
regression analysis.
19. Suppose that one of the independent variables included in the regression is whether a person
smokes cigarettes. Suppose that this variable turns out to be statistically significant. Would you
have proved that cigarette smoking causes lung cancer?
20. Explain why an algorithm designed to identify the terrorists in a population is nonetheless of
limited value even when it is 99 percent accurate, especially as the size of the relevant population
increases or the number of terrorists in the population declines.
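The point behind that last competency is the base-rate problem, and a short calculation makes it concrete. The figures below (a population of 50 million and 500 actual terrorists) are illustrative round numbers in the spirit of the chapter, not data from the book.

```python
# Why a "99 percent accurate" classifier can still be of limited value:
# a worked base-rate calculation with illustrative round numbers.
population = 50_000_000   # adults scanned
terrorists = 500          # actual terrorists among them
accuracy = 0.99           # flags 99% of terrorists AND clears 99% of innocents

true_positives = terrorists * accuracy                         # about 495
false_positives = (population - terrorists) * (1 - accuracy)   # about 499,995
flagged = true_positives + false_positives

# Precision: of everyone the algorithm flags, what share are real terrorists?
precision = true_positives / flagged
print(f"people flagged:                {flagged:,.0f}")
print(f"share of flagged who are real: {precision:.4%}")
```

Even though the algorithm is right 99 percent of the time on any individual, fewer than one flagged person in a thousand is actually a terrorist, and the precision only worsens as the population grows or the number of terrorists shrinks.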

Chapter 3
Unbelievable Stories About Apathy and Altruism
Summary
In this chapter, Levitt and Dubner examine the evolution of experimental economics and the subfield of behavioral economics. They set up the discussion by describing one of the more shocking
stories from the 1960s: the brutal murder of Kitty Genovese and the apparent apathy displayed by a
large number of bystanders. As they discuss the rise in crime that occurred between 1950 and 1970,
they explore a variety of possible explanations for the observed increase, including reduced arrest
and imprisonment rates and the baby boom. The most interesting explanation, which also has
empirical support, is the advent of television. (Why increased television viewing leads to increased
crime is still somewhat of a mystery.) The authors then shift back to the Genovese murder to ask
why no one exhibited altruism on that particular night.
The next part of the chapter, which examines such questions as why and to what extent altruism
exists, begins by considering the distinction between altruism and self-serving behavior. As Gary
Becker has argued, many seemingly altruistic actions, e.g., visiting someone in a retirement home,
frequently include a strategic element. Subsequent empirical analysis has confirmed this. To better
understand altruism, a group of economists moved their work to the laboratory, i.e., experimental economics, to gain additional insights. Building on the work of John Nash and the Prisoner's
Dilemma, two new games, Ultimatum and Dictator, were developed and refined. The overwhelming conclusion that emerged from experiments involving these games was that the vast majority
of humans are in fact altruistic. This, in turn, cast considerable doubt on the traditional assumption
of the rational, self-interested economic agent. It also raised the possibility that most, if not all, of
society's problems could be solved simply by relying on the altruistic nature of human beings. Take,
for example, the case of organ transplants. If humans are indeed altruistic, it seems there would
never be a shortage of kidneys for transplants considering a person is born with two but really only
needs one.
Attention then shifts to the subfield of behavioral economics and the work of John List. After
chronicling List's career path, including his work in mainstream behavioral economics, the authors
describe how List began to explore the relationship between laboratory results and the real world.
In short, what List found was that the participants in an experiment behave much differently when
they don't know they are part of an experiment. Moreover, as the experiment more closely resembles the real world, results approach what traditional theory would predict, i.e., rational, self-interested behavior. The explanations for why previous laboratory experiments produced the results they
did include selection bias, the effects of scrutiny, and context and its corresponding incentives.
The last part of the chapter considers what constitutes pure altruism, as opposed to impure altruism, and then returns to the case of the Genovese murder. What is really interesting here is that all
of the shock waves felt throughout the country in the wake of this story may have been more the
result of fiction than fact.
Basic Statistical Concepts
1. Descriptive Statistics. This chapter provides a number of descriptions of rising crime rates. Unlike
many of the examples of cross-sectional data in previous chapters, these are examples of time-series
data. Just for practice, as you read through this chapter, identify the type of data you're
looking at: Is it time-series, cross-sectional, or both? Also, try this: What visual device would you
use to present the crime rate data over time? Make it something that would be immediately
informative.
One of the studies in this chapter uses data from the U.S. government in what is referred to as
a longitudinal study. How are these data different from time-series data?
2. Correlation vs. Causation. Levitt and Dubner note in this chapter that the rise of TV watching is
correlated with the increase in the crime rate. This appears, without further examination, to be
a spurious correlation. After all, how can watching TV lead to criminal behavior? There must be a
third, causal, variable at work here.
However, Levitt and Dubner postulate some hypotheses suggesting a causal link between
youthful TV watching and adult crime. They are led to these hypotheses by finding additional varieties
of correlation between youthful TV watching and adult crime. Levitt and Dubner extend the
ways that they statistically examine the correlation of these two variables. For example, they
look at crime rate differentials for different age groups within a city.
Just for practice, how would you test some of these hypotheses? Did The Andy Griffith Show
make people think that the law was so incompetent that they could easily get away with criminal
activity?
3. Experimental vs. Observational Studies and a natural experiment: This chapter is largely about
an experimental approach to gathering data on human behavior. Featured prominently is the
experimental work of people like John List. This is unusual because economists and those who
practice statistics in the social sciences are often relegated to gathering observational data.
As a student of statistics, you should be able to distinguish between the experimental approach to
gathering and analyzing data and the observational approach.

Entertainingly included in this chapter is an example of a natural experiment: a case where circumstances serendipitously prevail and the observational data resemble those from a deliberately designed experiment. The natural experiment was the large and sudden release of people from prison as a result of
lawsuits related to prison overcrowding. We will encounter another serendipitously arranged set of
circumstances that forms a natural experiment in the next chapter: a study of germs and childbirth
by Dr. Ignatz Semmelweis.
The laboratory experiments by John List and other behavioral economists are, as we have said,
not the usual fare of economists or statisticians in the social sciences. Therefore, the experiments
are a useful contrast to observational studies. From these examples you should learn that the selection of the participants in these experiments is important. Also, how you set up the experiment is
important. We describe this further in the following paragraphs.
4. Multiple Regression Analysis. Explaining the increase in U.S. crime rates during the decades of
the 1950s, '60s, and '70s is a statistical problem that lends itself to multiple regression analysis. There are a variety of variables that are possible candidates as causes of the
increased crime rate: population increase, a growing anti-authoritarian attitude, and reductions
in the prison population are examples used in this chapter.
While the specific data are not given, enough information is presented in this chapter to understand how a regression model that explains an increase in crime rates might look. As practice, use
this chapter and make a list of the significant independent variables. See if you can determine the
degree of influence of these variables on an increasing crime rate.
Now, describe the multivariate regression analysis that you would develop. What would the independent variables be? How would you gather the appropriate data?
5. Multicollinearity. The OLS method of calculating the coefficients for regression equations makes
assumptions about the variables included. One of these is that the independent variables
are not correlated with each other. In examining the causes of the increase in the crime rate
through the 1950s, '60s, and '70s, an astute observer would include independent variables such
as the incidence of poverty, the number of single-parent families, the number of female heads
of household, and the incarceration rate. If you think about it, there may be an approximate linear relationship among some of these independent variables. If this condition exists, it's known
as multicollinearity. One of the possible cures for multicollinearity is to respecify the model to
eliminate or reduce the relationship among the variables. As you learn about regression analysis, you will learn how to test for this problem as well as other conditions that may violate the
assumptions of OLS estimators.
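A quick way to spot the problem is to correlate the candidate regressors with each other before fitting anything. The sketch below uses invented figures for six hypothetical cities: a poverty rate and a single-parent-family rate that move almost in lockstep, producing a very large variance inflation factor (VIF).

```python
# Sketch: diagnosing multicollinearity between two candidate regressors.
# The six observations are invented for illustration.

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)

poverty_rate  = [10, 12, 15, 18, 22, 25]
single_parent = [8, 9, 12, 14, 18, 20]   # moves almost in lockstep with poverty

r = pearson(poverty_rate, single_parent)
# With one other regressor, R-squared is just r**2, and the variance
# inflation factor is VIF = 1 / (1 - R^2); a common rule of thumb flags VIF > 10.
vif = 1 / (1 - r ** 2)
print(f"correlation between regressors: {r:.3f}")
print(f"variance inflation factor:      {vif:.1f}")
```

A VIF this large signals that the two variables carry nearly the same information, which is exactly the cue to respecify the model as described above.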
6. Selection Bias. In examining the behavior of those who played the game Dictator, List concluded
something about the behavior of the population. Were the subjects of the observations a representative sampling of the population? Likely not. As pointed out by Levitt and Dubner, college freshmen who volunteer for a laboratory experiment differ from the general population in
important ways.
This is a good example of the importance of obtaining a random sample. A properly selected random
sample is likely to be representative of the population. At least, with a random sample, the statistical techniques you learn from your textbook will allow you to create confidence intervals and
perform tests of significance.
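As an example of the kind of interval a random sample permits, here is the standard normal-approximation confidence interval for a proportion, as taught in introductory courses. The sample size and count are invented for illustration.

```python
# Sketch: 95% confidence interval for a proportion from a random sample,
# using the normal approximation. Numbers are invented.
import math

n = 400          # sample size
successes = 120  # respondents exhibiting the behavior of interest
p_hat = successes / n                        # sample proportion: 0.30
se = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error of the proportion
z = 1.96                                     # critical value for 95% confidence
lo, hi = p_hat - z * se, p_hat + z * se
print(f"95% CI for the population proportion: ({lo:.3f}, {hi:.3f})")
```

The whole construction rests on the sample being random; if volunteers self-select into the study, the interval is centered on the wrong quantity no matter how precisely it is computed.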
As we described in our review of the first chapter, there are samples described in SuperFreakonomics
that are not, strictly speaking, random samples (the prostitute data, for instance). The results of analyses
performed using data from these samples can be generalized to the population only if the subjects that are
observed do not differ in any systematic way from the population.
7. Behavior Modified by Observation? Do you modify a person's behavior by observing him or her?
This question is generally not a concern in observational studies. However, laboratory experimentation using humans as subjects potentially affects the behavior of the subjects. This means
that the behavior observed by the researcher during an experiment may not translate to the
population as a whole.
The behavior observed by John List through laboratory experiments seems to contradict the behavior observed in the field, or when the rules were less strict and no one was observing the participants during the experiment. One explanation for this is described in the paragraph above. This is
also referred to as the effect of scrutiny. Do people act differently when they think they are being
observed? Levitt and Dubner conclude that people may act with a greater degree of altruism in the laboratory when they know they are being observed because there is a social stigma to acting greedily.
8. Mendacity, Artificial Context and the Reliability of Responses. Levitt and Dubner describe another
reason why the work done by List suggests that the results of many laboratory experiments
should be considered unreliable: context. A laboratory setting is artificial. The initial conclusions derived from the Ultimatum and Dictator games, and the results of John Lists subsequent
experiments, based on slight modifications of those games, clearly demonstrate that, in order
for the results that flow from it to be valid, a model must sufficiently reflect the relevant features of the world it is trying to emulate.
A related context problem involves the use of college volunteers. While playing Dictator, these
students may be more interested in pleasing the investigator than reaching inside themselves and
showing their true selves. Levitt and Dubner refer to college volunteers as "scientific do-gooders."
So, as a student of statistics, what do you conclude? In order to avoid unreliable results and conclusions, laboratory experiments must be set up to avoid selection bias and the behavior-changing
effects of scrutiny, and must have a context that doesn't feel artificial.
Core Competencies
Once you have read and carefully studied this chapter you should be able to complete the following
tasks. Note: these competencies do not begin with simpler statistical concepts and work toward more
difficult ones. Instead, they follow the organization of the book.

1. Explain why the brutal murder of Kitty Genovese raised such serious questions about the seemingly limitless apathy of people in the United States.
2. Describe the trends in U.S. crime rates through the decades of the 1950s, '60s, and '70s.
3. Describe the trend of arrests per crime and the incarceration rate over these same decades.
4. How could multivariate regression analysis be used to determine the influence of changes in
arrest rates, incarceration rates, and the postwar baby boom on the crime rate?
5. SuperFreakonomics describes growing crime rates during the 1950s, '60s, and '70s. At the same
time, the U.S. population was growing. State the null hypothesis you would test if you
wanted to determine whether the per capita crime rate in this time period was statistically significantly different from that of the 1940s. What steps are needed to conduct a hypothesis test on this
question?
6. Variables that might be used to develop a model for the crime rate include the incidence of poverty, the number of female heads of households and the incarceration rate. Explain why there
might be a correlation among these variables.
7. A number of assumptions are required for the least squares method of estimating coefficients
of a multivariate regression analysis to get unbiased estimators. What are these assumptions?
In reference to the previous core competency, which of these assumptions might be violated?
What is this condition known as? How can it be addressed?
8. Describe the correlation between television viewing and the rate of crime in individual cities.
Does this correlation indicate that television viewing increases crime?
9. How was the statistical technique referred to as lagged variables used in establishing this
correlation?
10. What explanations do the authors give as to why television viewing at a young age, specifically,
The Andy Griffith Show, is correlated with criminal behavior later in life? How, using statistical
methods, could you further investigate this explanation?
11. What is a natural experiment? Explain how a natural experiment was used to determine the
relationship between incarceration and crime rates. How is it different from observational
studies?
12. This chapter describes the work by several behavioral economists using laboratory experiments.
Explain how the approach of experimental economics and the resulting subfield of behavioral
economics differs from most economic research.
13. What are the differences between John Lists experiments involving baseball trading cards in a
laboratory setting and his experiments on a real trading floor? What did List hypothesize might
be the source of the observed differences?
14. Considering the experiments using the various forms of the Dictator game, how do different
rules of the same game affect the conclusions of the researcher?
15. Given that seemingly minor modifications of rules affect the outcomes of laboratory experiments, and thus the conclusions about human behavior drawn by the researcher, how can you
know whether an experiment isn't missing some delicate rule? In other words, can research,
done in the context of a laboratory experiment, ever be considered reliable?
16. Surveys and questionnaires are used extensively to gather information that a statistician summarizes, analyzes, and draws conclusions from. Explain whether the context of these surveys can
cause responses to differ from behaviors one might observe in the real
world. What does this say about the reliability of information gathered through surveys and
questionnaires?
17. Summarize, in general terms, the modifications John List made to the Dictator game, and
explain why the results of his experiments ended up casting extreme doubt on the conclusions
regarding human beings and altruistic behavior that were based on the initial version of the
game.
18. Explain the role selection bias has in gathering reliable information from experimental studies.
As described by the authors, how was the selection of experiment participants biased?
19. What is the role of scrutiny in gathering reliable data from a laboratory experiment such as
Dictator? How can it change the outcomes of these experiments?
20. How does a further examination of the facts and data related to the Kitty Genovese murder
compare with the story told by the New York Times article published shortly after the incident?

Chapter 4
The Fix Is Inand Its Cheap and Simple
Summary
This chapter is all about problems, solutions to those problems, and unintended consequences
that can arise. The specific problems addressed range from health-related issues (including deaths
among mothers of newborns and polio), to deaths due to car accidents, to population growth and
species extinction. A recurring theme is that in all of the cases considered, the solution to the problem is invariably simple and relatively cheap, once the problem and its cause are well understood.
The chapter begins by describing the puerperal fever epidemic that struck top European hospitals
in the mid-1800s. As Levitt and Dubner point out, many of the suspected causes, in one way or

another, held women responsible for the outbreak. And it wasn't until an enterprising administrator who relied on statistical observations began to examine the problem, prompted by the clue
offered by the accidental death of a leading doctor, that the problem was solved. After all was said and done, the disease was linked
to exposure to germs delivered by doctors who did not adequately clean their hands after performing autopsies and just before they delivered babies. And so it was that the solution to the problem
was as cheap and simple as one could possibly imagine, i.e., "Wash your hands, Doctor!"
The authors discuss a number of solutions to problems that have proven to have unintended, and
costly, consequences. The Americans with Disabilities Act has made it easier for disabled individuals to move
about in society, but it also has left them with fewer job opportunities because potential employers became afraid of running afoul of the law and simply chose not to hire the disabled. And then
there's the Endangered Species Act, which has led many property owners to engage in habitat
destruction, which is the opposite of the law's intent. The same is true of such policies as per-unit
pricing of trash disposal, which has encouraged illegal dumping and resulted in more burn victims
because people have an increased incentive to burn their trash, and a mandatory debt relief law
that ended up limiting the availability of credit to those who need it most.
The next part of the chapter focuses on a variety of situations in which major problems have been
solved using relatively cheap and simple fixes. Ammonium nitrate, for example, proved to be a
rather inexpensive solution to the problem of how to feed a rapidly growing world population. The
discovery of oil averted what otherwise would have been the extinction of whales. And a set of vaccines ultimately eliminated the burgeoning costs polio imposed in the United States and abroad.
At this point, the authors also stress the difference between treating a problem after it arises and
effectively eliminating the problem. This is the nature of vaccines and other preventive drugs in
medicine. There is no doubt that considerable costs are incurred in developing new vaccines and
drugs. But it is also true that once the vaccine/drug has been developed, its costs are dwarfed by
the benefits in the form of adverse health effects avoided.
Levitt and Dubner then return to a topic addressed at some length in their previous book, i.e.,
deaths associated with automobile accidents. However, this time, they address the problem more
broadly, considering deaths of both children and adults. Beginning with death rates in general, they
emphasize the distinction between mitigating the adverse effects of an accident ex post, and avoiding the harm altogether. This was indeed the motivation for putting seat belts in cars. If someone is
prevented from being thrown about the interior of a car when an accident occurs, the injuries suffered will be much less than they would be without such a restraint, regardless of how hard or soft
objects in the car are. Levitt and Dubner turn to the issue of child safety seats. This time around, the
conclusions are more specific, i.e., children under the age of 2 are clearly better off in car seats, while
children in the 2-6 age category are equally well off using either seat belts or car seats, especially
when it comes to serious injury. They wrap up this part of the chapter by suggesting a possible solution that would be both simple and cheap: design seat belts specifically for children.

The last section of the chapter describes a possible solution to one of the more destructive problems created by Mother Nature: hurricanes. After explaining the process by which hurricanes form
(you can't identify a solution if you don't first understand the problem), they describe a relatively
low-cost solution that is currently being tested. The ultimate question, of course, is whether governments can be persuaded to try something that is both cheap and simple.
Basic Statistical Concepts
1. Descriptive Statistics. The statistics in this chapter are, like those in the other chapters of this book, abundant. This chapter, however, contains some rather astonishing statistics: the death rates from
puerperal fever among mothers giving birth, the reduction in child fatalities through the use of
child safety seats, and the whale harvest by the Captain Ahabs of the 19th century, among others.
As a student of statistics, you might like the fact that much of the data in this chapter are
related to government policies and the unintended consequences of those policies. Examples
include data on how the Americans with Disabilities Act might have resulted in less employment of the disabled and how the Endangered Species Act might result in a reduction in the habitat available to endangered species.
This is also a good chapter to illustrate both cross-sectional and time-series data since there are
several examples of both. There are also a number of cases of data that are combined time-series
and cross-sectional. A good exercise for you as a statistics student would be to read through this
chapter and identify the type of data being described.
2. Correlation versus Causation. As anyone who has had basic statistics knows, correlation
does not imply causation. This is a lesson that we have now come across in every chapter of
SuperFreakonomics. That being said, correlation and causation are often treated synonymously.
Thus, it should not be surprising that such factors as personal predisposition (no doubt many
mothers about to deliver their babies are quite anxious), foul air in the delivery wards (remember this was happening in the 1800s), the presence of male doctors, and catching chill or leaving
the delivery room too soon were suggested as possible causes of puerperal fever. While there
may have been a high correlation between the presumed causes and the fever, hindsight clearly
shows the cause which was related only to the male doctors, because they were performing
autopsies and not washing up before entering the delivery room.
This "correlation is causation" fallacy is not a problem relegated to the 19th century, however. In the
middle of the 20th century, some researchers thought that polio was caused by the consumption
of ice cream because they both tended to spike in the summer. Such a connection is no more defensible than the suggestion that the number of deaths by drowning is caused by the consumption
of ice cream, given that both are observed to increase in the summer months, a famous case of
misapplied logic.
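The ice-cream fallacy is easy to reproduce. In the sketch below, two causally unrelated series are both driven by a shared third variable, monthly temperature, and end up strongly correlated. All of the figures are invented for illustration.

```python
# Sketch of a spurious correlation created by a shared third variable.
# Monthly figures are invented; temperature drives both series.

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)

temperature = [2, 4, 9, 14, 19, 24, 27, 26, 21, 15, 8, 3]  # mean monthly temp
ice_cream   = [3 * t + 10 for t in temperature]            # sales track temperature
drownings   = [t // 2 + 1 for t in temperature]            # so does swimming (and drowning)

r = pearson(ice_cream, drownings)
print(f"ice cream vs. drownings: r = {r:.3f}")  # high, yet neither causes the other
```

Regressing drownings on ice-cream sales here would yield a "significant" coefficient, which is precisely why a correlation needs a supporting theory before a causal reading is warranted.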
This chapter also includes at least one example of correlation having meaningful causation: the
discovery and use of oil and the salvation of the whale. Here, the increased use of oil is correlated

with reductions in the harvesting of whales. The correlation, in this case, is not spurious but is, in fact, causal. So, are you confused? Are you thinking that we've changed our minds here in the penultimate chapter and now say it's okay to draw a causal link when only correlation is known? Heaven forbid! Correlation is used as evidence to support a hypothesis, but there must also be an explanation, a theory, to support the causal link (in this case: oil from crude is a perfect substitute for oil from whale blubber). That doesn't mean that every theory describing a causal link between two variables that is supported by the data is a good theory. A grade-A student is always open to new theories.
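The ice cream examples above lend themselves to a quick simulation. The sketch below is purely hypothetical: a lurking variable, summer temperature, drives both ice cream sales and drownings, so the two series come out strongly correlated even though neither causes the other. All variable names and numbers are invented for illustration.

```python
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(42)
# Monthly temperature is the lurking variable driving both series.
temps = [random.uniform(0, 30) for _ in range(120)]
ice_cream = [50 + 4 * t + random.gauss(0, 10) for t in temps]  # sales
drownings = [2 + 0.3 * t + random.gauss(0, 2) for t in temps]  # deaths

r = pearson_r(ice_cream, drownings)
print(f"correlation between ice cream sales and drownings: {r:.2f}")
# The correlation is strong, yet neither variable causes the other:
# both respond to the omitted variable, temperature.
```

Omit temperature from the analysis and the correlation looks meaningful; account for it and the apparent link between the two outcome variables dissolves.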
3. Experimental vs. Observational Studies. Levitt and Dubner tackle the question of the efficacy of
child-restraining systems in automobiles. The first thing a statistician might do is to take an
observational approach. That is, to look at the data already gathered by the alphabet soup of
government agencies. However, it turns out that the available data relates only to mortality and
injury rates for kids in childrens car seats or for kids who are completely unrestrained in the car.
Comparisons of restraining systems are not possible with the existing data. So, a lesson for the
student: when data regarding the variables of interest are not available, a researcher has to conduct his or her own experiment.
Levitt and Dubner drew their conclusions from an experiment of their own, comparing the child-safety performance of child seats and standard seat belts. Since they found no adequate source of data to test the safety efficacy of seat belts as compared to child safety seats, they conducted the crash tests themselves.
Another interesting note for the student of statistics is that Levitt and Dubner had a difficult time
finding a crash test laboratory that would allow them to purchase the services of the lab! This is
because the laboratory owners didn't want to be involved in a test that would draw conclusions that would offend the car-seat manufacturers.
4. Selection Bias: A major portion of this chapter is about child restraints in automobiles. The statistic cited by child-safety researchers is that booster seats reduce significant injury by roughly 60
percent. The data used by the researchers to arrive at this conclusion is garnered from insurance
records that describe interviews with parents. It may well be that these records are fraught with
distorted information. There are two reasons for this. First, an automobile accident involving
one's child is a traumatic event for the parent. He or she may misremember the details of such
an event. Second, the parents interviewed by their insurance company may be intimidated or
embarrassed, causing them to misrepresent the truth about how restrained their children may
have been at the time of an accident. This is known as response bias. For both of these reasons,
the information given by parents may not be considered reliable.
We now have described several examples of how selection bias distorts the goal of collecting reliable information. Selection bias seems to be difficult to avoid.
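To see how response bias can distort an effectiveness estimate, consider the following toy simulation. Every number in it (injury rates, the share of children restrained, the misreporting rate) is invented; the point is only the mechanism: when some parents of unrestrained children report them as restrained, the reported groups no longer match the true ones.

```python
import random

random.seed(7)

# All rates below are invented for illustration.
P_RESTRAINED = 0.7           # true share of children restrained
INJURY = {"restrained": 0.05, "unrestrained": 0.20}
MISREPORT = 0.5              # share of unrestrained cases reported as restrained

def injury_rates(n=200_000):
    """Injury rate keyed by ('true' | 'reported', restraint status)."""
    tally = {}
    for _ in range(n):
        actual = "restrained" if random.random() < P_RESTRAINED else "unrestrained"
        injured = random.random() < INJURY[actual]
        reported = actual
        if actual == "unrestrained" and random.random() < MISREPORT:
            reported = "restrained"  # embarrassed parent misreports
        for kind, label in (("true", actual), ("reported", reported)):
            cell = tally.setdefault((kind, label), [0, 0])
            cell[0] += injured
            cell[1] += 1
    return {key: inj / total for key, (inj, total) in tally.items()}

rates = injury_rates()
print("injury rate, truly restrained:   ", round(rates[("true", "restrained")], 3))
print("injury rate, reported restrained:", round(rates[("reported", "restrained")], 3))
# Misreporting pulls high-risk unrestrained children into the reported
# "restrained" group, so any analysis based on the reports is biased.
```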
5. Multiple Regression Analysis. The advent of the agricultural revolution has been the subject of a
considerable amount of analysis. The development of a model to assist in explaining the agricultural revolution can be accomplished through multivariate regression analysis. As a student
of statistics, as you read the part of the chapter on the agricultural revolution you may wonder
what variables would be considered as significant independent variables. For practice, construct
a regression equation using the variables mentioned. Which of these, do you think, would be
considered the most consequential?
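As a sketch of what such a regression might look like, the code below fits an ordinary-least-squares model to synthetic data. The independent variables (rainfall, soil quality, temperature) and all coefficients are hypothetical choices for illustration, not variables drawn from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical independent variables, invented for illustration.
rainfall = rng.uniform(20, 60, n)     # inches per year
soil = rng.uniform(0, 10, n)          # soil-quality index
temperature = rng.uniform(10, 25, n)  # mean growing-season temperature

# Synthetic dependent variable built from known coefficients plus noise.
crop_yield = (5.0 + 0.8 * rainfall + 2.0 * soil
              + 0.3 * temperature + rng.normal(0, 4, n))

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(n), rainfall, soil, temperature])
beta, *_ = np.linalg.lstsq(X, crop_yield, rcond=None)
print("estimated coefficients:", np.round(beta, 2))
# With enough data, the estimates land near the values used to
# generate the series: (5.0, 0.8, 2.0, 0.3).
```

The most consequential variable is the one whose variation explains the most variation in the dependent variable, which is exactly what the estimated coefficients, combined with significance tests, help you judge.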
6. Heteroskedasticity. The efficiency of OLS estimators, and the validity of the usual standard errors, relies on the assumption of homoskedasticity. This funny word means that the error terms have the same variance for every value of the independent variable. In this chapter,
there is an example that could be used to illustrate heteroskedasticity: the powerful damage
wrought by hurricanes. Suppose a hurricane researcher is attempting to develop a regression
equation in which damage done by hurricanes is the dependent variable. Suppose the size of the
hurricane is one of the independent variables, whether measured by category or by energy output. It is likely that the error term would increase in size along with the hurricane, with the result being heteroskedasticity.
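A short simulation can make this concrete. In the hypothetical sketch below, the noise added to damage grows with the square of storm size, so the fitted model's residuals fan out as the regressor increases, which is exactly the heteroskedasticity pattern described above. All figures are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical data: damage grows with storm size, and so does the noise.
size = rng.uniform(1, 5, n)              # e.g., storm category
noise = rng.normal(0, 1, n) * size ** 2  # error spread rises with size
damage = 10 + 20 * size + noise          # invented damage figures

# Fit OLS and inspect the residuals.
X = np.column_stack([np.ones(n), size])
beta, *_ = np.linalg.lstsq(X, damage, rcond=None)
resid = damage - X @ beta

small = resid[size < 3]
large = resid[size >= 3]
print("residual std, small storms:", round(small.std(), 2))
print("residual std, large storms:", round(large.std(), 2))
# The residual spread grows with the regressor: heteroskedasticity.
# Common remedies include modeling log(damage) or using robust
# standard errors.
```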
Core Competencies
Once you have read and carefully studied this chapter you should be able to complete the following
tasks. Note: these competencies do not begin with simpler statistical concepts and work toward more
difficult ones. Instead, they follow the organization of the book.
1. Describe the various (incorrect) proposed explanations for the dramatic increase in puerperal
fever that occurred at General Hospital in Vienna in the 1840s. For each one, explain how you
think these explanations came about.
2. What type of data was collected by Semmelweis to study maternal mortality rates?
3. Describe whether the mortality data for the Vienna hospital represents a randomized sample, a
convenience sample or a census or something else. What difference does this make in terms of
drawing conclusions on the mortality rate of the doctors' ward versus the midwives' ward?
4. Considering the data collected on mortality rates at the Vienna hospital in the various wards,
what are the values of the point estimates? How are point estimates different from population
parameters?
5. Describe the process Semmelweis went through to develop a hypothesis for the cause of the
higher mortality rates in the doctors' ward at the Vienna hospital.
6. Identify the variable that Semmelweis determined to be correlated to the incidence of puerperal
fever at General Hospital in Vienna. Did he decide that this correlation indicated causality? What
did he do to test his hypothesis? What did he confirm?
7. Let's pretend to start over and test Semmelweis's hypothesis according to standard statistical
methods. What is the dependent variable in this case? What are the independent variables?
8. When testing a medical procedure, the method most often used is to have a random selection
of patients treated with the medical procedure and a control group that is not treated. How
would you set up this type of experiment to test Semmelweis's hypothesis? What is the null
hypothesis?
9. Summarize the costs, both personal and economic, that resulted from the polio epidemic that
struck the United States in the early to mid 1900s.
10. Explain the principle of unintended consequences. Use examples from this chapter to illustrate the principle.
11. How would you test the hypothesis that the use of forceps during child delivery reduces the
incidence of death and injury?
12. Using multivariate regression analysis, how would you create a model that explains the agricultural revolution?
13. Explain the process of stepwise regression. How might you use this method to develop the
model of the agricultural revolution mentioned above?
14. Consider how the risk of travel might be examined. What variable would you choose as the
dependent variable in an analysis of variables that are correlated with risk? Explain how using
mortality per mile driven vs. mortality per trip taken affects your conclusions.
15. Explain why Levitt and Dubner conducted their own experiments on child safety seats versus
adult seat belts rather than just resorting to the F.A.R.S. data set.
16. Using the child safety seat examples described by Levitt and Dubner, explain the difference
between observational studies and experimental studies.
17. Explain selection bias. Describe how it becomes a problem in gathering reliable responses from the parent interview data collected by insurance companies.
18. Explain response bias. Why does response bias materialize in interviewing parents about the
circumstances related to automobile accidents in which young children are involved?
19. Considering child restraint systems in automobiles, how could a researcher design a study
that minimizes the chance that his or her research will be tainted by either selection bias or by
response bias?
20. Focusing on children over the age of two, what do the data tell us about the relative effectiveness of seat belts and child safety seats with respect to deaths resulting from car crashes?
21. The least squares method of estimating the coefficients of a regression analysis makes certain
assumptions about the behavior of the data in order to produce unbiased estimators of these
coefficients. Suppose the dependent variable is the economic damage done by a hurricane and
that the independent variable is the size of a hurricane. As the size of the hurricane increases,
it seems likely that the error term would also increase in size.
What is this condition known as? How can the model be specified to reduce or eliminate
this effect?
Chapter 5
What Do Al Gore and Mount Pinatubo Have in Common?
Summary
What if much of what's been proposed to address the problem of climate change is wrong or otherwise off the mark? This chapter takes a refreshingly objective look at the problem, focusing on
global warming. The first part of the chapter points out that as recently as the 1970s, global cooling, not global warming, was the major concern of many climatologists. The subsequent increase in
average global temperatures, however, moved global warming to the forefront. The authors explain
many of the suspected causes of global warming, ranging from carbon emissions, to methane from
cows and other ruminants, to changes in agricultural production. Levitt and Dubner also take a
close look at the unique character of the global warming dilemma, emphasizing the considerable
challenges scientists face as they try to predict what will happen over the longer term. They wrap
up the introduction by considering the near-religious dimension of the movement to stop global
warming.
In the next part of the chapter, the authors present a very accessible discussion of the concept of
an externality, beginning with negative externalities such as pollution, and the idea of using a tax
to induce parties responsible for negative externalities to internalize them. They also are careful
to point out the considerable difficulties that would be encountered in any attempt to use a tax to
internalize the externalities that result in global warming. The discussion then moves to the concept of positive externalities with a fair amount of attention devoted to specific examples.
The first is the adoption of the LoJack by many automobile owners as a means to thwart the efforts
of would-be car thieves. This generates external benefits in the form of an increased ability of police
to locate and take down the chop shops where most stolen cars end up, and a reduction in auto
thefts for both people who install a LoJack and those who do not. The second example considered
is the eruption of Mount Pinatubo in 1991 and the decrease in the average global temperature that
followed. This serves as a springboard to the work that goes on at Intellectual Ventures (I.V.). It also
answers the question: What do Al Gore and Mount Pinatubo have in common? They both suggest a
way to cool the planet.
We were actually introduced to Intellectual Ventures in the previous chapter when we learned
about the approach that is being proposed to mitigate the adverse effects of hurricanes. The
authors take the time to acquaint the reader with many of the gifted scientists who work for I.V.
and, in particular, those who are working on solutions to the global warming problem. One of the
novel aspects of the solutions being proposed is also a carryover from the previous chapter; they
are, by and large, simple and relatively cheap. To better appreciate the solutions being proposed by
I.V., we are first provided with I.V.s assessment of the global warming problem, its causes, and the
likely effectiveness (little if any) of the solutions that have been proposed by others, e.g., Al Gores
focus on reducing the amount of CO2 being emitted into the atmosphere. In contrast, I.V. prefers
to take its cue from the effects of big-ass volcanoes, and reduce global warming by reducing the
amount of the sun's rays that reach the earth. While Budyko's Blanket, a garden hose to the sky that
would extend 22 miles into the stratosphere and spew out sulfur dioxide in much the same manner
as would a large volcanic eruption, is currently the focus of much of their attention, the folks at I.V.
are also considering other approaches, including extending the smokestacks of specific coal-fired
electric plants into the stratosphere, and stimulating cloud formation over the oceans to reflect
back more of the sun's rays. Levitt and Dubner then point out the major obstacle to all of these
solutions, which collectively fall under the heading of geoengineering. The problem, quite simply, is the often-encountered inability to change people's behavior, especially when the proposed
change is considered repugnant by some.
The last part of the chapter chronicles the obstacles Ignaz Semmelweis encountered in his efforts
to get doctors to wash their hands before seeing each patient. Levitt and Dubner then point out
that this problem has persisted into modern times. In fact, only recently have certain hospitals
been able, with the help of a computer screen saver featuring bacteria-laden hand prints, to induce
doctors to achieve near 100 percent compliance with guidelines for hand washing. A similar challenge was confronted in Africa where the AIDS epidemic continued unabated until intervention in
the form of circumcision substantially reduced the incidence of HIV infections among the affected
population.
Basic Statistical Concepts
1. Descriptive Statistics. This last chapter, like the four before it, is replete with statistical information. The main focus of this chapter, global warming, provides examples of time-series data.
The atmospheric temperature of the Earth over a series of years and over a series of decades,
the change in ocean elevations over time and the change in global temperatures after the eruption of big-ass volcanoes are all examples of time-series data. These stand in contrast to much
of the data presented in the previous chapters in SuperFreakonomics: The prices of sex acts and
the wages of prostitutes presented in chapter one are cross-sectional data.
Data can be presented in a variety of visualizations. Can you imagine how the global warming
data could be presented in an informative and creative way? This visualization could be a presentation of mean global temperature over time, or it could be a scattergram that presents two variables:
the mean global temperature and the amount of greenhouse gases produced by anthropogenic sources.
2. Normative versus positive analysis. Some of the efforts to reduce the pollutants that are believed
to contribute to global warming appeal to peoples altruistic motives. This is the approach being
taken in Al Gore's massive PR campaign. While this is not meant to deny that Gore's position is prompted by scientific findings, there is, nonetheless, a normative aspect to the appeal
for the adoption of specific policies. The argument goes like this: There is growing evidence
of global warming. Carbon dioxide is one of the greenhouse gases believed to contribute to
global warming. Many activities people engage in cause an increase in atmospheric carbon
dioxide. Humans are morally obligated to reduce the amount of CO2 they are emitting into the
atmosphere.
Compare this to the proposal by the I.V. brainiacs that sulfur dioxide should be spewed into the
stratosphere to help cool the earth. This proposal is based on observations of how a few big-ass volcanoes have induced a cooling of the earth for a few years after they erupted. This policy
suggestion lacks normative considerations. It is, however, likely to be met by a normative backlash:
humans should not fight the effects of existing pollution by adding more pollution.
3. Correlation vs. Causation. More than any other contemporary topic, the issue of global warming
is one that evokes the question of whether the correlation of human economic production and
atmospheric warming is simply a correlation or whether the increasing production of CO2 by
humans is the cause of global warming. This chapter presents several examples of the confusion
regarding this question. In the mid-1970s, world-wide cooling was predicted. However, over the
past 100 years, global ground temperatures have risen.
Thus, the subject of this chapter serves to illustrate the logical fallacy of equating the mere correlation of two variables with causation. This is a case in which a more robust model must be prepared to include variables that may play a significant role in determining the fate of the global
atmosphere.
4. Multivariate Regression Analysis. As you now know, the development of a properly specified multivariate regression equation is a precursor to model building. The development of a predictive
model of global climate change is facilitated by the creation of one or more multivariate regression equations. Each variable included in the equation(s) used would be tested for statistical
significance. After reading this chapter, consider what independent variables might be included
in a multivariate regression analysis where the dependent variable is mean global temperature.
Levitt and Dubner suggest several variables that may be important determinants in a regression
equation. Just to get you started, here are two variables Levitt and Dubner suggest: the number
of airplane flights and the amount of bovine flatulence.
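A sketch of how each candidate regressor's significance might be checked appears below. The regressors (a flights index, a bovine-emission proxy, and a deliberately irrelevant variable), their units, and their coefficients are all invented; the code simply shows the classical t-statistic computation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Hypothetical regressors; units and values are invented.
flights = rng.normal(100, 15, n)  # airplane-flight index
methane = rng.normal(50, 10, n)   # bovine-emission proxy
irrelevant = rng.normal(0, 1, n)  # a variable with no real effect

temp = 10 + 0.02 * flights + 0.05 * methane + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), flights, methane, irrelevant])
beta, *_ = np.linalg.lstsq(X, temp, rcond=None)
resid = temp - X @ beta

# Classical OLS t-statistics: coefficient divided by its standard error.
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(X.T @ X)
t_stats = beta / np.sqrt(np.diag(cov))
print("t-statistics:", np.round(t_stats, 1))
# The genuine drivers produce large t-statistics; the irrelevant
# variable's t-statistic hovers near zero and would be dropped.
```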
5. Model Building. Model building consists of two main tasks: selecting the proper functional form
for the model in order to properly describe the relationships among the variables and selecting
the independent variables that should be included in the model. In this chapter the question of
proper model building is paramount. According to the authors: To predict global surface temperatures, one must take into account these [airplane flight information] and many other factors
including evaporation, rainfall and . . . animal emissions.
6. Forecasting. Forecasting is the use of models or other statistical techniques to predict the values of specific variables into the future. In order to accomplish this, a model needs to be developed. In this chapter, several models of climate change are described. Predictions of atmospheric
temperature, sea level and world-wide agricultural production are made by these models.
Atmospheric temperature forecasts are the major thrust of this chapter. In your statistics text, you
may have come across methods of developing forecasts. The simplest of these is trend analysis:
drawing a trend line through a set of values of two variables. Why is a simple trend analysis insufficient to forecast the global atmospheric change?
You should now be able to explain how to prepare forecasts using regression analysis. You begin
with theory and choose independent variables based on climate theory. You collect data, estimate
coefficients and go through the process of properly specifying a model. You then use this model to
predict values of the dependent variablein this case mean global temperature or the increase in
sea levels.
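As a minimal illustration of trend-based forecasting, the sketch below fits a straight line to an invented series of annual temperature anomalies and extrapolates it. The trend values and noise level are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical annual temperature anomalies with an upward trend.
years = np.arange(1960, 2010)
anomaly = 0.01 * (years - 1960) + rng.normal(0, 0.08, years.size)

# Trend analysis: fit a straight line through the series...
slope, intercept = np.polyfit(years, anomaly, 1)

# ...and extrapolate it to a future year.
forecast_2020 = slope * 2020 + intercept
print(f"trend: {slope:.4f} per year; forecast for 2020: {forecast_2020:.2f}")
# The trend line captures only one regressor, time. It ignores the
# physical drivers (CO2, volcanism, ocean circulation), which is why
# a simple trend is a weak forecaster for the climate.
```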
7. Scientific Uncertainty. Statistical textbooks assume that data can be used to produce sample statistics, to test hypotheses, to create regression equations, or to develop models. In this chapter,
Levitt and Dubner cast doubt on this viewpoint. They note that it is not clear what data are relevant. They also cast doubt on the idea that a model created from the best data available is sufficiently complex to handle the complicated case of global climate change. They even describe
a situation of scientific Darwinism: Only those scientists whose models predict catastrophic
global climate change will receive funding to conduct research and fully develop atmospheric
models.
8. Data Reliability. As a student of statistics, how do you determine the reliability of the data you collect? How do
you deal with cases in which scientists disagree as to the appropriate data to use?
Core Competencies
Once you have read and carefully studied this chapter you should be able to complete the following
tasks. Note: these competencies do not begin with simpler statistical concepts and work toward more
difficult ones. Instead, they follow the organization of the book.
1. Even without the use of sophisticated statistical methods, much can be learned by simply examining the relevant data. Describe the sources of greenhouse gases.
2. To what degree would greenhouse gases be eliminated by buying local, in terms of food production and delivery?
3. What visual device could be used to clearly illustrate the relationship between anthropogenic
greenhouse gas emissions and mean global temperature? What type of data are referred to in
this chapter?
4. List and explain the three primary science-related factors Levitt and Dubner cite as reasons why
global warming is a uniquely thorny problem.
5. Explain why existing climate models have a difficult time predicting the climate future.
6. What independent variables would be included in a multivariate regression analysis that would
be used as a model to predict future global mean temperature?
7. Suppose you include the eruption frequency of big-ass volcanoes as one of the independent
variables in the regression analysis described above. How would the value of the estimated
coefficient of this variable compare with the value of other independent variables?
8. What does model specification mean? How is it relevant to building models that are used to
forecast future global temperatures?
9. What statistical technique could be used to determine the deterrent effect of the Club versus
the LoJack? State the null hypothesis for this experiment.
10. Describe the process of stepwise regression. What role do t-statistics and F-scores play in stepwise regression? What would the process be for using this technique in developing a model for
global warming?
11. Describe the effect the eruption of Mount Pinatubo had on the earths average temperature in
the two years following the eruption.
12. How does the existence of scientific uncertainty complicate statistical analysis?
13. Summarize Intellectual Ventures' assessment of the current generation of climate-assessment
models and what this assessment means for our ability to rely on those models in formulating
sound global-warming policy.
14. What is normative analysis? How is it used by some environmental activists to determine the
appropriate solution to global warming? How does this differ from positive analysis?
15. Describe the primary driver behind rising sea levels and contrast this with the argument made
by many environmental activists regarding the assumed cause of this phenomenon.
16. What is geoengineering?
17. Drawing on the observed effects big-ass volcanoes have on the earths temperature, explain
how Budyko's Blanket, i.e., the garden hose to the sky, would work to reduce or eliminate the
process of global warming.
18. Describe the expected costs of implementing the Budyko's Blanket strategy. How do these compare with the cost of Al Gore's PR campaign regarding global warming?
Economics students will find a student guide to SuperFreakonomics and Freakonomics at www.HarperAcademic.com.