
Hello, and welcome to the first
unit of Data Analysis and
Statistical Inference,
Introduction to Data.
This unit will introduce you to
the basics of collecting, analyzing, and
visualizing data,
as well as making data-based decisions.
Let's start with a historical
look at data as evidence.
In the United States, anti-smoking
research started in the 1930s,
when cigarette smoking
became increasingly popular.
While some smokers seemed to be
sensitive to cigarette smoke,
others were completely unaffected.
Anti-smoking research was faced with
resistance, based on claims like, "My
uncle smokes three packs a day and
he's in perfectly good health."
Such evidence, while maybe real, is based
on a limited sample size that might not be
representative of the population.
We call such evidence anecdotal evidence.
At the time, it was concluded that
smoking is a complex human behavior,
by its nature difficult to study,
confounded by human variability.
Today, however, our understanding
of the health effects of smoking
is much different.
In time, researchers were able to
examine larger samples of cases,
in other words, more smokers.
With data collected from a larger sample
over time, trends showing the negative health
impacts of smoking became much clearer.
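To see why a larger sample makes a trend clearer, here is a minimal simulation sketch. The 30% rate of health problems is a made-up number used only for illustration, and the function name is hypothetical. It compares how much the observed rate bounces around across many samples of 5 people versus many samples of 500 people.

```python
import random
import statistics

def sample_proportions(true_rate, sample_size, n_samples, rng):
    """Draw n_samples samples and return the observed proportion in each one."""
    props = []
    for _ in range(n_samples):
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_rate)
        props.append(hits / sample_size)
    return props

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_RATE = 0.30         # hypothetical rate, an assumption for this sketch only

small = sample_proportions(TRUE_RATE, 5, 1000, rng)     # many tiny samples
large = sample_proportions(TRUE_RATE, 500, 1000, rng)   # many large samples

# Spread of the observed rates from sample to sample
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

# Share of tiny samples in which nobody at all shows the health problem
share_zero = sum(1 for p in small if p == 0) / len(small)

print("spread with n=5:  ", round(spread_small, 3))
print("spread with n=500:", round(spread_large, 3))
print("tiny samples showing no effect at all:", round(share_zero, 3))
```

With only a handful of cases, a noticeable share of samples show no effect at all, which is the "my uncle" situation. With hundreds of cases, the observed rates cluster tightly around the underlying rate.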
The goal of this course is to teach you
to make sense of data using statistical
tools, in order to be able to explore
relationships between variables and
make data-informed decisions.
Throughout the course, you will be
introduced to numerous studies, and
when faced with a new study or a data set,
the first question you should always
ask yourself is, what is the population
of interest, and what is the sample?
For example, let's consider this
study titled Alcohol Brand Use and
Injury in the Emergency Department,
published in 2013.
The study explored the research question,
are consumers of
certain alcohol brands more likely to end
up in the emergency room with injuries?
Based on this question alone,
it appears that the population
of interest is everyone.

In other words,
ideally the researchers would like
to find an answer to this question
that can result in a recommendation for
everyone who consumes alcohol.
However, a closer look at the study
reveals that the sample used in
this study was only a group
of emergency room patients at
the Johns Hopkins Hospital
in Baltimore in the US.
These are patients who visited
the hospital with an injury, and
alcohol brand consumption data were
collected from patients who drank within
six hours of presentation at the hospital.
Therefore, the results of this study
can really only be generalized to
residents of Baltimore,
since certain brands may be more easily
available in this area than others due to
national brand market share.
Similarly, the alcohol consumption
habits of people who live in this area
may differ from those of people
everywhere else in the world.
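The generalizability point can be sketched with a small simulation. Everything here is hypothetical: the two regions, the brand called "Brand A", and the 60% and 20% preference rates are invented for illustration and are not taken from the study. The sketch compares an estimate from a sample drawn in one area only with an estimate from a random sample of the whole population.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: each person is (region, prefers_brand_a).
# The 60% and 20% preference rates are assumptions for this sketch.
population = (
    [("local", rng.random() < 0.60) for _ in range(10_000)]
    + [("elsewhere", rng.random() < 0.20) for _ in range(90_000)]
)

def share_brand_a(people):
    """Fraction of the given people who prefer the hypothetical Brand A."""
    return sum(prefers for _, prefers in people) / len(people)

true_share = share_brand_a(population)

# Sample taken only from the local area, like patients at a single hospital
local_only = [person for person in population if person[0] == "local"]
one_area_sample = rng.sample(local_only, 500)

# Simple random sample from the whole population
random_sample = rng.sample(population, 500)

print("whole population:", round(true_share, 3))
print("one-area sample: ", round(share_brand_a(one_area_sample), 3))
print("random sample:   ", round(share_brand_a(random_sample), 3))
```

The one-area sample describes the local area well, but it lands far from the figure for the whole population, while the random sample lands close to it. That is why results from a single hospital are best generalized only to the area it serves.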
Now that you are a little more familiar
with how to approach statistical studies,
let's give a brief overview of
what's to come in this unit.
We will start by defining
populations of interest,
then discuss methods of taking
samples from these populations and
designing studies that can best
answer particular research questions.
We will also learn to identify
the scope of inference for a study,
such as whether we can make causal
versus correlational statements, and
when we can generalize our conclusions
to the population at large.
We will also learn methods of
exploratory data analysis,
such as data visualizations and
summary statistics.
And we will wrap up the unit with
a light, simulation-based introduction to
statistical inference.
