
Statistics for Researchers

Observational Studies 1
Rex and Jane Galbraith

Department of Statistical Science

University College London, January 2017

Introduction 2
  Observational Studies 3
  Observational studies versus experiments 4
  Why do we need observational studies? 5
  Fluoridation and cancer mortality 6
  The Connecticut crackdown on speeding 7
  Connecticut traffic fatalities 1951–1959 8
  Road fatality rates for Connecticut and “control” states 9

Design and interpretation 10
  Problems of non-experimental studies 11
  Examples of observational biases 12
  Regression to the mean 13
  Examples 14
  Lothian road accident data — a curious paradox? 15
  Evidence that the worst sites are improving? 16
  Evidence that the worst sites are deteriorating? 17
  But neither is happening 18
  Confounding – does smoking prolong life? 19
  Broken down by age 20
  Confounding: examples and remedies 21
  Simpson’s Paradox 22
  Explanation? 23
  Another situation 24
  Simpson’s paradox: how to handle? 25
  Causal diagrams 26
  The same numbers in a different context 27
  Difficulty 28

Study plans 29
  Cross-sectional, prospective and retrospective studies 30
  What can we estimate? 31
  How to interpret? 32

Odds ratios and risk ratios 33
  Probabilities and odds 34
  Odds ratios 35
  Odds ratios from prospective and retrospective studies 36
  For example: 37
  Risk ratios (or relative risks) 38
  Numerical values of p1 for given p0 and ψ 39
  Comparing proportions or probabilities 40
  Comparing proportions (continued) 41
  Exercise 42

Introduction 2 / 42

Observational Studies
No Experimental Manipulation

sample survey
administrative records
convenience or “happenstance” sample

Purpose?

description of “target population”


prediction of properties of further cases
learning about underlying “laws”
causal inference

3 / 42

Observational studies versus experiments


Example: compare two medical treatments A and B
Experiment: Each eligible patient is assigned to one of A or B by an objective randomisation procedure.
Subsequent care is closely standardised and a clearly defined response (e.g., death within one
year) is measured for each patient.
Suppose that the two groups of responses differ by more than can reasonably be ascribed to chance. Then we
may reasonably conclude (provided that the experiment has been correctly administered and analysed) that the
difference between the groups is a consequence of the difference between A and B.

Pure observational study: From hospital records, information is assembled on the same response
variable for two groups of patients, one group having received treatment A and the other treatment
B.
Suppose again that there is a clear difference in responses between groups. What can we conclude?
Although the difference is unlikely to be due to “chance” there are now many other possible explanations of it, in
addition to a treatment effect.
Note that the data here might “look” exactly the same as that for the experiment.

4 / 42

Why do we need observational studies?
Impossible to do an experiment

– Ethical obstacles (e.g., smoking and lung cancer)


– Practical difficulties (e.g., CO2 emission and global warming)
– Logical difficulties (e.g., gender and mathematical ability)

Economic reasons

– Already have data


– Experiment too costly to mount.

Limitations of experiments

– Restricted conditions and factor levels (results may not generalise to “field” conditions)

5 / 42

Fluoridation and cancer mortality


Burk and Yiamouyiannis (1975) looked at cancer mortality in 10 US cities that had fluoridated
water supplies and in 10 other US cities that had not been fluoridated.
In the fluoridated cities, cancer mortality had increased by 20% while in the non-fluoridated cities
it had increased by only 10%. They concluded that this provided evidence that fluoridation
causes cancer.

Oldham and Newell (1977) re-analysed the data, taking into account the differing age-sex-race
compositions of the different cities (for it is known that cancer mortality depends on these
factors). They concluded that there was no evidence of a link between fluoridation and cancer
mortality.
In fact, the excess cancer rate (over the national average) had increased by 4% in the
non-fluoridated cities and by only 1% in the fluoridated cities. (Applied Statistics, 26, 125-135.)
6 / 42

The Connecticut crackdown on speeding


Late in 1955, the governor of Connecticut imposed a programme of mandatory suspension of
driving licences for speeding. In that year 324 people were killed on Connecticut’s roads.

In the next year, 1956, the road death toll was only 284. The crackdown was hailed as a
success; the governor stated

“With the saving of 40 lives in 1956, a reduction of 12.3% from the 1955 motor vehicle
death toll, we can say that the program is definitely worthwhile.”

7 / 42

Connecticut traffic fatalities 1951–1959

8 / 42

Road fatality rates for Connecticut and “control” states

Fatality rate = number of road fatalities per 100,000 persons per year

9 / 42

Design and interpretation 10 / 42

Problems of non-experimental studies


Selection Bias — sample not representative of target population.
e.g., investigation into attitudes of carers for elderly people, with interviews conducted at a
University hospital.

Inadequate Controls — bias or time-trends make comparisons difficult.


e.g., how to choose controls in case-control studies; how to assess effect of seat belt legislation on
road accidents.

Uncontrolled before/after studies — “regression to the mean”


e.g., select road intersections with high accident rate for trial of a new signalling system. Even if the
new system is ineffective, an improvement is to be expected.

Confounding — effects of interest may be systematically confounded with other effects


e.g., in the fluoridation example, age, sex and race are confounding variables.

11 / 42

Examples of observational biases


• Oscar winners live longer than actors who have not won an Oscar (Annals of Internal
Medicine, 2001)
• Life expectancy of famous orchestral conductors is longer “due to arm exercise”
• Do left-handed people have a shorter life span? (NEJM, 1991)
• Abraham Wald: reinforcing fighter aircraft in WW2
• Measurements of fission track lengths: length-biased sampling
• Publication bias in meta-analyses and literature reviews
• Daryl Bem and precognition
• Efficacy of the antidepressant reboxetine

12 / 42

Regression to the mean
If a variable is measured for several individuals on two occasions then, other things being equal,
individuals with a high first value will tend to have a lower second value, and individuals with a
low first value will tend to have a higher second value.
This phenomenon (discovered by Francis Galton) of regression to the mean is powerful,
pervasive and widely misunderstood.
It is a consequence of natural or statistical variation.
The essential phenomenon is that a value may be high partly because of inherent size but
also partly because “random” effects have conspired to produce a higher value than might
otherwise have occurred. On the second occasion the latter effect is not repeated. One can
also regard this as a form of selection bias — the first value is selected precisely because it is
high.
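The mechanism described above can be illustrated with a small simulation. This is only a sketch under assumed conditions that are not in the original slides: each individual has a fixed "inherent" level, and each measurement adds independent normal noise of the same size. The function name and parameters are hypothetical.

```python
import random
import statistics

def simulate_regression_to_mean(n=10_000, top_fraction=0.1, seed=1):
    """Measure n individuals twice and follow those with the highest first value.

    Assumption: measurement = inherent level + independent N(0, 1) noise.
    Nothing changes between the two occasions except the noise.
    """
    rng = random.Random(seed)
    inherent = [rng.gauss(0, 1) for _ in range(n)]
    first = [m + rng.gauss(0, 1) for m in inherent]
    second = [m + rng.gauss(0, 1) for m in inherent]

    # Select individuals precisely because their first value is high.
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: first[i], reverse=True)[:k]

    mean_first = statistics.mean(first[i] for i in top)
    mean_second = statistics.mean(second[i] for i in top)
    return mean_first, mean_second

mean_first, mean_second = simulate_regression_to_mean()
print(f"selected group, first occasion:  {mean_first:.2f}")
print(f"selected group, second occasion: {mean_second:.2f}")
```

In this sketch the selected group's second mean falls back towards the overall mean of zero, although no individual's inherent level has changed.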

13 / 42

Examples
Screening selection. If patients are selected for treatment because of some extreme value (e.g., high
blood pressure) then, even if the treatment has no effect, their average blood pressure is likely to be
lower after treatment.

Educational tests. Army recruits are given a test. Those with high marks are praised, those with low
marks are threatened with failure. In a subsequent test, those who were initially praised mostly get
lower marks and those who were threatened get higher marks. Is it counter-productive to give praise
and helpful to threaten?

Batting averages. Look at the batsman with the highest average mid-way through the cricket season.
Now look at his average for the second half of the season — this is likely to be lower. Why?

Stature of men. Sons of tall fathers are on average tall, but (on average) not as tall as their fathers. Sons
of short fathers are on average short, but (on average) not as short as their fathers. (Does this imply
that, after several generations, men will all be of similar stature?)

14 / 42

Lothian road accident data — a curious paradox?
Frequency distribution of road accident sites, cross-classified by the numbers of accidents at each in two
time periods:

Quoted in Senn and Collie (1988) Road traffic Engineering and Control, 168–169.
15 / 42

Evidence that the worst sites are improving?

If you classify the sites by the number of accidents they had in 79/80 it seems that the worst
sites have got better. (Lower accident rate in the second period.)
16 / 42

Evidence that the worst sites are deteriorating?

On the other hand, if you classify the sites by the number of accidents they had in 81/82 it seems
that the worst sites have got worse. (Lower accident rate in the first period.)
17 / 42

But neither is happening . . .

In fact, looking at the data as a whole shows that the accident rates are very stable: the mean
number of accidents per site is 0.976 in 79/80 and 0.964 in 81/82. This is another case of regression
to the mean, and the unnatural graphical presentation adds to the confusion.
18 / 42

Confounding – does smoking prolong life?
Here are some mortality rates from a 20-year study (1974–1994) of 1314 British women (a
sample drawn from the 1974 electoral roll in Whickham, UK), classified by smoking status, where
n is the number in each group and d is the number who died (from any cause) within 20 years:

                 n      d     d/n
All women     1314    369   28.1%
Smokers        582    139   23.9%
Non-smokers    732    230   31.4%

The mortality rate is higher for the non-smokers.


19 / 42

Broken down by age


Here are the rates for the data broken down into three age groups (i.e., age in 1974):

                           n      d     d/n
Age 18–34   Smokers      179      5    2.8%
            Non-smokers  219      6    2.7%
Age 35–64   Smokers      354     92   26.0%
            Non-smokers  320     59   18.4%
Age 65+     Smokers       49     42   85.7%
            Non-smokers  193    165   85.5%

A different story emerges — age is a confounding factor. (There are other interesting patterns
too.)
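To see why the crude comparison misleads, it helps to look at the age make-up of the two groups. The sketch below uses only the counts in the two tables above; the dictionary layout and function names are illustrative choices, not part of the original study.

```python
# (n, d) = (number of women, number who died) by age group and smoking status,
# copied from the two tables above.
counts = {
    "18-34": {"smokers": (179, 5), "non_smokers": (219, 6)},
    "35-64": {"smokers": (354, 92), "non_smokers": (320, 59)},
    "65+": {"smokers": (49, 42), "non_smokers": (193, 165)},
}

def crude_rate(group):
    """Overall death rate for one smoking group, ignoring age."""
    n = sum(counts[age][group][0] for age in counts)
    d = sum(counts[age][group][1] for age in counts)
    return d / n

def age_share(group, age):
    """Fraction of a smoking group that falls in the given age band."""
    n_total = sum(counts[a][group][0] for a in counts)
    return counts[age][group][0] / n_total

for group in ("smokers", "non_smokers"):
    print(f"{group}: crude rate {crude_rate(group):.1%}, "
          f"aged 65+ {age_share(group, '65+'):.1%}")
```

A much larger share of the non-smokers were aged 65 or over in 1974, and that age band has by far the highest death rate whatever the smoking status.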
20 / 42

Confounding: examples and remedies


Examples

• fluoridation of water supplies and cancer


• oral contraceptives and heart attacks
• oral contraceptives and cervical cancer

Remedies

• restriction
• matching
• stratification (adjustment by sub-classification)
• regression

These do not always work perfectly.


21 / 42

Simpson’s Paradox
Hypothetical data for 400 men and 400 women:

                      Recover?
                    yes     no   total   Rate
Males:
  Drug?   yes        70     30     100    70%
          no        180    120     300    60%

Females:
  Drug?   yes        90    210     300    30%
          no         20     80     100    20%

All:
  Drug?   yes       160    240     400    40%
          no        200    200     400    50%

So the drug is beneficial for men and for women, but harmful for patients of unknown sex!
22 / 42

Explanation?
                 Drug?
            yes     no   total   Rate
Males       100    300     400    25%
Females     300    100     400    75%

                Recover?
            yes     no   total   Rate
Males       250    150     400   62.5%
Females     110    290     400   27.5%

Men are under-represented in the treated (drug) group (only 25% of men took the drug, against 75%
of women), but men fare better than women whether on the drug or not.
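The reversal can be checked directly from the hypothetical counts. The following is a minimal sketch; the data layout and function names are illustrative only.

```python
# (recovered, total) for each sex and treatment, from the hypothetical tables.
counts = {
    "male": {"drug": (70, 100), "no_drug": (180, 300)},
    "female": {"drug": (90, 300), "no_drug": (20, 100)},
}

def recovery_rate(sex, treatment):
    """Recovery rate within one sex."""
    recovered, total = counts[sex][treatment]
    return recovered / total

def pooled_rate(treatment):
    """Recovery rate with the two sexes added together."""
    recovered = sum(counts[sex][treatment][0] for sex in counts)
    total = sum(counts[sex][treatment][1] for sex in counts)
    return recovered / total

for sex in counts:
    print(sex, recovery_rate(sex, "drug"), recovery_rate(sex, "no_drug"))
print("pooled", pooled_rate("drug"), pooled_rate("no_drug"))
```

Within each sex the drug rate is higher, yet the pooled drug rate is lower, because the drug group is mostly women and women recover less often regardless of treatment.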
23 / 42

Another situation
A similar phenomenon can arise with group means:
ȳ = mean score on cognitive test
n = number of people

                     Males   Females    All
No drug        n       300       100    400
               ȳ        24         8     20
Drug           n       100       300    400
               ȳ        28        12     16
Both groups    n       400       400    800
               ȳ        25        11     18
24 / 42

Simpson’s paradox: how to handle?


Think predictively: should the drug be given to a new patient?

1. Sex known: use the sex-specific table.

2. Sex unknown (!): use prob(male) = prob(female) = 1/2. Then

$$\text{prob}(\text{recover} \mid \text{no drug}) = \tfrac{1}{2} \times 0.60 + \tfrac{1}{2} \times 0.20 = 0.40$$

$$\text{prob}(\text{recover} \mid \text{drug}) = \tfrac{1}{2} \times 0.70 + \tfrac{1}{2} \times 0.30 = 0.50$$

This is standardisation (footnote a).

Use a causal model to guide interpretation (figure).

25 / 42
a. In general one can use prob(male) = p and prob(female) = 1 − p, where p need not equal 1/2.
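The standardisation step is a weighted average of the sex-specific recovery rates. A minimal sketch follows, with a hypothetical function name and the general weight p = prob(male) from the footnote as a parameter.

```python
def standardised_rate(rate_male, rate_female, p_male=0.5):
    """Recovery probability for a patient of unknown sex.

    p_male is the assumed probability that the new patient is male;
    the default of 1/2 matches the calculation in the text.
    """
    return p_male * rate_male + (1 - p_male) * rate_female

no_drug = standardised_rate(0.60, 0.20)
drug = standardised_rate(0.70, 0.30)
print(f"no drug: {no_drug:.2f}, drug: {drug:.2f}")

# The drug stays ahead for any choice of p_male, because it is ahead in both sexes.
for p in (0.0, 0.25, 0.75, 1.0):
    print(p, standardised_rate(0.70, 0.30, p) > standardised_rate(0.60, 0.20, p))
```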

Causal diagrams

26 / 42

The same numbers in a different context


                       Survival?
                    yes     no   total   Rate
No heart disease:
  Smoking?  yes      70     30     100    70%
            no      180    120     300    60%

Heart disease:
  Smoking?  yes      90    210     300    30%
            no       20     80     100    20%

All:
  Smoking?  yes     160    240     400    40%
            no      200    200     400    50%

Here it is not appropriate to stratify by heart disease, because heart disease may itself be affected
by smoking and is relevant to survival (figure).

In this case, the All table is more relevant — but does this compare like with like? Are there
pre-disposing genetic factors?
27 / 42

Difficulty
There may be further unmeasured individual characteristics (age, location, general health,
genetic factors, . . .) that are associated with outcome and confounded with the treatment (or
factor of interest).

In controlled experiments, this difficulty is avoided by the use of randomisation.

28 / 42

Study plans 29 / 42

Cross-sectional, prospective and retrospective studies


Hypothetical data to study the relation between smoking (S) and lung cancer (L):

           L     L̄   Total
S         40    100     140
S̄         30    300     330
Total     70    400     470

Cross-sectional: Random sample of 470 people, classified by both S/S̄ and L/L̄.

Prospective: Random samples of 140 smokers and 330 non-smokers, classified by L/L̄.

Retrospective: Random samples of 70 people with lung cancer and 400 without, classified by
S/S̄.

30 / 42

What can we estimate?
In either the cross-sectional or prospective study, we can estimate the probability of lung cancer among
smokers:

$$\text{estimated prob}(L \mid S) = \frac{40}{140} = 29\%$$

and among non-smokers:

$$\text{estimated prob}(L \mid \bar{S}) = \frac{30}{330} = 9\%$$

This is direct information.

These are not available in the retrospective study. Instead, we have

$$\text{estimated prob}(S \mid L) = \frac{40}{70} = 57\%$$

$$\text{estimated prob}(S \mid \bar{L}) = \frac{100}{400} = 25\%$$

This is indirect information (values depending on the study design).
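The four proportions above are row and column proportions of the same 2x2 table. The sketch below computes them and then, as an assumed illustration that is not in the original, doubles the number of people sampled without lung cancer to show which quantities depend on the retrospective sampling plan.

```python
def row_and_column_proportions(a, b, c, d):
    """Proportions from a 2x2 table laid out as

               L    not-L
        S      a      b
        not-S  c      d
    """
    return {
        "L|S": a / (a + b),
        "L|notS": c / (c + d),
        "S|L": a / (a + c),
        "S|notL": b / (b + d),
    }

original = row_and_column_proportions(40, 100, 30, 300)

# Hypothetical retrospective design: twice as many people without lung cancer,
# with the same smoking proportions among them.
more_controls = row_and_column_proportions(40, 200, 30, 600)

for key in original:
    print(f"{key:7s} {original[key]:.3f} {more_controls[key]:.3f}")
```

The column proportions S|L and S|notL are unchanged, while the row proportions shift with the number of controls chosen, which is why a retrospective study gives no direct estimate of prob(L|S).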

31 / 42

How to interpret?
It can be shown that
prob(S|L) > prob(S|L̄)
if, and only if,
prob(L|S) > prob(L|S̄)

So we can get some qualitative knowledge — evidence of the existence of an association, and
its sign

— and a little quantitative knowledge: the odds ratio (see below).
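One way to see the equivalence stated above is to write both inequalities in terms of the four cell probabilities. The symbols a, b, c, d are introduced here for the derivation only and are assumed to be strictly positive.

```latex
% a = prob(S and L),      b = prob(S and not-L),
% c = prob(not-S and L),  d = prob(not-S and not-L)
\begin{align*}
\mathrm{prob}(L \mid S) > \mathrm{prob}(L \mid \bar{S})
  &\iff \frac{a}{a+b} > \frac{c}{c+d}
   \iff a(c+d) > c(a+b)
   \iff ad > bc, \\
\mathrm{prob}(S \mid L) > \mathrm{prob}(S \mid \bar{L})
  &\iff \frac{a}{a+c} > \frac{b}{b+d}
   \iff a(b+d) > b(a+c)
   \iff ad > bc.
\end{align*}
```

Both inequalities reduce to the same condition, ad > bc, so each holds exactly when the other does.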


32 / 42

Odds ratios and risk ratios 33 / 42

Probabilities and odds


The odds of an event with probability $p$ is the ratio $\dfrac{p}{1-p}$.

e.g., if $p = 0.8$ then the odds is $\dfrac{0.8}{0.2} = 4$ (i.e., “4 to 1”).

In terms of the odds, the probability is $p = \dfrac{\text{odds}}{1 + \text{odds}}$.

e.g., if the odds is 5 to 1 then the probability is $p = \dfrac{5}{1+5} = \dfrac{5}{6}$;
if the odds is 5 to 2 then the probability is $p = \dfrac{5/2}{1 + 5/2} = \dfrac{5}{2+5} = \dfrac{5}{7}$.

Note that odds = 1 corresponds to probability = $\tfrac{1}{2}$,
odds greater than 1 correspond to probabilities greater than $\tfrac{1}{2}$, and
odds less than 1 correspond to probabilities less than $\tfrac{1}{2}$.
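The two conversions can be written as a pair of small functions. This sketch uses exact fractions so that the worked examples above come out exactly; the function names are illustrative.

```python
from fractions import Fraction

def odds_from_prob(p):
    """Odds p / (1 - p) of an event with probability p (p must be below 1)."""
    return p / (1 - p)

def prob_from_odds(odds):
    """Probability odds / (1 + odds) corresponding to the given odds."""
    return odds / (1 + odds)

print(odds_from_prob(Fraction(4, 5)))   # p = 0.8
print(prob_from_odds(Fraction(5, 1)))   # odds of 5 to 1
print(prob_from_odds(Fraction(5, 2)))   # odds of 5 to 2
print(prob_from_odds(Fraction(1, 1)))   # odds of 1
```

The two functions are inverses of each other, so converting a probability to odds and back returns the original value.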

34 / 42

Odds ratios
Odds ratios are used to compare two risks: for example, the risks of lung cancer (L) for smokers (S)
and non-smokers (S̄). Let $p_1 = \text{prob}(L \mid S)$ and $p_0 = \text{prob}(L \mid \bar{S})$.

The two odds are $p_1/(1-p_1)$ and $p_0/(1-p_0)$, and the odds ratio is

$$\psi = \frac{p_1/(1-p_1)}{p_0/(1-p_0)} = \frac{p_1}{p_0} \times \frac{1-p_0}{1-p_1}.$$

If $p_1 = p_0$ then $\psi = 1$; if $p_1 > p_0$ then $\psi > 1$; and if $p_1 < p_0$ then $\psi < 1$.

To interpret an odds ratio quantitatively, express $p_1$ in terms of $\psi$ and $p_0$:

$$p_1 = \frac{\psi p_0}{1 - p_0 + \psi p_0}$$

e.g., if $\psi = 2$ and $p_0 = 0.2$ then $p_1 = 0.33$; or if $\psi = 2$ and $p_0 = 0.4$ then $p_1 = 0.57$, etc.

An odds ratio by itself carries only a little information. We also need to know the “baseline risk”
$p_0$.
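The expression for p1 in terms of ψ and p0 follows from the definition of the odds ratio by solving for p1. The steps, written out as a short derivation:

```latex
\begin{align*}
\psi = \frac{p_1 (1 - p_0)}{p_0 (1 - p_1)}
  &\implies \psi\, p_0 (1 - p_1) = p_1 (1 - p_0) \\
  &\implies \psi\, p_0 = p_1 \left( 1 - p_0 + \psi\, p_0 \right) \\
  &\implies p_1 = \frac{\psi\, p_0}{1 - p_0 + \psi\, p_0}.
\end{align*}
```

As a check, ψ = 2 and p0 = 0.2 give p1 = 0.4 / 1.2 = 0.33, matching the first example above.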
35 / 42

Odds ratios from prospective and retrospective studies
To compare the risks of lung cancer for smokers and non-smokers, the odds ratio is

$$\frac{\text{prob}(L \mid S)\,/\,\text{prob}(\bar{L} \mid S)}{\text{prob}(L \mid \bar{S})\,/\,\text{prob}(\bar{L} \mid \bar{S})}$$

This can be estimated directly from a prospective study.

But a retrospective study gives us estimates of the odds ratio

$$\frac{\text{prob}(S \mid L)\,/\,\text{prob}(\bar{S} \mid L)}{\text{prob}(S \mid \bar{L})\,/\,\text{prob}(\bar{S} \mid \bar{L})}$$

i.e., the ratio of the odds of being a smoker amongst those with lung cancer to the odds of being
a smoker amongst those without lung cancer.

However, it can be shown that these two odds ratios are equal! So by estimating the latter we are
also estimating the former.
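A sketch of why the two odds ratios coincide: each conditional odds can be rewritten as a ratio of joint probabilities, because the conditioning probability cancels. Here prob(L, S) denotes the probability of having lung cancer and being a smoker, and all four joint probabilities are assumed positive.

```latex
\begin{align*}
\frac{\mathrm{prob}(L \mid S) / \mathrm{prob}(\bar{L} \mid S)}
     {\mathrm{prob}(L \mid \bar{S}) / \mathrm{prob}(\bar{L} \mid \bar{S})}
  &= \frac{\mathrm{prob}(L, S) / \mathrm{prob}(\bar{L}, S)}
          {\mathrm{prob}(L, \bar{S}) / \mathrm{prob}(\bar{L}, \bar{S})}
   = \frac{\mathrm{prob}(L, S)\, \mathrm{prob}(\bar{L}, \bar{S})}
          {\mathrm{prob}(\bar{L}, S)\, \mathrm{prob}(L, \bar{S})}, \\[1ex]
\frac{\mathrm{prob}(S \mid L) / \mathrm{prob}(\bar{S} \mid L)}
     {\mathrm{prob}(S \mid \bar{L}) / \mathrm{prob}(\bar{S} \mid \bar{L})}
  &= \frac{\mathrm{prob}(L, S) / \mathrm{prob}(L, \bar{S})}
          {\mathrm{prob}(\bar{L}, S) / \mathrm{prob}(\bar{L}, \bar{S})}
   = \frac{\mathrm{prob}(L, S)\, \mathrm{prob}(\bar{L}, \bar{S})}
          {\mathrm{prob}(\bar{L}, S)\, \mathrm{prob}(L, \bar{S})}.
\end{align*}
```

Both reduce to the same cross-product ratio of the four joint probabilities.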
36 / 42

For example:
Consider the original data on smoking and lung cancer:

           L     L̄   Total
S         40    100     140
S̄         30    300     330
Total     70    400     470

For a prospective study: odds ratio $= \dfrac{40/100}{30/300} = 4$

For a retrospective study: odds ratio $= \dfrac{40/30}{100/300} = 4$

The result is the same either way.

But we still can’t estimate the baseline risk from a retrospective study.
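The same arithmetic can be checked with exact fractions. In this sketch the names a, b, c, d are labels chosen here for the four cells of the table (a = S and L, b = S and not-L, c = not-S and L, d = not-S and not-L).

```python
from fractions import Fraction

# Cell counts from the smoking and lung cancer table.
a, b, c, d = 40, 100, 30, 300

# Prospective view: odds of L among smokers over odds of L among non-smokers.
prospective_or = Fraction(a, b) / Fraction(c, d)

# Retrospective view: odds of S among cases over odds of S among non-cases.
retrospective_or = Fraction(a, c) / Fraction(b, d)

# Both equal the cross-product ratio ad / bc.
cross_product = Fraction(a * d, b * c)

print(prospective_or, retrospective_or, cross_product)
```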
37 / 42

Risk ratios (or relative risks)
The ratio p1 /p0 is called the risk ratio or relative risk. It has a more direct interpretation than an
odds ratio, but again we really need to know p0 (the baseline risk) also in order to appreciate it.
If p0 and p1 are both small, then the odds ratio and risk ratio are numerically very similar:

e.g., if p0 = 0.01 and p1 = 0.02 the risk ratio = 2 and the odds ratio = 2.02.

In such cases an odds ratio can be interpreted as a risk ratio.


But if p0 and p1 are not small, then the odds ratio and risk ratio are quite different:
e.g., if p0 = 0.2 and p1 = 0.4 the risk ratio = 2 but the odds ratio = 2.67.

Note: it is a common mistake to interpret an odds ratio as a risk ratio. They are different — often
very different — and are only similar in value when p0 and p1 are small.
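The two numerical examples above can be reproduced with a short sketch; the function names are illustrative.

```python
def risk_ratio(p1, p0):
    """Risk ratio (relative risk) p1 / p0."""
    return p1 / p0

def odds_ratio(p1, p0):
    """Odds ratio [p1 / (1 - p1)] / [p0 / (1 - p0)]."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

for p0, p1 in [(0.01, 0.02), (0.2, 0.4)]:
    print(f"p0={p0}, p1={p1}: "
          f"risk ratio {risk_ratio(p1, p0):.2f}, odds ratio {odds_ratio(p1, p0):.2f}")
```

The odds ratio equals the risk ratio multiplied by (1 − p0)/(1 − p1), and that factor is close to 1 only when both risks are small.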
38 / 42

Numerical values of p1 for given p0 and ψ


                  odds ratio ψ
p0       1.1     1.2     1.5     2.0     5.0
.001    .0011   .0012   .0015   .002    .0050
.002    .0022   .0024   .0030   .004    .0099
.005    .0055   .0060   .0075   .010    .0245
.01     .011    .012    .015    .020    .0481
.02     .022    .024    .030    .039    .093
.05     .055    .059    .073    .095    .208
.10     .109    .118    .143    .182    .357
.20     .216    .231    .273    .333    .556
.50     .524    .545    .600    .667    .833
For small values of p0 (and moderate ψ ) you can see that p1 approximately equals ψp0 , but for
large p0 this does not hold.
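A table of this kind can be regenerated from the formula p1 = ψp0 / (1 − p0 + ψp0). The sketch below prints every entry to four decimal places, so its rounding differs from the mixed precision of the table above; the function name is hypothetical.

```python
def p1_from(p0, psi):
    """Risk p1 implied by baseline risk p0 and odds ratio psi."""
    return psi * p0 / (1 - p0 + psi * p0)

odds_ratios = [1.1, 1.2, 1.5, 2.0, 5.0]
baseline_risks = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.10, 0.20, 0.50]

# Header row, then one row per baseline risk.
print("p0     " + "".join(f"{psi:>8.1f}" for psi in odds_ratios))
for p0 in baseline_risks:
    row = "".join(f"{p1_from(p0, psi):>8.4f}" for psi in odds_ratios)
    print(f"{p0:<7.3f}" + row)
```

Comparing each row with ψp0 shows where the approximation p1 ≈ ψp0 starts to fail as p0 grows.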
39 / 42

Comparing proportions or probabilities
Proportions of deaths (within one year) from cancer and from heart disease, for cigarette smokers and
non-smokers (from Doll and Peto, 1976, BMJ, 2, 1525–1536):
                                        RR      OR      RD
from cancer
  smokers        pS = .00140
  non-smokers    pN = .00010          14.0    14.0     130

from heart disease
  smokers        pS = .00669
  non-smokers    pN = .00413          1.62    1.62     256

$$RR = \frac{p_S}{p_N} = \text{risk ratio}$$

$$OR = \frac{p_S}{p_N} \times \frac{1 - p_N}{1 - p_S} = \text{odds ratio}$$

$$RD = (p_S - p_N) \times 10^5 = \text{risk difference per 100,000 per year}$$

40 / 42

Comparing proportions (continued)


Because the probabilities of death (within one year) are small, the odds ratio is practically equal
to the risk ratio.

The risk ratio is much higher for cancer than for heart disease.
A smoker is 14 times as likely as a non-smoker to die of cancer within one year, but only
1.62 times as likely to die of heart disease.

But the actual risks (pS and pN ) are higher for heart disease.

And the risk difference is higher for heart disease (256 per 100,000 person years) than for
cancer (130 per 100,000 person years).
Among 100,000 smokers and 100,000 non-smokers, 256 more smokers than non-smokers die of heart
disease within one year, but only 130 more die of cancer.
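The three summary measures can be recomputed from the quoted proportions. This is a small sketch; the function and variable names are illustrative.

```python
def compare(p_smokers, p_non_smokers):
    """Return (risk ratio, odds ratio, risk difference per 100,000 per year)."""
    rr = p_smokers / p_non_smokers
    odds_ratio = rr * (1 - p_non_smokers) / (1 - p_smokers)
    rd = (p_smokers - p_non_smokers) * 1e5
    return rr, odds_ratio, rd

cancer = compare(0.00140, 0.00010)
heart = compare(0.00669, 0.00413)

for label, (rr, odds_ratio, rd) in [("cancer", cancer), ("heart disease", heart)]:
    print(f"{label}: RR {rr:.2f}, OR {odds_ratio:.2f}, RD {rd:.0f} per 100,000")
```

The ratio measures rank cancer as the larger effect, while the difference measure ranks heart disease as the larger one, so the choice of measure matters for interpretation.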

41 / 42

Exercise
The data below refer to a study of 262 young and middle-aged women who were admitted to 30 coronary
care units in Northern Italy with acute MI during the period 1983–1988 (footnote a). Each case was matched
with two control patients admitted to the same hospitals with other acute disorders. All patients were
classified according to whether they had ever been smokers. Here are the numbers:

              Ever smoker
              Yes     No
MI cases      172     90
Controls      173    346

1. What type of study is this?


2. Calculate the row totals and the proportions of ever smokers separately for MI cases and for controls, along with
their standard errors, and add them to the above table.
3. Is it possible to estimate the risk of MI for ever smokers (in this group) and never smokers? If so, do so. If not,
why not?
4. Is it possible to estimate the odds ratio of MI for ever and never smokers? If so, do so and calculate its
approximate standard error.
5. Can the odds ratio be interpreted as a risk ratio in this case? Explain your answer.
6. Write a one-sentence summary of the result of this study.

42 / 42
a. Source: J. Epidemiol. and Commun. Health, 43, 214–217 (1989)
