
SAMPLE SIZE ESTIMATION
Community Medicine
North Bengal Medical College

Research Process

Research Planning
Hypothesis and Aims
Research Design
Data Collection
Organization and Presentation
Data Analysis
Interpretation and Conclusion
Publication

Research Design

Study Type and Design


Sampling Method
Sample Size

Why Estimate Sample Size?


Too small a sample: may fail to answer the question, or answer it imprecisely.
Too big a sample: may answer the question but be logistically difficult or costly.
The goal: to estimate an appropriate number of participants, given the study design, that will give reasonably precise values with adequate power.

So, the Sample

Must be of optimum size and large enough to give a valid estimate of population characteristics.
There is no magic number that we can point to as an optimum sample size.
Nor can we say what percentage of the population should be sampled.

Factors Determining Sample Size

Nature of universe
Type of study
Sampling technique
Magnitude of problem
Confidence interval
Design effect
Anticipated dropouts
Precision & power of the study

SOME TERMINOLOGIES
- For Understanding Sample Size
Estimation

Hypotheses
Hypothesis: a prediction about the outcome of
research
Hypothesis testing is a procedure that uses sample data to evaluate a hypothesis about a population parameter (e.g. mean, standard deviation, proportion)
Briefly, we make a decision about the
hypothesis on the basis of our sample data.

Types of Hypotheses
Null Hypothesis (H0):
a statement, usually claiming zero difference, which the researcher tries to disprove, reject or nullify.
(The mean weights of males and females are not different.)

Alternative Hypothesis (H1):
the statement we actually want to test; usually postulates a non-zero difference or relationship.
(The mean weights of males and females are different.)

Directional Hypotheses
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$ (two-sided test)
$H_1: \mu_1 > \mu_2$ or $\mu_1 < \mu_2$ (one-sided test)

Errors in Hypothesis Testing

Type I (α) Error & Confidence Level
Type I Error (α error)
Rejecting H0 when it is actually true
Concluding a difference when actually no difference exists

Confidence Level:
The probability that an estimate of a population parameter is within certain specified limits of the true value; commonly denoted by 1 - α.

p Value and Significance Level
p-value:
The probability of obtaining a sample with a test statistic like ours, or a more extreme one, provided that H0 is true.
The smaller the p-value, the stronger the evidence against H0.
The p-value is compared with α, the accepted probability of a type I error.

Significance level (α)
An arbitrary probability threshold declared a priori.
Cut-off point for the p-value, below which H0 will be rejected.
Typically set at 5% (i.e. α = 0.05; confidence level 1 - α = 0.95).

Type II (β) Error and Power
Type II Error (β error)
Accepting (failing to reject) H0 when it is actually false
Concluding no difference when one does exist

Power:
Probability of detecting a difference if one exists
It is commonly denoted by 1 - β
Commonly used power: 80% or 90%
Interpretation: 80% power means there is an 80% chance that a true effect/difference will be found and a 20% chance that a true effect will be missed (β error).

Zα and Zβ
Zα: Z value required for a chosen level of α (Type I error)
Zβ: Z value required for a chosen level of β (Type II error) or a chosen level of power (1 - β)

Level of α (p value)    Zα (two-sided)
0.05                    1.96
0.01                    2.58
0.001                   3.29

Level of β              Zβ
0.10                    1.28
0.15                    1.04
0.20                    0.84
0.25                    0.67
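The tabulated Z values can be reproduced from the standard normal distribution. The following is a minimal sketch, assuming the α values are two-sided (which is what gives 1.96 for α = 0.05); the helper names `z_alpha` and `z_beta` are illustrative and not part of the slides.

```python
from statistics import NormalDist


def z_alpha(alpha, two_sided=True):
    """Z value for a chosen Type I error level (two-sided by default)."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


def z_beta(beta):
    """Z value for a chosen Type II error level (power = 1 - beta)."""
    return NormalDist().inv_cdf(1 - beta)


for a in (0.05, 0.01, 0.001):
    print(f"alpha = {a}: Z = {z_alpha(a):.2f}")
for b in (0.10, 0.15, 0.20, 0.25):
    print(f"beta = {b}: Z = {z_beta(b):.2f}")
```

`NormalDist` is in the Python standard library (3.8+), so no extra packages are needed.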

Precision
A measure of how close an estimate is to
the true value of a population parameter.
It may be expressed in absolute terms or
relative to the estimate.
Desired width of the confidence interval
for sample estimate

Design Effect
A measure of the extra variability introduced when study subjects are selected by any sampling method other than simple random sampling.
The calculated sample size is therefore multiplied by the design effect (usually taken as 2) to get the same precision as simple random sampling.

Standard Deviation (σ or SD)

Most frequently used measure of dispersion of data (root-mean-square deviation)
SD is given by the formula
$SD = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$
When sample size > 30, denominator n is often used instead of n - 1.
The larger the SD, the greater the dispersion of data around the mean.
SD does not shrink systematically as sample size increases; it is the standard error that decreases with larger samples.

Standard Error
When studying a population or universe, many different samples can be drawn from it.
If we calculate the mean of each sample, we would see that the sample means differ, though all the samples have been drawn from the same universe.
The mean of all the sample means will approximate the population mean. The standard deviation of the sample means is the standard error, estimated by the formula $SE = SD / \sqrt{n}$.
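As a small illustration of the relation between SD and standard error, the sketch below computes both for a made-up sample (the data values are hypothetical) and then shows how the standard error shrinks as n grows while the SD is held fixed. The helper name `standard_error` is illustrative.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD (n - 1 denominator) / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
sd = statistics.stdev(sample)
se = standard_error(sample)
print(f"SD = {sd:.3f}, SE = {se:.3f}")

# For a fixed SD, quadrupling n halves the standard error.
for n in (10, 40, 160):
    print(n, round(sd / math.sqrt(n), 3))
```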

Sample Size
Calculations

Sample Size: for various types of study
Cross-sectional Study
  One sample situation
  Two sample situation
Case Control Study
Cohort Study
Experimental Study

Cross-Sectional: One Sample Situations
Outcome measure is dichotomous variable (proportion)
  Estimating a population proportion with specified absolute precision
  Estimating a population proportion with specified relative precision
Outcome measure is continuous variable (mean, standard deviation)

Estimating a population proportion with specified absolute precision

Required information and notation
(a) Anticipated population proportion: P
(b) Confidence level: 100(1 - α)%
(c) Absolute precision required on either side of the proportion (in percentage points): d

Sample Size:
$n = \dfrac{z_{1-\alpha/2}^2 \, P(1-P)}{d^2}$

Example: Problem 1

A local health department wishes to estimate the prevalence of tuberculosis among children under five years of age in its locality.
How many children should be included in the sample so that the prevalence may be estimated to within 5 percentage points of the true value with 95% confidence, if it is known that the true rate is unlikely to exceed 20%?

Problem 1: Solution

(a) Anticipated population proportion: 20%
(b) Confidence level: 95%
(c) Absolute precision (15% to 25%): 5%
$z_{1-\alpha/2} = 1.96$
$n = \dfrac{z_{1-\alpha/2}^2 \, P(1-P)}{d^2} = \dfrac{(1.96)^2 \times 0.2 \times 0.8}{(0.05)^2} = 245.9$, i.e. 246 children
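The calculation for Problem 1 can be checked with a few lines of Python. This is a minimal sketch; the function name `n_proportion_absolute` is illustrative, and the result is rounded up to the next whole participant.

```python
import math
from statistics import NormalDist


def n_proportion_absolute(p, d, confidence=0.95):
    """Sample size to estimate a proportion p to within +/- d (absolute)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / d**2)


print(n_proportion_absolute(0.20, 0.05))  # -> 246
```

If nothing is known about P, using P = 0.5 gives the largest (most conservative) sample size, since P(1 - P) is maximal there.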

Estimating a population proportion with specified relative precision

Required information and notation
(a) Anticipated population proportion: P
(b) Confidence level: 100(1 - α)%
(c) Relative precision: ε

Sample Size:
$n = \dfrac{z_{1-\alpha/2}^2 \, (1-P)}{\varepsilon^2 P}$

Example: Problem 2

An investigator seeks to estimate the proportion of children in the country who are receiving appropriate childhood vaccinations (immunization coverage).
How many children must be studied if the resulting estimate is to fall within 10% of the true proportion with 95% confidence?
(The vaccination coverage is expected to be 50%.)

Problem 2: Solution
(a) Anticipated population proportion: 50%
(b) Confidence level: 95%
(c) Relative precision (45% to 55%): 10% (of 50%)
$z_{1-\alpha/2} = 1.96$
$n = \dfrac{z_{1-\alpha/2}^2 \, (1-P)}{\varepsilon^2 P} = \dfrac{(1.96)^2 \times 0.5}{(0.1)^2 \times 0.5} = 384.16$, i.e. about 384 children
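A minimal sketch of the relative-precision calculation follows; `n_proportion_relative` is an illustrative name. The function returns the unrounded value so that the rounding choice stays visible: the formula gives about 384.15, which the slide reports as 384, while rounding up to the next whole child gives 385.

```python
import math
from statistics import NormalDist


def n_proportion_relative(p, eps, confidence=0.95):
    """Unrounded sample size to estimate p to within eps * p (relative)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z**2 * (1 - p) / (eps**2 * p)


n = n_proportion_relative(0.50, 0.10)
print(round(n, 2), math.ceil(n))
```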

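Continuous Outcome Variable
Required information and notation
(a) Anticipated population SD: σ
(b) Confidence level: 100(1 - α)%
(c) Precision (allowable error) in the units of the mean, i.e. relative precision × anticipated mean: d

Sample Size:
$n = \dfrac{z_{1-\alpha/2}^2 \, \sigma^2}{d^2}$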

Example: Problem 3

Calculate the sample size needed to estimate the mean Hb% in a community where the anticipated mean Hb% is 10.4 gm% and the SD is 2.1 gm%.
The chosen confidence level is 95% and the relative precision (allowable error) is 5%.

Problem 3: Solution

(a) Anticipated population SD: 2.1
(b) Confidence level: 95%
(c) Allowable error (5% of the mean, 10.4): 0.52
$z_{1-\alpha/2} = 1.96$
$n = \dfrac{z_{1-\alpha/2}^2 \, \sigma^2}{d^2} = \dfrac{(1.96)^2 \times (2.1)^2}{(0.52)^2} = 62.65$, i.e. 63
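The same arithmetic in Python, as a sketch under the assumption that the 5% relative precision applies to the anticipated mean (5% of 10.4 = 0.52, the value used in the solution). The name `n_mean` is illustrative.

```python
import math
from statistics import NormalDist


def n_mean(sd, mean, rel_precision, confidence=0.95):
    """Sample size to estimate a mean to within rel_precision * mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    d = rel_precision * mean  # allowable error in the units of the mean
    return math.ceil(z**2 * sd**2 / d**2)


print(n_mean(sd=2.1, mean=10.4, rel_precision=0.05))  # -> 63
```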

Cross-Sectional: Two Sample Situations
Estimating the difference between two population proportions with specified absolute precision
Estimating the difference between two population proportions with specified relative precision

Estimating the difference between two population proportions with specified absolute precision
Required information and notation
a. Anticipated population proportions: P1 and P2
b. Confidence level: 100(1 - α)%
c. Absolute precision required on either side of the true value of the difference between the proportions (in percentage points): d
d. Intermediate value: V = P1(1 - P1) + P2(1 - P2)

Sample Size:
$n = \dfrac{z_{1-\alpha/2}^2 \, [P_1(1-P_1) + P_2(1-P_2)]}{d^2} = \dfrac{z_{1-\alpha/2}^2 \, V}{d^2}$

Example: Problem 4

In a pilot study of 50 agricultural workers in an irrigation project, it was observed that 40% had active Schistosomiasis.
A similar pilot study of 50 agricultural workers not employed on the irrigation project demonstrated that 32% had active Schistosomiasis.
If an epidemiologist would like to carry out a larger study to estimate the Schistosomiasis risk difference to within 5 percentage points of the true value with 95% confidence, how many people must be studied in each of the two groups?

Problem 4: Solution
a. Anticipated population proportions: 40%, 32%
b. Confidence level: 95%
c. Absolute precision (in percentage points): 5
d. Intermediate value: V = 0.40 × 0.60 + 0.32 × 0.68 = 0.4576, rounded to 0.46

$z_{1-\alpha/2} = 1.96$
$n = \dfrac{z_{1-\alpha/2}^2 \, V}{d^2} = \dfrac{(1.96)^2 \times 0.46}{(0.05)^2} = 706.9$, i.e. 707 in each group.
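A sketch of the two-proportion calculation is given below; `n_two_proportions` is an illustrative name. Note that the result depends on rounding: with V rounded to 0.46 and z to 1.96, as in the worked solution, the answer is 707, whereas carrying the unrounded V = 0.4576 through gives 704.

```python
import math
from statistics import NormalDist


def n_two_proportions(p1, p2, d, confidence=0.95):
    """Per-group size to estimate p1 - p2 to within +/- d (absolute)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    v = p1 * (1 - p1) + p2 * (1 - p2)  # intermediate value V
    return math.ceil(z**2 * v / d**2)


print(n_two_proportions(0.40, 0.32, 0.05))      # unrounded inputs -> 704
print(math.ceil(1.96**2 * 0.46 / 0.05**2))      # rounded V and z  -> 707
```

The difference is small relative to the uncertainty in the pilot estimates, so either figure is a reasonable planning target.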

Case Control Study

              Exposed    Unexposed
Disease          a           b
No disease       c           d

Odds Ratio (OR) = ad/bc

Sample Size: Case Control Study
Required information and notation
a. Two of the following should be known:
   Anticipated probability of "exposure" for people with the disease [a/(a + b)]: P1
   Anticipated probability of "exposure" for people without the disease [c/(c + d)]: P2
   Anticipated odds ratio: OR
b. Confidence level: 100(1 - α)%
c. Relative precision: ε

Sample Size:
$n = \dfrac{z_{1-\alpha/2}^2 \left\{ \dfrac{1}{P_1(1-P_1)} + \dfrac{1}{P_2(1-P_2)} \right\}}{[\log_e(1-\varepsilon)]^2}$

Sample size can also be derived using the appropriate table for case control studies.

Example: Problem 5

In an area cholera is posing a serious public health problem; about 30% of the population are believed to be using water from contaminated sources.
A case-control study of the association between cholera and exposure to contaminated water is to be undertaken in the area to estimate the odds ratio to within 25% of the true value, which is believed to be approximately 2, with 95% confidence.
What sample sizes would be needed in the cholera and control groups?

Problem 5: Solution
Anticipated probability of "exposure" given "disease": not given directly; from OR = 2 and P2 = 0.30, P1 = OR × P2 / [OR × P2 + (1 - P2)] = 0.6 / 1.3 ≈ 46%
Anticipated probability of "exposure" given "no disease" (approximated by overall exposure rate): 30%
Anticipated odds ratio: 2
Confidence level: 95%
Relative precision: 25%

Applying the formula, a sample size of 408 would be needed in each group.
This can also be derived from the appropriate table of sample size for case control studies.
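The figure of 408 can be reproduced with the sketch below. It assumes the precision-based formula n = z² {1/[P1(1 - P1)] + 1/[P2(1 - P2)]} / [ln(1 - ε)]², and derives P1 from the odds ratio and P2; the function name `n_case_control` is illustrative.

```python
import math
from statistics import NormalDist


def n_case_control(p2, odds_ratio, eps, confidence=0.95):
    """Per-group size to estimate an odds ratio to within relative precision eps."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Exposure probability among cases implied by the odds ratio and p2.
    p1 = odds_ratio * p2 / (odds_ratio * p2 + (1 - p2))
    variance_term = 1 / (p1 * (1 - p1)) + 1 / (p2 * (1 - p2))
    return math.ceil(z**2 * variance_term / math.log(1 - eps) ** 2)


print(n_case_control(p2=0.30, odds_ratio=2, eps=0.25))  # -> 408
```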

If the number of cases is large, the same number of controls is taken.
Cases : Controls = 1 : 1
If the number of cases is small, the number of controls may be twice or thrice the number of cases.
Cases : Controls = 1 : 2, 1 : 3 or 1 : 4

Cohort Study

              Disease    No disease
Exposed          a           b
Unexposed        c           d

Relative Risk = [a/(a + b)] / [c/(c + d)]

Sample Size: Cohort Study

Required information and notation
a. Two of the following should be known:
   Anticipated probability of disease in people exposed to the factor of interest: P1
   Anticipated probability of disease in people not exposed to the factor of interest: P2
   Anticipated relative risk: RR
b. Confidence level: 100(1 - α)%
c. Relative precision: ε

Sample Size:
$n = \dfrac{z_{1-\alpha/2}^2 \left\{ \dfrac{1-P_1}{P_1} + \dfrac{1-P_2}{P_2} \right\}}{[\log_e(1-\varepsilon)]^2}$

Sample size can also be derived using the appropriate table for cohort studies.

Example: Problem 6

An epidemiologist is planning a study to investigate the possibility that a certain lung disease is linked with exposure to a recently identified air pollutant.
What sample size would be needed in each of two groups, exposed and not exposed, if the researcher wishes to estimate the relative risk to within 50% of the true value (which is believed to be approximately 2) with 95% confidence?
The disease is present in 20% of people who are not exposed to the air pollutant.

Problem 6: Solution

Anticipated probability of disease given "exposure": not given directly; P1 = RR × P2 = 2 × 20% = 40%
Anticipated probability of disease given "no exposure": 20%
Anticipated relative risk: 2
Confidence level: 95%
Relative precision: 50%
Applying the formula, a sample size of 44 would be needed in each group.
This can also be derived using the appropriate table of sample size for cohort studies.
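A sketch that reproduces the figure of 44, assuming the precision-based formula n = z² {(1 - P1)/P1 + (1 - P2)/P2} / [ln(1 - ε)]² with P1 = RR × P2. The function name `n_cohort` is illustrative.

```python
import math
from statistics import NormalDist


def n_cohort(p2, relative_risk, eps, confidence=0.95):
    """Per-group size to estimate a relative risk to within relative precision eps."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p1 = relative_risk * p2  # disease probability among the exposed
    variance_term = (1 - p1) / p1 + (1 - p2) / p2
    return math.ceil(z**2 * variance_term / math.log(1 - eps) ** 2)


print(n_cohort(p2=0.20, relative_risk=2, eps=0.50))  # -> 44
```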

Sample Size: Experimental Studies
Considerations:
Here the purpose is to test a null hypothesis, so the sample size calculation requires specifying the limits of error one is willing to accept in accepting or rejecting the null hypothesis (type I and type II error).
Outcome measure: dichotomous variable OR continuous variable.
Effect size of clinical importance

Effect Size of Clinical Importance
This is the smallest difference between the group means or proportions (or the odds ratio / relative risk closest to unity) which would be considered clinically or biologically important.
The sample size should be set so that, if such a difference exists, it is very likely that a statistically significant result would be obtained.

Outcome Measure: Dichotomous Variable

Required information and notation
a. Test value of the difference between the population proportions under the null hypothesis: P1 - P2 = 0
b. Anticipated values of the population proportions: P1 and P2
c. Level of significance: 100α%
d. Power of the test: 100(1 - β)%
e. Alternative hypothesis: either (P1 - P2) > 0 or (P1 - P2) < 0 (for one-sided test), or P1 - P2 ≠ 0 (for two-sided test)

Sample size in each group:
$n = \dfrac{(Z_{1-\alpha/2} + Z_{\beta})^2 \, (P_1 Q_1 + P_2 Q_2)}{(P_1 - P_2)^2}$

where Q1 = 1 - P1 and Q2 = 1 - P2, and $Z_{1-\alpha/2}$ is the two-sided Z value for the chosen α

Example: Problem 7

Estimate the sample size for a trial to study the effects of a new treatment over standard treatment to reduce the 5 year mortality in patients with a particular cancer.
The success rate of the standard drug is 55% and with the new drug it is expected to be 70%.

Problem 7: Solution
$Z_{1-\alpha/2}$ at 95% confidence limit = 1.96
When the power of the trial is 80%, β = 0.2 and $Z_{\beta}$ = 0.84
P1 = 0.7, Q1 = 0.3; P2 = 0.55, Q2 = 0.45

$n = \dfrac{(Z_{1-\alpha/2} + Z_{\beta})^2 \, (P_1 Q_1 + P_2 Q_2)}{(P_1 - P_2)^2} = \dfrac{(1.96 + 0.84)^2 \, \{(0.7 \times 0.3) + (0.55 \times 0.45)\}}{(0.7 - 0.55)^2} = 159.4$
= 160 each from study and control groups
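The Problem 7 calculation can be sketched as follows, assuming a two-sided α of 0.05 and 80% power; `n_trial_proportions` is an illustrative name.

```python
import math
from statistics import NormalDist


def n_trial_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group size to detect p1 vs p2 with a two-sided test at level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)  # power = 1 - beta
    variance_term = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_term / (p1 - p2) ** 2)


print(n_trial_proportions(0.70, 0.55))  # -> 160
```

Raising the power to 90% or shrinking the difference to be detected increases the required number in each group.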

Outcome Measure: Continuous Variable
Required information and notation
a. Estimate of variability (SD) of individual values: σ
b. Magnitude of difference that is desired to detect: δ
c. Level of significance: 100α%
d. Power of the test: 100(1 - β)%

Sample size in each group:
$n = \dfrac{2 \, (Z_{1-\alpha/2} + Z_{\beta})^2 \, \sigma^2}{\delta^2}$

Example: Problem 8

Estimate the sample size for a trial to study the effects of a new drug over the standard drug to reduce the morbidity in patients with COPD.
The standard deviation of FEV1 with the standard drug is 0.4 L and the difference between mean FEV1 values of treatment group and control group to be detected is 150 ml.

Problem 8: Solution

σ = standard deviation of FEV1 (from a previous study, SD of FEV1 = 0.40 L)
$Z_{1-\alpha/2}$ (value of Z for α) = 1.96 (p = 0.05, 95% confidence desired, two-tailed test)
$Z_{\beta}$ (value of Z for β) = 0.84 (20% β error, thus 80% power desired)
δ (difference to be detected) = 150 ml (0.15 L) or larger difference between mean FEV1 values of experimental group and control group.

Applying the formula

$n = \dfrac{2 \, (Z_{1-\alpha/2} + Z_{\beta})^2 \, \sigma^2}{\delta^2} = \dfrac{2 \, (1.96 + 0.84)^2 \, (0.40)^2}{(0.15)^2} = 111.5$, i.e. 112

So, 112 subjects per group; hence total 112 × 2 = 224 subjects
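As a check on the arithmetic, the sketch below evaluates n = 2 (Zα + Zβ)² σ² / δ² with σ = 0.40 L, δ = 0.15 L, two-sided α = 0.05 and 80% power. With these inputs the formula evaluates to about 111.6 before rounding up, i.e. 112 per group. The function name `n_trial_means` is illustrative.

```python
import math
from statistics import NormalDist


def n_trial_means(sd, delta, alpha=0.05, power=0.80):
    """Per-group size to detect a mean difference delta (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)  # power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd**2 / delta**2)


per_group = n_trial_means(sd=0.40, delta=0.15)
print(per_group, per_group * 2)  # -> 112 224
```

Both σ and δ must be in the same unit (litres here); mixing millilitres and litres changes the answer by orders of magnitude.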

Recommended Reading

S. K. Lwanga & S. Lemeshow. Sample Size Determination in Health Studies: A Practical Manual.

Free Sample Size Software

Epi Info: http://wwwn.cdc.gov/epiinfo/
WinPepi: http://www.brixtonhealth.com/pepi4windows.html

Thank You
