Professional Documents
Culture Documents
Considerations in Determining
Sample Size for Pilot Studies
Melody A. Hertzog*
College of Nursing, University of Nebraska Medical Center, Lincoln Division, 1230 O Street,
Suite 131, P.O. Box 880220, Lincoln, NE 68588-0220
Accepted 12 August 2007
181
182
pa
CI95
CI90
CI68
.15
.30
.15
.30
.15
.30
.15
.30
.15
.30
.15
.30
.15
.30
(.00, .31)
(.10, .50)
(.02,.28)
(.14, .46)
(.04, .26)
(.16, .44)
(.05, .25)
(.17, .43)
(.06, .24)
(.18, .42)
(.07, .23)
(.19, .41)
(.07, .23)
(.20, .40)
(.02, .28)
(.13, .47)
(.04, .26)
(.16, .44)
(.06, .24)
(.18, .42)
(.07, .23)
(.19, .41)
(.07, .23)
(.20, .40)
(.08, .22)
(.21, .39)
(.08, .22)
(.22, .38)
(.07, .23)
(.20, .40)
(.09, .22)
(.22, .38)
(.09, .21)
(.23, .37)
(.10, .20)
(.24, .36)
(.10, .20)
(.24, .36)
(.11, .19)
(.25, .35)
(.11, .19)
(.25, .35)
a
Because the standard error of a proportion equals the square root of [p(1 p)/n],
it has the same value for p and for 1 p. Therefore, the width of the confidence
intervals tabled would also apply to proportions of .85 (i.e., 1.15) and .70 (1.3).
183
N
20 (10 per group)
CI95
CI90
CI68
.30
.40
.50
.60
.30
.40
.50
.60
.30
.40
.50
.60
.30
.40
.50
.60
.30
.40
.50
.60
.30
.40
.50
.60
.30
.40
.50
.60
(.16, .66)
(.05, .72)
(.07, .77)
(.21, .82)
(.07, .60)
(.05, .67)
(.17, .73)
(.31, .79)
(.01, .56)
(.10, .63)
(.22, .70)
(.36, .77)
(.02, .53)
(.14, .61)
(.26, .68)
(.39, .75)
(.05, .51)
(.16, .59)
(.28, .67)
(.41, .74)
(.07, .50)
(.18, .58)
(.30, .66)
(.42, .73)
(.09, .49)
(.20, .57)
(.31, .65)
(.44, .72)
(.09, .61)
(.03, .68)
(.15, .74)
(.29, .80)
(.01, .56)
(.11, .63)
(.23, .70)
(.36, .77)
(.04, .52)
(.15, .60)
(.27, .68)
(.40, .75)
(.07, .50)
(.18, .58)
(.30, .66)
(.43, .73)
(.09, .48)
(.20, .57)
(.32, .65)
(.44, .72)
(.22, .47)
(.22, .55)
(.33, .64)
(.46, .71)
(.12, .46)
(.23, .54)
(.35, .63)
(.47, .71)
(.07, .50)
(.18, .58)
(.30, .66)
(.42, .73)
(.12, .46)
(.23, .55)
(.34, .63)
(.46, .71)
(.15, .44)
(.26, .53)
(.37, .61)
(.49, .69)
(.16, .43)
(.27, .51)
(.38, .60)
(.50, .68)
(.18, .41)
(.28, .50)
(.40, .59)
(.51, .68)
(.19, .41)
(.29, .50)
(.40, .59)
(.52, .67)
(.19, .40)
(.30, .49)
(.41, .58)
(.52, .67)
Using a method illustrated in Fan and Thompson (2001), upper and lower limits of confidence
intervals for a selection of alpha levels were calculated and are presented in Table 4. There is some
disagreement about the accuracy of these limits if
the set of items does not follow a multivariate
normal distribution (Bonett, 2002), but these intervals should be sufficiently precise for the purpose
under discussion here.
The precision of coefficient alpha is also
influenced by scale length (a twofold change in
length changing interval limits by approximately
.01), with the effect slightly larger for smaller
sample sizes, but differences are so small that only
values for a 20-item scale are reported here.1 The
values in Table 4 do not vary much across the
range of sample sizes considered; for samples of
2540 per group, observed alpha should probably
be at least .75 in order to have reasonable
1
Once data are collected, the researcher can use SPSS (2005) to
compute limits of a confidence interval around alpha for that
specific data by requesting computation of an intraclass
correlation for a random effects model as an option when
calculating reliability.
184
N
20 (10 per group)
CI95
CI90
CI68
.70
.75
.80
.70
.75
.80
.70
.75
.80
.70
.75
.80
.70
.75
.80
.70
.75
.80
.70
.75
.80
(.37, .87)
(.46, .90)
(.55, .92)
(.45, .85)
(.53, .87)
(.62, .90)
(.50, .83)
(.57, .86)
(.65, .89)
(.52, .82)
(.60, .85)
(.67, .88)
(.54, .81)
(.61, .84)
(.69, .88)
(.56, .80)
(.63, .84)
(.70, .87)
(.57, .80)
(.63, .83)
(.70, .87)
(.44, .85)
(.52, .88)
(.60, .91)
(.50, .83)
(.58, .86)
(.65, .89)
(.54, .81)
(.61, .85)
(.68, .88)
(.56, .80)
(.62, .84)
(.70, .87)
(.57, .80)
(.64, .83)
(.71, .87)
(.58, .79)
(.65, .83)
(.72, .86)
(.59, .78)
(.66, .82)
(.72, .86)
(.56, .80)
(.62, 84)
(.70, .87)
(.59, .78)
(.65, .82)
(.72, .86)
(.61, .77)
(.67, .81)
(.73, .85)
(.62, .77)
(.68, .81)
(.74, .85)
(.63, .76)
(.69, .80)
(.75, .84)
(.63, .76)
(.69, .80)
(.75, .84)
(.64, .75)
(.70, .80)
(.76, .84)
Cronbachs a
CI95
CI90
CI68
.70
.75
.80
.85
.70
.75
.80
.85
.70
.75
.80
.85
.70
.75
.80
.85
.70
.75
.80
.85
.70
.75
.80
.85
.70
.75
.80
.85
(.47, .86)
(.56, .88)
(.65, .91)
(.74, .93)
(.52, .84)
(.60, .86)
(.68, .89)
(.76, .92)
(.55, .82)
(.62, .85)
(.70, .88)
(.77, .91)
(.57, .81)
(.64, .84)
(.71, .87)
(.78, .90)
(.58, .80)
(.65, .83)
(.72, .87)
(.79, .90)
(.59, .79)
(.66, .83)
(.73, .86)
(.79, .90)
(.60, .79)
(.66, .82)
(.73, .86)
(.80, .89)
(.52, .84)
(.60, .87)
(.68, .90)
(.76, .92)
(.55, .82)
(.63, .85)
(.70, .88)
(.78, .91)
(.58, .80)
(.65, .85)
(.72, .87)
(.79, .90)
(.59, .79)
(.66, .83)
(.73, .86)
(.80, .86)
(.60, .79)
(.67, .82)
(.73, .86)
(.80, .89)
(.61, .78)
(.67, .82)
(.74, .85)
(.80, .89)
(.61, .78)
(.68, .81)
(.74, .85)
(.81, .89)
(.60, .80)
(.67, .83)
(.73, .86)
(.80, .90)
(.62, .78)
(.68, .82)
(.75, .85)
(.81, .89)
(.63, .77)
(.69, .81)
(.75, .85)
(.82, .88)
(.64, .76)
(.70, .80)
(.76, .84)
(.82, .88)
(.64, .76)
(.70, .80)
(.76, .84)
(.82, .88)
(.65, .75)
(.71, .79)
(.77, .83)
(.82, .88)
(.65, .75)
(.71, .79)
(.77, .83)
(.83, .87)
185
N
20
30
40
50
60
70
80
Z2 .01
Z2 .06
Z2 .14
0a
0a
0a
0a
0a
0a
0a
.01
.03
.03
.04
.04
.05
.05
.09
.11
.12
.12
.12
.13
.13
a
As a proportion of variance, Z2 is always positive, but applying the correction for
bias can result in small negative values, which are customarily treated as zero.
186
Z2
CI95
CI90
CI68
.01
.06
.14
.01
.06
.14
.01
.06
.14
.01
.06
.14
.01
.06
.14
.01
.06
.14
.01
.06
.14
(.00, .21)
(.00, .32)
(.00, .41)
(.00, .17)
(.00, .27)
(.00, .37)
(.00, .14)
(.00, .24)
(.00, .34)
(.00, .12)
(.00, .22)
(.01, .32)
(.00, .11)
(.00, .20)
(.02, .30)
(.00, .10)
(.00, .19)
(.02, .29)
(.00, .09)
(.00, .18)
(.03, .28)
(.00, .17)
(.00, .27)
(.00, .37)
(.00, .13)
(.00, .23)
(.00, .33)
(.00, .11)
(.00, .21)
(.01, .30)
(.00, .10)
(.00, .19)
(.02, .29)
(.00, .09)
(.00, .18)
(.03, .27)
(.00, .08)
(.00, .17)
(.04, .27)
(.00, .07)
(.00, .16)
(.04, .26)
(.00, .07)
(.00, .17)
(.02, .27)
(.00, .06)
(.00, .15)
(.04, .25)
(.00, .06)
(.01, .14)
(.05, .23)
(.00, .05)
(.01, .13)
(.06, .23)
(.00, .05)
(.01, .12)
(.06, .22)
(.00, .04)
(.02, .12)
(.07, .21)
(.00, .04)
(.02, .12)
(.07, .21)
187
Z2 a
Effect
CI95
CI90
CI68
20 per group
.06
Group
GT
Group
GT
Group
GT
Group
GT
Group
GT
Group
GT
(.00, .30)
(.00, .23)
(.01, .40)
(.04, .34)
(.00, .25)
(.01, .19)
(.02, .36)
(.05, .29)
(.00, .21)
(.01, .16)
(.02, .31)
(.06, .26)
(.00, .26)
(.01, .21)
(.02, .37)
(.05, .31)
(.01, .21)
(.02, .16)
(.04, .31)
(.07, .26)
(.00, .17)
(.02, .14)
(.04, .28)
(.08, .24)
(.01, .17)
(.03, .15)
(.07, .27)
(.09, .25)
(.02, .15)
(.04, .12)
(.08, .24)
(.10, .22)
(.02, .12)
(.04, .11)
(.08, .22)
(.11, .20)
.14
30 per group
.06
.14
40 per group
.06
.14
of replications (at least 1,000) would be performed, but a smaller sample was deemed
sufficient to provide the rough estimates needed
here.
The bootstrap confidence intervals are presented in Table 7. The confidence intervals are
consistently narrower for the Group Time
interaction than for the Group effect. This would
be expected, as the correlation among measurements increases power and precision more for
within-subjects factors than for between-subjects
factors. In addition, the lower limits of the
confidence intervals for the effect of group are
higher with this design than they were for the same
effect size when samples were independent
(Table 6), suggesting that pilot studies perhaps
could be slightly smaller when this design is
employed. Nevertheless, using the most liberal CI
(68%), it is evident that, as with the independentgroups design, even group sizes of 40 will be
barely adequate when the effect size is moderate.
A single-factor within-subjects design, such
as might be used in an efficacy study with a
single group being measured before and after
treatment, is another common design in pilot
studies. Ordinarily the aim of an efficacy study is
to demonstrate change following implementation
of an intervention as a preliminary step before
comparing the intervention to an alternative.
Although estimates of the mean difference and
variance from an efficacy study may be useful in
planning the two-group study, the effect size for
the change within a single group may not
correspond to the between-group effect that will
be tested in the later study. Considering that direct
effect size estimation is not the ultimate goal of an
Research in Nursing & Health DOI 10.1002/nur
188
189
efficacy study than would be desirable if comparing interventions. Larger samples would be
needed if there were greater violation of the
sphericity assumption or if the within-subject
correlation were less than .60. For example, with a
within-subject correlation of only .40, 30 instead
of 20 would be needed using a .05 to have power
of .80 for an effect size of Z2 .08.
Abstract Review
Abstracts of pilot studies (as defined in this article)
funded by National Institutes of Health (NIH)
National Institute of Nursing Research (NINR)
R03 and R15 grants from 2002 to 2004 were
obtained using the CRISP database (National
Institutes of Health, 2005), and Medline was
searched for articles on pilot studies published in
2004 and referenced in the category of nursing.
Studies were eliminated if they were qualitative
studies, reported an evaluation of a pilot program,
or involved a clustered design. Studies whose
abstracts gave no sample size information
were also not included in the summary. During
the defined period, NINR funded 30 studies that
were identified in their abstracts as pilot studies;
14 studies met the criteria outlined above. Two of
the 14 studies were multi-phase and contributed
a second sample size to the review. Total sample
sizes ranged from 24 to 400, with a median of 49.
Of the 16 samples, 2 were purely psychometric
studies (n 40 and n 400), 3 were single-group
correlational or descriptive studies (n 40, 100,
110), and 11 were quasi-experimental or small
randomized clinical trials. This latter group
included one study reporting results from two
experiments (n 24 and n 48) in which all
participants received multiple, randomly ordered
treatments, 7 studies having two groups (median
size of 24 per group) and two studies each having
three groups (n 25 and n 27 per group).
The Medline search yielded 199 studies, 96 of
which met criteria. Their total sample sizes ranged
from 3 to 419, with a median of 34.5. Of those
involving single groups, 13 were purely psychometric studies (median size of 84), 35 were
correlational/descriptive (median size of 40), and
21 were feasibility or efficacy studies (median size
of 18). Of the remaining 27 studies, 24 were twogroup comparisons, with a median group size of
20.5. The remaining 3 studies each had 3 groups,
with group sizes of 10, 15, or 30. Although the
sample sizes recommended in this article are
within the ranges found in the abstract reviews, it
is clear that many of these studies would be too
Research in Nursing & Health DOI 10.1002/nur
190
191