Joseph A. Durlak
Loyola University Chicago
The objective of this article is to offer guidelines regarding the selection, calculation, and interpretation of effect
sizes (ESs). To accomplish this goal, ESs are first defined and their important contribution to research is
emphasized. Then different types of ESs commonly used in group and correlational studies are discussed.
Several useful resources are provided for distinguishing among different types of effects and what modifications
might be required in their calculation depending on a study's purpose and methods. This article should assist
producers and consumers of research in understanding the role, importance, and meaning of ESs in research
reports.
All correspondence concerning this article should be addressed to Joseph A. Durlak, Department of Psychology, Loyola
University Chicago, 6525 N. Sheridan Road, Chicago, IL 60626, USA. E-mail: jdurlak@luc.edu.
Journal of Pediatric Psychology 34(9) pp. 917–928, 2009
doi:10.1093/jpepsy/jsp004
Advance Access publication February 16, 2009
Journal of Pediatric Psychology vol. 34 no. 9 © The Author 2009. Published by Oxford University Press on behalf of the Society of Pediatric Psychology.
All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org
variables they are investigating are correlated. ESs provide this information by assessing how much difference there is between groups or how strong the relationship is between variables. In other words, ESs assess the magnitude or strength of the findings that occur in research studies. This is critical information that cannot be obtained solely by focusing on a particular p-value such as .05 (Thompson, 2006; Volker, 2006). There is no straightforward relationship between a p-value and the magnitude of effect. A small p-value can relate to a low, medium, or high effect. Moreover, as discussed later, there is no straightforward relationship between the magnitude of an effect and its practical or clinical value. Depending on the circumstances, an effect of lower magnitude on one outcome can be more important than an effect of higher magnitude on another outcome.

... publication purposes, but as an essential component of good research. ESs achieve an important purpose because they help to assess the overall contribution of a study. Once it is understood what an ES is in general terms, and what valuable information it provides, many ESs can be presented directly or easily calculated. What does become complicated is the nearly infinite number of ways that researchers can design their study and analyze their data, which, in turn, may require some modification in the usual way of determining ESs. The following discussion focuses on the most common ES indices and research situations, but many additional user-friendly references are cited so that readers can gain further information that applies to their particular research scenarios. The description of different types of ESs begins with those used in group designs.
... difference between the post-test means in the numerator of the equation and using standard deviation units in the denominator, hence the term standardized mean difference. This standardization permits direct comparisons across studies using the same index of effect. A SMD of 0.50 based on outcome A from one study is directly comparable to a SMD of 0.50 calculated on that same outcome in another study.

The two most common SMD statistics are Hedges' g and Cohen's d [see Equations (1) and (2) in the appendix, respectively]. There are some differences in how these statistics are calculated, but both are positively biased estimators of an ES when sample sizes are small. Therefore, it is important to correct for their upward bias. The correction factor is in the second and third parts of Equations (1) and (2). Practically speaking, the ...

... untreated control group. A separate ES for each experimental condition can be calculated using Equations (1) or (2). An ES could also be calculated comparing the two experimental groups, using either one in place of the control group in Equations (1) or (2). However, the latter ES (E group versus another E group) is likely to be lower in magnitude than the E versus C comparison because two interventions are being compared: youth should benefit in each condition. Child therapy reviews have found that treatment-to-treatment comparisons yield ESs that can be up to one-third lower than those obtained in treatment versus control conditions (Kazdin, Bass, Ayers, & Rodgers, 1990). Helpful advice for calculating SMDs for different research situations is provided by Lipsey and Wilson (2001).
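To make the computation concrete, here is a minimal Python sketch of a bias-corrected SMD. It mirrors the logic of Equations (1) and (2) (post-test mean difference divided by a pooled SD, then multiplied by a small-sample correction), but it uses the common 1 - 3/(4N - 9) approximation to the correction rather than the exact factors shown in the appendix; the function name and example numbers are illustrative, not from the article.

    import math

    def hedges_g(m_e, sd_e, n_e, m_c, sd_c, n_c):
        """Standardized mean difference with a small-sample bias correction."""
        # Pool the SDs, weighting each group's variance by its degrees of freedom
        sd_pooled = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2)
                              / (n_e + n_c - 2))
        g = (m_e - m_c) / sd_pooled                  # uncorrected SMD
        correction = 1 - 3 / (4 * (n_e + n_c) - 9)   # removes the upward bias
        return g * correction

    # Illustrative post-test values for an intervention (E) vs. control (C) group
    print(hedges_g(m_e=24.0, sd_e=5.0, n_e=30, m_c=21.5, sd_c=5.5, n_c=30))  # ~0.47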
[Figure 1. Hypothetical pre–post trends in problems for the Experimental and Control groups, panels (a)–(c); the y-axis plots problem levels and the x-axis the pre and post assessments.]

... controls, but improves over time while the control group stays the same. In Figure 1c the intervention group once again has more problems than controls at pre but remains unchanged over time while the control group deteriorates from pre to post.

... especially with smaller subject samples. For example, in a large-scale meta-analysis of school-based interventions (Durlak, Weissberg, Taylor, Dymnicki, & Schellinger, 2008) that included both quasi-experimental and randomized studies with a wide variation in sample sizes, less than 1% of the pre-ESs were truly zero. Admittedly, some pre differences might be negligible, but it can be worthwhile to assess their magnitude and then calculate and report adjusted ESs. Adjusted ESs should be reported along with the pre- and post-ESs so as not to confuse readers.

Calculating an adjusted ES can have a major impact on the final result. Suppose the hypothetical pre- and post-ESs in Figure 1a are 0 and 0.20; in Figure 1b, they are −0.20 and 0.20; and in Figure 1c they are −0.20 and 0.20, respectively. As a result, the adjusted ESs would be 0.20, 0.40, and 0.40, respectively.
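The adjustment itself is simple arithmetic. A short Python sketch, assuming (as the examples above imply) that the adjusted ES is the post-ES minus the pre-ES; the helper name is illustrative.

    def adjusted_es(pre_es, post_es):
        """Adjusted ES: the post-test ES corrected for the pre-test difference."""
        return post_es - pre_es

    # Hypothetical pre- and post-ESs for Figures 1a-1c from the text
    for panel, (pre, post) in {"1a": (0.0, 0.20),
                               "1b": (-0.20, 0.20),
                               "1c": (-0.20, 0.20)}.items():
        print(panel, adjusted_es(pre, post))  # 0.20, 0.40, 0.40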
... deterioration in adjustment that is observed in control groups over time, as portrayed in Figure 1c. On other occasions, however, prevention can be effective because it results in a greater reduction of problems in the intervention groups than in controls (as in Figure 1b, or in Figure 1a but with some pre between-group differences). There are two different messages that can be drawn from such findings. In the former case, one might conclude that youth will worsen over time if we do not take effective preventive action; in the latter case, one might conclude that participants will be better adjusted following their involvement in preventive programs.

ORs

Another possible index of effect in group designs is the OR, which reflects the odds of a successful or desired outcome ...

... A person in the intervention group is 2.5 times more likely to graduate than someone in the control group (Liberman, 2005).

Sometimes there is confusion when authors present data for a negative or undesirable outcome. For example, assume a hypothetical result indicating that 20% of an intervention group but 50% of a control group still met diagnostic criteria for a clinical disorder following treatment. Using Equation (4), the odds of the intervention group having a diagnosis are 0.2/0.8 or 0.25, while the corresponding odds for the controls are 0.5/0.5 or 1.0. The OR in this case is 0.25/1.0 or 0.25. The odds for the intervention group are thus one-fourth the odds of the controls or, alternately, the odds for the control group are four times higher than for the intervention group.
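The diagnostic example can be reproduced in a few lines of Python; the helper name is illustrative, and the proportions are the hypothetical ones from the text.

    def odds_ratio(p_event_intervention, p_event_control):
        """OR for an event, computed from the event proportion in each group."""
        odds_i = p_event_intervention / (1 - p_event_intervention)
        odds_c = p_event_control / (1 - p_event_control)
        return odds_i / odds_c

    # 20% of the intervention group vs. 50% of controls still met diagnostic criteria
    print(odds_ratio(0.20, 0.50))  # 0.25: intervention odds are one-fourth the control odds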
... in the traditional fashion, and is obtainable in the standard output of statistical packages; r is a widely used index of effect that conveys information both on the magnitude of the relationship between variables and its direction (Rosenthal, 1991). The possible range of r is well known: from −1.00 through zero (absolutely no relationship) to +1.00. Variants of r, such as rho, the point-biserial coefficient, and the phi coefficient, can also be used as an ES.

In most cases, when multiple regression analyses are conducted, the magnitude of effect for the total regression equation is simply the multiple R. The unique contribution of each variable in a multiple regression can be determined by using the t-value that is provided by statistical packages when that variable enters the regression; apply Equation (6) from the appendix. In structural ...

... most appropriate effect in each situation is important. When outcomes are truly dichotomous (high school graduation, diagnostic status), the OR is usually the best choice. In group designs, when continuous outcome measures are used and raw mean differences cannot be compared across studies, SMDs are preferred, and r is best for correlational studies. However, one's research goals and the statistical assumptions related to the use of an ES statistic affect the choice of the most appropriate measure.

For example, McGrath and Meyer (2006) and Ruscio (2008) offer a valuable discussion of the relative merits of SMDs and r in two-group designs when one variable is dichotomous and the other is not. These authors have shown that the superiority of one index over another depends on several factors, such as unequal ns and heter...
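Equation (6), mentioned above for gauging the unique contribution of a regression predictor, converts a t-value to r. A minimal Python sketch; the t and df values are invented for illustration.

    import math

    def r_from_t(t, df):
        """Equation (6): r computed from an independent t-test."""
        return math.sqrt(t**2 / (t**2 + df))

    print(r_from_t(t=2.5, df=58))  # ~0.31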
... general conventions do not automatically apply. Moreover, assuming that large effects are always more important than small or medium ones is unjustified. It is not only the magnitude of effect that is important, but also its practical or clinical value that must be considered. For example, based on what has been achieved in many different types of interventions, educational researchers have indicated that ESs around 0.20 are of policy interest when they are based on measures of academic achievement (Hedges & Hedberg, 2007). This would suggest that a study with an effect of 0.20, which at first glance might be misconstrued as a small effect if one automatically invokes Cohen's original conventions, can be an important outcome in some research areas. In other words, authors should examine prior relevant research to provide some indication of the magnitude ...

... than one of higher magnitude based on another type of measure. In general, researchers place more confidence in more rigorously conducted investigations, although what constitutes rigor varies from area to area. Usually, there is more confidence in a finding based on a more objective measurement strategy than one based solely on self-report. The reader can imagine other research situations where one research method is preferred over another. In other words, do not focus only on the magnitude of effects; some ESs carry more weight in the arena of scientific opinion than others because they are based on stronger research methodology.

Comparability of Outcomes is Important

With respect to the second guideline for interpreting ESs appropriately, it is essential to consider the outcomes being ...
Still another possibility is to use Rosenthal and Rubin's (1982) binomial ES display (BESD). The BESD uses r to calculate the relative success rates (SRs) for two groups. If SMDs are used, they are first converted to r (e.g., via the relevant formula in the appendix). The SRs for two groups are determined by the formulas:

SR = 50% + r/2 (converted to a percentage) for the intervention group, and
SR = 50% − r/2 (converted to a percentage) for the control group.

For example, suppose an r of 0.20 was obtained. Using the above formulae, the SRs would be 50% plus 0.20/2 = 0.10 (10%), which equals 60%, for the intervention group, and 50% minus 0.20/2 = 0.10 (10%), which equals 40%, for the controls. SRs are interpreted such that if there was no effect, the SRs for both groups would be 50%.

... basic data are available (e.g., n, mean, and SD). Even if this information is missing, ESs can be estimated in many instances using other information such as t- or F-values, probability levels, and the numbers or percentages of participants with different outcomes. Several sources provide the necessary details for correlational studies and group designs (Lipsey & Wilson, 2001; Rosenthal, 2001).

In sum, ESs become meaningful when they are judged within the context of prior relevant research, and the magnitude of the effect is only one factor. These judgments should be based on the characteristics of the research studies producing the effect, the type of outcomes assessed, and the practical or clinical significance of the results.
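The BESD arithmetic described above is easy to script; a minimal Python sketch using the r = 0.20 example from the text.

    def besd_success_rates(r):
        """Binomial effect size display: success rates implied by r."""
        return 0.50 + r / 2, 0.50 - r / 2  # (intervention SR, control SR)

    sr_i, sr_c = besd_success_rates(0.20)
    print(f"{sr_i:.0%} vs. {sr_c:.0%}")  # 60% vs. 40%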
... or persuasive. It is up to each author to use ESs to explain the meaning and practical value of their findings. Volker (2006) offers this helpful admonition: "The easy availability of Cohen's arbitrary guidelines should not be an excuse for us to fail to seek out and/or determine our own domain-specific standards based on empirical data and reasoned arguments" (p. 671).

Concluding Comments

It is important for consumers and producers of research to understand what ESs are, why they are important, and how to calculate and interpret them. Guidelines regarding ESs are offered in Table I. For those who currently do not have the necessary background and knowledge, this article along with the other references cited here are intended to ...

References

Durlak, J. A., Weissberg, R. P., Taylor, R. D., Dymnicki, A. B., & Schellinger, K. B. (2008). Promoting social and emotional learning enhances school success: Implications of a meta-analysis. Unpublished manuscript, Loyola University Chicago.

Durlak, J. A., & Wells, A. M. (1997). Primary prevention mental health programs for children and adolescents: A meta-analytic review. American Journal of Community Psychology, 25, 115–152.

Finch, S., & Cumming, G. (2008). Putting research in context: Understanding confidence intervals from one or more studies. Journal of Pediatric Psychology. Advance Access published December 18, 2008, doi:10.1093/jpepsy/jsn118.

Fleiss, J. L. (1994). Measures of effect size for categorical data. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis. New York: Russell Sage Foundation.

Liberman, A. M. (2005). How much more likely? The implications of odds ratios for probabilities. American Journal of Evaluation, 26, 253–266.

Lipsey, M. W., & Wilson, D. B. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48, 1181–1209.

Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.

McGrath, R. E., & Meyer, B. (2006). When effect sizes disagree: The case of r and d. Psychological Methods, 11, 386–401.

Meyer, G. J., McGrath, R. E., & Rosenthal, R. (2003). Basic effect size guide with SPSS and SAS syntax. Retrieved January 21, 2008 from http://www.tandf.co.uk/journals/resources/hipa.asp.

... of effect sizes. Washington, DC: What Works Clearinghouse. Retrieved August 22, 2008 from http://ies.ed.gov/ncee/wwc/references/iDocViewer/Doc.aspx?docId=19&tocId=5.

Volker, M. A. (2006). Reporting effect sizes in school psychology research. Psychology in the Schools, 43, 653–672.

Weisz, J. R., Jenson Doss, A., & Hawley, K. M. (2005). Youth psychotherapy outcome research: A review and critique of the evidence base. Annual Review of Psychology, 56, 337–363.

Wilkinson, L., Task Force on Statistical Inference, & APA Board of Scientific Affairs. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.

Wilson, D. B., Gottfredson, D. C., & Najaka, S. S. (2001). ...
Appendix

(1) Calculating Hedges' g:

g = \frac{M_E - M_C}{SD_{pooled}} \times \dots

where SD_{pooled} = \sqrt{\frac{SD_E^2 (n_E - 1) + SD_C^2 (n_C - 1)}{n_E + n_C - 2}}

(2) Calculating Cohen's d from means and standard deviations:

d = \frac{M_E - M_C}{\text{sample } SD_{pooled}} \times \frac{N - 3}{N - 2.25} \times \sqrt{\frac{N - 2}{N}}

where sample SD_{pooled} = \sqrt{\frac{SD_E^2 + SD_C^2}{2}}

(3) Calculating the 95% CI for g:

CI = g \pm (\text{critical value at } .05) \times (SD \text{ of } g)

where SD of g = \sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{g^2}{2N}}

Effects for Correlational Designs

(6) Computing r from an independent t-test:

r = \sqrt{\frac{t^2}{t^2 + df}}

Transforming One ES Into Another

(7) Hedges' g computed from r:

g = \frac{r}{\sqrt{1 - r^2}} \times \sqrt{\frac{df (n_1 + n_2)}{n_1 n_2}}

(8) Transforming Hedges' g to r:

r = \sqrt{\frac{g^2 n_1 n_2}{g^2 n_1 n_2 + (n_1 + n_2) df}}

(9) Hedges' g computed from Cohen's d: ...
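As a numerical check on the transformations in Equations (7) and (8) as reconstructed above, converting r to g and then back should recover the original r. A Python sketch under that assumption; the sample sizes are invented for illustration.

    import math

    def g_from_r(r, n1, n2):
        """Equation (7): Hedges' g computed from r."""
        df = n1 + n2 - 2
        return (r / math.sqrt(1 - r**2)) * math.sqrt(df * (n1 + n2) / (n1 * n2))

    def r_from_g(g, n1, n2):
        """Equation (8): r computed from Hedges' g."""
        df = n1 + n2 - 2
        return math.sqrt(g**2 * n1 * n2 / (g**2 * n1 * n2 + (n1 + n2) * df))

    g = g_from_r(0.30, n1=25, n2=25)
    print(round(g, 3), round(r_from_g(g, n1=25, n2=25), 3))  # 0.616 and 0.3 (round trip)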