REVIEW ARTICLE
SUMMARY
Background: An understanding of p-values and confidence
intervals is necessary for the evaluation of scientific
articles. This article will inform the reader of the meaning
and interpretation of these two statistical concepts.
Methods: The uses of these two statistical concepts and
the differences between them are discussed on the basis
of a selective literature search concerning the methods
employed in scientific articles.
Results/Conclusions: In scientific studies, p-values are
used to decide whether a null hypothesis formulated
before the study was performed should be rejected or
retained. In exploratory studies, p-values enable the recognition of statistically noteworthy findings. Confidence
intervals provide a range that, with a stated degree of
confidence, contains the true value, as well as information
about the direction and strength of the demonstrated
effect. This enables conclusions to be drawn about the
statistical plausibility and clinical relevance of the study
findings. It is often useful for both statistical measures to
be reported in scientific articles, because they provide
complementary types of information.
Dtsch Arztebl Int 2009; 106(19): 335-9
DOI: 10.3238/arztebl.2009.0335
Key words: publications, clinical research, p-value,
statistics, confidence interval
Johannes Gutenberg-Universität Mainz: Zentrum für Kinder- und Jugendmedizin, Zentrum Präventive Pädiatrie: Dr. med. du Prel, MPH
Johannes Gutenberg-Universität Mainz: Institut für Medizinische Biometrie,
Epidemiologie und Informatik: Prof. Dr. rer. nat. Hommel, Dr. rer. nat. Röhrig,
Prof. Dr. rer. nat. Blettner
What is a p-value?
In confirmatory (evidential) studies, null hypotheses
are formulated, which are then rejected or retained
with the help of statistical tests. The p-value is the probability,
computed by such a test, of obtaining a result at least as extreme
as the one observed if the null hypothesis were true. It is therefore
a measure of the evidence against the null hypothesis: the smaller
the p-value, the stronger the evidence. If the p-value is below a
predefined limit, the results are designated as "statistically
significant" (1). In exploratory studies, such results are also
described as "statistically striking".
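To make the decision rule concrete, the following Python sketch computes a two-sided p-value from a test statistic and compares it with a significance level fixed in advance. It assumes a test statistic that follows a standard normal distribution under the null hypothesis (a z-test); the function names and the value z = 2.5 are hypothetical and chosen for illustration only.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Two-sided p-value for a test statistic that is standard normal under H0."""
    # Probability of a result at least as extreme as |z| in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(p, alpha=0.05):
    """Reject H0 when p is below the significance level alpha fixed in advance."""
    return p < alpha


z = 2.5  # hypothetical observed test statistic
p = two_sided_p_value(z)
print(f"p = {p:.4f}; statistically significant at alpha = 0.05: {is_significant(p)}")
```

For z = 2.5 this gives p of about 0.012, so the null hypothesis would be rejected at the 5 % level; a smaller |z| would give a larger p-value and weaker evidence against H0.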
To show that a new drug is better than an old one, the
first step is to show that the two drugs differ in their
effect. Thus, the hypothesis of equality is to be
rejected. The null hypothesis (H0) to be rejected is then
formulated in this case as follows: "There is no difference
between the two treatments with respect to their effect."
For example, there might be no difference between two
antihypertensives with respect to their ability to reduce
blood pressure. The alternative hypothesis (H1) then states
that there is a difference between the two treatments.
This can either be formulated as a two-tailed hypothesis
(any difference) or as a one-tailed hypothesis (positive
or negative effect). In this case, the expression "one-tailed"
means that the direction of the expected effect is laid
down when the alternative hypothesis is formulated. For
example, if there is clear preliminary evidence that an
antihypertensive has on average a stronger hypotensive
(blood-pressure-lowering) effect than the comparator drug,
the alternative hypothesis
can be formulated as follows: "The difference between
the mean hypotensive activity of antihypertensive 1 and
the mean hypotensive activity of antihypertensive 2 is
positive." However, as this requires plausible assumptions
about the direction of the effect, the two-tailed hypothesis
is often formulated.
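The practical difference between the two formulations can be sketched under the same z-test assumption. In this illustrative Python example, z is taken to be the standardized difference "antihypertensive 1 minus antihypertensive 2"; the function names and the values ±1.8 are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def p_two_tailed(z):
    """H1: any difference between the treatments (either direction)."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def p_one_tailed_positive(z):
    """H1: the difference (drug 1 minus drug 2) is positive."""
    return 1 - STD_NORMAL.cdf(z)


for z in (1.8, -1.8):  # hypothetical standardized differences
    print(f"z = {z:+.1f}: two-tailed p = {p_two_tailed(z):.3f}, "
          f"one-tailed p = {p_one_tailed_positive(z):.3f}")
```

For z = 1.8 the one-tailed p-value is half the two-tailed one and falls below 0.05, while the two-tailed value does not. For z = -1.8, an effect in the unexpected direction, the one-tailed p-value is large. This illustrates why the direction has to be laid down when the alternative hypothesis is formulated, not after inspecting the data.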
MEDICINE
FIGURE 1
Using the example of the difference in mean systolic blood pressure between two groups,
the figure shows how the width of the confidence interval (a) changes with dispersion (b, c), confidence level (d, e), and sample size (f, g). The mean systolic blood pressure was
150 mm Hg in group 1 and 145 mm Hg in group 2, a difference of
5 mm Hg. Example modified from (6)
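The dependence shown in Figure 1 can be reproduced with a short calculation. This Python sketch uses the group means from the caption (150 and 145 mm Hg) but assumes a normal approximation, a common standard deviation, and equal group sizes; the standard deviations (20 and 30 mm Hg), the sample sizes (100 and 400 per group), and the function names are hypothetical and are not the values behind the original figure.

```python
from math import sqrt
from statistics import NormalDist


def ci_mean_difference(mean1, mean2, sd, n_per_group, level=0.95):
    """Normal-approximation CI for mean1 - mean2.

    Simplifying assumptions (hypothetical): both groups share the same
    standard deviation `sd` and have the same size `n_per_group`.
    """
    diff = mean1 - mean2
    se = sd * sqrt(2 / n_per_group)                 # standard error of the difference
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for a 95 % level
    return diff - z * se, diff + z * se


def width(ci):
    """Width of an interval given as (lower, upper)."""
    return ci[1] - ci[0]


# Group means from Figure 1; SDs and sample sizes are hypothetical.
base = ci_mean_difference(150, 145, sd=20, n_per_group=100)
scenarios = {
    "baseline (SD 20, 95 %, n = 100)": base,
    "larger dispersion (SD 30)": ci_mean_difference(150, 145, sd=30, n_per_group=100),
    "higher confidence level (99 %)": ci_mean_difference(150, 145, sd=20, n_per_group=100, level=0.99),
    "larger sample (n = 400)": ci_mean_difference(150, 145, sd=20, n_per_group=400),
}

for label, ci in scenarios.items():
    print(f"{label}: {ci[0]:.1f} to {ci[1]:.1f} mm Hg (width {width(ci):.1f})")
```

Larger dispersion and a higher confidence level widen the interval, while quadrupling the sample size halves its width, because the standard error shrinks with the square root of n. In the hypothetical baseline scenario the 95 % interval includes 0 mm Hg, whereas with 400 patients per group it does not.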
FIGURE 2
Conclusion
Taken in isolation, p-values provide a measure of the
statistical plausibility of a result. Given a predefined
significance level, p-values allow a decision about whether
to reject or retain a previously formulated null hypothesis
in confirmatory studies. P-values alone permit only very
limited statements about the strength of an effect.
Confidence intervals provide a plausible range for the true
value around the point estimate. They allow statements about
the direction and strength of the effect, as well as about
whether the result is statistically significant. Finally,
p-values and confidence intervals are not contradictory
statistical concepts. If the sample size and the dispersion,
or the point estimate, are known, confidence intervals can be
calculated from p-values, and vice versa. The two statistical
concepts are complementary.
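The conversion between the two measures can be sketched under a normal approximation. This is one common approach, not necessarily the one the authors had in mind. The Python sketch assumes an approximately normally distributed estimate on a scale where the interval is symmetric (ratios such as odds ratios would first be log-transformed) and a null value of 0; the function names and the example values (a 5 mm Hg difference with p = 0.04) are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def z_critical(level=0.95):
    """Two-sided critical value, e.g. about 1.96 for a 95 % interval."""
    return STD_NORMAL.inv_cdf(1 - (1 - level) / 2)


def ci_from_p(estimate, p, level=0.95):
    """CI from a point estimate and its two-sided p-value (H0: true value = 0).

    Requires estimate != 0 and 0 < p < 1.
    """
    z_obs = STD_NORMAL.inv_cdf(1 - p / 2)   # |test statistic| implied by p
    se = abs(estimate) / z_obs              # implied standard error
    half_width = z_critical(level) * se
    return estimate - half_width, estimate + half_width


def p_from_ci(lower, upper, level=0.95):
    """Two-sided p-value for H0: true value = 0, from a symmetric CI."""
    estimate = (lower + upper) / 2          # point estimate = midpoint of the CI
    se = (upper - lower) / (2 * z_critical(level))
    return 2 * (1 - STD_NORMAL.cdf(abs(estimate) / se))


# Hypothetical example: a 5 mm Hg difference reported with p = 0.04.
lower, upper = ci_from_p(5.0, 0.04)
print(f"95 % CI: {lower:.2f} to {upper:.2f} mm Hg")
print(f"p recovered from the CI: {p_from_ci(lower, upper):.3f}")
```

Because p = 0.04 is below 0.05, the recovered 95 % interval excludes 0, and converting the interval back returns p = 0.04 up to rounding. A p-value above 0.05 would instead yield a 95 % interval that contains 0.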
Conflict of interest statement
The authors declare that there is no conflict of interest as defined by the
guidelines of the International Committee of Medical Journal Editors.
Manuscript received on 23 July 2008, revised version accepted on
21 August 2008.
Translated from the original German by Rodney A. Yeates, M.A., Ph.D.
REFERENCES
1. Bland M, Peacock J: Interpreting statistics with confidence. The
Obstetrician and Gynaecologist 2002; 4: 176-80.
2. Houle TT: Importance of effect sizes for the accumulation of knowledge. Anesthesiology 2007; 106: 415-7.
3. Faller H: Signifikanz, Effektstärke und Konfidenzintervall. Rehabilitation 2004; 43: 174-8.
4. Greenfield ML, Kuhn JE, Wojtys EM: A statistics primer. Confidence
intervals. Am J Sports Med 1998; 26: 145-9.
Erratum in: Am J Sports Med 1999; 27: 544.
5. Bender R, Lange St: Was ist ein Konfidenzintervall? Dtsch Med
Wschr 2001; 126: 41.
6. Altman DG: Confidence intervals in practice. In: Altman DG, Machin
D, Bryant TN, Gardner MJ. BMJ Books 2002; 69.
7. Weiß C: Intervallschätzungen. Die Bedeutung eines Konfidenzintervalls. In: Weiß C: Basiswissen Medizinische Statistik. Springer
Verlag 1999; 191-2.
8. Moher D, Schulz KF, Altman DG for the CONSORT Group: Das
CONSORT Statement: Überarbeitete Empfehlungen zur Qualitätsverbesserung von Reports randomisierter Studien im Parallel-Design.
Dtsch Med Wschr 2004; 129: 16-20.
9. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF: Improving the quality of reports of meta-analyses of randomized
controlled trials: the QUOROM statement. Quality of Reporting of
Meta-analyses. Lancet 1999; 354: 1896-900.
10. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR: Publication
bias in clinical research. Lancet 1991; 337: 867-72.
11. Shakespeare TP, Gebski VJ, Veness MJ, Simes J: Improving interpretation of clinical studies by use of confidence levels, clinical
Corresponding author
Dr. med. Jean-Baptist du Prel, MPH
Zentrum für Kinder- und Jugendmedizin
Zentrum Präventive Pädiatrie Mainz
Langenbeckstr. 1
55101 Mainz, Germany
duprel@zpp.klinik.uni-mainz.de