Figure 3. Scatter Plot Showing Association between BMI and Serum RBP4
Figure 4. Mean RBP4 by Obesity Status. Normal weight, BMI < 25; overweight, BMI 25.0–29.9; obese, BMI ≥ 30.
ate when comparing means among groups. (Standard deviations summarize variability in individual scores.) Note the contrast between Figures 1 and 4. Figure 1 displays the distribution of individual serum RBP4 levels, whereas Figure 4 displays summary statistics on serum RBP4 levels by obesity status. The most appropriate display depends on the objective (e.g., to demonstrate variability in individual scores or to compare means among groups).

Unit of Analysis
The unit of analysis refers to the entity on which measurements are made. In clinical studies, the unit of analysis is often the individual. Some studies may measure a particular characteristic or biomarker in the same individual repeatedly over time. Statistical procedures assume independence of the units of analysis. Thus, in study designs where units are measured repeatedly over time, the analyses must reconcile these dependencies.

Number of Comparison Groups and Relationships among Groups
In many statistical applications, it is of interest to compare groups on the basis of the primary outcome(s). For example, we might want to compare characteristics shown in Table 1 by obesity status. The nature of the primary outcome dictates how the comparison will be made. If the primary outcome is continuous, then means (e.g., mean ages, blood pressures) or medians (e.g., median triglycerides) are compared among groups. If the primary outcome is discrete (e.g., gender or diabetes status), then proportions are compared among groups.
The number of comparison groups is important in determining the appropriate statistical test. One-group or one-sample procedures are used to compare a single study sample to a known referent (e.g., a historical comparator). Two-group or two-sample procedures are very popular and can be used, for example, to compare competing treatments (e.g., active drug versus placebo). A critically important issue is whether the groups are independent or matched/paired. Independent groups are physically separate and are composed of distinct sampling units (e.g., different experimental units assigned to the active drug versus placebo), whereas dependent, matched, or paired groups are often produced when the same sampling units are measured twice (e.g., before and after an exposure) or when the sampling units are paired (e.g., siblings, litter mates). In these situations, procedures are needed to appropriately account for within-subject variability. This is discussed further below.

Statistical Inference
There are many procedures that are used for statistical inference. Each procedure has assumptions about the design and about the distribution of the primary outcome. It is important to recognize that there are often several ways to analyze data. Different approaches are based on different assumptions and can have different interpretations. The assumptions for each procedure are important. If assumptions are not met, then procedures will fail to maintain desirable statistical properties (e.g., confidence intervals will not maintain claimed probabilities, and hypothesis tests may be more likely to produce incorrect results). Some assumptions are extremely important and others can be relaxed. An important assumption is independence of sampling units. Some procedures are based on the assumption that the outcome of interest is approximately normally distributed. Many techniques maintain their statistical properties in the presence of violations of this assumption. We will indicate the situations in which this assumption is important.
There are two general areas of statistical inference: estimation and hypothesis testing. In estimation, we generate confidence interval (CI) estimates of unknown population parameters (e.g., the mean in a single population, the difference in proportions in two independent samples) based on sample data, appropriately accounting for sampling variability. Confidence interval estimates are interpreted as a range of plausible values for an unknown population parameter with a probability attached (Sullivan, 2006). In hypothesis testing, we formally compare population parameters based on sample data, again accounting for sampling variability. We set up competing hypotheses, called the null and research hypotheses. The null hypothesis is the no-difference or no-effect statement, whereas the research hypothesis states the anticipated or hypothesized difference or effect. A test statistic is computed that summarizes the sample information as it relates to the null hypothesis. Hypothesis tests produce a p value, which is the probability of observing a test statistic as large as or larger than that observed if the null hypothesis were true (D’Agostino et al., 2004). A small p value (e.g., p value < 0.05) would suggest that, if the null hypothesis were true, there is less than a 5% probability of observing a difference as large as or larger than that observed in the study sample, and would likely lead to rejection of the null hypothesis in favor of the research hypothesis. The investigator must choose the appropriate significance criterion on which to make that decision (e.g., 0.05, 0.01). Both significant (i.e., p value < 0.05) and nonsignificant p values should be provided so that the reader can judge the significance (or lack thereof) of the findings.

Procedures for Statistical Inference
We now describe popular procedures for statistical inference. Investigators must choose whether a confidence interval approach or a hypothesis testing approach is appropriate in a given setting. We provide some examples below to illustrate the difference in approaches.

One-Sample Procedures for Means and Proportions
As noted above, there are one-sample procedures for means and proportions. One-sample studies are most useful when investigating new techniques or technolo-
Table 4. Confidence Interval and Test of Hypothesis for the Difference in Means
CI for difference in population means^a: $(\bar{x}_1 - \bar{x}_2) \pm t \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Test for difference in independent population means µ1 − µ2
Null hypothesis: µ1 = µ2
Research hypothesis: µ1 ≠ µ2
Test statistic^a: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
^a Where $\bar{x}_1$ and $\bar{x}_2$ are the means in the study samples, t is the value from the t distribution reflecting the desired confidence level (e.g., 95%), $s_p$ is the pooled standard deviation (appropriate when the population variances are assumed to be equal and computed by combining the variances in the two study samples, $s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$), and $n_1$ and $n_2$ are the respective sample sizes.
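As a rough illustration of how the Table 4 formulas fit together, the following Python sketch computes the pooled standard deviation, the confidence interval, and the test statistic from summary statistics. The function name `pooled_two_sample` and the input values are hypothetical (they are not the RBP4 data from Figure 4), and the critical value `t_crit` is assumed to be looked up from a t table for n1 + n2 − 2 degrees of freedom.

```python
import math


def pooled_two_sample(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """CI and t statistic for a difference in independent means.

    Assumes equal population variances, as in Table 4.
    """
    df = n1 + n2 - 2
    # Pooled standard deviation: combines the two sample variances,
    # each weighted by its degrees of freedom.
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    # Standard error of the difference in sample means.
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    margin = t_crit * se
    return {
        "diff": diff,
        "sp": sp,
        "df": df,
        "ci": (diff - margin, diff + margin),
        "t": diff / se,
    }


# Hypothetical summary statistics, chosen only for illustration.
# t_crit = 2.101 is the two-sided 95% critical value for 18 degrees of freedom.
result = pooled_two_sample(
    mean1=50.0, sd1=10.0, n1=10,
    mean2=40.0, sd2=10.0, n2=10,
    t_crit=2.101,
)
print(result)
```

The same standard error appears in both the margin of error and the denominator of the test statistic, which is why a 95% interval that excludes 0 corresponds to a two-sided test that is significant at the 0.05 level.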
Using the data summarized in Figure 4, a 95% confidence interval for the difference in mean RBP4 levels between normal and overweight participants is 28.2 ± 14.1, or (14.1, 42.3). The difference in means is 28.2 units with a margin of error of 14.1 units. We are 95% confident that the true difference in mean RBP4 levels between normal and overweight participants is between 14.1 and 42.3 units.
Comparing mean RBP4 levels between normal and overweight participants using a test of hypothesis (see Table 4) produces t = −4.92, which is highly statistically significant with p = 0.0027 (i.e., the mean RBP4 levels are statistically significantly different between participants who are of normal weight as compared to overweight). The confidence interval and test of hypothesis are two different approaches to making the comparison. The 95% confidence interval provides the range of plausible values for the difference in means (i.e., 14.1–42.3), whereas the test of hypothesis produces the significance of the difference (i.e., p value = 0.0027). Because the confidence interval does not include 0 (i.e., the null value), the confidence interval also indicates that there is a significant difference in means.
When the data are matched or paired, then the analysis is focused on difference scores. For example, suppose a study is conducted in which measures of serum RBP4 are taken on n sampling units at baseline and then again after 6 weeks of exposure to an exercise program. Suppose the objective is to assess the change in the primary outcome in response to the exercise program. Because two measurements are taken on each sampling unit, we violate the assumption of independence of sampling units. The procedure is to compute a difference score on each unit by subtracting one measurement from the other (e.g., baseline to 6 weeks). A confidence interval for the mean difference or a test about the mean difference in the population can be conducted. The confidence interval formula and the hypothesis testing procedure are shown in Table 5.

Table 5. Confidence Interval and Test of Hypothesis for the Mean Difference in Matched or Paired Samples
CI for population mean difference^a: $\bar{x}_d \pm t \, \dfrac{s_d}{\sqrt{n}}$
Test for population mean difference µd
Null hypothesis: µd = 0
Research hypothesis: µd ≠ 0
Test statistic^a: $t = \dfrac{\bar{x}_d - \mu_d}{s_d / \sqrt{n}}$
^a Where $\bar{x}_d$ is the mean of the difference scores in the study sample, µd is the mean difference specified under the null hypothesis, t is the value from the t distribution reflecting the desired confidence level, $s_d$ is the standard deviation of the difference scores in the study sample, and n is the sample size (i.e., the number of independent sampling units, equal to the number of pairs).

It is very important to note that in the paired test (and confidence interval for paired data), summary statistics (i.e., the mean and standard deviation) are based on difference scores.

Tests for Means in More Than Two Groups
When there are more than two independent groups, the procedure to test for differences in means is analysis of variance (ANOVA). In ANOVA, there are k (>2) independent groups and again variances among groups are assumed to be equal. In addition, data are assumed to follow a normal distribution. The procedure for testing the equality of means is shown in Table 6.
The above is suitable to test for differences in means across groups defined by a single factor (e.g., treatments or exposures). For example, using the data summarized in Figure 4, we can compare the mean serum RBP4 levels by obesity status using ANOVA. The test produces F = 19.6, which is highly statistically significant with p = 0.0014 (i.e., the three means are not all equal).
ANOVA is a very general procedure that can also be used to test for differences in means as a function of two or more factors. For example, suppose we wish to test for differences in expression in various cell types exposed to different experimental conditions. Two-factor ANOVA is used to test for differences in expression
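
To make the one-factor ANOVA described above more concrete, here is a minimal Python sketch of the F statistic. Table 6 is not reproduced in this section, so the sketch assumes the standard definition: the ratio of the between-group mean square to the within-group mean square. The function name `one_way_anova_f` and the data are hypothetical and are not the RBP4 measurements.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for k independent groups.

    Assumes the standard one-factor ANOVA setup: independent groups,
    equal variances, and approximately normal data.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: group means around the grand mean,
    # weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three independent groups.
example_groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat, df_between, df_within = one_way_anova_f(example_groups)
print(f_stat, df_between, df_within)
```

The p value is then read from the F distribution with (k − 1, N − k) degrees of freedom. A large F means the group means vary more than would be expected from the within-group variability alone.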
D’Agostino, R.B., Sullivan, L.M., and Beiser, A. (2004). Introductory Applied Biostatistics (Belmont, CA: Duxbury Brooks/Cole).
Littell, R.C., Henry, P.R., and Ammerman, C.B. (1998). Statistical analysis of repeated measures data using SAS procedures. J.

Please cite this article as:
Dukes, K.A., and Sullivan, L.M. (2007). A Review of Basic Biostatistics. In Evaluating Techniques in Biochemical Research, D. Zuk, ed. (Cambridge, MA: Cell Press), http://www.cellpress.com/misc/page?page=ETBR.