Professional Documents
Culture Documents
Alan Agresti
Stable URL:
http://links.jstor.org/sici?sici=0883-4237%28199202%297%3A1%3C131%3AASOEIF%3E2.0.CO%3B2-A
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at
http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained
prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in
the JSTOR archive only for your personal, non-commercial use.
Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at
http://www.jstor.org/journals/ims.html.
Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed
page of such transmission.
The JSTOR Archive is a trusted digital repository providing for long-term preservation and access to leading academic
journals and scholarly literature from around the world. The Archive is supported by libraries, scholarly societies, publishers,
and foundations. It is an initiative of JSTOR, a not-for-profit organization with a mission to help the scholarly community take
advantage of advances in technology. For more information regarding JSTOR, please contact support@jstor.org.
http://www.jstor.org
Mon Aug 27 14:34:53 2007
Statistical Science
1992,Vol. 7,No. 1,131-177
Abstract. The past decade has seen substantial research on exact infer-
ence for contingency tables, both in terms of developing new analyses
and developing efficient algorithms for computations. Coupled with
concomitant improvements in computer power, this research has re-
sulted in a greater variety of exact procedures becoming feasible for
practical use and a considerable increase in the size of data sets to
which the procedures can be applied. For some basic analyses of contin-
gency tables, it is unnecessary to use large-sample approximations to
sampling distributions when their adequacy is in doubt. This article
surveys the current theoretical and computational developments of
exact methods for contingency tables. Primary attention is given to the
exact conditional approach, which eliminates nuisance parameters by
conditioning on their sflicient statistics. The presentation of various
exact inferences is unified by expressing them in terms of parameters
and their sufficient statistics in loglinear models. Exact approaches for
many inferences are not yet addressed in the literature, particularly for
multidimensional contingency tables, and this article also suggests
additional research for the next decade that would make exact methods
yet more widely applicable.
Key words and phrases: Categorical data, chi-squared tests, computa-
tional algorithms, conditional inference, Fisher's exact test, logistic
regression, loglinear models, odds ratios, sufficient statistics.
methods apply to a fixed number of cells, as cell Maternal drinking and congenital malformations
cells to grow as the sample size grows (e.g., Morris, Source: Graubard and Korn (1987).
EXACT INFERENCE FOR CONTINGENCY TABLES 133
and efficiency of algorithms have made exact meth- For two-way contingency tables, let X denote the
ods feasible for a wider variety of inferential analy- row classification and Y the column classification.
ses and for a larger collection of table sizes and For three-way tables, denote the third classification
sample sizes. In this survey of exact inferences for by Z. For simplicity, denote loglinear models by
contingency tables, I indicate which computations standard symbols pertaining to their minimal suffi-
can currently be done and also highlight areas in cient statistics. For instance, (X, Y) denotes the
Sections 2 to 4 present a variety of exact methods a three-way table. The minimal sufficient statistics
for contingency tables in the context of loglinear are i n i + ) and {n+j) for ( X , Y) and in,,,) and
models. Sections 2 and 3 focus on two-way tables, {n+jk) for (XZ, YZ). Finally, denote expected fre-
Section 2 for the 2 x 2 case and Section 3 for the quencies by m and sample proportions by p, for
I x J case. Section 4 focuses on three-way tables, example, {mij = E(nij)} and { pij = nij/n} in a
discusses computing feasibility of exact methods, inference in contingency tables has been a condi-
using currently available software. tional one. Suppose an inference refers to a param-
Nearly all the literature on exact methods for eter in some loglinear model. Exact conditional
contingency tables emphasizes hypothesis testing; inferential methods utilize the distribution of the
but, in each section, I also indicate the scope sufficient statistic for that parameter, conditional
of work on interval estimation. My discussion on sufficient statistics for the other parameters
throughout the article takes the viewpoint of classi- ("nuisance" parameters) in the model. For in-
cal frequentist conditional inference, with appli- stance, suppose one wants to test a hypothesis Ho
cation to loglinear models for contingency tables. that corresponds to a loglinear model symbolized by
Section 8 mentions other approaches. It presents an Mo, under the assumption that a more general
unconditional approach to exact inference and also model M, holds, corresponding to an alternative
indicates how Bayesian inferences correspond to hypothesis HI. Denote minimal sufficient statistics
conditional inferences for certain choices of prior for the models by To and TI. Exact inference uses
distributions. The final section discusses possible the conditional distribution of T, given To (e.g.,
future directions for research on exact methods for Andersen, 1974). By the definition of sufficiency,
Throughout the article, I assume a standard Pois- nuisance parameters, thus making exact inference
table, the cell counts inij) might have a multino- contingency table, the saturated loglinear model
conditioning further on the row totals yields inde- est focuses on 8 = exp(A,,), the others being nui-
pendent multinomial sampling. All sampling mod- sance parameters. The sufficient statistics are n for
els lead to the same exact inferences, since those p, n,, for A?, n,, for A,; and n,, for A,,. To conduct
inferences condition on marginal totals that con- exact inference about 8, consider the conditional
tain as a subset the naturally fixed totals, and distribution of n,, given n, n,,, and n,,. This
since the parameters of usual interest are not the conditional distribution is the same as the one
proportions in the margins that are fixed under having elements P(Y, = n,,, Y, = n,, - n,, I Y, +
some sampling designs but not under others. Y, = n+,), where {Yi, i = 1 , 2 ) are indepen-
134 A. AGRESTI
dent binomial random variables with parameters the model by conditioning on sufficient statistics
[ ni+, mi, /(mi, +
mi,)]. Straightforward calcu- for them.
lation shows that it equals
2. EXACT INFERENCE FOR 2 x 2 TABLES
This simple case still generates an enormous vol-
ume of publications, partly because of controversy
(discussed in Section 8.1) about applying exact
methods that condition on marginal totals that are
not naturally fixed under Poisson, multinomial, or
binomial sampling schemes. The most common
where the index of summation ranges from
sampling model for 2 x 2 tables is independent
max(0, n,++ n+, - n) to min(n,+, n+,), the possi-
binomial sampling in the two rows. Denote the
ble values for n,, for the given marginal totals.
"success" probabilities by a, and a,, for which the
This is the noncentral hypergeometric distribution
odds ratio is 8 = [a, /(1 - a,)]/[a, /(1 - a,)]. The
(Fisher, 1935a; Cornfield, 1956).
hypothesis of homogeneity states that a, = a,. For
To test statistical independence of X and Y [A,,
Poisson or full multinomial sampling, conditioning
= 0 in model (1.1)1, one uses this distribution with
on { n,+, n,+ ) gives binomial sampling, and statis-
8 = 1. Of course, to complete a test, one needs to
tical independence is equivalent to homogeneity.
specify a test statistic and the way to compute the Under the hypothesis of homogeneity, conditioning
p-value. In testing a loglinear model M, against a
as well on n+, to eliminate the nuisance parameter
more complex model MI, in this article I will gen- (the common value of a, and a,) yields the hyper-
erally base p-values on the exact conditional distri- geometric distribution (1.2) with 8 = 1.
bution of Rao's efficient score statistic for testing
that the extra parameters that are in M, but not 2.1 Fisher's Exact Test
in Mo equal zero. The efficient score statistic is
based on the vector of partial derivatives of the log Fisher's exact test (Fisher, 1934, 1935a; Yates,
likelihood with respect to the extra parameters, 1934; Irwin, 1935) of H,: 8 = 1 is "well known,"
evaluated at the null estimates (Rao, 1973, Section and I refer the reader to Lehmann (1986, pages
6e). For testing independence in a 2 x 2 table, this 151-162) for details of its derivation under various
suggests [ n,, - (n,+n+,)/ nl, suitably normalized, sampling schemes. The null hypergeometric dis-
as a test statistic. tribution has E(n,,) = n,+n+, / n and Var(n,,) =
The p-value, based on extreme values of the test n1+n+,n2+n+,/n2(n- 1). To test H,: 8 = 1
statistic, is calculated using the distribution for against HI: 8 > 1, the p-value is
that statistic that is_induced by the exact condi-
tional distribution of { nij) . When M, contains one
more parameter than M,, tests based on the effi-
cient score statistic are uniformly most powerful where S = { t: t 2 n,,). The set S is identical to
unbiased (UMPU), as a consequence of a result for that of tables for which the sample odds ratio is at
exponential families (e.g., Lehmann, 1986, Theo- least as large as observed.
rem 3, page 147). Denote the ML estimator of the The hypergeometric applies directly as a sam-
parameters in M, by (e,, el), where 9, denotes the pling model when both sets of marginal counts are
estimator of the parameters that are in M, but not naturally fixed. A classic example of this case is
in M,. For large samples with a fixed number of Fisher's (1935b) tea-tasting experiment, relating to
cells, these tests are asymptotically equivalent to a woman's claim to be able to judge whether tea or
~ a l tests
d based on el and likelihood-ratio tests of milk was poured in the cup first. The woman was
Mo against M, (Rao, 1973, pages 418-420). given eight cups of tea, in four of which tea was
My reason for relating exact conditional infer- poured first and in four of which milk was poured
ence to loglinear models in this article is as first, and was told to guess which four had tea
follows. Loglinear models are generalized linear added first. The contingency table for this design
models for categorical data that use the canonical (see Table 2) has n = 8 and n,+= n+, = 4; n,, can
link, that is, they directlyqmodel the natural pa- take values O,1,2,3,4 with corresponding one-sided
rameter (the log mean) of a natural exponential p-values (for HI: 8 > 1) of 1.0, 0.986, 0.757, 0.243
family (the Poisson). Such generalized linear mod- and 0.014.
els permit reduction of the data through sufficient Ways of forming two-sided p-values in Fisher's
statistics (McCullagh and Nelder, 1989, page 32). exact test were discussed by Gibbons and Pratt
Thus, one can eliminate nuisance parameters in (1975), Yates and discussants (1984), Davis (1986),
EXACT INFERENCE FOR CONTINGENCY TABLES 135
TABLE2 ing the upper endpoint; if n,, = min(n,+, n,,), then
Fisher's tea-tasting experiment
the upper endpoint is m, and one uses P = a in
Guess poured first obtaining the lower endpoint. For the tea-tasting
Poured first Milk Tea Total
data, the 95% confidence interval obtained in this
-
manner is (0.21,626.2).
Milk 3 1 4 The discreteness of the distribution of n,, limits
Tea 1 3 4 the confidence intervals to a discrete set of possible
Total 4 4 8
endpoints, for fixed a . Thus, the true confidence
coefficient is at least 1 - a , rather than exactly
1 - a , and its value depends on the value of I9
Dupont (1986), Mantel (1987), and Lloyd (1988a). (Neyman, 1935). It is strictly greater than 1 - a
The most popular approaches are (a) double unless the true 19 is an attainable endpoint. I use
the one-sided p-value, (b) (2.1) with S = { t: quotes around "exact" in referring to such inter-
f ( t l n, n,,, n,,) If(n1, I n, n,,, n,,)), and (c) (2.1) vals to reflect this behavior.
with S = {t: I t - E(n,,)I 2 1 n,, - E(n,,)I}. The Baptista and Pike (1977) described an alterna-
third option is identical to the null probability that tive approach that also guarantees the confidence
the Pearson chi-squared statistic is at least as large level but sometimes gives slightly shorter inter-
as observed. Different approaches can give differ- vals. Their interval is a generalization of Sterne's
ent results because of the discreteness and poten- (1954) confidence interval for a single binomial
tial skewness of the hypergeometric distribution. parameter. One inverts a family of acceptance re-
For instance, consider the table having counts by gions that are formed using the minimal number of
row (10,90/20,80) (i.e., 10 and 90 in the first row, most likely outcomes. Specifically, for each 00, one
20 and 80 in the second), discussed by Dupont finds a set A(0,) of possible n,, values such that
(1986). The null distribution of n,, is symmetric the probability of the set is at least 1 - a and such
about 15, and the three p-values are identical and that every integer in the set is at least as likely to
equal 0.073. However, for the nearly identical table occur as every integer outside the set. The confi-
(10,91/20,80), p-value (a) equals 0.069, whereas dence interval is the set of 8, values for which the
(b) and (c) equal 0.050. observed n,, falls in A(0,).
It is not possible to construct "exact" confidence
2.2 "Exact" Estimation in 2 x 2 Tables intervals for association measures that are not
The non-null conditional distribution (1.2) of n,, functions of the odds ratio. They do not occur as
is used in constructing "exact" confidence intervals parameters in generalized linear models with Pois-
for the odds ratio. For data n,,, the conditional ML son or binomial random component using canonical
estimate of 6 is t h e value of 19 that maximizes links. Thus, the usual conditioning arguments do
probability (1.2). The estimate is obtained using not eliminate nuisance parameters. For instance,
iterative methods (Cornfield, 1956), and differs from consider estimation of the difference of probabili-
the unconditional ML estimate 6 = n,, n,, / n,, n,,. ties 6 = a, - a, for independent binomial samples.
For instance, for Fisher's tea-tasting data (Table 2), The joint sampling distribution can be expressed in
n,, = 3 and the unconditional ML estimate is 9; the terms of 6 and a,, for instance, but conditioning on
conditional ML estimate maximizes the conditional the marginal totals does not eliminate a,. Santner
likelihood (1.2), and Snell (1980) discussed this and other difficul-
ties in interval estimation of the difference of pro-
portions and the relative risk. They also described
ways of getting conservative confidence intervals
and equals 6.41. for these parameters, but the usual conservative-
For testing H,: 8 = Oo against HI: 6 > O,, the ness due to discreteness is compounded because of
p-value is P = C,f(t I n, n,,, n+,;8,), where S = the approach used to eliminate the nuisance
{t: t 2 n,,}. For testing against H,: 0 < O, S = {t: parameter.
t I n,,}. One can obtain an "exact" confidence in- "Exact" interval estimates exist in certain ex-
terval for 19 by inverting the test (Cornfield, 1956; treme situations in which asymptotic interval esti-
Mantel and Hankey, 1971'; Thomas, 1971). The mates do not. Suppose the conditional inference
lower endpoint is the 8, value for which P = a / 2 uses the distribution of TI, given To. When T,
in testing H,: 19 = 8, against H I : 8 > 8,. The upper assumes its maximum or minimum value, the un-
endpoint is the 8, value for which P = a / 2 in conditional (or conditional) ML estimator of 6 and
testing Ho against H,: 8 < 8,. If n,, = 0, then the its asymptotic standard error do not exist (unless
lower endpoint is 0, and one uses P = a in obtain- one adds some constant to the cells), but "exact"
136 A. AGRESTI
one-sided confidence intervals do exist. For a point pair, one eliminates the nuisance parameters { a h }
estimate in such cases, some authors (e.g., Hirji, by conditioning on the pairwise success totals
Mehta and Patel, 1987, 1988; Hirji, Tsiatis and { ylh + Y , ~ }Given
. a total of 0, P(Ylh = YZh= 0) =
Mehta, 1989) use median unbiased estimates. To 1, and given a total of 2, P(Ylh = Y,, = 1) = 1.
illustrate, for a 2 x 2 table, when n,, = 0, the ML The conditional distribution of ( Y, ,, Y2,) depends
estimator of the loglinear association parameter on 0 only when y,, + YZh = 1. For each of the n*
(the log odds ratio) and the ML estimator ( C C l / nij) such pairs, direct calculation using (2.2) shows that
of the asymptotic variance of the ML estimator do P(Ylh = 1, Y,, = 0) = [l + exp(0)l-l. Since n,, =
not exist. The median unbiased estimate is the 8, Cy,, for these n* pairs, the distribution of n,,
value that gives P = 0.5 and the "exact" upper conditional on n* is binomial with parameter [l +
100(1 - a)% confidence limit is the 8, value that exp(0)I-l. The parameter equals 112 under
gives P = a , in testing against HI: 8 < 8,. marginal homogeneity ( 0 = 0). This conditional
analysis implies that pairs having identical re-
2.3 Comparing Dependent Proportions sponse at the two occasions are irrelevant to testing
Next, consider the comparison of two proportions a , + =a+,.
when each sample contains the same subjects, or Cox (1966) generalized this analysis for data in
the sample consists of matched pairs. For instance, which each case is matched with several controls.
one might have repeated measurement of a binary Gart (1969) gave an exact test for comparing
response at two occasions. Let nij be the number of matched proportions in crossover designs.
observations in category i at occasion 1 and cate-
gory j at occasion 2. Let a i j denote the probability 3. EXACT INFERENCE IN I x J TABLES
of response i at occasion 1 and response j at occa- For I x J tables, statistical independence of X
sion 2, and let pij = nij/n. The sample proportions and Y corresponds to the loglinear model M, =
of " S U C C ~ S S ~ S "p l + at occasion 1 and p + , at occa- (X, Y), which has form
sion 2 are dependent, rather than independent,
because of the matching. The hypothesis of log mij = p + AT + A;.
marginal homogeneity, H,: a, = a ,, corresponds
+ +
to homogeneity for the 2 x 2 table consisting of the To test this model against the saturated model
two marginal distributions. For binary responses, [ M, = (1.1) extended to I rows and J columnsl, one
marginal homogeneity is equivalent to symmetry, uses the null distribution of inij} given { n , + }and
a,, = a,, To obtain an exact test, one conditions
on n* = n,, + n,,. Under H,, n,, has a binomial For multinomial sampling, the cell probability
(n*, 112) distribution. A two-sided p-value parameters {aij} in the distribution of {n,,} can be
is the sum of binomial probabilities for n,, val- expressed in terms of the marginal probabilities
ues at least as far from n*/2 as observed. This and (I - 1)(J - 1) odds ratios. Conditional on
is a small-sample version of McNemar's test i n , + } and {n+j}, Cornfield (1956) noted that the
(McNemar, 1947). distribution of inij} depends only on the odds ra-
Cox (1958a) provided an argument, of which I tios. The conditional probabilities are proportional
now sketch the outline, that motivates this condi- to
tional approach. Let ( Y,,, Y,,) denote the hth pair
of observations, where a "1" response denotes cate-
gory 1 (success) and "0" denotes category 2.
Consider the logit model
When I = 2, ordering the tables by T is equiva- quencies { mijk),model (XY, XZ, YZ) has the form
lent to ordering them by U = Cj yjnIj. Many statis-
tical tests use U for various choices of scores log mi,, =p + A?+ A+: % + hEY+ +
(Graubard and Korn, 1987). For arbitrary mono-
tone scores, the exact test for T is a small-sample and model (XZ, YZ) is its special case in which
version of a trend test proposed by Cochran (1954) xu -
{hij 0
- 1.
and Armitage (1955). The statistic with midrank
4.1 Testing Conditional Independence in 2 x 2 x K
scores is used in exact Wilcoxon tests for ordered
Tables
categorical data (Klotz, 1966; Mehta, Pate1 and
Tsiatis, 1984). A common approach to testing conditional inde-
pendence [ M, = (XZ, YZ)] performs the analysis
3.2 Exact Estimation for I x J Tables under the assumption of no three-factor interaction
"Exact" confidence intervals are rarely used for [MI = (XY, XZ, YZ)]. The sufficient statistics are
I x J tables, probably because of the complexity. the X-Z and Y-Z two-way marginal tables for M,,
Reducing parameter dimensionality could be use- and these as well as the X-Y marginal table for
ful in many applications, by using an unsaturated MI. The relevant conditional distribution is that of
loglinear model. Agresti, Mehta and Pate1 (1990) the X-Y marginal table, given the X-Z and Y-Z
discussed confidence intervals for ordinal classifica- marginal tables. The conditioned totals are the
tions when odds ratios satisfy the pattern marginal counts for the K partial tables. For the
2 x 2 x K case, the distribution simplifies to that
of = Cknllk, given {nl+k,n2+k,n+lk, n+2k, =
1, . . . , K}. Under the assumption of no three-factor
implied by the linear-by-linear association model interaction, Birch (1964) showed that UMPU tests
(3.2). For this pattern, the non-null distribution of of conditional independence utilize T (see also
T = CCxiyjnij is Lehmann 1986, pages 163-164).
Conditional on the strata margins, inllk, k =
1,. . . , K) have independent hypergeometric distri-
butions, each of form (1.2) with 8 = 1. The product
of the K mass functions determines the null distri-
bution of their sum. For the one-sided alternative
where C, is the sum of (11,IIjnij!)-l for all tables
of a "positive" association (odds ratio greater than
with the given marginal distributions having T = t .
1.0 in each level of Z), the p-value for Birch's exact
One obtains a confidence interval for 0 and thus
test of conditional independence is the null proba-
{aij} by inverting the test of H,:p = Po, as in the
bility that Cknllk is at least as large as observed,
case of the odds ratro for a 2 x 2 table. When T
for the fixed marginal totals. Thomas (1975),
takes its maximum or minimum possible value for
Pagano and Tritchler (1983b), and Mehta, Pate1
the given marginals, the conditional and uncondi-
and Gray (1985) gave algorithms for implementing
tional ML estimators of P and asymptotic standard
this test, which may be regarded as an exact small-
errors do not exist, but one-sided confidence bounds
sample version of the Cochran-Mantel-Haenszel
for 0 and odds ratios using it do exist.
test.
The exact McNemar test for matched pairs (Sec-
4. EXACT INFERENCE IN THREE-WAY
tion 2.3) is a special case of Birch's exact test of
CONTINGENCY TABLES
conditional independence. In this representation,
I,next consider three-way tables, paying special each of the n matched pairs has a 2 x 2 table
attention to the 2 x 2 x K case. Such tables occur, relating occasion (or member of pair) to response.
for instance, when one compares a binary response One tests M, = conditional independence of occa-
(Y) for two treatments (X), using data obtained at sion and response, given the pair, under the as-
K levels of a possibly confounding factor (2).Two sumption of MI = homogeneous odds ratios in the
hypotheses of importance are (1) conditional inde- n 2 x 2 tables.
pendence of X and Y, given Z, and (2) no three- For 2 x 2 x K tables in which it is unrealistic to
factor interaction, meaning' that the true X-Y odds expect the K conditional X-Y odds ratios to be
ratio is identical at each level of Z. Conditional similar or even of the same sign, the saturated
independence of X and Y, given Z, corresponds to model is a more relevant alternative than no
the loglinear model symbolized by (XZ, YZ), and three-factor interaction. In that case, the efficient
no three-factor interaction corresponds to the log- score statistic is CkX;, where X l denotes the
linear model (XY, XZ, YZ). For cell-expected fre- Pearson statistic for testing independence between
EXACT INFERENCE FOR CONTINGENCY TABLES 139
X and Y within the kth level of Z. This statistic is wirth (1988, page 266). The data refers to P =
commonly used in asymptotic tests of conditional whether promoted and R = race, stratified by M =
independence against the general alternative. month of promotion consideration (in 1974). Cases
Currently, there do not seem to be any computer involved GS-13 level computer specialists being
algorithms for this case. considered for promotion to level GS-14. (It appears
that many of the subjects appeared in two-or all
4.2 Testing Homogeneity of Odds Ratios in three strata, but this information is not available.
2 x 2 x K Tables So, like Gastwirth, I treat the promotion decisions
Birch's exact test using test statistic C,n,,, as- as independent.) Under the assumption of a con-
sumes homogeneity of the odds ratios in the 2 x stant odds ratio 8 between P and R at each level of
2 x K table. Zelen (1971) presented an exact test of M, we first test H,: 8 = 1 against H,: 8 < 1. In
this assumption. Here, M, = (XY, XZ, YZ), and using the one-sided alternative for the association
M, is the saturated model. The conditional distri- between P and R, the test is sensitive to evidence of
bution, given each of the three sets of two-way possible discrimination against blacks, in the sense
marginal totals, has probabilities proportional to of the probability of promotions being lower for
(IIiIIjIIknijk!)-l.For the 2 x 2 x K case, the fixed blacks than whites. Given the P and R marginal
totals are in,,,, n,+,, n,,,, n,,,, k = 1,. . K) a ,
totals at each level of M, n,,, can range between 0
and {n,,,, n,,,, n,,,, n,,,). Zelen defined the p- and 4, n,,, can range between 0 and 4, and n,,,
value to be the sum of probabilities of all 2 x 2 x K can range between 0 and 2. The test statistic T =
tables that are no more probable than the observed Cn,,, can assume values between 0 and 10, and
table. Alternatively, one could define P by order- under H,, E(T) = 2.90 and u(T) = 1.35. Note that
ing the tables with the given two-way margins the sample data represent the most extreme possi-
by x2 for testing the fit of loglinear model ble result in each of the three cases. The observed
(XY, XZ, YZ). Thomas (1975) and Pagano and test statistic is T = 0, and the p-value is the null
Tritcher (1983b) gave algorithms for implementing value of P(T r 0), which is 0.026. Zelen's test of
Zelen's test. the hypothesis of a constant odds ratio is a test of
To improve potential power in testing model fit of the loglinear model (PR, PM, RM). The refer-
(XY, XZ, YZ), one could instead test it against an ence set consists of the subset of these 2 x 2 x 3
unsaturated model. When the levels of Z have a tables that also satisfy n,,+= 0. The observed table
natural ordering, one could use the alternative log- is the only such table, so the test is degenerate,
linear model by which the log odds ratios change giving P = 1.0.
linearly across the K strata; that is, 4.3 Exact Estimation in 2 x 2 x K Tables
log mi,,= (XY, XZ, YZ) + I(i =j = 1) 6zk, Assuming no three-factor interaction and condi-
tioning on the strata totals, the joint distribution of
for fixed monotone scores { z,), where I(.) is the
,,,,
{ n . . . , n,,,) is the product of K terms of the
type given in (1.2) for the 2 x 2 case. This distribu-
indicator function. The relevant distribution is then tion depends only on the common odds ratio. Birch
that of ~ , z , n , , , , conditional on all two-way (1964) discussed the conditional ML estimator of
marginal totals. Zelen (1971) also presented such the common odds ratio, and Gart (1970) defined
an exact test. "exact" confidence intervals as a direct extension
To illustrate some exact tests for conditional in- of Cornfield's "exact" intervals for 2 x 2 tables.
dependence and for homogeneity of odds ratios for Thomas (1975), Pagano and Tritchler (1983b),
2 x 2 x K tables, consider Table 4, a 2 x 2 x 3 Mehta, Pate1 and Gray (1985), and Vollset, Hirji
table based on a larger table presented by Gast- and Elashoff (1991) provided algorithms for calcu-
lating such intervals. When the sufficient statistic
T = C knllk attains its minimum or maximum pos-
TABLE4 sible value, asymptotic confidence intervals (e.g.,
Example of 2 x 2 x K analysis using the Mantel-Haenszel approach; see Agresti
July ' August September
1990, pages 235-236) do not exist, but one-sided
promotions promotions promotions "exact" confidence intervals are well defined.
Race Yes No Yes No Yes No
Table 4 has the boundary value T = 0, resulting
in a degenerate conditional ML estimate of 0.0 for
Black 0 7 0 7 0 8 an assumed constant odds ratio. An "exact" 95%
White 4 16 4 13 2 13 upper confidence bound for the odds ratio equals
Source: Gastwirth (1988). 0.779, the value 8, that gives P = 0.05 in testing
140 A. AGRESTI
H:, 19 = 6 , against HI: 19 < 19,. The median unbi- (XZ, YZ)] using an exact test of independence of X
ased estimator equals 0.152. and Z in that two-way table. For instance, one can
conduct an exact test of ( P , RM) against (PR, RM)
4.4 Exact Methods for I x J x K Tables by simply testing whether P and R are indepen-
In principle, methods for exact testing and esti- dent in the marginal table (0,22/10,42); for this
,nation extend to loglinear models for multiway table, the Pearson statistic has two-sided P =
tables. Current computational algorithms are re- 0.056. Similarly, one can conduct an exact test of
stricted to certain analyses for 2 x J x K tables. In [I: ( X, Y, Z)] against [2: ( X, YZ)] using an exact
this subsection, I outline the types of inferences for test of independence of Y and Z in the two-way
which it would be useful to extend computational Y-Z marginal table. Computational algorithms are
algorithms in the I x J x K case. unavailable for the other situations, discussed in
Consider the hypotheses corresponding to the fol- the remainder of this subsection.
lowing five hierarchical loglinear models: To test hypothesis [I: (X, Y, Z)l against the satu-
rated model, one conditions on the sufficient statis-
(1) (X, Y, 2): Mutual independence of X, Y, tics for (X, Y, Z), which are {n,++,n+j+,n++,).
and Z; The resulting mass function is
(2) ( X, YZ): Joint independence of X and the
Y-Z classification;
(3) ( XZ, YZ): Conditional independence of X
and Y, given Z;
(4) (XY, XZ, YZ): No three-factor interaction;
(5) ( XYZ): Saturated model.
For each of models (1)to (4), one can consider exact Stumpf and Steyn (1986) gave formulas for the
testing against the alternative of the next most first- and second-order moments of the cell counts.
complex model, and against the general alternative To construct an exact test of hypothesis [I:
of the saturated model. ( X , Y, Z)l against the saturated model, one could
Certain tests are special cases of ones already use this distribution to generate the exact con-
developed for two-way tables, so they do not require ditional distribution of the Pearson statistic for
separate consideration. Hypothesis [2: (X, YZ)] is a testing the fit of model (X, Y, 2 ) . This case is
special case of statistical independence for a two- relatively unimportant, as the hypothesis of mu-
way table, in which the second classification con- tual independence is plausible in very few applica-
sists of the J K combinations of categories of Y and tions.
Z. Thus, an exact test of [2: (X, YZ)] against [5: A more important case is a test of conditional
(XYZ)] is simply a standard exact test of indepen- independence [3: ( XZ, YZ)] of X and Y against
dence for a two-way ( I x J K ) table. For instance, [4: (XY, XZ, YZ)]. Such a test generalizes Birch's
in Table 4, an exact test of ( P , RM) (i.e., that test for 2 x 2 x K tables. Here, one tests condi-
promotion is jointly independent of race and month tional independence under the assumption that the
of decision) is an exact test for the two-way table ( I - 1 ) ( J - 1) odds ratios relating X and Y are
having rows (0,4,0,4,0,2/7,16,7,13,8,13). The identical across the K levels of Z. Model (XZ, YZ)
Pearson test statistic of 5.62 has an exact p- has the sufficient statistics ({ni+k),{n+jk)), and
value of 0.353. [The attained significance for model (XY, XZ, YZ) has these plus { nij+). One
this joint test is weak compared to the p-value of uses the distribution of ({ nij+} I { ni+k),
0.026 obtained for the P-R association alone in which is a product of multivariate hypergeometric
the 'one-sided exact test of (PM, RM) against distributions from the various layers of Z. Let d
(PR, PM, RM). This shows how severely the evi- denote the ( I - 1)(J - 1) x 1 vector having ele-
dence represented by an effect of a certain size can ments
diminish when the degrees of freedom on which it
is based increase drastically.]
A test of [2: (X, YZ)] against [3: (XZ, YZ)l tests
whether the hxZ term in model (XZ, YZ) is zero.
By standard collapsibility results (e.g., Bishop,
1971; Agresti, 1990, pages 146 and 230), when this
model holds the hXZ term is identical to the hXZ The efficient score test orders tables in the refer-
term in the saturated two-way loglinear model for ence set using d'VP1d, where V is the null co-
the marginal X-Z two-way table. Thus, one can variance matrix of d. Birch (1965) proposed an
conduct an exact test of [2: (X, YZ)] against [3: asymptotic test of this type.
EXACT INFERENCE FOR CONTINGENCY TABLES 141
Alternatively, one could test conditional indepen- where the two rows for stratum k contain one
dence [3: (XZ, YZ)] against [5: (XYZ)], if one be- observation in each row, giving the responses at
lieves that the association between X and Y may the two occasions for subject k. Such a test is an
vary considerably across levels of Z. An efficient exact analog of asymptotic tests described by White,
score statistic is then the Pearson statistic for test- Landis and Cooper (1982) and Kuritz, Landis and
ing the fit of (XZ, YZ), which is C,X;, where X; Koch (1988).
is the Pearson statistic for testing independence Interval estimation for I x J x K tables would
between X and Y within the kth level of Z. This seem to be awkward except for simpler models that
test would also use the conditional distribution that reduce the dimensionality of the parameter space.
fixes the XZ and YZ marginal tables. Since it has Examples include models that describe the X-Y
a broader alternative than the generalized Birch conditional association by a linear-by-linear term
test, this test would tend to be less powerful than or describe three-factor interaction by a linear trend
that test when model (XY, XZ, YZ) provides a de- in conditional log odds ratios.
cent approximation to the actual distribution.
An exact test of no three-factor interaction [4: 5. EXACT INFERENCE IN LOGISTIC
(XY, XZ, YZ)] for I x J x K tables generalizes Ze- REGRESSION MODELS
len's test for 2 x 2 x K tables. Here, the null hy-
Exact inferences for loglinear models discussed
pothesis states that the (I - 1)(J - 1) odds ratios
in previous sections have counterparts for other
relating X and Y are identical across the K levels
generalized linear models that use natural parame-
of Z. The relevant conditional distribution has
ters as the basis of the link function. A closely
probabilities proportional to (IIiIIjIIknijk!)-l,for
related example for categorical data is logistic re-
the reference set of tables having XY, XZ and YZ
gression modeling. Assuming a binomial distribu-
marginal tables identical to the observed ones. An
tion with parameter ?r for the response, one uses
efficient score test against the general alternative
the logit link, log[?r/(l - T)]. Logistic models are
[5: (XYZ)] orders tables in the reference set by the
particularly useful when highlighting one categori-
Pearson statistic for testing the fit of the model.
cal variable in the contingency table as a response
For ordinal variables, one would modify the above
and the others as explanatory. Loglinear models do
ideas by constructing tests to increase power against
not make this distinction, although logistic models
important alternatives, such as has been done for
with qualitative explanatory variables have equiv-
I x J tables. For instance, to test conditional inde-
alent loglinear representations.
pendence (XZ, YZ), one might choose an alter-
For subject i, let yi denote a binary response,
native model that implies a monotone conditional
and let x i = (xio,xi,, . . . , xi,) denote values of k
X-Y association. One could use the single degree- explanatory variables, where xi, = 1. The logistic
of-freedom test statistic C ,{ C C xi yj[ nijk -
regression model is
(ni+kn+jk)/n++kl) to order the reference set of ta-
bles having the fixed X-Z and Y-Z marginal tables.
This results from comparing model (XZ, YZ) to a
model of homogeneous linear-by-linear association,
whereby the term in model (XY, XZ, YZ) is
replaced by a term @xiyj for fixed monotone row
and column scores. Such a test would be an exact
analog of an asymptotic score test proposed by
Mantel (1963) and Birch (1965). .
There has been some work of this type for the Under the usual assumption that { yi) are indepen-
2 x J x K case with ordered levels of Y (Mehta, dent Bernoulli outcomes, the sufficient statistic for
Pate1 and Senchaudhuri, 1991). In this case, one Pj is Tj = Ciyixij, j = 0,. . . , k. As noted by Cox
can take x, = 1and x, = 0, and consider the condi- (1958b, 1970), one can conduct exact inference for
tional distribution of C k[Cjyjnijk].For preselected Pj using the distribution of Tj, conditional on {Ti,
monotone scores, this gives a stratified version of i # j). Such inference is called conditional logistic
the Cochran-Armitage trend test. For rank { yj) regression (Bayer and Cox, 1979; Breslow and Day,
scores, it gives a stratified version of the Wilcoxon 1980, Chapter 7; Tritchler, 1984; Hirji, Mehta and
test. Another application of this case is testing Patel, 1987).
marginal homogeneity in an I x I table with the To illustrate, Table 5 shows some data from a
same ordered row and column categories. One can case-control study (Shapiro et al., 1979) relating
conduct this test by conducting the exact test of cigarette smoking to myocardial infarction for
conditional independence for the 2 x I x n table, women of various ages using oral contraceptives.
142 A. AGRESTI
TABLE5
continuous. The { y,) values may be completely
Example for exact logistic analysis
time increasing polynomially in the sample size, on test statistic values for tables having path pass-
rather than exponentially. However, Vollset, Hirji ing through that node. In this way, one can deter-
and Elashoff (1991) noted that this method's use of mine tables that necessarily do or do not contribute
trigonometric functions and complex arithmetic can to the p-value, without processing the remaining
introduce substantial round-off errors for non-null parts of paths in the network passing through that
calculations when there is a wide range between node. Vollset, Hirji and Elashoff (1991) gave an
the largest and smallest of the combinatorial coeffi- adaptation of the network algorithm that uses an
cients that determine the exact distribution (e.g., a algebraic rather than geometric representation,
ratio of the two exceeding about lo2'). treating convolutions of distributions using polyno-
Among the most popular and versatile programs mial multiplication. Their algorithm computes the
developed in the past decade have been ones using null distribution on the natural rather than loga-
the network algorithm. This algorithm has been rithmic scale. Using this approach, they obtained
applied to several problems in a series of papers by faster computation of "exact" confidence limits for
Cyrus Mehta, Nitin Pate1 and some coworkers. For the common odds ratio in 2 x 2 x K tables.
instance, see Mehta and Pate1 (1983, 1986) for its These days, algorithms such as the network algo-
application to Freeman-Halton exact tests for I x J rithm can practically handle analyses for most I x
tables; Mehta, Pate1 and Tsiatis (1984) for exact J and 2 x 2 x K tables having a small to moderate
tests for 2 x J tables; Mehta, Pate1 and Gray (1985) number of cells and small to moderate cell counts.
for inference for the common odds ratio in 2 x 2 x K To illustrate, Table 6 reports the CPU time on a
tables; Hirji, Mehta and Pate1 (1987) for exact lo-
gistic regression; Agresti, Mehta and Pate1 (1990)
for exact inference in I x J tables with ordered
categories; Mehta, Pate1 and Senchaudhuri (1991)
for inference for 2 x J x K tables; and Hilton,
Mehta and Pate1 (1991) for Smirnov tests for cate-
gorical or continuous data.
I provide here only a brief outline of the network
representation for the reference set for an I x J
table with fixed margins, and refer the reader to
the previously mentioned papers for technical de-
tails on the algorithm itself. The network represen-
tation consists of nodes and arcs, constructed in
+
J 1 stages. For k = 0,. . . , J, the nodes at stage
k have the form (k,w,), where w, = (w,, , . . . , w,,),
with wik = nil + .. + iiik and w, = 0. There are
as many nodes at stage k as there are possible
partial sums for the first k columns of the table.
Arcs emanate from each node at any stage k, each
arc being connected to a distinct node at stage FIG. 1. Network representation for 3 x 3 tables having row
+
k 1. The network is constructed recursively by margin ( 6 , 1 , 2 )and column margin ( 1 , 2 , 6 ) .
specifying all successor nodes (k + 1,wk+,) that
are connected by arcs to each node (k, w,). A path
through the network is a sequence of arcs (0,O) -,
TABLE 6
(1, w,) -, . + ( J , w,). Each path represents a Sample CPU times (seconds) for Freeman-Halton test on
distinct table in the reference set, with entries I x I tables with uniform marginal counts andp-values
(w,+, - w,) in column k + 1. The network repre- approximately 0.05, using SAS on a DEC 3100 workstation
sentation is used in calculating the exact distribu-
tion by stagewise recursion, beginning at node Total sample size
(0,O). Figure 1 shows the network representa-
tion for the 3 x 3 table discussed in Section 3.1
having row margins (6,1,2) and column margins
(1, 2, 6). The topmost path gives the table
3/0,0,1/0,0,2).
For simply calculating p-values, one can dramat-
ically increase the speed of the network algorithm
by computing at each node lower and upper bounds a More than 6 hours.
EXACT INFERENCE FOR CONTINGENCY TABLES 145
DEC 3100 workstation needed for SAS (PROC of 0.0004 and a 99% confidence interval for that
FREQ) to perform the Freeman-Halton test for exact p-value of (0.0000,O.0009).
I x I tables of various sizes having uniform An advantage of the Monte Carlo method is that
marginal totals and cell frequencies changed from the amount of computational work is much less
uniformity sufficiently to give p-values approxi- dependent on the sample size n and table size I x J
mately equal to 0.05. Times are much faster when than for methods for exact analysis. For a method
marginal counts are nonuniform. For instance, a of simulating tables that involves taking a random
5 x 5 table with n = 100 having both margins permutation of n integers, Agresti, Wackerly and
equal to (60,25,10,3,2) took only 40 seconds CPU Boyett (1979) noted that the CPU time is approxi-
time. mately linear in n and stable in I and J. Patefield
The remaining problematic area is large, sparse (1981) provided a method that is more efficient for
tables, for which there are a large number of cells large n. Although it takes longer to generate each
and cell counts too small to appeal to standard table with Monte Carlo methods, only a relatively
asymptotic theory. For two-way tables, examples small number need to be generated. In principle,
would be tables of at least about 50 cells, having this method could be used to approximate precisely
several fitted values less than about 5. Even with any exact analysis, including those for which exact
current computing power, the conditional reference calculations may never be feasible.
set for such tables is often too large to be handled Mehta, Pate1 and Senchaudhuri (1988) described
by exact methods. Moreover, the cardinality grows a more sophisticated and faster Monte Carlo ap-
so rapidly as a function of the number of cells that, proach, using importance sampling. Tables are
regardless of future improvements in computer sampled from the conditional reference set in pro-
power, it may always be possible to produce tables portion to their importance for reducing the vari-
that cannot be handled exactly. ance of the estimated p-value, rather than in
A good compromise for handling large, sparse proportions corresponding to their hypergeometric
tables is to estimate precisely the inferential char- probabilities. In importance sampling, each Sam-
acteristics of interest, such as exact p-values and pled table provides an estimate of the p-value that
confidence intervals. One can do this using Monte is designed to be much better than the crude
Carlo sampling of tables in the reference set, by Bernoulli estimate provided by Monte Carlo Sam-
simulating the conditional hypergeometric Sam- pling. The tables are sampled using a network
pling distribution (Agresti, Wackerly and Boyett, algorithm. For linear rank tests for 2 x J tables,
1979; Boyett, 1979; Cox and Plackett, 1980; Pate- they noted that importance sampling can be up to
field, 1981, 1982; Kreiner, 1987; StatXact, 1991). four orders of magnitude more efficient than Monte
Each sampled table provides a Bernoulli random Carlo sampling. That is, to achieve a certain fixed
variable, indicating whether the test statistic is at accuracy with a p-value estimate, the ratio of the
least as large as observed. The estimated exact number of tables sampled using the Monte Carlo
p-value is the sample mean of those Bernoulli ran- approach versus importance sampling was about
dom variables, which is the proportion of sampled 10,000 for tests such as the trend test. However,
tables that have test statistics at least as large as the initial overhead involved in using importance
the observed one. The precision of the estimate sampling, due partly to using backward induction
is determined by the estimated sample variance with the network algorithm to set up the network-
P ( l - P)/N, where N is the number of tables Sam- based sampling scheme, makes it inefficient for
'
pled. Sampling of 17,000 tables ensures the esti- certain very large data sets.
mate is good to within 0.01 with confidence at least
7.2 Software
0.99.
Agresti (1990, page 308) gave a 16 x 5 table with Until recently, software for exact methods for
219 observations, relating various characteristics of contingency tables was nearly nonexistent, at least
alligators to their primary food choice (having cate- in the most popular statistical packages. Even now,
gories: fish, invertebrate, reptile, bird, other). For nearly all packages can perform Fisher's exact test
such large tables, one could either sample a fixed but little if anything else. With a couple of excep-
number of tables N to guarantee a certain accu- tions, our discussion here is limited to the most
racy, or repeatedly take samples of N' tables until commonly used packages.
achieving a certain accuracy. Using the latter ap- SAS (using procedure FREQ) and IMSL (using
proach with N' = 2,000 and desired accuracy routine CTPRB) can perform Fisher's exact test
0.0005 for a p-value for testing independence with and the Freeman-Halton extension for I x J ta-
the Pearson statistic, Monte Carlo sampling of bles, but do not give options to perform tests that
10,000 tables provides an estimated exact p-value base the p-values on goodness-of-fit statistics or
146 A. AGRESTI
ordinal statistics. SAS uses the network algorithm in the conditional reference set. Vollset and Hirji
from Mehta and Pate1 (1983), whereas IMSL uses (1991) gave a fast GAUSS program for the exact
an algorithm that enumerates the entire reference test of conditional independence and confidence in-
set. Thus, although neither program seems to have terval for a common odds ratio, and indicated that
limits on table sizes, SAS can handle a much greater it can handle up to about 1,000 points in the distri-
variety of tables in a reasonable amount of time. bution of C , n,,, .
Currently, BMDP and SPSSXonly perform Fisher's
exact test. All these packages seem to use the table 8. OTHER APPROACHES TO EXACT
probability as the basis of ordering the reference INFERENCE
set for two-sided p-values.
StatXact (1991) is a statistical package specializ- This discussion has focused on the classical condi-
ing in exact nonparametric inference and in exact tional approach to exact inference for contingency
inference for contingency table problems. Devel- tables. This section discusses controversies re-
oped by Mehta and Pate1 and colleagues, it uses garding that approach and describes alternative
versions of the network algorithm described in their approaches that produce results having some con-
articles. For 2 x J tables, StatXact performs a gen- nection with those for exact conditional methods.
eral linear rank test that includes as special cases
8.1 Controversy Over Exact Conditional Approaches
the Wilcoxon test and a trend test with arbitrary
scores. For I x J tables with min(I, J ) I 5, it can Most of the debate about exact conditional meth-
perform the Freeman-Halton test and exact tests ods for categorical data has focused on their use
of independence using the Pearson or likelihood- with Fisher's exact test when both margins of the
ratio chi-squared statistics. For tables with ordered table are not naturally fixed. I discuss the contro-
columns, it can perform the exact test using the versy only briefly, as it has already generated an
Kruskal-Wallis statistic when max(I, J ) I 5. enormous literature. See, for instance, Barnard
When rows are also ordered, it can perform the (1945, 1947, 1949, 1979, 1989), Berkson (1978),
exact test of linear-by-linear association described Basu (1979), Kempthorne (1979), Upton (1982),
by Agresti, Mehta and Pate1 (1990) when min(I, J ) Suissa and Shuster (1984, 1985), Yates and discus-
s 5, and it can use the Jonckheere-Terpstra statis- sants (1984), Bhapkar (1986), Haber (1987, 1989),
tic when I s 5. For 2 x 2 x K tables, it performs D'Agostino, Chase and Belanger (1988), Lloyd
Birch's exact test of conditional independence (for (1988b), Rice (1988), Little (1989), Camilli (1990),
K I 200), Zelen's exact test of homogeneity of odds Mehta and Hilton (1990), Routledge (1990), Storer
ratios, and "exact" confidence intervals for an as- and Kim (1990), and Greenland (1991).
sumed common odds ratio. For 2 x J x K tables, The perceived problem with the test results
StatXact performs str-atified linear rank tests and mainly from the conditional distribution of n,, (or
can perform inference for parameters in conditional the odds ratio) being highly discrete, much more so
logistic regression models. For I x J and 2 x J x K than when one or neither margin is fixed. This
tables, StatXact performs Monte Carlo sampling results in the test being quite conservative in a
for cases beyond its capability for exact inference conditional or unconditional sense, when used with
(as long as I s 50, J s 50, K 5 200, and I J K s a fixed significance level such as a = 0.05. The
2500), and it can perform importance sampling for actual probability of rejecting the null hypothesis
2 x J tables. The StatXact manual is also a good may be considerably less than the nominal level.
reference for many examples of exact conditional Proponents of Fisher's test (e.g., Yates, 1984) ar-
analyses. Many of the StatXact routines are gue that (1) one should not use arbitrary fixed
also available in the EGRET statistical software significance levels (versus simply reporting the
package (EGRET, 1991). p-value), (2) one should not average into the cal-
Baptista and Pike (1977) gave a FORTRAN pro- cu lation of the p-value other tables whose mar-
gram for the Sterne-type confidence interval for the g i n a l ~did not occur, and (3) no substantive loss of
odds ratio in a single 2 x 2 table. For 2 x 2 x K information about H,, results from conditioning
tables, Thomas (1975) gave a FORTRAN program on the marginals (i.e., the marginal counts are
for the conditional ML estimate and an "exact" approximately ancillary).
confidence interval for a common odds ratio, and The randomized-decision version of Fisher's exact
for exact tests of conditional independence and test, using randomization on the boundary of a
no three-factor interaction (when K I 20). This critical region to achieve a fixed significance level
program can be slow since, unlike algorithms a , is UMPU (Tocher, 1950). This is of little solace
presented by Pagano and Tritchler (1983b) and for practical work, since randomization is not used,
StatXact (1991), it requires evaluating every table although it indicates that Fisher's test may also
EXACT INFERENCE FOR CONTINGENCY TABLES 147
perform well for the mid-p definition of a p-value. and Belanger, 1988). Others recommend using an
The mid-p value is half the probability of the ob- "exact" unconditional test.
served result plus the probability of more extreme To illustrate the latter approach, consider testing
values. For discrete data, the mid-p value has null n, = n, (0 = 1)for two independent binomial sam-
behavior more nearly like a uniform ( 0 , l ) random ples. One first computes an exact p-value P ( a ) for
variable than the ordinary p-value (e.g., its null each possible common value n of n, and n,. For
expected value is 0.5, and the sum of its two one- this p-value, one might use the product binomial
tailed p-values equals 1.0). It has been recom- probability that a z statistic (or chi-squared statis-
mended (e.g., by Lancaster, 1961; Plackett in the tic) for comparing two proportions is at least as
discussion of Yates, 1984; Barnard, 1989, 1990) as large as observed, when n is the value of the
a good compromise between having a conservative nuisance parameter. The global p-value for the test
test and using randomization on the boundary to with unknown n is then P = sup, P(n). Proposed
eliminate problems from discreteness. For compar- by Barnard (1945, 1947), this test was disavowed
ing two binomial probabilities n1 and a,, Hirji, by him in later publications (e.g., Barnard, 1949,
Tan and Elashoff (1991) noted that the mid-p ad- discussion of Yates, 1984). Boschloo (1970), McDon-
justment to Fisher's exact test has actual levels of ald, Davis and Milliken (1977), Suissa and Shuster
significance closer to nominal levels than do classi- (1984, 1985), Haber (1986a, 1987, 1989), Shuster
cal asymptotic tests. This is especially true when (1988) discussed computational implementation of
the common value of i n i ) is near 0 or 1or when the the unconditional test. Suissa and Shuster (1991)
sample sizes nl+ and n,, are quite different. Haber extended it to comparisons of dependent propor-
(1986b) obtained similar conclusions both for com- tions for matched pairs.
paring binomials and for multinomial sampling The 2 x 2 table having entries (3,0/0,3), dis-
over the four cells. Hirji (1991) showed that, for cussed by Barnard (1945) and Fisher (1945) [see
tests of parameters in conditional logistic models also Little (1989) and Routledge (1990)1, illustrates
for case-control designs with unmatched binary that results with the conditional and unconditional
covariates, mid-p adjustments to an exact test per-
form well in approximating nominal levels com-
pared to the exact test and asymptotic score tests.
the Fisher p-value is 2(:)(:) fi:)
approaches can be quite discre ant. For HI: 0 # 1,
= 0.100, and the
asymptotic Pearson chi-squared test has a p-value
Analogous remarks apply to "exact" interval es- of 0.014. For the exact binomial test of n, = a,
timation. Although necessarily conservative, "ex- having only fixed row totals (3,3), the Pearson
act" interval estimates for the odds ratio can be so chi-squared value of 6.0 that occurs for the ob-
highly conservative as to be less useful than served table and for table (0,3/3,0) is the maxi-
asymptotic large-sample approaches in terms of mum possible, and the p-value for given nuisance
long-run coverage performance. An adaptation of parameter a is 2a3(1 - a)3. The supremum of this
Cornfield's "exact" method uses a mid-p adjust- over 0 Ia I1occurs at n = 112, giving p-value =
ment, choosing 0 endpoints that have one-sided 1/32. This unconditional p-value is related to
mid-p values of a / 2 . This approach does not guar- Fisher conditional p-values by
antee the desired coverage probability, but simula-
tions by Mehta and Walsh (1992) showed that it 6
performs well in this respect. For small samples, P ( X 2 2 6) = p(x22 6 1 n,, = k) P ( n + , = k ) .
the mid-p-adjusted intervals can be much narrower. k=O
occur with a certain probability, is surely no reason is constant, she showed that the Bayesian evidence
for enhancing our judgment of significance in cases against the null is weaker as the number of pairs
where it has not occurred;. . . it is only the sam- giving the same response at both occasions in-
pling distribution of samples of the same type that creases, for fixed n,, and n,,. This result differs,
can supply a rational test of significance." and is perhaps more intuitively pleasing, than the
The unconditional test is so computationally in- exact conditional ML result. For examples of other
tensive that it seems complex to extend it to larger Bayesian analyses for 2 x 2 tables, see Leonard
contingency-table problems (Mehta and Hilton, (1975), Chen and Novick (1984) and Nurminen and
1990). In addition, removing nuisance parameters Mutanen (1987).
by taking a supremum may itself produce quite
conservative analyses for tables having larger 9. FUTURE RESEARCH
numbers of such parameters. It should be noted By the turn of the century, we should see ad-
that asymptotic procedures can also be quite con- vances in applicability of exact methodology for
servative. For instance, Koehler and Larntz (1980) contingency tables at least comparable to those of
noted that the likelihood-ratio test tends to be the past decade. One does not need a crystal ball to
highly conservative when most expected frequen- predict that computer speed will continue to in-
cies are smaller than 0.5. To illustrate, consider crease and algorithms will be further improved, so
the 3 x 9 table (0, 7, 0, 0, 0, 0, 0, 1, 111, 1, 1, 1, that tables not now feasible for analysis soon will
1,1,1,0,0/0,8,0,0,0,0,0,0,0), discussed in the be. In addition, it is reasonable to expect develop-
StatXact manual. For the likelihood-ratio statistic, ment of algorithms to handle new types of categori-
the asymptotic p-value is 0.0837 and the exact cal data, in particular, more complex relationships
p-value is 0.0015; for the Pearson statistic, the for larger tables in higher dimensions.
values are 0.1342 and 0.0013. This article has emphasized exact inference for
8.3 Bayesian Approaches
contingency tables in the context of loglinear mod-
eling. I believe that presenting the methods in the
For 2 x 2 tables, Bayesian approaches using cer- context of inference for parameters in models helps
tain "conservative" prior distributions give results to unify a variety of exact conditional methods. It
equivalent to conditional tests. For instance, Al- also helps to identify areas in which additional
tham (1969) gave an exact Bayesian analysis com- research would yield fruitful results, such as exact
paring parameters for two independent binomial inferences for I x J x K tables and a general good-
samples. She tested H,: a, 5 a, against a, > a, ness-of-fit test for loglinear models. This section
using a beta(a,, Pi) prior distribution for a,; i.e., summarizes other avenues for future research.
the prior for a, is proportional to (?ri)OL(l - a,)@, An important but difficult area for future re-
with a = a, - 1 and,@= Pi - 1, i = 1,2. The pos- search is exact analysis of model parameters for
+
terior distributigns are beta(a:, (3:) with a: = a i contingency tables that are large and sparse. For
nil and Pi = Pi + ni2.Taking the Bayesian p-value such data, standard asymptotic methods behave
to be the posterior probability that a, 4 ?r, Al- poorly; yet, it has been impractical to apply exact
tham showed this equals the one-sided p-value for methods. The size of the conditional reference set is
Fisher's exact test when one uses improper prior often too large to handle when the table has a large
distributions (a,, (3,) = (1,O) and (a,, (3), = (0,l). number of cells. This can also happen when the
This represents prior belief favoring the null hy- table is small but contains very large as well as
pothesis, in effect penalizing oneself against con- small cell counts. For such problematic tables, there
@,
cluding that a, > ?r,. If a, = =,y, i = 1,2, where should be additional research on (1) hybrid algo-
0 +y I1, Altham showed that the Bayesian p- rithms that use exact methods for some parts of the
value is smaller than the Fisher p-value, and the computation and approximations for other parts
difference between the two is no greater than the (Baglivo, Olivier and Pagano, 1988), (2) fast ways
null probability of the observed data. of simulating the exact distribution (Kreiner, 1987;
Altham (1971) also gave Bayesian analyses for Mehta, Patel and Senchaudhuri, 1988), and (3) bet-
dependent proportions. For a simple model in which ter asymptotic approximations (e.g., Koehler, 1986),
the probability of success is the same for each including saddlepoint approximations (Davison,
subject at a given occasion, 'she again showed that 1988; Booth and Butler, 1990; Bedrick and Hill,
the classical p-value is a Bayesian p-value for a 1992; Pierce and Peters, 1992).
prior distribution favoring the null hypothesis. For An area in which large, sparse tables commonly
a model similar to Cox's, in which the probability occur is modeling of longitudinal data with categor-
of success varies by subject but the occasion effect ical responses. Conditioning on sufficient statistics
EXACT INFERENCE FOR CONTINGENCY TABLES 149
to eliminate nuisance parameters, such as subject in extreme cases in which nearly all observations
random effects, can be very helpful. Even then, fall in one or two marginal categories for each
the large number of cells in the table make exact classification, sampling distributions are much
methods or approximations for them quite chal- "less discrete" for larger tables. As the number of
lenging. Another complication is that some useful dimensions or the number of response categories
models do not have reductions of data through per dimension increases, the sampling distributions
sufficiency, such as loglinear and logistic models that determine exact tests and confidence inter-
for the marginal probabilities rather than joint cell vals rapidly approach a continuous form. Thus, the
probabilities. conservativeness of the inferences becomes less
An area that has scope for lots of additional work problematic. For instance, performance of exact
is exact inference for I x J x K tables. Section 4.4 conditional tests should be closer to that of their
described several exact inferences for standard log- randomized versions, many of which are UMPU.
linear models. Other exact inferences are of inter- Also, the three approaches discussed for confidence
est when at least one variable is ordinal. In testing intervals for odds ratios (test-based with a / 2 in
conditional independence, one could use test statis- each tail, adapted Sterne, test-based with mid-p
tics designed to improve power for narrower alter- values) should all have true confidence levels close
natives. For instance, Section 4.4 discussed the to the nominal one.
possibility of testing conditional independence For cases in which discreteness is problematic,
against a linear-by-linear alternative. When X is there should be more work on developing adapta-
nominal and Y is ordinal, one could use a statistic tions of exact methods that, while perhaps sacrific-
analogous to a stratified Kruskal-Wallis statistic. ing "exactness," give performance closer to the
Such a statistic results from comparing (XZ, YZ) to randomized exact versions. Examples are adapta-
a loglinear model in which the X-Y partial associa- tions of exact tests and confidence intervals using
tion term has the form p i yj, for parametric row the mid-p value (e.g., Hirji, Tan and Elashoff, 1991;
effects { p i ) and rank scores { y j ) . Efficient score Mehta and Walsh, 1992), and tests and confidence
statistics for these cases are equivalent to statistics intervals that use the exact distribution computed
proposed by Birch (1965) and Landis, Heyman and at the ML estimate of the nuisance parameter (e.g.,
Koch (1978) for large-sample testing of conditional Conlon and Thomas, 1990; Storer and Kim, 1990).
independence. Similar remarks apply to exact tests So far there has been relatively little attention
of no three-factor interaction. Greater power for paid to algorithms for calculating power for exact
narrower alternatives (unsaturated models for conditional tests. The literature refers almost ex-
three-factor interaction) would be achieved by clusively to 2 x 2 tables. For instance, Gail and
using statistics that treat one, two or all three Gart (1973), Haseman (1978) and Suissa and
variables as ordinal. - Shuster (1985) discussed the determination of sam-
Of course, exact inferences for standard loglinear ple size for obtaining desired powers in fixed-a use
models and for models recognizing ordinality are of Fisher's exact test.
also relevant for four-way and higher dimensional Finally, it would be very useful to have a gen-
tables. Many tests of mutual independence or con- eral-purpose algorithm for exact tests comparing
ditional independence can be re-expressed in terms two nested loglinear models. This would unify ex-
of two-way or three-way tables, but not all. Also, act tests of goodness of fit for all loglinear models,
complications may result in tests about three-factor as well as tests against unsaturated alternatives.
and higher order interactions. The reference set of In summary, much has been accomplished in the
'tables having the required fixed 'marginals may be past decade, but this is only the tip of the iceberg
difficult to generate or simulate. For instance, in a of what can be done. In the future, statistical prac-
four-way table, an exact test of no three-factor in- tice for categorical data should place more empha-
teraction deals with a reference set in which all six sis on "exact" methods and less emphasis on
two-way margins are fixed. methods using possibly inadequate large-sample
There is unlikely to be as much controversy with approximations.
exact conditional methods for models for higher
dimensional tables as there has been with Fisher's ACKNOWLEDGMENTS
exact test for 2 x 2 tables. That controversy is
probably due less to philosophical disagreements This work was partially supported by NIH Grant
about conditioning than to practical consequences GM43824. The author appreciates helpful com-
of using. continuously defined measures such as ments on manuscript organization and content from
p-values with highly discrete distributions. Except Dr. Cyrus Mehta and two referees. Computations
150 A. AGRESTI
for many of the examples presented in this article BIRCH,M. W. (1965). The detection of partial association 11: The
were obtained using StatXact (1991). general case. J. Roy. Statist. Soc. Ser. B 27 111-124.
AGRESTI, A. (1990). Categorical Data Analysis. Wiley, New York. els. Biometrika 77 787-796.
Psychometrika 42 111-125.
BOYETT,J. (1979). Random R x C tables with given row and
Bwmetrika 58 561-576.
CANTOR, A. B. (1979). A computer algorithm for testing signifi-
ANDERSEN, A. H. (1974). Multidimensional contingency tables.
cance in M x K contingency tables. In Proceedings of the
Scand. J. Statist. 1 115-127.
Statistical Computing Section 220-221. Amer. Statist. As-
ARMITAGE, P. (1955). Tests for linear trends in proportions and
soc., Washington, DC.
frequencies. Biometries 11 375-386.
CHEN,J. J. and NOVICK,M. R. (1984). Bayesian analysis for
290-301.
CONLON, M. and THOMAS, R. G. (1990). A new coddence inter-
Comp. 12 297-301.
Biometrika 34 123-138.
BARNARD,
G. A. (1990). Must clinical trials be large? The inter-
quanta1 data. Biometrika 53 215-220.
319-324.
ternat. Statist. Rev. 57 19-43.
BEDRICK, E. J. and HILL, J. R. (1992). An empirical assessment equality of two independent binomial populations. Amer.
sion parameter. Bwmetrics. To$appear. DAVIS,L. J. (1986). Exact tests for 2 by 2 contingency tables.
BHAPKAR,V. P. (1986). On conditionality and likelihood with generalized linear models. J. Roy. Statist. Soc. Ser. B 50
nical Report 253, Dept. Statistics, Univ. Kentucky. DUPONT, W. D. (1986). Sensitivity of Fisher's exact test to minor
Epidemiology Research Corporation, Seattle. statistics with frequency tables with small expected cell
FISHER,R. A. (1934). Statistical Methods for Research Workers. counts. J. Amer. Statist. Assoc. 83 555-560.
(Originally published 1925, 14th ed. 1970.) Oliver and Boyd, HASEMAN, J. K. (1978). Exact sample sizes for use with the
803-814.
GASTWIRTH,
J. L. (1988). Statistical Reasoning in Law and Pub-
IRWIN,J. 0. (1935). Tests of significance for differences between
Statist. 4 1159-1189.
KLOTZ,J. and TENG,J. (1977). One-wajl layout for counts and
ries 43 471-476.
LANCASTER,
H. 0. (1961). Significance tests in discrete distribu-
GREENLAND, S. (1991). On the logical justification of conditional
tions. J. Amer. Statist. Assoc. 56 223-234.
248-251.
Rev. 46 237-254.
HABERMAN, S. J. (1974). Log-linear models for frequency tables independence in 2 x 2 tables against a two-sided alterna-
HABERMAN,
S. J. (1977). Log-linear models and frequency tables
LLOYD,C. J. (1988b). Some issues arising from the analysis of
152 A. AGRESTI
MCCULLAGH,
P. (1986). The conditional distribution of goodness-
[Correction: (1983) 12 1209-1210.1
145-158.
ing binomial proportions. Biometrics 44 1-22.
MEHTA,C. R. and PATEL,N. R. (1983). A network algorithm for SHAPIRO,S., SLONE,D., ROSENBERG, L., KAUFMAN, D., STOLLEY,
performing Fisher's. exact test in r x c contingency tables. P. D. and MIEWINEN,0 . S. (1979). Oral contraceptive use in
Mass.
exact confidence interval for the common odds ratio in
80 969-973.
999-1005.
STUMPF,R. H. and STEYN,H. S. (1986). Exact distributions
MEHTA,C . R., PATEL,N. R. and SENCHAUDHURI, P. (1991). Exact associated with a n I x J x K contingency table. Comm.
stratified linear rank tests for binary data. In Computing- Statist. Theory Methods 15 1889-1904.
Science and Statistics: Proceedings of the 23rd Symposium SUISSA,S. and SHUSTER, J. J. (1984). Are uniformly most power-
on the Interface. m. M. Keramidas., ed.), 200-207. Interface ful unbiased tests really best? Amer. Statist. 38 204-206.
~oundation;Fairfax Station, VA. SUISSA,S. and SHUSTER, J. J. (1985). Exact unconditional sam-
Assoc. 78 435-440.
UPTON,G. J. G. (1982). A comparison of alternative tests for the
Comment
Edward J. Bedrick and Joe R. Hill
We congratulate
- Professor Agresti
- for his com- CHECKING LOGISTIC REGRESSION MODELS
prehensive review of exact inference with cate-
gorical data. We share his enthusiasm for exact We would like to convey some of our recent work
on model checking for logistic regression and some
conditional methods and believe that the coming
years will produce many important computational of our thoughts regarding conditional inference.
breakthroughs in this area. For the sake of simplicity, we assume that a single
model is under consideration. A little notation is
The mechanics of conditioning on sufficient
statistics to generate reference distributions for es- required. The usual logistic regression model has
two distinct parts: a sampling component and a
timation, testing and model checking with loglin-
structural component. The sampling component
ear models for Poisson data and logistic regression
specifies that Y = (Y,, . . . , Y,)' is a vector of inde-
models for binomial data are well-known, but the
utility of conditioning in these settings is not uni-
pendent binomial random variables with Y, -
versally agreed upon. Furthermore, the role of con- Bin(m,, n,). The structural or regression compo-
nent of the model is given by
ditioning- in the analysis of discrete generalized
-
linear models with noncanonical link functions has logit ( n ) = XP,
received little attention from most of the statistical
community. As a result, scientists and statisticians where logit(*) is a n n x 1 vector of log-odds with
are familiar with conditional methods, but many elements log{ a,/ ( l - n,)), X is a n n x p full-
are unsure how ,such methods should be incor- column rank design matrix with ith row xi, and 0
porated into a n overall strategy for analyzing cate- is a p x 1 vector of unknown regression parame-
gorical data. We feel that the use and abuse of ters. Under model (I), S = X' Y is sufficient for 0.
conditional methods will not be fully understood or Let 7i be the MLE of a under this model.
appreciated without such a strategy. We hope that The distribution of the data pr(Y; P), indexed by
Professor Agresti's survey and the ensuing discus- 0, can be factored into the marginal distribu-
sions stimulate further work in this direction. tion of the sufficient statistic S, and the condi-
tional distribution of the data given the sufficient
statistic: