You are on page 1of 24

A Survey of Exact Inference for Contingency Tables

Alan Agresti

Statistical Science, Vol. 7, No. 1. (Feb., 1992), pp. 131-153.

Stable URL:
http://links.jstor.org/sici?sici=0883-4237%28199202%297%3A1%3C131%3AASOEIF%3E2.0.CO%3B2-A

Statistical Science is currently published by Institute of Mathematical Statistics.

Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at
http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained
prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in
the JSTOR archive only for your personal, non-commercial use.

Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at
http://www.jstor.org/journals/ims.html.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed
page of such transmission.

The JSTOR Archive is a trusted digital repository providing for long-term preservation and access to leading academic
journals and scholarly literature from around the world. The Archive is supported by libraries, scholarly societies, publishers,
and foundations. It is an initiative of JSTOR, a not-for-profit organization with a mission to help the scholarly community take
advantage of advances in technology. For more information regarding JSTOR, please contact support@jstor.org.

http://www.jstor.org
Mon Aug 27 14:34:53 2007
Statistical Science
1992,Vol. 7,No. 1,131-177

A Survey -of Exact Inference


for Contingency Tables
Alan Agresti

Abstract. The past decade has seen substantial research on exact infer-
ence for contingency tables, both in terms of developing new analyses
and developing efficient algorithms for computations. Coupled with
concomitant improvements in computer power, this research has re-
sulted in a greater variety of exact procedures becoming feasible for
practical use and a considerable increase in the size of data sets to
which the procedures can be applied. For some basic analyses of contin-
gency tables, it is unnecessary to use large-sample approximations to
sampling distributions when their adequacy is in doubt. This article
surveys the current theoretical and computational developments of
exact methods for contingency tables. Primary attention is given to the
exact conditional approach, which eliminates nuisance parameters by
conditioning on their sflicient statistics. The presentation of various
exact inferences is unified by expressing them in terms of parameters
and their sufficient statistics in loglinear models. Exact approaches for
many inferences are not yet addressed in the literature, particularly for
multidimensional contingency tables, and this article also suggests
additional research for the next decade that would make exact methods
yet more widely applicable.
Key words and phrases: Categorical data, chi-squared tests, computa-
tional algorithms, conditional inference, Fisher's exact test, logistic
regression, loglinear models, odds ratios, sufficient statistics.

Table of Contents 4.3 Exact estimation in 2 x 2 x K tables


4.4 Exact methods for I x J x K tables
1. Introduction 5. Exact Inference for Logistic Regression Models
1.1 Historical perspective 6. Exact Goodness of Fit
1.2 Outline and notation 7. Computing Feasibility
1.3 The exact conditional approach 7.1 Algorithms
2. Exact Inference for 2 x 2 Tables 7.2 Software
2.1 Fisher's exact test 8. Other Approaches to Exact Inference
2.2 "Exact" estimation in 2 x 2 tables 8.1 Controversy over exact conditional approaches
2.3 Comparing dependent proportions 8.2 Exact unconditional approach
'3. Exact Inference in I x J Tables' 8.3 Bayesian approaches
f3.1 Linking test statistics to alternatives 9. Future Research
3.2 Exact estimation for I x J tables References
4. Exact Inference in Three-Way Contingency Tables
4.1 Testing conditional independence in 2 x 2 x K 1. lNTRODUCTlON
tables This article surveys the development of exact
4.2 Testing homogeneity of odds ratios in 2 x 2 x K inferential methods for contingency tables. 1inter-
tables relate various inferences by expressing them in
terms of parameters in a hierarchy of loglinear
models. The presentation focuses primarily on
Alan Agresti is Professor of Statistics, University of exact conditional methods, in which one obtains
Florida, Griffin-Floyd Hall, Gainesville, Florida sampling distributions not dependent on other
32611. unknown parameters by conditioning on their
sufficient statistics. I also discuss some long-stand- 1975; Haberman, 1977; Koehler, 1986; McCullagh,
ing controversies for such methods and present 1986; Zelterman, 1987), information on the ade-
topics for additional research that should soon be quacy of these approximations for standard models
feasible given the continual improvement in com- is at an infant stage. [Cressie and Read (1989)
puter hardware and software. surveyed research on the adequacy of various
asymptotic approximations.] Also, simulation stud-
1.I Historical Perspective
ies have shown that it is hopeless to expect simple
Traditionally, statistical inference for contin- guidelines to indicate when asymptotic large-sam-
gency tables has relied heavily on large-sample ple approximations are adequate (e.g., Koehler and
approximations for sampling distributions of pa- Larntz, 1980). Even when the sample size is quite
rameter estimators and test statistics. Many of large, recent work has shown that large-sample
these approximations are special cases of ones that approximations can be very poor when the contin-
apply more generally than to categorical data [e.g., gency table contains both small and large expected
chi-squared approximations for likelihood-ratio frequencies (Haberman, 1988). Regarding ade-
statistics and normal approximations for maximum quacy of asymptotics, the sample size n often has
likelihood (ML) estimators of model parameters]. less relevance than the discreteness of the sam-
With this emphasis on large-sample methods, the pling distribution. Thus, "small-sample methods"
development of inferential methods for categorical for categorical data more accurately refer to
data parallels the historically earlier development methods needed when there are few points or rela-
of inferential methods for continuous data. For in- tively large probabilities in the support of that
stance, E. S. Pearson's recently published manu- distribution.
script about Student notes that studies analyzed by Table 1, taken from Graubard and Korn (1987),
Karl Pearson's laboratory usually involved large- illustrates that different statistics and approxima-
size data sets. When Gosset presented his queries tions can give quite different results, even for very
about small samples that led to his development of large samples. That table, which refers to a
techniques using the t-distribution, Pearson replied, prospective study of maternal drinking and congen-
"Only naughty brewers deal in small samples" ital malformations, has 32,574 observations. For
(Pearson, 1990, page 73). testing independence of alcohol consumption and
R. A. Fisher's Statistical Methods for Research malformation, asymptotic chi-squared tests give p-
Workers was at the forefront of advocating exact values of 0.017 using the Pearson statistic and
procedures for small samples. In the preface of the 0.190 using the likelihood-ratio statistic. Exact tests
first edition of that book (1925), Fisher stated, of a type discussed in Section 3.1 using these crite-
" . . . the traditional machinery of statistical pro- ria give p-values of 0.034 and 0.139, respectively.
cesses is wholly unsuited to the needs of practical A test based on a trend alternative that utilizes the
research. Not only does it take a cannon to shoot a ordering of column categories by assigning scores
sparrow, but it misses the sparrow! The elaborate (0,0.5,1.5,4,7) to them has exact p-value more
mechanism built on the theory of infinitely large than three times the p-value based on an asymp-
samples is not accurate enough for simple labora- totic normal approximation (0.017 versus 0.005 in
tory data. Only by systematically tackling small the one-sided case).
sample problems on their merits does it seem possi- The lag in the development and use of exact
ble to apply accurate tests to practical data." Not inferences for contingency tables is partly ex-
surprisingly, t-procedures received strong emphasis plained by the later development of methods for
iq that text, and "Fisher's exact ,test" for 2 x 2 categorical data compared with continuous data,
contingency tables appeared in the 1934 and but also by the greater computational complexity.
subsequent editions. However, recent advances in computational power
The importance of improving the scope of exact
methods for categorical data has become increas-
ingly clear in recent years. Standard asymptotic TABLE1

methods apply to a fixed number of cells, as cell Maternal drinking and congenital malformations

expected frequencies grow to infinity. Yet, re- Alcohol consumption


searchers often attempt to alialyze additional vari- (average no. of drinks/day)
ables as the sample size grows; thus, large expected Malformation 0 <1 1-2 3-5 2 6
frequencies may be the exception rather than the
norm. Although recent research has introduced new Absent 17,066 14,464 788 126 37

asymptotic approaches that permit the number of Present 48 38 5 1 1

cells to grow as the sample size grows (e.g., Morris, Source: Graubard and Korn (1987).
EXACT INFERENCE FOR CONTINGENCY TABLES 133
and efficiency of algorithms have made exact meth- For two-way contingency tables, let X denote the

ods feasible for a wider variety of inferential analy- row classification and Y the column classification.

ses and for a larger collection of table sizes and For three-way tables, denote the third classification

sample sizes. In this survey of exact inferences for by Z. For simplicity, denote loglinear models by

contingency tables, I indicate which computations standard symbols pertaining to their minimal suffi-

can currently be done and also highlight areas in cient statistics. For instance, (X, Y) denotes the

which current capabilities are inadequate. model of statistical independence in a two-way

table, and (XZ, YZ) denotes the model of condi-

1.2 Outline and Notation


tional independence between X and Y, given Z, in

Sections 2 to 4 present a variety of exact methods a three-way table. The minimal sufficient statistics

for contingency tables in the context of loglinear are i n i + ) and {n+j) for ( X , Y) and in,,,) and

models. Sections 2 and 3 focus on two-way tables, {n+jk) for (XZ, YZ). Finally, denote expected fre-

Section 2 for the 2 x 2 case and Section 3 for the quencies by m and sample proportions by p, for

I x J case. Section 4 focuses on three-way tables, example, {mij = E(nij)} and { pij = nij/n} in a

with emphasis on the 2 x 2 x K case. Section 5 two-way table.

discusses exact inference for logistic regression


models, and Section 6 discusses exact goodness-of-fit 1.3 The Exact Conditional Approach
testing for loglinear and logistic models. Section 7 Historically, the most common approach to exact

discusses computing feasibility of exact methods, inference in contingency tables has been a condi-

using currently available software. tional one. Suppose an inference refers to a param-

Nearly all the literature on exact methods for eter in some loglinear model. Exact conditional

contingency tables emphasizes hypothesis testing; inferential methods utilize the distribution of the

but, in each section, I also indicate the scope sufficient statistic for that parameter, conditional

of work on interval estimation. My discussion on sufficient statistics for the other parameters

throughout the article takes the viewpoint of classi- ("nuisance" parameters) in the model. For in-

cal frequentist conditional inference, with appli- stance, suppose one wants to test a hypothesis Ho

cation to loglinear models for contingency tables. that corresponds to a loglinear model symbolized by

Section 8 mentions other approaches. It presents an Mo, under the assumption that a more general

unconditional approach to exact inference and also model M, holds, corresponding to an alternative

indicates how Bayesian inferences correspond to hypothesis HI. Denote minimal sufficient statistics

conditional inferences for certain choices of prior for the models by To and TI. Exact inference uses

distributions. The final section discusses possible the conditional distribution of T, given To (e.g.,

future directions for research on exact methods for Andersen, 1974). By the definition of sufficiency,

contingency tables. -. the conditional distribution does not depend on the

Throughout the article, I assume a standard Pois- nuisance parameters, thus making exact inference

son or mu1tinorr)ial sampling model for cell counts possible.

in the contingency table. For instance, in an I x J To illustrate, for Poisson sampling in a 2 x 2

table, the cell counts inij) might have a multino- contingency table, the saturated loglinear model

mial distribution generated by n independent tri- has the form

als with IJ cell probabilities {.rrij). Or, the counts


inij, j = 1 , . . . , J ) in row i might have a multino-
'

mial distribution, with counts in different rows


,being independent. In the first case ("full" multi- The parameters {Aij) describing association in this
nomial sampling), n = CC nij is fixed. In the sec- model pertain to the odds ratio. For instance, sup-
ond case (independent multinomial sampling), pose we achieve identifiability by setting = %=
i n i + = Cjnij, i = 1 , . . . , I ) are fixed. When inij) &, = A,, = &, = 0. Then A,, =. log(8), where 8 de-
are independent Poisson random variables, con- notes the odds ratio, 8 = ( m ~ ~ m ~ ~ ) / ( m The
~
~ m ~ ~
ditioning on n yields full multinomial sampling; parameter set is (p, A?, A,; A,,), and normally inter-

conditioning further on the row totals yields inde- est focuses on 8 = exp(A,,), the others being nui-

pendent multinomial sampling. All sampling mod- sance parameters. The sufficient statistics are n for

els lead to the same exact inferences, since those p, n,, for A?, n,, for A,; and n,, for A,,. To conduct

inferences condition on marginal totals that con- exact inference about 8, consider the conditional

tain as a subset the naturally fixed totals, and distribution of n,, given n, n,,, and n,,. This

since the parameters of usual interest are not the conditional distribution is the same as the one

proportions in the margins that are fixed under having elements P(Y, = n,,, Y, = n,, - n,, I Y, +

some sampling designs but not under others. Y, = n+,), where {Yi, i = 1 , 2 ) are indepen-

134 A. AGRESTI

dent binomial random variables with parameters the model by conditioning on sufficient statistics
[ ni+, mi, /(mi, +
mi,)]. Straightforward calcu- for them.
lation shows that it equals
2. EXACT INFERENCE FOR 2 x 2 TABLES
This simple case still generates an enormous vol-
ume of publications, partly because of controversy
(discussed in Section 8.1) about applying exact
methods that condition on marginal totals that are
not naturally fixed under Poisson, multinomial, or
binomial sampling schemes. The most common
where the index of summation ranges from
sampling model for 2 x 2 tables is independent
max(0, n,++ n+, - n) to min(n,+, n+,), the possi-
binomial sampling in the two rows. Denote the
ble values for n,, for the given marginal totals.
"success" probabilities by a, and a,, for which the
This is the noncentral hypergeometric distribution
odds ratio is 8 = [a, /(1 - a,)]/[a, /(1 - a,)]. The
(Fisher, 1935a; Cornfield, 1956).
hypothesis of homogeneity states that a, = a,. For
To test statistical independence of X and Y [A,,
Poisson or full multinomial sampling, conditioning
= 0 in model (1.1)1, one uses this distribution with
on { n,+, n,+ ) gives binomial sampling, and statis-
8 = 1. Of course, to complete a test, one needs to
tical independence is equivalent to homogeneity.
specify a test statistic and the way to compute the Under the hypothesis of homogeneity, conditioning
p-value. In testing a loglinear model M, against a
as well on n+, to eliminate the nuisance parameter
more complex model MI, in this article I will gen- (the common value of a, and a,) yields the hyper-
erally base p-values on the exact conditional distri- geometric distribution (1.2) with 8 = 1.
bution of Rao's efficient score statistic for testing
that the extra parameters that are in M, but not 2.1 Fisher's Exact Test
in Mo equal zero. The efficient score statistic is
based on the vector of partial derivatives of the log Fisher's exact test (Fisher, 1934, 1935a; Yates,
likelihood with respect to the extra parameters, 1934; Irwin, 1935) of H,: 8 = 1 is "well known,"
evaluated at the null estimates (Rao, 1973, Section and I refer the reader to Lehmann (1986, pages
6e). For testing independence in a 2 x 2 table, this 151-162) for details of its derivation under various
suggests [ n,, - (n,+n+,)/ nl, suitably normalized, sampling schemes. The null hypergeometric dis-
as a test statistic. tribution has E(n,,) = n,+n+, / n and Var(n,,) =
The p-value, based on extreme values of the test n1+n+,n2+n+,/n2(n- 1). To test H,: 8 = 1
statistic, is calculated using the distribution for against HI: 8 > 1, the p-value is
that statistic that is_induced by the exact condi-
tional distribution of { nij) . When M, contains one
more parameter than M,, tests based on the effi-
cient score statistic are uniformly most powerful where S = { t: t 2 n,,). The set S is identical to
unbiased (UMPU), as a consequence of a result for that of tables for which the sample odds ratio is at
exponential families (e.g., Lehmann, 1986, Theo- least as large as observed.
rem 3, page 147). Denote the ML estimator of the The hypergeometric applies directly as a sam-
parameters in M, by (e,, el), where 9, denotes the pling model when both sets of marginal counts are
estimator of the parameters that are in M, but not naturally fixed. A classic example of this case is
in M,. For large samples with a fixed number of Fisher's (1935b) tea-tasting experiment, relating to
cells, these tests are asymptotically equivalent to a woman's claim to be able to judge whether tea or
~ a l tests
d based on el and likelihood-ratio tests of milk was poured in the cup first. The woman was
Mo against M, (Rao, 1973, pages 418-420). given eight cups of tea, in four of which tea was
My reason for relating exact conditional infer- poured first and in four of which milk was poured
ence to loglinear models in this article is as first, and was told to guess which four had tea
follows. Loglinear models are generalized linear added first. The contingency table for this design
models for categorical data that use the canonical (see Table 2) has n = 8 and n,+= n+, = 4; n,, can
link, that is, they directlyqmodel the natural pa- take values O,1,2,3,4 with corresponding one-sided
rameter (the log mean) of a natural exponential p-values (for HI: 8 > 1) of 1.0, 0.986, 0.757, 0.243
family (the Poisson). Such generalized linear mod- and 0.014.
els permit reduction of the data through sufficient Ways of forming two-sided p-values in Fisher's
statistics (McCullagh and Nelder, 1989, page 32). exact test were discussed by Gibbons and Pratt
Thus, one can eliminate nuisance parameters in (1975), Yates and discussants (1984), Davis (1986),
EXACT INFERENCE FOR CONTINGENCY TABLES 135
TABLE2 ing the upper endpoint; if n,, = min(n,+, n,,), then
Fisher's tea-tasting experiment
the upper endpoint is m, and one uses P = a in
Guess poured first obtaining the lower endpoint. For the tea-tasting
Poured first Milk Tea Total
data, the 95% confidence interval obtained in this
-
manner is (0.21,626.2).
Milk 3 1 4 The discreteness of the distribution of n,, limits
Tea 1 3 4 the confidence intervals to a discrete set of possible
Total 4 4 8
endpoints, for fixed a . Thus, the true confidence
coefficient is at least 1 - a , rather than exactly
1 - a , and its value depends on the value of I9
Dupont (1986), Mantel (1987), and Lloyd (1988a). (Neyman, 1935). It is strictly greater than 1 - a
The most popular approaches are (a) double unless the true 19 is an attainable endpoint. I use
the one-sided p-value, (b) (2.1) with S = { t: quotes around "exact" in referring to such inter-
f ( t l n, n,,, n,,) If(n1, I n, n,,, n,,)), and (c) (2.1) vals to reflect this behavior.
with S = {t: I t - E(n,,)I 2 1 n,, - E(n,,)I}. The Baptista and Pike (1977) described an alterna-
third option is identical to the null probability that tive approach that also guarantees the confidence
the Pearson chi-squared statistic is at least as large level but sometimes gives slightly shorter inter-
as observed. Different approaches can give differ- vals. Their interval is a generalization of Sterne's
ent results because of the discreteness and poten- (1954) confidence interval for a single binomial
tial skewness of the hypergeometric distribution. parameter. One inverts a family of acceptance re-
For instance, consider the table having counts by gions that are formed using the minimal number of
row (10,90/20,80) (i.e., 10 and 90 in the first row, most likely outcomes. Specifically, for each 00, one
20 and 80 in the second), discussed by Dupont finds a set A(0,) of possible n,, values such that
(1986). The null distribution of n,, is symmetric the probability of the set is at least 1 - a and such
about 15, and the three p-values are identical and that every integer in the set is at least as likely to
equal 0.073. However, for the nearly identical table occur as every integer outside the set. The confi-
(10,91/20,80), p-value (a) equals 0.069, whereas dence interval is the set of 8, values for which the
(b) and (c) equal 0.050. observed n,, falls in A(0,).
It is not possible to construct "exact" confidence
2.2 "Exact" Estimation in 2 x 2 Tables intervals for association measures that are not
The non-null conditional distribution (1.2) of n,, functions of the odds ratio. They do not occur as
is used in constructing "exact" confidence intervals parameters in generalized linear models with Pois-
for the odds ratio. For data n,,, the conditional ML son or binomial random component using canonical
estimate of 6 is t h e value of 19 that maximizes links. Thus, the usual conditioning arguments do
probability (1.2). The estimate is obtained using not eliminate nuisance parameters. For instance,
iterative methods (Cornfield, 1956), and differs from consider estimation of the difference of probabili-
the unconditional ML estimate 6 = n,, n,, / n,, n,,. ties 6 = a, - a, for independent binomial samples.
For instance, for Fisher's tea-tasting data (Table 2), The joint sampling distribution can be expressed in
n,, = 3 and the unconditional ML estimate is 9; the terms of 6 and a,, for instance, but conditioning on
conditional ML estimate maximizes the conditional the marginal totals does not eliminate a,. Santner
likelihood (1.2), and Snell (1980) discussed this and other difficul-
ties in interval estimation of the difference of pro-
portions and the relative risk. They also described
ways of getting conservative confidence intervals
and equals 6.41. for these parameters, but the usual conservative-
For testing H,: 8 = Oo against HI: 6 > O,, the ness due to discreteness is compounded because of
p-value is P = C,f(t I n, n,,, n+,;8,), where S = the approach used to eliminate the nuisance
{t: t 2 n,,}. For testing against H,: 0 < O, S = {t: parameter.
t I n,,}. One can obtain an "exact" confidence in- "Exact" interval estimates exist in certain ex-
terval for 19 by inverting the test (Cornfield, 1956; treme situations in which asymptotic interval esti-
Mantel and Hankey, 1971'; Thomas, 1971). The mates do not. Suppose the conditional inference
lower endpoint is the 8, value for which P = a / 2 uses the distribution of TI, given To. When T,
in testing H,: 19 = 8, against H I : 8 > 8,. The upper assumes its maximum or minimum value, the un-
endpoint is the 8, value for which P = a / 2 in conditional (or conditional) ML estimator of 6 and
testing Ho against H,: 8 < 8,. If n,, = 0, then the its asymptotic standard error do not exist (unless
lower endpoint is 0, and one uses P = a in obtain- one adds some constant to the cells), but "exact"
136 A. AGRESTI

one-sided confidence intervals do exist. For a point pair, one eliminates the nuisance parameters { a h }
estimate in such cases, some authors (e.g., Hirji, by conditioning on the pairwise success totals
Mehta and Patel, 1987, 1988; Hirji, Tsiatis and { ylh + Y , ~ }Given
. a total of 0, P(Ylh = YZh= 0) =
Mehta, 1989) use median unbiased estimates. To 1, and given a total of 2, P(Ylh = Y,, = 1) = 1.
illustrate, for a 2 x 2 table, when n,, = 0, the ML The conditional distribution of ( Y, ,, Y2,) depends
estimator of the loglinear association parameter on 0 only when y,, + YZh = 1. For each of the n*
(the log odds ratio) and the ML estimator ( C C l / nij) such pairs, direct calculation using (2.2) shows that
of the asymptotic variance of the ML estimator do P(Ylh = 1, Y,, = 0) = [l + exp(0)l-l. Since n,, =
not exist. The median unbiased estimate is the 8, Cy,, for these n* pairs, the distribution of n,,
value that gives P = 0.5 and the "exact" upper conditional on n* is binomial with parameter [l +
100(1 - a)% confidence limit is the 8, value that exp(0)I-l. The parameter equals 112 under
gives P = a , in testing against HI: 8 < 8,. marginal homogeneity ( 0 = 0). This conditional
analysis implies that pairs having identical re-
2.3 Comparing Dependent Proportions sponse at the two occasions are irrelevant to testing
Next, consider the comparison of two proportions a , + =a+,.
when each sample contains the same subjects, or Cox (1966) generalized this analysis for data in
the sample consists of matched pairs. For instance, which each case is matched with several controls.
one might have repeated measurement of a binary Gart (1969) gave an exact test for comparing
response at two occasions. Let nij be the number of matched proportions in crossover designs.
observations in category i at occasion 1 and cate-
gory j at occasion 2. Let a i j denote the probability 3. EXACT INFERENCE IN I x J TABLES
of response i at occasion 1 and response j at occa- For I x J tables, statistical independence of X
sion 2, and let pij = nij/n. The sample proportions and Y corresponds to the loglinear model M, =
of " S U C C ~ S S ~ S "p l + at occasion 1 and p + , at occa- (X, Y), which has form
sion 2 are dependent, rather than independent,
because of the matching. The hypothesis of log mij = p + AT + A;.
marginal homogeneity, H,: a, = a ,, corresponds
+ +

to homogeneity for the 2 x 2 table consisting of the To test this model against the saturated model
two marginal distributions. For binary responses, [ M, = (1.1) extended to I rows and J columnsl, one
marginal homogeneity is equivalent to symmetry, uses the null distribution of inij} given { n , + }and
a,, = a,, To obtain an exact test, one conditions
on n* = n,, + n,,. Under H,, n,, has a binomial For multinomial sampling, the cell probability
(n*, 112) distribution. A two-sided p-value parameters {aij} in the distribution of {n,,} can be
is the sum of binomial probabilities for n,, val- expressed in terms of the marginal probabilities
ues at least as far from n*/2 as observed. This and (I - 1)(J - 1) odds ratios. Conditional on
is a small-sample version of McNemar's test i n , + } and {n+j}, Cornfield (1956) noted that the
(McNemar, 1947). distribution of inij} depends only on the odds ra-
Cox (1958a) provided an argument, of which I tios. The conditional probabilities are proportional
now sketch the outline, that motivates this condi- to
tional approach. Let ( Y,,, Y,,) denote the hth pair
of observations, where a "1" response denotes cate-
gory 1 (success) and "0" denotes category 2.
Consider the logit model

where a i j = (aijaIJ)/(aiJaIj).For exact tests of


statistical independence (all a i j = I), this distribu-
where the indicator ' I ( t = 2) equals 1 when t = 2 tion simplifies to the multiple hypergeometric. The
and 0 when t = 1. This model (a special case of the probability of a table inij} having the given
Rasch model) permits separate response distribu- marginal totals equals
tions for each pair, but assumes a common effect,
exp(0) representing the odds ratio of success at
occasion 2 compared with occasion 1. For the joint
mass function of the data under the assumption n!n n n,,!
that Ylh and Y,, are independent within each
EXACT INFERENCE FOR CONTINGENCY TABLES 137
3.1 Linking Test Statistics to Alternatives using T fall in a complete class of unbiased and
Let p,,, denote the null probability (3.1) of the admissible tests.
observed table. Freeman and Halton (1951) defined When X is nominal and Y is ordinal, one might
the p-value for a conditional test of independence to let T be the Kruskal-Wallis statistic adjusted for
ties (Klotz and Teng, 1977). In that statistic, aver-
be the null probability of the set of tables having
age ranks are used as scores {yj} for the column
probability no greater than pobs. For the 2 x 2
categories, and the test is sensitive to variation
case, this simplifies to a two-sided version of
among the mean ranks computed for the condi-
Fisher's exact test. Many statisticians have argued
tional distributions within the rows. This is an
that the p-value should instead be based on the
efficient score statistic for a loglinear model having
exact distribution of some meaningful statistic T
form (3.2), with "row effects" {fix,} treated as
(such as the efficient score statistic) for quantifying
parameters. For 2 x J tables, Soms (1985) pre-
the departure of the data from Ho (e.g., Yates,
sented exact tests sensitive to other alternatives.
1934; Fisher, 1950; Healy, 1969; Agresti and Wack-
See Haberman (1974), Goodman (1979) and Agresti
erly, 1977; Cressie and Read, 1989). The p-value
(1990, Chapter 8) for discussions of loglinear
for the Freeman-Halton test may be regarded as
models with scores.
one that uses a statistic negatively related to p,,,,
To illustrate that tables that are "more contra-
such as T = - 2 log( p,,,).
dictory" to Ho according to some statistic T need
When both classifications are nominal, the usual
not be less likely, consider the twelve 3 x 3 tables
alternative to the independence model is the satu-
having row totals (6,l, 2) and column totals (1,2,6).
rated model. The efficient score statistic is then the
Table 3 gives the conditional null distribution of
Pearson chi-squared statistic for testing the good-
T = CCxiyjnij, for { x i = y, = i - 2). The distribu-
ness-of-fit of the independence model,
tion has support between - 7 and 0 with a mean of
- 2.22. It is-highly irregular, far from its limiting
normal distribution. Table 3 shows that, for some
margins, it is not possible to obtain small p-values
where mij = ni+n . I n . One could let the p-value for some alternatives (e.g., a test using T for the
be the null probabh!ity that X 2 is at least as large one-sided alternative of a positive association). The
as observed (Yates, 1934; Agresti and Wackerly, table having entries by row (0,2,4/1,0,0/0,0,2),
1977; Baker, 1977). More generally, one could use a one of two tables to have T = -2, has only a
power divergence statistic (Cressie and Read, 1989), quarter of the probability of the table having en-
special cases of which are the Pearson statistic and tries (1,2,3/0,0,1/0,0,2), the only table having
the likelihood-ratio statistic. T = 0. Yet, the first table is less contradictory to
When both classifict$ions are ordinal, it is often Ho than the second, using I T - E(T) I as the crite-
important to have good power for detecting a mono- rion. The first table has a p-value of 1.0 using this
tone trend in the association. To do this, one could criterion, but only 0.2856 in the Freeman-Halton
let T relate to a Spearman or Pearson-type correla- test. By contrast, the I x I table having nii = 1for
tion (Patefield, 1982), the difference between the all i and nij = 0 for all i + j has P = (2/1!) using
numbers of concordant and discordant pairs (Agresti ) T - E(T) I but has P = 1.0 for the Freeman-
and Wackerly, 1977), or the Jonckheere-Terpstra Halton test or the exact test using x2 as the
statistic (StatXact, 1991). In the first case, the criterion.
statistic on which the reference tables are ordered
is T = CCxiyj(nij - ni+n+j/n), .for two sets of
mon,otone scores { xi) for rows and { yj) for columns.
An exact conditional test using this statistic is an TABLE3
Exact conditional distribution of T = CC(i - 2 ) ( j - 2)nij
efficient score test for the loglinear model of linear- under independence, for margins ( 6 , 1 , 2 )and ( 1 , 2 , 6 )
by-linear association (Agresti, Mehta and Patel,
1990). That model has the form No.
t of tables Probabilities P(T = t )

(Birch, 1965; Haberman, 1974; Goodman, 1979),


with the special case 0 = 0 representing statis-
tical independence of X and Y. For model (3.2),
the sufficient statistics are T plus those for the
independence model ({ ni+),{ n,)). Cohen and
Sackrowitz (1991) showed that nonrandomized tests
138 A. AGRESTI

When I = 2, ordering the tables by T is equiva- quencies { mijk),model (XY, XZ, YZ) has the form
lent to ordering them by U = Cj yjnIj. Many statis-
tical tests use U for various choices of scores log mi,, =p + A?+ A+: % + hEY+ +
(Graubard and Korn, 1987). For arbitrary mono-
tone scores, the exact test for T is a small-sample and model (XZ, YZ) is its special case in which
version of a trend test proposed by Cochran (1954) xu -
{hij 0
- 1.
and Armitage (1955). The statistic with midrank
4.1 Testing Conditional Independence in 2 x 2 x K
scores is used in exact Wilcoxon tests for ordered
Tables
categorical data (Klotz, 1966; Mehta, Pate1 and
Tsiatis, 1984). A common approach to testing conditional inde-
pendence [ M, = (XZ, YZ)] performs the analysis
3.2 Exact Estimation for I x J Tables under the assumption of no three-factor interaction
"Exact" confidence intervals are rarely used for [MI = (XY, XZ, YZ)]. The sufficient statistics are
I x J tables, probably because of the complexity. the X-Z and Y-Z two-way marginal tables for M,,
Reducing parameter dimensionality could be use- and these as well as the X-Y marginal table for
ful in many applications, by using an unsaturated MI. The relevant conditional distribution is that of
loglinear model. Agresti, Mehta and Pate1 (1990) the X-Y marginal table, given the X-Z and Y-Z
discussed confidence intervals for ordinal classifica- marginal tables. The conditioned totals are the
tions when odds ratios satisfy the pattern marginal counts for the K partial tables. For the
2 x 2 x K case, the distribution simplifies to that
of = Cknllk, given {nl+k,n2+k,n+lk, n+2k, =
1, . . . , K}. Under the assumption of no three-factor
implied by the linear-by-linear association model interaction, Birch (1964) showed that UMPU tests
(3.2). For this pattern, the non-null distribution of of conditional independence utilize T (see also
T = CCxiyjnij is Lehmann 1986, pages 163-164).
Conditional on the strata margins, inllk, k =
1,. . . , K) have independent hypergeometric distri-
butions, each of form (1.2) with 8 = 1. The product
of the K mass functions determines the null distri-
bution of their sum. For the one-sided alternative
where C, is the sum of (11,IIjnij!)-l for all tables
of a "positive" association (odds ratio greater than
with the given marginal distributions having T = t .
1.0 in each level of Z), the p-value for Birch's exact
One obtains a confidence interval for 0 and thus
test of conditional independence is the null proba-
{aij} by inverting the test of H,:p = Po, as in the
bility that Cknllk is at least as large as observed,
case of the odds ratro for a 2 x 2 table. When T
for the fixed marginal totals. Thomas (1975),
takes its maximum or minimum possible value for
Pagano and Tritchler (1983b), and Mehta, Pate1
the given marginals, the conditional and uncondi-
and Gray (1985) gave algorithms for implementing
tional ML estimators of P and asymptotic standard
this test, which may be regarded as an exact small-
errors do not exist, but one-sided confidence bounds
sample version of the Cochran-Mantel-Haenszel
for 0 and odds ratios using it do exist.
test.
The exact McNemar test for matched pairs (Sec-
4. EXACT INFERENCE IN THREE-WAY
tion 2.3) is a special case of Birch's exact test of
CONTINGENCY TABLES
conditional independence. In this representation,
I,next consider three-way tables, paying special each of the n matched pairs has a 2 x 2 table
attention to the 2 x 2 x K case. Such tables occur, relating occasion (or member of pair) to response.
for instance, when one compares a binary response One tests M, = conditional independence of occa-
(Y) for two treatments (X), using data obtained at sion and response, given the pair, under the as-
K levels of a possibly confounding factor (2).Two sumption of MI = homogeneous odds ratios in the
hypotheses of importance are (1) conditional inde- n 2 x 2 tables.
pendence of X and Y, given Z, and (2) no three- For 2 x 2 x K tables in which it is unrealistic to
factor interaction, meaning' that the true X-Y odds expect the K conditional X-Y odds ratios to be
ratio is identical at each level of Z. Conditional similar or even of the same sign, the saturated
independence of X and Y, given Z, corresponds to model is a more relevant alternative than no
the loglinear model symbolized by (XZ, YZ), and three-factor interaction. In that case, the efficient
no three-factor interaction corresponds to the log- score statistic is CkX;, where X l denotes the
linear model (XY, XZ, YZ). For cell-expected fre- Pearson statistic for testing independence between
EXACT INFERENCE FOR CONTINGENCY TABLES 139

X and Y within the kth level of Z. This statistic is wirth (1988, page 266). The data refers to P =
commonly used in asymptotic tests of conditional whether promoted and R = race, stratified by M =
independence against the general alternative. month of promotion consideration (in 1974). Cases
Currently, there do not seem to be any computer involved GS-13 level computer specialists being
algorithms for this case. considered for promotion to level GS-14. (It appears
that many of the subjects appeared in two-or all
4.2 Testing Homogeneity of Odds Ratios in three strata, but this information is not available.
2 x 2 x K Tables So, like Gastwirth, I treat the promotion decisions
Birch's exact test using test statistic C,n,,, as- as independent.) Under the assumption of a con-
sumes homogeneity of the odds ratios in the 2 x stant odds ratio 8 between P and R at each level of
2 x K table. Zelen (1971) presented an exact test of M, we first test H,: 8 = 1 against H,: 8 < 1. In
this assumption. Here, M, = (XY, XZ, YZ), and using the one-sided alternative for the association
M, is the saturated model. The conditional distri- between P and R, the test is sensitive to evidence of
bution, given each of the three sets of two-way possible discrimination against blacks, in the sense
marginal totals, has probabilities proportional to of the probability of promotions being lower for
(IIiIIjIIknijk!)-l.For the 2 x 2 x K case, the fixed blacks than whites. Given the P and R marginal
totals are in,,,, n,+,, n,,,, n,,,, k = 1,. . K) a ,
totals at each level of M, n,,, can range between 0
and {n,,,, n,,,, n,,,, n,,,). Zelen defined the p- and 4, n,,, can range between 0 and 4, and n,,,
value to be the sum of probabilities of all 2 x 2 x K can range between 0 and 2. The test statistic T =
tables that are no more probable than the observed Cn,,, can assume values between 0 and 10, and
table. Alternatively, one could define P by order- under H,, E(T) = 2.90 and u(T) = 1.35. Note that
ing the tables with the given two-way margins the sample data represent the most extreme possi-
by x2 for testing the fit of loglinear model ble result in each of the three cases. The observed
(XY, XZ, YZ). Thomas (1975) and Pagano and test statistic is T = 0, and the p-value is the null
Tritcher (1983b) gave algorithms for implementing value of P(T r 0), which is 0.026. Zelen's test of
Zelen's test. the hypothesis of a constant odds ratio is a test of
To improve potential power in testing model fit of the loglinear model (PR, PM, RM). The refer-
(XY, XZ, YZ), one could instead test it against an ence set consists of the subset of these 2 x 2 x 3
unsaturated model. When the levels of Z have a tables that also satisfy n,,+= 0. The observed table
natural ordering, one could use the alternative log- is the only such table, so the test is degenerate,
linear model by which the log odds ratios change giving P = 1.0.
linearly across the K strata; that is, 4.3 Exact Estimation in 2 x 2 x K Tables
log mi,,= (XY, XZ, YZ) + I(i =j = 1) 6zk, Assuming no three-factor interaction and condi-
tioning on the strata totals, the joint distribution of
for fixed monotone scores { z,), where I(.) is the
,,,,
{ n . . . , n,,,) is the product of K terms of the
type given in (1.2) for the 2 x 2 case. This distribu-
indicator function. The relevant distribution is then tion depends only on the common odds ratio. Birch
that of ~ , z , n , , , , conditional on all two-way (1964) discussed the conditional ML estimator of
marginal totals. Zelen (1971) also presented such the common odds ratio, and Gart (1970) defined
an exact test. "exact" confidence intervals as a direct extension
To illustrate some exact tests for conditional in- of Cornfield's "exact" intervals for 2 x 2 tables.
dependence and for homogeneity of odds ratios for Thomas (1975), Pagano and Tritchler (1983b),
2 x 2 x K tables, consider Table 4, a 2 x 2 x 3 Mehta, Pate1 and Gray (1985), and Vollset, Hirji
table based on a larger table presented by Gast- and Elashoff (1991) provided algorithms for calcu-
lating such intervals. When the sufficient statistic
T = C knllk attains its minimum or maximum pos-
TABLE4 sible value, asymptotic confidence intervals (e.g.,
Example of 2 x 2 x K analysis using the Mantel-Haenszel approach; see Agresti
July ' August September
1990, pages 235-236) do not exist, but one-sided
promotions promotions promotions "exact" confidence intervals are well defined.
Race Yes No Yes No Yes No
Table 4 has the boundary value T = 0, resulting
in a degenerate conditional ML estimate of 0.0 for
Black 0 7 0 7 0 8 an assumed constant odds ratio. An "exact" 95%
White 4 16 4 13 2 13 upper confidence bound for the odds ratio equals
Source: Gastwirth (1988). 0.779, the value 8, that gives P = 0.05 in testing
140 A. AGRESTI

H:, 19 = 6 , against HI: 19 < 19,. The median unbi- (XZ, YZ)] using an exact test of independence of X
ased estimator equals 0.152. and Z in that two-way table. For instance, one can
conduct an exact test of ( P , RM) against (PR, RM)
4.4 Exact Methods for I x J x K Tables by simply testing whether P and R are indepen-
In principle, methods for exact testing and esti- dent in the marginal table (0,22/10,42); for this
,nation extend to loglinear models for multiway table, the Pearson statistic has two-sided P =
tables. Current computational algorithms are re- 0.056. Similarly, one can conduct an exact test of
stricted to certain analyses for 2 x J x K tables. In [I: ( X, Y, Z)] against [2: ( X, YZ)] using an exact
this subsection, I outline the types of inferences for test of independence of Y and Z in the two-way
which it would be useful to extend computational Y-Z marginal table. Computational algorithms are
algorithms in the I x J x K case. unavailable for the other situations, discussed in
Consider the hypotheses corresponding to the fol- the remainder of this subsection.
lowing five hierarchical loglinear models: To test hypothesis [I: (X, Y, Z)l against the satu-
rated model, one conditions on the sufficient statis-
(1) (X, Y, 2): Mutual independence of X, Y, tics for (X, Y, Z), which are {n,++,n+j+,n++,).
and Z; The resulting mass function is
(2) ( X, YZ): Joint independence of X and the
Y-Z classification;
(3) ( XZ, YZ): Conditional independence of X
and Y, given Z;
(4) (XY, XZ, YZ): No three-factor interaction;
(5) ( XYZ): Saturated model.
For each of models (1)to (4), one can consider exact Stumpf and Steyn (1986) gave formulas for the
testing against the alternative of the next most first- and second-order moments of the cell counts.
complex model, and against the general alternative To construct an exact test of hypothesis [I:
of the saturated model. ( X , Y, Z)l against the saturated model, one could
Certain tests are special cases of ones already use this distribution to generate the exact con-
developed for two-way tables, so they do not require ditional distribution of the Pearson statistic for
separate consideration. Hypothesis [2: (X, YZ)] is a testing the fit of model (X, Y, 2 ) . This case is
special case of statistical independence for a two- relatively unimportant, as the hypothesis of mu-
way table, in which the second classification con- tual independence is plausible in very few applica-
sists of the J K combinations of categories of Y and tions.
Z. Thus, an exact test of [2: (X, YZ)] against [5: A more important case is a test of conditional
(XYZ)] is simply a standard exact test of indepen- independence [3: ( XZ, YZ)] of X and Y against
dence for a two-way ( I x J K ) table. For instance, [4: (XY, XZ, YZ)]. Such a test generalizes Birch's
in Table 4, an exact test of ( P , RM) (i.e., that test for 2 x 2 x K tables. Here, one tests condi-
promotion is jointly independent of race and month tional independence under the assumption that the
of decision) is an exact test for the two-way table ( I - 1 ) ( J - 1) odds ratios relating X and Y are
having rows (0,4,0,4,0,2/7,16,7,13,8,13). The identical across the K levels of Z. Model (XZ, YZ)
Pearson test statistic of 5.62 has an exact p- has the sufficient statistics ({ni+k),{n+jk)), and
value of 0.353. [The attained significance for model (XY, XZ, YZ) has these plus { nij+). One
this joint test is weak compared to the p-value of uses the distribution of ({ nij+} I { ni+k),
0.026 obtained for the P-R association alone in which is a product of multivariate hypergeometric
the 'one-sided exact test of (PM, RM) against distributions from the various layers of Z. Let d
(PR, PM, RM). This shows how severely the evi- denote the ( I - 1)(J - 1) x 1 vector having ele-
dence represented by an effect of a certain size can ments
diminish when the degrees of freedom on which it
is based increase drastically.]
A test of [2: (X, YZ)] against [3: (XZ, YZ)l tests
whether the hxZ term in model (XZ, YZ) is zero.
By standard collapsibility results (e.g., Bishop,
1971; Agresti, 1990, pages 146 and 230), when this
model holds the hXZ term is identical to the hXZ The efficient score test orders tables in the refer-
term in the saturated two-way loglinear model for ence set using d'VP1d, where V is the null co-
the marginal X-Z two-way table. Thus, one can variance matrix of d. Birch (1965) proposed an
conduct an exact test of [2: (X, YZ)] against [3: asymptotic test of this type.
EXACT INFERENCE FOR CONTINGENCY TABLES 141
Alternatively, one could test conditional indepen- where the two rows for stratum k contain one
dence [3: (XZ, YZ)] against [5: (XYZ)], if one be- observation in each row, giving the responses at
lieves that the association between X and Y may the two occasions for subject k. Such a test is an
vary considerably across levels of Z. An efficient exact analog of asymptotic tests described by White,
score statistic is then the Pearson statistic for test- Landis and Cooper (1982) and Kuritz, Landis and
ing the fit of (XZ, YZ), which is C,X;, where X; Koch (1988).
is the Pearson statistic for testing independence Interval estimation for I x J x K tables would
between X and Y within the kth level of Z. This seem to be awkward except for simpler models that
test would also use the conditional distribution that reduce the dimensionality of the parameter space.
fixes the XZ and YZ marginal tables. Since it has Examples include models that describe the X-Y
a broader alternative than the generalized Birch conditional association by a linear-by-linear term
test, this test would tend to be less powerful than or describe three-factor interaction by a linear trend
that test when model (XY, XZ, YZ) provides a de- in conditional log odds ratios.
cent approximation to the actual distribution.
An exact test of no three-factor interaction [4: 5. EXACT INFERENCE IN LOGISTIC
(XY, XZ, YZ)] for I x J x K tables generalizes Ze- REGRESSION MODELS
len's test for 2 x 2 x K tables. Here, the null hy-
Exact inferences for loglinear models discussed
pothesis states that the (I - 1)(J - 1) odds ratios
in previous sections have counterparts for other
relating X and Y are identical across the K levels
generalized linear models that use natural parame-
of Z. The relevant conditional distribution has
ters as the basis of the link function. A closely
probabilities proportional to (IIiIIjIIknijk!)-l,for
related example for categorical data is logistic re-
the reference set of tables having XY, XZ and YZ
gression modeling. Assuming a binomial distribu-
marginal tables identical to the observed ones. An
tion with parameter ?r for the response, one uses
efficient score test against the general alternative
the logit link, log[?r/(l - T)]. Logistic models are
[5: (XYZ)] orders tables in the reference set by the
particularly useful when highlighting one categori-
Pearson statistic for testing the fit of the model.
cal variable in the contingency table as a response
For ordinal variables, one would modify the above
and the others as explanatory. Loglinear models do
ideas by constructing tests to increase power against
not make this distinction, although logistic models
important alternatives, such as has been done for
with qualitative explanatory variables have equiv-
I x J tables. For instance, to test conditional inde-
alent loglinear representations.
pendence (XZ, YZ), one might choose an alter-
For subject i, let yi denote a binary response,
native model that implies a monotone conditional
and let x i = (xio,xi,, . . . , xi,) denote values of k
X-Y association. One could use the single degree- explanatory variables, where xi, = 1. The logistic
of-freedom test statistic C ,{ C C xi yj[ nijk -
regression model is
(ni+kn+jk)/n++kl) to order the reference set of ta-
bles having the fixed X-Z and Y-Z marginal tables.
This results from comparing model (XZ, YZ) to a
model of homogeneous linear-by-linear association,
whereby the term in model (XY, XZ, YZ) is
replaced by a term @xiyj for fixed monotone row
and column scores. Such a test would be an exact
analog of an asymptotic score test proposed by
Mantel (1963) and Birch (1965). .
There has been some work of this type for the Under the usual assumption that { yi) are indepen-
2 x J x K case with ordered levels of Y (Mehta, dent Bernoulli outcomes, the sufficient statistic for
Pate1 and Senchaudhuri, 1991). In this case, one Pj is Tj = Ciyixij, j = 0,. . . , k. As noted by Cox
can take x, = 1and x, = 0, and consider the condi- (1958b, 1970), one can conduct exact inference for
tional distribution of C k[Cjyjnijk].For preselected Pj using the distribution of Tj, conditional on {Ti,
monotone scores, this gives a stratified version of i # j). Such inference is called conditional logistic
the Cochran-Armitage trend test. For rank { yj) regression (Bayer and Cox, 1979; Breslow and Day,
scores, it gives a stratified version of the Wilcoxon 1980, Chapter 7; Tritchler, 1984; Hirji, Mehta and
test. Another application of this case is testing Patel, 1987).
marginal homogeneity in an I x I table with the To illustrate, Table 5 shows some data from a
same ordered row and column categories. One can case-control study (Shapiro et al., 1979) relating
conduct this test by conducting the exact test of cigarette smoking to myocardial infarction for
conditional independence for the 2 x I x n table, women of various ages using oral contraceptives.
142 A. AGRESTI

TABLE5
continuous. The { y,) values may be completely
Example for exact logistic analysis

determined by the given sufficient statistics, mak-


Smoking level ing the conditional distribution degenerate.
(cigaretteslday) Algorithms for exact conditional inference for lo-
Age Disease status 0 1-24 >24
gistic regression can be applied to perform infer-
ence for equivalent loglinear models. To illustrate,
25-29 Myocardial infarction 0 1 3 consider loglinear modeling of several I x 2 tables.
Control 25 25 12 Regard the tables as a three-way I x 2 x K cross-
30-34 Myocardial infarction 0 1 8 classification of X, Y and 2. The loglinear model is
Control 13 10 10 equivalent to a logistic model for response Y when-
ever that loglinear model has a general association
Source: Shapiro et al. (1979). term relating X and 2. For instance, the logistic
model (5.2) for Table 5 corresponds to the loglinear
Let {nij,) denote the count at level i of smoking, j model having form
of disease status, and k of age. Let {xi) denote
scores assigned to the levels of smoking. Let ?ri, log mij, = A, + A? + A? + A: + A?: + :A? + p,xiyj
denote the probability of disease for subjects
at level i of smoking and k of age. One might where { y, = 1, y2 = 0) and S = smoking, D =
consider the model disease status, and A = age. This is a special case
of the loglinear model of homogeneous linear-
by-linear S-D association.

6. EXACT GOODNESS OF FIT


Then {ni+,) are fixed for this model, and
are fixed by the retrospective nature of the study. One can interpret tests of independence, condi-
To conduct exact inference about p,, one considers tional independence and no three-factor interaction
the distribution of C ,(C xi nil,) (the sufficient against general alternatives as tests of goodness of
statistic for PI), conditional on these totals. For fit of loglinear models. In principle, one could use
{x, = 0, x2 = 12.5, x, = 301, the conditional ML analogous methods to construct exact tests of good-
estimate of @, is 0.130, and the exact p-value for ness of fit for other loglinear or logistic regression
testing p, = 0 against 0, > 0 is 0.000. Here, re- models. For a particular model M, the reference set
sults are similar to those obtained with the uncon- consists of all tables having the observed values for
ditional ML analysis, for which the estimate of P, the minimal sufficient statistics. Given that the
of 0.133 has an estimated standard error of 0.042. model holds, the conditional distribution of the data
One could add an interaction term to the model and given those sufficient statistics is independent of
do an exact analysis for it. This would be particu- any parameters. One could construct the test by
larly natural with additional age strata represent- computing the null distribution of a goodness-of-fit
ing greater variation in the age factor, since one statistic, such as the Pearson statistic. The p-value
might expect the effect of smoking to increase at for testing the model is the conditional probability
higher age levels. 'that the goodness-of-fit statistic is at least as large
Several special cases of exact analyses for the as observed. The ML fitted values for the model are
logistic model have received attention in the litera- the same for all tables in the conditional reference
ture., For instance, Breslow and Day (1980), Peritz set.
(1982), and Hirji, Mehta and Pate1 (1988) discussed For instance, in testing independence with
exact inference for matched case-control studies Fisher's exact test, one also implicitly tests the
with the logistic model, in which case a subset of adequacy of the loglinear model of independence,
the explanatory variables are used for matching. (X, Y). To test the more general model (3.2) of
When k = 1in (5.1), the exact test of 0, = 0 using linear-by-linear association in a two-way table, one
T, = C yi xi, given To = X i yi is a special case of considers the conditional distribution of the data
the linear-by-linear test described in Section 3.1 given i n i + ) , {n+j),and CCxi yjnij.
applied to I x 2 tables. Here, I represents the McCullagh (1986) showed that, even for large
number of distinct sample values of the explana- samples, it is beneficial to perform goodness-
tory variable, and the test may be regarded as an of-fit tests using the conditional rather than
exact version of the Cochran-Armitage trend test. unconditional distribution. Although an exact
Difficulties can arise in exact inference for logis- goodness-of-fit test makes theoretical sense for any
tic regression when some explanatory variables are model having simple sufficient statistics, a general
EXACT INFERENCE FOR CONTINGENCY TABLES 143
computer algorithm is not available for it. This is a Freeman-Halton test using a state-of-the-art FOR-
useful topic for future research. There is also a TRAN program on an IBM 3701165 in about 15
need for work on exact distributions of localized seconds CPU time. This year, I performed the same
measures of goodness of fit, such as cell residuals. analysis in 15 seconds total time using the software
Bedrick and Hill (1990) gave exact conditional tests package StatXact (1991) on a 386-version PC (the
for a single outlier and for multiple outliers in CompuAdd 386SX, with math coprocessing chip)
logistic regression. running on MS-DOS at 16 mHz, and in 1second of
CPU time using SAS (PROC FREQ) on a worksta-
tion (DEC 3100). The 4 x 4 table (7,5,0,0/1,15,
7. COMPUTING FEASIBILITY
1,0/0,7,7,0/0,0,4,9) has 12,798,781 tables in the
Doing computations for exact conditional infer- reference set. In 1977, Klotz and Teng (1977) esti-
ence requires working with the set of contingency mated it would take 6 hours on a Univac 1110 at
tables having the given values of the sufficient the University of Wisconsin to perform an exact
statistics that are fixed for the inference. The po- Kruskal-Wallis test for a table having these mar-
tentially huge cardinality of the conditional refer- gins. I performed this analysis with StatXact in
ence set has been a severe impediment to the use of about 8 minutes of total time on a 386 version PC.
exact tests. This table has a very small p-value (0.0000 rounded
To illustrate, a 4 x 4 table with only 20 observa- to four decimal places), which results in consider-
tions can have as many as 40,176 tables with the able savings in time for algorithms that are able to
same margins; a 4 x 4 table with 100 observations determine whether many tables contribute to the
has a maximum cardinality on the order of 7.2 x p-value without explicitly enumerating all their
lo9. For given I and J and marginal proportions, cells.
the number of tables in the reference set of I x J
7.1 Algorithms
tables with those fixed proportions increases expo-
nentially in the sample size n. For fixed n, the A variety of algorithms have been used in com-
number of tables having given row and column puting exact conditional distributions. Verbeek and
marginal proportions also increases rapidly as I Kroonenberg (1985) presented a good survey of
and J increase or as the row and column propor- those used for I x J contingency tables. These in-
tions become more homogeneous. For instance, a clude algorithms that provide total enumeration of
5 x 5 table has a maximum cardinality on the the tables in the reference set (e.g., March, 1972;
order of 2.1 x l o 6 for 20 observations and 9.2 x Boulton, 1974; Baker, 1977; Cantor, 1979; Balmer,
1014 for 100 observations. Good (1976, 1977) and 1988), algorithms that compute the characteristic
Gail and Mantel (1977) gave approximations for function and invert it via Fourier transforms (e.g.,
the cardinality, and Agresti and Wackerly (1977) Pagano and Tritchler, 1983a, b), network algo-
and Agresti, Wackerly and Boyett (1979) gave max- rithms (Mehta and Patel, 1983) and Monte Carlo
ima for several table dimensions and sample sizes. algorithms (e.g., Agresti, Wackerly, and Boyett,
Enormous improvements achieved recently both 1979). Their paper also has enlightening discus-
in algorithms and in computer power have made sions of practical problems related to the algo-
exact inference much more feasible than it was a rithms and hardware for implementing them, such
decade ago. Most analyses conducted then with a as how to ensure proper comparison of extremely
mainframe computer can be conducted now in the small probabilities.
same order of time on a personal computer. When Algorithms that provide total enumeration of the
one is interested only in a p-value rather than the reference set are very time-consuming, and ade-
entire distribution of some statistic, substantial quate only for small problems. In the characteristic
savings in time are obtained using algorithms that function approach (Good, Gover and Mitchell, 1970;
do not require total enumeration of the reference Good, 1982; Pagano and Tritchler, 1983a, b), one
set (Pagano and Halvorsen, 1981; Mehta and Patel, computes the characteristic function of the statistic
1983). With some algorithms, exact analyses can be of interest (such as a goodness-of-fit statistic) using
easily conducted on a PC running on MS-DOS when a recurrence relation and then inverts it using a
the order of the cardinality is about lo7. They can Fourier transform to obtain the relevant distribu-
be conducted when the cardinality is much larger, tion. The fast Fourier transform (Cooley and Tukey,
using computers having operating systems with 1965) is a popular method for fast convolution of
larger memory capacities. long sequences, and so it is a natural one to apply
To illustrate, consider the 3 x 4 table (60,4,1, to analyses (such as those for 2 x 2 x K tables)
0/1,5,4,1/3,3,3,2), having n = 100 and 33,675 ta- involving convolutions of distributions. This method
bles in the reference set. In 1978, I performed the is relatively space and time efficient, computation
144 A. AGRESTI

time increasing polynomially in the sample size, on test statistic values for tables having path pass-
rather than exponentially. However, Vollset, Hirji ing through that node. In this way, one can deter-
and Elashoff (1991) noted that this method's use of mine tables that necessarily do or do not contribute
trigonometric functions and complex arithmetic can to the p-value, without processing the remaining
introduce substantial round-off errors for non-null parts of paths in the network passing through that
calculations when there is a wide range between node. Vollset, Hirji and Elashoff (1991) gave an
the largest and smallest of the combinatorial coeffi- adaptation of the network algorithm that uses an
cients that determine the exact distribution (e.g., a algebraic rather than geometric representation,
ratio of the two exceeding about lo2'). treating convolutions of distributions using polyno-
Among the most popular and versatile programs mial multiplication. Their algorithm computes the
developed in the past decade have been ones using null distribution on the natural rather than loga-
the network algorithm. This algorithm has been rithmic scale. Using this approach, they obtained
applied to several problems in a series of papers by faster computation of "exact" confidence limits for
Cyrus Mehta, Nitin Pate1 and some coworkers. For the common odds ratio in 2 x 2 x K tables.
instance, see Mehta and Pate1 (1983, 1986) for its These days, algorithms such as the network algo-
application to Freeman-Halton exact tests for I x J rithm can practically handle analyses for most I x
tables; Mehta, Pate1 and Tsiatis (1984) for exact J and 2 x 2 x K tables having a small to moderate
tests for 2 x J tables; Mehta, Pate1 and Gray (1985) number of cells and small to moderate cell counts.
for inference for the common odds ratio in 2 x 2 x K To illustrate, Table 6 reports the CPU time on a
tables; Hirji, Mehta and Pate1 (1987) for exact lo-
gistic regression; Agresti, Mehta and Pate1 (1990)
for exact inference in I x J tables with ordered
categories; Mehta, Pate1 and Senchaudhuri (1991)
for inference for 2 x J x K tables; and Hilton,
Mehta and Pate1 (1991) for Smirnov tests for cate-
gorical or continuous data.
I provide here only a brief outline of the network
representation for the reference set for an I x J
table with fixed margins, and refer the reader to
the previously mentioned papers for technical de-
tails on the algorithm itself. The network represen-
tation consists of nodes and arcs, constructed in
+
J 1 stages. For k = 0,. . . , J, the nodes at stage
k have the form (k,w,), where w, = (w,, , . . . , w,,),
with wik = nil + .. + iiik and w, = 0. There are
as many nodes at stage k as there are possible
partial sums for the first k columns of the table.
Arcs emanate from each node at any stage k, each
arc being connected to a distinct node at stage FIG. 1. Network representation for 3 x 3 tables having row
+
k 1. The network is constructed recursively by margin ( 6 , 1 , 2 )and column margin ( 1 , 2 , 6 ) .
specifying all successor nodes (k + 1,wk+,) that
are connected by arcs to each node (k, w,). A path
through the network is a sequence of arcs (0,O) -,
TABLE 6
(1, w,) -, . + ( J , w,). Each path represents a Sample CPU times (seconds) for Freeman-Halton test on
distinct table in the reference set, with entries I x I tables with uniform marginal counts andp-values
(w,+, - w,) in column k + 1. The network repre- approximately 0.05, using SAS on a DEC 3100 workstation
sentation is used in calculating the exact distribu-
tion by stagewise recursion, beginning at node Total sample size
(0,O). Figure 1 shows the network representa-
tion for the 3 x 3 table discussed in Section 3.1
having row margins (6,1,2) and column margins
(1, 2, 6). The topmost path gives the table
3/0,0,1/0,0,2).
For simply calculating p-values, one can dramat-
ically increase the speed of the network algorithm
by computing at each node lower and upper bounds a More than 6 hours.
EXACT INFERENCE FOR CONTINGENCY TABLES 145
DEC 3100 workstation needed for SAS (PROC of 0.0004 and a 99% confidence interval for that
FREQ) to perform the Freeman-Halton test for exact p-value of (0.0000,O.0009).
I x I tables of various sizes having uniform An advantage of the Monte Carlo method is that
marginal totals and cell frequencies changed from the amount of computational work is much less
uniformity sufficiently to give p-values approxi- dependent on the sample size n and table size I x J
mately equal to 0.05. Times are much faster when than for methods for exact analysis. For a method
marginal counts are nonuniform. For instance, a of simulating tables that involves taking a random
5 x 5 table with n = 100 having both margins permutation of n integers, Agresti, Wackerly and
equal to (60,25,10,3,2) took only 40 seconds CPU Boyett (1979) noted that the CPU time is approxi-
time. mately linear in n and stable in I and J. Patefield
The remaining problematic area is large, sparse (1981) provided a method that is more efficient for
tables, for which there are a large number of cells large n. Although it takes longer to generate each
and cell counts too small to appeal to standard table with Monte Carlo methods, only a relatively
asymptotic theory. For two-way tables, examples small number need to be generated. In principle,
would be tables of at least about 50 cells, having this method could be used to approximate precisely
several fitted values less than about 5. Even with any exact analysis, including those for which exact
current computing power, the conditional reference calculations may never be feasible.
set for such tables is often too large to be handled Mehta, Pate1 and Senchaudhuri (1988) described
by exact methods. Moreover, the cardinality grows a more sophisticated and faster Monte Carlo ap-
so rapidly as a function of the number of cells that, proach, using importance sampling. Tables are
regardless of future improvements in computer sampled from the conditional reference set in pro-
power, it may always be possible to produce tables portion to their importance for reducing the vari-
that cannot be handled exactly. ance of the estimated p-value, rather than in
A good compromise for handling large, sparse proportions corresponding to their hypergeometric
tables is to estimate precisely the inferential char- probabilities. In importance sampling, each Sam-
acteristics of interest, such as exact p-values and pled table provides an estimate of the p-value that
confidence intervals. One can do this using Monte is designed to be much better than the crude
Carlo sampling of tables in the reference set, by Bernoulli estimate provided by Monte Carlo Sam-
simulating the conditional hypergeometric Sam- pling. The tables are sampled using a network
pling distribution (Agresti, Wackerly and Boyett, algorithm. For linear rank tests for 2 x J tables,
1979; Boyett, 1979; Cox and Plackett, 1980; Pate- they noted that importance sampling can be up to
field, 1981, 1982; Kreiner, 1987; StatXact, 1991). four orders of magnitude more efficient than Monte
Each sampled table provides a Bernoulli random Carlo sampling. That is, to achieve a certain fixed
variable, indicating whether the test statistic is at accuracy with a p-value estimate, the ratio of the
least as large as observed. The estimated exact number of tables sampled using the Monte Carlo
p-value is the sample mean of those Bernoulli ran- approach versus importance sampling was about
dom variables, which is the proportion of sampled 10,000 for tests such as the trend test. However,
tables that have test statistics at least as large as the initial overhead involved in using importance
the observed one. The precision of the estimate sampling, due partly to using backward induction
is determined by the estimated sample variance with the network algorithm to set up the network-
P ( l - P)/N, where N is the number of tables Sam- based sampling scheme, makes it inefficient for
'

pled. Sampling of 17,000 tables ensures the esti- certain very large data sets.
mate is good to within 0.01 with confidence at least
7.2 Software
0.99.
Agresti (1990, page 308) gave a 16 x 5 table with Until recently, software for exact methods for
219 observations, relating various characteristics of contingency tables was nearly nonexistent, at least
alligators to their primary food choice (having cate- in the most popular statistical packages. Even now,
gories: fish, invertebrate, reptile, bird, other). For nearly all packages can perform Fisher's exact test
such large tables, one could either sample a fixed but little if anything else. With a couple of excep-
number of tables N to guarantee a certain accu- tions, our discussion here is limited to the most
racy, or repeatedly take samples of N' tables until commonly used packages.
achieving a certain accuracy. Using the latter ap- SAS (using procedure FREQ) and IMSL (using
proach with N' = 2,000 and desired accuracy routine CTPRB) can perform Fisher's exact test
0.0005 for a p-value for testing independence with and the Freeman-Halton extension for I x J ta-
the Pearson statistic, Monte Carlo sampling of bles, but do not give options to perform tests that
10,000 tables provides an estimated exact p-value base the p-values on goodness-of-fit statistics or
146 A. AGRESTI

ordinal statistics. SAS uses the network algorithm in the conditional reference set. Vollset and Hirji
from Mehta and Pate1 (1983), whereas IMSL uses (1991) gave a fast GAUSS program for the exact
an algorithm that enumerates the entire reference test of conditional independence and confidence in-
set. Thus, although neither program seems to have terval for a common odds ratio, and indicated that
limits on table sizes, SAS can handle a much greater it can handle up to about 1,000 points in the distri-
variety of tables in a reasonable amount of time. bution of C , n,,, .
Currently, BMDP and SPSSXonly perform Fisher's
exact test. All these packages seem to use the table 8. OTHER APPROACHES TO EXACT
probability as the basis of ordering the reference INFERENCE
set for two-sided p-values.
StatXact (1991) is a statistical package specializ- This discussion has focused on the classical condi-
ing in exact nonparametric inference and in exact tional approach to exact inference for contingency
inference for contingency table problems. Devel- tables. This section discusses controversies re-
oped by Mehta and Pate1 and colleagues, it uses garding that approach and describes alternative
versions of the network algorithm described in their approaches that produce results having some con-
articles. For 2 x J tables, StatXact performs a gen- nection with those for exact conditional methods.
eral linear rank test that includes as special cases
8.1 Controversy Over Exact Conditional Approaches
the Wilcoxon test and a trend test with arbitrary
scores. For I x J tables with min(I, J ) I 5, it can Most of the debate about exact conditional meth-
perform the Freeman-Halton test and exact tests ods for categorical data has focused on their use
of independence using the Pearson or likelihood- with Fisher's exact test when both margins of the
ratio chi-squared statistics. For tables with ordered table are not naturally fixed. I discuss the contro-
columns, it can perform the exact test using the versy only briefly, as it has already generated an
Kruskal-Wallis statistic when max(I, J ) I 5. enormous literature. See, for instance, Barnard
When rows are also ordered, it can perform the (1945, 1947, 1949, 1979, 1989), Berkson (1978),
exact test of linear-by-linear association described Basu (1979), Kempthorne (1979), Upton (1982),
by Agresti, Mehta and Pate1 (1990) when min(I, J ) Suissa and Shuster (1984, 1985), Yates and discus-
s 5, and it can use the Jonckheere-Terpstra statis- sants (1984), Bhapkar (1986), Haber (1987, 1989),
tic when I s 5. For 2 x 2 x K tables, it performs D'Agostino, Chase and Belanger (1988), Lloyd
Birch's exact test of conditional independence (for (1988b), Rice (1988), Little (1989), Camilli (1990),
K I 200), Zelen's exact test of homogeneity of odds Mehta and Hilton (1990), Routledge (1990), Storer
ratios, and "exact" confidence intervals for an as- and Kim (1990), and Greenland (1991).
sumed common odds ratio. For 2 x J x K tables, The perceived problem with the test results
StatXact performs str-atified linear rank tests and mainly from the conditional distribution of n,, (or
can perform inference for parameters in conditional the odds ratio) being highly discrete, much more so
logistic regression models. For I x J and 2 x J x K than when one or neither margin is fixed. This
tables, StatXact performs Monte Carlo sampling results in the test being quite conservative in a
for cases beyond its capability for exact inference conditional or unconditional sense, when used with
(as long as I s 50, J s 50, K 5 200, and I J K s a fixed significance level such as a = 0.05. The
2500), and it can perform importance sampling for actual probability of rejecting the null hypothesis
2 x J tables. The StatXact manual is also a good may be considerably less than the nominal level.
reference for many examples of exact conditional Proponents of Fisher's test (e.g., Yates, 1984) ar-
analyses. Many of the StatXact routines are gue that (1) one should not use arbitrary fixed
also available in the EGRET statistical software significance levels (versus simply reporting the
package (EGRET, 1991). p-value), (2) one should not average into the cal-
Baptista and Pike (1977) gave a FORTRAN pro- cu lation of the p-value other tables whose mar-
gram for the Sterne-type confidence interval for the g i n a l ~did not occur, and (3) no substantive loss of
odds ratio in a single 2 x 2 table. For 2 x 2 x K information about H,, results from conditioning
tables, Thomas (1975) gave a FORTRAN program on the marginals (i.e., the marginal counts are
for the conditional ML estimate and an "exact" approximately ancillary).
confidence interval for a common odds ratio, and The randomized-decision version of Fisher's exact
for exact tests of conditional independence and test, using randomization on the boundary of a
no three-factor interaction (when K I 20). This critical region to achieve a fixed significance level
program can be slow since, unlike algorithms a , is UMPU (Tocher, 1950). This is of little solace
presented by Pagano and Tritchler (1983b) and for practical work, since randomization is not used,
StatXact (1991), it requires evaluating every table although it indicates that Fisher's test may also
EXACT INFERENCE FOR CONTINGENCY TABLES 147
perform well for the mid-p definition of a p-value. and Belanger, 1988). Others recommend using an
The mid-p value is half the probability of the ob- "exact" unconditional test.
served result plus the probability of more extreme To illustrate the latter approach, consider testing
values. For discrete data, the mid-p value has null n, = n, (0 = 1)for two independent binomial sam-
behavior more nearly like a uniform ( 0 , l ) random ples. One first computes an exact p-value P ( a ) for
variable than the ordinary p-value (e.g., its null each possible common value n of n, and n,. For
expected value is 0.5, and the sum of its two one- this p-value, one might use the product binomial
tailed p-values equals 1.0). It has been recom- probability that a z statistic (or chi-squared statis-
mended (e.g., by Lancaster, 1961; Plackett in the tic) for comparing two proportions is at least as
discussion of Yates, 1984; Barnard, 1989, 1990) as large as observed, when n is the value of the
a good compromise between having a conservative nuisance parameter. The global p-value for the test
test and using randomization on the boundary to with unknown n is then P = sup, P(n). Proposed
eliminate problems from discreteness. For compar- by Barnard (1945, 1947), this test was disavowed
ing two binomial probabilities n1 and a,, Hirji, by him in later publications (e.g., Barnard, 1949,
Tan and Elashoff (1991) noted that the mid-p ad- discussion of Yates, 1984). Boschloo (1970), McDon-
justment to Fisher's exact test has actual levels of ald, Davis and Milliken (1977), Suissa and Shuster
significance closer to nominal levels than do classi- (1984, 1985), Haber (1986a, 1987, 1989), Shuster
cal asymptotic tests. This is especially true when (1988) discussed computational implementation of
the common value of i n i ) is near 0 or 1or when the the unconditional test. Suissa and Shuster (1991)
sample sizes nl+ and n,, are quite different. Haber extended it to comparisons of dependent propor-
(1986b) obtained similar conclusions both for com- tions for matched pairs.
paring binomials and for multinomial sampling The 2 x 2 table having entries (3,0/0,3), dis-
over the four cells. Hirji (1991) showed that, for cussed by Barnard (1945) and Fisher (1945) [see
tests of parameters in conditional logistic models also Little (1989) and Routledge (1990)1, illustrates
for case-control designs with unmatched binary that results with the conditional and unconditional
covariates, mid-p adjustments to an exact test per-
form well in approximating nominal levels com-
pared to the exact test and asymptotic score tests.
the Fisher p-value is 2(:)(:) fi:)
approaches can be quite discre ant. For HI: 0 # 1,
= 0.100, and the
asymptotic Pearson chi-squared test has a p-value
Analogous remarks apply to "exact" interval es- of 0.014. For the exact binomial test of n, = a,
timation. Although necessarily conservative, "ex- having only fixed row totals (3,3), the Pearson
act" interval estimates for the odds ratio can be so chi-squared value of 6.0 that occurs for the ob-
highly conservative as to be less useful than served table and for table (0,3/3,0) is the maxi-
asymptotic large-sample approaches in terms of mum possible, and the p-value for given nuisance
long-run coverage performance. An adaptation of parameter a is 2a3(1 - a)3. The supremum of this
Cornfield's "exact" method uses a mid-p adjust- over 0 Ia I1occurs at n = 112, giving p-value =
ment, choosing 0 endpoints that have one-sided 1/32. This unconditional p-value is related to
mid-p values of a / 2 . This approach does not guar- Fisher conditional p-values by
antee the desired coverage probability, but simula-
tions by Mehta and Walsh (1992) showed that it 6
performs well in this respect. For small samples, P ( X 2 2 6) = p(x22 6 1 n,, = k) P ( n + , = k ) .
the mid-p-adjusted intervals can be much narrower. k=O

For instance, for Fisher's tea-tasting data, having


,rows (3,1/1,3), the "exact" 95%' confidence inter- But the conditional probability that X 2 2 6 is zero
val' is (0.21,626.2), and the mid-p-based 95% confi- except when n,, = 3, so the unconditional p-value
dence interval is (0.31,308.6). Vollset, Hirji and is a weighted average of the Fisher p-value for the
Afifi (1991) showed similar good performance of the observed column marginals and p-values of 0 corre-
mid-p approach for interval estimation of parame- sponding to the impossibility of getting results as
ters in conditional logistic designs. extreme as observed if other marginals had oc-
cured, that is, 1/32 = 0.10[(:)(1/2)'], where the
8.2 Exact Unconditional Approach
term in brackets is the binomial probability (when
Other methods for 2 x 2'tables have been moti- a = 112) of the column marginals (3,3). Fisher
vated by the controversy about the conditioning (1945) remarked, "It is my view that the existence
argument and the conservativeness of Fisher's ex- of these less informative possibilities should not
act test. Some authors argue that it is better to use affect our judgment of significance based on the
a robust asymptotic test than a possibly highly series actually observed. . . . The fact that such an
conservative exact test (e.g., D'Agostino, Chase unhelpful outcome as these might occur, or must
148 A. AGRESTI

occur with a certain probability, is surely no reason is constant, she showed that the Bayesian evidence
for enhancing our judgment of significance in cases against the null is weaker as the number of pairs
where it has not occurred;. . . it is only the sam- giving the same response at both occasions in-
pling distribution of samples of the same type that creases, for fixed n,, and n,,. This result differs,
can supply a rational test of significance." and is perhaps more intuitively pleasing, than the
The unconditional test is so computationally in- exact conditional ML result. For examples of other
tensive that it seems complex to extend it to larger Bayesian analyses for 2 x 2 tables, see Leonard
contingency-table problems (Mehta and Hilton, (1975), Chen and Novick (1984) and Nurminen and
1990). In addition, removing nuisance parameters Mutanen (1987).
by taking a supremum may itself produce quite
conservative analyses for tables having larger 9. FUTURE RESEARCH
numbers of such parameters. It should be noted By the turn of the century, we should see ad-
that asymptotic procedures can also be quite con- vances in applicability of exact methodology for
servative. For instance, Koehler and Larntz (1980) contingency tables at least comparable to those of
noted that the likelihood-ratio test tends to be the past decade. One does not need a crystal ball to
highly conservative when most expected frequen- predict that computer speed will continue to in-
cies are smaller than 0.5. To illustrate, consider crease and algorithms will be further improved, so
the 3 x 9 table (0, 7, 0, 0, 0, 0, 0, 1, 111, 1, 1, 1, that tables not now feasible for analysis soon will
1,1,1,0,0/0,8,0,0,0,0,0,0,0), discussed in the be. In addition, it is reasonable to expect develop-
StatXact manual. For the likelihood-ratio statistic, ment of algorithms to handle new types of categori-
the asymptotic p-value is 0.0837 and the exact cal data, in particular, more complex relationships
p-value is 0.0015; for the Pearson statistic, the for larger tables in higher dimensions.
values are 0.1342 and 0.0013. This article has emphasized exact inference for
8.3 Bayesian Approaches
contingency tables in the context of loglinear mod-
eling. I believe that presenting the methods in the
For 2 x 2 tables, Bayesian approaches using cer- context of inference for parameters in models helps
tain "conservative" prior distributions give results to unify a variety of exact conditional methods. It
equivalent to conditional tests. For instance, Al- also helps to identify areas in which additional
tham (1969) gave an exact Bayesian analysis com- research would yield fruitful results, such as exact
paring parameters for two independent binomial inferences for I x J x K tables and a general good-
samples. She tested H,: a, 5 a, against a, > a, ness-of-fit test for loglinear models. This section
using a beta(a,, Pi) prior distribution for a,; i.e., summarizes other avenues for future research.
the prior for a, is proportional to (?ri)OL(l - a,)@, An important but difficult area for future re-
with a = a, - 1 and,@= Pi - 1, i = 1,2. The pos- search is exact analysis of model parameters for
+
terior distributigns are beta(a:, (3:) with a: = a i contingency tables that are large and sparse. For
nil and Pi = Pi + ni2.Taking the Bayesian p-value such data, standard asymptotic methods behave
to be the posterior probability that a, 4 ?r, Al- poorly; yet, it has been impractical to apply exact
tham showed this equals the one-sided p-value for methods. The size of the conditional reference set is
Fisher's exact test when one uses improper prior often too large to handle when the table has a large
distributions (a,, (3,) = (1,O) and (a,, (3), = (0,l). number of cells. This can also happen when the
This represents prior belief favoring the null hy- table is small but contains very large as well as
pothesis, in effect penalizing oneself against con- small cell counts. For such problematic tables, there
@,
cluding that a, > ?r,. If a, = =,y, i = 1,2, where should be additional research on (1) hybrid algo-
0 +y I1, Altham showed that the Bayesian p- rithms that use exact methods for some parts of the
value is smaller than the Fisher p-value, and the computation and approximations for other parts
difference between the two is no greater than the (Baglivo, Olivier and Pagano, 1988), (2) fast ways
null probability of the observed data. of simulating the exact distribution (Kreiner, 1987;
Altham (1971) also gave Bayesian analyses for Mehta, Patel and Senchaudhuri, 1988), and (3) bet-
dependent proportions. For a simple model in which ter asymptotic approximations (e.g., Koehler, 1986),
the probability of success is the same for each including saddlepoint approximations (Davison,
subject at a given occasion, 'she again showed that 1988; Booth and Butler, 1990; Bedrick and Hill,
the classical p-value is a Bayesian p-value for a 1992; Pierce and Peters, 1992).
prior distribution favoring the null hypothesis. For An area in which large, sparse tables commonly
a model similar to Cox's, in which the probability occur is modeling of longitudinal data with categor-
of success varies by subject but the occasion effect ical responses. Conditioning on sufficient statistics
EXACT INFERENCE FOR CONTINGENCY TABLES 149

to eliminate nuisance parameters, such as subject in extreme cases in which nearly all observations
random effects, can be very helpful. Even then, fall in one or two marginal categories for each
the large number of cells in the table make exact classification, sampling distributions are much
methods or approximations for them quite chal- "less discrete" for larger tables. As the number of
lenging. Another complication is that some useful dimensions or the number of response categories
models do not have reductions of data through per dimension increases, the sampling distributions
sufficiency, such as loglinear and logistic models that determine exact tests and confidence inter-
for the marginal probabilities rather than joint cell vals rapidly approach a continuous form. Thus, the
probabilities. conservativeness of the inferences becomes less
An area that has scope for lots of additional work problematic. For instance, performance of exact
is exact inference for I x J x K tables. Section 4.4 conditional tests should be closer to that of their
described several exact inferences for standard log- randomized versions, many of which are UMPU.
linear models. Other exact inferences are of inter- Also, the three approaches discussed for confidence
est when at least one variable is ordinal. In testing intervals for odds ratios (test-based with a / 2 in
conditional independence, one could use test statis- each tail, adapted Sterne, test-based with mid-p
tics designed to improve power for narrower alter- values) should all have true confidence levels close
natives. For instance, Section 4.4 discussed the to the nominal one.
possibility of testing conditional independence For cases in which discreteness is problematic,
against a linear-by-linear alternative. When X is there should be more work on developing adapta-
nominal and Y is ordinal, one could use a statistic tions of exact methods that, while perhaps sacrific-
analogous to a stratified Kruskal-Wallis statistic. ing "exactness," give performance closer to the
Such a statistic results from comparing (XZ, YZ) to randomized exact versions. Examples are adapta-
a loglinear model in which the X-Y partial associa- tions of exact tests and confidence intervals using
tion term has the form p i yj, for parametric row the mid-p value (e.g., Hirji, Tan and Elashoff, 1991;
effects { p i ) and rank scores { y j ) . Efficient score Mehta and Walsh, 1992), and tests and confidence
statistics for these cases are equivalent to statistics intervals that use the exact distribution computed
proposed by Birch (1965) and Landis, Heyman and at the ML estimate of the nuisance parameter (e.g.,
Koch (1978) for large-sample testing of conditional Conlon and Thomas, 1990; Storer and Kim, 1990).
independence. Similar remarks apply to exact tests So far there has been relatively little attention
of no three-factor interaction. Greater power for paid to algorithms for calculating power for exact
narrower alternatives (unsaturated models for conditional tests. The literature refers almost ex-
three-factor interaction) would be achieved by clusively to 2 x 2 tables. For instance, Gail and
using statistics that treat one, two or all three Gart (1973), Haseman (1978) and Suissa and
variables as ordinal. - Shuster (1985) discussed the determination of sam-
Of course, exact inferences for standard loglinear ple size for obtaining desired powers in fixed-a use
models and for models recognizing ordinality are of Fisher's exact test.
also relevant for four-way and higher dimensional Finally, it would be very useful to have a gen-
tables. Many tests of mutual independence or con- eral-purpose algorithm for exact tests comparing
ditional independence can be re-expressed in terms two nested loglinear models. This would unify ex-
of two-way or three-way tables, but not all. Also, act tests of goodness of fit for all loglinear models,
complications may result in tests about three-factor as well as tests against unsaturated alternatives.
and higher order interactions. The reference set of In summary, much has been accomplished in the
'tables having the required fixed 'marginals may be past decade, but this is only the tip of the iceberg
difficult to generate or simulate. For instance, in a of what can be done. In the future, statistical prac-
four-way table, an exact test of no three-factor in- tice for categorical data should place more empha-
teraction deals with a reference set in which all six sis on "exact" methods and less emphasis on
two-way margins are fixed. methods using possibly inadequate large-sample
There is unlikely to be as much controversy with approximations.
exact conditional methods for models for higher
dimensional tables as there has been with Fisher's ACKNOWLEDGMENTS
exact test for 2 x 2 tables. That controversy is
probably due less to philosophical disagreements This work was partially supported by NIH Grant
about conditioning than to practical consequences GM43824. The author appreciates helpful com-
of using. continuously defined measures such as ments on manuscript organization and content from
p-values with highly discrete distributions. Except Dr. Cyrus Mehta and two referees. Computations
150 A. AGRESTI

for many of the examples presented in this article BIRCH,M. W. (1965). The detection of partial association 11: The

were obtained using StatXact (1991). general case. J. Roy. Statist. Soc. Ser. B 27 111-124.

BISHOP,Y. M. M. (1971). Effects of collapsing multidimensional

contingency tables. Biometries 27 545-562.

REFERENCES BOOTH,J. and BUTLER,R. (1990). Randomization distributions

and saddlepoint approximations in generalized linear mod-

AGRESTI, A. (1990). Categorical Data Analysis. Wiley, New York. els. Biometrika 77 787-796.

AGRESTI,A., MEHTA,C. R. and PATEL,N. R. (1990). Exact


BOSCHLOO, R. D. (1970). Raised conditional level of significance

inference for contingency tables with ordered categories. J.


for the 2 x 2 table when testing for the equality of two

Amer. Statist. Assoc. 85 453-458.


probabilities. Statist. Neerlandica 21 1-35.

AGRESTI,A. and WACKERLY, D. (1977). Some exact conditional


BOULTON, D. M. (1974). Remark on algorithm 434. Comm.

tests of independence for R x C cross-classification tables.


ACM 17 326.

Psychometrika 42 111-125.
BOYETT,J. (1979). Random R x C tables with given row and

AGRESTI, A., WACKERLY, D. and BOYETT,J.


(1979). Exact condi- column totals. J. Roy. Statist. Soc. Ser. C 28 329-332.

tional tests for cross-classifications: Approximation of a t -


BRESLOW, N. E. and DAY,N. E. (1980). The Analysis of Case-
tained significance levels. Psychometrika 44 75-83.
Control Studies. IARC Scientific Publications No. 32, Lyon,
ALTHAM,P. M. E. (1969). Exact Bayesian analysis of a 2 x 2
France.
contingency table and Fisher's 'exact' significance test. J.
CAMILLI, G. (1990). The test of homgeneity for 2 x 2 contingency

Roy. Statist. Soc. Ser. B 31 261-269.


tables: A review of and some personal opinions on the

ALTHAM, P. M. E. (1971). The analysis of matched proportions.


controversy. Psychological Bulletin 108 135-145.

Bwmetrika 58 561-576.
CANTOR, A. B. (1979). A computer algorithm for testing signifi-
ANDERSEN, A. H. (1974). Multidimensional contingency tables.
cance in M x K contingency tables. In Proceedings of the
Scand. J. Statist. 1 115-127.
Statistical Computing Section 220-221. Amer. Statist. As-
ARMITAGE, P. (1955). Tests for linear trends in proportions and
soc., Washington, DC.
frequencies. Biometries 11 375-386.
CHEN,J. J. and NOVICK,M. R. (1984). Bayesian analysis for

BAGLIVO, J., OLIVIER,D. and PAGANO, M. (1988). Methods for the


binomial models with general beta prior distributions.

analysis of contingency tables with large and small cell


Journal of Educational Statistics 9 163-175.

counts. J. Amer. Statist. Assoc. 83 1006-1013.


COCHRAN, W. G. (1954). Some methods of strengthening the
BAKER,R. J. (1977). Exact distributions derived from two-way
common x 2 tests. Biometries 10 417-451.

tables. J. Roy. Statist. Soc. Ser. C 26 199-206.


COHEN,A. and SACKROWITZ, H. B.

(1991). Tests for indepen-

BALMER, D. W. (1988). Recursive enumeration of r x c tables for


dence in contingency tables with ordered alternatives. J.

exact likelihood evaluation. J. Roy. Statist. Soc. Ser. C 37


Multivariate Anal. 36 56-67.

290-301.
CONLON, M. and THOMAS, R. G. (1990). A new coddence inter-

BAPTISTA,J. and PIKE, M. C. (1977). Exact two-sided confidence


val for the difference of two binomial proportions. Comput.

limits for the odds ratio in a 2 x 2 table. J. Roy. Statist.


Statist. Data Anal. 8 237-241.

Soc. Ser. C 26 214-220.

COOLEY, J. M. and TUKEY,J. W. (1965). An algorithm for the

BARNARD, G. A. (1945). A new test for 2 x 2 tables (Letter to the

machine calculation of complex Fourier transforms. Math.

Editor). Nature 156 177.

Comp. 12 297-301.

BARNARD,G. A. (1947). Significance tests for 2 x 2 tables.

CORNFIELD, J. (1956). A statistical problem arising from retro-

Biometrika 34 123-138.

spective studies. Proc. Third Berkeley Symp. Math. Statist.

BARNARD, G. A. (1949). St'atistical inference. J. Roy. Statist.

Probab. 4 135-148 Univ. California Press, Berkeley.

Soc. Ser. B 11 115-139.

BARNARD, G. A. (1979). In contradiction to J. Berkson's dis-


Cox, D. R. (1958a). Two further applications of a model for

praise: Conditional tests can be more efficient. J. Statist.


binary regression. Biometrika 45 562-565.

Plann. Inference 3 181-188.


Cox, D. R. (195813). The regression analysis of binary sequences

BARNARD, G. A. (1989). On alleged gains in power from lower


(with discussion). J. Roy. Statist. Soc. Ser. B 20 215-242.

p-values. Statistics in Medicine 8 1469-1477.


Cox, D. R. (1966). A simple example of a comparison involving

BARNARD,
G. A. (1990). Must clinical trials be large? The inter-
quanta1 data. Biometrika 53 215-220.

pretation of P-values and the combination of test results.


Cox, D. R. (1970). Analysis of Binary Data. Chapman and Hall,
Statistics in Medicine 9 601-614.
London.
BASU,D. (1979). Discussion of "In dispraise of the exact test" by Cox, M. A. A. and PLACKETT, R. L. (1980). Small samples in

J. Berkson. J. Statist. Plann. Inference 3 189-192.


contingency tables. Biometrika 67 1-13.

BAYER,L. and Cox, C. (1979). Exact tests of significance in


CRESSIE,N. and READ,T. R. C. (1989). Pearson x 2 and the

binary regression models. J. Roy. Statist. Soc. Ser. C 28


loglikelihood ratio statistic G2: A comparative review. In-

319-324.
ternat. Statist. Rev. 57 19-43.

BEDRICK, E. J. and HILL,J. R. (1990). Outlier tests for logistic


D'AGOSTINO, R. B., CHASE,W. and BELANGER, A. (1988). The

regression: A conditional approach, Bwmetrika 77 815-827.


appropriateness of some common procedures for testing the

BEDRICK, E. J. and HILL, J. R. (1992). An empirical assessment equality of two independent binomial populations. Amer.

of saddlepoint approximations for testing a logistic regres- Statist. 42 198-202.

sion parameter. Bwmetrics. To$appear. DAVIS,L. J. (1986). Exact tests for 2 by 2 contingency tables.

BERKSON, J. (1978). In dispraise of the exact test. J. Statist.


Amer. Statist. 40 139-141.

Plann. Inference 2 27-42.


DAVISON,A. C. (1988). Approxima_te conditional inference in

BHAPKAR,V. P. (1986). On conditionality and likelihood with generalized linear models. J. Roy. Statist. Soc. Ser. B 50

nuisance parameters i n models for contingency tables. Tech- 445-461.

nical Report 253, Dept. Statistics, Univ. Kentucky. DUPONT, W. D. (1986). Sensitivity of Fisher's exact test to minor

BIRCH,M. W. (1964). The detection of partial association I: The


perturbations in 2 x 2 contingency tables. Statistics in

2 x 2 case. J. Roy. Statist. Soc. Ser. B 26 313-324.


Medicine 5 629-635.

EXACT INFERENCE FOR CONTINGENCY TABLES 151


EGRET. (1991). EGRET Statistical Software. Statistics and HABERMAN, S. J. (1988). A warning on the use of chi-squared

Epidemiology Research Corporation, Seattle. statistics with frequency tables with small expected cell

FISHER,R. A. (1934). Statistical Methods for Research Workers. counts. J. Amer. Statist. Assoc. 83 555-560.

(Originally published 1925, 14th ed. 1970.) Oliver and Boyd, HASEMAN, J. K. (1978). Exact sample sizes for use with the

Edinburgh. Fisher-Irwin test for 2 x 2 tables. Biometrics 34 106-110.

FISHER,R. A. (1935a). The logic of inductive inference (with


HEALY,M. J. R. (1969). Exact tests of significance in contin-

discussion). J. Roy. Statist. Soc. Ser. A 98 39-82.


gency tables. Technometrics 11 393-395.

FISHER,R. A. (1935b). The Design of Experiments (8th ed. 1966).


HILTON, J. F., MEHTA,C. R. and PATEL,N. R. (1991). Exact
Oliver and Boyd, Edinburgh.
Smirnov tests using a network algorithm. Technical Report
FISHER,R. A. (1945). A new test for 2 x 2 tables (Letter to the
14, Dept. Epidemiology and Biostatistics, Univ. California,
Editor). Nature 156 388.
San Francisco.
FISHER,R . A. (1950). The significance of deviations from expec-
HIRJI, K. F. (1991). A comparison of exact, mid-P, and score

tation in a Poisson series. Biometrics 6 17-24.


tests for matched case-control studies. Biometrics 47

FREEMAN, G. H. and HALTON,J. H. (1951). Note on a n exact


487-496.

treatment of contingency, goodness of fit and other problems


HIRJI,K. F., MEHTA,C. R. and PATEL,N. R. (1987). Computing

of significance. Biometrika 38 141-149.


distributions for exact logistic regression. J. Amer. Statist.

GAIL,M. H. and GART,J. J. (1973). The determination of sample


Assoc. 82 1110-1117.

sizes for use with the exact conditional test in 2 x 2 compar-

HIRJI, K. F., MEHTA,C. R. and PATEL,N. R. (1988). Exact

ative trials. Bwmetrics 29 441-448.


inference for matched case-control studies. Biometrics 44

GAIL,M. H. and MANTEL,N. (1977). Counting the number of

803-814.

r x c contingency tables with fixed margins. J. Amer.

HIRJI,K. F., TAN,S. and ELASHOFF, R. M. (1991). A quasi-exact

Statist. Assoc. 72 859-862.

GART,J. J. (1969). An exact test for comparing matched propor-


test for comparing two binomial parameters. Statistics in

tions in crossover designs. Biometrika 56 75-80.


Medicine 10 1137- 1153.

GART,J. J. (1970). Point and interval estimation of the common


HIRJI, K. F., TSIATIS,A. A. and MEHTA,C. R. (1989). Median

odds ratio in the combination of 2 x 2 tables with fixed


unbiased estimation for binary data. Amer. Statist. 43

marginals. Biometrika 57 471-475.


7-11.

GASTWIRTH,
J. L. (1988). Statistical Reasoning in Law and Pub-
IRWIN,J. 0. (1935). Tests of significance for differences between

lic Policy 1. Academic, San Diego.


percentages based on small numbers. Metron 12 83-94.

GIBBONS, J. D. and PRATT,J. W. (1975). P-values: Interpretation


KEMPTHORNE,
0. (1979). In dispraise of the exact test: Reactions.
and methodology. Amer. Statist. 29 20-25.
J. Statist. Plann. Inference 3 199-213.

GOOD,I. J. (1976). On the application of symmetric Dirichlet


KLOTZ,J. (1966). The Wilcoxon, ties, and the computer. J.

distributions and their mixtures to contingency tables. Ann.


Amer. Statist. Assoc. 61 772-787.

Statist. 4 1159-1189.
KLOTZ,J. and TENG,J. (1977). One-wajl layout for counts and

GOOD,I. J. (1977). The enumeration of arrays and a generaliza-


the exact eumeration of the Kruskal-Wallis H distribution

tion related to contingency tables. Discrete Math. 1 9 23-45.


with ties. J. Amer. Statist. Assoc. 72 165-169.

GOOD,I. J. (1982). The fast calculation of the exact distribution


KOEHLER, K. (1986). Goodness-of-fittests for log-linear models in

of Pearson's chi-squared and of the number of repeats within


sparse contingency tables. J. Amer. Statist. Assoc. 81

the cells of a multinomial by using a Fast Fourier Trans-


483-493.

form. J. Statist. Comput. Simulation 14 71-78.


KOEHLER, K. and LARNTZ, K. (1980). An empirical investigation

GOOD,I. J., GOVER,T. NF and MITCHELL, G. J. (1970). Exact


of goodness-of-fit statistics for sparse multinomials. J.

distributions for x 2 and for the likelihood-ratio statistic for


Amer. Statist. Assoc. 75 336-344.

the equiprobable multinomial distribution. J. Amer.


KREINER,S. (1987). Analysis of multidimensional contingency

Statist. Assoc. 65 267-283.


tables by exact conditional tests: Techniques and strategies.

GOODMAN, L. A. (1979). Simple models for the analysis of associ-


Scand. J. Statist. 14 97-112.

ation in cross-classifications having ordered categories. J.


KURITZ,S. J., LANDIS,J. R. and KOCH,G. G. (1988). A general

Amer. Statist. Assoc. 74 537-552.


overview of Mantel-Haenszel methods: Applications and re-

GRAUBARD, B. I. and KORN,E. L. (1987). Choice of column scores


. cent developments. Annual Review of Public Health 9

for testing independence in ordered 2 x K tables. Biomet-


123-160.

ries 43 471-476.
LANCASTER,
H. 0. (1961). Significance tests in discrete distribu-
GREENLAND, S. (1991). On the logical justification of conditional
tions. J. Amer. Statist. Assoc. 56 223-234.

tests for two-by-two contingency tables. Amer. Statist. 45

LANDIS,J. R., HEYMAN, E.


R. and KOCH,G. G. (1978). Average

248-251.

partial association in three-way contingency tables: A re-

HABER,M. (1986a). An exact unconditional test for the 2 x 2

view and discussion of alternative tests. Internat. Statist.

comparative trial. Psychological Bulletin 99 129-132.

Rev. 46 237-254.

HABER,M. (198613). A modified exact test for 2 x 2 contingency

LEHMANN, E. L. (1986). Testing Statistical Hypotheses, 2nd ed.


tables. Biometrical J. 28 455-463.

Wiley, New York.


HABER,M. (1987). A comparison of some conditional and uncon-

ditional exact tests for 2 by 2 contingency tables. Comm.


LEONARD, T. (1975). Bayesian estimation methods for two-way

Statist. Simulation Comput. 16 999-1013.


contingency tables. J. Roy. Statist. Soc. Ser. B 37 23-37.

HABER,M. (1989). 3 0 the marginal totals of a 2 x 2 contingency


LITTLE,R. J. A. (1989). Testing the equality of two independent

table contain information regarding the table proportions?


binomial proportions. Amer. Statist. 43 283-288.

Comm. Statist. Theory Methods 18 147-156.


LLOYD, C. J. (1988a). Doubling the one-sided P-value in testing

HABERMAN, S. J. (1974). Log-linear models for frequency tables independence in 2 x 2 tables against a two-sided alterna-

with ordered classifications. Biometrics 36 589-600.


tive. Statistics in Medicine 7 1297-1306.

HABERMAN,
S. J. (1977). Log-linear models and frequency tables
LLOYD,C. J. (1988b). Some issues arising from the analysis of

with small expected cell counts. Ann. Statist. 5 1148-1169.


2 x 2 contingency tables. Austral. J. Statist. 30 35-46.

152 A. AGRESTI

MANTEL, N. (1963). Chi-square tests with one degree of freedom:


random R x C tables with given row and column totals. J .

Extensions of the Mantel-Haenszel procedure. J . Amer.


Roy. Statist. Soc. Ser. C 30 91-97.

Statist. Assoc. 58 690-700.


PATEFIELD, W. M. (1982). Exact tests for trends in ordered

MANTEL,N. (1987). Exact tests for 2 x 2 contingency tables


contingency tables. J . Roy. Statist. Soc. Ser. C 31 32-43.

(Letter). Amer. Statist. 41 159.


PEARSON,E . S. (1990). 'Student' A Statistical Biography of

MANTEL,N. and HANKEY,B. J. (1971). Programmed analysis of


William Sealy Gosset (R. L. Plackett and G. A. Barnard,

a 2 x 2 contingency table. Amer. Statist. 25 40-44.


eds.). Clarendon Press, Oxford, England.

MARCH,D. L. (1972). Exact probabilities for R x C contingency


PERITZ,E. (1982). Exact tests for matched pairs: Studies with

tables. Comm. ACM 15 991-992.


covariates. Comm. Statist. Theory Methods 11 2165-2166.

MCCULLAGH,
P. (1986). The conditional distribution of goodness-
[Correction: (1983) 12 1209-1210.1

of-fit statistics for discrete data. J. Amer. Statist. Assoc.


PIERCE,D. A. and PETERS,D. (1991). Practical use of higher-order
81 104-107.
asymptotics for multiparameter exponential families. J .
MCCULLAGH, P. and NELDER,J. A. (1989). Generalized Linear Roy. Statist. Soc. To appear.
Models, 2nd ed. Chapman and Hall, London. RAO,C. R. (1973). Linear Statistical Inference and Its Applica-
MCDONALD, L. L., DAVIS,B. M. and MILLIKEN, G.
A. (1977). A tions, 2nd ed. Wiley, New York.
nonrandomized unconditional test for comparing two pro-
RICE, W. R. (1988). A new probability model for determining

portions in 2 x 2 contingency tables. Technometrics 19


exact P-values for 2 x 2 contingency tables when compar-

145-158.
ing binomial proportions. Biometrics 44 1-22.

MCNEMAR, Q. (1947). Note on the sampling error of the differ-


ROUTLEDGE, R. D. (1990). Resolving the controversy over Fisher's
ence between correlated proportions or percentages. Psy-
exact test. Paper presented a t the International Biometric
chometrika 12 153-157.
Conference, Budapest.
MEHTA,C. and HILTON,J. (1990). Exact power of conditional and
SANTNER, T. J. and SNELL,M. K. (1980). Small-sample confi-

unconditional tests: Going beyond the 2 x 2 contingency


dence intervals for p, - p, and p, l p , in 2 x 2 contingency

table. Unpublished manuscript.


tables. J. Amer. Statist. Assoc. 75 386-394.

MEHTA,C. R. and PATEL,N. R. (1983). A network algorithm for SHAPIRO,S., SLONE,D., ROSENBERG, L., KAUFMAN, D., STOLLEY,
performing Fisher's. exact test in r x c contingency tables. P. D. and MIEWINEN,0 . S. (1979). Oral contraceptive use in

J. Amer. Statist. Assoc. 78 427-434.


relation to myocardial infarction. Lancet 8119 743-746.

MEHTA,C. R. and PATEL,N. R. (1986). FEXACT: A Fortran


SHUSTER, J. J. (1988). EXACTB and CONF: Exact unconditional

subroutine for Fisher's exact test in unordered r x c contin-


procedures for binomial data. Amer. Statist. 42 234.

gency tables. ACM Trans. Math. Soffware 12 154-161.

SOMS,A. P. (1985). Permutation tests for k-sample binomial

MEHTA,C. R. and WALSH,S. J. (1992). Comparison of exact,

data with comparisons of exact and approximate P-levels.

mid-p, and Mantel-Haenszel confidence intervals for the


Comm. Statist. Theory Methods 14 217-233.

common odds ratio across several 2 x 2 contingency tables.

STATXACT. (1991). StatXact: Statistical Software for Exact Non-


Amer. Statist. To appear.

parametric Inference, Version 2. Cytel Software, Cambridge,


MEHTA,C. R., PATEL,N. R. and GRAY,R. (1985). Computing a n

Mass.
exact confidence interval for the common odds ratio in

STERNE,T. E. (1954). Some remarks on c ~ ~ d e n or c efiducial

several 2 by 2 contingency tables. J. Amer. Statist. Assoc.

limits. Biometrika 41 275-278.

80 969-973.

MEHTA,C. R., PATEL,N. R. and SENCHAUDHURI, P.


(1988). STORER, B. E. and KIM,C. (1990). Exact properties of some exact

Importance sampling for estimating exact probabilities in


test statistics for comparing two binomial proportions. J .

permutational inference. J. Amer. Statist. Assoc. 8 3


Amer. Statist. Assoc. 85 146-155.

999-1005.
STUMPF,R. H. and STEYN,H. S. (1986). Exact distributions

MEHTA,C . R., PATEL,N. R. and SENCHAUDHURI, P. (1991). Exact associated with a n I x J x K contingency table. Comm.

stratified linear rank tests for binary data. In Computing- Statist. Theory Methods 15 1889-1904.

Science and Statistics: Proceedings of the 23rd Symposium SUISSA,S. and SHUSTER, J. J. (1984). Are uniformly most power-

on the Interface. m. M. Keramidas., ed.), 200-207. Interface ful unbiased tests really best? Amer. Statist. 38 204-206.

~oundation;Fairfax Station, VA. SUISSA,S. and SHUSTER, J. J. (1985). Exact unconditional sam-

MEHTA,C. R., PATEL,N. R. and T s r ~ n s A. , A. (1984). Exact


ples sizes for the 2 by 2 binomial trial. J. Roy. Statist. Soc.

significance testing to establish treatment equivalence with


Ser. A 148 317-327.

ordered categorical data. Bwmetrics 40 819-825.


SUISSA,S. and SHUSTER, J. J. (1991). The 2 x 2 matched pair

MORRIS,C. N. (1975). Central limit theorems for multinomial


trial: Exact unconditional design and analysis. Biometries

s p . Ann. Statist. 3 165-188.


47 361-372.

NEYMAN, J. (1935). On the problem of confidence limits. Ann.


THOMAS,D. G. (1971). Exact confidence limits for the odds ratio

Math. Statist. 6 111-116.


in a 2 x 2 table. J . Roy. Statist. Soc. Ser. C 20 105-110.

NURMJNEN, M. and MUTANEN, P. (1987). Exact Bayesian analy-


THOMAS,D. G. (1975). Exact and asymptotic methods for the

sis of two proportions. Scand. J. Statist. 14 67-77.


combination of 2 x 2 tables. Computers and Biomedical

PAGANO,M. and HALVORSEN, K. T. (1981). An algorithm for


Research 8 423-446.

finding the exact significance levels of r x c contingency


TOCHER, K. D. (1950). Extension of the Neyman-Pearson theory

tables. J. Amer. Statist. Assoc. 76 931-934.


of tests to discontinuous variates. Biometrika 37 130-144.

PAGANO, M. and TRITCHLER, D. (1983a). On obtaining permuta-


TRITCHLER, D. (1984). An algorithm for exact logistic regression.
tion distributions in polynomial time. J. Amer. Statist.
J . Amer. Statist. Assoc. 79 709-711.

Assoc. 78 435-440.
UPTON,G. J. G. (1982). A comparison of alternative tests for the

PAGANO,M. and TRITCHLER,D. (1983b). Algorithms for the


2 x 2 comparative trial. J . Roy. Statist. Soc. Ser. A 145

analysis of several 2 x 2 contingency tables. SZAM J . Sci.


86-105.

Statist. Comput. 4 302-309.


VERBEEK,A. and KROONENBERG, P. M. (1985). A survey of
PATEFIELD,W. M. (1981). An efficient method of generating algorithms for exact distributions of test statistics in r x c
EXACT INFERENCE FOR CONTINGENCY TABLES 153
contingency tables with fixed margins. Comput. Statist. criteria for categorical data. Internat. Statist. Rev. 50
Data Anal. 3 159-185. 27-34.
VOLLSET, S. E. and HIRJI, K. F. (1991). A microcomputer pro- YATES,F. (1934). Contingency tables involving small numbers
gram for exact and asymptotic analysis of several 2 x 2 and the x 2 test. J. Roy. Statist. Soc. Supp. 1 217-235.
tables. Epidemiology 2 217-220. YATES,F. (1984). Tests of significance for 2 x 2 contingency
VOLLSET, S. E., HIRJI,K. F. and AFIFI,A. A. (1991). Evaluation tables. J. Royal Statist. Soc. Ser. A 147 426-463.
of exact and asymptotic interval estimators in logistic anal- ZELEN,M . (1971). The analysis of several 2 x 2 contingency
ysis of matched case-control studies. Biometrics 47 tables. Biometrika 58 129-137.
1311-1325. ZELEN,M. (1972). Exact significance tests for contingency tables
VOLLSET, S. E . , HIRJI, K. F. and ELASHOFF, R. M. (1991). Fast embedded in a 2" classification. Proc. Sixth Berkeley Symp.
computation of exact confidence limits for the common odds Math. Statist. Probab. 1 737-757. Univ. California Press,
ratio in a series of 2 x 2 tables. J. Amer. Statist. Assoc. 86 Berkeley.
404-409. ZELTERMAN, D. (1987). Goodness-of-fit tests for large sparse
WHITE,A. A,, LANDIS,J. R. and COOPER,M. M. (1982). A note multinomial distributions. J. Amer. Statist. Assoc. 82
on the equivalence of several marginal homogeneity test 624-629.

Comment
Edward J. Bedrick and Joe R. Hill

We congratulate
- Professor Agresti
- for his com- CHECKING LOGISTIC REGRESSION MODELS
prehensive review of exact inference with cate-
gorical data. We share his enthusiasm for exact We would like to convey some of our recent work
on model checking for logistic regression and some
conditional methods and believe that the coming
years will produce many important computational of our thoughts regarding conditional inference.
breakthroughs in this area. For the sake of simplicity, we assume that a single
model is under consideration. A little notation is
The mechanics of conditioning on sufficient
statistics to generate reference distributions for es- required. The usual logistic regression model has
two distinct parts: a sampling component and a
timation, testing and model checking with loglin-
structural component. The sampling component
ear models for Poisson data and logistic regression
specifies that Y = (Y,, . . . , Y,)' is a vector of inde-
models for binomial data are well-known, but the
utility of conditioning in these settings is not uni-
pendent binomial random variables with Y, -
versally agreed upon. Furthermore, the role of con- Bin(m,, n,). The structural or regression compo-
nent of the model is given by
ditioning- in the analysis of discrete generalized
-
linear models with noncanonical link functions has logit ( n ) = XP,
received little attention from most of the statistical
community. As a result, scientists and statisticians where logit(*) is a n n x 1 vector of log-odds with
are familiar with conditional methods, but many elements log{ a,/ ( l - n,)), X is a n n x p full-
are unsure how ,such methods should be incor- column rank design matrix with ith row xi, and 0
porated into a n overall strategy for analyzing cate- is a p x 1 vector of unknown regression parame-
gorical data. We feel that the use and abuse of ters. Under model (I), S = X' Y is sufficient for 0.
conditional methods will not be fully understood or Let 7i be the MLE of a under this model.
appreciated without such a strategy. We hope that The distribution of the data pr(Y; P), indexed by
Professor Agresti's survey and the ensuing discus- 0, can be factored into the marginal distribu-
sions stimulate further work in this direction. tion of the sufficient statistic S, and the condi-
tional distribution of the data given the sufficient
statistic:

Edward J. Bedrick is an Assistant Professor, De-


partment of Mathematics and Statistics, University Taking a Fisherian point of view (Fisher, 1950),
of New Mexico, Albuquerque, New Mexico 87131. inferences about 0 are based on pr(S; P), whereas
Joe R. Hill is an R&D Specialist at EDS Research, model checks use the conditional distribution
5951 Jefferson Street, NE, Albuquerque, New pr(Y 1 S). Letting sobs= X' yobs be the observed
Mexico 87109. value of the sufficient statistic for the logistic model,

You might also like