JWBS074-c15 JWBS074-Huitema August 21, 2011 7:48 Printer Name: Yet to Come
CHAPTER 15
15.1 INTRODUCTION
If OLS regression is used to fit a model with a dichotomous (0–1) dependent variable, three problems arise:

1. The values predicted from the equation may be outside the possible range for
probability values (i.e., zero through one).
2. The homoscedasticity assumption will be violated because the variance on Y
will differ greatly for various combinations of treatments and covariate scores.
3. The error distributions will not be normal.
The Analysis of Covariance and Alternatives: Statistical Methods for Experiments, Quasi-Experiments,
and Single-Case Studies, Second Edition. Bradley E. Huitema.
© 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.
The achievement outcome data presented in Example 6.1 (Chapter 6) are approxi-
mately continuous, but they can be modified (for purposes of this chapter) to become
dichotomous. This was accomplished by changing each achievement score to 0 if the
original value was 33 or less, and to 1 if the original value was equal to or greater
than 34. Hence, each subject was classified as either unsuccessful (0) or successful
(1) with respect to achievement. This 0–1 dependent variable was then regressed
(using OLS) on the group-membership dummy variables and the covariate in order
to estimate the parameters of the ANCOVA model.
After the model was fitted the equation was used to predict the probability of
academic success for each of the 30 subjects. That is, dummy variable and covariate
scores for each subject were entered in the equation and Ŷ was computed for each
subject. It turned out that one of the predicted values was negative and one was greater
than one. This is an undesirable property for a procedure that is intended to provide
a probability value. But this is not the only problem with the analysis.
Recall that the conventional approach for identifying departures from ANCOVA
assumptions involves inspecting the residuals of the fitted model. The residuals shown
below are from Example 6.1.
[Figure: residuals (RESI1) from the OLS fit plotted against fitted values (FITS1); the residuals run from about −1.0 to 0.5 over fitted values from 0.0 to 1.2.]
πi = β0 + β1X1i + β2X2i + · · · + βmXmi
Odds Ratio
The odds of an event occurring are defined as π/(1 − π). This is simply the population
proportion (or probability) of a “1” response (the event occurred) divided by the
proportion of “0” responses (the event did not occur). For example, if the proportion
of subjects who pass a test is .75, the odds ratio is .75/.25 = 3. The odds ratio for the
occurrence of an event has properties that are very different from those of
the proportion. One problem with proportions is that the substantive implications of
a given difference between proportions that fall near .50 are often very different from
those of the same difference falling near zero or one.
Suppose a study comparing two treatments finds that a difference in the proportion
of patients who survive is (.55 − .45) = .10, and a second study finds a difference
P1: TIX/OSW P2: ABC
JWBS074-c15 JWBS074-Huitema August 21, 2011 7:48 Printer Name: Yet to Come
of (.15 − .05) = .10. Although the treatment effect in each study is an absolute
difference in proportions of .10, the relative improvement in the two studies is quite
different. The treatment in the first study resulted in a relative increase of .10/.45 =
22%, whereas the treatment in the second study resulted in a relative increase
of .10/.05 = 200%.
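The arithmetic in these two hypothetical studies is easy to reproduce; a minimal Python sketch (the variable names are mine, not from the text):

```python
# Absolute vs. relative treatment differences, using the two
# hypothetical survival studies described in the text.
p_treat_1, p_ctrl_1 = 0.55, 0.45   # study 1 proportions surviving
p_treat_2, p_ctrl_2 = 0.15, 0.05   # study 2 proportions surviving

abs_diff_1 = p_treat_1 - p_ctrl_1  # absolute difference: .10
abs_diff_2 = p_treat_2 - p_ctrl_2  # absolute difference: .10

rel_incr_1 = abs_diff_1 / p_ctrl_1  # relative increase: about 22%
rel_incr_2 = abs_diff_2 / p_ctrl_2  # relative increase: 200%
```

The same absolute difference of .10 yields relative increases that differ by almost an order of magnitude, which is the point of the example.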
Whereas the proportion is confined to values zero through one, the odds range
from zero to +∞, and the log of the odds ranges from −∞ to +∞. This important
property has implications for justifying the assumptions of the logistic model, which
models the logit.
Logit
The population logit is defined as loge(π/(1 − π)). It can be seen that this is simply the log
(using base e) of the odds ratio; the less popular term “log-odds” is certainly more
descriptive than logit, but I stick with the more popular term in the remainder of the
chapter. The sample estimate of the logit computed for subject i is denoted as

loge(π̂i/(1 − π̂i)).

The logistic regression model expresses the population logit as a linear function of the predictors:

loge(π/(1 − π)) = β0 + β1X1 + β2X2 + · · · + βmXm.
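As a quick check on the definition, the logit can be computed directly from a proportion; a minimal Python sketch (the function name `logit` is mine, not from the text):

```python
import math

def logit(p):
    """Log-odds (logit) of a proportion p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# The pass-rate example from the text: p = .75 gives odds of 3,
# so the logit is log(3), approximately 1.0986.
example = logit(0.75)
```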
Probability Estimation
After the parameters are estimated there is often interest in computing the probability
of an “event” for certain subjects or groups. Suppose a study uses a dichotomous
variable where the event is experiencing a heart attack. If this event is scored as
1 and not having a heart attack is scored as 0, the probability of being a 1 given
certain scores on the predictors is likely to be of interest. The probability estimates
are obtained by transforming the fitted logits back to the probability scale.
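Under the logistic model, a fitted logit η is converted back to a probability with the inverse-logit (logistic) transformation, π = e^η/(1 + e^η). A minimal Python sketch (the function name is illustrative):

```python
import math

def inv_logit(eta):
    """Convert a logit (log-odds) eta back to a probability."""
    return math.exp(eta) / (1.0 + math.exp(eta))

# A logit of 0 corresponds to a probability of .5; a logit of
# log(3) recovers the .75 pass rate used earlier in the chapter.
p = inv_logit(math.log(3.0))
```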
I now return to the example described at the end of Section 15.1. Dichotomous
ANCOVA can be computed on 0–1 outcome data via logistic regression; the ap-
proach is very similar to the method used to compute conventional ANCOVA through
OLS regression. The software and the details are different but the general ideas are
the same.
The first step is to regress the dichotomous dependent variable on the dummy
variables and the covariate using logistic regression. Second, regress the dichotomous
dependent variable on only the covariate. The output from both steps is shown below
for the three-group example described in Section 15.1.
Log-Likelihood = −13.966
Test that all slopes are zero: G = 12.449, DF = 3, P-Value
= 0.006
Log-Likelihood = −16.993
Test that all slopes are zero: G = 6.394, DF = 1, P-Value
= 0.011
Note that, as in OLS regression output, there are column headings for predictors,
coefficients, and standard errors for the predictors. Unlike OLS output, there is
neither a column of t-values nor an ANOVAR summary table with the F-test for
the multiple regression. Instead we find z, p, the odds ratio, the 95% confidence
interval on the odds ratio, the log-likelihood, and a G-statistic along with the related
degrees of freedom and p-value. The z- and p-values are direct analogs to the t- and
p-values in OLS, and the G-statistic is the analog to the ANOVAR F. The G-value can
be interpreted as a chi-square statistic. The log-likelihood is related to the notion of
residual variation but it will not be pursued in this brief introduction. Additional detail
on logistic regression is available in the excellent work of Hosmer and Lemeshow
(2000).
A very convenient property is associated with the two G-values shown above.
Denote the first one as G(D1,D2,X); it is associated with three predictors (D1, D2, and
the covariate X) in this example. Denote the second one as G(X); it is associated with
only one predictor (the covariate X).
The difference (G(D1,D2,X) − G(X)) = χ²AT. This chi-square statistic is used to test
for adjusted treatment effects. The null hypothesis can be written as H0: π1 adj =
π2 adj = · · · = πJ adj, where the πj adj are the adjusted population probabilities of a “1”
response. This hypothesis is the analog to the hypothesis tested using conventional
ANCOVA, and the chi-square statistic is the analog to the conventional ANCOVA
F-test. The G-values and the associated degrees of freedom described in the output
shown above are summarized in Table 15.1.
The p-value associated with an obtained chi-square of 6.100 with two degrees
of freedom is .047. This implies that at least one of the three adjusted group
probabilities of academic success differs from the others.
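The difference-in-G test can be verified in a few lines of Python (a sketch, not the author's software; the G-values are those reported in Table 15.1, and for 2 degrees of freedom the chi-square survival function has the exact closed form e^(−x/2)):

```python
import math

# G-values as reported in Table 15.1 (full model vs. covariate-only model).
G_full = 12.449      # predictors: D1, D2, X (3 df)
G_reduced = 6.349    # predictor: X only (1 df)

chi2_AT = G_full - G_reduced   # test statistic on 3 - 1 = 2 df

# For 2 degrees of freedom the chi-square survival function is
# exactly P(X > x) = exp(-x/2), so no table lookup is needed.
p_value = math.exp(-chi2_AT / 2.0)   # about .047
```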
The approach shown in this example generalizes to any number of treatment
groups. If there are two treatments, the first regression includes one dummy variable
and one covariate; the second regression includes only the covariate. Similarly, if
Table 15.1 Omnibus Test for Adjusted Treatment Effects in an Experiment with Three
Treatment Groups, One Covariate, and a Dichotomous Outcome Variable

Predictors in the Model    G-Statistic                    Degrees of Freedom
D1, D2, X                  G(D1,D2,X) = 12.449            3
X                          G(X) = 6.349                   1
                           Difference = 6.100 = χ²AT      Difference = 2
there are four groups, the first regression includes three dummy variables and the
covariate; the second regression includes only the covariate.
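The dummy-coding scheme implied here (k groups require k − 1 dummy variables, with the last group serving as the all-zeros reference) can be sketched as a small helper; the function name is mine, not from the text:

```python
def dummy_codes(group, k):
    """0-1 dummy codes for membership in groups 1..k-1; the last
    group (k) is the reference category and is coded all zeros."""
    return [1 if group == j else 0 for j in range(1, k)]
```

For the three-group example, `dummy_codes(1, 3)` gives `[1, 0]` and `dummy_codes(3, 3)` gives `[0, 0]`; with four groups each subject gets three codes, matching the regressions described above.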
Minitab will compute these adjusted probabilities. When entering menu commands
for logistic regression, click on “Predictions” and enter the appropriate dummy vari-
able scores and grand covariate means in the window below “Predicted event prob-
abilities for new observations.” The corresponding command line editor commands
used to compute the adjusted mean probability for the first treatment group in the
example study are listed below.
The complete logistic regression analysis output (not shown here) appears; the last
portion of the results contains the adjusted mean probability for group 1 (labeled as a
predicted event probability). It can be seen below that π̂1 adj = .3067. The last line of
output confirms that the values entered for dummy variables and the grand covariate
mean are 1, 0, and 49.333.
Output
Predicted Event Probabilities for New Observations
New Obs Prob SE Prob 95% CI
1 0.306740 0.170453 (0.0842125, 0.680404)
The same approach is used to compute the adjusted probabilities for groups 2 and
3. They are: π̂2 adj = .8917 and π̂3 adj = .6642. The chi-square statistic is the omnibus
test for differences among these three values; they are the essential descriptive results
that will be reported. An alternative is to convert the probabilities to the corresponding
odds ratios (i.e., .44, 8.23, and 1.98).
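The conversion from adjusted probabilities to odds is a one-line calculation; a Python sketch (variable names are illustrative):

```python
# Adjusted probabilities of academic success reported for the three groups.
adj_prob = {1: 0.3067, 2: 0.8917, 3: 0.6642}

# Each probability p converts to odds via p / (1 - p).
odds = {g: round(p / (1.0 - p), 2) for g, p in adj_prob.items()}
# Reproduces the values quoted in the text: .44, 8.23, and 1.98.
```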
Table 15.2 Test for Homogeneous Logistic Regressions in an Experiment with Three
Treatment Groups, One Covariate, and a Dichotomous Outcome Variable

Predictors in the Model        G-Statistic               Degrees of Freedom
D1, D2, X, D1X, D2X            G(D,X,DX) = 13.716        5
D1, D2, X                      G(D,X) = 12.449           3
                               Difference = 1.267 = χ²   Difference = 2
The approach described in the case of one covariate generalizes directly to multiple
covariates. The first step involves the logistic regression of the 0–1 outcome variable
on all required dummy variables and all covariates. The second step involves the
regression of the 0–1 outcome variable on all covariates. The corresponding G-
statistics are G (D,X) and G (X) .
The example shown below has three groups and two covariates. The 0–1 outcome
scores are identical to those used in the previous section; the two covariates are the
same as shown in the example of multiple covariance analysis presented in Chapter
10 (Table 10.1). As can be seen in the output shown below, the first regression has
two dummy variables and two covariates. The second regression has two covariates.
Log-Likelihood = −14.177
Test that all slopes are zero: G = 12.027, DF = 2, P-Value
= 0.002
The difference between the two G-statistics is 6.169 and the difference between
the degrees of freedom associated with the G-statistics is 2. The p-value associated
with an obtained chi-square value of 6.169 is .0457. Hence, it is concluded that there
are differences among treatments with respect to the probability of academic success.
The estimated probability of success for each treatment and the associated predictor
scores used to compute it can be seen in the output listed below.
Group 1
Group 2
Predicted Event Probabilities for New Observations
New Obs Prob SE Prob 95% CI
1 0.915468 0.0856622 (0.552986, 0.989563)
Values of Predictors for New Observations
New Obs D1 D2 X1 X2
1 0 1 49.3333 5
Group 3
Predicted Event Probabilities for New Observations
New Obs Prob SE Prob 95% CI
1 0.713984 0.198870 (0.270144, 0.943934)
Values of Predictors for New Observations
New Obs D1 D2 X1 X2
1 0 0 49.3333 5
Multiple comparisons among the mean adjusted event probabilities estimated for
the various treatment groups may be of interest. If so, approximate analogs to
Fisher–Hayter and Tukey–Kramer approaches are shown in Table 15.3. The stan-
dard errors SEi and SE j shown in this table are included in the previously described
Minitab output associated with the option for “Predicted Event Probabilities for New
Observations.” The critical values are based on infinite degrees of freedom.
For example, the FH-type test for treatments 1 and 2 is computed as
follows:
(.915 − .286) / √[((.199094)² + (.0856622)²)/2] = 4.10 = q.
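The FH-type statistic can be reproduced with a short function (a sketch; the name `fh_q` is mine, and the probabilities and standard errors are the ones reported in the Minitab output for the two-covariate analysis):

```python
import math

def fh_q(p_i, p_j, se_i, se_j):
    """FH-type q statistic for comparing two adjusted event
    probabilities, using the average of the squared standard errors."""
    return abs(p_i - p_j) / math.sqrt((se_i**2 + se_j**2) / 2.0)

# Adjusted probabilities and standard errors for groups 1 and 2.
q = fh_q(0.286, 0.915, 0.199094, 0.0856622)   # about 4.10
```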
The example of dichotomous outcome data analyzed in this chapter was obtained by
simply forcing quantitative data into a dichotomy. It is of interest to compare results
of conventional ANCOVA on the original data (previously presented in Chapters 6,
7 and 10) with those using logistic analysis on the transformed data. Whenever
information is thrown away by forcing continuous data into a dichotomy (almost
always a bad practice), there is usually a loss of sensitivity unless there are outliers in
the original data. So larger p-values are expected using dichotomized data. Table 15.4
summarizes results from parallel analyses.
The pattern of outcomes is completely consistent for all analyses regardless of the
descriptive measure. That is, treatment 1 yields the poorest performance and treatment
2 yields the highest performance with respect to means, adjusted means, proportions,
and adjusted proportions. Inferentially, however, it can be seen that the p-values
for conventional ANCOVA and multiple ANCOVA are considerably smaller than
for the logistic counterparts. This confirms common wisdom regarding the effects
of forcing a continuous variable into a dichotomy. Of course, when the dependent
variable is a true dichotomy (e.g., alive versus dead) the methods of this chapter are
recommended.
15.9 SUMMARY