
BST762/STA632 Homework Assignment #3

Due Thursday, October 8, 2015, at the beginning of class


1. For this problem, you will be working with the dataset (smoking.sas7bdat) from the
Vlagtwedde-Vlaardingen Study, which is used as an example in Section 6.5 of your book.
In short, your outcome is FEV1, and we are interested in comparing the mean trend of
FEV1 over time for former (smoke=0) and current smokers (smoke=1). This dataset is an
example for which the temporal spacing is the same for everyone (planned measurements at
0, 3, 6, 9, 12, 15, and 19 years), but subjects did not have to contribute data at each of these
7 time points. In fact, some of the 133 subjects only contributed FEV1 at one time point.
The book fit a model that included the main effects and interaction of time in years and
smoking category. They further found that an interaction does not belong in the model.
Therefore, you will fit the following model for this problem:
$$\mathrm{FEV1}_{ij} = \beta_0 + \beta_1\,\mathrm{smoke}_i + \beta_2\,\mathrm{Time}_{ij} + \epsilon_{ij}$$
a) Just by looking at Figure 6.4 on page 155, do you think that the book's conclusion that
an interaction does not belong in the model is correct? Why or why not? (5
points)
b) Your book fit this model using an unstructured covariance matrix. You are to fit it
with multiple covariance structures, using the full dataset, and then again using the first 35
subjects. Fill in the results for the tables in the Word document. (Do this twice, once for the
full dataset, and once for the reduced dataset. Therefore, you will have a total of 4 tables.)
(15 points)
Below is example code that you can use to obtain the reduced dataset:
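/* Create t, a copy of time for the class statement, before subsetting so both datasets have it */
data smoking; set smoking; t=time; run;
/* Reduced dataset: keep only subjects with id 1-35 */
data sample; set smoking; if id>35 then delete; run;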
When utilizing empirical SE estimates incorporating the Mancl and DeRouen (2001) correction, the following code may be helpful. The t is needed in the random statement because
not everyone contributes an observation at each time point.
proc glimmix data=smoking empirical=FIRORES;
class id t;
model fev1 = smoker time / solution;
random t / subject=id type=un vcorr=1 residual;
run;
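For the model-based rows of the tables, a fit with the Kenward and Roger adjustment might look like the sketch below. It reuses the variable names from the code above (id, t, time, smoker, fev1). The choice of type=cs is only an illustrative assumption; substitute whichever structures the tables in the Word document ask for, and use data=sample for the reduced dataset.

```sas
/* Sketch only: model-based SEs with the Kenward-Roger df adjustment.   */
/* No empirical= option is given, so the SEs are model-based.           */
/* type=cs is an illustrative assumption, not a required structure.     */
proc glimmix data=smoking;
  class id t;
  model fev1 = smoker time / solution ddfm=kr;
  random t / subject=id type=cs vcorr=1 residual;
run;
```

Dropping ddfm=kr from the model statement gives the typical model-based SEs and df for comparison.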

c) Look at your results from the analyses of the full dataset.
i) Are there notable differences between the use of the typical model-based standard errors
(SEs) and df relative to the use of the Kenward and Roger Adjustment? Why or why not?
(5 points)
ii) Are there notable differences between use of the empirical SEs and df relative to the use
of the bias-corrected (using the Mancl and DeRouen correction) SEs and df? Why or why
not? (5 points)
iii) Comparing SE estimates, which structure seems to be correct, or at least reasonably
close to being correct? Hint: Remember when these different SE estimators are and are
not appropriate; i.e., when they are and are not consistent estimators for the true SEs. (5
points)
d) Look at your results from the analyses of the reduced dataset.
i) For which structure are there notable differences between the use of the typical model-based
standard errors (SEs) and df relative to the use of the Kenward and Roger Adjustment? Why
does this occur only for this one structure? Hint: How many nuisance covariance parameters
are you estimating? (5 points)
ii) Are there notable differences between use of the empirical SEs and df relative to the use
of the bias-corrected (using the Mancl and DeRouen correction) SEs and df? Why or why
not? (5 points)

2. Suppose we carry out a general study of 75 subjects, and we are simply interested in
the association between X and Y (see the dataset association.sas7bdat). Fit the following
simple linear regression model in proc reg:
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$
a) Look at the diagnostic plots that proc reg automatically outputs. Do you see any model
violations? If so, what violation(s) do you see, and how can you tell? (5 points)
b) Fit the model again, only using the robust empirical SEs (use the Kauermann & Carroll
correction). Fit the model in proc glimmix. Here is the appropriate code:
proc glimmix data=hw.Association empirical=root;
class id;
model y=x / solution;
random _residual_ / subject=id type=vc;
run;
i) How have the SE estimates changed? (You should report the model-based SE estimates
from a, and the empirical SEs.) (5 points)
ii) Which SE estimates are appropriate to use: the model-based estimates from proc reg or
the empirical estimates? (5 points)
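For the proc reg fit in part a), a minimal sketch is given here. It assumes the same library reference (hw) and variable names (y, x) that appear in the proc glimmix code for part b); adjust them if your dataset is stored elsewhere.

```sas
/* Sketch only: ordinary least squares fit with model-based SEs.        */
/* Assumes library hw and variables y and x, as in the glimmix code.    */
ods graphics on;              /* needed for the diagnostic plot panel   */
proc reg data=hw.Association;
  model y = x;
run;
quit;
```

The parameter estimates table from this fit supplies the model-based SEs to set beside the empirical SEs from proc glimmix.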

3. Suppose we carry out a general study of 100 subjects, and we fit the model below using
the quest3data.sas7bdat dataset. This study is meant to represent a study in which subjects
come in for four equally spaced visits, and we are interested in the association between two
variables ($x_1$ and $x_2$) and an outcome ($Y$). All variables are time-dependent; i.e., their values
are not fixed throughout the study. Such variables could be blood pressure, body weight,
etc. We think that there is no time effect, but we want to first test and make sure there is
no time effect. Therefore, this model has a main effect for time, and interactions between
time and $x_1$ and $x_2$. Suppose we know the true covariance structure has common variances
at each time point and the correlation structure is AR-1. (Note: I generated/simulated this
fake dataset, so I know that the true model from which I generated data has this structure
and has no time effect.)
All tests should be at the 5% significance level. For a)-c), fit the true covariance structure,
and assume that you are confident that it is the true covariance structure. Note that time
is continuous, so do not include it in the class statement in SAS.
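$$Y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3\,\mathrm{time}_j + \beta_4\,\mathrm{time}_j\,x_{1ij} + \beta_5\,\mathrm{time}_j\,x_{2ij} + \epsilon_{ij};$$
$$i = 1, \ldots, 100; \quad j = 1, 2, 3, 4; \quad \mathrm{time}_j = j - 1$$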

a) Carry out a likelihood ratio test for the following (testing to see if time belongs in the
model): (10 points)

$$H_0: \beta_3 = \beta_4 = \beta_5 = 0$$
$$H_A: \beta_3 \neq 0 \text{ and/or } \beta_4 \neq 0 \text{ and/or } \beta_5 \neq 0$$
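One way to set up this likelihood ratio test is sketched below. Because the two models differ in their fixed effects, both are fit by maximum likelihood (method=ml) rather than REML. The dataset and variable names (quest3data, id, y, x1, x2, time) are assumptions, as is the requirement that rows are sorted by visit within subject with no missing visits, which the repeated statement needs when no repeated effect is named.

```sas
/* Sketch only. Full model: time main effect plus both interactions. */
proc mixed data=quest3data method=ml;
  class id;
  model y = x1 x2 time time*x1 time*x2 / solution;
  repeated / subject=id type=ar(1);
run;

/* Reduced model under H0: all time terms removed. */
proc mixed data=quest3data method=ml;
  class id;
  model y = x1 x2 / solution;
  repeated / subject=id type=ar(1);
run;

/* Copy the two "-2 Log Likelihood" values from the Fit Statistics      */
/* tables into the placeholders below (left missing on purpose).        */
data lrt;
  neg2ll_reduced = .;
  neg2ll_full    = .;
  g2 = neg2ll_reduced - neg2ll_full;   /* LRT statistic                 */
  p  = 1 - probchi(g2, 3);             /* 3 df: three betas set to zero */
run;

proc print data=lrt;
run;
```

The statistic is compared with a chi-square distribution on 3 degrees of freedom because the null hypothesis constrains three parameters.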

b) Take time completely out of the model, and refit as below.


$$Y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \epsilon_{ij};$$
Use model-based standard error estimates and the Kenward and Roger (1997) adjustment.
i) Use Wald tests (you will do 2 separate tests) to test whether or not each of these two
covariates is associated with the outcome. Specifically, test the following for $j = 1, 2$:
$$H_0: \beta_j = 0$$
$$H_A: \beta_j \neq 0$$
For each test, give the value of the test statistic, state what distribution the statistic
approximately follows under $H_0$, give the p-value, and state your conclusion. (10 points)
ii) What are the 95% confidence intervals (CIs) for $\beta_1$ and $\beta_2$? Show your work. Hint: The
critical values for the above two tests (if based on a t-distribution) are approximately 1.97
(exact critical values can be obtained using the following R code: qt(.975,df)). (5 points)
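The reduced model with the Kenward and Roger adjustment could be fit along the lines of the sketch below, which also shows a SAS counterpart to the R call in the hint. The dataset and variable names (quest3data, id, y, x1, x2) are assumptions, and the df value in the data step is a hypothetical placeholder to be replaced by the denominator df reported in your own output.

```sas
/* Sketch only: REML fit (the proc mixed default) with model-based SEs, */
/* Kenward-Roger df, and 95% confidence limits from the cl option.      */
proc mixed data=quest3data;
  class id;
  model y = x1 x2 / solution ddfm=kr cl alpha=0.05;
  repeated / subject=id type=ar(1);
run;

/* Critical value for a hand-computed CI: estimate +/- tcrit * SE.      */
data crit;
  df    = 200;                /* hypothetical placeholder, not a result */
  tcrit = tinv(0.975, df);    /* SAS equivalent of qt(.975, df) in R    */
run;

proc print data=crit;
run;
```

The cl option is a convenient check on the intervals you compute by hand when showing your work.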

c) You will now compare SE estimates. For model-based SE estimates, use the Kenward
and Roger (1997) adjustment. For empirical SE estimates, use the Kauermann and Carroll
(2001) bias-correction.
i) What are the model-based SE estimates for $\beta_1$ and $\beta_2$? What are the empirical SE
estimates for $\beta_1$ and $\beta_2$? Are the empirical and model-based SE estimates similar? Why or
why not? (5 points)
ii) Now fit the model using a covariance structure that assumes common variances and a CS
correlation. What are the model-based SE estimates for $\beta_1$ and $\beta_2$? What are the empirical
SE estimates for $\beta_1$ and $\beta_2$? Are the empirical and model-based SE estimates similar? Why
or why not? (5 points)
iii) Which working covariance structure resulted in the smaller empirical SE estimates, and
why? (5 points)
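The empirical fits for part c) can follow the pattern of the proc glimmix code in Problem 2, with the residual covariance structure changed. The sketch below assumes the dataset and variable names quest3data, id, y, x1, and x2, and rows sorted by visit within subject.

```sas
/* Sketch only: empirical SEs with the Kauermann and Carroll correction */
/* (empirical=root) under a compound symmetry working structure.        */
proc glimmix data=quest3data empirical=root;
  class id;
  model y = x1 x2 / solution;
  random _residual_ / subject=id type=cs;   /* use type=ar(1) for part i) */
run;
```

Running it once with type=ar(1) and once with type=cs gives the two sets of empirical SEs that part iii) asks you to compare.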
