Stat Comput (2016) 26:443–460

DOI 10.1007/s11222-014-9518-5

Bayesian longitudinal item response modeling with restricted covariance pattern structures

Caio L. N. Azevedo · Jean-Paul Fox ·
Dalton F. Andrade

Received: 6 August 2012 / Accepted: 25 September 2014 / Published online: 25 October 2014
© Springer Science+Business Media New York 2014

Abstract Educational studies are often focused on growth in student performance and background variables that can explain developmental differences across examinees. To study educational progress, a flexible latent variable model is required to model individual differences in growth given longitudinal item response data, while accounting for time-heterogenous dependencies between measurements of student performance. Therefore, an item response theory model, to measure time-specific latent traits, is extended to model growth using the latent variable technology. Following Muthén (Learn Individ Differ 10:73–101, 1998) and Azevedo et al. (Comput Stat Data Anal 56:4399–4412, 2012b), among others, the mean structure of the model represents developmental change in student achievement. Restricted covariance pattern models are proposed to model the variance–covariance structure of the student achievements. The main advantage of the extension is its ability to describe and explain the type of time-heterogenous dependency between student achievements. An efficient MCMC algorithm is given that can handle identification rules and restricted parametric covariance structures. A reparameterization technique is used, where unrestricted model parameters are sampled and transformed to obtain MCMC samples under the implied restrictions. The study is motivated by a large-scale longitudinal research program of the Brazilian Federal government to improve the teaching quality and general structure of schools for primary education. It is shown that the growth in math achievement can be accurately measured when accounting for complex dependencies over grades using time-heterogenous covariance structures.

Keywords Longitudinal item response theory · Covariance patterns · Bayesian inference · MCMC

C. L. N. Azevedo (corresponding author)
Department of Statistics, University of Campinas, Campinas, São Paulo, Brazil
e-mail: cnaber@ime.unicamp.br

J.-P. Fox
Department of Research Methodology, University of Twente, Enschede, The Netherlands

D. F. Andrade
Department of Informatics and Statistics, Federal University of Santa Catarina, Florianópolis, Brazil

1 Introduction

Longitudinal item response data occur when students are assessed at several time points (Singer and Andrade 2000). This kind of data consists of response patterns of different examinees responding to different tests at different measurement occasions (e.g. grades). This leads to a complex dependence structure that arises from the fact that measurements from the same student are typically correlated (Tavares and Andrade 2006).

Various longitudinal item response theory (LIRT) models have been proposed to handle the correlation between measurements made over time. The popular mixed-effects regression modeling approach is often considered, where random effects are used to model the between-student and within-student dependencies. Conoway (1990) proposed a Rasch LIRT model to analyze panel data and proposed a marginal maximum likelihood method (Bock and Aitkin 1981) for parameter estimation. Liu and Hedeker (2006) developed a comparable three-level model to analyze LIRT data for ordinal response data. Eid (1996) defined a LIRT model for polytomous response data. Douglas (1999) analyzed longitudinal response data from a quality of life instrument using a joint
model, which consisted of a proportional odds model and the graded item response model.

In these mixed effects models, the assumption of conditional independence is achieved by the random effects. That is, the assumption is made that the time-variant measurements are conditionally independent given the student's latent trait. The random effects imply a compound symmetry covariance structure, which assumes equal variances and covariances over time. In practice, the within-student latent trait dependencies are often not completely modeled and the errors are still correlated over time. Furthermore, in regression using repeated measurements it is common for the errors to show a time series structure (e.g., an autoregressive dependence) (Fitzmaurice et al. 2008; Hedeker and Gibbons 2006). If the dependence structure of the errors is not correctly specified, the parameter estimates and their standard errors will be biased.

Therefore, following the work of Jennrich and Schluchter (1986), Muthén (1998) and Tavares and Andrade (2006), restricted covariance pattern models are considered to model the time series structure of the errors. That is, the errors are allowed to correlate over time, and different variance-covariance structures are proposed to capture time-specific between-student variability and time-heterogenous longitudinal dependencies between latent traits. An important aspect is that the covariance matrices considered allow for time-heterogenous variances and covariances. The covariance pattern modeling framework is integrated in the LIRT modeling approach. At the student level, the time-specific latent traits are assumed to be multivariate normally distributed, and the within-student correlation structure is modeled using a covariance pattern model. This makes it possible to model the specific type of time-invariant and time-variant dependencies.

This modeling framework builds on the work of Tavares and Andrade (2006), who proposed a logistic three-parameter IRT model with a multivariate normal population distribution for the latent traits. They used a covariance matrix to model the within-examinee dependency, where the variances are allowed to vary over time but the covariance structure is assumed to be time-homoscedastic. The modeling framework also relates to the generalized linear latent trait model for longitudinal data of Dunson (2003), where the latent variable covariance structure is modeled via a linear transition model using observed predictors and an autoregressive component.

A full Gibbs sampling (FGS) algorithm is developed, which avoids the use of MCMC methods that require adaptive implementations, like the Metropolis–Hastings algorithm, to regulate the convergence of the algorithm. Furthermore, Sahu (2002) and Azevedo et al. (2012a) have shown that an FGS algorithm tends to perform better, in terms of parameter recovery, than a Metropolis–Hastings-within-Gibbs sampling algorithm when dealing with IRT models. The proposed MCMC algorithm recovers all parameters properly and accommodates a wide range of variance–covariance structures. Using a parameter transformation method, MCMC samples of restricted parameters are obtained by transforming MCMC samples of unrestricted parameters. The proposed modeling framework is extended with various Bayesian model-fit assessment tools. Among other things, a Bayesian p-value is defined based on a suitable discrepancy measure for a global model-fit assessment and it is shown how Bayesian latent residuals can be used to evaluate the normality assumptions (Albert and Chib 1995).

This paper is outlined as follows. After introducing the Bayesian LIRT model, the FGS method is given, which can handle different variance–covariance structures. Then, the accuracy of the MCMC estimation method as well as the prior sensitivity are assessed. Subsequently, a real data study is presented, where the data set comes from a large-scale longitudinal study of children from the fourth to the eighth grade of different Brazilian public schools. One of the objectives of the study is to analyze the student achievements across different grade levels. The model assessment tools are used to evaluate the fit of the model. In the last section, the results and some model extensions are discussed.

2 The model

A longitudinal test design is considered, where tests are administered to different examinees at different points in time. For each measurement occasion at time point $t$, $t = 1, \ldots, T$, $n_t$ examinees complete a test consisting of $I_t$ items. The design can be typed as an incomplete block design such that common items are defined across tests and the total number of items equals $I \le \sum_{t=1}^{T} I_t$. For a complete design, $n_t = n$, for all $t$. Dropouts and inclusion of students during the study are allowed.

The following notation will be introduced. Let $\theta_{jt}$ represent the latent trait of examinee $j$ ($j = 1, \ldots, n$) at time-point or measurement occasion $t$ ($t = 1, \ldots, T$), $\boldsymbol{\theta}_{j.} = (\theta_{j1}, \ldots, \theta_{jT})^{t}$ the vector of the latent traits of examinee $j$, and $\boldsymbol{\theta}_{..} = (\boldsymbol{\theta}_{1.}, \ldots, \boldsymbol{\theta}_{n.})^{t}$ the vector of all latent traits. Let $Y_{ijt}$ represent the response of examinee $j$ to item $i$ ($i = 1, \ldots, I$) in time-point $t$, $\boldsymbol{Y}_{.jt} = (Y_{1jt}, \ldots, Y_{I_t jt})^{t}$ the response vector of examinee $j$ in time-point $t$, $\boldsymbol{Y}_{..t} = (\boldsymbol{Y}^{t}_{.1t}, \ldots, \boldsymbol{Y}^{t}_{.n_t t})^{t}$ the response vector of all examinees in time-point $t$, and $\boldsymbol{Y}_{...} = (\boldsymbol{Y}^{t}_{..1}, \ldots, \boldsymbol{Y}^{t}_{..T})^{t}$ the entire response set. Let $\boldsymbol{\zeta}_i$ denote the vector of parameters of item $i$, $\boldsymbol{\zeta} = (\boldsymbol{\zeta}^{t}_1, \ldots, \boldsymbol{\zeta}^{t}_I)^{t}$ the whole set of item parameters, and $\boldsymbol{\eta}_\theta$ the vector with population parameters (related to the latent traits distribution).

A LIRT model is proposed that consists of two stages. At the first stage, a time-specific two-parameter IRT model is
considered for the measurement of the time-specific latent traits given observed dichotomous response data. The item-specific response probabilities are assumed to be independently distributed given the item and time-specific latent trait parameters. At the second stage, the subject-specific latent traits are assumed to be multivariate normally distributed with a time-heterogenous covariance structure, that is:

$$
Y_{ijt} \mid (\theta_{jt}, \boldsymbol{\zeta}_i) \sim \text{Bernoulli}(P_{ijt}),
$$
$$
P_{ijt} = P(Y_{ijt} = 1 \mid \theta_{jt}, \boldsymbol{\zeta}_i) = \Phi(a_i \theta_{jt} - b_i), \tag{1}
$$
$$
\boldsymbol{\theta}_{j.} \mid \boldsymbol{\eta}_\theta \sim N_T(\boldsymbol{\mu}_\theta, \boldsymbol{\Psi}_\theta), \tag{2}
$$

where $\boldsymbol{\eta}_\theta$ consists of $\boldsymbol{\mu}_\theta$ and $\boldsymbol{\Psi}_\theta$, and $\Phi(\cdot)$ stands for the cumulative normal distribution function. In this parametrization, the difficulty parameter $b_i = a_i b_i^{*}$ is a transformation of the original difficulty parameter denoted by $b_i^{*}$.

The within-subject dependencies among the time-specific latent traits are modeled using a $T$-dimensional normal distribution, denoted as $N_T(\boldsymbol{\mu}_\theta, \boldsymbol{\Psi}_\theta)$, with mean vector $\boldsymbol{\mu}_\theta$ and unstructured covariance matrix $\boldsymbol{\Psi}_\theta$, where

$$
\boldsymbol{\mu}_\theta = \begin{bmatrix} \mu_{\theta_1} \\ \mu_{\theta_2} \\ \vdots \\ \mu_{\theta_T} \end{bmatrix}
\quad \text{and} \quad
\boldsymbol{\Psi}_\theta = \begin{bmatrix}
\psi_{\theta_1} & \psi_{\theta_{12}} & \cdots & \psi_{\theta_{1T}} \\
\psi_{\theta_{12}} & \psi_{\theta_2} & \cdots & \psi_{\theta_{2T}} \\
\vdots & \vdots & \ddots & \vdots \\
\psi_{\theta_{1T}} & \psi_{\theta_{2T}} & \cdots & \psi_{\theta_T}
\end{bmatrix}, \tag{3}
$$

respectively. A total of $T(T+1)/2$ parameters need to be estimated for the unstructured covariance model.

2.1 Restricted covariance pattern structures

In the LIRT model, the mean component of the multivariate population distribution for the latent traits can be extended to allow latent growth curves and include explanatory information. Besides modeling the mean structure, it is also possible to model the correlation structure between latent traits. Therefore, different covariance patterns are considered, which are restricted versions of the unrestricted covariance matrix (Eq. 3). Each restricted covariance pattern can address specific dependencies between the latent traits.

Several arguments can be given to explicitly model the covariance structure of the errors. First, the unstructured covariance model for the latent variables measured at different occasions will allow one parameter for every unique covariance term. There are no assumptions made about the nature of the residual correlation between the latent traits over time. However, for unbalanced data designs, small sample sizes with respect to the number of subjects and items, and many measurement occasions, the unstructured covariance model may lead to unstable covariance parameter estimates with large posterior variances (Hedeker and Gibbons 2006; Jennrich and Schluchter 1986). In practical test situations, the test length differs over occasions and students, and the measurement error associated with the traits differs over students and measurement occasions (e.g., NELS, the national education longitudinal study; student monitoring systems for pupils in the Netherlands). As noted by Muthén (1998), the available longitudinal test data are of complex multivariate form, often involving different test forms, attrition, and students sampled hierarchically within schools.

Second, a fitted covariance pattern can provide insight into the residual correlation between latent measurements over time. When a covariance pattern is identified, information about the underlying growth process will be revealed. That is, on top of the change in latent traits modeled in the mean structure, the fitted covariance pattern can further explain the growth in latent traits. For example, when covariances of latent traits change over time, a fitted covariance pattern can be used to identify and describe the type of change.

Third, by correctly modeling the subject-specific correlated residuals across measurement occasions, more accurate statistical inferences can be made from the mean structure. Here, time-heteroscedastic covariance structures are considered to model complex patterns of residuals over time, where population variances of latent measurements can differ over time-points. A restricted covariance pattern model can lead to a decrease in model fit compared to the unstructured covariance pattern model, when the data do not fully support the restriction. When the restricted covariance model is supported by the data, it can lead to an improved fit compared to the unstructured model. Furthermore, due to the decrease in the number of model parameters, model selection criteria might prefer the restricted covariance model over the unstructured covariance model. Muthén (1998) and Jennrich and Schluchter (1986), among others, already noted that the efficiency of the mean structure parameters can be improved by modeling the covariance structure parsimoniously. For small sample sizes and unbalanced data, the parameter estimates are most likely to be improved.

Muthén (1998) explained in more detail that the growth model consists of two components, a mean and a covariance structure component. Both components need to be modeled properly to describe the growth, and data interpretations depend on the specification of each component. Hedeker and Gibbons (2006) and Fitzmaurice et al. (2008) have shown that the analysis of longitudinal multivariate response data requires more complex covariance structures to capture the often complex dependency structures. Here, different covariance pattern models will be considered. For all cases, the sampling design is allowed to be unbalanced, where subjects can vary in the number of measurement occasions and response observations per measurement. The measurement times can vary over subjects and are not restricted to be equally spaced over subjects.
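The two-stage model of Eqs. (1)–(2) can be illustrated with a small data-generating sketch. This is only an illustration under stated assumptions, not the authors' code: the function names `std_normal_cdf` and `simulate_lirt`, the numerical values of the mean vector, covariance matrix and item parameters, the linked test design, and the use of `-1` to mark items that were not administered are all hypothetical choices made for this example.

```python
import numpy as np
from math import erf, sqrt


def std_normal_cdf(x):
    """Standard normal CDF Phi(x), applied elementwise."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))


def simulate_lirt(n, mu_theta, Psi_theta, a, b, items_per_occasion, seed=0):
    """Draw latent traits and dichotomous responses from Eqs. (1)-(2).

    Returns theta with shape (n, T) and Y with shape (I, n, T). Entries of Y
    equal to -1 mark items not administered at that occasion (a convention
    chosen for this sketch only).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    T = len(mu_theta)
    # Stage 2, Eq. (2): theta_j. ~ N_T(mu_theta, Psi_theta)
    theta = rng.multivariate_normal(mu_theta, Psi_theta, size=n)
    Y = np.full((len(a), n, T), -1, dtype=int)
    for t in range(T):
        items = np.asarray(items_per_occasion[t])
        # Stage 1, Eq. (1): P_ijt = Phi(a_i * theta_jt - b_i)
        P = std_normal_cdf(np.outer(a[items], theta[:, t]) - b[items][:, None])
        Y_t = Y[:, :, t]                      # view on occasion t
        Y_t[items, :] = rng.binomial(1, P)    # Bernoulli draws
    return theta, Y


# Hypothetical setting: T = 3 occasions, first mean 0 and first variance 1.
mu = np.array([0.0, 0.5, 1.0])
Psi = np.array([[1.0, 0.6, 0.4],
                [0.6, 1.2, 0.7],
                [0.4, 0.7, 1.5]])
a = np.ones(8)
b = np.linspace(-1.0, 1.0, 8)
# Incomplete block design: consecutive tests share two common (anchor) items.
design = [range(0, 4), range(2, 6), range(4, 8)]
theta, Y = simulate_lirt(500, mu, Psi, a, b, design)
```

In this sketch the shared items between consecutive occasions play the role of the common items mentioned in Sect. 2, and the unstructured matrix `Psi` could be swapped for any of the restricted patterns.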


2.1.1 First-order heteroscedastic autoregressive model: ARH

This structure assumes that the correlations between a subject's latent traits decrease when the distances between instants of evaluation increase. However, the magnitude of the correlations depends only on the distance between the time-points, not on their values. In addition, the variances are not assumed to follow any specific pattern. The form of the covariance matrix is given by,

$$
\boldsymbol{\Psi}_\theta = \begin{bmatrix}
\psi_{\theta_1} & \sqrt{\psi_{\theta_1}\psi_{\theta_2}}\,\rho_\theta & \cdots & \sqrt{\psi_{\theta_1}\psi_{\theta_T}}\,\rho_\theta^{T-1} \\
\sqrt{\psi_{\theta_1}\psi_{\theta_2}}\,\rho_\theta & \psi_{\theta_2} & \cdots & \sqrt{\psi_{\theta_2}\psi_{\theta_T}}\,\rho_\theta^{T-2} \\
\vdots & \vdots & \ddots & \vdots \\
\sqrt{\psi_{\theta_1}\psi_{\theta_T}}\,\rho_\theta^{T-1} & \sqrt{\psi_{\theta_2}\psi_{\theta_T}}\,\rho_\theta^{T-2} & \cdots & \psi_{\theta_T}
\end{bmatrix}, \tag{4}
$$

where $\psi_{\theta_t} \in (0, \infty)$, for $t = 1, \ldots, T$, and $\rho_\theta \in (-1, 1)$. Note that the number of parameters for the ARH(1) is $T + 1$, which is much lower than the number of parameters for the unstructured covariance matrix. For more details, see Singer and Andrade (2000), Tavares and Andrade (2006), Andrade and Tavares (2005) and Fitzmaurice et al. (2008).

2.1.2 Heteroscedastic uniform model: HU

This is a special case of the ARH, which also assumes time-heterogenous variances over measurement occasions but time-homogenous correlations over time. So, the HU model assumes equal correlations between all pairs of time-specific latent trait measurements, independently of the distance between them. The heteroscedastic uniform covariance matrix is given by,

$$
\boldsymbol{\Psi}_\theta = \begin{bmatrix}
\psi_{\theta_1} & \sqrt{\psi_{\theta_1}\psi_{\theta_2}}\,\rho_\theta & \cdots & \sqrt{\psi_{\theta_1}\psi_{\theta_T}}\,\rho_\theta \\
\sqrt{\psi_{\theta_1}\psi_{\theta_2}}\,\rho_\theta & \psi_{\theta_2} & \cdots & \sqrt{\psi_{\theta_2}\psi_{\theta_T}}\,\rho_\theta \\
\vdots & \vdots & \ddots & \vdots \\
\sqrt{\psi_{\theta_1}\psi_{\theta_T}}\,\rho_\theta & \sqrt{\psi_{\theta_2}\psi_{\theta_T}}\,\rho_\theta & \cdots & \psi_{\theta_T}
\end{bmatrix}, \tag{5}
$$

where again $\psi_{\theta_t} \in (0, \infty)$, for $t = 1, \ldots, T$, and $\rho_\theta \in (-1, 1)$. This covariance structure reduces the number of covariance parameters to one, while allowing $T$ variance parameters, which equals the total number of parameters used for the ARH. See Singer and Andrade (2000), Tavares and Andrade (2006), Andrade and Tavares (2005) and Fitzmaurice et al. (2008) for more details.

2.1.3 Heteroscedastic Toeplitz model: HT

As a special case of covariance pattern HU, the heteroscedastic Toeplitz model assumes a zero covariance between a subject's latent traits of two nonconsecutive instants. This might be suitable when correlations decay quickly due to relatively large time-spaces between non-consecutive measurement occasions. This covariance pattern is represented by,

$$
\boldsymbol{\Psi}_\theta = \begin{bmatrix}
\psi_{\theta_1} & \sqrt{\psi_{\theta_1}\psi_{\theta_2}}\,\rho_\theta & 0 & \cdots & 0 \\
\sqrt{\psi_{\theta_1}\psi_{\theta_2}}\,\rho_\theta & \psi_{\theta_2} & \sqrt{\psi_{\theta_2}\psi_{\theta_3}}\,\rho_\theta & \cdots & 0 \\
0 & \sqrt{\psi_{\theta_2}\psi_{\theta_3}}\,\rho_\theta & \psi_{\theta_3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \psi_{\theta_T}
\end{bmatrix}. \tag{6}
$$

In this case, $\psi_{\theta_t} \in (0, \infty)$ for $t = 1, \ldots, T$, but $\rho_\theta \in (-k, k)$, where $k$ depends on the value of $T$. For instance, $k \approx 1/\sqrt{2}$ for $T = 3$ and $k = 1/2$ for large $T$. For more details see Singer and Andrade (2000), Andrade and Tavares (2005), and Tavares and Andrade (2006).

2.1.4 Heteroscedastic covariance model: HC

The heteroscedastic covariance (HC) model is a restricted version of the heteroscedastic uniform model, where in the HC model, a common covariance is assumed across time points. As in the other covariance models, time-heterogenous variances are assumed. The HC model also corresponds to an unstructured covariance matrix with equal covariances. Note that the time-heterogenous variances define time-heterogenous correlations, while assuming a common covariance term across time. Subsequently, relatively high time-specific latent trait variances will specify a low correlation between them. The HC covariance structure is represented by,

$$
\boldsymbol{\Psi}_\theta = \begin{bmatrix}
\psi_{\theta_1} & \rho_\theta & \cdots & \rho_\theta \\
\rho_\theta & \psi_{\theta_2} & \cdots & \rho_\theta \\
\vdots & \vdots & \ddots & \vdots \\
\rho_\theta & \rho_\theta & \cdots & \psi_{\theta_T}
\end{bmatrix}, \tag{7}
$$

where $\psi_{\theta_t} \in (0, \infty)$ for $t = 1, \ldots, T$, and $|\rho_\theta| < \min_{i,j} \sqrt{\psi_{\theta_i}\psi_{\theta_j}}$. For more details, see Andrade and Tavares (2005) and Tavares and Andrade (2006).

2.1.5 First-order autoregressive moving-average model: ARMAH

As in the first-order autoregressive ARH structure, correlations between a subject's latent traits decrease as the distances between the instants of evaluation increase. However, the decrease is further parameterized due to the additional covariance parameter $\gamma_\theta$. This covariance matrix, denoted as ARMAH, generalizes the ARH model, since it supports a
more flexible modeling of the time-specific correlations. The ARMAH covariance matrix is represented by,

$$
\boldsymbol{\Psi}_\theta = \begin{bmatrix}
\psi_{\theta_1} & \sqrt{\psi_{\theta_1}\psi_{\theta_2}}\,\gamma_\theta & \cdots & \sqrt{\psi_{\theta_1}\psi_{\theta_T}}\,\gamma_\theta \rho_\theta^{T-2} \\
\sqrt{\psi_{\theta_1}\psi_{\theta_2}}\,\gamma_\theta & \psi_{\theta_2} & \cdots & \sqrt{\psi_{\theta_2}\psi_{\theta_T}}\,\gamma_\theta \rho_\theta^{T-3} \\
\vdots & \vdots & \ddots & \vdots \\
\sqrt{\psi_{\theta_1}\psi_{\theta_T}}\,\gamma_\theta \rho_\theta^{T-2} & \sqrt{\psi_{\theta_2}\psi_{\theta_T}}\,\gamma_\theta \rho_\theta^{T-3} & \cdots & \psi_{\theta_T}
\end{bmatrix}. \tag{8}
$$

In this case, $\psi_{\theta_t} \in (0, \infty)$ for $t = 1, \ldots, T$, and $(\gamma_\theta, \rho_\theta) \in (-1, 1)^2$. For more details, see Singer and Andrade (2000) and Rochon (1992).

2.1.6 Ante-dependence model: AD

The last covariance structure model that will be considered is specifically useful when time points are not equally spaced and/or there is an additional source of variability present. This (first-order) AD model is a general covariance model that allows for changes in the correlation structure over time and unequally spaced measurement occasions.

The AD model generalizes the ARH using time-specific covariance parameters. It also generalizes the ARMAH model, since the covariance structure of the ARMAH is defined with $\rho_{\theta_1} = \gamma_\theta$ and $\rho_{\theta_2}, \ldots, \rho_{\theta_T} = \rho_\theta$. The AD model supports a more dynamic modeling of the covariance pattern compared to ARH and ARMAH. The AD covariance model is represented by,

$$
\boldsymbol{\Psi}_\theta = \begin{bmatrix}
\psi_{\theta_1} & \sqrt{\psi_{\theta_1}\psi_{\theta_2}}\,\rho_{\theta_1} & \cdots & \sqrt{\psi_{\theta_1}\psi_{\theta_T}}\,\prod_{t=1}^{T-1}\rho_{\theta_t} \\
\sqrt{\psi_{\theta_1}\psi_{\theta_2}}\,\rho_{\theta_1} & \psi_{\theta_2} & \cdots & \sqrt{\psi_{\theta_2}\psi_{\theta_T}}\,\prod_{t=2}^{T-1}\rho_{\theta_t} \\
\vdots & \vdots & \ddots & \vdots \\
\sqrt{\psi_{\theta_1}\psi_{\theta_T}}\,\prod_{t=1}^{T-1}\rho_{\theta_t} & \sqrt{\psi_{\theta_2}\psi_{\theta_T}}\,\prod_{t=2}^{T-1}\rho_{\theta_t} & \cdots & \psi_{\theta_T}
\end{bmatrix}, \tag{9}
$$

where $\psi_{\theta_t} \in (0, \infty)$ and $\rho_{\theta_t} \in (-1, 1)$, for $t = 1, 2, \ldots, T - 1$. In conclusion, the AD model permits variances and correlations to change over time, and uses $2T - 1$ parameters. For more details, see Singer and Andrade (2000) and Nunez-Anton and Zimmerman (2000).

2.2 A restricted unstructured covariance structure

The latent variable framework will require a reference time-point to identify the latent scale. To accomplish that, the latent mean and variance of the first time-point will be fixed to zero and one, respectively. This way, the latent trait estimates across time will be estimated on one common scale, since an incomplete test design is used such that common items are administered at different measurement occasions (time-points). The common items, also known as anchors, make it possible to measure the latent traits on one common scale.

However, the estimation of the covariance parameters is complicated due to the identifying constraints. Note that even for the unstructured covariance matrix, a restriction is implied on a variance parameter, which leads to a restricted unstructured covariance matrix. Furthermore, the restrictions on the parameters of the latent trait distribution also complicate the specification of priors. In this case, assuming an inverse-Wishart distribution for the unstructured covariance matrix is not possible, when the variance parameter of the first time-point is restricted to one.

In the present latent variable framework, a novel prior modeling approach will be followed to account for the restricted covariance structure. Following McCulloh et al. (2000), a parametrization of the latent trait's covariance structure is considered. Therefore, the following partition of the latent traits structure is defined,

$$
\boldsymbol{\theta}_{j.} = (\theta_{j1}, \theta_{j2}, \ldots, \theta_{jT})^{t} = (\theta_{j1}, \boldsymbol{\theta}_{j(1)})^{t},
$$
$$
\boldsymbol{\mu}_\theta = (\mu_{\theta_1}, \mu_{\theta_2}, \ldots, \mu_{\theta_T})^{t} = (\mu_{\theta_1}, \boldsymbol{\mu}_{\theta(1)})^{t},
$$

where $\boldsymbol{\theta}_{j(1)} = (\theta_{j2}, \ldots, \theta_{jT})^{t}$ and $\boldsymbol{\mu}_{\theta(1)} = (\mu_{\theta_2}, \ldots, \mu_{\theta_T})^{t}$. In this notation, the index $(1)$ indicates that the first component is excluded. It follows that the covariance structure, see the definition in Eq. (3), is partitioned as,

$$
\boldsymbol{\Psi}_\theta = \begin{bmatrix}
\psi_{\theta_1} & \boldsymbol{\psi}^{t}_{\theta(1)} \\
\boldsymbol{\psi}_{\theta(1)} & \boldsymbol{\Psi}_{\theta(1)}
\end{bmatrix}, \tag{10}
$$

where $\boldsymbol{\psi}_{\theta(1)} = (\psi_{\theta_{12}}, \ldots, \psi_{\theta_{1T}})^{t}$ and

$$
\boldsymbol{\Psi}_{\theta(1)} = \begin{bmatrix}
\psi_{\theta_2} & \cdots & \psi_{\theta_{2T}} \\
\vdots & \ddots & \vdots \\
\psi_{\theta_{2T}} & \cdots & \psi_{\theta_T}
\end{bmatrix}. \tag{11}
$$

From properties of the multivariate normal distribution, see Rencher (2002), it follows that

$$
\boldsymbol{\theta}_{j(1)} \mid \theta_{j1} \sim N_{(T-1)}(\boldsymbol{\mu}^{*}, \boldsymbol{\Psi}^{*}), \tag{12}
$$

where

$$
\boldsymbol{\mu}^{*} = \boldsymbol{\mu}_{\theta(1)} + \psi_{\theta_1}^{-1}\,\boldsymbol{\psi}_{\theta(1)}\,(\theta_{j1} - \mu_{\theta_1}),
$$

and

$$
\boldsymbol{\Psi}^{*} = \boldsymbol{\Psi}_{\theta(1)} - \psi_{\theta_1}^{-1}\,\boldsymbol{\psi}_{\theta(1)}\boldsymbol{\psi}^{t}_{\theta(1)}. \tag{13}
$$

As a result, when conditioning on the restricted first-time point parameter, $\theta_{j1}$, the remaining $\boldsymbol{\theta}_{j(1)}$ are conditionally multivariate normally distributed given $\theta_{j1}$, with an unrestricted covariance matrix. The matrix $\boldsymbol{\Psi}^{*}$ is an unstructured covariance matrix without any identifiability restrictions, see Singer and Andrade (2000). As a result, the common modeling (e.g., using an Inverse-Wishart prior) and estimation approaches can be applied for Bayesian inference, see Gelman et al. (2004).
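The partition in Eq. (10) and the conditional moments in Eqs. (12)–(13) can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation: the function name `conditional_on_first` and the numerical values of the mean vector and covariance matrix are hypothetical. It also verifies the standard block-determinant identity, which shows why a positive definite conditional covariance together with a positive first variance gives a positive definite full covariance matrix.

```python
import numpy as np


def conditional_on_first(mu_theta, Psi_theta, theta_j1):
    """Mean and covariance of theta_j(1) given theta_j1, Eqs. (12)-(13)."""
    mu_theta = np.asarray(mu_theta, dtype=float)
    Psi_theta = np.asarray(Psi_theta, dtype=float)
    mu1, mu_rest = mu_theta[0], mu_theta[1:]
    psi1 = Psi_theta[0, 0]          # variance at the first time point
    psi_vec = Psi_theta[1:, 0]      # covariances with the first time point
    Psi_rest = Psi_theta[1:, 1:]    # covariance block of the other occasions
    mu_star = mu_rest + psi_vec / psi1 * (theta_j1 - mu1)
    Psi_star = Psi_rest - np.outer(psi_vec, psi_vec) / psi1
    return mu_star, Psi_star


# Hypothetical values with the identification mu_theta1 = 0, psi_theta1 = 1.
mu = np.array([0.0, 0.5, 1.0])
Psi = np.array([[1.0, 0.6, 0.4],
                [0.6, 1.2, 0.7],
                [0.4, 0.7, 1.5]])

mu_star, Psi_star = conditional_on_first(mu, Psi, theta_j1=1.0)

# Block-determinant identity: det(Psi) = psi_theta1 * det(Psi_star).
lhs = np.linalg.det(Psi)
rhs = Psi[0, 0] * np.linalg.det(Psi_star)
identity_holds = np.isclose(lhs, rhs)

# Psi_star is a valid covariance matrix if all its eigenvalues are positive.
psi_star_is_pd = bool(np.all(np.linalg.eigvalsh(Psi_star) > 0))
```

Because `Psi_star` carries no identification restriction, it is the natural object on which to place a standard prior, while the restricted first-occasion variance is handled separately.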

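Returning to the restricted patterns of Sect. 2.1, four of them (ARH, HU, HT and AD) can be sketched as small matrix builders. This is an illustrative sketch only: the function names are hypothetical, and it assumes that the $\psi_{\theta_t}$ are variances, so that each off-diagonal entry is a correlation scaled by the square root of the product of the two variances. HC and ARMAH are omitted for brevity.

```python
import numpy as np


def _scale(R, psi):
    """Turn a correlation matrix R into a covariance matrix with variances psi."""
    s = np.sqrt(np.asarray(psi, dtype=float))
    return R * np.outer(s, s)


def _lags(T):
    """Matrix of absolute distances |s - t| between occasions."""
    idx = np.arange(T)
    return np.abs(np.subtract.outer(idx, idx))


def arh(psi, rho):
    """ARH pattern: correlation rho^|s-t|, T + 1 parameters."""
    return _scale(float(rho) ** _lags(len(psi)), psi)


def hu(psi, rho):
    """HU pattern: one common correlation rho for all pairs."""
    T = len(psi)
    R = np.full((T, T), float(rho))
    np.fill_diagonal(R, 1.0)
    return _scale(R, psi)


def ht(psi, rho):
    """HT pattern: correlation rho at lag 1, zero beyond."""
    lag = _lags(len(psi))
    R = np.where(lag == 0, 1.0, np.where(lag == 1, float(rho), 0.0))
    return _scale(R, psi)


def ad(psi, rhos):
    """AD pattern: correlation between s < t is the product rho_s ... rho_{t-1}."""
    T = len(psi)
    rhos = np.asarray(rhos, dtype=float)   # T - 1 time-specific parameters
    R = np.eye(T)
    for s in range(T):
        for t in range(s + 1, T):
            R[s, t] = R[t, s] = np.prod(rhos[s:t])
    return _scale(R, psi)


def is_positive_definite(M):
    return bool(np.all(np.linalg.eigvalsh(M) > 0))


psi = [1.0, 1.2, 1.5]          # hypothetical variances, T = 3
Psi_arh = arh(psi, 0.6)
Psi_ad = ad(psi, [0.6, 0.6])   # equal rhos reduce AD to ARH
```

With equal time-specific correlations the AD builder reproduces the ARH matrix, which mirrors the nesting described in Sect. 2.1.6. For $T = 3$, `ht` gives a positive definite matrix only when the correlation is below about $1/\sqrt{2}$ in absolute value.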

For estimation purposes (using the restriction $\psi_{\theta_1} = 1$), it is convenient to eliminate the restricted parameter $\psi_{\theta_1}$ from the vector of covariances, $\boldsymbol{\psi}_{\theta(1)}$ (see the covariance structures in Eqs. (4)–(9)), between the first component, $\theta_{j1}$, and the remaining components $\boldsymbol{\theta}_{j(1)}$. This new vector is denoted as $\boldsymbol{\psi}^{*}$ and is equal to

$$
\boldsymbol{\psi}^{*} = \boldsymbol{\psi}_{\theta(1)} / \sqrt{\psi_{\theta_1}}. \tag{14}
$$

Subsequently, the conditional distribution of the unrestricted latent variables is expressed as

$$
\boldsymbol{\theta}_{j(1)} \mid \theta_{j1} = \boldsymbol{\mu}_{\theta(1)} + \frac{\boldsymbol{\psi}^{*}}{\sqrt{\psi_{\theta_1}}}\,(\theta_{j1} - \mu_{\theta_1}) + \boldsymbol{\xi}_j, \tag{15}
$$

where $\boldsymbol{\xi}_j \sim N(\boldsymbol{0}, \boldsymbol{\Psi}^{*})$. The variance and correlation parameters,

$$
\boldsymbol{\psi}^{*} \text{ and } \boldsymbol{\Psi}^{*}, \tag{16}
$$

define a one-to-one relation with the free parameters of the original covariance matrix $\boldsymbol{\Psi}_\theta$, since the parameter $\psi_{\theta_1}$ is restricted to 1. As a result, the estimates of the population variances and covariances can be obtained from the estimates of Eq. (16). The latent variable distribution of the first measurement occasion will be restricted to identify the model. This is done by re-scaling the vector of latent variable values of the first measurement occasion to a pre-specified scale in each MCMC iteration. The latent variable population distribution of subsequent measurement occasions is conditionally specified according to Eq. (15), given the restricted population distribution parameters of the first measurement occasion. Subsequently, the covariance parameters of the latent multivariate model are not restricted for identification purposes, which will facilitate a straightforward specification of the prior distributions.

3 Bayesian inference and Gibbs sampling methods

The marginal posterior distributions comprise the main tool to perform Bayesian inference. Unfortunately, it is not possible to obtain closed-form expressions of the marginal posterior distributions. An MCMC algorithm will be used to obtain samples from the marginal posteriors, see Gamerman and Lopes (2006). More specifically, we will develop an FGS algorithm to estimate all parameters simultaneously.

MCMC methods for longitudinal and multivariate probit models have been developed by, among others, Albert and Chib (1993), Chib and Greenberg (1998), Chib and Carlin (1999), Imai and Dyk (2005), and McCulloh et al. (2000).

A particular problem in Bayesian modeling of longitudinal multivariate response data is the prior specification for covariance matrices. An Inverse-Wishart prior distribution is plausible when covariance parameters are not functionally dependent, see Tiao and Zellner (1964). When this is not the case, the prior specification of covariance parameters becomes much more complicated. Here, identification rules impose restrictions on the covariance parameters of the latent trait distribution. Therefore, the covariance structure is modeled by conditioning on the restricted parameters, which are related to the first measurement occasion. Following McCulloh et al. (2000), this approach supports a proper implementation of the identifying restrictions and an FGS implementation.

Conjugate prior distributions are considered, see Gelman et al. (2004) and Gelman (2006). According to the approach presented in Sect. 2, the parameters of interest are $(\boldsymbol{\mu}^{t}_\theta, \psi_{\theta_1}, \boldsymbol{\psi}^{*t})^{t}$ and $\boldsymbol{\Psi}^{*}$. Conjugate priors are specified as,

$$
\boldsymbol{\mu}_\theta \sim N_T(\boldsymbol{\mu}_0, \boldsymbol{\Psi}_0), \tag{17}
$$
$$
\psi_{\theta_1} \sim IG(\nu_0, \kappa_0), \tag{18}
$$
$$
\boldsymbol{\psi}^{*} \sim N_{T-1}(\boldsymbol{\mu}_\psi, \boldsymbol{\Psi}_\psi), \tag{19}
$$
$$
\boldsymbol{\Psi}^{*} \sim IW_{T-1}(\nu_\Psi, \boldsymbol{\Psi}_\Psi), \tag{20}
$$

where $IG(\nu_0, \kappa_0)$ stands for the inverse-gamma distribution with shape parameter $\nu_0$ and scale parameter $\kappa_0$, and $IW_{T-1}(\nu_\Psi, \boldsymbol{\Psi}_\Psi)$ for the inverse-Wishart distribution with degrees of freedom $\nu_\Psi$ and dispersion matrix $\boldsymbol{\Psi}_\Psi$.

The prior for the item parameters is specified as

$$
p(\boldsymbol{\zeta}_i = (a_i, b_i) \mid \boldsymbol{\mu}_\zeta, \boldsymbol{\Psi}_\zeta) \propto \exp\left\{-0.5\,(\boldsymbol{\zeta}_i - \boldsymbol{\mu}_\zeta)^{t}\,\boldsymbol{\Psi}_\zeta^{-1}\,(\boldsymbol{\zeta}_i - \boldsymbol{\mu}_\zeta)\right\} \mathbb{1}_{(a_i > 0)}, \tag{21}
$$

where $\boldsymbol{\mu}_\zeta$ and $\boldsymbol{\Psi}_\zeta$ are the hyperparameters, and $\mathbb{1}$ the usual indicator function. The hyperparameters are fixed and often set in such a way that they represent reasonable values for the prior parameters.

In order to facilitate an FGS approach, and to account for missing response data, an augmented data scheme will be introduced, see Albert (1992) and Albert and Chib (1993). An augmented scheme is introduced to sample normally distributed latent response data $\boldsymbol{Z}_{...} = (Z_{111}, \ldots, Z_{I_T n_T})^{t}$, given the discrete observed response data; that is,

$$
Z_{ijt} \mid (\theta_{jt}, \boldsymbol{\zeta}_i, Y_{ijt}) \sim N(a_i \theta_{jt} - b_i, 1), \tag{22}
$$

where $Y_{ijt}$ is the indicator of $Z_{ijt}$ being greater than zero.

To handle incomplete block designs, an indicator variable $I$ is defined that identifies the set of administered items for each occasion and subject. This indicator variable is defined as follows,

$$
I_{ijt} = \begin{cases}
1, & \text{item } i \text{ administered for examinee } j \text{ at time point } t \\
0, & \text{missing by design.}
\end{cases} \tag{23}
$$

The not-selective missing responses due to uncontrolled events such as dropouts, inclusion of examinees, non-response,
or errors in recoding data are marked by another indicator, which is defined as,

$$
V_{ijt} = \begin{cases}
1, & \text{observed response of examinee } j \text{ at time point } t \text{ on item } i \\
0, & \text{otherwise.}
\end{cases} \tag{24}
$$

It is assumed that the missing data are missing at random (MAR), such that the distribution of patterns of missing data does not depend on the unobserved data. When the MAR assumption does not hold and the missing data are non-ignorable, a missing data model can be defined to model explicitly the pattern of missingness. In case of MAR, the observed data can be used to make valid inferences about the model parameters.

To ease the notation, let indicator matrix $\boldsymbol{I}$ represent both cases of missing data. Then, under the above assumptions, the distribution of augmented data $\boldsymbol{Z}_{...}$ (conditioned on all other quantities) is given by

$$
p(\boldsymbol{z}_{...} \mid \boldsymbol{y}_{...}, \boldsymbol{\zeta}, \boldsymbol{\theta}_{..}, \boldsymbol{\eta}_\theta, \boldsymbol{I}) \propto \prod_{t=1}^{T} \prod_{j=1}^{n} \prod_{i \mid I_{ijt} = 1} \exp\left\{-0.5\,(z_{ijt} - a_i \theta_{jt} + b_i)^2\right\} \mathbb{1}_{(z_{ijt}, y_{ijt})}, \tag{25}
$$

where $\mathbb{1}_{(z_{ijt}, y_{ijt})}$ represents the restriction that $z_{ijt}$ is greater (lesser) than zero when $y_{ijt}$ equals one (zero), according to Eq. (22).

Given the augmented data likelihood in Eq. (25) and the prior distributions in Eqs. (2), (17), (18), (19), (20) and (21), the joint posterior distribution is given by:

$$
p(\boldsymbol{\theta}_{..}, \boldsymbol{\zeta}, \boldsymbol{\mu}_\theta, \psi_{\theta_1}, \boldsymbol{\psi}^{*}, \boldsymbol{\Psi}^{*} \mid \boldsymbol{z}_{...}, \boldsymbol{y}_{...}) \propto p(\boldsymbol{z}_{...} \mid \boldsymbol{\theta}_{..}, \boldsymbol{\zeta}, \boldsymbol{y}_{...})\, p(\boldsymbol{\theta}_{..} \mid \boldsymbol{\eta}_\theta)\, p(\boldsymbol{\zeta} \mid \boldsymbol{\mu}_\zeta, \boldsymbol{\Psi}_\zeta)\, p(\boldsymbol{\eta}_\theta), \tag{26}
$$

where

$$
p(\boldsymbol{\theta}_{..} \mid \boldsymbol{\eta}_\theta) = \prod_{j=1}^{n} p(\boldsymbol{\theta}_{j.} \mid \boldsymbol{\eta}_\theta), \tag{27}
$$

and

$$
p(\boldsymbol{\eta}_\theta) = p(\boldsymbol{\mu}_\theta)\, p(\psi_{\theta_1})\, p(\boldsymbol{\psi}^{*})\, p(\boldsymbol{\Psi}^{*}).
$$

This posterior distribution (26) has an intractable form but, as shown in the Appendix, the full conditionals are known and easy to sample from. Let $(.)$ denote the set of all necessary parameters. The FGS algorithm is defined as follows:

1. Start the algorithm by choosing suitable initial values. Repeat steps 2–10.
2. Simulate $Z_{ijt}$ from $Z_{ijt} \mid (.)$, $i = 1, \ldots, I_t$, $j = 1, \ldots, n$, $t = 1, \ldots, T$.
3. Simulate $\boldsymbol{\theta}_{j.}$ from $\boldsymbol{\theta}_{j.} \mid (.)$, $j = 1, \ldots, n$.
4. Simulate $\boldsymbol{\zeta}_i$ from $\boldsymbol{\zeta}_i \mid (.)$, $i = 1, \ldots, I$.
5. Simulate $\boldsymbol{\mu}_\theta$ from $\boldsymbol{\mu}_\theta \mid (.)$.
6. Simulate $\psi_{\theta_1}$ from $\psi_{\theta_1} \mid (.)$.
7. Simulate $\boldsymbol{\psi}^{*}$ from $\boldsymbol{\psi}^{*} \mid (.)$.
8. Simulate $\boldsymbol{\Psi}^{*}$ from $\boldsymbol{\Psi}^{*} \mid (.)$.
9. Compute the unstructured covariance matrix using the sampled covariance components from Steps 6–8 and Eqs. (10), (13) and (14).
10. Through a parameter transformation method using sampled unstructured covariance parameters, compute restricted covariance components of interest. The sampled restricted covariance structure $\boldsymbol{\Psi}_\theta$ is used when repeating steps 2–8.

To handle the restriction $\mu_{\theta_1} = 0$, the expression in Eq. (12) is used to simulate $\boldsymbol{\mu}_{\theta(1)}$. To simulate $(\mu_{\theta_1}, \psi_{\theta_1})^{t}$, the following decomposition is used in (27),

$$
p(\boldsymbol{\theta}_{j.} \mid \boldsymbol{\eta}_\theta) = p(\boldsymbol{\theta}_{j(1)} \mid \boldsymbol{\eta}_\theta, \theta_{j1})\, p(\theta_{j1} \mid \boldsymbol{\eta}_{\theta_1}),
$$

where $\boldsymbol{\eta}_{\theta_1} = (\mu_{\theta_1}, \psi_{\theta_1})^{t}$. To identify the model, the scale of the latent variable of measurement occasion one is transformed to mean zero and variance one. It is also possible to restrict the parameters $(\mu_{\theta_1}, \psi_{\theta_1})^{t}$ to specific values.

In Step 9, MCMC samples of $\boldsymbol{\Psi}^{*}$ are drawn from an inverse-Wishart distribution, and each sampled covariance matrix is restricted to be positive definite. Now, the following relationship can be defined,

$$
\det(\boldsymbol{\Psi}_\theta) = \det(\psi_{\theta_1}) \det\left(\boldsymbol{\Psi}_{\theta(1)} - \psi_{\theta_1}^{-1}\,\boldsymbol{\psi}_{\theta(1)}\boldsymbol{\psi}^{t}_{\theta(1)}\right) = \psi_{\theta_1} \det(\boldsymbol{\Psi}^{*}),
$$

using Eqs. (10) and (13) and a property of the determinant of block matrices. As a result, the $\det(\boldsymbol{\Psi}_\theta)$ is greater than zero since both the determinant of $\boldsymbol{\Psi}^{*}$ and $\psi_{\theta_1}$ are greater than zero. This implies positive definite samples of $\boldsymbol{\Psi}_\theta$.

In MCMC Step 10, parameters of a posited covariance pattern structure are computed given an MCMC sample of the unrestricted unstructured covariance parameters. Each simulated covariance matrix will be positive definite, since it is based on a positive definite unstructured covariance matrix. In the Appendix, the reparameterization for each covariance structure is specified, which facilitates the sampling of the parameters of the restricted covariance matrices. That is, in each MCMC iteration, parameters of a specific covariance pattern are computed using sampled unstructured covariance parameters. This procedure is based on the notion that each restricted covariance pattern is nested in the most general unstructured pattern, and that in the MCMC procedure parameter transformations can be used to achieve draws from the transformed parameter distribution.

4 Selection of covariance structure

Accurate inferences are obtained when selecting the most appropriate covariance pattern. Selecting a too simple covariance pattern can lead to underestimated standard errors and biased parameter estimates, see Singer and Andrade (2000) and Singer and Andrade (1994). Selecting a too complex covariance pattern can lead to a decrease in power and efficiency. The general method to select an appropriate covariance structure is based on some Bayesian optimality criterion. The different covariance structures are viewed as competing covariance models and the one that optimizes the

where the penalty function for model complexity is determined by an estimate of the effective number of model parameters, which allows nonzero covariance among model parameters.

For the AIC and the BIC, the penalty function for model complexity is determined by the effective number of parameters in the model, which is difficult or impossible to ascertain when random effects are involved. Following Spiegelhalter et
Bayesian model criterion is selected. Attention is focused al. (2002) and Congdon (2003), the posterior mean deviance
on three criteria for model selection, which are widely used is used as a penalized fit measure, which includes a measure
in the literature; the deviance information criterion (DIC), of complexity. Then, the following specification is made for
posterior expectation of the Aikaike’s information criterion the AIC and the BIC,
(AIC) and the posterior expectation of the Bayesian informa-
tion criterion (BIC), see Spiegelhalter et al. (2002). For each AI Ci = Di (ϑ) + 2(2(I − 1) + (T + n Ψ θ )),
criterion, the covariance pattern is selected with the smallest
criterion value. All competing covariance structures are time and
heterogenous, which generalizes the work of Andrade and
Tavares (2005) and Tavares and Andrade (2006). B I Ci = Di (ϑ) + (2(I − 1) + (T + n Ψ θ )) ln(n ∗ ),
The general form of the different information criteria is
the deviance (i.e. minus two times the log-likelihood) plus a i = 1, 2, respectively, where n Ψ θ is the total number of
penalty term for model complexity, which includes the num- covariance parameters, T is the number of time points and
 T 
ber of model parameters. Let ϑ denote the set of relevant n ∗ = nj=1 t=1 i=1 I Vi jt .
parameters, that is, the latent traits, the item and the popu- The AIC and BIC results are not guaranteed to lead to
lation parameters, then the following deviance is considered the same model, see Spiegelhalter et al. (2002) and Ando
to define the model selection criteria, (2007). The BIC has a much higher penalty term for model
complexity than the AIC. Therefore, a relatively more con-
 
D1 (ϑ) = −2 log p( y | θ, ζ ) + log p(θ | μθ , Ψ θ ) cise description of the covariance structure can be expected
⎛ from the BIC. When different results of the two criteria are

T  n  obtained, the model selected by the BIC is preferred over the
= −2 ⎝ log P(Yi jt = yi jt |θ jt , ζ i )
one selected by the AIC, see Spiegelhalter et al. (2002) and
t=1 j=1 i|Ii jt =1
⎞ Ando (2007).

n
The deviance can be approximated using the MCMC out-
+ log p(θ j | μθ , Ψ θ ) ⎠ put, and using G MCMC iterations the posterior mean of the
j=1
deviance is estimated by
= −2(L L + L L L T ),
1 
G
where p(θ j | μθ , Ψ θ ) represents the density of the mul- Di (ϑ) = Di (ϑ g ),
T n  G
tivariate normal distribution, L L = g=1
 j=1 i|Ii jt =1
t=1
log P(Yi jt = yi jt |θ jt , ζ i ) and L L L T = nj=1 log p(θ j | and the deviance at the posterior mean by,
μθ , Ψ θ ).
The deviance depends highly on the estimated latent traits. ⎛ ⎞
1 
G
The covariance structure will influence the latent trait esti- Di (ϑ̂) = Di ⎝ ϑ g⎠ ,
mates, although they are mostly influenced by the data. The G
g=1
terms L L and L L L T both emphasize the fit of the latent
traits, and will diminish the importance of the fit of the covari- with index g representing the g-th value of the valid MCMC
ance structure. Therefore, the deviance term D2 (ϑ) = L L is sample (considering the burn-in and the thin value).
also considered to evaluate the fit of the covariance structure Here, the selection of the most optimal covariance struc-
by evaluating the fit of the latent traits in the likelihood term. ture is carried out using Bayesian measures of model com-
Let Di (ϑ) denote the posterior mean deviance and Di (ϑ̂) plexity as in Spiegelhalter et al. (2002). It also possible to
the deviance at the posterior mean. Then, the DIC is defined use pseudo-Bayes factors as in Kass and Raftery (1995), or
as, reversible Jump MCMC algorithms, see Green (1995) and
Azevedo (2008), which would require a different computa-
D I Ci = 2Di (ϑ) − Di (ϑ̂), i = 1, 2, tional implementation.
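The estimators above translate almost directly into code. The following is a minimal sketch, assuming the MCMC draws are stored as a list of parameter vectors and that a user-supplied function `deviance` (hypothetical, not part of the paper) returns $D_i$ for one parameter vector; the function and argument names are illustrative only.

```python
import math

def information_criteria(draws, deviance, n_items, n_time, n_cov, n_star):
    """DIC, AIC and BIC from G stored MCMC draws (one parameter vector per draw)."""
    G = len(draws)
    dim = len(draws[0])
    # Posterior mean deviance: average of D(theta^g) over the G draws.
    d_bar = sum(deviance(d) for d in draws) / G
    # Deviance evaluated at the posterior mean of the draws.
    post_mean = [sum(d[k] for d in draws) / G for k in range(dim)]
    d_hat = deviance(post_mean)
    # Parameter count in the AIC/BIC penalty: 2(I - 1) + (T + n_Psi).
    k = 2 * (n_items - 1) + (n_time + n_cov)
    return {
        "DIC": 2.0 * d_bar - d_hat,
        "AIC": d_bar + 2.0 * k,
        "BIC": d_bar + k * math.log(n_star),
    }

def toy_deviance(v):
    # Stand-in deviance used only to exercise the function.
    return sum(x * x for x in v)

draws = [[0.0], [2.0]]
crit = information_criteria(draws, toy_deviance,
                            n_items=3, n_time=2, n_cov=1, n_star=100)
```

In practice `deviance` would evaluate $-2(LL + LLLT)$ or the likelihood-only term on each retained draw, after burn-in and thinning.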


4.1 Model assessment: posterior predictive checks

Besides using model selection criteria for selecting the covariance structure, the fit of the general LIRT model can be evaluated using Bayesian posterior predictive tests and Bayesian residual analysis techniques (Albert and Chib 1995). The literature about posterior predictive checks for Bayesian item response models shows several diagnostics for evaluating the model fit. A general discussion can be found in, among others, Stern and Sinharay (2005), Sinharay (2006), Sinharay et al. (2006), and Fox (2004, 2005, 2010).

The common posterior predictive tests can be generalized to make them applicable for the LIRT model. Each posterior predictive test is based on a discrepancy measure, where this discrepancy measure is defined in such a way that a specific assumption or general fit of the model can be evaluated. The main idea is to generalize the well known discrepancy measures to a longitudinal structure.

In general, let $\boldsymbol{y}^{obs}$ be the matrix of observed responses, and $\boldsymbol{y}^{rep}$ the matrix of replicated responses generated from its posterior predictive distribution. The posterior predictive distribution of the response data of time-point $t$ is represented by

$$p\left(\boldsymbol{y}_t^{rep} \mid \boldsymbol{y}_t^{obs}\right) = \int p\left(\boldsymbol{y}_t^{rep} \mid \boldsymbol{\theta}_t\right) p\left(\boldsymbol{\theta}_t \mid \boldsymbol{y}_t^{obs}\right) d\boldsymbol{\theta}_t,$$

where $\boldsymbol{\theta}_t$ denotes the set of model parameters corresponding to time-point $t$. Generally, given a discrepancy measure $D(\boldsymbol{y}_t, \boldsymbol{\theta}_t)$, the replicated data are used to evaluate whether the discrepancy value given the observed data is typical under the model. A p-value can be defined that quantifies the extremeness of the observed discrepancy value in time-point $t$,

$$p_0\left(\boldsymbol{y}_t^{(obs)}\right) = P\left(D\left(\boldsymbol{y}_t^{(rep)}, \boldsymbol{\theta}_t\right) \geq D\left(\boldsymbol{y}_t^{(obs)}, \boldsymbol{\theta}_t\right) \mid \boldsymbol{y}_t^{(obs)}\right),$$

where the probability is taken over the joint posterior of $(\boldsymbol{y}_t^{(rep)}, \boldsymbol{\theta}_t)$. In some cases, the discrepancy measure can be generalized from the time-point level to the population level. In that case, the discrepancy measure can be used to evaluate model fit at the time-point and population level.

Here, p-values based on a chi-square distance, predictive distributions of latent scores, and Bayesian latent residuals are considered (Fox 2004, 2010; Azevedo et al. 2011). The chi-square posterior predictive check is defined to evaluate the predictive score distribution with the observed score distribution. The discrepancy measure for evaluating the score distribution is defined as,

$$D(\boldsymbol{y}_t) = \sum_{l} \frac{\left(n_{l,t} - E(n_{l,t})\right)^2}{V(n_{l,t})},$$

where $n_{l,t}$ is the number of subjects with a score $l$ at measurement occasion $t$, and $E(.)$ and $V(.)$ stand for the expectation and the variance, respectively. The posterior predictive check based on the score distribution is evaluated using MCMC output.

The predictive score distribution is easily calculated using the MCMC output. In each iteration, a sample of the score distribution is obtained. This is accomplished by generating response data for the sampled parameters. Subsequently, the number of subjects can be calculated for each possible score at each time-point. For each possible score, the median and 95 % credible interval are calculated to evaluate the score distribution.

A general approach for model adequacy assessment using Bayesian (latent) residuals is described by Albert and Chib (1995). Here, Bayesian residuals are analyzed for the latent traits at each time-point. The following quantity is considered,

$$\frac{\hat{\theta}_{jt} - \hat{\mu}_{\theta_t}}{\sqrt{\hat{\psi}_{\theta_t}}},$$

for $t = 1, 2, \ldots, T$, using posterior mean estimates. Subsequently, the normality assumption is evaluated using box and/or Q-Q plots.

5 Simulation study

Convergence properties and parameter recovery were analyzed using simulated data. The following hyperparameter settings were used in the simulation study:

$$\boldsymbol{\mu}_\psi = \boldsymbol{0}_{T-1}, \quad \boldsymbol{\Psi}_\psi = \tau \boldsymbol{I}_{T-1}, \tag{28}$$
$$\boldsymbol{\Psi}_\Psi = (\nu_\Psi - T + 1)\left(\boldsymbol{I} - \boldsymbol{\Psi}_\psi\right), \tag{29}$$

where $\nu_\Psi = 5$, $\tau = 1/8$ and the hyperparameters for the item parameters were specified as: $\boldsymbol{\mu}_\zeta = (1, 0)$ and $\boldsymbol{\Psi}_\zeta = \mathrm{diag}(0.5, 3)$.

Responses of n = 1,000 examinees were simulated for three measurement occasions. At each occasion, data were simulated according to a test of 24 items. There were six common items between test one and two, and six between test two and three. The item parameter values vary in terms of discrimination power and difficulty, properly. For each examinee, a total of 60 items were administered.

Examinees' latent traits were generated from a three-variate normal distribution with $\boldsymbol{\mu}_\theta = (0, 1, 2)^t$. The within-subject latent traits were correlated according to an ARH covariance structure, where $\boldsymbol{\psi}_\theta = (1, 0.9, 0.95)^t$ and $\rho_\theta = 0.75$. This implies latent growth in the mean structure, weak heterogeneous latent trait variance across time, and a strong within-subject correlation over time.


5.1 Convergence and autocorrelation assessment

Following Gamerman and Lopes (2006), the convergence of the MCMC algorithm was investigated by monitoring trace plots generated by three different sets of starting values, and by evaluating Geweke's and Gelman and Rubin's convergence diagnostics.

Following DeMars (2003), the sampled latent traits were transformed to the scale of the simulated latent traits according to

$$\boldsymbol{\theta}^{**}_{j.} = \mathrm{Chol}(\boldsymbol{\Psi}_\theta)\, \mathrm{Chol}(\boldsymbol{S}_\theta)^{-1} \left(\boldsymbol{\theta}^{*}_{j.} - \bar{\boldsymbol{\theta}}\right) + \boldsymbol{\mu}_\theta,$$

where $\boldsymbol{\theta}^{*}_{j.}$ are the simulated latent traits, $\bar{\boldsymbol{\theta}}$ and $\boldsymbol{S}_\theta$ are the sample mean vector and covariance matrix, respectively, and Chol stands for the Cholesky decomposition.

Figure 1 represents trace plots of latent trait population parameters for occasions two and three. The population parameters of time point one were fixed for identification. Figure 2 represents trace plots of parameters of two randomly selected items. Sampled values were stored every 30th iteration. The MCMC sample composed by storing every 30th value showed negligible autocorrelation. Posterior density plots (not shown) using the sampled values showed symmetric behavior of the posteriors, which supports the posterior mean as a Bayesian point estimate.

In each plot, three different chains are plotted, which correspond to three different initial values. From a visual inspection it can be concluded that within 100 (thinned) iterations each chain of simulated values reached the same area of plausible parameter values. Each MCMC chain mixed very well, which indicates that the entire area of the parameter space was easily reached. The Geweke diagnostic, based on a burn-in period of 16,000 iterations, indicated convergence of the chains of all model parameters. Furthermore, the Gelman–Rubin diagnostics were close to one for all parameters. Convergence was established easily without requiring informative initial parameter values or long burn-in periods. Therefore, the burn-in was set to be 16,000, a total of 46,000 values were simulated, and samples were collected at a spacing of 30 iterations.
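The rescaling of the sampled latent traits can be illustrated with a small sketch. It assumes lower-triangular Cholesky factors and the unbiased (n − 1) sample covariance; neither choice is stated in the text, and the data and target values below are made up for illustration. With these conventions the transformed traits have exactly the target mean vector and covariance matrix.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^t = a, for symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L x = b for a lower-triangular matrix L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def sample_moments(draws):
    """Sample mean vector and (n - 1)-denominator covariance matrix."""
    n, T = len(draws), len(draws[0])
    mean = [sum(d[t] for d in draws) / n for t in range(T)]
    cov = [[sum((d[t] - mean[t]) * (d[s] - mean[s]) for d in draws) / (n - 1)
            for s in range(T)] for t in range(T)]
    return mean, cov

def rescale(draws, mu, psi):
    """Apply Chol(psi) Chol(S)^{-1} (theta - mean) + mu to every row."""
    mean, S = sample_moments(draws)
    L_psi, L_s = cholesky(psi), cholesky(S)
    T = len(mu)
    out = []
    for d in draws:
        # w = Chol(S)^{-1} (theta - mean), obtained by forward substitution.
        w = forward_solve(L_s, [d[t] - mean[t] for t in range(T)])
        out.append([mu[t] + sum(L_psi[t][k] * w[k] for k in range(t + 1))
                    for t in range(T)])
    return out

# Illustrative values only (not from the paper).
raw = [[0.1, 1.3], [0.9, 0.2], [-1.2, 0.4], [0.5, -0.7], [1.4, 2.0]]
target_mu = [0.0, 1.0]
target_psi = [[1.0, 0.6], [0.6, 0.9]]
rescaled = rescale(raw, target_mu, target_psi)
new_mean, new_cov = sample_moments(rescaled)
```

Solving with the triangular factor avoids forming an explicit matrix inverse, which is the usual numerical preference for this kind of whitening and recoloring step.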

[Figure 1: trace plots (sampled value against iterations, 0 to 1,500) of population mean 2, population mean 3, the population variances, and the correlation]

Fig. 1 For different starting values, trace plots of the simulated values of the population parameters


[Figure 2: trace plots (sampled value against iterations, 0 to 1,500) of the discrimination and difficulty parameters of item 4 and item 33]

Fig. 2 For different starting values, trace plots of the simulated values for parameters of item 4 and 33
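As a complement to the trace plots, the Gelman and Rubin diagnostic for one parameter can be computed from several chains as sketched below. This is the basic (non-split) potential scale reduction factor; the text does not say which variant or software was used, so the formula choice and the toy chains are assumptions for illustration.

```python
from statistics import mean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor for m chains of equal length n."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    # W: average within-chain variance.
    W = mean([variance(c) for c in chains])
    # B: between-chain variance, n times the variance of the chain means.
    B = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * W + B / n
    return (var_plus / W) ** 0.5

# Three chains that agree, and three chains stuck in different regions.
r_mixed = gelman_rubin([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
r_stuck = gelman_rubin([[0, 1, 0, 1], [10, 11, 10, 11], [20, 21, 20, 21]])
```

Values close to one, as reported for all parameters here, indicate that between-chain and within-chain variability agree; values well above one signal that the chains have not yet reached a common distribution.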

5.2 Parameter recovery

The linked test design contains 60 items such that 120 item parameters need to be estimated and 3,000 person parameters. The general population model for the person parameters leads to an additional set of five parameters, since two population parameters were restricted. Ayala and Sava-Bolesta (1999) suggest considering around 1,200 subjects per item to obtain accurate parameter estimates. Here, 1,000 responses per item were simulated since the specification of a correct prior structure of the LIRT becomes more important when less data are available. Furthermore, the characteristics of the real data study described further on will resemble those of the simulated data study.

Different statistics were used to compare the results: mean of the estimates (M. Est.), correlation (Corr), mean of the standard error (MSE), variance (VAR), the absolute bias (ABias) and the root mean squared error (RMSE). To evaluate the accuracy of the MCMC estimates, a total of ten replicated data sets were generated, which was based on Azevedo and Andrade (2010) and Ayala and Sava-Bolesta (1999). For the item and latent trait parameters, average statistics were computed by averaging across data sets, and items and persons, respectively.

Table 1 represents the results for the latent traits and item parameters. The estimated values of the statistics indicate that the MCMC algorithm recovered all parameters properly. Furthermore, the estimated posterior means of the discrimination and difficulty parameters were also close to the true values. Similar conclusions can be drawn about the estimates of the latent trait population parameters, see Table 2. The estimated posterior means are close to the true values, and the biases are relatively small.

Table 1 Replication study: results for the estimated latent trait and item parameters

Parameter        Corr   MSE    ABias  Var    RMSE
Latent trait     .993   .086   .114   .061   .278
Discrimination   .982   .011   .028   .010   .097
Difficulty       .999   .023   .042   .016   .122

Table 2 Replication study: results for the estimated latent trait population parameters

Parameter   M. est.  MSE     ABias  Var     RMSE
μθ2         .968     .003    .032   .002    .053
μθ3         1.968    .009    .032   .008    .094
ψθ2         .864     .006    .036   .005    .078
ψθ3         .948     .009    .002   .009    .092
ρθ          .749     <.001   .001   <.001   .007

5.3 Covariance structure selection

The information criteria were used to compare the fit of the different covariance models. The results are given in Table 3, which includes the information criteria for model comparison, as presented in Sect. 4. The information criteria results for the heteroscedastic toeplitz (HT) model were much higher in comparison to the other covariance models, since the dependency structure was restricted to correlations between two adjacent time measurements. Therefore, to avoid distraction these results were not included in Table 3.

From Table 3 it follows that the DIC2 selects the true covariance model (ARH), where the AIC2 and BIC2 select the UH structure. However, note that the ARH model was ranked second by these two criteria. It is not surprising that the BIC2 selects the UH above the ARH, since the DIC tends to prefer simpler models. However, the AIC2, which tends to select more complex models, also selected the UH model, even though the difference from the related statistic for the ARH model is quite small. This behavior could be caused by sampling fluctuation.

As expected, the results of the AIC1, BIC1 and DIC1 were inflated by the values of the LLLT, by emphasizing the fit of the latent trait estimates. The quantification of the fit of each particular covariance structure is not well represented by these criteria, since the latent trait estimates dominate the deviance term. Although the results show some consistency when considering the DIC2, a more thorough study is necessary, which is beyond the scope of the present study.

Table 3 Selecting the optimal covariance structure for the simulated data set: estimated Bayesian information criteria. Bold values indicate models chosen by the statistics

Model   AIC1    BIC1    DIC1    LL       LLLT    AIC2    BIC2    DIC2
HU      61,696  62,844  64,610  −25,593  −3,547  54,102  55,250  56,518
ARH     61,000  62,148  63,920  −25,636  −3,154  54,113  55,261  56,453
ARMAH   61,076  62,234  63,996  −25,631  −3,195  54,114  55,271  56,461
HC      59,275  60,423  62,326  −26,442  −1,420  55,359  56,507  57,334
AD      60,834  61,991  63,759  −25,641  −3,061  54,122  55,279  56,457
Unst.   60,719  61,885  63,637  −25,652  −2,995  54,135  55,302  56,459

6 The Brazilian school development study

The data set analyzed stems from a major study initiated by the Brazilian Federal Government known as the School Development Program. The aim of the program is to improve the teaching quality and the general structure (classrooms, libraries, laboratory informatics etc.) in Brazilian public schools. A total of 400 schools in different Brazilian states joined the program. Achievements in mathematics and Portuguese language were measured over five years (from fourth to eighth grade of primary school) from students of schools selected and not selected for the program.

The study was conducted from 1999 to 2003. At the start, 158 public schools were monitored, where 55 schools were selected for the program. The sampled schools were located over six Brazilian states with two states in each of three Brazilian regions (North, Northeast, and Center West). The schools had at least 200 students enrolled for the daytime educational programs, were located at urban zones, and offered an educational program to the eighth grade. At baseline, a total of 12,580 students were sampled. From 2000 to 2003, the cohort consisted of students from the baseline sample who were approved to the fifth grade and did not switch schools. Students enrolled in the fifth grade but coming from another school, and students not assessed in former grades constituted a second cohort, which was followed the four subsequent years. Other cohorts were defined in the same way. The longitudinal test design allowed dropouts and inclusions along the time points. Besides achievements, social-cultural information was collected. The selected students were tested each year.

In the present study, mathematic performances of 1,500 randomly selected students, who were assessed in the fourth, fifth, and sixth grade, were considered. A total of 72 test items was used, where 23, 26, and 31 items were used in the test in grade four, grade five, and grade six, respectively. Five anchor items were used in all three tests. Another common set of five items was used in the test in grade four and five. Furthermore, four common items were used in the tests in grade five and six.

In an exploratory analysis, the multiple group model (MGM), described in Azevedo et al. (2011), was used to estimate the latent student achievements given the response data. The MGM for cross-sectional data assumes that students are nested in groups and latent traits are assumed to be independent given the mean level of the group. Typical


for the longitudinal nature of the study, a positive correlation among latent traits from the same examinee is to be expected, but this aspect was ignored in this exploratory analysis. Pearson's correlations, variances, and covariances were estimated among the vectors of estimated latent traits corresponding to grade four to six. The estimates are represented in Table 4.

Table 4 Estimated posterior variances, covariances, and correlations among estimated latent traits are given in the diagonal, lower and upper triangle, respectively

             Grade four  Grade five  Grade six
Grade four   1.000       .723        .629
Grade five   .659        1.152       .681
Grade six    .540        .641        1.071

The results show significant between-grade dependencies. That is, the latent traits are not conditionally independently distributed over grades given the grade-specific means. The estimated variances increased after grade four, which indicates the presence of time-heterogenous variances. Furthermore, given the estimates of covariances, time-heteroscedastic covariances and time-decreasing correlations are to be considered to account for within-subject (between-grade) dependencies among latent traits. Therefore, the LIRT model was estimated using each one of the covariance structures to account for the specific dependencies.

The response data were modeled according to the LIRT model using different covariance structures. First, attention was focused on selecting the optimal covariance structure. Second, a more detailed model fit assessment was carried out using the selected covariance structure. The three model selection criteria were used to identify the most suitable covariance structure. As in the simulation study, the heteroscedastic toeplitz model did not fit the data and produced much higher information criteria estimates. For each other covariance structure, Table 5 represents the estimated values for the AICi, BICi, and DICi, i = 1, 2. The information criteria are represented such that a smaller value corresponds to a better model fit.

Table 5 Selecting the optimal covariance structure for the real data set: estimated Bayesian information criteria. Bold values indicate models chosen by the statistics

Model   LL       AIC2     BIC2     DIC2
HU      −71,980  147,477  148,941  150,398
ARH     −72,164  147,693  149,157  150,462
ARMAH   −72,179  147,727  149,201  150,496
HC      −72,840  148,707  150,171  151,139
AD      −72,184  147,723  149,196  150,477
Unst.   −71,984  147,470  148,954  150,368

The AIC2 and DIC2 preferred the unstructured covariance model, where the BIC2 preferred the more parsimonious HU. However, the unstructured model was ranked second by BIC2, whereas the UH was second ranked by AIC2 and DIC2. There were only three measurement occasions, and the correlations between grade years were high. This made the comparison between the unstructured and UH difficult. In the presence of high between-grade correlations and a few time points, the information criteria results preferred the UH (the most parsimonious model) and the unstructured covariance matrix. From the various competing covariance structures, the unstructured covariance matrix was used for further analysis.

Different model fit assessment tools, based on posterior predictive densities of different quantities, were used to evaluate the LIRT model with the ARMAH covariance structure. The p-value based on a chi-squared distance, and predictive distributions of latent scores and Bayesian latent residuals were considered; for more details about the posterior checks see Albert and Chib (1995), Azevedo (2008), and Azevedo et al. (2011).

The Bayesian p-value was p = .398, which indicates that the model fitted well. In addition, the observed scores fall almost all within the credible intervals for each grade, except for observed scores equal to 20 in grade five, see Fig. 3. Figure 4 represents an estimated quantile-quantile plot of the latent trait residuals of each grade. In general, from visual inspection it follows that the assumed normal probability distribution in each grade seems to be appropriate.

[Figure 3: one panel per grade (four, five, six) showing frequency against score (0 to 60), with the expected score, the credibility interval, and the observed score]

Fig. 3 Observed score distribution, predicted score distribution, and 95 % central credible intervals

[Figure 4: one panel per grade (four, five, six) showing latent trait residuals against theoretical quantiles]

Fig. 4 For each grade, Quantile-Quantile plot of estimated latent trait residuals

Table 6 represents the population parameter estimates and 95 % HPD credible intervals of the three grade levels while accounting for a time-heterogenous correlation structure among latent traits. A significant growth in latent trait means was detected given the non-overlapping credible intervals. As expected, the mean growth of math achievement over grade years is significant. The within-grade variability is relatively small, but the between-grade correlations are significant. Each within-examinee latent growth was computed, while accounting for the complex dependencies, which showed a comparable pattern compared to the mean latent growth over grade years.

Table 6 Population parameter estimates and 95 % credible intervals

Grade               Mean    SD     HPD 95 %
Mean
  Four (reference)  0       –      –
  Five              .240    .040   [.170, .319]
  Six               .763    .048   [.680, .862]
Variance
  Four (reference)  1       –      –
  Five              1.032   .081   [.876, 1.183]
  Six               .969    .087   [.794, 1.131]
Correlations
  Four and five     .857    .012   [.832, .879]
  Four and six      .759    .017   [.724, .790]
  Five and six      .810    .015   [.784, .840]

Finally, Figs. 5 and 6 represent the posterior means and 95 % credible intervals of the item discrimination and original difficulty estimates ($b_i^*$), respectively. The discrimination parameter estimates are relatively low, where approximately 50 % of the items have sufficient discriminating power. In addition, by comparing the difficulty parameter estimates with the population mean estimates, it follows that the tests were relatively easy, since most of the difficulty values are below zero. To obtain more accurate estimates of latent growth of well-performing and excellent examinees, more difficult test items are needed. The relatively easy items led to skewed population distributions (see Fig. 3), where a lot of students performed very well, which makes it difficult to accurately measure the math performances of these students. However, note that the within-examinee dependency structure over time contributes to an improved estimate of subject-specific latent trait, since it supports the use of information from other grade years to estimate the achievement level.

[Figure 5: discrimination parameter estimates against item (0 to 70)]

Fig. 5 Posterior means and HPD intervals for the discrimination parameters

[Figure 6: difficulty parameter estimates against item (0 to 70)]

Fig. 6 Posterior means and HPD intervals for the difficulty parameters

7 Conclusions and comments

A longitudinal item response model is proposed, where the within-examinee latent trait dependencies are explicitly modeled using different covariance structures. The time-heterogenous covariance structures allow for time-varying latent trait variances, covariances, and correlations. The complex dependency structure across time and identification issues lead to restrictions on the covariance matrix, which complicates the specification of priors and implementation of an MCMC algorithm. By conditioning on a reference or baseline time-point, an unrestricted unstructured covariance matrix was specified given the baseline population parameters. Furthermore, the restricted structured covariance models were handled as restricted versions of the unstructured


restricted covariance model, which was estimated through – Step 3: Simulate the item parameters by using ζ i |(.) ∼
the developed MCMC method. ζ 
N (Ψ  ζ ), mutually indepedently, where
ζi, Ψ
i i
The developed Bayesian methods include an MCMC esti-
mation method, and different posterior predictive assessment 
ζ i = H i.. z i.. + Ψ −1ζ μζ ,
t
tools. In a simulation study, the MCMC algorithm showed a  −1
good recovery of the model parameters. The assessment tools  ζ = H i..
Ψ t
H + Ψ −1
,
i i.. ζ
were shown to be useful in evaluating the fit of the model.
H i.. = [θ − 1] • Ii , (30)
Various model extensions of the LIRT model can be con-
sidered. The latent variable distribution is assumed to be
multivariate normal. This can be adjusted for example by where Ii is the indicator vector of item i, which indicates
using a multivariate skewed latent variable distribution to the subjects responding to item i and “•” is the Hadamard
model asymmetric latent trait distributions. Furthermore, the product.
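skewed latent variable approach of Azevedo et al. (2011) could be used. The extension to nominal and ordinal response data can be made by defining a more flexible response model at level 1 of the longitudinal model. Dropouts and inclusions of examinees were not allowed in the present data study. A multiple imputation method could be developed to support this situation, see Azevedo (2008). More generally, the LIRT model can be adapted to accommodate incomplete designs, latent growth curves, collateral information for latent traits, informative mechanisms of non-response, mixture structures on latent traits and/or item and population parameters, and flexible latent trait distributions, among other things. This requires defining a more general IRT model for the response data using flexible priors that can include the different extensions.

Acknowledgments The authors are thankful to CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) from Brazil for the financial support through a Doctoral Sandwich Scholarship granted to the first author under the guidance of the other two authors.

Appendix

– Step 1: Simulate the augmented data $Z_{ijt} \mid (.)$ according to Eq. (22).

– Step 2: Simulate the latent traits using

$$\boldsymbol{\theta}_{j.} \mid (.) \sim N_T\left(\hat{\boldsymbol{\Psi}}_{\theta_j}\tilde{\boldsymbol{\theta}}_j,\ \hat{\boldsymbol{\Psi}}_{\theta_j}\right),$$

where

$$\tilde{\boldsymbol{\theta}}_j = \sum_{i \mid I_{ijt}=1} a_i b_i \boldsymbol{1}_T + \sum_{i \mid I_{ijt}=1} a_i \boldsymbol{z}_{ij.} + \boldsymbol{\Psi}_\theta^{-1}\boldsymbol{\mu}_\theta,$$

$$\hat{\boldsymbol{\Psi}}_{\theta_j} = \left(\sum_{i \mid I_{ijt}=1} a_i^2 \boldsymbol{I}_T + \boldsymbol{\Psi}_\theta^{-1}\right)^{-1},$$

where $\boldsymbol{z}_{ij.} = (z_{ij1}, \ldots, z_{ijT})^t$.

– Step 4: Simulate the population mean vector by using

$$\mu_{\theta_1} \mid (.) \sim N\left(\hat{\mu}_{\theta_1}, \hat{\psi}_\mu\right),$$

$$\boldsymbol{\mu}_{\theta(1)} \mid (\mu_{\theta_1}, (.)) \sim N_{T-1}\left(\hat{\boldsymbol{\mu}}_{\theta(T-1)},\ \hat{\boldsymbol{\Psi}}_{\mu(T-1)}\right),$$

where

$$\tilde{\boldsymbol{\mu}}_\theta = \boldsymbol{\Psi}_\theta^{-1}\sum_{j=1}^{n}\boldsymbol{\theta}_{j.} + \boldsymbol{\Psi}_0^{-1}\boldsymbol{\mu}_0 = (\tilde{\mu}_{\theta_1}, \tilde{\mu}_{\theta_2}, \ldots, \tilde{\mu}_{\theta_T})^t = \left(\tilde{\mu}_{\theta_1}, \tilde{\boldsymbol{\mu}}_\theta^{(T-1)}\right)^t,$$

$$\hat{\boldsymbol{\Psi}}_\mu = \left(n\boldsymbol{\Psi}_\theta^{-1} + \boldsymbol{\Psi}_0^{-1}\right)^{-1} = \begin{pmatrix} \hat{\psi}_\mu & \hat{\boldsymbol{\psi}}_\mu^{(T-1)t} \\ \hat{\boldsymbol{\psi}}_\mu^{(T-1)} & \hat{\boldsymbol{\Psi}}_\mu^{(T-1)} \end{pmatrix},$$

$$\hat{\boldsymbol{\mu}}_\theta = \hat{\boldsymbol{\Psi}}_\mu\tilde{\boldsymbol{\mu}}_\theta = (\hat{\mu}_{\theta_1}, \hat{\mu}_{\theta_2}, \ldots, \hat{\mu}_{\theta_T})^t = \left(\hat{\mu}_{\theta_1}, \hat{\boldsymbol{\mu}}_\theta^{(T-1)}\right)^t,$$

$$\hat{\boldsymbol{\mu}}_{\theta(T-1)} = \hat{\boldsymbol{\mu}}_\theta^{(T-1)} + \hat{\boldsymbol{\psi}}_\mu^{(T-1)}\hat{\psi}_\mu^{-1}\left(\mu_{\theta_1} - \hat{\mu}_{\theta_1}\right),$$

$$\hat{\boldsymbol{\Psi}}_{\mu(T-1)} = \hat{\boldsymbol{\Psi}}_\mu^{(T-1)} - \hat{\boldsymbol{\psi}}_\mu^{(T-1)}\hat{\psi}_\mu^{-1}\hat{\boldsymbol{\psi}}_\mu^{(T-1)t}.$$

– Step 5: Simulate the first time point variance using $\psi_{\theta_1} \mid (.) \sim IG(\hat{\upsilon}_1, \hat{\kappa}_1)$, where

$$\hat{\upsilon}_1 = \frac{n + \upsilon_0}{2}, \qquad \hat{\kappa}_1 = \frac{\sum_{j=1}^{n}(\theta_{j1} - \mu_{\theta_1})^2 + \kappa_0}{2}.$$

– Step 6: Simulate the vector of covariances using $\boldsymbol{\psi}^* \sim N_{T-1}\left(\hat{\boldsymbol{\Psi}}_\psi\tilde{\boldsymbol{\psi}}_\psi,\ \hat{\boldsymbol{\Psi}}_\psi\right)$, where

$$\tilde{\boldsymbol{\psi}}_\psi = \psi_{\theta_1}^{-1/2}\,\boldsymbol{\Psi}_\theta^{*-1}\sum_{j=1}^{n}\left(\boldsymbol{\theta}_{j(1)} - \boldsymbol{\mu}_{\theta(1)}\right)\left(\theta_{j1} - \mu_{\theta_1}\right) + \boldsymbol{\Psi}_\psi^{-1}\boldsymbol{\mu}_\psi,$$

$$\hat{\boldsymbol{\Psi}}_\psi = \left(\psi_{\theta_1}^{-1}\,\boldsymbol{\Psi}_\theta^{*-1}\sum_{j=1}^{n}\left(\theta_{j1} - \mu_{\theta_1}\right)^2 + \boldsymbol{\Psi}_\psi^{-1}\right)^{-1}.$$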
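A minimal sketch of the Step 2 draw, assuming the augmented-data parameterization $Z_{ijt} = a_i\theta_{jt} - b_i + \text{error}$ with unit error variance that the expressions for the latent trait update suggest, and assuming for simplicity that every item is administered at every time point (all $I_{ijt} = 1$). The function name `draw_latent_traits` and the array layout are illustrative choices, not part of the original algorithm.

```python
import numpy as np


def draw_latent_traits(z, a, b, mu_theta, Psi_theta, rng):
    """Sketch of the Step 2 update for a single examinee j.

    z         : (I, T) augmented data z_{ijt}
    a, b      : (I,) item parameters (assumed Z = a * theta - b + error)
    mu_theta  : (T,) population mean vector
    Psi_theta : (T, T) population covariance matrix
    Returns the draw together with the conditional mean and covariance.
    """
    T = z.shape[1]
    prior_prec = np.linalg.inv(Psi_theta)
    # Conditional covariance: (sum_i a_i^2 I_T + Psi_theta^{-1})^{-1}
    post_cov = np.linalg.inv(np.sum(a ** 2) * np.eye(T) + prior_prec)
    # theta-tilde: sum_i a_i b_i 1_T + sum_i a_i z_{ij.} + Psi_theta^{-1} mu_theta
    theta_tilde = np.sum(a * b) * np.ones(T) + a @ z + prior_prec @ mu_theta
    post_mean = post_cov @ theta_tilde
    draw = rng.multivariate_normal(post_mean, post_cov)
    return draw, post_mean, post_cov


# Hypothetical usage with 5 items and 3 time points.
rng = np.random.default_rng(0)
z_example = rng.normal(size=(5, 3))
theta_draw, theta_mean, theta_cov = draw_latent_traits(
    z_example, np.ones(5), np.zeros(5), np.zeros(3), np.eye(3), rng
)
```

When all discriminations are zero the items carry no information, so the conditional mean and covariance reduce to the population mean and covariance, which gives a quick sanity check on an implementation.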

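Steps 4 and 5 are conjugate updates and might be sketched as below. The sketch assumes a normal prior for the population mean vector with mean `mu0` and covariance `Psi0`, and an inverse-gamma distribution parameterized by shape and scale; the parameterization, the function names, and the argument layout are assumptions made for illustration.

```python
import numpy as np


def posterior_mean_moments(theta, Psi_theta, mu0, Psi0):
    """Conditional mean and covariance of the population mean vector.

    theta : (n, T) latent traits; mu0, Psi0 : assumed normal prior moments.
    """
    n = theta.shape[0]
    prec_theta = np.linalg.inv(Psi_theta)
    prec0 = np.linalg.inv(Psi0)
    Psi_hat = np.linalg.inv(n * prec_theta + prec0)
    mu_hat = Psi_hat @ (prec_theta @ theta.sum(axis=0) + prec0 @ mu0)
    return mu_hat, Psi_hat


def draw_population_mean(theta, Psi_theta, mu0, Psi0, rng):
    """Step 4 sketch: draw the first mean, then the others given it."""
    mu_hat, Psi_hat = posterior_mean_moments(theta, Psi_theta, mu0, Psi0)
    psi_11 = Psi_hat[0, 0]      # variance of the first component
    psi_21 = Psi_hat[1:, 0]     # covariances with the first component
    Psi_22 = Psi_hat[1:, 1:]    # covariance block of the remaining T-1
    mu_1 = rng.normal(mu_hat[0], np.sqrt(psi_11))
    # Conditional moments of the remaining T-1 means given mu_1
    cond_mean = mu_hat[1:] + psi_21 / psi_11 * (mu_1 - mu_hat[0])
    cond_cov = Psi_22 - np.outer(psi_21, psi_21) / psi_11
    rest = rng.multivariate_normal(cond_mean, cond_cov)
    return np.concatenate(([mu_1], rest))


def draw_first_variance(theta_1, mu_1, upsilon0, kappa0, rng):
    """Step 5 sketch: inverse-gamma draw (shape/scale form assumed)."""
    shape = (theta_1.size + upsilon0) / 2.0
    scale = (np.sum((theta_1 - mu_1) ** 2) + kappa0) / 2.0
    # If G ~ Gamma(shape, 1), then scale / G ~ IG(shape, scale)
    return scale / rng.gamma(shape)


# Hypothetical usage with n = 50 examinees and T = 3 time points.
rng = np.random.default_rng(1)
theta_example = rng.normal(size=(50, 3))
mu_draw = draw_population_mean(theta_example, np.eye(3), np.zeros(3), np.eye(3), rng)
psi_draw = draw_first_variance(theta_example[:, 0], mu_draw[0], 2.0, 2.0, rng)
```

Drawing the first component marginally and the remaining components conditionally is equivalent to a single joint normal draw; the split form mirrors the partition of the mean vector used in Step 4.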

– Step 7: Simulate the covariance matrix $\boldsymbol{\Psi}^* \sim IW_{T-1}\left(\hat{\nu}_\Psi, \hat{\boldsymbol{\Psi}}_\Psi\right)$, where

$$\hat{\nu}_\Psi = n + \nu_\Psi, \qquad \hat{\boldsymbol{\Psi}}_\Psi = \boldsymbol{\Psi}_\Psi + \sum_{j=1}^{n}\left(\boldsymbol{\theta}_{j(1)} - \boldsymbol{\mu}_\theta^*\right)\left(\boldsymbol{\theta}_{j(1)} - \boldsymbol{\mu}_\theta^*\right)^t.$$

– Step 8: Calculate the original covariance matrix using (10) and $\boldsymbol{\Psi}_{\theta(1)} = \boldsymbol{\Psi}^* + \boldsymbol{\psi}^*\boldsymbol{\psi}^{*t}$.

– Step 9: Calculate the population variances using

$$(\psi_{\theta_2}, \ldots, \psi_{\theta_T})^t = \boldsymbol{\psi}^*_{\theta(1)} = \mathrm{Diag}\left(\boldsymbol{\Psi}^* + \boldsymbol{\psi}^*\boldsymbol{\psi}^{*t}\right), \qquad (31)$$

where Diag extracts the main diagonal of a square matrix.

– Step 10: Depending on the restricted covariance structure of interest, transformations are defined for unrestricted parameters to facilitate draws of restricted model parameters. Below, in each subitem, the following notation is used: $\boldsymbol{\psi}^*_{\theta(1)}$ is given by (31), "•" denotes the Hadamard product, $(.)^{-1/2}$ is an inverse-square-root pointwise operator, and $A[t]$ and $A[t:]$ denote the $t$-th component and the remaining values of the vector $A$, starting at $t$, respectively.

  – ARH and UH: Calculate the correlation coefficient using

$$\rho_\theta = \frac{1}{T-1}\boldsymbol{1}^t_{T-1}\left(\boldsymbol{\psi}^* \bullet (\boldsymbol{\psi}^*_{\theta(1)})^{-1/2}\right). \qquad (32)$$

  – HT: Calculate the correlation coefficient using

$$\rho_\theta = \boldsymbol{\psi}^*[1] \times (\boldsymbol{\psi}^*_{\theta(1)}[1])^{-1/2}. \qquad (33)$$

  – HC: Calculate the covariance parameter using

$$\rho_\theta = \frac{1}{T-1}\boldsymbol{1}^t_{T-1}\left(\sqrt{\psi_{\theta_1}}\,\boldsymbol{\psi}^*\right). \qquad (34)$$

  – ARMAH: Calculate the moving average parameter ($\gamma_\theta$) using

$$\gamma_\theta = \boldsymbol{\psi}^*[1] \times (\boldsymbol{\psi}^*_{\theta(1)}[1])^{-1/2} \qquad (35)$$

  and the correlation parameter ($\rho_\theta$) using

$$\rho_\theta = \frac{1}{T-2}\boldsymbol{1}^t_{T-1}\left(\boldsymbol{\psi}^*[T-2:] \bullet (\boldsymbol{\psi}^*_{\theta(1)}[T-2:])^{-1/2}\right). \qquad (36)$$

  – AD: Calculate the correlation parameter using

$$\rho_{\theta_1} = \boldsymbol{\psi}^*[1] \times (\boldsymbol{\psi}^*_{\theta(1)}[1])^{-1/2} \qquad (37)$$

  and, for $t = 2, \ldots, T-1$, using

$$\rho_{\theta_t} = \frac{\boldsymbol{\psi}^*[t:] \times (\boldsymbol{\psi}^*_{\theta(1)}[t:])^{-1/2}}{\prod_{t'=1}^{t-1}\rho_{\theta_{t'}}}. \qquad (38)$$

– Step 11: A specific covariance pattern model is computed using the appropriate restriction on the free parameters sampled from their joint distribution. The computed restricted covariance matrix is used in the repeated MCMC steps.

The unstructured covariance matrix is the least restrictive version, and assumes unique variance and covariance parameters for the measurements of $\theta$ over time. Each structured covariance pattern is a restricted version of the unrestricted covariance pattern. The parameter space defined by the unstructured covariance pattern model represents all possible combinations of the different parameters. Therefore, this parameter space will contain all possible combinations of parameters of each restricted covariance pattern model. This property is explicitly used in the present sampling procedure. That is, each restriction is used to imply a relationship between the parameters sampled from their joint distribution. Each relationship restricts the free parameters, which are sampled from their joint distribution, where the restriction implies a common covariance or a function of the common covariance parameter, which is defined by the set of free covariance parameters.

By sampling parameters of the unrestricted covariance pattern, potentially all possible restricted versions can be drawn. In the procedure, a restricted version is computed from the unstructured sampled covariance parameters, and the restricted set of parameters is considered to be the implied restricted sample from the unrestricted sample. Since all possible restricted samples are generated from all free possible combinations of parameters, the restricted sample is obtained from the parameter space of all possible combinations of the different parameters of the restricted covariance pattern model.

References

Albert, J.: Bayesian estimation of normal ogive item response curves using Gibbs sampling. J. Educ. Behav. Stat. 17, 251–269 (1992)
Albert, J.A., Chib, S.: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88, 669–679 (1993)
Albert, J.A., Chib, S.: Bayesian residual analysis for binary response regression models. Biometrika 82, 747–769 (1995)
Ando, T.: Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika 94, 443–458 (2007)


Andrade, D.F., Tavares, H.R.: Item response theory for longitudinal data: population parameter estimation. J. Multivar. Anal. 95, 1–22 (2005)
Azevedo, C.L.N.: Multilevel multiple group longitudinal models in item response theory: estimation methods and structural selection under a Bayesian perspective. Unpublished PhD thesis, in Portuguese (2008)
Azevedo, C.L.N., Andrade, D.F.: An estimation method for latent trait and population parameters in nominal response model. Braz. J. Probab. Stat. 24, 415–433 (2010)
Azevedo, C.L.N., Bolfarine, H., Andrade, D.F.: Bayesian inference for a skew-normal IRT model under the centred parameterization. Comput. Stat. Data Anal. 55, 353–365 (2011)
Azevedo, C.L.N., Andrade, D.F., Fox, J.-P.: A Bayesian generalized multiple group IRT model with model-fit assessment tools. Comput. Stat. Data Anal. 56, 4399–4412 (2012b)
Azevedo, C.L.N., Bolfarine, H., Andrade, D.F.: Parameter recovery for a skew-normal IRT model under a Bayesian approach: hierarchical framework, prior and kernel sensitivity and sample size. J. Stat. Comput. Simul. 82, 1679–1699 (2012a)
Bock, R.D., Aitkin, M.: Marginal maximum likelihood estimation of item parameters: an application of an EM algorithm. Psychometrika 46, 317–328 (1981)
Chib, S., Greenberg, E.: Analysis of multivariate probit models. Biometrika 85, 347–361 (1998)
Chib, S., Carlin, B.P.: On MCMC sampling in hierarchical longitudinal models. Stat. Comput. 9, 17–26 (1999)
Congdon, P.: Applied Bayesian Modelling. Wiley, Chichester (2003)
Conaway, M.R.: A random effects model for binary data. Biometrics 46, 317–328 (1990)
De Ayala, R., Sava-Bolesta, M.: Item parameter recovery for the nominal response model. Appl. Psychol. Meas. 23, 3–19 (1999)
DeMars, C.E.: Sample size and the recovery of nominal response model item parameters. Appl. Psychol. Meas. 27, 275–288 (2003)
Douglas, J.A.: Item response models for longitudinal quality of life data in clinical trials. Stat. Med. 18, 2917–2931 (1999)
Dunson, D.B.: Dynamic latent trait models for multidimensional longitudinal data. J. Am. Stat. Assoc. 98, 555–563 (2003)
Eid, M.: Longitudinal confirmatory factor analysis for polytomous item responses: model definition and model selection on the basis of stochastic measurement theory. Methods Psychol. Res. Online 1, 65–85 (1996)
Fitzmaurice, G., Davidian, M., Verbeke, D., Molenberghs, G.: Longitudinal Data Analysis, 1st edn. Chapman & Hall/CRC, London (2008)
Fox, J.-P.: Multilevel IRT assessment. In: van der Ark, M., Sijtsma, K. (eds.) New Developments in Categorical Data Analysis for the Social and Behavioral Sciences, pp. 227–252. Lawrence Erlbaum Associates, Inc., London (2004)
Fox, J.-P., Glas, C.A.W.: Bayesian modification indices for IRT models. Stat. Neerl. 59, 95–106 (2005)
Fox, J.-P.: Bayesian Item Response Modeling: Theory and Applications, 1st edn. Springer, New York (2010)
Gamerman, D., Lopes, H.: Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2nd edn. Chapman & Hall/CRC, London (2006)
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, 2nd edn. Chapman & Hall/CRC, London (2004)
Gelman, A.: Prior distribution for variance parameters in hierarchical models. Bayesian Anal. 1, 515–533 (2006)
Green, P.J.: Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732 (1995)
Hedeker, D., Gibbons, R.D.: Longitudinal Data Analysis, 1st edn. Wiley Series, New York (2006)
Imai, K., van Dyk, D.A.: A Bayesian analysis of the multinomial probit model using marginal data augmentation. J. Econom. 124, 311–334 (2005)
Jennrich, R.I., Schluchter, M.D.: Unbalanced repeated-measures models with structured covariance matrices. Biometrics 42, 805–820 (1986)
Kass, R.E., Raftery, A.E.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)
Liu, L.C., Hedeker, D.: A mixed-effects regression model for longitudinal multivariate ordinal data. Biometrics 62, 261–268 (2006)
McCulloch, R., Polson, N.G., Rossi, P.E.: A Bayesian analysis of the multinomial probit model with fully identified parameters. J. Econom. 99, 173–193 (2000)
Muthén, B.O.: Longitudinal studies of achievement growth using latent variable modeling. Learn. Individ. Differ. 10, 73–101 (1998)
Nunez-Anton, V., Zimmerman, D.L.: Modeling nonstationary longitudinal data. Biometrics 56, 699–705 (2000)
Rencher, R.C.: Methods of Multivariate Analysis, 1st edn. Wiley Series, New York (2002)
Rochon, J.: ARMA covariance structures with time heteroscedasticity for repeated measures experiments. J. Am. Stat. Assoc. 87, 777–784 (1992)
Sahu, S.K.: Bayesian estimation and model choice in item response models. J. Stat. Comput. Simul. 72, 217–232 (2002)
Singer, J.M., Andrade, D.F.: On the choice of appropriate error terms in profile analysis. Statistician 43, 259–266 (1994)
Singer, J.M., Andrade, D.F.: Analysis of longitudinal data. In: Sen, P.K., Rao, C.R. (eds.) Handbook of Statistics 18, Bioenvironmental and Public Health Statistics, pp. 115–160. Elsevier, Amsterdam (2000)
Sinharay, S.: A Bayesian item fit analysis for unidimensional item response theory models. Br. J. Math. Stat. Psychol. 59, 429–449 (2006)
Sinharay, S., Johnson, M.S., Stern, H.: Posterior predictive assessment of item response theory models. Appl. Psychol. Meas. 30, 298–321 (2006)
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A.: Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B 64, 583–639 (2002)
Stern, H.S., Sinharay, S.: Bayesian model checking and model diagnostics. In: Dey, D.D., Rao, C.R. (eds.) Handbook of Statistics 25, Bayesian Modelling, Thinking and Computation, pp. 171–192. Elsevier, Amsterdam (2005)
Tavares, H.R., Andrade, D.F.: Item response theory for longitudinal data: item and population ability parameters estimation. Test 15, 97–123 (2006)
Tiao, G.C., Zellner, A.: On the Bayesian estimation of multivariate regression. J. R. Stat. Soc. B 26, 277–285 (1964)
