
Biometrics 60, 1025–1033

December 2004

Surveillance in Longitudinal Models: Detection of Intrauterine Growth Restriction

Max Petzold,1,2,* Christian Sonesson,1,** Eva Bergman,3 and Helle Kieler3,4

1 Department of Statistics, Göteborg University, 405 30 Göteborg, Sweden
2 Nordic School of Public Health, 402 42 Göteborg, Sweden
3 Department of Women's and Children's Health, Uppsala University Hospital, 751 85 Uppsala, Sweden
4 Department of Medical Epidemiology, Karolinska Institute, 171 77 Stockholm, Sweden

*email: Max.Petzold@statistics.gu.se
**email: Christian.Sonesson@statistics.gu.se

Summary. A new methodology for online detection of intrauterine growth restriction (IUGR) is proposed
where traditional methods for statistical surveillance are applied. Here, deficient growth rate is used to
detect IUGR instead of the common surrogate measure small for gestational age (SGA). Fetal growth is
estimated by repeated measurements of symphysis-fundus (SF) height. At each time point the new method,
based on the Shiryaev–Roberts method, is used to evaluate the growth in SF height. We use Swedish data
to model a normal growth pattern, which is used to evaluate the capability of the new method to detect
IUGR in comparison with a method used in practice today. Results from simulations indicate that the new
method performs considerably better than the method used today. We also illustrate the effect of some
important factors which influence the detection ability and illuminate the tendency of the method used
today to misclassify SGA cases as IUGR.
Key words: Growth; IUGR; Monitoring; Shiryaev–Roberts method; Surveillance; Symphysis-fundus.

1. Introduction

Intrauterine growth restriction (IUGR), considered as a suboptimal growth compared with the genetic growth potential in utero (Goldenberg and Cliver, 1997), has for decades been known to substantially contribute to perinatal mortality and morbidity (Resnik, 2002). An early detection of IUGR may allow for intensive monitoring during pregnancy and for planning of time and method of delivery. There is evidence that screening for IUGR can reduce, e.g., the risk of perinatal death and the number of antenatal admissions and caesarean sections for fetal distress (Alfirevic and Neilson, 1995; Westergaard et al., 2001). There is also evidence of long-term complications for several organ systems as a result of retarded fetal growth (Barker, 1997).

During a pregnancy, fetal growth can be screened by repeated symphysis-fundus (SF) heights or ultrasonic measurements of the fetus (Mongelli and Gardosi, 2000). In contrast to ultrasound, the technique to collect SF measurements is simple and inexpensive and without any side effects. It is thus widely studied as a screening instrument and used routinely in Scandinavia in antenatal care (Challis et al., 2002). The capability of repeated SF measurements to detect deviant fetal growth will be treated in the sequel.

The common methodology for detecting IUGR is to identify fetuses suspected to be small for gestational age (SGA), i.e., have a low weight. By successively comparing the SF measurements from a pregnant woman with a reference curve, SGA is suspected if one measurement is below some chosen centile. This method originates from the work of Westin (1977) and will be referred to as the Gravidogram method. Several reference curves have been developed (Steingrimsdottir, Cnattingius, and Lindmark, 1995; Theron and Thompson, 1995; Walraven et al., 1995; Kieler et al., 1996), but the Gravidogram method will be applied here as a general methodology not using any of these particular curves.

The limitations of using SGA as an indicator of IUGR are well known. A major problem of using actual weight instead of growth is the risk of misclassification. Among infants classified as SGA there will be both genetically small but healthy infants as well as growth restricted. Some growth-restricted and potentially ill infants might be classified as average for gestational age and therefore not detected in time. To reduce the problem of misclassification customized SF reference curves, which include information on background variables, such as the mother's booking weight, parity, and sex of the infant, have been suggested (Gardosi and Francis, 1999). A major problem with the customized curves is that the sex of the infant is a crucial predictor of the level of the curve and the sex is normally not known. However, customized compared to standard curves are an improvement but the key problem
remains as long as actual weight (via actual SF height) instead of growth is used to measure growth restriction (Altman and Hytten, 1989).

A good indicator of the current growth rate would be the increase in SF height from the previous time point to the current (Royston, 1995). By using the current growth we can expect less impact from background variables because each individual acts as its own control. Furthermore, it seems plausible that the prognosis of a pregnancy with SF measurements following the 50th percentile in the reference curve then drifting down to the 5th percentile some time points later will differ from a pregnancy with SF heights remaining stable on the 5th percentile. Therefore, the growth rate may be a better indicator of IUGR than the SF height itself. However, one has to remember that the information about the size of the fetus is lost if focusing solely on the growth rate. The size of the fetus contains information, which is of most importance for other purposes than as an IUGR indicator.

The question is now how to use growth data optimally to detect IUGR. From the theory of statistical surveillance, we know that the use of only the last observation is generally not effective to detect changes in an observed process (Frisén and de Maré, 1991). Instead all available observations should be used to improve the ability to detect IUGR. It is thus of interest to develop a theoretically well-motivated methodology based on the repeated SF measurements.

This article is organized as follows. In Section 2, we estimate the normal growth pattern using data from a previous Swedish study. In Section 3, we discuss the collection of SF data to use for surveillance. A new way using self-measurements is suggested. We discuss the general framework for statistical surveillance in Section 4. Here, we also present the Gravidogram method in detail and derive the new SR method. In Section 5, inferential issues of the methods are evaluated using extensive simulations. In Section 6, we discuss different types of possible improvements of the proposed surveillance system, both from a theoretical and a practical point of view and give some concluding remarks.

2. Estimating the Normal Growth Pattern

In this section, the aim is to derive a model capturing the essential parts of the growth pattern, which enables an inferential examination of the surveillance methods in the sequel. To model the normal growth pattern a dataset with repeated SF measurements on 2255 healthy women living in Sweden, who participated in a multicenter prospective trial on routine ultrasound in the second trimester between 1986 and 1987, was used (Kieler et al., 1996). In the data, gestational age, expressed as completed gestational weeks, was calculated from the measurements of ultrasonic biparietal diameter in the second trimester. Women with miscarriages, induced abortions, multiple pregnancies, pregnancy-induced hypertension, or delivery before 37 weeks of gestation were excluded.

Having repeated measurements on each subject the total variability of the response variable can be explained partly by the between-subject variability reducing the unexplained residual variance. Here, genetic differences will cause between-subject variability that is expressed in the model as different intercepts. A genetically small fetus will have a small value of the intercept while a genetically large fetus will have a large value. For an arbitrary ith individual, the relation between ln SF and gestational age t (measured in completed weeks) is modeled as

$$\ln SF_i(t) = B_{0i} + \beta_1 t + \beta_2 \sqrt{t} + e_i(t), \qquad (1)$$

where $B_{0i}$ and $e_i(t)$ are independent realizations from the $N(\mu_0, \sigma^2_{B_0})$ and the $N(0, \sigma^2_e)$ distributions, respectively. The repeated observations of an individual are further assumed to be independent over time, i.e., $\mathrm{cov}(e_i(u), e_i(v)) = 0$ for $u \neq v$. The estimates for the model in (1) were obtained as: $\hat{\mu}_0 = -0.8949$, $\hat{\sigma}^2_{B_0} = 0.001621$, $\hat{\beta}_1 = -0.08256$, $\hat{\beta}_2 = 1.2322$, and $\hat{\sigma}^2_e = 0.001455$, which fitted the data well. This class of models has been proposed in a number of fetal growth studies (see, e.g., Royston and Altman, 1994; Gurrin et al., 2001). A further discussion of the model choice can be found in Section 6.

The growth between successive time points for the ith individual can easily be found from (1) as

$$\begin{aligned}
\Delta_i(t) &= \ln SF_i(t) - \ln SF_i(t-1) \\
            &= \beta_1 + \beta_2\,(\sqrt{t} - \sqrt{t-1}) + \varepsilon_i(t), \qquad (2)
\end{aligned}$$

where $\varepsilon_i(t) = e_i(t) - e_i(t-1)$. Each subject now acts as its own control and the individual effect is eliminated by using the $\Delta_i(t)$. Thus, the $\Delta_i(t)$ can be regarded as a summary statistic (Matthews et al., 1990). From now on, the index i will be suppressed. Successive $\Delta$ values will not be independent but follow a moving average process of order 1. A time sequence of $\Delta$ values will have a variance–covariance matrix of Toeplitz type; $\Sigma = (\sigma^2_{uv})$, where $\sigma^2_{uv} = 2\sigma^2_e$ if $u = v$, $\sigma^2_{uv} = -\sigma^2_e$ if $|u - v| = 1$, and zero otherwise. However, this time dependency is taken care of in the surveillance method we propose in Section 4.2.

3. Sampling Scheme of the Symphysis-Fundus Measurements

The sampling scheme of the SF measurements is of great importance for the detection ability. The risk of not measuring the SF height at every time point is the risk of not observing the growth when the IUGR occurs. In this case, we will have a delay in the detection of the IUGR, which is at least as long as the time to the next measurement. Normally, the SF measurements are not regularly spaced in time. In general, the time intervals between the measurements are shorter near delivery (every week instead of every third or fourth as in the beginning of the series). In practice, a fixed schedule can be hard to follow and in our dataset the number of SF measurements differs between individuals from minimum 2 to maximum 17 spread from the 14th to the 42nd week of gestation.

However, in this study, we will assume a sampling scheme where we collect measurements of SF every week starting from week 20. To shorten the intervals between measurements without increasing the number of antenatal visits would be possible if the SF heights are measured by the pregnant women themselves. A recently published pilot study in England reports positive results of training of women to measure their own SF heights (Boulos et al., 1999). The study population was found to perform as well as midwives. The study was blinded in the way that the women marked the length on a blank tape. The SF height was then measured from the blank
tape using an ordinary measuring tape. In a coming Swedish study of self-measurements, the women are supposed to take repeated SF measurements at each occasion (once a week) facilitating estimation of within- and between-individual variance. The measurements will also be compared to the measurements from the routine visits at the health center.

4. Methods for Statistical Surveillance

Statistical surveillance or statistical process control (SPC) was first used in industrial settings to control the quality of the manufactured products (Shewhart, 1931). The fields of applications have broadened since and statistical surveillance is now used in several areas. Examples include the monitoring of fetal heart rate during labor (Frisén, 1992) and detection of an increased incidence of a disease (Sonesson and Bock, 2003).

Statistical surveillance is a sequential decision procedure, where we at each decision time point want to determine whether the process we observe is in or out of control. The process is said to be out of control if it has undergone an important change, which we are interested in. Here we say that the process is in control if the fetus is growing at the expected rate, while it is out of control if an IUGR has occurred. The change occurs at an unknown time point $\tau$, and our aim is to detect it as quickly as possible. Thus, by collecting data sequentially in time we want to discriminate between the two events; $D(s) = \{\tau > s\}$ and $C(s) = \{\tau \le s\}$, where s is the decision time point. To do this we use an alarm system, which consists of two parts: an alarm statistic, which is a function of the observations up to the current time point and an alarm limit. The time of an alarm (the indicator that the process is out of control) is the first time the alarm statistic exceeds the alarm limit.

There are several ways to construct the alarm system depending on the desired features of the system. The features of the methods are generally evaluated both with respect to false alarms as well as motivated ones. How to optimally choose the alarm statistic and the alarm limits depends on the process under study, what is desired to detect, and at what time points the highest detection strength is desirable. These are the same types of questions that arise also in a traditional hypothesis-testing situation, although the time component is not present. However, the sequential decision situation makes the standard hypothesis-testing tools nonapplicable.

4.1 The Gravidogram Method

The simplest way to construct a surveillance method is to use only the last observation made of the process to form the alarm statistic. If this observation deviates enough from what is expected, an alarm is triggered (Shewhart, 1931). In surveillance of fetus growth, this type of surveillance method originates from the work of Westin (1977) to detect SGA. Today this method is widely used also for the detection of IUGR. The Gravidogram method is based on the absolute size of the fetus. An alarm is triggered if the current SF value is below the 2.5th population percentile in a chosen reference curve. Note that the current SF value is an indicator of the total growth until time s and not an indicator of the current growth rate. In this study, we have modeled ln SF(s) in (1) using a longitudinal approach. As our main focus is on the surveillance methodology (and not on any particular reference curve from the literature), we will evaluate the Gravidogram methodology using the model in (1) as our reference curve. The time of an alarm can then be written as

$$t_A = \min\{s;\ \ln SF(s) < G(s)\},$$

where $G(s) = E[\ln SF(s)] - 2\,(\sigma^2_{B_0} + \sigma^2_e)^{1/2}$. In models without any random intercept, properties of surveillance methods which use only the last observation can generally be derived analytically. However in this case, the random intercept complicates matters.

4.2 The SR Method

In this section, we will derive a new method for the detection of IUGR studied in this article. It is based on the Shiryaev–Roberts method (Shiryaev, 1963; Roberts, 1966) and will be referred to as the SR method. Many of the commonly used surveillance methods can be formulated in terms of conditional likelihood ratios. The optimal method for minimizing the expected delay (from the event to the detection) of an alarm is the full likelihood-ratio method. However, this method requires an assumption of the distribution of the change point. Because no reliable prior information about the distribution of the time point at which IUGR occurs is available, we will instead base our method on the Shiryaev–Roberts method. It can be regarded as being the full likelihood-ratio method using a noninformative prior for the time of the change point.

We will base the SR method on $\Delta(t)$. At time point s, let $\Delta_s$ be the vector of observations with variance–covariance matrix $\Sigma_s$. Let $L(s, t')$ be the (conditional) likelihood ratio for the case $\tau = t'$. It follows that

$$\begin{aligned}
L(s, t') &= f(\Delta_s \mid \tau = t') \,/\, f(\Delta_s \mid \tau > s) \\
         &= \frac{\exp[-\tfrac{1}{2}\{\Delta_s - \mu_s(t')\}\,\Sigma_s^{-1}\,\{\Delta_s - \mu_s(t')\}']}{\exp[-\tfrac{1}{2}\{\Delta_s - \mu_s\}\,\Sigma_s^{-1}\,\{\Delta_s - \mu_s\}']} \\
         &= \exp[-\tfrac{1}{2}\{\Delta_s - \mu_s(t')\}\,\Sigma_s^{-1}\,\{\Delta_s - \mu_s(t')\}' + \tfrac{1}{2}\{\Delta_s - \mu_s\}\,\Sigma_s^{-1}\,\{\Delta_s - \mu_s\}'],
\end{aligned}$$

where $\mu_s(t')$ is the vector of expected values of $\Delta_s$ for the case $\tau = t'$ and $\mu_s$ is the vector of expected values of $\Delta_s$ for the case $\tau > s$, which is derived from formula (2). For the SR method all conditional likelihoods are weighted equally in the alarm statistic together with a constant alarm limit and an alarm is given at

$$t_A = \min\Big\{s;\ \sum_{t'=20}^{s} L(s, t') > K\Big\}.$$

The SR method allows us to specify the out of control state, $\mu_s(t')$, that is, what happens with the growth pattern when an IUGR occurs. We can thus optimize the method to detect a certain type of IUGR. A restricted growth (the event to be detected) will correspond to a decrease in the growth rate compared to the expected one. Here we will optimize the SR method for the case of an IUGR with a completely stopped growth (see Figure 1). This is usually the result of placental insufficiency secondary to extrinsic factors such as hypertension and diabetes mellitus (Kurjak, 1998).
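
As an illustration of Sections 4.1 and 4.2, the following Python sketch simulates ln SF from model (1), forms the growth values in (2), and evaluates the two alarm rules. It is a minimal sketch under stated assumptions, not the implementation used for the simulations in this article. The function names are hypothetical. The minus signs on the intercept mean and on the linear coefficient are assumed, chosen so that the expected curve stays within the ln SF range shown in Figure 1. A completely stopped growth is taken to mean that the expected growth is zero from the change week onwards. The sum over candidate change weeks runs over the weeks for which a growth value exists (week 21 onwards when measurements start in week 20). The statistic is returned on the log scale to avoid numerical overflow, so it would be compared with ln K rather than K.

```python
import numpy as np

# Estimates quoted in Section 2. ASSUMPTION: the minus signs on MU0 and BETA1
# are chosen so that E[ln SF] runs from about 2.96 (week 20) to 3.60 (week 40).
MU0, BETA1, BETA2 = -0.8949, -0.08256, 1.2322
VAR_B0, VAR_E = 0.001621, 0.001455


def delta_mean(weeks, tau=None):
    """E[Delta(t)] from (2) for t = weeks[1:]; ASSUMED zero from week tau onwards."""
    t = np.asarray(weeks[1:], dtype=float)
    mean = BETA1 + BETA2 * (np.sqrt(t) - np.sqrt(t - 1.0))
    return mean if tau is None else np.where(t >= tau, 0.0, mean)


def expected_ln_sf(weeks, tau=None, intercept=MU0):
    """Expected ln SF path on consecutive weeks, built by accumulating E[Delta]."""
    t0 = float(weeks[0])
    start = intercept + BETA1 * t0 + BETA2 * np.sqrt(t0)
    return start + np.concatenate(([0.0], np.cumsum(delta_mean(weeks, tau))))


def simulate_ln_sf(rng, weeks, tau=None):
    """One simulated pregnancy: random intercept plus independent residuals."""
    b0 = rng.normal(MU0, np.sqrt(VAR_B0))
    noise = rng.normal(0.0, np.sqrt(VAR_E), size=len(weeks))
    return expected_ln_sf(weeks, tau, intercept=b0) + noise


def gravidogram_alarm(ln_sf_s, s):
    """True if ln SF(s) < G(s) = E[ln SF(s)] - 2 * sqrt(var_B0 + var_e)."""
    g = MU0 + BETA1 * s + BETA2 * np.sqrt(s) - 2.0 * np.sqrt(VAR_B0 + VAR_E)
    return bool(ln_sf_s < g)


def log_sr_statistic(ln_sf, weeks):
    """ln of the sum over t' of L(s, t'), where s = weeks[-1] is the decision week."""
    delta = np.diff(ln_sf)
    n = delta.size
    if n == 0:  # two measurements are needed before the SR method can start
        return -np.inf
    # MA(1) covariance of Delta: 2*var_e on the diagonal, -var_e next to it.
    sigma = VAR_E * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

    def quad(d):
        return d @ np.linalg.solve(sigma, d)

    q_in_control = quad(delta - delta_mean(weeks))
    # ln L(s, t') = -1/2 * (out-of-control form) + 1/2 * (in-control form)
    log_l = [0.5 * (q_in_control - quad(delta - delta_mean(weeks, tau=t)))
             for t in weeks[1:]]
    return float(np.logaddexp.reduce(log_l))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    weeks = np.arange(20, 36)  # weekly measurements, decision week s = 35
    healthy = simulate_ln_sf(rng, weeks)
    restricted = simulate_ln_sf(rng, weeks, tau=30)
    for name, y in (("normal growth", healthy), ("growth stopped in week 30", restricted)):
        print(name, log_sr_statistic(y, weeks), gravidogram_alarm(y[-1], weeks[-1]))
```

The alarm limit K is deliberately left out of the sketch. In Section 5.1 it is set so that the SR method has the same false alarm probability as the Gravidogram method, which would require a simulation over many in-control pregnancies.
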
Although the alarm statistic for the SR method implies that the conditional likelihoods are weighted equally, recent observations have more weight than old ones, which is intuitively appealing.

Figure 1. The expected growth pattern for a normal growth (solid line) and for the case when IUGR (here a completely stopped growth) occurs in the 25th week of gestation (dotted line). [Plot: ln SF, from 2.9 to 3.7, against gestational age (t), weeks 20 to 40.]

4.3 Methods Based on the Residual Score

In Hooper, Mayes, and Demianczuk (2002) surveillance of fetal growth was discussed, but no surveillance method was explicitly proposed. A model for the growth in estimated fetal weight by ultrasonic measurements was proposed. The dataset used for the model consisted of 13,593 ultrasonic examinations on 7888 individuals, which on average is less than 2 per subject and thus hampers the estimation of a mixed model as the one in (1). Instead the transformation to residual scores was suggested to account for the individual effect. Although it might be useful in some situations, there is no special merit in using it for the type of data we generate from model (1) in our simulation study. From the model in (1), the residual score, R(s), for a particular decision time point, s, can be found by straightforward calculations to be

$$R(s) = k(s)\Big\{\ln SF(s) - E[\ln SF(s)] - w(s)\sum_{t=20}^{s-1}\big(\ln SF(t) - E[\ln SF(t)]\big)\Big\},$$

where k(s) and w(s) are constants depending on s and the term $w(s)\sum_{t=20}^{s-1}(\ln SF(t) - E[\ln SF(t)])$ can be regarded as an estimator of $B_{0i} - \mu_0$. By calculating w(s) for each s one can show that this estimator is biased. Slowly however, as more data are collected, the bias decreases. Therefore, we prefer the $\Delta$-transformation, which eliminates the individual effect immediately.

5. Properties of the Methods

There are two types of alarms to consider: false alarms and motivated ones and we have to face a tradeoff between the rate of false alarms and short delay times for motivated ones. We will here illustrate the distribution of false alarms and the effect of some factors on the ability to detect IUGR (the length of the time period for detection, the time point when the IUGR occurs, the residual variance). These factors influence both the level of detection ability and the relative performance of the methods. For the Gravidogram method the dependency of the genetic size of the fetus will also be illustrated. However, every situation is a combination of these factors which contribute to a complex pattern of performance. We will therefore illustrate the performance with some examples and discuss the general implications of the various factors on both the level of detection ability and the relative performance of the methods.

The surveillance methods were evaluated using simulations. In this way we can investigate with high accuracy how the methods will perform when applied in clinical practice. The ln SF values were simulated using model (1) with the estimated parameter values from Section 2, and the $\Delta$ values were then calculated from the ln SF values using (2). For each evaluation of the performance of the methods, 1,000,000 replicates have been used.

5.1 The Distributions of the False Alarms

To make the evaluation of the detection ability comparable the probability of a false alarm is fixed within a normal length of a pregnancy (here 40 weeks). The Gravidogram method was found to have a 19.2% probability of a false alarm within this period. In medical terms, the specificity of the method is 80.8% (a false-positive rate of 19.2%) if we regard the whole pregnancy to represent one test period. For comparability, the alarm limit for the SR method was set to achieve the same probability.

Figure 2. The cumulative false alarm distributions as a function of the gestational age (t). [Plot: $P(t_A \le t)$, from 0.00 to 0.20, against gestational age (t), weeks 20 to 40, for the Gravidogram and SR methods.]

Of great importance when comparing the methods are the distributions of the false alarms. In Figure 2, the cumulative false alarm distributions are displayed. For the Gravidogram method the distribution of the false alarms is approximately geometric, with highest probability of a false alarm at the first decision time point and thereafter a geometrically decreasing probability giving the retarding shape of the cumulative distribution. This is not the case for the SR method. For the SR method we need two SF measurements to start the surveillance, which is a drawback due to the use of $\Delta$ values. Hence, we cannot have a false alarm at the first time point. The low
frequency of false alarms for early time points is typical for the SR method. For later time points we have a higher rate. Since the false alarm distribution is an indicator of the sensitivity of the methods at different time points, the ability to detect IUGR will also depend on it. This means that in general the Gravidogram method will perform relatively better for cases of IUGR occurring at the start of the monitoring period. However, for severe cases of IUGR, the ability to save a fetus is small for early time points (Kurjak, 1998). Therefore, it could be argued that the false alarm distribution of the SR method is preferable to that of the Gravidogram method from a medical point of view.

A most important issue to deal with is the difference between a genetically small but healthy fetus and a fetus suffering from IUGR. The way the Gravidogram method is constructed, a misclassification of SGA fetuses as IUGR is unavoidable. As previously stated, the genetic differences are expressed as different intercepts for different individuals in the mixed model for the expected growth pattern; see formula (1). The value of the intercept is highly influential on the rate of false alarm. By using a fixed intercept set to a certain percentile from the distribution of $B_0$, this was examined for the Gravidogram method. The probability of a false alarm was then found to be 93.3%, 25.8%, 3.8%, and 0.3%, respectively, for the 5th, 25th, 50th, and 75th percentile of the intercept. Note that the probability of a false alarm is independent of the genetic size of the fetus for the SR method because $\Delta$ is independent of the intercept $B_0$.

In medical practice, careful attention must be taken when an alarm is triggered. Because the alarms can be false, for example, due to measurement errors, more extensive medical examinations have to be performed. For that purpose, ultrasonic measurements are especially useful for a deeper analysis of the state of a fetus before starting complicated medical actions.

5.2 Ability of Successful Detection

A common way to evaluate test procedures in medicine is to use the sensitivity. Usually the sensitivity is defined as the proportion of the positives which are correctly identified by a test procedure. In this application the sensitivity is the proportion of IUGR fetuses which are correctly classified. However, the sensitivity does not say anything about when the classification is made. The delay time in detection of IUGR is crucial and therefore we must use measures of evaluation, which include a time component as well. When evaluating motivated alarms, we find it most appropriate to consider the probability of successful detection,

$$PSD(d, \tau) = P(t_A - \tau \le d \mid t_A \ge \tau).$$

It stands for the probability that the IUGR is detected with a delay time no longer than d when the IUGR occurred at time point $\tau$. This measure of evaluation has been used in medical applications before (Frisén, 1992), e.g., when monitoring a fetus heart rate during labor. Both the sensitivity and the probability of successful detection thus deal with the proportion of correctly classified positives. However for situations when the sensitivity is generally used, the time component is not present. Therefore, the probability of successful detection can be regarded as a measure of sensitivity taking also the delay time of the detection into account. There are also other ways of evaluating surveillance methods. The most commonly used ones however focus on the expected delay time. For the detection of IUGR the important feature is not the expected delay time but rather the proportion of fetuses we can detect within a reasonable time.

Figure 3. The probability of successful detection (PSD) as a function of the time in weeks, d, available for detection. The week of gestation when the IUGR occurs, $\tau$, is indicated in the labels. [Plot: $PSD(d, \tau)$, from 0.00 to 1.00, against d, 0 to 5 weeks; curves for SR ($\tau$ = 25), SR ($\tau$ = 30), Gravidogram ($\tau$ = 25), and Gravidogram ($\tau$ = 30).]

5.2.1 The influence of the time available for the detection of IUGR. The detection ability as a function of the delay time available for the detection, d, is displayed in Figure 3. It is most important to notice that the detection ability of the SR method is considerably higher than for the Gravidogram method.

Observe that the level of detection is almost 70% within 2 weeks for the SR method if the IUGR occurs in the 30th week of gestation. In this period of the pregnancy, 2 weeks alone would correspond to a normal interval between the regular visits to a health center. The corresponding sensitivity for the Gravidogram method is only about 30%.

5.2.2 The influence of the time point of the IUGR. It should be clear that a large decrease in the expected value of ln SF(t) is easier to detect than a small one. Therefore, the level of detection is decreasing as a function of time due to a decreased expected growth rate (Figure 1) as can be seen in Figure 4. Although the SR method needs two observations before an alarm can be triggered, it has a higher PSD than the Gravidogram method also for IUGR occurring before 25 weeks of gestation. Normally, IUGR with a completely stopped growth is believed to occur in the third trimester (Owen et al., 2001) in which case the detection ability is much higher for the SR method. A large value of the time available for detection (d) favors the SR method even more.

5.2.3 The influence of the residual variance. The technique of measuring SF heights has been criticized for being imprecise. However, standardized measuring procedures and training can substantially decrease the variance in the data (Rogers and Needham, 1985; Johnsen, Jacobsen, and Knoff, 1988). One crucial issue introducing self-measurements is the quality of the data reported. The residual variance, $\sigma^2_e$, influences
the level of the detection ability as well as the relative performance between the Gravidogram and the SR method. Here, we have examined the effect of a decrease in the residual variance by 50%. This could be the result of several factors such as extensive training in measuring SF height or as a result of measuring the SF height more than once at each occasion and thus reducing the residual variance by considering the mean of the observations taken.

Figure 4. The probability of successful detection as a function of the week of gestation the IUGR occurs. The value of d is indicated in the labels. [Plot: $PSD(d, \tau)$, from 0.00 to 1.00, against week of gestation, 20 to 40; curves for SR (d = 1), SR (d = 2), Gravidogram (d = 1), and Gravidogram (d = 2).]

In Figure 5, $PSD(1, \tau)$ values are presented for the two values of the residual variance. Because $V(\Delta(t)) = 2\sigma^2_e$ and $V(\ln SF(t)) = \sigma^2_{B_0} + \sigma^2_e$ the effect of reducing the residual variance is larger for the SR method than for the Gravidogram method. For the SR method the detection ability is here generally increased by over 50%. Practical clinical work to reduce the residual variance in SF measurements is thus of greatest interest.

Figure 5. The probability of successful detection within 1 week as a function of the week of gestation the IUGR occurs. The cases with a reduced residual variance by 50% are indicated by * in the labels. [Plot: $PSD(1, \tau)$, from 0.00 to 1.00, against week of gestation, 20 to 40; curves for SR, SR*, Gravidogram, and Gravidogram*.]

5.2.4 The influence of the genetic size of the fetus. As was seen in Section 5.1, the genetic size of the fetus was important for the probability of a false alarm for the Gravidogram method. This is also the case for the detection ability when the IUGR occurs as can be seen in Figure 6. The Gravidogram method is very sensitive to detect IUGR occurring for genetically small fetuses, but not for genetically large ones. Note that the detection ability of the SR method is independent of the value of the intercept. This means that the detection ability is independent of the genetic size of the fetus, which might be preferred from an ethical point of view when applying the method as a screening instrument in practice.

Figure 6. The probability of successful detection as a function of the time in weeks (d) available for detection, when the IUGR occurs in the 30th week of gestation. The percentile of the intercept used is indicated in the labels. [Plot: $PSD(d, 30)$, from 0.00 to 1.00, against d, 0 to 5 weeks; curves labeled Gravidogram, Gravidogram (25%), Gravidogram (50%), and Gravidogram (75%).]

5.3 The Influence of the Sampling Scheme on the False Alarm Distribution and the Ability of Detection

The frequency of measurements in the sampling scheme has a clear influence on the properties of the methods. In general, a sparse scheme will give less information and on average increase the delay time of an alarm measured in real time for all surveillance methods. However, the detection ability does also depend on the distribution of false alarms. A sparse sampling scheme will allow more false alarms per sampling time point than a frequent sampling scheme if we use a fixed false alarm rate for the whole surveillance period. This factor favors the sparse scheme, but since the frequent scheme provides more information, it is to be preferred. If we had knowledge about the true distribution of IUGR, the sampling schemes could have been adapted to it. This would have implied sparse sampling when the probability of IUGR is small and frequent sampling during the time periods when IUGR is expected to occur.

6. Discussion

In this article, we have shown that the methodology used in practice today in Sweden and several other countries for detection of IUGR can be considerably improved still using SF measurements. Three important aspects of this article are the following. First, we have focused on the growth rate in symphysis-fundus height as being the important characteristic of naturally growing fetuses and not the absolute size of the fetus. An important advantage of using the growth rate is that additive effects from background variables (smoking, parity, sex of the child) in the model describing the absolute
size do not influence the growth rate. Second, we have An example is when the growth rate decreases and remains at
introduced a theoretically well-motivated surveillance method a lower rate than expected throughout the pregnancy. Often
to detect IUGR. In order to achieve an eective surveillance this pattern occurs with extrinsic conditions such as intrauter-
method, the information in the observed data must be treated ine infections or intrinsic embryonic conditions such as con-
correctly as by the SR method. Third, we have used a mea- genital malformations (Kurjak, 1998). To investigate the ro-
sure of evaluation, PSD, which focuses not only on whether bustness of the IUGR specication, we have evaluated the
the IUGR is detected or not, but also on the delay time of detection of a completely stopped growth when specifying
the detection. the SR method for a decreased growth rate. Notable deterio-
Surveillance in mixed models is rare in the literature. The ration in detection ability was only present for early changes
focus in this article is the principles of surveillance. The mixed (before the 25th week of gestation), but not for later ones,
model in (1) has been used for our simulation study and as where the nonoptimized SR method performed almost iden-
such it is a reasonable model for describing the growth in tically to the correctly specied version. This indicates that
SF height. However, the model is not meant to be used for the SR method is robust to the specication of the type of
causal interpretations. If that were the target of the article, it IUGR.
might be argued that the model is oversimplied. In general, We believe that self-measurements used together with the
the SF measures can be suspected to be dependent on sex of SR method have a large potential as a screening instrument
the infant and maternal variables such as smoking and parity in clinical practice. In practice today, cases where we suspect
(Mongelli and Gardosi, 1999). For our purpose though, to IUGR (e.g., where SGA is suspected) are measured more fre-
examine dierent surveillance methodologies, model (1) is quently than those considered to be out of risk. However,
clearly relevant. A misspecication in (1) in terms of the IUGR can occur at any time point during pregnancy. One
functional form of the expected growth or an error in the major advantage in using self-measurements is that all women
estimation the parameters 1 and 2 will not change the con- are screened equally often throughout the whole pregnancy
clusions regarding the dierences between the methods re- and not based on the initial measurement values. We also
ported from the simulation study, if the model captures the believe that self-measuring at home will be better and eas-
true mean slope approximately. However, the exact numerical ier for the mothers and that the number of missing data
results from the simulations might change slightly. will be small. Self-measuring can also contribute to a more
In our approach we have not used any assumptions on the cost-eective antenatal-care program by reducing the num-
distribution of the time point when the IUGR occurs. If we ber of visits to the health center. A recent study of standard
had information about the distribution from theory or previ- antenatal-care programs in terms of clinical outcomes, per-
ous studies we could focus the detection ability to the most ceived satisfaction, and costs concluded that a model with a
probable time points and thus achieve more ecient meth- reduced number of antenatal visits could be introduced with-
ods. Knowing the distribution of the time point of IUGR and out risk to the mother or baby (Carroli et al., 2001). A com-
a value of the incidence of IUGR, we can calculate the pre- bined program of self-measuring and visits to the health center
dictive value (Frisen, 1992) of the methods which helps us in could then be very cost-eective and increase the performance
the choice of actions to take if an alarm is triggered. The pre- of the antenatal care.
dictive value is a function of the time point of an alarm and For the clinical practitioner, the SR method is not as easy
dened as PV(s) = P(C(s) | tA = s), which is the probability to grasp as for example the Gravidogram method. This should
that the IUGR has occurred given that an alarm is triggered not, however, be taken as a motivation for using the nonsat-
at a specic time point. A constant predictive value is in many isfactory Gravidogram method. A computer program, which
situations desirable since it implies that the actions to follow performs all necessary calculations, can be used by the prac-
an alarm can be the same whenever it is triggered. The alarms titioner, who only has to register the SF measurements in a
of the Gravidogram method will not be trustworthy due to database. Also the mothers can be equipped at home with a
the high intensity of early false alarms. As a result, the SR simple calculator designed for this monitoring. What needs
method will have a considerably higher predictive value than to be done before applying the technique in practice is a
the Gravidogram method. Also in this respect the SR method large study of self-measurements in order to construct an
will be preferable to the Gravidogram method. appropriate expected growth curve. An interesting extension
We have chosen to study the behavior of the SR method of this work is to apply multivariate surveillance (Lundbye-
since it is known to have several desirable properties such as Christensen, 1989; Wessman, 1998) where the monitoring can
a near optimal expected delay and for some cases an almost be based on both weekly SF measurements and infrequent
constant predictive value. If the requirement of the method is ultrasonic measurements.
very specic, such as stated here where focus has been solely
on the PSD, the SR method is not the optimal method. An- Acknowledgements
other weighting of the conditional likelihoods could be chosen. The authors thank Professor Marianne Frisen for much good
However that would require a focus on a particular value of advice during the work with this article. Robert Jonsson, Eva
time of delay (d). Andersson, the editor, and two anonymous referees are ac-
An advantage of the SR method is that it can be opti- knowledged for valuable comments.
mized to detect a specic kind of growth. Here, the SR method
has been specied for the detection of a completely stopped Resume
growth. For the detection of other types of IUGR it would be Une nouvelle methodologie pour la detection en ligne du
preferable to specify the SR method according to this type. ralentissement de la croissance intra-uterine (IUGR) est
1032 Biometrics, December 2004

proposee la, ou des methodes statistiques traditionnelles pour Johnsen, T. S., Jacobsen, G., and Kno, T. (1988). The ef-
la surveillance sont utilisees. Dans ce cadre, une mesure de fect of practical training in obstetrics among medical
croissance faible est utilisee pour detecter lIUGR au lieu de la studentsSymphysis fundal height measurements. Med-
mesure intermediaire petit pour lage gestationnel (SGA). ical Education 22, 438444.
La croissance foetale est evaluee par des mesures repetees de Kieler, H., Axelsson, O., Hellberg, D., Nilsson, S., and
la hauteur symphyse-fundus (SF). A chaque point la nou-
Waldenstrom, U. (1996). Serial measurements of
velle methode, basee sur la methode de Shiryaev-Roberts, est
utilisee pour evaluer la croissance de la hauteur de la SF. Nous symphysis-fundus height in women with ultrasonically
avons utilise des donnees suedoises pour modeliser un modele dated pregnancies. Journal of Obstetrics and Gynaecology
de croissance normal, ce modele permet devaluer la capacite 16, 228229.
de la nouvelle methode dans la detection de lIUGR en com- Kurjak, A. (1998). Textbook of Perinatal Medicine: A
paraison a la methode utilisee en pratique aujourdhui. Les Comprehensive Guide to Modern Clinical Perinatology.
resultats des simulations indiquent que la nouvelle methode Carnforth, U.K.: Parthenon.
obtient des performances considerablement meilleures que la Lundbye-Christensen, S. (1989). Monitoring pregnancy. Com-
methode actuelle. Nous illustrons aussi leet de quelques fac- munications in StatisticsStochastic Models 5, 383399.
teurs importants, qui inuencent la capacite de detection et Matthews, J. N. S., Altman, D. G., Campbell, M. J., and
eclairent la tendance de la methode actuelle a mal classer les
Royston, P. (1990). Analysis of serial measurements in
cas dIUGR a partir des SGA.
medical-research. British Medical Journal 300, 230235.
Mongelli, M. and Gardosi, J. (1999). Symphysis-fundus height
References and pregnancy characteristics in ultrasound-dated preg-
Alrevic, Z. and Neilson, J. P. (1995). Doppler ultrasonog- nancies. Obstetrics and Gynecology 94, 591594.
raphy in high-risk pregnanciesSystematic review with Mongelli, M. and Gardosi, F. (2000). Fetal growth. Current
metaanalysis. American Journal of Obstetrics and Gyne- Opinion in Obstetrics and Gynecology 12, 111115.
cology 172, 13791387. Owen, P., Maharaj, S., Khan, K. S., and Howie, P. W.
Altman, D. G. and Hytten, F. E. (1989). Intrauterine growth- (2001). Interval between fetal measurements in predict-
retardationLets be clear about it. British Journal of ing growth restriction. Obstetrics and Gynecology 97,
Obstetrics and Gynaecology 96, 11271128. 499504.
Barker, D. J. P. (1997). The long-term outcome of retarded Resnik, R. (2002). Intrauterine growth restriction. Obstetrics
fetal growth. Clinical Obstetrics and Gynecology 40, 853 and Gynecology 99, 490496.
863. Roberts, S. W. (1966). A comparison of some control chart
Boulos, A. N., Griths, M., Allott, H., and Holt, E. M. (1999). procedures. Technometrics 8, 411430.
Trial of self-administered antenatal care: Maternal sym- Rogers, M. S. and Needham, P. G. (1985). Evaluation of fun-
physis fundal height measurements. Journal of Obstetrics dal height measurement in antenatal care. Australian and
and Gynaecology 19, 623. New Zealand Journal of Obstetrics and Gynaecology 25,
Carroli, G., Villar, J., Piaggio, G., et al. (2001). WHO sys- 8790.
tematic review of randomised controlled trials of routine Royston, P. (1995). Calculation of unconditional and condi-
antenatal care. Lancet 357, 15651570. tional reference intervals for fetal size and growth from
Challis, K., Osman, N. B., Nystrom, L., Nordahl, G., and longitudinal measurements. Statistics in Medicine 14,
Bergstrom, S. (2002). Symphysis-fundal height growth 14171436.
chart of an obstetric cohort of 817 Mozambican women Royston, P. and Altman, D. G. (1994). Regression using
with ultrasound-dated singleton pregnancies. Tropical fractional polynomials of continuous covariates-
Medicine and International Health 7, 678684. parsimonious parametric modeling. Applied Statistics
Frisen, M. (1992). Evaluations of methods for statistical Journal of the Royal Statistical Society, Series C 43,
surveillance. Statistics in Medicine 11, 14891502. 429467.
Frisen, M. and de Mare, J. (1991). Optimal surveillance. Shewhart, W. A. (1931). Economic Control of Quality of Man-
Biometrika 78, 271280. ufactured Product. London: MacMillan.
Gardosi, J. and Francis, A. (1999). Controlled trial of fun- Shiryaev, A. N. (1963). On optimum methods in quickest de-
dal height measurement plotted on customised antenatal tection problems. Theory of Probability and Its Applica-
growth charts. British Journal of Obstetrics and Gynae- tions 8, 2246.
cology 106, 309317. Sonesson, C. and Bock, D. (2003). A review and discussion
Goldenberg, R. L. and Cliver, S. P. (1997). Small for ges- of prospective statistical surveillance in public health.
tational age and intrauterine growth restriction: Deni- Journal of the Royal Statistical Society, Series A 166, 5
tions and standards. Clinical Obstetrics and Gynecology 21.
40, 704714. Steingrimsdottir, T., Cnattingius, S., and Lindmark, G.
Gurrin, L. C., Blake, K. V., Evans, S. F., and Newnham, J. P. (1995). Symphysis-fundus height: Construction of a new
(2001). Statistical measures of foetal growth using linear Swedish reference curve, based on ultrasonically dated
mixed models applied to the foetal origins hypothesis. pregnancies. Acta Obstetricia et Gynecologica Scandinav-
Statistics in Medicine 20, 33913409. ica 74, 346351.
Hooper, P. M., Mayes, D. C., and Demianczuk, N. N. (2002). Theron, G. B. and Thompson, M. L. (1995). A centile chart
A model for foetal growth and diagnosis of intrauterine for birth weight for an urban population of the Western
growth restriction. Statistics in Medicine 21, 95112. Cape. South African Medical Journal 85, 12891292.
Surveillance in Longitudinal Models 1033

Walraven, G. E. L., Mkanje, R. J. B., Vandongen, P. W. J., use of umbilical artery Doppler ultrasound in high-risk
Vanroosmalen, J., and Dolmans, W. M. V. (1995). The pregnancies: Use of meta-analyses in evidence-based ob-
development of a local symphysis-fundal height chart in stetrics. Ultrasound in Obstetrics and Gynecology 17, 466
a rural area of Tanzania. European Journal of Obstetrics, 476.
Gynecology, and Reproductive Biology 60, 149152. Westin, B. (1977). Gravidogram and fetal growth. Acta
Wessman, P. (1998). Some principles for surveillance adopted Obstetricia et Gynecologica Scandinavica 56, 273
for multivariate processes with a common change point. 282.
Communications in StatisticsTheory and Methods 27,
11431161.
Westergaard, H. B., Langho-Roos, J., Lingman, G., Marsal, Received October 2003. Revised March 2004.
K., and Kreiner, S. (2001). A critical appraisal of the Accepted March 2004.
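
Illustrative note on the predictive value. The predictive value PV(s) = P(C(s) | t_A = s) discussed in Section 6 can be evaluated with Bayes' rule once a distribution for the time of the change is available. The following minimal Python sketch shows the calculation under deliberately simple assumptions that are not taken from the article: C(s) is read as the event that the change occurred at or before time s, the change point is given a geometric distribution with intensity `nu`, and the alarm rule is memoryless, with per-time-point alarm probabilities `alpha` before the change and `beta` after it. Neither the SR method nor the Gravidogram method has this memoryless form, and the function name and all numerical values are hypothetical.

```python
def predictive_value(s, nu, alpha, beta):
    """PV(s) = P(change occurred by time s | first alarm at time s).

    Hypothetical setup (an assumption for illustration, not the article's model):
      nu    -- P(change at time t | no change before t), geometric prior
      alpha -- P(alarm at a time point | no change yet, no earlier alarm)
      beta  -- P(alarm at a time point | change has occurred, no earlier alarm)
    """
    # Joint probability of "change at some t <= s" and "first alarm at s".
    motivated = 0.0
    for t in range(1, s + 1):
        prior_t = nu * (1 - nu) ** (t - 1)     # P(tau = t)
        quiet_before = (1 - alpha) ** (t - 1)  # no false alarm at times 1..t-1
        quiet_after = (1 - beta) ** (s - t)    # no detection at times t..s-1
        motivated += prior_t * quiet_before * quiet_after * beta

    # Joint probability of "no change by s" and "first alarm at s" (false alarm).
    false_alarm = (1 - nu) ** s * (1 - alpha) ** (s - 1) * alpha

    # Bayes' rule: motivated alarms as a share of all alarms at time s.
    return motivated / (motivated + false_alarm)


if __name__ == "__main__":
    for s in (1, 2, 5, 10):
        print(s, round(predictive_value(s, nu=0.1, alpha=0.05, beta=0.5), 3))
```

In this toy setting the predictive value is low for early alarms, because few changes have had time to occur while false alarms are already possible. This is the same mechanism by which a high intensity of early false alarms makes early alarms less trustworthy.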
