Effect of Maternal Smoking on Baby Weights and Gestation Age
Zijing Wang, Xuanyu Liang, Kevin Xue, Xiao Wang, Jiacheng
Shi, Minsheng Liu, and Junxian Tan.
February 3, 2019

Introduction

As people have become more concerned about health issues, plentiful
studies have shown that smoking is a significant factor affecting
the human body. Individuals exposed to direct or passive smoking face
an increased risk of lung disease and cancer. Furthermore, smoking
affects not only smokers but can also have detrimental effects on the
fetus. Numerous medical reports and scientific studies have shown that
maternal smoking can restrict fetal growth, and such restrictions are
a main cause of the immaturity of infants. This report therefore
investigates the association between maternal smoking and immature
infants.
The data for this study cover all pregnancies between 1960 and 1967
among women in the Kaiser Health Plan in Oakland, California. Several
statistical methods are used in this case study, including but not
limited to numerical summaries, graphical methods, and hypothesis
tests. Numerical summaries are used primarily to describe the data.
In addition, box plots, histograms, and quantile-quantile (Q-Q) plots
are employed as graphical methods to understand the distribution of
the data. Several statistical tests, including the kurtosis test, the
skewness test, and two-sample hypothesis tests, are applied to compare
the distributions of the two groups.

Data

In this study, the primary data come from the Child Health and
Development Studies (CHDS). The data consist of pregnancies in the
Kaiser Health Plan in Oakland, California between 1960 and 1967. All
mothers in this study were enrolled in this health plan and received
prenatal care. The whole data set consists of 1,236 babies.
Among all the information in the data set, we are interested in the
following variables:

• weight of baby,
• gestation age,
• and maternal smoking status.

We received two versions of the same data, babies.txt and
babies23.txt. The latter contains more variables. Since the former is
sufficient for our purpose, it is used in our preprocessing step.
Weights are measured in ounces. Heights are measured in inches.
Mother’s smoking status is a binary categorical variable with two
labels: smoking and not smoking. All variables except maternal
smoking status are continuous.
To avoid interference from other variables, our data set consists
solely of single (non-twin) births of male babies who survived at
least 28 days.

Data Preprocessing
We preprocess the data in three steps (relevant code is available in
preprocess.R in the supplementary files):

• We select the variables we need and rename them appropriately:
  – baby.wt: weight of baby
  – gestation: gestation age
  – smoke: maternal smoking status

• We replace all unknown values with R’s NA for future convenience.
  – For maternal smoking status, 9 represents unknown.
  – For gestation age, 999 represents unknown.
  – Every baby’s weight is available in the data set.

• We convert maternal smoking status into an R factor with two
  labels, 'not smoking' and 'smoking'.
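The three preprocessing steps can be sketched as follows. This is an independent Python illustration, not the authors' preprocess.R, which is not reproduced in this report. The raw column names (bwt, gestation, smoke, age) and the raw smoking codes 0 and 1 are assumptions, and the three records are made up; only the sentinel codes 9 and 999 come from the text above.

```python
# Sentinel codes for "unknown", taken from the preprocessing description.
SMOKE_UNKNOWN = 9
GESTATION_UNKNOWN = 999

# Assumption: the raw file encodes not smoking as 0 and smoking as 1.
SMOKE_LABELS = {0: "not smoking", 1: "smoking"}


def preprocess(rows):
    """Select three variables, rename them, and map unknown codes to None."""
    cleaned = []
    for row in rows:
        gestation = row["gestation"]
        smoke = row["smoke"]
        cleaned.append({
            "baby_wt": row["bwt"],
            "gestation": None if gestation == GESTATION_UNKNOWN else gestation,
            "smoke": None if smoke == SMOKE_UNKNOWN else SMOKE_LABELS[smoke],
        })
    return cleaned


# Hypothetical raw records, not taken from the CHDS data.
raw = [
    {"bwt": 120, "gestation": 284, "smoke": 0, "age": 27},
    {"bwt": 113, "gestation": 999, "smoke": 1, "age": 33},
    {"bwt": 128, "gestation": 279, "smoke": 9, "age": 28},
]
clean = preprocess(raw)
```

Here None plays the role that NA plays in R, so later steps can skip missing values explicitly.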

Background

Recent studies on children’s health development have emphasized
research on birth weight and gestational age, since these are two
important indicators of infant growth. Gestational age is the time
interval between the mother’s last menstrual period and the birth of
the baby. The normal gestational age for an infant ranges from 37
weeks to 42 weeks. Babies born before 37 weeks are classified as
preterm or premature babies, and they usually have low birth weight,
which refers to infants weighing 2,500 grams or less at birth (de
Bernabé et al 4). Doctors and researchers also use the term “Small
for Gestational Age (SGA)” to classify infants whose weights are
below the tenth percentile for infants of the same gestational age.
In Risk Factors for Low Birth Weight, de Bernabé and other researchers
list several consequences of low birth weight. At the age of five,
premature babies score on average 6.7 points lower on intelligence
quotient (IQ) than normal infants. In addition, preterm babies may be
slower in developing language abilities and may have problems in
school (de Bernabé et al 4). Hence, studying the etiology of low birth
weight is essential to understanding how to foster the healthy growth
of the fetus. Research by Kyrklund-Blomberg and Cnattingius on
maternal smoking shows a positive association between preterm birth
and the amount smoked, and finds that smoking is closely associated
with preterm birth and spontaneous preterm birth (1). Hence, this
study will also explore the association between maternal smoking and
gestational age.
In the article, de Bernabé and others list several risk factors for
low birth weight. They claim that birth weight could be affected by
certain chromosomes. Maternal age at pregnancy is also a factor in
determining birth weight. Other constitutional factors include
ethnicity, marital status, and educational level. Moreover, the
authors mention that smoking, alcohol consumption, and drug abuse
could lead to lower birth weight (de Bernabé et al 5). This paper
focuses on the association between smoking and birth weight, since
multiple studies have shown that smoking is one of the factors
contributing to low birth weight. In Cigarette Smoking in Pregnancy:
Its Influence on Birth Weight and Perinatal Mortality, Butler and
others studied 16,994 singletons in Britain in 1958 and claimed that
smoking during pregnancy increases the late fetal and neonatal
mortality rate by 28% and reduces birth weight by 170 grams (1). This
claim is well supported by other studies. Bonellie studied a
population of 178,801 singleton live births in Scotland between 1992
and 1994 and used statistical methods to confirm that “birthweight,
adjusted for gestational age, sex of baby and parity of the mother,
[is] significantly lower for babies born to mothers who smoked during
pregnancy” (1). Moreover, Mitchell and others have shown that quitting
smoking during pregnancy may reduce the risk of SGA infants, but a
reduction in the number of cigarettes smoked will not reduce the risk
of SGA.
Based on previous studies and research, this paper investigates the
association between maternal smoking and birth weight as well as the
association between maternal smoking and gestational age. The first
hypothesis is that maternal smoking lowers birth weight. Statistical
methods, including numerical summaries, Q-Q plots, histograms, box
plots, the kurtosis test, the skewness test, the Wilcoxon rank sum
test, and Welch’s t-test, are used to compare the birth weight of the
smoking group with that of the non-smoking group. The second
hypothesis is that maternal smoking leads to a lower gestational age
(preterm birth and spontaneous preterm birth). Similarly, numerical
summaries and graphical methods are used.

Methods and Data Analysis

We now investigate the two hypotheses:

1. Maternal smoking will lower birth weight.
2. Maternal smoking will decrease gestation age.

Maternal Smoking and Birth Weight


Our null hypothesis is that the expected value of the birth weight of
the non-smoking group should equal that of the smoking group. The
alternative hypothesis is that those two expected values are unequal.

Numerical summary. We first present a basic numerical summary of our
data, obtained with the summary function of R. (Relevant code is
available in numerical-summary.R in the supplementary files.)
Table 1 gives the size and proportion of the non-smoking and smoking
groups.

Table 1: The count and percentage of maternal smoking status. There
are 742 non-smoking mothers (60.03%), 484 smoking mothers (39.16%),
and 10 mothers (0.81%) whose smoking status is unknown.

Groups        Count   Percentage
not smoking   742     60.03%
smoking       484     39.16%
NA            10      0.81%

Table 2 lists several statistics for the data set.

Table 2: A numerical summary of birth weight (in ounces) for the whole
data, the non-smoking group, and the smoking group.

Statistics     Weight of Baby   Weight of Baby   Weight of Baby
               (Whole Group)    (Non-smoking)    (Smoking)
Min            55.0             55.0             58.0
1st Quartile   108.8            113.0            102.0
Median         120.0            123.0            115.0
Mean           119.6            123.0            114.1
3rd Quartile   131.0            134.0            126.0
Max            176.0            176.0            163.0
NA Count       0                0                0

For birth weight of non-smoking mothers, the Q1, median, and Q3 are
113.0, 123.0, and 134.0 ounces, respectively. For smoking mothers,
the numbers drop to 102.0, 115.0, and 126.0, respectively. Babies of
non-smoking mothers therefore have higher birth weights than those of
smoking mothers, at least in the middle 50 percent of the data, since
the non-smoking group has higher values at Q1, the median, and Q3.

Graphic summary. To give the reader a better understanding of the
data, we use box plots, histograms, and Q-Q plots to visualize birth
weight and its relationship with maternal smoking status. (Relevant
code is available in boxplot-histogram.R and qqplot.R in the
supplementary files.)
Figure 1 is a box plot for birth weight.

[Figure 1: The box plot for birth weight, grouped by maternal smoking
status. Horizontal axis: smoking status (not smoking, smoking).
Vertical axis: baby weight (oz.).]

This box plot shows many outliers in the non-smoking group. This may
be because the non-smoking group has a smaller interquartile range
than the smoking group (21 ounces versus 24 ounces, from Table 2), so
its observations fall outside the whiskers more easily. Most of the
outliers in the non-smoking group lie close to the boundary beyond
which a point counts as an outlier.
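The outlier boundaries can be worked out from the quartiles in Table 2. The sketch below is in Python rather than the authors' R and assumes the conventional rule that a point is an outlier when it lies more than 1.5 interquartile ranges beyond a quartile; the report does not state which rule its box plot uses.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier boundaries for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of birth weight in ounces, from Table 2.
non_smoking_fences = tukey_fences(113.0, 134.0)  # IQR = 21
smoking_fences = tukey_fences(102.0, 126.0)      # IQR = 24
```

Under this assumption the non-smoking boundaries are 81.5 and 165.5 ounces and the smoking boundaries are 66.0 and 162.0 ounces, so the non-smoking group's narrower range flags low weights sooner.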
Figure 2 is a histogram for birth weight.

[Figure 2: The histogram for birth weight, grouped by maternal smoking
status. Horizontal axis: baby weight (oz.). Vertical axis: count.
Legend: smoke (not smoking, smoking).]

The histogram for the non-smoking group looks symmetric, with one
mode, but the histogram for the smoking group is not symmetric and has
two modes. This suggests that the smoking data may be a mixture of two
distributions and may not follow a normal distribution. From this
histogram, we also observe that it is skewed to the right.
Figure 3 shows two Q-Q plots for birth weights of the two groups. The
first Q-Q plot is for the non-smoking group and the second is for the
smoking group. From the first Q-Q plot, we can tell that the data are
approximately normal. In the second Q-Q plot, however, the
distribution for smoking mothers might not be normal: both tails
depart from the line.

[Figure 3: Two Q-Q plots for babies’ weights. On the left is the
non-smoking group. On the right is the smoking group. Horizontal axis:
theoretical. Vertical axis: baby weight (oz.).]
Skewness and kurtosis test. To further check the normality of birth
weight for the two groups, we perform both skewness and kurtosis
tests. For skewness, both groups pass the test. As for kurtosis, the
non-smoking group has a kurtosis value of 4.04 (1.04 after subtracting
3, as reported in Table 3), whereas the smoking group passes the test.

Table 3: Results of skewness and kurtosis for birth weight. The
computation is done using the e1071 package. Notice that kurtosis has
been centered by subtracting 3. Relevant code is available in
normality.R in supplementary files.

Group         Skewness   Kurtosis
not smoking   -0.187     1.04
smoking       -0.0335    -0.0120

Based on the results of the Q-Q plots, skewness test, and kurtosis
test, we conclude that the birth weight of the non-smoking group is
not normal, and that of the smoking group is.

Incidence of low-birth-weight babies. A common cutoff for low birth
weight is 88 ounces. At this cutoff, the incidence of low-birth-weight
babies for the non-smoking group is 2.96%, whereas that of the smoking
group is 7.44%. We have also tried different cutoff values and
summarized our findings in Table 4. From the table we observe that, as
the cutoff moves from 86 to 88 ounces, the rate for the smoking group
changes by more than 2 percentage points while the rate for the
non-smoking group changes by only 0.13 percentage points. This means
that a number of smoking mothers gave birth to babies weighing between
86 and 88 ounces. If the cutoff is 90 ounces, 3.37% of babies of
non-smoking mothers and 8.26% of babies of smoking mothers fall below
it. The concentration of smoking-group babies between 86 and 88 ounces
might be critical for further study of the relationship between
smoking and baby weight.

Table 4: Low-birth-weight rates for different cutoff values (in
ounces). Relevant code is available in lbw.R in supplementary files.

Group         Total   At 86   At 87   At 88   At 89   At 90
not smoking   742     2.83%   2.83%   2.96%   3.10%   3.37%
smoking       484     5.37%   6.20%   7.44%   8.26%   8.26%
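A sensitivity check of this kind can be sketched as below. This is a Python illustration, not the authors' lbw.R; it assumes that "low birth weight" means strictly below the cutoff, and the ten weights are made up rather than taken from the CHDS data. The conversion line shows why 88 ounces is a natural cutoff: it is close to the 2,500 gram definition quoted in the Background section.

```python
GRAMS_PER_OUNCE = 28.349523125


def low_birth_weight_rate(weights_oz, cutoff_oz):
    """Share of babies weighing strictly less than the cutoff (assumption)."""
    below = sum(1 for w in weights_oz if w < cutoff_oz)
    return below / len(weights_oz)


# 2,500 grams expressed in ounces, roughly 88.18.
cutoff_from_grams = 2500 / GRAMS_PER_OUNCE

# Hypothetical birth weights in ounces for one group.
toy_weights = [84, 87, 88, 95, 110, 118, 120, 123, 131, 140]
rates = {c: low_birth_weight_rate(toy_weights, c) for c in range(86, 91)}
```

Running the same function once per group and per cutoff yields a table with the layout of Table 4.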

Hypothesis testing. After summarizing the data, we are now in a
position to test our null hypothesis, that the expected value of the
birth weight of the non-smoking group equals that of the smoking
group.
Our data set does not satisfy the assumptions of the standard
two-sample t-test, which requires that the distributions are normal
and that the variances of the two groups are equal. The first
assumption is rejected by our normality check above. As for variance,
we employed the chi-square test, and the result is that the variances
are not statistically equal.
Since the standard two-sample t-test cannot be applied in this study,
we use two alternative tests: Welch’s t-test and the non-parametric
Wilcoxon rank sum test. However, both have limitations: Welch’s t-test
still requires normality, while the Wilcoxon rank sum test is less
powerful. As a result, we apply both to increase our confidence. A
Monte Carlo simulation would show how reliable Welch’s t-test is here,
but we do not have enough resources for that.
We bootstrap the mean of both groups, with a sample size of 400 and a
sample count of 2000. For both tests, the result is that the means of
the two distributions are not equal, with p-values less than 2.2e-16
in both cases. (Relevant code is available in hypothesis.R in the
supplementary files.) We thus conclude that, at the 5% significance
level, there is enough evidence to reject the null hypothesis in favor
of the alternative hypothesis. In other words, the mean birth weight
of the non-smoking group is different from that of the smoking group.
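The bootstrap step can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' hypothesis.R: the two samples are simulated from normal distributions whose means loosely follow Table 2 and whose standard deviations (17 and 18 ounces) are invented, and the function name bootstrap_means is hypothetical. Only the resample size of 400 and the count of 2000 come from the text.

```python
import random
import statistics


def bootstrap_means(values, sample_size, n_resamples, rng):
    """Means of n_resamples samples drawn with replacement from values."""
    return [
        statistics.fmean(rng.choices(values, k=sample_size))
        for _ in range(n_resamples)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simulated stand-ins for the two groups (ounces); not the CHDS data.
not_smoking = [rng.gauss(123.0, 17.0) for _ in range(742)]
smoking = [rng.gauss(114.0, 18.0) for _ in range(484)]

boot_not_smoking = bootstrap_means(not_smoking, 400, 2000, rng)
boot_smoking = bootstrap_means(smoking, 400, 2000, rng)

# Difference between the average bootstrap means of the two groups.
mean_diff = statistics.fmean(boot_not_smoking) - statistics.fmean(boot_smoking)
```

The two lists of bootstrap means are what a two-sample test would then compare.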

Maternal Smoking and Gestation Age


As before, we use graphic summaries to gain a basic understanding of
the data. (Relevant code for this section is in the same files as
specified above.)
Figure 4 is a box plot for gestation age.

It can be seen from the box plot that the non-smoking group and the
smoking group do not share the same gestation age distribution. Even
though the differences are numerically small, the interquartile range
is also much smaller, so those differences might turn out to be
significant.
[Figure 4: The box plot for gestation age, grouped by maternal smoking
status. Horizontal axis: smoking status (not smoking, smoking).
Vertical axis: gestation age (day).]

Figure 5 includes two histograms of gestation age. Since there are a
few births with extremely low or high gestation ages (see Figure 4
above), the histogram is not clear enough in the center range. Hence
on the right we give a “zoomed-in” histogram.

[Figure 5: On the left is the histogram for gestation age. On the
right is the same data set, but zoomed into a smaller range from 220
days to 340 days. Horizontal axis: gestation age (day). Vertical axis:
count. Legend: smoke (not smoking, smoking).]

Figure 6 shows two Q-Q plots for gestation ages of the two groups.
The Q-Q plot for smoking mothers also departs from normality: points
in both tails do not follow the normal line, and the points in the
middle also lie away from it. So the distribution of gestation age for
the smoking group is not normal, and the non-smoking distribution is
closer to normal by comparison.
To confirm the non-normality of the data, we perform the skewness and
kurtosis tests. The skewness of the non-smoking group is -1.08, which
means it is skewed to the left, consistent with the histogram above.
The smoking group has a skewness of -0.224 and is indeed more
symmetric. As for the kurtosis test, the non-smoking group has a
result of 11.76, probably because it has some extreme outliers, as
shown in the box plot. The smoking group has a result of 5.05, which
means its distribution is also not normal.

[Figure 6: Two Q-Q plots for gestation ages. On the left is the
non-smoking group. On the right is the smoking group. Horizontal axis:
theoretical. Vertical axis: gestation age (day).]

Table 5: Results of skewness and kurtosis for gestation age. As
before, the kurtosis has been centered by subtracting 3. Relevant code
is available in normality.R in supplementary files.

Group         Skewness   Kurtosis
not smoking   -1.08      8.76
smoking       -0.224     2.05

Regression Against Gestation Age

Since the gestation ages of the two groups appear to differ, and
gestation age is positively related to birth weight, we would expect
some of the birth weight difference to be associated with gestation
age. In particular, if we remove the effect of gestation age on birth
weight, we would expect the mean birth weight difference to become
smaller.


For simplicity, we assume that gestation age has a linear relationship
with birth weight. We use linear regression to remove the effect.
Figure 7 shows the estimated regression line. Admittedly, the plot
shows that the variance of birth weight is large, but a rough linear
relationship can still be seen.
We compare the residual birth weights between the two groups. The mean
residual birth weight for the non-smoking group is 3.19 ounces,
whereas that of the smoking group is -4.87 ounces. The difference is
8.06 ounces, which is slightly less than that in the original data,
8.9 ounces (the difference in means from Table 2). This is consistent
with our expectation above.
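The residual comparison can be sketched as below: fit one least-squares line of birth weight on gestation age over all babies, then average the residuals within each group. This is a Python illustration with made-up data, not the authors' R code, and the function names are hypothetical.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mean_residual_by_group(x, y, groups):
    """Mean residual of the pooled regression within each group."""
    intercept, slope = fit_line(x, y)
    residuals = {}
    for xi, yi, g in zip(x, y, groups):
        residuals.setdefault(g, []).append(yi - (intercept + slope * xi))
    return {g: statistics.fmean(r) for g, r in residuals.items()}


# Hypothetical data: gestation age in days, birth weight in ounces.
gestation = [270, 275, 280, 285, 290, 265, 270, 275, 280, 285]
weight = [115, 118, 122, 125, 128, 107, 110, 113, 117, 120]
group = ["not smoking"] * 5 + ["smoking"] * 5

means = mean_residual_by_group(gestation, weight, group)
adjusted_gap = means["not smoking"] - means["smoking"]
raw_gap = statistics.fmean(weight[:5]) - statistics.fmean(weight[5:])
```

In this toy data the smoking group has shorter gestation, so part of the raw gap (8.2 ounces) is attributed to gestation age and the adjusted gap is smaller (about 4.36 ounces), the same pattern the report describes.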
[Figure 7: Linear regression of birth weight against gestation age.
Horizontal axis: gestation age (day). Vertical axis: baby weight
(oz.).]

Theory

Mean: the mean of a sample is the sum of all the data in the sample
divided by the sample size. The mean of X is usually denoted by X̄.
Median: the value located in the middle of the sorted data sample.
Standard Deviation: the square root of the variance; it measures the
typical distance between the data points and the mean of the data.
Variance: the average of the squared distances between the data points
and the mean; it is the square of the standard deviation.
Quartile: The first quartile represents the cutoff point between the
first 25 percent of the data and the remaining 75 percent of the data.
The third quartile represents the cutoff point between the first 75
percent of the data and the remaining 25 percent of the data.
Linear regression: Linear regression is a linear approach to modeling
the relationship between two variables.
Skewness and kurtosis. Skewness and kurtosis describe the shape of the
probability distribution of a real-valued random variable and are used
here to check normality: skewness measures asymmetry and kurtosis
measures the heaviness of the tails. They are defined as follows:
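\[
\operatorname{skew}(X) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{S} \right)^{3}
\]
\[
\operatorname{kurt}(X) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{S} \right)^{4}
\]
where \bar{X} is the sample mean of X and S is the sample standard
deviation:
\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
\]
\[
S^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^{2}.
\]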
For a normally distributed random variable X, skew(X) = 0 and
kurt(X) = 3.
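The two definitions translate directly into code. The sketch below is a Python illustration, not the e1071 routine used for Tables 3 and 5; it follows the formulas above term by term and returns the kurtosis centered by subtracting 3, as those tables do. The five data points are made up.

```python
import math


def skewness_and_excess_kurtosis(values):
    """Return (skew, kurt - 3) using the definitions in the text."""
    n = len(values)
    mean = sum(values) / n
    # S uses the n - 1 denominator, as in the definition of S^2.
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    skew = sum(((v - mean) / s) ** 3 for v in values) / n
    kurt = sum(((v - mean) / s) ** 4 for v in values) / n
    return skew, kurt - 3.0


# Hypothetical symmetric sample: skewness is 0, kurtosis is below 3.
skew, excess_kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
```

For this sample S^2 = 2.5, so the kurtosis is (16 + 1 + 0 + 1 + 16) / (5 × 2.5^2) = 1.088 and the centered value is -1.912.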
Histogram. A histogram is a representation of data by its frequency.
It is usually used to understand the distribution of a continuous
variable. By dividing the data into bins, a histogram shows the
density of the data.
Box plot. A box plot is a graphical method for visualizing a set of
data through its quartiles. The maximum, minimum, median, first
quartile, third quartile, and outliers are shown on the box plot. Two
or more distributions can be graphed in the same plot for comparison.
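Quantile-quantile plot. A quantile-quantile plot (Q-Q plot) plots the
quantiles of the data against the quantiles of a theoretical
distribution. Here it is used to check whether the data are normal: if
they are, the points fall close to a straight line.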
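The coordinates of a normal Q-Q plot can be computed as in the following Python sketch. The plotting positions (i - 0.5) / n are an assumption; plotting software may use slightly different positions. The three data values are made up.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted value with a standard normal theoretical quantile."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(values)))


# Hypothetical sample; each point is (theoretical quantile, observed value).
points = normal_qq_points([5, 1, 3])
```

Plotting the second coordinate against the first gives the Q-Q plot; points that stray from a straight line in the tails are the kind of departure discussed for Figures 3 and 6.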
Wilcoxon Rank Sum test. The Wilcoxon rank sum test is a rank-based
test. The two samples are pooled and ranked in ascending order, and
the sum of the ranks of each sample is compared to test whether the
two groups differ.
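The ranking step can be sketched as follows. This Python illustration computes only the test statistic, assigning tied values the average of their ranks (a common convention, assumed here); turning the statistic into a p-value is left out. The two samples are made up.

```python
def rank_sum(sample_a, sample_b):
    """Return (W, U): the rank sum of sample_a and its Mann-Whitney U."""
    pooled = sorted(sample_a + sample_b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i + 1 .. j; give each the average rank.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    w = sum(ranks[v] for v in sample_a)
    n_a = len(sample_a)
    u = w - n_a * (n_a + 1) / 2
    return w, u


# Hypothetical birth weights in ounces.
w_stat, u_stat = rank_sum([120, 125, 130], [110, 120, 115])
```

Here the pooled ranks are 1, 2, 3.5, 3.5, 5, 6, so the first sample has rank sum 3.5 + 5 + 6 = 14.5 and U = 14.5 - 6 = 8.5.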
Welch’s t-test. Welch’s t-test is a two-sample test for the difference
between two means, under the assumption that both samples are normally
distributed; the two variances are not assumed to be equal.
Let \bar{X}_1, s_1^2, and N_1 be the first sample’s mean, variance,
and size, and let \bar{X}_2, s_2^2, and N_2 be those of the second
sample. The test statistic is
\[
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}
\]
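The statistic can be computed as in the Python sketch below. The degrees of freedom use the Welch-Satterthwaite approximation, which is standard for this test but is not stated in the text above; the two samples are made up, and the p-value lookup is omitted.

```python
import math
import statistics


def welch_t(sample_1, sample_2):
    """Return (t, df) for Welch's two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    # Per-sample variance of the mean: s_k^2 / N_k.
    v1 = statistics.variance(sample_1) / n1
    v2 = statistics.variance(sample_2) / n2
    t = (statistics.fmean(sample_1) - statistics.fmean(sample_2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (not given in the report).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical samples.
t_stat, df = welch_t([10, 12, 14], [7, 8, 9, 10])
```

For these samples the means are 12 and 8.5 and the summed variance term is 4/3 + 5/12 = 1.75, so t = 3.5 / sqrt(1.75), which is about 2.65.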

Conclusion

Hypothesis One Conclusion


Based on the analysis presented above, we reject the null hypothesis
and conclude that there is a significant difference between the
weights of babies born to smoking mothers and those born to
non-smoking mothers.

Hypothesis Two Conclusion


Based on the graphical and numerical summaries, we observe an
association between gestational age and maternal smoking status. To
investigate the second hypothesis further, we use a linear regression
to remove the effect of gestational age on birth weight. The result
reinforces the claim that maternal smoking indeed affects birth
weight, which is consistent with the outside research we cite.

Limitation
However, limitations do exist in our data set. Firstly, the primary
data were collected in the 1960s with a small sample size; they are
out of date and may not faithfully represent the current population.
Moreover, the data are not normally distributed, which constrains the
use of statistical methods such as t-tests. Also, since the data were
collected from a single source, the observations might not be
independent of each other, and thus bias might exist, but we cannot
investigate it further without citing outside academic literature.

Works Cited

1. Bonellie, S R. “Effect of maternal age, smoking and deprivation on
birthweight.” Paediatric and Perinatal Epidemiology 15.1 (2001):
19-26.

2. Butler, N. R., et al. “Cigarette Smoking In Pregnancy: Its Influence
On Birth Weight And Perinatal Mortality.” The British Medical
Journal 2.5806 (1972): 127-130.

3. de Bernabé, Javier Valero, et al. “Risk factors for low birth weight:
a review.” European Journal of Obstetrics and Gynecology and
Reproductive Biology 116.1 (2004): 3-15.

4. Kyrklund-Blomberg, Nina B., and Sven Cnattingius. “Preterm birth
and maternal smoking: risks related to gestational age and onset
of delivery.” American Journal of Obstetrics and Gynecology 179.4
(1998): 1051-1055.

5. Mitchell, EA, et al. “Smoking, nicotine and tar and risk of small for
gestational age babies.” Acta Paediatrica 91.3 (2002).
