Gestation Age
Zijing Wang, Xuanyu Liang, Kevin Xue, Xiao Wang, Jiacheng
Shi, Minsheng Liu, and Junxian Tan.
February 3, 2019
Introduction
As people grow more concerned about health, many studies have shown
that smoking significantly affects the human body. Exposure to smoke,
whether through direct or passive smoking, increases the risk of lung
disease and cancer. Furthermore, smoking harms not only smokers but
also the fetus. Numerous medical reports and scientific studies have
shown that maternal smoking can restrict fetal growth, and that such
restriction is a main cause of infant immaturity. This report therefore
investigates the association between maternal smoking and immature
infants.
The data for this study comprise measurements on all pregnancies
between 1960 and 1967 among women in the Kaiser Health Plan in
Oakland, California. Several statistical methods are used in this
case study, including numerical analysis, graphical interpretation,
and statistical tests. Numerical analysis is used primarily to
summarize the data. In addition, box plots, histograms, and
quantile-quantile (Q-Q) plots are employed to understand the
distribution of the data. Several statistical tests, including the
kurtosis test, the skewness test, and a hypothesis test, are applied
to compare the distributions of the two groups.
Data
In this study, the primary data come from the Child Health and
Development Studies (CHDS). The data consist of pregnancies in the
Kaiser Health Plan in Oakland, California between 1960 and 1967. All
mothers in this study were enrolled in this health plan and received
prenatal care. The whole data set consists of 1236 babies.
Among all information in the data set¹, we are interested in the
following variables:
• weight of baby,
• gestation age,
• and maternal smoking status.
¹ We received two versions of the same data, babies.txt and
babies23.txt. The latter contains more variables. Since the former is
sufficient for our purpose, it is used in our preprocessing step.
effect of maternal smoking on baby weights and gestation age 2
Data Preprocessing
We preprocess the data in three steps²:
• We select variables we need and rename them appropriately:
– baby.wt: weight of baby
– gestation: gestation age
– smoke: maternal smoking status
² Relevant code is available in preprocess.R in supplementary files.
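The select-and-rename step can be sketched in a few lines. This is a Python illustration, not the authors' preprocess.R. The raw column names (bwt, gestation, parity, smoke), the whitespace-delimited layout, and the sample rows are assumptions made for the example; only the target names baby.wt, gestation, and smoke come from the report.

```python
# Assumed mapping from raw column names to the report's variable names.
# The raw names on the left are an assumption about babies.txt.
RENAME = {"bwt": "baby.wt", "gestation": "gestation", "smoke": "smoke"}

# Made-up rows in an assumed whitespace-delimited layout, for illustration only.
SAMPLE = """bwt gestation parity smoke
120 284 0 0
113 282 0 1
"""


def read_table(text):
    """Parse whitespace-delimited text with a header row into a list of dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, body = lines[0], lines[1:]
    return [dict(zip(header, values)) for values in body]


def select_and_rename(rows, rename=RENAME):
    """Keep only the columns named in `rename`, stored under their new names."""
    return [{new: row[old] for old, new in rename.items()} for row in rows]


babies = select_and_rename(read_table(SAMPLE))
print(babies[0])
```

Values stay as strings here; converting them to numbers and handling missing-value codes would belong to later preprocessing steps, which this sketch does not attempt to reproduce.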
Background
data, we utilize box plots, histograms, and Q-Q plots to visualize
birth weight and its relationship to maternal smoking status.⁴
Figure 1 is a box plot for birth weight.
⁴ Relevant code is available in boxplot-histogram.R and qqplot.R in
supplementary files.
Figure 1: The box plot for birth weight, grouped by maternal smoking
status. [x-axis: smoking status (not smoking, smoking); y-axis: baby
weight (oz.)]
The box plot shows many outliers in the non-smoking group. This may
be because the smoking group has the larger interquartile range: the
narrower interquartile range of the non-smoking group tightens its
outlier boundaries, so its observations are flagged as outliers more
easily. Most of the outliers in the non-smoking group lie close to
the outlier boundary.
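To make the interquartile-range argument concrete, here is a minimal Python sketch of the usual box plot convention, in which a point is an outlier if it lies more than 1.5 interquartile ranges beyond the quartiles. The 1.5 multiplier, the quartile interpolation method, and the two small samples are assumptions for illustration; they are not taken from the report's data or code.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Two made-up samples that both contain the value 130.
narrow = [118, 119, 120, 121, 122, 130]  # small interquartile range
wide = [100, 110, 120, 130, 130, 140]    # large interquartile range

print(tukey_outliers(narrow))  # 130 falls outside the tight fences
print(tukey_outliers(wide))    # no value falls outside the wide fences
```

The same observation is flagged in the tightly clustered sample and not in the spread-out one, which is the effect described for the non-smoking group.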
Figure 2 is a histogram for birth weight.
[Figure 2: histogram of birth weight; x-axis: baby weight (oz.); y-axis: count.]
[Q-Q plots of birth weight, two panels; x-axis: theoretical; y-axis: baby weight (oz.).]
hypothesis. In other words, the mean birth weight of the non-smoking
group differs from that of the smoking group.
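A comparison of two group means can be sketched as follows. The excerpt does not say which test the authors ran, so this Python example assumes Welch's two-sample t-test and computes only the test statistic and its approximate degrees of freedom. The function name and the sample numbers are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Made-up numbers, not values from the CHDS data.
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, df)
```

A p-value would then come from the t distribution with `df` degrees of freedom, which needs a statistics library beyond the Python standard library.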
The box plot shows that the non-smoking group and the smoking group
do not share the same gestation age distribution. Even though the
differences are numerically small, the interquartile range here is
also much smaller, so those differences might still turn out to be
significant.
[Histograms of gestation age, two panels; x-axis: gestation age (day); y-axis: count.]
[Q-Q plots of gestation age, two panels; x-axis: theoretical; y-axis: gestation age (day).]
Figure 7: Linear regression of birth weight against gestation age.
[x-axis: gestation age (day); y-axis: baby weight (oz.)]
Theory
Mean: the mean of a sample is the sum of all the data in the sample
divided by the sample size. The mean of X is usually denoted by X̄.
Median: the median is the middle value of the sorted data sample;
half of the observations lie at or below it and half at or above it.
Standard Deviation: the standard deviation measures how far the data
points typically lie from the mean of the data. It is the square root
of the variance.
Variance: the variance is the square of the standard deviation, that
is, the sum of the squared distances between the data points and the
mean, divided by n − 1 for a sample of size n.
Quartile: the first quartile is the cutoff point between the first
25 percent of the data and the remaining 75 percent of the data. The
third quartile is the cutoff point between the first 75 percent of
the data and the remaining 25 percent of the data.
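The summary statistics defined above can be computed with Python's standard `statistics` module. This is a sketch for illustration rather than the report's own code. The sample values are made up, and the quartile interpolation method (`method="inclusive"`) is an assumption, since several conventions exist.

```python
import statistics


def summarize(values):
    """Return the summary statistics defined above for one numeric sample."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # divides by n - 1
        "sd": statistics.stdev(values),           # square root of the variance
        "q1": q1,
        "q3": q3,
    }


weights = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up numbers for illustration
summary = summarize(weights)
print(summary)
```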
Linear regression: Linear regression is a linear approach to
model the relationship between two variables.
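For two variables, the fitted line can be obtained by ordinary least squares. The Python sketch below assumes that fitting method, which the report does not name explicitly. The function name and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the sum of cross-deviations over the sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up gestation ages (days) and birth weights (oz.), for illustration only.
gestation = [270, 280, 290, 300]
baby_wt = [110, 115, 120, 125]
intercept, slope = fit_line(gestation, baby_wt)
print(intercept, slope)
```

The slope is read as the change in birth weight, in ounces, associated with one additional day of gestation.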
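Skewness and kurtosis. Skewness measures the asymmetry of the
probability distribution of a real-valued random variable, and
kurtosis measures the heaviness of its tails. Comparing them with
their values under a normal distribution helps assess normality.
They are defined as follows: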
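\[
\operatorname{skew}(X) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{S} \right)^{3}
\]
\[
\operatorname{kurt}(X) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{S} \right)^{4}
\]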
where X̄ is the sample mean of X and S is the sample standard deviation:
\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
\]
\[
S^{2} = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^{2}.
\]
For a normally distributed random variable X, skew(X) = 0 and
kurt(X) = 3.
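The two formulas translate directly into code. The Python sketch below follows the definitions above, with S computed using n − 1 and the standardized powers averaged over n. The function name and the sample are hypothetical.

```python
import math


def skew_and_kurt(values):
    """Sample skewness and kurtosis as defined in the formulas above."""
    n = len(values)
    mean = sum(values) / n
    # S uses the n - 1 denominator, matching the definition of S^2.
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    skew = sum(((x - mean) / s) ** 3 for x in values) / n
    kurt = sum(((x - mean) / s) ** 4 for x in values) / n
    return skew, kurt


# A small symmetric sample, for illustration only.
skew, kurt = skew_and_kurt([1, 2, 3, 4, 5])
print(skew, kurt)
```

A symmetric sample gives a skewness of zero. For real data, values far from 0 and 3 respectively suggest a departure from normality.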
Conclusion
Limitation
However, limitations do exist in our data set. First, the primary
data were collected in the 1960s with a small sample size; they are
out of date and may not faithfully represent the current population.
Moreover, the data are not normally distributed, which constrains
the use of statistical methods such as t-tests. Also, since the data
were collected from a single source, the observations might not be
independent of each other, and thus bias might exist, but we cannot
investigate it further without citing outside academic literature.
Works Cited
3. de Bernabé, Javier Valero, et al. “Risk factors for low birth weight:
a review.” European Journal of Obstetrics and Gynecology and
Reproductive Biology 116.1 (2004): 3-15.
5. Mitchell, EA, et al. “Smoking, nicotine and tar and risk of small for
gestational age babies.” Acta Paediatrica 91.3 (2002)