
ReMA | Quantitative Foundations | Fall 2017

Homework 4
POSTED: 10/20/2017
DUE DATE: 10/27/2017 at the start of lab

Problem 1
Researchers wanted to assess whether risk of fractures in three rural Iowa communities differed according to whether the local
drinking water was higher calcium or not. They looked at cumulative incidence of fractures among women aged 55-80.

                  Fracture   No Fracture   Total
Higher Calcium       21          127        148
Lower Calcium        11          110        121
Total                32          237        269

a) What is the estimated probability of fracture among those exposed to higher calcium?

Probability of fracture (F+) among those exposed to higher calcium (C+) = (# with fracture in the
higher-calcium group) / (total # in the higher-calcium group) = P(F+|C+) = 21/148 = 14.19%

b) What is the estimated probability of fracture among those exposed to lower calcium?

Probability of fracture (F+) among those exposed to lower calcium (C-) = (# with fracture in the
lower-calcium group) / (total # in the lower-calcium group) = P(F+|C-) = 11/121 = 9.09%

c) Calculate the estimated risk difference (RD), an absolute measure of association, to compare fracture risk among those
exposed to higher calcium versus those exposed to lower calcium. Interpret this measure in one sentence.
Risk difference = P(F+|C+) - P(F+|C-) = (21/148) - (11/121) = 0.0510 = 5.10%
Women aged 55-80 in these Iowa communities who drank higher-calcium water experienced a 5.10
percentage point increase in the risk of fracture compared to women aged 55-80 who drank
lower-calcium water.

d) Calculate the risk ratio of the relationship between higher calcium and fracture. Interpret this measure in one sentence.
Risk Ratio = P(F+|C+) / P(F+|C-) = (21/148) / (11/121) = 1.56
During the observation period, women aged 55-80 in these Iowa communities with a higher-calcium
water supply experienced, on average, 1.56 times the risk of fracture compared to women aged 55-80
with a lower-calcium water supply.

e) Calculate the odds ratio of the relationship between higher calcium and fracture. Interpret this measure in one sentence.
Odds Ratio = (odds of fracture in the higher-calcium group) / (odds of fracture in the lower-calcium group)
= [P(F+|C+)/(1 - P(F+|C+))] / [P(F+|C-)/(1 - P(F+|C-))]
= [(21/148)/(1 - 21/148)] / [(11/121)/(1 - 11/121)] = 0.1653/0.1 = 1.65
During the observation period, the odds of fracture among women aged 55-80 in these Iowa
communities with a higher-calcium water supply were 1.65 times the odds of fracture among women
aged 55-80 with a lower-calcium water supply.
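
The three measures in parts (c) through (e) all come from the same four cell counts. As a quick check, here is a minimal Python sketch that recomputes them from the table; the variable names are illustrative and not part of the assignment.

```python
# Cell counts from the Problem 1 table: (fracture, no fracture) by exposure group.
high_fx, high_no = 21, 127   # higher calcium
low_fx, low_no = 11, 110     # lower calcium

# Cumulative incidence (risk) of fracture in each group.
risk_high = high_fx / (high_fx + high_no)
risk_low = low_fx / (low_fx + low_no)

risk_difference = risk_high - risk_low                # absolute measure
risk_ratio = risk_high / risk_low                     # relative measure
odds_ratio = (high_fx / high_no) / (low_fx / low_no)  # odds = events / non-events

print(f"P(F+|C+) = {risk_high:.4f}")        # 0.1419
print(f"P(F+|C-) = {risk_low:.4f}")         # 0.0909
print(f"RD = {risk_difference:.4f}")        # 0.0510
print(f"RR = {risk_ratio:.2f}")             # 1.56
print(f"OR = {odds_ratio:.2f}")             # 1.65
```

Note that the odds ratio (1.65) is farther from 1 than the risk ratio (1.56), which is expected when the outcome is not rare.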

f) From what you know so far, what do you conclude about the relationship between risk of fracture and higher calcium in the
water supply? Is higher calcium good for you in terms of reducing the risk of fracture? Do you think this result surprised the
researchers? Why or why not?

Among women aged 55-80 in these Iowa communities, those with a higher-calcium water supply had a
higher estimated risk of fracture than those with a lower-calcium water supply. Based on these
estimates, higher calcium in the water supply does not appear to reduce the risk of fracture. This
result likely surprised the researchers, because many believe that low calcium intake makes future
fractures more likely.

Problem 2
Among 50 puppies bred using a new color selection method, 35 were brown. In a comparable sample of 70 puppies who were not bred
using the new color selection method, 50% were brown. Use an alpha level of 0.05 to test the claim that the new color selection
method results in a different proportion of brown puppies. Remember the 4 steps:

1. State: the null and alternative hypotheses for testing the claim that the proportion of brown puppies differs by exposure.
H0: using the new color selection method and the proportion of brown puppies bred are independent
(RR = 1)
HA: using the new color selection method and the proportion of brown puppies bred are associated (RR ≠ 1)

2. Convert: Calculate an appropriate test statistic for assessing these hypotheses.


Observed counts
                        Brown puppies   Not brown puppies   Total
New color method             35                15             50
Not new color method         35                35             70
Total                        70                50            120

Expected counts under H0 (row total * column total / grand total)
                        Brown puppies          Not brown puppies
New color method        50*70/120 = 29.1667    50*50/120 = 20.8333
Not new color method    70*70/120 = 40.8333    70*50/120 = 29.1667

Chi-squared statistic:
X^2 = sum over the four cells of (O - E)^2 / E
    = (35-29.1667)^2/29.1667 + (15-20.8333)^2/20.8333 + (35-40.8333)^2/40.8333 + (35-29.1667)^2/29.1667
    = 4.80
Degrees of freedom = (#rows - 1) * (#columns - 1) = (2 - 1) * (2 - 1) = 1
So our chi-squared statistic is equal to 4.80
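
The expected counts and the statistic above can be reproduced with a few lines of Python. This is only a sketch of the hand calculation (no continuity correction); the function name `chi_squared_statistic` is made up for illustration.

```python
def chi_squared_statistic(observed):
    """Pearson chi-squared for a table given as a list of rows of counts.

    Returns (statistic, expected), where expected[i][j] is
    row total * column total / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected


# Rows: new color method, not new color method; columns: brown, not brown.
puppies = [[35, 15], [35, 35]]
chi2, expected = chi_squared_statistic(puppies)
print(f"expected = {[[round(e, 4) for e in row] for row in expected]}")
print(f"chi-squared = {chi2:.2f}")  # 4.80
```

The degrees of freedom follow from the table shape: (rows - 1) * (columns - 1) = 1 for any 2x2 table.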

3. Compare: Refer to a known distribution to find the critical value and the p-value (or p-value limits). Do you reject the null
hypothesis?
We can use the tables in Sullivan to compare to the known distribution. We see that the critical
value for a chi-squared with 1 df at the alpha=0.05 level is 3.84. Since our statistic (4.80) is
greater than 3.84, our decision is to reject H0 at the alpha=5% level.
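
The table lookup can also be checked numerically. The sketch below relies on the fact that a chi-squared variable with 1 df is the square of a standard normal, so its upper-tail probability can be written with the complementary error function. It only applies to 1 df, and the function name is illustrative.

```python
import math


def chi2_1df_p_value(statistic):
    # For 1 df, X = Z^2 with Z standard normal, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


# The 5% critical value for 1 df is the square of the two-sided
# 5% standard normal critical value (about 1.96).
critical_value = 1.959964 ** 2
p_value = chi2_1df_p_value(4.80)

print(f"critical value = {critical_value:.2f}")  # 3.84
print(f"p-value = {p_value:.3f}")                # about 0.028
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

Comparing the statistic to 3.84 and comparing the p-value to 0.05 always lead to the same decision.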

4. Interpret: Describe your conclusions using the wording of the problem.

From the test, we conclude that the proportion of brown puppies differs between the new color
selection method and the comparison group: puppies bred with the new method were more likely to be
brown (35/50 = 70% versus 50%). (We reject the null hypothesis of no association at an alpha level of 5%.)
Problem 3
Many adults suffer from sleep-disordered breathing. To evaluate the relationship between biological sex (F versus M) and likelihood of
habitual snoring (snoring, snorting, or breathing pauses every night or almost every night, or extremely loud snoring), researchers studied
employees of state agencies in Wisconsin in 1993. In their study, they found that 103 of 335 women reported habitual snoring, while
232 of 438 men reported habitual snoring. Use the STATA output below to answer the following questions.

[The CS command in Stata gives absolute and relative measures of association as well as the chi-
square test statistic and p-value.]

. csi 103 232 232 206

| Exposed Unexposed | Total


-----------------+------------------------+------------
Cases | 103 232 | 335
Noncases | 232 206 | 438
-----------------+------------------------+------------
Total | 335 438 | 773
| |
Risk | .3074627 .5296804 | .4333765
| |
| Point estimate | [95% Conf. Interval]
|------------------------+------------------------
Risk difference | -.2222177 | -.2902365 -.1541989
Risk ratio | .5804683 | .4832277 .6972769
Prev. frac. ex. | .4195317 | .3027231 .5167723
Prev. frac. pop | .1818151 |
+-------------------------------------------------
chi2(1) = 38.17 Pr>chi2 = 0.0000

a) What is the estimated risk difference for snoring in women compared to men? Interpret.
Estimated risk difference is -0.2222 = -22.22%

In 1993, women employed by state agencies in Wisconsin had a risk of habitual snoring 22.22
percentage points lower than that of men employed by the same agencies.

b) What is the estimated risk ratio for snoring in women compared to men? Interpret.
Estimated risk ratio is 0.5805
In 1993, women employed by state agencies in Wisconsin experienced, on average, 0.58 times
the risk of habitual snoring compared to men employed by the same agencies.
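
To see where the point estimates and intervals in the output come from, here is a hedged Python sketch. It uses the usual large-sample (Wald) interval for the risk difference and the log-scale interval for the risk ratio. Treating these as the formulas behind the printed limits is an assumption, although the results agree with the output above to the precision shown. The function name is illustrative.

```python
import math

Z_975 = 1.959964  # two-sided 95% standard normal critical value


def risk_measures(cases_exp, cases_unexp, noncases_exp, noncases_unexp):
    """Risk difference and risk ratio with approximate 95% confidence intervals."""
    n1 = cases_exp + noncases_exp        # exposed total (women)
    n0 = cases_unexp + noncases_unexp    # unexposed total (men)
    p1, p0 = cases_exp / n1, cases_unexp / n0

    rd = p1 - p0
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)

    rr = p1 / p0
    se_log_rr = math.sqrt(1 / cases_exp - 1 / n1 + 1 / cases_unexp - 1 / n0)

    return {
        "rd": rd,
        "rd_ci": (rd - Z_975 * se_rd, rd + Z_975 * se_rd),
        "rr": rr,
        "rr_ci": (rr * math.exp(-Z_975 * se_log_rr),
                  rr * math.exp(Z_975 * se_log_rr)),
    }


# Same argument order as the csi call: 103 232 232 206.
result = risk_measures(103, 232, 232, 206)
print(f"RD = {result['rd']:.4f}, 95% CI {result['rd_ci']}")
print(f"RR = {result['rr']:.4f}, 95% CI {result['rr_ci']}")
```

Both intervals exclude the null value (0 for the risk difference, 1 for the risk ratio), which is consistent with the small p-value reported by the chi-squared test.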

c) Conduct a hypothesis test of whether biological sex and snoring risk are associated. Remember to complete the 4 steps.

State:
H0: biological sex and habitual snoring are independent (RR = 1)
HA: biological sex and habitual snoring are associated (RR ≠ 1)
Convert: Calculate an appropriate test statistic for assessing these hypotheses.

Observed counts
              Female   Male   Total
Snoring         103     232     335
No snoring      232     206     438
Total           335     438     773

Expected counts under H0 (row total * column total / grand total)
              Female                      Male
Snoring       335*335/773 = 145.1811      335*438/773 = 189.8189
No snoring    438*335/773 = 189.8189      438*438/773 = 248.1811

X^2 = sum over the four cells of (O - E)^2 / E
    = (103-145.18)^2/145.18 + (232-189.82)^2/189.82 + (232-189.82)^2/189.82 + (206-248.18)^2/248.18
    = 38.17

Compare:
We can use the tables in Sullivan to compare to the known distribution. We see that the critical value
for a chi-squared with 1 df at the alpha=0.05 level is 3.84. Since our statistic (38.17) is greater than
3.84, our decision is to reject H0 at the alpha=5% level. In addition, the Stata output shows
p < 0.001, indicating (as we would expect) that we can reject H0 at the alpha=5% level.

Interpret:
From the test, we conclude that biological sex is associated with habitual snoring: women were less
likely to snore than men. We reject the null hypothesis of no association at an alpha level of 5%. The
estimated probability of habitual snoring among women who worked in Wisconsin state agencies in 1993
was 0.5805 times that among men who worked in the same agencies.

Problem 4
A clinical trial is planned to compare an experimental medication designed to lower systolic blood pressure to a placebo. A total of 78
patients are recruited and randomized to receive either the experimental medication or the placebo. At the end of the trial, the mean
SBP among the 40 experimental patients is 128.5 mm Hg, with a SD of 10.4 mm Hg. For the 38 patients in the placebo arm, the mean
SBP is 135.3 mm Hg and the SD is 9.8 mm Hg.

Conduct an appropriate test to assess whether there is a significant difference in mean SBP between groups
using alpha=0.05. Remember to complete the 4 steps!

State: Let μA denote the population mean blood pressure for the experimental medication, and
μB denote the population mean blood pressure for placebo.
H0: μA = μB
HA: μA ≠ μB

Convert: Calculate an appropriate test statistic for assessing these hypotheses.


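Sample means: x̄A = 128.5, x̄B = 135.3; sample SDs: SA = 10.4, SB = 9.8; sample sizes: nA = 40, nB = 38

t = (x̄A - x̄B - 0) / (Sp * sqrt(1/nA + 1/nB))

Sp = sqrt[ ((nA - 1)SA^2 + (nB - 1)SB^2) / (nA + nB - 2) ]
   = sqrt[ ((39)10.4^2 + (37)9.8^2) / (40 + 38 - 2) ]
   = sqrt(7771.72 / 76) = sqrt(102.2595) = 10.1123

t = (128.5 - 135.3 - 0) / (10.1123 * sqrt(1/40 + 1/38))
  = -6.8 / (10.1123 * 0.2265) = -2.9685

|t| = |-2.9685| = 2.9685; df = nA + nB - 2 = 40 + 38 - 2 = 76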
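
The pooled-variance calculation can be double-checked in code. The following Python sketch computes the pooled SD, the t statistic, and the degrees of freedom from the summary statistics alone. It assumes equal population variances, as the hand calculation does, and the function name `pooled_t` is illustrative.

```python
import math


def pooled_t(n_a, mean_a, sd_a, n_b, mean_b, sd_b):
    """Two-sample t statistic with pooled (equal) variances.

    Returns (t, pooled SD, degrees of freedom).
    """
    df = n_a + n_b - 2
    sp = math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df)
    se = sp * math.sqrt(1 / n_a + 1 / n_b)  # standard error of the mean difference
    return (mean_a - mean_b) / se, sp, df


# A = experimental medication, B = placebo.
t, sp, df = pooled_t(40, 128.5, 10.4, 38, 135.3, 9.8)
print(f"Sp = {sp:.4f}")  # 10.1123
print(f"t = {t:.4f}")    # -2.9685
print(f"df = {df}")      # 76
```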

Compare:
We have |t| = 2.9685 with 76 degrees of freedom.
From the t-table, the two-sided alpha = 0.05 critical value for a t with a large number of df is 1.96
(about 1.99 for df = 76). Since |t| = 2.9685 exceeds the critical value, we reject H0, the hypothesis
of equal means.
In addition, we could use the t-table to find that p < 0.01.

Interpret:
There is sufficient evidence in this clinical trial to support the claim that mean systolic blood
pressure differs between the experimental medication and placebo groups. We reject the null
hypothesis of equal means at an alpha level of 5%.

Problem 5
Two other groups of researchers (one at Kale University, the other at Princegum) decide to try to replicate the
finding in problem 4 separately. Each group conducts their own RCT, and their summary data appears below.

Kale University        N    Sample Mean   Sample SD
Experimental group     15      128.4        10.4
Placebo                15      135.2         9.8

Princegum University   N    Sample Mean   Sample SD
Experimental group     40      128.6        20.4
Placebo                40      135.4        21.8

These data were read into STATA and analyzed. Use the output to answer the questions below.
[The TTESTI command in Stata accepts the sample size, sample mean, and sample SD for each group and computes the t-test for us.]

. ttesti 15 128.4 10.4 15 135.2 9.8

Two-sample t test with equal variances


------------------------------------------------------------------------------
| Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
x | 15 128.4 2.685268 10.4 122.6407 134.1593
y | 15 135.2 2.530349 9.8 129.7729 140.6271
---------+--------------------------------------------------------------------
combined | 30 131.8 1.919531 10.5137 127.8741 135.7259
---------+--------------------------------------------------------------------
diff | -6.8 3.689625 -14.35785 .7578544
------------------------------------------------------------------------------
diff = mean(x) - mean(y) t = -1.8430
Ho: diff = 0 degrees of freedom = 28

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0


Pr(T < t) = 0.0380 Pr(|T| > |t|) = 0.0759 Pr(T > t) = 0.9620

. ttesti 40 128.6 20.4 40 135.4 21.8

Two-sample t test with equal variances


------------------------------------------------------------------------------
| Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
x | 40 128.6 3.225523 20.4 122.0758 135.1242
y | 40 135.4 3.446883 21.8 128.428 142.372
---------+--------------------------------------------------------------------
combined | 80 132 2.376354 21.25475 127.27 136.73
---------+--------------------------------------------------------------------
diff | -6.8 4.720699 -16.19819 2.59819
------------------------------------------------------------------------------
diff = mean(x) - mean(y) t = -1.4405
Ho: diff = 0 degrees of freedom = 78

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0


Pr(T < t) = 0.0769 Pr(|T| > |t|) = 0.1537 Pr(T > t) = 0.9231

a) How do the mean SBP values compare across the 3 studies?

The mean SBP values are very similar across the three studies; within each arm they differ by at
most 0.2 mm Hg.

                               Original   Kale    Princegum
Mean SBP, experimental drug     128.5     128.4     128.6
Mean SBP, placebo               135.3     135.2     135.4

b) How do the sample SD values compare across the 3 studies?

The sample SDs are identical in the original and Kale studies. The Princegum study has SDs roughly
twice as large as those in the other two studies.

                         Original   Kale    Princegum
SD, experimental drug      10.4     10.4      20.4
SD, placebo                 9.8      9.8      21.8

c) How do the sample sizes compare across the 3 studies?

The sample size in the Kale study is smaller (15 per arm), whereas the other two studies have similar
sample sizes.

                                  Original   Kale   Princegum
Sample size, experimental drug       40       15       40
Sample size, placebo                 38       15       40

d) Based on the Kale study, what is the p-value for comparing mean SBP across treatment groups, using a two-sided alpha=0.05
level test? What do you conclude?

According to the Stata output for the Kale study, t = -1.8430 and the two-sided p-value is
Pr(|T| > |t|) = 0.0759. Since 0.0759 > 0.05, we do not reject H0, the hypothesis of equal means, at an
alpha level of 5%.

(Referring to the t-table, the two-tailed alpha = 0.05 critical value for a t with df = 28 is 2.048.
Since |t| = 1.8430 < 2.048, we do not reject H0, the hypothesis of equal means.)
In the Kale study, there is insufficient evidence to support the claim that mean systolic blood
pressure differs between the experimental medication and placebo groups. We do not reject the null
hypothesis of equal means at an alpha level of 5%.

e) Based on the Princegum study, what is the p-value for comparing mean SBP across treatment groups, using a two-sided
alpha=0.05 level test? What do you conclude?

According to the Stata output for the Princegum study, t = -1.4405 and the two-sided p-value is
Pr(|T| > |t|) = 0.1537. Since 0.1537 > 0.05, we do not reject H0, the hypothesis of equal means, at an
alpha level of 5%.

(Referring to the t-table, the two-tailed alpha = 0.05 critical value for a t with df = 78 is about 1.99.
Since |t| = 1.4405 < 1.99, we do not reject H0, the hypothesis of equal means.)
In the Princegum study, there is insufficient evidence to support the claim that mean systolic blood
pressure differs between the experimental medication and placebo groups. We do not reject the null
hypothesis of equal means at an alpha level of 5%.

f) Did you obtain similar test results from the original study and the Kale study? Why did you see what you did? Explain.

No, the test results are not similar. The original and Kale studies have very similar means and SDs, but
the sample sizes differ. The original study provided sufficient evidence to conclude that mean systolic
blood pressure differs significantly between the experimental drug and placebo, while the Kale study
did not reach that conclusion. With a smaller sample, the standard error of the difference in means is
larger (and the degrees of freedom are smaller), so the same 6.8 mm Hg difference yields a smaller |t|.

g) Did you obtain similar test results from the original study and the Princegum study? Why did you see what you did? Explain.

No, the test results are not similar. The original and Princegum studies have very similar means and
sample sizes, but the standard deviations differ. The original study provided sufficient evidence to
conclude that mean systolic blood pressure differs significantly between the experimental drug and
placebo, while the Princegum study did not reach that conclusion. A larger SD increases the standard
error of the difference in means, so the same 6.8 mm Hg difference yields a smaller |t|.
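
The answers to (f) and (g) both come down to the standard error in the denominator of the t statistic. As an illustration, this Python sketch holds the mean difference at -6.8 mm Hg and recomputes the pooled standard error for each study from its sample sizes and SDs; the names used are illustrative.

```python
import math


def pooled_se(n_a, sd_a, n_b, sd_b):
    """Standard error of the difference in means under pooled variances."""
    sp2 = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return math.sqrt(sp2 * (1 / n_a + 1 / n_b))


# (n experimental, SD experimental, n placebo, SD placebo) for each study.
studies = {
    "Original": (40, 10.4, 38, 9.8),
    "Kale": (15, 10.4, 15, 9.8),
    "Princegum": (40, 20.4, 40, 21.8),
}
mean_diff = -6.8  # experimental minus placebo, the same in all three studies

t_values = {}
for name, args in studies.items():
    se = pooled_se(*args)
    t_values[name] = mean_diff / se
    print(f"{name:10s} SE = {se:.4f}  t = {t_values[name]:.4f}")
```

The Kale study (smaller n) and the Princegum study (larger SD) both have a larger standard error than the original study, so the identical mean difference produces a smaller |t| in each.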

Problem 6
A total of 654 children aged 3-19 participated in a cohort study in Boston, MA designed to examine the
association between exposure to secondhand smoking and respiratory disease. For each participant, at the start
of the study investigators recorded: age in years; forced expiratory volume (FEV), a measure of pulmonary
function with higher values meaning stronger function; height in inches; biological sex; and exposure to
secondhand smoke. The children were then followed up to see whether they were diagnosed with a respiratory
disease (e.g. asthma). The variables in the dataset are coded as follows:

Variable Coding
ID Identification number
AGE Age in years
FEV FEV in liters
HEIGHT Height in inches
SEX Biological sex; 1 if boy; 0 if girl
SMOKE Exposure to secondhand smoke; 1 if yes; 0 if no
RESPIRE Respiratory Disease; 1 if yes; 0 if no

Use the output below to answer the following questions.

. tab smoke respire, row col


+-------------------+
| Key |
|-------------------|
| frequency |
| row percentage |
| column percentage |
+-------------------+
| respire
smoke | 0 1 | Total
-----------+----------------------+----------
1| 279 39 | 318
| 87.74 12.26 | 100.00
| 47.37 60.00 | 48.62
-----------+----------------------+----------
0| 310 26 | 336
| 92.26 7.74 | 100.00
| 52.63 40.00 | 51.38
-----------+----------------------+----------
Total | 589 65 | 654
| 90.06 9.94 | 100.00
| 100.00 100.00 | 100.00

a) What is the estimated probability of respiratory disease among children exposed to secondhand smoke?
Probability of respiratory disease (R+) among children exposed to secondhand smoke (S+) = P(R+|S+) =
39/318 = 12.26%

b) What is the estimated probability of respiratory disease among children not exposed to secondhand smoke?

Probability of respiratory disease (R+) among children not exposed to secondhand smoke (S-) = P(R+|S-)
= 26/336 = 7.74%

c) Which measure of association can be used to capture the relationship between exposure to secondhand smoke and respiratory disease on an
absolute scale? Calculate and interpret this measure of association.

On the absolute scale, we use the risk difference. Risk difference = P(R+|S+) - P(R+|S-) = 12.26% -
7.74% = 4.52%
Children exposed to secondhand smoke experienced a 4.52 percentage point increase
in the risk of respiratory disease compared to children in the unexposed group.

d) What is an appropriate measure of association that can be used to capture the relationship between exposure to secondhand smoke and
respiratory disease on the relative scale? Calculate and interpret this measure of association.

On the relative scale, we use the risk ratio. Risk ratio = P(R+|S+) / P(R+|S-) =
12.26/7.74 = 1.58
Children exposed to secondhand smoke experienced, on average, 1.58 times the risk of
respiratory disease compared to children in the unexposed group.

Problem 7
Using your own words and without copying directly from your textbook or the internet, please define the following terms in 1-2 sentences.

a) Beta or Type II error: The probability of failing to reject the Null Hypothesis when the null is not true in the
population.

b) P-value: The probability of seeing a result as extreme as or more extreme than the one observed in your sample,
given that H0 is true in the population.

c) Significance level: The probability of a Type I error, that is, of rejecting the null hypothesis when it is true in the population.
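
The definition in (c) can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not part of the assignment: samples are drawn from a normal population with known SD 1 in which H0 (mean = 0) is true, and a two-sided z-test at alpha = 0.05 is applied to each sample. The function name and settings are illustrative.

```python
import math
import random


def type_i_error_rate(trials=4000, n=30, critical_z=1.959964, seed=0):
    """Fraction of simulated samples that reject a true H0 at alpha = 0.05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 is true here: the population mean really is 0.
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = sample_mean * math.sqrt(n)  # (mean - 0) / (sigma / sqrt(n)), sigma = 1
        if abs(z) > critical_z:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Proportion of false rejections: {rate:.3f}")
```

With enough trials, the proportion of false rejections should settle near 0.05, the chosen significance level.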
