
Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation

Homework #1
1. The duration of time from first exposure to HIV infection to AIDS diagnosis is called the
incubation period. The incubation periods of a random sample of 7 HIV-infected
individuals are given below (in years):

12.0 10.5 13.5 12.5 9.5 6.3 7.2

a. Report the sample mean.
b. Report the sample median.
c. Report the sample standard deviation.
d. If the number 6.3 above were changed to 1.5, what would happen to the
1. sample mean?
2. sample median?
3. sample standard deviation?

State whether each would increase, decrease, or remain the same.

e. Assume that these data are seven random observations taken from a larger population
whose values are normally distributed (even if this assumption makes little sense).
Using this assumption, coupled with prior computations, estimate an interval that
contains 95% of incubation time values in the population of HIV patients from which
the sample was taken.
f. Suppose another random sample of 100 persons is taken from the same population,
and added to the sample of 7, for a total sample of 107 HIV infected individuals.
How will the following sample statistics based on the sample of 107 compare in value
to the corresponding statistics from the sample of n=7 (greater than,
less than, or about equal to)?

1. sample mean
2. sample median
3. standard deviation values

g. Suppose the distribution of these incubation period values is left-skewed in the
population of persons with HIV. If you were to take a single random sample of each of
the following sizes from this population, what would likely be the shape of the
distribution of sample values?

1. n=75
2. n=200
3. n=3,200
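
The calculations in parts a through e can be sketched in Python. This is a minimal illustration, assuming the standard-library `statistics` module; the helper name `summarize` and the "mean plus or minus 2 SDs" rule of thumb for a 95% range are illustrative assumptions, not something the assignment specifies.

```python
import statistics

def summarize(values):
    """Return (mean, median, sample SD) for a list of numbers."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample SD: divides by n - 1
    return mean, median, sd

# Incubation periods (years) from the question
incubation_years = [12.0, 10.5, 13.5, 12.5, 9.5, 6.3, 7.2]
mean, median, sd = summarize(incubation_years)

# Assumption: under normality, about 95% of values fall within 2 SDs of the mean
interval_95 = (mean - 2 * sd, mean + 2 * sd)

# Part d: replace the value 6.3 with 1.5 and recompute
modified = [1.5 if x == 6.3 else x for x in incubation_years]
mean_d, median_d, sd_d = summarize(modified)

print(f"mean={mean:.2f}, median={median:.2f}, sd={sd:.2f}")
print(f"approx. 95% range: {interval_95[0]:.2f} to {interval_95[1]:.2f}")
print(f"after change: mean={mean_d:.2f}, median={median_d:.2f}, sd={sd_d:.2f}")
```

Comparing the two sets of results shows how a single extreme value affects the mean and standard deviation differently from the median.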

2. The CDC collects anthropometric (weight, height, etc.) data on large samples of US
youth, both male and female, and uses these data to create growth charts, which
essentially characterize the distributions of these measures by age and sex. For example,
for 18-year-old males, the mean body mass index (BMI) is 21.9 kg/m² with a standard
deviation (SD) of 3.2 kg/m². Physicians (and patients) can use these data to figure out
how individual BMI values compare with the age- and sex-specific distribution.
Suppose you are a physician and you are screening patients at a health fair. The
following describes some of the men you have screened. You may assume the
distribution of BMI values for 18-year-old males is a normal distribution.

a. Male 1 had a BMI of 26.7. His BMI was above average by how many SDs?
b. Male 2 had a BMI of 23.5. His BMI was above average by how many SDs?
c. Male 3 had a BMI that was 0.75 SDs below the average. What was his BMI
measure?
d. Estimate a range of normal BMI values, i.e., a range that contains the middle 95% of
the values in the population of 18-year-old males.
e. Not surprisingly, perhaps, the actual distribution of BMI values among 18-year-old
males is right-skewed (slightly, not heavily so). Given this fact, what additional
summary statistics would be necessary to properly estimate the interval in part d?
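
The conversions between BMI values and "number of SDs from the mean" in parts a through d follow one formula, z = (value - mean) / SD, and its inverse. A minimal sketch under the normality assumption stated in the question; the function names `z_score` and `value_from_z` are hypothetical, and the 2-SD cutoff for the middle 95% is a rule-of-thumb assumption.

```python
# Population values given in the question for 18-year-old males (kg/m^2)
POP_MEAN = 21.9
POP_SD = 3.2

def z_score(value, mean=POP_MEAN, sd=POP_SD):
    """Number of SDs that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

def value_from_z(z, mean=POP_MEAN, sd=POP_SD):
    """BMI value that lies `z` SDs from the mean."""
    return mean + z * sd

z_male1 = z_score(26.7)          # part a
z_male2 = z_score(23.5)          # part b
bmi_male3 = value_from_z(-0.75)  # part c: 0.75 SDs below the mean

# Part d, assuming the middle 95% is roughly mean +/- 2 SDs
middle_95 = (value_from_z(-2), value_from_z(2))

print(f"Male 1: {z_male1:.2f} SDs, Male 2: {z_male2:.2f} SDs")
print(f"Male 3 BMI: {bmi_male3:.1f}")
print(f"middle 95%: {middle_95[0]:.1f} to {middle_95[1]:.1f}")
```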

3. Assume blood-glucose levels in a population of adult women are approximately normally
distributed with mean 90 mg/dL and standard deviation 13 mg/dL.

For parts a-c, answer each of the following:

1. What percentage of individuals would be called abnormal and need to be retested?
2. What is the normal range of glucose levels in units of mg/dL?

a. Suppose the abnormal range were defined to be glucose levels outside of 1 standard
deviation of the mean (i.e., either at least 1 standard deviation above the mean, or at
least 1 standard deviation below mean). Individuals with abnormal levels will be
retested.
b. Suppose the abnormal range were defined to be glucose levels outside of 1.5
standard deviations of the mean.
c. Suppose the abnormal range were defined to be glucose levels outside of 2
standard deviations of the mean.
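
For each cutoff in parts a through c, the percentage outside k SDs and the matching normal range can be read from the normal distribution. A minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the function name `abnormal_summary` is hypothetical.

```python
from statistics import NormalDist

# Blood-glucose distribution given in the question (mg/dL)
glucose = NormalDist(mu=90, sigma=13)

def abnormal_summary(k, dist=glucose):
    """Return (percent outside k SDs of the mean, (low, high) normal range)."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    pct_within = dist.cdf(high) - dist.cdf(low)
    pct_abnormal = 100 * (1 - pct_within)
    return pct_abnormal, (low, high)

for k in (1, 1.5, 2):
    pct, (low, high) = abnormal_summary(k)
    print(f"k={k}: {pct:.1f}% abnormal, normal range {low:.1f} to {high:.1f} mg/dL")
```

Widening the cutoff from 1 to 2 SDs shrinks the share of individuals flagged for retesting, which is the trade-off the three parts are meant to expose.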

4. A 2011 article published in the American Journal of Public Health¹ reports on the
association between neighborhood socioeconomic status and cognitive function in women.
As per the authors, the study sample was obtained as follows:

We assessed women aged 65 to 81 years (n = 7479) who were free of
dementia and took part in the Women's Health Initiative Memory Study. The Women's

The following graphic is included in the manuscript:

The 3MSE score is a standardized measure of cognitive functioning (higher values indicate
higher cognitive function), and the NSES quartiles are ordinal categories of neighborhood SES
scores for the subjects in the study (higher scores indicate higher neighborhood SES).

a. What is the range of 3MSE scores across all women included in the study?
b. Which of the NSES quartiles includes the woman with the lowest 3MSE score?
c. Approximately, what is the median 3MSE score for NSES Q1?
d. Approximately, what is the median 3MSE score for NSES Q4?
e. What is the difference in 3MSE medians for Q4 compared to Q1?
f. What type of distribution do the 3MSE scores have in the four NSES quartiles?
g. How do you expect the mean 3MSE score to compare to the median 3MSE score for NSES
Q3?
h. Generally, what is the nature of the relationship between 3MSE scores and NSES quartiles,
i.e., how do the score distributions differ between the four NSES quartiles?

¹ Shih R, et al. Neighborhood Socioeconomic Status and Cognitive Function in Women. American Journal of
Public Health. September 2011, Vol 101, No. 9.

5. A 2011 article published in the Journal of Substance Abuse Treatment² reports the results of
a clinical trial designed to assess the effects of acupuncture on anxiety related to withdrawal
from psychoactive drugs. As per the abstract:

Auricular acupuncture (AA) is a widely accepted treatment option for substance abuse
that is used in more than 700 treatment centers worldwide. Despite claims of perceived
clinical benefits by patients and treatment staff, research efforts have failed to
substantiate purported benefits, and the mechanism(s) by which AA serves in the
treatment of addiction remain inconclusive. Numerous studies have shown AA to
be an effective treatment for perioperative anxiety. In this study, we hypothesize that
AA reduces the anxiety associated with withdrawal from psychoactive drugs. The study
used a randomized, controlled design and included a sample of 101 patients recruited
from an addiction treatment service. Subjects were assigned to one of three treatment
groups (National Acupuncture Detoxification Association [NADA] AA, AA at sham points,
or treatment setting control) and were instructed to attend treatment sessions for 3
days. The primary outcome measure, state anxiety, was assessed using a pretest-posttest
treatment design.

The following boxplots (Figure 4 in the article) show the main results from this study: (higher
anxiety scores indicate higher anxiety)

² Black S, et al. Determining the efficacy of auricular acupuncture for reducing anxiety in patients withdrawing
from psychoactive drugs. Journal of Substance Abuse Treatment, 41 (2011) 279-287.

a. (Approximately) What is the median anxiety score before the session started among
participants randomized to receive acupuncture (NADA)?
b. (Approximately) What is the median anxiety score after the session finished among
participants randomized to receive acupuncture (NADA)?
c. (Approximately) What is the median anxiety score before the session started among
participants randomized to receive Relaxation?
d. (Approximately) What is the median anxiety score after the session finished among
participants randomized to receive Relaxation?
e. Based on these boxplot presentations, what sign would the mean anxiety score difference
take on (+,-) for after the session compared to before the session for each of the three
randomization groups?
f. How are the mean differences (after session minus before session) in anxiety scores
likely to compare in value across the three randomization groups?
g. Based on the boxplots, what conclusion can you make about the efficacy of acupuncture
in reducing anxiety as compared with the other two groups?
h. The authors used the graphic alone to demonstrate the study findings. Why did they not
need to first adjust for potential confounders?
