601 Class 05

Sociology 601: Class 5, September 15, 2009
• Overview
• Homeworks
• Stata & Review standard errors
• Chapter 5
• Point estimation. (A&F 5.1)
• Confidence intervals…
o for a population mean (A&F 5.2)
o for a population proportion (A&F 5.3)
• Choosing a sufficient sample size (A&F 5.4)
What we have accomplished with sampling distributions
• Given a population parameter, we know that a sample statistic will produce a better estimate of the population parameter when the sample is larger. (Better means closer to the parameter on average, with a sampling distribution that is more nearly normal.)
• We know what we are doing at a qualitative level.

What’s next
• We will take it to a quantitative level: How good is a given estimate from a given sample?
• We will go over formal language and equations for using sample statistics to make inferences about population parameters.
• Once we have equations for estimating a population mean and standard deviation, we will discuss formal language for defining an interval estimate: a guess at a range of plausible values for the population parameter, based on the sample.

5.1: Estimation: definitions
• Point estimate: a single number, calculated from a set of data, that is the best guess for the parameter.
• Point estimator: the equation used to produce the point estimate. (Common notation: put a “hat” on the parameter.)
• Interval estimate: a range of numbers around the point estimate within which the parameter is believed to fall. Also called a confidence interval.

The basics of point estimation
• The typical point estimator of a population mean is a sample mean: $\hat{\mu} = \bar{Y} = \frac{\sum Y_i}{n}$
• The typical point estimator of a population proportion is a sample proportion: $\hat{\pi} = \frac{f}{n} = \frac{\sum Y_i}{n}$
• Q: is this a point estimator of a mean? $\hat{\mu} = Y_1$

Point estimators for standard deviations.
Estimated standard deviation of observations in a population: $\hat{\sigma} = s = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n - 1}}$

Typical point estimators for standard errors.
• Estimated standard error of the mean for samples drawn from a population: $\hat{\sigma}_{\bar{Y}} = \frac{\hat{\sigma}}{\sqrt{n}}$
• Special case: estimated standard error of a population proportion: $\hat{\sigma}_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}$

Choosing a good estimator
You can technically use any equation you want as a point estimator, but the most popular ones have certain desirable properties.
• Unbiasedness: The sampling distribution for the estimator ‘centers’ around the parameter. (On average, the estimator gives the correct value for the parameter.)
• Efficiency: If at the same sample size one unbiased estimator has a smaller standard error than another unbiased estimator, the first one is more efficient.
• Consistency: The value of the estimator gets closer to the parameter as sample size increases. Consistent estimators may be biased, but the bias must become smaller as the sample size increases if the consistency property holds true.

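The properties above can be illustrated with a small simulation. This is a minimal sketch and not part of the original lecture: the population (normal, mean 10, standard deviation 2), the sample size of 25, the number of replications, and the function name `simulate` are all assumptions chosen for illustration. It compares the sample mean with a deliberately poor estimator, the first observation taken alone.

```python
import random
import statistics

def simulate(reps=2000, n=25, mu=10.0, sigma=2.0, seed=0):
    """Draw `reps` samples and apply two estimators of the mean to each.

    All defaults are illustrative assumptions, not values from the lecture.
    """
    rng = random.Random(seed)
    mean_estimates = []
    first_obs_estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean_estimates.append(statistics.fmean(sample))  # sample mean
        first_obs_estimates.append(sample[0])            # first observation only
    return mean_estimates, first_obs_estimates

mean_est, first_est = simulate()

# Unbiasedness: both estimators should average out near the true mean of 10.
avg_mean = statistics.fmean(mean_est)
avg_first = statistics.fmean(first_est)

# Efficiency: the sample mean should vary far less from sample to sample.
sd_mean = statistics.stdev(mean_est)
sd_first = statistics.stdev(first_est)

print(f"sample mean:       average {avg_mean:.2f}, spread {sd_mean:.2f}")
print(f"first observation: average {avg_first:.2f}, spread {sd_first:.2f}")
```

Both estimators center on the parameter, but only the sample mean tightens as n grows, which is why it is the more efficient (and consistent) choice.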
Examples for point estimates:
Given the following sample of seven observations:
5,2,5,2,4,5,5
• What is the estimator of the population mean?
• What is the estimate of the population mean?
• What is the estimator of the population standard error?
• What is the estimate of the population standard error for this sample?
• What is the estimate of the population proportion with a value of 5 or greater?
• What is the estimate of the population standard error for the proportion with a value 5 or greater?

5,2,5,2,4,5,5
• What is the estimator of the population mean?
$\hat{\mu} = \bar{Y} = \frac{\sum Y_i}{n}$
• What is the estimate of the population mean?
(5+2+5+2+4+5+5) / 7 = 28 / 7 = 4
• What is the estimator of the population standard error?
$\hat{\sigma}_{\bar{Y}} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{s}{\sqrt{n}}$
• What is the estimate of the population standard error for this sample?
o = sqrt { [(5-4)^2+(2-4)^2+(5-4)^2+(2-4)^2+(4-4)^2+(5-4)^2+(5-4)^2] / (7-1) } / sqrt(7)
o = sqrt { [1 + 4 + 1 + 4 + 0 + 1 + 1] / 6 } / sqrt(7)
o = sqrt(2) / sqrt(7)
o = 1.41 / 2.65
o = 0.53

5,2,5,2,4,5,5
• What is the estimate of the population proportion with a value of 5 or greater?
o = 4/7
o = .57
• What is the estimate of the population standard error for the proportion with a value 5 or greater?
o = sqrt(.57 * (1-.57)) / sqrt(7)
o = sqrt(.57 * .43) / sqrt(7)
o = sqrt(.245) / sqrt(7)
o = .495 / 2.65
o = .187

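The hand calculations for the seven observations can be checked with a few lines of code. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not from the lecture.

```python
import math

sample = [5, 2, 5, 2, 4, 5, 5]
n = len(sample)

# Point estimate of the population mean: sum of Y_i divided by n.
mean = sum(sample) / n

# Estimated standard deviation s, dividing by n - 1.
s = math.sqrt(sum((y - mean) ** 2 for y in sample) / (n - 1))

# Estimated standard error of the mean: s / sqrt(n).
se_mean = s / math.sqrt(n)

# Estimated proportion with a value of 5 or greater, and its standard error.
p_hat = sum(1 for y in sample if y >= 5) / n
se_p = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"mean = {mean:.2f}, s = {s:.2f}, se(mean) = {se_mean:.2f}")
print(f"p_hat = {p_hat:.2f}, se(p_hat) = {se_p:.3f}")
```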
5.2: Interval estimates
• Interval estimate (also called a confidence interval): a range of numbers that we think has a given probability of containing a parameter.
• Confidence coefficient: The probability that the interval estimate contains the parameter. Typical confidence coefficients are .95 and .99.
• We usually are told the desired confidence coefficient, then asked to find the interval estimate appropriate for the confidence coefficient.

Example of confidence interval.
95% confidence interval for a population mean:
$95\%\ \text{c.i.} = \bar{Y} \pm 1.96 \cdot se(\bar{Y})$
$95\%\ \text{c.i.} = \bar{Y} \pm 1.96 \cdot \hat{\sigma}_{\bar{Y}}$
Example using age from IHDS:
. summarize age
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |     215754    27.34663    19.34841          0        116
. ci age
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
         age |     215754    27.34663    .0416549        27.26499    27.42827
Q: how is std. err. of age calculated?
Q: assumptions?

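One way to approach the question about the standard error is to recompute it from the `summarize` output with the formula from the standard-error slide (standard deviation divided by the square root of n). The sketch below is illustrative: it copies the three numbers from the Stata output above and uses z = 1.96, so the interval can differ from Stata's in the last digits.

```python
import math

# Values copied from the `summarize age` output.
obs = 215754
mean = 27.34663
std_dev = 19.34841

# Estimated standard error of the mean: s / sqrt(n).
std_err = std_dev / math.sqrt(obs)

# 95% confidence interval using z = 1.96.
lower = mean - 1.96 * std_err
upper = mean + 1.96 * std_err

print(f"std. err. = {std_err:.7f}")
print(f"95% c.i.  = {lower:.5f} to {upper:.5f}")
```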
Equations for interval estimates.
• Confidence interval of a mean: $\text{c.i.} = \bar{Y} \pm z \hat{\sigma}_{\bar{Y}}$
• and proportion: $\text{c.i.} = \hat{\pi} \pm z \hat{\sigma}_{\hat{\pi}}$
• where $\hat{\sigma}_{\bar{Y}} = s / \sqrt{n}$
• and where you choose z based on the p-value (cumulative probability) that matches the confidence level you want
• Assumption: the sample size is large enough that the sampling distribution is approximately normal

Notes on interval estimates:
• Usually, we are not given z. Instead we start with a desired confidence level (e.g., 95% confidence), and we select an appropriate z-score.
• We generally use a 2-tailed distribution in which ½ of the confidence interval is on each side of the sample mean.
• What does this do to our choice of p-values for the z-scores?

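The link between a two-tailed confidence level and the z-score can be sketched in code. This is an illustration only; the function name `z_for_confidence` is an assumption, and the lookup uses `statistics.NormalDist` from the Python standard library in place of a printed z table.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return the z-score that leaves `level` in the middle of a standard normal.

    Half of the leftover probability goes in each tail, so the cumulative
    probability to look up is 1 - (1 - level) / 2 (for example .975 for 95%).
    """
    tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.2f}")
```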
Equations for interval estimates.
• Example: find c.i. when Ybar = 10.2, s = 10.1, N = 1055, interval = 95%.
• z is derived from the 95% value: what value of z leaves 95% in the middle and 2.5% on each end of a distribution?
• For p = .975, z = 1.96
• The standard error is s/SQRT(n) = 10.1/SQRT(1055) = .31095
• Top of the confidence interval is 10.2 + 1.96*.31095 = 10.8095
• The bottom of the interval is 10.2 – 1.96*.31095 = 9.5905
• Hence, the confidence interval is 9.59 to 10.81

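The steps in this example can be written as a short function. This is a minimal sketch; the name `mean_ci` and its parameters are illustrative, and z is passed in directly as in the slide.

```python
import math

def mean_ci(ybar, s, n, z=1.96):
    """Large-sample confidence interval for a mean: ybar +/- z * s / sqrt(n)."""
    std_err = s / math.sqrt(n)
    return ybar - z * std_err, ybar + z * std_err

low, high = mean_ci(ybar=10.2, s=10.1, n=1055, z=1.96)
print(f"95% c.i. = {low:.2f} to {high:.2f}")
```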
Normality rules for confidence intervals
• Confidence intervals assume a normal sampling distribution (the distribution of the statistic across possible samples)
• Q: when can you assume normality for a sampling distribution of a continuous interval variable (such as income)?
• A1: when N >= 30
• A2: when observations in the population can be assumed to be normally distributed.

5.3: Confidence intervals for population proportions:
• Confidence interval for a population proportion: $\hat{\pi} \pm z \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}$
• Example: 424 of 1000 respondents in a poll report that they plan to vote for candidate X. Calculate a 95% c.i. for this result.
o = .424 +- 1.96 * sqrt { [ .424 * (1-.424) ] / 1000 }
o = .424 +- 1.96 * sqrt { [ .424 * .576 ] / 1000 }
o = .424 +- 1.96 * sqrt { .000244 }
o = .424 +- 1.96 * .0156
o = .424 +- 0.031
o = .393 -> .455

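The poll example can be reproduced with a short function. This is a minimal sketch of the large-sample formula above; the name `proportion_ci` and its parameters are illustrative.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a proportion.

    Uses p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = successes / n
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * std_err, p_hat + z * std_err

low, high = proportion_ci(successes=424, n=1000, z=1.96)
print(f"95% c.i. = {low:.3f} to {high:.3f}")
```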
Normality rules for confidence intervals for sample proportions:
• Q: when can you assume normality for the sampling distribution of a proportion from a dichotomous variable (yes = 1, no = 0)?
• A: when n(p(1-p)) >= 10
• (For what values of p do you need an extra large n to ensure a normal sampling distribution?)
• What can go wrong when you inappropriately assume a normal sampling distribution?

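The rule of thumb above is easy to explore numerically. The sketch below is illustrative (the function names are assumptions): it checks the rule for a given n and p, and finds the smallest n that satisfies it for several values of p.

```python
import math

def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb from the slide: n * p * (1 - p) >= 10."""
    return n * p * (1 - p) >= threshold

def min_n_for_normality(p, threshold=10):
    """Smallest n that satisfies n * p * (1 - p) >= threshold."""
    return math.ceil(threshold / (p * (1 - p)))

for p in (0.5, 0.1, 0.01):
    print(f"p = {p}: need n >= {min_n_for_normality(p)}")

# The poll example passes the rule; the seven-observation sample does not.
print(normal_approx_ok(1000, 0.424))
print(normal_approx_ok(7, 4 / 7))
```

The required n grows quickly as p moves toward 0 or 1, because p(1-p) is largest at p = .5.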
Putting it all together:
• Given the following sample of seven observations:
o 5,2,5,2,4,5,5
• What is the 95% confidence interval of the population mean?

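A worked sketch of the arithmetic, using the sample mean of 4 and s = sqrt(2) computed for these seven observations and z = 1.96. With only seven observations the N >= 30 rule is not met, so this is an illustration of the formula and not necessarily a trustworthy interval.

```latex
\begin{align*}
\hat{\sigma}_{\bar{Y}} &= \frac{s}{\sqrt{n}} = \frac{\sqrt{2}}{\sqrt{7}} \approx 0.5345 \\
95\%\ \text{c.i.} &= \bar{Y} \pm 1.96\,\hat{\sigma}_{\bar{Y}}
  = 4 \pm 1.96 \times 0.5345 \approx 4 \pm 1.05 \\
&\approx 2.95 \text{ to } 5.05
\end{align*}
```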
What is the best phrasing for an interval estimate?
• a.) The 95% confidence interval for the population mean is 6.8 to 9.5? Or…
• b.) There is a 95% probability that the true population mean is between 6.8 and 9.5? Or…
• c.) We estimate that 95% of samples from the underlying population would fall within 1.35 of the true population mean, and we estimate that the true population mean is 8.15?

Confidence intervals using STATA
• Confidence intervals for means and proportions using cii
• 95% confidence interval for General Social Survey sexfreq question
• as per A&F example 5.1
• Command is: cii samplesize mean standarddeviation, level(level)
cii 1055 10.2 10.1, level(95)
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |       1055        10.2    .3109533        9.589842    10.81016
* Variant with higher threshold for “confidence”
cii 1055 10.2 10.1, level(99)
    Variable |        Obs        Mean    Std. Err.       [99% Conf. Interval]
-------------+---------------------------------------------------------------
             |       1055        10.2    .3109533        9.397584    11.00242
* 95% confidence interval for proportion, as per A&F example 5.2
cii 1934 895, level(95)
                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |       1934    .4627715     .011338        .4403617    .4852942

5.4: Choosing the best sample size
• Cost is directly proportional to sample size, so we generally want the minimum sample to do the job.
• Estimating minimum sample size is commonly done with population proportions
• With population proportions, you do not need to make separate guesses about the population mean and standard deviation.
• With population proportions, it is easy to identify a conservative mean (π = .50, which gives the largest standard error), and the bias does not vary much.

Choosing the best sample size for a population proportion
• We already have an equation for the confidence interval: $\text{c.i.} = \hat{\pi} \pm z \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}$
• When we choose the best sample size, we choose one half of the confidence interval (the top one) and solve for n: $n = z^2 \frac{\pi(1 - \pi)}{(\text{c.i.}_{\text{top } 1/2} - \pi)^2}$
• Agresti and Finlay’s term for one half of the confidence interval is the confidence bound B

Sample size example:
• Example: Sample size for election poll:
• Desired 95% c.i. = + or – 3%
• Preliminary estimate: π = .50
• What sample size is needed?
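A sketch of how this example could be worked, assuming the sample-size formula for a proportion, n = z^2 * π(1 - π) / B^2, with B = .03 as the half-width of the interval and z = 1.96 for 95% confidence. The function name is illustrative, and rounding up to the next whole respondent is an assumption about how to report the result.

```python
import math

def sample_size_for_proportion(bound, pi=0.5, z=1.96):
    """Minimum n so that z * sqrt(pi * (1 - pi) / n) is at most `bound`."""
    return math.ceil(z ** 2 * pi * (1 - pi) / bound ** 2)

n_needed = sample_size_for_proportion(bound=0.03, pi=0.50, z=1.96)
print(n_needed)
```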
Choosing the best sample size for a sample mean
• Estimating minimum sample size is less commonly done with population means
• With population means, you need to make separate guesses about the population mean and standard deviation.
• We generally have a hard time making a good guess about a population standard deviation without measuring it.

Choosing the best sample size for a population mean
• We already have an equation for the confidence interval: $\text{c.i.} = \bar{Y} \pm z \frac{s}{\sqrt{n}}$
• When we choose the best sample size, we choose one half of the confidence interval (the top one) and solve for n: $n = z^2 \frac{\sigma^2}{(\text{c.i.}_{\text{top } 1/2} - \mu)^2}$
• Again, Agresti and Finlay’s term for one half of the confidence interval is the confidence bound B

Sample size example:
• Example: Sample size for study of educational attainment among elderly Native Americans:
• Desired 99% c.i. = + or – 1 year
• Preliminary estimates: μ = 12, σ = 2.5
• What sample size is needed?
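A sketch of how this example could be worked, assuming the sample-size formula for a mean, n = z^2 * σ^2 / B^2, with B = 1 year as the half-width of the interval and z = 2.58 for 99% confidence. The function name is illustrative, and rounding up to the next whole respondent is an assumption about how to report the result. Note that the preliminary estimate of μ does not enter the formula; only σ does.

```python
import math

def sample_size_for_mean(bound, sigma, z):
    """Minimum n so that z * sigma / sqrt(n) is at most `bound`."""
    return math.ceil(z ** 2 * sigma ** 2 / bound ** 2)

n_needed = sample_size_for_mean(bound=1.0, sigma=2.5, z=2.58)
print(n_needed)
```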