Methods for drawing conclusions about a population from sample data. Two key methods:
- Point estimate: a single value (such as a mean or a proportion) calculated from the sample.
- Confidence interval: a range of values that is likely to contain the true value of the population parameter.
Standard error of the mean:

SE = \frac{\sigma}{\sqrt{n}}

where n is the number of independent observations and \sigma is the population standard deviation. Since \sigma is typically unknown, if the sample size is n \ge 30 we can use the sample standard deviation s instead.
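A quick sketch of this computation (plain Python, standard library only; the sample values are made up for illustration):

```python
import math

def standard_error(sample):
    """SE of the mean: s / sqrt(n), using the sample standard deviation s."""
    n = len(sample)
    mean = sum(sample) / n
    # sample variance with Bessel's correction (divide by n - 1)
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return math.sqrt(s2 / n)

sample = [4.1, 3.9, 4.5, 4.0, 4.2, 3.8, 4.4, 4.1, 4.3, 3.7]
print(standard_error(sample))
```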
Confidence interval:

\text{point estimate} \pm z^* \times SE

where z^* \times SE is the margin of error. Interpreting a confidence interval: we are XX% confident that the population parameter is between the lower and upper bounds.
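A minimal sketch of a 95% confidence interval for a mean, assuming n is large enough for the normal approximation (1.96 is the standard z^* for 95% confidence; the data are invented):

```python
import math

def confidence_interval(sample, z_star=1.96):
    """point estimate +/- z* x SE, for a sample mean."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    margin = z_star * s / math.sqrt(n)   # margin of error
    return mean - margin, mean + margin

sample = [12.0, 11.5, 13.2, 12.8, 11.9, 12.4, 12.1, 13.0, 11.7, 12.6]
low, high = confidence_interval(sample)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```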
Hypothesis testing

                Do not reject H0                Reject H0
H0 true         Correct decision (1 - \alpha)   Type 1 error (\alpha)
HA true         Type 2 error (\beta)            Correct decision (1 - \beta)

\alpha: significance level (the probability, under H0, that the test concludes HA)
1 - \beta: power of the test (the probability, under HA, that the test concludes HA)

The p-value quantifies how strongly the data favor HA over H0; that is, it is the probability of observing a difference at least as large as the one in the data purely by chance, assuming H0 is true. A small p-value (usually < 0.05) corresponds to sufficient evidence to reject H0 in favor of HA. Hypotheses must be set up before observing the data; if they are not, the test must be two-sided.

Test statistic: Z, when the point estimate is nearly normal.
Z = \frac{\bar{x}_{diff} - 0}{SE}, \quad SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
Since we usually do not know the population standard deviation \sigma, if each sample has at least 30 observations we can use standard deviation estimates based on the samples and a Z-test. For small samples, we use a t-test.
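A sketch of the two-sample Z computation in plain Python (the normal CDF comes from math.erf; the two samples are fabricated and deliberately small, so strictly the t-test below would apply - the numbers only illustrate the arithmetic):

```python
import math

def normal_cdf(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sample_z(x1, x2):
    """Z statistic and two-sided p-value for H0: mu1 - mu2 = 0."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    v1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)
    se = math.sqrt(v1 / n1 + v2 / n2)       # SE of the difference
    z = (m1 - m2) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))  # two-sided
    return z, p_value

# made-up treatment and control measurements
x1 = [5.0, 5.2, 4.9, 5.1, 5.3, 5.0]
x2 = [4.0, 4.2, 3.9, 4.1, 4.3, 4.0]
z, p = two_sample_z(x1, x2)
```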
Confidence interval for a mean (small sample):

\bar{x} \pm t^*_{df} \cdot \frac{s}{\sqrt{n}}

Test statistic for a difference of two means:

T = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
When the standard deviations of the two groups are nearly equal, we can use the pooled standard deviation (by pooling the data we improve the estimate of the variance):

s^2_{pooled} = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}

df = n_1 + n_2 - 2
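A small sketch of the pooled computation (function name and inputs are illustrative):

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation for two groups with similar spread."""
    df = n1 + n2 - 2
    pooled_var = (s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / df
    return math.sqrt(pooled_var), df

# two groups with equal SDs pool back to the same value
sp, df = pooled_sd(2.0, 10, 2.0, 15)
print(sp, df)
```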
F = \frac{MSG}{MSE}

- MSG: mean square between groups (measures variability between the group means), df_G = k - 1
- MSE: mean square error (measures variability within the groups), df_E = n - k

If H0 is true, the variation in the sample means (MSG) should be relatively small compared to the within-group variation (MSE).

Conditions for ANOVA analysis:
- independence of the data
- approximately normal distributions
- approximately constant variance in the groups
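A hand-rolled F statistic for three fabricated groups (no libraries; k is the number of groups, n the total number of observations):

```python
def anova_f(groups):
    """F = MSG / MSE for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # between-group sum of squares, weighted by group size
    ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    msg = ssg / (k - 1)   # df_G = k - 1
    mse = sse / (n - k)   # df_E = n - k
    return msg / mse

groups = [[3.1, 2.9, 3.3], [3.8, 4.0, 3.9], [2.5, 2.7, 2.6]]
print(anova_f(groups))
```

Here the group means clearly differ, so MSG dwarfs MSE and F is large.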
Inference for proportions

Standard error:

SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}

Confidence interval:

\hat{p} \pm z^* \times SE
Test statistic:

Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}

Choosing a sample size for a desired margin of error m, solving for n:

z^* \sqrt{\frac{p(1 - p)}{n}} \le m \quad \Rightarrow \quad n \ge \left(\frac{z^*}{m}\right)^2 p(1 - p)

If we have a good estimate of p, we use it. Otherwise, the standard error is largest when p = 0.5, so to cover the worst-case scenario, if we are not sure about the true p, we choose p = 0.5.
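The sample-size rule above as a one-liner (worst-case p = 0.5 by default; 1.96 is the usual 95% critical value):

```python
import math

def sample_size(margin, z_star=1.96, p=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin."""
    return math.ceil((z_star / margin) ** 2 * p * (1 - p))

# worst case for a 3-percentage-point margin at 95% confidence
print(sample_size(0.03))
```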
Standard error for the difference of two proportions:

SE_{\hat{p}_1 - \hat{p}_2} = \sqrt{SE_{\hat{p}_1}^2 + SE_{\hat{p}_2}^2} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}

Confidence interval:

(\hat{p}_1 - \hat{p}_2) \pm z^* \times SE
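A sketch with invented poll numbers (55% of 200 in one group vs. 48% of 250 in the other):

```python
import math

def diff_prop_ci(p1, n1, p2, n2, z_star=1.96):
    """CI for p1 - p2: (p1_hat - p2_hat) +/- z* x SE."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

low, high = diff_prop_ci(0.55, 200, 0.48, 250)
print(f"({low:.4f}, {high:.4f})")
```

Because this interval contains 0, the observed 7-point gap would not be statistically significant at the 5% level.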
Nontechnical Introduction to Statistical Inference Prepared by: Gabriela Hromis
Hypothesis testing: H0: p_1 - p_2 = 0, HA: p_1 - p_2 \ne 0. When the null hypothesis is p_1 = p_2, we can use the pooled estimate of the proportion (since we are assuming equal proportions):

\hat{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2}

SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}
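A sketch of the pooled two-proportion test, working from success counts (120 of 200 vs. 90 of 200; the counts are made up):

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Z statistic and two-sided p-value for H0: p1 = p2, pooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # same as (p1*n1 + p2*n2)/(n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = two_prop_z(120, 200, 90, 200)
print(z, p)
```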
Chi-square Test
Use the chi-square test when we want to:
- Given a sample of cases that can be classified into several groups, determine if the sample is representative of the general population.
- Evaluate whether data resemble a particular distribution, such as a normal distribution or a geometric distribution.
(a) One-way table

A one-way table describes counts for each outcome of a single variable. We can put our data in a table like this:

Category    C1    C2    C3    C4    Total
Observed
Expected

We want to establish whether the observed counts differ from the expected counts by chance, i.e. whether the sample is representative of the population or not. The chi-square statistic is then:

\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}
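The statistic is a one-line sum (the counts below are invented: 100 cases expected to split evenly over four categories):

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [30, 20, 28, 22]
expected = [25, 25, 25, 25]
print(chi_square(observed, expected))  # compare against chi-square with df = k - 1 = 3
```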
(b) Two-way table

A two-way table describes counts for combinations of outcomes of two variables:

          S1    S2    S3    Total
C1
C2
Total

The expected count for each cell is (row total x column total) / overall total.
Degrees of freedom: (number of rows - 1) x (number of columns - 1).

Conditions for the chi-square test:
- independent observations
- each category has to have at least 5 expected cases
- degrees of freedom \ge 2
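Putting the two-way pieces together (expected counts from row and column totals, then the chi-square sum; the table of counts is made up):

```python
def chi_square_two_way(table):
    """Chi-square statistic and df for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected = (row total x column total) / overall total
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# rows: categories C1, C2; columns: samples S1, S2, S3
table = [[20, 30, 25],
         [30, 20, 25]]
stat, df = chi_square_two_way(table)
print(stat, df)
```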