
Introduction to hypothesis testing: t-tests

In Lecture 3, we discussed the use of the normal distribution in descriptive statistics (bell-shaped traits)

The second use of the normal distribution is in statistical inference


This is possible because there are other quantities showing a normal pattern:
o For example, the mean of sample means!

Confused? That's normal

Understanding this idea is key to statistical inference and testing

If these are bullet holes on a target, where do you guess the bull's eye is?

Somewhere around here?

Yes, but why?

Because we assume that error was randomly distributed around the bull's eye
o if you're trying to hit the target, it is unlikely that you are going to miss more often in one direction

As we've seen, the normal distribution describes such random and symmetrical deviations around a central value

This common-sense observation captures the essence of the Central Limit Theorem, the analytical foundation of predictive statistics

Back to statistics: let's say we take a small sample from a population, and calculate the proportion of atheists, or the mean height, in the sample
o This sample may not represent the true proportion of atheists or the average height in the population: it carries an error

But let's say I take many samples, and calculate mean height in each one; what would their distribution look like? Answer: the same as the bullet holes in the target
o errors are random
o best guess for position of true mean height is the centre of distribution of samples
o i.e. the best guess is the mean of the samples (the mean of sample means)

Example: lottery draw

o We have 100 balls numbered 1 to 100
o (in this case, mean=50.5, sd=29)
o Let's say we take samples of 5 balls, and calculate their mean (one sample at a time)

What happens as number of samples increases?

o the distribution of the sample means approaches a normal distribution
o mean of means approaches true mean of the population of 100 balls!

[Figure: histograms of sample means for N=10, N=30, N=100 and N=200 samples of 5]
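The lottery example can be tried directly in R. The following is a minimal sketch, not the code used for the slides: the seed, the helper name sample_means and the choice of 5000 samples at the end are assumptions made for illustration.

```r
# Sketch of the lottery example: repeated samples of 5 balls from 1..100.
set.seed(42)                      # arbitrary seed, only for reproducibility
balls <- 1:100                    # the population: true mean is 50.5

# Hypothetical helper: draw n_samples samples of `size` balls
# (without replacement within a sample) and return the mean of each sample.
sample_means <- function(n_samples, size = 5) {
  replicate(n_samples, mean(sample(balls, size)))
}

for (n in c(10, 30, 100, 200)) {
  m <- sample_means(n)
  cat("N =", n, "samples: mean of sample means =", round(mean(m), 2), "\n")
}

# With many samples the mean of means settles close to 50.5
many <- sample_means(5000)
cat("5000 samples: mean of sample means =", round(mean(many), 2), "\n")
```

Calling hist(many) afterwards should show a roughly bell-shaped histogram centred near 50.5.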

So when we are trying to identify the true mean of a variable in a population, the best guess comes from sample means: if we have many samples, their mean is the best guess
o But this is very rarely the case!

In most cases, we have only one sample; the sample mean is your best (and only!) estimator of true mean


A sample has a mean and a standard deviation; so what is the probability that it identifies a certain value (the true mean we want to find)? We can say that the true mean is the sample mean plus or minus x (the margin of error)

That's why we calculate standard deviations and confidence intervals

Example: take lifespan variable
> mean(lifespan, na.rm=T)
[1] 69.71495
> sd(lifespan, na.rm=T)
[1] 9.644646
So what is the true mean?

Conventionally, we take the 95% interval around the sample mean as the confidence interval: we are 95% confident that the true mean is inside it
To calculate the 95% confidence interval around the sample mean, type

> t.test(lifespan)$conf.int
[1] 68.34922 71.08068
attr(,"conf.level")
[1] 0.95

Conclusion: if sample mean is 69.71 and standard deviation is 9.64, we are 95% confident that the true mean is between 68.35 and 71.08

If my estimator has mean and standard deviation shown, is the true mean (the bull's eye) inside my 95% confidence interval? In this example, yes!


A difference: instead of the standard deviation, we calculate a standard error of the mean, or sem (because the sample deviation is an error relative to the true mean)

The sem is the sample standard deviation s divided by the square root of the sample size n (the "minus one" that avoids bias is already inside s: a technicality!)

$\mathrm{sem} = s / \sqrt{n}$

Interpretation: (i) sem is proportional to the standard deviation in the population ($\sigma$)
o when the sampled population (balls in a bag, subjects in my study) shows more variation, samples are more variable and the error (deviation between sample mean and true mean) is larger

(ii) sem is inversely proportional to the square root of the sample size (n)
o A random sample of 20 gives a better estimate of the true mean than a sample of 5
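Both points can be checked with a few lines of R. This is a sketch under stated assumptions: sem() is a hypothetical helper (base R has no function of that name), the seed is arbitrary, and the expected values ignore the small correction for sampling without replacement from only 100 balls.

```r
# Standard error of the mean: sample sd divided by sqrt(n).
# sem() is a hypothetical helper, not a base R function.
sem <- function(x) {
  x <- x[!is.na(x)]              # drop missing values, like na.rm = TRUE
  sd(x) / sqrt(length(x))
}

set.seed(1)                       # arbitrary seed for reproducibility
balls <- 1:100                    # lottery population from the earlier example

small <- sample(balls, 5)         # one sample of 5 balls
large <- sample(balls, 20)        # one sample of 20 balls
cat("sem, n = 5: ", round(sem(small), 2), "\n")
cat("sem, n = 20:", round(sem(large), 2), "\n")

# Using the population sd (about 29), the expected sem is 29 / sqrt(n)
expected <- 29 / sqrt(c(5, 20))
print(round(expected, 2))
```

Because the sem depends on the square root of n, making the sample four times larger (5 to 20) only halves the expected error.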


Last thing: to calculate confidence intervals (margin of error), we have to use the Student's t-distribution
o it is similar to the normal, but used when the standard deviation in the population is unknown (remember: we only know the standard deviation of the sample)
o works better than the normal with small sample sizes
o approaches the normal when n is large

This is why tests comparing means are called t-tests
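To see how the t-distribution enters the margin of error, the sketch below compares critical values and then rebuilds the lifespan confidence interval by hand. It uses only the summary statistics reported in these slides; n = 194 is an assumption inferred from df = 193 in the t.test output for lifespan.

```r
# Critical values: how many standard errors wide is a 95% interval?
z_crit <- qnorm(0.975)                       # normal distribution
t_crit <- qt(0.975, df = c(4, 19, 193))      # t for n = 5, 20, 194
print(round(c(normal = z_crit), 3))
print(round(setNames(t_crit, c("df=4", "df=19", "df=193")), 3))

# Confidence interval by hand from the lifespan summary statistics.
m <- 69.71495        # sample mean reported in the slides
s <- 9.644646        # sample standard deviation reported in the slides
n <- 194             # ASSUMPTION: inferred from df = 193 in the t.test output

sem <- s / sqrt(n)
margin <- qt(0.975, df = n - 1) * sem        # margin of error
ci <- c(lower = m - margin, upper = m + margin)
print(round(ci, 2))
```

The t critical values shrink towards the normal value as df grows, and the hand-made interval should agree with the one returned by t.test(lifespan)$conf.int up to rounding.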



We are ready to test hypotheses about means

o is a sample mean representative of the true mean? (one-sample t-test)
o are European countries richer than sub-Saharan countries? (two-sample t-test)
o does a new drug increase survival of patients? (paired t-test)

t-tests provide such group comparisons; they are important to validate statements about social indicators, income, fairness, justice, historical processes etc.
o does European colonisation affect country income?
o does gender affect income?

Test of significance of difference has to take two things into account:

(i) the sample sizes


If my sample sizes are very large, even a very small difference in means will be statistically significant
example: difference in colon cancer incidence between people who eat more than 600 g of red meat per week and those who don't is 13%, and is only identifiable with large samples (~100,000 people)

(ii) the measured difference in means


If the difference is large enough, it will be significant even if sample size is small
Example: if I am comparing average size in mice and elephants, a sample of 1 mouse and 1 elephant is good enough!
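The interplay of sample size and difference can be sketched numerically. The helper p_for_difference below is hypothetical, and it assumes two groups of equal size n with a common standard deviation s (a simplification chosen for this sketch, not the Welch test used later).

```r
# Two-sided P-value for a difference d between two group means,
# ASSUMING equal group sizes n and a common standard deviation s
# (a simplification chosen for this sketch).
p_for_difference <- function(d, s, n) {
  sedm <- s * sqrt(2 / n)            # standard error of the difference
  t <- d / sedm                      # standardised difference
  2 * pt(-abs(t), df = 2 * n - 2)
}

# Tiny difference (2% of one sd): not detectable with n = 50 per group,
# but clearly significant with n = 100000 per group
p_small_n <- p_for_difference(d = 0.02, s = 1, n = 50)
p_large_n <- p_for_difference(d = 0.02, s = 1, n = 100000)
print(c(n50 = p_small_n, n100000 = p_large_n))

# Huge difference (10 sd): significant even with 3 cases per group
p_huge_d <- p_for_difference(d = 10, s = 1, n = 3)
print(p_huge_d)
```

The same small difference moves from clearly non-significant to clearly significant only because n grows, while a very large difference is significant with a handful of cases.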


t-tests simply calculate whether the difference between two means/values is real = statistically significant = different from zero
o if difference is zero, they are not different!

In order to use probability distributions, we must standardise variables; so the difference is standardised by dividing it by its standard error

$t = (\bar{x} - \mu_0) / \mathrm{sem}$  (sample mean minus test value, divided by sem)

So what we want to know is whether t (the standardised difference) is too different from zero (i.e. not similar)

[Figure: t-distribution with the central 95% between t=-1.96 and t=1.96, and 2.5% in each tail]

What is too different? Conventionally, we calculate 95% confidence intervals; if a value is inside it, it is not significantly different from the test value (we'll see how it works)

Basic rule: what we need to know is the P-value (probability value) of a t-test
In a t-test, the null hypothesis (=status quo, conservative hypothesis) is always that there is no difference between the two compared values
o i.e. if you want to show that two groups differ, you must reject the null hypothesis

The P-value of a test is the probability of observing a difference at least as large as the one in our sample if the null hypothesis is true (i.e. if groups are not different)
o conventionally, we only reject the null hypothesis if the P-value is less than 5% (P<0.05)
o (as we'll see, that's because we use 95% confidence intervals)
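What the 5% convention means in practice can be illustrated with a small simulation. This is a sketch with made-up settings (seed, 2000 simulated studies, samples of 20 from a normal population with mean 70 and sd 10); none of these numbers come from the lifespan data.

```r
set.seed(123)                 # arbitrary seed for reproducibility
n_sim <- 2000                 # number of simulated studies (hypothetical)

# Each simulated study: 20 values from a population whose true mean IS 70,
# tested against mu = 70, so the null hypothesis is true by construction.
p_values <- replicate(
  n_sim,
  t.test(rnorm(20, mean = 70, sd = 10), mu = 70)$p.value
)

# Share of studies that would (wrongly) reject the null at P < 0.05
false_alarm_rate <- mean(p_values < 0.05)
print(false_alarm_rate)
```

When the null hypothesis is true, about 5% of samples still give P<0.05 by chance alone; that is the risk accepted by the convention.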

Example: is life expectancy in the world different from 70 years?

o one-sample t-test: we are comparing a group to a value (=the test value; a hypothetical true value of 70 years)

How to do it in R? Just specify test value as mu=70

> t.test(lifespan, mu=70)

        One Sample t-test

data:  lifespan
t = -0.4117, df = 193, p-value = 0.681
alternative hypothesis: true mean is not equal to 70
95 percent confidence interval:
 68.34922 71.08068
sample estimates:
mean of x
 69.71495

o Sample mean=69.71 (that doesn't seem to be very different from 70)
o t=-0.41: the t statistic, the standardised difference between sample mean and test value, is close to zero
o 95% CI: [68.35-71.08]. Confidence interval of lifespan: my sample suggests that life expectancy in the world is between 68.35 and 71.08 years; and this includes 70 years
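The t statistic and P-value in this output can be recomputed from the summary statistics alone. The sketch below assumes n = 194 (df + 1) and uses the sample standard deviation 9.644646 reported earlier for lifespan; the variable names are chosen for illustration.

```r
m  <- 69.71495     # sample mean from the t.test output
s  <- 9.644646     # sample standard deviation reported earlier
n  <- 194          # ASSUMPTION: df + 1, from df = 193 in the output
mu <- 70           # test value

sem <- s / sqrt(n)
t_stat <- (m - mu) / sem                       # standardised difference
p_value <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided P-value

print(round(c(t = t_stat, p = p_value), 4))
```

The factor 2 makes the test two-sided: it counts samples that fall as far from 70 in either direction, matching the alternative hypothesis "true mean is not equal to 70".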

In the same output, P-value=0.681=68%
o This is the probability of finding a difference at least this large between sample mean and test value if the null hypothesis (=life expectancy is not different from 70 years) is true
o P is high: therefore, you cannot reject the null hypothesis

Conclusion: based on our sample, life expectancy in the world is not significantly different from 70 years

But is life expectancy in the world different from 75 years?

o now we set mu=75

> t.test(lifespan, mu=75)

        One Sample t-test

data:  lifespan
t = -7.6324, df = 193, p-value = 1.033e-12
alternative hypothesis: true mean is not equal to 75
95 percent confidence interval:
 68.34922 71.08068
sample estimates:
mean of x
 69.71495

So how likely is a sample like ours if average lifespan across countries is 75 years?

o P = 1.033*10^(-12) = 0.000000000001033 = 0.0000000001033%; this is very low!
o We must reject the null hypothesis and accept the alternative hypothesis
o t=-7.63; that's significantly different from 0
o 75 years is outside the 95% CI
o Therefore, life expectancy is below 75 years
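As a check on this output, the t statistic can be worked out from the summary statistics, assuming n = 194 (df + 1) and the sample standard deviation 9.644646 reported earlier for lifespan:

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
  = \frac{69.71495 - 75}{9.644646 / \sqrt{194}}
  \approx \frac{-5.28505}{0.69245}
  \approx -7.63
```

The sample mean is more than seven standard errors below 75, far outside the range of about two standard errors either side that the 95% convention allows.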

You may also want to test whether two samples are significantly different in some respect
o for example, are South and Southeast Asian countries richer than Latin American countries?
o i.e. do differences or similarities in economic models in recent decades cause differences in average income between the two areas?

Procedure is similar: but t-statistic is now the difference between means of the two compared groups

$t = (\bar{x}_1 - \bar{x}_2) / \mathrm{sedm}$

sedm (standard error of the difference of means) is automatically calculated by R

In file HDR2011, variable continent is seasia for South and Southeast Asian countries and latin for Latin American countries; others are NA (not available)

> t.test(GNI ~ continent)

        Welch Two Sample t-test

data:  GNI by continent
t = -1.1455, df = 20.327, p-value = 0.2653
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -13340.319   3876.397
sample estimates:
 mean in group latin mean in group seasia
            9054.355            13786.316

o Welch test is the name of this t-test
o P-value=0.26=26%: we cannot reject the null hypothesis; the areas do not differ significantly by income
o Notice that the 95% CI of the difference in income between the areas includes zero; i.e. you cannot exclude zero difference in income from your confidence interval

Conclusion: we may think that the difference of ~US$4,700 between the areas is large enough to show a significant difference. But it isn't: there is too much variation in income in the two areas
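How R arrives at the Welch t statistic can be sketched with made-up numbers. The data below are simulated for two hypothetical regions and are NOT the HDR2011 values; the seed, group sizes, means and standard deviations are all assumptions chosen for illustration.

```r
set.seed(7)   # arbitrary seed for reproducibility

# Made-up incomes for two hypothetical regions (NOT the HDR2011 data)
region_a <- rnorm(12, mean = 9000,  sd = 5000)
region_b <- rnorm(15, mean = 13000, sd = 12000)

# Welch standard error of the difference of means:
# each group contributes its own variance divided by its own sample size
sedm <- sqrt(var(region_a) / length(region_a) +
             var(region_b) / length(region_b))
t_by_hand <- (mean(region_a) - mean(region_b)) / sedm

# t.test() on two vectors runs the Welch test by default (var.equal = FALSE)
fit <- t.test(region_a, region_b)
print(c(by_hand = t_by_hand, t.test = unname(fit$statistic)))
```

A large spread within either group inflates sedm and pulls t towards zero, which is why a difference of several thousand dollars can still fail to reach significance.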

A paired test should be used when the two compared measurements are linked, i.e. the subjects/cases are not independent
For example, the two group means may come from two measurements on the same individuals
o In the case of a trial of a new drug for blood pressure, blood pressure before and after drug administration in the same patients

Load the library ISwR

> library(ISwR)
> attach(intake) #this is a data set in the library ISwR
> intake # what does it look like? or try head(intake)

The data set intake has data on pre- and post-menstrual energy intake (in kJ) in 11 women;
o Question: is there a difference in energy intake before and after menstruation?

So now let's try a paired t-test:

> t.test(pre, post, paired=T)

        Paired t-test

data:  pre and post
t = 11.9414, df = 10, p-value = 3.059e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1074.072 1566.838
sample estimates:
mean of the differences
               1320.455

o P value: very low! We must reject the null hypothesis (no difference)
o Confidence interval: we are 95% confident that the difference in intake is between 1074 and 1567 kJ
o Conclusion: there is a clear difference between intake pre and post
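A paired t-test is the same calculation as a one-sample t-test on the within-subject differences, tested against zero. The sketch below shows this with made-up blood pressure readings for six hypothetical patients (not the intake data).

```r
# Made-up blood pressure readings for 6 hypothetical patients
before <- c(150, 142, 160, 155, 148, 165)
after  <- c(141, 138, 150, 151, 140, 158)

paired  <- t.test(before, after, paired = TRUE)
on_diff <- t.test(before - after, mu = 0)   # one-sample test on the differences

print(unname(paired$statistic))
print(unname(on_diff$statistic))
print(c(paired = paired$p.value, on_diff = on_diff$p.value))
```

Working on differences removes the variation between subjects, which is why a paired design can detect changes that an unpaired comparison of the same numbers would miss.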

1) One-sample t-test
Is income per capita (GNI) in the world significantly less than US$20000?

2) Two-sample t-test
Let us compare schooling years in Southeast Asia and Latin America
o What is the average schooling of children in the two regions?
o Does schooling significantly differ between the two areas? What is the P-value of the test?

3) Paired t-test
Give two examples of studies that could require paired t-tests

Confidence intervals and all t-tests assume a normal distribution, even when the sample is small
o And they are based on a theory of means of various samples, which in practice we don't have
o That's why you do not prove differences; you compare groups and give an estimate of how likely the observed difference would be if the groups were similar

Remember: null hypothesis is always that means are not different
Current trend is to provide confidence intervals rather than P values when reporting results of tests in general (not just t-tests), so get used to calculating and interpreting them
