Lecture 6: Hypothesis Testing

ESS 116
Introduction to Data Analysis in Earth Science

Instructor: Mathieu Morlighem

E-mail: mmorligh@uci.edu (include ESS116 in subject line)
Office Hours: 3218 Croul Hall, Friday 3:00 pm - 4:15 pm
Image Credit: NASA
Midterm exam
Part 1: take home available NOW on the class website:
https://eee.uci.edu/15f/42120/midterm
DUE: November 5th, 2:00 pm (do your own work)
I highly recommend starting to work on the midterm early
Part 2: 30 min in class next week

November 5th, 2:40 pm (MSTB 118)
Open Book (no laptop/cell phone/tablet)
Mix of multiple choice and short answer questions
Everything from Lecture 1 to 5 (no hypothesis testing)
No make up exam for the midterm, no late submission

Midterm grade dropped if Final grade is better
Midterm Evaluation
Open until next lecture
What should be improved?
What can I do to help you learn, or is there something
that isn't working?
Are the quick reviews useful?
Are the labs useful? Should they be longer?
Do you want more MATLAB, more stats, or is this a good
balance?
Today's lecture
1. Lecture 5 quick review

2. Lecture 6 Hypothesis testing
Sampling Distribution of the sample mean
Central Limit Theorem (CLT)
Confidence intervals
Hypothesis Testing
t-test (Comparing means)
χ²-test (Goodness of fit)
Lecture 5 - review
Population: the actual properties of the real world
Sample: set of values imperfectly representing the
population

Parameters: refer to the population (e.g., μ and σ)
Statistics: refer to the sample (e.g., x̄ and s)

Accuracy: quality of being close to the true value
Precision: number of significant digits in a numerical
value (measurements or calculation)
Lecture 5 - review
Sample visualization
Frequency Table
Cumulative Frequency
Histogram
Rules for a good histogram

$\text{number of bins} \approx \sqrt{\text{number of data values}}$

histogram takes either a number
of bins, or a list of bin edges
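As a minimal sketch of the two calling forms mentioned above, assuming a made-up sample `x` (not from the lecture): MATLAB's `histcounts` bins data the same way `histogram` does and accepts either a bin count or a vector of bin edges.

```matlab
% Hypothetical sample of 100 evenly spaced values (illustrative data only)
x = linspace(0, 10, 100);

% Rule of thumb: number of bins close to sqrt(number of data values)
nbins = round(sqrt(numel(x)));

% Form 1: pass a number of bins
counts1 = histcounts(x, nbins);

% Form 2: pass a list of bin edges (11 edges define 10 bins)
edges   = 0:1:10;
counts2 = histcounts(x, edges);

% To draw the figure instead of only counting, the same arguments work:
% histogram(x, nbins);   or   histogram(x, edges);
```

Passing edges gives full control over where bins start and stop, which makes histograms of different samples directly comparable.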
What you need to know

Central Tendency:
Mean (average)
Median (50% higher, 50% lower)
Mode(s) (peak value(s))
Dispersion:
Range (max - min)
Standard deviation (average distance to mean)
Variance (square of Std Dev)
Shape:
Skewness (positive: tail to the right, negative: tail to the left)
Know how they relate to visual features on a histogram
Probability Density Functions

Histograms: empirical frequency distribution of our
sample.
A histogram for N → ∞ and an infinitely small bin
size will produce a Probability Density Function (PDF)
The probability that x is between x1 and x2 is:
$P(x_1 < x < x_2) = \int_{x_1}^{x_2} f(x)\,dx$
Examples of theoretical distributions:

Normal distribution (2 parameters: μ and σ)
Z distribution (0 parameters)
Student's t distribution (1 parameter: ν)
MATLAB theoretical distributions

Normal (μ,σ)
Given x0, find p0
>> p0 = normcdf(x0,mu,sigma);
Given p0, find x0
>> x0 = norminv(p0,mu,sigma);
Z-distribution
>> p0 = normcdf(x0);
>> x0 = norminv(p0);
t-distribution
>> p0 = tcdf(x0,V);
>> x0 = tinv(p0,V);
χ²-distribution
>> p0 = chi2cdf(x0,V);
>> x0 = chi2inv(p0,V);

p0 = P(x < x0)

e.g.: 0.88 = P(x < 1.17)
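The cdf and inv functions are inverses of each other. The sketch below checks this on the slide's own numbers (x0 = 1.17 on the Z distribution); the degrees of freedom `V = 6` and the probabilities 0.975 and 0.98 are arbitrary values chosen for illustration. It assumes the Statistics and Machine Learning Toolbox is installed.

```matlab
% Standard normal (Z) distribution: mu = 0, sigma = 1
mu = 0; sigma = 1;

% Given x0, find p0 = P(x < x0)
x0 = 1.17;
p0 = normcdf(x0, mu, sigma);          % about 0.88

% Given p0, recover x0 with the inverse CDF
x0_back = norminv(p0, mu, sigma);     % 1.17 again, up to round-off

% Same round trip for the t and chi-square distributions
V  = 6;                               % degrees of freedom (illustrative)
pt = tcdf(tinv(0.975, V), V);         % recovers 0.975
pc = chi2cdf(chi2inv(0.98, V), V);    % recovers 0.98
```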
i>Clicker question
ESS 116 grades follow a Normal distribution of mean
800 with a standard deviation of 100. What is the
probability of having a grade below 500?
A. 1 - normcdf(500,800,100);
B. 1 - norminv(500,800,100);
C. normcdf(500,800,100);
D. norminv(500,800,100);
Lecture 6 Hypothesis testing
Sampling distributions
For one population, results will vary from sample to
sample
How much do we expect these results to vary from
sample to sample?
Sampling distribution: distribution associated with
samples rather than individual values from a population
Example: Sampling distribution of the sample mean
Graph of all possible values of the sample mean and
how often they occur
The mean of the population of all possible sample
means is the same as the mean of the entire population
Sampling distribution
[Figure: population distribution (left) and sampling distribution of the sample mean for n=10 (right)]
i>Clicker question
What would happen to the standard deviation of the sample mean (right) if we
increase the number of rolls for all samples (n=20 or n=100)?
A. The standard deviation would increase
B. The standard deviation would decrease
C. The standard deviation would remain unchanged
D. Don't know.
Standard Error (SE)

Variability of X is measured by the standard deviation σ
There might be a gap between the sample mean x̄
and the population mean μ

Standard Error: variability in the sample mean
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{\text{Population standard deviation}}{\sqrt{\text{Sample size}}}$
Decreases as the sample size increases (more precise)
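To see how quickly the standard error shrinks, here is a small sketch that evaluates σ/√n for several sample sizes. It assumes the population is a single roll of a fair six-sided die, which is an illustrative choice, and the variable names are hypothetical.

```matlab
% Population: one roll of a fair six-sided die (assumed for illustration)
faces = 1:6;
mu    = mean(faces);                     % population mean, 3.5
sigma = sqrt(mean((faces - mu).^2));     % population std, sqrt(35/12)

% Standard error of the sample mean for several sample sizes
n  = [10 20 100];
SE = sigma ./ sqrt(n);

for k = 1:numel(n)
    fprintf('n = %3d   SE = %.4f\n', n(k), SE(k));
end
```

Because of the square root, dividing the standard error by 10 requires 100 times more data.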
Central Limit Theorem

If the distribution of X is normal:
The distribution of the sample mean is also normal

If the distribution of X is unknown or not normal

If n>30 the distribution of the sample mean x̄ can be
approximated by a normal distribution:
Mean: $\mu_{\bar{x}} = \mu$
Standard deviation: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

The Central Limit Theorem does not care what the
distribution of X is!
http://onlinestatbook.com/2/sampling_distributions/clt_demo.html

[Figure: population distribution and sampling distribution of the sample mean, with $\mu_{\bar{x}} = \mu$]
Central Limit Theorem in Action
Example
The average male drinks 2 L of water when active
outdoors (with a standard deviation of 0.7 L). You are
planning a full day in nature with 50 men and will bring
110 L of water. What is the probability that you run out?
Example
Population distribution: μ = 2 L, σ = 0.7 L
Sampling distribution of the sample mean: $\mu_{\bar{x}} = \mu = 2$ L, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{N}} = \dfrac{0.7}{\sqrt{50}}$ L
P(run out) = P(average water use > 110/50 L)
= P(average water use > 2.2 L)
= P(x̄ > 2.2)
= 1 - P(x̄ < 2.2)
= 1 - normcdf(2.2,2,0.7/sqrt(50))
= 0.0217
The probability of running out of water is 2.17%
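The calculation above can be sketched step by step as follows. The variable names are hypothetical, and `normcdf` requires the Statistics and Machine Learning Toolbox.

```matlab
mu    = 2;      % mean water use per man (L)
sigma = 0.7;    % standard deviation per man (L)
n     = 50;     % number of men
total = 110;    % water brought (L)

SE        = sigma / sqrt(n);     % standard error of the sample mean
threshold = total / n;           % average use per man that empties the supply

% Upper-tail probability of the sampling distribution of the sample mean
p_run_out = 1 - normcdf(threshold, mu, SE);
fprintf('P(run out) = %.4f\n', p_run_out);
```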
Confidence Interval
Confidence interval in the mean

Confidence Intervals: provide statistical limits for your mean
values based on a degree of statistical confidence.

Ex: We can say with 95% confidence that the average
temperature in Irvine is within [18°C, 24°C] or 21 ± 3 °C

How to calculate this interval?

Set the level of significance (α = 0.05 for a 95% CI)
Use a normalized sampling distribution of the sample mean:
x̄ follows a normal distribution, so
$Z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$
follows a z-distribution!
Find T such that
$P(\mu - T < \bar{x} < \mu + T) = P(-T < \bar{x} - \mu < +T) = P\left(-\dfrac{T}{\sigma_{\bar{x}}} < Z < \dfrac{T}{\sigma_{\bar{x}}}\right) = 1 - \alpha$
Distribution of the sample mean
CLT: the distribution of the sample mean is nearly Normal

What if we don't know σ, can we say σ ≈ s?
If the sample size n > 30: Yes
If the sample size n < 30: Yes, but
the distribution of X needs to be roughly normal
we pay a penalty: a t-distribution (fatter tails)
Example n>30
You sample 36 apples from your farm's harvest of over
200,000 apples. The mean weight of the sample is 112
grams (with s = 40 grams).
What is the probability that the mean weight of the
200,000 apples is within 100 and 124 grams?
Example n>30
Sampling distribution of the sample mean, normalized: $Z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$ with $\sigma_{\bar{x}} \simeq \dfrac{s}{\sqrt{n}} = \dfrac{40}{6}$
P(μ within 12 of x̄) = P(x̄ within 12 of μ)
= P(x̄ within 12 of $\mu_{\bar{x}}$)
= $P\left(-\dfrac{12}{\sigma_{\bar{x}}} < Z < +\dfrac{12}{\sigma_{\bar{x}}}\right)$
= normcdf(12/(40/6)) - normcdf(-12/(40/6))
= 0.9281
We have a 92.8% chance that the actual
mean is within 12 grams of our sample mean
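A short sketch of the apple calculation, with hypothetical variable names. It replaces σ by s, which the n > 30 rule allows, and requires the Statistics and Machine Learning Toolbox.

```matlab
n         = 36;    % sample size
s         = 40;    % sample standard deviation (g)
halfwidth = 12;    % half-width of the interval 100 to 124 g around 112 g

SE = s / sqrt(n);          % standard error, 40/6
z  = halfwidth / SE;       % half-width in standard-error units

% Probability mass of the standard normal between -z and +z
p = normcdf(z) - normcdf(-z);
fprintf('P(mean within %g g) = %.4f\n', halfwidth, p);
```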
Example n<30
7 patients' blood pressures have been measured after
having been given a new drug for 3 months. They had
blood pressure increases of 1.5, 2.9, 0.9, 3.9, 3.2, 2.1 and 1.9

Construct the 95% Confidence Interval (CI) for the true
expected blood pressure increase for all patients in a
population.
Example n<30
Here are our statistics for n = 7
x̄ = 2.3429
s = 1.0422
What is our 95% confidence interval?

Sampling distribution of the sample mean, normalized:
$Z = \dfrac{\bar{x} - \mu_{\bar{x}}}{s/\sqrt{n}}$
(Student's t distribution with ν = n-1 = 6)
-z = tinv(0.025,7-1) = -2.4469
z = tinv(0.975,7-1) = +2.4469
95% of the distribution lies between -z and +z, with 2.5% in each tail

$\dfrac{\Delta x}{s/\sqrt{n}} = 2.4469$
$\Delta x = 2.4469\,\dfrac{s}{\sqrt{n}}$
$\Delta x = 0.9639$
There is a 95% chance
that the mean, μ, is within 2.3429 ± 0.9639
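The interval above can be reproduced with the sketch below. The variable names are hypothetical, and `tinv` requires the Statistics and Machine Learning Toolbox.

```matlab
x     = [1.5 2.9 0.9 3.9 3.2 2.1 1.9];   % blood pressure increases
n     = numel(x);
xbar  = mean(x);
s     = std(x);                % sample standard deviation (divides by n-1)
alpha = 0.05;                  % 95% confidence interval

% Two-sided critical value from Student's t with n-1 degrees of freedom
tcrit = tinv(1 - alpha/2, n - 1);

half = tcrit * s / sqrt(n);    % half-width of the interval
CI   = [xbar - half, xbar + half];
fprintf('95%% CI: %.4f +/- %.4f\n', xbar, half);
```

With only 7 values the t critical value (2.4469) is noticeably larger than the 1.96 a normal distribution would give, which is the penalty for estimating σ from a small sample.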
Hypothesis Testing
Introduction
You read that, on average, a volcanic eruption lasts 7 weeks (μ=7).
But we suspect that this number is wrong and should be higher (μ>7).

How can we prove it for a given level of significance (α=0.05)?

We look at the past n=100 eruptions and find x̄=7.2 and s=1 week.

Assuming that μ=7, with $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \simeq \dfrac{s}{\sqrt{n}}$:
P(x̄ > 7.2)
= 1 - P(x̄ < 7.2)
= 1 - normcdf(7.2,7,1/10)
= 0.0228 < α
Assuming that μ = 7, there is only a 2.3% chance of
finding a mean of 7.2 weeks or more, so we can reject μ=7
Conclusion: μ>7
Testing one population mean

You read that, on average, a volcanic eruption lasts 7 weeks (μ=7).
But we suspect that this number is wrong and should be higher (μ>7).

Null Hypothesis H0: μ=7
Alternative Hypothesis H1: μ>7
How can we prove it for a given level of significance (α=0.05)?

We look at the past n=100 eruptions and find x̄=7.2 and s=1 week.

Assuming that μ=7, with $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \simeq \dfrac{s}{\sqrt{n}}$:
p-value = P(x̄ > 7.2)
= 1 - P(x̄ < 7.2)
= 1 - normcdf(7.2,7,1/10)
= 0.0228 < α
Assuming that μ = 7, there is only a 2.3% chance of
finding a mean of 7.2 weeks or more, so we can reject μ=7
Conclusion: μ>7
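The one-sided test on the eruption data can be sketched as follows. The variable names are hypothetical, and `normcdf` requires the Statistics and Machine Learning Toolbox.

```matlab
mu0   = 7;      % mean under the null hypothesis H0 (weeks)
xbar  = 7.2;    % observed sample mean (weeks)
s     = 1;      % sample standard deviation (weeks)
n     = 100;    % number of eruptions
alpha = 0.05;   % level of significance

SE = s / sqrt(n);                    % standard error, s used in place of sigma

% p-value: probability of a sample mean at least this large if H0 is true
pval = 1 - normcdf(xbar, mu0, SE);

reject_H0 = pval < alpha;            % true here, so H0 is rejected
fprintf('p-value = %.4f, reject H0: %d\n', pval, reject_H0);
```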
Hypothesis testing
The classical way to make statistical comparisons is to prepare
a statement about a fact for which it is possible to calculate
its probability of occurrence.
This statement is the null hypothesis and its counterpart is
the alternative hypothesis.
The null hypothesis is traditionally written as H0 and the
alternative hypothesis as H1.
A statistical test measures the experimental strength of
evidence against the null hypothesis.
Curiously, depending on the risks at stake, the null hypothesis
is often the reverse of what the experimenter actually
believes for tactical reasons.
Examples of Hypotheses
Let μ1 and μ2 be the means of 2 samples
We want to investigate the likelihood that their means
are the same:
Null Hypothesis: H0: μ1 = μ2
Alternative Hypothesis: H1: μ1 ≠ μ2
The Alternative Hypothesis could also be: H1: μ1 > μ2
The first example of H1 is said to be two-sided or two-tailed (includes both μ1 > μ2 and μ1 < μ2)
The second is said to be one-sided or one-tailed.
The number of sides has implications on how to
formulate the test
Possible outcomes
                  H0 is correct          H0 is incorrect
H0 is accepted    Correct decision       Type II error (missed detection)
                  Probability: 1-α       Probability: β
H0 is rejected    Type I error           Correct decision
                  (false alarm)          Probability: 1-β
                  Probability: α
Level of significance α: probability of committing a Type I error

α is set before performing the test.
In a two-sided test, α is split between the two options.
Often, H0 and α are designed with the intention of rejecting H0,
thus risking a Type I error and avoiding the unbound Type II error.
The more likely this is, the more power the test has. Power is 1-β
Importance of choosing H0
Selecting H0 has consequences on decision making
Customarily, tests operate on the left column of the contingency
table and the harder to analyze right column remains unchecked
Consider a jury trial:
A. H0: not guilty
                 H0 True    H0 False
Test    Accept   Correct
action  Reject   Wrong
B. H0: guilty
                 H0 True    H0 False
Test    Accept   Correct
action  Reject   Wrong
A: You assume that the defendant
isn't guilty. Wrong rejection: an
innocent person is found guilty and
punished for a crime s/he did
not commit

B: You assume that the defendant
is guilty. Wrong rejection: a guilty
person is found innocent and let go free
Importance of choosing H0
Selecting H0 has consequences on decision making
Customarily, tests operate on the left column of the contingency
table and the harder to analyze right column remains unchecked
Consider environmental remedial action:
A. H0: Site is clean
                 H0 True    H0 False
Test    Accept   Correct
action  Reject   Wrong
B. H0: Site is contaminated
                 H0 True    H0 False
Test    Accept   Correct
action  Reject   Wrong
A: Wrong rejection means the site
is declared contaminated when it
is actually clean, which leads
to unnecessary cleaning
B: Wrong rejection declares a
contaminated site clean. No
action prolongs a health hazard
In both cases: P(Type I Error) = α
Statistic
A key step in being able to run a test is
finding an analytical expression for a statistic such that:
It is sensitive to all parameters involved in the null hypothesis
It has an associated probability distribution

p-value: the probability that the statistic takes values beyond the
value calculated using the data while H0 is still true. Hence:
If p-value > α (level of significance), H0 is accepted
The lower the p-value, the stronger is the evidence provided by
the data against the null hypothesis.

The p-value converts the statistic to probability units
Partition
The level of significance is employed to partition the range of
possible values of the statistic into two classes:
One interval, usually the longest one (in green), contains those
values that, although not necessarily satisfying the null
hypothesis exactly, are quite possibly the result of random
variation. If the statistic falls in this interval, H0 is accepted
[Figure: one-sided test, with an accept interval followed by a reject interval]
The red interval comprises those values that, although possible,
are highly unlikely to occur. In this situation, H0 is rejected. The
departure from H0 is most likely real, significant.
When the test is two-sided, there are two rejection zones.
[Figure: two-sided test, with reject, accept and reject intervals]
Sampling distribution
The sampling distribution of a statistic is the distribution
of values taken by the statistic for all possible random
samples of the same size from the same population.
Examples of such sampling distributions are:
Standard normal and the t-distributions for the
comparison of two means
The F-distribution for the comparison of two variances
The χ²-distribution for the comparison of two
distributions
Testing Procedure
1. Select the null hypothesis H0 and
the alternative hypothesis H1.
2. Choose the appropriate statistic
3. Set the level of significance α
4. Evaluate the statistic for the case
of interest, zs.
5. Use the distribution for the statistic in combination with the level
of significance to define the acceptance and rejection intervals.
Find out either the corresponding:
p-value of the statistic in the probability space, or
level of significance in the statistic space, zα.
6. Accept the null hypothesis if zs < zα or if p-value > α. Otherwise,
reject H0 because a statistic this extreme would occur with probability less than α if H0 were true.
t-test Comparing Means
Difference in mean
Is the difference in mean between these 2 groups
systematic, or just due to chance?
Factors affecting our confidence in the answer:
Natural variability (standard deviation): due to variability within the sample
Sample sizes (n): due to the amount of data points in the sample

How to quantify our confidence in the
difference between the means of two data samples?
Student's t test
Null Hypothesis (H0): μ1 = μ2
Population means are not statistically different

Alternative Hypothesis (H1): μ1 ≠ μ2
Population means are statistically different.
To accept the alternative hypothesis at 95% confidence:

We must show that data like ours would occur with at most 5% probability
if the null hypothesis (H0) were true, which justifies rejecting it
Paired vs Unpaired t-test

Unpaired t-test: the two samples are from independent
populations
Ex1: Are tropical fish larger than temperate fish?
Ex2: Are the temperatures in Long Beach and Death Valley
significantly different?

Paired t-test: the two samples are from the same
population
Ex1: Do fish get larger as they age?
Ex2: Is the annual temperature in the last 5 years in Death
Valley significantly higher than in the earlier 5 years?
Paired vs Unpaired t-test

Unpaired t-test (independent populations)
Sample 1: size n1, mean m1 and standard deviation s1
Sample 2: size n2, mean m2 and standard deviation s2
$t_{stat} = \dfrac{m_1 - m_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, with $\nu = n_1 + n_2 - 2$

Paired t-test (same population)
We look at the differences between all n pairs: ($\bar{x}_d$, $s_d$)
$t_{stat} = \dfrac{\bar{x}_d}{s_d / \sqrt{n}}$, with $\nu = n - 1$
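As a rough sketch, the two statistics above can be written as anonymous functions. The helper names `t_unpaired` and `t_paired` and the data vectors `a` and `b` are made up for illustration and are not from the lecture.

```matlab
% Hypothetical helpers implementing the two formulas above
t_unpaired = @(a, b) (mean(a) - mean(b)) / sqrt(var(a)/numel(a) + var(b)/numel(b));
t_paired   = @(a, b) mean(a - b) / (std(a - b) / sqrt(numel(a)));

% Made-up measurements, for illustration only
a = [5.1 4.9 5.6 5.8 5.2];
b = [4.8 4.7 5.1 5.5 5.0];

t_u  = t_unpaired(a, b);
nu_u = numel(a) + numel(b) - 2;   % degrees of freedom, unpaired

t_p  = t_paired(a, b);
nu_p = numel(a) - 1;              % degrees of freedom, paired

fprintf('unpaired: t = %.4f (nu = %d)\n', t_u, nu_u);
fprintf('paired:   t = %.4f (nu = %d)\n', t_p, nu_p);
```

On this made-up data the paired statistic is much larger than the unpaired one, because pairing removes the variability shared by both measurements of each pair.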
Student's t-test: Paired test

$t_{stat} = \dfrac{\bar{x}_d}{s_d / \sqrt{n}}$
If tstat is large:
The difference between groups is bigger than the normal
variability within the sample
Therefore: the means of the 2 samples are significantly
different from each other
If tstat is small:
The difference between groups is smaller than the normal
variability within the sample
Therefore: the means of the 2 samples are not significantly
different from each other
Student's t-test: Paired test

We need a threshold tcrit:
If |tstat| > tcrit: the difference between the means is unlikely
to have occurred by chance
there's likely to be a real systematic difference between
the two groups (and thus there's likely to be a real
systematic difference between the two conditions)
Given a probability p, we can determine tcrit using
Student's t distribution (with ν = n - 1) as a function
of p
Student's t test (2-tailed)

tstat value is statistically distinguishable from zero. Example (90% conf)
[Figure: Student's t probability density with the two critical values marked. There is a 90% chance of finding t values between the critical values by chance. In each tail, there is only a 5% probability of finding t values beyond the critical value purely by chance.]
Summary
How to conduct a t-test?
1. Decide upon a level of significance α.
e.g. 99% and 95% confidence are typical (α = 0.01 or 0.05)
2. From this, decide if one-tailed or two-tailed and
use MATLAB's tinv command to find tcrit
3. Compute tstat from your sample
4. Compare tstat and tcrit
If |tstat| > tcrit: the difference is significant (there's likely
an actual difference between the two means)
else: the difference is not significant
5. Optional: determine the p-value
Example
We are interested in ocean acidification. We measure the pH of
ocean water at the pier of Newport Beach at two different dates:

In 1994: 8.03, 8.08, 7.99, 8.00, 7.93, 7.98
In 2004: 7.99, 8.02, 7.92, 7.94, 8.01, 7.93
From our two samples, we have:
In 1994: m1 = 8.0017 and s1 = 0.0504
In 2004: m2 = 7.9683 and s2 = 0.0406
Does the difference between the two means show a significant
decrease or is it likely caused just by chance?
Example
1. Choose a level of significance α = 0.1 (CI 90%)
2. This is a one-tailed test (H1: m2 < m1)
Number of degrees of freedom: ν = 6 - 1 = 5

tcrit = tinv(1-0.1,5)
= 1.4759
3. Now that we have our critical value, what is our statistic?
d = [8.03, 8.08, 7.99, 8.00, 7.93, 7.98] ...
  - [7.99, 8.02, 7.92, 7.94, 8.01, 7.93];
tstat = mean(d)/(std(d)/sqrt(6));
= 1.4464
4. tstat < tcrit : We cannot reject H0

5. Optional: p-value = 1 - tcdf(tstat,5) = 0.1047 > α
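The five steps of this example can be run end to end with the sketch below. The variable names are hypothetical, the one-tailed p-value is taken here as the upper-tail area `1 - tcdf(tstat, 5)`, and `tinv` and `tcdf` require the Statistics and Machine Learning Toolbox.

```matlab
pH1994 = [8.03, 8.08, 7.99, 8.00, 7.93, 7.98];
pH2004 = [7.99, 8.02, 7.92, 7.94, 8.01, 7.93];
alpha  = 0.1;                            % level of significance

d = pH1994 - pH2004;                     % paired differences
n = numel(d);

tcrit = tinv(1 - alpha, n - 1);          % one-tailed critical value
tstat = mean(d) / (std(d) / sqrt(n));    % paired t statistic
pval  = 1 - tcdf(tstat, n - 1);          % one-tailed p-value (upper tail)

reject_H0 = tstat > tcrit;               % false here: H0 cannot be rejected
fprintf('tstat = %.4f, tcrit = %.4f, p = %.4f\n', tstat, tcrit, pval);
```

Both decision rules agree: tstat is below tcrit and the p-value is above α.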
χ²-test Goodness of fit
χ²-test Goodness of fit
We want to compare an observed frequency
distribution to a theoretical distribution.
Ex: we want to show that the yearly averaged rainfall in
Irvine follows a normal distribution
Ex: we want to make sure that a die is not loaded
χ² statistic
We decompose the number of observations (n) over
k intervals (or bins, or classes)
k must satisfy n/k ≥ 5
k ≥ 10
So n ≥ 50
The Expected number of counts in any cell is Ei

The Observed number of counts is Oi
$\chi^2_{stat} = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$
χ² statistic
χ²stat measures the mismatch between the Expected
and the Observed distributions
χ²stat = 0: perfect fit
χ²stat large: poor fit
Our statistic χ²stat follows
a χ²-distribution!
Conducting a χ² test
1. Formulate a null and alternative hypothesis:
H0: The data are consistent with a specified distribution
H1: The data are not consistent with a specified distribution
2. Choose a significance level: α = 0.05 (5%)
3. Use MATLAB's chi2inv command to find χ²crit
4. Analyze sample data
Degrees of freedom ν = k-1
Calculate the expected frequency counts Ei
Calculate the test statistic $\chi^2_{stat} = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$
5. Interpret the results
Example
Is our die loaded? Compare to a uniform distribution
Value   Observed freq.   Expected freq.   (O-E)^2/E
1       16               10               3.6
2       5                10               2.5
3       9                10               0.1
4       7                10               0.9
5       6                10               1.6
6       17               10               4.9
Total   60               60               13.6

For alpha = 0.02:
chi2crit = chi2inv(1-0.02,6-1) = 13.3882
χ²stat = 13.6 > χ²crit: the die is loaded (98% confidence)
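A sketch of the die test in MATLAB follows. Only the first and last observed counts (16 and 17) are legible in the table; the four middle counts used here are inferred from the (O-E)^2/E column and the total of 60, so treat them as an assumption. `chi2inv` requires the Statistics and Machine Learning Toolbox.

```matlab
% Observed counts for faces 1 to 6 (middle four values inferred, see above)
O = [16 5 9 7 6 17];
k = numel(O);                        % number of classes
E = sum(O) / k * ones(1, k);         % uniform expectation: 10 per face

% Chi-square statistic: mismatch between observed and expected counts
chi2stat = sum((O - E).^2 ./ E);

alpha    = 0.02;                     % level of significance
chi2crit = chi2inv(1 - alpha, k - 1);

loaded = chi2stat > chi2crit;        % true here: H0 (fair die) is rejected
fprintf('chi2stat = %.1f, chi2crit = %.4f\n', chi2stat, chi2crit);
```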
Next Week
Lab 6: Hypothesis testing

DUE: two weeks after the lab starts (EEE)
Lecture 7: Curve Fitting and interpolation
Midterm Part 1 take home
Midterm Part 2 in class
