Hypothesis tests, confidence intervals, and bootstrapping
Business Statistics 41000
Fall 2015
Topics

1. Hypothesis tests
   - Testing a mean: H0: μ = μ0
   - Testing a proportion: H0: p = p0
   - Testing a difference in means: H0: μ1 − μ2 = 0
   - Testing a difference in proportions: H0: p1 − p2 = 0
   - Testing a difference in means: H0: μ1 − μ2 = 0 (paired sample)
   - Testing a difference in means: H0: μ1 − μ2 = 0 (same variance)
   - Simulating from a null distribution
2. Confidence intervals
3. Bootstrap confidence intervals
To make things numerical, assume we recognize the shoes and know for a
fact they cost $285, or 5.65 log dollars.
So, if we call out all supposed homeless guys with shoes worth more than
exp(4.69) ≈ $109, we'll only do so incorrectly 5% of the time.
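These numbers are quick to check (natural logs throughout):

```python
import math

# The shoes cost $285; on the log scale that is log(285) "log dollars".
print(round(math.log(285), 2))    # 5.65
# The cutoff exp(4.69) back on the dollar scale:
print(round(math.exp(4.69), 2))   # about 108.85, i.e. roughly $109
```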
[Figure: density of shoe prices in dollars, x-axis 50 to 150]

[Figure: null density on the log-dollar scale, where μ0 = 3.7]
X ∼ N(μ, σ²).

[Figure: normal density with tick marks at μ − 3σ, μ − 2σ, μ − σ, μ + σ, μ + 2σ, μ + 3σ]
[Two more figures: the same normal density with tick marks at μ ± σ, μ ± 2σ, μ ± 3σ]
Then again, sometimes it will not be. But it will only rarely be too far off.
Hypothesis testing asks the following question: if the true value were μ0,
is my data in an unlikely region?

[Figure: null density with tick marks at μ0 ± σ, μ0 ± 2σ, μ0 ± 3σ]
On the other hand, if the data falls in a likely region, we decide our
hypothesis was plausible and we fail to reject the null hypothesis.

[Figure: null density with tick marks at μ0 ± σ, μ0 ± 2σ, μ0 ± 3σ]
Level of tests
[Figure: null density with tick marks at μ0 ± σ, μ0 ± 2σ, μ0 ± 3σ]
But one thing is always true: the probability of the rejection region (the
area under the curve) dictates how often we will falsely reject the null
hypothesis. This is called the level of the test.
Level of tests
Because when the null hypothesis is true, we still end up in unusual areas
sometimes. How often this happens is exactly the level of the test.

[Figure: null density with tick marks at μ0 ± σ, μ0 ± 2σ, μ0 ± 3σ]
[Figure: null density with tick marks at μ0 ± σ, μ0 ± 2σ, μ0 ± 3σ]
I prefer to think of it the other way around: where we place our rejection
region dictates what the alternative hypothesis is, because it determines
what counts as unusual.
[Figure: null density with tick marks at μ0 ± σ, μ0 ± 2σ, μ0 ± 3σ]
In all of the pictures so far, the level of the test has been α = 0.05.
There is nothing special about that number.
We could even have a rejection region in a small sliver around the null
hypothesis value.

[Figure: null density with a narrow rejection region around μ0]
Perhaps this would reflect evidence of cheating of some sort: the data fit
too well.
where X̄ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ.
We observe x̄ = 269/5 = 53.8.
[Figure: null density for the sample mean, x-axis from 40 to 60]
The empirical or sample mean falls in the 10% rejection region (but not
the 5% rejection region).
p-values
The smallest level at which we would reject given our observed value is
called the p-value of the data.
In other words, the p-value is the probability of seeing data as, or more,
extreme than the data actually observed.
So the p-value will change depending on the shape of the rejection region.
So a p-value larger than the level of a test implies that you fail to
reject; a p-value smaller than the level implies that you reject.
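For a standard normal test statistic, this decision rule is easy to code. A minimal sketch, using the identity P(|Z| > z) = erfc(z/√2):

```python
import math

def p_value_two_sided(z):
    """P(|Z| > |z|) for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

# The decision rule: reject when the p-value is below the level of the test.
p = p_value_two_sided(1.96)
print(round(p, 3))            # about 0.05
print(p < 0.10, p < 0.01)     # reject at the 10% level, but not at 1%
```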
22
Application to a proportion
We ask n = 50 cola drinkers if they prefer Coke to Pepsi; 28 say they do.
Can we reject the null hypothesis that the two brands have evenly split
the local market?
We observe p̂ = 28/50 = 0.56. The p-value is the area under the null curve
below 0.44 plus the area above 0.56.
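A quick check of this example in plain Python, using the normal approximation under the null p = 0.5:

```python
import math

# Coke vs Pepsi: null p0 = 0.5 (evenly split market), 28 of 50 prefer Coke.
n, p0, phat = 50, 0.5, 28 / 50
se = math.sqrt(p0 * (1 - p0) / n)            # sd of p-hat under the null
z = (phat - p0) / se
# Area below 0.44 plus area above 0.56, i.e. P(|Z| > |z|):
p_value = math.erfc(abs(z) / math.sqrt(2))
print(round(z, 2), round(p_value, 2))        # 0.85 0.4
```

With a p-value near 0.40, the evenly-split null cannot be rejected at any conventional level.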
[Figures: Coke vs Pepsi, null densities of the sample proportion]
Variance unknown
But if we have a sample of reasonable size (say, more than 30), then we
can use a plug-in estimate without much inaccuracy.
That is, we use the empirical standard deviation (the sample standard
deviation) as if it were our known standard deviation: we treat s as if it
were σ.
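A small illustration of the plug-in idea. The data and the null value μ0 = 11 are made up for the example (and the sample is kept short only to fit the listing):

```python
import math
import statistics

# Plug-in idea: use the sample standard deviation s in place of the unknown
# sigma when forming the z statistic. (Illustrative data, mu0 = 11 assumed.)
sample = [12.1, 14.3, 9.8, 11.5, 13.0, 10.7, 12.9, 11.2, 13.8, 10.4]
mu0 = 11.0
n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)            # plug-in estimate of sigma
z = (xbar - mu0) / (s / math.sqrt(n))
print(round(z, 2))                      # about 2.04
```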
[Figure: null density, x-axis from 60 to 85]
Z scores
In normal hypothesis tests where the rejection region is in the tail, we
essentially measure the distance of our observed measurement from the
mean under the null distribution. How far is "too far" is determined by
the level of our test and by the standard deviation under the null.

To get a sense of how far into the tail an observation is, we can
standardize our observation.
If X ∼ N(μ, σ²), then (X − μ)/σ ∼ N(0, 1). Applying this idea to a normal
test statistic tells us how many standard deviations away from the mean our
observed value is.

In this last example we would get z = (x̄ − 67)/(12.6/√35) = 2.82.
Z scores
The usefulness of this approach is mainly that we can remember a few
special rejection regions.
P(Z > 2.33) = 1%
P(Z > 1.64) = 5%
P(Z > 1.28) = 10%

This defines rejection regions for one-sided tests at levels 1%, 5%, and
10%, respectively. (Include a negative sign as the circumstances require.)

The analogous two-sided thresholds are given by

P(|Z| > 2.58) = 1%
P(|Z| > 1.96) = 5%
P(|Z| > 1.64) = 10%
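These tail probabilities are easy to verify, writing the normal tail via the complementary error function:

```python
import math

def upper_tail(z):
    """P(Z > z) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# The special one-sided thresholds quoted above:
for z in (2.33, 1.64, 1.28):
    print(z, round(upper_tail(z), 3))
```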
z = ((5 − 3) − 0) / √(4²/34 + 6²/60) = 1.933
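In code, the same calculation:

```python
import math

# The difference-in-means z statistic from the slide:
# sample means 5 and 3, sds 4 and 6, sample sizes 34 and 60.
z = ((5 - 3) - 0) / math.sqrt(4**2 / 34 + 6**2 / 60)
print(round(z, 3))   # 1.933
```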
Difference in proportions
Suppose we try to address the Coke/Pepsi local market share with a
different kind of survey, in which we conduct two separate polls and ask
each person either "Do you regularly drink Coke?" or "Do you regularly
drink Pepsi?"
With this setup we want to know whether pX = pY.
Suppose we ask 40 people the Coke question and 53 people the Pepsi
question. In this case the observed difference in proportions has
approximate distribution

D ∼ N(0, s²)

where

s = √( p₁(1 − p₁)/40 + p₂(1 − p₂)/53 ).
Difference in proportions
In practice we have to use

ŝ = √( p̂₁(1 − p̂₁)/40 + p̂₂(1 − p̂₂)/53 ).
If 30/40 people say that they regularly drink Coke and 30 out of 53
people say they regularly drink Pepsi, do we reject the null hypothesis at
the 10% level?
We find ŝ ≈ 0.097, so z = (0.75 − 0.566 − 0)/ŝ = 1.905.
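The whole calculation in a few lines:

```python
import math

# Two separate polls: 30 of 40 say they drink Coke, 30 of 53 say Pepsi.
p1, p2 = 30 / 40, 30 / 53
s_hat = math.sqrt(p1 * (1 - p1) / 40 + p2 * (1 - p2) / 53)
z = ((p1 - p2) - 0) / s_hat
print(round(s_hat, 3), round(z, 3))   # 0.097 1.905
# Two-sided 10% test: compare |z| with 1.64.
print(abs(z) > 1.64)                  # True: reject at the 10% level
```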
Paired samples
As a final variant, sometimes data from two groups comes paired, which
changes yet again how we approximate our variance term.
Paired sample
Because the samples are paired (in the sense of each happening on the
same day) we can directly approximate the variance of the difference
Dᵢ = Xᵢ − Yᵢ by

σ̂²_D = (1/n) ∑ᵢ₌₁ⁿ (dᵢ − d̄)²

where dᵢ = xᵢ − yᵢ.
So the extra bit of information we need is that the standard deviation of
the daily difference between the commute times was 4 minutes.
In this case we have, under the null,

D̄ = X̄ − Ȳ ∼ N(0, 4²/28),

and the observed difference d̄ = 1.4 leads to z = (d̄ − 0)/(4/√28) = 1.85.
We cannot reject at the 5% level.
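Checking the paired-sample calculation:

```python
import math

# Paired example: n = 28 paired days, the daily differences have standard
# deviation 4 minutes, and the observed mean difference is 1.4 minutes.
n, s_d, dbar = 28, 4.0, 1.4
z = (dbar - 0) / (s_d / math.sqrt(n))
print(round(z, 2))     # 1.85
print(abs(z) > 1.96)   # False: cannot reject at the (two-sided) 5% level
```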
Power
[Figure: probability of rejecting (0 to 1) vs. the underlying mean, for
two-sided and one-sided right-tail tests at the 5% level, with se = 1]
The probability of rejecting the null hypothesis is called the power of the
test. It will depend on the actual underlying value. The level of a test is
precisely the power of the test when the null hypothesis is true.
[Figure: the same power curves, now with se = 0.1]
The power function gets more pointed around the null hypothesis value
as the sample size gets larger (which makes the standard error smaller).
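A sketch of the power function described here, assuming a one-sided right-tail test of H0: μ = 0 at the 5% level (matching the figures' setup):

```python
import math

def normal_cdf(x):
    """Phi(x) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def power_one_sided(mu, se, level_z=1.645):
    """P(reject H0: mu = 0) with a 5%-level right-tail test, when the
    sampling distribution of the estimate is N(mu, se^2)."""
    cutoff = level_z * se                 # reject when the estimate exceeds this
    return 1 - normal_cdf((cutoff - mu) / se)

# At the null (mu = 0) the power equals the level of the test.
print(round(power_one_sided(0.0, se=1), 3))                          # about 0.05
# A smaller standard error makes the power curve much steeper.
print(power_one_sided(2.0, se=1) < power_one_sided(2.0, se=0.1))     # True
```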
The last 25 production days saw very low production numbers. Has the
production facility changed: has the N(15, 3²) description of the
production variability become something left-skewed?
Level:     1%      2.5%    5%
Quantile:  −1.03   −0.87   −0.72
How many total carts were produced in the last production period?
What is the statistic used in this hypothesis test?
Level:     0.5%    1%      2.5%    5%
Quantile:  −1.15   −1.03   −0.87   −0.72
Confidence intervals
One way to guarantee this is to make our intervals huge, but we can be
more clever by using ideas from hypothesis testing.
Confidence intervals
We consider normal confidence intervals for simplicity. Let X ∼ N(μ, σ²).
We know the following fact (with a bit of algebra):
P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 0.95.

Confidence interval

x̄ ± 1.96 σ/√n.
Confidence interval
If we use a different number instead of 1.96, we can get different levels of
coverage. For instance, an interval of the form

x̄ ± 1.64 σ/√n

has 90% coverage.
You will notice a straightforward relationship between confidence
intervals and hypothesis tests: the two-sided 5%-level test of H0: μ = μ0
rejects exactly when μ0 falls outside the 95% confidence interval.
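A sketch of this duality with made-up numbers (x̄ = 53.8, σ = 10, n = 25 are purely illustrative):

```python
import math

# Duality of CIs and two-sided tests, with illustrative numbers.
xbar, sigma, n = 53.8, 10.0, 25
half = 1.96 * sigma / math.sqrt(n)     # CI half-width
lo, hi = xbar - half, xbar + half
print(round(lo, 2), round(hi, 2))      # 49.88 57.72

def reject_at_5pct(mu0):
    """Two-sided 5%-level z test of H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return abs(z) > 1.96

# The test rejects mu0 exactly when mu0 lies outside the 95% interval.
for mu0 in (45.0, 49.0, 50.0, 55.0, 60.0):
    assert reject_at_5pct(mu0) == (not lo <= mu0 <= hi)
```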
[Figure: normal density with tick marks one, two, and three standard
deviations from the center]
Simulation demo
Bootstrapping
Bootstrapping a mean
[Figure: histogram of heights in inches, x-axis 50 to 90]

1.96 (12.6)/√35.
Bootstrap samples
The resampling itself takes only a few lines of code.
[Figure: bootstrap samples of heights, y-axis 68 to 78 inches]
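The course's own code is not reproduced here; a Python sketch of bootstrapping a mean, with illustrative height data, might look like:

```python
import random
import statistics

# Bootstrap for a mean: resample the data with replacement many times and
# look at the spread of the resampled means. (Illustrative height data in
# inches; none of these numbers come from the slides.)
random.seed(1)
heights = [66, 70, 72, 68, 74, 69, 71, 73, 67, 75, 70, 72]

boot_means = []
for _ in range(10_000):
    resample = random.choices(heights, k=len(heights))  # sample WITH replacement
    boot_means.append(statistics.mean(resample))

boot_means.sort()
lo = boot_means[int(0.025 * len(boot_means))]   # 2.5th percentile
hi = boot_means[int(0.975 * len(boot_means))]   # 97.5th percentile
print(round(lo, 2), round(hi, 2))               # endpoints of a 95% interval
```

Taking percentiles of the resampled means gives the percentile bootstrap interval; no normality assumption is needed.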