Testing Hypotheses

Testing of Hypotheses
Presented By:
Muhammad Shahid Sharif
COMSATS (Sahiwal)
Hypothesis Testing
• Goal: Make statement(s) regarding unknown population parameter values based on sample data
• Elements of a hypothesis test:
  • Null hypothesis - Statement regarding the value(s) of unknown parameter(s). Typically will imply no association between explanatory and response variables in our applications (will always contain an equality)
  • Alternative hypothesis - Statement contradictory to the null hypothesis (will always contain an inequality)
  • Test statistic - Quantity based on sample data and null hypothesis used to test between null and alternative hypotheses
  • Rejection region - Values of the test statistic for which we reject the null in favor of the alternative hypothesis
Hypothesis Testing
                     Test Result
True State     H0 True (do not reject)     H0 False (reject)
H0 True        Correct Decision            Type I Error
H0 False       Type II Error               Correct Decision

• α = P(Type I Error)    β = P(Type II Error)
• Goal: Keep α, β reasonably small

Example - Efficacy Test for New drug
• Drug company has new drug, wishes to compare it with current standard treatment
• Federal regulators tell company that they must demonstrate that new drug is better than current treatment to receive approval
• Firm runs clinical trial where some patients receive new drug, and others receive standard treatment
• Numeric response of therapeutic effect is obtained (higher scores are better)
• Parameter of interest: μNew − μStd
• Null hypothesis - New drug is no better than standard treatment (tested at the boundary μNew − μStd = 0):
$$H_0: \mu_{New} - \mu_{Std} \le 0$$
• Alternative hypothesis - New drug is better than standard treatment:
$$H_A: \mu_{New} - \mu_{Std} > 0$$
• Experimental (Sample) data:
  • Sample means: $\bar{y}_{New}$, $\bar{y}_{Std}$
  • Sample standard deviations: $s_{New}$, $s_{Std}$
  • Sample sizes: $n_{New}$, $n_{Std}$
Sampling Distribution of Difference in Means
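• In large samples, the difference in two sample means is approximately normally distributed (the second argument is the standard error):
$$\bar{Y}_1 - \bar{Y}_2 \sim N\left(\mu_1 - \mu_2,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$
• Under the null hypothesis, μ1-μ2=0 and:
$$Z = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)$$
• σ1² and σ2² are unknown and estimated by s1² and s2²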
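The large-sample claim can be checked by simulation. The sketch below is illustrative only: the two exponential populations, the sample sizes, the seed and the number of replications are assumptions chosen for the demonstration, not values from the slides. It compares the simulated mean and standard deviation of the difference in sample means with the values the formula predicts.

```python
import math
import random
import statistics

# Hypothetical populations (not from the slides): two skewed exponential
# distributions, chosen to show that normality of the raw data is not needed.
MEAN_1, MEAN_2 = 2.0, 1.0      # population means; for an exponential, sd = mean
N_1, N_2 = 40, 40              # sample sizes per group
REPLICATIONS = 3000

rng = random.Random(12345)     # fixed seed so the run is reproducible


def sample_mean(mean, n):
    """Mean of n draws from an exponential distribution with the given mean."""
    return statistics.fmean(rng.expovariate(1.0 / mean) for _ in range(n))


diffs = [sample_mean(MEAN_1, N_1) - sample_mean(MEAN_2, N_2)
         for _ in range(REPLICATIONS)]

# Theoretical centre and standard error of the difference in sample means.
theory_mean = MEAN_1 - MEAN_2
theory_se = math.sqrt(MEAN_1 ** 2 / N_1 + MEAN_2 ** 2 / N_2)

print(f"simulated mean {statistics.fmean(diffs):.3f} vs theory {theory_mean:.3f}")
print(f"simulated sd   {statistics.stdev(diffs):.3f} vs theory {theory_se:.3f}")
```

The simulated values should land close to the theoretical ones even though the individual measurements are far from normal.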
• Type I error - Concluding that the new drug is better than the standard (HA) when in fact it is no better (H0). Ineffective drug is deemed better.
  • Traditionally α = P(Type I error) = 0.05
• Type II error - Failing to conclude that the new drug is better (HA) when in fact it is. Effective drug is deemed to be no better.
  • Traditionally a clinically important difference (Δ) is assigned and sample sizes chosen so that:
    β = P(Type II error | μ1-μ2 = Δ) ≤ .20
Elements of a Hypothesis Test
• Test Statistic - Difference between the sample means, scaled to number of standard deviations (standard errors) from the null difference of 0 for the population means:
$$T.S.:\ z_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
• Rejection Region - Set of values of the test statistic that are consistent with HA, such that the probability it falls in this region when H0 is true is α (we will always set α=0.05):
$$R.R.:\ z_{obs} \ge z_\alpha \qquad \alpha = 0.05 \Rightarrow z_\alpha = 1.645$$

P-value (aka Observed Significance Level)
• P-value - Measure of the strength of evidence the sample data provides against the null hypothesis:
  P(Evidence this strong or stronger against H0 | H0 is true)
$$P\text{-val}:\ p = P(Z \ge z_{obs})$$
Large-Sample Test H0: μ1-μ2=0 vs HA: μ1-μ2>0
• H0: μ1-μ2 = 0 (No difference in population means)
• HA: μ1-μ2 > 0 (Population Mean 1 > Population Mean 2)
• Test statistic:
$$T.S.:\ z_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
• Rejection region:
$$R.R.:\ z_{obs} \ge z_\alpha$$
• P-value:
$$P(Z \ge z_{obs})$$
• Conclusion - Reject H0 if test statistic falls in rejection region, or equivalently the P-value is ≤ α
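To see what "the probability it falls in this region when H0 is true is α" means in practice, the following sketch repeats the one-sided test many times on data for which H0 holds by construction. The population mean, standard deviation, sample size, seed and replication count are hypothetical values picked for illustration; the fraction of rejections should come out near 0.05.

```python
import math
import random
import statistics

ALPHA = 0.05
Z_ALPHA = statistics.NormalDist().inv_cdf(1 - ALPHA)   # about 1.645

# Hypothetical setup (not from the slides): both groups share the same
# normal population, so H0: mu1 - mu2 = 0 is true by construction.
MU, SIGMA, N = 10.0, 4.0, 50
REPLICATIONS = 2000
rng = random.Random(2024)      # fixed seed so the run is reproducible


def z_obs(sample_1, sample_2):
    """Large-sample test statistic for the difference in two means."""
    se = math.sqrt(statistics.variance(sample_1) / len(sample_1)
                   + statistics.variance(sample_2) / len(sample_2))
    return (statistics.fmean(sample_1) - statistics.fmean(sample_2)) / se


rejections = 0
for _ in range(REPLICATIONS):
    g1 = [rng.gauss(MU, SIGMA) for _ in range(N)]
    g2 = [rng.gauss(MU, SIGMA) for _ in range(N)]
    if z_obs(g1, g2) >= Z_ALPHA:   # test statistic falls in the rejection region
        rejections += 1

type_1_rate = rejections / REPLICATIONS
print(f"estimated P(Type I error) = {type_1_rate:.3f} (target {ALPHA})")
```

Every rejection counted here is a Type I error, since the two population means are equal.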
Example - Botox for Cervical Dystonia
• Patients - Individuals suffering from cervical dystonia
• Response - Tsui score of severity of cervical dystonia (higher scores are more severe) at week 8 of treatment
• Research (alternative) hypothesis - Botox A decreases mean Tsui score more than placebo
• Groups - Placebo (Group 1) and Botox A (Group 2)
• Experimental (Sample) Results:
  Placebo (Group 1):  $\bar{y}_1 = 10.1$   $s_1 = 3.6$   $n_1 = 33$
  Botox A (Group 2):  $\bar{y}_2 = 7.7$    $s_2 = 3.4$   $n_2 = 35$
Source: Wissel, et al (2001)
Example - Botox for Cervical Dystonia
Test whether Botox A produces lower mean Tsui scores than placebo (α = 0.05)
• H0: μ1 − μ2 = 0
• HA: μ1 − μ2 > 0
• Test statistic:
$$T.S.:\ z_{obs} = \frac{10.1 - 7.7}{\sqrt{\frac{(3.6)^2}{33} + \frac{(3.4)^2}{35}}} = \frac{2.4}{0.85} = 2.82$$
• Rejection region:
$$R.R.:\ z_{obs} \ge z_\alpha = z_{.05} = 1.645$$
• P-value:
$$P\text{-val}:\ P(Z \ge 2.82) = .0024$$
Conclusion: Botox A produces lower mean Tsui scores than placebo (since 2.82 > 1.645 and P-value < 0.05)
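The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_sided_z_test` is a hypothetical helper introduced here, and the inputs are the summary statistics reported for the two groups.

```python
import math
from statistics import NormalDist


def one_sided_z_test(ybar_1, s_1, n_1, ybar_2, s_2, n_2, alpha=0.05):
    """Large-sample test of H0: mu1 - mu2 = 0 against HA: mu1 - mu2 > 0."""
    se = math.sqrt(s_1 ** 2 / n_1 + s_2 ** 2 / n_2)   # estimated standard error
    z = (ybar_1 - ybar_2) / se                        # test statistic z_obs
    z_alpha = NormalDist().inv_cdf(1 - alpha)         # rejection region cut-off
    p_value = 1 - NormalDist().cdf(z)                 # P(Z >= z_obs)
    return z, z_alpha, p_value, z >= z_alpha


# Placebo is Group 1, Botox A is Group 2.
z, z_alpha, p_value, reject = one_sided_z_test(10.1, 3.6, 33, 7.7, 3.4, 35)
print(f"z_obs = {z:.2f}, critical value = {z_alpha:.3f}, P-value = {p_value:.4f}")
print("reject H0" if reject else "do not reject H0")
```

The printed statistic and P-value agree with the hand calculation to the precision shown on the slide.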
2-Sided Tests
• Many studies don't assume a direction with respect to the difference μ1-μ2
• H0: μ1-μ2 = 0    HA: μ1-μ2 ≠ 0
• Test statistic is the same as before
• Decision Rule:
  • Conclude μ1-μ2 > 0 if zobs ≥ zα/2 (α=0.05 ⇒ zα/2 = 1.96)
  • Conclude μ1-μ2 < 0 if zobs ≤ -zα/2 (α=0.05 ⇒ -zα/2 = -1.96)
  • Do not reject μ1-μ2 = 0 if -zα/2 < zobs < zα/2
• P-value: 2P(Z ≥ |zobs|)
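The three-way decision rule for a 2-sided test translates directly into code. The sketch below is one possible rendering, with a hypothetical function name and made-up z values used purely for illustration.

```python
from statistics import NormalDist


def two_sided_decision(z_obs, alpha=0.05):
    """Return (conclusion, P-value) for H0: mu1 - mu2 = 0 vs HA: mu1 - mu2 != 0."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # 1.96 when alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))   # 2P(Z >= |z_obs|)
    if z_obs >= z_crit:
        conclusion = "mu1 - mu2 > 0"
    elif z_obs <= -z_crit:
        conclusion = "mu1 - mu2 < 0"
    else:
        conclusion = "do not reject mu1 - mu2 = 0"
    return conclusion, p_value


for z in (2.82, -2.10, 1.70):   # hypothetical z_obs values for illustration
    print(z, *two_sided_decision(z))
```

Note that z = 1.70 would lead to rejection in a 1-sided test at α = 0.05 (1.70 ≥ 1.645) but not in the 2-sided test, where the cut-off is 1.96.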
Power of a Test
• Power - Probability a test rejects H0 (depends on μ1-μ2)
  • H0 True: Power = P(Type I error) = α
  • H0 False: Power = 1-P(Type II error) = 1-β
• Example:
  • H0: μ1-μ2 = 0    HA: μ1-μ2 > 0
  • σ1² = σ2² = 25    n1 = n2 = 25
  • Decision Rule: Reject H0 (at α=0.05 significance level) if:
$$z_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{2}} \ge 1.645 \Rightarrow \bar{y}_1 - \bar{y}_2 \ge 2.326$$
Power of a Test
• Now suppose in reality that μ1-μ2 = 3.0 (HA is true)
• Power now refers to the probability we (correctly) reject the null hypothesis. Note that the sampling distribution of the difference in sample means is approximately normal, with mean 3.0 and standard deviation (standard error) 1.414.
• Decision Rule (from last slide): Conclude population means differ if the sample mean for group 1 is at least 2.326 higher than the sample mean for group 2
• Power for this case can be computed as:
$$P(\bar{Y}_1 - \bar{Y}_2 \ge 2.326) \qquad \bar{Y}_1 - \bar{Y}_2 \sim N\left(3,\ \sqrt{2.0} = 1.414\right)$$

Power of a Test
$$\text{Power} = P(\bar{Y}_1 - \bar{Y}_2 \ge 2.326) = P\left(Z \ge \frac{2.326 - 3}{1.414} = -0.48\right) = .6844$$
• All else being equal:
  • As sample sizes increase, power increases
  • As population variances decrease, power increases
  • As the true mean difference increases, power increases
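The power calculation generalizes to any true difference, variance and sample size. Below is a minimal sketch with a hypothetical helper `power_one_sided`; it uses unrounded normal quantiles, so for the example above it gives about 0.683 rather than .6844, which comes from rounding z to -0.48 before the table lookup.

```python
import math
from statistics import NormalDist


def power_one_sided(true_diff, sigma_1, sigma_2, n_1, n_2, alpha=0.05):
    """Power of the one-sided large-sample z test when mu1 - mu2 = true_diff."""
    se = math.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
    cutoff = NormalDist().inv_cdf(1 - alpha) * se   # reject if ybar1 - ybar2 >= cutoff
    # Under the true difference, ybar1 - ybar2 is approximately N(true_diff, se).
    return 1 - NormalDist(mu=true_diff, sigma=se).cdf(cutoff)


print(round(power_one_sided(3.0, 5, 5, 25, 25), 3))   # example: sigma = 5, n = 25
print(round(power_one_sided(0.0, 5, 5, 25, 25), 3))   # H0 true: power equals alpha
for n in (25, 50, 75, 100):                           # power grows with sample size
    print(n, round(power_one_sided(3.0, 5, 5, n, n), 3))
```

Changing `sigma_1`, `sigma_2` or `true_diff` in the calls shows the other two "all else being equal" statements in the same way.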
Power of a Test
[Figure: sampling distributions of the difference in sample means under H0 and under HA]
Power of a Test
[Figure: power curves for group sample sizes of 25, 50, 75, 100 and varying true values μ1-μ2 with σ1=σ2=5]
• For given μ1-μ2, power increases with sample size
• For given sample size, power increases with μ1-μ2
Sample Size Calculations for Fixed Power
• Goal - Choose sample sizes to have a favorable chance of detecting a clinically meaningful difference
• Step 1 - Define an important difference in means:
  • Case 1: σ approximated from prior experience or pilot study - difference can be stated in units of the data
  • Case 2: σ unknown - difference must be stated in units of standard deviations of the data
$$\delta = \frac{\mu_1 - \mu_2}{\sigma}$$
• Step 2 - Choose the desired power to detect the clinically meaningful difference (1-β, typically at least .80). For 2-sided test:
$$n_1 = n_2 = \frac{2(z_{\alpha/2} + z_\beta)^2}{\delta^2}$$
Example - Rosiglitazone for HIV-1 Lipoatrophy
• Treatments - Rosiglitazone vs Placebo
• Response - Change in limb fat mass
• Clinically Meaningful Difference - 0.5 (standard deviations)
• Desired Power - 1-β = 0.80
• Significance Level - α = 0.05
$$z_{\alpha/2} = 1.96 \qquad z_\beta = z_{.20} = .84$$
$$n_1 = n_2 = \frac{2(1.96 + 0.84)^2}{(0.5)^2} = 62.72, \text{ rounded up to } 63$$
Source: Carr, et al (2004)
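The sample size formula is easy to wrap in a small function. The sketch below assumes a 2-sided test with equal group sizes and a difference expressed in standard-deviation units; `n_per_group` is a hypothetical name, and the result is rounded up because a fractional subject cannot be enrolled.

```python
import math
from statistics import NormalDist


def n_per_group(delta, alpha=0.05, power=0.80):
    """Per-group sample size for a 2-sided test, delta in standard-deviation units."""
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)                 # 0.84 for power = 0.80
    return math.ceil(2 * (z_half_alpha + z_beta) ** 2 / delta ** 2)


print(n_per_group(0.5))    # settings of the example: delta = 0.5, power = 0.80
print(n_per_group(0.25))   # halving delta roughly quadruples the required n
```

Because n is proportional to 1/δ², small clinically meaningful differences quickly become expensive to detect.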
Confidence Intervals
• Normally distributed data - approximately 95% of individual measurements lie within 2 standard deviations of the mean
• Difference between 2 sample means is approximately normally distributed in large samples (regardless of shape of distribution of individual measurements):
$$\bar{Y}_1 - \bar{Y}_2 \sim N\left(\mu_1 - \mu_2,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$
• Thus, we can expect (with 95% confidence) that our sample mean difference lies within 2 standard errors of the true difference
(1-α)100% Confidence Interval for μ1-μ2
• Large sample Confidence Interval for μ1-μ2:
$$(\bar{y}_1 - \bar{y}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
• Standard level of confidence is 95% (z.025 = 1.96 ≈ 2)
• (1-α)100% CI's and 2-sided tests reach the same conclusions regarding whether μ1-μ2 = 0
Example - Viagra for ED
• Comparison of Viagra (Group 1) and Placebo (Group 2) for ED
• Data pooled from 6 double-blind trials
• Subjects - White males
• Response - Percent of successful intercourse attempts in past 4 weeks (each subject reports his own percentage)
  Viagra (Group 1):   $\bar{y}_1 = 63.2$   $s_1 = 41.3$   $n_1 = 264$
  Placebo (Group 2):  $\bar{y}_2 = 23.5$   $s_2 = 42.3$   $n_2 = 240$
• 95% CI for μ1-μ2:
$$(63.2 - 23.5) \pm 1.96\sqrt{\frac{(41.3)^2}{264} + \frac{(42.3)^2}{240}} = 39.7 \pm 7.3 = (32.4, 47.0)$$
Source: Carson, et al (2002)
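The interval can be recomputed from the summary statistics. This is a minimal sketch with a hypothetical helper `large_sample_ci`; it also illustrates the link to the 2-sided test, since an interval that excludes 0 corresponds to rejecting μ1-μ2 = 0 at the matching α.

```python
import math
from statistics import NormalDist


def large_sample_ci(ybar_1, s_1, n_1, ybar_2, s_2, n_2, confidence=0.95):
    """(1 - alpha)100% large-sample confidence interval for mu1 - mu2."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)                  # z_{alpha/2}
    margin = z * math.sqrt(s_1 ** 2 / n_1 + s_2 ** 2 / n_2)  # z times standard error
    diff = ybar_1 - ybar_2
    return diff - margin, diff + margin


# Viagra is Group 1, Placebo is Group 2.
lower, upper = large_sample_ci(63.2, 41.3, 264, 23.5, 42.3, 240)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
# An interval that excludes 0 matches rejecting mu1 - mu2 = 0 at alpha = 0.05.
print("reject H0" if lower > 0 or upper < 0 else "do not reject H0")
```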
