
Simple Comparative Experiments

Comparing Two Means

Design of Experiments - Montgomery


Section 2-4 and 2-5
STAT 514

Statistical Inference
Will review two common types of statistical inference
Hypothesis testing
Confidence intervals

Will consider data collected under two different experimental designs
Completely randomized design (CRD)
Randomized complete block design (RCBD)

Method of analysis depends on the design used


Completely Randomized Design


Our focus now is on two treatments and one population
Experimental units obtained from the population
Randomly assign treatments to each experimental unit
Key: All treatment allocations are equally likely
Each EU has the same allocation probabilities
If $n_1$ EUs are to be assigned Trt 1 and $n_2$ EUs are to be assigned Trt 2, then
P(EU is assigned Trt 1) = $n_1/(n_1 + n_2)$

Response data typically denoted using subscripts


Treatment 1: $y_{11}, y_{12}, \ldots, y_{1n_1}$
Treatment 2: $y_{21}, y_{22}, \ldots, y_{2n_2}$
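
The random assignment described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `crd_assign`, the unit labels 1 to 10, and the seed are hypothetical and do not come from the course materials. Shuffling the units and taking the first $n_1$ makes every allocation equally likely.

```python
import random

def crd_assign(units, n1, n2, seed=None):
    """Randomly split experimental units into two treatment groups (CRD)."""
    if n1 + n2 != len(units):
        raise ValueError("n1 + n2 must equal the number of units")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering, hence every allocation, is equally likely
    return {"Trt 1": sorted(shuffled[:n1]), "Trt 2": sorted(shuffled[n1:])}

# Hypothetical example: ten EUs labeled 1..10, five per treatment
allocation = crd_assign(range(1, 11), n1=5, n2=5, seed=514)
print(allocation)
```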


Two-sample t Test
Interest in comparing mean responses to Trt 1 and Trt 2
$H_0: \mu_1 = \mu_2$ (Null Hypothesis)
$H_1: \mu_1 < \mu_2$, $\mu_1 > \mu_2$, or $\mu_1 \neq \mu_2$ (Alternative Hypothesis)
Compute t statistic
$t_o = (\bar{y}_1 - \bar{y}_2) / \sqrt{S_1^2/n_1 + S_2^2/n_2}$
Is observed $t_o$ unusual if $H_0: \mu_1 = \mu_2$?


Assumptions
1. Independent responses
2. Normally distributed observations
Assuming $H_0: \mu_1 = \mu_2$, the sampling distribution of $t_o$
is approximately t distributed with k degrees of freedom.
Why is it approximate?
k typically estimated using a Satterthwaite approximation
Unusual quantified as the probability that a randomly
drawn t is more extreme than $t_o$ (tail region)
Reject the null hypothesis when this probability (the
P-value) is small. Small typically based on choice of
significance level

Equal Variance t test


One can assume variances of treatment response are equal
In that case, we pool our two variance estimates

$t_o = (\bar{y}_1 - \bar{y}_2) / \left( S_p \sqrt{1/n_1 + 1/n_2} \right)$, $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

Then assuming $H_0: \mu_1 = \mu_2$ results in $t_o$ being t
distributed with $n_1 + n_2 - 2$ degrees of freedom
This method is generally not recommended. Used to be
more popular when the P -value was determined from t
tables rather than a computer


Example
(Samuels 7.36) In a study of lettuce growth, ten seedlings
were randomly allocated to be grown in either a standard
nutrient solution or in a solution containing extra
nitrogen. After 22 days, the plants were harvested and
weighed. The table below summarizes the results. Can we
conclude that extra nitrogen enhances growth?

Leaf Dry Weight (gm)

Nutrient Solution    n    Mean    SD
Standard             5    3.62    0.54
Extra                5    4.17    0.67


Solution I
The test statistic is

$t_o = \frac{4.17 - 3.62}{\sqrt{0.54^2/5 + 0.67^2/5}} = 1.43$

The degrees of freedom are approximated by

$df = \frac{(0.54^2/5 + 0.67^2/5)^2}{\frac{(0.54^2/5)^2}{5-1} + \frac{(0.67^2/5)^2}{5-1}} = 7.6546$

This question asks about a one-sided alternative. With 7.655
degrees of freedom, the P-value equals 0.09625. At the
$\alpha = 0.05$ significance level, we would not reject the null
and conclude that there is not sufficient evidence to state
that extra nitrogen enhances growth.
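
As a check on the arithmetic above, here is a minimal Python sketch that computes the unequal-variance t statistic and the Satterthwaite degrees of freedom from the summary statistics. The function name `welch_t` is an illustrative choice, not part of the course code. The P-value is not computed because the Python standard library has no t distribution function.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t statistic and Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # estimated variances of the two sample means
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Lettuce example: Extra (group 1) versus Standard (group 2)
t_o, df = welch_t(4.17, 0.67, 5, 3.62, 0.54, 5)
print(f"t_o = {t_o:.2f}, df = {df:.4f}")
```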


Solution II
The pooled variance is
$S_p^2 = (4(0.54)^2 + 4(0.67)^2)/8 = 0.37$

Our test statistic is

$t_o = (4.17 - 3.62)/\sqrt{2(0.37)/5} = 1.43$

This question asks about a one-sided alternative. With 8
degrees of freedom, the P-value equals 0.09502. If $\alpha = 0.10$,
we would reject the null and conclude that extra nitrogen
enhances growth. For $\alpha = 0.05$, we would not reject the
null and conclude there is not sufficient evidence to state
that the extra nitrogen enhances growth.
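
The pooled calculation can be reproduced the same way. This is a sketch from the summary statistics only. The name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance (pooled) t statistic, pooled variance, and degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    t = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2, df

# Lettuce example: Extra (group 1) versus Standard (group 2)
t_pooled, sp2, df_pooled = pooled_t(4.17, 0.67, 5, 3.62, 0.54, 5)
print(f"Sp^2 = {sp2:.2f}, t_o = {t_pooled:.2f}, df = {df_pooled}")
```

With equal sample sizes the pooled and unequal-variance statistics coincide, so only the degrees of freedom differ between Solution I and Solution II.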


Statistical Model Approach


For pooled two-sample t test, can express model as

$y_{ij} = \mu + \tau_i + \epsilon_{ij}$, $i = 1, 2$; $j = 1, 2, \ldots, n_i$

where
$\mu + \tau_i$ = mean for treatment i
$\epsilon_{ij}$ are iid random errors, $\epsilon_{ij} \sim N(0, \sigma^2)$
Can express Null in terms of treatment effects $\tau_1$ and $\tau_2$
$H_0: \tau_1 = \tau_2 = 0$
$H_1$: at least one $\tau_i$ different from 0


Will use these linear model representations throughout the course


Using SAS - t Test


tandpermtest.sas
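data lettuce;
/* @@ holds the input line so several (solution, weight) pairs are read per line */
input solution $ weight @@;
cards;
Standard 3.25 Standard 3.83 Standard 4.28
Standard 2.91 Standard 3.83 Extra 4.85
Extra 3.30 Extra 4.23 Extra 4.76 Extra 3.71
;

/* Two-sample t tests (pooled and Satterthwaite) of weight by solution */
proc ttest data=lettuce;
class solution;
var weight;
run;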


Output
The TTEST Procedure
Variable: weight

solution      N     Mean   Std Dev   Std Err   Minimum   Maximum
Extra         5   4.1700    0.6676    0.2985    3.3000    4.8500
Standard      5   3.6200    0.5396    0.2413    2.9100    4.2800
Diff (1-2)        0.5500    0.6070    0.3839

solution     Method            Mean       95% CL Mean      Std Dev
Extra                        4.1700    3.3411    4.9989     0.6676
Standard                     3.6200    2.9500    4.2900     0.5396
Diff (1-2)   Pooled          0.5500   -0.3352    1.4352     0.6070
Diff (1-2)   Satterthwaite   0.5500   -0.3421    1.4421

Method          Variances       DF   t Value   Pr > |t|
Pooled          Equal            8      1.43     0.1898
Satterthwaite   Unequal     7.6633      1.43     0.1914


Using SAS - Permutation Test


proc multtest data=lettuce permutation nsample=1000 outsamp=pdist seed=612;
test mean(weight);
contrast 'Extra vs Standard' 1 -1;
class solution;
run;
*********Generate Histogram of Sampling Distribution***********;
ods graphics off; ods exclude all; ods noresults;
proc glm data=pdist;
class _class_;
model weight = _class_;
by _sample_;
estimate 'Extra vs Standard' _class_ 1 -1;
ods output Estimates=pdist1;
run;
ods graphics on; ods exclude none; ods results;

proc univariate data=pdist1 noprint;
histogram tValue;
run;


Output
The Multtest Procedure
Model Information
Test for continuous variables Mean t-test
Degrees of Freedom Method Pooled
Tails for continuous tests Two-tailed
Strata weights None
P-value adjustment Permutation
Center continuous variables No
Number of resamples 1000
Seed 612

p-Values
Variable   Contrast             Raw      Permutation
weight     Extra vs Standard    0.1898   0.2390
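
The permutation P-value above comes from 1000 random resamples. With only ten plants, all C(10,5) = 252 allocations can be enumerated instead. The following Python sketch does that, as an illustration rather than a reproduction of PROC MULTTEST: it uses the absolute difference in group sums as the test statistic (with equal group sizes this orders the allocations the same way as the absolute pooled t statistic), and it stores the weights as integer hundredths of a gram so that ties are compared exactly.

```python
from itertools import combinations

# Lettuce weights in hundredths of a gram (integers avoid floating-point ties)
standard = [325, 383, 428, 291, 383]
extra = [485, 330, 423, 476, 371]

pooled = standard + extra
total = sum(pooled)
n_extra = len(extra)

def stat(extra_sum):
    """|sum(Extra) - sum(Standard)| for an allocation with the given Extra sum."""
    return abs(2 * extra_sum - total)

observed = stat(sum(extra))

# Statistic for every way of choosing which five plants receive extra nitrogen
stats = [stat(sum(group)) for group in combinations(pooled, n_extra)]

# Two-sided exact P-value: share of allocations at least as extreme as observed
p_value = sum(s >= observed for s in stats) / len(stats)
print(len(stats), round(p_value, 4))
```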


Approximate Sampling Distribution


Type I and Type II errors


In hypothesis testing, two types of errors are possible
                     TEST RESULT
                     DNR        R
REALITY    H0                   I
           H1        II

Type I error: P(reject $H_0$ | $H_0$ true) (false positive)

Type II error: P(do not reject $H_0$ | $H_0$ false) (false negative)

Power of a test (for specific $H_1$) is 1 - P(Type II error)


Significance level is our selected P(Type I error)


Choice of Sample Size/Power

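[Figure: two panels with horizontal axes from -4 to 4. Top panel: sampling distribution under the null hypothesis, marking P(Type I). Bottom panel: sampling distribution under the alternative hypothesis, marking P(Type II).]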

Goal of test: Detect diff of size $\delta = |\mu_1 - \mu_2|$ with high prob
Choice of $\delta$ based on science (e.g., practical significance)
Probability to detect difference is the power
Power depends on $\alpha$, $\delta$, $\sigma$, and n


Calculating Power
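Assume $\sigma$ is known (i.e., use Normal) and $n_1 = n_2 = n$

$H_0$: $\bar{y}_1 - \bar{y}_2 \sim N(0, 2\sigma^2/n)$

$H_1$: $\bar{y}_1 - \bar{y}_2 \sim N(\delta, 2\sigma^2/n)$
Reject if (using the $H_0$ sampling dist)
$\bar{y}_1 - \bar{y}_2 > z_{\alpha/2}\sqrt{2\sigma^2/n}$ or $\bar{y}_1 - \bar{y}_2 < -z_{\alpha/2}\sqrt{2\sigma^2/n}$

Power: (using the $H_1$ sampling dist and the rejection region)

$P(z > z_{\alpha/2} - \delta/\sqrt{2\sigma^2/n}) + P(z < -z_{\alpha/2} - \delta/\sqrt{2\sigma^2/n})$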


Example I
Suppose $\alpha = .05$, $\sigma^2 = 12.5$, $n = 25$, and $\delta = 3.5$

$z_{\alpha/2} = 1.96$ and $2\sigma^2/25 = 1$

Power = P(z > 1.96 - 3.5) + P(z < -1.96 - 3.5)

= .9382 + .0000

If n were only 10,

Power = P(z > 1.96 - 2.214) + P(z < -1.96 - 2.214)

= .6001 + .0000

The larger the sample size, the greater the power
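
The two power values in this example can be reproduced with the standard normal distribution in Python's `statistics` module. This is a minimal sketch of the known-variance power formula from the previous slide. The function name `power_known_sigma` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def power_known_sigma(delta, sigma2, n, alpha=0.05):
    """Two-sided power for comparing two means with known variance, n per group."""
    z = NormalDist()                       # standard normal
    z_crit = z.inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    shift = delta / sqrt(2 * sigma2 / n)   # delta / sqrt(2 sigma^2 / n)
    upper = 1 - z.cdf(z_crit - shift)      # P(z > z_{alpha/2} - shift)
    lower = z.cdf(-z_crit - shift)         # P(z < -z_{alpha/2} - shift)
    return upper + lower

for n in (25, 10):
    print(n, round(power_known_sigma(3.5, 12.5, n), 4))
```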


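Power Calculations ($\sigma$ unknown)

Can reference an Operating Characteristic Curve (Power
curve) for different levels of $\alpha$, $\delta/\sigma$, and n
Can use non-central and central t distribution functions
Reject if:
$\bar{y}_1 - \bar{y}_2 > t_{\alpha/2,2(n-1)}\sqrt{2S_p^2/n}$ or $\bar{y}_1 - \bar{y}_2 < -t_{\alpha/2,2(n-1)}\sqrt{2S_p^2/n}$

Power: P(reject | $H_1$)
Under $H_1$, $\frac{\bar{y}_1 - \bar{y}_2}{\sqrt{2S_p^2/n}} \sim t_{2(n-1)}\left(\delta/\sqrt{2\sigma^2/n}\right)$ (noncentral t)
Noncentrality parameter $\delta/\sqrt{2\sigma^2/n}$

Compute probability of rejection given noncentral t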


Using SAS - Example on Page 47


tpower.sas
proc power;
twosamplemeans alpha=.05 nulldiff=0 sides=2
meandiff=.5 npergroup=. stddev=.25
power=.95;
run;

proc power;
twosamplemeans alpha=.05 nulldiff=0 sides=2
meandiff=.25 npergroup=. stddev=.25
power=.90;
run;

proc power;
twosamplemeans alpha=.05 nulldiff=0 sides=2
meandiff=.25 stddev=.25 power=.
npergroup=2 to 25 by 1;
plot interpol=join yopts=(ref=0.80);
run;


Output
The POWER Procedure
Two-Sample t Test for Mean Difference

Fixed Scenario Elements


Distribution Normal
Method Exact
Number of Sides 2
Null Difference 0
Alpha 0.05
Mean Difference 0.5
Standard Deviation 0.25
Nominal Power 0.95

Computed N Per Group

Actual Power    N Per Group
0.960           8


Output
The POWER Procedure
Two-Sample t Test for Mean Difference

Fixed Scenario Elements


Distribution Normal
Method Exact
Number of Sides 2
Null Difference 0
Alpha 0.05
Mean Difference 0.25
Standard Deviation 0.25
Nominal Power 0.9

Computed N Per Group

Actual Power    N Per Group
0.912           23
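
For a rough cross-check of the two sample sizes reported by PROC POWER, the known-variance (normal) power formula can be searched for the smallest n that reaches the target power. This sketch is an approximation, not what PROC POWER does: it treats $\sigma$ as known, so it should give slightly smaller sample sizes than the exact noncentral t calculation (8 and 23 in the output above). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(delta, sigma, n, alpha=0.05):
    """Normal-approximation power for a two-sided test with n per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = delta / sqrt(2 * sigma**2 / n)
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

def approx_n_per_group(delta, sigma, target_power, alpha=0.05, n_max=10_000):
    """Smallest n per group whose approximate power reaches the target."""
    for n in range(2, n_max + 1):
        if approx_power(delta, sigma, n, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within n_max")

n_first = approx_n_per_group(delta=0.5, sigma=0.25, target_power=0.95)
n_second = approx_n_per_group(delta=0.25, sigma=0.25, target_power=0.90)
print(n_first, n_second)
```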


Confidence Intervals
In addition to an estimate, want statement of precision
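$100(1-\alpha)$% confidence intervals for $\mu_1 - \mu_2$
Pooled: $(\bar{y}_1 - \bar{y}_2) \pm t_{n_1+n_2-2,\alpha/2}\, S_p\sqrt{1/n_1 + 1/n_2}$
Unequal variances: $(\bar{y}_1 - \bar{y}_2) \pm t_{df,\alpha/2}\sqrt{S_1^2/n_1 + S_2^2/n_2}$

If underlying assumptions are true, $\mu_1 - \mu_2$ will fall within
$100(1-\alpha)$% of the possible intervals
You are $100(1-\alpha)$% confident your single CI is one of
these intervals that contains $\mu_1 - \mu_2$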


Hypothesis Testing and CIs

Consider a two-sided hypothesis test at level $\alpha$

Reject if $|\bar{y}_1 - \bar{y}_2| > t_{n_1+n_2-2,\alpha/2}\, S_p\sqrt{1/n_1 + 1/n_2}$

Now consider a $100(1-\alpha)$% CI

Half-width of CI is $t_{n_1+n_2-2,\alpha/2}\, S_p\sqrt{1/n_1 + 1/n_2}$
0 not in interval if $|\bar{y}_1 - \bar{y}_2| > t_{n_1+n_2-2,\alpha/2}\, S_p\sqrt{1/n_1 + 1/n_2}$

Will reject $H_0$ if 0 not in confidence interval

Allows us to immediately test any $H_0: \mu_1 - \mu_2 = \delta_0$ at level $\alpha$
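
To make the link between the interval and the test concrete, here is a small Python sketch that builds the pooled 95% CI for the lettuce data and checks whether 0 lies inside it. One assumption to note: the critical value $t_{8,0.025} = 2.306$ is hard-coded from a standard t table, because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, variance

standard = [3.25, 3.83, 4.28, 2.91, 3.83]
extra = [4.85, 3.30, 4.23, 4.76, 3.71]

# Assumption: t_{8, 0.025} = 2.306, taken from a t table (8 = n1 + n2 - 2)
T_CRIT_8DF = 2.306

n1, n2 = len(extra), len(standard)
diff = mean(extra) - mean(standard)

# Pooled variance estimate, then the half-width of the pooled CI
sp2 = ((n1 - 1) * variance(extra) + (n2 - 1) * variance(standard)) / (n1 + n2 - 2)
half_width = T_CRIT_8DF * sqrt(sp2 * (1 / n1 + 1 / n2))

lower, upper = diff - half_width, diff + half_width
reject_h0 = not (lower <= 0 <= upper)  # two-sided test of H0 at alpha = 0.05
print(f"95% CI: ({lower:.4f}, {upper:.4f}); reject H0: {reject_h0}")
```

The interval agrees with the pooled limits in the PROC TTEST output (-0.3352, 1.4352) and contains 0, which matches the two-sided pooled P-value of 0.1898 being larger than 0.05.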


Sample Size Determination

Want the sample size for a desired precision/half-width


Consider equal variance and n1 = n2 = n
Half-width of CI depends on Sp and n
Problem: Sp is a random variable (depends on sample)
Solution: Specify desired probability that the observed
half width is less than or equal to the desired precision


Using SAS
tpower.sas
proc power;
twosamplemeans ci=diff alpha=.05 halfwidth=.25 stddev=.25
npergroup=. probwidth=0.80;
run;

proc power;
twosamplemeans ci=diff alpha=.05 halfwidth=.25 stddev=.25
npergroup=. probwidth=0.50;
run;


Output
The POWER Procedure
Confidence Interval for Mean Difference
Fixed Scenario Elements
Distribution Normal
Method Exact
Alpha 0.05
CI Half-Width 0.25
Standard Deviation 0.25
Nominal Prob(Width) 0.8
Number of Sides 2
Prob Type Conditional

Actual Prob(Width)    N Per Group
0.801                 11


Output
The POWER Procedure
Confidence Interval for Mean Difference
Fixed Scenario Elements
Distribution Normal
Method Exact
Alpha 0.05
CI Half-Width 0.25
Standard Deviation 0.25
Nominal Prob(Width) 0.5
Number of Sides 2
Prob Type Conditional

Actual Prob(Width)    N Per Group
0.533                 9


Paired Comparison
Can often improve precision by pairing similar EUs
Removes variation between EUs
Twins for drug/health studies - subject variability
Same tissue specimen given both trts - specimen variability
Similar plots in a field - plot variability
Because of the pairing, we perform inference on the pair
differences $d_i$. This reduces the 2n observations into n
independent differences

$d_i = y_{1i} - y_{2i}$, $S_d^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2$

$t_o = \bar{d}/(S_d/\sqrt{n})$
$t_o \sim t_{n-1}$


Randomized Complete Block Design

Design allows you to remove known sources of variability that
you are not interested in and therefore improve precision

Group EUs into blocks such that the EUs in a block are as
similar as possible. EUs across blocks can be very different.

Randomly assign treatments to EUs within a block using
process similar to CRD

Creates a restriction on the possible allocations: fewer
possible allocations than with CRD

Pairs of EUs is the smallest block size
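
A minimal Python sketch of randomizing two treatments within pairs is shown below. The function name `rcbd_assign_pairs`, the unit labels, and the seed are hypothetical. With four pairs there are $2^4 = 16$ possible allocations, compared with C(8,4) = 70 for a CRD that assigns four of the same eight EUs to each treatment, which illustrates the restriction mentioned above.

```python
import random

def rcbd_assign_pairs(blocks, treatments=("Trt 1", "Trt 2"), seed=None):
    """Within each block (pair of EUs), randomly order the two treatments."""
    rng = random.Random(seed)
    allocation = {}
    for block_id, units in blocks.items():
        order = list(treatments)
        rng.shuffle(order)  # independent randomization within each block
        allocation[block_id] = dict(zip(units, order))
    return allocation

# Hypothetical example: four blocks, each a pair of similar EUs
blocks = {1: ("A", "B"), 2: ("C", "D"), 3: ("E", "F"), 4: ("G", "H")}
allocation = rcbd_assign_pairs(blocks, seed=514)
for block_id, assignment in allocation.items():
    print(block_id, assignment)
```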


Statistical Model
Pairing included as block effect ($\beta_j$) in linear model

$y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij}$, $i = 1, 2$; $j = 1, 2, \ldots, n$

$E(\bar{y}_{1.} - \bar{y}_{2.})$ is still $\tau_1 - \tau_2$ as the $\beta$s cancel
Lose degrees of freedom because of block effects

Quantity                            Two sample       Paired
$V(\bar{y}_{1.} - \bar{y}_{2.})$    $2\sigma^2/n$    $2(\sigma^2 - \mathrm{Cov}(Y_1, Y_2))/n$
DF                                  $2(n-1)$         $n-1$

Pairing advantageous when positive correlation. If the covariance is
slight relative to $\sigma^2$, then loss of dfs may result in blocking
being a disadvantage. Only block on known sources of
variability that you want to remove.

Example
Paired T-test/Randomization Paired Test

In a study of egg cell maturation, the eggs from each of four
female frogs were divided into two batches and one batch was
exposed to progesterone. After two minutes, the cAMP content
was measured. It is believed that cAMP is a substance that can
mediate cellular response to hormones.

cAMP Content
FROG    Control    Progesterone    Diff
1       6          4                2
2       4          5               -1
3       5          2                3
4       4          2                2


Solutions
t-test: d = {2, -1, 3, 2}, $\bar{d} = 1.5$, $s_d = 1.732$, and $s_d/\sqrt{n} = 0.866$. The test
statistic is 1.5/0.866 = 1.732. With 3 df, the P-value is close to 0.18.

randomization: Under $H_0$, assume the magnitude of each difference does not
depend on the allocation of treatments (only its sign does). There are $2^4 = 16$ allocations.
The observed outcome is 2 - 1 + 3 + 2 = 6.

$|\sum d|$    # of occurrences
8             2
6             2
4             4
2             6
0             2

From the table, four of the sixteen outcomes are as or more extreme than the observed outcome simply
due to chance. Thus the P-value is 4/16 = 0.25.
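
Both solutions can be checked with a short Python sketch. It computes the paired t statistic from the four differences and then enumerates all 16 sign assignments to rebuild the randomization table exactly. The variable names are illustrative. The t-based P-value is omitted because the standard library has no t distribution function.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, stdev

d = [2, -1, 3, 2]  # Control minus Progesterone for the four frogs
n = len(d)

# Paired t statistic: d-bar / (S_d / sqrt(n))
t_o = mean(d) / (stdev(d) / sqrt(n))

# Randomization test: under H0 each difference keeps its magnitude,
# and its sign is + or - with equal probability
abs_d = [abs(x) for x in d]
sums = [abs(sum(sign * a for sign, a in zip(signs, abs_d)))
        for signs in product((1, -1), repeat=n)]
counts = Counter(sums)

observed = abs(sum(d))
p_value = sum(s >= observed for s in sums) / len(sums)
print(round(t_o, 3), dict(sorted(counts.items(), reverse=True)), p_value)
```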


Using R
- Using simulation to approximate the P-value -
#### t-test ####
diff <- c(2, -1, 3, 2)
tdiff <- mean(diff) / sqrt(var(diff) / 4)
pvalue <- 2 * pt(-abs(tdiff), 3)

#### permutation test ####
absdiff <- abs(diff)
tdist <- numeric(length = 10000)
for (iperm in 1:10000) {
  randassign <- 2 * rbinom(4, 1, .5) - 1  ## Creates set of +s and -s
  diff1 <- absdiff * randassign
  tdist[iperm] <- mean(diff1) / sqrt(var(diff1) / 4)  ## Creates sampling dist of t statistics
}

hist(tdist, nclass = 530)
pvalue1 <- length(tdist[abs(tdist) >= abs(tdiff)]) / 10000
print(c(pvalue, pvalue1))


[Figure: Histogram of tdist. Horizontal axis: tdist, from -4 to 4. Vertical axis: Frequency, from 0 to 1500.]


Using SAS
data camp;
input frog ctrl prog;
diff = ctrl - prog; /* same sign as the Diff column and as paired ctrl*prog */
cards;
1 6 4
2 4 5
3 5 2
4 4 2
;

proc ttest;
paired ctrl*prog;
run;


Output
The TTEST Procedure

Difference: ctrl - prog

N     Mean   Std Dev   Std Err   Minimum   Maximum
4   1.5000    1.7321    0.8660   -1.0000    3.0000

  Mean       95% CL Mean      Std Dev    95% CL Std Dev
1.5000   -1.2561    4.2561     1.7321   0.9812    6.4580

DF   t Value   Pr > |t|
 3      1.73     0.1817
