
Applied Statistics II

Chapter 11 Distribution-Free Inference

Jian Zou

WPI Mathematical Sciences

1 / 94
Introduction

The Sign Test

Rank-Based Methods

2 / 94
The Need for Distribution-Free Statistical Methods

Though they do perform well under normal circumstances, classical inference methods, such as those considered in Chapters 5 (confidence intervals) and 6 (hypothesis tests), do have some weaknesses, particularly when sample sizes are small:
▶ Classical inference methods rely on assumptions about the underlying distribution model for the data. Usually, they assume the distribution is one of a particular class of distributions, such as the class of all normal distributions.
▶ These methods can be sensitive to outliers and nonnormality.

3 / 94
The Need for Distribution-Free Statistical Methods

In this chapter we present some inference methods which were developed to help remedy the weaknesses of classical inference methods. Because these methods do not rely on a specific population distribution model, such as the normal, they are called distribution-free inference methods.

As we did in Chapters 5 and 6, we begin by assuming that the data constitute a random sample from a population to which we wish to make inference.

4 / 94
The Sign Test

A variant of the sign test is reported to have been used by Arbuthnott in 1710 to test whether the proportion of male births in London was greater than 1/2. If so, the sign test is one of the oldest of all statistical procedures.

5 / 94
The Sign Test

▶ The Statistical Model We assume data Y1, . . . , Yn from a population with a continuous distribution¹ having median θ.

▶ The Statistical Hypotheses H0 : θ = θ0, versus one of the following:
  Ha+ : θ > θ0
  Ha− : θ < θ0
  Ha± : θ ≠ θ0

¹Continuous means that we regard the proportion of the population taking any single value to be 0.
6 / 94
The Sign Test

▶ The Test Statistic The test statistic for the sign test is the number of observations Yi which exceed θ0. We will call this statistic B. The observed value of B, b∗, is the number we get when we plug the data values into B.
▶ The p-value The p-value is the proportion of all samples, when the null hypothesis is true, that have a value of the test statistic giving as much or more evidence against the null hypothesis and in favor of the alternative hypothesis as b∗.

7 / 94
The Sign Test

Thus, in order to compute the p-value, we need to know:


▶ The distribution of B when H0 is true.
▶ When the test statistic gives as much or more evidence against the null hypothesis and in favor of the alternative hypothesis as b∗.

8 / 94
The Sign Test

▶ The Distribution of B When H0 is True: If H0 is true, the chance any of the Yi is greater than θ0 is 1/2. Therefore B, the number of Yi greater than θ0, has the same distribution as the number of heads in n tosses of a fair coin.

We know this distribution very well: b(n, 0.5).

9 / 94
The Sign Test

▶ What Is Evidence Against the Null and in Favor of the Alternative Hypothesis?
  ▶ For Ha+ : θ > θ0, large values of B are more consistent with Ha+ and less consistent with H0. Therefore, the p-value is p+ = P(B ≥ b∗).
  ▶ For Ha− : θ < θ0, the p-value is p− = P(B ≤ b∗).
  ▶ For Ha± : θ ≠ θ0, the p-value is p± = 2 min(p+, p−).

10 / 94
Large Sample Approximation
By the Central Limit Theorem, we know that if B ∼ b(n, p) and if n is large (np > 10 and n(1 − p) > 10), then, approximately,

    (B − np)/√(np(1 − p)) ∼ N(0, 1).

To obtain an approximate p-value using the normal approximation with the continuity correction, we first assume H0 is true. Then p = 0.5, so np = 0.5n and np(1 − p) = 0.25n. If the alternative hypothesis is Ha+, the p-value is

    p+ = P(B ≥ b∗) = P(B ≥ b∗ − 0.5)
       = P((B − 0.5n)/√(0.25n) ≥ (b∗ − 0.5n − 0.5)/√(0.25n)) ≈ P(Z ≥ zu∗),

where

    Z ∼ N(0, 1), and zu∗ = (b∗ − 0.5n − 0.5)/√(0.25n).
11 / 94
Large Sample Approximation

Similarly, we may approximate p− by P(Z ≤ zl∗), where

    zl∗ = (b∗ − 0.5n + 0.5)/√(0.25n).

We then approximate p± by 2 min(P(Z ≥ zu∗), P(Z ≤ zl∗)).

12 / 94
Example 1

One stage of a manufacturing process involves a manually-controlled grinding operation. Management suspects that the grinding machine operators tend to grind parts slightly larger rather than slightly smaller than the target diameter of 0.75 inches, while still staying within specification limits. To verify their suspicion, they sample 150 within-spec parts and find that 93 have diameters above the target diameter. Is this strong evidence in support of their suspicion?

13 / 94
Example 1

▶ The Scientific Hypothesis The grinding machine operators tend to grind parts slightly larger than the target diameter.
▶ The Statistical Model The median diameter of the population of all ground parts is θ.
▶ The Statistical Hypotheses
  H0 : θ = 0.75
  Ha+ : θ > 0.75
▶ The Test Statistic The test statistic is B, the number of parts with diameters larger than the target of 0.75 inches. Its observed value is b∗ = 93.

14 / 94
Example 1

▶ The P-Value The p-value is P(B ≥ 93) = 0.0021, where B ∼ b(150, 0.5).
Since np = n(1 − p) = 150 × 0.5 = 75 > 10, we can approximate this p-value by using the normal approximation:

    zu∗ = (93 − (150)(0.5) − 0.5)/√((150)(0.5)(0.5)) = 2.86,

so the approximate p-value is

    P(Z ≥ 2.86) = 0.0021.

Result: reject H0 in favor of Ha+ and conclude that more than half the parts are ground too large.

15 / 94
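The calculations in Example 1 are easy to reproduce. Below is a minimal Python sketch (the function name and structure are mine, not the text's) that computes both the exact binomial p-value and the continuity-corrected normal approximation, using only the standard library:

```python
from math import comb, erf, sqrt

def sign_test_upper_p(b_star, n):
    """Sign test p-values for Ha+ : theta > theta0.

    Returns the exact P(B >= b*) for B ~ b(n, 0.5), and the
    continuity-corrected normal approximation P(Z >= z_u*).
    """
    # Exact binomial upper tail: P(B >= b*)
    exact = sum(comb(n, k) for k in range(b_star, n + 1)) / 2 ** n
    # Continuity-corrected standardized statistic z_u*
    z_u = (b_star - 0.5 * n - 0.5) / sqrt(0.25 * n)
    # P(Z >= z_u*) via the standard normal CDF written with erf
    approx = 0.5 * (1 - erf(z_u / sqrt(2)))
    return exact, approx

exact, approx = sign_test_upper_p(93, 150)
# Both p-values come out near 0.0021, matching the slide.
```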
Estimation Based on the Sign Test

A point estimator for the population median θ can be obtained by reasoning as follows.

If θ = θ0, the distribution of the test statistic B is symmetric about its mean n/2. Therefore, an estimator of θ will be the amount θ̂ that we have to subtract from each data value Yi so that the value of B computed from Y1 − θ̂, . . . , Yn − θ̂ equals n/2. It turns out that this estimator, θ̂, is the median of the data values Y1, . . . , Yn.

16 / 94
Estimation Based on the Sign Test

A level L confidence interval for θ based on the sign test consists of all values θ0 for which the test of
  H0 : θ = θ0
  Ha± : θ ≠ θ0
does not reject at the α = 1 − L significance level: that is, for which p± ≥ α (this is called “inverting the test”).

In a practical sense, an exact level L interval can seldom be computed because of the discreteness of the binomial distribution, but exact intervals can be computed for levels close to L. One method for doing so is the following:

17 / 94
Estimation Based on the Sign Test

1. Find an integer k so that the value α0 = 2P(X ≥ k) = 2P(X ≤ n − k) is as close to α as possible, where X ∼ b(n, 0.5).
2. Order the observations from smallest to largest, denoting the ordered values Y(1) ≤ Y(2) ≤ · · · ≤ Y(n).
3. A level L0 = 1 − α0 confidence interval for θ is (Y(n−k+1), Y(k)).

Large Sample Approximation:
For large n, we may take k in step 1 to be the integer closest to

    n/2 + z_{(1+L)/2} √(n/2).

The interval is then computed as described in steps 2 and 3.

18 / 94
Example 2

The selling prices of 11 randomly-selected residential properties in Worcester County are ($1000's):

115, 179, 199, 222, 225, 247, 276, 319, 342, 543, 798

Using these data, we estimate the median house price of residential properties in Worcester County to be $247,000: the median of the data.

19 / 94
Example 2, Continued
To obtain a 95% confidence interval for the median house price, we look for a value k so that for X ∼ b(11, 0.5), 2P(X ≤ 11 − k) is as close as possible to 0.05. We can use a table, a calculator, software (R, SAS, . . . ), or some other technology to do this.

What we find is that 2P(X ≤ 1) = 0.0117 and 2P(X ≤ 2) = 0.0654, so we can take either k = 10 or k = 9. These choices give
choices give

k    (y(n−k+1), y(k))               confidence level
9    (y(3), y(9)) = (199, 342)      1 − 0.0654 = 0.9346
10   (y(2), y(10)) = (179, 543)     1 − 0.0117 = 0.9883

The large sample solution (though n = 11 is not exactly large sample) gives k ≈ 11/2 + 1.96 √(11/2) = 10.09, which rounds to 10 and gives the second interval above.
20 / 94
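The exact interval search in Example 2 can be sketched as follows (stdlib-only Python; the function name and the search over k are my own, not the text's):

```python
from math import comb

def sign_test_ci(data, alpha=0.05):
    """Median CI by inverting the sign test (steps 1-3 from the text).

    Finds the k > n/2 whose two-sided tail 2*P(X <= n-k), X ~ b(n, 0.5),
    is closest to alpha, then returns (Y_(n-k+1), Y_(k)) and the
    achieved confidence level.
    """
    n = len(data)
    y = sorted(data)

    def tail(k):  # 2 * P(X <= n - k) for X ~ b(n, 0.5)
        return 2 * sum(comb(n, j) for j in range(n - k + 1)) / 2 ** n

    k = min(range(n // 2 + 1, n + 1), key=lambda k: abs(tail(k) - alpha))
    return k, (y[n - k], y[k - 1]), 1 - tail(k)

prices = [115, 179, 199, 222, 225, 247, 276, 319, 342, 543, 798]
k, interval, level = sign_test_ci(prices)
# k = 9 gives (199, 342) at achieved level 0.9346: the first row of the table.
```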
Rank-Based Procedures

As their name implies, rank-based procedures make use of the ranks of observations instead of the observations themselves.

For example, here's how we assign ranks to the set of observations 1.2, 3.7, 0.6, 3.7, 3.7, 9.7, −1.1, 9.7:

First, sort the values:     −1.1  0.6  1.2  3.7  3.7  3.7  9.7  9.7
Next, assign integers:         1    2    3    4    5    6    7    8
Finally, compute ranks:        1    2    3    5    5    5  7.5  7.5

(Note how the integers are averaged for equal data values.)

21 / 94
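The midrank rule (tied values receive the average of the integer ranks they occupy) can be sketched in plain Python; `midranks` is my name for it, not the text's:

```python
def midranks(values):
    """Rank the values from 1 to n, giving tied values the average
    of the integer ranks they occupy."""
    order = sorted(values)
    return [
        # average of the 1-based sorted positions at which x occurs
        sum(i + 1 for i, v in enumerate(order) if v == x) / order.count(x)
        for x in values
    ]

# The slide's example, in original order:
ranks = midranks([1.2, 3.7, 0.6, 3.7, 3.7, 9.7, -1.1, 9.7])
# ranks == [3, 5, 2, 5, 5, 7.5, 1, 7.5]
```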
The Wilcoxon Signed-Rank Test

▶ The Statistical Model We assume data Y1, . . . , Yn from a population with a symmetric distribution centered at θ. This means θ is the median and, if the population has a mean, the mean as well.

▶ The Statistical Hypotheses H0 : θ = θ0, versus one of the following:
  Ha+ : θ > θ0
  Ha− : θ < θ0
  Ha± : θ ≠ θ0

22 / 94
The Wilcoxon Signed-Rank Test

▶ The Test Statistic To compute the test statistic, follow these steps:
1. Center the observations by subtracting from each the population median under H0. The resulting centered observations are Yi′ = Yi − θ0, i = 1, . . . , n.
2. Compute the ranks of the absolute values of the centered observations: Ri = rank(|Yi′|), i = 1, . . . , n.
3. The test statistic W is the sum of the ranks Ri for those i corresponding to positive Yi′ values:

    W = Σ_{i : Yi′ > 0} Ri

23 / 94
The Wilcoxon Signed-Rank Test

▶ The P-Value Let w∗ denote the observed value of the test statistic W. Computation of the p-value depends on the alternative hypothesis (P0(A) denotes the proportion of samples for which A occurs, given H0 is true):

Alternative Hypothesis    p-value
Ha+ : θ > θ0              p+ = P0(W ≥ w∗)
Ha− : θ < θ0              p− = P0(W ≤ w∗)
Ha± : θ ≠ θ0              p± = 2 min(p+, p−)

24 / 94
Example 1, Revisited

One stage of a manufacturing process involves a manually-controlled grinding operation. Management suspects that the grinding machine operators tend to grind parts slightly larger rather than slightly smaller than the target diameter of 0.75 inches, while still staying within specification limits.

To illustrate the ideas, we will assume their random sample consists of 3 rather than 150 parts.

They are willing to assume the distribution of ground diameters is symmetric about a median θ. They wish to test the hypotheses
  H0 : θ = 0.75
  Ha+ : θ > 0.75

25 / 94
Example 1, Revisited

Here are the computations for obtaining w∗, the observed value of W:

Original   Centered           Absolute
Data       Data               Values   Ranks               Contribution
yi         yi′ = yi − 0.75    |yi′|    ri = rank(|yi′|)    to w∗
0.7533     0.0033             0.0033   2                   2
0.7485     −0.0015            0.0015   1                   0
0.7578     0.0078             0.0078   3                   3

So w∗ = 2 + 0 + 3 = 5.

26 / 94
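The three steps can be coded directly. A minimal sketch (assumes no ties among the absolute centered values; the function name is mine):

```python
def signed_rank_w(data, theta0):
    """Wilcoxon signed-rank statistic: the sum of the ranks of
    |Y_i - theta0| over observations with Y_i - theta0 > 0
    (no ties among the absolute values assumed)."""
    centered = [y - theta0 for y in data]             # step 1: center
    abs_sorted = sorted(abs(c) for c in centered)
    rank = lambda c: abs_sorted.index(abs(c)) + 1     # step 2: rank |Yi'|
    return sum(rank(c) for c in centered if c > 0)    # step 3: sum over Yi' > 0

w_star = signed_rank_w([0.7533, 0.7485, 0.7578], 0.75)
# w_star == 5, as in the table above.
```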
Example 1, Revisited

The p-value is computed from the permutation distribution of W: the distribution of all values of the test statistic that would have resulted under an appropriate permutation of the original data.

27 / 94
Example 1, Revisited

Here is the idea behind the permutation distribution:
▶ If H0 is true, the Yi′ are independent and from the same distribution having median 0.
▶ Therefore, prior to sampling, each is equally likely to be positive or negative.
▶ In fact, because of the symmetry assumption, Yi′ and −Yi′ have the same distribution, so it is just as likely that we observe (Y1′, Y2′, Y3′), or (−Y1′, Y2′, Y3′), or (−Y1′, Y2′, −Y3′), or data resulting from any other rearrangement of the signs of the Yi′.

28 / 94
Example 1, Revisited

The table below shows the values of W resulting from each of the 8 different possible selections of signs for the Yi′.

Signs                 Assumed Yi′                    Contribution to W    W
(Y1′, Y2′, Y3′)       (0.0033, −0.0015, 0.0078)      (2, 0, 3)            5*
(Y1′, −Y2′, Y3′)      (0.0033, 0.0015, 0.0078)       (2, 1, 3)            6
(Y1′, Y2′, −Y3′)      (0.0033, −0.0015, −0.0078)     (2, 0, 0)            2
(−Y1′, Y2′, Y3′)      (−0.0033, −0.0015, 0.0078)     (0, 0, 3)            3
(−Y1′, −Y2′, Y3′)     (−0.0033, 0.0015, 0.0078)      (0, 1, 3)            4
(Y1′, −Y2′, −Y3′)     (0.0033, 0.0015, −0.0078)      (2, 1, 0)            3
(−Y1′, Y2′, −Y3′)     (−0.0033, −0.0015, −0.0078)    (0, 0, 0)            0
(−Y1′, −Y2′, −Y3′)    (−0.0033, 0.0015, −0.0078)     (0, 1, 0)            1

Note that the first line corresponds to the observed data, and so its value is W = w∗ = 5.

29 / 94
Example 1, Revisited

All 8 values of W in the rightmost column (5, 6, 2, 3, 4, 3, 0, 1) comprise the permutation distribution of W.

Signs                 Assumed Yi′                    Contribution to W    W
(Y1′, Y2′, Y3′)       (0.0033, −0.0015, 0.0078)      (2, 0, 3)            5*
(Y1′, −Y2′, Y3′)      (0.0033, 0.0015, 0.0078)       (2, 1, 3)            6
(Y1′, Y2′, −Y3′)      (0.0033, −0.0015, −0.0078)     (2, 0, 0)            2
(−Y1′, Y2′, Y3′)      (−0.0033, −0.0015, 0.0078)     (0, 0, 3)            3
(−Y1′, −Y2′, Y3′)     (−0.0033, 0.0015, 0.0078)      (0, 1, 3)            4
(Y1′, −Y2′, −Y3′)     (0.0033, 0.0015, −0.0078)      (2, 1, 0)            3
(−Y1′, Y2′, −Y3′)     (−0.0033, −0.0015, −0.0078)    (0, 0, 0)            0
(−Y1′, −Y2′, −Y3′)    (−0.0033, 0.0015, −0.0078)     (0, 1, 0)            1

30 / 94
Example 1, Revisited

The p-value is computed as the proportion of these permutation distribution values that are at least as large as the observed value w∗ = 5. (Why? Because the alternative hypothesis is Ha+ : θ > 0.75, so large values of W provide evidence against H0 and in favor of Ha+.)

In computing the p-value this way, we are measuring how extreme the actually-observed value, w∗, is with respect to all the other values of the test statistic that, under H0, are equally valid.

31 / 94
Example 1, Revisited

Since exactly two of the 8 values (the first two) are at least as large as w∗, the p-value = 2/8 = 0.25.

Signs                 Assumed Yi′                    Contribution to W    W
(Y1′, Y2′, Y3′)       (0.0033, −0.0015, 0.0078)      (2, 0, 3)            5*
(Y1′, −Y2′, Y3′)      (0.0033, 0.0015, 0.0078)       (2, 1, 3)            6
(Y1′, Y2′, −Y3′)      (0.0033, −0.0015, −0.0078)     (2, 0, 0)            2
(−Y1′, Y2′, Y3′)      (−0.0033, −0.0015, 0.0078)     (0, 0, 3)            3
(−Y1′, −Y2′, Y3′)     (−0.0033, 0.0015, 0.0078)      (0, 1, 3)            4
(Y1′, −Y2′, −Y3′)     (0.0033, 0.0015, −0.0078)      (2, 1, 0)            3
(−Y1′, Y2′, −Y3′)     (−0.0033, −0.0015, −0.0078)    (0, 0, 0)            0
(−Y1′, −Y2′, −Y3′)    (−0.0033, 0.0015, −0.0078)     (0, 1, 0)            1

32 / 94
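The enumeration over all 2^n sign assignments can be sketched as follows (stdlib-only; the function name is mine, and no ties among the absolute values are assumed):

```python
from itertools import product

def signed_rank_perm_p_upper(centered, w_star):
    """Upper-tail permutation p-value for the signed-rank test:
    enumerate all 2^n sign assignments of the centered data and count
    how often W is at least the observed value w*."""
    abs_sorted = sorted(abs(c) for c in centered)
    ranks = [abs_sorted.index(abs(c)) + 1 for c in centered]
    perm_w = [
        sum(r for r, s in zip(ranks, signs) if s > 0)  # W for this sign choice
        for signs in product([-1, 1], repeat=len(centered))
    ]
    return sum(w >= w_star for w in perm_w) / len(perm_w)

p = signed_rank_perm_p_upper([0.0033, -0.0015, 0.0078], 5)
# p == 2/8 == 0.25, matching the table.
```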
Tabulated Distribution

Table A.12, p. 763 of the text provides exact values of P0(W ≤ w∗) for n = 3, 4, . . . , 20, where P0(W ≤ w∗) is the proportion of permutation distribution values that are no larger than the observed value w∗. These values are tabulated for w∗ that might be expected to lead to rejection of H0.

33 / 94
Example 1, Revisited

For this example, the only tabled value tells us that P(W ≤ 0) = 0.125.

We also know that P(W ≤ w∗) = P(W ≥ n(n + 1)/2 − w∗), so, since here n(n + 1)/2 = 3(3 + 1)/2 = 6, we have 0.125 = P(W ≤ 0) = P(W ≥ 6).

Thus, the table informs us that P(W ≥ 6) = 0.125. From this, we can only deduce that our p-value P(W ≥ 5) > 0.125.

34 / 94
Example 1, Revisited

From software, we get the information that the value of the test statistic is 2, which equals our value w∗ minus 3(3 + 1)/4 = 3. Thus we learn that w∗ = 5, as we computed already by hand.

We also learn that the p-value of the two-sided test is 0.50. Since our alternative hypothesis is Ha+ and 2 > 0, we can conclude that the p-value for the one-sided test is 0.50/2 = 0.25, as we already computed.

35 / 94
Large Sample Test
When H0 is true, the expected value and variance of W are n(n + 1)/4 and n(n + 1)(2n + 1)/24, respectively. For large n, the standardized test statistic

    Z = (W − n(n + 1)/4)/√(n(n + 1)(2n + 1)/24)

will have approximately a N(0, 1) distribution. If we standardize the observed value of the test statistic, w∗, in exactly the same way to obtain

    z∗ = (w∗ − n(n + 1)/4)/√(n(n + 1)(2n + 1)/24),

we can compute the approximate p-values

    p+ = P(Z ≥ z∗), p− = P(Z ≤ z∗), p± = 2 min(p+, p−),

where Z is assumed N(0, 1).


36 / 94
Large Sample Test

There is an adjustment to the large sample formula if there are ties in the data. See p. 541 of the text for details.

37 / 94
Example 1, Revisited Again

We return to the grinding problem, using all 150 observations this time to illustrate the large sample test.

The computed value of W is w∗ = 7914. R gives the p-value P0(W ≥ 7914) < 0.0001 (its actual value is 7.22 × 10⁻⁶).

The normal approximation gives the standardized test statistic

    z∗ = (7914 − (150)(151)/4)/√((150)(151)(301)/24) = 4.22.

The resulting approximate p-value is

    P(Z ≥ 4.22) = 1.22 × 10⁻⁵.

In either case, we clearly reject the null hypothesis.

38 / 94
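The large-sample standardization is a one-liner in code. A sketch (function name mine, stdlib only) reproducing the z∗ above:

```python
from math import erf, sqrt

def signed_rank_z_upper(w_star, n):
    """Large-sample standardized signed-rank statistic and the
    upper-tail normal p-value P(Z >= z*)."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (w_star - mean) / sqrt(var)
    return z, 0.5 * (1 - erf(z / sqrt(2)))

z, p = signed_rank_z_upper(7914, 150)
# z is about 4.22, and p is on the order of 1e-5, as on the slide.
```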
Paired Comparisons

The sign test and the Wilcoxon signed rank test can also be used
for paired data by taking the difference of each pair of observations.
See the text, p. 535 and p. 542, for relevant examples.

39 / 94
Estimation Based on the Wilcoxon Signed Rank Test

As with the sign test, we can use the Wilcoxon signed rank test to develop point and interval estimators for the population median, θ.

To develop a point estimator of θ, we use the following equivalent formulation of the test statistic W when testing H0 : θ = θ0:

The Wilcoxon signed rank statistic W is the number of the n(n + 1)/2 averages

    (Yi + Yj)/2, 1 ≤ i ≤ j ≤ n,

that are greater than θ0. These averages are called Walsh averages.

40 / 94
Estimation Based on the Wilcoxon Signed Rank Test

The Walsh averages play the same role for the Wilcoxon signed
rank test that the original data play for the sign test: The test
statistic for the sign test is the number of observations greater
than θ0 , while the test statistic for the Wilcoxon signed rank test is
the number of Walsh averages greater than θ0 .

The same goes for estimating θ: The estimator based on the sign
test is the median of the data, while the estimator based on the
Wilcoxon signed rank test is the median of the Walsh averages.

41 / 94
Estimation Based on the Wilcoxon Signed Rank Test

A level L confidence interval for θ based on the Wilcoxon signed rank test consists of all values θ0 for which the test of
  H0 : θ = θ0
  Ha± : θ ≠ θ0
does not reject at the α = 1 − L significance level: that is, for which p± ≥ α (again: “inverting the test”).

42 / 94
Estimation Based on the Wilcoxon Signed Rank Test

Let

    Wᵃ(1) ≤ Wᵃ(2) ≤ . . . ≤ Wᵃ(n(n+1)/2)

denote the ordered Walsh averages. An exact symmetric level L confidence interval for θ is of the form

    (Wᵃ((n(n+1)/2)−k+1), Wᵃ(k)),

where for 5 ≤ n ≤ 25, k can be obtained from Table A.14, p. 765 of the text.

As with exact intervals based on the sign test, an exact level L interval cannot be computed for all possible levels L because of the discreteness of the distribution of W, but Table A.14 gives the values closest to the most commonly used values of L.

43 / 94
Estimation Based on the Wilcoxon Signed Rank Test

For large n, we may take k to be the integer closest to

    n(n + 1)/4 + z_{(1+L)/2} √(n(n + 1)(2n + 1)/24).

An approximate level L interval is then (Wᵃ((n(n+1)/2)−k+1), Wᵃ(k)).

Statistical software can output the sorted Walsh averages to a data set, which makes calculation of the point estimate and confidence interval for the median pretty simple, whether you are using Table A.14 or the large sample approximation.

44 / 94
Example 3
Consider the data set consisting of the six values: 17, 22, 3, 20, -4,
14. A point estimator of the population median is the median of
the 6(6 + 1)/2 = 21 Walsh averages shown in the table below.

Walsh average Value Walsh average Value


(y1 + y1 )/2 17.0 (y1 + y5 )/2 6.5
(y1 + y2 )/2 19.5 (y2 + y5 )/2 9.0
(y2 + y2 )/2 22.0 (y3 + y5 )/2 −0.5
(y1 + y3 )/2 10.0 (y4 + y5 )/2 8.0
(y2 + y3 )/2 12.5 (y5 + y5 )/2 −4.0
(y3 + y3 )/2 3.0 (y1 + y6 )/2 15.5
(y1 + y4 )/2 18.5 (y2 + y6 )/2 18.0
(y2 + y4 )/2 21.0 (y3 + y6 )/2 8.5
(y3 + y4 )/2 11.5 (y4 + y6 )/2 17.0
(y4 + y4 )/2 20.0 (y5 + y6 )/2 5.0
(y6 + y6 )/2 14.0

45 / 94
Example 3, Continued

The ordered Walsh averages are

    wᵃ(1) = −4.0 < wᵃ(2) = −0.5 < . . . < wᵃ(20) = 21.0 < wᵃ(21) = 22.0.

From Table A.14, we find that for L = 0.969, k = 21, and for L = 0.937, k = 20. Therefore, we obtain a level 0.969 confidence interval for θ as

    (wᵃ(21−21+1), wᵃ(21)) = (wᵃ(1), wᵃ(21)) = (−4.0, 22.0),

and a level 0.937 confidence interval as

    (wᵃ(21−20+1), wᵃ(20)) = (wᵃ(2), wᵃ(20)) = (−0.5, 21.0).

46 / 94
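A short Python sketch (function name mine) that reproduces the Walsh averages, the point estimate, and the k = 20 interval for Example 3:

```python
def walsh_averages(data):
    """All n(n+1)/2 Walsh averages (Y_i + Y_j)/2, i <= j, in sorted order."""
    n = len(data)
    return sorted((data[i] + data[j]) / 2 for i in range(n) for j in range(i, n))

w = walsh_averages([17, 22, 3, 20, -4, 14])
theta_hat = w[len(w) // 2]   # median of the 21 averages: the 11th smallest
ci_937 = (w[1], w[-2])       # (w_(2), w_(20)), the k = 20 interval
# theta_hat == 12.5 and ci_937 == (-0.5, 21.0).
```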
Example 3, Continued

While we would not advise using the large sample interval for n = 6, we will compute it now to illustrate the calculations. A level 0.95 interval will take

    k = 6(6 + 1)/4 + z_{0.975} √(6(6 + 1)(2(6) + 1)/24) = 10.5 + 1.96 √22.75 = 19.85.

Therefore, we will round k to 20, and the resulting interval is the same as the level 0.937 interval computed above.

47 / 94
The Wilcoxon Rank Sum Test
When we wish to compare the central locations of two independent populations, the Wilcoxon rank sum test is the appropriate rank-based analogue to the two sample t test.

We assume that the ith data value from population 1 is

    Y1,i = ε1,i, i = 1, . . . , n1,

and that the ith data value from population 2 is

    Y2,i = δ + ε2,i, i = 1, . . . , n2,

where the ε1,i and the ε2,i are independent and have the same continuous distribution. The parameter δ is a location shift, which means that the distribution of the data from population 2 is the distribution of the data from population 1 shifted δ units.

48 / 94
The Wilcoxon Rank Sum Test

The null hypothesis is

    H0 : δ = δ0,

where δ0 is a specific known value. If we take δ0 = 0, the null hypothesis is that both populations have identical distributions.

The alternative hypothesis can be any of the one- or two-sided alternatives:

  Ha+ : δ > δ0
  Ha− : δ < δ0
  Ha± : δ ≠ δ0

49 / 94
The Wilcoxon Rank Sum Test
To compute the test statistic, we follow these steps:
1. Create the adjusted observations Y2,i′ = Y2,i − δ0, i = 1, . . . , n2.
2. Rank all n1 + n2 values Y1,i, i = 1, . . . , n1, and Y2,i′, i = 1, . . . , n2, from smallest (rank 1) to largest (rank n1 + n2). Let R2,i, i = 1, . . . , n2, denote the resulting ranks of the Y2,i′.
3. The test statistic is the sum of the ranks belonging to the observations from population 2:

    V = Σ_{i=1}^{n2} R2,i.

50 / 94
The Wilcoxon Rank Sum Test

As with the one sample Wilcoxon test, we will use a permutation distribution to compute the p-value of the test.

We first note that if H0 is true, the Y1,i, i = 1, . . . , n1, and the Y2,i′, i = 1, . . . , n2, have exactly the same distribution.

51 / 94
The Wilcoxon Rank Sum Test

Suppose that Ha+ is the alternative hypothesis. Then if Ha+ is true, the Y2,i′ will tend to be larger than the Y1,i, so large values of the test statistic, V, will give evidence against H0 and in favor of Ha+.

Therefore, we can compute the p-value, p+, as the proportion of all appropriate permutations of the data which give values of the test statistic, V, at least as large as v∗, the observed value of V.

If the alternative hypothesis is Ha−, the p-value, p−, is the proportion of all appropriate permutations of the data which give values of the test statistic, V, no larger than v∗. If the alternative hypothesis is Ha±, the p-value is p± = 2 min(p+, p−).

52 / 94
The Wilcoxon Rank Sum Test

The appropriate permutations with which to compute the p-value are all possible divisions of the ranks 1, 2, . . . , n1 + n2 into two sets: one consisting of n1 of the ranks and the other consisting of the remaining n2 ranks.

There are

    C(n1 + n2, n2) = (n1 + n2)!/(n1! n2!)

such permutations.

53 / 94
The Wilcoxon Rank Sum Test

For permutation i, the value, vi∗, of the Wilcoxon statistic V is calculated by summing the ranks associated with the n2 observations in the second set. The p-value p+ is calculated as the proportion of the vi∗ as great or greater than the observed value v∗, and the p-value p− is calculated as the proportion of the vi∗ as small or smaller than the observed value v∗.

54 / 94
Example 4
A company makes die-cast automotive parts. It recently replaced
two of its die-casting machines with machines using dies of a
different design. Soon after the replacement, production personnel
began to suspect that the new dies did not last as long as the old
dies. The table shows cycles to failure for four randomly-selected
dies of the new type and two randomly-selected dies of the old
type.

Cycles to
Failure Ranks
Old 9477 4
Dies 13581 6
New 7651 2
Dies 8337 3
6989 1
9568 5

55 / 94
Example 4, Continued

▶ The Scientific Hypothesis The scientific hypothesis is that the new dies don't last as long as the old dies.
▶ The Statistical Model We assume the old dies come from population 1 and the new dies from population 2. The response is the lifetime of the die: the number of cycles until die failure.
▶ The Statistical Hypotheses The statistical hypotheses are
  H0 : δ = 0
  Ha− : δ < 0.
Notice that Ha− corresponds to the scientific hypothesis.

56 / 94
Example 4, Continued

▶ The Test Statistic The observed value of the test statistic is the sum of the ranks for the data from population 2:

    v∗ = 2 + 3 + 1 + 5 = 11.

Cycles to
Failure Ranks
Old 9477 4
Dies 13581 6
New 7651 2
Dies 8337 3
6989 1
9568 5

57 / 94
Example 4, Continued
 
▶ The P-Value The table lists all C(6, 4) = 15 possible assignments of the ranks into groups 1 and 2 and the values of the test statistic they produce.

Group 1   Group 2   V      Group 1   Group 2   V
1,2       3,4,5,6   18     3,4       1,2,5,6   14
1,3       2,4,5,6   17     3,5       1,2,4,6   13
1,4       2,3,5,6   16     2,6       1,3,4,5   13
2,3       1,4,5,6   16     3,6       1,2,4,5   12
1,5       2,3,4,6   15     4,5       1,2,3,6   12
2,4       1,3,5,6   15     4,6       1,2,3,5   11*
1,6       2,3,4,5   14     5,6       1,2,3,4   10
2,5       1,3,4,6   14

The p-value of the test is the proportion of these 15 V values at least as small as v∗ = 11. Since there are only 2 such values (the last two), the p-value is p− = 2/15 = 0.133.
58 / 94
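The same enumeration can be sketched with `itertools.combinations` (function name mine, stdlib only):

```python
from itertools import combinations

def rank_sum_perm_p_lower(n1, n2, v_star):
    """Lower-tail permutation p-value for the rank sum test: over all
    C(n1+n2, n2) ways to give n2 of the ranks to group 2, the fraction
    of rank sums V no larger than the observed v*."""
    ranks = range(1, n1 + n2 + 1)
    vs = [sum(g2) for g2 in combinations(ranks, n2)]
    return sum(v <= v_star for v in vs) / len(vs)

p_minus = rank_sum_perm_p_lower(2, 4, 11)
# p_minus == 2/15, about 0.133, reproducing Example 4.
```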
The Wilcoxon Rank Sum Test

Most statistical computer packages will compute the p-value for the Wilcoxon rank sum test.

Table A.13, p. 764 of the text, which gives relevant probabilities P(V ≤ v∗) for various values of v∗ and small values of n1 and n2, may also be used to compute p-values.

59 / 94
The Wilcoxon Rank Sum Test

It can be shown that if H0 is true, the mean and variance of V are n2(n1 + n2 + 1)/2 and n1n2(n1 + n2 + 1)/12, respectively. If n1 and n2 are large, the standardized test statistic

    Z = (V − n2(n1 + n2 + 1)/2)/√(n1n2(n1 + n2 + 1)/12)

has approximately a N(0, 1) distribution.

60 / 94
The Wilcoxon Rank Sum Test

If we standardize the observed value of the test statistic, v∗, in exactly the same way, using a continuity correction to make the approximation better, we obtain

    z+∗ = (v∗ − n2(n1 + n2 + 1)/2 − 0.5)/√(n1n2(n1 + n2 + 1)/12)

and

    z−∗ = (v∗ − n2(n1 + n2 + 1)/2 + 0.5)/√(n1n2(n1 + n2 + 1)/12),

and we can compute the approximate p-values

    p+ = P(Z ≥ z+∗), p− = P(Z ≤ z−∗), p± = 2 min(p+, p−),

where Z is assumed N(0, 1).

61 / 94
Example 4, Continued

Here, the p-value is p− = P(Z ≤ z−∗), where

    z−∗ = (11 − 4(2 + 4 + 1)/2 + 0.5)/√((2)(4)(2 + 4 + 1)/12) = −2.5/2.16 = −1.16.

Then p− = P(Z ≤ −1.16) = 0.123, reasonably close to the exact permutation p-value of 0.133.

62 / 94
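The continuity-corrected lower-tail calculation can be sketched as follows (function name mine, stdlib only):

```python
from math import erf, sqrt

def rank_sum_z_lower(v_star, n1, n2):
    """Continuity-corrected standardized rank-sum statistic and the
    lower-tail normal p-value P(Z <= z-*)."""
    mean = n2 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (v_star - mean + 0.5) / sd
    return z, 0.5 * (1 + erf(z / sqrt(2)))

z, p = rank_sum_z_lower(11, 2, 4)
# z is about -1.16 and p about 0.12, close to the exact 2/15 = 0.133.
```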
The Wilcoxon Rank Sum Test

Ties are handled exactly as for the Wilcoxon signed rank test: by
taking average ranks. See p. 547 of the text for details.

63 / 94
Estimation Based on the Wilcoxon Rank Sum Test
We can use the Wilcoxon rank sum test to develop point and interval estimators for the location shift, δ.

The Wilcoxon rank sum statistic V equals n2(n2 + 1)/2 plus the number of the n1n2 differences

    Y2,i − Y1,j, 1 ≤ i ≤ n2, 1 ≤ j ≤ n1,

that are greater than δ0.

Using reasoning similar to that used in the sign and Wilcoxon signed rank cases, we take the point estimator of δ to be the amount the Y2,i must be shifted so that the test statistic V regards the two samples as coming from the same population. The resulting estimator, δ̂, is the median of the n1n2 differences Y2,i − Y1,j, 1 ≤ i ≤ n2, 1 ≤ j ≤ n1.

64 / 94
Estimation Based on the Wilcoxon Rank Sum Test

A level L confidence interval for δ based on the Wilcoxon rank sum test consists of all values δ0 for which the test of
  H0 : δ = δ0
  Ha± : δ ≠ δ0
does not reject at the α = 1 − L significance level: that is, for which p± ≥ α. (Again: inverting the test!)

Let D(1) ≤ D(2) ≤ . . . ≤ D(n1n2) denote the differences Y2,i − Y1,j listed in ascending order.

An exact level L confidence interval for δ is of the form (D(n1n2−k+1), D(k)), where for 5 ≤ n1, n2 ≤ 12, k can be obtained from Table A.15, p. 766 of the text.

65 / 94
Estimation Based on the Wilcoxon Rank Sum Test

As with exact intervals based on the sign test and Wilcoxon signed rank tests, an exact level L interval cannot be computed for all desired levels L because of the discreteness of the distribution of D, but Table A.15 gives the values closest to the most commonly used values of L.

For large n1 and n2, we may take k in the confidence interval formula (D(n1n2−k+1), D(k)) to be the integer closest to

    n1n2/2 + z_{(1+L)/2} √(n1n2(n1 + n2 + 1)/12).

66 / 94
Example 4, Continued
Consider again the cycles to failure of the six dies. We want to estimate the difference in location, δ, between the population distributions of the cycles to failure of the old and the new dies. To do this, we form the differences

    7651 − 9477 = −1826    7651 − 13581 = −5930
    8337 − 9477 = −1140    8337 − 13581 = −5244
    6989 − 9477 = −2488    6989 − 13581 = −6592
    9568 − 9477 =    91    9568 − 13581 = −4013

The point estimator of δ is

    δ̂ = median(−6592, −5930, −5244, −4013, −2488, −1826, −1140, 91)
      = (−4013 + (−2488))/2 = −3250.5.

67 / 94
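This Hodges-Lehmann-type calculation is a one-liner in code. A sketch (function and variable names mine, stdlib only):

```python
from statistics import median

def shift_estimate(sample1, sample2):
    """Point estimate of the location shift delta: the median of all
    n1*n2 differences Y2_i - Y1_j."""
    diffs = sorted(y2 - y1 for y2 in sample2 for y1 in sample1)
    return median(diffs), diffs

old_dies = [9477, 13581]
new_dies = [7651, 8337, 6989, 9568]
delta_hat, diffs = shift_estimate(old_dies, new_dies)
# delta_hat == -3250.5, matching the hand computation above.
```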
Example 4, Continued

Looking at Table A.15, we find that the value of the constant k = 8 is associated with a confidence level L = 0.866. Therefore, a level 0.866 confidence interval for δ is

    (d((2)(4)−8+1), d(8)) = (d(1), d(8)) = (−6592, 91).

While we would not advise using the large sample interval for n1 = 2 and n2 = 4, we will compute it now to illustrate the calculations. A level 0.95 interval will take

    k = (2)(4)/2 + z_{0.975} √((2)(4)(2 + 4 + 1)/12) = 8.2.

Therefore, we will round k to 8, and the resulting interval is the same as the level 0.866 interval computed above.

68 / 94
Spearman Correlation

There are at least two problems with the Pearson correlation as a measure of the association between two variables:
▶ It only measures the strength of the linear association between the variables.
▶ It is not resistant to outliers.
Spearman's rho, also called the Spearman rank correlation coefficient, is a measure of association which helps remedy these two problems. It is also very easy to compute. Instead of dealing with the values of the original two variables, X and Y, we form their ranks RX and RY. Spearman's rho, rs, is just the Pearson correlation between RX and RY.

69 / 94
Example 5
Recall data on fuel consumption versus equivalence ratio, considered in Chapter 7:

Fuel          Equivalence   Rank of Fuel   Rank of
Consumption   Ratio         Consumption    Equivalence Ratio
98.0 0.64 1 1
100.0 0.65 2 2
100.1 0.66 3 3
102.0 0.74 6 4
101.0 0.75 4 5
103.0 0.77 7 7
103.2 0.76 8 6
101.9 0.80 5 8
104.0 0.81 9 9
105.0 0.88 10 10
105.5 0.90 11 11
105.6 0.91 12 12
106.0 0.92 13 13
110.0 1.00 14 14
111.0 1.02 15 15
115.0 1.04 16 16
121.5 1.14 17 17
123.5 1.16 18 18
136.0 1.24 19 19

70 / 94
Example 5, Continued
The scatterplots below show fuel consumption versus equivalence ratio (left) and the ranks of fuel consumption versus the ranks of equivalence ratio (right) for these data.

[Figure: left, fuel consumption vs. equivalence ratio; right, rank of fuel consumption vs. rank of equivalence ratio.]

71 / 94
Example 5, Continued

The first plot shows the association between fuel consumption and
equivalence ratio to be nonlinear, and hence the Pearson
correlation, which equals a respectable 0.9332, is not the most
appropriate summary of that association. The right scatterplot
shows a more nearly linear association for the ranks of the
variables. The stronger linear association is reflected in the higher
Pearson correlation, 0.9842, between the ranks of fuel consumption
and equivalence ratio. This Pearson correlation between the ranks
is exactly the Spearman rank correlation.

72 / 94
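That Spearman's rho really is just the Pearson correlation of the ranks can be made concrete with a short stdlib sketch (function names are mine; the rank pairs rx, ry are transcribed from the Example 5 table):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (mid)ranks."""
    def ranks(v):
        s = sorted(v)
        return [sum(i + 1 for i, u in enumerate(s) if u == t) / s.count(t)
                for t in v]
    return pearson(ranks(x), ranks(y))

# Rank pairs from the Example 5 table (no ties in either variable):
rx = [1, 2, 3, 6, 4, 7, 8, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
ry = [1, 2, 3, 4, 5, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
rho = pearson(rx, ry)
# rho is about 0.9842, the Spearman correlation reported on the slide.
```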
Example 5, Continued
To illustrate the effect of outliers on both the Pearson and
Spearman correlations, we added an outlier to the data. The added
observation has a fuel consumption of 95 and an equivalence ratio
of 2.1. The resulting outlier can be seen in the lower right corner
of the left scatterplot. The right scatterplot of the ranks shows
that this data value is still an outlier, but is less extreme.
[Scatterplots with the added outlier: Fuel Consumption vs. Equivalence
Ratio (left) and Rank of Fuel Consumption vs. Rank of Equivalence
Ratio (right).]

73 / 94
Example 5, Continued

The outlier greatly affects the Pearson correlation of fuel


consumption and equivalence ratio, reducing it from 0.9332 to
0.2319. In contrast, the outlier has less effect on the Spearman
correlation, which declines from 0.9842 to 0.7008. Because of the
problems Pearson correlation has with outliers and nonlinearity, we
recommend computing both the Pearson and Spearman correlation
routinely. Widely different values serve as a warning to look more
closely at the data.

74 / 94
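The effect of the outlier described above can be reproduced with a short script. This is a sketch using hypothetical pure-Python helpers; the data are transcribed from the table in Example 5, and the outlier (95, 2.1) is appended as in the text.

```python
# Sketch: compare Pearson and Spearman correlations before and after an outlier.
# pearson() and spearman() are hypothetical helpers, not from the text.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    # The data here have no ties, so simple positional ranks suffice
    def ranks(v):
        s = sorted(v)
        return [s.index(a) + 1 for a in v]
    return pearson(ranks(x), ranks(y))

fc = [98.0, 100.0, 100.1, 102.0, 101.0, 103.0, 103.2, 101.9, 104.0, 105.0,
      105.5, 105.6, 106.0, 110.0, 111.0, 115.0, 121.5, 123.5, 136.0]
er = [0.64, 0.65, 0.66, 0.74, 0.75, 0.77, 0.76, 0.80, 0.81, 0.88,
      0.90, 0.91, 0.92, 1.00, 1.02, 1.04, 1.14, 1.16, 1.24]

r_clean, rs_clean = pearson(fc, er), spearman(fc, er)     # roughly 0.93 and 0.9842
r_out = pearson(fc + [95.0], er + [2.1])                  # drops sharply
rs_out = spearman(fc + [95.0], er + [2.1])                # declines much less
```

The Spearman correlation degrades far less than the Pearson correlation, illustrating why computing both is a useful routine diagnostic.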
Spearman Correlation

Recall that the Pearson correlation measures the strength of linear


association between two variables in a set of data. On the other
hand, the Spearman correlation measures the strength of
monotone association: that is, the linear association between the
ranks of the data values. The plots on the next slide illustrate
what this means.

75 / 94
Spearman Correlation

Figure: Examples of, top: positive monotone association (left), and
corresponding ranks (right); middle: negative monotone association (left),
and corresponding ranks (right); bottom: nonlinear non-monotone
association (left), and corresponding ranks (right). The Pearson
correlations are (top to bottom): 0.9183, −0.9373, and 0.1006, while the
Spearman correlations are 1, −1 and 0.1693, respectively.

76 / 94
Spearman Correlation

If the data are a random sample from a larger population, we may


want to conduct a test of

H0 : no monotone association between X and Y in the population.

versus

Ha+ : positive monotone association


Ha− : negative monotone association
Ha± : nonzero monotone association.

77 / 94
Example 5.5

As with other rank-based procedures, we can base the test on a


permutation distribution.

To illustrate, suppose we have the following data:

X   2.3   1.7   4.4   0.5
Y   3.7   1.2   2.2   1.7

The Spearman correlation is rs = 0.6.

78 / 94
Example 5.5, Continued
Here is the permutation distribution of the ranks:
RX: 1 2 3 4
Spearman
Permutation RY values correlation
1 4 3 2 1 −1.0
2 3 4 2 1 −0.8
3 4 2 3 1 −0.8
4 4 3 1 2 −0.8
5 3 4 1 2 −0.6
6 4 2 1 3 −0.4
7 2 4 3 1 −0.4
8 4 1 3 2 −0.4
9 3 2 4 1 −0.4
10 2 3 4 1 −0.2
11 4 1 2 3 −0.2
12 2 4 1 3 0.0
13 3 1 4 2 0.0
14 1 4 3 2 0.2
15 3 2 1 4 0.2
16 2 3 1 4 0.4
17 3 1 2 4 0.4
18 1 3 4 2 0.4
19 1 4 2 3 0.4
20 2 1 4 3 0.6*
21 1 3 2 4 0.8
22 1 2 4 3 0.8
23 2 1 3 4 0.8
24 1 2 3 4 1.0

(* marks the observed value)

79 / 94
Example 5.5, Continued

The p-values are:

Alternative Hypothesis     p-value
Ha+                        p+ =
Ha−                        p− =
Ha±                        p± =

80 / 94
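The permutation distribution in the table above can be generated exhaustively, since with n = 4 there are only 4! = 24 equally likely arrangements of the ranks. The sketch below uses a hypothetical helper based on the standard no-ties shortcut rs = 1 − 6Σd²/(n(n² − 1)):

```python
# Sketch: exact permutation test for Spearman's rho with n = 4.
from itertools import permutations

def spearman_from_ranks(rx, ry):
    """Spearman's rho for untied ranks via 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

rx = (1, 2, 3, 4)
observed = 0.6   # Spearman correlation of the example data

# All 24 equally likely arrangements of the Y ranks against fixed X ranks
dist = [spearman_from_ranks(rx, p) for p in permutations((1, 2, 3, 4))]

# Upper-tail p-value for Ha+: proportion of permutations with rho >= observed
# (small tolerance guards against floating-point comparison at the boundary)
p_plus = sum(r >= observed - 1e-12 for r in dist) / len(dist)
```

Counting down the table, five of the 24 arrangements give rs ≥ 0.6, so p+ = 5/24 ≈ 0.208.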
Spearman Correlation

Table A.10, p. 761 of the text gives values of p+ for use in testing
monotone association for samples of size 10 and less. For larger
samples, an approximate test of H0 may be obtained from the fact
that rs √((n − 2)/(1 − rs²)) has approximately a tn−2 distribution
under the assumption of no monotone association.

81 / 94
Example 5.75

Consider again the data on fuel consumption (FC) and equivalence


ratio (ER). Since there are n = 19 observations, an approximate
test of
H0 : No monotone association between FC and ER.
Ha + : Positive monotone association between FC and ER.
is obtained by finding the proportion of a t17 population that
exceeds the observed value of the test statistic:
t∗ = rs √((n − 2)/(1 − rs²)) = 0.9842 √(17/(1 − 0.9842²)) = 22.919.

This proportion, which is the p-value, equals 1.6 × 10−14 , which is


very strong evidence of a positive monotone association.

82 / 94
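The test statistic in Example 5.75 can be checked numerically. This is a sketch (spearman_t is a hypothetical helper); the p-value itself requires a t table or statistical software, so only the statistic is computed here.

```python
# Sketch: large-sample test statistic for Spearman's rho.
import math

def spearman_t(rs, n):
    """Statistic rs * sqrt((n - 2)/(1 - rs^2)), approximately t_{n-2} under H0."""
    return rs * math.sqrt((n - 2) / (1 - rs ** 2))

t_star = spearman_t(0.9842, 19)   # fuel-consumption example: n = 19, so df = 17
```

The result matches the value 22.919 computed in the example, far in the upper tail of a t17 distribution.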
The Kruskal-Wallis Test

Recall the one-way effects model described in Chapter 9:

Yij = µ + τi + εij,   j = 1, . . . , ni,   i = 1, . . . , k,

where Yij is the jth response from population i, µ is an overall
location measure, τi is the effect due to population i, and εij is the
random error associated with observation Yij.

As in Chapter 9, we assume that the τi sum to zero, and that the
εij are independent random variables having the same distribution.
However, we do not assume the εij have a normal distribution. We
assume only that they have the same continuous distribution.

83 / 94
The Kruskal-Wallis Test

To test for differences in population effects τi , the appropriate


rank-based analogue of the F test of Chapter 9 is the
Kruskal-Wallis Test.

We summarize the Kruskal-Wallis procedure as follows:


I The Statistical Model We assume data from the one-way
effects model.
I The Statistical Hypotheses The hypotheses to be tested
are
H0 : τ1 = τ2 = · · · = τk = 0
Ha : Not all the population effects τi are 0.

84 / 94
The Kruskal-Wallis Test
I The Test Statistic To compute the test statistic, follow
these steps:
1. Rank all n = n1 + · · · + nk observations Yij. Let Rij denote
   the rank of Yij.
2. Compute the rank sum and rank mean for each sample, and the
   overall rank mean:

      Ri· = Σj Rij,   R̄i· = (1/ni) Σj Rij,   R̄·· = (1/n) Σi Σj Rij = (n + 1)/2.

3. The test statistic H is

      H = [12/(n(n + 1))] Σi ni (R̄i· − R̄··)²
        = [12/(n(n + 1))] Σi Ri·²/ni − 3(n + 1).

In computing H, note that the quantity Σi ni (R̄i· − R̄··)² is
the model sum of squares found in the ANOVA table obtained
from the one-way model when the responses are replaced by
their ranks.
85 / 94
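The steps above can be sketched as a short function (a hypothetical helper; it assumes no ties, so plain positional ranks suffice):

```python
# Sketch: Kruskal-Wallis statistic from a list of samples (no ties assumed).

def kruskal_wallis_H(groups):
    """H = 12/(n(n+1)) * sum_i R_i^2 / n_i - 3(n+1), using pooled ranks."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}   # rank of each pooled value
    n = len(pooled)
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)
```

For instance, two completely separated samples of two observations each, [1, 2] and [3, 4], have rank sums 3 and 7 and give H = 2.4, the largest value possible for that configuration.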
The Kruskal-Wallis Test

I The P-Value Let h∗ denote the observed value of the test


statistic H. The p-value is p = P0 (H ≥ h∗ ), where the
notation P0 signifies that the probability is computed under
the assumption that H0 is true.
For small k and ni , tables of critical values of H have been
developed, but they are rather extensive and we do not
present them here. Rather, we take two different approaches:

86 / 94
The Kruskal-Wallis Test

o Exact p-Values We can use the computer to calculate exact


p-values.
o Large Sample Approximation When sample sizes are large
(ni ≥ 5 for all i is a common rule of thumb), the distribution
of H under the null hypothesis can be approximated by a χ2
distribution with k − 1 degrees of freedom. This means that
an approximate p-value is P(χ2k−1 ≥ h∗ ), the area under a
χ2k−1 density at or above the observed value of the test
statistic.

87 / 94
The Kruskal-Wallis Test

It is generally assumed that the observations Yij are from a


continuous distribution model, which implies that there should be
no ties (i.e., no two Yij should be equal). However, ties do occur in
practice. When ties do occur, the usual approach is to use average
ranks, as described earlier.
When there are ties and the large-sample test is used, it is common
practice to use a modified test statistic H′. The formula is found in
the text.

88 / 94
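The average-rank convention mentioned above can be sketched as follows (a hypothetical helper; it implements only the ranking step, not the modified statistic H′ from the text):

```python
# Sketch: rank values 1..n, giving tied values the average of the ranks
# they jointly occupy.

def average_ranks(values):
    sorted_vals = sorted(values)
    first = {}    # first (1-based) position of each value in the sorted order
    for i, v in enumerate(sorted_vals):
        first.setdefault(v, i + 1)
    count = {}    # number of occurrences of each value
    for v in sorted_vals:
        count[v] = count.get(v, 0) + 1
    # A value occupying positions first..first+count-1 gets their average
    return [first[v] + (count[v] - 1) / 2 for v in values]
```

For example, in the sample 3.0, 1.0, 2.0, 2.0 the two tied 2.0's occupy ranks 2 and 3, so each receives rank 2.5.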
Example 6

Mucociliary clearance is a mechanism to remove foreign matter


from the respiratory system. A study compared the mucociliary
clearance of three groups of subjects: one with normal respiratory
function, one with obstructive airways disease (OAD), and one
with asbestosis (ASBT). The measured response was the half-time
of mucociliary clearance (HTMC). The data, found in MUCO, are:

89 / 94
Example 6, Continued

Group HTMC, Yij Ranks of Yij


normal 2.9 8
normal 3.0 9
normal 2.5 4
normal 2.6 5
normal 3.2 10
OAD 3.8 13
OAD 2.7 6
OAD 4.0 14
OAD 2.4 3
ASBT 2.8 7
ASBT 3.4 11
ASBT 3.7 12
ASBT 2.2 2
ASBT 2.0 1

90 / 94
Example 6, Continued

I The Scientific Hypothesis The scientific hypothesis is


that the groups differ in mucociliary clearance function.
I The Statistical Model We assume data from the one-way
model with k = 3 populations corresponding to the three
groups.
I The Statistical Hypotheses The hypotheses to be tested
are
H0 : τ1 = τ2 = τ3 = 0
Ha : Not all the population effects τi are 0.

91 / 94
Example 6, Continued

I The Test Statistic If we denote the normal, OAD and


ASBT groups as populations 1, 2 and 3, respectively, then
n1 = 5, n2 = 4, n3 = 5, n = 14 and

R1· = 8 + 9 + 4 + 5 + 10 = 36; R2· = 13 + 6 + 14 + 3 = 36;

R3· = 7 + 11 + 12 + 2 + 1 = 33,
so that
H = [12/((14)(15))] (36²/5 + 36²/4 + 33²/5) − 3(14 + 1) = 0.771.

92 / 94
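The arithmetic in the test-statistic step above can be checked in a few lines. This sketch also computes the large-sample approximate p-value; for k − 1 = 2 degrees of freedom the chi-square tail probability reduces to exp(−h/2), so no special library is needed.

```python
# Sketch: Kruskal-Wallis statistic and chi-square approximation for Example 6.
import math

rank_sums = {"normal": 36, "OAD": 36, "ASBT": 33}
sizes = {"normal": 5, "OAD": 4, "ASBT": 5}
n = sum(sizes.values())   # 14

H = 12 / (n * (n + 1)) * sum(
    rank_sums[g] ** 2 / sizes[g] for g in sizes
) - 3 * (n + 1)

# For df = 2 the chi-square survival function is exactly exp(-h/2)
p_approx = math.exp(-H / 2)
```

This reproduces H = 0.771 and the approximate p-value 0.680 reported in the example.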
Example 6, Continued

I The P-Value The exact p-value of this test, which can be
computed using software, equals 0.71.

For illustration purposes, we will use the large-sample
approximation here, even though not all samples have size at
least 5. The approximate p-value is
P(χ22 ≥ 0.771) = 0.680. Based on this result, we do not
reject the null hypothesis: there is insufficient evidence to
conclude that one or more of the τi differ from 0.

93 / 94
Summary: Distribution-Free Inference Procedures

Data Type                           Inference about          Inference based on
Single population, arbitrary shape  Median                   Sign test
Single population, symmetric        Median                   Wilcoxon signed rank test
Two populations, same shape         Location shift           Wilcoxon rank sum test
Multiple populations, same shape    Population effects       Kruskal-Wallis test
                                    (location shifts)
X-Y data                            Monotone association     Spearman correlation

94 / 94
