
Applied Statistics II

Chapter 11 Distribution-Free Inference

Jian Zou

WPI Mathematical Sciences

1 / 94
Introduction

The Sign Test

Rank-Based Methods

2 / 94
The Need for Distribution-Free Statistical Methods

Though they do perform well under normal circumstances, classical inference methods, such as those considered in Chapters 5 (confidence intervals) and 6 (hypothesis tests), do have some weaknesses, particularly when sample sizes are small:
▶ Classical inference methods rely on assumptions about the underlying distribution model for the data. Usually, they assume the distribution is one of a particular class of distributions, such as the class of all normal distributions.
▶ These methods can be sensitive to outliers and nonnormality.

3 / 94
The Need for Distribution-Free Statistical Methods

In this chapter we present some inference methods which were developed to help remedy the weaknesses of classical inference methods. Because these methods do not rely on a specific population distribution model, such as the normal, they are called distribution-free inference methods.

As we did in Chapters 5 and 6, we begin by assuming that the data constitute a random sample from a population to which we wish to make inference.

4 / 94
The Sign Test

A variant of the sign test is reported to have been used by Arbuthnott in 1710 to test whether the proportion of male births in London was greater than 1/2. If so, the sign test is one of the oldest of all statistical procedures.

5 / 94
The Sign Test

▶ The Statistical Model We assume data Y1, . . . , Yn from a population with a continuous distribution¹ having median θ.

▶ The Statistical Hypotheses H0 : θ = θ0, versus one of the following:
  Ha+ : θ > θ0
  Ha− : θ < θ0
  Ha± : θ ≠ θ0

¹Continuous means that we regard the proportion of the population taking any single value to be 0.
6 / 94
The Sign Test

▶ The Test Statistic The test statistic for the sign test is the number of observations Yi which exceed θ0. We will call this statistic B. The observed value of B, b∗, is the number we get when we plug the data values into B.
▶ The p-value The p-value is the proportion of all samples, when the null hypothesis is true, that have a value of the test statistic giving as much or more evidence against the null hypothesis and in favor of the alternative hypothesis as b∗.

7 / 94
The Sign Test

Thus, in order to compute the p-value, we need to know:


▶ The distribution of B when H0 is true.
▶ When the test statistic gives as much or more evidence against the null hypothesis and in favor of the alternative hypothesis as b∗.

8 / 94
The Sign Test

▶ The Distribution of B When H0 is True: If H0 is true, the chance any of the Yi is greater than θ0 is 1/2. Therefore B, the number of Yi greater than θ0, has the same distribution as the number of heads in n tosses of a fair coin.

We know this distribution very well: b(n, 0.5).

9 / 94
The Sign Test

▶ What Is Evidence Against the Null and in Favor of the Alternative Hypothesis?
  ▶ For Ha+ : θ > θ0, large values of B are more consistent with Ha+ and less consistent with H0. Therefore, the p-value is p+ = P(B ≥ b∗).
  ▶ For Ha− : θ < θ0, the p-value is p− = P(B ≤ b∗).
  ▶ For Ha± : θ ≠ θ0, the p-value is p± = 2 min(p+, p−).

10 / 94
Large Sample Approximation
By the Central Limit Theorem, we know that if B ∼ b(n, p) and if n is large (np > 10 and n(1 − p) > 10), then, approximately,

    (B − np)/√(np(1 − p)) ∼ N(0, 1).

To obtain an approximate p-value using the normal approximation with the continuity correction, we first assume H0 is true. Then p = 0.5, so np = 0.5n and np(1 − p) = 0.25n. If the alternative hypothesis is Ha+, the p-value is

    p+ = P(B ≥ b∗) = P(B ≥ b∗ − 0.5)
       = P((B − 0.5n)/√(0.25n) ≥ (b∗ − 0.5n − 0.5)/√(0.25n)) ≈ P(Z ≥ zu∗),

where

    Z ∼ N(0, 1), and zu∗ = (b∗ − 0.5n − 0.5)/√(0.25n).
11 / 94
Large Sample Approximation

Similarly, we may approximate p− by P(Z ≤ zl∗), where

    zl∗ = (b∗ − 0.5n + 0.5)/√(0.25n).

We then approximate p± by 2 min(P(Z ≥ zu∗), P(Z ≤ zl∗)).

12 / 94
Example 1

One stage of a manufacturing process involves a manually-controlled grinding operation. Management suspects that the grinding machine operators tend to grind parts slightly larger rather than slightly smaller than the target diameter of 0.75 inches, while still staying within specification limits. To verify their suspicion, they sample 150 within-spec parts and find that 93 have diameters above the target diameter. Is this strong evidence in support of their suspicion?

13 / 94
Example 1

▶ The Scientific Hypothesis The grinding machine operators tend to grind parts slightly larger than the target diameter.
▶ The Statistical Model The median diameter of the population of all ground parts is θ.
▶ The Statistical Hypotheses
  H0 : θ = 0.75
  Ha+ : θ > 0.75
▶ The Test Statistic The test statistic is B, the number of parts with diameters larger than the target of 0.75 inches. Its observed value is b∗ = 93.

14 / 94
Example 1

▶ The P-Value The p-value is P(B ≥ 93) = 0.0021, where B ∼ b(150, 0.5).
Since np = n(1 − p) = 150 × 0.5 = 75 > 10, we can approximate this p-value by using the normal approximation:

    zu∗ = (93 − (150)(0.5) − 0.5)/√((150)(0.5)(0.5)) = 2.86,

so the approximate p-value is

    P(Z ≥ 2.86) = 0.0021.

Result: reject H0 in favor of Ha+ and conclude that more than half the parts are ground too large.

15 / 94
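The calculations in Example 1 are easy to reproduce. Below is a minimal Python sketch (the function name and structure are mine, not the text's) that computes both the exact binomial p-value and the continuity-corrected normal approximation, using only the standard library:

```python
from math import comb, erf, sqrt

def sign_test_upper_p(b_star, n):
    """Sign test p-values for Ha+ : theta > theta0.

    Returns the exact P(B >= b*) for B ~ b(n, 0.5), and the
    continuity-corrected normal approximation P(Z >= z_u*).
    """
    # Exact binomial upper tail: P(B >= b*)
    exact = sum(comb(n, k) for k in range(b_star, n + 1)) / 2 ** n
    # Continuity-corrected standardized statistic z_u*
    z_u = (b_star - 0.5 * n - 0.5) / sqrt(0.25 * n)
    # P(Z >= z_u*) via the standard normal CDF written with erf
    approx = 0.5 * (1 - erf(z_u / sqrt(2)))
    return exact, approx

exact, approx = sign_test_upper_p(93, 150)
# Both p-values come out near 0.0021, matching the slide.
```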
Estimation Based on the Sign Test

A point estimator for the population median θ can be obtained by reasoning as follows.

If θ = θ0, the distribution of the test statistic B is symmetric about its mean n/2. Therefore, an estimator of θ will be the amount θ̂ that we have to subtract from each data value Yi so that the value of B computed from Y1 − θ̂, . . . , Yn − θ̂ equals n/2. It turns out that this estimator, θ̂, is the median of the data values Y1, . . . , Yn.

16 / 94
Estimation Based on the Sign Test

A level L confidence interval for θ based on the sign test consists of all values θ0 for which the test of
  H0 : θ = θ0
  Ha± : θ ≠ θ0
does not reject at the α = 1 − L significance level: that is, for which p± ≥ α (this is called “inverting the test”).

In a practical sense, an exact level L interval can seldom be computed because of the discreteness of the binomial distribution, but exact intervals can be computed for levels close to L. One method for doing so is the following:

17 / 94
Estimation Based on the Sign Test

1. Find an integer k so that the value α0 = 2P(X ≥ k) = 2P(X ≤ n − k) is as close to α as possible, where X ∼ b(n, 0.5).
2. Order the observations from smallest to largest, denoting the ordered values Y(1) ≤ Y(2) ≤ · · · ≤ Y(n).
3. A level L0 = 1 − α0 confidence interval for θ is (Y(n−k+1), Y(k)).

Large Sample Approximation:
For large n, we may take k in step 1 to be the integer closest to

    n/2 + z_{(1+L)/2} √(n/2).

The interval is then computed as described in steps 2 and 3.

18 / 94
Example 2

The selling prices of 11 randomly-selected residential properties in Worcester County are ($1000's):

115, 179, 199, 222, 225, 247, 276, 319, 342, 543, 798

Using these data, we estimate the median house price of residential properties in Worcester County to be $247,000: the median of the data.

19 / 94
Example 2, Continued
To obtain a 95% confidence interval for the median house price, we look for a value k so that for X ∼ b(11, 0.5), 2P(X ≤ 11 − k) is as close as possible to 0.05. We can use a table, a calculator, software (R, SAS, . . . ), or some other technology to do this.

What we find is that 2P(X ≤ 1) = 0.0117 and 2P(X ≤ 2) = 0.0654, so we can take either k = 10 or k = 9. These choices give
choices give

k    (y(n−k+1), y(k))               confidence level
9    (y(3), y(9)) = (199, 342)      1 − 0.0654 = 0.9346
10   (y(2), y(10)) = (179, 543)     1 − 0.0117 = 0.9883

The large sample solution (though n = 11 is not exactly large sample) gives k ≈ 11/2 + 1.96 √(11/2) = 10.09, which rounds to 10 and gives the second interval above.
20 / 94
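The exact interval search in Example 2 can be sketched as follows (stdlib-only Python; the function name and the search over k are my own, not the text's):

```python
from math import comb

def sign_test_ci(data, alpha=0.05):
    """Median CI by inverting the sign test (steps 1-3 from the text).

    Finds the k > n/2 whose two-sided tail 2*P(X <= n-k), X ~ b(n, 0.5),
    is closest to alpha, then returns (Y_(n-k+1), Y_(k)) and the
    achieved confidence level.
    """
    n = len(data)
    y = sorted(data)

    def tail(k):  # 2 * P(X <= n - k) for X ~ b(n, 0.5)
        return 2 * sum(comb(n, j) for j in range(n - k + 1)) / 2 ** n

    k = min(range(n // 2 + 1, n + 1), key=lambda k: abs(tail(k) - alpha))
    return k, (y[n - k], y[k - 1]), 1 - tail(k)

prices = [115, 179, 199, 222, 225, 247, 276, 319, 342, 543, 798]
k, interval, level = sign_test_ci(prices)
# k = 9 gives (199, 342) at achieved level 0.9346: the first row of the table.
```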
Rank-Based Procedures

As their name implies, rank-based procedures make use of the ranks of observations instead of the observations themselves.

For example, here's how we assign ranks to the set of observations 1.2, 3.7, 0.6, 3.7, 3.7, 9.7, −1.1, 9.7:

First, sort the values:     −1.1  0.6  1.2  3.7  3.7  3.7  9.7  9.7
Next, assign integers:         1    2    3    4    5    6    7    8
Finally, compute ranks:        1    2    3    5    5    5  7.5  7.5

(Note how the integers are averaged for equal data values.)

21 / 94
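The midrank rule (tied values receive the average of the integer ranks they occupy) can be sketched in plain Python; `midranks` is my name for it, not the text's:

```python
def midranks(values):
    """Rank the values from 1 to n, giving tied values the average
    of the integer ranks they occupy."""
    order = sorted(values)
    return [
        # average of the 1-based sorted positions at which x occurs
        sum(i + 1 for i, v in enumerate(order) if v == x) / order.count(x)
        for x in values
    ]

# The slide's example, in original order:
ranks = midranks([1.2, 3.7, 0.6, 3.7, 3.7, 9.7, -1.1, 9.7])
# ranks == [3, 5, 2, 5, 5, 7.5, 1, 7.5]
```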
The Wilcoxon Signed-Rank Test

▶ The Statistical Model We assume data Y1, . . . , Yn from a population with a symmetric distribution centered at θ. This means θ is the median and, if the population has a mean, the mean as well.

▶ The Statistical Hypotheses H0 : θ = θ0, versus one of the following:
  Ha+ : θ > θ0
  Ha− : θ < θ0
  Ha± : θ ≠ θ0

22 / 94
The Wilcoxon Signed-Rank Test

▶ The Test Statistic To compute the test statistic, follow these steps:
1. Center the observations by subtracting from each the population median under H0. The resulting centered observations are Yi′ = Yi − θ0, i = 1, . . . , n.
2. Compute the ranks of the absolute values of the centered observations: Ri = rank(|Yi′|), i = 1, . . . , n.
3. The test statistic W is the sum of the ranks Ri for those i corresponding to positive Yi′ values:

    W = Σ_{i : Yi′ > 0} Ri

23 / 94
The Wilcoxon Signed-Rank Test

▶ The P-Value Let w∗ denote the observed value of the test statistic W. Computation of the p-value depends on the alternative hypothesis (P0(A) denotes the proportion of samples for which A occurs, given H0 is true):

Alternative Hypothesis    p-value
Ha+ : θ > θ0              p+ = P0(W ≥ w∗)
Ha− : θ < θ0              p− = P0(W ≤ w∗)
Ha± : θ ≠ θ0              p± = 2 min(p+, p−)

24 / 94
Example 1, Revisited

One stage of a manufacturing process involves a manually-controlled grinding operation. Management suspects that the grinding machine operators tend to grind parts slightly larger rather than slightly smaller than the target diameter of 0.75 inches, while still staying within specification limits.

To illustrate the ideas, we will assume their random sample consists of 3 rather than 150 parts.

They are willing to assume the distribution of ground diameters is symmetric about a median θ. They wish to test the hypotheses
  H0 : θ = 0.75
  Ha+ : θ > 0.75

25 / 94
Example 1, Revisited

Here are the computations for obtaining w∗, the observed value of W:

Original   Centered           Absolute
Data       Data               Values   Ranks               Contribution
yi         yi′ = yi − 0.75    |yi′|    ri = rank(|yi′|)    to w∗
0.7533     0.0033             0.0033   2                   2
0.7485     −0.0015            0.0015   1                   0
0.7578     0.0078             0.0078   3                   3

So w∗ = 2 + 0 + 3 = 5.

26 / 94
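The three steps can be coded directly. A minimal sketch (assumes no ties among the absolute centered values; the function name is mine):

```python
def signed_rank_w(data, theta0):
    """Wilcoxon signed-rank statistic: the sum of the ranks of
    |Y_i - theta0| over observations with Y_i - theta0 > 0
    (no ties among the absolute values assumed)."""
    centered = [y - theta0 for y in data]             # step 1: center
    abs_sorted = sorted(abs(c) for c in centered)
    rank = lambda c: abs_sorted.index(abs(c)) + 1     # step 2: rank |Yi'|
    return sum(rank(c) for c in centered if c > 0)    # step 3: sum over Yi' > 0

w_star = signed_rank_w([0.7533, 0.7485, 0.7578], 0.75)
# w_star == 5, as in the table above.
```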
Example 1, Revisited

The p-value is computed from the permutation distribution of W: the distribution of all values of the test statistic that would have resulted under an appropriate permutation of the original data.

27 / 94
Example 1, Revisited

Here is the idea behind the permutation distribution:
▶ If H0 is true, the Yi′ are independent and from the same distribution having median 0.
▶ Therefore, prior to sampling, each is equally likely to be positive or negative.
▶ In fact, because of the symmetry assumption, Yi′ and −Yi′ have the same distribution, so it is just as likely that we observe (Y1′, Y2′, Y3′), or (−Y1′, Y2′, Y3′), or (−Y1′, Y2′, −Y3′), or data resulting from any other rearrangement of the signs of the Yi′.

28 / 94
Example 1, Revisited

The table below shows the values of W resulting from each of the 8 different possible selections of signs for the Yi′.

Signs                 Assumed Yi′                    Contribution to W    W
(Y1′, Y2′, Y3′)       (0.0033, −0.0015, 0.0078)      (2, 0, 3)            5*
(Y1′, −Y2′, Y3′)      (0.0033, 0.0015, 0.0078)       (2, 1, 3)            6
(Y1′, Y2′, −Y3′)      (0.0033, −0.0015, −0.0078)     (2, 0, 0)            2
(−Y1′, Y2′, Y3′)      (−0.0033, −0.0015, 0.0078)     (0, 0, 3)            3
(−Y1′, −Y2′, Y3′)     (−0.0033, 0.0015, 0.0078)      (0, 1, 3)            4
(Y1′, −Y2′, −Y3′)     (0.0033, 0.0015, −0.0078)      (2, 1, 0)            3
(−Y1′, Y2′, −Y3′)     (−0.0033, −0.0015, −0.0078)    (0, 0, 0)            0
(−Y1′, −Y2′, −Y3′)    (−0.0033, 0.0015, −0.0078)     (0, 1, 0)            1

Note that the first line corresponds to the observed data, and so its value is W = w∗ = 5.

29 / 94
Example 1, Revisited

All 8 values of W in the rightmost column (5, 6, 2, 3, 4, 3, 0, 1) comprise the permutation distribution of W.

Signs                 Assumed Yi′                    Contribution to W    W
(Y1′, Y2′, Y3′)       (0.0033, −0.0015, 0.0078)      (2, 0, 3)            5*
(Y1′, −Y2′, Y3′)      (0.0033, 0.0015, 0.0078)       (2, 1, 3)            6
(Y1′, Y2′, −Y3′)      (0.0033, −0.0015, −0.0078)     (2, 0, 0)            2
(−Y1′, Y2′, Y3′)      (−0.0033, −0.0015, 0.0078)     (0, 0, 3)            3
(−Y1′, −Y2′, Y3′)     (−0.0033, 0.0015, 0.0078)      (0, 1, 3)            4
(Y1′, −Y2′, −Y3′)     (0.0033, 0.0015, −0.0078)      (2, 1, 0)            3
(−Y1′, Y2′, −Y3′)     (−0.0033, −0.0015, −0.0078)    (0, 0, 0)            0
(−Y1′, −Y2′, −Y3′)    (−0.0033, 0.0015, −0.0078)     (0, 1, 0)            1

30 / 94
Example 1, Revisited

The p-value is computed as the proportion of these permutation distribution values that are at least as large as the observed value w∗ = 5. (Why? Because the alternative hypothesis is Ha+ : θ > 0.75, so large values of W provide evidence against H0 and in favor of Ha+.)

In computing the p-value this way, we are measuring how extreme the actually-observed value, w∗, is with respect to all the other values of the test statistic that, under H0, are equally valid.

31 / 94
Example 1, Revisited

Since exactly two of the 8 values (the first two) are at least as large as w∗, the p-value = 2/8 = 0.25.

Signs                 Assumed Yi′                    Contribution to W    W
(Y1′, Y2′, Y3′)       (0.0033, −0.0015, 0.0078)      (2, 0, 3)            5*
(Y1′, −Y2′, Y3′)      (0.0033, 0.0015, 0.0078)       (2, 1, 3)            6
(Y1′, Y2′, −Y3′)      (0.0033, −0.0015, −0.0078)     (2, 0, 0)            2
(−Y1′, Y2′, Y3′)      (−0.0033, −0.0015, 0.0078)     (0, 0, 3)            3
(−Y1′, −Y2′, Y3′)     (−0.0033, 0.0015, 0.0078)      (0, 1, 3)            4
(Y1′, −Y2′, −Y3′)     (0.0033, 0.0015, −0.0078)      (2, 1, 0)            3
(−Y1′, Y2′, −Y3′)     (−0.0033, −0.0015, −0.0078)    (0, 0, 0)            0
(−Y1′, −Y2′, −Y3′)    (−0.0033, 0.0015, −0.0078)     (0, 1, 0)            1

32 / 94
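The enumeration over all 2^n sign assignments can be sketched as follows (stdlib-only; the function name is mine, and no ties among the absolute values are assumed):

```python
from itertools import product

def signed_rank_perm_p_upper(centered, w_star):
    """Upper-tail permutation p-value for the signed-rank test:
    enumerate all 2^n sign assignments of the centered data and count
    how often W is at least the observed value w*."""
    abs_sorted = sorted(abs(c) for c in centered)
    ranks = [abs_sorted.index(abs(c)) + 1 for c in centered]
    perm_w = [
        sum(r for r, s in zip(ranks, signs) if s > 0)  # W for this sign choice
        for signs in product([-1, 1], repeat=len(centered))
    ]
    return sum(w >= w_star for w in perm_w) / len(perm_w)

p = signed_rank_perm_p_upper([0.0033, -0.0015, 0.0078], 5)
# p == 2/8 == 0.25, matching the table.
```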
Tabulated Distribution

Table A.12, p. 763 of the text provides exact values of P0(W ≤ w∗) for n = 3, 4, . . . , 20, where P0(W ≤ w∗) is the proportion of permutation distribution values that are no larger than the observed value w∗. These values are tabulated for w∗ that might be expected to lead to rejection of H0.

33 / 94
Example 1, Revisited

For this example, the only tabled value tells us that P(W ≤ 0) = 0.125.

We also know that P(W ≤ w∗) = P(W ≥ n(n + 1)/2 − w∗), so, since here n(n + 1)/2 = 3(3 + 1)/2 = 6, we have 0.125 = P(W ≤ 0) = P(W ≥ 6).

Thus, the table informs us that P(W ≥ 6) = 0.125. From this, we can only deduce that our p-value P(W ≥ 5) > 0.125.

34 / 94
Example 1, Revisited

From software, we get the information that the value of the test statistic is 2, which equals our value w∗ minus 3(3 + 1)/4 = 3. Thus we learn that w∗ = 5, as we computed already by hand.

We also learn that the p-value of the two-sided test is 0.50. Since our alternative hypothesis is Ha+ and 2 > 0, we can conclude that the p-value for the one-sided test is 0.50/2 = 0.25, as we already computed.

35 / 94
Large Sample Test
When H0 is true, the expected value and variance of W are n(n + 1)/4 and n(n + 1)(2n + 1)/24, respectively. For large n, the standardized test statistic

    Z = (W − n(n + 1)/4)/√(n(n + 1)(2n + 1)/24)

will have approximately a N(0, 1) distribution. If we standardize the observed value of the test statistic, w∗, in exactly the same way to obtain

    z∗ = (w∗ − n(n + 1)/4)/√(n(n + 1)(2n + 1)/24),

we can compute the approximate p-values

    p+ = P(Z ≥ z∗), p− = P(Z ≤ z∗), p± = 2 min(p+, p−),

where Z is assumed N(0, 1).


36 / 94
Large Sample Test

There is an adjustment to the large sample formula if there are ties in the data. See p. 541 of the text for details.

37 / 94
Example 1, Revisited Again

We return to the grinding problem, using all 150 observations this time to illustrate the large sample test.

The computed value of W is w∗ = 7914. R gives the p-value P0(W ≥ 7914) < 0.0001 (its actual value is 7.22 × 10⁻⁶).

The normal approximation gives the standardized test statistic

    z∗ = (7914 − (150)(151)/4)/√((150)(151)(301)/24) = 4.22.

The resulting approximate p-value is

    P(Z ≥ 4.22) = 1.22 × 10⁻⁵.

In either case, we clearly reject the null hypothesis.

38 / 94
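The large-sample standardization is a one-liner in code. A sketch (function name mine, stdlib only) reproducing the z∗ above:

```python
from math import erf, sqrt

def signed_rank_z_upper(w_star, n):
    """Large-sample standardized signed-rank statistic and the
    upper-tail normal p-value P(Z >= z*)."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (w_star - mean) / sqrt(var)
    return z, 0.5 * (1 - erf(z / sqrt(2)))

z, p = signed_rank_z_upper(7914, 150)
# z is about 4.22, and p is on the order of 1e-5, as on the slide.
```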
Paired Comparisons

The sign test and the Wilcoxon signed rank test can also be used
for paired data by taking the difference of each pair of observations.
See the text, p. 535 and p. 542, for relevant examples.

39 / 94
Estimation Based on the Wilcoxon Signed Rank Test

As with the sign test, we can use the Wilcoxon signed rank test to develop point and interval estimators for the population median, θ.

To develop a point estimator of θ, we use the following equivalent formulation of the test statistic W when testing H0 : θ = θ0:

The Wilcoxon signed rank statistic W is the number of the n(n + 1)/2 averages

    (Yi + Yj)/2, 1 ≤ i ≤ j ≤ n,

that are greater than θ0. These averages are called Walsh averages.

40 / 94
Estimation Based on the Wilcoxon Signed Rank Test

The Walsh averages play the same role for the Wilcoxon signed
rank test that the original data play for the sign test: The test
statistic for the sign test is the number of observations greater
than θ0 , while the test statistic for the Wilcoxon signed rank test is
the number of Walsh averages greater than θ0 .

The same goes for estimating θ: The estimator based on the sign
test is the median of the data, while the estimator based on the
Wilcoxon signed rank test is the median of the Walsh averages.

41 / 94
Estimation Based on the Wilcoxon Signed Rank Test

A level L confidence interval for θ based on the Wilcoxon signed rank test consists of all values θ0 for which the test of
  H0 : θ = θ0
  Ha± : θ ≠ θ0
does not reject at the α = 1 − L significance level: that is, for which p± ≥ α (again: “inverting the test”).

42 / 94
Estimation Based on the Wilcoxon Signed Rank Test

Let

    Wᵃ(1) ≤ Wᵃ(2) ≤ . . . ≤ Wᵃ(n(n+1)/2)

denote the ordered Walsh averages. An exact symmetric level L confidence interval for θ is of the form

    (Wᵃ((n(n+1)/2)−k+1), Wᵃ(k)),

where for 5 ≤ n ≤ 25, k can be obtained from Table A.14, p. 765 of the text.

As with exact intervals based on the sign test, an exact level L interval cannot be computed for all possible levels L because of the discreteness of the distribution of W, but Table A.14 gives the values closest to the most commonly used values of L.

43 / 94
Estimation Based on the Wilcoxon Signed Rank Test

For large n, we may take k to be the integer closest to

    n(n + 1)/4 + z_{(1+L)/2} √(n(n + 1)(2n + 1)/24).

An approximate level L interval is then (Wᵃ((n(n+1)/2)−k+1), Wᵃ(k)).

Statistical software can output the sorted Walsh averages to a data set, which makes calculation of the point estimate and confidence interval for the median pretty simple, whether you are using Table A.14 or the large sample approximation.

44 / 94
Example 3
Consider the data set consisting of the six values: 17, 22, 3, 20, -4,
14. A point estimator of the population median is the median of
the 6(6 + 1)/2 = 21 Walsh averages shown in the table below.

Walsh average Value Walsh average Value


(y1 + y1 )/2 17.0 (y1 + y5 )/2 6.5
(y1 + y2 )/2 19.5 (y2 + y5 )/2 9.0
(y2 + y2 )/2 22.0 (y3 + y5 )/2 −0.5
(y1 + y3 )/2 10.0 (y4 + y5 )/2 8.0
(y2 + y3 )/2 12.5 (y5 + y5 )/2 −4.0
(y3 + y3 )/2 3.0 (y1 + y6 )/2 15.5
(y1 + y4 )/2 18.5 (y2 + y6 )/2 18.0
(y2 + y4 )/2 21.0 (y3 + y6 )/2 8.5
(y3 + y4 )/2 11.5 (y4 + y6 )/2 17.0
(y4 + y4 )/2 20.0 (y5 + y6 )/2 5.0
(y6 + y6 )/2 14.0

45 / 94
Example 3, Continued

The ordered Walsh averages are

    wᵃ(1) = −4.0 < wᵃ(2) = −0.5 < . . . < wᵃ(20) = 21.0 < wᵃ(21) = 22.0.

From Table A.14, we find that for L = 0.969, k = 21, and for L = 0.937, k = 20. Therefore, we obtain a level 0.969 confidence interval for θ as

    (wᵃ(21−21+1), wᵃ(21)) = (wᵃ(1), wᵃ(21)) = (−4.0, 22.0),

and a level 0.937 confidence interval as

    (wᵃ(21−20+1), wᵃ(20)) = (wᵃ(2), wᵃ(20)) = (−0.5, 21.0).

46 / 94
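A short Python sketch (function name mine) that reproduces the Walsh averages, the point estimate, and the k = 20 interval for Example 3:

```python
def walsh_averages(data):
    """All n(n+1)/2 Walsh averages (Y_i + Y_j)/2, i <= j, in sorted order."""
    n = len(data)
    return sorted((data[i] + data[j]) / 2 for i in range(n) for j in range(i, n))

w = walsh_averages([17, 22, 3, 20, -4, 14])
theta_hat = w[len(w) // 2]   # median of the 21 averages: the 11th smallest
ci_937 = (w[1], w[-2])       # (w_(2), w_(20)), the k = 20 interval
# theta_hat == 12.5 and ci_937 == (-0.5, 21.0).
```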
Example 3, Continued

While we would not advise using the large sample interval for n = 6, we will compute it now to illustrate the calculations. A level 0.95 interval will take

    k = 6(6 + 1)/4 + z_{0.975} √(6(6 + 1)(2(6) + 1)/24) = 10.5 + 1.96 √22.75 = 19.85.

Therefore, we will round k to 20, and the resulting interval is the same as the level 0.937 interval computed above.

47 / 94
The Wilcoxon Rank Sum Test
When we wish to compare the central locations of two independent populations, the Wilcoxon rank sum test is the appropriate rank-based analogue to the two sample t test.

We assume that the ith data value from population 1 is

    Y1,i = ε1,i, i = 1, . . . , n1,

and that the ith data value from population 2 is

    Y2,i = δ + ε2,i, i = 1, . . . , n2,

where the ε1,i and the ε2,i are independent and have the same continuous distribution. The parameter δ is a location shift, which means that the distribution of the data from population 2 is the distribution of the data from population 1 shifted δ units.

48 / 94
The Wilcoxon Rank Sum Test

The null hypothesis is

    H0 : δ = δ0,

where δ0 is a specific known value. If we take δ0 = 0, the null hypothesis is that both populations have identical distributions.

The alternative hypothesis can be any of the one- or two-sided alternatives:

  Ha+ : δ > δ0
  Ha− : δ < δ0
  Ha± : δ ≠ δ0

49 / 94
The Wilcoxon Rank Sum Test
To compute the test statistic, we follow these steps:
1. Create the adjusted observations Y2,i′ = Y2,i − δ0, i = 1, . . . , n2.
2. Rank all n1 + n2 values Y1,i, i = 1, . . . , n1, and Y2,i′, i = 1, . . . , n2, from smallest (rank 1) to largest (rank n1 + n2). Let R2,i, i = 1, . . . , n2, denote the resulting ranks of the Y2,i′.
3. The test statistic is the sum of the ranks belonging to the observations from population 2:

    V = Σ_{i=1}^{n2} R2,i.

50 / 94
The Wilcoxon Rank Sum Test

As with the one sample Wilcoxon test, we will use a permutation distribution to compute the p-value of the test.

We first note that if H0 is true, the Y1,i, i = 1, . . . , n1, and the Y2,i′, i = 1, . . . , n2, have exactly the same distribution.

51 / 94
The Wilcoxon Rank Sum Test

Suppose that Ha+ is the alternative hypothesis. Then if Ha+ is true, the Y2,i′ will tend to be larger than the Y1,i, so large values of the test statistic, V, will give evidence against H0 and in favor of Ha+.

Therefore, we can compute the p-value, p+, as the proportion of all appropriate permutations of the data which give values of the test statistic, V, at least as large as v∗, the observed value of V.

If the alternative hypothesis is Ha−, the p-value, p−, is the proportion of all appropriate permutations of the data which give values of the test statistic, V, no larger than v∗. If the alternative hypothesis is Ha±, the p-value is p± = 2 min(p+, p−).

52 / 94
The Wilcoxon Rank Sum Test

The appropriate permutations with which to compute the p-value are all possible divisions of the ranks 1, 2, . . . , n1 + n2 into two sets: one consisting of n1 of the ranks and the other consisting of the remaining n2 ranks.

There are

    C(n1 + n2, n2) = (n1 + n2)!/(n1! n2!)

such permutations.

53 / 94
The Wilcoxon Rank Sum Test

For permutation i, the value, vi∗, of the Wilcoxon statistic V is calculated by summing the ranks associated with the n2 observations in the second set. The p-value p+ is calculated as the proportion of the vi∗ as great or greater than the observed value v∗, and the p-value p− is calculated as the proportion of the vi∗ as small or smaller than the observed value v∗.

54 / 94
Example 4
A company makes die-cast automotive parts. It recently replaced
two of its die-casting machines with machines using dies of a
different design. Soon after the replacement, production personnel
began to suspect that the new dies did not last as long as the old
dies. The table shows cycles to failure for four randomly-selected
dies of the new type and two randomly-selected dies of the old
type.

Cycles to
Failure Ranks
Old 9477 4
Dies 13581 6
New 7651 2
Dies 8337 3
6989 1
9568 5

55 / 94
Example 4, Continued

▶ The Scientific Hypothesis The scientific hypothesis is that the new dies don't last as long as the old dies.
▶ The Statistical Model We assume the old dies come from population 1 and the new dies from population 2. The response is the lifetime of the die: the number of cycles until die failure.
▶ The Statistical Hypotheses The statistical hypotheses are
  H0 : δ = 0
  Ha− : δ < 0.
Notice that Ha− corresponds to the scientific hypothesis.

56 / 94
Example 4, Continued

▶ The Test Statistic The observed value of the test statistic is the sum of the ranks for the data from population 2:

    v∗ = 2 + 3 + 1 + 5 = 11.

Cycles to
Failure Ranks
Old 9477 4
Dies 13581 6
New 7651 2
Dies 8337 3
6989 1
9568 5

57 / 94
Example 4, Continued
 
▶ The P-Value The table lists all C(6, 4) = 15 possible assignments of the ranks into groups 1 and 2 and the values of the test statistic they produce.

Group 1   Group 2   V      Group 1   Group 2   V
1,2       3,4,5,6   18     3,4       1,2,5,6   14
1,3       2,4,5,6   17     3,5       1,2,4,6   13
1,4       2,3,5,6   16     2,6       1,3,4,5   13
2,3       1,4,5,6   16     3,6       1,2,4,5   12
1,5       2,3,4,6   15     4,5       1,2,3,6   12
2,4       1,3,5,6   15     4,6       1,2,3,5   11*
1,6       2,3,4,5   14     5,6       1,2,3,4   10
2,5       1,3,4,6   14

The p-value of the test is the proportion of these 15 V values at least as small as v∗ = 11. Since there are only 2 such values (the last two), the p-value is p− = 2/15 = 0.133.
58 / 94
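The same enumeration can be sketched with `itertools.combinations` (function name mine, stdlib only):

```python
from itertools import combinations

def rank_sum_perm_p_lower(n1, n2, v_star):
    """Lower-tail permutation p-value for the rank sum test: over all
    C(n1+n2, n2) ways to give n2 of the ranks to group 2, the fraction
    of rank sums V no larger than the observed v*."""
    ranks = range(1, n1 + n2 + 1)
    vs = [sum(g2) for g2 in combinations(ranks, n2)]
    return sum(v <= v_star for v in vs) / len(vs)

p_minus = rank_sum_perm_p_lower(2, 4, 11)
# p_minus == 2/15, about 0.133, reproducing Example 4.
```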
The Wilcoxon Rank Sum Test

Most statistical computer packages will compute the p-value for the Wilcoxon rank sum test.

Table A.13, p. 764 of the text, which gives relevant probabilities P(V ≤ v∗) for various values of v∗ and small values of n1 and n2, may also be used to compute p-values.

59 / 94
The Wilcoxon Rank Sum Test

It can be shown that if H0 is true, the mean and variance of V are n2(n1 + n2 + 1)/2 and n1n2(n1 + n2 + 1)/12, respectively. If n1 and n2 are large, the standardized test statistic

    Z = (V − n2(n1 + n2 + 1)/2)/√(n1n2(n1 + n2 + 1)/12)

has approximately a N(0, 1) distribution.

60 / 94
The Wilcoxon Rank Sum Test

If we standardize the observed value of the test statistic, v∗, in exactly the same way, using a continuity correction to make the approximation better, we obtain

    z+∗ = (v∗ − n2(n1 + n2 + 1)/2 − 0.5)/√(n1n2(n1 + n2 + 1)/12)

and

    z−∗ = (v∗ − n2(n1 + n2 + 1)/2 + 0.5)/√(n1n2(n1 + n2 + 1)/12),

and we can compute the approximate p-values

    p+ = P(Z ≥ z+∗), p− = P(Z ≤ z−∗), p± = 2 min(p+, p−),

where Z is assumed N(0, 1).

61 / 94
Example 4, Continued

Here, the p-value is p− = P(Z ≤ z−∗), where

    z−∗ = (11 − 4(2 + 4 + 1)/2 + 0.5)/√((2)(4)(2 + 4 + 1)/12) = −2.5/2.16 = −1.16.

Then p− = P(Z ≤ −1.16) = 0.123, reasonably close to the exact permutation p-value of 0.133.

62 / 94
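The continuity-corrected lower-tail calculation can be sketched as follows (function name mine, stdlib only):

```python
from math import erf, sqrt

def rank_sum_z_lower(v_star, n1, n2):
    """Continuity-corrected standardized rank-sum statistic and the
    lower-tail normal p-value P(Z <= z-*)."""
    mean = n2 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (v_star - mean + 0.5) / sd
    return z, 0.5 * (1 + erf(z / sqrt(2)))

z, p = rank_sum_z_lower(11, 2, 4)
# z is about -1.16 and p about 0.12, close to the exact 2/15 = 0.133.
```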
The Wilcoxon Rank Sum Test

Ties are handled exactly as for the Wilcoxon signed rank test: by
taking average ranks. See p. 547 of the text for details.

63 / 94
Estimation Based on the Wilcoxon Rank Sum Test
We can use the Wilcoxon rank sum test to develop point and interval estimators for the location shift, δ.

The Wilcoxon rank sum statistic V equals n2(n2 + 1)/2 plus the number of the n1n2 differences

    Y2,i − Y1,j, 1 ≤ i ≤ n2, 1 ≤ j ≤ n1,

that are greater than δ0.

Using reasoning similar to that used in the sign and Wilcoxon signed rank cases, we take the point estimator of δ to be the amount the Y2,i must be shifted so that the test statistic V regards the two samples as coming from the same population. The resulting estimator, δ̂, is the median of the n1n2 differences Y2,i − Y1,j, 1 ≤ i ≤ n2, 1 ≤ j ≤ n1.

64 / 94
Estimation Based on the Wilcoxon Rank Sum Test

A level L confidence interval for δ based on the Wilcoxon rank sum test consists of all values δ0 for which the test of
  H0 : δ = δ0
  Ha± : δ ≠ δ0
does not reject at the α = 1 − L significance level: that is, for which p± ≥ α. (Again: inverting the test!)

Let D(1) ≤ D(2) ≤ . . . ≤ D(n1n2) denote the differences Y2,i − Y1,j listed in ascending order.

An exact level L confidence interval for δ is of the form (D(n1n2−k+1), D(k)), where for 5 ≤ n1, n2 ≤ 12, k can be obtained from Table A.15, p. 766 of the text.

65 / 94
Estimation Based on the Wilcoxon Rank Sum Test

As with exact intervals based on the sign test and Wilcoxon signed rank tests, an exact level L interval cannot be computed for all desired levels L because of the discreteness of the distribution of D, but Table A.15 gives the values closest to the most commonly used values of L.

For large n1 and n2, we may take k in the confidence interval formula (D(n1n2−k+1), D(k)) to be the integer closest to

    n1n2/2 + z_{(1+L)/2} √(n1n2(n1 + n2 + 1)/12).

66 / 94
Example 4, Continued
Consider again the cycles to failure of the six dies. We want to estimate the difference in location, δ, between the population distributions of the cycles to failure of the old and the new dies. To do this, we form the differences

    7651 − 9477 = −1826    7651 − 13581 = −5930
    8337 − 9477 = −1140    8337 − 13581 = −5244
    6989 − 9477 = −2488    6989 − 13581 = −6592
    9568 − 9477 =    91    9568 − 13581 = −4013

The point estimator of δ is

    δ̂ = median(−6592, −5930, −5244, −4013, −2488, −1826, −1140, 91)
      = (−4013 + (−2488))/2 = −3250.5.

67 / 94
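This Hodges-Lehmann-type calculation is a one-liner in code. A sketch (function and variable names mine, stdlib only):

```python
from statistics import median

def shift_estimate(sample1, sample2):
    """Point estimate of the location shift delta: the median of all
    n1*n2 differences Y2_i - Y1_j."""
    diffs = sorted(y2 - y1 for y2 in sample2 for y1 in sample1)
    return median(diffs), diffs

old_dies = [9477, 13581]
new_dies = [7651, 8337, 6989, 9568]
delta_hat, diffs = shift_estimate(old_dies, new_dies)
# delta_hat == -3250.5, matching the hand computation above.
```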
Example 4, Continued

Looking at Table A.15, we find that the value of the constant k = 8 is associated with a confidence level L = 0.866. Therefore, a level 0.866 confidence interval for δ is

    (d((2)(4)−8+1), d(8)) = (d(1), d(8)) = (−6592, 91).

While we would not advise using the large sample interval for n1 = 2 and n2 = 4, we will compute it now to illustrate the calculations. A level 0.95 interval will take

    k = (2)(4)/2 + z_{0.975} √((2)(4)(2 + 4 + 1)/12) = 8.2.

Therefore, we will round k to 8, and the resulting interval is the same as the level 0.866 interval computed above.

68 / 94
Spearman Correlation

There are at least two problems with the Pearson correlation as a measure of the association between two variables:
▶ It only measures the strength of the linear association between the variables.
▶ It is not resistant to outliers.
Spearman's rho, also called the Spearman rank correlation coefficient, is a measure of association which helps remedy these two problems. It is also very easy to compute. Instead of dealing with the values of the original two variables, X and Y, we form their ranks RX and RY. Spearman's rho, rs, is just the Pearson correlation between RX and RY.

69 / 94
Example 5
Recall data on fuel consumption versus equivalence ratio, considered in Chapter 7:

Fuel          Equivalence   Rank of Fuel   Rank of
Consumption   Ratio         Consumption    Equivalence Ratio
98.0 0.64 1 1
100.0 0.65 2 2
100.1 0.66 3 3
102.0 0.74 6 4
101.0 0.75 4 5
103.0 0.77 7 7
103.2 0.76 8 6
101.9 0.80 5 8
104.0 0.81 9 9
105.0 0.88 10 10
105.5 0.90 11 11
105.6 0.91 12 12
106.0 0.92 13 13
110.0 1.00 14 14
111.0 1.02 15 15
115.0 1.04 16 16
121.5 1.14 17 17
123.5 1.16 18 18
136.0 1.24 19 19

70 / 94
Example 5, Continued
The scatterplots below show fuel consumption versus equivalence ratio (left) and the ranks of fuel consumption versus the ranks of equivalence ratio (right) for these data.

[Figure: left, fuel consumption vs. equivalence ratio; right, rank of fuel consumption vs. rank of equivalence ratio.]

71 / 94
Example 5, Continued

The first plot shows the association between fuel consumption and
equivalence ratio to be nonlinear, and hence the Pearson
correlation, which equals a respectable 0.9332, is not the most
appropriate summary of that association. The right scatterplot
shows a more nearly linear association for the ranks of the
variables. The stronger linear association is reflected in the higher
Pearson correlation, 0.9842, between the ranks of fuel consumption
and equivalence ratio. This Pearson correlation between the ranks
is exactly the Spearman rank correlation.

72 / 94
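That Spearman's rho really is just the Pearson correlation of the ranks can be made concrete with a short stdlib sketch (function names are mine; the rank pairs rx, ry are transcribed from the Example 5 table):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (mid)ranks."""
    def ranks(v):
        s = sorted(v)
        return [sum(i + 1 for i, u in enumerate(s) if u == t) / s.count(t)
                for t in v]
    return pearson(ranks(x), ranks(y))

# Rank pairs from the Example 5 table (no ties in either variable):
rx = [1, 2, 3, 6, 4, 7, 8, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
ry = [1, 2, 3, 4, 5, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
rho = pearson(rx, ry)
# rho is about 0.9842, the Spearman correlation reported on the slide.
```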
Example 5, Continued
To illustrate the effect of outliers on both the Pearson and
Spearman correlations, we added an outlier to the data. The added
observation has a fuel consumption of 95 and an equivalence ratio
of 2.1. The resulting outlier can be seen in the lower right corner
of the left scatterplot. The right scatterplot of the ranks shows
that this data value is still an outlier, but is less extreme.
[Scatterplots with the added outlier: Fuel Consumption vs. Equivalence
Ratio (left) and Rank of Fuel Consumption vs. Rank of Equivalence
Ratio (right).]

73 / 94
Example 5, Continued

The outlier greatly affects the Pearson correlation of fuel


consumption and equivalence ratio, reducing it from 0.9332 to
0.2319. In contrast, the outlier has less effect on the Spearman
correlation, which declines from 0.9842 to 0.7008. Because of the
problems Pearson correlation has with outliers and nonlinearity, we
recommend computing both the Pearson and Spearman correlation
routinely. Widely different values serve as a warning to look more
closely at the data.

74 / 94
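The effect of the outlier described above can be reproduced with a short script. This is a sketch using hypothetical pure-Python helpers; the data are transcribed from the table in Example 5, and the outlier (95, 2.1) is appended as in the text.

```python
# Sketch: compare Pearson and Spearman correlations before and after an outlier.
# pearson() and spearman() are hypothetical helpers, not from the text.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    # The data here have no ties, so simple positional ranks suffice
    def ranks(v):
        s = sorted(v)
        return [s.index(a) + 1 for a in v]
    return pearson(ranks(x), ranks(y))

fc = [98.0, 100.0, 100.1, 102.0, 101.0, 103.0, 103.2, 101.9, 104.0, 105.0,
      105.5, 105.6, 106.0, 110.0, 111.0, 115.0, 121.5, 123.5, 136.0]
er = [0.64, 0.65, 0.66, 0.74, 0.75, 0.77, 0.76, 0.80, 0.81, 0.88,
      0.90, 0.91, 0.92, 1.00, 1.02, 1.04, 1.14, 1.16, 1.24]

r_clean, rs_clean = pearson(fc, er), spearman(fc, er)     # roughly 0.93 and 0.9842
r_out = pearson(fc + [95.0], er + [2.1])                  # drops sharply
rs_out = spearman(fc + [95.0], er + [2.1])                # declines much less
```

The Spearman correlation degrades far less than the Pearson correlation, illustrating why computing both is a useful routine diagnostic.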
Spearman Correlation

Recall that the Pearson correlation measures the strength of linear


association between two variables in a set of data. On the other
hand, the Spearman correlation measures the strength of
monotone association: that is, the linear association between the
ranks of the data values. The plots on the next slide illustrate
what this means.

75 / 94
Spearman Correlation

Figure: Examples of, top: positive monotone association (left), and
corresponding ranks (right); middle: negative monotone association (left),
and corresponding ranks (right); bottom: nonlinear non-monotone
association (left), and corresponding ranks (right). The Pearson
correlations are (top to bottom): 0.9183, −0.9373, and 0.1006, while the
Spearman correlations are 1, −1 and 0.1693, respectively.

76 / 94
Spearman Correlation

If the data are a random sample from a larger population, we may


want to conduct a test of

H0 : no monotone association between X and Y in the population.

versus

Ha+ : positive monotone association


Ha− : negative monotone association
Ha± : nonzero monotone association.

77 / 94
Example 5.5

As with other rank-based procedures, we can base the test on a


permutation distribution.

To illustrate, suppose we have the following data:

X   2.3   1.7   4.4   0.5
Y   3.7   1.2   2.2   1.7

The Spearman correlation is rs = 0.6.

78 / 94
Example 5.5, Continued
Here is the permutation distribution of the ranks:
RX: 1 2 3 4
Spearman
Permutation RY values correlation
1 4 3 2 1 −1.0
2 3 4 2 1 −0.8
3 4 2 3 1 −0.8
4 4 3 1 2 −0.8
5 3 4 1 2 −0.6
6 4 2 1 3 −0.4
7 2 4 3 1 −0.4
8 4 1 3 2 −0.4
9 3 2 4 1 −0.4
10 2 3 4 1 −0.2
11 4 1 2 3 −0.2
12 2 4 1 3 0.0
13 3 1 4 2 0.0
14 1 4 3 2 0.2
15 3 2 1 4 0.2
16 2 3 1 4 0.4
17 3 1 2 4 0.4
18 1 3 4 2 0.4
19 1 4 2 3 0.4
20 2 1 4 3 0.6*
21 1 3 2 4 0.8
22 1 2 4 3 0.8
23 2 1 3 4 0.8
24 1 2 3 4 1.0

(* marks the observed value)

79 / 94
Example 5.5, Continued

The p-values are:

Alternative Hypothesis     p-value
Ha+                        p+ =
Ha−                        p− =
Ha±                        p± =

80 / 94
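The permutation distribution in the table above can be generated exhaustively, since with n = 4 there are only 4! = 24 equally likely arrangements of the ranks. The sketch below uses a hypothetical helper based on the standard no-ties shortcut rs = 1 − 6Σd²/(n(n² − 1)):

```python
# Sketch: exact permutation test for Spearman's rho with n = 4.
from itertools import permutations

def spearman_from_ranks(rx, ry):
    """Spearman's rho for untied ranks via 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

rx = (1, 2, 3, 4)
observed = 0.6   # Spearman correlation of the example data

# All 24 equally likely arrangements of the Y ranks against fixed X ranks
dist = [spearman_from_ranks(rx, p) for p in permutations((1, 2, 3, 4))]

# Upper-tail p-value for Ha+: proportion of permutations with rho >= observed
# (small tolerance guards against floating-point comparison at the boundary)
p_plus = sum(r >= observed - 1e-12 for r in dist) / len(dist)
```

Counting down the table, five of the 24 arrangements give rs ≥ 0.6, so p+ = 5/24 ≈ 0.208.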
Spearman Correlation

Table A.10, p. 761 of the text gives values of p+ for use in testing
monotone association for samples of size 10 and less. For larger
samples, an approximate test of H0 may be obtained from the fact
that rs √((n − 2)/(1 − rs²)) has approximately a tn−2 distribution
under the assumption of no monotone association.

81 / 94
Example 5.75

Consider again the data on fuel consumption (FC) and equivalence


ratio (ER). Since there are n = 19 observations, an approximate
test of
H0 : No monotone association between FC and ER.
Ha + : Positive monotone association between FC and ER.
is obtained by finding the proportion of a t17 population that
exceeds the observed value of the test statistic:
t∗ = rs √((n − 2)/(1 − rs²)) = 0.9842 √(17/(1 − 0.9842²)) = 22.919.

This proportion, which is the p-value, equals 1.6 × 10−14 , which is


very strong evidence of a positive monotone association.

82 / 94
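The test statistic in Example 5.75 can be checked numerically. This is a sketch (spearman_t is a hypothetical helper); the p-value itself requires a t table or statistical software, so only the statistic is computed here.

```python
# Sketch: large-sample test statistic for Spearman's rho.
import math

def spearman_t(rs, n):
    """Statistic rs * sqrt((n - 2)/(1 - rs^2)), approximately t_{n-2} under H0."""
    return rs * math.sqrt((n - 2) / (1 - rs ** 2))

t_star = spearman_t(0.9842, 19)   # fuel-consumption example: n = 19, so df = 17
```

The result matches the value 22.919 computed in the example, far in the upper tail of a t17 distribution.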
The Kruskal-Wallis Test

Recall the one-way effects model described in Chapter 9:

Yij = µ + τi + εij,   j = 1, . . . , ni,   i = 1, . . . , k,

where Yij is the jth response from population i, µ is an overall
location measure, τi is the effect due to population i, and εij is the
random error associated with observation Yij.

As in Chapter 9, we assume that the τi sum to zero, and that the
εij are independent random variables having the same distribution.
However, we do not assume the εij have a normal distribution. We
assume only that they have the same continuous distribution.

83 / 94
The Kruskal-Wallis Test

To test for differences in population effects τi , the appropriate


rank-based analogue of the F test of Chapter 9 is the
Kruskal-Wallis Test.

We summarize the Kruskal-Wallis procedure as follows:


I The Statistical Model We assume data from the one-way
effects model.
I The Statistical Hypotheses The hypotheses to be tested
are
H0 : τ1 = τ2 = · · · = τk = 0
Ha : Not all the population effects τi are 0.

84 / 94
The Kruskal-Wallis Test
I The Test Statistic To compute the test statistic, follow
these steps:
1. Rank all n = n1 + · · · + nk observations Yij. Let Rij denote
   the rank of Yij.
2. Compute the rank sum and rank mean for each sample, and the
   overall rank mean:

      Ri· = Σj Rij,   R̄i· = (1/ni) Σj Rij,   R̄·· = (1/n) Σi Σj Rij = (n + 1)/2.

3. The test statistic H is

      H = [12/(n(n + 1))] Σi ni (R̄i· − R̄··)²
        = [12/(n(n + 1))] Σi Ri·²/ni − 3(n + 1).

In computing H, note that the quantity Σi ni (R̄i· − R̄··)² is
the model sum of squares found in the ANOVA table obtained
from the one-way model when the responses are replaced by
their ranks.
85 / 94
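The steps above can be sketched as a short function (a hypothetical helper; it assumes no ties, so plain positional ranks suffice):

```python
# Sketch: Kruskal-Wallis statistic from a list of samples (no ties assumed).

def kruskal_wallis_H(groups):
    """H = 12/(n(n+1)) * sum_i R_i^2 / n_i - 3(n+1), using pooled ranks."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}   # rank of each pooled value
    n = len(pooled)
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)
```

For instance, two completely separated samples of two observations each, [1, 2] and [3, 4], have rank sums 3 and 7 and give H = 2.4, the largest value possible for that configuration.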
The Kruskal-Wallis Test

I The P-Value Let h∗ denote the observed value of the test


statistic H. The p-value is p = P0 (H ≥ h∗ ), where the
notation P0 signifies that the probability is computed under
the assumption that H0 is true.
For small k and ni , tables of critical values of H have been
developed, but they are rather extensive and we do not
present them here. Rather, we take two different approaches:

86 / 94
The Kruskal-Wallis Test

o Exact p-Values We can use the computer to calculate exact


p-values.
o Large Sample Approximation When sample sizes are large
(ni ≥ 5 for all i is a common rule of thumb), the distribution
of H under the null hypothesis can be approximated by a χ2
distribution with k − 1 degrees of freedom. This means that
an approximate p-value is P(χ2k−1 ≥ h∗ ), the area under a
χ2k−1 density at or above the observed value of the test
statistic.

87 / 94
The Kruskal-Wallis Test

It is generally assumed that the observations Yij are from a


continuous distribution model, which implies that there should be
no ties (i.e., no two Yij should be equal). However, ties do occur in
practice. When ties do occur, the usual approach is to use average
ranks, as described earlier.
When there are ties and the large-sample test is used, it is common
practice to use a modified test statistic H′. The formula is found in
the text.

88 / 94
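The average-rank convention mentioned above can be sketched as follows (a hypothetical helper; it implements only the ranking step, not the modified statistic H′ from the text):

```python
# Sketch: rank values 1..n, giving tied values the average of the ranks
# they jointly occupy.

def average_ranks(values):
    sorted_vals = sorted(values)
    first = {}    # first (1-based) position of each value in the sorted order
    for i, v in enumerate(sorted_vals):
        first.setdefault(v, i + 1)
    count = {}    # number of occurrences of each value
    for v in sorted_vals:
        count[v] = count.get(v, 0) + 1
    # A value occupying positions first..first+count-1 gets their average
    return [first[v] + (count[v] - 1) / 2 for v in values]
```

For example, in the sample 3.0, 1.0, 2.0, 2.0 the two tied 2.0's occupy ranks 2 and 3, so each receives rank 2.5.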
Example 6

Mucociliary clearance is a mechanism to remove foreign matter


from the respiratory system. A study compared the mucociliary
clearance of three groups of subjects: one with normal respiratory
function, one with obstructive airways disease (OAD), and one
with asbestosis (ASBT). The measured response was the half-time
of mucociliary clearance (HTMC). The data, found in MUCO, are:

89 / 94
Example 6, Continued

Group HTMC, Yij Ranks of Yij


normal 2.9 8
normal 3.0 9
normal 2.5 4
normal 2.6 5
normal 3.2 10
OAD 3.8 13
OAD 2.7 6
OAD 4.0 14
OAD 2.4 3
ASBT 2.8 7
ASBT 3.4 11
ASBT 3.7 12
ASBT 2.2 2
ASBT 2.0 1

90 / 94
Example 6, Continued

I The Scientific Hypothesis The scientific hypothesis is


that the groups differ in mucociliary clearance function.
I The Statistical Model We assume data from the one-way
model with k = 3 populations corresponding to the three
groups.
I The Statistical Hypotheses The hypotheses to be tested
are
H0 : τ1 = τ2 = τ3 = 0
Ha : Not all the population effects τi are 0.

91 / 94
Example 6, Continued

I The Test Statistic If we denote the normal, OAD and


ASBT groups as populations 1, 2 and 3, respectively, then
n1 = 5, n2 = 4, n3 = 5, n = 14 and

R1· = 8 + 9 + 4 + 5 + 10 = 36; R2· = 13 + 6 + 14 + 3 = 36;

R3· = 7 + 11 + 12 + 2 + 1 = 33,
so that
H = [12/((14)(15))] (36²/5 + 36²/4 + 33²/5) − 3(14 + 1) = 0.771.

92 / 94
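The arithmetic in the test-statistic step above can be checked in a few lines. This sketch also computes the large-sample approximate p-value; for k − 1 = 2 degrees of freedom the chi-square tail probability reduces to exp(−h/2), so no special library is needed.

```python
# Sketch: Kruskal-Wallis statistic and chi-square approximation for Example 6.
import math

rank_sums = {"normal": 36, "OAD": 36, "ASBT": 33}
sizes = {"normal": 5, "OAD": 4, "ASBT": 5}
n = sum(sizes.values())   # 14

H = 12 / (n * (n + 1)) * sum(
    rank_sums[g] ** 2 / sizes[g] for g in sizes
) - 3 * (n + 1)

# For df = 2 the chi-square survival function is exactly exp(-h/2)
p_approx = math.exp(-H / 2)
```

This reproduces H = 0.771 and the approximate p-value 0.680 reported in the example.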
Example 6, Continued

I The P-Value The exact p-value of this test, which can be
computed using software, equals 0.71.

For illustration purposes, we will use the large-sample
approximation here, even though not all samples have size at
least 5. The approximate p-value is
P(χ22 ≥ 0.771) = 0.680. Based on this result, we do not
reject the null hypothesis: there is insufficient evidence to
conclude that one or more of the τi differ from 0.

93 / 94
Summary: Distribution-Free Inference Procedures

Data Type                           Inference about          Inference based on
Single population, arbitrary shape  Median                   Sign test
Single population, symmetric        Median                   Wilcoxon signed rank test
Two populations, same shape         Location shift           Wilcoxon rank sum test
Multiple populations, same shape    Population effects       Kruskal-Wallis test
                                    (location shifts)
X-Y data                            Monotone association     Spearman correlation

94 / 94
