
Nonparametric Statistics

Nonparametric statistics involve less demanding assumptions about the distribution of the data than do standard statistical tests; in particular, nonparametric tests do not rely on assumptions of normality. Many nonparametric tests use ranks rather than values; when this is the case, 3S automatically converts any scores or measurements into ranks. The statistics computed by 3S are appropriate for four different problems:
Two or more independent groups. The Mann-Whitney rank-sum test and the Kruskal-Wallis one-way analysis of variance provide tests of the null hypothesis that independent samples from two or more groups come from identical populations. Multiple comparisons are available for the Kruskal-Wallis test.
Paired observations. The sign test and the Wilcoxon signed-rank test both test the hypothesis of no difference between paired observations.
Randomized blocks. The Friedman two-way analysis of variance is the nonparametric equivalent of a two-way ANOVA with one observation per cell, or of a repeated measures design with a single group. Multiple comparisons are available for the Friedman test. Kendall's coefficient of concordance is a normalization of the Friedman statistic.
Rank correlations. The Kendall and Spearman rank correlations estimate the correlation between two variables based on the ranks of the observations.
These statistics are discussed in many texts, including Siegel (1956), Hollander and Wolfe (1973), Conover (1980), and Lehmann (1975). Each of these nonparametric statistics has a parallel parametric test. The Kruskal-Wallis test corresponds to a one-way analysis of variance (see 7D or 1V). The sign and Wilcoxon tests correspond to the paired t test, and the Mann-Whitney test corresponds to the pooled-variance two-sample t test (see also 3D for sign, Wilcoxon, and Mann-Whitney tests in addition to t tests). The rank correlations have a parallel in the usual Pearson product-moment correlation coefficient computed by 3D.
Several nonparametric tests and measures, including the Kendall and Spearman rank correlation coefficients, are also available in 4F.
Except for dropping the assumption that the data are normally distributed, the nonparametric test statistics have assumptions similar to those of their parametric counterparts. For example, the Mann-Whitney test and the pooled two-sample t test both assume that, under the null hypothesis, the samples are obtained from distributions that are identical (have the same shape when plotted). In both the Mann-Whitney test and the two-sample t test, the actual probability of rejecting the null hypothesis when it is true depends on the ratio of the variances of the two groups (Pratt, 1964).

Where to find it

Examples
3S.1  The Mann-Whitney rank-sum test
3S.2  The Kruskal-Wallis test and multiple comparisons
3S.3  The sign test and Wilcoxon signed-rank test
3S.4  Friedman's two-way analysis of variance and Kendall's coefficient of concordance
3S.5  Kendall and Spearman rank correlations

Special Features
Using all available data: NO DELCASE

3S Commands
Order of Instructions
Summary Table

Example 3S.1  The Mann-Whitney rank-sum test

The Mann-Whitney (Wilcoxon) rank-sum test is a nonparametric analog of the two-sample t test for independent samples. The Mann-Whitney statistic, which is also reported by 3D, is computed whenever the Kruskal-Wallis test is requested for data with two groups. The Kruskal-Wallis test for more than two groups is discussed in Example 3S.2. These statistics are explained in Appendix B.18.
In Example 3S.1 we analyze the Exercise data described in Chapter 2 and Example 3D.2. The data are stored on disk in a file named EXERCISE.DAT. We test whether PULSE_2, pulse rate after exercise, differs significantly between smokers and nonsmokers. The data file has a case whose PULSE_2 value is erroneous (265 instead of 165). Because this method is based on ranks rather than exact values, the impact of this outlier is lessened.
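The rank-sum calculation, and why the 265-versus-165 error barely matters, can be sketched in a few lines of Python. This is a minimal illustration, not 3S's internal code: the function names and the two small samples are hypothetical (they are not the Exercise data), tied values receive midranks, and U is derived from the rank sum by the usual relation U = R - n(n + 1)/2.

```python
def midranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the run
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (rank sum of group_a, U for group_a, U for group_b)."""
    ranks = midranks(group_a + group_b)  # rank the pooled sample
    n_a, n_b = len(group_a), len(group_b)
    r_a = sum(ranks[:n_a])
    u_a = r_a - n_a * (n_a + 1) / 2
    return r_a, u_a, n_a * n_b - u_a


# Hypothetical post-exercise pulse rates.
smokers = [96, 120, 140, 165]
nonsmokers = [88, 110, 130, 150, 160]
print(rank_sum_test(smokers, nonsmokers))               # (21.0, 11.0, 9.0)
print(rank_sum_test([96, 120, 140, 265], nonsmokers))   # (21.0, 11.0, 9.0)
```

Because 165 is already the largest value in this toy sample, mistyping it as 265 leaves every rank, and therefore every rank-based statistic, unchanged, whereas a mean-based t statistic would shift.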
The INPUT, VARIABLE, and GROUP paragraphs in Input 35.1 are common to all
BMDP programs (see Chapters 3, 4, and 5). The FILE command tells the program
where to find the data and is used for systems like IBM PC and VAX. (For IBM
mainframes, see UNIT, Chapter 3.)
A GROUPING variable must be specified for the Mann-Whitney test. In this
example we specify our grouping variable (SMOKE) in the GROUP paragraph,
and then use CODES and NAMES to identify the values of the variable. The TEST
paragraph is required in 3S; here we use it to request the Kruskal-Wallis test
(KRUSKAL). As noted above, we will also get results for the Mann-Whitney test.

Input 3S.1

/ INPUT     FILE IS 'exercise.dat'.
            VARIABLES = 6.
            FORMAT IS FREE.
/ VARIABLE  NAMES = id, sex, smoke, age, pulse_1, pulse_2.
            LABEL = id.
/ GROUP     VARIABLE = smoke.
            CODES(smoke) 1, 2.
            NAMES(smoke) smoke, nosmoke.
/ TEST      VARIABLE = pulse_2.
            KRUSKAL.
/ END

Output 3S.1

[1] 3S reads 40 cases. Only complete cases are used in the computations, i.e., cases that have no values missing or out of range. All variables are checked for acceptable values unless you specify a USE list in the VARIABLE paragraph, in which case only the variables in the USE list are checked. See Example 3S.2 for how to use all available data for each test request.
[2] 3S prints descriptive statistics for each variable except the designated LABEL variable:
mean
standard deviation
minimum observed value (not out of range)
maximum observed value (not out of range)
[3] For each variable specified, 3S reports the sample size (frequency) and sum of ranks by subgroup, the Kruskal-Wallis test statistic, and the level of significance. If VARIABLE is not specified in the TEST paragraph, the results are shown for all variables other than the LABEL and GROUPING variables. Here there is no significant difference in post-exercise pulse values between smokers and nonsmokers.
[4] When there are two groups, 3S computes the Mann-Whitney (Wilcoxon) rank-sum test statistic and its level of significance, which coincides with that of the Kruskal-Wallis test statistic. See Appendix B.18 for more about significance levels.
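Why the two significance levels coincide can be checked numerically: with two groups and no ties, the Kruskal-Wallis H equals the square of the normal-approximation z for the rank sum, and a chi-square variable with 1 degree of freedom is the square of a standard normal. The Python sketch below verifies the identity on hypothetical data. It omits any tie correction and continuity correction; whether 3S applies either is not settled here (see Appendix B.18), and the function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def ranks_no_ties(values):
    """Rank values 1..n; assumes all values are distinct."""
    position = {v: r for r, v in enumerate(sorted(values), start=1)}
    return [position[v] for v in values]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H without a tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = ranks_no_ties(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum for this group
        total += r * r / len(g)
        start += len(g)
    return 12 * total / (n * (n + 1)) - 3 * (n + 1)


def rank_sum_z(a, b):
    """Normal-approximation z for the rank sum of sample a (no continuity correction)."""
    ranks = ranks_no_ties(a + b)
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    r_a = sum(ranks[:n_a])
    return (r_a - n_a * (n + 1) / 2) / sqrt(n_a * n_b * (n + 1) / 12)


a, b = [96, 120, 140, 165], [88, 110, 130, 150, 160]
h, z = kruskal_wallis_h([a, b]), rank_sum_z(a, b)
p = 2 * (1 - NormalDist().cdf(abs(z)))  # same as the chi-square (1 df) p-value for H
print(round(h, 6), round(z * z, 6))  # 0.06 0.06
```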

Example 3S.2  The Kruskal-Wallis test and multiple comparisons

We use the Werner blood chemistry data (Appendix D) to illustrate the Kruskal-Wallis statistic for more than two groups. We test whether cholesterol values for women in four different age groups come from identical populations. To classify the data into four AGE groups, we use the GROUP paragraph with CUTPOINTS and NAMES specified.
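The way CUTPOINTS(age) = 25, 35, 45 in Input 3S.2 sorts ages into four named groups can be sketched as follows. The sketch assumes, from the group names, that each cutpoint is the inclusive upper limit of its interval (an age of exactly 25 falls in under_26); Chapter 5 describes the rule BMDP actually applies. The function age_group is hypothetical.

```python
from bisect import bisect_left

# Mirrors CUTPOINTS(age) and NAMES(age) from Input 3S.2.
CUTPOINTS = [25, 35, 45]
NAMES = ["under_26", "_26to35", "_36to45", "over_45"]


def age_group(age):
    """Assumption: each cutpoint is the inclusive upper bound of its group."""
    # bisect_left counts the cutpoints strictly below age, which is the group index.
    return NAMES[bisect_left(CUTPOINTS, age)]


for age in (25, 26, 35, 36, 45, 46):
    print(age, age_group(age))
```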


We specify COMPARE in the TEST paragraph to request multiple comparisons; that is, 3S will compare every possible pair of groups. The Werner data consist of 188 cases; cholesterol and age values are present for every case, but there are missing values (marked by asterisks in the file) for height, weight, albumin, uric acid, and calcium. 3S normally deletes cases with missing values from all computations. We specify NO DELCASE to use all available data for each comparison, that is, all cases with acceptable values for AGE and CHOLSTRL. See Special Features for more about NO DELCASE. Because we obtain statistics based on ranks rather than actual values, we do not specify range limits for cholesterol.

Input 3S.2

/ INPUT     FILE = 'werner.dat'.
            VARIABLES = 9.
            FORMAT = FREE.
            NO DELCASE.
/ VARIABLE  NAMES = id, age, height, weight, brthpill,
                    cholstrl, albumin, calcium, uricacid.
            LABEL = id.
/ GROUP     VARIABLE = age.
            CUTPOINTS(age) = 25, 35, 45.
            NAMES(age) = under_26, _26to35, _36to45,
                         over_45.
/ TEST      VARIABLE = cholstrl.
            COMPARE.
            KRUSKAL.
/ END

Output 3S.2

[1] 3S reads 188 cases. In this example, 3S eliminates cases only when data needed for the particular test (i.e., AGE and CHOLSTRL) are missing or out of range. If NO DELCASE were omitted, seven cases would be eliminated because of missing values for HEIGHT, WEIGHT, ALBUMIN, URICACID, and CALCIUM, and all calculations would use the remaining 181 complete cases.
[2] For each variable specified, 3S reports the frequency and sum of ranks by subgroup, the Kruskal-Wallis test statistic, and the level of significance. If no variables are specified, the results are shown for all variables other than the LABEL and GROUPING variables. The differences in cholesterol values among the four groups are significant (p < .0005).
[3] The COMPARE command generates multiple comparisons of all possible pairs of groups and reports the z values for a two-tailed test. In this example, the under_26 group differs significantly at the .05 level from both the _36to45 and the over_45 groups (ZSTAT values of 2.92 and 3.92 are greater than the critical value of 2.64). See Appendix B.18 for multiple comparison formulas.
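The critical value of 2.64 in note [3] can be reproduced with a Bonferroni-adjusted normal quantile: with k = 4 groups there are k(k - 1)/2 = 6 pairs, and a two-tailed test at the .05 level puts .05/12 in each tail. The sketch below also assumes the common Dunn-type statistic for each pair, the absolute difference of mean ranks divided by its standard error under the null hypothesis, with no tie correction. Appendix B.18 gives the formulas 3S actually uses; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_z(k, alpha=0.05):
    """Two-tailed Bonferroni critical value for all k(k-1)/2 pairwise comparisons."""
    n_pairs = k * (k - 1) / 2
    return NormalDist().inv_cdf(1 - alpha / (2 * n_pairs))


def pair_z(mean_rank_i, mean_rank_j, n_i, n_j, n_total):
    """Dunn-type z for one pair of groups (ranks taken over all n_total cases)."""
    se = sqrt(n_total * (n_total + 1) / 12 * (1 / n_i + 1 / n_j))
    return abs(mean_rank_i - mean_rank_j) / se


print(round(critical_z(4), 2))  # 2.64
```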

Example 3S.3  The sign test and Wilcoxon signed-rank test

We use 3S to repeat the sign test and Wilcoxon signed-rank test performed in Example 3D.5 for the Exercise data. The hypothesis for both tests is that there is no difference between matched variables or paired observations. In Input 3S.3, the matched variables are PULSE_1 and PULSE_2, pulse rate before and after exercise. We test whether PULSE_1 differs significantly from PULSE_2. 3S automatically converts the differences between PULSE_1 and PULSE_2 into ranks for the Wilcoxon test.
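The conversion of paired differences into ranks can be sketched as follows. This illustrates the textbook procedure under stated assumptions, not 3S's code: differences are taken as the first variable minus the second, zero differences are dropped, tied absolute differences get midranks, and the function returns the number of nonzero differences and the smaller of the two signed rank sums. The data and function name are hypothetical.

```python
def signed_rank_statistic(first, second):
    """Return (number of nonzero differences, smaller signed rank sum)."""
    # Differences first - second; zero differences are dropped.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    abs_diffs = [abs(d) for d in diffs]
    # Rank |d| from 1..n, giving ties the average of their positions.
    order = sorted(range(len(abs_diffs)), key=lambda i: abs_diffs[i])
    ranks = [0.0] * len(abs_diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs_diffs[order[j + 1]] == abs_diffs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return len(diffs), min(t_plus, t_minus)


# Hypothetical pulse rates before and after exercise for five people.
before = [70, 72, 80, 65, 90]
after = [74, 72, 85, 63, 99]
print(signed_rank_statistic(before, after))  # (4, 1.0)
```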

Input 3S.3

/ INPUT     FILE IS 'exercise.dat'.
            VARIABLES = 6.
            FORMAT IS FREE.
/ VARIABLE  NAMES = id, sex, smoke, age, pulse_1, pulse_2.
/ TEST      VARIABLES = pulse_1, pulse_2.
            SIGN.
            WILCOXON.
/ END

Output 3S.3

[1] 3S computes the sign test for each pair of variables in the TEST paragraph VARIABLES list. We look at the flagged values. The total number of nonzero differences between paired PULSE_1 and PULSE_2 values (40 here) is printed in the first panel of results for the sign test. The number of positive differences appears in the second panel. Here all 40 cases showed an increase in pulse rate after exercise, so there are no positive differences. The third panel reports the level of significance of the sign test, corresponding to a two-sided test of the hypothesis that the + and - signs of the differences are equally probable (each sign has probability 0.5). See Appendix B.18 for more information on significance levels.
[2] 3S computes the Wilcoxon signed-rank test for each pair of variables. The first panel of results lists the number of nonzero differences. The second panel gives the smaller of the sum of ranks for positive differences and the sum of ranks for negative differences. In the third panel 3S reports the level of significance of the Wilcoxon signed-rank test for a two-sided test of the hypothesis that the populations have the same location parameter. See Appendix B.18.
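For the sign test, the number of positive differences among the nonzero ones follows a binomial distribution with probability 0.5 under the null hypothesis. The sketch below computes an exact two-sided binomial p-value. Whether 3S uses the exact binomial or a normal approximation for a sample of this size is an assumption not settled here, so treat the result as an order-of-magnitude check. With 40 nonzero differences and none positive, as in this example, the exact p-value is 2 × 0.5^40, about 1.8 × 10^-12. The function name is hypothetical.

```python
from math import comb


def sign_test_p(n_nonzero, n_positive):
    """Exact two-sided sign test: doubled smaller binomial tail, capped at 1."""
    k = min(n_positive, n_nonzero - n_positive)
    tail = sum(comb(n_nonzero, i) for i in range(k + 1)) / 2 ** n_nonzero
    return min(1.0, 2 * tail)


print(sign_test_p(40, 0))  # about 1.82e-12
```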

Example 3S.4  Friedman's two-way analysis of variance and Kendall's coefficient of concordance

We analyze corrected data from Siegel (1956, p. 233; see Data Set 3S.1), using Friedman's two-way analysis of variance and the Kendall coefficient of concordance. The Friedman test is an extension of the sign test to more than two matched or paired variables. This arrangement of data is known as a randomized block design. The rows are the blocks, and the columns are the treatments. Blocks are formed using matched samples or repeated measures (as here). The null hypothesis is that of no treatment differences (the alternative hypotheses relate to differences in location).
In this example, the data in each row are the relative ranks (from 1 to 20) assigned by staff psychologists and speech therapists to 20 mothers based on effectiveness of child rearing. If the data were scores, 3S would convert them to ranks. We test the hypothesis that there is no difference among the ranks of the mothers.


Data Set 3S.1  Siegel data

                                                   Mothers
       1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
  1    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
  2    5    1   16    8    9    2    6   10    4    3   11   13    7   12   17   18   19   15   14   20
  3    3    2    7    5   14    9   15   16    6   11    8   10    1    4   19   12   20   13   17   18
  4    8    3   10   11    4    2    5   13    9    1   14    7    6   15   16   12   19   17   18   20
  5    2    1   16    8   15    4    6    9    7   10 11.5    5    3   17 11.5   14   19   18   13   20
  6   16   17    5   13   15   11    7    4    9    2   18    3    6    1   19   12   10    8   14   20
  7   12    9   14    6    7    2    3   10    5    4   17    8    1   15   13   16   18   11   20   19
  8   11    2   13   10    7    3    4   14    6    5   17    9    1   12    8   16   20   15   18   19
  9  9.5    2   15    6    5    7    8   11  9.5    3   13    4    1   14   12   15   20   19   17   18
 10    2    4   16    3   10    6   14   17   15    7   19    9    1    8    5   13   11   18   12   20
 11   11   14   12    8    7    2    5   10    3    4   13    9    1   18    6   15   19   16   17   20
12
13
8 -
3
13
13
3
2
5
8
2
1
14
9
9
12
6
4
10
6
15
14
11
10
19
11
4
7
7
15
12
18
18
16
17
17
16
19
20
20
Note: Each case contains the rankings of one judge for all twenty mothers. The identification number of the
judge is not recorded in the file.

Input 3S.4

/ INPUT     FILE = 'siegel.dat'.
            VARIABLES = 20.
            FORMAT = FREE.
/ TEST      FRIEDMAN.
/ END

Output 3S.4

[1] Within each case, 3S ranks the observations or scores across the variables (mothers). A case corresponds to a judge or test. For each variable 3S prints the sum of the ranks. Since variable names were not included in the input, the variables are labeled X(1) through X(20).
[2] 3S next reports the value of the Friedman test statistic and its level of significance. Here the Friedman statistic is significant, suggesting consistent differences in child-rearing effectiveness between mothers. We could use COMPARE to determine which pairs of mothers differ significantly. See Appendix B.18.
[3] The Kendall coefficient of concordance is a normalization of the Friedman statistic and has the same level of significance. The Kendall coefficient can range from 0 to 1.
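The Friedman statistic and its normalization into Kendall's W can be sketched with the textbook formulas, without a tie correction. With b blocks (judges) and k treatments (mothers), and R_j the rank sum for treatment j, the Friedman statistic is 12 ΣR_j² / (b k (k + 1)) - 3b(k + 1), and W = Friedman / (b(k - 1)). Because W is a monotone rescaling, it shares the Friedman test's significance level. Data Set 3S.1 contains tied ranks (for example 11.5 and 9.5), so 3S's tie-corrected values can differ from this simplified version. The small data sets and function names below are hypothetical.

```python
def rank_row(scores):
    """Rank one block's scores 1..k (assumes no ties within the block)."""
    position = {s: r for r, s in enumerate(sorted(scores), start=1)}
    return [position[s] for s in scores]


def friedman_and_kendall_w(blocks):
    """Return (rank sums per treatment, Friedman chi-square, Kendall's W)."""
    ranked = [rank_row(row) for row in blocks]  # scores become within-block ranks
    b, k = len(ranked), len(ranked[0])
    rank_sums = [sum(row[j] for row in ranked) for j in range(k)]
    chi2 = 12 * sum(r * r for r in rank_sums) / (b * k * (k + 1)) - 3 * b * (k + 1)
    w = chi2 / (b * (k - 1))
    return rank_sums, chi2, w


# Three hypothetical judges scoring four subjects in the same order.
print(friedman_and_kendall_w([[10, 20, 30, 40], [1, 5, 7, 9], [3, 4, 8, 12]]))
# ([3, 6, 9, 12], 9.0, 1.0): perfect agreement gives W = 1
```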

Example 3S.5  Kendall and Spearman rank correlations

The Kendall and Spearman correlations estimate the association between two variables based on the ranks of the observations. They are appropriate for data whose observations can be ranked, whether or not an exact numerical value can be assigned. The two correlations are equally powerful but are scaled differently. The Spearman correlation coefficient and level of significance for matched data are also provided by 3D (see Example 3D.5). When the variables are categorical you may use 4F, which also computes standard errors for the correlations.
We use the Werner blood chemistry data (Appendix D) to illustrate the Kendall and Spearman rank correlations.
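Both coefficients can be computed directly from ranks. The sketch below uses the standard definitions: Spearman's rs as the Pearson correlation of the midranks, and Kendall's τb as (C - D) / sqrt((n0 - n1)(n0 - n2)), where C and D count concordant and discordant pairs, n0 = n(n - 1)/2, and n1 and n2 count pairs tied on each variable. It is an O(n²) illustration with hypothetical data and function names, not 3S's implementation, and it assumes neither variable is constant.

```python
from math import sqrt


def midranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman's rs: the Pearson correlation of the (mid)ranks."""
    return pearson(midranks(x), midranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, examining every pair of observations once."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    return (concordant - discordant) / sqrt((n0 - tied_x) * (n0 - tied_y))


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(spearman_rs(x, y), kendall_tau_b(x, y))  # 0.8 0.6
```

The two coefficients measure the same ordering agreement on different scales, which is why their values differ even on identical data.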

Input 3S.5

/ INPUT     FILE = 'werner.dat'.
            VARIABLES = 9.
            FORMAT = FREE.
/ VARIABLE  NAMES = id, age, height, weight, brthpill,
                    cholstrl, albumin, calcium, uricacid.
            MAXIMUM = (cholstrl)400.
            MINIMUM = (cholstrl)150.
            LABEL = id.
/ TEST      KENDALL.
            SPEARMAN.
/ END

Output 3S.5

[1] 3S reads 188 cases, of which 180 are complete. All correlations are based on
those 180 pairs. To use all data for each pair, state NO DELCASE. If there is a
considerable amount of missing data, see program AM, which has several
options for analyzing incomplete data.
[2] 3S calculates the Kendall rank correlation coefficients for all possible pairs
of variables and reports the coefficients in matrix format.
[3] The Spearman rank correlation coefficients are printed in the same format
as the Kendall statistics.


Special Features

Using all available data: NO DELCASE

When NO DELCASE is in effect, each test eliminates only cases with missing or out-of-range values for the variables needed for that test. When you are performing tests on a number of variables, NO DELCASE maximizes the number of cases available for each test, but this means that the tests may not all be based on the same cases. This differs from a VARIABLE USE list: a USE list includes all variables being tested in a problem, and only cases with acceptable values for all of these variables are used. If NO DELCASE is used with KENDALL and SPEARMAN correlations, the correlation for each pair of variables may be based on a different number and combination of cases, reducing your ability to compare levels of correlation.
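The difference between the default (complete cases only) and NO DELCASE (available cases for each test) shows up clearly on a toy data set. In this hypothetical Python sketch, None marks a missing value and range limits are ignored; the printed case indices show how each test can end up using a different set of cases.

```python
# Hypothetical cases; None marks a missing value.
cases = [
    {"age": 22, "cholstrl": 180, "height": 64},
    {"age": 31, "cholstrl": 210, "height": None},
    {"age": 40, "cholstrl": None, "height": 66},
    {"age": 52, "cholstrl": 250, "height": 62},
]


def usable_cases(cases, variables):
    """Indices of cases with a value present for every listed variable."""
    return [i for i, c in enumerate(cases) if all(c[v] is not None for v in variables)]


all_vars = ["age", "cholstrl", "height"]
for test_vars in (["age", "cholstrl"], ["age", "height"]):
    default = usable_cases(cases, all_vars)     # complete cases only (DELCASE)
    available = usable_cases(cases, test_vars)  # cases usable for this test (NO DELCASE)
    print(test_vars, default, available)
# ['age', 'cholstrl'] [0, 3] [0, 1, 3]
# ['age', 'height'] [0, 3] [0, 2, 3]
```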

3S Commands

/ INPut

The INPUT paragraph is required for the first problem in each run. It is described in detail in Chapter 3. An additional command for 3S is DELCASE.

DELCASE.    NO DELCASE.
State NO DELCASE if you want to use all available data for each variable tested. By default, 3S uses only cases that are complete for all variables (no values are missing or outside any specified range limits).

/ GROUP

See Chapter 5 for a description of GROUP commands.

New Syntax
VARiable = variable.    VAR = BRTHPILL.
Required when KRUSKAL is specified in the TEST paragraph. State the name or number of a variable used to classify the cases into groups. If you prefer, you can still specify a grouping variable with the GROUPING command in the VARIABLE paragraph as described in the 1990 BMDP Manual. If the grouping variable takes on more than ten distinct values or codes, CODES or CUTPOINTS must be specified in the GROUP paragraph (see Chapter 5).

/ TEST

The TEST paragraph is required to specify the statistics to compute. It may be repeated after END for additional analyses of the same data.

WILcoxon.    - Wilcoxon signed-rank test
KRUskal.     - Kruskal-Wallis one-way analysis of variance and Mann-Whitney rank-sum test
SIGN.        - Sign test
FRIEDman.    - Friedman's two-way analysis of variance and Kendall's coefficient of concordance
KENDall.     - Kendall rank correlation, τb
SPEARman.    - Spearman rank correlation, rs
KS.          - Kolmogorov-Smirnov test (New Feature)

Direct 3S to compute the specified tests. When KRUSKAL is specified, the other statistics are not computed, even if specified.

VARiables = list.    VAR = CHOLSTRL, AGE.
List the names or numbers of variables to be included in the analysis. By default, 3S uses all variables except the GROUP and LABEL variables.

COMPare.    COMP.
Use COMPARE with KRUSKAL or FRIEDMAN to obtain multiple comparisons for the Kruskal-Wallis and Friedman tests. 3S will compare every possible pair of groups.

TITLE = 'text'.    TITLE = 'PRE VERSUS POST'.
Specify a title to print at the top of each output page. By default, no title is printed.

Order of Instructions indicates required paragraph


/ INPUT
/ VARIABLE
/ GROUP
/ TRANSFORM
/ SAVE         Repeat for additional problems.
/ TEST         See Multiple Problems, Chapter 10.
/ PRINT
/ END
data
/ TEST         Repeat for subproblems.
/ END

Summary Table for Commands Specific to 3S

Paragraphs
  Commands               Defaults                       See

/ INPut
  NO DELCASE.            DELCASE.                       3S.1
/ GROUP
  VARiable = variable.   no grouping variable;          3S.1
                         required for KRUSKAL
/ TEST                   no tests performed             3S.1
  VARiables = list.
  KRUskal.               NOKRU.                         3S.1
  COMPare.               NOCOMP.                        3S.2
  SIGN.                  NO SIGN.                       3S.3
  WILcoxon.              NOWIL.                         3S.3
  FRIEDman.              NO FRIED.                      3S.4
  KENDall.               NOKEND.                        3S.5
  SPEARman.              NO SPEAR.                      3S.5
  KS.                    NOKS.
  TITLE = 'text'.        no title printed

Key: ... Required paragraph


Frequently used paragraph or command

-
Value retained for multiple problems
Default reassigned
