
Nonparametric Testing

Dr. Jayesh N. Desai


BRCM College of Business Administration,
Athwalines, Surat. desaijn@yahoo.com,
principal@brcmbba.org

Hypothesis Testing for Statistical
Inference
Inferences about a population are made on
the basis of results obtained from a sample
drawn from that population.

Want to talk about the larger population from
which the subjects are drawn, not the
particular subjects!
What Do We Test
Effect or Difference we are interested in
Difference in Means
Difference in Proportions
Odds Ratio (OR)
Correlation Coefficient
Some examples
Effect of Advertisement or Sales Promotion
program
Acceptance of product across gender
Acceptance of product across region
Elements of a hypothesis test
Null hypothesis - Statement regarding the
value(s) of unknown parameter(s). Typically
will imply no association between explanatory
and response variables in our applications
(will always contain an equality)
Alternative hypothesis - Statement
contradictory to the null hypothesis (will
always contain an inequality)

H0: μ1 = μ2

HA: μ1 ≠ μ2  (two-sided test)

HA: μ1 > μ2  (one-sided test)

Example Hypotheses
Elements of a hypothesis test
Test statistic - Quantity based on sample
data and null hypothesis used to test
between null and alternative hypotheses
Rejection region - Values of the test statistic
for which we reject the null in favour of the
alternative hypothesis

Why Use Nonparametric Tests?
Parametric hypothesis tests can be used for the
estimation of one or more unknown parameters
(e.g., population mean or variance).

Parametric tests depend on population
parameters for inferences (assumptions which
are often unrealistic):
Probability distribution (normal distribution)
Requires normal or ratio data
Requires homogeneity of variance
Large sample size
McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.
Parametric Tests
Questionnaire
1. Name:
2. Age:_________ Years
3. Gender: Male ( ) Female ( )
4. Income: ______________________
5. Educational Qualification:
( ) B. Com./BBA/ BCA ( ) B. Sc.
( ) B.A. ( ) B.E.
( ) M.B.A. ( ) M.H.R.D.
( ) M.A. ( ) M. Sc.
6. With how many persons do you usually watch a movie: ________
7. Where do you like to watch a movie?
( ) At home ( ) At Multiplexes ( ) Theatres
8. How many movies do you usually watch in a month's time: ________
9. Rate movie types based on your liking: 1 = like most, 5 = dislike most
( ) Comedy 1 2 3 4 5
( ) Thriller 1 2 3 4 5
( ) Love Story 1 2 3 4 5
( ) Theme based 1 2 3 4 5
( ) English fiction 1 2 3 4 5

Why Use Nonparametric Tests?
Nonparametric tests do not rely on data
belonging to any distribution
usually they focus on the sign or rank of the data
rather than the exact numerical value.
do not specify the shape of the parent population.
can often be used in smaller samples.
can be used for ordinal and nominal data.
Nonparametric Tests
Why Use Nonparametric Tests?
Advantages and Disadvantages of
Nonparametric Tests
Why Use Nonparametric Tests?
Some Common Nonparametric Tests
Nonparametric Methods
There is at least one nonparametric
equivalent for each parametric test
These tests fall into several categories
1. Tests of differences between two groups
(independent samples)
2. Tests of differences between more than two groups
(independent samples)
3. Tests of differences between variables (dependent
samples)
4. Tests of relationships between variables
Differences between independent groups
Two samples: compare the mean value of some variable of interest.

Parametric: t-test for independent samples
Nonparametric: Mann-Whitney U test; Wald-Wolfowitz runs test; Kolmogorov-Smirnov two-sample test
Mann-Whitney Test
The Mann-Whitney test is a nonparametric
test that compares two populations.
It does not assume normality.
It is a test for the equality of medians,
assuming
- the populations differ only in centrality,
- equal variances
The hypotheses are
H0: M1 = M2 (no difference in medians)
H1: M1 ≠ M2 (medians differ)
Mann-Whitney Test
Step 1: Sort the combined samples from lowest
to highest.
Step 2: Assign a rank to each value.
If values are tied, the average of the ranks
is assigned to each.
Step 3: The ranks are summed for each column
(e.g., T1, T2).
Step 4: The sum of the ranks T1 + T2 must
equal n(n + 1)/2, where n = n1 + n2.
Performing the Test
Mann-Whitney Test
First, combine the
samples and assign
a rank to each
observation in each
group. For example:
When a tie occurs,
each observation is
assigned the
average of the
ranks.
Performing the Test
Rank  Height (cm)  Gender      Rank  Height (cm)  Gender
1     193          M           9     170          M
2     188          M           10    168          F
3     185          M           11    165          F
4     183          M           12    163          F
5     180          M
6     178          M
7     175          F
8     173          F
Mann-Whitney Test
Next, arrange the data by groups and sum the
ranks to obtain the Tj's.
Remember, ΣTj = n(n + 1)/2.
Performing the Test
Heights of males (cm):   193, 188, 185, 183, 180, 178, 170   (n1 = 7)
Heights of females (cm): 175, 173, 168, 165, 163             (n2 = 5)
Ranks of male heights:   1, 2, 3, 4, 5, 6, 9                 (T1 = 30)
Ranks of female heights: 7, 8, 10, 11, 12                    (T2 = 48)
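The rank sums in the table above can be checked with a short sketch of Steps 1-4 in pure Python (the `avg_rank` helper is illustrative, not part of the text):

```python
# Verify Steps 1-4 on the height data: pool the samples,
# rank from tallest (rank 1) to shortest, and sum ranks per group.
males = [193, 188, 185, 183, 180, 178, 170]
females = [175, 173, 168, 165, 163]

pooled = sorted(males + females, reverse=True)  # rank 1 = tallest

def avg_rank(value, data):
    # tied values receive the average of the positions they occupy
    positions = [i + 1 for i, v in enumerate(data) if v == value]
    return sum(positions) / len(positions)

T1 = sum(avg_rank(v, pooled) for v in males)
T2 = sum(avg_rank(v, pooled) for v in females)
n = len(pooled)
print(T1, T2)                      # 30.0 48.0, matching the table
print(T1 + T2 == n * (n + 1) / 2)  # True: rank sums total n(n+1)/2
```

The final check is exactly Step 4: the two rank sums must always total n(n + 1)/2, which is a useful arithmetic safeguard when ranking by hand.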
Mann-Whitney Test
Step 5: Calculate the mean rank sums,
T1/n1 and T2/n2.
Performing the Test
Step 6: For large samples (n1 ≥ 10, n2 ≥ 10), use a
z test.
Step 7: For a given α, reject H0 if
z < −zα or z > +zα
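The large-sample approach can be sketched via the classical U statistic, which is equivalent to the rank-sum form (the data are the heights from the example; note n1 = 7 is below the large-sample guideline, so the z value here only illustrates the formula):

```python
import math

# Large-sample normal approximation for Mann-Whitney, via the U statistic
males = [193, 188, 185, 183, 180, 178, 170]
females = [175, 173, 168, 165, 163]
n1, n2 = len(males), len(females)

pooled = sorted(males + females)                 # ascending ranks here
rank = {v: i + 1 for i, v in enumerate(pooled)}  # no ties in this data
T1 = sum(rank[v] for v in males)                 # rank sum for sample 1

U = T1 - n1 * (n1 + 1) / 2     # U statistic for sample 1
mu_U = n1 * n2 / 2             # E[U] under H0
sigma_U = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (U - mu_U) / sigma_U
print(round(z, 3))             # 2.517, beyond z = 1.96 at alpha = .05
```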
The Kruskal-Wallis (K-W) test compares c
independent medians, assuming the
populations differ only in centrality.
The K-W test is a generalization of the
Mann-Whitney test and is analogous to a one-factor
ANOVA (completely randomized model).
Groups can be of different sizes if each group
has 5 or more observations.
Populations must be of similar shape but
normality is not a requirement.

Kruskal-Wallis Test
for Independent Samples
Kruskal-Wallis Test
for Independent Samples
First, combine the
samples and assign
a rank to each
observation in each
group. For example:
When a tie occurs,
each observation is
assigned the
average of the
ranks.
Performing the Test
Kruskal-Wallis Test
for Independent Samples
Next, arrange the data by groups and sum the
ranks to obtain the Tj's.
Remember, ΣTj = n(n + 1)/2.
Performing the Test
Kruskal-Wallis Test
for Independent Samples
The hypotheses to be tested are:
H0: All c population medians are the same
H1: Not all the population medians are the same
For a completely randomized design with c
groups, the test statistic is

H = [12 / (n(n + 1))] Σ (Tj² / nj) − 3(n + 1)

where n = n1 + n2 + ... + nc
nj = number of observations in group j
Tj = sum of ranks for group j
Performing the Test
Kruskal-Wallis Test
for Independent Samples
The H test statistic follows a chi-square
distribution with ν = c − 1 degrees of
freedom.
This is a right-tailed test, so reject H0
if H > χ²α or if p-value < α.
Performing the Test
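The H statistic above can be computed directly. Here is a sketch on hypothetical data (three groups of five observations, invented for illustration):

```python
# Kruskal-Wallis H from the formula above, on hypothetical data:
# output rates for three machines, five observations each
groups = [[27, 31, 29, 35, 33],
          [22, 28, 26, 30, 24],
          [34, 38, 36, 40, 32]]

pooled = sorted(v for g in groups for v in g)
rank = {v: i + 1 for i, v in enumerate(pooled)}  # no ties in this data
n = len(pooled)

H = 12 / (n * (n + 1)) * sum(
    sum(rank[v] for v in g) ** 2 / len(g) for g in groups
) - 3 * (n + 1)
print(round(H, 2))  # 9.68
```

With c = 3 groups, ν = c − 1 = 2 and the χ² critical value at α = .05 is 5.991, so H = 9.68 would lead to rejecting H0 for these hypothetical data.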
Differences between dependent groups

Compare two variables measured in the same sample:
Parametric: t-test for dependent samples
Nonparametric: Sign test; Wilcoxon's matched-pairs test

If more than two variables are measured in the same sample:
Parametric: Repeated-measures ANOVA
Nonparametric: Friedman's two-way analysis of variance; Cochran Q
Wilcoxon Signed-Rank Test
The Wilcoxon signed-rank test compares a
single sample median with a benchmark
using only ranks of the data instead of the
original observations.
It is used to compare paired observations.
Advantages are
- freedom from the normality assumption,
- robustness to outliers
- applicability to ordinal data.
The population should be roughly symmetric.
Wilcoxon Signed-Rank Test
To compare the sample median (M) with a
benchmark median (M0), the hypotheses are:
H0: M = M0
H1: M ≠ M0
When evaluating the difference between paired observations, use the
median difference (Md) and zero as the benchmark.
Wilcoxon Signed-Rank Test
Calculate the difference between the paired
observations.
Rank the differences from smallest to largest
by absolute value.
Add the ranks of the positive differences to
obtain the rank sum W.
Wilcoxon Signed-Rank Test
For small samples, a special table is required
to obtain critical values.
For large samples (n > 20), the test statistic is
approximately normal:

z = [W − n(n + 1)/4] / √[n(n + 1)(2n + 1)/24]

Use Excel or Appendix C to get a p-value.
Reject H0 if p-value < α.
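The signed-rank procedure and its large-sample z can be sketched on hypothetical before/after scores (invented for illustration; n = 10 is below the n > 20 guideline, so the z value here only demonstrates the formula):

```python
import math

# Wilcoxon signed-rank W and its large-sample z approximation
before = [70, 64, 60, 55, 68, 62, 50, 66, 73, 57]
after  = [73, 63, 65, 63, 66, 68, 59, 70, 66, 67]

d = [a - b for a, b in zip(after, before) if a != b]  # drop zero diffs
d.sort(key=abs)                                       # rank by |difference|
rank = {abs(x): i + 1 for i, x in enumerate(d)}       # no |d| ties here
W = sum(rank[abs(x)] for x in d if x > 0)             # sum of positive ranks

n = len(d)
z = (W - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
print(W, round(z, 3))  # 45 1.784
```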
The Friedman test determines if c treatments
have the same central tendency (medians) when
there is a second factor with r levels and the
populations are assumed to be the same except
for centrality.
This test is analogous to a two-factor ANOVA
without replication (randomized block design)
with one observation per cell.
The groups must be of the same size.
Treatments should be randomly assigned within
blocks.
Data should be at least interval scale.
Friedman Test for Related Samples
Friedman Test for Related Samples
In addition to the c treatment levels that define
the columns, the Friedman test also specifies r
block factor levels to define each row of the
observation matrix.
The hypotheses to be tested are:
H0: All c populations have the same median
H1: Not all the populations have the same median
Unlike the Kruskal-Wallis test, the Friedman
ranks are computed within each block rather
than within a pooled sample.
Friedman Test for Related Samples
First, assign a rank to each observation within
each row. For example, within each Trial:





When a tie occurs, each observation is
assigned the average of the ranks.
Performing the Test
Friedman Test for Related Samples
Compute the test statistic:

F = [12 / (rc(c + 1))] Σ Tj² − 3r(c + 1)

where r = the number of blocks (rows)
c = the number of treatments (columns)
Tj = the sum of ranks for treatment j

Performing the Test
Friedman Test for Related Samples
Performing the Test
The Friedman test statistic F follows a chi-square
distribution with ν = c − 1 degrees of
freedom.
Reject H0 if F > χ²α or if p-value < α.
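The within-block ranking and the F statistic can be sketched on hypothetical data (five blocks by three treatments, invented for illustration):

```python
# Friedman statistic from the formula above, on hypothetical data:
# 5 blocks (rows) x 3 treatments (columns); rank within each row
rows = [[7.0, 8.5, 6.0],
        [6.5, 9.0, 7.5],
        [5.0, 7.0, 6.0],
        [8.0, 9.5, 7.0],
        [6.0, 8.0, 5.5]]
r, c = len(rows), len(rows[0])

T = [0.0] * c                          # column (treatment) rank sums Tj
for row in rows:
    ordered = sorted(row)              # no within-row ties in this data
    for j, v in enumerate(row):
        T[j] += ordered.index(v) + 1   # rank 1 = smallest in the block

F = 12 / (r * c * (c + 1)) * sum(t * t for t in T) - 3 * r * (c + 1)
print(T, round(F, 2))  # [8.0, 15.0, 7.0] 7.6
```

With ν = c − 1 = 2, the χ² critical value at α = .05 is 5.991, so F = 7.6 would lead to rejecting H0 for these hypothetical data.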

Your Doubts or Queries.
Chi-Square Test for Independence
Contingency Tables
A contingency table is a cross-tabulation of n
paired observations into categories.




Each cell shows the count of observations that
fall into the category defined by its row (r) and
column (c) heading.
Steps in Testing the Hypotheses using Chi-Square Test
Step 1: State the Hypotheses
H0: Variable A is independent of variable B
H1: Variable A is not independent of variable B

Steps in Testing the Hypotheses using Chi-Square Test
Step 2: State the Decision Rule
For a given α, look up the right-tail critical value (χ²R)
from the χ² table, using ν = (r − 1)(c − 1) degrees of freedom.
Reject H0 if the calculated test statistic exceeds χ²R,
or, with the p-value approach, if p-value < α.
Step 3: Construct the Contingency Table
A contingency table is a cross-tabulation of n paired observations
into categories; each cell shows the count of observations
that fall in the cell defined by its row (r) and column (c) heading.
Steps in Testing the Hypotheses using Chi-Square Test
Step 4: Calculate the Expected Frequencies
ejk = RjCk / n

Step 5: Calculate the Test Statistic
The chi-square test statistic is

χ² = Σj Σk (fjk − ejk)² / ejk

where fjk is the observed frequency in row j, column k.

Step 6: Make the Decision
Reject H0 if the test statistic exceeds χ²R, or if the
p-value < α.
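Steps 4-6 can be sketched on a hypothetical 2×2 table (gender vs. product acceptance, counts invented for illustration):

```python
# Chi-square test of independence following Steps 4-6
observed = [[30, 20],    # male:   accept, not accept
            [20, 30]]    # female: accept, not accept

R = [sum(row) for row in observed]        # row totals Rj
C = [sum(col) for col in zip(*observed)]  # column totals Ck
n = sum(R)

chi2 = 0.0
for j, row in enumerate(observed):
    for k, f in enumerate(row):
        e = R[j] * C[k] / n               # expected frequency ejk = RjCk/n
        chi2 += (f - e) ** 2 / e
print(round(chi2, 2))  # 4.0
```

Here ν = (2 − 1)(2 − 1) = 1 and the χ² critical value at α = .05 is 3.841, so χ² = 4.0 would lead to rejecting independence for these hypothetical counts.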
Steps in Testing the Hypotheses using Chi-Square Test
Caution
The chi-square test is unreliable if the expected
frequencies are too small.
Rules of thumb:
Cochran's Rule requires that ejk ≥ 5 for all cells.
If some cells violate this, try combining adjacent rows or
columns to enlarge the expected frequencies.
