
Marketing Research, 2nd Edition
Alan T. Shao
Copyright 2002 by South-Western
CHAPTER 16
BIVARIATE STATISTICS:
PARAMETRIC TESTS

What The Experts Say
"What's the point in doing surveys if you can't analyze
the data? Converting and reducing data into
meaningful results is a marketing researcher's key
responsibility."
--SPSS Web Page, "Analysis,"
http://www.spss.com/spssmr/solutions/analysis.htm, February 19, 2001.

Learning Objectives
Discuss the importance of parametric statistics
Describe the difference between tests of differences and
tests of associations
Explain how to use z- and t-tests to compare two groups
Describe and calculate the F-test
Discuss the meaning and use of analysis of variance
Describe correlation and regression analyses
Calculate and interpret correlation and regression statistics
Compute one-way analysis of variance manually and by
computer

Get This! My Name Is Important to Me
In the book How to Win Friends and Influence People, Dale
Carnegie wrote, "Remember that a person's name is to that
person the sweetest and most important sound in any language."
A professor classified students into three groups: names (those
he could remember), no-names (those he could not remember),
and neutral-names (those whose names he never made reference
to during the conversations).
At the end of a meeting with each student, the professor would
state, "Oh, I have to ask you something else. My wife is selling
cookies for the church. If you want any, they're only 25 cents."
This offer was made to examine whether remembrance of a student's
name made a difference regarding whether or not he or she
would comply with a request (that is, purchase the cookies).

Get This! My Name Is Important to Me (cont'd)
The results were analyzed using several different
statistical techniques, one being analysis of variance.
He found:
Not being able to remember a student's name produced
compliance results (that is, purchasing the cookies) no
different from those of a condition in which the issue of
a student's name was never raised.
The higher purchasing rate for those students whose
names were remembered indicates that name
remembrance facilitates compliance.

Now Ask Yourself
Based on your knowledge of statistics, do you have faith in the
findings since various statistical tools were used to analyze the
data? Was it really necessary for the researchers to run statistical
tests to generate their findings?
What was meant by, "The professor decided to use this method
[analysis of variance] since it tests whether there are statistically
significant differences among the means of each of the student
groups"?
Were the results surprising to you? If so, what did you expect?
If not, why not?

Parametric Tests
The hypothesis tests assume that variables under
investigation are measured using either interval or ratio
scales. Furthermore, it is necessary to make some
additional assumptions:
The sample data should be randomly drawn from a
normally distributed population.
The sample data drawn must be independent of each
other.
When examining central tendency for which two or
more samples are drawn, the populations should have
equal variances.

Tests of Difference
Can be used whenever a researcher is interested in
comparing some characteristic of one group with a
characteristic of another and determining whether or not a
significant difference exists between the two groups.
the first population and its samples are identified by subscript 1
the second population and its samples are identified by subscript 2
$\bar{X}_1$ represents the mean of the sample drawn from population 1
$\bar{X}_2$ represents the mean of the sample drawn from population 2

Z-test: Difference Between Means
Used to determine whether two population means differ from
each other. This can be determined by using either the z-test or
t-test, depending on the sample size and whether or not the
population standard deviation is known for either group.
If the sample size is at least 30 and the population standard
deviations are known, the z-test should be used.

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{(\bar{x}_1 - \bar{x}_2)}}$$ [Formula 16-1]

where $(\bar{X}_1 - \bar{X}_2)$ = the difference between sample means
$(\mu_1 - \mu_2)$ = the difference between population means
$\bar{X}_1$ and $\bar{X}_2$ = sample means for the two variables
$\sigma_{(\bar{x}_1 - \bar{x}_2)}$ = standard error of the difference between the means

where

$$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
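The z statistic for the difference between two means can be sketched in a few lines of Python. This is only an illustration: the function name `two_sample_z`, the argument names, and the sample figures are hypothetical and do not come from the chapter. It assumes both population standard deviations are known and uses the standard normal distribution for a two-sided p-value.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """Return (z, two-sided p-value) for the difference between two means.

    Assumes the population standard deviations sigma1 and sigma2 are known.
    """
    # Standard error of the difference: sqrt(sigma1^2/n1 + sigma2^2/n2)
    std_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    # z = ((X1bar - X2bar) - (mu1 - mu2)) / standard error
    z = ((mean1 - mean2) - hypothesized_diff) / std_error
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical figures chosen only to make the arithmetic easy to follow
z, p = two_sample_z(mean1=52.0, mean2=50.0, sigma1=6.0, sigma2=8.0, n1=36, n2=64)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

With these made-up inputs each variance term equals 1, so the standard error is the square root of 2 and the difference of 2 points is not significant at the 0.05 level.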

t-Test: Difference Between Means
When the sample size is less than 30 and the population
standard deviations are unknown, we can use the t-test to
determine whether or not a significant difference exists between
two means (or whether the two population means are equal).

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{(\bar{x}_1 - \bar{x}_2)}}$$

where

$$s_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\left(\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\right)\left(\frac{n_1 + n_2}{n_1 n_2}\right)}$$
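A minimal Python sketch of the t statistic above follows. The function name `two_sample_t` and the two small samples are hypothetical. One assumption is worth stating: the formula multiplies each sample variance by its full sample size, so the sketch takes each $s^2$ to be computed with divisor n (`statistics.pvariance`), which makes $n s^2$ the sum of squared deviations. No p-value is computed, because the standard library has no t distribution; the statistic would be compared with a tabled critical value at the returned degrees of freedom.

```python
import math
from statistics import mean, pvariance


def two_sample_t(sample1, sample2, hypothesized_diff=0.0):
    """Return (t, degrees of freedom) for the difference between two means."""
    n1, n2 = len(sample1), len(sample2)
    # pvariance divides by n, so n * pvariance is the sum of squared deviations
    pooled_variance = (n1 * pvariance(sample1) + n2 * pvariance(sample2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means
    std_error = math.sqrt(pooled_variance * (n1 + n2) / (n1 * n2))
    t = ((mean(sample1) - mean(sample2)) - hypothesized_diff) / std_error
    return t, n1 + n2 - 2


# Hypothetical small samples (fewer than 30 observations each)
group1 = [12, 15, 11, 14, 13]
group2 = [10, 9, 12, 11, 8]
t, df = two_sample_t(group1, group2)
print(f"t = {t:.3f} with {df} degrees of freedom")
```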

Difference Between
Two Proportions and Independent Samples
Let $p_1$ and $p_2$ be the proportions of two samples drawn from
respective populations with proportions $P_1$ and $P_2$. The null
hypothesis is that there is no difference between the two
population proportions; that is, $P_1 = P_2$ or, stated another way,
$P_1 - P_2 = 0$. If the null hypothesis is true, $P_1 = P_2$, the two
populations are really the same population.
The basic concept concerning the difference between two sample
proportions is analogous to that concerning the difference
between two sample means.
1. The mean of the sampling distribution of $(p_1 - p_2)$ is equal to
the difference between the two population proportions, $P_1$
and $P_2$, or $\mu_{(p_1 - p_2)} = P_1 - P_2$.

Difference Between
Two Proportions and Independent Samples (cont'd)
2. The variance of the difference between two sample
proportions is the sum of variances of the two sample
proportions,

$$\sigma^2_{(p_1 - p_2)} = \sigma^2_{p_1} + \sigma^2_{p_2} = \frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}$$

where
$Q_1 = 1 - P_1$
$Q_2 = 1 - P_2$
When the sampling distributions of $p_1$ and $p_2$ are normal, the
distribution of the differences between $p_1$ and $p_2$ is also
normal. Since the mean of the sampling distribution of $p_1 - p_2$
is equal to the difference between the two population
proportions, the statistic that follows is normally distributed.

$$z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sigma_{(p_1 - p_2)}}$$

Difference Between
Two Proportions and Independent Samples (cont'd)
$p_1$ = sample proportion of successes in first group
$p_2$ = sample proportion of successes in second group
$P_1$ = population proportion of first group
$P_2$ = population proportion of second group
$\sigma_{(p_1 - p_2)}$ = standard error of the difference between two sample proportions (the square root of its variance)
When $P_1 = P_2$, $P_1 - P_2 = 0$ and $P_1 Q_1 = P_2 Q_2 = PQ$, where $Q = 1 - P$. Thus

$$z = \frac{p_1 - p_2}{\sigma_{(p_1 - p_2)}}$$

where

$$\sigma_{(p_1 - p_2)} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}} = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
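The test of two proportions under the null hypothesis $P_1 = P_2$ can be sketched as follows. The slides do not say how the common proportion P is obtained when it is unknown; this sketch assumes the usual pooled estimate (total successes divided by total sample size), and that choice is an assumption of the example rather than something stated in the deck. The function name `two_proportion_z` and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(successes1, n1, successes2, n2):
    """Return (z, two-sided p-value) for H0: P1 = P2."""
    p1, p2 = successes1 / n1, successes2 / n2
    # Assumption: P is unknown, so it is estimated from the pooled samples
    p_pooled = (successes1 + successes2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    # Standard error: sqrt(PQ * (1/n1 + 1/n2))
    std_error = math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 60 of 100 buyers in group 1, 45 of 100 in group 2
z, p_value = two_proportion_z(60, 100, 45, 100)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```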

Analysis of Variance
The two tests (z-tests and t-tests) are useful when testing a null
hypothesis when only two samples are involved. Analysis of
Variance (ANOVA) is often the preferred method to test whether
there is a significant difference among means of two or more
independent samples. It is applicable whenever a study involves
an interval- or ratio-scaled dependent variable.
One-Way Analysis of Variance is discussed in this chapter. It is
a bivariate statistical technique that involves only one
independent variable, although there may be multiple levels of
that variable.
The null hypothesis for ANOVA is that the means of normally
distributed populations, such as three populations, a, b, c, are
equal, or $\mu_a = \mu_b = \mu_c$.

Analysis of Variance (cont'd)
If we take a random sample from each of the three original
populations, we may consider the three samples as subsets of a
single large sample drawn from the single large population.

$$\text{Grand mean} = \bar{\bar{X}} = \frac{\sum X_a + \sum X_b + \sum X_c}{4 + 5 + 4} = \frac{\sum X}{13}$$

The unbiased estimate of the large population variance ($\sigma^2$) based
on the preceding samples may be obtained by calculating the
variance between groups [MSA ($s_1^2$)] and the variance within
groups [MSE ($s_2^2$)].

Variance Between Groups
The variance between groups (or between samples) is also referred to as
the mean sum of squares between (among) groups. It is sometimes
denoted as MSA or $s_1^2$. It is written in a general form:

$$s_1^2 = MSA = \frac{SS_{between}}{df} = \frac{\sum n_i (\bar{X}_i - \bar{\bar{X}})^2}{r - 1}$$

where
i = individual groups or samples a, b, c, ...
$n_i$ = size of group i, or size of sample drawn from population i, such
as $n_a = 4$, $n_b = 5$, $n_c = 4$ in the preceding illustration
$\bar{X}_i$ = mean of the items in group or sample i
$\bar{\bar{X}}$ = grand mean, or mean of all items in the single large sample
$\bar{X}_i - \bar{\bar{X}}$ = deviation of group mean from grand mean
$(\bar{X}_i - \bar{\bar{X}})^2$ = variation, or squared deviation (The term variation has been
used loosely in previous discussions. Here, the term is limited
to represent the squared deviation.)
r = number of groups or samples, such as three groups in the above
illustration

Variance Between Groups (cont'd)
Note that the deviation $\bar{X}_i - \bar{\bar{X}}$ is called the effect, and the
nature of the sample i is called the treatment. Furthermore,
whenever ANOVA is used, the independent variables are
called factors, so the different levels (or categories) of a
factor are the treatments.

Variance Within Groups
The variance within groups (or within individual samples) is
also referred to as the mean square error (MSE) or $s_2^2$, since
it is an estimate of the random error existing in the data. It is
written in a general form

$$s_2^2 = MSE = \frac{SS_{within}}{df} = \frac{\sum \left[ \sum (X_i - \bar{X}_i)^2 \right]}{n - r}$$

where $X_i$ = individual items in group i
$n = n_a + n_b + n_c$ = number of items in the single large sample

F-Test
The F-statistic is the variance between groups divided by the
variance within groups. It is used to test for group differences and
compares one sample variance with another sample variance. It can
be presented this way:

$$F = \frac{s_1^2}{s_2^2}$$

where the subscripts 1 (in the numerator) and 2 (in the
denominator) indicate the sample numbers and each represents the
estimate of the population variance based on the sample.
F represents the variance ratio, showing the relationship between
the two independently estimated population variances:

$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}} = \frac{MSA}{MSE}$$
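The one-way ANOVA quantities defined on the preceding slides (MSA, MSE, and F) can be computed by hand or with a short script. The sketch below is illustrative only: the function name `one_way_anova` and the three samples are made up, with group sizes 4, 5, and 4 chosen to mirror the sizes mentioned in the slides. It stops at the F ratio, since the Python standard library has no F distribution; the ratio would be compared with a tabled critical value at (r - 1, n - r) degrees of freedom.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (MSA, MSE, F) for a list of samples, one list per group."""
    all_items = [x for group in groups for x in group]
    n, r = len(all_items), len(groups)
    grand_mean = mean(all_items)
    # Between groups: sum of n_i * (group mean - grand mean)^2, df = r - 1
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: sum of (item - its own group mean)^2, df = n - r
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    msa = ss_between / (r - 1)
    mse = ss_within / (n - r)
    return msa, mse, msa / mse


# Hypothetical data for three groups a, b, c with sizes 4, 5, 4
groups = [[4, 5, 6, 5], [7, 8, 9, 8, 8], [5, 6, 7, 6]]
msa, mse, f_ratio = one_way_anova(groups)
print(f"MSA = {msa:.3f}, MSE = {mse:.3f}, F = {f_ratio:.3f}")
```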

Tests of Associations
Examine associations between two or more variables.
When two groups are studied, there will always be a
variable that predicts the actions of another variable.
The predictor variable is the independent variable, and
the criterion variable is the dependent variable.
Tests to measure statistical relationships between
variables are:
Regression Analysis
Correlation Analysis

Scatter Diagrams
When two related variables, called bivariate data, are
plotted as points on a graph, the graph is called a scatter
diagram. A scatter diagram indicates whether the
relationship between the two variables is positive or
negative.

Regression Analysis
Refers to statistical techniques for measuring the linear or curvilinear
relationship between a dependent variable and one or more
independent variables. The relationship between two variables is
characterized by how they vary together.
Given pairs of X and Y variables, regression analysis measures the
direction (positive or negative) and rate of change (slope) in Y as X
changes, or vice versa. Using the values of the independent variable, it
attempts to predict the values of an interval- or ratio-scaled dependent
variable.
Regression analysis requires two operations: (1) Derive an equation,
called the regression equation, and a line representing the equation to
describe the shape of the relationship between the variables. (2)
Estimate the dependent variable (Y) from the independent variable (X),
based on the relationship described by the regression equation.
The regression line is the line drawn through a scatter diagram that
best fits the data points and most accurately describes the
relationship between the two variables.

Regression Equation and Regression Line
While all shapes are informative, a straight line is especially useful,
because it is the easiest to deal with in regression analysis to
describe the shape of the average relationship between two
variables. The straight line can be expressed by the linear equation:
$$Y_c = a + bX$$

where $Y_c$ = computed value of the dependent variable
a = Y-intercept where X equals zero
b = slope of the regression line, which is the increase or decrease
in Y for each change of one unit of X
X = a given value of the independent variable

Regression Equation and Regression Line (cont'd)
To create a regression model, researchers estimate the regression line
using the following equation:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where $\beta_0$ = Y-intercept where X equals zero
$\beta_1$ = slope of the regression line, which is the increase or
decrease in Y for each change of one unit of X
$X_i$ = a given value of the independent variable
i = observation number
$\varepsilon_i$ = error term associated with the ith observation

Least-Squares Method
A statistical technique that fits a straight line to a scatter
diagram by finding the smallest sum of the vertical distances
squared (i.e., $\sum e_i^2$) of all the points from the straight line. The
equation derived by this method will yield a regression line that
best fits the data.
To calculate the straight line by the least-squares method, the
equation $Y_c = a + bX$ is used. We must first determine the
constants, a and b, which are called regression coefficients.
Regression coefficients are the values that represent the effect of
the individual independent variables on the dependent variable.

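Least-Squares Method (cont'd)

$$b = \frac{n \sum XY - \left(\sum X\right)\left(\sum Y\right)}{n \sum X^2 - \left(\sum X\right)^2}$$

$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$

or

$$a = \bar{Y} - b\bar{X}$$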
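The two least-squares formulas translate directly into code. The sketch below is a plain-Python illustration under stated assumptions: the function name `least_squares` and the five (X, Y) pairs are hypothetical and are not taken from the chapter.

```python
def least_squares(xs, ys):
    """Return the regression coefficients (a, b) of the line Y_c = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = (n*sum(XY) - sum(X)*sum(Y)) / (n*sum(X^2) - (sum(X))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # a = mean(Y) - b * mean(X)
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(f"Y_c = {a:.2f} + {b:.2f}X")
```

For these made-up points the sums are $\sum X = 15$, $\sum Y = 20$, $\sum XY = 66$, and $\sum X^2 = 55$, so $b = 30/50 = 0.6$ and $a = 4 - 0.6(3) = 2.2$.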

Standard Deviation of Regression
The standard deviation of the Y values from the regression line ($Y_c$)
is called the standard deviation of regression. It is also popularly
called the standard error of estimate, since it can be used to
measure the error of the estimates of individual Y values based on
the regression line. Thus
$s_y$ = the standard deviation of Y values from the mean $\bar{Y}$
$s_x$ = the standard deviation of X values from the mean $\bar{X}$
$s_{yx}$ = the standard deviation of regression of Y values from $Y_c$
$s_{xy}$ = the standard deviation of regression of X values from $X_c$

Standard Deviation of Regression (cont'd)
The standard deviation of Y values from the regression line is based on the points
representing Y values scattered around the least-squares line. The closer the
points to the line, the smaller the value of the standard deviation of regression.
Thus, the estimates of Y values based on the line are more reliable. On the other
hand, the wider the points are scattered around the least-squares line, the larger
the standard deviation of regression and the smaller the reliability of the
estimates based on the line or the regression equation. The general formula for
the standard deviation of regression of Y values on X is

$$s_{yx} = \sqrt{\frac{\sum (Y - Y_c)^2}{n - k}}$$

where k = number of total (dependent and independent) variables. However, a
simpler method of computing $s_{yx}$ is to use the following formula

$$s_{yx} = \sqrt{\frac{\sum Y^2 - a \sum Y - b \sum XY}{n - k}}$$
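Both formulas for the standard deviation of regression give the same value when a and b are the least-squares coefficients, and a short script can confirm this. The sketch is illustrative: the function names and data are hypothetical, and k defaults to 2 on the assumption of one dependent and one independent variable.

```python
import math


def fit_line(xs, ys):
    """Least-squares coefficients (a, b) for Y_c = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return sum_y / n - b * sum_x / n, b


def std_error_of_estimate(xs, ys, k=2):
    """Return s_yx computed two ways: from residuals and from the shortcut formula."""
    a, b = fit_line(xs, ys)
    n = len(xs)
    # General formula: sqrt(sum((Y - Y_c)^2) / (n - k))
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    direct = math.sqrt(ss_residual / (n - k))
    # Shortcut: sqrt((sum(Y^2) - a*sum(Y) - b*sum(XY)) / (n - k))
    numerator = (sum(y * y for y in ys) - a * sum(ys)
                 - b * sum(x * y for x, y in zip(xs, ys)))
    shortcut = math.sqrt(numerator / (n - k))
    return direct, shortcut


# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
direct, shortcut = std_error_of_estimate(xs, ys)
print(f"s_yx (residuals) = {direct:.4f}, s_yx (shortcut) = {shortcut:.4f}")
```

The shortcut subtracts large sums, so with a near-perfect fit rounding can push its numerator slightly below zero; the residual form is the safer one to compute by machine.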

Correlation Analysis
Correlation Analysis: Refers to the statistical
techniques for measuring the closeness of the
relationship between two metric (interval- or ratio-
scaled) variables. It measures the degree to which
changes in one variable are associated with changes in
another. The computation concerning the degree of
closeness is based on regression statistics.

Total Deviation, Coefficient of Determination,
and Correlation Coefficient
Total Deviation ($Y - \bar{Y}$). Assume there are two variables, X and Y. The
mean of Y values, $\bar{Y} = (\sum Y)/n$, is obtained without referring to X values.
The $Y_c$, representing the regression line of Y values = a + bX, is obtained
with the influence of X values. If Y values are related to X values to some
degree, the deviations of Y values from $\bar{Y}$ must be reduced somewhat by
the introduction of X values in computing $Y_c$ values. The total deviation of
Y from the mean $\bar{Y}$ is divided into two parts:
Total deviation = Unexplained deviation + Explained deviation

$$Y - \bar{Y} = (Y - Y_c) + (Y_c - \bar{Y})$$


Total Deviation, Coefficient of Determination,
and Correlation Coefficient (cont'd)
The explained variation $\sum (Y_c - \bar{Y})^2$ may also be referred to as the
regression sum of squares (RSS). The unexplained variation $\sum (Y - Y_c)^2$
is called the error sum of squares (ESS). This relationship may be
expressed as

Total variation = Unexplained variation + Explained variation
TSS = ESS + RSS

$$\sum (Y - \bar{Y})^2 = \sum (Y - Y_c)^2 + \sum (Y_c - \bar{Y})^2$$

Coefficient of Determination ($r^2$)
The coefficient of determination ($r^2$) is the strength of association or
degree of closeness of the relationship between two variables measured by
a relative value. It demonstrates how well the regression line fits the
scattered points. It may be defined as the ratio of the explained variation to
the total variation:

$$\text{Coefficient of determination} = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{RSS}{TSS}$$

or symbolically,

$$r^2 = \frac{\sum (Y_c - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$

The range of the $r^2$ value is therefore from 0 to 1. When $r^2$ is close to 1, the Y
values are very close to the regression line. When $r^2$ is close to 0, the Y values
are not close to the regression line.
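The split of total variation into explained and unexplained parts, and the resulting coefficient of determination, can be checked numerically. The following is a sketch under stated assumptions: the function names and the five data points are hypothetical, and the line is fitted by least squares so that TSS = ESS + RSS holds.

```python
def fit_line(xs, ys):
    """Least-squares coefficients (a, b) for Y_c = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return sum_y / n - b * sum_x / n, b


def variation_breakdown(xs, ys):
    """Return (TSS, ESS, RSS, r_squared) for a simple linear regression."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    predicted = [a + b * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)                     # total variation
    ess = sum((y - yc) ** 2 for y, yc in zip(ys, predicted))    # unexplained variation
    rss = sum((yc - y_bar) ** 2 for yc in predicted)            # explained variation
    return tss, ess, rss, rss / tss


# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
tss, ess, rss, r_squared = variation_breakdown(xs, ys)
print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, RSS = {rss:.2f}, r^2 = {r_squared:.2f}")
```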


Correlation Coefficient
The correlation coefficient, the square root of $r^2$, or $r = \pm\sqrt{r^2}$,
is frequently computed to indicate the direction of the
relationship in addition to indicating the degree of the
relationship. The sign of r matches the sign of the slope b.
It is the correlation between the observed and predicted values
of the dependent variable. Since the range of $r^2$ is from 0 to 1,
the coefficient of correlation r will vary within the range of
$\pm\sqrt{0}$ to $\pm\sqrt{1}$, or from 0 to $\pm 1$ (that is, between $-1$ and $+1$).
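To show how the sign is attached to the square root of $r^2$, the sketch below fits a line with a negative slope and gives r the same sign as b. The names and data are hypothetical, and taking the sign from the slope is the usual convention for simple linear regression rather than a step spelled out on the slide.

```python
import math


def correlation_from_regression(xs, ys):
    """Return (b, r): the least-squares slope and r = +/- sqrt(r^2), signed like b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    y_bar = sum_y / n
    # r^2 = explained variation / total variation
    rss = sum((a + b * x - y_bar) ** 2 for x in xs)
    tss = sum((y - y_bar) ** 2 for y in ys)
    r = math.copysign(math.sqrt(rss / tss), b)
    return b, r


# Hypothetical data in which Y tends to fall as X rises
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 5, 4, 2]
b, r = correlation_from_regression(xs, ys)
print(f"slope b = {b:.2f}, r = {r:.4f}")
```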

Decision Time!
As a marketing manager, you want information from
marketing researchers that can enhance your decision-
making abilities.
If correlation analysis is a popular and informative
statistical method, why should researchers bother using
more complex, somewhat intimidating bivariate
statistical techniques?
Do you feel that there is really that much to gain from
these methods? Why or why not?

Net Impact
The Internet can be a valuable tool to learn about
bivariate statistical techniques.
Using almost any search engine, you can find a variety
of discussions about the topic.
These discussions may be available on the Internet as
part of a company's promotion of its statistical
services, a university professor's statistical seminar
notes, or PowerPoint slides that were used in a
seminar presentation.

Chapter 16
End of Presentation
