
Lecture-3

Cross-tabulation Tables

Cross-tabulation tables (contingency tables) display the relationship between two or more
categorical (nominal or ordinal) variables. The size of the table is determined by the number of
distinct values for each variable, with each cell in the table representing a unique combination of
values. Numerous statistical tests are available to determine whether there is a relationship
between the variables in a table.

What factors affect the products that people buy? The most obvious is probably how much
money people have to spend. In this example, we'll examine the relationship between income
level and PDA (personal digital assistant) ownership.

Using the file demo.sav:

► From the menus choose:

Analyze
Descriptive Statistics
Crosstabs

► Select Income category in thousands (inccat) as the row variable.

► Select Owns PDA (ownpda) as the column variable.

► Click OK to run the procedure.

Md. Abdullah Al Mahmud


Senior Lecturer
Manarat International University
The cells of the table show the count or number of cases for each joint combination of values.
For example, 455 people in the income range $25,000–$49,000 own PDAs.

None of the numbers in this table, however, stands out in a way that would indicate an obvious
relationship between the variables.

Counts vs. Percentages

It is often difficult to analyze a cross-tabulation by looking only at the raw counts in each
cell.

The fact that there are more than twice as many PDA owners in the $25,000–$49,000 income
category as in the under $25,000 category may not mean much (or anything), since there are
also more than twice as many people in that income category.

► Open the Crosstabs dialog box again. (The two variables should still be selected.)

► You can use the Dialog Recall button on the toolbar to quickly return to recently used
procedures.

► Click Cells.

► Click (check) Row in the Percentages group.

► Click Continue and then click OK in the main dialog box to run the procedure.

A clearer picture now starts to emerge. The percentage of people who own PDAs rises as the
income category rises.
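
The switch from counts to row percentages can be sketched in a few lines of Python. This is only an illustration: the counts below are hypothetical placeholders, not the demo.sav results, and the function name row_percentages is an assumed, illustrative choice.

```python
# Hypothetical counts (NOT taken from demo.sav): PDA ownership by income category.
counts = {
    "Under $25": {"No": 90, "Yes": 10},
    "$25 - $49": {"No": 170, "Yes": 30},
    "$50 - $74": {"No": 80, "Yes": 20},
}


def row_percentages(table):
    """Convert each cell count to a percentage of its row total."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        result[row_label] = {
            column: 100.0 * count / row_total for column, count in cells.items()
        }
    return result


percent = row_percentages(counts)
for row_label, cells in percent.items():
    print(row_label, {column: round(value, 1) for column, value in cells.items()})
```

In this made-up table the second category has three times as many owners as the first, yet its row total is also twice as large, so the ownership percentage rises only from 10% to 15%.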

Significance Testing for Cross-tabulations

The purpose of a cross-tabulation is to show the relationship (or lack thereof) between two
variables.
Although there appears to be some relationship between the two variables, is there any reason to
believe that the differences in PDA ownership between different income categories are anything
more than random variation?

A number of tests are available to determine whether the relationship between two cross-tabulated
variables is significant. One of the more common tests is chi-square. One of the advantages of
chi-square is that it is appropriate for almost any kind of categorical data.

► Open the Crosstabs dialog box again.

► Click Statistics.

► Click (check) Chi-square.

► Click Continue and then click OK in the main dialog box to run the procedure.

Pearson chi-square tests the hypothesis that the row and column variables are independent. The
actual value of the statistic isn't very informative.

The significance value (Asymp. Sig.) has the information we're looking for. The lower the
significance value, the stronger the evidence against the hypothesis that the two variables are
independent (unrelated).

In this case, the significance value is so low that it is displayed as .000 (that is, less than
0.0005), which suggests that the two variables are, indeed, related.
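
To see what the Pearson chi-square statistic measures, the calculation can be sketched by hand in Python. This is a minimal illustration under stated assumptions: the 2 x 2 table is hypothetical (not the demo.sav output), the function name is illustrative, and the 5% critical value 3.841 for 1 degree of freedom is taken from standard chi-square tables rather than computed. SPSS reports a significance value instead of comparing against a critical value.

```python
def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, observed_count in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected

    degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical table: rows = two income groups, columns = (no PDA, owns PDA).
observed = [[90, 10], [160, 40]]
statistic, df = pearson_chi_square(observed)

CRITICAL_5PCT_DF1 = 3.841  # from standard chi-square tables (assumed, not computed)
reject_independence = statistic > CRITICAL_5PCT_DF1
print(round(statistic, 3), df, reject_independence)
```

The statistic grows as the observed counts move away from the counts expected under independence, which is why a large value corresponds to a small significance value.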

You can add a layer variable to create a three-way table in which categories of the row and
column variables are further subdivided by categories of the layer variable.

This variable is sometimes referred to as the control variable because it may reveal how the
relationship between the row and column variables changes when you "control" for the effects of
the third variable.

► Open the Crosstabs dialog box again.

► Click Cells.
► Uncheck Row in the Percentages group.

► Click Continue.

► Select Level of Education (ed) as the layer variable.

► Click OK to run the procedure.

If you look at the cross-tabulation table, it might appear that the only thing we have
accomplished is to make the table larger and harder to interpret.

But if you look at the table of chi-square statistics, you can easily see that in all but one of the
education categories, the apparent relationship between income and PDA ownership disappears
(typically, a significance value less than 0.05 is considered "significant").

This suggests that the apparent relationship between income and PDA ownership is merely an
artifact of the underlying relationship between education level and PDA ownership.

Since income tends to rise as education rises, apparent relationships between income and other
variables may actually be the result of differences in education.

Correlation Analysis

Correlation describes the direction and strength of the linear relationship between two variables.

Simple Correlation Coefficient (r):

• A quantitative measure of the direction and strength of a linear relationship.


• Karl Pearson's correlation coefficient (r) is defined as:

\[
r = \frac{\operatorname{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}}
  = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left[\sum X^2 - n\bar{X}^2\right]\left[\sum Y^2 - n\bar{Y}^2\right]}}
\]

• Pearson's correlation coefficient (r) is a measure of linear association. If the
relationship between the variables is not linear, it is not an appropriate statistic for
measuring their association.

Bivariate Data:

Data with measurements on two variables for the same individual; let's call the variables X
and Y.

Example: The height (X) and weight (Y) of a group of people.

Hypothesis Test for Correlation:

Null Hypothesis: $H_0: \rho = 0$

Alternative Hypothesis: $H_1: \rho \neq 0$

Or, $H_1: \rho > 0$ or, $H_1: \rho < 0$

P-value:

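The P-value (probability value) is the probability, computed assuming the null hypothesis is true,
of obtaining a result at least as extreme as the one observed.

If the P-value is less than 0.05, we say that the null hypothesis is rejected at the 5% level of
significance.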

To calculate a simple correlation matrix, use [Analyze => Correlate => Bivariate...] (older SPSS
versions label this menu Statistics).

To begin, enter the data as follows:

IQ GPA
102 2.75
108 4.00
109 2.25
118 3.00
79 1.67
88 2.25

Simple Correlation

• Click on [Analyze => Correlate => Bivariate...], then select and move "IQ" and
"GPA" to the Variables: list. [Explore the options presented in this dialog box.]
• Click on [OK] to generate the requested statistics.

The results in the output window should look like the following:

Correlations
                              IQ      GPA
IQ     Pearson Correlation    1       .669
       Sig. (2-tailed)                .147
       N                      6       6
GPA    Pearson Correlation    .669    1
       Sig. (2-tailed)        .147
       N                      6       6

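As you can see, r = 0.669. However, the significance value (.147) is greater than 0.05, so the
correlation is not significant at the 5% level; with only six cases, even a fairly large r can
arise by chance.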
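
As a cross-check outside SPSS, the computational formula for r given earlier can be applied to the IQ and GPA data in plain Python. This is a minimal sketch, not the SPSS algorithm. The test statistic t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom is the standard test of H0: ρ = 0, and the two-tailed 5% critical value 2.776 for 4 degrees of freedom is taken from standard t tables rather than computed.

```python
import math

iq = [102, 108, 109, 118, 79, 88]
gpa = [2.75, 4.00, 2.25, 3.00, 1.67, 2.25]


def pearson_r(x, y):
    """Pearson's r via the computational formula (sums of products and squares)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - n * mean_x * mean_y
    denominator = math.sqrt(
        (sum_x2 - n * mean_x ** 2) * (sum_y2 - n * mean_y ** 2)
    )
    return numerator / denominator


n = len(iq)
r = pearson_r(iq, gpa)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRITICAL_5PCT_DF4 = 2.776  # two-tailed, from standard t tables (assumed)
significant_at_5pct = abs(t_stat) > T_CRITICAL_5PCT_DF4

print(round(r, 3), round(t_stat, 2), significant_at_5pct)
```

The value of r rounds to 0.669, matching the SPSS table, and the t statistic falls below the critical value, which agrees with the reported two-tailed significance of .147 being above 0.05.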

Illustrative Example (Pearson’s correlation coefficient):

Let us consider the following data set:

Sales experience (in years)    Annual sales volume (in Tk. '000)
X                              Y
1                              80
3                              97
4                              92
4                              102
6                              103
7                              98
8                              119
10                             123
11                             110
13                             125

Find the correlation coefficient between the years of experience of the salespersons and the
annual sales volume.

Solution:

• First enter the values of the variables experience (X) and sales (Y) in the SPSS data sheet.
• From the menu bar choose
Analyze

Correlate

Bivariate…

• Select the variables exper and sales.
• Then move these variables into the Variables box.

The following options are available:

• For quantitative, normally distributed variables, choose the Pearson correlation
coefficient.
• If your data are not normally distributed or have ordered categories, choose
Kendall's tau or Spearman, which measure the association between rank orders.
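
To make "association between rank orders" concrete, Spearman's coefficient can be sketched as Pearson's r computed on the ranks of the data, with tied values sharing the average of their ranks. The function names below are illustrative assumptions, and this is a minimal sketch rather than the SPSS implementation. The example uses the sales experience and sales volume data from the illustrative example in this lecture.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


experience = [1, 3, 4, 4, 6, 7, 8, 10, 11, 13]
sales = [80, 97, 92, 102, 103, 98, 119, 123, 110, 125]
print(average_ranks(experience))
print(round(spearman_rho(experience, sales), 3))
```

Because only the ordering of the values matters, any strictly increasing relationship gives a coefficient of 1 even when it is far from a straight line.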

Comment:

• The range of the correlation coefficient (r) is -1 to +1.
• r = -1 means a perfect negative linear relationship.
• r = 1 means a perfect positive linear relationship.
• r = 0 means no linear relationship.

Test of significance:

• You can select Two-tailed or One-tailed. If the direction of association is known
in advance, select One-tailed; otherwise select Two-tailed.

Flag significant correlations:

• Correlation coefficients significant at the 5% level are identified with a single asterisk,
and those significant at the 1% level with a double asterisk.

• At the bottom right of the dialog box you will see a button named Options... Clicking
it opens another dialog box named Bivariate Correlation Options.

Bivariate Correlation Options:

Statistics – you can choose one or both of the following:

• Means and standard deviations: Displayed for each variable.
• Cross-product deviations and covariances: Displayed for each pair
of variables. The cross-product of deviations is equal to the sum of the
products of the mean-corrected variables. This is the numerator of
Pearson's correlation coefficient. The covariance is an unstandardized
measure of the relationship between two variables, equal to the
cross-product deviation divided by N-1.
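
The two optional statistics can be reproduced by hand for the sales data of the illustrative example. The sketch below is an illustration in plain Python, not SPSS output; the variable names are assumptions.

```python
import math

experience = [1, 3, 4, 4, 6, 7, 8, 10, 11, 13]          # X, years
sales = [80, 97, 92, 102, 103, 98, 119, 123, 110, 125]   # Y, Tk. '000

n = len(experience)
mean_x = sum(experience) / n
mean_y = sum(sales) / n

# Sum of the products of the mean-corrected variables.
cross_product = sum((x - mean_x) * (y - mean_y) for x, y in zip(experience, sales))

# Covariance: cross-product deviation divided by N - 1.
covariance = cross_product / (n - 1)

# Dividing the cross-product by the square root of the product of the two
# sums of squared deviations standardizes it into Pearson's r.
ss_x = sum((x - mean_x) ** 2 for x in experience)
ss_y = sum((y - mean_y) ** 2 for y in sales)
r = cross_product / math.sqrt(ss_x * ss_y)

print(round(cross_product, 1), round(covariance, 3), round(r, 3))
```

The covariance depends on the units of X and Y (years times thousands of Tk.), whereas r is unit-free, which is why the covariance is described as an unstandardized measure.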
Missing Values – you can choose one of the following:

• Exclude cases pairwise: Cases with missing values for one or both of
a pair of variables for a correlation coefficient are excluded from
the analysis of that pair only.
• Exclude cases listwise: Cases with missing values for any variable are
excluded from all correlations.
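
The difference between the two missing-value options shows up in how many cases each correlation is based on. The sketch below uses a small hypothetical data set (the variables x, y, z and their values are invented for illustration, with None marking a missing value) and only counts the usable cases; the function names are likewise illustrative.

```python
from itertools import combinations

# Hypothetical cases; None marks a missing value.
records = [
    {"x": 1.0, "y": 2.0, "z": 3.0},
    {"x": 2.0, "y": None, "z": 5.0},
    {"x": 3.0, "y": 4.0, "z": None},
    {"x": 4.0, "y": 6.0, "z": 8.0},
]
variables = ["x", "y", "z"]


def listwise_cases(rows, names):
    """Keep only cases that are complete on every variable."""
    return [row for row in rows if all(row[name] is not None for name in names)]


def pairwise_cases(rows, first, second):
    """Keep cases that are complete on the two variables being correlated."""
    return [
        row for row in rows if row[first] is not None and row[second] is not None
    ]


listwise_n = len(listwise_cases(records, variables))
pairwise_n = {
    (a, b): len(pairwise_cases(records, a, b)) for a, b in combinations(variables, 2)
}

print("listwise N:", listwise_n)
print("pairwise N:", pairwise_n)
```

With pairwise exclusion each coefficient can rest on a different N, while listwise exclusion uses the same, usually smaller, set of cases for every coefficient.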

• After choosing the desired options, click Continue and then OK.

Output:

The SPSS result is as follows:

Descriptive Statistics

                              Mean      Std. Deviation    N
Years of sales experience     6.70      3.831             10
Annual sales volume           104.90    14.395            10

Correlations

                                                   Years of sales    Annual sales
                                                   experience        volume
Years of sales experience    Pearson Correlation   1                 .886(**)
                             Sig. (2-tailed)       .                 .001
                             N                     10                10
Annual sales volume          Pearson Correlation   .886(**)          1
                             Sig. (2-tailed)       .001              .
                             N                     10                10

** Correlation is significant at the 0.01 level (2-tailed).

Interpretation of the Result

Comment on r:

The value of the correlation coefficient is r = 0.886, which implies a strong
positive linear association between years of sales experience and annual
sales volume.

Comment on significance:

Here the p-value is 0.001.

Since the p-value is less than 0.01, we may reject the null hypothesis at the 1% level of
significance and conclude that the population correlation coefficient $\rho$ is not
equal to 0, i.e., there is a linear association between years of sales experience
and annual sales volume.
