Cross-tabulation Tables
Cross-tabulation tables (contingency tables) display the relationship between two or more
categorical (nominal or ordinal) variables. The size of the table is determined by the number of
distinct values for each variable, with each cell in the table representing a unique combination of
values. Numerous statistical tests are available to determine whether there is a relationship
between the variables in a table.
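The way a cross-tabulation is built can be sketched in a few lines of Python. This is a minimal illustration, not what SPSS does internally: the helper name `crosstab` and the data are hypothetical, and categories are ordered alphabetically rather than by the value codes SPSS would use. Each distinct value of each variable becomes a row or column, and each cell counts one combination of values.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count each (row value, column value) combination.

    Returns (row_labels, col_labels, table) where table[i][j] is the
    number of cases having row_labels[i] and col_labels[j].
    """
    if len(rows) != len(cols):
        raise ValueError("both variables need one value per case")
    counts = Counter(zip(rows, cols))
    # One row/column per distinct value, sorted alphabetically.
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table

# Hypothetical data: one (income, owns_pda) pair per respondent.
income = ["low", "low", "high", "high", "high", "low"]
owns_pda = ["no", "yes", "yes", "yes", "no", "no"]

row_labels, col_labels, table = crosstab(income, owns_pda)
print("income", *col_labels)
for label, cells in zip(row_labels, table):
    print(label, *cells)
```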
What factors affect the products that people buy? The most obvious is probably how much
money people have to spend. In this example, we'll examine the relationship between income
level and PDA (personal digital assistant) ownership.
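► From the menus choose [Analyze => Descriptive Statistics => Crosstabs].
► Select the income category variable as the row variable and the PDA ownership variable as the column variable, and click OK.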
None of the numbers in this table, however, stands out in any obvious way that would indicate a
relationship between the variables.
It is often difficult to analyze a cross-tabulation simply by looking at the raw counts in each
cell.
The fact that there are more than twice as many PDA owners in the $25,000–$49,000 income
category than in the under $25,000 category may not mean much (or anything) since there are
also more than twice as many people in that income category.
► Open the Crosstabs dialog box again. (The two variables should still be selected.)
► You can use the Dialog Recall button on the toolbar to quickly return to recently used
procedures.
► Click Cells, and in the Percentages group select Row.
► Click Continue and then click OK in the main dialog box to run the procedure.
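The Row option divides each cell count by its row total, which removes the effect of some income categories simply containing more people. The sketch below uses made-up counts (not the tutorial's data) and a hypothetical helper `row_percentages`; it shows a case where the higher-income group has twice as many PDA owners but a lower ownership rate.

```python
def row_percentages(table):
    """Convert a table of counts into row percentages (each row sums to 100)."""
    result = []
    for row in table:
        total = sum(row)
        # Guard against an empty row to avoid dividing by zero.
        result.append([100.0 * n / total if total else 0.0 for n in row])
    return result

# Hypothetical counts: rows are income categories, columns are (no PDA, PDA).
counts = [
    [180, 20],   # lower income: 200 people, 20 own a PDA
    [410, 40],   # higher income: 450 people, 40 own a PDA
]
for row in row_percentages(counts):
    print([round(p, 1) for p in row])
```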
The purpose of a cross-tabulation is to show the relationship (or lack thereof) between two
variables.
Although there appears to be some relationship between the two variables, is there any reason to
believe that the differences in PDA ownership between income categories are anything more than
random variation?
A number of tests are available to determine whether the relationship between two cross-tabulated
variables is significant. One of the more common tests is chi-square. One of the advantages of
chi-square is that it is appropriate for almost any kind of data.
► Open the Crosstabs dialog box again, click Statistics, and select Chi-square.
► Click Continue and then click OK in the main dialog box to run the procedure.
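Pearson chi-square tests the hypothesis that the row and column variables are independent. The
actual value of the statistic isn't very informative on its own; the significance value
(Asymp. Sig.) is what matters. The lower the significance value, the less likely it is that the
two variables are independent.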
In this case, the significance value is so low that it is displayed as .000 (that is, less than
.0005), which means that the two variables do appear to be related.
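To make the chi-square calculation concrete, here is a minimal Python sketch of the Pearson chi-square test of independence: expected counts come from row total × column total / grand total, the statistic sums (observed − expected)² / expected, and the degrees of freedom are (rows − 1) × (columns − 1). The helper names `chi_square_test` and `chi2_sf` and the 2 × 2 table are hypothetical; the p-value uses the series expansion of the incomplete gamma function rather than any particular library, so treat it as an illustration, not SPSS's exact routine.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Sums the power series for the regularized lower incomplete gamma function
    P(s, z) with s = df / 2 and z = x / 2, then returns 1 - P(s, z).
    """
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    log_term = s * math.log(z) - z - math.lgamma(s + 1)
    if log_term < -700:
        # The first series term underflows, so x lies far out in one tail.
        return 0.0 if z > s else 1.0
    term = math.exp(log_term)
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= z / (s + k)
        total += term
        k += 1
    return max(0.0, 1.0 - total)

def chi_square_test(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Assumes every row and column total is non-zero.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical 2 x 2 table: rows = income group, columns = (no PDA, PDA).
stat, df, p = chi_square_test([[10, 20], [30, 40]])
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")
```

A p-value below 0.0005 would print as 0.000 here, which is the same situation as the .000 shown in the SPSS output.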
You can add a layer variable to create a three-way table in which categories of the row and
column variables are further subdivided by categories of the layer variable.
This variable is sometimes referred to as the control variable because it may reveal how the
relationship between the row and column variables changes when you "control" for the effects of
the third variable.
► Open the Crosstabs dialog box again and select the education variable as the layer variable.
► Click Cells.
Md. Abdullah Al Mahmud
Senior Lecturer
Manarat International University
► Uncheck Row in the Percentages group.
► Click Continue and then click OK in the main dialog box to run the procedure.
If you look at the cross-tabulation table, it might appear that the only thing we have
accomplished is to make the table larger and harder to interpret.
But if you look at the table of chi-square statistics, you can easily see that in all but one of the
education categories, the apparent relationship between income and PDA ownership disappears
(typically, a significance value less than 0.05 is considered "significant").
Since income tends to rise as education rises, apparent relationships between income and other
variables may actually be the result of differences in education.
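The effect of a control variable can be illustrated with made-up numbers. In the hypothetical layers below, PDA ownership is 20% in both income groups among the less educated and 60% in both income groups among the more educated, so the chi-square statistic is 0 within each layer. Because higher-income people are concentrated in the more-educated layer, collapsing the layers produces an apparent income effect (a statistic of 6.0 with 1 degree of freedom, above the 5% critical value of 3.841). The helper `chi_square_statistic` is a hypothetical name for this sketch.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of counts (no empty rows or columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )

# Hypothetical layered data: layer -> table with rows (lower income, higher income)
# and columns (owns PDA, does not own PDA).
layers = {
    "less education": [[8, 32], [2, 8]],    # 20% ownership in both income groups
    "more education": [[6, 4], [24, 16]],   # 60% ownership in both income groups
}

# Collapsing the layers (adding the tables cell by cell) gives the two-way table.
combined = [
    [sum(t[i][j] for t in layers.values()) for j in range(2)]
    for i in range(2)
]

print("combined:", combined, round(chi_square_statistic(combined), 3))
for name, table in layers.items():
    print(name + ":", round(chi_square_statistic(table), 3))
```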
Correlation analysis
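$$r = \frac{\operatorname{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left[\sum X^2 - n\bar{X}^2\right]\left[\sum Y^2 - n\bar{Y}^2\right]}}$$
where the sums run over the n pairs of observations and $\bar{X}$, $\bar{Y}$ are the sample means.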
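The computational formula for r translates directly into code. This is a minimal sketch with a hypothetical helper name `pearson_r`; it follows the formula term by term, (ΣXY − nX̄Ȳ) divided by the square root of [ΣX² − nX̄²][ΣY² − nȲ²].

```python
import math

def pearson_r(x, y):
    """Pearson's r via the computational formula
    (sum XY - n*mean(X)*mean(Y)) / sqrt([sum X^2 - n*mean(X)^2] * [sum Y^2 - n*mean(Y)^2]).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) - n * mean_x * mean_y
    sxx = sum(a * a for a in x) - n * mean_x ** 2
    syy = sum(b * b for b in y) - n * mean_y ** 2
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data lying exactly on straight lines: r is +1 and -1.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))
```

The computational form is convenient by hand, but with large values it subtracts nearly equal quantities and can lose precision; library routines typically work with deviations from the mean instead.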
Pearson’s correlation coefficient (r) is a measure of linear association. If the relationship
between the variables is not linear, Pearson’s correlation coefficient is not an appropriate
statistic for measuring the association between them.
Bivariate data: data consisting of measurements of two variables on the same individuals; we call
them X and Y.
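Null hypothesis: $H_0: \rho = 0$ (no linear association in the population)
Alternative hypothesis: $H_1: \rho \neq 0$ (two-tailed)
or $H_1: \rho > 0$ or $H_1: \rho < 0$ (one-tailed)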
P-value: if the p-value is less than 0.05, we say that the null hypothesis is rejected at the 5%
level of significance.
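For reference, the significance value reported for Pearson's r corresponds to the standard t test of $H_0: \rho = 0$. Assuming X and Y are (approximately) bivariate normal, the test statistic

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

follows a t distribution with $n - 2$ degrees of freedom when $H_0$ is true. The two-tailed p-value is $P(|T_{n-2}| \ge |t|)$; the one-tailed p-value is half of that when r has the predicted sign.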
To calculate a simple correlation matrix, use [Statistics => Correlate => Bivariate...]. For
example, consider the following IQ and GPA values for six individuals:
IQ GPA
102 2.75
108 4.00
109 2.25
118 3.00
79 1.67
88 2.25
Click on [Statistics => Correlate => Bivariate...], then select and move "IQ" and
"GPA" to the Variables: list. [Explore the options presented on this controlling dialog
box.]
Click on [OK] to generate the requested statistics.
Correlations
                             IQ      GPA
IQ    Pearson Correlation    1       .669
      Sig. (2-tailed)                .147
      N                      6       6
GPA   Pearson Correlation    .669    1
      Sig. (2-tailed)        .147
      N                      6       6
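As you can see, r = 0.669, a moderate-to-strong positive correlation in this sample. However, the
two-tailed significance value is .147, which is greater than 0.05, so with only six cases the
correlation is not significant at the 5% level and the null hypothesis of no correlation cannot
be rejected.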
Solution:
First enter the values of the variables experience (X) and sales (Y) in the SPSS Data Editor.
From the menu bar choose [Analyze => Correlate => Bivariate…].
Comment:
Test of significance: You can select Two-tailed or One-tailed. If the direction of association is
known in advance, select One-tailed; otherwise select Two-tailed.
Flag significant correlations: When this box is checked, coefficients significant at the 0.05
level are marked with one asterisk (*) and those significant at the 0.01 level with two
asterisks (**).
Missing values (in the Options dialog box):
Exclude cases pairwise: Cases with missing values for one or both variables of a pair are
excluded only from the correlation coefficient for that pair.
Exclude cases listwise: Cases with missing values for any variable are excluded from all
correlations.
After choosing the desired options, click Continue and then OK.
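The two missing-value options differ only in which cases enter each coefficient. The sketch below is not SPSS code; it mimics the case selection on a hypothetical data set (variables `a`, `b`, `c`, with `None` marking a missing value) and reports how many cases each pair of variables would use. The helper names `complete_cases` and `cases_per_pair` are hypothetical.

```python
from itertools import combinations

def complete_cases(rows, variables):
    """Return the rows that have a non-missing (not None) value for every listed variable."""
    return [row for row in rows if all(row[v] is not None for v in variables)]

def cases_per_pair(rows, variables, method):
    """Number of cases used for each pair of variables under 'pairwise' or 'listwise' exclusion."""
    pairs = list(combinations(variables, 2))
    if method == "listwise":
        # Drop any case missing ANY of the variables, once, for all pairs.
        usable = complete_cases(rows, variables)
        return {pair: len(usable) for pair in pairs}
    if method == "pairwise":
        # For each pair, drop only cases missing one or both of the two variables.
        return {pair: len(complete_cases(rows, pair)) for pair in pairs}
    raise ValueError("method must be 'pairwise' or 'listwise'")

# Hypothetical data set with missing values stored as None.
data = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 2, "b": None, "c": 5},
    {"a": 3, "b": 4, "c": None},
    {"a": 4, "b": 6, "c": 8},
    {"a": 5, "b": 7, "c": 9},
]
for method in ("pairwise", "listwise"):
    print(method, cases_per_pair(data, ["a", "b", "c"], method))
```

Pairwise exclusion keeps more cases per coefficient, but different coefficients may then be based on different subsets of cases; listwise exclusion uses one common set of cases for every coefficient.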
Output:
Descriptive Statistics
                              Mean    Std. Deviation    N
Years of sales experience     6.70    3.831             10

Correlations
[Correlation table for years of sales experience and annual sales volume; N = 10 for each variable]
Comment on r:
The value of the correlation coefficient, r = 0.886, implies a strong positive association
between the variables years of sales experience and annual sales volume.
Comment on significance:
Here the p-value is 0.001.
Since the p-value is less than 0.01, we reject the null hypothesis at the 1% level of
significance and conclude that the population correlation coefficient is not equal to 0; that is,
there is a linear association between years of sales experience and annual sales volume.
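The same conclusion can be reached with the critical-value approach, using only the reported r = 0.886 and n = 10. This sketch assumes the usual t statistic t = r√(n − 2)/√(1 − r²) with n − 2 = 8 degrees of freedom and takes the two-tailed 1% critical value, 3.355, from a standard t table. Because r is rounded to three decimals, the t value is approximate.

```python
import math

# Values reported in the output above.
r = 0.886
n = 10

t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-tailed critical value of Student's t for alpha = 0.01 with 8 degrees of freedom,
# taken from a standard t table (t_{0.005, 8} = 3.355).
t_critical = 3.355
reject_h0 = abs(t) > t_critical
print(f"t = {t:.2f} with {n - 2} df; reject H0 at the 1% level: {reject_h0}")
```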