SPSS FOR WINDOWS
Version 15.0
Summer 2007
Purpose of handout
SPSS for Windows provides a powerful statistical and data management system in a graphical
environment. The user interfaces make statistical analysis more accessible for casual users and
more convenient for experienced users. Most tasks can be accomplished simply by pointing and
clicking the mouse.
The objective of this handout is to get you oriented with SPSS for Windows. It teaches you how
to enter and save data in SPSS, how to edit and transform data, how to explore your data by
producing graphics and summary descriptives, and how to use pointing and clicking to run
statistical procedures. It is also intended to serve as a reference guide for SPSS procedures that
you will need to know to do your homework assignments.
Window Types
SPSS Data Editor. When you start an SPSS session, you usually see the Data Editor window
(otherwise you will see a Viewer window). The Data Editor displays the contents of the working
data file. There are two views in the Data Editor window: 1) Data View, which displays the data in
a spreadsheet format with variable names listed as column headings, and 2) Variable View, which
displays information about the variables in your data set. In the Data View you can edit or enter
data, and in the Variable View you can change the format of a variable, add format and variable
labels, etc.
SPSS Viewer/Output. Statistical results and graphs are displayed in the Viewer window. The
(output) Viewer window is divided into two panes. The right-hand pane contains all the
output and the left-hand pane contains a tree structure of the results. You can use the left-hand
pane for navigating through, editing, and printing your results.
Chart Editor. The Chart Editor is used to edit graphs. When you double-click on a figure or
graph, it will open in a Chart Editor window.
SPSS Syntax Editor. The Syntax Editor is used to create SPSS command syntax, for example for
use with the SPSS production facility. Usually you will be using the point-and-click facilities of
SPSS, and hence, you will not need to use the Syntax Editor. More information about the Syntax
Editor and using SPSS syntax is given in the SPSS Help Tutorials under Working with Syntax. A
few instructions to get you started are given later in the handout in the section Running SPSS
using the Syntax Editor (or Command Language).
Menus
Data Editor Menu:
File. Use the File menu to create a new SPSS file, open an existing file, or read in spreadsheet or
database files created by other software programs (e.g., Excel).
Edit. Use the Edit menu to modify or copy data and output files.
View. Choose which buttons are available in the window or how the window should look.
Data. Use the Data menu to make changes to SPSS data files, such as merging files, transposing
variables, or creating subsets of cases for subset analysis.
Transform. Use the Transform menu to make changes to selected variables in the data file (e.g.,
to recode a variable) and to compute new variables based on existing variables.
Analyze. Use the Analyze menu to select the various statistical procedures you want to use, such
as descriptive statistics, cross-tabulation, hypothesis testing and regression analysis.
Graphs. Use the Graphs menu to display the data using bar charts, histograms, scatterplots,
boxplots, or other graphical displays. All graphs can be customized with the Chart Editor.
Utilities. Use the Utilities menu to view variable labels for each variable.
Add-ons. Information about other SPSS software.
Window. Choose which window you want to view.
Help. Index of help topics, tutorials, SPSS home page, Statistics coach, and version of SPSS.
Viewer Menu: The menu is similar to the Data Editor menu, but has two additional options:
Insert. Use the Insert menu to add items (e.g., titles or text) to your output.
Format. Use the Format menu to change the format of your output.
Chart Editor Menu: Use SPSS Help to learn more about the Chart Editor.
Toolbars
Most Windows applications provide buttons arranged along the top of a window that act as
shortcuts for executing various functions. In SPSS, you will find such buttons (icons) at the top
of the Data Editor, Viewer, Chart Editor, and Syntax windows. The icons are usually
symbolic representations of the procedure they execute when pushed; unfortunately, their
meanings are not intuitively obvious until one has already used them. Hence, the best way to
learn these buttons is to use them and note what happens.
The Status Bar. The Status Bar runs along the bottom of a window and alerts the user to the status
of the system. Typical messages are SPSS Processor is ready and Running procedure.
The Status Bar also provides up-to-date information concerning special manipulations of the
data file, such as whether only certain cases are being used in an analysis or whether the data
have been weighted according to the value of some variable.
File Types
Data Files. A file with an extension of .sav is assumed to be a data file in SPSS for Windows
format. A file with an extension of .por is a portable SPSS data file. The contents of a data file
are displayed in the Data Editor window.
Viewer (Output) Files. A file with an extension of .spo is assumed to be a Viewer file
containing statistical results and graphs.
Syntax (Command) Files. A file with an extension of .sps is assumed to be a Syntax file
containing SPSS syntax and commands.
Opening an SPSS Data File. To open an existing SPSS data file:
1. Choose Cancel (to close the startup dialog box, if one is displayed)
2. Choose File on the menu bar
3. Choose Open
4. Choose Data...
5. Edit the directory or disk drive to indicate where the data is located.
6. Double click on the filename, or
7. Single click on the filename and choose Open
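If you prefer to work from the Syntax Editor instead, an SPSS data file can be opened with the GET command (a minimal sketch; the file path here is hypothetical):

  GET FILE='C:\mydata\survey.sav'.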
Instructions on how to read a text data file in fixed format are located in SPSS Help Tutorials
under Reading Data from a Text File.
Editing Data. With the Data Editor, you can modify a data file in many ways. For example, you
can change values; cut, copy, and paste values; or add and delete cases.
To Change a Data Value:
1. Click on a data cell. The cell value is displayed in the cell editor.
2. Type the new value. It replaces the old value in the cell editor.
3. Press the Enter key. The new value appears in the data cell.
To Cut, Copy, and Paste Data Values
1. Select (highlight) the cell value(s) you want to cut or copy.
2. Pull down the Edit box on the main menu bar.
3. Choose Cut. The selected cell values will be copied, then deleted. Or
4. Choose Copy. The selected cell values will be copied, but not deleted.
5. Select the target cell(s) (where you want to put the cut or copy values).
6. Pull down the Edit box on the main menu bar.
7. Choose Paste. The cut or copied values will be "pasted" into the target cells.
To Delete a Case (i.e., a Row of Data)
1. Click on the case number on the left side of the row. The whole row will be highlighted.
2. Pull down the Edit box on the main menu bar.
3. Choose Clear.
To Add a Case (i.e., a Row of Data)
1. Select any cell in the row below where you want the new case to be inserted.
2. Pull down the Data box on the main menu bar.
3. Choose Insert.
Defining Variables. The default name for new variables is the prefix var and a sequential
five-digit number (e.g., var00001, var00002, var00003). To change the name, format, and other
attributes of a variable:
1. Double click on the variable name at the top of a column or,
2. Click on the Variable View tab at the bottom of Data Editor Window.
3. Edit the variable name under column labeled Name. The variable name must be eight
characters or less in length. You can also specify the number of decimal places (under
Decimals), assign a descriptive name (under Label), define missing values (under
Missing), define the type of variable (under Measure; e.g., scale, ordinal, nominal), and
define the values for nominal variables (under Values).
After the data is entered (or several times during data entering), you will want to save it as an
SPSS save file. See the section on Saving Data As An SPSS Save File.
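As a sketch, the equivalent command syntax for saving the working data file is (the path is hypothetical):

  SAVE OUTFILE='C:\mydata\survey.sav'.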
By default in SPSS a P-value is displayed as .000 if the P-value is less than .001. You can
report the P-value as <.001, or you can have SPSS display more significant digits:
1. In a SPSS (output) Viewer window double click (with the left mouse button) on the table
   containing the p-value you want to display differently. An "editing box" should appear
   around the table.
2. Click on the p-value using the right mouse button.
3. Choose Cell Properties. (If you do not get this option, you need to double click on the table
   to get the editing box.)
4. Change the number of decimals to the desired number (default is 3).
5. Choose OK. Or:
6. Double click on the p-value with the left mouse button and SPSS will display the p-value
   with more significant digits. If the p-value is very small, the p-value will be displayed in
   scientific notation (e.g., 1.745E-10 = 0.0000000001745).
Exiting SPSS
To exit SPSS:
1. Choose File on the menu bar
2. Choose Exit SPSS
If you have made changes to the data file or the output file since the last time you saved these
files, you will be asked before exiting SPSS whether you want to save the contents of the Data
Editor window and Viewer window. If you are unsure whether you want to save the contents of
the data or output window, choose Cancel, display the window(s), and, if you want to save the
contents of the window, follow the instructions in this handout for saving data or output
windows. SPSS will overwrite the previously saved file when saving the contents of the window.
Now, a new variable, lntrig, which is the natural logarithm of trig, will be added to your
data set. Remember to save your data set before exiting SPSS (e.g., while in the SPSS
Data window, choose Save under File or click on the floppy disk icon).
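The same transformation can be written as command syntax in a Syntax Editor window (a minimal sketch, assuming the original variable is named trig as in this example):

  COMPUTE lntrig = LN(trig).
  EXECUTE.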
1. Display the Data Editor window (i.e., execute the following commands while in the Data
   Editor window displaying the data file you want to use to recode variables).
2. Choose Transform on the menu bar
3. Choose Recode
4. Choose Into Same Variable... or Into Different Variable...
5. Select a variable to recode from the variable list on the left and then click on the arrow
located in the middle of the window. This defines the input variable.
6. If recoding into a different variable, enter the new variable name in the box under Name:,
then choose Change. This defines the output variable.
7. Choose Old and New Values...
8. Choose Value or Range under Old Value and enter old value(s).
9. Choose New Value and enter new value, then choose Add.
10. Repeat the process until all old values have been redefined.
11. Choose Continue
12. Choose OK
After creating a new variable(s), you will probably want to save the new variable(s) by re-saving
your data using the Save command under File box on the menu bar (See Saving Data as an SPSS
Save File).
Example: Recoding a Categorical Variable
You can use the commands for recoding a variable to change the coding values of a
categorical variable. You may want to change a coding value for a particular category to
modify which category SPSS uses as the referent category in a statistical procedure. For
example, suppose you want to perform linear regression using the ANOVA (or General
Linear Model) commands, and one of your independent variables is smoking status, smoke,
that is coded 1 for never smoked, 2 for former smoker and 3 for current smoker. By
default SPSS will use current smoker as the referent category because current smoker
has the largest numerical (code) value. If you want never smoked to be the referent
category you need to recode the value for never smoked to a value larger than 3.
Although you can recode the smoking status into the same variable, it is better to recode
the variable into a new/different variable, newsmoke, so you do not lose your original data
if you make an error while recoding.
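A syntax sketch of this recode (the new code 4 is one arbitrary choice of a value larger than 3; ELSE=COPY carries all other codes into newsmoke unchanged):

  RECODE smoke (1=4) (ELSE=COPY) INTO newsmoke.
  EXECUTE.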
Example: Creating Indicator or Dummy Variables
You can use the commands for recoding a variable to create indicator or dummy variables
in SPSS. Suppose you have a variable indicating smoking status, smoke, that is coded 1 for
never smoked, 2 for former smoker and 3 for current smoker. To create three new
indicator or dummy variables for never, former and current smoking:
1. Display the Data Editor window.
2. Choose Transform
3. Choose Recode
4. Choose Into Different Variables...
5. Select the variable smoke as the Input variable
6. Enter neversmoke as the name of the Output variable, and then choose Change.
7. Choose Old and New Values...
8. Choose Value under Old Value. (It may already be selected.)
9. Enter 1 (code value for never smoker)
10. Choose Value under New Value. (It may already be selected.)
11. Enter 1 (to indicate never smoker)
12. Choose Add
13. Choose All Other Values under Old Value.
14. Choose Value under New Value.
15. Enter 0
16. Choose Add
17. Choose Continue
18. Choose OK
Now, you have created a binary indicator variable for never smoker (coded 1 if never
smoker, 0 if former or current smoker). Next, create a binary indicator variable for
former smoker.
1. Display the Data Editor window.
2. Choose Transform
3. Choose Recode
4. Choose Into Different Variables...
5. Select the variable smoke as the Input variable
6. Enter formersmoke as the name of the Output variable, and then choose Change. (Or
   change (edit) never to former, and then choose Change.)
7. Choose Old and New Values...
8. Choose 1 --> 1 under Old --> New and then choose Remove.
9. Choose Value under Old Value.
10. Enter 2 (code value for former smoker)
11. Choose Value under New Value.
12. Enter 1 (to indicate former smoker)
13. Choose Add
14. Choose Continue
15. Choose OK
Now, you have created a binary indicator variable for former smoker (coded 1 if former
smoker, 0 if never or current smoker). To create a binary indicator variable for current
smoker you would use similar commands to those for creating the indicator variable for
former smoker, except that now the value of 3 for smoke is coded as 1 and all other values
are coded as 0.
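All three indicator variables can also be created with a few lines of command syntax (a minimal sketch using the RECODE command described above; the name currentsmoke is an assumption, since the handout names only neversmoke and formersmoke):

  RECODE smoke (1=1) (ELSE=0) INTO neversmoke.
  RECODE smoke (2=1) (ELSE=0) INTO formersmoke.
  RECODE smoke (3=1) (ELSE=0) INTO currentsmoke.
  EXECUTE.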
Example: Creating a Categorical Variable From a Numerical Variable
You can use the commands for recoding a variable to create a categorical variable from a numerical
variable (i.e., group values of the numerical variable into categories). For example, suppose you have
a variable that is the number of pack years smoked, packyrs, and you want to create a categorical
variable with four categories: 0, >0 to 10, >10 to 30, and >30 pack years smoked.
Note that you may want to use different coding values depending on which category you want to
be used as the referent category in certain statistical procedures. Remember to save your data set
before exiting SPSS.
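A syntax sketch of this grouping (the new variable name packcat is hypothetical; RECODE applies the first matching rule, so a value of exactly 0, 10, or 30 is caught by the earlier range):

  RECODE packyrs (0=0) (0 THRU 10=1) (10 THRU 30=2) (30 THRU HIGHEST=3) INTO packcat.
  EXECUTE.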
Example: Frequency table and bar chart for the categorical variable, smoking status.
Smoking status is the selected variable, and Bar charts has been selected under Charts.
Smoking status

            Frequency   Percent   Valid Percent   Cumulative Percent
 never          590       59.0         59.0              59.0
 former         293       29.3         29.3              88.3
 current        117       11.7         11.7             100.0
 Total         1000      100.0        100.0
[Bar chart of Smoking status: percent (0 to 60) on the vertical axis for the categories never, former, and current.]
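The table and chart above can be produced with the FREQUENCIES command (a sketch, assuming the smoking status variable is named smoke):

  FREQUENCIES VARIABLES=smoke
    /BARCHART PERCENT.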
Contingency Tables for Categorical Variables. To produce contingency tables for categorical
variables:
1. Choose Analyze on the menu bar
2. Choose Descriptive Statistics
3. Choose Crosstabs...
4. Select the row variable from the variable list on the left and click on the arrow next to Row(s).
5. Select the column variable and click on the arrow next to Column(s).
6. Choose Cells...
7. Select the percentages you want (e.g., Row percentages)
8. Choose Continue
9. Choose OK
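A syntax sketch of the same procedure (the variable names smoke and famhist are assumptions for this example):

  CROSSTABS
    /TABLES=smoke BY famhist
    /CELLS=COUNT ROW.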
                                       no       yes      Total
 never    Count                        537        53       590
          % within Smoking status    91.0%      9.0%    100.0%
 former   Count                        257        36       293
          % within Smoking status    87.7%     12.3%    100.0%
 current  Count                        106        11       117
          % within Smoking status    90.6%      9.4%    100.0%
 Total    Count                        900       100      1000
          % within Smoking status    90.0%     10.0%    100.0%
Descriptive Statistics (& Histograms) for Numerical Variables. To produce descriptive
statistics and histograms for numerical variables:
1. Choose Analyze on the menu bar
2. Choose Descriptive Statistics
3. Choose Frequencies...
4. Select the variables you want; choose the statistics you want under Statistics and
   Histograms under Charts, and then choose OK
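As command syntax, a sketch (assuming a variable named age, as in the output below):

  FREQUENCIES VARIABLES=age
    /FORMAT=NOTABLE
    /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM
    /HISTOGRAM.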
Mean, standard deviation, minimum and maximum were selected under Statistics, and Histogram was selected under Charts.
Statistics: Age
 N      Valid      1000
        Missing       0
 Mean             72.14
 Std. Deviation    5.275
 Minimum          65
 Maximum          90
[Histogram of Age: frequency (0 to 120) versus age (60 to 95); Mean = 72.14, Std. Dev. = 5.275, N = 1,000.]
Descriptive Statistics (& Boxplots) by Groups for Numerical Variables. To produce
descriptive statistics and boxplots by groups for numerical variables:
1. Choose Analyze on the menu bar
2. Choose Descriptive Statistics
3. Choose Explore...
4. Select the numerical variable(s) for the Dependent List and the grouping variable for the
   Factor List; select Boxplots under Plots, and then choose OK
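A syntax sketch of the Explore procedure (the variable names totchol and famhist are assumptions matching the example below):

  EXAMINE VARIABLES=totchol BY famhist
    /PLOT=BOXPLOT
    /STATISTICS=DESCRIPTIVES.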
The Explore command by default produces a lot of different summaries, so you need to select
what to report. All summaries are shown for all groups; the table has been cropped in this
example.

Descriptives: Total cholesterol by Family history of heart attack

 no   Mean                              221.93   (Std. Error 1.417)
      95% Confidence     Lower Bound    219.15
      Interval for Mean  Upper Bound    224.72
      5% Trimmed Mean                   221.63
      Median                            219.76
      Variance                         1350.641
      Std. Deviation                     36.751
      Minimum                            111
      Maximum                            363
      Range                              252
      Interquartile Range                 49
      Skewness                           .184    (Std. Error .094)
      Kurtosis                           .363    (Std. Error .188)
 yes  Mean                              220.53   (Std. Error 2.150)
      95% Confidence     Lower Bound    216.30
      Interval for Mean  Upper Bound    224.76
      (remaining rows cropped)

[Side-by-side boxplots of Total cholesterol (roughly 95 to 400) by Family history of heart attack (no, yes), with outliers flagged by case number (e.g., 812, 172, 438, 875, 729, 659).]
Using the Split File Option for Summaries by Groups for Categorical and Numerical
Variables. The Split File option in SPSS is a convenient way to produce summaries, graphs, and
run statistical procedures by groups. To activate the option:
1. Choose Data on the menu bar of the Data Editor window
2. Choose Split File
3. Choose Compare groups or Organize output by groups. The two options display the output
differently. Try each option to see which works best for your needs.
4. Choose the variable that defines the groups.
5. Choose OK
Now, all the summaries, graphs, and statistical procedures you request will be done
(automatically) for each group. To turn off this option:
1. Choose Data on the menu bar
2. Choose Split File
3. Choose Analyze all cases, do not create groups
4. Choose OK
Example. Use the Split File option to run summaries by family history of heart attack (yes
or no).
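A syntax sketch for turning the option on and off (famhist is an assumed variable name for family history of heart attack):

  SPLIT FILE LAYERED BY famhist.
  * summaries, graphs, and statistical procedures run here are produced by group.
  SPLIT FILE OFF.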
Using the Select Cases Option for Summaries for a subgroup of subjects/observations.
The Select Cases option in SPSS is a convenient way to produce summaries and run statistical
procedures for a subgroup of subjects or to temporarily exclude subjects from the analysis. To
activate this option:
1. Choose Data on the menu bar of the Data Editor window
2. Choose Select Cases...
3. Choose If condition is satisfied
4. Choose If...
5. Enter the condition that defines the subgroup (e.g., lipid = 0)
6. Choose Continue
7. Choose OK
Now, all the summaries, graphs, and statistical procedures you request will be done using only
the selected subjects/observations. To turn off this option:
1. Choose Data on the menu bar
2. Choose Select Cases...
3. Choose All cases
4. Choose OK
Example: Select subjects not on lipid lowering medications (i.e., subjects with lipid = 0,
indicating no medications).
Symbol   Definition
  =      equal
  ~=     not equal
  >=     greater than or equal
  <=     less than or equal
  >      greater than
  <      less than
  &      and
  |      or
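When you paste the Select Cases dialog, SPSS generates filter syntax like the following sketch for the lipid example (filter_$ is the name SPSS typically uses for the filter variable):

  USE ALL.
  COMPUTE filter_$ = (lipid = 0).
  FILTER BY filter_$.
  EXECUTE.
  * analyses run here use only the selected cases.
  FILTER OFF.
  USE ALL.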
Bar Charts. To produce a bar chart:
1. Choose Graphs (& then Legacy Dialogs, if Version 15) from the menu bar.
2. Choose Bar...
3. Choose Simple, Clustered, or Stacked
4. Choose what the data in the bar chart represent (e.g., summaries for groups of cases).
5. Choose Define
6. Select a variable from the variable list on the left and then click on the arrow next to the
   Category Axis.
7. Choose what the bars represent (e.g., number of cases or percentage of cases)
8. Choose OK
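A syntax sketch for a simple bar chart of percentages (assuming the variable smoke):

  GRAPH
    /BAR(SIMPLE)=PCT BY smoke.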
[Bar charts of Smoking status with percent (0.0% to 60.0%) on the vertical axis: one overall, and one clustered by Family history of heart attack (no, yes).]
Histograms
The easiest way to produce simple histograms is to use the Histogram option with the
Frequencies... command. See Descriptive Statistics (& Histograms) for Numerical Variables.
You can produce only one histogram at a time using the Histogram command.
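A one-line syntax sketch (the variable name bmi is an assumption for the chart below):

  GRAPH
    /HISTOGRAM=bmi.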
[Histogram: frequency (0 to 120) versus values from about 10 to 50; Mean = 26.2366, Std. Dev. = 4.8667, N = 1,000.]
Boxplots
The easiest way to produce simple boxplots is to use the Boxplot option with the Explore...
command. See Descriptive Statistics (& Boxplots) By Groups for Numerical Variables.
You can produce only one boxplot at a time using the Boxplot command.
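A syntax sketch (the variable names glucose and adastatus are assumptions matching the chart below):

  EXAMINE VARIABLES=glucose BY adastatus
    /PLOT=BOXPLOT
    /STATISTICS=NONE.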
[Boxplots of Serum fasting glucose (0 to 400) by ADA diabetes status (normal, impaired fasting glucose, diabetic), with outliers flagged by case number (e.g., 880, 684, 77, 673, 785).]
Normal Probability Plots. To produce Normal probability plots:
1. Choose Graphs (& then Legacy Dialogs, if Version 15) from the menu bar.
2. Choose Q-Q... to get a plot of the quantiles (Q-Q plot) or choose P-P... to get a plot of the
   cumulative proportions (P-P plot).
3. Select the variables from the source list on the left and then click on the arrow located in the
middle of the window.
4. Choose Normal as the Test Distribution. The Normal distribution is the default Test
Distribution. Other Test Distributions can be selected by clicking on the down arrow and
clicking on the desired Test distribution.
5. Choose OK
SPSS will produce both a Normal probability plot and a detrended Normal probability plot for
each selected variable. Usually the Q-Q plot is the most useful for assessing if the distribution of
the variable is approximately Normal.
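A syntax sketch of a Normal Q-Q plot (assuming a variable named glucose):

  PPLOT /VARIABLES=glucose
    /TYPE=Q-Q
    /DIST=NORMAL.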
[Normal Q-Q plot of Serum fasting glucose (expected Normal value versus observed value) and the corresponding detrended plot.]
Error Bar Plot. To produce an error bar plot of the mean of a numerical variable (or the means
for different groups of subjects):
1. Choose Graphs (& then Legacy Dialogs, if Version 15) from the menu bar.
2. Choose Error Bar...
3. Choose Simple or Clustered
4. Choose what the data in the error bars represent (e.g., summaries for groups of cases).
5. Choose Define
6. Select a variable from the variable list on the left and then click on the arrow next to the
   Variable box.
7. Select the variable from the variable list that defines the groups and then click on the arrow
   next to Category Axis.
8. Select what the bars represent (e.g., confidence interval, standard deviation, standard error
   of the mean)
9. Choose OK
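A syntax sketch of an error bar plot of the mean +/- 2 standard deviations by group (variable names are assumptions matching the charts below):

  GRAPH
    /ERRORBAR(STDDEV 2)=glucose BY adastatus.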
[Error bar plots of Serum fasting glucose, Mean +/- 2 SD, for the groups normal, impaired fasting glucose, and diabetic.]
Scatter Plot. To produce a scatter plot between two numerical variables:
[Scatter plot: HDL cholesterol vs BMI; HDL cholesterol (0 to 140) versus body mass index (10 to 50).]
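A syntax sketch for the scatter plot (assuming the variables hdl and bmi):

  GRAPH
    /SCATTERPLOT(BIVAR)=bmi WITH hdl.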
Adding a linear regression line to a scatter plot. To add a linear regression (least-squares) line
to a scatter plot of two numerical variables:
[Scatter plot: HDL cholesterol vs BMI with a fitted least-squares line; R Sq Linear = 0.121.]
Additional options:
o Choose Mean under Confidence Intervals (in the Properties window) to add a prediction
  interval for the linear regression line to the scatter plot, or
o Choose Individual under Confidence Intervals to add a prediction interval for individual
  observations to the scatter plot.
6. Click on the "X" in the upper right hand corner of the Chart Editor window or choose File,
   and then Close to return to the Viewer window.
Adding a Loess (scatter plot) smooth to a scatter plot. To add a Loess smooth to a scatter plot
of two numerical variables:
1. While in the Viewer window double click on the scatter plot. The scatter plot should now be
   displayed in a window titled Chart Editor.
2. Choose Elements.
3. Choose Fit Line at Total.
4. Choose Loess (in the Properties window). Default options for % of points to fit (50%) and
   kernel (Epanechnikov) are usually the most appropriate options.
5. Choose Apply (in the Properties window). If a line was added to the plot in Step 3, it will be
   replaced by the loess smooth.
6. Click on the "X" in the upper right hand corner of the Chart Editor window or choose File,
   and then Close to return to the Viewer window.

[Scatter plot of HDL cholesterol versus Body mass index (10 to 50) with a loess smooth.]
Stem-and-leaf Plot. To produce a stem-and-leaf plot:
1. Choose Analyze on the menu bar
2. Choose Descriptive Statistics
3. Choose Explore...
4. Dependent List: To select the variables you want from the source list on the left, highlight a
   variable by pointing and clicking the mouse and then click on the arrow located next to the
   Dependent List box. Repeat the process until you have selected all the variables you want.
5. Choose Plots...
6. Choose Stem-and-leaf from the Descriptive box. Note the option may already be selected if
   the little box is not empty.
7. Choose None from the Boxplot box
8. Choose Continue
9. Choose Plots for the Display option
10. Choose OK

Severity of Illness Index Stem-and-Leaf Plot

 Frequency    Stem &  Leaf
     2.00        4 .  34
     7.00        4 .  6688899
    10.00        5 .  0001112344
     3.00        5 .  568
     1.00 Extremes    (>=62)

 Stem width:  10.00
 Each leaf:   1 case(s)
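A syntax sketch of the same Explore request (illness is a hypothetical name for the severity of illness index variable):

  EXAMINE VARIABLES=illness
    /PLOT=STEMLEAF
    /STATISTICS=NONE.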
Birth weights (grams) of a sample of SIDS infants:

3941 3742 3515 2807 2495 2608 2551 2863
2807 3062 3260 2807 3459 2353 2977 2013
3118 3033 2892 3005 3374 4394 3118 3232
2098 2353 1616 3374 1984 3232 2637 2863
3175 3515 4423 3572 2495 3062 1503 2438
We want to know if the mean birth weight in the population of SIDS infants is different
from that of normal children, 3300 grams. We could construct a 95% confidence interval
to see if the interval contains the value of 3300 grams, or we could perform a one sample t
test to test if the mean in the SIDS population is equal to 3300 (versus not equal to 3300).
One-Sample Statistics

                 N      Mean        Std. Deviation   Std. Error Mean
 birth weight    48    2891.1250       623.39177         89.97885

One-Sample Test (Test Value = 0)

                   t      df   Sig. (2-tailed)   Mean Difference   95% CI of the Difference
 birth weight   32.131    47        .000           2891.12500      2710.1109 to 3072.1391
To perform a one sample t test to test if the mean in the SIDS population is equal
to 3300 versus not equal to 3300:
One-Sample Statistics

                 N      Mean        Std. Deviation   Std. Error Mean
 birth weight    48    2891.1250       623.39177         89.97885

One-Sample Test (Test Value = 3300)

                   t      df   Sig. (2-tailed)   Mean Difference   95% CI of the Difference
 birth weight   -4.544    47        .000           -408.87500      -589.8891 to -227.8609
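A syntax sketch of the one sample t test above (bweight is an assumed name for the birth weight variable; the same command also produces the confidence interval for the mean difference):

  T-TEST
    /TESTVAL=3300
    /VARIABLES=bweight.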
Paired t Test
1. Choose Analyze on the menu bar
2. Choose Compare Means
3. Choose Paired-Samples T Test...
4. Select the pair of variables from the variable list on the left, click on the arrow to move
   them to the Paired Variables list, and then choose OK
Confidence Interval for the Difference Between Means from Paired Sample
By default a 95% confidence interval for the difference between the means of the paired samples
will be computed when performing a paired t test. Choose Options to change the confidence level.
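A syntax sketch of a paired t test (using the variable names placebo and prozac from the example that follows):

  T-TEST PAIRS=placebo WITH prozac (PAIRED).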
Prozac Example. To compare the effect of Prozac on anxiety, 10 subjects are given one
week of treatment with Prozac and one week of treatment with a placebo. The order of
the treatments was randomized for each subject. An anxiety questionnaire was used to
measure a subject's anxiety on a scale of 0 to 30. Higher scores indicate more anxiety.
Subject      1   2   3   4   5   6   7   8   9  10
Placebo     22  18  17  19  22  12  14  11  19   7
Prozac      19  11  14  17  23  11  15  19  11   8
Difference   3   7   3   2  -1   1  -1  -8   8  -1
Paired t test and confidence interval for the difference between paired means.
Paired Samples Statistics

                    Mean      N    Std. Deviation   Std. Error Mean
 Pair 1  placebo   16.1000   10       4.95424           1.56667
         prozac    14.8000   10       4.68568           1.48174

Paired Samples Correlations:  N = 10, Correlation = .556, Sig. = .095

Paired Samples Test

                    Paired Differences
                    Mean      Std. Deviation   Std. Error Mean   95% CI of the Difference       t     df   Sig. (2-tailed)
 Pair 1  placebo
         - prozac  1.30000       4.54728           1.43798       -1.95293 to 4.55293          .904    9         .390
Paired t test
Sig. (2-tailed) = two-sided p-value = 0.39
t = test statistic value = .904
df = degrees of freedom = 9
Two-Sample t Test
1. Choose Analyze on the menu bar
2. Choose Compare Means
3. Choose Independent-Samples T Test...
4. Select the numerical variable(s) for the Test Variable(s) list; select the variable that defines
   the two groups as the Grouping Variable, define the two group values (Define Groups), and
   then choose OK
Group                     n      Mean   Standard deviation
Prepaid (GHC)           1167     24.0         15.3
Fee-for-service (KCM)   3207     26.4         17.1
We could compare the average age between the two groups using a two sample t test or a
confidence interval for the difference between the average ages of the two groups.
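A syntax sketch of the two sample t test (this assumes the clinic variable prov is coded numerically, e.g., 1 = GHC and 2 = KCM; the codes are assumptions):

  T-TEST GROUPS=prov(1 2)
    /VARIABLES=age.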
Two sample t test and 95% confidence interval for the difference between means
(from independent samples).
T-Test

Group Statistics

       prov    N      Mean      Std. Deviation   Std. Error Mean
 age   GHC    1167   23.9846      15.30787           .44810
       KCM    3207   26.3676      17.10260           .30200

Levene's Test for Equality of Variances:  F = 47.068, Sig. = .000

Independent Samples Test

 age                           t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference
 Equal variances
 assumed                    -4.188     4372           .000            -2.38306             .56896            -3.49851 to -1.26760
 Equal variances
 not assumed                -4.410   2293.698         .000            -2.38306             .54037            -3.44273 to -1.32338

Two Sample t test. SPSS by default always performs both versions of the two sample t test,
assuming equal variances and unequal variances.
Sig. (2-tailed) = two-sided p-value = <.001 (equal var.), <.001 (unequal var.)
t = test statistic value = -4.2 (equal var.), -4.4 (unequal var.)
df = degrees of freedom = 4372 (equal var.), 2294 (unequal var.)
mean difference = difference between means = -2.4 (equal and unequal var.)
std. error difference = standard error of the difference between means = .6 (equal var.), .5 (unequal var.)
Aspirin Example. To compare 2 types of Aspirin, A and B, 1 hour urine samples were
collected from 10 people after each had taken type A or type B. A week later the same
routine was followed after giving the other type to the same 10 people.
Person       1   2   3   4   5   6   7   8   9  10    Mean        Standard deviation
Type A      15  26  13  28  17  20   7  36  12  18    19.2              8.63
Type B      13  20  10  21  17  22   5  30   7  11    15.6              7.78
Difference   2   6   3   7   0  -2   2   6   5   7    3.6 = d-bar       3.098 = sd
A Sign test or Wilcoxon Signed Rank test could be used to compare the two types of
Aspirin.
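A syntax sketch of both tests (using the variable names aspirina and aspirinb from the output that follows):

  NPAR TESTS
    /SIGN=aspirina WITH aspirinb (PAIRED)
    /WILCOXON=aspirina WITH aspirinb (PAIRED).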
Descriptive Statistics

             Mean      Std. Deviation   Minimum   Maximum    25th     50th (Median)    75th
 aspirina   19.2000       8.62554         7.00      36.00   12.7500      17.5000      26.5000
 aspirinb   15.6000       7.77746         5.00      30.00    9.2500      15.0000      21.2500

Sign Test

Frequencies
 aspirinb - aspirina   Negative Differences(a)    8
                       Positive Differences(b)    1
                       Ties(c)                    1
                       Total                     10

Sign Test:  Exact Sig. (2-tailed) = exact, two-sided p-value = 0.039. The p-value is exact
because it is computed using the Binomial distribution instead of using an approximation to
the Normal distribution.
Wilcoxon Signed Ranks Test

Ranks
 aspirinb - aspirina                 N     Mean Rank   Sum of Ranks
                   Negative Ranks   8(a)     5.38         43.00
                   Positive Ranks   1(b)     2.00          2.00
                   Ties             1(c)
                   Total           10

(The mean ranks and sums of ranks are information used in the test statistic and are not
usually reported; use the previous descriptives.)

Test Statistics
 aspirinb - aspirina:  Z = -2.442(a), Asymp. Sig. (2-tailed) = .015
Mann-Whitney Test
Ranks
           group    N    Mean Rank   Sum of Ranks
 nickel      1      9      13.78        124.00
             2      9       5.22         47.00
           Total   18

Test Statistics(b)
 nickel:  Mann-Whitney U = 2.000, Wilcoxon W = 47.000, Z = -3.403,
          Asymp. Sig. (2-tailed) = .001, Exact Sig. = .000(a)
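A syntax sketch of the Mann-Whitney test above (assuming the grouping variable is coded 1 and 2, as in the Ranks table):

  NPAR TESTS
    /M-W=nickel BY group(1 2).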
Group            n      Mean   Standard Deviation
normotensive   1568     55.8         15.5
borderline      547     55.7         16.2
definite       1310     53.5         15.2
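A syntax sketch of the one-way ANOVA with descriptives and Bonferroni pairwise comparisons shown in the output that follows (hdl and hyper are assumed variable names):

  ONEWAY hdl BY hyper
    /STATISTICS=DESCRIPTIVES
    /POSTHOC=BONFERRONI ALPHA(0.05).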
Oneway

ANOVA: HDL cholesterol

                  Sum of Squares     df    Mean Square      F      Sig.
 Between Groups       4344.834          2    2172.417     9.045    .000
 Within Groups      821904.577       3422     240.183
 Total              826249.411       3424

Descriptives: HDL cholesterol

                  N      Mean    Std. Deviation   Std. Error   95% CI for Mean    Minimum   Maximum
 normotensive   1568    55.82       15.500           .391      55.05 to 56.59        21       138
 borderline      547    55.67       16.202           .693      54.30 to 57.03        24       149
 definite       1310    53.47       15.192           .420      52.64 to 54.29        15       129
 Total          3425    54.90       15.534           .265      54.38 to 55.42        15       149
Multiple Comparisons (Bonferroni)

 (I) group      (J) group      Mean Difference (I-J)   Std. Error    Sig.     95% CI
 normotensive   borderline             .157               .770      1.000    -1.69 to 2.00
                definite              2.356(*)            .580       .000      .97 to 3.74
 borderline     normotensive          -.157               .770      1.000    -2.00 to 1.69
                definite              2.198(*)            .789       .016      .31 to 4.09
 definite       normotensive         -2.356(*)            .580       .000    -3.74 to -.97
                borderline           -2.198(*)            .789       .016    -4.09 to -.31

The Bonferroni method shows all pairwise comparisons/differences along with a p-value (Sig.)
adjusted for the number of comparisons. In this example, subjects with normal blood pressure
and borderline hypertension have similar HDL cholesterol levels, but subjects with definite
hypertension have different HDL cholesterol levels than both subjects with normal blood
pressure and borderline hypertension.
Homogeneous Subsets: HDL cholesterol

Ryan-Einot-Gabriel-Welsch Range         Subset for alpha = .05
 Hypertension status      N               1          2
 definite               1310            53.47
 borderline              547                       55.67
 normotensive           1568                       55.82
 Sig.                                   1.000       .867
Kruskal-Wallis Test
1. Choose Analyze on the menu bar
2. Choose Nonparametric Tests
3. Choose K Independent Samples...
4. Select the numerical variable for the Test Variable List; select the variable that defines the
   groups as the Grouping Variable, define the range of group values, and then choose OK
Group            n      Median    IQR*
normotensive   1568       12      9, 15
borderline      547       12      9, 17
definite       1310       14     11, 20

*IQR = interquartile range (25th and 75th percentiles)
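A syntax sketch of the Kruskal-Wallis test (insulin and hyper are assumed variable names, with the hypertension groups assumed coded 1 through 3):

  NPAR TESTS
    /K-W=insulin BY hyper(1 3).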
Kruskal-Wallis Test
Ranks
 Serum insulin   Hypertension status      N      Mean Rank
                 normotensive           1568      1526.31
                 borderline              547      1685.28
                 definite               1310      1948.03
                 Total                  3425

Test Statistics(a,b)
 Serum insulin:  Chi-Square = 130.816, df = 2, Asymp. Sig. = .000
NPar Tests
Binomial Test

 positive   Group 1   Category: yes   N = 125   Observed Prop. = .24   Test Prop. = .3   Asymp. Sig. (1-tailed) = .001(a,b)
            Group 2   Category: no    N = 402   Observed Prop. = .76
            Total                     N = 527                    1.0
a Alternative hypothesis states that the proportion of cases in the first group < .3.
b Based on Z Approximation.
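A syntax sketch of this binomial test (assuming the variable is named positive):

  NPAR TESTS
    /BINOMIAL (.30)=positive.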
McNemar's Test
1. Choose Analyze on the menu bar
2. Choose Descriptive Statistics
3. Choose Crosstabs...
4. Select one variable for the Row(s) and the other for the Column(s); under Statistics select
   McNemar, then choose Continue and OK
McNemar's test
It doesn't matter for McNemar's test which variable is selected for the Row(s): or Column(s):.
You can run more than one test at a time.
Under Statistics select McNemar. Under Cells, in this example, select Total percentages.
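A syntax sketch of McNemar's test through Crosstabs:

  CROSSTABS
    /TABLES=TreatmentA BY TreatmentB
    /STATISTICS=MCNEMAR
    /CELLS=COUNT TOTAL.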
Crosstabs

TreatmentA * TreatmentB Crosstabulation

                                       TreatmentB
                                   died      survived     Total
 TreatmentA  died      Count        510          5          515
                       % of Total  82.1%        .8%       82.9%
             survived  Count         16         90          106
                       % of Total   2.6%      14.5%       17.1%
 Total                 Count        526         95          621
                       % of Total  84.7%      15.3%      100.0%
Chi-Square Tests
 McNemar Test       Exact Sig. (2-sided) = two-sided p-value = .027(a)
 N of Valid Cases   621
Chi-square Test, Fisher's Exact Test and Trend Test for Contingency Tables
If the Chi-square test is requested for a 2 x 2 table, SPSS will also compute Fisher's Exact
test. If the Chi-square test is requested for a table larger than 2 x 2, SPSS will also compute the
Mantel-Haenszel test for linear (or linear-by-linear) association between the row and column
variables.
1. Choose Analyze on the menu bar
2. Choose Descriptive Statistics
3. Choose Crosstabs...
4. Select the row and column variables; under Statistics select Chi-square, then choose
   Continue and OK
Under Statistics select Chi-square. Under Cells, in this example, select Row percentages.
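A syntax sketch of the chi-square test for this example:

  CROSSTABS
    /TABLES=familytype BY asthma
    /STATISTICS=CHISQ
    /CELLS=COUNT ROW.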
Crosstabs

familytype * asthma Crosstabulation

                                              asthma
                                           No        Yes      Total
 familytype  [category 1]  Count            85         15       100
                           % within       85.0%      15.0%    100.0%
             [category 2]  Count           194          6       200
                           % within       97.0%       3.0%    100.0%
 Total                     Count           279         21       300
                           % within       93.0%       7.0%    100.0%

(The familytype category labels were cropped in the original output.)
Chi-Square Tests

                              Value     df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
 Pearson Chi-Square        14.747(b)     1          .000
 Continuity
 Correction(a)             12.961        1          .000
 Likelihood Ratio          13.745        1          .000
 Fisher's Exact Test                                                       .000                   .000
 N of Valid Cases             300

a Computed only for a 2x2 table
b 0 cells (.0%) have expected count less than 5. The minimum expected count is 7.00.
Chi-square test
Pearson Chi-square (without continuity correction), p-value = <.001
Pearson Chi-square with continuity correction, p-value = <.001
Asymp. Sig. (2-sided) = two-sided p-value. Asymp. is an abbreviation for asymptotic, which
means the p-value is computed using a large sample approximation based on the Normal
distribution. Check that all cells have expected cell counts of 5 or greater.
Value = test statistic value
df = degrees of freedom
Trend Test Example. A clinical trial of a drug therapy to control pain was
performed. The investigators wanted to investigate whether adverse responses to
the drug increased with larger drug doses. Subjects received either a placebo or
one of four drug doses. In this example dose is an ordinal variable, and it is
reasonable to expect that as the dose increases the rate of adverse events will
increase.
Dose        Adverse event % (n)      n
Placebo         18.8% (6)           32
500 mg          21.9% (7)           32
1000 mg         28.1% (9)           32
2000 mg         31.3% (10)          32
4000 mg         50.0% (16)          32
There are several different methods for performing a trend test with ordinal
variables. One test, which is available in SPSS, is the Mantel-Haenszel chi-square,
also called the Mantel-Haenszel test for linear association or the linear-by-linear
association chi-square test.
                                  Adverse events
                                 No         Yes        Total
 dose   Placebo  Count            26          6          32
                 % within dose  81.3%      18.8%      100.0%
        500      Count            25          7          32
                 % within dose  78.1%      21.9%      100.0%
        1000     Count            23          9          32
                 % within dose  71.9%      28.1%      100.0%
        2000     Count            22         10          32
                 % within dose  68.8%      31.3%      100.0%
        4000     Count            16         16          32
                 % within dose  50.0%      50.0%      100.0%
 Total           Count           112         48         160
                 % within dose  70.0%      30.0%      100.0%
Chi-Square Tests

                         Value      df   Asymp. Sig. (2-sided)
 Pearson Chi-Square     9.107(a)     4          .058
 Likelihood Ratio       8.836        4          .065
 Linear-by-Linear
 Association            8.876        1          .003
 N of Valid Cases         160

a 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.60.
Using Standardized Residuals in R x C Tables. When the contingency table has
more than 2 rows and 2 columns it can be hard to determine the association or the
largest differences. Standardized residuals are often helpful in describing the
association, if the chi-square test indicates there is a statistically significant
association. The (adjusted) standardized residual re-expresses the difference
between the observed cell count and the expected cell count in terms of standard
deviation units below or above the value 0 (the expected difference if there is no
association), and the distribution of the standardized residuals has a standard
Normal distribution. Hence, values less than -2 or greater than 2 indicate large
differences, and values less than -3 or greater than 3 indicate very large
differences.

Under Cells, select Adjusted standardized for Residuals.

[Table of adjusted standardized residuals; one visible row: 1.8, -.1, -1.8.]

In this example the largest residuals are among the subjects with the least education,
where there are fewer subjects with Stage I and more subjects with Stage III or
IV than expected if there was no association between education and stage of
disease. Also, to a lesser extent, among the subjects with a college graduate
degree there are more subjects with Stage I and fewer subjects with Stage III or
IV than expected if there was no association between education and stage of
disease.
One sample binomial test, McNemar's test, Fisher's Exact test and Chi-square
test for 2 x 2 and R x C Contingency Tables Using Summary Data
There is an easy way in SPSS to perform a one sample binomial test, a McNemar's test, a
Fisher's Exact test or a Chi-square test for a 2 x 2 or R x C table when you only have summary
data (i.e., the number of observations in each cell).
One sample binomial test. Suppose you observe 15 cases of myocardial infarction (MI) in 5000
men over a 1 year period and you want to test if the rate of MI is equal to a previously reported
incidence rate of 5 per 1000 (or 0.005).
1. In a new (empty) SPSS Data Editor window enter the following 2 rows of data:

   MI   Observed
    0     4985
    1       15
The values of 0 and 1 used to indicate MI (no/yes) are arbitrary. The variable names are also
arbitrary (e.g., you can leave them as var0001 and var0002).
2. Next, you want to weight cases by Observed:
Choose Data
Choose Weight Cases...
Choose Weight cases by
Choose Observed and then the arrow button so the variable appears in the Frequency variable
box.
Choose OK
3. Now, run the one sample binomial test:
   Choose Analyze
   Choose Nonparametric Tests
   Choose Binomial...
   Choose MI so that it appears in the Test Variable List
   Change (edit) Test Proportion to .005
   Choose OK
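The whole procedure as a syntax sketch (after entering the two rows of data above):

  WEIGHT BY Observed.
  NPAR TESTS
    /BINOMIAL (.005)=MI.
  WEIGHT OFF.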
McNemar's test. Suppose you have the following summary table of presence and absence of
DKA before and after therapy for paired data,
                      After therapy
                     No DKA     DKA
 Before   No DKA       128        7
 therapy  DKA           19        7
Chi-square test and Fisher's Exact test for a 2 x 2 table. Suppose you have the following
summary table for oral contraceptive (OC) use by presence or absence of cancer (case or
control),
             Cases (cancer)   Controls
 OC Use  No        111           387
         Yes         6             8
1. In a new (empty) SPSS Data Editor window enter the following 4 rows of data:

   Case   OCuse   Observed
     1      0        111
     1      1          6
     0      0        387
     0      1          8

The values of 0 and 1 used to indicate case/control and OC use (no/yes) are arbitrary. The
variable names are also arbitrary (e.g., you can leave them as var0001, var0002, and var0003).
2. Next, you want to weight cases by Observed:
Choose Data
Choose Weight Cases...
Choose Weight cases by
Choose Observed and then the arrow button so the variable appears in the Frequency variable
box.
Choose OK
3. Now, run the Chi-square (& Fisher's Exact) test:
   Choose Analyze
   Choose Crosstabs
   Choose Case and OCuse as the row and column variables
   Choose Statistics...
   Choose Chi-square
   Choose Continue
   Choose OK
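The weighting and test as a syntax sketch:

  WEIGHT BY Observed.
  CROSSTABS
    /TABLES=Case BY OCuse
    /STATISTICS=CHISQ.
  WEIGHT OFF.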
The commands are similar for running the Chi-square test for tables larger than 2 x 2. Suppose
you have the following summary table for education level by stage of disease at diagnosis:
                           Stage of Disease
 Education level           I     II    III or IV
 High school or less      20     24       35
 College                  37     32       23
 College graduate         40     29       21
3. Now,
   Choose Analyze on the menu bar
   Choose Descriptive Statistics
   Choose Ratio...
   Numerator: Select Gender
   Denominator: Select Allones
   Choose Statistics...
   Choose both Mean and Confidence Intervals under Central Tendency
   Choose Continue
   Choose OK
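A syntax sketch (my best reading of the Ratio procedure's syntax; check the Command Syntax Reference in SPSS Help if it does not run as written):

  RATIO STATISTICS gender WITH allones
    /PRINT=MEAN CIN(95).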
Example of the SPSS output using the previous summary data.
Ratio Statistics

Ratio Statistics for Gender / Allones

 Mean                                    .392
 95% Confidence Interval   Lower Bound   .375
 for Mean                  Upper Bound   .408
 Coefficient of Dispersion              1.000
 Coefficient of Variation
 (Median Centered)                        .

The confidence intervals are constructed by assuming a Normal distribution for the ratios.

The observed proportion was .392 or 39.2%. A 95% confidence interval is 37.5% to 40.8%.
The correlations are shown on the next page. Note that SPSS will display the correlation between
variable 1 and variable 2 and between variable 2 and variable 1, which are equivalent, and similarly
the correlations between all possible pairs of variables. So, all results displayed below the diagonal
of the matrix of results are redundant.
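Syntax sketches for Pearson and Spearman rank correlations (the four variable names are assumptions standing in for the variables in the output below):

  CORRELATIONS
    /VARIABLES=catastrophizing beckscore interference maxopen
    /PRINT=TWOTAIL NOSIG.
  NONPAR CORR
    /VARIABLES=catastrophizing beckscore interference maxopen
    /PRINT=SPEARMAN TWOTAIL.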
Correlations
1st entry = Pearson correlation coefficient
2nd entry = Sig. (2-tailed) = p-value
3rd entry = N = the number of observations or subjects with non-missing data for both variables

                            Catastrophizing   Beck inventory   Interference   Maximum assisted
                                                  score                           opening
 Catastrophizing                  1              .602(**)        .451(**)         -.029
                                                 .000            .000              .758
                                118             118             118               116
 Beck inventory score          .602(**)           1              .445(**)         -.079
                               .000                              .000              .397
                                118             118             118               116
 Interference                  .451(**)          .445(**)         1               -.068
                               .000              .000                              .468
                                118             118             118               116
 Maximum assisted opening     -.029             -.079           -.068               1
                               .758              .397            .468
                                116             116             116               116

** Correlation is significant at the 0.01 level (2-tailed).

(For example, the correlation between Catastrophizing and Interference = .45, p-value < .001,
N = 118 subjects.)
Nonparametric Correlations
1st entry = Spearman rank correlation coefficient
2nd entry = Sig. (2-tailed) = p-value
3rd entry = N = the number of observations or subjects with non-missing data for both variables

 Spearman's rho             Catastrophizing   Beck inventory   Interference   Maximum assisted
                                                  score                           opening
 Catastrophizing               1.000             .625(**)        .451(**)         -.013
                                                 .000            .000              .892
                                118             118             118               116
 Beck inventory score          .625(**)         1.000            .455(**)         -.110
                               .000                              .000              .241
                                118             118             118               116
 Interference                  .451(**)          .455(**)       1.000             -.046
                               .000              .000                              .621
                                118             118             118               116
 Maximum assisted opening     -.013             -.110           -.046             1.000
                               .892              .241            .621
                                116             116             116               116

(For example, the rank correlation between Catastrophizing and Interference = .45,
p-value < .001, N = 118 subjects.)
Confidence Interval for a Correlation Coefficient
Typically the Crosstabs command is used to produce contingency tables for categorical variables.
One of the options under Statistics is used to compute the correlation coefficient, which you
might want to calculate for ordinal variables. However, you can also use this option for
quantitative variables.
The Crosstabs command is found by selecting Analyze and then Descriptive Statistics. In this
example the correlation between the quantitative variables catastrophizing and interference will
be calculated. Select Statistics and then select Correlations.
SPSS will produce a contingency table of the cross-tabulation of the two variables, which you
can ignore. SPSS will display the correlation coefficient and standard error estimate for the
correlation.
Symmetric Measures

                                             Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
 Interval by Interval   Pearson's R          .451           .068               5.445          .000(c)
 Ordinal by Ordinal     Spearman Correlation .451           .076               5.449          .000(c)
 N of Valid Cases        118
Linear Regression
1. Choose Analyze on the menu bar
2. Choose Regression
3. Choose Linear...
4. Select the dependent variable and the independent variable(s), and then choose OK
(Multi-)Collinearity diagnostics. This option computes various statistics for detecting
collinearity between the independent variables. For example, Tolerance is the proportion of a
variable's variance not accounted for by other independent variables in the equation. A
variable with a very low tolerance contributes little information to a model, and can cause
computational problems. Another statistic is the VIF (variance inflation factor). Large values
are an indicator of multicollinearity between independent variables.
Plots..., which are useful for doing regression diagnostics:
Histogram or Normal Probability Plot (P-P plot) (of the standardized residuals)
Produce all partial (residual) plots
Other scatter plots
Save..., which produces variables that are useful for doing regression diagnostics:
Predicted Values (unstandardized, standardized, adjusted)
Residuals (unstandardized, standardized, studentized, deleted)
Distances (Mahalanobis, Cook's, Leverage)
Influence Statistics (dfBeta, dfFit)
Note that SPSS creates a new variable for each selected Save... option and adds the new
variables to the data file. The variable names are defined in the Variable View of the Data Editor.
Once you are done using these variables you may want to delete them from the data file or save
them (by re-saving the data file).
Method. Click on the down arrow to the right of Method to display the methods available for
independent variable entry (enter, stepwise, remove, backward, forward). Enter is the default
option. The other options enter independent variables into the model using various stepwise
methods.
Options...
You can modify the entry and removal criteria used by the stepwise, remove, backward, and
forward independent variable entry methods.
You can define how observations with missing data are handled.
Previous, Block # of #, Next
You can use these options to enter independent variables in blocks into the regression model.
You can select different methods of variable entry for each block. This option is also useful
for computing partial F tests with the R squared change option.
Example. Simple linear regression of forced expiratory volume (volume in 1 second) on
height (cm).
The dependent variable in this example is forced expiratory volume (fev1). There is only 1
independent variable in this example, height. Additional options can be found under Statistics,
Plots, Save, & Options.
Regression

Variables Entered/Removed(b)
 Model 1:  Variables Entered: height(a).  Variables Removed: none.  Method: Enter.
a All requested variables entered.
b Dependent Variable: fev1

Model Summary(b)
 Model 1:  R = .562(a), R Square = .315, Adjusted R Square = .314,
           Std. Error of the Estimate = .55337
a Predictors: (Constant), height
b Dependent Variable: fev1

ANOVA(b)
               Sum of Squares    df    Mean Square       F        Sig.
 Regression        112.380         1     112.380      366.997    .000(a)
 Residual          244.054       797        .306
 Total             356.434       798
a Predictors: (Constant), height
b Dependent Variable: fev1

ANOVA = analysis of variance table. Not needed when there is only 1 independent variable in
the model. The F test is equivalent to the t test for testing if the slope is equal to zero in the
output that follows. (F = t²)
Coefficients(a)

             Unstandardized Coefficients   Standardized Coefficients
                 B        Std. Error             Beta                    t        Sig.    95% CI Upper Bound
 (Constant)   -4.330        .335                                      -12.943     .000        -3.673
 height         .039        .002                 .562                  19.157     .000          .043
a Dependent Variable: fev1
Charts
[Normal P-P plot of the regression standardized residuals: expected versus observed cumulative probability, both from 0.0 to 1.0.]
Linear Regression Example with Three Independent Variables
The dependent variable is forced expiratory volume (fev1). The independent variables are
height, age and gender. The Enter method means all 3 independent variables will be included
in the regression model.
Statistics options: By default, Estimates and Model fit are selected. In this example, part and
partial correlations and collinearity diagnostics are also selected.
Plots options: Normal probability plot (of the standardized residuals) and partial
(residual) plots are selected.
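A syntax sketch matching these selections (part/partial correlations, collinearity diagnostics, a Normal probability plot of the standardized residuals, and partial residual plots):

  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA ZPP COLLIN TOL
    /DEPENDENT fev1
    /METHOD=ENTER height age gender
    /RESIDUALS NORMPROB(ZRESID)
    /PARTIALPLOT ALL.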
Regression

Variables Entered/Removed(b)
 Model 1:  Variables Entered: gender, age, height(a).  Variables Removed: none.  Method: Enter.

Model Summary(b)
 Model 1:  Std. Error of the Estimate = .53531

ANOVA(b)
               Sum of Squares    df    Mean Square       F        Sig.
 Regression        128.623         3      42.874      149.621    .000(a)
 Residual          227.811       795        .287
 Total             356.434       798
a Predictors: (Constant), gender, age, height
b Dependent Variable: fev1
Coefficients(a)

             Unstandardized        Standardized                           Correlations                Collinearity Statistics
              B      Std. Error    Beta           t       Sig.    Zero-order  Partial   Part         Tolerance    VIF
 (Constant) -.780      .593                    -1.315     .189
 height      .028      .003        .399         9.143     .000       .562      .308      .259           .423     2.364
 age        -.025      .004       -.200        -6.857     .000      -.206     -.236     -.194           .944     1.059
 gender      .273      .059        .201         4.591     .000       .478      .161      .130           .420     2.379
a Dependent Variable: fev1
Height, age, and gender are all statistically significant (P < .001), i.e., the regression
coefficients are different from zero.
The partial correlations (and partial R-squares: .308² = .095, (-.236)² = .056, and .161² = .026)
indicate the correlation with the dependent variable adjusted for the other variables in
the regression model.
A low tolerance value (say, < .20) or a high variance inflation factor (VIF) (say, > 5 or 10)
may indicate a multicollinearity problem.
[Normal P-P plot of the regression standardized residuals, and partial residual plots of fev1 versus height and fev1 versus age.]
Note that SPSS will also produce a partial residual plot for gender. In general, the partial
residual plots for categorical/nominal variables are not very useful. Boxplots of the
residuals for each category of a categorical/nominal variable are useful for regression
diagnostics. To produce the boxplots you could use the Save options to save the
residuals from a regression and then use the Boxplot commands to plot the residuals.
Example. Linear regression of forced expiratory volume on height (a continuous variable)
and diabetes status (a categorical variable: normal, impaired fasting glucose, diabetic).
Forced expiratory volume (fev1) is the dependent variable. Diabetes is a categorical
variable with 3 categories. Height is a continuous variable.
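A syntax sketch using the UNIANOVA (General Linear Model) command:

  UNIANOVA fev1 BY diabetes WITH height
    /PRINT=PARAMETER
    /DESIGN=diabetes height.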
Tests of Between-Subjects Effects
Dependent Variable: fev1

                   Sum of Squares    df    Mean Square       F        Sig.
 Corrected Model         -             3      38.206       125.606    .000
 Intercept            51.195           1      51.195       168.308    .000
 diabetes              2.237           2       1.118         3.677    .026
 height              111.378           1     111.378       366.168    .000
 Error                241.817        795        .304
 Total               3773.779        799
 Corrected Total      356.434        798
a R Squared = .322 (Adjusted R Squared = .319)
Parameter Estimates
Dependent Variable: fev1

 Parameter           B      Std. Error      t        Sig.    95% Confidence Interval
 Intercept        -4.392      .337       -13.025     .000
 [diabetes=1.00]    .126      .049         2.549     .011
 [diabetes=2.00]    .046      .056          .830     .407       -.063 to .156
 [diabetes=3.00]   0(a)
 height             .039      .002        19.136     .000        .035 to .043
a This parameter is set to zero because it is redundant.
Example. Adding an interaction between diabetes status and height in the regression
model.
To add an interaction between two variables, select the Build Term(s) option to show
Interaction, select the two variables under Factors & Covariates, and then select the arrow
under Build Term(s).
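The same model with the interaction term added, as a syntax sketch:

  UNIANOVA fev1 BY diabetes WITH height
    /PRINT=PARAMETER
    /DESIGN=diabetes height diabetes*height.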
Tests of Between-Subjects Effects
Dependent Variable: fev1

                    Sum of Squares    df    Mean Square       F        Sig.
 Corrected Model          -             5      22.989        75.492    .000
 Intercept             42.741           1      42.741       140.354    .000
 diabetes                .272           2        .136          .447    .639
 height                94.349           1      94.349       309.823    .000
 diabetes * height       .328           2        .164          .539    .583
 Error                241.488         793        .305
 Total               3773.779         799
 Corrected Total      356.434         798
Parameter Estimates
Dependent Variable: fev1

 Parameter                    B      Std. Error      t       Sig.
 Intercept                 -4.373      .673       -6.498     .000
 [diabetes=1.00]            -.168      .818        -.206     .837
 [diabetes=2.00]             .614      .963         .637     .524
 [diabetes=3.00]            0(a)
 height                      .039      .004        9.506     .000
 [diabetes=1.00] * height    .002      .005         .361     .719
 [diabetes=2.00] * height   -.003      .006        -.593     .553
 [diabetes=3.00] * height   0(a)
a This parameter is set to zero because it is redundant.
Logistic Regression
1. Choose Analyze on the menu bar
2. Choose Regression
3. Choose Binary Logistic...
4. Select the dependent variable and the independent variables (covariates), and then
   choose OK
Note that SPSS creates a new variable for each selected Save... option and adds the new
variables to the data file. The variable names are defined in the Variable View of the Data
Editor window. Once you are done using these variables you may want to delete them from the
data file or save them (by re-saving the data file).
Method. Click on the down arrow to the right of Method to display the methods available for
independent variable entry (enter, forward:conditional, forward:LR, forward:Wald,
backward:conditional, backward:LR, backward:Wald).
Options...
Confidence interval for odds ratio (CI for exp(B))
Hosmer-Lemeshow goodness-of-fit
You can modify the entry and removal criteria used by the backward and forward variable
entry methods.
Previous, Block # of #, Next. You can use these options to enter independent variables in blocks
into the regression model. You can select different methods of variable entry for each block.
Example. Logistic regression will be used to determine the relationship between any use
of health services (coded 0 = no use, 1 = any use) and age, health index, gender and race.
Subjects in the study (Model Cities Data Set) were followed for a varying amount of time,
so the number of months followed (expos) will also be included as an independent variable
in the logistic regression model.
The dependent variable,
anyuse, is binary.
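A syntax sketch of this model (declaring female and race as categorical with the first category as the reference group, and requesting the Hosmer-Lemeshow test and 95% confidence intervals):

  LOGISTIC REGRESSION VARIABLES anyuse
    /METHOD=ENTER expos age female race health
    /CONTRAST (female)=INDICATOR(1)
    /CONTRAST (race)=INDICATOR(1)
    /PRINT=GOODFIT CI(95).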
You can use the Categorical option to define which variables are categorical, and SPSS will
create the indicator variables. By default the category with the largest numerical value (last) will
be the reference group. Here, the category with the smallest numerical value was selected as the
reference group.
Under Options you can select to have the 95% confidence intervals for the odds ratios displayed
in the output. Also, you can run the Hosmer-Lemeshow goodness-of-fit test.
Logistic Regression
Case Processing Summary

 Unweighted Cases(a)                        N       Percent
 Selected Cases    Included in Analysis    3199       73.1
                   Missing Cases           1175       26.9
                   Total                   4374      100.0
 Unselected Cases                             0         .0
 Total                                     4374      100.0
a If weight is in effect, see classification table for the total number of cases.
Dependent Variable Encoding
 Original Value   Internal Value
     .00                0
    1.00                1
Categorical Variables Codings

                      Frequency    Parameter coding
                                     (1)      (2)
 race     white           497       .000     .000
          other           455      1.000     .000
          black          2247       .000    1.000
 female   male           1450       .000
          female         1749      1.000

female(1) = female (male is the reference group)
Caution! Make sure you understand the interpretation of the indicator variables that
SPSS creates. It is very easy to get confused. For example, in this example the variable
race is coded 1=white, 2=other, 3=black. A common mistake would be to interpret race(1) =
white and race(2) = other.
Omnibus Tests of Model Coefficients
 Step 1   Step    Chi-square = 301.534, df = 6, Sig. = .000
          Block   Chi-square = 301.534, df = 6, Sig. = .000
          Model   Chi-square = 301.534, df = 6, Sig. = .000

Model Summary
 Step 1:  -2 Log likelihood = 2609.415(a); Cox & Snell R Square = .090;
          Nagelkerke R Square = .151
a Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
Classification Table(a)

                            Predicted anyuse
 Observed                   .00      1.00     Percentage Correct
 Step 1  anyuse   .00         0       542             .0
                  1.00        0      2657          100.0
         Overall Percentage                        83.1
a The cut value is .500
Hosmer and Lemeshow Test
 Step 1:  Chi-square = 8.368, Sig. = .398

The Hosmer-Lemeshow goodness-of-fit statistic is formed by grouping the data into g groups
(usually g = 10) based on the predicted probabilities and comparing the observed and expected
counts in each group. The observed and expected values can be used to help identify where
there is lack-of-fit when present.

Contingency Table for Hosmer and Lemeshow Test

          anyuse = .00           anyuse = 1.00
       Observed   Expected    Observed   Expected    Total
  1      124      123.653        197     197.347      321
  2      101       97.310        218     221.690      319
  3       79       81.589        241     238.411      320
  4       73       67.769        248     253.231      321
  5       57       54.600        263     265.400      320
  6       33       41.820        287     278.180      320
  7       32       29.724        288     290.276      320
  8       16       21.258        304     298.742      320
  9       13       15.538        307     304.462      320
 10       14        8.740        304     309.260      318
The last table of the output usually has the results we are most interested in. It lists the
odds ratios, p-values and 95% confidence intervals for the odds ratios.
Variables in the Equation

                B       S.E.      Wald      df    Sig.    Exp(B)   95% CI for Exp(B)
 Step 1(a)
 expos        .077      .006    167.398      1    .000     1.080    1.068 to 1.093
 age          .009      .003      8.118      1    .004     1.009    1.003 to 1.016
 female(1)    .501      .099     25.363      1    .000     1.650    1.358 to 2.005
 race                            12.715      2    .002
 race(1)     -.424      .190      4.964      1    .026      .655     .451 to  .950
 race(2)     -.530      .149     12.689      1    .000      .588     .440 to  .788
 health       .048      .010     23.603      1    .000     1.049    1.029 to 1.070
 Constant    -.337      .196      2.958      1    .085      .714
a Variable(s) entered on step 1: expos, age, female, race, health.
It is often helpful to write on your output the definition of the indicator variables, so you
don't get confused about the interpretation of the results. It is also helpful to change Exp(B)
to odds ratio, and Sig. to P-value.

                     Odds Ratio   95% CI for odds ratio   P-value
 Step 1(a)
 expos                 1.080        1.068 to 1.093          .000
 age                   1.009        1.003 to 1.016          .004
 female vs male        1.650        1.358 to 2.005          .000
 race                                                       .002
 other vs white         .655         .451 to  .950          .026
 black vs white         .588         .440 to  .788          .000
 health                1.049        1.029 to 1.070          .000