
SPSS TUTORIAL

Basic Data Entry

Name
The name of each SPSS variable in a given file must be unique; it must start with a letter;
and it may have up to 8 characters (letters, numbers, and the underscore _). Note that
certain keywords are reserved and may not be used as variable names, e.g., "compute",
"sum", and so forth. To change an existing name, click in the cell containing the name,
highlight the part you want to change, and type in the replacement. To create a new
variable name, click in the first empty row under the name column and type a new
(unique) variable name.

Notice that we can use "cat_dog" but not "cat-dog" and not "cat dog". The hyphen gets
interpreted as subtraction (cat minus dog) by SPSS, and the space confuses SPSS as to
how many variables are being named.

Type
The two basic types of variables that you will use are numeric and string. Numeric
variables may only have numbers assigned. String variables may contain letters or
numbers, but even if a string variable happens to contain only numbers, numeric
operations on that variable will not be allowed (e.g., finding the mean, variance, standard
deviation, etc.). To change a variable type, click in that cell on the grey box with the ... symbol.

Clicking on this box will bring up the variable type menu:

If you select a numeric variable, you can then click in the width box or the decimal box to
change the default values of 8 characters reserved for displaying numbers with 2 decimal
places. For whole numbers, you can drop the decimals down to 0.
If you select a string variable, you can tell SPSS how much "room" to leave in memory
for each value, indicating the number of characters to be allowed for data entry in this
string variable.

Width
The width of a variable is the number of characters SPSS will allow to be entered for the
variable. If it is a numerical value with decimals, this total width has to include a spot for
each decimal, as well as one for the decimal point. You can change a width by clicking in
the width cell for the desired variable and typing a new number, or you can use the arrow
keys at the edge of the cell.

Decimals
The decimals setting of a variable is the number of decimal places that SPSS will display. If
more decimals have been entered (or computed by SPSS), the additional information will
be retained internally but not displayed on screen. For whole numbers, you would reduce
the number of decimals to zero. You can change the number of decimal places by clicking
in the decimals cell for the desired variable and typing a new number, or you can use the
arrow keys at the edge of the cell.

Label
The label of a variable is a string of text that identifies in more detail what a variable
represents. Unlike the name, the label may be up to 255 characters long and may contain
spaces and punctuation. For instance, if there is a variable for each question on a
questionnaire, you would type the question as the variable label. To change or edit a
variable label, simply click anywhere within the cell.
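If you prefer syntax, the same label can be set with the VARIABLE LABELS command. This is a minimal sketch; the variable name q1 and the label text are placeholders, not taken from the example above.

* Attach a descriptive label to a variable (name and text are placeholders).
VARIABLE LABELS q1 'How satisfied are you with your course overall?'.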

Values
Although the variable label goes a long way to explaining what the variable represents,
for categorical data (discrete data of both nominal and ordinal levels of measurement), we
often need to know which numbers represent which categories. To indicate how these
numbers are assigned, one can add labels to specific values by clicking on the ... box in
the values cell.

Clicking here opens up the Value Labels dialogue box.

Click in the Value field to type a specific numeric value.

Click in the Label field to type the corresponding label.

Click on the Add button to add this pair of value and label to the list.

You can remove a pairing created above by clicking on that pair and then clicking on the
Remove button. Similarly, you can change a pairing by clicking on the pair, then typing in a
new value, a new label, or both; then, you click on the Change button. When you are
satisfied with the definitions of each value, click on the OK button. A syntax equivalent is
sketched below.
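The same value-label pairs can be defined with the VALUE LABELS command. A minimal sketch, assuming a hypothetical variable gender coded 1 and 2; substitute your own variable name and codes.

* Pair numeric codes with text labels (variable and codes are placeholders).
VALUE LABELS gender
  1 'Male'
  2 'Female'.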

Undesirable Variable Types


SPSS has two variable types: string variables and numeric variables. String variables have
String under Type in Variable View. All other variables are numeric. The screenshot below
illustrates this point for hotel_evaluation.sav.

A problem with some data files is that they contain string variables that should have been
numeric. A rule of thumb is that only nominal variables with many distinct values should be
string variables. Right. In Variable View we see that fname, bday, age and q1 are string
variables. The screenshot below shows them in Data View.

First, fname holds respondents' first names. Is it nominal? Yes. Does it have many different
values? Yes. Conclusion: it's an appropriate string variable. No problem here.
Second, bday holds respondents' birthdays. Is it nominal? No. Conclusion: this should have been
a numeric variable. More precisely, it should be a date variable (which is also a numeric
variable). Solution: convert it. Convert String to Date Variable shows how to do so but we'll skip
that for now.
Third, age is also metric rather than nominal and thus had better be converted to
numeric as well. We'll cover this in SPSS Convert String to Numeric Variable but we'll skip it for
now.
Fourth, q1 appears to be an ordinal variable. It's not nominal and it doesn't have many distinct
values either so it's not a proper string variable. A labeled numeric variable (similar to q2 for
example) would be appropriate here. For now, we'll skip converting it.
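Although the conversions themselves are skipped here, a simple string-to-numeric case like age can be handled in syntax with the NUMBER function. A minimal sketch under that assumption; the new name age_num and the F3.0 input format are placeholders.

* Create a numeric copy of the string variable age.
* F3.0 reads up to three digits with no decimal places.
COMPUTE age_num = NUMBER(age, F3.0).
EXECUTE.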
Cronbach's Alpha (Reliability Analysis)
Step 2:
(a) Select "Analyze"
(b) Select "Scale"
(c) Select "Reliability Analysis"
Figure 2 shows what your screen should now display.
Figure 2: Reliability Analysis Command

Step 3: A pop-up window will appear for reliability analysis. In this window are two boxes, one
to the left and one to the right. The left contains the variables entered in SPSS (TV1, TV2, etc.);
the box to the right, which is labeled "Items," is where one moves those variables for which
Cronbach's alpha is desired. Note that I have selected the three Task Value variables in Figure 3.

Figure 3: Reliability Analysis Pop-up Window

In Figure 4, note that I have moved the three Task Value variables to the box on the right, for
these are the three for which I desire Cronbach's alpha. Once we run this analysis, Cronbach's
alpha will be calculated for the three Task Value variables (items) to provide information about
the internal consistency of those three items. If we also wanted to obtain Cronbach's alpha for the
Anxiety items, we would need to re-run the analysis with only the Anxiety items appearing in
the "Items:" box. To run Cronbach's alpha with both sets of items, Task Value and Anxiety, would
be a mistake because those six items are not designed to measure the same construct and the
alpha that would result would be uninterpretable.

Figure 4: Reliability Analysis Pop-up Window

Step 4: Select desired statistics for the analysis. Click on the "Statistics" button, which can be
seen in Figure 4. Once that button is selected, a pop-up window labeled "Statistics" will appear.
This window is displayed in Figure 5 below. Note in Figure 5 that I have placed a check mark
next to "Scale" and "Scale if item deleted." You should also select those two. After selecting
those two options, click on the "Continue" button to return to the "Reliability Analysis" pop-up
window displayed above in Figure 4, then click on the "OK" button to run the analysis. The
equivalent syntax is sketched after Figure 5.

Figure 5: Statistical Options for Reliability Analysis
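Steps 2 through 4 can also be run from syntax with the RELIABILITY command. A minimal sketch, assuming the three items are named TV1, TV2 and TV3 as in the figures; the last two subcommands correspond to the "Scale" and "Scale if item deleted" options ticked above.

* Cronbach's alpha for the three Task Value items.
* /STATISTICS=SCALE requests the scale statistics.
* /SUMMARY=TOTAL requests the item-total and alpha-if-deleted columns.
RELIABILITY
  /VARIABLES=TV1 TV2 TV3
  /MODEL=ALPHA
  /STATISTICS=SCALE
  /SUMMARY=TOTAL.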

Step 5: Analysis of results.


(a) Overall alpha: Now that Cronbach's alpha has been run for the three Task Value items, we
must next examine the results. Figure 6 below displays some of the results obtained. The red
arrow points to the overall alpha for the three Task Value items. As the results in Figure 6 show,
overall alpha is .907, which is very high and indicates strong internal consistency among the
three Task Value items. Essentially this means that respondents who tended to select high scores
for one item also tended to select high scores for the others; similarly, respondents who selected
low scores for one item tended to select low scores for the other Task Value items. Thus,
knowing the score for one Task Value item would enable one to predict with some accuracy the
possible scores for the other two Task Value items. Had alpha been low, predicting scores on the
other items from one item would not be possible.

Figure 6: Statistical Results for Reliability Analysis (overall alpha highlighted)

(b) Corrected Item-Total Correlation: Figure 7 below highlights the column containing the
"Corrected Item-Total Correlation" for each of the items. This column displays the correlation
between a given Task Value item and the sum score of the other two items. For example, the
correlation between Task Value item 1 and the sum of items 2 and 3 (i.e., item 2 + item 3) is
r = .799. What this means is that there is a strong, positive correlation between the scores on the
one item (item 1) and the combined score of the other two (items 2 and 3). This is a way to assess
how well one item's score is internally consistent with composite scores from all other items that
remain. If this correlation is weak (de Vaus suggests anything less than .30 is a weak correlation
for item-analysis purposes [de Vaus (2004), Surveys in Social Research, Routledge, p. 184]), then
that item should be removed and not used to form a composite score for the variable in question.
For example, if the correlation between scores for item 1 and the combined scores of items 2 and
3 was low, say r = .15, then when we create the composite (overall) score for Task Value (the step
taken after reliability analysis) we would create the composite using only items 2 and 3 and we
would simply ignore scores from item 1 because it was not internally consistent with the other
items.

Figure 7: Statistical Results for Reliability Analysis (Corrected Item-Total Correlation)

(c) Cronbach's Alpha if Item Deleted: Figure 8 displays the Cronbach's alpha that would result if a
given item were deleted. Like the item-total correlation presented above in (b), this column of
information is valuable for determining which items from among a set of items contribute to the
total alpha. The value presented in this column represents the alpha value if the given item were
not included. For example, for Task Value item 1, Cronbach's alpha would drop from the overall
total of .907 to .880 if item 1 were deleted. Since alpha would drop with the removal of
TV1, this item appears to be useful and to contribute to the overall reliability of Task Value. Item 3,
however, is less certain. Cronbach's alpha would increase from .907 to .911 if item 3 were deleted
or not used for computing an overall Task Value score. So should this item be removed and
should the overall Task Value composite be created only from items 1 and 2? In this case the
answer is no; we should instead retain all three items. Why? Note first that alpha does not
increase by a large degree from deleting item 3. Second, note that item 3 still correlates very well
with the composite score from items 1 and 2 (the item-total correlation for item 3 is .759). Since
deletion of item 3 results in little change, and since item 3 correlates well with the composite of
items 1 and 2, there is no statistical reason to drop item 3.

Figure 8: Statistical Results for Reliability Analysis (Cronbach's Alpha if item Deleted)

Once the reliability analysis is done and decisions have been made about which items to delete
or retain, the second step is to develop the scale.
Compute Variables

To compute a new variable, click Transform > Compute Variable.

The Compute Variable window will open where you will specify how to calculate your new
variable.

A Target Variable: The name of the new variable that will be created during the computation.
Simply type a name for the new variable in the text field. Once a variable is entered here, you
can click on Type & Label to assign a variable type and give it a label. The default type for
new variables is numeric.
B The left column lists all of the variables in your dataset. You can use this menu to add
variables into a computation: either double-click on a variable to add it to the Numeric
Expression field, or select the variable(s) that will be used in your computation and click the
arrow to move them to the Numeric Expression text field (C).
C Numeric Expression: Specify how to compute the new variable by writing a numeric
expression.
D The center of the window includes a collection of arithmetic operators, Boolean operators,
and numeric characters, which you can use to specify how your new variable will be calculated.
There are many kinds of calculations you can specify by selecting a variable (or multiple
variables) from the left column, moving them to the center text field, and using the blue buttons
to specify values (e.g., 1) and operations (e.g., +, *, /).
E If: The If option allows you to specify the conditions under which your computation will be
applied.

F Function group: You can also use the built-in functions in the Function group list on the
right-hand side of the window. The function group contains many useful, common functions that
may be used for calculating values for new variables (e.g., mean, logarithm). To find a specific
function, simply click one of the function groups in the Function Group list. You will now see a
list of functions that belong to that function group in the Functions and Special Variables area.
If you click on a specific function, a description of that function will appear in the text field to
the left.

Click If (indicated by letter E in the above image) to open the Compute Variable: If Cases
window.
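As an illustration of what such a condition does, the following minimal sketch is the syntax analogue of a conditional computation; the variables gender and score, and the new variable score_adj, are hypothetical.

* Compute score_adj only where gender equals 1.
* Cases failing the condition are left system-missing.
IF (gender = 1) score_adj = score * 1.1.
EXECUTE.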

EXAMPLE: COMPUTING A NEW VARIABLE USING ARITHMETIC


Now we will use what we have learned throughout this tutorial to demonstrate how to compute a
new variable. In this example, we wish to compute a new variable called AverageScore that is the
average of four test score variables: English, Reading, Math, and Writing.
1. Click Transform > Compute Variable.

2. In the Target Variable field, type a name for the new variable that will be computed.
Let's call our new variable AverageScore.
3. Highlight each variable (English, Reading, Math, and Writing) from the list on the left
and click the arrow to move each variable to the Numeric
Expression field. (Alternatively, you can double-click on the variable name to move it
to the Numeric Expression field.) Make sure you click the spacebar to create a space
between each variable.
4. Now your four variables will appear in the Numeric Expression field. Move your cursor
between each set of variables and click the + sign to add the symbol for addition to the
numeric expression. Now your expression should appear as English + Reading + Math +
Writing.
5. Now insert parentheses around the expression so that it appears as (English + Reading +
Math + Writing).
6. At the end of the expression, add the / sign and the number 4. Now your expression
should appear as (English + Reading + Math + Writing) / 4.
7. The final expression indicates that the new variable, AverageScore, will be calculated as
the average of the four test scores.
8. Click OK to complete the computation and apply the changes to the data.
9. Finally, let's make sure that a new variable called AverageScore was successfully created.
o We can find the new variable in the last column in Data View or in the last row of
Variable View. If you do not see the new variable, the computation was
unsuccessful.
o We can check the syntax that was executed by looking at the log in the Output
Viewer window. After running Compute Variable, the syntax that should have
appeared in the output window is:
o COMPUTE AverageScore=(English + Reading + Math + Writing) / 4.
EXECUTE.
If there was an error in how the computation was specified, the log in the Output
Viewer will often show an error message.
o It is also useful to explore whether the computation you specified was applied
correctly to the data. You can spot-check the computation by viewing your data in
the Data View tab. To check that the new variable computed correctly, you can
manually calculate the averages for a few cases in your dataset just to spot-check
that the computation worked correctly.
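As an aside, the same average can be produced with the built-in MEAN function from the Function group described earlier. A minimal sketch using the same four variables; the target name AverageScore2 is a placeholder so the original variable is not overwritten. Note the behavioural difference: MEAN averages whatever scores are present, so a case missing one test still gets a value, whereas the arithmetic expression above returns system-missing for that case.

* Average of the four scores via the MEAN function.
* MEAN.4(...) would instead require all four scores to be present.
COMPUTE AverageScore2 = MEAN(English, Reading, Math, Writing).
EXECUTE.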
Recoding New Variables (Reverse Coding)
RECODE INTO DIFFERENT VARIABLES

Recoding into a different variable transforms an original variable into a new variable. That is, the
changes do not overwrite the original variable; they are instead applied to a copy of the original
variable under a new name.
To recode into different variables, click Transform > Recode into Different Variables.

The Recode into Different Variables window will appear.

The left column lists all of the variables in your dataset. Select the variable you wish to recode
by clicking it. Click the arrow in the center to move the selected variable to the center text box,
(A).
A Input Variable -> Output Variable: The center text box lists the variable(s) you have
selected to recode, as well as the name your new variable(s) will have after the recode. You will
define the new name in (B).

B Output Variable: Define the name and label for your recoded variable(s) by typing them in
the text fields. Once you are finished, click Change. Now the center text box, (A), will display
both the name of the original variable as well as the name for the new variable (e.g., Height -->
Height_categ).
C Old and New Values: Click the Old and New Values button to specify how you wish to recode
the values for the selected variable.
D If: The If option allows you to specify the conditions under which your recode will be
applied. (We discuss the If option in more detail later in this tutorial.)

Old and New Values


Once you click Old and New Values, a new window where you will specify how to transform
the values will appear.

1 Old Value: Specify the type of value you wish to recode (e.g., a specific value, missing data,
or a range of values) and the specific value to be recoded (e.g., a value of 1 or a range of 1
through 5).

When recoding variables, always handle the missing values first! The most common recoding
errors come from not explicitly accounting for missing values, so that they end up lumped in
with the valid values.
Value: Enter a specific numeric code representing an existing category.
System-missing: Applies to any system-missing values (.).
System- or user-missing: Applies to any system-missing values (.) or special missing
value codes defined by the user in the Variable View window.
Range: For use with ordered categories or continuous measurements. Enter the lower and
upper boundaries that should be coded. The recoded category will include both
endpoints, so data values that are exactly equal to the boundaries will be included in that
category.
Range, LOWEST through value: For use with ordered categories or continuous
measurements. Recode all values less than or equal to some number.
Range, value through HIGHEST: For use with ordered categories or continuous
measurements. Recode all values greater than or equal to some number.
All other values: Applies to any value not explicitly accounted for by the previous
recoding rules. If using this setting, it should be applied last.
2 New Value: Specify the new value for your variable (i.e., a specific numeric code such as 2,
system-missing, or copy old values).
3 Old -> New: Once you have selected the old and new values for your selected variable in (1)
and (2), click Add in area (3), Old-->New. The recode that you have specified now appears in the
text field. If you need to change one of the recodes that you have added to the Old-->New area
section, simply click on the one you wish to change and make changes in (1) and (2) as
necessary.
You will need to repeat these steps for each value that you wish to recode. Once you have
specified all the transformations that you wish to make for the selected variable, click the
Continue button. A syntax sketch for a typical reverse-coding recode follows.
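Because this section is about reverse coding, here is how a typical set of Old and New Values choices translates into syntax. A minimal sketch, assuming a hypothetical 5-point item q1 recoded into a new variable q1_rev; missing values are handled first, as advised above.

* Reverse-code a 1-5 item into a new variable.
RECODE q1 (SYSMIS=SYSMIS) (1=5) (2=4) (3=3) (4=2) (5=1) INTO q1_rev.
EXECUTE.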
Descriptive Statistics
See the PDF file uploaded.

Inferential Statistics
Correlational Analysis

The Pearson product-moment correlation coefficient (Pearson's correlation, for short) is a
measure of the strength and direction of association that exists between two variables
measured on at least an interval scale. For example, you could use a Pearson's correlation to
understand whether there is an association between exam performance and time spent
revising; whether there is an association between depression and length of unemployment;
and so forth.

Click Analyze > Correlate > Bivariate... on the menu system as shown below:



You will be presented with the following screen:


Transfer the variables Height and Jump_Dist into the Variables: box by dragging-and-dropping
or by clicking the arrow button. You will end up with a screen similar to the one below:



Note: If your study involves calculating more than one correlation and you want to carry out
these correlations at the same time, we show you how to do this in our enhanced Pearson's
correlation guide. We also show you how to write up the results from multiple correlations.

Make sure that the Pearson tickbox is checked under the -Correlation Coefficients- area
(although it is selected by default in SPSS).

Click the Options button. If you wish to generate some descriptives, you can do it here
by clicking on the relevant tickbox under the -Statistics- area.


Click the Continue button.

Click the OK button. The equivalent syntax is sketched below.
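For reference, the equivalent syntax for the procedure just described is the CORRELATIONS command. A minimal sketch using the two variables from the steps above; the /MISSING=PAIRWISE subcommand mirrors the default case handling.

* Pearson correlation between Height and Jump_Dist,
* with two-tailed significance tests.
CORRELATIONS
  /VARIABLES=Height Jump_Dist
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.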

Correlations Box
Take a look at the first box in your output file called Correlations. You will see your variable
names in two rows. In this example, you can see the variable name water in the first row and
the variable name skin in the second row. You will also see your two variable names in two
columns. See the variable names water and skin in the columns on the right? You will see four
boxes on the right hand side. These boxes will all contain numbers that represent variable
crossings. For example, the top box on the right represents the crossing between the water
variable and the skin variable. The bottom box on the left also happens to represent this
crossing. These are the two boxes that we are interested in. They will have the same information
so we really only need to read from one. In these boxes, you will see a value for Pearson's r, a
Sig. (2-tailed) value and a number (N) value.
Pearson's r
You can find the Pearson's r statistic at the top of each box. The Pearson's r for the
correlation between the water and skin variables in our example is 0.985.

When Pearson's r is close to 1


This means that there is a strong relationship between your two variables. This means that
changes in one variable are strongly correlated with changes in the second variable. In
our example, Pearson's r is 0.985. This number is very close to 1. For this reason, we can
conclude that there is a strong relationship between our water and skin variables.
However, we cannot make any other conclusions about this relationship, based on this
number.
When Pearson's r is close to 0
This means that there is a weak relationship between your two variables. This means that
changes in one variable are not correlated with changes in the second variable. If our
Pearson's r were 0.01, we could conclude that our variables were not strongly correlated.
When Pearson's r is positive (+)
This means that as one variable increases in value, the second variable also increases in
value. Similarly, as one variable decreases in value, the second variable also decreases in
value. This is called a positive correlation. In our example, our Pearson's r value of 0.985
was positive. We know this value is positive because SPSS did not put a negative sign in
front of it. So, positive is the default. Since our example Pearson's r is positive, we can
conclude that when the amount of water increases (our first variable), the participant skin
elasticity rating (our second variable) also increases.
When Pearson's r is negative (-)

This means that as one variable increases in value, the second variable decreases in value.
This is called a negative correlation. In our example, our Pearson's r value of 0.985 was
positive. But what if SPSS generated a Pearson's r value of -0.985? If SPSS generated a
negative Pearson's r value, we could conclude that when the amount of water increases
(our first variable), the participant skin elasticity rating (our second variable) decreases.
Sig (2-Tailed) value
You can find this value in the Correlations box. This value will tell you if there is a
statistically significant correlation between your two variables. In our example, our Sig.
(2-tailed) value is 0.002.

If the Sig (2-Tailed) value is greater than .05


You can conclude that there is no statistically significant correlation between your two
variables. That means, increases or decreases in one variable do not significantly relate to
increases or decreases in your second variable.
If the Sig (2-Tailed) value is less than or equal to .05
You can conclude that there is a statistically significant correlation between your two
variables. That means, increases or decreases in one variable do significantly relate to
increases or decreases in your second variable.
Our Example
The Sig. (2-Tailed) value in our example is 0.002. This value is less than .05. Because of
this, we can conclude that there is a statistically significant correlation between amount of
water consumed in glasses and participant rating of skin elasticity.

So what about the scatterplot?


You can find your scatterplot in your output file. It will look something like the graph
below. You will see a bunch of dots. Your scatterplot can tell you about the relationship
between variables, just like Pearson's r. With it, you can determine the strength and
direction of the relationship between variables.

Relationship strength
Try to imagine a line that connects the dots in your scatterplot. Is this an easy or difficult
task? This task can help you determine the strength of the relationship between your two
variables. If your variables have a strong relationship, it will be easy for you to imagine
a line connecting all of the dots. For example, in our example scatterplot, the dots seem
to go together to form a straight line. However, some scatterplots do not look like this.
With some scatterplots, the dots are scattered about so that it is very hard to imagine a
line connecting them. The dots are not densely positioned in one place. Instead, they are
all over the place. When this is the case, your variables may not have a strong
relationship.
Relationship Direction
You can use your scatterplot to understand the direction of your relationship. Your
scatterplot can tell you if you have a positive, negative or zero correlation.
Positive correlation in a scatterplot
If the line that you imagine in your graph slopes upward from zero, you can conclude
that you have a positive correlation between your variables. Increases in one variable are
correlated with increases in your other variable. Similarly, decreases in one variable are
correlated with decreases in your other variable.

Negative correlation in a scatterplot


If the line that you imagine in your graph starts high at zero and gradually slopes
downward, you can conclude that you have a negative correlation between your
variables. Increases in one variable are correlated with decreases in your other variable.
Zero correlation in a scatterplot
If the line that you imagine does not slope, or you can't imagine a line at all, you can
conclude that you have a zero correlation between your variables. That means that your
variables are not related to one another. Increases or decreases in one variable have no
effect on increases or decreases in your second variable.
Multiple Regression Analysis
Multiple regression is an extension of simple linear regression. It is used when we want to
predict the value of a variable based on the value of two or more other variables. The variable we
want to predict is called the dependent variable (or sometimes, the outcome, target or criterion
variable). The variables we are using to predict the value of the dependent variable are called the
independent variables (or sometimes, the predictor, explanatory or regressor variables).
For example, you could use multiple regression to understand whether exam performance can be
predicted based on revision time, test anxiety, lecture attendance and gender. Alternately, you
could use multiple regression to understand whether daily cigarette consumption can be
predicted based on smoking duration, age when started smoking, smoker type, income and
gender.
Multiple regression also allows you to determine the overall fit (variance explained) of the model
and the relative contribution of each of the predictors to the total variance explained. For
example, you might want to know how much of the variation in exam performance can be
explained by revision time, test anxiety, lecture attendance and gender "as a whole", but also the
"relative contribution" of each independent variable in explaining the variance.

Click Analyze > Regression > Linear... on the main menu, as shown below:



Note: Don't worry that you're selecting Analyze > Regression > Linear... on the main menu or
that the dialogue boxes in the steps that follow have the title, Linear Regression. You have not
made a mistake. You are in the correct place to carry out the multiple regression procedure. This
is just the title that SPSS Statistics gives, even when running a multiple regression procedure.

You will be presented with the Linear Regression dialogue box below:


Transfer the dependent variable, VO2max, into the Dependent: box and the independent
variables, age, weight, heart_rate and gender, into the Independent(s): box, using the arrow
buttons, as shown below (all other boxes can be ignored):


Note: For a standard multiple regression you should ignore the Previous and Next buttons,
as they are for sequential (hierarchical) multiple regression. The Method: option needs to be kept
at the default value, which is Enter. If, for whatever reason, Enter is not selected,
you need to change Method: back to Enter. The Enter method is the name given
by SPSS Statistics to standard regression analysis.

Click the Statistics button. You will be presented with the Linear Regression:
Statistics dialogue box, as shown below:


In addition to the options that are selected by default, select Confidence intervals in the
Regression Coefficients area, leaving the Level (%): option at "95". You will end up with the
following screen:


Click the Continue button. You will be returned to the Linear Regression dialogue box.

Click the OK button. This will generate the output. The equivalent syntax is sketched below.
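For reference, a syntax equivalent is below. A minimal sketch using the variable names from this example; the /STATISTICS subcommand lists the default coefficient and model-fit output plus the CI(95) confidence intervals selected above.

* Standard (Enter) multiple regression of VO2max
* on age, weight, heart_rate and gender.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CI(95)
  /DEPENDENT VO2max
  /METHOD=ENTER age weight heart_rate gender.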

Interpreting and Reporting the Output of Multiple Regression Analysis


SPSS Statistics will generate quite a few tables of output for a multiple regression analysis. In
this "quick start" guide, we focus only on the three main tables you need to understand your
multiple regression results, assuming that no assumptions have been violated; that is, that your
data has already met the eight assumptions required for multiple regression to give you a valid
result:
Determining how well the model fits
The first table of interest is the Model Summary table. This table provides the R, R2,
adjusted R2, and the standard error of the estimate, which can be used to determine how well a
regression model fits the data:


The "R" column represents the value of R, the multiple correlation coefficient. R can be
considered to be one measure of the quality of the prediction of the dependent variable; in this
case, VO2max . A value of 0.760, in this example, indicates a good level of prediction. The "R
Square" column represents the R2 value (also called the coefficient of determination), which is
the proportion of variance in the dependent variable that can be explained by the independent
variables (technically, it is the proportion of variation accounted for by the regression model
above and beyond the mean model). You can see from our value of 0.577 that our independent
variables explain 57.7% of the variability of our dependent variable, VO2max . However, you
also need to be able to interpret "Adjusted R Square" (adj. R2) to accurately report your data.
We explain the reasons for this, as well as the output, in our enhanced multiple regression guide.

Statistical significance
The F-ratio in the ANOVA table (see below) tests whether the overall regression model is a good
fit for the data. The table shows that the independent variables statistically significantly predict
the dependent variable, F(4, 95) = 32.393, p < .0005 (i.e., the regression model is a good fit of
the data).


Estimated model coefficients


The general form of the equation to predict VO2max from age, weight, heart_rate and gender
is:

predicted VO2max = 87.83 - (0.165 x age) - (0.385 x weight) - (0.118 x heart_rate) +
(13.208 x gender)

This is obtained from the Coefficients table, as shown below:


Unstandardized coefficients indicate how much the dependent variable varies with an
independent variable when all other independent variables are held constant. Consider the effect
of age in this example. The unstandardized coefficient, B1, for age is equal to -0.165
(see Coefficients table). This means that for each one year increase in age, there is a decrease in
VO2max of 0.165 ml/min/kg.
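To make the interpretation concrete, consider a hedged worked example: for a hypothetical case aged 30, weighing 80 kg, with a heart rate of 133 beats/min and gender coded 1, the equation gives 87.83 - (0.165 x 30) - (0.385 x 80) - (0.118 x 133) + (13.208 x 1) = 49.6 ml/min/kg (to one decimal). In syntax, predicted values could be computed for every case as follows; the coefficients are copied from the Coefficients table, and the variable name pred_vo2max is a placeholder.

* Predicted VO2max from the estimated model coefficients.
COMPUTE pred_vo2max = 87.83 - 0.165*age - 0.385*weight
  - 0.118*heart_rate + 13.208*gender.
EXECUTE.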
Statistical significance of the independent variables
You can test for the statistical significance of each of the independent variables. This tests
whether the unstandardized (or standardized) coefficients are equal to 0 (zero) in the population.
If p < .05, you can conclude that the coefficients are statistically significantly different to 0
(zero). The t-value and corresponding p-value are located in the "t" and "Sig." columns,
respectively, as highlighted below:



You can see from the "Sig." column that all independent variable coefficients are statistically
significantly different from 0 (zero). Although the intercept, B0, is tested for statistical
significance, this is rarely an important or interesting finding.
Putting it all together

You could write up the results as follows:

General

A multiple regression was run to predict VO2max from gender, age, weight and heart rate. These
variables statistically significantly predicted VO2max, F(4, 95) = 32.393, p < .0005, R2 = .577.
All four variables added statistically significantly to the prediction, p < .05.
Comparing Means

Independent T-Test
The independent-samples t-test (or independent t-test, for short) compares the means between
two unrelated groups on the same continuous, dependent variable. For example, you could use an
independent t-test to understand whether first year graduate salaries differed based on gender
(i.e., your dependent variable would be "first year graduate salaries" and your independent
variable would be "gender", which has two groups: "male" and "female"). Alternately, you could
use an independent t-test to understand whether there is a difference in test anxiety based on
educational level (i.e., your dependent variable would be "test anxiety" and your independent
variable would be "educational level", which has two groups: "undergraduates" and
"postgraduates").

Click Analyze > Compare Means > Independent-Samples T Test... on the top menu,
as shown below:



You will be presented with the Independent-Samples T Test dialogue box, as shown below:


Transfer the dependent variable, Cholesterol, into the Test Variable(s): box, and transfer
the independent variable, Treatment, into the Grouping Variable: box, by highlighting the
relevant variables and pressing the arrow buttons. You will end up with the following screen:


You then need to define the groups (treatments). Click on the Define Groups button. You
will be presented with the Define Groups dialogue box, as shown below:

Enter 1 into the Group 1: box and enter 2 into the Group 2: box. Remember that we
labelled the Diet Treatment group as 1 and the Exercise Treatment group as 2.



Note: If you have more than 2 treatment groups in your study (e.g., 3
groups: diet, exercise and drug treatment groups), but only wanted to compare two (e.g.,
the diet and drug treatment groups), you could type 1 into the Group 1: box and 3 into the
Group 2: box (i.e., if you wished to compare the diet with drug treatment).

Click the Continue button.

If you need to change the confidence level limits or change how to exclude cases, click
the Options button. You will be presented with the following:


Click the Continue button. You will be returned to the Independent-Samples T
Test dialogue box.

Click the OK button. The equivalent syntax is sketched below.
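The equivalent syntax for this procedure is the T-TEST command. A minimal sketch using the variables from this example, with groups 1 (diet) and 2 (exercise) as defined above.

* Independent-samples t-test on Cholesterol by Treatment group.
T-TEST GROUPS=Treatment(1 2)
  /VARIABLES=Cholesterol
  /CRITERIA=CI(.95).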

Output of the independent t-test in SPSS


SPSS generates two main tables of output for the independent t-test. In this "quick
start" guide, we take you through each of the two main tables in turn, assuming that your data
met all the relevant assumptions.
Group Statistics Table
This table provides useful descriptive statistics for the two groups that you compared, including
the mean and standard deviation.



Unless you have other reasons to do so, it would be considered normal to present information on
the mean and standard deviation for this data. You might also state the number of participants
that you had in each of the two groups. This can be useful when you have missing values and the
number of recruited participants is larger than the number of participants that could be analysed.
A diagram can also be used to visually present your results. For example, you could use a bar
chart with error bars (e.g., where the error bars could use the standard deviation, standard error or
95% confidence intervals). This can make it easier for others to understand your results. Again,
we show you how to do this in our enhanced independent t-test guide.
Independent Samples Test Table
This table provides the actual results from the independent t-test.



We can see that the group means are significantly different because the value in the "Sig.
(2-tailed)" row is less than 0.05. Looking at the Group Statistics table, we can see that those
people who undertook the exercise trial had lower cholesterol levels at the end of the programme
than those who underwent a calorie-controlled diet.
Reporting the output of the independent t-test

Based on the results above, we could report the results of the study as follows:

General (interpretation for the report)


This study found that overweight, physically inactive male participants had statistically
significantly lower cholesterol concentrations (5.80 ± 0.38 mmol/L) at the end of an exercise-
training programme compared to after a calorie-controlled diet (6.15 ± 0.52 mmol/L), t(38) =
2.428, p = 0.020.

ANOVA (More than two groups)


Output

ANOVA Table
This is the table that shows the output of the ANOVA analysis and whether we have a statistically
significant difference between our group means. We can see that the significance level is 0.021
(p = .021), which is below 0.05 and, therefore, there is a statistically significant difference in the
mean length of time to complete the spreadsheet problem between the different courses taken.
This is great to know, but we do not know which of the specific groups differed.



Reporting the output of the one-way ANOVA
Based on the results above, we could report the results of the study as:

General
There was a statistically significant difference between groups as determined by one-way
ANOVA (F(2,27) = 4.467, p = .021). The study suggested that the advanced level took less time
to complete the spreadsheet problem than the other levels. (A syntax sketch, including post hoc
tests, follows.)
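For completeness, a one-way ANOVA like this one can be run with the ONEWAY command; adding a post hoc subcommand addresses the point above that the omnibus test does not say which specific groups differed. A minimal sketch; the variable names time and course are assumptions, since the data file for this example is not shown.

* One-way ANOVA of completion time by course level,
* with Tukey post hoc comparisons.
ONEWAY time BY course
  /POSTHOC=TUKEY ALPHA(0.05).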
