Name
The name of each SPSS variable in a given file must be unique, and it must start with a letter.
It may contain letters, numbers, and the underscore (_), up to 8 characters in older versions of
SPSS (64 bytes in version 12 and later). Certain keywords are reserved and may not be used as
variable names (e.g., ALL, AND, BY, TO, WITH). To change an existing name, click in the cell
containing the name, highlight the part you want to change, and type in the replacement. To
create a new variable name, click in the first empty row under the name column and type a new
(unique) variable name.
Notice that we can use "cat_dog" but not "cat-dog" and not "cat dog". The hyphen gets
interpreted as subtraction (cat minus dog) by SPSS, and the space confuses SPSS as to
how many variables are being named.
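To make the naming rules concrete, here is a minimal Python sketch (not part of SPSS) that checks a candidate name. It assumes a simplified rule set: a letter first, then only letters, digits or underscores; a length limit passed in as `max_len` (8 in the older rule described above, 64 in newer versions); and SPSS's reserved keywords (ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH). It also assumes names clash regardless of case. The function names are hypothetical.

```python
import re

# Reserved keywords that SPSS does not accept as variable names.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# Simplified rule: a letter first, then letters, digits or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_spss_name(name: str, max_len: int = 8) -> bool:
    """Return True if `name` passes the simplified naming rules."""
    if len(name) > max_len:
        return False
    if name.upper() in RESERVED:
        return False
    # fullmatch rejects hyphens ("cat-dog") and spaces ("cat dog").
    return NAME_PATTERN.fullmatch(name) is not None


def duplicate_names(names: list[str]) -> set[str]:
    """Return names that clash when case is ignored (names must be unique)."""
    seen, dupes = set(), set()
    for name in names:
        key = name.upper()
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes


for candidate in ["cat_dog", "cat-dog", "cat dog", "with"]:
    print(candidate, is_valid_spss_name(candidate))
```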
Type
The two basic types of variables that you will use are numeric and string. Numeric
variables may only have numbers assigned. String variables may contain letters or
numbers, but even if a string variable happens to contain only numbers, numeric
operations on that variable will not be allowed (e.g., finding the mean, variance, standard
deviation, etc.). To change a variable type, click the small grey button marked "..." in that cell.
If you select a numeric variable, you can then click in the width box or the decimal box to
change the defaults, which reserve 8 characters for displaying numbers with 2 decimal
places. For whole numbers, you can reduce the decimals to 0.
If you select a string variable, you can tell SPSS how much "room" to leave in memory
for each value, indicating the number of characters to be allowed for data entry in this
string variable.
Width
The width of a variable is the number of characters SPSS will allow to be entered for the
variable. If it is a numeric value with decimals, the total width has to include a spot for
each decimal place, as well as one for the decimal point. You can change a width by clicking in
the width cell for the desired variable and typing a new number, or by using the arrow
keys at the edge of the cell.
Decimals
The decimals setting of a variable is the number of decimal places that SPSS will display. If
more decimals have been entered (or computed by SPSS), the additional information will
be retained internally but not displayed on screen. For whole numbers, you would reduce
the number of decimals to zero. You can change the number of decimal places by clicking
in the decimals cell for the desired variable and typing a new number, or by using the
arrow keys at the edge of the cell.
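The relationship between width, decimals and what is displayed can be sketched in Python. This is a rough analogy, not SPSS's own formatting code: the hypothetical helper `display` rounds only the displayed text while the stored value keeps full precision, and the hypothetical `min_width` counts the characters a number needs (ignoring any minus sign).

```python
def display(value: float, width: int = 8, decimals: int = 2) -> str:
    """Right-align `value` in `width` characters, showing `decimals` places.

    Only the returned text is rounded; `value` itself is unchanged,
    much as SPSS keeps extra decimals internally.
    """
    return f"{value:>{width}.{decimals}f}"


def min_width(integer_digits: int, decimals: int) -> int:
    """Characters needed: the integer digits, plus the point and each decimal place."""
    return integer_digits + (1 + decimals if decimals > 0 else 0)


score = 3.14159
print(repr(display(score)))           # → '    3.14'
print(repr(display(42, decimals=0)))  # → '      42'
print(min_width(3, 2))                # → 6 (e.g. 123.45)
print(score)                          # → 3.14159 (stored value untouched)
```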
Label
The label of a variable is a string of text that identifies in more detail what the variable
represents. Unlike the name, the label can be up to 255 characters long and may contain spaces
and punctuation. For instance, if there is a variable for each question on a questionnaire,
you would type the question as the variable label. To change or edit a variable label,
simply click anywhere within the cell.
Values
Although the variable label goes a long way to explaining what the variable represents,
for categorical data (discrete data of both nominal and ordinal levels of measurement), we
often need to know which numbers represent which categories. To indicate how these
numbers are assigned, you can add labels to specific values by clicking on the "..." button in
the Values cell.
Type a value and its label, then click the Add button to add this value-label pair to the list.
You can remove a pair created above by clicking on that pair and then clicking the Remove
button. Similarly, you can change a pair by clicking on it, typing in a new value, a new label,
or both, and then clicking the Change button. When you are satisfied with the definitions of
each value, click the OK button.
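Conceptually, the value labels you define are just a lookup table from codes to labels. Here is a minimal Python sketch of the Add, Change and Remove steps, using hypothetical codes and labels for a three-point agreement item (this is an analogy, not how SPSS stores labels internally).

```python
# Hypothetical value labels for an item coded 1-3.
value_labels: dict[int, str] = {}

value_labels[1] = "Disagree"     # Add
value_labels[2] = "Neutral"      # Add
value_labels[3] = "Agre"         # Add (with a typo)
value_labels[3] = "Agree"        # Change: fix the label for code 3
value_labels[4] = "Don't know"   # Add
del value_labels[4]              # Remove the pair for code 4


def label_for(code: int) -> str:
    """Return the label for `code`, or the code itself if it has no label."""
    return value_labels.get(code, str(code))


responses = [1, 3, 3, 2, 4]
print([label_for(code) for code in responses])
# → ['Disagree', 'Agree', 'Agree', 'Neutral', '4']
```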
A problem with some data files is that they contain string variables that should have been
numeric. A rule of thumb is that only nominal variables with many distinct values should be
string variables. In Variable View we see that fname, bday, age and q1 are string
variables. The screenshot below shows them in Data View.
First, fname holds respondents' first names. Is it nominal? Yes. Does it have many different
values? Yes. Conclusion: it's an appropriate string variable. No problem here.
Second, bday holds respondents' birthdays. Is it nominal? No. Conclusion: this should have been
a numeric variable. More precisely, it should be a date variable (which is also a numeric
variable). Solution: convert it. The tutorial "Convert String to Date Variable" shows how to do
so, but we'll skip that for now.
Third, age is a metric rather than a nominal variable and should therefore also be converted to
numeric. We'll cover this in "SPSS Convert String to Numeric Variable", but we'll skip it for
now.
Fourth, q1 appears to be an ordinal variable. It's not nominal and it doesn't have many distinct
values either so it's not a proper string variable. A labeled numeric variable (similar to q2 for
example) would be appropriate here. For now, we'll skip converting it.
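Although we skip the conversions here, the idea behind them is simple. Below is a minimal Python sketch (not the SPSS procedure) with made-up data: age strings are parsed into numbers, with anything unparseable treated as missing, and the ordinal q1 answers are mapped to numeric codes whose text is kept as value labels. The 1-3 coding for q1 is an assumption.

```python
from typing import Optional

# Made-up raw data stored as strings.
age_raw = ["23", "41", "", "35"]
q1_raw = ["Agree", "Disagree", "Neutral", "Agree"]


def to_number(text: str) -> Optional[float]:
    """Parse a numeric string; empty or non-numeric text becomes None (missing)."""
    try:
        return float(text)
    except ValueError:
        return None


age = [to_number(a) for a in age_raw]

# Assumed ordinal coding for q1; the reverse mapping serves as value labels.
q1_codes = {"Disagree": 1, "Neutral": 2, "Agree": 3}
q1 = [q1_codes.get(answer) for answer in q1_raw]
q1_labels = {code: text for text, code in q1_codes.items()}

print(age)  # → [23.0, 41.0, None, 35.0]
print(q1)   # → [3, 1, 2, 3]
```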
Cronbach's Alpha (reliability analysis)
Step 2:
(a) Select "Analyze"
(b) Select "Scale"
(c) Select "Reliability Analysis"
Figure 2 shows what your screen should now display.
Figure 2: Reliability Analysis Command
Step 3: A pop-up window will appear for reliability analysis. This window contains two boxes, one
on the left and one on the right. The left box contains the variables entered in SPSS (TV1, TV2,
etc.); the box on the right, labeled "Items," is where you move the variables for which
Cronbach's alpha is desired. Note that I have selected the three Task Value variables in Figure 3.
In Figure 4, note that I have moved the three Task Value variables to the box on the right,
because these are the three for which I want Cronbach's alpha. Once we run this analysis,
Cronbach's alpha will be calculated for the three Task Value variables (items) to provide
information about the internal consistency of those three items. If we also wanted to obtain
Cronbach's alpha for the Anxiety items, we would need to re-run the analysis with only the
Anxiety items appearing in the "Items:" box. Running Cronbach's alpha on both sets of items,
Task Value and Anxiety, together would be a mistake, because those six items are not designed
to measure the same construct and the resulting alpha would be uninterpretable.
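For intuition about what SPSS computes in this step, Cronbach's alpha for k items is alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score). The Python sketch below implements that textbook formula. The item scores in the usage example are made up (the Task Value data are not reproduced here), and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i] holds every respondent's score on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Each respondent's total (sum) score across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Made-up scores for two items answered by four respondents.
item_a = [1, 2, 3, 4]
item_b = [2, 1, 4, 3]
print(cronbach_alpha([item_a, item_b]))  # approximately 0.75
```

Mixing items from different constructs (such as Task Value and Anxiety) would still produce a number from this formula; the formula itself cannot tell you that the number is meaningless.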
Step 4: Select desired statistics for the analysis. Click on the "Statistics" button which can be
seen in Figure 4. Once that button is selected, a pop-up window labeled "Statistics" will appear.
This window is displayed in Figure 5 below. Note in Figure 5 that I have placed a check mark
next to "Scale" and "Scale if item deleted." You should also select those two. After selecting
those two options, click the "Continue" button to return to the "Reliability Analysis" pop-up window displayed above in Figure 4, then click the "OK" button to run the analysis.
(b) Corrected Item-Total Correlation: Figure 7 below highlights the column containing the
"Corrected Item-Total Correlation" for each of the items. This column displays the correlation
between a given Task Value item and the sum score of the other two items. For example, the
correlation between Task Value item 1 and the sum of items 2 and 3 (i.e., item 2 + item 3) is
r = .799. This means that there is a strong, positive correlation between the scores on the one
item (item 1) and the combined score of the other two (items 2 and 3). This is a way to assess
how well one item's score is internally consistent with the composite score from all the
remaining items. If this correlation is weak (de Vaus suggests that anything less than .30 is a
weak correlation for item-analysis purposes [de Vaus (2004), Surveys in Social Research,
Routledge, p. 184]), then that item should be removed and not used to form a composite score for
the variable in question.
For example, if the correlation between scores for item 1 and the combined scores of items 2 and
3 were low, say r = .15, then when we create the composite (overall) score for Task Value (the
step taken after reliability analysis), we would create the composite using only items 2 and 3
and would simply ignore scores from item 1, because it was not internally consistent with the
other items.
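The corrected item-total correlation is simply the Pearson correlation between one item and the sum of the remaining items. A minimal Python sketch with made-up scores follows; it flags items below the .30 guideline attributed to de Vaus above. The helper names are hypothetical.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(items: list[list[float]], i: int) -> float:
    """Correlate item i with the sum of all the *other* items."""
    rest = [sum(scores) - scores[i] for scores in zip(*items)]
    return pearson_r(items[i], rest)


def weak_items(items: list[list[float]], cutoff: float = 0.30) -> list[int]:
    """0-based indices of items whose corrected item-total r is below `cutoff`."""
    return [i for i in range(len(items)) if corrected_item_total(items, i) < cutoff]


# Made-up scores: three items answered by five respondents.
items = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [5, 1, 4, 2, 3]]
for i in range(len(items)):
    print(f"item {i + 1}: r = {corrected_item_total(items, i):.3f}")
print("weak items (0-based):", weak_items(items))
```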
(c) Cronbach's Alpha if Item Deleted: Figure 8 displays the Cronbach's alpha that would result if
a given item were deleted. Like the item-total correlation presented above in (b), this column of
information is valuable for determining which items from among a set of items contribute to the
total alpha. The value presented in this column represents the alpha value if the given item were
not included. For example, if Task Value item 1 were deleted, Cronbach's alpha would drop from
the overall total of .907 to .880. Since alpha would drop with the removal of TV1, this item
appears to be useful and to contribute to the overall reliability of Task Value. Item 3, however,
is less clear-cut. Cronbach's alpha would increase from .907 to .911 if item 3 were deleted
or not used for computing an overall Task Value score. So should this item be removed and
should the overall Task Value composite be created only from items 1 and 2? In this case the
answer is no, we should instead retain all three items. Why? Note first that alpha does not
increase by a large degree from deleting item 3. Second, note that item 3 still correlates very well
with the composite score from items 1 and 2 (the item-total correlation for item 3 is .759). Since
deletion of item 3 results in little change, and since item 3 correlates well with the composite of
items 1 and 2, there is no statistical reason to drop item 3.
Figure 8: Statistical Results for Reliability Analysis (Cronbach's Alpha if item Deleted)
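"Alpha if item deleted" just recomputes alpha with each item left out in turn. A minimal Python sketch with made-up scores follows (the helper names are hypothetical). As the text stresses, a small increase on its own is not a reason to drop an item; weigh it together with the item-total correlation.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i] holds every respondent's score on item i."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items: list[list[float]]) -> list[float]:
    """Alpha recomputed with each item left out in turn (needs 3+ items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Made-up scores: three items answered by four respondents.
items = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 2, 3, 4]]
overall = cronbach_alpha(items)
for number, alpha in enumerate(alpha_if_item_deleted(items), start=1):
    change = "raises" if alpha > overall else "lowers"
    print(f"item {number}: deleting it {change} alpha from {overall:.3f} to {alpha:.3f}")
```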
Once the reliability analysis is done and you have decided which items (if any) to delete, the
second step is to develop the scale (composite) score.
Compute Variables
Click Transform > Compute Variable. The Compute Variable window will open, where you will
specify how to calculate your new variable.
A Target Variable: The name of the new variable that will be created during the computation.
Simply type a name for the new variable in the text field. Once a variable is entered here, you
can click on Type & Label to assign a variable type and give it a label. The default type for
new variables is numeric.
B The left column lists all of the variables in your dataset. You can use this menu to add
variables into a computation: either double-click on a variable to add it to the Numeric
Expression field, or select the variable(s) that will be used in your computation and click the
arrow to move them to the Numeric Expression text field (C).
C Numeric Expression: Specify how to compute the new variable by writing a numeric
expression.
D The center of the window includes a collection of arithmetic operators, Boolean operators,
and numeric characters, which you can use to specify how your new variable will be calculated.
There are many kinds of calculations you can specify by selecting a variable (or multiple
variables) from the left column, moving them to the center text field, and using the blue buttons
to specify values (e.g., 1) and operations (e.g., +, *, /).
E If: The If option allows you to specify the conditions under which your computation will be
applied.
F Function group: You can also use the built-in functions in the Function group list on the
right-hand side of the window. The function group contains many useful, common functions that
may be used for calculating values for new variables (e.g., mean, logarithm). To find a specific
function, simply click one of the function groups in the Function Group list. You will now see a
list of functions that belong to that function group in the Functions and Special Variables area.
If you click on a specific function, a description of that function will appear in the text field to
the left.
Click If (indicated by letter E in the above image) to open the Compute Variable: If Cases
window.
2. In the Target Variable field, type a name for the new variable that will be computed.
Let's call our new variable AverageScore.
3. Highlight each variable (English, Reading, Math, and Writing) in the list on the left
and click the arrow to move it to the Numeric
Expression field. (Alternatively, you can double-click on the variable name to move it
to the Numeric Expression field.) Make sure you press the spacebar to create a space
between each variable.
4. Now your four variables will appear in the Numeric Expression field. Move your cursor
between each set of variables and click the + sign to add the symbol for addition to the
numeric expression. Now your expression should appear as English + Reading + Math +
Writing.
5. Now insert parentheses around the expression so that it appears as (English + Reading +
Math + Writing).
6. At the end of the expression, add the / sign and the number 4. Now your expression
should appear as (English + Reading + Math + Writing) / 4.
7. The final expression indicates that the new variable, AverageScore, will be calculated as
the average of the four test scores.
8. Click OK to complete the computation and apply the changes to the data.
9. Finally, let's make sure that a new variable called AverageScore was successfully created.
o We can find the new variable in the last column in Data View or in the last row of
Variable View. If you do not see the new variable, the computation was
unsuccessful.
o We can check the syntax that was executed by looking at the log in the Output
Viewer window. After running Compute Variable, the following syntax should have
appeared in the output window:
    COMPUTE AverageScore=(English + Reading + Math + Writing) / 4.
    EXECUTE.
If there was an error in how the computation was specified, the log in the Output
Viewer will often show an error message.
o It is also useful to check whether the computation you specified was applied
correctly to the data. You can spot-check the computation by viewing your data in
the Data View tab: manually calculate the averages for a few cases in your dataset
and compare them with the values of AverageScore.
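As an outside-SPSS sketch of the same computation, the Python function below averages the four scores for one case, and the loop mimics the spot-check step. One behavior worth knowing: in SPSS, an arithmetic expression such as (English + Reading + Math + Writing) / 4 returns system-missing if any of the four scores is missing, whereas the MEAN function averages whichever values are valid. The sketch mirrors the arithmetic version by returning None; the function name and case data are made up.

```python
from typing import Optional


def average_score(english, reading, math, writing) -> Optional[float]:
    """(English + Reading + Math + Writing) / 4, or None if any score is missing."""
    scores = (english, reading, math, writing)
    if any(score is None for score in scores):
        return None  # mirrors SPSS returning system-missing
    return sum(scores) / 4


# Made-up cases: (English, Reading, Math, Writing)
cases = [(80, 90, 70, 60), (55, 65, 75, 85), (70, None, 80, 90)]
for case in cases:
    print(case, "->", average_score(*case))
```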
Recoding New Variables (Reverse Coding)
RECODE INTO DIFFERENT VARIABLES
Recoding into a different variable transforms an original variable into a new variable. That is, the
changes do not overwrite the original variable; they are instead applied to a copy of the original
variable under a new name.
To recode into different variables, click Transform > Recode into Different Variables.
The left column lists all of the variables in your dataset. Select the variable you wish to recode
by clicking it. Click the arrow in the center to move the selected variable to the center text box
(A).
A Input Variable -> Output Variable: The center text box lists the variable(s) you have
selected to recode, as well as the name your new variable(s) will have after the recode. You will
define the new name in (B).
B Output Variable: Define the name and label for your recoded variable(s) by typing them in
the text fields. Once you are finished, click Change. Now the center text box (A) will display
both the name of the original variable and the name of the new variable (e.g., Height -->
Height_categ).
C Old and New Values: Click Old and New Values to specify how you wish to recode
the values for the selected variable.
D If: The If option allows you to specify the conditions under which your recode will be
applied. (We discuss the If option in more detail later in this tutorial.)
1 Old Value: Specify the type of value you wish to recode (e.g., a specific value, missing data,
or a range of values) and the specific value to be recoded (e.g., a value of 1 or a range of 1
through 5).
When recoding variables, always handle the missing values first! The most common recoding
errors come from not explicitly accounting for missing values, so that they end up lumped in
with the valid values.
Value: Enter a specific numeric code representing an existing category.
System-missing: Applies to any system-missing values (.)
System- or user-missing: Applies to any system-missing values (.) or special missing
value codes defined by the user in the Variable View window
Range: For use with ordered categories or continuous measurements. Enter the lower and
upper boundaries that should be coded. The recoded category will include both
endpoints, so data values that are exactly equal to the boundaries will be included in that
category.
Range, LOWEST through value: For use with ordered categories or continuous
measurements. Recode all values less than or equal to some number.
Range, value through HIGHEST: For use with ordered categories or continuous
measurements. Recode all values greater than or equal to some number.
All other values: Applies to any value not explicitly accounted for by the previous
recoding rules. If using this setting, it should be applied last.
2 New Value: Specify the new value for your variable (i.e., a specific numeric code such as 2,
system-missing, or copy old values).
3 Old -> New: Once you have selected the old and new values for your selected variable in (1)
and (2), click Add in area (3), Old --> New. The recode that you have specified now appears in the
text field. If you need to change one of the recodes that you have added to the Old --> New area,
simply click on the one you wish to change and make changes in (1) and (2) as
necessary.
You will need to repeat these steps for each value that you wish to recode. Once you have
specified all the transformations that you wish to make for the selected variable, click the
Continue button.
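The logic of recoding into a different variable, with missing values handled first, can be sketched in Python. This sketch reverse-codes a hypothetical 1-5 item (1 becomes 5, 2 becomes 4, and so on); the user-missing code 9 and the choice to send all other values to missing are assumptions for illustration. The original list is left untouched, just as the original variable is.

```python
from typing import Optional

USER_MISSING = {9}  # hypothetical user-missing code (e.g., "no answer")


def reverse_code(value: Optional[int], low: int = 1, high: int = 5) -> Optional[int]:
    """Recode one value of a low-high scale so that low <-> high (1 -> 5, 2 -> 4, ...)."""
    # Rule 1: handle system-missing (None) and user-missing codes first.
    if value is None or value in USER_MISSING:
        return None
    # Rule 2: the range low through high is reversed.
    if low <= value <= high:
        return low + high - value
    # Rule 3 ("All other values"): anything else becomes missing (an assumption).
    return None


original = [1, 2, 5, 9, None, 3, 7]
reversed_item = [reverse_code(v) for v in original]  # a new variable
print(original)       # unchanged, like the source variable in SPSS
print(reversed_item)  # → [5, 4, 1, None, None, 3, None]
```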
Descriptive Statistics
See the uploaded PDF file.
Inferential Statistics
Correlational Analysis
Click Analyze > Correlate > Bivariate... on the menu system as shown below:
Transfer the variables Height and Jump_Dist into the Variables: box by dragging and dropping
them or by clicking the arrow button. You will end up with a screen similar to the one below:
Make sure that the Pearson tickbox is checked under the -Correlation Coefficients- area
(although it is selected by default in SPSS).
Click the OK button to run the analysis.
Correlations Box
Take a look at the first box in your output file called Correlations. You will see your variable
names in two rows. In this example, you can see the variable name water in the first row and
the variable name skin in the second row. You will also see your two variable names in two
columns. See the variable names water and skin in the columns on the right? You will see four
boxes on the right hand side. These boxes will all contain numbers that represent variable
crossings. For example, the top box on the right represents the crossing between the water
variable and the skin variable. The bottom box on the left also happens to represent this
crossing. These are the two boxes that we are interested in. They will have the same information
so we really only need to read from one. In these boxes, you will see a value for Pearson's r, a
Sig. (2-tailed) value and a number (N) value.
Pearson's r
You can find the Pearson's r statistic at the top of each box. The Pearson's r for the
correlation between the water and skin variables in our example is 0.985.
A positive value means that as one variable increases in value, the second variable also tends
to increase; this is called a positive correlation. A negative value means that as one variable
increases in value, the second variable decreases in value; this is called a negative
correlation. Values close to 1 or -1 indicate a strong relationship. In our example, our
Pearson's r value of 0.985 was positive. But what if SPSS generated a Pearson's r value of
-0.985? If SPSS generated a negative Pearson's r value, we could conclude that when the amount
of water increases (our first variable), the participant skin elasticity rating (our second
variable) decreases.
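Sig. (2-tailed) value
You can find this value in the Correlations box. This value tells you whether there is a
statistically significant correlation between your two variables: if it is less than .05 (the
conventional threshold), the correlation is statistically significant. In our example, our Sig.
(2-tailed) value is 0.002, so the correlation is statistically significant.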
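To see where these numbers come from, here is a minimal Python sketch of Pearson's r and of the t statistic commonly used to test it, t = r × sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom; the two-tailed Sig. value is the probability of a t at least this extreme in either direction. The data below are made up (not the water/skin data), the helper names are hypothetical, and turning t into a p-value needs a t-distribution function, which Python's standard library does not provide.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing whether the population r is 0 (|r| < 1)."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Made-up data: amount of water and a skin elasticity rating.
water = [1, 2, 3, 4, 5, 6]
skin = [2, 3, 5, 4, 6, 7]
r = pearson_r(water, skin)
print(f"r = {r:.3f}, t({len(water) - 2}) = {t_for_r(r, len(water)):.3f}")
```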
Relationship strength
Try to imagine a line that connects the dots in your scatterplot. Is this an easy or difficult
task? This task can help you determine the strength of the relationship between your two
variables. If your variables have a strong relationship, it will be easy for you to imagine
a line connecting all of the dots. For example, in our example scatterplots, the dots seem
to go together to form a straight line. However, some scatterplots do not look like this.
With some scatterplots, the dots are scattered about so that it is very hard to imagine a
line connecting them. The dots are not densely positioned in one place. Instead, they are
all over the place. When this is the case, your variables may not have a strong
relationship.
Relationship Direction
You can use your scatterplot to understand the direction of your relationship. Your
scatterplot can tell you if you have a positive, negative or zero correlation.
Positive correlation in a scatterplot
If the line that you imagine in your graph slopes upward from left to right, you can conclude
that you have a positive correlation between your variables. Increases in one variable are
correlated with increases in your other variable. Similarly, decreases in one variable are
correlated with decreases in your other variable.
Click Analyze > Regression > Linear... on the main menu, as shown below:
You will be presented with the Linear Regression dialogue box below:
Transfer the dependent variable, VO2max, into the Dependent: box and the independent
variables, age, weight, heart_rate and gender, into the Independent(s): box, using the
buttons, as shown below (all other boxes can be ignored):
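Keep the Method: option at its default, Enter. If Enter is not selected, change Method: back to
Enter; the Enter method is the name given by SPSS Statistics to standard multiple regression.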
Click the Statistics button. You will be presented with the Linear Regression:
Statistics dialogue box, as shown below:
In addition to the options that are selected by default, select Confidence intervals in the
Regression Coefficients area leaving the Level (%): option at "95". You will end up with the
following screen:
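Click the Continue button. You will be returned to the Linear Regression dialogue box.
Click the OK button to generate the output.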
The "R" column represents the value of R, the multiple correlation coefficient. R can be
considered to be one measure of the quality of the prediction of the dependent variable; in this
case, VO2max. A value of 0.760, in this example, indicates a good level of prediction. The "R
Square" column represents the R² value (also called the coefficient of determination), which is
the proportion of variance in the dependent variable that can be explained by the independent
variables (technically, it is the proportion of variation accounted for by the regression model
above and beyond the mean model). You can see from our value of 0.577 that our independent
variables explain 57.7% of the variability of our dependent variable, VO2max. However, you
also need to be able to interpret "Adjusted R Square" (adj. R²) to accurately report your data.
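R², and the Adjusted R Square mentioned above, can be computed by hand. Below is a minimal Python sketch using the textbook definitions R² = 1 - SS_residual / SS_total and adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of cases and k the number of predictors. The helper names are hypothetical.

```python
def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Proportion of variance in `observed` explained by `predicted`."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_residual / ss_total


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjust R² for the number of predictors k given n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


print(round(adjusted_r_squared(0.577, n=100, k=4), 3))  # → 0.559
```

Plugging in the values from this example (R² = .577, k = 4 predictors, and n = 100, since the residual degrees of freedom are 95 = n - k - 1) gives an adjusted R² of about .559. That figure is our own arithmetic, not a value read from the SPSS output.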
Statistical significance
The F-ratio in the ANOVA table (see below) tests whether the overall regression model is a good
fit for the data. The table shows that the independent variables statistically significantly predict
the dependent variable, F(4, 95) = 32.393, p < .0005 (i.e., the regression model is a good fit of
the data).
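As a quick consistency check, the F-ratio can be recovered from R² using the standard identity for a regression with k predictors and n cases. With R² = .577, k = 4 and n - k - 1 = 95:

```latex
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}
  = \frac{0.577 / 4}{0.423 / 95}
  \approx \frac{0.14425}{0.004453}
  \approx 32.4
```

This matches the reported F(4, 95) = 32.393 up to the rounding of R² to three decimals.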
Unstandardized coefficients indicate how much the dependent variable varies with an
independent variable when all other independent variables are held constant. Consider the effect
of age in this example. The unstandardized coefficient, B1, for age is equal to -0.165
(see Coefficients table). This means that for each one year increase in age, there is a decrease in
VO2max of 0.165 ml/min/kg.
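The same interpretation scales to larger changes. For example, holding weight, heart rate and gender constant, a 10-year difference in age (a value chosen only for illustration) corresponds to a predicted difference in VO2max of:

```latex
\Delta \widehat{\mathrm{VO_2max}} = B_1 \times \Delta\text{age} = -0.165 \times 10 = -1.65 \ \text{ml/min/kg}
```

That is, a predicted decrease of 1.65 ml/min/kg.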
Statistical significance of the independent variables
You can test for the statistical significance of each of the independent variables. This tests
whether the unstandardized (or standardized) coefficients are equal to 0 (zero) in the population.
If p < .05, you can conclude that the coefficients are statistically significantly different from 0
(zero). The t-value and corresponding p-value are located in the "t" and "Sig." columns,
respectively, as highlighted below:
General
A multiple regression was run to predict VO2max from gender, age, weight and heart rate. These
variables statistically significantly predicted VO2max, F(4, 95) = 32.393, p < .0005, R² = .577.
All four variables added statistically significantly to the prediction, p < .05.
Comparing Means
Independent T-Test
The independent-samples t-test (or independent t-test, for short) compares the means between
two unrelated groups on the same continuous, dependent variable. For example, you could use an
independent t-test to understand whether first year graduate salaries differed based on gender
(i.e., your dependent variable would be "first year graduate salaries" and your independent
variable would be "gender", which has two groups: "male" and "female"). Alternatively, you could
use an independent t-test to understand whether there is a difference in test anxiety based on
educational level (i.e., your dependent variable would be "test anxiety" and your independent
variable would be "educational level", which has two groups: "undergraduates" and
"postgraduates").
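Behind the scenes, the test compares the difference between the two group means with its standard error. Below is a minimal Python sketch of the two versions SPSS reports in its output table: "Equal variances assumed" (pooled Student's t) and "Equal variances not assumed" (Welch's t). The cholesterol-style numbers are made up and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1: list[float], group2: list[float]) -> tuple[float, int]:
    """Student's t with pooled variance ("Equal variances assumed")."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(group1: list[float], group2: list[float]) -> tuple[float, float]:
    """Welch's t with Welch-Satterthwaite df ("Equal variances not assumed")."""
    v1 = variance(group1) / len(group1)
    v2 = variance(group2) / len(group2)
    t = (mean(group1) - mean(group2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(group1) - 1) + v2 ** 2 / (len(group2) - 1))
    return t, df


# Made-up cholesterol values for group 1 (diet) and group 2 (exercise).
diet = [5.8, 6.1, 5.5, 6.0, 5.9]
exercise = [5.2, 5.6, 5.0, 5.4, 5.3]
print(pooled_t(diet, exercise))
print(welch_t(diet, exercise))
```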
Click Analyze > Compare Means > Independent-Samples T Test... on the top menu,
as shown below:
Transfer the dependent variable, Cholesterol, into the Test Variable(s): box, and transfer
the independent variable, Treatment, into the Grouping Variable: box, by highlighting the
relevant variables and pressing the arrow buttons.
Click the Define Groups button to define the two groups.
Enter 1 into the Group 1: box and enter 2 into the Group 2: box. Remember that we
labelled the Diet Treatment group as 1 and the Exercise Treatment group as 2.
Click the Continue button.
If you need to change the confidence level limits or change how to exclude cases, click
the Options button.
Click the Continue button. You will be returned to the Independent-Samples T
Test dialogue box.
Click the OK button.
Based on the results above, we could report the results of the study as follows
ANOVA Table
This is the table that shows the output of the ANOVA analysis and whether we have a statistically
significant difference between our group means. We can see that the significance level is 0.021
(p = .021), which is below 0.05, and therefore there is a statistically significant difference in the
mean length of time to complete the spreadsheet problem between the different courses taken.
This is great to know, but we do not know which of the specific groups differed.
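The F statistic in the ANOVA table compares the variability between group means with the variability within groups: F = (SS_between / (k - 1)) / (SS_within / (N - k)). In the write-up below, F(2, 27) implies k = 3 groups and N = 30 participants. A minimal Python sketch with made-up completion times (the function name is hypothetical):

```python
from statistics import mean


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Between-groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up completion times (minutes) for three hypothetical course levels.
beginner = [27, 25, 30, 28]
intermediate = [24, 26, 23, 25]
advanced = [20, 22, 21, 19]
print(one_way_anova_f([beginner, intermediate, advanced]))
```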
General
There was a statistically significant difference between groups as determined by one-way
ANOVA (F(2, 27) = 4.467, p = .021). The group means suggest that students at the advanced level
took less time to complete the spreadsheet problem than those at the other levels, although a
post hoc test is needed to confirm which specific groups differed.