Assumptions
Each of the tests in this section has a number of
assumptions underlying its use. Some general assumptions apply to all of the parametric techniques discussed here (e.g. t-tests, analysis of variance), and additional assumptions are associated with specific techniques. The general assumptions are presented in this section; the more specific assumptions are presented in the following topics, as appropriate.
Level of measurement
Each of these approaches assumes that the
dependent variable is measured at the interval or ratio level, that is, using a continuous scale rather than discrete categories. Wherever possible when designing your study, try to make use of continuous, rather than categorical, measures of your dependent variable. This gives you a wider range of possible techniques to use when analysing your data.
Random sampling
The techniques covered in this lecture assume that
the scores are obtained using a random sample from the population. This is often not the case in real-life research.
Independence of observations
The observations that make up your data must be
independent of one another: each observation or measurement must not be influenced by any other observation or measurement. Violation of this assumption is very serious. A number of research situations may violate the assumption of independence. For example, when studying the performance of students working in pairs or small groups, the behaviour of each member of the group influences all other group members, so the scores are not independent. Any study in which measurements are collected in a group setting, or in which subjects are involved in some form of interaction with one another, should be considered suspect. In designing your study you should try to ensure that all observations are independent. If you suspect some violation of this assumption, Stevens (1996, p. 241) recommends that you set a more stringent alpha value (e.g. p<.01).
Normal distribution
What are the characteristics of a normal distribution curve?
It is assumed that the populations from which the samples are taken are normally distributed. In a lot of research (particularly in the social sciences), scores on the dependent variable are not nicely normally distributed. Fortunately, most of the techniques are reasonably robust, or tolerant, of violations of this assumption. With large enough sample sizes (e.g. 30+), violation of this assumption should not cause any major problems. The distribution of scores for each of your groups can be checked using histograms obtained as part of your preliminary descriptive analyses.
Homogeneity of variance
Techniques in this section make the assumption that
samples are obtained from populations of equal variances. To test this, SPSS performs Levene's test for equality of variances as part of the t-test and analysis of variance procedures. If you obtain a significance value of less than .05, this suggests that the variances for the two groups are not equal, and you have therefore violated the assumption of homogeneity of variance. Analysis of variance is reasonably robust to violations of this assumption, provided the size of your groups is reasonably similar (e.g. largest/smallest = 1.5).
Type 1 error, Type 2 error and power
The techniques described in this section are used to test hypotheses. With these types of analyses there is always the possibility of reaching the wrong conclusion. There are two different errors that we can make:
1. A Type 1 error occurs when we think there is a difference between our groups when there really isn't; in other words, we reject the null hypothesis when it is, in fact, true. We can minimise this possibility by selecting an appropriate alpha level.
2. A Type 2 error occurs when we fail to reject the null hypothesis when it is, in fact, false (i.e. believing that the groups do not differ, when in fact they do).
Unfortunately, these two errors are inversely related: as we try to control for a Type 1 error, we increase the likelihood of committing a Type 2 error. Ideally, we would like the tests we use to correctly identify whether there is in fact a difference between our groups. This is called the power of a test. Tests vary in their power (e.g. parametric tests such as t-tests and analysis of variance are more powerful than non-parametric tests). Other factors that can influence the power of a test are:
1. Sample size: When the sample size is large (e.g. 100 or more subjects), power is not usually an issue. However, when your group sizes are small (e.g. n=20), you need to be aware that a non-significant result may be due to insufficient power. When small group sizes are involved it may be necessary to adjust the alpha level to compensate. Tables are available that will tell you how large your sample needs to be to achieve sufficient power, given the effect size you wish to detect. The higher the power of a test, the more confident you can be that a non-significant result reflects no real difference between the groups.
2. Effect size: With large samples, even very small differences between groups can become statistically significant. This does not mean that the difference has any practical or theoretical significance. One way to assess the importance of your finding is to calculate the effect size (also known as strength of association). Effect-size statistics indicate the relative magnitude of the differences between means; in other words, they describe the amount of the total variance in the dependent variable that is predictable from knowledge of the levels of the independent variable. Commonly used effect-size statistics are eta squared, Cohen's d and Cohen's f (see the formula). Eta squared represents the proportion of variance of the dependent variable that is explained by the independent variable. Values for eta squared can range from 0 to 1. To interpret the strength of eta squared values, the following guidelines can be used: .01 = small effect; .06 = moderate effect; .14 = large effect. A number of criticisms have been levelled at eta squared.
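The two most common effect-size statistics mentioned above can be computed directly from raw scores. The sketch below uses two small invented groups purely for illustration: eta squared as the proportion of total variance explained by group membership, and Cohen's d as the mean difference in pooled-standard-deviation units.

```python
# Illustrative sketch (invented scores): computing eta squared and
# Cohen's d by hand for two groups.
import math

group1 = [4, 5, 6, 7, 8]
group2 = [6, 7, 8, 9, 10]

def mean(xs):
    return sum(xs) / len(xs)

m1, m2 = mean(group1), mean(group2)
grand = mean(group1 + group2)

# Eta squared: between-groups sum of squares / total sum of squares
ss_between = len(group1) * (m1 - grand) ** 2 + len(group2) * (m2 - grand) ** 2
ss_total = sum((x - grand) ** 2 for x in group1 + group2)
eta_sq = ss_between / ss_total

# Cohen's d: mean difference divided by the pooled standard deviation
def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

pooled_sd = math.sqrt(((len(group1) - 1) * sample_var(group1) +
                       (len(group2) - 1) * sample_var(group2)) /
                      (len(group1) + len(group2) - 2))
d = (m2 - m1) / pooled_sd

print(round(eta_sq, 2), round(d, 2))  # 0.33 1.26
```

By the guidelines above, an eta squared of .33 would count as a large effect; note that eta squared and Cohen's d are on different scales and are interpreted against different benchmarks.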
Missing data
It is important that you inspect your data file for
missing data. Run Descriptives and find out what percentage of values is missing for each of your variables. If you find a variable with a lot of unexpected missing data you need to ask yourself why. You should also consider whether your missing values are happening randomly, or whether there is some systematic pattern (e.g. lots of women failing to answer the question about their age).
Many SPSS statistical procedures offer you choices for how SPSS should deal with missing data. The Exclude cases listwise option includes a case in the analysis only if it has full data on all of the variables listed in your variables box. A case will be totally excluded from all the analyses if it is missing even one piece of information. This can severely, and unnecessarily, limit your sample size. The Exclude cases pairwise option (sometimes shown as Exclude cases analysis by analysis), however, excludes the case (person) only from analyses for which it is missing the required data. The case will still be included in any of the analyses for which it has the necessary information. The Replace with mean option, which is available in some SPSS statistical procedures, calculates the mean value for the variable and gives every missing case this value. It is generally safest to use pairwise exclusion of missing data, unless you have a pressing reason to do otherwise. The only situation where you might need to use listwise exclusion is when you want to refer only to a subset of cases that provided a full set of results.
Hypothesis testing
What is a hypothesis test
A hypothesis test uses sample data to test a hypothesis about the population from which the sample was taken.
When to use a hypothesis test
Use a hypothesis test to make inferences about one or more populations when sample data are available.
Hypothesis testing can help answer questions such as: Is students' achievement in science meeting or exceeding a target level? Is the performance of teacher A better than the performance of teacher B?
1. One-sample t-test
What is a one-sample t-test
A one-sample t-test helps determine whether μ (the population mean) is equal to a hypothesized value (the test mean). The test uses the standard deviation of the sample to estimate σ (the population standard deviation). If the difference between the sample mean and the test mean is large relative to the variability of the sample mean, then μ is unlikely to be equal to the test mean.
When to use a one-sample t-test
Use a one-sample t-test when continuous data are available from a single random sample. The test assumes the population is normally distributed. However, it is fairly robust to violations of this assumption for sample sizes equal to or greater than 30, provided the observations are collected randomly and the data are continuous, unimodal and reasonably symmetric.
Z-test or t-test?
The shortcoming of the z-test is that it requires the value of the population standard deviation (σ) to compute the standard error, but σ is rarely known. The t-test instead estimates the standard error from the sample standard deviation (s).
The one-sample t statistic:

t = (x̄ − μ) / (s / √n)

where x̄ is the sample mean, μ is the test mean, s is the sample standard deviation and n is the sample size.
If the value of t exceeds some threshold or critical value, t critical, then an effect is detected (i.e. the null hypothesis of no difference is rejected).
Degrees of freedom (d.f.) are the number of values in the final calculation of a statistic that are free to vary. For the one-sample t-test, degrees of freedom are computed as one less than the sample size (the denominator of the standard deviation): df = n − 1
Decision rule

Compute t = (x̄ − μ) / (s / √n). Then:
If |t| ≥ t critical value, reject the null hypothesis.
If |t| < t critical value, fail to reject the null hypothesis.
Example: You want to test whether a given therapy works to reduce test anxiety in a sample of college students. A standard measure of test anxiety is known to produce a mean of μ = 20. In the sample you draw of 81 students, the mean = 18 with s = 9. Use an alpha level of .05.
Write hypotheses
H0: The average test anxiety in the sample of college students will not be statistically significantly different from 20. H0: μ = 20
H1: The average test anxiety in the sample of college students will be statistically significantly lower than 20. H1: μ < 20
Set Criterion
one-tailed α = .05
df = n − 1 = 81 − 1 = 80
t critical value = −1.671
The obtained t (−2.0) exceeds the critical value of −1.671 in the negative direction. Reject the null hypothesis and conclude that average test anxiety in the sample of college students is statistically significantly lower than 20, t = −2.0, p < .05.
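The worked example can be checked by computing the test from its summary statistics (n = 81, mean = 18, s = 9, test mean = 20):

```python
# Check of the worked example above: n = 81, sample mean = 18,
# s = 9, test mean = 20, one-tailed alpha = .05.
import math
from scipy import stats

n, xbar, s, mu0 = 81, 18.0, 9.0, 20.0

se = s / math.sqrt(n)            # standard error = 9/9 = 1.0
t = (xbar - mu0) / se            # t = -2.0
df = n - 1                       # 80

# One-tailed critical value at alpha = .05 (lower tail)
t_crit = stats.t.ppf(0.05, df)   # about -1.664 for df = 80

print(t, t < t_crit)             # t more extreme than critical: reject H0
```

Note that the slide's critical value of −1.671 comes from a printed table entry for df = 60 (using the next lower tabled value); the exact value for df = 80 is about −1.664. Either way the conclusion is the same.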
Procedures
Select Analyze/Compare Means/One-Sample t-test
Notice, SPSS allows us to specify what Confidence Interval to calculate. Leave it at 95%. Click Continue and then Ok. The output follows.
Notice that descriptive statistics are automatically calculated in the one-sample t-test. Does our t-value agree with the one in the textbook? Look at the confidence interval. Notice that it is not the confidence interval of the mean, but the confidence interval for the difference between the sample mean and the test value we specified.
Independent-samples T-test
An independent-samples t-test is used when you want to compare the mean score, on some continuous variable, for two different groups of subjects.
Is there a significant difference in the mean self-esteem scores for males and females? What you need: Two variables:
one categorical, independent variable (e.g. males/females); and one continuous, dependent variable (e.g. self-esteem scores).
Assumptions:
The assumptions for this test are those described earlier: continuous scale of measurement, normality, independence of observations and homogeneity of variance.
What it does:
An independent-samples t-test will tell you whether there is a statistically significant difference in the mean scores for the two groups (that is, whether males and females differ significantly in terms of their self-esteem levels). In statistical terms, you are testing the probability that the two sets of scores (for males and females) came from the same population.
Non-parametric alternative:
Mann-Whitney Test
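The SPSS workflow described above (Levene's test first, then the appropriate t-value) can be mirrored in Python. The two groups of scores below are invented for illustration:

```python
# Sketch: independent-samples t-test with Levene's test (invented data),
# mirroring the SPSS decision between "equal variances assumed" and
# "equal variances not assumed".
from scipy import stats

males   = [32, 35, 38, 30, 33, 36, 34, 31]
females = [34, 37, 33, 36, 35, 38, 32, 39]

# Levene's test for equality of variances
lev_stat, lev_p = stats.levene(males, females)

# If Levene's p > .05, use the pooled (equal-variance) t-test;
# otherwise use Welch's t-test (equal_var=False).
equal_var = lev_p > .05
t_stat, p_val = stats.ttest_ind(males, females, equal_var=equal_var)

print(round(t_stat, 2), round(p_val, 3))
```

With these invented scores the variances are similar, so the pooled test is used, corresponding to the "Equal variances assumed" line of the SPSS output.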
Step 1: Checking the information about the groups

In the Group Statistics box SPSS gives you the mean and standard deviation for each of your groups (in this case: male/female). It also gives you the number of people in each group (N). Always check these values first. Do they seem right? Are the N values for males and females correct? Or are there a lot of missing data? If so, find out why. Perhaps you have entered the wrong code for males and females (0 and 1, rather than 1 and 2). Check with your codebook.
Step 2: Checking assumptions
The first section of the t-test output gives you the results of Levene's test for equality of variances, which tests whether the variance of scores for the two groups (males and females) is the same. The outcome of this test determines which of the t-values that SPSS provides is the correct one for you to use. If your Sig. value is larger than .05 (e.g. .07, .10), you should use the first line in the table, which refers to Equal variances assumed. If the significance level of Levene's test is p=.05 or less (e.g. .01, .001), the variances for the two groups (males/females) are not the same, and your data therefore violate the assumption of equal variance. Don't panic: SPSS provides you with an alternative t-value. You should use the information in the second line of the t-test table, which refers to Equal variances not assumed.
Step 3: Assessing differences between the groups

If the value in the Sig. (2-tailed) column is equal to or less than .05 (e.g. .03, .01, .001), there is a significant difference in the mean scores on your dependent variable for the two groups. If the value is above .05 (e.g. .06, .10), there is no significant difference between the two groups. Having established that there is a significant difference, the next step is to find out which set of scores is higher.
Step 4: Calculating the effect size

Effect-size statistics provide an indication of the magnitude of the differences between your groups (not just whether the difference could have occurred by chance). Eta squared is the most commonly used. Eta squared can range from 0 to 1 and represents the proportion of variance in the dependent variable that is explained by the independent (group) variable. SPSS does not provide eta squared values for t-tests.
In this example the effect size of .006 is very small. Expressed as a percentage (multiply your eta-squared value by 100), only 0.6 per cent of the variance in self-esteem is explained by sex.
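Since SPSS does not report eta squared for t-tests, it can be computed by hand from the t-value and degrees of freedom: eta² = t² / (t² + df). Using the values from this example (t = 1.62, df = 434), this reproduces the .006 quoted above:

```python
# Eta squared for a t-test: eta^2 = t^2 / (t^2 + df),
# using the values reported in this example.
t, df = 1.62, 434
eta_sq = t**2 / (t**2 + df)
print(round(eta_sq, 3))  # 0.006, i.e. 0.6% of variance explained
```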
The results of the analysis could be presented as follows:

An independent-samples t-test was conducted to compare the self-esteem scores for males and females. There was no significant difference in scores for males (M=34.02, SD=4.91) and females [M=33.17, SD=5.71; t(434)=1.62, p=.11]. The magnitude of the differences in the means was very small (eta squared=.006).
Paired-samples T-test
A paired-samples t-test (also referred to as repeated
measures) is used when you have only one group of people (or companies, or machines etc.) and you collect data from them on two different occasions, or under two different conditions. Pre-test/post-test experimental designs are an example of the type of situation where this technique is appropriate. It can also be used when you measure the same person's response to two different questions. In this case, both dimensions should be rated on the same scale (e.g. from 1=not at all important to 5=very important).
Example of research question: Is there a significant change in participants' fear of statistics scores following participation in an intervention designed to increase students' confidence in their ability to successfully complete a statistics course? Does the intervention have an impact on participants' fear of statistics scores?
What you need:
One set of subjects (or matched pairs). Each person (or pair) must provide both sets of scores. Two variables: one categorical independent variable (in this case it is Time: with two different levels Time 1, Time 2); and one continuous, dependent variable (e.g. Fear of Statistics Test scores) measured on two different occasions, or under different conditions.
What it does:
A paired-samples t-test will tell you whether there is a statistically significant difference in the mean scores for Time 1 and Time 2. Assumptions: The basic assumptions for t-tests. Additional assumption: The difference between the two scores obtained for each subject should be normally distributed. With sample sizes of 30+, violation of this assumption is unlikely to cause any serious problems. Non-parametric alternative: Wilcoxon Signed Rank Test.
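A minimal sketch of the paired design described above, with invented fear-of-statistics scores for the same eight subjects at Time 1 and Time 2:

```python
# Sketch: paired-samples t-test on invented before/after scores
# for the same subjects.
from scipy import stats

time1 = [40, 42, 38, 45, 41, 39, 44, 43]  # before the intervention
time2 = [36, 40, 35, 40, 38, 37, 41, 39]  # after the intervention

# The test is equivalent to a one-sample t-test on the
# within-subject differences (time1[i] - time2[i]) against zero.
t_stat, p_val = stats.ttest_rel(time1, time2)
print(round(t_stat, 1), p_val < .05)
```

Pairing matters: running `ttest_ind` on the same data would ignore the within-subject correlation and usually give a much weaker result.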
In the table labelled Paired Samples Test you need to look in the final column, labelled Sig. (2-tailed); this is your probability value. If this value is less than .05 (e.g. .04, .01, .001), you can conclude that there is a significant difference between your two scores.
Step 2: Comparing mean values
Having established that there is a significant difference, the next step is to find out which set of scores is higher (Time 1 or Time 2). To do this, look in the first printout box, labelled Paired Samples Statistics. This box gives you the Mean scores for each of the two sets of scores.
Given our eta-squared value of .50, we can conclude that there was a large effect, with a substantial difference in the Fear of Statistics scores obtained before and after the intervention.
Caution
Although we obtained a significant difference in
the scores before/after the intervention, we cannot say that the intervention caused the drop in Fear of Statistics Test scores. Research is never that simple, unfortunately! There are many other factors that may have also influenced the decrease in fear scores. Wherever possible, the researcher should try to anticipate these confounding factors and either control for them or incorporate them into the research design.
Hypothesis Testing
Analysis Of Variance
The techniques covered so far compare the scores of two groups. In many research situations, however, we are interested in comparing the mean scores of more than two groups. In this situation we would use analysis of variance (ANOVA). One-way analysis of variance involves one independent variable (referred to as a factor), which has a number of different levels. These levels correspond to the different groups or conditions. For example, in comparing the effectiveness of three different teaching styles on students' Maths scores, you would have one factor (teaching style) with three levels (e.g. whole class, small group activities, self-paced computer activities). The dependent variable is a continuous variable (in this case, Maths scores).
Analysis of variance is so called because it compares
the variance (variability in scores) between the different groups (believed to be due to the independent variable) with the variability within each of the groups (believed to be due to chance). An F ratio is calculated, representing the variance between the groups divided by the variance within the groups. A large F ratio indicates that there is more variability between the groups (caused by the independent variable) than there is within each group (referred to as the error term). A significant F test indicates that we can reject the null hypothesis, which states that the population means are equal.
There are two different types of one-way ANOVA : between-groups analysis of variance, which is
used when you have different subjects or cases in each of your groups (this is referred to as an independent groups design); and repeated-measures analysis of variance, which is used when you are measuring the same subjects under different conditions (or measured at different points in time) (this is also referred to as a within-subjects design).
Planned comparisons

Planned comparisons are used when you wish to test specific hypotheses (usually drawn from theory or past research) concerning the differences between a subset of your groups (e.g. do Groups 1 and 3 differ significantly?). Planned comparisons do not control for the increased risk of Type 1 errors. If there are a large number of differences you wish to explore, it may be safer to use the alternative approach (post-hoc comparisons), which is designed to protect against Type 1 errors. The other alternative is to apply what is known as a Bonferroni adjustment to the alpha level that you will use to judge statistical significance. This involves setting a more stringent alpha level for each individual comparison, so that the overall Type 1 error rate across all comparisons remains acceptable (e.g. dividing the alpha level by the number of comparisons intended).
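The Bonferroni adjustment described above amounts to dividing the overall alpha by the number of comparisons. A small sketch, with an illustrative k = 4 groups:

```python
# Sketch: Bonferroni adjustment of the alpha level.
k = 4                              # number of groups (illustrative)
n_comparisons = k * (k - 1) // 2   # all pairwise comparisons: 6
alpha = .05
adjusted_alpha = alpha / n_comparisons
print(n_comparisons, round(adjusted_alpha, 4))  # 6 0.0083
```

Each individual comparison is then judged against .0083 rather than .05, keeping the familywise Type 1 error rate at roughly .05.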
Post-hoc comparisons
Post-hoc comparisons (also known as a posteriori comparisons) are used when you want to conduct a whole set of comparisons, exploring the differences between each of the groups or conditions in your study. Post-hoc comparisons are designed to guard against the possibility of an increased Type 1 error due to the large number of comparisons being made. With small samples this can be a problem, as it can be very hard to find a significant result even when the apparent difference in scores between the groups is quite large. There are a number of different post-hoc tests you can use, and these vary in their nature and strictness; some commonly used tests are listed below.
Multiple comparison tests AND range tests:
Tukey's HSD (honestly significant difference) test
Hochberg's GT2
Gabriel

Range tests only:
Tukey's b (AKA Tukey's WSD (Wholly Significant Difference))
S-N-K (Student-Newman-Keuls)
Duncan

Multiple comparison tests only:
Bonferroni (don't use with 5 groups or greater)
Sidak
Dunnett (compares a control group to the other groups without comparing the other groups to each other)
LSD (least significant difference)
LSD (Least Significant Difference)
This test is the most liberal of all post hoc tests, and its critical t for significance is not affected by the number of groups. It is appropriate when you have 3 means to compare; it is not appropriate for additional means.
Bonferroni (AKA Dunn's Bonferroni)
This test does not require the overall ANOVA to be significant. It is appropriate when the number of comparisons (c = k(k−1)/2) exceeds the number of degrees of freedom (df) between groups (df = k−1). This test is very conservative and its power quickly declines as c increases. A good rule of thumb is that the number of comparisons (c) be no larger than the degrees of freedom (df).
Newman-Keuls
If there is more than one true null hypothesis in a set of means, this test will overestimate the familywise error rate. It is appropriate to use this test when the number of comparisons exceeds the number of degrees of freedom (df) between groups (df = k−1).
Tukey's HSD (Honestly Significant Difference)
This test is perhaps the most popular post hoc test. It reduces Type 1 error at the expense of power. It is appropriate when one desires all possible comparisons between a large set of means (6 or more means).
Tukey's b (AKA Tukey's WSD (Wholly Significant Difference))
This test strikes a balance between the Newman-Keuls and Tukey's more conservative HSD regarding Type 1 error and power. Tukey's b is appropriate when one is making more than k−1 comparisons, yet fewer than k(k−1)/2 comparisons, and needs more control of Type 1 error than Newman-Keuls provides.
Scheffé
This test is the most conservative of all post hoc tests. Compared to Tukey's HSD, Scheffé has less power when making pairwise (simple) comparisons, but more power when making complex comparisons. It is appropriate to use when making complex comparisons among means.
One-way between-groups ANOVA is used when you have one independent (grouping) variable with three or more levels (groups) and one dependent continuous variable. The one-way part of the title indicates there is only one independent variable, and between-groups means that you have different subjects or cases in each of the groups.
What you need: Two variables: one categorical independent variable with three or more distinct categories. This can also be a continuous variable that has been recoded to give three equal groups (e.g. age group: subjects divided into three age categories, 29 and younger, between 30 and 44, 45 or above). For instructions on how to do this see Chapter 8; and one continuous dependent variable (e.g. optimism).
What it does:
A one-way ANOVA will tell you whether there are significant differences in the mean scores on the dependent variable across the three groups. Post-hoc tests can then be used to find out where these differences lie. Non-parametric alternative: Kruskal-Wallis Test
Assumptions
The population distribution of the response variable in each group is normal.
The standard deviations of the population distributions for the groups are equal.
The samples are independent random samples.
(In practice these assumptions often aren't strictly met, but we do ANOVA anyway.)
Hypotheses
Null hypothesis:
H0: μ1 = μ2 = μ3
Alternative or research hypothesis:
Ha: μ1 ≠ μ2 or μ1 ≠ μ3 or μ2 ≠ μ3
Level of significance
Test statistic

The F statistic compares the between-groups estimate of variance with the within-groups estimate:

F = Between estimate / Within estimate = [BSS / (g − 1)] / [WSS / (N − g)]

df1 = g − 1
df2 = N − g

where g is the number of groups and N is the total sample size.

Between estimate: BSS = Σ nᵢ(ȳᵢ − ȳ)², summing over groups each group size times the squared difference between the group mean ȳᵢ and the grand mean ȳ; dividing BSS by g − 1 gives the between estimate.

Group   N     Mean    Group mean − grand mean
Walk    46    10.20   −3.90
Drive   228   14.43   0.33
Bus     17    20.35   6.25
Total   291   14.10 (grand mean)

Within estimate: WSS = Σ (nᵢ − 1)sᵢ², summing over groups each group's degrees of freedom times its sample variance sᵢ²; dividing WSS by N − g gives the within estimate.
Calculating the F statistic
Degrees of freedom
df1 (degrees of freedom in numerator) = number of groups − 1 = 3 − 1 = 2
df2 (degrees of freedom in denominator) = total number of cases − number of groups = 291 − 3 = 288
Table of F distribution
Separate tables for each probability level. df1 (degrees of freedom in numerator) runs across the top; df2 (degrees of freedom in denominator) runs down the side; the values of F are in the body of the table. For degrees of freedom not given, use the next lower value.
p-value
Find the F value for degrees of freedom (2, 288)
Conclusion
The p-value (< 0.001) is less than α = 0.05, so we reject the null hypothesis that the population means are equal and conclude that at least one of the means is different from the others.
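Instead of a printed F table, the critical value can be read off the F distribution directly, here via SciPy, using this example's degrees of freedom (2, 288):

```python
# Critical F value at alpha = .05 for df1 = 2, df2 = 288,
# read from the F distribution rather than a printed table.
from scipy import stats

df1, df2 = 2, 288
f_crit = stats.f.ppf(0.95, df1, df2)  # about 3.03
print(round(f_crit, 2))
```

An observed F larger than this critical value corresponds to p < .05; the p-value itself can be obtained with `stats.f.sf(f_observed, df1, df2)`.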
Example

Treatment 1: 60, 67, 42, 67, 56, 62, 64, 59, 72, 71 (inches)
Treatment 2: 50, 52, 43, 67, 67, 59, 67, 64, 63, 65
Treatment 3: 48, 49, 50, 55, 56, 61, 61, 60, 59, 64
Treatment 4: 47, 67, 54, 67, 68, 65, 65, 56, 60, 65
Example
Step 1) Calculate the sum of squares between groups (data as above, n = 10 per group):

Mean for group 1 = 62.0
Mean for group 2 = 59.7
Mean for group 3 = 56.3
Mean for group 4 = 61.4
Grand mean = 59.85

SSB = [(62 − 59.85)² + (59.7 − 59.85)² + (56.3 − 59.85)² + (61.4 − 59.85)²] × n per group = 19.65 × 10 = 196.5
Example
Step 2) Calculate the sum of squares within groups: square each observation's deviation from its own group mean, and sum over all 40 observations.

SSW = (60 − 62)² + (67 − 62)² + (42 − 62)² + … + (50 − 59.7)² + (52 − 59.7)² + (43 − 59.7)² + … (sum of 40 squared deviations) = 2060.6
ANOVA summary table:

Source    df    SS        MS      F       p
Between   3     196.5     65.5    1.14    .344
Within    36    2060.6    57.2
Total     39    2257.1
This table gives you information about each group (number in each group, mean, standard deviation, minimum and maximum, etc.). Always check this table first. Are the Ns for each group correct?
Test of homogeneity of variances
o The homogeneity of variance option gives you Levene's test for homogeneity of variances, which tests whether the variance in scores is the same for each of the three groups.
o Check the significance value (Sig.) for Levene's test. If this number is greater than .05 (e.g. .08, .12, .28), you have not violated the assumption of homogeneity of variance.
o If you have violated this assumption, you will need to consult the table in the output headed Robust Tests of Equality of Means. The two tests shown there (Welch and Brown-Forsythe) are preferable when the assumption of homogeneity of variance is violated.
ANOVA
This table gives both between-groups and within-groups sums of squares, degrees of freedom etc. The main thing you are interested in is the column marked Sig. If the Sig. value is less than or equal to .05 (e.g. .03, .01, .001), then there is a significant difference somewhere among the mean scores on your dependent variable for the three groups.
Multiple comparisons
You should look at this table only if you found a significant difference in your overall ANOVA, that is, if the Sig. value was equal to or less than .05. The post-hoc tests in this table will tell you exactly where the differences among the groups occur.
Means plots
This plot provides an easy way to compare the mean scores for the different groups. Warning: these plots can be misleading. Depending on the scale used on the Y axis (in this case representing Optimism scores), even small differences can look dramatic.
Warning
In this example we obtained a statistically significant result, but the actual difference in the mean scores of the groups was very small (21.36, 22.10, 22.96). This is evident in the small effect size obtained (eta squared=.02). With a large enough sample (in this case N=435), quite small differences can become statistically significant, even if the difference between the groups is of little practical importance. Always interpret your results carefully, taking into account all the information you have available. Don't rely too heavily on statistical significance; many other factors also need to be considered.
Two-way between-groups ANOVA: "two-way" means there are two independent variables, and "between-groups" indicates that different people are in each of the groups. This technique allows us to look at the individual and joint effects of two independent variables on one dependent variable. The advantage of using a two-way design is that we can test the main effect for each independent variable and also explore the possibility of an interaction effect. An interaction effect occurs when the effect of one independent variable on the dependent variable depends on the level of the second independent variable.
Example of research question: What is the impact of age and gender on optimism? Does gender moderate the relationship between age and optimism? What you need: Three variables: two categorical independent variables (e.g. Sex: males/females; Age group: young, middle, old); and one continuous dependent variable (e.g. total optimism). Assumptions: the assumptions underlying ANOVA.
What it does:
A two-way ANOVA allows you to test for the effect of each of your independent variables on the dependent variable and also identifies any interaction effect. For example, it allows you to test for: sex differences in optimism; differences in optimism for young, middle and old subjects; and the interaction of these two variables: is there a difference in the effect of age on optimism for males and females?
5. Click on the Post Hoc button.
From the Factors listed on the left-hand side choose the independent variable(s) you are interested in (this variable should have three or more levels or groups: e.g. agegp3). Click on the arrow button to move it into the Post Hoc Tests for section. Choose the test you wish to use (in this case Tukey). Click on Continue.
The output
Descriptive statistics. These provide the mean scores, standard deviations and N for each subgroup. Check that these values are correct. Levene's Test of Equality of Error Variances.
This test provides a test of one of the assumptions
underlying analysis of variance. The value you are most interested in is the Sig. level. You want this to be greater than .05, and therefore not significant. A significant result (Sig. value less than .05) suggests that the variance of your dependent variable across the groups is not equal. If you find this to be the case in your study, it is recommended that you set a more stringent significance level (e.g. .01) for evaluating the results of your two-way ANOVA.
Tests Of Between-Subjects Effects.
This gives you a number of pieces of information, not necessarily in the order in which you need to check them.
Interaction effects: The first thing you need to do is check for the possibility of an interaction effect (e.g. that the influence of age on optimism levels depends on whether you are male or female). In the SPSS output the line we need to look at is labelled AGEGP3*SEX. To find out whether the interaction is significant, check the Sig. column for that line. If the value is less than or equal to .05, the interaction effect is significant.
Main effects
In the left-hand column, find the variable you are interested in (e.g. AGEGP3). To determine whether there is a main effect for each independent variable, check the column marked Sig. next to each variable.
Effect size: The effect size for the agegp3 variable is provided in the column labelled Partial Eta Squared (.018). Using Cohen's (1988) criterion, this can be classified as small (see introduction to Part Five). So, although this effect reaches statistical significance, the actual difference in the mean values is very small. From the Descriptives table we can see that the mean scores for the three age groups (collapsed for sex) are 21.36, 22.10 and 22.96. The difference between the groups appears to be of little practical significance.
Post-hoc tests

Post-hoc tests are relevant only if you have more than two levels (groups) of your independent variable. These tests systematically compare each pair of groups and indicate whether there is a significant difference in their means. SPSS provides these post-hoc tests as part of the ANOVA output.
Cont.
Multiple comparisons
The results of the post-hoc tests are provided in the table labelled Multiple Comparisons. We have requested the Tukey Honestly Significant Difference (HSD) test, as this is one of the more commonly used tests. Look down the column labelled Sig. for any values less than .05. Significant results are also indicated by an asterisk in the column labelled Mean Difference.
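A Tukey HSD run can also be reproduced outside SPSS. The sketch below uses scipy's tukey_hsd (available in recent SciPy versions); the three groups of scores are invented.

```python
# Sketch: a Tukey HSD comparison outside SPSS (needs a recent SciPy).
# Groups and scores below are invented example data.
from scipy.stats import tukey_hsd

young = [21.2, 20.8, 22.1, 21.5, 20.9]
middle = [22.0, 22.4, 21.8, 22.6, 22.2]
older = [23.1, 22.8, 23.4, 22.9, 23.2]

res = tukey_hsd(young, middle, older)

# res.pvalue is a 3x3 matrix of pairwise p-values; look for values < .05,
# just as you scan the Sig. column of SPSS's Multiple Comparisons table.
print(res.pvalue)
```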
Plots
You will see at the end of your SPSS output a plot of the optimism scores for males and females across the three age groups. This plot is very useful for allowing you to visually inspect the relationship among your variables.
A significant interaction effect suggests that you should look at the results for each of the subgroups separately. This involves splitting the sample into groups according to one of your independent variables and running separate one-way ANOVAs to explore the effect of the other variable. Use the SPSS Split File option, which allows you to split your sample according to one categorical variable and to repeat analyses separately for each group.
After splitting the file, you then perform a one-way ANOVA. Important: when you have finished these analyses for the separate groups, you must turn the Split File option off.
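The Split File idea can be mimicked in plain code: group the data by one variable and repeat the same analysis per subgroup. A minimal sketch with invented optimism scores:

```python
# Sketch of SPSS's Split File logic: run the same one-way ANOVA separately
# for each level of one grouping variable. All scores below are invented.
from scipy.stats import f_oneway

# optimism scores keyed by (sex, age group) -- hypothetical data
data = {
    "male":   {"18-29": [20, 21, 22], "30-44": [22, 23, 21], "45+": [23, 24, 22]},
    "female": {"18-29": [21, 22, 20], "30-44": [22, 22, 23], "45+": [24, 23, 25]},
}

for sex, age_groups in data.items():          # the "split file" step
    stat, p = f_oneway(*age_groups.values())  # one-way ANOVA within subgroup
    print(f"{sex}: F = {stat:.2f}, p = {p:.3f}")
# No "turn Split File off" step is needed here: the loop simply ends.
```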
Analysis of covariance (ANCOVA) is an extension of analysis of variance that allows you to explore differences between groups while statistically controlling for an additional (continuous) variable. This additional variable (called a covariate) is a variable that you suspect may be influencing scores on the dependent variable. SPSS uses regression procedures to remove the variation in the dependent variable that is due to the covariate/s, and then performs the normal analysis of variance techniques on the corrected or adjusted scores. By removing the influence of these additional variables, ANCOVA can increase the power or sensitivity of the F-test.
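The two-step logic described here (regress out the covariate, then run the usual ANOVA on the adjusted scores) can be illustrated directly. This is a conceptual sketch only, with simulated data; SPSS's GLM procedure fits the group factor and covariate simultaneously rather than in two steps.

```python
# Conceptual sketch of the ANCOVA idea: remove the covariate's influence by
# regression, then run an ordinary one-way ANOVA on the residual ("adjusted")
# scores. Data are simulated; real ANCOVA fits both terms at once.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
pretest = rng.normal(50, 5, size=30)            # covariate (e.g. pre-test)
group = np.repeat([0, 1, 2], 10)                # three treatment groups
posttest = 0.8 * pretest + group * 2 + rng.normal(0, 2, size=30)

# Step 1: simple linear regression of the DV on the covariate.
slope, intercept = np.polyfit(pretest, posttest, 1)
adjusted = posttest - (slope * pretest + intercept)  # "corrected" scores

# Step 2: ordinary one-way ANOVA on the adjusted scores.
stat, p = f_oneway(adjusted[group == 0], adjusted[group == 1], adjusted[group == 2])
print(f"F = {stat:.2f}, p = {p:.4f}")
```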
Uses of ANCOVA
ANCOVA can be used when you have a two-group pre-test/post-test design (e.g. comparing the impact of two different interventions, taking before and after measures for each group). The scores on the pre-test are treated as a covariate to control for pre-existing differences between the groups. This makes ANCOVA very useful in situations when you have quite small sample sizes and only small or medium effect sizes (see the discussion of effect sizes in the introduction to Part Five). Under these circumstances (which are very common in social science research), Stevens (1996) recommends the use of two or three carefully chosen covariates to reduce the error variance and increase your chances of detecting a significant difference.
ANCOVA is also handy when you have been unable to randomly assign your subjects to the different groups, but instead have had to use existing groups (e.g. classes of students). As these groups may differ on a number of attributes (not just the one you are interested in), ANCOVA can be used in an attempt to reduce some of these differences. The use of well-chosen covariates can help reduce the confounding influence of group differences. This is certainly not an ideal situation, as it is not possible to control for all possible differences; however, it does help reduce this systematic bias. The use of ANCOVA with intact or existing groups is a contentious issue among writers in the field, so it would be a good idea to read more widely if you find yourself in this situation. Some of these issues are summarised in Stevens (1996).
Comparing means
1 group:
  N >= 30: one-sample t-test
  N < 30, normally distributed: one-sample t-test
  N < 30, not normal: sign test
2 groups, independent:
  N >= 30: t-test
  N < 30, normally distributed: t-test
  N < 30, not normal: Mann-Whitney U (Wilcoxon rank-sum) test
2 groups, paired:
  Normally distributed: paired t-test
  Not normal: Wilcoxon signed-rank test
3 or more groups, independent:
  Normally distributed: one-way ANOVA
  Not normal: Kruskal-Wallis test
3 or more groups, dependent:
  Normally distributed: two-way or other ANOVA (e.g. repeated measures)
  Not normal: Friedman test
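The decision table above can be expressed as a small helper function. This is a sketch that mirrors the table only; the nonparametric choices for three or more groups follow the standard Kruskal-Wallis / Friedman conventions.

```python
# Sketch: the test-selection table as a function. Mirrors the chart above;
# the 3+-group nonparametric cells follow standard conventions.
def choose_test(n_groups, paired=False, n=None, normal=True):
    if n_groups == 1:
        if n is not None and n >= 30:
            return "one-sample t-test"
        return "one-sample t-test" if normal else "sign test"
    if n_groups == 2:
        if paired:
            return "paired t-test" if normal else "Wilcoxon signed-rank test"
        if n is not None and n >= 30:
            return "t-test"
        return "t-test" if normal else "Mann-Whitney U test"
    # 3 or more groups
    if paired:
        return "repeated-measures ANOVA" if normal else "Friedman test"
    return "one-way ANOVA" if normal else "Kruskal-Wallis test"

print(choose_test(2, paired=True, normal=False))  # Wilcoxon signed-rank test
```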
THE END
t = \frac{\bar{X} - \mu_0}{SE(\bar{X})} = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}
~ t-distribution with df = n - 1.
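The one-sample t statistic above can be computed by hand and checked against scipy; the sample values and hypothesised mean below are invented.

```python
# The one-sample t formula computed by hand and checked against scipy.
# Sample values and hypothesised mean are invented example data.
import math
from scipy.stats import ttest_1samp

x = [22.1, 21.4, 23.0, 22.5, 21.8, 22.9, 21.2, 22.6]
mu0 = 21.0                                    # hypothesised population mean
n = len(x)
xbar = sum(x) / n
s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))  # sample SD
t_manual = (xbar - mu0) / (s / math.sqrt(n))  # t = (xbar - mu0) / (s / sqrt(n))

t_scipy, p = ttest_1samp(x, mu0)              # df = n - 1
print(f"t = {t_manual:.3f}, p = {p:.4f}")
```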
Post hoc
Once you have determined that differences exist among the means, post hoc range tests and pairwise multiple comparisons can determine which means differ. Range tests identify homogeneous subsets of means that do not differ from each other. Pairwise multiple comparisons test the difference between each pair of means and yield a matrix in which asterisks indicate significantly different group means at an alpha level of 0.05 (SPSS, Inc.).
Single-step tests
Tukey-Kramer: The Tukey-Kramer test is an extension of the Tukey test to unbalanced designs. Unlike the Tukey test for balanced designs, it is not exact: its FWE may be less than ALPHA. It is only slightly conservative for mildly unbalanced designs, and more conservative as the differences among sample sizes grow. Hochberg's GT2: The GT2 test is similar to Tukey, but its critical values are based on the studentized maximum modulus distribution instead of the studentized range. For balanced or unbalanced one-way ANOVA, its FWE does not exceed ALPHA. It is usually more conservative than the Tukey-Kramer test for unbalanced designs, and always more conservative than the Tukey test for balanced designs. Gabriel: Like the GT2 test, the Gabriel test is based on the studentized maximum modulus. It is equivalent to the GT2 test for balanced one-way ANOVA. For unbalanced one-way ANOVA it is less conservative than GT2, but its FWE may exceed ALPHA in highly unbalanced designs. Dunnett: Dunnett's test is the test to use when the only pairwise comparisons of interest are comparisons with a control. It is an exact test, that is, its FWE is exactly equal to ALPHA, for balanced as well as unbalanced one-way designs.
Below are short descriptions of these tests. LSD: The LSD (Least Significant Difference) test is a two-step test. First the ANOVA F test is performed. If it is significant at level ALPHA, then all pairwise t-tests are carried out, each at level ALPHA; if the F test is not significant, the procedure terminates. The LSD test does not control the FWE. Bonferroni: The Bonferroni multiple comparison test is a conservative test: its FWE is not exactly equal to ALPHA, but is less than ALPHA in most situations. It is easy to apply and can be used for any set of comparisons. Even though the Bonferroni test controls the FWE, in many situations it is too conservative and lacks the power to detect significant differences.
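The LSD-then-Bonferroni logic just described can be sketched directly: run the F test first, and if it is significant, carry out all pairwise t-tests with a Bonferroni adjustment. Groups and scores below are invented.

```python
# Sketch: significant F test first, then Bonferroni-adjusted pairwise t-tests
# (the LSD procedure plus the adjustment step). Data below are invented.
from itertools import combinations
from scipy.stats import ttest_ind, f_oneway

groups = {
    "A": [21, 22, 20, 23, 21],
    "B": [24, 25, 23, 26, 24],
    "C": [22, 23, 21, 22, 23],
}
alpha = 0.05
m = 3  # number of pairwise comparisons among 3 groups

f_stat, f_p = f_oneway(*groups.values())
if f_p < alpha:                        # step 1: overall F test
    for (na, a), (nb, b) in combinations(groups.items(), 2):
        t, p = ttest_ind(a, b)
        p_bonf = min(p * m, 1.0)       # step 2: Bonferroni adjustment
        print(f"{na} vs {nb}: adjusted p = {p_bonf:.4f}")
```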
Sidak: Sidak adjusted p-values are also easy to compute.
The Sidak test gives slightly smaller adjusted p-values than Bonferroni, but it guarantees strict control of the FWE only when the comparisons are independent. Scheffe: The Scheffe test is used in ANOVA analyses (balanced, unbalanced, with covariates). It controls the FWE for all possible contrasts, not only pairwise comparisons, and is too conservative in cases when pairwise comparisons are the only comparisons of interest. Tukey: The Tukey test is based on the studentized range distribution (the standardized maximum difference between the means). For one-way balanced ANOVA, the FWE of the Tukey test is exactly equal to the assumed value of ALPHA. The Tukey test is also exact for one-way balanced ANOVA with correlated errors when the correlation structure is compound symmetry.
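The Sidak adjustment really is easy to compute, and comparing it with Bonferroni shows why its adjusted p-values are slightly smaller. A minimal sketch with an arbitrary raw p-value:

```python
# Sketch: Sidak vs Bonferroni adjustment for m comparisons.
# raw_p and m are arbitrary illustrative values.
m = 6                                 # e.g. all pairs among 4 groups
raw_p = 0.01

p_bonferroni = min(raw_p * m, 1.0)    # Bonferroni: multiply by m
p_sidak = 1.0 - (1.0 - raw_p) ** m    # Sidak: slightly smaller than Bonferroni

print(f"Bonferroni: {p_bonferroni:.4f}, Sidak: {p_sidak:.4f}")
```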
Appendix
Eta squared: the proportion of total variation explained by group membership, eta squared = SSB / TSS.
ANOVA Table
Source of variation: Between (k groups)
  d.f. = k - 1
  Sum of squares = SSB (sum of squared deviations of group means from the grand mean)
  F = [SSB / (k - 1)] / [SSW / (nk - k)]
  p-value: go to the F(k-1, nk-k) chart

Source of variation: Within (n individuals per group)
  d.f. = nk - k
  Sum of squares = SSW (sum of squared deviations of observations from their group mean)
  s^2 = SSW / (nk - k)

Total variation
  d.f. = nk - 1
  TSS = SSB + SSW
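The ANOVA table above can be built from raw data and the resulting F checked against scipy. The sketch below uses k = 3 invented groups of n = 4 observations each.

```python
# Sketch: building the ANOVA table quantities (SSB, SSW, TSS, F) from raw
# data and checking F against scipy. Three invented groups of four scores.
import numpy as np
from scipy.stats import f_oneway

groups = [np.array([20.0, 21, 22, 21]),
          np.array([23.0, 24, 22, 23]),
          np.array([25.0, 24, 26, 25])]
k = len(groups)
n = len(groups[0])                      # per-group size; total N = nk

grand = np.mean(np.concatenate(groups))
ssb = sum(n * (g.mean() - grand) ** 2 for g in groups)   # between, df = k-1
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)   # within, df = nk-k
tss = ssb + ssw                                          # total, df = nk-1

f_manual = (ssb / (k - 1)) / (ssw / (n * k - k))
f_scipy, p = f_oneway(*groups)
print(f"F = {f_manual:.3f} (scipy: {f_scipy:.3f}), p = {p:.4f}")
```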