
Statistical techniques to compare groups

Techniques covered in this Part

The statistical techniques used to compare groups in SPSS are: the one-sample t-test; the independent-samples t-test; the paired-samples t-test; one-way analysis of variance (between groups); two-way analysis of variance (between groups); and their non-parametric alternatives.

Assumptions
Each of the tests in this section has a number of assumptions underlying its use. There are some general assumptions that apply to all of the parametric techniques discussed here (e.g. t-tests, analysis of variance), and additional assumptions associated with specific techniques. The general assumptions are presented in this section and the more specific assumptions are presented in the following topics, as appropriate.

Level of measurement
Each of these approaches assumes that the dependent variable is measured at the interval or ratio level, that is, using a continuous scale rather than discrete categories. Wherever possible when designing your study, try to make use of continuous, rather than categorical, measures of your dependent variable. This gives you a wider range of possible techniques to use when analysing your data.

Random sampling
The techniques covered in this lecture assume that the scores are obtained using a random sample from the population. This is often not the case in real-life research.

Independence of observations
The observations that make up your data must be independent of one another. That is, each observation or measurement must not be influenced by any other observation or measurement. Violation of this assumption is very serious. A number of research situations may violate the assumption of independence, for example studying the performance of students working in pairs or small groups, where the behaviour of each member of the group influences all other group members.

Any situation where the observations or measurements are collected in a group setting, or where subjects are involved in some form of interaction with one another, should be considered suspect. In designing your study you should try to ensure that all observations are independent. If you suspect some violation of this assumption, Stevens (1996, p. 241) recommends that you set a more stringent alpha value (e.g. p<.01).

Normal distribution
What are the characteristics of a normal distribution curve? (See the Appendix at the end of this Part.)

It is assumed that the populations from which the samples are taken are normally distributed. In a lot of research (particularly in the social sciences), scores on the dependent variable are not nicely normally distributed. Fortunately, most of the techniques are reasonably robust, or tolerant, of violations of this assumption. With large enough sample sizes (e.g. 30+), the violation of this assumption should not cause any major problems. The distribution of scores for each of your groups can be checked using histograms obtained as part of your preliminary descriptive analyses.

Homogeneity of variance
Techniques in this section make the assumption that samples are obtained from populations of equal variances. To test this, SPSS performs the Levene test for equality of variances as part of the t-test and analysis of variance procedures. If you obtain a significance value of less than .05, this suggests that the variances for the two groups are not equal, and you have therefore violated the assumption of homogeneity of variance. Analysis of variance is reasonably robust to violations of this assumption, provided the size of your groups is reasonably similar (e.g. a largest/smallest ratio of about 1.5 or less).
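
SPSS runs the Levene test for you; outside SPSS the same check can be run with Python's scipy library. A minimal sketch, assuming two hypothetical arrays of group scores (the variable names and simulated data are illustrative only):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    group1 = rng.normal(loc=34, scale=5, size=180)   # hypothetical scores, group 1
    group2 = rng.normal(loc=33, scale=6, size=250)   # hypothetical scores, group 2

    # Levene's test: p < .05 suggests the variances are NOT equal,
    # i.e. the homogeneity-of-variance assumption is violated.
    stat, p = stats.levene(group1, group2)
    print(f"Levene W = {stat:.3f}, p = {p:.3f}")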

Type 1 error, Type 2 error and power

The purpose of t-tests and analysis of variance is to test hypotheses. With these types of analyses there is always the possibility of reaching the wrong conclusion. There are two different errors that we can make. A Type 1 error occurs when we think there is a difference between our groups, but there really isn't; in other words, we reject the null hypothesis when it is, in fact, true. We can minimise this possibility by selecting an appropriate alpha level.

A Type 2 error occurs when we fail to reject a null hypothesis when it is, in fact, false (i.e. believing that the groups do not differ, when in fact they do). Unfortunately, these two errors are inversely related: as we try to control for a Type 1 error, we actually increase the likelihood that we will commit a Type 2 error. Ideally, we would like the tests that we use to correctly identify whether in fact there is a difference between our groups. This is called the power of a test. Tests vary in terms of their power (e.g. parametric tests such as t-tests and analysis of variance are more powerful than non-parametric tests). Other factors that can influence the power of a test are:

1. Sample size: When the sample size is large (e.g. 100 or more subjects), power is not usually an issue. However, when you have a study where the group size is small (e.g. n=20), you need to be aware of the possibility that a non-significant result may be due to insufficient power. When small group sizes are involved it may be necessary to adjust the alpha level to compensate. There are tables available that will tell you how large your sample size needs to be to achieve sufficient power, given the effect size you wish to detect. The higher the power of a test, the more confident you can be that a non-significant result really does mean there is no difference between the groups.

2. Effect size: With large samples, even very small differences between groups can become statistically significant. This does not mean that the difference has any practical or theoretical significance. One way that you can assess the importance of your finding is to calculate the effect size (also known as strength of association). Effect size is a set of statistics which indicates the relative magnitude of the differences between means. In other words, it describes the amount of the total variance in the dependent variable that is predictable from knowledge of the levels of the independent variable.

The most common effect size statistics are eta squared, Cohen's d and Cohen's f (see the formulas in the Appendix). Eta squared represents the proportion of variance of the dependent variable that is explained by the independent variable. Values for eta squared can range from 0 to 1. To interpret the strength of eta squared values the following guidelines can be used: .01 = small effect; .06 = moderate effect; and .14 = large effect. A number of criticisms have been levelled at eta squared.
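
As a rough illustration of how these statistics relate to output you already have, here is a small Python sketch. The eta squared formula for a t-test (t squared divided by t squared plus df) is the one used later in this Part; the Cohen's d helper is a standard pooled-standard-deviation version, and all inputs are hypothetical:

    import numpy as np

    def cohens_d(x, y):
        # Cohen's d: mean difference divided by the pooled standard deviation.
        nx, ny = len(x), len(y)
        pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                      (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
        return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

    def eta_squared_from_t(t, df):
        # Eta squared for a t-test: t^2 / (t^2 + df).
        return t ** 2 / (t ** 2 + df)

    print(eta_squared_from_t(1.62, 434))   # about .006: a very small effect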

Missing data
It is important that you inspect your data file for missing data. Run Descriptives and find out what percentage of values is missing for each of your variables. If you find a variable with a lot of unexpected missing data you need to ask yourself why. You should also consider whether your missing values are happening randomly, or whether there is some systematic pattern (e.g. lots of women failing to answer the question about their age).

The Options button in many of the SPSS statistical procedures offers you choices for how you want SPSS to deal with missing data. The Exclude cases listwise option will include a case in the analysis only if it has full data on all of the variables listed in your variables box. A case will be totally excluded from all the analyses if it is missing even one piece of information. This can severely, and unnecessarily, limit your sample size. The Exclude cases pairwise option (sometimes shown as Exclude cases analysis by analysis), however, excludes the cases (persons) only if they are missing the data required for the specific analysis. They will still be included in any of the analyses for which they have the necessary information. The Replace with mean option, which is available in some SPSS statistical procedures, calculates the mean value for the variable and gives every missing case this value.

I would strongly recommend that you use pairwise exclusion of missing data, unless you have a pressing reason to do otherwise. The only situation where you might need to use listwise exclusion is when you want to refer only to a subset of cases that provided a full set of results.
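
The listwise/pairwise distinction is easy to see outside SPSS as well. A small pandas sketch on a hypothetical four-case file (column names invented for illustration):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"age":        [25, 30, np.nan, 41],
                       "selfesteem": [34, np.nan, 31, 38]})

    # Listwise: a case missing ANY variable is dropped from every analysis.
    listwise = df.dropna()                  # keeps only the 2 complete cases

    # Pairwise: each statistic uses every case that has the data it needs.
    mean_age = df["age"].mean()             # computed from 3 cases
    mean_se  = df["selfesteem"].mean()      # computed from a different 3 cases
    print(len(listwise), mean_age, mean_se)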

Hypothesis testing
What is a hypothesis test?

A hypothesis test uses sample data to test a hypothesis about the population from which the sample was taken.

When to use a hypothesis test

Use a hypothesis test to make inferences about one or more populations when sample data are available.

Why use a hypothesis test?

Hypothesis testing can help answer questions such as: Is students' achievement in science meeting or exceeding a specified standard? Is the performance of teacher A better than the performance of teacher B?

Hypothesis Testing with a One-Sample t-test

1. One-sample t-test
What is a one-sample t-test?

A one-sample t-test helps determine whether μ (the population mean) is equal to a hypothesized value (the test mean). The test uses the standard deviation of the sample to estimate σ (the population standard deviation). If the difference between the sample mean and the test mean is large relative to the variability of the sample mean, then μ is unlikely to be equal to the test mean.

Cont.
When to use a one-sample t-test

Use a one-sample t-test when continuous data are available from a single random sample. The test assumes the population is normally distributed. However, it is fairly robust to violations of this assumption for sample sizes equal to or greater than 30, provided the observations are collected randomly and the data are continuous, unimodal and reasonably symmetric.
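
For comparison with the SPSS procedure shown later, here is a minimal one-sample t-test in Python with scipy, on simulated data; the test mean of 20 and sample size of 81 simply mirror the worked example further below:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=18, scale=9, size=81)   # hypothetical anxiety scores

    t, p = stats.ttest_1samp(sample, popmean=20)
    print(f"t = {t:.2f}, two-tailed p = {p:.3f}")
    # For a directional (one-tailed) test in the predicted direction,
    # halve the two-tailed p-value.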

Z-test or t-test?
The shortcoming of the z-test is that it requires more information than is usually available. To do a z-test, we need to know the value of the population standard deviation in order to compute the standard error, but it is rarely known. When the population standard deviation is unknown, we use the one-sample t-test.


What if σ is unknown?

We cannot compute the z test statistic (z score):

    z = (x̄ − μ) / (σ / √n)

because the population standard deviation σ must be known.

We can, however, compute the t statistic:

    t = (x̄ − μ) / (s / √n)

which requires only the sample standard deviation s.
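
The contrast is easy to see numerically. A sketch with a small hypothetical sample: the z statistic needs a population sigma that we must simply assume, while t uses the sample's own standard deviation:

    import numpy as np

    x = np.array([18.0, 21.0, 15.0, 19.0, 22.0, 17.0])   # hypothetical scores
    mu0 = 20.0                                           # hypothesized mean

    sigma = 3.0                              # population sd: rarely actually known
    z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))

    s = x.std(ddof=1)                        # sample sd: always computable
    t = (x.mean() - mu0) / (s / np.sqrt(len(x)))
    print(z, t)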

Hypothesis testing with a one-sample t-test

State the hypotheses:

    H0: μ = hypothesized value
    H1: μ ≠ hypothesized value

Set the criteria for rejecting H0: the alpha level and the critical t value.

Determining the criteria for rejecting H0

If the value of t exceeds some threshold or critical value, t-critical, then an effect is detected (i.e. the null hypothesis of no difference is rejected).

Critical values are given in Table C.3 (p. 638 in the text).

Degrees of freedom for the one-sample t-test

Degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. For the one-sample t-test, degrees of freedom (df) is computed as one less than the sample size (the denominator of the standard deviation): df = n − 1.

Hypothesis testing with a one-sample t-test

Compute the test statistic:

    t = (x̄ − μ0) / (s / √n)

Make a statistical decision and draw a conclusion:

    |t| ≥ t critical value: reject the null hypothesis
    |t| < t critical value: fail to reject the null hypothesis

One-sample t-test example

You are conducting an experiment to see if a given therapy works to reduce test anxiety in a sample of college students. A standard measure of test anxiety is known to produce a mean of μ = 20. In the sample of 81 students you draw, the mean is 18 with s = 9. Use an alpha level of .05.

Write the hypotheses

H0: The average test anxiety in the sample of college students will not be statistically significantly different from 20. (H0: μ = 20)
H1: The average test anxiety in the sample of college students will be statistically significantly lower than 20. (H1: μ < 20)

Set the criterion

One-tailed α = .05
df = n − 1 = 81 − 1 = 80
t critical value = −1.671

Use the closest and most conservative tabled value if the exact value is not given.

Compute the test statistic (t statistic):

    t = (18 − 20) / (9 / √81) = −2 / 1 = −2

Compare to the criteria and make a decision

The t statistic of −2 exceeds the critical value of −1.671. Reject the null hypothesis and conclude that average test anxiety in the sample of college students is statistically significantly lower than 20, t = −2.0, p < .05.
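
The same decision can be checked in Python from the summary statistics alone; this sketch reproduces the t of −2 and shows the exact critical value for df = 80 (the table value of −1.671 comes from the nearest conservative row, df = 60):

    import numpy as np
    from scipy import stats

    t = (18 - 20) / (9 / np.sqrt(81))       # -2.0
    t_crit = stats.t.ppf(0.05, df=80)       # about -1.664 (exact, df = 80)
    p_one_tailed = stats.t.cdf(t, df=80)    # about .024, so p < .05
    print(t, t_crit, p_one_tailed)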

Procedures
Select Analyze / Compare Means / One-Sample T Test.

Notice that SPSS allows us to specify which confidence interval to calculate. Leave it at 95%. Click Continue and then OK. The output follows.

Notice that descriptive statistics are automatically calculated in the one-sample t-test. Does our t-value agree with the one in the textbook? Look at the confidence interval. Notice that it is not the confidence interval of the mean, but the confidence interval for the difference between the sample mean and the test value we specified.

Hypothesis Testing with a Two-Sample t-test

Independent-samples T-test

An independent-samples t-test is used when you want to compare the mean score, on some continuous variable, for two different groups of subjects.

Summary for the independent-samples t-test

Example of research question:

Is there a significant difference in the mean self-esteem scores for males and females?

What you need: Two variables:
one categorical, independent variable (e.g. males/females); and one continuous, dependent variable (e.g. self-esteem scores).

Assumptions:

The assumptions for this test are those discussed earlier (continuous scale, normality, independence of observations, homogeneity of variance).

Cont.
What it does:

An independent-samples t-test will tell you whether there is a statistically significant difference in the mean scores for the two groups (that is, whether males and females differ significantly in terms of their self-esteem levels). In statistical terms, you are testing the probability that the two sets of scores (for males and females) came from the same population.

Non-parametric alternative: Mann-Whitney Test.
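
A minimal scipy sketch of both the t-test and its non-parametric alternative, on hypothetical self-esteem scores (group sizes, means and spreads are invented for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    males = rng.normal(34, 5, size=180)     # hypothetical male scores
    females = rng.normal(33, 6, size=250)   # hypothetical female scores

    t, p = stats.ttest_ind(males, females)           # assumes equal variances
    u, p_mw = stats.mannwhitneyu(males, females)     # non-parametric alternative
    print(p, p_mw)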

Procedure for the independent-samples t-test

1. From the menu at the top of the screen click on: Analyze, then click on Compare Means, then on Independent-Samples T Test.
2. Move the dependent (continuous) variable (e.g. total self-esteem) into the area labelled Test variable.
3. Move the independent (categorical) variable (e.g. sex) into the section labelled Grouping variable.
4. Click on Define groups and type in the numbers used in the data set to code each group. In the current data file 1=males, 2=females; therefore, in the Group 1 box, type 1; and in the Group 2 box, type 2.
5. Click on Continue and then OK.

The output generated from this procedure is shown below.

Interpretation of output from the independent-samples t-test

Step 1: Checking the information about the groups

In the Group Statistics box SPSS gives you the mean and standard deviation for each of your groups (in this case: male/female). It also gives you the number of people in each group (N). Always check these values first. Do they seem right? Are the N values for males and females correct? Or are there a lot of missing data? If so, find out why. Perhaps you have entered the wrong code for males and females (0 and 1, rather than 1 and 2). Check with your codebook.

Cont.
Step 2: Checking assumptions

Levene's test for equality of variances:

This tests whether the variance (variation) of scores for the two groups (males and females) is the same. The outcome of this test determines which of the t-values that SPSS provides is the correct one for you to use. If your Sig. value is larger than .05 (e.g. .07, .10), you should use the first line in the table, which refers to Equal variances assumed. If the significance level of Levene's test is p=.05 or less (e.g. .01, .001), this means that the variances for the two groups (males/females) are not the same, and your data therefore violate the assumption of equal variance. Don't panic: SPSS provides you with an alternative t-value. You should use the information in the second line of the t-test table, which refers to Equal variances not assumed.
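
In scipy, the "Equal variances not assumed" line corresponds to Welch's t-test, selected with equal_var=False. A sketch on hypothetical group arrays, letting Levene's result pick which version to use:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    males = rng.normal(34, 5, size=180)      # hypothetical scores
    females = rng.normal(33, 6, size=250)    # hypothetical scores

    _, lev_p = stats.levene(males, females)
    # Levene significant (p <= .05): variances unequal, so use Welch's test.
    t, p = stats.ttest_ind(males, females, equal_var=(lev_p > .05))
    print(t, p)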

Cont.
Step 3: Assessing differences between the groups

If the value in the Sig. (2-tailed) column is equal to or less than .05 (e.g. .03, .01, .001), there is a significant difference in the mean scores on your dependent variable for each of the two groups. If the value is above .05 (e.g. .06, .10), there is no significant difference between the two groups. Having established that there is a significant difference, the next step is to find out which set of scores is higher.

Calculating the effect size for the independent-samples t-test

Effect size statistics provide an indication of the magnitude of the differences between your groups (not just whether the difference could have occurred by chance). Eta squared is the most commonly used. Eta squared can range from 0 to 1 and represents the proportion of variance in the dependent variable that is explained by the independent (group) variable. SPSS does not provide eta squared values for t-tests; it can, however, be calculated from the output as eta squared = t² / (t² + (N1 + N2 − 2)).

In this example, the effect size of .006 is very small. Expressed as a percentage (multiply your eta squared value by 100), only .6 per cent of the variance in self-esteem is explained by sex.
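
A quick check of that figure, using the values reported in the write-up below (t = 1.62, N1 + N2 − 2 = 434):

    t, df = 1.62, 434
    eta_sq = t ** 2 / (t ** 2 + df)
    print(round(eta_sq, 3))   # 0.006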

Presenting the results for the independent-samples t-test

The results of the analysis could be presented as follows:

An independent-samples t-test was conducted to compare the self-esteem scores for males and females. There was no significant difference in scores for males (M=34.02, SD=4.91) and females [M=33.17, SD=5.71; t(434)=1.62, p=.11]. The magnitude of the differences in the means was very small (eta squared=.006).

Paired-samples T-test
The paired-samples t-test (also referred to as repeated measures) is used when you have only one group of people (or companies, or machines etc.) and you collect data from them on two different occasions, or under two different conditions. Pre-test/post-test experimental designs are an example of the type of situation where this technique is appropriate. It can also be used when you measure the same person in terms of his/her response to two different questions. In this case, both dimensions should be rated on the same scale (e.g. from 1=not at all important to 5=very important).

Summary for the paired-samples t-test

Example of research question:

Is there a significant change in participants' fear of statistics scores following participation in an intervention designed to increase students' confidence in their ability to successfully complete a statistics course? Does the intervention have an impact on participants' fear of statistics scores?

Cont.
What you need:

One set of subjects (or matched pairs). Each person (or pair) must provide both sets of scores. Two variables: one categorical independent variable (in this case Time, with two different levels: Time 1, Time 2); and one continuous, dependent variable (e.g. Fear of Statistics Test scores) measured on two different occasions, or under different conditions.

Cont.
What it does:

A paired-samples t-test will tell you whether there is a statistically significant difference in the mean scores for Time 1 and Time 2.

Assumptions: The basic assumptions for t-tests. Additional assumption: the difference between the two scores obtained for each subject should be normally distributed. With sample sizes of 30+, violation of this assumption is unlikely to cause any serious problems.

Non-parametric alternative: Wilcoxon Signed Rank Test.

Procedure for the paired-samples t-test

1. From the menu at the top of the screen click on: Analyze, then click on Compare Means, then on Paired-Samples T Test.
2. Click on the two variables that you are interested in comparing for each subject (e.g. fost1: fear of stats time1, fost2: fear of stats time2).
3. With both of the variables highlighted, move them into the box labelled Paired Variables by clicking on the arrow button. Click on OK.

The output generated from this procedure is shown below.
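
The scipy equivalent, on hypothetical Time 1 / Time 2 scores for the same 30 subjects (the drop of a few points is simulated purely for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    time1 = rng.normal(40, 5, size=30)              # hypothetical pre-test scores
    time2 = time1 - rng.normal(2.5, 2, size=30)     # scores drop after intervention

    t, p = stats.ttest_rel(time1, time2)            # paired-samples t-test
    w, p_w = stats.wilcoxon(time1, time2)           # non-parametric alternative
    print(p, p_w)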

Interpretation of output from the paired-samples t-test

Step 1: Determining overall significance

In the table labelled Paired Samples Test you need to look in the final column, labelled Sig. (2-tailed); this is your probability value. If this value is less than .05 (e.g. .04, .01, .001), you can conclude that there is a significant difference between your two scores.

Cont.
Step 2: Comparing mean values

Having established that there is a significant difference, the next step is to find out which set of scores is higher (Time 1 or Time 2). To do this, look in the first printout box, labelled Paired Samples Statistics. This box gives you the Mean scores for each of the two sets of scores.

Calculating the effect size for the paired-samples t-test

For the paired-samples t-test, eta squared can be calculated as t² / (t² + N − 1). Given our eta squared value of .50, we can conclude that there was a large effect, with a substantial difference in the Fear of Statistics scores obtained before and after the intervention.

Caution
Although we obtained a significant difference in the scores before/after the intervention, we cannot say that the intervention caused the drop in Fear of Statistics Test scores. Research is never that simple, unfortunately! There are many other factors that may also have influenced the decrease in fear scores. Wherever possible, the researcher should try to anticipate these confounding factors and either control for them or incorporate them into the research design.

Presenting the results for the paired-samples t-test

A paired-samples t-test was conducted to evaluate the impact of the intervention on students' scores on the Fear of Statistics Test (FOST). There was a statistically significant decrease in FOST scores from Time 1 (M=40.17, SD=5.16) to Time 2 [M=37.5, SD=5.15, t(29)=5.39, p<.0005]. The eta squared statistic (.50) indicated a large effect size.
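
A quick check of the reported effect size using the paired-samples formula, with the values from the write-up above:

    t, n = 5.39, 30
    print(round(t ** 2 / (t ** 2 + n - 1), 2))   # 0.50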

Hypothesis Testing with Analysis of Variance

One-way analysis of variance

In many research situations we are interested in comparing the mean scores of more than two groups. In this situation we use analysis of variance (ANOVA). One-way analysis of variance involves one independent variable (referred to as a factor), which has a number of different levels. These levels correspond to the different groups or conditions. For example, in comparing the effectiveness of three different teaching styles on students' Maths scores, you would have one factor (teaching style) with three levels (e.g. whole class, small group activities, self-paced computer activities). The dependent variable is a continuous variable (in this case, scores on the Maths test).

Cont.
Analysis of variance is so called because it compares the variance (variability in scores) between the different groups (believed to be due to the independent variable) with the variability within each of the groups (believed to be due to chance). An F ratio is calculated, which represents the variance between the groups divided by the variance within the groups. A large F ratio indicates that there is more variability between the groups (caused by the independent variable) than there is within each group (referred to as the error term). A significant F test indicates that we can reject the null hypothesis, which states that the population means are equal.

Cont.
There are two different types of one-way ANOVA:

between-groups analysis of variance, which is used when you have different subjects or cases in each of your groups (this is referred to as an independent groups design); and

repeated-measures analysis of variance, which is used when you are measuring the same subjects under different conditions, or at different points in time (this is also referred to as a within-subjects design).

Planned comparisons and post-hoc comparisons

Planned comparisons
Planned comparisons (also known as a priori comparisons) are used when you wish to test specific hypotheses (usually drawn from theory or past research) concerning the differences between a subset of your groups (e.g. do Groups 1 and 3 differ significantly?). Planned comparisons do not control for the increased risk of Type 1 errors. If there are a large number of differences that you wish to explore, it may be safer to use the alternative approach (post-hoc comparisons), which is designed to protect against Type 1 errors. The other alternative is to apply what is known as a Bonferroni adjustment to the alpha level that you will use to judge statistical significance. This involves dividing the desired alpha level (e.g. .05) by the number of comparisons you intend to make, and using the result as the significance cut-off for each comparison.

Post-hoc comparisons
Post-hoc comparisons (also known as a posteriori comparisons) are used when you want to conduct a whole set of comparisons, exploring the differences between each of the groups or conditions in your study. Post-hoc comparisons are designed to guard against the possibility of an increased Type 1 error due to the large number of different comparisons being made. With small samples this can be a problem, as it can be very hard to find a significant result even when the apparent difference in scores between the groups is quite large. There are a number of different post-hoc tests that you can use, and these vary in terms of their nature and strictness. The assumptions underlying the post-hoc tests also need to be checked before choosing among them.

Post-hoc tests that assume equal variances

Multiple comparison tests AND range tests:
- Tukey's HSD (honestly significant difference) test
- Hochberg's GT2
- Gabriel
- Scheffe (confidence intervals that are fairly wide)

Range tests only:
- Tukey's b (AKA Tukey's WSD (wholly significant difference))
- S-N-K (Student-Newman-Keuls)
- Duncan
- R-E-G-W F (Ryan-Einot-Gabriel-Welsch F test)
- R-E-G-W Q (Ryan-Einot-Gabriel-Welsch range test)

Multiple comparison tests only:
- Bonferroni (don't use with 5 groups or greater)
- Sidak
- Dunnett (compares a control group to the other groups without comparing the other groups to each other)
- LSD (least significant difference)

Post-hoc tests

Fisher's LSD (Least Significant Difference)
This test is the most liberal of all post-hoc tests and its critical t for significance is not affected by the number of groups. This test is appropriate when you have 3 means to compare. It is not appropriate for additional means.

Bonferroni (AKA Dunn's Bonferroni)
This test does not require the overall ANOVA to be significant. It is appropriate when the number of comparisons (c = k(k−1)/2) exceeds the number of degrees of freedom between groups (df = k−1). This test is very conservative and its power quickly declines as c increases. A good rule of thumb is that the number of comparisons (c) be no larger than the degrees of freedom (df).

Newman-Keuls
If there is more than one true null hypothesis in a set of means, this test will overestimate the familywise error rate. It is appropriate to use this test when the number of comparisons exceeds the number of degrees of freedom between groups (df = k−1).

Cont.
Tukey's HSD (Honestly Significant Difference)
This test is perhaps the most popular post-hoc test. It reduces Type I error at the expense of power. It is appropriate to use this test when one desires all the possible comparisons between a large set of means (6 or more means).

Tukey's b (AKA Tukey's WSD (Wholly Significant Difference))
This test strikes a balance between the Newman-Keuls and Tukey's more conservative HSD regarding Type I error and power. Tukey's b is appropriate to use when one is making more than k−1 comparisons, yet fewer than k(k−1)/2 comparisons, and needs more control of Type I error than Newman-Keuls.

Scheffe
This test is the most conservative of all post-hoc tests. Compared to Tukey's HSD, Scheffe has less power when making pairwise (simple) comparisons, but more power when making complex comparisons. It is appropriate to use when making complex comparisons among a set of means.

One-way between-groups ANOVA with post-hoc tests

One-way between-groups analysis of variance is used when you have one independent (grouping) variable with three or more levels (groups) and one dependent continuous variable. The one-way part of the title indicates there is only one independent variable, and between-groups means that you have different subjects or cases in each of the groups.

Summary for one-way between-groups ANOVA with post-hoc tests

What you need: Two variables:

one categorical independent variable with three or more distinct categories. This can also be a continuous variable that has been recoded to give three equal groups (e.g. age group: subjects divided into three age categories, 29 and younger, between 30 and 44, 45 or above). For instructions on how to do this see Chapter 8; and

one continuous dependent variable (e.g. optimism).

Cont.
What it does:

One-way ANOVA will tell you whether there are significant differences in the mean scores on the dependent variable across the three groups. Post-hoc tests can then be used to find out where these differences lie.

Non-parametric alternative: Kruskal-Wallis Test.
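
A minimal scipy sketch of both tests, on three hypothetical optimism-score arrays (group means and spreads loosely echo the example interpreted later):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    g1 = rng.normal(21.4, 4.6, size=145)   # hypothetical group 1 scores
    g2 = rng.normal(22.1, 4.2, size=145)   # hypothetical group 2 scores
    g3 = rng.normal(23.0, 4.5, size=145)   # hypothetical group 3 scores

    f, p = stats.f_oneway(g1, g2, g3)      # one-way between-groups ANOVA
    h, p_kw = stats.kruskal(g1, g2, g3)    # non-parametric alternative
    print(p, p_kw)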

Assumptions
- The population distribution of the response variable in each group is normal.
- The standard deviations of the population distributions for the groups are equal.
- Independent random samples.

(In practice, in a lot of cases, these aren't strictly met, but we do ANOVA anyway.)

Hypotheses
Null hypothesis:

    H0: μ1 = μ2 = μ3

Alternative or research hypothesis:

    Ha: μ1 ≠ μ2 or μ1 ≠ μ3 or μ2 ≠ μ3

Level of significance

The probability of making an error in the decision to reject the null hypothesis. For this test, choose α = 0.05.

Test statistic

    F = (between estimate) / (within estimate) = [BSS / (g − 1)] / [WSS / (N − g)]

    df1 = g − 1
    df2 = N − g

Between estimate calculations

Between estimate of variance:

    s² = Σ ni (ȳi − ȳ)² / (g − 1)

Group | N   | Mean  | Group mean − grand mean | Difference squared | Times N
Walk  | 46  | 10.20 | −3.900                  | 15.210             | 699.660
Drive | 228 | 14.43 | 0.330                   | 0.109              | 24.829
Bus   | 17  | 20.35 | 6.250                   | 39.063             | 664.063
Total | 291 | 14.10 (grand mean)              | BSS                | 1388.552

Divided by g − 1 (between estimate): 694.276

Within estimate calculations

Within estimate of variance:

    s² = Σ (ni − 1) si² / (N − g)

Group | N   | Variance | Contribution to WSS
Walk  | 46  | 46.608   | 2143.965
Drive | 228 | 68.857   | 15699.351
Bus   | 17  | 170.877  | 2904.912
Total | 291 |          | WSS = 20748.228

Divided by N − g (within estimate): 72.042

F statistic calculation
Calculating the F statistic:

    F = (between estimate) / (within estimate) = 694.276 / 72.042 = 9.637

Degrees of freedom
df1 (degrees of freedom in numerator) = number of samples/groups − 1 = 3 − 1 = 2
df2 (degrees of freedom in denominator) = total number of cases − number of groups = 291 − 3 = 288

Table of the F distribution
There are separate tables for each probability:

- df1 (degrees of freedom in numerator) across the top
- df2 (degrees of freedom in denominator) down the side
- values of F in the body of the table

For degrees of freedom not given, use the next lower value.

p-value
Find the F values for degrees of freedom (2, 288); the closest tabled row is (2, 120):

α = 0.05: F = 3.07 (2, 120)
α = 0.01: F = 4.79 (2, 120)
α = 0.001: F = 7.31 (2, 120)

F = 9.637 > F = 7.31 for α = 0.001, so the p-value < 0.001.

Conclusion
The p-value < 0.001 is less than α = 0.05, so we reject the null hypothesis that all the means are equal and conclude that at least one of the means is different from the others.
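
The hand calculation above can be reproduced in Python from the group summaries alone. Note that the "contribution" column in the within-groups table multiplies each variance by n (matching the printed totals); this sketch simply follows the tabulated figures:

    import numpy as np

    n = np.array([46, 228, 17])
    mean = np.array([10.20, 14.43, 20.35])
    grand = 14.10
    var = np.array([46.608, 68.857, 170.877])

    bss = np.sum(n * (mean - grand) ** 2)        # about 1388.55
    wss = np.sum(n * var)                        # 20748.2, as tabulated
    F = (bss / (3 - 1)) / (wss / (291 - 3))      # about 9.64
    print(F)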

Example
Four treatment groups, ten observations each (heights in inches):

Treatment 1: 60, 67, 42, 67, 56, 62, 64, 59, 72, 71
Treatment 2: 50, 52, 43, 67, 67, 59, 67, 64, 63, 65
Treatment 3: 48, 49, 50, 55, 56, 61, 61, 60, 59, 64
Treatment 4: 47, 67, 54, 67, 68, 65, 65, 56, 60, 65

Example
Step 1) Calculate the sum of squares between groups:

Mean for group 1 = 62.0
Mean for group 2 = 59.7
Mean for group 3 = 56.3
Mean for group 4 = 61.4
Grand mean = 59.85

SSB = [(62 − 59.85)² + (59.7 − 59.85)² + (56.3 − 59.85)² + (61.4 − 59.85)²] × n per group = 19.65 × 10 = 196.5

Example
Step 2) Calculate the sum of squares within groups:

SSW = (60 − 62)² + (67 − 62)² + (42 − 62)² + (67 − 62)² + (56 − 62)² + (62 − 62)² + (64 − 62)² + (59 − 62)² + (72 − 62)² + (71 − 62)² + (50 − 59.7)² + (52 − 59.7)² + (43 − 59.7)² + (67 − 59.7)² + (67 − 59.7)² + (59 − 59.7)² + ... (sum of 40 squared deviations) = 2060.6

Step 3) Fill in the ANOVA table

Source of variation | d.f. | Sum of squares | Mean sum of squares | F-statistic | p-value
Between             | 3    | 196.5          | 65.5                | 1.14        | .344
Within              | 36   | 2060.6         | 57.2                |             |
Total               | 39   | 2257.1         |                     |             |
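
The whole worked example can be verified in a few lines with scipy, using the four treatment columns listed above:

    from scipy import stats

    t1 = [60, 67, 42, 67, 56, 62, 64, 59, 72, 71]
    t2 = [50, 52, 43, 67, 67, 59, 67, 64, 63, 65]
    t3 = [48, 49, 50, 55, 56, 61, 61, 60, 59, 64]
    t4 = [47, 67, 54, 67, 68, 65, 65, 56, 60, 65]

    f, p = stats.f_oneway(t1, t2, t3, t4)
    print(f, p)   # about F = 1.14, p = .34, matching the table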

Procedure for one-way between-groups ANOVA with post-hoc tests

1. From the menu at the top of the screen click on: Analyze, then click on Compare Means, then on One-way ANOVA.
2. Click on your dependent (continuous) variable (e.g. Total optimism). Move this into the box marked Dependent List by clicking on the arrow button.
3. Click on your independent, categorical variable (e.g. agegp3). Move this into the box labelled Factor.
4. Click the Options button and click on Descriptive, Homogeneity of variance test, Brown-Forsythe, Welch and Means plot.
5. For Missing values, make sure there is a dot in the option marked Exclude cases analysis by analysis. If not, click on this option once. Click on Continue.
6. Click on the button marked Post Hoc. Click on Tukey.
7. Click on Continue and then OK.

The output is shown below.

Interpretation of output from one-way between-groups ANOVA with post-hoc tests

Descriptives

This table gives you information about each group (number in each group, means, standard deviation, minimum and maximum, etc.). Always check this table first. Are the Ns for each group correct?

Test of homogeneity of variances

- The homogeneity of variance option gives you Levene's test for homogeneity of variances, which tests whether the variance in scores is the same for each of the three groups.
- Check the significance value (Sig.) for Levene's test. If this number is greater than .05 (e.g. .08, .12, .28), you have not violated the assumption of homogeneity of variance.
- If you have violated this assumption, you will need to consult the table in the output headed Robust Tests of Equality of Means. The two tests shown there (Welch and Brown-Forsythe) are preferable when the assumption of homogeneity of variance is violated.

Cont.
ANOVA

This table gives both between-groups and within-groups sums of squares, degrees of freedom, etc. The main thing you are interested in is the column marked Sig. If the Sig. value is less than or equal to .05 (e.g. .03, .01, .001), there is a significant difference somewhere among the mean scores on your dependent variable for the three groups.

Multiple comparisons

You should look at this table only if you found a significant difference in your overall ANOVA, that is, if the Sig. value was equal to or less than .05. The post-hoc tests in this table will tell you exactly where the differences among the groups lie.

Cont.
Means plots

This plot provides an easy way to compare the mean scores for the different groups. Warning: these plots can be misleading. Depending on the scale used on the Y axis (in this case representing Optimism scores), even small differences can look dramatic.

Calculating effect size

The information you need to calculate eta squared, one of the most common effect size statistics, is provided in the ANOVA table (a calculator would be useful here). The formula is: eta squared = sum of squares between groups / total sum of squares. Cohen classifies .01 as a small effect, .06 as a medium effect and .14 as a large effect.
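
With the figures from the four-treatment worked example earlier, the calculation is one line:

    ss_between, ss_total = 196.5, 2257.1     # from the ANOVA table above
    print(round(ss_between / ss_total, 2))   # about .09, a moderate effect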

Warning
In this example we obtained a statistically significant result, but the actual difference in the mean scores of the groups was very small (21.36, 22.10, 22.96). This is evident in the small effect size obtained (eta squared=.02). With a large enough sample (in this case N=435), quite small differences can become statistically significant, even if the difference between the groups is of little practical importance. Always interpret your results carefully, taking into account all the information you have available. Don't rely too heavily on statistical significance; many other factors also need to be considered.

Presenting the results

A one-way between-groups analysis of variance was conducted to explore the impact of age on levels of optimism, as measured by the Life Orientation Test (LOT). Subjects were divided into three groups according to their age (Group 1: 29 or less; Group 2: 30 to 44; Group 3: 45 and above). There was a statistically significant difference at the p<.05 level in LOT scores for the three age groups [F(2, 432)=4.6, p=.01]. Despite reaching statistical significance, the actual difference in mean scores between the groups was quite small. The effect size, calculated using eta squared, was .02. Post-hoc comparisons using the Tukey HSD test indicated that the mean score for Group 1 (M=21.36, SD=4.55) was significantly different from Group 3 (M=22.96, SD=4.49).

Two-way between-groups ANOVA

Two-way means that there are two independent variables, and between-groups indicates that different people are in each of the groups. This technique allows us to look at the individual and joint effects of two independent variables on one dependent variable. The advantage of using a two-way design is that we can test the main effect for each independent variable and also explore the possibility of an interaction effect. An interaction effect occurs when the effect of one independent variable on the dependent variable depends on the level of the other independent variable.

Summary for two-way ANOVA

Example of research question:

What is the impact of age and gender on optimism? Does gender moderate the relationship between age and optimism?

What you need: Three variables: two categorical independent variables (e.g. Sex: males/females; Age group: young, middle, old); and one continuous dependent variable (e.g. total optimism).

Assumptions: the assumptions underlying ANOVA.

Cont.
What it does:

Two-way ANOVA allows you to simultaneously test for the effect of each of your independent variables on the dependent variable and also identifies any interaction effect. For example, it allows you to test for: sex differences in optimism; differences in optimism for young, middle and old subjects; and the interaction of these two variables: is there a difference in the effect of age on optimism for males and females?
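
Outside SPSS, a two-way between-groups ANOVA can be run with statsmodels. A sketch on a simulated data frame; the column names (optimism, sex, agegp3) simply echo this example and the data are random placeholders:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(4)
    df = pd.DataFrame({
        "optimism": rng.normal(22, 4.5, size=435),
        "sex": rng.choice(["male", "female"], size=435),
        "agegp3": rng.choice(["18-29", "30-44", "45+"], size=435),
    })

    # C() marks categorical factors; '*' requests both main effects
    # and the sex-by-age interaction.
    model = ols("optimism ~ C(sex) * C(agegp3)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))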

Details of the variables used in this analysis

Procedure for two-way ANOVA

1. From the menu at the top of the screen click on: Analyze, then click on General Linear Model, then on Univariate.
2. Click on your dependent, continuous variable (e.g. total optimism) and move it into the box labelled Dependent variable.
3. Click on your two independent, categorical variables (sex, agegp3: this is age grouped into three categories) and move these into the box labelled Fixed Factors.
4. Click on the Options button. Click on Descriptive Statistics, Estimates of effect size and Homogeneity tests. Click on Continue.

Cont.
5. Click on the Post Hoc button. From the Factors listed on the left-hand side choose the independent variable(s) you are interested in (this variable should have three or more levels or groups: e.g. agegp3). Click on the arrow button to move it into the Post Hoc Tests for section. Choose the test you wish to use (in this case Tukey). Click on Continue.
6. Click on the Plots button. In the Horizontal box put the independent variable that has the most groups (e.g. agegp3). In the box labelled Separate Lines put the other independent variable (e.g. sex). Click on Add. In the section labelled Plots you should now see your two variables listed (e.g. agegp3*sex).

The output

Interpretation of output from two-way ANOVA

Descriptive statistics

These provide the mean scores, standard deviations and N for each subgroup. Check that these values are correct.

Levene's Test of Equality of Error Variances

This test provides a test of one of the assumptions underlying analysis of variance. The value you are most interested in is the Sig. level. You want this to be greater than .05, and therefore not significant. A significant result (Sig. value less than .05) suggests that the variance of your dependent variable across the groups is not equal. If you find this to be the case in your study, it is recommended that you set a more stringent significance level (e.g. .01) for evaluating the results of your two-way ANOVA.

Cont.
Tests of Between-Subjects Effects

This gives you a number of pieces of information, not necessarily in the order in which you need to check them.

Interaction effects

The first thing you need to do is to check for the possibility of an interaction effect (e.g. that the influence of age on optimism levels depends on whether you are a male or a female). In the SPSS output the line we need to look at is labelled AGEGP3*SEX. To find out whether the interaction is significant, check the Sig. column for that line. If the value is less than or equal to .05, the interaction effect is significant.

Cont.
Main effects

In the left-hand column, find the variable you are interested in (e.g. AGEGP3). To determine whether there is a main effect for each independent variable, check the column marked Sig. next to each variable.

Effect size

The effect size for the agegp3 variable is provided in the column labelled Partial Eta Squared (.018). Using Cohen's (1988) criterion, this can be classified as small (see the introduction to Part Five). So, although this effect reaches statistical significance, the actual difference in the mean values is very small. From the Descriptives table we can see that the mean scores for the three age groups (collapsed for sex) are 21.36, 22.10 and 22.96. The difference between the groups appears to be of little practical significance.

Cont.
Post-hoc tests

Post-hoc tests are relevant only if you have more than two levels (groups) in your independent variable. These tests systematically compare each of your pairs of groups, and indicate whether there is a significant difference in the means of each. SPSS provides these post-hoc tests as part of the ANOVA output.

Cont.
Multiple comparisons

The results of the post-hoc tests are provided in the table labelled Multiple Comparisons. We have requested the Tukey Honestly Significant Difference test, as this is one of the more commonly used tests. Look down the column labelled Sig. for any values less than .05. Significant results are also indicated by a little asterisk in the column labelled Mean Difference.

Cont.
Plots

You will see at the end of your SPSS output a plot of the optimism scores for males and females, across the three age groups. This plot is very useful for allowing you to visually inspect the relationship among your variables.

Additional analyses if you obtain a significant interaction effect

Conduct an analysis of simple effects. This means that you will look at the results for each of the subgroups separately. This involves splitting the sample into groups according to one of your independent variables and running separate one-way ANOVAs to explore the effect of the other variable. Use the SPSS Split File option. This option allows you to split your sample according to one categorical variable and to repeat analyses separately for each group.

Procedure for splitting the sample

1. From the menu at the top of the screen click on: Data, then click on Split File.
2. Click on Organize output by groups.
3. Move the grouping variable (sex) into the box marked Groups based on.
4. This will split the sample by sex and repeat any analyses that follow for these two groups separately.
5. Click on OK.

After splitting the file you then perform a one-way ANOVA. Important: when you have finished these analyses for the separate groups, you must turn the Split File option off.
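
The Split File idea maps directly onto a groupby in pandas: split the sample by sex, then run the one-way ANOVA within each half. A sketch on simulated data with the same hypothetical column names as before:

    import numpy as np
    import pandas as pd
    from scipy import stats

    rng = np.random.default_rng(5)
    df = pd.DataFrame({
        "optimism": rng.normal(22, 4.5, size=300),
        "sex": rng.choice(["male", "female"], size=300),
        "agegp3": rng.choice(["18-29", "30-44", "45+"], size=300),
    })

    for sex, sub in df.groupby("sex"):       # one analysis per sex
        groups = [g["optimism"].to_numpy() for _, g in sub.groupby("agegp3")]
        f, p = stats.f_oneway(*groups)
        print(sex, round(f, 2), round(p, 3))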

Presenting the results from two-way ANOVA

A two-way between-groups analysis of variance was conducted to explore the impact of sex and age on levels of optimism, as measured by the Life Orientation Test (LOT). Subjects were divided into three groups according to their age (Group 1: 18-29 years; Group 2: 30-44 years; Group 3: 45 years and above). There was a statistically significant main effect for age [F(2, 429)=3.91, p=.02]; however, the effect size was small (partial eta squared=.02). Post-hoc comparisons using the Tukey HSD test indicated that the mean score for the 18-29 age group (M=21.36, SD=4.55) was significantly different from the 45+ group (M=22.96, SD=4.49). The 30-44 age group (M=22.10, SD=4.15) did not differ significantly from either of the other groups. The main effect for sex [F(1, 429)=.30, p=.59] and the interaction effect did not reach statistical significance.

Analysis of covariance (ANCOVA)

Analysis of covariance is an extension of analysis of variance that allows you to explore differences between groups while statistically controlling for an additional (continuous) variable. This additional variable (called a covariate) is a variable that you suspect may be influencing scores on the dependent variable. SPSS uses regression procedures to remove the variation in the dependent variable that is due to the covariate(s), and then performs the normal analysis of variance techniques on the corrected or adjusted scores. By removing the influence of these additional variables, ANCOVA can increase the power or sensitivity of the F-test.
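
Because ANCOVA is a regression under the hood, it can be sketched with statsmodels as a linear model with the group factor plus the covariate. All names and numbers here are hypothetical (a pre-test/post-test design with 30 subjects per group):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(6)
    pre = rng.normal(40, 5, size=60)                    # covariate (pre-test)
    group = np.repeat(["intervention", "control"], 30)
    post = (pre - np.where(group == "intervention", 3, 1)
            + rng.normal(0, 2, size=60))
    df = pd.DataFrame({"pre": pre, "group": group, "post": post})

    # Group effect on the post-test, adjusted for the pre-test covariate.
    model = ols("post ~ C(group) + pre", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))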

Uses of ANCOVA
ANCOVA can be used when you have a two-group pre-test/post-test design (e.g. comparing the impact of two different interventions, taking before and after measures for each group). The scores on the pre-test are treated as a covariate to control for pre-existing differences between the groups. This makes ANCOVA very useful in situations when you have quite small sample sizes and only small or medium effect sizes (see the discussion on effect sizes in the introduction to Part Five). Under these circumstances (which are very common in social science research), Stevens (1996) recommends the use of two or three carefully chosen covariates to reduce the error variance and increase your chances of detecting a significant difference.

Cont.
ANCOVA is also handy when you have been unable to randomly assign your subjects to the different groups, and instead have had to use existing groups (e.g. classes of students). As these groups may differ on a number of different attributes (not just the one you are interested in), ANCOVA can be used in an attempt to reduce some of these differences. The use of well-chosen covariates can help reduce the confounding influence of group differences. This is certainly not an ideal situation, as it is not possible to control for all possible differences; however, it does help reduce this systematic bias. The use of ANCOVA with intact or existing groups is a somewhat contentious issue among writers in the field. It would be a good idea to read more widely if you find yourself in this situation. Some of these issues are summarised in Stevens (1996).

Comparing means

1 group:
- N ≥ 30: one-sample t-test
- N < 30, normally distributed: one-sample t-test
- N < 30, not normal: sign test

2 groups, independent:
- N ≥ 30: t-test
- N < 30, normally distributed: t-test
- N < 30, not normal: Mann-Whitney U (Wilcoxon rank-sum) test

2 groups, paired:
- N ≥ 30: paired t-test
- N < 30, normally distributed: paired t-test
- N < 30, not normal: Wilcoxon signed-rank test

3 or more groups, independent:
- Normally distributed, 1 factor: one-way ANOVA
- Normally distributed, 2 factors: two-way ANOVA
- Not normal: Kruskal-Wallis one-way analysis of variance by ranks

3 or more groups, dependent (repeated measures):
- Normally distributed: repeated-measures ANOVA
- Not normal: Friedman two-way analysis of variance by ranks

THE END

Test statistic for testing a single population mean (μ)

    t = (x̄ − μ0) / (s / √n) = (x̄ − μ0) / SE(x̄)

which follows a t-distribution with df = n − 1.

In general, the basic form of a test statistic is:

    t = (estimate − hypothesized value) / SE(estimate)

Post hoc
Once you have determined that differences exist among the means, post-hoc range tests and pairwise multiple comparisons can determine which means differ. Range tests identify homogeneous subsets of means that are not different from each other. Pairwise multiple comparisons test the difference between each pair of means, and yield a matrix where asterisks indicate significantly different group means at an alpha level of 0.05 (SPSS, Inc.).
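
For pairwise comparisons outside SPSS, statsmodels provides Tukey's HSD. A sketch on hypothetical scores and group labels:

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(7)
    scores = np.concatenate([rng.normal(m, 4.5, size=50)
                             for m in (21.4, 22.1, 23.0)])   # hypothetical data
    labels = np.repeat(["18-29", "30-44", "45+"], 50)

    # Prints each pair of groups, the mean difference, and whether
    # the null hypothesis of equal means is rejected at alpha = .05.
    print(pairwise_tukeyhsd(scores, labels, alpha=0.05))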

Single-step tests
Tukey-Kramer: The Tukey-Kramer test is an extension of the Tukey test to unbalanced designs. Unlike the Tukey test for balanced designs, it is not exact: the FWE of the Tukey-Kramer test may be less than alpha. It is less conservative for only slightly unbalanced designs and more conservative when the differences among sample sizes are bigger.

Hochberg's GT2: The GT2 test is similar to Tukey, but the critical values are based on the studentized maximum modulus distribution instead of the studentized range. For balanced or unbalanced one-way ANOVA, its FWE does not exceed alpha. It is usually more conservative than the Tukey-Kramer test for unbalanced designs and it is always more conservative than the Tukey test for balanced designs.

Gabriel: Like the GT2 test, the Gabriel test is based on the studentized maximum modulus. It is equivalent to the GT2 test for balanced one-way ANOVA. For unbalanced one-way ANOVA, it is less conservative than GT2, but its FWE may exceed alpha in highly unbalanced designs.

Dunnett: Dunnett's test is the test to use when the only pairwise comparisons of interest are comparisons with a control. It is an exact test, that is, its FWE is exactly equal to alpha, for balanced as well as unbalanced designs.

Single-step tests
Below are short descriptions of these tests.

LSD: The LSD (Least Significant Difference) test is a two-step test. First the ANOVA F test is performed. If it is significant at level alpha, then all pairwise t-tests are carried out, each at level alpha. If the F test is not significant, the procedure terminates. The LSD test does not control the FWE.

Bonferroni: The Bonferroni multiple comparison test is a conservative test, that is, the FWE is not exactly equal to alpha, but is less than alpha in most situations. It is easy to apply and can be used for any set of comparisons. Even though the Bonferroni test controls the FWE rate, in many situations it may be too conservative and not have enough power to detect significant differences.

Single-step tests
Sidak: Sidak adjusted p-values are also easy to compute. The Sidak test gives slightly smaller adjusted p-values than Bonferroni, but it guarantees strict control of the FWE only when the comparisons are independent.

Scheffe: The Scheffe test is used in ANOVA analysis (balanced, unbalanced, with covariates). It controls the FWE for all possible contrasts, not only pairwise comparisons, and is too conservative in cases when pairwise comparisons are the only comparisons of interest.

Tukey: The Tukey test is based on the studentized range distribution (the standardized maximum difference between the means). For one-way balanced ANOVA, the FWE of the Tukey test is exactly equal to the assumed value of alpha. The Tukey test is also exact for one-way balanced ANOVA with correlated errors when the correlation structure is compound symmetry.

Appendix

Effect size statistics, the most common of which are:

eta squared = SS between / SS total
Cohen's d = (M1 − M2) / SD pooled
Cohen's f = √(eta squared / (1 − eta squared))

ANOVA Table

Source of variation | d.f. | Sum of squares | Mean sum of squares | F-statistic | p-value
Between (k groups) | k−1 | SSB (sum of squared deviations of group means from grand mean) | SSB/(k−1) | [SSB/(k−1)] / [SSW/(nk−k)] | from the F(k−1, nk−k) chart
Within (n individuals per group) | nk−k | SSW (sum of squared deviations of observations from their group mean) | s² = SSW/(nk−k) | |
Total variation | nk−1 | TSS (sum of squared deviations of observations from grand mean); TSS = SSB + SSW | | |

Characteristics of a Normal Distribution

1) Continuous random variable.
2) Bell-shaped curve.
3) The normal curve extends indefinitely in both directions, approaching, but never touching, the horizontal axis as it does so.
4) Unimodal.
5) Mean = median = mode.
6) Symmetrical with respect to the mean. That is, 50% of the area (data) under the curve lies to the left of the mean and 50% lies to the right.
7) (a) 68% of the area (data) under the curve is within one standard deviation of the mean; (b) 95% is within two standard deviations of the mean; (c) 99.7% is within three standard deviations of the mean.
8) The total area under the normal curve is equal to 1.
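
Point 7, the 68-95-99.7 rule, can be checked directly from the standard normal CDF:

    from scipy import stats

    for k in (1, 2, 3):
        area = stats.norm.cdf(k) - stats.norm.cdf(-k)
        print(f"within {k} sd of the mean: {area:.3f}")   # 0.683, 0.954, 0.997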
