ONE-WAY ANOVA
www.biology.ed.ac.uk/archive/jdeacon/statistics/tress6.html
6/8/12
Step 3. [A check for equal variance - the underlying assumption of this test] For each column divide Σd² by n-1 to obtain the variance, s². Divide the highest value of s² by the lowest value of s² to obtain a variance ratio (F). Then look up a table of Fmax for the number of treatments in our table of data and the degrees of freedom (number of replicates per treatment - 1). If our variance ratio does not exceed the Fmax value then we are safe to proceed. If it does exceed the Fmax value, the data might need to be transformed.
Step 4. Sum all the values of Σx² and call the sum A.
Step 5. [This step is missing here. Steps 9 and 10 use a quantity B; they are consistent with B being obtained as follows: for each column, square Σx and divide by n, then sum these values and call the sum B.]
Step 6. Sum all the values for Σx to obtain the grand total.
Step 7. Square the grand total and divide it by the total number of observations; call this D.
Step 8. Calculate the Total sum of squares (S of S) = A - D
Step 9. Calculate the Between-treatments sum of squares = B - D
Step 10. Calculate the Residual sum of squares = A - B [This is sometimes called the Error sum of squares]
Step 11. Construct a table as follows, where *** represents items to be inserted, and where u = number of treatments and v = number of replicates.

Source of variance    Sum of squares (S of S)   Degrees of freedom (df)   Mean square (= S of S / df)
Between treatments    ***                       u - 1                     ***
Residual              ***                       u(v-1)                    ***
Total                 ***                       (uv) - 1
[The total df is always one fewer than the total number of data entries]
Step 12. Using the mean squares in the final column of this table, do a variance ratio test to obtain an F value: F = Between treatments mean square / Residual mean square
Step 13. Go to a table of F (p = 0.05) and read off the value where n1 is the df of the between treatments mean square and n2 is the df of the residual mean square. If the calculated F value exceeds the tabulated value there is a significant difference between treatments. If so, then look at the tabulated F values for p = 0.01 and then 0.001, to see if the treatment differences are more highly significant.
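The numbered steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_way_anova` and the demonstration data are hypothetical (not taken from this page), every column must have the same number of replicates, and B is computed as the sum of (Σx)²/n over the columns, which is what Steps 9 and 10 require.

```python
def one_way_anova(columns):
    """Follow the numbered steps for treatment columns of equal size."""
    u = len(columns)        # number of treatments
    v = len(columns[0])     # number of replicates per treatment
    if any(len(col) != v for col in columns):
        raise ValueError("this sketch assumes equal replicates in every column")

    sum_x = [sum(col) for col in columns]                   # Σx per column
    sum_x2 = [sum(x * x for x in col) for col in columns]   # Σx² per column

    # Step 3: s² = Σd²/(n-1), where Σd² = Σx² - (Σx)²/n.
    # Assumes no column has zero variance (otherwise the ratio is undefined).
    variances = [(sx2 - sx * sx / v) / (v - 1) for sx, sx2 in zip(sum_x, sum_x2)]
    variance_ratio = max(variances) / min(variances)  # compare with tabulated Fmax

    a = sum(sum_x2)                        # Step 4
    b = sum(sx * sx / v for sx in sum_x)   # B, as used in Steps 9 and 10
    d = sum(sum_x) ** 2 / (u * v)          # Steps 6 and 7

    between_ss = b - d                     # Step 9
    residual_ss = a - b                    # Step 10
    between_ms = between_ss / (u - 1)
    residual_ms = residual_ss / (u * (v - 1))
    return {
        "variance_ratio": variance_ratio,
        "total_ss": a - d,                 # Step 8
        "between_ss": between_ss,
        "residual_ss": residual_ss,
        "between_df": u - 1,
        "residual_df": u * (v - 1),
        "between_ms": between_ms,
        "residual_ms": residual_ms,
        "F": between_ms / residual_ms,     # Step 12
    }


# Hypothetical data, chosen only so that the arithmetic is easy to follow.
demo = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(demo)
```

The function stops at Step 12; the comparison with tabulated Fmax and F values (Steps 3 and 13) still has to be made against the tables.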
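using the same types of procedure, but for more than 2 samples. If you want to convince yourself of this, then try doing the Analysis of Variance for just two samples (e.g. Bacterium A and Bacterium B). You will get exactly the same result as in a t-test.

Analysis of variance: worked example

Replicate           Bacterium A   Bacterium B   Bacterium C
1                   12            ...           ...
2                   15            ...           ...
3                   9             ...           ...
Σx                  36            62            117
n                   3             3             3
Mean                12            20.67         39
Σx²                 450           1290          4589
(Σx)²/n             432           1281.33       4563
Σd²                 18            8.67          26
s² (= Σd²/(n-1))    9             4.33          13

[For Bacterium B and Bacterium C, the values of Σx, n, the mean and s² are those reported in the Excel summary later on this page, and the other entries follow from them; the individual replicate values are not shown.]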
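The claim that a two-sample Analysis of Variance gives the same result as a t-test can be checked numerically: the F value equals t squared. The sketch below is an illustration only. It assumes equal n, and it works from summary values rather than raw data: the mean and variance of Bacterium A from the worked example, and a sum of 62 and variance of 4.333 for the second sample as reported in the Excel summary on this page. All variable names are hypothetical.

```python
import math

n = 3                            # replicates per sample
mean_a, var_a = 12.0, 9.0        # Bacterium A: mean and s²
mean_b, var_b = 62 / 3, 13 / 3   # second sample: sum 62, s² = 4.333

# t-test with equal n: sd² = s1²/n + s2²/n, t = difference of means / sd
sd2 = var_a / n + var_b / n
t = (mean_b - mean_a) / math.sqrt(sd2)

# One-way ANOVA on the same two samples
grand_mean = (mean_a + mean_b) / 2
between_ss = n * ((mean_a - grand_mean) ** 2 + (mean_b - grand_mean) ** 2)
between_ms = between_ss / (2 - 1)            # df = u - 1 = 1
residual_ss = (n - 1) * var_a + (n - 1) * var_b
residual_ms = residual_ss / (2 * (n - 1))    # df = u(v-1) = 4
f_value = between_ms / residual_ms

print(f"t squared = {t * t:.4f}, F = {f_value:.4f}")
```

Both quantities come out the same (about 16.9), and the tabulated F for 1 and 4 df is the square of the tabulated t for 4 df, so the two tests lead to the same conclusion.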
Fmax test: F = 13/4.33 = 3.00. This is lower than the Fmax of 87.5 (for 3 treatments and 2 df, at p = 0.05) so the variances are homogeneous and we can proceed with analysis of variance. If our value exceeded the tabulated Fmax then we would need to transform the data.
D = (Grand total)² ÷ total observations = 215² ÷ 9 = 5136.1
Total sum of squares (S of S) = A - D = 1192.9
Between-treatments S of S = B - D = 1140.2
Residual S of S = A - B = 52.7

Source of variance    Sum of squares (S of S)   Degrees of freedom*   Mean square (= S of S ÷ df)
Between treatments    1140.2                    u - 1 (= 2)           570.1
Residual              52.7                      u(v-1) (= 6)          8.78
Total                 1192.9                    (uv) - 1 (= 8)

[* For u treatments (3 in our case) and v replicates (3 in our case); the total df is one fewer than the total number of data values in the table (9 values in our case)]
F = Between treatments mean square / Residual mean square = 570.1 / 8.78 = 64.93
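The figures in the worked example can be reproduced from the per-treatment counts, sums and variances alone (the values listed in the Excel summary on this page), because Σx² = Σd² + (Σx)²/n and Σd² = (n-1)s². The following is a minimal sketch under that assumption; the variable names are hypothetical.

```python
counts = [3, 3, 3]
sums = [36, 62, 117]            # Σx for each treatment
variances = [9, 13 / 3, 13]     # s² for each treatment (9, 4.333333, 13)

# Variance ratio for the Fmax check
variance_ratio = max(variances) / min(variances)

# Recover Σx² for each treatment from s² and Σx
sum_x2 = [(n - 1) * s2 + sx * sx / n for n, sx, s2 in zip(counts, sums, variances)]

a = sum(sum_x2)                                      # sum of all Σx²
b = sum(sx * sx / n for n, sx in zip(counts, sums))  # sum of all (Σx)²/n
d = sum(sums) ** 2 / sum(counts)                     # (grand total)² / observations

total_ss = a - d
between_ss = b - d
residual_ss = a - b

u = len(counts)                                      # treatments
between_ms = between_ss / (u - 1)
residual_ms = residual_ss / (sum(counts) - u)        # df = u(v-1) = 6
f_value = between_ms / residual_ms

print(f"Variance ratio = {variance_ratio:.2f}")
print(f"A = {a:.1f}, B = {b:.1f}, D = {d:.1f}")
print(f"Total = {total_ss:.1f}, Between = {between_ss:.1f}, Residual = {residual_ss:.1f}")
print(f"Mean squares = {between_ms:.1f}, {residual_ms:.2f}; F = {f_value:.2f}")
```

Working with unrounded mean squares gives F = 64.95; the 64.93 in the text comes from dividing the rounded values 570.1 and 8.78.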
The tabulated value of F (p = 0.05), where n1 is the df of the between treatments mean square (2) and n2 is the df of the residual mean square (6), is 5.1. Our calculated F value exceeds this and even exceeds the tabulated F value for p = 0.001 (F = 27.0). So there is a very highly significant difference between treatments.
[Note that the term "mean square" in an Analysis of Variance is actually a variance - it is calculated by dividing the sum of squares by the degrees of freedom. In a t-test we would call it s², obtained by dividing Σd² by n-1. Analysis of Variance involves the partitioning of the total variance into (1) variance associated with the different treatments/samples and (2) random variance, evidenced by the variability within the treatments. When we calculate the F value, we ask, in effect, "is there a large amount of variance associated with the different treatments compared with the amount of random variance?".]
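When the between-treatments df is exactly 2, as here, the tabulated F values can be checked without a table, because the upper tail of the F distribution then has a simple closed form: P(F > f) = (1 + 2f/n2)^(-n2/2). The sketch below relies on that special case only and does not apply to other values of n1; the function names are hypothetical.

```python
def f_upper_tail_n1_2(f, n2):
    """P(F > f) for an F distribution with 2 and n2 degrees of freedom."""
    return (1 + 2 * f / n2) ** (-n2 / 2)


def f_critical_n1_2(p, n2):
    """Value of F exceeded with probability p, for 2 and n2 degrees of freedom."""
    return (n2 / 2) * (p ** (-2 / n2) - 1)


n2 = 6  # df of the residual mean square
for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: tabulated F = {f_critical_n1_2(p, n2):.2f}")

# Probability of an F value at least as large as the unrounded calculated F
p_value = f_upper_tail_n1_2(64.94937, n2)
print(f"P-value = {p_value:.2e}")
```

For 2 and 6 df this gives about 5.14 at p = 0.05 and 27.0 at p = 0.001, in line with the values quoted above.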
In the t-test, we calculate sd² (the variance of the difference between two means) as follows:
sd² = s1²/n1 + s2²/n2
In the analysis of variance, s² for each treatment is assumed to be the same, and if n for each treatment is the same, then we could compare any two means by calculating sd² as follows:
sd² = 2 x residual mean square / n
We can then find sd as the square root of sd² and calculate t as:
t = (mean 1 - mean 2) / sd
If we did this for two particular means, we could compare the calculated t with that in a t-table, using the df of the residual mean square (because this reflects the residual variance in the whole experiment).
There is a simpler way of doing this for any two means:
if t = (mean 1 - mean 2) / sd, then mean 1 - mean 2 = t x sd
In other words, any two means would be significantly different from one another if they differ by more than "t multiplied by sd". So t(sd) represents the least significant difference (LSD) between any two means.
In scientific papers you might see data presented as follows:

Bacterium       1     2      3     5% LSD
Biomass (mg)    12    20.7   39    5.92
Here the author would be giving us the means for the 3 treatments (bacteria) and telling us that analysis of variance was used to find the least significant difference between any of the means at p = 0.05 (the level of probability chosen for the t value).
In fact, the table above uses the data for bacterial biomass in our worked example. For 5% LSD, we find sd² (= 2 x residual mean square / n). It is 17.56 / 3 = 5.85. We square root this to find sd = 2.42. The tabulated value of t for 6 df (of the residual mean square) is 2.45 (p = 0.05). So the 5% LSD is t(sd) = 2.45 x 2.42 = 5.92.
Our table of data indicates that each bacterium produced a significantly different biomass from every other one.
A word of caution: We can be much more confident about significant difference between bacteria 1 and 3 or between bacteria 2 and 3 than we can about the difference between bacteria 1 and 2. Remember that every time we make such a comparison we run the risk of 5% error. But if we had used the t value for p = 0.01 then we could more safely make five comparisons and still have only a 1 in 20 chance of being wrong.
Statisticians recommend that the LSD should never be used indiscriminately, but only to test comparisons between treatments that we "nominated" when designing the experiment. For example, each treatment might be compared with a control, but each treatment should not necessarily be compared with each other treatment.
Method 2. Many people now use variants of the LSD, such as a Multiple Range Test, which enables us more safely to compare any treatments in a table. This test is far preferable to the LSD. It is explained separately on another page.
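The 5% LSD calculation for the worked example can be sketched as follows. This is an illustration only: it uses the unrounded residual mean square and treatment means from the Excel output on this page, and it takes the tabulated t for 6 df at p = 0.05 as 2.447 (2.45 when rounded, as in the text). The variable names are hypothetical, and the caution about making only nominated comparisons still applies.

```python
import math
from itertools import combinations

residual_ms = 8.777778   # residual (within groups) mean square
n = 3                    # replicates per treatment
t_tabulated = 2.447      # t for 6 df at p = 0.05, read from a t-table

sd2 = 2 * residual_ms / n    # sd² = 2 x residual mean square / n
sd = math.sqrt(sd2)
lsd = t_tabulated * sd       # least significant difference

means = {"Bacterium 1": 12.0, "Bacterium 2": 20.66667, "Bacterium 3": 39.0}

print(f"sd² = {sd2:.2f}, sd = {sd:.2f}, 5% LSD = {lsd:.2f}")

# A pair of means differs significantly if the difference exceeds the LSD
exceeds_lsd = {}
for first, second in combinations(means, 2):
    difference = abs(means[first] - means[second])
    exceeds_lsd[(first, second)] = difference > lsd
    print(f"{first} vs {second}: difference = {difference:.2f}, "
          f"exceeds LSD: {exceeds_lsd[(first, second)]}")
```

All three pairwise differences exceed the LSD, which is what the table of means shows.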
Anova: Single Factor

SUMMARY
Groups      Count   Sum   Average    Variance
Column 1    3       36    12         9
Column 2    3       62    20.66667   4.333333
Column 3    3       117   39         13

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        1140.222   2    570.1111   64.94937   8.61E-05   5.143249
Within Groups         52.66667   6    8.777778
Total                 1192.889   8
Note: There is always a danger in using a statistical package, because the package does whatever we tell it to do. It does not "think" or "consider" whether what we ask it to do is legitimate. For example, it does not test for homogeneity of variance. BEWARE!