
13/01/2018 Analysis of Variance (ANOVA) in R: 5 Steps

Analysis of Variance (ANOVA) in R by Pseudopod in software


This is an instructable on how to do an Analysis of Variance test, commonly called
ANOVA, in the statistics software R.

ANOVA is a quick, easy way to identify variables that contribute little to the
explanation of a dependent variable, so that they can be ruled out. It is accessible and
applicable to people outside of the statistics field.

This instructable assumes no prior knowledge of R and gives basic software
commands that may be trivial to an experienced user. If you are familiar with R, I suggest
skipping to Step 4 and proceeding with a known data set already in R.

R is free, open source, and ubiquitous in the statistics field. R is driven entirely by
text commands written in the R language, a dialect of the computer language S. It is
helpful, but by no means necessary, to have an elementary understanding of text-based
computer languages. If you cannot stand working with text-based software, I suggest that
you try the statistics software JMP.

What you need:

-Access to a computer
-A data set to analyze

Estimated time to complete an ANOVA test:

-15 minutes for a new user.
-5 minutes for an experienced user.

Step 1: Getting Started:


If R isn't on your computer already, it can be downloaded for free from the official
website at:
http://cran.r-project.org/bin/windows/base/ (Windows)
http://cran.r-project.org/bin/macosx/ (Mac)
http://cran.r-project.org/ (Linux)
Choose the version (32-bit or 64-bit) that matches your operating system.

Open R:

You will see the basic command console open. This is a log and output of the commands
executed. However, it can't be edited, which makes it hard to work with. Instead, open a
Script window with the following:

File >> New Script

This window acts as a basic text editor (close to Notepad) from which commands
can be written, edited, and executed by right-clicking a line or selection and running it.
Alternatively, the shortcut Control+R will also execute a line or selection.

Note:
You can write comments in R by putting a pound sign (#) at the start of the comment. An
example of this is shown in Step 3.
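
As a small illustration of the comment syntax, everything after a # is ignored by R. The variable names x and x_mean and the numbers are made up for this sketch only:

```r
# A line that starts with a pound sign is ignored entirely.
x <- c(2, 4, 6)    # a comment can also follow a command on the same line
x_mean <- mean(x)  # only the code before the # is executed
x_mean
```

Here <- assigns a value to a name; it does the same job as the = used elsewhere in this instructable.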

Step 2: Reading Data:

.csv is perhaps the most prevalent file type when dealing with data files. .csv files can
be made easily from Excel. Alternatively, you can enter your data directly into R by
naming variables and assigning values to them (see the secondary image for help).

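If you have a .csv file, great! Read it in using one of the following commands, where
dataname is any name you choose (names in R cannot contain spaces):
dataname = read.csv("appropriate web page or file directory") #reads from a URL or file path.
dataname = read.csv(file.choose()) #opens a file browser to pick the file.
Once this is done, explore your data with the following commands:
dim(dataname) #number of rows and columns.
str(dataname) #type of each column.
head(dataname) #first six rows.
attach(dataname) #makes the columns available by name.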
Note:
You will need to run attach, or else R will not find the columns of your data set when
you refer to them by name alone.
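
A minimal sketch of the reading and exploring steps, assuming no .csv file is at hand: it writes R's built-in PlantGrowth data set to a temporary .csv file and reads it back. The names csv_path and plants are placeholders chosen for this example.

```r
# Create a .csv file to practice on, using a data set that ships with R.
csv_path <- tempfile(fileext = ".csv")
write.csv(PlantGrowth, csv_path, row.names = FALSE)

# Read the file back in, exactly as you would with your own .csv file.
plants <- read.csv(csv_path)

dim(plants)   # number of rows and columns
str(plants)   # type of each column
head(plants)  # first six rows

# Without attach(), a column is reached through its data set with $.
mean(plants$weight)
```

With your own data, replace csv_path with the path or URL of your file, or with file.choose().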

Step 3: Running the ANOVA Test:

You’re doing great! You are close to being done with a single independent variable
ANOVA test already.

Run the Analysis of Variance with the following R commands, replacing y and x with the
names of your dependent and independent variables:

name=aov(y~x) #runs the ANOVA test.
ls(name) #lists the items stored by the test.
summary(name) #gives the basic ANOVA output.

The example in the images uses Calories as the dependent variable, y, and a single
independent variable, x (Sugars in this example).

Note:
If R cannot find the variable specified, make sure the spelling, capitalization, and
punctuation match the column name exactly and that you have executed the ‘attach(data)’
command.
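
For readers without a data set ready, here is a sketch of the same commands on R's built-in PlantGrowth data (plant weight under three treatment groups). The name fit is a placeholder, and passing data = to aov is an alternative to attach() that tells R where to look for the columns.

```r
# One-way ANOVA: does mean weight differ between the treatment groups?
fit <- aov(weight ~ group, data = PlantGrowth)

names(fit)    # the items stored by the test
summary(fit)  # the ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
```

The formula reads "weight described by group": the dependent variable goes on the left of the ~ and the independent variable on the right.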

Step 4: More Than One Independent Variable:


The case with multiple independent variables x1, x2, ..., xn is a simple change.

Modify the code so that the independent variables are written as a product, with an
asterisk (*) in between them. The asterisk fits each variable along with all of their
interactions:

name=aov(y~x1*x2*xn)
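
To see what the asterisk does, the sketch below expands a three-variable formula into its terms and then fits it. It uses R's built-in mtcars data as a stand-in (mpg described by wt, hp, and qsec), since the Calories data from the images is not included here; the names f, fit3, and fit_main are placeholders. Note that R labels interactions with a colon in its output.

```r
# A formula with three independent variables joined by *.
f <- mpg ~ wt * hp * qsec

# The terms R will fit: each variable plus every interaction.
attr(terms(f), "term.labels")
# "wt" "hp" "qsec" "wt:hp" "wt:qsec" "hp:qsec" "wt:hp:qsec"

fit3 <- aov(f, data = mtcars)
summary(fit3)

# Using + instead of * keeps the variables but drops the interactions.
fit_main <- aov(mpg ~ wt + hp + qsec, data = mtcars)
summary(fit_main)
```

The number of terms grows quickly with *: three variables already give seven terms, so a small data set may not support many more.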

Step 5: Interpreting the Data:

Let's use the model with multiple independent variables from Step 4.

Here we are trying to describe Calories in terms of Sugars, Calories from Fat,
Protein, and their interactions with each other (Sugars*CalFat, Sugars*Protein,
CalFat*Protein, and Sugars*CalFat*Protein).

Focus on the Pr(>F) column: the probability that F is greater than the value listed
in the previous column. This is often called the p value. In most cases you set
the significance level at alpha=.05; that is, we require the p value to be less
than .05 for a term to be considered statistically significant.
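
Rather than reading the p values off the printed table, they can be pulled out and compared to alpha directly. This is a sketch on the built-in mtcars data, not the Calories example from the images; fit, tab, p_values, and alpha are placeholder names.

```r
fit <- aov(mpg ~ wt * hp, data = mtcars)

# summary() returns a list; its first element is the ANOVA table.
tab <- summary(fit)[[1]]

# Pull out the p value column and label it with the (trimmed) term names.
p_values <- tab[["Pr(>F)"]]
names(p_values) <- trimws(rownames(tab))

alpha <- 0.05
p_values < alpha                # TRUE where a term is significant; NA for Residuals
names(which(p_values < alpha))  # just the names of the significant terms
```

The Residuals row has no F test, so its p value is NA and which() leaves it out.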

Immediately we can see that the terms Sugars, Sugars*CalFat, and
CalFat*Protein are not significant at the .05 level. In contrast, CalFat and
Sugars*CalFat*Protein are the strongest terms, in that order, with p values
much less than .05.

From this we can conclude that if your goal is to describe Calories, you only need
to do a regression on CalFat or potentially Sugars*CalFat*Protein. If you plan to
take more samples and all you care about is predicting or describing Calories,
you now only have to gather Calories from Fat and can forgo gathering all the other
variables.
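
One way to check that dropping terms costs little is to fit the smaller model and compare it with the full one. The sketch below does this on the built-in mtcars data as a stand-in for the Calories example; the names reduced, full, and comparison are placeholders, and using lm with anova for the comparison is a suggestion of this sketch rather than a step shown in the images.

```r
# Smaller model: one independent variable only.
reduced <- lm(mpg ~ wt, data = mtcars)

# Full model: two variables and their interaction.
full <- lm(mpg ~ wt * hp, data = mtcars)

# F test of whether the extra terms in the full model explain more.
comparison <- anova(reduced, full)
comparison

# Share of the variation in mpg explained by each model.
summary(reduced)$r.squared
summary(full)$r.squared
```

A small Pr(>F) in the comparison suggests the extra terms do add explanatory power, in which case the smaller model may be too simple for your purpose.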

Congratulations! You have just completed an ANOVA test and are now able to interpret
your very own data set with it. This handy tool can save you and your company
untold amounts of time, effort and money!
