
Mikayla Ferenz

Stat 431
Test 4

Executive Summary:
The purpose of this experiment was to test whether the irrigation method, or the plot
that the trees were planted in, had any significant effect on the yield of the orange
harvest. In my exploratory analysis I recognized that the data fit a Randomized Complete
Block Design: the trees were separated into plots, which functioned as our block effect,
and each irrigation method, which became our treatment effect, was randomly assigned
within each plot. I analyzed the data and completed my hypothesis tests accordingly.
There are two hypothesis tests in this experiment: the first tests the significance of our
treatment (irrigation method), and the second tests the significance of our block (plot).
Upon conducting our ANOVA tests, we found an F-statistic of 0.6654 and a p-value of
0.6517 for our test of irrigation method. Since 0.6517 > α (we used a significance level of
α = 0.05), we fail to reject the null hypothesis. For our test of the block effect (plot), we
found an F-statistic of 11.712 and a p-value of 0.001314. Since 0.001314 < α = 0.05, we
reject the null hypothesis. From these two tests, we concluded that there is not sufficient
evidence that irrigation method affects orange harvest; however, the plot of land is
statistically significant, and thus does affect orange harvest yield.

Conclusion:
The implication of this result is that farmers can use whichever irrigation method
is easiest or most cost-effective for them, because we found no evidence that the choice
of method significantly affects orange harvest. The results also open the way for further
experiments. I think the next step should be an experiment that determines which
characteristics of a plot affect orange harvest, since we have already concluded that the
plot of land has a significant effect.
If I were to do the experiment again, I would gather data about the makeup of each
plot of land so that we could analyze which factors in a plot contribute to a better orange
harvest. My knowledge of soil is fairly limited, but that is the data I think researchers
should collect if this experiment were done again.
Further analysis could also include post-hoc comparisons of the different irrigation
methods; however, since the irrigation test was not significant, my own belief is that such
an analysis would be redundant. A more useful analysis might be to compare the
individual plots to determine which plots led to a better orange harvest. Researchers
could then collect more specific data on those plots to determine what characteristics each
plot possesses and how that information could be used to maximize orange harvest yields.
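The plot-by-plot comparison suggested above could be carried out with Tukey's honest significant difference (HSD) procedure. The R sketch below shows one way to do it under stated assumptions; it is not an analysis that was run for this report. It re-enters the raw data from the Raw Data section instead of reading oranges2.xlsx, treats plot as an eight-level factor, fits the additive block-plus-treatment model with aov(), and uses the hypothetical names `oranges`, `rcbd`, and `plot_pairs`.

```r
# Sketch (assumption): Tukey HSD comparisons between plots in an additive RCBD model.
# Data re-entered from the "Raw Data" section; all object names are hypothetical.
weight <- c(450, 358, 331, 317, 479, 245,   # plot 1
            469, 521, 402, 423, 341, 380,   # plot 2
            249, 281, 183, 379, 404, 263,   # plot 3
            125,  58,  70,  63, 115,  62,   # plot 4
            280, 352, 258, 289, 182, 336,   # plot 5
            352, 293, 281, 239, 349, 282,   # plot 6
            221, 283, 219, 269, 276, 171,   # plot 7
            251, 186,  46, 357, 182,  98)   # plot 8
# Within every plot the methods appear in this order in the raw data
methods <- c("trickle", "basin", "spray", "sprinkler", "sprinkler+spray", "flood")
oranges <- data.frame(
  weight     = weight,
  irrigation = factor(rep(methods, times = 8)),
  plot       = factor(rep(1:8, each = 6))    # plot treated as a factor (block)
)

# Additive model: block (plot) + treatment (irrigation), no interaction
rcbd <- aov(weight ~ plot + irrigation, data = oranges)

# All 28 pairwise plot comparisons with family-wise 95% confidence intervals
plot_pairs <- TukeyHSD(rcbd, which = "plot", conf.level = 0.95)$plot

# Show the comparisons ordered from smallest to largest adjusted p-value
print(plot_pairs[order(plot_pairs[, "p adj"]), ])
```

Rows with an adjusted p-value below 0.05 would point to the pairs of plots most worth collecting soil data on.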

What I have learned:


I think the most significant thing that I learned in STAT 431 was to approach
statistics more like a scientist than a mathematician. I am a math major, and before this
class I thought of stats as “math but with more words”. I learned how to analyze data like
a researcher. Ann was right when she said, “stats is not math” on the first day of class. I
think the class made my statistics skills more useful in real-world situations.

Experimental Design:
Randomized Complete Block Design:
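$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$, with $i = 1, \dots, 6$ (irrigation methods) and $j = 1, \dots, 8$ (plots)
$y_{ij}$ - response variable (weight of the orange harvest)
$\mu$ - overall mean
$\alpha_i$ - treatment effect (irrigation method $i$)
$\beta_j$ - block effect (plot $j$)
$\varepsilon_{ij}$ - random error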
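For reference, the textbook RCBD decomposition for this layout, with $a = 6$ irrigation methods, $b = 8$ plots, and one tree per method in each plot, splits the total sum of squares into treatment, block, and error parts. This is standard background under the usual constraints $\sum_i \alpha_i = 0$ and $\sum_j \beta_j = 0$, not a result taken from the fitted models in this report (which test irrigation and plot in separate one-factor fits).

```latex
\begin{aligned}
\sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij}-\bar y_{\cdot\cdot})^2
  &= b\sum_{i=1}^{a}(\bar y_{i\cdot}-\bar y_{\cdot\cdot})^2
   + a\sum_{j=1}^{b}(\bar y_{\cdot j}-\bar y_{\cdot\cdot})^2
   + \sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij}-\bar y_{i\cdot}-\bar y_{\cdot j}+\bar y_{\cdot\cdot})^2 \\
SS_T &= SS_{\text{Trt}} + SS_{\text{Blk}} + SS_E \\
\text{df:}\quad ab - 1 = 47 &= (a-1) + (b-1) + (a-1)(b-1) = 5 + 7 + 35
\end{aligned}
```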

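Hypothesis Tests:
Treatment (irrigation method): $H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \alpha_6 = 0$ versus $H_a$: at least one $\alpha_i \neq 0$
Block (plot): $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = \beta_8 = 0$ versus $H_a$: at least one $\beta_j \neq 0$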
Our hypothesis tests check whether the treatment effect and/or the block effect has a
significant effect on orange harvest. Looking at the ANOVA table for the model
orange.fit4 to test our treatment effect, we have an F-statistic of 0.6654 and a p-value of
0.6517 > 0.05, therefore we fail to reject the null hypothesis. We conclude that there is not
sufficient evidence that irrigation method affects orange harvest yields. Looking at the
ANOVA table for the model orange.fit3 to test our block effect, we have an F-statistic of
11.712 and a p-value of 0.001314 < 0.05, therefore we reject the null hypothesis. We
conclude that the plot the trees are planted in affects their orange harvest yield.
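Each F-statistic quoted above is a ratio of mean squares read directly from the ANOVA tables in the Code section (orange.fit4 for irrigation, orange.fit3 for plot), and each p-value is the upper-tail probability of the corresponding F distribution. As a worked check of the arithmetic:

```latex
\begin{aligned}
F_{\text{irrigation}} &= \frac{MS_{\text{irrigation}}}{MS_E} = \frac{9639.5}{14487.0} \approx 0.6654,
  \qquad p = P(F_{5,42} \ge 0.6654) = 0.6517 \\
F_{\text{plot}} &= \frac{MS_{\text{plot}}}{MS_E} = \frac{133262}{11378} \approx 11.712,
  \qquad p = P(F_{1,46} \ge 11.712) = 0.001314
\end{aligned}
```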

Numerical Summaries:
Plot                  1        2        3        4        5        6        7        8
Mean weight      363.33   422.67   293.17    82.17   282.83   299.33   239.83   186.67
St. dev. weight   87.31    64.39    83.42    29.73    60.73    43.72    43.70   110.29
Above is our table of means and standard deviations of weight by plot. Just by looking at
the table we can see that the means and standard deviations vary quite a bit by plot: for
example, plot 4 averages 82.17 while plot 2 averages 422.67. This suggests that plot may
have a significant effect on weight.
Irrigation method   Basin      Flood      Spray      Sprinkler   Sprinkler+Spray   Trickle
Mean weight         291.500    229.625    223.750    292.000     291.000           299.625
St. dev. weight     134.4778   111.4078   122.2664   110.4329    124.7512          117.1274
Above is our table of means and standard deviations of weight by irrigation method. Just
by looking at the values in the table, you can notice that the means and standard
deviations of weight do not vary much by method: the means range only from 223.750
(spray) to 299.625 (trickle), which is small compared with the within-method standard
deviations of roughly 110 to 134. This suggests that irrigation method may not have an
effect on the harvest yield.
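Both summary tables can be produced with one small helper. This is a minimal R sketch, assuming the raw data are re-entered from the Raw Data section rather than read from oranges2.xlsx; the names `oranges` and `summarize_by` are hypothetical and do not appear in the original analysis.

```r
# Sketch (assumption): mean and sd of weight by plot and by irrigation method.
# Data re-entered from the "Raw Data" section; object names are hypothetical.
weight <- c(450, 358, 331, 317, 479, 245,   # plot 1
            469, 521, 402, 423, 341, 380,   # plot 2
            249, 281, 183, 379, 404, 263,   # plot 3
            125,  58,  70,  63, 115,  62,   # plot 4
            280, 352, 258, 289, 182, 336,   # plot 5
            352, 293, 281, 239, 349, 282,   # plot 6
            221, 283, 219, 269, 276, 171,   # plot 7
            251, 186,  46, 357, 182,  98)   # plot 8
# Within every plot the methods appear in this order in the raw data
methods <- c("trickle", "basin", "spray", "sprinkler", "sprinkler+spray", "flood")
oranges <- data.frame(
  weight     = weight,
  irrigation = factor(rep(methods, times = 8)),
  plot       = factor(rep(1:8, each = 6))
)

# Mean and standard deviation of weight for each level of a grouping factor;
# returns a 2-row matrix (rows "mean" and "sd", one column per level)
summarize_by <- function(group) {
  sapply(split(oranges$weight, group),
         function(w) c(mean = mean(w), sd = sd(w)))
}

by_plot       <- summarize_by(oranges$plot)
by_irrigation <- summarize_by(oranges$irrigation)
print(round(by_plot, 2))
print(round(by_irrigation, 3))
```

Computing both statistics inside one helper keeps the mean and standard-deviation rows tied to the same grouping, so the two rows of each table cannot drift apart.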
Plots:

Boxplots:

Above is the boxplot of weight by irrigation method. The box labeled “basin” has a
noticeably larger spread than the rest of the boxes, while the box labeled “sprinkler” has a
noticeably smaller spread and also has an outlier. The remaining boxes seem to have
roughly constant variance. “Basin” and “sprinkler” may violate our ANOVA condition of
constant variance; however, I do not think the difference is big enough to be a concern,
since the largest standard deviation (134.48 for basin) is only about 1.2 times the smallest
(110.43 for sprinkler). The boxes also look roughly normal: the box labeled “trickle” is
right-skewed, but our other boxes look fairly good. Therefore, based solely on the
boxplots, I think our ANOVA conditions are reasonably met.

Above is the boxplot of weight by plot. Comparing the variances of the boxes, plots 4, 6,
and 7 have smaller variances, while plot 8 has a larger variance; the largest standard
deviation (110.29 for plot 8) is roughly 3.7 times the smallest (29.73 for plot 4). This
could violate our constant-variance ANOVA condition. Some of our boxes also appear to
violate our normality condition: plots 1, 3, 4, and 6 are all right-skewed, although with
only six observations per plot these shapes are hard to judge. Our ANOVA conditions are
not met perfectly according to this plot; however, I think an ANOVA test may still be
useful.
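The constant-variance judgments above come from the boxplots alone. One optional formal companion, swapped in here for illustration and not part of the original analysis, is Bartlett's test from base R (`bartlett.test()`). It assumes normal data within each group, which the skewed boxes call into question, so its p-value should be read alongside the plots rather than instead of them. The sketch re-enters the raw data, and the names `oranges`, `bt_irrigation`, and `bt_plot` are hypothetical.

```r
# Sketch (assumption): Bartlett's test of equal variances across groups.
# Data re-entered from the "Raw Data" section; object names are hypothetical.
weight <- c(450, 358, 331, 317, 479, 245,   # plot 1
            469, 521, 402, 423, 341, 380,   # plot 2
            249, 281, 183, 379, 404, 263,   # plot 3
            125,  58,  70,  63, 115,  62,   # plot 4
            280, 352, 258, 289, 182, 336,   # plot 5
            352, 293, 281, 239, 349, 282,   # plot 6
            221, 283, 219, 269, 276, 171,   # plot 7
            251, 186,  46, 357, 182,  98)   # plot 8
# Within every plot the methods appear in this order in the raw data
methods <- c("trickle", "basin", "spray", "sprinkler", "sprinkler+spray", "flood")
oranges <- data.frame(
  weight     = weight,
  irrigation = factor(rep(methods, times = 8)),
  plot       = factor(rep(1:8, each = 6))
)

# H0: all six irrigation groups share one variance (5 df)
bt_irrigation <- bartlett.test(weight ~ irrigation, data = oranges)
# H0: all eight plots share one variance (7 df)
bt_plot <- bartlett.test(weight ~ plot, data = oranges)

print(bt_irrigation)
print(bt_plot)
```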

Scatterplots:

Above are our scatterplots. The plot on the right is weight by plot, and the one on
the left is weight by irrigation method. As soon as I looked at these plots, I knew
regression analysis was not an option, because both of our predictor variables are
categorical rather than continuous.

Interaction Plot:

Above is the interaction plot of weight by irrigation method and plot. The lines do not
follow a consistent, parallel pattern, which suggests there may be some interaction
between irrigation method and plot in their effect on weight. However, the formal test of
the interaction term in orange.fit1 was not significant (p = 0.957).

Diagnostic Plots:
orange.fit1=lm(weight~plot+irrigation+plot:irrigation)
Above are the diagnostic plots for our first model, which includes our treatment
effect, our block effect, and the interaction term. The residual plot looks fairly random,
and the QQ-plot is fairly straight except for a break in the middle and at the tails.

orange.fit2=lm(weight~plot+irrigation)
Above are the diagnostic plots for our second model. I removed the interaction
term and refit the model without it, because it was not significant (p = 0.957 in the
ANOVA table for orange.fit1). The residual plot looks random and the QQ-plot is straight
except at the tails. This set of plots looks better than the first.

orange.fit3=lm(weight~plot)

Above are the diagnostic plots for our third model, which includes only our block
variable, plot. I removed irrigation because it was not significant in orange.fit2
(p = 0.535), and I wanted to test the significance of our block effect on its own. The
residual plot looks random and the QQ-plot is straight in the middle.

orange.fit4=lm(weight~irrigation)
Above are the diagnostic plots for our fourth model, which includes only irrigation
method. I fit this model to test the significance of our treatment effect. The residual
plot looks random and the QQ-plot looks straight except at the tails.
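One caution about the four models above: the Code section converts only irrigation with as.factor(), so lm() treats plot as a single numeric predictor (1 degree of freedom for plot in every ANOVA table) rather than as an eight-level block. The R sketch below is a hedged illustration of the additive RCBD fit with plot as a factor; its results are not part of this report. It assumes the raw data are re-entered from the Raw Data section, the names `oranges`, `rcbd_fit`, and `rcbd_anova` are hypothetical, and the Shapiro-Wilk test is an added normality check that the original analysis did not use.

```r
# Sketch (assumption): additive RCBD fit with plot as a factor, plus diagnostics.
# Data re-entered from the "Raw Data" section; object names are hypothetical.
weight <- c(450, 358, 331, 317, 479, 245,   # plot 1
            469, 521, 402, 423, 341, 380,   # plot 2
            249, 281, 183, 379, 404, 263,   # plot 3
            125,  58,  70,  63, 115,  62,   # plot 4
            280, 352, 258, 289, 182, 336,   # plot 5
            352, 293, 281, 239, 349, 282,   # plot 6
            221, 283, 219, 269, 276, 171,   # plot 7
            251, 186,  46, 357, 182,  98)   # plot 8
# Within every plot the methods appear in this order in the raw data
methods <- c("trickle", "basin", "spray", "sprinkler", "sprinkler+spray", "flood")
oranges <- data.frame(
  weight     = weight,
  irrigation = factor(rep(methods, times = 8)),
  plot       = factor(rep(1:8, each = 6))    # block as an 8-level factor
)

# No interaction term: with one tree per method in each plot, a full
# plot x irrigation interaction would use up all remaining residual df
rcbd_fit   <- lm(weight ~ plot + irrigation, data = oranges)
rcbd_anova <- anova(rcbd_fit)
print(rcbd_anova)  # Df: plot 7, irrigation 5, Residuals 35

# Residual diagnostics as in the report, plus a Shapiro-Wilk normality test
op <- par(mfrow = c(2, 2))
plot(rcbd_fit)
par(op)
print(shapiro.test(residuals(rcbd_fit)))
```

Because the design is balanced, the irrigation sum of squares is the same as in orange.fit4; only the error term, and therefore the F-test for irrigation, changes once plot is modeled as a block.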

Code:
> library(readxl)  # provides read_excel()
> oranges2 <- read_excel("//client/H$/Desktop/School Work/STAT 431/assignment4/oranges2.xlsx")
> View(oranges2)
> attach(oranges2)
> irrigation<-as.factor(oranges2$irrigation) #make sure irrigation isn't read as numeric

> boxplot(weight~irrigation, main="Weight vs Irrigation", xlab="irrigation type", ylab="weight of harvest")
> boxplot(weight~plot, main="Weight vs Plot", xlab="Plot", ylab="weight of harvest")

> scatter.smooth(weight~plot)
> scatter.smooth(weight~irrigation)

> interaction.plot(oranges2$irrigation, oranges2$plot, oranges2$weight)

> meanbyplot=tapply(oranges2$weight, oranges2$plot, mean, na.rm=T)
> meanbyplot
1 2 3 4 5 6 7
363.33333 422.66667 293.16667 82.16667 282.83333 299.33333 239.83333
8
186.66667

> sdbyplot=tapply(oranges2$weight, oranges2$plot, sd, na.rm=T)
> sdbyplot
1 2 3 4 5 6 7
87.30788 64.39462 83.42282 29.72821 60.73028 43.72032 43.70088
8
110.29355

> meanbyirrigation=tapply(oranges2$weight, oranges2$irrigation, mean, na.rm=T)
> meanbyirrigation
basin flood spray sprinkler
291.500 229.625 223.750 292.000
sprinkler+spray trickle
291.000 299.625
> sdbyirrigation=tapply(oranges2$weight, oranges2$irrigation, sd, na.rm=T)
> sdbyirrigation
basin flood spray sprinkler
134.4778 111.4078 122.2664 110.4329
sprinkler+spray trickle
124.7512 117.1274

> orange.fit1=lm(weight~plot+irrigation+plot:irrigation) #test with interaction term first
> summary(orange.fit1)

Call:
lm(formula = weight ~ plot + irrigation + plot:irrigation)

Residuals:
Min 1Q Median 3Q Max
-245.786 -49.830 1.661 76.964 168.071

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 402.071 88.258 4.556 5.79e-05 ***
plot -24.571 17.478 -1.406 0.168
irrigationflood -79.071 124.816 -0.634 0.530
irrigationspray -48.250 124.816 -0.387 0.701
irrigationsprinkler -73.429 124.816 -0.588 0.560
irrigationsprinkler+spray 22.964 124.816 0.184 0.855
irrigationtrickle 13.750 124.816 0.110 0.913
plot:irrigationflood 3.821 24.717 0.155 0.878
plot:irrigationspray -4.333 24.717 -0.175 0.862
plot:irrigationsprinkler 16.429 24.717 0.665 0.511
plot:irrigationsprinkler+spray -5.214 24.717 -0.211 0.834
plot:irrigationtrickle -1.250 24.717 -0.051 0.960
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 113.3 on 36 degrees of freedom
Multiple R-squared: 0.2966, Adjusted R-squared: 0.0817
F-statistic: 1.38 on 11 and 36 DF, p-value: 0.2242

> anova(orange.fit1)
Analysis of Variance Table

Response: weight
Df Sum Sq Mean Sq F value Pr(>F)
plot 1 133262 133262 10.3869 0.002696 **
irrigation 5 48198 9640 0.7513 0.590587
plot:irrigation 5 13320 2664 0.2076 0.957121
Residuals 36 461873 12830
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> plot(orange.fit1)
> orange.fit2=lm(weight~plot+irrigation) #take out interaction term because it is not significant
> summary(orange.fit2)

Call:
lm(formula = weight ~ plot + irrigation)

Residuals:
Min 1Q Median 3Q Max
-245.00 -52.23 17.63 74.76 172.01

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 394.982 48.786 8.096 4.89e-10 ***
plot -22.996 6.782 -3.391 0.00155 **
irrigationflood -61.875 53.829 -1.149 0.25702
irrigationspray -67.750 53.829 -1.259 0.21529
irrigationsprinkler 0.500 53.829 0.009 0.99263
irrigationsprinkler+spray -0.500 53.829 -0.009 0.99263
irrigationtrickle 8.125 53.829 0.151 0.88076
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 107.7 on 41 degrees of freedom
Multiple R-squared: 0.2763, Adjusted R-squared: 0.1704
F-statistic: 2.609 on 6 and 41 DF, p-value: 0.031

> anova(orange.fit2)
Analysis of Variance Table

Response: weight
Df Sum Sq Mean Sq F value Pr(>F)
plot 1 133262 133262 11.4979 0.001553 **
irrigation 5 48198 9640 0.8317 0.534733
Residuals 41 475193 11590
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> plot(orange.fit2)

> orange.fit3=lm(weight~plot)
> summary(orange.fit3)

Call:
lm(formula = weight ~ plot)

Residuals:
Min 1Q Median 3Q Max
-224.75 -46.26 9.75 73.26 192.26

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 374.732 33.932 11.044 1.58e-14 ***
plot -22.996 6.719 -3.422 0.00131 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 106.7 on 46 degrees of freedom
Multiple R-squared: 0.2029, Adjusted R-squared: 0.1856
F-statistic: 11.71 on 1 and 46 DF, p-value: 0.001314

> anova(orange.fit3)
Analysis of Variance Table

Response: weight
Df Sum Sq Mean Sq F value Pr(>F)
plot 1 133262 133262 11.712 0.001314 **
Residuals 46 523391 11378
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> plot(orange.fit3)

> orange.fit4=lm(weight~irrigation)
> summary(orange.fit4)

Call:
lm(formula = weight ~ irrigation)

Residuals:
Min 1Q Median 3Q Max
-233.50 -63.62 -0.75 65.38 229.50

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 291.500 42.554 6.850 2.38e-08 ***
irrigationflood -61.875 60.181 -1.028 0.310
irrigationspray -67.750 60.181 -1.126 0.267
irrigationsprinkler 0.500 60.181 0.008 0.993
irrigationsprinkler+spray -0.500 60.181 -0.008 0.993
irrigationtrickle 8.125 60.181 0.135 0.893
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 120.4 on 42 degrees of freedom
Multiple R-squared: 0.0734, Adjusted R-squared: -0.03691
F-statistic: 0.6654 on 5 and 42 DF, p-value: 0.6517

> anova(orange.fit4)
Analysis of Variance Table

Response: weight
Df Sum Sq Mean Sq F value Pr(>F)
irrigation 5 48198 9639.5 0.6654 0.6517
Residuals 42 608455 14487.0
> plot(orange.fit4)

Raw Data:
weight irrigation plot
450 trickle 1
358 basin 1
331 spray 1
317 sprinkler 1
479 sprinkler+spray 1
245 flood 1
469 trickle 2
521 basin 2
402 spray 2
423 sprinkler 2
341 sprinkler+spray 2
380 flood 2
249 trickle 3
281 basin 3
183 spray 3
379 sprinkler 3
404 sprinkler+spray 3
263 flood 3
125 trickle 4
58 basin 4
70 spray 4
63 sprinkler 4
115 sprinkler+spray 4
62 flood 4
280 trickle 5
352 basin 5
258 spray 5
289 sprinkler 5
182 sprinkler+spray 5
336 flood 5
352 trickle 6
293 basin 6
281 spray 6
239 sprinkler 6
349 sprinkler+spray 6
282 flood 6
221 trickle 7
283 basin 7
219 spray 7
269 sprinkler 7
276 sprinkler+spray 7
171 flood 7
251 trickle 8
186 basin 8
46 spray 8
357 sprinkler 8
182 sprinkler+spray 8
98 flood 8
