
Introduction to R: Data Manipulation and Statistical Analysis

Exercise

A. Loading data

1. An experiment was conducted in 2003 to test the effect of 6 NPK treatments
over 2 seasons (T. Aman and Boro). The data are stored in 2 sheets of
NPK.xls. Save the 2 sheets as comma-delimited files, then import the two
files into R.
2. Data from two varietal trials were encoded in two separate files
(TRIAL1.tab and TRIAL2.csv). Import the two files to R.
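
A minimal sketch of the import step using base R's read.csv() and read.delim(). The course files are not reproduced here, so the sketch first writes two tiny stand-in files to a temporary directory; their column names and values are hypothetical and only illustrate the two delimiters.

```r
# Stand-in files written to a temporary directory so the sketch runs anywhere.
# Column names and values are hypothetical, not taken from the course data.
tmp <- tempdir()
csv_file <- file.path(tmp, "TRIAL2.csv")
tab_file <- file.path(tmp, "TRIAL1.tab")

writeLines(c("variety,rep,yield", "V1,1,5.2", "V2,1,4.8"), csv_file)
writeLines(c("variety\trep\tyield", "V1\t1\t5.0", "V2\t1\t4.6"), tab_file)

# read.csv() expects comma-separated values with a header row.
trial2 <- read.csv(csv_file)

# read.delim() expects tab-separated values with a header row.
trial1 <- read.delim(tab_file)

str(trial1)
str(trial2)
```

With the real files, the same two calls would take the actual file paths in place of the temporary ones.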

B. Data Manipulation

1. Using the data frames created in A1:

a. Create a variable for season for both data frames.
b. Combine the two data frames into one.
c. From the combined file, create a new variable for grain yield at 14%
moisture content using the following formula:
 100 − MC   GW 
GY 14 =  ×  × (10000 )
 86   HA 
d. Delete MC, GW, and HA from the file.
e. Create a subset excluding sites Daulatdiar, Purba Abdalpur, and
Chandbil. (Note: in R “not equal to” is “!=”.)
f. Convert the data frame to parallel (wide) format by creating separate
columns for TAman and Boro yield. Drop variety and farmer before
converting.
g. Delete observations with missing yield from either TAman or Boro or
both.
h. Compute means of each trt for yieldTAman and yieldBoro and store
in a data frame.
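
Steps c through e can be sketched on a toy data frame as follows. The column names MC, GW, and HA come from the formula in step c; the column name `site`, the site names SiteA and SiteB, and all numeric values are hypothetical, since the NPK data are not shown here.

```r
# Toy data: MC, GW, HA follow the exercise; the 'site' column name,
# SiteA/SiteB, and all values are hypothetical.
npk <- data.frame(
  site = c("Chandbil", "SiteA", "SiteB"),
  MC   = c(20, 18, 22),
  GW   = c(3.0, 2.5, 2.8),
  HA   = c(5, 5, 5)
)

# Step c: grain yield adjusted to 14% moisture content.
npk$GY14 <- ((100 - npk$MC) / 86) * (npk$GW / npk$HA) * 10000

# Step d: drop the columns that are no longer needed.
npk <- npk[, !(names(npk) %in% c("MC", "GW", "HA"))]

# Step e: exclude the three sites. This is equivalent to chaining
# site != "..." conditions with &.
excluded <- c("Daulatdiar", "Purba Abdalpur", "Chandbil")
npk_sub <- subset(npk, !(site %in% excluded))
```

Using `%in%` keeps the condition short when several sites are excluded; a single site could be dropped with `site != "Chandbil"` alone.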

2. Using the data frames created in A2:

a. In TRIAL1, rename yield to yield1. In TRIAL2, rename yield to yield2.
b. Merge the two data frames.
c. Sort the merged file by variety name and rep.
d. Convert the data frame to serial (long) format by creating a new variable trial.
e. From the output of 2d, compute mean yield for each trial and variety
and store in a data frame.
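
One possible base R route through steps a to e, shown on two toy trials. The column names `variety`, `rep`, and `yield` are assumed from the wording of the exercise, and every value is hypothetical.

```r
# Toy trials; variety names, reps, and yields are hypothetical.
trial1 <- data.frame(variety = c("V2", "V1", "V2", "V1"),
                     rep     = c(1, 1, 2, 2),
                     yield   = c(4.6, 5.0, 4.9, 5.3))
trial2 <- data.frame(variety = c("V1", "V2", "V1", "V2"),
                     rep     = c(1, 1, 2, 2),
                     yield   = c(5.4, 4.7, 5.6, 5.1))

# Step a: rename yield so the two trials stay distinguishable after merging.
names(trial1)[names(trial1) == "yield"] <- "yield1"
names(trial2)[names(trial2) == "yield"] <- "yield2"

# Step b: merge on the shared key columns.
merged <- merge(trial1, trial2, by = c("variety", "rep"))

# Step c: sort by variety, then rep.
merged <- merged[order(merged$variety, merged$rep), ]

# Step d: stack the two yield columns and record which trial each came from.
long <- rbind(
  data.frame(merged[c("variety", "rep")], trial = 1, yield = merged$yield1),
  data.frame(merged[c("variety", "rep")], trial = 2, yield = merged$yield2)
)

# Step e: mean yield for each trial and variety, returned as a data frame.
means <- aggregate(yield ~ trial + variety, data = long, FUN = mean)
```

Stacking with rbind() is used here only because it is easy to read; reshape() in base R is an alternative for step d.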

C. Graphics

1. Using the data frame from B1e:

a. Create a histogram of yield.
b. Create a separate boxplot by Site and by Variety.

2. Using the data in yld ht tn.csv create a scatterplot of Yield and Plant Ht.

3. Using the data in NV.csv:

a. Create a bar graph with error bars of varieties grouped by nrates.
b. Create a bar graph with error bars of nrates grouped by variety.
c. Obtain a multiple line graph of varieties with error bars.

D. Descriptive Statistics

Using the data frame from B1e:

a. Obtain summary statistics of yield, yield by variety, and yield by site.
b. Obtain a Variety x Site summary of means, variances, and standard
deviations of yield using summaryBy() and tapply().
c. Divert the outputs of a and b to a text file using sink().
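
A minimal sketch of the tapply() and sink() parts of this section, on a toy data frame. The column names and values are hypothetical, and the output file name `summary.txt` is an arbitrary choice. summaryBy() is left out because it comes from an add-on package rather than base R.

```r
# Toy data; variety and site labels and yields are hypothetical.
dat <- data.frame(
  variety = rep(c("V1", "V2"), each = 4),
  site    = rep(c("S1", "S2"), times = 4),
  yield   = c(5.0, 5.4, 5.2, 5.8, 4.6, 4.9, 4.4, 5.1)
)

# Variety x Site tables of means, variances, and standard deviations.
cell_mean <- tapply(dat$yield, list(dat$variety, dat$site), mean)
cell_var  <- tapply(dat$yield, list(dat$variety, dat$site), var)
cell_sd   <- tapply(dat$yield, list(dat$variety, dat$site), sd)

# Divert printed output to a text file, then restore the console.
out_file <- file.path(tempdir(), "summary.txt")
sink(out_file)
print(summary(dat$yield))
print(cell_mean)
print(cell_var)
print(cell_sd)
sink()
```

The final sink() call with no arguments matters: without it, later output keeps going to the file instead of the console.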

E. Generate randomization layouts for the following designs:

1. RCB with trt=5 and rep=4
2. RCB with 2 factors A=3, B=5, and rep=3
3. Split plot with MP=5, SP=3, and rep=3
4. Strip plot with HF=4, VF=3, and rep=4
5. Alpha lattice with trt=100, block size=4, and rep=2
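
As a starting point for item 1, an RCB layout can be sketched in base R by shuffling the treatments independently within each replicate. This is only an illustration of the idea, not necessarily the tool the course intends; the seed and the object names are arbitrary.

```r
set.seed(1)  # arbitrary seed so the layout can be reproduced

n_trt <- 5
n_rep <- 4

# Randomize the treatment order independently within each block (rep).
layout <- do.call(rbind, lapply(seq_len(n_rep), function(r) {
  data.frame(rep = r, plot = seq_len(n_trt), trt = sample(seq_len(n_trt)))
}))

# Field-book view: one row per block, one column per plot position,
# each cell holding the treatment number assigned to that plot.
xtabs(trt ~ rep + plot, data = layout)
```

The other designs in this list restrict the randomization further (for example, subplots are randomized within main plots in a split plot), so they need more than one sample() step.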

F. Analysis of variance and mean comparison

1. An experiment was conducted on the effect of 6 times of inoculation of S.
linicola on the oil content of Redwing Flaxseed. The experiment was
conducted using an RCB design with 4 replications. The data is stored in
FLAXSEED.csv. Perform analysis of variance and appropriate mean
comparison. Present mean comparison graphically.

2. A randomized complete block design was used to test the effect of zinc
(Zn) on the length of roots (Length) of several lines (Line). Perform an
analysis of variance and appropriate mean comparison for 5P and 10P
separately. The data are saved in “zinc_rcb.csv”.

3. A factorial experiment was conducted to test the effect of nitrogen rate
(main plot) on tiller number and plant height of rice cultivars (subplot). The
data for the experiment is in “tiller cnt plant ht.csv”. Perform an analysis of
variance and appropriate mean comparison for tiller count only.

4. An experiment to determine the effect of age on lesion types caused by
different isolates in 2 rice varieties was conducted. Several isolates were
collected from farmers' fields in Candelaria, Quezon. The inoculation
experiment was done using a strip-split plot design with age of the plant as
horizontal factor, variety as the vertical factor, and isolate as subplot factor.
For the purpose of this exercise and to make the analysis simpler, we will
only use the data from one isolate. The data is stored in age_variety.csv.
Perform an analysis of variance and appropriate mean comparison for a
strip-plot design with factors age of plant and variety.

G. Regression and correlation analysis

1. Given the following experimental data on rice yield (t/ha), plant height (cm),
and tiller number, compute the simple correlation coefficients among these
variables. The data are stored in “yld ht tn.csv”.

Obs  Yield (t/ha)  Plant ht (cm)  Tiller no.
1    5.75          110.5          14
2    5.94          105.4          16
3    6.01          118.1          15
4    6.54          104.5          18
5    6.73          93.6           15
6    6.75          84.1           18
7    6.90          77.8           18
8    7.86          75.6           19
9    6.56          96.2           17
10   6.40          92.6           14
11   7.92          76.4           19
12   5.60          112.1          13
13   5.81          109.5          14
14   6.33          89.8           17
15   6.95          78.3           18
16   7.25          75.9           19
17   5.50          111.8          15
18   5.88          108.9          14
19   6.86          92.5           16
20   7.46          78.9           19
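
A short sketch of the correlation step, with the 20 observations from the table typed in directly so that it runs without the CSV file. The column names `Yield`, `PlantHt`, and `Tiller` are assumptions; the headers in “yld ht tn.csv” may differ.

```r
# Data transcribed from the 20 observations listed in the exercise.
# Column names are assumed, not taken from the CSV file.
yld <- data.frame(
  Yield   = c(5.75, 5.94, 6.01, 6.54, 6.73, 6.75, 6.90, 7.86, 6.56, 6.40,
              7.92, 5.60, 5.81, 6.33, 6.95, 7.25, 5.50, 5.88, 6.86, 7.46),
  PlantHt = c(110.5, 105.4, 118.1, 104.5, 93.6, 84.1, 77.8, 75.6, 96.2, 92.6,
              76.4, 112.1, 109.5, 89.8, 78.3, 75.9, 111.8, 108.9, 92.5, 78.9),
  Tiller  = c(14, 16, 15, 18, 15, 18, 18, 19, 17, 14,
              19, 13, 14, 17, 18, 19, 15, 14, 16, 19)
)

# Pairwise Pearson correlation matrix for the three variables.
r <- cor(yld)
print(r)

# Significance test for a single pair, here yield versus plant height.
cor.test(yld$Yield, yld$PlantHt)
```

cor() returns the whole matrix at once, while cor.test() handles one pair at a time and adds a p-value and confidence interval.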

2. Air with varying concentrations of CO2 is passed over wheat leaves at a
temperature of 35 °C, and the uptake of CO2 by the leaves is measured.
The concentration (X) and the uptake (Y) obtained for nine leaves are as
follows:

Rate  Uptake (cm³/dm²/hour)
75    0.00
100   0.65
120   1.00
130   0.95
160   1.80
190   2.80
200   2.50
240   4.30
250   4.50

The data are stored in “co2uptak.csv”. Plot the data to determine the
appropriate model to use, then perform a regression analysis of Uptake on
Rate.
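
A minimal sketch of this workflow, with the nine observations typed in directly so that it runs without the CSV file. The column names `Rate` and `Uptake` follow the exercise text; the quadratic model is only one possible alternative to compare against, not a prescribed answer.

```r
# Data transcribed from the nine observations listed in the exercise.
co2 <- data.frame(
  Rate   = c(75, 100, 120, 130, 160, 190, 200, 240, 250),
  Uptake = c(0.00, 0.65, 1.00, 0.95, 1.80, 2.80, 2.50, 4.30, 4.50)
)

# Plot first to judge whether a straight line is adequate.
plot(Uptake ~ Rate, data = co2)

# Simple linear regression of Uptake on Rate.
fit_lin <- lm(Uptake ~ Rate, data = co2)
summary(fit_lin)

# A quadratic term is one possible alternative if the plot shows curvature.
fit_quad <- lm(Uptake ~ Rate + I(Rate^2), data = co2)

# F-test for whether the quadratic term improves on the straight line.
anova(fit_lin, fit_quad)
```

Adding `abline(fit_lin)` after the plot() call overlays the fitted line on the scatterplot, which makes any systematic curvature easier to see.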
