
In this lesson we will see:

Setting the working directory [setwd()].
Basic work with data frames.
Basic display formatting.
Some plot commands.
Getting data from websites.
Sourcing a required chunk from a website.
How to define a function in R.
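
The working directory is where R looks for files given by a relative path, such as the CSV file read later in this lesson. A minimal sketch of inspecting and changing it follows. Using tempdir() as the target is only an assumption for illustration; substitute the folder that holds your data.

```r
# Remember the current working directory so it can be restored later
old.dir <- getwd()

# Switch to another folder (tempdir() is a stand-in for your data folder)
setwd(tempdir())
print(getwd())

# Relative paths are now resolved against the new directory
print(file.exists("HuronLevel.csv"))

# Restore the original working directory
setwd(old.dir)
```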

If you know a function's name and need help with it, use help(<FUNCTION
NAME>). Some functions come with built-in examples, which you can run with
example(<FUNCTION NAME>). If you are not sure of the function name,
try ??<query string>.
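
As a short sketch of these three help facilities, the block below uses mean and the keyword "median" purely as example topics. The results are stored in variables rather than printed; typing h or hits at the prompt displays them.

```r
# Look up the documentation page for a function whose name you know
h <- help("mean")

# Run the examples that ship with that documentation page
example("mean")

# Search the installed documentation when you only know a keyword;
# help.search() is the function behind the ?? shortcut
hits <- help.search("median")
```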
Before we start the lesson, let's download the data file received with this tutorial.
The first script we go through is based on water-level data for
Lake Michigan-Huron.
For the water level analysis we first read in the data.
> huron <- read.csv("HuronLevel.csv")
> str(huron)

'data.frame':   1844 obs. of  2 variables:
$ Date : Factor w/ 1844 levels "01/01/1860","01/01/1861",..: 1 41 81 121 161 201 241 281 32
$ Level: num 177 177 177 177 177 ...
Similar to what we did in the first tutorial, we convert the Factor column to Dates:
> huron$Date <- as.Date(huron$Date,"%m/%d/%Y")
> str(huron)

'data.frame':   1844 obs. of  2 variables:
$ Date : Date, format: "1860-01-01" "1860-02-01" ...
$ Level: num 177 177 177 177 177 ...
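
The same read-and-convert step can be tried without the data file. The sketch below reads three made-up rows from a string instead of HuronLevel.csv. Note that from R 4.0.0 onward read.csv() returns text columns as character rather than Factor unless stringsAsFactors = TRUE is given, so the explicit as.character() call keeps the conversion working in either case.

```r
# Three made-up rows in the same layout as the lake file
csv.text <- "Date,Level\n01/01/1860,177.3\n02/01/1860,177.2\n03/01/1860,177.1"

# stringsAsFactors = TRUE reproduces the Factor column shown above
sample.levels <- read.csv(text = csv.text, stringsAsFactors = TRUE)
str(sample.levels)

# The format string describes the text: month/day/four-digit year
sample.levels$Date <- as.Date(as.character(sample.levels$Date),
                              format = "%m/%d/%Y")
str(sample.levels)
```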
This will be our first attempt at creating a plot (a line plot):
> plot(huron$Date,huron$Level,type="l",
+      xlab="Year",ylab="Height (m)",
+      main="Lake huron - Daily Water Levels")

[Figure: line plot "Lake huron - Daily Water Levels"; x-axis: Year (ticks 1900 to 2000); y-axis: Height (m) (ticks 175.5 to 177.5)]
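
A few commonly used plot() parameters are sketched below on a made-up monthly series (not the lake data). The call to pdf(NULL) only keeps the example from opening a window or writing a file; drop it and the matching dev.off() when working interactively.

```r
# Made-up monthly series standing in for the lake data
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 24)
level <- 176 + 0.3 * sin(seq_along(dates) / 4)

pdf(NULL)  # null device: nothing is written to disk
plot(dates, level,
     type = "b",              # points joined by lines ("l" = lines, "p" = points)
     col  = "blue",           # colour of points and lines
     lwd  = 2,                # line width
     pch  = 16,               # plotting symbol: filled circle
     ylim = c(175.5, 176.5),  # y-axis limits
     xlab = "Year", ylab = "Height (m)",
     main = "Example of plot() parameters")
grid()                        # add a light reference grid
invisible(dev.off())
```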

As an exercise, you can explore the various parameters of the plot function.
Now we find the maximum and minimum water levels for the entire data set and
then find the days whose level lies within 5% of the range from either extreme.
As in C, we have a way to format the display.
> minLevel <- min(huron$Level)
> maxLevel <- max(huron$Level)
> diff <- maxLevel-minLevel
Now we will see the usage of a user-defined function. N.B.: any global variable
can be accessed from inside the function, but not vice versa: variables created
inside the function are not visible outside it.
> levels <- function(base){
+   # TRUE for each row whose level lies within 5% of the range from 'base'
+   A <- abs(huron$Level - base) < (.05 * diff)
+   return(A)
+ }
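
The scoping remark can be checked with a small sketch. The names threshold, is.above and local.note are hypothetical and chosen only for this illustration.

```r
threshold <- 10   # global variable

is.above <- function(x) {
  local.note <- "only exists during the call"  # local variable
  x > threshold   # the global 'threshold' is visible inside the function
}

result <- is.above(c(5, 15))
print(result)

# The local variable is not visible outside the function
visible.outside <- exists("local.note")
print(visible.outside)
```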
Now we call the function for the lower 5% and the upper 5% of the data
> low.levels <- levels(minLevel)
> high.levels <- levels(maxLevel)
> years.low <- subset(huron,low.levels)
> years.high <- subset(huron,high.levels)
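
The selection step relies on a logical vector with one TRUE or FALSE per row. A minimal sketch on a four-row data frame with made-up levels shows the idea, and that subset() with a logical vector picks the same rows as bracket indexing.

```r
# Made-up data: four monthly readings
lake <- data.frame(
  Date  = as.Date(c("2001-01-01", "2001-02-01", "2001-03-01", "2001-04-01")),
  Level = c(175.60, 176.40, 177.50, 175.65)
)

# Range of the levels and a 5% band above the minimum
span <- max(lake$Level) - min(lake$Level)
near.min <- abs(lake$Level - min(lake$Level)) < 0.05 * span
print(near.min)

# Keep only the rows where the logical vector is TRUE
low.rows  <- subset(lake, near.min)
same.rows <- lake[near.min, ]   # equivalent bracket form
print(low.rows)
```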

As an addition we will do some display formatting now.



> sprintf("There are %d days of low lake levels.", nrow(years.low))
[1] "There are 17 days of low lake levels."
> print("The days are:")
[1] "The days are:"
> print (years.low$Date)
 [1] "1926-02-01" "1934-02-01" "1934-03-01" "1964-01-01" "1964-02-01"
 [6] "1964-03-01" "1964-04-01" "1964-11-01" "1964-12-01" "1965-01-01"
[11] "1965-02-01" "1965-03-01" "2012-11-01" "2012-12-01" "2013-01-01"
[16] "2013-02-01" "2013-03-01"

> sprintf("There are %d days of high lake levels.", nrow(years.high))
[1] "There are 8 days of high lake levels."
> print("The days are:")
[1] "The days are:"
> print (years.high$Date)
[1] "1861-08-01" "1876-07-01" "1876-08-01" "1876-09-01" "1886-05-01"
[6] "1886-06-01" "1886-07-01" "1986-10-01"
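
A few more sprintf() format specifiers are sketched below with made-up values: %d for integers, %.2f for a fixed number of decimals, and %s for strings. sprintf() is vectorised, and cat() prints the result without the [1] index and the quotes that print() adds.

```r
n.low <- 17L

# %d inserts an integer
msg <- sprintf("There are %d days of low lake levels.", n.low)

# %.2f rounds to two decimal places
lvl <- sprintf("Mean level: %.2f m", 176.4567)

# Vector arguments give one formatted string per element
both <- sprintf("%s: %5.1f", c("min", "max"), c(175.6, 177.5))

cat(msg, lvl, both, sep = "\n")
```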
Finally, let us see how we can read data directly from an internet source
without downloading the files first.
> stock <- "IBM"
> start.date <- "2011-01-01"
> end.date <- Sys.Date()
> quote <- paste("http://ichart.finance.yahoo.com/table.csv?s=",
+                stock,
+                "&a=", substr(start.date,6,7),
+                "&b=", substr(start.date, 9, 10),
+                "&c=", substr(start.date, 1,4),
+                "&d=", substr(end.date,6,7),
+                "&e=", substr(end.date, 9, 10),
+                "&f=", substr(end.date, 1,4),
+                "&g=d&ignore=.csv", sep="")
> stock.data <- read.csv(quote, as.is=TRUE)
> library(lattice)
> library(chron)
> source("http://blog.revolutionanalytics.com/downloads/calendarHeat.R")
> # Plot as calendar heatmap
> calendarHeat(stock.data$Date, stock.data$Adj.Close, varname=stock)
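
The two ideas in this chunk, building a query URL from pieces and handing it to read.csv(), can be tried offline. The sketch below is only an illustration: the example.com address and its parameter names are hypothetical, and the two data rows are made up. The Yahoo address used above may no longer be served, so a stand-in string is read instead of a live URL.

```r
stock <- "IBM"
start.date <- as.Date("2011-01-01")

# Assemble a query URL from its parts (hypothetical host and parameters)
query <- paste0("https://example.com/table.csv?s=", stock,
                "&month=", format(start.date, "%m"),
                "&day=",   format(start.date, "%d"),
                "&year=",  format(start.date, "%Y"))
print(query)

# read.csv() accepts a URL in place of a file name; here a made-up
# CSV string is parsed so the example runs without a network connection
csv.text <- "Date,Open,Volume\n2011-01-03,147.21,4603800\n2011-01-04,147.56,5060100"
stock.sample <- read.csv(text = csv.text, as.is = TRUE)
str(stock.sample)
```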

Try the following chunk (may be junk!) on your own: (Does it make sense?)
> lm(Volume~Open,data=stock.data)
> a<-lm(Volume~Open,data=stock.data)
> summary(a)
> with(stock.data,plot(Open,Volume,cex=0.3))
> abline(a,col="red")
> par(mfrow=c(2,2))
> plot(a)
> par(mfrow=c(1,1))
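
To judge whether the chunk above makes sense, it may help to run the same steps on data where the answer is known. The sketch below uses made-up data generated from a straight line with slope 2 plus noise, so the fitted slope should come out close to 2. The names toy and fit are hypothetical, and pdf(NULL) only suppresses the plot window.

```r
set.seed(1)

# Made-up data: y depends linearly on x (intercept 1, slope 2) plus noise
x <- 1:100
y <- 1 + 2 * x + rnorm(100, sd = 5)
toy <- data.frame(x = x, y = y)

# Fit the linear model and inspect coefficients and fit quality
fit <- lm(y ~ x, data = toy)
print(summary(fit))

pdf(NULL)                          # null device: nothing is written to disk
with(toy, plot(x, y, cex = 0.3))   # scatter plot of the data
abline(fit, col = "red")           # overlay the fitted line

par(mfrow = c(2, 2))               # 2 x 2 grid for the four diagnostic plots
plot(fit)
par(mfrow = c(1, 1))               # back to a single plot per page
invisible(dev.off())
```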