
Data analysis & Visualization with R

Shree Joshi twitter:@2joshis

Image Credit: waveking1/flickr

What is R?
R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
It includes an effective data handling and storage facility, a suite of operators for calculations on arrays, in particular matrices, a large, coherent, integrated collection of intermediate tools for data analysis, graphical facilities for data analysis and display either on-screen or on hardcopy, and a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

Objects/Symbols
Everything in R is an object
Common Object Types
Vector - collection of objects of the same type
> c(1:10)
[1] 1 2 3 4 5 6 7 8 9 10
> even <- seq(from=2,to=10,by=2)
> odd <- seq(from=1,to=10,by=2)
> ifelse(rep(c(0,1),times=2),odd,even)
> sort(c(even,odd)) * c(1,2)
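The last two expressions depend on coercion and recycling, which the slide does not spell out. A minimal sketch of what they evaluate to, reusing even and odd from above; the names picked and scaled are illustrative only and not part of the original deck:

```r
even <- seq(from = 2, to = 10, by = 2)   # 2 4 6 8 10
odd  <- seq(from = 1, to = 10, by = 2)   # 1 3 5 7 9

# rep(c(0, 1), times = 2) is 0 1 0 1; ifelse() treats 0 as FALSE and 1 as TRUE,
# so positions 1 and 3 come from 'even' and positions 2 and 4 from 'odd'.
picked <- ifelse(rep(c(0, 1), times = 2), odd, even)
print(picked)    # 2 3 6 7

# c(1, 2) is recycled along the sorted vector 1..10:
# odd positions are multiplied by 1, even positions by 2.
scaled <- sort(c(even, odd)) * c(1, 2)
print(scaled)    # 1 4 3 8 5 12 7 16 9 20
```

Recycling is silent when the longer length is a multiple of the shorter one, as it is here; otherwise R still recycles but issues a warning.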

List - collection of dissimilar objects


> address <- list(door=1,street="infinity loop",city="cupertino",dimension=c(1:3))
> address$door
[1] 1
> address$street
[1] "infinity loop"
> address$dimension
[1] 1 2 3
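List elements can also be reached with double or single brackets, and the two behave differently. A short sketch reusing the address list from above; the country field is a hypothetical addition for illustration:

```r
address <- list(door = 1, street = "infinity loop",
                city = "cupertino", dimension = c(1:3))

# '$' and '[[' both return the element itself
address[["city"]]          # "cupertino"
address[[1]]               # 1

# single brackets return a sub-list rather than the element
sub <- address["door"]
class(sub)                 # "list"

# elements can be added or replaced by assignment
address$country <- "USA"   # hypothetical extra field, not in the slides
names(address)
```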

Objects/Symbols
Factors - ordered and unordered, similar to an enum
Matrix - two-dimensional vector of same-type data
data.frame - table with columns of different object types
array
Time series
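A minimal sketch of how these object types might be created in base R. All values (the size labels, the symbols AAA and BBB, the prices) are made up for illustration:

```r
# Factor: categorical values with a fixed set of levels
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size < "large"             # comparisons work because the factor is ordered

# Matrix: a vector with two dimensions, filled column by column
m <- matrix(1:6, nrow = 2)
dim(m)                     # 2 3

# data.frame: columns may have different types
df <- data.frame(symbol = c("AAA", "BBB"), close = c(10.5, 20.25))
str(df)

# array: like a matrix, but with any number of dimensions
a <- array(1:24, dim = c(2, 3, 4))
dim(a)                     # 2 3 4
```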

Operations
Subsetting Data
> c <- (1:10)
> c
[1] 1 2 3 4 5 6 7 8 9 10
> c[c<5]
[1] 1 2 3 4
> c[-(1:2)]
[1] 3 4 5 6 7 8 9 10

Arithmetic
>c*2
[1] 2 4 6 8 10 12 14 16 18 20
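The subsetting forms above can be read as three kinds of index: logical, negative and positive. A small sketch, using the name v instead of c (an illustrative choice, to avoid reusing the name of the c() function):

```r
v <- 1:10            # named v here to avoid shadowing the c() function

mask <- v < 5        # logical vector: TRUE for the elements to keep
v[mask]              # 1 2 3 4

v[-(1:2)]            # negative indices drop positions 1 and 2
v[c(1, 10)]          # positive indices pick positions 1 and 10
v[v %% 2 == 0] * 2   # subsetting and arithmetic combine: 4 8 12 16 20
```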

Operations
> x <- cbind(1:10)
> x
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
> apply(x,2,sum)
[1] 55
> foo <- list(a=1:10,b=11:20)
> lapply(foo,sum)
> sapply(foo,sum)
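The slide shows no output for the lapply and sapply calls. The sketch below shows how their return types differ; the names as_list, as_vec and m are illustrative only:

```r
foo <- list(a = 1:10, b = 11:20)

as_list <- lapply(foo, sum)   # always a list: $a is 55, $b is 155
as_vec  <- sapply(foo, sum)   # simplified to a named vector: a = 55, b = 155

# apply() works over a margin of a matrix: 1 = rows, 2 = columns
m <- cbind(1:3, 4:6)
apply(m, 1, sum)              # row sums: 5 7 9
apply(m, 2, sum)              # column sums: 6 15
```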

Functions/Libraries
> foo <- function(x) { x*x}
> foo
function(x) { x*x}
> foo(2)
[1] 4
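Functions also accept default arguments and work element-wise on vectors. A brief sketch; the function raise and its power parameter are hypothetical names introduced here, not part of the original slides:

```r
# 'raise' and 'power' are hypothetical names added for illustration
raise <- function(x, power = 2) {
  x^power        # the value of the last expression is returned
}

raise(3)             # 9
raise(1:4)           # 1 4 9 16, arithmetic is vectorised
raise(2, power = 3)  # 8

# anonymous functions can be passed directly to the apply family
sapply(1:3, function(i) i * 10)   # 10 20 30
```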

CRAN Package Repository

Accessing Data with R


Reading/Writing Data
CSV/Text Files
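foo <- read.csv("D:/shree/R/rprojects/5el/cm26JUL2012bhav.csv",strip.white=TRUE)

# keep equity rows only; note == (comparison), not = (argument assignment)
foo2 <- subset(foo,SERIES=="EQ",select=c("SYMBOL","OPEN","HIGH","LOW","CLOSE","PREVCLOSE"))
# fractional change from the previous close
foo3 <- cbind(foo2,change=(foo2$CLOSE-foo2$PREVCLOSE)/foo2$PREVCLOSE)
summary(foo3$change)

# bucket the changes into 2% bins and count the rows in each bin
breaks <- seq(from=-0.1,to=0.1,by=0.02)
f <- cut(foo3$change,breaks)
summary(f)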
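The same steps can be tried without the original file. The sketch below reads a few made-up rows that mimic the column names used above (the symbols and prices are invented, not real market data) and also writes the result back out, which the slide title mentions but does not show:

```r
# Made-up rows that mimic the column names used above; not real market data
csv_text <- "SYMBOL,SERIES,OPEN,HIGH,LOW,CLOSE,PREVCLOSE
AAA,EQ,100,105,99,103,100
BBB,EQ,50,51,47,47.5,50
CCC,BE,10,11,9,10,10
DDD,EQ,200,212,199,210,200"

bhav <- read.csv(text = csv_text, strip.white = TRUE)

# keep only the equity series and the columns needed for the change
eq <- subset(bhav, SERIES == "EQ",
             select = c("SYMBOL", "CLOSE", "PREVCLOSE"))

# fractional change from the previous close
eq$change <- (eq$CLOSE - eq$PREVCLOSE) / eq$PREVCLOSE

# bucket the changes into 2% bins and count rows per bin
breaks <- seq(from = -0.1, to = 0.1, by = 0.02)
bins <- cut(eq$change, breaks)
table(bins)

# write the result back out; tempfile() avoids touching real paths
out <- tempfile(fileext = ".csv")
write.csv(eq, out, row.names = FALSE)
```

Values outside the range of breaks become NA in the result of cut(), so it is worth checking the summary of the change column before choosing the bin limits.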

Databases, Excel, Rcpp
Web - readHTMLTable(), XML, JSON

Graphing
plot(foo3$change,col='seagreen')
hist(foo3$change,breaks=50,col='seagreen')
plot(foo3$change,col='seagreen',type='h')
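To try these plots without the original data, the sketch below uses random numbers as a stand-in for the change column (an assumption made purely for illustration). pdf(NULL) is used so the code runs without a display; in an interactive session the pdf(NULL) and dev.off() lines can be dropped.

```r
set.seed(1)                      # reproducible stand-in data
change <- rnorm(200, mean = 0, sd = 0.02)

pdf(NULL)                        # draw without opening a window or writing a file
plot(change, col = "seagreen")                     # value against index
h <- hist(change, breaks = 50, col = "seagreen")   # breaks is only a suggestion
plot(change, col = "seagreen", type = "h")         # one vertical line per value
dev.off()

sum(h$counts)                    # 200: every observation falls in some bin
```

hist() returns its bin boundaries and counts invisibly, so the same object can be inspected or reused after plotting.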

Time series
Sequence of ordered data points in time
Regularly spaced
Irregularly spaced

ts - regularly spaced time series
mts - multiple regularly spaced time series
its - irregularly spaced time series
timeSeries - default for Rmetrics packages
fts - R interface to tslib (C++ time series library)
zoo - reg/irreg and arbitrary time stamp classes
xts - an extension of the zoo class
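Of the classes listed, ts ships with base R, so it can be tried without installing anything. A minimal sketch with twelve made-up monthly values; zoo, xts and the others need their packages installed and are not shown here:

```r
# Twelve made-up monthly values starting January 2012
monthly <- ts(c(5, 7, 6, 8, 9, 11, 10, 12, 13, 12, 14, 15),
              start = c(2012, 1), frequency = 12)

frequency(monthly)                 # 12 observations per year
start(monthly)                     # 2012 1

# extract June to August 2012
summer <- window(monthly, start = c(2012, 6), end = c(2012, 8))
summer                             # 11 10 12

diff(monthly)                      # month-over-month change, length 11
```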

Packages for Financial Analysis

Slide Credit: Guy Yollin

Blotter Flow

Slide Credit: Guy Yollin

QuantStrat

Slide Credit: Guy Yollin

Acknowledgements
Guy Yollin
R Cookbook
R in a Nutshell
