Professional Documents
Culture Documents
Anatoly Arlashin
Boston College
June 9, 2017
1 R’s basics
2 Importing data
3 Subsetting data
4 Graphics in R
2
Anatoly Arlashin Data Analysis in R: An Introduction
/15
Class R-scripts
The list below contains the links to R scripts that I used in our sessions.
3
Anatoly Arlashin Data Analysis in R: An Introduction
/15
Downloading and installing
The two essential things are R itself and the coding environment for it
called RStudio.
www.r-project.org
www.rstudio.com
Installation of both packages is pretty straightforward, the default op-
tions will be sufficient for all our needs.
4
Anatoly Arlashin Data Analysis in R: An Introduction
/15
Online resources
Online courses:
Rstudio Webinars — free video tutorials on R and RStudio
Online learning — a collection of links from RStudio’s authors
5
Anatoly Arlashin Data Analysis in R: An Introduction
/15
UI and coding basics
6
Anatoly Arlashin Data Analysis in R: An Introduction
/15
Importing data
7
Anatoly Arlashin Data Analysis in R: An Introduction
/15
Importing data tutorials
8
Anatoly Arlashin Data Analysis in R: An Introduction
/15
Working directory
# d i p l a y s the c u r r e n t working d i r e c t o r y
getwd ( )
#s e t s t h e w o r k i n g d i r e c t o r y t o ”C : /MSAE/ ”
setwd ( ”C : /MSAE/ ” )
Note that the path your put in setwd("") command must have
forward slashes.
9
Anatoly Arlashin Data Analysis in R: An Introduction
/15
Excel data (*.xls, *.xlsx)
Many ways to import Excel data, we’ll look at the gdata package and
the read.xls() and read.xlsx() functions it provides.
Note: this package requires Perl language interpreter installed on your
system. It’s preinstalled on MacOS and Linux system, while for
Windows systems one can download it here.
# l o a d package
l i b r a r y ( gdata )
# load help documentation
help ( read . x l s )
# r e a d t h e c o n t e n t s o f t h e f i l e ” mydata . x l s ”
# i n t o a new R o b j e c t c a l l e d ” mydata . x l s ”
mydata . x l s <− r e a d . x l s ( ” mydata . x l s ” )
Both functions return a data frame. The default option is to read the
contents of the first sheet in Excel workbook.
10
Anatoly Arlashin Data Analysis in R: An Introduction
/15
Stata data (*.dat)
# l o a d package
l i b r a r y ( readstata13 )
h e l p ( r e a d . dta13 )
# r e a d t h e c o n t e n t s o f t h e f i l e ” mydata . d t a ”
# i n t o a new R o b j e c t c a l l e d ” mydata . d t a ”
mydata . d t a <− r e a d . dta13 ( ” mydata . d t a ” ) # r e a d f i l e
11
Anatoly Arlashin Data Analysis in R: An Introduction
/15
Tabular separated data (*.csv, *.txt)
This is the type of data stored in a raw text format with variables
separated by either symbols (comma, semicolon, etc) or spaces.
We’ll use the build-in functions read.csv() and read.table():
We will follow the first tutorial in the list below in our class. Others
contain shorter and less formal versions of intro to data subsetting in
R.
Advanced R
Quick R
Cookbook for R
13
Anatoly Arlashin Data Analysis in R: An Introduction
/15
ggplot2 tutorials
14
Anatoly Arlashin Data Analysis in R: An Introduction
/15
data.table and dplyr
15
Anatoly Arlashin Data Analysis in R: An Introduction
/15