Introducing

Revolution R Open
The Enhanced R Distribution
November 12, 2014

In today's webinar:
R Update
Revolution R Open
The Reproducible R Toolkit
MRAN
Other open-source projects
DeployR Open
ParallelR
RHadoop
Revolution R Plus
Q&A

David Smith
Chief Community Officer
Revolution Analytics
@revodavid
david@revolutionanalytics.com

Editor, blog.revolutionanalytics.com
Co-author, Introduction to R

OUR COMPANY

The leading provider
of advanced analytics
software and services
based on open source R,
since 2007

OUR PRODUCT

REVOLUTION R: The
enterprise-grade predictive
analytics application platform
based on the R language

SOME KUDOS

Visionary
Gartner Magic Quadrant
for Advanced Analytics
Platforms, 2014

What is R?
Most widely used data analysis software
Used by 2M+ data scientists, statisticians and analysts

Most powerful statistical programming language


Flexible, extensible and comprehensive for productivity

Create beautiful and unique data visualizations


As seen in New York Times, Twitter and Flowing Data

Thriving open-source community


Leading edge of analytics research

Fills the Data Science talent gap


New graduates prefer R

www.revolutionanalytics.com/what-is-r

Poll #1
What software do you use for statistical analysis? (Select all that apply.)

R
SAS
SPSS
Python
Other

R's popularity is growing rapidly


More at blog.revolutionanalytics.com/popularity

R Usage Growth
Rexer Data Miner Survey, 2007-2013

Language Popularity
IEEE Spectrum Top Programming Languages, July 2014
#9: R


Revolution R Open is:


Enhanced Open Source R distribution
Compatible with all R-related software
Multi-threaded for performance
Focus on reproducibility
Open source (GPLv2 license)
Available for Windows, Mac OS X, Ubuntu,
Red Hat and OpenSUSE
Download from
mran.revolutionanalytics.com

Multi-threaded performance
Intel MKL replaces standard
BLAS/LAPACK algorithms
Pipelined operations
Optimized for Intel, works for all archs

High-performance algorithms
Sequential → Parallel
Uses as many threads as there are
available cores
Control with:
setMKLthreads(<value>)

No need to change any R code


Included in RRO binary distribution
More at Revolutions blog
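
As a rough illustration of the kind of code that benefits, the sketch below times a dense matrix multiplication, an operation R hands to the BLAS library. It uses base R only; the call to setMKLthreads() is guarded because that function is only defined in Revolution R Open, and the matrix size and thread count are arbitrary choices for illustration.

```r
# Matrix size is an arbitrary illustrative choice.
n <- 500
set.seed(42)
a <- matrix(rnorm(n * n), nrow = n)
b <- matrix(rnorm(n * n), nrow = n)

# setMKLthreads() exists only in Revolution R Open; skip it elsewhere.
if (exists("setMKLthreads")) {
  setMKLthreads(2)  # hypothetical thread count
}

# %*% is dispatched to BLAS, so the R code is the same with or without MKL.
elapsed <- system.time(ab <- a %*% b)[["elapsed"]]
cat("Multiplied two", n, "x", n, "matrices in", elapsed, "seconds\n")
```

Running the same script under standard R and under RRO is one way to compare the two on a given machine; results will vary with hardware.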

100% Compatibility
Built on latest R engine
Currently R 3.1.1, R 3.1.2 in testing

100% compatible with


R scripts
R packages
Applications with R connections

Designed to work with RStudio


No configuration required

Replaces existing R application


Side-by-side installations

Reproducibility: why do we care?


Academic / Research
Verify results
Advance Research
Business
Production code
Reliability
Reusability
Collaboration
Regulation

www.nytimes.com/2011/07/08/health/research/08genes.html
http://arxiv.org/pdf/1010.1092.pdf

An R Reproducibility Problem

Adapted from http://xkcd.com/234/ CC BY-NC 2.5


Reproducible R Toolkit in RRO


Static CRAN mirror
CRAN packages fixed with each Revolution R Open update

Daily CRAN snapshots


Storing every package version since September 2014
Binaries and sources
At mran.revolutionanalytics.com/snapshot

Easily write and share scripts synced to a specific snapshot


checkpoint package installed with RRO
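Diagram: CRAN is mirrored at http://cran.revolutionanalytics.com/. At midnight UTC, the checkpoint server takes a daily snapshot of the mirror, stored at http://mran.revolutionanalytics.com/snapshot/. The checkpoint package retrieves packages from the snapshot for the chosen date:
library(checkpoint)
checkpoint("2014-09-17")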

Using checkpoint
Easy to use: add 2 lines to the top of each script
library(checkpoint)
checkpoint("2014-09-17")

For the script author:


Use package versions available on the chosen date
Installs packages local to this project
Allows different package versions to be used simultaneously

For a script collaborator:


Automatically installs required packages
Detects required packages (no need to manually install!)
Uses same package versions as script author to ensure reproducibility
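
Conceptually, pinning a script to a date amounts to installing packages from that day's snapshot instead of from live CRAN. The sketch below shows only that idea, in base R. `snapshot_url()` is a hypothetical helper, not part of checkpoint, and the date-suffixed URL layout is an assumption based on the snapshot address given earlier. The checkpoint package also detects the packages a script needs and installs them local to the project, which this sketch does not attempt.

```r
# Hypothetical helper: build the repository URL for a dated snapshot.
# Assumes snapshots live at <base>/<YYYY-MM-DD>.
snapshot_url <- function(date,
                         base = "http://mran.revolutionanalytics.com/snapshot") {
  d <- as.Date(date, format = "%Y-%m-%d")
  if (is.na(d)) stop("date must be in YYYY-MM-DD form: ", date)
  paste(base, format(d, "%Y-%m-%d"), sep = "/")
}

# Point this session's package installs at the chosen snapshot.
old <- options(repos = c(CRAN = snapshot_url("2014-09-17")))
print(getOption("repos"))

# install.packages() would now resolve against the snapshot.
# Restore the previous repository setting when done:
options(old)
```

Keeping the date in one place at the top of a script makes it easy for a collaborator to see which snapshot the results depend on.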


MRAN: The Managed R Archive Network


Download Revolution R Open
Learn about R and RRO
Daily CRAN snapshots
Explore Packages and dependencies

Explore Task Views


Revolution Analytics

Open Source Projects


More at projects.revolutionanalytics.com

DeployR Open
Goal: embed results from R scripts into
existing applications, in real time
Problem:
Exposing arbitrary R functions is unwise
Need to handle concurrent R sessions

Solution: DeployR Open


R, on a server, behind a firewall
Repository Manager defines entry points
Expose only authorized R functions
Automatically creates Web Services APIs
Manages and monitors pool of R sessions
Separates roles for R and app developer

DeployR Open: for prototyping integrations


Revolution R Enterprise adds grid-scaling and
enterprise authentication

More at deployr.revolutionanalytics.com


DeployR: Integration
DeployR does not provide any application UI.
3 integration modes embed real-time R results into existing interfaces
Web app, mobile app, desktop app, BI tool, Excel,

RBroker Framework (tutorial):
Simple, high-performance API for Java, .NET and JavaScript apps
Supports transactional, on-demand analytics on a stateless R session
Client Libraries (tutorial):
Flexible control of R services from Java, .NET and JavaScript apps
Also supports stateful R integrations (e.g. complex GUIs)
DeployR Web Services API:
Integrate R using almost any client language


DeployR: Security / Scalability Layers


1. Anonymous execution

Only authorized, user-defined R functions accessible

No state preserved

2. Basic username / password authentication

Managed in DeployR Administration Console

3. Enterprise Authentication

Verifies identity with SSO / LDAP / Active Directory / PAM

4. Adaptive load-balancing grid

Ensures service availability

Only available in Revolution R Enterprise DeployR


DeployR Open demo

Fraud detection


RHadoop and ParallelR


Toolkits for data scientists and numerical analysts to create custom
parallel and distributed algorithms
ParallelR: parallel programming for multi-CPU servers and grids
RHadoop: map-reduce programming in R language
Mainly useful for embarrassingly parallel problems, where parallel
components work with small amounts of data
Big Data Predictive Analytics is mostly not embarrassingly parallel
80+ pre-built parallel external memory algorithms included with
Revolution R Enterprise


RHadoop
Collection of packages for interfacing R and Hadoop
Client (desktop) R interface to Hadoop:
rhdfs: Browse, read, write and modify files stored in HDFS
rhbase: Browse, read, write and modify tables stored in HBase
ravro: Read, write and run map-reduce on Apache Avro files in HDFS

R computations in Hadoop:
rmr2: write map-reduce tasks in R to run in Hadoop
plyrmr: R-based data manipulation computations on data in Hadoop

RHadoop Wiki: github.com/RevolutionAnalytics/RHadoop/wiki



Word count in RHadoop


Map:
Input: lines of text
Output: each word as a key, with value 1

Reduce:
Input: each word with its several values
Output: words with counts

Map-Reduce:
Apply map to lines of text
Gather like words together and count
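
The three steps can be imitated locally in base R to make the data flow concrete. This is an in-memory sketch only: it does not use Hadoop or the rmr2 package, and the sample lines and the function names `map` and `reduce` are illustrative assumptions.

```r
# Local, in-memory imitation of map-reduce word count (no Hadoop involved).
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map: split each line into words and emit a (word, 1) pair per word.
map <- function(line) {
  words <- strsplit(line, " +")[[1]]
  data.frame(key = words, value = 1, stringsAsFactors = FALSE)
}

# Reduce: sum the values collected for one word.
reduce <- function(word, values) {
  data.frame(key = word, value = sum(values), stringsAsFactors = FALSE)
}

# Map-Reduce: apply map to every line, group pairs by key, reduce each group.
pairs  <- do.call(rbind, lapply(lines, map))
groups <- split(pairs$value, pairs$key)
counts <- do.call(rbind, Map(reduce, names(groups), groups))
rownames(counts) <- NULL
print(counts)
```

In a real Hadoop job the grouping step is performed by the framework between the map and reduce phases; here `split()` stands in for it.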


Word count: execution

More: Video replay of "Using R with Hadoop" by Jeffrey Breen
http://bit.ly/W35PLR

ParallelR
foreach replaces for loops

Minimal code change required

Choice of parallel backends

doParallel (base parallel)

doMC (multi-core servers)

doSNOW (grids)

Iterations run in parallel

Speedups depend on backend, granularity

All iterations run in-memory

birthday <- function(n) {
  m <- 10000                                      # number of simulated groups
  x <- numeric(m)
  for (i in 1:m) {
    b <- sample(1:365, n, replace = TRUE)         # n random birthdays
    x[i] <- ifelse(length(unique(b)) == n, 0, 1)  # 1 if any birthday repeats
  }
  mean(x) # est prob of at least 1 match
}

for (j in 1:100) birthday(j)

2-core MacBook Air: 21.9s

library("doMC")
registerDoMC(2)
x <- foreach(j=1:100) %dopar% birthday(j)

2-core MacBook Air: 12.0s
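
For readers without the add-on packages installed, a similar experiment can be sketched with R's bundled parallel package, the one named next to doParallel in the backend list. This is a stand-in for the foreach version, not the ParallelR API; the seed, the core count and the reduced repetition count `m` are illustrative assumptions, and timings will differ by machine.

```r
library(parallel)  # ships with R

# Compact form of the birthday simulation: estimate the probability that
# at least two of n people share a birthday, using m simulated groups.
birthday <- function(n, m = 2000) {
  mean(replicate(m, anyDuplicated(sample(1:365, n, replace = TRUE)) > 0))
}

# Forking is unavailable on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# L'Ecuyer streams give each forked worker its own RNG stream.
RNGkind("L'Ecuyer-CMRG")
set.seed(2014)

# Each value of n is an independent task, so the iterations can run in parallel.
elapsed <- system.time(
  x <- unlist(mclapply(1:100, birthday, mc.cores = cores))
)[["elapsed"]]

cat("Estimate for n = 23:", x[23], "in", elapsed, "seconds total\n")
```

Because each task here is small, the overhead of starting workers can eat into the speedup, which is the granularity effect noted on the slide.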



Introducing

Revolution R Plus

Revolution R Plus includes:


AdviseR Technical Support for:
Revolution R Open
Including R, base and recommended packages
Reproducible R Toolkit
ParallelR: Parallel programming with R
RHadoop: R integration with Hadoop

DeployR Open: Secure deployment of R to applications

Open Source Assurance for all supported components


Provides legal indemnity for subscribers

Workstation subscriptions: $1,800 per year


Server and Hadoop subscriptions also available

AdviseR Technical Support


Technical support for R, from the R experts.
10x5 email and phone support (in your local time zone)
Full support for R, validated packages, and third-party software
connections
Notifications of updates and bug fixes
On-line case management and knowledgebase
Access to technical resources, documentation and user forums
Defined service-level agreements for rapid responses
Included with Revolution R Plus and Revolution R Enterprise.


Open Source Assurance


Revolution Analytics will defend Revolution R Plus subscribers should a
third party make an intellectual property claim against covered open
source software with respect to:
copyrights, patents, trademarks, trade secrets

Covered software includes:


Revolution R Open (incl. R base and recommended packages), Reproducible R
Toolkit, DeployR Open, ParallelR, RHadoop

Revolution Analytics will defend open source software in court


If necessary, Revolution Analytics will obtain rights, modify, or replace software
found to be infringing
If a resolution can't be found, fees paid in the past 12 months will be refunded.


The Revolution R Product Suite


Revolution R Open
Free and open source R distribution
Enhanced and distributed by Revolution Analytics
Revolution R Plus
Open-source distribution of R, packages, and other components
Enhanced, supported and indemnified by Revolution Analytics
Revolution R Enterprise
Secure, Scalable and Supported Distribution of R
With proprietary components created by Revolution Analytics


Revolution R Enterprise (RRE)


The All-Inclusive Big Data Big Analytics Platform

DeployR

ConnectR
ScaleR
DistributedR

DevelopR

High-performance open source R plus:


Data source connectivity to big-data objects
Big-data advanced analytics
Multi-platform environment support
In-Hadoop and in-Teradata predictive modeling
Visual Studio IDE option
Secure, Scalable R Deployment
Technical support, training and services

24x7 support option

Contact Revolution Analytics for more info: www.revolutionanalytics.com/contact-us



Poll #2
Which Revolution Analytics projects do you plan to use (or already use)?
Select all that apply:
1. Revolution R Open (free distribution)
2. Revolution R Plus (paid subscription for support and indemnification)
3. Reproducible R Toolkit (checkpoint package)
4. DeployR Open
5. RHadoop / ParallelR


Wrapping up
Revolution R Open is available now from
mran.revolutionanalytics.com/download

Explore Revolution Analytics open-source projects at
projects.revolutionanalytics.com

Technical support and open-source assurance with Revolution R Plus
www.revolutionanalytics.com/plus

David Smith
Chief Community Officer
Revolution Analytics
@revodavid
david@revolutionanalytics.com


Thank you.
Next up:

Batter Up! Advanced Sports Analytics with R and Storm


December 11, 2014
revolutionanalytics.com/webinars
www.revolutionanalytics.com
1.855.GET.REVO
Twitter: @RevolutionR
