= Big Maths
Laurence Liew
General Manager, APAC
Who we are
Leading provider of a commercial analytics platform based on the open source R statistical computing language
Our Software Delivers
Power: Distributed, scalable, high-performance advanced analytics
Productivity: Easier to build and deploy analytic applications
Customers
200+ Global 2000
Digital Media, Government, High Tech, Manufacturing, Retail, Telco
Global Presence
Our Philosophy
Customer-centric innovation
Easy to do business with
Revolution Confidential
Centre of Excellence (COE)
Partner with iLEs to create new IPs in big data analytics in Singapore
Big data analytics training/workshops
Our data scientists and developers will work alongside our collaboration partners.
Centre of Attachment (COA)
To accelerate the formation of a data science team within the organization
Analytics/statistics skills
Big data infrastructure skills such as Hadoop and HPC clusters
Volume, Variety, Velocity
Volume: Gigabytes, Terabytes, Petabytes
Data sources: 3D/4D seismic; systems logs; volumes; ERP cost records; realtime telemetry; vehicle monitoring; logistics; summary operating statistics; machine sensors; geospatial (ESRI); incidents and alarms; daily activity reports; communication logs; video and imagery; text instructions; workorders; reports
???
ANALYTICS
HDD -> SSD -> In-Memory
INFRASTRUCTURE AND DATABASES
What is R (Video)
http://www.youtube.com/watch?feature=player_embedded&v=TR2bHSJ_eck
= Language + Analytics
CONSUME
Big Data: In memory bound -> Disk based scalability
Speed of Analysis: Single threaded -> Parallel threading
Enterprise Readiness: Community support -> Commercial support
Analytic Breadth & Depth: 4500+ innovative analytic packages -> Leverage open source packages plus Big Data ready packages
Commercial Viability: Risk of deployment of open source -> Commercial License
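The move from "in memory bound" to "disk based scalability" can be illustrated with a chunked fit. The sketch below is a generic external-memory approach, not Revolution's implementation: it fits a simple linear regression by accumulating sufficient statistics one chunk at a time, so only one chunk is ever held in memory. The function names and the synthetic `read_chunks` data source are hypothetical, introduced only for illustration.

```python
from typing import Iterable, List, Tuple

Chunk = List[Tuple[float, float]]  # a block of (x, y) rows


def chunk_stats(chunk: Chunk) -> Tuple[int, float, float, float, float]:
    """Sufficient statistics for simple linear regression on one chunk."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return n, sx, sy, sxx, sxy


def fit_from_chunks(chunks: Iterable[Chunk]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x while holding one chunk in memory."""
    n = sx = sy = sxx = sxy = 0.0
    for chunk in chunks:
        cn, csx, csy, csxx, csxy = chunk_stats(chunk)
        n += cn
        sx += csx
        sy += csy
        sxx += csxx
        sxy += csxy
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def read_chunks(rows: int, chunk_size: int) -> Iterable[Chunk]:
    """Hypothetical data source: yields y = 3 + 2x in fixed-size chunks."""
    for start in range(0, rows, chunk_size):
        stop = min(start + chunk_size, rows)
        yield [(float(i), 3.0 + 2.0 * i) for i in range(start, stop)]


intercept, slope = fit_from_chunks(read_chunks(rows=10_000, chunk_size=1_000))
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Because the per-chunk statistics are simply added together, the chunks could also be processed by separate threads or nodes and combined at the end, which is the general idea behind the "single threaded -> parallel threading" row.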
                 32 nodes appliance    Revolution R, 5 nodes Linux HPC cluster    Difference
Cost             ~ $2.5M               ~ $30K                                     2% of the Cost
Rows of data     1 billion             1 billion
Parameters       just a few            7                                          Double
Time             80 seconds            44 seconds                                 45%
Data location    In memory             On disk
Nodes            32                    5                                          1/6th
Cores            384                   20                                         5%
RAM              1,536 GB              80 GB                                      5%

Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM.
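The ratios quoted in this comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures given on the slide; the dictionary names are illustrative, and the cost figures are the slide's approximate ("~") values.

```python
# Figures as quoted on the slide (costs are approximate).
appliance = {"nodes": 32, "cores": 384, "ram_gb": 1536, "seconds": 80, "cost_usd": 2_500_000}
cluster = {"nodes": 5, "cores": 20, "ram_gb": 80, "seconds": 44, "cost_usd": 30_000}

# Cluster figure as a fraction of the appliance figure, per metric.
ratios = {key: cluster[key] / appliance[key] for key in appliance}

for key, value in ratios.items():
    print(f"{key:>8}: {value:.1%} of the appliance figure")
```

Cores and RAM each come out at roughly 5% (about a 20th), nodes at roughly a 6th, and run time at 55% of the appliance's, which is the 45% reduction shown. The cost ratio computed from the approximate prices is nearer 1.2% than the 2% the slide states.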
Platform                  Time to fit
1: SAS                    5 hours
2: R (250 GB Server)
3: RRE                    5.7 minutes
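As a quick worked figure, the two fit times that the slide does give can be turned into a speedup factor. This uses only the SAS and RRE times quoted above; the variable names are illustrative.

```python
sas_minutes = 5 * 60   # "5 hours" on the slide
rre_minutes = 5.7      # "5.7 minutes" on the slide

speedup = sas_minutes / rre_minutes
print(f"RRE fit is about {speedup:.0f}x faster than the SAS fit")
```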
Hortonworks
Cloudera, Intel
EDW
Teradata
Clustered Systems
Linux HPC
Windows HPC
Desktop
Server
Linux
In the Cloud
CloudR
DeployR
ConnectR
ScaleR
DistributedR
Frontend
- 2-way or 4-way
- Cluster Management
- Fast HDD
- Lots of RAM
Admin Network
- Good Bandwidth
- Route admin traffic
- Typically: GE
Compute Node
- 2-way or 4-way
- Computations
- Fast CPU
- Fast HDD
- Lots of RAM
To in-database
Division
To Cloud
Hadoop + R
Hadoop
Data setup
Mapper
Reducer
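The three steps named here (data setup, mapper, reducer) can be sketched as a small in-process simulation. This is not Hadoop's API and not the R code from the original slide; it only illustrates the flow under assumed, made-up data. The `carrier,delay` record format, the sample values, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def data_setup() -> List[str]:
    """Data setup: hypothetical input records of the form 'carrier,delay'."""
    return ["AA,10", "UA,4", "AA,20", "UA,6", "DL,5"]


def mapper(record: str) -> Iterator[Tuple[str, float]]:
    """Mapper: emit one (key, value) pair per input record."""
    carrier, delay = record.split(",")
    yield carrier, float(delay)


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group values by key; a real framework does this between map and reduce."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key: str, values: List[float]) -> Tuple[str, float]:
    """Reducer: collapse all values for one key into a mean."""
    return key, sum(values) / len(values)


records = data_setup()
mapped = [pair for record in records for pair in mapper(record)]
result = dict(reducer(key, values) for key, values in shuffle(mapped).items())
print(result)
```

On a real cluster the mapper runs on many nodes against separate blocks of the input, and the framework performs the grouping step before the reducers run.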
Hadoop: Big Data Scale, 100% Portability
Scalable Compute: MapReduce
Data Portability: Hive, HBase
Parallel Storage: HDFS
Edge Node: Analytics Applications, DeployR
Revolution R Enterprise: ScaleR Algorithms; DistributedR Framework; ConnectR (HBase, HDFS, ODBC & High-Speed Connectors)
Data: DB, EDW, M2M
So how do I start?
www.bigdatastarterkit.com
www.bigdataconsumerkit.com
Q & A
Revolution Analytics is the leading
commercial provider of software and
support for the popular open source R
statistics language.
E: Laurence.liew@revolutionanalytics.com
W: www.revolutionanalytics.com