
Where to Deploy Hadoop:

Bare-metal or Cloud?

Michael Wendt, Sewook Wee


Data Insights R&D Group
Big Data: Bare-metal vs. Cloud

Deployment options, from bare-metal to cloud:
On-premise full custom
Hadoop Appliance
Hadoop Hosting
Hadoop-as-a-Service

Copyright 2013 Accenture All rights reserved. 2


Factors in choosing where to deploy:
Price-Performance Ratio
Data Privacy
Data Gravity
Data Enrichment
Productivity of Developers & Data Scientists

Price-Performance Ratio Views

On-premise full custom (bare-metal) vs. Hadoop-as-a-Service (cloud)

Bare-metal view: "Cloud? Virtualized? Slow!"
Cloud view: "Who cares! I'm cheap, just throw more in!"

Servers designed by Daniel Campos from The Noun Project

Hadoop Deployment Comparison Study

Price-performance ratio of on-premise full custom (bare-metal) vs. Hadoop-as-a-Service (cloud)

TCO analysis + Accenture Data Platform Benchmark

Hadoop Deployment Comparison Study
TCO Analysis

TCO of Bare-metal Hadoop Cluster

On-premise full custom: 24 server nodes and 50 TB of HDFS capacity*
(small-scale initial production deployment)

Total TCO: $21,845.04

Server hardware                         $3,000.00
Data center facility and electricity    $2,914.58
Technical support                       $6,656.00
Staff for operation                     $9,274.46

TCO of Hadoop-as-a-Service

The bare-metal TCO was used as the budget, and the number of affordable instances was calculated from it.

Total TCO: $21,845.04

Hadoop service         $15,318.28
Storage services        $2,063.00
Technical support       $1,372.27
Staff for operation     $3,091.49

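The two cost breakdowns can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it copies the dollar figures from the two TCO slides, confirms that both breakdowns add up to the same budget, and derives the share of that budget left for the Hadoop service itself. The variable and function names are assumptions introduced for this example.

```python
from decimal import Decimal

# Component costs copied from the two TCO slides (US dollars).
# The bare-metal figures are kept as a plain list, in the order printed.
BARE_METAL_COMPONENTS = [Decimal("3000.00"), Decimal("2914.58"),
                         Decimal("6656.00"), Decimal("9274.46")]
HAAS_COMPONENTS = {
    "Hadoop service": Decimal("15318.28"),
    "Storage services": Decimal("2063.00"),
    "Technical support": Decimal("1372.27"),
    "Staff for operation": Decimal("3091.49"),
}


def total(costs):
    """Sum an iterable of Decimal costs exactly (no float rounding)."""
    return sum(costs, Decimal("0"))


bare_metal_total = total(BARE_METAL_COMPONENTS)
haas_total = total(HAAS_COMPONENTS.values())
# Share of the fixed budget left for renting Hadoop service instances.
hadoop_service_share = HAAS_COMPONENTS["Hadoop service"] / haas_total

print(bare_metal_total, haas_total, round(hadoop_service_share, 3))
```

Using `Decimal` rather than floats keeps the cent-level totals exact, which matters when the point of the check is that two sums are equal.
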
TCO of Hadoop-as-a-Service Instances

Hadoop service: 14 instance types x 3 pricing models = 42 combinations

Selected 3 representative instance types: m1.xlarge, m2.4xlarge, cc2.8xlarge


TCO of Hadoop-as-a-Service Affordable Instances

Hadoop service budget: $15,318.28
Assumptions: 50% cluster utilization; 1/3 of budget allocated for Spot instances

Instance type    On-demand          Reserved          Reserved + Spot
                 instances (ODI)    instances (RI)    instances (RI + SI)
m1.xlarge        68                 112               192
m2.4xlarge       20                 41                77
cc2.8xlarge      13                 28                53

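The slide does not spell out how the instance counts were derived from the budget. A minimal sketch of one plausible calculation follows, assuming that instances are billed per hour and only while the cluster is in use. The function name, the 730-hour month, and the example hourly price are all hypothetical; reserved-instance upfront fees and Spot price variation are not modelled, so this is not expected to reproduce the table.

```python
import math


def affordable_instances(monthly_budget, hourly_price, utilization=0.5,
                         hours_per_month=730):
    """Number of whole instances a monthly budget can pay for.

    Assumes cost = instances * hourly_price * hours_per_month * utilization,
    i.e. instances are billed only while the cluster is in use.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    monthly_cost_per_instance = hourly_price * hours_per_month * utilization
    return math.floor(monthly_budget / monthly_cost_per_instance)


HADOOP_SERVICE_BUDGET = 15318.28   # from the slide
EXAMPLE_HOURLY_PRICE = 1.00        # hypothetical price, not from the slide

print(affordable_instances(HADOOP_SERVICE_BUDGET, EXAMPLE_HOURLY_PRICE))
```

Under these assumptions, halving the utilization doubles the number of instances the same budget can cover, which is why the utilization assumption matters so much for the comparison.
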
Hadoop Deployment Comparison Study
Accenture Data Platform Benchmark

Accenture Data Platform Benchmark

Suite of real-world Hadoop MapReduce applications
From client experience, internal roadmap, public literature
Categorized & selected common use cases
Open-source libraries & public datasets

Use cases                         Workload
Log management                    Sessionization
Customer preference prediction    Recommendation engine
Text Analytics                    Document clustering

Accenture Data Platform Benchmark:
Sessionization

A session is a sequence of related interactions, useful to analyze as a group.

Processing steps: Log data -> Bucketing -> Sorting -> Slicing -> Sessions

Dataset: ~150 billion log entries, ~24 TB; 1 million users; 1.1 billion sessions
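The bucketing, sorting, and slicing steps can be illustrated on a single machine. This is a minimal in-memory sketch, not the benchmark's MapReduce implementation: the `(user_id, timestamp)` record shape, the function name, and the 30-minute inactivity timeout are assumptions made for the example.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # hypothetical inactivity timeout


def sessionize(log_entries, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp) log entries into per-user sessions."""
    # Bucketing: collect each user's timestamps together.
    buckets = defaultdict(list)
    for user_id, timestamp in log_entries:
        buckets[user_id].append(timestamp)

    sessions = []
    for user_id, timestamps in buckets.items():
        # Sorting: order each user's entries by time.
        timestamps.sort()
        # Slicing: start a new session whenever the gap is exceeded.
        current = [timestamps[0]]
        for ts in timestamps[1:]:
            if ts - current[-1] > gap:
                sessions.append((user_id, current))
                current = [ts]
            else:
                current.append(ts)
        sessions.append((user_id, current))
    return sessions


logs = [("u1", 0), ("u2", 50), ("u1", 100), ("u1", 5000), ("u2", 60)]
result = sessionize(logs)
print(result)
```

In a MapReduce setting the bucketing and sorting would typically be handled by the shuffle phase, with the slicing done in the reducer; the sketch only mirrors the logical steps.
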


Accenture Data Platform Benchmark:
Recommendation Engine

Used item-based collaborative filtering algorithm
Mahout example library used as foundation

Processing steps:
Ratings data: Who rated what item?
Co-occurrence matrix: How many people rated the pair of items?
Recommendation: Given the way the person rated these items, he/she is likely to be interested in these other items.

Dataset: generated 300 million ratings; 3 million population, 50,000 items
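The ratings, co-occurrence matrix, and recommendation steps can be sketched in a few lines. This is a simplified single-machine illustration, not Mahout's implementation: it treats ratings as binary (rated or not rated), and the function names and toy data are assumptions introduced for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(ratings):
    """Count, for each item pair, how many users rated both items."""
    items_by_user = defaultdict(set)
    for user, item in ratings:
        items_by_user[user].add(item)

    cooccurrence = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            cooccurrence[a][b] += 1
            cooccurrence[b][a] += 1
    return items_by_user, cooccurrence


def recommend(user, items_by_user, cooccurrence, top_n=3):
    """Score unseen items by co-occurrence with the user's rated items."""
    seen = items_by_user[user]
    scores = defaultdict(int)
    for item in seen:
        for other, count in cooccurrence[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken by item name for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


ratings = [("ann", "A"), ("ann", "B"), ("bob", "A"), ("bob", "B"),
           ("bob", "C"), ("cat", "B"), ("cat", "C"), ("dan", "A")]
items_by_user, cooccurrence = build_cooccurrence(ratings)
print(recommend("dan", items_by_user, cooccurrence))
```

Here "dan" has rated only item A, so the candidates are ranked by how often other users rated them together with A.
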
Accenture Data Platform Benchmark:
Document Clustering

Groups similar documents
Application components used in many areas (e.g., search engines, e-commerce site optimization)

Processing steps: Corpus of crawled web pages -> Filtered and tokenized documents -> Term dictionary -> TF vectors -> TF-IDF vectors -> K-means -> Clustered documents

Dataset: Common Crawl dataset, 10 TB corpus*; ~31,000 ARC files or ~300 million HTML pages
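The pipeline from tokenized documents to clustered documents can be illustrated at toy scale. The sketch below is not the benchmark's MapReduce code: the whitespace tokenizer, the natural-log IDF weighting, the squared Euclidean distance, and the hand-picked starting centroids are all assumptions chosen to keep the example short and deterministic.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a very crude filter)."""
    return [t for t in text.lower().split() if t.isalpha()]


def tfidf_vectors(documents):
    """Return (term dictionary, list of dense TF-IDF vectors)."""
    tokenized = [tokenize(doc) for doc in documents]
    # Term dictionary: map each distinct term to a column index.
    terms = sorted({t for tokens in tokenized for t in tokens})
    index = {term: i for i, term in enumerate(terms)}
    # Document frequency: number of documents containing each term.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # TF vector as raw counts
        vec = [0.0] * len(terms)
        for term, count in tf.items():
            vec[index[term]] = count * math.log(n_docs / df[term])
        vectors.append(vec)
    return index, vectors


def kmeans(vectors, centroids, iterations=10):
    """Plain k-means with squared Euclidean distance and given start centroids."""
    assignment = []
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


docs = ["hadoop cluster storage", "hadoop cluster nodes",
        "movie ratings users", "movie ratings items"]
index, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, centroids=[list(vectors[0]), list(vectors[2])])
print(labels)
```

Terms that appear in fewer documents receive a higher IDF weight, so the distinguishing words dominate the distance between documents.
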
Hadoop Deployment Comparison Study
Experiment Setup/Results

Experiment Setup:
Price-Performance Ratio Comparison

Bare-metal Hadoop Cluster vs. Amazon EMR Clusters, compared on Price-Performance Ratio

1 bare-metal cluster vs. 9 Amazon EMR clusters
Fixed budget for cluster size
Manual and automated tuning
Measure execution time of benchmark


Experiment Setup:
Starfish Automated Performance Tuning Tool

Starfish (now Unravel) is an automated performance tuning tool for MapReduce jobs.
For the experiment we ran each benchmark twice using Starfish: a profile phase, then an optimize phase.
Measure execution time of the optimize phase.

Speedometer designed by Filippo Camedda from The Noun Project

Experiment Results:
Starfish Automated Performance Tuning Tool

Starfish-tuned Recommendation Engine workload with 11 cascaded MapReduce jobs: 8x improvement in one tuning cycle
Manually tuned Sessionization workload: 2+ weeks of manual tuning, ~1-day tuning iterations

Achieve performance increases with less cost using Starfish


Experiment Results:
Sessionization

[Figure: bar chart of execution time (minutes) by Amazon EMR configuration]

Bare-metal: 533
Amazon EMR configurations (instances, cc2.8xlarge / m2.4xlarge / m1.xlarge): ODI 13 / 20 / 68; RI 28 / 41 / 112; RI+SI 53 / 77 / 192
EMR execution times shown on the chart (minutes): 408.07, 381.55, 250.13, 229.25, 204.10, 172.23, 125.82, 166.82, 114.35

Experiment Results:
Recommendation Engine

[Figure: bar chart of execution time (minutes) by Amazon EMR configuration]

Bare-metal: 21.59
Amazon EMR configurations (instances, cc2.8xlarge / m2.4xlarge / m1.xlarge): ODI 13 / 20 / 68; RI 28 / 41 / 112; RI+SI 53 / 77 / 192
EMR execution times shown on the chart (minutes): 23.33, 20.13, 14.28, 21.97, 19.97, 16.30, 18.48, 16.92, 15.08

Experiment Results:
Document Clustering

[Figure: bar chart of execution time (minutes) by Amazon EMR configuration]

Bare-metal: 1186.37
Amazon EMR configurations (instances, cc2.8xlarge / m2.4xlarge / m1.xlarge): ODI 13 / 20 / 68; RI 28 / 41 / 112; RI+SI 53 / 77 / 192
EMR execution times shown on the chart (minutes): 1661.03, 1649.98, 1157.37, 1112.68, 914.35, 779.98, 784.82, 629.98, 742.38

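Because every cluster was sized to the same budget, a shorter execution time translates directly into a better price-performance ratio. The sketch below summarizes the three result charts on that basis. The execution times are copied from the charts; the `summarize` helper and the two summary metrics (how many EMR clusters beat bare-metal, and the best speedup) are assumptions introduced for this illustration.

```python
# Execution times in minutes, copied from the three result charts.
# Each entry is (bare-metal time, times of the 9 Amazon EMR clusters).
RESULTS = {
    "Sessionization": (533.0, [408.07, 381.55, 250.13, 229.25, 204.10,
                               172.23, 125.82, 166.82, 114.35]),
    "Recommendation Engine": (21.59, [23.33, 20.13, 14.28, 21.97, 19.97,
                                      16.30, 18.48, 16.92, 15.08]),
    "Document Clustering": (1186.37, [1661.03, 1649.98, 1157.37, 1112.68,
                                      914.35, 779.98, 784.82, 629.98, 742.38]),
}


def summarize(bare_metal, emr_times):
    """Return (number of EMR clusters faster than bare-metal, best speedup)."""
    faster = sum(1 for t in emr_times if t < bare_metal)
    best_speedup = bare_metal / min(emr_times)
    return faster, best_speedup


summary = {name: summarize(*data) for name, data in RESULTS.items()}
for name, (faster, speedup) in summary.items():
    print(f"{name}: {faster}/9 EMR clusters faster, best speedup {speedup:.2f}x")
```

A speedup above 1 means the fastest EMR cluster finished the workload sooner than the bare-metal cluster at the same budget.
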
Key Takeaways

Hadoop-as-a-Service offers a better price-performance ratio
Cloud expands the performance tuning opportunities
Automated performance tuning tools are a necessity

Acknowledgement



More details
Contact us for the full white paper: Hadoop Deployment Comparison Study

Michael Wendt
R&D Developer
Data Insights R&D
Accenture Technology Labs

(408) 817-2190
michael.e.wendt@accenture.com

Scott Kurth
Group Lead
Data Insights R&D
Accenture Technology Labs

(408) 817-2775
scott.kurth@accenture.com

