Professional Documents
Culture Documents
Administration
Or, With Great Power Comes Great
Responsibility in MapReduce Systems
Shivnath Babu
Duke University
?
Time
Relational DBMS
19751985
19851995
Features +++++
Open source ++
19952005
Manageability Crisis,
Research +++
20052010
Claims of self-managing,
Hard to add new features
MapReduce/Hadoop
2020
Testing
Tuning
Diagnosis
Applying fixes
Configuring
Benchmarking
Capacity planning
Disaster/failure
recovery automation
Detection/repair of
data corruption
Reduce function
Input
Splits
Map
Wave 1
Map
Wave 2
Reduce
Wave 1
Reduce
Wave 2
Experiments
On EC2 and
local clusters
10
500
0.15
28
10
0.15
300
10
0.15
300
500
0.15
Running
time
Declarative HiveQL/Pig
operations
Job configuration
parameters
Space of execution
choices
Multi-job
workflows
Complexity
Problem Space
Energy
considerations
Cost in pay-as-you-go
environment
Current approaches:
Predominantly manual
Post-mortem analysis
Performance
objectives
Is this where
we want to be?
Challenges
Features of Hadoop from
a usability perspective
Ability to specify
schema late
Easy integration with
programming lang.
Pluggability
?
Time
Relational DBMS
19751985
19851995
Features +++++
Open source ++
19952005
Manageability Crisis,
Research +++
20052010
Claims of self-managing,
Hard to add new features
MapReduce/Hadoop
2020