
Simplifying Hadoop Usage and Administration
Or, With Great Power Comes Great Responsibility in MapReduce Systems
Shivnath Babu
Duke University

[Timeline figure: will MapReduce/Hadoop repeat the trajectory of relational DBMSs?]

Relational DBMS:
1975-1985: New & useful technology
1985-1995: Features +++++, Open source ++
1995-2005: Manageability crisis, Research +++
2005-2010: Claims of self-managing; hard to add new features

MapReduce/Hadoop today: New & useful technology, Features +++++, Open source ++
MapReduce/Hadoop in 2020: ?

Different Aspects of Manageability

Testing
Tuning
Diagnosis
Applying fixes
Configuring
Benchmarking
Capacity planning
Disaster/failure recovery automation
Detection/repair of data corruption

Roles (often overlap)

User (writes MapReduce programs, Pig scripts, HiveQL queries, etc.)
Developer
Administrator

Lifecycle of a MapReduce Job

Map function
Reduce function
Run this program as a MapReduce job (a minimal sketch follows)
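To make the lifecycle concrete, here is a minimal sketch of the canonical word-count example in Hadoop's Java mapreduce API (illustrative, not a program from this talk): the user writes a map function and a reduce function, and a small driver runs the program as a MapReduce job.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map function: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce function: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // "Run this program as a MapReduce job": configure and submit.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}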

Lifecycle of a MapReduce Job

[Execution timeline figure: the input is divided into splits; map tasks run in waves (Map Wave 1, Map Wave 2), followed by reduce tasks in waves (Reduce Wave 1, Reduce Wave 2).]

How are the number of splits, the number of map and reduce tasks, the memory allocation to tasks, etc., determined?
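A back-of-the-envelope illustration (the block size here is assumed for illustration, not taken from the talk): with a 50 GB input and 128 MB HDFS blocks, the default FileInputFormat produces roughly 50 GB / 128 MB ≈ 400 splits, hence about 400 map tasks; on a cluster with 64 concurrent map slots, that is ceil(400 / 64) = 7 map waves. The number of reduce tasks, by contrast, is not derived from the input at all: it is whatever mapred.reduce.tasks is set to.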

Job Configuration Parameters

190+ parameters in Hadoop
Set manually, or defaults are used
Are defaults or rules-of-thumb good enough?
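As a minimal sketch of what "set manually" looks like (the class name is mine and the values are placeholders; the parameter names are the pre-2.x Hadoop ones that appear later in this talk): a job driver overrides a handful of the 190+ knobs, and every parameter left untouched silently falls back to the shipped default.

import org.apache.hadoop.conf.Configuration;

public class ManualTuning {
  public static Configuration tunedConf() {
    Configuration conf = new Configuration();
    // Three of the 190+ knobs; each unset parameter keeps its default.
    conf.setInt("mapred.reduce.tasks", 28);          // number of reduce tasks
    conf.setInt("io.sort.factor", 10);               // streams merged at once during sorts
    conf.setFloat("io.sort.record.percent", 0.15f);  // share of the sort buffer for record metadata
    return conf;
  }
}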

[Plots: running time (in seconds and in minutes) under different configuration settings, from experiments on EC2 and local clusters.]

Illustrative Result: 50GB Terasort

17-node cluster, 64 concurrent map slots + 32 concurrent reduce slots

mapred.reduce.tasks | io.sort.factor | io.sort.record.percent
 10                 |  10            | 0.15   (based on popular rule-of-thumb)
 10                 | 500            | 0.15
 28                 |  10            | 0.15
300                 |  10            | 0.15
300                 | 500            | 0.15

[The running time for each setting was shown as a chart on the original slide.]

Performance at default and rule-of-thumb settings can be poor
Cross-parameter interactions are significant

Problem Space

[Figure: the space of execution choices plotted against performance objectives.]

Space of execution choices (increasing complexity): job configuration parameters, declarative HiveQL/Pig operations, multi-job workflows

Performance objectives: energy considerations, cost in pay-as-you-go environments

Current approaches: predominantly manual, post-mortem analysis

Is this where we want to be?

Challenges

Features of Hadoop from a usability perspective:
Ability to specify schema late
Easy integration with programming languages
Pluggability: input data formats, storage engines, schedulers, instrumentation (see the sketch below)

These features are very useful when dealing with:
Multiple data formats
Mix of structured and unstructured data
Multiple computational engines (e.g., R, DBMS)
Changes/evolution

But they pose nontrivial manageability challenges
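A minimal sketch of what pluggability and late schema look like in code (assuming the standard org.apache.hadoop.mapreduce API; the class is illustrative, not from the talk): the on-disk format is a per-job plug-in choice, and the data's structure is interpreted inside the map function at run time rather than declared to the system up front.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class PluggableInputs {
  public static Job jobFor(String format) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "format-agnostic job");
    // The same map/reduce code can consume different on-disk formats:
    // swapping the InputFormat is a one-line, per-job choice.
    if ("text".equals(format)) {
      job.setInputFormatClass(TextInputFormat.class);
    } else {
      job.setInputFormatClass(SequenceFileInputFormat.class);
    }
    // No schema is declared here; the map function decides at run time
    // how to parse each record ("schema late").
    return job;
  }
}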

Some Thoughts on Possible Solutions

Exploit opportunities to learn:
Schema can be learned from Pig Latin scripts, HiveQL queries, MapReduce jobs
Profile-driven optimization from the compiler world
A high ratio of repeated jobs to new jobs is common

Exploit the MapReduce/Hadoop design:
Common sort-partition-merge skeleton
Design for robustness gives many mechanisms for adaptation & observation (speculative execution, storing intermediate data)
Multiple map waves
Fine-grained and pluggable scheduler

Some Thoughts on Possible Solutions

Automate try-it-out and trial-and-error approaches (sketched below):
For example, use 5% of cluster resources to run MapReduce tasks with a different configuration
Exploit the cloud's pay-as-you-go resources, e.g., EC2 spot instances
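A hypothetical sketch of the trial-and-error idea (the class, its interface, and the candidate values are mine, chosen to echo the Terasort table above; this is not a system described in the talk): run the same job on a small sample of the input under each candidate configuration and keep the fastest one.

import org.apache.hadoop.conf.Configuration;

public class TrialAndErrorTuner {

  // Candidate settings; illustrative, not a recommended search space.
  private static final int[] REDUCE_TASKS = {10, 28, 300};
  private static final int[] IO_SORT_FACTOR = {10, 500};

  /** Hypothetical hook: submit the job on ~5% of the input/slots. */
  public interface SampledJobRunner {
    void run(Configuration conf) throws Exception;
  }

  public static Configuration pickBest(SampledJobRunner runSample) throws Exception {
    Configuration best = null;
    long bestMillis = Long.MAX_VALUE;
    for (int reduceTasks : REDUCE_TASKS) {
      for (int sortFactor : IO_SORT_FACTOR) {
        Configuration conf = new Configuration();
        conf.setInt("mapred.reduce.tasks", reduceTasks);
        conf.setInt("io.sort.factor", sortFactor);
        long start = System.currentTimeMillis();
        runSample.run(conf);  // trial run on a small sample
        long elapsed = System.currentTimeMillis() - start;
        if (elapsed < bestMillis) {
          bestMillis = elapsed;
          best = conf;
        }
      }
    }
    return best;  // configuration with the fastest trial run
  }
}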

[Closing slide: the opening timeline figure repeated. Where will MapReduce/Hadoop be in 2020?]
