Hadoop Cluster

[Diagram: Two masters, the Name Node and the Job Tracker, oversee slave servers that each run a Data Node and a Task Tracker. The cluster connects to the outside world through the network.]
[Diagram: The cluster at scale. Each of Racks 1 through N has a top-of-rack switch uplinked to the core. Rack 1 also hosts the Name Node, the Job Tracker, the Secondary Name Node, and a Client machine; every other slot in every rack holds a Data Node + Task Tracker (DN + TT) server.]
Typical Workflow

- Load data into the cluster (HDFS writes)
- Analyze the data (Map Reduce)
- Store results in the cluster (HDFS writes)
- Read the results from the cluster (HDFS reads)

Sample Scenario: How many times did our customers type the word "Fraud" into emails sent to customer service? The input, File.txt, is a huge file containing all of the emails sent to customer service.
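As a single-machine warm-up, the scenario's computation is just a word count; Map Reduce parallelizes exactly this over the blocks of File.txt. A minimal sketch (the sample emails and the case-sensitive matching are assumptions for illustration):

```python
# Count occurrences of the word "Fraud" in a file, one machine, no Hadoop.
# This is the computation the rest of the walkthrough distributes.

def count_word(lines, word="Fraud"):
    """Count case-sensitive, whole-word occurrences of `word`."""
    return sum(line.split().count(word) for line in lines)

emails = [
    "I want to report fraud",        # lowercase: not counted in this sketch
    "This charge is Fraud",
    "Fraud Fraud on my statement",
]
print(count_word(emails))  # 3
```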
BRAD HEDLUND .com
[Diagram: The Client writes File.txt as blocks A, B, and C to Data Nodes 1, 5, 6, ... N, coordinated by the Name Node.]

- Client consults the Name Node
- Client writes each block directly to one Data Node
- Data Nodes replicate the block
- The cycle repeats for the next block
[Diagram: The Name Node keeps two tables. The rack awareness table maps racks to nodes (Rack 1: Data Nodes 1, 2, 3; Rack 5: Data Nodes 5, 6, 7). The metadata table maps files to block locations: File.txt = Blk A: DN1, DN5, DN6; Blk B: DN7, DN1, DN2; Blk C: DN5, DN8, DN9. The replicas sit in Racks 1, 5, and 9.]

- Never lose all copies of the data if an entire rack fails
- Keep bulky flows in-rack when possible
- Assumes in-rack links have higher bandwidth and lower latency
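The two tables above can be sketched as plain Python dicts to show how they combine (a toy model, not the Name Node's actual data structures):

```python
# The Name Node's two tables, per the slide, as plain dicts.

rack_awareness = {
    "Rack 1": ["DN1", "DN2", "DN3"],
    "Rack 5": ["DN5", "DN6", "DN7"],
}

metadata = {
    "File.txt": {
        "Blk A": ["DN1", "DN5", "DN6"],
        "Blk B": ["DN7", "DN1", "DN2"],
        "Blk C": ["DN5", "DN8", "DN9"],
    }
}

def racks_holding(block, filename="File.txt"):
    """Which racks hold at least one replica of a block?"""
    nodes = metadata[filename][block]
    return {rack for rack, members in rack_awareness.items()
            if any(n in members for n in nodes)}

# Blk A is safe from a single rack failure: two racks hold it.
print(sorted(racks_holding("Blk A")))  # ['Rack 1', 'Rack 5']
```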
[Diagram: Before writing Blk A of File.txt, the Client asks the Name Node where to put it. Consulting its rack awareness table (Rack 1: Data Node 1; Rack 5: Data Nodes 5 and 6), the Name Node returns those three nodes, and the Client confirms the pipeline: "Data Node 6 ready?" ... "Data Node 5 ready!"]

- Name Node picks two nodes in the same rack and one node in a different rack
- Data protection
- Locality for Map Reduce
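The placement rule can be sketched in a few lines (a toy model only: the real HDFS policy also weighs the writer's location, free disk space, and node load):

```python
import random

# Toy replica placement: two replicas in one rack, the third in a
# different rack, as described on the slide.

def place_replicas(racks):
    """racks: rack name -> list of Data Nodes. Returns three (rack, node) picks."""
    rack_a, rack_b = random.sample(sorted(racks), 2)
    two = random.sample(racks[rack_a], 2)   # two distinct nodes, same rack
    one = random.choice(racks[rack_b])      # one node, different rack
    return [(rack_a, two[0]), (rack_a, two[1]), (rack_b, one)]

racks = {"Rack 1": ["DN1", "DN2", "DN3"], "Rack 5": ["DN5", "DN6", "DN7"]}
for rack, node in place_replicas(racks):
    print(rack, node)
```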
Pipelined Write

[Diagram: The Client streams block A of File.txt to Data Node 1 in Rack 1, which forwards it to Data Node 5 in Rack 5, which forwards it to Data Node 6, following the placement the Name Node chose.]

- Data Nodes 1 and 5 pass the data along as it's received
- Replication traffic runs over TCP port 50010
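The pass-it-along behavior can be sketched with simple callbacks (a simulation only; real Data Nodes stream packets to each other over TCP port 50010):

```python
# Simulated pipelined write: each Data Node stores a packet and
# immediately forwards it downstream, so the Client sends the block once.

def data_node(name, downstream=None):
    """Return a receiver that stores packets and forwards them downstream."""
    stored = []
    def receive(packet):
        stored.append(packet)    # "write to local disk"
        if downstream:
            downstream(packet)   # forward as received
    receive.stored = stored
    receive.name = name
    return receive

dn6 = data_node("DN6")
dn5 = data_node("DN5", downstream=dn6)
dn1 = data_node("DN1", downstream=dn5)

for packet in ["A-part1", "A-part2", "A-part3"]:  # block A as packets
    dn1(packet)

print(dn1.stored == dn5.stored == dn6.stored)  # True: all replicas identical
```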
Pipelined Write

[Diagram: Once block A is written to Data Nodes 1, 2, and 3, each Data Node reports "Block received" back along the pipeline, and the Name Node records File.txt Blk A: DN1, DN2, DN3 in its metadata. The Client moves on to blocks B and C.]
[Diagram: The Client repeats the pipelined write for blocks B and C. When File.txt is fully written, its replicas are spread across Data Nodes in Racks 1, 4, and 5.]
[Diagram: Racks 1 through N, each with a top-of-rack switch and a row of Data Nodes.]

Factors: the number of blocks in File.txt. More blocks = wider spread.
[Diagram: The blocks of Results.txt (Blk A, Blk B, Blk C) scattered across Data Nodes in Racks 1 through N, with replicas of each block on different racks.]
[Diagram: Data Nodes 1 through N check in with the Name Node: "I'm alive!" and "I have blocks: A, C". The Name Node's file system metadata records File.txt = A, C and replies, "Awesome! Thanks."]

- Data Nodes send Heartbeats over TCP every 3 seconds
- Every 10th Heartbeat is a Block report
- Name Node builds its metadata from Block reports
- If the Name Node is down, HDFS is down
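The cadence above can be sketched as a generator (timing simulated, not real networking; the message text mimics the slide's speech bubbles):

```python
# Heartbeat cadence: one heartbeat every 3 seconds, every 10th one
# carrying a block report (so roughly one block report per 30 seconds).

HEARTBEAT_INTERVAL_S = 3
BLOCK_REPORT_EVERY = 10

def heartbeats(blocks, count):
    """Yield (time, message) tuples for `count` heartbeats from one Data Node."""
    for i in range(1, count + 1):
        t = i * HEARTBEAT_INTERVAL_S
        if i % BLOCK_REPORT_EVERY == 0:
            yield t, f"I'm alive! I have blocks: {', '.join(blocks)}"
        else:
            yield t, "I'm alive!"

for t, msg in heartbeats(["A", "C"], 10):
    print(f"t={t}s {msg}")   # the t=30s heartbeat is a Block report
```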
Rack Awareness

[Diagram: The rack awareness table (Rack 1: DN1, DN2; Rack 5: DN3; Rack 9: DN8) alongside Data Nodes 2, 3, and 8, each holding copies of blocks A and C.]

- Missing Heartbeats signify lost Nodes
- Name Node consults its metadata and finds the affected data
- Name Node consults the Rack Awareness script
- Name Node tells a Data Node to re-replicate
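The recovery step can be sketched as follows (a toy model; node names and block lists come from the slides, the function is illustrative):

```python
# When a Data Node's heartbeats stop, find every block it held; the
# surviving replicas become the sources for re-replication.

REPLICATION = 3

metadata = {  # block -> Data Nodes holding it
    "Blk A": ["DN1", "DN5", "DN6"],
    "Blk B": ["DN7", "DN1", "DN2"],
    "Blk C": ["DN5", "DN8", "DN9"],
}

def handle_dead_node(dead, metadata):
    """Remove the dead node and report blocks now under-replicated."""
    under = {}
    for block, nodes in metadata.items():
        if dead in nodes:
            nodes.remove(dead)
            under[block] = nodes[:]  # survivors that can source a new copy
    return under

print(handle_dead_node("DN1", metadata))
# Blk A and Blk B each have only 2 replicas left and need a 3rd
```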
Secondary Name Node

- Not a hot standby for the Name Node
- Connects to the Name Node every hour*
- Housekeeping and backup of Name Node metadata
- Saved metadata can rebuild a failed Name Node
[Diagram: The Client asks the Name Node for Results.txt. The metadata table lists Results.txt = Blk A: DN1, DN5, DN6; Blk B: DN7, DN1, DN2; Blk C: DN5, DN8, DN9, with the replicas spread across Racks 1, 5, and 9.]

- Client receives a list of Data Nodes for each block
- Client picks the first Data Node on each list
- Client reads the blocks sequentially
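The client's read plan can be sketched directly from the metadata above (block locations copied from the slide; the function is illustrative):

```python
# Client read path: take the first Data Node on each block's list,
# then read the blocks in order.

block_locations = {  # returned by the Name Node, per the slide
    "Blk A": ["DN1", "DN5", "DN6"],
    "Blk B": ["DN7", "DN1", "DN2"],
    "Blk C": ["DN5", "DN8", "DN9"],
}

def read_plan(block_locations):
    """Pick the first Data Node for each block, reading sequentially."""
    return [(blk, nodes[0]) for blk, nodes in block_locations.items()]

for blk, dn in read_plan(block_locations):
    print(f"read {blk} from {dn}")
```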
[Diagram: A Data Node asks the Name Node for Block A and gets back the ordered list "Block A = 1, 5, 6". The Name Node builds the order from its rack awareness table (Rack 1: Data Nodes 1, 2, 3; Rack 5: Data Node 5) and its metadata (File.txt = Blk A: DN1, DN5, DN6; Blk B: DN7, DN1, DN2; Blk C: DN5, DN8, DN9).]

- Name Node provides rack-local Nodes first
- Leverages in-rack bandwidth and a single switch hop
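The rack-local ordering can be sketched with a stable sort (a toy model; the node-to-rack mapping below is taken from the slides):

```python
# Rack-local first: when a Data Node reads a block, the Name Node lists
# replicas in the reader's own rack before replicas in other racks.

rack_of = {"DN1": "Rack 1", "DN2": "Rack 1", "DN3": "Rack 1",
           "DN5": "Rack 5", "DN6": "Rack 5", "DN8": "Rack 9"}

def order_replicas(replicas, reader_rack):
    """Stable-sort a replica list so the reader's in-rack nodes come first."""
    return sorted(replicas, key=lambda dn: rack_of[dn] != reader_rack)

print(order_replicas(["DN1", "DN5", "DN6"], reader_rack="Rack 5"))
# ['DN5', 'DN6', 'DN1'] - the in-rack replicas lead the list
```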
[Diagram: The Client submits the job to the Job Tracker. Map Tasks run on Data Nodes 1, 5, and 9, each scanning its local block of File.txt and producing a partial result: Fraud = 3, Fraud = 0, and Fraud = 11.]

- Map: "Run this computation on your local data"
- Job Tracker delivers the Java code to the Nodes that hold the data locally
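The Map step can be sketched per node (the block contents are made up so the partial counts match the slide's Fraud = 3, 0, and 11):

```python
# Map step: each Map Task scans only its node's local block of File.txt
# and emits a partial count.

def map_task(local_block, word="Fraud"):
    """Count occurrences of `word` in this node's block."""
    return sum(line.split().count(word) for line in local_block)

# hypothetical block contents, one block per Data Node
blocks = {
    "DN1": ["Fraud on my card", "please help", "this is Fraud Fraud"],
    "DN5": ["where is my order"],
    "DN9": ["Fraud"] * 11,
}
partials = {dn: map_task(block) for dn, block in blocks.items()}
print(partials)  # {'DN1': 3, 'DN5': 0, 'DN9': 11}
```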
[Diagram: The Map Task on Data Node 1 does not hold block A locally; it announces "I need block A" and pulls the block over the network, while the tasks on Data Nodes 5 and 9 proceed on local data (Fraud = 0, Fraud = 11).]

- Job Tracker tries to select a Node in the same rack as the data
- It relies on the Name Node's rack awareness to do so
[Diagram: The Map Tasks send their intermediate output (X, Y, Z) over the network to a Reduce Task on Data Node 3, which computes Fraud = 14 and writes Results.txt to HDFS.]

- Reduce: "Run this computation across the Map results"
- Map Tasks deliver their output data over the network
- Reduce Task output is written to, and read from, HDFS
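The Reduce step for this job is just a sum of the partial counts (the numbers come from the Map slide; in real Hadoop the result would be written to HDFS rather than printed):

```python
# Reduce step: combine the partial counts the Map Tasks sent over
# the network into the final answer.

def reduce_task(partial_counts):
    """Combine Map outputs into the final result."""
    return sum(partial_counts)

map_outputs = [3, 0, 11]   # Fraud counts from the three Map Tasks
print(f"Fraud = {reduce_task(map_outputs)}")  # Fraud = 14
```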
Unbalanced Cluster

[Diagram: Racks 1 and 2 are full of Data Nodes holding the blocks of File.txt; two newly added racks contain mostly empty Data Nodes.]

- Hadoop prefers local processing when possible
- New servers sit underutilized for Map Reduce and HDFS*
- You might see more network bandwidth used and slower job times**

* "I'm bored!"
** "I was assigned a Map Task but don't have the block. Guess I need to go get it."
Cluster Balancing

[Diagram: The same cluster after balancing: blocks of File.txt have been redistributed onto the new racks' Data Nodes.]

brad@cloudera-1:~$ hadoop balancer

- The Balancer utility (if used) runs in the background
- It does not interfere with Map Reduce or HDFS
- Default speed limit: 1 MB/s
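What the balancer decides can be sketched as a threshold check (a toy model: the utilization numbers are made up, and the 10% threshold mirrors the balancer's default; the real tool also schedules the actual block moves):

```python
# Balancer decision: flag nodes whose disk utilization is more than
# `threshold` above or below the cluster average; blocks then move
# from the over-utilized nodes toward the under-utilized ones.

def balance_targets(utilization, threshold=0.10):
    """utilization: node -> fraction of disk used. Returns (over, under)."""
    avg = sum(utilization.values()) / len(utilization)
    over = [n for n, u in utilization.items() if u > avg + threshold]
    under = [n for n, u in utilization.items() if u < avg - threshold]
    return over, under

use = {"DN1": 0.85, "DN2": 0.80, "new-DN1": 0.05, "new-DN2": 0.02}
print(balance_targets(use))  # (['DN1', 'DN2'], ['new-DN1', 'new-DN2'])
```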
Thanks!
Narrated at: http://bradhedlund.com/?p=3108