
Let's Hadoop

1. WHAT'S THE BIG DEAL WITH BIG DATA?

Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.

Gartner predicts 800% data growth over the next 5 years

Big Data opens the door to a new approach to engaging customers and making decisions

2. BIG DATA: WHAT ARE THE CHALLENGES?

How can we capture and deliver data to the right people in real time?

How can we understand and use big data when it arrives in a variety of forms?

How can we store and analyze the data given its size and the computational capacity required? While the storage capacity of hard drives has increased massively over the years, access speeds (the rate at which data can be read from a drive) have not kept up.
Example: processing a 100 TB dataset by scanning at 50 MB/s takes about 23 days on 1 node, but only about 33 minutes on a 1,000-node cluster (see the sketch below). The hardware challenge then becomes processing and combining data read from many disks in parallel.

Traditional systems can't scale, are not reliable, and are expensive.
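To make the arithmetic above easy to check, here is a minimal back-of-the-envelope sketch in plain Java; it uses only the slide's own example figures (100 TB, 50 MB/s, 1,000 nodes).

    // Back-of-the-envelope scan-time estimate using the example figures above.
    public class ScanTime {
        public static void main(String[] args) {
            double datasetBytes = 100e12;          // 100 TB
            double bytesPerSecond = 50e6;          // 50 MB/s per drive
            double oneNodeSeconds = datasetBytes / bytesPerSecond;
            double clusterSeconds = oneNodeSeconds / 1000; // 1,000 nodes scanning in parallel
            System.out.printf("1 node:      %.1f days%n", oneNodeSeconds / 86400);
            System.out.printf("1,000 nodes: %.1f minutes%n", clusterSeconds / 60);
        }
    }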

3. WHAT TECHNOLOGIES SUPPORT BIG DATA?

Scale-out everything: storage and compute

4. WHAT MAKES HADOOP DIFFERENT?

Accessible: Hadoop runs on large clusters of commodity machines or in the cloud (e.g., EC2).
Robust: Hadoop is architected with the assumption of frequent hardware malfunctions; it can gracefully handle most such failures.
Scalable: Hadoop scales linearly to handle larger data by adding more nodes to the cluster.
Simple: Hadoop allows users to quickly write efficient parallel code.
Data locality: move computation to the data.
Replication: use replication across servers to deal with unreliable storage and servers (see the sketch below).
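As a hedged illustration of the replication bullet, the sketch below uses Hadoop's FileSystem API to request and read back a file's replication factor. The path /data/example.txt and the factor of 3 are illustrative assumptions, not values from the slides, and the file is assumed to already exist in HDFS.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative sketch: ask HDFS to keep 3 replicas of an existing file,
    // then read the setting back from the file's metadata.
    public class ReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/data/example.txt");  // hypothetical path
            fs.setReplication(file, (short) 3);         // request 3 replicas across DataNodes
            FileStatus status = fs.getFileStatus(file);
            System.out.println("Replication factor: " + status.getReplication());
        }
    }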

5. IS HADOOP A ONE-STOP SOLUTION?

Good for....

Batch processing of very large datasets (petabytes), split across a cluster of commodity machines and analyzed in parallel.

Not good for...

Real-time processing, small datasets, algorithms that require large temporary space, and problems that are CPU-bound with lots of cross-talk between nodes.

Hadoop is an open-source framework for writing and running distributed applications that process large amounts of data.
The framework is written in Java.
It is designed to solve problems that involve analyzing large data sets (petabytes).
Its programming model is based on Google's MapReduce.
Its infrastructure is based on Google's distributed file system (GFS).
Hadoop consists of two core components:
o The Hadoop Distributed File System (HDFS), a distributed file system
o MapReduce, distributed processing on compute clusters


NameNode: manages the file system namespace (metadata) and regulates access to files by clients. The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories.
DataNode: manages the storage attached to the node on which it runs. A DataNode serves read and write requests and performs block creation, deletion, and replication upon instruction from the NameNode. There are many DataNodes, typically one per physical node.
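To make the NameNode/DataNode split concrete, here is a minimal client sketch against the HDFS FileSystem API: metadata operations go through the NameNode, while the file bytes are streamed to and from DataNodes. The path is hypothetical and error handling is omitted.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Minimal HDFS client: the NameNode handles the namespace operations,
    // the DataNodes store and serve the actual blocks.
    public class HdfsClientExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/user/demo/hello.txt");  // hypothetical path

            // Write: the NameNode allocates blocks, the client streams bytes to DataNodes.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.writeUTF("hello hdfs");
            }

            // Read: the NameNode returns block locations, the client reads from DataNodes.
            try (FSDataInputStream in = fs.open(path)) {
                System.out.println(in.readUTF());
            }
        }
    }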

Large-scale data processing:
o We want to use 1000s of CPUs
o But we don't want the hassle of managing them
The MapReduce architecture provides:
o Automatic parallelization and distribution
o Fault tolerance
o I/O scheduling
o Monitoring and status updates
MapReduce is a method for distributing a task across multiple nodes. Each node processes the data stored on that node. A job consists of two phases (sketched below):
o Map
o Reduce
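The standard word-count example is a compact way to see the two phases. The sketch below uses the org.apache.hadoop.mapreduce API; the class names WordCountMapper and WordCountReducer are the usual textbook ones, not classes defined elsewhere in this deck.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map phase: emit (word, 1) for every word in the input line.
    class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: sum the counts for each word and write the final (word, total) pair.
    class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }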


In the map phase, the mapper reads data in the form of key/value pairs and emits intermediate key/value pairs. The reducer processes all the output from the mappers, arrives at the final key/value pairs, and writes them to HDFS.
There are two types of nodes that control the job execution process:
o JobTracker
o TaskTrackers
The JobTracker coordinates all the jobs run on the system by scheduling tasks to run on TaskTrackers. TaskTrackers run the tasks and send progress reports to the JobTracker. The JobTracker runs on the master node alongside the NameNode; a TaskTracker runs on each worker node alongside its DataNode.
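A driver sketch for submitting such a job follows. It reuses the hypothetical WordCountMapper and WordCountReducer classes from the sketch above and takes input/output paths from the command line; under classic MapReduce (MRv1), the submitted job is what the JobTracker splits into tasks for the TaskTrackers.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Job driver: configures the word-count job and submits it to the cluster.
    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(WordCountMapper.class);
            job.setCombinerClass(WordCountReducer.class);  // optional local aggregation
            job.setReducerClass(WordCountReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir must not exist yet
            System.exit(job.waitForCompletion(true) ? 0 : 1);        // block until the job completes
        }
    }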

