Big Data & Hadoop & How We Use It at Alchetron

Brief
BIG DATA & HADOOP

Alchetron.com
Free Social Encyclopedia
BIG DATA
HADOOP
HDFS
MAPREDUCE
ALCHETRON
FEEDBACKS
Q/A
To understand BIG DATA, we first have to understand data.
THIS DRAWING WAS CREATED 40,000 YEARS AGO. THIS WAS THE FIRST TIME HUMANS STARTED RECORDING DATA.
STONE TABLETS
AS TIME PASSED, WE STARTED CREATING MORE DATA, AS YOU CAN SEE IN THIS PICTURE, WHICH IS 3,000-10,000 YEARS OLD.
Johannes Gutenberg
This man invented the printing press in 1439, which means more data was collected than ever before.
100 crore (1 billion) books were printed by the 18th century, and, my dear friends, you were still not born.
THIS GUY INVENTED THE WORLD WIDE WEB
Sir Tim Berners-Lee invented the World Wide Web, which opened to the public in 1991. With the web, the amount of data generated grew rapidly.
30 years of mobile
Technology
In the next 20 years, computing will move to the microscopic level.
Computers won't be in our pockets but inside our bodies and minds.
This is where technology and biology will merge, multiplying and enhancing our capabilities a thousand times.
Technological change will be rapid and exponential.
With the invention of the internet, plus small and less expensive storage devices, data creation explodes!
Data generation statistics
2.7 zettabytes of data exist in the digital universe today.
Facebook stores, accesses, and analyzes 50+ petabytes of user-generated data.
Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
More than 5 billion people are calling, texting, tweeting, and browsing on mobile phones worldwide.
YouTube users upload 48 hours of new video every minute of the day.
In 2008, Google was processing 20,000 terabytes of data (20 petabytes) a day.
SO WHAT IS BIG DATA?
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few.
This data is big data.
Who will manage BIG DATA?
HADOOP
Open Source Apache Project
Written in Java
Runs on Linux, Mac OS X, Windows, and Solaris
Commodity hardware
Contents
History of Hadoop
The current applications of Hadoop
Hadoop HDFS + MAP-REDUCE
Other Hadoop projects
Fun Fact of Hadoop
"The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria."
- Doug Cutting, Hadoop project creator
History of Hadoop
2004: Doug Cutting reads the MapReduce paper. It is an important technique!
Extended Apache Nutch
2006: Doug Cutting joins Yahoo!
The great journey begins
Yahoo! became the primary contributor in 2006.
Yahoo! deployed large-scale science clusters in 2007.
Tons of Yahoo! Research papers emerge: WWW, CIKM, SIGIR.
Yahoo! began running major production jobs in Q1 2008.
Hadoop consists of two parts: HDFS and MapReduce.
HDFS
NameNodes and DataNodes are simply machines that help the client store data.
Metadata is stored on the NameNode, and the actual data is stored on the DataNodes.
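To make the metadata/data split concrete, here is a toy Python sketch. It is not the Hadoop API: the class `ToyNameNode`, the tiny `BLOCK_SIZE`, the round-robin placement, and the file name are all assumptions made for illustration. Real HDFS also replicates each block and lets the client stream data to DataNodes directly, which this sketch leaves out.

```python
# Toy model of the NameNode/DataNode split. All names are hypothetical;
# this is not the Hadoop API.
BLOCK_SIZE = 8  # bytes; tiny on purpose (real HDFS blocks are far larger)


class ToyNameNode:
    """Keeps metadata only: which blocks make up a file and where they live."""

    def __init__(self, datanodes):
        self.datanodes = datanodes  # datanode name -> {block_id: bytes}
        self.block_map = {}         # filename -> [(block_id, datanode name)]

    def put(self, filename, data):
        names = sorted(self.datanodes)
        placements = []
        for start in range(0, len(data), BLOCK_SIZE):
            index = start // BLOCK_SIZE
            block_id = f"{filename}#{index}"
            target = names[index % len(names)]  # simple round-robin placement
            self.datanodes[target][block_id] = data[start:start + BLOCK_SIZE]
            placements.append((block_id, target))
        self.block_map[filename] = placements

    def get(self, filename):
        # Look up block locations in the metadata, then read from the datanodes.
        return b"".join(
            self.datanodes[node][block_id]
            for block_id, node in self.block_map[filename]
        )


datanodes = {"dn1": {}, "dn2": {}, "dn3": {}}
namenode = ToyNameNode(datanodes)
namenode.put("notes.txt", b"big data needs many machines")
print(namenode.block_map["notes.txt"])  # metadata: block ids and locations
print(namenode.get("notes.txt"))        # actual data, reassembled
```

The NameNode object never holds file contents in `block_map`; it only records where each block went, which is the division of labor the slide describes.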
A TaskTracker is a daemon that runs on a DataNode. It accepts tasks (Map, Reduce, and Shuffle operations) from a JobTracker.
A JobTracker is a daemon that runs on the master node (often the same machine as the NameNode). It farms out MapReduce tasks to specific nodes in the cluster, ideally the nodes that have the data, or at least nodes in the same rack.
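The placement preference described above (a node with the data first, then a node in the same rack, then anything else) can be sketched in a few lines of Python. This is only an illustration and not the JobTracker's real scheduler: the function `pick_node`, the node names, and the rack names are hypothetical, and the sketch assumes at least one node is free.

```python
# Hypothetical sketch of data-local task placement; not Hadoop's scheduler code.
def pick_node(block_hosts, free_nodes, rack_of):
    """Prefer a free node holding the data, then one in the same rack, then any."""
    for node in free_nodes:
        if node in block_hosts:
            return node, "data-local"
    data_racks = {rack_of[host] for host in block_hosts}
    for node in free_nodes:
        if rack_of[node] in data_racks:
            return node, "rack-local"
    return free_nodes[0], "off-rack"  # assumes free_nodes is not empty


rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}

# The input block lives on n1 in every case below.
print(pick_node({"n1"}, ["n3", "n1"], rack_of))  # ('n1', 'data-local')
print(pick_node({"n1"}, ["n3", "n2"], rack_of))  # ('n2', 'rack-local')
print(pick_node({"n1"}, ["n3", "n4"], rack_of))  # ('n3', 'off-rack')
```

Moving the computation to the data, rather than the data to the computation, is what keeps network traffic low on a large cluster.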
Map-Reduce Architecture
MapReduce is basically a data processing engine.
To understand it deeply, you should have some experience coding in Java.
Let's try to learn the architecture of MapReduce.
An example
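As a stand-in for the example slides, here is a minimal word-count sketch of the map, shuffle, and reduce phases. It is written in plain Python and runs on one machine, so it only illustrates the data flow; real Hadoop jobs are usually written in Java against the Hadoop libraries. The function names and the two input lines are made up for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


lines = ["big data is big", "hadoop handles big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle_phase(pairs).items())
print(counts)
```

On a cluster, each call to `map_phase` could run on a different node against a different block of the input, and each key group could be reduced on a different node; only the shuffle step needs to move data between machines.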
Another Example code
Nowadays (as per the latest job market)
Software Developer Intern - IBM - Somers, NY +3 locations - Agile development - Big data / Hadoop / data analytics a plus
Software Developer - IBM - San Jose, CA +4 locations - include Hadoop-powered distributed parallel data processing system, big data analytics ... multiple technologies, including Hadoop
Other Hadoop Projects
Ecosystem
Hadoop Core: Distributed File System, MapReduce Framework
Pig (initiated by Yahoo!): parallel programming language and runtime
HBase (initiated by Powerset): table storage for semi-structured data
ZooKeeper (initiated by Yahoo!): coordinating distributed systems
Hive (initiated by Facebook): SQL-like query language and metastore
TYPICAL HADOOP CLUSTER HANDLING & PROCESSING PETABYTES OF DATA
1,000 TB = 1 PETABYTE (APPROX.)
Nowadays, who uses Hadoop?
Amazon/A9
Alchetron
Fox Interactive Media
Google
IBM
Facebook
Quantcast
Rackspace/Mailtrust
Veoh
Yahoo!
More at http://wiki.apache.org/hadoop/PoweredBy
Let's see how we implemented this at Alchetron
When you visit Alchetron.com, you are interacting with data processed with Hadoop!
Search Index
Organizing data
Content Filtering
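To give a feel for how a search index can come out of a map-and-reduce style job, here is a small inverted-index sketch in Python. It is an assumption-laden illustration and not Alchetron's actual pipeline: the function names `map_page` and `build_index`, the page ids, and the page texts are all made up.

```python
from collections import defaultdict


def map_page(page_id, text):
    """Map: emit (word, page_id) for every distinct word on a page."""
    return [(word, page_id) for word in set(text.lower().split())]


def build_index(pages):
    """Shuffle and reduce: collect the sorted list of pages for each word."""
    postings = defaultdict(set)
    for page_id, text in pages.items():
        for word, pid in map_page(page_id, text):
            postings[word].add(pid)
    return {word: sorted(ids) for word, ids in postings.items()}


# Made-up pages, purely for illustration.
pages = {
    "page1": "Hadoop stores big data",
    "page2": "Big data needs Hadoop",
    "page3": "Elephants are big",
}
index = build_index(pages)
print(index["hadoop"])  # ['page1', 'page2']
print(index["big"])     # ['page1', 'page2', 'page3']
```

A search for a word then becomes a dictionary lookup instead of a scan over every page.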
References
For more information:
http://hadoop.apache.org/
http://developer.yahoo.com/hadoop/
http://alchetron.com/What-is-Big-data1530-W
http://alchetron.com/Big-Data-Hadoop260-W
