How it Works
Apache Hadoop
A LITTLE BEE BOOK
This book belongs to:
Adapted from a blog post by Mike Ferguson
For more copies of this book, or to read others in the series, visit: littlebeelibrary.com
Apache Hadoop is a set of open source software
technology components that together form a
scalable system optimised for analysing data.
Let's look at Hadoop in a little more detail.
Think of YARN (Yet Another Resource Negotiator)
as the operating system for Hadoop. It is the
cluster management software that controls how the
computing resources are allocated to the different
applications and execution engines across
the cluster.
Hadoop handles data at scale due to the design of
four highly parallel analytics engines: MapReduce,
Tez, Spark and Storm.
The MapReduce engine runs analytic applications
in batch. To do this, application developers and data
scientists need to write applications as two distinct
program components: the map component and the
reduce component.
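
The two components can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Hadoop API: the function names (map_component, reduce_component, run_job) are hypothetical, and the grouping step that Hadoop performs across the cluster is imitated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_component(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_component(word, counts):
    """Reduce: combine all the counts that were emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Hypothetical driver: map every line, group by word, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in map_component(line):
            grouped[word].append(count)  # stand-in for the shuffle step
    return dict(reduce_component(w, c) for w, c in grouped.items())


word_counts = run_job(["big data", "Big cluster"])
```

On a real cluster the map calls run in parallel on different machines, and each reduce call receives all the values for its key. That is why the application has to be split into these two parts.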
Tez is an alternative to the MapReduce engine.
Tez does not need to write and read intermediate
files to disk. For this reason it is generally faster
than MapReduce.
Storm is an execution engine for real-time streaming
analytic applications. This means data can be
analysed as soon as it is generated and BEFORE it
is stored anywhere. Sensor data and markets data
are examples.
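
A rough sketch of the streaming idea in plain Python is shown below. It does not use Storm itself: the sensor_readings source, its values and the threshold are made up for illustration. The point is that each reading is checked the moment it arrives, without first being saved anywhere.

```python
def sensor_readings():
    """Hypothetical source: yields readings one at a time, as if live."""
    for value in [21.0, 21.5, 35.0, 22.0]:
        yield value


def detect_spikes(stream, threshold):
    """Check each reading on arrival and pass on those above the threshold."""
    for reading in stream:
        if reading > threshold:
            yield reading


# Readings flow straight from the source through the check;
# only the alerts are collected.
alerts = list(detect_spikes(sensor_readings(), threshold=30.0))
```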
If you don't want to write analytic applications in a
programming language like R, Python, Java or Scala,
you can always write Pig scripts instead.
Hive is a free SQL interface to data stored in Hadoop
HDFS files. It allows you to connect self-service BI
tools (and applications that query data using SQL) to
Hadoop Hive and then use SQL to access data.
Finally there is Search. This allows you to build
search indexes by crawling the data in HDFS. Once
the indexes are built you can then explore the data
using a familiar search interface and search queries.
Copyright IBM Corporation 2017. All Rights Reserved.
IBM, the IBM logo and ibm.com are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both.
Other product, company or service names may be trademarks or service marks of others.