
Different Hadoop Technologies for Big Data:

Hadoop has been the most preferred platform for handling Big Data problems that require a lot of storage and processing capacity. Since the data being retrieved from users is largely unstructured, intermittent and enormous, it is difficult for traditional database architectures like Oracle Database to process and store data of this size. While Hadoop offers its own HDFS (Hadoop Distributed File System) to store these large amounts of data, there are various technologies readily available for analyzing that data. One can select among these technologies based on factors like the type of data, running time, ease of use and compatibility.
The data at hand could be anything from sources like social media, stock exchanges, networks and sensors, in the range of gigabytes or more. Big Data analytics is the process of finding hidden patterns across these large data sets that could potentially be used in favor of our business. These patterns could be anything from unfulfilled customer demands found in consumer behavior data to hidden cost reduction opportunities found in production data.
HDFS (Hadoop Distributed File System):
HDFS splits the (very large) data into blocks of a specific size and distributes them across multiple nodes of the cluster for processing. It keeps multiple copies of each block in order to tolerate machine failures. The data thus stored is available for high-performance retrieval by the analysis technologies described below.
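
As a rough illustration of how an application hands data to HDFS, the sketch below uses Hadoop's Java FileSystem API to write a small file and then read back its block size and replication factor. The path /user/demo/sample.txt is a hypothetical example, and the block size and replication values are assumed to come from the cluster's own configuration files.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS, block size and replication factor
    // from core-site.xml / hdfs-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical path, used only for illustration.
    Path file = new Path("/user/demo/sample.txt");

    // Write a small file; HDFS splits large files into fixed-size blocks
    // and keeps several replicas of each block on different nodes.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.writeUTF("hello hdfs");
    }

    // Read back the metadata to see the block size and replication in effect.
    FileStatus status = fs.getFileStatus(file);
    System.out.println("block size  = " + status.getBlockSize());
    System.out.println("replication = " + status.getReplication());
  }
}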
Data Analyzing Technologies:
While HDFS is the primary storage system used by all Hadoop applications, we have to select one among many available analysis technologies to process our data. You write the code with your analysis logic and the platform takes care of the rest. Some of the most widely used technologies, with their pros and cons, are given below:

MapReduce:
Pros:

Properly written MapReduce code is very fast and efficient.
Ability to handle all kinds of data types.
Better environment for writing complex business logic.

Cons:

Rigid procedural structure and hence low levels of abstraction.
More development effort.
More lines of code (see the sketch below).
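
To make that trade-off concrete, the sketch below is the classic word count job written against the Hadoop MapReduce Java API: a mapper that emits (word, 1) pairs, a reducer that sums them, and a driver that wires the job together. The input and output paths come from the command line and are assumptions made for this example.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configures and submits the job; args[0] is the input path,
  // args[1] is the output path.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}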

Pig:
Pros:

Fewer lines of code and hence better levels of abstraction (see the sketch after this list).
Quicker and easier to develop.
Works on all data types.

Cons:

Not as fast as MapReduce.
Not the best environment for writing complex business logic.
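
For comparison with the MapReduce example above, the same word count fits in a handful of Pig Latin statements. The sketch below assumes the script is submitted from Java through Pig's PigServer class; the input and output paths are hypothetical placeholders.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigWordCountExample {
  public static void main(String[] args) throws Exception {
    // Run the Pig Latin statements as MapReduce jobs on the cluster.
    PigServer pig = new PigServer(ExecType.MAPREDUCE);

    // The same word count as the MapReduce example, expressed declaratively.
    pig.registerQuery("lines = LOAD '/user/demo/input' AS (line:chararray);");
    pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
    pig.registerQuery("grouped = GROUP words BY word;");
    pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");

    // store() forces execution and writes the result to HDFS.
    pig.store("counts", "/user/demo/wordcount-output");
  }
}

The logic that took roughly sixty lines of Java in the MapReduce sketch fits in four Pig Latin statements, which is the gain in abstraction the pros above refer to.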

Hive:
Pros:

Fewer lines of code than both Pig and MapReduce (see the sketch after this list).
Quicker and easier to develop.

Cons:

Not as fast as MapReduce.
Works only on structured and semi-structured data.
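
Since Hive exposes a SQL-like interface over structured data, the word count collapses further into a single query. The sketch below connects to HiveServer2 over JDBC; the host and port (localhost:10000) and the table docs with a single string column line are assumptions made purely for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveWordCountExample {
  public static void main(String[] args) throws Exception {
    // Register the HiveServer2 JDBC driver and open a connection.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection con = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "", "");
         Statement stmt = con.createStatement()) {

      // Hive compiles the query into MapReduce jobs behind the scenes,
      // so the whole word count is one statement over a structured table.
      ResultSet rs = stmt.executeQuery(
          "SELECT word, COUNT(*) AS cnt "
          + "FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w "
          + "GROUP BY word");
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
      }
    }
  }
}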
