Professional Documents
Culture Documents
Hadoop has been the most preferred platform for handling Big data problems that require a lot storage
and processing capacity. As data that is being retrieved from the users is also largely unstructured,
intermittent and enormous, it is difficult for traditional database architectures like Oracle Database to
process and store the data of this size.While Hadoop offers its own HDFS (Hadoop Distributed File
System) to store these large amounts data, there are various technologies readily available for us in
order to analyze the data available. One can select these technologies based on various factors like type
of data, running time, ease of use and compatibility etc.
The data at hand could be anything from sources like social media, stock exchange, networks and
sensors that is in the range of Giga bytes or more. Big data analytics is the process of finding hidden
patterns across these large sets of data that could be potentially used in favor of our business. These
patterns could be anything ranging from unfulfilled customer demands from consumer behavior data or
hidden cost reduction opportunities from production data.
HDFS (Hadoop Distributed File System):
The data (which is very large) is split into block of specific sizes and distributes them into multiple nodes
across the clusters for processing. It makes multiple copies of each data set in order to counter machine
failures. The data thus stored is available for high performance retrieval for the analyzing using our
technologies.
Data Analyzing Technologies:
While HDFS is the primary storage system used by all the Hadoop applications, we have to select one
among many available analyzing technologies to process our data. You have to chart out the code ith
tour logic for analysis and platform takes care of the rest .Some of the most used technologies with their
pros and cons is given below:
MapReduce:
Cons:
Pig:
Pros:
Cons:
Hive:
Pros:
Cons: