Big data

WHAT IS BIG DATA?

Big data is a holistic information management strategy that incorporates and integrates many new kinds of data and data management alongside traditional data.

Big data is defined by four characteristics:

Volume:

The amount of data. While volume indicates more data, it is the granular nature of the data that is unique. Big data requires processing high volumes of low-density, unstructured data, for example data feeds from Twitter, clickstreams on a web page or a mobile application, or network traffic.

The task of big data is to turn this low-density data into information. Volumes can range from tens of terabytes to hundreds of petabytes.

Velocity:

Velocity is the fast rate at which data is received and acted upon. Normally, the highest-velocity data streams directly into memory rather than being written to disk. Some Internet of Things applications have health and safety ramifications that require real-time evaluation and action. Other internet-enabled smart products operate in real time or near real time. For example, consumer e-commerce applications seek to combine a mobile device's location with personal preferences to make time-limited marketing offers. Operationally, mobile application experiences feature larger numbers of users, increased network traffic, and the expectation of immediate responses.

Variety:

New types of unstructured data. Unstructured and semi-structured data types, such as text, audio, and video, require additional processing both to derive meaning and to support metadata. Once understood, unstructured data has many of the same requirements as structured data, such as summarization, lineage, auditability, and privacy. Additional complexities arise when data from a known source changes without prior notice. Frequent or real-time schema changes are an enormous burden for both transaction and analytical environments.

Value:

Data has intrinsic value, but that value must be discovered. There is a range of quantitative and investigative techniques for extracting value from data: from discovering a consumer's preference or sentiment, to making a relevant offer by location, to identifying a piece of equipment that is about to fail. The technological breakthrough is that the cost of computing and storing data has decreased considerably, yielding an abundance of data on which statistical analysis can be performed in its entirety rather than on samples, as before. This advance makes much more precise and well-founded decisions possible. However, finding value also requires new discovery processes involving clever, insightful analysts, business users, and executives. The real big data challenge is a human one: learning to ask the right questions, recognizing patterns, making informed assumptions, and predicting behaviors.

Big Data introduces new technology, processes, and skills to your information
architecture and the people that design, operate, and use them. With new technology,
there is a tendency to separate the new from the old, but we strongly urge you to resist
this strategy. While there are exceptions, the fundamental expectation is that finding
patterns in this new data enhances your ability to understand your existing data. Big Data
is not a silo, nor should these new capabilities be architected in isolation.

ARCHITECTURE OF BIG DATA

Before talking about big data architecture, let us first review traditional information architecture capabilities:

Traditional Information Architecture Capabilities

Figure 1: Traditional Information Architecture Components


In the illustration, you see two data sources that use integration (ELT/ETL/Change Data
Capture) techniques to transfer data into a DBMS data warehouse or operational data
store, and then offer a wide variety of analytical capabilities to reveal the data.
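
To make this flow concrete, below is a minimal sketch of the extract-transform-load pattern just described, written in Python with an in-memory SQLite database standing in for the data warehouse; the source records and table layout are illustrative assumptions, not part of any particular product.

# Minimal ETL sketch: two sources -> transform -> load into a "warehouse" table.
# SQLite stands in for the DBMS data warehouse; sources and schema are made up.
import sqlite3

def extract():
    # Two hypothetical data sources (e.g., an orders system and a CRM export).
    orders = [{"id": 1, "customer": "acme", "amount": "120.50"}]
    crm = [{"customer": "acme", "region": "EMEA"}]
    return orders, crm

def transform(orders, crm):
    # Join the two sources and normalize types before loading.
    regions = {c["customer"]: c["region"] for c in crm}
    return [(o["id"], o["customer"], float(o["amount"]),
             regions.get(o["customer"], "UNKNOWN")) for o in orders]

def load(rows):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fact_orders (id INT, customer TEXT, amount REAL, region TEXT)")
    db.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    # An analytical query over the loaded data, as the DBMS layer would offer.
    print(db.execute("SELECT region, SUM(amount) FROM fact_orders GROUP BY region").fetchall())

orders, crm = extract()
load(transform(orders, crm))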

Adding Big Data Capabilities

The defining processing capabilities for big data architecture are to meet the volume,
velocity, variety, and value requirements. Unique distributed (multi-node) parallel
processing architectures have been created to parse these large data sets. There are
differing technology strategies for real-time and batch processing storage requirements.
For real-time, key-value data stores, such as NoSQL, allow for high performance, index-
based retrieval. For batch processing, a technique known as “Map Reduce,” filters data
according to a specific data discovery strategy. After the filtered data is discovered, it can
be analyzed directly, loaded into other unstructured or semi-structured databases, sent
to mobile devices, or merged into a traditional data warehousing environment and
correlated to structured data.
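
As an illustration of the two access patterns, the sketch below contrasts index-based key-value retrieval with a MapReduce-style batch filter, using plain Python data structures as stand-ins for a NoSQL store and a distributed data set; all names and records are illustrative assumptions.

# Real-time pattern: high-performance, index-based retrieval by key.
# A dict stands in for a key-value NoSQL store.
from collections import defaultdict

kv_store = {"user:42": {"name": "Ana", "last_seen": "2016-03-01"}}
profile = kv_store["user:42"]   # O(1) lookup by key

# Batch pattern: MapReduce-style filtering of a larger data set.
events = [
    {"type": "click", "page": "/home"},
    {"type": "error", "page": "/checkout"},
    {"type": "click", "page": "/checkout"},
]

# Map: emit a (key, 1) pair only for records matching the discovery strategy.
mapped = [(e["page"], 1) for e in events if e["type"] == "click"]

# Reduce: aggregate the emitted pairs by key.
counts = defaultdict(int)
for page, n in mapped:
    counts[page] += n

print(profile, dict(counts))   # filtered results, ready for further analysis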

Figure 2: Big Data Information Architecture Components

In addition to the new components, new architectures are emerging to efficiently accommodate new storage, access, processing, and analytical requirements.

A polyglot strategy suggests that big-data-oriented architectures will deploy multiple types of data stores. A polyglot strategy does, however, add some complexity in management, governance, security, and skills.
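
As a toy illustration of the polyglot idea, the sketch below routes each record to whichever stand-in store fits its shape; the store choices and record types are assumptions made purely for illustration.

# Polyglot persistence sketch: route each record to the store that fits it.
# Plain Python containers stand in for the actual data stores.
relational_rows = []   # stand-in for an RDBMS (fixed-schema, transactional data)
document_store = []    # stand-in for a document database (semi-structured data)
blob_store = {}        # stand-in for a file/object store (unstructured content)

def persist(record):
    if isinstance(record, tuple):      # fixed schema -> relational store
        relational_rows.append(record)
    elif isinstance(record, dict):     # flexible schema -> document store
        document_store.append(record)
    elif isinstance(record, bytes):    # opaque content -> blob store
        blob_store["blob-%d" % len(blob_store)] = record

persist((1, "order", 99.0))
persist({"tweet": "big data!", "lang": "en"})
persist(b"\x89PNG...")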

With a Lambda-based architecture, we are now able to address the fast data that might be needed in an Internet of Things architecture.
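
The Lambda pattern keeps a slow but complete batch view alongside a fast incremental speed layer, and merges the two at query time. Below is a minimal sketch of that idea; the event names and in-memory counters are illustrative assumptions.

# Lambda architecture sketch: batch layer + speed layer + serving merge.
from collections import Counter

all_events = ["sensor_a", "sensor_b", "sensor_a"]   # immutable master data set
recent_events = []                                  # events since the last batch run

def batch_view():
    # Batch layer: recomputed periodically over the full history (accurate, slow).
    return Counter(all_events)

def speed_view():
    # Speed layer: incremental counts over recent events (approximate, fast).
    return Counter(recent_events)

def query(key):
    # Serving layer: merge the batch view with the real-time delta.
    return batch_view()[key] + speed_view()[key]

recent_events.append("sensor_a")   # a new event arrives between batch runs
print(query("sensor_a"))           # -> 3 (2 from batch view + 1 from speed layer)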

The third pattern is MPP data pipelines, which allow us to treat data events in moving time windows at variable latencies; in the long run, this will change how we do ETL for most use cases.
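
A moving time window can be sketched as follows: events carry timestamps, and only the events inside the current window contribute to the aggregate. The window length and event format here are assumptions for illustration.

# Moving time window sketch: aggregate only the events inside the window.
from collections import deque

WINDOW_SECONDS = 60
window = deque()   # (timestamp, value) pairs, oldest first

def observe(ts, value):
    # Add the new event, then evict events that have fallen out of the window.
    window.append((ts, value))
    while window and window[0][0] < ts - WINDOW_SECONDS:
        window.popleft()
    # Emit the aggregate for the current window (here, a simple sum).
    return sum(v for _, v in window)

print(observe(0, 5))     # -> 5
print(observe(30, 2))    # -> 7
print(observe(120, 1))   # -> 1 (the earlier events fell out of the window)
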
Figure 3: Big Data Architecture Patterns

Unified Reference Architecture

Oracle has a defined view of a reference architecture based on successful deployment patterns that have emerged. Oracle's Information Management Architecture illustrates key components and flows, and highlights the emergence of the Data Lab and the various forms of new and traditional data collection.

Figure 4: Conceptual model for the Oracle Big Data Platform for unified information management

What is Hadoop?

Hadoop is a framework that enables the distributed processing of data across many different machines. It uses a master-slave architecture and has two main components: HDFS and MapReduce.

HDFS (Hadoop Distributed File System) is the Hadoop file system; it allows data to be stored distributed across multiple machines.
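
As a usage sketch, the following assumes the third-party hdfs Python package (a WebHDFS client) and a name node whose web interface is reachable at the URL shown; the host, port, user, and paths are all illustrative assumptions.

# Minimal HDFS interaction sketch using the third-party "hdfs" Python package.
# The name node URL, user, and paths below are illustrative, not fixed values.
from hdfs import InsecureClient

client = InsecureClient("http://namenode:9870", user="hadoop")

# Write a small file into the distributed file system.
client.write("/data/example.txt", data=b"hello hdfs", overwrite=True)

# List a directory and read the file back.
print(client.list("/data"))
with client.read("/data/example.txt") as reader:
    print(reader.read())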

MapReduce is a process that makes it possible to parallelize work on large volumes of data. It uses two functions: Map and Reduce.

Map simplifies data processing by dividing large blocks into smaller key-value pairs, sorting the data by key.

Reduce is used to combine the records that share the same key, forming the result file. The sketch below shows how Map and Reduce work together.

Using the Map process, the data is divided into smaller blocks to facilitate the processing of the information, and the blocks are sorted by key. The Reduce process then combines the blocks that share a key into a single file containing the result.
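
Below is a minimal word-count sketch of the Map and Reduce phases in plain Python; the input lines are illustrative, and in a real Hadoop job these phases would run distributed across the cluster rather than in a single process.

# Word-count sketch of the MapReduce model. In real Hadoop the map,
# shuffle/sort, and reduce phases run distributed across the cluster.
from itertools import groupby
from operator import itemgetter

lines = ["big data big", "data moves processing to data"]

# Map phase: emit a (key, 1) pair for every word.
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle/sort phase: bring all pairs with the same key together.
pairs.sort(key=itemgetter(0))

# Reduce phase: combine all pairs that share the same key.
counts = {key: sum(n for _, n in group)
          for key, group in groupby(pairs, key=itemgetter(0))}

print(counts)   # {'big': 2, 'data': 3, 'moves': 1, 'processing': 1, 'to': 1}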

Hadoop features:
 Economic: It is designed to run on low-cost equipment, forming clusters.

 Scalable: If more processing power or storage capacity is needed, just add more
nodes to the cluster, easily achieving scalability.

 Efficient: Hadoop distributes data and processes in parallel on the nodes where
the data is located.

 Reliable: It is able to maintain multiple copies of the data and automatically
redeploy tasks that fail. A key aspect of Hadoop is that instead of moving the
data to where the processing happens, Hadoop moves the processing (tasks) to
where the data is.

Advantages
 Open source, so its cost is lower than that of other data management solutions.
 It uses a distributed approach (divide and conquer).
 Scalability: no need to worry about provisioning or sizing as the number of
users grows; just add nodes.
 Ability to adapt to traditional DW environments.
 Flexibility: data can be incorporated or discarded in an agile way, even when
it is mutable or not yet well understood.
 Hadoop provides access to information and processing regardless of data type,
using different paradigms.

Disadvantages
 Potentially excessive message exchange between Hadoop nodes (replicas,
synchronization, metadata updates, data location) can slow down the network.
 Not all problems can be solved with, or translated into, MapReduce. Even then,
Hadoop can still be useful as a distributed file system through HDFS.
