
An Overview of Security of Big Data

Muhammad Irfan

Shahid Mehmood

Abstract:
Big data refers to the enormous amounts of data generated worldwide in the form of audio, video, images, signals, and more. It comprises the transactions of schools, colleges, universities, hospitals, transporters, industries, and many other organizations. This paper summarizes big data, its characteristics, Hadoop, and security in big data.

Introduction:
In today's world, human activities and nearly every aspect of society are recorded as data, actively or passively, ranging from sensor readings to traces of social activity, so that we now effectively live in a world of information. Every day, systems of all kinds produce large amounts of data, and organizations in virtually every industry are beginning to build big data application platforms. The aim is to reveal the patterns hidden within the large volumes of data an industry generates and to mine the enormous value they contain. A big data platform can monitor and forecast dynamic, changing nodes and thereby control and adjust how they operate; it can produce visualizations that reveal the inherent relationships among data within an industry and promote business value; and it can support intelligent statistics and decision making, predict events that may occur in the future, including when and where they will happen, and assist with intellectual property management. The information industry has therefore announced a major development: the era of big data wealth.[1]

Cloud Computing:
Cloud computing is one of the major research areas in both industry and academia, and many researchers are working on its open issues. The cloud comes with an explicit security challenge: the data owner might not have any control over where the data is placed. The reason for this loss of control is that anyone who wants the benefits of cloud computing must also accept the resource allocation and scheduling decided by the cloud's controls. It is therefore necessary to protect data in the midst of untrustworthy processes. Since the cloud involves extensive complexity, we believe that rather than attempting a single holistic solution for securing the cloud, it is more realistic to make noteworthy incremental enhancements that will ultimately result in a secure cloud.

In cloud computing, the word "cloud" means the Internet, so cloud computing is a type of computing in which services are delivered over the Internet. The goal of cloud computing is to make use of ever-increasing computing power to execute millions of instructions per second. Cloud computing uses networks of large groups of servers with specialized connections to distribute data processing among them. Instead of installing a full software suite on each computer, this model requires only a single piece of client software on each machine that lets users log into a web-based service, which in turn hosts all the programs the user requires. In a cloud computing system there is a significant shift of workload from the client to the cloud.

The only thing that must be done at the user's end is to run the cloud interface software that connects to the cloud. Cloud computing consists of a front end and a back end. The front end includes the user's computer and the software required to access the cloud network. The back end consists of the various computers, servers, and database systems that make up the cloud. The user can access applications in the cloud network from anywhere by connecting to the cloud over the Internet. Some real-world applications that use cloud computing are Gmail, Google Calendar, Google Docs, and Dropbox.
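
As a rough illustration of this front end/back end split, the sketch below shows a client program (the front end) reaching a cloud-hosted service over the Internet. The endpoint URL is a hypothetical placeholder, and the example assumes Java 11 or later for java.net.http.HttpClient.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CloudFrontEnd {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();

    // The back end (servers, databases) sits behind this endpoint; the client
    // only needs a network connection and the service URL.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://example-cloud-service.com/api/documents")) // hypothetical URL
        .GET()
        .build();

    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println("Status: " + response.statusCode());
  }
}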

Hadoop:
Hadoop was created to solve the problem of storing and analyzing big data. It is an open-source framework from the Apache Software Foundation, free to download, designed to store big data using HDFS (the Hadoop Distributed File System) and to process it using MapReduce. Hadoop is oriented toward batch processing and is used for storing and processing huge datasets. As a combination of HDFS and MapReduce, it is tailored for frequently occurring large-scale data processing problems that can be distributed and parallelized.
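
To make the HDFS-plus-MapReduce combination concrete, the following is a minimal sketch of the classic word-count job, assuming the standard org.apache.hadoop.mapreduce Java API; the map and reduce phases it relies on are described in more detail in the next paragraph.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each input chunk is processed in parallel, emitting (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: the framework sorts map output by key; each reducer sums the counts per word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input and output paths are HDFS locations supplied on the command line.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}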

Google introduced the MapReduce framework for processing large amounts of data on commodity hardware, and Apache's Hadoop Distributed File System (HDFS) is evolving into a leading software component for cloud computing when combined with integrated parts such as MapReduce.

1. MapReduce: A MapReduce job first divides the input data into individual chunks, which are processed by map tasks in parallel. The outputs of the maps, sorted by the framework, then become the input to the reduce tasks. Typically both the input and the output of a job are stored in a file system. Scheduling, monitoring, and re-executing failed tasks are handled by the framework.

2. HDFS: HDFS is a file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on the local nodes to form one large file system, and it improves reliability by replicating data across multiple nodes so that node failures can be tolerated.

Cloud storage: As noted earlier, the cloud brings an explicit security challenge in that the data owner might not control where the data is placed, so data must be protected in the midst of untrustworthy processes; at the user's end, only the cloud interface software is needed to connect to the cloud.
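
As a small illustration of how an application interacts with HDFS and of the replication that underpins its reliability, the sketch below writes a file and reads back its replication factor. It assumes the standard org.apache.hadoop.fs.FileSystem API; the NameNode address and file path are hypothetical placeholders.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed cluster address
    FileSystem fs = FileSystem.get(conf);

    // Write a small file; HDFS transparently splits it into blocks and
    // replicates those blocks across DataNodes.
    Path path = new Path("/user/demo/sample.txt");
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
    }

    // The replication factor (dfs.replication, 3 by default) is what allows
    // the cluster to survive individual node failures.
    FileStatus status = fs.getFileStatus(path);
    System.out.println("Replication factor: " + status.getReplication());

    fs.close();
  }
}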

Big Data:
Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications. It is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large it is difficult to handle with traditional database and software techniques. In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing capacity.

Big data has the potential to help companies improve their operations and decisions. Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers". What is considered "big data" varies depending on the capabilities of the organization managing the data set and on the capabilities of the applications traditionally used to process and analyze it in its domain. Big data is a moving target: what is considered "big" today will not be so a few years from now. For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options.[4]

[Figure: sources of big data, including enterprise data, transactions, public data, sensor data, and social media.]
Big Data Characteristics:


Big data can be described by the following characteristics:

1. Volume
2. Variety
3. Velocity
4. Variability
5. Veracity
6. Complexity

Volume:

The quantity of data that is generated is very important in this context. It is the size of the data that determines its value and potential, and whether it can actually be considered big data at all. The name "big data" itself contains a term related to size, hence this characteristic.

Variety:
The next aspect of big data is its variety. The category to which the data belongs is an essential fact that data analysts need to know. This helps the people who closely analyze and work with the data to use it effectively to their advantage, and thus upholds the importance of big data.

Velocity:

The term velocity in this context refers to the speed at which data is generated and processed to meet the demands and challenges that lie ahead on the path of growth and development.

Variability:

This is a factor that can be a problem for those who analyze the data. It refers to the inconsistency the data can show at times, which hampers the ability to handle and manage the data effectively.

Veracity:

The quality of the data being captured can vary greatly. Accuracy of analysis depends on the veracity
of the source data.

Complexity:

Data management can become a very complex process, especially when large volumes of data come from multiple sources. These data need to be linked, connected, and correlated in order to grasp the information they are supposed to convey. This situation is therefore termed the complexity of big data.[5]

NEED FOR SECURITY IN BIG DATA


Many businesses use big data for marketing and research, but they may not have the fundamental assets in place, particularly from a security perspective. A security breach involving big data can result in even more serious legal consequences and reputational damage than usual. In this new era, many companies use the technology to store and analyze petabytes of data about their company, their business, and their customers; as a result, information classification becomes even more critical. To make big data secure, techniques such as encryption, logging, and honeypot detection are necessary. In many organizations, deploying big data for fraud detection is attractive and useful. The challenge of detecting and preventing advanced threats and malicious intruders must be addressed using big data style analysis; such techniques help detect threats at an early stage through more sophisticated pattern analysis and by analyzing multiple data sources.
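
As one example of the encryption technique mentioned above, the sketch below encrypts and decrypts a small record with AES-GCM using the standard javax.crypto API. The "customer record" payload is a made-up placeholder, and key management (where the key lives and how it is rotated) is deliberately left out, since that remains one of the hard problems noted in the conclusion.

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class EncryptAtRest {
  public static void main(String[] args) throws Exception {
    // Generate a 256-bit AES key (in practice this would come from a key manager).
    KeyGenerator keyGen = KeyGenerator.getInstance("AES");
    keyGen.init(256);
    SecretKey key = keyGen.generateKey();

    // A fresh random 96-bit IV is required for each GCM encryption.
    byte[] iv = new byte[12];
    new SecureRandom().nextBytes(iv);

    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
    byte[] ciphertext = cipher.doFinal("customer record".getBytes(StandardCharsets.UTF_8)); // placeholder data

    // Decryption uses the same key and IV; GCM also verifies integrity of the ciphertext.
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
    String plaintext = new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
    System.out.println(plaintext);
  }
}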

Not only security but also data privacy challenges existing industries and federal organizations. With the increasing use of big data in business, many companies are wrestling with privacy issues. Data privacy is a liability, so companies must stay on the privacy defensive. But unlike security, privacy should also be treated as an asset, and it therefore becomes a selling point for both customers and other stakeholders. There should be a balance between data privacy and national security.

The challenges of security in cloud computing environments can be categorized into network
level, user authentication level, data level, and generic issues.
Network level:

The challenges at the network level deal with network protocols and network security, such as distributed nodes, distributed data, and inter-node communication.

Authentication level:

The challenges at the user authentication level deal with encryption and decryption techniques and with authentication methods such as administrative rights for nodes, authentication of applications and nodes, and logging.

Data level:

The challenges at the data level deal with data integrity and availability, such as data protection and the handling of distributed data.

Generic types:

The challenges at the generic level concern traditional security tools and the use of different technologies.[6]
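
As a concrete illustration of the data-level integrity concern above, the sketch below computes a SHA-256 digest of a file with java.security.MessageDigest; recomputing and comparing the digest after the data has been transferred or replicated detects corruption or tampering. The file path is a hypothetical placeholder.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class IntegrityCheck {
  public static void main(String[] args) throws Exception {
    // Read the data set (placeholder path) and hash it.
    byte[] data = Files.readAllBytes(Paths.get("/data/export/records.csv"));
    byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);

    // Render the digest as hex so it can be stored and compared later.
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
      hex.append(String.format("%02x", b));
    }
    System.out.println("SHA-256: " + hex);

    // Re-computing this digest on the distributed copy and comparing it with the
    // stored value confirms that the data has not been altered in transit or at rest.
  }
}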

Conclusion:
Information security in the big data environment is a promising field within information security. This paper has introduced the impact on information security from the two perspectives of big data and cloud computing. In general, improving system efficiency and providing general cloud storage functions while ensuring the protection of user data and access authority are the research directions for future secure cloud computing. At present, more work remains to be done on searching over encrypted data and on removing duplicate data.

Above all, there is an urgent need for improved solutions that allow users to control the use of their data, and more research should be done in this area. There is also a need for more robust approaches to the limitations of key management, which could extend traditional approaches to cloud computing.

References:
1. Research on the security technology of big data information.

2. Security issues associated with big data in cloud computing.

3. A Survey on Security of Big Data on Cloud.

4. Security issues associated with big data in cloud computing.

5. Security issues associated with big data in cloud computing.

6. Big Data on Cloud Computing- Security Issues.
