International Journal of Computer Systems (ISSN: 2394-1065), Volume 03 Issue 01, January, 2016

Available at http://www.ijcsonline.com/

Emergent Trends and Challenges in Big Data Analytics, Data Mining,
Virtualization and Cyber Crimes: An Integrated Global Perspective- II
Gurdeep S Hura
Department of Mathematics and Computer Science,
University of Maryland Eastern Shore, Princess Anne, MD 21853
gshura@umes.edu

Abstract
The first part of this paper presented the state of the art in big data analytics: how the concept evolved, its
applications, available tools, limitations and current status, so that researchers and developers can understand how
this new technology can be used for new applications and for deriving new technologies, tools and frameworks. After
introducing the basic concepts of big data analytics, the paper focused on data mining techniques, which were used in
the past to offer a systematic approach to system analysis and are now widely used in data representation, data
collection and data analysis for multimedia applications. With different forms and formats of data arriving from
different sources, newer data mining techniques for the collection and analysis of huge amounts of multimedia data
need to be introduced. It is hoped that these new, efficient and formal data mining techniques will be used in big data
analytics to offer easy understanding, easy data formatting, and easy interpretation and extraction of useful
information from the collected data. We also presented various unresolved issues and problems in big data analytics
and data mining, along with challenges, possible future applications and future research initiatives.

I. ABSTRACT (PART II)

The second part of the paper presents the state of the art of the
remaining two important technologies, virtualization and data
security, that have been implemented in big data analytics.
One of the implementation phases for big data solutions is data
processing, and many methods can be used for it. Data
virtualization can provide a simple, user-friendly, visual and
dynamic representation of data. This method provides not only an
easy representation of the data but also the dynamic behavior of
data movement, and it helps to extract useful information from the
data. Virtualization tools represent the data processing workflow
in a simple way for data analysis. The paper discusses different
architectures of virtualization tools, methodologies, mainframe
virtualization, guidelines and the various abstraction tools of
virtualization that have been used in big data applications.
Business and technology professionals and practitioners are deeply
concerned about data security. Data now comes from many sources,
such as mobile devices, real-time connectivity and digital
business, and this has made it harder to protect data assets over
the Internet. Some security measures have already been implemented
in big data analytics, and future big data applications are
expected to play an increasingly important role in providing data
security. Recent efforts in data analytics have implemented
various countermeasures for data security, such as intrusion
detection, differential privacy, preventive measures,
authentication, digital watermarking, malware countermeasures and
many others. Data security becomes critical when operational
strategies must be implemented during a serious crisis. Some
organizations and professionals find it difficult to remain
competitive in the absence of data security and are adding
advanced analytics capabilities to manage privacy and security
challenges. By following this approach, they are able to build a
level of trust and confidence among clients, customers and
consumers. To reassure customers and consumers about privacy and
data security, it is important to establish a framework that not
only provides security but also evaluates and meets the needs of
the business, of big data technology and of consumers.
After a brief discussion of the role of data security in big data
applications, this paper briefly describes malicious cyber attacks
and crimes. It presents the challenges and problems associated
with creating a secure communication environment over the Internet
for big data applications, and it describes the known cyber
attacks and cyber crimes that may affect the data processing, data
mining techniques and virtualization tools of big data
applications over the Internet. With these attacks and crimes
understood, the paper presents how big data implementations
address security issues in new applications. Finally, it presents
cyber security analysis for big data applications.
II. BACKGROUND OF BIG DATA ANALYTICS AND DATA MINING (PART I):

The first part of the paper discussed the background of big data
analytics and then presented how this new technology has been
used to implement advanced data mining

techniques, virtualization frameworks and data security for
various applications such as the public sector, manufacturing,
retail, healthcare, weather and scientific applications. The paper
further described operations in big data, discussed known big data
applications, and surveyed the open source tools that have been
used to implement big data applications and various data mining
techniques.
After introducing the basic concepts of big data analytics, the
first part focused on data mining techniques, which were used in
the past to offer a systematic approach to system analysis and are
now widely used in data representation, data collection and data
analysis for multimedia applications. With different forms and
formats of data arriving from different sources, newer data mining
techniques for the collection and analysis of huge amounts of
multimedia data were introduced. The paper also briefly described
suitable data mining techniques and presented how some existing
techniques will be redefined for use in applications such as
multimedia data, social networking, scientific weather data and
many similar applications. In conclusion, it presented various
unresolved issues and problems in big data analytics and data
mining, challenges, possible future applications and future
research initiatives.
The following sections present the state of the art of the
remaining two important technologies, virtualization and data
security, that have been implemented in big data analytics.
III. DATA VISUALIZATION

A. Basic concepts and definitions of data virtualization


Data visualization is one of the important steps in data analysis;
it allows developers to present data to users in a clear and
efficient format. A data visualization technique maps data or
information into visual objects such as lines, bars, points and
similar symbols used in computer graphics [1-15, 18]. As a step in
data analysis or data science, it focuses on conveying ideas
effectively, with both aesthetic form and functionality, and on
providing insight into a sparse and complex data set by
communicating its key aspects in a more intuitive way. Developers
often fail to balance form and function because they do not design
a visualization that actually links the information.
Data visualization is closely related to information graphics,
information visualization, scientific visualization, exploratory
data analysis and statistical graphics. In the last couple of
decades, it has become an active area of research, teaching and
development for big data sets. It deals with the presentation of
articles, resources, connections, data, news, websites, mind maps,
tools, services and other data-related representations. The path
from vision to implementation should address issues such as
performance and support for enterprise-wide use of linked data
services. The vision must also include demonstrating the business
value of linked data services, for example by involving
executives, other IT teams and business end users early and often
in proof of value.

Managing big data with different characteristics, from internal or
external sources, is becoming one of the big challenges for
companies. Traditional information management technologies and
approaches provide data integration and play an important role in
most of these companies. Data virtualization software offers a
viable way to speed up integration, interpret data accurately,
derive useful information and improve decision-making in
applications that require big data from multiple source systems.
A number of data representation architectures have been used in
data virtualization tools. These tools have been used successfully
in a variety of applications and have provided solutions in a
useful and easily understandable format. The following section
discusses some of these architectures of virtualization tools,
along with their applications, features and limitations, for
representing big data and extracting useful information from it.

B. Data Virtualization Architecture Deployment Options


Data virtualization defines layered architectures for implementing
big data representation and data extraction. The layers include
the Data Abstraction Layer, the Data Services Layer, the
Globally-distributed Data Virtualization Layer and the Logical
Data Warehouse. Different virtualization frameworks have found
applications in big data analytics. The following section briefly
describes Cisco's virtualization framework and its associated
query management, which provides optimized queries over the
records and attributes that can be defined. Having identified a
virtualization tool and its query management, we then introduce
methodologies for implementing big data applications.
(i) Cisco's Data Abstraction Reference Architecture [1-2, 7-8]
This architecture consists of the following layers, which provide
a platform for building data abstraction with the data
virtualization platform for any data application.
Application Layer: It maps the Business Layer into the format in
which each data consumer (user or application) wants to consume
the data. For example, it may format the data as XML for Web
services, or create views with alias names that match the way
consumers are used to seeing their data.
Business Layer: This layer provides standard or otherwise accepted
formats for describing key business entities such as customers,
products, formatted data, financial data and related attributes.
These are defined as a set of logical views that the Application
Layer reuses across multiple consumers.
Physical Layer: This layer integrates the data sources into the
abstraction by performing value-added tasks such as name aliasing,
value formatting, data type casting, derived columns and light
data quality checks. The metadata used here is typically derived
from the physical sources.
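The three layers can be illustrated with a minimal sketch. This is only an illustration under stated assumptions and not Cisco's implementation: the source column names (`CUST_NM`, `ORD_AMT`, `ORD_QTY`), the sample rows and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw rows as they might arrive from a physical source.
RAW_ROWS = [
    {"CUST_NM": " alice ", "ORD_AMT": "120.50", "ORD_QTY": "2"},
    {"CUST_NM": "bob", "ORD_AMT": "80", "ORD_QTY": "4"},
]


def physical_layer(rows):
    """Name aliasing, value formatting, type casting, a derived column
    and a light data quality check on each source row."""
    for row in rows:
        amount = float(row["ORD_AMT"])      # data type casting
        quantity = int(row["ORD_QTY"])
        if quantity <= 0:                   # light data quality check
            continue
        yield {
            "customer_name": row["CUST_NM"].strip().title(),  # alias + formatting
            "order_amount": amount,
            "order_quantity": quantity,
            "unit_price": amount / quantity,                  # derived column
        }


def business_layer(rows):
    """Logical view shared by all consumers: total spend per customer."""
    totals = {}
    for row in physical_layer(rows):
        name = row["customer_name"]
        totals[name] = totals.get(name, 0.0) + row["order_amount"]
    return [{"customer": n, "total_spend": t} for n, t in sorted(totals.items())]


def application_layer_xml(rows):
    """Consumer-specific format: XML, e.g. for a Web service."""
    root = ET.Element("customers")
    for rec in business_layer(rows):
        node = ET.SubElement(root, "customer", name=rec["customer"])
        node.text = f"{rec['total_spend']:.2f}"
    return ET.tostring(root, encoding="unicode")


def application_layer_aliased(rows, aliases):
    """Consumer-specific format: the same view under the consumer's own names."""
    return [{aliases.get(k, k): v for k, v in rec.items()}
            for rec in business_layer(rows)]
```

Both application-layer functions reuse the same business view, which is the point of the layering: a change in the physical source names only touches `physical_layer`.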

Traditional information architectures have been developed and are
in use. However, these architectures are not flexible or agile
enough to be reconfigured when business strategies change. Data
virtualization appears to be a suitable framework for
accommodating such changes in the architecture.
(ii) Cisco data virtualization's query optimization algorithms and
techniques

These techniques provide optimized query features and options that
address the low-level details of business requirements. They are
efficient and deliver the information needed for a business
strategy in a timely manner. They have been widely accepted by a
number of industries for handling big data sets. For more details,
please refer to [1-2, 7]. The architecture discussed above
supports the following modules:
i) The Data Federation module of Cisco data virtualization
virtually integrates stored data in memory to provide a complete
view of the data and its environment without the cost and overhead
of physical data consolidation.
ii) The Data Discovery module of Cisco data virtualization
addresses data proliferation by automating the identification of
data entities and relationships. It also accelerates data modeling
so that data analysts can clearly understand how the data sets are
related and distributed.
iii) The Data Abstraction module of Cisco data virtualization
transforms complex data into a simple form so that its underlying
structures can be mapped onto common standard semantics for easy
processing and use in the application.
iv) The Data Access, Caching and Delivery module of Cisco data
virtualization improves data availability, offers flexible
standards-based data access, and supports different caching and
delivery options for different types of information consumers.
v) The Data Governance module of Cisco data virtualization
maximizes control and ensures data security, data quality and 7x24
operations.
vi) The Layered Architecture module of Cisco data virtualization
enables rapid change and offers a loosely coupled information
architecture. This rapid development tool provides the flexibility
and agility needed to accommodate changes in the requirements or
business strategies of the big data application.
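The idea behind the Data Federation module, combining sources at query time without physical consolidation, can be sketched as follows. This is a simplified illustration and not the Cisco product's algorithm; the two sources, their fields and the function names are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]
Source = Callable[[], Iterable[Row]]


# Hypothetical sources standing in for two separate systems.
def crm_source() -> Iterable[Row]:
    return [
        {"customer_id": 1, "name": "Alice"},
        {"customer_id": 2, "name": "Bob"},
    ]


def billing_source() -> Iterable[Row]:
    return [
        {"customer_id": 1, "amount": 120.5},
        {"customer_id": 1, "amount": 30.0},
        {"customer_id": 2, "amount": 80.0},
    ]


def federated_join(left: Source, right: Source, key: str) -> Iterator[Row]:
    """Hash join evaluated when the view is queried.

    The sources are read on demand and the joined rows are yielded
    one by one; no consolidated copy is written anywhere.
    """
    index: Dict[object, list] = {}
    for row in left():
        index.setdefault(row[key], []).append(row)
    for row in right():
        for match in index.get(row[key], []):
            yield {**match, **row}


def customer_invoices() -> Iterator[Row]:
    """Virtual view: defined once, computed on every call."""
    return federated_join(crm_source, billing_source, "customer_id")


rows = list(customer_invoices())
```

Because `customer_invoices` holds only the definition of the join, a change in either source is visible the next time the view is queried.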
C. Data Virtualization Implementation Methodology: [1-13, 18]
After defining the virtualization framework and query management,
we introduce the steps needed in an implementation methodology for
big data applications. The methodology has to ensure that
customers are satisfied, and it should be able to incorporate
feedback from their experience. In other words, the methodology
must offer the following features and capabilities:
Providing guidelines for identifying the objectives of big data applications effectively and efficiently
Options of verification and validation for optimal success in the applications
Options of securing maximum returns from the methodology
Flexible support for integration with any system design, development and deployment processes
Offer various internal and external resources, tools, abstract knowledge levels and easy implementation for predicting the outcomes and self-sufficiency, easy adoption and reconfigurability capabilities
The methodology follows the same concept as the software lifecycle
and includes well-defined processes. Some of the material
discussed here has been derived from [4, 8, 9, 11, 15].
The implementation methodology consists of the following
well-defined, structured processes. A number of tools and software
packages exist for each of these processes.
Design and development
Configuration management
System architecture and solution architecture
Strategy and planning management
Prototype and deployment
Integrated Testing and improvement management
For the design and implementation of data virtualization, we start
with strategy and planning management, where we need to define and
develop an appropriate framework that supports the needs and
objectives of data virtualization. The following steps are
required for its implementation:
Step I. First, we have to identify and define the data
virtualization strategies and policies: what data virtualization
will offer, its structure, its usage patterns, specific project
use cases, interfacing and other related opportunities.
Step II. Once we decide on data virtualization, we need to
identify the technical specifications and the data integration
decision tools required for its implementation. These tools define
the structure used to assess multiple data virtualization
frameworks along with their features. Once a framework is chosen,
it allows users to organize and prioritize projects by ownership,
level of difficulty, advantages, return on investment and other
performance-oriented measures.
Step III. We also need to develop an operating manual for the team
members who use and manage these tools.

Further, the technical skills and expertise of the team need to be
defined in terms of roles and responsibilities, productivity
measurement, job descriptions, training for IT professionals on
the capabilities of data virtualization, hands-on development and
configuration needs, and the knowledge transfer process, with the
aim of saving time, promoting long-run efficiency and improving
return on investment.

Davis and Eve defined some of these guidelines as the best
practices for the adoption of virtualization in their big data
application [5, 13]:

Step IV. We need to create multi-faceted training to educate a
wide range of IT staff on data virtualization. Initially, we have
to make the IT staff aware of the capabilities of data
virtualization, and thereafter provide training sessions
appropriate to the projects undertaken. Throughout the
implementation of the project, more hands-on development alongside
data virtualization experts and partner system integrators will
help IT staff gain the needed data virtualization skills. The
training can be defined in a training catalog that schedules the
sessions on a daily basis and organizes them as a set of modules.

2) Ensure that a common data model that offers consistency and
high quality is considered and implemented, in order to build
productivity and confidence among new and potential business
users.

Step V. We need to define data governance policies that list all
possible activities, including those known to be undefined,
together with the further activities that their execution triggers
as a ripple effect. The security mechanism for data
virtualization, when used over the Internet, must implement
authentication, authorization, encryption, auditing requirements,
transaction logging, configuration, deployment, etc.
Step VI. We also need to engage Composite Professional Services,
which bring a well-defined understanding of governance and data
virtualization. These services can be used to establish a suitable
set of policies for data virtualization for the IT staff to
follow, since the two have to be on the same page to take full
advantage of data virtualization, its capabilities and the
training for using it economically, efficiently and effectively.
This tool provides the structure needed to assess multiple data
virtualization opportunities relative to one another. We should
use it to help organize and prioritize the entire data
virtualization project pipeline, including project owners, level
of difficulty and potential return on investment.
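Three of the security requirements named in Step V (authentication, authorization and audit logging) can be sketched in front of a virtual view as follows. This is a minimal sketch using only the Python standard library. The user store, role table, view names and function names are all hypothetical, and encryption in transit is left out because it would normally be provided by TLS rather than by application code.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical in-memory stores; a real deployment would use a directory service.
_SALT = os.urandom(16)
USERS = {
    "analyst": {"salt": _SALT, "hash": hash_password("s3cret", _SALT), "role": "reader"},
}
ROLE_VIEWS = {
    "reader": {"sales_summary"},
    "admin": {"sales_summary", "customer_pii"},
}
AUDIT_LOG = []


def authenticate(user: str, password: str) -> bool:
    record = USERS.get(user)
    if record is None:
        return False
    candidate = hash_password(password, record["salt"])
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, record["hash"])


def authorize(user: str, view: str) -> bool:
    role = USERS[user]["role"]
    return view in ROLE_VIEWS.get(role, set())


def check_access(user: str, password: str, view: str) -> bool:
    """Authenticate, then authorize, and record the decision either way."""
    allowed = authenticate(user, password) and authorize(user, view)
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "view": view,
        "allowed": allowed,
    })
    return allowed


def query_view(user: str, password: str, view: str) -> str:
    if not check_access(user, password, view):
        raise PermissionError(f"{user!r} may not read {view!r}")
    return f"rows of {view}"  # placeholder for the real query result
```

Every request is logged whether or not it succeeds, so the audit trail also records denied attempts.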
The above was a brief description of virtualization implementation
methodologies that have been used in some projects. Although there
does not seem to be a standardized methodology, it has become
necessary to identify general guidelines for choosing a suitable
methodology for a particular big data application.

D. Guidelines for Data Virtualization Implementation [13-15, 18]:
We have to establish our data virtualization strategy and usage
policies by first understanding what data virtualization has to
offer. We also have to learn how data virtualization is used at
other organizations, including general usage patterns and specific
project use cases.
The following guidelines may be useful in deciding whether a
company should adopt data virtualization for its big data
applications.

1) Ensure that interested companies or organizations adopt data
virtualization quickly, starting with the implementation of an
intelligent storage component, and then build it into a broader
concept.

3) Ensure that we establish governance that covers how to manage
the data virtualization environment for providing shared
infrastructure and services.
4) Ensure that we create an environment that delivers the benefits
of data virtualization, allocate consulting time for business
users and offer the services.
5) Ensure that we establish performance tuning and test solution
scalability early in the development process. We may consider high
performance computing with massively parallel processing
capability to handle query performance on high-volume data and
data analysis.
6) Ensure that we take a phased approach to implementing data
virtualization, and then gradually implement its more advanced
federation capabilities.
7) Ensure that the company has prepared governance and policies
for adopting data virtualization.
8) Ensure that the company has prepared basic training and
hands-on skills for the IT staff so that they can implement
appropriate recommendations and decisions in big data projects.
The guidelines above will allow users to implement data
virtualization tools in an efficient, effective and economical
way. However, many organizations and companies are not yet showing
strong interest in using these tools for their big data
applications. One reason is the lack of evidence, case studies and
documented experience of technical and economic success.
As stated above, recent years have seen great interest and effort
in implementing big data analytics and data mining applications in
the mainframe environment. In particular, social networking,
scientific data in embedded systems, and medical and health
applications have become very popular and are encouraging
researchers and developers to explore new applications in the
mainframe environment. The following section explains how the
virtualization methodology has been implemented for mainframe
applications.
E. Mainframe Data Virtualization [9-10, 14-15]
Various surveys have concluded that about 60-70% of the world's
critical business and financial information is maintained and
manipulated on mainframe computing systems. Large amounts of
industry data are stored primarily in non-relational structures,
which makes them difficult to process and manipulate in standard
relational database management frameworks. Further, many of the
industries

are facing challenges in using big data for business intelligence,
analytics, cloud computing, mobile computing and related
initiatives. IBM's new mainframe OZ systems are helping the
above-mentioned technologies thanks to their enormous computing
and storage capacity.

features. Based on interesting results and solutions for big data
applications, some virtualization abstraction tools have been
introduced and tried in some applications. The following section
describes a utility tool that can be part of a suite of
virtualization tools.

The issues with implementing big data on mainframes include data
representation, data processing, data replication, use of servers,
connectors for point-to-point integration, data manipulation and
storage. Some of the integrated methods of data management are
expensive, and they must cope with enormous data growth and with
customers' minimum expectations for real-time data.

F. Data Virtualization abstraction tools [1, 12-15, 18]


The data virtualization tool can be used as a utility that helps
in implementing data integration processes. Recent years have seen
its use in a number of industries to create a platform for dynamic
linked data services, in which each element can be linked, browsed
and subscribed to through a unified source even though both the
data and the sources may change dynamically. It has also been used
to define layered information architectures, such as the data
abstraction layer, data service layer, globally distributed data
virtualization layer and logical data warehouse, which meet the
needs of business processes and their changes.

Some of the issues of big data can be reduced by applying data
virtualization to data scattered across the enterprise. It allows
multiple, scattered data sources to be accessed through a single
logical interface that separates the external interface from the
internal implementation, which gives a high degree of flexibility
to change. It is important to note that in data virtualization the
data does not move physically: only the metadata is used to create
a virtual view of the data sources, providing a faster, more agile
way to access and combine data from mainframe, distributed, Cloud
and Big Data sources. By default, the data virtualization solution
resides on a distributed platform such as Linux, UNIX or Windows
(LUW) systems [4].
There are different approaches to transforming the mainframe into
a data platform. One of these, taken by Rocket Software, is based
on the Rocket Data Virtualization Server (DVS), an IBM System z
data virtualization solution that maintains mainframe connectivity
and integration. It contains all the components needed for
real-time, universal access to data, regardless of location or
format. By reducing the complexity of mainframe data integration,
it eliminates redundant point-to-point integration and improves
performance, scalability and manageability. It unlocks the value
of mainframe data by transforming non-relational mainframe data
into a relational format that can be used by business intelligence
and business analytics applications.
The development environment simplifies data discovery, mapping and
the creation of virtual tables. Standards-based connectivity
ensures secure, reliable integration from any platform or data
source, with access to mainframe databases and programs as well as
non-mainframe data and application sources. A high-performance,
multi-threaded, z/OS-resident runtime delivers highly scalable,
low-cost data virtualization, with up to 99% of its processing
running on the mainframe zIIP specialty engine. The mainframe data
virtualization solution thus lets users or applications access any
type of data and data provider, independent of the format or
location of the data [4].
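The transformation of non-relational records into a relational format can be illustrated with a small sketch. It makes no claim about how Rocket DVS works internally: the fixed-width layout, the field names and the sample records are hypothetical, and for simplicity the rows are copied into an in-memory SQLite table rather than mapped virtually.

```python
import sqlite3

# Hypothetical fixed-width record layout: (field name, start offset, length, type).
LAYOUT = [
    ("account_id", 0, 6, int),
    ("customer_name", 6, 10, str),
    ("balance_cents", 16, 8, int),
]

# Hypothetical flat-file records, built with format specs so the widths are exact.
RECORDS = [
    f"{42:06d}{'ALICE':<10}{12050:08d}",
    f"{43:06d}{'BOB':<10}{8000:08d}",
]


def parse_record(line: str) -> dict:
    """Slice one fixed-width record into typed, named fields."""
    row = {}
    for name, start, length, cast in LAYOUT:
        raw = line[start:start + length].strip()
        row[name] = cast(raw)
    return row


def load_as_table(records) -> sqlite3.Connection:
    """Expose the parsed records as a relational table that SQL tools can query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "account_id INTEGER PRIMARY KEY, customer_name TEXT, balance_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (:account_id, :customer_name, :balance_cents)",
        [parse_record(r) for r in records],
    )
    return conn


conn = load_as_table(RECORDS)
total = conn.execute("SELECT SUM(balance_cents) FROM accounts").fetchone()[0]
```

Once the layout metadata is in place, an analytics application only sees the `accounts` table and ordinary SQL, not the record offsets.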
The above was a brief discussion of virtualization architectures,
frameworks and methodologies for implementing data virtualization
in big data analytics for various big data applications on
different platforms, including the mainframe. These virtualization
frameworks are very powerful and have become the basis for
developing tools that can provide data analysis, data extraction,
data interpretation for useful information and many more

Data abstraction plays an important role in reducing the gap
between business needs and the source data's original form and
format. Implementing this practice on a data virtualization
platform provides the following features and benefits:
Offers simple information access
Offers a common business view of the data applications via an enterprise information model
Offers more accurate data
Offers consistent security rules on data across all data sources and consumers via a unified security framework
Provides end-to-end control to manage consistency across multiple sources and consumers
Supports business and IT change insulation, so that changes can be adopted and physical data sources relocated without impacting information users
Work on developing more utilities for abstraction tools is
continuing, and a number of new tools are being used in some
existing applications.
The above was a brief discussion of various data mining and
virtualization techniques that are being used, or can be used, in
big data analytics. Both play a crucial role in providing big data
solutions. Data mining is expected to play an even more vital
role, not only in mining and representing big data in a simple,
readable and friendly manner, but also because its predictive
analysis and its ability to assess complex data will enable new
data analysis techniques for extracting and interpreting the data
in a useful way. Virtualization, in turn, helps data mining-based
analysis to be more accurate and makes the outcome of processed
data easier to understand. It offers an efficient and effective
method of extracting and interpreting the data.
Most big data applications use the Internet for transmission,
processing and analysis, and as such it is expected that the
communication environment over the

Internet should be highly secure, confidential and dependable. The
following section introduces a number of malicious cyber attacks
and cyber crimes that may affect big data applications in deriving
and deploying their solutions. To understand how malicious attacks
work, we first present the attacks and crimes, then examine how
they affect the normal working of big data analytics and what
consequences they have for our implementation, and then present
techniques by which these attacks can be prevented. Finally, we
present the cyber crime analysis used by law enforcement agencies
for legal investigations.

and challenging issues for network professionals and developers:
how to ensure that the Internet is a safe communication
environment? A number of investigation processes, along with the
needed tools, have been introduced and are being used by law
enforcement agencies, organizations and government agencies at the
Federal, State and County levels. The investigation process
generally requires a dedicated team that investigates any Internet
crime in its organization using well-defined steps and tools.

G. Cyber Crime, Cyber Attack and Crime Analysis [16-21]


As discussed above, big data analytics uses data mining and
virtualization in implementing and predicting the solutions of big
data applications. Data analytics also supports data security, and
some of the above applications show how data analytics has
explored intrusion detection, differential privacy, digital
watermarking, data integrity, filtering, firewalls and malware
countermeasures. To explain the basics of data security, this
section gives a brief introduction to the basic components of a
computer, data processing algorithms, interconnectivity with
networks and the Internet, and secure communication over the
Internet for various applications. It then discusses how a
computer or any mobile device connected to the Internet can be
used as a tool for malicious attacks and cyber crimes, and the
countermeasures that can be implemented to protect computer
resources. It also summarizes the preventive measures that have
been used in data analytics in one way or another, while new
countermeasures are still being investigated and implemented.

Data processing on computers plays an important role in
various business sectors, government agencies, private and
corporate sectors and many other organizations.
Transactions, banking, corporate records, various
activities in government agencies, and other areas all
depend on computers, information security and the Internet.
A computer that is susceptible to external attacks leaves
the auditor with the task of verifying the accounts, and it
can be operated from a distance using different forms of
communication over the Internet. The losses from computer
crime cannot be established without a clear understanding
of what such crime entails and an accurate record of its
occurrence. Governments and corporations should define a
process to be followed in the event of any unacceptable
computer-related illegal activity. This may in turn have an
effect on the degree to which computer abuse is reported.

A computer can be defined as consisting of five main
components: input (which converts data and instructions
from human-readable to machine-readable codes), the central
processing unit (which controls and coordinates the machine
and the data based on its operating instructions, or
program, also known as software), software (which is
qualitatively different in that it governs how these data
are processed), the logic and memory units (which perform
calculations, decision-making and storage functions in
response to commands from the control unit), and the
output unit (which converts processing results back into
human-readable language or symbols).
Virtually all these components of a computer system
are vulnerable to invasion and abuse. Data can be changed
at input; operations and systems programmers can
manipulate data and software; transmission of data over
common carrier lines can be tapped; and both authorized
and unauthorized users can interfere with computer
operations at terminals.
The Internet was made available to the public in the early
nineties, and since then we have seen a variety of
applications such as e-mail, file transfer, remote login,
Internet access, browsers, distributed computing,
communication (audio, text, pictures, images, attachments),
on-line shopping, on-line financial transactions, etc. At the
same time, Internet crime has become one of the most serious

IV. DATA PROCESSING IN COMPUTERS

As with the automobile, the criminal use of computer
technology has increased the vulnerability of the
community, and to the extent that the definition of crimes
and the enactment of prohibitions are directed to the
protection of the community, computer technology is a
legitimate area of penal concern. Laws must not only
enable the redress of wrongs or the punishment of the
wrongdoer, they must also proscribe conduct; the
complexity of the means for misconduct afforded by
computer technology merits its special treatment.
When the computer is used as an instrument of crime,
we see familiar landmarks for identifying the conduct as
criminal, as when it is used as a metaphorical weapon
against a financial institution. But when the computer is
the object of crime, the crime is not limited to theft of the
computer itself; it also includes things that have
substantial value but are not tangible and whose legal
status is unclear. For example, the information stored in a
computer can be retrieved and misused without damage to the
computer and without the owner's knowledge. So great is the
capacity of a computer and so valuable are its services that
its use even for short periods of time can be worth a lot.
The degree to which these intangibles can or should be
protected is a significant issue for the law. This is what
is at stake when computer crimes take place.
The Internet plays a very important and crucial role in
the business community around the globe through its on-line
capabilities; according to one survey, there are over
80 million .com domains. Further, on-line shopping sales
over the Internet exceed a trillion dollars annually.
Cyber-attacks have been used as a way of achieving criminal
or political advantages. On the other hand, cyber crime is
intended to steal personal and


sensitive information from our computers that are
connected to the Internet. The following explains the basic
definition of cyber terrorism and the types of cyber
malicious attacks and cyber-crimes that have an effect on the
implementation of big data solutions and big data analytics
as a whole.
The following section describes the known cyber-attacks
and cyber-crimes, and readers may find it useful to have
them all collected in this chapter. Some of these attacks
and crimes may not apply to big data analytics directly or
indirectly, but together they provide a good survey.
A. Cyber Terrorism
This crime is caused by terrorist activities such as the
intentional use of computers and networks for large-scale
disruption of computer networks over the Internet, by means
of tools such as computer viruses, to cause destruction and
harm for personal objectives. Many minor incidents of
cyber terrorism have been identified and documented.
Another way of looking at cyber terrorism is by analogy:
terrorism creates terror in people's minds, and when similar
terror is created over the Internet, it is known as
cyber terrorism. In some other publications, different
names have been used, such as cybercrime, cyberwar,
terrorism and related names. Cyber terrorism deals with the
use of electronic means to attack computers and information
over the Internet.
B. Cyber Malicious Attacks
Cyber-attacks have been defined as a means of making
a system useless in order to gain criminal or political
advantages. There are many types of cyber-attacks; those
recognized in the literature as the main cyber-attacks are
discussed below. For details, please refer to [16-21].
i) Virus
One of the ways of transmitting malicious code into a
computer is via a virus. A virus is defined as
self-replicating code embedded within another program,
known as the host. It is a small program that is designed to
spread from one computer to another, interfere with computer
operation and leave infections. It can disrupt the operation
of hardware, software and files stored on computers. In
general, viruses are attached to an executable program, and
these malicious programs will not affect a computer until
they are run, opened or executed; they can be spread by
human beings sending them via e-mails or attachments within
e-mails.
Let us see how a virus works. When a user tries to execute
a host program that has been infected by a virus, the virus
code embedded in it executes. It tries to find another
executable program in the computer's file system. Once it
finds one, it replaces that program with a virus-infected
copy. After this action, the virus allows the host program
to execute. Viruses are commonly spread via e-mail
attachments. The virus program occupies disk space,
consumes CPU power and can affect the computer's file
system and any personal information stored there. A large
number of commercial antivirus software packages are
available that can detect and destroy viruses before they
can cause any damage to the computer. We have to be careful
while installing antivirus software packages, as our
computers may be infected by fake antivirus applications
that route the packets for any application we want to use
through their own intermediate servers.
ii) Worm
A worm is defined as a self-contained program that looks
for security weak points or holes and uses them as entry
points to spread into a computer. It is a small program
similar to a virus and is considered a sub-class of virus.
Unlike a virus, a worm spreads from computer to computer
without the help of a human being, travelling via file or
information transport mechanisms. One notable feature of a
worm is that it replicates itself after execution, and as
such it can send a large number of replicated files from one
computer to another.
A worm can send a copy of itself to any email address
and, after travelling to another computer, replicate itself
into a number of copies on that computer, and so on. It
consumes significant memory and network bandwidth and
affects the working and functioning of web servers in one
way or another. Recovering from a worm can be time-consuming
and tedious, as the IT department has to defend computers
from further attacks, investigate the computers that have
been affected, install patches, clean the computers and
bring them back onto the Internet.
This worm was launched in April 2004 and uses the
same method of locating security weak point or hole for
entering into computers. Its effect is rather minimal
compared to other worms in the sense that the infected
computers shut down after booting.
iii) Instant messaging worm

This type of worm targets instant messaging systems and
did not have much effect in 2001 (when it was launched).
But now, with over 800 million users of instant messaging,
the effect of this worm has become greater, as infected
computers may not provide Microsoft instant messaging
services until appropriate patches are installed.
iv) Conficker:
This type of worm was launched in Nov 2008 on Windows
computers and has the unique feature of propagating through
computers in several different ways. Different variants of
this worm have been introduced since Nov 2008. The latest
version of the worm looks for computers with weak password
protection and is able to propagate through USB flash memory
devices and shared files on local area networks. Current
security measures are strong enough to keep the effect of
this worm on computers to a minimum.
v) Cross-site Scripting:
In this type of attack, a client-side script is injected
into a web site. When a user accesses that web site, the
user's browser executes the script, which can record the
presence of any cookies and the user's activities, or
perform any other actions defined in the script.
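As an illustration of how script injection of this kind can be blunted, the following minimal sketch escapes user-supplied text before it is placed in a page. It is not taken from the paper: the function name render_comment, the sample payload and the address attacker.example are hypothetical, and only the standard-library call html.escape is assumed.

```python
import html


def render_comment(comment: str) -> str:
    """Return an HTML fragment in which the user-supplied text is escaped."""
    # html.escape turns <, >, & and quotes into HTML entities, so an injected
    # <script> tag is shown as plain text instead of being run by the browser.
    return "<p>" + html.escape(comment) + "</p>"


# Hypothetical injected input that tries to send cookies to another site.
malicious = (
    '<script>document.location="http://attacker.example/?c="'
    "+document.cookie</script>"
)
safe_fragment = render_comment(malicious)
print(safe_fragment)
```

Escaping at output time is only one possible countermeasure; it is shown here because it maps directly onto the description above, where the browser executes whatever script the page contains.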
vi) Drive-by-downloads:


Many genuine or legitimate web sites have been infected
by software that causes other (unneeded) software to be
downloaded; this is known as a drive-by download. In some
cases, a user working on a web site may see another window
pop up asking for permission to download the software. The
user may consider this a part of the web site he/she is
visiting. According to the Google Anti-Malware Team, more
than 300 million URLs initiate drive-by downloads.
vii) Trojan Horse
This crime is caused by a small program that steals
passwords to online games, changes icons on the desktop,
deletes files, or destroys information stored on the
computer. Sometimes it also performs actions unknown to the
user. This program also creates a backdoor on our systems
through which intruders can access all the information,
including confidential information. This type of program
neither reproduces by infecting other files nor
self-replicates.
viii) Backdoor Trojan
This program allows the attacker to gain access to the
user's computer. It gives the impression that it is cleaning
malware from the computer, but it is actually installing
spyware.
ix) Spam
One of the most powerful applications of the Internet has
been e-mail, and it is estimated that over one billion e-mail
accounts around the globe are active. It is also estimated
that over 300 billion e-mail messages are sent over the
Internet per day. Spam displaces legitimate e-mail messages
and creates an environment of suspicion in which users must
guess which messages are genuine. It is spread across
different types of networks through programs specifically
designed to search for computers that have poor security and
are connected to the Internet. About 90% of spam is
communicated via bot herders. Based on a number of surveys,
the amount of spam is increasing at an alarming rate; in
2009, over 90% of e-mails over the Internet turned out to be
spam.
Spam wastes a significant amount of processing, Internet
bandwidth and mail server storage, which amounts to lost
productivity worth billions of dollars. A number of spam
filters have been introduced for Internet Service Providers
(ISPs) that block spam from reaching users' mailboxes.
x) Phishing and spear phishing
This attack is intended to gain access to computers
and retrieve personal information and other sensitive
files. In this type of attack, an attacker makes use of a
botnet to send e-mails to a large number of users. The
sender address of this type of mail looks genuine, and the
mail advises recipients to provide requested information
such as login name, password and other personal information.
This information is then used for identity theft. The number
of phishing attacks is increasing every year.
Spear phishing is another form of phishing attack in
which the attacker selects a particular category of
recipients for stealing their personal information. Such
groups may include elderly people, retired people, etc.
xi) SQL Injection
This type of attack targets web applications that are
driven and maintained by databases. The attacker accesses
the application and tries to insert an SQL query into a text
input field. The database then returns the targeted personal
information via a string in response to the SQL query.
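The mechanism described above can be sketched with an in-memory SQLite database. This is an illustrative example only, not the paper's own code: the table users, its contents and the function names lookup_unsafe and lookup_safe are hypothetical. The first function pastes the user's text into the query string, while the second passes it as a bound parameter.

```python
import sqlite3

# Hypothetical table of personal records, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "111-11-1111"), ("bob", "222-22-2222")],
)


def lookup_unsafe(name: str):
    # Vulnerable: the text typed by the user becomes part of the SQL itself.
    query = "SELECT name, ssn FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_safe(name: str):
    # Parameterized: the driver treats the input strictly as a value.
    return conn.execute(
        "SELECT name, ssn FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"       # text an attacker might type into a form field
leaked = lookup_unsafe(payload)  # the WHERE clause is now always true
blocked = lookup_safe(payload)   # no user is literally named like the payload
```

With the injected text, the unsafe query returns every row in the table, whereas the parameterized query returns none. Parameter binding is a common countermeasure, offered here as one assumption about how such an application might be hardened.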
xii) Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS)

This type of cyber-attack is a politically motivated attack
that takes place between computers with a view to
undermining various features of Internet communications
such as integrity, confidentiality, security measures,
availability, critical vulnerable infrastructures, etc. These
attacks are typically initiated by government agencies,
terrorist organizations, and other politically motivated
groups with a view to infecting opponents' infrastructures
and confidential policy documents.
Denial-of-Service (DoS) attacks involve flooding a
computer with more requests than it can handle. This causes
the computer (e.g. a web server) to crash and results in
authorized users being unable to access the service offered
by the computer. The attackers usually target the web
servers of banks, credit card gateways, root name servers,
businesses, corporations, and many others.
A denial-of-service attack is characterized by an
explicit attempt by attackers to prevent legitimate users of a
service from using that service. There are two general
forms of DoS attacks: those that crash services and those
that flood services.
A DoS attack may include execution of malware that
creates the following effects on the services offered by
hosts over the Internet: using up all the processing
capability of the processors, thus preventing any work from
occurring; triggering errors in the microcode of the
machine; triggering errors in the sequencing of
instructions, forcing the computer to behave abnormally;
exploiting errors in the operating system; crashing the
operating system; etc.
There exist different types of such attacks, such as the
TCP/IP SYN attack, PING of Death, flooding a server with URL
requests, etc. In TCP/IP, a handshake protocol is used to
establish a connection between client and server: the client
requests, the server acknowledges and waits, and then the
client acknowledges before the transmission of data; a SYN
attack abuses this handshake by leaving the server waiting.
In a PING-based attack, many clients send PING requests to a
server and cause significant traffic (the classic PING of
Death instead sends a malformed, oversized PING packet).
When a server is flooded with URL requests, one client or
multiple clients may be making requests at the same time;
with multiple clients this is a distributed
denial-of-service (DDoS) attack (usually in financial
sectors).
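The handshake described above can be modelled with a toy table of half-open connections, to show why requests that are never acknowledged crowd out legitimate clients. This is a simplified sketch rather than a real TCP implementation: the class HandshakeTable, its backlog limit and the client names are hypothetical.

```python
class HandshakeTable:
    """Toy model of a server's table of half-open TCP connections."""

    def __init__(self, backlog: int):
        self.backlog = backlog   # maximum number of half-open connections
        self.half_open = {}      # client id -> handshake state

    def on_syn(self, client: str) -> bool:
        """Client requests a connection; the server acknowledges and waits."""
        if len(self.half_open) >= self.backlog:
            return False         # table full: the new request is refused
        self.half_open[client] = "SYN_RECEIVED"
        return True

    def on_ack(self, client: str) -> bool:
        """Client acknowledges; the handshake completes and the slot is freed."""
        return self.half_open.pop(client, None) is not None


server = HandshakeTable(backlog=3)

# The attacker sends requests from spoofed addresses and never acknowledges.
for i in range(3):
    server.on_syn(f"spoofed-{i}")

# A legitimate client now finds the table full and is turned away.
legit_accepted = server.on_syn("legitimate-client")
```

Real servers also expire half-open entries after a timeout and may use other defences; those are deliberately left out to keep the model small.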
Cyber-crimes
We discuss different types of cybercrime here and
summarize the key areas of online criminal activity in order
to outline the types of crime we are dealing with,


and to seek to place them in an appropriate context in
which their impact can be judged.
The following section describes each of the cyber-crimes
that affect the data analysis, virtualization and data
mining techniques applied to big data applications.
i) Malware
A significant security weakness of unencrypted Wi-Fi
networks can be found through an extension of Firefox, one
of the popular browsers. Security weaknesses can also be
found in computers themselves: malicious software, known as
malware, can penetrate their security measures and consume a
significant amount of CPU, occupy a large amount of disk
space, and destroy valuable data in file systems. Once
attackers have access to our computers via malware, the
computers can be used as storage for stolen credit card
information or as a launch pad for transmitting spam and for
denial-of-service attacks on other servers.
ii) Salami technique
This automated crime consists of stealing small amounts
of assets from a large number of sources without noticeably
reducing the whole. It typically involves making alterations
to records at financial institutions, banks or other
organizations. This type of crime is usually committed via a
series of many small actions that add up to a larger one,
which makes it difficult to detect; a single larger action
would be more readily recognized as unlawful. One way this
crime can be committed in the financial sector is to take a
small amount, like a penny, by rounding off figures, and so
accumulate a large amount over a period of time. The small
actions can be automated so that the collection of small
amounts is performed automatically, and the technique can be
applied in other settings such as the publication sector,
the film industry, television, and other similar systems.
This crime is based on the concept of divide and
conquer, a process of threats and alliances used in a
variety of applications, e.g. business, organizations,
politics, etc. Let us take the example of a bank where the
processing of interest is changed in such a way that the
calculated amounts are rounded to the nearest integer value
for all accounts. The automated program collects the
fractions left over after rounding off and funnels them to
the intruder. It is very likely that this program will not
be detected, as the small amount of interest lost by each
account will not be noticeable.
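The arithmetic behind the rounding example can be made concrete with a short worked sketch. It is an illustration under stated assumptions, not the scheme of any real system: the balances, the 1.25% rate and the function name credit_interest are hypothetical, and the sketch truncates to whole cents rather than rounding to integers.

```python
from decimal import Decimal, ROUND_DOWN


def credit_interest(balances, rate):
    """Return (amounts credited, total skimmed) for a tampered interest run."""
    cent = Decimal("0.01")
    credited = []
    skimmed = Decimal("0")
    for balance in balances:
        exact = balance * rate                            # interest actually due
        paid = exact.quantize(cent, rounding=ROUND_DOWN)  # customer sees whole cents
        skimmed += exact - paid                           # sub-cent remainder diverted
        credited.append(paid)
    return credited, skimmed


# Hypothetical account balances and interest rate.
balances = [Decimal("1000.37"), Decimal("2500.99"), Decimal("812.45")]
credited, skimmed = credit_interest(balances, Decimal("0.0125"))
```

Each account loses less than one cent, which is why no single customer is likely to notice. Across three accounts the diverted total is only about 1.26 cents, but the same run over millions of accounts and many interest periods accumulates into a substantial sum.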
iii) Scavenging
This crime consists of obtaining information that may be
left in or around a computer system after it has been used
for a job. Time-sharing computers store and retrieve data on
different memory devices such as tapes, where a scavenging
job may enter a small amount of data and then read the
entire tape, including data left by a previous job. Code
numbers, passwords and encryption devices may be used to
prevent any unauthorized use.
iv) Denial-of-service
We have discussed this above as an attack, but in some
books and publications it is also considered a cyber-crime.
It consists of any attempt to make machine or network
resources unavailable to their intended users. It usually
consists of interrupting or suspending, temporarily or
indefinitely, the services of hosts that are connected to
the Internet. The effects of this crime include: consumption
of computational resources, such as bandwidth, memory, disk
space, or processor time; disruption of configuration
information, such as routing information; disruption of
state information, such as unsolicited resetting of
Transmission Control Protocol (TCP) sessions; disruption of
physical network components; and obstruction of the
communication media between the intended users and the
victim so that they can no longer communicate adequately.
In general, this type of crime is implemented by forcing
the targeted computer(s) to reset, by consuming their
resources so that they can no longer provide their intended
service, or by obstructing the communication media between
the intended users and the victim so that they can no longer
communicate adequately. This crime violates the proper
use policy defined by the Internet Architecture Board (IAB)
and also the acceptable use policies defined by Internet
Service Providers. It also violates the laws of some
countries.
v) Financial crime
These crimes include cyber cheating, credit card fraud,
money laundering, hacking into financial institutions and
banks, accounting scams, computer manipulation, etc.
vi) On-line gambling
There exist a large number of web sites that offer
online gambling. It is interesting to note that some
countries have made these web sites legal, and as such
online gambling is considered legal and safe there. The
owners of these web sites are licensed and hence can operate
these activities lawfully in those countries.
vii) Intellectual property Crimes
These crimes include software piracy, copyright
infringement, trademark violation, theft of programs and
source code, and other intellectual property violations
(music, poems, inventions, etc.).
viii) Forgery
This crime involves counterfeit currency notes,
academic certificates, mark sheets and revenue stamps that
are created using computers, printers, scanners and
associated software.
ix) Sale of illegal articles
This crime consists of selling illegal items such as
illegal drugs, narcotics, weapons, pornographic materials
and wildlife, or information about the availability of these
items and other illegal articles, over the Internet via
postings on auction web sites, bulletin boards, and other
similar web sites.
x) Cyber pornography
This crime consists of creating pornographic web
sites, or posting pornographic magazines on bulletin boards


or any other web sites over the Internet through any
computing device.
xi) Email bombing
This crime consists of sending a large number of e-mails
to a selected target or server (e.g. a company's email
server, an Internet service provider, etc.) in order to
crash it.
xii) Email spoofing
In this crime, an email looks as if it originates from a
known source when it has in fact been sent from another
source. In other words, the sender's address has been
captured by attackers, who in turn use it to send their own
messages.
xiii) Cyber defamation
This crime consists of defamation or slander via digital
media, namely computers and the Internet, harming the
reputation of an individual, business, product, service,
organization, government, religion, culture, nation,
invention or family, including criticism made without any
evidence and any other form of defamation. It is not a
specific offense, misdemeanor or tort. Different countries
have different laws and punishments for this crime, but the
fundamental rights involved are defined in the UN
Declaration of Human Rights and also in Fundamental Human
Rights (European Union).
xiv) Cyber Stalking
Cyber stalking can be defined as a technologically-based
"attack" on someone who has been targeted specifically for
that attack for reasons of anger, revenge or control. This
crime makes use of the Internet, e-mail and other digital
media and devices for harassment, embarrassment and
humiliation of the victim, ruining the victim's credit
score, harassing family, friends and employers to isolate
the victim, scare tactics to instill fear, identity theft,
threats, vandalism, solicitation for sex, collecting
information that may be classified as harassing or
threatening, false accusation and similar acts. Stalking may
be offline or online, and both are criminal offenses.
Cyberstalking may be considered a form of cyberbullying, and
the two terms are often used interchangeably, as both
involve more or less the same set of activities.
Stalking is a continuous process consisting of a series
of actions, each of which may be entirely legal in itself. It
can be considered a form of mental assault and harassment
in which the attacker repeatedly, unwantedly and
disruptively breaks into the machine of a victim with whom
he has no relationship, with motives that are directly or
indirectly traceable to the affective computing
environment. It is important to know that cyberstalking is
slightly different from cyber trolling, as the former
involves persistent and harmful action while the latter is
mainly perceived to be harmless. It is interesting to note
that cyberstalking, if used for scrutinizing public figures
such as politicians, business people, actors, etc., can be
considered lawful.
xv) Web defacement
This crime is caused by an attack on a website over the
Internet with a view to changing the visual appearance of
the site or a webpage. The attackers are able to break into
the web server and replace the web site's appearance with a
web page of their own design. It has been seen that
religious, government and corporate sites are the primary
targets of attackers seeking to advance their religious and
political views and beliefs. After defacement, these sites
are forced to shut down for repairs, which results in loss
of profit and value and additional expenses for recovery.
xvi) Email bombing
This crime is caused by sending a large number of emails
to the victim's email address or to the mail servers of
organizations, universities, government agencies or even
Internet service providers. The main purpose behind this
crime is to overflow a mailbox or overwhelm a mail server so
as to cause denial of service. This type of crime can be
accomplished by three methods: mass mailing, list linking
and zip bombing. In mass mailing, duplicate mails are sent
to the same email address; such attacks are easy to design
and implement. This crime can also be carried out as a
denial-of-service type of attack that uses malware to take
over clusters of computers and programs the resulting
zombie botnets to send emails to the target addresses
continuously. This form of email bombing is similar in
purpose to other distributed DoS flooding attacks. As the
targets are frequently the dedicated hosts handling the
website and email accounts of a business, this type of
attack can be just as devastating to both services of the
host. It is more difficult to defend against than a simple
mass-mailing bomb because of the multiple source
addresses and the possibility of each zombie computer
sending a different message or employing stealth
techniques to defeat spam filters. Fortunately, some of
these crimes can be controlled by filters and firewalls.
In list linking, a particular email address is signed up
to a number of email list subscriptions. The victim then
has to unsubscribe from these unwanted services manually.
In order to prevent this type of bombing, most email
subscription services send a confirmation email to a
person's inbox when that email address is used to register
for a subscription. This method of prevention can be
circumvented, however, by creating a new email account that
is set to automatically forward all mail to the victim.
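The confirmation step used by subscription services can be sketched as a token check: an address is added to a list only after a token mailed to that address comes back. This is a minimal illustration, not a description of any particular service: SECRET_KEY, make_token, confirm and the subscribers set are all hypothetical names, and the actual sending of mail is omitted.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-service secret
subscribers = set()                   # addresses that completed confirmation


def make_token(email: str) -> str:
    """Token that would be mailed to the address; only its owner sees it."""
    return hmac.new(SECRET_KEY, email.encode(), hashlib.sha256).hexdigest()


def confirm(email: str, token: str) -> bool:
    """Subscribe the address only if the mailed token is presented back."""
    if hmac.compare_digest(make_token(email), token):
        subscribers.add(email)
        return True
    return False


# An attacker who never sees the victim's inbox cannot guess the token.
attacker_result = confirm("victim@example.org", "guessed-token")

# A genuine subscriber clicks the link containing the token mailed to them.
owner_result = confirm("reader@example.org", make_token("reader@example.org"))
```

The check limits list linking to one confirmation message per attempt, but the confirmation messages themselves still reach the victim, so it reduces the attack rather than removing it.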
Zip bombing is a variant of mail bombing that exploits
the checking of mail by anti-virus software, which looks for
file types that may carry malicious content. Such file
types include EXE, RAR, Zip, 7-Zip and many others. These
files are usually compressed. In order to read the contents,
they need to be unzipped or uncompressed, and this activity
consumes a significant amount of processing, which may cause
a denial-of-service type of attack.
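One way a mail scanner might guard against this is to inspect the sizes recorded in the archive before decompressing anything. The sketch below is an assumption about how such a check could look, not a description of any anti-virus product: the function names and the thresholds max_ratio and max_total are hypothetical, and only the standard zipfile module is used.

```python
import io
import zipfile


def looks_like_zip_bomb(data: bytes, max_ratio: float = 100.0,
                        max_total: int = 50 * 1024 * 1024) -> bool:
    """Inspect archive metadata without extracting any member."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        total = 0
        for info in archive.infolist():
            total += info.file_size  # declared uncompressed size
            ratio = info.file_size / max(info.compress_size, 1)
            if ratio > max_ratio or total > max_total:
                return True
    return False


def make_zip(payload: bytes) -> bytes:
    """Helper for the demonstration: build a one-file archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("attachment.bin", payload)
    return buffer.getvalue()


# Five million zero bytes compress to a few kilobytes: a suspicious ratio.
suspicious = looks_like_zip_bomb(make_zip(b"\0" * 5_000_000))
ordinary = looks_like_zip_bomb(make_zip(b"ordinary attachment text"))
```

The sizes read here are those declared in the archive headers, which an attacker can falsify, so a production scanner would also need to cap the number of bytes actually decompressed.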
xvii) Spyware and Adware
Spyware, as the name suggests, is a program that performs
a number of activities on our computers without our
knowledge, such as monitoring web surfing, logging
keystrokes, capturing screen snapshots of our work,
transmitting reports to a host, and many other related
activities. An adware program, on the other


hand, pops up commercial advertisements related to our
work and many other activities.
xviii) Rootkits
A rootkit is defined as a set of programs that provide
privileged access to our computers and start executing
before the operating system has completed the booting
process. In doing so, the rootkit inserts its own security
privileges to mask the underlying security measures.
xix) Bots and Botnets
A bot program acts like a backdoor Trojan that
responds to remote command-and-control programs. It has
affected two popular applications, Internet Relay Chat
and multiplayer Internet games, and is now being used to
support illegal activities in other applications as well. The
computers that are infected by bots form a network
known as a botnet. Botnets are growing larger and larger,
and many users cannot be sure whether their computers are
part of one.
xx) Blended threat
This crime uses server and Internet vulnerabilities to
initiate, transmit and spread an attack onto computers. This
type of crime is more sophisticated than viruses, worms,
Trojan horses and other malicious code, as it harms the
infected systems and networks by propagating through
different methods and entry points and by exploiting
multiple vulnerabilities.
This type of attack is designed to use multiple modes of
transport: where a worm may travel and spread through
e-mail, a single blended threat could use multiple routes
including e-mail, IRC and file-sharing networks. In addition
to a specific attack on predetermined .exe files, this
attack could perform multiple malicious acts, such as
modifying your exe files, HTML files and registry keys at
the same time, and can cause damage within several areas of
a network at one time. Blended threats are considered to be
the worst risk to security since the inception of viruses, as
most blended threats also require no human intervention to
propagate.
xxi) Keylogger
This crime is caused by a program that is used covertly,
so that the user is unaware of its presence on
his/her machine. This program, known as a keylogger or
keystroke logger, records each keystroke on the keyboard.
This type of program also finds use in the study of
human-computer interaction. Keyloggers use different
keylogging methods based on hardware and software.
Some IT organizations use keyloggers to troubleshoot
technical problems with computers and business networks.
Some legal uses of keyloggers include families or
businesses using them to monitor network usage without
their users' direct knowledge. However, malicious
individuals may use keyloggers on public computers to
steal passwords or credit card information.
The keylogger program can be implemented using
different approaches, and some of these approaches are
discussed below:

V. VARIOUS IMPLEMENTATION APPROACHES OF KEYLOGGERS
i) In the first approach, the keylogger resides in a
malware hypervisor running underneath the
operating system, which remains untouched and
effectively becomes a virtual machine.
ii) In another approach, the program obtains root
access, hides itself in the operating system and
starts intercepting keystrokes that pass through the
kernel. A program that resides at kernel level is
difficult to detect, especially for user-mode
applications that do not have root access. Such
keyloggers are usually implemented as rootkits that
subvert the operating system kernel and gain
unauthorized access to hardware, which makes them
more effective. They usually act as device drivers
in order to gain access to the keyboard.
iii) In another approach, the program hooks keyboard
APIs inside a running application. It registers for
keystroke events and receives an event each time
any key is either pressed or released and records it.
Windows APIs such as GetAsyncKeyState(),
GetForegroundWindow(), etc. are used to poll the
state of the keyboard or to subscribe to keyboard
events.
iv) Another approach is based on logging web form
submissions by recording the web browser's submit
events. This typically happens when the user presses
the Enter key after filling in a form, so the data is
recorded before it is passed over the Internet.
v) Another approach is based on the memory injection
concept, where the keylogger changes the memory
tables associated with the browser and other system
functions to perform its logging operations. By
injecting directly into memory, malware authors can
use this technique to bypass user account
controls.
vi) Another approach is based on capturing network
traffic associated with HTTP POST events in order
to retrieve unencrypted passwords.
vii) Another approach is based on remote access
software with an added feature that allows access to
the locally recorded data from a remote location.
Remote communication may be achieved via an FTP
server, e-mail, wireless communication, remote
login, etc.
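
Seen from the defender's side, capturing passwords from network
traffic as in approach vi) generally depends on a form being
submitted over plain HTTP. The following Python sketch is an
illustration only and is not taken from any tool discussed in
this paper. It flags form submission URLs that would send data
unencrypted. The function name and the sample URLs are
assumptions made for this example.

```python
from urllib.parse import urlsplit


def sends_in_cleartext(form_action_url):
    """Return True if a form posted to this URL would travel unencrypted."""
    scheme = urlsplit(form_action_url).scheme.lower()
    # Only HTTPS encrypts the POST body in transit. Anything else,
    # including a URL with no scheme, is conservatively treated as unsafe.
    return scheme != "https"


if __name__ == "__main__":
    for url in ("http://example.com/login", "https://example.com/login"):
        verdict = "cleartext" if sends_in_cleartext(url) else "encrypted"
        print(url, "->", verdict)
```

Treating a missing scheme as unsafe is a deliberately cautious
choice; a real checker would resolve relative URLs against the
page address first.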

Keystroke logging has also been used as a research method
for studying the writing process, and a number of programs
have been developed to gather online data about writing
activities, such as Inputlog, Scriptlog, Translog, etc. In
addition to these programs, keystroke logging has found
application in writing contexts such as the cognitive
writing process, frameworks of writing strategies, writing
and spelling development in children, first and second
language writing, translation, subtitling, and many other
similar areas. Recent years have seen a big interest in using
keystroke logging for developing integrated educational domains
for second language learning, programming skills, typing
skills and many other learning-based programs.
xxii) Internet time theft
This connotes the usage by an unauthorized person
of the Internet hours paid for by another person. One of the
most common and difficult to detect forms of office
time theft is employees using technology for non-work
related purposes. This could entail everything from
browsing the Internet to spending time on
social networking sites and texting during work hours.
This type of crime may be prevented by carefully
monitoring the check-in and check-out times and any breaks
of the employees. It may be a bit difficult to manage these
activities manually, but the use of time and attendance
software not only reduces the time and effort considerably,
but also provides more accurate monitoring of employees.
This software may be integrated with different punch clock
hardware systems like YubiKeys, swipe cards, biometric
devices, etc. It also helps other departments such as
payroll, attendance processing, accounting, etc. by
exporting work times into payroll software such as
QuickBooks, Supply accounting, etc.
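
The check-in, check-out and break monitoring described above
comes down to a small calculation. The following Python sketch
is a minimal illustration and does not describe any particular
time and attendance product. The "HH:MM" punch format, the
function name worked_minutes and the sample times are all
assumptions made for this example.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M"  # assumed punch format, e.g. "09:00"


def _minutes_between(start_text, end_text):
    """Minutes from start to end on the same day; rejects reversed spans."""
    start = datetime.strptime(start_text, TIME_FORMAT)
    end = datetime.strptime(end_text, TIME_FORMAT)
    if end < start:
        raise ValueError(f"{end_text} is earlier than {start_text}")
    return int((end - start).total_seconds() // 60)


def worked_minutes(check_in, check_out, breaks=()):
    """Return minutes worked between check-in and check-out, minus breaks.

    `breaks` is a sequence of (start, end) strings in TIME_FORMAT.
    """
    total = _minutes_between(check_in, check_out)
    for break_start, break_end in breaks:
        total -= _minutes_between(break_start, break_end)
    return total


if __name__ == "__main__":
    minutes = worked_minutes(
        "09:00", "17:30", breaks=[("12:00", "12:45"), ("15:00", "15:10")]
    )
    print(f"Worked {minutes // 60} h {minutes % 60} min")
```

A figure computed this way could then be exported to payroll
software, which is the integration step mentioned above.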
xxiii) E-mail fraud
This crime occurs when a user falls for a scam and provides
bank details in response to an e-mail containing an
official-looking document about the bank transfer of a
huge sum of money, supposedly from the Internal Revenue
Service (IRS), a lottery, or an inherited account.
xxiv) Web jacking
This crime is caused by hackers who gain access to and
control of the web site of another user, and who may also
change the information on that web site. The motive behind
this type of attack may be political or financial.
xxv) Data Diddling
This crime is the illegal or unauthorized alteration of
data, where the changes take place before or during data
input, or before output from a computer system. In other
words, the information is changed from the way it should
be entered, either by the person typing in the data, by a
virus that changes data, or by a programmer of the
database or application who has pre-programmed it to be
changed. Anyone who creates, records, transports, encodes,
examines, checks or otherwise has access to data that will
enter a computer has an opportunity to change that
information to his or her advantage before it enters
processing. This type of crime can be committed without
any skill and can be avoided by introducing policies and
internal controls via regular audits or built-in software.
Let us take the example of someone who filled out
data forms for payroll purposes and noticed that overtime
claims were entered into the computer by employee
number and not by name. Accordingly, the individual entered
his or her own number against the claims of other employees
who worked overtime frequently, and received extra income
over a period of time.
This is one of the simplest methods of committing a
computer-related crime, because it requires almost no
computer skills whatsoever. Despite the ease of committing
the crime, the cost can be considerable. In another
situation, a person entering accounting data may change it
to show that their own account, or that of a friend or
family member, is paid in full. By changing
or failing to enter the information, they are able to steal
from the company. To deal with this type of crime, a
company must implement policies and internal controls.
This may include performing regular audits, using software
with built-in features to combat such problems, and
supervising employees.
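
A regular audit of the kind recommended above can be partly
automated. The following Python sketch is only an illustration
of the overtime example, under the assumption that each claim
record keeps both the name written on the paper form and the
employee number typed into the system. The directory, the claim
records and the function name are hypothetical.

```python
# Hypothetical employee directory: employee number -> registered name.
DIRECTORY = {1001: "A. Rao", 1002: "B. Chen", 1003: "C. Diaz"}

# Hypothetical overtime claims as entered by the data-entry clerk.
CLAIMS = [
    {"name_on_form": "A. Rao", "employee_number": 1001, "hours": 6},
    {"name_on_form": "B. Chen", "employee_number": 1003, "hours": 8},
    {"name_on_form": "C. Diaz", "employee_number": 1003, "hours": 4},
]


def find_mismatched_claims(claims, directory):
    """Return claims whose entered number does not belong to the claimant."""
    flagged = []
    for claim in claims:
        registered_name = directory.get(claim["employee_number"])
        # An unknown number yields None, which also counts as a mismatch.
        if registered_name != claim["name_on_form"]:
            flagged.append(claim)
    return flagged


if __name__ == "__main__":
    for claim in find_mismatched_claims(CLAIMS, DIRECTORY):
        print("Review claim:", claim)
```

In this sample data the second claim is flagged, because the
hours worked by one employee were keyed in against another
employee's number.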
VI. CONSEQUENCES OF CYBER ATTACKS AND CYBER CRIMES

The overall significance and consequences of computer
crime are difficult to assess, as the available statistics
are not reliable because there is a particularly profound
unwillingness to report computer-related crime. There may
be many reasons for this, but the following four reasons
are widely accepted as explaining this unwillingness:
i) To avoid any damage to reputation and loss of
public confidence;
ii) Lack of tools and infrastructure to establish the
existence of a crime;
iii) Concern about possible liability for
failure to prevent the incident;
iv) The belief that public exposure of
the incident would be tantamount to an admission
of vulnerability, as well as instruction to others on
how to commit the crime.
Crime committed using computers is fully prosecutable
under existing substantive law (with perhaps some
modification in procedural law, especially in rules of
evidence). Other abuse, such as "theft" of information or of
computer time, should be left to the civil law so as to
prevent stifling innovation. One critic contends that actual
computer-assisted crime is much less prevalent than
popularly believed, though a certain mystique has
unfortunately been attached to the whole area. The
attachment of criminal consequences to unauthorized use
could have serious effects on the computer industry. In
addition, it is said that computer time and efficiency are so
valuable that the existing lax industry standards of security
should no longer be tolerated.
How to prevent cyber-crime from malicious attacks?
Following is a list of preventive measures that we
should take to avoid/prevent the occurrence of cyber-crime
in our system:
i) Ensure that the operating system is up-to-date. This
is essential if we are using the Windows operating
system.
ii) Ensure we have anti-virus software installed on
our system.
iii) Ensure we download updates frequently so that the
software has the latest fixes for new viruses,
worms, and Trojan horses.
iv) Ensure our anti-virus program has the capability to
scan e-mail and files as they are downloaded from
the Internet.
v) Ensure we run full disk scans periodically. This will
help detect malicious programs that have already
reached our computer.
vi) Ensure we install a firewall. The firewall prevents
unauthorized use of and access to our computer and
can be either hardware or software. A hardware
firewall provides a strong level of protection from
most of the attacks coming from the outside world
and is available as a stand-alone product or as
part of a broadband router. In general, hardware
firewalls are ineffective against viruses, worms
and Trojans. A software firewall protects our
computer from outside attempts to control or
gain access to it and usually provides
additional protection against these attacks. The
main problem with a software firewall is that it
provides protection against attacks only on the
system where it is installed and does not protect
the rest of the network. It is important to note that
a firewall adds extra security and protection when
used with operating system updates and a good
anti-virus scanning system.
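
To make the idea of a periodic disk scan concrete, the following
Python sketch walks a directory and compares the SHA-256 hash of
each file with a set of known-bad hashes. This is a deliberately
simplified illustration: real anti-virus products rely on far
richer signatures and heuristics, and the signature set, file
names and function names used here are assumptions made for this
example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_directory(root, known_bad_hashes):
    """Return sorted paths under `root` whose hash is in the bad set."""
    flagged = []
    for path in Path(root).rglob("*"):
        if path.is_file() and sha256_of_file(path) in known_bad_hashes:
            flagged.append(path)
    return sorted(flagged)


def demo():
    """Scan a throwaway directory containing one pretend-malicious file."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "notes.txt").write_bytes(b"harmless text")
        Path(tmp, "payload.bin").write_bytes(b"pretend this is malware")
        # Hypothetical signature database: the hash of our fake sample.
        signatures = {hashlib.sha256(b"pretend this is malware").hexdigest()}
        return [path.name for path in scan_directory(tmp, signatures)]


if __name__ == "__main__":
    print(demo())
```

Because only exact hashes are matched, a single changed byte in
a malicious file would evade this check, which is one reason
frequent signature updates are recommended above.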
Defensive Measures:
There are many ways to protect our assets from
cyber-attacks, and they can be grouped into three main
categories of measures: security patches, antimalware and
firewalls.
i) In the first category of measures, a software-based
method is used against malicious
attacks/vulnerabilities. Software fixes are
created for different categories of attacks
and distributed as patches, and our systems need to
be kept up-to-date with new patches on a regular
basis.
ii) Antimalware tools have been introduced to protect
our computers from malware such as viruses,
worms, adware, Trojan horses, spyware and many
other types of malware. Antimalware tools scan
the computer's hard disk and other storage devices,
detect files that look suspicious
and delete them.
iii) A firewall is designed to filter the network traffic
from the Internet to our computers and back to the
Internet. Firewalls are available as hardware and
software tools. A software firewall can be
configured by the user to block or allow Internet
applications on his/her computer. One of the
weaknesses of a firewall is that it can be detected
by malware, which can then bypass or disable the
firewall and allow Internet applications to run
on our computers.
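
The block/allow behavior of a software firewall can be pictured
as an ordered rule list in which the first matching rule decides.
The following Python sketch illustrates that idea only; it is not
how any specific firewall product is implemented. The Rule
fields, the default of blocking unmatched traffic and the sample
addresses (taken from documentation address ranges) are
assumptions made for this example.

```python
import ipaddress
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    action: str           # "allow" or "block"
    source: str           # source network in CIDR notation
    port: Optional[int]   # destination port, or None for any port


def decide(rules, source_ip, destination_port, default="block"):
    """Return the action of the first rule matching the connection."""
    address = ipaddress.ip_address(source_ip)
    for rule in rules:
        in_network = address in ipaddress.ip_network(rule.source)
        port_matches = rule.port is None or rule.port == destination_port
        if in_network and port_matches:
            return rule.action
    # Nothing matched: fall back to the default policy.
    return default


# Hypothetical rule set: block one network entirely, allow HTTPS otherwise.
RULES = [
    Rule("block", "203.0.113.0/24", None),
    Rule("allow", "0.0.0.0/0", 443),
]

if __name__ == "__main__":
    print(decide(RULES, "203.0.113.7", 443))
    print(decide(RULES, "198.51.100.5", 443))
    print(decide(RULES, "198.51.100.5", 23))
```

Rule order matters in this sketch: placing the broad allow rule
first would let the blocked network through on port 443.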
Cyber Crime analysis and Investigations
As stated above, Cyber Crime is defined as unlawful
and unethical acts where computers are either used as a tool
or a target or even both. In April 2001, the US
Government responded to this threat by announcing a $25
million initiative involving the creation of a National
High-Tech Crime Unit to counter the growing use of the
Internet for criminal activity. The online world is becoming
increasingly vulnerable to criminal activity with 43% of the
public identifying cyber-crime as a problem.
Crime analysis is the systematic study of crime and
disorder issues. It helps law enforcement agencies take
spatial, temporal and socio-demographic factors into
account during their investigations. It also
provides support for criminal apprehension, tools and
techniques for reducing crime, crime prevention and
evaluation.
The crime analysis process comprises various elements
such as introduction of the inquiry, investigation,
examination of various situations, analysis of information,
and summary of the findings. It offers a systematic approach
to investigating crime, disorder problems and other issues
of concern to law enforcement agencies. Many researchers
feel that it only deals with investigation and the use of
tools to solve a crime, but crime analysis focuses
on the application of social science data collection
procedures, analytical techniques and statistical analysis for
summarizing the findings of an investigation.
Crime analysis has become very important in recent
years due to big data applications, and many researchers
have adopted different approaches to it; as such, we have
conflicting definitions of crime analysis. Although the
definitions differ in specifics, they share some common
ground: all agree that crime analysis utilizes a
systematic approach and supports the investigation efforts
of law enforcement agencies and users.
Crime analysis is used by law enforcement
agencies as a function that provides systematic analysis for
identifying and analyzing patterns, frequency and trends in
crime and disorder activities. Information on patterns can
help law enforcement agencies deploy resources in a more
effective manner, and assist detectives in identifying and
apprehending suspects. Crime analysis also plays a role in
devising solutions to crime problems, and formulating
crime prevention strategies.
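
Identifying patterns and frequency in incident data, as
described above, can start with simple counting. The following
Python sketch is a minimal illustration that tallies incidents by
category and by hour of day. The incident records, field names
and function name are hypothetical, and real crime analysis
would add the spatial and socio-demographic dimensions mentioned
earlier.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident log; categories reuse crime types from this paper.
INCIDENTS = [
    {"category": "e-mail fraud", "timestamp": "2016-01-04T09:15:00"},
    {"category": "web jacking", "timestamp": "2016-01-04T23:40:00"},
    {"category": "e-mail fraud", "timestamp": "2016-01-05T09:50:00"},
    {"category": "data diddling", "timestamp": "2016-01-06T14:05:00"},
    {"category": "e-mail fraud", "timestamp": "2016-01-07T09:05:00"},
]


def summarize_incidents(incidents):
    """Count incidents per category and per hour of day (0-23)."""
    by_category = Counter(item["category"] for item in incidents)
    by_hour = Counter(
        datetime.fromisoformat(item["timestamp"]).hour for item in incidents
    )
    return by_category, by_hour


if __name__ == "__main__":
    categories, hours = summarize_incidents(INCIDENTS)
    print("Most frequent category:", categories.most_common(1))
    print("Busiest hour of day:", hours.most_common(1))
```

Counts like these are the kind of pattern information that can
inform where and when resources are deployed.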
VII. CONCLUSION
With a detailed background of big data and big data
analytics technology and its successful use of
data mining techniques and virtualization, the paper
provided a detailed discussion of how each of these concepts
has been redefined and used in some successful
applications. The paper also discussed various
frameworks and tools that have been introduced, with
enough detail that readers can gain the experience needed
to use these frameworks and tools for the development
of applications in future. The paper also presented
limitations and challenges in some of these concepts that
need to be investigated in future work.
Virtualization is another method of representing the
information efficiently and effectively so that users can
understand the contents of the information and has been
very useful and powerful for presenting the information in
a simple way for the users to understand. We have
discussed a number of case studies where it has been used
successfully.
Since we are handling big data over the Internet, where
different types of attacks and crimes affect the operations
and services provided by big data, we have discussed
the known cyber-attacks and cyber-crimes with a view to
understanding how these malicious attacks and crimes affect
the functioning of big data applications.

VIII. FUTURE RESEARCH DIRECTIONS

Big data and big data analytics are becoming a new way
of exploring and discovering interesting, valuable
information. The volume of data for different applications
has been increasing exponentially in the last few years,
and as such big data technology will be widely used to
provide solutions and to extract useful information
via data analysis. Other related topics
associated with this technology, like new tools and
techniques, new and improved frameworks, new analysis
tools, etc., need to be discussed. In addition to these, we
also need to address and incorporate critical issues of
privacy and security of big data and big data analytics in
future.
This new technology has become one of the leading
technologies in the last few years as it has implemented
advanced data mining techniques, virtualization,
optimization, text mining, etc. Data mining has been in
existence for over 30 years and has found use in a
variety of applications. With large amounts of data, data
mining techniques have become very useful and
effective in managing high volumes of data, and they also
provide techniques for analyzing and interpreting the data
to extract useful meaning from the information. In
particular, data mining techniques find useful applications
in social networks.
Recent years have shown that one of the most
popular open source software frameworks, Hadoop, has
become a common solution for processing large amounts
of data in a few applications. Future
development of this framework is expected to focus on
systems that provide real-time ad hoc querying
capabilities over large-scale data. Another important area
of interest is the development of
querying systems that make use of SQL, in order to
leverage existing SQL knowledge amongst users to query
against Hadoop systems. Other interesting extensions of
this framework focus on data management and big data
technology.
Based on various publications and success stories in big
data applications, researchers and developers still see some
issues and challenges that need to be addressed and
considered in future applications of big data and big data
analytics. Some of these include: the continually growing
volume of data, transmission of this big volume of
data over the Internet, different techniques for processing
of data, processing of data at the location where it is
stored, retrieving the data for its processing, processing
of a subset of data based on identifying the quality and
other attributes, mapping between quality of data and
quantity of data for solutions and their validity, new
methods of evaluating the validity of all the data items in
the set, identification of the size
of the data items for the evaluation of its validity and
accurate conclusions, new methods for evaluating the
quality of data, new methods to measure the accuracy,
reliability and quality of data solutions, etc.

ACKNOWLEDGMENT
I am thankful to two of my graduate students, Mr.
Avinash Dudi and Ms. Seethi Venkata Sandhya Dhari, who
helped me in searching for articles and reviewing some of
the articles for me.

