INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 6, Issue 3, March (2015), pp. 12-23 © IAEME
T.I. Bagban, B.S. Patil, R.U. Patil, S.A. Gondil
ABSTRACT
This paper discusses a proposed cloud system that combines on-demand allocation of resources with improved utilization, opportunistically provisioning cycles from idle cloud nodes to other processes. Providing all demanded services to cloud customers is extremely difficult, and fulfilling cloud consumers' needs is a significant issue. Hence, an on-demand cloud infrastructure using a Hadoop configuration with improved CPU utilization and improved storage-hierarchy utilization is proposed, based on the Fair4S job scheduling algorithm. All cloud nodes that would otherwise remain idle are put to use; the approach also addresses security challenges, achieves load balancing, processes large data quickly, and handles jobs of all sizes, whether large or small. We compare the GFS read/write algorithm with the Fair4S job scheduling algorithm for file uploading and downloading, and enhance CPU and storage utilization. Cloud computing moves application software and databases to large data centres, where the management of data and services may not be fully trustworthy. This security problem is addressed by encrypting data with an encryption/decryption algorithm, while the Fair4S job scheduling algorithm solves the problem of utilizing all idle cloud nodes for larger data.
Keywords: CPU Utilization, Encryption/Decryption Algorithm, Fair4S Job Scheduling Algorithm, GFS, Storage Utilization.
I. INTRODUCTION
Cloud computing is considered a quickly rising new technology for delivering computing as a utility. In cloud computing, various cloud customers demand a variety of services as per their dynamically changing needs. Thus it is the job of cloud computing to provide all the demanded services to the cloud customers. However, because only a limited number of resources is available, it is very difficult for cloud providers to supply all the demanded services. From the cloud providers' perspective, cloud resources must be allocated in a fair manner. So it is a very important issue to fulfil cloud consumers' quality-of-service needs and satisfaction. In order to ensure on-demand availability, a provider has to overprovision: keep a large proportion of nodes idle so that they can be used to satisfy an on-demand request that might arrive at any time. The need to keep these nodes idle leads to low utilization. The only way to improve it is to keep fewer nodes idle; but this means potentially rejecting a higher proportion of requests, to the point at which a provider no longer offers on-demand computing [2]. Many trends are opening up the era of cloud computing, an Internet-based development and use of computer technology. Ever cheaper and more powerful processors, together with the software-as-a-service (SaaS) computing architecture, are transforming data centres into pools of computing services on a huge scale.
Meanwhile, increasing network bandwidth and reliable yet flexible network connections make it even possible that clients can now subscribe to high-quality services from data and software that reside solely in remote data centres. In recent years, Infrastructure-as-a-Service (IaaS) cloud computing has emerged as an attractive alternative to the acquisition and management of physical resources. An important feature of IaaS clouds is providing users on-demand access to resources. However, to provide on-demand access, cloud providers must either significantly overprovision their infrastructure (and pay a high price for operating resources at low utilization) or reject a large proportion of user requests (in which case the access is no longer on-demand). At the same time, not all users require truly on-demand access to resources [3]. Several applications and workflows are designed for recoverable systems where interruptions in service are expected. Here we propose a cloud infrastructure with a Hadoop configuration that combines on-demand allocation of resources with opportunistic provisioning of cycles from idle cloud nodes to other processes. The objective is to handle larger data in less time and to keep all idle cloud nodes utilized by splitting larger files into smaller ones using the Fair4S job scheduling algorithm, while also increasing the utilization of the CPU and the storage hierarchy for uploading and downloading files. To keep data and services trustworthy, security is maintained using the RSA algorithm, which is widely used for secure data transmission. We also compare the GFS read/write algorithm with the Fair4S job scheduling algorithm, obtaining improved utilization results thanks to the various features available in Fair4S, such as setting slot quotas for pools, setting slot quotas for individual users, assigning slots based on pool weight, and extending job priorities. These features provide the functionality for job allocation and load balancing to take place in an efficient manner.
II. LITERATURE SURVEY
There has been abundant research in the field of cloud computing over the past decades; some of that work is discussed here. One paper researched cloud computing architecture and its safety and proposed a new cloud computing architecture in which the SaaS model was used to deploy the related software on the cloud platform, so that resource utilization and the quality of scientific-task computing would be improved [17]. Workload characterization studies are useful for helping Hadoop operators identify system bottlenecks and figure out solutions for optimizing performance. Several previous efforts have been made in various areas, including network systems [6], and a cloud infrastructure that combines on-demand allocation of resources with opportunistic provisioning of cycles from idle cloud nodes to other processes by deploying backfill virtual machines (VMs) [21]. One model secures Map/Reduce computation in the cloud: it uses a language-based security approach to enforce information-flow policies that vary dynamically because of a restricted, revocable delegation of access rights between principals, and the decentralized label model (DLM) is employed to express these policies [18]. A new security architecture, Split Clouds, protects the data stored in a cloud while letting every organization hold direct security controls over its data rather than leaving them to the cloud providers; the core of the model consists of real-time data summaries, an in-line security gateway, and a third-party auditor. By combining these three solutions, the architecture can prevent malicious activities performed even by the security administrators within the cloud providers [20]. Several studies [19], [20], [21] have been conducted on workload analysis in grid environments and parallel computer systems. They proposed various methods for analysing and modelling workload traces. However, the job characteristics and scheduling policies in a grid are much different from the ones in a Hadoop system.
III. PROPOSED SYSTEM
Cloud computing has become a viable, mainstream solution for data processing, storage and distribution, but moving massive amounts of data into and out of the cloud presented an insurmountable challenge [4]. Cloud computing is a very successful paradigm of service-oriented computing and has revolutionized the way computing infrastructure is abstracted and used. The three most popular cloud paradigms include:
1. Infrastructure as a Service (IaaS)
2. Platform as a Service (PaaS)
3. Software as a Service (SaaS)
The concept can also be extended to Database as a Service or Storage as a Service. Scalable database management systems (DBMSs), both for update-intensive application workloads and for decision-support systems, are an important part of the cloud infrastructure. Initial designs include distributed databases for update-intensive workloads and parallel database systems for analytical workloads. Changes in the data-access patterns of applications, and the need to scale out to thousands of commodity machines, led to the birth of a new category of systems referred to as key-value stores [11]. In the domain of data analysis, we adopt the MapReduce paradigm and its open-source implementation Hadoop, in terms of usability and performance.
The system has six modules:
1. Hadoop Configuration (Cloud Server Setup)
2. Login & Registration
3. Cloud Service Provider (CSP)
4. Fair4S Job Scheduling Algorithm
5. Encryption/Decryption Module
6. Administration of Files (Third Party Auditor)
3.1 Hadoop Configuration (Cloud Server Setup)
Apache Hadoop is a framework that permits the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to many thousands of nodes, providing massive computation and storage capacity. Rather than relying on the underlying hardware to provide high availability, the infrastructure itself is designed to handle failures at the application layer, thus delivering a highly available service on top of a cluster of nodes, each of which may be prone to failures [6]. Hadoop implements MapReduce using HDFS. The Hadoop Distributed File System allows users to have a single available namespace, spread across many hundreds or thousands of servers, creating a single large file system. Hadoop has been demonstrated on clusters with more than two thousand nodes; the present design target is ten-thousand-node clusters. Hadoop was inspired by MapReduce, a framework in which an application is broken down into numerous small parts. Any of these parts (also referred to as fragments or blocks) can be run on any node in the cluster. The present Hadoop system consists of the Hadoop architecture, MapReduce, and the Hadoop Distributed File System (HDFS).
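As a minimal illustration of the MapReduce model described above (a plain-Python sketch, not the paper's implementation; in Hadoop the map and reduce tasks run in parallel on separate cluster nodes):

    from collections import defaultdict
    from itertools import chain

    def map_phase(split):
        # Map: emit a (word, 1) pair for every word in one input split.
        return [(word, 1) for word in split.split()]

    def reduce_phase(pairs):
        # Shuffle groups pairs by key; reduce sums the counts per word.
        counts = defaultdict(int)
        for word, count in pairs:
            counts[word] += count
        return dict(counts)

    # Each split would normally be processed by a different cluster node.
    splits = ["cloud computing on demand", "demand for cloud resources"]
    result = reduce_phase(chain.from_iterable(map_phase(s) for s in splits))
    print(result)  # e.g. {'cloud': 2, 'demand': 2, 'computing': 1, ...}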
with a lower priority. A quantified job priority helps differentiate the priorities of small jobs across different user-groups.
3.4.2 Fair4S Job Scheduling Algorithm
Fair4S is a job scheduling algorithm designed to be biased towards small jobs. In many workloads, small jobs account for the majority of the load, and a lot of them require instant responses, which is an important factor in production Hadoop systems. The inefficiency of the Hadoop fair scheduler and the GFS read/write algorithm in handling small jobs motivates us to use and analyze Fair4S, which introduces pool weights and extends job priorities to guarantee rapid responses for small jobs [1]. In this scenario, clients upload or download files from the main server, where the Fair4S job scheduling algorithm executes. On the main server, the mapper function provides the list of available cluster IP addresses to which tasks are assigned, so that the task of splitting files is distributed to each live cluster node. Fair4S splits a file according to its size and the available cluster nodes, as sketched below.
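The splitting step can be pictured with the following sketch (a hypothetical helper, assuming the main server already holds the list of live node IPs and a fixed chunk size; the paper does not give the actual splitting code):

    def split_file(data: bytes, node_ips: list, chunk_size: int):
        # Split the file into fixed-size chunks and assign each chunk to an
        # available cluster node in round-robin order.
        assignment = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            node = node_ips[(offset // chunk_size) % len(node_ips)]
            assignment.append((node, chunk))
        return assignment

    # Example: three live secondary servers reported by the main server.
    nodes = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
    parts = split_file(b"x" * 200, nodes, chunk_size=64)
    print([(ip, len(chunk)) for ip, chunk in parts])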
3.4.3 Procedure of Slots Allocation
1. The first step is to allocate slots to job pools. Every job pool is configured with two parameters: a maximum slot quota and a pool weight. In no case will the number of slots allocated to a job pool exceed its maximum slot quota. If the slot demand of a job pool varies, the maximum slot quota is manually adjusted by the Hadoop operators. When a job pool requests additional slots, the scheduler first judges whether the slot occupancy of the pool would exceed the quota; if not, the pool is appended to the queue to wait for slot allocation. The scheduler allocates the slots using a round-robin algorithm; probabilistically, a pool with a high allocation weight is more likely to be allocated slots.
2. The second step is to allocate slots to individual jobs. Every job is configured with a job-priority parameter, a value between zero and one thousand. The job priority and the deficit are combined into a weight for the job. Within a job pool, idle slots are allocated to the jobs with the highest weight.
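A sketch of this two-step allocation follows (an interpretation under stated assumptions: pools below quota are chosen with probability proportional to pool weight, and the job weight is taken as priority plus deficit; the exact Fair4S data structures are not given in the paper):

    import random

    def pick_pool(pools):
        # Step 1: consider only pools whose slot occupancy is below quota,
        # then choose among them with probability proportional to weight.
        eligible = [p for p in pools if p["used"] < p["quota"]]
        if not eligible:
            return None
        r = random.uniform(0, sum(p["weight"] for p in eligible))
        for pool in eligible:
            r -= pool["weight"]
            if r <= 0:
                return pool
        return eligible[-1]

    def pick_job(pool):
        # Step 2: the idle slot goes to the job with the highest weight,
        # where weight combines job priority (0-1000) and deficit.
        return max(pool["jobs"], key=lambda j: j["priority"] + j["deficit"])

    pools = [
        {"name": "small-jobs", "quota": 10, "used": 4, "weight": 3,
         "jobs": [{"id": 1, "priority": 900, "deficit": 50},
                  {"id": 2, "priority": 200, "deficit": 10}]},
        {"name": "batch", "quota": 20, "used": 20, "weight": 1, "jobs": []},
    ]
    pool = pick_pool(pools)                      # "batch" is at quota
    print(pool["name"], pick_job(pool)["id"])    # small-jobs 1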
3.5 Encryption/decryption
In this module, files are encrypted and decrypted using the RSA algorithm. The RSA encryption/decryption algorithm uses a public key and a private key for the encryption and decryption of data. The client uploads the file along with a secret/public key, from which the private key is generated and the file is encrypted. In the reverse process, the file is decrypted and downloaded using the public/private key pair. For example, when the client uploads the file with the public key, the file name is used to generate the unique private key used for encrypting the file. In this way, the uploaded file is encrypted and stored at the main servers, and the file is then split using the Fair4S scheduling algorithm, which provides a distinctive security feature for cloud data. In the reverse process of downloading data from the cloud servers, the file name and public key are used to regenerate the secret key, all parts of the file are combined, and the data is decrypted and downloaded, which ensures a high degree of security for cloud information.
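A minimal sketch of the encrypt-then-decrypt flow using the Python cryptography package (an assumption: the paper names RSA but not a library, and the key-derivation-from-file-name step is omitted here; note also that raw RSA only encrypts small payloads):

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    # Generate an RSA key pair; in the proposed system the private key is
    # produced when the client uploads the file and its public key.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # Upload path: encrypt the file content before Fair4S splits it
    # across the cluster nodes.
    ciphertext = public_key.encrypt(b"file contents", oaep)

    # Download path: recombine the parts, then decrypt with the private key.
    assert private_key.decrypt(ciphertext, oaep) == b"file contents"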
IV. RESULTS
The results of the project can be explained with the help of the project work done with a number of clients, one main server, and three to five secondary servers. The results are based on three parameters:
1) Time
2) CPU utilization
3) Storage utilization
Our evaluation examines the improved utilization of the cluster nodes (i.e., the secondary servers) when uploading and downloading files with the Fair4S scheduling algorithm versus the GFS read/write algorithm from three perspectives: improved time utilization, improved CPU utilization, and greatly improved storage utilization.
4.1 Results for Time Utilization
Fig. 08: CPU utilization of the Fair4S algorithm as a function of the number of cluster nodes in Hadoop.
V. CONCLUSION

REFERENCES
1. Y. Chen, S. Alspaugh, and R.H. Katz, Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads, Proc. VLDB Endowment, vol. 5, no. 12, Aug. 2012.
2. D. Agrawal et al., Big Data and Cloud Computing: Current State and Future Opportunities, in Proc. EDBT, pp. 22-24, March 2011.
3. Z. Ren, X. Xu, J. Wan, W. Shi, and M. Zhou, Workload Characterization on a Production Hadoop Cluster: A Case Study on Taobao, in Proc. IEEE IISWC, 2012, pp. 3-13.
4. J. Dean et al., MapReduce: Simplified Data Processing on Large Clusters, Communications of the ACM, vol. 51, no. 1, pp. 107-113, Jan. 2008.
5. Y. Chen, S. Alspaugh, D. Borthakur, and R.H. Katz, Energy Efficiency for Large-Scale MapReduce Workloads with Significant Interactive Analysis, in Proc. EuroSys, 2012, pp. 43-56.
6. Stack Overflow (2014). Hadoop Architecture Internals: Use of Job and Task Trackers. Available: http://stackoverflow.com/questions/11263187/hadoop-architecture-internals-use-of-job-and-task-trackers
7. S. Kavulya, J. Tan, R. Gandhi, and P. Narasimhan, An Analysis of Traces from a Production MapReduce Cluster, in Proc. CCGRID, 2010, pp. 94-103.
8. J. Dean et al., MapReduce: A Flexible Data Processing Tool, Communications of the ACM, Jan. 2010.
9. M. Stonebraker et al., MapReduce and Parallel DBMSs: Friends or Foes?, Communications of the ACM, Jan. 2010.
10. X. Liu, J. Han, Y. Zhong, C. Han, and X. He, Implementing WebGIS on Hadoop: A Case Study of Improving Small File I/O Performance on HDFS, in Proc. CLUSTER, 2009, pp. 1-8.
11. A. Abouzeid et al., HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads, in Proc. VLDB, 2009.
12. S. Das et al., Ricardo: Integrating R and Hadoop, in Proc. SIGMOD, 2010.
13. J. Cohen et al., MAD Skills: New Analysis Practices for Big Data, in Proc. VLDB, 2009.
14. G. Yang et al., The Application of SaaS-Based Cloud Computing in the University Research and Teaching Platform, in Proc. ISIE, pp. 210-213, 2011.
15. P. Marshall et al., Improving Utilization of Infrastructure Clouds, in Proc. IEEE/ACM International Symposium, pp. 205-214, 2011.
16. F. Wang, Q. Xin, B. Hong, S.A. Brandt, E.L. Miller, D.D.E. Long, and T.T. McLarty, File System Workload Analysis for Large Scale Scientific Computing Applications, in Proc. MSST, 2004, pp. 139-152.
17. M. Zaharia, D. Borthakur, J.S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling, in Proc. EuroSys, 2010, pp. 265-278.
18. E. Medernach, Workload Analysis of a Cluster in a Grid Environment, in Proc. Job Scheduling Strategies for Parallel Processing, 2005, pp. 36-61.
19. K. Christodoulopoulos, V. Gkamas, and E.A. Varvarigos, Statistical Analysis and Modeling of Jobs in a Grid Environment, J. Grid Computing, vol. 6, no. 1, 2008.
20. G. Upadhye and T. Dange, Nephele: Efficient Data Processing Using Hadoop, International Journal of Computer Engineering & Technology (IJCET), vol. 5, issue 7, 2014, pp. 11-16, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
21. S.V. Ambade and P. Deshpande, Hadoop Block Placement Policy for Different File Formats, International Journal of Computer Engineering & Technology (IJCET), vol. 5, issue 12, 2014, pp. 249-256, ISSN Print: 0976-6367, ISSN Online: 0976-6375.