Abstract
Many schemes have been recently advanced for storing data on multiple clouds.
Distributing data over different cloud storage providers (CSPs) automatically provides
users with a certain degree of information leakage control, for no single point of attack can
leak all the information. However, unplanned distribution of data chunks can lead to high
information disclosure even while using multiple clouds. In this paper, we study an
important information leakage problem caused by unplanned data distribution in
multicloud storage services. Then, we present StoreSim, an information leakage aware
storage system in multicloud. StoreSim aims to store syntactically similar data on the same
cloud, thus minimizing the user’s information leakage across multiple clouds. We design
an approximate algorithm to efficiently generate similarity-preserving signatures for data
chunks based on MinHash and Bloom filter, and also design a function to compute the
information leakage based on these signatures. Next, we present an effective storage plan
generation algorithm based on clustering for distributing data chunks with minimal
information leakage across multiple clouds.
1. INTRODUCTION
1.1 INTRODUCTION:
With the increasingly rapid uptake of devices such as laptops, cellphones and tablets, users
require ubiquitous and massive network storage to handle their ever-growing digital lives.
To meet these demands, many cloud-based storage and file sharing services such as
Dropbox, Google Drive and Amazon S3, have gained popularity due to the easy-to-use
interface and low storage cost. However, these centralized cloud storage services are
criticized for grabbing the control of users’ data, which allows storage providers to run
analytics for marketing and advertising. Also, the information in users’ data can be leaked,
e.g., by means of malicious insiders, backdoors, bribery and coercion. One possible solution
to reduce the risk of information leakage is to employ multicloud storage systems in which
no single point of attack can leak all the information. A malicious entity, such as the one
revealed in recent attacks on privacy, would be required to coerce all the different CSPs on
which a user might place her data, in order to get a complete picture of her data. Put
simply, as the saying goes, do not put all the eggs in one basket.
Yet, the situation is not so simple. CSPs such as Dropbox, among many others, employ
rsync-like protocols [7] to synchronize local files with remote files in their centralized
clouds [8]. Every local file is partitioned into small chunks, and these chunks are hashed
with fingerprinting algorithms such as SHA-1 or MD5. Thus, a file’s contents can be
uniquely identified by this list of hashes. For each update of a local file, only chunks with
changed hashes are uploaded to the cloud. This hash-based synchronization differs
from diff-like protocols, which compare two versions of the same
file line by line, detect the exact updates, and upload only these updates in a patch
style [7].
Instead, the hash-based synchronization model needs to upload the whole chunks with
changed hashes to the cloud. Thus, in the multicloud environment, two chunks differing
only very slightly can be distributed to two different clouds. The following motivating
example will show that if chunks of a user’s data are assigned to different CSPs in an
unplanned manner, the information leaked to each CSP can be higher than expected.
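The hash-based synchronization described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the protocol of any particular CSP: the class name `ChunkSyncSketch`, its method names, and the use of fixed-size chunks are all hypothetical choices made for the example (real rsync-like protocols use more elaborate chunking).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of any CSP client.
final class ChunkSyncSketch {

    // Split data into fixed-size chunks (an assumption) and return each chunk's SHA-1 hex digest.
    static List<String> chunkHashes(byte[] data, int chunkSize) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
        List<String> hashes = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            int length = Math.min(chunkSize, data.length - start);
            sha1.update(data, start, length);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) { // digest() also resets the MessageDigest
                hex.append(String.format("%02x", b));
            }
            hashes.add(hex.toString());
        }
        return hashes;
    }

    // Return the indices of chunks that are new or whose hash differs from the old version.
    static List<Integer> changedChunks(List<String> oldHashes, List<String> newHashes) {
        List<Integer> changed = new ArrayList<>();
        for (int i = 0; i < newHashes.size(); i++) {
            if (i >= oldHashes.size() || !oldHashes.get(i).equals(newHashes.get(i))) {
                changed.add(i);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        byte[] v1 = "aaaabbbbcccc".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "aaaabbXbcccc".getBytes(StandardCharsets.UTF_8);
        // Only the second chunk differs, so only index 1 would be uploaded.
        System.out.println(changedChunks(chunkHashes(v1, 4), chunkHashes(v2, 4))); // prints [1]
    }
}
```

Note that a single changed byte forces the whole chunk to be uploaded again, which is why two nearly identical chunks can end up on different clouds.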
Suppose that we have a storage service with three CSPs S1, S2 and S3, and a user’s dataset D.
All the user’s data is first chunked and then uploaded to different clouds. The
dataset D is represented as the set of hashes generated from the data chunks. This scenario is
shown in Figure 1. In addition, we assume that the data chunks are distributed to the
clouds in a round robin (RR) way.
RR is good for balancing the storage load, so each cloud obtains the
same amount of data. However, the same amount of data does not necessarily mean the
same amount of information. For example, if the chunks {C3, C6, C9} are
almost the same, S3 actually obtains information equivalent to that in only one
chunk. If all other chunks are different, S1 and S2 each obtain three times as much information
as S3. The problem does not arise with a single storage cloud such as Dropbox, since users
have no other choice but to give all their information to that one cloud.
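The round robin example can be reproduced with a small Java sketch. It makes a simplifying assumption: near-identical chunks are modeled as having exactly the same content label, so the number of distinct labels per cloud stands in for the amount of information that cloud obtains. The class name `RoundRobinLeakageSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical illustration of the round robin example; not the StoreSim implementation.
final class RoundRobinLeakageSketch {

    // Assign chunk i to cloud (i mod cloudCount), as in round robin distribution.
    static List<List<String>> roundRobin(List<String> chunkContents, int cloudCount) {
        List<List<String>> clouds = new ArrayList<>();
        for (int i = 0; i < cloudCount; i++) {
            clouds.add(new ArrayList<>());
        }
        for (int i = 0; i < chunkContents.size(); i++) {
            clouds.get(i % cloudCount).add(chunkContents.get(i));
        }
        return clouds;
    }

    // Simplified information measure: the number of distinct chunk contents on one cloud.
    static int distinctContents(List<String> chunks) {
        return new HashSet<>(chunks).size();
    }

    public static void main(String[] args) {
        // C1..C9, where C3, C6 and C9 carry (almost) the same content "x".
        List<String> chunks = Arrays.asList("a", "b", "x", "c", "d", "x", "e", "f", "x");
        List<List<String>> clouds = roundRobin(chunks, 3);
        for (int i = 0; i < clouds.size(); i++) {
            System.out.println("S" + (i + 1) + ": " + clouds.get(i).size() + " chunks, "
                    + distinctContents(clouds.get(i)) + " distinct");
        }
    }
}
```

Each cloud receives three chunks, but S3 holds only one distinct content while S1 and S2 hold three each, matching the imbalance described in the example.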
When the storage is in the multicloud, we have the opportunity to minimize the total
information that is leaked to each CSP. The optimal case is that each CSP obtains the same
amount of information. In our example, data distribution based on RR can achieve the
optimal result only if all the chunks are different. However, this is not the case in cloud
storage services, for two reasons: 1) frequent modifications of files by users result in
a large number of similar chunks; and 2) similar chunks occur across files, which is why existing
CSPs use the data deduplication technique.
2. LITERATURE SURVEY
To provide fault tolerance for cloud storage, recent studies propose to stripe data across
multiple cloud vendors. However, if a cloud suffers from a permanent failure and loses all
its data, we need to repair the lost data with the help of the other surviving clouds to
preserve data redundancy. We present a proxy-based storage system for fault-tolerant
multiple-cloud storage called NCCloud, which achieves cost-effective repair for a
permanent single-cloud failure. NCCloud is built on top of a network-coding-based storage
scheme called the functional minimum-storage regenerating (FMSR) codes, which
maintain the same fault tolerance and data redundancy as in traditional erasure codes (e.g.,
RAID-6), but use less repair traffic and, hence, incur less monetary cost due to data
transfer. One key design feature of our FMSR codes is that we relax the encoding
requirement of storage nodes during repair, while preserving the benefits of network
coding in repair. We implement a proof-of-concept prototype of NCCloud and deploy it
atop both local and commercial clouds. We validate that FMSR codes provide significant
monetary cost savings in repair over RAID-6 codes, while having comparable response
time performance in normal cloud storage operations such as upload/download.
A growing amount of data is produced daily resulting in a growing demand for storage
solutions. While cloud storage providers offer a virtually infinite storage capacity, data
owners seek geographical and provider diversity in data placement, in order to avoid
vendor lock-in and to increase availability and durability. Moreover, depending on the
customer data access pattern, a certain cloud provider may be cheaper than another. In this
paper, we introduce Scalia, a cloud storage brokerage solution that continuously adapts the
placement of data based on its access pattern and subject to optimization objectives, such
as storage costs. Scalia efficiently considers repositioning of only selected objects that may
significantly lower the storage cost. By extensive simulation experiments, we prove the
cost-effectiveness of Scalia against static placements and its proximity to the ideal data
placement in various scenarios of data access patterns, of available cloud storage solutions
and of failures.
Delta compression and remote file synchronization techniques are concerned with efficient
file transfer over a slow communication link in the case where the receiving party already
has a similar file (or files). This problem arises naturally, e.g., when distributing updated
versions of software over a network or synchronizing personal files between different
accounts and devices. More generally, the problem is becoming increasingly common in
many network-based applications where files and content are widely replicated, frequently
modified, and cut and reassembled in different contexts and packagings.
3. SYSTEM ANALYSIS
The Systems Development Life Cycle (SDLC), or Software Development Life Cycle in
systems engineering, information systems and software engineering, is the process of
creating or altering systems, and the models and methodologies that people use to develop
these systems. In software engineering the SDLC concept underpins many kinds of
software development methodologies.
In fact, the data deduplication technique, which is widely adopted by current cloud
storage services, is one example of exploiting the similarities among
different data chunks to save disk space and avoid data retransmission. It identifies
identical data chunks by their fingerprints, which are generated by fingerprinting
algorithms such as SHA-1 or MD5. Any change to the data will produce a very different
fingerprint with high probability. However, these fingerprints can only detect whether
or not the data chunks are duplicates, which is only good for exact equality testing.
Determining identical chunks is relatively straightforward, but efficiently determining
similarity between chunks is an intricate task due to the lack of similarity-preserving
fingerprints (or signatures).
Unplanned distribution of data chunks can lead to high information disclosure even
while using multiple clouds.
Frequent modifications of files by users result in a large number of similar chunks.
Similar chunks occur across files, which is why existing CSPs use the data deduplication
technique.
3.3 PROPOSED SYSTEM:
We present StoreSim, an information leakage aware multicloud storage system
which incorporates three important distributed entities, and we also formulate the
information leakage optimization problem in multicloud.
We propose an approximate algorithm, BFSMinHash, based on MinHash to
generate similarity-preserving signatures for data chunks.
Based on the information match measured by BFSMinHash, we develop an
efficient storage plan generation algorithm, Clustering, for distributing users’ data to
different clouds.
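To make the idea of a similarity-preserving signature concrete, the following Java sketch shows plain MinHash over character shingles. It is not BFSMinHash (the Bloom filter component and the leakage function are not reproduced here); the class name `MinHashSketch`, the shingle length, the number of hash functions and the linear hash family are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Plain MinHash sketch (hypothetical names); NOT the BFSMinHash algorithm itself.
final class MinHashSketch {
    private static final long PRIME = 2147483647L; // 2^31 - 1, a Mersenne prime
    private final long[] a;
    private final long[] b;

    // Build numHashes random hash functions h_i(x) = (a_i * x + b_i) mod PRIME.
    MinHashSketch(int numHashes, long seed) {
        Random random = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + random.nextInt((int) (PRIME - 1)); // 1 .. PRIME-1
            b[i] = random.nextInt((int) PRIME);           // 0 .. PRIME-1
        }
    }

    // Represent a chunk as the set of (non-negative) hash codes of its k-character shingles.
    static Set<Integer> shingles(String text, int k) {
        Set<Integer> result = new HashSet<>();
        for (int i = 0; i + k <= text.length(); i++) {
            result.add(text.substring(i, i + k).hashCode() & 0x7fffffff);
        }
        return result;
    }

    // The signature keeps, for every hash function, the minimum hash value over all shingles.
    long[] signature(Set<Integer> shingles) {
        long[] mins = new long[a.length];
        Arrays.fill(mins, Long.MAX_VALUE);
        for (int x : shingles) {
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME;
                if (h < mins[i]) {
                    mins[i] = h;
                }
            }
        }
        return mins;
    }

    // The fraction of matching positions estimates the Jaccard similarity of the shingle sets.
    static double estimateSimilarity(long[] s1, long[] s2) {
        int equal = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                equal++;
            }
        }
        return (double) equal / s1.length;
    }
}
```

Unlike SHA-1 or MD5 fingerprints, two chunks that differ slightly produce signatures that agree in most positions, so similarity can be estimated from the signatures alone without comparing the chunks themselves.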
4. IMPLEMENTATION
The Java language was initially called “Oak” and was renamed “Java” in 1995. The
primary motivation for the language was the need for a platform-independent, i.e.
architecture-neutral, language that could be used to create software to be embedded in
various consumer electronic devices.
An application is a program that runs on a computer under the operating system of that
computer. It is more or less like one created using C or C++. Java’s ability to create
applets makes it important. An applet is an application designed to be transmitted over
the Internet and executed by a Java-compatible web browser. An applet is actually a tiny
Java program, dynamically downloaded across the network, just like an image. The
difference is that it is an intelligent program, not just a media file: it can react to user
input and change dynamically.
Java Architecture
Compilation of code
When we compile the code, the Java compiler creates byte code, which is machine code for a
hypothetical machine called the Java Virtual Machine (JVM). The Java source code is thus first
compiled and then interpreted.
Fig 4.1: Structure of compilation
During run-time the Java interpreter tricks the byte code file into thinking that it is running
on a Java Virtual Machine. In reality this could be an Intel Pentium running Windows 95, a Sun
SPARCstation running Solaris or an Apple Macintosh running its own system, and all could receive
code from any computer through the Internet and run the applets.
Simple
Java was designed to be easy for the professional programmer to learn and to use
effectively. If you are an experienced C++ programmer, learning Java will be easier because
Java inherits many of the object-oriented features of C++. Most of the confusing concepts from
C++ are either left out of Java or
implemented in a cleaner, more approachable manner. In Java there are a small number of
clearly defined ways to accomplish a given task.
Object oriented
Java was not designed to be source-code compatible with any other language. This allowed
the Java team the freedom to design with a blank slate. One outcome of this was a clean,
usable, pragmatic approach to objects. The object model in Java is simple and easy to
extend, while simple types, such as integers, are kept as high-performance non-objects.
Robust
MODULES:
1. DATA OWNER
2. METADATA SERVER
4. DATA USER
DATA OWNER:
The data owner can modify data that has already been uploaded to the cloud. If the
owner updates data stored in cloud1 with content that is already
available in cloud2, the complete data becomes visible to cloud1 alone. To
avoid this problem, the owner checks data similarity using MinHash; a
data matching percentage is calculated and used to recommend where the
data should be uploaded.
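The placement decision described for the data owner can be sketched as follows. This is only an illustration under assumptions: it uses exact Jaccard similarity over character shingles as a stand-in for the MinHash-based matching percentage, and the class `PlacementSketch`, the method `recommendCloud` and the cloud names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "recommend where to upload"; not the project's actual module.
final class PlacementSketch {

    // All k-character substrings of the text.
    static Set<String> shingles(String text, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= text.length(); i++) {
            result.add(text.substring(i, i + k));
        }
        return result;
    }

    // Exact Jaccard similarity |X ∩ Y| / |X ∪ Y| (stand-in for the MinHash estimate).
    static double jaccard(Set<String> x, Set<String> y) {
        if (x.isEmpty() && y.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(x);
        intersection.retainAll(y);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return (double) intersection.size() / union.size();
    }

    // Recommend the cloud that already stores the chunk most similar to the new one.
    // Returns null when no cloud is given; ties go to the first cloud in iteration order.
    static String recommendCloud(String newChunk, Map<String, List<String>> storedByCloud, int k) {
        Set<String> newShingles = shingles(newChunk, k);
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, List<String>> entry : storedByCloud.entrySet()) {
            double score = 0.0;
            for (String stored : entry.getValue()) {
                score = Math.max(score, jaccard(newShingles, shingles(stored, k)));
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<String>> stored = new LinkedHashMap<>();
        stored.put("cloud1", Arrays.asList("the quick brown fox jumps"));
        stored.put("cloud2", Arrays.asList("lorem ipsum dolor sit amet"));
        System.out.println(recommendCloud("the quick brown fox leaps", stored, 3)); // prints cloud1
    }
}
```

Sending the modified chunk to the cloud that already holds its closest match means no additional cloud learns that content.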
METADATA SERVER:
Metadata servers are used to store the metadata database containing
information about files, CSPs and users, which is usually structured data
representing the whole cloud file system.
In this module, we design the Cloud functionalities. The Cloud entity can
view all customer details, file upload details and customer file download
details. In this module, we use the DriveHQ Cloud Service API for the
Cloud Integration and develop the project.
DATA USER
The data user module is used to download user data by requesting a file from the
cloud. The read routine fetches the stored value v from the servers. For each j
∈ [1 . . . s], piece vj is downloaded from server Sj and all pieces are combined
into v.
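The read routine can be illustrated with a short Java sketch. It assumes, purely for illustration, that the pieces are contiguous byte ranges of v and that combining them means concatenation in server order; the report does not state how pieces are formed. Each server Sj is simulated by one list element, and the names `ReadRoutineSketch`, `split` and `read` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the read routine; servers are simulated by an in-memory list.
final class ReadRoutineSketch {

    // Split v into s contiguous pieces v1..vs (trailing pieces may be shorter or empty).
    static List<byte[]> split(byte[] v, int s) {
        int pieceSize = (v.length + s - 1) / s; // ceiling division
        List<byte[]> pieces = new ArrayList<>();
        for (int j = 0; j < s; j++) {
            int start = Math.min(j * pieceSize, v.length);
            int end = Math.min(start + pieceSize, v.length);
            pieces.add(Arrays.copyOfRange(v, start, end));
        }
        return pieces;
    }

    // "Download" piece vj from each server Sj in order and concatenate the pieces into v.
    static byte[] read(List<byte[]> servers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] piece : servers) {
            out.write(piece, 0, piece.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] v = "hello multicloud".getBytes(StandardCharsets.UTF_8);
        List<byte[]> servers = split(v, 3);
        System.out.println(new String(read(servers), StandardCharsets.UTF_8)); // prints hello multicloud
    }
}
```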
5. SOFTWARE REQUIREMENT SPECIFICATION
The requirement specification covers what the application needs in order to provide highly secure
storage to the web server efficiently.
Software requirements deal with the software and hardware resources that need to be installed
on a server to provide optimal functioning for the application. These software and
hardware requirements need to be in place before the packages are installed. These are the
most common set of requirements defined by any operating system. These software and
hardware requirements provide compatible support to the operating system in developing
an application.
The hardware requirement specifies each interface of the software elements and the
hardware elements of the system. These hardware requirements include configuration
characteristics.
System : Pentium IV 2.4 GHz.
Hard Disk : 100 GB.
Monitor : 15 VGA Color.
Mouse : Logitech.
RAM : 1 GB.
The software requirements specify the use of all required software products, such as the data
management system. For each required software product, the name and version number are specified.
Each interface specifies the purpose of the interfacing software as related to this software
product.
Operating system : Windows XP/7/10
Database : MySQL
IDE : Netbeans
Server : Tomcat 7.0
All other requirements that do not form part of the above specification are
categorized as non-functional requirements. For example, a system may be required to present the
user with a display of the number of records in a database. If the number must be updated in real
time, the system architects must ensure that the system is capable of updating the
displayed record count within an acceptably short interval of the number of
records changing. Sufficient network bandwidth may also be a
non-functional requirement of a system.
Accessibility
Availability
Backup
Certification
Compliance
Configuration Management
Documentation
Disaster Recovery
Interoperability
5.5 Feasibility Study:
Preliminary investigation examines project feasibility: the likelihood that the system will be
useful to the organization. The main objective of the feasibility study is to test the
technical, operational and economical feasibility of adding new modules and debugging
the old running system. All systems are feasible if they are given unlimited resources and
infinite time. There are three aspects in the feasibility study portion of the preliminary
investigation:
Technical Feasibility
Operation Feasibility
Economical Feasibility
The technical issue usually raised during the feasibility stage of the investigation includes
the following:
Does the necessary technology exist to do what is suggested?
Does the proposed equipment have the technical capacity to hold the data required to
use the new system?
Will the proposed system provide adequate response to inquiries, regardless of the
number or location of users?
Can the system be upgraded if developed?
Are there technical guarantees of accuracy, reliability, ease of access and data security?
User-friendly
Customers will use the forms for their various transactions, i.e. for adding new routes and
viewing route details. The customer also wants reports to view the various
transactions based on the constraints. These forms and reports are designed to be
user-friendly for the client.
Reliability
The package will pick up current transactions online. Regarding the old transactions,
the user will enter them into the system.
Security
The web server and database server should be protected from hacking, viruses, etc.
Portability
The application will be developed using standard open source software (except Oracle)
such as Java, the Tomcat web server and the Internet Explorer browser. This software will work
on both Windows and Linux operating systems. Hence portability problems will not arise.
Availability
This software will be available always.
Maintainability
The system uses a 2-tier architecture. The first tier is the GUI, which is the front
end, and the second tier is the database, which uses MySQL and is the back end.
The front end can be run on different systems (clients). The database will run on the
server. Users access these forms by using their user IDs and passwords.
The computerized system takes care of the existing system’s data flow and
procedures completely and should generate all the reports of the manual system besides a
host of other management reports.
It should be built as a web-based application with separate web server and database server.
This is required because the activities are spread throughout the organization and the customer
wants a centralized database. Further, some of the linked transactions take place in different
locations.
Open source software such as Tomcat, Java, MySQL and Linux is used to minimize the
cost for the customer.
6. Methodology
[Figure: SDLC stages (feasibility study, team formation, project specification preparation, requirements gathering, analysis and design, code, unit test, integration and system testing, delivery/installation, acceptance test, assessment) enclosed by the umbrella activities document control and training.]
SDLC stands for Software Development Life Cycle. It is a standard process used by the
software industry to develop good software.
The requirements gathering process takes as its input the goals identified in the high-level
requirements section of the project plan. Each goal will be refined into a set of one or more
requirements. These requirements define the major functions of the intended application,
define operational data areas and reference data areas, and define the initial data entities.
Major functions include critical processes to be managed, as well as mission critical inputs,
outputs and reports. A user class hierarchy is developed and associated with these major
functions, data areas, and data entities. Each of these definitions is termed a Requirement.
Requirements are identified by unique requirement identifiers and, at minimum, contain a
requirement title and textual description.
These requirements are fully described in the primary deliverables for this stage: the
Requirements Document and the Requirements Traceability Matrix (RTM). The
requirements document contains complete descriptions of each requirement, including
diagrams and references to external documents as necessary. Note that detailed listings of
database tables and fields are not included in the requirements document.
The title of each requirement is also placed into the first version of the RTM, along with
the title of each goal from the project plan. The purpose of the RTM is to show that the
product components developed during each stage of the software development lifecycle are
formally connected to the components developed in prior stages.
In the requirements stage, the RTM consists of a list of high-level requirements, or goals,
by title, with a listing of associated requirements for each goal, listed by requirement title.
In this hierarchical listing, the RTM shows that each requirement developed during this
stage is formally linked to a specific product goal. In this format, each requirement can be
traced to a specific product goal, hence the term requirements traceability.
The outputs of the requirements definition stage include the requirements document, the
RTM, and an updated project plan.
Analysis Stage
The planning stage establishes a bird's eye view of the intended software product, and uses
this to establish the basic project structure, evaluate feasibility and risks associated with the
project, and describe appropriate management and technical approaches.
The most critical section of the project plan is a listing of high-level product requirements,
also referred to as goals. All of the software product requirements to be developed during
the requirements definition stage flow from one or more of these goals. The minimum
information for each goal consists of a title and textual description, although additional
information and references to external documents may be included. The outputs of the
project planning stage are the configuration management plan, the quality assurance plan,
and the project plan and schedule, with a detailed listing of scheduled activities for the
upcoming Requirements stage, and high level estimates of effort for the out stages.
Designing Stage
The design stage takes as its initial input the requirements identified in the approved
requirements document. For each requirement, a set of one or more design elements will be
produced as a result of interviews, workshops, and/or prototype efforts. Design elements
describe the desired software features in detail, and generally include functional hierarchy
diagrams, screen layout diagrams, tables of business rules, business process diagrams,
pseudo code, and a complete entity-relationship diagram with a full data dictionary. These
design elements are intended to describe the software in sufficient detail that skilled
programmers may develop the software with minimal additional input.
Fig no. 6.4 Designing stage
When the design document is finalized and accepted, the RTM is updated to show that
each design element is formally associated with a specific requirement. The outputs of the
design stage are the design document, an updated RTM, and an updated project plan.
The development stage takes as its primary input the design elements described in the
approved design document. For each design element, a set of one or more software artifacts
will be produced. Software artifacts include but are not limited to menus, dialogs, data
management forms, data reporting formats, and specialized procedures and functions.
Appropriate test cases will be developed for each set of functionally related software
artifacts, and an online help system will be developed to guide users in their interactions
with the software.
Fig no. 6.5 Coding stage
During the integration and test stage, the software artifacts, online help, and test data are
migrated from the development environment to a separate test environment. At this point,
all test cases are run to verify the correctness and completeness of the software. Successful
execution of the test suite confirms a robust and complete migration capability. During this
stage, reference data is finalized for production use and production users are identified and
linked to their appropriate roles. The final reference data (or links to reference data source
files) and production user list are compiled into the Production Initiation Plan.
Fig no. 6.6 Integration and Testing Stage
During the installation and acceptance stage, the software artifacts, online help, and initial
production data are loaded onto the production server. At this point, all test cases are run to
verify the correctness and completeness of the software. Successful execution of the test
suite is a prerequisite to acceptance of the software by the customer.
After customer personnel have verified that the initial production data load is correct and
the test suite has been executed with satisfactory results, the customer formally accepts the
delivery of the software.
Maintenance
The outer rectangle represents the maintenance of a project. The maintenance team starts with a
requirement study and an understanding of the documentation; later, employees are assigned work
and undergo training in their assigned category.
7. System Design
The purpose of the design phase is to plan a solution to the problem specified by the
requirements document. This phase is the first step in moving from the problem domain to
the solution domain. The design phase satisfies the requirements of the system. The design
of a system is perhaps the most critical factor affecting the quality of the
software. It has a major impact on the later phases, particularly testing and maintenance.
The output of this phase is the design document. This document is similar to a
blueprint of the solution and is used later during implementation, testing and
maintenance. The design activity is commonly divided into two separate phases: System
Design and Detailed Design.
System Design, also called top-level design, aims to identify the modules that
should be in the system, the specifications of those modules, and how they
interact with each other to produce the desired results.
At the end of system design, all the major data structures, file formats, output
formats, and the major modules in the system and their specifications are
decided. System design is the process or art of defining the architecture, components,
modules, interfaces, and data for a system to satisfy specified requirements. Users can view
it as the application of systems theory to product development.
During Detailed Design, the internal logic of each of the modules specified in system design is
decided. During this phase, the details of the data of a module are
usually specified in a high-level design description language that is independent of the target
language in which the software will eventually be implemented.
In system design the focus is on identifying the modules, whereas during
detailed design the focus is on designing the logic for each of the modules.
Figure 7.1: Architecture diagram
A Data Flow Diagram (DFD), also known as a data flow graph or bubble chart, is a pictorial or
graphical form which can be applied to represent the input data to a system, the functions
carried out on the data and the output generated by the system.
It is a graphical tool used to describe and analyze the movement of data through
a system, manual or automated, including the processes, stores of data, and delays
in the system. The transformation of data from input to output, through
processes, may be described logically and independently of the physical components associated
with the system. The basic notations
used to create a DFD are as follows:
Dataflow:
Process:
Source:
Data Store:
Rhombus: decision
The user model view represents the system from the user’s perspective. The analysis representation
describes a usage scenario from the end user’s perspective.
In the structural model view, the data and functionality are viewed from inside the system. This
model view models the static structures.
The behavioral model view represents the dynamic or behavioral parts of the system, depicting the
interactions between the various structural elements described in the user model and structural
model views.
In the implementation model view, the structural and behavioral parts of the system are
represented as they are to be built.
Figure 7.3.1 Use Case Diagram
The class diagram is the main building block of object-oriented modeling. It is used both
for general conceptual modeling of the systematics of the application, and for detailed
modeling translating the models into programming code. Class diagrams can also be used
for data modeling. The classes in a class diagram represent both the main objects and
interactions in the application and the classes to be programmed. In the diagram, classes
are represented with boxes which contain three parts:
the class name, its attributes, and its operations.
Figure 7.3.2: Class Diagram.
5.3.3 SEQUENCE DIAGRAM
Figure 7.3.3: Sequence diagram
5.3.4 ACTIVITY DIAGRAM
Figure 7.3.4: Activity Diagram
5.3.5 ER Diagram
Figure 7.3.5: ER Diagram
8. TESTING
Testing is the process in which test data is prepared and used for testing the modules
individually, followed by validation of the fields. Then system testing takes place,
which makes sure that all components of the system function properly as a unit. The test
data should be chosen such that it passes through all possible conditions. The following is
a description of the testing strategies that were carried out during the testing period.
Testing has become an integral part of any system or project, especially in the field of
information technology. The importance of testing as a method of judging whether one is ready
to move further, or whether a system can withstand the rigors of a particular
situation, cannot be underplayed, and that is why testing before release is so critical.
Before the developed software is given to the user, it must be
tested to check whether it serves the purpose for which it was developed. This testing involves
various types through which one can ensure the software is reliable. The program was
tested logically, and the pattern of execution of the program was repeated for a set of data. Thus
the code was exhaustively checked for all possible correct data, and the outcomes were also
checked.
To locate errors, each module is tested individually. This enables us to detect an error and
correct it without affecting any other module. Whenever the program does not satisfy the
required function, it must be corrected to produce the required result. Thus all the modules
are individually tested from the bottom up, starting with the smallest and lowest-level
modules and proceeding to the next level. For example, the job classification module is
tested separately: it is tested with different jobs and their approximate execution times,
and the result of the test is compared with results that are prepared manually. In this
system the resource classification and job scheduling modules are likewise tested separately
and their corresponding results are obtained, which reduces the process waiting time.
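The idea of testing one module in isolation and comparing its output with manually prepared results can be sketched as follows. This is only an illustration: `ChunkSplitter`, `split` and `matchesExpected` are hypothetical names that do not come from the project, and a simple fixed-size chunk splitter is assumed as a stand-in for a module under test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical module under test: splits a file's bytes into fixed-size chunks. */
class ChunkSplitter {

    /** Returns consecutive chunks of at most chunkSize bytes; the last chunk may be shorter. */
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, data.length);
            chunks.add(Arrays.copyOfRange(data, start, end));
        }
        return chunks;
    }

    /** Unit check: compares the module's output with a manually prepared expected result. */
    static boolean matchesExpected(byte[] input, int chunkSize, byte[][] expected) {
        List<byte[]> actual = split(input, chunkSize);
        if (actual.size() != expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (!Arrays.equals(actual.get(i), expected[i])) {
                return false;
            }
        }
        return true;
    }
}
```

A test case then consists of an input, a chunk size, and the expected chunks written out by hand; a mismatch points to a fault inside this one module rather than in the modules that depend on it.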
After module testing, integration testing is applied. When the modules are linked, errors
may occur; these errors are found and corrected through integration testing. In this
system all modules are connected and tested together. The test results are correct. Thus
the mapping of jobs to resources is done correctly by the system.
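An integration check of this kind can be sketched as a round trip through several linked modules. The sketch below is an assumption-laden illustration, not the project's code: `IntegrationCheck` and its methods are hypothetical, the "clouds" are in-memory maps, and the round-robin placement is only a placeholder policy used to connect the modules.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical end-to-end check that links split, upload and download modules. */
class IntegrationCheck {

    /** Module 1: split data into chunks of at most chunkSize bytes. */
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            chunks.add(Arrays.copyOfRange(data, start, Math.min(start + chunkSize, data.length)));
        }
        return chunks;
    }

    /** Module 2: store chunk i on cloud (i % cloudCount); a placeholder placement policy. */
    static List<Map<Integer, byte[]>> upload(List<byte[]> chunks, int cloudCount) {
        List<Map<Integer, byte[]>> clouds = new ArrayList<>();
        for (int c = 0; c < cloudCount; c++) {
            clouds.add(new HashMap<>());
        }
        for (int i = 0; i < chunks.size(); i++) {
            clouds.get(i % cloudCount).put(i, chunks.get(i));
        }
        return clouds;
    }

    /** Module 3: fetch the chunks back in index order and concatenate them. */
    static byte[] download(List<Map<Integer, byte[]>> clouds, int chunkCount) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < chunkCount; i++) {
            byte[] chunk = clouds.get(i % clouds.size()).get(i);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    /** Integration check: the data that comes back must equal the data that went in. */
    static boolean roundTrip(byte[] data, int chunkSize, int cloudCount) {
        List<byte[]> chunks = split(data, chunkSize);
        List<Map<Integer, byte[]>> clouds = upload(chunks, cloudCount);
        return Arrays.equals(data, download(clouds, chunks.size()));
    }
}
```

Each module may pass its own unit test and the round trip can still fail, for instance if upload and download disagree on how chunk indices map to clouds; that is the class of error integration testing is meant to expose.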
When the user finds no major problems with its accuracy, the system passes through a
final acceptance test. This test confirms that the system meets the original goals,
objectives and requirements established during analysis before the system goes into actual
operation, which avoids wasting time and money. Responsibility for acceptance tests rests
on the shoulders of users and management; once they are passed, the system is finally
accepted and ready for operation.
8.5 TEST CASES:
04 | Prediction Random Forest | Whether prediction algorithm is applied on the data or not | If not applied | Random tree is not generated | Random tree is generated | High | High
05 | Recommendation | Whether predicted data is displayed or not | If not displayed | It cannot view prediction containing patient data | It can view prediction containing patient data | High | High
06 | Noisy Records Chart | Whether the graph is displayed or not | If graph is not displayed | It does not show the variations in between clean and noisy records | It shows the variations in between clean and noisy records | Low | Medium
9. SCREEN SHOTS
Fig: Client Registration
Fig: Client Home
Fig: View Files and Split
Fig: Encrypt Data
Fig: Modify Files
Fig: Modify Cloud2 Files
Fig: Modify Data
Fig: Download Files
Fig: Download File
Fig: Meta Data Server Home
Fig: View Cloud2 Files
Fig: Storage Server1 Home
Fig: View Client Requests and Sent Cloud1 Key
Fig: View Cloud2 Files
10. CONCLUSION
Distributing data on multiple clouds provides users with a certain degree of information
leakage control, in that no single cloud provider is privy to all of the user’s data. However,
unplanned distribution of data chunks can lead to avoidable information leakage. We show
that distributing data chunks in a round-robin way can leak as much as 80% of the user’s
total information as the number of data synchronizations increases. To minimize the
information leakage, we presented StoreSim, an information leakage aware storage
system in the multicloud. StoreSim achieves this goal by using novel algorithms,
BFSMinHash and SPClustering, which place the data with minimal information leakage
(based on similarity) on the same cloud. Through an extensive evaluation based on two real
datasets, we demonstrate that StoreSim is both effective and efficient (in terms of time and
storage space) in minimizing information leakage during the process of synchronization in
multicloud. We show that StoreSim can achieve near-optimal performance and reduce
information leakage by up to 60% compared to unplanned placement. Finally, through our
attackability analysis, we further demonstrate that StoreSim not only reduces the risk of
wholesale information leakage but also makes attacks on retail information much more
complex.
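To make the notion of similarity-preserving signatures concrete, the following is a minimal sketch of plain MinHash over character shingles. It is not the BFSMinHash algorithm itself, which additionally uses a Bloom filter; the class `MinHashSketch`, the shingle length, and the affine hash family are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Plain MinHash sketch (illustrative only; not the BFSMinHash algorithm). */
class MinHashSketch {
    private static final long PRIME = 2147483647L; // 2^31 - 1
    private final long[] a; // multipliers of the hash functions h(x) = (a*x + b) mod PRIME
    private final long[] b; // offsets of the hash functions

    MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // in [1, PRIME - 1]
            b[i] = rnd.nextInt(Integer.MAX_VALUE);         // in [0, PRIME - 1]
        }
    }

    /** Breaks text into the set of overlapping character shingles of length k. */
    static Set<String> shingles(String text, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= text.length(); i++) {
            result.add(text.substring(i, i + k));
        }
        return result;
    }

    /** For each hash function, keeps the minimum hash value over all shingles. */
    long[] signature(Set<String> shingles) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String s : shingles) {
            long x = (s.hashCode() & 0x7fffffffL) % PRIME;
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME;
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    /** Estimates Jaccard similarity as the fraction of signature positions that agree. */
    static double estimateSimilarity(long[] s1, long[] s2) {
        if (s1.length != s2.length || s1.length == 0) {
            throw new IllegalArgumentException("signatures must be non-empty and of equal length");
        }
        int equal = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                equal++;
            }
        }
        return (double) equal / s1.length;
    }
}
```

Two chunks whose signatures agree in most positions are likely to share most of their content, so a placement policy in the spirit of StoreSim would try to keep them on the same cloud; the signatures are far smaller than the chunks, which is what makes the comparison cheap.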
Future Enhancements:
It is not possible to develop a system that meets all the requirements of the user, because
user requirements keep changing as the system is being used. Some of the future
enhancements that can be made to this system are:
11. REFERENCES
6. G. Greenwald and E. MacAskill, “Nsa prism program taps in to user data of apple,
google and others,” The Guardian, vol. 7, no. 6, pp. 1–43, 2013.
7. T. Suel and N. Memon, “Algorithms for delta compression and remote file
synchronization,” 2002.
59
8. [8] I. Drago, E. Bocchi, M. Mellia, H. Slatman, and A. Pras, “Benchmarking
personal cloud storage,” in Proceedings of the 2013 conference on Internet
9. measurement conference. ACM, 2013, pp. 205–212.
10. [9] I. Drago, M. Mellia, M.MMunafo, A. Sperotto, R. Sadre, and A. Pras, “Inside
dropbox: understanding personal cloud storage services,” in Proceedings of the
2012 ACM conference on Internet measurement conference. ACM, 2012, pp. 481–
494.
11. [10] U. Manber et al., “Finding similar files in a large file system.” in Usenix
Winter, vol. 94, 1994, pp. 1–10.
12. [11] P. Mahajan, S. Setty, S. Lee, A. Clement, L. Alvisi, M. Dahlin, and M.Walfish,
“Depot: Cloud storage with minimal trust,” ACM Transactions on Computer
Systems (TOCS), vol. 29, no. 4, p. 12, 2011.
15. [14] L. P. Cox and B. D. Noble, “Samsara: Honor among thieves in peer-to-peer
storage,” ACM SIGOPS Operating Systems Review, vol. 37, no. 5, pp. 120–132,
2003.
60
16. [15] H. Zhuang, R. Rahman, and K. Aberer, “Decentralizing the cloud: How can
small data centers cooperate?” in Peer-to-Peer Computing (P2P), 14-th IEEE
International Conference on. Ieee, 2014, pp. 1–10.
22. [19] J.-M. Bohli, N. Gruschka, M. Jensen, L. L. Iacono, and N. Marnau, “Security
and privacy-enhancing multicloud architectures,” Dependable and Secure
Computing, IEEE Transactions on, vol. 10, no. 4, pp. 212–224, 2013.
BIBLIOGRAPHY
JAVA Technologies
Java Script Programming by Yehuda Shiran
HTML
JDBC
SAMPLE CODE
DB Connection Code:
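package databasecon;

import java.sql.Connection;
import java.sql.DriverManager;

// Class and method names are inferred from the call databasecon.getconnection()
// used elsewhere in this listing; the original declaration lines are missing.
public class databasecon {

    // Opens a JDBC connection to the local MySQL database; returns null on failure.
    public static Connection getconnection() {
        Connection con = null;
        try {
            Class.forName("com.mysql.jdbc.Driver");
            con = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/studentperformance", "root", "root");
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return con;
    }
}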
UploadServlet:
package databasecon;

import java.io.IOException;
import java.io.PrintWriter;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.Vector;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javazoom.upload.*;

/**
 * @author ashwini
 */
public class UploadServlet extends HttpServlet {
        try {
            ResultSet rs1;
            String k = nreq.getParameter("course");
            String l = nreq.getParameter("date");
            String m = nreq.getParameter("time");
            String n = nreq.getParameter("duration");
            String s = k + ".jpg";
            String jj = s.toLowerCase();
            upb.setFolderstore(resPath);
            upb.setOverwrite(false);
            upb.store(nreq, "efile");
            System.out.println("material " + filesName.get(0));
            Class.forName("com.mysql.jdbc.Driver");
            Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/studentperformance", "root", "root");
            pstm = con.prepareStatement(
                    "insert into details (course,dt, tm, du, fl,image)values(?,?,?,?,?,?)");
            System.out.println("========================");
            pstm.setString(1, k);
            pstm.setString(2, l);
            pstm.setString(3, m);
            pstm.setString(4, n);
            System.out.println("hi---------------");
            // rs1=mycall.executeQuery();
            // rs1=mycall.getResultSet();
            int i = pstm.executeUpdate();
            if (i > 0) {
                out.println("Successfully Uploaded ");
                RequestDispatcher rs =
                        req.getRequestDispatcher("addcoursedetails.jsp?m1=Uploded Succes");
                rs.forward(req, res);
            } else {
                out.println("File Upload Failed");
            }
        } catch (Exception e) {
            RequestDispatcher rs =
                    req.getRequestDispatcher("addcoursedetails.jsp?m3=Uploded Failed");
            rs.forward(req, res);
        }
    }
Mail Code:
package Mail;
import java.util.Properties;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.PasswordAuthentication;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;
/**
*
* @author java2
*/
public class Mail {
                getPasswordAuthentication() {
                    return new PasswordAuthentication("projects9876@gmail.com", "projects123");
                }
            });
            Transport.send(message);
            System.out.println("Done");
            return true;
        } catch (MessagingException e) {
            System.out.println(e);
            e.printStackTrace();
            return false;
            // throw new RuntimeException(e);
        }
    }
DB Designer Code:
package Db_register;
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.Statement;
import java.util.Random;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import databaseconnection.databasecon;
import javax.servlet.http.HttpSession;
/**
*
* @author java3
*/
/**
* Processes requests for both HTTP
* <code>GET</code> and
* <code>POST</code> methods.
*
* @param request servlet request
* @param response servlet response
* @throws ServletException if a servlet-specific error occurs
* @throws IOException if an I/O error occurs
*/
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        response.setContentType("text/html;charset=UTF-8");
        PrintWriter out = response.getWriter();
        String username = request.getParameter("username");
        String password = request.getParameter("password");
        String email = request.getParameter("email");
        String dob = request.getParameter("dob");
        String gender = request.getParameter("gender");
        String mobile = request.getParameter("mobile");
        // "address" was used in the insert but never read; it is assumed here to be
        // a request parameter like the other fields.
        String address = request.getParameter("address");
        // Bound parameters replace string concatenation, which was open to SQL injection.
        String sql = "insert into patient values (?,?,?,?,?,?,?)";
        // try-with-resources closes the statement and connection after use.
        try (Connection con = databasecon.getconnection();
             java.sql.PreparedStatement ps = con.prepareStatement(sql)) {
            Random s = new Random();
            int a = s.nextInt(10000 - 5000) + 25000;
            System.out.print(a);
            String[] values = {username, password, email, dob, gender, mobile, address};
            for (int idx = 0; idx < values.length; idx++) {
                ps.setString(idx + 1, values[idx]);
            }
            int i = ps.executeUpdate();
            if (i != 0) {
                HttpSession session = request.getSession();
                session.setAttribute("username", username);
                response.sendRedirect("owner.jsp?owner=success");
            } else {
                response.sendRedirect("ownerreg.jsp?msgg=failed");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
Login page:
package Db_register;
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Random;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import databaseconnection.databasecon;
import sun.misc.BASE64Encoder;
/**
*
* @author java3
*/
/**
* Processes requests for both HTTP
* <code>GET</code> and
* <code>POST</code> methods.
*
* @param request servlet request
* @param response servlet response
* @throws ServletException if a servlet-specific error occurs
* @throws IOException if an I/O error occurs
*/
//@Override
public class login {
{
return true;
}
else{
return false;
}