Rohit Handa
Lecturer, CSE-IT Department
IBM-ICE Program, BUEST Baddi
Topic: Cloud Storage & Cloud Standards: Overview, Storage as a service, Cloud storage
issues, Challenges, Standards
Topic-1: Cloud Storage
Cloud Storage is a service where data is remotely maintained, managed, and backed
up. The service is available to users over a network, which is usually the internet. It
allows the user to store files online so that the user can access them from any location
via the internet. The provider company makes them available to the user online by
keeping the uploaded files on an external server. This gives companies using cloud
storage services ease and convenience, but can potentially be costly.
Users should also be aware that backing up their data is still required when using
cloud storage services, because recovering data from cloud storage is much slower
than local backup.
Cloud storage refers to saving data to an off-site storage system maintained by a third
party. Instead of storing information to your computer's hard drive or other local
storage device, you save it to a remote database.
Cloud storage is a model of networked enterprise storage where data is stored in
virtualized pools of storage which are generally hosted by third
parties. Hosting companies operate large data centers, and people who require their
data to be hosted buy or lease storage capacity from them.
The data center operators, in the background, virtualize the resources according to
the requirements of the customer and expose them as storage pools, which the
customers can themselves use to store files or data objects.
Physically, the resource may span across multiple servers and multiple locations. The
safety of the files depends upon the hosting companies, and on the applications that
leverage the cloud storage.
Cloud storage services may be accessed through a web service application
programming interface (API) or by applications that utilize the API, such as cloud
desktop storage, a cloud storage gateway or Web-based content management systems.
Most cloud storage providers started providing users
with the convenience of on-the-go access to their
cloud sync folder from multiple mobile devices.
For some computer owners, finding enough storage
space to hold all the data they've acquired is a real
challenge. Some people invest in larger hard drives.
Others prefer external storage devices like thumb
drives or compact discs.
Desperate computer owners might delete entire
folders' worth of old files in order to make space for
new information. But some are choosing to rely on a growing trend: cloud storage.
The facilities that house cloud storage systems are called data centers. At its most
basic level, a cloud storage system needs just one data server connected to
the Internet. A client (e.g., a computer user subscribing to a cloud storage service)
sends copies of files over the Internet to the data server, which then records the
information.
When the client wishes to retrieve the information, he or she accesses the data server
through a Web-based interface. The server then either sends the files back to the
client or allows the client to access and manipulate the files on the server itself.
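The upload and retrieval steps described above can be sketched with a toy, in-memory model. The class name `DataServer` and its `upload`/`download` methods are assumptions made for illustration; a real service would expose these operations over the network rather than as local method calls.

```python
class DataServer:
    """Toy stand-in for a single cloud data server (in-memory only)."""

    def __init__(self):
        self._files = {}  # file name -> stored bytes

    def upload(self, name, content):
        # Record a copy of the file the client sent.
        self._files[name] = bytes(content)

    def download(self, name):
        # Send the stored copy back to the client.
        return self._files[name]


server = DataServer()
server.upload("notes.txt", b"cloud storage lecture")
restored = server.download("notes.txt")
print(restored.decode())
```

The client keeps its own original; the server holds a copy that can be fetched again from any location that can reach it.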
Cloud storage systems generally rely on hundreds of data servers. Because computers
occasionally require maintenance or repair, it's important to store the same
information on multiple machines. This is called redundancy. Without redundancy, a
cloud storage system couldn't ensure clients that they could access their information
at any given time. Most systems store the same data on servers that use
different power supplies. That way, clients can access their data even if one power
supply fails.
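A minimal sketch of redundancy, assuming a hypothetical `RedundantStore` that writes every file to all reachable servers and reads from the first one that still has a copy. The names and the failure model (an `online` flag) are illustrative only.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.online = True   # set to False to simulate an outage
        self.files = {}


class RedundantStore:
    def __init__(self, servers):
        self.servers = servers

    def put(self, key, data):
        # Write the same bytes to every server that is currently reachable.
        for server in self.servers:
            if server.online:
                server.files[key] = data

    def get(self, key):
        # Return the first copy found on a reachable server.
        for server in self.servers:
            if server.online and key in server.files:
                return server.files[key]
        raise KeyError(key)


servers = [Server("a"), Server("b"), Server("c")]
store = RedundantStore(servers)
store.put("report.pdf", b"quarterly numbers")
servers[0].online = False          # simulate a power-supply failure
recovered = store.get("report.pdf")
```

Because the file exists on three machines, losing one of them does not affect the read. Only when every copy is unreachable does the request fail.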
Not all cloud storage clients are worried about running out of storage space. They use
cloud storage as a way to create backups of data. If something happens to the client's
computer system, the data survives off-site.
Cloud storage is based on highly virtualized infrastructure and has the same
characteristics as cloud computing in terms of agility, scalability, elasticity and multi-
tenancy, and is available both off-premises and on-premises.
Cloud storage is:
o Made up of many distributed resources, but still acts as one - often referred to
as federated storage clouds
o Highly fault tolerant through redundancy and distribution of data
o Highly durable through the creation of versioned copies
o Typically eventually consistent with regard to data replicas
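The last property, eventual consistency, can be illustrated with a small sketch. It assumes a hypothetical store in which a write reaches one replica immediately and the others only after a later `sync` step; real systems propagate updates in the background, and all names here are made up for the example.

```python
class EventuallyConsistentStore:
    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica index, key, value) updates not yet applied

    def write(self, key, value):
        # The first replica sees the write at once; the rest are queued.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def sync(self):
        # Apply queued updates so that all replicas converge.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore(3)
store.write("photo.jpg", "v1")
stale = store.read("photo.jpg", 2)   # replica 2 has not seen the write yet
store.sync()
fresh = store.read("photo.jpg", 2)   # after syncing, replica 2 agrees
```

Between the write and the sync, a reader may see old (or missing) data depending on which replica answers; once propagation finishes, all replicas return the same value.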
Some cloud storage services are free. Others charge a flat fee for a certain amount of
storage, and still others have a sliding scale depending on what the client needs. In general,
the price for online storage has fallen as more companies have entered the industry. Even
many of the companies that charge for digital storage offer at least a certain amount for free.
Topic-2: Issues
The two biggest concerns about cloud storage are reliability and security.
1. Security: Clients aren't likely to entrust their data to another company without a
guarantee that they'll be able to access their information whenever they want and no one
else will be able to get at it. To secure data, most systems use a combination of
techniques, including:
- Encryption, which means they use a complex algorithm to encode information. To
decode the encrypted files, a user needs the encryption key. While it's possible to
crack encrypted information, most hackers don't have access to the amount of
computer processing power they would need to decrypt information.
- Authentication processes, which require the user to create a user name and password.
- Authorization practices -- the client lists the people who are authorized to access
information stored on the cloud system. Many corporations have multiple levels of
authorization. For example, a front-line employee might have very limited access to
data stored on a cloud system, while the head of human resources might have
extensive access to files.
- Even with these protective measures in place, many people worry that data saved on a
remote storage system is vulnerable. There's always the possibility that a hacker will
find an electronic back door and access data.
- Hackers could also attempt to steal the physical machines on which data are stored. A
disgruntled employee could alter or destroy data using his or her authenticated user
name and password.
- Cloud storage companies invest a lot of money in security measures in order to limit
the possibility of data theft or corruption.
2. Reliability: An unstable cloud storage system is a liability. No one wants to save data to
a failure-prone system, nor do they want to trust a company that isn't financially stable.
While most cloud storage systems try to address this concern through redundancy
techniques, there's still the possibility that an entire system could crash and leave clients
with no way to access their saved data.
4. Supplier stability
- Companies are not permanent and the services and products they provide can
change. Outsourcing data storage to another company needs careful investigation and
nothing is ever certain.
- Contracts set in stone can be worthless when a company ceases to exist or its
circumstances change. Companies can:
a. Go bankrupt.
b. Expand and change their focus.
c. Be purchased by other larger companies.
d. Be purchased by a company headquartered in or move to a country that
negates compliance with export restrictions and thus necessitates a move.
e. Suffer an irrecoverable disaster.
5. Accessibility
- Performance for outsourced storage is likely to be lower than local storage, depending
on how much a customer is willing to spend for WAN bandwidth
- Reliability and availability depend on wide area network availability and on the level
of precautions taken by the service provider. Reliability should be based on hardware
as well as various algorithms used.
- If you have no internet connection, you have no access to your data.
6. Other concerns
- Piracy and copyright infringement may be enabled by sites that permit filesharing.
- The legal aspect, from a regulatory compliance standpoint, is of concern when storing
files domestically and especially internationally.
7. Usability: Be careful when using drag/drop to move a document into the cloud storage
folder. This will permanently move your document from its original folder to the cloud
storage location. Do a copy and paste instead of drag/drop if you want to retain the
document's original location in addition to placing a copy in the cloud storage folder.
8. Bandwidth: Several cloud storage services have a specific bandwidth allowance. If an
organization surpasses the given allowance, the additional charges could be significant.
However, some providers allow unlimited bandwidth. This is a factor that companies
should consider when looking at a cloud storage provider.
9. Software: If you want to be able to manipulate your files locally through multiple
devices, you'll need to download the service on all devices.
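The authentication and authorization practices listed under Security above can be sketched as follows. This is a minimal illustration, not how any particular provider works: the `AccessControl` class, the role names, and the permission sets are assumptions. Passwords are stored only as salted PBKDF2 hashes using Python's standard `hashlib` module.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    # Derive a slow, salted hash so the plain password is never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class AccessControl:
    def __init__(self):
        self._users = {}  # user name -> (salt, password hash, role)
        # Hypothetical authorization levels: limited vs. extensive access.
        self._permissions = {
            "employee": {"read"},
            "hr_head": {"read", "write", "delete"},
        }

    def register(self, user, password, role):
        salt = os.urandom(16)
        self._users[user] = (salt, hash_password(password, salt), role)

    def authenticate(self, user, password):
        record = self._users.get(user)
        if record is None:
            return False
        salt, stored_hash, _role = record
        # Constant-time comparison avoids leaking information through timing.
        return hmac.compare_digest(stored_hash, hash_password(password, salt))

    def is_authorized(self, user, action):
        record = self._users.get(user)
        return record is not None and action in self._permissions[record[2]]


acl = AccessControl()
acl.register("asha", "correct horse", "employee")
acl.register("ravi", "battery staple", "hr_head")
```

Authentication answers "is this really the user?", while authorization answers "what may this user do?". Here the front-line employee can only read, whereas the head of human resources can also write and delete.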
Topic-3: Challenges
Security (Data Leakage)
o Security can be provided through using a combination of techniques:
Encryption
Authentication
Authorization
o Secure online storage and file sharing solutions are in high demand.
o With so many people using the internet to communicate and collaborate with
coworkers on a daily basis, often needing to share sensitive information and
documents, the need to make those interactions secure and protected is
paramount.
o Before you choose a cloud storage solution, especially one of the free ones
being offered, take a close look at the terms of service (TOS)
agreement.
o A few of the leading online storage solutions have drawn scrutiny from privacy
rights advocates wondering who effectively owns users' data.
o All online storage is vulnerable to security breaches from increasingly
sophisticated hackers. Even market leader Dropbox experienced a security
breach, generating a discussion about the safety of cloud storage for the most
sensitive data.
o Don't risk the security of your data in order to save money; doing so may
actually end up costing you a lot more down the line.
Reliability
o Provided through Redundancy
Performance problem: Big data, slow networks
Mesh network inconsistency: Packets can take a variety of routes between the
servers and the storage. The mesh component is vulnerable to both packet loss and
capacity problems.
Slowdowns and brownouts: This is a problem at both Amazon and GoGrid, but it is
easier to see at Amazon. Their network, and consequently their storage, has variable
performance, with slow periods that are called brownouts.
Packet loss: This is related to the capacity problems as routers will throw away
packets when they are overloaded. However, the source of the packet loss seems to be
much harder to debug in a mesh network. These problems are seen on the
GoGrid network, and their attempts to diagnose them are often ineffectual.
Replication stoppages
Cost of cloud storage:
o The more you store, the more you pay. Cloud storage can be costly.
The more information you store in the cloud, the more expensive it can get.
o So if you plan to save all of your information online, especially if you are a digital media
buff wanting to back up lots of photos, graphic files, and videos, cloud storage
may not be the best solution for you.
Access:
o If it's not in your sync folder, you're out of luck.
o Perhaps the biggest yet least talked-about limitation of most popular online
storage and file sharing solutions like Dropbox or SugarSync is the fact that
only the information you remember to sync ahead of time will be readily
available for remote access once you're away from your computer.
o So what if you haven't saved all of the data, video, and files that you need in
these online storage solutions and you end up being away from your
computer? How can you remotely access your files then?
o Unlike the Cloud and Dropbox and other online storage options, TappIn gives
you secure, remote access (instead of storage) to your data no matter where it is
saved. With TappIn, you don't need to plan ahead what photo or music file you
may want to access once you leave the house. You can access it all anytime,
from any mobile device.
o However, if you (or your employers) still choose to store stuff in the cloud or in
other online storage solutions, TappIn is a great complementary solution. It is
compatible with most popular cloud storage options like Dropbox and
SugarSync. So you can access what's in multiple clouds, even if you forget to
sync.
Performance and data transfer rates become key issues as the distance between the
data and the user increases - which is what happens in cloud computing.
Bandwidth limitations: Even unlimited bandwidth will not improve performance unless
the latency problem is solved, because it is the latency (or the chattiness)
of the protocols, plus speed-of-light limitations, that cause the end user
experience to be very poor. Not all data access patterns are well suited to the cloud,
particularly if there are large distances to cover. In such cases, bandwidth becomes
not only a challenge but a financial consideration. Bandwidth is a limiting factor when
accessing a public storage cloud, as it is accessed over the Internet. Primary
storage deduplication and compression minimize bandwidth consumption
dramatically while also improving performance.
Latency constraints: Latency is the silent killer of application performance, both in
terms of response time and throughput. StorSimple takes advantage of parallelization,
persistent connections, and TCP optimizations to overcome latency and improve
performance.
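To see why extra bandwidth alone does not fix a chatty protocol, a rough back-of-the-envelope model helps. The formula below is a simplifying assumption (total time = round trips x round-trip time + payload size / bandwidth), and the file size, link speeds, and round-trip counts are illustrative numbers, not measurements.

```python
def transfer_time_seconds(size_bytes, bandwidth_bps, rtt_seconds, round_trips):
    """Estimate transfer time as latency cost plus raw transmission time."""
    latency_cost = round_trips * rtt_seconds
    transmission_cost = (size_bytes * 8) / bandwidth_bps
    return latency_cost + transmission_cost


size = 10 * 1024 * 1024  # a 10 MiB file

# Chatty protocol: 200 request/response exchanges over a 100 ms WAN link.
chatty = transfer_time_seconds(size, 100e6, 0.1, 200)
# Same chatty protocol on a link with ten times the bandwidth.
chatty_fast_link = transfer_time_seconds(size, 1e9, 0.1, 200)
# Same original link, but requests batched into 2 round trips.
batched = transfer_time_seconds(size, 100e6, 0.1, 2)

print(f"chatty: {chatty:.1f} s")
print(f"chatty, 10x bandwidth: {chatty_fast_link:.1f} s")
print(f"batched: {batched:.1f} s")
```

Under these assumptions the chatty transfer takes about 20.8 seconds, and a tenfold bandwidth upgrade only brings it to about 20.1 seconds, because 20 of those seconds are spent waiting on round trips. Cutting the number of round trips brings it down to roughly one second on the original link.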
Manageability: Customers are concerned about being locked into a provider's proprietary cloud storage
infrastructure and application services. They don't have vendor-independent tools or
industry standards to evaluate the applicability or measure the effectiveness of cloud
storage for their environment.
Interoperability/Protocol translation: A serious concern today is that most of
today's on-premises applications use block protocols, whereas cloud storage
predominantly speaks file protocols, and both public and private
storage clouds are accessed via REST (HTTP-based) or SOAP APIs. Since these
applications expect block access to storage, introducing a cloud storage system to the
application is like trying to have a conversation in Spanish when you only speak
English.
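One way to picture the protocol translation problem is a gateway that presents fixed-size blocks to the application while storing each block as an object behind a put/get interface. The sketch below is an assumption-laden toy: `ObjectStore`, `BlockGateway`, the 4096-byte block size, and the key naming scheme are all invented for illustration and do not describe any specific product.

```python
BLOCK_SIZE = 4096  # bytes per block; an arbitrary choice for this sketch


class ObjectStore:
    """Stands in for a cloud store reachable through put/get calls."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects.get(key)


class BlockGateway:
    """Translates block reads and writes into object puts and gets."""

    def __init__(self, object_store, volume):
        self.store = object_store
        self.volume = volume

    def _key(self, block_number):
        # Each block becomes one object named after the volume and block.
        return f"{self.volume}/block-{block_number:08d}"

    def write_block(self, block_number, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block must be exactly BLOCK_SIZE bytes")
        self.store.put(self._key(block_number), data)

    def read_block(self, block_number):
        data = self.store.get(self._key(block_number))
        # Blocks that were never written read back as zeros, like a new disk.
        return data if data is not None else bytes(BLOCK_SIZE)


cloud = ObjectStore()
disk = BlockGateway(cloud, "vol1")
disk.write_block(7, b"x" * BLOCK_SIZE)
block = disk.read_block(7)
```

The application keeps speaking in block numbers, and the gateway does the translating, so neither side has to learn the other's language.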
Topic-4: Advantages
1. Universal Access
2. Collaboration
3. Scalability
4. Economical
5. Reliability
6. Cloud storage has several advantages over traditional data storage. For example, if
you store your data on a cloud storage system, you'll be able to get to that data from
any location that has Internet access. You wouldn't need to carry around a physical
storage device or use the same computer to save and retrieve your information. With
the right storage system, you could even allow other people to access the data, turning
a personal project into a collaborative effort.
7. Companies need only pay for the storage they actually use, typically an average of
consumption during a month. This does not mean that cloud storage is less
expensive, only that it incurs operating expenses rather than capital expenses.
8. Organizations can choose between off-premise and on-premise cloud storage options,
or a mixture of the two options, depending on relevant decision criteria that are
complementary to initial direct cost savings potential; for instance, continuity of
operations (COOP), disaster recovery (DR), security (PII, HIPAA, SARBOX, IA/CND),
and records retention laws, regulations, and policies.
9. Storage maintenance tasks, such as purchasing additional storage capacity, are
offloaded to the responsibility of a service provider.
10. Cloud storage provides users with immediate access to a broad range of resources and
applications hosted in the infrastructure of another organization via a web service
interface.
11. Cloud storage can be used for copying virtual machine images from the cloud to on-
premise locations or to import a virtual machine image from an on-premise location to
the cloud image library. In addition, cloud storage can be used to move virtual
machine images between user accounts or between data centers.[12]
12. Bandwidth: You can avoid emailing files to individuals and instead send a web link
to recipients through your email.
13. Disaster Recovery: It is highly recommended that businesses have an emergency
backup plan ready in the case of an emergency. Cloud storage can be used as a
backup plan by businesses by providing a second copy of important files. These files
are stored at a remote location and can be accessed through an internet connection.
14. Cost Savings: Businesses and organizations can often reduce annual operating costs
by using cloud storage; cloud storage costs about 3 cents per gigabyte to store data
internally. Users can see additional cost savings because it does not require internal
power to store information remotely.
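The pay-for-what-you-use point in item 7 can be made concrete with a small calculation. This sketch assumes billing on the average number of gigabytes stored over the month; the function name, the usage figures, and the price of $0.03 per GB-month are illustrative assumptions, and real pricing varies by provider.

```python
def monthly_storage_cost(daily_usage_gb, price_per_gb_month):
    """Bill on the average amount stored across the days of the month."""
    average_gb = sum(daily_usage_gb) / len(daily_usage_gb)
    return average_gb * price_per_gb_month


# 100 GB stored for the first 10 days, then 400 GB for the remaining 20 days.
usage = [100] * 10 + [400] * 20
cost = monthly_storage_cost(usage, 0.03)
print(f"average billed usage: {sum(usage) / len(usage):.0f} GB")
print(f"monthly cost: ${cost:.2f}")
```

With these numbers the average is 300 GB, so the bill is about $9.00 rather than the $12.00 that the 400 GB peak would cost for a full month. The charge tracks consumption, which is why it shows up as an operating expense instead of an up-front capital purchase.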
Business Requirements
For any organisation, any size, your key objectives will be:
A scalable storage solution with automated data placement to help you efficiently
deliver content and information services.
Reduce the need for third-party products by using built-in versioning, compression
and deduplication.
Mitigate risks in disaster recovery, provide long-term retention for records and
enhance both business continuity and availability.
Unified management platform to help reduce outages and storage management
labour demands and costs, and advanced data replication for cost-effective business
continuity and disaster recovery.
Your organisation requires higher levels of service for critical applications that cannot
be achieved in house.
Ensure high availability and enable customers to access their data whenever they
need it.
Topic-7: Standards
Amazon S3
Amazon S3 is storage for the Internet.
It is designed to make web-scale computing easier for developers. Amazon S3 provides
a simple web services interface that can be used to store and retrieve any amount of
data, at any time, from anywhere on the web.
It gives any developer access to the same highly scalable, reliable, secure, fast,
inexpensive infrastructure that Amazon uses to run its own global network of web
sites.
The service aims to maximize benefits of scale and to pass those benefits on to
developers.
Amazon S3 Functionality
Amazon S3 is intentionally built with a minimal feature set.
Write, read, and delete objects containing from 1 byte to 5 terabytes of data each. The
number of objects you can store is unlimited.
Each object is stored in a bucket and retrieved via a unique, developer-assigned key.
A bucket can be stored in one of several Regions. You can choose a Region to optimize
for latency, minimize costs, or address regulatory requirements.
Objects stored in a Region never leave the Region unless you transfer them out. For
example, objects stored in the EU (Ireland) Region never leave the EU.
Authentication mechanisms are provided to ensure that data is kept secure from
unauthorized access. Objects can be made private or public, and rights can be
granted to specific users.
Options for secure data upload/download and encryption of data at rest are provided
for additional data protection.
Uses standards-based REST and SOAP interfaces designed to work with any Internet-
development toolkit.
Built to be flexible so that protocol or functional layers can easily be added. The
default download protocol is HTTP. A BitTorrent protocol interface is provided to
lower costs for high-scale distribution.
Includes options for performing recurring and high volume deletions. For recurring
deletions, rules can be defined to remove sets of objects after a pre-defined time
period. For efficient one-time deletions, up to 1,000 objects can be deleted with a
single request.
Amazon S3 provides further protection via Versioning. You can use Versioning to
preserve, retrieve, and restore every version of every object stored in your Amazon S3
bucket.
This allows you to easily recover from both unintended user actions and application
failures.
By default, requests will retrieve the most recently written version.
Older versions of an object can be retrieved by specifying a version in the request.
Storage rates apply for every version stored.
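The bucket, key, and versioning behaviour described above can be modelled with a short in-memory sketch. This is not the Amazon S3 API: `VersionedBucket`, its methods, and the `v1`, `v2` style version identifiers are assumptions chosen only to mirror the semantics in the text (latest version by default, older versions on request, every version counted towards storage).

```python
import itertools


class VersionedBucket:
    """Toy model of a bucket that keeps every version of every object."""

    def __init__(self):
        self._versions = {}             # key -> list of (version id, data)
        self._ids = itertools.count(1)  # source of hypothetical version ids

    def put(self, key, data):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions[key]
        if version_id is None:
            return versions[-1][1]      # most recently written version
        for stored_id, data in versions:
            if stored_id == version_id:
                return data
        raise KeyError(version_id)

    def stored_bytes(self, key):
        # Every retained version counts towards the storage total.
        return sum(len(data) for _id, data in self._versions[key])


bucket = VersionedBucket()
first = bucket.put("report.txt", b"draft")
bucket.put("report.txt", b"final")
latest = bucket.get("report.txt")
original = bucket.get("report.txt", version_id=first)
total = bucket.stored_bytes("report.txt")
```

Overwriting `report.txt` does not destroy the draft: a plain read returns the final text, while asking for the first version id returns the draft. Both copies occupy space, which is why storage rates apply to every version.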
Google Drive
Collaboration: Users of Google Drive documents must have a Google Drive account.
All updates and editing by collaborators will be synced to Google Drive. For
documents that you have permission to access, you can receive notifications when
changes are made. You can share files with people by sending them a link to your file.
Mobile App Support: Google Drive has an Android app which gives you the ability to
share the files on your Android device using your Drive account. You can also share
any file from Drive with your phone contacts.
Storage: Google Drive offers 5GB of free storage.
Strengths: Has a built-in document editor, so programs such as Microsoft Word do
not need to be installed on the computer in order to edit documents. Allows comments
to be left on any files stored.
Weaknesses: Sharing is not as easy and intuitive as in Dropbox; you must use the Google
Drive web application to set it up. There is no ability to set preferences on syncing
speed.