
Dragon Slayer Consulting

Marc Staimer

Ending The Torment Of Storage Administration
Making Storage Systems Application Aware

WHITE PAPER

Making Storage Systems Application Aware

Ending The Torment of Storage Administration


Marc Staimer, President & CDS of Dragon Slayer Consulting

Introduction
Despite the proclamations of numerous storage vendors, storage systems are for the most part dumb and
self-absorbed. The assertion of the intelligent storage system is a matter of debate. Are intelligent
storage systems truly intelligent? It depends on the individual's perspective. Once data lands on a
storage system, that data can be manipulated, deduped, compressed, parity protected, moved, tiered,
snapshotted, replicated, made immutable (WORM), searched, analyzed, virtually represented, mirrored,
synchronized, repurposed, migrated to the cloud, digitally destroyed, and more. This is what most
storage vendors actually mean when they describe intelligent storage. In fact, storage systems today are
far more capable at slicing and dicing data than ever before. Yet, even with these extraordinary
capabilities, they have zero awareness of the applications creating the data. Storage systems don't know
what's coming from any given application workload, its value to the business, or how the data is
organized in a structured application until after it is stored. They don't know which applications have
priority and can't adapt to changing priorities, circumstances, or situations. There is no communication
between the applications creating the data and the storage systems storing it. That relationship is a
master-slave relationship in which the slave can only manipulate the data after it receives it.
So how do storage systems adapt to an increasingly volatile ecosystem where virtual machines and
containers are created in minutes; where mission-critical applications are mandated to deliver consistent
performance regardless of the workloads, while having no methodology to communicate those
requirements to the storage system; and where noisy virtual machine or container neighbors consume
storage resources and performance from more important applications? They don't. The onus is put back
on the storage administrator. In other words, the human is put in the middle. The storage administrator
must make up for that dearth of storage system application awareness with labor-intensive,
time-consuming sweat equity. This is an unsustainable formula, as the situation gets progressively worse,
not better.
This analyst report examines this problem in depth, as well as the common labor-intensive workarounds,
why they're unsustainable, and how the problem should be solved. It will then examine how the new
Oracle FS1 effectively and efficiently solves these problems.



Table of Contents

Introduction
Application Workloads And Structured Application Data Organization Problems
Workarounds
    Hybrid storage with storage tiering or caching
    All Flash Arrays (AFA)
    VMware with VSAN
First Steps in Eliminating The Storage Administrator's Torment
    Assessing Current and Prospective Storage Systems
    Select Storage Systems That Alleviate The Most Issues & Problems
Why The Oracle FS1 Is a Very Worthy Contender
    FS1 Quick Description
    How Oracle FS1 Solves Application Workload and Structured Data Problems
        Performance per Workload
        Data Protection
        Storage Processing IO Prioritization and IO Contention Management
        Scalability
        Compliance
        Elasticity
        Flexibility & Adaptability
        Efficiency
        Manageability
        Total Cost of Ownership (TCO)
Summary and Conclusion


Application Workloads And Structured Application Data Organization Problems


Application workload is the demand placed on the supporting storage infrastructure. That demand is
measured as the load on storage controller processing, memory, caching, IOPS, IO bandwidth, and
system throughput. Another way to look at it is the amount of storage resources the workload soaks up.
Application workloads are snowballing. The ubiquitous data center acceptance of virtual machines (VMs)
and containerization has made application workload creation incredibly simple. Spinning up a VM or a
container takes minutes. That is both a boon and a problem in today's data center. The boon comes from
rapid implementation of new applications.
The problem arises from the lengthy process of spinning up the supporting external storage resources.
It takes a lot longer, typically days or weeks, because storage spin-up involves many manual,
labor-intensive, time-consuming tasks that have to be performed for each and every application workload.
Those tasks require the storage administrator to have the smarts and expertise that's lacking in the
storage system. The resulting implementation, provisioning, changes, operations, and management of the
supporting storage for each workload take far too much time. Application owners and users are impatient,
needing to run their applications straightaway. Waiting for storage resources is terribly frustrating.
The vast majority of storage systems have simply not kept up with this rapid change in application
workload creation or their dynamic workload requirements.
A common application issue is what is known as the noisy neighbor. Whether the application is on a
physical or virtual server, the noisy neighbor is an application whose workload hogs or monopolizes
storage processing resources. Rare is the storage system that prioritizes storage system IO processing
based on the importance of the workload. Fixing the problem is not simple. It puts the onus on the
storage administrator to fix noisy neighbor problems by dedicating or isolating storage resources. Even
then, a complete fix is not necessarily possible.
Structured application data organization is different in that it is not directly tied to application
workload, but it is just as critical. It's how the data is laid out and organized by structured
applications or databases. This is a highly important aspect of relational databases: the manner in which
the data is organized directly correlates to the database's performance. Storage can help or hurt that
database performance. But since most storage systems have no inherent knowledge of, or automated ability
to learn, how that database data is organized, it is up to the storage administrator, database
administrator, and database application administrator to figure it out. Once determined, the different
parts of the database data must be manually placed in the best storage locations to provide the optimum
performance. That's quite a labor-intensive, time-consuming, and unfortunately inflexible process,
meaning changes are not made quickly or in real time. Change requirements are generally tracked,
accumulated, and applied in a batch at a more convenient time for the administrator.
A huge side effect of depending on manual processes for relational databases is that data growth is
highly exacerbated, because storage utilization is inefficient. That inefficiency has the added effect
of degrading performance and raising costs, none of which is in the best interests of the structured
applications so dependent on that performance.
Industry pundits have claimed these are people problems, not storage technology problems. They're
wrong. Throwing people or professional services at the problems does not solve them. People can
mitigate the problems to a limited point, but cannot fix them. Too many of the tasks must be done serially

and cannot be performed in parallel. Throwing people at the problems is a temporary salve at best, a
costly one, and ultimately unsustainable. The common underlying issue in both of these problems is that
the expertise and labor fall to the human administrators instead of the systems involved. Human primates
are clever problem solvers, and this has led to several workarounds.

Workarounds
The most common current workarounds to these problems are:
• Hybrid storage with storage tiering or caching
• All flash arrays with deduplication and compression
• VMware with VSAN

Hybrid storage with storage tiering or caching

Hybrid storage combines flash and HDDs in a single storage system. Hybrid storage is based on the
premise that higher value application workloads and data require faster, higher cost flash storage,
while lower value application workloads and data require lower performing, lower cost HDD storage.
Storage tiering moves data from higher performing flash storage to lower performing HDD storage based
on policies such as data age or time passed since last access. It can move the data back to the higher
tier if access frequency passes a set threshold. The first key shortcoming with the vast majority of
hybrid storage tiering systems is granularity. Data is moved in big chunks of 256MB or more. That coarse
granularity means data either stays too long in the higher tier or doesn't stay long enough. The former
requires significantly more costly flash storage (if possible) or reduces system and application
performance. The latter ends up moving the data more frequently, both downwards and back upwards,
reducing both storage system performance and flash wear-life.
Caching has two variants: write-through (a.k.a. read caching) and write-back (a.k.a. write caching).
Write-through caching writes directly to the lower performing HDDs and places data in the flash cache
based on frequency of access. It does nothing to accelerate writes. Unless the cache can pin data, only
frequently accessed reads are accelerated. For databases and structured applications this can be
devastating to performance. There's quite a bit of critical, low-access-frequency data, such as indexes,
that has an oversized impact on their performance. The inability to pin the right data in a write-through
cache limits its value. Granularity is another problem. Coarse granularity, just as with storage tiering,
means large swaths of data that may not be hot are kept in cache, or portions of data that really are
hot are moved out of cache. The result is more tasks, strain, and stress on the administrators as well
as on the flash itself.
Both storage tiering and caching are excellent concepts. However, their actual value is highly dependent
on their execution. Coarse granularity of data movement, as previously mentioned, significantly reduces
their value. So, though, does very fine granularity such as 4KB. That very fine granularity requires the
storage controller to track the heat index (value) of orders of magnitude more blocks of data, consuming
controller resources and making them unavailable for application workloads. It is analogous to utilizing
a 747-8 freighter to move a single overnight package. Too fine a granularity can lead to storage
controller thrashing, further reducing application workload performance.
The key to executing efficient hybrid storage is to utilize a granularity small enough to provide
efficiency but not so small that it causes inefficiency. Doing so is a non-trivial exercise that requires
application workload and structured application data knowledge and understanding.
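To make the trade-off concrete, here is a minimal Python sketch of per-chunk heat tracking; the chunk
size and promotion threshold are illustrative assumptions, not any vendor's values. Halving the chunk
size doubles the number of counters the controller must maintain.

    from collections import defaultdict

    CHUNK_SIZE = 640 * 1024  # hypothetical tiering granularity, in bytes

    class HeatMap:
        """Tracks access density per chunk; finer chunks = more counters."""
        def __init__(self, chunk_size=CHUNK_SIZE):
            self.chunk_size = chunk_size
            self.access_counts = defaultdict(int)  # chunk id -> IO count

        def record_io(self, byte_offset):
            self.access_counts[byte_offset // self.chunk_size] += 1

        def hot_chunks(self, threshold):
            # Chunks whose access density crosses the (assumed) promotion threshold
            return [c for c, n in self.access_counts.items() if n >= threshold]

For a 1TB volume, 640KB chunks mean roughly 1.7 million counters to maintain; 4KB chunks mean roughly
268 million, which is the controller-resource burden described above.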

All Flash Arrays (AFA)

The AFA utilizes only NAND flash storage drives; there are no HDDs. The concept behind the AFA is to
simplify storage performance by providing fast flash storage to everyone at a reasonable cost. Pulling
off this balancing act frequently requires lower cost multi-level cell (MLC) flash drives, which are
consumer grade; or Enterprise multi-level cell (eMLC) drives, which are MLC hardened for the Enterprise;
plus the inline data reduction technologies of deduplication and compression.


The eMLC and especially MLC flash drives cost a lot less than SLC (single level cell), have twice the
number of states per cell, and pack more capacity into the same footprint. But they also deliver
significantly lower performance as measured in latency, IOPS, and throughput, as well as a much shorter
wear-life.
Inline deduplication and compression add latency to every IO. It can't be helped. Deduplication requires
that incoming data be compared to data already stored to determine duplicate blocks. That takes time
(latency). The greater the amount of data already stored, the more time (latency) it takes. Reading the
data requires rehydrating it: each block recalled must be rehydrated, and rehydration takes time (more
latency). Compression operates under similar principles and adds more latency on both writes and reads.
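A simplified Python sketch of the inline deduplication write and read paths shows where that latency
comes from; this is illustrative, not any array's implementation.

    import hashlib

    class DedupStore:
        def __init__(self):
            self.blocks = {}   # fingerprint -> unique raw block
            self.volume = {}   # logical block address (LBA) -> fingerprint

        def write(self, lba, data):
            fp = hashlib.sha256(data).hexdigest()  # fingerprint every incoming block
            if fp not in self.blocks:              # lookup cost grows with stored data
                self.blocks[fp] = data             # store a unique block only once
            self.volume[lba] = fp                  # point the LBA at the shared block

        def read(self, lba):
            # "Rehydration": follow the indirection back to the raw block
            return self.blocks[self.volume[lba]]

Every write pays for the fingerprint computation and table lookup, and every read pays for the extra
indirection; applied per block, that is the added latency the paragraph describes.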
In addition, there is little to no distinction among application workloads and structured data
organization. This is a problem: it treats all application workloads the same, irrespective of an
application's criticality to the business or organization. Many AFAs operate under the principle of "if
you have a hammer, the whole world looks like a nail." The problem with that approach is the storage
system processes IO in FIFO (first in, first out) order and does nothing to prioritize that IO
processing. It means mission-critical IO frequently waits on non-mission-critical IO.
Not knowing or understanding the structured data organization makes it exceedingly difficult for the
deduplication and compression to return much in the way of data reduction. Many structured applications
already compress data before it reaches the AFA. If the data is already compressed there will be little
to no further data reduction. Not only is there no data reduction, but the attempt also wastes the AFA's
storage processing resources, because the array doesn't know not to attempt deduplication or compression
of non-dedupable or incompressible data. There is no communication or forewarning.

VMware with VSAN

VSAN is incorporated into the VMware vSphere hypervisor to provide storage virtualization. It's based on
a distributed object store and integrated with VMware's vSphere Storage Policy Based Management (SPBM).
VSAN is designed to deliver hypervisor-converged compute concurrently with storage infrastructure in a
single platform controlled by the vSphere administrator. A crucial aspect of VSAN is its leveraging of
flash caching to accelerate IO, combined with an across-the-cluster distribution algorithm that enhances
reliability and data protection.
The supposition behind VSAN is that the hypervisor knows what application workloads are doing, which have
priority, and what their needs are. This enables the VMware administrator to provision, set up, operate,
and manage the storage as part of the VM. No waiting. No frustration. And yet it's still much too
labor-intensive. It has shifted much of the expertise and responsibility from the storage administrator
to the VMware administrator. While this takes one person out of the loop and eliminates errors in
communication between administrators, it does not automate away application workload issues or data
organization issues. VSAN doesn't provide any data reduction or storage tiering.
The storage in VSAN is no more intelligent than external storage arrays. However, there is a limited
capability within vSphere 5.x and newer that gives high priority application workloads a form of quality
of service (QoS). This QoS enables preferential treatment for those high priority workloads, whose
service level objectives can be specified with a minimum latency or IOPS threshold. If performance drops
below that threshold, the hypervisor reorders the IO queue in favor of the higher prioritized VM
workloads until the threshold is again exceeded. It's a form of throttling that allows those high
priority VMs to perform better in a noisy neighbor mixed environment.
But throttling is not actually QoS, because it doesn't react in real time to prevent high priority VM
workload performance thresholds from being missed. It adjusts after the SLO has declined below
acceptable levels. Then the hypervisor takes performance away from lower priority VMs to give it to the
higher priority VMs until thresholds are met. A better description of this capability is IO strangulation
and reallocation. And it only works with vSphere VM application workloads within a specific cluster. It
doesn't work with physical servers, containers, other hypervisors, or other clouds.
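A rough Python sketch of this reactive behavior, with an assumed latency threshold (not VMware code),
shows why it is after-the-fact throttling rather than real-time QoS:

    SLO_LATENCY_MS = 5.0  # assumed latency threshold for high priority VMs

    def dispatch_order(io_queue, measured_latency_ms):
        """io_queue: list of (priority, request) tuples; priority 0 = highest."""
        if measured_latency_ms > SLO_LATENCY_MS:
            # SLO already violated: only now is the queue reordered, taking
            # performance away from lower priority VMs after the fact
            return sorted(io_queue, key=lambda io: io[0])
        return list(io_queue)  # otherwise, plain FIFO service

The reorder happens only once measured_latency_ms has already exceeded the threshold, i.e., after the
SLO has been missed.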

First Steps in Eliminating The Storage Administrator's Torment


It starts with the application workloads. Ascertain the priority of each and every current workload, and
establish procedures that require every new workload to be assigned a priority before it is brought live
and online. Next comes the evaluation of the external or internal storage system. As previously stated,
application workload variance is huge and diverse.
Assessing Current and Prospective Storage Systems
Before evaluating any storage system, it is essential that the evaluation include a series of probing,
comprehensive questions. Those questions should be asked and answered about both current storage systems
and any potential replacement systems. Doing so levels the playing field by providing a full picture of
what the application workload landscape looks like now and what it might look like later. It also
generates insight into how well the current and proposed storage ecosystems support each application
workload's service level objective (SLO) requirements.


Table 1: Questions To Be Answered

Performance per Workload
• Describe the process for determining how the application gets enough IOPS to meet user demand.
• What are the latency requirements per application workload IO, and what is the process for ensuring it meets response time (SLO)?
• What's the process for managing application workload throughput: meeting, measuring, and troubleshooting SLOs?
• How does the storage system adjust to meet application SLOs that are out of compliance?
• What's the storage system process to muzzle noisy neighbor application workloads? How are they isolated?
• Walk through the storage system details of how it manages application IO resource contention.

Data Protection
• What are each application workload's recovery point objectives (RPO), i.e., how much data can afford to be lost?
• What are each application workload's recovery time objectives (RTO), i.e., how fast does each application and its data need to be up and running operationally after a disaster or outage?
• What are the storage system processes to deliver each application workload's RPO and RTO requirements?
• Detail the storage system procedures for ensuring end-to-end data integrity that minimizes recovery events.

Storage Processing IO Prioritization and IO Contention Management
• Which application workloads have IO storage processing priority?
• What is the process for enforcing that prioritization?
• What is that prioritization for all the workloads?
• How are new application workloads added and implemented into the prioritization schedule?
• What's the process to ensure an application workload gets the storage processing priority it requires?

Scalability
• Describe the procedures for provisioning IOPS, throughput, capacity, and data protection for each type of application workload, breaking it down by mission critical, high, medium, and low.
• What's required to scale each of these automatically?
• If it can't be provisioned in real time on demand, how is it done?

Compliance
• What are the regulatory, legal, and corporate governance compliance requirements per application workload?
• What are the storage system processes for compliance management?

Elasticity
• What are the real-time storage system processes for altering application workload IOPS, throughput, and capacity?
• What are the processes for de-allocating resources back into the pool when demand drops off?

Flexibility & Adaptability
• Walk through the application workload storage change management processes.
• Detail their flexibility or lack thereof.
• Describe the process of how the storage system learns and adapts to application workload behavior.
• What's the proactive process for storage adjustment to application workload changes?

Efficiency
• What's the storage system optimization and capacity minimization process for all application workloads?
• How does the storage system know or see into structured application data organization?

Manageability
• What are the steps per storage system required to manage each application workload?
• What level of expertise is expected and required by the storage system to deliver viable management?

Total Cost of Ownership (TCO)
• What's the process for calculating the storage system's TCO?
NOTE: It should include all of the following:
  - Upfront hardware costs
  - Installation and implementation costs
  - Hardware maintenance costs (above and beyond the warranty)
  - Hardware growth costs
  - Supporting infrastructure costs (rack space, floor tile space, cables, transceivers, battery backup, etc.)
  - Software licensing costs (perpetual + maintenance, or subscription based)
  - Software licensing growth costs
  - Annual power & cooling costs
  - Administration time costs (people hours) & professional services costs (upfront and ongoing)
  - Upfront data migration costs
  - Tech refresh cycle costs


Select Storage Systems That Alleviate The Most Issues & Problems
Implement the storage system or systems that end up at the top of the list for performance, data
protection, storage processing prioritization, noisy neighbor mitigation, scalability, compliance,
elasticity, flexibility, efficiency, manageability, and TCO. If more than one qualifies, develop a live
bakeoff that closely matches the production environment. Select the system or systems that come out on
top.

Why The Oracle FS1 Series (Flash Storage Series) Is a Very Worthy Contender
The Oracle FS1 is the newest application-engineered SAN storage from Oracle. It is specifically designed
to solve the application workload and structured application data organization issues and problems.
Oracle leveraged what it learned delivering the production-hardened, application-engineered storage of
Exadata, Exalogic, SuperCluster, the ZFS Storage Appliance, and Pillar Axiom storage to cultivate the FS1
as a new generation of application-engineered flash SAN storage.
FS1 Quick Description
The Oracle FS1 is available as both an all-flash array (AFA) and a hybrid array. It was architected from
the outset around flash and flash characteristics as its primary storage media, with HDDs as a secondary
design priority. Each Oracle FS1 can be configured with 100% flash drives or a mix of flash and HDDs. It
is a unified storage system that emphasizes its SAN heritage, with Fibre Channel and Ethernet-iSCSI as
well as NFS and CIFS. The FS1 delivers complete flexibility in a shared, secure, multi-tenant storage
environment by recognizing that applications are all different and autonomically adapting to their
differences heuristically (i.e., it learns and adjusts).
That is a bare bones description. All of the more detailed specifications, videos, data sheets,
testimonials, and brochures can be found on the Oracle FS1 website:
https://www.oracle.com/storage/san/fs1/index.html. There is no need to restate that Oracle information.
It is more important to answer the question of how well the Oracle FS1 solves the application workload
and structured data problems described previously in this document.
How Oracle FS1 Solves Application Workload and Structured Data Problems

Performance per Workload

All AFAs and many hybrid storage systems make extraordinary claims about performance, typically touting
hundreds of thousands to millions of IOPS and GBps of throughput. The claims are not necessarily false,
but they are always best-case scenarios and do not specifically address how that performance is optimized
for each application workload. Performance is a lot more than IOPS and throughput. It's also pathing, IO
bandwidth into and out of the storage system, IO contention management of resources, IO prioritization,
performance aggregation, and more.
The Oracle FS1 attacks all of these issues simultaneously. Sure, it specs out into the millions of IOPS
and closes in on TBps throughput. But much more important is the way it specifically handles application
workload performance. Each Oracle FS1 HA pair can deploy up to 64 Storage Domains (FS1 virtual machines).
Each Storage Domain can be tailored to one or multiple application workloads. A Storage Domain is made up
of multiple RAID groups from within multiple drive enclosures (DE). In AFA mode, a Storage Domain can be
a mix of, or exclusively, high performance SLC flash RAID groups and/or lower performance, high capacity
eMLC flash RAID groups. In hybrid mode, the mix can combine one or both flash media types in addition to
high performance 10K RPM HDDs as well as lower performing, high capacity 7,200 RPM HDDs. In fact, within
the hybrid version of the FS1, a Storage Domain can be configured with just SLC flash media as a very
high performing AFA, providing the best of both worlds.
Each Storage Domain is like a unique storage system. No other workload, data, or data service can cross
the domain boundaries, effectively soundproofing and isolating workloads from the noisy neighbor and one
another (efficient and secure multi-tenancy). Performance is modifiable on the fly by changing the makeup
of the Storage Domain or moving application workloads to different Storage Domains.

Storage Domains are the foundation for FS1 performance management and are used extensively by the FS1
QoS Plus management framework. QoS Plus is specifically architected to get the most possible value out of
multiple tiers of flash in an AFA or hybrid configuration. QoS Plus is autonomous, fully automated
tiering software that adapts data movement to the most cost-effective $/IOPS and $/GB tier based on
business value. Doing this effectively requires highly intelligent software optimized for multiple flash
tiers and multiple HDD tiers, which is exactly what the FS1 QoS Plus software is.
FS1 QoS Plus utilizes a mix of common storage tiering considerations, such as data access frequency,
together with the application data's business value bias (discounting non-mission-critical data such as
the latest pop songs, non-business pictures, etc.), and factors in Storage Domain RAID levels to
intelligently account for read/write ratios.[1] The FS1 divides all of the data into 640KB blocks to
build a heat map, or index, that determines which data is hot and which is not, plus the IO type: read,
write, random, or sequential. This is what Oracle calls the access profile. The heat map is a numerical
measure of access density: the number of IOs within a defined block of data over a defined block of time.
FS1 QoS Plus factors in what is important to the business, not just what is hot. Simply put, the FS1 QoS
determines whether high IO rates are indicative of business value or merely a reflection of some Twitter
feed. Ultimately, it factors in both of these conditions before it makes the tiering data migration
decision. Independently of access density, the FS1 asks the question: is this data of high business
value? If yes, the FS1 will actively promote it and tend to keep it on a premium tier. If no, the FS1
will seek to demote it, because it's not important, even if everybody is accessing the latest
embarrassing photo on the corporate server. In other words, the FS1 makes sure mission-critical
application workloads always get the resources they require to meet their SLOs. It is not throttling as
previously described; it is aligning application workloads with business value.
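A conceptual Python sketch of such a decision, combining access density with a business value bias; the
scoring and thresholds here are invented for illustration and are not Oracle's algorithm:

    BLOCK_SIZE = 640 * 1024  # QoS Plus tracks data in 640KB blocks

    TIERS = ["SLC_flash", "eMLC_flash", "10K_HDD", "7200RPM_HDD"]  # premium -> capacity

    def target_tier(access_density, business_value):
        """access_density: IOs per block per interval (the heat map value);
        business_value: 0.0-1.0 per-workload bias (assumed scale)."""
        if business_value < 0.2:
            # Hot but unimportant (the "Twitter feed" case): demote anyway
            return TIERS[-1]
        score = access_density * business_value  # hot AND important rises
        if score > 1000:
            return TIERS[0]
        if score > 100:
            return TIERS[1]
        if score > 10:
            return TIERS[2]
        return TIERS[3]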
The FS1 additionally manages path performance via its Flash Storage Path Manager (FSPM), which it
provides at no additional license cost. FSPM works with most major operating systems and hypervisors.
FSPM automates host configuration for Oracle FS1 storage services, updates configuration information if
host information changes, and ensures optimal paths are utilized per the FS1's ALUA design. Pathing
performance bottlenecks are eliminated by load balancing application workloads across all paths, reducing
failovers and pathing hot spots. By supporting WWN friendly names and providing real-time telemetry on
any host IO pathing issues, FSPM simplifies the configuration process.

Data Protection

The FS1 has multiple layers of built-in data protection. The first is end-to-end error detection and
correction based on ANSI T10-PI (protection information).[2] Oracle Automatic Storage Management (ASM)
uses T10-PI to insert an 8-byte metadata tag on each 512-byte block, eliminating silent data corruption.
The data is validated first by the host operating system, then by the host HBA, and finally by the Oracle
FS1, ensuring protection along the entire IO path from the application to the drive itself.
[1] See more under the elasticity and adaptability sections.
[2] Previously known as T10-DIF (data integrity field).
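An illustrative sketch of per-sector protection information in the T10-PI style: the 8-byte tag is a
2-byte guard, 2-byte application tag, and 4-byte reference tag. A generic checksum stands in here for the
spec's CRC-16 polynomial; this is not the FS1's implementation.

    import struct
    import zlib

    SECTOR = 512

    def protect(sector: bytes, lba: int) -> bytes:
        guard = zlib.crc32(sector) & 0xFFFF  # stand-in for the T10 CRC-16 guard
        app_tag = 0                          # application-defined tag
        ref_tag = lba & 0xFFFFFFFF           # reference tag catches misplaced writes
        return sector + struct.pack(">HHI", guard, app_tag, ref_tag)

    def verify(protected: bytes, lba: int) -> bool:
        sector, pi = protected[:SECTOR], protected[SECTOR:]
        guard, _, ref_tag = struct.unpack(">HHI", pi)
        return guard == (zlib.crc32(sector) & 0xFFFF) and ref_tag == (lba & 0xFFFFFFFF)

Because each element in the path (host OS, HBA, array) can recompute and check the tag, a corrupted or
misdirected block is caught before it silently lands on disk.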


Even the FS1 system SDRAM cache is protected from data loss. Oracle designed hybrid DIMMs with half the
DIMM populated with DRAM and half with non-volatile flash. Logic circuits embedded on each NVDIMM copy
DRAM data to persistent flash should there be a power loss. SuperCaps (super capacitors) packaged in a
2.5" form factor enclosure provide the power to perform this copy function. When power is restored, the
reverse takes place: data is restored to SDRAM and flushed to disk. Unlike batteries, SuperCap charge
time is almost immediate, so the FS1 does not have to remain in write-through mode while batteries
recharge.
The next layer of protection comes from the Oracle FS1 copy services suite. That suite consists of SAN
and NAS clones (writable snapshots), NAS snapshot capability, and full volume copies. The fundamental
difference between a clone and a snapshot, beyond read/write vs. read-only, is that the snapshot exists
within the primary file system space, while the clone is a new, sparsely populated file system. A SAN
(block) clone exists as a new LUN. It's individually mounted, just as a NAS clone is a separate file
system and would be treated as a new share. Full volume copies are always on new LUNs. Both snaps and
clones are space-efficient by design and are created virtually instantly. No pre-reserved space is
required. The drawback is their dependence on the existence of the primary volume containing all
unchanged data. The clone or snap contains only the original copies of blocks that have changed since the
point in time (PIT) when it was created; the updated blocks are in the primary volume. In this way,
clones and snaps provide very fast rollback to any given point in time. A volume can have up to 256
clones. FS1 full volume copies don't have a copy-on-write performance penalty. FS1 snapshots enable very
low RPOs (near-CDP) and very fast RTOs for each and every application workload.
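A minimal Python sketch of the copy-on-write behavior just described (illustrative, not the FS1 on-disk
format): the snapshot starts empty, captures only the original version of blocks that change, and reads
fall through to the primary volume for unchanged data.

    class Volume:
        def __init__(self):
            self.blocks = {}     # block number -> current data (primary volume)
            self.snapshots = []  # each snapshot: {block: original data}

        def snapshot(self):
            self.snapshots.append({})       # space-efficient: starts empty
            return len(self.snapshots) - 1  # snapshot id

        def write(self, blk, data):
            for snap in self.snapshots:
                if blk not in snap:          # preserve the original once per snap
                    snap[blk] = self.blocks.get(blk)
            self.blocks[blk] = data          # updated block lives in the primary

        def read_snapshot(self, snap_id, blk):
            snap = self.snapshots[snap_id]   # fast rollback: original if changed,
            return snap.get(blk, self.blocks.get(blk))  # else the primary's copy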
To create an application-consistent recovery, the FS1 also provides Data Protection Manager (DPM). DPM
puts a light agent on each structured application server that communicates directly with FS1 copy
services. FS1 copy services have the agent quiesce the structured application or database and put it into
a consistent state (often referred to as hot backup mode). Only then do copy services create a clone,
after which the agent resumes normal operations. Storage snapshot services normally have no application
awareness, which means the structured application or database snapshot is crash-consistent, not
application-consistent. An application or database that is only crash-consistent can easily be corrupted,
and it takes longer to bring back to an application-consistent state (using journaling) during a
recovery. That devastates the application workload's RTO requirements. FS1 DPM application consistency,
combined with the FS1 recovery services suite, means nearly instantaneous recoveries for all application
workloads. Most FS1 data protection software services are included in the FS1 base price.
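The quiesce-clone-resume sequence reduces to a short orchestration loop; the agent and copy-services
calls below are hypothetical names standing in for the DPM interfaces, which this paper does not detail:

    def app_consistent_clone(agent, copy_services, volume_id):
        agent.quiesce()  # flush buffers; enter hot backup mode (application-consistent)
        try:
            clone_id = copy_services.create_clone(volume_id)  # near-instant PIT clone
        finally:
            agent.resume()  # always resume normal operations, even on failure
        return clone_id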
FS1 replication is turbocharged via the Oracle MaxRep replication offload engine (optional and priced
separately). MaxRep provides synchronous replication, asynchronous replication, multi-hop replication,
encryption, compression, and application-consistent rollback, and it offloads remote-location IO
overhead, freeing the FS1's resources to focus on host IO. This separation enables host IO performance to
be unaffected even when a lot of data is being compressed, encrypted, and replicated.
Finally, the FS1 is architected for Enterprise-grade RAS (reliability, availability, and serviceability),
meaning the FS1 system aims at minimizing or even eliminating application workload downtime for any
reason. Disruptions are never a good thing. Therefore, microcode patches, fixes, and upgrades are always
operationally non-disruptive, and for hardware there is never any single point of failure (SPOF).

Storage Processing IO Prioritization and IO Contention Management

Each FS1 has built-in intelligent IO management that manages IOs based on business priorities. Each
application workload is designated as premium, high, medium, low, or archive priority. The FS1 orders IO
resource utilization according to those designations. So if a low or medium IO comes in just ahead of a
high or premium IO, the FS1 will change the order to address the premium, then high, IO requests first.


In this manner, the FS1 eliminates FIFO storage system resource contention and noisy neighbor IO hogs.
Prioritization designations are adjustable on the fly by the storage administrator when an application
workload's priority changes, such as during a short-term marketing promotion.
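A toy Python sketch of priority-ordered dispatch replacing FIFO; the five priority levels come from the
paper, while the scheduler itself is illustrative, not the FS1's internal design:

    import heapq
    import itertools

    PRIORITY = {"premium": 0, "high": 1, "medium": 2, "low": 3, "archive": 4}

    class IOScheduler:
        def __init__(self):
            self._heap = []
            self._seq = itertools.count()  # tiebreaker keeps FIFO within a level

        def submit(self, workload_priority, request):
            heapq.heappush(self._heap,
                           (PRIORITY[workload_priority], next(self._seq), request))

        def next_io(self):
            # A premium IO submitted just after a low IO is still dispatched first
            return heapq.heappop(self._heap)[2]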

Scalability

The FS1 scales up to 912TB of flash, or 2.9PB of combined flash and HDDs, per HA (high availability)
pair. The FS1 scales out to 8 HA pairs clustered via RDMA (remote direct memory access) over FDR
(56Gbps) InfiniBand. This enables as much as 7.296PB of all flash, or as much as 23.2PB of combined flash
and HDDs. Big data scale is not an issue for the FS1, and it scales online in real time. QoS Plus enables
Storage Domains to be elastically modified without downtime. Volumes can be thin or fat provisioned. Thin
volumes automatically apply more capacity based on thresholds, and the data copy services enable fat
volumes to be easily converted to thin. The FS1 provides easy scalability solutions to all matters of
scale.
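A simple Python sketch of threshold-based thin provisioning as described; the growth step and 80%
threshold are illustrative assumptions, not FS1 defaults:

    class ThinVolume:
        def __init__(self, logical_gb, grow_step_gb=100, threshold=0.8):
            self.logical_gb = logical_gb      # size presented to the host
            self.allocated_gb = grow_step_gb  # physical capacity actually backing it
            self.used_gb = 0.0
            self.grow_step_gb = grow_step_gb
            self.threshold = threshold        # grow when 80% of allocation is used

        def write(self, gb):
            self.used_gb += gb
            # Allocate more physical capacity from the pool on demand
            while (self.used_gb > self.allocated_gb * self.threshold
                   and self.allocated_gb < self.logical_gb):
                self.allocated_gb = min(self.allocated_gb + self.grow_step_gb,
                                        self.logical_gb)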

Compliance

The issue of compliance is frequently, as the British say, a sticky wicket; for the non-Brits, that means
a tricky problem to solve. Regulatory agencies regularly demand proof that compliance data has not been
manipulated or deleted for a specified period of time. Oracle SecureWORMfs is a secure FS1 capability
that solves those compliance requirements.
In Compliance mode, nothing can be altered after the file(s) in question have been made WORM. Standard
mode allows only specific administrators to modify file retention periods, ACLs, etc. The immutability
and retention periods can be invoked in several ways.
• By simply closing a file, a default retention period is applied. Retention defaults are set up on a
per-directory basis, and any sub-directories inherit the retention settings.
• The FS1 supports the NetApp trigger (see the sketch below). A specific application, such as Symantec
Enterprise Vault, chmods the file to 444 (read-only) and sets a future date via the touch command
(touch -a -t YYMMDDHHSS), which is when the retention period ends. Most email archiving and indexing
software supports the NetApp trigger.
Oracle FS1 SecureWORMfs enables a very flexible foundation for a long-term compliance archive.
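A sketch of the NetApp-style trigger from the archiving application's side, in Python; this is
illustrative, and the exact semantics on a SecureWORMfs share are defined by the FS1, not by this code:

    import os
    import time

    def make_worm(path, retention_end_epoch):
        # chmod 444: marking the file read-only commits it to WORM
        os.chmod(path, 0o444)
        # Equivalent of `touch -a -t YYMMDDHHSS`: the access time encodes
        # the date when the retention period ends
        st = os.stat(path)
        os.utime(path, (retention_end_epoch, st.st_mtime))

    # e.g., retain a message until 2030-01-01 (hypothetical path):
    # make_worm("/mnt/fs1_share/archive/msg001.eml",
    #           time.mktime((2030, 1, 1, 0, 0, 0, 0, 0, -1)))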

Elasticity

What has become quite obvious at this point is that the FS1 is a highly automated storage system. It
constantly monitors application workload IO activity in real time and adjusts to it. For example, the FS1
actively monitors random vs. sequential IO activity and read vs. write activity. That information enables
the FS1 to adapt immediately to application workload dynamics. In the case of unexpected sustained high
write activity, the FS1 may change RAID 5 or 6 to RAID 10, or vice versa. Sequential data, even with very
high bandwidth requirements, can be satisfied by either flash SSDs or high capacity HDDs, so that too is
factored into any QoS Plus migration decisions. The FS1 migrates data only when there is real value in a
migration, and it continually optimizes RAID types and adjusts them as access patterns change.
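A conceptual sketch of choosing a RAID layout from the observed access profile, in the spirit of that
adaptation; the thresholds are invented for illustration:

    def choose_raid(write_ratio, sequential_ratio):
        """write_ratio, sequential_ratio: fractions of recent IO, 0.0-1.0."""
        if write_ratio > 0.5:
            return "RAID 10"  # mirroring avoids the parity write penalty
        if sequential_ratio > 0.8:
            return "RAID 6"   # large sequential IO amortizes the parity cost
        return "RAID 5"       # mixed or read-heavy random workloads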

As previously mentioned, FS1 QoS Plus enables the movement of workloads in real time from a lower
performing Storage Domain to a higher one, or vice versa; thin provisioning dynamically enables volumes
to be expanded as required; and the system can be scaled out to extremely large capacities online. The
FS1 is a highly automated, elastic storage system.
Flexibility & Adaptability

As previously described, many of the FS1 system changes are automated, based on administrator-established
policies, eliminating the majority of administrator expertise, intervention, scrambling, or scheduled
downtime. The FS1's ability to adapt to dynamically changing application workload requirements is
unprecedented.
Take the example of QoS Plus. There are three phases in the QoS Plus adaptive auto-tiering process:
collect, evaluate, move.
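Schematically, that three-phase loop looks like the following; the phase names come from the paper, while
the storage interface (access_stats, migrate) is a hypothetical stand-in:

    def auto_tier_cycle(storage):
        # Phase 1, collect: gather per-640KB-block access profiles
        stats = storage.access_stats()
        # Phase 2, evaluate: score heat and business value; pick worthwhile moves
        moves = [(blk, s.best_tier) for blk, s in stats.items()
                 if s.best_tier != s.current_tier]
        # Phase 3, move: migrate online, only where there is real value
        for blk, tier in moves:
            storage.migrate(blk, tier)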

Efficiency

The FS1 demonstrates efficiency in two key areas. The first is QoS Plus, described in considerable detail
earlier. The movement of data in fine-grained 640KB blocks minimizes the amount of data in the most
expensive, higher performance tiers. The heat maps are also utilized to increase Oracle Database 11g
and/or 12c performance efficiency by moving colder, less important data to lower performing, lower cost
flash or HDD tiers.
Where the FS1 efficiencies truly stand out is when it is paired with Oracle Database 11g and/or 12c
workloads. The FS1 and the Oracle Database 11g and/or 12c are engineered together to take advantage of
each other's capabilities. The Oracle Database, for example, utilizes Hybrid Columnar Compression (HCC)
to reduce the size of its database anywhere from 10x to 50x when paired with an Oracle FS1 system. Since
the Oracle Database is controlling the HCC process, it does not require the FS1 to reduce the data
footprint, nor does it require the FS1 to rehydrate (uncompress) the data in order to read it. The result
is that the FS1 neither adds latency to the process nor slows down database response time. Conversely,
every other non-Oracle storage system does in fact add significant data reduction latency and does slow
Oracle Database 11g and/or 12c response times. The FS1 additionally provides up to 25x greater data
footprint reduction for Oracle Database 11g and/or 12c when compared to non-Oracle storage systems.
Here's why:

• As just discussed, array-based inline or post-processing deduplication and compression noticeably
reduce Oracle Database 11g and/or 12c performance because of the additional IO latency. That latency
results from the processing required to deduplicate and/or compress when writing or reading data, and it
increases with stored data growth. But the real irony comes from the fact that array-based inline
deduplication delivers very little actual Oracle Database 11g and/or 12c data footprint reduction,
despite storage vendor claims. Limited footprint reduction means limited storage savings. This is because
each Oracle Database 8K block has a high probability of being unique: the block headers are definitely
unique, and the data portion of the block is highly likely to be so as well. Unique blocks do not
deduplicate.

• Oracle Database 11g and/or 12c compression, on the other hand, is engineered into the code. When the
Oracle Database 11g and/or 12c enables compression, the data is natively compressed in the buffer cache
(in memory) and written to storage in the compressed format. It is later read in the compressed format
and stays compressed in the buffer cache. Individual elements within the compressed block may be
decompressed for a given query; however, quite often the queries can operate directly on the compressed
formats. This means that more user data can fit into a given amount of physical memory, and performance
for both analytics and OLTP workloads improves: analytics because fewer blocks need to be read when
scanning large data sets, and OLTP because the cache-hit ratio increases with the increased row density
in the buffer cache.

Manageability

The high degree of automation in the FS1 raises expectations about its management, and FS1 manageability
does not disappoint. It too is highly automated, in the form of the FS1 System Manager. Of course it has
the highly intuitive interface and the command line interface that are table stakes today. However, it is
the built-in application profiles that greatly simplify FS1 implementation, operations, management, and
troubleshooting. Those application profiles take the process of tuning and deploying storage for
application workloads and databases from many hours or days down to a single click.
Each of the one-click predefined application profiles is pre-tuned and tested to run out of the box
optimized for the FS1. The profiles include Enterprise applications such as Oracle Database 11g and/or
12c, Oracle Fusion Middleware, Oracle E-Business Suite, Oracle JD Edwards, Oracle Siebel, Oracle
PeopleSoft, Microsoft Exchange, and many others; there are 70 packaged application profiles in total.
These one-click profiles optimize provisioning and FS1 flash performance while minimizing administrator
expertise and tasks, upfront and ongoing. Take the example of the Oracle Database 11g and/or 12c storage
profile: it can disaggregate database components such as index files, database tables, archive logs, redo
logs, control files, block change tracking files, and temp files, so provisioning automatically optimizes
Oracle Database 11g and/or 12c performance without requiring any detailed knowledge of the database
components. FS1 administrators can add their own application profiles or easily modify existing ones. All
profiles are exportable to other Oracle FS1 systems, standardizing and simplifying storage provisioning
across multiple geographically dispersed FS1s.
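A sketch of what such a profile might capture, with hypothetical fields inferred from the description
above (not the FS1's actual profile schema): per-component placement, RAID, and priority, applied in one
step.

    ORACLE_DB_PROFILE = {
        "redo logs":    {"tier": "SLC_flash",   "raid": "RAID 10", "priority": "premium"},
        "index files":  {"tier": "SLC_flash",   "raid": "RAID 5",  "priority": "high"},
        "tables":       {"tier": "eMLC_flash",  "raid": "RAID 5",  "priority": "high"},
        "temp files":   {"tier": "eMLC_flash",  "raid": "RAID 10", "priority": "medium"},
        "archive logs": {"tier": "7200RPM_HDD", "raid": "RAID 6",  "priority": "low"},
    }

    def apply_profile(array, volume_group, profile=ORACLE_DB_PROFILE):
        # "One click": provision every component with its tuned settings
        for component, settings in profile.items():
            array.provision(volume_group, component, **settings)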
The FS1 also has an Oracle Enterprise Manager plug-in. This plug-in empowers the Oracle Database 11g
and/or 12c administrator to be in complete control of their FS1 storage systems. The DBA can provision,
configure, and manage FS1 storage resources; manage LUNs, Storage Domains, protection schedules, host
masking, etc.; and monitor and communicate with the FS1, all while maintaining the highest level of
security with a private XML-RPC interface and the HTTPS protocol.
And for those on the go, the FS1 can be managed from anywhere, anytime with iOS or Android devices, using
apps available from the Apple App Store and Google Play.
Oracle FS1 manageability simplifies and highly automates application workload management.

Total Cost of Ownership (TCO)

The FS1 does several clever and innovative things to reduce TCO. As previously discussed, QoS Plus and
the application profiles minimize use of the higher cost tiers, even in an all-flash system, reducing
both CapEx and OpEx. Hybrid Columnar Compression greatly reduces the storage capacity footprint required
for Oracle Database 11g and/or 12c, further reducing upfront and ongoing costs.
But it is the zero license cost for all in-system software that is the true cost savings. Most FS1
software is included in the storage system base price regardless of capacity, number of nodes, or
performance. Since most Enterprise software licenses are tied in some fashion to capacity, they continue
to escalate whether the license is perpetual-plus-maintenance or subscription. When there is a tech
refresh, those software licenses commonly have to be repurchased (in the perpetual-plus-maintenance
model). Oracle charges virtually zero for the equivalent software licenses, which radically reduces the
TCO over the life of the storage system.
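Illustrative arithmetic only (all numbers invented, not Oracle or competitor pricing) for how
capacity-tied licensing compounds over a system's life versus software bundled in the base price:

    def capacity_license_tco(capacity_tb, growth_rate, per_tb_per_year, years,
                             refresh_every=5):
        total, cap = 0.0, float(capacity_tb)
        for year in range(1, years + 1):
            cap *= (1 + growth_rate)            # licensed capacity grows with data
            total += cap * per_tb_per_year      # annual capacity-based license fee
            if year % refresh_every == 0:
                total += cap * per_tb_per_year  # repurchase at tech refresh
        return total

    # 100TB growing 30%/year at a hypothetical $500/TB/year over 6 years:
    # capacity_license_tco(100, 0.30, 500, 6) -> roughly $1.0M that a
    # bundled, capacity-independent license model avoids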
One other way Oracle reduces TCO is at implementation. FS1 storage systems come pre-racked, cabled,
configured, and ready to power on as a turnkey Enterprise-grade storage system, minimizing or eliminating
most if not all of the common mistakes made during implementation. Time-to-service is reduced by orders
of magnitude, saving time, saving people, and increasing productivity.


Overall, the Oracle FS1 TCO should be better than that of the usual suspects, even when those suspects
are heavily discounted. Longer TCO timeframes should produce a greater TCO advantage for the FS1.

Summary and Conclusion

Application workloads and storage are much too dependent on administrator expertise and manual,
labor-intensive management. As data continues to grow and application workloads continue to increase at a
geometric rate, storage administrators cannot effectively keep pace. The current situation is
unsustainable and quickly becoming intolerable. General workarounds such as AFAs, hybrid storage systems,
and VMware VSAN are merely partial fixes; they are incomplete and do not fully address the problem.
Solving this problem requires much smarter, far more automated storage that empowers cooperation with the
applications and their workloads.
The Oracle FS1 Series takes an innovative approach to application workload and storage system
cooperation. It solves the problems other systems fail to recognize by being application aware.

For More Oracle FS1 Information


Go to: https://www.oracle.com/storage/san/fs1/index.html
Paper sponsored by Oracle. About the author: Marc Staimer is the founder, senior analyst, and CDS of
Dragon Slayer Consulting in Beaverton, OR. The 16-year-old consulting practice focuses on strategic
planning, product development, and market development. With over 34 years of marketing, sales, and
business experience in infrastructure, storage, servers, software, databases, and virtualization, he's
considered one of the industry's leading experts. Marc can be reached at marcstaimer@me.com.

Dragon Slayer Consulting Q4 2014

