
The Guide To Selecting Flash for Virtual Environments

Storage Switzerland, LLC

George Crump, Lead Analyst, Storage Switzerland


High performance flash-based storage has dramatically improved the storage infrastructure's ability to respond to the demands of servers and the applications that count on it. Nowhere does this improvement have more potential than in the virtualized server environment. The performance benefits of flash are so great that gains can be seen even when it is deployed indiscriminately. But doing so may not allow the environment to take full advantage of flash performance, may result in a much more expensive deployment model, and can put data at risk. Modern data centers need to understand which forms of flash and which deployment models will show the greatest return on investment without risking any data.

The Value of Flash for Virtual Servers


Flash storage allows for a higher number of virtual machines (VMs) per host. Increasing VM density reduces the number of physical servers required and thereby cuts one of the largest ongoing costs: buying more physical servers, which are often configured with multiple processors and extra DRAM. At the same time, the high performance and low latency of flash allow more mission critical applications to be virtualized. Finally, storage and hypervisor vendors are getting smarter about leveraging this faster form of storage by including storage quality of service (QoS) intelligence in their systems or environments.

To see the full return on the flash investment, server administrators will need to dramatically increase the total number of VMs per physical server and start virtualizing mission critical workloads. It also means investing more in the server internals as well as the network that connects servers to servers and servers to storage. Administrators should consider at least a 30:1 VM-to-host ratio, even if those VMs are CPU and storage resource intensive.

Flash Form Factors


First
Flash is available in four form factors, the most common of which is
the solid state disk drive (SSD), which is flash memory containerized
in a package similar to a hard disk drive. SSDs are popular both in
servers and arrays because they can leverage the existing investment
in hard drive bays and shelves. They are generally the least expensive
form of flash that can be purchased. On the downside, SSDs are not
necessarily the most efficient form of flash. They require access
through the standard SCSI storage protocol stack, and the form
factor requirements limit their density.

Second
The second form is a custom form factor that fully leverages the
reality that flash is memory first and storage second. This approach,
available in some arrays, requires a custom-designed board and
interface. These systems allow for maximum performance and
maximum density. But they are proprietary and the customer is fully
dependent on the vendor to keep pace with flash memory
technology.

Third
The third form is PCIe flash, which is flash storage placed on a PCIe
board that can be installed in a server or storage system, although
the most common by far is installation in a server. PCIe flash also
takes two forms. The first is native PCIe flash, which provides native
access to the PCIe bus and does not need to go through the SCSI
storage stack. Doing so reduces latency and delivers greater
performance, but this form of PCIe flash requires specific drivers in
order to be accessed. Most PCIe flash vendors provide VMware,
Windows and Linux support, so the driver concern should not be an
issue for most virtualized environments.
The other form of PCIe flash is better described as a PCIe SSD, which means the PCIe board essentially has an SSD mounted on it along
with a storage controller to manage that SSD. While it reintroduces
SCSI latency, it often does not require special drivers.

Fourth
The fourth form is Memory Bus Flash, which allows flash to be installed in the same I/O path and slots as DRAM. Memory Bus Flash physically looks like a memory DIMM. This approach further reduces latency for the ultimate in performance. Memory bus flash is so far only available to install in a server, and no storage vendor has yet announced a product that leverages this form of flash. Part of the challenge for memory bus flash is that motherboards need to be updated to natively support these modules; so far only IBM and Super Micro are shipping products that support memory bus flash.

Flash Implementation Options


In general, there are two locations where flash can be implemented: server side and shared on a network. And there are three options for shared flash implementations: server aggregated flash, hybrid flash arrays (mixed flash and HDD) and all-flash arrays. Each of the possible implementations has specific advantages.

Server Side
A common starting point for many virtual server administrators is
server side deployment. In fact, many environments have moved to
using SSD, as described above, for server boot and memory swap
areas. Often in these situations the shared storage is still 100% hard
drive based and the network may be a generation or two behind the
current state of the art. In these cases a server side solution may be
ideal. These solutions, especially for smaller environments, may be
substantially less expensive than their shared storage alternatives,
especially if a network upgrade can be avoided.
Server side flash, beyond boot drives, typically involves two
components. First, there is the flash device itself, which can be any of the form factors described above, but it is often higher performing than the drive form factor SSD, with PCIe SSD being the most common today. This hardware is then combined with caching
software. This software automatically copies the most active data
from the local hard drive or shared storage array into the flash area
on the server. This provides maximum performance since the VM is
accessing data from flash that is installed directly in the server.

Caching Software
Because hypervisors like VMware create a clustered environment, extra consideration should be taken when choosing caching software for the environment. The type of caching to be used also needs to be decided: there are three options, write-around cache, write-through cache and write-back cache. All three of these cache types write the most frequently read data to the flash storage area, but they vary in how they handle writes. Understanding these caching types and their pros and cons is also important for environments considering shared hybrid arrays, discussed below.

Write-around caches are the safest form of caching available. All new or modified data is written to the local or shared hard disk array. Only after the data has been accessed enough is it promoted to the flash inside the server. This technique is the gentlest on flash, meaning that fewer write I/Os happen to the flash tier and the data that is written to flash has qualified to be there. The downside is that all writes are limited to hard disk and network performance. It will also take longer for data to be promoted to the flash tier, meaning a much higher percentage of reads will be serviced from hard disk.

Write-through caches write data to both the flash area and the
hard disk area at the same time. The application is not given
acknowledgement of a completed write until the hard disk area has
completed that write, so this technique is as safe as write around
caching. The advantage of this technique is that the newly written
data, the most likely to be read again, is already in cache. This means
there are fewer reads that come from the HDD. There are three
downsides to this technique. First, it does not limit writes to flash the way write-around caching does, so the chance of flash wear-out is higher.
Second, the technique does not improve write performance since
acknowledgment has to come from the hard disk tier. Third, it is
susceptible to a cache over-run where a large sequential write could
replace all the data in the cache.

The final cache type is write-back. With this method all writes are
cached in the local flash storage and then written to the hard disk
drive tier asynchronously, typically a few seconds to a few minutes
after completion of the write on the flash tier. This method provides
equal improvements on both read and write performance. The
downside is that there is a point in time where data may be on flash
and not on HDDs, but the application has had the write fully
acknowledged. That means that data written to the flash tier could
be lost if there is a flash or server failure. As a result, redundancy should be part of a write-back implementation: at a minimum, flash should be mirrored in the server, but preferably writes should also be replicated externally to another flash card installed in another server or to a shared flash appliance.
Write-back caching has a further downside in the virtualized server environment. If a VM is migrated from one host to another, the data that is in the write cache has to be flushed prior to migration. This means that the caching software has to be integrated with the hypervisor to make sure that this occurs. Most caching
vendors have VMware support, but many are lacking Hyper-V
support.
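
To make the distinction between the three cache types concrete, here is a minimal Python sketch. The class, method and tier names are purely illustrative and are not taken from any particular caching product; the dictionaries simply stand in for the flash and HDD tiers.

class HostCache:
    """Illustrative model of a server side flash cache in front of an HDD tier."""

    def __init__(self):
        self.flash = {}   # stands in for the server side flash device
        self.hdd = {}     # stands in for the local disk or shared array

    def write_around(self, block, data):
        # Writes bypass flash entirely; the block is only promoted to
        # flash later, after it has been read often enough.
        self.hdd[block] = data

    def write_through(self, block, data):
        # Writes land in both tiers; the application is not acknowledged
        # until the HDD copy completes, so no data is ever only in flash.
        self.flash[block] = data
        self.hdd[block] = data   # acknowledgement follows this write

    def write_back(self, block, data):
        # Writes are acknowledged as soon as they land in flash; the HDD
        # copy happens asynchronously, so a flash or server failure in
        # that window can lose data unless the flash is mirrored.
        self.flash[block] = data
        self._destage_later(block)

    def _destage_later(self, block):
        # Stand-in for an asynchronous destage; a real product would queue
        # and batch these writes to the backing store.
        self.hdd[block] = self.flash[block]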

Shared Flash
The concerns and complexity that surround caching software selection have led to an increase in the adoption of shared flash options. Generally there are three choices available: aggregated server flash, a shared hybrid array and an all-flash array. Since they are shared and typically have data protection built in, they avoid the challenges associated with server side deployments, but they all introduce network latency.

Aggregated Server Flash


Aggregated server flash leverages the cost effectiveness of server side flash, but has the redundancy of shared flash arrays. Essentially, the flash resources installed in each server are aggregated into a virtual pool. That pool can be a cache in front of a hard disk tier or it can be a dedicated flash-only tier of storage. Software installed on each server contributes that server's flash to the aggregated volume, and that volume is then accessible by the VMs in the hypervisor cluster. RAID-like redundancy, often erasure coding, is typically implemented so that the failure of any flash device or server should not lead to the loss of data.
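
As a rough illustration of the capacity math behind such a pool, the sketch below assumes a hypothetical eight-host cluster, 2TB of flash per host and a 6+2 erasure coding scheme; none of these figures come from a specific product.

# Hypothetical aggregated server flash pool protected by erasure coding.
hosts = 8
flash_per_host_tb = 2                 # flash contributed by each server (assumed)
data_shards, parity_shards = 6, 2     # assumed 6+2 erasure coding

raw_pool_tb = hosts * flash_per_host_tb
usable_pool_tb = raw_pool_tb * data_shards / (data_shards + parity_shards)

print(f"raw pool: {raw_pool_tb} TB, usable after protection: {usable_pool_tb:.0f} TB")
# raw pool: 16 TB, usable after protection: 12 TB
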
While the flash aggregation concept has a lot of appeal, there are
some downsides. First, the networking between servers has to be
expertly configured so that performance is not impacted. Most of
these solutions perform their aggregation over IP-based networks, so a 10GbE connection is recommended, but it does not need to be
dedicated solely to the storage aggregation. Second, there is
resource consumption on each host as it manages the aggregation
and communicates with the other hosts. This overhead should be
manageable with today's available compute resources, but
aggregated server flash may lead to lower VM densities than what is
possible with dedicated shared flash. Third, and potentially the
biggest, is that this is a new way to implement storage and IT
professionals may simply not be comfortable with the choice.

Shared Hybrid Storage

Shared hybrid storage is the deployment of flash on a more traditional shared storage platform. In this use case, it is a mixture of flash storage with HDD. The goal is to deliver performance while keeping the cost per GB affordable. These systems will typically use one of the caching techniques described above to move data between the HDD and flash tiers, but the most common is a write-back technique. Again, because this is a shared storage array, the server side concerns of flash failure and VM migration are eliminated.

The concern with hybrid storage systems is inconsistent performance. While they provide excellent performance when retrieving data from flash, performance suffers when there is a cache miss and data has to be recalled from the hard disk tier. Hybrid systems can be fine-tuned to deliver near all-flash consistency, and there are three keys to achieving that success. First, overbuy a little on flash capacity: most vendors suggest that flash be 5% of total capacity, but Storage Switzerland finds that cache misses can be almost eliminated by increasing this to 10%. Second, pin certain mission critical or performance sensitive workloads in flash. Third, optimize the flash tier with deduplication and compression technologies, so that the 10% effectively holds more data than its raw capacity suggests.
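
A quick sizing sketch makes the point; the 100TB array capacity and the 3:1 flash-tier data reduction ratio below are purely illustrative assumptions, not vendor figures.

# Hypothetical hybrid array sizing based on the guidance above.
total_capacity_tb = 100

vendor_default_flash_tb = total_capacity_tb * 0.05   # typical 5% recommendation
recommended_flash_tb = total_capacity_tb * 0.10      # Storage Switzerland's 10%

# If the flash tier is deduplicated and compressed (an assumed 3:1 ratio),
# the 10% allocation behaves like a considerably larger cache.
effective_flash_tb = recommended_flash_tb * 3

print(vendor_default_flash_tb, recommended_flash_tb, effective_flash_tb)
# 5.0 10.0 30.0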

Shared All-Flash

The final option is shared all-flash. If it can be afforded and combined with a modern storage network (10GbE or 16Gbps Fibre Channel), it eliminates many of the concerns discussed in this article. There is no tuning and there are no concerns about cache misses. In fact, one of the biggest surprises that IT professionals experience after implementing an all-flash array is how greatly their storage management time is reduced.

The key question with all-flash is: can we afford it? All-flash vendors have gone to great lengths to reduce costs. First, the price of raw flash has decreased significantly over the past few years, and vendors have aggressively applied storage optimization techniques like deduplication and compression. Deduplication is the elimination of redundant data across files, and compression is the elimination of redundant data within a file. On average, when both techniques are combined, most environments report a 5:1 efficiency rating, and this rating is higher in a virtual server environment, by as much as 9:1. That means a 10TB system could appear to store 50 to 90TB of information.
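
That arithmetic is simple to verify; the sketch below just applies the reduction ratios cited above to a 10TB system.

raw_capacity_tb = 10

# 5:1 is the reported average; up to 9:1 in virtual server environments.
for ratio in (5, 9):
    print(f"{ratio}:1 data reduction -> {raw_capacity_tb * ratio} TB effective capacity")
# 5:1 data reduction -> 50 TB effective capacity
# 9:1 data reduction -> 90 TB effective capacity
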
The second factor that brings all-flash pricing more in line is how much more value it can drive from the rest of the enterprise. Since this is a dedicated device, hosts consume none of their resources managing storage, and since it is all-flash all the time, performance is consistent. This means that VM density can be pushed to new limits when all-flash is the storage infrastructure. Eliminating twice as many hosts as in other deployment models could more than cover the additional cost of all-flash.

Conclusion

Flash is almost tailor-made for virtualization. It handles the random I/O profile of the environment with relative ease and allows for additional cost savings by increasing VM density.
The challenge facing IT administrators is which of these
options to pick. All-Flash, if it can be afforded, provides the
simplest solution of them all, but as the realities of budget
constraints set in, the other options become more valid. The
key is to identify the configuration that can provide the most
consistent performance without increasing complexity.
For most environments, Shared Hybrid Storage, with a slightly
higher allocation of flash capacity, makes the most sense.
This is especially true if the storage system has the ability to
pin or hard-allocate certain workloads to flash. This allows mission critical workloads to be assured of high performance, while business critical workloads can count on the automation provided by the caching technology within the hybrid array to accelerate them properly.
This independently developed document is sponsored by Tegile
Systems. The document may utilize publicly available material from
various vendors, including Tegile, but it does not necessarily reflect
the positions of such vendors on the issues addressed in this
document.
Tegile Systems makes both All-Flash and Hybrid arrays that are
feature rich. This allows their customers to select the type of flash
solution that makes the most sense for their specific data center.
Tegile's storage arrays also include data efficiency techniques like
deduplication and compression, making them ideal for virtualized
environments.

Storage Switzerland, LLC

Storage Switzerland is the leading storage analyst firm focused on the emerging storage
categories of memory-based storage (Flash),
Big Data, virtualization, and cloud
computing. The firm is widely recognized for
its blogs, white papers, and videos on
current approaches such as all-flash arrays,
deduplication, SSDs, software-defined
storage, backup appliances, and storage
networking. The name Storage Switzerland
indicates a pledge to provide neutral analysis
of the storage marketplace, rather than
focusing on a single vendor or approach.
