Abstract
This white paper describes EMC VPLEX features and functionality
relevant to VMware vSphere. The best practices for configuring a VMware
environment to optimally leverage EMC VPLEX are also presented. The
paper also discusses methodologies to migrate an existing VMware
deployment to the EMC VPLEX family.
July 2011
Table of Contents
Executive summary
Audience
Conclusion
References
Executive summary
The EMC VPLEX family of products running the EMC GeoSynchrony operating
system provides an extensive offering of new features and functionality for the era of
cloud computing. EMC VPLEX breaks the physical barriers of data centers and allows
users to access a single copy of the data at different geographical locations
concurrently, enabling a transparent migration of running virtual machines between
data centers. This capability allows for transparent load sharing between multiple
sites while providing the flexibility of migrating workloads between sites in
anticipation of planned events. Furthermore, in case of an unplanned event that
causes disruption of services at one of the data centers, the failed services can be
restarted at the surviving site with minimal effort, thus minimizing the recovery time
objective (RTO).
VMware vSphere virtualizes the entire IT infrastructure including servers, storage,
and networks. The VMware software aggregates these resources and presents a
uniform set of elements in the virtual environment. Thus VMware vSphere 4 brings
the power of cloud computing to the data center, reducing IT costs while also
increasing infrastructure efficiency. Furthermore, for hosting service providers, VMware
vSphere 4 enables a more economic and efficient path to delivering cloud services
that are compatible with customers' internal cloud infrastructures. VMware vSphere 4
delivers significant performance and scalability, enabling even the most resource-intensive applications, such as large databases, to be deployed on internal clouds.
With these performance and scalability improvements, VMware vSphere 4 can enable
a 100 percent virtualized internal cloud.
The EMC VPLEX family is thus a natural fit for a virtualization environment based on
VMware technologies. The capability of EMC VPLEX to provide both local and
distributed federation that allows transparent cooperation of physical data elements
within a single site or two geographically separated sites allows IT administrators to
break physical barriers and expand their VMware-based cloud offering. The local
federation capabilities of EMC VPLEX allow the heterogeneous data storage
solutions at a physical site to be collected and presented to VMware vSphere as a
single pool of resources, thus enabling the major tenets of a cloud offering. Similarly,
the extension of the VPLEX's capabilities to span multiple data centers enables IT
administrators to leverage either private or public cloud offerings from hosting service
providers. The synergies provided by a VMware virtualization offering connected to
EMC VPLEX thus help customers reduce total cost of ownership while providing a
dynamic service that can rapidly respond to the changing needs of their business.
Audience
This white paper is intended for VMware administrators, storage administrators, and
IT architects responsible for architecting, creating, managing, and using virtualized IT
environments that utilize VMware vSphere and EMC VPLEX technologies. The white
paper assumes the reader is familiar with VMware technology, EMC VPLEX, and
related software.
Scale-out clustering hardware that lets customers start small and grow big with
predictable service levels
Distributed cache coherence for automatic sharing, balancing, and failover of I/O
across the cluster
A consistent view of one or more LUNs across VPLEX Clusters separated either by
a few feet within a data center or across asynchronous distances, enabling new
models of high availability and workload relocation
VPLEX Local: This solution is appropriate for customers that would like federation
of homogeneous or heterogeneous storage systems within a data center and for
managing data mobility between physical data storage entities.
VPLEX Metro: The solution is for customers that require concurrent access and
data mobility across two locations separated by synchronous distances. The
VPLEX Metro offering also includes the unique capability where a remote VPLEX
Metro site can present LUNs without the need for physical storage for those LUNs
at the remote site.
VPLEX Geo: The solution is for customers that require concurrent access and data
mobility across two locations separated by asynchronous distances. The VPLEX
Geo offering is currently not supported for live migration of VMware vSphere
virtual machines using VMware VMotion.
[Figure: Virtual machines (App/OS) running on VMware ESX hosts at Site A and Site B, each connected through redundant FC SANs to the VPLEX federation layer, which presents a single distributed volume across both sites.]
Two directors, which run the GeoSynchrony software and connect to storage,
hosts, and other directors in the cluster with Fibre Channel and gigabit Ethernet
connections
One Standby Power Supply, which provides backup power to sustain the engine
through transient power loss
A management server, which manages the cluster and provides an interface from
a remote management station
An EMC standard 40U cabinet to hold all of the equipment of the cluster
A pair of Universal Power Supplies that provide backup power for the Fibre
Channel switches and allow the system to ride through transient power loss
Also shown in Figure 4 on page 10 in the bottom callout are the three high-level steps
that are required to provision storage from EMC VPLEX. The wizard supports a
centralized mechanism for provisioning storage to different cluster members in case
of EMC VPLEX Metro or Geo. The first step in the process of provisioning storage from
EMC VPLEX is the discovery of the storage arrays connected to it and the claiming of
storage that has been exposed to EMC VPLEX. The first part of this step rarely needs
to be executed, since EMC VPLEX proactively monitors for changes to the storage
environment. The wizard not only claims the storage in this step but also creates the
extents on the storage volumes and, finally, the virtual volumes on those extents.
These components are called out in Figure 5. Figure 6 shows an example
of running through Step 1 of the EZ-Provisioning wizard which will create all objects
from the storage volume to the virtual volumes. It can be seen from the figure that
VPLEX software simplifies the process by automatically suggesting user-friendly
names for the devices that have been exposed from the storage arrays and using
those to generate names for both extents and devices.
When the initiators are zoned to the front-end ports of the EMC VPLEX, they
automatically log in to EMC VPLEX. As seen in Figure 8 these initiators are displayed
with the prefix, UNREGISTERED-, followed by the WWPN of the initiator. However,
initiators can also be manually registered before they are zoned to the front-end ports
of the VPLEX. The button highlighted in yellow in Figure 8 should be selected to
perform this operation. The initiators logged in to EMC VPLEX can be registered by
highlighting the unregistered initiator and clicking the Register button. This is
demonstrated in Figure 9. The inset in the figure shows the window that is opened
when the Register button is clicked. The inset also shows the facility provided by EMC
VPLEX to assign a user-friendly name to the unregistered initiator and also select a
host type for the initiator that is being registered. Once the information is added,
click OK to complete registration. Note that multiple unregistered initiators may be
selected at once for registration.
Once the available ports have been added, virtual volumes can be assigned to the
view as seen in Figure 12.
Figure 14 shows the storage view created using the wizard. The WWN of the virtual
volume exposed through the view is highlighted in the figure. This information is used
by VMware vSphere to identify the devices.
Figure 14. Viewing details of a storage view utilizing the VPLEX management interface
The newly provisioned storage can be discovered on the VMware ESX hosts by
performing a rescan of the SCSI bus. The result from the scan is shown in Figure 15. It
can be seen that the VMware ESX host has access to a device with WWN
6000144000000010e01443ee283912b8. A quick comparison of the WWN with the
information highlighted in green in Figure 14 confirms that the device discovered by
the VMware ESX host is indeed the newly provisioned VPLEX virtual volume. The
figure also shows the FC organizationally unique identifier (OUI) for EMC VPLEX
devices as 00:01:44.
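The relationship between the device WWN and the vendor OUI can be checked programmatically. In the NAA type-6 format used here, the first hex digit identifies the NAA type and the next six hex digits are the vendor's OUI. The following Python sketch is purely illustrative (the helper name is our own, not part of any EMC or VMware tool):

```python
def wwn_oui(wwn: str) -> str:
    """Extract the IEEE OUI from an NAA type-6 WWN.

    In the NAA 6 format, the first hex digit is the NAA type and
    the next six hex digits are the vendor's OUI.
    """
    wwn = wwn.replace(":", "").lower()
    if wwn[0] != "6":
        raise ValueError("not an NAA type-6 identifier")
    oui = wwn[1:7]
    return ":".join(oui[i:i + 2] for i in range(0, 6, 2))

# The VPLEX virtual volume discovered in Figure 15:
print(wwn_oui("6000144000000010e01443ee283912b8"))  # → 00:01:44
```

Applying this to the WWN discovered by the ESX host recovers the EMC VPLEX OUI of 00:01:44, confirming the device's origin.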
Figure 15. Discovering newly provisioned VPLEX storage on a VMware ESX host
Once the VPLEX devices have been discovered by the VMware ESX hosts, they can be
used for creating a VMware file system (datastore) or as raw device mappings (RDMs).
However, for optimal performance it is important to ensure that I/Os to the EMC
VPLEX are aligned to a 64 KB block boundary.
The VMware file system created using the vSphere Client automatically aligns the file
system blocks. However, a misaligned partition on a guest operating system can
impact performance negatively. Therefore, it is critical to ensure that all partitions
created on the guest operating system (either on a virtual disk presented from a
VMware file system or on an RDM) are aligned to a multiple of 64 KB.
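The alignment requirement can be verified arithmetically: a partition starting at logical block address (LBA) n on a 512-byte-sector disk is 64 KB aligned only if n × 512 is a multiple of 65,536. A minimal sketch, assuming 512-byte sectors (the function name is our own):

```python
SECTOR_SIZE = 512          # bytes per logical sector (typical)
ALIGNMENT = 64 * 1024      # recommended I/O boundary for VPLEX

def is_aligned(start_sector: int) -> bool:
    """Return True if a partition starting at this LBA sits on a 64 KB boundary."""
    return (start_sector * SECTOR_SIZE) % ALIGNMENT == 0

# The legacy MS-DOS default of sector 63 is misaligned,
# while a partition starting at sector 128 (64 KB offset) is aligned.
print(is_aligned(63))   # → False
print(is_aligned(128))  # → True
```

The starting sector of an existing guest partition can be read from tools such as fdisk or diskpart and checked against this rule.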
Connectivity considerations
EMC VPLEX introduces a new type of storage federation paradigm that provides
increased resiliency, performance, and availability. The following paragraphs
discuss the recommendations for connecting VMware ESX hosts to EMC VPLEX. The
recommendations ensure the highest level of connectivity and availability to VMware
vSphere even during abnormal operations.
As a best practice, each VMware ESX host in the VMware vSphere environment should
have at least two physical HBAs, and each HBA should be connected to at least two
front-end ports on director A and director B on EMC VPLEX. This configuration ensures
continued use of all HBAs on the VMware ESX host even if one of the front-end ports
of the EMC VPLEX goes offline for either planned maintenance events or unplanned
disruptions.
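The intent of this best practice can be expressed as a quick sanity check: every HBA should retain at least one path when any single VPLEX front-end port fails. The following Python sketch is illustrative only (the zoning map and function names are hypothetical, not an EMC tool):

```python
# Hypothetical zoning: each HBA reaches one port on director A
# and one on director B, so no single port is a point of failure.
zoning = {
    "hba0": ["A0-FC00", "B0-FC00"],   # director A port 0, director B port 0
    "hba1": ["A1-FC00", "B1-FC00"],   # director A port 1, director B port 1
}

def survives_port_failure(zoning: dict) -> bool:
    """Check that losing any single front-end port leaves every HBA
    with at least one remaining path."""
    all_ports = {p for ports in zoning.values() for p in ports}
    return all(
        any(p != failed for p in ports)
        for failed in all_ports
        for ports in zoning.values()
    )

print(survives_port_failure(zoning))  # → True
```

A zoning map in which an HBA depends on a single port would fail this check, signaling a violation of the recommendation.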
When a single VPLEX Engine configuration is connected to a VMware vSphere
environment, each HBA should be connected to the front-end ports provided on both
the A and B directors within the VPLEX Engine. Connectivity to the VPLEX front-end
ports should consist of first connecting unique hosts to port 0 of each I/O module
emulating the front-end directors before connecting additional hosts to the remaining
ports on the I/O module. A schematic example of the wiring diagram for a four-node
from the VPLEX Engines to the storage arrays should follow the best practices
recommendation for the array. A detailed discussion of the best practices for
connecting the back-end storage is beyond the scope of this paper. Interested
readers should consult the TechBook EMC VPLEX Metro Witness Technology and High
Availability.
Figure 20. VMware kernel paths for a VPLEX device in Virtual Storage Integrator (VSI)
The connectivity from the VMware ESX hosts to the multiple-engine VPLEX Cluster can
be scaled as more engines are added. The methodologies discussed in this section
ensure all front-end ports are utilized while providing maximum potential
performance and load balancing for VMware vSphere.
simple load-balancing algorithm provided by the Round Robin policy. Therefore, for
VMware ESX version 4.x connected to EMC VPLEX, EMC recommends the use of the
Fixed policy with static load balancing by changing the preferred path. In addition, the
changes to the preferred path should be performed on all of the ESX hosts accessing
the VPLEX devices.
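The effect of static load balancing with the Fixed policy can be sketched as a round-robin assignment of preferred paths across the front-end ports. The fragment below is purely illustrative (datastore and port names are hypothetical); the actual change is made per device in the vSphere Client, as Figure 21 shows:

```python
# Hypothetical names for illustration; not output of any EMC or VMware tool.
front_end_ports = ["A0-FC00", "A1-FC00", "B0-FC00", "B1-FC00"]
datastores = ["ds01", "ds02", "ds03", "ds04", "ds05", "ds06", "ds07", "ds08"]

# With the Fixed policy, static load balancing is achieved by spreading
# each datastore's preferred path round-robin over the available ports,
# so every port carries an equal share of the devices.
preferred = {ds: front_end_ports[i % len(front_end_ports)]
             for i, ds in enumerate(datastores)}

print(preferred["ds01"], preferred["ds05"])  # → A0-FC00 A0-FC00
```

With eight datastores and four ports, each port ends up as the preferred path for exactly two datastores, mirroring the layout shown in Figure 22.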
The preferred path on VMware ESX version 4 can be set using vSphere Client. Figure
21 shows the procedure that can be used to set the preferred path for a physical disk
in a VMware vSphere environment. Figure 22 shows the preferred path setting for two
datastores, each residing on an EMC VPLEX device presented from front-end ports A0-FC00, A1-FC00, B0-FC00, and B1-FC00.
Figure 22. EMC VPLEX devices with static load balancing on ESX version 4
software creates a single pseudo device for a given array volume (LUN) regardless
of the number of physical paths on which it appears. The pseudo device, or logical
volume, represents all physical paths to a given device. It is then used for creating a
VMware file system or for raw device mapping (RDM). These entities can be then used
for application and database access.
PowerPath/VE's value fundamentally comes from its architecture and position in the
I/O stack. PowerPath/VE sits above the HBA, allowing heterogeneous support of
operating systems and storage arrays. By integrating with the I/O drivers, all I/Os run
through PowerPath, allowing it to serve as a single I/O control and management point.
Since PowerPath/VE resides in the ESX kernel, it sits below the guest OS level,
application level, database level, and file system level. PowerPath/VE's unique
position in the I/O stack makes it an infrastructure manageability and control point,
thus bringing more value going up the stack.
PowerPath/VE features
PowerPath/VE provides the following features:
Dynamic load balancing: PowerPath is designed to use all paths at all times.
PowerPath distributes I/O requests to a logical device across all available paths,
rather than requiring a single path to bear the entire I/O burden.
Automatic path testing: PowerPath/VE periodically tests both live and dead
paths. By testing live paths that may be idle, a failed path may be identified
before an application attempts to pass I/O down it. By marking the path as failed
before the application becomes aware of it, timeout and retry delays are reduced.
By testing paths identified as failed, PowerPath/VE will automatically restore them
to service when they pass the test. The I/O load will be automatically balanced
across all active available paths.
PowerPath/VE management
PowerPath/VE uses a command set, called rpowermt, to monitor, manage, and
configure PowerPath/VE for vSphere. The syntax, arguments, and options are very
similar to the traditional powermt commands used on all other PowerPath
multipathing-supported operating system platforms. There is one significant
difference in that rpowermt is a remote management tool.
Not all vSphere installations have a service console interface. In order to manage an
ESXi host, customers have the option to use VMware vCenter Server or vCLI (also
referred to as VMware Remote Tools) on a remote server. PowerPath/VE for vSphere
uses the rpowermt command line utility for both ESX and ESXi. PowerPath/VE for
vSphere cannot be managed on the ESX host itself. There is neither a local nor
remote GUI for PowerPath on ESX. Administrators must designate a Guest OS or a
physical machine to manage one or multiple ESX hosts. The utility, rpowermt, is
supported on Windows 2003 (32-bit) and Red Hat 5 Update 2 (64-bit).
When the vSphere host server is connected to the EMC VPLEX, the PowerPath/VE
kernel module running on the vSphere host associates all paths to each device
presented from the array and assigns a pseudo device name (as discussed earlier).
An example of this is shown in Figure 24, which shows the output of rpowermt display
host=x.x.x.x dev=emcpower11. Note in the output that the device has four paths and
displays the default optimization mode for VPLEX devices, ADaptive. The default
optimization mode is the most appropriate policy for most workloads and should not
be changed.
Symmetrix, CLARiiON, and VPLEX. This is extremely useful as a customer may have
many different types of devices presented to their vSphere environment. The Path
Management feature can set the policy for both NMP and PowerPath/VE. Figure 25
shows the navigation to change the multipathing policy with VSI.
Virtual Storage Integrator. The datastores are backed by both VPLEX and non-VPLEX
devices.
Figure 29. Details of the EMC storage device displayed by EMC Storage Viewer
The migration of the data from the Symmetrix VMAX arrays to the storage presented
from VPLEX can be performed using storage vMotion once appropriate datastores are
created on the devices presented from VPLEX. In this example, the VM W2K8 VM1
will be migrated from its current datastore on the Symmetrix to the datastore
vplex_boston_local, which resides on the VPLEX and was shown earlier in Figure 17.
Figure 30 shows the steps required to initiate the migration of a virtual machine from
Management_Datastore_1698 to the target datastore, vplex_boston_local. The
storage vMotion functionality is also available via a command line utility. Detailed
discussion of storage vMotion is beyond the scope of this white paper. Further details
on storage vMotion can be found in the VMware documentation listed in the
References section.
Figure 30. Using storage vMotion to migrate virtual machines to VPLEX devices
The following steps need to be taken to encapsulate and migrate an existing VMware
deployment.
1. Zone the back-end ports of EMC VPLEX to the front-end ports of the storage array
currently providing the storage resources.
2. Change the LUN masking on the storage array so that EMC VPLEX has access to
the devices that host the VMware datastores. In the example below, the devices
4EC (for Datastore_1) and 4F0 (for Datastore_2) have to be masked to EMC
VPLEX.
Figure 31 shows the devices that are visible to EMC VPLEX after the masking
changes have been performed and a rescan of the storage array has been
performed on EMC VPLEX. The figure also shows the SYMCLI output of the
Symmetrix VMAX devices and their corresponding WWNs. A quick comparison
clearly shows that EMC VPLEX has access to the devices that host the datastores
that need to be encapsulated.
Figure 32. Encapsulating devices in EMC VPLEX while preserving existing data
4. After claiming the devices, a single extent that spans the whole disk has to be
created for each device. Figure 33 shows this step for the two datastores that are
being encapsulated in this example.
5. A VPLEX device (local device) with a single RAID 1 member should be created
using the extent that was created in the previous step. This is shown for the two
datastores, Datastore_1 and Datastore_2, hosted on device 4EC and 4F0,
respectively, in Figure 34. The step should be repeated for all of the storage array
devices that need to be encapsulated and exposed to the VMware environment.
Figure 34. Creating a VPLEX RAID 1 protected device on encapsulated VMAX devices
6. A virtual volume should be created on each VPLEX device that was created in the
previous step. This is shown in Figure 35 for the VMware datastores Datastore_1
and Datastore_2.
Figure 36. Creating a storage view to present encapsulated devices to VMware ESX
hosts
8. In parallel to the operations conducted on EMC VPLEX, new zones should be
created that allow the VMware ESX hosts involved in the migration access to the
front-end ports of EMC VPLEX. These zones should also be added to the
appropriate zone set. Furthermore, the zones that provide the VMware ESX host
access to the storage array whose devices are being encapsulated should be
removed from the zone set. However, the modified zone set should not be
activated until the maintenance window when the VMware virtual machines can
be shut down.
It is important to ensure that the encapsulated devices are presented to the ESX hosts
only through the VPLEX front-end ports. The migration of the VMware environment to
VPLEX can fail if devices are presented from both VPLEX and the storage subsystem to
the VMware ESX hosts simultaneously. Furthermore, there is a potential for data
corruption if the encapsulated devices are presented simultaneously from the storage
array and the VPLEX system.
9. When the maintenance window opens, all of the virtual machines that would be
impacted by the migration should first be shut down gracefully. This can be done
either with the vSphere Client or with command line utilities that leverage the
VMware SDK.
10. Activate the zone set that was created in step 8. A manual rescan of the SCSI bus
on the VMware ESX hosts should remove the original devices and add the
encapsulated devices presented from the VPLEX system.
11. The devices presented from the VPLEX system host the original datastores.
However, the VMware ESX hosts do not automatically mount the datastores:
VMware ESX considers each datastore a snapshot, since the WWN of the device
exposed through the VPLEX system differs from the WWN of the device presented
from the Symmetrix VMAX system.
12. Figure 37 shows an example of this for a VMware vSphere environment. The figure
shows all of the original virtual machines in the environment are now marked as
inaccessible. This occurs since the datastores, Datastore_1 and Datastore_2,
created on the devices presented from the VMAX system are no longer available.
Figure 37. Rescanning the SCSI bus on the VMware ESX hosts
VMware vSphere allows access to datastores that are considered snapshots in two
different ways: the snapshot can either be resignatured or persistently
mounted. In VMware vSphere environments, the resignaturing process of datastores
that are considered snapshots can be performed on a device-by-device basis. This
reduces the risk of mistakenly resignaturing the encapsulated devices from the VPLEX
system. Therefore, for a homogeneous vSphere environment (that is, all ESX hosts in
the environment are at version 4.0 or later), EMC recommends the use of persistent
mounts for VMware datastores that are encapsulated by VPLEX. The use of persistent
mounts also provides other advantages, such as retaining the history of all of the
virtual machines.
The datastore on devices encapsulated by VPLEX can also be accessed by
resignaturing it. However, using this method adds unnecessary complexity to the
recovery process and is not recommended. Therefore, the procedure to recover a
VMware vSphere environment utilizing the method is not discussed in this document.
The solution requires extension of the VLAN to different physical data centers. Technologies such as Cisco's Overlay Transport
Virtualization (OTV) can be leveraged to provide the service.
design of the VMware environment has to account for a number of potential failure
scenarios and mitigate the risk for services disruption. The following paragraphs
discuss the best practices for designing the VMware environment to ensure an
optimal solution. For further information on EMC VPLEX Metro configuration readers
should consult the TechBook EMC VPLEX Metro Witness Technology and High
Availability available on Powerlink.
VPLEX witness
VPLEX uses rule sets to define how a site or link failure should be handled in a VPLEX
Metro or VPLEX Geo configuration. If two clusters lose contact, the rule set defines
which cluster continues operation and which suspends I/O. The rule set is applied on
a device-by-device basis or for a consistency group. The use of rule sets to control
which site is the winner, however, adds unnecessary complexity in the case of a site
failure, since it may be necessary to intervene manually to resume I/O at the
surviving site.
VPLEX with GeoSynchrony 5.0 introduces a new concept to handle such an event, the
VPLEX Witness. VPLEX Witness is a virtual machine that runs in an independent (3rd)
fault domain. It provides the following features:
High availability for applications (no single points of storage failure, auto-restart)
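The difference between static rule sets and witness-guided arbitration can be illustrated with a toy model. This is our own simplification for exposition, not VPLEX's actual arbitration algorithm, and all names are hypothetical:

```python
def io_continues_at(rule_winner: str, alive: set, witness: bool) -> set:
    """Toy model: which clusters continue I/O after the two VPLEX
    clusters lose contact with each other."""
    if witness:
        # A witness in a third fault domain can observe which sites are
        # actually up and guides the survivor to resume, even if it is
        # not the statically preferred cluster.
        return alive
    # Without a witness, only the statically preferred cluster proceeds,
    # and only if it survived; otherwise manual intervention is required.
    return alive & {rule_winner}

# Inter-site link partition, both sites up: the rule set suspends cluster-2.
print(io_continues_at("cluster-1", {"cluster-1", "cluster-2"}, witness=False))
# Preferred site lost, no witness: no cluster resumes I/O automatically.
print(io_continues_at("cluster-1", {"cluster-2"}, witness=False))  # → set()
# Same failure with a witness: the surviving cluster resumes automatically.
print(io_continues_at("cluster-1", {"cluster-2"}, witness=True))
```

The second case captures the manual intervention problem described above; the third shows how the witness removes it.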
Typically, data centers implement highly available designs within a data center and
deploy disaster recovery functionality between data centers. This is because within
the data center, components operate in an active/active (or active/passive with
automatic failover) manner. Between data centers, however, legacy replication
technologies use active/passive techniques and require manual failover to use the
passive component. When the VPLEX Metro active/active replication technology is
used in conjunction with VPLEX Witness, the lines between local high availability and
long-distance disaster recovery are somewhat blurred, because high availability is
stretched beyond the data center walls.
A configuration that uses any combination of VPLEX Metro and VPLEX Witness is
considered a VPLEX Metro HA configuration. The key to this environment is
AccessAnywhere. It allows both clusters to provide coherent read/write access to the
same virtual volume. That means that on the remote site, the paths are up and the
storage is available even before any failover happens. When this is combined with
host failover clustering technologies such as VMware HA, one gets a fully automatic
application restart for any site-level disaster. The system rides through component
failures within a site, including the failure of an entire array.
VMware ESX can be deployed at both VPLEX clusters in a Metro environment to create
a high availability environment. Figure 39 shows the Metro HA configuration that will
be used in this paper.
attached to Cluster-1. If this configuration also contains a VPLEX Witness, the witness
recognizes the outage and recommends Cluster-2 resume I/O rather than following
the rule set. However, VMware vSphere does not recognize these types of failures and
does not automatically move the failed virtual machines to the surviving site. To
protect against this, one can leverage the cross-connectivity feature of the VPLEX
storage system. Alternatively, users can intervene and manually move the virtual
machine to the Cluster-2 ESX server to provide access to the data. These options are
discussed in the section Cross-connecting VMware vSphere environments to VPLEX
clusters for increased resilience.
The following section will detail the example of a VPLEX Metro configuration with
VMware HA and VPLEX Witness and its behavior in case of a complete site failure.
Figure 40. Configuration of VMware clusters with EMC VPLEX Metro utilizing VPLEX
Witness
To take full advantage of the HA cluster failover capability in a VPLEX Metro cluster
that employs VPLEX Witness, it is necessary to create DRS Groups and then Rules that
will govern how the VMs will be restarted in the event of a site failure. Setting these
up is a fairly simple procedure in the vSphere Client. This will allow restarting of VMs
in the event of a failure at either Site A or Site B.
VMware DRS Groups and Rules
To access the wizards to create DRS Groups and Rules, right-click on the cluster
(Boston) and navigate to VMware DRS. Leave the automation level at Fully
automated to permit VMware to move the virtual machines as necessary, as shown
in Figure 41.
Now that the DRS groups are in place, rules need to be created to govern how the DRS
groups should behave when there is a site failure. There are two rules, one that
applies to cluster-1 and one that applies to cluster-2. The rule
Boston_VMs_Affinity_to_Cluster-1, seen in Figure 43, dictates that the VMs
associated with cluster-1 (through the DRS group) should run on the hosts
associated with cluster-1 (again through the DRS group); the same applies, vice
versa, for cluster-2's VMs. It is important that the condition for both rules is "should
run" and not "must run", since this gives the VMs the flexibility to start up on the two
hosts that survive a site failure. Each rule will permit the VMs associated with the
failing cluster to be brought up on the two hosts that are part of the site that did not
fail and, most
brought up on the two hosts that are part of the site that did not fail, and most
importantly, to automatically migrate back to their original hosts when the site failure
is resolved.
Figure 43. DRS Rules for cluster-1 and cluster-2 in VPLEX Metro utilizing VPLEX
Witness
Site failure
To demonstrate the effectiveness of VPLEX Witness in a VPLEX Metro with VMware a
site failure test was conducted based upon the previous configuration. At the start of
the test, the VMs were running on their preferred hosts as seen in Figure 44.
Figure 52. Restarting of virtual machines on the remaining hosts after site failure
Recovery
4. When the VPLEX site comes back online after site failure, the distributed devices
in the consistency groups will need to be resumed at the failed site. This is
accomplished using the VPLEXcli command resume-at-loser shown in Figure 53.
Unless explicitly configured otherwise (using the auto-resume-at-loser
property), I/O remains suspended on the losing cluster. This prevents
Figure 54. Rebuilding a virtual volume after resuming from site failure
5. As the ESX hosts come online, they will re-enter the VMware HA cluster. Once
again the DRS rules will come into play and the virtual machines will be migrated
off of the current hosts back to their original hosts. This is seen in Figure 55. This
is completely transparent to the user.
The VPLEX system provides redundant hardware to protect against failure of individual components. A catastrophic failure of
a VPLEX system is thus a highly unlikely event.
(highlighted in green) are from a VPLEX system. The inset shown in the figure is for
one of the virtual machines (Distributed_RH_1) hosted on the datastore,
Distributed_DSC_Site_A, created on a distributed RAID-1 VPLEX volume. The inset
shows the status of the VMware tools running within the guest operating system as
normal.
Figure 56. Display of a normally operating VMware vSphere environment with VPLEX
volumes
The state of the VMware vSphere cluster in response to a complete failure of the
VPLEX system is shown in Figure 57. It can be observed from the figure that the host
svpg-dell-c1-s09 located at the failed site is still communicating with the vSphere
vCenter Server although it has lost all access to the VPLEX virtual volumes. The
datastores hosted on the failed VPLEX volumes are either missing or reported as
inactive. In addition, as can be seen from the inset in Figure 57, the status of the
VMware tools running in the virtual machine exhibited in Figure 56 has now changed
to Not Running.
The state exhibited in Figure 57 occurs due to the response of VMware ESX kernel to a
situation where all paths to a device (or devices) are not available (frequently referred
to as an all paths down, or APD, condition).
Figure 57. State of the vSphere environment after complete failure of VPLEX system
The incorrect behavior of VMware ESX hosts, and the resulting unavailability of
applications in response to the failure of a VPLEX system, can be avoided by leveraging
another feature of GeoSynchrony version 5.0. Starting with this version, it is possible
to cross-connect hosts to VPLEX directors located at the remote site. When this
feature is leveraged in conjunction with VPLEX Witness, it is possible for ESX hosts to
ride through a localized VPLEX system failure by redirecting its I/O to the VPLEX
directors at the surviving site. However, it should be noted that to leverage the
cross-connect feature, the round-trip latency between the two sites involved in a
VPLEX metropolitan configuration cannot be greater than 1 ms.³
Figure 58 shows the paths that are available to the ESX host, svpg-dell-c1-s09, to
access the VPLEX distributed RAID volume hosting the datastore
Distributed_DSC_Site_A. The screenshot generated using EMC Virtual Storage
Integrator shows that the ESX host has eight distinct paths to the device, four of
which are presented from a pair of directors from one VPLEX system and the
remainder from a pair of directors from a different VPLEX system. However, the
information shown in the figure is not in and of itself sufficient to determine whether
the VPLEX systems are collocated at a single physical location or separated. In
addition, it can be seen in Figure 58 that the VMware native multipathing (NMP) [4]
software cannot distinguish the locality of the VPLEX front-end ports and marks all
paths as active. Therefore, users should be cognizant of the possibility that the
failure of a path to one of the front-end ports of the local VPLEX system can result in
the ESX host accessing the data through directors at the secondary site, and thus in
slightly degraded performance.
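Since NMP reports all paths as active regardless of locality, an administrator who knows the local system's front-end port WWNs (for example, from a display such as Figure 59) can separate local from remote paths with a simple comparison. The sketch below is illustrative only; the WWN values and path records are hypothetical placeholders, not actual VPLEX port names.

```python
# Grouping paths by locality: compare each path's target port WWN against
# the known front-end port WWNs of the local VPLEX system. The WWN values
# below are hypothetical placeholders, not the ports shown in Figure 58/59.

LOCAL_VPLEX_WWNS = {"50:00:14:42:a0:00:01:00", "50:00:14:42:a0:00:01:01"}

def split_paths_by_locality(paths, local_wwns=LOCAL_VPLEX_WWNS):
    """Return (local, remote) path lists based on the target port WWN."""
    local = [p for p in paths if p["target_wwn"] in local_wwns]
    remote = [p for p in paths if p["target_wwn"] not in local_wwns]
    return local, remote

paths = [
    {"name": "vmhba1:C0:T0:L0", "target_wwn": "50:00:14:42:a0:00:01:00"},
    {"name": "vmhba1:C0:T1:L0", "target_wwn": "50:00:14:42:b0:00:02:00"},
]
local, remote = split_paths_by_locality(paths)
```

A check like this makes it easy to notice when a path failure has silently shifted I/O to the remote site's directors.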
Figure 58. Displaying available paths and the multipathing policy using EMC VSI
Figure 59 shows the WWNs of the VPLEX front-end ports of directors at both sites
used in the example presented in this section. A comparison of WWNs shown in this
figure and those presented in Figure 58 indicates that the ESX host, svpg-dell-c1-s09,
can access the device hosting the datastore, Distributed_DSC_Site_A, through the
front-end ports of directors at both sites.
[3] Customers requiring support for larger latencies should submit an RPQ to EMC for further consideration.
[4] EMC PowerPath/VE exhibits the same behavior.
Figure 59. World Wide Names of the front-end ports of VPLEX directors
Furthermore, it can also be seen from Figure 59 that the best practices recommended
in the earlier section are followed while connecting the ESX hosts to the VPLEX
directors at the second site. Although not shown, every host at a particular site is
cross-connected to the appropriate directors at the alternate site. This type of
configuration ensures that all ESX hosts have alternate paths to the VPLEX directors
at the peer site, allowing them to access the VPLEX virtual volumes in case of a
complete failure of the local VPLEX system.
Figure 60 shows the state of the ESX host svpg-dell-c1-s09, cross-connected to
VPLEX directors at both sites, after a simulated catastrophic failure of the VPLEX
system at the local site. The figure also shows the status of the virtual machine that was exhibited
earlier during the presentation of the behavior of the VMware vSphere environment
without cross-connectivity. The figure clearly shows that the ESX host continues to
work properly even though it has lost all access to the local VPLEX system. The virtual
machine also continues to operate properly, accessing the data through the VPLEX
directors at the surviving site.
It is interesting to note that the datastore and all virtual machines hosted on a VPLEX
volume that is not replicated to the second site become inaccessible due to the
unavailability of the VPLEX array. This behavior is expected since there are no
additional copies of the data that can be used to restore normal operations for the
virtual machines that are hosted on those volumes. However, since the decision to
replicate a VPLEX volume or not is based on the SLA needs of a business, the loss of
the applications associated with the failed datastore should be tolerable. The
datastore, and consequently the virtual machines residing on that datastore, is
automatically restored when the VPLEX system recovers [5].
It is important to note that some of the virtual machines residing on the ESX hosts in a
state that is shown in Figure 60 will exhibit degraded performance, since all I/Os
generated by them suffer an additional latency due to the distance between the ESX
host and the location of the surviving VPLEX system. Therefore, if the failed VPLEX
system is not expected to be restored for a significant amount of time, it is advisable
to use VMware VMotion to migrate the impacted virtual machines to the surviving
site.
Figure 61 shows the paths that are available from the ESX host to the VPLEX volume
hosting the datastore, Distributed_DSC_Site_A, discussed in the previous
paragraphs. It can be seen from the figure that four of the eight paths are dead. A
comparison with Figure 58 shows that the active paths belong to the directors at the
surviving site.
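A quick way to confirm the situation described above is to check that every path not marked dead terminates at a front-end port of the surviving site. The sketch below models that check; the WWNs and path states are hypothetical placeholders, not actual values from Figure 61.

```python
# After a local VPLEX failure, verify that every surviving (non-dead) path
# terminates on a front-end port of the surviving site. The port WWNs below
# are hypothetical placeholders.

SURVIVING_SITE_WWNS = {"50:00:14:42:b0:00:02:00", "50:00:14:42:b0:00:02:01"}

def surviving_paths_consistent(paths, surviving_wwns=SURVIVING_SITE_WWNS):
    """True if at least one path is alive and all live paths point at the surviving site."""
    active = [p for p in paths if p["state"] != "dead"]
    return bool(active) and all(p["target_wwn"] in surviving_wwns for p in active)

# Model of the Figure 61 scenario: four dead local paths, four active remote paths.
paths = (
    [{"target_wwn": "50:00:14:42:a0:00:01:00", "state": "dead"}] * 4
    + [{"target_wwn": "50:00:14:42:b0:00:02:00", "state": "active"}] * 4
)
```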
[5] Due to the reaction of VMware vSphere environments to APD scenarios, virtual machines on failed datastores may require a reboot after the datastore is fully functional.
Figure 60. State of the vSphere environment with cross-connect after complete failure
of the VPLEX system
Figure 61. State of the paths to a datastore after the failure of a local VPLEX system
Figure 62 shows the VMware vSphere environment after the local VPLEX system has
recovered. Also shown in the figure as an inset are the paths to the datastore,
Distributed_DSC_Site_A. It can be seen from the figure and the inset that the
restoration of the failed VPLEX system resulted in the return of the VMware vSphere
environment to a normal state.
It should be obvious from the discussion in this section and the previous section that
a VMware vSphere environment cross-connected to a VPLEX Metro HA system
provides the highest level of resilience and the capability to eliminate application
unavailability for a vast majority of failure scenarios. The solution can also
automatically recover failed virtual machines and the applications they host in
situations where disruption to the services cannot be avoided.
Figure 62. VMware vSphere environment after the recovery of a failed VPLEX system
site. To address this, the VPLEX management interface provides the capability to
manually resume I/Os to the detached devices. However, a more detailed discussion
of the procedure to perform these operations is beyond the scope of this white paper.
The EMC VPLEX Metro Witness Technology and High Availability TechBook should be
consulted for further information on VPLEX Metro.
Figure 63 shows the recommended cluster configuration for VMware deployments
that leverage devices presented through EMC VPLEX Metro without the VPLEX
Witness feature. It can be seen from the figure that the VMware vSphere environment
is divided into two separate VMware clusters, each containing only the VMware ESX
hosts at one physical data center (Site A or Site B). However, both VMware clusters
are managed under a single Datacenter entity, which represents the logical
combination of the multiple physical sites involved in the solution. Also shown in the
figure, as an inset, are the settings for each cluster. The inset shows that VMware
DRS and VMware HA are active in each cluster, thus restricting the domain of
operation of these components to a single physical location.
Figure 63. Configuration of VMware clusters utilizing devices from EMC VPLEX Metro
Although Figure 63 shows only two VMware clusters, it is acceptable to divide the
VMware ESX hosts at each physical location into multiple VMware clusters. The goal
of the recommended configuration is to prevent intermingling of the ESX hosts at
multiple locations into a single VMware cluster object.
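The segregation rule above can be expressed as a simple configuration check: given a mapping of ESX hosts to physical sites, no cluster may contain hosts from more than one site. The sketch below is a generic illustration; the host, cluster, and site names are hypothetical.

```python
# Configuration check for the best practice above: no VMware cluster should
# mix ESX hosts from different physical sites. All names are hypothetical.

HOST_SITE = {
    "esx-a1": "Site A", "esx-a2": "Site A",
    "esx-b1": "Site B", "esx-b2": "Site B",
}

def clusters_are_site_aligned(clusters, host_site=HOST_SITE):
    """True if every cluster's hosts all belong to a single physical site."""
    return all(len({host_site[h] for h in hosts}) == 1
               for hosts in clusters.values())

# Recommended layout: one (or more) clusters per site.
good = {"Cluster-SiteA": ["esx-a1", "esx-a2"], "Cluster-SiteB": ["esx-b1", "esx-b2"]}
# Stretched layout: hosts from both sites intermingled in one cluster.
stretched = {"Cluster-Mixed": ["esx-a1", "esx-b1"]}
```

A check like this could run as part of a periodic configuration audit to catch hosts added to the wrong cluster.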
Although the best practices recommendation is to segregate the ESX hosts at each
site in a separate cluster, VMware and EMC support a stretched cluster configuration
that includes ESX hosts from multiple sites. The VMware knowledge base article
1026692 available at http://kb.vmware.com/kb/1026692 should be consulted for
further details if such a configuration is desired.
The VMware datastores presented to the logical representation of the conjoined
physical data centers (Site A and Site B) are shown in Figure 64. The figure shows that
a number of VMware datastores are presented across both data centers [6]. Therefore,
the logical separation of the VMware DRS and VMware HA domain does not in any
way impact, as discussed in the following section, the capability of VMware vCenter
Server to transparently migrate the virtual machines operating in the cluster
designated for each site to its peer site. The figure also highlights the fact that a
VPLEX Metro configuration in and of itself does not imply the requirement of
replicating all of the virtual volumes created on EMC VPLEX Metro to all physical data
center locations [7]. Virtual machines hosted on datastores encapsulated on virtual
volumes with a single copy of the data and presented to the VMware cluster at that
location are, however, bound to that site and cannot be nondisruptively migrated to
the second site while providing protection against unplanned events. The need to
host a set of virtual machines on non-replicated virtual volumes could be driven by a
number of reasons including business criticality of the virtual machines hosted on
those datastores.
[6] The creation of a shared datastore that is visible to VMware ESX hosts at both sites is enabled by creating a distributed device in EMC VPLEX Metro. A detailed discussion of the procedures to create distributed devices is beyond the scope of this paper. Readers should consult the TechBook EMC VPLEX Architecture and Deployment: Enabling the Journey to the Private Cloud for further information.
[7] It is possible to present a virtual volume that is not replicated to VMware clusters at both sites. In such a configuration, when the I/O activity generated at the site that does not have a copy of the data is not in the cache of the VPLEX cluster at that site, it is satisfied by the storage array hosting the virtual volume. Such a configuration can impose severe performance penalties and does not protect the customer in case of unplanned events at the site hosting the storage array.
Figure 65. View of the datastores and virtual machines used in this study
Figure 68. vCenter Server allows live migration of virtual machines between sites
Figure 69 shows a snapshot during the non-disruptive migration of a virtual machine
from one site to another. The figure also shows the console of the virtual machine
during the migration process, highlighting the lack of any impact to the virtual
machine during the process.
[8] Technologies such as Storage vMotion can be used to migrate the virtual machine to a VPLEX Metro virtual volume that is replicated and available at both sites, and thus enable the capability to migrate the virtual machine non-disruptively between sites. However, this approach adds unnecessary complexity to the process. Nonetheless, this process can be leveraged for transporting virtual machines that cannot tolerate the overhead of synchronous replication, or for a one-time migration of virtual machines between data centers.
2. The next step is the addition of the newly created device as a mirror to the existing
device that needs the geographical protection. This is shown in Figure 73 and, just
like the previous step, is independent of the host operating system utilizing the
virtual volumes created using the devices.
Figure 73. Changing the protection type of a RAID 0 VPLEX device to distributed RAID 1
3. Create or change the LUN masking on the EMC VPLEX Metro to enable the VMware
ESX hosts attached to the nodes at the second site to access the virtual volume
containing the replicated devices. Figure 74 shows the results after the execution
of the process.
Figure 74. Creating a view to expose the VPLEX virtual volume at the second site
4. The newly exported VPLEX virtual volume that contains replicated devices needs
to be discovered on the VMware cluster at the second site. This process is the
same as adding any SCSI device to a VMware cluster. Figure 75 shows that the
replicated datastore is now available on the VMware clusters at both Site A and
Site B after a rescan of the SCSI bus.
Figure 75. Viewing VMware ESX hosts that have access to a datastore
a virtual machine can expose interesting challenges in case of a site failure. This is
especially true if the vCenter Server is used to manage VMware environments also
deployed on the same EMC VPLEX Metro cluster.
As discussed in previous paragraphs, in case of a site failure or network partition
between the sites, EMC VPLEX automatically suspends all of the I/Os at one site. The
site at which the I/Os are suspended is determined through a set of rules that is
active when the event occurs. This behavior can increase the RTO in case of a site
failure when the VMware vCenter Server is located on an EMC VPLEX distributed
volume that is replicated to both sites. The issue can be best elucidated through the use of an
example.
Consider a VMware environment deployment in which vCenter Server and SQL Server
are running on separate virtual machines. However, the two virtual machines are
hosted on a replicated EMC VPLEX device, D, between two sites, A and B. In this
example, let us assume that the vCenter Server and SQL Server are executing at Site
A. The best practices recommendation would therefore dictate that the I/Os to device
D be suspended at Site B in case of link or site failure. This recommendation allows
the virtual machines hosting the vSphere management applications to continue
running at Site A in case of network partition [9]. However, if a disruptive event causes
all service at Site A to be lost, the VMware environment becomes unmanageable
since the instance of device D at Site B would be in a suspended state unless VPLEX
witness technology is deployed. To recover from this, a number of corrective actions
listed below would have to be performed:
1. The I/Os to device D at site B have to be resumed in case VPLEX Witness
technology is not used. This can be done through the VPLEX management
interface.
2. Once the I/Os to device D have been resumed, the vSphere Client should be
pointed to one of the ESX hosts at Site B that has access to the datastore hosted
on device D.
3. The virtual machines hosting the vCenter Server and SQL Server instances have to
be registered using the vSphere Client.
4. After the virtual machines are registered, the SQL Server should be started first.
5. Once the SQL Server is fully functional, vCenter Server should be started.
These steps would restore a fully operational VMware management environment at
Site B in case of a failure at Site A.
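The five steps above form a strict dependency chain: I/O must be resumed before the virtual machines can be registered, and SQL Server must be running before vCenter Server is started. As an illustration of that ordering (the step names are taken from the text; the ordering helper is generic Python), a topological sort over the dependencies reproduces the documented sequence:

```python
# The recovery steps form a strict dependency chain. Expressing them as a
# predecessor graph and applying a topological sort yields the documented
# order, with the I/O resume first and the vCenter Server start last.
from graphlib import TopologicalSorter

DEPENDS_ON = {
    "resume I/O to device D at Site B": [],
    "point vSphere Client at an ESX host at Site B": [
        "resume I/O to device D at Site B"],
    "register vCenter/SQL virtual machines": [
        "point vSphere Client at an ESX host at Site B"],
    "start SQL Server": ["register vCenter/SQL virtual machines"],
    "start vCenter Server": ["start SQL Server"],
}

recovery_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

Encoding the order as dependencies, rather than a flat list, makes the rationale explicit: vCenter Server cannot start before its database is available, and nothing can start before the datastore is writable.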
The example above clearly shows that hosting a vCenter Server on a replicated VPLEX
Metro device can impose additional complexity in the environment in case of a site
failure. There are two possible techniques that can be used to mitigate this:
[9] It is important to note that in case of a network partition, the virtual machines executing at Site B on devices that have rules to suspend I/O at Site A in case of a link failure continue to run uninterrupted. However, since the vCenter Server located at Site A has no network connectivity to the servers at Site B, the VMware ESX environment at Site B cannot be managed. This includes unavailability of advanced functionality such as DRS and VMotion.
vCenter Server and SQL Server should be hosted on non-replicated EMC VPLEX
devices. VMware Heartbeat can be used to transparently replicate the vCenter
data between the sites and provide a recovery mechanism in case of site failure.
This solution allows the vCenter Server to automatically fail over to the surviving
site with minimal to no additional intervention. Readers should consult the
VMware vCenter Server Heartbeat documentation for further information.
vCenter Server and SQL Server can be located at a third and independent site (for
example, the site at which the VPLEX Witness machine is located) that is not
impacted by the failure of the site hosting the VMware ESX hosts. This solution
allows the VMware management services to be available even during network
partition that disrupts communication between the sites hosting the EMC VPLEX
Metro.
Customers should decide on the most appropriate solution for their environment after
evaluating the advantages and disadvantages of each.
Conclusion
EMC VPLEX running the EMC GeoSynchrony operating system is an enterprise-class
SAN-based federation technology that aggregates and manages pools of Fibre
Channel attached storage arrays that can be either collocated in a single data center
or spread across multiple data centers separated by MAN distances. Furthermore,
with a unique scale-up and scale-out architecture, EMC VPLEX's advanced data
caching and distributed cache coherency provide workload resiliency; automatic
sharing, balancing, and failover of storage domains; and both local and remote data
access with predictable service levels. A VMware vSphere data center backed by the
capabilities of EMC VPLEX provides improved performance, scalability, and
flexibility. In addition, the capability of EMC VPLEX to provide non-disruptive,
heterogeneous data movement and volume management functionality within
synchronous distances enables customers to offer nimble and cost-effective cloud
services spanning multiple physical locations.
References
The following document includes more information on VPLEX and can be found on
EMC.com and Powerlink:
Implementation and Planning Best Practices for EMC VPLEX Technical Notes
The following document includes more information on EMC with VMware products
and can be found on EMC.com and Powerlink:
EMC PowerPath/VE for VMware vSphere Version 5.4 and Service Pack Installation
and Administration Guide (Powerlink only)