Business Continuity
Overview
Alan McSweeney
Objectives
• Types of Storage
• Enabling Greater Resource Utilisation Through Storage System Virtualisation
• Business Continuity and Disaster Recovery
• System Center Operations Manager (SCOM)
• Managing Disk Based Backup Through Storage Virtualisation Single Instance Storage
(Deduplication)
• Enabling Greater Data Management Through Storage System SnapShots
• Enabling Greater Application Resilience Through SnapShot Technologies
• Enabling Greater Data Resilience Through Storage System Mirroring
• Easing the Pain of Development Through SnapShot Cloning
• Rapid Microsoft Exchange Recovery through Storage Systems Technologies
• Rapid Microsoft SQL Recovery through Storage Systems Technologies
• Rapid Recovery of Oracle DB Through Storage Systems Technologies
• Server Virtualisation and Storage
• Storage Management and Business Continuity/Disaster Recovery
• Storage Management and WAN
• DAS
• NAS
• SAN
Storage Protocols
• CIFS
− Common Internet File System
− Predominantly Windows Environments
• NFS
− Network File System
− Non Windows Environments
• Unix, Linux, NetWare, VMware
• Fibre Channel
− Uses Fibre Channel Switches
• FC-AL
• 1Gb, 2Gb, 4Gb
• iSCSI
− Uses Ethernet Switches
• 1Gb
• 10Gb
• Inexpensive
− Use of large capacity SCSI and SATA drives
− No added expense for controllers
• Performance
− Dedicated disk array with various cache options
• Skill Levels
− No new skill levels required to manage storage
• Captive Storage
− Storage can only be used by one server
• Performance
− Disk Arrays may be limited to the number of drives that can be
used
− Backups can be slow and inconsistent
• Expense
− Can be expensive in terms of wasted disk space.
• Expense
− Can be expensive relative to cost of single server
• Performance
− Depending on protocol
• Database Support
− No support for MS SQL or MS Exchange
• Skill Levels
− May require new skill sets
• High Performance
− IO/s
− Disk Utilisation
• Resilience
− SnapShots
− Mirroring
− Replication
• Scalability
− Scales to PB
• Costs
− Initial Capital Cost
− Running Costs
− Maintenance
• Skill Sets
− New skill sets will be required
• Compatibility
− Most vendors require ‘Fork Lift’ upgrades
• Business Risk
− Lose the SAN and lose data from many servers
− Maximum resilience is a must
November 26, 2009
Which Storage Solution is Right for Me?
[Figure: two Windows servers and a UNIX server connected to storage via iSCSI over a GbE switch, CIFS and NFS, and FCP over an FC fabric]
• RAID 0
− No fault tolerance
• RAID 1
− Hardware Mirror
• RAID 4
− Single dedicated parity drive
• RAID 5
− Distributed parity
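The parity used by RAID 4 and RAID 5 can be illustrated with XOR. The following is a minimal sketch, assuming equal-length byte blocks; the function names and sample data are illustrative only and do not come from any storage product.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])

# Three hypothetical data blocks, one per data drive.
data = [b"\x03\x01", b"\x02\x03", b"\x01\x01"]
parity = xor_blocks(data)  # what the parity drive (or parity stripe) stores

# Simulate losing drive 1 and rebuilding it from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])  # True
```

RAID 4 keeps all such parity blocks on one dedicated drive, while RAID 5 spreads them across the drives; the calculation itself is the same.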
• Description
− Diagonal-Parity RAID — two parity drives per RAID group
• Benefits
− 2000~4000X data protection compared to RAID 4 or 5
− Protects against 3 modes of double disk failure
• Concurrent failure of any 2 disks (very rare)
• 2 simultaneous disk uncorrectable errors (also very rare)
• A failed disk and an uncorrectable error (most likely)
− Comparable operational cost to RAID 4
• Equivalent performance for nearly all workloads
• Equally low parity capacity overhead supported
− Less system impact during RAID reconstruction
D   D   D   D   P   DP
3   1   2   3   9   7
1   1   2   1   5   12
2   3   1   2   8   12
1   1   3   2   7   11
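The numbers in the worked example can be reproduced with a short calculation. This is a sketch under two assumptions: parity is shown as a simple sum (as the slide does for readability; real implementations use XOR), and the diagonal layout used here is inferred from the slide's numbers rather than taken from vendor documentation.

```python
# Data blocks from the slide: 4 stripes (rows) across 4 data disks (columns).
data = [
    [3, 1, 2, 3],
    [1, 1, 2, 1],
    [2, 3, 1, 2],
    [1, 1, 3, 2],
]

def row_parity(rows):
    """Row parity (P): the sum of the data blocks in each stripe."""
    return [sum(row) for row in rows]

def diagonal_parity(rows, parity):
    """Diagonal parity (DP): sums along diagonals spanning data disks and P."""
    grid = [row + [p] for row, p in zip(rows, parity)]
    width = len(grid[0])           # data disks plus the row-parity disk
    dp = [0] * len(grid)           # one stored diagonal per stripe
    for r, stripe in enumerate(grid):
        for c, value in enumerate(stripe):
            d = (c - r) % width    # assumed diagonal numbering
            if d < len(dp):        # one diagonal is left unstored
                dp[d] += value
    return dp

p = row_parity(data)
dp = diagonal_parity(data, p)
print(p)   # [9, 5, 8, 7]
print(dp)  # [7, 12, 12, 11]
```

Because every block sits on one row and one diagonal, two lost disks can be rebuilt by alternating between row and diagonal equations.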
[Figure: timeline showing RPO and RTO relative to the last backup/copy and the system loss]
Options and Issues
• Virtualised infrastructure
− Virtualise secondary and/or primary server infrastructure
• Data replication software
− DoubleTake
− WANSync
• Hardware replication
1. Core server infrastructure virtualised for resilience and fault tolerance
2. Centralised server management and backup
3. SAN for primary data storage
4. Backup to disk for speed
5. Tape backup to LTO3 autoloader for high capacity
6. Two-way data replication
1. Servers backed-up to low cost disk - fast backup and reduced backup window
2. Disk backup copied to tape - tape backup to LTO3 autoloader for high capacity and reduced manual intervention
3. Move tapes offsite
• Virtual infrastructure in VMware HA (High Availability) Cluster
• Fault tolerant primary infrastructure
• Failing virtual servers automatically restarted
• Dynamic reallocation of resources
• SAN replication at hardware level
• Very high bandwidth requirements: > 1 Gbps each way
• Not all SANs support hardware replication
• Very fast recovery
• Can be an expensive option
• Scripted replication of disk backup data
• Recovery to last backup
• Low bandwidth requirements
• Low cost option
• Operational Disaster Recovery And Business Continuity Plan
• Primary Infrastructure Designed for Resilience and Recoverability
• DR Recovery Facility
• Processes And Procedures
• Hardware bottlenecks
− Need a separate target recovery server for each of the primary servers
under test
− If doing “bare metal” restore, need to locate target recovery hardware
matching exactly the primary server configurations
• Lengthy process with manual interventions
− Configure hardware and partition drives
− Install Windows and adjust Registry entries
− Install backup agent
− Before recovering automatically with the backup server
• Personnel not trained
− Complex processes and limited equipment availability make it difficult to
train personnel
• Hardware Independence
− Flexibility to restore to any hardware
• Hardware Consolidation / Pooling / Oversubscription
− Test recovery of all systems to one physical server
• Speed up recovery
− Use pre-configured templates with pre-installed OS & backup agent
• Single-step simplified capture and recovery
− Different purposes — same procedures — Staging, Deployment, Disaster
Recovery
− One step system and application recovery
− No additional licensing requirements for bare metal restore tools
− More trained personnel available
[Figure: recovery steps compared. Physical to Physical: find hardware, configure hardware / partition drives etc., install operating system, install backup agent, then “single-step automatic recovery” from backup server; repeat for each box. Physical to Virtual with Templates: find hardware and install VMware (do once), then “single-step automatic recovery” from backup server]
•1 - Physical to Physical
•2 - Physical to Virtual
•3 - Virtual to Virtual
• Hardware
− Synchronous — data is written simultaneously to both SANs. The write
operation is not completed until both individual writes are completed. This
will require a communications link between both sites operating at least 1
Gbps.
− Asynchronous — data is not written real-time to the backup unit. Data is
buffered and written in blocks. This will require a communications link
between both sites operating at least 2 Mbps.
• Software
− CommVault QiNetix ContinuousDataReplicator
− DoubleTake
− RepliStor
− WANSync
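The difference between synchronous and asynchronous replication described above can be sketched as follows. This is an illustrative model only: the class names, the in-memory dictionaries standing in for the two SANs, and the batch size are all assumptions, not any vendor's implementation.

```python
class SyncMirror:
    """Synchronous: a write completes only after both sites hold the data."""
    def __init__(self):
        self.primary, self.secondary = {}, {}

    def write(self, block, data):
        self.primary[block] = data
        self.secondary[block] = data    # remote write finishes before the ack
        return "ack"

class AsyncMirror:
    """Asynchronous: writes are acknowledged locally, then shipped in batches."""
    def __init__(self, batch_size=3):
        self.primary, self.secondary = {}, {}
        self.pending = []               # buffered writes not yet at the backup unit
        self.batch_size = batch_size

    def write(self, block, data):
        self.primary[block] = data
        self.pending.append((block, data))
        if len(self.pending) >= self.batch_size:
            self.flush()
        return "ack"

    def flush(self):
        """Send the buffered block of writes to the secondary site."""
        for block, data in self.pending:
            self.secondary[block] = data
        self.pending.clear()

    def unreplicated(self):
        """Writes that would be lost if the primary site failed right now."""
        return len(self.pending)

sync, lazy = SyncMirror(), AsyncMirror(batch_size=3)
for i in range(4):
    sync.write(i, f"data{i}")
    lazy.write(i, f"data{i}")

print(len(sync.secondary), len(lazy.secondary), lazy.unreplicated())  # 4 3 1
```

The pending buffer is what lets asynchronous replication tolerate a much slower link, at the cost of a non-zero amount of data at risk between flushes.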
[Figure: two virtual server layouts, one running VM1 to VM4 and one running VM1 to VM8]
VMWare Enterprise for two 2-processor servers and VirtualCentre     €12,890.63   €2,511.72   €15,402.34   €2,511.72
VMWare Enterprise for four 2-processor servers and VirtualCentre    €21,875.00   €4,398.44   €26,273.44   €4,398.44
VMWare Enterprise for four 4-processor servers and VirtualCentre    €39,843.75   €8,171.88   €48,015.63   €8,171.88
• Tangible savings
− Server purchases
− Operational costs
− Administration costs
− Power, HVAC
− Deferred cost
• Intangible savings
− Faster server provisioning
− Better utilisation
− Reduced floorspace
− Improved business continuity and disaster recovery
• 16 servers to be virtualised
• Avoid 4 new servers a year
Virtualisation Project     Initial       Year 1       Year 2       Year 3       Total
Software                   €21,900.00    €6,100.00    €6,100.00    €6,100.00    €6,100.00
Hardware                   €16,000.00
Procurement                €800.00
Project Costs              €25,000.00
Server Operation                         €3,489.40    €3,489.40    €3,489.40
Maintenance and Support                  €12,000.00   €12,000.00   €12,000.00
Server Administration                    €573.73      €573.73      €573.73
Total                      €63,700.00    €22,163.13   €22,163.13   €22,163.13   €130,189.38
Saving                                                                          €120,171.68
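The totals in the 16-server scenario can be checked with a few lines of arithmetic. This sketch only re-adds the figures shown in the table; the variable names are illustrative, and how the saving figure was derived is not shown in the source, so it is not recomputed here.

```python
from decimal import Decimal

# One-off costs from the "Initial" column of the 16-server scenario.
initial = {
    "Software": Decimal("21900.00"),
    "Hardware": Decimal("16000.00"),
    "Procurement": Decimal("800.00"),
    "Project Costs": Decimal("25000.00"),
}

# Recurring costs from the "Year 1" to "Year 3" columns (same every year).
annual = {
    "Software": Decimal("6100.00"),
    "Server Operation": Decimal("3489.40"),
    "Maintenance and Support": Decimal("12000.00"),
    "Server Administration": Decimal("573.73"),
}

years = 3
initial_total = sum(initial.values())
annual_total = sum(annual.values())
three_year_total = initial_total + years * annual_total

print(initial_total)     # 63700.00
print(annual_total)      # 22163.13
print(three_year_total)  # 130189.39
```

The result agrees with the table's total to within one cent, which is consistent with the table having been computed from unrounded yearly values.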
• 32 servers to be virtualised
• Avoid 6 new servers a year
Virtualisation Project     Initial       Year 1       Year 2       Year 3       Total
Software                   €29,900.00    €8,300.00    €8,300.00    €8,300.00    €8,300.00
Hardware                   €32,000.00
Procurement                €1,600.00
Project Costs              €50,000.00
Server Operation                         €6,978.80    €6,978.80    €6,978.80
Maintenance and Support                  €20,000.00   €20,000.00   €20,000.00
Server Administration                    €1,147.45    €1,147.45    €1,147.45
Total                      €113,500.00   €36,426.25   €36,426.25   €36,426.25   €222,778.75
Saving                                                                          €221,107.16
• 64 servers to be virtualised
• Avoid 8 new servers a year
Virtualisation Project     Initial       Year 1       Year 2       Year 3       Total
Software                   €45,900.00    €12,700.00   €12,700.00   €12,700.00   €12,700.00
Hardware                   €64,000.00
Procurement                €3,200.00
Project Costs              €75,000.00
Server Operation                         €13,957.60   €13,957.60   €13,957.60
Maintenance and Support                  €25,000.00   €25,000.00   €25,000.00
Server Administration                    €2,294.90    €2,294.90    €2,294.90
Total                      €188,100.00   €53,952.50   €53,952.50   €53,952.50   €349,957.51
Saving                                                                          €424,141.93
• Dell/EMC
− AXnnn - iSCSI
− NSxxx — IP
− CXnnn — Fibre Channel
− DMX
− Centera
• IBM
− DS series
− N Series — multi-protocol
• HP
− MSA
− EVA
− XP
• Agentless Monitoring
− SCOM monitors agentless servers. This is aimed at IT
environments where agents could not be installed on a few
exception nodes. Agentless monitoring is limited to status
monitoring only.
• Agent Support
− Agents are installed on servers. SCOM lets you manage
applications running on servers.
• Server Discovery Wizard
− Allows for server lists to be imported from Active Directory,
from a file, or from a typed list. It also allows the list to be
filtered using LDAP queries, as well as name-based and domain
name-based wildcards.
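Wildcard filtering of an imported server list can be pictured as below. This is a generic sketch using Python's standard `fnmatch` module, not SCOM's own API; the function name, the sample host names and the patterns are all hypothetical.

```python
from fnmatch import fnmatch

def filter_servers(servers, name_pattern="*", domain_pattern="*"):
    """Keep fully qualified names whose host and domain parts match the wildcards."""
    selected = []
    for fqdn in servers:
        name, _, domain = fqdn.partition(".")   # split host name from domain name
        if (fnmatch(name.lower(), name_pattern.lower())
                and fnmatch(domain.lower(), domain_pattern.lower())):
            selected.append(fqdn)
    return selected

# Hypothetical list, as might be typed in or imported from a file.
servers = [
    "sql01.corp.example.com",
    "sql02.lab.example.com",
    "exch01.corp.example.com",
]

print(filter_servers(servers, name_pattern="sql*", domain_pattern="corp.*"))
# ['sql01.corp.example.com']
```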
Rule
− Rules, Overrides
− Scripts
− Computer attributes
− Views
− SCOM Server and Agent Configurations
− Nested Computer Groups
− Extensible schema for classes, attributes and associations
• State View - Provides you with a real-time, consolidated look at the health of
the computers within the managed environment by server role, such as Active
Directory domain controllers, highlighting the systems that require attention.
• Diagram View - Gives you a variety of topological views where the existence
of servers and relationships are defined by management packs. The Diagram
View allows you to see the status of the servers, access other views, and
launch context-sensitive actions, helping you navigate quickly to the root of
the problem.
• Alerts View - Provides a list of issues requiring action and the current state and
severity of each alert. It indicates whether the alerts have been acknowledged,
escalated, or resolved, and whether a Service Level Agreement has been
breached.
• Performance View - Allows you to select and display one or more
performance metrics from multiple systems over a period of time.
• Events View - Provides a list of events that have occurred on managed servers,
a description of each event, and the source of the problem.
• Computers and Groups View - Allows you to see the groups to which a
computer belongs, the processing rule groups with which it is associated, as
well as the attributes of the computer.
• If you highlight that alert its details will appear in the Alert detail
Pane
• Clicking on the “Properties” tab in the Alert Detail Pane will give
you the description (and other details) of the alert
− False Negative
− Hardware Issue
− Non Hardware Issue
[Figure: 14 x 72 GB disks = 1 TB capacity, split into three separate RAID groups (1 data + 1 parity, 7 data + 1 parity, 2 data + 1 parity) plus one spare, holding Vol0, Database and Home Directories; capacity labels 140 GB, 370 GB and 40 GB]
• Create Storage Pool
[Figure: storage pool holding vol1, vol2 and vol3, with 1 hot spare]
[Figure: 14 x 72 GB disks = 1 TB capacity in a single aggregate (11 data disks, 2 parity disks, 1 spare) holding Vol0, Database and Home Dirs; 400 GB used]
Platforms
Software &
Processes
An external server running MS Windows and CLARalert is required to support CX dial/email home support (compare to AutoSupport).
[Figure: 9AM, 12PM and 3PM snapshots of changed primary data blocks on primary storage; block-level backups from primary to secondary storage; instant recovery; client drag-and-drop restores]
• Snapshots replace a large portion of the “oops!” reasons that backups are
normally relied upon for:
− Accidental data deletion
− Accidental data corruption
[Figure: disk blocks A, B, C and the changed block C’]
Avoids the significant costs associated with the I/O bandwidth, downtime, CPU
cycles dedicated to copying and managing entire volumes
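Why a snapshot avoids copying whole volumes can be seen in a small model of the A, B, C, C’ picture. This is a sketch under the assumption that a snapshot is a saved set of block pointers and that changed data is written to a new block; the class and method names are hypothetical, not a storage system's API.

```python
class Volume:
    """Toy volume: a block store plus pointer lists for the active file system and snapshots."""
    def __init__(self, blocks):
        self.store = dict(enumerate(blocks))   # physical block id -> contents
        self.active = list(self.store)         # active file system: block ids in order
        self.snapshots = {}

    def snapshot(self, name):
        """Record the current pointers; no data blocks are copied."""
        self.snapshots[name] = list(self.active)

    def write(self, index, contents):
        """Write changed data to a new block, leaving the old block for snapshots."""
        new_id = max(self.store) + 1
        self.store[new_id] = contents
        self.active[index] = new_id

    def read(self, index, snapshot=None):
        pointers = self.snapshots[snapshot] if snapshot else self.active
        return self.store[pointers[index]]

    def restore(self, name):
        """Revert the active file system to the snapshot's pointers."""
        self.active = list(self.snapshots[name])

vol = Volume(["A", "B", "C"])
vol.snapshot("snap")
vol.write(2, "C'")                       # C is changed to C'

print(vol.read(2), vol.read(2, "snap"), len(vol.store))  # C' C 4
vol.restore("snap")
print(vol.read(2))                       # C
```

Only one extra block was stored for the change, and the restore touched pointers rather than data, which is why it completes quickly regardless of volume size.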
[Figure: snapshot restore after corruption of the active file system; blocks 1 2 … N are shown alongside changed blocks 1’ 2’ … N’, with a timeline of snapshots 1 to 9 (label: 15:22)]
• Storage Mirroring
− Synchronous
− Semi Synchronous
− Asynchronous
• Runs over IP or FC
[Figure: mirroring from source to target over LAN/WAN, for SAN or NAS attached hosts]
• Baseline: Snap A, baseline transfer completed
• Step 2: Updates
− Periodic updates of changed blocks
− Snap B: incremental transfer completed (Snap A is the common snapshot)
− Snap C: incremental transfer completed
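The baseline-then-incremental sequence above can be sketched as a comparison of successive snapshots. This is a simplified illustration: snapshots are modelled as dictionaries of block number to contents, the function names are hypothetical, and block deletions are not handled.

```python
def changed_blocks(previous, current):
    """Blocks that are new or different in `current` relative to `previous`."""
    return {blk: data for blk, data in current.items() if previous.get(blk) != data}

def apply_update(target, delta):
    """Apply a transferred set of blocks to the mirror target."""
    target.update(delta)

snap_a = {0: "a", 1: "b", 2: "c"}            # baseline snapshot on the source
snap_b = {0: "a", 1: "B", 2: "c", 3: "d"}    # later snapshot on the source

target = {}
apply_update(target, changed_blocks({}, snap_a))   # baseline transfer: everything

delta = changed_blocks(snap_a, snap_b)             # incremental transfer
apply_update(target, delta)

print(sorted(delta), target == snap_b)  # [1, 3] True
```

Only blocks 1 and 3 cross the link in the update, which is what keeps periodic mirroring practical over a LAN or WAN.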
• Cascading Mirrors
− Replicated mirrors on a larger scale
• Disaster recovery
− Replication to “hot site” for mirror failover and eventual
recovery
Data Replication for Warm Backup/Offload
• For corporations with a warm backup site, or that need to offload backups from
production servers
• For generating queries and reports on near-production data
Backup Site
MAN/WAN
Tape
Library
Production Sites
Isolate Testing from Production
[Figure: the Snap C incremental transfer is interrupted (X) and later followed by a SnapMirror resync over the WAN; Office 4 and Office 5 redirect when the production site is unavailable (X), over LAN/WAN]
• Create a clone: Volume 2 (Clone), a new volume based on the Snapshot copy
• Modify the original vol
• Modify the cloned vol
• Data Written to Disk: Volume 1, Snapshot Copy, Changed Blocks (original volume), Changed Blocks (Cloned Volume)
• Result: Independent volume copies, efficiently stored
Volume Splitting
Volume 2
Result:
Easily create new
permanent volume for
forking project data
The Pain of Development
1.4 TB Storage Solution
Prod Volume (200 GB)
200 GB Free
Clones Remove the Pain
1.4 TB Storage Solution
Prod Volume (200 GB), Test Volume
1 TB Free
Create Clones of the Volume – no additional space required
Start working on Prod Volume and Cloned Volume
Only changed blocks get written to disk!
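The space behaviour of clones can be sketched with a shared block store. This is a toy model under the assumption that a clone starts as a copy of the parent's block pointers and that every write allocates a new block; the class names are hypothetical and the four-block volume stands in for the 200 GB production volume.

```python
class BlockStore:
    """Shared pool of physical blocks."""
    def __init__(self):
        self.blocks = []

    def allocate(self, data):
        self.blocks.append(data)
        return len(self.blocks) - 1

class CloneableVolume:
    """Volume made of pointers into a shared block store."""
    def __init__(self, store, pointers):
        self.store, self.pointers = store, pointers

    @classmethod
    def create(cls, store, contents):
        return cls(store, [store.allocate(c) for c in contents])

    def clone(self):
        """A clone copies pointers only, so it needs no additional data blocks."""
        return CloneableVolume(self.store, list(self.pointers))

    def write(self, index, data):
        """Changed data goes to a newly allocated block."""
        self.pointers[index] = self.store.allocate(data)

    def read(self, index):
        return self.store.blocks[self.pointers[index]]

store = BlockStore()
prod = CloneableVolume.create(store, ["p0", "p1", "p2", "p3"])
test = prod.clone()
used_after_clone = len(store.blocks)

test.write(1, "t1")     # work on the cloned volume
prod.write(3, "p3'")    # work on the production volume

print(used_after_clone, len(store.blocks), prod.read(1), test.read(1))  # 4 6 p1 t1
```

Creating the clone left usage at four blocks; the two later writes added one block each, and neither volume sees the other's change.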
[Figure: primary production array mirrored to a secondary array]
• SnapShot Management
− Rapid online backups and restores: integrates with Exchange backup
API; runs ESEFILE verification; automates log replay
− Intuitive GUI and wizards for configuration, backup, and restore
• Server Based Connection Manager
− Dynamic disk and volume expansion
− Supports both Ethernet and Fibre Channel environments
− Supports MSCS and NS Series CFO for high availability
• Single mailbox recovery software
− Restores single message, mailbox, or folder from a Snapshot™ backup to
a live Exchange server or a .pst file
• SnapShot Mirroring
− Automatic mirroring of Exchange data to remote site
− Volume based mirroring
− Occurs immediately following an Exchange backup and is
initiated by Exchange Server
− Can replicate over LAN or WAN
− Only changed blocks since previous mirror are replicated
− Rate of replication can be throttled to minimize impact on
network
• PowerControls Software
− Quickly access Exchange data already stored in the online
snapshot backups
− Select any data, down to a single message
− Restore the data to one of two locations:
• An offline mail file (.PST personal storage file) which can be opened in
MS Outlook
• Connect to a live Exchange server and copy data directly into the user's
mailbox, making it instantly available
• Dramatically reduces the time required for single
mailbox and single message recovery
− From hours or days to just minutes
− Simplifies the most dreaded task by Exchange administrators
• Eliminates the need for expensive, cumbersome and
disk-intensive daily brick level backups
• Eliminates the need for recovery server infrastructure
• Allows easy search and discovery of email messages
and attachments
• Provides integrated data management for SQL Server
2000 and SQL Server 2005 databases
− Automated, fast, and space-efficient backups using Snapshots
− Automated, fast, and granular restore and recovery using
SnapShot restore technologies
− Integrated with storage system Mirroring for database replication
• Provides tight integration with Microsoft technologies
such as MSCS and Volume Mount Points.
• Volume Mount Point Support: support for Volume Mount Points in order to eliminate the
limitation with drive letters
• Native x64 support: supports 64-bit natively on AMD64/EM64T
DBA:
• Ability to backup DB faster with fewer resources and without any
storage knowledge
• Reduces Mean Time to Recovery on failure
− Quick Restores
− More frequent backups, so fewer logs to replay and faster recovery
Storage Admin:
• Ability to backup and restore DB without any DB knowledge
• Space, time & infrastructure efficient backups, restores and clones
• Increased productivity and storage utilization
iSCSI or FCP
Benefits:
• Simplified, centralized management
• Shared storage for improved utilization
• Better system availability
Snapshots
Benefits:
• Eliminate backup windows
• Automation reduces manual errors
• More frequent backups reduce data loss
• No performance degradations
Snapshot restore, then roll transaction logs (time to restore: minutes)
Benefits:
• Fast and accurate restoration of SQL Server
• Reduce downtime from outages
• Automation saves administrative time
System Mirroring
• All SQL SnapShot related files can reside on a mounted
volume, same as that of a Standard Volume:
− SQL user databases
− SQL system databases
− SQL Server transaction log file
− SnapInfo directory
• Configuration wizard can be used to migrate database
files to a mounted volume, same as that of a Standard
Volume.
− The rules applicable for migrating databases to Standard
Volume will apply for Volume Mount Point also.
Monitor Utilization
Manage Storage System from Oracle Enterprise Manager 10g Grid Control
Oracle ASM
[Figure: Oracle storage stack without and with ASM. Tables, tablespaces and files map either through file names, file system and logical volumes to disks, or through Automatic Storage Management to a disk group, on networked storage (SAN, NAS, DAS)]
Performance
  Stripe data across ASM Disks                   Yes   No    Yes
Storage Utilization
  Free space management across physical disks    No    Yes   Yes
  Thin provisioning of ASM Disks                 No    Yes   Yes
  Space efficient Cloning                        No    Yes   Yes
Data Protection
  Storage Snapshot based Backups                 No    Yes   Yes
Application-Based Management
Integration and
Automation
Server-Based Management
Data Sets
and Policies
Storage Systems
• Database cloning
− Ability to clone consistent copies of online databases
− GUI support for cloning
− Added support for context sensitive cloning
• Increased footprint of platforms and protocols
− Support for additional flavors of Unix
• SuSE 9, RHEL3/4 U3+, Solaris 9/10
− 32-bit and 64-bit
− NFS, iSCSI and FCP for various Unix platforms
− HP-UX and AIX (NFS)
− (Refer to compatibility matrix for specific details)
• Product hardening
− Increased product stability and usability
− Improved performance by utilizing snapshot vs. safecopy
− Increased performance when dealing with a high number of archive logs
Challenges
• DBA’s time spent on non-value-add backup/restore tasks
• Cold backups lead to lower
SLAs
• Separate backups on each
platform
• Time-to-recover from tape
becomes prohibitive
[Figure: bar chart of time in hours (0 to 8). Time to backup: to tape (60 GB/hr best case) versus Snapshot. Time to recover: from tape plus redo logs versus SnapRestore plus redo logs]
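The chart's tape figures follow from simple division by the best-case rate of 60 GB per hour. The sketch below uses that rate from the slide; the database sizes are hypothetical and chosen only to show the scale.

```python
def tape_hours(size_gb, rate_gb_per_hour=60):
    """Best-case hours to stream a database of `size_gb` to or from tape."""
    return size_gb / rate_gb_per_hour

for size_gb in (120, 300, 480):          # hypothetical database sizes
    print(f"{size_gb} GB -> {tape_hours(size_gb):.1f} h")
# 120 GB -> 2.0 h
# 300 GB -> 5.0 h
# 480 GB -> 8.0 h
```

Tape time grows linearly with database size, whereas a snapshot's duration does not depend on the amount of data held.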
SnapShot Management with Oracle
Automates Backup and Recovery
• Backups in seconds
• Snapshot copies verified
• Near instantaneous restores
• Dramatically shortened recovery with automated log replays
• Automated recovery tasks
Benefits:
• Extremely fast and efficient
• No performance degradation
• Accurate data restore and recovery
• Reduce downtime from outages
• Automation reduces errors and saves time
[Figure: primary data center with DB server and storage system, showing Snapshot and SnapShot Restore]
[Figure: database lifecycle stages: Implement, Upgrade, Patch, Deploy, Tune & Maintain]
Pain Points
• Upgrade testing requires duplicate data: a lengthy and expensive process
• Need a reliable backup and recovery solution
Solutions
• Flexible Clone: fast and space-efficient data duplication
• Create several clones with FlexClone, automate with SMO
• Mirror production data to test and dev system
• Backup and recovery solution with Snapshots, SnapShot Restore, Storage Mirroring, ReplicatorX
Server Virtualisation and Storage
Alan McSweeney
alan@alanmcsweeney.com