
TSM Symposium 2013: Tivoli Storage Manager: Future Expectations Vendor Talks

IBM Tivoli Storage Manager for Virtual Environments - Data Protection for VMware: Solution Design

Dan Wolfe
Tivoli Storage SWAT

17.-20. September 2013, Hilton Hotel Gendarmenmarkt, Berlin, Germany
Agenda: The four Ss of solution Design for DP for VMware

Strategizing (Planning)
Sizing
Scheduling
Support (for recovery/restore procedures)

NOTE: Familiarity with TSM for Virtual Environments concepts is a prerequisite; other sessions at the TSM
Symposium cover these concepts
2 2013 IBM Corporation
Or: Simple Steps to a Successful System Solution

STRATEGIZING



Strategizing: Key areas

Transition from in-guest backup

vSphere architecture: vCenters, Datacenters, Clusters, etc.

Backup storage device selection (e.g., disk, file, tape, VTL)



Example environment description

Parameter                 Value              Units

Utilized Data             400000             GB
Total VMs                 4000               Count
Total ESX Hosts           100                Count
Total Clusters            20                 Count
Daily Data Change Rate    2%                 Pct
Backup Window             10                 Hours
Retention                 Dev: 7, Prod: 30   Days
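
The table values can be turned into the basic daily workload figures used later in the sizing slides. The following is a minimal sketch that only restates the table arithmetic; the variable names are illustrative and not part of any TSM tooling.

```python
# Values copied from the example environment table.
utilized_data_gb = 400_000
total_vms = 4_000
daily_change_pct = 2          # daily data change rate, in percent
backup_window_hours = 10

# Data that changes per day and therefore needs an incremental backup.
daily_change_gb = utilized_data_gb * daily_change_pct / 100

# Aggregate throughput needed to move that data inside the backup window.
aggregate_gb_per_hour = daily_change_gb / backup_window_hours

# Average utilized data per VM, useful for restore estimates.
average_vm_size_gb = utilized_data_gb / total_vms

print(f"Daily changed data:      {daily_change_gb:.0f} GB")
print(f"Aggregate incremental:   {aggregate_gb_per_hour:.0f} GB/hour")
print(f"Average VM size:         {average_vm_size_gb:.0f} GB")
```

The aggregate figure is for steady-state incremental backups only; full backups, restores, and contingencies add to it.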



Transitioning from legacy in-guest backup: Differences

Not available with legacy backup:

Full image AND file-level restore from single backup image


One step full image restore (NOTE: TBMR can provide this with legacy in-guest)
Centralized file-level restore capability using DP for VMware mount
Eliminate CPU and i/o workload on VM guest during backup
Efficient, block-level backup
Automatic detection of new VMs
Individual VM registration not required for backup
Agentless
Queries/reports are filespace based (not node based)



Transitioning from legacy in-guest backup: Differences

Not available with VM image backup

Individual file exclude/include capability


However, VM image backup can exclude/include specific vmdks
Customized retention for files within a VM guest
Version selection for file-level restore (across all backups)



Transitioning from legacy in-guest backup: Key planning elements

Initial phase-in of full backups:


Establish timeline requirement for initial phase-in: e.g., how many weeks?
vBS (vStorage Backup Server) sizing must include phase-in
Temporary vBSs for phase-in can be used

What to do with down-level VMs?


ESX/ESXi 3.5 does not support CBT (Changed Block Tracking)
Consider continuing with in-guest backup until ESXi hosts are upgraded

Determine infrastructure requirements: image backup vs. legacy file-level


Daily incremental backup amount will be similar, with some inflation:
Disk-block (image) backup vs. file-level backup: some rounding will occur
No ability to exclude individual files: O/S files, swap files
Network infrastructure must be capable of workload
TSM server/s must be capable of workload

Application backups: Use in-guest backup only, or combine with VM image backup?



vSphere architecture and TSM for Virtual Environments

Collaboration between VMware architects and storage/backup architects is important


Consider how vSphere architecture corresponds to backup requirements
Architecture of vSphere environment can facilitate backup scheduling and policies
For example:
VMs with similar retention requirement grouped within the same cluster
Other groupings can be used:
Folders
Similar VM names (using v6.4 wildcard specification in backup schedule)
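
Grouping by name only works if the naming convention is consistent, so it can help to check which VMs a pattern would actually select before relying on it. The following is a minimal sketch using Python's generic shell-style matching; the VM names, schedule names, and patterns are hypothetical, and the matching rules of the TSM wildcard specification may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical VM names and naming patterns, for illustration only.
vm_names = ["prod-db-01", "prod-web-01", "dev-web-01", "test-app-02"]
schedule_patterns = {
    "PROD_30DAY": "prod-*",
    "DEV_7DAY": "dev-*",
}


def group_by_pattern(names, patterns):
    """Assign each VM to the first schedule whose pattern matches its name."""
    groups = {schedule: [] for schedule in patterns}
    groups["UNMATCHED"] = []
    for name in names:
        for schedule, pattern in patterns.items():
            if fnmatchcase(name, pattern):
                groups[schedule].append(name)
                break
        else:
            # No pattern matched: this VM would be missed by every schedule.
            groups["UNMATCHED"].append(name)
    return groups


groups = group_by_pattern(vm_names, schedule_patterns)
for schedule, members in groups.items():
    print(schedule, members)
```

VMs that land in the UNMATCHED group are the ones a name-based schedule scope would silently skip.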



vSphere architecture and TSM for Virtual Environments (2)

Distribution of VM storage capacity will help to determine vBS placement


Is capacity evenly distributed across clusters?
Evenly distributed sizes result in more evenly distributed vBSs
This is usually not possible
For example, clusters with larger amounts of storage may require multiple vBSs
Placement of vBSs important to balance workload

Identify network infrastructure for backup/restore (backup vs. production)


Using the right network interface requires planning and configuration
This is a critical factor when using NBD and LAN communication to the TSM server
VADP backup technology uses the vmkernel port and Management Network
VLAN should include the TSM server when using LAN communication

Determine scope of vMotion and TSM node assignments


Contain vMotion scope within a single TSM Datacenter node



Simplified example: vSphere Architecture

[Diagram: one vCenter Server manages two datacenters whose clusters use datastores (DS) on a shared Datastore SAN.]

Production Clusters (Prod. D.C.):
15 clusters
300TB
30 day retention

Dev/Test Clusters (Dev. D.C.):
5 clusters
100TB
7 day retention

261TB * 80% = 209TB



Backup storage pool device factors

Cost vs. restore performance


Evaluate tradeoff between restore performance and storage costs
Consider co-location by filespace (VM) with tape/VTL when possible
Specific, critical VMs can be configured as exceptions for management class
Highest performance storage device for subset of VMs

Deduplication strategy:
TSM native
NOTE: client-side deduplication will influence throughput estimates
Appliance
Hybrid:
Subset of VMs backup to TSM server instance with TSM deduplication
Subset of VMs backup to TSM server instance with deduplicating appliance
Deduplication ratio: benchmark to ensure realistic estimate is used



SIZING



Sizing: Estimate the resource requirements

Objectives of sizing:
Determine number of TSM server instances
Determine capacity requirements for TSM storage pools
Determine number of vBSs (vStorage Backup Servers)
Determine physical or virtual vBSs

Key parameters to determine sizing:


Utilized storage capacity of VMs to backup (current and future)
Retention requirements
Throughput estimates for backups AND restores
Backup window



Sizing: Key factors

Workload estimates:
Daily incremental backups
Daily full backups (newly created VMs, and VMs with CBT reset)
Daily full VM images restores
Initial full backup phase-in period (contingency)
Disaster Recovery requirements (contingency)

Backup and restore throughput estimates:


This is the most critical parameter for sizing
And the most difficult to estimate
Benchmarking is STRONGLY encouraged
There is an upper limit to performance,
determined by capabilities of infrastructure: laws of physics prevail!
There is NO lower limit to performance: many factors can work against good
performance



Sizing: TSM server

Number of TSM Server instances required


May be driven by any or all of the following:
Capacity limits of a single TSM server instance (e.g., TSM DB size)
Throughput limits of a single TSM server instance
Peak load restore requirement
Don't forget this!
DR restore requirements may drive the number of server instances required
Organizational boundaries (e.g., separation of dev/test and production)
Storage pool sizing requirements:
Total backup data
Daily change rate of data (incremental backups)
Retention requirements
Expected deduplication ratio
Key parameter, difficult to predict
Use conservative estimate or benchmarking to avoid under-sizing
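
The storage pool factors above can be combined into a rough first-pass capacity estimate. The following is a simplified sketch, not the sizing method of the referenced guide: it assumes one full copy of the source data plus the daily changes kept for the retention period, and it treats the deduplication ratio as a plain divisor. The 300TB/100TB split and the retention values come from the example architecture; any deduplication ratio passed in is a hypothetical input that should be benchmarked.

```python
def pool_capacity_tb(source_tb, change_pct, retention_days, dedup_ratio=1.0):
    """Rough capacity: one full copy plus retained daily changes.

    dedup_ratio is an assumption (1.0 means no deduplication);
    use a conservative or benchmarked value.
    """
    retained_changes_tb = source_tb * change_pct / 100 * retention_days
    return (source_tb + retained_changes_tb) / dedup_ratio


# Example environment: 2% daily change, Prod kept 30 days, Dev kept 7 days.
prod_tb = pool_capacity_tb(300, 2, 30)
dev_tb = pool_capacity_tb(100, 2, 7)

print(f"Production pool (no dedup): {prod_tb:.0f} TB")
print(f"Dev/Test pool (no dedup):   {dev_tb:.0f} TB")
```

Passing a deduplication ratio above 1.0 shrinks the estimate, which is why an optimistic ratio leads directly to under-sizing.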



Sizing: Determine number (and placement) of vBSs

Key factor for estimating number of vBSs (vStorage Backup Servers):


Throughput estimates:
Full and incremental backup throughputs are typically different
Backup and restore rates may be different
Client-side deduplication will influence throughput rates (backup and restore)

Use of contingencies
Contingency planning is important to ensure adequate capacity
Initial phase-in
DR restores



Determine physical or virtual vBS

Can use a combination of both physical and virtual

Suggest starting with virtual vBS


Determine if there are any extenuating circumstances
For example, network infrastructure constraints requiring use of SAN data transfers

Physical vBSs could be used temporarily for initial phase-in or DR situations

Resource utilization impact:


Biggest impact from vBS will be on datastore i/o
If TSM client-side deduplication is used, then CPU is also a consideration
Datastore i/o utilization will generally be comparable between physical and virtual vBS
Therefore, ESX host resource utilization should not drive the decision



Simplified estimation for number of vBSs

Simplified estimation technique:


One vBS per 20 - 35TB of source data to backup
Suggested minimum one vBS per DRS cluster (except for small clusters)
This includes several factors:
Incremental backups
Occasional full backups (new VMs and CBT resets)
Occasional image restores
Initial phase-in contingency
DR restore contingency
Assumptions (not all are listed: refer to appendix for more details):
2% daily change rate
*Throughput: 20GB/hour incremental, 40GB/hour full
10 hour backup window

NOTE: *Throughput is for example only; benchmarking is strongly recommended
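
Applied to the example environment, the simplified technique gives a range rather than a single number. The following is a minimal sketch of that rule of thumb, assuming the 400TB of utilized data and 20 clusters from the example; the function name and the way the per-cluster minimum is applied are illustrative choices, not part of the guide.

```python
import math


def vbs_estimate(source_tb, tb_per_vbs, clusters=0):
    """Number of vBSs for a given amount of source data per vBS.

    If clusters is given, apply the suggested minimum of one vBS
    per cluster (the slide exempts small clusters; this sketch does not).
    """
    by_capacity = math.ceil(source_tb / tb_per_vbs)
    return max(by_capacity, clusters)


source_tb = 400  # 400000 GB of utilized data

fewest = vbs_estimate(source_tb, 35)   # upper end of the 20 - 35TB range
most = vbs_estimate(source_tb, 20)     # lower end of the 20 - 35TB range
per_cluster = vbs_estimate(source_tb, 35, clusters=20)

print(f"By capacity alone: {fewest} to {most} vBSs")
print(f"With one vBS per cluster: {per_cluster} vBSs")
```

The result is only a starting point; measured throughput from a benchmark should replace the rule of thumb wherever it is available.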



Example vStorage Backup Servers sizing (virtual)

[Diagram: a vCenter Server manages the Prod. and Dev. datacenters and their clusters, with one virtual vBS (proxy) per cluster, 20 in total. The vBSs read from the datastores (DS) on the Datastore SAN and send data to the TSM Server/s, which write over the VTL SAN to a deduplicating VTL.]

Incremental backups ONLY:
800GB/Hour aggregate
20GB/Hour per vBS

With contingency:
250GB/hour per vBS

261TB * 80% = 209TB



Notes: Example vBS Sizing

20 vBSs are suggested


Initial full backup phase-in and DR restore contingency is included
Incremental backup workload is very low per vBS (20GB/hour)

Without contingency:
10 vBSs are suggested
However, HotAdd transport is then not available for all clusters
(10 vBSs for 20 clusters)
HotAdd can provide improved efficiency and throughput



Example of physical vBS

[Diagram: physical vBSs sit alongside the vCenter Server and its clusters. They read VM data from the datastores (DS) on the Datastore SAN using SAN transport and send it TSM LAN-free over the VTL SAN to a deduplicating VTL, which serves as the TSM storage pool for the TSM Server/s.]



SCHEDULING



Scheduling of backups: Factors to consider

Schedule Scope

Schedule exceptions to scope

Number of backup sessions (parallel sessions per datamover)



Scheduling of Backups: Schedule Scope and Exceptions

Schedule scope:
Schedule definition is simplified if schedule scope corresponds to management class
VM, ESX host/s, folder, cluster, datastore
Determined by TSM option Domain.vmfull (vm, vmhost, vmfolder, vmhostcluster, vmdatastore)

Schedule exceptions to scope:


VMs with in-guest application backups (e.g., TDPs)
Vmdk exceptions (for example, exclude application disk/s)
Special retention or storage pool requirements
Exception VMs can be excluded and backed up with special schedule



Scheduling: What if VMs are not organized by backup scope?

Scheduling becomes more complex if it needs to be done ad hoc

If VMs cannot be organized by schedule scope:


Consider the use of backup scheduling tools:
For example Tivoli Workload Scheduler
Tivoli lab services custom scheduling tool



Number of backup sessions

Number of backup sessions (parallel sessions per datamover)


Assume one datamover process per proxy
V6.4 provides multiple, parallel sessions per single datamover process
VMMAXPARALLEL setting in dsm.opt
Only on exception basis do you need to consider more than one
For example, separate datamovers for different backup retention policies



SUPPORT FOR RESTORE AND
RECOVERY



Support for restore and recovery

Define requirements for restore of individual file data and recovery of VMs:
Frequency of file-level restore
Frequency of recovery of full VM images
Disaster recovery requirements:
How many VMs, how long to restore

Define organizational responsibility for restores


Who owns file-level restores?
Central team, or self-service?
Determine if Recovery Agent needs to be installed in-guest
Who owns VM image restores?
VMware team?
Backup team?



Support for restore: access permissions

Permissions can be controlled by:


vSphere role/permissions
For plug-in GUI access, on a per vSphere admin login basis
For TSM client access, on a per datamover basis (one vCenter login per datamover)
TSM node ID

Determine access permissions for restores


Image restores can be controlled by vSphere roles when using plug-in GUI
VM read-only vs. write/create

Access to TSM VM image backup data


Controlled by TSM node assignment
Client command: set access backup TYPE=VM <vmname> <nodename>
Most common use is for self-serve file-level restore scenarios



TSM node relationships for datamovers

[Diagram: VMs are backed up as filespaces under the TSM Datacenter Node DC1. The TSM Datamover Nodes (labeled DC1_DM1, DC1_DM2, DC1_DM2 on the slide) use asnode to the Datacenter Node.]



TSM node relationships for limiting restore access

[Diagram: VMs are backed up as filespaces under the TSM Datacenter Node DC1. Restore access is granted to individual user node IDs using set access.]

TSM user Node    Command
vm1_user         Set access backup type=VM vm1 vm1_user
vm2_user         Set access backup type=VM vm2 vm2_user
vm3_user         Set access backup type=VM vm3 vm3_user
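
With many VMs, the per-VM commands are easier to generate than to type. The following is a minimal sketch that builds the command text from a VM-to-user-node mapping. The command form is copied as written on the slide and has not been checked against the client reference, so verify the exact option syntax before use; the mapping itself mirrors the slide's vm1/vm2/vm3 example.

```python
# VM name -> user node that should get restore access (from the slide example).
vm_to_user_node = {
    "vm1": "vm1_user",
    "vm2": "vm2_user",
    "vm3": "vm3_user",
}


def set_access_commands(mapping):
    """Build one 'set access' command string per VM, in the slide's form."""
    return [
        f"set access backup type=VM {vm} {node}"
        for vm, node in mapping.items()
    ]


commands = set_access_commands(vm_to_user_node)
for command in commands:
    print(command)
```

The script only prints text; running the commands is left to whoever owns the datacenter node.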



Disaster recovery planning

VM Image recovery requirements:


Determine which VMs are most critical and need fastest recovery
Define recovery time objectives: how many VMs, how long to restore
Evaluate peak-load capacity of infrastructure, TSM server, storage devices
Establish a plan for bootstrapping environment at recovery site
How to recover vBS
How to recover vCenter Server (for example, direct restore to ESXi host)
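
A recovery time objective of the form "how many VMs, how long to restore" translates directly into a required aggregate restore throughput, which can then be compared with the peak-load capacity of the infrastructure. The following is a minimal sketch of that arithmetic. The 100GB average VM size follows from the example environment (400000 GB across 4000 VMs); the 200 critical VMs and the 20 hour objective are hypothetical inputs.

```python
def required_restore_gb_per_hour(vm_count, avg_vm_gb, rto_hours):
    """Aggregate throughput needed to restore vm_count VMs within rto_hours."""
    return vm_count * avg_vm_gb / rto_hours


def restore_hours(vm_count, avg_vm_gb, aggregate_gb_per_hour):
    """Time to restore vm_count VMs at a given aggregate throughput."""
    return vm_count * avg_vm_gb / aggregate_gb_per_hour


avg_vm_gb = 400_000 / 4_000          # from the example environment
critical_vms = 200                   # hypothetical
rto_hours = 20                       # hypothetical

needed = required_restore_gb_per_hour(critical_vms, avg_vm_gb, rto_hours)
print(f"Required aggregate restore rate: {needed:.0f} GB/hour")

# The same data at a benchmarked rate (hypothetical 500 GB/hour) takes:
print(f"Hours at 500 GB/hour: {restore_hours(critical_vms, avg_vm_gb, 500):.0f}")
```

If the required rate exceeds what a benchmark shows, either the objective or the infrastructure has to change.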

Alternate site recovery requirements


Determine if VMs need to be restored to an alternate physical datacenter
Define TSM server replication requirements
TSM node replication (active/active TSM servers)
Hardware storage replication: VTL, Disk (active/passive TSM servers)
Offsite copy storage pools



VBS SIZING (APPENDIX)



Step by step guide to TSM for Virtual Environments 6.4 vBS sizing

Condensed version of a document on IBM developerWorks:


https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Guide%20to%20vStorage%20Backup%20Server%20%28Proxy%29%20Sizing
Disclaimer:
Throughput numbers that are contained in this document are intended to be used for
estimation of vBS host sizing. Actual results are environment and configuration
dependent and may vary significantly. Users of this document should verify the
applicable data for their specific environment.



Example environment:



Steady State Daily Workload Assumptions



Steady State Incremental Backup Workload Calculation



Steady State Full Backup Workload Calculation



Steady State Full Restore Workload Calculation



Initial Full Backup Phase-In Calculation



Calculate Peak VM Image Restore Workload



Non-Deduplication Example With Initial Phase-In and Peak Restores



Calculate Number of VBS Hosts Required: Non-Deduplication



Non-Deduplication Example Excluding Initial Phase-In and Peak Restores



Calculate Number of VBS Hosts Required: Non-Deduplication

