
Storage Architectures and Options

Alan McSweeney
Objectives

• To provide high-level information on storage options
and architectures for storing and managing digital
camera data
• To provide indicative sample solutions
• To initiate discussions on storage configurations and
options

November 26, 2009 2


Agenda

• Confirmation of Storage Requirements


• Data Flows and Processes
• Storage Management Architectures and Options
• Storage Management Operation, Management and Use
• Sample Solutions



Understanding of Requirements

• Storage solution to manage raw and processed map image data


• Store raw and processed data
− No requirement to store intermediate pre-processed data
• Keep 6 months' raw and processed data on primary storage
• Keep online copy of additional data
• Keep all raw and processed data indefinitely
• Size for at least 5 years
• Deliverables
− Draft data management/storage policy
− SLA options on data retrieval from non-primary storage
− Set of practical options
− Storage management policy document



Objectives of Storage Management

• Data availability to meet service level commitments
even during failures, disasters, or other forms of primary
data loss
• Data protection against loss and to prevent
unauthorised access
• Data retention that is compliant with regulations and
standards in an unalterable state, fully audited for long
periods of time
• Cost-effective storage management infrastructure



Backup and Data Archival

• Backup
− Ensure efficient recoverability of data
− Does not make backup data directly available
− Optimised to bring large amounts of data back online quickly for system
recovery
− Retention management at the volume level
− Not oriented to long-term management beyond life of current environment
and media
• Archiving
− Copy from online environment to separately managed (secure) storage to
reduce cost of storage and enforce retention
− Provides easy (ideally transparent) access for retrieval
− Optimised to write and retrieve data at file granularity
− File-level retention management
− Designed to manage data over long-term, through media migration and
with access auditing and controls
− Designed to manage multiple copies of data on different media types



High Level Storage Management Architectures

• Multi-tier data storage architectures


− Primary/Secondary
− Primary/Secondary/Tertiary
− Primary/Secondary and Tertiary in parallel
− Secondary disk storage layer is purely for convenience to
allow recall of data
• Advantages and disadvantages in terms of cost and
service



Hierarchical Storage Management (HSM)

• HSM is a key requirement of effective (and cost-effective)
storage management
• Data is migrated (moved / copied) from one storage
layer to another, usually less expensive, form of storage
• A stub is created for and replaces each migrated file
− On the local system, a stub file looks and acts like a regular file
• When user action restores a file but the user does not
change the file, that file is "re-stubbed" during the next
migration process
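
The stub behaviour described above can be illustrated with a minimal in-memory sketch. It is not taken from any HSM product: the `FileEntry` class, the `migrate` and `recall` functions and the day-based interval are all hypothetical names and assumptions used only to show the migrate / recall / re-stub cycle.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    last_access_day: int
    is_stub: bool = False             # True once only a stub remains on primary
    recalled_unchanged: bool = False  # recalled by a user but not modified


def migrate(files, today, interval_days):
    """One migration pass: stub idle files and unchanged recalled files."""
    stubbed = []
    for f in files:
        if f.is_stub:
            continue
        idle = today - f.last_access_day >= interval_days
        if idle or f.recalled_unchanged:
            f.is_stub = True
            f.recalled_unchanged = False
            stubbed.append(f.name)
    return stubbed


def recall(f, today, modified=False):
    """User access brings the file back to primary storage."""
    f.is_stub = False
    f.last_access_day = today
    f.recalled_unchanged = not modified
```

In this sketch a recalled file that was not changed is re-stubbed at the next pass regardless of its age, while a changed file waits for the full interval again.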



Primary/Secondary
• Primary Storage
− High speed fibre-channel disk
− Data is directly accessible
• Secondary Storage
− Offline/nearline storage
− Tape/optical media
− Retain data indefinitely
• Migrate from primary to secondary after defined interval



Primary/Secondary
• Migrate from primary to secondary storage after defined interval
• Retrieve from secondary to primary



Primary/Secondary/Tertiary
• Primary Storage
− High speed fibre-channel disk
− Data is directly accessible
• Secondary Storage
− High capacity ATA (SATA/FATA) disk
− Data is directly accessible
• Tertiary Storage
− Offline/nearline storage
− Retain data indefinitely
• Migrate from primary to secondary, and from secondary to tertiary, after defined intervals

Data resides Tape/optical media



Primary/Secondary/Tertiary
• Migrate from primary to secondary, and from secondary to tertiary, after defined intervals
• Retrieve from secondary/tertiary to primary



Primary/Secondary and Tertiary in Parallel
• Migrate from primary to secondary storage after defined interval
• Take copy to tertiary storage immediately



Hardware Options

• Disk Storage
• Tape Storage — Manual or Automated
• Optical Storage — Manual or Automated
• Hybrid devices
− VTL (Virtual Tape Library)
− EMC Centera
− IBM DR550
− Storage gateways



Hardware Options - Disk

Disk — Advantages
• Speed - FC and SATA disk technologies allow the data to be
housed on the appropriate disks
• SATA drive technology has matured and can lead to decreased
acquisition costs
• FC and SATA can be used within the same storage system for
primary and secondary data
• Storage Virtualisation
− Virtualise disk arrays within a storage system
− Virtualise storage systems within a fabric
− Thin provisioning allows over commitment of disk — reducing acquisition
costs
− Single Instance Storage (Deduplication) can be used but its effectiveness
depends on the nature of the data



Hardware Options - Disk

Disk — Disadvantages
• Acquisition cost
• Disk systems do not interoperate well
• Management - multiple skill sets may be required even
if all storage systems are from the same vendor
• Most hardware vendors focus on ensuring hardware
resilience; data resilience is not their concern
• Operating costs — power, air conditioning, maintenance



Hardware Options — Removable Media

• Advantages
− Control of costs
− Keep fixed number of media within automated library unit
(could keep none)

• Disadvantages
− External media needs media management and control
• Media management is greater for smaller capacity optical disks
− Manual costs of media management



Hardware Options — Optical Storage
Optical Storage
• UDO (Ultra Density Optical)
− 60 GB media capacity
• UDO media have a 50+ year life
• UDO technology roadmap - 120 GB and 240 GB media capacities
• Main vendor — Plasmon
• Resold by other vendors: HP and IBM
• WORM media option
Model                              Gx24   Gx32   Gx80   Gx174  G238   G438   G638
Maximum Media Slots                24     32     80     174    238    438    638
Maximum Raw Capacity (TB), UDO2    1.4    1.9    4.8    10.4   14.3   26.3   38.3
Max/Min Drives                     2/1    2/1    4/2    6/2    12/2   12/2   12/2
Robotics Access Time (secs)        7      7      7.3    8.3    6.2    6.3    6.4

Library Reliability (Mean Swap Between Failure)   2,000,000   2,000,000   3,800,000
Redundant Power                                   NA          NA          Optional
Import/Export Slot                                Single      Single      Single
Bulk Load                                         NA          NA          10 disk
Optical Library and Drive Performance

• Poor performance relative to tape


• Direct access medium
• Use depends on data read (retrieval) and write volumes

Media Load Time 5 sec


Media Unload Time 3 sec
Average Seek Time 35 msec
Buffer Memory 32MB
Max Sustained Transfer Rate - Read 12 MB/s
Max Sustained Transfer Rate - Write 6 MB/s (with verification)
MSBF - Mean Swap Between Failure > 750,000 load/unload cycles
MTBF - Mean Time Between Failure > 100,000 hours
Interface Wide Ultra 2 LVD SCSI or USB 2.0



Single Drive/Path Tape and Optical Read and
Write Performance

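Time in hours by data volume. The optical columns follow the drive rates quoted for the optical library (12 MB/s read, 6 MB/s write with verification), so reads take half as long as writes.

GB      Tape Read   Tape Write   Optical Read   Optical Write
100     0.2         0.2          2.3            4.6
200     0.5         0.5          4.6            9.3
300     0.7         0.7          6.9            13.9
400     0.9         0.9          9.3            18.5
500     1.2         1.2          11.6           23.1
600     1.4         1.4          13.9           27.8
700     1.6         1.6          16.2           32.4
800     1.9         1.9          18.5           37.0
900     2.1         2.1          20.8           41.7
1,000   2.3         2.3          23.1           46.3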
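
The figures in this table can be approximated from sustained transfer rates. The sketch below is illustrative only: it takes the optical rates from the drive performance slide (12 MB/s read, 6 MB/s write), assumes 120 MB/s for a single LTO-4 drive without compression (an assumption, not a figure stated on this slide) and takes 1 GB as 1,000 MB. The function name `transfer_hours` is hypothetical.

```python
TAPE_MB_S = 120          # assumption: single LTO-4 drive, no compression
OPTICAL_READ_MB_S = 12   # from the optical drive performance figures
OPTICAL_WRITE_MB_S = 6   # write with verification


def transfer_hours(size_gb, rate_mb_per_s):
    """Hours to move size_gb at a sustained rate, taking 1 GB = 1,000 MB."""
    return size_gb * 1000 / rate_mb_per_s / 3600


for gb in (100, 500, 1000):
    print(
        gb,
        round(transfer_hours(gb, TAPE_MB_S), 1),
        round(transfer_hours(gb, OPTICAL_READ_MB_S), 1),
        round(transfer_hours(gb, OPTICAL_WRITE_MB_S), 1),
    )
```

Load, unload and seek times are ignored here, so real single-drive times would be somewhat longer.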



Hardware Options — Optical Storage

Optical — Advantages
• Reduced cost over disk
• Larger capacity media planned for the future
• Can have embedded encryption
• Long media shelf life before refresh is required
• Very reliable medium
• True WORM option



Hardware Options — Optical Storage

Optical — Disadvantages
• Low capacity
• Media must be managed offline unless multiple libraries
are bought
• Low data access speed — not suited to large data volume
restores



Hardware Options — Optical Storage

Optical Storage Issues


• Low medium capacity
− UDO — 60 GB currently, 120 GB and 240 GB planned
• Tape
− LTO-4 Ultrium 1840 — 800 GB uncompressed
− LTO-3 Ultrium 960 — 400 GB uncompressed



Tape and Optical Media Capacities

• Optical media capacity cumulative annual increase of c. 31%


• Tape media capacity cumulative annual increase of c. 64%
[Chart: optical and tape media capacity in GB by year, 1995 to 2013. Past and current capacity is plotted against one axis (0 to 900 GB) and future capacity against a second axis (0 to 10,000 GB). Series: Optical Media Capacity, Tape Media Capacity, Future Optical Media Capacity, Future Tape Media Capacity.]


Hardware Options — Tape

Tape — Advantages
• Cost
• Very well defined road map for LTO
− LTO4 (Dec 2006) - 1.6TB (2:1 compression) and data transfer rates of up to
240 MB/second (2:1 compression)
− LTO5 (Planned) - 3.2 TB (2:1 compression) and data transfer rates of up to
360 MB/second (assuming a 2:1 compression)
− LTO6 (Planned) - 6.4 TB (2:1 compression) and data transfer rates of up to
540 MB/second (assuming a 2:1 compression)
• High capacity media
• Designed for large data volume restore
• Multiple media can be streamed to aggregate capacity and speed
• Can have embedded encryption



Hardware Options — Tape

Tape — Disadvantages
• Media shelf life — medium
• Media long-term reliability
• Cumbersome single file restores
• Sequential access medium



Hardware Options — Tape Library

• Widely available from a large number of vendors:
Dell, HP, IBM, Quantum
− IBM System Storage TS3500 Tape Library
− One base frame, and up to 15 expansion frames
− Up to 12 drives per frame (up to 192 per library)
− Up to 5.5 PB with LTO 4 cartridges
− LTO Fibre Channel interface for server attachment
• Very high capacity automated data management
• Long-term data storage



VTL (Virtual Tape Library)

• Hybrid units that emulate tape libraries


• Use low cost disk (and possibly tape)
• Works with existing tape backup software
• Improved backup speeds
• No removable medium backup
• Sample products
− IBM
• IBM Virtualization Engine TS7510
• IBM Virtualization Engine TS7520
− HP
• StorageWorks Virtual Library System (VLS)
• VLS1000i
• VLS6000
IBM Virtualization Engine TS75x0

Feature                                 TS7510   TS7520
Capacity at 2:1 compression             96 TB    2.6 PB
Maximum number of virtual libraries     128      512
Maximum number of virtual drives        1,024    4,096
Maximum number of virtual cartridges    8,192    64,000
Maximum number of concurrent backups    32       32



HP StorageWorks Virtual Library System (VLS)

Feature                                VLS1000i   VLS6000
Capacity at 2:1 compression            3 TB       105 TB
Maximum number of virtual libraries    6          16
Maximum number of virtual drives       12         128



IBM DR550

• Uses multiple storage tiers (disk, tape, optical) within an archive


• Software - System Storage Archive Manager
• Two models
− DR1 - 36.88 TB raw
− DR2 - 168 TB raw
• Attached devices — support for PB capacities
− Tape systems
− Optical systems
• Awards
− Data Protection Summit–Information Lifecycle Management (ILM)–Best
of Show, 2007
− AIIM (The Enterprise Content Management Association)–Best in Show,
2005, 2006



Software Options

HSM
• HSM is a principle; most products offer the same basic
functionality
− Automatic migration and management of data from one
medium to another
− Stubs or pointers are left in place of migrated files
− Speed of retrieval depends on the speed of the hardware to
which the files have been migrated; this gives online, nearline
and offline options



Software Options

Bridgehead Software
• Small company, employee owned
− Can they offer the level of service and support required when really
needed?
− Are they possible acquisition targets?
• Ideal for mid-sized to large customers
− Can it handle the levels of data over time?

Caminosoft
• Major corporation — publicly listed and managed by SEC rules
and regulations
• Primary focus is on managing file server type data
• Repackaged by vendors such as CA
Software Options

Symantec
• Major corporation
• Two products:
− NetBackup
− Enterprise Vault
• NetBackup
− HSM does not support Windows
• Enterprise Vault
− KVS staff still provide support, separate entity within Symantec
− Focus is largely on email and compliance
− Some integration with NetBackup
− Files to be migrated are collected into CAB files
− Entire CAB file recalled
− Poor support for tape as archival medium
• Recommended that you only use tape for data that is seldom or never accessed



Software Options

IBM — Tivoli
• Major corporation
• Vast knowledge within the company
• Extensive R&D budgets
• Agents and options from most major software and
hardware vendors



Software Options

HP — File Archiver
• Major corporation
• Vast knowledge within the company
• Extensive R&D budgets
• “Simple Lightweight Solution” according to HP



Software Options

HSM Product
What is required from the chosen vendor / application?
• Stable and functionally bulletproof solution
• Easy to use
• Capable of handling files
• Capable of handling data volumes
• Must integrate with backup application (so that NetBackup does
not initiate a restore when backing up or restoring stubs)
• Expert support knowledge
• Expert integration knowledge
− These products are dependent on hardware vendors' solutions



Data Deduplication

• Store only one copy of data


• The deduplication process should be granular
− The smaller the data block examined, the more likely it is
that duplicate data will be found
• The deduplication process should be designed with
minimal overhead when deduplicating (storing) and
un-deduplicating (retrieving) data
− Hardware better than software
• The deduplication process should provide resiliency to
ensure that all data can be reliably stored and retrieved,
even in the event of system failure
Data Deduplication

• Available for range of storage — hardware and software


− Symantec Enterprise Vault creates an MD5 fingerprint for
every file that is archived
• If multiple files have the same hash code, only one copy of the file is
physically stored
− IBM N Series has Advanced Single Instance Storage (ASIS)
• Hardware and block-based deduplication
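
File-level single instancing of the kind described above can be sketched as follows. This is a minimal illustration and not how Enterprise Vault is implemented: the `SingleInstanceStore` class and its method names are hypothetical, and the only element taken from the slide is the use of an MD5 fingerprint per file.

```python
import hashlib


class SingleInstanceStore:
    """Keeps one physical copy per distinct file content."""

    def __init__(self):
        self.blobs = {}    # fingerprint -> file content, stored once
        self.catalog = {}  # file name -> fingerprint

    def archive(self, name, content: bytes) -> bool:
        """Archive a file; return True if a new physical copy was stored."""
        fingerprint = hashlib.md5(content).hexdigest()
        is_new = fingerprint not in self.blobs
        if is_new:
            self.blobs[fingerprint] = content
        self.catalog[name] = fingerprint
        return is_new

    def retrieve(self, name) -> bytes:
        """Return the content for a catalogued file name."""
        return self.blobs[self.catalog[name]]
```

Two files with identical content share one stored copy, while each name still resolves to the full content on retrieval.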



Deduplication in Action
• Sales ed.ppt: 20 x 4K blocks
• Client.ppt: identical file, 20 blocks (identical blocks)
• Sales ed v2.ppt: edited file, 24 blocks
• White paper.doc: different file, 10 blocks
• Without ASIS: 74 total blocks
• With ASIS: 38 total blocks
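
The block counts in this example can be reproduced with a small sketch. The split of the edited file into 16 unchanged and 8 new blocks is an inference from the totals (74 and 38), not something the slide states, and the string identifiers simply stand in for block fingerprints. The function name `blocks_stored` is hypothetical.

```python
def blocks_stored(files, deduplicate):
    """Count blocks held on disk, with or without block-level deduplication."""
    if not deduplicate:
        return sum(len(blocks) for blocks in files.values())
    unique = set()
    for blocks in files.values():
        unique.update(blocks)
    return len(unique)


sales = [f"sales-{i}" for i in range(20)]
files = {
    "Sales ed.ppt": sales,
    "Client.ppt": list(sales),  # identical file
    # assumption: 16 blocks unchanged, 8 blocks new after editing
    "Sales ed v2.ppt": sales[:16] + [f"edit-{i}" for i in range(8)],
    "White paper.doc": [f"paper-{i}" for i in range(10)],
}

print(blocks_stored(files, deduplicate=False))
print(blocks_stored(files, deduplicate=True))
```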



Potential Deduplication Savings - Dependent on Data Types

[Bar chart: potential deduplication savings by data type, horizontal axis 0% to 80%. Data types shown: Medical Imaging; Web & Microsoft Office Data; Engineering Home Directories; Software Archive; Technical Pubs Archive; Database Backup.]


Software and Solution Design Constraints and
Issues
Bottom Line
• Produce a realistic design before implementation and validate design
• Solutions must be fully tested to ensure they work as expected
• Decisions can then easily be made on the basis of the tests
• NetBackup integration must be thoroughly tested with any solution
• Primary to secondary to tertiary migration and retrievals must be tested and
documented
• Misconfiguration or lack of understanding can lead to data loss or primary
production system failure
• Need to look at the total cost of ownership — maintenance, power, manual
effort — put a cost on all elements and activities to ensure fair comparison
• Reduced complexity — fewer components, vendors — means long-term ease of
operation and use and has a genuine value



Sample Storage Capacity Planning

• Sizing issues and assumptions


− Annual growth rate
− Overhead for determination of actual disk storage requirements (RAID
overhead, etc.)
− Archival storage medium utilisation overhead (allowance for unfilled tapes,
optical platters, RAID for VTL, etc.)
− Storage lifecycle
− Number of storage layers — 2 or 3
• Sample storage capacity planning scenarios
− Annual growth rates — 0%, 10%, 20%, 30%
− Translated into monthly growth rates for calculations - 20% annual
growth = 1.531% monthly
− Three tiers
− Migrate from Tier 1 to Tier 2 after 6 months
− Migrate from Tier 2 to Tier 3 after further 6 months
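
The conversion from annual to monthly growth and the tier rules above can be sketched as follows. The function names are hypothetical, and treating data that is exactly 6 or 12 months old as already migrated is an assumption about how the boundaries are handled.

```python
def monthly_growth_rate(annual_rate):
    """Compound monthly rate equivalent to a given annual growth rate."""
    return (1 + annual_rate) ** (1 / 12) - 1


def tier_for_age(age_months, primary_months=6, secondary_months=6):
    """Tier holding data of a given age under the three-tier scenario."""
    if age_months < primary_months:
        return "Tier 1"
    if age_months < primary_months + secondary_months:
        return "Tier 2"
    return "Tier 3"


print(round(monthly_growth_rate(0.20) * 100, 3))
```

Compounding the monthly rate twelve times returns the annual rate, which is why 20% a year corresponds to about 1.531% a month rather than 20/12 = 1.667%.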



Disk Space Calculations

• Storage estimates expressed as raw capacities required
to accommodate data
• Includes overhead for effective usability, RAID,
snapshots, online spare, less than 100% utilisation, etc.
• Primary storage after 5 years with 10% annual growth =
25,580 GB
• Equates to at least 34,533 GB of raw disk capacity
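
The step from 25,580 GB to 34,533 GB is the disk overhead allowance applied to the data volume. A minimal sketch, assuming the 35% disk contingency used in the sample planning parameters; the function name `with_overhead` is hypothetical.

```python
DISK_OVERHEAD = 0.35  # RAID, snapshots, spares, less than 100% utilisation


def with_overhead(data_gb, overhead=DISK_OVERHEAD):
    """Raw capacity needed to hold data_gb after the overhead allowance."""
    return data_gb * (1 + overhead)


print(round(with_overhead(25_580)))
```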



Sample Storage Capacity Planning — 0% Annual
Growth Rate
Annual Growth Rate 0%
Disk Storage Contingency, Allowance for Less Than 100% Utilisation, RAID, Other Overhead 35%
Tape Storage Contingency, Allowance for Less Than 100% Utilisation, Other Overhead 25%
Number of Years to Cater For in Initial Storage Solution 5
Raw Data per Month GB 700
Pre-processed Data Per Month GB 2,000
Processed Data Per Month GB 2,000
Primary Data Storage Retention Months 6
Secondary Data Storage Retention Months 6
Tertiary Data Copy Months 12
Tertiary Data Storage Retention Months 9999
Primary
Total Primary Data Per Month GB 2,700
Total Primary Data Per Month Including Contingency and Growth GB 3,645
Primary Storage Including Contingency GB 21,870
Primary Storage Including Contingency and Growth GB 21,870
Secondary
Total Secondary Data Per Month GB 2,700
Total Secondary Data Per Month Including Contingency and Growth GB 3,645
Secondary Storage Including Contingency GB 21,870
Secondary Storage Including Contingency and Growth GB 21,870
UDO Medium Capacity GB 60
LTO4 Medium Capacity Compressed 1600
Capacities - Annual Growth Rate — 0%

Month     Primary GB  Total Primary GB  Secondary GB  Total Secondary GB  Tertiary GB  Total Tertiary GB  UDO Medium Slots  LTO4 Media
Month 6   3,645       21,870            0             0                   0            0                  0                 0
Month 12  3,645       21,870            3,645         21,870              0            0                  0                 0
Month 18  3,645       21,870            3,645         21,870              3,375        20,250             338               13
Month 24  3,645       21,870            3,645         21,870              3,375        40,500             675               25
Month 30  3,645       21,870            3,645         21,870              3,375        60,750             1,013             38
Month 36  3,645       21,870            3,645         21,870              3,375        81,000             1,350             51
Month 42  3,645       21,870            3,645         21,870              3,375        101,250            1,688             63
Month 48  3,645       21,870            3,645         21,870              3,375        121,500            2,025             76
Month 54  3,645       21,870            3,645         21,870              3,375        141,750            2,363             89
Month 60  3,645       21,870            3,645         21,870              3,375        162,000            2,700             101
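
The tertiary totals and media counts for the 0% growth case can be derived from the planning parameters: 2,700 GB per month plus the 25% tape contingency gives 3,375 GB per month, copied to tertiary after 12 months. The sketch below is an illustration with hypothetical function names. It rounds media counts up to whole media, so it can come out one higher than the table in rows where the table appears to round to the nearest whole number.

```python
import math

UDO_GB = 60                   # UDO medium capacity
LTO4_GB = 1600                # LTO4 medium capacity, compressed
TERTIARY_GB_PER_MONTH = 3375  # 2,700 GB plus the 25% tape contingency


def tertiary_total_gb(month, copy_after_months=12):
    """Cumulative tertiary data at a given month with 0% growth."""
    return max(0, month - copy_after_months) * TERTIARY_GB_PER_MONTH


def media_needed(total_gb, medium_gb):
    """Whole media required to hold total_gb."""
    return math.ceil(total_gb / medium_gb)


for month in (18, 24, 60):
    total = tertiary_total_gb(month)
    print(month, total, media_needed(total, UDO_GB), media_needed(total, LTO4_GB))
```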



Storage Capacities - 0% Annual Growth Rate

[Chart: Total Primary GB, Total Secondary GB and Total Tertiary GB by month, months 1 to 60.]


Media Requirements - 0% Annual Growth Rate

[Chart: number of media by month, months 1 to 57. Series: UDO Medium Slots, LTO4 Media, LTO3 Media.]


Sample Storage Capacity Planning — 10% Annual
Growth Rate
Annual Growth Rate 10%
Disk Storage Contingency, Allowance for Less Than 100% Utilisation, RAID, Other Overhead 35%
Tape Storage Contingency, Allowance for Less Than 100% Utilisation, Other Overhead 25%
Number of Years to Cater For in Initial Storage Solution 5
Raw Data per Month GB 700
Pre-processed Data Per Month GB 2,000
Processed Data Per Month GB 2,000
Primary Data Storage Retention Months 6
Secondary Data Storage Retention Months 6
Tertiary Data Copy Months 12
Tertiary Data Storage Retention Months 9999
Primary
Total Primary Data Per Month GB 2,700
Total Primary Data Per Month Including Contingency and Growth GB 3,645
Primary Storage Including Contingency GB 21,870
Primary Storage Including Contingency and Growth GB 32,020
Secondary
Total Secondary Data Per Month GB 2,700
Total Secondary Data Per Month Including Contingency and Growth GB 3,645
Secondary Storage Including Contingency GB 21,870
Secondary Storage Including Contingency and Growth GB 32,020
UDO Medium Capacity GB 60
LTO4 Medium Capacity Compressed 1600
Capacities - Annual Growth Rate — 10%

Month     Primary GB  Total Primary GB  Secondary GB  Total Secondary GB  Tertiary GB  Total Tertiary GB  UDO Medium Slots  LTO4 Media
Month 6   3,823       22,459            0             0                   0            0                  0                 0
Month 12  4,010       23,586            3,823         22,459              0            0                  0                 0
Month 18  4,205       24,737            4,010         23,586              3,713        21,723             362               14
Month 24  4,410       25,945            4,205         24,737              3,894        44,447             741               28
Month 30  4,626       27,211            4,410         25,945              4,084        68,280             1,138             43
Month 36  4,851       28,539            4,626         27,211              4,283        93,276             1,555             58
Month 42  5,088       29,932            4,851         28,539              4,492        119,492            1,992             75
Month 48  5,337       31,393            5,088         29,932              4,711        146,988            2,450             92
Month 54  5,597       32,925            5,337         31,393              4,941        175,826            2,930             110
Month 60  5,870       34,533            5,597         32,925              5,183        206,071            3,435             129



Storage Capacities - 10% Annual Growth Rate

[Chart: Total Primary GB, Total Secondary GB and Total Tertiary GB by month, months 1 to 60.]


Media Requirements - 10% Annual Growth Rate
[Chart: number of media by month, months 1 to 57. Series: UDO Medium Slots, LTO4 Media, LTO3 Media.]


Sample Storage Capacity Planning — 20% Annual
Growth Rate
Annual Growth Rate 20%
Disk Storage Contingency, Allowance for Less Than 100% Utilisation, RAID, Other Overhead 35%
Tape Storage Contingency, Allowance for Less Than 100% Utilisation, Other Overhead 25%
Number of Years to Cater For in Initial Storage Solution 5
Raw Data per Month GB 700
Pre-processed Data Per Month GB 2,000
Processed Data Per Month GB 2,000
Primary Data Storage Retention Months 6
Secondary Data Storage Retention Months 6
Tertiary Data Copy Months 12
Tertiary Data Storage Retention Months 9999
Primary
Total Primary Data Per Month GB 2,700
Total Primary Data Per Month Including Contingency and Growth GB 3,645
Primary Storage Including Contingency GB 21,870
Primary Storage Including Contingency and Growth GB 45,350
Secondary
Total Secondary Data Per Month GB 2,700
Total Secondary Data Per Month Including Contingency and Growth GB 3,645
Secondary Storage Including Contingency GB 21,870
Secondary Storage Including Contingency and Growth GB 45,350
UDO Medium Capacity GB 60
LTO4 Medium Capacity Compressed 1600



Capacities - Annual Growth Rate — 20%

Month     Primary GB  Total Primary GB  Secondary GB  Total Secondary GB  Tertiary GB  Total Tertiary GB  UDO Medium Slots  LTO4 Media
Month 6   3,993       23,016            0             0                   0            0                  0                 0
Month 12  4,374       25,274            3,993         23,016              0            0                  0                 0
Month 18  4,791       27,687            4,374         25,274              4,050        23,163             386               14
Month 24  5,249       30,329            4,791         27,687              4,437        48,413             807               30
Month 30  5,750       33,224            5,249         30,329              4,860        76,072             1,268             48
Month 36  6,299       36,395            5,750         33,224              5,324        106,371            1,773             66
Month 42  6,900       39,869            6,299         36,395              5,832        139,562            2,326             87
Month 48  7,558       43,674            6,900         39,869              6,389        175,921            2,932             110
Month 54  8,280       47,843            7,558         43,674              6,998        215,750            3,596             135
Month 60  9,070       52,409            8,280         47,843              7,666        259,381            4,323             162



Storage Capacities - 20% Annual Growth Rate

[Chart: Total Primary GB, Total Secondary GB and Total Tertiary GB by month, months 1 to 60.]


Media Requirements - 20% Annual Growth Rate
[Chart: number of media by month, months 1 to 57. Series: UDO Medium Slots, LTO4 Media, LTO3 Media.]


Sample Storage Capacity Planning — 30% Annual
Growth Rate
Annual Growth Rate 30%
Disk Storage Contingency, Allowance for Less Than 100% Utilisation, RAID, Other Overhead 35%
Tape Storage Contingency, Allowance for Less Than 100% Utilisation, Other Overhead 25%
Number of Years to Cater For in Initial Storage Solution 5
Raw Data per Month GB 700
Pre-processed Data Per Month GB 2,000
Processed Data Per Month GB 2,000
Primary Data Storage Retention Months 6
Secondary Data Storage Retention Months 6
Tertiary Data Copy Months 12
Tertiary Data Storage Retention Months 9999
Primary
Total Primary Data Per Month GB 2,700
Total Primary Data Per Month Including Contingency and Growth GB 3,645
Primary Storage Including Contingency GB 21,870
Primary Storage Including Contingency and Growth GB 62,463
Secondary
Total Secondary Data Per Month GB 2,700
Total Secondary Data Per Month Including Contingency and Growth GB 3,645
Secondary Storage Including Contingency GB 21,870
Secondary Storage Including Contingency and Growth GB 62,463
UDO Medium Capacity GB 60
LTO4 Medium Capacity Compressed 1600



Capacities - Annual Growth Rate — 30%

Month     Primary GB  Total Primary GB  Secondary GB  Total Secondary GB  Tertiary GB  Total Tertiary GB  UDO Medium Slots  LTO4 Media
Month 6   4,156       23,545            0             0                   0            0                  0                 0
Month 12  4,739       26,937            4,156         23,545              0            0                  0                 0
Month 18  5,403       30,713            4,739         26,937              4,388        24,575             410               15
Month 24  6,160       35,019            5,403         30,713              5,003        52,398             873               33
Month 30  7,024       39,927            6,160         35,019              5,704        84,122             1,402             53
Month 36  8,008       45,524            7,024         39,927              6,503        120,292            2,005             75
Month 42  9,131       51,906            8,008         45,524              7,415        161,532            2,692             101
Month 48  10,410      59,182            9,131         51,906              8,454        208,554            3,476             130
Month 54  11,870      67,477            10,410        59,182              9,639        262,167            4,369             164
Month 60  13,534      76,936            11,870        67,477              10,991       323,294            5,388             202



Storage Capacities - 30% Annual Growth Rate

[Chart: Total Primary GB, Total Secondary GB and Total Tertiary GB by month, months 1 to 60.]


Media Requirements - 30% Annual Growth Rate
[Chart: number of media by month, months 1 to 57. Series: UDO Medium Slots, LTO4 Media, LTO3 Media.]


10 Year Data Storage Capacities - Different Growth Rates

[Chart: total primary, secondary and tertiary storage in GB at six-month intervals from month 6 to month 120, for annual growth rates of 10%, 20% and 30%.]
Single Drive/Path Tertiary Layer Data Write Times - Tape and Optical

[Chart: tape and optical write times in hours by month for the tertiary layer, for annual growth rates of 10%, 20% and 30%.]


Implementation Options

• Factors:
− 2 or 3 tiers
− Optical, tape or VTL as the last tier
− Use of existing storage (HP/Dell) or new storage
− DR or no DR
• Offsite manual copy or replication
− Software HSM — use existing NetBackup or other: HT
FileStore, CaminoSoft, IBM Tivoli



Spectrum of Options

• All disk, DR option with replicated data
• Mixed disk/tape/optical/VTL/manual/automated
• Primary disk, secondary tape



Data Retrieval Operation

• Secondary disk
− Data is retrieved to primary immediately — available within
seconds/minutes
• Secondary/tertiary VTL
− Data is retrieved to primary immediately — available within minutes
• Secondary/tertiary tape library
− Data is retrieved to primary immediately — available within minutes
• Secondary/tertiary optical library
− Data is retrieved to primary immediately — available within hours
• Manual media retrieval
− Retrieval time depends on media location and staff allocated to media
handling



Sample Options

• Three tiers — optical or tape library as third tier


• All disk
• Reuse/expand existing hardware
• Low cost ATA disks for secondary storage

• Not all available options: presented for review and
feedback



Physical Option 1 — Three Tiers — Optical or Tape



Physical Option 1 - Components

• Primary storage — SAN with fibre disk


• Secondary storage - SAN with ATA disk
• Tertiary storage — optical library
• Software
− HT Filestore
− Caminosoft
− NetBackup Storage Migrator
− Tivoli Storage Manager



Resilience

• Primary storage mirrored for resilience



Operation and Service Level Agreement



Physical Option 2 — All Disk Configuration

• All disk storage option


• Two mirrored sites with realtime replication
• Multiple replicated components for resilience
• Sample configuration
− Primary Storage
• Clustered SAN Controllers with 594 x 300 GB Fibre Channel Drives =
151 TB Raw Storage
− Secondary Storage
• Clustered SAN Controllers with 336 x 750 GB SATA Drives = 252 TB
Raw Storage
− Total 403 TB of Raw Storage capacity (doubled for DR)



All Disk Configuration



Resilience — Multiple Points of Redundancy



Resilience

• SAN switches
• SAN controllers
• Two disks per shelf
• Entire site



All Disk Configuration

• Indicative hardware and software (replication, snapshot)
cost
− €1.8 million
− €4,460 per TB (doubled for DR)
• 5 standard racks in each location
• Does not include
− HSM software
− Installation and commissioning
• Represents high water mark in terms of costs and
functionality
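
The per-TB figure follows from the indicative cost and the 403 TB of raw capacity in the sample configuration (151 TB primary plus 252 TB secondary). A minimal sketch of that arithmetic; the function name `cost_per_tb` is hypothetical.

```python
PRIMARY_TB = 151
SECONDARY_TB = 252
TOTAL_COST_EUR = 1_800_000  # indicative hardware and software cost


def cost_per_tb(total_cost_eur, raw_tb):
    """Indicative cost per TB of raw capacity."""
    return total_cost_eur / raw_tb


print(round(cost_per_tb(TOTAL_COST_EUR, PRIMARY_TB + SECONDARY_TB)))
```

The result is a little under €4,470, in line with the rounded €4,460 quoted on the slide.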



All Disk Configuration

Advantages
• High performance
• Low manual intervention
• Highly resilient

Disadvantages
• High cost of acquisition and operation
• Growth in data volumes means additional expense
• No upper limit on cost
Physical Option 3 — Existing Hardware

• Raw, pre-processed and processed data resides on HP
EVA
• Replicated continuously to second EVA
• Dell CX disk array used as secondary location
• Existing ADIC LTO drives used for tertiary and long
term offsite storage

Existing Hardware

Advantages
• Cost

• Some skill sets already in organisation

Disadvantages
• Investment in old technology
• Software based HSM product skills required



Introduction of Tertiary Device

• Existing HP and Dell storage still employed


• UDO or LTO device used as final destination before
removal to offsite archive

Introduction of Tertiary Device

Advantages
• Cost — use of existing hardware
• Some skill sets already in organisation
• Media life is increased with UDO

Disadvantages
• Cost — UDO or new tape library
• Management of archived media — especially UDO as they are
low capacity
• Investment in old technology
• Software based HSM product skills required
• UDO retrieval speeds



Virtual Tape Library

• VTL device will act as a tape library


• VTL will be secondary location
• HSM product skills may not be required
• NetBackup could manage this process
• VTL data will ultimately be archived to tape via ADIC
tape library

Virtual Tape Library

Advantages
• Some skill sets already in organisation
• No new third party migration tool absolutely necessary
• Extension of NetBackup system using NetBackup Storage
Migrator

Disadvantages
• Cost — VTL with required capacity can be expensive
• Cannot take VTL backups offsite — tertiary solution still required
• Lack of vendor implementation experience



Physical Option 4 — Disk Based Secondary
Information Store
• Single storage device with multiple PB of data
scalability
• Data can be retained on information store for 15+ years
• 1 TB disks make this possible
• Data can be moved to storage attached tape
• Internal backup features of information store can aid
NetBackup routine (SnapShots, Vaulting)

Disk Based Information Store

Advantages
• Speed of retrieval
• No new third party migration tool absolutely necessary
• Simplicity
• Integration with NetBackup — no effect on daily backup routines
• Information store can be split across multiple information stores
to give multiple PB capacity if required

Disadvantages
• Cost — may be expensive initially but storage can be added over
time as needed
Central Management — Storage Virtualisation

• Controller sits above storage systems


• Handle day to day management of storage across all
platforms
Advantages
• Skill set consolidation
• Costs

Disadvantages
• Vendor-based skills are still ultimately required

Key Questions

• Number of storage tiers and preferred configuration


• Use of tape/optical/VTL
• Software HSM option
• Disaster recovery/business continuity requirements and
options
• Capacity planning constraints and assumptions
• New hardware or reuse of existing hardware
• Level of automation required for archival level
• Financial constraints and budget available
• Implementation schedule



More Information

Alan McSweeney
alan@alanmcsweeney.com
