Environment
Mathias Defiebre
IBM Lab Services
mathias.defiebre@de.ibm.com
Agenda
Data Deduplication Overview
Data Deduplication Theory
Data Deduplication Approaches in Practice
Data Deduplication Considerations and Value Proposition
TS7650 ProtecTIER Deduplication Gateway
TS7650 ProtecTIER Deduplication Appliance Series
A Look into the Future
Links
A B C D A E F F D
Identical Chunks
1. File based
2. Block based
Understands explicit data formats and chunks data objects according to the format
Chunking is based on an algorithm that looks for logical breaks or similar elements
within a data object/stream
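The difference between fixed-size blocks and chunking that looks for breaks in the content can be sketched as follows. This is a minimal illustration, not the method of any particular product: the window size, the divisor, the use of CRC32 as the boundary test, and the function names are all assumptions chosen for readability.

```python
import random
import zlib

WINDOW = 16    # bytes inspected at each position (illustrative value)
DIVISOR = 32   # average chunk length is roughly DIVISOR bytes (illustrative value)

def fixed_chunks(data: bytes, size: int = 64):
    """Cut the stream every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data: bytes):
    """Cut wherever a checksum of the last WINDOW bytes hits a fixed pattern."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        window = data[end - WINDOW:end]
        if zlib.crc32(window) % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # remainder after the last boundary
    return chunks

# Hypothetical test data: the same stream with one byte inserted at the front.
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(4096))
shifted = b"X" + data

fixed_shared = set(fixed_chunks(data)) & set(fixed_chunks(shifted))
cdc_shared = set(content_defined_chunks(data)) & set(content_defined_chunks(shifted))
print(len(fixed_shared), "fixed-size chunks survive the one-byte shift")
print(len(cdc_shared), "content-defined chunks survive the one-byte shift")
```

Because the boundary test depends only on the bytes in the window, an insertion disturbs only the chunks near it. Fixed-size blocks shift at every later position. A production implementation would use a rolling hash instead of recomputing the checksum per position.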
A B C D A E F F D
1. Hashing
2. Binary Comparison
3. Delta Differencing
Computes a delta between two similar chunks of data where one chunk is the
baseline and the second is the delta
Since each delta is unique there is no possibility of collision
To reconstruct the original chunk the delta(s) have to be re-applied to the baseline
chunk
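Delta differencing as described above can be sketched with Python's standard `difflib`. The delta format used here (a list of "copy" and "data" instructions) and the function names are assumptions for illustration only.

```python
from difflib import SequenceMatcher

def make_delta(baseline: bytes, target: bytes):
    """Encode `target` as copy/data instructions relative to `baseline`."""
    delta = []
    matcher = SequenceMatcher(None, baseline, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))          # reuse bytes from the baseline
        elif tag in ("replace", "insert"):
            delta.append(("data", target[j1:j2]))   # store only the new bytes
        # "delete" needs no instruction: those baseline bytes are simply skipped
    return delta

def apply_delta(baseline: bytes, delta) -> bytes:
    """Reconstruct the original chunk by re-applying the delta to the baseline."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += baseline[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

baseline = b"The quick brown fox jumps over the lazy dog"
target = b"The quick brown cat jumps over the lazy dog"
delta = make_delta(baseline, target)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "data")
print("bytes stored in delta:", literal_bytes, "of", len(target))
```

Only the literal bytes in the delta need to be stored. The restore path must read the baseline chunk as well.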
Client
LAN or SAN
LAN
Client-side
Server-side
Storage-side
+ Allows cross-correlation among multiple clients
+ Reduces bandwidth on LAN
- Adds load to client
- No cross-correlation among multiple clients
Storage Device
Format Aware
Format
Agnostic
Hash based
Fixed/Variable
Block Size
Hashing
Delta Diff
Binary Diff
Content Aware
HyperFactor
Ah Bh Ch Dh Eh
3. Compare hashes with hash table
Hash Value
Storage locations
Object References
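The hash-table lookup can be sketched on the example stream A B C D A E F F D. This is a minimal sketch under stated assumptions: SHA-1 is used only because its 20-byte digest matches the hash length in the sizing example, and the `store`/`recipe` structure is a hypothetical stand-in for the storage locations and object references.

```python
import hashlib

def deduplicate(chunks):
    """Store each distinct chunk once and keep an ordered list of references."""
    store = {}    # hash value -> chunk data (stands in for a storage location)
    recipe = []   # object references: hashes in original stream order
    for chunk in chunks:
        digest = hashlib.sha1(chunk).digest()
        if digest not in store:      # compare hash with the hash table
            store[digest] = chunk    # new chunk: store it
        recipe.append(digest)        # known or new: record a reference
    return store, recipe

def restore(store, recipe):
    """Rebuild the original stream from the references."""
    return b"".join(store[d] for d in recipe)

stream = [b"A", b"B", b"C", b"D", b"A", b"E", b"F", b"F", b"D"]
store, recipe = deduplicate(stream)
print(len(stream), "chunks in,", len(store), "chunks stored")
```

Trusting the hash alone carries a small collision risk. A binary comparison of the chunk data on a hash match removes that risk at the cost of an extra read.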
Example:
Chunk size of 8 KB, each hash is 20 bytes long
With a 1 TB repository:
1 TB repository has ~134,000,000 chunks of 8 KB each
Need pointer scheme to reference inside 1 TB
Hash table requires ~2.5 GB of memory: no issue
With a 100 TB repository:
Hash table requires ~250 GB of memory: performance problem!
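The arithmetic behind these figures can be checked with a few lines. The sketch assumes binary units (1 TB = 2^40 bytes, 1 GB = 2^30 bytes) and counts only the hash values themselves, not the pointers or any other per-entry overhead. The function name is hypothetical.

```python
TIB = 2**40
GIB = 2**30

def hash_table_size(repo_bytes, chunk_bytes=8 * 1024, hash_bytes=20):
    """Return (number of chunks, bytes needed to hold one hash per chunk)."""
    chunks = repo_bytes // chunk_bytes
    return chunks, chunks * hash_bytes

for repo_tib in (1, 100):
    chunks, table_bytes = hash_table_size(repo_tib * TIB)
    print(f"{repo_tib} TB repository: {chunks:,} chunks, "
          f"hash table ~{table_bytes / GIB:.1f} GB")
```

The table grows linearly with the repository. At 100 TB it no longer fits comfortably in server memory, and lookups that spill to disk become the bottleneck.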
HyperFactor Approach
1. Look through data stream for similarity and filter similar elements
Using HyperFactor Index (fixed size 4 GB)
Element A
Element B
Element C
Compaction
Lotus Notes Database
Multiplexing
Multiple backup streams to a single tape drive
Veritas Backup Exec
Computer Associates ARCserve
Oracle RMAN multiplexing of backup sets
Encryption
Above technologies change the data stream making identical data non-identical!
Data encryption prior to deduplication processing can subvert data reduction
1. Three data sources have the same text file ("Important text")
Data source 1: no encryption
Data source 2: encryption key 1
Data source 3: encryption key 2
2. After encryption, text files do not match
3. Deduplication processing does not detect redundancy
Compression possible
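The effect of encryption on deduplication can be reproduced in a few lines. Note the assumptions: `toy_encrypt` is a deliberately simple XOR keystream built from SHA-256. It is a stand-in for a real cipher, is not secure, and is not taken from any product. The key names mirror the figure labels.

```python
import hashlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Illustrative XOR stream cipher (NOT secure); applying it twice decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

def unique_objects(objects) -> int:
    """Number of distinct objects a hash-based deduplicator would store."""
    return len({hashlib.sha1(o).digest() for o in objects})

text = b"Important text"
unencrypted = [text, text, text]
encrypted_at_source = [
    text,                          # data source 1: no encryption
    toy_encrypt(text, b"key 1"),   # data source 2: encryption key 1
    toy_encrypt(text, b"key 2"),   # data source 3: encryption key 2
]

print("stored without encryption:", unique_objects(unencrypted))
print("stored with per-source encryption:", unique_objects(encrypted_at_source))
```

Three identical files collapse to one stored object only when they arrive unencrypted. With per-source keys every copy looks unique to the deduplication engine.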
ProtecTIER Server
FC
Backup Server
ProtecTIER
Application
Disk Storage
System
HyperFactor
Data Storage
Memory
Resident Index
(4 GB, may contain
predefined elements)
Disk Arrays
FC Switch
ProtecTIER
Server
Existing Data
Backup Servers
Restore Index
Filtered data
Backup Policies
# full backups
# Inc backups
backup frequency
data retention period
Backup
Server
Represented capacity
Primary Site
ProtecTIER
Gateway
Physical
capacity
Backup
Server
Represented capacity
Secondary Site
Backup
Server
ProtecTIER
Gateway
Physical
capacity
[Figure: rack layouts of four configurations, each in an F05 Base Frame]
Components shown: X3850 M2 (3 x 6core, 24GB RAM), DS4700, EXP810, WTI Switch, TSSC, Power: Base / FC1903
Clustered 4700: 128 spindle 450GB (8 drawer), 36TB, 450MB/sec
Standalone 4700: 128 spindle 450GB (8 drawer), 36TB, 450MB/sec
Standalone 4700: 64 spindle 450GB (4 drawer), 18TB, 250MB/sec
Standalone 4700: 32 spindle 450GB (2 drawer), 7TB, 100MB/sec
Further figure labels: 31.5TB at 500MB/sec (shown twice), 15.8TB, 6.3TB
Intelligent storage devices will be tightly integrated with 3rd party backup applications
e.g. controlling & monitoring replication from a backup application
Links
Links I
TS7650G ProtecTIER Deduplication Gateway
http://www-03.ibm.com/systems/storage/tape/ts7650g/index.html
TS7650 ProtecTIER Deduplication Appliance
http://www-03.ibm.com/systems/storage/tape/ts7650a/index.html
Whitepaper: IBM Data Deduplication Strategy and Operations
http://www.ibm.com/developerworks/wikis/display/tivolistoragemanager/IBM+Tivoli+Storage+Manager+V6.1+Data+Deduplication+Strategy+and+Operations
Redbook: The IBM System Storage TS7650G and TS7650 ProtecTIER Servers
http://w3.itso.ibm.com/redpieces/abstracts/sg247652.html?Open
Links II
Our Expertise
Software Development
Storage & Tape
Linux
Mainframe
File Systems
Our Services
We offer technical support from the planning phase through well after installation
Gracias (Spanish), Thank You (English), Obrigado (Brazilian Portuguese), Tak (Danish), Danke (German), Grazie (Italian), Merci (French)
Also shown in: Hindi, Hebrew, Simplified Chinese, Russian, Arabic, Korean, Japanese, Tamil, Traditional Chinese, Thai
Disclaimer I
No part of this document may be reproduced or transmitted in any form without written permission from
IBM Corporation.
The performance data contained herein were obtained in a controlled, isolated environment. Results
obtained in other operating environments may vary significantly. While IBM has reviewed each item for
accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained
elsewhere. These values do not constitute a guarantee of performance. The use of this information or
the implementation of any of the techniques discussed herein is a customer responsibility and depends
on the customer's ability to evaluate and integrate them into their operating environment. Customers
attempting to adapt these techniques to their own environments do so at their own risk.
Product data has been reviewed for accuracy as of the date of initial publication. Product data is
subject to change without notice. This information could include technical inaccuracies or
typographical errors. IBM may make improvements and/or changes in the product(s) and/or
program(s) at any time without notice. Any statements regarding IBM's future direction and intent are
subject to change or withdrawal without notice, and represent goals and objectives only.
References in this document to IBM products, programs, or services do not imply that IBM intends to
make such products, programs or services available in all countries in which IBM operates or does
business. Any reference to an IBM Program Product in this document is not intended to state or imply
that only that program product may be used. Any functionally equivalent program, that does not
infringe IBM's intellectual property rights, may be used instead. It is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program or service.
Disclaimer II
THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY
WARRANTY, EITHER EXPRESS OR IMPLIED. IBM EXPRESSLY DISCLAIMS ANY WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT.
IBM shall have no responsibility to update this information. IBM products are warranted according to
the terms and conditions of the agreements (e.g. IBM Customer Agreement, Statement of Limited
Warranty, International Program License Agreement, etc.) under which they are provided. IBM is not
responsible for the performance or interoperability of any non-IBM products discussed herein.
Information concerning non-IBM products was obtained from the suppliers of those products, their
published announcements or other publicly available sources. IBM has not tested those products in
connection with this publication and cannot confirm the accuracy of performance, compatibility or any
other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be
addressed to the suppliers of those products.
The provision of the information contained herein is not intended to, and does not, grant any right or
license under any IBM patents or copyrights. Inquiries regarding patent or copyright licenses should be
made, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
Trademarks
The following terms are trademarks or registered trademarks of the IBM Corporation in either the
United States, other countries or both.
IBM, TotalStorage, zSeries, pSeries, xSeries, S/390, ES/9000, AS/400, RS/6000
z/OS, z/VM, VM/ESA, OS/390, AIX, DFSMS/MVS, OS/2, OS/400, ESCON, Tivoli
iSeries, ES/3090, VSE/ESA, TPF, DFSMSdfp, DFSMSdss, DFSMShsm, DFSMSrmm, FICON,
ProtecTIER, XIV
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in
the United States, other countries, or both. Other company, product, and service names mentioned
may be trademarks or registered trademarks of their respective companies.