

ILM Library: Techniques with Tivoli Storage and IBM TotalStorage Products
Learn about basic ILM concepts
Use TPC for Data to assess ILM readiness
Stages to ILM implementation

Charlotte Brooks
Giacomo Chiapparini
Wim Feyants
Pallavi Galgali
Vinicius Franco Jose

ibm.com/redbooks

International Technical Support Organization
ILM Library: Techniques with Tivoli Storage and IBM TotalStorage Products
February 2006

SG24-7030-00

Note: Before using this information and the product it supports, read the information in Notices on page xi.

First Edition (February 2006)


Copyright International Business Machines Corporation 2006. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents
Figures
Notices
Trademarks
Preface
The team that wrote this redbook
Become a published author
Comments welcome

Part 1. ILM overview

Chapter 1. Introduction to ILM
1.1 What is ILM
1.2 Why ILM is needed
1.2.1 IT challenges and how ILM can help
1.3 ILM elements
1.3.1 Tiered storage management
1.3.2 Long-term data retention
1.3.3 Data lifecycle management
1.3.4 Policy-based archive management
1.4 Standards and organizations
1.4.1 Storage Networking Industry Association (SNIA)
1.5 IT Infrastructure Library and value of ILM
1.5.1 What is ITIL
1.5.2 ITIL management processes
1.5.3 ITIL and ILM value

Chapter 2. ILM within an On Demand storage environment
2.1 Information On Demand
2.1.1 Infrastructure Simplification
2.1.2 Business Continuity
2.1.3 Information Lifecycle Management
2.2 IBM and ILM
2.3 IBM Information On Demand environment
2.4 Supporting ILM through On Demand storage environment

Chapter 3. Implementing ILM
3.1 Logical stages in ILM implementation
3.1.1 Assessment and planning
3.1.2 Execution
3.1.3 Monitoring
3.1.4 Flow diagram
3.2 IBM ILM consulting and services

Chapter 4. Product overview
4.1 Summary of IBM products for ILM
4.2 TotalStorage Productivity Center for Data
4.2.1 Overview
4.2.2 Key aspects
4.2.3 Product highlights
4.3 SAN Volume Controller
4.3.1 Overview
4.3.2 Virtualization
4.3.3 Architecture
4.4 IBM TotalStorage DS family of disk products
4.4.1 Enterprise disk storage
4.4.2 Mid-range disk storage
4.5 IBM TotalStorage tape solutions
4.5.1 IBM Virtualization Engine TS7510
4.6 Tivoli Storage Manager
4.6.1 Overview
4.6.2 Components
4.6.3 Tivoli Storage Manager applications
4.6.4 Tivoli Storage Manager APIs and DR550
4.7 DB2 Content Manager
4.7.1 Overview
4.7.2 Architecture
4.7.3 Standards and data model
4.8 DB2 CommonStore
4.8.1 DB2 CommonStore for Exchange Server
4.8.2 DB2 CommonStore for Lotus Domino
4.8.3 DB2 CommonStore for SAP
4.9 More information

Part 2. Evaluating ILM for your organization

Chapter 5. An ILM quick assessment
5.1 Initial steps
5.2 Getting business and storage information
5.3 Defining data collection reports
5.3.1 Creating groups of data
5.3.2 Collecting reports from TPC for Data
5.4 Classifying data and analyzing reports
5.4.1 Types of data
5.4.2 Data classification
5.5 Defining actions with classified data
5.5.1 Actions for non-business files
5.5.2 Actions for duplicate files
5.5.3 Actions for temporary files
5.5.4 Actions for stale files
5.5.5 Actions to RDBMSs space
5.6 ILM - Return on investment (ROI)
5.6.1 Data classification and storage cost
5.6.2 Data management and personnel cost
5.6.3 Long-term retention and non-compliancy penalties cost
5.6.4 Backup/archiving solutions cost - Disk or tape
5.7 ILM Services offerings from IBM

Chapter 6. The big picture for an ILM implementation framework
6.1 The big picture and why you should care about it
6.1.1 Business consulting, assessment, definition
6.1.2 Application and server hardware
6.1.3 Software infrastructure and automation
6.1.4 Hardware infrastructure
6.1.5 Management tools
6.2 What to do now - The many entry points to ILM

Part 3. Sample solutions

Chapter 7. ILM initial implementation
7.1 Storage management
7.1.1 Capacity management
7.1.2 Service level management
7.2 Optimization of storage occupation
7.2.1 Reclaimable space
7.2.2 Avoiding over allocation
7.3 Tiered storage
7.3.1 What storage devices to use

Chapter 8. Enforcing data placement
8.1 Moving from the initial ILM scenario
8.2 Requirements for data placement enforcement
8.2.1 Data classification
8.2.2 Enforcing data placement

Chapter 9. Data lifecycle and content management solution
9.1 Moving from the previous steps
9.2 Placement in function of moment in lifecycle
9.2.1 Determining the value of the data
9.2.2 Placement of data
9.2.3 Movement of data
9.2.4 Using document management systems
9.3 E-mail management
9.3.1 Reclaim invalid space
9.3.2 E-mail archiving
9.4 IBM System Storage Archive Manager
9.4.1 Chronological archive retention
9.4.2 Event-based retention policy

Related publications
IBM Redbooks
How to get IBM Redbooks
Help from IBM

Index

Figures
1-1 Information Lifecycle Management
1-2 Data value changes over time
1-3 ILM elements
1-4 Traditional non-tiered storage environment
1-5 Multi-tiered storage environment
1-6 ILM policies
1-7 Information value changes
1-8 Value of information and archive/retrieve management
1-9 SNIA vision for ILM
1-10 ITIL processes
2-1 IS, BC, and ILM
2-2 Business Continuity
2-3 Convergence of technologies
2-4 Information On Demand storage environment
2-5 Information Assets and Systems
3-1 Data classification
3-2 Information classes
3-3 Storage tiers
3-4 Flow diagram
4-1 Storage resource management lifecycle
4-2 First screen
4-3 Availability report
4-4 Asset Report of a computer
4-5 Largest files by computer
4-6 SVC block virtualization
4-7 SVC components
4-8 Tivoli Storage Manager components
4-9 Tivoli Storage Manager for Mail
4-10 Tivoli Storage Manager for Databases
4-11 Tivoli Storage Manager for Application Servers
4-12 Tivoli Storage Manager for ERP
4-13 Tivoli Storage Manager for Hardware
4-14 Tivoli Storage Manager for Space Management
4-15 TSM APIs and DR550
4-16 Enterprise content management components
4-17 IBM content management portfolio
5-1 Quick Assessment steps
5-2 Access File Summary report
5-3 Access Time Summary report
5-4 Disk Capacity Summary report
5-5 Oldest Orphaned Files report
5-6 Storage Access Times report
5-7 Storage Capacity report
5-8 Storage Modification Times report
5-9 Total Freespace report
5-10 User Space Usage report
5-11 Wasted Space report
5-12 Largest Files report
5-13 Duplicate Files report
5-14 File Types Report
5-15 Access Time report
5-16 Modification Time Reporting
5-17 Database Storage by Computer report
5-18 Database Storage by Computer report table
5-19 Total Database Free report
5-20 Segments with Wasted Space report
5-21 Access Time Reporting by report group
5-22 Access Time reporting file systems
5-23 Modification Time Reporting by report group
5-24 Modification Time reporting file systems
6-1 ILM implementation framework at service level maturity
6-2 Business, assessment, and ongoing tasks
6-3 Server types and agents
6-4 Software components
6-5 Storage Hardware Infrastructure
6-6 Placement of WBEM and CIM technology
7-1 Generic process
7-2 Capacity management tasks
7-3 Space usage over time
7-4 Service level management activities
7-5 Steps to create a service level agreement
7-6 Overview of space usage
7-7 Top 10 file types using the most space
7-8 Defining non-business data
7-9 Ratio between temporary space and used space remains constant
7-10 Ratio between temporary used space and total used space increasing
7-11 Decrease in ratio between temporary and used space
7-12 Creating a profile - Defining statistics to gather
7-13 Defining the file filters
7-14 Defining the scan systems
7-15 Defining the scan profile
7-16 Generating a report
7-17 Temporary space report
7-18 Organizational and project-based file-sharing structure
7-19 Moving stale data in a two-tier Tivoli Storage Manager HSM solution
7-20 TPC for Data access time reporting
7-21 HSM data placement in function of time
7-22 Overview of two-tier HSM implementation
7-23 File system unused space reporting
7-24 Defining the space allocation trigger level
7-25 Database unused space report
7-26 Handling over-allocated file systems and databases
7-27 Matching data classes to storage tiers
8-1 Initial static ILM implementation
8-2 Adding automated data placement
8-3 Adding file-based location rules
8-4 Complete picture of data types to tiers mapping
8-5 Server tiered volume mapping
8-6 Enforcing data placement using TPC for Data
9-1 Adding the lifecycle dimension
9-2 The changing value of data over time
9-3 Business process to data mapping
9-4 Cost versus benefit for storage placement
9-5 Process example
9-6 E-mail propagation
9-7 E-mail archiving diagram
9-8 Standard IBM System Storage Archive Manager archive retention
9-9 Event driven archiving mechanism - Honoring RETVER
9-10 Event driven archiving mechanism - Honoring RETMIN - case 2


Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. 
You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.


Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
Eserver Eserver Redbooks (logo) developerWorks iSeries xSeries z/OS AIX Domino DB2 Universal Database DB2 Enterprise Storage Server FlashCopy Informix IBM Lotus Notes Lotus Notes OS/390 POWER5 Redbooks System Storage Tivoli TotalStorage VideoCharger Virtualization Engine WebSphere

The following terms are trademarks of other companies: JDBC, Streamline, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Excel, Microsoft, Outlook, PowerPoint, Visio, Visual Basic, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.


Preface
Every organization has large amounts of data to store, use, and manage, and for most, this quantity is increasing. However, over time, the value of this data changes. How can we map data to appropriately priced storage media, so that it can be accessed in a timely manner when needed, retained for as long as required, and disposed of when no longer needed? Information Lifecycle Management (ILM) provides solutions.

What is ILM? ILM is the process of managing information from creation, through its useful life, to its eventual destruction, in a manner that aligns storage costs with the changing business value of information. We can think of ILM as an integrated solution of five IT management and infrastructure components working together: service management (service levels), content management, workflow management (or process management), storage management, and storage infrastructure.

This IBM Redbook will help you understand what ILM is, why it is of value to your organization, and some suggested ways to implement it using IBM products. It focuses particularly on data lifecycle management. Look for other Redbooks on topics such as archive and retention management.

The team that wrote this redbook


This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, San Jose Center.

Charlotte Brooks is an IBM Certified IT Specialist and Project Leader for Storage Solutions at the International Technical Support Organization, San Jose Center. She has 15 years of experience with IBM in storage hardware and software support, deployment, and management. She has written many Redbooks, and has developed and taught IBM classes in all areas of storage and storage management. Before joining the ITSO in 2000, she was the Technical Support Manager for Tivoli Storage Manager in the Asia Pacific Region.

Giacomo Chiapparini is an IBM Certified System Expert for Open Systems Storage Solutions in IBM Global Services Switzerland. He has eight years of practical experience in designing, implementing, and supporting different storage solutions across the country. He is an SNIA certified professional and holds product certifications for Linux, AIX, and Windows. His areas of expertise include storage products, storage networking, and open systems server hardware with corresponding operating systems.

Wim Feyants is an IBM Certified IT Specialist in Belgium. He has 11 years of experience in different IT fields. His areas of expertise include storage infrastructure and storage management solutions, and designing and implementing them for clients. He has written extensively on different storage-related matters, including a number of Redbooks.

Pallavi Galgali is a Software Engineer at the IBM India Software Lab in Pune, India. Pallavi has been involved in development and maintenance projects with products such as SAN File System and Advanced Distributed File System. She has co-authored the IBM Redbook The IBM TotalStorage Solutions Handbook, SG24-5250, and an article on developerWorks titled A comparison of security subsystems on AIX, Linux, and Solaris. She holds a degree in Computer Engineering from Pune Institute of Computer Technology, India. Her areas of expertise include storage networking, file systems, and device drivers.

Vinicius Franco Jose is a Senior IT Specialist at IBM Brazil. He has been in the IT industry for eight years and has extensive experience implementing UNIX and Storage solutions.

His areas of expertise include several TotalStorage Disk and Tape solutions, Tivoli Storage products, and TPC. He also has experience in storage networking, SAN Volume Controller, and SAN File System. He holds product certifications including AIX, Tivoli Storage Manager, and TPC for Data. He is currently working in IBM Global Services on client support and services delivery. He is also a member of the ILM IT Solution group in Brazil, deploying solutions for ILM projects.

Figure 1 The team: Wim, Charlotte, Pallavi, Giacomo, Vinicius

Thanks to the following people for their contributions to this book:

David Bartlett, Larry Heathcote, BJ Klingenberg, Toby Marek, Scott McPeek, Dave Russell, Evan Salop, Chris Saul, Scott Selvig, Alan Stuart, Sergei Varbanov
IBM

Emma Jacobs, Mary Lovelace, Sangam Racherla
International Technical Support Organization, San Jose Center

Julie Czubik
International Technical Support Organization, Poughkeepsie Center

Taya Wyss
Enterprise Strategy Group

Pillip Mills
SNIA Rep for IBM

Become a published author


Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or clients.


Your efforts will help increase product acceptance and client satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability. Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html

Comments welcome
Your comments are important to us! We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:

Use the online "Contact us" review redbook form found at:
ibm.com/redbooks

Send your comments in an email to:


redbook@us.ibm.com

Mail your comments to: IBM Corporation, International Technical Support Organization Dept. QXXE Building 80-E2 650 Harry Road San Jose, California 95120-6099


Part 1. ILM overview
In this part we introduce basic definitions and concepts for ILM, as well as some of the core IBM and Tivoli products in this solution space.


Chapter 1. Introduction to ILM
Information is essential to any business, and organizations face the challenge of managing it efficiently throughout its lifecycle, in line with its business value. The quantity of information grows, its value changes over time, and it becomes increasingly costly and complex to store and manage. This chapter discusses the importance of ILM and its benefits, and introduces the elements of data lifecycle management.


1.1 What is ILM


Information Lifecycle Management (ILM) is a process for managing information through its lifecycle, from conception until disposal, in a manner that optimizes storage and access at the lowest cost. ILM is not just hardware or software; it includes processes and policies to manage the information. It is designed upon the recognition that different types of information can have different values at different points in their lifecycle. Predicting storage needs and controlling costs can be especially challenging as the business grows.

The overall objectives of managing information with ILM are to help reduce the total cost of ownership (TCO) and to help implement data retention and compliance policies. In order to implement ILM effectively, owners of the data need to determine how information is created, how it ages, how it is modified, and if and when it can safely be deleted. ILM segments data according to value, which can help create an economical balance and sustainable strategy to align storage costs with business objectives and information value. The adoption of ILM technologies and processes, as shown in Figure 1-1, turns that strategy into a business reality.

Figure 1-1 Information Lifecycle Management

1.2 Why ILM is needed


In order to run your business efficiently, you need fast access to your stored data. But in today's business environment, you face increasing challenges: the explosion in the sheer volume of digital information, the increasing cost of storage management, tight regulatory requirements for data retention, and manual business and IT processes that are increasingly complex and error prone.

Although the total value of stored information has increased overall, not all data is created equal, and the value of that data to business operations fluctuates over time. This is shown in Figure 1-2 on page 5, and is commonly referred to as the data lifecycle. The existence of the data lifecycle means that all data cannot be treated the same.


[Chart: data value (vertical axis) plotted against time, from 7 days to 10 years (horizontal axis), for database, development code, e-mail, productivity files, and MPEG data. Source: Enterprise Strategy Group]

Figure 1-2 Data value changes over time

Figure 1-2 shows typical values of different types of data, mapped over time. Most frequently, the value of data decreases over time, albeit at different rates of decline. However, infrequently accessed or inactive data can suddenly become valuable again as events occur, or as new business initiatives or projects are taken on.

Historically, the need to retain information has resulted in a "buy more storage" mentality. However, this approach has only served to increase overall storage management costs and complexity, and has increased the demand for hard-to-find qualified personnel. Executives today are tasked with reducing overall spending while supporting an ever-increasing number of service and application demands. While support and management tasks increase, IT departments are being asked to justify their position by demonstrating business value to the enterprise. IT must also develop and enhance the infrastructure in order to support business initiatives while facing some or all of these data storage issues:

- Costs associated with e-mail management can reduce employee productivity in many companies.
- Backup and recovery windows continue to expand as data volumes grow unmanaged.
- Inactive data consumes valuable, high-performance disk storage space.
- Duplicate data copies consume additional storage space.
- As data continues to grow and management costs increase, budgets continue to be under pressure.

1.2.1 IT challenges and how ILM can help


There are many challenges facing business today that make organizations think about managing their information more efficiently and effectively. Among these are some particular issues that might motivate you to develop an ILM strategy and solution:

- Information and data growing faster than the storage budget.
- What data can I delete and when? What to keep and for how long?
- Disk dedicated to specific applications, which inhibits sharing.
- Duplicated copies of files and other data. Where are they and how much space do they use?
- No mapping of the value of the data to the value of the hardware on which it is stored.


- Longer time required to back up data, but the window keeps shrinking.
- Storage performance does not meet requirements.
- Low utilization of existing assets; for example, in open environments, storage utilization rates of around 30 percent are quite typical.
- Manual processes causing potential business risk due to errors.
- Regulatory requirements dictate long-term retention for certain data.
- Inability to achieve backup/recovery/accessibility objectives for critical data.
- Inability to grow the support staff to keep up with the demand for storage management in an increasingly complex environment.
- Multiple backup and restore approaches and processes.
- Storage management requirements not well defined.

In response to these challenges, it is necessary to define specific objectives to support and improve information management:

- Control demand for storage and create policies for allocation.
- Reduce hardware, software, and storage personnel costs.
- Improve personnel efficiency by optimizing systems and productivity.
- Define and enforce policies to manage the lifecycle of data.
- Define and implement the appropriate storage strategy to address current and future business requirements.

In the next section, we describe the major ILM solution components and how they can help you to overcome these challenges, and propose an ILM assessment for planning and design.

1.3 ILM elements


To manage the data lifecycle and make your business ready for On Demand, there are four main elements that can move your business toward a structured ILM environment, as shown in Figure 1-3 on page 7. They are:

- Tiered storage management
- Long-term data retention
- Data lifecycle management
- Policy-based archive management


[Figure content: ILM is the process of managing information, from creation to disposal, in a manner that aligns costs with the changing value of information. Its four elements are:

- Tiered storage: incorporates tiered storage and advanced SAN technologies, with storage ranging from enterprise disk and midrange disk to tape, to optimize costs and availability.
- Long-term data retention: addresses needs for risk and compliance objectives; leverages content management and records management technologies.
- Data lifecycle management: exploits Hierarchical Storage Management for any data that needs to be protected and retained for a period of time and then disposed of; establishes policies and automation to move data among different storage systems.
- Policy-based archive management: e-mail, database, and application archive; focused offerings driven by the efficiency of major applications.]

Figure 1-3 ILM elements

In the next four sections we describe each of these elements in detail:

- 1.3.1, Tiered storage management on page 7
- 1.3.2, Long-term data retention on page 9
- 1.3.3, Data lifecycle management on page 12
- 1.3.4, Policy-based archive management on page 14

1.3.1 Tiered storage management


Most organizations today seek a storage solution that can help them manage data more efficiently. They want to reduce the costs of storing large and growing amounts of data and files, and maintain business continuity. Tiered storage provides benefits such as:

- Reducing overall disk-storage costs, by allocating the most recent and most critical business data to higher performance disk storage while moving older and less critical business data to lower cost disk storage.
- Speeding business processes, by providing high-performance access to the most recent and most frequently accessed data.
- Reducing administrative tasks and human errors. Older data can be moved to lower cost disk storage automatically and transparently.

Typical storage environment


Storage environments typically have multiple tiers of data value, such as application data that is needed daily and archive data that is accessed infrequently. But typical storage configurations offer only a single tier of storage, as in Figure 1-4 on page 8, which limits the ability to optimize cost and performance.


Figure 1-4 Traditional non-tiered storage environment

Multi-tiered storage environment


A tiered storage environment is the infrastructure needed to align storage cost with the changing value of information. The tiers will be related to data value. The most critical data is allocated to higher performance disk storage, while less critical business data is allocated to lower cost disk storage. Each storage tier will provide different performance metrics and disaster recovery capabilities. Creating classes and storage device groups is an important step to configure a tiered storage ILM environment. We will provide details of this in later chapters of this book. Figure 1-5 on page 9 shows a multi-tiered storage environment.


Figure 1-5 Multi-tiered storage environment

An IBM ILM solution in a tiered storage environment is designed to:

- Reduce the total cost of ownership (TCO) of managing information. It can help optimize data costs and management, freeing expensive disk storage for the most valuable information.
- Segment data according to value. This can help create an economical balance and sustainable strategy to align storage costs with business objectives and information value.
- Help make decisions about moving, retaining, and deleting data, because ILM solutions are closely tied to applications.
- Manage information and determine how it should be managed based on content, rather than migrating data based on technical specifications. This approach can result in more responsive management, and offers you the ability to retain or delete information in accordance with business rules.
- Provide the framework for a comprehensive enterprise content management strategy.

Key IBM products for tiered storage and storage virtualization solutions are:

- IBM TotalStorage SAN Volume Controller (SVC)
- IBM TotalStorage DS family of disk storage: DS4000, DS6000, and DS8000
- IBM TotalStorage tape drives, tape libraries, and virtual tape solutions

For details of these, see Chapter 4, Product overview on page 37.
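Matching data classes to storage tiers lends itself to a simple policy table. The following Python sketch is purely illustrative and is not part of any IBM product discussed in this book; the class names, tier names, and per-gigabyte costs are hypothetical assumptions used only to show how a tiered placement decision might be expressed and costed.

# Illustrative sketch only: map hypothetical data classes to storage tiers.
# Tier names, class names, and costs are assumptions, not product values.

TIERS = {
    "tier1_enterprise_disk": 5.00,   # assumed cost per GB, high-performance disk
    "tier2_midrange_disk": 2.00,     # assumed cost per GB, lower-cost (SATA) disk
    "tier3_tape": 0.25,              # assumed cost per GB, tape or virtual tape
}

PLACEMENT_POLICY = {
    "mission_critical": "tier1_enterprise_disk",
    "business_important": "tier2_midrange_disk",
    "archive_or_stale": "tier3_tape",
}

def place(data_class, size_gb):
    """Return the target tier and an estimated storage cost for one data class."""
    tier = PLACEMENT_POLICY.get(data_class, "tier2_midrange_disk")
    return tier, size_gb * TIERS[tier]

if __name__ == "__main__":
    for data_class, size_gb in [("mission_critical", 500), ("archive_or_stale", 2000)]:
        tier, cost = place(data_class, size_gb)
        print(f"{data_class}: {size_gb} GB -> {tier} (estimated cost {cost:.2f})")

Later chapters of this book show how this kind of mapping between data classes and storage tiers is established in practice.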

1.3.2 Long-term data retention


There is a rapidly growing class of data that is best described by the way in which it is managed rather than by the arrangement of its bits. The most important attribute of this kind of data is its retention period, hence it is called retention-managed data, and it is typically kept in an archive or a repository. In the past it has been variously known as archive data, fixed content data, reference data, unstructured data, and other terms implying its read-only nature. It is often measured in terabytes and is kept for long periods of time, sometimes forever.

In addition to the sheer growth of data, laws and regulations governing the storage and secure retention of business and client information are increasingly becoming part of the business landscape, making data retention a major challenge to any institution. An example of these is the US Sarbanes-Oxley Act of 2002. Businesses must comply with these laws and regulations. Regulated information can include e-mail, instant messages, business transactions, accounting records, contracts, or insurance claims processing, all of which can have different retention periods, for example, 2 years, 7 years, or forever. Moreover, some data must be kept just long enough and no longer. Indeed, content is an asset when it needs to be kept; however, data kept past its mandated retention period can become a liability. Furthermore, the retention period can change due to factors such as litigation. All these factors mandate tight coordination and the need for ILM.

Not only are there numerous state and governmental regulations that must be met for data storage, but there are also industry-specific and company-specific ones, and these regulations are constantly being updated and amended. Organizations need to develop a strategy to ensure that the correct information is kept for the correct period of time, and is readily accessible when it must be retrieved at the request of regulators or auditors.

It is easy to envisage the exponential growth in data storage that will result from these regulations, and the accompanying requirement for a means of managing this data. Overall, the management and control of retention-managed data is a significant challenge for the IT industry when taking into account factors such as cost, latency, bandwidth, integration, security, and privacy.
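To make the notion of a retention period more concrete, the short Python sketch below shows one way the disposition of retention-managed data could be evaluated. The record types, retention periods, and the litigation-hold flag are illustrative assumptions only; they are not taken from any regulation or IBM product, and real enforcement is provided by retention-managed storage such as IBM System Storage Archive Manager, described later in this book.

# Illustrative sketch only: decide whether retention-managed data may be disposed of.
# Record types and retention periods below are assumptions for demonstration.

from datetime import date, timedelta

RETENTION_RULES = {
    "email_correspondence": timedelta(days=7 * 365),  # assume keep for 7 years
    "insurance_claim": timedelta(days=2 * 365),        # assume keep for 2 years
    "corporate_record": None,                           # assume keep forever
}

def disposition(record_type, created, today, litigation_hold=False):
    """Return whether a record must be retained or is eligible for disposal."""
    if litigation_hold:
        return "retain (litigation hold overrides the normal retention period)"
    period = RETENTION_RULES[record_type]
    if period is None:
        return "retain (no expiry defined)"
    expiry = created + period
    if today < expiry:
        return f"retain until {expiry.isoformat()}"
    return "retention period elapsed: eligible for disposal"

if __name__ == "__main__":
    print(disposition("insurance_claim", date(2003, 1, 15), date(2006, 2, 1)))
    print(disposition("email_correspondence", date(2003, 1, 15), date(2006, 2, 1)))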

Regulations examples
It is not within the scope of this book to enumerate and explain the regulations in existence today. For illustration purposes only, we list some of the major regulations and accords in Table 1-1, summarizing their intent and applicability.
Table 1-1 Some regulations and accords affecting companies

SEC/NASD
Intention: Prevent securities fraud.
Applicability: All financial institutions and companies regulated by the SEC.

Sarbanes-Oxley Act
Intention: Ensure accountability for public firms.
Applicability: All public companies trading on a U.S. exchange.

HIPAA
Intention: Privacy and accountability for health care providers and insurers.
Applicability: Health care providers and insurers, both human and veterinarian.

Basel II (also known as The New Accord)
Intention: Promote greater consistency in the way banks and banking regulators approach risk management across national borders.
Applicability: Financial industry.

21 CFR 11
Intention: Approval accountability.
Applicability: FDA regulation of pharmaceutical and biotechnology companies.


For example, in Table 1-2 we list some requirements found in SEC 17a-4 with which financial institutions and broker-dealers must comply. Information produced by these institutions, regarding solicitation and execution of trades and so on, is referred to as compliance data, a subset of retention-managed data.
Table 1-2 Some SEC/NASD requirements

Requirement: Capture all correspondence (unmodified) [17a-4(f)(3)(v)].
Met by: Capture incoming and outgoing e-mail before reaching users.

Requirement: Store in non-rewritable, non-erasable format [17a-4(f)(2)(ii)(A)].
Met by: Write Once Read Many (WORM) storage of all e-mail, all documents.

Requirement: Verify automatically recording integrity and accuracy [17a-4(f)(2)(ii)(B)].
Met by: Validated storage to magnetic, WORM.

Requirement: Duplicate data and index storage [17a-4(f)(3)(iii)].
Met by: Mirrored or duplicate storage servers (copy pools).

Requirement: Enforce retention periods on all stored data and indexes [17a-4(f)(3)(iv)(c)].
Met by: Structured records management.

Requirement: Search/retrieve all stored data and indexes [17a-4(f)(2)(ii)(D)].
Met by: High-performance search retrieval.

IBM ILM data retention strategy


Regulations and other business imperatives, as we just briefly discussed, stress the need for an Information Lifecycle Management process and tools to be in place. The unique experience of IBM with the broad range of ILM technologies, and its broad portfolio of offerings and solutions, can help businesses address this particular need and provide them with the best solutions to manage their information throughout its lifecycle. IBM provides a comprehensive and open set of solutions to help. IBM has products that provide content management, data retention management, and sophisticated storage management, along with the storage systems to house the data. To specifically help companies with their risk and compliance efforts, the IBM Risk and Compliance framework is another tool designed to illustrate the infrastructure capabilities needed to help address the myriad of compliance requirements. Using the framework, organizations can standardize the use of common technologies to design and deploy a compliance architecture that may help them deal more effectively with compliance initiatives. For more details about the IBM Risk and Compliance framework, visit:
http://www-306.ibm.com/software/info/openenvironment/rcf/

Key products of IBM for data retention and compliance solutions are:
- IBM Tivoli Storage Manager, including IBM System Storage Archive Manager
- IBM DB2 Content Manager Family, which includes DB2 Content Manager, Content Manager OnDemand, CommonStore for Exchange Server, CommonStore for Lotus Domino, and CommonStore for SAP
- IBM DB2 Records Manager
- IBM TotalStorage DS4000 with S-ATA disks
- IBM System Storage DR550
- IBM TotalStorage Tape (including WORM) products


For details on these products, see Chapter 4, Product overview on page 37.

Important: The IBM offerings are intended to help clients address the numerous and complex issues relating to data retention in regulated and non-regulated business environments. Nevertheless, each client's situation is unique, and laws, regulations, and business considerations impacting data retention policies and practices are constantly evolving. Clients remain responsible for ensuring that their information technology systems and data retention practices comply with applicable laws and regulations, and IBM encourages clients to seek appropriate legal counsel to ensure their compliance with those requirements. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the client is in compliance with any law.

1.3.3 Data lifecycle management


At its core, the process of ILM moves data up and down a path of tiered storage resources, including high-performance, high-capacity disk arrays, lower-cost disk arrays such as serial ATA (SATA), tape libraries, and permanent archival media where appropriate. Yet ILM involves more than just data movement; it encompasses scheduled deletion and regulatory compliance as well. Because decisions about moving, retaining, and deleting data are closely tied to application use of data, ILM solutions are usually closely tied to applications.

ILM has the potential to provide the framework for a comprehensive information-management strategy, and helps ensure that information is stored on the most cost-effective media. This enables administrators to make use of tiered and virtual storage, as well as process automation. By migrating unused data off of more costly, high-performance disks, ILM is designed to help:
- Reduce costs to manage and retain data.
- Improve application performance.
- Reduce backup windows and ease system upgrades.
- Streamline data management.
- Allow the enterprise to respond to demand in real time.
- Support a sustainable storage management strategy.
- Scale as the business grows.

ILM is designed to recognize that different types of information can have different value at different points in their lifecycle. As shown in Figure 1-6 on page 13, data can be allocated to a specific storage level aligned to its cost, with policies defining when and where data will be moved.


Figure 1-6 ILM policies

Sometimes, however, the value of a piece of information changes: data that was previously inactive and was migrated to lower-cost storage may be needed again and should be served from high-performance disk. A data lifecycle management policy can be defined to move the information back to enterprise storage, keeping the storage cost aligned with the data value, as illustrated in Figure 1-7.

Figure 1-7 Information value changes
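As a simple illustration of this kind of policy-driven movement in both directions, the following Python sketch migrates files that have not been accessed for a configurable number of days to a lower-cost tier, and recalls a file to the high-performance tier when it is needed again. The tier paths and the 90-day threshold are assumptions made for the example; in practice, products such as Tivoli Storage Manager for Space Management perform this movement transparently rather than through a hand-written script.

```python
import os
import shutil
import time

# Illustrative tier locations and policy threshold (assumptions for this sketch).
TIER1_PATH = "/storage/tier1"      # high-performance enterprise disk
TIER2_PATH = "/storage/tier2"      # lower-cost disk, for example SATA
AGE_THRESHOLD_DAYS = 90            # migrate files not accessed for 90 days

def migrate_inactive_files(src: str, dst: str, max_idle_days: int) -> int:
    """Move files whose last access time exceeds the policy threshold."""
    cutoff = time.time() - max_idle_days * 86400
    moved = 0
    for root, _dirs, files in os.walk(src):
        for name in files:
            path = os.path.join(root, name)
            if os.stat(path).st_atime < cutoff:
                target_dir = os.path.join(dst, os.path.relpath(root, src))
                os.makedirs(target_dir, exist_ok=True)
                shutil.move(path, os.path.join(target_dir, name))
                moved += 1
    return moved

def recall(relative_path: str) -> None:
    """Move a previously migrated file back to the high-performance tier."""
    src = os.path.join(TIER2_PATH, relative_path)
    dst = os.path.join(TIER1_PATH, relative_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.move(src, dst)

if __name__ == "__main__":
    count = migrate_inactive_files(TIER1_PATH, TIER2_PATH, AGE_THRESHOLD_DAYS)
    print(f"Migrated {count} inactive files to the lower-cost tier")
```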

Key products of IBM for lifecycle management are:
- IBM TotalStorage Productivity Center
- IBM TotalStorage SAN Volume Controller (SVC)
- IBM Tivoli Storage Manager, including IBM System Storage Archive Manager
- IBM Tivoli Storage Manager for Space Management

For details of these products, see Chapter 4, Product overview on page 37.

1.3.4 Policy-based archive management


As businesses of all sizes migrate to e-business solutions and a new way of doing business, they already have mountains of data and content that have been captured, stored, and distributed across the enterprise. This wealth of information provides a unique opportunity. By incorporating these assets into e-business solutions, and at the same time delivering newly generated information to their employees and clients, a business can reduce costs and information redundancy and leverage the potential profit-making aspects of its information assets.

Growth of information in corporate databases such as Enterprise Resource Planning (ERP) systems and e-mail systems makes organizations think about moving unused data off the high-cost disks. They need to:
- Identify database data that is no longer being regularly accessed and move it to an archive where it remains available.
- Define and manage what to archive, when to archive, and how to archive from the mail system or database system to the back-end archive management system.

Database archive solutions can help improve performance for online databases, reduce backup times, and improve application upgrade times. E-mail archiving solutions are designed to reduce the size of corporate e-mail systems by moving e-mail attachments and/or messages to an archive from which they can easily be recovered if needed. This action helps reduce the need for end-user management of e-mail, improves the performance of e-mail systems, and supports the retention and deletion of e-mail.

The way to do this is to migrate and store all information assets into an e-business enabled content manager. ERP databases and e-mail solutions generate large volumes of information and data objects that can be stored in content management archives. An archive solution frees system resources while maintaining access to the stored objects for later reference. Letting it manage and migrate data objects means that newly created, higher-value information remains readily accessible, while data that has been archived on less expensive media can still be retrieved, as shown in Figure 1-8 on page 15.


Figure 1-8 Value of information and archive/retrieve management
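The selection step described above can be illustrated with a short Python sketch that identifies candidate objects by size and last access time, copies them to an archive area, and leaves a small stub behind for later retrieval. The paths, thresholds, and stub mechanism are assumptions made for this example; products such as DB2 CommonStore and DB2 Content Manager implement equivalent policies inside the mail or database system itself.

```python
import os
import shutil
import time

# Illustrative policy: archive objects larger than 1 MB that have not been
# accessed for 180 days. Paths and thresholds are assumptions for this sketch.
MAIL_STORE = "/var/mail/attachments"
ARCHIVE_STORE = "/archive/mail"
MIN_SIZE_BYTES = 1024 * 1024
MAX_IDLE_DAYS = 180

def select_archive_candidates(path: str):
    """Yield files that match the size and age criteria of the archive policy."""
    cutoff = time.time() - MAX_IDLE_DAYS * 86400
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            st = os.stat(full)
            if st.st_size >= MIN_SIZE_BYTES and st.st_atime < cutoff:
                yield full

def archive(files, archive_root: str) -> int:
    """Copy candidates to the archive area and replace them with a small stub."""
    archived = 0
    for src in files:
        os.makedirs(archive_root, exist_ok=True)
        dst = os.path.join(archive_root, os.path.basename(src))
        shutil.copy2(src, dst)
        with open(src + ".stub", "w") as stub:      # pointer left behind for retrieval
            stub.write(dst + "\n")
        os.remove(src)
        archived += 1
    return archived

if __name__ == "__main__":
    print(archive(select_archive_candidates(MAIL_STORE), ARCHIVE_STORE))
```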

Key products of IBM for archive management are:
- IBM Tivoli Storage Manager, including IBM System Storage Archive Manager
- IBM DB2 Content Manager family of products
- IBM DB2 CommonStore family of products

For details of these products, see Chapter 4, Product overview on page 37.

1.4 Standards and organizations


The success and adoption of any new technology, and any improvement to existing technology, is greatly influenced by standards. Standards are the basis for the interoperability of hardware and software from different, and often rival, vendors. Although standards bodies and organizations such as the Internet Engineering Task Force (IETF), American National Standards Institute (ANSI), and International Organization for Standardization (ISO) publish these formal standards, other organizations and industry associations, such as the Storage Networking Industry Association (SNIA), play a significant role in defining the standards and market development and direction.

1.4.1 Storage Networking Industry Association (SNIA)


The Storage Networking Industry Association is an international computer system industry forum of developers, integrators, and IT professionals who evolve and promote storage networking technology and solutions. SNIA was formed to ensure that storage networks become efficient, complete, and trusted solutions across the IT community. IBM is one of the founding members of this organization. SNIA is uniquely committed to delivering storage networking solutions to the broader market. SNIA is using its Storage Management Initiative (SMI) and its Storage Management Initiative Specification (SMI-S) to create and promote adoption of a highly functional, interoperable management interface for multi-vendor storage networking products. SMI-S makes multi-vendor storage networks simpler to implement and easier to manage. IBM has led the industry in not only supporting the SMI-S initiative, but also using it across its hardware and software product lines. The specification covers fundamental operations of communications between management console clients and devices, auto-discovery, access, security, the ability to provision volumes and disk resources, LUN mapping and masking, and other management operations.

Data Management Forum


SNIA has formed the Data Management Forum (DMF) to focus on defining, implementing, qualifying, and teaching improved methods for the protection, retention, and lifecycle management of data.

Vision for ILM by SNIA and DMF


The Data Management Forum defines ILM as a new management practice for the datacenter. ILM is not a specific product, nor is it just about storage and data movement to low-cost disk. It is a standards-based approach to automating datacenter operations by using business requirements, business processes, and the value of information to set policies and service level objectives for how the supporting storage, compute, and network infrastructure operates. The key question that flows from this vision of ILM is "How do we get there?", since these capabilities do not fully exist today. This is the work of SNIA and the Data Management Forum: to unify the industry towards a common goal, to develop the relevant standards, to facilitate interoperability, and to conduct market education around ILM. Figure 1-9 illustrates the SNIA vision for ILM.

Figure 1-9 SNIA vision for ILM

For additional information about the various activities of SNIA and DMF see its Web site at:
http://www.snia.org

1.5 IT Infrastructure Library and value of ILM


The intent of this section is to introduce you to the IT Infrastructure Library (ITIL) and the value of ILM within the ITIL methodology. We begin by defining ITIL and its Service Support processes.

1.5.1 What is ITIL


ITIL is a process-based methodology used by IT departments to ensure that they can deliver IT services to end users in a controlled and disciplined way. It incorporates a set of best practices that are applicable to all IT organizations, no matter what their size or what technology is used. ITIL is used to create and deliver service management processes. These tasks are made easier by the use of service and system management tools.

Over recent decades, multiple IT process models have been developed. ITIL is the only one that is not proprietary:
- Late 1970s: Information Systems Management Architecture (ISMA) (IBM)
- Late 1980s: IT Infrastructure Library V1 (ITIL) (CCTA, now OGC)
- 1995: IT Process Model (ITPM) (IBM)
- 2000: Enterprise Operational Process Framework (IBM)
- 2000: IT Service Management Reference Model (ITSM) (HP)
- 2000-2001: Microsoft Operations Framework (MOF) (Microsoft)
- 2001-2002: IT Infrastructure Library V2 (ITIL) (OGC)

Note: ITIL is a registered trademark of the OGC. OGC is the UK Government's Office of Government Commerce. CCTA is the Central Computer and Telecommunications Agency.

ITIL has a library of books describing best practices for IT service management; they describe the goals, activities, inputs, and outputs of processes. ITIL takes a worldwide approach to IT management, and its methodology accepts that specific procedures can vary from organization to organization. ITIL is not tied to any particular vendor, and IBM has been involved with ITIL since its inception in 1988.

1.5.2 ITIL management processes


The ITIL approach to creating and managing service management processes is widely recognized around the world, and the adoption of its principles is clearly growing, as evidenced by new groups appearing in more countries every year. The service management disciplines are grouped into the two areas of Service Support and Service Delivery. There are now eleven basic processes used in the areas of Service Support and Service Delivery, as shown in Figure 1-10 on page 18. Since it can take a long time to implement these disciplines, it is not uncommon to find only some of the processes in use initially.

Chapter 1. Introduction to ILM

17

Figure 1-10 ITIL processes (Service Delivery, providing quality, cost-effective IT services: Capacity Management, Availability Management, IT Business Continuity, Service Level Management, Financial Management; Service Support, providing stability and flexibility for IT service provision: Release Management, Configuration Management, Change Management, Incident and Problem Management, Service Desk)

Now we briefly explain each component of Service Support and Service Delivery.

Service Support
The processes in the Service Support group are all concerned with providing stability and flexibility for the provisioning of IT Services.

Configuration Management
Configuration Management is responsible for registering all components of the IT service (including clients, contracts, SLAs, hardware and software components, and more) and for maintaining a repository of configurable attributes and relationships between the components.

Service Desk
The Service Desk acts as the main point-of-contact for the users of the service.

Incident Management
Incident Management registers incidents, allocates severity, and coordinates the efforts of the support teams to ensure timely and correct resolution of problems. Escalation times are noted in the SLA and are as such agreed between the client and the IT department. Incident Management also provides statistics to Service Level Management to demonstrate the service levels achieved.

Problem Management
Problem Management implements and uses procedures to perform problem diagnosis and identify solutions that correct problems. It registers solutions in the configuration repository, and agrees on escalation times internally with Service Level Management during the SLA negotiation. It provides problem resolution statistics to support Service Level Management.


Change Management
Change Management ensures that the impact of a change to any component of a service is well known, and the implications regarding service level achievements are minimized. This includes changes to the SLA documents and the Service Catalog, as well as organizational changes and changes to hardware and software components.

Release Management
Release Management manages the master software repository and deploys software components of services. It deploys changes at the request of Change Management, and provides management reports on the deployment.

Service Delivery
The processes in the Service Delivery group are all concerned with providing quality, cost-effective IT services.

Service Level Management


The purpose of Service Level Management is to manage client expectations and negotiate Service Level Agreements. This involves finding out the client requirements and determining how these can best be met within the agreed budget. Service Level Management works together with all IT disciplines and departments to plan and ensure delivery of services. This involves setting measurable performance targets, monitoring performance, and taking action when targets are not met.

Financial Management for IT Services


Financial Management registers and maintains cost accounts related to the usage of IT services. It delivers cost statistics and reports to Service Level Management to assist in obtaining the right balance between service cost and delivery. It assists in pricing the services in the Service Catalogue and Service Level Agreements.

IT Service Continuity Management


Service Continuity Management plans and ensures the continuing delivery, or minimum outage, of the service by reducing the impact of disasters, emergencies, and major incidents. This work is done in close collaboration with the company's business continuity management, which is responsible for protecting all aspects of the company's business, including IT.

Capacity Management
Capacity Management is responsible for planning and ensuring that adequate capacity with the expected performance characteristics is available to support the Service Delivery. It delivers capacity usage, performance, and workload management statistics, as well as trend analysis to Service Level Management.

Availability Management
Availability Management is responsible for planning and ensuring the overall availability of the services. It provides management information in the form of availability statistics, including security violations, to Service Level Management. This discipline may also include negotiating underpinning contracts with external suppliers, and a definition of maintenance windows and recovery times.


1.5.3 ITIL and ILM value


ILM is a service-based solution with policies and processes. The ITIL methodology has the processes needed to deliver and support the storage services that manage the lifecycle of information. The ILM components (tiered storage, archive management, long-term retention, and data lifecycle management), aligned to ITIL processes, are a powerful solution for IT organizations to manage their data. By implementing ILM within the ITIL methodology, they will be able to achieve its objectives: enabling the management of the data lifecycle, and providing quality, stability, flexibility, and cost-effective IT services.


Chapter 2. ILM within an On Demand storage environment


This chapter explains how ILM fits into the strategy for IBM On Demand Business. It describes the pillars of Information On Demand, that is:
- Infrastructure Simplification
- Business Continuity
- Information Lifecycle Management

It also describes the IBM On Demand storage environment, which helps organizations achieve Information On Demand, and discusses the ILM components of this environment.


2.1 Information On Demand


In today's ever more competitive and growing business environment, information is an increasingly valuable, but costly, organizational asset. The volume of information is growing very rapidly in most organizations. And with this, the need to protect and manage information also continues to increase. Organizations are seeking to reduce costs, improve efficiency, and increase effectiveness by aligning IT investments according to information value and business needs. This is a step towards Information On Demand. With Information On Demand, business can respond with flexibility and speed to client requirements and market opportunity. Getting there involves three aspects:
- Infrastructure Simplification (IS): Simplification of the underlying IT infrastructure and its management to lower the cost and complexity
- Business Continuity (BC): Assuring security and durability of information
- Information Lifecycle Management (ILM): Efficiently managing information over its lifecycle

Figure 2-1 IS, BC, and ILM

2.1.1 Infrastructure Simplification


Infrastructure simplification is a process by which organizations contain expenses, enable business growth, and reduce operational risks by optimizing IT resources. Simplified infrastructures hold the promise of improved system optimization and Total Cost of Ownership (TCO), higher personnel productivity, and greater application availability through infrastructure resiliency. IBM products are designed to help clients obtain these benefits through consolidation, virtualization, and automated management. Once simplified, the infrastructure can be better managed for lower cost and with fewer errors.

Note: For more information about Infrastructure Simplification, see:
http://www-1.ibm.com/servers/storage/solutions/is/

2.1.2 Business Continuity


The business climate in today's on demand era is highly competitive. Clients, employees, suppliers, and business partners expect to be able to tap into your information at any hour of the day from any corner of the globe. If you have continuous business operations, then people can get what they need from your business, helping bolster your success and competitive advantage. Thus downtime is unacceptable today. Businesses must also be increasingly sensitive to issues of client privacy and data security, so that vital information assets are not compromised. To achieve all this, you need a comprehensive Business Continuity plan for your business. As shown in Figure 2-2, Business Continuity can be achieved with:
- High Availability
- Continuous Operations
- Disaster Recovery

High Availability is achieved by means of fault tolerant, failure resistant infrastructure supporting continuous application processing. Continuous Operations imply non-disruptive backups and system maintenance coupled with continuous availability of applications. Disaster Recovery means protection against unplanned outages such as natural disasters through reliable and predictable recovery methods.

Figure 2-2 Business Continuity

Note: For more information about Business Continuity, see:


http://www-1.ibm.com/servers/storage/solutions/business_continuity/

2.1.3 Information Lifecycle Management


In Chapter 1, Introduction to ILM on page 3, we defined ILM as a process for managing information through its lifecycle, from conception until disposal, in a manner that optimizes storage and access at the lowest cost. The most efficient ILM strategy for a business manages information according to its value. For small and medium sized enterprises, predicting storage needs and controlling costs can be especially challenging as the business grows. IBM's unique experience with the broad range of ILM technologies and its broad portfolio of offerings and solutions, including offerings in Tivoli Storage Management software, TotalStorage hardware, TotalStorage Open Software, and DB2 Information Management software, can help provide businesses with the best solutions to manage their information throughout its lifecycle.

Note: For more information about Information Lifecycle Management, refer to:
http://www-1.ibm.com/servers/storage/solutions/ilm/

2.2 IBM and ILM


At IBM, ILM is being addressed by the convergence of several technologies, as shown in Figure 2-3.

Figure 2-3 Convergence of technologies

Tivoli software provides automated storage management functions via TotalStorage Productivity Center and Tivoli Storage Manager. They help effectively manage the growth of information. IBM TotalStorage (and IBM System Storage) provides hardware offerings like DR550, DS8000, DS6000, and DS4000, as well as tape solutions to enable ILM implementations. IBM TotalStorage also provides offerings in the area of virtualization like SAN Volume Controller (SVC), which help clients use their storage environments most optimally. IBM also offers DB2's integrated content management software solution to manage and archive unstructured data.

IBM has a very long history in the area of Information Lifecycle Management with a variety of products. IBM has marketed tape drives and libraries since 1952. IBM has been selling various disk storage systems including DS4000 and Enterprise Storage Server (ESS) since 1957. The world's first Hierarchical Storage Manager was developed by IBM in 1974. Tivoli Storage Manager, which provides backup, archive, and lifecycle management for a wide range of operating platforms, has been available for more than 10 years. In the area of content management, IBM's DB2 Content Manager has been available since 1988. IBM provides extensive planning and solutions services through its IBM Global Services organizations to assist businesses in developing their ILM strategies, providing assessments, helping businesses meet the challenges of compliance, and so on. Thus IBM is a complete one-stop storage solution with its wide range of products combined with services, education, and financing.

2.3 IBM Information On Demand environment


To achieve Information On Demand via Infrastructure Simplification, Business Continuity, and Information Lifecycle Management, as explained above, IBM has developed the Information On Demand environment. This is shown in Figure 2-4. From an ILM perspective, we are focused mainly on the bottom element, Information Assets and Systems; however, the diagram helps to position this in the wider IT and business environment.

Figure 2-4 Information On Demand storage environment

Figure 2-5 on page 26 shows the details of the Information Assets and Systems box.


Figure 2-5 Information Assets and Systems

Here we talk about the building blocks of this environment, and then in 2.4, Supporting ILM through On Demand storage environment on page 28, we discuss the components that belong specifically to Information Lifecycle Management.

Systems
The Systems layer is the hardware infrastructure for the information assets. It includes the servers, networking, and storage systems. In the storage arena, IBM provides a complete range of hardware, providing flexibility in the choice of service quality and cost structure. The products support a common, industry standard management interface (SMI-S). For more information about this, see the Web site:
http://www.ibm.com/servers/storage/

Resource virtualization
Resource virtualization products are designed to improve the flexibility and utilization of the hardware. Resource virtualization includes virtualization of both the servers and the storage. For server virtualization, IBM provides Virtual Machines, Hypervisor, Virtual Ethernet, and Virtual I/O. Storage virtualization includes tape virtualization (Virtual Tape Server and the TS7000 series), disk virtualization (SAN Volume Controller), and array partitioning (for example, LPARs on the IBM TotalStorage DS8000). Disk virtualization works by pooling the storage volumes, files, and file systems into a single logical repository of capacity for centralized management. This repository can include storage capacity from multiple vendors and platforms in heterogeneous environments. Virtualization products also comply with SMI-S. For more information about this, see the Web site:
http://www.ibm.com/servers/storage/software/virtualization/index.html

Infrastructure management
Infrastructure management is used to provide a single point of management and automation and is also considered from both the server and storage perspective. For servers, IBM Director and Enterprise Workload Manager provide the control point and workload management function, respectively. For storage, infrastructure management is designed to make resource sharing possible across the enterprise, including heterogeneous networks. It interacts with the bottom layers in the environment, like virtualization and hardware, using common and open interfaces. It helps empower administrators by providing an integrated view of the entire storage environment, including software and hardware. Storage infrastructure management provides insight into the historic, operational, and predictive analytics of the storage environment that, in turn, can help administrators improve storage capacity and network utilization, and help avoid business outages. It also supports policy-based automation, such as capacity provisioning, performance optimization, and data management, helping to provide outstanding business agility. For information about products related to storage infrastructure management, see the Web site:
http://www.ibm.com/servers/storage/software/center/index.html

Retention and Lifecycle Management


This is what we cover in detail in this book. It includes archive management for files, e-mail, databases, and applications, as well as HSM.

Archive Management
Archive management provides complete solutions designed to help enterprises archive, retain, and manage data to help satisfy regulatory, legal, and other business requirements. Archive management products are interoperable with many content management products available in the marketplace, including the IBM DB2 Content Management family. For more information about product offerings in this area, see the Web site:
http://www.ibm.com/software/tivoli/products/storage-mgr-data-reten/

HSM
IBM Open Software Family Hierarchical Storage Management (HSM) capabilities provide a way to capture low-activity or inactive data and feed it into a hierarchy of lower cost or tiered storage. This helps control data storage growth and costs. Automated, policy-based capabilities determine where data should be stored, based on factors such as its criticality to the business, how accessible and available it should be, and the cost structures of available devices. Interoperability with IBM Content Management and Records Management products allows enterprise data to be moved from one medium to another with efficiency while helping avoid disruptions in service. For more information about product offerings in this area, see the Web site:
http://www.ibm.com/software/tivoli/products/storage-mgr-space/

Business Continuity
Business Continuity is the process of maintaining availability of IT services, including timely recovery in the event of failure. Similarly to Retention and Lifecycle Management, and Infrastructure Management, Business Continuity is also considered from a server and storage perspective. Servers use various clustering techniques to maintain availability. From a storage perspective, Advanced Copy Services (or Replication), including FlashCopy, Metro Mirror, and Global Mirror, also provide service availability. IBM also offers a range of products for managing recovery.

Advanced Copy Services


Advanced copy and mirroring functions are designed to help reduce application downtime and provide real-time remote mirroring for disaster recovery. For more information about this, see the Web site:
http://www.ibm.com/servers/storage/disk/enterprise/advanced_copy.html

Recovery Management
Recovery management solutions are designed to quickly and reliably recover enterprise data when needed, utilizing centralized Web-based management, intelligent backup and archiving (with minimal or no impact on application availability), and automated policy-based data migration copy services. For more information about product offerings in this area, see the Web site:
http://www.ibm.com/software/tivoli/products/storage-mgr/


2.4 Supporting ILM through On Demand storage environment


At the heart of our delivery of Information Lifecycle Management solutions is the Information On Demand environment, as described in the previous section. In support of Information Lifecycle Management, the Information On Demand storage environment delivers:
- A complete hardware infrastructure offering different media types with different qualities of service and cost structures
- Storage Infrastructure Management to help IT managers categorize their data
- Virtualization software to pool the different cost/quality-of-service storage hardware and provide policies that automatically place the different categories of data on the right cost of storage
- Hierarchical Storage Management software to control storage growth
- Archive Management software to help manage the cost of storing bookkeeping and compliance data for long periods of time
- Content Management software for integrating and delivering unstructured business information


Chapter 3. Implementing ILM
This chapter gives an overview of the logical stages involved in implementing ILM in any organization. We also mention some IBM products that prove helpful during these stages. For detailed information about these products, please refer to Chapter 4, Product overview on page 37. A more detailed consideration of this process follows in the next chapters. Finally, this chapter gives information about IBM's ILM consulting and services, including the IBM four-step process for an ILM setup.


3.1 Logical stages in ILM implementation


In simple terms, an ILM implementation mainly involves identifying the right data and keeping it on the right media at the right time, until its ultimate disposal. The whole implementation process can be subdivided into three main stages, as follows:
- Assessment and planning
- Execution
- Monitoring

This section describes these three stages in detail.

3.1.1 Assessment and planning


Assessment and planning is the first and the most important stage in the ILM implementation. This is a sequential process that involves different phases: gathering service levels, classifying data, finding information classes, designing storage tiers, and deciding ILM policies. This section explains these phases in more detail.

Gathering service levels


This phase lets you understand the requirements and objectives for your organization's data. Service levels can be defined in terms of availability, performance, recoverability, accessibility, security, support, and billing. The service levels may be expressed using language like:
- I need RAID 5 level of availability for all of my business data.
- I need 500 MB per second of performance for my media data.
- I have a Recovery Time Objective (RTO) of 25 minutes for all SAP data.
- Data that is less than 30 days old needs to be on RAID 5, while all other data can be on RAID 1.

These are just some examples of service level requirements. They can vary from organization to organization, depending upon the types of data and its value to the business. Thus, in this step we gather all the service levels required from different departments of the organization for their various types of data.
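One way to capture the gathered requirements in a structured, comparable form is shown in the following Python sketch; the departments, metrics, and values are hypothetical examples used only to illustrate the idea of a service level catalog.

```python
from dataclasses import dataclass

@dataclass
class ServiceLevel:
    """A structured record of the service levels gathered for one data type."""
    data_type: str
    availability: str        # for example a RAID level or availability target
    performance_mb_s: int    # required throughput
    rto_minutes: int         # Recovery Time Objective
    retention_days: int

# Hypothetical entries gathered from different departments.
CATALOG = [
    ServiceLevel("SAP data",    availability="RAID 5", performance_mb_s=200, rto_minutes=25,   retention_days=2555),
    ServiceLevel("Media data",  availability="RAID 1", performance_mb_s=500, rto_minutes=240,  retention_days=365),
    ServiceLevel("HR archives", availability="RAID 5", performance_mb_s=50,  rto_minutes=1440, retention_days=3650),
]

# A catalog like this feeds the later phases: data classification and tier design.
for sl in sorted(CATALOG, key=lambda s: s.rto_minutes):
    print(f"{sl.data_type:12} RTO={sl.rto_minutes:>5} min  perf={sl.performance_mb_s} MB/s")
```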

Data classification
Data classification is a very important phase in ILM strategy because a business does not want to spend money protecting and storing data that is not needed or used, or is not business critical. Data classification can be defined as sorting of like data, based on consumer wants and needs, into data classes that provide the ability to simplify and comprehend the business process requirements. Once we study the data from different angles or dimensions, such as the size of the files, access patterns, access times, types of the files, age of the files, and so on, we can categorize it into different classes as per its value to the business. Data classification may differ from organization to organization, depending upon the kind of data they have. The classes might be application specific, like SAP data, e-mail data, and so on. They also can be department specific, like HR data, development data, and so on.

One example of data classification is shown in Figure 3-1 on page 31. Here valid data is all the business-critical data. Stale or orphan data is data that is no longer required but is still being stored. Non-business files can be employees' personal data. All other terms used in the classification are self explanatory. In a classification like this, we want to manage the valid data and the system files, while minimizing, archiving, or eliminating all the other data. We will show other examples of data classes in later chapters in this book.

Figure 3-1 Data classification

IBM TotalStorage Productivity Center for Data (TPC for Data) is very helpful in the data classification process. See Chapter 4, Product overview on page 37, for more information about this product.
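The reports from TPC for Data feed this classification step. As a rough approximation of the logic involved, the following Python sketch assigns files to classes similar to those in Figure 3-1 based on file type and last access time; the extension lists and the two-year staleness threshold are assumptions chosen for illustration.

```python
import os
import time

# Illustrative classification rules (assumptions for this sketch).
NON_BUSINESS_EXTENSIONS = {".mp3", ".avi", ".jpg"}
TEMP_EXTENSIONS = {".tmp", ".dmp", ".log"}
STALE_AFTER_DAYS = 2 * 365     # not accessed for two years -> stale/orphan candidate

def classify(path: str) -> str:
    """Assign a file to a data class based on simple attribute rules."""
    ext = os.path.splitext(path)[1].lower()
    idle_days = (time.time() - os.stat(path).st_atime) / 86400
    if ext in NON_BUSINESS_EXTENSIONS:
        return "non-business"
    if ext in TEMP_EXTENSIONS:
        return "temp/dump/log"
    if idle_days > STALE_AFTER_DAYS:
        return "stale/orphan"
    return "valid"

def classify_tree(root: str) -> dict:
    """Count files per data class under a directory tree."""
    summary = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            cls = classify(os.path.join(dirpath, name))
            summary[cls] = summary.get(cls, 0) + 1
    return summary

if __name__ == "__main__":
    print(classify_tree("/data"))      # "/data" is a hypothetical scan target
```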

Finding information classes


In this phase, we study the different service levels and data classes that were identified in the previous phases, so that we can define information classes that have similar service levels. Thus, information classes are groups of data classes that have similar service level requirements. If we try to derive information classes from the data classes mentioned in Figure 3-1, we may come up with the classes named A, B, and C, with decreasing service level objectives:
- Information class A: All valid data.
- Information class B: All system files, redundant application data, log files, dump files, and temp files.
- Information class C: All duplicate data, stale/orphan data, and non-business data.

Figure 3-2 on page 32 shows this mapping.


Figure 3-2 Information classes

Designing storage tiers


This phase gets inputs from all the above phases. Here we need to consider the types of storage that are available, budgetary constraints for new purchasing, and data growth trends that will affect future demands for the storage. After considering all these factors, we design storage tiers for the in-scope environment. These tiers can exploit the attributes of different device types like cost, performance, concurrent access, and so on. Storage tiers are often expressed as follows:

Platinum
- Mirrored (potentially remotely), online recovery points
- High performance, high availability, high-end SAN fabric
- High duty cycles, frequently backed up
- Example: Enterprise disk, disk subsystem replication, director class switches

Gold
- Mirrored, less frequent online recovery points than platinum
- Less cache
- Less frequently backed up
- Example: Enterprise disk, disk subsystem replication, director class switches

Silver
- RAID, but not mirroring
- Lower performance, less redundancy for components
- Backed up once a day
- Example: Mid-range disk, Fibre Channel SAN with lower cost switches

Bronze
- SATA disks
- Fibre Channel or iSCSI SAN

Figure 3-3 on page 33 also shows how the quality of service and cost can be mapped against storage tiers.


Figure 3-3 Storage tiers

As shown above, we categorize the available (or planned) storage into a hierarchy to exploit its features and optimize its usage to reduce cost.

Deciding ILM policies


This is the final phase in the assessment and planning stage of implementing ILM. Here we map information classes to storage tiers. We define the policies for active (frequently accessed) as well as inactive data. Policies for active data involve data placement on various tiers, and data movement or migration across tiers. Policies for inactive data involve backup, archiving, retention, or destruction, as per regulatory compliance. Policies can be defined addressing file/database ownerships, retrieval from archive, quotas, threshold reporting, compliance, and deletion/backup/recovery of files or databases. Policies may look like:
- Data on tier 3 will get backed up every day.
- Move information class A data that is older than 30 days from tier 1 to tier 2.
- Destroy all information class B data that is older than one year.

Clearly defined policies are essential to enable easier data path management. The success of the whole ILM implementation is very much dependent on well defined policies.
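Expressed in a structured form, such policies might look like the following Python sketch, where each rule maps an information class and an age threshold to an action. The specific rules shown are illustrative only and loosely mirror the examples above; in a real environment the policies are defined in the storage management products rather than in a script.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PolicyRule:
    """One ILM policy: apply an action to an information class after a given age."""
    info_class: str
    min_age_days: Optional[int]   # None means the rule applies regardless of age
    action: str                   # for example "backup-daily", "migrate:tier2", "destroy"

# Illustrative rules, loosely following the examples in the text.
POLICIES = [
    PolicyRule("A", min_age_days=30,   action="migrate:tier2"),
    PolicyRule("B", min_age_days=365,  action="destroy"),
    PolicyRule("C", min_age_days=None, action="backup-daily"),
]

def actions_for(info_class: str, age_days: int):
    """Return the actions that apply to data of a given class and age."""
    return [r.action for r in POLICIES
            if r.info_class == info_class
            and (r.min_age_days is None or age_days >= r.min_age_days)]

print(actions_for("A", 45))   # ['migrate:tier2']
print(actions_for("B", 100))  # [] - class B data is only destroyed after one year
```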

3.1.2 Execution
After assessment and planning, in the execution stage we actually implement whatever is designed and planned. The main tasks involved are:
- Implementation of the storage tiers
- Enforcing policies

We first simplify the storage infrastructure and consolidate it. We pool the storage according to its classes of service. We can also use block level virtualization to create logical volumes from physical disks. At the end we come up with the storage tiers, as designed in the planning stage.


IBM's wide range of disk and tape solutions can be used to design the storage tiers. The IBM SAN Volume Controller provides block level virtualization. Once all the tiers are in place, we enforce the ILM policies. Products like Tivoli Storage Manager provide the facilities to enforce these policies automatically. Please see Chapter 4, Product overview on page 37, for more information about these products.

3.1.3 Monitoring
The last, but not least, stage is monitoring. In this stage, we monitor the in-scope environment for major changes in data types, sizes, and access trends. These changes might drive changes to the data classes or information classes in the future. We also monitor whether the performance delivered by the storage tiers is as expected. This stage actually deals with verifying the ILM implementation for its expected returns. IBM TotalStorage Productivity Center is a very useful product for this stage.

3.1.4 Flow diagram


The flow diagram in Figure 3-4 gives a complete view of the logical stages of ILM implementation explained in the above sections.

Figure 3-4 Flow diagram (Assessment and Planning: gathering service levels, data classification, finding information classes, designing storage tiers, deciding ILM policies; followed by Execution and Monitoring)

3.2 IBM ILM consulting and services


ILM offers organizations a methodology for evolving existing data-centric storage infrastructures into business-centric infrastructures that can more efficiently manage data according to its value to the organization. Often, IT departments in organizations do not have the in-house expertise to design ILM storage infrastructure solutions. In this situation, IBM Global Services provides a comprehensive ILM storage infrastructure design offering that provides IT departments with the plan they need to help transform their existing data storage infrastructure into a robust, business-centric infrastructure that takes optimal advantage of ILM and provides greater value to the organization.

IBM takes into consideration the following factors while designing solutions:
- Data mix
- Current storage infrastructure
- Data retention requirements
- ILM storage infrastructure goals and objectives
- Short and long term opportunities for storage wins

It then provides a custom-tailored design for ILM storage infrastructure implementation that can help:
- Optimize existing storage efficiencies.
- Manage the costs of storage infrastructure changes.
- Increase compliance and ROI.
- Validate the implementation tools under consideration.
- Ensure a smooth rollout.

The solution offering process involves four main steps, as follows.

Step 1: ILM data collection


Collect the information/data related to the in-scope environment, both automatically and manually.

Step 2: Analysis and data classification


Define the classes of data, the ILM policies for each class, and the requirements on how data in each class should be stored throughout its lifecycle (capacity, performance, availability, protection, retention). Identify opportunities for so-called quick wins, for example, data cleanup, rationalized space usage, and adaptive capacity plans.

Step 3: Methodology and architecture definition


Define the model and architecture for the storage technology, the storage management processes and organization that support the data classes requirements and the ILM storage infrastructure policies. Establish a preliminary business case for ILM.

Step 4: Solution roadmap and recommendations


Establish a decision model, apply the defined architecture to known ILM storage infrastructure vendor solutions, and select the "best-fit" solution. Identify existing gaps between current and target environments and then create a complete program for change relative to the deployment and implementation of the selected ILM storage infrastructure solution.

At the end of this four-step process the client organization will have:
- Recommendations for optimizing existing storage for short-term "wins"
- A custom blueprint for a more effective ILM storage infrastructure and variable-cost storage hierarchy
- A description of corresponding storage services and classes of services
- A roadmap that shows the IT department how to get from where it is to where it wants to be
- A validated business case that anticipates short-term and long-term ROI

For more information about IGS offerings for ILM, contact your IBM representative.


Chapter 4. Product overview
This chapter gives an overview of various IBM products that aid clients in the process of implementing ILM.


4.1 Summary of IBM products for ILM


We have seen in Chapter 2, ILM within an On Demand storage environment on page 21, that ILM can be a key step towards evolving to an On Demand storage environment. IBM has solutions for every aspect of ILM. As data classification is a very important step in ILM implementation, IBM TotalStorage Productivity Center for Data provides rich functionality for enterprise-wide data reporting. For building storage tiers, IBM Tivoli Storage Manager and IBM TotalStorage SAN Volume Controller (backed by a variety of IBM and other storage systems) are the main contributors. For the storage infrastructure itself, IBM also offers a wide range of disk storage systems, from entry-level and mid-range disk systems to enterprise disk systems, plus tape systems ranging from a single tape drive to tape libraries that store petabytes of data.

For matching the information classes to storage tiers and applying policies, the following products are indicated:
- IBM Tivoli Storage Manager for Space Management
- IBM TotalStorage SAN Filesystem
- IBM Tivoli Storage Manager Archive
- IBM System Storage Archive Manager
- IBM DB2 CommonStore
- IBM System Storage DR550

In the area of content management, IBM has products like IBM DB2 Content Manager and IBM DB2 Records Manager. This chapter gives a brief overview of these products, which form the basis of an ILM implementation.

4.2 TotalStorage Productivity Center for Data


As a component of the IBM TotalStorage Productivity Center, IBM TotalStorage Productivity Center for Data is designed to help you improve your storage ROI by:
- Improving storage utilization
- Enabling intelligent capacity planning
- Helping you manage more storage with the same staff
- Supporting high application availability

From an ILM perspective, this is a very helpful tool that provides various reports on the enterprise data. These reports play a major role in the data classification process. It also proves useful in the ongoing monitoring process once the initial ILM implementation is complete.

4.2.1 Overview
IBM TotalStorage Productivity Center (TPC) for Data helps discover, monitor, and create enterprise policies for disks, storage volumes, file systems, files, and databases. Knowing where all your storage is located and the properties of your data places you in a better position to act intelligently on your data. Architected for efficiency and ease-of-use, IBM TotalStorage Productivity Center for Data uses a single agent per server to provide detailed information without a high consumption of network bandwidth or CPU cycles.


Figure 4-1 shows the concept of storage resource management from a lifecycle perspective. The idea is to establish a base understanding of the storage environment, with an emphasis on discovering areas where simple actions can deliver rapid return on investment. Ideally, the process should identify potential areas of exposure, evaluate the data residing on the servers, set up control mechanisms for autonomic management, and start the capacity planning process by predicting growth.

Figure 4-1 Storage resource management lifecycle

TPC for Data monitors storage assets, capacity, and usage across an enterprise. It can look at:
- Storage from a host perspective: Manage all the host-attached storage, capacity, and resources attributed to file systems, users, directories, and files, as well as the view of the host-attached storage from the storage subsystem perspective.
- Storage from an application perspective: Monitor and manage the storage activity inside different database entities, including instance, tablespace, and table.
- Storage utilization: Provide chargeback information so that storage usage is justified or accounted for.

TPC for Data provides over 300 standardized reports (and the ability to customize reports) about file systems, databases, and storage infrastructure. These reports provide the storage administrator information about:
- Assets
- Availability
- Capacity
- Usage
- Usage violation
- Backup

With this information, the storage administrator can:
- Discover and monitor storage assets enterprise-wide.
- Report on enterprise-wide assets, files, file systems, databases, users, and applications.
- Provide alerts (set by the user) on issues such as capacity problems and policy violations.
- Support chargebacks by usage or capacity.

4.2.2 Key aspects


The key aspects are discussed in this section.


Basic menu
Figure 4-2 shows the IBM TotalStorage Productivity Center - initial screen, which displays when the application is started. It shows a quick summary of the overall health of the storage environment, and can highlight potential problem areas for further investigation.

Figure 4-2 First screen

This screen contains four viewable areas, which cycle among seven predefined panels. It shows the following statistics:
- Enterprise-wide summary: The Enterprise-wide Summary panel shows statistics accumulated from all the agents. The statistics are total file system capacity available, total file system capacity used, total number of monitored servers, total number of users, total number of disks, and so on.
- File system used space: Displays a pie chart showing the distribution of used and free space in all file systems.
- Users consuming the most space: By default, displays a bar chart of the users who are using the largest amount of file system space.
- Monitored server summary: Shows a table of total disk file system capacity for the monitored servers, sorted by operating system type.
- File systems with least free space percentage: Shows a table of the most full file systems, including the percent of space free, the total file system capacity, and the file system mount point.
- Users consuming the most space report: Shows the same information as the Users Consuming the Most Space panel, but in a table format.
- Alerts pending: Shows active alerts that have been triggered but are still pending.

Discover and monitor information


TPC for Data uses three methods to discover information about the assets in the storage environment: Pings, probes, and scans. These are typically set up to run automatically as scheduled tasks. You can define different ping, probe, and scan jobs to run against different Agents or groups of Agents (for example, to run a regular probe of all Windows systems), according to your particular requirements.

Pings
A ping is a standard ICMP ping that checks registered agents for availability. If an agent does not respond to a ping (or a predefined number of pings), you can set up an alert to take some action. The actions could be one, any, or all of:
- SNMP trap
- Notification at login
- Entry in the Windows event log
- Run a script
- Send e-mail to a specified user or users

Pings are used to generate Availability Reports, which list the percentage of times a computer has responded to the ping. An example of an Availability Report for Ping is shown in Figure 4-3.

Figure 4-3 Availability report
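The following Python sketch approximates such a ping-based availability check and one possible alert action. The agent host names, the e-mail addresses, the local mail relay, and the Linux-style ping options are all assumptions made for the example; TPC for Data performs the equivalent work through its scheduled ping jobs and alert definitions.

```python
import subprocess
import smtplib
from email.message import EmailMessage

# Hypothetical agent hosts to check.
AGENTS = ["server01.example.com", "server02.example.com"]

def is_reachable(host: str, attempts: int = 3) -> bool:
    """Send a few pings and report whether the host answered any of them."""
    for _ in range(attempts):
        # Assumes a Linux-style ping: one packet, two-second timeout.
        result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                                capture_output=True)
        if result.returncode == 0:
            return True
    return False

def alert(host: str) -> None:
    """One possible alert action: e-mail the storage administrator."""
    msg = EmailMessage()
    msg["Subject"] = f"Agent {host} is not responding to pings"
    msg["From"] = "monitor@example.com"
    msg["To"] = "storage-admin@example.com"
    msg.set_content("The agent failed a scheduled availability check.")
    with smtplib.SMTP("localhost") as smtp:     # assumes a local mail relay
        smtp.send_message(msg)

if __name__ == "__main__":
    for agent in AGENTS:
        if not is_reachable(agent):
            alert(agent)
```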


Probes
Probes are used to gather information about the assets and system resources of monitored servers, such as processor count and speed, memory size, disk count and size, file systems, and so on. Probes also gather information about the files, instances, logs, and objects that make up the monitored databases. Data collected by probes is used in the Asset Reports. Figure 4-4 shows an Asset Report for the detected computer.

Figure 4-4 Asset Report of a computer

Scans
The scan process is used to gather statistics about usage and trends of the server storage. Scans also gather information about the storage usage and trends within the monitored databases. Data collected by the scan jobs are tailored by Profiles. Results of scan jobs are stored in the enterprise repository. This data supplies the data for the capacity, usage, usage violations, and backup reporting functions. These reports can be scheduled to run regularly, or they can be run ad hoc by the administrator.

Profiles limit the scanning according to the parameters specified in the profile. Profiles are used in scan jobs to specify what file patterns will be scanned, what attributes will be gathered, what summary view will be available in reports, and the retention period for the statistics. TPC for Data supplies a number of default profiles that can be used, or additional profiles can be defined. Some of these include:
- Largest files: Gathers statistics on the largest files
- Largest directories: Gathers statistics on the largest directories
- Most obsolete files: Gathers statistics on the most obsolete files

Figure 4-5 on page 43 shows the sample report of largest files by computer.

Figure 4-5 Largest files by computer
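The statistics that these default profiles gather can be approximated with a short script. The following Python sketch walks a directory tree and reports the largest and the least recently accessed (most obsolete) files; the scan root and the report size are assumptions for the example, and a real scan is of course performed by the TPC for Data agents rather than by such a script.

```python
import os
import heapq

SCAN_ROOT = "/data"     # hypothetical scan target
TOP_N = 10

def walk_files(root: str):
    """Yield (path, size_bytes, atime) for every regular file under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue                      # skip files that disappear mid-scan
            yield path, st.st_size, st.st_atime

files = list(walk_files(SCAN_ROOT))

# "Largest files" profile: the TOP_N biggest files.
largest = heapq.nlargest(TOP_N, files, key=lambda f: f[1])

# "Most obsolete files" profile: the TOP_N least recently accessed files.
most_obsolete = heapq.nsmallest(TOP_N, files, key=lambda f: f[2])

for path, size, _ in largest:
    print(f"{size:>12}  {path}")
```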

Reporting
Reporting in TPC for Data is very rich, with over 300 predefined views, and the capability to customize those standard views. Reports can be scheduled or run as needed. You can also create your own individual reports according to particular needs and set them to run as needed or in batch (regularly). Reports can be produced in table format or in a variety of charting (graph) views. You can export reports to CSV or HTML formats for external usage. Reports are generated against data already in the repository. A common practice is to schedule scans and probes just before running reports.

Reporting can be done at almost any level in the system, from the enterprise down to a specific entity and any level in between. Reports can be produced either system-wide or grouped into views, such as by computer or operating system type. TPC for Data allows you to group information about similar entities (disk, file systems, and so on) from different servers or business units into a Summary Report so that business and technology administrators can manage an enterprise infrastructure. Or you can summarize information from a specific server. The flexibility and choice of configuration is entirely up to the administrator. Major reporting categories for file systems and databases are:

Assets Reporting uses the data collected by probes to build a hardware inventory of the storage assets. You can then navigate through a hierarchical view of the assets by drilling down through computers, controllers, disks, file systems, directories, and exports. For databases, information about instances, databases, tables, and data files is presented for reporting.

Availability Reporting shows responses to ping jobs, as well as computer uptime.

Capacity Reporting shows how much storage capacity is installed, how much of the installed capacity is being used, and how much is available for future growth. Reporting is done by disk and file system, and for databases, by database.

Usage Reporting shows the usage and growth of storage consumption, grouped by file system and computers, individual users, or enterprise-wide.

Usage Violation Reporting shows violations to the corporate storage usage policies, as
defined through TPC for Data. Violations are either of Quota (defining how much storage a user or group of users is allowed) or Constraint (defining which file types, owners, and file sizes are allowed on a computer or storage entity). You can define what action should be taken when a violation is detected, for example, SNMP trap, e-mail, or running a user-written script.

Backup Reporting identifies files which are at risk because they have not been backed up.

Alerts
An alert defines an action to be performed if a particular event occurs or condition is found. Alerts can be set on physical objects (computers and disks) and on logical objects (file systems, directories, users, databases, and operating system user groups). Alerts can tell you, for instance, that a disk has a lot of recent defects, or that a file system or database is approaching capacity.

Alerts on computers and disks come from the output of probe jobs and generate an alert for each object that meets the triggering condition. If you have specified a triggered action (running a script, sending an e-mail, and so on), that action is taken when the condition is met. Alerts on file systems, directories, users, and operating system user groups come from the combined output of a probe and a scan; again, any specified action is performed when the condition is met.

An alert is always registered in the Alert log, and you can also define one, some, or all of the following additional actions:

Send an e-mail indicating the nature of the alert.
Run a specific script with relevant parameters supplied from the content of the alert.
Make an entry in the Windows event log.
Pop up the next time the user logs in to IBM TotalStorage Productivity Center for Data.
Send an SNMP trap.
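The triggering-condition idea can be sketched in a few lines of Python. This is only an illustration of the concept, not TPC for Data code; the mount points, the 85% threshold, and the print-based action are invented for the example.

import shutil

def check_filesystem_capacity(mount_points, threshold=0.90):
    """Return (mount, used_fraction) for file systems above the threshold."""
    violations = []
    for mount in mount_points:
        usage = shutil.disk_usage(mount)
        used_fraction = usage.used / usage.total
        if used_fraction >= threshold:
            violations.append((mount, used_fraction))
    return violations

def trigger_action(mount, used_fraction):
    """Placeholder for a triggered action (e-mail, SNMP trap, script)."""
    print(f"ALERT: {mount} is {used_fraction:.0%} full")

if __name__ == "__main__":
    for mount, used in check_filesystem_capacity(["/", "/home"], threshold=0.85):
        trigger_action(mount, used)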

4.2.3 Product highlights


Following are the main product highlights of TPC for Data.

Enterprise reporting
TPC for Data provides over 300 comprehensive enterprise-wide reports designed to help administrators make intelligent capacity management decisions based on current and trended historical data.

Policy-based management
Policy-based management enables administrators to set thresholds, while monitoring can detect when thresholds have been exceeded and issue an alert or initiate a predefined action.


Automated file system extension


This feature enables administrators to ensure application availability by providing on demand storage for file systems.

Direct Tivoli Storage Manager integration


This allows administrators to initiate a Tivoli Storage Manager archive or backup via a constraint or directly from a file report, simplifying policy-based actions.

Database capacity reporting


This is designed to enable administrators to see how much storage is being consumed by users, groups of users, and OSs within the database application.

Chargeback capabilities
Chargeback capabilities are designed to provide usage information by department, group, or user, making data owners aware of and accountable for their data usage.
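A hedged sketch of how such a chargeback figure might be computed from per-tier usage records follows; the tier rates, departments, and usage numbers are invented for illustration and are not product defaults.

from collections import defaultdict

# Hypothetical per-tier monthly rates in dollars per GB (illustrative only).
RATE_PER_GB = {"T1": 0.50, "T2": 0.20, "T3": 0.05}

def chargeback(usage_records):
    """usage_records: iterable of (department, tier, gigabytes)."""
    bill = defaultdict(float)
    for department, tier, gigabytes in usage_records:
        bill[department] += gigabytes * RATE_PER_GB[tier]
    return dict(bill)

if __name__ == "__main__":
    records = [("finance", "T1", 800), ("finance", "T3", 5000),
               ("engineering", "T2", 2500)]
    for department, cost in chargeback(records).items():
        print(f"{department:<12} ${cost:,.2f}")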

Advanced provisioning
TPC for Data is an integral piece of the IBM TotalStorage Productivity Center with Advanced Provisioning solution and is designed to allow you to automate capacity provisioning through automated workflows.

Note: For more information about IBM TotalStorage Productivity Center for Data, refer to IBM TotalStorage Productivity Center V2.3: Getting Started, SG24-6490, and to the Web site:
http://www-03.ibm.com/servers/storage/software/center/data/index.html

4.3 SAN Volume Controller


The SAN Volume Controller (SVC) is designed to simplify the storage infrastructure by enabling changes to the physical storage with minimal or no disruption to applications. SAN Volume Controller implements virtualization by combining the capacity from multiple disk storage systems into a single storage pool, which can be managed from a central point. This is simpler to manage, and helps to increase utilization and improve application availability. SAN Volume Controller's extensive support for non-IBM storage systems, including EMC, HP, and HDS, enables a tiered storage environment to better allow matching of the cost of the storage to the value of the data. It also allows advanced copy services to be applied across storage systems from many different vendors to help further simplify operations.

4.3.1 Overview
Storage area networks (SANs) enable companies to share homogeneous storage resources across the enterprise. But for many companies, information resources are spread over a variety of locations and storage environments, often with products from different vendors. To achieve higher utilization of resources, companies need to be able to share their storage resources from all of their environments, regardless of vendor. While storage needs rise rapidly, and companies operate on lean budgets and staffing, the best solution is one that leverages the investments already made and that provides growth when needed. IBM TotalStorage SAN Volume Controller (SVC) offers a solution that can help strengthen existing SANs by increasing storage capacity, efficiency, uptime, administrator productivity, and functionality.


The SAN Volume Controller combines hardware and software into a comprehensive, modular appliance. Using xSeries server technology in highly reliable clustered pairs, SVC is designed to avoid single points of failure. SAN Volume Controller software is a highly available cluster optimized for performance and ease of use. The following are the key features of the IBM TotalStorage SAN Volume Controller.

Storage utilization
The SAN Volume Controller is designed to help increase the amount of storage capacity that is available to host applications. By pooling the capacity from multiple disk arrays within the storage area network (SAN), it helps enable host applications to access capacity beyond their island of SAN storage.

High scalability
An I/O group is formed by combining a pair of high-performance, redundant Intel processor-based servers. Each I/O group contains 4 GB of mirrored cache memory. Highly available I/O groups are the basic configuration of a cluster. Adding another I/O group can help increase cluster performance and bandwidth. At its base level, a SAN Volume Controller contains a single I/O group. It can scale up to support four I/O groups. For every cluster, the SAN Volume Controller supports up to 4096 virtual disks.

Personnel productivity
The SAN Volume Controller is designed to help improve administrator productivity by enabling management at the cluster level, and it is designed to provide a single point of control over all the storage it manages. The SAN Volume Controller provides a comprehensive, easy-to-use graphical interface for central management. This simple interface incorporates the Storage Management Initiative Specification (SMI-S) application programming interface (API) and further demonstrates the IBM commitment to open standards. With this single interface, administrators can perform configuration, management, and service tasks over storage volumes from disparate storage controllers. The SAN Volume Controller allows administrators to map disk storage volumes to virtual pooled volumes to help better use existing storage.

Application availability
By pooling storage into a single reservoir, the SAN Volume Controller insulates host applications from physical changes to the storage pool, so that applications continue to run without disruption. The SAN Volume Controller includes a dynamic data-migration function to help administrators migrate storage from one device to another without taking it offline, which helps administrators reallocate and scale storage capacity without disrupting applications. The solution supports both local area network (LAN)-free and server-free backups. Through the IBM FlashCopy function, administrators can make point-in-time copies of mission-critical data on lower-cost storage devices, such as Serial Advanced Technology Attachment (SATA) devices. The SAN Volume Controller also supports the IBM TotalStorage Multipath Subsystem Device Driver (SDD), mature multipathing software that provides failover and load-balancing capabilities.

Tiered storage
In most IT environments, inactive data makes up a high proportion, if not the bulk, of the total stored data. SAN Volume Controller is designed to help administrators control storage growth more effectively by moving low-activity or inactive data into a hierarchy of lower-cost storage, freeing disk space on higher-value storage for more important, active data. The SAN Volume Controller is designed to enable you to match the cost of storage to the value of data.

4.3.2 Virtualization
SVC provides block aggregation and volume management for disk storage within the SAN. In simpler terms, this means that the SVC manages a number of back-end storage controllers and maps the physical storage within those controllers to logical disk images that can be seen by application servers and workstations in the SAN. The SAN is zoned in such a way that the application servers cannot see the back-end storage, preventing any possible conflict between the SVC and the application servers both trying to manage the back-end storage. The SVC I/O Groups are connected to the SAN in such a way that all back-end storage and all application servers are visible to all of the I/O Groups.

The SVC I/O Groups see the storage presented to the SAN by the back-end controllers as a number of disks, known as Managed Disks or MDisks. The MDisks are collected into groups, known as Managed Disk Groups. A VDisk is the SVC device that appears to a host system as a SAN-attached disk; the MDisks used in the creation of a particular VDisk must all come from the same Managed Disk Group. Each MDisk is divided into a number of extents (default minimum size 16 MB, maximum size 512 MB), which are numbered sequentially from the start to the end of the MDisk. Conceptually, this might be represented as shown in Figure 4-6.

The figure shows three MDisks in a Managed Disk Group; their extents are taken in turn to create a striped virtual disk, VDisk1. A VDisk is a collection of extents, each 16 MB to 512 MB in size.

Figure 4-6 SVC block virtualization
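To make the extent-to-VDisk mapping concrete, here is a minimal Python sketch of round-robin striping across the MDisks of a group. It illustrates the concept only and is not SVC code; the MDisk names and extent counts are arbitrary.

def build_striped_vdisk(mdisk_extent_counts, vdisk_extents):
    """Allocate vdisk extents round-robin across mdisks.

    mdisk_extent_counts: {mdisk_name: number of free extents}
    Returns a list of (mdisk_name, extent_index) in vdisk order.
    """
    if sum(mdisk_extent_counts.values()) < vdisk_extents:
        raise ValueError("not enough free extents for the requested vdisk")
    next_free = {name: 0 for name in mdisk_extent_counts}
    mdisks = list(mdisk_extent_counts)
    mapping = []
    i = 0
    while len(mapping) < vdisk_extents:
        name = mdisks[i % len(mdisks)]
        if next_free[name] < mdisk_extent_counts[name]:
            mapping.append((name, next_free[name]))
            next_free[name] += 1
        i += 1
    return mapping

# Example: 9 extents striped across three mdisks, as in Figure 4-6.
print(build_striped_vdisk({"MDisk1": 3, "MDisk2": 3, "MDisk3": 3}, 9))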


4.3.3 Architecture
The IBM TotalStorage SAN Volume Controller is a modular solution that consists of a Master Console for management, up to eight cluster nodes (added in pairs), and dual UPS for write cache data protection (Figure 4-7). The nodes are the hardware elements of the SAN Volume Controller. SVC combines these nodes (servers) to create a high availability cluster. Each of the servers in the cluster is populated with 4 GB of high-speed memory, which serves as the cluster cache.

Figure 4-7 SVC components

The storage engines (or storage nodes) are always installed in pairs and combined into a high availability cluster. SVC nodes within the cluster are grouped into pairs (called I/O groups), with a single pair being responsible for serving I/O on a given VDisk. One node within the I/O group represents the preferred path for I/O to a given VDisk, and the other node represents the non-preferred path. This preference alternates between nodes as each VDisk is created within an I/O group, to balance the workload evenly between the two nodes.

Note: For more information about the IBM TotalStorage SAN Volume Controller, see IBM TotalStorage SAN Volume Controller, SG24-6423, and the Web site:
http://www-03.ibm.com/servers/storage/software/virtualization/svc/index.htm

4.4 IBM TotalStorage DS family of disk products


IBM has a wide variety of disk products to meet a range of price/performance needs. All of these can be included in a tiered storage environment, either standalone or behind a SAN Volume Controller.

4.4.1 Enterprise disk storage


The IBM TotalStorage DS6000 and IBM TotalStorage DS8000 are designed for high reliability, scalability, capacity, performance, and multiplatform support.

4.4.2 Mid-range disk storage


The IBM TotalStorage DS4000 family, including DS4800, DS4500, DS4300, and DS4100, offers a range of price/performance/functionality options for lower tiered storage.


Note: For more information about IBM TotalStorage disk products, see the Web site:
http://www.ibm.com/servers/storage/disk/

4.5 IBM TotalStorage tape solutions


IBM tape products are designed to address business continuity, infrastructure simplification, and Information Lifecycle Management requirements. As IT budgets continue to shrink, tape products have become even more appealing, because they provide a cost-effective alternative to primary disk storage. IBM offers tape systems and tape accessories that range from a single tape drive to tape libraries that store petabytes of data, and has virtualization offerings that combine the performance of disk with the affordability of tape. IBM also has offerings that are designed to help address regulatory or compliance requirements. Tape is removable and portable, provides high volumetric efficiency (the amount of data that can be stored in a small form factor), and has a long life. All of these factors make tape a critical part of any tiered storage environment.

Note: For more information about IBM TotalStorage tape products, see the Web site:
http://www.ibm.com/servers/storage/tape/

4.5.1 IBM Virtualization Engine TS7510


The IBM Virtualization Engine TS7510 has been developed to exploit a tiered storage hierarchy. Data that resides on tape in an ILM environment should be data that is accessed infrequently but needs to be stored on a cost-effective, reliable medium. The TS7510 architecture provides fast access to data on virtual volumes in the disk buffer, and cost-effective storage by migrating virtual volumes to tape. It has a throughput of up to 600 MB/sec and supports a native storage capacity of up to 46 TB. It can be configured with up to 128 virtual tape libraries, 1024 virtual drives, and 8192 virtual volumes.

Note: For more information about the IBM Virtualization Engine TS7510, see Introducing the IBM Virtualization Engine TS7510: Tape Virtualization for Open Systems Servers, SG24-7189, and the Web site:
http://www.ibm.com/servers/storage/tape/virtualization/index.html

4.6 Tivoli Storage Manager


IBM Tivoli Storage Manager and its complementary products provide a comprehensive solution focused on the key data protection activities of backup, archive, recovery, space management, hierarchical storage management, and disaster recovery planning.

4.6.1 Overview
IBM Tivoli Storage Manager is an enterprise-wide solution that:

Provides backup-restore and archive-retrieve solutions and stores backup and archive copies of data in off-site storage
Scales to protect hundreds of computers running a dozen operating systems
Provides intelligent data move and store techniques
Provides optional modules that allow business-critical applications, which run 24 hours a day, 365 days a year, to use data protection with no interruption in service

Designed for a heterogeneous environment, IBM Tivoli Storage Manager uses Local Area Network (LAN), Wide Area Network (WAN), Internet, and Storage Area Network (SAN) connectivity to provide smart data move and store techniques, comprehensive policy-based automation, and data management.

4.6.2 Components
Figure 4-8 shows the different components of Tivoli Storage Manager.

The figure shows the Tivoli Storage Manager server with its database, recovery log, and policy constructs (policy domain, active policy set, management class, and backup and archive copy groups), the storage pool hierarchy with an attached tape library, the Administration Center, an administrative interface, schedulers, and the backup-archive and Web clients.
Figure 4-8 Tivoli Storage Manager components

Backup-archive clients
The Tivoli Storage Manager client sends data to, and retrieves data from, a Tivoli Storage Manager server. The Tivoli Storage Manager backup-archive client must be installed on every machine that needs to transfer data to server-managed storage called storage pools. Data can be recovered to the same client machine that initially transferred it or to another client with a compatible file system format if that client has provided permission.

Storage Manager server


In a traditional LAN configuration, the role of a storage manager server is to store the backup or archive data from the backup-archive clients that it supports to storage media. It also has a database of information to keep track of the data it manages, including policy management objects, and user, administrator, and client nodes.


Administration center
The Administration Center runs on the Integrated Solutions Console (ISC) and provides a task-oriented graphical user interface for storage administrators. It provides support for tasks including:

Creating server maintenance scripts
Scheduling
Adding storage devices
Setting policy domains
User management
Viewing the health monitor

Tivoli Storage Manager database


IBM Tivoli Storage Manager saves information in the Tivoli Storage Manager database about each file, raw logical volume, or database that it backs up, archives, or migrates. This information includes the file name, file size, management class, copy group, and location of the files in Tivoli Storage Manager server storage; in other words, all information except the data itself, which is stored in the storage pools.

Tivoli Storage Manager recovery log


The TSM recovery log keeps track of all changes made to the Tivoli Storage Manager database, so that if a system outage occurs, a record of the changes is available for recovery.

Policy-based management
Business policy is used to centrally manage backup-archive client data. Policies are created by the administrator and stored in the database on the server. They are maintained in the form of policy domains, policy sets, and management classes, as shown in Figure 4-8 on page 50.
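A rough Python sketch of how these constructs nest follows; the class names, pool names, and retention values are illustrative, not product defaults.

from dataclasses import dataclass, field

@dataclass
class CopyGroup:
    destination_pool: str   # storage pool that receives the data
    retain_days: int        # how long copies are kept

@dataclass
class ManagementClass:
    name: str
    backup_copy_group: CopyGroup
    archive_copy_group: CopyGroup

@dataclass
class PolicySet:
    name: str
    management_classes: dict = field(default_factory=dict)
    default_class: str = ""

@dataclass
class PolicyDomain:
    name: str
    active_policy_set: PolicySet = None

# Illustrative policy: ordinary files keep 30-day backups, archives 7 years.
standard = ManagementClass(
    "STANDARD",
    backup_copy_group=CopyGroup("DISKPOOL", retain_days=30),
    archive_copy_group=CopyGroup("TAPEPOOL", retain_days=2555),
)
policy_set = PolicySet("PROD", {"STANDARD": standard}, default_class="STANDARD")
domain = PolicyDomain("OFFICE", active_policy_set=policy_set)

mc = domain.active_policy_set.management_classes[
    domain.active_policy_set.default_class]
print(mc.archive_copy_group.destination_pool, mc.archive_copy_group.retain_days)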

Storage pool hierarchy and tape libraries


A storage pool is a collection of like media that provides storage for backed-up, archived, and migrated data. Pools can be chained to create a storage hierarchy. Tivoli Storage Manager also supports a variety of tape library types, including manual libraries, SCSI libraries, 349X and 358X (LTO) libraries, and external libraries.
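A simplified Python sketch of the chained-pool idea follows: when a disk pool fills past a high threshold, its oldest objects move to the next pool in the chain. The pool names, sizes, and thresholds are invented, and real Tivoli Storage Manager migration is configured on the server rather than coded this way; the sketch only illustrates the behavior.

class StoragePool:
    def __init__(self, name, capacity_gb, next_pool=None,
                 high_threshold=0.9, low_threshold=0.7):
        self.name = name
        self.capacity_gb = capacity_gb
        self.next_pool = next_pool          # next pool in the hierarchy
        self.high_threshold = high_threshold
        self.low_threshold = low_threshold
        self.objects = []                   # (object_id, size_gb), oldest first

    def used_gb(self):
        return sum(size for _, size in self.objects)

    def store(self, object_id, size_gb):
        self.objects.append((object_id, size_gb))
        self.migrate_if_needed()

    def migrate_if_needed(self):
        """Move oldest objects down the chain until below the low threshold."""
        if self.next_pool is None:
            return
        if self.used_gb() / self.capacity_gb < self.high_threshold:
            return
        while self.objects and self.used_gb() / self.capacity_gb > self.low_threshold:
            object_id, size_gb = self.objects.pop(0)
            self.next_pool.store(object_id, size_gb)
            print(f"migrated {object_id} ({size_gb} GB) "
                  f"from {self.name} to {self.next_pool.name}")

tape_pool = StoragePool("TAPEPOOL", capacity_gb=100000)
disk_pool = StoragePool("DISKPOOL", capacity_gb=100, next_pool=tape_pool)
for i in range(12):
    disk_pool.store(f"backup-{i}", 10)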

4.6.3 Tivoli Storage Manager applications


IBM Tivoli Storage Manager offers several optional software modules to handle special needs and applications. The following section provides an overview of these software modules.

IBM Tivoli Storage Manager for Mail


IBM Tivoli Storage Manager for Mail is a software module for IBM Tivoli Storage Manager that automates the data protection of e-mail servers running either Lotus Domino or Microsoft Exchange. This module utilizes the application program interfaces (APIs) provided by e-mail application vendors to perform online hot backups without shutting down the e-mail server, and improves data restore performance. Figure 4-9 on page 52 shows how Tivoli Storage Manager for Mail works.


Figure 4-9 Tivoli Storage Manager for Mail

IBM Tivoli Storage Manager for Databases


IBM Tivoli Storage Manager for Databases is a software module that works with IBM Tivoli Storage Manager to protect a wide range of application data through the protection of the underlying database management systems holding that data. Tivoli Storage Manager for Databases exploits the backup-certified utilities and interfaces provided for Oracle and Microsoft SQL Server. In conjunction with Tivoli Storage Manager, this module automates data protection tasks and allows database servers to continue running their primary applications while they back up and restore data to and from offline storage. DB2 no longer requires a data protection module. Figure 4-10 shows how Tivoli Storage Manager for Databases works.

Figure 4-10 Tivoli Storage Manager for Databases

IBM Tivoli Storage Manager for Application Servers


IBM Tivoli Storage Manager for Application Servers is a software module that works with IBM Tivoli Storage Manager to better protect the infrastructure and application data and improve the availability of WebSphere Application Servers. It works with the WebSphere Application Server software to provide an applet GUI to do reproducible, automated online backup of a WebSphere Application Server environment, including the WebSphere administration database (DB2 Universal Database), configuration data, and deployed application program files.


Figure 4-11 shows how Tivoli Storage Manager for Application Servers works.

Figure 4-11 Tivoli Storage Manager for Application Servers

IBM Tivoli Storage Manager for Enterprise Resource Planning


Specifically designed and optimized for the SAP R/3 environment, IBM Tivoli Storage Manager for Enterprise Resource Planning (ERP) provides automated data protection, reduces the CPU performance impact of data backups and restores on the R/3 server, and greatly reduces the administrator workload necessary to meet data protection requirements. Tivoli Storage Manager for ERP builds on the SAP database and includes a set of database administration functions integrated with R/3 for database control and administration. Figure 4-12 shows how Tivoli Storage Manager for ERP works.

Figure 4-12 Tivoli Storage Manager for ERP

IBM Tivoli Storage Manager for Hardware


IBM Tivoli Storage Manager for Hardware improves the data protection of your business-critical databases and ERP applications that require 24-hour by 365-day availability. This software module helps IBM Tivoli Storage Manager and its other data protection modules to perform high-efficiency data backups and archives of your most business-critical applications while eliminating nearly all performance impact on database or ERP servers. Figure 4-13 on page 54 shows how Tivoli Storage Manager for Hardware works.


Figure 4-13 Tivoli Storage Manager for Hardware

IBM System Storage Archive Manager


IBM System Storage Archive Manager (formerly IBM Tivoli Storage Manager for Data Retention) facilitates compliance with the most stringent regulatory requirements in the most flexible and function-rich manner. It helps manage and simplify the retrieval of the ever-increasing amount of data that organizations must retain for strict records retention regulations. Many of the regulations demand the archiving of records, e-mails, design documents, and other data for many years, in addition to requiring that the data not be changed or deleted.
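The retention rule such an archive manager enforces can be expressed as a small predicate. The following Python sketch is illustrative only (the dates and retention periods are examples, not product defaults); it also shows event-based retention, where the clock starts only when a triggering event such as account closure is signalled.

from datetime import date, timedelta

def earliest_deletion_date(archived_on, retention_days, event_date=None,
                           event_based=False):
    """Return the first date the object may be deleted, or None if unknown."""
    if event_based:
        if event_date is None:
            return None  # retention clock has not started yet
        return event_date + timedelta(days=retention_days)
    return archived_on + timedelta(days=retention_days)

def may_delete(today, archived_on, retention_days, **kwargs):
    earliest = earliest_deletion_date(archived_on, retention_days, **kwargs)
    return earliest is not None and today >= earliest

print(may_delete(date(2006, 2, 1), date(1999, 1, 15), retention_days=2555))  # True, 7-year period elapsed
print(may_delete(date(2006, 2, 1), date(2004, 1, 15), retention_days=2555))  # False, still retained
print(may_delete(date(2006, 2, 1), date(1998, 1, 1), retention_days=365,
                 event_based=True))                                          # False, event not yet signalled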

IBM Tivoli Storage Manager for Space Management


IBM Tivoli Storage Manager for Space Management frees administrators and users from manual file system pruning tasks and defers the need to purchase additional disk storage by automatically and transparently migrating rarely accessed files to Storage Manager storage while the files most frequently used remain in the local file system. This can be used for implementing Hierarchical Storage Management (HSM). Figure 4-14 on page 55 shows how Tivoli Storage Manager for Space Management works.


Figure 4-14 Tivoli Storage Manager for Space Management
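As a standalone illustration of the space management idea (this is not the HSM client, and the 180-day and 1 MB thresholds are arbitrary assumptions), the following Python sketch selects rarely accessed files as migration candidates.

import os
import time

def migration_candidates(root, min_age_days=180, min_size_bytes=1024 * 1024):
    """Yield files not accessed for min_age_days and at least min_size_bytes."""
    cutoff = time.time() - min_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue
            if st.st_atime < cutoff and st.st_size >= min_size_bytes:
                yield path, st.st_size, st.st_atime

if __name__ == "__main__":
    for path, size, atime in migration_candidates("/home"):
        print(f"{size:>12,d}  {time.ctime(atime)}  {path}")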

4.6.4 Tivoli Storage Manager APIs and DR550


Tivoli Storage Manager APIs are used for Tivoli's own Disaster Protection products, but they are also documented and published. This allows ISVs to adapt their solutions to integrate with Tivoli Storage Manager and extend its functionality. In particular, various vendors have used the APIs to provide bare metal recovery solutions for various platforms; vendors exploiting these APIs for disaster recovery include Cristie, UltraBac Software, and Symantec.

The IBM System Storage DR550 is a packaged solution that uses IBM System Storage Archive Manager, as shown in Figure 4-15 on page 56. The DR550 is a storage solution designed for fast, affordable access to retention-managed data. The main feature highlights of the DR550 are:

It is designed as a preconfigured solution to help store, retrieve, manage, share, and secure regulated and nonregulated data.
It supports nondisruptive enterprise scalability of up to 56 TB physical capacity.
It is designed to offer automatic provisioning, migration, expiration, and archiving capabilities.
It offers a comprehensive suite of software tools for policy-based and event-based data management.
It is designed to avoid a single point of failure.

For more information about the DR550, see Understanding the IBM TotalStorage DR550, SG24-7091, and the Web site:
http://www-03.ibm.com/servers/storage/disk/dr/

Figure 4-15 on page 56 shows how the Tivoli Storage Manager API fits into the DR550 solution.


The figure shows IBM DB2 Content Management applications and an ISV document management system, each using the Tivoli Storage Manager API to store content in IBM System Storage Archive Manager and IBM TotalStorage DS4100 disk within the IBM System Storage DR550.
Figure 4-15 TSM APIs and DR550

Note: For more information about IBM Tivoli Storage Manager, see:
http://www-306.ibm.com/software/tivoli/products/storage-mgr/

For more information about the IBM System Storage DR550, see Understanding the IBM TotalStorage DR550, SG24-7091.

4.7 DB2 Content Manager


Content lifecycle management is a very important part of ILM, since unstructured data such as e-mails, messages, and documents forms a large part of an organization's data. Figure 4-16 on page 57 shows the components of enterprise content management.


The figure shows presentation and deployment channels (portals, Web browsers, and rich clients) on top of content extensions such as collaboration, workflow management, document management, output and reports, Web content management, forms, search, process monitoring, imaging, records management, federation, multimedia, and digital rights. These sit above a content control and integration layer (integration and management services, version control, check-in/check-out, library servers, catalogs, modeling, annotation/redaction, access control, resource managers, and archiving), which in turn rests on the content repository and physical data storage for text, image, video, audio, print streams, and other content types.
Figure 4-16 Enterprise content management components

IBM has a wide portfolio of content management products to deal with these different components of content management. Figure 4-17 shows the different products in this portfolio with their basic functionality.

The portfolio shown includes DB2 Content Manager, DB2 Document Manager, DB2 Content Manager OnDemand, DB2 Content Manager VideoCharger, and DB2 CommonStore (for SAP, Microsoft Exchange, and Lotus Domino), covering imaging, digital rights management, document management, COLD, digital asset management, and archiving; DB2 Information Integrator Content Edition for content integration; DB2 Records Manager for records management; and Workplace Web Content Management for Web content management.
Figure 4-17 IBM content management portfolio

Here we cover DB2 Content Manager, which is a very important product in this portfolio.


DB2 Content Manager manages all types of digitized content including HTML and XML Web content, document images, electronic office documents, printed output, audio, and video. V8.3 expands record management integration and workflow.

4.7.1 Overview
Unlike simple file systems, Content Manager uses a powerful relational database to provide indexed search, security, and granular access control at the individual content item level. Content Manager provides check-in and check-out capabilities, version control, object-level access control, a flexible data model that enables compound document management, and advanced searching based on user-defined attributes. It also includes workflow functionality, automatically routing and tracking content through a business process according to predefined rules. It provides the content infrastructure for solutions such as:

Compliance in a regulated life sciences environment
Records management
Document lifecycle management
Lotus Notes e-mail management
Exchange Server e-mail management
Monitoring electronic messages
Digital media
Web content management

The multi-tier, distributed architecture of DB2 Content Manager offers:

Scalability to grow from a single department to a geographically dispersed enterprise
Openness to support multiple operating systems (including Linux), databases, applications, and resources
A secure environment and a single source of access for administration
A powerful and expressive XML-ready data model

4.7.2 Architecture
The Content Manager multi-tier distributed architecture is fully Web enabled, scalable, and extensible. It includes five core components, as follows.

Library Server
Library Server is the central source for indexing, describing, locating, organizing, and managing enterprise content. It locates stored objects using a variety of search technologies and provides secured access to content and manages transactions.

Mid-tier Server
Mid-tier Server is the Web-exploiting broker that mediates between the client and the Library Server. It supports the enhanced Content Manager API toolkit and manages connections to the Library Server and, optionally, to the Resource Managers.

Resource Manager
Resource Managers (formerly called Object Servers) are specialized repositories optimized to manage the storage, retrieval, and archival of enterprise content. DB2 Content Manager for Multiplatforms V8 includes document, image, and rich media resource managers. DB2 Content Manager VideoCharger provides streaming media support. DB2 Content Manager OnDemand manages resources for high volume print output (COLD) data.

eClient
eClient is the browser-based thin client that provides the graphical user interface to the Content Manager and related systems. The eClient communicates through the mid-tier server and/or directly to Resource Managers, enabling fast and secure delivery of objects while maintaining full transactional support with referential integrity.

Client for Windows


Client for Windows is the desktop client that exploits the client-server architecture, providing out-of-the-box capabilities for supporting high volume, high performance, production-level document applications.

4.7.3 Standards and data model


Content Manager is based on industry standards and Internet protocols. It supports HTTP, FTP, RTSP, JDBC, SQL, and LDAP. Designed to be fully open to any application, Content Manager publishes a robust set of application programming interfaces (APIs) to handle different types of content for unified search, retrieval, workflow, access control management, and system administration.

Content Manager V8 provides a powerful XML-ready, physical data model. The advanced model can capture structural and relationship information across all types of content and its associated metadata or attributes, and it facilitates the integration of structured data with unstructured content. The V8 data model, combined with OO APIs, allows faster development of new and more sophisticated applications with greater flexibility. Applications that exploit fully linked relationships and management of virtual or compound documents can be created.

Note: For more information about DB2 Content Manager and other products in the IBM content management portfolio, see the Web sites:
http://www-306.ibm.com/software/data/cm/cmgr/mp/
http://www-306.ibm.com/software/data/

4.8 DB2 CommonStore


DB2 CommonStore is part of the IBM content management portfolio and works in the area of archiving, as shown in Figure 4-17 on page 57. DB2 CommonStore middleware seamlessly integrates SAP, Lotus Domino, and Microsoft Exchange Server with IBM archives. It has three different products:

DB2 CommonStore for Exchange Server
DB2 CommonStore for Lotus Domino
DB2 CommonStore for SAP

4.8.1 DB2 CommonStore for Exchange Server


DB2 CommonStore for Exchange Server manages e-mail archiving and retrieval. It helps:

Trim the size of the Exchange database to reduce storage costs.
Improve e-mail system performance.
Provide virtually unlimited mailbox space for each user.


Automated offloading to archives reduces the requirement to manage e-mail server growth, and direct access to archives lets users view archived items via a browser or the Outlook desktop. V8.3 supports Exchange 5.5 servers and easier migration from Exchange 5.5 to Exchange 2000 Server or Exchange 2003 Server. It also supports:

Additional Outlook client platforms
Windows Services to automate tasks
More than 600 IBM or non-IBM storage devices to meet business and content lifecycle needs
HTTPS to prevent unauthorized access to critical data

CommonStore is a middleware server between the Exchange Server mail system and the back-end archive management system. CommonStore does not store data or documents, but defines and manages what to archive, when to archive, and how to archive from the mail system to the back-end archive management system. The back-end archive management system can be one of the following IBM repositories:

DB2 Content Manager for Multiplatforms or DB2 Content Manager for z/OS
IBM DB2 Content Manager OnDemand for Multiplatforms
IBM Tivoli Storage Manager

The companion product, Tivoli Storage Manager for Mail, automates the data protection of e-mail servers running either Lotus Domino or Microsoft Exchange. This module utilizes the application program interfaces (APIs) provided by e-mail application vendors to perform online "hot" backups without shutting down the e-mail server and to improve data-restore performance. Tivoli Storage Manager for Mail protects the growing amount of new and changing data that should be securely backed up to help maintain 24x365 application availability (refer to Figure 4-9 on page 52).

Tivoli Storage Manager for Mail allows client access to backed-up e-mail only after the entire set has been restored, while CommonStore allows a client immediate access to archived mail at the click of a button from an existing user client interface (Notes client or Outlook), providing the best of both worlds. The CommonStore solution can also be extended with DB2 Content Manager or DB2 Content Manager OnDemand, providing access for a user population beyond the messaging system users. The CommonStore archive of e-mail messages and documents can be accessed by any user who has access rights, including messaging system clients, Content Manager or Content Manager OnDemand clients, and Internet and intranet users.

4.8.2 DB2 CommonStore for Lotus Domino


DB2 CommonStore for Lotus Domino manages e-mail archiving and retrieval for any Notes database or server platform. It helps:

Trim the size of the Notes database to reduce storage costs.
Improve e-mail system performance.
Provide virtually unlimited user mailbox space.

It is tightly integrated with Domino ND6 and with Content Manager repositories, and it provides options to integrate e-mail content with images and facsimiles and to build policy-driven archives. It supports more than 600 IBM or non-IBM storage devices, providing flexibility to meet business and content lifecycle needs. V8.3 integrates records management capabilities and strengthens security, search, and archiving.


DB2 CommonStore for Lotus Domino connects the world of Lotus Notes and Domino with the electronic archive. Three archives are supported:

Tivoli Storage Manager allows you to archive attachments only, so the document remains in the Notes database but has a pointer inserted that points to the location of the attachment.
DB2 Content Manager captures, indexes, manages, and distributes electronic documents (including scanned paper, faxes, electronic documents, images, audio, video, and e-mail).
DB2 Content Manager OnDemand stores and manages e-mail and automatically captures, indexes, and manages print streams.

4.8.3 DB2 CommonStore for SAP


DB2 CommonStore for SAP helps clients:

Offload operational SAP databases.
Work with non-SAP documents from within SAP Business Objects.
Process business documents that reside in an external archiving system.

It supports any SAP operational database, such as DB2 Universal Database, Informix, or Oracle. CommonStore for SAP is a middleware server between the SAP ArchiveLink interface and a (required) back-end archive. It integrates documents into SAP applications such as:

SAP Document Management System (DMS), allowing users to archive in batch instead of one document at a time.
SAP R/3 Document Finder, for access to any enterprise content stored in Content Manager or DB2 Content Manager OnDemand, not just SAP archived items.
SAP Workflow Integration, to start workflow from document capture outside SAP.

DB2 CommonStore does not store any data or documents. Instead, it manages the data and document archive process defined by the SAP ArchiveLink protocol, storing and retrieving archived content to and from the back-end archive management repositories. The back-end archive management system can be one of the following IBM repositories:

IBM DB2 Content Manager
IBM DB2 Content Manager OnDemand
IBM Tivoli Storage Manager

Note: For more information about DB2 CommonStore, see the Web site:
http://www-306.ibm.com/software/data/commonstore/

4.9 More information


This chapter has given only an overview of specific IBM products. For more information about these, as well as some other IBM products that are helpful when implementing ILM, see the following resources:

The IBM TotalStorage Solutions Handbook, SG24-5250:
http://www.redbooks.ibm.com/redbooks/pdfs/sg245250.pdf

IBM System Storage and TotalStorage Web site:


http://www.storage.ibm.com


IBM Disk Storage Systems:


http://www-03.ibm.com/servers/storage/disk/index.html

IBM Tape Systems:


http://www-03.ibm.com/servers/storage/tape/index.html

IBM TotalStorage Productivity Center with Advanced Provisioning:


http://www-03.ibm.com/servers/storage/software/center/provisioning/

IBM Tivoli Software Web site:


http://www-306.ibm.com/software/tivoli/


Part 2


Evaluating ILM for your organization


In this part we introduce techniques for evaluating and proposing the value of ILM in an organization.



Chapter 5. An ILM quick assessment


This chapter provides a methodology for performing a quick assessment for ILM solutions. It will help you:

Get information about data usage profiles in your current environment.
Use IBM TotalStorage Productivity Center for Data in the assessment, including the best reports to run.
Collect, classify, and analyze data.
Discuss Return On Investment (ROI) for an ILM project.


5.1 Initial steps


One of the most important steps in implementing an ILM solution is to classify the current data and define the classes aligned to its value and service level requirements. This is the first logical step, as explained in Chapter 3, Implementing ILM on page 29. The assessment objectives are to collect information about the current storage environment and create reports to enable storage administrators to take actions. These actions will allow you to store information on the storage device with the most appropriate cost, maximize the utilization of the installed storage devices, and improve the capacity to plan storage growth through better forecasting and trending. An ILM assessment can be executed by performing the steps shown in Figure 5-1, beginning with documenting the current environment information and concluding with defining appropriate actions leading to the assessment findings.

The assessment steps shown are: get business and storage environment information; define data collection reports (IBM TPC for Data); classify data and analyze the TPC for Data reports; define actions (delete data, migrate data, and so on); calculate ROI; and document the assessment findings.
Figure 5-1 Quick Assessment steps

The next sections describe each step and help you to create a quick assessment for an ILM solution.

5.2 Getting business and storage information


First of all, it is necessary to know what the current storage environment is and what the specific business and technical drivers for improving data management are. Collecting the following information is useful and makes it easier to understand the current storage environment:

Types of storage technologies in use
Quantities of enterprise, mid-range, and long-term retention storage
Terabytes of storage installed, terabytes in use, growth rate, and reasons for the growth
How many people are managing the storage environment and what tools they are using
The objectives for the storage environment
The main storage management problems to be solved and their priority
What, if any, data classes are currently defined and whether a tiered-storage environment has been implemented to support these classes
The main applications running and their objectives for storage performance, capacity, availability, and recoverability

This is a comprehensive list; you may not be able to collect all of the information, for reasons of time, practicality, or complexity. In 5.3, Defining data collection reports on page 67, we discuss methods to help collect this information and classify the data.

5.3 Defining data collection reports


This section provides some steps to collect data, define reports, and evaluate the gains from an ILM solution. These steps can be summarized as:

1. Define the current storage tiers in use.
2. Match the file systems in use to storage tiers, creating file system groups.
3. Choose the best reports to run.

To collect reports from the current storage in use and classify data, we use IBM TotalStorage Productivity Center for Data (TPC for Data). For more information about TPC for Data, see Chapter 4, Product overview on page 37. For installation, configuration, and usage instructions for TPC for Data, access:
http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/index.jsp?toc=/com.ibm.itpc_data.doc/toc.xml

5.3.1 Creating groups of data


The first step is to define tiers of storage where the current data is allocated. For example, we show some typical tiers to classify storage type, as shown in Table 5-1. Most enterprises have a subset of these tiers. You should consider only the tiers currently installed.
Table 5-1 Storage tiers

Tier   Storage type
T0     Local Direct Attached Storage (DAS)
T1     Storage Area Network (SAN) - Enterprise storage
T2     Storage Area Network (SAN) - Mid-range storage
T3     Low-cost archive storage

After identifying tiers per storage type, we can create groups for file systems considering the type and which tier of storage they are using. Table 5-2 on page 68 shows an example of file system groups for a typical company that has Windows and UNIX systems. The entries in the Storage tier column map back to the storage tiers shown in Table 5-1.

Table 5-2 File system groups

File system group   Server type    File system type   Storage tier
T0-Workstation      Workstation    All                T0
T0-Windows-OS       Windows        OS                 T0
T0-Windows-App      Windows        Application        T0
T0-Unix-OS          UNIX           OS                 T0
T0-Unix-App         UNIX           Application        T0
T1-Windows          Windows        Any                T1
T1-Unix             UNIX           Any                T1
T2-Windows          Windows        Any                T2
T2-Unix             UNIX           Any                T2
T3-Windows          Windows        Any                T3
T3-Unix             UNIX           Any                T3

More details about creating file system groups can be found in Chapter 2, Monitoring, of the IBM TotalStorage Productivity Center for Data User's Guide, GC32-1728. After creating file system groups, the system administrators should assign each actual file system to one file system group only. All file systems being monitored should be assigned to a group. These file system groups will be used to collect data using TPC for Data. Table 5-3 shows an example of assigning file systems to file system groups.
Table 5-3 Matching file systems to groups

Computer           File system         File system group
AIX_Server1        /usr                T0-Unix-OS
AIX_Server1        /db_datafiles (a)   T1-Unix
AIX_Server1        /products           T0-Unix-App
Windows_Server1    C:                  T0-Windows-OS
Windows_Server1    E: (b)              T2-Windows

a. Assuming /db_datafiles is created on enterprise storage.
b. Assuming E: is a drive on mid-range storage.

Why create file system groups? Creating file system groups enables TPC for Data reports to be generated at the file system group level, which facilitates the analysis of the usage of each tier. Remember that data classification is used to select candidate files on one storage tier to be moved to another. By creating reports for these file system groups, you can analyze each data type separately within a storage tier.
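A minimal Python sketch of this assignment step follows, using the same illustrative servers and mount points as Table 5-3. In practice the groups are defined in TPC for Data itself, so this is only a model of the mapping.

# Explicit assignment of each monitored file system to exactly one group,
# mirroring Table 5-3 (illustrative values only).
FILESYSTEM_GROUPS = {
    ("AIX_Server1", "/usr"):          "T0-Unix-OS",
    ("AIX_Server1", "/db_datafiles"): "T1-Unix",
    ("AIX_Server1", "/products"):     "T0-Unix-App",
    ("Windows_Server1", "C:"):        "T0-Windows-OS",
    ("Windows_Server1", "E:"):        "T2-Windows",
}

def group_for(computer, filesystem):
    """Look up the file system group; unassigned file systems are flagged."""
    return FILESYSTEM_GROUPS.get((computer, filesystem), "UNASSIGNED")

def usage_by_group(scan_rows):
    """scan_rows: iterable of (computer, filesystem, used_gb). Sum usage per group."""
    totals = {}
    for computer, filesystem, used_gb in scan_rows:
        group = group_for(computer, filesystem)
        totals[group] = totals.get(group, 0.0) + used_gb
    return totals

rows = [("AIX_Server1", "/db_datafiles", 310.0),
        ("AIX_Server1", "/usr", 6.5),
        ("Windows_Server1", "E:", 120.0)]
print(usage_by_group(rows))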


5.3.2 Collecting reports from TPC for Data


For a quick assessment, we propose some useful TPC for Data reports, divided into four classes:

System Reports
File Reports
Access Load Reports
Database Reports

We will use these reports to collect information about servers, disks, files, users, databases, and file systems that will help to classify data and define the data value classes. The System Reports in TPC for Data contain standard reports that are automatically generated for all the machines on the network that are being monitored. These predefined reports provide a quick and efficient view of the enterprise data. Data for these system reports is gathered during the last scan scheduled for each computer. For more information about the scan process, see Chapter 2, Monitoring, in the IBM TotalStorage Productivity Center for Data User's Guide, GC32-1728.

File Reports are reports on the files found during the scan process. During the scan process, TPC for Data gathers a vast number of statistics on the files and a number of attributes about the files. These can be reported on in a variety of ways:
Largest Files Reporting - Information about the largest files in the environment
Most Obsolete Files Reporting - Information about the files in the environment that have not been accessed for the longest period of time
Duplicate Files Reporting - Information about the files found during a scan that have duplicate file names
Orphan Files Reporting - Information about files owned by users that are no longer active in the environment
File Size Distribution Reporting - Information about the distribution of file sizes across storage resources
File Summary Reporting - Summary information about the files in the environment
File Types Reporting - Information about the storage usage of different file types in the environment

The Access Load Reports monitor and report on the usage and growth of the consumption of storage. These reports can be viewed based on specific file systems and computers, groups of file systems and computers, or throughout the entire enterprise. Use these reports to:

View which servers and file systems are experiencing the heaviest (or lightest) load of data and storage access.
Identify wasted space by pinpointing files that are no longer needed or have not been accessed for the longest time.

The Database Reports provide both overview and detailed information about the storage used by tables in a Relational Database Management System (RDBMS), including Oracle, Sybase SQL Server, Microsoft SQL Server, and UDB/DB2. The reporting features are very powerful; you can select the instances, tablespaces, databases, devices, containers, data files, fragments, tables, control files, redo logs, archive log directories, and even users to report on.


These reports help project storage consumption for the future and maximize the current storage assets currently in place by eliminating wasted space and making the most of the space available.

Best TPC for Data system reports for ILM


We selected some System Reports that provide the most value for ILM, as our best practice system reports for an ILM assessment.

Access File Summary


The Access File Summary report provides overview information for files used by computers, file systems, and other resources. Through this report you can view the historical number of files for each resource in the report. The historical chart can be generated to show daily, weekly, or monthly history. This report provides the following information:

Total Size - Total size of the storage space consumed by the files on a network
File Count - Total number of files on a network
Directory Count - Total number of directories on a network
Avg File Size - Average storage space consumed by each of the files on a network
File system Capacity - Total storage capacity of the files on a network

Figure 5-2 is an example of this report, showing historical values and predicted trends for space used by files. This report helps with storage capacity planning.

Figure 5-2 Access File Summary report


Access Time Summary


The Access Time Summary report provides a summary of the number of files in the environment and when they were last accessed (for example, created, modified, and so on): during the past day, the past week, over a year ago, and so on. This report provides the following information:

Last Accessed <= 1 day - The number and total size of the files accessed within the last day
Last Accessed 1 day - 1 week - The number and total size of the files accessed between 1 day and 1 week ago
Last Accessed 1 week - 1 month - The number and total size of the files accessed between 1 week and 1 month ago
Last Accessed 1 month - 2 months - The number and total size of the files accessed between 1 month and 2 months ago
Last Accessed 2 months - 3 months - The number and total size of the files accessed between 2 and 3 months ago
Last Accessed 3 months - 6 months - The number and total size of the files accessed between 3 and 6 months ago
Last Accessed 6 months - 9 months - The number and total size of the files accessed between 6 and 9 months ago
Last Accessed 9 months - 1 year - The number and total size of the files accessed between 9 months and 1 year ago
Last Accessed > 1 year - The number and total size of the files accessed over a year ago
Total Count - Total number of files across a network
Total Size - Total size of the space consumed by the files on a network
Average Age - Average age of files on a network, measured in days, hours, minutes, and seconds

Figure 5-3 on page 72 shows an example of this report.


Figure 5-3 Access Time Summary report
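The same bucketing can be sketched in Python against a local directory tree. The band boundaries below follow the report above; everything else (the starting directory, the output format) is illustrative.

import os
import time
from collections import Counter

# Band upper bounds in days; the oldest band is handled separately.
BANDS = [("<= 1 day", 1), ("1 day - 1 week", 7), ("1 week - 1 month", 30),
         ("1 month - 2 months", 60), ("2 - 3 months", 90), ("3 - 6 months", 180),
         ("6 - 9 months", 270), ("9 months - 1 year", 365)]

def access_time_summary(root):
    counts, sizes = Counter(), Counter()
    now = time.time()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue
            age_days = (now - st.st_atime) / 86400
            for label, limit in BANDS:
                if age_days <= limit:
                    break
            else:
                label = "> 1 year"
            counts[label] += 1
            sizes[label] += st.st_size
    return counts, sizes

counts, sizes = access_time_summary(".")
for label in [b[0] for b in BANDS] + ["> 1 year"]:
    print(f"{label:<20} {counts[label]:>8} files {sizes[label]:>15,d} bytes")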

Disk Capacity Summary


The Disk Capacity Summary report reports and charts disk capacity per disk, per computer, per cluster, per computer group, per domain, or for the whole network. This report provides the following information:

Capacity - Total storage capacity of the disks on the computers within a network
File system Used Space - Amount of used storage space on the file systems within a network
File system Free Space - Amount of unused storage space on the file systems within a network
Raw Volume Space - Space on host-side logical volumes that is not occupied by file systems
Overhead - Total RAID/mirror redundancy (for example, two 1 GB disks mirrored together would have an overhead of 1 GB)
Unallocated Space - Space assigned to a (monitored) host that is not part of any logical volume
Unknown LUN Capacity - LUN capacity of unknown usage

Figure 5-4 on page 73 shows an example of this report.


Figure 5-4 Disk Capacity Summary report

Oldest Orphaned Files


The Oldest Orphaned Files report provides information about files that have the oldest creation dates and whose owners are no longer registered as users on the computer/network. This report provides the following information:

Access Time - Date and time when an orphaned file was last accessed
Computer - Name of the computer where an orphaned file is stored
File system - File system where an orphaned file is stored
Path - Full path to the location of an orphaned file
Owner - Operating system internal ID of the user who owned the orphaned file (this is the internal ID the operating system uses to identify the user, not the user ID)
OS User Group - OS user group to which the owner of an orphaned file belongs
Physical Size - Physical size of an orphaned file (measured in kilobytes, megabytes, or gigabytes)
Modification Time - Date and time when an orphaned file was last modified
Create Time - Date and time when an orphaned file was created

Figure 5-5 on page 74 shows an example of this report.


Figure 5-5 Oldest Orphaned Files report

Storage Access Times


The Storage Access Times report indicates when files were last accessed. This report provides the following information:

Computer - Name of a computer against which the report was run
Last Accessed <= 1 day Count, Total Size - Number of files that were accessed within the last 24 hours and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)
Last Accessed 1 day - 1 week Count, Total Size - Number of files that were accessed between 1 day and 1 week previously and the total physical size of those files
Last Accessed 1 week - 1 month Count, Total Size - Number of files that were accessed between 1 week and 1 month previously and the total physical size of those files
Last Accessed 1 month - 1 year Count, Total Size - Number of files that were accessed between 1 month and 1 year previously and the total physical size of those files
Last Accessed > 1 year Count, Total Size - Number of files that were accessed over one year previously and the total physical size of those files
Total Count - Total of all the counts
Total Size - Total of all the sizes
Average Age - Average time since the files were last accessed


Figure 5-6 shows an example of this report.

Figure 5-6 Storage Access Times report

Storage Capacity
The Storage Capacity report provides storage capacity information about each monitored system. This report provides the following information:

Computer - Name of a computer against which the report was run
Capacity - Total storage capacity for a computer
Unallocated Space - Amount of unused storage space on a computer (not in file systems seen by this operating system)
OS Type - Operating system running on a computer
Network Address - Network address of a computer
IP Address - IP address of a computer
Time Zone - Time zone in which a computer is running

Figure 5-7 on page 76 shows an example of this report.


Figure 5-7 Storage Capacity report

Storage Modification Times


The Storage Modification Times report provides information about files within the network that were modified:

Within the last 24 hours
Between 24 hours and one week previously
Between one week and one month previously
Between one month and one year previously
More than one year previously

This report provides the following information:

Last Modified <= 1 day Count, Total Size - Number of files that were modified in the last 24 hours and the total physical size of those files (measured in kilobytes, megabytes, or gigabytes)
Last Modified 1 day - 1 week Count, Total Size - Number of files that were modified between 1 day and 1 week previously and the total physical size of those files
Last Modified 1 week - 1 month Count, Total Size - Number of files that were modified between 1 week and 1 month previously and the total physical size of those files
Last Modified 1 month - 1 year Count, Total Size - Number of files that were modified between 1 month and 1 year previously and the total physical size of those files
Last Modified > 1 year Count, Total Size - Number of files that were modified over a year previously and the total physical size of those files
ILM Library: Techniques with Tivoli Storage and IBM TotalStorage Products

Total Count - Sum of the files Total Size - Sum of the size of the files Average Age - Average age since files were modified Figure 5-8 shows an example of this report.

Figure 5-8 Storage Modification Times report

Total Freespace
The Total Freespace report shows the total amount of unused storage across a network. This report provides the following information:
Free Space - Total amount of available storage space on the network
Percent Free Space - Percentage of total space that is unused on the network
Used Space - Amount of used storage space on the network
Capacity - Total amount (capacity) of storage space on the network
File Count - Total number of files on the network
Directory Count - Total number of directories on the network
Percent Free Inodes - Percentage of free inodes on the network
Used Inodes - Number of used inodes on the network
Free Inodes - Number of free inodes on the network
Figure 5-9 on page 78 shows an example of this report.


Figure 5-9 Total Freespace report

User Space Usage


The User Space Usage report provides storage statistics for each user. This report provides the following information:
User Name - ID of the user
Total Size - Total amount of space used by the user
File Count - Number of files owned/created by the user
Directory Count - Number of directories owned/created by the user
Largest File - Largest file owned by the user
2nd Largest File - Second-largest file owned by the user
File Size < 1KB Count, Total Size - Number and total space usage of files under 1 KB in size
File Size 1KB - 10KB Count, Total Size - Number and total space usage of files between 1 KB and 10 KB in size
File Size 10KB - 100KB Count, Total Size - Number and total space usage of files between 10 KB and 100 KB in size
File Size 100KB - 1MB Count, Total Size - Number and total space usage of files between 100 KB and 1 MB in size
File Size 1MB - 10MB Count, Total Size - Number and total space usage of files between 1 MB and 10 MB in size
File Size 10MB - 100MB Count, Total Size - Number and total space usage of files between 10 MB and 100 MB in size
File Size 100MB - 500MB Count, Total Size - Number and total space usage of files between 100 MB and 500 MB in size
File Size > 500MB Count, Total Size - Number and total space usage of files over 500 MB in size
Figure 5-10 shows an example of this report.

Figure 5-10 User Space Usage report

Wasted Space
The Wasted Space report provides storage statistics on non-OS files not accessed in the last year and on orphan files. This report provides the following information:
Computer - Name of the computer that contains wasted space
Total Size - Total amount of space used by the obsolete and orphan files
File Count - Total number of obsolete and orphan files
Directory Count - Total number of orphan directories
Avg File Size - Average size of the obsolete and orphan files
Figure 5-11 on page 80 shows an example of this report, generated with the following conditions: (ATTRIBUTES include any of (ORPHANED) OR (NAME matches none of ('?:\WINNT\system*\%', '/usr/lib/%', '/usr/bin/%', '/sbin/%', '/usr/sbin/%', '/etc/%') AND LAST ACCESSED earlier than 365 days 06:00 ago)).


Figure 5-11 Wasted Space report
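The filter conditions above can be approximated outside TPC for Data, for example to spot-check a single server. The following sketch is a simplified illustration for UNIX-like systems only: it flags files whose owning user no longer exists (orphans) or that sit outside a list of excluded OS paths and have not been accessed for a year. The scanned path and exclusion patterns are assumptions taken from the report conditions.

import os, pwd, time, fnmatch

EXCLUDED = ["/usr/lib/*", "/usr/bin/*", "/sbin/*", "/usr/sbin/*", "/etc/*"]
STALE_DAYS = 365

def is_orphan(st):
    # A file is treated as orphaned if its owning UID has no passwd entry.
    try:
        pwd.getpwuid(st.st_uid)
        return False
    except KeyError:
        return True

def wasted_space(root):
    cutoff = time.time() - STALE_DAYS * 86400
    count, total = 0, 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue
            excluded = any(fnmatch.fnmatch(path, pat) for pat in EXCLUDED)
            if is_orphan(st) or (not excluded and st.st_atime < cutoff):
                count += 1
                total += st.st_size
    return count, total

if __name__ == "__main__":
    count, total = wasted_space("/data")   # example path only
    print(f"{count} candidate files, {total / 2**30:.2f} GB")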

Best TPC for Data File Reports for ILM


We selected the file reports that provide the most value for ILM as our best practice file reports for an ILM assessment. All of them can be used to report at the file system group level. Use the groups created in 5.3.1, Creating groups of data on page 67, to generate the following reports.

Largest Files report


The Largest Files report generates reports of detailed information about the largest files found in the environment. The report can be viewed by directory, directory group, file system, file system group, cluster, computer, computer group, domain, and for the entire network. The default largest files profile is set to collect the 20 largest files per file system. This value may be increased if required. Figure 5-12 on page 81 shows an example of this report.


Figure 5-12 Largest Files report

Duplicate Files report


The Duplicate Files report generates reports on duplicate files found during a scan. This data can be analyzed to identify files that might no longer be needed and could be wasting storage space. The report can be viewed by directory, directory group, file system, file system group, cluster, computer, computer group, domain, and for the entire network. Figure 5-13 on page 82 shows an example of this report.


Figure 5-13 Duplicate Files report

File Types report


The File Types report shows data organized by the file types found during a scan. TPC for Data collects storage usage information for file types such as .exe, .zip, .sys, .pdf, .doc, .dll, .wav, .mp3, and .avi. Each of these file types is represented by its own row in the generated reports. Use these reports to:
Relate the space used by applications to the total capacity and used space. For example, view the total amount of storage consumed by Acrobat files and Lotus Notes mail databases.
View which applications are consuming the most space across a given set of storage resources.
View the total amount of storage consumed by different types of data, such as non-business and temporary files. Details about non-business files, temporary files, and other types of data are in 5.4.1, Types of data on page 90.
Figure 5-14 on page 83 shows an example of this report.


Figure 5-14 File Types Report
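The same grouping by file type can be approximated with a short script when TPC for Data is not available on a system. This sketch (the scanned path is an example) totals bytes per file extension and prints the top consumers.

import os
from collections import Counter

def usage_by_extension(root):
    # Total bytes per file extension, similar in spirit to the File Types report.
    usage = Counter()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "<none>"
            try:
                usage[ext] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue
    return usage

if __name__ == "__main__":
    usage = usage_by_extension("/data")          # example path only
    for ext, size in usage.most_common(20):      # top 20 consumers
        print(f"{ext:10} {size / 2**20:10.1f} MB")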

Best TPC for Data access load reports for ILM


We selected the access load reports that provide the most value for ILM as our best practice access load reports for an ILM assessment. They can be used to report at the file system group level. Use the groups created in 5.3.1, Creating groups of data on page 67, to generate the following reports.

Access Time report


The Access Time report shows the amount of data and the number of files that have been accessed during the last day, week, month, year, and more. The information can be viewed at the directory level, file system level, computer level, domain level, and for the entire network. This report will help to identify which files are candidates to be migrated to another storage tier by identifying files which are infrequently accessed but are currently stored on high-cost storage (or the opposite). Figure 5-15 on page 84 shows an example of this report.


Figure 5-15 Access Time report

Modification Time report


The Modification Time report shows the amount of data and the number of files that have been modified during the last day, week, month, year, and beyond. This information can be viewed at the directory level, file system level, computer level, domain level, and for the entire network. This report helps to identify which files are candidates to be migrated to another storage tier by showing mismatches between modification times and the cost of the storage used. Figure 5-16 on page 85 shows an example of this report.


Figure 5-16 Modification Time Reporting

Best TPC for Data database reports for ILM


We selected some database reports that provide the most value for ILM as our best practice database reports for an ILM assessment.

Database Storage By Computer


The Database Storage By Computer report shows information about the databases in the environment, sorted by the computer or computers on which they are stored. This report provides the following information:
Computer - Name of the computer where the databases are located
Total Size - Amount of space consumed by the databases on the computer
Data File Capacity - Storage capacity of the data files within the databases
Data File Free Space - Amount of free space available in the databases' data files
Tablespace Count - Number of tablespaces associated with the databases
Data File Count - Number of data files associated with the tablespaces in the databases
Log File Count - Number of log files associated with the databases
Figure 5-17 on page 86 shows an example of a Database Storage By Computer report with size distribution among the databases.


Figure 5-17 Database Storage by Computer report

This report can also be viewed in table layout. Check the free space for each database, as shown in Figure 5-18 on page 87.


Figure 5-18 Database Storage by Computer report table

Total Database Freespace or DMS Container Freespace report


The Total Database Freespace report provides information about the total free space for data files at a network-wide level, as well as the percentage of free space, the total used space, the number of free extents, and the number of data files. This report provides the following information:
Free Space - Amount of free space on all data files in the network
Percent Free - Percentage of free space available on the data files in the network
Used Space - Amount of used space on the data files in the network
Total Size - Total size of the data files in the network
Free Extents - Number of free extents on the data files in the network
Coalesced Extents - Number of coalesced extents on the databases in the network
Number Data Files - Number of data files on the databases in the network
The Total DMS Container Freespace report shows the total free space for the containers associated with DMS tablespaces on UDB instances within the environment. This report provides the following information:
Free Space - Amount of free space available on the DMS containers within a network
Percent Free - Percentage of free space available on the DMS containers within a network
Used Space - Amount of storage space consumed on the DMS containers within a network
Total Size - Total amount of space on the DMS containers within a network
Number of Containers - Number of DMS containers within a network


Figure 5-19 shows an example of this report.

Figure 5-19 Total Database Free report

Segments with Wasted Space


The Segments with Wasted Space report provides information about Oracle segments containing allocated space that is currently empty and not being used. This report can help you discover space that can be reclaimed and allocated to other objects. This report is available for Oracle databases only. This report provides the following information:
Empty Used Space - Amount of empty used space within a segment (table, index, and so on)
Segment Creator - Owner of the segment
Segment Name - Name of the segment
Computer - Name of the computer on which the segment's instance resides
Instance - SID of the Oracle instance
Database - Name of the database to which the segment belongs
Tablespace - Name of the tablespace to which the segment belongs
Partition - Partition on which the segment is stored
Segment Type - Type of the segment: table, table partition, table subpartition, nested table, cluster, index, index partition, index subpartition, lobindex, lobsegment, lob partition, or lob subpartition
Parent Type - Subset of the segment type
Parent Creator - Owner of the parent segment
Parent Name - Name of the parent segment
Total Size - Amount of space allocated to the segment
Number of Extents - Number of extents allocated to the segment
Freelist Blocks - Number of blocks on the freelist chain
Initial Extent - Size of the first extent allocated to the segment
Next Extent - Amount of space Oracle will retrieve when allocating another extent
Maximum Extents - Maximum number of extents that Oracle will allocate to the object
Percent Increase - Percentage increase in size that Oracle will apply to the next extent
Figure 5-20 shows an example of this report.

Figure 5-20 Segments with Wasted Space report

5.4 Classifying data and analyzing reports


Classifying data is important because organizations should not pay to store or protect data that is not used or that is not critical to the business. After classifying data, they can reclaim storage space by taking actions such as deleting, moving, and archiving data. The following lists contain the reports we selected in 5.3.2, Collecting reports from TPC for Data on page 69.
The best system reports are:
Access File Summary
Access Time Summary
Disk Capacity Summary
Oldest Orphaned Files
Storage Access Times
Storage Capacity
Storage Modification Times
Total Freespace
User Space Usage
Wasted Space
The best database reports are:
Access File Summary
Total Database Freespace
Segments with Wasted Space
These system and database reports provide information about the storage infrastructure, occupancy, allocation, and usage of a single computer, a database, or the entire network, which makes it easier to understand the storage environment currently in use.
The best file reports are:
Largest Files Reporting
Duplicated Files Reporting
File Types Reporting
The best access load reports are:
Access Time Reporting
Modification Time Reporting
The next step is to classify the data, understand how to use each report above, separate the different types of files, and view the amount of storage consumed by each of them. The next section gives an overview of each type of data.

5.4.1 Types of data


The business value of files typically changes over time. Some files have no value at all, while some are only of temporary value. This section describes the different types of files that should be identified in a storage environment. The objective of classifying data is to reclaim storage space, find candidate files for moving to lower-cost storage, and identify data without business value that can be deleted.

Non-business files
These are files that do not belong to any business application. One approach to identifying these files is to use their extension; for example, an .mp3 file may represent a personal music file and therefore be a non-business file. However, the file types that are business-related are often industry specific; for a media company, .mp3 files may well be business-critical data. Therefore, organizations should individually define what data is related to their business. We discuss approaches to this in Chapter 7, ILM initial implementation on page 119.

Duplicate files
These might be copies of the same file created in different locations to share data among different applications, or files duplicated by users, typically on file servers. Duplicate files are usually identified by file name and size.


Temporary files
These are files that are created and, after being used, should be deleted. If not managed, these files can use storage space needed for critical data. These files are mostly identified by their extension, for example, .zip, .bkp, .old, and might also be older log files, dump files, etc.

Stale files
These are files that belong to users who no longer exist (also known as orphaned files), and files that have not been accessed for a period of time or have an access rate below a certain threshold. They are mostly identified by their last access or modification time.

Valid files
These are all remaining files that are related to business. Critical data, application data, and all files with value to a business are considered valid data. This data should be protected and allocated to high-cost or low-cost storage according to its value.

5.4.2 Data classification


This section describes the steps to classify and view the amount of storage used by the different types of data. The steps are:
1. Reporting non-business files
2. Reporting duplicate files
3. Reporting temporary files
4. Classifying valid data and verifying stale files to be migrated to other tiers
5. Reporting database unused space

Reporting non-business files


As mentioned before, organizations and system administrators should define which data has no value to the business. A quick way to find non-business files is to search for file extensions defined as non-business, and then investigate these files to see how much storage space is being consumed. Use the File Types report on page 82 to report the storage space occupied by file extensions defined as non-business. This report can be viewed by file system group. Using the groups defined in 5.3.1, Creating groups of data on page 67, you can evaluate how much space these files are using in each storage tier.

Reporting duplicate files


Duplicate files are usually created by file server users or intentionally created by administrators to share files among different applications. These files may be located on the same or on different servers. The quickest way to find duplicate files is to search for files with the same name, date, and size, as reported by the Duplicate Files report on page 81. This report analyzes metadata information only. File content is not checked by TPC for Data, so duplicate files with different names, for example, will not be identified. This report can also be viewed by the previously defined file system groups, so the space that these files are using in each storage tier can be easily viewed, too.
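A simple way to reproduce this metadata-only matching outside TPC for Data is sketched below: files are grouped by name and size across one or more scanned directories, and any group with more than one member is reported as a duplicate candidate. As with the TPC for Data report, contents are not compared, so the result is only a candidate list; the scanned paths are examples.

import os
from collections import defaultdict

def duplicate_candidates(roots):
    # Group files by (name, size); groups with more than one entry are
    # potential duplicates. Metadata only - file contents are not compared.
    groups = defaultdict(list)
    for root in roots:
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(path)
                except OSError:
                    continue
                groups[(name, size)].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    for (name, size), paths in duplicate_candidates(["/data", "/shares"]).items():
        print(f"{name} ({size} bytes): {len(paths)} copies")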


Reporting temporary files


Temporary files are files that no longer have value for any application. These files can usually be deleted. The quickest way to find temporary files is to search for file extensions most common for temporary files, for example, .tmp, .bak, .log, .dmp, .txt, .zip, .bkp, .old, core. Use the File Types report on page 82 to report storage space occupied by file extensions most commonly found in temporary files. This report can also be viewed by file system groups previously defined. By using these groups you can evaluate how much space these files are using in each storage tier.

Classify valid data and verify stale files


After viewing invalid data (non-business, temporary, duplicated files), the next step is to analyze the valid data and check if files are allocated in the correct storage tier. The following reports help to evaluate the storage usage and search for stale data.

Last access date report


Create a report showing the amount of storage accessed during different periods of time for each file system group or storage tier. First create a reporting file system group that includes the file system groups created in 5.3.1, Creating groups of data on page 67, that belong to the same tier. Table 5-4 shows an example.
Table 5-4 Reporting file system groups
Reporting file system group    Monitoring file system groups
T1-Total                       T1-Windows, T1-Unix
T2-Total                       T2-Windows, T2-Unix

For more details about reporting on file system groups in TPC for Data, see Chapter 5, Reporting, in IBM TotalStorage Productivity Center for Data Users Guide, GC32-1728. After creating the report groups, use the Access Time report on page 83 to show the amount of storage last accessed:
Between 3 and 6 months ago
Between 6 and 9 months ago
Between 9 and 12 months ago
More than 1 year ago
Generate this report by the report groups created in Table 5-4. Figure 5-21 on page 93 shows an example of this report generated by report groups, where the amount of storage for each tier can be evaluated.


Figure 5-21 Access Time Reporting by report group

Evaluate the amount of storage accessed during these periods of time. File systems whose data falls mostly in these older periods are good candidates for migration to a lower-cost storage tier, as shown in Figure 5-22 on page 94.


Figure 5-22 Access Time reporting file systems

Last modification date report


Create a report showing the amount of storage modified during different periods of time for each file system group or storage tier. Use the report described in Modification Time report on page 84 to show the amount of storage last modified:
Between 3 and 6 months ago
Between 6 and 9 months ago
Between 9 and 12 months ago
More than 1 year ago
Generate this report using the report groups created in Table 5-4 on page 92. Figure 5-23 on page 95 shows an example of this report generated by report groups. Use this report to evaluate the amount of storage per tier.


Figure 5-23 Modification Time Reporting by report group

Evaluate the amount of storage modified during these periods of time. If there are file systems in one or more periods, they could be good candidates for migration to a lower-cost storage tier, as shown in Figure 5-24 on page 96.


Figure 5-24 Modification Time reporting file systems

Largest files report


Create a report showing the largest files for each file system group or storage tier. Use the report described in Largest files report on page 96 to show the largest files for each file system, file system group, or tier (report group), and evaluate whether they are allocated to the appropriate storage tier. You can define filters for this report to exclude file paths or file names that correspond to valid large files. For example, exclude all files under /oracle or with the .dbf extension, since these are database files. Generate this report using the report groups shown in Table 5-4 on page 92. For more details about creating filters for reports in TPC for Data, see Chapter 5, Reporting, in IBM TotalStorage Productivity Center for Data Users Guide, GC32-1728.
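A comparable largest-files list with simple exclusion filters can also be produced with a short script, for example on a server that is not yet monitored. In this sketch the excluded patterns (/oracle paths and .dbf files) and the scanned path are examples only.

import os, heapq, fnmatch

EXCLUDE = ["/oracle/*", "*.dbf"]   # example filters for valid large database files

def largest_files(root, top=20):
    candidates = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if any(fnmatch.fnmatch(path, pat) for pat in EXCLUDE):
                continue
            try:
                candidates.append((os.path.getsize(path), path))
            except OSError:
                continue
    return heapq.nlargest(top, candidates)   # top entries by size

if __name__ == "__main__":
    for size, path in largest_files("/data"):   # example path only
        print(f"{size / 2**20:10.1f} MB  {path}")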

Reporting database unused space


Databases are typically large consumers of storage space and a complete ILM assessment should analyze these data files to report the amount of storage used and unused by them. Use the report described in Total Database Freespace or DMS Container Freespace report on page 87 to show the amount of free space remaining in the datafiles. Utilization of 60 percent or less in database files is common, meaning the remaining space cannot be used by other files. This report checks the data files to show how much space is free. Another useful report specifically for Oracle databases is described in Segments with Wasted Space on page 88. Use this report to show the unused segments and to check how much space is wasting storage resources.
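For a quick check of Oracle data file free space without TPC for Data, a DBA can query the DBA_DATA_FILES and DBA_FREE_SPACE views directly. The sketch below uses the cx_Oracle client library; the connection details are placeholders, and the query requires a user with access to the DBA views.

import cx_Oracle  # requires the Oracle client libraries

QUERY = """
SELECT df.tablespace_name,
       ROUND(df.bytes / 1048576)          AS size_mb,
       ROUND(NVL(fs.bytes, 0) / 1048576)  AS free_mb
FROM   (SELECT tablespace_name, SUM(bytes) AS bytes
        FROM dba_data_files GROUP BY tablespace_name) df
LEFT JOIN (SELECT tablespace_name, SUM(bytes) AS bytes
           FROM dba_free_space GROUP BY tablespace_name) fs
       ON df.tablespace_name = fs.tablespace_name
ORDER BY free_mb DESC
"""

def tablespace_free(dsn, user, password):
    conn = cx_Oracle.connect(user, password, dsn)
    try:
        cur = conn.cursor()
        cur.execute(QUERY)
        return cur.fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    # Connection values are placeholders only.
    for name, size_mb, free_mb in tablespace_free("dbhost/ORCL", "system", "password"):
        print(f"{name:25} {size_mb:10} MB total {free_mb:10} MB free")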


5.5 Defining actions with classified data


The quick assessment goal is to collect, as quickly as possible, information about storage usage to help administrators view the current environment status and take actions to reclaim storage space and improve storage capacity and availability. These are important decisions and should be made considering the information provided by the data classification reports, the administrators' knowledge of their systems, and the best practices and recommendations of product vendors. The following sections provide general suggestions to help manage the space used by each type of data. For more details and proposed solutions for ILM and data classification, see Chapter 7, ILM initial implementation on page 119.

5.5.1 Actions for non-business files


The section Reporting non-business files on page 91 shows TPC for Data-generated reports with the file types selected as non-business and the amount of storage they are using. Ultimately, the best choice might be to simply delete these files, considering that enterprise storage space should not be used for non-business files. However, at this stage of the assessment, it is important to know how much space is being used by these files. If the reports show large amounts of storage occupied by non-business files, an administrator should further investigate why these files are being created and take appropriate actions to stem this growth, for example, implementing quotas for users' personal directories or issuing guidelines on the storage of non-business files by users.

5.5.2 Actions for duplicate files


In Reporting duplicate files on page 91, TPC for Data generated reports for duplicate files and the amount of storage they are using. Possible solutions for duplicate files are:
Share files on network drives so that only one copy is accessed by all users.
Create symbolic links to allow applications to share files.
Delete unnecessary duplicate files.
Sharing duplicate files means reorganizing the files into a shared location where all applications can use them, for example, a simple shared folder or Network Attached Storage (NAS). Creating symbolic links is a useful and fast solution for duplicate files on the same UNIX system. They can easily be created between versions of duplicate files or directories, saving storage space and preserving the access paths needed by applications. Deleting duplicate files may be the best option, but this should be done carefully to ensure that users and applications still have access to their files.
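When a duplicate is confirmed and symbolic links are the chosen remedy, the replacement can be scripted. The sketch below (UNIX file systems only, with hypothetical paths in the example call) compares the two files byte for byte before removing the duplicate and linking it to the master copy, so that existing access paths keep working.

import os, filecmp

def replace_with_symlink(duplicate, master):
    # Compare contents before deleting anything - name and size alone are not proof.
    if not filecmp.cmp(duplicate, master, shallow=False):
        raise ValueError(f"{duplicate} and {master} differ, not linking")
    os.remove(duplicate)
    os.symlink(master, duplicate)   # link keeps the old access path alive

# Example call with hypothetical paths:
# replace_with_symlink("/app2/config/prices.dat", "/shared/config/prices.dat")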

5.5.3 Actions for temporary files


In Reporting temporary files on page 92, TPC for Data generated reports on temporary files and the amount of storage they are using.


Often temporary files can be deleted on a cyclical basis, for example, deleting files in temporary directories that are older than 90 days (or whatever policy you set) and archiving older log files. It is important to monitor the amount of storage space used by temporary files and check whether the growth rate is constant or variable; it should be roughly proportional to the overall used-storage growth rate.
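A cyclical cleanup of this kind is easy to script. The sketch below deletes files older than 90 days from a list of temporary directories; the directories and the 90-day limit are example policy values, and the dry-run mode prints what would be removed so the list can be reviewed before anything is deleted.

import os, time

TEMP_DIRS = ["/tmp", "/var/tmp", "/data/scratch"]   # example locations
MAX_AGE_DAYS = 90                                   # example site policy

def purge_old_temp_files(dry_run=True):
    cutoff = time.time() - MAX_AGE_DAYS * 86400
    for root in TEMP_DIRS:
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if os.path.getmtime(path) < cutoff:
                        print(("WOULD DELETE " if dry_run else "DELETING ") + path)
                        if not dry_run:
                            os.remove(path)
                except OSError:
                    continue   # skip files that disappear or are unreadable

if __name__ == "__main__":
    purge_old_temp_files(dry_run=True)   # review the output before running for real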

5.5.4 Actions for stale files


In Classify valid data and verify stale files on page 92, TPC for Data generated reports showing the amount of storage last accessed and last modified in different periods of time. By analyzing these reports, we can select file systems to be migrated to a lower-cost storage tier. In the first report, described in Last access date report on page 92, files last accessed between 3 and 9 months ago are good candidates to be evaluated and migrated to a lower tier. Files not accessed for more than 9 months may be considered candidates for migration to the lowest-cost storage or for archival to sequential media (for example, optical or tape). In the second report, described in Last modification date report on page 94, files last modified between 6 and 9 months ago may be considered candidates for migration to lower-cost storage. However, a further study should be done to check which storage they are using and which application is using them. If they are located on high-performance, cache-write disks, evaluate whether storage with these characteristics is still needed for these files.

5.5.5 Actions for RDBMS space


In Reporting database unused space on page 96, TPC for Data generated reports showing the amount of storage used by RDBMS databases, the free space in data files, and the segments wasting storage. Check the storage space unused by databases and the amount of free space remaining in data files, and report them to the database administrators. This space can be used to satisfy future needs and growth. When more space is needed, expand the database into the unused space instead of allocating more disks. If data files have excess free space, reclaim it with procedures that reduce the database size. For example, check with the database administrators whether it is possible to export and import the database to reduce the storage space.

5.6 ILM - Return on investment (ROI)


We should expect an ILM project to generate a return on investment, but how can this be calculated? ILM is a service level solution, and an ROI calculation should consider savings in many areas of investment: hardware, software, services, and support costs, among others. The main ROI factors focus on improvements in application availability, storage utilization, and personnel productivity. Some or all of the main elements of ILM can be implemented to meet the data management needs:
Tiered storage management
Long-term data retention
Data lifecycle management
Policy-based archive management
Organizations should invest in information management solutions so that they can grow storage, and back up and move data, without impacting access. They should have centralized control of backup, restore, and copy services, and a common pool of unallocated storage. They also need detailed knowledge of storage device utilization to make it easier to plan storage capacity and growth. The next sections describe some factors that improve storage return on investment.

5.6.1 Data classification and storage cost


After reclaiming space by deleting invalid files and extra versions of duplicate files, the ROI is demonstrated by:
Additional space on existing storage, using the delta between the current and future utilization of the existing storage. Reducing the utilization of current storage reduces the amount of disk storage that needs to be purchased.
Data on storage with the appropriate quality of service and cost. Enterprise storage resources are used for critical data only, and the reclaimed space is used for critical data growth or new critical applications.
Improved recoverability, availability, and accessibility of enterprise storage data, because invalid data is not consuming critical storage tier space.
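The first point can be made concrete with a simple back-of-the-envelope calculation. All figures below are illustrative assumptions, not measured values; substitute your own capacity, growth, and cost numbers.

# Illustrative figures only - substitute your own capacity and cost numbers.
tier1_capacity_tb = 20.0    # installed enterprise (tier 1) capacity
tier1_used_tb     = 17.0    # currently used
reclaimable_tb    = 3.5     # invalid, duplicate, and stale data found on tier 1
tier1_cost_per_tb = 25000   # assumed fully loaded cost per TB of tier 1 storage
annual_growth_tb  = 4.0     # expected yearly growth of tier 1 data

used_after_cleanup = tier1_used_tb - reclaimable_tb
deferred_tb = min(annual_growth_tb, reclaimable_tb)   # growth absorbed by reclaimed space
print(f"Utilization drops from {tier1_used_tb / tier1_capacity_tb:.0%} "
      f"to {used_after_cleanup / tier1_capacity_tb:.0%}")
print(f"About {deferred_tb:.1f} TB of new tier 1 purchases deferred, "
      f"roughly ${deferred_tb * tier1_cost_per_tb:,.0f}")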

5.6.2 Data management and personnel cost


An important element for calculating ROI in an ILM solution relates to application outages, in particular, how many of these outages are due to problems caused by the difficulty of managing storage. The cost of the outages is measured in terms of revenue, profit, or savings lost due to downtime. In many organizations, between 10 and 20 percent of outages are related to problems with storage management, for example, application outages caused by running out of free space. With ILM techniques, manual effort is reduced for functions such as:
Gathering and analyzing storage usage information
Planning and implementing new applications that require significant amounts of storage
Moving and migrating data
Planning for storage growth
Going server to server to individually manage storage
The reduction of manual effort can be achieved by investing in:
Better tools that make it easier, or automatic, to provision new storage for new applications
The ability to move and migrate data nondisruptively, virtualizing storage and defining policies to automatically migrate data among storage tiers
Managing server data from a centralized point
Centralizing access to storage
These techniques strongly improve return on investment by reducing storage personnel cost and application and storage downtime.

5.6.3 Long-term retention and non-compliance penalty costs


The first step in a company's compliance efforts should be to assess the effectiveness of its current internal controls and information management processes. Identification of risks and controls, and evaluation of the effectiveness of these controls, are important processes for organizations that need to comply with regulations. Non-compliance penalties should also be considered when investing in long-term retention solutions. For example, Sarbanes-Oxley (SOX), which requires CEOs and CFOs to personally certify quarterly and annual financial statements, can bring fines of up to $5 million or 20 years in prison if violated. Other industries, such as healthcare and life sciences, insurance, and banking, have their own compliance regulations and penalty costs that are important when considering ROI for ILM long-term retention solutions. In defining the technical, business, and regulatory requirements for archiving, it is critical to consider a solution's key features and whether those features meet the organization's archiving and compliance needs. ILM techniques for retention, driven by storage costs and return on investment, are a major focal point in approaching a solution of this nature. The impact of the solution with respect to an overall enterprise content management initiative, as well as support for a greater storage management and compliance infrastructure, is key to defining an ILM project's success.

5.6.4 Backup/archiving solutions cost - Disk or tape


Tape solutions have been used as the main backup and archive storage media for many years. But with the need to recover critical data quickly and make data available within seconds after a disaster, disk solutions have also become an important media type for fast recovery. The emergence of Serial ATA (SATA) technology and falling disk costs are making disk another viable media choice for backup and archive storage. A disk-to-disk backup process can make copies instantly: usually called a snapshot copy, point-in-time data is copied from one disk to another disk and can be restored rapidly. For instant disaster recovery, disk-to-disk remote mirroring solutions make data available in seconds after the disks at the primary site are lost; a target disk at the secondary site is continuously updated with the primary site disk changes. Disk-to-disk backup has many benefits, but it cannot replace tape completely. Tape has several unique benefits that should be considered when investing in backup and archive storage, for example:
For disaster recovery environments, tape is removable for offsite storage and it is inexpensive.
For backup requirements where several versions are needed. Disk-to-disk backup can be used for the current backup version, but the cost and capacity of disk make it less viable for maintaining multiple (older) backup versions. Tape is very useful in this case to store many versions and provide historical point-in-time recovery.
For scalability and growth. The capacity needed to back up data grows at the same rate as the data volume. Disk growth needs more controllers, floor space, and software; tape is much easier to scale by simply adding more tape cartridges.
For long-term retention. Regulations and compliance require lengthy retention periods (years) for data that may not be accessed for a long time. Using disk to store all the compliance data would be very costly; tape is the most cost-effective storage for long-term archival requirements.
When investing in backup or archive storage, consider not only the price per gigabyte of storage, but also:
Disk raw capacity and average utilization
Tape average utilization
Hardware configuration cost
Controllers and software for servers
Environmental costs
Typically, including all of the above in the calculation shows that disk solutions can be up to 11 times more expensive than tape solutions. Therefore, the best choice between disk and tape for backup and archive depends on access patterns, the amount of data, and recovery time objectives. If the business states that critical data must be restored in seconds, disk solutions are the best choice. For fast restore of some data combined with lower-cost storage, tape is the best solution.
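A rough comparison of effective cost per usable gigabyte can be worked out as shown below. Every figure is an illustrative assumption (media price, utilization, infrastructure and environmental costs over five years), not an IBM price; the point is that utilization and infrastructure overheads, not raw media price alone, drive the ratio between disk and tape.

def effective_cost_per_gb(media_cost_per_gb, utilization, infrastructure_cost,
                          capacity_gb, annual_env_cost, years):
    # Raw media cost plus infrastructure and environmental cost, divided by
    # the capacity that is actually usable at the assumed utilization.
    usable_gb = capacity_gb * utilization
    total_cost = (media_cost_per_gb * capacity_gb
                  + infrastructure_cost
                  + annual_env_cost * years)
    return total_cost / usable_gb

# Hypothetical figures for a disk pool and a tape library.
disk = effective_cost_per_gb(media_cost_per_gb=1.00, utilization=0.55,
                             infrastructure_cost=30000, capacity_gb=50000,
                             annual_env_cost=3000, years=5)
tape = effective_cost_per_gb(media_cost_per_gb=0.10, utilization=0.80,
                             infrastructure_cost=60000, capacity_gb=500000,
                             annual_env_cost=2000, years=5)
print(f"disk: ${disk:.2f}/GB   tape: ${tape:.2f}/GB   disk/tape ratio: {disk/tape:.1f}x")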

5.7 ILM Services offerings from IBM


You can engage IBM to help perform the ILM assessment in your enterprise. Among the offerings is the 4 to 6 week ILM Assessment study that addresses the following:
Introduction to ILM concepts, solutions, tools, products, and services
High-level review of storage issues and challenges
Prioritized recommendations, benefits, financial analysis, and next steps
You may then choose a deeper analysis of the storage environment to help you answer questions about the following:
Development of an ILM strategy, roadmap, or implementation plan
Data classification review and recommendations
ROI and TCO analysis for an ILM solution
Regulatory compliance analysis and guidelines
For more information about these and other service offerings, consult your IBM representative.


Chapter 6.

The big picture for an ILM implementation framework


After reading the previous chapters you now know the different business drivers for implementing ILM, the key technology enablers, and the different benefits. You know that ILM requires a change in thinking, to leave the traditional function-based storage model and evolve to a service-based storage model. In this chapter you will learn about:
The big picture and why you should care about it
Some entry points to ILM, whether ILM is for you, and why there is not one packaged solution
Why ILM is more than tiered storage
Why ILM is not the final step to the ultimate storage model
How the different elements of ILM fit together


6.1 The big picture and why you should care about it
There has been much discussion about what ILM is. The best way to understand it is to know where the ILM best practices fit in your IT environment. We give an example of such an IT infrastructure in the big picture shown in Figure 6-1 on page 105. This figure represents a sample environment only. There are four main blocks, represented by the large boxes:
Business consulting, assessment, and definition
Application and server hardware
Software infrastructure and automation
Hardware infrastructure
On the following pages we show which components are part of each group and how they interact. This will enable you to recognize where your current software and hardware fit, and gives you a starting point for implementing ILM best practices and logically placing them in a new framework. Finally, on the left side of Figure 6-1 on page 105 you see a description of the major components; on the right side you see the orchestration tools.


Figure 6-1 ILM implementation framework at service level maturity

6.1.1 Business consulting, assessment, definition


The aim of ILM best practices is to align information to business requirements through management policies and service levels associated with applications, metadata, and data. If your organization is large, has complex IT environments and distributed storage solutions, the initial assessment can be quite challenging. IBM can work with you to define this baseline or you can do it yourself. We provide some guidelines for this assessment in the following chapters. During the assessment, you will identify the value, lifecycle, and classification of information for each business unit. This collaboration requires not only input from IT service management, but also from the business processes and business applications owners. This

collaboration is very time intensive, but crucial for the rest of the ILM implementation. Depending on the size of your organization, this baseline assessment may take weeks or even months. IBM services can help you shorten this time by giving your organization a common context and frame of reference for discussing ILM. We recommend using a Storage Resource Management (SRM) tool (such as IBM TotalStorage Productivity Center for Data) to identify the information assets and infrastructure resources and services. An overview was given in Chapter 4, Product overview on page 37, and Chapter 5, An ILM quick assessment on page 65, gives a selection of reports we used to identify data. After a first step in the ILM implementation, you need a tool such as TPC for Data to monitor the changes. You want to compare the monitored data to your new service levels to see whether you met the expected results and benefits, and you may want to refine the service levels you defined. The next step is the enforcement of those policies. Although this is still manual work at the beginning of your ILM implementation, your policy-based service levels now enable you to automate these tasks with ILM management tools. Figure 6-2 summarizes the steps in this part of the implementation.
(Figure 6-2 shows these consulting, assessment, implementation, and ongoing tasks as a cycle: service level gathering, data classification, adding business value to data, finding information classes, assessment and planning, designing storage tiers, deciding ILM policies, implementation, monitoring service levels against targets, and enforcement.)

Figure 6-2 Business, assessment, and ongoing tasks

6.1.2 Application and server hardware


Applications or physical servers are often used to define service levels and tiers. The choice of operating system may dictate the tools available. Using a single operating system or a small number of platforms simplifies the communication to each ILM management tool. This is because, for key monitored and ILM-operated servers using IBM software, an agent is required. This agent is available for most major operating system platforms. For specific details see the IBM support site for each product. Organizations with


many different operating systems see a great benefit in having only one type of agent across the different operating systems. In our big picture we used these four agents, as shown in Figure 6-3:
Tivoli Storage Manager client - The Tivoli Storage Manager client is the backup agent. It communicates with a Tivoli Storage Manager server for backup and archive, and can transfer data over either the LAN or the SAN. The Tivoli Storage Manager client optionally includes the Hierarchical Storage Management (HSM) agent.
TPC Agent - The TotalStorage Productivity Center (TPC) agent communicates with the TPC server components. It is used to monitor the file systems from a server point of view.
CM Agent - The Content Management (CM) agent communicates over the LAN with the CM server. This agent is used to check the policies and can be used to execute scripts on policy violations.
SDD driver - A common problem in complex storage configurations is the requirement to install and maintain multiple device drivers to handle hardware from different vendors. With a virtualized solution, such as the IBM SAN Volume Controller (SVC), you need only one disk multipathing driver, while supporting hardware from many storage vendors.
Figure 6-3 Server types and agents

Figure 6-3 also shows a sample tiered storage setup, which is mapped to the storage classes. So in this setup, the Web, database, and mail servers have been determined to require a platinum class of storage, and within that class are two storage tiers, T1 and T2. In the gold class, T2 and T3 are used, while in the silver class, T3 is used, plus an additional tape-based tier, T4.

6.1.3 Software infrastructure and automation


ILM best practices require some software tools to be standardized in your enterprise. Figure 6-4 on page 109 shows the major software components of a full ILM implementation. These are:
Backup/restore/archive/tape library management/HSM tool
Storage Resource Management (SRM) tool
Provisioning manager
Content manager
Data mover to automate the enforcement of policies
Each software component is described in more detail in Chapter 4, Product overview on page 37. We suggest first looking at what you already have; nearly every client has one or more automated backup products. These should be standardized on one comprehensive product if possible. Many enterprise backup software solutions provide a wide range of operating system support, which is key for a centralized solution. The backup solution should have a script or API to provide automation, and should also allow automatic deletion of data (compliance or non-business) after it has been archived. The SRM tool is used during the assessment phase to collect data on how much data you have. In the post-implementation phase, the SRM tool (for example, TPC for Data) allows you to monitor your service levels. In this case you define reports according to your defined policies to represent the service levels you want to meet. A next step is the automation of data placement. This requires, among other things, a provisioning manager tool (for example, Tivoli Provisioning Manager). Hierarchical Storage Manager (HSM) is the tool used to dynamically move data to less expensive storage tiers. According to your defined SLAs, which are represented as policies, you use HSM to analyze and move infrequently accessed data to the lower tier, leaving a stub file behind. If any application or user needs to access the file, the HSM agent automatically recalls the file from storage. Most backup/HSM and SRM tools (including Tivoli Storage Manager and TPC for Data) use metadata such as the creation, modification, and access dates, the file name, and the file type for their operation. For more sophisticated file management criteria, you need to implement a content management solution; see Chapter 9, Data lifecycle and content management solution on page 165, for more information. If data retention is needed for compliance solutions, you can use an appliance-based solution, such as the IBM System Storage DR550, or a more general purpose solution, such as IBM System Storage Archive Manager.
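To illustrate the policy idea only (this is not how Tivoli Storage Manager HSM works internally, and the paths and age limit are assumptions), the sketch below moves files that have not been accessed for a defined period to a lower-cost file system and leaves a small text stub recording the new location. TSM HSM performs the equivalent migration transparently and recalls files automatically on access.

import os, shutil, time

SOURCE_TIER = "/tier1/projects"   # example high-cost file system
TARGET_TIER = "/tier3/projects"   # example low-cost file system
AGE_LIMIT_DAYS = 180              # example policy derived from an SLA

def migrate_stale_files(dry_run=True):
    cutoff = time.time() - AGE_LIMIT_DAYS * 86400
    for dirpath, _, filenames in os.walk(SOURCE_TIER):
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                if os.stat(src).st_atime >= cutoff:
                    continue   # recently accessed, leave it on the fast tier
            except OSError:
                continue
            dst = os.path.join(TARGET_TIER, os.path.relpath(src, SOURCE_TIER))
            if dry_run:
                print(f"would migrate {src} -> {dst}")
                continue
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)
            with open(src + ".migrated", "w") as stub:
                stub.write(dst + "\n")   # record where the data went

if __name__ == "__main__":
    migrate_stale_files(dry_run=True)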


Figure 6-4 Software components

6.1.4 Hardware infrastructure


The hardware environment has three components:
Storage environment
Backup environment
Archive environment
In each of these, we might use different types of storage media. See Figure 6-5 on page 110 for an overview. It shows the more expensive and reliable storage on the left. As you move to the right, the storage class becomes lower, down to backup and finally archive. This corresponds to the selection of tiers: from left to right, tier 1 to tier 4 in this example. In a horizontal perspective, we show the virtualized environment, using the SAN Volume Controller (SVC). Overall, the components can be managed from a centralized SMI-S management point.


Figure 6-5 Storage Hardware Infrastructure

Storage environment
As client space requirements grow, the request for new storage solutions is not based only on user, application, and storage subsystem features. The serviceability of the new configuration is a major factor, and it requires the flexibility to choose among different vendors and technologies. Market-leading products in particular may be incompatible with other vendors' hardware products. Often the interconnection is simply impossible, or can be achieved only by giving up the advantage of premium features. In storage products these are often copy services, such as synchronous copy (PPRC), FlashCopy, and snapshot features. Although most enterprise products include such additional features, many of them are not compatible between different products, even from the same vendor. Virtualization, with a product such as the SVC, is a key means to attack this problem. As you can see in Figure 6-5, the SVC sits between the different storage products and the virtual logical drives presented to the server operating systems. It provides these advantages:
Independence from disk vendors
Tested interoperability
Simplification in storage server deployment, because of a single multipath disk driver
Reliability, by mirroring virtual disks across different vendors' storage subsystems
A single provisioning point for all your disk management tasks
Freedom of online migration across tiers
For a detailed description of the SVC features, see Chapter 4, Product overview on page 37. ILM best practices require the flexibility to move data across tiers. SVC is one possibility for manually migrating data into the new storage environment. While implementing ILM best practices, the ability to build tiers is one of the foundation tasks. Usually this includes moving data across an existing storage infrastructure, but also migrating to new storage products. Downtime can be minimized by leveraging the import and migrate features of SVC. Because the SVC moves data across the SAN, the migration can be done with minimal impact to the servers. Among storage and server practitioners, a very common storage partitioning rule is to use the same logical drive size for every server. While this simplifies administration, the big disadvantage is wasted space. ILM best practices save space by assigning exactly the space required. Obviously there is a need to get more space in the future, or to spread access to storage evenly among all available physical disk resources. This is another feature of


SVC. You now use pools with standard-sized chunks to create so-called virtual disks. You can either increase individual virtual disks on the SVC, or add new virtual disks and concatenate them with a Logical Volume Manager (LVM). If a virtual disk is no longer being used, it can be deleted, returning the free space to the pool for future re-allocation. Interoperability often requires diligence to find a common supported level of features. Advanced features such as copy services are almost never interoperable among vendors. With a tiered storage environment there can be a need to copy data across tiers, that is, from an enterprise storage product to a midrange or low-cost storage product, and the copy service feature is almost never compatible outside of the component class. SVC provides storage-class-independent and vendor-independent SAN-based copy services. This also simplifies the management of copy service licenses across the enterprise, as you only need one license. Check out your savings in copy service licenses.

Backing up data from the storage environment


Most backup solutions are built on a client-server structure. It is the responsibility of the backup server to decide where the backup data is stored; thus, the movement is always controlled by the backup server. This also means that backup metadata is transported across a LAN connection, while the backed-up data itself can be sent either through a LAN or through a SAN connection. ILM best practices help simplify the backup configuration by reducing the amount of data to be backed up. See the redbook IBM Tivoli Storage Manager Implementation Guide, SG24-5416, for more details.

Backup environment
Traditionally, the backup environment consists of tape libraries as the ultimate backup destination. Consolidation can be achieved by using midrange to enterprise libraries, which can be divided into independent partitions. This provides the flexibility to move resources such as tape drives or tape slots across logical libraries without hardware intervention. Today's backup solutions are often based on the principle of save everything, because of the simplicity of implementation. However, such a mentality leads to an expensive implementation because of:
High hardware cost for many duplicated backup copies
Higher tape usage and more tape drives required for tape handling
Increasing backup time as data grows
Increasing restore time as data grows
As your environment evolves to an ILM process-based storage environment, you are not only saving data storage costs, but also reducing backup times.

Archive environment
Today, data archiving is driven not only by law, but also by corporate governance rules and the protection of corporate assets. Hence, there are company-critical assets that need to be archived. Typically the retention time for this data is several years; however, for specific data this can vary from a few months to forever. And this can change, especially if there are ongoing litigations requiring your archived data. To differentiate transactional data from retention data, you should consider the storage and data characteristics described in Table 6-1 on page 112.


Table 6-1 Retention-managed data characteristics

Storage characteristics of retention-managed data include:
Variable data retention period: Usually a minimum of a few months, up to forever.
Variable data volume: Many clients start with 5 to 10 TB of storage for this kind of application (archive) in an enterprise. It also usually consists of a large number of small files.
Data access frequency: Write once, read rarely, or read never; see the data life cycle in the following list.
Data read/write performance: Writes handle volume; reads vary by industry and application.
Data protection: Pervasive client requirements for non-erasability, non-rewritability, and destructive erase (data shredding) when the retention policy expires.

Data characteristics of retention-managed data include:
Data life cycle: Usage after capture for 30 to 90 days, and then near zero. Some industries have peaks that require access, such as check images in the tax season.
Data rendering after long-term storage: The ability to view or use data stored in a very old data format (say, after 20 years).
Data mining: With all this data being saved, we believe there is intrinsic value in the content of the archive that could be exploited.

ILM best practices urge you to do the classification of your data as one of the first steps. But it is important to understand that there is no tool that can attach a retention value to data automatically; this is one of the characteristics defined in your information classes during the first ILM assessment. The output of this ILM assessment lets you decide which ILM policies you will use to automate your data management. It is important to understand that these policies change constantly, depending on the numerous compliance rules. You should consider seeking appropriate legal counsel to ensure that your solution is in compliance with those requirements. Clients remain responsible for ensuring that their information technology systems and data retention practices comply with applicable laws and regulations. The key software products of the IBM data retention and compliance solution are IBM Tivoli Storage Manager (including IBM System Storage Archive Manager), IBM DB2 Content Manager, and TPC for Data. The required hardware infrastructure uses disk space and WORM-capable tape drives or optical libraries. The IBM System Storage DR550 is a preconfigured solution for data retention and is described further in Understanding the IBM TotalStorage DR550, SG24-7091. The main focus of the DR550 is to provide a secure storage system, where deletion or modification of data is completely disallowed except through well-defined retention and expiration policies. ILM-based archiving practices do not focus only on compliance. Depending on the defined service levels for your data, you may want to archive non-compliance-based data. The same tools, for example TPC for Data, can be used to analyze this data.

6.1.5 Management tools


Administrators are confronted with a large number of different hardware management tools. This makes management more error prone, and human error is usually the most common cause of service outages. With the increase in different tools, much time and effort is expended on education and support of these software products. For enterprise clients, multi-vendor interoperability adds further complexity. ILM best practices move toward a centralized management solution.

A first step was made by the Storage Networking Industry Association (SNIA) by establishing the Storage Management Initiative Specification (SMI-S). The interface can be used to implement the following functionality:
- Volume management
- LUN masking/mapping
- Asset management
- Status/event monitoring

This interface is based on the Common Information Model (CIM) and Web-Based Enterprise Management (WBEM) technology. With CIM/WBEM technology, clients are able to manage multi-vendor products with a single management application. Figure 6-6 shows where this technology can be found.

Figure 6-6 Placement of WBEM and CIM technology: management applications, such as a device manager, communicate through WBEM (XML over HTTP) with CIMOM providers for disk arrays, tape libraries, FC HBAs, and FC switches.
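
To give an impression of how a management application uses this interface, the following minimal sketch uses the open-source pywbem Python library to enumerate the storage volumes exposed by a CIMOM. The host name, credentials, namespace, and property names are assumptions; the classes and namespaces actually offered depend on the vendor's SMI-S provider.

    import pywbem  # open-source WBEM client library for Python

    # Connection details are placeholders; use the values of your SMI-S/CIM agent.
    conn = pywbem.WBEMConnection('https://cimom.example.com:5989',
                                 creds=('cimuser', 'password'),
                                 default_namespace='root/cimv2')

    # Enumerate the storage volumes that the provider exposes and print their size.
    for volume in conn.EnumerateInstances('CIM_StorageVolume'):
        size_bytes = volume['BlockSize'] * volume['NumberOfBlocks']
        print(volume['DeviceID'], round(size_bytes / 2**30, 1), 'GiB')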

We used the IBM TotalStorage Productivity Center (TPC) to manage our lab storage environment. There is a TPC section in 4.2, TotalStorage Productivity Center for Data on page 38. Cross-vendor hardware management is a young and growing field, so you should carefully review the compatibility and interoperability features of a management tool. The best way to test the functionality in your environment is to perform a proof of concept (POC) with management tools of different vendors. As the next step in ILM best practices is to automate your tasks, you should also pay close attention to the ability to drive these management tools with scripts.

As a goal, a centralized storage infrastructure management tool should be able to combine or automate several subtasks to help the administrator with daily management tasks. An example could be the following situation: your application administrator requires more disk space. You, as the storage administrator, know the required service level and the derived profiles. Now you need to know the required size and performance of the new disk space. Today this task would involve the use of several tools on different storage products. It is a time-consuming task that many administrators have to skip because of a lack of time or tools, or because higher management does not yet see the need for a new investment. Now imagine a centralized storage management tool that can check all your storage assets and propose the best option while enforcing the service levels, give you an estimate of your storage growth so that capacity can be requested in advance, and produce a final report to enable exact charge back to your clients.
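
As a thought experiment, the following minimal sketch shows the kind of decision such a tool could automate: given an inventory of storage pools with a known service class and free capacity, pick a pool that can satisfy a request while keeping some headroom. The pool names, service classes, and capacities are invented for illustration.

    # Hypothetical inventory of storage pools: service class and free capacity in GB.
    pools = [
        {"name": "ds8000_gold",   "service_class": "gold",   "free_gb": 800},
        {"name": "ds4000_silver", "service_class": "silver", "free_gb": 2500},
        {"name": "sata_bronze",   "service_class": "bronze", "free_gb": 6000},
    ]

    def select_pool(required_gb, service_class, headroom=0.15):
        """Pick a pool of the requested service class that still leaves headroom free."""
        candidates = [p for p in pools
                      if p["service_class"] == service_class
                      and p["free_gb"] * (1 - headroom) >= required_gb]
        # Prefer the fullest suitable pool (best fit) to limit fragmentation.
        candidates.sort(key=lambda p: p["free_gb"])
        return candidates[0]["name"] if candidates else None

    print(select_pool(500, "silver"))   # -> ds4000_silver
    print(select_pool(900, "gold"))     # -> None: triggers a capacity plan exception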

Manual storage management is a risk. The intent of management automation is to eliminate the possibility of human errors and to reduce the time to provide storage to a server. These tasks are commonly known as storage provisioning. An example product is IBM Tivoli Provisioning Manager (TPM) with a storage-only focus.

How does it work today without storage provisioning automation? Manually. A manual storage provisioning process requires wide-ranging product knowledge, from software to hardware. It involves many different management tools and, in enterprise companies, the coordination of several administrators across business lines. This is time consuming and introduces many opportunities for misunderstanding and error. As an example, here is a list of the common steps that would be required to assign a new SAN-attached LUN to a server:

1. Add a volume (storage subsystem).
   a. Select a storage subsystem.
   b. Select or create a new volume.
   c. Select host HBA ports (WWNs).
   d. Select subsystem controller ports (WWNs).
   e. Map the volume to controller ports.
   f. Map the volume to host HBA ports.
   g. Determine whether multiple paths are required.
   h. Create or update zones.
   i. Get an active zone set.
   j. Add a zone to the zone set.
   k. Activate the zone set.
2. Set paths (SAN fabric switches).
3. Set up replication (if necessary).
4. Map the HBA LUNs to the operating system and file system.
5. Update the volume group and file system (host server).
   a. Add the physical volume to the volume group.
   b. Add the physical volume to the logical volume.
   c. Create or extend the file system.
6. Extend the application to use additional space.
7. Reconfigure backup.

This is an impressive list of skills an administrator needs to have, and it requires a lot of planning and coordination. Furthermore, you need to know your current environment: you need documentation of the actual configuration of each involved device and also a change management document. Often each change requires a myriad of checks to be sure you have up-to-date information.

Tivoli Provisioning Manager, especially in conjunction with TPC, can be used to automate these tasks. Its ability to perform tasks across end-to-end components and management disciplines is the great benefit of using TPM. If you are now thinking of automating tasks like the one in the example above, bear in mind that it is not always a good idea to simply copy manual steps into an automation process. A great amount of time should be spent defining the automation steps. Walk through the steps in your workflows to understand where the risks are. Usually, defining the first automation task is the most complex, but once this first step is taken, TPM will significantly simplify and reduce the amount of time needed to manage your environment.


Note: For more information about TPM see the following IBM Redbooks and Redpaper:
- An Introduction to Storage Provisioning with Tivoli Provisioning Manager and TotalStorage Productivity Center, REDP-3900
- Exploring Storage Management Efficiencies and Provisioning - Understanding IBM TotalStorage Productivity Center and IBM TotalStorage Productivity Center with Advanced Provisioning, SG24-6373
- Provisioning On Demand Introducing IBM Tivoli Intelligent ThinkDynamic Orchestrator, SG24-8888

6.2 What to do now - The many entry points to ILM


When considering ILM, it is common for organizations to start with a target like implementing tiered storage to reduce cost. This transformation is one ILM practice, but there are more. In fact, tiering is sometimes erroneously considered to be synonymous with ILM. Most small and medium businesses (SMBs) first come into contact with ILM because of tiering. Other entry points to ILM are the introduction of new disaster recovery capabilities, new security requirements in the classification of data, or new regulations or compliance requirements for long-term data archiving.

The foundation of all these entry points is tiering. Tiering your storage means defining a hierarchy of storage systems based on service requirements. But in an ILM context, tiering is only the second step; it is, in effect, the output of the first step. The first step, as we have seen already, is to know what kind of data you have and to add business value to the data. As covered in previous chapters, going through a complete ILM process involves working through a number of business processes: establishing service level agreements (SLAs); classifying data; creating information classes; and implementing, regularly monitoring, and reviewing your changes. When deciding whether to embark on an ILM project, you will need to consider whether the end cost and efficiency savings will justify this investment. Get the big picture first. Where is ILM evolving from? What is the roadmap?

Many service offerings are available today for a first ILM assessment. After one week of work, the output of such a review may not be quite what was expected, particularly for smaller enterprises, where the number of applications and services is limited to a few business-critical applications. Such small environments have service levels bound to every single application, and the hardware infrastructure is homogeneous, containing one single class of disk storage. Your review may end up telling you to have three service levels for your three applications, which was exactly the way you were running your environment before ILM. The advantages lie more in the storage cost savings you can achieve by eliminating the need to buy more storage.

You should ask yourself why there is significant data growth in your company. The answer depends highly on you as the client. It may be because of new applications, or because you are simply running out of volume space, to name a few possibilities. Companies are growing in users, increasing storage requests, and wasting more and more space with redundant data. ILM best practices define policies and give you the tools to manage all this data by stemming the constant growth in storage requirements. It is much less expensive to introduce these practices at an early stage of storage growth. If you see the possibility of your data growing significantly in the near term, you should consider implementing ILM best practices first. And although a final ILM implementation involves significant automation, we suggest starting simply by analyzing your environment and deciding, after you know about your data, what could be automated.


A common misunderstanding is implementing tiered storage and expecting this alone to account for significant storage savings. In these cases, storage costs often continue to rise. How can this be? The fact is that even though raw dollars per megabyte of storage continue to decrease, the overall costs increase, because management costs form an increasingly higher proportion of the total costs, together with the costs of monitoring and regulatory compliance. Enforcement of your service levels must be done by using your policies, which are the output of an ILM assessment, to configure storage resource management software.

Now that we have looked at the big picture, we can go into some specifics. There are examples of particular implementations in Chapter 7, ILM initial implementation on page 119, Chapter 8, Enforcing data placement on page 153, and Chapter 9, Data lifecycle and content management solution on page 165.


Part 3

Sample solutions
This section provides a three-stage solution process for implementing ILM.


Chapter 7. ILM initial implementation


This chapter describes how to implement parts of an ILM solution that will reduce overall storage infrastructure and management costs in a short period of time. The solution is based on three pillars:
- Establishing a storage service management process
- Reducing allocated storage space
- Implementing tiered storage

As this is the primary entry point, we provide an ILM concept that is static in the sense that data will not move automatically between the active disk storage tiers over time. The solution is based on these three components:
- TotalStorage Productivity Center for Data
- SAN Volume Controller
- IBM Tivoli Storage Manager for Space Management


7.1 Storage management


A first step in the initial ILM implementation scenario is to implement and adapt storage management practices. The goal of this step is to create a service-based storage environment that has clearly defined goals and rules on what will be delivered to the users. When starting to define a service-based environment, a couple of things should be documented and created. The starting point is always the so-called governance model. The governance model is the high-level service definition, which includes:

- The principles, which define the general rules by which the storage service is implemented and run. For each defined principle, the benefits that it brings to the business should be listed, as they are the justification for adopting it. In addition, principles should include the key tasks, required organization, and cost for the application of the principle. An example of a principle could be: all business-related data residing on the IT systems will be stored on a centrally managed storage infrastructure. Without such principles there is no foundation for any decision leading to change, making it very difficult to justify the changes. This becomes even more important when embarking on ILM projects, as they set a direction for the future that might take some time to realize fully.
- The policies, which define how things will work from a high-level perspective. For example, a policy could define who has the responsibility for a certain task or how a certain goal will be reached. A policy could, for example, define the security level for certain data, who can access it, and why.
- Finally, the guidelines provide a view of the future: what the expected goals are and their benefits. For example, in an ILM context, a guideline could be that storage devices should be utilized to a certain level (for example, 85 percent), or that data should be stored according to its business value.

Although the above might seem abstract and exaggerated, going through this exercise will set a basic framework to facilitate the decisions required to reach the final goal. Implementing an ILM-based solution will require cooperation and decision making at a global level in an organization, with clear and stated management support.

Next to the governance model, the service definition should also contain the processes that will be followed. The following list provides an overview of possible processes involved in storage management activities:
- Capacity Management
- Provisioning Management
- Performance Management
- Procurement
- Monitoring and Alerting
- Reporting
- Backup
- Asset Management
- Incident/Problem Management
- Policy Management, Relationship Management, Billing and Charge back

When mapped to the ITIL framework, the following two processes are most closely related to the first step of the ILM solution:
- Capacity management
- Service level management

Although other ITIL processes like availability management, Business Continuity, change management, and financial management are also important, they will not have a direct input for the ILM implementation tasks. There will be relationships between all of these processes. For example, availability management will be required to control violations in the service level managed by the service level management process. Also, financial management will be called upon for expansion of capacity when the capacity management process requests such an expansion. In the following two sections we focus on the primary two processes, which will provide valuable input for the design of the ILM solution.

7.1.1 Capacity management


The first process that we discuss is the capacity management process. The capacity management process focuses on the following three points:
- Monitoring the IT infrastructure and supporting components (resources) from a performance and usage aspect
- Improving efficiency by performing well-defined tuning actions
- Planning for future requirements and growth, including the refresh of technology

Note: The ITIL definition of capacity management includes performance management.

The main goal of this process is to have the available infrastructure to meet requirements as they arise, at the most optimal cost. If this process is not followed, infrastructure expansions will tend to be reactive (rather than proactive), and cost analyses will be done ad hoc for individual requirements, rather than included in a general plan. Often costs will be higher if requests are handled and procured individually rather than grouped together (more components in one procurement action). In addition, a capacity management plan ensures that no components are forgotten when a new request is fulfilled. For example, when adding systems to a storage environment, you need to review not only storage capacity, but also SAN capacity, cabling, host FC HBAs, and so on.

A process as defined by ITIL will always have an input and an output (see Figure 7-1). Depending on the type of process, the required input information and output results will differ.

Figure 7-1 Generic process (input -> process -> output)

For the capacity management process, the following input can be used:
- A technology review, in order to understand how it can help to achieve the strategy and goals. Since we are providing a storage service that relies heavily on the available storage infrastructure, knowing what the current capabilities are will help to define the possibilities on offer. An example of this is the available I/O rate on disk devices. It makes no sense to specify a rate that is higher than that currently available on the market.

- The existing service level agreements (SLAs). In 7.1.2, Service level management on page 124, we describe the service level management process. The link to the capacity management process is that a good service level agreement should specify the expected growth and future projects that are planned. For example, a service level agreement might state that a certain application currently needs 100 GB of disk capacity, and that it will need an additional 100 GB in 6 months. In addition, the SLA will also state the expected performance levels.
- The planned change requests.
- The financial and budgetary information, which provides a view on what can be spent on capacity or performance enhancements, both for the planned growth and the tuning operations.

The output of the capacity management process includes the capacity plan, the required capacity reports, alert levels, and recommendations to include in the service level requirements. Now let us consider the capacity management process activities or tasks.

Tasks in capacity management


Figure 7-2 gives an overview of the tasks that make up the capacity management process: monitoring, change, tuning, and analysis.

Figure 7-2 Capacity management tasks: a cycle of monitoring, analysis, tuning, and change, fed by TPC for Data capacity reports (current and growth) and usage of file systems and databases, and producing resource utilization and exception reports for service level management.

The monitoring activity monitors the utilization of the storage resources. This includes usage (current and future) and performance. As these are part of the service level definitions, the monitoring data can be used to report on service level violations. The required reports can be created using TPC for Data. An example is shown in Figure 7-3 on page 123.


Figure 7-3 Space usage over time

In addition to the growth reports, TPC for Data can also be used to create exception reports whenever the capacity of a storage component reaches a defined high threshold. The analysis activity will be used to show trends in performance and utilization, and can be used to plan for growth. Note that growth can be attributed to two factors:
- Natural growth from existing applications and usage
- Change-induced growth from changes to applications or the implementation of new applications

This analysis process will only reveal natural growth. Growth caused by change is an input to the capacity management process coming out of the change management process. In addition, the analysis phase can be used to compare current utilization against the baseline set in the initial architecture. The tuning step and subsequent change step will define and implement better resource usage techniques based on the result of the analysis phase. It is logical that after a change action all phases will reiterate to review the change and find new issues or exceptions to the defined service levels.
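
As an illustration of the analysis activity, the following sketch fits a simple linear trend to periodic used-capacity samples (such as the figures behind a TPC for Data growth report) and projects when a defined threshold will be reached. The sample values and the 85 percent alert level are invented for illustration.

    # Monthly used-capacity samples in GB: (month index, used GB). Illustrative values.
    samples = [(0, 410.0), (1, 425.5), (2, 441.0), (3, 455.0), (4, 472.5)]
    capacity_gb = 600.0
    threshold = 0.85 * capacity_gb  # assumed alert level of 85 percent utilization

    # Ordinary least-squares fit of used = a + b * month.
    n = len(samples)
    sum_x = sum(x for x, _ in samples)
    sum_y = sum(y for _, y in samples)
    sum_xy = sum(x * y for x, y in samples)
    sum_xx = sum(x * x for x, _ in samples)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)  # growth in GB per month
    a = (sum_y - b * sum_x) / n

    months_to_threshold = (threshold - a) / b
    print(f"growth: {b:.1f} GB/month, threshold reached in {months_to_threshold:.1f} months")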

The capacity plan


One of the goals of capacity management is the creation of a storage capacity plan. A capacity plan will be the primary input for all further storage infrastructure investments. As a result, it will allow the fulfillment of future storage requests, in line with business requirements. The plan should contain a current usage report and future utilization trend information, ideally split into near-term, mid-term, and long-term.


7.1.2 Service level management


Along with capacity management, service level management is a very important part of the storage management optimization process. The goal of service level management is to create and improve an IT Service (like storage), so that it aligns with client requirements and cost justifications. The base component of the service level management practice is the service level agreement (SLA). An SLA is a two-sided agreement between the service provider and the user. It documents the service targets and responsibilities. Although focus is usually given to establishing the targets, the responsibilities are equally important. They document who is responsible for what, and set the conditions under which the service levels are met. Figure 7-4 describes the activities and relationships between them in the service level management process.

Figure 7-4 Service level management activities: establish and improve the SLA, monitor, and report.

Establishing a service level


The targets or service level requirements are a set of specifications that the storage service should meet. The following list provides the high-level topics that can be included in a service requirement definition:

- Availability: The availability component of the service level specifies the time periods in which the service is available. It includes the overall availability (for example, 99.9 percent) and can include specifications for planned and unplanned downtime, acceptable periods for planned downtime, and required advance notice.
- Performance: The performance component of an SLA sets the minimum performance levels that the infrastructure should meet. The most common metrics used are throughput and response time.
- Recoverability: The recoverability component defines what data should be recovered in the case of failure, and how long it should take. Commonly used specifications are the recovery time objective (RTO) and the recovery point objective (RPO). The RTO is the time needed to recover from a disaster (in other words, how long you can afford to be without your systems). The RPO describes the age of the data you want the ability to restore in the event of a disaster. For example, if your RPO is 6 hours, you want to be able to restore systems back to the state they were in no longer than 6 hours before the disaster. To achieve this, you need to be making backups or other data copies at least every 6 hours. Any data created or modified inside your recovery point objective will either be lost or must be recreated during a recovery. If your RPO is that no data is lost, synchronous remote copy solutions are required. It might be useful to describe the recoverability service levels related to different types of disasters. Clearly, the RTO will be different in the case of a site-wide disaster, compared to an accidental file deletion (which some users may also describe as a disaster).
- Accessibility: Probably the most abstract component of the storage service level, the accessibility component also describes the most important part of data management from an ILM perspective. Accessibility will provide information about required capacity and planned growth, but also about conditions to move data from one storage class to another. In addition, it can describe the access patterns a certain data type has, for example, block or file, sequential or random.
- Security: Data classification includes a security component, for example, confidential data, auditable data, and so on. It can also include regulatory requirements for data, such as data that must be retained for set periods of time.
- Support: The support component describes what the help desk will do, and when it will do it. It should define response times for different types of incidents, as well as document the types of incidents to which the help desk will respond.
- Billing: Finally, the billing component describes the methodology used to charge back storage services used and set the cost, mostly in terms of the used capacity. This component is only required in the SLA for environments that have implemented billing for internal IT services.

As this list indicates, creating a service level for storage can be a complex and time-consuming task. However, in most cases, starting with just a subset (for example, one or two) of the components will allow the construction of data classes and subsequent storage classes or storage tiers. When considering storage tiers, the four most applicable components are availability, performance, recoverability, and accessibility.

A way to approach the actual creation of the requirements is to ask the users for their most important attribute of the storage service: availability, performance, recoverability, or accessibility. You can then focus on the most important component or components, reducing the complexity of the storage service level. As the service matures, additional components can be added. A second approach is to focus on the user pains. By asking which part of the storage service can be improved or what the biggest issues are, a view of things to fix and to define can be easily gained. While this has the potential advantage of fixing issues and improving user satisfaction in the short term, be careful not to set unrealistic expectations for the result. This might disappoint the users and inhibit further cooperation on service level agreements.
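
One practical way to keep these components manageable is to capture each storage service class as a small structured record that the questionnaire, the monitoring tools, and the tier definitions can all refer to. The following sketch is illustrative only; the class names and target values are invented.

    from dataclasses import dataclass

    @dataclass
    class StorageServiceClass:
        name: str
        availability_pct: float   # overall availability target
        response_time_ms: float   # performance target
        rto_hours: float          # recovery time objective
        rpo_hours: float          # recovery point objective
        inactivity_days: int      # accessibility: days before data may move down a tier

    # Example service catalog with invented targets.
    catalog = [
        StorageServiceClass("gold",   99.9, 10.0, 4,  0,  180),
        StorageServiceClass("silver", 99.5, 20.0, 24, 6,  90),
        StorageServiceClass("bronze", 99.0, 50.0, 72, 24, 30),
    ]

    for sc in catalog:
        copies = ("synchronous remote copy" if sc.rpo_hours == 0
                  else f"data copies at least every {sc.rpo_hours} h")
        print(f"{sc.name}: availability {sc.availability_pct}%, RTO {sc.rto_hours} h, {copies}")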

Creating a service level agreement


The flowchart in Figure 7-5 on page 126 provides an example of the steps involved in creating an SLA.


Figure 7-5 Steps to create a service level agreement: define stakeholders, issue a questionnaire, define the most important component, create a draft SL, perform a feasibility check against the documented storage capabilities, agree on the SL, and implement the SL (renegotiating if the feasibility check fails).

The steps are:

1. Define who will be involved in the SLA creation process; in other words, who is the user. For storage services, determining the users is not always as straightforward as it might seem. This is because the storage service is mostly a component of a larger IT function (for example, e-mail, Internet, intranet, file serving) delivered to business users. In most cases, the system or application administrators can provide the required input for the service level requirements. This presumes that an SLA already exists for the overriding IT function. If this is not so, the users of the IT function should be included in the service level definition process.
2. Create a questionnaire to allow the users to define their most important components of the storage service. As noted above, this can be done based on capacity, availability, recoverability, and accessibility. The most difficult part here is to create a questionnaire that translates storage parameters into user terminology. Be careful not to overstate the requirement. Users may tend to set standards that are higher than the ones they actually require. To avoid this, work with a predefined set of available service levels, including a charge back or indicative cost component.

   Note: A single function can have multiple service levels, depending on the type of data.

   You should analyze the questionnaire answers to decide which components to focus on.
3. A draft service level proposal can be created based on the input received. It should be reviewed early on to see if the required service levels have achievable targets. It is pointless to create a service level definition for a storage service that is impossible to deliver using available technology at a reasonable cost. It is also important in this phase to check whether monitoring can be accomplished on the most important parts of the service level objectives. For more details on the monitoring part, see the next section, Monitoring on page 127. If the review identifies either achievability or monitoring as critical factors, you may have to renegotiate the requirements. Once the feasibility check is complete (and successful), the service level objectives can be agreed upon to create the service level agreement.

Monitoring
Monitoring is a critical part of service level management. It includes monitoring all components of the service level to allow further analysis. When defining monitors, the most important thing is that the monitors measure something meaningful, as expressed in the service level, and that they are aligned with the users' perceptions. It is a good practice to combine measurements with user reviews to ensure that the measurements are in line with what the user experiences. In addition, remember that the service level agreement consists of two parts:
- The agreed-upon service level objectives
- The agreed-upon conditions or user responsibilities under which these objectives can be obtained

As a result, the monitoring activity should also include monitoring of the responsibilities and conditions defined in the service level, to make sure that these are not the reason for failing to meet the service level.

Reporting
Reports will allow the user and service provider to review the service levels delivered. The reporting part of a service level agreement should include the following:
- How the reporting will be done and to whom it will be distributed. This also includes details on how measurements will be taken.
- What will be reported on.
- When reports will be released.

There are two basic types of report, triggered by different events:
- A periodic report, triggered on a time basis
- A service level breach report, triggered by an exception condition, which reports on the incident and eventual corrective actions

7.2 Optimization of storage occupation


This section discusses the storage occupation optimization part of the initial ILM implementation scenario. It provides information about how to reduce the occupied space, turning it into allocatable free space. Free space allows the storage administrator to have capacity available for future capacity requirements, delaying and reducing the frequency of capacity upgrades. Figure 7-6 on page 128 provides an overview of storage space allocation, and how the space is divided up.


Figure 7-6 Overview of space usage: raw capacity splits into RAID overhead and usable capacity; usable capacity splits into allocated and free capacity; allocated capacity splits into used and unused space; used space splits into business data and reclaimable data (non-business, temporary, duplicate, and stale data). The space reclamation focus is the reclaimable and unused space, and the reclamation goal is to turn it into free capacity.

The initial capacity, or raw capacity, is the total capacity of all disk devices available in the storage subsystems. This raw capacity will typically be grouped into RAID arrays. The overhead introduced at this level depends on the chosen RAID level. For RAID 10 arrays, it is around 50 percent. For RAID 5 arrays in a 7-disk plus parity configuration, the overhead is 12.5 percent. These overheads are required for availability reasons.

The usable capacity is the actual amount of capacity that is available for allocation to the hosts. This capacity is divided into two parts:
- Allocated capacity, which is space that is currently allocated to hosts
- Free capacity, which is the part of the usable capacity that we want to increase so that it is usable for new host allocation requirements

To allow an increase in the free capacity, we focus on two parts of the allocated capacity:
- The part of the allocated space that is currently not in use, that is, unused space. For details on this see 7.2.2, Avoiding over allocation on page 146.
- The part of the used space that is reclaimable. Reclaimable space is currently occupied by data that has little to no business value. The next section, 7.2.1, Reclaimable space on page 128, discusses the different components of this part of the used space, and shows techniques to reclaim it.
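
As a worked example of this breakdown, the following sketch computes usable capacity from raw capacity for the RAID 5 overhead quoted above and shows how unused and reclaimable space add up to a space reclamation target. All figures are invented for illustration.

    raw_tb = 20.0                 # total disk capacity in the subsystem
    raid_overhead = 0.125         # RAID 5, 7 data disks plus 1 parity disk
    usable_tb = raw_tb * (1 - raid_overhead)

    allocated_tb = 14.0           # space currently assigned to hosts
    free_tb = usable_tb - allocated_tb

    used_tb = 11.0                # space actually occupied by data
    unused_tb = allocated_tb - used_tb
    reclaimable_tb = 2.0          # estimate of non-business, temporary, duplicate, stale data

    potential_free_tb = free_tb + unused_tb + reclaimable_tb
    print(f"usable {usable_tb:.1f} TB, free today {free_tb:.1f} TB, "
          f"potentially free {potential_free_tb:.1f} TB")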

7.2.1 Reclaimable space


As explained above, data is of two types: valuable business data, and data that can be considered unnecessary (non-business data). When optimizing storage, it is a good practice to first reduce (or remove) this data from the storage infrastructure.

Consider an analogy of moving from one house to another. When you do, you have two options:
- Perform a cleanup/consolidation activity before moving, throwing away anything that is broken or that you no longer need. While this requires some effort up front, it results in faster packing and faster moving using fewer resources, and makes the job of fitting everything in the new house much easier, since everything can go directly where it is supposed to go.
- Pack everything, move everything, and clean up after the move. While this is easier, as no pre-sorting is required, you will spend longer packing and moving everything in your house; use more resources like boxes, truck space, and gas; and the unpacking will be more difficult. This is because you have more things than you need, so the tendency is to dump everything in any room, then potentially have several iterations of moving things to their final storage location. If you have not reviewed your possessions before moving, you might even be buying a larger house than you really need, just to have space to store unnecessary items.

In the storage optimization process, sorting the data and removing any unnecessary data before starting the move to tiered storage makes the capacity planning and the actual data placement faster and more accurate. The unnecessary data is of one of the following types:
- Non-business data: Data that has no business use. Often this data is collected and stored by users contrary to (unenforced) company policies (personal files or files downloaded from the Internet).
- Temporary files: For example, files that are created during installations, or dump files.
- Duplicate data: Multiple versions of the same data object.
- Stale data: Data that has not been accessed in a long period of time, belongs to users who are no longer active, or was part of obsolete applications.

We now discuss techniques to locate and identify these data types.

Non-business data
While it is clear that non-business data should not reside on enterprise-class storage, few people actually implement rules or processes to enforce this. While this might seem illogical, the most usual reasons are very understandable, namely:
- It is difficult to identify what is business data and what is not.
- There is no accurate idea of how much storage space is consumed by this type of data.
- There are no clear SLAs with end users on how to manage non-business data.

Distinguishing business data from non-business data is not an easy task. While a naive blanket rule might be to prohibit files based on their file extension, for example, media files like mp3, mpeg, wmv, and wma, the growing popularity of technologies such as e-learning and podcasting makes these file types appropriate to a business context in some cases. Therefore such rules are too coarsely grained. Another issue is that it is not always easy to know what volume this type of data represents. If non-business data has not even been identified, how can you measure how much storage it is consuming? Finally, in most cases, there is no existing agreement between users and the storage service provider on what types of data should be stored on the storage systems.

Here we give one methodology for differentiating between business and non-business data:

1. A prerequisite for this method is the adoption of a critical guiding principle: all managed data should be described within a service level agreement (as described in 7.1.2, Service level management on page 124).

   Adopting this principle will allow us to map the data to the services supported and identify which users and applications should have which data. Then we can proceed.
2. Map each function described by an SLA to the actual applications used. For example, the e-mail service might be provided by the Lotus Domino application. File serving should also be included.
3. After identifying the applications, the next step is to identify the location of the application data. The data location can be defined as the system on which it resides, and/or the directories in which the data is placed. Depending on the consolidation of applications on a system, choose one or both.
4. Using TPC for Data, create a report of the types of files (based on file extension) that use the most space. A sample is shown in Figure 7-7. You should limit the number of types detected to reduce the subsequent complexity. Depending on the usage level, between 10 and 20 file extensions might be sufficient. The actual number of types that need to be investigated depends on the utilization. For example, if the top five file types use 95 percent of your storage, do not bother examining 20 file types. If the top 20, however, only represent 50 percent, you will need to add more file types.

Figure 7-7 Top 10 file types using the most space

5. Create a matrix to map applications to file types, indicating which file type is used by which application. You will need the help of the application or system administrators. For example, a file extension of .nsf is a Lotus Domino database, and an extension of .xls is used by Microsoft Excel. Table 7-1 provides an example.
Table 7-1   Define which application uses what file type

              Application A   Application B   Conclusion
File type 1   Yes             No              Used by A
File type 2   No              No              Not used
File type 3   Yes             Yes             Used by A&B

In the above table, we can see that file type 2 is not used by any application. As a result, this type of data should not exist. The other file types are considered valid, since they are used by at least one application.

6. Create a TPC for Data report that shows the file types relative to their location. Based on the other information gathered, we can now create an exception report that will list all file types that are not in their defined location. With the above information we can now conclude the following about the file types existing in the storage environment:
   - File types that belong to an application and are in the correct location. These are part of the business data.
   - File types allowed by the applications, but that are not in the correct location. These might be non-business data.
   - File types that are not associated with any application, and as a result are not defined by an IT function. These files are probably non-business data.
7. After having identified the non-business data, the final step is to deal with it. Basically, there are three options:
   - Create exception reports and communicate them to the users, indicating violations of the service level agreement.
   - Move identified data to a quarantine location, from which it can be recovered if required.
   - Delete the files.

This whole process is summarized in Figure 7-8 on page 132.


Figure 7-8 Defining non-business data: obtain SLAs, identify applications, create a matrix of used file types per application, document the data location per application, define the top file types by space usage, create a report listing file types as a function of location, create an exception report, and handle exceptions (communicate, move, or delete) depending on whether the file type is required and in its correct location.
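
The decision logic behind this flowchart can be expressed compactly. The following minimal sketch assumes that the application-to-file-type matrix and the documented data locations have been exported into simple Python structures (for example, from a TPC for Data report saved to a file); the extensions and paths are invented.

    # Hypothetical matrix: which application uses which file extension, and where it lives.
    app_matrix = {
        ".nsf": {"application": "Lotus Domino", "locations": ["/domino/data"]},
        ".dbf": {"application": "Finance DB",   "locations": ["/finance/db"]},
    }

    def classify(path: str) -> str:
        """Classify a file as business data, misplaced data, or probable non-business data."""
        extension = "." + path.rsplit(".", 1)[-1].lower() if "." in path else ""
        rule = app_matrix.get(extension)
        if rule is None:
            return "probably non-business (no application uses this file type)"
        if any(path.startswith(loc) for loc in rule["locations"]):
            return f"business data ({rule['application']})"
        return f"possibly non-business ({rule['application']} file outside its location)"

    for f in ["/domino/data/mail01.nsf", "/home/user/old.nsf", "/home/user/movie.mp3"]:
        print(f, "->", classify(f))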

The issue with this method is that it might be difficult to apply to file servers. File servers often do not have specific file types that are considered to be allowed. If there is such a specification, then there is no problem. If there is no such specification, then it is very likely that there is non-business data on the file servers, as that is a common place for users to store files. As a result, an additional control should be added to the location and file type conditions we defined above. One way to do this is to create a TPC for Data report listing the top space-using file types as a function of the users to whom these files belong. When this list has been generated, it should be carefully reviewed. Some users will be able to justify using a lot of storage for a certain type of file (for example, a graphics designer can easily justify having a high number of image-type files). For users who do not have a business need for a particular file type (or who exceed an acceptable quota for these types of files), further investigation will be needed to more precisely define the nature of these files. This carries some risk:
- Creating a negative user relationship
- Needing to repeat these analyses regularly

As a result, be careful before starting this investigation process. Make sure the possible space reclamation is sufficient to justify the steps taken. If the suspected data volumes are only a small fraction of the space you want to reclaim, it is probably not worth pursuing. This assumes that a reclamation goal has actually been set. A quick way to determine the potential space gain is to just look at the top space-using file types. If the reported space usage is much less than the reclamation goal, the results of this operation might not be optimal. In addition, these operations should be defined clearly in the SLA as part of the reporting activity. An example of an SLA statement could be: "A monthly review will be performed by the storage management team, which will generate and distribute a list of the following usage violations for the file servers: the top 20 file types that use the most disk space, and the users owning those files. For each violation, a justification will need to be provided. If the files cannot be qualified as business data, a delete operation will be performed after informing the file owner."

Temporary files
A second category of non-business data is temporary files. Temporary files are typically created at one point in time, and are intended to be deleted shortly afterwards. This, however, is not always the case. As a result, controlling the space occupied by temporary files might be an easy way to reclaim space. Again, the main issue is identifying these files, as they can be located in different places throughout the file systems. One advantage we have is that they usually follow some naming convention. As a result, the following rules could be used to detect them:
- Temporary application files: Look for files ending with tmp (*.tmp) and files containing a tilde (~) somewhere in the file name (~*.*, *.~*). Look for temporary directories or file systems like tmp, temp, and temporary.
- Dump and trace files: Look for files ending with dmp (*.dmp) or having a name like dump (dump*). Look for files ending with trc (*.trc) or having a name like trace (trace*). Look for directories or file systems like dmp or dump.
- Log files: Look for files starting or ending with log (log* or *log). Look for log directories.

This list can be supplemented with specific temporary files. Again, the application administrators should be involved to help define what is temporary data. It could be included in the application data definition activity in the determination of non-business data (see Non-business data on page 129 for details). Once you have identified the correct types, reports from TPC for Data can be analyzed to determine the space occupied by the temporary files. It should be clear that this space cannot be considered as automatically deletable. Temporary files can contain valid information that is required for operations. They should follow one rule, however: there is no reason that temporary data should grow more rapidly than other data. So the amount of temporary data must be compared to the used space and monitored over time. Ideally, the ratio should stay constant. First, define the baseline ratio (that is, the current percentage of space used by temporary files). Again, TPC for Data can give this information. Next, monitor to see if the percentage changes over time. The following cases explain what the results could be:

Constant ratio
The first possible result (see Figure 7-9) is that the ratio of the space used by temporary files to the total used space remains constant over time. This means that there is no increase in temporary files; or rather, no increase larger than the increase in total used space. Make sure to trend this ratio over a reasonable amount of time, and compare start values with end values. You do not want to mistake a one-off spike for a continuing trend, for example, when lots of trace/dump data is collected for problem determination, or during an application migration.

Figure 7-9 Ratio between temporary space and used space remains constant: the ratio of temporary space to used space stays at the baseline ratio over time; temporary increases can be normal, so use averaging techniques.

If the ratio is constant, there is a good chance that you will not be able to reclaim any of the temporary space.

Note: Even if the ratio remains constant, you should look at the initial ratio between temporary space and used space. If this seems high, a review of used temporary space might be required to determine whether too much is being kept. You could do a one-time purge of very old trace/dump and other temporary data to reclaim some space.

Increasing ratio
Your monitoring may show the ratio increasing, as shown in Figure 7-10 on page 135. If so, it is likely that applications are not cleaning up their temporary files, or that system and application administrators are using high-end disk space to store dumps or trace files. It certainly warrants a closer investigation. You should also realize that the increasing space consumption has probably been going on for some time; that is, it did not just happen when you started the monitoring. This means that the initial ratio is probably too high, and a cleanup operation might provide significant space gains.

Figure 7-10 Ratio between temporary used space and total used space increasing: the ratio of temporary space to used space climbs above the baseline ratio over time.

A point of discussion is the actual increase percentage at which this investigation should be triggered. Suppose you are monitoring a system that has a 20 percent ratio between temporary and total used space. If you see an increase of 1 percent per month in the ratio, your ratio will increase to 32 percent by the end of the year. If you have a yearly growth of 20 percent on your total storage, the temporary space would represent 6 percent of this growth for one year.

Note: A sudden increase in the ratio, with an almost constant curve before and after the increase, does not mean there is an issue, as this could simply be the result of a reclamation operation in the used data volume that does not impact the temporary space. If this occurs, you should, however, reset the baseline ratio to allow further comparisons.

Decreasing ratio
A final possibility is a (sudden) decrease in the ratio. This could be caused by:
- A cleanup operation in the temporary space
- An increase in the used space

Neither of these necessarily indicates a problem situation. However, you should reset the baseline to the lower ratio to allow accurate future trending.


Figure 7-11 Decrease in ratio between temporary and used space: when the ratio of temporary space to used space drops, set a new, lower baseline ratio at that point.

When starting to review the temporary space, be sure to add a section in the service level objectives to cover this operation. For example, you could add a clause in the conditions part of the objectives that indicates what is an acceptable ratio between temporary and used space. If this ratio is exceeded, a corrective action (remove files) will be performed, or justification will be required from the group owning the system or application. Now we show how to create TPC for Data reports on temporary space in the file systems using filters on directory names. For example, we could define a filter that shows the space usage in any directory that starts with tmp or temp. Start by creating a profile that will gather this information, as shown in Figure 7-12 on page 137.


Figure 7-12 Creating a profile - Defining statistics to gather

On the File Filters tab, define filters to search for the temporary directories (Figure 7-13).

Figure 7-13 Defining the file filters

When the profile has been created, define a scan that will use this profile (Figure 7-14 on page 138 and Figure 7-15 on page 138).


Figure 7-14 Defining the scan systems

Figure 7-15 Defining the scan profile

Now run the scan and go to the reporting section. The reports we need are located in the File Summary section. Select the correct profile and generate the report (Figure 7-16 on page 139).


Figure 7-16 Generating a report

In this example, the results (shown in Figure 7-17) are shown by file system. Ideally, the temporary space should be analyzed system wide. This is because we are looking at the overall ratio between temporary space and used space. Temporary data often resides in a different file system from the application data, for example, the file system in which the operating system and applications are installed. The generated report shows the space used by temporary directories, as well as the space used in the entire file system.

Figure 7-17 Temporary space report
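
Where a TPC for Data scan is not available for a particular host, the same naming-convention rules can be applied with a small script. The following sketch walks a file system, sums the space used by files and directories that match the temporary-data patterns listed earlier, and reports the temporary-to-used ratio. The starting path is a placeholder.

    import fnmatch
    import os

    TEMP_FILE_PATTERNS = ["*.tmp", "~*.*", "*.~*", "*.dmp", "dump*", "*.trc", "trace*", "log*", "*log"]
    TEMP_DIR_NAMES = {"tmp", "temp", "temporary", "dmp", "dump", "log"}

    def temp_and_used_bytes(root="/data"):        # placeholder starting path
        temp_bytes = used_bytes = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            in_temp_dir = any(part.lower() in TEMP_DIR_NAMES for part in dirpath.split(os.sep))
            for name in filenames:
                try:
                    size = os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    continue                       # file disappeared or is unreadable
                used_bytes += size
                if in_temp_dir or any(fnmatch.fnmatch(name.lower(), p) for p in TEMP_FILE_PATTERNS):
                    temp_bytes += size
        return temp_bytes, used_bytes

    temp, used = temp_and_used_bytes()
    if used:
        print(f"temporary space: {temp / 2**30:.1f} GiB, "
              f"ratio {100.0 * temp / used:.1f} percent of used space")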

Duplicate data
A third component of the non-business data is duplicate data. Duplicate data is made up of:
- Data intentionally duplicated (replicated) to allow two applications to use the same data.
- Data duplicated by users, typically on file servers. Maybe everyone wanted their own personal copy of the CEO's last address to the stockholders.

We discuss the first case in Chapter 8, Enforcing data placement on page 153.


For the second case, since duplicate files basically contain the same information, they should not exist. With TPC for Data, duplicate files can easily be located throughout a file system. This means that we do not have the same detection problem as with the previous two types of non-business data. The problem here is to reclaim the space used by duplicate files.

Note: TPC for Data detects duplicate files by comparing the metadata of the files. However, it will not scan the files for contents. This means that there is no 100 percent guarantee that both files will actually be identical. On the other hand, it also does not detect duplicate information stored in files with different metadata (for example, different names).

Duplicate data cannot simply be consolidated by deleting all of the extra versions of a file. Doing this would obliterate one of the important metadata parts of the file, namely the path in which it is located. As most users and applications rely on the path as the means of locating the file, removing or changing the path would mean a loss of data. So, in order to remove or reduce duplicate data, a change in data-sharing practices is needed. On UNIX systems, symbolic links might solve the above issue. This will work if the files remain duplicates over time. On Windows file servers, consider rearranging the data, moving from a user-centric schema for the home directories to an organization-centric schema. For organizations that are project oriented, a project-based directory might also be appropriate. See Figure 7-18 for an example.

Figure 7-18 Organizational and project-based file-sharing structure: alongside the standard user home directories, each organization has a directory with a Members area (files for all members of the organization) and a Global area (files from members of the organization for sharing with other organizations), and each project has a directory with a Team area (files for the project team) and a Global area (files for other project teams).

With the above structure in place, it will now be possible for end users to share data files between several people, reducing the need for all of them to have their own copy. Once this is available, TPC for Data could be used to generate lists of duplicate data still residing in user directories, and this list could then be used to create an awareness or enforcement of the use of the shared directories.
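
To complement the metadata-based report, a content-level check can confirm whether suspected duplicates really are identical. The following sketch groups files by size and then by an SHA-256 hash of their contents; the starting path is a placeholder, and on a large file server you would restrict the scan to the directories already flagged by TPC for Data.

    import hashlib
    import os
    from collections import defaultdict

    def find_duplicates(root="/shares"):                  # placeholder starting path
        by_size = defaultdict(list)
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue
        by_hash = defaultdict(list)
        for size, paths in by_size.items():
            if size == 0 or len(paths) < 2:                # a unique size cannot be a duplicate
                continue
            for path in paths:
                digest = hashlib.sha256()
                try:
                    with open(path, "rb") as f:
                        for chunk in iter(lambda: f.read(1024 * 1024), b""):
                            digest.update(chunk)
                except OSError:
                    continue
                by_hash[digest.hexdigest()].append(path)
        return {h: p for h, p in by_hash.items() if len(p) > 1}

    for digest, paths in find_duplicates().items():
        print(f"{len(paths)} identical copies:", *paths)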


There is much that could be done by future technology solutions in the area of duplicate data. We have considered here only file-based data. Semi-structured data, like e-mail systems, is an ideal breeding ground for duplicate copies because of the frequent forwarding of attachments. Ideally, a solution to this should be implemented at the mail server, so that attachments would actually point to a central file repository. Another way to attack this is to restrict the maximum size of forwarded e-mails, although this has its own user-related issues. Structured data, like databases, also often contains duplicate data because of a lack of control or poor design, resulting in tables with duplicate information, duplicate records, and so on. We expect solutions and service offerings that address these issues to become more prevalent in the future.

Stale data
A final type of non-business data is stale data. Stale data is data that has not been accessed in a certain period of time, or data that has access rates below a certain threshold. Stale data can be separated further into the following categories:
- Obsolete data (for example, information that is no longer current or data that is no longer used by applications)
- Archive-style data (for example, snapshots or point-in-time copies of particular files)
- Data belonging to users who no longer work for the company (also commonly called orphaned data)
- Data that is accessed periodically, but with long periods of inactivity (for example, financial data that is used at the end of the quarter or year)

These are quite distinct categories; therefore, different handling methods are required. Obsolete data, for example, could be deleted, archive and orphaned data could go to offline tape storage, while periodic data could go to nearline storage. The issue again, however, is to distinguish these four types of data. Unless they are clearly defined already (which is highly improbable), there is no intrinsic way to determine which data is of which type. Therefore, we can only differentiate them according to the available metadata, that is, the inactivity period. A key point is to have a clear view of this inactivity metric. In other words, how long does a file need to be inactive before it can be considered stale? The definition of these periods should be part of the service level agreement (accessibility part). Once this is defined, a two-tiered archiving solution might be created, which will:
- Store data on archival disk for a period of time, after a certain inactivity period has passed
- Move data to tape storage after a certain time has passed, typically equal to the inactivity time of the periodic data

This process, shown graphically in Figure 7-19 on page 142, can be accomplished using Tivoli Storage Manager for Space Management (commonly referred to as HSM).


Figure 7-19 Moving stale data in a two-tier Tivoli Storage Manager HSM solution: a file is moved from the active data tiers to the TSM storage pools by the TSM HSM client after x days of inactivity, and TSM migrates the file from disk to tape after y days.

The figure shows a setup using a two-tiered Tivoli Storage Manager solution with HSM. The space management client is installed on the servers managing the volumes that are eligible for space management. When a file meets the conditions for migration, it is automatically moved to the Tivoli Storage Manager server storage, leaving a stub file behind at the original location. A stub file is basically a pointer to the content of the file. To a user, it appears that the file still resides on the original file system. However, the user only sees the pointer or stub. When the file is accessed, a so-called recall is performed automatically and transparently, copying the file back from Tivoli Storage Manager managed storage (disk or tape) to file system managed storage space (disk). The conditions for moving a file depend on the operating system on which the HSM client is installed, but typically include file age (which can be defined by creation, last access, or last modification date), file location or name (including the extension), and space used within the original file system.

The advantage of setting up a two-tiered HSM solution is that, for an initial period of time, files can be retrieved from disk instead of tape. This is ideal for files that have a very high probability of being accessed within a certain period of time (like the periodically accessed files), since the recall time will be limited to the network transfer time of the file. If the file has moved to tape, there is additional tape processing time, including mounting the tape and moving to the location of the required file.

Note: As well as this general two-tiered implementation, Tivoli Storage Manager for Space Management can also make use of different management classes, allowing files to be managed differently (for example, moved from disk to tape). To do this, however, a clear definition of file types and/or locations must first be obtained.

With the above in mind, the following steps can be performed to establish an HSM-based inactive data management system. A first step, as usual, is to determine the applicability and benefit of an HSM-based solution. TPC for Data can create reports that show the average age of files, based on the last access date. There are three dates applicable to the file age:

- Last access date
- Last modification date
- Creation date

Of these three values, the last access date is probably the most accurate one when thinking in terms of the utilization frequency of a file. Based on this report, you can show the data access patterns. Figure 7-20 on page 143 is an example of a TPC report on last access times for files.


Figure 7-20 TPC for Data access time reporting

Note: The above report can be created at different levels. You can display the access distribution relative to the number of files or relative to the total space used by the files. As our primary concern at this point is space reclamation, it should be viewed at the level of the total space consumed by the files.

The above report can guide you on what the ideal retention periods would be for keeping inactive or stale data on your disk subsystems. For the two-tiered HSM solution explained above, you need to determine two periods:

- An initial stale period (x), after which the data will be moved to the primary Tivoli Storage Manager disk storage
- A second period (y), after which the data moves from disk storage to tape storage

Table 7-2 on page 144 provides an example of the distribution of data over date last accessed. When analyzing the data, you can add a column to indicate possible savings on your primary data tiers. To do this, make a cumulative summation of the percentages per access period, starting with the oldest data. The table shows that 10 percent of the data has not been accessed in over 1 year. A further 5 percent has not been accessed in more than 9 months, but less than a year. Therefore, if we moved all data that has not been accessed in 9 months or longer to lower tier storage, the potential capacity savings on our prime storage would be 10 + 5 percent = 15 percent.


Table 7-2 Usage as a function of access date (example)

Data accessed                  Percentage of total data size   Savings percentage
Less than 1 day                10                              100
Between 1 day and 1 week       5                               90
Between 1 week and 1 month     10                              85
Between 1 and 2 months         20                              75
Between 2 and 3 months         5                               55
Between 3 and 6 months         30                              50
Between 6 and 9 months         5                               20
Between 9 months and 1 year    5                               15
Over 1 year                    10                              10

You should first define the time period a file must be inactive before it is moved from disk to Tivoli Storage Manager storage pools. While this period should be part of the SLA, it might be a good idea to look at the above table and come up with a proposed value. A common value for this is one month. This means that the potential space savings for this data profile will be 75 percent of the total volume (referring again to the Savings percentage column).

As well as the volumes themselves, this data can also give you some important information about the prospective load on the HSM environment. The table shows that 20 percent of the data volume is accessed every 1 or 2 months. If the total storage is 1000 GB, this means 200 GB is accessed every 1 or 2 months. Then, assuming 30 days in a month, the largest access frequency for that group of data is 200 GB divided by 30 days, or approximately 7 GB per day. Using this, we can create a view of the environment, as shown in Table 7-3 (using a 1000 GB environment).
Table 7-3 Access loads for data based on age

Data accessed                  Volume    Days   Load          Cumulative load
Less than 1 day                100 GB    0      N/A           N/A
Between 1 day and 1 week       50 GB     1      50 GB/day     84 GB/day
Between 1 week and 1 month     100 GB    7      15 GB/day     34 GB/day
Between 1 and 2 months         200 GB    30     7 GB/day      19 GB/day
Between 2 and 3 months         50 GB     60     8 GB/day      12 GB/day
Between 3 and 6 months         300 GB    90     3 GB/day      4 GB/day
Between 6 and 9 months         50 GB     180    0.3 GB/day    0.8 GB/day
Between 9 months and 1 year    50 GB     270    0.2 GB/day    0.5 GB/day
Over 1 year                    100 GB    365    0.3 GB/day    0.3 GB/day
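The arithmetic behind these two tables can be reproduced in a few lines; the sketch below simply re-uses the example distribution from Table 7-2 (bucket lower bound in days, percentage of total data) and the 1000 GB environment, so the numbers are illustrative rather than measured.

# (minimum age in days since last access, percent of total data), from Table 7-2
DISTRIBUTION = [
    (0, 10), (1, 5), (7, 10), (30, 20), (60, 5),
    (90, 30), (180, 5), (270, 5), (365, 10),
]
TOTAL_GB = 1000  # assumed size of the environment, as in the example

def potential_savings(threshold_days):
    """Percent of data inactive for at least threshold_days (cumulative summation)."""
    return sum(pct for min_age, pct in DISTRIBUTION if min_age >= threshold_days)

def bucket_load(min_age_days, pct):
    """Approximate daily access load of one bucket: its volume spread over its age."""
    volume_gb = TOTAL_GB * pct / 100
    return None if min_age_days == 0 else volume_gb / min_age_days

print(potential_savings(30))            # 75 percent: 750 GB could leave the active tier
print(potential_savings(270))           # 15 percent: the "9 months or longer" example
print(round(bucket_load(30, 20), 1))    # 6.7 GB/day for the 1 to 2 month bucket

The Cumulative load column of Table 7-3 is obtained by adding, for each row, the loads of that row and all older rows in the same way.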

Based on this, if we archive all files that have not been accessed in one month or longer, the average load on the recall process will be about 19 GB per day (a worst case average, shown in the Cumulative load column). As a result, the archiving solution should be designed to handle this load.

Note: The above numbers reflect the expected average recall load for the HSM solution, that is, how much will move back from archival storage to active storage. Assuming a constant overall volume, we can then also conclude that if 19 GB per day is recalled, 19 GB per day is also archived.

Next, we should determine how long the archived data will remain on the Tivoli Storage Manager disk storage pools before being moved to tape. Here the periodically accessed data might be a good indicator. If you have data that is typically accessed in 3-monthly (quarterly) periods, you could plan to leave this data on the Tivoli Storage Manager disk storage pools for this period, depending also on the total volume of such data. This means that we would need to keep the data an additional 2 months on Tivoli Storage Manager disk (remember, we already assumed that data is only moved to Tivoli Storage Manager after an inactivity period of one month). Figure 7-21 shows an overview of where the data will reside as a function of time.

Figure 7-21 HSM data placement as a function of time (Day 0: file A is accessed for the last time on the active disk tier; Day 30: file A is migrated to the TSM disk storage pool by the space management client, and a pointer is left on the active disk; Day 90: file A is migrated from the TSM disk storage pool to the TSM tape storage pool using the TSM MigrateByAge function)

This means that 50 percent of the currently active data will eventually be moved to tape, based on the 50 percent cumulative amount shown in Table 7-2 on page 144 for data last accessed between 3 and 6 months ago. Figure 7-22 on page 146 summarizes the above information.


Figure 7-22 Overview of the two-tier HSM implementation (of 1000 GB on the active data tiers, 250 GB remains active, 250 GB moves to the TSM disk storage pool, and 500 GB moves to the TSM tape storage pool; the migration and recall flow between the active tiers and the TSM storage pools is about 19 GB/day)

The starting point is one (or more) active storage tiers containing 1000 GB of data. The first rule adopted (which should be included in the SLA data accessibility section) is that all data that has not been accessed in 30 days will be migrated to Tivoli Storage Manager storage pools using the HSM client. The immediate result (based on a TPC for Data analysis) is that 750 GB will be migrated from the active data tiers, freeing up 75 percent of the storage. A second step is to implement a TSM migration process from disk storage pools to tape storage pools that moves files that have resided at least two months in the disk storage pool (and as a result have not been accessed in 90 days). The result is that the Tivoli Storage Manager disk pool will contain 250 GB, and the tape pool 500 GB. The average load is calculated using the same TPC for Data report as the last access time analysis.

The above information applies to file systems accessed at a file level. For file systems containing database or e-mail type (structured or semi-structured) data, archiving should be under the control of the database application, in combination with a content management system. As the complexity of this surpasses the goal of this first ILM implementation level, it is discussed later in Chapter 9, Data lifecycle and content management solution on page 165.

7.2.2 Avoiding over allocation


After the non-business data residing on the storage infrastructure, over-allocated file systems and application space are the second big contributor to non-optimal usage of the available capacity. Typically this is caused by a lack of planning when assigning the initial file system space, or by overestimating the space required in order to avoid a subsequent out-of-space condition. This can be hard to fix, as older server systems may not allow you to easily change the file system space assignment. Use of volume managers (for example, LVM on AIX) makes this less of an issue, since they allow online expansion of file systems. Virtualization, for example, SAN Volume Controller (SVC), also allows you to extend allocated volumes online. This means that when a file system reaches an upper usage limit, it can easily be extended to match the additional space requirements, without impacting the attached systems' functionality.

Note: Online file system expansion is not supported on all operating systems. For details see:
http://www-03.ibm.com/servers/storage/software/virtualization/svc/index.html

Database space containers also tend to be over-allocated. In some cases, this is again due to a lack of good growth numbers. Using correct capacity management (as described in 7.1.1, Capacity management on page 121), this problem can be resolved for the future. To analyze the gravity of the current situation, TPC for Data can again be used, specifically the file system free space report (see Figure 7-23). This report provides an overview of the global over-allocation, as well as views per system or per file system.

Figure 7-23 File system unused space reporting

If the report shows an issue with over-allocation, you need a process to fix it. A first step is to create a capacity plan. This will provide you with a well-founded growth pattern, based on historical data. In addition, the SLA accessibility component will also provide you with expected growth patterns. Next, define a process to handle space allocation. The crucial thing here is to define how long it will take to allocate space in case of an expansion requirement, that is, what the elapsed time will be between the triggering of the process (manual or automatic) and the actual availability of the extra capacity for the host. When this information is known, the curve shown in Figure 7-24 can be used to define the level at which an alert should be posted, triggering the space allocation process.

Note: When defining the lead time to add space to a file system, a distinction should be made between allocations from the free capacity and allocations requiring the addition of raw capacity, as they have significantly different lead times.

Figure 7-24 Defining the space allocation trigger level (file system usage in percent plotted against time; the growth curve approaches 100 percent, and the allocation trigger is set early enough to leave the time required for space allocation before the file system fills)
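As a minimal sketch of how such a trigger level could be derived (the growth rate, lead time, and safety margin are assumptions that would come from the capacity plan and the allocation procedure):

def allocation_trigger_pct(capacity_gb, growth_gb_per_day, lead_time_days,
                           safety_margin_pct=5.0):
    """Usage percentage at which the space allocation alert should be raised.

    The file system must not reach 100 percent while extra space is being
    provisioned, so the trigger sits one lead time's growth (plus a margin)
    below full.
    """
    growth_during_lead = growth_gb_per_day * lead_time_days
    trigger = 100.0 * (capacity_gb - growth_during_lead) / capacity_gb
    return max(0.0, trigger - safety_margin_pct)

# Example: a 500 GB file system growing 2 GB/day, with 10 days to provision space
print(round(allocation_trigger_pct(500, 2, 10), 1))   # 91.0 -> alert at about 91% used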

Once the trigger level is defined, TPC for Data can be used to define exception reports that indicate that the defined maximum usage level has been reached. As well as file systems, databases can also use too much space. A complicating factor is that raw logical volumes are often used for database table spaces. If this is so, the file system reports will not provide any information, as these raw volumes are only accessible by the database management system itself. To scan the space usage of database systems, the database component of TPC for Data must be used. Figure 7-25 provides an example of a database free space report.

Figure 7-25 Database unused space report

Once the allocation issues have been identified, you want to fix them, to reduce the total amount of allocated but unused space. This will involve moving data, which may require downtime or application outages. Figure 7-26 on page 149 provides an overview of the actions, and how they interconnect, to fix and monitor space allocations.


Figure 7-26 Handling over-allocated file systems and databases (capacity management provides growth planning and service level management provides the allocation procedure lead time, which together define the allocation trigger; a usage-level report detects over-allocation; when the maximum usage level is reached, capacity is allocated, otherwise the over-allocation is fixed)

7.3 Tiered storage


A final part of the initial ILM implementation is the addition of storage tiers to your storage environment. As explained in Chapter 3, Implementing ILM on page 29, storage tiers enable you to distribute data based on the required service level. In effect, they enable the different SLA levels to be implemented at the appropriate cost point. To allow the creation of tiers, the capacity management and service level management processes should be in place. Capacity management is required to allow the sizing of the different tiers, and to allow the addition of capacity when required by growth. The service level management process is required to match the capabilities of the storage tiers to the actual user requirements as agreed upon in the SLA.

When designing the different storage tiers, it is likely that the number of different data classes (a data class is a collection of data that has the same service level requirements) is higher than the practical limit of storage tiers. Normally, two or three active storage tiers should be sufficient to fulfill most requirements. Figure 7-27 on page 150 provides an overview of how to do this.


Figure 7-27 Matching data classes to storage tiers (data classes Data1 through Data6 are plotted by increasing requirements against the increasing capabilities of tiers 1, 2, and 3)

Match each data class to the storage tier whose capability meets or exceeds its requirements. For some data classes (Data6 in this example), the requirements might be so high that the resulting infrastructure cost would be excessive. For this type of data, the definition of the data class might need to be revisited, and a mapping to a lower level storage tier might be necessary.

7.3.1 What storage devices to use


The storage tiers defined here are static, so that in normal conditions, data classes will not move to a different tier over time. This next stage in ILM evolution is covered in Chapter 9, Data lifecycle and content management solution on page 165. Clearly there is a wide choice of disk and tape technology that could be used to implement this kind of tiered storage. However, we suggest using the SVC for the higher tiered storage for the following reasons:

- The SVC allows easy integration and movement of existing storage volumes to the newly assigned storage tiers, without disrupting access to the data residing on these volumes. In addition, the disk migration functionality can also be used on the rare occasions where a data class needs to move from one tier to another. Note that the movement occurs at the logical volume level; files are not moved individually.
- The SVC provides additional functions, which can reduce the operational management complexity of having different types of storage subsystems for different tiers. These functions include:
  - Common copy functions (remote copy and Point-in-Time Copy) for all storage components
  - Common volume assignments
  - The possibility to extend logical volume sizes

As discussed in Stale data on page 141, Tivoli Storage Manager for Space Management can also be used to add two tiers for archival storage. Typically, archived data can effectively use tape as a medium, which is cheaper, since the total cost of ownership of tape is around 10 to 20 times lower than disk. Using Tivoli Storage Manager for Space Management for archive data also introduces automatic data movement.

This completes the first ILM implementation scenario. In the next chapters we expand on this initial scenario and add the automatic data movement functions.


Chapter 8. Enforcing data placement


In this chapter we discuss the second step in an ILM implementation process. After implementing service level management and other storage management related processes, defining an initial data classification and implementing storage tiers, as shown in Chapter 7, ILM initial implementation on page 119, this chapter will add to these initial definitions. The focus in this chapter is the expansion of the data classes, moving to an individual file placement level. In addition, we define a way to enforce data placement and automatically move files that do not comply with the rules.


8.1 Moving from the initial ILM scenario


With the initial ILM implementation explained in Chapter 7, ILM initial implementation on page 119, we related the data placement to the initial value of the data. This means that each data class is mapped to a certain storage tier (based on input received from the service level management process), and that these mappings remain static over time (see Figure 8-1). In addition, mapping was performed at the logical volume level, meaning that data class granularity was limited.

Figure 8-1 Initial static ILM implementation (service level management maps each data class statically to one of the storage tiers 1, 2, or 3)

As a second step, we maintain the static mapping over time, but we add a function that automatically places data as a function of its data class, thus enforcing the mapping of each data class to the correct storage tier (see Figure 8-2 on page 155). In addition, we will allow data classes to become more detailed, covering individual data types rather than entire applications or systems (which was the approach in Chapter 7, ILM initial implementation on page 119).


Figure 8-2 Adding automated data placement (a data placement enforcement function ensures each data class ends up on its assigned storage tier)

The initial ILM design shown in Chapter 7, ILM initial implementation on page 119, can be optimized in two ways:

- Add increased control over where data is placed. Using the initial implementation, if a DBA places a database on a file server, the only point of control is the SLA violation, which is triggered after the event. In this chapter, we show a way to fix this automatically.
- Increase the granularity of data class assignments. In the first design, an entire logical disk needed to be mapped to one tier. In this design, we will be able to create data classes that include individual file or database objects.

8.2 Requirements for data placement enforcement


When implementing the next step in ILM, we add additional requirements to the storage environment. From Chapter 7, ILM initial implementation on page 119, we retain the following three main requirements:

- Create and apply storage management policies, with a focus on the service level objectives definition.
- Define initial data classes, dividing the total data volumes into valid and invalid data.
- Create a tiered storage environment, using the SAN Volume Controller as a single front-end to simplify operations when using different back-end storage devices.

Now we need to add the following two functions:

- Add a function that places data according to the SLA.
- Define rules for data placement according to file types. These rules should be included in the SLAs.

Figure 8-3 on page 156 shows how this ties into the already existing ILM architecture, including the defined service level objectives (SLOs).


Figure 8-3 Adding file-based location rules (all data is split into valid and invalid data; valid data is classified as known (well defined) or unknown (undefined) using metadata such as application, function, file name or parts of it, location, date, and owner (UID/GID); rules then map the definable data classes to tiers 1 to 3, moving from system- or volume-wide SLOs to specific data SLOs; un-definable data should not exist and is classified as invalid)

8.2.1 Data classification


In 7.2.1, Reclaimable space on page 128, we discussed the fact that stored data can be split into two parts:

- The valid data, which has a business value
- The invalid data, consisting mainly of non-business data and duplicate data

We will start this discussion assuming both types of data are already identified, so we can focus on the valid data. The first distinction to make is the difference between known and unknown data. Known data is data that is assigned to or associated with an application. Unknown data is data of unknown origin. Following a principle adopted earlier (all valid data should be assigned to an SLA and, as a consequence, to an application), unknown data should not be part of the valid data pool, and can as a result be considered invalid. As we said, known data is assigned to or associated with an application. Remember the table we created, assigning certain file types to certain applications (see Table 8-1).
Table 8-1 Define which application uses which file type

              Application A   Application B   Conclusion
File type 1   Yes             No              Used by A
File type 2   No              No              Not used
File type 3   Yes             Yes             Used by A and B


This table should be our starting point, as it defines the known data. However, since our placement policies will have more detail than just the application or the system, we must expand the table, adding a clear definition of each file type per application. At this point, file identification and classification should be done using the file's metadata. Unfortunately, the metadata of a file is rather limited, making this a more difficult task. When defining the data belonging to each application, we should come up with a description for each type of file based on the file name (or parts of it), the location (or parts of it), the dates (creation, modification, or last access), the owning user or group (UNIX only), or a combination of these attributes.

As always, there is a chance that some of the data cannot be defined or identified by this method. In this case, two options exist:

- Review the data assignment to see if somehow the data can be transformed or moved to a location in which it becomes definable.
- Create a catch-all data class, which will be used for all the undefinable data. You will need to identify a storage tier to place it on. One approach could be that, since you do not know the explicit value of the data, you should place it on the highest tier, to be safe. Or you could choose a lower tier.

Once the data is defined, rules need to be created to assign each type of data to a certain data class and, as a result, to a certain tier. It is important here that the SLOs reflect these rules, and that they are agreed upon in an SLA. Table 8-2 shows an example of a data classification exercise based on three identification rules:

- The extension of the file
- The location of the file
- The file name or part of the file name (specification)
Table 8-2 Data classification based on application and file metadata

Application / file type     Extension                  Location (a, b)     Specification          Performance   Capacity    Availability
Office automation
  Word                      doc, dot                   homedirs\<uid>      All                    Low           Very high   Medium
  Excel                     xls, xla, xlt, csv         homedirs\<uid>      All                    Low           Very high   Medium
  PowerPoint                ppt, pot, pps              homedirs\<uid>      All                    Low           Very high   Medium
  Visio                     vsd, vss, vdx, vsx, vtx    homedirs\<uid>      All                    Low           Very high   Medium
  Text based                txt, rtf, ??               homedirs\<uid>      All                    Low           Very high   Medium
  Acrobat                   pdf, ps                    homedirs\<uid>      All                    Low           Very high   Low
  Imaging                   jpg, gif, bmp, tiff        homedirs\<uid>      All                    Low           Very high   Low
  Archives                  zip, rar                   homedirs\<uid>      All                    Low           Very high   Low
Intranet/Internet
  Multimedia                mp3, wmv                   homedirs\<uid>      All                    Low           Very high   Low
  Acrobat                   pdf                        homedirs\<uid>      All                    Low           Very high   Low
  HTML                      html, htm, xml             homedirs\<uid>      All                    Low           Medium      Low
E-mail
  Domino (mail databases)   nsf, ntf                   <uid>\notes\data    mail db (mail*.nsf)    High          High        High
  Domino (other databases)  nsf, ntf                   <uid>\notes\data    Other dbs              Low           Very high   Medium

a. Including all subdirectories starting from this path, except any directory with temp as the directory name. Temporary files should be placed on the lowest available storage tier.
b. If a file is in a directory called \homedirs\<uid>\application, do not move it.

Here is the basic process for constructing a table like this:

1. Define which IT functions will be considered. Examples here are file sharing (which could also be office automation), Internet, and e-mail.

2. Define the applications that are used in the function. These can be applications that offer the function, for example, Lotus Domino for e-mail, or applications that are used to handle the content that the function produces. For example, the Internet function might allow users to download multimedia files, which are then handled by the appropriate applications.

3. Identify the data assigned to the application. Note that we are looking at the actual data, and not the application files (executables, configuration) themselves. A first identifier can be the file extension. For example, a Microsoft Word document will typically have doc as the extension. Lotus Domino databases might have nsf or ntf as the extension. Sometimes, however, extensions are not so clear. For example, basic text documents might have txt or rtf extensions, but could have others as well. Take, for example, a read.me file. Where sometimes identification is straightforward, at other times it is not possible. This is the undefinable part of our data. Under normal conditions, this should not form a large proportion of the total data, since the fact that it does not use a well defined extension also makes it likely that the file type itself is not commonly used. If it does turn out to be a commonly used extension, the location (directory) or part of the file name can also be used as an identifier to classify the data further. To verify that undefined data does not form a significant proportion of the total data, use the TPC for Data file type analysis reports, listing space used by file type.

4. Identify the location or locations of the application data in the file system structure. Where this was less important initially, because file systems were typically handled and placed as one entity, it now becomes mandatory to perform this step for two reasons:
   - By defining the location, you will be able to place data accordingly.
   - The location defines the rules for data placement compliance. If a certain file type does not meet the location rule, it will be considered as invalidly placed and will need to be moved.
   When defining the location, we should remember why we are doing this classification: to place data (individual files) on a tiered storage device infrastructure. In a non-virtualized file system environment, files will end up on different drives (or file system mount points) based on their classification. So, it is not useful to define the location, for example, as D:\homedirs\<uid>\data, as the drive letter might change depending on the final tier placement. Instead, use a location relative to the file system name (D:\ in this case). So a correct location would be \homedirs\<userid>\data. Also, decide whether data may legitimately be located in subdirectories, maybe specifying which ones. For example, you could allow any data in any subdirectory of \homedirs\<uid>\ on a file server, except files located in temporary directories, indicated by a \temp directory.

5. Add any further specification for the data, including the file name. For example, a Domino server might contain mail databases and other databases (for example, a team database). A rule could be that mail databases have a higher requirement than team databases. This means that there should be a unique identifier for these mail databases, for example, all files starting with mail.

This completes the definition of the different data types based on their metadata. The second half of the table lays out the actual requirements for these different data types. In this example, we created requirement definitions based on three different characteristics: performance, capacity, and availability. For each of these requirements, a classification is made indicating the importance (ranging from low, medium, and high, to very high). There are many different ways to define the requirements of the different data types, as explained in Establishing a service level on page 124. For this example, though, we created a basic approach to establishing the SLOs, not including detailed definitions of the different requirements (what does it mean if someone says he requires high performance?). It is clear that in a real-life situation, these classifications should be mapped to actual numbers, so that interpretation is kept to a minimum and SLOs can be measured afterwards.

This completes the classification of the data and the definition of their requirements shown in Table 8-2 on page 157. Next, we need to define the different data classes that came out of the above analysis. Basically, this means that we need to record all the distinct combinations of the capacity, performance, and availability requirements. Table 8-3 shows the four different classes we would require based on the input.
Table 8-3 Data classes

Data class     Performance   Capacity    Availability
Data class 1   High          High        High
Data class 2   Low           Very high   Medium
Data class 3   Low           Very high   Low
Data class 4   Low           Medium      Low

Combining Table 8-2 on page 157 and Table 8-3 gives us the data classification table, Table 8-4.
Table 8-4 Application to data class mappings

Applications                 Data class 1   Data class 2   Data class 3   Data class 4
Office automation
  Word                                      X
  Excel                                     X
  PowerPoint                                X
  Visio                                     X
  Text based                                X
  Acrobat                                                  X
  Imaging                                                  X
  Archives                                                 X
Intranet/Internet
  Multimedia                                               X
  Acrobat                                                  X
  HTML                                                                    X
E-mail
  Domino - Mail databases    X
  Domino - non-mail                         X
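Purely as an illustration of how the identification rules of Table 8-2 and the data classes of Table 8-3 combine into Table 8-4, the following sketch classifies a file from its extension, location, and name. The rule entries are a reduced subset of Table 8-2, and the encoding (extension set, path fragment, optional file name prefix) is an assumption made for the example.

import os

# (extensions, required path fragment, file name prefix or None, data class)
RULES = [
    ({".nsf", ".ntf"}, r"\notes\data", "mail", "Data class 1"),   # Domino mail databases
    ({".nsf", ".ntf"}, r"\notes\data", None, "Data class 2"),     # other Domino databases
    ({".doc", ".xls", ".ppt", ".vsd", ".txt", ".rtf"}, r"\homedirs", None, "Data class 2"),
    ({".pdf", ".ps", ".jpg", ".gif", ".zip", ".rar", ".mp3", ".wmv"},
     r"\homedirs", None, "Data class 3"),
    ({".htm", ".html", ".xml"}, r"\homedirs", None, "Data class 4"),
]

def classify(path, default="Data class 3"):
    """Map a file to a data class based on its extension, location, and name."""
    name = os.path.basename(path).lower()
    ext = os.path.splitext(name)[1]
    for extensions, fragment, prefix, data_class in RULES:
        if ext in extensions and fragment.lower() in path.lower():
            if prefix is None or name.startswith(prefix):
                return data_class
    return default   # catch-all class for data that cannot be identified

print(classify(r"D:\user1\notes\data\mail01.nsf"))   # Data class 1
print(classify(r"D:\homedirs\user1\budget.xls"))     # Data class 2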

This completes the initial data classification. A final part of the data classification exercise is to map the data classes to the actual storage tiers. Assume the available tiers for online storage (non-archive) have been defined as shown in Table 8-5.
Table 8-5 Tier definitions

Storage tier   Performance   Capacity    Availability
Tier 1         Very high     High        Very high
Tier 2         Medium        Very high   Medium
Tier 3         Low           Very high   Low

If we compare the tiers (Table 8-5) to the data class requirements (Table 8-3 on page 159), we can see that we do not have exact matches. Also, the number of required data classes exceeds the number of available storage tiers. This means that we must define a best match for the different data classes. We can see that the requirements for data class 1 are higher than those available in tier 2. As a result, data class 1 is mapped to tier 1. Data class 2 has higher requirements than tier 3, but lower ones than tier 2. As a result, data class 2 can be mapped to tier 2. Data classes 3 and 4 will be mapped to tier 3.
Table 8-6 Mapping data classes to storage tiers

               Tier 1   Tier 2   Tier 3
Data class 1   X
Data class 2            X
Data class 3                     X
Data class 4                     X
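The best-match step itself can be expressed compactly. The sketch below ranks the requirement levels and picks the lowest tier whose capabilities meet or exceed every requirement of a data class; the numeric ranking and the fall-back to tier 1 for classes that no tier satisfies are assumptions about how to formalize the reasoning above.

LEVEL = {"low": 0, "medium": 1, "high": 2, "very high": 3}

TIERS = {          # capabilities (performance, capacity, availability), from Table 8-5
    "Tier 1": ("very high", "high", "very high"),
    "Tier 2": ("medium", "very high", "medium"),
    "Tier 3": ("low", "very high", "low"),
}

DATA_CLASSES = {   # requirements (performance, capacity, availability), from Table 8-3
    "Data class 1": ("high", "high", "high"),
    "Data class 2": ("low", "very high", "medium"),
    "Data class 3": ("low", "very high", "low"),
    "Data class 4": ("low", "medium", "low"),
}

def best_tier(requirements):
    """Lowest (cheapest) tier whose capabilities meet or exceed all requirements."""
    for tier in ("Tier 3", "Tier 2", "Tier 1"):        # cheapest tier first
        capabilities = TIERS[tier]
        if all(LEVEL[c] >= LEVEL[r] for c, r in zip(capabilities, requirements)):
            return tier
    return "Tier 1"   # no tier satisfies the class: fall back to the highest tier

for data_class, requirements in DATA_CLASSES.items():
    print(data_class, "->", best_tier(requirements))

Run against the values of Tables 8-3 and 8-5, this reproduces the mapping of Table 8-6.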

Figure 8-4 puts it all together.

Figure 8-4 Complete picture of data types to tiers mapping (through the service level agreement, the e-mail data types (mail databases, other databases), office automation data types (Word, Excel, PowerPoint, Visio, text, Acrobat, images, archives), and intranet data types (multimedia, Acrobat, HTML) are grouped into data classes 1 to 4 and mapped to tiers T1, T2, and T3)

8.2.2 Enforcing data placement


We have just explained how to map certain data types to storage tiers using the data classes. In this section we discuss defining rules that allow us to enforce the agreed data classes. We use TPC for Data to detect data placement violations and to create alerts that initiate the correct placement procedures. As a result, we must define our rules based on the view TPC for Data has of our data. TPC for Data works in a server-centric mode of operation, meaning it analyzes the data from the application or file server perspective (in contrast to analyzing files from a storage subsystem perspective). This means that the rules themselves must be defined from a server point of view. Continuing the example used so far, we look at the data from two servers:

- FileServer1, which hosts the Lotus Notes user databases
- FileServer2, which hosts user office automation and Internet files

Assume we will create the volume mappings shown in Figure 8-5 on page 162. Our two servers are mapped to the tiered storage, with server 1 using the D:\ drive for tier 1 and the E:\ drive for tier 2. Server 2 uses drive D:\ for tier 2 and drive E:\ for tier 3. From a user point of view, this means four mappings to the file servers are required, two for each server.

Figure 8-5 Server tiered volume mapping (client workstations reach FileServer1 and FileServer2 over the LAN through UNC paths such as \\server1\d$ and \\server1\e$; over the SAN, FileServer1 maps drive D: to tier 1 and drive E: to tier 2, while FileServer2 maps drive D: to tier 2 and drive E: to tier 3)

We can define rules using two techniques:

- Prescriptive rules, defining which files belong on a certain tier, with move scripts to move all data that does not belong there.
- Exception rules, defining which files do not belong on a certain tier, with move scripts to move them to another tier.

Table 8-7 provides an overview of the applicable rules.
Table 8-7 Rule definition

Rule 1: If the system name is FileServer1, move all files that do not correspond to mail*.nsf and are on the D: drive (tier 1) to the E: drive (tier 2).
Rule 2: If the system name is FileServer1, move all files that correspond to mail*.nsf and are on the E: drive to the D: drive.
Rule 3: If the system name is FileServer1, create a report of all files that do not end in nsf or ntf.
Rule 4: If the system name is FileServer2, move all files that end with mp3 or wmv and that are in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.
Rule 5: If the system name is FileServer2, move all files that end with pdf or ps and that are in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.
Rule 6: If the system name is FileServer2, move all files that end with jpg, gif, bmp, or tiff and that are in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.
Rule 7: If the system name is FileServer2, move all files that end with zip or rar and that are in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.
Rule 8: If the system name is FileServer2, move all files that end with htm, html, or xml and that are in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.
Rule 9: If the system name is FileServer2, move all files that are in a directory called tmp in the D:\homedirs directory (tier 2) to the E:\homedirs directory (tier 3), except if they are located in the application subfolder.
Rule 10: If the system name is FileServer2, create a report that lists all files that are in the D:\homedirs directory and not in an application subfolder, and that do not end in doc, dot, xls, xla, xlt, csv, ppt, pot, pps, vsd, vss, vdx, vsx, vtx, txt, or rtf.

Note that we have explicitly stated the server name to which each rule should apply. This avoids undesirable side effects of moving files on other servers. Being explicit in this way gives greater control over what gets moved, but also increases the number of rules required and the associated complexity.

The first three rules set the environment for FileServer1, which contains the Domino databases for the users. Rule 1 states that if a file is not a mail database, it should be moved to tier 2. Rule 2 actually moves mail databases to tier 1, as we want all mail databases to be on the same tier for performance reasons. The third rule creates a list of non-Domino files that are on this file server. As they should not be there, a report is created that lists violations of this rule.

Note that the actual data movement will be triggered by TPC for Data, but that the script required to execute the move needs to be written separately. TPC for Data provides the ability to launch a Visual Basic script (Windows) or a Perl script (UNIX). As the scripts execute on the system detecting the violation (in this case, FileServer1), the script can only move data between drives to which it has access. This means that rule 3 cannot be coded to move non-Domino files to FileServer2, as FileServer1 has no access to these disks.

Rules 4 through 8 provide the data placement for the home directories of the users, acting on the application classes defined before (see Table 8-2 on page 157). One addition is that the rules do not act on files that are located in the application subdirectory of the \homedirs\<uid> directory. This allows users to install application data that is linked to by an application; as applications can require that their data is kept together, this rule permits it. Rule 9 moves all temporary files to tier 3, while rule 10 is the global exception rule that lists (and does not move) all exceptions to the defined rules. If the number of files detected in this manner turns out to be too large, the rules (and service level) should be reviewed to include these files. Figure 8-6 on page 164 shows an overview of the above.
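TPC for Data only detects the violation and launches a script (Visual Basic on Windows, Perl on UNIX); the move logic that such a script would implement is sketched below in Python purely for illustration. The drive letters, the extension list, and the handling of the application and tmp subdirectories mirror rules 4 through 9 above, and nothing here represents an actual TPC for Data interface.

import os
import shutil

# Extensions that rules 4 through 8 demote from D: (tier 2) to E: (tier 3)
DEMOTE_EXTENSIONS = {".mp3", ".wmv", ".pdf", ".ps", ".jpg", ".gif", ".bmp",
                     ".tiff", ".zip", ".rar", ".htm", ".html", ".xml"}
SOURCE_ROOT = r"D:\homedirs"
TARGET_ROOT = r"E:\homedirs"

def violates_placement(path):
    """True if the file belongs on tier 3 (by type, or because it is in a tmp
    directory), unless it sits in a user's application subfolder."""
    parts = [part.lower() for part in path.split(os.sep)]
    if "application" in parts:
        return False                       # exception: leave application data alone
    extension = os.path.splitext(path)[1].lower()
    return extension in DEMOTE_EXTENSIONS or "tmp" in parts

def enforce():
    for dirpath, _dirs, files in os.walk(SOURCE_ROOT):
        for name in files:
            source = os.path.join(dirpath, name)
            if violates_placement(source):
                target = os.path.join(TARGET_ROOT, os.path.relpath(source, SOURCE_ROOT))
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.move(source, target)    # relocate the file to the tier 3 drive
                print("moved", source, "->", target)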


Figure 8-6 Enforcing data placement using TPC for Data (the data classes and the service level agreement yield enforcement rules R1 through R10 and constraints such as rule 4, all MP3 and WMV files must be on tier 3; a TPC file types report checks for constraint violations, and a violation raises an alert that runs a script to move the data and inform the user; drive letters are relative to the server containing the data, and data ends up where it should reside based on the agreed-upon service levels)

One of the potential issues with implementing these data placement enforcement rules is a reduced user satisfaction level. As the rules are enforced, user data will be moved to different file systems, changing the directory access paths. While this might seem radical initially, there are some ways to mitigate the risk of dissatisfied users:

- Communicate the service level to users, explaining to them that they should place their data in the appropriate location. Correct data placement is a way to allow correct usage of the available resources. If a user uses a tier that is too high for a certain type of data that does not require the given performance or availability, this reduces the resources available for data that does require it.
- Inform users of file movement. This can be done by adding e-mail messages in the scripts, informing a user that his data has been moved.
- Configure applications in such a way that they use the correct location as the default location.

This completes the discussion on how to implement data placement enforcement.


Chapter 9. Data lifecycle and content management solution


In this chapter we discuss the third and final step in an ILM implementation process. After implementing service level management and other storage management related processes, defining an initial data classification, implementing storage tiers, and enforcing data placement, we now continue and add the following to the implementation:

- Data management as a function of the lifecycle of the data
- Data management for database data and e-mail environments


9.1 Moving from the previous steps


In the final ILM implementation step we continue on the road we took with the initial implementation. This means that we look further at how to place data in the appropriate storage tier based on the service level agreement. However, we add two new topics:

- Add the time or lifecycle dimension to the placement policy.
- Look at structured and semi-structured data, comprising databases and mail systems.

These additions allow us to arrive at a fully integrated ILM solution for all types of data, in which placement reflects the value of the data. Figure 9-1 shows a summary of the placement of data as a function of its lifecycle. We still have the different data classes, but they are no longer bound to one storage tier. As the requirements of the data might vary according to the point in the life of the data, it can be assigned to different storage classes at different points in time (a data class is never assigned to more than one storage tier at any one point in time). This means that we need to add the lifecycle dimension to the service level agreement, and add a method to move the data between tiers whenever needed.

Figure 9-1 Adding the lifecycle dimension (service level management adds SLOs as a function of the lifecycle of the data, and an automatic data mover relocates the data classes between tiers 1, 2, and 3 over time)

In the following section we go into further detail on how to define a lifecycle and how to manage the data accordingly.

9.2 Placement as a function of the moment in the lifecycle


As explained in 1.2, Why ILM is needed on page 4, data value changes over time. This means that the placement approach explained in Chapter 7, ILM initial implementation on page 119, and Chapter 8, Enforcing data placement on page 153, provides us with a system that places data according to its initial value, or better, the value of the data as reflected in the service level agreement covering that data. While this is already a step forward towards a consistent placement of data, it does mean that we have old data mixed with newer data. Assuming that data loses its value over time, this means that we are actually storing data on a storage tier that surpasses the current value of the data.

In Chapter 7, ILM initial implementation on page 119, we discussed the possibility of using Tivoli Storage Manager for Space Management (HSM) solutions to move data to archival-type storage as a function of the time the data has not been used (accessed). The solution we present here is distinct from an HSM solution in that:

- HSM solutions typically move data off file systems, leaving a stub file (link) that automatically recalls the file when required. An archive solution moves the entire file, and the retrieve must be explicitly performed.
- HSM movement is based on date only, while other events might be important to indicate that the data value has changed. An archive solution can take these into account.

9.2.1 Determining the value of the data


Information or data value changes over time. This is a fact on which everyone seems to agree. The issue, however, is to understand how and why this happens. Figure 9-2 shows the results of a study performed by the Enterprise Storage Group, indicating the change in value of information over time. We discussed this earlier in 1.2, Why ILM is needed on page 4; however, we repeat the chart for clarification.

Figure 9-2 The changing value of data over time (data value plotted against time from 7 days to 10 years for database, development code, e-mail, productivity files, and MPEG data; source of graph: Enterprise Storage Group)

The data shown in the chart are averages across various industries and can differ strongly in particular environments. In addition, the time dimension (X-axis) can also be thought of as an event dimension. Let us take the example of a sales cycle. At some point, an invoice is issued to the client. From that point to the point where the invoice is actually paid, the information needs to be kept at hand. Now, imagine you have a rule stating that all invoices must be paid within 30 days. Does this mean that after 30 days the information can be considered less important? No. The actual event that triggers the change in the value of the information is the fact that the invoice has been paid, even if this happens before the 30-day period expires (or after it). So instead of defining the value of this particular information as a function of time, a more meaningful and accurate way would be to define the value as a function of a certain event (which itself can be the expiration of a certain period of time). The actual event, or change trigger, should be dictated by the business processes supported by the information and ultimately the data.


This means that a first step in analyzing information as a function of its position in its life cycle is to draw up the business processes, indicating the link to the information and data used (a business process needs and creates information, which is made up of data). Figure 9-3 shows an overview of the relationship between the business process and the data.

Figure 9-3 Business process to data mapping (business process cycles take information as input and produce information as output; that information is in turn made up of data)

Once the links between the different process cycles and the data have been made, the value of the data in each business cycle must be defined. Note that one piece of data is typically used throughout the entire process, meaning the change trigger is typically associated with the movement from one cycle to another. Next, the value of the data for a business process should be defined at each stage or cycle. Determining the value of the data is not easy, and requires knowledge of the business. The value of data can be defined by the potential loss if the data is unavailable. The availability of data can be defined using the following five states:

- Data is available with low response times (guaranteed performance). This is the state that will typically be required for interactive transaction-based processes. For example, a person working in a financial environment should be able to enter transactions without noticeable delays.
- Data is available with higher response times (guaranteed availability and capacity). This state is normally acceptable for asynchronous-type processes, where the operation handling the data takes much longer than accessing it. A good example is e-mail. A delay of a second in sending an e-mail is not noticeable, and should not have a negative impact on business.
- Data is available on archival storage, and access can be delayed in the order of minutes. This is the first archival state, typically achieved by using HSM solutions. The time to access data includes the time to retrieve the data from the Tivoli Storage Manager disk or tape storage pools. This is normally sufficient for parts of the business process that involve handling old data. For example, a yearly inventory check could be allowed to wait for data.
- Data is available on archival media, and access can be delayed for hours. In this case, the data is available on tape media (or similar), but not readily available. As a result, the recall of the data involves fetching and handling offline or off-site tape media. Normally, this is only usable for process parts that occur only on very rare occasions. An example is an audit where there is lead time available to retrieve data.
- Data is unavailable. In this case, data is no longer available. This state is normally applicable when either of the following is true:
  - Data is no longer required.
  - Data can be recreated from other sources, and the expected frequency of use is very low.

A second step in determining the value is to create a loss versus cost analysis for each process cycle and its associated data. The loss, as explained above, is the potential loss in benefit if the data is located on a storage class that is not the ideal one. The ideal storage class for each data type is the one where the loss equals zero. The cost is the actual storage cost for each storage class. It is evident that a high performance disk solution will cost more per GB stored than a low cost tape solution. Table 9-1 shows an example of how to do this.
Table 9-1 Loss and cost

Process cycle     Metric         Storage class 1   Storage class 2   Storage class 3
Process cycle 1   Loss           0                 2                 15
                  Storage cost   10                7                 5
Process cycle 2   Loss           N/A               0                 15
                  Storage cost   N/A               7                 5
Process cycle 3   Loss           N/A               N/A               0
                  Storage cost   N/A               N/A               5
For each process cycle, first determine the ideal class. As stated, this is the one where the loss is 0. For example, process cycle 1 requires storage class 1, process cycle 2 requires storage class 2, and process cycle 3 requires storage class 3. Next, the loss is noted for each storage class that is not the ideal one. For example, process cycle 1 would have a loss of 15 units if it were placed on storage class 3. The loss can arise, for example, because the duration of the transaction dictates the cost of the transaction.

The second part of the table is the cost. For each process cycle, we determined the data attached to it, and as a result the capacity required for it. Each storage class or tier has an associated cost per capacity. As a result, it is easy to determine how much it will cost to store the data from a process cycle on a certain storage class.

Next, it is time to weigh cost against loss. Each solution will cost a certain amount. The total cost of each storage class is the sum of the storage cost and the loss. Figure 9-4 on page 170 shows this for one business process.


Figure 9-4 Cost versus benefit for storage placement (for each of the storage classes 1, 2, and 3, the total cost is the cost of storing the data on that storage class plus the potential loss if the data is not on the correct storage class; the loss is zero for the ideal storage class)

The cost for storage class 1 is equal to the cost of storage. As this is the ideal storage class, the loss of placing data in that class is zero. The cost for the other two storage classes is the sum of their storage cost and the loss. As you can see, it could be the case that the total cost for a lower storage class is actually lower than for the ideal solution (from a requirements point of view). In that case, there is no real reason not to place the data on the second storage class. The one thing that should be kept in mind is that not meeting service levels can also have negative impacts other than financial ones, such as reduced user satisfaction or loss of image. As a result, make sure the impact of the lower service level on these other factors is well known and documented. Table 9-2 shows the above conclusion in numbers.
Table 9-2 Total solution cost

Process cycle     Metric                             Storage class 1   Storage class 2   Storage class 3
Process cycle 1   Loss                               0                 2                 15
                  Storage cost                       10                7                 5
                  Total cost (loss + storage cost)   10                9                 20
                  Benefit to ideal solution          0                 1                 -10
Process cycle 2   Loss                               N/A               0                 15
                  Storage cost                       N/A               7                 5
                  Total cost (loss + storage cost)   N/A               7                 20
                  Benefit to ideal solution          N/A               0                 -13
Process cycle 3   Loss                               N/A               N/A               0
                  Storage cost                       N/A               N/A               5
                  Total cost (loss + storage cost)   N/A               N/A               5
                  Benefit to ideal solution          N/A               N/A               0
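The arithmetic of Table 9-2 can be summarized in a few lines of Python; the loss and storage cost figures are simply the example units of Table 9-1 for process cycle 1.

# Process cycle 1 example figures (units as in Table 9-1): class -> (loss, storage cost)
CYCLE_1 = {
    "Storage class 1": (0, 10),    # the ideal class, so its loss is zero
    "Storage class 2": (2, 7),
    "Storage class 3": (15, 5),
}

def total_costs(cycle):
    """Total cost per storage class and the benefit relative to the ideal class."""
    totals = {cls: loss + cost for cls, (loss, cost) in cycle.items()}
    ideal_total = min(total for cls, total in totals.items() if cycle[cls][0] == 0)
    return {cls: (total, ideal_total - total) for cls, total in totals.items()}

for cls, (total, benefit) in total_costs(CYCLE_1).items():
    print(cls, "total cost:", total, "benefit versus ideal:", benefit)
# Storage class 2 comes out cheapest overall (total 9, benefit +1), as noted above.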

This concludes the discussion on determining the value and cost of the storage solution, based on the position of process data in the process lifecycle. In the next topic, we look at how we can perform the placement of this data as a function of the business process.

9.2.2 Placement of data


Once the value or use of the data is defined, a next step would be to define a trigger that indicates when the data changes from one state to another. From a business point of view, this might be rather easy. For example, the fact that an invoice has been paid provides an easy and logical indication that the data associated with the order can change state, and that the process can move along. The problem, however, is linking the event in the business process to a state change notification for the data. In order to understand the difficulties with this process, we must first make the distinction between the different types of data from a management perspective. A common way to describe the differences is as follows:

- Structured data, which indicates data that is organized in a specific way, and where content has a limited level of freedom. In most cases, structured data is the data stored in the tables of a database. The advantage of structured data is that it is quite easy to create data classes based on the information stored, as the information itself can be used. For example, a database record could have a field indicating that the information has been handled and can be archived.
- Unstructured data. Unstructured data is characterized by the fact that its content and format are totally free, without any rules. Typically, these are the normal files we find in our file systems. The main issue with unstructured data is defining rules to identify and manage (or classify) it, as the information available is limited to the standard metadata. Content-based solutions, which check which information is stored in the file, tend to be very difficult to apply. Imagine that you need to figure out, based on the content of a Microsoft Word document, whether the file is a business document. Short of a person manually scanning the document (which would be cost-prohibitive), another approach could be to search its content for a positive list of business terms. But this list would have to be so large that it would basically filter nothing. In addition, problems like language, spelling errors, and others complicate this process. One way around this would be the use of template documents, making the documents lean toward the semi-structured approach.
- Semi-structured data is data that lies between structured and unstructured data. Content is still mainly free format, but a lot of other information is fixed and needs to comply with certain rules. For example, e-mail messages can be considered semi-structured. The content of an e-mail message is free; however, fields like sender, destination, subject, urgency, confidentiality, and others are well defined. As a result, we can create more intricate rules for managing these types of data, as we have more information available.

For structured data (databases), the trigger might be a field in the database indicating a change in state. However, for unstructured data and semi-structured data, no metadata exists that indicates this change of state. As a result, our approach of moving data based on the metadata, as described in Chapter 8, Enforcing data placement on page 153, cannot be used as such for event-driven data placement. This means that an additional layer must be added to describe the document's state inside the business process, which adds information to the existing document metadata. Such products are commonly known as document management solutions. A document management solution typically keeps its information in a database, describing the information in a way that is in line with the business process using the document. As a result, the document management system can be used to gather information about which storage medium the document should be on.

If document management systems are not available, it might be worthwhile to try to approximate the event changes by using the available metadata. One way to do this is to link the file's location or name to its state. For example, a user moving all his project-related documentation to an archive folder might be such an indicator. As you can understand, this would mean that the responsibility for placing the data is left to the user, and, more importantly, that there is no way to identify these files if they are not placed correctly. As a result, this way of working might prove to be inefficient and unreliable for providing correct data placement.

Another indicator might be the date values of a file. As explained earlier, the metadata of a file contains three dates:

- Creation date
- Last write or modification date
- Last access date

The creation date can typically not be used to describe state changes of a document, as it is fixed and only indicates when a file was created. The remaining two dates do provide us with some information that indicates the use of the file. The last modification date reflects when a file was last opened for update, and the last access date reflects when a file was last opened for read. This means that these two dates provide us with the required information about when a file was last used, and what action was performed on it. The date stamp that will most likely provide the required information is the last access date. Deletion of data can be done by both solutions. As explained above, using this last access date as the sole indicator of the file's position in the business process would be incorrect. On the other hand, business events that trigger file state changes can typically be defined in terms of the period during which the data has not been used. Let us explain using the invoice example shown in Figure 9-5 on page 173.


Figure 9-5 Process example (process steps: Issue invoice, Wait for payment, Reissue invoice, Archive; events: Invoice sent, 30 days elapsed, Invoice resent, Invoice paid; data actions: Create, Access, and Move on the files Invoice.doc and Reminder.doc; the original invoice is used to create the reminder)

Figure 9-5 shows four distinct items:

The process step, for example, the creation of the invoice.
The event that moves the process from one step to the next one.
The data action, which can be summarized using the following five actions: create, access, modify, delete, and move. In the ILM context, we are especially interested in any operation that is not done by the process itself, which means the move and delete operations.
The data, which in this example is simplified to a single file object.

When the invoice is issued, a document is created (invoice.doc in this example). Next, the process goes into a wait state, which can take a maximum of 30 days, as this is the period of time in which invoices are due for payment. When the wait period has elapsed, or if the payment is received, the next step begins. If the payment is not received within this 30-day period, the invoice is reissued and a reminder is sent. As a result, the original invoice is opened to provide input for the reminder invoice (reminder.doc) and another 30-day wait period begins. If the payment is received within this 30-day period, the invoice cycle is complete and the documents attached to it can be archived. Archiving consists of moving the documents to the next storage tier (online or archive type).

The goal of this example is to find a way to define the different data actions as a function of the events occurring. Table 9-3 on page 174 provides an example of how to do this, listing every action that occurs on each part of the data.


Table 9-3 Data actions as a function of event

Invoice.doc:
  Invoice sent: Create document (inherent to process step).
  Invoice paid: Move document.
  30 days elapsed: Open document (inherent to process step).
  Invoice resent: No action.

Reminder.doc:
  Invoice sent: Document does not exist.
  Invoice paid: Move document (if it exists).
  30 days elapsed: Create document (inherent to process step).
  Invoice resent: No action.

Note: The above table describes the data portion in terms of actual file objects, using the name of the file. In real life, the data should be described in more general ways, like the path name, part of the file name, the extension, or the owner. This means that instead of individual file names, file or data classes should be used.

Once the above is done, the next step is to find a way to describe the data move or deletion actions as a function of the time elapsed since one of the dates describing these files (creation, access, modification). The easiest way to do this is to consider which events are time driven. In this example, the only time-driven event is the passing of 30 days without receiving payment. Looking at Table 9-3, we can see that this event involves both files used in the process. As a result, we can conclude that if a file has not been accessed in 30 days, the event indicating that the payment has been received has occurred, and therefore that the files may be moved to archive if they have not been accessed in more than 30 days.

Note that we are approximating the events in terms of the metadata of a file. As a result, a move based on the last access date may not always correctly reflect the actual business process. For example, if the invoice is paid after, say, 5 days, the data will still reside on the initial storage tier for 25 extra days (30 - 5 = 25) until the 30-day counter has elapsed. A second example would be the possibility that more than one time-driven event occurs in our process, each using a different delay. This would be the case if we modified our example and stated that a reminder of an invoice must be paid within a 10-day period (rather than a 30-day period). The most correct solution (from a process point of view) would then be to take the largest delay, reducing the risk of not having the data at the correct location at the correct point in the process. The downside is that data might remain longer than required on a higher-class storage tier.
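As a rough illustration of approximating such a time-driven event with file metadata, the sketch below flags files that have not been read for the defined period. The directory, the data-class rule, and the 30-day threshold are assumptions for this example; in practice, the equivalent rule would normally be expressed as a TPC for Data profile or constraint rather than a stand-alone script.

```python
import os
import time

# Hypothetical data class: invoice documents in one directory, identified by location.
INVOICE_DIR = "/data/invoices"          # assumed path, for illustration only
DAYS_WITHOUT_ACCESS = 30                # approximates the "invoice paid" event

now = time.time()
for name in os.listdir(INVOICE_DIR):
    path = os.path.join(INVOICE_DIR, name)
    if not os.path.isfile(path):
        continue
    st = os.stat(path)
    days_idle = (now - st.st_atime) / 86400   # st_atime is the last access date
    if days_idle > DAYS_WITHOUT_ACCESS:
        print(f"candidate for archive tier: {path} (idle {days_idle:.0f} days)")
```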

9.2.3 Movement of data


With the above defined, the next question is how to actually perform the move and delete operations defined by the process events. We offer two possibilities:

TPC for Data with attached scripts
Tivoli Storage Manager for Space Management clients, including the Windows HSM client

Both solutions can be used to move data from one location to another; however, their use depends on what type of move you are performing. When the purpose is to move files from one file system to another (attached to another storage tier), TPC for Data is the most appropriate solution. When moving the data to archival-type storage, the space management solution is the better choice. The actual selection can be done based on the possible use states of the data, as we explained above. Table 9-4 on page 175 shows the selection process.


Table 9-4 Best product to move data as a function of data use (action to each state from any higher one)

Data is accessed and updated on limited occasions and needs to be readily available: Move (TPC for Data).
Data is dormant for a defined period of time, after which it will be accessed: Move (space management).
Data needs to be available, but chances of reuse are very low: Move (space management).
Data is no longer required: Delete (TPC for Data or space management).

Deletion of data can be done by both solutions.

The TPC for Data movement process is the same as the one described in 8.2.2, Enforcing data placement on page 161. An exception report is generated, listing the files that have not been accessed or modified in a certain time period and that match the specification of the data class to which they adhere. The actual data movement is then done by running a script (automatically triggered by TPC for Data) against the list of files provided. For the space management solution, files are migrated to the IBM Tivoli Storage Manager server. This function is described in Stale data on page 141.
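As an illustration of the kind of script that TPC for Data could trigger, the following sketch reads a list of file names (as an exception report could provide it) and moves them to a file system on a lower storage tier. The report format, path names, and target mount point are assumptions for this example, not product behavior.

```python
import os
import shutil
import sys

# Assumed inputs: a plain-text file list (one path per line) produced by the
# exception report, and the mount point of the lower storage tier.
FILE_LIST = sys.argv[1]                  # e.g. a list such as stale_files.txt
TARGET_ROOT = "/mnt/tier2"               # hypothetical tier 2 file system

with open(FILE_LIST) as f:
    for line in f:
        src = line.strip()
        if not src or not os.path.isfile(src):
            continue
        # Rebuild the original directory structure under the target tier.
        dest = os.path.join(TARGET_ROOT, src.lstrip(os.sep))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.move(src, dest)
        print(f"moved {src} -> {dest}")
```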

9.2.4 Using document management systems


Document management systems add a layer on top of a normal document, allowing better management as a function of its lifecycle. IBM Lotus Domino Document Manager provides such a layer. Document Manager is a document management system running on a Lotus Domino server, built on the storage metaphor of file rooms, file cabinets, binders (folders), and documents. The Document Manager solution is built from the following components:

The library
The library is the entry point into Document Manager. It is the main view from which users navigate the storage system to access documents. You can have more than one library per Document Manager server. Each library is a separate storage hierarchy with separate access control. While you cannot share documents between libraries, you can move documents and file cabinets from one library to another.

File rooms
The file room provides a way to logically categorize file cabinets to facilitate navigation. All file cabinets are contained in a file room. When creating a new file cabinet, the user can add it to an existing file room or create a new file room.

File cabinets
Document Manager uses file cabinets to organize and manage binders and documents. File cabinets consist of Notes databases (.nsf files) that reside on the Domino server. Because file cabinets are .nsf files, the Notes forms needed for entering information (metadata) into a document are contained in the file cabinet along with the document content. The views for accessing the information and the application logic that automates the processes related to the document are also included.

Binders


The Document Manager binder is a container within a file cabinet that is used to group documents logically. Each binder has attributes that facilitate organization and retrieval. System-generated attributes associated with every binder include the title, type, author, creation date, modification date, and number of documents. User-defined attributes can also be applied to every binder within the file cabinet, regardless of binder type. Binders can also be grouped in categories to facilitate organization of large collections of documents.

Documents
A document in Document Manager is the information that is being managed. It can be a data file like a word-processed document or a spreadsheet, an OLE object, or a Notes document. It is given a descriptive title and saved in a binder within a file cabinet. Each document has attributes, or metadata, that facilitate document organization and retrieval, generally describing the piece of information that is saved in the document repository. System-generated attributes are associated with every document and may include, for example, the document author, creation date, date of last modification, or document title. Application attributes are specific to the individual application and may include, for example, the project name, document type, or proposal number. These attributes can be configured. Access to document content and attributes is limited to authorized managers, editors, and readers. The check-in/check-out feature of Document Manager ensures that only one user can modify a document at a time. When the document is checked out, it is locked in Document Manager. When it is checked back in, it can be checked in as a new draft, a new version, or an update.

Document Manager adds certain ILM functions to the storage and management of documents. These include:

Automation of document versioning
For example, if a user checks in a newer version or draft of a document, its version number can be increased automatically. Other possibilities include the ability to manually set a version, or to automatically overwrite the previous version, keeping only the most recent one.

Document review and approval options
Collaborative documents are often formally reviewed by a group and finally approved for publication. Any current draft document can be submitted for review by any draft editor, who then checks out the latest draft, opens the working copy, and sets up and initiates a review cycle. When the review cycle is complete, the approval cycle can be set up and initiated. When you create a document type, you can specify default review parameters for documents of this type (including parameters like when the documents should be reviewed or approved), identify the reviewers and approvers and their roles in the review and approval process (editor, reviewer, approver, manager), and specify the type of review or approval (serial or parallel).

Document archiving and retrieval options
When a document has progressed through its lifecycle and is no longer needed for regular and instant access, you can archive the document's content to an external storage facility from where it can easily be recalled. Archiving large, out-of-date documents frees space for current Document Manager documents. A document can be archived based on criteria you specify in the document type form, or it can be manually marked for archiving. A proxy document that contains the profile and security information is retained in the file cabinet so that the document can be retrieved from the archive.

The following periodic background agents are used for the archiving and retrieval processes:

Mark for Archive identifies documents that are ready for archiving based on the archive triggers defined on the document type form. If a current version of a document is being archived, the agent locks the document so that it cannot be checked out for editing before being archived.
Archive to File System extracts a file attachment from the document, stores it in the external file system, and creates a proxy document.
Retrieve from File System uses the information in the proxy document to retrieve the file attachment and restore it to the Document Manager document.

The Archive to Tivoli Storage Manager add-in provides the capability to archive and retrieve Document Manager data on a Tivoli Storage Manager server. The add-in operates as a task that runs daily. It requires that a Tivoli Storage Manager client be available on the Document Manager server. When a document is marked for IBM Tivoli Storage Manager retrieval, the add-in task attempts to retrieve any archived rich text and/or file attachments from the appropriate Tivoli Storage Manager server, as specified by the corresponding document type.

This completes the discussion of document management and file management in the lifecycle dimension. In the next topic, we discuss e-mail management.

9.3 E-mail management


Until now, our main scope was file systems containing normal file objects. In this section, we start looking at application data, and in particular e-mail messages. As discussed earlier, e-mail messages are considered semi-structured data, making them suitable for more intricate management policies. One complication, however, is that e-mail messages are not file-level objects, meaning they must be managed through the e-mail application together with an additional content management tool. In the following sections, we look at how to do this, using the same approach as the one we used until now:

Reclaim invalid space.
Establish policies.
Automate data placement.

9.3.1 Reclaim invalid space


As with normal file systems, a starting point in the implementation of ILM practices in e-mail environments is the reclamation of space that is currently used by information that serves no business purpose, or that contains data that is no longer considered usable. As with normal file systems, e-mail environments tend to be a breeding ground for invalid data. Depending on the policies in place, the space used by invalid data can be even greater than in normal file system environments. Think about the ease of distributing documents, business related or not, to large numbers of users.

When creating policies for e-mail environments, one of the key points to embrace is collaboration. Actually, this is what e-mail messaging is all about: allowing employees and clients to share information, improving collaboration. But, strangely enough, the use of e-mail attachments can tend to lessen the efficiency of collaboration by creating separate, and not necessarily identical, versions of information. Imagine an e-mail message sent with a document, requesting a review of the information. In most cases, all receivers of the document will read, correct, and resend the document. This means that at that point, n different versions of the document exist, making the information more difficult to manage and increasing the storage used. Figure 9-6 shows an example of what could happen. Imagine a person sends an e-mail to five colleagues, requesting feedback. If all five answer with an updated version and reply to everyone in the initial addressee list, you have 25 copies of the original message after the first resend. If a document was attached, you need 25 times the space for storing one piece of information.

Figure 9-6 E-mail propagation

The above clearly shows the risk involved in using e-mail as the sole tool for information sharing. It can be even worse if any of the addressees feel the need to copy additional people. A way around this is to create a shared store in which information can be shared in a more intelligent and efficient way. Team rooms, intranet Web sites, forums, and blogs are such solutions. Instead of spreading out the information, you are actually consolidating it. The advantage of working this way is twofold:

Less space is required.
Information is consolidated, making sure that everybody can access all information regarding a topic, and that no multiple different versions of documents exist.

While the above is not really part of the scope of an ILM solution, it should be an initial step towards storage optimization. Remember that reducing the amount of storage only makes further steps easier. What should be done from an e-mail management point of view is the implementation of two rules that indirectly discourage people from using the e-mail system as document storage: limiting mailbox sizes and limiting attachment sizes.

TPC for Data can provide you with a general view of how much space is used by the mail system, based on a file type report. For Lotus Domino environments, you can obtain usage numbers for individual mail boxes (as each mail box has its own Domino mail database file). For MS Exchange environments it depends on the configuration, but in most cases you will only be able to see the global file size used by the mail database stores.


9.3.2 E-mail archiving


E-mail archiving can be used to act upon the stale or old data in e-mail environments. This reduces the size of the active mail databases, improving performance and manageability. Figure 9-7 shows a diagram of a basic archiving system, where a content management solution provides an interface between the e-mail application and a storage manager, including the archival storage used.

Figure 9-7 E-mail archiving diagram (e-mail application, content manager, storage manager, and storage devices)

Today, many organizations have implemented a non-central archiving solution, in which the user initiates the archive task, saving older messages to a so-called archive mail database or file. The disadvantage of working in this way is that the archive is difficult to manage. One possibility is that the user saves this file on his local workstation disk drive; as workstations are often not backed up, this means that there is a serious risk of file loss. A second possibility is to store this archive on centrally managed storage. The problem there is that backup operations always back up the complete file, even if only a small part has changed. As these files tend to grow to a large size, this means a considerable backup overhead.

As with normal file archiving, a first step in the process is to define the rules for archiving e-mail messages. IBM provides two solutions for e-mail archiving: CommonStore for Lotus Domino and CommonStore for MS Exchange. The following sections explain how the products work and what they bring to the e-mail archiving task.

CommonStore for Lotus Domino


CommonStore for Lotus Domino moves documents or folders from the Lotus Notes databases to an archive location, from where they can be retrieved when required. The archiving can be done manually or by using an automated policy-driven solution. The policy archives documents based on the following criteria:

The age of the object. The age can be determined based on the creation date or the last modification date. In addition, you can specify whether to count the age between one of these dates and the current date, or a specified date. For example, a file created on 1/1/2005 could be compared against 31/1/2005 and show an age of 30 days.
The size of the object, allowing documents larger than a certain size to be archived. In addition, this rule can be combined with a rule to apply it only in databases that are larger than a certain size.
Any object that matches a Lotus Notes formula.


If the archive process finds an object that matches the policy, it performs an archive action. The archiving can be done at different levels, which include:

Archiving of the document attachments in Notes documents
Archiving of entire Notes documents, including the attachments and information
Archiving in other formats, including XML, ASCII, or RTF

When CommonStore sends the archived object, it can send it to any of the following three destinations:

DB2 Content Manager
DB2 Content Manager OnDemand
Tivoli Storage Manager

After archiving the object, there are different possibilities for what to do with the original documents:

Leave them untouched (only useful as a backup solution).
Keep a pointer to the archived document (transparent to users).
Delete all information (requiring an external archive search and retrieval tool).

As stated above, CommonStore can work with two Content Manager solutions. There are a couple of advantages to doing this, rather than archiving directly to Tivoli Storage Manager. One advantage is that the content management database stores the metadata information concerning the archived object. As a result, searches and queries can be performed even if the stub file is no longer available in the original Lotus Domino database. Both applications (DB2 Content Manager and DB2 Content Manager OnDemand) are capable of ultimately sending the stored files to Tivoli Storage Manager, so you are adding one layer between CommonStore and Tivoli Storage Manager.

The differences between DB2 Content Manager and DB2 Content Manager OnDemand are at the functional and flexibility level. DB2 Content Manager is a complete product, which allows advanced interaction using the available APIs. As a result, DB2 Content Manager can act as middleware for more complex, user-written applications. One advantage that DB2 Content Manager in combination with Lotus Domino has over DB2 Content Manager OnDemand is a feature called single-instance store. This means that if one person sends an e-mail with a PDF attachment to 50 people, the archiving system is clever enough to save just one copy of the PDF file and link the 50 archived e-mails to this one physical file, saving a large amount of storage space. The advantage of DB2 Content Manager OnDemand over DB2 Content Manager in an archiving environment with Tivoli Storage Manager is that it performs aggregation and compression of the archived objects. This means that it does not send single mail objects to the IBM Tivoli Storage Manager server, but aggregated, larger files. As a result, the performance for storing and retrieving them is higher (especially when they are located on tape drives), and the impact on the IBM Tivoli Storage Manager server's database is smaller.

CommonStore for MS Exchange


CommonStore for MS Exchange is very similar to CommonStore for Domino in terms of functionality. The main differences lie in the definition of the policies, as they reflect the mail application structure. For example, CommonStore for MS Exchange allows archiving based on the size of user mailboxes, rather than the database size. The available archiving destinations are the same, and include DB2 Content Manager and DB2 Content Manager OnDemand, as well as a direct interface to Tivoli Storage Manager.


9.4 IBM System Storage Archive Manager


The IBM System Storage Archive Manager product is the new, renamed version of the Tivoli Storage Manager for Data Retention product. IBM System Storage Archive Manager is an archiving solution, based on Tivoli Storage Manager, with event-driven archive management. In a standard IBM Tivoli Storage Manager solution, archives are deleted automatically when a certain retention period has elapsed. The retention period is defined as the number of days the archive is stored on the IBM Tivoli Storage Manager server. With IBM System Storage Archive Manager, the retention period is controlled by the archive client sending the data, through the client API component. This means that each archive can have a different retention policy, based on the policy set in the application sending the data to the IBM System Storage Archive Manager server. There are two possibilities for controlling when data will be deleted or expired:

Chronological archive retention
Event-based archive retention

IBM System Storage Archive Manager controls archive retention using three parameters: RETVER, RETINIT, and RETMIN.

The retain version value (RETVER) within the archive copy group specifies the number of days to retain each archive object. This parameter has always been available within the IBM Tivoli Storage Manager archive copy groups.

IBM System Storage Archive Manager introduces the RETINIT parameter, which specifies when the time specified by the retain version (RETVER=n days) attribute is initiated. The possible values for this parameter are creation or event, which control whether the data follows chronological retention rules or event-based retention rules. By setting this parameter to creation (RETINIT=creation) in the archive copy group, you specify that the retention time specified by the RETVER attribute (RETVER=n days) is initiated at the time an archive copy is stored on the server. This is referred to as chronological archive retention. By setting this parameter to event (RETINIT=event) in the archive copy group, you specify that the retention time (RETVER=n days) for the archived data is initiated by an application that utilizes API function calls. If the application never initiates the retention, the data is retained indefinitely. This method of archive retention is referred to as event-based archive retention.

The RETMIN value indicates the minimum number of days an archive needs to be kept, regardless of the value of the RETVER parameter. This parameter was also introduced with IBM System Storage Archive Manager.

9.4.1 Chronological archive retention


Figure 9-8 shows how a chronological retention policy works.

Figure 9-8 Standard IBM System Storage Archive Manager archive retention (the application stores an object on ISSAM at day 0 with RETINIT=creation; after the retention period RETVER=365 elapses, ISSAM deletes the object at day 365)


With RETINIT=creation and RETVER=365 days, a file that is archived on day 0 is retained for 365 days and then becomes eligible for expiration. In this case, 365 days after the data was created, all references to that data are deleted from the database.

9.4.2 Event-based retention policy


In certain situations, data retention periods cannot be easily defined, or they depend on events taking place after the data is archived. To address this problem, archives can now be managed based on the occurrence of an event. This means that the retention counter only starts when an event occurs. To do this, the RETINIT parameter must be set to EVENT. In order to be able to maintain the archival storage, however, a second parameter was added, called RETMIN. This parameter controls the minimum retention period of the archive from the time of archiving. Figure 9-9 shows an event-driven archiving mechanism.

Figure 9-9 Event-driven archiving mechanism - honoring RETVER (the application stores an object on ISSAM at day 0; at day x the application sends the expiration start event (RETINIT=EVENT); with RETVER=365 and RETMIN=730, ISSAM deletes the object at day x+365)

In the above example, the archived data is retained for a minimum of 730 days (RETMIN=730). When the retention time (RETVER) is activated through an event, IBM System Storage Archive Manager assigns an expiration date to the object, which is the date of the event (day x) plus the RETVER value (365 days). As a result, the object expires on day x+365. If the expiration event occurs but the RETVER timer ends before the RETMIN value has expired, the file is kept until the RETMIN value expires (see Figure 9-10 on page 183).


Figure 9-10 Event-driven archiving mechanism - honoring RETMIN - case 2 (the application stores an object on ISSAM at day 0 and sends the expiration start event at day x; because x+365 is earlier than day 730, ISSAM deletes the object only at day 730, when RETMIN=730 expires)
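To make the interplay of RETVER, RETINIT, and RETMIN concrete, the following small calculation (an illustration only, not an API call) reproduces the day numbers shown in Figures 9-8 through 9-10:

```python
from datetime import date, timedelta

# Illustrative values taken from the examples above.
RETVER = 365    # days to retain after retention is initiated
RETMIN = 730    # minimum days to retain after the archive is stored (event-based only)

def expiration_date(archive_date, retinit, event_date=None):
    """Return the earliest date the object becomes eligible for expiration."""
    if retinit == "creation":
        # Chronological retention: RETVER counts from the day the archive is stored.
        return archive_date + timedelta(days=RETVER)
    # Event-based retention: RETVER counts from the event, but the object is
    # always kept at least RETMIN days from the archive date. Without an event,
    # the object is retained indefinitely (returned here as None).
    if event_date is None:
        return None
    return max(event_date + timedelta(days=RETVER),
               archive_date + timedelta(days=RETMIN))

stored = date(2006, 1, 1)
print(expiration_date(stored, "creation"))                             # day 365 (Figure 9-8)
print(expiration_date(stored, "event", stored + timedelta(days=400)))  # day 765, x+365 (Figure 9-9)
print(expiration_date(stored, "event", stored + timedelta(days=100)))  # day 730, RETMIN wins (Figure 9-10)
```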

Note: Event-based expiration can only be used with applications that use the IBM Tivoli Storage Manager API to send the event. A number of independent software vendor applications are available and certified for both IBM System Storage Archive Manager and the DR550. In addition, the IBM Content Management suite of applications is ready to use the benefits of the IBM System Storage Archive Manager solution. This includes the following products:

IBM DB2 Content Manager for Multiplatforms
IBM Content Manager for z/OS
IBM DB2 Content Manager OnDemand for Multiplatforms
IBM DB2 Content Manager OnDemand for z/OS and OS/390
IBM DB2 CommonStore for Exchange
IBM DB2 CommonStore for Lotus Domino
IBM DB2 CommonStore for SAP
IBM Backup Recovery and Media Services for iSeries

The IBM System Storage DR550 (DR550) is an integrated solution combining IBM System Storage Archive Manager with the required hardware, including a POWER5 processor, DS4000 disk, and optional tape devices, and provides policy-based non-erasable, non-rewriteable storage. For more information on this product, see Understanding the IBM TotalStorage DR550, SG24-7091.


Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.

IBM Redbooks
For information on ordering these publications, see How to get IBM Redbooks on page 185. Note that some of the documents referenced here may be available in softcopy only.

IBM TotalStorage Productivity Center V2.3: Getting Started, SG24-6490
IBM TotalStorage SAN Volume Controller, SG24-6423
Understanding the IBM TotalStorage DR550, SG24-7091
The IBM TotalStorage Solutions Handbook, SG24-5250
IBM Tivoli Storage Manager Implementation Guide, SG24-5416
An Introduction to Storage Provisioning with Tivoli Provisioning Manager and TotalStorage Productivity Center, REDP-3900
Exploring Storage Management Efficiencies and Provisioning - Understanding IBM TotalStorage Productivity Center and IBM TotalStorage Productivity Center with Advanced Provisioning, SG24-6373
Provisioning On Demand Introducing IBM Tivoli Intelligent ThinkDynamic Orchestrator, SG24-8888

How to get IBM Redbooks


You can search for, view, or download Redbooks, Redpapers, Hints and Tips, draft publications and Additional materials, as well as order hardcopy Redbooks or CD-ROMs, at this Web site:
ibm.com/redbooks

Help from IBM


IBM Support and downloads
ibm.com/support

IBM Global Services


ibm.com/services




Back cover

ILM Library: Techniques with Tivoli Storage and IBM TotalStorage Products
Learn about basic ILM concepts Use TPC for Data to assess ILM readiness Stages to ILM implementation
Every organization has large amounts of data to store, use, and manage. For most, this quantity is increasing. However, over time, the value of this data changes. How can we map data to an appropriate storage medium, so that it can be accessed in a timely manner when needed, retained for as long as required, and disposed of when no longer needed? Information Lifecycle Management (ILM) provides solutions. ILM is the process of managing information, from creation, through its useful life, to its eventual destruction, in a manner that aligns storage costs with the changing business value of information. We can think of ILM as an integrated solution of five IT management and infrastructure components working together: service management (service levels), content management, workflow management (or process management), storage management, and storage infrastructure. This IBM Redbook will help you understand what ILM is and why it is of value to your organization, and provides suggested ways to implement it using IBM products.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE


IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks


SG24-7030-00 ISBN 0738496049
