
Playing Catch-up With Your Tape Library
GazillaByte LLC
June 2012

ABSTRACT: Over the past 10 years we have all been told, ad nauseam, that tape as a storage medium for information is as good as dead. As a result, very little planning has been done for its continued use, and now, having come to the conclusion that tape is far from dead, many companies are scrambling to come to terms with the inherent challenges of managing large quantities of offline data. To make matters worse, those tasked with addressing this deficit lack the tools, processes and knowledge to confront the challenge, and find themselves having to explain a problem that has no contemporary vocabulary. At the same time, governments around the world are implementing strict corporate governance laws, while community expectations grow for both access to up-to-date information and indefinitely retained historical data.

THE RAPID EVOLUTION OF DATA PROCESSING

At a very simplistic level, data processing has, up until recently, revolved around two basic components: data and business logic. Traditionally, information has been manually input at the keyboard and then processed through business logic to create knowledge (the end product). The by-product of this process is information.

Data = Input
Logic = Process
Knowledge = Output (Product)
Information = By-product

In this traditional model, logic is reusable and input is only used once. As a practical example, consider the data processing activities of a bank. Data is collected by tellers and input into the computer; the data is then processed through business logic to create knowledge: in this example, the account balance. Throughout the process of calculating knowledge, information is created. This information will be used to create bank statements at the end of the month, but after this point the information becomes redundant.

As information technology has evolved, we have moved away from the analog mindset of the Industrial Revolution. The digital mindset has changed the equation to the point where logic is becoming increasingly single use, while information is increasingly reused. To illustrate this new paradigm, consider cancer treatment: in the past, data has been used in research to create treatments, and while several competing treatment options may emerge, each of these treatments is placed on the market as an option. Future cancer treatments will be developed uniquely for the individual patient, and in this equation the valuable component will no longer be the data, or the logic required to analyze that data, but the DNA of the patient and their ancestors. DNA is not data, nor is it logic; it is information. It is information that is reused throughout our own lives and passed on to our children to be perpetually reused. Whether this DNA information is the product of Providence or the product of evolution makes very little difference to the fact that, increasingly, we are learning that knowledge can be obtained from this DNA information without the need to understand the logic behind it.

In this model:

Information = Input
Information = Process
Knowledge = Output (Product)
Information = By-product

As strange as this concept may seem, the computer pioneers who worked on the Manhattan Project's early computers used their free time (between crunching numbers for the nuclear calculations) to study the organic nature of information [1], so this is not a new idea. As our understanding and development of this new paradigm evolves, backup becomes an integral part of the required business process. It is predicted that within the next decade the world's information storage requirements will grow to 50 times today's quantities [2]. With this explosive growth, information and information management will become increasingly organic in nature.

[1] Barricelli's Universe, http://tinyurl.com/cxug3hx
[2] Computer World, June 28 2011, http://tinyurl.com/44ec97f

THE EVOLUTION OF INFORMATION STORAGE

For many decades computer storage mechanisms were relatively unreliable; not only were the components prone to failure, but prerequisite technologies such as power supply, cooling and formalized quality control were still in their infancy. As magnetic technologies were developed to take on the base load of information storage, two adaptations of the same magnetic storage technique evolved: disk, designed for online data, and tape, designed to back up that online data. During this period the concept of Disaster Recovery was introduced. Disaster Recovery involved restoring a backup from tape to continue information processing operations on alternate hardware, ideally at an alternate site.

As enterprises increased their reliance on information technologies, demand grew for greater redundancy in both hardware and computer systems. This demand resulted in the development of High Availability technologies such as RAID disk, distributed data processing and distributed disk subsystems. With information processing workloads continuing to increase, the costs associated with maintaining reliable online storage, and the resulting heat load, created a new technology bottleneck. To address this problem, data that was no longer being actively used was moved to newly designed secondary storage archive mechanisms; primarily, these archive systems used tape storage.

Today, data storage can be seen in four distinct categories:

1. Online Storage: Active data stored on disk.
2. High Availability Storage: A real-time copy of data stored on disk.
3. Backup Storage: Snapshots of active data taken at specific points in time.
4. Archive Storage: Inactive data moved from disk to tape to reduce the cost of Online, High Availability and Backup Storage.

In addition, compliance requirements have resulted in two sub-categories:

1. Backup Copy Storage: Copies of backups retained for litigation purposes.
2. Archive Backup Storage: Copies of Archive Storage retained for redundancy purposes.

FAILED ATTEMPTS TO SIMPLIFY STORAGE REQUIREMENTS

As storage requirements and their associated management disciplines increased in complexity, many technology companies tried to meet the demands of their customers by developing complex disk-based technologies. These disk-based technologies attempted to blur the lines between the various storage categories by merging High Availability, Backup and Archive into one solution. To achieve this convergence between the categories, a new technique of sub-file level backup was developed. The premise behind sub-file level backup was that the majority of data stored across the four categories was text based, and that on any given day only a small percentage of each file requiring redundancy and backup underwent any form of modification; why, it was argued, back up or replicate all of a file when only 5% of that file had changed? The technique deployed for sub-file level backup works as follows; a simplified sketch is shown after the list:

1. Break each file into blocks of a fixed length.
2. Calculate a signature that statistically represents the contents of each block.
3. Compare the block signature to any known signature.
4. If the signature does not match, a backup of that block is taken.
5. If the signatures do match, backup is deemed unnecessary.
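To make the mechanics concrete, the following is a minimal illustrative sketch of the block-signature comparison described above. It is not any vendor's implementation; the block size, the use of SHA-256 as the signature function and the in-memory signature store are all assumptions made purely for illustration.

    import hashlib

    BLOCK_SIZE = 4096          # fixed block length (assumption for illustration)
    known_signatures = set()   # signatures of blocks already backed up

    def backup_changed_blocks(path, backup_store):
        """Back up only the blocks of 'path' whose signatures are not already known."""
        with open(path, "rb") as f:
            block_number = 0
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                # Signature that statistically represents the block's contents.
                signature = hashlib.sha256(block).hexdigest()
                if signature not in known_signatures:
                    # Signature unknown: the block is treated as changed and backed up.
                    backup_store[(path, block_number)] = block
                    known_signatures.add(signature)
                # If the signature matches a known one, backup is deemed unnecessary.
                block_number += 1

    # Example usage (hypothetical file name):
    # store = {}
    # backup_changed_blocks("accounts.db", store)

Note that the comparison relies entirely on the signature: two different blocks that happen to produce the same signature would be treated as identical, which is the collision risk discussed in the first reason below.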

Sub-file level backup had been deliberately overlooked by the established storage companies for the following reasons:

1. A small but measurable risk exists that signature values may collide, resulting in a changed block being falsely identified as unchanged. This results in the unrecoverable corruption of that backup and of any future sub-file version.
2. The inevitable bottleneck that results from restore pipelines being provisioned on the basis of backup rather than restore.
3. The realization that data input would increasingly come from devices other than keyboards, this data arriving already preprocessed and not containing small block-level differences.
4. The recognition that, as data volumes increased, the cost of assembling blocks from greater numbers of ever-larger files was not viably scalable.

As a result, sub-file level backup has generally failed as a mainstream business solution that can adequately address the requirements of High Availability, Backup and Archive in the form of one magic bullet. The time it has taken to come to this realization, however, has had a significant disruptive effect on the evolution of storage management. This disruption has both stifled investment in the development and deployment of new management solutions and constrained the maintenance of solutions that were already in place.

THE NEW REALIZATION

Enterprises around the world are now faced with the realization that sub-file level disk-to-disk backup will not deliver on its promises, at the same time that they are facing a whole new set of challenges; challenges that nobody saw coming and few have prepared for. These challenges include:

1. A marketplace in which their competitors compete on the basis of their technology.
2. The increasing demands of compliance and litigation.
3. A future workforce who will expect to work with tools that are as well designed and easy to understand as Facebook.
4. The realization that information is no longer a by-product of knowledge and that in many cases it is the product.
5. The entry of states and terrorist organizations into the world of computer hacking, which has resulted in new, undetectable viruses written specifically for enterprise-level computers and their microcode.

At the same time, many companies who sell backup and information management services, such as Iron Mountain [3], Oracle and EMC, are abandoning their sub-file level backup interests and returning to their core competency of tape backup.
[3] Computer World, April 27 2011, http://tinyurl.com/62k2lda

MEETING THE CURRENT CHALLENGES AND THE CHALLENGES OF THE FUTURE

At GazillaByte we have never believed that sub-file level backup would meet the enterprise-level requirements of the day, or the requirements of the future. As a result we have continued to develop technologies based on the belief that storage management is best separated into its respective towers. It wasn't that hard to work out; after all, we all know that shampoo and conditioner do a better job than a 2-in-1, so what chance does a 3-in-1 solution stand?

The reality is that tape is here for the foreseeable future and in all probability will outlast magnetic disk. As long as you have tape you need to know:

1. How many tapes you have (Asset Management).
2. Where those tapes have been and who has come in contact with them (Chain of Custody).
3. Where those tapes need to go (Library Management).
4. Which tapes are required in the event of a disaster, and that they are offsite (Disaster Recovery Management).
5. That the processes you have in place are working (Quality Control).

We call this the 5 Pillars of Tape Management (a simple sketch of the idea follows below); it's what we've been working on while others have been trying to get sub-file level backup to solve all of your problems. We've written our TapeTrack software around these 5 Pillars, and we've put a lot of energy into making sure you can implement our software in your environment quickly and economically, to catch up with today's challenges and meet the challenges that are headed your way in the future.
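To illustrate the kind of record keeping the 5 Pillars imply, here is a minimal, hypothetical sketch of a tape record. It is not TapeTrack's data model; every field name, default value and the quality-control check are assumptions made for illustration only.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class CustodyEvent:
        timestamp: str   # when the tape changed hands or location
        location: str    # where the tape was at that time
        handler: str     # who came in contact with it

    @dataclass
    class Tape:
        barcode: str                             # Asset Management: one record per tape
        custody: List[CustodyEvent] = field(default_factory=list)  # Chain of Custody
        target_location: str = "onsite-library"  # Library Management: where it needs to go
        required_for_dr: bool = False            # Disaster Recovery Management
        offsite: bool = False

    def quality_control(tapes: List[Tape]) -> List[str]:
        """Quality Control: flag DR-required tapes that are not yet offsite."""
        return [t.barcode for t in tapes if t.required_for_dr and not t.offsite]

    # Example usage (hypothetical values):
    # inventory = [Tape(barcode="A00001", required_for_dr=True, offsite=False)]
    # print(quality_control(inventory))  # -> ["A00001"]

Whatever the implementation, the point of the sketch is that each pillar corresponds to something concrete that must be recorded and checked for every tape, not inferred after the fact.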

ABOUT GAZILLABYTE

GazillaByte LLC is based in Colorado, USA, where it develops and supports its flagship TapeTrack tape management software. Today TapeTrack is used by over 4,000 enterprises around the world. These companies range from the top of the Fortune 500 through to newly created technology companies that you have yet to hear of. To learn more about TapeTrack, visit the product website at www.tapetrack.com, or call GazillaByte LLC on +1-720-583-8880 to organize a free, 90-day, no-obligation trial of our unique technology.
