4imprint.com
1 "Angry Employee Deletes All of Company's Data." Fox News. FOX News Network, 24 Jan. 2008. Web. 14 Aug. 2012. <http://www.foxnews.com/story/0,2933,325285,00.html>.
2 Papadimoulis, Alex. "Death by Delete." Redmond Developer News. 1105 Media Inc., 1 Jan. 2009. Web. 14 Aug. 2012. <http://reddevnews.com/articles/2009/01/01/death-by-delete.aspx>.
2012 4imprint, Inc. All rights reserved
to ask when planning data recovery. Finally, we'll close with a handful of device definitions covering the most common kinds of data backup software and systems. Let's begin!
2000s: In February 2000, the cost of storing 1GB of data dropped again to $19.70, another 99.78 percent decrease from the previous decade.9 In addition to USB flash drives storing between 8MB and 256GB, optical formats now included Blu-ray discs with 25GB of storage capacity.10 In July 2009, it cost only $0.07 to house 1GB of data.11 That's when "big data" truly began to flourish, thanks to the relatively low cost of housing expansive databases.
Now that we've had a brief history lesson on the evolving cost of data, let's talk about your data and what you need to know to store it safely.
1. Metadata standards and data provenance
Metadata provides structured information explaining such details as the purpose, origin, geographic location, access conditions and terms of use of a data collection. To put this into context, files without metadata are like a library without a card catalogue. Here are a few questions worth considering when setting your metadata plan in motion:12
Which metadata standards will you use? Why have you chosen them? How will you record these details? What information is needed to make the data you collect meaningful to others? Likewise, what information do you need to make that data reusable?
2. Provisions for privacy, confidentiality and licensing
You should first explain how and when the data will become available. If there is an embargo period for sharing the data, make sure you provide details explaining the delay. If the data is sensitive in nature (if, for example, it involves health-related privacy issues or competitive analysis insight) and public access is inappropriate, address the means by which you plan to control access. For instance:
Who will hold the intellectual property rights to the data? How long will the original data creator/principal investigator retain the right to use the data before making it available for wider distribution? Are there any embargo periods for political, commercial or patent reasons? If so, what are the details? Describe any permission restrictions that will need to be placed on the data. Are there ethical or privacy issues? If so, how will these be resolved?
12 Higgins, Sarah. "What Are Metadata Standards." Digital Curation Centre. Digital Curation Centre, n.d. Web. 22 Aug. 2012. <http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards>.
If you have approval from the U.S. Department of Health and Human Services (HHS) Institutional Review Board (IRB), or are in the process of applying for it, how will you comply with those obligations?
3. Policies for data access during and after your project
Think about how you prepare and manage your data for sharing, and explain how you will actively share your data with non-group members after the project is complete. You should explain how and where the data will be accessible, and identify who will be allowed to use it, how they will be allowed to use it and whether or not they will be allowed to disseminate it. Think about some of these questions:
Will your data be accessible? How will you make it available? Include resources such as the equipment and systems needed to do that. What is its intended use? Who are its intended users? If permission restrictions exist, what is the process for gaining access to the data? Explain how you will store data during the project's lifetime. How will you archive that data? If applicable, how will you transfer or transmit that data?
4. Plans for archiving and preservation
To archive data is to move less important information from an active storage device to a less-used storage device for basic retention purposes. This frees capacity on the first, more active device and improves its performance. In terms of data archival, there are many subject-specific data repositories, any of which could serve as an archiving option for your data. But first, ask:
How long should data be kept beyond the life of the project? What data will be preserved in the long term? Which database have you identified as a place to deposit the data?
What is the long-term strategy for maintaining and curating your data? What procedures does your intended long-term data storage facility have in place for preservation and backup? Are there any conversions necessary to prepare data for preservation or data sharing?
What you save and how you save it are directly linked. So be sure to have a solid understanding of the kinds of files, documents and information formats that are saved on your computer or server. That way, you'll know just what it will take to properly save and store your data.
1. Hardware failure - 40%
2. Human error - 29%
3. Software corruption - 13%
4. Theft - 9%
5. Computer viruses - 6%
6. Hardware destruction - 3%
Whether it's hardware failure or human error, failure happens. Unfortunately, implementing a backup system after the data is gone cannot bring it back. Plan accordingly, because data backup failure is not uncommon.
13 "Backing up Data - Why You Need to Do It." PC 911. PC911, 28 Feb. 2011. Web. 17 Sept. 2012. <http://pcnineoneone.com/howto/backup1/>.
14 Smith, David M. "The Cost of Lost Data." Graziadio Business Review. Graziadio School of Business and Management, Pepperdine University, 2003. Web. 17 Sept. 2012. <http://gbr.pepperdine.edu/2010/08/the-cost-of-lost-data/>.
So what happens to your business when your data backup fails? Well, according to the same Pepperdine University study, a company that experiences a computer outage lasting more than 10 days will never fully recover financially.15 Worse still, half of the companies that endure such an outage will likely be out of business within five years. Hard to believe? Well, computer-stored data, though intangible, is worth a great deal. The value of lost data is determined by its primary utility and frequency of use, both of which are specific to the business that lost it. Take a moment to think about the price of your data. To do that, you might first think of what capabilities you would lose if you lost your data. A lot of them. Maybe even all of them. Could you function without them? Probably not.
1. Recovery time objective (RTO) refers to the amount of time within which your organization needs to recover from a data loss. Many organizations have multiple RTOs. For example, one RTO may specify how long before the major functions of the enterprise are back online, while a second, longer RTO determines how long until everything is fully recovered.16
2. Recovery point objective (RPO) is the point in time to which your data must be restored; in other words, how much recent data, measured in time, you can afford to lose. As with RTOs, an organization often has several RPOs: critical functions such as transaction processing get short RPOs, while less immediate functions can recover to a point further back in time. An RPO can be anywhere from a few seconds, in the case of a sophisticated (and expensive) remote mirroring system, to several hours or even several days for less critical data.
15 Ibid.
16 Cook, Rick. "Set Disaster-recovery Objectives." SearchStorage, n.d. Web. 22 Aug. 2012. <http://searchstorage.techtarget.com/tip/Set-disaster-recovery-objectives>.
3. Network recovery objective (NRO) is the time needed to recover network operations; specifically, how long it takes before you appear recovered to your customers. It includes such jobs as establishing alternate communications links, reconfiguring Internet servers, setting alternate TCP/IP addresses and everything else needed to make the recovery transparent to customers, remote users and others.
4. Recovery granularity objective (RGO) refers to the level of object that can be easily recovered (e.g., a file, an email, a directory, a hard drive or a full system image).
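To make the first two objectives concrete, here is a minimal Python sketch that checks a backup plan against an RTO and an RPO. Everything in it is an assumption made for illustration: the `RecoveryObjectives` class, the `meets_objectives` function and all of the numbers are hypothetical, not taken from any backup product. The one piece of reasoning it encodes is that, in the worst case, a failure strikes just before the next backup runs, so the data you lose can be as old as one full backup interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjectives:
    """Hypothetical container for the two time-based objectives."""
    rto: timedelta  # longest acceptable time to get back online
    rpo: timedelta  # longest acceptable window of lost data


def meets_objectives(backup_interval, estimated_restore_time, objectives):
    """Compare a backup plan with the stated objectives.

    Worst case, the failure happens just before the next backup,
    so up to one full backup interval of data is lost.
    """
    return {
        "rpo_met": backup_interval <= objectives.rpo,
        "rto_met": estimated_restore_time <= objectives.rto,
    }


# Illustrative numbers only: a nightly backup that takes six hours to restore.
nightly = timedelta(hours=24)
restore = timedelta(hours=6)

# A critical function gets short objectives; a less immediate one gets long ones.
order_processing = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(minutes=15))
archived_reports = RecoveryObjectives(rto=timedelta(days=3), rpo=timedelta(days=1))

print(meets_objectives(nightly, restore, order_processing))
print(meets_objectives(nightly, restore, archived_reports))
```

With these made-up numbers, the nightly backup is good enough for the archived reports but misses both objectives for order processing, which is the kind of gap that points toward mirroring or more frequent backups for critical functions.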
However you lose it, the majority of cases (83 percent) can be recovered. You've been warned, though: Recovery can be an expensive operation.17
Device definitions
Most sources available for data storage fail to recognize that in many organizations, not everyone responsible for IT is necessarily an IT professional. This is especially true for small businesses, where most employees wear multiple hats. So when it comes to data storage, there are a handful of terms and device definitions to be familiar with in case data is lost and needs to be restored. Here are some basic storage hardware configurations to know:
Remote mirroring systems18
One of the most basic tools for data storage and backup is the remote mirroring system. (See also: cloud storage.) As its name implies, it generates a mirror image of the data on one or more disks located locally or remotely. It works in real time, so the duplicate disks always hold the most current critical business data. Information stored on them can be substituted for the original in an emergency or used to facilitate data migration.
Disk array
A disk array is a kind of storage system that links multiple hard drives into one big drive. Disk arrays organize data into logical units (LUs).19 To the client, these look like blocks. Small arrays with only a few disks can hold eight LUs, while larger arrays with hundreds of disks can hold thousands.20
17 Ibid.
18 Larsen, Brian. "Disk Mirroring - Local or Remote." InfoManagement Direct, 1 Dec. 2003. Web. 18 Sept. 2012. <http://www.information-management.com/infodirect/20031212/7861-1.html>.
19 "What Is Disk Array?" Webopedia Computer Dictionary. Webopedia, n.d. Web. 17 Sept. 2012. <http://www.webopedia.com/TERM/D/disk_array.html>.
20 Ibid.
The most common kind of disk array is a Redundant Array of Independent Disks (RAID). The advantage of RAID backup lies in its name: Redundancy means that data is written to and stored in multiple locations in case a file is damaged or stored in a bad cluster. If that's the case, it is instantaneously rewritten on another disk in the array, which increases overall storage performance.21 This kind of configuration is particularly useful for organizations with servers laden with multimedia-heavy data.22 In case you're unfamiliar with the term disk array, perhaps you know it as a drive array or storage array, which generally means magnetic or solid state disks: two or more disk drives built into a stand-alone unit, typically using some RAID configuration. However, optical drives (CD, DVD, etc.) also come in multi-drive units.23
Direct attached storage (DAS)
Direct attached storage involves a direct connection to the server, either through an internal server disk controller or an external storage subsystem.24 DAS systems are recognized for their ease of management, generally low operating costs and overall simplicity. However, one drawback of using DAS is that it creates information isolation, meaning that the information is inaccessible from other servers. Small businesses may see this as only slightly problematic, whereas for larger businesses, not being able to access data may become a serious problem.
Network attached storage (NAS)
As its name implies, NAS is storage attached to the common network via Ethernet. It is essentially a file server that often integrates an optimized operating system dedicated to file sharing. This means that all processing is done locally at the client's request. Besides its reputation for easy installation, another major benefit of NAS is that it solves the compatibility issue between Microsoft's Windows platform and UNIX, allowing file access without additional software.
To give this acronym more context, Western Digital's WD Sentinel DX4000 is a prime example of a NAS device designed for small businesses. As with most devices, installation is as simple as plug and play, which initializes the automatic
21 "RAID - Redundant Array of Independent Disks." Webopedia. Webopedia, n.d. Web. 17 Sept. 2012. <http://www.webopedia.com/TERM/R/RAID.html>.
22 Kayne, R., and Niki Foster. "What Are Disk Arrays?" WiseGeek. Conjecture, 11 July 2012. Web. 17 Sept. 2012. <http://www.wisegeek.com/what-are-disk-arrays.htm>.
23 "Disk Array Definition." PC Magazine Encyclopedia. PC Magazine, n.d. Web. 22 Aug. 2012. <http://www.pcmag.com/encyclopedia_term/0,1237,t=hard+disk+array&i=41489,00.asp>.
24 Parwar, Ashwin. "Understanding Storage Basics - DAS-NAS-SAN." WizIQ, n.d. Web. 22 Aug. 2012. <http://www.wiziq.com/tutorial/74910-Understanding-Storage-Basics-DAS-NAS-SAN>.
system configuration. On the user's end, setting user preferences is the final task. The major drawback of employing a NAS, however, is its performance. It provides file-level input/output (I/O) via traditional file shares, while DAS and SAN provide block-level I/O. If your eyes are already glazing over, you're not alone. When thinking of file vs. block access, let's look at it from another perspective: File sharing is like reading a classic novel. You have an in-depth view of the characters, the landscape and the plot. You can revisit each section and draw deeper conclusions. Conversely, block sharing is similar to the CliffsNotes version: you still get usable information, albeit not as complete. Block data is suitable for images or other large files that are not altered often, while file access is most appropriate for documents that change more regularly.
Storage area network (SAN)
Storage area networks are designed to be accessible by multiple servers, just as local area networks (LANs) connect a server to multiple computers.25 Unlike DAS or NAS, each of which consists of a single piece of hardware, SANs are built from multiple hardware components. These components, including hubs, switches, bridges and Small Computer System Interface (SCSI) devices, are typically connected by Fibre Channel. If an Ethernet cable is like a straw pulling information off the network, Fibre Channel is like an oil pipeline for information. These hardware components play a role in three areas: redundancy, speed and volume.
Switches and hubs generally do the same thing. Like the post office, both process incoming information, or mail. Switches take that information and quickly deliver it to a specific location, or mailbox. Hubs, however, aren't as discerning. Imagine a small apartment building where the mail is left in the lobby in bulk. Each tenant must sort the mail and determine what is addressed to them, which creates time-consuming, redundant work.
Both have their advantages, but hubs work best for small enterprises, whereas switches are suited to more data-intense operations. Referring back to the type of data your organization produces will help determine which components will be most beneficial.26 SANs have many advantages in terms of availability, reliability, scalability, performance, manageability and return on information management.27
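The mailroom analogy for hubs and switches can be sketched in a few lines of Python. This is a toy model of the analogy only, not of any real Fibre Channel or Ethernet device; the `Hub` and `Switch` classes and the port names are hypothetical and exist purely for illustration.

```python
class Hub:
    """Toy hub: leaves every message 'in the lobby' for all other ports."""

    def __init__(self, ports):
        self.ports = list(ports)

    def deliver(self, sender, recipient):
        # The hub ignores the recipient address, so every port except
        # the sender receives the message and must check whether it is theirs.
        return [port for port in self.ports if port != sender]


class Switch:
    """Toy switch: puts each message straight into the right 'mailbox'."""

    def __init__(self, ports):
        self.ports = list(ports)

    def deliver(self, sender, recipient):
        # Only the addressed port receives the message.
        return [recipient] if recipient in self.ports else []


ports = ["server_a", "server_b", "disk_array", "tape_library"]
hub = Hub(ports)
switch = Switch(ports)

print(hub.deliver("server_a", "disk_array"))
print(switch.deliver("server_a", "disk_array"))
```

In this sketch the hub hands the same message to three ports so that one of them can use it, while the switch hands it to one. With only a few devices the wasted sorting is minor; as the number of ports and the volume of traffic grow, it adds up.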
25 "SAN (Storage Area Network) Definition." TechTerms.com, n.d. Web. 22 Aug. 2012. <http://www.techterms.com/definition/san>.
26 "SAN Tutorial." Manhattan Skyline GmbH, n.d. Web. 11 Sept. 2012. <http://www.mskl.de/CONTENT/PDF/SAN_Tutorial.pdf>.
27 "Storage Area Networks." AllSAN.com - All about Storage Area Network. AllSAN.com, n.d. Web. 22 Aug. 2012. <http://allsan.com/sanoverview.php3>.
As we already stated, NAS operates with file-level access, whereas DAS and SAN are block-level, but several different types of high-speed interfaces are used within SANs. In fact, many SANs today use a combination of different interfaces. Currently, Fibre Channel serves as the de facto standard in most SANs. Fibre Channel is an industry-standard interconnect and high-performance serial I/O protocol that is media independent and supports simultaneous transfer of many different protocols. Additionally, SCSI interfaces are frequently used as sub-interfaces between internal components of SAN members, such as between raw storage disks and a RAID controller.
To illustrate a few ways to use a SAN and the benefits to be had, let's take, for example, an insurance agency with two locations, each with two SANs: Location A has SAN 1 programmed to back up its internal operations each hour. On SAN 2, backup runs for Location B. Location B mirrors this setup. If the first SAN at Location B fails, a simple DNS reroute will restore operations within moments rather than risking several days of downtime while IT tries to remedy the situation.
In a simplified example, a big box retail chain stores its inventory on Server A and its transactions on Server B. With the SAN, a sales agent can call upon both servers to analyze the supply on Server A and the demand on Server B, all in real time and directly from a personal computer.
While all of the aforementioned systems provide backup, some backup software works better with a SAN. Imagine this scenario: In a drive-thru, you order a cheeseburger and pull around to the window, where they hand you your order. If your order is correct and timely, do you think twice about the process that occurred inside? Probably not. The same theory applies to basic data storage systems.
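Returning to the two-location insurance agency, the rerouting idea can be sketched as follows. This is a minimal illustration under stated assumptions: the `San` class, the `choose_san` function and the SAN names are hypothetical, and a real failover would also involve DNS changes, replication lag and health monitoring that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class San:
    """Hypothetical record describing one SAN in the example."""
    name: str
    location: str        # where the SAN physically sits
    holds_data_for: str  # which location's data it stores
    healthy: bool = True


def choose_san(sans, data_owner):
    """Pick a healthy SAN holding data_owner's data, preferring the local copy."""
    candidates = [s for s in sans if s.holds_data_for == data_owner and s.healthy]
    # False sorts before True, so a SAN at the owner's own location comes first.
    candidates.sort(key=lambda s: s.location != data_owner)
    return candidates[0] if candidates else None


# Each location keeps its own data on SAN 1 and the other location's backup on SAN 2.
sans = [
    San("A-SAN1", location="A", holds_data_for="A"),
    San("A-SAN2", location="A", holds_data_for="B"),
    San("B-SAN1", location="B", holds_data_for="B"),
    San("B-SAN2", location="B", holds_data_for="A"),
]

print(choose_san(sans, "A").name)  # normal operation: the local SAN

sans[0].healthy = False            # simulate a failure of Location A's first SAN
print(choose_san(sans, "A").name)  # rerouted to the copy held at Location B
```

The design choice worth noticing is that each location's backup lives at the other location, so the loss of one SAN, or even one whole site, still leaves a usable copy of the data somewhere else.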
Conclusion
Data storage and backup are complex issues, but they are also critically important. As you explore storage options for your company's valuable data, keep these helpful guidelines in mind:
Reevaluate your backup software annually. Ask yourself if it is still able to meet your needs. Organizations that do not monitor data storage are more likely to let a crisis drive them toward an inefficient change.
Stay on top of your backup infrastructure. Use three simple rules: Match the class of software to the environment; keep your backup software up to date; and continue to enhance the architecture as your performance and capacity needs increase.
Look closely at different vendors. When evaluating vendor offerings, look at how they employ agentless backup, storage-level snapshots and APIs in the virtual infrastructure (such as VMware) for fast, low-overhead virtual infrastructure backup.
Leverage capacity-based licensing. To this end, look to cost justification, better data management and storage tiers. Some argue that up to 70 percent of data subject to backup is unchanged and should not be in primary storage, but rather in an archive. Capacity-based licensing exposes the cost of backup by data volume, which creates an incentive to reduce that volume and thus the cost of backup. Capacity licensing should also incorporate some overhead for expected data growth.
Even if backup doesn't seem like a pressing priority right now, you'll want to prepare sooner rather than later, because backup doesn't seem important until it fails. As they say, "There's no time like the present."
4imprint serves more than 100,000 businesses with innovative promotional items throughout the United States, Canada, United Kingdom and Ireland. Its product offerings include giveaways, business gifts, personalized gifts, embroidered apparel, promotional pens, travel mugs, tote bags, water bottles, Post-it Notes, custom calendars, and many other promotional items. For additional information, log on to www.4imprint.com.