
Rose Business Technologies

Clarity - Simplicity - Productivity

Primary Storage Data Reduction


Data reduction on primary storage is a reality today, and with the unchecked growth of data it will undoubtedly become a key part of storage efficiency. Standard in many backup and archival products, data reduction is now becoming more prevalent for primary storage. The main drivers are measurable cost savings: buying fewer disks, paying lower annual support fees and reducing the operational expenses of storage management. Data reduction can also benefit performance: by keeping inactive data from occupying valuable high-performance storage, overall storage and application performance may get a welcome boost.

In a typical enterprise, according to Storage Networking Industry Association (SNIA) research, 80% of files stored on primary storage haven't been accessed in the last 30 days; the same report asserts that inactive data grows at more than four times the rate of active data. With these facts in mind, it's no surprise that data reduction techniques have been making their way into primary storage. But in contrast to data reduction methods for backup and archiving, primary storage systems can't tolerate even a small impact on performance and reliability, the two most important attributes of primary storage. As a result, data reduction techniques vary and have different relevance on primary storage than they do in storage used for backup and archival. On backup and archival systems, deduplication and compression are the primary data reduction methods, but for primary storage those techniques are clearly second to more subtle and proven approaches that don't hinder performance the way dedupe and compression can.

These are the main data reduction techniques being applied on primary storage systems:

- Choosing the right RAID level
- Thin provisioning
- Efficient clones
- Automated storage tiering
- Deduplication
- Compression

Putting the choice of an appropriate RAID level at the top of a list of data reduction techniques may seem strange at first, but unlike other data reduction approaches, it's the only option available on all storage systems and it greatly impacts disk requirements, performance and reliability. Were it not for its reliability shortcoming, RAID 0 (block-level striping across all disks without parity or mirroring) would be the most cost-efficient and best-performing option, but losing the whole RAID group with the loss of a single drive makes it a no-go in the data center. RAID 1 (mirroring without parity or striping) and RAID 10 (mirrored drives in a striped set), on the other hand, combine good performance and high reliability but require twice the disk capacity and are therefore the antithesis of data reduction. RAID 5 (block-level striping with distributed parity), with its requirement for a single additional drive, has been the best compromise in recent years, but as disks increased in size and rebuild times grew longer, the risk of losing a second drive while the RAID group is rebuilt after a failure has increased to an uncomfortable if not unacceptable level. As a result, storage vendors have been implementing RAID 6, which extends RAID 5 by adding a second parity block and drive, enabling it to withstand two concurrent drive failures without data loss, but it comes with a varying performance penalty, depending on implementation. RAID 6 and a RAID 6 performance benchmark should be on anyone's evaluation list when shopping for a new storage system.
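
To make the capacity side of that tradeoff concrete, here is a minimal Python sketch that computes usable capacity per RAID group for the levels discussed above; the twelve-drive group and 2 TB drive size are illustrative assumptions, not vendor figures.

```python
# Minimal sketch: usable capacity per RAID level for a single RAID group.
# Drive count and size below are illustrative assumptions only.

def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Return usable capacity in TB for one RAID group."""
    if level == "RAID 0":                # striping only, no redundancy
        return drives * drive_tb
    if level in ("RAID 1", "RAID 10"):   # mirroring halves capacity
        return drives * drive_tb / 2
    if level == "RAID 5":                # one drive's worth of parity
        return (drives - 1) * drive_tb
    if level == "RAID 6":                # two drives' worth of parity
        return (drives - 2) * drive_tb
    raise ValueError(f"unknown RAID level: {level}")

if __name__ == "__main__":
    for lvl in ("RAID 0", "RAID 10", "RAID 5", "RAID 6"):
        print(f"{lvl}: {usable_tb(lvl, drives=12, drive_tb=2.0):.0f} TB usable "
              f"out of {12 * 2.0:.0f} TB raw")
```

With twelve 2 TB drives, RAID 10 leaves 12 TB usable while RAID 6 leaves 20 TB and still survives two concurrent drive failures, which is why RAID 6 figures so prominently in data reduction discussions.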

Until recently, there wasn't a real alternative to over-provisioning allocated storage and, as a result, storage utilization has been dismal. It's not unusual for companies to have hundreds of gigabytes of over-provisioned and unused storage in their data centers. Thin provisioning technologies can help put an end to this profligate management of storage resources by allowing storage to be assigned to users and servers beyond the actual available physical capacity. Storage is allocated to thin-provisioned volumes on an as-needed basis. For instance, thin provisioning enables allocation of a 100 GB volume even though it may only have 10 GB of physical storage assigned. Thin provisioning is transparent to users, who will see a full 100 GB volume. The cost savings of thin provisioning can be tremendous, and it enables storage utilization beyond 90%.

The number of vendors that support thin provisioning is growing quickly, and it should be one of the key criteria when selecting a storage system. Keep in mind, though, that not all thin provisioning implementations are equal. While some systems require setting aside areas that can be thin provisioned, in others all capacity is available for thin provisioning without the need for special reservation. The ability to convert regular thick volumes into thin volumes, how unused storage is reclaimed and the way thin provisioning is licensed are other areas of differentiation. With more storage provisioned than is physically present, running out of physical storage is an ever-present risk in thinly provisioned environments. Therefore, alerts, notifications and storage analytics are essential features that play an even greater role in thinly provisioned environments than they do in traditionally provisioned storage.
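
The following is a toy model of how thin provisioning behaves, not any vendor's implementation: a volume advertises a large virtual size but consumes physical blocks only as they are first written, and the pool reports utilization and over-commit so it can alert before physical space runs out. The 4 KB block size and 85% alert threshold are assumptions for illustration.

```python
# Toy model of thin provisioning: physical blocks are consumed only on first write.
# Block size, pool size and alert threshold are illustrative assumptions.

BLOCK = 4096  # bytes

class ThinPool:
    def __init__(self, physical_bytes: int, alert_at: float = 0.85):
        self.physical_blocks = physical_bytes // BLOCK
        self.alert_at = alert_at
        self.volumes = []

    def create_volume(self, virtual_bytes: int) -> "ThinVolume":
        vol = ThinVolume(self, virtual_bytes)
        self.volumes.append(vol)
        return vol

    def used_blocks(self) -> int:
        return sum(len(v.blocks) for v in self.volumes)

    def check(self) -> None:
        used = self.used_blocks() / self.physical_blocks
        provisioned = sum(v.virtual_blocks for v in self.volumes) / self.physical_blocks
        print(f"physical utilization {used:.0%}, over-commit ratio {provisioned:.1f}x")
        if used >= self.alert_at:
            print("ALERT: thin pool is close to running out of physical space")

class ThinVolume:
    def __init__(self, pool: ThinPool, virtual_bytes: int):
        self.pool = pool
        self.virtual_blocks = virtual_bytes // BLOCK
        self.blocks = {}  # logical block number -> data, allocated on first write

    def write(self, lbn: int, data: bytes) -> None:
        if lbn >= self.virtual_blocks:
            raise ValueError("write beyond the volume's virtual size")
        self.blocks[lbn] = data  # physical space is consumed only here

# A 100 GB volume backed by a 10 GB pool looks full-sized to the host,
# but only written blocks occupy physical capacity.
pool = ThinPool(physical_bytes=10 * 2**30)
vol = pool.create_volume(virtual_bytes=100 * 2**30)
vol.write(0, b"x" * BLOCK)
pool.check()
```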

Cloning is used to create an identical copy of an existing volume, and it has become more relevant with server virtualization, where it's frequently used to clone virtualized OS volumes. The most basic and still predominant implementation of a clone is a full copy of the source volume, with the cloned volume allocating the same amount of physical storage as the source volume. The next level up is the ability to clone thinly provisioned volumes. While some storage systems turn thinly provisioned volumes into thick volumes during cloning, others can create a copy of a thinly provisioned volume where the thinly provisioned source volume and the cloned volume allocate the same amount of physical storage. The most efficient clones are thin clones, where a cloned volume holds no data at all but instead references blocks on the source image. Thin clones only have to store the differences between the original image and the cloned image, resulting in huge disk space savings. In other words, a fresh clone requires minimal physical disk space, and only as clones change do the differences from the original image need to be stored. NetApp's FlexClone and the cloning feature in the Oracle ZFS Storage Appliance (Sun ZFS Storage 7000 series) are examples of storage systems that support thin clones today.
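
The following minimal copy-on-write sketch illustrates the thin-clone idea described above; it is a conceptual model only, not FlexClone's or ZFS's actual mechanism.

```python
# Minimal copy-on-write sketch of a thin clone: the clone stores only blocks
# that diverge from its source; everything else is read from the source image.
# This illustrates the concept only, not any vendor's on-disk format.

class Volume:
    def __init__(self, blocks: dict[int, bytes]):
        self.blocks = blocks                      # logical block number -> data

    def read(self, lbn: int) -> bytes:
        return self.blocks.get(lbn, b"\0" * 4096)

    def clone(self) -> "ThinClone":
        return ThinClone(self)                    # no data is copied at clone time

class ThinClone(Volume):
    def __init__(self, source: Volume):
        super().__init__({})                      # starts with zero private blocks
        self.source = source

    def read(self, lbn: int) -> bytes:
        if lbn in self.blocks:                    # block was modified in the clone
            return self.blocks[lbn]
        return self.source.read(lbn)              # unmodified blocks come from source

    def write(self, lbn: int, data: bytes) -> None:
        self.blocks[lbn] = data                   # only the difference is stored

golden = Volume({i: b"os" * 2048 for i in range(1000)})   # a 1,000-block OS image
vm1 = golden.clone()
vm1.write(42, b"vm1 config".ljust(4096, b"\0"))
print(len(vm1.blocks), "private block(s) in the clone")   # -> 1
```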

Automated storage tiering is another mechanism for reducing data on primary storage. An array's ability to keep active data on fast, expensive storage and to move inactive data to less-expensive, slower tiers allows you to limit the amount of expensive tier-1 storage. The importance of automated storage tiering has increased with the adoption of solid-state storage in contemporary arrays and with the advent of cloud storage to supplement on-premises storage. Automated storage tiering enables users to keep data on appropriate storage tiers, thereby reducing the amount of premium storage needed and enabling substantial cost savings and performance improvements. There are a couple of key features to look for in automated storage tiering. First, granularity: the more granular the data that can be moved from one tier to another, the more efficiently expensive premium storage can be used, so sub-volume-level tiering, where blocks of data can be relocated rather than complete volumes, and byte-level rather than file-level tiering are preferable. Second, the inner workings of the rules that govern data movement between tiers will determine the effort required to put automated tiering in place. Some systems, like EMC's Fully Automated Storage Tiering (FAST), depend on policies that define when to move data and what tiers to move it to. Conversely, NetApp and Oracle (in the Sun ZFS Storage 7000 series) advocate that the storage system should be smart enough to automatically keep data on the appropriate tier without requiring user-defined policies.
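
As a rough illustration of block-level, policy-driven tiering, the sketch below demotes blocks that haven't been touched for a set period; the tier names and the 30-day threshold (echoing the SNIA statistic quoted earlier) are illustrative assumptions, not any array's defaults.

```python
# Toy illustration of policy-driven, sub-volume (block-level) tiering:
# blocks untouched for longer than a threshold are demoted to a cheaper tier.
# Tier names and the 30-day threshold are illustrative assumptions.
import time

DEMOTE_AFTER = 30 * 24 * 3600          # demote blocks idle for more than 30 days

class TieredBlock:
    def __init__(self, lbn: int):
        self.lbn = lbn
        self.tier = "ssd"               # new and hot data lands on the premium tier
        self.last_access = time.time()

    def touch(self) -> None:
        self.last_access = time.time()
        self.tier = "ssd"               # promote hot data back to the fast tier

def run_tiering_pass(blocks: list[TieredBlock], now: float) -> None:
    """One scheduled pass of the (simplified) data-movement policy."""
    for b in blocks:
        if b.tier == "ssd" and now - b.last_access > DEMOTE_AFTER:
            b.tier = "nearline"         # move cold data to cheaper, slower disks

# Simulate a volume where most blocks have gone cold, as the SNIA numbers suggest.
blocks = [TieredBlock(i) for i in range(100)]
for b in blocks[:80]:
    b.last_access -= 60 * 24 * 3600     # 80% of blocks not touched for 60 days
run_tiering_pass(blocks, time.time())
print(sum(b.tier == "nearline" for b in blocks), "of 100 blocks demoted")
```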

Well established in the backup and archival space, data deduplication is gradually finding its way into primary storage. The main challenge that has slowed adoption of deduplication in primary storage is performance. NetApp offers a deduplication option for all its systems, and it can be activated on a per-volume basis. NetApp's deduplication isn't executed in real time, though. Instead, it's performed by a scheduled process, generally during off hours, that scans for duplicate 4 KB blocks and replaces them with a reference to the unique block. Instead of generating a unique hash for each 4 KB block, NetApp uses the block's existing checksum to identify duplicate blocks. To prevent hash collisions, which happen when non-identical blocks share the same checksum (hash), NetApp does a block-level comparison of the data in the blocks and only deduplicates those that match. NetApp's deduplication is currently performed per volume or LUN and doesn't span across them. Similar to NetApp, Oracle features block-level deduplication in its Sun ZFS Storage 7000 series systems, but unlike NetApp, dedupe is performed in real time while data is written to disk. Among smaller players, BridgeSTOR LLC, with its application-optimized storage (AOS), supports deduplication. Another vendor apparently committed to data reduction is Dell Inc. With the acquisition of Ocarina Networks in 2010, Dell picked up content-aware deduplication and compression technology, which it intends to incorporate into all its storage systems. While the aforementioned companies developed or acquired data deduplication technology, Permabit Technology Corp. has developed Albireo, a dedupe software library it intends to license to storage vendors, enabling them to add deduplication to their storage systems with a time-to-market advantage and without the risk inherent in developing it themselves.
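
The candidate-then-verify pattern described above can be sketched in a few lines; the CRC-style checksum and 4 KB block size below are stand-ins for illustration and are not NetApp's actual on-disk format.

```python
# Minimal sketch of block-level deduplication in the candidate-then-verify style
# described above: a cheap per-block checksum nominates duplicate candidates, and a
# byte-for-byte comparison confirms them before a block is replaced by a reference.
# The checksum choice and 4 KB block size are illustrative assumptions.
import zlib

BLOCK = 4096

def dedupe(blocks: list[bytes]) -> tuple[list[bytes], list[int]]:
    """Return the unique blocks plus, for each input block, an index into that list."""
    by_checksum: dict[int, list[int]] = {}   # checksum -> indexes of unique blocks
    unique: list[bytes] = []
    refs: list[int] = []
    for data in blocks:
        cksum = zlib.crc32(data)
        match = None
        for idx in by_checksum.get(cksum, []):
            if unique[idx] == data:          # verify: avoids hash-collision corruption
                match = idx
                break
        if match is None:
            unique.append(data)
            match = len(unique) - 1
            by_checksum.setdefault(cksum, []).append(match)
        refs.append(match)
    return unique, refs

# Ten logical blocks but only two distinct patterns: 80% of the space dedupes away.
blocks = [b"A" * BLOCK] * 8 + [b"B" * BLOCK] * 2
unique, refs = dedupe(blocks)
print(f"{len(blocks)} logical blocks stored as {len(unique)} unique blocks")
```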

Compression shares many of the challenges of deduplication on primary storage. Like deduplication, compression has a performance overhead; it's limited to a volume, and whenever data is moved out of that volume it has to be decompressed, just as deduplicated data has to be rehydrated when moved from one volume to another. In an ideal world, different tiers, including backup and archival tiers, would be able to accept and deal with compressed and deduplicated data, but because of a lack of standards they usually don't. Compression and deduplication are complementary technologies, and vendors that implement deduplication usually also offer compression; BridgeSTOR, Dell, NetApp and Sun all do. While deduplication is usually more efficient for virtual server volumes, email attachments, files and backup environments, compression yields better results with random data, such as databases. In other words, deduplication outperforms compression where the likelihood of repetitive data is high. In addition to the above vendors, EMC Corp. offers compression in its VNX unified storage products, and with its single-instance storage feature for file-based content, which stores a single copy of identical files, it offers some level of deduplication. IBM offers its Real-time Compression Appliances (STN6500 and STN6800) to front-end NAS storage; the appliances and the compression technology came to IBM via its 2010 Storwize acquisition.

Data reduction features like RAID 6, thin provisioning, efficient clones and automated storage tiering are becoming must-haves and should be on anyone's feature list when evaluating a primary storage system. Data deduplication and compression, on the other hand, are emerging technologies that will become more pervasive over time, but right now these relative newcomers are just beginning to have an effect on primary storage.

Archiving: Quick data reduction on primary storage

The simplest method of regaining valuable space on primary storage is archiving. Companies, like individuals, have a tendency to keep too much stuff. Businesses keep reams of data on primary storage for the unlikely event it might be needed one day. Archiving can be as simple as manually relocating data to archival storage and restoring it back to primary storage when needed, at essentially no cost.

Those who want to automate the process of moving data into archival storage and restoring it to primary storage can use products such as Symantec Corp.'s Enterprise Vault or Waterford Technologies' archival products, which can leave stubs (references) to archived data on primary storage that conceal the actual location of files from users. The archival product automatically pulls data referenced by a stub back into primary storage when it is accessed, fully transparent to users.
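
The stub-and-recall idea can be sketched conceptually as follows; real products implement it inside the file system or a filter driver rather than in user code, and the .stub marker and paths here are purely hypothetical.

```python
# Conceptual sketch of archive stubs with transparent recall: a tiny stub is left
# on primary storage, and reading it pulls the full file back from the archive.
# This illustrates the idea only, not any product's implementation.
from pathlib import Path
import shutil

STUB_SUFFIX = ".stub"   # assumed marker, for demonstration purposes only

def archive(primary: Path, archive_dir: Path) -> None:
    """Move a file to the archive tier and leave a small stub in its place."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / primary.name
    shutil.move(str(primary), str(target))
    primary.with_suffix(primary.suffix + STUB_SUFFIX).write_text(str(target))

def read(primary: Path) -> bytes:
    """Read a file, recalling it from the archive first if only a stub remains."""
    stub = primary.with_suffix(primary.suffix + STUB_SUFFIX)
    if not primary.exists() and stub.exists():
        shutil.move(stub.read_text(), str(primary))   # transparent recall
        stub.unlink()
    return primary.read_bytes()
```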

2011 Rose Business Technologies, LLC. All Rights Reserved.

Contact: http://www.rosebt.com/contact.html
